HANDBOOK OF THE ECONOMICS OF FINANCE VOLUME 1B
HANDBOOKS IN ECONOMICS 21 Series Editors
KENNETH J. ARROW MICHAEL D. INTRILIGATOR
Amsterdam • Boston • Heidelberg • London • New York • Oxford Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo
HANDBOOK OF THE ECONOMICS OF FINANCE VOLUME 1B FINANCIAL MARKETS AND ASSET PRICING Edited by
GEORGE M. CONSTANTINIDES University of Chicago
MILTON HARRIS University of Chicago and
RENÉ M. STULZ Ohio State University 2003
ELSEVIER B.V. Sara Burgerhartstraat 25 P.O. Box 211, 1000 AE Amsterdam, The Netherlands © 2003 Elsevier B.V. All rights reserved. This work is protected under copyright by Elsevier, and the following terms and conditions apply to its use: Photocopying Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax (+44) 1865 853333, e-mail:
[email protected]. You may also complete your request on-line via the Elsevier home page (http://www.elsevier.com) by selecting ‘Customer Support’ and then ‘Obtaining Permissions’. In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (+1) (978) 7508400, fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 20 7631 5555; fax: (+44) 20 7631 5500. Other countries may have a local reprographic rights agency for payments. Derivative Works Tables of contents may be reproduced for internal circulation, but permission of Elsevier is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations. Electronic Storage or Usage Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier’s Science & Technology Rights Department, at the phone, fax and e-mail addresses noted above. Notice No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made. 
First edition 2003 Library of Congress Cataloging-in-Publication Data A catalog record from the Library of Congress has been applied for. British Library Cataloguing in Publication Data A catalogue record from the British Library has been applied for.
ISBN: 0-444-50298-X (set, comprising vols. 1A & 1B) ISBN: 0-444-51362-0 (vol. 1A) ISBN: 0-444-51363-9 (vol. 1B) ISSN: 0169-7218 (Handbooks in Economics Series) ∞ The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in The Netherlands.
INTRODUCTION TO THE SERIES
The aim of the Handbooks in Economics series is to produce Handbooks for various branches of economics, each of which is a definitive source, reference, and teaching supplement for use by professional researchers and advanced graduate students. Each Handbook provides self-contained surveys of the current state of a branch of economics in the form of chapters prepared by leading specialists on various aspects of this branch of economics. These surveys summarize not only received results but also newer developments, from recent journal articles and discussion papers. Some original material is also included, but the main goal is to provide comprehensive and accessible surveys. The Handbooks are intended to provide not only useful reference volumes for professional collections but also possible supplementary readings for advanced courses for graduate students in economics.
KENNETH J. ARROW and MICHAEL D. INTRILIGATOR
PUBLISHER’S NOTE
For a complete overview of the Handbooks in Economics Series, please refer to the listing at the end of this volume.
CONTENTS OF THE HANDBOOK
VOLUME 1A
CORPORATE FINANCE
Chapter 1
Corporate Governance and Control
MARCO BECHT, PATRICK BOLTON and AILSA RÖELL
Chapter 2
Agency, Information and Corporate Investment
JEREMY C. STEIN
Chapter 3
Corporate Investment Policy
MICHAEL J. BRENNAN
Chapter 4
Financing of Corporations
STEWART C. MYERS
Chapter 5
Investment Banking and Security Issuance
JAY R. RITTER
Chapter 6
Financial Innovation
PETER TUFANO
Chapter 7
Payout Policy
FRANKLIN ALLEN and RONI MICHAELY
Chapter 8
Financial Intermediation
GARY GORTON and ANDREW WINTON
Chapter 9
Market Microstructure
HANS R. STOLL
VOLUME 1B
FINANCIAL MARKETS AND ASSET PRICING
Chapter 10
Arbitrage, State Prices and Portfolio Theory
PHILIP H. DYBVIG and STEPHEN A. ROSS
Chapter 11
Intertemporal Asset-Pricing Theory
DARRELL DUFFIE
Chapter 12
Tests of Multi-Factor Pricing Models, Volatility, and Portfolio Performance
WAYNE E. FERSON
Chapter 13
Consumption-Based Asset Pricing
JOHN Y. CAMPBELL
Chapter 14
The Equity Premium in Retrospect
RAJNISH MEHRA and EDWARD C. PRESCOTT
Chapter 15
Anomalies and Market Efficiency
G. WILLIAM SCHWERT
Chapter 16
Are Financial Assets Priced Locally or Globally?
G. ANDREW KAROLYI and RENÉ M. STULZ
Chapter 17
Microstructure and Asset Pricing
DAVID EASLEY and MAUREEN O’HARA
Chapter 18
A Survey of Behavioral Finance
NICHOLAS C. BARBERIS and RICHARD H. THALER
Finance, Optimization, and the Irreducibly Irrational Component of Human Behavior
ROBERT J. SHILLER
Chapter 19
Derivatives
ROBERT E. WHALEY
Chapter 20
Fixed Income Pricing
QIANG DAI and KENNETH J. SINGLETON
PREFACE
Financial economics applies the techniques of economic analysis to understand the savings and investment decisions by individuals, the investment, financing and payout decisions by firms, the level and properties of interest rates and prices of financial assets and derivatives, and the economic role of financial intermediaries. Until the 1950s, finance was viewed primarily as the study of financial institutional detail and was hardly accorded the status of a mainstream field of economics. This perception was epitomized by the difficulty Harry Markowitz had in receiving a PhD degree in the economics department at the University of Chicago for work that eventually would earn him a Nobel prize in economic science. This state of affairs changed in the second half of the 20th century with a revolution that took place from the 1950s to the early 1970s. At that time, key progress was made in understanding the financial decisions of individuals and firms and their implications for the pricing of common stocks, debt, and interest rates. Harry Markowitz, William Sharpe, James Tobin, and others showed how individuals concerned about their expected future wealth and its variance make investment decisions. Their key results showing the benefits of diversification, that wealth is optimally allocated across funds that are common across individuals, and that investors are rewarded for bearing risks that are not diversifiable, are now the basis for much of the investment industry. Merton Miller and Franco Modigliani showed that the concept of arbitrage is a powerful tool to understand the implications of firm capital structures for firm value. In a world without frictions, they showed that a firm’s value is unrelated to its capital structure. Eugene Fama put forth the efficient markets hypothesis and led the way in its empirical investigation. 
Finally, Fischer Black, Robert Merton and Myron Scholes provided one of the most elegant theories in all of economics: the theory of how to price financial derivatives in markets without frictions. Following the revolution brought about by these fathers of modern finance, the field of finance has experienced tremendous progress. Along the way, it influenced public policy throughout the world in a major way, played a crucial role in the growth of a new $100 trillion derivatives industry, and affected how firms are managed everywhere. Finance also evolved from being at best a junior partner in economics to being often a leader. Key concepts and theories first developed in finance led to progress in other fields of economics. It is now common among economists to use theories of arbitrage, rational expectations, equilibrium, agency relations, and information asymmetries that were first developed in finance. The committee for the
Alfred Nobel Memorial Prize in economic science eventually recognized this state of affairs. Markowitz, Merton, Miller, Modigliani, Scholes, Sharpe, and Tobin received Nobel prizes for contributions in financial economics. This Handbook presents the state of the field of finance fifty years after this revolution in modern finance started. The surveys are written by leaders in financial economics. They provide a comprehensive report on developments in both theory and empirical testing in finance at a level that, while rigorous, is nevertheless accessible to researchers not intimate with the field and doctoral students in economics, finance and related fields. By summarizing the state of the art and pointing out as-yet unresolved questions, this Handbook should prove an invaluable resource to researchers planning to contribute to the field and an excellent pedagogical tool for teaching doctoral students. The book is divided into two Volumes, corresponding to the traditional taxonomy of finance: corporate finance (1A) and financial markets and asset pricing (1B).
1. Corporate finance

Corporate finance is concerned with how businesses work, in particular, how they allocate capital (traditionally, “the capital budgeting decision”) and how they obtain capital (“the financing decision”). Though managers play no independent role in the work of Miller and Modigliani, major contributions in finance since then have shown that managers maximize their own objectives. To understand the firm’s decisions, it is therefore necessary to understand the forces that lead managers to maximize the wealth of shareholders. For example, a number of researchers have emphasized the positive and negative roles of large shareholders in aligning incentives of managers and shareholders.

The part of the Handbook devoted to corporate finance starts with an overview, entitled Corporate Governance and Control, by Marco Becht, Patrick Bolton, and Ailsa Röell (Chapter 1) of the framework in which managerial activities take place. Their broad survey covers everything about corporate governance, from its history and importance to theories and empirical evidence to cross-country comparisons.

Following the survey of corporate governance in Chapter 1, two complementary essays discuss the investment decision. In Agency, Information and Corporate Investment, Jeremy Stein (Chapter 2) focuses on the effects of agency problems and asymmetric information on the allocation of capital, both across firms and within firms. This survey does not address the issue of how to value a proposed investment project, given information about the project. That topic is considered in Corporate Investment Policy by Michael Brennan in Chapter 3. Brennan draws out the implications of recent developments in asset pricing, including option pricing techniques and tax considerations, for evaluating investment projects.

In Chapter 4, Financing of Corporations, the focus moves to the financing decision.
Stewart Myers provides an overview of the research that seeks to explain firms’ capital structure, that is, the types and proportions of securities firms use to finance their
investments. Myers covers the traditional theories that attempt to explain proportions of debt and equity financing as well as more recent theories that attempt to explain the characteristics of the securities issued. In assessing the different capital structure theories, he concludes that he does not expect that there will ever be “one” capital structure theory that applies to all firms. Rather, he believes that we will always use different theories to explain the behavior of different types of firms. In Chapter 5, Investment Banking and Security Issuance, Jay Ritter is concerned with how firms raise equity and the role of investment banks in that process. He examines both initial public offerings and seasoned equity offerings. A striking result discovered first by Ritter is that firms that issue equity experience poor long-term stock returns afterwards. This result has led to a number of vigorous controversies that Ritter reviews in this chapter. Firms may also obtain capital by issuing securities other than equity and debt. A hallmark of the last thirty years has been the tremendous amount of financial innovation that has taken place. Though some of the innovations fizzled and others provided fodder to crooks, financial innovation can enable firms to undertake profitable projects that otherwise they would not be able to undertake. In Chapter 6, Financial Innovation, Peter Tufano delves deeper into the issues of security design and financial innovation. He reviews the process of financial innovation and explanations of the quantity of innovation. Investors do not purchase equity without expecting a return from their investment. In one of their classic papers, Miller and Modigliani show that, in the absence of frictions, dividend policy is irrelevant for firm value. Since then, a large literature has developed that identifies when dividend policy matters and when it does not. 
Franklin Allen and Roni Michaely (Chapter 7) survey this literature in their essay entitled Payout Policy. Allen and Michaely consider the roles of taxes, asymmetric information, incomplete contracting and transaction costs in determining payouts to equity holders, both dividends and share repurchases. Chapter 8, Financial Intermediation, focuses more directly on the role financial intermediaries play. Although some investment is funded directly through capital markets, according to Gary Gorton and Andrew Winton, the vast majority of external investment flows through financial intermediaries. In Chapter 8, Gorton and Winton survey the literature on financial intermediation with emphasis on banking. They explore why intermediaries exist, discuss banking crises, and examine why and how they are regulated. Exchanges on which securities are traded play a crucial role in intermediating between individuals who want to buy securities and others who want to sell them. In many ways, they are special types of corporations whose workings affect the value of financial securities as well as the size of financial markets. The Handbook contains two chapters that deal with the issues of how securities are traded. Market Microstructure, by Hans Stoll (Chapter 9), focuses on how exchanges perform their functions as financial intermediaries and therefore is included in this part. Stoll examines explanations of the bid-ask spread, the empirical evidence for these explanations, and the implications for market design. Microstructure and Asset Pricing,
by Maureen O’Hara and David Easley (Chapter 17), examines the implications of how securities trade for the properties of securities returns and is included in Volume 1B on Financial Markets and Asset Pricing.
2. Financial markets and asset pricing

A central theme in finance and economics is the pursuit of an understanding of how the prices of financial securities are determined in financial markets. Currently, there is immense interest among academics, policy makers, and practitioners in whether these markets get prices right, fueled in part by the large daily volatility in prices and by the large increase in stock prices over most of the 1990s, followed by the sharp decrease in prices at the turn of the century. Our understanding of how securities are priced is far from complete. In the early 1960s, Eugene Fama from the University of Chicago established the foundations for the “efficient markets” view that financial markets are highly effective in incorporating information into asset prices. This view led to a large body of empirical and theoretical work. Some of the chapters in this part of the Handbook review that body of work, but the “efficient markets” view has been challenged by the emergence of a new, controversial field, behavioral finance, which seeks to show that psychological biases of individuals affect the pricing of securities. There is therefore divergence of opinion and critical reexamination of given doctrine. This is fertile ground for creative thinking and innovation. In Volume 1B of the Handbook, we invite the reader to partake in this intellectual odyssey. We present eleven original essays on the economics of financial markets. The divergence of opinion and the puzzles presented in these essays belie the incredible progress made by financial economists over the second half of the 20th century that laid the foundations for future research. The modern quantitative approach to finance has its origins in neoclassical economics.
In the opening essay titled Arbitrage, State Prices and Portfolio Theory (Chapter 10), Philip Dybvig and Stephen Ross illustrate a surprisingly large amount of the intuition and intellectual content of modern finance in the context of a single-period, perfect-markets neoclassical model. They discuss the fundamental theorems of asset pricing – the consequences of the absence of arbitrage, optimal portfolio choice, the properties of efficient portfolios, aggregation, the capital asset-pricing model (CAPM), mutual fund separation, and the arbitrage pricing theory (APT). A number of these notions may be traced to the original contributions of Stephen Ross. In his essay titled Intertemporal Asset Pricing Theory (Chapter 11), Darrell Duffie provides a systematic development of the theory of intertemporal asset pricing, first in a discrete-time setting and then in a continuous-time setting. As applications of the basic theory, Duffie also presents comprehensive treatments of the term structure of interest rates and fixed-income pricing, derivative pricing, and the pricing of corporate securities with default modeled both as an endogenous and an exogenous process.
These applications are discussed in further detail in some of the subsequent essays. Duffie’s essay is comprehensive and authoritative and may serve as the basis of an entire 2nd-year PhD-level course on asset pricing. Historically, the empirically testable implications of asset-pricing theory have been couched in terms of the mean-variance efficiency of a given portfolio, the validity of a multifactor pricing model with given factors, or the validity of a given stochastic discount factor. Furthermore, different methodologies have been developed and applied in the testing of these implications. In Tests of Multi-Factor Pricing Models, Volatility, and Portfolio Performance (Chapter 12), Wayne Ferson discusses the empirical methodologies applied in testing asset-pricing models. He points out that these three statements of the empirically testable implications are essentially equivalent and that the seemingly different empirical methodologies are equivalent as well. In his essay titled Consumption-Based Asset Pricing (Chapter 13), John Campbell begins by reviewing the salient features of the joint behavior of equity returns, aggregate dividends, the interest rate, and aggregate consumption in the USA. Features that challenge existing asset-pricing theory include, but are not limited to, the “equity premium puzzle”: the finding that the low covariance of the growth rate of aggregate consumption with equity returns is a major stumbling block in explaining the mean aggregate equity premium and the cross-section of asset returns, in the context of the representative-consumer, time-separable-preferences models examined by Grossman and Shiller (1981), Hansen and Singleton (1982), and Mehra and Prescott (1985). Campbell also examines data from other countries to see which features of the USA data are pervasive. He then proceeds to relate these findings to recent developments in asset-pricing theory that relax various assumptions of the standard asset-pricing model.
In a closely related essay titled The Equity Premium in Retrospect (Chapter 14), Rajnish Mehra and Edward Prescott – the researchers who coined the term – critically reexamine the data sources used to document the equity premium puzzle in the USA and other major industrial countries. They then proceed to relate these findings to recent developments in asset-pricing theory by employing the methodological tool of calibration, as opposed to the standard empirical estimation of model parameters and the testing of over-identifying restrictions. Mehra and Prescott have different views than Campbell as to which assumptions of the standard asset-pricing model need to be relaxed in order to address the stylized empirical findings. Why are these questions important? First and foremost, financial markets play a central role in the allocation of investment capital and in the sharing of risk. Failure to answer these questions suggests that our understanding of the fundamental process of capital allocation is highly imperfect. Second, the basic economic paradigm employed in analyzing financial markets is closely related to the paradigm employed in the study of business cycles and growth. Failure to explain the stylized facts of financial markets calls into question the appropriateness of the related paradigms for the study of macroeconomic issues. The above two essays convey correctly the status quo that the puzzle
is at the forefront of academic interest and that views regarding its resolution are divergent. Several goals are accomplished in William Schwert’s comprehensive and incisive essay titled Anomalies and Market Efficiency (Chapter 15). First, Schwert discusses cross-sectional and time-series regularities in asset returns, both at the aggregate and disaggregate level. These include the size, book-to-market, momentum, and dividend yield effects. Second, Schwert discusses differences in returns realized by different types of investors, including individual and institutional investors. Third, he evaluates the role of measurement issues in many of the papers that study anomalies, including the difficult issues associated with long-horizon return performance. Finally, Schwert discusses the implications of the anomalies literature for asset-pricing and corporate finance theories. In discussing the informational efficiency of the market, Schwert points out that tests of market efficiency are also joint tests of market efficiency and a particular equilibrium asset-pricing model. In the essay titled Are Financial Assets Priced Locally or Globally? (Chapter 16), Andrew Karolyi and René Stulz discuss the theoretical implications of and empirical evidence concerning asset-pricing theory as it applies to international equities markets. They explain that country-risk premia are determined internationally, but the evidence is weak on whether international factors affect the cross-section of expected returns. A long-standing puzzle in international finance is that investors invest more heavily in domestic equities than predicted by the theory. Karolyi and Stulz argue that barriers to international investment only partly resolve the home-bias puzzle. They conclude that contagion – the linkage of international markets – may be far less prevalent than commonly assumed.
At frequencies lower than the daily frequency, asset-pricing theory generally ignores the role of the microstructure of financial markets. In their essay titled Microstructure and Asset Pricing (Chapter 17), David Easley and Maureen O’Hara survey the theoretical and empirical literature linking microstructure factors to long-run returns, and focus on why stock prices might be expected to reflect premia related to liquidity or informational asymmetries. They show that asset-pricing dynamics may be better understood by recognizing the role played by microstructure factors and the linkages of microstructure and fundamental economic variables. All the models that are discussed in the essays by Campbell, Mehra and Prescott, Schwert, Karolyi and Stulz, and Easley and O’Hara are variations of the neoclassical asset-pricing model. The model is rational, in that investors process information rationally and have unambiguously defined preferences over consumption. Naturally, the model allows for market incompleteness, market imperfections, informational asymmetries, and learning. The model also allows for differences among assets for liquidity, transaction costs, tax status, and other institutional factors. Many of these variations are explored in the above essays. In their essay titled A Survey of Behavioral Finance (Chapter 18), Nicholas Barberis and Richard Thaler provide a counterpoint to the rational model by providing explanations of the cross-sectional and time-series regularities in asset returns by
relying on economic models that are less than fully rational. These include cultural and psychological factors and tap into the rich and burgeoning literature on behavioral economics and finance. Robert Shiller, who is, along with Richard Thaler, one of the founders of behavioral finance, provides his personal perspective on behavioral finance in his statement titled Finance, Optimization and the Irreducibly Irrational Component of Human Behavior. One of the towering achievements in finance in the second half of the 20th century is the celebrated option-pricing theory of Black and Scholes (1973) and Merton (1973). The model has had a profound influence on the course of economic thought. In his essay titled Derivatives (Chapter 19), Robert Whaley provides comprehensive coverage of the topic. Following a historical overview of futures and options, he proceeds to derive the implications of the law of one price and then the Black–Scholes–Merton theory. He concludes with a systematic coverage of the empirical evidence and a discussion of the social costs and benefits associated with the introduction of derivatives. Whaley’s thorough and insightful essay provides an easy entry to an important topic that many economists find intimidating. In their essay titled Fixed-Income Pricing (Chapter 20), Qiang Dai and Ken Singleton survey the literature on fixed-income pricing models, including term structure models, fixed-income derivatives, and models of defaultable securities. They point out that this literature is vast, with both the academic and practitioner communities having proposed a wide variety of models. In guiding the reader through these models, they explain that different applications call for different models based on the trade-offs of complexity, flexibility, tractability, and data availability – the “art” of modeling.
The Dai and Singleton essay, combined with Duffie’s earlier essay, provides an insightful and authoritative introduction to the world of fixed-income pricing models at the advanced MBA and PhD levels. We hope that the contributions represented by these essays communicate the excitement of financial economics to beginners and specialists alike and stimulate further research. We thank Rodolfo Martell for his help in processing the papers for publication.

GEORGE M. CONSTANTINIDES
University of Chicago, Chicago
MILTON HARRIS
University of Chicago, Chicago
RENÉ STULZ
Ohio State University, Columbus

References
Black, F., and M.S. Scholes (1973), “The pricing of options and corporate liabilities”, Journal of Political Economy 81:637−654.
Grossman, S.J., and R.J. Shiller (1981), “The determinants of the variability of stock market prices”, American Economic Review Papers and Proceedings 71:222−227.
Hansen, L.P., and K.J. Singleton (1982), “Generalized instrumental variables estimation of nonlinear rational expectations models”, Econometrica 50:1269−1288.
Mehra, R., and E.C. Prescott (1985), “The equity premium: a puzzle”, Journal of Monetary Economics 15:145−161.
Merton, R.C. (1973), “Theory of rational option pricing”, Bell Journal of Economics and Management Science 4:141−183.
CONTENTS OF VOLUME 1B
Introduction to the Series
Contents of the Handbook
Preface
FINANCIAL MARKETS AND ASSET PRICING Chapter 10 Arbitrage, State Prices and Portfolio Theory PHILIP H. DYBVIG and STEPHEN A. ROSS Abstract Keywords 1. Introduction 2. Portfolio problems 3. Absence of arbitrage and preference-free results
7. Arbitrage pricing theory (APT) 8. Conclusion References
605 606 606 607 607 612 614 616 618 619 619 620 621 622 624 629 629 631 633 634 634
Chapter 11 Intertemporal Asset Pricing Theory DARRELL DUFFIE Abstract
639 641
3.1. Fundamental theorem of asset pricing 3.2. Pricing rule representation theorem
4. Various analyses: Arrow–Debreu world 4.1. 4.2. 4.3. 4.4. 4.5.
Optimal portfolio choice Efficient portfolios Aggregation Asset pricing Payoff distribution pricing
5. Capital asset pricing model (CAPM) 6. Mutual fund separation theory 6.1. Preference approach 6.2. Beliefs
xviii
Contents of Volume 1B
Keywords 1. Introduction 2. Basic theory 2.1. 2.2. 2.3. 2.4. 2.5. 2.6. 2.7. 2.8. 2.9. 2.10.
Setup Arbitrage, state prices, and martingales Individual agent optimality Habit and recursive utilities Equilibrium and Pareto optimality Equilibrium asset pricing Breeden’s consumption-based CAPM Arbitrage and martingale measures Valuation of redundant securities American exercise policies and valuation
3. Continuous-time modeling 3.1. 3.2. 3.3. 3.4. 3.5. 3.6. 3.7. 3.8. 3.9. 3.10. 3.11. 3.12. 3.13.
Trading gains for Brownian prices Martingale trading gains The Black–Scholes option-pricing formula Ito’s Formula Arbitrage modeling Numeraire invariance State prices and doubling strategies Equivalent martingale measures Girsanov and market prices of risk Black–Scholes again Complete markets Optimal trading and consumption Martingale solution to Merton’s problem
4. Term-structure models 4.1. 4.2. 4.3. 4.4. 4.5. 4.6.
One-factor models Term-structure derivatives Fundamental solution Multifactor term-structure models Affine models The HJM model of forward rates
5. Derivative pricing 5.1. Forward and futures prices 5.2. Options and stochastic volatility 5.3. Option valuation by transform analysis
6. Corporate securities 6.1. 6.2. 6.3. 6.4.
Endogenous default timing Example: Brownian dividend growth Taxes, bankruptcy costs, capital structure Intensity-based modeling of default
641 642 642 643 644 646 647 649 651 653 654 656 657 661 662 663 665 668 670 670 671 672 672 676 677 678 682 686 687 691 693 695 696 699 702 702 705 708 711 712 713 717 719
Contents of Volume 1B 6.5. Zero-recovery bond pricing 6.6. Pricing with recovery at default 6.7. Default-adjusted short rate
References Chapter 12 Tests of Multifactor Pricing Models, Volatility Bounds and Portfolio Performance WAYNE E. FERSON Abstract Keywords 1. Introduction 2. Multifactor asset-pricing models: Review and integration
6. Conclusions References
Chapter 13 Consumption-Based Asset Pricing JOHN Y. CAMPBELL Abstract Keywords 1. Introduction
2.1. The stochastic discount factor representation 2.2. Expected risk premiums 2.3. Return predictability 2.4. Consumption-based asset-pricing models 2.5. Multi-beta pricing models 2.6. Mean-variance efficiency with conditioning information 2.7. Choosing the factors
3. Modern variance bounds 3.1. The Hansen–Jagannathan bounds 3.2. Variance bounds with conditioning information 3.3. The Hansen–Jagannathan distance
4. Methodology and tests of multifactor asset-pricing models 4.1. The Generalized Method of Moments approach 4.2. Cross-sectional regression methods 4.3. Multivariate regression and beta-pricing models
5. Conditional performance evaluation 5.1. Stochastic discount factor formulation 5.2. Beta-pricing formulation 5.3. Using portfolio weights 5.4. Conditional market-timing models 5.5. Empirical evidence on conditional performance
2. International stock market data 3. The equity premium puzzle 3.1. The stochastic discount factor 3.2. Consumption-based asset pricing with power utility 3.3. The risk-free rate puzzle 3.4. Bond returns and the equity-premium and risk-free rate puzzles 3.5. Separating risk aversion and intertemporal substitution
4. The dynamics of asset returns and consumption 4.1. Time-variation in conditional expectations 4.2. A loglinear asset-pricing framework 4.3. The equity volatility puzzle 4.4. Implications for the equity premium puzzle 4.5. What does the stock market forecast? 4.6. Changing volatility in stock returns 4.7. What does the bond market forecast?
5. Cyclical variation in the price of risk 5.1. Habit formation 5.2. Models with heterogeneous agents 5.3. Irrational expectations
6. Some implications for macroeconomics References Chapter 14 The Equity Premium in Retrospect RAJNISH MEHRA and EDWARD C. PRESCOTT Abstract Keywords 1. Introduction 2. The equity premium: history 2.1. Facts 2.2. Data sources 2.3. Estimates of the equity premium 2.4. Variation in the equity premium over time
3. Is the equity premium due to a premium for bearing non-diversifiable risk? 3.1. Standard preferences 3.2. Estimating the equity risk premium versus estimating the risk aversion parameter 3.3. Alternative preference structures 3.4. Idiosyncratic and uninsurable income risk 3.5. Models incorporating a disaster state and survivorship bias
4. Is the equity premium due to borrowing constraints, a liquidity premium or taxes? 4.1. Borrowing constraints 4.2. Liquidity premium
4.3. Taxes and regulation
5. An equity premium in the future? Appendix A Appendix B. The original analysis of the equity premium puzzle B.1. The economy, asset prices and returns
References Chapter 15 Anomalies and Market Efficiency G. WILLIAM SCHWERT Abstract Keywords 1. Introduction 2. Selected empirical regularities 2.1. Predictable differences in returns across assets 2.2. Predictable differences in returns through time
3. Returns to different types of investors 3.1. Individual investors 3.2. Institutional investors 3.3. Limits to arbitrage
4. Long-run returns 4.1. Returns to firms issuing equity 4.2. Returns to bidder firms
5. Implications for asset pricing 5.1. The search for risk factors 5.2. Conditional asset pricing 5.3. Excess volatility 5.4. The role of behavioral finance
6. Implications for corporate finance 6.1. Firm size and liquidity 6.2. Book-to-market effects 6.3. Slow reaction to corporate financial policy
7. Conclusions References Chapter 16 Are Financial Assets Priced Locally or Globally? G. ANDREW KAROLYI and RENE´ M. STULZ Abstract Keywords 1. Introduction 2. The perfect financial markets model 2.1. Identical consumption-opportunity sets across countries
2.2. Different consumption-opportunity sets across countries 2.3. A general approach 2.4. Empirical evidence on asset pricing using perfect market models
3. Home bias 4. Flows, spillovers, and contagion 4.1. Flows and returns 4.2. Correlations, spillovers, and contagion
5. Conclusion References Chapter 17 Microstructure and Asset Pricing DAVID EASLEY and MAUREEN O’HARA Abstract Keywords 1. Introduction 2. Equilibrium asset pricing 3. Asset pricing in the short-run 3.1. The mechanics of pricing behavior 3.2. The adjustment of prices to information 3.3. Statistical and structural models of microstructure data 3.4. Volume and price movements
4. Asset pricing in the long-run 4.1. Liquidity 4.2. Information
5. Linking microstructure and asset pricing: puzzles for researchers References Chapter 18 A Survey of Behavioral Finance NICHOLAS BARBERIS and RICHARD THALER Abstract Keywords 1. Introduction 2. Limits to arbitrage 2.1. Market efficiency 2.2. Theory 2.3. Evidence
3. Psychology 3.1. Beliefs 3.2. Preferences
4. Application: The aggregate stock market 4.1. The equity premium puzzle
9. Conclusion Appendix A References
Finance, Optimization, and the Irreducibly Irrational Component of Human Behavior ROBERT J. SHILLER
4.2. The volatility puzzle
5. Application: The cross-section of average returns 5.1. Belief-based models 5.2. Belief-based models with institutional frictions 5.3. Preferences
6. Application: Closed-end funds and comovement 6.1. Closed-end funds 6.2. Comovement
7. Application: Investor behavior 7.1. Insufficient diversification 7.2. Naive diversification 7.3. Excessive trading 7.4. The selling decision 7.5. The buying decision
8. Application: Corporate finance 8.1. Security issuance, capital structure and investment 8.2. Dividends 8.3. Models of managerial irrationality
Chapter 19 Derivatives ROBERT E. WHALEY Abstract Keywords 1. Introduction 2. Background 3. No-arbitrage pricing relations 3.1. Carrying costs 3.2. Valuing forward/futures using the no-arbitrage principle 3.3. Valuing options using the no-arbitrage principle
4. Option valuation 4.1. The Black–Scholes/Merton option valuation theory 4.2. Analytical formulas 4.3. Approximation methods 4.4. Generalizations
5. Studies of no-arbitrage price relations 5.1. Forward/futures prices 5.2. Option prices 5.3. Summary and analysis
6. Studies of option valuation models 6.1. Pricing errors/implied volatility anomalies 6.2. Trading simulations 6.3. Informational content of implied volatility 6.4. Summary and analysis
7. Social costs/benefits of derivatives trading 7.1. Contract introductions 7.2. Contract expirations 7.3. Market synchronization 7.4. Summary and analysis
8. Summary References Chapter 20 Fixed-Income Pricing QIANG DAI and KENNETH J. SINGLETON Abstract Keywords 1. Introduction 2. Fixed-income pricing in a diffusion setting 2.1. The term structure 2.2. Fixed-income securities with deterministic payoffs 2.3. Fixed-income securities with state-dependent payoffs 2.4. Fixed-income securities with stopping times
3. Dynamic term-structure models for default-free bonds 3.1. One-factor dynamic term-structure models 3.2. Multi-factor dynamic term-structure models
4. Dynamic term-structure models with jump diffusions 5. Dynamic term-structure models with regime shifts 6. Dynamic term-structure models with rating migrations 6.1. Fractional recovery of market value 6.2. Fractional recovery of par, payable at maturity 6.3. Fractional recovery of par, payable at default 6.4. Pricing defaultable coupon bonds 6.5. Pricing Eurodollar swaps
7. Pricing of fixed-income derivatives 7.1. Derivatives pricing using dynamic term-structure models 7.2. Derivatives pricing using forward-rate models 7.3. Defaultable forward-rate models with rating migrations
7.4. The LIBOR market model 7.5. The swaption market model
References Subject Index
Chapter 10
ARBITRAGE, STATE PRICES AND PORTFOLIO THEORY PHILIP H. DYBVIG Washington University in Saint Louis STEPHEN A. ROSS MIT
Contents Abstract Keywords 1. Introduction 2. Portfolio problems 3. Absence of arbitrage and preference-free results 3.1. Fundamental theorem of asset pricing 3.2. Pricing rule representation theorem
4. Various analyses: Arrow–Debreu world 4.1. Optimal portfolio choice 4.2. Efficient portfolios 4.3. Aggregation 4.4. Asset pricing 4.5. Payoff distribution pricing
5. Capital asset pricing model (CAPM) 6. Mutual fund separation theory 6.1. Preference approach 6.2. Beliefs
7. Arbitrage pricing theory (APT) 8. Conclusion References
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved
Abstract

Neoclassical financial models provide the foundation for our understanding of finance. This chapter introduces the main ideas of neoclassical finance in a single-period context that avoids the technical difficulties of continuous-time models, but preserves the principal intuitions of the subject. The starting point of the analysis is the formulation of standard portfolio choice problems. A central conceptual result is the Fundamental Theorem of Asset Pricing, which asserts the equivalence of absence of arbitrage, the existence of a positive linear pricing rule, and the existence of an optimum for some agent who prefers more to less. A related conceptual result is the Pricing Rule Representation Theorem, which asserts that a positive linear pricing rule can be represented as using state prices, risk-neutral expectations, or a state-price density. Different equivalent representations are useful in different contexts. Many applied results can be derived from the first-order conditions of the portfolio choice problem. The first-order conditions say that marginal utility in each state is proportional to a consistent state-price density, where the constant of proportionality is determined by the budget constraint. If markets are complete, the implicit state-price density is uniquely determined by investment opportunities and must be the same as viewed by all agents, thus simplifying the choice problem. Solving first-order conditions for quantities gives us optimal portfolio choice, solving them for prices gives us asset pricing models, solving them for utilities gives us preferences, and solving them for probabilities gives us beliefs. We look at two popular asset pricing models, the CAPM and the APT, as well as complete-markets pricing. In the case of the CAPM, the first-order conditions link nicely to the traditional measures of portfolio performance. Further conceptual results include aggregation and mutual fund separation theory, both of which are useful for understanding equilibrium and asset pricing.
Keywords arbitrage, arbitrage pricing theory, investments, portfolio choice, asset pricing, complete markets, mean-variance analysis, performance measurement, mutual fund separation, aggregation JEL classification: G11, G12
1. Introduction

The modern quantitative approach to finance has its original roots in neoclassical economics. Neoclassical economics studies an idealized world in which markets work smoothly without impediments such as transaction costs, taxes, asymmetry of information, or indivisibilities. This chapter considers what we learn from single-period neoclassical models in finance. While dynamic models are becoming more and more common, single-period models contain a surprisingly large amount of the intuition and intellectual content of modern finance, and are also commonly used by investment practitioners for the construction of optimal portfolios and communication of investment results. Focusing on a single period is also consistent with an important theme. While general equilibrium theory seeks great generality and abstraction, finance has work to be done and seeks specific models with strong assumptions and definite implications that can be tested and implemented in practice.
2. Portfolio problems

In our analysis, there are two points of time, 0 and 1, with an interval of time in between during which nothing happens. At time zero, our champion (the agent) is making decisions that will affect the allocation of consumption between nonrandom consumption, $c_0$, at time 0, and random consumption $\{c_\omega\}$ across states $\omega = 1, 2, \ldots, \Omega$ revealed at time 1. At time 0 and in each state at time 1, there is a single consumption good, and therefore consumption at time 0 or in a state at time 1 is a real number. This abstraction of a single good is obviously not “true” in any literal sense, but this is not a problem, and indeed any useful theoretical model is much simpler than reality. The abstraction does, however, face us with the question of how to interpret our simple model (in this case with a single good) in a practical context that is more complex (has multiple goods). In using a single-good model, there are two usual practices: either use nominal values and measure consumption in dollars, or use real values and measure consumption in inflation-adjusted dollars. Depending on the context, one or the other can make the most sense.

Following the usual practice from general equilibrium theory of thinking of units of consumption at various times and in different states of nature as different goods, a typical consumption vector is $C \equiv \{c_0, c_1, \ldots, c_\Omega\}$, where the real number $c_0$ denotes consumption of the single good at time zero, and the vector $c \equiv \{c_1, \ldots, c_\Omega\}$ of real numbers $c_1, \ldots, c_\Omega$ denotes random consumption of the single good in each state $1, \ldots, \Omega$ at time 1. If this were a typical exercise in general equilibrium theory, we would have a price vector for consumption across goods. For example, we might have the following choice problem, which is named after two great pioneers of general equilibrium theory, Kenneth Arrow and Gerard Debreu:
Problem 1: Arrow–Debreu Problem. Choose consumptions $C \equiv \{c_0, c_1, \ldots, c_\Omega\}$ to maximize utility of consumption $U(C)$ subject to the budget constraint

$$c_0 + \sum_{\omega=1}^{\Omega} p_\omega c_\omega = W. \tag{1}$$
Here, $U(\cdot)$ is the utility function that represents preferences, $p$ is the price vector, and $W$ is wealth, which might be replaced by the market value of an endowment. We are taking consumption at time 0 to be the numeraire, and $p_\omega$ is the price of the Arrow–Debreu security which is a claim to one unit of consumption at time 1 in state $\omega$. The first-order condition for Problem 1 is the existence of a positive Lagrangian multiplier $\lambda$ (the marginal utility of wealth) such that $U'_0(c_0) = \lambda$, and for all $\omega = 1, \ldots, \Omega$, $U'_\omega(c_\omega) = \lambda p_\omega$, where $U'_0$ and $U'_\omega$ denote the partial derivatives of $U$ with respect to $c_0$ and $c_\omega$. This is the usual result from neoclassical economics that the gradient of the utility function is proportional to prices. Specializing to the leading case in finance of time-separable von Neumann–Morgenstern preferences, named after John von Neumann and Oskar Morgenstern (1944), two great pioneers of utility theory, we have that $U(C) = v(c_0) + \sum_{\omega=1}^{\Omega} \pi_\omega u(c_\omega)$. We will take $v$ and $u$ to be differentiable, strictly increasing (more is preferred to less), and strictly concave (risk averse). Here, $\pi_\omega$ is the probability of state $\omega$. In this case, the first-order condition is the existence of $\lambda$ such that

$$v'(c_0) = \lambda, \tag{2}$$

and for all $\omega = 1, 2, \ldots, \Omega$,

$$\pi_\omega u'(c_\omega) = \lambda p_\omega, \tag{3}$$

or equivalently

$$u'(c_\omega) = \lambda \rho_\omega, \tag{4}$$
where $\rho_\omega \equiv p_\omega / \pi_\omega$ is the state-price density (also called the stochastic discount factor or pricing kernel), which is a measure of priced relative scarcity in state of nature $\omega$. Therefore, the marginal utility of consumption in a state is proportional to the relative scarcity. There is a solution if the problem is feasible, prices and probabilities are positive, the von Neumann–Morgenstern utility function is increasing and strictly concave, and the Inada condition $\lim_{c \uparrow \infty} u'(c) = 0$ is satisfied.¹ There are different motivations of von Neumann–Morgenstern preferences in the literature and the probabilities may be objective or subjective. What is important for us is that the von Neumann–Morgenstern utility function represents preferences in the sense that expected utility is higher for more preferred consumption patterns.² Using von Neumann–Morgenstern preferences has been popular in part because of axiomatic derivations of the theory [see, for example, Herstein and Milnor (1953) or Luce and Raiffa (1957, Chapter 2)]. There is also a large literature on alternatives and extensions to von Neumann–Morgenstern preferences. For single-period models, see Knight (1921), Bewley (1988), Machina (1982), Blume, Brandenburger and Dekel (1991) and Fishburn (1988). There is an even richer set of models in multiple periods, for example, time-separable von Neumann–Morgenstern (the traditional standard), habit formation [e.g., Duesenberry (1949), Pollak (1970), Abel (1990), Constantinides (1991) and Dybvig (1995)], local substitutability over time [Hindy and Huang (1992)], interpersonal dependence [Duesenberry (1949) and Abel (1990)], preference for resolution of uncertainty [Kreps and Porteus (1978)], time preference dependent on consumption [Bergman (1985)], and general recursive utility [Epstein and Zin (1989)]. Recently, there have also been some attempts to revive the age-old idea of studying financial situations using psychological theories [like prospect theory, Kahneman and Tversky (1979)]. Unfortunately, these models do not translate well to financial markets. For example, in prospect theory framing matters; that is, an agent is observed to make different decisions when facing identical decision problems described differently. However, this is an alien concept for financial economists and when they proxy for it in models they substitute something more familiar [for example, some history dependence as in Barberis, Huang and Santos (2001)].

¹ Proving the existence of a solution requires more assumptions in continuous-state models.
² Later, when we look at multiple-agent results, we will also make the neoclassical assumption of identical beliefs, which is probably most naturally motivated by common objective beliefs.

Another problem with the psychological theories is that they tend to be isolated stories rather than a general specification, and they are often hard to generalize. For example, prospect theory says that agents put extra weight on very unlikely outcomes, but it is not at all clear what this means in a model with a continuum of states. This literature also has problems with using ex post explanations (positive correlations of returns are underreaction and negative correlations are overreactions) and a lack of clarity of how much is going on that cannot be explained by traditional models (and much of it can).

In actual financial markets, Arrow–Debreu securities do not trade directly, even if they can be constructed indirectly using a portfolio of securities. A security is characterized by its cash flows. This description would not be adequate for analysis of taxes, since different sources of cash flow might have very different tax treatment, but we are looking at models without taxes. For an asset like a common stock or a bond, the cash flow might be negative at time 0, from payment of the price, and positive or zero in each state at time 1, the positive amount coming from any repayment of principal, dividends, coupons, or proceeds from sale of the asset. For a futures contract, the cash flow would be 0 at time 0, and the cash flow in different states at time 1 could be positive, negative, or zero, depending on news about the value of the underlying commodity. In general, we think of the negative of the initial cash flow as the price of a security. We denote by $P = \{P_1, \ldots, P_N\}$ the vector of prices of the $N$ securities $1, \ldots, N$, and we denote by $X$ the payoff matrix. We have that $P_n$ is the price we pay for one unit of security $n$ and $X_{\omega n}$ is the payoff per unit of security $n$ at time 1 in the single state of nature $\omega$. With the choice of a portfolio of assets, our choice problem might become

Problem 2: First Portfolio Choice Problem. Choose portfolio holdings $\Theta \equiv \{\Theta_1, \ldots, \Theta_N\}$ and consumptions $C \equiv \{c_0, \ldots, c_\Omega\}$ to maximize utility of consumption $U(C)$ subject to portfolio payoffs $c \equiv \{c_1, \ldots, c_\Omega\} = X\Theta$ and budget constraint $c_0 + P\Theta = W$.

Here, $\Theta$ is the vector of portfolio weights. Time 0 consumption is the numeraire, and wealth $W$ is now measured in time 0 consumption units and the entire endowment is received at time 0. In the budget constraint, the term $P\Theta$ is the cost of the portfolio holding, which is the sum across securities $n$ of the price $P_n$ times the number of shares or other units $\Theta_n$. The matrix product $X\Theta$ says that the consumption in state $\omega$ is $c_\omega = \sum_n X_{\omega n}\Theta_n$, i.e., the sum across securities $n$ of the payoff $X_{\omega n}$ of security $n$ in state $\omega$, times the number of shares or other units $\Theta_n$ of security $n$ our champion is holding. The first-order condition for Problem 2 is the existence of a vector of shadow prices $p$ and a Lagrangian multiplier $\lambda$ such that

$$\pi_\omega u'(c_\omega) = \lambda p_\omega, \tag{5}$$

where

$$P = p'X. \tag{6}$$
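As a minimal numerical sketch of conditions (5) and (6), consider a hypothetical market with two states and two securities (a bond and a stock) and an agent with time-separable log utility, $v = u = \log$. The numbers, the log-utility choice, and the helper name `solve2` are all assumptions made for illustration and do not come from the text. Because the assumed payoff matrix is square and invertible, the state prices are the unique solution of $P = p'X$, and the log-utility optimum has the closed form $c_0 = W/2$ and $c_\omega = \pi_\omega W / (2 p_\omega)$, which follows from the first-order conditions and the budget constraint with $\lambda = 2/W$.

```python
from fractions import Fraction as F

def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

# Hypothetical data: rows of X are states, columns are securities.
X = [[F(1), F(2)],        # state 1: bond pays 1, stock pays 2
     [F(1), F(1, 2)]]     # state 2: bond pays 1, stock pays 1/2
P = [F(9, 10), F(1)]      # prices of the bond and the stock
pi = [F(1, 2), F(1, 2)]   # state probabilities
W = F(100)                # wealth in time-0 consumption units

# Equation (6): P = p'X, i.e. X'p = P', solved for the state prices p.
X_transposed = [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]
p = solve2(X_transposed, P)

# Assumed log utility: 1/c0 = lam and pi_w / c_w = lam * p_w,
# so the budget constraint pins down lam = 2 / W.
lam = 2 / W
c0 = 1 / lam
c = [pi[w] / (lam * p[w]) for w in range(2)]

# Portfolio holdings that deliver c: solve X Theta = c.
theta = solve2(X, c)
cost = sum(P[n] * theta[n] for n in range(2))

print("state prices:", p)
print("consumption:", c0, c)
print("holdings:", theta, "cost:", cost)
```

Exact rational arithmetic is used so that the budget constraint $c_0 + P\Theta = W$ and condition (5) can be checked with equality rather than a tolerance.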
The first equation is the same as in the Arrow–Debreu model, with an implicit shadow price vector in place of the given Arrow–Debreu prices. The second equation is a pricing equation that says the prices of all assets must be consistent with the shadow prices of the states. For the Arrow–Debreu model itself, the state-space tableau $X$ is $I$, the identity matrix, and the price vector $P$ is $p$, the vector of Arrow–Debreu state prices. For the Arrow–Debreu model, the pricing equation determines the shadow prices as equal to the state prices.

Even if the assets are not the Arrow–Debreu securities, Problem 2 may be essentially equivalent to the Arrow–Debreu model in Problem 1. In economic terms, the important feature of the Arrow–Debreu problem is that all payoff patterns are spanned, i.e., each potential payoff pattern can be generated at some price by some portfolio of assets. Linear algebra tells us that all payoff patterns can be generated if the payoff matrix $X$ has full row rank. If $X$ has full row rank, $p$ is determined (or over-determined) by Equation (6). If $p$ is uniquely determined by the pricing equation (and therefore also all Arrow–Debreu assets can be purchased as portfolios of assets in the economy), we say that markets are complete, and for all practical purposes we are in an Arrow–Debreu world.

For the choice problem to have a solution for any agent who prefers more to less, we also need for the price of each payoff pattern to be unique (the “law of one price”) and positive, or else there would be arbitrage (i.e., a “money pump” or a “free lunch”). If there is no arbitrage, then there is at least one vector of positive state prices $p$ solving the pricing equation (6). There is an arbitrage if the vector of state prices is overdetermined or if all consistent vectors of state prices assign a negative or zero price to some state. The notion of absence of arbitrage is a central concept in finance, and we develop its implications more fully in the section on preference-free results.

So far, we have been stating portfolio problems in prices and quantities, as we would in general equilibrium theory. However, it is also common to describe assets in terms of rates of return, which are relative price changes (often expressed as percentages). The return to security $n$ is the relative change in total value (including any dividends, splits, warrant issues, coupons, stock issues, and the like as well as change in the price). There is not an absolute standard of what is meant by return; in different contexts this can be the rate of return, one plus the rate of return, or the difference between two rates of return. It is necessary to figure out which is intended by asking or from context. Using the notation above, the rate of return in state $\omega$ is $r_{\omega n} = (X_{\omega n} - P_n)/P_n$.³ Often, consumption at the outset is suppressed, and we specialize to von Neumann–Morgenstern expected utility.
In this case, we have the following common form of portfolio problem.

Problem 3: Portfolio Problem using Returns. Choose portfolio proportions $\theta \equiv \{\theta_1, \ldots, \theta_N\}$ and consumptions $c \equiv \{c_1, \ldots, c_\Omega\}$ to maximize expected utility of consumption $\sum_{\omega=1}^{\Omega} \pi_\omega u(c_\omega)$ subject to the consumption equation $c = W\theta'(1 + r)$ and the budget constraint $\theta'\mathbf{1} = 1$.

Here, $\pi = \{\pi_1, \ldots, \pi_\Omega\}$ is a vector of state probabilities, $u(\cdot)$ is the von Neumann–Morgenstern utility function, and $\mathbf{1}$ is a vector of 1’s. The dimensionality of $\mathbf{1}$ is determined implicitly from the context; here the dimensionality is the number of assets. The first-order condition for an optimum is the existence of shadow state-price density vector $\rho$ and shadow marginal utility of wealth $\lambda$ such that

$$u'(c_\omega) = \lambda \rho_\omega \tag{7}$$

and

$$\mathbf{1} = E[(1 + r)\rho]. \tag{8}$$

³ One unfortunate thing about returns is that they are not defined for contracts (like futures) that have zero price. However, this can be finessed formally by bundling a futures with a bond or other asset in defining the securities and unbundling them when interpreting the results. Bundling and unbundling does not change the underlying economics due to the linearity of consumptions and constraints in the portfolio choice problem.
These equations say that the state-price density is consistent with the marginal valuation by the agent and with pricing in the market.

As our final typical problem, let us consider a mean-variance optimization. This optimization is predicated on the assumption that investors care only about mean and variance (typically preferring more mean and less variance), so we have a utility function $V(m, v)$ in mean $m$ and variance $v$. For this problem, suppose there is a risk-free asset paying a return $r$ (although the market-level implications of mean-variance analysis can also be derived in a general model without a risk-free asset). In this case, portfolio proportions in the risky assets are unconstrained (need not sum to 1) because the slack can be taken up by the risk-free asset. We denote by $\mu$ the vector of mean risky asset returns and by $\Sigma$ the covariance matrix of risky returns. Then our champion solves the following choice problem.

Problem 4: Mean-variance optimization. Choose portfolio proportions $\theta \equiv \{\theta_1, \ldots, \theta_N\}$ to maximize the mean-variance utility function $V(r + (\mu - r\mathbf{1})'\theta,\ \theta'\Sigma\theta)$.

The first-order condition for the problem is

$$\mu - r\mathbf{1} = \lambda \Sigma \theta, \tag{9}$$

where $\theta$ is the optimal vector of portfolio proportions and $\lambda$ is twice the marginal rate of substitution $-V_v(m, v)/V_m(m, v)$ (positive when $V_v < 0 < V_m$), evaluated at $m = r + (\mu - r\mathbf{1})'\theta$ and $v = \theta'\Sigma\theta$. The first-order condition (9) says that mean excess return for each asset is proportional to that asset's marginal contribution to the volatility of the agent’s optimal portfolio.

We have seen a few of the typical types of portfolio problem. There are a lot of variations. The problem might be stated in terms of excess returns (rate of return less a risk-free rate) or total return (one plus the rate of return). Or, we might constrain portfolio holdings to be positive (no short sales) or we might require consumption to be nonnegative (limited liability). Many other variations adapt the basic portfolio problem to handle institutional features not present in a neoclassical formulation, such as transaction costs, bid–ask spreads, or taxes. These extensions are very interesting, but beyond the scope of what we are doing here, which is to explore the neoclassical foundations.
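The first-order condition of the mean-variance problem can be illustrated with a short sketch. It assumes, purely for illustration, the linear mean-variance utility $V(m, v) = m - (\gamma/2)v$, for which twice the marginal rate of substitution equals $\gamma$, and it uses made-up values for the risk-free return, mean returns, and covariance matrix of two risky assets. Under these assumptions the condition reduces to the linear system $\Sigma\theta = (\mu - r\mathbf{1})/\gamma$, and whatever is not invested in the risky assets goes to the risk-free asset. The helper `solve2` and all variable names are hypothetical.

```python
from fractions import Fraction as F

def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

# Hypothetical inputs for two risky assets.
r = F(2, 100)                          # risk-free return
mu = [F(8, 100), F(12, 100)]           # mean risky returns
Sigma = [[F(4, 100), F(1, 100)],       # covariance matrix of risky returns
         [F(1, 100), F(9, 100)]]
gamma = F(4)                           # assumed V(m, v) = m - (gamma / 2) * v

# Mean excess returns mu - r*1.
excess = [m - r for m in mu]

# First-order condition: mu - r*1 = gamma * Sigma * theta, solved for theta.
theta = solve2(Sigma, [e / gamma for e in excess])

# The slack is taken up by the risk-free asset.
riskfree_weight = 1 - sum(theta)

# Verification: gamma * Sigma * theta should reproduce the excess returns.
Sigma_theta = [sum(Sigma[i][j] * theta[j] for j in range(2)) for i in range(2)]

print("risky proportions:", theta)
print("risk-free proportion:", riskfree_weight)
print("gamma * Sigma * theta:", [gamma * s for s in Sigma_theta])
```

A larger $\gamma$ (more aversion to variance) scales every risky proportion down by the same factor, so the mix of risky assets is unchanged and only the split between risky assets and the risk-free asset moves.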
3. Absence of arbitrage and preference-free results

Before considering specific solutions and applications, let us consider some general results that are useful for thinking about portfolio choice. These results are preference-free in the sense that they do not depend on any specific assumptions about preferences but only depend on an assumption that agents prefer more to less. Central to this section is the notion of an arbitrage, which is a “money pump” or a “free lunch”. If there is arbitrage, linearity of the neoclassical problem implies that any candidate optimum can be dominated by adding the arbitrage. As a result, no agent who prefers more to less would have an optimum if there exists arbitrage. Furthermore, this seemingly weak assumption is enough to obtain two useful theorems. The Fundamental Theorem of Asset Pricing says that the following are equivalent: absence of arbitrage, existence of a consistent positive linear pricing rule, and existence of an optimum for some hypothetical agent who prefers more to less. The Pricing Rule Representation Theorem gives different equivalent forms for the consistent positive linear pricing rule, using state prices, risk-neutral probabilities (martingale valuation), state-price density (or stochastic discount factor or pricing kernel), or an abstract positive linear operator. The results in this section are from Cox and Ross (1975), Ross (1976c, 1978b) and Dybvig and Ross (1987). The results have been formalized in continuous time by Harrison and Kreps (1979) and Harrison and Pliska (1981).

Occasionally, the theorems in this section can be applied directly to obtain an interesting result. For example, linearity of the pricing rule is enough to derive put-call parity without constructing the arbitrage. More often, the results in this section help to answer conceptual questions. For example, an option pricing formula that is derived using absence of arbitrage is always consistent with equilibrium, as can be seen from the Fundamental Theorem. By the Fundamental Theorem, absence of arbitrage implies there is an optimum for some hypothetical agent who prefers more to less; we can therefore construct an equilibrium in the single-agent pure exchange economy in which this agent is endowed with the optimal holding.
By construction the equilibrium in this economy will have the desired pricing, and therefore any no-arbitrage pricing result is consistent with some equilibrium.

In this section, we will work in the context of Problem 2. An arbitrage is a change in the portfolio that makes all agents who prefer more to less better off. We make all such agents better off if we increase consumption sometime, and in some state of nature, and we never decrease consumption. By combining the two constraints in Problem 2, we can write the consumption $C$ associated with any portfolio choice $\Theta$ using the stacked matrix equation

$$C = \begin{bmatrix} W \\ 0 \end{bmatrix} + \begin{bmatrix} -P \\ X \end{bmatrix} \Theta.$$

The first row, $W - P\Theta$, is consumption at time 0, which is wealth $W$ less the cost of our portfolio. The remaining rows, $X\Theta$, give the random consumption across states at time 1. Now, when we move from the portfolio choice $\Theta$ to the portfolio choice $\Theta + \eta$, the initial wealth term cancels and the change in consumption can now be written as

$$\Delta C = \begin{bmatrix} -P \\ X \end{bmatrix} \eta.$$
614
P.H. Dybvig and S.A. Ross
This will be an arbitrage if DC is never negative and is positive in at least one component, which we will write as 4 DC > 0 or
−P h > 0. X
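The arbitrage condition above can be checked mechanically for a candidate portfolio change. The following is a minimal sketch, not part of the original text: the function names, the tolerance, and the two-asset, two-state market are hypothetical choices made for illustration.

```python
def consumption_change(P, X, eta):
    """Return the stacked change in consumption: time 0 first, then one entry per state."""
    time0 = -sum(price * h for price, h in zip(P, eta))          # -P eta
    time1 = [sum(x * h for x, h in zip(row, eta)) for row in X]  # X eta
    return [time0] + time1


def is_arbitrage(P, X, eta, tol=1e-12):
    """True if the change is never negative and is positive in at least one component."""
    dC = consumption_change(P, X, eta)
    return all(d >= -tol for d in dC) and any(d > tol for d in dC)


# Hypothetical market: asset 1 is a riskless bond, asset 2 is a stock.
X = [[1.0, 2.0],   # payoffs in state 1 (bond, stock)
     [1.0, 0.5]]   # payoffs in state 2 (bond, stock)
fair_prices = [0.9, 1.0]
cheap_stock = [0.9, 0.4]   # stock costs less than its guaranteed payoff is worth

# Candidate change: short half a bond, buy one share of stock.
eta = [-0.5, 1.0]
print(is_arbitrage(fair_prices, X, eta))
print(is_arbitrage(cheap_stock, X, eta))
```

With the second price vector the stock pays at least 0.5 in every state but costs less than half a bond, so the candidate change raises consumption today without lowering it in any state.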
Some authors describe taxonomies of different types of arbitrage, having perhaps a negative price today and zero payoff tomorrow, a zero price today and a nonnegative but not identically zero payoff tomorrow, or a negative price today and a positive payoff tomorrow. These are all examples of arbitrages that are subsumed by our general formula. The important thing is that there is an increase in consumption in some state of nature at some point of time and there is never any decrease in consumption.
4 We use the following terminology for vector inequalities: $(x \ge y) \Leftrightarrow (\forall i)(x_i \ge y_i)$, $(x > y) \Leftrightarrow ((x \ge y) \,\&\, (\exists i)(x_i > y_i))$, and $(x \gg y) \Leftrightarrow (\forall i)(x_i > y_i)$.
3.1. Fundamental theorem of asset pricing
Theorem 1: Fundamental Theorem of Asset Pricing. The following conditions on prices $P$ and payoffs $X$ are equivalent:
(i) Absence of arbitrage: $(\nexists\, \eta)\left( \begin{pmatrix} -P \\ X \end{pmatrix} \eta > 0 \right)$.
(ii) Existence of a consistent positive linear pricing rule (positive state prices): $(\exists\, p \gg 0)(P = p'X)$.
(iii) Some agent with strictly increasing preferences $U$ has an optimum in Problem 2.
Proof: We prove the equivalence by showing (i) ⇒ (ii), (ii) ⇒ (iii), and (iii) ⇒ (i).
(i) ⇒ (ii): This is the most subtle part, and it follows from a separation theorem or the duality theorem from linear programming. From the definition of absence of arbitrage, we have that the sets
\[ S_1 \equiv \left\{ \begin{pmatrix} -P \\ X \end{pmatrix} \eta \;\middle|\; \eta \in \mathbb{R}^n \right\} \quad \text{and} \quad S_2 \equiv \left\{ x \in \mathbb{R}^{\Omega+1} \;\middle|\; x > 0 \right\} \]
must be disjoint. Therefore, there is a separating hyperplane $z$ such that $z'x = 0$ for all $x \in S_1$ and $z'x > 0$ for all $x \in S_2$. [See Karlin (1959), Theorem B3.5] Normalizing so that the first component (the shadow price of time zero consumption) is 1, we will see that $p$ defined by $(1 \;\; p') = z'/z_0$ is the consistent linear pricing rule we seek. Constancy of $z'x$ for $x \in S_1$ implies that $(1 \;\; p') \begin{pmatrix} -P \\ X \end{pmatrix} = 0$, which is to say that $P = p'X$, i.e., $p'X$ is a consistent linear pricing rule. Furthermore, $z'x$ positive for $x \in S_2$ implies $z \gg 0$ and consequently $p \gg 0$, and $p$ is indeed the desired consistent positive linear pricing rule.
(ii) ⇒ (iii): This part is proven by construction. Let $U(C) = (1 \;\; p')\, C$, then $\Theta = 0$ solves Problem 2. To see this, note that the objective function $U(C)$ is constant and equal to $W$ for all $\Theta$:
\[ U(C) = (1 \;\; p')\, C = (1 \;\; p') \left[ \begin{pmatrix} W \\ 0 \end{pmatrix} + \begin{pmatrix} -P \\ X \end{pmatrix} \Theta \right] = W + (-P + p'X)\, \Theta = W. \]
(The motivation of this construction is the observation that the existence of the consistent linear pricing rule with state prices $p$ implies that all feasible consumptions satisfy $(1 \;\; p')\, C = W$.)
(iii) ⇒ (i): This part is obvious, since any candidate optimum is dominated by adding the arbitrage, and therefore there can be no arbitrage if there is an optimum. More formally, adding an arbitrage implies the change of consumption $\Delta C > 0$, which implies an increase in $U(C)$.
One feature of the proof that may seem strange is the degeneracy (linearity) of the utility function whose existence is constructed. This was all that was needed for this proof, but the utility function could also be constructed to be strictly concave, additively separable over time, and of the von Neumann–Morgenstern class for given probabilities. Assuming any of these restrictions on the class would make some parts of the theorem weaker [(iii) implies (i) and (ii)] at the same time that it makes other parts stronger [(i) or (ii) implies (iii)]. The point is that the theorem is still true if (iii) is replaced by a much more restrictive class that imposes on $U$ any or all of strict concavity, some order of differentiability, additive separability over time, and a von Neumann–Morgenstern form with or without specifying the probabilities in advance. All of these classes are restrictive enough to rule out arbitrage, and general enough to contain a utility function that admits an optimum when there is no arbitrage.
The statement and proof of the theorem are a little more subtle if the state space is infinite-dimensional. The separation theorem is topological in nature, so we must restrict our attention to a topologically relevant subset of the nonnegative random variables.
Also, we may lose the separating hyperplane theorem because the interior of the positive orthant is empty in most of these spaces (unless we use the sup-norm topology, in which case the dual is very large and includes dual vectors that do not support state prices). However, with some definition of arbitrage in limits, the economic content of the Fundamental Theorem can be maintained.
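In a finite complete market the link between conditions (i) and (ii) of the Fundamental Theorem can be seen numerically by solving $P = p'X$ for the state prices and checking their sign. The sketch below assumes a hypothetical market with two states and two assets (so the system is 2×2), and also checks the construction used in (ii) ⇒ (iii), namely that the linear utility $(1 \;\; p')C$ equals $W$ for every portfolio. All names and numbers are illustrative.

```python
def state_prices_2x2(P, X):
    """Solve P = p'X for p = (p1, p2) with two states (rows of X) and two assets (columns)."""
    (a, b), (c, d) = X
    det = a * d - b * c
    if det == 0:
        raise ValueError("payoff matrix is singular; state prices are not pinned down")
    # P[0] = p1*a + p2*c and P[1] = p1*b + p2*d, solved by Cramer's rule.
    p1 = (P[0] * d - P[1] * c) / det
    p2 = (a * P[1] - b * P[0]) / det
    return [p1, p2]


def linear_utility(W, P, X, p, theta):
    """U(C) = c0 + p'c1, where C is the consumption generated by portfolio theta."""
    c0 = W - sum(price * t for price, t in zip(P, theta))
    c1 = [sum(x * t for x, t in zip(row, theta)) for row in X]
    return c0 + sum(pw * cw for pw, cw in zip(p, c1))


X = [[1.0, 2.0],   # state 1 payoffs (bond, stock)
     [1.0, 0.5]]   # state 2 payoffs (bond, stock)

p_fair = state_prices_2x2([0.9, 1.0], X)   # both positive: no arbitrage
p_bad = state_prices_2x2([0.9, 0.4], X)    # one negative: an arbitrage exists

print(p_fair, p_bad)
print(linear_utility(10.0, [0.9, 1.0], X, p_fair, [3.0, -2.0]))
```

Because the hypothetical market is complete, the state prices are unique, so a negative entry rules out any positive pricing rule; in an incomplete market one would instead need to search over all solutions of $P = p'X$.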
3.2. Pricing rule representation theorem
Depending on the context, there are different useful ways of representing the pricing rule. For some abstract applications (like proving put–call parity), it is easiest to use a general abstract representation as a linear operator $L(c)$ such that $c > 0 \Rightarrow L(c) > 0$. For asset pricing applications, it is often useful to use either the state-price representation we used in the Fundamental Theorem, $L(c) = \sum_\omega p_\omega c_\omega$, or risk-neutral probabilities, $L(c) = (1 + r^*)^{-1} E^*[c_\omega] = (1 + r^*)^{-1} \sum_\omega \pi^*_\omega c_\omega$. The intuition behind the risk-neutral representation (or martingale representation⁵) is that the price is the expected discounted value computed using a shadow risk-free rate (equal to the actual risk-free rate if there is one) and artificial risk-neutral probabilities $\pi^*$ that assign positive probability to the same states as do the true probabilities. Risk-neutral pricing says that all investments are fair gambles once we have adjusted for time preference by discounting and for risk preference by adjusting the probabilities. The final representation uses the state-price density (or stochastic discount factor) $\rho$ to write $L(c) = E[\rho_\omega c_\omega] = \sum_\omega \pi_\omega \rho_\omega c_\omega$, where $\pi_\omega$ is the true probability of state $\omega$. The state-price density simplifies first-order conditions of portfolio choice problems because the state-price density measures priced scarcity of consumption. The state-price density is also handy for continuous-state models in which individual states have zero state probabilities and state prices but there exists a well-defined positive ratio of the two.
Theorem 2: Pricing Rule Representation Theorem.
The consistent positive linear pricing rule can be represented equivalently using
(i) an abstract linear functional $L(c)$ that is positive: $(c > 0) \Rightarrow (L(c) > 0)$
(ii) positive state prices $p \gg 0$: $L(c) = \sum_{\omega=1}^{\Omega} p_\omega c_\omega$
(iii) positive risk-neutral probabilities $\pi^* \gg 0$ summing to 1 with associated shadow risk-free rate $r^*$: $L(c) = (1 + r^*)^{-1} E^*[c_\omega] \equiv (1 + r^*)^{-1} \sum_\omega \pi^*_\omega c_\omega$
(iv) positive state-price densities $\rho \gg 0$: $L(c) = E[\rho c] \equiv \sum_\omega \pi_\omega \rho_\omega c_\omega$.
Proof: (i) ⇒ (ii): This is the known form of a linear operator in $\mathbb{R}^\Omega$; $p \gg 0$ follows from the positivity of $L$.
(ii) ⇒ (iii): Note first that the shadow risk-free rate must price the riskless asset $c = 1$:
\[ \sum_{\omega=1}^{\Omega} p_\omega \cdot 1 = (1 + r^*)^{-1} E^*[1], \]
which implies (since $E^*[1] = 1$) that $r^* = 1/(\mathbf{1}'p) - 1$. Then, matching coefficients in
\[ \sum_{\omega=1}^{\Omega} p_\omega c_\omega = (1 + r^*)^{-1} \sum_{\omega} \pi^*_\omega c_\omega, \]
we have that $\pi^* = p/(\mathbf{1}'p)$, which sums to 1 as required and inherits positivity from $p$.
(iii) ⇒ (iv): Simply let $\rho_\omega = (1 + r^*)^{-1} \pi^*_\omega / \pi_\omega$ (which is the same as $p_\omega / \pi_\omega$).
(iv) ⇒ (i): immediate.
5 The reason for the term “martingale representation” is that using the risk-neutral probabilities makes the discounted price process a martingale, which is a stochastic process that does not increase or decrease on average.
Perhaps what is most remarkable about the Fundamental Theorem and the Representation Theorem is that neither probabilities nor preferences appear in the determination of the pricing operator, beyond the initial identification of which states have nonzero probability and the assumption that more is preferred to less. It is this observation that empowers the theory of derivative asset pricing, and is, for example, the reason why the Black–Scholes option price does not depend on the mean return on the underlying stock. Preferences and beliefs are, however, in the background: in equilibrium, they would influence the price vector $P$ and/or the payoff matrix $X$ (or the mean return process for the Black–Scholes stock).
Although the focus of this chapter is on the single-period model, we should note that the various representations have natural multiperiod extensions. The abstract linear functional and state prices have essentially the same form, noting that cash flows now extend across time as well as states of nature and that there are also conditional versions of the formula at each date and contingency. In some models, the information set is generated by the sample path of security prices; in this case the state of nature is a sample path through the tree of potential security prices. For the state-price density in multiple periods, there is in general a state-price-density process $\{\rho_t\}$ whose relatives can be used for valuation. For example, the value at time $s$ of receiving subsequent cash flows $c_{s+1}, c_{s+2}, \ldots, c_t$ is given by
\[ \sum_{\tau=s+1}^{t} E_s\left[ \frac{\rho_\tau}{\rho_s}\, c_\tau \right], \tag{10} \]
where $E_s[\cdot]$ denotes expectation conditional on information available at time $s$. Basically, this follows from iterated expectations and defining $\rho_t$ as a cumulative product of single-period $\rho$'s. Similarly, we can write risk-neutral valuation as
\[ P_s = E_s^*\left[ \frac{P_t}{(1 + r_s^*)(1 + r_{s+1}^*) \cdots (1 + r_t^*)} \right]. \tag{11} \]
Note that unless the risk-free rate is nonrandom, we cannot take the discount factors out of the expectation.⁶ This is because of the way that the law of iterated expectations works. For example, consider the value $V_0$ at time 0 of the cash flow in time 2.
\[ V_0 = (1 + r_1^*)^{-1} E_0^*[V_1] = (1 + r_1^*)^{-1} E_0^*\left[(1 + r_2^*)^{-1} E_1^*[c_2]\right] = (1 + r_1^*)^{-1} E_0^*\left[(1 + r_2^*)^{-1} c_2\right]. \tag{12} \]
6 It would be possible to treat the whole time period from s to t as a single period and apply the pricing result to that large period in which case the discounting would be at the appropriate (t − s)-period rate. The problem with this is that the risk-neutral probabilities would be different for each pair of dates, which is unnecessarily cumbersome.
Now, $(1 + r_1^*)^{-1}$ is outside the expectation (as could be $(1 + r_{s+1}^*)^{-1}$ in Equation (11)), but $(1 + r_2^*)^{-1}$ cannot come outside the expectation unless it is nonrandom.⁷ So, it is best to remember that when interest rates are stochastic, discounting for risk-neutral valuation should use the rolled-over spot rate, within the expectation.
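The equivalences in the Pricing Rule Representation Theorem, and the warning about stochastic interest rates, can be illustrated with a small numerical sketch. Everything below is hypothetical: the three-state price and probability vectors, the two-node interest-rate tree, and the function names are chosen only for illustration. The first part converts state prices into a shadow risk-free rate, risk-neutral probabilities, and a state-price density, and prices one cash flow three ways; the second part compares discounting inside the expectation with the incorrect shortcut of pulling the expected discount factor outside.

```python
def representations(p, pi):
    """From state prices p and true probabilities pi, return (r_star, pi_star, rho)."""
    total = sum(p)                              # price of one unit in every state
    r_star = 1.0 / total - 1.0                  # shadow risk-free rate
    pi_star = [pw / total for pw in p]          # risk-neutral probabilities
    rho = [pw / q for pw, q in zip(p, pi)]      # state-price density
    return r_star, pi_star, rho


def price_three_ways(c, p, pi):
    """Price cash flow c with state prices, risk-neutral probabilities, and the density."""
    r_star, pi_star, rho = representations(p, pi)
    by_state_prices = sum(pw * cw for pw, cw in zip(p, c))
    by_risk_neutral = sum(q * cw for q, cw in zip(pi_star, c)) / (1.0 + r_star)
    by_density = sum(q * r * cw for q, r, cw in zip(pi, rho, c))
    return by_state_prices, by_risk_neutral, by_density


def value_rolled_over(r1, nodes):
    """Discount at the spot rate inside the expectation; nodes are (prob, r2, c2) at time 1."""
    inside = sum(q * c2 / (1.0 + r2) for q, r2, c2 in nodes)
    return inside / (1.0 + r1)


def value_shortcut(r1, nodes):
    """Shortcut that multiplies the expected discount factor by the expected cash flow."""
    exp_discount = sum(q / (1.0 + r2) for q, r2, _ in nodes)
    exp_cash = sum(q * c2 for q, _, c2 in nodes)
    return exp_discount * exp_cash / (1.0 + r1)


p = [0.3, 0.2, 0.4]
pi = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
prices = price_three_ways([1.0, 2.0, 2.0], p, pi)

# Risk-neutral tree: the time-2 cash flow is high exactly when the time-1 rate is high.
nodes = [(0.5, 0.02, 100.0), (0.5, 0.10, 120.0)]
print(prices)
print(value_rolled_over(0.05, nodes), value_shortcut(0.05, nodes))
```

The three prices agree, as the theorem requires. In the tree, the shortcut overstates the value because the cash flow is large in the node where the discount factor is small; the two calculations coincide only when the cash flow and the discount factor are uncorrelated under the risk-neutral probabilities.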
4. Various analyses: Arrow–Debreu world
The portfolio problem is the starting point of many types of analysis in finance. Here are some implications that can be drawn from portfolio problems (usually through the first-order conditions):
• optimal portfolio choice (asset allocation or stock selection)
• portfolio efficiency
• aggregation and market-level implications
• asset pricing and performance measurement
• payoff distribution pricing
• recovery or estimation of preferences
• inference of expectations
We can think of many of these distinctions as a question of what we are solving for when we look at the first-order conditions. In optimal portfolio choice and its aggregation, we are solving for the portfolio choice given the preferences and beliefs about returns. In asset pricing, we are computing the prices (or restrictions on expected returns) given preferences, beliefs about payoffs, and the optimal choice (which is itself often derived using an aggregation result). In recovery, we derive preferences from beliefs and idealized observations about portfolio choice, e.g. at all wealth levels. Estimation of preferences is similar, but works with noisy observations of demand at a finite set of data points and uses a restriction in the functional form or smoothing in the statistical procedure to identify preferences. And, inference of expectations derives probability beliefs from preferences, prices, and the (observed) optimal demand.
In this section, we illustrate the various analyses in the case of an Arrow–Debreu world. Analysis of the complete-markets model has been developed by many people over a period of time. Some of the more important works include some of the original work on competitive equilibrium such as Arrow and Debreu (1954), Debreu (1959) and Arrow and Hahn (1971), as well as some early work specific to security markets such as Arrow (1964), Rubinstein (1976), Ross (1976b), Banz and Miller (1978) and Breeden and Litzenberger (1978).
There are also many papers set in multiple periods that contributed to the finance of complete markets; although not strictly within the scope of this chapter, we mention just a few here: Black and Scholes (1973), Merton (1971, 1973), Cox, Ross and Rubinstein (1979) and Breeden (1979).
7 In the special case in which $c_2$ is uncorrelated with $(1 + r_2^*)^{-1}$ (or in multiple periods if cash flows are all independent of shadow interest rate moves), we can take the expected discount factor outside the expectation. In this case, we can use the multiperiod risk-free discount bond rate for discounting a simple expected final cash flow. However, in general, it is best to remember the general formula (11) with the rates in the denominator inside the expectation.
4.1. Optimal portfolio choice
The optimal portfolio choice is the choice of consumptions $(c_0, c_1, \ldots, c_\Omega)$ and Lagrange multiplier $\lambda$ to solve the budget constraint (1) and the first-order conditions (2) and (3). If the inverse $I(\cdot)$ of $u'(\cdot)$ and the inverse $J(\cdot)$ of $v'(\cdot)$ are both known analytically, then finding the optimum can be done using a one-dimensional monotone search for $\lambda$ such that $J(\lambda) + \sum_{\omega=1}^{\Omega} p_\omega I(\lambda p_\omega / \pi_\omega) = W$. In some special cases, we can solve the optimization analytically. For logarithmic utility, $v(c) = \log(c)$ and $u(c) = \delta \log(c)$ for some $\delta > 0$, optimal consumption is given by $c_0 = W/(1 + \delta)$ and $c_\omega = \pi_\omega W \delta / ((1 + \delta)\, p_\omega)$ (for $\omega = 1, \ldots, \Omega$). The portfolio choice can also be solved analytically for quadratic utility.
4.2. Efficient portfolios
Efficient portfolios are the ones that are chosen by some agent in a given class of utility functions. For the Arrow–Debreu problem, we might take the class of utility functions to be the class of differentiable, increasing and strictly concave time-separable von Neumann–Morgenstern utility functions $U(c) = v(c_0) + \sum_{\omega=1}^{\Omega} \pi_\omega u(c_\omega)$.⁸ Since $u(\cdot)$ is increasing and strictly concave, $(c_\omega > c_{\omega'}) \Leftrightarrow (u'(c_\omega) < u'(c_{\omega'}))$. Consequently, the first-order condition (4) implies that $(c_\omega > c_{\omega'}) \Leftrightarrow (\rho_\omega < \rho_{\omega'})$. Since the state-price density $\rho_\omega \equiv p_\omega / \pi_\omega$ is a measure of priced social scarcity in state $\omega$, this says that we consume less in states in which consumption is more expensive. This necessary condition for efficiency is also sufficient; if consumption reverses the order across states of the state-price density, then it is easy to construct a utility function that satisfies the first-order conditions. Formally,
Theorem 3: Arrow–Debreu Portfolio Efficiency.
Consider a complete-markets world (in which agents solve Problem 1) in which state prices and probabilities are all strictly positive, and let $\mathcal{U}$ be the class of differentiable, increasing and strictly concave time-separable von Neumann–Morgenstern utility functions of the form $U(c) = v(c_0) + \sum_{\omega=1}^{\Omega} \pi_\omega u(c_\omega)$. Then there exists a utility function in the class $\mathcal{U}$ that chooses the consumption vector $c$ satisfying the budget constraint if and only if consumptions at time 1 are in the opposite order as the state-price densities, i.e., $(\forall \omega, \omega' \in \{1, \ldots, \Omega\})\,((c_\omega > c_{\omega'}) \Leftrightarrow (\rho_\omega < \rho_{\omega'}))$.
8 A non-time-separable version would be of the form $U(c) = \sum_{\omega=1}^{\Omega} \pi_\omega u(c_0, c_\omega)$.
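The logarithmic solution of Section 4.1, the one-dimensional search for the multiplier, and the ordering condition of Theorem 3 can be checked together in a short sketch. It assumes that the budget constraint (1) has the form $c_0 + \sum_\omega p_\omega c_\omega = W$ and that the first-order conditions are $v'(c_0) = \lambda$ and $\pi_\omega u'(c_\omega) = \lambda p_\omega$; those equations are not restated in this section, so this form is an assumption. The bisection routine, its bounds, and the numerical inputs are likewise illustrative choices.

```python
import math


def log_utility_optimum(W, delta, p, pi):
    """Closed-form optimum for v(c) = log(c), u(c) = delta * log(c)."""
    c0 = W / (1.0 + delta)
    c1 = [q * W * delta / ((1.0 + delta) * pw) for pw, q in zip(p, pi)]
    return c0, c1


def solve_lambda(W, p, pi, I, J, lo=1e-9, hi=1e9, iters=200):
    """Bisection on a log scale for lambda with J(lam) + sum_w p_w I(lam p_w / pi_w) = W."""
    def spending(lam):
        return J(lam) + sum(pw * I(lam * pw / q) for pw, q in zip(p, pi))

    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if spending(mid) > W:   # spending falls as lambda rises, so lambda is too low
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def reverses_density_order(c1, p, pi):
    """Theorem 3 condition: c_w > c_w' exactly when rho_w < rho_w'."""
    rho = [pw / q for pw, q in zip(p, pi)]
    n = len(c1)
    return all((c1[i] > c1[j]) == (rho[i] < rho[j]) for i in range(n) for j in range(n))


W, delta = 10.0, 0.9
p = [0.3, 0.2, 0.4]
pi = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]

c0, c1 = log_utility_optimum(W, delta, p, pi)
# For log utility the inverse marginal utilities are J(y) = 1/y and I(y) = delta/y.
lam = solve_lambda(W, p, pi, I=lambda y: delta / y, J=lambda y: 1.0 / y)

print(c0, c1, lam)
print(reverses_density_order(c1, p, pi))
print(reverses_density_order([1.0, 2.0, 2.0], p, pi))
```

The search uses the geometric midpoint because the multiplier can range over many orders of magnitude; any monotone root finder would serve equally well. The last line shows a consumption pattern that fails the ordering condition and therefore would not be chosen by any utility function in the class.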
Proof: The “only if” part follows directly from the first-order condition and concavity as noted in the paragraph above. For the “if” part, we are given a consumption vector with the appropriate ordering and we will construct a utility function that will choose it and satisfy the first-order condition with $\lambda = 1$. For this, choose $v(c) = -\exp(-(c - c_0))$ (so that $v'(c_0) = 1$ as required by Equation (2)), and choose $u'(c)$ to be any strictly positive and strictly decreasing function satisfying $u'(c_\omega) = \rho_\omega$ for all $\omega \in \{1, 2, \ldots, \Omega\}$, for example, by “connecting the dots” (with appropriate treatment past the endpoints) in the graph of $\rho_\omega$ as a function of $c_\omega$. Integrating this function yields a utility function $u(\cdot)$ such that the von Neumann–Morgenstern utility function satisfies the first-order conditions, and by concavity this first-order solution is a solution.
Friendly warning. There are many notions of efficiency in finance: Pareto efficiency, informational efficiency, and the portfolio efficiency we have mentioned are three leading examples. A common mistake in heuristic arguments is to assume incorrectly that one sense of efficiency necessarily implies another.
4.3. Aggregation
Aggregation results typically show what features of individual portfolio choice are preserved at the market level. Many asset pricing results follow from aggregation and the first-order conditions. The most common type of aggregation result is the efficiency of the market portfolio. For most classes of preferences we consider, the efficient set is unchanged by rescaling wealth, and consequently the market portfolio is always efficient if and only if the efficient set is convex. This is because the market portfolio is a rescaled version of the individual portfolios. (If the portfolios are written in terms of proportions, no rescaling is needed.)
When the market portfolio is efficient, then we can invert the first-order condition for the hypothetical agent who holds the market portfolio to obtain the pricing rule. In the Arrow–Debreu world, the market portfolio is always efficient. This is because the ordering across states is preserved when we sum individual portfolio choices to form the market portfolio. Consider agents $m = 1, \ldots, M$ with felicity functions $v^1(\cdot), \ldots, v^M(\cdot)$ and $u^1(\cdot), \ldots, u^M(\cdot)$ and optimal consumptions $C^{1*}, \ldots, C^{M*}$. The following results are close relatives of standard results in general equilibrium theory.
Theorem 4: Aggregation Theorem. In a pure exchange equilibrium in a complete market,
(i) all agents order time 1 consumption in the same order across states,
(ii) aggregate time 1 consumption is in the same order across states,
(iii) equilibrium is Pareto optimal, and
(iv) there is a time-separable von Neumann–Morgenstern utility function that would optimally choose aggregate consumption.
Proof: (i) and (ii) Immediate, given Theorem 3.
(iii) Let $\lambda^m$ be the Lagrangian multiplier at the optimum in the first-order condition in agent $m$'s decision problem. Consider the problem of maximizing, over feasible allocations of aggregate consumption, the linear social welfare function with weights $1/\lambda^m$, namely
\[ \sum_{m=1}^{M} \frac{1}{\lambda^m} \left[ v^m(c_0^m) + \sum_{\omega=1}^{\Omega} \pi_\omega u^m(c_\omega^m) \right]. \]
It is easy to verify from the first-order conditions that the equilibrium consumptions solve this problem too: dividing agent $m$'s first-order conditions by $\lambda^m$ gives the same marginal conditions for every agent, with shadow prices equal to the state prices. This is a concave optimization, so the first-order conditions are sufficient, and since the welfare weights are positive the solution must be Pareto optimal (or else a Pareto improvement would increase the objective function).
(iv) Define $v_A(c) \equiv \max_{\{c^m\}} \sum_{m=1}^{M} \frac{1}{\lambda^m} v^m(c^m)$ to be the first-period aggregate felicity function and define $u_A(c) \equiv \max_{\{c^m\}} \sum_{m=1}^{M} \frac{1}{\lambda^m} u^m(c^m)$ to be the second-period aggregate felicity function, where in each case the maximum is over individual consumptions $c^m$ that sum to $c$. Then the utility function $v_A(c_0) + E[u_A(c_\omega)]$ is a time-separable von Neumann–Morgenstern utility function that would choose the market's aggregate consumption, since the objective function is the same as for the social welfare problem described under the proof of (iii).
There is a different perspective that gives an alternative proof of the existence of a representative agent (iv). The existence of a representative agent follows from the convexity of the set of efficient portfolios derived earlier. The main condition we require to have this work is that the efficient set of portfolio proportions is the same at all wealth levels, which is true here and typically of the cases we consider.
4.4. Asset pricing
Asset pricing gets its name from valuation of cash flows, although asset pricing formulas may be expressed in several different ways, for example as a formula explaining expected returns across assets or as a moment condition satisfied by returns that can be tested econometrically. Let $v_A(\cdot)$ and $u_A(\cdot)$ represent the preferences of the hypothetical agent who holds aggregate consumption, as guaranteed by the Aggregation Theorem (Theorem 4). Then we can solve the first-order conditions (2) and (3) to compute $p_\omega = \pi_\omega u_A'(c_\omega^A) / v_A'(c_0^A)$ and therefore the time-0 valuation of the time-1 cash flow vector $\{c_1, \ldots, c_\Omega\}$ is
\[ L(c_1, \ldots, c_\Omega) = \sum_{\omega=1}^{\Omega} \pi_\omega \frac{u_A'(c_\omega^A)}{v_A'(c_0^A)}\, c_\omega = E\left[ \frac{u_A'(c_\omega^A)}{v_A'(c_0^A)}\, c_\omega \right]. \tag{13} \]
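This formula (with state-price density $\rho_\omega = u_A'(c_\omega^A) / v_A'(c_0^A)$) is the right one for pricing assets, but asset pricing equations are more often expressed as explanations of mean returns across assets or as moment conditions satisfied by returns. Define the rate of return (the relative value change) for some asset as $r_\omega \equiv (c_\omega - P)/P$, where $c_\omega$ is the asset value in state $\omega$ and $P$ is the asset's price. Let $r_f$ be the risk-free rate of return (or the riskless interest rate), which must satisfy
\[ 1 + r_f = \frac{1}{E\left[ u_A'(c_\omega^A) / v_A'(c_0^A) \right]}. \tag{14} \]
Applying Equation (13) to the asset gives $1 = E[\rho_\omega (1 + r_\omega)] = E[\rho_\omega]\,(1 + E[r_\omega]) + \mathrm{cov}(\rho_\omega, r_\omega)$, so the expected return is lower the more the return covaries with the state-price density, and we have that Equation (13) implies
\[ E[r_\omega] = r_f - (1 + r_f)\, \mathrm{cov}\left( \frac{u_A'(c_\omega^A)}{v_A'(c_0^A)},\, r_\omega \right), \tag{15} \]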
so that the risk premium (the excess of expected return over the risk-free rate) is proportional to the covariance of return with the state-price density. This is the representation of asset pricing in terms of expected returns, and is also the so-called consumption-capital asset pricing model (CCAPM) that is more commonly studied in a multiperiod setting. Either of the pricing relations could be used as moment conditions in an asset-pricing test, but it is more common to use the moment condition
\[ 1 = E\left[ \frac{u_A'(c_\omega^A)}{v_A'(c_0^A)}\, (1 + r_\omega) \right] \tag{16} \]
to test the CCAPM. These same equations characterize pricing for just about all the pricing models (perhaps with optimal consumption for some agent in place of aggregate consumption). Recall that the first-order conditions are just about the same whether markets are complete or incomplete. The main difference is that the state prices are shadow prices (Lagrangian multipliers) when markets are incomplete, but actual asset prices in complete markets. Either way, the first-order conditions are consistent with the same asset pricing equations.
4.5. Payoff distribution pricing
For von Neumann–Morgenstern preferences (expected utility theory) and more general Machina preferences, preferences depend only on distributions of returns and payoffs and do not depend on the specific states in which those returns are realized. Consider a simple example with three equally probable states, $\pi_1 = \pi_2 = \pi_3 = 1/3$. Suppose that an individual has to choose one of the following payoff vectors for consumption at time 1: $c^a = (1, 2, 2)$, $c^b = (2, 1, 2)$, and $c^c = (2, 2, 1)$. These three consumption patterns have the same distribution of consumption, giving consumption of 1 with probability 1/3 and consumption of 2 with probability 2/3. Therefore, an agent with von Neumann–Morgenstern preferences or more general Machina preferences
would find all these consumption vectors equally attractive. However, they do not all cost the same unless the state-price density (and in the example, the state price) is the same in all states. Having the state-price density the same in all states is a risk-neutral world, in which all consumption bundles are priced at their expected value. Such a world is not very interesting, since all risk-averse agents would choose a riskless investment.⁹ In general, we expect the state-price density to be highest in states of social scarcity, when the market is down or the economy is in recession, since buying consumption in states of scarcity is a form of insurance. Suppose that the state-price vector is $p = (.3, .2, .4)$. Then the prices of the bundles can be computed as $p'c^a = .3 \times 1 + .2 \times 2 + .4 \times 2 = 1.5$, $p'c^b = .3 \times 2 + .2 \times 1 + .4 \times 2 = 1.6$, and $p'c^c = .3 \times 2 + .2 \times 2 + .4 \times 1 = 1.4$. The cheapest consumption pattern is $c^c$, which places the larger consumption in the cheap states and the smallest consumption in the most expensive state. This gives us a very useful cash-value measure of the inefficiency of the other strategies. An agent will save $1.5 - 1.4 = 0.1$ cash up front by choosing $c^c$ instead of $c^a$, or $1.6 - 1.4 = 0.2$ by choosing $c^c$ instead of $c^b$. Therefore, we can interpret 0.1 as a lower bound on the amount of inefficiency in $c^a$, since any agent would pay that amount to swap to $c^c$ and perhaps more to swap to something better. The only assumption we need for this result is that the agent has preferences (such as von Neumann–Morgenstern preferences or Machina preferences) that care only about the distribution of consumption and not the identity of the particular states in which different parts of the distribution are realized. The general result is based on the “deep theoretical insight” that you should “buy more when it is cheaper”. This means that efficient consumption is decreasing in the state-price density.
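The arithmetic of the three-state example can be reproduced in a few lines. This is only a sketch for the equally probable case used in the example, where the cheapest way to deliver a given distribution is to sort consumption against the state prices; the function names are illustrative.

```python
def cost(p, c):
    """Price of consumption pattern c at state prices p."""
    return sum(pw * cw for pw, cw in zip(p, c))


def cheapest_cost_equal_probs(p, c):
    """Cheapest price of c's distribution when all states are equally probable."""
    cheap_first = sorted(p)                # cheapest state first
    big_first = sorted(c, reverse=True)    # largest consumption first
    return sum(pw * cw for pw, cw in zip(cheap_first, big_first))


p = [0.3, 0.2, 0.4]
bundles = {"a": [1, 2, 2], "b": [2, 1, 2], "c": [2, 2, 1]}

for name, c in bundles.items():
    price = cost(p, c)
    shortfall = price - cheapest_cost_equal_probs(p, c)
    print(name, round(price, 10), round(shortfall, 10))
```

The shortfall printed for each bundle is the cash-value lower bound on its inefficiency described above; it is zero only for the pattern that puts the smallest consumption in the most expensive state.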
We can compute the (lower bound on the) inefficiency of the portfolio by reordering its consumption in reverse order of the state-price density and computing the decline in cost. The payoff distributional price of a consumption pattern is the price of getting the same distribution the cheapest possible way (in reverse order of the state-price density). There is a nice general formula for the distributional price. Let $F_c(\cdot)$ be the cumulative distribution function of consumption and let $i_c(\cdot)$ be its inverse. Similarly, let $F_\rho(\cdot)$ be the cumulative distribution function of the state-price density and let $i_\rho(\cdot)$ be its inverse. Let $c^*$ be the efficient consumption pattern with distribution function $F_c(\cdot)$. Then the distributional price of the consumption pattern can be written as
\[ E[c^* \rho] = \int_{z=0}^{1} i_c(z)\, i_\rho(1 - z)\, dz. \tag{17} \]
In this expression, $z$ has units of probability and labels the states ordered in reverse of the state-price density, $i_\rho(1 - z)$ is the state-price density in state $z$, and $i_c(z)$ is the optimal consumption $c^*$ in state $z$. This formula is simplest to understand for a continuous state space, but also makes sense for finitely many equally-probable states as in the example, provided we define the inverse distribution function at mass points in the natural way.
The original analysis of Payoff Distribution pricing for complete frictionless markets was presented by Dybvig (1988a,b). Payoff Distribution pricing can also be used in a model with incomplete markets or frictions, as developed by Jouini and Kallal (2001), but that analysis is beyond the scope of this chapter.
9 This is different from there existing a change of probability that gives risk-neutral pricing. In a risk-neutral world, the actual probabilities are also risk-neutral probabilities.
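Formula (17) can be evaluated exactly for a finite state space by walking through the two inverse distribution functions as step functions. The sketch below is one possible discretization and is not taken from the original analysis; the function name, the tolerance, and the second test case are assumptions. For equally probable states it reproduces the cheapest rearrangement; when probabilities differ across states, the exact pairing may not be attainable with the given states, so the result is best read as a lower bound on cost.

```python
def distributional_price(c, rho, pi, tol=1e-15):
    """Integrate i_c(z) * i_rho(1 - z) over z in [0, 1] for discrete distributions."""
    cons = sorted(zip(c, pi))                    # consumption, lowest first
    dens = sorted(zip(rho, pi), reverse=True)    # state-price density, highest first
    i = j = 0
    left_c, left_r = cons[0][1], dens[0][1]      # probability mass left in current steps
    total = 0.0
    while i < len(cons) and j < len(dens):
        step = min(left_c, left_r)               # width of the next interval of z
        total += cons[i][0] * dens[j][0] * step
        left_c -= step
        left_r -= step
        if left_c <= tol:
            i += 1
            if i < len(cons):
                left_c = cons[i][1]
        if left_r <= tol:
            j += 1
            if j < len(dens):
                left_r = dens[j][1]
    return total


# The three-state example: equal probabilities, state prices (.3, .2, .4).
pi = [1.0 / 3.0] * 3
p = [0.3, 0.2, 0.4]
rho = [pw / q for pw, q in zip(p, pi)]
print(distributional_price([1, 2, 2], rho, pi))

# A hypothetical case with unequal probabilities.
pi2 = [0.5, 0.25, 0.25]
rho2 = [0.8, 1.0, 1.2]
c2 = [1.0, 3.0, 2.0]
actual = sum(q * r * x for q, r, x in zip(pi2, rho2, c2))
print(actual, distributional_price(c2, rho2, pi2))
```

The first call returns the distributional price of the example bundles, and the second pair of numbers compares the actual cost of a pattern with the value of the integral for its distribution.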
5. Capital asset pricing model (CAPM)
The Capital Asset Pricing Model (CAPM) is an asset-pricing model based on equilibrium with agents having mean-variance preferences (as in Problem 4). It is based on the mean-variance analysis pioneered by Markowitz (1952, 1959) and Tobin (1958), and was extended to an equilibrium model by Sharpe (1964) and Lintner (1965). Even though there are many more modern pricing models, the CAPM is still the most important. This model gives us most of our basic intuitions about the trade-off between risk and return, about how market risk is priced, and about how idiosyncratic risk is not priced. The CAPM is also widely used in practice, not only in the derivation of optimal portfolios but also in the ex post assessment of performance. Sometimes people still refer to mean-variance analysis by the term Modern Portfolio Theory without intending a joke, even though we are approaching its 50th anniversary.
In theoretical work, the mean-variance preferences assumed in Problem 4 are usually motivated by joint normality of returns (a restriction on beliefs) or by a restriction on preferences (a quadratic von Neumann–Morgenstern utility function). When returns are jointly normal, so are portfolio returns, so the entire distribution of a portfolio's return (and therefore utility that depends only on distribution) is determined by the mean and variance. For quadratic utility, there is an algebraic relation between expected utility and mean and variance. Letting $u(c) = k_1 + k_2 c - k_3 c^2$,
\[ E[u(c)] = k_1 + k_2 E[c] - k_3 E[c^2] = k_1 + k_2 E[c] - k_3 \left( \mathrm{var}(c) + (E[c])^2 \right), \tag{18} \]
which depends on the preference parameters $k_1$, $k_2$, and $k_3$ and the mean and variance of $c$ and not on other features of the distribution (such as skewness or kurtosis). Neither assumption is literally true, but we must remember that models must be simpler than the world if they are to be useful.
You may wonder why we need to motivate the representation of preferences by the utility function $V(m, v)$, since it may seem very intuitive to write down preferences for risk and return directly. However, it is actually a little strange to assume that these preferences apply to all random variables. For example, if there is a trade-off between risk and return (so the agent cares about risk), then there should exist $m_1 > m_2$ and $v_1 > v_2 > 0$ such that $V(m_1, v_1) < V(m_2, v_2)$ and the agent would turn down the higher return because of the higher risk. However, it is easy to construct random variables $x_1$ and $x_2$ with $x_1 > x_2$ that have means $m_1$ and $m_2$ and variances $v_1$ and $v_2$. In other words, a non-trivial mean-variance utility function (that does not simply maximize the mean) cannot always prefer more to less. The two typical motivations of mean-variance preferences have different resolutions of this conundrum. Quadratic utility does not prefer more to less, so there is no inconsistency. This is not a nice feature of quadratic utility but it may not be a fatal problem either. Multivariate normality does not define preferences for all random variables, and in particular the random variables that generate the paradox are not available. When using any model, we need to think about whether the unrealistic features of the model are important for the application at hand.
[Fig. 1. The efficient frontier in means and standard deviations. Horizontal axis: standard deviation of return, $\sqrt{\theta' V \theta}$. Vertical axis: mean return, $r + \theta'(\mu - r\mathbf{1})$. Labeled in the figure: the frontiers E and F, the risk-free asset $r_f$, the market portfolio $\theta^M$, and an inefficient portfolio $\theta^i$.]
Many important features of the CAPM are illustrated by Figures 1 and 2. In Figure 1, F is the efficient frontier of risky asset returns in means and standard deviations. Other feasible portfolios of risky assets will plot to the right of F and will not be chosen by any agent who can choose only among the risky assets and prefers less risk at a given mean. Agents who also prefer a higher mean at a given standard deviation will choose only risky portfolios on the upper branch of F, which is called the positively efficient frontier of risky assets. When the risk-free asset $r_f$ is always available, all agents preferring a higher mean at a given standard deviation will choose a portfolio along the frontier E.¹⁰ One important feature in either case is two-fund separation, namely, that the entire frontier F or E is spanned by two portfolios, which can be chosen to be any portfolios at two distinct points on the frontier. This is called a “mutual fund separation” result because we can separate the portfolio choice problem into two stages: first find two “mutual funds” (portfolios) spanning the efficient frontier (which can be chosen independently of preferences) and then find the mixture of the two funds appropriate for the particular preferences.
10 For agents who prefer less risk at a given mean but may not prefer a higher mean at a given level of risk, there is another branch of E below that is the reflection of its continuation to the left of the axis.
For a typical agent who prefers more to less and prefers to avoid risk, preferences are increasing up and to the left in Figure 1. A more risk-averse agent will choose a portfolio on the lower left part of the frontier, with low return but low risk, and a less risk-averse agent will choose a portfolio on the upper right part of the frontier, accepting higher risk in exchange for higher return.
Figure 1 also illustrates the Sharpe ratio [Sharpe (1966)], which is used for performance measurement. The line through the riskless asset $r_f$ and the market portfolio $\theta^M$ has a slope in Figure 1 that is larger than the slope for any inefficient portfolio such as $\theta^i$. The slope of the line through a particular portfolio is the Sharpe ratio for the particular portfolio. The Sharpe ratio is largest for an efficient portfolio and the shortfall below that amount is the measure of inefficiency for any other portfolio. (An even greater Sharpe ratio would be possible if the efficient proxy is inefficient in sample or if we are considering a portfolio, say from an informed trading strategy, that is not a fixed portfolio of the assets.) In practice, due to random sampling error, even an efficient portfolio will have a measured Sharpe ratio that is not the largest value. When stock returns are Gaussian, there is an important connection between the measured Sharpe ratio of the market portfolio and the likelihood ratio test of the CAPM [Gibbons, Ross and Shanken (1989)].
Figure 2 shows the security market line, which quantifies the relation between risk and return in the CAPM.
Risk is measured using the beta coefficient, which is the slope coefficient of a linear regression of the asset’s return on the market’s return. If the CAPM is true, all assets and portfolios will plot on the Security Market Line (SML) that goes through the risk-free asset $r_f$ and the market portfolio of risky assets $\theta_M$. In practice, measured asset returns are affected by random sampling error; if the CAPM is true, it is entirely random whether a portfolio will plot above or below the security market line ex post. The use of beta as the appropriate measure of risk tells us that investors are rewarded for taking on market risk (correlated with market returns) but not for taking on idiosyncratic risk (uncorrelated with the market).

If the security market line tells us how much of a reward is justified for a given amount of risk, it makes intuitive sense that deviations from the security market line can be used to measure superior or inferior performance. This is the intuition behind the Treynor Index and Jensen’s alpha [Treynor (1965) and Jensen (1969)]. For example, in Figure 2, Jensen’s alpha for $\theta_s$ is $\alpha_s > 0$, indicating superior performance, and Jensen’s alpha for $\theta_u$ is $\alpha_u < 0$, indicating underperformance. Unfortunately, any formal motivation for using Jensen’s alpha must come from outside the CAPM, since if the CAPM is true then the expected value of Jensen’s alpha is zero and the realized value is purely random. Theoretical models that incorporate superior performance from information-gathering have given mixed results on the value of using the security market line for measuring performance: a superior performer with security-specific information will have a positive Jensen’s alpha, but for market timing a superior performer may have a negative Jensen’s alpha and may even plot inside the efficient frontier for static strategies [Mayers and Rice (1979) and Dybvig and Ross (1985)].

Fig. 2. The security market line connecting risk and return. Vertical axis: mean return $= r + \theta'(\mu - r\mathbf{1})$; horizontal axis: beta $= \theta'\Sigma\theta_M / \theta_M'\Sigma\theta_M$. Plotted points: $r_f$ (beta 0), $\theta_M$ (beta 1.0), $\theta_e$, $\theta_s$ (with $\alpha_s$) and $\theta_u$ (with $\alpha_u$).

The Treynor Index is the slope of the line through the evaluated portfolio and the risk-free asset in the security market line diagram, Figure 2. Performance is determined by comparing a portfolio’s Treynor Index to that of the market; a larger Treynor Index indicates better performance. The Treynor Index classifies a portfolio as superior or inferior to the market in the same way as the Jensen measure. However, the ordering of superior or inferior performers can be different because the Treynor measure is adjusted for leverage.

The main results of the CAPM can be derived from the first-order condition (9). The first-order condition for agent $n$ is
$$\mu - r\mathbf{1} = \lambda_n \Sigma\theta_n, \tag{19}$$
where $\lambda_n = -2V^n_v(m, v)/V^n_m(m, v)$ (subscripts on $V^n$ denote partial derivatives), evaluated at the optimum $m = r + (\mu - r\mathbf{1})'\theta_n$ and $v = \theta_n'\Sigma\theta_n$. Now, the market portfolio is the wealth-weighted average of all agents’ portfolios,
$$\theta_M = \frac{\sum_{n=1}^{N} w_n \theta_n}{\sum_{n=1}^{N} w_n}, \tag{20}$$
and consequently, dividing each first-order condition by $\lambda_n$ and taking the wealth-weighted average, we have
$$\mu - r\mathbf{1} = \lambda_M \Sigma\theta_M, \tag{21}$$
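where $\lambda_M$ is the wealth-weighted harmonic mean of the individual $\lambda_n$ (the mean that results when each first-order condition is divided by $\lambda_n$ before averaging),
$$\lambda_M = \left(\frac{\sum_{n=1}^{N} w_n \lambda_n^{-1}}{\sum_{n=1}^{N} w_n}\right)^{-1}. \tag{22}$$
We can plug in the market portfolio to solve for $\lambda_M$: premultiplying Equation (21) by $\theta_M'$ and using $\theta_M'\mathbf{1} = 1$ gives $\lambda_M = (\mu_M - r)/(\theta_M'\Sigma\theta_M)$, and we obtain
$$\mu - r\mathbf{1} = \frac{\Sigma\theta_M}{\theta_M'\Sigma\theta_M}\,(\mu_M - r), \tag{23}$$
where $\mu_M \equiv \theta_M'\mu$ is the mean return on the market portfolio of risky assets. Applying Equation (23) to obtain the expected excess return of a portfolio $\theta$ of risky assets (with $\theta'\mathbf{1} = 1$, since a portfolio of risky assets does not include any holdings of the risk-free asset), we have that¹¹
$$\theta'\mu - r = \lambda_M \theta'\Sigma\theta_M = \beta_\theta(\mu_M - r), \tag{24}$$
where $\beta_\theta$ is the portfolio’s beta, which is the slope coefficient of a regression of the portfolio’s return on the market return,
$$\beta_\theta \equiv \frac{\theta'\Sigma\theta_M}{\theta_M'\Sigma\theta_M}. \tag{25}$$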
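The chain from Equation (21) to Equation (25) can be checked numerically. The following is a minimal sketch, assuming a hypothetical three-asset market (the values of `r`, `mu` and `Sigma` are made up for illustration) and taking the market portfolio to be the tangency portfolio proportional to $\Sigma^{-1}(\mu - r\mathbf{1})$, which is the portfolio that satisfies Equation (21) by construction.

```python
import numpy as np

# Hypothetical inputs (assumptions for illustration, not taken from the text).
r = 0.02                                   # risk-free rate
mu = np.array([0.06, 0.09, 0.12])          # mean returns of the risky assets
Sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.090, 0.010],
                  [0.004, 0.010, 0.160]])  # covariance matrix of risky returns
ones = np.ones_like(mu)

# Portfolio proportional to Sigma^{-1}(mu - r 1), rescaled so its weights sum to 1.
raw = np.linalg.solve(Sigma, mu - r * ones)
theta_M = raw / raw.sum()

mu_M = theta_M @ mu                        # mean return on the market portfolio
var_M = theta_M @ Sigma @ theta_M          # variance of the market portfolio
lambda_M = (mu_M - r) / var_M              # lambda_M as solved for before Equation (23)

def beta(theta):
    """Beta of portfolio theta, as in Equation (25)."""
    return (theta @ Sigma @ theta_M) / var_M

def sml_gap(theta):
    """Left side minus right side of Equation (24); zero on the SML."""
    return (theta @ mu - r) - beta(theta) * (mu_M - r)

theta = np.array([0.5, 0.3, 0.2])          # an arbitrary portfolio with weights summing to 1
print(beta(theta), sml_gap(theta))
```

Under these assumptions every portfolio of risky assets whose weights sum to one plots on the security market line, regardless of whether it is mean-variance efficient.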
11 We looked at the simpler case in the text, but the same pricing result holds for a portfolio including a holding in the risk-free asset. In this case, the expected return on the portfolio is $\theta'\mu + (1 - \theta'\mathbf{1})r$ and the expected excess return is $\theta'\mu + (1 - \theta'\mathbf{1})r - r = \theta'\mu - \theta'\mathbf{1}r$.

The SML equation we plotted in Figure 2 is Equation (24). For a portfolio $\theta$, Jensen’s alpha is given by
$$\theta'\mu - r - \beta_\theta(\mu_M - r), \tag{26}$$
its Treynor index is
$$\frac{\theta'\mu - r}{\beta_\theta}, \tag{27}$$
and its Sharpe ratio is
$$\frac{\theta'\mu - r}{\sqrt{\theta'\Sigma\theta}}. \tag{28}$$
The portfolios encountered in practice are actively managed, and the formulas for these performance measures would be more complex than for the simple fixed mix of assets $\theta$. However, the concepts are unchanged with the natural adaptations, e.g., replacing $\theta'\mu$ by the sample mean return on the portfolio and replacing $\beta_\theta = \theta'\Sigma\theta_M / \theta_M'\Sigma\theta_M$ by the estimated slope from the regression of the portfolio return on the market return.
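The sample adaptations just described can be sketched as follows. This is an illustration only: the return series are simulated with a fixed seed, and the function name `performance_measures` and all parameter values are hypothetical choices, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed so the sketch is reproducible
T = 240                                    # hypothetical number of periods
r = 0.003                                  # hypothetical per-period risk-free rate

# Simulated market and portfolio returns (illustrative data only).
market = r + 0.005 + 0.04 * rng.standard_normal(T)
portfolio = r + 0.001 + 1.2 * (market - r) + 0.02 * rng.standard_normal(T)

def performance_measures(port, mkt, rf):
    """Sample analogues of Equations (25)-(28) for one portfolio."""
    excess_port = port - rf
    excess_mkt = mkt - rf
    # Estimated beta: slope of the regression of the portfolio return on the market return.
    beta = np.cov(port, mkt, ddof=1)[0, 1] / np.var(mkt, ddof=1)
    alpha = excess_port.mean() - beta * excess_mkt.mean()   # Jensen's alpha, cf. (26)
    treynor = excess_port.mean() / beta                      # Treynor index, cf. (27)
    sharpe = excess_port.mean() / port.std(ddof=1)           # Sharpe ratio, cf. (28)
    return {"beta": beta, "alpha": alpha, "treynor": treynor, "sharpe": sharpe}

fund = performance_measures(portfolio, market, r)
benchmark = performance_measures(market, market, r)
print(fund)
print(benchmark)
```

Measured against itself, the market has a beta of one and an alpha of zero. With a positive estimated beta, the fund's Treynor index exceeds the market's exactly when its estimated alpha is positive, which is the agreement between the two measures noted above.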
6. Mutual fund separation theory

The general portfolio problem for arbitrary preferences and distributions is sufficiently rich to allow for nearly any sort of qualitative behavior [see Hart (1975) for negative results or Cass and Stiglitz (1972) for positive results in special cases]. In an effort to simplify this problem and obtain results that allow for aggregation, so that the general behavior of the market can be understood in terms of the primitive properties of risk aversion and of the underlying distributions, a collection of results known as separation results has been developed.

Mutual Fund Separation is the separation of portfolio choice into two stages. The first stage is the selection of a small set of “mutual funds” (portfolios) among which choice is to be made, and the second stage is the selection of an allocation to the mutual funds. We have “k-fund separation” for a particular class of distributions and a particular class of utility functions if, for each joint return distribution in the class, there exist k funds that can be used in the two-step procedure while making agents with utility in the class and any wealth level just as well off as when choosing from the whole market. The important restriction is that the choice of the funds is done once for the entire class of utility functions.

In the literature, there are two general approaches: one approach [Hakansson (1969) and Cass and Stiglitz (1970)] restricts utility functions and has relatively unrestricted distributions, while the other approach [Ross (1978a)] restricts distributions and has relatively unrestricted utility functions. Either approach is useful for deriving asset pricing results because, for example, if individual investors hold mixtures of two funds, then the market portfolio must be a mixture of the same two funds.

6.1. Preference approach

The preference approach focuses on classes of special utility functions.
Many of the results involve utility functions that have properties of homotheticity or invariance. It is important that we require the same funds to work for each utility function at all wealth levels, since this avoids “accidental” cases such as a set containing any two utility functions over returns. Analysis in this section will use Problem 3, in some cases adding the assumption that one of the assets is riskless.

First, we consider one-fund separation, which requires all portfolio choices to lie on a ray. Given the budget constraint, this implies that the portfolio choice is just proportional to wealth. For this to happen at all prices, the preferences have to be homothetic. And, given the von Neumann–Morgenstern restriction, this is equivalent to either logarithmic utility, $u(c) = \log(c)$, or power utility, $u(c) = c^{1-R}/(1-R)$.

Theorem 5: One-fund separation from preferences. The following are equivalent properties of a nonempty class $U$ of utility functions:
(1) For each joint distribution of security returns there exists a single portfolio $\theta$, such that every $u \in U$ is just as well off choosing a multiple of $\theta$ as choosing from the entire market.
(2) The class $U$ consists of a single utility function (up to an affine transform that leaves preferences unchanged) of the form $u(c) = \log(c)$ or $u(c) = c^{1-R}/(1-R)$.

Proof: (2)⇒(1) Let $u$ be the single utility function in $U$. The objective function in terms of portfolio proportions is $E[u(w\theta'r)]$. In the log case, this is $E[\log(w\theta'r)] = \log(w) + E[\log(\theta'r)]$, and maximizing the objective is the same as maximizing the second term, which does not depend on $w$. In the power case, the objective is $E[(w\theta'r)^{1-R}/(1-R)] = w^{1-R}E[(\theta'r)^{1-R}/(1-R)]$, and maximizing the objective is the same as maximizing the second factor, which does not depend on $w$. In either case, choosing the proportions that work at one wealth level gives a portfolio in proportions that will be optimal at all wealth levels.

(1)⇒(2) Suppose $u$ is an element of the class $U$. Then, the first-order condition for an optimum implies
$$E[(1 + r - \gamma)\,u'(W\theta'(1 + r))] = 0, \tag{29}$$
where $\gamma = \lambda/E[u'(W\theta'(1 + r))]$. In general, $\gamma$ must satisfy Equation (8) and may vary with $W$, but for complete markets $\gamma$ is uniquely determined by Equation (8) and may be taken as given. For the same portfolio weights $\theta$ to be optimal for all $W$, it follows that the derivative of the first-order condition with respect to $W$ is zero, and for complete markets we have
$$E[(1 + r - \gamma)\,\theta'(1 + r)\,u''(W\theta'(1 + r))] = 0. \tag{30}$$
Now, one-fund separation implies that in all complete markets Equation (29) implies Equation (30), but the only way this can always be true is if everywhere
$$c\,u''(c) = -R\,u'(c), \tag{31}$$
where $R = 1$ implies logarithmic utility and any other $R \ge 0$ implies power utility. ($R \le 0$ corresponds to a convex utility function.) And all utility functions in the class $U$ must correspond to the same preferences, or else it is easy to construct a 2-state counterexample.

The utility functions in the theorem comprise the Constant Relative Risk Aversion (CRRA) class, for which the Arrow–Pratt coefficient of relative risk aversion $-c\,u''(c)/u'(c)$ is a constant. [See Arrow (1965) and Pratt (1964, 1976).¹²] Other special utility functions lead to two-fund separation if there is a riskless asset. The Constant Absolute Risk Aversion (CARA) class of utility functions of the form $u(c) = -\exp(-Ac)/A$, for which the Arrow–Pratt coefficient of absolute risk aversion $-u''(c)/u'(c)$ is constant, leads to a special two-fund separation result in which the risky portfolio holding is constant and only the investment in the risk-free asset changes as wealth changes. When there is a riskless asset, there is also two-fund separation in the larger Linear Risk Tolerance (LRT) class, which encompasses the other two classes as well as wealth-translated relative risk aversion preferences of the form $u(c) = \log(c - c_0)$ or $u(c) = (c - c_0)^{1-R}/(1-R)$. The linear risk tolerance class is defined by the risk tolerance $-u'(c)/u''(c)$ having the linear form $a(c - c_0)$. We can include in this class the satiated utility functions of the form $-(c - c_0)^{1-R}/(1-R)$ defined for $c \le c_0$ (and typically extended to $c > c_0$ in the obvious way in the quadratic case $R = -1$). With quadratic utility, we have a special result of two-fund separation even without a risk-free asset, due to linearity of marginal utility. In these results, all utility functions in the class $U$ must have the same power (or absolute risk aversion coefficient for exponential utility) but can have different translates $c_0$ (exponential utility is unchanged under translation). For details and proofs, see Cass and Stiglitz (1970).

6.2. Beliefs

We have already seen one case of separation based on beliefs, which is in mean-variance analysis motivated by multivariate normality, as discussed in the section on the CAPM. Mean-variance preferences can also be derived from the more general transformed spherical distributions discussed by Chamberlain (1983).¹³ We turn now to a strictly more general class, the separating distributions of Ross (1978a).
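Returning briefly to the preference-based results of Section 6.1, the wealth-independence of the CRRA portfolio proportions and of the CARA dollar holding can be illustrated numerically. The sketch below assumes a hypothetical two-state market with one risky and one riskless asset (all numbers are made up) and uses a simple grid search rather than an exact solver; the function names are illustrative.

```python
import numpy as np

# Hypothetical two-state market (assumed numbers, for illustration only).
p = 0.5                       # probability of the up state
up, down, rf = 1.30, 0.90, 1.05   # gross risky returns and gross riskless return
R, A = 3.0, 2.0               # relative and absolute risk aversion coefficients

def expected_utility(u, terminal_up, terminal_down):
    """Expected utility of terminal wealth over the two states."""
    return p * u(terminal_up) + (1 - p) * u(terminal_down)

def crra_share(w, grid=np.linspace(0.0, 1.5, 1501)):
    """Fraction of wealth w in the risky asset maximizing expected power utility."""
    u = lambda c: c ** (1 - R) / (1 - R)
    values = expected_utility(u, w * (rf + grid * (up - rf)),
                              w * (rf + grid * (down - rf)))
    return grid[np.argmax(values)]

def cara_dollars(w, grid=np.linspace(0.0, 1.0, 1001)):
    """Dollar amount in the risky asset maximizing expected exponential utility."""
    u = lambda c: -np.exp(-A * c) / A
    values = expected_utility(u, w * rf + grid * (up - rf),
                              w * rf + grid * (down - rf))
    return grid[np.argmax(values)]

for w in (1.0, 10.0, 100.0):
    print("wealth", w, "CRRA risky share", crra_share(w))
for w in (1.0, 5.0, 20.0):
    print("wealth", w, "CARA risky dollars", cara_dollars(w))
```

In this example the CRRA investor holds the same fraction of wealth in the risky asset at every wealth level (one-fund separation), while the CARA investor holds the same dollar amount and puts all additional wealth into the riskless asset.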
12 Some other ways of comparing risk aversion are given by Kihlström, Romer and Williams (1981) and Ross (1981).
13 Another special case of one-fund separation is the symmetric case of Samuelson (1967).

The central intuition behind the separating distributions is that risk-averse agents will not choose to take on risk without any reward. This is the same intuition as in mean-variance analysis, but it is somewhat more subtle because risk can no longer be characterized by variance for general concave von Neumann–Morgenstern preferences. The appropriate definition of risk is related to Jensen’s inequality, which says that for any convex function $f(\cdot)$ and any random variable $x$, $E[f(x)] \ge f(E[x])$, with strict inequality if $f(\cdot)$ is strictly convex and $x$ is not (almost surely) constant. A risk-averse von Neumann–Morgenstern utility function $u(\cdot)$ is concave (so that $-u(\cdot)$ is convex), and consequently for any random consumption $c$, $E[u(c)] \le u(E[c])$, with strict inequality for strictly concave $u$ and nonconstant $c$. More importantly for portfolio choice problems, we can use Jensen’s inequality and the law of iterated expectations to conclude that adding conditional-mean-zero noise makes a risk-averse agent worse off. That gives us the following useful result:

Lemma 1. If $E[\varepsilon \mid c] = 0$ and $u$ is concave, then
$$E[u(c + \varepsilon)] \le E[u(c)]. \tag{32}$$

Proof:
$$\begin{aligned} E[u(c + \varepsilon)] &= E[E[u(c + \varepsilon) \mid c]] \\ &\le E[u(E[c + \varepsilon \mid c])] \\ &= E[u(c)], \end{aligned} \tag{33}$$
by Jensen’s inequality and the law of iterated expectations.

In fact, it can be shown that one random variable is dominated by another with the same mean for all concave utility functions if and only if the first has the same distribution as the second plus conditional-mean-zero noise. This is one of the results of the theory of Stochastic Dominance, which was pioneered by Quirk and Saposnik (1962) and Hadar and Russell (1969) and was popularized by Rothschild and Stiglitz (1970, 1971).

The basic idea behind the separating distributions is that there are k funds (e.g., 2 funds for 2-fund separation) such that everything else is equal to some portfolio of the k funds, plus conditional-mean-zero noise. Formally, we have

Theorem 6. Consider a world with k funds that are portfolios with weights $\psi_1, \ldots, \psi_k$ summing to one, $(\forall j)\ \mathbf{1}'\psi_j = 1$ (or in vector notation, $\mathbf{1}'\psi = \mathbf{1}'$).¹⁴ Further assume that returns on each asset $i$ can be written as
$$r_i = \sum_{j=1}^{k} \beta_{ij}\,\psi_j' r + \varepsilon_i, \tag{34}$$
(i.e., $r = \beta\psi' r + \varepsilon$), where $\sum_{j=1}^{k} \beta_{ij} = 1$ (i.e., $\beta\mathbf{1} = \mathbf{1}$) and, for every linear combination $\eta'\beta\psi' r$ of the fund returns, $\varepsilon$ is conditional-mean-zero noise:
$$E[\varepsilon \mid \eta'\beta\psi' r] = 0. \tag{35}$$
Then any agent with increasing and concave von Neumann–Morgenstern preferences will be just as happy choosing a portfolio of the k funds as choosing from the entire market. More formally, for each monotone and concave $u$ and for each feasible portfolio $\theta$ with $\mathbf{1}'\theta = 1$, there exists another portfolio $\eta$ of the k funds with $\mathbf{1}'\eta = 1$ such that $E\,u(\eta' r) \ge E\,u(\theta' r)$.

14 As discussed earlier, the dimension of $\mathbf{1}$ is determined by context; in $\mathbf{1}'\psi = \mathbf{1}'$, the first occurrence of $\mathbf{1}$ is $n \times 1$ and the second occurrence of $\mathbf{1}$ is $k \times 1$.

Proof: Consider any portfolio $\theta$ with $\mathbf{1}'\theta = 1$. Then
$$\theta' r = \theta'(\beta\psi' r + \varepsilon) = \theta'\beta\psi' r + \theta'\varepsilon. \tag{36}$$
But $\psi\beta'\theta$ is a valid portfolio because $\mathbf{1}'\psi\beta'\theta = \mathbf{1}'\beta'\theta = \mathbf{1}'\theta = 1$. And the second term is conditional-mean-zero noise. Therefore, by Lemma 1, all agents with concave preferences would be at least as happy to switch from $\theta$ to the portfolio $\psi\beta'\theta$, which is a portfolio of the k funds (with weights $\beta'\theta$).

In the case of 1- and 2-fund separating distributions, the characterization is necessary as well as sufficient [see Ross (1978a)]. In the CAPM derived using multivariate normality, it is easy to show that the SML implies that each mean-variance inefficient portfolio has a payoff equal to the payoff of the efficient portfolio with the same mean plus conditional-mean-zero noise. Given that the mean-variance frontier is spanned by two portfolios, we see that the CAPM with multivariate normality is indeed in the class of 2-fund separating distributions.
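Lemma 1, which drives the proof of Theorem 6, can be checked exactly on a small discrete example. The sketch below assumes a hypothetical three-point distribution for consumption and a noise term that equals plus or minus a state-dependent amount with equal probability, so that its conditional mean is zero; all numbers and names are illustrative.

```python
import numpy as np

# Hypothetical discrete consumption distribution (values and probabilities assumed).
c_values = np.array([1.0, 2.0, 4.0])
c_probs = np.array([0.3, 0.5, 0.2])
# Given c, the noise equals +s(c) or -s(c) with probability 1/2 each, so E[noise | c] = 0.
noise_scale = np.array([0.5, 0.2, 1.0])

def expected_utility(u, add_noise):
    """Exact expected utility of c, or of c plus the conditional-mean-zero noise."""
    if not add_noise:
        return float(c_probs @ u(c_values))
    with_noise = 0.5 * u(c_values + noise_scale) + 0.5 * u(c_values - noise_scale)
    return float(c_probs @ with_noise)

concave_utilities = {
    "log": np.log,
    "sqrt": np.sqrt,
    "exponential": lambda c: -np.exp(-c),
    "linear": lambda c: c,   # weakly concave: the noise leaves expected utility unchanged
}

for name, u in concave_utilities.items():
    noisy, plain = expected_utility(u, True), expected_utility(u, False)
    print(name, noisy <= plain + 1e-12)
```

Each strictly concave utility in the example is strictly worse off with the noise, while the linear (risk-neutral) utility is indifferent, which matches the weak inequality in Equation (32).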
7. Arbitrage pricing theory (APT)

The Arbitrage Pricing Theory, which was introduced in Ross (1976a,c), is a model of security pricing that generalizes the pricing relation in the CAPM and also builds on the intuition of the separating distributions. First, we start with a factor model of returns of the sort studied in statistics:
$$r = \mu + f\beta + \varepsilon, \tag{37}$$
where $\mu$ is a vector of mean returns (unrestricted at the moment but to be restricted by the theory), $f$ is a vector of factor returns, of dimensionality much less than $r$, $\beta$ is a matrix of factor loadings, and $\varepsilon$ is a vector of uncorrelated idiosyncratic noise terms. We can represent the restriction to the factor model by writing the covariance matrix as
$$\operatorname{var}(r) = \beta'\beta + D, \tag{38}$$
where we have assumed an orthonormal set of factors with the identity matrix as covariance matrix (without loss of generality because we can always work with a linear transformation), and where $D$ is a diagonal matrix which is the covariance matrix of the vector of security-specific noise terms $\varepsilon$. The factor model is a useful restriction for empirical work on security returns: given that typically we have many securities relative to the number of time periods, the full covariance matrix is not identified, but a sufficiently low-dimensional factor model has many fewer parameters and can be estimated.

One intuition of the APT is that idiosyncratic risk is not very important economically and should not be priced. Another intuition of the APT is that compensation for risk should be linear or else there will be arbitrage. For example, if there is a single factor and two assets have different exposures to the factor (betas), excess return must be proportional to the risk exposure. Suppose the compensation per unit risk is larger for the asset with a larger risk exposure. Then a portfolio mixture of the risk-free asset and the high-risk asset will have the same risk exposure as the low-risk asset but a higher expected return, and combining a long position in the mixture with a short position in the low-risk asset gives a pure profit. This profit will be riskless in the absence of idiosyncratic risk; it will be profitable for some agents if idiosyncratic risk is diversifiable. Conversely, if the compensation per unit risk is larger for the asset with the lower risk exposure, the other asset can be dominated by a combination of a long position in the less risky asset with a short (borrowing) position in the risk-free asset.

The main consequence of the APT is a pricing equation that looks like a multifactor version of the CAPM equation:
$$\mu = r_f\mathbf{1} + \Gamma\beta. \tag{39}$$
Here, $r_f$ is the risk-free rate and $\Gamma$ is the vector of factor risk premia. There are several approaches to motivating this APT pricing equation; see for example Ross (1976a,c), Dybvig (1983), or Grinblatt and Titman (1983). The APT shares important features of the CAPM: the value of diversification, compensation for taking on systematic risk, and no compensation for taking on idiosyncratic risk. The main difference is that there may be multiple factors, and that the priced factors are the common factors that appear in many securities and not necessarily just the market factor.
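The single-factor arbitrage argument given earlier in this section can be made concrete with a little arithmetic. The sketch below assumes hypothetical numbers (a 3% risk-free rate, exposures of 0.5 and 1.5, and compensation of 4% and 6% per unit of exposure) and no idiosyncratic risk; the function names are illustrative only.

```python
import numpy as np

rf = 0.03                                  # hypothetical risk-free rate
beta_low, beta_high = 0.5, 1.5             # hypothetical factor exposures
premium_low = 0.04 * beta_low              # 4% compensation per unit of exposure
premium_high = 0.06 * beta_high            # 6% per unit: compensation is not linear

def asset_return(premium, beta, f):
    """Return with no idiosyncratic risk: rf + premium + beta * f, where E[f] = 0."""
    return rf + premium + beta * f

# Weight on the high-exposure asset so the mixture matches the low exposure.
w = beta_low / beta_high

def arbitrage_profit(f):
    """Payoff per dollar held long in the mixture and short in the low-exposure asset."""
    mixture = w * asset_return(premium_high, beta_high, f) + (1 - w) * rf
    short_leg = asset_return(premium_low, beta_low, f)
    return mixture - short_leg

factor_draws = np.array([-0.3, -0.1, 0.0, 0.2, 0.5])
profits = arbitrage_profit(factor_draws)
print(profits)
```

Under these assumptions the zero-investment position earns the same positive profit whatever the factor realization, because the factor exposures of the two legs cancel. Setting the two compensation rates equal makes the profit zero, which is the linear pricing of Equation (39).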
8. Conclusion

On reflection, it is surprising that even our simplest context of a single-period neoclassical model of investments has such a rich theoretical development. We have hit on many of the highlights, but even so we cannot claim to have given an exhaustive review of all that is known.
References

Abel, A.B. (1990), “Asset prices under habit formation and catching up with the Joneses”, American Economic Review 80:38–42.
Arrow, K., and G. Debreu (1954), “Existence of an equilibrium for a competitive economy”, Econometrica 22:265–290.
Arrow, K., and F. Hahn (1971), General Competitive Analysis (Holden-Day, San Francisco).
Arrow, K.J. (1964), “The role of securities in the optimal allocation of risk-bearing”, Review of Economic Studies 31:91–96.
Arrow, K.J. (1965), “Aspects of the theory of risk-bearing”, Yrjö Jahnsson Lectures (Yrjö Jahnssonin Säätiö, Helsinki).
Banz, R.W., and M.H. Miller (1978), “Prices for state-contingent claims: some estimates and applications”, Journal of Business 51:653–72.
Barberis, N., M. Huang and T. Santos (2001), “Prospect theory and asset prices”, Quarterly Journal of Economics 116:1–53.
Bergman, Y. (1985), “Time preference and capital asset pricing models”, Journal of Financial Economics 14:145–159.
Bewley, T. (1988), “Knightian uncertainty”, Nancy Schwartz Lecture (Northwestern MEDS, Evanston).
Black, F., and M. Scholes (1973), “The pricing of options and corporate liabilities”, Journal of Political Economy 81:637–654.
Blume, L., A. Brandenburger and E. Dekel (1991), “Lexicographic probabilities and choice under uncertainty”, Econometrica 59:61–79.
Breeden, D.T. (1979), “An intertemporal asset pricing model with stochastic consumption and investment opportunities”, Journal of Financial Economics 7:265–296.
Breeden, D.T., and R.H. Litzenberger (1978), “Prices of state-contingent claims implicit in option prices”, Journal of Business 51:621–651.
Cass, D., and J.E. Stiglitz (1970), “The structure of investor preferences and separability in portfolio allocation: a contribution to the pure theory of mutual funds”, Journal of Economic Theory 2:122–160.
Cass, D., and J.E. Stiglitz (1972), “Risk aversion and wealth effects on portfolios with many assets”, Review of Economic Studies 39:331–354.
Chamberlain, G. (1983), “A characterization of the distributions that imply mean-variance utility functions”, Journal of Economic Theory 29:185–201.
Constantinides, G. (1991), “Habit formation: a resolution of the equity premium puzzle”, Journal of Political Economy 98:519–543.
Cox, J.C., and S.A. Ross (1975), “A survey of some new results in financial option pricing theory”, Journal of Finance 31:383–402.
Cox, J.C., S.A. Ross and M. Rubinstein (1979), “Option pricing: a simplified approach”, Journal of Financial Economics 7:229–263.
Debreu, G. (1959), Theory of Value: An Axiomatic Analysis of Economic Equilibrium (Yale University Press, New Haven).
Duesenberry, J.S. (1949), Income, Saving, and the Theory of Consumer Behavior (Harvard University Press, Cambridge, MA).
Dybvig, P.H. (1983), “An explicit bound on individual assets’ deviations from APT pricing in a finite economy”, Journal of Financial Economics 12:483–496.
Dybvig, P.H. (1988a), “Distributional analysis of portfolio choice”, Journal of Business 61:369–393.
Dybvig, P.H. (1988b), “Inefficient dynamic portfolio strategies, or how to throw away a million dollars in the stock market”, Review of Financial Studies 1:67–88.
Dybvig, P.H. (1995), “Duesenberry’s ratcheting of consumption: optimal dynamic consumption and investment given intolerance for any decline in standard of living”, Review of Economic Studies 62:287–313.
Dybvig, P.H., and S.A. Ross (1985), “Differential information and performance measurement using a security market line”, Journal of Finance 40:383–399.
Dybvig, P.H., and S.A. Ross (1987), “Arbitrage”, in: J. Eatwell, M. Milgate and P. Newman, eds., The New Palgrave: a Dictionary of Economics (Macmillan, London) pp. 100–106.
Epstein, L.G., and S.E. Zin (1989), “Substitution, risk aversion, and the temporal behavior of consumption and asset returns: a theoretical framework”, Econometrica 57:937–969.
Fishburn, P. (1988), Nonlinear Preference and Utility Theory (Johns Hopkins, Baltimore).
Gibbons, M.R., S.A. Ross and J. Shanken (1989), “A test of the efficiency of a given portfolio”, Econometrica 57:1121–1152.
Grinblatt, M., and S. Titman (1983), “Factor pricing in a finite economy”, Journal of Financial Economics 12:497–507.
Hadar, J., and W.R. Russell (1969), “Rules for ordering uncertain prospects”, American Economic Review 59:25–34.
Hakansson, N.H. (1969), “Risk disposition and the separation property in portfolio selection”, Journal of Financial and Quantitative Analysis 4:401–416.
Harrison, J.M., and D.M. Kreps (1979), “Martingales and arbitrage in multiperiod securities markets”, Journal of Economic Theory 20:381–408.
Harrison, J.M., and S. Pliska (1981), “Martingales and stochastic integrals in the theory of continuous trading”, Stochastic Processes and Their Applications 11:215–260.
Hart, O.D. (1975), “Some negative results on the existence of comparative statics results in portfolio theory”, The Review of Economic Studies 42:615–621.
Herstein, I.N., and J. Milnor (1953), “An axiomatic approach to measurable utility”, Econometrica 21:291–297.
Hindy, A., and C. Huang (1992), “On intertemporal preferences for uncertain consumption: a continuous time approach”, Econometrica 60:781–801.
Jensen, M.C. (1969), “Risk, the pricing of capital assets, and the evaluation of investment portfolios”, Journal of Business 42:167–247.
Jouini, E., and H. Kallal (2001), “Efficient trading strategies in the presence of market frictions”, Review of Financial Studies 14:343–369.
Kahneman, D., and A. Tversky (1979), “Prospect theory: an analysis of decision under risk”, Econometrica 47:263–292.
Karlin, S. (1959), Mathematical Methods and Theory in Games, Programming, and Economics (Addison-Wesley, Reading, MA).
Kihlström, R.E., D. Romer and S. Williams (1981), “Risk aversion with random initial wealth”, Econometrica 49:911–920.
Knight, F.H. (1921), Risk, Uncertainty, and Profit (Houghton Mifflin, New York).
Kreps, D., and E. Porteus (1978), “Temporal resolution of uncertainty and dynamic choice theory”, Econometrica 46:185–200.
Lintner, J. (1965), “The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets”, Review of Economics and Statistics 47:13–37.
Luce, R.D., and H. Raiffa (1957), Games and Decisions (Wiley, New York).
Machina, M.J. (1982), “‘Expected Utility’ Analysis without the Independence Axiom”, Econometrica 50:277–324.
Markowitz, H. (1952), “Portfolio selection”, Journal of Finance 7:77–91.
Markowitz, H. (1959), Portfolio Selection, Efficient Diversification of Investments (Wiley, New York).
Mayers, D., and E.M. Rice (1979), “Measuring portfolio performance and the empirical content of asset pricing models”, Journal of Financial Economics 7:3–28.
Merton, R.C. (1971), “Optimal consumption and portfolio rules in a continuous-time model”, Journal of Economic Theory 3:373–413.
Merton, R.C. (1973), “An intertemporal capital asset pricing model”, Econometrica 41:867–887.
Pollak, R.A. (1970), “Habit formation and dynamic demand functions”, Journal of Political Economy 78:745–763.
Pratt, J.W. (1964), “Risk aversion in the small and the large”, Econometrica 32:122–136.
Pratt, J.W. (1976), “Risk aversion in the small and the large (erratum)”, Econometrica 55:420.
Quirk, J.P., and R. Saposnik (1962), “Admissibility and measurable utility functions”, Review of Economic Studies 29:140–146.
Ross, S.A. (1976a), “The arbitrage theory of capital asset pricing”, Journal of Economic Theory 13:341–360.
Ross, S.A. (1976b), “Options and efficiency”, Quarterly Journal of Economics 90:75–89.
Ross, S.A. (1976c), “Return, risk, and arbitrage”, in: I. Friend and J. Bicksler, eds., Risk and Return in Finance 1 (Ballinger, Cambridge, MA) pp. 189–218.
Ross, S.A. (1978a), “Mutual fund separation in financial theory – the separating distributions”, Journal of Economic Theory 17:254–286.
Ross, S.A. (1978b), “A simple approach to the valuation of risky streams”, Journal of Business 51:453–475.
Ross, S.A. (1981), “Some stronger measures of risk aversion in the small and the large”, Econometrica 49:621–638.
Rothschild, M., and J.E. Stiglitz (1970), “Increasing risk: I. A definition”, Journal of Economic Theory 2:225–243. See also “Addendum to ‘Increasing risk: I. A definition’”, Journal of Economic Theory 5:306.
Rothschild, M., and J.E. Stiglitz (1971), “Increasing risk: II. Its economic consequences”, Journal of Economic Theory 3:66–84.
Rubinstein, M. (1976), “The valuation of uncertain income streams and the pricing of options”, Bell Journal of Economics 7:407–425.
Samuelson, P.A. (1967), “General proof that diversification pays”, Journal of Financial and Quantitative Analysis 2:1–13.
Sharpe, W.F. (1964), “Capital asset prices: a theory of market equilibrium under conditions of risk”, Journal of Finance 19:425–442.
Sharpe, W.F. (1966), “Mutual fund performance”, Journal of Business 39:119–138.
Tobin, J. (1958), “Liquidity preference as behavior towards risk”, Review of Economic Studies 25:65–86.
Treynor, J. (1965), “How to rate management of investment funds”, Harvard Business Review 43:63–75.
von Neumann, J., and O. Morgenstern (1944), Theory of Games and Economic Behavior (Princeton University Press, Princeton, NJ).
Chapter 11
INTERTEMPORAL ASSET PRICING THEORY
DARRELL DUFFIE °
Stanford University
Contents
Abstract
Keywords
1. Introduction
2. Basic theory
2.1. Setup
2.2. Arbitrage, state prices, and martingales
2.3. Individual agent optimality
2.4. Habit and recursive utilities
2.5. Equilibrium and Pareto optimality
2.6. Equilibrium asset pricing
2.7. Breeden’s consumption-based CAPM
2.8. Arbitrage and martingale measures
2.9. Valuation of redundant securities
2.10. American exercise policies and valuation
3. Continuous-time modeling
3.1. Trading gains for Brownian prices
3.2. Martingale trading gains
3.3. The Black–Scholes option-pricing formula
3.4. Ito’s Formula
3.5. Arbitrage modeling
3.6. Numeraire invariance
3.7. State prices and doubling strategies
3.8. Equivalent martingale measures
3.9. Girsanov and market prices of risk
3.10. Black–Scholes again
3.11. Complete markets
° I am grateful for impetus from George Constantinides and René Stulz, and for inspiration and guidance from many collaborators and Stanford colleagues. Address: Graduate School of Business, Stanford University, Stanford CA 94305-5015 USA; or email at duffi [email protected]. The latest draft can be downloaded at www.stanford.edu/~duffie/.
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz This chapter is a moderately revised and updated version of work that appeared originally in “Dynamic Asset Pricing Theory”, © 2002 Princeton University Press.
  3.12. Optimal trading and consumption
  3.13. Martingale solution to Merton’s problem
4. Term-structure models
  4.1. One-factor models
  4.2. Term-structure derivatives
  4.3. Fundamental solution
  4.4. Multifactor term-structure models
  4.5. Affine models
  4.6. The HJM model of forward rates
5. Derivative pricing
  5.1. Forward and futures prices
  5.2. Options and stochastic volatility
  5.3. Option valuation by transform analysis
6. Corporate securities
  6.1. Endogenous default timing
  6.2. Example: Brownian dividend growth
  6.3. Taxes, bankruptcy costs, capital structure
  6.4. Intensity-based modeling of default
  6.5. Zero-recovery bond pricing
  6.6. Pricing with recovery at default
  6.7. Default-adjusted short rate
References
Abstract
This is a survey of the basic theoretical foundations of intertemporal asset pricing theory. The broader theory is first reviewed in a simple discrete-time setting, emphasizing the key role of state prices. The existence of state prices is equivalent to the absence of arbitrage. State prices, which can be obtained from optimizing investors’ marginal rates of substitution, can be used to price contingent claims. In equilibrium, under locally quadratic utility, this leads to Breeden’s consumption-based capital asset pricing model. American options call for special handling. After extending the basic modeling approach to continuous-time settings, we turn to such applications as the dynamics of the term structure of interest rates, futures and forwards, option pricing under jumps and stochastic volatility, and the market valuation of corporate securities. The pricing of defaultable corporate debt is treated from a direct analysis of the incentives or ability of the firm to pay, and also by standard reduced-form methods that take as given an intensity process for default. This survey does not consider asymmetric information, and assumes price-taking behavior and the absence of transactions costs and many other market imperfections.
Keywords
asset pricing, state pricing, option pricing, interest rates, bond pricing

JEL classification: G12, G13, E43, E44
1. Introduction
This is a survey of “classical” intertemporal asset pricing theory. A central objective of this theory is to reduce asset-pricing problems to the identification of “state prices”, a notion of Arrow (1953) from which any security has an implied value as the weighted sum of its future cash flows, state by state, time by time, with weights given by the associated state prices. Such state prices may be viewed as the marginal rates of substitution among state-time consumption opportunities, for any unconstrained investor, with respect to a numeraire good. Under many types of market imperfections, state prices may not exist, or may be of relatively less use or meaning. While market imperfections constitute an important thrust of recent advances in asset pricing theory, they will play a limited role in this survey, given the limitations of space and the priority that should be accorded to first principles based on perfect markets.

Section 2 of this survey provides the conceptual foundations of the broader theory in a simple discrete-time setting. After extending the basic modeling approach to a continuous-time setting in Section 3, we turn in Section 4 to term-structure modeling, in Section 5 to derivative pricing, and in Section 6 to corporate securities.

The theory of optimal portfolio and consumption choice is closely linked to the theory of asset pricing, for example through the relationship between state prices and marginal rates of substitution at optimality. While this connection is emphasized, for example in Sections 2.3–2.4 and 3.12–3.13, the theory of optimal portfolio and consumption choice, particularly in dynamic incomplete-markets settings, has become so extensive as to defy a proper summary in the context of a reasonably sized survey of asset-pricing theory. The interested reader is especially directed to the treatments of Karatzas and Shreve (1998) and Schroder and Skiadas (1999, 2002).
For ease of reference, as there is at most one theorem per sub-section, we refer to a theorem by its subsection number, and likewise for lemmas and propositions. For example, the unique proposition of Section 2.9 is called “Proposition 2.9”.
2. Basic theory
Radner (1967, 1972) originated our standard approach to a dynamic equilibrium of “plans, prices, and expectations,” extending the static approach of Arrow (1953) and Debreu (1953).¹ After formulating this standard model, this section provides the equivalence of no arbitrage and state prices, and shows how state prices may be derived from investors’ marginal rates of substitution among state-time consumption opportunities. Given state prices, we examine the pricing of derivative securities, such as European and American options, whose payoffs can be replicated by trading the underlying primitive securities.

¹ The model of Debreu (1953) appears in Chapter 7 of Debreu (1959). For more details in a finance setting, see Dothan (1990). The monograph by Magill and Quinzii (1996) is a comprehensive survey of the theory of general equilibrium in a setting such as this.

2.1. Setup
We begin for simplicity with a setting in which uncertainty is modeled as some finite set W of states, with associated probabilities. We fix a set $\mathcal{F}$ of events, called a tribe, also known as a σ-algebra, which is the collection of subsets of W that can be assigned a probability. The usual rules of probability apply.² We let P(A) denote the probability of an event A. There are T + 1 dates: 0, 1, ..., T. At each of these, a tribe $\mathcal{F}_t \subset \mathcal{F}$ is the set of events corresponding to the information available at time t. Any event in $\mathcal{F}_t$ is known at time t to be true or false. We adopt the usual convention that $\mathcal{F}_t \subset \mathcal{F}_s$ whenever $t \le s$, meaning that events are never “forgotten”. For simplicity, we also take it that events in $\mathcal{F}_0$ have probability 0 or 1, meaning roughly that there is no information at time t = 0. Taken altogether, the filtration $\mathbb{F} = \{\mathcal{F}_0, \ldots, \mathcal{F}_T\}$, sometimes called an information structure, represents how information is revealed through time. For any random variable Y, we let $E_t(Y) = E(Y \mid \mathcal{F}_t)$ denote the conditional expectation of Y given $\mathcal{F}_t$. In order to simplify things, for any two random variables Y and Z, we always write “Y = Z” if the probability that $Y \ne Z$ is zero.

An adapted process is a sequence $X = \{X_0, \ldots, X_T\}$ such that, for each t, $X_t$ is a random variable with respect to $(W, \mathcal{F}_t)$. Informally, this means that $X_t$ is observable at time t. An adapted process X is a martingale if, for any times t and s > t, we have $E_t(X_s) = X_t$.

A security is a claim to an adapted dividend process, say d, with $d_t$ denoting the dividend paid by the security at time t. Each security has an adapted security-price process S, so that $S_t$ is the price of the security, ex dividend, at time t. That is, at each time t, the security pays its dividend $d_t$ and is then available for trade at the price $S_t$. This convention implies that $d_0$ plays no role in determining ex-dividend prices. The cum-dividend security price at time t is $S_t + d_t$.

We suppose that there are N securities defined by an $\mathbb{R}^N$-valued adapted dividend process $d = (d^{(1)}, \ldots, d^{(N)})$. These securities have some adapted price process $S = (S^{(1)}, \ldots, S^{(N)})$. A trading strategy is an adapted process q in $\mathbb{R}^N$. Here, $q_t$ represents the portfolio held after trading at time t. The dividend process $d^q$ generated by a trading strategy q is defined by

$$d^q_t = q_{t-1} \cdot (S_t + d_t) - q_t \cdot S_t, \qquad (1)$$

with “$q_{-1}$” taken to be zero by convention.

² The triple $(W, \mathcal{F}, P)$ is a probability space, as defined for example by Jacod and Protter (2000).
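As a small illustration of Equation (1), the following Python sketch computes the dividend process generated by a trading strategy along one scenario. The function name `generated_dividends` and all numbers are hypothetical, chosen only for illustration; prices, dividends, and holdings are given date by date for a single path.

```python
def generated_dividends(q, S, d):
    """Return d^q with d^q_t = q_{t-1}.(S_t + d_t) - q_t.S_t and q_{-1} = 0.

    q, S, d are lists indexed by date t = 0..T; each entry is a list with
    one number per security, observed along a single scenario.
    """
    n_sec = len(S[0])
    out = []
    prev = [0.0] * n_sec  # q_{-1} = 0 by convention
    for q_t, S_t, d_t in zip(q, S, d):
        # Value (cum dividend) of the portfolio carried into date t.
        inflow = sum(h * (s + x) for h, s, x in zip(prev, S_t, d_t))
        # Cost (ex dividend) of the portfolio held after trading at date t.
        outflow = sum(h * s for h, s in zip(q_t, S_t))
        out.append(inflow - outflow)
        prev = q_t
    return out

# Hypothetical path with N = 1 security and T = 2.
S = [[10.0], [11.0], [0.0]]   # ex-dividend prices, with S_T = 0
d = [[0.0], [1.0], [12.0]]    # dividends
q = [[1.0], [1.0], [0.0]]     # buy one share at 0, hold, liquidate at T
dq = generated_dividends(q, S, d)
print(dq)  # [-10.0, 1.0, 12.0]
```

The strategy costs 10 at time 0 and then simply passes through the security’s own dividends, which is what Equation (1) gives for a buy-and-hold position.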
2.2. Arbitrage, state prices, and martingales
Given a dividend–price pair (d, S) for N securities, a trading strategy q is an arbitrage if $d^q > 0$ (that is, if $d^q \ge 0$ and $d^q \ne 0$). An arbitrage is thus a trading strategy that costs nothing to form, never generates losses, and, with positive probability, will produce strictly positive gains at some time. One of the precepts of modern asset pricing theory is a notion of efficient markets under which there is no arbitrage. This is a reasonable axiom, for in the presence of an arbitrage, any rational investor who prefers to increase his dividends would undertake such arbitrages without limit, so markets could not be in equilibrium, in a sense that we shall see more formally later in this section. We will first explore the implications of no arbitrage for the representation of security prices in terms of “state prices”, the first step toward which is made with the following result.

Proposition. There is no arbitrage if and only if there is a strictly positive adapted process p such that, for any trading strategy q,

$$E\left(\sum_{t=0}^{T} p_t d^q_t\right) = 0.$$
Proof: Let Q denote the space of trading strategies. For any q and f in Q and scalars a and b, we have $a d^q + b d^f = d^{aq + bf}$. Thus, the marketed subspace $M = \{d^q : q \in Q\}$ of dividend processes generated by trading strategies is a linear subspace of the space L of adapted processes. Let $L_+ = \{c \in L : c \ge 0\}$. There is no arbitrage if and only if the cone $L_+$ and the marketed subspace M intersect precisely at zero. Suppose there is no arbitrage. The Separating Hyperplane Theorem, in a version for closed convex cones that is sometimes called Stiemke’s Lemma (see Appendix B of Duffie (2001)) implies the existence of a nonzero linear functional F such that F(x) < F(y) for each x in M and each nonzero y in $L_+$. Since M is a linear subspace, this implies that F(x) = 0 for each x in M, and thus that F(y) > 0 for each nonzero y in $L_+$. This implies that F is strictly increasing. By the Riesz representation theorem, for any such linear function F there is a unique adapted process p, called the Riesz representation of F, such that

$$F(x) = E\left(\sum_{t=0}^{T} p_t x_t\right), \qquad x \in L.$$

As F is strictly increasing, p is strictly positive, that is, $P(p_t > 0) = 1$ for all t. The converse follows from the fact that if $d^q > 0$ and p is a strictly positive process, then $E\left(\sum_{t=0}^{T} p_t d^q_t\right) > 0$.
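To make the Proposition concrete, here is a minimal Python sketch for a one-period market (T = 1) with two states and two securities, in which the process p can be solved for directly. The normalization $p_0 = 1$, the function names, and all prices and dividends are assumptions made for this illustration only.

```python
from fractions import Fraction as Fr

def state_price_density(S0, D, prob):
    """Solve S0[n] = sum_w prob[w] * p1[w] * D[n][w] for p1, normalizing p0 = 1.

    Two securities and two states; D[n][w] is the time-1 dividend of
    security n in state w, and S0[n] is its time-0 price.
    """
    (a, b), (c, e) = D
    det = a * e - b * c
    if det == 0:
        raise ValueError("dividends are collinear, so p1 is not pinned down")
    # Cramer's rule for the probability-weighted density psi[w] = prob[w] * p1[w].
    psi = [(S0[0] * e - b * S0[1]) / det, (a * S0[1] - c * S0[0]) / det]
    return [psi[w] / prob[w] for w in range(2)]

def deflated_dividend_sum(q, S0, D, prob, p1):
    """E(p0 d^q_0 + p1 d^q_1) for a buy-and-hold portfolio q, with p0 = 1."""
    cost = sum(qn * s for qn, s in zip(q, S0))
    payoff = [sum(qn * D[n][w] for n, qn in enumerate(q)) for w in range(2)]
    return -cost + sum(prob[w] * p1[w] * payoff[w] for w in range(2))

prob = [Fr(1, 2), Fr(1, 2)]
D = [[Fr(1), Fr(1)], [Fr(3), Fr(1)]]   # a bond and a stock (hypothetical)
fair = state_price_density([Fr(9, 10), Fr(17, 10)], D, prob)
rich = state_price_density([Fr(9, 10), Fr(3)], D, prob)
print(fair, rich)
```

With the first price vector, p is strictly positive (4/5 and 1), consistent with no arbitrage. With the stock priced at 3, the only candidate p has a negative entry, and indeed selling one share and buying three bonds yields 3/10 today with a nonnegative payoff in both states.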
For convenience, we call any strictly positive adapted process a deflator. A deflator p is a state-price density if, for all t,

$$S_t = \frac{1}{p_t} E_t\left(\sum_{j=t+1}^{T} p_j d_j\right). \qquad (2)$$

A state-price density is sometimes called a state-price deflator, a pricing kernel, or a marginal-rate-of-substitution process. For t = T, the right-hand side of Equation (2) is zero, so $S_T = 0$ whenever there is a state-price density. It can be shown as an exercise that a deflator p is a state-price density if and only if, for any trading strategy q,

$$q_t \cdot S_t = \frac{1}{p_t} E_t\left(\sum_{j=t+1}^{T} p_j d^q_j\right), \qquad t < T, \qquad (3)$$

meaning roughly that the market value of a trading strategy is, at any time, the state-price discounted expected future dividends generated by the strategy.

The gain process G for (d, S) is defined by $G_t = S_t + \sum_{j=1}^{t} d_j$, the price plus accumulated dividend. Given a deflator g, the deflated gain process $G^g$ is defined by $G^g_t = g_t S_t + \sum_{j=1}^{t} g_j d_j$. We can think of deflation as a change of numeraire.

Theorem. The dividend–price pair (d, S) admits no arbitrage if and only if there is a state-price density. A deflator p is a state-price density if and only if $S_T = 0$ and the state-price-deflated gain process $G^p$ is a martingale.

Proof: It can be shown as an easy exercise that a deflator p is a state-price density if and only if $S_T = 0$ and the state-price-deflated gain process $G^p$ is a martingale. Suppose there is no arbitrage. Then $S_T = 0$, for otherwise the strategy q is an arbitrage when defined by $q_t = 0$, $t < T$, $q_T = -S_T$. By the previous proposition, there is some deflator p such that $E\left(\sum_{t=0}^{T} d^q_t p_t\right) = 0$ for any strategy q. We must prove Equation (2), or equivalently, that $G^p$ is a martingale. Doob’s Optional Sampling Theorem states that an adapted process X is a martingale if and only if $E(X_\tau) = X_0$ for any stopping time $\tau \le T$. Consider, for an arbitrary security n and an arbitrary stopping time $\tau \le T$, the trading strategy q defined by $q^{(k)} = 0$ for $k \ne n$ and $q^{(n)}_t = 1$, $t < \tau$, with $q^{(n)}_t = 0$, $t \ge \tau$. Since $E\left(\sum_{t=0}^{T} p_t d^q_t\right) = 0$, we have

$$E\left(-S^{(n)}_0 p_0 + \sum_{t=1}^{\tau} p_t d^{(n)}_t + p_\tau S^{(n)}_\tau\right) = 0,$$

implying that the p-deflated gain process $G^{n,p}$ of security n satisfies $G^{n,p}_0 = E(G^{n,p}_\tau)$. Since τ is arbitrary, $G^{n,p}$ is a martingale, and since n is arbitrary, $G^p$ is a martingale. This shows that absence of arbitrage implies the existence of a state-price density. The converse is easy.
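The second statement of the Theorem can be checked numerically. The sketch below, under assumptions made purely for illustration (a two-period binomial tree with equally likely moves, and a hypothetical deflator p and dividend d), prices a security by backward recursion on Equation (2) and then verifies that the deflated gain process is a martingale at every node.

```python
from fractions import Fraction as Fr
from itertools import product

T = 2
MOVES = ("up", "down")   # each move has conditional probability 1/2 (assumed)
HALF = Fr(1, 2)

def nodes(t):
    """All histories of length t; a history identifies an event known at time t."""
    return list(product(MOVES, repeat=t))

# Hypothetical strictly positive deflator p and dividend d, keyed by history.
p = {(): Fr(1)}
d = {(): Fr(0)}
for t in range(1, T + 1):
    for h in nodes(t):
        ups = h.count("up")
        p[h] = Fr(9, 10) ** t * Fr(4, 5) ** ups
        d[h] = Fr(1) + ups

# Equation (2) by backward recursion: p_t S_t = E_t[p_{t+1} (S_{t+1} + d_{t+1})].
S = {h: Fr(0) for h in nodes(T)}   # S_T = 0
for t in range(T - 1, -1, -1):
    for h in nodes(t):
        S[h] = sum(HALF * p[h + (m,)] * (S[h + (m,)] + d[h + (m,)]) for m in MOVES) / p[h]

def deflated_gain(h):
    """G^p at history h: p_t S_t plus deflated dividends accumulated over j = 1..t."""
    return p[h] * S[h] + sum(p[h[:j]] * d[h[:j]] for j in range(1, len(h) + 1))

# Martingale check: E_t[G^p_{t+1}] = G^p_t at every non-terminal node.
is_martingale = all(
    sum(HALF * deflated_gain(h + (m,)) for m in MOVES) == deflated_gain(h)
    for t in range(T) for h in nodes(t)
)
print(S[()], is_martingale)
```

Exact rational arithmetic is used so that the martingale identity holds with equality rather than up to rounding.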
The proof is motivated by those of Harrison and Kreps (1979) and Harrison and Pliska (1981) for a similar result to follow in this section regarding the notion of an “equivalent martingale measure”. Ross (1987), Prisman (1985), Kabanov and Stricker (2001), and Schachermayer (2001) show the impact of taxes or transactions costs on the state-pricing model.

2.3. Individual agent optimality
We introduce an agent, defined by a strictly increasing³ utility function U on the set $L_+$ of nonnegative adapted “consumption” processes, and by an endowment process e in $L_+$. Given a dividend-price process (d, S), a trading strategy q leaves the agent with the total consumption process $e + d^q$. Thus the agent has the budget-feasible consumption set

$$C = \{e + d^q \in L_+ : q \in Q\},$$

and the problem

$$\sup_{c \in C} U(c). \qquad (4)$$

The existence of a solution to Problem (4) implies the absence of arbitrage. Conversely, if U is continuous,⁴ then the absence of arbitrage implies that there exists a solution to Problem (4). (This follows from the fact that the feasible consumption set C is compact if and only if there is no arbitrage.)

Assuming that (4) has a strictly positive solution $c^*$ and that U is continuously differentiable at $c^*$, we can use the first-order conditions for optimality to characterize security prices in terms of the derivatives of the utility function U at $c^*$. Specifically, for any c in L, the derivative of U at $c^*$ in the direction c is $g'(0)$, where $g(a) = U(c^* + ac)$ for any scalar a sufficiently small in absolute value. That is, $g'(0)$ is the marginal rate of improvement of utility as one moves in the direction c away from $c^*$. This directional derivative is denoted $\nabla U(c^*; c)$. Because U is continuously differentiable at $c^*$, the function that maps c to $\nabla U(c^*; c)$ is linear. Since $d^q$ is a budget-feasible direction of change for any trading strategy q, the first-order conditions for optimality of $c^*$ imply that

$$\nabla U(c^*; d^q) = 0, \qquad q \in Q.$$

We now have a characterization of a state-price density.

³ A function $f : L \to \mathbb{R}$ is strictly increasing if f(c) > f(b) whenever c > b.
⁴ For purposes of checking continuity or the closedness of sets in L, we will say that $c_n$ converges to c if $E\left[\sum_{t=0}^{T} |c_n(t) - c(t)|\right] \to 0$. Then U is continuous if $U(c_n) \to U(c)$ whenever $c_n \to c$.
Proposition. Suppose that Problem (4) has a strictly positive solution $c^*$ and that U has a strictly positive continuous derivative at $c^*$. Then there is no arbitrage and a state-price density is given by the Riesz representation p of $\nabla U(c^*)$, defined by

$$\nabla U(c^*; x) = E\left(\sum_{t=0}^{T} p_t x_t\right), \qquad x \in L.$$
The Riesz representation of the utility gradient is also sometimes called the marginal-rates-of-substitution process. Despite our standing assumption that U is strictly increasing, $\nabla U(c^*; \cdot)$ need not in general be strictly increasing, but is so if U is concave. As an example, suppose U has the additive form

$$U(c) = E\left[\sum_{t=0}^{T} u_t(c_t)\right], \qquad c \in L_+, \qquad (5)$$

for some $u_t : \mathbb{R}_+ \to \mathbb{R}$, $t \ge 0$. It is an exercise to show that if $\nabla U(c)$ exists, then

$$\nabla U(c; x) = E\left[\sum_{t=0}^{T} u_t'(c_t)\, x_t\right]. \qquad (6)$$

If, for all t, $u_t$ is concave with an unbounded derivative and e is strictly positive, then any solution $c^*$ to Problem (4) is strictly positive.

Corollary. Suppose U is defined by Equation (5). Under the conditions of the Proposition, for any time t < T,

$$S_t = \frac{1}{u_t'(c^*_t)} E_t\left[u_{t+1}'(c^*_{t+1})\,(S_{t+1} + d_{t+1})\right].$$
This result is often called the stochastic Euler equation, made famous in a time-homogeneous Markov setting by Lucas (1978). A precursor is due to LeRoy (1973).

2.4. Habit and recursive utilities
The additive utility model is extremely restrictive, and routinely found to be inconsistent with experimental evidence on choice under uncertainty, as for example in Plott (1986). We will illustrate the state pricing associated with some simple extensions of the additive utility model, such as “habit-formation” utility and “recursive utility”.
An example of a habit-formation utility is some $U : L_+ \to \mathbb{R}$ with

$$U(c) = E\left[\sum_{t=0}^{T} u(c_t, h_t)\right],$$

where $u : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$ is continuously differentiable and, for any t, the “habit” level of consumption is defined by $h_t = \sum_{j=1}^{t} a_j c_{t-j}$ for some $a \in \mathbb{R}^T_+$. For example, we could take $a_j = g^j$ for $g \in (0, 1)$, which gives geometrically declining weights on past consumption. A natural motivation is that the relative desire to consume may be increased if one has become accustomed to high levels of consumption. By applying the chain rule, we can calculate the Riesz representation p of the gradient of U at a strictly positive consumption process c as

$$p_t = u_c(c_t, h_t) + E_t\left(\sum_{s>t} u_h(c_s, h_s)\, a_{s-t}\right),$$
where $u_c$ and $u_h$ denote the partial derivatives of u with respect to its first and second arguments, respectively. The habit-formation utility model was developed by Dunn and Singleton (1986) and in continuous time by Ryder and Heal (1973), and has been applied to asset-pricing problems by Constantinides (1990), Sundaresan (1989) and Chapman (1998).

Recursive utility, inspired by Koopmans (1960), Kreps and Porteus (1978) and Selden (1978), was developed for general discrete-time multi-period asset-pricing applications by Epstein and Zin (1989), who take a utility of the form $U(c) = V_0$, where the “utility process” V is defined recursively, backward in time from T, by

$$V_t = F(c_t, \sim V_{t+1} \mid \mathcal{F}_t),$$

where $\sim V_{t+1} \mid \mathcal{F}_t$ denotes the probability distribution of $V_{t+1}$ given $\mathcal{F}_t$, where F is a measurable real-valued function whose first argument is a non-negative real number and whose second argument is a probability distribution, and finally where we take $V_{T+1}$ to be a fixed exogenously specified random variable. One may view $V_t$ as the utility at time t for present and future consumption, noting the dependence on the future consumption stream through the conditional distribution of the following period’s utility. As a special case, for example, consider

$$F(x, m) = f(x, E[h(Y_m)]), \qquad (7)$$

where f is a function in two real variables, h(·) is a “felicity” function in one variable, and $Y_m$ is any random variable whose probability distribution is m. This special case of the “Kreps–Porteus utility” aggregates the role of the conditional distribution of future consumption through an “expected utility of next period’s utility”. If h and f are concave and increasing functions, then U is concave and increasing. If $h(v) = v$ and if $f(x, y) = u(x) + by$ for some $u : \mathbb{R}_+ \to \mathbb{R}$ and constant b > 0, then (for $V_{T+1} = 0$) we recover the special case of additive utility given by

$$U(c) = E\left[\sum_{t} b^t u(c_t)\right].$$

“Non-expected-utility” aggregation of future consumption utility can be based, for example, upon the local-expected-utility model of Machina (1982) and the betweenness-certainty-equivalent model of Chew (1983, 1989), Dekel (1989) and Gul and Lantto (1990). With recursive utility, as opposed to additive utility, it need not be the case that the degree of risk aversion is completely determined by the elasticity of intertemporal substitution. For the special case (Equation 7) of expected-utility aggregation, and with differentiability throughout, we have the utility gradient representation

$$p_t = f_1(c_t, E_t[h(V_{t+1})]) \prod_{s<t} f_2(c_s, E_s[h(V_{s+1})])\, E_s[h'(V_{s+1})],$$

where $f_i$ denotes the partial derivative of f with respect to its ith argument. Recursive utility allows for preference over early or late resolution of uncertainty (which have no impact on additive utility). This is relevant for asset prices, as for example in the context of remarks by Ross (1989), and as shown by Skiadas (1998) and Duffie, Schroder and Skiadas (1997). Grant, Kajii and Polak (2000) have more to say on preferences for the resolution of information. The equilibrium state-price density associated with recursive utility is computed in a Markovian setting by Kan (1995).⁵ For further justification and properties of recursive utility, see Chew and Epstein (1991) and Skiadas (1997, 1998). For further implications for asset pricing, see Epstein (1988, 1992), Epstein and Zin (1999) and Giovannini and Weil (1989).
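The backward recursion defining the utility process under the expected-utility aggregator of Equation (7) can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: a two-period binomial tree with equally likely moves, a hypothetical consumption process, $V_{T+1} = 0$, and a quadratic u chosen so that rational arithmetic is exact. The check at the end confirms that the choice $h(v) = v$, $f(x, y) = u(x) + by$ reproduces additive utility.

```python
from fractions import Fraction as Fr
from itertools import product

T = 2
MOVES = ("up", "down")   # equally likely moves (an assumption of this sketch)
HALF = Fr(1, 2)

def consumption(h):
    """Hypothetical adapted consumption: 1 plus 1/4 per up move so far."""
    return Fr(1) + Fr(1, 4) * h.count("up")

def recursive_utility(f, h_fn, h=()):
    """V_t = f(c_t, E_t[h(V_{t+1})]) at history h, taking V_{T+1} = 0."""
    if len(h) == T:
        cont = h_fn(Fr(0))
    else:
        cont = sum(HALF * h_fn(recursive_utility(f, h_fn, h + (m,))) for m in MOVES)
    return f(consumption(h), cont)

b = Fr(9, 10)

def u(x):
    # Illustrative period utility, increasing on the consumption range used here.
    return x - x * x / 10

# Special case h(v) = v and f(x, y) = u(x) + b*y.
V0 = recursive_utility(lambda x, y: u(x) + b * y, lambda v: v)

# Additive utility E[sum_t b^t u(c_t)], computed path by path.
additive = sum(
    HALF ** T * sum(b ** t * u(consumption(path[:t])) for t in range(T + 1))
    for path in product(MOVES, repeat=T)
)
print(V0 == additive)  # True
```

Other aggregators can be explored by passing a different pair (f, h) to `recursive_utility`; only the additive special case is verified here.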
⁵ Kan (1993) further explored the utility gradient representation of recursive utility in this setting.

2.5. Equilibrium and Pareto optimality
Now, we explore the implications of multi-agent equilibrium for state prices. A key objective is to link state prices with important macro-economic variables that are, hopefully, observable, such as total economy-wide consumption. Suppose there are m agents. Agent i is defined as above by a strictly increasing utility function $U_i : L_+ \to \mathbb{R}$ and an endowment process $e^{(i)}$ in $L_+$. Given a dividend process d for N securities, an equilibrium is a collection $(q^{(1)}, \ldots, q^{(m)}, S)$, where S is a security-price process and, for each agent i, $q^{(i)}$ is a trading strategy solving

$$\sup_{q \in Q} U_i\left(e^{(i)} + d^q\right),$$

with $\sum_{i=1}^{m} q^{(i)} = 0$.

We define markets to be complete if, for each process x in L, there is some trading strategy q with $d^q_t = x_t$, $t \ge 1$. Complete markets thus means that any consumption process x can be obtained by investing some amount at time 0 in a trading strategy that, at each future period t, generates the dividend $x_t$. The First Welfare Theorem is that complete-markets equilibria provide efficient consumption allocations. Specifically, an allocation $(c^{(1)}, \ldots, c^{(m)})$ of consumption processes to the m agents is feasible if $c^{(1)} + \cdots + c^{(m)} \le e^{(1)} + \cdots + e^{(m)}$, and is Pareto optimal if there is no feasible allocation $(b^{(1)}, \ldots, b^{(m)})$ such that $U_i(b^{(i)}) \ge U_i(c^{(i)})$ for all i, with strict inequality for some i. Any equilibrium $(q^{(1)}, \ldots, q^{(m)}, S)$ has an associated feasible consumption allocation $(c^{(1)}, \ldots, c^{(m)})$ defined by letting $c^{(i)} - e^{(i)}$ be the dividend process generated by $q^{(i)}$.

First Welfare Theorem. Suppose $(q^{(1)}, \ldots, q^{(m)}, S)$ is an equilibrium and markets are complete. Then the associated consumption allocation is Pareto optimal.

An easy proof is due to Arrow (1951). Suppose, with the objective of obtaining a contradiction, that $(c^{(1)}, \ldots, c^{(m)})$ is the consumption allocation of a complete-markets equilibrium and that there is a feasible allocation $(b^{(1)}, \ldots, b^{(m)})$ such that $U_i(b^{(i)}) \ge U_i(c^{(i)})$ for all i, with strict inequality for some i. Because of equilibrium, there is no arbitrage, and therefore a state-price density p. For any consumption process x, let $p \cdot x = E\left(\sum_t p_t x_t\right)$. We have $p \cdot b^{(i)} \ge p \cdot c^{(i)}$, for otherwise, given complete markets, the utility of $c^{(i)}$ can be increased strictly by some feasible trading strategy generating $b^{(i)} - e^{(i)}$. Similarly, for at least some agent, we also have $p \cdot b^{(i)} > p \cdot c^{(i)}$. Thus

$$p \cdot \sum_i b^{(i)} > p \cdot \sum_i c^{(i)} = p \cdot \sum_i e^{(i)},$$

the equality from the market-clearing condition $\sum_i q^{(i)} = 0$. This is impossible, however, for feasibility implies that $\sum_i b^{(i)} \le \sum_i e^{(i)}$. This contradiction implies the result.

Duffie and Huang (1985) characterize the number of securities necessary for complete markets. Roughly speaking, extending the spanning insight of Arrow (1953) to allow for dynamic spanning, it is necessary (and generically sufficient) that there are at least as many securities as the maximal number of mutually exclusive events of positive conditional probability that could be revealed between two dates. For example, if the information generated at each date is that of a coin toss, then complete markets requires a minimum of two securities, and almost any two will suffice. Cox, Ross
and Rubinstein (1979) provide the classical example in which one of the original securities has “binomial” returns and the other has riskless returns. That is, S = (Y, Z) is strictly positive, and, for all t < T, we have $d_t = 0$, $Y_{t+1}/Y_t$ is a Bernoulli trial, and $Z_{t+1}/Z_t$ is a constant. More generally, however, to be assured of complete markets given the minimal number of securities, one must verify that the price process, which is endogenous, is not among the rare set that is associated with a reduced market span, a point emphasized by Hart (1975) and dealt with by Magill and Shafer (1990). In general, the dependence of the marketed subspace on endogenous security price processes makes the demonstration and calculation of an equilibrium problematic. Conditions for the generic existence of equilibrium in incomplete markets are given by Duffie and Shafer (1985, 1986). The literature on this topic is extensive.⁶ Hahn (1994) raises some philosophical issues regarding the possibility of complete markets and efficiency, in a setting in which endogenous uncertainty may be of concern to investors. The Pareto inefficiency of incomplete markets equilibrium consumption allocations, and notions of constrained efficiency, are discussed by Hart (1975), Kreps (1979) (and references therein), Citanna, Kajii and Villanacci (1994), Citanna and Villanacci (1993) and Pan (1993, 1995). The optimality of individual portfolio and consumption choices in incomplete markets in this setting is given a dual interpretation by He and Pagès (1993). [Girotto and Ortu (1994) offer related remarks.] Methods for computation of equilibrium with incomplete markets are developed by Brown, DeMarzo and Eaves (1996a,b), Cuoco and He (1994), DeMarzo and Eaves (1996) and Dumas and Maenhout (2002). Kraus and Litzenberger (1975) and Stapleton and Subrahmanyam (1978) gave early parametric examples of equilibrium.

⁶ Bottazzi (1995) has a somewhat more advanced version of existence in a single-period multiple-commodity version. Related existence topics are studied by Bottazzi and Hens (1996), Hens (1991) and Zhou (1997). The literature is reviewed in depth by Geanakoplos (1990). Alternative proofs of existence of equilibrium are given in the 2-period version of the model by Geanakoplos and Shafer (1990), Hirsch, Magill and Mas-Colell (1990) and Husseini, Lasry and Magill (1990); and in a T-period version by Florenzano and Gourdel (1994). If one defines security dividends in nominal terms, rather than in units of consumption, then equilibria always exist under standard technical conditions on preferences and endowments, as shown by Cass (1984), Werner (1985), Duffie (1987) and Gottardi and Hens (1996), although equilibrium may be indeterminate, as shown by Cass (1989) and Geanakoplos and Mas-Colell (1989). On this point, see also Kydland and Prescott (1991), Mas-Colell (1991) and Cass (1991). Surveys of general equilibrium models in incomplete markets settings are given by Cass (1991), Duffie (1992), Geanakoplos (1990), Magill and Quinzii (1996) and Magill and Shafer (1991). Hindy and Huang (1993) show the implications of linear collateral constraints on security valuation.

2.6. Equilibrium asset pricing
We will review a representative-agent state-pricing model of Constantinides (1982). The idea is to deduce a state-price density from aggregate, rather than individual, consumption behavior. Among other advantages, this allows for a version of the consumption-based capital asset pricing model of Breeden (1979) in the special case of locally-quadratic utility. We define, for each vector l in $\mathbb{R}^m_+$ of “agent weights”, the utility function $U_l : L_+ \to \mathbb{R}$ by

$$U_l(x) = \sup_{(c^{(1)}, \ldots, c^{(m)})} \sum_{i=1}^{m} l_i U_i(c^{(i)}) \quad \text{subject to} \quad c^{(1)} + \cdots + c^{(m)} \le x. \qquad (8)$$
Proposition. Suppose for all i that $U_i$ is concave and strictly increasing. Suppose that $(q^{(1)}, \ldots, q^{(m)}, S)$ is an equilibrium and that markets are complete. Then there exists some nonzero $l \in \mathbb{R}^m_+$ such that (0, S) is a (no-trade) equilibrium for the one-agent economy $[(U_l, e), d]$, where $e = e^{(1)} + \cdots + e^{(m)}$. With this l and with $x = e = e^{(1)} + \cdots + e^{(m)}$, problem (8) is solved by the equilibrium consumption allocation.

A method of proof, as well as the intuition for this proposition, is that with complete markets, a state-price density p represents Lagrange multipliers for consumption in the various periods and states for all of the agents simultaneously, as well as for some representative agent $(U_l, e)$, whose agent-weight vector l defines a hyperplane separating the set of feasible utility improvements from $\mathbb{R}^m_+$. [See, for example, Duffie (2001) for details. This notion of “representative agent” is weaker than that associated with aggregation in the sense of Gorman (1953).]

Corollary 1. If, moreover, $U_l$ is continuously differentiable at e, then l can be chosen so that a state-price density is given by the Riesz representation of $\nabla U_l(e)$.

Corollary 2. Suppose, for each i, that $U_i$ is of the additive form

$$U_i(c) = E\left[\sum_{t=0}^{T} u_{it}(c_t)\right].$$

Then $U_l$ is also additive, with

$$U_l(c) = E\left[\sum_{t=0}^{T} u_{lt}(c_t)\right],$$

where

$$u_{lt}(y) = \sup_{x \in \mathbb{R}^m_+} \sum_{i=1}^{m} l_i u_{it}(x_i) \quad \text{subject to} \quad x_1 + \cdots + x_m \le y.$$

In this case, the differentiability of $U_l$ at e implies that for any times t and $\tau \ge t$,

$$S_t = \frac{1}{u_{lt}'(e_t)} E_t\left[u_{l\tau}'(e_\tau)\, S_\tau + \sum_{j=t+1}^{\tau} u_{lj}'(e_j)\, d_j\right]. \qquad (9)$$
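The static maximization that defines $u_{lt}$ in Corollary 2 can be explored numerically. The Python sketch below is only an illustration under strong assumptions that are not part of the text: two agents, logarithmic period utilities $u_{it}(x) = \log x$, and a crude grid search. For this special case the maximizer is known in closed form (agent i receives the share $l_i/(l_1 + l_2)$ of y), which gives something to compare against.

```python
import math

def u_rep_grid(y, weights, n_grid=20001):
    """Grid approximation of sup l1*log(x1) + l2*log(x2) s.t. x1 + x2 <= y.

    Log utility is strictly increasing, so the constraint binds and we
    search over x1 in (0, y) with x2 = y - x1.
    """
    l1, l2 = weights
    best = -math.inf
    for k in range(1, n_grid):
        x1 = y * k / n_grid
        best = max(best, l1 * math.log(x1) + l2 * math.log(y - x1))
    return best

def u_rep_closed_form(y, weights):
    """Closed form for log utilities: agent i consumes l_i * y / sum(l)."""
    total = sum(weights)
    return sum(l * math.log(l * y / total) for l in weights)

weights = (1.0, 3.0)   # hypothetical agent weights l
y = 2.0                # hypothetical aggregate consumption level
gap = abs(u_rep_grid(y, weights) - u_rep_closed_form(y, weights))
print(gap < 1e-6)
```

In this log-utility case the representative agent’s marginal utility is $(l_1 + l_2)/y$, so the state-price density in Equation (9) depends on the agents only through aggregate consumption and the sum of the weights.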
2.7. Breeden’s consumption-based CAPM
The consumption-based capital asset-pricing model (CAPM) of Breeden (1979) extends the results of Rubinstein (1976) by showing that if agents have additive utility that is locally quadratic, then expected asset returns are linear with respect to their covariances with aggregate consumption, as will be stated more carefully shortly. Notably, the result does not depend on complete markets. Locally quadratic additive utility is an extremely strong assumption. (It does not violate monotonicity, as utility need not be quadratic at all levels.) Breeden actually worked in a continuous-time setting of Brownian information, reviewed shortly, within which smooth additive utility functions are automatically locally quadratic, in a sense that is sufficient to recover a continuous-time analogue of the following consumption-based CAPM.⁷ In a one-period setting, the consumption-based CAPM corresponds to the classical CAPM of Sharpe (1964).

First, we need some preliminary definitions. The return at time t + 1 on a trading strategy q whose market value $q_t \cdot S_t$ is non-zero is

$$R^q_{t+1} = \frac{q_t \cdot (S_{t+1} + d_{t+1})}{q_t \cdot S_t}.$$
There is short-term riskless borrowing if, for each given time t < T , there is a trading strategy q with Ft -conditionally deterministic return, denoted rt . We refer to the sequence {r0 , r1 , . . . , rT − 1 } of such short-term risk-free returns as the associated “short-rate process”, even though rT is not defined. Conditional on Ft , we let var t (·) and covt (·) denote variance and covariance, respectively. Proposition: Consumption-based CAPM. T Suppose, for each agent i, that the utility Ui (·) is of the additive form Ui (c) = E[ t = 0 uit (ct )], and moreover that, for equilibrium consumption processes c(1) , . . . , c(m) , we have uit (ct(i) ) = ait + bit ct(i) , where ait and bit > 0 are constants. Let S be the associated equilibrium price process of the securities. Then, for any time t, St = At Et (dt + 1 + St + 1 ) − Bt Et [(St + 1 + dt + 1 ) et + 1 ] , for adapted strictly positive scalar processes A and B. For a given time t, suppose that there is riskless borrowing at the short rate rt . Then there is a trading strategy with the property that its return R∗t + 1 has maximal Ft -conditional correlation with the aggregate consumption et + 1 (among all trading strategies). Suppose, moreover, that
7
For a theorem and proof, see Duffie and Zame (1989).
D. Duffie
there is riskless borrowing at the short rate $r_t$ and that $\mathrm{var}_t(R^*_{t+1})$ is strictly positive. Then, for any trading strategy $\theta$ with return $R^{\theta}_{t+1}$,

$$E_t(R^{\theta}_{t+1} - r_t) = \beta^{\theta}_t \, E_t(R^*_{t+1} - r_t),$$

where

$$\beta^{\theta}_t = \frac{\mathrm{cov}_t(R^{\theta}_{t+1}, R^*_{t+1})}{\mathrm{var}_t(R^*_{t+1})}.$$
The essence of the result is that the expected return of any security, in excess of risk-free rates, is increasing in the degree to which the security’s return depends (in the sense of regression) on aggregate consumption. This is natural; there is an average preference in favor of securities that are hedges against aggregate economic performance. While the consumption-based CAPM does not depend on complete markets, its reliance on locally quadratic expected utility, and otherwise perfect markets, is limiting, and its empirical performance is mixed, at best. For some evidence, see for example Hansen and Jagannathan (1990).

2.8. Arbitrage and martingale measures

This section shows the equivalence between the absence of arbitrage and the existence of “risk-neutral” probabilities, under which, roughly speaking, the price of a security is the sum of its expected discounted dividends. This idea, stemming from Cox and Ross (1976), was developed into the notion of equivalent martingale measures by Harrison and Kreps (1979).

We suppose throughout this subsection that there is short-term riskless borrowing at some uniquely defined short-rate process $r$. We can define, for any times $t$ and $\tau \le T$,

$$R_{t,\tau} = (1 + r_t)(1 + r_{t+1}) \cdots (1 + r_{\tau-1}),$$

the payback at time $\tau$ of one unit of account borrowed risklessly at time $t$ and “rolled over” in short-term borrowing repeatedly until date $\tau$.

It would be a simple situation, both computationally and conceptually, if any security’s price were merely the expected discounted dividends of the security. Of course, this is unlikely to be the case in a market with risk-averse investors. We can nevertheless come close to this sort of characterization of security prices by adjusting the original probability measure $P$. For this, we define a new probability measure $Q$ to be equivalent to $P$ if $Q$ and $P$ assign zero probabilities to the same events.
An equivalent probability measure $Q$ is an equivalent martingale measure if

$$S_t = E^Q_t\left(\sum_{j=t+1}^{T} \frac{\delta_j}{R_{t,j}}\right), \quad t < T,$$

where $E^Q$ denotes expectation under $Q$, and $E^Q_t(X) = E^Q(X \mid \mathcal{F}_t)$ for any random variable $X$.
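As a rough illustration of this definition, the following Python sketch computes the roll-over payback and the expected discounted dividends at time 0 on a small binomial event tree. The deterministic short rates, the up-move probability `q_up` under the candidate measure, and the helper names `rollover` and `emm_price` are hypothetical choices made for this example only; they are not taken from the text.

```python
from itertools import product


def rollover(r, t, tau):
    """Payback at tau of one unit borrowed at t and rolled over: (1+r_t)...(1+r_{tau-1})."""
    payback = 1.0
    for s in range(t, tau):
        payback *= 1.0 + r[s]
    return payback


def emm_price(dividend, r, q_up, T):
    """Time-0 value of E^Q[sum_{j=1}^T dividend_j / R_{0,j}] over all paths of length T.

    dividend(j, history) is the dividend paid at date j given the first j moves
    (1 = up, 0 = down); moves are assumed i.i.d. under Q for simplicity.
    """
    price = 0.0
    for path in product((1, 0), repeat=T):
        prob = 1.0
        for move in path:
            prob *= q_up if move else 1.0 - q_up
        discounted = sum(dividend(j, path[:j]) / rollover(r, 0, j)
                         for j in range(1, T + 1))
        price += prob * discounted
    return price


short_rates = [0.05, 0.03]          # hypothetical r_0, r_1
# A claim paying 1 at date 2 in every state is priced at 1 / R_{0,2}.
unit_claim = emm_price(lambda j, hist: 1.0 if j == 2 else 0.0, short_rates, 0.4, 2)
```

Because the unit claim is riskless, its value does not depend on `q_up`; a state-dependent dividend function would.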
Ch. 11:
Intertemporal Asset Pricing Theory
It is easy to show that $Q$ is an equivalent martingale measure if and only if, for any trading strategy $\theta$,

$$\theta_t \cdot S_t = E^Q_t\left(\sum_{j=t+1}^{T} \frac{\delta^{\theta}_j}{R_{t,j}}\right), \quad t < T. \tag{10}$$
We will show that the absence of arbitrage is equivalent to the existence of an equivalent martingale measure. The deflator $\gamma$ defined by $\gamma_t = R_{0,t}^{-1}$ defines the discounted gain process $G^{\gamma}$, by $G^{\gamma}_t = \gamma_t S_t + \sum_{j=1}^{t} \gamma_j \delta_j$. The word “martingale” in the term “equivalent martingale measure” comes from the following equivalence.

Lemma. A probability measure $Q$ equivalent to $P$ is an equivalent martingale measure for $(\delta, S)$ if and only if $S_T = 0$ and the discounted gain process $G^{\gamma}$ is a martingale with respect to $Q$.

If, for example, a security pays no dividends before $T$, then the property described by the lemma is that the discounted price process is a $Q$-martingale.

We already know that the absence of arbitrage is equivalent to the existence of a state-price density $\pi$. A probability measure $Q$ equivalent to $P$ can be defined in terms of a Radon–Nikodym derivative, a strictly positive random variable $\frac{dQ}{dP}$ with $E(\frac{dQ}{dP}) = 1$, via the definition of expectation with respect to $Q$ given by $E^Q(Z) = E(\frac{dQ}{dP} Z)$, for any random variable $Z$. We will consider the measure $Q$ defined by $\frac{dQ}{dP} = \xi_T$, where

$$\xi_T = \frac{\pi_T R_{0,T}}{\pi_0}.$$
(Indeed, one can check by applying the definition of a state-price density to the payoff $R_{0,T}$ that $\xi_T$ is strictly positive and of expectation 1.) The density process $\xi$ for $Q$ is defined by $\xi_t = E_t(\xi_T)$. Bayes’ Rule implies that for any times $t$ and $j > t$, and any $\mathcal{F}_j$-measurable random variable $Z_j$,

$$E^Q_t(Z_j) = \frac{1}{\xi_t} E_t(\xi_j Z_j). \tag{11}$$
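The construction of $Q$ from a state-price density can be checked numerically in a one-period, finite-state setting. The sketch below is only illustrative: the three states, the probabilities `P`, and the density values `pi0` and `piT` are hypothetical numbers, and the function names are invented for this example. It builds the Radon–Nikodym derivative from the state-price density and compares valuation by state prices with valuation by discounted expectation under $Q$.

```python
# One-period, three-state sketch; all numbers below are hypothetical.
P = [0.3, 0.5, 0.2]          # reference probabilities of the three states
pi0 = 1.0                    # state-price density at time 0
piT = [1.10, 0.95, 0.70]     # state-price density at time T, state by state


def expect(prob, values):
    """Expectation of a state-contingent quantity under the given probabilities."""
    return sum(p * v for p, v in zip(prob, values))


# Riskless payback R_{0,T}: one unit invested at 0 must satisfy pi0 = E[piT * R_{0,T}].
R0T = pi0 / expect(P, piT)

# Radon-Nikodym derivative xi_T = piT * R_{0,T} / pi0 and the implied measure Q.
xiT = [v * R0T / pi0 for v in piT]
Q = [p * x for p, x in zip(P, xiT)]


def price_state_density(payoff):
    """Time-0 price E[piT * payoff] / pi0."""
    return expect(P, [a * b for a, b in zip(piT, payoff)]) / pi0


def price_martingale_measure(payoff):
    """Time-0 price E^Q[payoff] / R_{0,T}."""
    return expect(Q, payoff) / R0T
```

For any payoff vector, for instance `[3.0, 1.0, 0.5]`, the two pricing functions should agree up to rounding, which is the one-period content of Equation (10).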
Fixing some time $t < T$, consider a trading strategy $\theta$ that invests one unit of account at time $t$ and repeatedly rolls the value over in short-term riskless borrowing until time $T$, with final value $R_{t,T}$. That is, $\theta_t \cdot S_t = 1$ and $\delta^{\theta}_T = R_{t,T}$. Relation (3) then implies that

$$\pi_t = E_t(\pi_T R_{t,T}) = \frac{E_t(\pi_T R_{0,T})}{R_{0,t}} = \frac{E_t(\xi_T \pi_0)}{R_{0,t}} = \frac{\xi_t \pi_0}{R_{0,t}}. \tag{12}$$

From Equations (11), (12), and the definition of a state-price density, Equation (10) is satisfied, so $Q$ is indeed an equivalent martingale measure. We have shown the following result.
Theorem. There is no arbitrage if and only if there exists an equivalent martingale measure. Moreover, $\pi$ is a state-price density if and only if an equivalent martingale measure $Q$ has the density process $\xi$ defined by $\xi_t = R_{0,t} \pi_t / \pi_0$.

This martingale approach simplifies many asset-pricing problems that might otherwise appear to be quite complex, and applies much more generally than indicated here. For example, the assumption of short-term borrowing is merely a convenience, and one can typically obtain an equivalent martingale measure after normalizing prices and dividends by the price of some particular security (or trading strategy). Girotto and Ortu (1996) present general results of this type for this finite-dimensional setting. Dalang, Morton and Willinger (1990) gave a general discrete-time result on the equivalence of no arbitrage and the existence of an equivalent martingale measure, covering even the case with infinitely many states.

2.9. Valuation of redundant securities

Suppose that the dividend–price pair $(\delta, S)$ for the $N$ given securities is arbitrage-free, with an associated state-price density $\pi$. Now consider the introduction of a new security with dividend process $\hat\delta$ and price process $\hat S$. We say that $\hat\delta$ is redundant given $(\delta, S)$ if there exists a trading strategy $\theta$, with respect to only the original security dividend–price process $(\delta, S)$, that replicates $\hat\delta$, in the sense that $\delta^{\theta}_t = \hat\delta_t$, $t \ge 1$.

If $\hat\delta$ is redundant given $(\delta, S)$, then the absence of arbitrage for the “augmented” dividend–price process $[(\delta, \hat\delta), (S, \hat S)]$ implies that $\hat S_t = Y_t$, where

$$Y_t = \frac{1}{\pi_t} E_t\left(\sum_{j=t+1}^{T} \pi_j \hat\delta_j\right), \quad t < T.$$
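A one-period binomial sketch may help fix ideas about redundancy. In the Python code below, the stock and bond prices, the probability `P_UP`, and the function names are hypothetical, chosen only for illustration. A new security (here a call-style payoff) is replicated with the stock and bond, and the cost of the replicating portfolio is compared with the value obtained from a state-price density that prices the original two securities.

```python
from fractions import Fraction

# Hypothetical one-period market: a stock and a bond, two states (up, down).
S0, S_UP, S_DOWN = Fraction(100), Fraction(120), Fraction(90)
B0, BT = Fraction(1), Fraction(105, 100)
P_UP = Fraction(3, 5)                     # reference probability of the up state


def replicate(payoff_up, payoff_down):
    """Units of (stock, bond) whose time-T value matches the payoff in both states."""
    stock = (payoff_up - payoff_down) / (S_UP - S_DOWN)
    bond = (payoff_up - stock * S_UP) / BT
    return stock, bond


def state_price_density():
    """Return (pi_0, (pi_T up, pi_T down)) consistent with the stock and bond prices."""
    psi_up = (S0 * BT - S_DOWN * B0) / (BT * (S_UP - S_DOWN))   # state price of 'up'
    psi_down = B0 / BT - psi_up                                 # state price of 'down'
    return Fraction(1), (psi_up / P_UP, psi_down / (1 - P_UP))


def density_value(payoff_up, payoff_down):
    """E[pi_T * payoff] / pi_0, the value of a dividend paid at T."""
    pi0, (pi_up, pi_down) = state_price_density()
    return (P_UP * pi_up * payoff_up + (1 - P_UP) * pi_down * payoff_down) / pi0


stock_units, bond_units = replicate(Fraction(20), Fraction(0))
replication_cost = stock_units * S0 + bond_units * B0
```

Exact rational arithmetic is used so that the replication cost and the state-price valuation can be compared with `==` rather than with a tolerance.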
If this were not the case, there would be an arbitrage, as follows. For example, suppose that for some stopping time $\tau$, we have $\hat S_\tau > Y_\tau$, and that $\tau \le T$ with strictly positive probability. We can then define the strategy:
(a) Sell the redundant security $\hat\delta$ at time $\tau$ for $\hat S_\tau$, and hold this position until $T$.
(b) Invest $\theta_\tau \cdot S_\tau$ at time $\tau$ in the replicating strategy $\theta$, and follow this strategy until $T$.
Since the dividends generated by this combined strategy (a)–(b) after $\tau$ are zero, the only dividend is at $\tau$, for the amount $\hat S_\tau - Y_\tau > 0$, which means that this is an arbitrage. Likewise, if $\hat S_\tau < Y_\tau$ for some non-trivial stopping time $\tau$, the opposite strategy is an arbitrage. We have shown the following.

Proposition. Suppose $(\delta, S)$ is arbitrage-free with state-price density $\pi$. Let $\hat\delta$ be a redundant dividend process with price process $\hat S$. Then the augmented dividend–price pair $[(\delta, \hat\delta), (S, \hat S)]$ is arbitrage-free if and only if it has $\pi$ as a state-price density.

In applications, it is often assumed that $(\delta, S)$ generates complete markets, in which case any additional security is redundant, as in the classical “binomial” model of
Cox, Ross and Rubinstein (1979), and its continuous-time analogue, the Black–Scholes option pricing model, coming up in the next section. Complete markets means that every new security is redundant.

Theorem. Suppose that $\mathcal{F}_T = \mathcal{F}$ and there is no arbitrage. Then markets are complete if and only if there is a unique equivalent martingale measure.

Banz and Miller (1978) and Breeden and Litzenberger (1978) explore the ability to deduce state prices from the valuation of derivative securities.

2.10. American exercise policies and valuation

We now extend our pricing framework to include a family of securities, called “American,” for which there is discretion regarding the timing of cash flows. Given an adapted process $X$, each finite-valued stopping time $\tau$ generates a dividend process $\delta^{X,\tau}$ defined by $\delta^{X,\tau}_t = 0$, $t \ne \tau$, and $\delta^{X,\tau}_\tau = X_\tau$. In this context, a finite-valued stopping time is an exercise policy, determining the time at which to accept payment. Any exercise policy $\tau$ is constrained by $\tau \le \bar\tau$, for some expiration time $\bar\tau \le T$. (In what follows, we might take $\bar\tau$ to be a stopping time, which is useful for the case of certain knockout options.) We say that $(X, \bar\tau)$ defines an American security. The exercise policy is selected by the holder of the security. Once exercised, the security has no remaining cash flows. A standard example is an American put option on a security with price process $p$. The American put gives the holder of the option the right, but not the obligation, to sell the underlying security for a fixed exercise price at any time before a given expiration time $\bar\tau$. If the option has an exercise price $K$ and expiration time $\bar\tau < T$, then $X_t = (K - p_t)^+$, $t \le \bar\tau$, and $X_t = 0$, $t > \bar\tau$.

We will suppose that, in addition to an American security $(X, \bar\tau)$, there are securities with an arbitrage-free dividend-price process $(\delta, S)$ that generates complete markets.
The assumption of complete markets will dramatically simplify our analysis since it implies, for any exercise policy $\tau$, that the dividend process $\delta^{X,\tau}$ is redundant given $(\delta, S)$. For notational convenience, we assume that $0 < \bar\tau < T$. Let $\pi$ be a state-price density associated with $(\delta, S)$. From Proposition 2.9, given any exercise policy $\tau$, the American security’s dividend process $\delta^{X,\tau}$ has an associated cum-dividend price process, say $V^\tau$, which, in the absence of arbitrage, satisfies

$$V^\tau_t = \frac{1}{\pi_t} E_t(\pi_\tau X_\tau), \quad t \le \tau.$$

This value does not depend on which state-price density is chosen because, with complete markets, state-price densities are identical up to a positive scaling. We consider the optimal stopping problem

$$V^*_0 \equiv \max_{\tau \in \mathcal{T}(0)} V^\tau_0, \tag{13}$$
where, for any time $t \le \bar\tau$, we let $\mathcal{T}(t)$ denote the set of stopping times bounded below by $t$ and above by $\bar\tau$. A solution to Equation (13) is called a rational exercise policy
for the American security $X$, in the sense that it maximizes the initial arbitrage-free value of the resulting claim. Merton (1973) was the first to attack American option valuation systematically using this arbitrage-based viewpoint.

We claim that, in the absence of arbitrage, the actual initial price $V_0$ for the American security must be $V^*_0$. In order to see this, suppose first that $V^*_0 > V_0$. Then one could buy the American security, adopt for it a rational exercise policy $\tau$, and also undertake a trading strategy replicating $-\delta^{X,\tau}$. Since $V^*_0 = E(\pi_\tau X_\tau)/\pi_0$, this replication involves an initial payoff of $V^*_0$, and the net effect is a total initial dividend of $V^*_0 - V_0 > 0$ and zero dividends after time 0, which defines an arbitrage. Thus the absence of arbitrage easily leads to the conclusion that $V_0 \ge V^*_0$. It remains to show that the absence of arbitrage also implies the opposite inequality $V_0 \le V^*_0$.

Suppose that $V_0 > V^*_0$. One could sell the American security at time 0 for $V_0$. We will show that for an initial investment of $V^*_0$, one can “super-replicate” the payoff at exercise demanded by the holder of the American security, regardless of the exercise policy used. Specifically, a super-replicating trading strategy for $(X, \bar\tau, \delta, S)$ is a trading strategy $\theta$ involving only the securities with dividend-price process $(\delta, S)$ that has the following properties:
(a) $\delta^{\theta}_t = 0$ for $0 < t < \bar\tau$, and
(b) $V^{\theta}_t \ge X_t$ for all $t \le \bar\tau$,
where $V^{\theta}_t$ is the cum-dividend market value of $\theta$ at time $t$. Regardless of the exercise policy $\tau$ used by the holder of the security, the payment of $X_\tau$ demanded at time $\tau$ is dominated by the market value $V^{\theta}_\tau$ of a super-replicating strategy $\theta$. (In effect, one modifies $\theta$ by liquidating the portfolio $\theta_\tau$ at time $\tau$, so that the actual trading strategy $\varphi$ associated with the arbitrage is defined by $\varphi_t = \theta_t$ for $t < \tau$ and $\varphi_t = 0$ for $t \ge \tau$.) Now, suppose $\theta$ is super-replicating, with $V^{\theta}_0 = V^*_0$.
If, indeed, $V_0 > V^*_0$ then the strategy of selling the American security and adopting a super-replicating strategy, liquidating at exercise, effectively defines an arbitrage.

This notion of arbitrage for American securities, an extension of the definition of arbitrage used earlier, is reasonable because a super-replicating strategy does not depend on the exercise policy adopted by the holder (or sequence of holders over time) of the American security. It would be unreasonable to call a strategy involving a short position in the American security an “arbitrage” if, in carrying it out, one requires knowledge of the exercise policy for the American security that will be adopted by other agents that hold the security over time, who may after all act “irrationally.”

The approach to American security valuation given here is similar to the continuous-time treatments of Bensoussan (1984) and Karatzas (1988), who do not formally connect the valuation of American securities with the absence of arbitrage, but rather deal with the similar notion of “fair price”.

Proposition. Given $(X, \bar\tau, \delta, S)$, suppose $(\delta, S)$ is arbitrage free and generates complete markets. Then there is a super-replicating trading strategy $\theta$ for $(X, \bar\tau, \delta, S)$ with the initial value $V^{\theta}_0 = V^*_0$.
In order to construct a super-replicating strategy with the desired property, we will make a short excursion into the theory of optimal stopping. For any process $Y$ in $L$, the Snell envelope $W$ of $Y$ is defined by

$$W_t = \max_{\tau \in \mathcal{T}(t)} E_t(Y_\tau), \quad 0 \le t \le \bar\tau.$$
It can be shown that, naturally, for any $t < \bar\tau$, $W_t = \max[Y_t, E_t(W_{t+1})]$, which can be viewed as the Bellman equation for optimal stopping. Thus $W_t \ge E_t(W_{t+1})$, implying that $W$ is a supermartingale, implying that we can decompose $W$ in the form $W = Z - A$, for some martingale $Z$ and some increasing adapted⁸ process $A$ with $A_0 = 0$.

In order to prove the above proposition, we define $Y$ by $Y_t = X_t \pi_t$, and let $W$, $Z$, and $A$ be defined as above. By the definition of complete markets, there is a trading strategy $\theta$ with the property that
• $\delta^{\theta}_t = 0$ for $0 < t < \bar\tau$;
• $\delta^{\theta}_{\bar\tau} = Z_{\bar\tau} / \pi_{\bar\tau}$;
• $\delta^{\theta}_t = 0$ for $t > \bar\tau$.
Property (a) defining a super-replicating strategy is satisfied by this strategy $\theta$. From the fact that $Z$ is a martingale and the definition of a state-price density, the cum-dividend value $V^{\theta}$ satisfies

$$\pi_t V^{\theta}_t = E_t(\pi_{\bar\tau} \delta^{\theta}_{\bar\tau}) = E_t(Z_{\bar\tau}) = Z_t, \quad t \le \bar\tau. \tag{14}$$

From Equation (14) and the fact that $A_0 = 0$, we know that $V^{\theta}_0 = V^*_0$ because $Z_0 = W_0 = \pi_0 V^*_0$. Since $Z_t - A_t = W_t \ge Y_t$ for all $t$, from Equation (14) we also know that

$$V^{\theta}_t = \frac{Z_t}{\pi_t} \ge \frac{1}{\pi_t}(Y_t + A_t) = X_t + \frac{A_t}{\pi_t} \ge X_t, \quad t \le \bar\tau,$$

the last inequality following from the fact that $A_t \ge 0$ for all $t$. Thus the dominance property (b) defining a super-replicating strategy is also satisfied, and $\theta$ is indeed a super-replicating strategy with $V^{\theta}_0 = V^*_0$. This proves the proposition and implies that, unless there is an arbitrage, the initial price $V_0$ of the American security is equal to the market value $V^*_0$ associated with a rational exercise policy.

The Snell envelope $W$ is also the key to showing that a rational exercise policy is given by the dynamic-programming solution $\tau^0 = \min\{t : W_t = Y_t\}$. In order to verify this, suppose that $\tau$ is a rational exercise policy. Then $W_\tau = Y_\tau$. (This can be seen
8
More can be said, in that $A_t$ can be taken to be $\mathcal{F}_{t-1}$-measurable.
from the fact that $W_\tau \ge Y_\tau$, and if $W_\tau > Y_\tau$ then $\tau$ cannot be rational.) From this fact, any rational exercise policy $\tau$ has the property that $\tau \ge \tau^0$. For any such $\tau$, we have

$$E_{\tau^0}[Y(\tau)] \le W(\tau^0) = Y(\tau^0),$$

and the law of iterated expectations implies that $E[Y(\tau)] \le E[Y(\tau^0)]$, so $\tau^0$ is indeed rational. We have shown the following.

Theorem. Given $(X, \bar\tau, \delta, S)$, suppose that $(\delta, S)$ admits no arbitrage and generates complete markets. Let $\pi$ be a state-price deflator. Let $W$ be the Snell envelope of $X\pi$ up to the expiration time $\bar\tau$. Then a rational exercise policy for $(X, \bar\tau, \delta, S)$ is given by $\tau^0 = \min\{t : W_t = \pi_t X_t\}$. The unique initial cum-dividend arbitrage-free price of the American security is

$$V^*_0 = \frac{1}{\pi_0} E[X(\tau^0) \pi(\tau^0)].$$
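The Bellman recursion for the Snell envelope lends itself to backward induction on a tree. The following Python sketch is a minimal illustration for an American put in a recombining binomial model. The up and down factors, short rate, probability `p_up`, strike, and the form of the state-price deflator are assumptions made for this example (the deflator is the one implied by the binomial model, normalized to 1 at time 0); none of these numbers come from the text.

```python
from fractions import Fraction

# Hypothetical recombining binomial model; exact arithmetic keeps W == Y tests reliable.
u, d, r = Fraction(6, 5), Fraction(4, 5), Fraction(1, 20)
p_up = Fraction(3, 5)                      # probability of an up move under P
S0, K, N = Fraction(100), Fraction(100), 3 # initial price, strike, expiration date
q_up = (1 + r - d) / (u - d)               # risk-neutral up probability


def deflator(t, k):
    """State-price deflator at date t after k up moves, with value 1 at time 0."""
    likelihood = (q_up / p_up) ** k * ((1 - q_up) / (1 - p_up)) ** (t - k)
    return likelihood / (1 + r) ** t


def payoff(t, k):
    """Exercise value X_t = (K - price)^+ of the put at date t after k up moves."""
    return max(K - S0 * u ** k * d ** (t - k), Fraction(0))


def snell_envelope():
    """W[t][k] = max(Y, E_t[W_{t+1}]) with Y = X * deflator, computed backward from N."""
    W = [[None] * (t + 1) for t in range(N + 1)]
    for k in range(N + 1):
        W[N][k] = payoff(N, k) * deflator(N, k)
    for t in range(N - 1, -1, -1):
        for k in range(t + 1):
            continuation = p_up * W[t + 1][k + 1] + (1 - p_up) * W[t + 1][k]
            W[t][k] = max(payoff(t, k) * deflator(t, k), continuation)
    return W


W = snell_envelope()
V0_star = W[0][0] / deflator(0, 0)         # initial price of the American put


def exercise_now(t, k):
    """True where the Snell envelope equals the deflated exercise value."""
    return W[t][k] == payoff(t, k) * deflator(t, k)
```

The first date along a path at which `exercise_now` holds is the exercise time described in the theorem. In this example, exercise is immediate after one down move, but not at time 0.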
In terms of the equivalent martingale measure $Q$ defined in Section 2.8, we can also write the optimal stopping problem (13) in the form

$$V^*_0 = \max_{\tau \in \mathcal{T}(0)} E^Q\left(\frac{X_\tau}{R_{0,\tau}}\right). \tag{15}$$
An optimal exercise time is $\tau^0 = \min\{t : V^*_t = X_t\}$, where $V^*_t = W_t / \pi_t$ is the price of the American option at time $t$. This representation of the rational-exercise problem is sometimes convenient. For example, let us consider the case of an American call option on a security with price process $p$. We have $X_t = (p_t - K)^+$ for some exercise price $K$. Suppose the underlying security has no dividends before or at the expiration time $\bar\tau$. We suppose positive interest rates, meaning that $R_{t,s} \ge 1$ for all $t$ and $s \ge t$. With these assumptions, we will show that it is never optimal to exercise the call option before its expiration date $\bar\tau$. This property is sometimes called “no early exercise”, or “better alive than dead”.

We define the “discounted price process” $p^*$ by $p^*_t = p_t / R_{0,t}$. The fact that the underlying security pays dividends only after the expiration time $\bar\tau$ implies, by Lemma 2.8, that $p^*$ is a $Q$-martingale at least up to the expiration time $\bar\tau$. That is, for $t \le s \le \bar\tau$, we have $E^Q_t(p^*_s) = p^*_t$.
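With positive interest rates, we have, for any stopping time $\tau \le \bar\tau$,

$$\begin{aligned}
E^Q\left[\frac{1}{R_{0,\bar\tau}} (p_{\bar\tau} - K)^+\right]
&= E^Q\left[\left(p^*_{\bar\tau} - \frac{K}{R_{0,\bar\tau}}\right)^+\right] \\
&= E^Q\left[E^Q_\tau\left(\left(p^*_{\bar\tau} - \frac{K}{R_{0,\bar\tau}}\right)^+\right)\right] \\
&\ge E^Q\left[\left(E^Q_\tau\left(p^*_{\bar\tau} - \frac{K}{R_{0,\bar\tau}}\right)\right)^+\right] \\
&= E^Q\left[\left(p^*_\tau - E^Q_\tau\left(\frac{K}{R_{0,\bar\tau}}\right)\right)^+\right] \\
&\ge E^Q\left[\left(p^*_\tau - \frac{K}{R_{0,\tau}}\right)^+\right] \\
&= E^Q\left[\frac{1}{R_{0,\tau}} (p_\tau - K)^+\right],
\end{aligned}$$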
the first inequality by Jensen’s inequality, the second by the positivity of interest rates. It follows that $\bar\tau$ is a rational exercise policy. In typical cases, $\bar\tau$ is the unique rational exercise policy.

If the underlying security pays dividends before expiration, then early exercise of the American call is, in certain cases, optimal. From the fact that the put payoff is increasing in the strike price (as opposed to decreasing for the call option), the second inequality above is reversed for the case of a put option, and one can guess that early exercise of the American put is sometimes optimal.

Difficulties can arise with the valuation of American securities in incomplete markets. For example, the exercise policy may play a role in determining the marketed subspace, and therefore a role in pricing securities. If the state-price density depends on the exercise policy, it could even turn out that the notion of a rational exercise policy is not well defined.
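The “no early exercise” property of the call, and its failure for the put, can be checked in a small binomial example. The sketch below values each option by backward induction under the risk-neutral probabilities, with and without the right to exercise early. The model parameters and the function names are hypothetical, the underlying pays no dividends, and the short rate is positive, as the argument above requires.

```python
from fractions import Fraction

# Hypothetical binomial model with a positive short rate and no dividends.
u, d, r = Fraction(6, 5), Fraction(4, 5), Fraction(1, 20)
S0, K, N = Fraction(100), Fraction(100), 3
q = (1 + r - d) / (u - d)                  # risk-neutral probability of an up move


def value(payoff, american):
    """Time-0 value by backward induction; early exercise only if american is True."""
    V = [payoff(S0 * u ** k * d ** (N - k)) for k in range(N + 1)]
    for t in range(N - 1, -1, -1):
        step = []
        for k in range(t + 1):
            v = (q * V[k + 1] + (1 - q) * V[k]) / (1 + r)    # continuation value
            if american:
                v = max(v, payoff(S0 * u ** k * d ** (t - k)))
            step.append(v)
        V = step
    return V[0]


def call(price):
    return max(price - K, Fraction(0))


def put(price):
    return max(K - price, Fraction(0))


american_call, european_call = value(call, True), value(call, False)
american_put, european_put = value(put, True), value(put, False)
```

With exact rational arithmetic the two call values coincide, while the American put is worth strictly more than its European counterpart in this example.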
3. Continuous-time modeling

Many problems are more tractable, or have solutions appearing in a more natural form, when treated in a continuous-time setting. We first introduce the Brownian model of uncertainty and continuous security trading, and then derive partial differential equations for the arbitrage-free prices of derivative securities. The classic example is the Black–Scholes option-pricing formula. We then examine the connection between equivalent martingale measures and the “market price of risk” that arises from Girsanov’s Theorem. Finally, we briefly connect the theory of security valuation with that of optimal portfolio and consumption choice, using the elegant martingale approach of Cox and Huang (1989).
3.1. Trading gains for Brownian prices

We fix a probability space $(\Omega, \mathcal{F}, P)$. A process is a measurable⁹ function on $\Omega \times [0, \infty)$ into $\mathbb{R}$. The value of a process $X$ at time $t$ is the random variable variously written as $X_t$, $X(t)$, or $X(\cdot, t) : \Omega \to \mathbb{R}$. A standard Brownian motion is a process $B$ defined by the following properties:
(a) $B_0 = 0$ almost surely;
(b) Normality: for any times $t$ and $s > t$, $B_s - B_t$ is normally distributed with mean zero and variance $s - t$;
(c) Independent increments: for any times $t_0, \ldots, t_n$ such that $0 \le t_0 < t_1 < \cdots < t_n < \infty$, the random variables $B(t_0), B(t_1) - B(t_0), \ldots, B(t_n) - B(t_{n-1})$ are independently distributed; and
(d) Continuity: for each $\omega$ in $\Omega$, the sample path $t \mapsto B(\omega, t)$ is continuous.

It is a nontrivial fact, whose proof has a colorful history, that $(\Omega, \mathcal{F}, P)$ can be constructed so that there exist standard Brownian motions. In perhaps the first scientific work involving Brownian motion, Bachelier (1900) proposed Brownian motion as a model of stock prices. We will follow his lead for the time being and suppose that a given standard Brownian motion $B$ is the price process of a security. Later we consider more general classes of price processes.

We fix the standard filtration $\mathbb{F} = \{\mathcal{F}_t : t \ge 0\}$ of $B$, defined for example in Protter (1990). Roughly speaking,¹⁰ $\mathcal{F}_t$ is the set of events that can be distinguished as true or false by observation of $B$ until time $t$.

Our first task is to build a model of trading gains based on the possibility of continual adjustment of the position held. A trading strategy is an adapted process $\theta$ specifying at each state $\omega$ and time $t$ the number $\theta_t(\omega)$ of units of the security to hold. If a strategy $\theta$ is a constant, say $\bar\theta$, between two dates $t$ and $s > t$, then the total gain between those two dates is $\bar\theta (B_s - B_t)$, the quantity held multiplied by the price change. So long as the trading strategy $\theta$ is piecewise constant, we would have no difficulty in defining the total gain between any two times.
For example, suppose, for some stopping times $T_0, \ldots, T_N$ with $0 = T_0 < T_1 < \cdots < T_N = T$, and for any $n$, we have $\theta(t) = \theta(T_{n-1})$ for all $t \in [T_{n-1}, T_n)$. Then we define the total gain from trade as

$$\int_0^T \theta_t \, dB_t = \sum_{n=1}^{N} \theta(T_{n-1}) \, [B(T_n) - B(T_{n-1})]. \tag{16}$$
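The sum in Equation (16) is easy to compute on a simulated path. The Python sketch below samples a Brownian path at deterministic trading dates (a special case of the stopping times above) and evaluates the gain of a piecewise-constant strategy that uses only information available at the start of each interval. The function names, the time grid, and the two example strategies are illustrative assumptions.

```python
import math
import random


def brownian_path(times, rng):
    """Sample B at increasing times (times[0] == 0) using independent normal increments."""
    B = [0.0]
    for t0, t1 in zip(times, times[1:]):
        B.append(B[-1] + rng.gauss(0.0, math.sqrt(t1 - t0)))
    return B


def trading_gain(theta, times, B):
    """Equation (16): sum over n of theta(T_{n-1}) * [B(T_n) - B(T_{n-1})]."""
    total = 0.0
    for n in range(1, len(times)):
        position = theta(times[n - 1], B[:n])   # history up to T_{n-1} only
        total += position * (B[n] - B[n - 1])
    return total


def mean_gain(theta, times, n_paths, seed):
    """Monte Carlo average of the trading gain over independent paths."""
    rng = random.Random(seed)
    gains = (trading_gain(theta, times, brownian_path(times, rng))
             for _ in range(n_paths))
    return sum(gains) / n_paths


def hold(t, history):
    return 1.0                  # hold one unit throughout


def contrarian(t, history):
    return -history[-1]         # short when the price is high, long when it is low


grid = [i / 10 for i in range(11)]
```

Holding one unit earns exactly the price change over the period, and the average gain of the contrarian strategy across many paths should be close to zero, in line with the idea that no strategy earns an expected profit on a security with zero expected price change.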
More generally, in order to make for a good model of trading gains for trading strategies that are not necessarily piecewise constant, a trading strategy $\theta$ is required to satisfy the technical condition that $\int_0^T \theta_t^2 \, dt < \infty$ almost surely for each $T$. We let $\mathcal{L}^2$ denote the space of adapted processes satisfying this integrability restriction.
9 See Duffie (2001) for technical definitions not provided here.
10 The standard filtration is augmented, so that $\mathcal{F}_t$ contains all null sets of $\mathcal{F}$.
For each $\theta$ in $\mathcal{L}^2$ there is an adapted process with continuous sample paths, denoted $\int \theta \, dB$, that is called the stochastic integral of $\theta$ with respect to $B$. A full definition of $\int \theta \, dB$ is outlined in a standard source such as Karatzas and Shreve (1988).

The value of the stochastic integral $\int \theta \, dB$ at time $T$ is usually denoted $\int_0^T \theta_t \, dB_t$, and represents the total gain generated up to time $T$ by trading the security with price process $B$ according to the trading strategy $\theta$. The stochastic integral $\int \theta \, dB$ has the properties that one would expect from a good model of trading gains. In particular, Equation (16) is satisfied for piecewise constant $\theta$, and in general the stochastic integral is linear, in that, for any $\theta$ and $\varphi$ in $\mathcal{L}^2$ and any scalars $a$ and $b$, the process $a\theta + b\varphi$ is also in $\mathcal{L}^2$, and, for any time $T > 0$,

$$\int_0^T (a\theta_t + b\varphi_t) \, dB_t = a \int_0^T \theta_t \, dB_t + b \int_0^T \varphi_t \, dB_t. \tag{17}$$
3.2. Martingale trading gains

The properties of standard Brownian motion imply that $B$ is a martingale. (This follows basically from the property that its increments are independent and of zero expectation.) One must impose technical conditions on $\theta$, however, in order to ensure that $\int \theta \, dB$ is also a martingale. This is natural; it should be impossible to generate an expected profit by trading a security that never experiences an expected price change. The following basic proposition can be found, for example, in Protter (1990).

Proposition. If $E\left[\left(\int_0^T \theta_t^2 \, dt\right)^{1/2}\right] < \infty$ for all $T > 0$, then $\int \theta \, dB$ is a martingale.

As a model of security-price processes, standard Brownian motion is too restrictive for most purposes. Consider, more generally, an Ito process, meaning a process $S$ of the form

$$S_t = x + \int_0^t \mu_s \, ds + \int_0^t \sigma_s \, dB_s, \tag{18}$$
where $x$ is a real number, $\sigma$ is in $\mathcal{L}^2$, and $\mu$ is in $\mathcal{L}^1$, meaning that $\mu$ is an adapted process such that $\int_0^t |\mu_s| \, ds < \infty$ almost surely for all $t$. It is common to write Equation (18) in the informal “differential” form

$$dS_t = \mu_t \, dt + \sigma_t \, dB_t.$$

One often thinks intuitively of $dS_t$ as the “increment” of $S$ at time $t$, made up of two parts, the “locally riskless” part $\mu_t \, dt$, and the “locally uncertain” part $\sigma_t \, dB_t$.
In order to further interpret this differential representation of an Ito process, suppose that $\sigma$ and $\mu$ have continuous sample paths and are bounded. It is then literally the case that for any time $t$,

$$\frac{d}{d\tau} E_t(S_\tau) \Big|_{\tau = t} = \mu_t \quad \text{almost surely}, \tag{19}$$

and

$$\frac{d}{d\tau} \mathrm{var}_t(S_\tau) \Big|_{\tau = t} = \sigma_t^2 \quad \text{almost surely}, \tag{20}$$

where the derivatives are taken from the right, and where, for any random variable $X$ with finite variance, $\mathrm{var}_t(X) \equiv E_t(X^2) - [E_t(X)]^2$ is the $\mathcal{F}_t$-conditional variance of $X$. In this sense of Equations (19) and (20), we can interpret $\mu_t$ as the rate of change of the expectation of $S$, conditional on information available at time $t$, and likewise interpret $\sigma_t^2$ as the rate of change of the conditional variance of $S$ at time $t$. One sometimes reads the associated abuses of notation “$E_t(dS_t) = \mu_t \, dt$” and “$\mathrm{var}_t(dS_t) = \sigma_t^2 \, dt$”. Of course, $dS_t$ is not even a random variable, so this sort of characterization is not rigorously justified and is used purely for its intuitive content. We will refer to $\mu$ and $\sigma$ as the drift and diffusion processes of $S$, respectively.

For an Ito process $S$ of the form (18), let $\mathcal{L}(S)$ be the set whose elements are processes $\theta$ with $\{\theta_t \mu_t : t \ge 0\}$ in $\mathcal{L}^1$ and $\{\theta_t \sigma_t : t \ge 0\}$ in $\mathcal{L}^2$. For $\theta$ in $\mathcal{L}(S)$, we define the stochastic integral $\int \theta \, dS$ as the Ito process given by

$$\int_0^T \theta_t \, dS_t = \int_0^T \theta_t \mu_t \, dt + \int_0^T \theta_t \sigma_t \, dB_t, \quad T \ge 0.$$
Assuming no dividends, we also refer to $\int \theta \, dS$ as the gain process generated by the trading strategy $\theta$, given the price process $S$.

We will have occasion to refer to adapted processes $\theta$ and $\varphi$ that are equal almost everywhere, by which we mean that $E\left(\int_0^\infty |\theta_t - \varphi_t| \, dt\right) = 0$. In fact, we shall write “$\theta = \varphi$” whenever $\theta = \varphi$ almost everywhere. This is a natural convention, for suppose that $X$ and $Y$ are Ito processes with $X_0 = Y_0$ and with $dX_t = \mu_t \, dt + \sigma_t \, dB_t$ and $dY_t = a_t \, dt + b_t \, dB_t$. Since stochastic integrals are defined for our purposes as continuous sample-path processes, it turns out that $X_t = Y_t$ for all $t$ almost surely if and only if $\mu = a$ almost everywhere and $\sigma = b$ almost everywhere. We call this the unique decomposition property of Ito processes.

Ito’s Formula is the basis for explicit solutions to asset-pricing problems in a continuous-time setting.

Ito’s Formula. Suppose $X$ is an Ito process with $dX_t = \mu_t \, dt + \sigma_t \, dB_t$ and $f : \mathbb{R}^2 \to \mathbb{R}$ is twice continuously differentiable. Then the process $Y$, defined by $Y_t = f(X_t, t)$, is an Ito process with

$$dY_t = \left[ f_x(X_t, t) \mu_t + f_t(X_t, t) + \tfrac{1}{2} f_{xx}(X_t, t) \sigma_t^2 \right] dt + f_x(X_t, t) \sigma_t \, dB_t.$$
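The role of the second-derivative term in Ito’s Formula can be seen in a simple pathwise experiment. The Python sketch below takes an Ito process with constant drift and diffusion (an assumption made purely for this example) and the function f(x, t) = exp(x). Along one simulated Brownian path it compares exp(X_T) with two Euler-type discretizations of the differential given by the formula, one with the second-derivative term and one without. The parameter values, step count, seed, and function name are hypothetical.

```python
import math
import random


def ito_check(mu=0.05, sigma=0.2, T=1.0, steps=2000, seed=1):
    """Return (exp(X_T), Euler value with the Ito term, Euler value without it)."""
    rng = random.Random(seed)
    dt = T / steps
    x = 0.0            # X_0 = 0, so Y_0 = exp(X_0) = 1
    y_ito = 1.0        # uses drift f_x*mu + (1/2)*f_xx*sigma^2 = Y*(mu + sigma^2/2)
    y_naive = 1.0      # drops the (1/2)*f_xx*sigma^2 term
    for _ in range(steps):
        dB = rng.gauss(0.0, math.sqrt(dt))
        y_ito += y_ito * ((mu + 0.5 * sigma ** 2) * dt + sigma * dB)
        y_naive += y_naive * (mu * dt + sigma * dB)
        x += mu * dt + sigma * dB
    return math.exp(x), y_ito, y_naive


exact, with_ito_term, without_ito_term = ito_check()
```

With a fine time grid, the discretization that includes the second-derivative term should track exp(X_T) closely, while the one that omits it drifts away by roughly a factor exp(-sigma^2 T / 2).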
A generalization of Ito’s Formula appears later in this section.

3.3. The Black–Scholes option-pricing formula

We turn to one of the most important ideas in finance theory, the model of Black and Scholes (1973) for pricing options. Together with the method of proof provided by Robert Merton, this model revolutionized the practice of derivative pricing and risk management, and has changed the entire path of asset-pricing theory.

Consider a security, to be called a stock, with price process

$$S_t = x e^{\alpha t + \sigma B(t)}, \quad t \ge 0,$$

where $x > 0$, $\alpha$, and $\sigma$ are constants. Such a process, called a geometric Brownian motion, is often called log-normal because, for any $t$, $\log(S_t) = \log(x) + \alpha t + \sigma B_t$ is normally distributed. Moreover, since $X_t \equiv \alpha t + \sigma B_t = \int_0^t \alpha \, ds + \int_0^t \sigma \, dB_s$ defines an Ito process $X$ with constant drift $\alpha$ and diffusion $\sigma$, Ito’s Formula implies that $S$ is an Ito process and that

$$dS_t = \mu S_t \, dt + \sigma S_t \, dB_t; \quad S_0 = x,$$

where $\mu = \alpha + \sigma^2 / 2$. From Equations (19) and (20), at any time $t$, the rate of change of the conditional mean of $S_t$ is $\mu S_t$, and the rate of change of the conditional variance is $\sigma^2 S_t^2$, so that, per dollar invested in this security at time $t$, one may think of $\mu$ as the “instantaneous” expected rate of return, and $\sigma$ as the “instantaneous” standard deviation of the rate of return. The coefficient $\sigma$ is also known as the volatility of $S$. A geometric Brownian motion is a natural two-parameter model of a security-price process because of these simple interpretations of $\mu$ and $\sigma$.

Consider a second security, to be called a bond, with the price process $\beta$ defined by

$$\beta_t = \beta_0 e^{rt}, \quad t \ge 0,$$

for some constants $\beta_0 > 0$ and $r$. We have the obvious interpretation of $r$ as the continually compounding short rate. Since $\{rt : t \ge 0\}$ is trivially an Ito process, $\beta$ is also an Ito process with $d\beta_t = r \beta_t \, dt$.

A pair $(a, b)$ consisting of trading strategies $a$ for the stock and $b$ for the bond is said to be self-financing if it generates no dividends before $T$ (either positive or negative), meaning that, for all $t$,

$$a_t S_t + b_t \beta_t = a_0 S_0 + b_0 \beta_0 + \int_0^t a_u \, dS_u + \int_0^t b_u \, d\beta_u. \tag{21}$$
This self-financing condition, conveniently defined by Harrison and Kreps (1979), is merely a statement that the current portfolio value (on the left-hand side) is precisely
the initial investment plus any trading gains, and therefore that no dividend “inflow” or “outflow” is generated.

Now consider a third security, an option. We begin with the case of a European call option on the stock, giving its owner the right, but not the obligation, to buy the stock at a given exercise price $K$ on a given exercise date $T$. The option’s price process $Y$ is as yet unknown except for the fact that $Y_T = (S_T - K)^+ \equiv \max(S_T - K, 0)$, which follows from the fact that the option is rationally exercised if and only if $S_T > K$.

Suppose that the option is redundant, in that there exists a self-financing trading strategy $(a, b)$ in the stock and bond with $a_T S_T + b_T \beta_T = Y_T$. If $a_0 S_0 + b_0 \beta_0 < Y_0$, then one could sell the option for $Y_0$, make an initial investment of $a_0 S_0 + b_0 \beta_0$ in the trading strategy $(a, b)$, and at time $T$ liquidate the entire portfolio $(-1, a_T, b_T)$ of option, stock, and bond with payoff $-Y_T + a_T S_T + b_T \beta_T = 0$. The initial profit $Y_0 - a_0 S_0 - b_0 \beta_0 > 0$ is thus riskless, so the trading strategy $(-1, a, b)$ would be an arbitrage. Likewise, if $a_0 S_0 + b_0 \beta_0 > Y_0$, the strategy $(1, -a, -b)$ is an arbitrage. Thus, if there is no arbitrage, $Y_0 = a_0 S_0 + b_0 \beta_0$. The same arguments applied at each date $t$ imply that in the absence of arbitrage, $Y_t = a_t S_t + b_t \beta_t$. A full and careful definition of continuous-time arbitrage will be given later, but for now we can proceed without much ambiguity at this informal level. Our immediate objective is to show the following.

The Black–Scholes Formula. If there is no arbitrage, then, for all $t < T$, $Y_t = C(S_t, t)$, where

$$C(x, t) = x \Phi(z) - e^{-r(T - t)} K \Phi\left(z - \sigma \sqrt{T - t}\right), \tag{22}$$

with

$$z = \frac{\log(x/K) + (r + \sigma^2/2)(T - t)}{\sigma \sqrt{T - t}},$$
where $\Phi$ is the cumulative standard normal distribution function.

The Black and Scholes (1973) formula was extended by Merton (1973, 1977), and subsequently given literally hundreds of further extensions and applications. Cox and Rubinstein (1985) is a standard reference on options, while Hull (2000) has further applications and references.

We will see different ways to arrive at the Black–Scholes formula. Although not the shortest argument, the following is perhaps the most obvious and constructive.¹¹ We start by assuming that $Y_t = C(S_t, t)$, $t < T$, without knowledge of the function $C$ aside from the assumption that it is twice continuously differentiable on $(0, \infty) \times [0, T)$
11 The line of exposition here is based on Gabay (1982) and Duffie (1988). Andreasen, Jensen and Poulsen (1998) provide numerous alternative methods of deriving the Black–Scholes Formula. The basic approach of using continuous-time self-financing strategies as the basis for making arbitrage arguments is due to Merton (1977).
(allowing an application of Ito’s Formula). This will lead us to deduce Equation (22), justifying the assumption and proving the result at the same time.

Based on our assumption that $Y_t = C(S_t, t)$ and Ito’s Formula,

$$dY_t = \mu_Y(t) \, dt + C_x(S_t, t) \sigma S_t \, dB_t, \quad t < T, \tag{23}$$

where

$$\mu_Y(t) = C_x(S_t, t) \mu S_t + C_t(S_t, t) + \tfrac{1}{2} C_{xx}(S_t, t) \sigma^2 S_t^2.$$

Now suppose there is a self-financing trading strategy $(a, b)$ with

$$a_t S_t + b_t \beta_t = Y_t, \quad t \in [0, T]. \tag{24}$$
This assumption will also be justified shortly. Equations (21) and (24), along with the linearity of stochastic integration, imply that dYt = at dSt + bt dbt = (at mSt + bt bt r) dt + at s St dBt .
(25)
Based on the unique decomposition property of Ito processes, in order that the trading strategy (a, b) satisfies both Equation (23) and Equation (25), we must “match coefficients separately in both dBt and dt”. Specifically, we choose at so that at s St = Cx (St , t) s St ; for this, we let at = Cx (St , t). From Equation (24) and Yt = C(St , t), we then have Cx (St , t) St + bt bt = C(St , t), or bt =
1 [C (St , t) − Cx (St , t) St ] . bt
(26)
Finally, “matching coefficients in dt” from Equations (23) and (25) leaves, for t < T , − rC (St , t) + Ct (St , t) + rSt Cx (St , t) + 12 s 2 St2 Cxx (St , t) = 0.
(27)
In order for Equation (27) to hold, it is enough that C satisfies the partial differential equation (PDE) − rC(x, t) + Ct (x, t) + rxCx (x, t) + 12 s 2 x2 Cxx (x, t) = 0,
(28)
for (x, t) ∈ (0, ∞) × [0, T ). The fact that YT = C(ST , T ) = (ST − K)+ supplies the boundary condition: C(x, T ) = (x − K)+ ,
x ∈ (0, ∞).
(29)
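As a numerical sanity check (not a substitute for the direct calculation), the following minimal sketch evaluates the formula (22) and tests the PDE (28) and boundary condition (29) by central finite differences. The function names, step sizes, and parameter values (`K = 100`, `T = 1`, `r = 0.05`, `sigma = 0.2`) are illustrative assumptions, not part of the text.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(x, t, K, T, r, sigma):
    """Black-Scholes call price C(x, t) of Equation (22), valid for t < T."""
    tau = T - t
    vol = sigma * math.sqrt(tau)
    z = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / vol
    return x * norm_cdf(z) - math.exp(-r * tau) * K * norm_cdf(z - vol)


def pde_residual(x, t, K, T, r, sigma, hx=1e-2, ht=1e-3):
    """Left-hand side of the PDE (28), with derivatives replaced by
    central finite differences (hx and ht are assumed step sizes)."""
    c = bs_call(x, t, K, T, r, sigma)
    c_t = (bs_call(x, t + ht, K, T, r, sigma)
           - bs_call(x, t - ht, K, T, r, sigma)) / (2.0 * ht)
    c_up = bs_call(x + hx, t, K, T, r, sigma)
    c_dn = bs_call(x - hx, t, K, T, r, sigma)
    c_x = (c_up - c_dn) / (2.0 * hx)
    c_xx = (c_up - 2.0 * c + c_dn) / hx ** 2
    return -r * c + c_t + r * x * c_x + 0.5 * sigma ** 2 * x ** 2 * c_xx


if __name__ == "__main__":
    K, T, r, sigma = 100.0, 1.0, 0.05, 0.2
    print("C(100, 0)          =", bs_call(100.0, 0.0, K, T, r, sigma))
    print("PDE residual       =", pde_residual(100.0, 0.0, K, T, r, sigma))
    # Just before expiry the price should be close to the payoff (x - K)^+.
    print("C(120, T - 1e-8)   =", bs_call(120.0, T - 1e-8, K, T, r, sigma))
```

The residual is not exactly zero because of discretization and rounding error; it only needs to be small relative to the individual terms of (28).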
By direct calculation of derivatives, one can show as an exercise that Equation (22) is a solution to Equations (28) and (29). All of this seems to confirm that C(S0 , 0), with C defined by the Black–Scholes formula (22), is a good candidate for the initial price of
the option. In order to confirm this pricing, suppose to the contrary that $Y_0 > C(S_0, 0)$, where $C$ is defined by Equation (22). Consider the strategy $(-1, a, b)$ in the option, stock, and bond, with $a_t = C_x(S_t, t)$ and $b_t$ given by Equation (26) for $t < T$. We can choose $a_T$ and $b_T$ arbitrarily so that Equation (24) is satisfied; this does not affect the self-financing condition (21) because the value of the trading strategy at a single point in time has no effect on the stochastic integral. The result is that $(a, b)$ is self-financing by construction and that $a_T S_T + b_T \beta_T = Y_T = (S_T - K)^+$. This strategy therefore nets an initial riskless profit of $Y_0 - a_0 S_0 - b_0 \beta_0 = Y_0 - C(S_0, 0) > 0$, which defines an arbitrage. Likewise, if $Y_0 < C(S_0, 0)$, the trading strategy $(+1, -a, -b)$ is an arbitrage. Thus, it is indeed a necessary condition for the absence of arbitrage that $Y_0 = C(S_0, 0)$. Sufficiency is a more delicate matter. Under mild technical conditions on trading strategies that will follow, the Black–Scholes formula for the option price is also sufficient for the absence of arbitrage.

Transactions costs play havoc with the sort of reasoning just applied. For example, if brokerage fees are any positive fixed fraction of the market value of stock trades, the stock-trading strategy $a$ constructed above would call for infinite total brokerage fees, since, in effect, the number of shares traded is infinite! Leland (1985) has shown, nevertheless, that the Black–Scholes formula applies approximately, for small proportional transactions costs, once one artificially elevates the volatility parameter to compensate for the transactions costs.

3.4. Ito’s Formula

Ito’s Formula is extended to the case of multidimensional Brownian motion as follows. A standard Brownian motion in $R^d$ is defined by $B = (B^1, \ldots, B^d)$ in $R^d$, where $B^1, \ldots, B^d$ are independent standard Brownian motions.
We fix a standard Brownian motion $B$ in $R^d$, restricted to some time interval $[0, T]$, on a given probability space $(\Omega, \mathcal{F}, P)$. We also fix the standard filtration $\mathbb{F} = \{\mathcal{F}_t : t \in [0, T]\}$ of $B$. For simplicity, we take $\mathcal{F}$ to be $\mathcal{F}_T$. For an $R^d$-valued process $\theta = (\theta^{(1)}, \ldots, \theta^{(d)})$ with $\theta^{(i)}$ in $\mathcal{L}^2$ for each $i$, the stochastic integral $\int \theta\,dB$ is defined by
$$\int_0^t \theta_s\,dB_s = \sum_{i=1}^d \int_0^t \theta_s^{(i)}\,dB_s^i. \tag{30}$$
An Ito process is now defined as one of the form
$$X_t = x + \int_0^t \mu_s\,ds + \int_0^t \theta_s\,dB_s,$$
where $\mu$ is a drift (with $\int_0^t |\mu_s|\,ds < \infty$ almost surely) and $\int_0^t \theta_s\,dB_s$ is defined as in Equation (30). In this case, we call $\theta$ the diffusion of $X$.
We say that $X = (X^{(1)}, \ldots, X^{(N)})$ is an Ito process in $R^N$ if, for each $i$, $X^{(i)}$ is an Ito process. The drift of $X$ is the $R^N$-valued process $\mu$ whose $i$th coordinate is the drift of $X^{(i)}$. The diffusion of $X$ is the $R^{N \times d}$-matrix-valued process $\sigma$ whose $i$th row is the diffusion of $X^{(i)}$. In this case, we use the notation
$$dX_t = \mu_t\,dt + \sigma_t\,dB_t. \tag{31}$$

Ito’s Formula. Suppose $X$ is the Ito process in $R^N$ given by Equation (31) and $f : R^N \times [0, \infty) \to R$ is $C^{2,1}$; that is, $f$ has at least two continuous derivatives with respect to its first ($x$) argument, and at least one continuous derivative with respect to its second ($t$) argument. Then $\{f(X_t, t) : t \geq 0\}$ is an Ito process and, for any time $t$,
$$f(X_t, t) = f(X_0, 0) + \int_0^t \mathcal{D}f(X_s, s)\,ds + \int_0^t f_x(X_s, s)\,\sigma_s\,dB_s,$$
where
$$\mathcal{D}f(X_t, t) = f_x(X_t, t)\,\mu_t + f_t(X_t, t) + \tfrac{1}{2}\,\mathrm{tr}\left[\sigma_t \sigma_t^\top f_{xx}(X_t, t)\right].$$
Here, $f_x$, $f_t$, and $f_{xx}$ denote the obvious partial derivatives of $f$, valued in $R^N$, $R$, and $R^{N \times N}$ respectively, and $\mathrm{tr}(A)$ denotes the trace of a square matrix $A$ (the sum of its diagonal elements).

If $X$ is an Ito process in $R^N$ with $dX_t = \mu_t\,dt + \sigma_t\,dB_t$ and $\theta = (\theta^1, \ldots, \theta^N)$ is a vector of adapted processes such that $\theta \cdot \mu$ is in $\mathcal{L}^1$ and, for each $i$, $\theta \cdot \sigma^i$ is in $\mathcal{L}^2$ (where $\sigma^i$ denotes the $i$th column of $\sigma$), then we say that $\theta$ is in $\mathcal{L}(X)$, which means that the stochastic integral $\int \theta\,dX$ exists as an Ito process when defined by
$$\int_0^T \theta_t\,dX_t \equiv \int_0^T \theta_t \cdot \mu_t\,dt + \int_0^T \sigma_t^\top \theta_t\,dB_t, \quad T \geq 0.$$
If $X$ and $Y$ are real-valued Ito processes with $dX_t = \mu_X(t)\,dt + \sigma_X(t)\,dB_t$ and $dY_t = \mu_Y(t)\,dt + \sigma_Y(t)\,dB_t$, then Ito’s Formula (for $N = 2$) implies that the product $Z = XY$ is an Ito process, with
$$dZ_t = X_t\,dY_t + Y_t\,dX_t + \sigma_X(t) \cdot \sigma_Y(t)\,dt. \tag{32}$$
If $\mu_X$, $\mu_Y$, $\sigma_X$, and $\sigma_Y$ are bounded and have continuous sample paths (weaker conditions would suffice), then it follows from Equation (32) that
$$\frac{d}{ds}\,\mathrm{cov}_t(X_s, Y_s)\Big|_{s=t} = \sigma_X(t) \cdot \sigma_Y(t) \quad \text{almost surely},$$
where $\mathrm{cov}_t(X_s, Y_s) = E_t(X_s Y_s) - E_t(X_s)\,E_t(Y_s)$, and where the derivative is taken from the right, extending the intuition developed with Equations (19) and (20).
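For readers who want the intermediate step, the product rule (32) can be sketched from the multidimensional Ito’s Formula by stacking $X$ and $Y$ into an Ito process in $R^2$ and taking $f(x, y) = xy$. The stacking and the subscript notation for the derivatives of $f$ are introduced here only for illustration.

```latex
% Stack (X, Y) into an Ito process in R^2 with drift (\mu_X, \mu_Y)^\top and
% a 2 x d diffusion matrix \sigma whose rows are \sigma_X and \sigma_Y.
\begin{align*}
f(x, y) &= xy, \qquad f_t = 0, \qquad
f_{(x,y)} = (y,\; x), \qquad
f_{(x,y)(x,y)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \\
\tfrac{1}{2}\operatorname{tr}\!\left[\sigma_t \sigma_t^\top f_{(x,y)(x,y)}\right]
  &= \tfrac{1}{2} \cdot 2\,\sigma_X(t) \cdot \sigma_Y(t)
   = \sigma_X(t) \cdot \sigma_Y(t), \\
dZ_t &= \left[ Y_t\,\mu_X(t) + X_t\,\mu_Y(t) + \sigma_X(t) \cdot \sigma_Y(t) \right] dt
      + \left[ Y_t\,\sigma_X(t) + X_t\,\sigma_Y(t) \right] dB_t \\
     &= X_t\,dY_t + Y_t\,dX_t + \sigma_X(t) \cdot \sigma_Y(t)\,dt .
\end{align*}
```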
3.5. Arbitrage modeling

Now, we turn to a more careful definition of arbitrage for purposes of establishing a close link between the absence of arbitrage and the existence of state prices. Suppose the price processes of $N$ given securities form an Ito process $X = (X^{(1)}, \ldots, X^{(N)})$ in $R^N$. We suppose, for technical regularity, that each security price process is in the space $\mathcal{H}^2$ containing any Ito process $Y$ with $dY_t = a(t)\,dt + b(t)\,dB(t)$ for which
$$E\left[\left(\int_0^t a(s)\,ds\right)^2\right] < \infty \quad \text{and} \quad E\left[\int_0^t b(s) \cdot b(s)\,ds\right] < \infty.$$
We will suppose that the securities pay no dividends during the time interval $[0, T)$, and that $X_T$ is the vector of cum-dividend security prices at time $T$. A trading strategy $\theta$ is an $R^N$-valued process $\theta$ in $\mathcal{L}(X)$, meaning simply that the stochastic integral $\int \theta\,dX$ defining trading gains is well defined. A trading strategy $\theta$ is self-financing if
$$\theta_t \cdot X_t = \theta_0 \cdot X_0 + \int_0^t \theta_s\,dX_s, \quad t \leq T. \tag{33}$$
We suppose that there is some short-rate process, a process $r$ with the property that $\int_0^T |r_t|\,dt$ is finite almost surely and, for some security with strictly positive price process $\beta$,
$$\beta_t = \beta_0 \exp\left(\int_0^t r_s\,ds\right), \quad t \in [0, T]. \tag{34}$$
In this case, $d\beta_t = r_t \beta_t\,dt$, allowing us to view $r_t$ as the riskless short-term continuously compounding rate of interest, in an instantaneous sense, and to view $\beta_t$ as the market value of an account that is continually reinvested at the short-term interest rate $r$.

A self-financing strategy $\theta$ is an arbitrage if $\theta_0 \cdot X_0 < 0$ and $\theta_T \cdot X_T \geq 0$, or if $\theta_0 \cdot X_0 \leq 0$ and $\theta_T \cdot X_T > 0$. Our first goal is to characterize the properties of a price process $X$ that admits no arbitrage, at least after placing some reasonable restrictions on trading strategies.

3.6. Numeraire invariance

It is often convenient to renormalize all security prices, sometimes relative to a particular price process. A deflator is a strictly positive Ito process. We can deflate the previously given security price process $X$ by a deflator $Y$ to get the new price process $X^Y$ defined by $X^Y_t = X_t Y_t$. Such a renormalization has essentially no economic effects, as suggested by the following result.
Numeraire Invariance Theorem. Suppose $Y$ is a deflator. Then a trading strategy $\theta$ is self-financing with respect to $X$ if and only if $\theta$ is self-financing with respect to $X^Y$.

The proof is an application of Ito’s Formula. We have the following corollary, which is immediate from the Numeraire Invariance Theorem, the strict positivity of $Y$, and the definition of an arbitrage. On numeraire invariance in more general settings, see Huang (1985a) and Protter (2001).$^{12}$

Corollary. Suppose $Y$ is a deflator. A trading strategy is an arbitrage with respect to $X$ if and only if it is an arbitrage with respect to the deflated price process $X^Y$.
3.7. State prices and doubling strategies

Paralleling the terminology of Section 2.2, a state-price density is a deflator $\pi$ with the property that the deflated price process $X^\pi$ is a martingale. Other terms used for this concept in the literature are state-price deflator, marginal-rate-of-substitution process, and pricing kernel. In the discrete-state discrete-time setting of Section 2, we found that there is a state-price density if and only if there is no arbitrage. In a general continuous-time setting, this result is “almost” true, up to some technical issues.

A technical nuisance in a continuous-time setting is that, without some frictions limiting trade, arbitrage is to be expected. For example, one may think of a series of bets on fair and independent coin tosses at times $1/2$, $3/4$, $7/8$, and so on. Suppose one’s goal is to earn a riskless profit of $\alpha$ by time 1, where $\alpha$ is some arbitrarily large number. One can bet $\alpha$ on heads for the first coin toss at time $1/2$. If the first toss comes up heads, one stops. Otherwise, one owes $\alpha$ to one’s opponent. A bet of $2\alpha$ on heads for the second toss at time $3/4$ produces the desired profit if heads comes up at that time. In that case, one stops. Otherwise, one is down $3\alpha$ and bets $4\alpha$ on the third toss, and so on. Because there is an infinite number of potential tosses, one will eventually stop with a riskless profit of $\alpha$ (almost surely), because the probability of losing on every one of an infinite number of tosses is $(1/2) \cdot (1/2) \cdot (1/2) \cdots = 0$. This is a classic “doubling strategy” that can be ruled out either by a technical limitation, such as limiting the total number of bets, or by a credit restriction limiting the total amount that one is allowed to be in debt.
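The coin-toss example can be simulated directly. The sketch below is only an illustration under stated assumptions: the function name `doubling_strategy`, the cap `max_tosses` (standing in for the “technical limitation” on the number of bets), and the use of Python’s `random.Random` as the coin are all choices made here, not part of the text.

```python
import random


def doubling_strategy(alpha, max_tosses, rng):
    """Bet alpha, 2*alpha, 4*alpha, ... on heads until the first head.

    Returns (profit, worst_debt, tosses_used). If no head appears within
    max_tosses, the accumulated loss is returned instead of the profit.
    """
    wealth = 0.0
    worst_debt = 0.0
    bet = alpha
    for n in range(1, max_tosses + 1):
        if rng.random() < 0.5:        # heads: collect the current bet and stop
            return wealth + bet, worst_debt, n
        wealth -= bet                  # tails: the bet is lost
        worst_debt = max(worst_debt, -wealth)
        bet *= 2.0                     # double the stake for the next toss
    return wealth, worst_debt, max_tosses


if __name__ == "__main__":
    rng = random.Random(0)
    results = [doubling_strategy(1.0, 50, rng) for _ in range(10000)]
    print("smallest profit:", min(p for p, _, _ in results))
    print("largest interim debt:", max(d for _, d, _ in results))
```

Whenever the strategy stops on a head at toss $n$, the profit is exactly $\alpha$, but the debt carried just before that toss is $(2^{n-1} - 1)\alpha$, which is unbounded across paths. This is why a credit constraint on wealth is enough to rule the strategy out.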
For the case of continuous-time trading strategies,$^{13}$ we will eliminate the possibility of “doubling strategies” with a credit constraint, defining the set $\Theta(X)$ of self-financing trading strategies satisfying the non-negative wealth restriction $\theta_t \cdot X_t \geq 0$ for all $t$. An alternative is to restrict trading strategies with a technical integrability condition, as reviewed in Duffie (2001). The next result is based on Dybvig and Huang (1988).

12 For more on the role of numeraire, see Geman, El Karoui and Rochet (1995).
13 An actual continuous-time “doubling” strategy can be found in Karatzas (1993).
Proposition. If there is a state-price density, then there is no arbitrage in $\Theta(X)$.

Weaker no-arbitrage conditions, based on a lower bound on wealth or on integrability conditions, are summarized in Duffie (2001), who provides a standard proof of this result.

3.8. Equivalent martingale measures

In the finite-state setting of Section 2, it was shown that the existence of a state-price deflator is equivalent to the existence of an equivalent martingale measure (after some deflation). Here, we say that $Q$ is an equivalent martingale measure for the price process $X$ if $Q$ is equivalent to $P$ (they have the same events of zero probability), and if $X$ is a martingale under $Q$.

Theorem. If the price process $X$ admits an equivalent martingale measure, then there is no arbitrage in $\Theta(X)$.

In most cases, the theorem is applied along the lines of the following corollary, a consequence of the corollary to the Numeraire Invariance Theorem of Section 3.6.

Corollary. If there is a deflator $Y$ such that the deflated price process $X^Y$ admits an equivalent martingale measure, then there is no arbitrage in $\Theta(X)$.

As in the finite-state case, the absence of arbitrage and the existence of equivalent martingale measures are, in spirit, identical properties, although there are some technical distinctions in this infinite-dimensional setting. Inspired by early work of Kreps (1981), Delbaen and Schachermayer (1998) showed the equivalence, after deflation by a numeraire deflator, between no free lunch with vanishing risk (a slight strengthening of the notion of no arbitrage) and the existence of a local martingale measure.$^{14}$

3.9. Girsanov and market prices of risk

We now look for convenient conditions on $X$ supporting the existence of an equivalent martingale measure. We will also see how to calculate such a measure, and conditions for the uniqueness of such a measure, which is in spirit equivalent to complete markets. This is precisely the case for the finite-state setting of Theorem 2.9.
14 For related results, see Ansel and Stricker (1992a,b), Back and Pliska (1987), Cassese (1996), Duffie and Huang (1986), El Karoui and Quenez (1995), Frittelli and Lakner (1995), Jacod and Shiryaev (1998), Kabanov (1997), Kabanov and Kramkov (1995), Kusuoka (1993), Lakner (1993), Levental and Skorohod (1995), Rogers (1994), Schachermayer (1992, 1994, 2002), Schweizer (1992) and Stricker (1990).

The basic approach is from Harrison and Kreps (1979) and Harrison and Pliska (1981), who coined most of the terms and developed most of the techniques and basic results. Huang (1985a,b) generalized the basic theory. The development here
differs in some minor ways. Most of the results extend to an abstract filtration, not necessarily generated by Brownian motion, but the following important property of Brownian filtrations is somewhat special.

Martingale Representation Theorem. For any martingale $\xi$, there exists some $R^d$-valued process $\theta$ such that the stochastic integral $\int \theta\,dB$ exists and such that, for all $t$,
$$\xi_t = \xi_0 + \int_0^t \theta_s\,dB_s.$$
Now, we consider any given probability measure $Q$ equivalent to $P$, with density process $\xi$ defined by (11). By the martingale representation theorem, we can express the martingale $\xi$ in terms of a stochastic integral of the form $d\xi_t = \gamma_t\,dB_t$, for some adapted process $\gamma = (\gamma^{(1)}, \ldots, \gamma^{(d)})$ with $\int_0^T \gamma_t \cdot \gamma_t\,dt < \infty$ almost surely. Girsanov’s Theorem states that a standard Brownian motion $B^Q$ in $R^d$ under $Q$ is defined by $B^Q_0 = 0$ and $dB^Q_t = dB_t + \eta_t\,dt$, where $\eta_t = -\gamma_t / \xi_t$. Suppose the price process $X$ of the $N$ given securities (possibly after some change of numeraire) is an Ito process in $R^N$, with $dX_t = \mu_t\,dt + \sigma_t\,dB_t$. We can therefore write
$$dX_t = (\mu_t - \sigma_t \eta_t)\,dt + \sigma_t\,dB^Q_t.$$
If $X$ is to be a $Q$-martingale, then its drift under $Q$ must be zero, which means that, almost everywhere,
$$\sigma(\omega, t)\,\eta(\omega, t) = \mu(\omega, t), \quad (\omega, t) \in \Omega \times [0, T]. \tag{35}$$
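At a fixed $(\omega, t)$, Equation (35) is an ordinary linear system, so the existence of a solution can be checked numerically. The sketch below is a minimal pure-Python illustration; the function names, the tolerance, and the assumption that the diffusion matrix has full column rank $d$ are choices made here, and the numerical inputs are hypothetical.

```python
def solve_square(A, b, eps=1e-12):
    """Solve the square system A x = b by Gaussian elimination
    with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def market_price_of_risk(sigma, mu, tol=1e-9):
    """Return eta with sigma @ eta = mu at one fixed (omega, t), or None
    if the system (35) has no solution.

    sigma is an N x d list of rows, assumed to have full column rank d;
    mu is a list of length N.
    """
    n_sec, d = len(sigma), len(sigma[0])
    # Least-squares candidate from the normal equations
    # (sigma^T sigma) eta = sigma^T mu.
    gram = [[sum(sigma[k][i] * sigma[k][j] for k in range(n_sec))
             for j in range(d)] for i in range(d)]
    rhs = [sum(sigma[k][i] * mu[k] for k in range(n_sec)) for i in range(d)]
    eta = solve_square(gram, rhs)
    # The candidate is a market price of risk only if it solves (35) exactly.
    residual = [sum(sigma[k][j] * eta[j] for j in range(d)) - mu[k]
                for k in range(n_sec)]
    if max(abs(e) for e in residual) > tol:
        return None
    return eta


if __name__ == "__main__":
    # Two securities, two Brownian motions: a unique solution exists.
    print(market_price_of_risk([[0.2, 0.0], [0.1, 0.3]], [0.04, 0.08]))
    # Two securities with identical risk but different drifts: no solution.
    print(market_price_of_risk([[0.2], [0.2]], [0.04, 0.06]))
```

In the second example the two securities carry the same exposure to the single Brownian motion but have different drifts, so a long-short position has no risk and a non-zero drift.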
Thus, the existence of a solution $\eta$ to the system (35) of linear equations (almost everywhere) is necessary for the existence of an equivalent martingale measure for $X$. Under additional technical conditions, we will find that it is also sufficient.

We can also view a solution $\eta$ to Equation (35) as providing a proportional relationship between mean rates of change of prices ($\mu$) and the amounts ($\sigma$) of “risk” in price changes stemming from the underlying $d$ Brownian motions. For this reason, any such solution $\eta$ is called a market-price-of-risk process for $X$. The idea is that $\eta_i(t)$ is the “unit price”, measured in price drift, of bearing exposure to the increment of $B^{(i)}$ at time $t$.

A numeraire deflator is a deflator that is the reciprocal of the price process of one of the securities. It is usually the case that one first chooses some numeraire deflator $Y$,
and then calculates the market price of risk for the deflated price process $X^Y$. This is technically convenient because one of the securities, the “numeraire”, has a price that is always 1 after such a deflation. If there is a short-rate process $r$, a typical numeraire deflator is given by $Y$, where $Y_t = \exp\left(-\int_0^t r_s\,ds\right)$.

If there is no market price of risk, one may guess that something is “wrong”, as the following result confirms.

Lemma. Let $Y$ be a numeraire deflator. If there is no market-price-of-risk process for $X^Y$, then there are arbitrages in $\Theta(X)$, and there is no equivalent martingale measure for $X^Y$.

Proof: Suppose $X^Y$ has drift process $\mu^Y$ and diffusion $\sigma^Y$, and that there is no solution $\eta$ to $\sigma^Y \eta = \mu^Y$. Then, as a matter of linear algebra, there exists an adapted process $\theta$ taking values that are row vectors in $R^N$ such that $\theta \sigma^Y \equiv 0$ and $\theta \mu^Y \neq 0$. By replacing $\theta(\omega, t)$ with zero for any $(\omega, t)$ such that $\theta(\omega, t)\,\mu^Y(\omega, t) < 0$, we can arrange to have $\theta \mu^Y > 0$. (This works provided the resulting process $\theta$ is not identically zero; in that case the same procedure applied to $-\theta$ works.) Finally, because the numeraire security associated with the deflator has a price that is identically equal to 1 after deflation, we can also choose the trading strategy for the numeraire so that, in addition to the above properties, $\theta$ is self-financing. That is, assuming without loss of generality that the numeraire security is the last security, we can let
$$\theta^{(N)}_t = \sum_{i=1}^{N-1} \left( -\theta^{(i)}_t X^{Y,(i)}_t + \int_0^t \theta^{(i)}_s\,dX^{Y,(i)}_s \right).$$
It follows that $\theta$ is a self-financing trading strategy with $\theta_0 \cdot X^Y_0 = 0$, whose wealth process $W$, defined by $W_t = \theta_t \cdot X^Y_t$, is increasing and not constant. In particular, $\theta$ is in $\Theta(X^Y)$. It follows that $\theta$ is an arbitrage for $X^Y$, and therefore (by Numeraire Invariance) for $X$. Finally, the reasoning leading to Equation (35) implies that if there is no market-price-of-risk process, then there can be no equivalent martingale measure for $X^Y$.

For any $R^d$-valued adapted process $\eta$ in $\mathcal{L}(B)$, we let $\xi^\eta$ be defined by
$$\xi^\eta_t = \exp\left( -\int_0^t \eta_s\,dB_s - \tfrac{1}{2}\int_0^t \eta_s \cdot \eta_s\,ds \right). \tag{36}$$
Ito’s Formula implies that $d\xi^\eta_t = -\xi^\eta_t \eta_t\,dB_t$. Novikov’s Condition, a sufficient technical condition for $\xi^\eta$ to be a martingale, is that
$$E\left[ \exp\left( \tfrac{1}{2}\int_0^T \eta_s \cdot \eta_s\,ds \right) \right] < \infty.$$

Theorem. If $X$ has a market-price-of-risk process $\eta$ satisfying Novikov’s Condition, and moreover $\xi^\eta_T$ has finite variance, then there is an equivalent martingale measure for $X$, and there is no arbitrage in $\Theta(X)$.
Proof: By Novikov’s Condition, $\xi^\eta$ is a positive martingale. We have $\xi^\eta_0 = e^0 = 1$, so $\xi^\eta$ is indeed the density process of an equivalent probability measure $Q$ defined by $\frac{dQ}{dP} = \xi^\eta_T$. By Girsanov’s Theorem, a standard Brownian motion $B^Q$ in $R^d$ under $Q$ is defined by $dB^Q_t = dB_t + \eta_t\,dt$. Thus $dX_t = \sigma_t\,dB^Q_t$. As $\frac{dQ}{dP}$ has finite variance and each security price process $X^{(i)}$ is by assumption in $\mathcal{H}^2$, we know by the Cauchy–Schwarz Inequality that
$$E^Q\left[ \left( \int_0^T \sigma^{(i)}(t) \cdot \sigma^{(i)}(t)\,dt \right)^{1/2} \right] = E^P\left[ \left( \int_0^T \sigma^{(i)}(t) \cdot \sigma^{(i)}(t)\,dt \right)^{1/2} \frac{dQ}{dP} \right]$$
is finite. Thus, $X^{(i)}$ is a $Q$-martingale by Proposition 3.2, and $Q$ is therefore an equivalent martingale measure. The lack of arbitrage in $\Theta(X)$ follows from Theorem 3.8.

Putting this result together with the previous lemma, we see that the existence of a market-price-of-risk process is necessary and, coupled with a technical integrability condition, sufficient for the absence of “well-behaved” arbitrages and the existence of an equivalent martingale measure. Huang and Pagès (1992) give an extension to the case of an infinite-time horizon.

For uniqueness of equivalent martingale measures, we can use the fact that, for any such measure $Q$, Girsanov’s Theorem implies that we must have $\frac{dQ}{dP} = \xi^\eta_T$, for some market price of risk $\eta$. If $\sigma(\omega, t)$ is of maximal rank $d$, however, there can be at most one solution $\eta(\omega, t)$ to Equation (35). This maximal rank condition is equivalent to the condition that the span of the rows of $\sigma(\omega, t)$ is all of $R^d$.

Proposition. If $\mathrm{rank}(\sigma) = d$ almost everywhere, then there is at most one market price of risk and at most one equivalent martingale measure. If there is a unique market-price-of-risk process, then $\mathrm{rank}(\sigma) = d$ almost everywhere.

With incomplete markets, significant attention in the literature has been paid to the issue of “which equivalent martingale measure to use” for the purpose of pricing contingent claims that are not redundant. Babbs and Selby (1996), Bühlmann, Delbaen, Embrechts and Shiryaev (1998), and Föllmer and Schweizer (1990) suggest some selection criteria or parameterization for equivalent martingale measures in incomplete markets. In particular, Artzner (1995), Bajeux-Besnainou and Portait (1997), Dijkstra (1996), Johnson (1994) and Long (1990) address the numeraire portfolio, also called growth-optimal portfolio, as a device for selecting a state-price density.
Little of this literature offers an economic theory for the use of a particular measure for pricing new contingent claims that are not already traded (or replicated) by the given primitive securities.
3.10. Black–Scholes again

Suppose the given security-price process is $X = (S^{(1)}, \ldots, S^{(N-1)}, \beta)$, where, for $S = (S^{(1)}, \ldots, S^{(N-1)})$,
$$dS_t = \mu_t\,dt + \sigma_t\,dB_t,$$
and
$$d\beta_t = r_t \beta_t\,dt; \quad \beta_0 > 0,$$
where $\mu$, $\sigma$, and $r$ are adapted processes (valued in $R^{N-1}$, $R^{(N-1) \times d}$, and $R$, respectively). We also suppose for technical convenience that the short-rate process $r$ is bounded. Then $Y = \beta^{-1}$ is a convenient numeraire deflator, and we let $Z = SY$. By Ito’s Formula,
$$dZ_t = \left( -r_t Z_t + \frac{\mu_t}{\beta_t} \right) dt + \frac{\sigma_t}{\beta_t}\,dB_t.$$
In order to apply Theorem 3.9 to the deflated price process $\hat{X} = (Z, 1)$, it would be enough to know that $Z$ has a market price of risk $\eta$ and that the variance of $\xi^\eta_T$ is finite. Given this, there would be an equivalent martingale measure $Q$ and no arbitrage in $\Theta(\hat{X})$. Suppose, for the moment, that this is the case. By Girsanov’s Theorem, there is a standard Brownian motion $B^Q$ in $R^d$ under $Q$ such that
$$dZ_t = \frac{\sigma_t}{\beta_t}\,dB^Q_t.$$
Because $S = \beta Z$, another application of Ito’s Formula yields
$$dS_t = r_t S_t\,dt + \sigma_t\,dB^Q_t. \tag{37}$$
Equation (37) is an important intermediate result for arbitrage-free asset pricing, giving an explicit expression for security prices under a probability measure $Q$ with the property that the “discounted” price process $S/\beta$ is a martingale. For example, this leads to an easy recovery of the Black–Scholes formula, as follows.

Suppose that, of the securities with price processes $S^{(1)}, \ldots, S^{(N-1)}$, one is a call option on another. For convenience, we denote the price process of the call option by $U$ and the price process of the underlying security by $V$, so that $U_T = (V_T - K)^+$, for expiration at time $T$ with some given exercise price $K$. Because $UY$ is by assumption a martingale under $Q$, we have
$$U_t = \beta_t\,E^Q_t\left( \frac{U_T}{\beta_T} \right) = E^Q_t\left[ \exp\left( -\int_t^T r(s)\,ds \right) (V_T - K)^+ \right]. \tag{38}$$
The reader may verify that this is the Black–Scholes formula for the case of $d = 1$, $V_0 > 0$, and with constants $r$ and non-zero $\sigma$ such that for all
$t$, $r_t = r$ and $dV_t = V_t\,\mu_V(t)\,dt + V_t\,\sigma\,dB_t$, where $\mu_V$ is a bounded adapted process. Indeed, in this case, $Z$ has a market-price-of-risk process $\eta$ such that $\xi^\eta_T$ has finite variance, an exercise, so the assumption of an equivalent martingale measure is justified. More precisely, it is sufficient for the absence of arbitrage that the option-price process is given by Equation (38). Necessity of the Black–Scholes formula for the absence of arbitrages in $\Theta(X)$ is addressed in Duffie (2001). We can already see, however, that the expectation in Equation (38) defining the Black–Scholes formula does not depend on which equivalent martingale measure $Q$ one chooses, so one should expect that the Black–Scholes formula (38) is also necessary for the absence of arbitrage. If Equation (38) is not satisfied, for instance, there cannot be an equivalent martingale measure for $S/\beta$. Unfortunately, and for purely technical reasons, this is not enough to imply directly the necessity of Equation (38) for the absence of well-behaved arbitrage, because we do not have a precise equivalence between the absence of arbitrage and the existence of equivalent martingale measures.

In the Black–Scholes setting, $\sigma$ is of maximal rank $d = 1$ almost everywhere. Thus, from Proposition 3.9, there is exactly one equivalent martingale measure.

The detailed calculations of Girsanov’s Theorem appear nowhere in the actual solution (37) for the “risk-neutral behavior” of arbitrage-free security prices, which can be given by inspection in terms of $\sigma$ and $r$ only.
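The verification left to the reader can also be checked numerically. In the constant-coefficient case, $V_T = V_0 \exp\left((r - \sigma^2/2)T + \sigma\sqrt{T}\,\varepsilon\right)$ under $Q$, with $\varepsilon$ standard normal, so the expectation (38) at $t = 0$ is a one-dimensional integral. The sketch below approximates that integral with the trapezoid rule and compares it with the closed form; the function names, grid size, truncation width, and parameter values are illustrative assumptions.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call_closed_form(v0, K, T, r, sigma):
    """Closed-form Black-Scholes call price at time 0."""
    vol = sigma * math.sqrt(T)
    z = (math.log(v0 / K) + (r + 0.5 * sigma ** 2) * T) / vol
    return v0 * norm_cdf(z) - math.exp(-r * T) * K * norm_cdf(z - vol)


def risk_neutral_call(v0, K, T, r, sigma, n=20000, width=10.0):
    """Approximate exp(-rT) E^Q[(V_T - K)^+] of Equation (38) at t = 0.

    The expectation is over eps ~ N(0, 1), integrated by the trapezoid
    rule on [-width, width] with n subintervals. Note that the drift
    mu_V of V under P does not enter: only r and sigma appear.
    """
    h = 2.0 * width / n
    total = 0.0
    for i in range(n + 1):
        eps = -width + i * h
        v_T = v0 * math.exp((r - 0.5 * sigma ** 2) * T
                            + sigma * math.sqrt(T) * eps)
        payoff = max(v_T - K, 0.0)
        density = math.exp(-0.5 * eps ** 2) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * payoff * density
    return math.exp(-r * T) * h * total


if __name__ == "__main__":
    args = (100.0, 100.0, 1.0, 0.05, 0.2)
    print("closed form :", bs_call_closed_form(*args))
    print("quadrature  :", risk_neutral_call(*args))
```

The two numbers should agree up to the quadrature error, which illustrates that the price depends on the model only through $r$ and $\sigma$.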
3.11. Complete markets

We say that a random variable $W$ can be replicated by a self-financing trading strategy $\theta$ if $W = \theta_T \cdot X_T$. Our basic objective in this section is to give a simple spanning condition on the diffusion $\sigma$ of the price process $X$ under which, up to technical integrability conditions, any random variable can be replicated (without resorting to “doubling strategies”).

Proposition. Suppose $Y$ is a numeraire deflator and $Q$ is an equivalent martingale measure for the deflated price process $X^Y$. Suppose the diffusion $\sigma^Y$ of $X^Y$ is of maximal rank $d$ almost everywhere. Let $W$ be any random variable with $E^Q(|W Y_T|) < \infty$. Then there is a self-financing trading strategy $\theta$ that replicates $W$ and whose deflated market-value process $\{\theta_t \cdot X^Y_t : t \leq T\}$ is a $Q$-martingale.

Proof: Without loss of generality, the numeraire is the last of the $N$ securities, so we write $X^Y = (Z, 1)$. Let $B^Q$ be the standard Brownian motion in $R^d$ under $Q$ obtained by Girsanov’s Theorem. The martingale representation property implies that, for any $Q$-martingale, there is some $\varphi$ such that
$$E^Q_t(W Y_T) = E^Q(W Y_T) + \int_0^t \varphi_s\,dB^Q_s, \quad t \in [0, T]. \tag{39}$$
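By the rank assumption on $\sigma^Y$ and the fact that $\sigma^Y_{Nt} = 0$, there are adapted processes $\theta^{(1)}, \ldots, \theta^{(N-1)}$ solving
$$\sum_{j=1}^{N-1} \theta^{(j)}_t \sigma^Y_{jt} = \varphi_t, \quad t \in [0, T]. \tag{40}$$
Let $\theta^{(N)}$ be defined by
$$\theta^{(N)}_t = E^Q(W Y_T) + \sum_{i=1}^{N-1} \left( \int_0^t \theta^{(i)}_s\,dZ^{(i)}_s - \theta^{(i)}_t Z^{(i)}_t \right). \tag{41}$$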
Then $\theta = (\theta^{(1)}, \ldots, \theta^{(N)})$ is self-financing and $\theta_T \cdot X^Y_T = W Y_T$. By the Numeraire Invariance Theorem, $\theta$ is also self-financing with respect to $X$ and $\theta_T \cdot X_T = W$. As $\int \varphi\,dB^Q$ is by construction a $Q$-martingale, Equations (39)–(41) imply that $\{\theta_t \cdot X^Y_t : 0 \leq t \leq T\}$ is a $Q$-martingale.

The property that the deflated market-value process $\{\theta_t \cdot X^Y_t : 0 \leq t \leq T\}$ is a $Q$-martingale ensures that there is no use of doubling strategies. For example, if $W \geq 0$, then the martingale property implies that $\theta_t \cdot X_t \geq 0$ for all $t$.

Analogues to some of the results in this section for the case of market imperfections such as portfolio constraints or transactions costs are provided by Ahn, Dayal, Grannan and Swindle (1995), Bergman (1995), Constantinides and Zariphopoulou (1999, 2001), Cvitanić and Karatzas (1993), Davis and Clark (1993), Grannan and Swindle (1996), Henrotte (1991), Jouini and Kallal (1993), Karatzas and Kou (1998), Kusuoka (1992, 1995), Soner, Shreve and Cvitanić (1994) and Whalley and Wilmott (1997). Many of these results are asymptotic, for “small” proportional transactions costs, based on the approach of Leland (1985).

3.12. Optimal trading and consumption

We now apply the “martingale” characterization of the cost of replicating an arbitrary payoff, given in the last proposition, to the problem of optimal portfolio and consumption processes. The setting is Merton’s problem, as formulated and solved in certain settings, for geometric Brownian prices, by Merton (1971). Merton used the method of dynamic programming, solving the associated Hamilton–Jacobi–Bellman (HJB) equation.$^{15}$

15 The book of Fleming and Soner (1993) treats HJB equations and stochastic control problems, emphasizing the use of viscosity methods.

A major alternative method is the martingale approach to optimal investment, which reached a key stage of development with Cox and Huang (1989), who treat the agent’s candidate consumption choice as though it is a derivative security, and maximize
the agent’s utility subject to a wealth constraint on the arbitrage-free price of the consumption. Since that price can be calculated in terms of the given state-price density, the result is a simple static optimization problem.$^{16}$ Karatzas and Shreve (1998) provide a comprehensive treatment of optimal portfolio and consumption processes in this setting.

16 The related literature is immense, and includes Cox (1983), Pliska (1986), Cox and Huang (1991), Back (1986, 1991), Back and Pliska (1987), Duffie and Skiadas (1994), Foldes (1978a,b, 1990, 1991a,b, 1992, 2001), Harrison and Kreps (1979), Huang (1985b), Huang and Pagès (1992), Karatzas, Lehoczky and Shreve (1987), Lakner and Slud (1991), Pagès (1987) and Xu and Shreve (1992).

Fixing a probability space $(\Omega, \mathcal{F}, P)$ and the standard filtration $\{\mathcal{F}_t : t \geq 0\}$ of a standard Brownian motion $B$ in $R^d$, we suppose that $X = (X^{(0)}, X^{(1)}, \ldots, X^{(N)})$ is an Ito process in $R^{N+1}$ for the prices of $N + 1$ securities, with
$$dX^{(i)}_t = \mu^{(i)}_t X^{(i)}_t\,dt + X^{(i)}_t \sigma^{(i)}_t\,dB_t; \quad X^{(i)}_0 > 0, \tag{42}$$
where $\mu = (\mu^{(0)}, \ldots, \mu^{(N)})$ and the $R^{N \times d}$-valued process $\sigma$ are bounded adapted processes. Letting $\sigma^{(i)}$ denote the $i$th row of $\sigma$, we suppose that $\sigma^{(0)} = 0$, so that we can treat $\mu^{(0)}$ as the short-rate process $r$. A special case of this setup is to have geometric Brownian security prices and a constant short rate, which was the setting of Merton’s original problem.

We assume for simplicity that $N = d$. The excess expected returns of the “risky” securities are defined by the $R^N$-valued process $\lambda$ given by $\lambda^{(i)}_t = \mu^{(i)}_t - r_t$. A deflated price process $\hat{X}$ is defined by $\hat{X}_t = X_t \exp\left(-\int_0^t r_s\,ds\right)$. We assume that $\sigma$ is invertible (almost everywhere) and that the market-price-of-risk process $\eta$ for $\hat{X}$, defined by $\eta_t = \sigma_t^{-1} \lambda_t$, is bounded. It follows that markets are complete (in the sense of Proposition 3.11) and that there are no arbitrages meeting the standard credit constraint of non-negative wealth.

In this setting, a state-price density $\pi$ is defined by
$$\pi_t = \exp\left( -\int_0^t r_s\,ds \right) \xi_t, \tag{43}$$
where $\xi = \xi^\eta$ is the density process defined by Equation (36) for an equivalent martingale measure $Q$, after deflation by $\exp\left[\int_0^t -r(s)\,ds\right]$.

Utility is defined over the space $D$ of consumption pairs $(c, Z)$, where $c$ is an adapted nonnegative consumption-rate process with $\int_0^T c_t\,dt < \infty$ almost surely, and $Z$ is an $\mathcal{F}_T$-measurable nonnegative random variable describing terminal lump-sum consumption. Specifically, $U : D \to R$ is defined by
$$U(c, Z) = E\left[ \int_0^T u(c_t, t)\,dt + F(Z) \right], \tag{44}$$
where
• $F : R_+ \to R$ is increasing and concave with $F(0) = 0$;
• $u : R_+ \times [0, T] \to R$ is continuous and, for each $t$ in $[0, T]$, $u(\cdot, t) : R_+ \to R$ is increasing and concave, with $u(0, t) = 0$;
• $F$ is strictly concave or zero, or for each $t$ in $[0, T]$, $u(\cdot, t)$ is strictly concave or zero.
• At least one of $u$ and $F$ is non-zero.

A trading strategy is a process $\theta = (\theta^{(0)}, \ldots, \theta^{(N)})$ in $\mathcal{L}(X)$, meaning merely that the gain-from-trade stochastic integral $\int \theta\,dX$ exists. Given an initial wealth $w > 0$, we say that $(c, Z, \theta)$ is budget-feasible if $(c, Z)$ is a consumption choice in $D$ and $\theta$ is a trading strategy satisfying
$$\theta_t \cdot X_t = w + \int_0^t \theta_s\,dX_s - \int_0^t c_s\,ds \geq 0, \quad t \in [0, T], \tag{45}$$
and
$$\theta_T \cdot X_T \geq Z. \tag{46}$$
The first restriction (45) is that the current market value qt · Xt of the trading strategy is non-negative, a credit constraint, and is equal to its initial value w, plus any gains from security trade, less the cumulative consumption to date. The second restriction (46) is that the terminal portfolio value is sufficient to cover the terminal consumption. We now have the problem, for each initial wealth w, sup
(c,Z,q) ∈ L(w)
U (c, Z),
(47)
where $\Lambda(w)$ is the set of budget-feasible choices at wealth $w$. First, we state an extension of the numeraire invariance result of Section 3.4, which obtains from an application of Ito’s Formula.
Lemma. Let $Y$ be any deflator. Given an initial wealth $w \ge 0$, a strategy $(c, Z, \theta)$ is budget-feasible given price process $X$ if and only if it is budget-feasible after deflation, that is,
$$\theta_t \cdot X_t^Y = w Y_0 + \int_0^t \theta_s\,dX_s^Y - \int_0^t Y_s c_s\,ds \ge 0, \qquad t \in [0, T], \qquad (48)$$
and
$$\theta_T \cdot X_T^Y \ge Z Y_T. \qquad (49)$$
With numeraire invariance, we can reduce the dynamic trading and consumption problem to a static optimization problem subject to an initial wealth constraint, as follows.
Proposition. Given a consumption choice $(c, Z)$ in $D$, there exists a trading strategy $\theta$ such that $(c, Z, \theta)$ is budget-feasible at initial wealth $w$ if and only if
$$E\left(\pi_T Z + \int_0^T \pi_t c_t\,dt\right) \le w. \qquad (50)$$
Proof: Suppose $(c, Z, \theta)$ is budget-feasible. Applying the previous numeraire-invariance lemma to the state-price deflator $\pi$, and using the fact that $\pi_0 = \xi_0 = 1$, we have
$$w + \int_0^T \theta_t\,dX_t^\pi \ge \pi_T Z + \int_0^T \pi_t c_t\,dt. \qquad (51)$$
Because $X^\pi$ is a martingale under $P$, the process $M$, defined by $M_t = w + \int_0^t \theta_s\,dX_s^\pi$, is a non-negative local martingale, and therefore a supermartingale. For the definitions of local martingale and supermartingale, and for this property, see for example Protter (1990). By the supermartingale property, $M_0 \ge E(M_T)$. Taking expectations through Equation (51) thus leaves Equation (50).
Conversely, suppose $(c, Z)$ satisfies Equation (50), and let $M$ be the $Q$-martingale defined by
$$M_t = E_t^Q\left(e^{-rT} Z + \int_0^T e^{-rt} c_t\,dt\right).$$
By Girsanov’s Theorem, a standard Brownian motion $B^Q$ in $\mathbb{R}^d$ under $Q$ is defined by $dB_t^Q = dB_t + \eta_t\,dt$, and $B^Q$ has the martingale representation property. Thus, there is some $\varphi = (\varphi^{(1)}, \ldots, \varphi^{(d)})$ in $\mathcal{L}(B^Q)$ such that
$$M_t = M_0 + \int_0^t \varphi_s\,dB_s^Q, \qquad t \in [0, T],$$
where $M_0 \le w$. For the deflator $Y$ defined by $Y_t = \exp\left[-\int_0^t r(s)\,ds\right]$, we also know that $\tilde{X} = X^Y$ is a $Q$-martingale. From the definitions of the market price of risk $\eta$ and of $B^Q$,
$$d\tilde{X}_t^{(i)} = \tilde{X}_t^{(i)} \sigma_t^{(i)}\,dB_t^Q, \qquad 1 \le i \le N.$$
Because $\sigma_t$ is invertible and $\tilde{X}$ is strictly positive with continuous sample paths, we can choose $\theta^{(i)}$ in $\mathcal{L}(\tilde{X}^{(i)})$ for each $i \le N$ such that
$$\left(\theta_t^{(1)} \tilde{X}_t^{(1)}, \ldots, \theta_t^{(N)} \tilde{X}_t^{(N)}\right) \sigma_t = \varphi_t, \qquad t \in [0, T].$$
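This implies that
$$M_t = M_0 + \sum_{i=1}^N \int_0^t \theta_s^{(i)}\,d\tilde{X}_s^{(i)}. \qquad (52)$$
We can also let
$$\theta_t^{(0)} = w + \sum_{i=1}^N \int_0^t \theta_s^{(i)}\,d\tilde{X}_s^{(i)} - \sum_{i=1}^N \theta_t^{(i)} \tilde{X}_t^{(i)} - \int_0^t e^{-rs} c_s\,ds. \qquad (53)$$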
From Equation (50) and the fact that $\xi_t = \pi_t \exp\left[\int_0^t r(s)\,ds\right]$ defines the density process for $Q$,
$$M_0 = E^Q\left(e^{-rT} Z + \int_0^T e^{-rt} c_t\,dt\right) \le w. \qquad (54)$$
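From Equations (53) and (52), and the fact that $\int \theta^{(0)}\,d\tilde{X}^{(0)} = 0$,
$$\begin{aligned} \theta_t \cdot \tilde{X}_t &= w + \int_0^t \theta_s\,d\tilde{X}_s - \int_0^t e^{-rs} c_s\,ds \\ &= w + M_t - M_0 - \int_0^t e^{-rs} c_s\,ds \\ &= w - M_0 + E_t^Q\left(\int_t^T e^{-rs} c_s\,ds + e^{-rT} Z\right) \ge 0, \end{aligned}$$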
using Equation (54). With numeraire invariance, Equation (45) follows. We can also use the same inequality for $t = T$, Equation (54), and the fact that $M_T = \exp\left[-\int_0^T r(s)\,ds\right] Z + \int_0^T \exp\left[-\int_0^t r(s)\,ds\right] c_t\,dt$ to obtain Equation (46). Thus, $(c, Z, \theta)$ is budget-feasible.
Corollary. Given a consumption choice $(c^*, Z^*)$ in $D$ and some initial wealth $w$, there exists a trading strategy $\theta^*$ such that $(c^*, Z^*, \theta^*)$ solves Merton’s problem (47) if and only if $(c^*, Z^*)$ solves the problem
$$\sup_{(c, Z) \in D} U(c, Z) \quad \text{subject to} \quad E\left(\int_0^T \pi_t c_t\,dt + \pi_T Z\right) \le w. \qquad (55)$$
3.13. Martingale solution to Merton’s problem
We are now in a position to obtain a relatively explicit solution to Merton’s problem (47) by using the equivalent formulation (55).
By the Saddle Point Theorem and the strict monotonicity of $U$, $(c^*, Z^*)$ solves (55) if and only if there is a scalar Lagrange multiplier $\gamma^* > 0$ such that, first, $(c^*, Z^*)$ solves the unconstrained problem
$$\sup_{(c, Z) \in D} \mathcal{L}(c, Z; \gamma^*), \qquad (56)$$
where, for any $\gamma \ge 0$,
$$\mathcal{L}(c, Z; \gamma) = U(c, Z) - \gamma\,E\left(\pi_T Z + \int_0^T \pi_t c_t\,dt - w\right), \qquad (57)$$
and second, $(c^*, Z^*)$ satisfies the complementary-slackness condition
$$E\left(\pi_T Z^* + \int_0^T \pi_t c_t^*\,dt\right) = w. \qquad (58)$$
We can summarize our progress on Merton’s problem (47) as follows.
Proposition. Given some $(c^*, Z^*)$ in $D$, there is a trading strategy $\theta^*$ such that $(c^*, Z^*, \theta^*)$ solves Merton’s problem (47) if and only if there is a constant $\gamma^* > 0$ such that $(c^*, Z^*)$ solves Equation (56) and $E\left(\pi_T Z^* + \int_0^T \pi_t c_t^*\,dt\right) = w$.
In order to obtain intuition for the solution of (56), we begin with some arbitrary $\gamma > 0$ and treat $U(c, Z) = E\left[\int_0^T u(c_t, t)\,dt + F(Z)\right]$ intuitively by thinking of “$E$” and “$\int$” as finite sums, in which case the first-order conditions for optimality of $(c^*, Z^*) \gg 0$ for the problem $\sup_{(c, Z)} \mathcal{L}(c, Z; \gamma)$, assuming differentiability of $u$ and $F$, are
$$u_c(c_t^*, t) - \gamma \pi_t = 0, \qquad t \in [0, T], \qquad (59)$$
and
$$F'(Z^*) - \gamma \pi_T = 0. \qquad (60)$$
Solving, we have
$$c_t^* = I(\gamma \pi_t, t), \qquad t \in [0, T], \qquad (61)$$
and
$$Z^* = I_F(\gamma \pi_T), \qquad (62)$$
where $I(\cdot, t)$ inverts¹⁷ $u_c(\cdot, t)$ and where $I_F$ inverts $F'$. We will confirm these conjectured forms (61) and (62) of the solution in the next theorem. Under strict concavity of $u$ or $F$, the inversions $I(\cdot, t)$ and $I_F$, respectively, are continuous and strictly decreasing. A decreasing function $\hat{w}: (0, \infty) \to \mathbb{R}$ is therefore defined by
$$\hat{w}(\gamma) = E\left[\int_0^T \pi_t I(\gamma \pi_t, t)\,dt + \pi_T I_F(\gamma \pi_T)\right]. \qquad (63)$$
17 If $u = 0$, we take $I = 0$. If $F = 0$, we take $I_F = 0$.
(We have not yet ruled out the possibility that the expectation may be $+\infty$.) All of this implies that $(c^*, Z^*)$ of Equations (61) and (62) solves Problem (55) provided the required initial investment $\hat{w}(\gamma)$ is equal to the endowed initial wealth $w$. This leaves an equation $\hat{w}(\gamma) = w$ to solve for the “correct” Lagrange multiplier $\gamma^*$, and with that an explicit solution to the optimal consumption policy for Merton’s problem.
We now consider properties of $u$ and $F$ guaranteeing that $\hat{w}(\gamma) = w$ can be solved for a unique $\gamma^* > 0$. A strictly concave increasing function $F: \mathbb{R}_+ \to \mathbb{R}$ that is differentiable on $(0, \infty)$ satisfies Inada conditions if $\inf_x F'(x) = 0$ and $\sup_x F'(x) = +\infty$. If $F$ satisfies these Inada conditions, then the inverse $I_F$ of $F'$ is well defined as a strictly decreasing continuous function on $(0, \infty)$ whose image is $(0, \infty)$.
Condition A. Either $F$ is zero or $F$ is differentiable on $(0, \infty)$, strictly concave, and satisfies Inada conditions. Either $u$ is zero or, for all $t$, $u(\cdot, t)$ is differentiable on $(0, \infty)$, strictly concave, and satisfies Inada conditions. For each $\gamma > 0$, $\hat{w}(\gamma)$ is finite.
We recall the standing assumption that at least one of $u$ and $F$ is nonzero. The assumption of finiteness of $\hat{w}(\cdot)$ has been shown by Kramkov and Schachermayer (1999) to follow from natural regularity conditions.
Theorem. Under Condition A and the standing conditions on $\mu$, $\sigma$, and $r$, for any $w > 0$, Merton’s problem has the optimal consumption policy given by Equations (61) and (62) for a unique scalar $\gamma > 0$.
Proof: Under Condition A, the Dominated Convergence Theorem implies that $\hat{w}(\cdot)$ is continuous. Because one or both of $I(\cdot, t)$ and $I_F(\cdot)$ have $(0, \infty)$ as their image and are strictly decreasing, $\hat{w}(\cdot)$ inherits these two properties. From this, given any initial wealth $w > 0$, there is a unique $\gamma^*$ with $\hat{w}(\gamma^*) = w$. Let $(c^*, Z^*)$ be defined by Equations (61) and (62), taking $\gamma = \gamma^*$.
The previous proposition tells us there is a trading strategy $\theta^*$ such that $(c^*, Z^*, \theta^*)$ is budget-feasible. Let $(\theta, c, Z)$ be any budget-feasible choice. The previous proposition also implies that $(c, Z)$ satisfies Equation (50). For each $(\omega, t)$, the first-order conditions (59) and (60) are sufficient (by concavity of $u$ and $F$) for optimality of $c^*(\omega, t)$ and $Z^*(\omega)$ in the problems
$$\sup_{\bar{c} \in [0, \infty)} u(\bar{c}, t) - \gamma^* \pi(\omega, t)\,\bar{c}$$
and
$$\sup_{\bar{Z} \in [0, \infty)} F(\bar{Z}) - \gamma^* \pi(\omega, T)\,\bar{Z},$$
respectively. Thus,
$$u(c_t^*, t) - \gamma^* \pi_t c_t^* \ge u(c_t, t) - \gamma^* \pi_t c_t, \qquad 0 \le t \le T, \qquad (64)$$
and
$$F(Z^*) - \gamma^* \pi_T Z^* \ge F(Z) - \gamma^* \pi_T Z. \qquad (65)$$
Integrating Equation (64) from 0 to $T$, adding Equation (65), taking expectations, and then applying the complementary slackness condition (58) and the budget constraint (50), leaves $U(c^*, Z^*) \ge U(c, Z)$. As $(c, Z, \theta)$ is arbitrary, this implies the optimality of $(c^*, Z^*, \theta^*)$.
In practice, solving the equation $\hat{w}(\gamma^*) = w$ for $\gamma^*$ may require a one-dimensional numerical search, which is straightforward because $\hat{w}(\cdot)$ is strictly monotone. This result, giving a relatively explicit consumption solution to Merton’s problem, has been extended in many directions, even generalizing the assumption of additive utility to allow for habit-formation or recursive utility, as shown by Schroder and Skiadas (1999).
For a specific example, we treat terminal consumption only by taking $u \equiv 0$, and we let $F(w) = w^\alpha/\alpha$ for $\alpha \in (0, 1)$. Then $c^* = 0$ and the calculations above imply that $\hat{w}(\gamma) = E\left[\pi_T (\gamma \pi_T)^{1/(\alpha - 1)}\right]$. Solving $\hat{w}(\gamma^*) = w$ for $\gamma^*$ leaves
$$\gamma^* = w^{\alpha - 1}\,E\left[\pi_T^{\alpha/(\alpha - 1)}\right]^{1 - \alpha}.$$
From Equation (62), $Z^* = I_F(\gamma^* \pi_T)$. Although this approach generates a straightforward solution for the optimal consumption policy, the form of the optimal trading strategy can be difficult to determine. For the special case of geometric Brownian price processes (constant $\mu$ and $\sigma$) and a constant short rate $r$, we can calculate that $Z^* = W_T$, where $W$ is the geometric Brownian wealth process obtained from
$$dW_t = W_t (r + \varphi \cdot \lambda)\,dt + W_t \varphi^\top \sigma\,dB_t; \qquad W_0 = w,$$
where $\varphi = (\sigma \sigma^\top)^{-1} \lambda/(1 - \alpha)$ is the vector of fixed optimal portfolio fractions. More generally, in a Markov setting, one can derive a PDE for the wealth process, as for the pricing approach to the Black–Scholes option pricing formula, and from the derivatives of the solution function obtain the associated trading strategy. Merton’s original stochastic-control approach, in a Markov setting, gives explicit solutions for the optimal trading strategy in terms of the derivatives of the value function solving the HJB equation. Although there are only a few examples in which these derivatives are
known explicitly, they can be approximated by a numerical solution of the Hamilton–Jacobi–Bellman equation. This martingale approach to solving Problem (47) has been extended with duality techniques and other methods to cases of investment with constraints, including incomplete markets. See, for example, Cvitanić and Karatzas (1996), Cvitanić, Wang and Schachermayer (2001), Cuoco (1997), and the many sources cited by Karatzas and Shreve (1998).
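The one-dimensional search for the Lagrange multiplier in the power-utility example of this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's own procedure: it assumes a constant short rate r and a constant scalar market price of risk eta, so that the state-price deflator at T is log-normal, pi_T = exp[-(r + eta^2/2) T - eta B_T], and its moments are available in closed form. The function names and parameter values are hypothetical.

```python
import math

def deflator_moment(k, r, eta, T):
    """E[pi_T**k] when pi_T = exp(-(r + eta**2/2)*T - eta*B_T) is log-normal."""
    return math.exp(-k * (r + 0.5 * eta**2) * T + 0.5 * k**2 * eta**2 * T)

def required_wealth(gamma, alpha, r, eta, T):
    """w_hat(gamma) = E[pi_T * (gamma*pi_T)**(1/(alpha-1))] for F(w) = w**alpha/alpha."""
    p = 1.0 / (alpha - 1.0)
    return gamma**p * deflator_moment(1.0 + p, r, eta, T)

def solve_multiplier(w, alpha, r, eta, T, lo=1e-12, hi=1e12):
    """Bisection on log(gamma); it relies on w_hat being strictly decreasing."""
    a, b = math.log(lo), math.log(hi)
    for _ in range(200):
        mid = 0.5 * (a + b)
        if required_wealth(math.exp(mid), alpha, r, eta, T) > w:
            a = mid   # required wealth too large, so gamma must rise
        else:
            b = mid
    return math.exp(0.5 * (a + b))

def closed_form_multiplier(w, alpha, r, eta, T):
    """gamma* = w**(alpha-1) * E[pi_T**(alpha/(alpha-1))]**(1-alpha)."""
    k = alpha / (alpha - 1.0)
    return w**(alpha - 1.0) * deflator_moment(k, r, eta, T)**(1.0 - alpha)

# Hypothetical parameter values, chosen only for illustration.
w, alpha, r, eta, T = 100.0, 0.5, 0.03, 0.4, 1.0
gamma_search = solve_multiplier(w, alpha, r, eta, T)
gamma_exact = closed_form_multiplier(w, alpha, r, eta, T)
print(gamma_search, gamma_exact)
```

In this special case the search is redundant, since the closed form is available; the bisection is included only to show how the strict monotonicity of the required-wealth function would be used when no closed form exists.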
4. Term-structure models
This section reviews models of the term structure of interest rates. These models are used to analyze the dynamic behavior of bond yields and their relationships with macro-economic covariates, and also for the pricing and hedging of fixed-income securities, those whose future payoffs are contingent on future interest rates. Term-structure modeling is one of the most active and sophisticated areas of application of financial theory to everyday business problems, ranging from managing the risk of a bond portfolio to the design and pricing of collateralized mortgage obligations. In this section, we treat default-free instruments. In Section 6, we turn to defaultable bonds. This section provides only a small skeleton of the extensive literature on term-structure models. More extensive notes to the literature are found in Duffie (2001) and in the surveys by Dai and Singleton (2003) and Piazzesi (2002).
We first treat the standard “single-factor” examples of Merton (1974), Cox, Ingersoll and Ross (1985a), Dothan (1978), Vasicek (1977), Black, Derman and Toy (1990), and some of their variants. These models treat the entire term structure of interest rates at any time as a function of a single state variable, the short rate of interest. We will then turn to multi-factor models, including multi-factor affine models, extending the Cox–Ingersoll–Ross and Vasicek models. Finally, we turn to the term-structure framework of Heath, Jarrow and Morton (1992), which allows, under technical conditions, any initial term structure of forward interest rates and any process for the conditional volatilities and correlations of these forward rates.
Numerical tractability is essential for practical and econometric applications. One must fit model parameters from time-series or cross-sectional data on bond and derivative prices. A fitted model may be used to price or hedge related contingent claims.
Typical numerical methods include “binomial trees,” Fourier-transform methods, Monte-Carlo simulation, and finite-difference solution of PDEs. Even the “zero curve” of discounts must be fitted to the prices of coupon bonds.¹⁸ In econometric applications, bond or option prices must be solved repeatedly for a large sample of dates and instruments, for each of many candidate parameter choices.
18 See Adams and Van Deventer (1994), Coleman, Fisher and Ibbotson (1992), Diament (1993), Fisher, Nychka and Zervos (1994), Jaschke (1996), Konno and Takase (1995, 1996) and Svensson and Dahlquist (1996). Consistency of the curve-fitting method with an underlying term-structure model is examined by Björk and Christensen (1999), Björk and Gombani (1999) and Filipović (1999).
We fix a probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\mathbb{F} = \{\mathcal{F}_t : 0 \le t \le T\}$ satisfying the usual conditions,¹⁹ as well as a short-rate process $r$. We have departed from a dependence on Brownian information in order to allow for “surprise jumps”, which are important in certain applications. A zero-coupon bond maturing at some future time $s > t$ pays no dividends before time $s$, and offers a fixed lump-sum payment at time $s$ that we can take without loss of generality to be 1 unit of account. Although it is not always essential to do so, we assume throughout that such a bond exists for each maturity date $s$. One of our main objectives is to characterize the price $\Lambda_{t,s}$ at time $t$ of the $s$-maturity bond, and its behavior over time. We fix some equivalent martingale measure $Q$, after taking as a numeraire for deflation purposes the market value $\exp\left[\int_0^t r(s)\,ds\right]$ of investments rolled over at the short-rate process $r$. The price at time $t$ of the zero-coupon bond maturing at $s$ is then
$$\Lambda_{t,s} \equiv E_t^Q\left[\exp\left(-\int_t^s r(u)\,du\right)\right]. \qquad (66)$$
The term structure is often expressed in terms of the yield curve. The continuously compounding yield $y_{t,\tau}$ on a zero-coupon bond maturing at time $t + \tau$ is defined by
$$y_{t,\tau} = -\frac{\log \Lambda_{t,t+\tau}}{\tau}.$$
The term structure can also be represented in terms of forward interest rates, as explained later in this section.
4.1. One-factor models
A one-factor term-structure model means a model of $r$ that satisfies a stochastic differential equation (SDE) of the form
$$dr_t = \mu(r_t, t)\,dt + \sigma(r_t, t)\,dB_t^Q, \qquad (67)$$
where $B^Q$ is a standard Brownian motion under $Q$ and where $\mu: \mathbb{R} \times [0, T] \to \mathbb{R}$ and $\sigma: \mathbb{R} \times [0, T] \to \mathbb{R}^d$ satisfy technical conditions guaranteeing the existence of a solution to Equation (67) such that, for all $t$ and $s \ge t$, the price $\Lambda_{t,s}$ of the zero-coupon bond maturing at $s$ is finite and well defined by Equation (66).
The one-factor models are so named because the Markov property (under $Q$) of the solution $r$ to Equation (67) implies, from Equation (66), that the short rate is the only state variable, or “factor”, on which the current yield curve depends. That is, for all $t$ and $s \ge t$, we can write $y_{t,s} = F(t, s, r_t)$, for some fixed $F: [0, T] \times [0, T] \times \mathbb{R} \to \mathbb{R}$.
19 For these technical conditions, see for example, Protter (1990).
Table 1 shows many of the parametric examples of one-factor models appearing in the literature, with their conventional names. Each of these models is a special case of the SDE
$$dr_t = \left[K_{0t} + K_{1t} r_t + K_{2t} r_t \log(r_t)\right]dt + \left[H_{0t} + H_{1t} r_t\right]^{\nu} dB_t^Q, \qquad (68)$$
for deterministic coefficients $K_{0t}$, $K_{1t}$, $K_{2t}$, $H_{0t}$ and $H_{1t}$ depending continuously on $t$, and for some exponent $\nu \in [0.5, 1.5]$. Coefficient restrictions, and restrictions on the space of possible short rates, are needed for the existence and uniqueness of solutions. For each model, Table 1 shows the associated exponent $\nu$, and uses the symbol “•” to indicate those coefficients that appear in nonzero form.

Table 1
Common single-factor model parameters, Equation (68)

Model                                  K0   K1   K2   H0   H1   ν
Cox, Ingersoll and Ross (1985a)        •    •              •    0.5
Pearson and Sun (1994)                 •    •         •    •    0.5
Dothan (1978)                                              •    1.0
Brennan and Schwartz (1977)            •    •              •    1.0
Merton (1974), Ho and Lee (1986)       •              •         1.0
Vasicek (1977)                         •    •         •         1.0
Black and Karasinski (1991)                 •    •         •    1.0
Constantinides and Ingersoll (1984)                        •    1.5

We can view a negative coefficient $K_{1t}$ as a mean-reversion parameter, in that a higher short rate generates a lower drift, and vice versa. Empirically speaking, mean reversion is widely believed to be a useful attribute to include in single-factor short-rate models.²⁰ Non-parametric single-factor models are estimated by Aït-Sahalia (1996a,b, 2002). The empirical evidence, as examined for example by Dai and Singleton (2000), however, points strongly toward multifactor extensions, to which we will turn shortly.
20 In most cases, the original versions of these models had constant coefficients, and were only later extended to allow $K_{it}$ and $H_{it}$ to depend on $t$, for practical reasons, such as calibration of the model to a given set of bond and option prices. The Gaussian short-rate model of Merton (1974), who originated much of the approach taken here, was extended by Ho and Lee (1986), who developed the idea of calibration of the model to the current yield curve. The calibration idea was further developed by Black, Derman and Toy (1990), Hull and White (1990, 1993) and Black and Karasinski (1991), among others. Option evaluation and other applications of the Gaussian model are provided by Carverhill (1988), Jamshidian (1989a,b,c, 1991a, 1993b) and El Karoui and Rochet (1989). A popular special case of the Black–Karasinski model is the Black–Derman–Toy model.
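To make the nesting of Table 1 in Equation (68) concrete, the sketch below evaluates the drift and diffusion of (68) for constant coefficients and advances the short rate with a plain Euler step. The Euler discretization, the function names, and the numerical parameter values are illustrative assumptions rather than part of the chapter. The truncation at zero inside the diffusion is a common practical device for the square-root case in a discretized path, not a statement about the exact process.

```python
import math
import random

def drift(r, K0=0.0, K1=0.0, K2=0.0):
    """Drift of Equation (68): K0 + K1*r + K2*r*log(r) (the log term needs r > 0)."""
    log_term = K2 * r * math.log(r) if K2 != 0.0 else 0.0
    return K0 + K1 * r + log_term

def diffusion(r, H0=0.0, H1=0.0, nu=1.0):
    """Diffusion of Equation (68): [H0 + H1*r]**nu, truncated at zero before the power."""
    return max(H0 + H1 * r, 0.0) ** nu

def euler_path(r0, T, n_steps, drift_kw, diff_kw, seed=0):
    """Euler discretization of (68) with constant coefficients (an assumed scheme)."""
    rng = random.Random(seed)
    dt = T / n_steps
    path = [r0]
    for _ in range(n_steps):
        r = path[-1]
        dB = rng.gauss(0.0, math.sqrt(dt))
        path.append(r + drift(r, **drift_kw) * dt + diffusion(r, **diff_kw) * dB)
    return path

# Two rows of Table 1 with hypothetical coefficient values.
vasicek = dict(drift_kw={"K0": 0.02, "K1": -0.4}, diff_kw={"H0": 0.01, "nu": 1.0})
cir = dict(drift_kw={"K0": 0.02, "K1": -0.4}, diff_kw={"H1": 0.01, "nu": 0.5})

path = euler_path(0.03, 5.0, 500, **vasicek)
print(path[-1])
```

With the negative value of K1 used here, the drift is lower at a high short rate than at a low one, which is the mean-reversion reading of K1 given in the text.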
For essentially any single-factor model, the term structure can be computed (numerically, if not explicitly) by taking advantage of the Feynman–Kac relationship between SDEs and PDEs. Fixing for convenience the maturity date $s$, the Feynman–Kac approach implies from Equation (66), under technical conditions on $\mu$ and $\sigma$, for all $t$, that $\Lambda_{t,s} = f(r_t, t)$, where $f \in C^{2,1}(\mathbb{R} \times [0, T))$ solves the PDE
$$\mathcal{D}f(x, t) - x f(x, t) = 0, \qquad (x, t) \in \mathbb{R} \times [0, s), \qquad (69)$$
with boundary condition
$$f(x, s) = 1, \qquad x \in \mathbb{R},$$
where
$$\mathcal{D}f(x, t) = f_t(x, t) + f_x(x, t)\,\mu(x, t) + \tfrac{1}{2} f_{xx}(x, t)\,\sigma(x, t)^2.$$
This PDE can be quickly solved using standard finite-difference numerical algorithms.
A subset of the models considered in Table 1, those with $K_2 = H_1 = 0$, are Gaussian.²¹ Special cases are the models of Merton (1974) (often called “Ho–Lee”) and Vasicek (1977). For a Gaussian model, we can show that bond-price processes are log-normal (under $Q$) by defining a new process $y$ satisfying $dy_t = -r_t\,dt$, and noting that $(r, y)$ is a two-dimensional Gaussian Markov process. Thus, for any $t$ and $s \ge t$, the random variable $y_s - y_t = -\int_t^s r_u\,du$ is normally distributed under $Q$, with a mean $m(t, s)$ and variance $v(s - t)$, conditional on $\mathcal{F}_t$, that are easily computed in terms of $r_t$, $K_0$, $K_1$, and $H_0$. The conditional variance $v(s - t)$ is deterministic. The conditional mean $m(t, s)$ is of the form $a(s - t) + \beta(s - t)\,r_t$, for coefficients $a(s - t)$ and $\beta(s - t)$ whose calculation is left to the reader. It follows that
$$\begin{aligned} \Lambda_{t,s} &= E_t^Q\left[\exp\left(-\int_t^s r_u\,du\right)\right] \\ &= \exp\left[m(t, s) + \frac{v(s - t)}{2}\right] \\ &= \exp\left[\alpha(s - t) + \beta(s - t)\,r_t\right], \end{aligned}$$
where $\alpha(s - t) = a(s - t) + v(s - t)/2$. Because $r_t$ is normally distributed under $Q$, this means that any zero-coupon bond price is log-normally distributed under $Q$. Using this property, one can compute bond-option prices in this setting using the original Black–Scholes formula. For this, a key simplifying trick of Jamshidian (1989b) is to adopt as a new numeraire the zero-coupon bond maturing at the expiration date of the option. The associated equivalent martingale measure is sometimes called the forward measure.
21 By a Gaussian process, we mean that the short rates $r(t_1), \ldots, r(t_k)$ at any finite set $\{t_1, \ldots, t_k\}$ of times have a joint normal distribution under $Q$.
Under the new numeraire and the forward measure, the price of the bond underlying the option is log-normally distributed with a variance that is easily calculated, and the Black–Scholes formula can be applied. Aside from the simplicity of the Gaussian model, this explicit computation is one of its main advantages in applications. An undesirable feature of the Gaussian model, however, is that it implies that the short rate and yields on bonds of any maturity are negative with positive probability at any future date. While negative interest rates are sometimes plausible when expressed in “real” (consumption numeraire) terms, it is common in practice to express term structures in nominal terms, relative to the price of money. In nominal terms, negative bond yields imply a kind of arbitrage. In order to describe this arbitrage, we can formally view money as a security with no dividends whose price process is identically equal to 1. (This definition in itself is an arbitrage!) If a particular zero-coupon bond were to offer a negative yield, consider a short position in the bond (that is, borrowing) and a long position of an equal number of units of money, both held to the maturity of the bond. With a negative bond yield, the initial bond price is larger than 1, implying that this position is an arbitrage. To address properly the role of money in supporting nonnegative interest rates would, however, require a rather wide detour into monetary theory and the institutional features of money markets. Let us merely leave this issue with the sense that allowing negative interest rates is not necessarily “wrong,” but is somewhat undesirable. Gaussian short-rate models are nevertheless frequently used because they are relatively tractable and in light of the low likelihood that they would assign to negative interest rates within a reasonably short time, with reasonable choices for the coefficient functions. 
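The Gaussian calculation described above can be sketched for the constant-coefficient mean-reverting case, dr = (K0 + K1 r) dt + H0 dB^Q with K1 < 0. The sketch computes the conditional mean and variance of minus the integrated short rate, forms the bond price as exp(mean + variance/2), checks the result against the term-structure PDE (69) by finite differences, and evaluates the probability of a negative short rate. The closed-form moments are the standard ones for this Gaussian (Vasicek-type) process; the function names, step size and parameter values are assumptions made for illustration.

```python
import math

def gaussian_bond_price(r, tau, K0, K1, H0):
    """Zero-coupon price for dr = (K0 + K1*r)dt + H0*dB, K1 < 0, maturity tau ahead."""
    kappa = -K1                      # mean-reversion speed
    rbar = K0 / kappa                # long-run risk-neutral mean
    B = (1.0 - math.exp(-kappa * tau)) / kappa
    mean = -(rbar * tau + (r - rbar) * B)   # conditional mean of -integral of r
    var = (H0 / kappa) ** 2 * (
        tau - 2.0 * B + (1.0 - math.exp(-2.0 * kappa * tau)) / (2.0 * kappa))
    return math.exp(mean + 0.5 * var)

def pde_residual(r, tau, K0, K1, H0, h=1e-4):
    """Finite-difference residual of PDE (69) at short rate r and time to maturity tau."""
    f = lambda x, u: gaussian_bond_price(x, u, K0, K1, H0)
    f_t = -(f(r, tau + h) - f(r, tau - h)) / (2.0 * h)   # d/dt = -d/dtau
    f_x = (f(r + h, tau) - f(r - h, tau)) / (2.0 * h)
    f_xx = (f(r + h, tau) - 2.0 * f(r, tau) + f(r - h, tau)) / h**2
    return f_t + f_x * (K0 + K1 * r) + 0.5 * f_xx * H0**2 - r * f(r, tau)

def prob_negative_rate(r0, t, K0, K1, H0):
    """Q-probability that r_t < 0, using the normal distribution of r_t."""
    kappa = -K1
    rbar = K0 / kappa
    mean = rbar + math.exp(-kappa * t) * (r0 - rbar)
    sd = H0 * math.sqrt((1.0 - math.exp(-2.0 * kappa * t)) / (2.0 * kappa))
    return 0.5 * (1.0 + math.erf(-mean / (sd * math.sqrt(2.0))))

# Hypothetical parameter values.
K0, K1, H0 = 0.02, -0.4, 0.01
price = gaussian_bond_price(0.03, 5.0, K0, K1, H0)
residual = pde_residual(0.03, 5.0, K0, K1, H0)
p_neg = prob_negative_rate(0.03, 5.0, K0, K1, H0)
print(price, residual, p_neg)
```

For these values the probability of a negative short rate is strictly positive but small, which is the trade-off between tractability and negative rates discussed in the text.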
One of the best-known single-factor term-structure models is that of Cox, Ingersoll and Ross (1985b), the “CIR model,” which exploits the stochastic properties of the diffusion model of population sizes of Feller (1951). For constant coefficient functions $K_0$, $K_1$, and $H_1$, the CIR drift and diffusion functions, $\mu$ and $\sigma$, may be written in the form
$$\mu(x, t) = \kappa(\bar{x} - x); \qquad \sigma(x, t) = C\sqrt{x}, \qquad x \ge 0, \qquad (70)$$
for constants $\kappa$, $\bar{x}$, and $C$. Provided $\kappa$ and $\bar{x}$ are non-negative, there is a nonnegative solution to the associated SDE (67). (Karatzas and Shreve (1988) offer a standard proof.) Given $r_0$, provided $\kappa \bar{x} > C^2$, we know that $r_t$ has a non-central $\chi^2$ distribution under $Q$, with parameters that are known explicitly. The drift $\kappa(\bar{x} - r_t)$ indicates reversion of $r_t$ toward a stationary risk-neutral mean $\bar{x}$ at a rate $\kappa$, in the sense that
$$E^Q(r_t) = \bar{x} + e^{-\kappa t}(r_0 - \bar{x}),$$
which tends to $\bar{x}$ as $t$ goes to $+\infty$. Cox, Ingersoll and Ross (1985b) show how the coefficients $\kappa$, $\bar{x}$, and $C$ can be calculated in a general equilibrium setting in terms of the utility function and endowment of a representative agent. For the CIR model,
it can be verified by direct computation of the derivatives that the solution for the term-structure PDE (69) is
$$f(x, t) = \exp\left[\alpha(s - t) + \beta(s - t)\,x\right], \qquad (71)$$
where
$$\alpha(u) = \frac{2\kappa \bar{x}}{C^2}\left[\log\left(2\gamma \exp\left[(\gamma + \kappa)u/2\right]\right) - \log\left((\gamma + \kappa)(e^{\gamma u} - 1) + 2\gamma\right)\right],$$
$$\beta(u) = \frac{2(1 - e^{\gamma u})}{(\gamma + \kappa)(e^{\gamma u} - 1) + 2\gamma},$$
for $\gamma = (\kappa^2 + 2C^2)^{1/2}$.
The Gaussian and Cox–Ingersoll–Ross models are special cases of single-factor models with the property that the solution $f$ of the term-structure PDE (69) is given by the exponential-affine form (71) for some coefficients $\alpha(\cdot)$ and $\beta(\cdot)$ that are continuously differentiable. For all $t$, the yield $-\log[f(x, t)]/(s - t)$ obtained from Equation (71) is affine in $x$. We therefore call any such model an affine term-structure model. (A function $g: \mathbb{R}^k \to \mathbb{R}$, for some $k$, is affine if there are a constant $a$ in $\mathbb{R}$ and a vector $b$ in $\mathbb{R}^k$ such that for all $x$, $g(x) = a + b \cdot x$.) It turns out that, technicalities aside, $\mu$ and $\sigma^2$ are affine in $x$ if and only if the term structure is itself affine in $x$. The idea that an affine term-structure model is typically associated with affine drift $\mu$ and squared diffusion $\sigma^2$ is foreshadowed in Cox, Ingersoll and Ross (1985b) and Hull and White (1990), and is explicit in Brown and Schaefer (1994). Filipović (2001a) provides a definitive result for affine term structure models in a one-dimensional state space. We will get to multi-factor models shortly. The special cases associated with the Gaussian model and the CIR model have explicit solutions for $\alpha$ and $\beta$.
Cherif, El Karoui, Myneni and Viswanathan (1995), Constantinides (1992), El Karoui, Myneni and Viswanathan (1992), Jamshidian (1996) and Rogers (1995) characterize a model in which the short rate is a linear-quadratic form in a multivariate Markov Gaussian process. This “LQG” class of models overlaps with the general affine models, as for example in Piazzesi (1999), although it remains to be seen how we would maximally nest the affine and quadratic Gaussian models in a simple and tractable framework.
4.2. Term-structure derivatives
An important application of term-structure models is the arbitrage-free valuation of derivatives.
Some of the most common derivatives are listed below, abstracting from many institutional details that can be found in a standard reference such as Sundaresan (1997).
(a) A European option expiring at time $s$ on a zero-coupon bond maturing at some later time $u$, with strike price $p$, is a claim to $(\Lambda_{s,u} - p)^+$ at $s$.
(b) A forward-rate agreement (FRA) calls for a net payment by the fixed-rate payer of $c^* - c(s)$ at time $s$, where $c^*$ is a fixed payment and $c(s)$ is a floating-rate payment for a time-to-maturity $\delta$, in arrears, meaning that $c(s) = \Lambda_{s-\delta,s}^{-1} - 1$ is the simple interest rate applying at time $s - \delta$ for loans maturing at time $s$. In practice, we usually have a time to maturity, $\delta$, of one quarter or one half year. When originally sold, the fixed-rate payment $c^*$ is usually set so that the FRA is at market, meaning of zero market value. Cox, Ingersoll and Ross (1981), Duffie and Stanton (1988) and Grinblatt and Jegadeesh (1996) consider the relative pricing of futures and forwards.
(c) An interest-rate swap is a portfolio of FRAs maturing at a given increasing sequence $t(1), t(2), \ldots, t(n)$ of coupon dates. The inter-coupon interval $t(i) - t(i-1)$ is usually 3 months or 6 months. The associated FRA for date $t(i)$ calls for a net payment by the fixed-rate payer of $c^* - c(t(i))$, where the floating-rate payment received is $c(t(i)) = \Lambda_{t(i-1),t(i)}^{-1} - 1$, and the fixed-rate payment $c^*$ is the same for all coupon dates. At initiation, the swap is usually at market, meaning that the fixed rate $c^*$ is chosen so that the swap is of zero market value. Ignoring default risk and market imperfections, this would imply that the fixed-rate coupon $c^*$ is the par coupon rate. That is, the at-market swap rate $c^*$ is set at the origination date $t$ of the swap so that
$$1 = c^*\left(\Lambda_{t,t(1)} + \cdots + \Lambda_{t,t(n)}\right) + \Lambda_{t,t(n)},$$
meaning that $c^*$ is the coupon rate on a par bond, one whose face value and initial market value are the same. Swap markets are analyzed by Brace and Musiela (1994), Carr and Chen (1996), Collin-Dufresne and Solnik (2001), Duffie and Huang (1996), Duffie and Singleton (1997), El Karoui and Geman (1994) and Sundaresan (1997). For institutional and general economic features of swap markets, see Lang, Litzenberger and Liu (1998) and Litzenberger (1992).
(d) A cap can be viewed as a portfolio of “caplet” payments of the form $(c(t(i)) - c^*)^+$, for a sequence of payment dates $t(1), t(2), \ldots, t(n)$ and floating rates $c(t(i))$ that are defined as for a swap. The fixed rate $c^*$ is set with the terms of the cap contract. For the valuation of caps, see, for example, Chen and Scott (1995), Clewlow, Pang and Strickland (1997), Miltersen, Sandmann and Sondermann (1997), and Scott (1997). The basic idea is to view a caplet as a put option on a zero-coupon bond.
(e) A floor is defined symmetrically with a cap, replacing $(c(t(i)) - c^*)^+$ with $(c^* - c(t(i)))^+$.
(f) A swaption is an option to enter into a swap at a given strike rate $c^*$ at some exercise time. If the exercise time is fixed, the swaption is European. Pricing of European swaptions is developed in Gaussian settings by Jamshidian (1989a,b,c, 1991a), and more generally in affine settings by Berndt (2002), Collin-Dufresne and Goldstein (2002) and Singleton and Umantsev (2003). An important variant, the Bermudan swaption, allows exercise at any of a given set of successive coupon
dates. For valuation methods, see Andersen and Andreasen (2000b) and Longstaff and Schwartz (2001).
Jamshidian (2001) and Rutkowski (1996, 1998) offer general treatments of LIBOR (London Interbank Offered Rate) derivative modeling.²² Path-dependent derivative securities, such as mortgage-backed securities, sometimes call for additional state variables.²³
In a one-factor setting, suppose a derivative has a payoff at some given time $s$ defined by $g(r_s)$. By the definition of an equivalent martingale measure, the price at time $t$ for such a security is
$$F(r_t, t) \equiv E_t^Q\left[\exp\left(-\int_t^s r_u\,du\right) g(r_s)\right].$$
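As a worked instance of the at-market condition in item (c) of the list above, the sketch below solves the par-bond equation for the swap rate from a set of zero-coupon prices. The flat continuously compounded yield used to generate the discounts is purely an assumption for illustration, as are the function names. Note that the rate returned is a per-period coupon, matching the definition of the floating payment as a simple rate over one inter-coupon interval.

```python
import math

def discount(t, maturity, flat_yield):
    """Zero-coupon price at t for the given maturity, under an assumed flat yield."""
    return math.exp(-flat_yield * (maturity - t))

def par_swap_rate(t, coupon_dates, flat_yield):
    """Solve 1 = c*(sum of discounts) + (last discount) for the per-period coupon c."""
    discounts = [discount(t, s, flat_yield) for s in coupon_dates]
    return (1.0 - discounts[-1]) / sum(discounts)

def coupon_bond_value(t, coupon_dates, coupon, flat_yield):
    """Value at t of a unit-face bond paying `coupon` on each coupon date."""
    discounts = [discount(t, s, flat_yield) for s in coupon_dates]
    return coupon * sum(discounts) + discounts[-1]

# Hypothetical five-year swap with semiannual coupon dates and a 4% flat yield.
coupon_dates = [0.5 * i for i in range(1, 11)]
c_star = par_swap_rate(0.0, coupon_dates, 0.04)
par_value = coupon_bond_value(0.0, coupon_dates, c_star, 0.04)
print(c_star, par_value)
```

With a flat curve the per-period par coupon coincides with the simple rate over one coupon interval, which gives an easy sanity check on the algebra.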
Under technical conditions on $\mu$, $\sigma$ and $g$, we know that $F$ solves the PDE, for $(x, t) \in \mathbb{R} \times [0, s)$,
$$F_t(x, t) + F_x(x, t)\,\mu(x, t) + \tfrac{1}{2} F_{xx}(x, t)\,\sigma(x, t)^2 - x F(x, t) = 0, \qquad (72)$$
with boundary condition
$$F(x, s) = g(x), \qquad x \in \mathbb{R}.$$
For example, the valuation of a zero-coupon bond option is given, in a one-factor setting, by the solution $F$ to Equation (72), with boundary value $g(x) = [f(x, s) - p]^+$, where $f(x, s)$ is the price at time $s$ of a zero-coupon bond maturing at $u$.
4.3. Fundamental solution
Under technical conditions, we can also express the solution $F$ of the PDE (72) for the value of a derivative term-structure security in the form
$$F(x, t) = \int_{-\infty}^{+\infty} G(x, t, y, s)\,g(y)\,dy, \qquad (73)$$
where $G$ is the fundamental solution of the PDE (72). One may think of $G(x, t, y, s)\,dy$ as the price at time $t$, state $x$, of an “infinitesimal security” paying one unit of account in the event that the state is at level $y$ at time $s$, and nothing otherwise. One can compute the fundamental solution $G$ by solving a PDE that is “dual” to Equation (72), in the following sense. Under technical conditions, for each $(x, t)$ in $\mathbb{R} \times [0, T)$, a function $\psi \in C^{2,1}(\mathbb{R} \times (0, T])$ is defined by $\psi(y, s) = G(x, t, y, s)$, and solves the forward Kolmogorov equation (also known as the Fokker–Planck equation):
$$\mathcal{D}^*\psi(y, s) - y\,\psi(y, s) = 0, \qquad (74)$$
where
$$\mathcal{D}^*\psi(y, s) = -\psi_s(y, s) - \frac{\partial}{\partial y}\left[\psi(y, s)\,\mu(y, s)\right] + \frac{1}{2}\frac{\partial^2}{\partial y^2}\left[\psi(y, s)\,\sigma(y, s)^2\right].$$
22 On the valuation of other specific forms of term-structure derivatives, see Artzner and Roger (1993), Bajeux-Besnainou and Portait (1998), Brace and Musiela (1994), Chacko and Das (2002), Chen and Scott (1992, 1993), Cherubini and Esposito (1995), Chesney, Elliott and Gibson (1993), Cohen (1995), Daher, Romano and Zacklad (1992), Décamps and Rochet (1997), El Karoui, Lepage, Myneni, Roseau and Viswanathan (1991a,b), and Turnbull (1993), Fleming and Whaley (1994) (wildcard options), Ingersoll (1977) (convertible bonds), Jamshidian (1993a, 1994) (diff swaps and quantos), Jarrow and Turnbull (1994), Longstaff (1990) (yield options), and Turnbull (1995).
23 The pricing of mortgage-backed securities based on term-structure models is pursued by Boudoukh, Richardson, Stanton and Whitelaw (1997), Cheyette (1996), Jakobsen (1992), Stanton (1995) and Stanton and Wallace (1995, 1998), who also review some of the related literature.
The “intuitive” boundary condition for Equation (74) is obtained from the role of $G$ in pricing securities. Imagine that the current short rate at time $t$ is $x$, and consider an instrument that pays one unit of account immediately, if and only if the current short rate is some number $y$. Presumably this contingent claim is valued at 1 unit of account if $x = y$, and otherwise has no value. From continuity in $s$, one can thus think of $\psi(\cdot, s)$ as the density at time $s$ of a measure on $\mathbb{R}$ that converges as $s \downarrow t$ to a probability measure $\nu$ with $\nu(\{x\}) = 1$, sometimes called the Dirac measure at $x$. This initial boundary condition on $\psi$ can be made more precise. See, for example, Karatzas and Shreve (1988) for details. Applications to term-structure modeling of the fundamental solution, sometimes erroneously called the “Green’s function,” are illustrated by Dash (1989), Beaglehole (1990), Beaglehole and Tenney (1991), Büttler and Waldvogel (1996), Dai (1994) and Jamshidian (1991b). For example, Beaglehole and Tenney (1991) show that the fundamental solution $G$ of the Cox–Ingersoll–Ross model (70) is given explicitly in terms of the parameters $\kappa$, $\bar{x}$ and $C$ by
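$$G(x, 0, y, t) = \frac{\phi(t)\, I_q\!\left(2\phi(t)\sqrt{x y\, e^{-\gamma t}}\right)}{\exp\left[\phi(t)\left(y + x e^{-\gamma t}\right) - \eta\left(x + \kappa \bar{x} t - y\right)\right]} \left(\frac{e^{\gamma t} y}{x}\right)^{q/2},$$
where
$$\gamma = \left(\kappa^2 + 2C^2\right)^{1/2}, \qquad \eta = \frac{\kappa - \gamma}{C^2}, \qquad \phi(t) = \frac{2\gamma}{C^2\left(1 - e^{-\gamma t}\right)}, \qquad q = \frac{2\kappa \bar{x}}{C^2} - 1,$$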
and $I_q(\cdot)$ is the modified Bessel function of the first kind of order q. For time-independent $\mu$ and $\sigma$, as with the CIR model, we have, for all t and s > t, G(x, t, y, s) = G(x, 0, y, s − t). The fundamental solution for the Dothan (log-normal) short-rate model can be deduced from the form of the solution by Hogan and Weintraub (1993) of what they call the “conditional discounting function”. Chen (1996) provides the fundamental
solution for his 3-factor affine model. Van Steenkiste and Foresi (1999) provide a general treatment of fundamental solutions of the PDE for affine models. For more technical details and references see, for example, Karatzas and Shreve (1988). Given the fundamental solution G, the derivative asset-price function F is more easily computed by numerically integrating Equation (73) than from a direct numerical attack on the PDE (72). Thus, given a sufficient number of derivative securities whose prices must be computed, it may be worth the effort to compute G.
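As a rough illustration of computing a price by integrating against the fundamental solution, the following sketch prices a zero-coupon bond (payoff g ≡ 1) in the CIR model. It assumes that Equation (73) takes the usual form F(x, t) = ∫ G(x, t, y, s) g(y) dy, and it takes the Bessel argument to be 2φ(t)√(xy e^{−γt}), the form for which the integral reproduces the standard closed-form CIR bond price. All function names, the parameter values, the integration range and the Simpson grid are illustrative assumptions, not part of the original text.

```python
import math


def bessel_i(q, z):
    """Modified Bessel function of the first kind I_q(z), via its power series (q >= 0, z >= 0)."""
    if z == 0.0:
        return 1.0 if q == 0 else 0.0
    half = 0.5 * z
    term = half ** q / math.gamma(q + 1.0)      # k = 0 term of the series
    total = term
    k = 0
    while True:
        k += 1
        term *= half * half / (k * (k + q))     # ratio of consecutive series terms
        total += term
        if term <= 1e-17 * total:
            return total


def cir_fundamental_solution(x, y, t, kappa, xbar, C):
    """G(x, 0, y, t) for the CIR model; assumes q > 0 so that G vanishes at y = 0."""
    if y <= 0.0:
        return 0.0
    gamma = math.sqrt(kappa ** 2 + 2.0 * C ** 2)
    eta = (kappa - gamma) / C ** 2
    phi = 2.0 * gamma / (C ** 2 * (1.0 - math.exp(-gamma * t)))
    q = 2.0 * kappa * xbar / C ** 2 - 1.0
    z = 2.0 * phi * math.sqrt(x * y * math.exp(-gamma * t))   # assumed Bessel argument
    log_prefactor = (math.log(phi)
                     - phi * (y + x * math.exp(-gamma * t))
                     + eta * (x + kappa * xbar * t - y)
                     + 0.5 * q * (gamma * t + math.log(y / x)))
    return math.exp(log_prefactor) * bessel_i(q, z)


def bond_price_from_fundamental_solution(x, t, kappa, xbar, C, y_max=0.4, n=2000):
    """Simpson's rule for the integral of G(x, 0, y, t) * g(y) over [0, y_max], with g = 1."""
    h = y_max / n
    total = (cir_fundamental_solution(x, 0.0, t, kappa, xbar, C)
             + cir_fundamental_solution(x, y_max, t, kappa, xbar, C))
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0
        total += weight * cir_fundamental_solution(x, i * h, t, kappa, xbar, C)
    return total * h / 3.0


def cir_bond_price(x, t, kappa, xbar, C):
    """Standard closed-form CIR zero-coupon bond price, used here only as a check."""
    gamma = math.sqrt(kappa ** 2 + 2.0 * C ** 2)
    e = math.exp(gamma * t) - 1.0
    denom = (gamma + kappa) * e + 2.0 * gamma
    B = 2.0 * e / denom
    A = (2.0 * gamma * math.exp(0.5 * (kappa + gamma) * t) / denom) ** (2.0 * kappa * xbar / C ** 2)
    return A * math.exp(-B * x)


if __name__ == "__main__":
    args = (0.04, 1.0, 0.5, 0.05, 0.1)   # x, t, kappa, xbar, C (hypothetical values)
    print(bond_price_from_fundamental_solution(*args), cir_bond_price(*args))
```

Once G has been tabulated on a grid in y, other payoffs g only change the integrand, which is the sense in which computing G can pay off when many derivative prices are needed.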
4.4. Multifactor term-structure models
The one-factor model (67) for the short rate is limiting. Even a casual review of the empirical properties of the term structure, for example as reviewed in the surveys of Dai and Singleton (2003) and Piazzesi (2002), shows the significant potential improvements in fit offered by a multifactor term-structure model. While terminology varies from place to place, by a “multifactor” model we mean a model in which the short rate is of the form $r_t = R(X_t, t)$, $t \ge 0$, where X is a Markov process with a state space D that is some subset of $\mathbb{R}^k$, for k > 1. For example, in much of the literature, X is an Ito process solving a stochastic differential equation of the form
$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dB_t^Q, \tag{75}$$
where $B^Q$ is a standard Brownian motion in $\mathbb{R}^d$ under Q and the given functions R, $\mu$ and $\sigma$ on $D \times [0, \infty)$ into $\mathbb{R}$, $\mathbb{R}^k$ and $\mathbb{R}^{k \times d}$, respectively, satisfy enough technical regularity to guarantee that Equation (75) has a unique solution and that the term structure (66) is well defined. In empirical applications, one often supposes that the state process X also satisfies a stochastic differential equation under the probability measure P, in order to exploit the time-series behavior of observed prices and price-determining variables in estimating the model. There are various approaches for identifying the state vector $X_t$. In certain models, some or all elements of the state vector $X_t$ are latent, that is, unobservable to the modeler, except insofar as they can be inferred from prices that depend on the levels of X. For example, k state variables might be identified from bond yields at k distinct maturities. Alternatively, one might use both bond and bond option prices, as in Singleton and Umantsev (2003) or Collin-Dufresne and Goldstein (2001b, 2002). This is typically possible once one knows the parameters, as explained below, but the parameters must of course be estimated at the same time as the latent states are estimated. This latent-variable approach has nevertheless been popular in much of the empirical literature. Notable examples include Dai and Singleton (2000), and references cited by them. Another approach is to take some or all of the state variables to be directly observable variables, such as macro-economic determinants of the business cycle
and inflation, that are thought to play a role in determining the term structure. This approach has also been explored by Piazzesi (1999), among others. [24]
A derivative security, in this setting, can often be represented in terms of some real-valued terminal payment function g on $\mathbb{R}^k$, for some maturity date $s \le T$. By the definition of an equivalent martingale measure, the associated derivative security price is
$$F(X_t, t) = E_t^Q\left[\exp\left(-\int_t^s R(X_u, u)\,du\right) g(X_s)\right].$$
For the case of a diffusion state process X satisfying Equation (75), extending Equation (72), under technical conditions we have the PDE characterization
$$\mathcal{D}F(x, t) - R(x, t)\,F(x, t) = 0, \qquad (x, t) \in D \times [0, s), \tag{76}$$
with boundary condition
$$F(x, s) = g(x), \qquad x \in D, \tag{77}$$
where
$$\mathcal{D}F(x, t) = F_t(x, t) + F_x(x, t)\,\mu(x, t) + \tfrac{1}{2}\operatorname{tr}\left[\sigma(x, t)\,\sigma(x, t)^\top F_{xx}(x, t)\right].$$
The case of a zero-coupon bond is $g(x) \equiv 1$. Under technical conditions, we can also express the solution F, as in Equation (73), in terms of the fundamental solution G of the PDE (76).
4.5. Affine models
Many financial applications including term-structure modeling are based on a state process that is Markov, under some reference probability measure that, depending on the application, may or may not be an equivalent martingale measure. We will fix the probability measure P for the current discussion. A useful assumption is that the Markov state process is “affine”. While several equivalent definitions of the class of affine processes can be usefully applied, perhaps the simplest definition of the affine property for a Markov process X in a state space $D \subset \mathbb{R}^d$ is that its conditional characteristic function is of the form, for any $u \in \mathbb{R}^d$,
$$E\left(\exp\left[iu \cdot X(t)\right] \mid X(s)\right) = \exp\left[\phi(t - s, u) + \psi(t - s, u) \cdot X(s)\right], \tag{78}$$
for some deterministic coefficients $\phi(t - s, u)$ and $\psi(t - s, u)$. Duffie, Filipović and Schachermayer (2003) show that, for a time-homogeneous [25] affine process X with a state space of the form $\mathbb{R}_+^n \times \mathbb{R}^{d-n}$, provided the coefficients $\phi(\cdot)$ and $\psi(\cdot)$ of the characteristic function are differentiable and their derivatives are continuous at 0, the affine process X must be a jump-diffusion process, in that
$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t + dJ_t, \tag{79}$$
for a standard Brownian motion B in $\mathbb{R}^d$ and a pure-jump process J, and moreover the drift $\mu(X_t)$, the “instantaneous” covariance matrix $\sigma(X_t)\sigma(X_t)^\top$, and the jump measure associated with J must all have affine dependence on the state $X_t$. This result also provides necessary and sufficient conditions on the coefficients of the drift, diffusion, and jump measure for the process to be a well defined affine process, and provides that the coefficients $\phi(\cdot, u)$ and $\psi(\cdot, u)$ of the characteristic function satisfy a certain (generalized Riccati) ordinary differential equation (ODE), the key to tractability for this class of processes. [26] Conversely, any jump-diffusion whose coefficients are of this affine class is an affine process in the sense of Equation (78). A complete statement of this result is found in Duffie, Filipović and Schachermayer (2003). Simple examples of affine processes used in financial modeling are the Gaussian Ornstein–Uhlenbeck model, applied to interest rates by Vasicek (1977), and the Feller (1951) diffusion, applied to interest-rate modeling by Cox, Ingersoll and Ross (1985b), as already mentioned in the context of one-factor models. A general multivariate class of affine jump-diffusion models was introduced by Duffie and Kan (1996) for term-structure modeling. Dai and Singleton (2000) classified 3-dimensional affine diffusion models, and found evidence in U.S. swap rate data that both time-varying conditional variances and negatively correlated state variables are essential ingredients to explaining the historical behavior of term structures. For option pricing, there is a substantial literature building on the particular affine stochastic-volatility model for currency and equity prices proposed by Heston (1993). Bates (1997), Bakshi, Cao and Chen (1997), Bakshi and Madan (2000) and Duffie, Pan and Singleton (2000) brought more general affine models to bear in order to allow for stochastic volatility and jumps, while maintaining and exploiting the simple property (78). A key property related to Equation (78) is that, for any affine function $R: D \to \mathbb{R}$ and any $w \in \mathbb{R}^d$, subject only to technical conditions reviewed in Duffie, Filipović and Schachermayer (2003),
$$E_t\left[\exp\left(\int_t^s -R(X(u))\,du + w \cdot X(s)\right)\right] = \exp\left[\alpha(s - t) + \beta(s - t) \cdot X(t)\right], \tag{80}$$
for coefficients $\alpha(\cdot)$ and $\beta(\cdot)$ that satisfy generalized Riccati ODEs (with real boundary conditions) of the same type solved by $\phi$ and $\psi$ of Equation (78), respectively.
[24] See also Babbs and Webber (1994), Balduzzi, Bertola, Foresi and Klapper (1998) and Piazzesi (1997). On modeling the term-structure of real interest rates, see Brown and Schaefer (1996) and Pennacchi (1991).
[25] Filipović (2001b) extends to the time inhomogeneous case.
[26] Recent work, yet to be distributed, by Martino Grasselli of CREST, Paris, and Claudio Tebaldi, provides explicit solutions for the Riccati equations of multi-factor affine diffusion processes.
In order to get a quick sense of how the Riccati equations for $\alpha(\cdot)$ and $\beta(\cdot)$ arise, we consider the special case of an affine diffusion process X solving the stochastic differential equation (79), with state space $D = \mathbb{R}_+$, and with $\mu(x) = a + bx$ and $\sigma^2(x) = c^2 x$, for constant coefficients a, b and c. (This is the continuous branching process of Feller (1951).) We let $R(x) = \rho_0 + \rho_1 x$, for constants $\rho_0$ and $\rho_1$, and apply the Feynman–Kac partial differential equation (PDE) (69) to the candidate solution $\exp[\alpha(s - t) + \beta(s - t) \cdot x]$ of Equation (80). After calculating all terms of the PDE and then dividing each term of the PDE by the common factor $\exp[\alpha(s - t) + \beta(s - t) \cdot x]$, we arrive at
$$-\alpha'(z) - \beta'(z)\,x + \beta(z)(a + bx) + \tfrac{1}{2}\beta(z)^2 c^2 x - \rho_0 - \rho_1 x = 0, \tag{81}$$
for all $z \ge 0$. Collecting terms in x, we have
$$u(z)\,x + v(z) = 0, \tag{82}$$
where
$$u(z) = -\beta'(z) + \beta(z)\,b + \tfrac{1}{2}\beta(z)^2 c^2 - \rho_1, \tag{83}$$
$$v(z) = -\alpha'(z) + \beta(z)\,a - \rho_0. \tag{84}$$
Because Equation (82) must hold for all x, it must be the case that u(z) = v(z) = 0. This leaves the Riccati equations:
$$\beta'(z) = \beta(z)\,b + \tfrac{1}{2}\beta(z)^2 c^2 - \rho_1, \tag{85}$$
$$\alpha'(z) = \beta(z)\,a - \rho_0, \tag{86}$$
with the boundary conditions $\alpha(0) = 0$ and $\beta(0) = w$, from Equation (80) for s = t. The explicit solutions for $\alpha(z)$ and $\beta(z)$ were stated earlier for the CIR model (the case w = 0), and are given explicitly in a more general case with jumps, called a “basic affine process”, in Duffie and Gârleanu (2001). Beyond the Gaussian case, any Ornstein–Uhlenbeck process, whether driven by a Brownian motion (as for the Vasicek model) or by a more general Lévy process with jumps, as in Sato (1999), is affine. Moreover, any continuous-branching process with immigration (CBI process), including multi-type extensions of the Feller process, is affine. [See Kawazu and Watanabe (1971).] Conversely, an affine process in $\mathbb{R}_+^d$ is a CBI process. For term-structure modeling, [27] the state process X is typically assumed to be affine under a given equivalent martingale measure Q. For econometric modeling of bond yields, the affine assumption is sometimes also made under the data-generating measure P, although Duffee (1999b) suggests that this is overly restrictive from an empirical viewpoint, at least for 3-factor models of interest rates in the USA that do not have jumps. For general reviews of this issue, and summaries of the empirical evidence on affine term structure models, see Dai and Singleton (2003) and Piazzesi (2002). The affine class allows for the analytic calculation of bond option prices on zero-coupon bonds and other derivative securities, as reviewed in Section 5, and extends to the case of defaultable models, as we show in Section 6. For related computational results, see Liu, Pan and Pedersen (1999) and Van Steenkiste and Foresi (1999). Singleton (2001) exploits the explicit form of the characteristic function of affine models to provide a class of moment conditions for econometric estimation.
[27] Special cases of affine term-structure models include those of Balduzzi, Das and Foresi (1998), Balduzzi, Das, Foresi and Sundaram (1996), Baz and Das (1996), Berardi and Esposito (1999), Chen (1996), Cox, Ingersoll and Ross (1985b), Das (1993, 1995, 1997, 1998), Das and Foresi (1996), Duffie and Kan (1996), Duffie, Pedersen and Singleton (2003), Heston (1988), Langetieg (1980), Longstaff and Schwartz (1992, 1993), Pang and Hodges (1995) and Selby and Strickland (1993).
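When no closed form is at hand, the Riccati equations (85)–(86) of this section can simply be integrated numerically. The sketch below does so with a classical fourth-order Runge–Kutta scheme, with c entering through c² as in Equation (85), and checks the result against the standard closed-form CIR bond price, which corresponds to a = κx̄, b = −κ, c = C, ρ0 = 0, ρ1 = 1 and w = 0. The function names, step count and parameter values are illustrative assumptions.

```python
import math


def solve_riccati(a, b, c, rho0, rho1, w, z_end, steps=2000):
    """Integrate alpha'(z) = beta*a - rho0 and beta'(z) = beta*b + 0.5*beta^2*c^2 - rho1
    from alpha(0) = 0, beta(0) = w up to z_end, using classical RK4."""

    def rhs(beta):
        # Neither right-hand side depends on alpha or on z.
        return beta * a - rho0, beta * b + 0.5 * beta * beta * c * c - rho1

    h = z_end / steps
    alpha, beta = 0.0, w
    for _ in range(steps):
        k1a, k1b = rhs(beta)
        k2a, k2b = rhs(beta + 0.5 * h * k1b)
        k3a, k3b = rhs(beta + 0.5 * h * k2b)
        k4a, k4b = rhs(beta + h * k3b)
        alpha += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        beta += h * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
    return alpha, beta


def cir_bond_price(x, t, kappa, xbar, C):
    """Standard closed-form CIR zero-coupon bond price, used here only as a check."""
    gamma = math.sqrt(kappa ** 2 + 2.0 * C ** 2)
    e = math.exp(gamma * t) - 1.0
    denom = (gamma + kappa) * e + 2.0 * gamma
    B = 2.0 * e / denom
    A = (2.0 * gamma * math.exp(0.5 * (kappa + gamma) * t) / denom) ** (2.0 * kappa * xbar / C ** 2)
    return A * math.exp(-B * x)


if __name__ == "__main__":
    kappa, xbar, C, x, maturity = 0.5, 0.05, 0.1, 0.04, 1.0   # hypothetical values
    alpha, beta = solve_riccati(kappa * xbar, -kappa, C, 0.0, 1.0, 0.0, maturity)
    # Equation (80) with w = 0: the bond price is exp[alpha(s - t) + beta(s - t) x].
    print(math.exp(alpha + beta * x), cir_bond_price(x, maturity, kappa, xbar, C))
```

The same routine handles a non-zero w, which is what a transform-based calculation would need; only the initial condition for β changes.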
4.6. The HJM model of forward rates
We turn to the term structure model of Heath, Jarrow and Morton (1992). Until this point, we have taken as the primitive a model of the short-rate process of the form $r_t = R(X_t, t)$, where (under some equivalent martingale measure) X is a finite-dimensional Markov process. This approach has analytical advantages, especially for derivative pricing and statistical modeling. A more general approach that is especially popular in business applications is to directly model the risk-neutral stochastic behavior of the entire term structure of interest rates. This is the essence of the Heath–Jarrow–Morton (HJM) model. The remainder of this section is a summary of the basic elements of the HJM model. If the discount $\Lambda_{t,s}$ is differentiable with respect to the maturity date s, a mild regularity, we can write
$$\Lambda_{t,s} = \exp\left(-\int_t^s f(t, u)\,du\right),$$
where
$$f(t, u) = -\frac{1}{\Lambda_{t,u}}\frac{\partial \Lambda_{t,u}}{\partial u}.$$
The term structure can thus be represented in terms of the instantaneous forward rates, $\{f(t, u): u \ge t\}$. The HJM approach is to take as primitive a particular stochastic model of these forward rates. First, for each fixed maturity date s, one models the one-dimensional forward-rate process $f(\cdot, s) = \{f(t, s): 0 \le t \le s\}$ as an Ito process, in that
$$f(t, s) = f(0, s) + \int_0^t \mu(u, s)\,du + \int_0^t \sigma(u, s)\,dB_u^Q, \qquad 0 \le t \le s, \tag{87}$$
where $\mu(\cdot, s) = \{\mu(t, s): 0 \le t \le s\}$ and $\sigma(\cdot, s) = \{\sigma(t, s): 0 \le t \le s\}$ are adapted processes valued in $\mathbb{R}$ and $\mathbb{R}^d$, respectively, such that Equation (87) is well defined. [28] Under purely technical conditions, it must be the case that
$$\mu(t, s) = \sigma(t, s) \cdot \int_t^s \sigma(t, u)\,du. \tag{88}$$
In order to confirm this key risk-neutral drift restriction (88), consider the Q-martingale M defined by
$$M_t = E_t^Q\left[\exp\left(-\int_0^s r_u\,du\right)\right] = \exp\left(-\int_0^t r_u\,du\right)\Lambda_{t,s} = \exp\left(X_t + Y_t\right), \tag{89}$$
where
$$X_t = -\int_0^t r_u\,du; \qquad Y_t = -\int_t^s f(t, u)\,du.$$
We can view Y as an infinite sum of the Ito processes for forward rates over all maturities ranging from t to s. Under technical conditions [29] for Fubini’s Theorem for stochastic integrals, we thus have
$$dY_t = \mu_Y(t)\,dt + \sigma_Y(t)\,dB_t^Q,$$
where
$$\mu_Y(t) = f(t, t) - \int_t^s \mu(t, u)\,du,$$
and
$$\sigma_Y(t) = -\int_t^s \sigma(t, u)\,du.$$
We can then apply Ito’s Formula in the usual way to $M_t = e^{X(t) + Y(t)}$ and obtain the drift under Q of M as
$$\mu_M(t) = M_t\left[\mu_Y(t) + \tfrac{1}{2}\sigma_Y(t) \cdot \sigma_Y(t) - r_t\right].$$
Because M is a Q-martingale, we must have $\mu_M = 0$, so, substituting $\mu_Y(t)$ into this equation, we obtain
$$\int_t^s \mu(t, u)\,du = \frac{1}{2}\left(\int_t^s \sigma(t, u)\,du\right) \cdot \left(\int_t^s \sigma(t, u)\,du\right).$$
Taking the derivative of each side with respect to s then leaves the risk-neutral drift restriction (88) which in turn provides, naturally, the property that r(t) = f(t, t).
[28] The necessary and sufficient condition is that, almost surely, $\int_0^s |\mu(t, s)|\,dt < \infty$ and $\int_0^s \sigma(t, s) \cdot \sigma(t, s)\,dt < \infty$.
[29] In addition to measurability, it suffices that $\mu(t, u, \omega)$ and $\sigma(t, u, \omega)$ are uniformly bounded and, for each $\omega$, continuous in (t, u). For weaker conditions, see Protter (1990).
Thus, the initial forward rates $\{f(0, s): 0 \le s \le T\}$ and the forward-rate “volatility” process $\sigma$ can be specified with nothing more than technical restrictions, and these are enough to determine all bond and interest-rate derivative price processes. Aside from the Gaussian special case associated with deterministic volatility $\sigma(t, s)$, however, most valuation work in the HJM setting is typically done by Monte Carlo simulation. Special cases aside, [30] there is no finite-dimensional state variable for the HJM model, so PDE-based computational methods cannot be used. The HJM model has been extensively treated in the case of Gaussian instantaneous forward rates by Jamshidian (1989b), who developed the forward-measure approach, and Jamshidian (1989a,c, 1991a) and El Karoui and Rochet (1989), and extended by El Karoui, Lepage, Myneni, Roseau and Viswanathan (1991a,b), El Karoui and Lacoste (1992), Frachot (1995), Frachot, Janci and Lacoste (1993), Frachot and Lesne (1993) and Miltersen (1994). A related model of log-normal discrete-period interest rates, the “market model,” was developed by Miltersen, Sandmann and Sondermann (1997). [31] Musiela (1994b) suggested treating the entire forward-rate curve $g(t, \cdot) = \{f(t, t + u): 0 \le u < \infty\}$ itself as a Markov process. Here, u indexes time to maturity, not date of maturity. That is, we treat the term structure $g(t) = g(t, \cdot)$ as an element of some convenient state space S of real-valued continuously differentiable functions on $[0, \infty)$. Now, letting $v(t, u) = \sigma(t, t + u)$, the risk-neutral drift restriction (88) on f, and enough regularity, imply the stochastic partial differential equation (SPDE) for g given by
$$dg(t, u) = \frac{\partial g(t, u)}{\partial u}\,dt + V(t, u)\,dt + v(t, u)\,dB_t^Q,$$
where
$$V(t, u) = v(t, u) \cdot \int_0^u v(t, z)\,dz.$$
This formulation is an example of a rather delicate class of SPDEs that are called “hyperbolic”. Existence is usually not shown, or shown only in a “weak sense”, as by Kusuoka (2000). The idea is nevertheless elegant and potentially important in getting a parsimonious treatment of the yield curve as a Markov process. One may even allow the Brownian motion $B^Q$ to be “infinite-dimensional”. For related work in this setting, sometimes called a string, random field, or SPDE model of the term structure, see Cont (1998), Jong and Santa-Clara (1999), Goldstein (1997, 2000), Goldys and Musiela (1996), Hamza and Klebaner (1995), Kennedy (1994), Kusuoka (2000), Musiela and Sondermann (1994), Pang (1996), Santa-Clara and Sornette (2001) and Sornette (1998).
[30] See Au and Thurston (1993), Bhar and Chiarella (1995), Cheyette (1995), Jeffrey (1995), Musiela (1994b), Ritchken and Sankarasubramaniam (1992) and Ritchken and Trevor (1993).
[31] See also Andersen and Andreasen (2000a), Brace and Musiela (1995), Dothan (1978), Goldberg (1998), Goldys, Musiela and Sondermann (1994), Hansen and Jorgensen (2000), Hogan and Weintraub (1993), Jamshidian (1997a,b, 2001), Sandmann and Sondermann (1997), Miltersen, Sandmann and Sondermann (1997), Musiela (1994a) and Vargiolu (1999). A related log-normal futures-price term structure model is due to Heath (1998).
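To make the risk-neutral drift restriction (88) of this section concrete, the sketch below evaluates the implied drift for a given forward-rate volatility in the one-dimensional case (d = 1), using a trapezoidal rule for the integral over maturities. The exponentially decaying volatility σ(t, s) = σ0 e^{−a(s−t)} and all names and parameter values are assumptions chosen for illustration; for that choice the drift has the closed form σ0² e^{−a(s−t)}(1 − e^{−a(s−t)})/a, and for constant volatility it reduces to σ0²(s − t).

```python
import math


def hjm_drift(sigma, t, s, n=1000):
    """Risk-neutral drift mu(t, s) = sigma(t, s) * integral_t^s sigma(t, u) du  (case d = 1).

    `sigma` is a callable sigma(t, u); the integral uses the trapezoidal rule on n intervals.
    """
    h = (s - t) / n
    integral = 0.5 * (sigma(t, t) + sigma(t, s))
    for i in range(1, n):
        integral += sigma(t, t + i * h)
    integral *= h
    return sigma(t, s) * integral


if __name__ == "__main__":
    sigma0, a = 0.01, 0.1   # hypothetical volatility level and decay rate

    def decaying_vol(t, u):
        return sigma0 * math.exp(-a * (u - t))

    numeric = hjm_drift(decaying_vol, 0.0, 5.0)
    exact = sigma0 ** 2 * math.exp(-a * 5.0) * (1.0 - math.exp(-a * 5.0)) / a
    print(numeric, exact)
```

In a Monte Carlo implementation, a drift of this kind would be recomputed along each simulated path whenever the volatility depends on the state.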
5. Derivative pricing
We turn to a review of the pricing of derivative securities, taking first futures and forwards, and then turning to options. The literature is immense, and we shall again merely provide a brief summary of results. Again, we fix a probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\mathbb{F} = \{\mathcal{F}_t: 0 \le t \le T\}$ satisfying the usual conditions, as well as a short-rate process r.
5.1. Forward and futures prices
We briefly address the pricing of forward and futures contracts, an important class of derivatives. The forward contract is the simpler of these two closely related securities. Let W be an $\mathcal{F}_T$-measurable finite-variance random variable underlying the claim payable to a holder of the forward contract at its delivery date T. For example, with a forward contract for delivery of a foreign currency at time T, the random variable W is the market value at time T of the foreign currency. The forward-price process F is defined by the fact that one forward contract at time t is a commitment to pay the net amount $F_t - W$ at time T, with no other cash flows at any time. In particular, the true price of a forward contract, at the contract date, is zero. We fix an equivalent martingale measure Q for the available securities, after deflation by $\exp[-\int_0^t r(u)\,du]$, where r is a short-rate process that, for convenience, is assumed to be bounded. The dividend process H defined by the forward contract made at time t is given by $H_s = 0$, s < T, and $H_T = W - F_t$. Because the true price of the forward contract at t is zero,
$$0 = E_t^Q\left[\exp\left(-\int_t^T r_s\,ds\right)(W - F_t)\right].$$
Solving for the forward price,
$$F_t = \frac{E_t^Q\left[\exp\left(-\int_t^T r_s\,ds\right) W\right]}{E_t^Q\left[\exp\left(-\int_t^T r_s\,ds\right)\right]}.$$
If we assume that there exists at time t a zero-coupon riskless bond maturing at time T, with price $\Lambda_{t,T}$, then
$$F_t = \frac{1}{\Lambda_{t,T}} E_t^Q\left[\exp\left(-\int_t^T r_s\,ds\right) W\right].$$
If r and W are statistically independent with respect to Q, we have the simplified expression $F_t = E_t^Q(W)$, implying that the forward price is a Q-martingale. This would be true, for instance, if the short-rate process r is deterministic. As an example, suppose that the forward contract is for delivery at time T of one unit of a particular security with price process S and cumulative dividend process D. In particular, $W = S_T$. We can obtain a more concrete representation of the forward price, as follows. We have
$$F_t = \frac{1}{\Lambda_{t,T}}\left(S_t - E_t^Q\left[\int_t^T \exp\left(-\int_t^s r_u\,du\right) dD_s\right]\right).$$
If the short-rate process r is deterministic, we can simplify further to
$$F_t = \frac{S_t}{\Lambda_{t,T}} - E_t^Q\left[\int_t^T \exp\left(\int_s^T r_u\,du\right) dD_s\right], \tag{90}$$
which is known as the cost-of-carry formula for forward prices for the case in which interest rates and dividends are deterministic. As with a forward contract, a futures contract with delivery date T is keyed to some delivery value W, which we take to be an $\mathcal{F}_T$-measurable random variable with finite variance. The contract is completely defined by a futures-price process $\Phi$ with the property that $\Phi_T = W$. As we shall see, the contract is literally a security whose price process is zero and whose cumulative dividend process is $\Phi$. In other words, changes in the futures price are credited to the holder of the contract as they occur. This definition is an abstraction of the traditional notion of a futures contract, which calls for the holder of one contract at the delivery time T to accept delivery of some asset (whose spot market value at T is represented here by W) in return for simultaneous payment of the current futures price $\Phi_T$. Likewise, the holder of −1 contract, also known as a short position of 1 contract, is traditionally obliged to make delivery of the same underlying asset in exchange for the current futures price $\Phi_T$. This informally justifies the property $\Phi_T = W$ of the futures-price process $\Phi$ given in the definition above. Roughly speaking, if $\Phi_T$ is not equal to W (and if we continue to neglect transactions costs and other details), there is a delivery arbitrage. We won’t explicitly define a delivery arbitrage since it only complicates the analysis of futures prices that follows. Informally, however, in the event that $W > \Phi_T$, one could buy at time T the deliverable asset for W, simultaneously sell one futures contract, and make immediate delivery for a profit of $W - \Phi_T$. Thus, the potential of delivery arbitrage will naturally equate $\Phi_T$ with the delivery value W. This is sometimes known as the principle of convergence. Many modern futures contracts have streamlined procedures that avoid the delivery process. For these, the only link that exists with the notion of delivery is that the terminal futures price $\Phi_T$ is contractually equated to some such variable W, which could be the price of some commodity or security, or even some abstract variable of general economic interest such as a price deflator. This procedure, finessing the actual delivery of some asset, is known as cash settlement. In any case, whether based on cash settlement or the absence of delivery arbitrage, we shall always take it by definition that the delivery futures price $\Phi_T$ is equal to the given delivery value W. The institutional feature of futures markets that is central to our analysis of futures prices is resettlement, the process that generates daily or even more frequent payments to and from the holders of futures contracts based on changes in the futures price. As with the expression “forward price”, the term “futures price” can be misleading in that the futures price $\Phi_t$ at time t is not at all the price of the contract. Instead, at each resettlement time t, an investor who has held $\theta$ futures contracts since the last resettlement time, say s, receives the resettlement payment $\theta(\Phi_t - \Phi_s)$, following the simplest resettlement recipe. More complicated resettlement arrangements often apply in practice. The continuous-time abstraction is to take the futures-price process $\Phi$ to be an Ito process and a futures position process to be some $\theta$ in $\mathcal{L}(\Phi)$ generating the resettlement gain $\int \theta\,d\Phi$ as a cumulative-dividend process. In particular, as we have already stated in its definition, the futures-price process $\Phi$ is itself, formally speaking, the cumulative dividend process associated with the contract. The true price process is zero, since (again ignoring some of the detailed institutional procedures), there is no payment against the contract due at the time a contract is bought or sold. The futures-price process $\Phi$ can now be characterized as follows. We suppose that the short-rate process r is bounded. For all t, let $Y_t = \exp[-\int_0^t r(s)\,ds]$. Because $\Phi$ is strictly speaking the cumulative-dividend process associated with the futures contract, and since the true-price process of the contract is zero, from the fact that the risk-neutral discounted gain is a martingale,
$$0 = E_t^Q\left(\int_t^T Y_s\,d\Phi_s\right), \qquad t \le T,$$
from which it follows that the stochastic integral $\int Y\,d\Phi$ is a Q-martingale. Because r is bounded, there are constants $k_1 > 0$ and $k_2$ such that $k_1 \le Y_t \le k_2$ for all t. The process $\int Y\,d\Phi$ is therefore a Q-martingale if and only if $\Phi$ is also a Q-martingale. Since $\Phi_T = W$, we have deduced a convenient representation for the futures-price process:
$$\Phi_t = E_t^Q(W), \qquad t \in [0, T]. \tag{91}$$
If r and W are statistically independent under Q, the futures-price process $\Phi$ and the forward-price process F are thus identical. Otherwise, as pointed out by Cox, Ingersoll
and Ross (1981), there is a distinction based on correlation between changes in futures prices and interest rates.
5.2. Options and stochastic volatility
The Black–Scholes formula, which treats option prices under constant volatility, can be extended to cases with stochastic volatility, which is crucial in many markets from an empirical viewpoint. We will briefly examine several basic approaches, and then turn to the computation of option prices using the Fourier-transform method introduced by Stein and Stein (1991), and then first exploited in an affine setting by Heston (1993). We recall that the Black–Scholes option-pricing formula is of the form $C(x, p, r, t, \sigma)$, for $C: \mathbb{R}_+^5 \to \mathbb{R}_+$, where x is the current underlying asset price, p is the exercise price, r is the constant short rate, t is the time to expiration, and $\sigma$ is the volatility coefficient for the underlying asset. For each fixed (x, p, r, t) with non-zero x and t, the map from $\sigma$ to $C(x, p, r, t, \sigma)$ is strictly increasing, with range the open interval from $(x - p e^{-rt})^+$ to x. We may therefore invert and obtain the volatility from the option price. That is, we can define an implied volatility function I, valued in $\mathbb{R}_+$, by
$$c = C\left(x, p, r, t, I(x, p, r, t, c)\right), \tag{92}$$
for all c in the range of $C(x, p, r, t, \cdot)$. If $c_1$ is the Black–Scholes price of an option on a given asset at strike $p_1$ and expiration $t_1$, and $c_2$ is the Black–Scholes price of an option on the same asset at strike $p_2$ and expiration $t_2$, then the associated implied volatilities $I(x, p_1, r, t_1, c_1)$ and $I(x, p_2, r, t_2, c_2)$ must be identical, if indeed the assumptions underlying the Black–Scholes formula apply literally, and in particular if the underlying asset-price process has the constant volatility of a geometric Brownian motion. It has been widely noted, however, that actual market prices for European options on the same underlying asset have associated Black–Scholes implied volatilities that vary with both exercise price and expiration date. For example, in certain markets at certain times, the implied volatilities of options with a given exercise date depend on strike prices in a manner that is often termed a smile curve. Figure 1 illustrates the dependence of Black–Scholes implied volatilities on moneyness (the ratio of strike price to futures price), for various S&P 500 index options on November 2, 1993. Other forms of systematic deviation from constant implied volatilities have been noted, both over time and across various derivatives at a point in time. Three major lines of modeling address these systematic deviations from the assumptions underlying the Black–Scholes model. In all of these, a key step is to generalize the underlying log-normal price process by replacing the constant volatility parameter $\sigma$ of the Black–Scholes model with $\sqrt{V_t}$, for an adapted non-negative process V with $\int_0^T V_t\,dt < \infty$ such that the underlying asset price process S satisfies
$$dS_t = r_t S_t\,dt + S_t\sqrt{V_t}\,d\epsilon_t^S, \tag{93}$$
Fig. 1. “Smile curves” implied by SP500 Index options of 6 different times to expiration, from market data for November 2, 1993.
where $B^Q$ is a standard Brownian motion in $\mathbb{R}^d$ under the given equivalent martingale measure Q, and $\epsilon^S = c_S \cdot B^Q$ is a standard Brownian motion under Q obtained from any $c_S$ in $\mathbb{R}^d$ with unit norm. In the first class of models, $V_t = v(S_t, t)$, for some function $v: \mathbb{R} \times [0, T] \to \mathbb{R}$ satisfying technical regularity conditions. In practical applications, the function v, or its discrete-time discrete-state analogue, is often “calibrated” to the available option prices. This approach, sometimes referred to as the implied-tree model, was developed by Dupire (1994), Rubinstein (1995) and Jackwerth and Rubinstein (1996). For a second class of models, called autoregressive conditional heteroscedastic, or ARCH, the volatility depends on the path of squared returns, as formulated by Engle (1982). The GARCH (generalized ARCH) variant has the squared volatility $V_t$ at time t of the discrete-period return $R_{t+1} = \log S_{t+1} - \log S_t$ adjusting according to the recursive formula
$$V_t = a + bV_{t-1} + cR_t^2, \tag{94}$$
for fixed coefficients a, b and c satisfying regularity conditions. By taking a time period of length h, normalizing in a natural way, and taking limits, a natural continuous-time limiting behavior for volatility is simply a deterministic mean-reverting process V satisfying the ordinary differential equation
$$\frac{dV(t)}{dt} = \kappa\left(\bar{v} - V(t)\right). \tag{95}$$
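A minimal sketch of the two volatility dynamics just described: the discrete recursion (94) applied to a given series of returns, and the explicit solution V(t) = v̄ + (V(0) − v̄)e^{−κt} of the ODE (95). The function names and the numerical inputs are hypothetical, and no claim is made about how the coefficients a, b and c would be normalized to obtain the limit.

```python
import math


def garch_variance_path(returns, v0, a, b, c):
    """Apply V_t = a + b*V_{t-1} + c*R_t^2 along a list of period returns, starting from v0."""
    path = [v0]
    for r in returns:
        path.append(a + b * path[-1] + c * r * r)
    return path


def mean_reverting_variance(t, v0, kappa, vbar):
    """Solution of dV/dt = kappa*(vbar - V) with V(0) = v0."""
    return vbar + (v0 - vbar) * math.exp(-kappa * t)


if __name__ == "__main__":
    # Hypothetical daily returns and coefficients, for illustration only.
    print(garch_variance_path([0.01, -0.02, 0.015], v0=1e-4, a=1e-5, b=0.9, c=0.05))
    print(mean_reverting_variance(0.5, v0=0.01, kappa=2.0, vbar=0.02))
```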
Corradi (2000) explains that this deterministic continuous-time limit is more natural than the stochastic limit of Nelson (1990). For both the implied-tree approach and the GARCH approach, the volatility process V depends only on the underlying asset prices; volatility is not a separate source of risk. In a third approach, however, the increments of the squared-volatility process V depend on Brownian motions that are not perfectly correlated with ûS . For example, in a simple “one-factor” setting, dVt = mV (Vt ) dt + sV (Vt ) dûVt ,
(96)
where ûV = cV · BQ is a standard Brownian motion under Q, for some constant vector cV of unit norm. As we shall see, the correlation parameter cSV = cS · cV has an important influence on option prices. The price of a European option at exercise price p and expiration at time t is f (Ss , Vs , s) = EsQ exp [−r(t − s)] (St − p)+ , which can be solved, for example, by reducing to a PDE and applying, if necessary, a finite-difference approach. In many settings, a pronounced skew to the smile, as in Figure 1, indicates an important potential role for correlation between the increments of the return-driving and volatility-driving Brownian motions, ûS and ûV . This role is borne out directly by the correlation apparent from time-series data on implied volatilities and returns for certain important asset classes, as indicated for example by Pan (2002). A tractable model that allows for the skew effects of correlation is the Heston model, the special case of Equation (96) for which # dVt = ú (v − Vt ) dt + sv Vt dûVt , (97) for positive coefficients ú, v and sv that play the same respective roles for V as for a Cox–Ingersoll–Ross interest-rate model. Indeed, this Feller diffusion model of volatility (97) is sometimes called a “CIR volatility model.” In the original Heston model, the short rate is a constant, say r, and option prices can be computed analytically, using transform methods explained later in this section, in terms of the parameters (r, cSV , ú, v, sv ) of the Heston model, as well as the initial volatility V0 , the initial underlying price S0 , the strike price, and the expiration time. Figure 2 shows the “smile curves,” for the same options illustrated in Figure 1, that are implied by the Heston model for parameters, including V0 , chosen to minimize the sum of squared differences between actual and theoretical option prices, a calibration approach popularized for this application by Bates (1997). 
Notably, the distinctly downward slopes, often called skews, are captured with a negative correlation coefficient $c_{SV}$. Adopting a short rate $r = 0.0319$ that roughly captures the effects of contemporary short-term interest rates, the remaining coefficients of the Heston model are calibrated to $c_{SV} = -0.66$, $\kappa = 19.66$, $\bar v = 0.017$, $\sigma_v = 1.516$, and $\sqrt{V_0} = 0.094$.
Fig. 2. “Smile curves” calculated for SP500 Index options of 6 different exercise dates, November 2, 1993, using the Heston Model.
Going beyond the calibration approach, time-series data on both options and underlying prices have been used simultaneously to fit the parameters of various stochastic-volatility models, for example by Aït-Sahalia, Wang and Yared (2001), Benzoni (2002), Chernov and Ghysels (2000), Guo (1998), Pan (2002), Poteshman (1998) and Renault and Touzi (1992). The empirical evidence for S&P 500 index returns and option prices suggests that the Heston model is overly restrictive for these data. For example, Pan (2002) rejects the Heston model in favor of a generalization with jumps in returns, proposed by Bates (1997), that is a special case of the affine model for option pricing to which we now turn.

5.3. Option valuation by transform analysis

We now address the calculation of option prices with stochastic volatility and jumps in an affine setting of the type already introduced for term-structure modeling, a special case being the model of Heston (1993). We use an approach based on transform analysis that was initiated by Stein and Stein (1991) and Heston (1993), allowing for relatively rich and tractable specifications of stochastic interest rates and volatility, and for jumps. This approach and the underlying stochastic models were subsequently generalized by Bakshi, Cao and Chen (1997), Bakshi and Madan (2000), Bates (1997) and Duffie, Pan and Singleton (2000). We assume that there is a state process $X$ that is affine under $Q$ in a state space $D \subset \mathbb{R}^k$, and that the short-rate process $r$ is of the affine form
$$r_t = \rho_0 + \rho_1 \cdot X_t,$$
for coefficients $\rho_0$ in $\mathbb{R}$ and $\rho_1$ in $\mathbb{R}^k$. The price process $S$ underlying the options in question is assumed to be of the exponential-affine form $S_t = \exp[a(t) + b(t) \cdot X(t)]$, for potentially time-dependent coefficients $a(t)$ in $\mathbb{R}$ and $b(t)$ in $\mathbb{R}^k$. An example would be the price of an equity, a foreign currency, or, as shown earlier in the context of affine term-structure models, the price of a zero-coupon bond. The Heston model (97) is a special case, for an affine process $X = (X^{(1)}, X^{(2)})$, with $X^{(1)}_t = Y_t \equiv \log(S_t)$, and $X^{(2)}_t = V_t$, and with a constant short rate $r = \rho_0$. From Ito’s Formula,
$$dY_t = \left(r - \tfrac{1}{2}V_t\right)dt + \sqrt{V_t}\,d\epsilon^S_t, \tag{98}$$
which indeed makes the state vector $X_t = (Y_t, V_t)$ an affine process, whose state space is $D = \mathbb{R} \times [0, \infty)$, as we can see from the fact that the drift and instantaneous covariance matrix of $X_t$ are affine with respect to $X_t$. The underlying asset price is indeed of the desired exponential-affine form because $S_t = e^{Y(t)}$. We will return to the Heston model shortly with some explicit results on option valuation. One of the affine models generalizing Heston’s that was tested by Pan (2002) took
$$dY_t = \left(r - \tfrac{1}{2}V_t\right)dt + \sqrt{V_t}\,d\epsilon^S_t + dZ_t, \tag{99}$$
where, under the equivalent martingale measure $Q$, $Z$ is a pure-jump process whose jump times have an arrival intensity (as defined in Section 6) that is affine with respect to the volatility process $V$, and whose jump sizes are independent normals. For the general affine case, suppose we are interested in valuing a European call option on the underlying security, with strike price $p$ and exercise date $t$. We have the initial option price
$$U_0 = E^Q\left[\exp\left(-\int_0^t r_u\,du\right)(S_t - p)^+\right].$$
Letting $A$ denote the exercise event $\{\omega : S(\omega, t) \ge p\}$, we have the option price
$$U_0 = E^Q\left[\exp\left(-\int_0^t r_s\,ds\right)(S_t 1_A - p\,1_A)\right].$$
Because $S(t) = \exp[a(t) + b(t) \cdot X(t)]$,
$$U_0 = e^{a(t)}\,G\left(-\log p + a(t);\, t, b(t), -b(t)\right) - p\,G\left(-\log p + a(t);\, t, 0, -b(t)\right), \tag{100}$$
where, for any $y \in \mathbb{R}$ and for any coefficient vectors $d$ and $\delta$ in $\mathbb{R}^k$,
$$G(y; t, d, \delta) = E^Q\left[\exp\left(-\int_0^t r_s\,ds\right)\exp[d \cdot X(t)]\,1_{\{\delta \cdot X(t) \le y\}}\right]. \tag{101}$$
So, if we can compute the function G, we can obtain the prices of options of any strike and exercise date. Likewise, the prices of European puts, interest-rate caps,
chooser options, and many other derivatives can be derived in terms of $G$. For example, following this approach of Heston (1993), the valuation of discount bond options and caps in an affine setting was undertaken by Chen and Scott (1995), Duffie, Pan and Singleton (2000), Nunes, Clewlow and Hodges (1999) and Scaillet (1996). We note, for fixed $(t, d, \delta)$, assuming $E^Q\left(\exp\left[-\int_0^t r(u)\,du\right]\exp[d \cdot X(t)]\right) < \infty$, that $G(\,\cdot\,; t, d, \delta)$ is a bounded increasing function. For any such function $g: \mathbb{R} \to [0, \infty)$, an associated transform $\hat g: \mathbb{R} \to \mathbb{C}$, where $\mathbb{C}$ is the set of complex numbers, is defined by
$$\hat g(z) = \int_{-\infty}^{+\infty} e^{izy}\,dg(y), \tag{102}$$
where $i$ is the usual imaginary number, often denoted $\sqrt{-1}$. Depending on one’s conventions, one may refer to $\hat g$ as the Fourier transform of $g$. Under the technical condition that $\int_{-\infty}^{+\infty} |\hat g(z)|\,dz < \infty$, we have the Lévy Inversion Formula
$$g(y) = \frac{\hat g(0)}{2} - \frac{1}{\pi}\int_0^{\infty} \frac{1}{z}\,\mathrm{Im}\left[e^{-izy}\,\hat g(z)\right]dz, \tag{103}$$
where $\mathrm{Im}(c)$ denotes the imaginary part of a complex number $c$. For the case $g(\,\cdot\,) = G(\,\cdot\,; t, d, \delta)$, with the associated transform $\hat G(\,\cdot\,; t, d, \delta)$, we can compute $G(y; t, d, \delta)$ from Equation (103), typically by computing the integral in Equation (103) numerically, and thereby obtain option prices from Equation (100). Our final objective is therefore to compute the transform $\hat G$. Fixing $z$, and applying Fubini’s Theorem to Equation (102), we have $\hat G(z; t, d, \delta) = f(X_0, 0)$, where $f: D \times [0, t] \to \mathbb{C}$ is defined by
$$f(X_s, s) = E^Q\left[\exp\left(-\int_s^t r(u)\,du\right)\exp[d \cdot X(t)]\,\exp[iz\,\delta \cdot X(t)] \,\Big|\, X_s\right]. \tag{104}$$
From Equation (104), the same separation-of-variables arguments used to treat the affine term-structure models imply, under technical regularity conditions, that
$$f(x, s) = \exp[\alpha(t - s) + \beta(t - s) \cdot x], \tag{105}$$
where $(\alpha, \beta)$ solves the generalized Riccati ordinary differential equation (ODE) associated with the affine model and the coefficients $\rho_0$ and $\rho_1$ of the short rate. The solutions for $\alpha(\,\cdot\,)$ and $\beta(\,\cdot\,)$ are complex numbers, in light of the complex boundary condition $\beta(0) = d + iz\delta$. For technical details, see Duffie, Filipović and Schachermayer (2003). Thus, under technical conditions, we have our transform $\hat G(z; t, d, \delta)$, evaluated at a particular $z$. We then have the option-pricing formula (100), where $G(y; t, d, \delta)$ is obtained from the inversion formula (103) applied to the transforms $\hat G(\,\cdot\,; t, b(t), -b(t))$ and $\hat G(\,\cdot\,; t, 0, -b(t))$.
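For option pricing with the Heston model, we require only the transform $\psi(u) = e^{-rt}E^Q(\exp[uY(t)])$, for some particular choices of $u \in \mathbb{C}$. Heston (1993) solved the Riccati equation for this case, arriving at
$$\psi(u) = \exp\left(\bar\alpha(t, u) + uY(0) + \bar\beta(t, u)V(0)\right),$$
where, letting $b = u\sigma_v c_{SV} - \kappa$, $a = u(1 - u)$, and $\gamma = \sqrt{b^2 + a\sigma_v^2}$,
$$\bar\beta(t, u) = -\frac{a\left(1 - e^{-\gamma t}\right)}{2\gamma - (\gamma + b)\left(1 - e^{-\gamma t}\right)},$$
$$\bar\alpha(t, u) = rt(u - 1) - \kappa\bar v\left[\frac{\gamma + b}{\sigma_v^2}\,t + \frac{2}{\sigma_v^2}\log\left(1 - \frac{\gamma + b}{2\gamma}\left(1 - e^{-\gamma t}\right)\right)\right].$$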
Other special cases for which one can compute explicit solutions are cited in Duffie, Pan and Singleton (2000).
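As a rough numerical illustration of the steps above, the following Python sketch evaluates the Heston transform $\psi(u)$ and prices a European call through Equations (100) and (103), using $a(t) = 0$ and $b(t) = (1, 0)$ so that $S_t = e^{Y(t)}$. The function names, the midpoint quadrature, and the truncation point `z_max` are assumptions made for illustration only; they are not taken from the text, and a production implementation would need a more careful treatment of the integral.

```python
import cmath
import math


def heston_transform(u, t, y0, v0, r, kappa, vbar, sigma_v, c_sv):
    """psi(u) = exp(-r t) E^Q[exp(u Y_t)] in the Heston model, for complex u."""
    b = u * sigma_v * c_sv - kappa
    a = u * (1 - u)
    gamma = cmath.sqrt(b * b + a * sigma_v ** 2)
    decay = 1 - cmath.exp(-gamma * t)
    beta = -a * decay / (2 * gamma - (gamma + b) * decay)
    alpha = r * t * (u - 1) - kappa * vbar * (
        (gamma + b) / sigma_v ** 2 * t
        + 2 / sigma_v ** 2 * cmath.log(1 - (gamma + b) / (2 * gamma) * decay)
    )
    return cmath.exp(alpha + u * y0 + beta * v0)


def levy_inversion(y, transform, z_max=100.0, n=10000):
    """Equation (103), with the integral truncated at z_max (an assumption)
    and approximated by a midpoint rule with n cells."""
    dz = z_max / n
    total = 0.0
    for j in range(n):
        z = (j + 0.5) * dz  # midpoints avoid evaluating the integrand at z = 0
        total += (cmath.exp(-1j * z * y) * transform(z)).imag / z
    return transform(0.0).real / 2 - total * dz / math.pi


def heston_call(s0, strike, t, v0, r, kappa, vbar, sigma_v, c_sv):
    """Equation (100) with a(t) = 0 and b(t) = (1, 0)."""
    y0 = math.log(s0)

    def psi(u):
        return heston_transform(u, t, y0, v0, r, kappa, vbar, sigma_v, c_sv)

    y = -math.log(strike)
    # Transform of G(.; t, b, -b) is z -> psi(1 - i z); of G(.; t, 0, -b) is z -> psi(-i z).
    g1 = levy_inversion(y, lambda z: psi(1 - 1j * z))
    g0 = levy_inversion(y, lambda z: psi(-1j * z))
    return g1 - strike * g0
```

For instance, `heston_call(100.0, 100.0, 0.5, 0.04, 0.03, 2.0, 0.04, 0.5, -0.7)` prices an at-the-money call under purely illustrative parameters. For parameter sets in which the transform decays slowly in $z$, such as a large $\sigma_v$ combined with a short expiration, `z_max` and `n` would need to be increased.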
6. Corporate securities

This section offers a basic review of the valuation of equities and corporate liabilities, beginning with some standard issues regarding the capital structure of a firm. Then, we turn to models of the valuation of defaultable debt that are based on an assumed stochastic arrival intensity of the stopping time defining default. The use of intensity-based defaultable bond pricing models was instigated by Artzner and Delbaen (1990, 1992, 1995), Lando (1994, 1998) and Jarrow and Turnbull (1995), and has become commonplace in business applications among banks and investment banks. We begin with an extremely simple model of the stochastic behavior of the market values of assets, equity, and debt. We may think of equity and debt, at this first pass, as derivatives with respect to the total market value of the firm, as proposed by Black and Scholes (1973) and Merton (1974). In the simplest case, equity is merely a call option on the assets of the firm, struck at the level of liabilities, with possible exercise at the maturity date of the debt. 32 At first, we are in a setting of perfect capital markets, where the results of Modigliani and Miller (1958) imply the irrelevance of capital structure for the total market value of the firm. Later, we introduce market imperfections and increase the degree of control that may be exercised by holders of equity and debt. With this, the theory becomes more complex and less like a derivative valuation model. There are many more interesting variations than could be addressed well in the space available here.
32 Geske (1977) used compound option modeling so as to extend the Black–Scholes–Merton model to cases of debt at various maturities.
Our objective is merely to convey some sense of the types of issues and standard modeling approaches. We let $B$ be a standard Brownian motion in $\mathbb{R}^d$ on a complete probability space $(\Omega, \mathcal{F}, P)$, and fix the standard filtration $\{\mathcal{F}_t : t \ge 0\}$ of $B$. Later, we allow for information revealed by “Poisson-like arrivals”, in order to tractably model “sudden-surprise” defaults that cannot be easily treated in a setting of Brownian information.

6.1. Endogenous default timing

We assume a constant short rate $r$ and take as given a martingale measure $Q$, in the infinite-horizon sense of Huang and Pagès (1992), after deflation by $e^{-rt}$. The resources of a given firm are assumed to consist of cash flows at a rate $\delta_t$ for each time $t$, where $\delta$ is an adapted process with $\int_0^t |\delta_s|\,ds < \infty$ almost surely for all $t$. The market value of the assets of the firm at time $t$ is defined as the market value $A_t$ of the future cash flows. That is,
$$A_t = E^Q_t\left(\int_t^{\infty} \exp[-r(s - t)]\,\delta_s\,ds\right). \tag{106}$$
We assume that $A_t$ is well defined and finite for all $t$. The martingale representation theorem implies that
$$dA_t = (rA_t - \delta_t)\,dt + \sigma_t\,dB^Q_t, \tag{107}$$
for some adapted $\mathbb{R}^d$-valued process $\sigma$ such that $\int_0^T \sigma_t \cdot \sigma_t\,dt < \infty$ for all $T \in [0, \infty)$, and where $B^Q$ is the standard Brownian motion in $\mathbb{R}^d$ under $Q$ obtained from $B$ and Girsanov’s Theorem. 33 We suppose that the original owners of the firm chose its capital structure to consist of a single bond as its debt, and pure equity, defined in detail below. The bond and equity investors have already paid the original owners for these securities. Before we consider the effects of market imperfections, the total of the market values of equity and debt must be the market value $A$ of the assets, which is a given process, so the design of the capital structure is irrelevant from the viewpoint of maximizing the total value received by the original owners of the firm. For simplicity, we suppose that the bond promises to pay coupons at a constant total rate $c$, continually in time, until default. This sort of bond is sometimes called a consol. Equityholders receive the residual cash flow in the form of dividends at the rate $\delta_t - c$ at time $t$, until default. At default, the firm’s future cash flows are assigned to debtholders.

33 For an explanation of how Girsanov’s Theorem applies in an infinite-horizon setting, see for example the last section of Chapter 6 of Duffie (2001), based on Huang and Pagès (1992).
The equityholders’ dividend rate, $\delta_t - c$, may have negative outcomes. It is commonly stipulated, however, that equity claimants have limited liability, meaning that they should not experience negative cash flows. One can arrange for limited liability by dilution of equity. 34 Equityholders are assumed to have the contractual right to declare default at any stopping time $T$, at which time equityholders give up to debtholders the rights to all future cash flows, a contractual arrangement termed strict priority, or sometimes absolute priority. We assume that equityholders are not permitted to delay liquidation after the value $A$ of the firm reaches 0, so we ignore the possibility that $A_T < 0$. We could also consider the option of equityholders to change the firm’s production technology, or to call in the debt for some price. The bond contract may convey to debtholders, under a protective covenant, the right to force liquidation at any stopping time $\tau$ at which the asset value $A_\tau$ is as low or lower than some stipulated level. We ignore this feature for brevity.

6.2. Example: Brownian dividend growth

We turn to a specific model proposed by Fisher, Heinkel and Zechner (1989), and explicitly solved by Leland (1994), for optimal default timing and for the valuation of equity and debt. Once we allow for taxes and bankruptcy distress costs, 35 capital structure matters, and, within the following simple parametric framework, Leland (1994) calculated the initial capital structure that maximizes the total initial market value of the firm. Suppose the cash-flow rate process $\delta$ is a geometric Brownian motion under $Q$, in that
$$d\delta_t = \mu\delta_t\,dt + \sigma\delta_t\,dB^Q_t,$$
for constants $\mu$ and $\sigma$, where $B^Q$ is a standard Brownian motion under $Q$. We assume throughout that $\mu < r$, so that, from Equation (106), $A$ is finite and
$$dA_t = \mu A_t\,dt + \sigma A_t\,dB^Q_t.$$
34 That is, so long as the market value of equity remains strictly positive, newly issued equity can be sold into the market so as to continually finance the negative portion $(c - \delta_t)^+$ of the residual cash flow. While dilution increases the quantity of shares outstanding, it does not alter the total market value of all shares, and so is a relatively simple modeling device. Moreover, dilution is irrelevant to individual shareholders, who would in any case be in a position to avoid negative cash flows by selling their own shares as necessary to finance the negative portion of their dividends, with the same effect as if the firm had diluted their shares for this purpose. We are ignoring here any frictional costs of equity issuance or trading.

35 The model was further elaborated to treat coupon debt of finite maturity in Leland and Toft (1996), endogenous calling of debt and re-capitalization in Leland (1998) and Uhrig-Homburg (1998), incomplete observation by bond investors, with default intensity, in Duffie and Lando (2001), and alternative approaches to default recovery by Anderson and Sundaresan (1996), Anderson, Pan and Sundaresan (2001), Décamps and Faure-Grimaud (2000, 2002), Fan and Sundaresan (2000), Mella-Barral (1999) and Mella-Barral and Perraudin (1997).
We calculate that $\delta_t = (r - \mu)A_t$. For any given constant $K \in (0, A_0)$, the market value of a security that claims one unit of account at the hitting time $\tau(K) = \inf\{t : A_t \le K\}$ is, at any time $t < \tau(K)$,
$$E^Q_t\left(\exp[-r(\tau(K) - t)]\right) = \left(\frac{A_t}{K}\right)^{-\gamma}, \tag{108}$$
where
$$\gamma = \frac{m + \sqrt{m^2 + 2r\sigma^2}}{\sigma^2},$$
and where $m = \mu - \sigma^2/2$. This can be shown by applying Ito’s Formula to see that $e^{-rt}(A_t/K)^{-\gamma}$ is a $Q$-martingale. Let us consider for simplicity the case in which bondholders have no protective covenant. Then, equityholders declare default at a stopping time that attains the maximum equity valuation
$$w(A_0) \equiv \sup_{T \in \mathcal{T}} E^Q\left(\int_0^T e^{-rt}(\delta_t - c)\,dt\right), \tag{109}$$
where $\mathcal{T}$ is the set of stopping times. We naturally conjecture that the maximization problem (109) is solved by a hitting time of the form $\tau(A_B) = \inf\{t : A_t \le A_B\}$, for some default-triggering level $A_B$ of assets to be determined. Black and Cox (1976) developed the idea of default at the first passage of assets to a sufficiently low level, but used an exogenous default boundary. Longstaff and Schwartz (1995) extended this approach to allow for stochastic default-free interest rates. Their work was then refined by Collin-Dufresne and Goldstein (2001a). Given this conjectured form $\tau(A_B)$ for the optimal default time, we further conjecture from Ito’s Formula that the equity value function $w: (0, \infty) \to [0, \infty)$ defined by Equation (109) solves the ODE
$$\mathcal{D}w(x) - rw(x) + (r - \mu)x - c = 0, \qquad x > A_B, \tag{110}$$
where
$$\mathcal{D}w(x) = w'(x)\,\mu x + \tfrac{1}{2}w''(x)\,\sigma^2 x^2, \tag{111}$$
with the absolute-priority boundary condition
$$w(x) = 0, \qquad x \le A_B. \tag{112}$$
Finally, we conjecture the smooth-pasting condition
$$w'(A_B) = 0, \tag{113}$$
based on Equation (112) and continuity of the first derivative $w'(\,\cdot\,)$ at $A_B$. Although not an obvious requirement for optimality, the smooth-pasting condition, sometimes
called the high-order-contact condition, has proven to be a fruitful method by which to conjecture solutions, as follows. If we are correct in conjecturing that the optimal default time is of the form $\tau(A_B) = \inf\{t : A_t \le A_B\}$, then, given an initial asset level $A_0 = x > A_B$, the value of equity must be
$$w(x) = x - A_B\left(\frac{x}{A_B}\right)^{-\gamma} - \frac{c}{r}\left[1 - \left(\frac{x}{A_B}\right)^{-\gamma}\right]. \tag{114}$$
This conjectured value of equity is merely the market value $x$ of the total future cash flows of the firm, less a deduction equal to the market value of the debtholders’ claim to $A_B$ at the default time $\tau(A_B)$ using Equation (108), less another deduction equal to the market value of coupon payments to bondholders before default. The market value of those coupon payments is easily computed as the present value $c/r$ of coupons paid at the rate $c$ from time 0 to time $+\infty$, less the present value of coupons paid at the rate $c$ from the default time $\tau(A_B)$ until $+\infty$, again using Equation (108). In order to complete our conjecture, we apply the smooth-pasting condition $w'(A_B) = 0$ to this functional form (114), and by calculation obtain the conjectured default triggering asset level as
$$A_B = \beta c, \tag{115}$$
where
$$\beta = \frac{\gamma}{r(1 + \gamma)}. \tag{116}$$
We are ready to state and verify this result of Leland (1994).

Proposition. The default timing problem (109) is solved by $\inf\{t : A_t \le \beta c\}$. The associated initial market value $w(A_0)$ of equity is $W(A_0, c)$, where
$$W(x, c) = 0, \qquad x \le \beta c, \tag{117}$$
and
$$W(x, c) = x - \beta c\left(\frac{x}{\beta c}\right)^{-\gamma} - \frac{c}{r}\left[1 - \left(\frac{x}{\beta c}\right)^{-\gamma}\right], \qquad x \ge \beta c. \tag{118}$$
The initial value of debt is $A_0 - W(A_0, c)$.

Proof: First, it may be checked by calculation that $W(\,\cdot\,, c)$ satisfies the differential equation (110) and the smooth-pasting condition (113). Ito’s Formula applies to $C^2$ (twice continuously differentiable) functions. In our case, although $W(\,\cdot\,, c)$ need not
be $C^2$, it is convex, is $C^1$, and is $C^2$ except at $\beta c$, where $W_x(\beta c, c) = 0$. Under these conditions, we obtain the result of applying Ito’s Formula as
$$W(A_s, c) = W(A_0, c) + \int_0^s \mathcal{D}W(A_t, c)\,dt + \int_0^s W_x(A_t, c)\,\sigma A_t\,dB^Q_t,$$
where $\mathcal{D}W(x, c)$ is defined as usual by
$$\mathcal{D}W(x, c) = W_x(x, c)\,\mu x + \tfrac{1}{2}W_{xx}(x, c)\,\sigma^2 x^2,$$
except at $x = \beta c$, where we may replace “$W_{xx}(\beta c, c)$” with zero. [This slight extension of Ito’s Formula is found, for example, in Karatzas and Shreve (1988), p. 219.] For each time $t$, let
$$q_t = e^{-rt}\,W(A_t, c) + \int_0^t e^{-rs}\left((r - \mu)A_s - c\right)ds.$$
From Ito’s Formula,
$$dq_t = e^{-rt}f(A_t)\,dt + e^{-rt}\,W_x(A_t, c)\,\sigma A_t\,dB^Q_t, \tag{119}$$
where $f(x) = \mathcal{D}W(x, c) - rW(x, c) + (r - \mu)x - c$. Because $W_x$ is bounded, the last term of Equation (119) defines a $Q$-martingale. For $x \le \beta c$, we have both $W(x, c) = 0$ and $(r - \mu)x - c \le 0$, so $f(x) \le 0$. For $x > \beta c$, we have Equation (110), and therefore $f(x) = 0$. The drift of $q$ is therefore never positive, and for any stopping time $T$ we have $q_0 \ge E^Q(q_T)$, or equivalently,
$$W(A_0, c) \ge E^Q\left[\int_0^T e^{-rs}(\delta_s - c)\,ds + e^{-rT}\,W(A_T, c)\right].$$
For the particular stopping time $\tau(\beta c)$, we have
$$W(A_0, c) = E^Q\left[\int_0^{\tau(\beta c)} e^{-rs}(\delta_s - c)\,ds\right],$$
using the boundary condition (117) and the fact that $f(x) = 0$ for $x > \beta c$. So, for any stopping time $T$,
$$W(A_0, c) = E^Q\left[\int_0^{\tau(\beta c)} e^{-rs}(\delta_s - c)\,ds\right] \ge E^Q\left[\int_0^T e^{-rs}(\delta_s - c)\,ds + e^{-rT}\,W(A_T, c)\right] \ge E^Q\left[\int_0^T e^{-rs}(\delta_s - c)\,ds\right],$$
using the non-negativity of $W$ for the last inequality. This implies the optimality of the stopping time $\tau(\beta c)$ and verification of the proposed solution $W(A_0, c)$ of Equation (109).
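The following Python sketch offers a small numerical check of the objects appearing in the Proposition: the exponent $\gamma$ of Equation (108), the default boundary $\beta c$ of Equations (115) and (116), the equity value of Equations (117) and (118), and a finite-difference residual of the ODE (110). The function names and the finite-difference step are assumptions introduced for illustration and are not part of the original analysis.

```python
import math


def gamma_exponent(r, mu, sigma):
    """Exponent gamma of Equation (108), with m = mu - sigma^2 / 2."""
    m = mu - sigma ** 2 / 2
    return (m + math.sqrt(m * m + 2 * r * sigma ** 2)) / sigma ** 2


def equity_given_boundary(x, c, r, mu, sigma, boundary):
    """Equation (114): equity value if default occurs when assets first fall to `boundary`."""
    if x <= boundary:
        return 0.0
    p = (x / boundary) ** (-gamma_exponent(r, mu, sigma))  # Equation (108)
    return x - boundary * p - c / r * (1 - p)


def default_boundary(c, r, mu, sigma):
    """Equations (115)-(116): the conjectured optimal boundary A_B = beta * c."""
    g = gamma_exponent(r, mu, sigma)
    return g / (r * (1 + g)) * c


def equity_value(x, c, r, mu, sigma):
    """W(x, c) of Equations (117)-(118)."""
    return equity_given_boundary(x, c, r, mu, sigma, default_boundary(c, r, mu, sigma))


def ode_residual(x, c, r, mu, sigma, rel_step=1e-4):
    """Left-hand side of the ODE (110), with derivatives replaced by central differences."""
    h = rel_step * x

    def w(z):
        return equity_value(z, c, r, mu, sigma)

    w1 = (w(x + h) - w(x - h)) / (2 * h)
    w2 = (w(x + h) - 2 * w(x) + w(x - h)) / h ** 2
    return w1 * mu * x + 0.5 * w2 * sigma ** 2 * x ** 2 - r * w(x) + (r - mu) * x - c
```

With illustrative values such as `r = 0.05`, `mu = 0.01`, `sigma = 0.2` and `c = 1.0`, one would expect `ode_residual` to be close to zero above the boundary, and `equity_given_boundary` evaluated at boundaries other than `default_boundary(...)` to be no larger than `equity_value`, in line with the optimality asserted by the Proposition.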
Boyarchenko and Levendorskiĭ (2002), Hilberink and Rogers (2002) and Zhou (2000) extend this first passage model of optimal default timing to the case of jump-diffusion asset processes.

6.3. Taxes, bankruptcy costs, capital structure

In order to see how the original owners of the firm may have a strict but limited incentive to issue debt, we introduce two market imperfections:
• A tax deduction, at a tax rate of $\theta$, on interest expense, so that the after-tax effective coupon rate paid by the firm is $(1 - \theta)c$.
• Bankruptcy costs, so that, with default at time $t$, the assets of the firm are disposed of at a salvage value of $\hat A_t \le A_t$, where $\hat A$ is a given continuous adapted process.
We also consider more carefully the formulation of an equilibrium, in which equityholders and bondholders each exercise their own rights so as to maximize the market values of their own securities, given correct conjectures regarding the equilibrium policy of the other claimant. Because the total of the market values of equity and debt is not the fixed process $A$, new considerations arise, including inefficiencies. That is, in an equilibrium, the total of the market values of equity and bond may be strictly less than maximal, for example because of default that is premature from the viewpoint of maximizing the total value of the firm. An unrestricted central planner could in such a case split the firm’s cash flows between equityholders and bondholders so as to achieve strictly larger market values for each than the equilibrium values of their respective securities. Absent the tax shield on debt, the original owner of the firm, who selects a capital structure at time 0 so as to maximize the total initial market value of all corporate securities, would have avoided a capital structure that involves an inefficiency of this type. For example, an all-equity firm would avoid bankruptcy costs.
In order to illustrate the endogenous choice of capital structure based on the tradeoff between the values of tax shields and of bankruptcy losses, we extend the example of Section 6.2 by assuming a tax rate of $\theta \in (0, 1)$ and bankruptcy recovery $\hat A = \epsilon A$, for a constant fractional recovery rate $\epsilon \in [0, 1]$. For simplicity, we assume no protective covenant. The equity valuation and optimal default timing problem is identical to Equation (109), except that equityholders treat the effective coupon rate as the after-tax rate $c(1 - \theta)$. Thus, the optimal equity market value is $W(A_0, c(1 - \theta))$, where $W(x, y)$ is given by Equations (117) and (118). The optimal default time is $T^* = \inf\{t : A_t \le \beta(1 - \theta)c\}$. For a given coupon rate $c$, the bankruptcy recovery rate $\epsilon$ has no effect on the equity value. The market value $U(A_0, c)$ of debt, at asset level $A_0$ and coupon rate $c$, is indeed affected by distress costs, in that
$$U(x, c) = \epsilon x, \qquad x \le \beta(1 - \theta)c, \tag{120}$$
and, for $x \ge \beta(1 - \theta)c$,
$$U(x, c) = \epsilon\beta c(1 - \theta)\left(\frac{x}{\beta c(1 - \theta)}\right)^{-\gamma} + \frac{c}{r}\left[1 - \left(\frac{x}{\beta c(1 - \theta)}\right)^{-\gamma}\right]. \tag{121}$$
The first term of Equation (121) is the market value of the payment of the recovery value $\epsilon A(T^*) = \epsilon\beta c(1 - \theta)$ at default, using Equation (108). The second term is the market value of receiving the coupon rate $c$ until $T^*$. The capital structure that maximizes the market value received by the initial owners for sale of equity and debt can now be determined from the coupon rate $c^*$ solving
$$\sup_c\,\{U(A_0, c) + W(A_0, (1 - \theta)c)\}. \tag{122}$$
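The maximization in (122) can be sketched numerically as follows. This Python fragment adds the debt value of Equations (120) and (121) to the equity value of Equations (117) and (118), evaluated at the after-tax coupon, and searches a grid of coupon rates for the maximizer. The grid search, the function names and the grid size are assumptions made for illustration; they are not the explicit solution of Leland (1994).

```python
import math


def gamma_exponent(r, mu, sigma):
    """Exponent gamma of Equation (108), with m = mu - sigma^2 / 2."""
    m = mu - sigma ** 2 / 2
    return (m + math.sqrt(m * m + 2 * r * sigma ** 2)) / sigma ** 2


def total_firm_value(a0, c, r, mu, sigma, theta, eps):
    """U(A0, c) + W(A0, (1 - theta) c), the objective in (122)."""
    g = gamma_exponent(r, mu, sigma)
    boundary = g / (r * (1 + g)) * (1 - theta) * c  # beta (1 - theta) c
    if a0 <= boundary:
        return eps * a0  # immediate default: debt recovers eps * A0, equity is worthless
    p = (a0 / boundary) ** (-g)  # Equation (108) evaluated at the default boundary
    debt = eps * boundary * p + c / r * (1 - p)                     # Equation (121)
    equity = a0 - boundary * p - (1 - theta) * c / r * (1 - p)      # Equation (118)
    return debt + equity


def optimal_coupon(a0, r, mu, sigma, theta, eps, n=20000):
    """Grid search (an illustrative assumption) over coupons below the
    level c_max at which default would be immediate."""
    g = gamma_exponent(r, mu, sigma)
    c_max = a0 * r * (1 + g) / (g * (1 - theta))
    best_c, best_v = None, -math.inf
    for j in range(1, n):
        c = c_max * j / n
        v = total_firm_value(a0, c, r, mu, sigma, theta, eps)
        if v > best_v:
            best_c, best_v = c, v
    return best_c, best_v
```

A call such as `optimal_coupon(100.0, 0.05, 0.01, 0.2, 0.35, 0.5)` returns a coupon rate and the associated total market value under purely illustrative parameters. A bracketing optimizer could replace the grid if more precision were needed.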
Leland (1994) provides an explicit solution for $c^*$, which then allows one to easily examine the resolution of the tradeoff between the market value
$$H(A_0, c) = \frac{\theta c}{r}\left[1 - \left(\frac{A_0}{\beta c(1 - \theta)}\right)^{-\gamma}\right]$$
of tax shields and the market value
$$h(A_0, c) = (1 - \epsilon)\,\beta c(1 - \theta)\left(\frac{A_0}{\beta c(1 - \theta)}\right)^{-\gamma}$$
of financial distress costs associated with bankruptcy. Indeed, adding Equation (121) to Equation (118) evaluated at the after-tax coupon rate gives $U(A_0, c) + W(A_0, (1 - \theta)c) = A_0 + H(A_0, c) - h(A_0, c)$, so the coupon rate that solves Equation (122) is that which maximizes $H(A_0, c) - h(A_0, c)$, the benefit–cost difference. Although the tax shield is valuable to the firm, it is merely a transfer from somewhere else in the economy. The bankruptcy distress cost, however, involves a net social cost, illustrating one of the inefficiencies caused by taxes. Leland and Toft (1996) extend the model so as to treat bonds of finite maturity with discrete coupons. One can also allow for multiple classes of debtholders, each with its own contractual cash flows and rights. For example, bonds are conventionally classified by priority, so that, at liquidation, senior bondholders are contractually entitled to cash flows resulting from liquidation up to the total face value of senior debt (in proportion to the face values of the respective senior bonds, and normally without regard to maturity dates). If the most senior class of debtholders can be paid off in full, the next most senior class is assigned liquidation cash flows, and so on, to the lowest subordination class. Some bonds may be secured by certain identified assets, or collateralized, in effect giving them seniority over the liquidation value resulting from those cash flows, before any unsecured bonds may be paid according to the seniority of unsecured claims. In practice, the overall priority structure may be rather complicated. Corporate bonds are often callable, within certain time restrictions. Not infrequently, corporate bonds may be converted to equity at pre-arranged conversion ratios (number
of shares for a given face value) at the timing option of bondholders. Such convertible bonds present a challenging set of valuation issues, some examined by Brennan and Schwartz (1980) and Nyborg (1996). Occasionally, corporate bonds are puttable, that is, may be sold back to the issuer at a pre-arranged price at the option of bondholders. One can also allow for adjustments in capital structure, normally instigated by equityholders, that result in the issuing and retiring of securities, subject to legal restrictions, some of which may be embedded in debt contracts.

6.4. Intensity-based modeling of default

This section introduces a model for a default time as a stopping time $\tau$ with a given intensity process $\lambda$, as defined below. From the joint behavior of $\lambda$, the short-rate process $r$, the promised payment of the security, and the model of recovery at default, as well as risk premia, one can characterize the stochastic behavior of the term structure of yields on defaultable bonds. In applications, default intensities may be modeled as functions of observable variables that are linked with the likelihood of default, such as debt-to-equity ratios, asset volatility measures, other accounting measures of indebtedness, market equity prices, bond yield spreads, industry performance measures, and macroeconomic variables related to the business cycle. This dependence could, but in practice does not usually, arise endogenously from a model of the ability or incentives of the firm to make payments on its debt. Because the approach presented here does not depend on the specific setting of a firm, it has also been applied to the valuation of defaultable sovereign debt, as in Duffie, Pedersen and Singleton (2003) and Pagès (2000). We fix a complete probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\{\mathcal{G}_t : t \ge 0\}$ satisfying the usual conditions. At some points, it will be important to make a distinction between an adapted process and a predictable process.
A predictable process is, intuitively speaking, one whose value at any time $t$ depends only on the information in the underlying filtration that is available up to, but not including, time $t$. Protter (1990) provides a full definition. A non-explosive counting process $K$ (for example, a Poisson process) has an intensity $\lambda$ if $\lambda$ is a predictable non-negative process satisfying $\int_0^t \lambda_s\,ds < \infty$ almost surely for all $t$, with the property that a local martingale $M$, the compensated counting process, is given by
$$M_t = K_t - \int_0^t \lambda_s\,ds. \tag{123}$$
The compensated counting process $M$ is a martingale if, for all $t$, we have $E\left(\int_0^t \lambda_s\,ds\right) < \infty$. A standard reference on counting processes is Brémaud (1981). For simplicity, we will say that a stopping time $\tau$ has an intensity $\lambda$ if $\tau$ is the first jump time of a non-explosive counting process whose intensity process is $\lambda$. The accompanying intuition is that, at any time $t$ and state $\omega$ with $t < \tau(\omega)$, the
$\mathcal{G}_t$-conditional probability of an arrival before $t + \Delta$ is approximately $\lambda(\omega, t)\Delta$, for small $\Delta$. This intuition is justified in the sense of derivatives if $\lambda$ is bounded and continuous, and under weaker conditions. A stopping time $\tau$ is non-trivial if $P(\tau \in (0, \infty)) > 0$. If a stopping time $\tau$ is non-trivial and if the filtration $\{\mathcal{G}_t : t \ge 0\}$ is the standard filtration of some Brownian motion $B$ in $\mathbb{R}^d$, then $\tau$ could not have an intensity. We know this from the fact that, if $\{\mathcal{G}_t : t \ge 0\}$ is the standard filtration of $B$, then the associated compensated counting process $M$ of Equation (123) (indeed, any local martingale) could be represented as a stochastic integral with respect to $B$, and therefore cannot jump, but $M$ must jump at $\tau$. In order to have an intensity, a stopping time $\tau$ must be totally inaccessible, roughly meaning that it cannot be “foretold” by an increasing sequence of stopping times that converges to $\tau$. An inaccessible stopping time is a “sudden surprise”, but there are no such surprises on a Brownian filtration! As an illustration, we could imagine that the firm’s equityholders or managers are equipped with some Brownian filtration for purposes of determining their optimal default time $\tau$, but that bondholders have imperfect monitoring, and may view $\tau$ as having an intensity with respect to the bondholders’ own filtration $\{\mathcal{G}_t : t \ge 0\}$, which contains less information than the Brownian filtration. Such a situation arises in Duffie and Lando (2001). We say that $\tau$ is doubly stochastic with intensity $\lambda$ if the underlying counting process whose first jump time is $\tau$ is doubly stochastic with intensity $\lambda$. This means roughly that, conditional on the intensity process, the counting process is a Poisson process with that same (conditionally deterministic) intensity. The doubly-stochastic property thus implies that, for $t < \tau$, using the law of iterated expectations,
$$P(\tau > s \mid \mathcal{G}_t) = E\left[P\left(\tau > s \mid \mathcal{G}_t, \{\lambda_u : t \le u \le s\}\right) \,\Big|\, \mathcal{G}_t\right] = E\left[\exp\left(-\int_t^s \lambda(u)\,du\right) \,\Big|\, \mathcal{G}_t\right], \tag{124}$$
using the fact that the probability of no jump between $t$ and $s$ of a Poisson process with time-varying (deterministic) intensity $\lambda$ is $\exp\left[-\int_t^s \lambda(u)\,du\right]$. This property (124) is convenient for calculations, because evaluating $E\left(\exp\left[-\int_t^s \lambda(u)\,du\right] \mid \mathcal{G}_t\right)$ is computationally equivalent to the pricing of a default-free zero-coupon bond, treating $\lambda$ as a short rate. Indeed, this analogy is also quite helpful for intuition and suggests tractable models for intensities based on models of the short rate that are tractable for default-free term structure modeling. As we shall see, it would be sufficient for Equation (124) that $\lambda_t = \Lambda(X_t, t)$ for some measurable $\Lambda: \mathbb{R}^d \times [0, \infty) \to [0, \infty)$, where $X$ in $\mathbb{R}^d$ solves a stochastic differential equation of the form
$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dB_t, \tag{125}$$
for some $(\mathcal{G}_t)$-standard Brownian motion $B$ in $\mathbb{R}^d$. More generally, Equation (124) follows from assuming that the doubly-stochastic counting process $K$ whose first jump
time is $\tau$ is driven by some filtration $\{\mathcal{F}_t : t \ge 0\}$. This means roughly that, for any $t$, conditional on $\mathcal{F}_t$, the distribution of $K$ during $[0, t]$ is that of a Poisson process with time-varying conditionally deterministic intensity $\lambda$. A complete definition is provided in Duffie (2001). 36 For purposes of the market valuation of bonds and other securities whose cash flows are sensitive to default timing, we would want to have a risk-neutral intensity process, that is, an intensity process $\lambda^Q$ for the default time $\tau$ that is associated with $(\Omega, \mathcal{F}, Q)$ and the given filtration $\{\mathcal{G}_t : t \ge 0\}$, where $Q$ is an equivalent martingale measure. In this case, we call $\lambda^Q$ the $Q$-intensity of $\tau$. (As usual, there may be more than one equivalent martingale measure.) Such an intensity always exists, as shown by Artzner and Delbaen (1995), but the doubly-stochastic property may be lost with a change of measure [Kusuoka (1999)]. The ratio $\lambda^Q/\lambda$ (for $\lambda$ strictly positive) is in some sense a multiplicative risk premium for the uncertainty associated with the timing of default. This issue is pursued by Jarrow, Lando and Yu (2003), who provide sufficient conditions for no default-timing risk premium (but allowing nevertheless a default risk premium).

6.5. Zero-recovery bond pricing

We fix a short-rate process $r$ and an equivalent martingale measure $Q$ after deflation by $\exp\left[-\int_0^t r(u)\,du\right]$. We consider the valuation of a security that pays $F\,1_{\{\tau > s\}}$ at a given time $s > 0$, where $F$ is a $\mathcal{G}_T$-measurable bounded random variable. Because $1_{\{\tau > s\}}$ is the random variable that is 1 in the event of no default by $s$ and zero otherwise, we may view $F$ as the contractually promised payment of the security at time $s$, with default by $s$ leading to no payment. The case of a defaultable zero-coupon bond is treated by letting $F = 1$. In the next sub-section, we will consider recovery at default.
From the definition of $Q$ as an equivalent martingale measure, the price $S_t$ of this security at any time $t < s$ is

$$S_t = E_t^Q\left[\exp\left(-\int_t^s r(u)\,du\right) 1_{\{\tau > s\}}\, F\right], \tag{126}$$

where $E_t^Q$ denotes $\mathcal{G}_t$-conditional expectation under $Q$. From Equation (126) and the fact that $\tau$ is a stopping time, $S_t$ must be zero for all $t \ge \tau$. Under $Q$, the default time $\tau$ is assumed to have a $Q$-intensity process $\lambda^Q$.

Theorem. Suppose that $F$, $r$ and $\lambda^Q$ are bounded and that $\tau$ is doubly stochastic under $Q$ driven by a filtration $\{\mathcal{F}_t : t \ge 0\}$ such that $r$ is $(\mathcal{F}_t)$-adapted and $F$ is $\mathcal{F}_s$-measurable. Fix any $t < s$. Then, for $t \ge \tau$, we have $S_t = 0$, and for $t < \tau$,

$$S_t = E_t^Q\left[\exp\left(-\int_t^s \left(r(u) + \lambda^Q(u)\right) du\right) F\right]. \tag{127}$$
36 Included in the definition is the condition that $\lambda$ is $(\mathcal{F}_t)$-predictable, that $\mathcal{F}_t \subset \mathcal{G}_t$, and that $\{\mathcal{F}_t : t \ge 0\}$ satisfies the usual conditions.
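As a minimal numerical sketch of the zero-recovery representation (127), assume, purely for illustration, a constant short rate and a constant risk-neutral default intensity, so that the default time is exponentially distributed under the pricing measure. The pre-default price then reduces to exp[-(r + intensity)(s - t)] times the promised payment, which can be compared with a direct Monte Carlo estimate of (126). All function names and parameter values below are hypothetical and do not come from the text.

```python
import math
import random


def zero_recovery_price_closed_form(r, lam_q, horizon, payoff=1.0):
    """Equation (127) when r and the Q-intensity lam_q are constants."""
    return math.exp(-(r + lam_q) * horizon) * payoff


def zero_recovery_price_monte_carlo(r, lam_q, horizon, payoff=1.0,
                                    n_paths=200_000, seed=0):
    """Equation (126): discount the payoff and keep it only on paths with tau > s."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(n_paths):
        # A constant intensity makes the default time exponential with rate lam_q.
        tau = rng.expovariate(lam_q)
        if tau > horizon:
            survivors += 1
    return math.exp(-r * horizon) * payoff * survivors / n_paths


# Illustrative inputs only: 3% short rate, 2% intensity, 5-year horizon.
closed = zero_recovery_price_closed_form(r=0.03, lam_q=0.02, horizon=5.0)
simulated = zero_recovery_price_monte_carlo(r=0.03, lam_q=0.02, horizon=5.0)
print(f"closed form: {closed:.4f}, Monte Carlo: {simulated:.4f}")
```

With constant inputs the two figures should agree up to simulation error, which illustrates the sense in which the intensity acts like an additional discount rate.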
This theorem is based on Lando (1998).³⁷ The idea of this representation (127) of the pre-default price is that discounting for default that occurs at an intensity is analogous to discounting at the short rate $r$.

Proof: From Equation (126), the law of iterated expectations, and the assumption that $r$ is $(\mathcal{F}_t)$-adapted and $F$ is $\mathcal{F}_s$-measurable,

$$\begin{aligned}
S_t &= E^Q\left[ E^Q\left[ \exp\left(-\int_t^s r(u)\,du\right) 1_{\{\tau > s\}}\, F \,\Big|\, \mathcal{F}_s \vee \mathcal{G}_t \right] \,\Big|\, \mathcal{G}_t \right] \\
&= E^Q\left[ \exp\left(-\int_t^s r(u)\,du\right) F\, E^Q\left[ 1_{\{\tau > s\}} \,\big|\, \mathcal{F}_s \vee \mathcal{G}_t \right] \,\Big|\, \mathcal{G}_t \right].
\end{aligned}$$

The result then follows from the implication of double stochasticity that, on the event that $\tau > t$, $Q(\tau > s \mid \mathcal{F}_s \vee \mathcal{G}_t) = \exp[-\int_t^s \lambda^Q(u)\,du]$.

As a special case, suppose the filtration $\{\mathcal{F}_t : t \ge 0\}$ is that generated by a process $X$ that is affine under $Q$ and valued in $D \subset \mathbb{R}^d$. It is natural to allow dependence of $\lambda^Q$, $r$ and $F$ on the state process $X$ in the sense that

$$\lambda_t^Q = \Lambda(X_t), \qquad r_t = \rho(X_t), \qquad F = \exp[f(X(T))], \tag{128}$$

where $\Lambda$, $\rho$ and $f$ are affine on $D$. Under the technical regularity in Duffie, Filipović and Schachermayer (2003), relation (127) then implies that, for $t < \tau$, we have

$$S_t = \exp[\alpha(T - t) + \beta(T - t) \cdot X(t)], \tag{129}$$

for coefficients $\alpha(\cdot)$ and $\beta(\cdot)$ that are computed from the associated Generalized Riccati equations.

6.6. Pricing with recovery at default

The next step is to consider the recovery of some random payoff $W$ at the default time $\tau$, if default occurs before the maturity date $s$ of the security. We adopt the assumptions of Theorem 6.5, and add the assumption that $W = w_\tau$, where $w$ is a bounded predictable process that is also adapted to the driving filtration $\{\mathcal{F}_t : t \ge 0\}$.
37 Additional work in this vein is by Bielecki and Rutkowski (1999a,b, 2001), Cooper and Mello (1991, 1992), Das and Sundaram (2000), Das and Tufano (1995), Davydov, Linetsky and Lotz (1999), Duffie (1998), Duffie and Huang (1996), Duffie, Schroder and Skiadas (1996), Duffie and Singleton (1999), Elliott, Jeanblanc and Yor (2000), Hull and White (1992, 1995), Jarrow and Yu (2001), Jeanblanc and Rutkowski (2000), Madan and Unal (1998) and Nielsen and Ronn (1995).
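The market value at any time $t < \min(s, \tau)$ of any default recovery is, by definition of the equivalent martingale measure $Q$, given by

$$J_t = E_t^Q\left[\exp\left(-\int_t^\tau r(u)\,du\right) 1_{\{\tau \le s\}}\, w_\tau\right]. \tag{130}$$

The doubly-stochastic assumption implies that $\tau$ has a probability density under $Q$, at any time $u$ in $[t, s]$, conditional on $\mathcal{G}_t \vee \mathcal{F}_s$, and on the event that $\tau > t$, of

$$q(t, u) = \exp\left(-\int_t^u \lambda^Q(z)\,dz\right) \lambda^Q(u).$$

Thus, using the same iterated-expectations argument of the proof of Theorem 6.5, we have, on the event that $\tau > t$,

$$\begin{aligned}
J_t &= E^Q\left[ E^Q\left[ \exp\left(-\int_t^\tau r(z)\,dz\right) 1_{\{\tau \le s\}}\, w_\tau \,\Big|\, \mathcal{F}_s \vee \mathcal{G}_t \right] \,\Big|\, \mathcal{G}_t \right] \\
&= E^Q\left[ \int_t^s \exp\left(-\int_t^u r(z)\,dz\right) q(t, u)\, w_u \,du \,\Big|\, \mathcal{G}_t \right] \\
&= \int_t^s \Phi(t, u)\,du,
\end{aligned}$$

using Fubini’s Theorem, where

$$\Phi(t, u) = E_t^Q\left[\exp\left(-\int_t^u \left[\lambda^Q(z) + r(z)\right] dz\right) \lambda^Q(u)\, w(u)\right]. \tag{131}$$

We summarize the main defaultable valuation result as follows.

Theorem. Consider a security that pays $F$ at $s$ if $\tau > s$, and otherwise pays $w_\tau$ at $\tau$. Suppose that $w$, $F$, $\lambda^Q$ and $r$ are bounded. Suppose that $\tau$ is doubly stochastic under $Q$, driven by a filtration $\{\mathcal{F}_t : t \ge 0\}$ with the property that $r$ and $w$ are $(\mathcal{F}_t)$-adapted and $F$ is $\mathcal{F}_s$-measurable. Then, for $t \ge \tau$, we have $S_t = 0$, and for $t < \tau$,

$$S_t = E_t^Q\left[\exp\left(-\int_t^s \left(r(u) + \lambda^Q(u)\right) du\right) F\right] + \int_t^s \Phi(t, u)\,du. \tag{132}$$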
These results are based on Duffie, Schroder and Skiadas (1996) and Lando (1994, 1998). Schönbucher (1998) extends the analysis to treat the case of a recovery $W$ that is not of the form $w_\tau$ for some predictable process $w$, but is instead revealed just at the default time $\tau$. For details on this construction, see Duffie (2002).
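To make the two terms of the valuation formula (132) concrete, the following sketch assumes, only for illustration, that the short rate, the risk-neutral intensity and the recovery payment are all constants. In that special case the integrand of (131) is a simple exponential, so the recovery integral can be evaluated numerically and checked against an elementary closed form. The function names and parameter values are hypothetical.

```python
import math


def phi(elapsed, r, lam_q, recovery):
    """Equation (131) when r, lam_q and the recovery w are constants."""
    return math.exp(-(r + lam_q) * elapsed) * lam_q * recovery


def price_with_recovery(r, lam_q, recovery, horizon, payoff=1.0, n_steps=1000):
    """Equation (132): survival-contingent payoff plus integrated recovery value."""
    survival_part = math.exp(-(r + lam_q) * horizon) * payoff
    # Trapezoidal rule for the integral of phi over [0, horizon].
    h = horizon / n_steps
    total = 0.5 * (phi(0.0, r, lam_q, recovery) + phi(horizon, r, lam_q, recovery))
    for k in range(1, n_steps):
        total += phi(k * h, r, lam_q, recovery)
    return survival_part + h * total


# Illustrative inputs only.
r, lam_q, recovery, horizon = 0.03, 0.02, 0.4, 5.0
numerical = price_with_recovery(r, lam_q, recovery, horizon)
# This closed form exists only because every input is constant.
discount = math.exp(-(r + lam_q) * horizon)
closed = discount + lam_q * recovery * (1.0 - discount) / (r + lam_q)
print(f"numerical: {numerical:.6f}, closed form: {closed:.6f}")
```

In richer settings the integral generally has no closed form, and a quadrature of this kind, applied to a computed integrand, is one simple way to carry out the numerical step.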
In the affine state-space setting described at the end of the previous section, $\Phi(t, u)$ can be computed by our usual “affine” methods, provided that $w$ is of the form $w_t = e^{a + b \cdot X(t)}$ for constant coefficients $a$ and $b$. In this case, under technical regularity,

$$\Phi(t, u) = \exp[\alpha(u - t) + \beta(u - t) \cdot X(t)]\,[c(u - t) + C(u - t) \cdot X(t)], \tag{133}$$

for readily computed deterministic coefficients $\alpha$, $\beta$, $c$ and $C$, as in Duffie, Pan and Singleton (2000). This still leaves the task of numerical computation of the integral $\int_t^s \Phi(t, u)\,du$.

For the price of a typical defaultable bond promising periodic coupons followed by its principal at maturity, one may sum the prices of the coupons and of the principal, treating each of these payments as though it were a separate zero-coupon bond. An often-used assumption, although one that need not apply in practice, is that there is no default recovery for coupons remaining to be paid as of the time of default, and that bonds of different maturities have the same recovery of principal. In any case, convenient parametric assumptions, based for example on an affine driving process $X$, lead to straightforward computation of a term structure of defaultable bond yields that may be applied in practical situations, such as the valuation of credit derivatives, a class of derivative securities designed to transfer credit risk that is treated in Duffie and Singleton (2003).

For the case of defaultable bonds with embedded American options, the most typical cases being callable or convertible bonds, the usual resort is valuation by some numerical implementation of the associated dynamic programming problems. See Berndt (2002).

6.7. Default-adjusted short rate

In the setting of Theorem 6.6, a particularly simple pricing representation can be based on the definition of a predictable process $\ell$ for the fractional loss in market value at default, according to

$$(1 - \ell_\tau)\, S_{\tau-} = w_\tau. \tag{134}$$

Manipulation left to the reader shows that, under the conditions of Theorem 6.6, for $t < \tau$,

$$S_t = E_t^Q\left[\exp\left(-\int_t^s \left[r(u) + \ell(u)\,\lambda^Q(u)\right] du\right) F\right]. \tag{135}$$
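One heuristic route to (135) is sketched below. It is an informal drift-matching argument under the stated boundedness conditions, not the proof given in the cited papers. Here $\ell$ denotes the fractional loss process of (134), $\lambda^Q$ the risk-neutral intensity, and $V$ is a hypothetical pre-default value process, introduced only for this sketch, that coincides with the price $S$ before default. The block assumes the `amsmath` package.

```latex
% Heuristic only: V is a hypothetical pre-default value process, S_t = V_t for t < \tau.
\begin{align*}
  r_t V_t \, dt
    &\approx E_t^Q[dV_t] + \lambda_t^Q \, dt \, \bigl( w_t - V_t \bigr)
       && \text{(risk-neutral expected gain over } [t, t + dt] \text{)} \\
    &= E_t^Q[dV_t] - \ell_t \lambda_t^Q V_t \, dt
       && \text{(by (134), } w_t = (1 - \ell_t) V_t \text{)} \\
  \Longrightarrow \quad E_t^Q[dV_t]
    &= \bigl( r_t + \ell_t \lambda_t^Q \bigr) V_t \, dt ,
       \qquad V_s = F .
\end{align*}
```

The last line says that $V$ grows on average at the rate $r + \ell\lambda^Q$ and ends at $F$, which is the property of the expectation on the right-hand side of (135).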
This valuation model (135) is from Duffie and Singleton (1999), and is based on a precursor of Pye (1974). This representation (135) is particularly convenient if we take $\ell$ as an exogenously given fractional loss process, as it allows for the application of standard valuation methods, treating the payoff $F$ as default-free, but accounting for the intensity and severity of default losses through the “default-adjusted” short-rate process $r + \ell\lambda^Q$. The adjustment $\ell\lambda^Q$ is in fact the risk-neutral mean rate of proportional loss in market value due to default. Notably, the dependence of the bond price on the intensity $\lambda^Q$ and fractional loss $\ell$ at default is only through the product $\ell\lambda^Q$. For example, doubling $\lambda^Q$ and halving $\ell$ has no effect on the bond price process.

Suppose, for example, that $\tau$ is doubly stochastic driven by the filtration of a state process $X$ that is affine under $Q$, and we take $r_t + \ell_t \lambda_t^Q = R(X_t)$ and $F = \exp[f(X(T))]$, for affine $R(\cdot)$ and $f(\cdot)$. Then, under regularity conditions, we obtain at each time $t$ before default a bond price of the simple form (129), again for coefficients solving the associated Generalized Riccati equation. Using this affine approach to default-adjusted short rates, Duffee (1999a) provides an empirical model of risk-neutral default intensities for corporate bonds.³⁸
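The affine calculation just described can be sketched in code under some loud assumptions that are not taken from the text: a one-dimensional square-root (Cox-Ingersoll-Ross type) state process under the pricing measure, a constant default-free short rate `r0`, a constant fractional loss `loss`, a risk-neutral intensity equal to `lam_scale` times the state, and a promised payoff of 1. The default-adjusted short rate is then affine in the state, and the two coefficient functions of the exponential-affine price (129) solve a pair of ordinary differential equations of Riccati type, integrated here with a classical Runge-Kutta scheme. All names and parameter values are hypothetical.

```python
import math


def riccati_coefficients(rho0, rho1, kappa, theta, sigma, maturity, n_steps=2000):
    """Solve alpha' = kappa*theta*beta - rho0 and
    beta' = -rho1 - kappa*beta + 0.5*sigma**2*beta**2, alpha(0) = beta(0) = 0,
    by classical fourth-order Runge-Kutta in time to maturity."""
    def beta_dot(b):
        return -rho1 - kappa * b + 0.5 * sigma ** 2 * b ** 2

    def alpha_dot(b):
        return kappa * theta * b - rho0

    h = maturity / n_steps
    alpha, beta = 0.0, 0.0
    for _ in range(n_steps):
        k1 = beta_dot(beta)
        k2 = beta_dot(beta + 0.5 * h * k1)
        k3 = beta_dot(beta + 0.5 * h * k2)
        k4 = beta_dot(beta + h * k3)
        # alpha' depends only on beta, so it reuses the same stage values of beta.
        a1 = alpha_dot(beta)
        a2 = alpha_dot(beta + 0.5 * h * k1)
        a3 = alpha_dot(beta + 0.5 * h * k2)
        a4 = alpha_dot(beta + h * k3)
        alpha += h * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        beta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return alpha, beta


def pre_default_price(x, r0, loss, lam_scale, kappa, theta, sigma, maturity):
    """Exponential-affine price with R(x) = r0 + loss * lam_scale * x and F = 1."""
    alpha, beta = riccati_coefficients(r0, loss * lam_scale, kappa, theta, sigma, maturity)
    return math.exp(alpha + beta * x)


def cir_closed_form(x, kappa, theta, sigma, maturity):
    """Textbook Cox-Ingersoll-Ross bond price, i.e. the special case R(x) = x."""
    gamma = math.sqrt(kappa ** 2 + 2.0 * sigma ** 2)
    e = math.exp(gamma * maturity) - 1.0
    denom = (gamma + kappa) * e + 2.0 * gamma
    b = 2.0 * e / denom
    a = (2.0 * gamma * math.exp(0.5 * (kappa + gamma) * maturity) / denom) ** (
        2.0 * kappa * theta / sigma ** 2)
    return a * math.exp(-b * x)


# Hypothetical parameter values, chosen only for illustration.
params = dict(kappa=0.5, theta=0.03, sigma=0.1, maturity=5.0)
base = pre_default_price(0.02, 0.03, loss=0.6, lam_scale=1.0, **params)
swapped = pre_default_price(0.02, 0.03, loss=0.3, lam_scale=2.0, **params)
check = pre_default_price(0.02, 0.0, loss=1.0, lam_scale=1.0, **params)
print(base, swapped, check, cir_closed_form(0.02, **params))
```

The `base` and `swapped` prices coincide because only the product of loss and intensity enters the adjusted rate, and `check` can be compared with the closed-form value as a sanity test of the numerical integration.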
38 For related empirical work on sovereign debt, see Duffie, Pedersen and Singleton (2003) and Pagès (2000).

References

Adams, K., and D. Van Deventer (1994), “Fitting yield curves and forward rate curves with maximum smoothness”, Journal of Fixed Income 4 (June):52−62.
Ahn, H., M. Dayal, E. Grannan and G. Swindle (1995), “Hedging with transaction costs”, Annals of Applied Probability 8:341−366.
Andersen, L., and J. Andreasen (2000a), “Volatility skews and extensions of the Libor Market Model”, Applied Mathematical Finance 7(1):1−32.
Andersen, L., and J. Andreasen (2000b), “Jump-diffusion processes: volatility smile fitting and numerical methods for option pricing”, Review of Derivatives Research 4:231−262.
Anderson, R., and S. Sundaresan (1996), “Design and valuation of debt contracts”, Review of Financial Studies 9:37−68.
Anderson, R., Y. Pan and S. Sundaresan (2001), “Corporate bond yield spreads and the term structure”, Finance 21(2):14−37.
Andreasen, J., B. Jensen and R. Poulsen (1998), “Eight valuation methods in financial mathematics: the Black–Scholes formula as an example”, Mathematical Scientist 23:18−40.
Ansel, J., and C. Stricker (1992a), “Quelques remarques sur un theoreme de Yan”, Working Paper (Université de Franche-Comté).
Ansel, J., and C. Stricker (1992b), “Lois de martingales, densités et décomposition de Föllmer–Schweizer”, Annales de l’Institut Henri Poincaré Probabilités Statistiques 28(3):375−392.
Arrow, K. (1951), “An extension of the basic theorems of classical welfare economics”, in: J. Neyman, ed., Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (University of California Press, Berkeley, CA) pp. 507−532.
Arrow, K. (1953), “Le rôle des valeurs boursières pour la repartition la meilleure des risques”, Econometrie. Colloq. Internat. Centre National de la Recherche Scientifique 40(Paris 1952):41−47; discussion, pp. 47–48, C.N.R.S. (Paris 1953). English Translation: 1964, Review of Economic Studies 31:91−96.
Artzner, P. (1995), “References for the numeraire portfolio”, Working Paper (Institut de Recherche Mathématique Avancée Université Louis Pasteur et CNRS, et Laboratoire de Recherche en Gestion).
Artzner, P., and F. Delbaen (1990), “‘Finem lauda’ or the risk of swaps”, Insurance: Mathematics and Economics 9:295−303.
Artzner, P., and F. Delbaen (1992), “Credit risk and prepayment option”, ASTIN Bulletin 22:81−96.
Artzner, P., and F. Delbaen (1995), “Default risk and incomplete insurance markets”, Mathematical Finance 5:187−195.
Artzner, P., and P. Roger (1993), “Definition and valuation of optional coupon reinvestment bonds”, Finance 14:7−22.
Aït-Sahalia, Y. (1996a), “Nonparametric pricing of interest rate derivative securities”, Econometrica 64:527−560.
Aït-Sahalia, Y. (1996b), “Testing continuous-time models of the spot interest rate”, Review of Financial Studies 9:385−342.
Aït-Sahalia, Y. (2002), “Telling from discrete data whether the underlying continuous-time model is a diffusion”, Journal of Finance 57(5):2075−2113.
Aït-Sahalia, Y., Y. Wang and F. Yared (2001), “Do option markets correctly price the probabilities of movement of the underlying asset?” Journal of Econometrics 102:67−110.
Au, K., and D. Thurston (1993), “Markovian term structure movements”, Working Paper (School of Banking and Finance, University of New South Wales).
Babbs, S., and M. Selby (1996), “Pricing by arbitrage in incomplete markets”, Mathematical Finance 8:163−168.
Babbs, S., and N. Webber (1994), “A theory of the term structure with an official short rate” (Financial Options Research Center, Warwick Business School).
Bachelier, L. (1900), “Théorie de la speculation”, Annales Scientifiques de L’École Normale Supérieure, 3ème serie 17:21−88. Translation: 1964, in: P. Cootner, ed., The Random Character of Stock Market Prices (MIT Press, Cambridge, MA) pp. 17−79.
Back, K. (1986), “Securities market equilibrium without bankruptcy: contingent claim valuation and the martingale property”, Working Paper (Center for Mathematical Studies in Economics and Management Science, Northwestern University).
Back, K. (1991), “Asset pricing for general processes”, Journal of Mathematical Economics 20:371−396.
Back, K., and S. Pliska (1987), “The shadow price of information in continuous time decision problems”, Stochastics 22:151−186.
Bajeux-Besnainou, I., and R. Portait (1997), “The numeraire portfolio: a new methodology for financial theory”, The European Journal of Finance 3:291−309.
Bajeux-Besnainou, I., and R. Portait (1998), “Pricing derivative securities with a multi-factor gaussian model”, Applied Mathematical Finance 5:1−19.
Bakshi, G., and D. Madan (2000), “Spanning and derivative security valuation”, Journal of Financial Economics 55:205−238.
Bakshi, G., C. Cao and Z. Chen (1997), “Empirical performance of alternative option pricing models”, Journal of Finance 52:2003−2049.
Balduzzi, P., S. Das, S. Foresi and R. Sundaram (1996), “A simple approach to three factor affine term structure models”, Journal of Fixed Income 6(December):43−53.
Balduzzi, P., G. Bertola, S. Foresi and L. Klapper (1998), “Interest rate targeting and the dynamics of short-term interest rates”, Journal of Money, Credit, and Banking 30:26−50.
Balduzzi, P., S. Das and S. Foresi (1998), “The central tendency: a second factor in bond yields”, Review of Economics and Statistics 80:62−72.
Banz, R., and M. Miller (1978), “Prices for state-contingent claims: some evidence and applications”, Journal of Business 51:653−672.
Bates, D. (1997), “Post-87’ crash fears in S-and-P 500 futures options”, Journal of Econometrics 94:181−238.
Baz, J., and S. Das (1996), “Analytical approximations of the term structure for jump-diffusion processes: a numerical analysis”, Journal of Fixed Income 6(1):78−86.
Beaglehole, D. (1990), “Tax clientele and stochastic processes in the gilt market”, Working Paper (Graduate School of Business, University of Chicago).
Beaglehole, D., and M. Tenney (1991), “General solutions of some interest rate contingent claim pricing equations”, Journal of Fixed Income 1:69−84.
Bensoussan, A. (1984), “On the theory of option pricing”, Acta Applicandae Mathematicae 2:139−158.
Benzoni, L. (2002), “Pricing options under stochastic volatility: an empirical investigation”, Working Paper (Carlson School of Management, University of Minnesota).
Berardi, A., and M. Esposito (1999), “A base model for multifactor specifications of the term structure”, Economic Notes 28:145−170.
Bergman, Y. (1995), “Option pricing with differential interest rates”, The Review of Financial Studies 8:475−500.
Berndt, A. (2002), “Estimating the term structure of credit spreads: callable corporate debt”, Working Paper (Department of Statistics, Stanford University).
Bhar, R., and C. Chiarella (1995), “Transformation of Heath–Jarrow–Morton models to Markovian systems”, Working Paper (School of Finance and Economics, University of Technology, Sydney).
Bielecki, T., and M. Rutkowski (1999a), “Credit risk modelling: a multiple ratings case”, Working Paper (Northeastern Illinois University and Technical University of Warsaw).
Bielecki, T., and M. Rutkowski (1999b), “Modelling of the defaultable term structure: conditionally Markov approach”, Working Paper (Northeastern Illinois University and Technical University of Warsaw).
Bielecki, T., and M. Rutkowski (2001), “Credit risk modelling: intensity based approach”, in: E. Jouini, J. Cvitanic and M. Musiela, eds., Option Pricing, Interest Rates and Risk Management (Cambridge University Press) pp. 399–457.
Björk, T., and B. Christensen (1999), “Interest rate dynamics and consistent forward rate curves”, Mathematical Finance 22:17−23.
Björk, T., and A. Gombani (1999), “Minimal realizations of interest rate models”, Finance and Stochastics 3:413−432.
Black, F., and J. Cox (1976), “Valuing corporate securities: liabilities: some effects of bond indenture provisions”, Journal of Finance 31:351−367.
Black, F., and P. Karasinski (1991), “Bond and option pricing when short rates are lognormal”, Financial Analysts Journal (July–August), pp. 52−59.
Black, F., and M. Scholes (1973), “The pricing of options and corporate liabilities”, Journal of Political Economy 81:637−654.
Black, F., E. Derman and W. Toy (1990), “A one-factor model of interest rates and its application to treasury bond options”, Financial Analysts Journal (January–February), pp. 33−39.
Bottazzi, J.-M. (1995), “Existence of equilibria with incomplete markets: the case of smooth returns”, Journal of Mathematical Economics 24:59−72.
Bottazzi, J.-M., and T. Hens (1996), “Excess demand functions and incomplete markets”, Journal of Economic Theory 68:49−63.
Boudoukh, J., M. Richardson, R. Stanton and R. Whitelaw (1997), “Pricing mortgage-backed securities in a multifactor interest rate environment: a multivariate density estimation approach”, Review of Financial Studies 10:405−446.
Boyarchenko, S., and S. Levendorskiĭ (2002), “Perpetual American options under Lévy processes”, SIAM Journal on Control and Optimization 40(6):1663−1696.
Brace, A., and M. Musiela (1994), “Swap derivatives in a Gaussian HJM framework”, Working Paper (Treasury Group, Citibank, Sydney, Australia).
Brace, A., and M. Musiela (1995), “The market model of interest rate dynamics”, Mathematical Finance 7:127−155.
Breeden, D. (1979), “An intertemporal asset pricing model with stochastic consumption and investment opportunities”, Journal of Financial Economics 7:265−296.
Breeden, D., and R. Litzenberger (1978), “Prices of state-contingent claims implicit in option prices”, Journal of Business 51:621−651.
Brémaud, P. (1981), Point Processes and Queues: Martingale Dynamics (Springer, New York).
Brennan, M., and E. Schwartz (1977), “Savings bonds, retractable bonds and callable bonds”, Journal of Financial Economics 5:67−88.
Brennan, M., and E. Schwartz (1980), “Analyzing convertible bonds”, Journal of Financial and Quantitative Analysis 10:907−929.
Brown, D., P. DeMarzo and C. Eaves (1996a), “Computing equilibria when asset markets are incomplete”, Econometrica 64:1−27.
Brown, D., P. DeMarzo and C. Eaves (1996b), “Computing zeros of sections vector bundles using homotopies and relocalization”, Mathematics of Operations Research 21:26−43.
Brown, R., and S. Schaefer (1994), “Interest rate volatility and the shape of the term structure”, Philosophical Transactions of the Royal Society: Physical Sciences and Engineering 347:449−598.
Brown, R., and S. Schaefer (1996), “Ten years of the real term structure: 1984–1994”, Journal of Fixed Income 6(March):6−22.
Bühlmann, H., F. Delbaen, P. Embrechts and A. Shiryaev (1998), “On Esscher transforms in discrete finance models”, ASTIN Bulletin 28:171−186.
Büttler, H., and J. Waldvogel (1996), “Pricing callable bonds by means of Green’s function”, Mathematical Finance 6:53−88.
Carr, P., and R. Chen (1996), “Valuing bond futures and the quality option”, Working Paper (Johnson Graduate School of Management, Cornell University).
Carverhill, A. (1988), “The Ho and Lee term structure theory: a continuous time version”, Working Paper (Financial Options Research Centre, University of Warwick).
Cass, D. (1984), “Competitive equilibria in incomplete financial markets”, Working Paper (Center for Analytic Research in Economics and the Social Sciences, University of Pennsylvania).
Cass, D. (1989), “Sunspots and incomplete financial markets: the leading example”, in: G. Feiwel, ed., The Economics of Imperfect Competition and Employment: Joan Robinson and Beyond (Macmillan, London) pp. 677−693.
Cass, D. (1991), “Incomplete financial markets and indeterminacy of financial equilibrium”, in: J.-J. Laffont, ed., Advances in Economic Theory (Cambridge University Press, Cambridge) pp. 263−288.
Cassese, G. (1996), “An elementary remark on martingale equivalence and the fundamental theorem of asset pricing”, Working Paper (Istituto di Economia Politica, Università Commerciale “Luigi Bocconi”, Milan).
Chacko, G., and S. Das (2002), “Pricing interest rate derivatives: a general approach”, Review of Financial Studies 15(1):195−241.
Chapman, D. (1998), “Habit formation, consumption, and state-prices”, Econometrica 66:1223−1230.
Chen, L. (1996), Stochastic Mean and Stochastic Volatility: A Three-Factor Model of the Term Structure of Interest Rates and Its Application to the Pricing of Interest Rate Derivatives: Part I (Blackwell Publishers, Oxford).
Chen, R.-R., and L. Scott (1992), “Pricing interest rate options in a two-factor Cox-Ingersoll-Ross model of the term structure”, Review of Financial Studies 5:613−636.
Chen, R.-R., and L. Scott (1993), “Pricing interest rate futures options with futures-style margining”, Journal of Futures Markets 13:15−22.
Chen, R.-R., and L. Scott (1995), “Interest rate options in multifactor Cox–Ingersoll–Ross models of the term structure”, Journal of Derivatives 3:53−72.
Cherif, T., N. El Karoui, R. Myneni and R. Viswanathan (1995), “Arbitrage pricing and hedging of quanto options and interest rate claims with quadratic gaussian state variables”, Working Paper (Laboratoire de Probabilités, Université de Paris, VI).
Chernov, M., and E. Ghysels (2000), “A study towards a unified approach to the joint estimation of objective and risk neutral measures for the purpose of options valuation”, Journal of Financial Economics 56:407−458.
Cherubini, U., and M. Esposito (1995), “Options in and on interest rate futures contracts: results from martingale pricing theory”, Applied Mathematical Finance 2:1−15.
Chesney, M., R. Elliott and R. Gibson (1993), “Analytical solution for the pricing of American bond and yield options”, Mathematical Finance 3:277−294.
Chew, S.-H. (1983), “A generalization of the quasilinear mean with applications to the measurement of income inequality and decision theory resolving the Allais paradox”, Econometrica 51:1065−1092.
Chew, S.-H. (1989), “Axiomatic utility theories with the betweenness property”, Annals of Operations Research 19:273−298.
Chew, S.-H., and L. Epstein (1991), “Recursive utility under uncertainty”, in: A. Khan and N. Yannelis, eds., Equilibrium Theory with an Infinite Number of Commodities (Springer, New York) pp. 353−369.
Cheyette, O. (1995), “Markov representation of the Heath–Jarrow–Morton model”, Working Paper (BARRA Inc., Berkeley, California).
Cheyette, O. (1996), “Implied prepayments”, Working Paper (BARRA Inc., Berkeley, California).
Citanna, A., and A. Villanacci (1993), “On generic pareto improvement in competitive economies with incomplete asset structure”, Working Paper (Center for Analytic Research in Economics and the Social Sciences, University of Pennsylvania).
Citanna, A., A. Kajii and A. Villanacci (1994), “Constrained suboptimality in incomplete markets: a general approach and two applications”, Economic Theory 11:495−521.
Clewlow, L., K. Pang and C. Strickland (1997), “Efficient pricing of caps and swaptions in a multi-factor gaussian interest rate model”, Working Paper (University of Warwick).
Cohen, H. (1995), “Isolating the wild card option”, Mathematical Finance 2:155−166.
Coleman, T., L. Fisher and R. Ibbotson (1992), “Estimating the term structure of interest rates from data that include the prices of coupon bonds”, Journal of Fixed Income 2 (September):85−116.
Collin-Dufresne, P., and R. Goldstein (2001a), “Do credit spreads reflect stationary leverage ratios?”, Journal of Finance 56(5):1929−1958.
Collin-Dufresne, P., and R. Goldstein (2001b), “Stochastic correlation and the relative pricing of caps and swaptions in a generalized-affine framework”, Working Paper (Carnegie Mellon University).
Collin-Dufresne, P., and R. Goldstein (2002), “Pricing swaptions within the affine framework”, Journal of Derivatives 10(1):1−18.
Collin-Dufresne, P., and B. Solnik (2001), “On the term structure of default premia in the swap and Libor markets”, Journal of Finance 56:1095−1116.
Constantinides, G. (1982), “Intertemporal asset pricing with heterogeneous consumers and without demand aggregation”, Journal of Business 55:253−267.
Constantinides, G. (1990), “Habit formation: a resolution of the equity premium puzzle”, Journal of Political Economy 98:519−543.
Constantinides, G. (1992), “A theory of the nominal term structure of interest rates”, Review of Financial Studies 5:531−552.
Constantinides, G., and J. Ingersoll (1984), “Optimal bond trading with personal taxes”, Journal of Financial Economics 13:299−335.
Constantinides, G., and T. Zariphopoulou (1999), “Bounds on prices of contingent claims in an intertemporal economy with proportional transaction costs and general preferences”, Finance and Stochastics 3:345−369.
Constantinides, G., and T. Zariphopoulou (2001), “Bounds on option prices in an intertemporal setting with proportional transaction costs and multiple securities”, Mathematical Finance 11:331−346.
Cont, R. (1998), “Modeling term structure dynamics: an infinite dimensional approach”, Working Paper (Centre de Mathématiques Appliquées, Ecole Polytechnique, Palaiseau, France).
Cooper, I., and A. Mello (1991), “The default risk of swaps”, Journal of Finance XLVI:597−620.
Cooper, I., and A. Mello (1992), “Pricing and optimal use of forward contracts with default risk”, Working Paper (Department of Finance, London Business School, University of London).
Corradi, V. (2000), “Degenerate continuous time limits of GARCH and GARCH-type processes”, Journal of Econometrics 96:145−153.
Cox, J. (1983), “Optimal consumption and portfolio rules when assets follow a diffusion process”, Working Paper (Graduate School of Business, Stanford University).
Cox, J., and C.-F. Huang (1989), “Optimal consumption and portfolio policies when asset prices follow a diffusion process”, Journal of Economic Theory 49:33−83.
Cox, J., and C.-F. Huang (1991), “A variational problem arising in financial economics with an application to a portfolio turnpike theorem”, Journal of Mathematical Economics 20:465−488.
Cox, J., and S. Ross (1976), “The valuation of options for alternative stochastic processes”, Journal of Financial Economics 3:145−166.
Cox, J., and M. Rubinstein (1985), Options Markets (Prentice-Hall, Englewood Cliffs, NJ).
Cox, J., S. Ross and M. Rubinstein (1979), “Option pricing: a simplified approach”, Journal of Financial Economics 7:229−263.
Cox, J., J. Ingersoll and S. Ross (1981), “The relation between forward prices and futures prices”, Journal of Financial Economics 9:321−346.
Cox, J., J. Ingersoll and S. Ross (1985a), “An intertemporal general equilibrium model of asset prices”, Econometrica 53:363−384.
Cox, J., J. Ingersoll and S. Ross (1985b), “A theory of the term structure of interest rates”, Econometrica 53:385−408.
Cuoco, D. (1997), “Optimal consumption and equilibrium prices with portfolio constraints and stochastic income”, Journal of Economic Theory 72:33−73.
Cuoco, D., and H. He (1994), “Dynamic equilibrium in finite-dimensional economies with incomplete financial markets”, Working Paper (Wharton School, University of Pennsylvania).
Cvitanić, J., and I. Karatzas (1993), “Hedging contingent claims with constrained portfolios”, Annals of Applied Probability 3:652−681.
Cvitanić, J., and I. Karatzas (1996), “Hedging and portfolio optimization under transaction costs: a martingale approach”, Mathematical Finance 6:133−165.
Cvitanić, J., H. Wang and W. Schachermayer (2001), “Utility maximization in incomplete markets with random endowment”, Finance and Stochastics 5:259−272.
Daher, C., M. Romano and G. Zacklad (1992), “Determination du prix de produits optionnels obligatoires à partir d’un modèle multi-facteurs de la courbe des taux”, Working Paper (Caisse Autonome de Refinancement, Paris).
Dai, Q. (1994), “Implied Green’s function in a no-arbitrage Markov model of the instantaneous short rate”, Working Paper (Graduate School of Business, Stanford University).
Dai, Q., and K. Singleton (2000), “Specification analysis of affine term structure models”, Journal of Finance 55:1943−1978.
Dai, Q., and K. Singleton (2003), “Term structure modelling in theory and reality”, Review of Financial Studies, forthcoming.
Dalang, R., A. Morton and W. Willinger (1990), “Equivalent martingale measures and no-arbitrage in stochastic securities market models”, Stochastics and Stochastic Reports 29:185−201.
Das, S. (1993), “Mean rate shifts and alternative models of the interest rate: theory and evidence”, Working Paper (Department of Finance, New York University).
Das, S. (1995), “Pricing interest rate derivatives with arbitrary skewness and kurtosis: a simple approach to jump-diffusion bond option pricing”, Working Paper (Division of Research, Harvard Business School).
Das, S. (1997), “Discrete-time bond and option pricing for jump-diffusion processes”, Review of Derivatives Research 1:211−243.
Das, S. (1998), “The surprise element: interest rates as jump diffusions”, NBER Working Paper 6631; Journal of Econometrics, under review.
Das, S., and S. Foresi (1996), “Exact solutions for bond and option prices with systematic jump risk”, Review of Derivatives Research 1:7−24.
Das, S., and R. Sundaram (2000), “A discrete-time approach to arbitrage-free pricing of credit derivatives”, Management Science 46:46−62.
Das, S., and P. Tufano (1995), “Pricing credit-sensitive debt when interest rates, credit ratings and credit spreads are stochastic”, Journal of Financial Engineering 5(2):161−198.
Dash, J. (1989), “Path integrals and options − I”, Working Paper (Financial Strategies Group, Merrill Lynch Capital Markets, New York).
Davis, M., and M. Clark (1993), “Analysis of financial models including transactions costs”, Working Paper (Imperial College, University of London).
Davydov, D., V. Linetsky and C. Lotz (1999), “The hazard-rate approach to pricing risky debt: two analytically tractable examples”, Working Paper (Department of Economics, University of Michigan).
Debreu, G. (1953), “Une economie de l’incertain”, Working Paper (Electricité de France).
Debreu, G. (1959), Theory of Value, Cowles Foundation Monograph 17 (Yale University Press, New Haven, CT).
Décamps, J.-P., and A. Faure-Grimaud (2000), “Bankruptcy costs, ex post renegotiation and gambling for resurrection”, Finance (December).
Décamps, J.-P., and A. Faure-Grimaud (2002), “Should I stay or should I go? Excessive continuation and dynamic agency costs of debt”, European Economic Review 46(9):1623−1644.
Décamps, J.-P., and J.-C. Rochet (1997), “A variational approach for pricing options and corporate bonds”, Economic Theory 9:557−569.
Dekel, E. (1989), “Asset demands without the independence axiom”, Econometrica 57:163−169.
Delbaen, F., and W. Schachermayer (1998), “The fundamental theorem of asset pricing for unbounded stochastic processes”, Mathematische Annalen 312:215−250.
DeMarzo, P., and B. Eaves (1996), “A homotopy, Grassmann manifold, and relocalization for computing equilibria of GEI”, Journal of Mathematical Economics 26:479−497.
Diament, P. (1993), “Semi-empirical smooth fit to the treasury yield curve”, Working Paper (Graduate School of Business, Columbia University).
Dijkstra, T. (1996), “On numeraires and growth-optimum portfolios”, Working Paper (Faculty of Economics, University of Groningen).
Dothan, M. (1978), “On the term structure of interest rates”, Journal of Financial Economics 7:229−264.
Dothan, M. (1990), Prices in Financial Markets (Oxford University Press, New York).
Duffee, G. (1999a), “Estimating the price of default risk”, Review of Financial Studies 12:197−226.
Duffee, G. (1999b), “Forecasting future interest rates: are affine models failures?”, Working Paper (Federal Reserve Board).
Duffie, D. (1987), “Stochastic equilibria with incomplete financial markets”, Journal of Economic Theory 41:405−416; Corrigendum: 1989, 49:384.
Duffie, D. (1988), “An extension of the Black–Scholes model of security valuation”, Journal of Economic Theory 46:194−204.
Duffie, D. (1992), The Nature of Incomplete Markets (Cambridge University Press, Cambridge) pp. 214−262.
Duffie, D. (1998), “Defaultable term structures with fractional recovery of par”, Working Paper (Graduate School of Business, Stanford University, Stanford, CA).
Duffie, D. (2001), Dynamic Asset Pricing Theory, 3rd Edition (Princeton University Press, Princeton, NJ).
Duffie, D. (2002), “A short course on credit risk modeling with affine processes”, Working Paper (Graduate School of Business, Stanford University, Stanford, CA).
Duffie, D., and N. Gârleanu (2001), “Risk and valuation of collateralized debt obligations”, Financial Analysts Journal 57 (1, January–February):41−62. Duffie, D., and C.-F. Huang (1985), “Implementing Arrow–Debreu equilibria by continuous trading of few long-lived securities”, Econometrica 53:1337−1356. Duffie, D., and C.-F. Huang (1986), “Multiperiod security markets with differential information: martingales and resolution times”, Journal of Mathematical Economics 15:283−303. Duffie, D., and M. Huang (1996), “Swap rates and credit quality”, Journal of Finance 51:921−949. Duffie, D., and R. Kan (1996), “A yield-factor model of interest rates”, Mathematical Finance 6:379−406. Duffie, D., and D. Lando (2001), “Term structures of credit spreads with incomplete accounting information”, Econometrica 69:633−664. Duffie, D., and W. Shafer (1985), “Equilibrium in incomplete markets I: a basic model of generic existence”, Journal of Mathematical Economics 14:285−300. Duffie, D., and W. Shafer (1986), “Equilibrium in incomplete markets II: generic existence in stochastic economies”, Journal of Mathematical Economics 15:199−216. Duffie, D., and K. Singleton (1997), “An econometric model of the term structure of interest rate swap yields”, Journal of Finance 52:1287−1321. Duffie, D., and K. Singleton (1999), “Modeling term structures of defaultable bonds”, Review of Financial Studies 12:687−720. Duffie, D., and K. Singleton (2003), Credit Risk: Pricing, Measurement, and Management (Princeton University Press, Princeton, NJ). Duffie, D., and C. Skiadas (1994), “Continuous-time security pricing: a utility gradient approach”, Journal of Mathematical Economics 23:107−132. Duffie, D., and R. Stanton (1988), “Pricing continuously resettled contingent claims”, Journal of Economic Dynamics and Control 16:561−574. Duffie, D., and W. Zame (1989), “The consumption-based capital asset pricing model”, Econometrica 57:1279−1297. Duffie, D., M. Schroder and C.
Skiadas (1996), “Recursive valuation of defaultable securities and the timing of the resolution of uncertainty”, Annals of Applied Probability 6:1075−1090. Duffie, D., M. Schroder and C. Skiadas (1997), “A term structure model with preferences for the timing of resolution of uncertainty”, Economic Theory 9:3−22. Duffie, D., J. Pan and K. Singleton (2000), “Transform analysis and asset pricing for affine jump-diffusions”, Econometrica 68:1343−1376. Duffie, D., L. Pedersen and K. Singleton (2003), “Modeling sovereign yield spreads: a case study of Russian debt”, The Journal of Finance 58:119−160. Duffie, D., D. Filipović and W. Schachermayer (2003), “Affine processes and applications in finance”, Annals of Applied Probability 13, forthcoming. Dumas, B., and P. Maenhout (2002), “A central planning approach to dynamic incomplete markets”, Working Paper (INSEAD, France). Dunn, K., and K. Singleton (1986), “Modeling the term structure of interest rates under nonseparable utility and durability of goods”, Journal of Financial Economics 17:27−55. Dupire, B. (1994), “Pricing with a smile”, Risk (January), pp. 18−20. Dybvig, P., and C.-F. Huang (1988), “Nonnegative wealth, absence of arbitrage, and feasible consumption plans”, Review of Financial Studies 1:377−401. El Karoui, N., and H. Geman (1994), “A probabilistic approach to the valuation of general floating-rate notes with an application to interest rate swaps”, Advances in Futures and Options Research 7:47−63. El Karoui, N., and V. Lacoste (1992), “Multifactor models of the term structure of interest rates”, Working Paper (Laboratoire de Probabilités, Université de Paris VI). El Karoui, N., and M. Quenez (1995), “Dynamic programming and pricing of contingent claims in an incomplete market”, SIAM Journal of Control and Optimization 33:29−66. El Karoui, N., and J.-C. Rochet (1989), “A pricing formula for options on coupon bonds”, Working Paper (October, Laboratoire de Probabilités, Université de Paris VI).
El Karoui, N., C. Lepage, R. Myneni, N. Roseau and R. Viswanathan (1991a), “The pricing and hedging of interest rate claims: applications”, Working Paper (Laboratoire de Probabilités, Université de Paris VI). El Karoui, N., C. Lepage, R. Myneni, N. Roseau and R. Viswanathan (1991b), “The valuation and hedging of contingent claims with gaussian Markov interest rates”, Working Paper (Laboratoire de Probabilités, Université de Paris VI). El Karoui, N., R. Myneni and R. Viswanathan (1992), “Arbitrage pricing and hedging of interest rate claims with state variables I: theory”, Working Paper (Laboratoire de Probabilités, Université de Paris VI). Elliott, R., M. Jeanblanc and M. Yor (2000), “On models of default risk”, Mathematical Finance 10:77−106. Engle, R. (1982), “Autoregressive conditional heteroskedasticity with estimates of the variance of united kingdom inflation”, Econometrica 50:987−1008. Epstein, L. (1988), “Risk aversion and asset prices”, Journal of Monetary Economics 22:179−192. Epstein, L. (1992), “Behavior under risk: recent developments in theory and application”, in: J.-J. Laffont, ed., Advances in Economic Theory (Cambridge University Press, Cambridge) pp. 1−63. Epstein, L., and S. Zin (1989), “Substitution, risk aversion and the temporal behavior of consumption and asset returns I: a theoretical framework”, Econometrica 57:937−969. Epstein, L., and S. Zin (1999), “Substitution, risk aversion and the temporal behavior of consumption and asset returns: an empirical analysis”, Journal of Political Economy 99:263−286. Fan, H., and S. Sundaresan (2000), “Debt valuation, renegotiations and optimal dividend policy”, Review of Financial Studies 13(4):1057−1099. Feller, W. (1951), “Two singular diffusion problems”, Annals of Mathematics 54:173−182. Filipović, D. (1999), “A note on the Nelson–Siegel family”, Mathematical Finance 9:349−359. Filipović, D.
(2001a), “A general characterization of one factor affine term structure models”, Finance and Stochastics 5:389−412. Filipović, D. (2001b), “Time-inhomogeneous affine processes”, Working Paper (Department of Operations Research and Financial Engineering, Princeton University) submitted. Fisher, E., R. Heinkel and J. Zechner (1989), “Dynamic capital structure choice: theory and tests”, Journal of Finance 44:19−40. Fisher, M., D. Nychka and D. Zervos (1994), “Fitting the term structure of interest rates with smoothing splines”, Working Paper (Board of Governors of the Federal Reserve Board, Washington, DC). Fleming, J., and R. Whaley (1994), “The value of wildcard options”, Journal of Finance 1:215−236. Fleming, W., and M. Soner (1993), Controlled Markov Processes and Viscosity Solutions (Springer, New York). Florenzano, M., and P. Gourdel (1994), “T-period economies with incomplete markets”, Economics Letters 44:91−97. Foldes, L. (1978a), “Martingale conditions for optimal saving – discrete time”, Journal of Mathematical Economics 5:83−96. Foldes, L. (1978b), “Optimal saving and risk in continuous time”, Review of Economic Studies 45:39−65. Foldes, L. (1990), “Conditions for optimality in the infinite-horizon portfolio-cum-saving problem with semimartingale investments”, Stochastics and Stochastics Reports 29:133−170. Foldes, L. (1991a), “Certainty equivalence in the continuous-time portfolio-cum-saving model”, in: M.H.A. Davis and R.J. Elliott, eds., Applied Stochastic Analysis (Gordon and Breach, London) pp. 343–387. Foldes, L. (1991b), “Optimal sure portfolio plans”, Mathematical Finance 1:15−55. Foldes, L. (1992), “Existence and uniqueness of an optimum in the infinite-horizon portfolio-cum-saving model with semimartingale investments”, Stochastics and Stochastics Reports 41:241−267. Foldes, L. (2001), “The optimal consumption function in a brownian model of accumulation, Part A:
the consumption function as solution of a boundary value problem”, Journal of Economic Dynamics and Control 25:1951−1971. Föllmer, H., and M. Schweizer (1990), “Hedging of contingent claims under incomplete information”, in: M. Davis and R. Elliott, eds., Applied Stochastic Analysis (Gordon and Breach, London) pp. 389−414. Frachot, A. (1995), “Factor models of domestic and foreign interest rates with stochastic volatilities”, Mathematical Finance 5:167−185. Frachot, A., and J.-P. Lesne (1993), “Econometrics of linear factor models of interest rates”, Working Paper (Banque de France, Paris). Frachot, A., D. Janci and V. Lacoste (1993), “Factor analysis of the term structure: a probabilistic approach”, Working Paper (Banque de France, Paris). Frittelli, M., and P. Lakner (1995), “Arbitrage and free lunch in a general financial market model; the fundamental theorem of asset pricing”, Mathematical Finance 5:237−261. Gabay, D. (1982), “Stochastic processes in models of financial markets”, Working Paper; in: Proceedings of the IFIP Conference on Control of Distributed Systems, Toulouse (Pergamon Press, Toulouse). Geanakoplos, J. (1990), “An introduction to general equilibrium with incomplete asset markets”, Journal of Mathematical Economics 19:1−38. Geanakoplos, J., and A. Mas-Colell (1989), “Real indeterminacy with financial assets”, Journal of Economic Theory 47:22−38. Geanakoplos, J., and W. Shafer (1990), “Solving systems of simultaneous equations in economics”, Journal of Mathematical Economics 19:69−94. Geman, H., N. El Karoui and J. Rochet (1995), “Changes of numéraire, changes of probability measure and option pricing”, Journal of Applied Probability 32:443−458. Geske, R. (1977), “The valuation of corporate liabilities as compound options”, Journal of Financial Economics 7:63−81. Giovannini, A., and P.
Weil (1989), “Risk aversion and intertemporal substitution in the capital asset pricing model”, Working Paper w2824 (National Bureau of Economic Research, Cambridge, MA). Girotto, B., and F. Ortu (1994), “Consumption and portfolio policies with incomplete markets and short-sale constraints in the finite-dimensional case: some remarks”, Mathematical Finance 4:69−73. Girotto, B., and F. Ortu (1996), “Existence of equivalent martingale measures in finite dimensional securities markets”, Journal of Economic Theory 69:262−277. Goldberg, L. (1998), “Volatility of the short rate in the rational lognormal model”, Finance and Stochastics 2:199−211. Goldstein, R. (1997), “Beyond HJM: fitting the current term structure while maintaining a Markovian system”, Working Paper (Fisher College of Business, The Ohio State University). Goldstein, R. (2000), “The term structure of interest rates as a random field”, Review of Financial Studies 13:365−384. Goldys, B., and M. Musiela (1996), “On partial differential equations related to term structure models”, Working Paper (School of Mathematics, The University of New South Wales, Sydney, Australia). Goldys, B., M. Musiela and D. Sondermann (1994), “Lognormality of rates and term structure models”, Working Paper (School of Mathematics, University of New South Wales). Gorman, W. (1953), “Community preference fields”, Econometrica 21:63−80. Gottardi, P., and T. Hens (1996), “The survival assumption and existence of competitive equilibria when asset markets are incomplete”, Journal of Economic Theory 71:313−323. Grannan, E., and G. Swindle (1996), “Minimizing transaction costs of option hedging strategies”, Mathematical Finance 6:341−364. Grant, S., A. Kajii and B. Polak (2000), “Temporal resolution of uncertainty and recursive non-expected utility models”, Econometrica 68:425−434. Grinblatt, M., and N. Jegadeesh (1996), “The relative pricing of eurodollar futures and forward contracts”, Journal of Finance 51:1499−1522. Gul, F., and O.
Lantto (1990), “Betweenness satisfying preferences and dynamic choice”, Journal of Economic Theory 52:162−177.
Guo, D. (1998), “The risk premium of volatility implicit in currency options”, Journal of Business and Economic Statistics 16:498−507. Hahn, F. (1994), “On economies with Arrow securities”, Working Paper (Department of Economics, Cambridge University). Hamza, K., and F. Klebaner (1995), “A stochastic partial differential equation for term structure of interest rates”, Working Paper (Department of Statistics, The University of Melbourne). Hansen, A., and P. Jorgensen (2000), “Fast and accurate approximation of bond prices when short interest rates are log-normal”, Journal of Computational Finance 3(3):27−45. Hansen, L., and R. Jagannathan (1990), “Implications of security market data for models of dynamic economies”, Journal of Political Economy 99:225−262. Harrison, M., and D. Kreps (1979), “Martingales and arbitrage in multiperiod securities markets”, Journal of Economic Theory 20:381−408. Harrison, M., and S. Pliska (1981), “Martingales and stochastic integrals in the theory of continuous trading”, Stochastic Processes and Their Applications 11:215−260. Hart, O. (1975), “On the optimality of equilibrium when the market structure is incomplete”, Journal of Economic Theory 11:418−430. He, H., and H. Pagès (1993), “Labor income, borrowing constraints, and equilibrium asset prices”, Economic Theory 3:663−696. Heath, D. (1998), “Some new term structure models”, Working Paper (Department of Mathematical Sciences, Carnegie Mellon University). Heath, D., R. Jarrow and A. Morton (1992), “Bond pricing and the term structure of interest rates: a new methodology for contingent claims valuation”, Econometrica 60:77−106. Henrotte, P. (1991), “Transactions costs and duplication strategies”, Working Paper (Graduate School of Business, Stanford University). Hens, T. (1991), “Structure of general equilibrium models with incomplete markets”, Working Paper (Department of Economics, University of Bonn). Heston, S.
(1988), “Testing continuous time models of the term structure of interest rates”, Working Paper (Graduate School of Industrial Administration, Carnegie-Mellon University). Heston, S. (1993), “A closed-form solution for options with stochastic volatility with applications to bond and currency options”, Review of Financial Studies 6:327−344. Hilberink, B., and L.C.G. Rogers (2002), “Optimal capital structure and endogenous default”, Finance and Stochastics 6(2):237−263. Hindy, A., and M. Huang (1993), “Asset pricing with linear collateral constraints”, Working Paper (Graduate School of Business, Stanford University). Hirsch, M., M. Magill and A. Mas-Colell (1990), “A geometric approach to a class of equilibrium existence theorems”, Journal of Mathematical Economics 19:95−106. Ho, T., and S. Lee (1986), “Term structure movements and pricing interest rate contingent claims”, Journal of Finance 41:1011−1029. Hogan, M., and K. Weintraub (1993), “The lognormal interest rate model and eurodollar futures”, Working Paper (Citibank, New York). Huang, C.-F. (1985a), “Information structures and equilibrium asset prices”, Journal of Economic Theory 31:33−71. Huang, C.-F. (1985b), “Information structures and viable price systems”, Journal of Mathematical Economics 14:215−240. Huang, C.-F., and H. Pagès (1992), “Optimal consumption and portfolio policies with an infinite horizon: existence and convergence”, Annals of Applied Probability 2:36−64. Hull, J. (2000), Options, Futures, and Other Derivative Securities, 4th Edition (Prentice-Hall, Englewood Cliffs, NJ). Hull, J., and A. White (1990), “Pricing interest rate derivative securities”, Review of Financial Studies 3:573−592. Hull, J., and A. White (1992), “The price of default”, Risk 5:101−103.
Hull, J., and A. White (1993), “One-factor interest-rate models and the valuation of interest-rate derivative securities”, Journal of Financial and Quantitative Analysis 28:235−254. Hull, J., and A. White (1995), “The impact of default risk on the prices of options and other derivative securities”, Journal of Banking and Finance 19:299−322. Husseini, S., J.-M. Lasry and M. Magill (1990), “Existence of equilibrium with incomplete markets”, Journal of Mathematical Economics 19:39−68. Ingersoll, J. (1977), “An examination of corporate call policies on convertible securities”, Journal of Finance 32:463−478. Jackwerth, J., and M. Rubinstein (1996), “Recovering probability distributions from options prices”, Journal of Finance 51:1611−1631. Jacod, J., and P. Protter (2000), Probability Essentials (Springer, New York). Jacod, J., and A. Shiryaev (1998), “Local martingales and the fundamental asset pricing theorems in the discrete-time case”, Finance and Stochastics 2:259−274. Jakobsen, S. (1992), “Prepayment and the valuation of Danish mortgage-backed bonds”, Ph.D. Dissertation (The Aarhus School of Business, Denmark). Jamshidian, F. (1989a), “Closed-form solution for American options on coupon bonds in the general gaussian interest rate model”, Working Paper (Financial Strategies Group, Merrill Lynch Capital Markets, New York). Jamshidian, F. (1989b), “An exact bond option formula”, Journal of Finance 44:205−209. Jamshidian, F. (1989c), “The multifactor gaussian interest rate model and implementation”, Working Paper (Financial Strategies Group, Merrill Lynch Capital Markets, New York). Jamshidian, F. (1991a), “Bond and option evaluation in the gaussian interest rate model”, Research in Finance 9:131−170. Jamshidian, F. (1991b), “Forward induction and construction of yield curve diffusion models”, Journal of Fixed Income (June), pp. 62−74. Jamshidian, F. (1993a), “Hedging and evaluating diff swaps”, Working Paper (Fuji International Finance PLC, London). Jamshidian, F. 
(1993b), “Options and futures evaluation with deterministic volatilities”, Mathematical Finance 3:149−159. Jamshidian, F. (1994), “Hedging quantos, differential swaps and ratios”, Applied Mathematical Finance 1:1−20. Jamshidian, F. (1996), “Bond, futures and option evaluation in the quadratic interest rate model”, Applied Mathematical Finance 3:93−115. Jamshidian, F. (1997a), “Libor and swap market models and measures”, Finance and Stochastics 1: 293−330. Jamshidian, F. (1997b), “Pricing and hedging European swaptions with deterministic (lognormal) forward swap volatility”, Finance and Stochastics 1:293−330. Jamshidian, F. (2001), “Libor market model with semimartingales”, in: E. Jouini, J. Cvitanic and M. Musiela, eds., Option Pricing, Interest Rates and Risk Management. Handbooks in Mathematical Finance (Cambridge University Press) Part II, Ch. 10. Jarrow, R., and S. Turnbull (1994), “Delta, gamma and bucket hedging of interest rate derivatives”, Applied Mathematical Finance 1:21−48. Jarrow, R., and S. Turnbull (1995), “Pricing derivatives on financial securities subject to credit risk”, Journal of Finance 50:53−85. Jarrow, R., and F. Yu (2001), “Counterparty risk and the pricing of defaultable securities”, Journal of Finance 56(5):1765−1799. Jarrow, R., D. Lando and F. Yu (2003), “Default risk and diversification: theory and application”, Working Paper (Cornell University). Jaschke, S. (1996), “Arbitrage bounds for the term structure of interest rates”, Finance and Stochastics 2:29−40.
Jeanblanc, M., and M. Rutkowski (2000), “Modelling of default risk: an overview”, in: Modern Mathematical Finance: Theory and Practice (Higher Education Press, Beijing) pp. 171–269. Jeffrey, A. (1995), “Single factor Heath–Jarrow–Morton term structure models based on Markov spot interest rate”, Journal of Financial and Quantitative Analysis 30:619−643. Johnson, B. (1994), “Dynamic asset pricing theory: the search for implementable results”, Working Paper (Engineering-Economic Systems Department, Stanford University). Jong, F.D., and P. Santa-Clara (1999), “The dynamics of the forward interest rate curve: a formulation with state variables”, Journal of Financial and Quantitative Analysis 34:131−157. Jouini, E., and H. Kallal (1993), “Efficient trading strategies in the presence of market frictions”, Working Paper (CREST-ENSAE, Paris). Kabanov, Y. (1997), “On the FTAP of Kreps–Delbaen–Schachermayer”, in: Statistics and Control of Stochastic Processes (World Scientific, River Edge, NJ) pp. 191–203. Moscow 1995/1996. Kabanov, Y., and D. Kramkov (1995), “Large financial markets: asymptotic arbitrage and contiguity”, Theory of Probability and its Applications 39:182−187. Kabanov, Y., and C. Stricker (2001), “The Harrison–Pliska arbitrage pricing theorem under transactions costs”, Journal of Mathematical Economics 35:185−196. Kan, R. (1993), “Gradient of the representative agent utility when agents have stochastic recursive preferences”, Working Paper (Graduate School of Business, Stanford University). Kan, R. (1995), “Structure of pareto optima when agents have stochastic recursive preferences”, Journal of Economic Theory 66:626−631. Karatzas, I. (1988), “On the pricing of American options”, Applied Mathematics and Optimization 17:37−60. Karatzas, I. (1993), “IMA tutorial lectures 1–3: Minneapolis”, Working Paper (Department of Statistics, Columbia University). Karatzas, I., and S.-G. 
Kou (1998), “Hedging American contingent claims with constrained portfolios”, Finance and Stochastics 2:215−258. Karatzas, I., and S. Shreve (1988), Brownian Motion and Stochastic Calculus (Springer, New York). Karatzas, I., and S. Shreve (1998), Methods of Mathematical Finance (Springer, New York). Karatzas, I., J. Lehoczky and S. Shreve (1987), “Optimal portfolio and consumption decisions for a ‘small investor’ on a finite horizon”, SIAM Journal of Control and Optimization 25:1157−1186. Kawazu, K., and S. Watanabe (1971), “Branching processes with immigration and related limit theorems”, Theory of Probability and its Applications 16:36−54. Kennedy, D. (1994), “The term structure of interest rates as a gaussian random field”, Mathematical Finance 4:247−258. Konno, H., and T. Takase (1995), “A constrained least square approach to the estimation of the term structure of interest rates”, Financial Engineering and the Japanese Markets 2:169−179. Konno, H., and T. Takase (1996), “On the de-facto convex structure of a least square problem for estimating the term structure of interest rates”, Financial Engineering and the Japanese Market 3:77−85. Koopmans, T. (1960), “Stationary utility and impatience”, Econometrica 28:287−309. Kramkov, D., and W. Schachermayer (1999), “The asymptotic elasticity of utility functions and optimal investment in incomplete markets”, Annals of Applied Probability 9(3):904−950. Kraus, A., and R. Litzenberger (1975), “Market equilibrium in a multiperiod state preference model with logarithmic utility”, Journal of Finance 30:1213−1227. Kreps, D. (1979), “Three essays on capital markets”, Working Paper (Institute for Mathematical Studies in the Social Sciences, Stanford University). Kreps, D. (1981), “Arbitrage and equilibrium in economies with infinitely many commodities”, Journal of Mathematical Economics 8:15−35. Kreps, D., and E. Porteus (1978), “Temporal resolution of uncertainty and dynamic choice”, Econometrica 46:185−200.
Kusuoka, S. (1992), “Consistent price system when transaction costs exist”, Working Paper (Research Institute for Mathematical Sciences, Kyoto University). Kusuoka, S. (1993), “A remark on arbitrage and martingale measure”, Publ. RIMS, Kyoto University 29:833−840. Kusuoka, S. (1995), “Limit theorem on option replication cost with transaction costs”, Annals of Applied Probability 11:1283−1301. Kusuoka, S. (1999), “A remark on default risk models”, Advances in Mathematical Economics 1:69−82. Kusuoka, S. (2000), “Term structure and SPDE”, Advances in Mathematical Economics 2:67−85. Kydland, F., and E. Prescott (1991), “Indeterminacy in incomplete market economies”, Economic Theory 1:45−62. Lakner, P. (1993), “Equivalent local martingale measures and free lunch in a stochastic model of finance with continuous trading”, Working Paper (Statistics and Operation Research Department, New York University). Lakner, P., and E. Slud (1991), “Optimal consumption by a bond investor: the case of random interest rate adapted to a point process”, SIAM Journal of Control and Optimization 29:638−655. Lando, D. (1994), “Three essays on contingent claims pricing”, Working Paper (Ph.D. Dissertation, Statistics Center, Cornell University). Lando, D. (1998), “On Cox processes and credit risky securities”, Review of Derivatives Research 2:99−120. Lang, L., R. Litzenberger and A. Liu (1998), “Determinants of interest rate swap spreads”, Journal of Banking and Finance 22:1507−1532. Langetieg, T. (1980), “A multivariate model of the term structure”, Journal of Finance 35:71−97. Leland, H. (1985), “Option pricing and replication with transactions costs”, Journal of Finance 40: 1283−1301. Leland, H. (1994), “Corporate debt value, bond covenants, and optimal capital structure”, Journal of Finance 49:1213−1252. Leland, H. (1998), “Agency costs, risk management, and capital structure”, Journal of Finance 53: 1213−1242. Leland, H., and K. 
Toft (1996), “Optimal capital structure, endogenous bankruptcy, and the term structure of credit spreads”, Journal of Finance 51:987−1019. LeRoy, S. (1973), “Risk aversion and the martingale property of asset prices”, International Economic Review 14:436−446. Levental, S., and A. Skorohod (1995), “A necessary and sufficient condition for absence of arbitrage with tame portfolios”, Annals of Applied Probability 5:906−925. Litzenberger, R. (1992), “Swaps: plain and fanciful”, Journal of Finance 47:831−850. Liu, J., J. Pan and L. Pedersen (1999), “Density-based inference in affine jump-diffusions”, Working Paper (Graduate School of Business, Stanford University). Long, J. (1990), “The numeraire portfolio”, Journal of Financial Economics 26:29−69. Longstaff, F. (1990), “The valuation of options on yields”, Journal of Financial Economics 26:97−121. Longstaff, F., and E. Schwartz (1992), “Interest rate volatility and the term structure: a two-factor general equilibrium model”, Journal of Finance 47:1259−1282. Longstaff, F., and E.S. Schwartz (1993), “Implementing of the Longstaff–Schwartz interest rate model”, Journal of Fixed Income 3:7−14. Longstaff, F., and E.S. Schwartz (1995), “A simple approach to valuing risky fixed and floating rate debt”, Journal of Finance 50:789−819. Longstaff, F., and E.S. Schwartz (2001), “Valuing American options by simulation: a simple least-squares approach”, Review of Financial Studies 14(1):113−147. Lucas, R. (1978), “Asset prices in an exchange economy”, Econometrica 46:1429−1445. Machina, M. (1982), “‘Expected utility’ analysis without the independence axiom”, Econometrica 50: 277−323.
Madan, D., and H. Unal (1998), “Pricing the risks of default”, Review of Derivatives Research 2: 121−160. Magill, M., and M. Quinzii (1996), Theory of Incomplete Markets (MIT Press, Cambridge, MA). Magill, M., and W. Shafer (1990), “Characterization of generically complete real asset structures”, Journal of Mathematical Economics 19:167−194. Magill, M., and W. Shafer (1991), “Incomplete markets”, in: W. Hildenbrand and H. Sonnenschein, eds., Handbook of Mathematical Economics, Vol. 4 (Elsevier, Amsterdam) pp. 1523−1614. Mas-Colell, A. (1991), “Indeterminacy in incomplete market economies”, Economic Theory 1:45−62. Mella-Barral, P. (1999), “Dynamics of default and debt reorganization”, Review of Financial Studies 12:535−578. Mella-Barral, P., and W. Perraudin (1997), “Strategic debt service”, Journal of Finance 52:531−556. Merton, R. (1971), “Optimum consumption and portfolio rules in a continuous time model”, Journal of Economic Theory 3:373−413; Erratum: 1973, 6:213−214. Merton, R. (1973), “The theory of rational option pricing”, Bell Journal of Economics and Management Science 4:141−183. Merton, R. (1974), “On the pricing of corporate debt: the risk structure of interest rates”, Journal of Finance 29:449−470. Merton, R. (1977), “On the pricing of contingent claims and the Modigliani–Miller theorem”, Journal of Financial Economics 5:241−250. Miltersen, K. (1994), “An arbitrage theory of the term structure of interest rates”, Annals of Applied Probability 4:953−967. Miltersen, K., K. Sandmann and D. Sondermann (1997), “Closed form solutions for term structure derivatives with log-normal interest rates”, Journal of Finance 52:409−430. Modigliani, F., and M. Miller (1958), “The cost of capital, corporation finance, and the theory of investment”, American Economic Review 48:261−297. Musiela, M. (1994a), “Nominal annual rates and lognormal volatility structure”, Working Paper (Department of Mathematics, University of New South Wales, Sydney). Musiela, M. 
(1994b), “Stochastic PDEs and term structure models”, Working Paper (Department of Mathematics, University of New South Wales, Sydney). Musiela, M., and D. Sondermann (1994), “Different dynamical specifications of the term structure of interest rates and their implications”, Working Paper (Department of Mathematics, University of New South Wales, Sydney). Nelson, D. (1990), “ARCH models as diffusion approximations”, Journal of Econometrics 45:7−38. Nielsen, S., and E. Ronn (1995), “The valuation of default risk in corporate bonds and interest rate swaps”, Working Paper (Department of Management Science and Information Systems, University of Texas at Austin). Nunes, J., L. Clewlow and S. Hodges (1999), “Interest rate derivatives in a Duffie and Kan model with stochastic volatility: an Arrow–Debreu pricing approach”, Review of Derivatives Research 3:5−66. Nyborg, K. (1996), “The use and pricing of convertible bonds”, Applied Mathematical Finance 3:167−190. Pagès, H. (1987), “Optimal consumption and portfolio policies when markets are incomplete”, Working Paper (Department of Economics, Massachusetts Institute of Technology). Pagès, H. (2000), “Estimating brazilian sovereign risk from brady bond prices”, Working Paper (Bank of France). Pan, J. (2002), “The jump-risk premia implicit in options: evidence from an integrated time-series study”, Journal of Financial Economics 63:3−50. Pan, W.-H. (1993), “Constrained efficient allocations in incomplete markets: characterization and implementation”, Working Paper (Department of Economics, University of Rochester). Pan, W.-H. (1995), “A second welfare theorem for constrained efficient allocations in incomplete markets”, Journal of Mathematical Economics 24:577−599.
Pang, K. (1996), “Multi-factor gaussian HJM approximation to Kennedy and calibration to caps and swaptions prices”, Working Paper (Financial Options Research Center, Warwick Business School, University of Warwick). Pang, K., and S. Hodges (1995), “Non-negative affine yield models of the term structure”, Working Paper (Financial Options Research Center, Warwick Business School, University of Warwick). Pearson, N., and T.-S. Sun (1994), “An empirical examination of the Cox, Ingersoll, and Ross model of the term structure of interest rates using the method of maximum likelihood”, Journal of Finance 54:929−959. Pennacchi, G. (1991), “Identifying the dynamics of real interest rates and inflation: evidence using survey data”, Review of Financial Studies 4:53−86. Piazzesi, M. (1997), “An affine model of the term structure of interest rates with macroeconomic factors”, Working Paper (Stanford University). Piazzesi, M. (1999), “A linear-quadratic jump-diffusion model with scheduled and unscheduled announcements”, Working Paper (Stanford University). Piazzesi, M. (2002), “Affine term structure models”, in: Y. Aït-Sahalia and L.P. Hansen, eds., Handbook of Financial Economics (Elsevier, Amsterdam) forthcoming. Pliska, S. (1986), “A stochastic calculus model of continuous trading: optimal portfolios”, Mathematics of Operations Research 11:371−382. Plott, C. (1986), “Rational choice in experimental markets”, Journal of Business 59:S301−S327. Poteshman, A. (1998), “Estimating a general stochastic variance model from options prices”, Working Paper (Graduate School of Business, University of Chicago, Chicago, IL). Prisman, E. (1985), “Valuation of risky assets in arbitrage free economies with frictions”, Working Paper (Department of Finance, University of Arizona, Tucson, AZ). Protter, P. (1990), Stochastic Integration and Differential Equations (Springer, New York). Protter, P.
(2001), “A partial introduction to financial asset pricing theory”, Stochastic Processes and their Applications 91(2):169−203. Pye, G. (1974), “Gauging the default premium”, Financial Analysts Journal (January–February), pp. 49−52. Radner, R. (1967), “Equilibre des march´es a terme et au comptant en cas d’incertitude”, Cahiers d’Econom´etrie 4:35−52. Radner, R. (1972), “Existence of equilibrium of plans, prices, and price expectations in a sequence of markets”, Econometrica 40:289−303. Renault, E., and N. Touzi (1992), “Stochastic volatility models: statistical inference from implied volatilities”, Working Paper (GREMAQ IDEI, Toulouse, and CREST, Paris, France). Ritchken, P., and L. Sankarasubramaniam (1992), “Valuing claims when interest rates have stochastic volatility”, Working Paper (Department of Finance, University of Southern California). Ritchken, P., and R. Trevor (1993), “On finite state Markovian representations of the term structure”, Working Paper (Department of Finance, University of Southern California). Rogers, C. (1994), “Equivalent martingale measures and no-arbitrage”, Stochastics and Stochastic Reports 51:1−9. Rogers, C. (1995), “Which model for term-structure of interest rates should one use?”, Mathematical Finance, IMA, v65 (Springer, New York) pp. 93–116. Ross, S. (1987), “Arbitrage and martingales with taxation”, Journal of Political Economy 95:371−393. Ross, S. (1989), “Information and volatility: the non-arbitrage martingale approach to timing and resolution irrelevancy”, Journal of Finance 64:1−17. Rubinstein, M. (1976), “The valuation of uncertain income streams and the pricing of options”, Bell Journal of Economics 7:407−425. Rubinstein, M. (1995), “As simple as one, two, three”, Risk 8(January):44−47. Rutkowski, M. (1996), “Valuation and hedging of contingent claims in the HJM model with deterministic volatilities”, Applied Mathematical Finance 3:237−267.
Rutkowski, M. (1998), “Dynamics of spot, forward, and futures libor rates”, International Journal of Theoretical and Applied Finance 1:425−445. Ryder, H., and G. Heal (1973), “Optimal growth with intertemporally dependent preferences”, Review of Economic Studies 40:1−31. Sandmann, K., and D. Sondermann (1997), “On the stability of lognormal interest rate models”, Mathematical Finance 7:119−125. Santa-Clara, P., and D. Sornette (2001), “The dynamics of the forward interest rate curve with stochastic string shocks”, Review of Financial Studies 14:149−185. Sato, K. (1999), L´evy Processes and Infinitely Divisible Distributions (Cambridge University Press, Cambridge). Translated from the 1990 Japanese original, revised by the author. Scaillet, O. (1996), “Compound and exchange options in the affine term structure model”, Applied Mathematical Finance 3:75−92. Schachermayer, W. (1992), “A Hilbert-space proof of the fundamental theorem of asset pricing”, Insurance Mathematics and Economics 11:249−257. Schachermayer, W. (1994), “Martingale measures for discrete-time processes with infinite horizon”, Mathematical Finance 4:25−56. Schachermayer, W. (2001), “The fundamental theorem of asset pricing under proportional transaction costs in finite discrete time”, Working Paper (Institut f¨ur Statistik der Universit¨at Wien). Schachermayer, W. (2002), “No arbitrage: on the work of David Kreps”, Positivity 6:359−368. Sch¨onbucher, P. (1998), “Term stucture modelling of defaultable bonds”, Review of Derivatives Research 2:161−192. Schroder, M., and C. Skiadas (1999), “Optimal consumption and portfolio selection with stochastic differential utility”, Journal of Economic Theory 89:68−126. Schroder, M., and C. Skiadas (2002), “An isomorphism between asset pricing models with and without linear habit formation”, Review of Financial Studies 15:1189−1221. Schweizer, M. (1992), “Martingale densities for general asset prices”, Journal of Mathematical Economics 21:363−378. Scott, L. 
(1997), “The valuation of interest rate derivatives in a multi-factor Cox–Ingersoll–Ross model that matches the initial term structure”, Working Paper (Morgan Stanley, New York). Selby, M., and C. Strickland (1993), “Computing the Fong and Vasicek pure discount bond price formula”, Working Paper (FORC Preprint 93/42, October 1993, University of Warwick). Selden, L. (1978), “A new representation of preference over ‘certain × uncertain’ consumption pairs: the ‘ordinal certainty equivalent’ hypothesis”, Econometrica 46:1045−1060. Sharpe, W. (1964), “Capital asset prices: a theory of market equilibrium under conditions of risk”, Journal of Finance 19:425−442. Singleton, K. (2001), “Estimation of affine asset pricing models using the empirical characteristic function”, Journal of Econometrics 102:111−141. Singleton, K., and L. Umantsev (2003), “Pricing coupon-bond and swaptions in affine term structure models”, Mathematical Finance, forthcoming. Skiadas, C. (1997), “Conditioning and aggregation of preferences”, Econometrica 65:347−367. Skiadas, C. (1998), “Recursive utility and preferences for information”, Economic Theory 12:293−312. Soner, M., S. Shreve and J. Cvitani´c (1994), “There is no nontrivial hedging portfolio for option pricing with transaction costs”, Annals of Applied Probability 5:327−355. Sornette, D. (1998), “String formulation of the dynamics of the forward interest rate curve”, European Physical Journal B 3:125−137. Stanton, R. (1995), “Rational prepayment and the valuation of mortgage-backed securities”, Review of Financial Studies 8:677−708. Stanton, R., and N. Wallace (1995), “Arm wrestling: valuing adjustable rate mortgages indexed to the eleventh district cost of funds”, Real Estate Economics 23:311−345. Stanton, R., and N. Wallace (1998), “Mortgage choice: what’s the point?”, Real Estate Economics 26:173−205.
Stapleton, R., and M. Subrahmanyam (1978), “A multiperiod equilibrium asset pricing model”, Econometrica 46:1077−1093. Stein, E., and J. Stein (1991), “Stock price distributions with stochastic volatility: an analytic approach”, Review of Financial Studies 4:725−752. Stricker, C. (1990), “Arbitrage et lois de martingale”, Annales de l’Institut Henri Poincar´e 26:451−460. Sundaresan, S. (1989), “Intertemporally dependent preferences in the theories of consumption, portfolio choice and equilibrium asset pricing”, Review of Financial Studies 2:73−89. Sundaresan, S. (1997), Fixed Income Markets and Their Derivatives (South-Western, Cincinnati, OH). Svensson, L., and M. Dahlquist (1996), “Estimating the term structure of interest rates for monetary policy analysis”, Scandinavian Journal of Economics 98:163−183. Turnbull, S.M. (1993), “Pricing and hedging diff swaps”, Journal of Financial Engineering (December): 297−334. Turnbull, S.M. (1995), “Interest rate digital options and range notes”, Journal of Derivatives 3:92−101. Uhrig-Homburg, M. (1998), “Endogenous bankruptcy when issuance is costly”, Working Paper 98-13 (Lehrstuhl f¨ur Finanzierung, University of Mannheim). Van Steenkiste, R., and S. Foresi (1999), “Arrow–Debreu prices for affine models”, Working Paper (Salomon Smith Barney, Inc., Goldman Sachs Asset Management). Vargiolu, T. (1999), “Invariant measures for the Musiela equation with deterministic diffusion term”, Finance and Stochastics 3:483−492. Vasicek, O. (1977), “An equilibrium characterization of the term structure”, Journal of Financial Economics 5:177−188. Werner, J. (1985), “Equilibrium in economies with incomplete financial markets”, Journal of Economic Theory 36:110−119. Whalley, A., and P. Wilmott (1997), “An asymptotic analysis of an optimal hedging model for options with transaction costs”, Mathematical Finance 7:307−324. Xu, G.-L., and S. Shreve (1992), “A duality method for optimal consumption and investment under short-selling prohibition. 
I. General market coefficients”, Annals of Applied Probability 2:87−112. Zhou, C.-S. (2000), “A jump-diffusion approach to modeling credit risk and valuing defaultable securities”, Working Paper (Federal Reserve Board, Washington, DC). Zhou, Y.-Q. (1997), “The global structure of equilibrium manifold in incomplete markets”, Journal of Mathematical Economics 27:91−111.
Chapter 12
TESTS OF MULTIFACTOR PRICING MODELS, VOLATILITY BOUNDS AND PORTFOLIO PERFORMANCE
WAYNE E. FERSON °
Carroll School of Management, Boston College
Contents
Abstract
Keywords
1. Introduction
2. Multifactor asset-pricing models: Review and integration
2.1. The stochastic discount factor representation
2.2. Expected risk premiums
2.3. Return predictability
2.4. Consumption-based asset-pricing models
2.5. Multi-beta pricing models
2.5.1. Relation to the stochastic discount factor
2.5.2. Relation to mean-variance efficiency
2.5.3. A large-markets interpretation
2.6. Mean-variance efficiency with conditioning information
2.6.1. Conditional versus unconditional efficiency
2.6.2. Implications for tests
2.7. Choosing the factors
3. Modern variance bounds
3.1. The Hansen–Jagannathan bounds
3.2. Variance bounds with conditioning information
3.2.1. Efficient portfolio bounds
3.2.2. Optimal bounds
3.2.3. Discussion
3.3. The Hansen–Jagannathan distance
4. Methodology and tests of multifactor asset-pricing models
4.1. The Generalized Method of Moments approach
° The author acknowledges financial support from the Collins Chair in Finance at Boston College and the Pigott-PACCAR professorship at the University of Washington. He is also grateful to Geert Bekaert, John Cochrane, George Constantinides and Ludan Liu for helpful comments and suggestions.
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved
4.2. Cross-sectional regression methods
4.2.1. The Fama–MacBeth approach
4.2.2. Interpreting the estimates
4.2.3. A caveat
4.2.4. Errors-in-betas
4.3. Multivariate regression and beta-pricing models
4.3.1. Comparing the SDF and beta-pricing approaches
5. Conditional performance evaluation
5.0.1. A numerical example
5.1. Stochastic discount factor formulation
5.1.1. Invariance to the number of funds
5.1.2. Additional issues
5.2. Beta-pricing formulation
5.3. Using portfolio weights
5.3.1. Conditional performance attribution
5.3.2. Interim trading bias
5.4. Conditional market-timing models
5.5. Empirical evidence on conditional performance
6. Conclusions
References
Abstract

Three concepts: stochastic discount factors, multi-beta pricing and mean-variance efficiency, are at the core of modern empirical asset pricing. This chapter reviews these paradigms and the relations among them, concentrating on conditional asset-pricing models where lagged variables serve as instruments for publicly available information. The different paradigms are associated with different empirical methods. We review the variance bounds of Hansen and Jagannathan (1991), concentrating on extensions for conditioning information. Hansen’s (1982) Generalized Method of Moments (GMM) is briefly reviewed as an organizing principle. Then, cross-sectional regression approaches as developed by Fama and MacBeth (1973) are reviewed and used to interpret empirical factors, such as those advocated by Fama and French (1993, 1996). Finally, we review the multivariate regression approach, popularized in the finance literature by Gibbons (1982) and others. A regression approach, with a beta pricing formulation, and a GMM approach with a stochastic discount factor formulation, may be considered competing paradigms for empirical work in asset pricing. This discussion clarifies the relations between the various approaches. Finally, we bring the models and methods together, with a review of the recent conditional performance evaluation literature, concentrating on mutual funds and pension funds.
Keywords

stochastic discount factor, performance evaluation, asset pricing, portfolio efficiency, volatility bounds, predicting returns

JEL classification: A23, C1, C31, C51, D91, E20, G10, G11, G12, G14, G23
1. Introduction

The asset-pricing models of modern finance describe the prices or expected rates of return of financial assets, which are claims traded in financial markets. Examples of financial assets are common stocks, bonds, options, futures and other “derivatives”, so named because they derive their values from other, underlying assets.

Asset-pricing models are based on two central concepts. The first is the no-arbitrage principle, which states that market forces tend to align prices so as to eliminate arbitrage opportunities. An arbitrage opportunity arises when assets can be combined in a portfolio with zero cost, no chance of a loss and positive probability of a gain. In Chapter 10 of this volume, Dybvig and Ross describe this theory. The second central concept in asset pricing is financial market equilibrium. Investors’ desired holdings of financial assets derive from an optimization problem. In equilibrium the first-order conditions of the optimization problem must be satisfied, and asset-pricing models follow from these conditions. When the agent considers the consequences of the investment decision for more than a single period in the future, intertemporal asset-pricing models result. These models are reviewed by Campbell in Chapter 13 of this volume, and by Duffie in Chapter 11.

The present chapter reviews multi-factor asset-pricing models from an empiricist’s perspective. Multi-factor models can be motivated by either the no-arbitrage principle or by an equilibrium model. Their distinguishing feature is that expected asset returns are determined by a linear combination of their covariances with variables representing the risk factors.

This chapter has two main objectives. The first is to integrate the various empirical models and their tests in a self-contained discussion. The second is to review the application to the problem of measuring investment performance.
This chapter concentrates heavily on the role of conditioning information, in the form of lagged variables that serve as instruments for publicly available information. I think that developments in this area, conditional asset pricing, represent some of the most significant advances in empirical asset-pricing research in recent years. The models described in this chapter are set in the classical world of perfectly efficient financial markets, and perfectly rational economic agents. Of course, a great deal of research is devoted to understanding asset prices under market imperfections like information and transactions costs, as several chapters in this volume amply illustrate. For asset pricing emphasizing human imperfections (behavioral finance) see Chapter 18 of this volume by Barberis and Thaler. The perfect-markets models reviewed here represent a baseline, and a starting point for understanding these more complex issues. Work in empirical asset pricing over the last few years has provided a markedly improved understanding of the relations among the various asset-pricing models. Bits and pieces of this are scattered across a number of published papers, and some is “common” knowledge, shared by aficionados. This chapter provides an integrative discussion, refining the earlier review in Ferson (1995) to reflect what I hope is an improved understanding.
Much of our understanding of how asset-pricing models’ empirical predictions are related flows from representing the models as stochastic discount factors. Section 2 presents the stochastic discount factor approach, briefly illustrates a few examples of stochastic discount factors, and then relates the representation to beta pricing and to mean variance efficiency. These three concepts: stochastic discount factors, beta pricing and mean variance efficiency, are at the core of modern empirical asset pricing. We show the relation among these three concepts, and a “large-markets” interpretation of these relations. The discussion then proceeds to refinements of these issues in the presence of conditioning information. Section 2 ends with a brief discussion of how the risk factors have been identified in the empirical literature, and what the empirical evidence has to say about the factors. Section 3 begins with a fundamental empirical application of the stochastic discount factor approach – the variance bounds originally developed by Hansen and Jagannathan (1991). Unlike the case where a model identifies a particular stochastic discount factor, the question in the Hansen–Jagannathan bounds is: Given a set of asset returns, and some conditioning information, what can be said about the set of stochastic discount factors that could properly “price” the assets? By now, a number of reviews of the original Hansen–Jagannathan bounds are available in the literature. The discussion here is brief, quickly moving on to focus on less well-known refinements of the bounds to incorporate conditioning information. Section 4 discusses empirical methods, starting with Hansen’s (1982) Generalized Method of Moments (GMM). This important approach has also been the subject of several review articles and textbook chapters. We briefly review the use of the GMM to estimate stochastic discount factor models. 
This section is included only to make the latter parts of the chapter accessible to a reader who is not already familiar with the GMM. Section 4 then discusses two special cases that remain important in empirical asset pricing. The first is the cross-sectional regression approach, as developed by Fama and MacBeth (1973), and the second is the multivariate regression approach, popularized in the finance literature following Gibbons (1982). Once the mainstay of empirical work on asset pricing, cross-sectional regression continues to be used and useful. Our main focus is on the economic interpretation of the estimates. The discussion attempts to shed light on recent studies that employ the empirical factors advocated by Fama and French (1993, 1996), or generalizations of that approach. The multivariate regression approach to testing portfolio efficiency can be motivated by its immunity to the errors-in-variables problem that plagues the two step, cross-sectional regression approach. The multivariate approach is also elegant, and provides a nice intuition for the statistical tests. A regression approach, with a beta pricing formulation, and a GMM approach with a stochastic discount factor formulation, may be considered as competing paradigms for empirical work in asset pricing. However, under the same distributional assumptions, and when the same moments are estimated, the two approaches are essentially equivalent. The present discussion attempts to clarify these points, and suggests how to think about the choice of empirical method.
Section 5 brings the models and methods together, in a review of the relatively recent literature on conditional performance evaluation. The problem of measuring the performance of managed portfolios has been the subject of research for more than 30 years. Traditional measures use unconditional expected returns, estimated by sample averages, as the baseline. However, if expected returns and risks vary over time, this may confuse common time-variation in fund risk and market risk premiums with average performance. In this way, traditional methods can ascribe abnormal performance to an investment strategy that trades mechanically, based only on public information. Conditional performance evaluation attempts to control these biases, while delivering potentially more powerful performance measures, by using lagged instruments to control for time-varying expectations. Section 5 reviews the main models for conditional performance evaluation, and includes a summary of the empirical evidence. Finally, Section 6 of this chapter offers concluding remarks.
2. Multifactor asset-pricing models: Review and integration

2.1. The stochastic discount factor representation

Virtually all asset-pricing models are special cases of the fundamental equation:

$$P_t = E_t\{m_{t+1}(P_{t+1} + D_{t+1})\}, \tag{1}$$
where $P_t$ is the price of the asset at time $t$ and $D_{t+1}$ is the amount of any dividends, interest or other payments received at time $t+1$. The market-wide random variable $m_{t+1}$ is the stochastic discount factor (SDF).¹ The prices are obtained by “discounting” the payoffs using the SDF, or multiplying by $m_{t+1}$, so that the expected “present value” of the payoff is equal to the price. The notation $E_t\{\cdot\}$ denotes the conditional expectation, given a market-wide information set, $\Omega_t$. Since empiricists don’t get to see $\Omega_t$, it will be convenient to consider expectations conditioned on an observable subset of instruments, $Z_t$. These expectations are denoted as $E(\cdot \mid Z_t)$. When $Z_t$ is the null-information set, we have the unconditional expectation, denoted as $E(\cdot)$. Empirical work on asset-pricing models like Equation (1) typically relies on rational expectations, interpreted as the assumption that the expectation terms in the model are mathematical conditional expectations. Taking the expected values of Equation (1), rational expectations implies that versions of Equation (1) must hold for the expectations $E(\cdot \mid Z_t)$ and $E(\cdot)$.
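The step from the market-wide information set to the econometrician's instruments is an application of the law of iterated expectations. A brief sketch, writing $\Omega_t$ for the market-wide information set and $X_{t+1} \equiv P_{t+1} + D_{t+1}$ for the payoff (a shorthand introduced here only for brevity), and assuming that $Z_t$ is contained in $\Omega_t$:

$$
\begin{aligned}
E(m_{t+1} X_{t+1} \mid Z_t) &= E\big(E(m_{t+1} X_{t+1} \mid \Omega_t) \mid Z_t\big) = E(P_t \mid Z_t), \\
E(m_{t+1} X_{t+1}) &= E(P_t).
\end{aligned}
$$

When the price $P_t$ is itself part of $Z_t$, the first line reduces to $E(m_{t+1} X_{t+1} \mid Z_t) = P_t$, which has the same form as Equation (1) with the coarser information set.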
¹ The random variable $m_{t+1}$ is also known as the pricing kernel, benchmark pricing variable, or intertemporal marginal rate of substitution, depending on the context. The representation (1) goes at least back to Beja (1971), while the term “stochastic discount factor” is usually ascribed to Hansen and Richard (1987).
Assuming nonzero prices, Equation (1) is equivalent to:

$$E(m_{t+1} R_{t+1} - \mathbf{1} \mid \Omega_t) = 0, \tag{2}$$
where $R_{t+1}$ is the $N$-vector of primitive asset gross returns and $\mathbf{1}$ is an $N$-vector of ones. The gross return $R_{i,t+1}$ is defined as $(P_{i,t+1} + D_{i,t+1})/P_{i,t}$. We say that an SDF “prices” the assets if Equations (1) and (2) are satisfied. Empirical tests of asset-pricing models often work directly with Equation (2) and the relevant definition of $m_{t+1}$.

Without more structure, Equations (1) and (2) have no content because it is almost always possible to find a random variable $m_{t+1}$ for which the equations hold. There will be some $m_{t+1}$ that “works”, in this sense, as long as there are no redundant asset returns.² With the restriction that $m_{t+1}$ is a strictly positive random variable, Equation (1) becomes equivalent to the no-arbitrage principle, which says that all portfolios of assets with payoffs that can never be negative, but which are positive with positive probability, must have positive prices [Beja (1971), Rubinstein (1976), Ross (1977), Harrison and Kreps (1979), Hansen and Richard (1987)]. The no-arbitrage condition does not uniquely identify $m_{t+1}$ unless markets are complete. In that case, $m_{t+1}$ is equal to primitive state prices divided by state probabilities. To see this, write Equation (1) as $P_{i,t} = E_t\{m_{t+1} X_{i,t+1}\}$, where $X_{i,t+1} = P_{i,t+1} + D_{i,t+1}$. In a discrete-state setting, $P_{i,t} = \sum_s \pi_s X_{i,s} = \sum_s q_s (\pi_s/q_s) X_{i,s}$, where $q_s$ is the probability that state $s$ will occur and $\pi_s$ is the state price, equal to the value at time $t$ of one unit of the numeraire to be paid at time $t+1$ if state $s$ occurs at time $t+1$. $X_{i,s}$ is the total payoff of security $i$ at time $t+1$ if state $s$ occurs. Comparing this expression with Equation (1) shows that $m_s = \pi_s/q_s > 0$ is the value of the SDF in state $s$.

While the no-arbitrage principle places some restrictions on $m_{t+1}$, empirical work often explores the implications of equilibrium models for the SDF, based on investor optimization.
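Returning to the discrete-state expression above, the equality between pricing with state prices and pricing with the SDF can be checked numerically. The following is a minimal sketch for a hypothetical three-state economy; the probabilities, state prices and payoffs are illustrative assumptions and are not taken from the chapter.

```python
import numpy as np

# Hypothetical three-state economy (all numbers are illustrative assumptions).
probs = np.array([0.3, 0.5, 0.2])            # state probabilities, summing to one
state_prices = np.array([0.36, 0.45, 0.14])  # strictly positive state prices
payoff = np.array([1.0, 2.0, 4.0])           # payoff of one asset in each state

# SDF value in each state: state price divided by state probability.
sdf = state_prices / probs

# Price computed two ways: summing state prices times payoffs,
# and taking the expectation of SDF times payoff.
price_from_state_prices = np.sum(state_prices * payoff)
price_from_sdf = np.sum(probs * sdf * payoff)

# The price of a sure unit payoff is E[m]; its inverse is the gross risk-free return.
bond_price = np.sum(probs * sdf)
gross_riskfree = 1.0 / bond_price

print(price_from_state_prices, price_from_sdf, gross_riskfree)
```

Because the state prices here are strictly positive, so is the SDF in every state, which is the no-arbitrage restriction described in the text.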
Consider the Bellman equation for a representative consumer-investor’s optimization:

$$J(W_t, s_t) \equiv \max E_t\{U(C_t, \cdot) + J(W_{t+1}, s_{t+1})\}, \tag{3}$$
where $U(C_t, \cdot)$ is the direct utility of consumption expenditures at time $t$, and $J(\cdot)$ is the indirect utility of wealth. The notation allows the direct utility of current consumption expenditures to depend on variables such as past consumption expenditures or other state variables, $s_t$. The state variables are sufficient statistics, given wealth, for the utility of future wealth in an optimal consumption-investment plan. Thus, changes in the state variables represent future consumption-investment opportunity risk. The budget constraint is: $W_{t+1} = (W_t - C_t)\, x' R_{t+1}$, where $x$ is the portfolio weight vector, subject to $x'\mathbf{1} = 1$.

² For example, take a sample of assets with a nonsingular second moment matrix and let $m_{t+1}$ be $[\mathbf{1}'(E_t\{R_{t+1} R_{t+1}'\})^{-1}]\, R_{t+1}$.
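The construction in footnote 2 can be illustrated in sample. The sketch below replaces the conditional second moment with a full-sample average and uses simulated gross returns; both choices are assumptions made for illustration, as are the sample size and the return parameters. It shows that the resulting variable prices the sample returns exactly, which is why Equations (1) and (2) have no content without further restrictions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 500, 4

# Simulated gross returns (illustrative assumption, not data from the chapter).
R = 1.01 + 0.05 * rng.standard_normal((T, N))

# Sample analogue of the second moment matrix E[R R'].
second_moment = R.T @ R / T

# Weights (E[R R'])^{-1} 1, obtained by solving a linear system.
weights = np.linalg.solve(second_moment, np.ones(N))

# Candidate SDF: m_{t+1} = 1' (E[R R'])^{-1} R_{t+1}, one value per period.
m = R @ weights

# Sample pricing errors, the average of m * R minus one for each asset.
pricing_errors = (m[:, None] * R).mean(axis=0) - 1.0

print(pricing_errors)  # zero up to floating-point rounding
```

Nothing in this construction forces `m` to be positive in every period, so it does not by itself satisfy the no-arbitrage restriction.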
If the allocation of resources to consumption and investment assets is optimal, it is not possible to obtain higher utility by changing the allocation. Suppose an investor considers reducing consumption at time $t$ to purchase more of (any) asset. The expected utility cost at time $t$ of the foregone consumption is the expected marginal utility of consumption expenditures, $U_c(C_t, \cdot) > 0$ (where a subscript denotes partial derivative), multiplied by the price $P_{i,t}$ of the asset, measured in the numeraire unit. The expected utility gain of selling the investment asset and consuming the proceeds at time $t+1$ is $E_t\{(P_{i,t+1} + D_{i,t+1})\, J_w(W_{t+1}, s_{t+1})\}$. If the allocation maximizes expected utility, the following must hold: $P_{i,t}\, E_t\{U_c(C_t, \cdot)\} = E_t\{(P_{i,t+1} + D_{i,t+1})\, J_w(W_{t+1}, s_{t+1})\}$, which is equivalent to Equation (1), with

$$m_{t+1} = \frac{J_w(W_{t+1}, s_{t+1})}{E_t\{U_c(C_t, \cdot)\}}. \tag{4}$$
The $m_{t+1}$ in Equation (4) is the intertemporal marginal rate of substitution (IMRS) of the consumer-investor, and Equations (2) and (4) combined are the intertemporal Euler equation. Asset-pricing models typically focus on the relation of security returns to aggregate quantities. To get there, it is necessary to aggregate the Euler equations of individuals to obtain equilibrium expressions in terms of aggregate quantities. Theoretical conditions which justify the use of aggregate quantities are discussed by Wilson (1968), Rubinstein (1974) and Constantinides (1982), among others. Some recent empirical work does not assume aggregation, but relies on panels of disaggregated data. Examples include Zeldes (1989), Brav, Constantinides and Geczy (2002) and Balduzzi and Yao (2001).

Multiple-factor models for asset pricing follow when $m_{t+1}$ can be written as a function of several factors. Equation (4) suggests that likely candidates for the factors are variables that proxy for consumer wealth, consumption expenditures or the state variables – the variables that determine the marginal utility of future wealth in an optimal consumption-investment plan.

2.2. Expected risk premiums

Typically, empirical work focuses on expressions for expected returns and excess rates of return. Expected excess returns are related to the risk factors that create variation in $m_{t+1}$. Consider any asset return $R_{i,t+1}$ and a reference asset return, $R_{0,t+1}$. Define the excess return of asset $i$, relative to the reference asset, as $r_{i,t+1} = R_{i,t+1} - R_{0,t+1}$. If Equation (2) holds for both assets it implies:

$$E_t\{m_{t+1}\, r_{i,t+1}\} = 0 \quad \text{for all } i. \tag{5}$$
Use the definition of covariance to expand Equation (5) into the product of expectations plus the covariance, obtaining:

$$E_t\{r_{i,t+1}\} = \frac{\mathrm{Cov}_t(r_{i,t+1};\, -m_{t+1})}{E_t\{m_{t+1}\}}, \quad \text{for all } i, \tag{6}$$
where $\mathrm{Cov}_t(\cdot\,; \cdot)$ is the conditional covariance. Equation (6) is a general expression for the expected excess return, from which most of the expressions in the literature can be derived. The conditional covariance of return with the SDF, $m_{t+1}$, is a very general measure of systematic risk. Asset-pricing models say that assets earn expected return premiums for their systematic risk, not their total risk (i.e., variance of return). The covariance with $-m_{t+1}$ is systematic risk because it measures the component of the return that contributes to fluctuations in the marginal utility of wealth. If we regressed the asset return on the SDF, the residual in the regression would capture the “unsystematic” risk and would not be “priced”, or command a risk premium. If the conditional covariance with the SDF is zero for a particular asset, the expected excess return of that asset should be zero.³ The more negative is the covariance with $m_{t+1}$, the less desirable is the distribution of the random return, as the larger payoffs tend to occur when the marginal utility is low. The expected compensation for holding assets with this feature must be higher than for those with a more desirable distribution. Expected risk premiums should therefore differ across assets in proportion to their conditional covariances with $-m_{t+1}$.
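The expected excess return relation in Equation (6) can be verified in a small discrete-state example. The sketch below is only an illustration under assumed numbers: the state probabilities, SDF values and payoffs are hypothetical, and the helper `expect` is introduced here for convenience. Prices are set so that the SDF prices every asset, and the expected excess return is then compared with the covariance expression.

```python
import numpy as np

# Hypothetical three-state economy (illustrative assumptions).
probs = np.array([0.3, 0.5, 0.2])   # state probabilities
sdf = np.array([1.2, 0.9, 0.7])     # SDF value in each state

payoffs = np.array([
    [1.0, 1.0, 1.0],   # reference asset: sure payoff
    [1.0, 2.0, 4.0],   # asset A: pays most when the SDF is low
    [3.0, 2.0, 1.0],   # asset B: pays most when the SDF is high
])

def expect(x):
    """Probability-weighted average across states (last axis)."""
    return np.sum(probs * x, axis=-1)

prices = expect(sdf * payoffs)           # price = E[m X] for each asset
gross = payoffs / prices[:, None]        # gross returns by state
excess = gross[1:] - gross[0]            # excess returns over the reference asset

# Left side of Equation (6): expected excess return.
lhs = expect(excess)

# Right side: Cov(r, -m) / E[m], using Cov(a, b) = E[ab] - E[a]E[b].
cov_with_minus_m = expect(excess * (-sdf)) - expect(excess) * expect(-sdf)
rhs = cov_with_minus_m / expect(sdf)

print(lhs, rhs)
```

Asset A, whose payoffs are largest when the SDF is low, earns a positive premium, while asset B earns a negative one, in line with the discussion of systematic risk above.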
2.3. Return predictability

Rational expectations implies that the difference between return realizations and the expectations in the model should be unrelated to the information that the expectations in the model are conditioned on. For example, Equation (2) says that the conditional expectation of the product of $m_{t+1}$ and $R_{i,t+1}$ is the constant, 1.0. Therefore, $1 - m_{t+1} R_{i,t+1}$ should not be predictably different from zero using any information available at time $t$. If we run a regression of $1 - m_{t+1} R_{i,t+1}$ on any lagged variable, $Z_t$, the regression coefficients should be zero. If there is predictability in a return $R_{i,t+1}$ using instruments $Z_t$, the model implies that the predictability is removed when $R_{i,t+1}$ is multiplied by the correct $m_{t+1}$. This is the sense in which conditional asset-pricing models are asked to “explain” predictable variation in asset returns.

This view generalizes the older “random walk” model of stock values, which states that stock returns should be completely unpredictable. That model is a special case which can be motivated by risk neutrality. Under risk neutrality the IMRS, $m_{t+1}$, is a constant. Therefore, in this case the model implies that the return $R_{i,t+1}$ should not differ predictably from a constant.

Conditional asset pricing presumes the existence of some return predictability. There should be instruments $Z_t$ for which $E(R_{t+1} \mid Z_t)$ or $E(m_{t+1} \mid Z_t)$ vary over time, in order
³ Equation (6) is weaker than Equation (2), since Equation (6) is equivalent to $E_t\{m_{t+1} R_{i,t+1}\} = \Delta_t$, all $i$, where $\Delta_t$ is a constant across assets, while Equation (2) restricts $\Delta_t = 1$. Therefore, empirical tests based on Equation (6) do not exploit all of the restrictions implied by a model that may be stated in the form of Equation (2).
for the equation $E(m_{t+1} R_{t+1} - \mathbf{1} \mid Z_t) = 0$ to have empirical bite.⁴

Interest in predicting security-market returns is about as old as the security markets themselves. Fama (1970) reviews the early evidence and Schwert, in Chapter 15 of this volume, reviews “anomalies” based on predictability. One body of literature uses lagged returns to predict future stock returns, attempting to exploit serial dependence. High-frequency serial dependence, such as daily or intraday patterns, is often considered to represent the effects of market microstructure, such as bid–ask spreads [e.g., Roll (1984)] and nonsynchronous trading of the stocks in an index [e.g., Scholes and Williams (1977)]. Serial dependence at longer horizons may represent predictable changes in the expected returns. Conrad and Kaul (1989) report serial dependence in weekly returns. Jegadeesh and Titman (1993) find that relatively high return, “winner” stocks tend to repeat their performance over three to nine-month horizons. DeBondt and Thaler (1985) find that past high-return stocks perform poorly over the next five years, and Fama and French (1988) find negative serial dependence over two to five-year horizons. These serial-dependence patterns motivate a large number of studies which attempt to assess the economic magnitude and statistical robustness of the implied predictability, or to explain the predictability as an economic phenomenon. For more comprehensive reviews, see Campbell, Lo and MacKinlay (1997) or Kaul (1996). Research in this area continues, and it is fair to say that the jury is still out on the issue of predictability using lagged returns.

A second body of literature studies predictability using other lagged variables as instruments. Fama and French (1989) assemble a list of variables from studies in the early 1980s that, as of this writing, remain the workhorse instruments for conditional asset-pricing models.
These variables include the lagged dividend yield of a stock market index, a yield spread of long-term government bonds relative to short-term bonds, and a yield spread of low-grade (high default risk) corporate bonds over high-grade bonds. In addition, studies often include the level of a short-term interest rate [Fama and Schwert (1977), Ferson (1989)] and the lagged excess return of a medium-term over a short-term Treasury bill [Campbell (1987), Ferson and Harvey (1991)]. Recently proposed instruments include an aggregate book-to-market ratio [Pontiff and Schall (1998)] and lagged consumption-to-wealth ratios [Lettau and Ludvigson (2001)]. Of course, many other predictor variables have been proposed and more will doubtless be proposed in the future. Predictability using lagged instruments remains controversial, and there are some good reasons to question the predictability. Studies have identified various statistical biases in predictive regressions [e.g., Hansen and Hodrick (1980), Stambaugh (1999),
4 At one level this is easy. Since $E(m_{t+1} \mid Z_t)$ should be the inverse of a risk-free return, all we need is observable risk-free rates that vary over time. Ferson (1989) shows that the behavior of stock returns and short-term interest rates implies that conditional covariances of returns with $m_{t+1}$ must also vary over time.
Ang and Bekaert (2001), Ferson, Sarkissian and Simin (2002)], questioned the stability of the predictive relations across economic regimes [e.g., Kim, Nelson and Startz (1991)] and raised the possibility that the lagged instruments arise solely through data mining [e.g., Lo and MacKinlay (1990), Foster, Smith and Whaley (1997)]. A reasonable response to these concerns is to see if the predictive relations hold out-of-sample. This kind of evidence is also mixed. Some studies find support for predictability in step-ahead or out-of-sample exercises [e.g., Fama and French (1989), Pesaran and Timmermann (1995)]. Similar instruments show some ability to predict returns outside the context of the USA, where they arose [e.g., Harvey (1991), Solnik (1993), Ferson and Harvey (1993, 1999)]. However, other studies conclude that predictability using the standard lagged instruments does not hold out-of-sample [e.g., Goyal and Welch (1999), Simin (2002)]. It seems that research on the predictability of security returns will always be interesting, and conditional asset-pricing models should be useful in framing many future investigations of these issues.
2.4. Consumption-based asset-pricing models
In these models the economic agent maximizes a lifetime utility function of consumption (including possibly a bequest to heirs). Consumption models may be derived from Equation (4) by exploiting the envelope condition, $U_c(\cdot) = J_w(\cdot)$, which states that the marginal utility of consumption must be equal to the marginal utility of wealth if the consumer has optimized the tradeoff between the amount consumed and the amount invested. Breeden (1979) derived a consumption-based asset-pricing model in continuous time, assuming that the preferences are time-additive. The utility function for the lifetime stream of consumption is $\sum_t \beta^t U(C_t)$, where $\beta$ is a time preference parameter and $U(\cdot)$ is increasing and concave in current consumption, $C_t$.
Breeden’s model is a linearization of Equation (2) which follows from the assumption that asset values and consumption follow diffusion processes [Bhattacharya (1981), Grossman and Shiller (1982)]. A discrete-time version follows Rubinstein (1976) and Lucas (1978), assuming a power utility function:
$$U(C) = \frac{C^{1-\alpha} - 1}{1 - \alpha}, \tag{7}$$
where $\alpha > 0$ is the concavity parameter. This function displays constant relative risk aversion 5 equal to $\alpha$. Using Equation (7) and the envelope condition, the IMRS in Equation (4) becomes:
$$m_{t+1} = \beta \, (C_{t+1}/C_t)^{-\alpha}. \tag{8}$$
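As a rough illustration of Equation (8), the sketch below computes the power-utility SDF from a series of gross consumption growth rates and then forms the sample analogue of the pricing error E(mR − 1) for each asset. The function names, the toy data and the use of NumPy are assumptions made for this example; they do not come from the chapter.

```python
import numpy as np

def power_utility_sdf(consumption_growth, beta, alpha):
    """SDF of Equation (8): m = beta * (C_{t+1}/C_t)^(-alpha)."""
    growth = np.asarray(consumption_growth, dtype=float)
    return beta * growth ** (-alpha)

def average_pricing_errors(sdf, gross_returns):
    """Sample analogue of E(m R - 1), one entry per asset (column)."""
    sdf = np.asarray(sdf, dtype=float)
    gross_returns = np.asarray(gross_returns, dtype=float)
    return (sdf[:, None] * gross_returns).mean(axis=0) - 1.0

# Hypothetical data: three periods of gross consumption growth, two assets.
growth = np.array([1.00, 1.02, 0.99])
returns = np.array([[1.01, 1.05],
                    [1.01, 0.97],
                    [1.01, 1.10]])
m = power_utility_sdf(growth, beta=0.99, alpha=2.0)
errors = average_pricing_errors(m, returns)
print(m, errors)
```

With alpha set to zero the SDF collapses to the constant beta, so a riskless gross return of 1/beta is priced with zero error; this is a convenient sanity check on any implementation.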
5 Relative risk aversion in consumption is defined as $-C u''(C)/u'(C)$. Absolute risk aversion is $-u''(C)/u'(C)$. Ferson (1983) studies a consumption-based asset-pricing model with constant absolute risk aversion.
A large literature has tested the pricing Equation (1), with the SDF given by the consumption model (8), and generalizations of that model. 6
2.5. Multi-beta pricing models
The vast majority of the empirical work on asset-pricing models involves expressions for expected returns, stated in terms of beta coefficients relative to one or more portfolios or factors. The beta is the regression coefficient of the asset return on the factor. Multi-beta models have more than one risk factor and more than one beta for each asset. The Arbitrage Pricing Theory (APT) leads to approximate expressions for expected returns with multiple beta coefficients. Models based on investor optimization and equilibrium lead to exact expressions. 7 Both of these approaches lead to models with the following form:
$$E_t(R_{i,t+1}) = \lambda_{0t} + \sum_{j=1}^{K} b_{ijt} \lambda_{jt}, \quad \text{for all } i. \tag{9}$$
The $b_{i1t}, \ldots, b_{iKt}$ are the time t betas of asset i relative to the K risk factors $F_{j,t+1}$, j = 1, ..., K. These betas are the conditional multiple-regression coefficients of the assets on the factors. The $\lambda_{j,t}$, j = 1, ..., K are the factor risk premiums, which represent increments to the expected return per unit of type-j beta. These premiums do not depend on the specific security i. $\lambda_{0,t}$ is the expected zero-beta rate. This is the expected return of any security that is uncorrelated with each of the K factors in the model (i.e., $b_{0jt} = 0$, j = 1, ..., K). If there is a risk-free asset, then $\lambda_{0,t}$ is the return of this asset.
2.5.1. Relation to the stochastic discount factor
We first show how a multi-beta model can be derived as a special case of the SDF representation, when the factors capture the relevant systematic risks. We take this to mean that the error terms, $u_{i,t+1}$, in a regression of returns on the factors are not “priced”; that is, they are uncorrelated with $m_{t+1}$: $Cov_t(u_{i,t+1}, m_{t+1}) = 0$. We then state the general equivalence between the two representations. This equivalence was
6 An important generalization allows for nonseparabilities in the $U_c(C_t, \cdot)$ function in Equation (4), as may be implied by the durability of consumer goods, habit persistence in the preferences for consumption over time, or nonseparability of preferences across states of nature. Singleton (1990), Ferson (1995) and Campbell, in Chapter 13 of this volume, review this literature.
7 The multiple-beta equilibrium model was developed in continuous time by Merton (1973), Breeden (1979) and Cox, Ingersoll and Ross (1985). Long (1974), Sharpe (1977), Cragg and Malkiel (1982), Connor (1984), Dybvig (1983), Grinblatt and Titman (1983) and Shanken (1987) provide multibeta interpretations of equilibrium models in discrete time.
first discussed, for the case of a single-factor model, by Dybvig and Ingersoll (1982). The general, multi-factor case follows from Ferson and Jagannathan (1996). Let $R_{0,t+1}$ be a zero-beta portfolio, and $\lambda_{0,t}$ the expected return on the zero-beta portfolio. Equation (6) implies:
$$E_t(R_{i,t+1}) = \lambda_{0,t} + \frac{Cov_t(R_{i,t+1}; -m_{t+1})}{E_t(m_{t+1})}. \tag{10}$$
Substituting the regression model $R_{i,t+1} = a_i + \sum_j b_{ijt} F_{j,t+1} + u_{i,t+1}$ into the right hand side of (10) and assuming that $Cov_t(u_{i,t+1}, m_{t+1}) = 0$ implies:
$$E_t(R_{i,t+1}) = \lambda_{0,t} + \sum_{j=1,\ldots,K} b_{ijt} \, \frac{Cov_t\{F_{j,t+1}, -m_{t+1}\}}{E_t(m_{t+1})}. \tag{11}$$
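The substitution step can be written out explicitly. The lines below are a worked sketch that uses only the linearity of the conditional covariance, the fact that the intercept and the betas are known at time t, and the assumption that the regression errors are conditionally uncorrelated with the SDF:

```latex
\begin{aligned}
\mathrm{Cov}_t\!\left(R_{i,t+1};\, -m_{t+1}\right)
  &= \mathrm{Cov}_t\!\Big(a_i + \sum_{j=1}^{K} b_{ijt} F_{j,t+1} + u_{i,t+1};\; -m_{t+1}\Big) \\
  &= \sum_{j=1}^{K} b_{ijt}\, \mathrm{Cov}_t\!\left(F_{j,t+1};\, -m_{t+1}\right)
     \;-\; \mathrm{Cov}_t\!\left(u_{i,t+1};\, m_{t+1}\right) \\
  &= \sum_{j=1}^{K} b_{ijt}\, \mathrm{Cov}_t\!\left(F_{j,t+1};\, -m_{t+1}\right).
\end{aligned}
```

Dividing by the conditional mean of the SDF and adding the zero-beta rate then gives the multi-beta expression for the expected return.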
The risk premium per unit of type-j beta is $\lambda_{j,t} = [Cov_t\{F_{j,t+1}, -m_{t+1}\} / E_t(m_{t+1})]$. In the special case where the factor $F_{j,t+1}$ is a traded asset return, Equation (11) implies that $\lambda_{j,t} = E_t(F_{j,t+1}) - \lambda_{0,t}$; the expected risk premium equals the factor portfolio’s expected excess return. Equation (11) is useful because it provides intuition about the signs and magnitudes of expected risk premiums for particular factors. The intuition is the same as in Equation (6) above. If a risk factor $F_{j,t+1}$ is negatively correlated with $m_{t+1}$, the model implies that a positive risk premium is associated with that factor beta. A factor that is positively related to marginal utility should carry a negative premium, because the big payoffs come when the value of payoffs is high. This implies a high present value and a low expected return. Expected risk premiums for a factor should also change over time if the conditional covariances of the factor with the scaled marginal utility $[m_{t+1} / E_t(m_{t+1})]$ vary over time. The steps that take us from Equations (6) to (11) can be reversed, so the SDF and multi-beta representations are, in fact, equivalent. The formal statement is:
Lemma 1 [Ferson and Jagannathan (1996)]. The stochastic discount factor representation (2) and the multi-beta model (9) are equivalent, where
$$m_{t+1} = c_{0t} + c_{1t} F_{1,t+1} + \cdots + c_{Kt} F_{K,t+1},$$
with
$$c_{0t} = \frac{1 + \sum_k \{\lambda_{k,t} E_t(F_{k,t+1}) / Var_t(F_{k,t+1})\}}{\lambda_{0,t}}, \quad \text{and} \quad c_{jt} = -\frac{\lambda_{j,t}}{\lambda_{0,t} \, Var_t(F_{j,t+1})}, \quad j = 1, \ldots, K. \tag{12}$$
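A small numerical check may help fix ideas. The sketch below takes the coefficients of Equation (12), builds the mean of the SDF and its covariance with returns from population moments, and confirms that E(mR) = 1 for assets whose expected returns satisfy Equation (9). It assumes, for simplicity, that the factors are mutually uncorrelated and that the regression residuals are uncorrelated with the factors, so that Cov(R_i, F_j) = b_ij Var(F_j). All numbers and function names are hypothetical.

```python
import numpy as np

def sdf_coefficients(lam0, lam, factor_means, factor_vars):
    """Coefficients c_0 and c_j of Equation (12)."""
    c = -lam / (lam0 * factor_vars)
    c0 = (1.0 + np.sum(lam * factor_means / factor_vars)) / lam0
    return c0, c

def expected_m_times_r(lam0, lam, factor_means, factor_vars, betas):
    """E(m R_i) for returns whose means satisfy the multi-beta model (9).

    Assumption: factors are mutually uncorrelated and residuals are
    uncorrelated with the factors, so Cov(R_i, F_j) = b_ij * Var(F_j).
    """
    c0, c = sdf_coefficients(lam0, lam, factor_means, factor_vars)
    expected_returns = lam0 + betas @ lam        # Equation (9)
    mean_m = c0 + c @ factor_means               # E(m)
    cov_r_m = betas @ (c * factor_vars)          # Cov(R_i, m)
    return mean_m * expected_returns + cov_r_m   # E(mR) = E(m)E(R) + Cov(R, m)

# Hypothetical population moments: K = 2 factors, N = 3 assets.
lam0 = 1.01
lam = np.array([0.06, 0.02])
factor_means = np.array([1.07, 0.01])
factor_vars = np.array([0.04, 0.01])
betas = np.array([[1.0, 0.5], [0.8, -0.3], [1.3, 0.0]])
values = expected_m_times_r(lam0, lam, factor_means, factor_vars, betas)
print(values)
```

In this setup the mean of the SDF works out to the inverse of the zero-beta rate, which is the same normalization that appears in Equation (10).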
For a proof, see Ferson and Jagannathan (1996). If the factors are not traded asset returns, then it is typically necessary to estimate the expected risk premiums for the factors, $\lambda_{k,t}$. These may be identified as the conditional expected excess returns on factor-mimicking portfolios. A factor-mimicking portfolio is defined as a portfolio whose return can be used in place of a factor in the model. There are several ways to obtain mimicking portfolios, as described in more detail below. 8
2.5.2. Relation to mean-variance efficiency
The concept of a minimum-variance portfolio is central in the asset-pricing literature. A portfolio $R_{p,t+1}$ is a minimum-variance portfolio if no portfolio with the same expected return has a smaller variance. Roll (1977) and others have shown that the portfolio $R_{p,t+1}$ is a minimum-variance portfolio if and only if a beta-pricing model holds: 9
$$E_t\{R_{i,t+1} - R_{pz,t+1}\} = b_{ipt} \, E_t\{R_{p,t+1} - R_{pz,t+1}\}, \quad \text{all } i; \qquad b_{ipt} = \frac{Cov_t(R_{i,t+1}; R_{p,t+1})}{Var_t(R_{p,t+1})}. \tag{13}$$
In Equation (13), $b_{ipt}$ is the conditional beta of $R_{i,t+1}$ relative to $R_{p,t+1}$. $R_{pz,t+1}$ is a zero-beta asset relative to $R_{p,t+1}$. A zero-beta asset satisfies $Cov_t(R_{pz,t+1}; R_{p,t+1}) = 0$. Equation (13) is essentially a restatement of the first-order condition for the optimization problem that defines a minimum-variance portfolio. Equation (13) first appeared as an asset-pricing model in the famous Capital Asset Pricing Model (CAPM) of Sharpe (1964), Lintner (1965) and Black (1972). The CAPM is equivalent to the statement that the market portfolio $R_{m,t+1}$ is mean-variance efficient. The market portfolio is the portfolio of all marketed assets, weighted according to their relative total values. The portfolio is mean-variance efficient if it satisfies Equation (13), and also $E_t(R_{m,t+1} - R_{mz,t+1}) > 0$. When the factors are traded assets like a market portfolio, or when mimicking portfolios are used, the multi-beta model in Equation (9) is equivalent to the statement that a combination of the factor portfolios is minimum-variance efficient. 10 Therefore,
8 Breeden (1979, footnote 7) derives maximum correlation mimicking portfolios. Grinblatt and Titman (1987), Shanken (1987), Lehmann and Modest (1988) and Huberman, Kandel and Stambaugh (1987) provide further characterizations of mimicking portfolios when there is no conditioning information. Ferson and Siegel (2002b) and Ferson, Siegel and Xu (2002) consider cases where there is conditioning information.
9 It is assumed that the portfolio $R_{p,t+1}$ is not the global minimum-variance portfolio; that is, the minimum variance over all levels of expected return. This is because the betas of all assets on the global minimum-variance portfolio are identical.
10 This result is proved by Grinblatt and Titman (1987), Shanken (1987) and Huberman, Kandel and Stambaugh (1987) and reviewed by Ferson (1995).
multiple-beta asset-pricing models like Equation (9) always imply that combinations of particular portfolios are minimum-variance efficient. This correspondence is exploited by Gibbons, Ross and Shanken (1989) and Kandel and Stambaugh (1989), among others, to develop tests of multi-beta models based on mean-variance efficiency. Such tests are discussed in Section 4 below. Since the SDF representation in Equation (2) is equivalent to a multi-beta expression for expected returns, and a multi-beta model is equivalent to a statement about minimum-variance efficiency, it follows that the SDF representation is equivalent to a statement about minimum-variance efficiency. Let’s now complete the loop.
Lemma 2. A portfolio which maximizes squared (conditional) correlation with $m_{t+1}$ in Equation (2) is a minimum (conditional) variance portfolio.
Proof: Consider the conditional projection of $m_{t+1}$ on the vector of returns $R_{t+1}$. The coefficient vector is $w_t \equiv (E_t(R_{t+1} R_{t+1}'))^{-1} E_t(R_{t+1} m_{t+1}) = (E_t(R_{t+1} R_{t+1}'))^{-1} \mathbf{1}$. Define the portfolio return $R_{p,t+1} = (w_t' / w_t'\mathbf{1}) R_{t+1}$. The portfolio maximizes the squared conditional correlation with $m_{t+1}$. The regression of $m_{t+1}$ on the vector of returns can be written as $m_{t+1} = (w_t'\mathbf{1}) R_{p,t+1} + e_{t+1}$. The error term $e_{t+1}$ is conditionally uncorrelated with $R_{i,t+1}$ for all i, and therefore with $R_{p,t+1}$. 11 Substituting for $m_{t+1}$ in Equation (6) from this regression produces:
$$E_t\{r_{i,t+1}\} = \frac{Cov_t(r_{i,t+1}; R_{p,t+1})}{Cov_t(r_{p,t+1}; R_{p,t+1})} \, E_t\{r_{p,t+1}\}, \quad \text{all } i, \tag{14}$$
where $r_{i,t+1}$ and $r_{p,t+1}$ are excess returns. If the reference asset for the excess returns is taken to be $R_{pz,t+1}$, a zero-beta asset for $R_{p,t+1}$, then Equation (13) follows directly from Equation (14). Equation (13) implies that $R_{p,t+1}$ is a conditional minimum-variance portfolio.
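Lemma 2 can be checked numerically in an unconditional setting. The sketch below forms the portfolio with weights proportional to the inverse second-moment matrix of gross returns times a vector of ones, and compares its variance with the smallest variance attainable by any fully invested portfolio with the same mean (the standard closed-form frontier for given means and covariances). The moments are hypothetical and the function names are illustrative only.

```python
import numpy as np

def max_correlation_portfolio(mean, cov):
    """Weights proportional to E(RR')^{-1} 1, rescaled to sum to one."""
    second_moment = cov + np.outer(mean, mean)   # E(RR') = Cov(R) + E(R)E(R)'
    w = np.linalg.solve(second_moment, np.ones(len(mean)))
    return w / w.sum()

def frontier_variance(target_mean, mean, cov):
    """Smallest variance among fully invested portfolios with the given mean."""
    ones = np.ones(len(mean))
    inv_ones = np.linalg.solve(cov, ones)
    inv_mean = np.linalg.solve(cov, mean)
    a, b, c = ones @ inv_ones, ones @ inv_mean, mean @ inv_mean
    return (a * target_mean**2 - 2.0 * b * target_mean + c) / (a * c - b**2)

# Hypothetical moments of three gross returns.
mean = np.array([1.05, 1.08, 1.12])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
w = max_correlation_portfolio(mean, cov)
portfolio_mean = w @ mean
portfolio_var = w @ cov @ w
print(portfolio_var, frontier_variance(portfolio_mean, mean, cov))
```

The two printed variances should agree up to rounding, which is the content of the lemma for this example.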
We have seen that exact multi-beta pricing is equivalent to the statement that $E(mR_i) = 1$ for all i, under the assumption that m is a linear function of the factors, and also equivalent to the statement that a portfolio of the factors is a minimum-variance-efficient portfolio. Thus, we have equivalence among the three paradigms: exact multi-beta pricing, stochastic discount factors, and mean-variance efficiency.
2.5.3. A large-markets interpretation
This section describes how the three paradigms of empirical asset pricing work in the large markets of the Arbitrage Pricing Theory (APT) of Ross (1976), as refined
11 The fitted values of the regression will have the same pricing implications as $m_{t+1}$. That is, $m^*_{t+1} = (w_t'\mathbf{1}) R_{p,t+1}$ can replace $m_{t+1}$ in Equation (1). Note that when the covariance matrix of asset returns is nonsingular, $m^*_{t+1}$ is the unique SDF (i.e., satisfies Equation (1)) which is also an asset return. An SDF which satisfies Equation (1) is not in general an asset return, nor is it unique, unless markets are complete. If markets are complete, $m^*_{t+1}$ is perfectly correlated with $m_{t+1}$ [Hansen and Richard (1987)].
by Chamberlain (1983) and Chamberlain and Rothschild (1983). For this purpose, we ignore the existence of any “conditioning information”, and suppress the time subscripts and related notation. [For arbitrage pricing relations with conditioning information, see Stambaugh (1983).] Assume that the following data-generating model describes equity returns in excess of a risk-free asset:
$$r_i = E(r_i) + b_i' f + e_i, \tag{15}$$
where $E(f) = 0 = E(e_i f)$, all i, and $f_t = F_t - E(F_t)$ are the unexpected factor returns. We can normalize the factors to have the identity as their covariance matrix; the $b_i$ absorb the normalization. The N × N covariance matrix of the asset returns can then be expressed as:
$$Cov(R) \equiv \Sigma = BB' + V, \tag{16}$$
where V is the covariance matrix of the residual vector, e, B is the N × K matrix of the $b_i$, and $\Sigma$ is assumed to be nonsingular for all N. The factor model assumes that the eigenvalues of V are bounded as N → ∞, while the K nonzero eigenvalues of $BB'$ become infinite as N → ∞. Thus, the covariance matrix $\Sigma$ has K unbounded and N − K bounded eigenvalues as N becomes large. This is called an “approximate factor structure”, to distinguish it from an “exact” factor structure, where V is assumed to be diagonal. The factor model in Equation (16) decomposes the variances of returns into “pervasive” and “nonsystematic” risks. If x is an N-vector of portfolio weights, the portfolio variance is $x'\Sigma x$, where $\lambda_{\max}(\Sigma)\, x'x \ge x'\Sigma x \ge \lambda_{\min}(\Sigma)\, x'x$, $\lambda_{\min}(\Sigma)$ being the smallest eigenvalue of $\Sigma$ and $\lambda_{\max}(\Sigma)$ being the largest. Following Chamberlain (1983), a portfolio is “well diversified” iff $x'x \to 0$ as N grows without bound. For example, an equally weighted portfolio is well diversified; in this case $x'x = (1/N) \to 0$. The bounded eigenvalues of V imply that V captures the component of portfolio risk that is not pervasive or systematic, in the sense that this part of the variance vanishes in a well-diversified portfolio. The exploding eigenvalues of $BB'$ imply that the common factor risks are pervasive, in the sense that they remain in a large, well-diversified portfolio. The arbitrage pricing theory of Ross (1976) asserts that $\alpha'\alpha < \infty$ as N grows without bound, where $\alpha$ is the N-vector of “alphas,” or expected abnormal returns, measured as the differences between the left- and right-hand sides of Equation (9), using the APT factors in the multi-beta model. The alphas are the differences between the assets’ expected returns and the returns predicted by the multi-beta model, sometimes called the “pricing errors”. The Ross APT implies that the multi-beta model’s pricing errors are “small”, on average, in a large market.
If $\alpha'\alpha < \infty$ as N grows, then the cross-asset average of the squared pricing errors, $(\alpha'\alpha)/N$, must go to zero as N grows.
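The eigenvalue behaviour described above is easy to see in a toy case. The sketch below assumes an exact one-factor structure (K = 1, unit loadings, diagonal V with equal residual variances), which is only a special case of the approximate factor structure in the text. It prints, for growing N, the largest and smallest eigenvalues of the covariance matrix and the residual variance of the equally weighted portfolio. The parameter values and function names are hypothetical.

```python
import numpy as np

def one_factor_covariance(n_assets, loading=1.0, residual_var=0.04):
    """Sigma = BB' + V with a single factor (K = 1) and diagonal V."""
    b = np.full((n_assets, 1), loading)
    v = residual_var * np.eye(n_assets)
    return b @ b.T + v, v

def equal_weight_residual_variance(v):
    """x'Vx for the equally weighted portfolio x = (1/N, ..., 1/N)."""
    n_assets = v.shape[0]
    x = np.full(n_assets, 1.0 / n_assets)
    return x @ v @ x

for n_assets in (10, 100, 1000):
    sigma, v = one_factor_covariance(n_assets)
    eigenvalues = np.linalg.eigvalsh(sigma)   # returned in ascending order
    print(n_assets, eigenvalues[-1], eigenvalues[0],
          equal_weight_residual_variance(v))
```

In this special case the largest eigenvalue grows linearly with N, the remaining eigenvalues stay at the residual variance, and the nonsystematic part of the equally weighted portfolio's variance shrinks like 1/N.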
To see how the approximate beta pricing of the APT relates to the other paradigms of empirical asset pricing, we first describe how the pricing errors in the beta-pricing model are related to those of a stochastic discount factor representation. If we define $\alpha_m = E(mR - 1)$, where m is linear in the APT factors, then it follows from Equations (10) and (11) that $\alpha_m = E(m)\alpha$; the beta-pricing and stochastic discount factor alphas are proportional, where the risk-free rate determines the constant of proportionality. Provided that the risk-free rate is bounded above −100%, then E(m) is bounded, and $\alpha'\alpha$ is bounded above if and only if $\alpha_m'\alpha_m$ is bounded above. Thus, the Ross APT has the same implications for the pricing errors in the stochastic discount factor and beta-pricing paradigms. The third paradigm is mean-variance efficiency. We know that a combination of the APT factors is minimum-variance efficient if and only if $\alpha = 0$. Thus, under the Ross APT a combination of the factors is not minimum-variance efficient. However, an upper bound on $\alpha'\alpha$ implies a lower bound on the correlation between a minimum-variance combination of the factors and a minimum-variance-efficient portfolio. To see how the Ross APT restricts the correlation between the factors and a minimum-variance-efficient portfolio, we need two facts. The first is the “law of conservation of squared Sharpe ratios”, developed as Equation (52) below (p. 782). Here we state the law as $S^2(r) = \alpha'\Sigma^{-1}\alpha + S^2(f)$, where $S^2(f)$ is the maximum squared Sharpe ratio that can be obtained by a portfolio of the factors, 12 $S^2(r)$ is the squared Sharpe ratio of a minimum-variance-efficient portfolio using all of the assets, and $\Sigma$ is the covariance matrix of the assets’ excess returns.
The second fact, which follows from Equation (13), describes the correlation, $\rho$, between the minimum-variance-efficient portfolio of the factors and the minimum-variance-efficient portfolio that uses all of the assets: $\rho = S(f)/S(r)$. Combining these results, $S^2(r) - S^2(f) = \alpha'\Sigma^{-1}\alpha \le \alpha'\alpha \, \lambda_{\max}(\Sigma^{-1}) = \alpha'\alpha / \lambda_{\min}(\Sigma)$. Substituting for S(r) in terms of $\rho$ and S(f), we arrive at: $[1/\rho^2 - 1] \le \alpha'\alpha / [\lambda_{\min}(\Sigma) S^2(f)]$. Thus, an upper bound on $\alpha'\alpha$ places a lower bound on the squared correlation between the minimum-variance factor portfolio and a minimum-variance-efficient portfolio of the assets. The “exact” version of the APT asserts that $\alpha'\alpha \to 0$ as N grows without bound; thus, the pricing errors of all assets go to zero as the market gets large. This version of the model requires stronger economic assumptions, as described by Connor (1984). Chamberlain (1983) shows that the exact APT is equivalent to the statement that all minimum-variance portfolios are well-diversified, and are thus combinations of the APT factors. In this case, we are essentially back to the original equivalence between the three paradigms holding as N gets large. That is, we have $E(mR - 1) = 0$ if and only if $\alpha = 0$ when m is linear in the APT factors, and equivalently, a combination of the factors is a minimum-variance-efficient portfolio in the large market.
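The correlation bound can be illustrated with a few lines of code. The sketch below takes a vector of pricing errors, a covariance matrix of excess returns and a maximum squared Sharpe ratio for the factors, computes the squared correlation implied by the two facts quoted in the text, and compares it with the lower bound obtained from the smallest eigenvalue of the covariance matrix. The inputs and the function name are hypothetical.

```python
import numpy as np

def squared_correlation_and_bound(alpha, sigma, sharpe_f_sq):
    """Return (rho^2, lower bound on rho^2) for given pricing errors."""
    alpha = np.asarray(alpha, dtype=float)
    # Conservation of squared Sharpe ratios: S^2(r) = alpha' Sigma^{-1} alpha + S^2(f).
    sharpe_r_sq = alpha @ np.linalg.solve(sigma, alpha) + sharpe_f_sq
    rho_sq = sharpe_f_sq / sharpe_r_sq
    lam_min = np.linalg.eigvalsh(sigma)[0]       # smallest eigenvalue
    # 1/rho^2 - 1 <= alpha'alpha / (lam_min * S^2(f))
    # is equivalent to rho^2 >= 1 / (1 + alpha'alpha / (lam_min * S^2(f))).
    bound = 1.0 / (1.0 + (alpha @ alpha) / (lam_min * sharpe_f_sq))
    return rho_sq, bound

# Hypothetical inputs: three assets with small pricing errors.
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
alpha = np.array([0.01, -0.005, 0.002])
rho_sq, bound = squared_correlation_and_bound(alpha, sigma, sharpe_f_sq=0.25)
print(rho_sq, bound)
```

When the pricing errors are all zero, both quantities equal one, which corresponds to the case in which a combination of the factors is itself minimum-variance efficient.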
12 The Sharpe ratio is the expected excess return divided by the standard deviation.
2.6. Mean-variance efficiency with conditioning information
Most asset-pricing models are stated in terms of expected asset returns, covariances and betas, conditional on the available public information at time t. However, empirical tests traditionally examine unconditional expected returns and betas, or use instruments that are a subset of the available public information. Given the equivalence between mean-variance efficiency and the other asset pricing representations, it follows that all tests of asset pricing models using portfolios have examined whether particular portfolios are either unconditionally minimum variance, or minimum variance conditional on a subset of the information. To understand how such tests are related to the theories we need to examine different concepts of efficiency when there is conditioning information. When there is conditioning information, minimum-variance efficiency may be defined in terms of the conditional means and variances (conditionally efficient), or in terms of unconditional moments. When the objective is to minimize the unconditional variance for a given unconditional mean, but where portfolio strategies may be functions of the information, we have (unconditional) minimum-variance efficiency with respect to the conditioning information. Unconditional efficiency with respect to conditioning information may seem confusing because conditioning information may be employed by the portfolio, but unconditional expectations about that portfolio’s returns are used to define efficiency. However, this information structure is actually quite common. Often the agent conducting a portfolio optimization uses more information than is available to the observer of the outcomes. If the observer does not have the conditioning information, he or she can only form unconditional, or less informed, expectations. Dybvig and Ross (1985) provide an example.
Consider a portfolio manager who is evaluated based on the unconditional mean and variance of the portfolio return. The manager may use conditioning information about future returns in forming the portfolio. Dybvig and Ross show that the manager’s conditionally-efficient portfolio will typically not appear efficient to the uninformed investor. The portfolio that maximizes the manager’s measured performance in this setting is the unconditionally-efficient portfolio with respect to the information. Ferson and Siegel (2001) derive efficient-portfolio strategies with respect to conditioning information and illustrate their properties. Consider an example with two assets: a riskless asset (with gross rate of return $R_f$) and a risky asset with gross return, R. The risky asset’s return is written as:
$$R = R_f + \mu(Z) + e, \tag{17}$$
where $\mu(Z) = E(R - R_f \mid Z)$. The conditional variance of the return given Z is $\Sigma(Z)$. The problem to be solved is:
$$\min_{x(Z)} \; Var\{x(Z)'R\} \quad \text{subject to:} \quad \mu_p = R_f + E\{x(Z)'(R - R_f)\}. \tag{18}$$
The weight function x(Z) specifies the fraction invested in the risky asset, as a function of the conditioning information, Z. Here we provide a constructive derivation. 13 Let $\Lambda(Z)^{-1} = E\{(R - R_f)^2 \mid Z\} = \Sigma(Z) + \mu(Z)^2$ and write the Lagrangian:
$$L(x \mid Z) = E\{x(Z)'\Lambda(Z)^{-1}x(Z) - 2\lambda\,[x(Z)'\mu(Z) - (\mu_p - R_f)]\}. \tag{19}$$
Let $\hat{w}(Z) = x(Z) + a\,y(Z)$, where y(Z) is any function and x(Z) is the optimal solution. If we consider $L(\hat{w}(Z) \mid Z) = L(x(Z) + a\,y(Z) \mid Z)$, then optimality of x(Z) requires $\partial L/\partial a \,|_{a=0} = 0$, that is, $E[\{x(Z)'\Lambda(Z)^{-1} - \lambda\mu(Z)'\}\, y(Z)] = 0$. Since this must hold for all functions y(Z), it implies $x(Z)'\Lambda(Z)^{-1} - \lambda\mu(Z)' = 0$. Solving for x(Z) and evaluating $\lambda$ by substituting the solution back into the constraint that $E\{x(Z)'\mu(Z)\} = (\mu_p - R_f)$ gives the solution for the unconditionally mean-variance efficient strategy with respect to the information, Z:
$$x(Z) = \frac{\mu_p - R_f}{\zeta} \cdot \frac{\mu(Z)}{[\mu(Z)]^2 + \Sigma(Z)}, \tag{20}$$
where
$$\zeta = E\left\{\frac{\mu(Z)^2}{\mu(Z)^2 + \Sigma(Z)}\right\}. \tag{21}$$
The minimized variance implied by this solution is:
$$\sigma_p^2 = (\mu_p - R_f)^2 \, \frac{1 - \zeta}{\zeta}. \tag{22}$$
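The solution can be verified numerically for a signal that takes finitely many values. The sketch below is a minimal illustration under that assumption: it evaluates the weight function and the constant defined in Equations (20) and (21), then checks the mean constraint and computes the unconditional variance of the strategy directly from its first two moments. In this example the variance equals the squared target excess return times (1 − ζ)/ζ. The five-state signal, the parameter values and the function name are hypothetical.

```python
import numpy as np

def efficient_weight(mu_z, sigma2_z, probs, mu_p, r_f):
    """Equations (20)-(21) for a signal Z with finitely many values.

    mu_z[k] and sigma2_z[k] are the conditional mean excess return and the
    conditional variance in state k, which occurs with probability probs[k].
    """
    mu_z, sigma2_z, probs = (np.asarray(v, dtype=float)
                             for v in (mu_z, sigma2_z, probs))
    ratio = mu_z / (mu_z**2 + sigma2_z)
    zeta = np.sum(probs * mu_z * ratio)       # Equation (21)
    weights = (mu_p - r_f) / zeta * ratio     # Equation (20)
    return weights, zeta

# Hypothetical five-state signal with homoskedastic returns.
mu_z = np.array([-0.10, -0.02, 0.01, 0.04, 0.12])
sigma2_z = np.full(5, 0.03)
probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
mu_p, r_f = 1.08, 1.03
x, zeta = efficient_weight(mu_z, sigma2_z, probs, mu_p, r_f)

mean_excess = np.sum(probs * x * mu_z)                       # E{x(Z) mu(Z)}
second_moment = np.sum(probs * x**2 * (mu_z**2 + sigma2_z))  # E{[x(Z)(R - R_f)]^2}
variance = second_moment - mean_excess**2
print(x, zeta, mean_excess, variance)
```

Because the weight is proportional to the conditional mean divided by the conditional second moment, it shrinks toward zero for very large signals of either sign.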
Figure 1 gives an empirical example of the optimal weight as a function of the conditional expected excess return $\mu(Z)$, for a given unconditional mean $\mu_p$, equal to 11.1% per year. This value matches the Standard and Poor’s 500 stock index return for the 1963–1994 sample period. The example assumes homoskedasticity, where $\Sigma(Z)$ is a constant. The weight is shown for several values of $R^2$, defined as the ratio of the variance of the conditional mean to the variance of the stock index return. As $R^2$ approaches zero the weight becomes a constant function. When the conditional expected excess return of the risky asset is zero, the weight in the risky asset is zero. For conditional expected excess returns near zero, the efficient weight appears monotone and nearly linear in $\mu(Z)$. This is similar to other utility-maximizing strategies. For example, assuming a normal distribution, the strategy that maximizes an exponential utility is linear in the conditional expected return. Kim and Omberg (1996) and Campbell and Viceira (1999) solve intertemporal portfolio
13 Thanks to Ludan Liu.
Fig. 1. Optimal weight versus signal. The $R^2$ values are 0.0045, 0.105 and 0.355. [Figure omitted: vertical axis, risky asset weight; horizontal axis, signal.]
problems and find that the portfolio weights are approximately linear in the state variables. Thus, traditional solutions to the portfolio optimization problem imply portfolio weights that are highly sensitive to extreme values of the signal. For example, if the signal is normally distributed, a linear portfolio weight is unbounded. The weight in Equation (20) satisfies $x(Z) \to 0$ as $\mu(Z) \to \pm\infty$. After a certain point, even an optimistic extreme signal leads to purchasing less of the risky asset, when the objective is to attain a given unconditional mean return with the smallest unconditional variance. Intuitively, an extremely high expected return presents an opportunity to reduce risk by taking a small position in the risky asset this period, without compromising the average portfolio performance. 14 The “conservative” nature of the solution implies an interesting “agency” problem in a portfolio management context. The portfolio manager who is evaluated, as is common in practice, on the basis of unconditional mean return relative to unconditional return volatility may be induced to adopt a conservative response to extreme signals in order to maximize the measured performance.
2.6.1. Conditional versus unconditional efficiency
Hansen and Richard (1987) show that in the set of returns that can be generated using conditioning information, an unconditionally efficient strategy with respect to
14 The precise shape of the curve depends on the homoskedasticity assumption used in Figure 1. However, according to Equation (20), if there is heteroskedasticity, where an extreme value of the signal is associated with a large conditional variance, the conservative behavior of the strategy is reinforced. Ferson and Siegel (2001) show that the solution for an n-asset example also implies the portfolio weight is a bounded function of the signal. They also note that the graph of the unconditionally efficient portfolio weight is similar to the redescending influence curves used in robust statistics [e.g., Hampel (1974), Goodall (1983) and Carroll (1989)]. This suggests that the unconditionally efficient portfolios may be empirically robust.
the information must be conditionally efficient, but the reverse is not true. The relation between conditional and unconditional efficiency with respect to conditioning information may be understood in terms of the utility functions for which the solutions are optimal. Ferson and Siegel (2001) show that an unconditionally efficient portfolio with respect to the conditioning information maximizes the conditional expectation of a quadratic utility function in a single-period problem. Since quadratic-utility agents choose mean-variance-efficient portfolios, this implies that an unconditionally efficient portfolio must be a conditionally mean-variance-efficient portfolio. However, other utility functions also lead to conditional mean-variance-efficient portfolios. One example is the exponential utility function previously mentioned, when returns are normally distributed conditional on the information. Ferson and Siegel show that the solution for the unconditionally efficient portfolio with respect to the information is unique, so the exponential utility agent chooses a conditionally mean-variance-efficient portfolio that is not unconditionally efficient with respect to the information. Thus, conditional efficiency does not imply unconditional efficiency with respect to the information. The unconditionally efficient portfolios with respect to given conditioning information are a subset of the conditionally efficient portfolios with respect to the same information. The relation between conditionally and unconditionally efficient portfolios with respect to given conditioning information can be represented as in Figure 2, with the unconditional mean return on the y axis and unconditional standard deviation on the x axis. The usual “fixed weight” mean-standard deviation boundary, which ignores the conditioning information, is the curve farthest to the right in the lower portion of the figure.
There are an infinite number of conditionally efficient portfolio strategies, some examples of which are depicted by the other curves. 15 Some of the conditionally efficient strategies can plot inside the fixed-weight strategy that ignores the conditioning information, as shown by Dybvig and Ross (1985) and illustrated by one of the examples in the figure. The unconditionally efficient strategy with respect to the information Z is the outer envelope of all the conditionally efficient strategies. This is shown as the left-most curve in Figure 2. Hansen and Richard (1987) provide a formal characterization and prove that the outer envelope in Figure 2 has the familiar properties associated with mean-standard deviation boundaries when there is no conditioning information, e.g., two-fund separation. See Ingersoll (1987).
15 To see that there are an infinite number, note that a conditionally minimum-variance efficient strategy solves:
$$\min_{x(Z)} \; Var\{x(Z)'R \mid Z\} \quad \text{s.t.} \quad E\{x(Z)'R \mid Z\} = T(Z),$$
where T(Z) is the target conditional mean return. Each of the infinite possible specifications for the function T(Z) implies a conditionally efficient strategy.
Fig. 2. Minimum variance boundaries. [Figure omitted: vertical axis, expected gross return; horizontal axis, standard deviation.]
2.6.2. Implications for tests
These concepts of minimum-variance efficiency have important implications for tests of asset-pricing models. In principle, we can devise tests to reject the hypothesis that a portfolio is unconditionally efficient or efficient conditional on some observed instruments, but we cannot tell if a portfolio is efficient given all the public information, $\Omega$. If we interpret asset-pricing models as identifying which portfolios are conditionally efficient given $\Omega$, we have a problem. The collection of minimum-variance portfolios, conditional on the market information set, $\Omega$, is larger than the set of minimum-variance portfolios conditional on an observable subset of instruments, Z. Thus, even if we reject that a portfolio is efficient given Z, we cannot infer that it is inefficient given $\Omega$. This is similar to the Roll (1977) critique of tests of the CAPM. Roll pointed out that since the market portfolio of the CAPM cannot be measured, the CAPM cannot be tested without making assumptions about the unobserved market return. The problem here is that we cannot test the conditional CAPM because the full information set $\Omega$ is not observed, unless we make assumptions about the unobserved information set. This problem is present even if the true market portfolio return could be measured. 16 There is an important exception to this conundrum. When tests are based on Equation (2) it is possible to test the model without observing the complete information set, when $m_{t+1}$ depends only on observable data and model parameters. Equation (2) implies that $E(m_{t+1} R_{t+1} \mid Z_t) = 1$, so tests may proceed using the observed instruments $Z_t$. This is the case, for example, in versions of the consumption-based asset-pricing model, when the relevant consumption can be measured. Given a model
16 See Wheatley (1989) for a critique of the earliest conditional asset-pricing studies based on similar logic.
Ch. 12:
Tests of Multifactor Pricing Models, Volatility Bounds and Portfolio Performance
765
in which $m_{t+1}$ is a function of observed data and parameters, it is also possible to use the concept of unconditional efficiency with respect to given information, Z, to conduct tests of the model. This approach is developed in Ferson and Siegel (2002b).

2.7. Choosing the factors

A beta pricing model has no empirical content until the factors are specified, since there will always be a minimum-variance portfolio that satisfies Equation (13). The minimum-variance portfolio can serve as a single factor in Equation (12). Therefore, the empirical content of the model is the discipline imposed in selecting the factors.

There have been three main approaches to specifying empirical factors for multiple-beta asset-pricing models. One approach is to use statistical factor-analytic or principal-components methods. This approach is motivated by the APT, where the “right” factors are the ones that capture all the pervasive (unbounded eigenvalue) risk, leaving only nonsystematic (bounded eigenvalue) risk in the residuals. That approach is pursued by Roll and Ross (1980) and Connor and Korajczyk (1986, 1988), among others. The advantage of the Connor–Korajczyk approach is that the factor extraction is conducted under essentially the same large-markets assumptions that lead to the APT. This lends some rigor to the tests. The disadvantage is that purely statistical factors provide little economic intuition. Burmeister and McElroy (1988) augment statistical factors with a market portfolio and illustrate how to “rotate” the factors, to interpret them relative to more intuitive economic variables.

In a second approach the risk factors are explicitly chosen economic variables or portfolios, selected on the basis of economic intuition [e.g., Chen, Roll and Ross (1986), Ferson and Harvey (1991), Campbell (1993) and Cochrane (1996)]. Here is where Equation (4) should come in.
According to that equation, the factors should be related to consumer wealth, consumption expenditures, and the sufficient statistics for the marginal utility of future wealth in an optimal consumption-investment plan. A third approach for choosing factors uses the cross-sectional empirical relation of stock returns to firm attributes. For example, portfolios are formed by ranking stocks on firm characteristics that are observed to be correlated with the cross-section of average returns. Perhaps the most famous current example is the three-factor model of Fama and French (1993, 1996). Fama and French group common stocks according to their “size” (market value of equity) and their ratios of book value to market value of equity per share. Previous studies such as Keim (1983) and Reinganum (1981) found that stock returns are related to these attributes. Fama and French use the returns of small stocks in excess of large stocks, and the returns of high book-to-market in excess of low book-to-market stocks, as two “factors”. This approach is critiqued on methodological grounds in Section 4.2. The empirical literature which examines multiple-beta pricing models is vast. Fama (1991), Connor and Korajczyk (1995) and Harvey and Kirby (1996) provide selective reviews. Studies typically focus on particular factors, and may mix the three approaches to factor selection. There is scant empirical evidence that focuses directly
on the general question: which of the three methods of factor selection is superior? The answer to this question depends on the application to which the multiple-beta model is put.

In their role as empirical models for security returns, multiple-beta models are used for essentially three things. First, they are used to explain the cross-section of average returns on different securities. This relates to Equation (6), where expected returns differ according to the return covariances with $m_{t+1}$. Second, the models are used to explain predictable patterns in security returns over time. This is the main goal of conditional asset pricing, as discussed in Section 2.1. Finally, multiple-beta models are used to explain the contemporaneous variance of security returns, through the variation of the risk factors. This relates more to multiple regression models, which are often associated with multiple-beta expected return models like Equation (9).

The cross section of expected returns is central for a number of applications. The models’ fitted expected returns serve as estimates of “required” returns, in relation to risk. They are used, among other things, for the cost of equity capital, an important input in corporate project selection problems (see the surveys of Bruner et al. (1998) and Graham and Harvey (2001)), and for portfolio construction and performance evaluation (see Section 5).

There are problems in evaluating the three approaches to factor selection for this purpose. First, the results depend crucially on the “test assets”, or portfolios for which the models are evaluated. If portfolios are formed to emphasize cross-sectional variation in a particular dimension, thus de-emphasizing others, [17] then a model that “explains” that particular dimension will look good. For example, the Fama–French (1993) three-factor model emphasizes size and book-to-market.
Fama and French (1996) find that it captures the cross section of average returns well in size and book-to-market sorted portfolios. However, when confronted with industry returns [Fama and French (1997)] or with cross-sectional variation in average returns related to the momentum effect of Jegadeesh and Titman (1993), the model performs poorly. This issue of portfolio formation has muddled some attempts in the literature to distinguish between the explanatory power of security characteristics versus betas on related factors, for the cross-section of average returns [e.g., Daniel and Titman (1997)]. See Berk (2000) for an analysis and critique.

[17] The total sum of squares in any sample must equal the across-group sum of squares plus the within-group sum of squares.

The second problem in evaluating the methods of factor selection relates to the discussion of conditional and unconditional efficiency in Section 2.6.1. A model may identify a portfolio that is conditionally efficient but unconditionally inefficient. In other words, conditional covariances with a portfolio return could provide an exact description of the cross section of conditional expected returns, while at the same time average returns are not explained by their unconditional covariances with the same portfolio return. To see this algebraically, take the unconditional expectation of Equation (10), and recall that the expectation of the conditional covariance differs
from the unconditional covariance by the covariance of the conditional means. If the assets i differ in their values of $\mathrm{Cov}\{E_t(R_{i,t+1}), E_t(m_{t+1})\}$, we have a problem. Recall that $E_t(m_{t+1})$ is the inverse of the risk-free return. Evidence from Fama and Schwert (1977) and Ferson (1989) shows that different assets’ expected returns have different sensitivities to measures of risk-free interest rates, so this problem may be a serious one.

Although these caveats make it difficult to interpret the evidence in relation to theory, it is still interesting to know which models provide a good empirical description for the cross section of average returns. A few studies compare the alternative approaches to factor selection from this perspective. Lehmann and Modest (1987) take no stand on which approach is superior, but observe that predicted expected returns for a sample of mutual funds can be very sensitive to using the CAPM versus the APT, as well as to different approaches for implementing the APT. Farnsworth et al. (2002) compare a collection of SDF models, including: (1) three-factor models based on the asymptotic principal components of Connor and Korajczyk (1986); (2) three traded economic factors relating to the stock market, government bonds and low-grade corporate bonds; and (3) the three-factor model of Fama and French (1993, 1996). They estimate the models in a common sample of nine primitive “assets”, portfolios emphasizing variation in equity size, book-to-market and momentum, as well as bond market returns. They find that the principal components-based model is the worst-performing model in this group for explaining the cross section of average returns, as summarized by the Hansen–Jagannathan distance measure described in the next section.

Explaining predictability in security returns is another important and controversial application for multi-beta asset-pricing models. Much of the controversy relates to the interpretation.
Fama (1970, 1991) emphasizes that evidence relating to market efficiency involves a “joint hypothesis”. A model of equilibrium (essentially, a specification for the SDF) is jointly tested with the hypothesis that markets are informationally efficient with respect to particular information. If the tests reject, then logically the market could be inefficient or the SDF model could be wrong. From this perspective, predictability that cannot be explained using any of the standard asset-pricing models suggests market inefficiency or, alternatively, the need to move beyond the standard models.

A few studies have compared the alternative approaches to factor selection, for the purpose of explaining return predictability. Ferson and Korajczyk (1995) compare economic factors similar to those chosen by Chen, Roll and Ross (1986) with the asymptotic principal components of Connor and Korajczyk (1986). They study predictability in one-month to two-year returns based on a list of “standard” lagged instruments discussed above, estimating the fraction of the predictable variance of return that is captured by the models. They find that single-factor models can capture about 60% of the predictable variance in a sample of industry returns, while five-factor models capture about 80%. These results are not highly sensitive to the return horizon. The performance of a five-principal-components model and a five-prespecified-factor model are broadly similar for capturing predictability in returns for
all of the horizons. Farnsworth et al. (2002) find that, among the three-factor models, the Fama–French model performs the worst for explaining predictability in their study. Additional evidence that this model performs poorly for capturing return predictability is presented by Kirby (1998) and Ferson and Harvey (1999). Factors with good contemporaneous explanatory power for security returns are useful for risk modeling and for controlling systematic variance in some research contexts. A regression of security returns on a selection of factors does not impose an asset pricing model unless the regression coefficients are restricted: examples are given in Section 4. But it is easier to draw general conclusions about the empirical performance of the methods of factor selection in this setting. In a given sample, a factor analytic approach constructs factors to be highly correlated with the asset returns. If in-sample, contemporaneous correlation is the goal, this approach almost has to be the most effective. Choosing economic variables is likely to be the worst approach, because security returns, and stock returns in particular, are only weakly correlated with most economic data [e.g., Roll (1988)]. Indeed, this low contemporaneous correlation is one motivation for the use of mimicking portfolios, described in Section 4, to replace factors based on economic data in empirical models.
3. Modern variance bounds

3.1. The Hansen–Jagannathan bounds

Hansen and Jagannathan (HJ, 1991) showed how the fundamental asset pricing Equation (1) places restrictions on the mean and variance of $m_{t+1}$. These restrictions depend only on the sample of assets, and thus provide a diagnostic tool for comparing different models of $m_{t+1}$. If a candidate for $m_{t+1}$, corresponding to a particular theory, fails to satisfy the HJ bounds, then it cannot satisfy Equation (1). Recent papers refine and extend the HJ bounds in several directions, and a number of papers and textbooks provide basic reviews [see Ferson (1995)]. [18] We briefly review the case where there is no conditioning information, then move on to extensions with conditioning information.

[18] Snow (1991) considers selected higher moments of the returns distribution. Bansal and Lehmann (1997) derive restrictions on E[ln(m)] that involve all higher moments of m and reduce to the HJ bounds if returns are lognormally distributed. Balduzzi and Kallal (1997) incorporate the implications for the risk premium on an economic variable. Cochrane and Hansen (1992) state restrictions in terms of the correlation between the stochastic discount factor and returns, while Cochrane and Saá-Requejo (2000) consider bounds on the Sharpe ratios of assets’ pricing errors. Hansen, Heaton and Luttmer (1995) develop asymptotic distribution theory for specification errors on stochastic discount factors, where the HJ bounds are a special case, and Ferson and Siegel (2002a) evaluate these standard errors by simulation.

Assume that the random column n-vector R of the assets’ gross returns has mean $E(R) = \mu$ and covariance matrix $\Sigma$. When there is no conditioning information
a stochastic discount factor is defined as any random variable m such that $E(mR) = \mathbf{1}$. Hansen and Jagannathan (1991) show that the stochastic discount factor with minimum variance for its expectation E(m) is given by:

$$m^* = E(m) + [\mathbf{1} - E(m)E(R)]' \, \Sigma^{-1} [R - \mu], \qquad (23)$$

and the minimum variance for an SDF is the variance of $m^*$:

$$\mathrm{Var}(m) \ge [\mathbf{1} - E(m)E(R)]' \, \Sigma^{-1} [\mathbf{1} - E(m)E(R)]. \qquad (24)$$
The proof is instructive. Consider a regression of any m satisfying $E(mR) = \mathbf{1}$ on the asset returns, R. The fitted value is $m^* = E(m) + \mathrm{Cov}(m, R')\Sigma^{-1}[R - \mu]$, and $m = m^* + \epsilon$, where $\epsilon$ is the regression error satisfying $E(\epsilon) = 0 = E(\epsilon R)$. Since $m^*$ is a linear function of R, it follows that $E(\epsilon m^*) = 0$. Thus, $\mathrm{Var}(m) = \mathrm{Var}(m^*) + \mathrm{Var}(\epsilon) \ge \mathrm{Var}(m^*)$. Finally, expanding $E(mR) = \mathbf{1} = E(m)\mu + \mathrm{Cov}(m, R)$ and substituting for $\mathrm{Cov}(m, R)$, we arrive at Equation (23). The right-hand side of Equation (24) is just the variance of the $m^*$ in Equation (23).

The HJ bound is related to the maximum Sharpe ratio that can be obtained by a portfolio of the assets. The Sharpe ratio is defined as the ratio of the expected excess return to the standard deviation of the portfolio return. If the vector of assets’ expected excess returns is $\mu - E(m)^{-1}\mathbf{1}$ and $\Sigma$ is the covariance matrix, the square of the maximum Sharpe ratio is $[\mu - E(m)^{-1}\mathbf{1}]' \Sigma^{-1} [\mu - E(m)^{-1}\mathbf{1}]$. Thus, from Equation (24) the lower bound on the variance of stochastic discount factors is the maximum squared Sharpe ratio multiplied by $E(m)^2$. The larger is the maximum squared Sharpe ratio for a given E(m), the tighter is the bound on Var(m) and the more potential SDFs can be ruled out.

The Hansen and Jagannathan (1991) region for $\{E(m), \sigma(m)\}$ is given by the square root of Equation (24). The boundary of this region is the minimum value of the standard deviation, $\sigma(m)$, for each value of E(m). Some empirical examples are illustrated in Figure 3, corresponding to the different versions of the bounds described below. The bounds are drawn for quarterly data similar to Hansen and Jagannathan, consisting of 3-, 6-, 9- and 12-month Treasury bill returns for the 1964–1986 sample period. For a given hyperbola in Figure 3, as we vary E(m) we move around the $\{E(m), \sigma(m)\}$ boundary. In order for an SDF to satisfy $E(mR) = \mathbf{1}$, its mean and standard deviation must plot above the boundary, “inside the cup”.
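The boundary described by Equation (24) can be traced numerically. The following is a minimal sketch, assuming a (T, N) array of gross returns and using sample moments in place of the population mean and covariance; the function name `hj_bound`, the simulated data and the grid of E(m) values are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def hj_bound(returns, em_grid):
    """Lower bound on sigma(m) for each candidate E(m), per Equation (24).

    returns : (T, N) array of gross returns (hypothetical input).
    em_grid : 1-D array of candidate values for E(m).
    """
    returns = np.asarray(returns, dtype=float)
    mu = returns.mean(axis=0)                               # sample E(R)
    sigma = np.atleast_2d(np.cov(returns, rowvar=False))    # sample covariance
    bounds = []
    for em in np.atleast_1d(em_grid):
        d = 1.0 - em * mu                                   # 1 - E(m) E(R)
        var_lb = d @ np.linalg.solve(sigma, d)              # d' Sigma^{-1} d
        bounds.append(np.sqrt(var_lb))
    return np.array(bounds)

# Illustrative use with simulated returns (not the Treasury bill data).
rng = np.random.default_rng(0)
sim_returns = 1.01 + 0.05 * rng.standard_normal((200, 3))
em_values = np.linspace(0.97, 1.03, 7)
sd_lower_bound = hj_bound(sim_returns, em_values)
```

Plotting `sd_lower_bound` against `em_values` gives one “cup”; a candidate SDF is admissible on this criterion only if its sample mean and standard deviation plot on or above the curve.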
Fig. 3. Hansen–Jagannathan bounds. [Axes: Std(m) (vertical) against E(m) (horizontal).]

The points shown by the “×” symbols are the sample means and standard deviations of the $m_{t+1}$ of Equation (8), using quarterly total consumption data per capita in the USA over the same sample period, and various values of the relative risk aversion, $\alpha$. Note that the SDF does not plot inside even the lowest cup for many values of $\alpha$. In fact, the SDF just touches the boundary of Figure 3 when $\alpha = 71$. The SDF does not enter the highest cups for any value of risk aversion. The simple consumption model does not produce SDFs that are volatile enough. This is a version of the equity premium puzzle
of Mehra and Prescott (1985), reviewed by Mehra and Prescott in Chapter 14 of this volume.

The bound in Equation (24) is not the sharpest lower bound on $\sigma(m)$ that can be derived. Hansen and Jagannathan (1991) show how imposing that $m_{t+1}$ is a strictly positive random variable can sharpen the bound. Computing the bounds imposing positivity requires a numerical search procedure. Another way to sharpen the bounds is through the use of conditioning information.

3.2. Variance bounds with conditioning information

The preceding analysis is based on the unconditional moments. With conditioning information Z, we may consider a stochastic discount factor for (R, Z) to be any random variable m such that $E(mR | Z) = \mathbf{1}$ for all realizations of Z. In principle, everything above could be stated for conditional means and variances, an approach pursued by Gallant, Hansen and Tauchen (1990). This would complicate Figure 3, because we would have to show a new one for each realization of $Z_t$. Alternatively, Hansen and Jagannathan (1991) describe a clever way to extend the analysis to partially exploit the information in a set of lagged instruments, while using unconditional moments to describe the bound. Equation (5) implies that for any set of instruments $E\{m_{t+1} r_{i,t+1} \otimes Z_t | Z_t\} = 0$, and therefore $E\{m_{t+1} r_{i,t+1} \otimes Z_t\} = 0$, where $\otimes$ is the Kronecker product. If we view $\{r_{i,t+1} \otimes Z_t\}$ as the excess returns to a set of “dynamic” trading strategies, the preceding analysis goes through essentially unchanged. (The trading rule holds, at time t, $Z_t$ units of asset i long and $Z_t$ units of the zero-th asset short.)

The approach of Hansen and Jagannathan (1991) is just one way to implement HJ bounds that use conditioning information. To understand how alternative approaches to conditioning information can refine the bounds, let $r_{t+1}$ be the vector of excess returns. In this case, Equation (5) implies the following:

$$E\{m_{t+1} \, r_{t+1} \, f(Z_t)\} = 0 \quad \text{for all functions } f(\cdot), \qquad (25)$$
where the unconditional expectation is assumed to exist. In other words, if we consider $r_{t+1} f(Z_t)$ to represent a possible dynamic trading strategy, then the presence of conditioning information $Z_t$ implies that $m_{t+1}$ should price all the dynamic trading strategies, not just $r_{t+1} \otimes Z_t$. The larger is the set of strategies for which Equation (25) is required to hold, the smaller is the set of $m_{t+1}$’s that can satisfy the condition, and the tighter are the bounds. This is the motivation for extending HJ’s original approach, in order to use the information efficiently.

Three versions of HJ bounds with conditioning information have appeared in the literature. These may be understood through Equation (25). First are the multiplicative bounds of Hansen and Jagannathan (1991), who choose $f(\cdot)$ to be the linear function, $I \otimes Z_t$. Second are the efficient portfolio bounds of Ferson and Siegel (2002a), where $f(\cdot)$ is the set of portfolio weights that may depend on $Z_t$ and sum to 1. Finally, the optimal bounds of Gallant, Hansen and Tauchen (1990) require Equation (25) to hold for all functions $f(\cdot)$.

3.2.1. Efficient portfolio bounds

Efficient portfolio bounds are based on the unconditionally efficient portfolios with respect to the information Z, derived by Ferson and Siegel (2001) and discussed in Section 2.6. Since these portfolios maximize the Sharpe ratio over all dynamic strategies x(Z) whose weights sum to 1.0, they efficiently use the information in Z to tighten the bounds. For given (R, Z), the solutions describe an unconditional mean-standard-deviation boundary, as depicted in Figure 2. Fixed-weight combinations of any two portfolios on an unconditional mean-standard-deviation boundary can describe the entire boundary [Hansen and Richard (1987)]. Thus, efficient portfolio bounds can be formed from two “arbitrary” portfolios from the boundary.

3.2.2. Optimal bounds

Gallant, Hansen and Tauchen (1990) derive optimal bounds that do not restrict $f(\cdot)$ to portfolio functions with weights that sum to 1.0. The solution for the optimal bounds is presented in Ferson and Siegel (2002a) as follows. First, define the following conditional portfolio constants, which are analogous to the efficient-set constants used in the traditional mean-variance analysis [see, e.g., Ingersoll (1987)]:

$$\alpha(Z) = \mathbf{1}'\Sigma(Z)^{-1}\mathbf{1}, \quad \beta(Z) = \mathbf{1}'\Sigma(Z)^{-1}\mu(Z), \quad \text{and} \quad \gamma(Z) = \mu(Z)'\Sigma(Z)^{-1}\mu(Z), \qquad (26)$$

where $\mu(Z)$ and $\Sigma(Z)$ are the conditional mean and variance functions. The stochastic
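discount factor m for (R, Z) with minimum variance for its expectation E(m) is given by

$$m^*(Z) = \zeta(Z) + [\mathbf{1} - \zeta(Z)\mu(Z)]' \, \Sigma(Z)^{-1} [R - \mu(Z)], \qquad (27)$$

where $\zeta(Z)$ is the conditional mean of m given Z, defined as:

$$\zeta(Z) = E(m | Z) = \{1 + \gamma(Z)\}^{-1} \left[ \beta(Z) + \frac{E(m) - E\left( \dfrac{\beta(Z)}{1 + \gamma(Z)} \right)}{E\left( \{1 + \gamma(Z)\}^{-1} \right)} \right]. \qquad (28)$$

The unconditional variance of $m^*(Z)$ is:

$$\mathrm{Var}(m^*(Z)) = \frac{\left[ E(m) - E\left( \dfrac{\beta(Z)}{1 + \gamma(Z)} \right) \right]^2}{E\left( \{1 + \gamma(Z)\}^{-1} \right)} + E[\alpha(Z)] - [E(m)]^2 - E\left( \frac{\beta(Z)^2}{1 + \gamma(Z)} \right). \qquad (29)$$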
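As a numerical illustration of Equation (29), the sketch below evaluates the optimal variance bound from a sample of conditional moments. It assumes the user has already fitted some model for the conditional mean and covariance at each date; the array layout, the function name `optimal_hj_variance_bound` and the example numbers are hypothetical choices made here, and no finite-sample adjustment is applied.

```python
import numpy as np

def optimal_hj_variance_bound(mu_z, sigma_z, em):
    """Sample analogue of Equation (29) for a single candidate E(m).

    mu_z    : (T, N) array; row t is the conditional mean of R given Z_t.
    sigma_z : (T, N, N) array; slice t is the conditional covariance given Z_t.
    em      : scalar candidate value for E(m).
    """
    mu_z = np.asarray(mu_z, dtype=float)
    sigma_z = np.asarray(sigma_z, dtype=float)
    T, N = mu_z.shape
    ones = np.ones(N)
    alpha = np.empty(T)
    beta = np.empty(T)
    gamma = np.empty(T)
    for t in range(T):
        s_inv_one = np.linalg.solve(sigma_z[t], ones)
        s_inv_mu = np.linalg.solve(sigma_z[t], mu_z[t])
        alpha[t] = ones @ s_inv_one        # 1' Sigma(Z)^{-1} 1
        beta[t] = ones @ s_inv_mu          # 1' Sigma(Z)^{-1} mu(Z)
        gamma[t] = mu_z[t] @ s_inv_mu      # mu(Z)' Sigma(Z)^{-1} mu(Z)
    # The four unconditional expectations, estimated by sample means.
    e_inv = np.mean(1.0 / (1.0 + gamma))
    e_ratio = np.mean(beta / (1.0 + gamma))
    e_alpha = np.mean(alpha)
    e_sq = np.mean(beta ** 2 / (1.0 + gamma))
    return (em - e_ratio) ** 2 / e_inv + e_alpha - em ** 2 - e_sq

# Illustrative inputs: conditional moments that happen not to vary with Z.
mu_example = np.tile([1.01, 1.02], (50, 1))
sigma_example = np.tile([[0.04, 0.01], [0.01, 0.09]], (50, 1, 1))
var_bound = optimal_hj_variance_bound(mu_example, sigma_example, em=0.99)
```

When the conditional moments are constant, as in the illustrative inputs, the expression collapses to the quadratic form of Equation (24), which gives a convenient check on an implementation.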
Equation (29) may be used directly to compute the optimal HJ bounds. To implement the bound, it is necessary to specify the conditional mean function $\mu(Z)$ and the conditional variance function $\Sigma(Z)$. Then, the four unconditional expectations that appear in Equation (29) may be estimated from the corresponding sample means, independent of the value of E(m).

3.2.3. Discussion

For given conditioning information, Z, the optimal bounds provide the greatest lower bound on the variance of stochastic discount factors and thus the highest, most restrictive cup. The efficient portfolio bounds incorporate an additional restriction to functions that are portfolio weights, which sum to 1.0 at each date. This reduces the flexibility of the efficient portfolio bounds to exploit the conditioning information, and thus they do not attain the greatest lower bound. Intuitively, suppose there was only one asset. Then the restricted weight could not respond at all to the conditioning information. The multiplicative bound of Hansen and Jagannathan (1991) does not restrict portfolio weights, but neither does it attempt to use the conditioning information efficiently. Bekaert and Liu (2002) and Basu and Stremme (2003) further discuss the relations among the HJ bounds with conditioning information.

Ferson and Siegel (2002a) conduct a simulation study of HJ bounds with conditioning information. They find that sample values of the bounds are upwardly biased, the bias becoming substantial when the number of assets is large relative to the number of time-series observations. This means that studies using the biased bounds run a risk of rejecting too many models for the stochastic discount factor. They derive a finite-sample adjustment for the bounds and show that it helps control the bias.
Sarkissian (2002) uses the adjustments and finds that they change the inferences about international consumption-based asset-pricing models.

The evidence to date leads to several conclusions. First, the multiplicative bounds of HJ can be severely biased in realistic samples. It is important to use the finite-sample adjustment to their location. Second, the optimal bounds of Gallant, Hansen and Tauchen (1990) are more difficult to implement than the multiplicative bounds, requiring the specification of conditional means and variances, but they are the tightest bounds. While not as biased as the multiplicative bounds, they are significantly biased in some finite samples. The finite-sample adjustment should be used and improves their accuracy. Third, the efficient-portfolio bounds are similar in complexity to the optimal bounds, also requiring the specification of the conditional moments. However, unlike the optimal bounds, they remain valid (but inefficient) when these moments are not specified correctly. They have the smallest sampling error variances of all the bounds with conditioning information. They are not as biased as either the optimal or multiplicative bounds, but finite-sample adjustment is still useful.

3.3. The Hansen–Jagannathan distance

Hansen and Jagannathan (1997) develop measures of misspecification for models of the stochastic discount factor. They consider $m(\phi)$, a “candidate” stochastic discount factor, as may be proposed by an asset-pricing model. If the candidate SDF is misspecified, then $E(m(\phi)R - \mathbf{1} | Z) \ne 0$. They propose a measure for how “close” $m(\phi)$ is to a stochastic discount factor that “works”. We first consider the case where there is no conditioning information. It is easy to show that a particular SDF, formed from the asset returns, $m^* = [\mathbf{1}' E(RR')^{-1}] R$, is one that “works” for pricing R. Hansen and Jagannathan measure how close $m(\phi)$ is to $m^*$.
They do this by first projecting $m(\phi)$ on the returns R to get the fitted value $\hat{m} = [E(m(\phi)R') E(RR')^{-1}] R$. They then measure the mean square distance between the fitted values and $m^*$. This is the HJ Distance Measure:

$$HJD = E\{(\hat{m} - m^*)^2\}, \qquad (30)$$

where the sample averages are used in practice to estimate the expectations. Note that $\hat{m} - m^* = [E(m(\phi)R) - \mathbf{1}]' E(RR')^{-1} R$, so we may write HJD as $[E(m(\phi)R) - \mathbf{1}]' E(RR')^{-1} [E(m(\phi)R) - \mathbf{1}]$. This leads to a couple of nice interpretations. First, if we let $g = E(m(\phi)R) - \mathbf{1}$ and $W = E(RR')^{-1}$, then $HJD = g'Wg$ is Hansen’s J-test with a particular W, described in the next section. Second, by analogy with the $T^2$ test, HJD measures the “most mispriced” return. To see this interpretation, recall that $g = \alpha_m = E\{m(\phi)R - \mathbf{1}\}$ is a measure of expected pricing error using $m(\phi)$. The alpha of a portfolio with weight vector x is the scalar $\alpha_p = x'\alpha_m$. Consider the problem of finding the absolutely most mispriced portfolio, relative to its second moment:

$$\max_{x} \; 2x'\alpha + \lambda \left[ x' E(RR') x - E(r_p^2) \right],$$

where $\lambda$ is a Lagrange multiplier. The maximized value of $\alpha_p^2 / E(r_p^2)$ is $\alpha' E(RR')^{-1} \alpha$, equivalent to the HJD measure.
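The quadratic-form expression for HJD lends itself to a short numerical sketch. The code below assumes a length-T series of candidate SDF values and a (T, N) array of gross returns, and replaces expectations with sample averages as the text suggests; the function name `hj_distance` and the simulated inputs are illustrative assumptions.

```python
import numpy as np

def hj_distance(m, returns):
    """Sample HJD = g' E(RR')^{-1} g, with g = E(mR) - 1.

    m       : length-T array of candidate SDF realizations (hypothetical input).
    returns : (T, N) array of gross returns.
    """
    m = np.asarray(m, dtype=float)
    R = np.asarray(returns, dtype=float)
    T = R.shape[0]
    g = (m[:, None] * R).mean(axis=0) - 1.0   # average pricing errors
    second_moment = R.T @ R / T               # sample E(RR')
    return float(g @ np.linalg.solve(second_moment, g))

# Illustrative use: a constant candidate SDF on simulated returns.
rng = np.random.default_rng(0)
sim_returns = 1.01 + 0.05 * rng.standard_normal((240, 3))
candidate = np.full(240, 0.99)
distance = hj_distance(candidate, sim_returns)
```

By construction, the return-based SDF $m^*$ built from the same sample second-moment matrix has zero pricing errors in sample, so its distance is zero up to rounding; any other candidate yields a nonnegative value.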
When there is conditioning information in the form of lagged instruments, Z, then a correctly specified SDF has $E(m(\phi)R - \mathbf{1} | Z) = 0$, which implies $E[m(\phi)(R \otimes Z) - (\mathbf{1} \otimes Z)] = E(0 \otimes Z) = 0$. Let $\tilde{z} \equiv Z ./ E(Z)$, where ./ denotes element-by-element division. The previous equation holds only if $E[m(\phi)(R \otimes \tilde{z}) - (\mathbf{1} \otimes \tilde{z})] = 0$, or $E[m(\phi)(R \otimes \tilde{z})] = \mathbf{1}$. If we define $\tilde{R} \equiv R \otimes \tilde{z}$, we have $E[m(\phi)\tilde{R}] = \mathbf{1}$, and we can proceed as before using $\tilde{R}$ instead of R.
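The construction of the scaled returns can be sketched as follows, assuming (T, N) returns and (T, L) lagged instruments whose sample means are nonzero; the function name `scale_returns` and the example arrays are hypothetical. The sample mean of Z stands in for E(Z).

```python
import numpy as np

def scale_returns(returns, instruments):
    """Row-by-row Kronecker product of R_t with Z_t ./ mean(Z).

    returns     : (T, N) array of gross returns.
    instruments : (T, L) array of lagged instruments, already aligned so that
                  row t holds the instruments known before return t is realized.
    Returns a (T, N * L) array; column i * L + j is R_i times scaled Z_j.
    """
    R = np.asarray(returns, dtype=float)
    Z = np.asarray(instruments, dtype=float)
    z_scaled = Z / Z.mean(axis=0)             # element-by-element division
    T = R.shape[0]
    return np.einsum("ti,tj->tij", R, z_scaled).reshape(T, -1)

# Illustrative use: a constant plus one hypothetical lagged instrument.
rng = np.random.default_rng(0)
sim_returns = 1.01 + 0.05 * rng.standard_normal((100, 2))
sim_instruments = np.column_stack([np.ones(100), 0.05 + 0.01 * rng.random(100)])
scaled = scale_returns(sim_returns, sim_instruments)   # shape (100, 4)
```

The resulting array can be passed to any routine written for the case without conditioning information, since each scaled payoff should again be priced at one by a correctly specified SDF.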
4. Methodology and tests of multifactor asset-pricing models

The method of moments is briefly reviewed as a general way to test models based on Equation (2). This general framework is then specialized to discuss various tests of asset-pricing models. The special cases include cross-sectional regressions and multivariate regressions.

4.1. The Generalized Method of Moments approach

Let $x_{t+1}$ be a vector of observable variables. Given a model which specifies $m_{t+1} = m(\theta, x_{t+1})$, estimation of the parameters $\theta$ and tests of the model can proceed under weak assumptions, using the Generalized Method of Moments (GMM), as developed by Hansen (1982). Define the model error term:

$$u_{i,t+1} = m(\theta, x_{t+1}) R_{i,t+1} - 1. \qquad (31)$$
Suppose that we have a sample of N assets and T time periods. Combine the error terms from Equation (31) into a T × N matrix u, with typical row $u_{t+1}'$. Equation (2) and the model for $m_{t+1}$ imply that $E(u_{i,t+1} | Z_t) = 0$ for all i and t, and therefore $E(u_{t+1} \otimes Z_t) = 0$ for all t. The condition $E(u_{t+1} \otimes Z_t) = 0$ says that $u_{t+1}$ is orthogonal to $Z_t$, and is therefore called an orthogonality condition. Define the NL-vector of sample mean orthogonality conditions, $g = \mathrm{vec}(Z'u/T)$, where Z is a T × L matrix of observed instruments with typical row $Z_t'$, a subset of the available information at time t. [19]

[19] The vec(·) operator stacks the columns of a matrix. We assume that the same instruments are used for each of the asset equations. In general, each asset equation could use a different set of instruments, which complicates the notation.

Hansen’s (1982) GMM estimates of $\theta$ are obtained as follows. Search for parameter values that make g close to zero by minimizing a quadratic form $g'Wg$, where W is a fixed NL × NL weighting matrix. Hansen (1982) shows that the estimators of $\theta$ that minimize $g'Wg$ are consistent and asymptotically normal, for any fixed W. If W is chosen to be the inverse of a consistent estimate of the covariance matrix of the orthogonality conditions, g, the estimators are asymptotically efficient in the class of
estimators that minimize $g'Wg$ for fixed W’s. The asymptotic variance matrix of the GMM estimator of the parameter vector is then:

$$\widehat{\mathrm{Cov}}(\theta) \approx \left[ T \left( \frac{\partial g}{\partial \theta} \right)' W \left( \frac{\partial g}{\partial \theta} \right) \right]^{-1}, \qquad (32)$$

where $\partial g / \partial \theta$ is an NL × dim($\theta$) matrix of derivatives. Hansen (1982) also shows that $J = T g'Wg$ is asymptotically chi-square distributed, with degrees of freedom equal to the difference between the number of orthogonality conditions NL and the number of parameters, dim($\theta$). This is Hansen’s J-statistic, mentioned in the last section, which serves as a goodness-of-fit statistic for the model.

Several choices for the weighting matrix W are available. A simple version of the optimal choice, where $W = \widehat{\mathrm{Cov}}(g)^{-1}$, is:

$$\widehat{\mathrm{Cov}}(g) = \frac{1}{T} \sum_t g_t g_t' = \frac{1}{T} \sum_t \left( u_{t+1} u_{t+1}' \right) \otimes \left( Z_t Z_t' \right), \qquad (33)$$

and $\otimes$ denotes the Kronecker product. This case applies when the error terms $u_t$, and therefore the moment conditions $g_t$, are serially uncorrelated. More general cases, and more detailed reviews are available in Hamilton (1994), Ferson (1995), Harvey and Kirby (1996) and Cochrane (2001), among others.

4.2. Cross-sectional regression methods

Much of the early empirical work on asset pricing used cross-sectional regressions of returns on estimates of market betas [e.g., Lintner, reported in Douglas (1969)]. The approach remains popular. Multiple-beta models, in particular, are often studied using this technique [e.g., Chen, Roll and Ross (1986), Ferson and Harvey (1991), Fama and French (1993), Lettau and Ludvigson (2001)]. Cross-sectional regression is appealing because it is an intuitive approach. Taking the simple CAPM as an example, we hypothesize: $E(R_i) = R_f + \beta_i E(R_m - R_f)$, $i = 1, \ldots, N$. The model implies that the cross-sectional relation between mean returns and betas has a slope equal to the expected excess return of the market. The intercept should be a risk-free return, or a zero-beta portfolio expected return.
Let’s start our discussion of cross-sectional regressions with a classical two-step approach, similar to that of Black, Jensen and Scholes (1972) or Fama and MacBeth (1973). For the first step, suppose that market betas are constant over time. The betas come from:

$$r_{it} = a_i + r_{mt}'\beta_i + \epsilon_{it}, \quad t = 1, \ldots, T \quad \text{for each } i. \qquad (34)$$

For now we ignore the estimation error in the time-series estimates of beta. This will be discussed later. The second step is a cross-sectional regression for each month:

$$r_{it} = \lambda_{0t} + \lambda_{1t}'\beta_i + u_{it}, \quad i = 1, \ldots, N. \qquad (35)$$

There could be K > 1 betas if we are testing a multi-factor asset-pricing model. In that case $r_{mt}$ is a vector of K excess returns, and $\lambda_{1t}$ is a K-vector of slope coefficients.
It is instructive to consider the GMM solution to the cross-sectional regression estimator. Define $g_t = (1/N) \sum_i (r_{it} - \lambda_{0t} - \lambda_{1t}'\beta_i) \otimes (1, \beta_i')'$. Choose the parameters to minimize $g_t' W g_t$ for some weighting matrix, W. Here the model is exactly identified, with the same number of parameters as moment conditions, so the GMM solution may be obtained by setting $g_t = 0$. This results in:

$$\hat{\lambda}_{0t} = \frac{1}{N} \sum_i r_{it} - \hat{\lambda}_{1t}' \frac{1}{N} \sum_i \beta_i, \qquad \hat{\lambda}_{1t} = \left( \sum_i \beta_i \beta_i' \right)^{-1} \sum_i \beta_i \left( r_{it} - \hat{\lambda}_{0t} \right). \qquad (36)$$
Iteratively solving Equation (36) yields estimates of the premium for market beta, $\lambda_{1t}$, and the zero-beta return, $\lambda_{0t}$, similar to Black, Jensen and Scholes (1972). If the risk-free rate is known, we have a cross-sectional regression of excess returns on betas. Of course, we want to use the cross-sectional regression to test hypotheses on the coefficients. For example, the hypothesis that $E(\lambda_{1t}) = 0$ says that the expected market risk premium is zero, or that beta has no cross-sectional explanatory power for returns. Alternatively, we may hypothesize that $E(\lambda_{1t} - r_{mt}) = 0$, which says that the premium is the market excess return. Standard errors for the coefficients may be obtained from Equation (36), which implies that $\hat{\lambda}_{1t} - \lambda_{1t} = \left( \sum_i \beta_i \beta_i' \right)^{-1} \sum_i \beta_i u_{it}$, so that

$$\mathrm{Var}\left( \hat{\lambda}_{1t} - \lambda_{1t} \right) = \Big( \sum_i \beta_i \beta_i' \Big)^{-1} \mathrm{Var}\Big( \sum_i \beta_i u_{it} \Big) \Big( \sum_i \beta_i \beta_i' \Big)^{-1} = (B'B)^{-1} B' \, \mathrm{Cov}(u_t) \, B (B'B)^{-1}, \tag{37}$$
where $B$ is the $N \times K$ matrix of betas. Note that the variance of the estimators given by Equation (37) is not the same as the OLS solution, $\sigma_{ut}^2 (B'B)^{-1}$, where $\sigma_{ut}^2$ is a scalar, that one would obtain using a standard regression package to run a cross-sectional regression. Only in the special case where $\mathrm{Cov}(u_t) = \sigma_{ut}^2 I_N$ are the OLS standard errors correct. This would occur if the cross-sectional regression errors were uncorrelated across assets and homoskedastic across assets – a very unlikely scenario for stock market return data.

4.2.1. The Fama–MacBeth approach

Fama and MacBeth (1973) devise a simple and clever way to get estimates of the standard errors, while accounting for cross-sectional dependence. They suggest using the time-series of the estimators from a sequence of cross-sectional regressions, one for each month in the sample, to compute the standard error of the mean
Ch. 12:
Tests of Multifactor Pricing Models, Volatility Bounds and Portfolio Performance
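coefficient. In testing the hypothesis that $\lambda_{1t} = 0$, they propose a simple t-ratio: $(1/T) \sum_t \hat{\lambda}_{1t} \, / \, \mathrm{se}\big((1/T) \sum_t \hat{\lambda}_{1t}\big)$, where the standard error is estimated by:

$$\mathrm{se}\Big( \frac{1}{T} \sum_t \hat{\lambda}_{1t} \Big) \approx \frac{1}{\sqrt{T}} \left[ \frac{1}{T} \sum_t \Big( \hat{\lambda}_{1t} - \frac{1}{T} \sum_t \hat{\lambda}_{1t} \Big)^2 \right]^{1/2}. \tag{38}$$

We can evaluate this using the previous equations. First, the sample variance of $\hat{\lambda}_{1t}$ is examined under the null hypothesis that $\lambda_{1t}$ is zero.

$$\begin{aligned}
E\Big\{ \frac{1}{T} \sum_t \Big( \hat{\lambda}_{1t} - \frac{1}{T} \sum_t \hat{\lambda}_{1t} \Big)^2 \Big\}
&= E\Big\{ \frac{1}{T} \sum_t \Big( \big[ \lambda_{1t} + (B'B)^{-1} B' u_t \big] - \frac{1}{T} \sum_t \big[ \lambda_{1t} + (B'B)^{-1} B' u_t \big] \Big)^2 \Big\} \\
&= E\Big\{ \frac{1}{T} \sum_t \Big( (B'B)^{-1} B' \Big[ u_t - \frac{1}{T} \sum_t u_t \Big] \Big)^2 \Big\}, \quad \text{if } \lambda_{1t} = 0, \\
&= E\Big\{ (B'B)^{-1} B' \Big[ \frac{1}{T} \sum_t \Big( u_t - \frac{1}{T} \sum_t u_t \Big) \Big( u_t - \frac{1}{T} \sum_t u_t \Big)' \Big] B (B'B)^{-1} \Big\} \\
&= (B'B)^{-1} B' \, [\mathrm{Cov}(u)] \, B (B'B)^{-1},
\end{aligned}$$

which is the same as Equation (37). If we assume the $\hat{\lambda}_{1t}$ are uncorrelated over time, with a constant variance, we have $\mathrm{Var}\{(1/T) \sum_t \hat{\lambda}_{1t}\} = (1/T)^2 \sum_t \mathrm{Var}\{\hat{\lambda}_{1t}\}$. Estimating $\mathrm{Var}\{\hat{\lambda}_1\}$ with the sample variance of the time series of the $\hat{\lambda}_{1t}$, as in Equation (38), produces the correct result. Thus, if we ignore estimation error in the betas, and assume that stock returns are serially uncorrelated, then under the null hypothesis that $\lambda_{1t} = 0$, the Fama–MacBeth approach delivers the correct standard errors.

4.2.2. Interpreting the estimates

Fama (1976) provides an intuitive interpretation of the cross-sectional regression estimators as portfolio returns. To fix things, start with the cross-sectional regression $R_{it} = \lambda_{0t} + \lambda_{1t} \beta_i + u_{it}$, $i = 1, \ldots, N$. Let there be a single beta ($K = 1$). The CAPM implies that $E(\lambda_{1t}) = E(R_{mt} - R_{ft})$ and $E(\lambda_{0t} - R_{ft}) = 0$. Under the standard assumptions that make OLS best linear unbiased, the cross-sectional estimator solves:

$$\begin{aligned}
&\hat{\lambda}_{1t} \ \text{solves} \ \min_{\{w_i\}} \sum_i \hat{u}_{it}^2, \ \text{subject to:} \\
&\text{Unbiased:} \quad E(\hat{\lambda}_{1t}) = \lambda_{1t}, \\
&\text{Linear:} \quad \hat{\lambda}_{1t} = \sum_i w_i R_{it}.
\end{aligned} \tag{39}$$

We can use these conditions to characterize $\hat{\lambda}_{1t}$ as a portfolio. 20 In particular: $E(\hat{\lambda}_{1t}) = E\big(\sum_i w_i R_{it}\big) = E\big(\sum_i w_i [\lambda_{0t} + \lambda_{1t} \beta_i + u_{it}]\big) = \lambda_{1t}$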
20 Since $u_{it}$ is likely to be correlated across assets, GLS is better in theory. This amounts to a transformation of the asset returns and their betas into a different set of portfolios, then running OLS on the new portfolios. Therefore, the intuition here will translate.
implies $\sum_i w_i = 0$ and $\sum_i w_i \beta_i = 1$. This shows that the portfolio has weights $\{w_i\}$ on the assets which sum to zero, and has a beta equal to one. 21 The first condition, $\sum_i w_i = 0$, says that the return is an excess return. The second condition, $\sum_i w_i \beta_i = 1$, says that the portfolio beta must equal 1.0. In order for the weights to sum to zero while the beta is positive, the portfolio must be “long” (positive weights) in high beta securities, and also “short” (negative weights) in low beta securities. A similar analysis restricts the intercept estimator, implying that $\sum_i w_i = 1$ and $\sum_i w_i \beta_i = 0$. Thus, the intercept is a fully invested portfolio with no “systematic”, or factor-related risk. Its expected return should therefore be the zero-beta rate. If there is a risk-free security, this should be the risk-free rate.

The Fama–MacBeth cross-sectional regression coefficients represent one way to obtain the excess returns on mimicking portfolios for the risk factors. This is especially useful if the factors are not traded excess returns. Regression betas on nontraded variables, such as consumption, GNP growth or inflation, can be used. In this case, the Fama–MacBeth coefficients deliver excess returns, whose expected values are risk premiums for the factors. Indeed, the preceding analysis goes through if the cross-sectional regressors are not betas. For example, studies have used attributes such as firm size, dividend yield, or book-to-market ratio in place of beta coefficients.

4.2.3. A caveat

The Fama–MacBeth procedure constructs a “factor-mimicking” portfolio, for anything that we put on the right-hand side of the regression. This raises a potentially serious caveat. If a firm attribute is used that represents an anomaly, even if completely unrelated to risk, the procedure can deliver a mimicking portfolio return that may appear to work as a risk factor. This caveat is explored by Ferson, Sarkissian and Simin (1999).
For a simple illustration, consider the following hypothetical regression:

$$R_{it} = \lambda_{0t} + \lambda_{1t} \alpha_i + u_{it}, \quad i = 1, \ldots, N, \tag{40}$$
where $\alpha_i$ is an anomaly in the average return of asset $i$. Let $A_i \equiv \alpha_i - (1/N) \sum_i \alpha_i$. Then the OLS Fama–MacBeth slope estimator constructs the portfolio:

$$R_{pt} = \hat{\lambda}_{1t} = \sum_i w_i R_{it}, \qquad w_i = \frac{A_i}{\sum_i A_i^2}. \tag{41}$$
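The portfolio interpretation in Equation (41) is easy to check numerically. The sketch below uses made-up attribute values and returns for a single period (all names and numbers are hypothetical) and compares the weighted sum of returns with the OLS slope from a regression on a constant and the attribute.

```python
import numpy as np

def fm_slope_weights(attribute):
    """Weights w_i = A_i / sum_i A_i^2 from Eq. (41), with A_i the demeaned attribute."""
    a = np.asarray(attribute, dtype=float)
    A = a - a.mean()
    return A / np.sum(A ** 2)

# Hypothetical data for one cross-section of N = 50 assets.
rng = np.random.default_rng(0)
attr = rng.normal(size=50)                  # attribute values (the alpha_i of Eq. 40)
R_t = 0.01 + 0.02 * rng.normal(size=50)     # returns in a single period

w = fm_slope_weights(attr)
portfolio_return = w @ R_t                  # R_pt in Eq. (41)

# OLS slope from regressing R_t on a constant and the attribute.
X = np.column_stack([np.ones(attr.size), attr])
ols_slope = np.linalg.lstsq(X, R_t, rcond=None)[0][1]

weights_sum = w.sum()                       # zero-investment condition
unit_exposure = w @ attr                    # exposure of one to the attribute
```

In this sketch `portfolio_return` equals `ols_slope` up to rounding, `weights_sum` is zero and `unit_exposure` is one, which are the restrictions derived in Section 4.2.2 with the attribute in place of beta.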
Suppose we used $R_{pt}$ as a “factor” in an asset-pricing model. Would it appear to “price”, i.e., would returns be linear in covariances with $R_{pt}$?

$$\mathrm{Cov}(R_t, R_{pt}) = \mathrm{Cov}(R) \, w = \frac{\mathrm{Cov}(R) \, A}{\sum_i A_i^2}, \tag{42}$$
21 This condition would also apply in a multi-beta context, in which case the coefficient for a particular beta is a portfolio return with unit beta on the particular factor. Unbiasedness would also imply that the betas on the other factors equal zero, so the portfolio targets only the risk as represented by the factor in question.
where $A$ is the $N$-vector of the $A_i$’s. If $\mathrm{Cov}(R) A \propto A$, then the vector of covariances with $R_{pt} = \hat{\lambda}_{1t}$ will appear to “explain” the cross-section of expected asset returns. For example: suppose $\mathrm{Cov}(R) = I$. Then, returns are independent and there is no systematic risk. But, $\mathrm{Cov}(R) A \propto A$, so the “factor” $R_{pt}$, formed by the Fama–MacBeth approach, will appear to work perfectly, in the sense that covariances with the factor return will exactly explain the cross-section of expected returns!

Similar results should be obtained when “spread” portfolios replace the FM coefficients, as in Fama and French (1993, 1996). Spread portfolios are formed as the difference between a high-attribute portfolio and a low-attribute portfolio return. Thus, they are long the high-attribute stocks, short the low-attribute stocks, and their weights sum to zero. A cross-sectional regression coefficient for stock returns on the attribute is also a linear combination of the returns, with weights that sum to zero. The portfolio is long in high-attribute stocks and short in low-attribute stocks. If a multiple regression is used, it has zero exposure to the other regressors. Subject to these conditions, it has minimum variance. A spread portfolio has a similar property if multiple independent sorts are used, as in Fama and French (1996), to control for the other attributes. While a spread portfolio does not explicitly minimize variance subject to these conditions, it avoids estimation error.

Ferson, Sarkissian and Simin (1999) provide an example where Fama–MacBeth coefficients and spread portfolios, similar to Fama and French (1996), produce similar results in the face of an anomaly in asset returns. Their example shows that an arbitrary attribute, bearing an anomalous relation to returns, can be repackaged as a spurious risk factor. Recent studies employing the approach of Fama and French (1996) do not use arbitrary anomalous attributes.
Some of the most empirically powerful characteristics for the cross-sectional prediction of returns are ratios, with market price per share in the denominator. Berk (1995) emphasizes that the price of any stock is the value of its future cash flows discounted by future returns, so an anomalous pattern in the cross-section of returns would produce a corresponding pattern in book-to-market ratios or other proxies of cash-flow-to-price. A cross-sectional regression of returns on these ratios will pick out the anomalous patterns. Thus, the use of valuation ratios such as book-to-market as a sorting criterion increases the risk of creating a spurious risk factor.

In the real world, empirically measured attributes may be correlated with systematic risk and also with anomalous patterns in return. The net result of the two effects, risk versus anomaly, is complicated and model specific. However, equity market databases are inherently unbalanced panels, with more stocks than quarters or months. As new data on equity attributes becomes widely accessible, more studies will sort securities according to their attributes. The important caveat is that sorting procedures are subtle and easily abused. More work is needed to improve our understanding of the properties of such approaches.
4.2.4. Errors-in-betas

When the cross-sectional regression uses betas that are measured with error, two main issues arise. First, the cross-sectional regression coefficients suffer from a classical “attenuation bias”. Second, the standard errors are biased. Early studies that used cross-sectional regression also used portfolio grouping procedures, attempting to minimize errors in the betas, and to ensure that the remaining errors were uncorrelated with the other error terms in the model. More recently, empirical studies have taken to sorting stocks in order to accentuate some anomaly in the data, such as firm size, book-to-market, etc., in order to “challenge” the asset pricing model more forcefully. Thus, concerns about errors in the betas remain relevant. Consider first the cross-sectional regression model with no errors in the betas:

$$R_t = \lambda_{0t} + B \lambda_t + u_t, \tag{43}$$
where $\mathrm{Cov}(u_t, B) = 0$ and $R_t$ is an $N$-vector of returns. Assume that we don’t get to see the true $B$; instead we have $B^*$, where:

$$B^* = B + v = \text{true} + \text{“noise”}, \qquad \mathrm{Cov}(v, B) = 0. \tag{44}$$

Using the first stage time-series or GMM estimation, we can get an estimate of $\mathrm{Cov}(v)$, the cross-sectional covariance matrix of the errors-in-betas. If we run the cross-sectional regression on the noisy betas:

$$R_t = \lambda_{0t} + B^* \lambda_t + e_t, \tag{45}$$

then $\hat{\lambda}_1 \to_p \mathrm{Cov}(B^*)^{-1} \mathrm{Cov}(B^*, R_t) = [\mathrm{Cov}(B) + \mathrm{Cov}(v)]^{-1} \mathrm{Cov}(B + v, B \lambda_t + u_t) = [\mathrm{Cov}(B) + \mathrm{Cov}(v)]^{-1} \{\mathrm{Cov}(B) \lambda_t\}$. Theil (1971) proposes an adjusted estimator to control the bias:

$$\lambda_t^* \equiv \left[ \mathrm{Cov}(B^*) - \mathrm{Cov}(v) \right]^{-1} \mathrm{Cov}(B^*) \, \hat{\lambda}_1 \to_p \lambda_t. \tag{46}$$
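A small simulation can illustrate the attenuation bias and the adjustment in Equation (46) for the single-beta case. Everything here is a stylized assumption: the parameter values are invented, the variance of the beta errors is treated as known rather than estimated from a first stage, and the helper `theil_adjusted_slope` is a hypothetical name.

```python
import numpy as np

def theil_adjusted_slope(noisy_betas, returns, noise_var):
    """Naive OLS slope on noisy betas and the adjusted slope of Eq. (46), scalar case."""
    b_star = np.asarray(noisy_betas, dtype=float)
    R = np.asarray(returns, dtype=float)
    var_b_star = np.var(b_star)                        # cross-sectional Var(B*)
    naive = np.cov(b_star, R, ddof=0)[0, 1] / var_b_star
    adjusted = var_b_star / (var_b_star - noise_var) * naive
    return naive, adjusted

# Hypothetical cross-section: true premium 0.08, Var(beta) = Var(v) = 0.09.
rng = np.random.default_rng(42)
N = 200_000
true_premium = 0.08
beta = 1.0 + 0.3 * rng.standard_normal(N)              # true betas
v = 0.3 * rng.standard_normal(N)                       # errors in betas
u = 0.05 * rng.standard_normal(N)                      # regression errors
R = 0.01 + true_premium * beta + u                     # returns for one period

naive, adjusted = theil_adjusted_slope(beta + v, R, noise_var=0.09)
```

With these settings the probability limit of the naive slope is 0.08 × 0.09 / (0.09 + 0.09) = 0.04, half the true premium, while the adjusted slope is close to 0.08 in a large cross-section.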
This estimator is used by Black and Scholes (1974), and Litzenberger and Ramaswamy (1979, 1982).

Most of the preceding analysis assumes that the same betas are used in each cross-sectional month. Under this simplifying assumption, errors in betas imply that the cross-sectional regression coefficients are not independent over time, because the same beta (with error) is used in each month. Shanken (1992) shows how to correct Fama–MacBeth standard errors for this fact. In principle, the cleanest way to deal with errors-in-betas is to estimate the time-series model of betas and the cross-sectional regression simultaneously, thus accounting for the estimation error. This is the subject of the next section.
4.3. Multivariate regression and beta-pricing models

Tests of portfolio efficiency using multivariate regression analysis and maximum likelihood became popular in empirical finance following the work of Jobson and Korkie (1982, 1985), Gibbons (1982) and Stambaugh (1982). The traditional focus of this literature is tests of the CAPM, which implies that the market portfolio is mean-variance efficient. However, given the discussion in Section 2.3, multibeta models and stochastic discount factor models also imply that some portfolio is minimum variance efficient, and the techniques reviewed here can be applied. Following the traditional literature, we ignore the presence of conditioning information in this section. [For tests of efficiency with conditioning information, see Ferson and Siegel (2002b)]. For simplicity, consider the case where a risk-free asset exists. Let $r_t = \{R_{it} - R_{Ft}\}_i$ be an $N$-vector of excess returns, and let $r_{pt} = R_{pt} - R_{Ft}$ be the excess return of the particular portfolio to be tested. The null hypothesis to be tested is that $R_{pt}$ is a minimum-variance portfolio, which from Equation (13) is equivalent to:

$$E(r_t) = \beta E(r_{pt}); \qquad \beta \equiv \mathrm{Cov}(r_t; r_{pt}) \, \mathrm{Var}(r_{pt})^{-1}, \tag{47}$$

when there is a given risk-free rate. The tests are based on a regression model:

$$r_t = \alpha + \beta r_{pt} + u_t, \qquad u_t \sim \text{iid}(0, \Omega), \tag{48}$$
where the null hypothesis implies that the vector of alphas or intercepts, $\alpha = 0$. MacKinlay and Richardson (1991) illustrate that it is easy to use the GMM to implement the tests of portfolio efficiency. To do so, one can form the moment conditions:

$$u_t = r_t - \alpha - \beta r_{pt}, \qquad Z_t = (1, r_{pt})', \qquad g = \frac{1}{T} \sum_t (u_t \otimes Z_t). \tag{49}$$
The parameters are $\phi = (\alpha', \beta')'$. Choosing the parameters to $\min_\phi g'Wg$, we have the GMM estimators, which are the same as seemingly-unrelated OLS. These are consistent and asymptotically normal, even without the assumptions that the error terms are independent and identically distributed over time. It is assumed that the data are stationary, that $E(u_t) = 0 = E(u_t r_{pt})$, and other technical conditions given by Hansen (1982). If the assumptions that justify OLS as best linear unbiased are imposed, GMM delivers the OLS standard errors as well. If the GMM uses the “optimal” weighting matrix, $W = (1/T)[\mathrm{Cov}(g)]^{-1}$, then the asymptotic variance of the parameters is given by Equation (32). Imposing that $u_t \sim \text{iid}(0, \Omega)$, the GLS standard errors fall out as a special case. Several tests for the hypothesis that $\alpha = 0$ are available using the GMM. [See, e.g.,
Newey and West (1987)]. One example is the Wald test, which may be formed as $T \alpha' \, \mathrm{A\,Cov}(\alpha)^{-1} \alpha$, where $\mathrm{A\,Cov}(\cdot)$ denotes the asymptotic covariance. 22 The Wald statistic is asymptotically distributed as a Chi-squared variable, with degrees of freedom equal to the dimension of $\alpha$. Much of the literature works in a normal, maximum likelihood setting. In this case, the log of the likelihood function to be maximized is:

$$\ln L = -\frac{NT}{2} \ln(2\pi) - \frac{T}{2} \ln |\Omega| - \frac{1}{2} \sum_t \left( r_t - \alpha - \beta r_{pt} \right)' \Omega^{-1} \left( r_t - \alpha - \beta r_{pt} \right). \tag{50}$$
Standard tests for the hypothesis that $\alpha = 0$ are compared by Buse (1982) and Gibbons, Ross and Shanken (1989), and most of the standard tests have been used to test the efficiency of stock market indexes, as in the CAPM. Examples include the likelihood ratio test [Gibbons (1982)], the Lagrange multiplier test [Stambaugh (1982)] and the Wald test [Gibbons, Ross and Shanken (1989)]. The Wald test is of particular interest. This is not because of its sampling performance, which is typically the worst of the three, but because it leads to a graphical interpretation that provides some economic intuition for the tests. Since the likelihood ratio and Lagrange multiplier tests are simple transformations of the Wald statistic, as shown by Buse (1982), a similar intuition would apply.

We first need some facts about squared Sharpe ratios. The Sharpe ratio of $r_p$ is $E(r_p)/\sigma(r_p)$, the ratio of the expected excess return to the standard deviation. Let $S^2(r)$ be the maximum squared Sharpe ratio that can be obtained using fixed-weight portfolios of the $N$ assets:

$$S^2(r) \equiv \max_x \frac{(x' E(r))^2}{x' \Sigma x} = E(r)' \Sigma^{-1} E(r), \tag{51}$$
where the second equality follows from solving the calculus problem. The maximum squared Sharpe ratio in a sample of assets is related to the squared Sharpe ratio of a tested portfolio, $r_p$, included among the test assets, through a quadratic form in the alphas. I call this result the:

Law of Conservation of Squared Sharpe Ratios.

$$S^2(r) = \alpha' \Sigma^{-1} \alpha + S^2(r_p). \tag{52}$$
22 The notation is as follows: $\sqrt{T}(\hat{C} - C)$ converges in distribution to a vector with mean zero and variance $\mathrm{A\,Cov}(C)$. Thus, the asymptotic approximation to the finite sample variance of $\hat{C}$ is $(1/T) \, \mathrm{A\,Cov}(C)$.
A proof uses the fact that, in the stacked regression model, $\alpha' \Sigma^{-1} \beta = 0$. 23 Proof:

$$\begin{aligned}
S^2(r) &= E(r)' \Sigma^{-1} E(r), \\
&= \left[ \alpha + \beta E(r_p) \right]' \Sigma^{-1} \left[ \alpha + \beta E(r_p) \right], \\
&= \alpha' \Sigma^{-1} \alpha + 2 E(r_p) \, \alpha' \Sigma^{-1} \beta + E(r_p) \, \beta' \Sigma^{-1} \beta \, E(r_p), \\
&= \alpha' \Sigma^{-1} \alpha + E(r_p) \, \beta' \Sigma^{-1} \beta \, E(r_p), \\
&= \alpha' \Sigma^{-1} \alpha + E(r_p) \left[ \mathrm{Var}(r_p) \right]^{-1} E(r_p), \\
&= \alpha' \Sigma^{-1} \alpha + S^2(r_p).
\end{aligned}$$

The law states that the highest squared Sharpe ratio obtainable in the sample is equal to the squared Sharpe ratio of the tested portfolio, plus a sort of squared Sharpe ratio, based on the alphas. If the alphas are zero, the two Sharpe ratios are the same and the tested portfolio is efficient. When the tested portfolio is not efficient, the quadratic form in the alphas tells how far it is from efficient. This is similar to our previous discussion of how a quadratic form in the APT pricing errors bounds the correlation between a combination of the APT factor portfolios and a minimum variance efficient portfolio. MacKinlay (1995) develops the interpretation of portfolios whose weights are proportional to $\Sigma^{-1} \alpha$, which have many interesting properties.

The law of conservation of squared Sharpe ratios provides a graphical interpretation of the Wald test statistic. Using the law, and the fact that in a multivariate regression model the covariance matrix of the intercept estimator is proportional to the covariance
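23 This occurs when the right-hand side variable(s) are simple combinations of the test assets. In a stacked regression model: $r = \alpha + r_p \beta + u$, where $r_p = rW$ is a combination of the test assets with weight given by the $n \times k$ matrix, $W$ (so that $r$ and $\alpha$ are written here as row vectors). Using the definition $\beta = (W' \Sigma W)^{-1} W' \Sigma$, where $\Sigma$ is the covariance matrix of $r$, then

$$\begin{aligned}
\alpha \Sigma^{-1} \beta' &= E\{ r - r_p \beta \} \, \Sigma^{-1} \left\{ (W' \Sigma W)^{-1} W' \Sigma \right\}', \\
&= E\{ r - r_p \beta \} \, \Sigma^{-1} \Sigma W (W' \Sigma W)^{-1}, \\
&= E\{ r - rW (W' \Sigma W)^{-1} W' \Sigma \} \, W (W' \Sigma W)^{-1}, \\
&= E\{ rW (W' \Sigma W)^{-1} - rW (W' \Sigma W)^{-1} \}, \\
&= 0.
\end{aligned}$$

Note also that $\mathrm{Var}(r_p) = (W' \Sigma W)$, and $\beta \Sigma^{-1} \beta' = (W' \Sigma W)^{-1}$.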
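matrix of the left-hand side asset returns, or $\mathrm{Cov}(\alpha) = (1 + S^2(r_p)) \Sigma$, we may write the Wald test as:

$$\begin{aligned}
\text{Wald} &= T \alpha' \, \mathrm{Cov}(\alpha)^{-1} \alpha, \\
&= T \left[ 1 + S^2(r_p) \right]^{-1} \alpha' \Sigma^{-1} \alpha, \\
&= T \left[ 1 + S^2(r_p) \right]^{-1} \left[ S^2(r) - S^2(r_p) \right].
\end{aligned} \tag{53}$$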
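The law in Equation (52) and the Wald form in Equation (53) can be checked with sample moments. The sketch below follows the notation of the text, taking $\Sigma$ to be the sample covariance matrix of the test-asset excess returns and the tested portfolio to be a fixed-weight combination of those assets. The function name `sharpe_decomposition` and the simulated inputs are hypothetical, and no finite-sample adjustment is attempted.

```python
import numpy as np

def sharpe_decomposition(excess_returns, weights):
    """Sample analogues of the terms in Eqs. (51)-(53) for a portfolio r_p = r w."""
    r = np.asarray(excess_returns, dtype=float)        # T x N
    w = np.asarray(weights, dtype=float)               # N
    T = r.shape[0]

    mu = r.mean(axis=0)                                # sample E(r)
    Sigma = np.cov(r, rowvar=False, ddof=0)            # sample covariance of r
    rp = r @ w                                         # tested portfolio
    mu_p, var_p = rp.mean(), rp.var()

    beta = Sigma @ w / var_p                           # Cov(r, r_p) / Var(r_p), Eq. (47)
    alpha = mu - beta * mu_p                           # intercepts of Eq. (48)

    s2_max = mu @ np.linalg.solve(Sigma, mu)           # S^2(r), Eq. (51)
    s2_p = mu_p ** 2 / var_p                           # S^2(r_p)
    alpha_quad = alpha @ np.linalg.solve(Sigma, alpha) # alpha' Sigma^{-1} alpha
    wald = T * alpha_quad / (1.0 + s2_p)               # Eq. (53)
    return s2_max, s2_p, alpha_quad, wald

# Hypothetical sample: 240 months of excess returns on 5 assets, equal weights.
rng = np.random.default_rng(7)
sim_returns = 0.005 + 0.04 * rng.standard_normal((240, 5))
s2_max, s2_p, alpha_quad, wald = sharpe_decomposition(sim_returns, np.full(5, 0.2))
```

In this sketch `s2_max` equals `alpha_quad + s2_p` up to rounding, so the Wald statistic can equivalently be computed as `T * (s2_max - s2_p) / (1 + s2_p)`.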
Thus, the test may be interpreted as a normalized difference between $S^2(r)$, the maximum squared Sharpe ratio in the sample of tested assets, and $S^2(r_p)$, the squared ratio for the tested portfolio. If the tested portfolio presents a Sharpe ratio that is “close” to the sample efficient portfolio, the value of the test statistic is small and we should not reject efficiency. If the tested portfolio lies far inside the sample mean variance frontier, we are likely to reject its efficiency.

4.3.1. Comparing the SDF and beta-pricing approaches

Before the mid 1980s, most of the empirical asset pricing literature used the beta-pricing representation (9) and regression-based approaches or MLE. Then, the SDF representation combined with the GMM began to take hold. The latter combination is appealing, since $E_t\{mR - 1\} = 0$ leads naturally to moment conditions for the GMM, and it is easy to multiply by lagged instruments, in order to use conditioning information. Recent studies have started to explore the tradeoffs between these approaches; see Kan and Zhou (1999), Jagannathan and Wang (2002) and Cochrane (2001).

We have seen that both cross-sectional and time-series regressions are special cases of the GMM. So is maximum likelihood. If we use the GMM on the first order conditions, or “scores” of the likelihood function (50), we get quasi-maximum likelihood estimators. If we further impose normality, then the information matrix identity leads to the MLE standard errors for the parameters, and therefore to the Cramer–Rao lower bound [see Hamilton (1994, Chapter 14)]. The implication is that the tradeoff between the approaches has little to do with GMM versus MLE or regression, but has everything to do with the set of moments that are examined. The SDF representation and the beta pricing formulation can lead us to examine different moments. When they do, their empirical results can differ. This can be illustrated in the context of a recent debate.
Kan and Zhou (1999) consider returns in excess of a risk-free rate, $r$, comparing beta pricing with the SDF approach. Ignoring conditioning information, beta pricing says $r_t = \beta(f_t + \lambda) + u_t$, where $f_t \equiv F_t - E(F_t)$ is the mean-centered factor, and the moments are $E(u_t) = E(u_t f_t) = 0$. The stochastic discount factor is $m_t = 1 - b' f_t$, and the SDF moment conditions for the excess returns are $E\{r_t (1 - b' f_t)\} = 0$. Kan and Zhou find that the SDF approach is much less efficient than beta pricing. However, the moments being used are not the same. Kan and Zhou implicitly assume that $E(F_t)$ is known,
ignoring the moment condition for estimating E(Ft ), and the sampling variation that this moment condition generates. Jagannathan and Wang (2002) and Cochrane (2001) show that when the two methods correctly exploit the same moments, they deliver nearly identical results.
5. Conditional performance evaluation Classical measures of investment performance compare the return of a managed portfolio to that of a benchmark. For example, an alpha for a fund may be calculated as the average return in excess of a risk-free rate, minus a fixed beta times the average excess return of a benchmark portfolio. Using the market portfolio of the CAPM as the benchmark, Jensen (1968) advocated such a risk-adjusted measure of performance. These classical measures are “unconditional”, in the sense that the expected returns and betas in the model are unconditional moments, estimated by past averages. If expected returns and risks vary over time, the classical approach is likely to be unreliable. For example, if the risk exposure of a managed portfolio varies predictably with the business cycle, but the manager has no superior forecasting ability, a traditional approach to performance measurement will confuse the common variation in fund risk and expected market returns with abnormal performance. In conditional performance evaluation, we model the expected returns and risk measures, attempting to account for their changes with the state of the economy, and thus controlling for common variation. The problem of confounding variation in mutual fund risks and market returns has long been recognized [e.g., Jensen (1972), Grant (1977)], but previous studies interpreted it as reflecting superior information or market timing ability. Conditional performance evaluation takes the view that a managed portfolio strategy which can be replicated using readily available public information should not be judged as having superior performance. For example: in a conditional approach, a mechanical market timing rule using lagged interest rate data is not a value-adding strategy. Only managers who correctly use more information than is generally publicly available, are considered to have potentially superior ability. 
Conditional performance evaluation is therefore consistent with a version of market efficiency, in the semi-strong form sense of Fama (1970). The beauty of a conditional approach to performance evaluation is that it can accommodate whatever standard of superior information is held to be appropriate, by the choice of the lagged instruments that are used to represent the public information. Incorporating a given set of lagged instruments, managers who trade mechanically in response to these variables get no credit. In practice, the trading behavior of managers may overlay complex portfolio dynamics on the dynamics of the underlying assets they trade. The desire to handle such dynamic strategies further motivates a conditional approach.
5.1. A numerical example

The appeal of a conditional model for performance evaluation can be illustrated with the following highly stylized numerical example. Assume that there are two equally likely states of the market as reflected in investors’ expectations; say, a “Bull” state and a “Bear” state. In a Bull market, assume that the expected return of the S&P500 is 20%, and in a Bear market, it is 0%. The risk-free return to cash is 5%. Assume that all investors share these views – the current state of expected market returns is common knowledge. In this case, an investment strategy using as its only information the current state of the market, will not yield abnormal returns.

Now, imagine a mutual fund which holds the S&P500 in a Bull market and holds cash in a Bear market. Conditional on a Bull market, the beta of the fund is 1.0, the fund’s expected return is 20%, equal to the S&P500, and the fund’s alpha is zero. Conditional on a Bear market, the fund’s beta is 0.0, the expected return of the fund is the risk-free return, 5%, and the alpha is, again, zero. A conditional approach to performance evaluation correctly reports an alpha of zero in each state.

By contrast, an unconditional approach to performance evaluation incorrectly reports an alpha greater than zero for our hypothetical mutual fund. The unconditional beta of the fund 24 is 0.75. The unconditional expected return of the fund is 0.5(0.20) + 0.5(0.05) = 0.125. The unconditional expected return of the S&P500 is 0.5(0.20) + 0.5(0.0) = 0.10, and the unconditional alpha of the fund is therefore: (0.125 − 0.05) − 0.75(0.10 − 0.05) = 0.0375. The unconditional approach leads to the mistaken conclusion that the manager has positive abnormal performance. But the manager’s performance does not reflect superior skill or ability, it just reflects the fund’s decision to take on more market risk in times when the risk is more highly rewarded in the market. Investors who have access to the same information about the

24 The calculation is as follows. The unconditional beta is $\mathrm{Cov}(F, M)/\mathrm{Var}(M)$, where $F$ is the fund return and $M$ is the market return. The numerator is:

$$\begin{aligned}
\mathrm{Cov}(F, M) &= E\{(F - E(F))(M - E(M)) \mid \text{Bull}\} \times \mathrm{Prob}(\text{Bull}) + E\{(F - E(F))(M - E(M)) \mid \text{Bear}\} \times \mathrm{Prob}(\text{Bear}) \\
&= \{(0.20 - 0.125)(0.20 - 0.10)\} \times 0.5 + \{(0.05 - 0.125)(0 - 0.10)\} \times 0.5 = 0.0075.
\end{aligned}$$

The denominator is:

$$\begin{aligned}
\mathrm{Var}(M) &= E\{(M - E(M))^2 \mid \text{Bull}\} \times \mathrm{Prob}(\text{Bull}) + E\{(M - E(M))^2 \mid \text{Bear}\} \times \mathrm{Prob}(\text{Bear}) \\
&= (0.20 - 0.10)^2 \times 0.5 + (0.0 - 0.10)^2 \times 0.5 = 0.01.
\end{aligned}$$

The beta is therefore 0.0075/0.01 = 0.75.
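The arithmetic of the example and of footnote 24 can be reproduced directly. The snippet below is a sketch that only encodes the numbers given in the text; the variable names are hypothetical.

```python
# Two equally likely states with the conditional expected returns from the example.
prob = {"bull": 0.5, "bear": 0.5}
market = {"bull": 0.20, "bear": 0.00}     # S&P500 expected return in each state
fund = {"bull": 0.20, "bear": 0.05}       # fund holds the index (Bull) or cash (Bear)
cond_beta = {"bull": 1.0, "bear": 0.0}    # the fund's conditional betas
rf = 0.05                                 # risk-free return to cash

def expectation(x):
    """Unconditional mean of a state-dependent quantity."""
    return sum(prob[s] * x[s] for s in prob)

# Conditional alphas: zero in each state.
cond_alpha = {s: (fund[s] - rf) - cond_beta[s] * (market[s] - rf) for s in prob}

# Unconditional moments, beta and alpha.
e_m, e_f = expectation(market), expectation(fund)
cov_fm = sum(prob[s] * (fund[s] - e_f) * (market[s] - e_m) for s in prob)
var_m = sum(prob[s] * (market[s] - e_m) ** 2 for s in prob)
uncond_beta = cov_fm / var_m
uncond_alpha = (e_f - rf) - uncond_beta * (e_m - rf)
```

Up to floating-point rounding, `uncond_beta` is 0.75 and `uncond_alpha` is 0.0375, while both entries of `cond_alpha` are zero, matching the figures in the text.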
economic state, would not be willing to pay the fund management fees to use this common knowledge.

5.1. Stochastic discount factor formulation

For a given SDF we may define a fund’s conditional SDF alpha following Chen and Knez (1996) and Farnsworth et al. (2002) as:

$$\alpha_{pt} \equiv E\left( m_{t+1} R_{p,t+1} \mid Z_t \right) - 1, \tag{54}$$
where one dollar invested with the fund at time $t$ returns $R_{p,t+1}$ dollars at time $t + 1$. If the SDF prices a set of “primitive” assets, $R_{t+1}$, then $\alpha_{pt}$ will be zero when the fund (costlessly) forms a portfolio of the primitive assets, if the portfolio strategy uses only the public information at time $t$. In that case $R_{p,t+1} = x(Z_t)' R_{t+1}$, where $x(Z_t)$ is the portfolio weight vector. Then Equation (2) implies that $\alpha_{pt} = [E(m_{t+1} x(Z_t)' R_{t+1} \mid Z_t)] - 1 = x(Z_t)' [E(m_{t+1} R_{t+1} \mid Z_t)] - 1 = x(Z_t)' \mathbf{1} - 1 = 0$, where $\mathbf{1}$ is a vector of ones and the weights sum to one.

Consider an example where $m_{t+1}$ is the intertemporal marginal rate of substitution for a representative investor, and Equation (2) is the Euler equation which must be satisfied in equilibrium. If the consumer has access to a fund for which the conditional alpha is not zero he or she will wish to adjust the portfolio, purchasing more of the fund if alpha is positive and less if alpha is negative.

The SDF alpha depends on the model for the SDF, and the SDF is not unique unless markets are complete. Thus, different SDFs can produce different measured performance. This mirrors the classical approaches to performance evaluation, where performance is sensitive to the benchmark. 25

While $\alpha_{pt}$ is in general a function of $Z_t$, it is simpler to discuss the estimation of $\alpha_p = E(\alpha_{pt})$. The parameter $\alpha_p$ is the expectation of the conditional alpha, defined by Equation (54). Thus, we examine the average abnormal performance of a fund. 26 A useful approach for estimating SDF alphas in this case is to form a system of equations as follows:

$$\begin{aligned}
u_{1t} &= \left[ m_{t+1} R_{t+1} - \mathbf{1} \right] \otimes Z_t, \\
u_{2t} &= \alpha_p - m_{t+1} R_{p,t+1} + 1.
\end{aligned} \tag{55}$$
The sample moment condition is $g = T^{-1} \sum_t (u_{1t}', u_{2t}')'$. We can use the GMM to simultaneously estimate the parameters of the SDF model and the fund’s SDF alpha.
25 Roll (1978), Dybvig and Ross (1985), Brown and Brown (1987), Chen, Copeland and Mayers (1987), Lehmann and Modest (1988) and Grinblatt and Titman (1989b) address this issue in the beta-pricing context. Farnsworth et al. (2002) provide empirical evidence for the SDF setting.

26 For a discussion of time-varying conditional alphas, see Christopherson, Ferson and Glassman (1998a).
5.1.1. Invariance to the number of funds

The system (55) may be estimated using a two-step approach, where the parameters of the model for $m_{t+1}$ are estimated in the first step and the fitted SDF is used to estimate alphas in the second step. Farnsworth et al. (2002) find that simultaneous estimation is dramatically more efficient. However, a potential problem with the simultaneous approach is that the number of moment conditions grows substantially if many funds are to be evaluated, and there are more funds than months in most of the available data sets. Fortunately, Farnsworth et al. (2002) show that we can estimate the joint system separately for each fund without loss of generality. Estimating a version of system (55) for one fund at a time is equivalent to estimating a system with many funds simultaneously. The estimates of $\alpha_p$ and the standard errors for any subset of funds are invariant to the presence of another subset of funds in the system.

5.1.2. Additional issues

Farnsworth et al. (2002) consider two sets of linear factor models for $m_{t+1}$. One is based on nontraded factors (e.g., industrial production) and another is based on traded factors (e.g., the S&P500 index). For the traded factor models, they find that it is important to impose the restriction that the model price the traded factors. For example, in the unconditional CAPM, $m_{t+1} = a + b R_{m,t+1}$, where $R_{m,t+1}$ is the gross market return. Requiring the model to price the market return and also a zero-beta return we have:

$$E\left\{ \left( a + b R_{m,t+1} \right) R_{m,t+1} \right\} = 1 \quad \text{and} \quad E\left\{ \left( a + b R_{m,t+1} \right) R_{0,t+1} \right\} = 1. \tag{56}$$
These two conditions identify the parameters $a(\cdot)$ and $b(\cdot)$ as functions of the first and second moments of the market index and the zero-beta return, as shown previously in Lemma 1. Farnsworth et al. also find that it is important to impose the restriction that the model prices the risk-free asset. This identifies the conditional mean of the SDF: $E(m_{t+1} \mid Z_t) = R_{ft}^{-1}$, when $R_{ft}$ is included in $Z_t$. Non-traded factor models, in particular, are much less accurate when they aren’t forced to price the risk-free asset.
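For the unconditional CAPM example, the two pricing conditions in Equation (56) are linear in $a$ and $b$, so their sample counterparts can be solved directly and the fitted SDF then used to compute an average SDF alpha as in the second line of (55). The sketch below does only that: it is a two-step point estimate with no standard errors, it is not the simultaneous GMM system, and the function names and simulated gross returns are hypothetical.

```python
import numpy as np

def fit_linear_sdf(R_m, R_0):
    """Solve the sample version of Eq. (56) for (a, b) in m = a + b * R_m."""
    R_m = np.asarray(R_m, dtype=float)
    R_0 = np.asarray(R_0, dtype=float)
    # Row 1: a*E(R_m) + b*E(R_m^2)     = 1  (prices the market return)
    # Row 2: a*E(R_0) + b*E(R_m * R_0) = 1  (prices the zero-beta return)
    A = np.array([[R_m.mean(), np.mean(R_m ** 2)],
                  [R_0.mean(), np.mean(R_m * R_0)]])
    a, b = np.linalg.solve(A, np.ones(2))
    return a, b

def sdf_alpha(a, b, R_m, R_p):
    """Average SDF alpha: sample mean of m * R_p minus one."""
    m = a + b * np.asarray(R_m, dtype=float)
    return np.mean(m * np.asarray(R_p, dtype=float)) - 1.0

# Hypothetical gross returns: a risky market and a constant zero-beta (cash) return.
rng = np.random.default_rng(1)
R_m = 1.01 + 0.05 * rng.standard_normal(500)
R_0 = np.full(500, 1.003)
a, b = fit_linear_sdf(R_m, R_0)

passive_alpha = sdf_alpha(a, b, R_m, 0.6 * R_m + 0.4 * R_0)   # fixed-weight mix
shifted_alpha = sdf_alpha(a, b, R_m, R_m + 0.002)             # market plus 20 bp
```

A fixed-weight combination of the two priced returns has an alpha of zero by construction, while adding a constant 0.002 to the market return produces a positive alpha of 0.002 times the sample mean of the fitted SDF.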
5.2. Beta-pricing formulation

Ferson and Schadt (1996) modify Jensen’s alpha and two simple market-timing models to incorporate conditioning information. They start with a conditional CAPM, which implies that Equation (57) is satisfied for the assets available to portfolio managers.
Ch. 12:
Tests of Multifactor Pricing Models, Volatility Bounds and Portfolio Performance
789
They show that it is easy to extend the analysis beyond the CAPM to a conditional multiple-beta model.

$$r_{i,t+1} = \beta_{im}(Z_t)\, r_{m,t+1} + u_{i,t+1}, \qquad E(u_{i,t+1} \mid Z_t) = 0, \qquad E(u_{i,t+1}\, r_{m,t+1} \mid Z_t) = 0, \qquad i = 0, \ldots, N, \quad t = 0, \ldots, T-1. \tag{57}$$
The $\beta_{im}(Z_t)$ are the time-$t$ conditional market betas of the excess return of asset $i$. The second equation in (57) follows from the conditional CAPM assumption, and the third says that the $\beta_{im}(Z_t)$ are conditional regression coefficients. Equation (57) implies that a portfolio strategy which depends only on the public information $Z_t$ will satisfy a similar regression. The intercept, or “alpha”, of the regression should be zero, and the error term should not be related to the public information variables.²⁷ Under the hypothesis that the manager uses no more information than $Z_t$, the portfolio beta, $\beta_{pm}(Z_t)$, is a function only of $Z_t$. Using a Taylor series, we can approximate this function linearly:

$$\beta_{pm}(Z_t) = b_{0p} + B_p' z_t, \tag{58}$$
where $z_t = Z_t - E(Z)$ is a vector of the deviations of $Z_t$ from the unconditional means, and $B_p$ is a vector with dimension equal to the dimension of $Z_t$. The coefficient $b_{0p}$ may be interpreted as an “average beta”, i.e., the unconditional mean of the conditional beta: $E(\beta_{pm}(Z_t))$. The elements of $B_p$ are the response coefficients of the conditional beta with respect to the information variables $Z_t$. Equations (57) and (58) imply a regression of a managed portfolio excess return on the market factor excess return and its product with the lagged information:

$$r_{p,t+1} = \alpha_p + \delta_{1p}\, r_{m,t+1} + \delta_{2p}'(z_t\, r_{m,t+1}) + \varepsilon_{p,t+1}, \tag{59}$$
where the model implies $\alpha_p = 0$, $\delta_{1p} = b_{0p}$, and $\delta_{2p} = B_p$. The regression (59) may be interpreted as a multi-factor model, where the excess market return is the first factor and the products of the market and the lagged information variables are additional factors. The additional factors may be interpreted as the returns to dynamic strategies, which hold $z_t$ units of the market index, financed by borrowing or selling $z_t$ in Treasury bills. The coefficient $\alpha_p$ is the average difference between the managed portfolio excess return and the excess return to the dynamic strategies, which replicate its time-varying risk exposure. A manager with a positive alpha in this setting is one whose average return is higher than the average return of the conditional-beta-replicating strategies.

²⁷ That is, if $R_{p,t+1} = x(Z_t)' R_{t+1}$, where $x(\cdot)$ is an $N$-vector of weights and $R_{t+1}$ is the $N$-vector of the available risky security returns, then the portfolio excess return will satisfy the conditional CAPM, with $\beta_{pm}(Z_t) = x(Z_t)' \beta_m(Z_t)$, where $\beta_m(Z_t)$ is the vector of the securities’ conditional betas. The error term in the regression for the portfolio strategy is $u_{p,t+1} = x(Z_t)' u_{t+1}$, where $u_{t+1}$ is the vector of the $u_{i,t+1}$’s, and therefore $E(u_{p,t+1} \mid Z_t) = E(x(Z_t)' u_{t+1} \mid Z_t) = x(Z_t)' E(u_{t+1} \mid Z_t) = 0$.

790
W.E. Ferson

5.3. Using portfolio weights

The previously discussed performance measurement techniques are all returns-based. The strength of returns-based methodologies is their minimal information requirements: one needs only returns on the managed portfolio and data for the model of $m_{t+1}$. However, this ignores potentially useful information that is often available in practice: the composition of the managed portfolio. Grinblatt and Titman (1989a, 1993) propose a weight-based measure of mutual fund performance. Their measure combines portfolio weights with unconditional moments to measure performance. Ferson and Khang (2002) argue that the use of portfolio weights may be especially important in a conditional setting. When expected returns are time-varying and managers trade between return observation dates, returns-based approaches are likely to be biased. This “interim trading bias” can be avoided by using portfolio weights in a conditional setting. The intuition behind weight-based performance measures can be motivated with a single-period model where an investor maximizes the expected utility of terminal wealth:
$$\max_{x}\; E\{U(W_0[R_f + x'r]) \mid Z, S\}, \tag{60}$$
where $R_f$ is the risk-free rate, $r$ is the vector of risky asset returns in excess of the risk-free rate, $W_0$ is the initial wealth, $x$ is the vector of portfolio weights on the risky assets, $Z$ is public information available at time 0, and $S$ is private information available at time 0. Private information, by definition, is correlated with $r$, conditional on $Z$. If returns are conditionally normal, the first- and second-order conditions for the maximization when the investor has nonincreasing absolute risk aversion imply [see Khang (1997)] that
$$E\{x(Z,S)'[r - E(r \mid Z)] \mid Z\} > 0, \tag{61}$$

where $x(Z,S)$ is the optimal weight vector and $r - E(r \mid Z)$ are the unexpected, or abnormal, returns from the perspective of an observer with the public information. Conditional on the public information, the sum of the conditional covariances between the weights of a manager with private information, $S$, and the abnormal returns for the securities in a portfolio is positive. If the manager has no private information, $S$, then the covariance is zero. Ferson and Khang (2002) study a Conditional Weight Measure (CWM) that follows from Equation (61). They introduce a “benchmark” weight, $x_b$, that is in the public information set $Z$, so Equation (61) implies
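$$E\{[x(Z,S) - x_b]'[r - E(r \mid Z)] \mid Z\} > 0, \tag{62}$$

if the manager has superior information, $S$. Because $x_b$ is a constant given $Z$, it will not affect the conditional covariance. When the benchmark is the manager’s own lagged weight, $x - x_b$ is a weight change. Weight changes are advantageous on statistical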
grounds, as the levels of the weights may be nonstationary. Other benchmark weights could be used, when a particular benchmark may be suggested by the application.

5.3.1. Conditional performance attribution

Traditional regression-based analysis sometimes interprets the regression as providing a decomposition of the sources of a fund’s returns. For example, fund beta times the market excess return is the component of the fund’s excess return due to overall market exposure. Conditional performance measures allow refinements of such decompositions. For example, in the Ferson and Schadt regression (59), we have a component due to average market exposure and one due to the mechanical use of the instruments, $Z_t$, to track the time-varying exposure. The average conditional alpha is the difference between the fund return and the dynamic beta-matched strategy. Weight-based measures allow a similar decomposition. Consider the following identity for the unconditional covariance:
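$$\sum_j \operatorname{Cov}(\Delta x_j, r_j) = \sum_j E\{\operatorname{Cov}(\Delta x_j, r_j \mid Z)\} + \sum_j \operatorname{Cov}\{E(\Delta x_j \mid Z),\, E(r_j \mid Z)\}, \tag{63}$$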
where $\Delta x_j \equiv x_{jt} - x_{bjt}$. The left-hand side is the unconditional weight measure (UWM) as in Grinblatt and Titman (1993). The first term on the right-hand side is the “average” conditional weight measure, equal to the unconditional mean of Equation (62). The second term on the right-hand side captures the variation in the weight changes associated with changes in the expected returns, conditioned on public information. By comparing the conditional and unconditional measures, this last term may be calculated as a residual. Equation (63) decomposes the manager’s total return from active trading into a component attributable to private information (the first term on the right) and a component attributable to the public information (the second term on the right). For example, the second component may be compared with the investor’s cost of monitoring the public information. The first component is the performance the investor could not obtain without the manager, even if he chose to monitor the public information. Isolating this component enables an investor to compensate a manager for his use of private information.

5.3.2. Interim trading bias

The conditional weight-based approach can control an interim trading bias, which arises when we depart from the assumption that returns are independently and identically distributed over time (iid), and is therefore especially relevant to a conditional setting. Consider an example where returns are measured over two “periods”, but a manager trades each period. The manager has neutral performance, but the portfolio weights for the second period can be a function of public information at the intervening date. If returns are iid, this creates no bias, as there is no information at the intervening date that is correlated with the second period return. However, if expected returns vary with public information, then a manager who observes and trades
on public information at the intervening date generates a return for the second period from the conditional distribution. His two-period portfolio strategy will contain more than the public information at the beginning of the first period, and a returns-based measure over the two periods will detect this as “superior” information. Goetzmann, Ingersoll and Ivkovic (2000) address interim trading bias by simulating the multiperiod returns generated by the option to trade between return observation dates. A conditional weight-based measure avoids the problem by examining the conditional covariance between the manager’s weights at the beginning of the first period and the subsequent two-period returns. The ability of the manager to trade at the intervening period thus creates no interim trading bias. Of course, managers may engage in interim trading based on superior information to enhance performance, and a weight-based measure will not record these interim trading effects. Interim trading thus presents a bias under the null hypothesis that managers possess only public information. Under the alternative hypothesis of superior ability, a weight-based measure may have limited power to detect the ability. Thus, the cost of using a weight-based measure to avoid bias is a potential loss of power. Ferson and Khang (2002) evaluate these tradeoffs, and conclude that the conditional weight-based measure is attractive.

5.4. Conditional market-timing models

In a market-timing context, the goal of conditional performance evaluation is to distinguish timing ability that merely reflects publicly available information, as captured by the set of lagged instrumental variables, from timing based on better information. We may call such informed timing ability conditional market timing.
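A small worked example may help show why public information alone can look like timing ability. It assumes, purely for illustration, a single demeaned instrument $z_t$, a portfolio beta that is linear in $z_t$ as in Equation (58), and an expected market excess return that is linear in $z_t$. The symbols $\mu_m$, $\kappa$ and $e_{t+1}$ are introduced only for this sketch and are not part of the chapter's notation.

```latex
\begin{aligned}
\beta_{pm}(Z_t) &= b_{0p} + B_p z_t, \qquad
r_{m,t+1} = \mu_m + \kappa z_t + e_{t+1}, \qquad E(e_{t+1} \mid Z_t) = 0, \\
\operatorname{Cov}\bigl(\beta_{pm}(Z_t),\, r_{m,t+1}\bigr)
  &= B_p \operatorname{Cov}\bigl(z_t,\, \mu_m + \kappa z_t + e_{t+1}\bigr)
   = B_p\, \kappa\, \operatorname{Var}(z_t).
\end{aligned}
```

Under these assumptions the fund's beta is correlated with the future market return whenever $B_p \kappa \neq 0$, even though the manager uses no information beyond $Z_t$. A conditional market-timing model is designed so that this covariance is not counted as timing ability.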
A classic market-timing regression, when there is no conditioning information, is the quadratic regression of Treynor and Mazuy (1966):

$$r_{p,t+1} = \alpha_p + b_p\, r_{m,t+1} + \gamma_{tmu}\, r_{m,t+1}^2 + v_{p,t+1}, \tag{64}$$

where the coefficient $\gamma_{tmu}$ measures market timing ability. Admati et al. (1986) describe a model in which a manager with constant absolute risk aversion, in a normally distributed world, observes at time $t$ a private signal equal to the future market return plus noise, $r_{m,t+1} + \eta$. The manager’s response is to change the portfolio beta as a linear function of the signal. They show that the $\gamma_{tmu}$ coefficient in regression (64) is positive if the manager increases market exposure when the signal about the future market return is positive. In a conditional model, the part of the correlation of fund betas with the future market return that can be attributed to the public information is not considered to reflect market timing ability. Ferson and Schadt (1996) develop a conditional version of the Treynor–Mazuy regression:

$$r_{p,t+1} = \alpha_p + b_p\, r_{m,t+1} + C_p'(z_t\, r_{m,t+1}) + \gamma_{tmc}\, r_{m,t+1}^2 + v_{p,t+1}, \tag{65}$$

where the coefficient vector $C_p$ captures the linear response of the manager’s beta to the public information, $Z_t$. The term $C_p'(z_t\, r_{m,t+1})$ controls for the public information
effect, which would bias the coefficients in the original Treynor–Mazuy model. The coefficient $\gamma_{tmc}$ measures the sensitivity of the manager’s beta to the private market timing signal. Merton and Henriksson (1981) and Henriksson (1984) describe an alternative model of market timing in which the quadratic term in Equation (64) is replaced by an option payoff, $\max(0, r_{m,t+1})$. This reflects the idea that market timers may be thought of as delivering (hopefully, attractively priced) put options on the market index. Ferson and Schadt (1996) develop a conditional version of this model as well. Becker et al. (1999) develop conditional market-timing models with explicit performance benchmarks. In this case, managers maximize the utility of their portfolio returns in excess of a benchmark portfolio return. In practice, performance benchmarks often represent an important component of managers’ incentive systems. Such benchmarks have been controversial in the academic literature. Starks (1987), Grinblatt and Titman (1989c) and Admati and Pfleiderer (1997) argue that benchmarks do not properly align managers’ incentives. Carpenter, Dybvig and Farnsworth (2000) provide a theoretical justification of benchmarks, used in combination with investment restrictions. Becker et al. simultaneously estimate the fund managers’ risk aversion for tracking error and the precision of the market-timing signal, in a sample of more than 400 U.S. mutual funds for 1976–94, including a subsample with explicit asset allocation objectives. The estimates suggest that U.S. equity mutual funds behave as risk-averse benchmark investors, but little evidence of timing ability is found.

5.5. Empirical evidence on conditional performance

Traditional measures of the average abnormal performance of mutual funds, like Jensen’s alpha, are observed to be negative more often than positive across the many studies.
For example, Jensen (1968) used the CAPM to conclude that a typical fund has neutral performance, only after adding back expenses. Traditional measures of market timing often find that any significant market timing ability is perversely “negative”, suggesting that investors could time the market by doing the opposite of a typical fund. Such results make little economic sense, which suggests that they may be spurious. Conditional performance evaluation takes the view that a mechanically managed portfolio strategy using only public information does not have abnormal performance. A manager’s return is therefore compared with such a benchmark, mechanically constructed using public information to match the time-varying risk of the fund. The empirical evidence suggests that conditional performance measures can produce results different from the classical methods. Ferson and Schadt (1996) find evidence that funds’ risk exposures change in response to public information on the economy, such as the level of interest rates and dividend yields. Using conditional models, Ferson and Schadt (1996), Kryzanowski, Lalancette and To (1997) and Zheng (1999) find that the distribution of mutual fund alphas shifts to the right and is centered near zero. Ferson and Warther (1996) attribute differences between unconditional and conditional alphas to predictable flows of public
money into funds. Inflows are correlated with reduced market exposure, at times when the public expects high returns, due to larger cash holdings at such times. In pension funds, which are not subject to high frequency flows of public money, no overall shift in the distribution of fund alphas is found when moving to conditional models [Christopherson et al. (1998b)]. Once we control for public information variables, there seems to be little evidence that mutual funds have conditional timing ability for the level of the market return. Busse (1999) asks whether fund returns contain information about market volatility. He finds evidence using daily data that funds may shift their market exposures in response to changes in second moments. Further research in this direction is clearly warranted. Farnsworth et al. (2002) use a variety of SDF models to evaluate performance in a monthly sample of U.S. equity mutual funds. They find that many of the SDF models are biased. The average bias is about −0.19% per month for unconditional models and −0.12% for conditional models. This is less than two standard errors, as a typical standard error is 0.1% per month. They find that the average mutual fund alpha is no worse than that of a hypothetical stock-picking fund with neutral performance. Adding back average expenses of about 0.17% per month to the mutual fund alphas (since the actual funds pay expenses, while the hypothetical funds do not), the average fund’s performance is slightly higher than that of hypothetical funds with no ability. Ferson and Khang (2002) develop the conditional, weight-based approach to measuring performance. Using a sample of equity pension fund managers, 1985–1994, they find that the traditional, returns-based alphas of the funds are positive, consistent with previous studies of pension fund performance. However, these alphas are smaller than the potential effects of interim trading bias.
By using instruments for public information combined with portfolio weights, their conditional weight-based measures find that the pension funds also have neutral performance. Thus, the empirical evidence based on conditional performance measures suggests that abnormal fund performance, controlling for public information, is rare.
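The contrast between unconditional and conditional measures described in this section can be reproduced in a small simulation. The sketch below generates a hypothetical fund whose beta responds only to a public instrument, so its true conditional alpha and timing ability are zero, and then estimates Jensen's alpha, the conditional regression (59), the Treynor–Mazuy regression (64) and its conditional version (65) by ordinary least squares. The data-generating process, all parameter values and the helper `ols` are assumptions made for illustration, not estimates from any of the cited studies.

```python
import numpy as np

def ols(y, X):
    """Return the OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
T = 200_000   # long sample so estimates sit close to their population values

# Hypothetical data-generating process (illustrative numbers only).
z = rng.standard_normal(T)                                # demeaned lagged instrument z_t
r_m = 0.005 + 0.01 * z + 0.04 * rng.standard_normal(T)    # market excess return, predictable from z_t
beta = 1.0 - 0.5 * z                                      # public-information beta: b0p = 1, Bp = -0.5
r_p = beta * r_m + 0.01 * rng.standard_normal(T)          # fund excess return, true alpha = 0

ones = np.ones(T)

# Unconditional Jensen regression: constant and market excess return.
alpha_u, beta_u = ols(r_p, np.column_stack([ones, r_m]))

# Conditional regression (59): add the interaction z_t * r_m.
alpha_c, d1, d2 = ols(r_p, np.column_stack([ones, r_m, z * r_m]))

# Treynor-Mazuy regression (64) and its conditional version (65).
_, _, gamma_u = ols(r_p, np.column_stack([ones, r_m, r_m ** 2]))
_, _, _, gamma_c = ols(r_p, np.column_stack([ones, r_m, z * r_m, r_m ** 2]))

print("unconditional alpha:", alpha_u, " conditional alpha:", alpha_c)
print("delta_1p:", d1, " delta_2p:", d2)
print("unconditional timing:", gamma_u, " conditional timing:", gamma_c)
```

In this design the simulated manager lowers beta when the instrument signals high expected market returns, so the unconditional alpha and the unconditional timing coefficient both come out negative even though no private information is used, while the conditional estimates are close to zero. This mirrors the qualitative pattern reported above, under the stated assumptions only.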
6. Conclusions

This chapter has reviewed tests of multifactor asset pricing models, volatility bounds and portfolio performance. We developed three essentially equivalent paradigms: beta pricing, stochastic discount factors and minimum variance efficiency, and we discussed each approach in the context of conditional asset-pricing models. These models are stated in terms of expected returns and risk measures, conditioned on available information about the state of the economy. Conditional models are most interesting when there are observable instruments that can track time-varying expected returns and security risks. The evidence for such predictability in returns is both extensive and controversial. Conditional asset pricing models should provide a useful framework for many continuing investigations.
The three paradigms of empirical asset pricing have traditionally been linked with particular empirical methods. The stochastic discount factor paradigm seems to fit naturally with the generalized method of moments. Mean variance efficiency tests have (since the early 1980s) most commonly employed multivariate regression, and multibeta models seem to cry out for cross-sectional regressions. But this pairing of the models and methods is not sacrosanct. In fact, any of these empirical methods can be paired with any of the paradigms, and recent studies are beginning to explore the possibilities and tradeoffs. I am coming to the view that the set of moments the investigator chooses to examine is the key issue. Different approaches can lead one to examine different moments, and when they do, the results will differ. Conditional performance evaluation provides an example, where the models and methods meet the data. Empirical work in this area essentially applies conditional asset pricing models to the returns of managed portfolios. The evidence to date shows that conditional models make a difference. I expect these approaches to yield more interesting insights in the future, about the behavior and performance of mutual funds, pension funds, hedge funds and other professionally managed portfolios. Conditional asset pricing and conditional performance evaluation are still relatively young. They are roughly “teenagers” at the time of this writing. I expect them to contribute a lot more to our state of knowledge as they mature with future research. I hope that this chapter helps to facilitate some of that research.
References

Admati, A., and P. Pfleiderer (1997), “Performance benchmarks: does it all add up?” Journal of Business 70(3):323−350. Admati, A., S. Bhattacharya, S. Ross and P. Pfleiderer (1986), “On timing and selectivity”, Journal of Finance 41:715−730. Ang, A., and G. Bekaert (2001), “Stock return predictability: is it there?” Working Paper (Columbia University, New York). Balduzzi, P., and H. Kallal (1997), “Risk premia and variance bounds”, The Journal of Finance 52(5): 1913−1949. Balduzzi, P., and T. Yao (2001), “Does heterogeneity matter for asset pricing?” Working Paper (Boston College). Bansal, R., and B.N. Lehmann (1997), “Growth-optimal restrictions on asset pricing models”, Macroeconomic Dynamics 1:1−22. Basu, D., and A. Stremme (2003), “Portfolio efficiency and stochastic discount factor bounds with conditioning information: an empirical study”, Working Paper (University of Warwick, Coventry, UK). Becker, C., W. Ferson, D. Myers and M. Schill (1999), “Conditional market timing with benchmark investors”, Journal of Financial Economics 52:119−148. Beja, A. (1971), “The structure of the cost of capital under uncertainty”, Review of Economic Studies 4:359−369. Bekaert, G., and J. Liu (2002), “Conditioning information and variance bounds on pricing kernels”, Review of Financial Studies, forthcoming. Berk, J. (1995), “A critique of size-related anomalies”, Review of Financial Studies 8:275−286. Berk, J. (2000), “Sorting out sorts”, Journal of Finance 55:407−427.
Bhattacharya, S. (1981), “Notes on multiperiod valuation and the pricing of options”, Journal of Finance 36:163−180. Black, F. (1972), “Capital market equilibrium with restricted borrowing”, Journal of Business 45: 444−454. Black, F., and M. Scholes (1974), “The effects of dividend yield and dividend policy on common stock prices and returns”, Journal of Financial Economics 1:1−22. Black, F., M. Jensen and M. Scholes (1972), “The capital asset pricing model: some empirical tests”, in: M. Jensen, ed., Studies in the Theory of Capital Markets (Praeger, New York). Brav, A., G. Constantinides and C. Geczy (2002), “Asset pricing with heterogeneous consumers and limited participation: empirical evidence”, Journal of Political Economy 110(4). Breeden, D. (1979), “An intertemporal asset pricing model with stochastic consumption and investment opportunities”, Journal of Financial Economics 7:265−296. Brown, K., and G. Brown (1987), “Does the market portfolio’s composition matter?” Journal of Portfolio Management 13:26−32. Bruner, R.F., K. Eades, R. Harris and R. Higgins (1998), “Best practices in estimating the cost of capital: survey and synthesis”, Journal of Financial Practice and Education (Spring/Summer), pp. 13–27. Burmeister, E., and M.B. McElroy (1988), “Joint estimation of factor sensitivities and risk premia for the arbitrage pricing theory”, Journal of Finance 43:721−733. Buse, A. (1982), “The likelihood ratio, Wald and Lagrange multiplier tests: an expository note”, American Statistician 36:153−157. Busse, J. (1999), “Volatility timing in mutual funds: evidence from daily returns”, Review of Financial Studies 12:1009−1041. Campbell, J.Y. (1987), “Stock returns and the term structure”, Journal of Financial Economics 18: 373−399. Campbell, J.Y. (1993), “Intertemporal asset pricing without consumption data”, American Economic Review 83:487−512. Campbell, J.Y., and L.
Viceira (1999), “Consumption and portfolio decisions when expected returns are time-varying”, Quarterly Journal of Economics 114:433−495. Campbell, J.Y., A. Lo and A.C. MacKinlay (1997), The Econometrics of Financial Markets (Princeton University Press, Princeton, NJ). Carpenter, J., P.H. Dybvig and H. Farnsworth (2000), “Portfolio performance and agency”, Working Paper (Washington University, St. Louis). Carroll, R.J. (1989), “Redescending M -estimators”, in: Samuel Kotz, ed., Encyclopedia of Statistical Sciences, Supplement Volume (Wiley, New York) pp. 134−137. Chamberlain, G. (1983), “Funds, factors and diversification in arbitrage pricing models”, Econometrica 51:1305−1324. Chamberlain, G., and M. Rothschild (1983), “Arbitrage, factor structure and mean variance analysis on large asset markets”, Econometrica 51:1281−1304. Chen, N., R. Roll and S. Ross (1986), “Economic forces and the stock market”, Journal of Business 59:383−403. Chen, N.-f., T. Copeland and D. Mayers (1987), “A comparison of single and multifactor performance methodologies”, Journal of Financial and Quantitative Analysis 22:401−417. Chen, Z., and P.J. Knez (1996), “Portfolio performance measurement: theory and applications”, Review of Financial Studies 9:511−556. Christopherson, J.A., W. Ferson and D.A. Glassman (1998a), “Conditioning manager alphas on economic information: another look at the persistence of performance”, Review of Financial Studies 11:111−142. Christopherson, J.A., W. Ferson and D.A. Glassman (1998b), “Conditional measures of performance and persistence for pension funds”, in: A. Chen, ed., Research in Finance, Vol. 16 (JAI Press, Stamford, CT) pp. 1–46.
Cochrane, J., and L. Hansen (1992), “Asset pricing lessons for macroeconomics”, in: O. Blanchard and S. Fischer, eds., The Macroeconomics Annual (MIT Press, Cambridge, MA) pp. 115–165. Cochrane, J.H. (1996), “A cross-sectional test of a production based asset pricing model”, Journal of Political Economy 104:572−621. Cochrane, J.H. (2001), Asset Pricing (Princeton University Press, New Jersey). Cochrane, J.H., and J. Saá-Requejo (2000), “Beyond arbitrage: good-deal pricing of derivatives in incomplete markets”, Journal of Political Economy 108:79−119. Connor, G. (1984), “A unified beta pricing theory”, Journal of Economic Theory 34:13−31. Connor, G., and R. Korajczyk (1988), “Risk and return in an equilibrium APT: application of a new test methodology”, Journal of Financial Economics 21:255−290. Connor, G., and R. Korajczyk (1995), “Arbitrage pricing theory and multifactor models of asset returns”, in: R. Jarrow, V. Maksimovic and W.T. Ziemba, eds., Handbooks in Operations Research and Management Science: Finance (Elsevier, Amsterdam) pp. 87–137. Connor, G., and R.A. Korajczyk (1986), “Performance measurement with the arbitrage pricing theory: a new framework for analysis”, Journal of Financial Economics 15:373−394. Conrad, J., and G. Kaul (1989), “Mean reversion in short-horizon expected returns”, Review of Financial Studies 2:225−240. Constantinides, G.M. (1982), “Intertemporal asset pricing with heterogeneous consumers and without demand aggregation”, Journal of Business 55:253−267. Cox, J.C., J.E. Ingersoll and S.A. Ross (1985), “A theory of the term structure of interest rates”, Econometrica 53:385−408. Cragg, J.G., and B.G. Malkiel (1982), Expectations and the Structure of Share Prices (University of Chicago Press, Chicago). Daniel, K., and S. Titman (1997), “Evidence on the characteristics of cross sectional variation in stock returns”, Journal of Finance 52:1−33. DeBondt, W., and R.
Thaler (1985), “Does the stock market overreact?” Journal of Finance 40:793−805. Douglas, G. (1969), “Risk in equity markets: an empirical appraisal of market efficiency”, Yale Economics Essays 9:3−45. Dybvig, P., and J. Ingersoll (1982), “Mean variance theory in complete markets”, Journal of Business 55:233−252. Dybvig, P.H. (1983), “An explicit bound on individual assets’ deviations from APT pricing in a finite economy”, Journal of Financial Economics 12:483−496. Dybvig, P.H., and S.A. Ross (1985), “Performance measurement using differential information and a security market line”, Journal of Finance 40:383−399. Fama, E., and K. French (1988), “Permanent and temporary components of stock prices”, Journal of Political Economy 96:246−273. Fama, E., and K. French (1989), “Business conditions and expected returns on stocks and bonds”, Journal of Financial Economics 25:23−49. Fama, E., and K. French (1993), “Common risk factors in the returns of stocks and bonds”, Journal of Financial Economics 33:3−56. Fama, E.F. (1970), “Efficient capital markets: a review of theory and empirical work”, Journal of Finance 25:383−417. Fama, E.F. (1976), Foundations of Finance (Basic Books, New York). Fama, E.F. (1991), “Efficient capital markets II”, Journal of Finance 46:1575−1617. Fama, E.F., and K.R. French (1996), “Multifactor explanations of asset pricing anomalies”, Journal of Finance 51:55−87. Fama, E.F., and K.R. French (1997), “Industry costs of equity”, Journal of Financial Economics 43: 153−194. Fama, E.F., and J.D. MacBeth (1973), “Risk, return and equilibrium: empirical tests”, Journal of Political Economy 81:607−636.
Fama, E.F., and G.W. Schwert (1977), “Asset returns and inflation”, Journal of Financial Economics 5:115−146. Farnsworth, H., W. Ferson, D. Jackson and S. Todd (2002), “Performance evaluation with stochastic discount factors”, Journal of Business 75:473−504. Ferson, W.E. (1983), “Expectations of real interest rates and aggregate consumption: empirical tests”, Journal of Financial and Quantitative Analysis 18:477−497. Ferson, W.E. (1989), “Changes in expected security returns, risk and the level of interest rates”, Journal of Finance 44:1191−1217. Ferson, W.E. (1995), “Theory and empirical testing of asset pricing models”, in: R. Jarrow, V. Maksimovic and W.T. Ziemba, eds., Handbooks in Operations Research and Management Science: Finance (Elsevier, Amsterdam) pp. 145−200. Ferson, W.E., and C.R. Harvey (1991), “The variation of economic risk premiums”, Journal of Political Economy 99:385−415. Ferson, W.E., and C.R. Harvey (1993), “The risk and predictability of international equity returns”, Review of Financial Studies 6:527−566. Ferson, W.E., and C.R. Harvey (1999), “Economic, financial and fundamental global risk in and out of EMU”, Swedish Economic Policy Review 6:123−184. Ferson, W.E., and R. Jagannathan (1996), “Econometric evaluation of asset pricing models”, in: G.S. Maddala and C.R. Rao, eds., Handbook of Statistics, Vol. 14: Statistical Methods in Finance (Elsevier, Amsterdam) pp. 1−30. Ferson, W.E., and K. Khang (2002), “Conditional performance measurement using portfolio weights: evidence for pension funds”, Journal of Financial Economics 65:249−282. Ferson, W.E., and R.A. Korajczyk (1995), “Do arbitrage pricing models explain the predictability of stock return?”, Journal of Business 68(3):309−349. Ferson, W.E., and R.W. Schadt (1996), “Measuring fund strategy and performance in changing economic conditions”, Journal of Finance 51:425−462. Ferson, W.E., and A.F. 
Siegel (2001), “The efficient use of conditioning information in portfolios”, Journal of Finance 56(3):967−982. Ferson, W.E., and A.F. Siegel (2002a), “Stochastic discount factor bounds with conditioning information”, Review of Financial Studies, forthcoming. Ferson, W.E., and A.F. Siegel (2002b), “Testing portfolio efficiency with conditioning information”, Working Paper (Boston College). Ferson, W.E., and V.A. Warther (1996), “Evaluating fund performance in a dynamic market”, Financial Analysts Journal 52(6):20−28. Ferson, W.E., S. Sarkissian and T. Simin (1999), “The alpha factor asset pricing model: a parable”, Journal of Financial Markets 2:49−68. Ferson, W.E., S. Sarkissian and T. Simin (2002), “Spurious regressions in financial economics?” Journal of Finance, forthcoming. Ferson, W.E., A. Siegel and P. Xu (2002), “Mimicking portfolios with conditioning information”, Working Paper (Boston College). Foster, D., T. Smith and R. Whaley (1997), “Assessing goodness-of-fit of asset pricing models: the distribution of the maximal R-squared”, Journal of Finance 52:591−607. Gallant, R.A., L.P. Hansen and G. Tauchen (1990), “Using the conditional moments of asset payoffs to infer the volatility of intertemporal marginal rates of substitution”, Journal of Econometrics 45: 141−179. Gibbons, M.R. (1982), “Multivariate tests of financial models”, Journal of Financial Economics 10:3−27. Gibbons, M.R., S.A. Ross and J. Shanken (1989), “A test of the efficiency of a given portfolio”, Econometrica 57:1121−1152. Goetzmann, W.N., J. Ingersoll and Z. Ivkovic (2000), “Monthly measurement of daily timers”, Journal of Financial and Quantitative Analysis 35:257−290.
Ch. 12:
Tests of Multifactor Pricing Models, Volatility Bounds and Portfolio Performance
Goodall, C. (1983), “M-estimators of location: an outline of the theory”, in: D.C. Hoaglin, F. Mosteller and J.W. Tukey, eds., Understanding Robust and Exploratory Data Analysis (Wiley, New York) pp. 339−403. Goyal, A., and I. Welch (1999), “Predicting the equity premium”, Working Paper (University of California, Los Angeles). Graham, J.R., and C.R. Harvey (2001), “The theory and practice of corporate finance: evidence from the field”, Journal of Financial Economics 60:187−243. Grant, D. (1977), “Portfolio performance and the ‘cost’ of timing decisions”, Journal of Finance 32:837−846. Grinblatt, M., and S. Titman (1983), “Factor pricing in a finite economy”, Journal of Financial Economics 12:497−508. Grinblatt, M., and S. Titman (1987), “The relation between mean-variance efficiency and arbitrage pricing”, Journal of Business 60:97−112. Grinblatt, M., and S. Titman (1989a), “Mutual fund performance: an analysis of quarterly portfolio holdings”, Journal of Business 62:393−416. Grinblatt, M., and S. Titman (1989b), “Portfolio performance evaluation: old issues and new insights”, Review of Financial Studies 2:393−421. Grinblatt, M., and S. Titman (1989c), “Adverse risk incentives in the design of performance-based contracts”, Management Science 35:807−822. Grinblatt, M., and S. Titman (1993), “Performance measurement without benchmarks: an examination of mutual fund returns”, Journal of Business 66:47−68. Grossman, S., and R.J. Shiller (1982), “Consumption correlatedness and risk measurement in economies with nontraded assets and heterogeneous information”, Journal of Financial Economics 10:195−210. Hamilton, J.D. (1994), Time-Series Analysis (Princeton University Press, Princeton, NJ). Hampel, F.R. (1974), “The influence curve and its role in robust estimation”, Journal of the American Statistical Association 69:383−393. Hansen, L.P. (1982), “Large sample properties of the generalized method of moments estimators”, Econometrica 50:1029−1054. Hansen, L.P., and R.
Hodrick (1980), “Forward exchange rates as optimal predictors of future spot rates: an econometric analysis”, Journal of Political Economy 88:829−853. Hansen, L.P., and R. Jagannathan (1991), “Implications of security market data for models of dynamic economies”, Journal of Political Economy 99:225−262. Hansen, L.P., and R. Jagannathan (1997), “Assessing specification errors in stochastic discount factor models”, Journal of Finance 52:557−590. Hansen, L.P., and S.F. Richard (1987), “The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models”, Econometrica 55:587−613. Hansen, L.P., J. Heaton and E. Luttmer (1995), “Econometric evaluation of asset pricing models”, Review of Financial Studies 8:237−274. Harrison, M., and D. Kreps (1979), “Martingales and arbitrage in multi-period securities markets”, Journal of Economic Theory 20:381−408. Harvey, C.R. (1991), “The world price of covariance risk”, Journal of Finance 46:111−157. Harvey, C.R., and C.M. Kirby (1996), “Instrumental variables estimation of conditional beta pricing models”, in: G.S. Maddala and C.R. Rao, eds., Handbook of Statistics, Vol 14 (Elsevier, New York) Ch. 2. Henriksson, R.D. (1984), “Market timing and mutual fund performance: an empirical investigation”, Journal of Business 57:73−96. Huberman, G., S.A. Kandel and R.F. Stambaugh (1987), “Mimicking portfolios and exact arbitrage pricing”, Journal of Finance 42:1−10. Ingersoll, J.E. (1987), Theory of Financial Decision Making (Rowman and Littlefield, Savage, MD). Jagannathan, R., and Z. Wang (2002), “Empirical evaluation of asset pricing models: a comparison of SDF and beta methods”, Journal of Finance 57:2337−2367.
W.E. Ferson
Jegadeesh, N., and S. Titman (1993), “Returns to buying winners and selling losers: implications for stock market efficiency”, Journal of Finance 48:65−91. Jensen, M. (1968), “The performance of mutual funds in the period 1945–1964”, Journal of Finance 23:389−416. Jensen, M.C. (1972), “Optimal utilization of market forecasts and the evaluation of investment performance”, in: G.P. Szego and K. Shell, eds., Mathematical Methods in Investment and Finance (Elsevier, Amsterdam). Jobson, J.D., and R. Korkie (1982), “Potential performance and tests of portfolio efficiency”, Journal of Financial Economics 10:433−466. Jobson, J.D., and R. Korkie (1985), “Some tests of asset pricing with multivariate normality”, Canadian Journal of Administrative Sciences 2:114−138. Kan, R., and G. Zhou (1999), “A critique of the stochastic discount factor methodology”, Journal of Finance 54:1221−1248. Kandel, S., and R.F. Stambaugh (1989), “A mean variance framework for tests of asset pricing models”, Review of Financial Studies 2:125−156. Kaul, G. (1996), “Predictable components in stock returns”, in: G.S. Maddala and C.R. Rao, eds., Handbook of Statistics, Vol. 14 (Elsevier, Amsterdam) pp. 269−296. Keim, D.B. (1983), “Size-related anomalies and stock return seasonality: further empirical evidence”, Journal of Financial Economics 12:13−32. Khang, K. (1997), “Performance measurement using portfolio weights and conditioning information: an examination of pension fund equity manager performance”, Ph.D. dissertation (University of Washington, Seattle, WA) unpublished. Kim, M., C.R. Nelson and R. Startz (1991), “Mean reversion in stock returns? A reappraisal of the statistical evidence”, Review of Economic Studies 58:515−528. Kim, T.S., and E. Omberg (1996), “Dynamic nonmyopic portfolio behavior”, Review of Financial Studies 9:141−161. Kirby, C. (1998), “The restrictions on predictability implied by asset pricing models”, Review of Financial Studies 11:343−382. Kryzanowski, L., S.
Lalancette and M.C. To (1997), “Performance attribution using an APT with prespecified macrofactors and time-varying risk premia and betas”, Journal of Financial and Quantitative Analysis 32:205−224. Lehmann, B.N., and D.M. Modest (1987), “Mutual fund performance evaluation: a comparison of benchmarks and benchmark comparisons”, Journal of Finance 42:233−265. Lehmann, B.N., and D.M. Modest (1988), “The empirical foundations of the arbitrage pricing theory”, Journal of Financial Economics 21:213−254. Lettau, M., and S. Ludvigson (2001), “Consumption, aggregate wealth and expected stock returns”, Journal of Finance 56:815−849. Lintner, J. (1965), “The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets”, Review of Economics and Statistics 47:13−37. Litzenberger, R., and K. Ramaswamy (1979), “The effect of personal taxes and dividends on capital asset prices: theory and evidence”, Journal of Financial Economics 7:163−196. Litzenberger, R., and K. Ramaswamy (1982), “The effects of dividends on common stock prices: tax effects or information effects?” Journal of Finance 37:429−433. Lo, A.W., and A.C. MacKinlay (1990), “Data snooping in tests of financial asset pricing models”, Review of Financial Studies 1:41−66. Long, J. (1974), “Stock prices, inflation, and the term structure of interest rates”, Journal of Financial Economics 1:131−170. Lucas Jr, R.E. (1978), “Asset prices in an exchange economy”, Econometrica 46:1429−1445. MacKinlay, A.C. (1995), “Multifactor models do not explain deviations from the CAPM”, Journal of Financial Economics 38:3−28.
MacKinlay, A.C., and M.P. Richardson (1991), “Using the generalized method of moments to test mean-variance efficiency”, Journal of Finance 46:511−528. Mehra, R., and E. Prescott (1985), “The equity premium: a puzzle”, Journal of Monetary Economics 15:145−162. Merton, R.C. (1973), “An intertemporal capital asset pricing model”, Econometrica 41:867−887. Merton, R.C., and R. Henriksson (1981), “On market timing and investment performance II: Statistical procedures for evaluating forecasting skills”, Journal of Business 54:513−533. Newey, W., and K.D. West (1987), “A simple, positive definite, heteroskedasticity and autocorrelation consistent covariance matrix”, Econometrica 55:703−708. Pesaran, M.H., and A. Timmermann (1995), “Predictability of stock returns: robustness and economic significance”, Journal of Finance 50:1201−1228. Pontiff, J., and L. Schall (1998), “Book-to-market as a predictor of market returns”, Journal of Financial Economics 49:141−160. Reinganum, M.R. (1981), “Misspecification of capital asset pricing: empirical anomalies based on earnings yields and market values”, Journal of Financial Economics 9:19−46. Roll, R. (1977), “A critique of the asset pricing theory’s tests – part 1: On past and potential testability of the theory”, Journal of Financial Economics 4:129−176. Roll, R. (1978), “Ambiguity when performance is measured by the security market line”, Journal of Finance 33:1051−1069. Roll, R., and S.A. Ross (1980), “An empirical investigation of the arbitrage pricing theory”, Journal of Finance 35:1073−1103. Roll, R.R. (1984), “A simple implicit measure of the effective bid-ask spread in an efficient market”, Journal of Finance 39:1127−1140. Roll, R.R. (1988), “R²”, Journal of Finance 43:541−566. Ross, S. (1977), “Risk, return and arbitrage”, in: I. Friend and J. Bicksler, eds., Risk and Return in Finance (Ballinger, Cambridge, MA). Ross, S.A.
(1976), “The arbitrage pricing theory of capital asset pricing”, Journal of Economic Theory 13:341−360. Rubinstein, M. (1974), “An aggregation theorem for securities markets”, Journal of Financial Economics 1:225−244. Rubinstein, M. (1976), “The valuation of uncertain income streams and the pricing of options”, Bell Journal of Economics and Management Science 7:407−425. Sarkissian, S. (2002), “Incomplete consumption risk sharing and currency risk premiums”, Review of Financial Studies, forthcoming. Scholes, M., and J. Williams (1977), “Estimating beta from nonsynchronous data”, Journal of Financial Economics 5:309−327. Shanken, J. (1987), “Multivariate proxies and asset pricing relations: living with the Roll critique”, Journal of Financial Economics 18:91−110. Shanken, J. (1992), “On the estimation of beta pricing models”, Review of Financial Studies 5:1−34. Sharpe, W.F. (1964), “Capital asset prices: a theory of market equilibrium under conditions of risk”, Journal of Finance 19:425−442. Sharpe, W.F. (1977), “The capital asset pricing model: a multi-beta interpretation”, in: H. Levy and M. Sarnat, eds., Financial Decision Making under Uncertainty (Academic Press, New York). Simin, T. (2002), “The (poor) predictive performance of asset pricing models”, Working Paper (Pennsylvania State University). Singleton, K. (1990), “Specification and estimation of intertemporal asset pricing models”, in: B. Friedman and F. Hahn, eds., Handbook of Monetary Economics (Elsevier, Amsterdam). Snow, K.N. (1991), “Diagnosing asset pricing models using the distribution of asset returns”, Journal of Finance 46:955−983. Solnik, B. (1993), “The unconditional performance of international asset allocation strategies using conditioning information”, Journal of Empirical Finance 1:33−55.
Stambaugh, R.F. (1982), “On the exclusion of assets from tests of the two-parameter model”, Journal of Financial Economics 10:235−268. Stambaugh, R.F. (1983), “Testing the CAPM with broader market indexes: a problem of mean deficiency”, Journal of Banking and Finance 7:5−16. Stambaugh, R.F. (1999), “Predictive regressions”, Journal of Financial Economics 54:375−421. Starks, L. (1987), “Performance incentive fees: an agency theoretic approach”, Journal of Financial and Quantitative Analysis 22:17−32. Theil, H. (1971), Principles of Econometrics (Wiley, New York). Treynor, J., and K. Mazuy (1966), “Can mutual funds outguess the market?” Harvard Business Review 44:131−136. Wheatley, S. (1989), “A critique of latent variable tests of asset pricing models”, Journal of Financial Economics 23:325−338. Wilson, R.B. (1968), “The theory of syndicates”, Econometrica 36:119−131. Zeldes, S. (1989), “Consumption and liquidity constraints: an empirical investigation”, Journal of Political Economy 97:305−346. Zheng, L. (1999), “Is money smart? A study of mutual fund investors’ fund selection ability”, Journal of Finance 54:901−933.
Chapter 13
CONSUMPTION-BASED ASSET PRICING
JOHN Y. CAMPBELL °
Harvard University and NBER
Contents
Abstract
Keywords
1. Introduction
2. International stock market data
3. The equity premium puzzle
3.1. The stochastic discount factor
3.2. Consumption-based asset pricing with power utility
3.3. The risk-free rate puzzle
3.4. Bond returns and the equity-premium and risk-free rate puzzles
3.5. Separating risk aversion and intertemporal substitution
4. The dynamics of asset returns and consumption
4.1. Time-variation in conditional expectations
4.2. A loglinear asset-pricing framework
4.3. The equity volatility puzzle
4.4. Implications for the equity premium puzzle
4.5. What does the stock market forecast?
4.6. Changing volatility in stock returns
4.7. What does the bond market forecast?
5. Cyclical variation in the price of risk
5.1. Habit formation
5.2. Models with heterogeneous agents
5.3. Irrational expectations
6. Some implications for macroeconomics
References
° This chapter is a revised and updated version of John Y. Campbell, 1999, “Asset prices, consumption, and the business cycle”, in: J. Taylor and M. Woodford, eds., Handbook of Macroeconomics, Vol. 1, Chapter 19, pp. 1231–1303. All the acknowledgements in that chapter continue to apply. In addition, I am grateful to the National Science Foundation for financial support, to the Faculty of Economics and Politics at the University of Cambridge for the invitation to deliver the 2001 Marshall Lectures, where I presented some of the ideas in this chapter, to Andrew Abel, Sydney Ludvigson, and Rajnish Mehra for helpful comments, and to Samit Dasgupta, Stephen Shore, Daniel Waldman, and Motohiro Yogo for able research assistance. Email: [email protected]. Web page http://post.economics.harvard.edu/faculty/campbell/campbell.html.
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved
Abstract
This chapter reviews the behavior of financial asset prices in relation to consumption. The chapter lists some important stylized facts that characterize U.S. data, and relates them to recent developments in equilibrium asset pricing theory. Data from other countries are examined to see which features of the U.S. experience apply more generally. The chapter argues that to make sense of asset market behavior one needs a model in which the market price of risk is high, time-varying, and correlated with the state of the economy. Models that have this feature, including models with habit-formation in utility, heterogeneous investors, and irrational expectations, are discussed. The main focus is on stock returns and short-term real interest rates, but bond returns are also considered.
Keywords
equity premium puzzle, risk aversion, stochastic discount factor, intertemporal marginal rate of substitution, risk-free rate puzzle, equity volatility
JEL classification: G12, E21
1. Introduction
The behavior of aggregate stock prices is a subject of enduring fascination to investors, policymakers, and economists. In the last 20 years stock markets have continued to show some familiar patterns, including high average returns and volatile and procyclical price movements. Economists have struggled to understand these patterns. If stock prices are determined by fundamentals, then what exactly are these fundamentals and what is the mechanism by which they move prices?
Researchers, working primarily with U.S. data, have documented a host of interesting stylized facts about the stock market and its relation to short-term interest rates and aggregate consumption.
1. The average real return on stock is high. In quarterly U.S. data over the period 1947.2 to 1998.4, a standard data set that is used throughout this chapter, the average real stock return was 8.1% at an annual rate.¹
2. The average riskless real interest rate is low. 3-month Treasury bills deliver a return that is riskless in nominal terms and close to riskless in real terms because there is only modest uncertainty about inflation at a 3-month horizon. In the postwar quarterly U.S. data, the average real return on 3-month Treasury bills was 0.9% per year.
3. Real stock returns are volatile, with an annualized standard deviation of 15.6% in the U.S. data.
4. The real interest rate is much less volatile. The annualized standard deviation of the ex post real return on U.S. Treasury bills is 1.7%, and much of this is due to short-run inflation risk. Less than half the variance of the real bill return is forecastable, so the standard deviation of the ex ante real interest rate is considerably smaller than 1.7%.
5. Real consumption growth is very smooth. The annualized standard deviation of the growth rate of seasonally adjusted real consumption of nondurables and services is 1.1% in the U.S. data.
6. Real dividend growth is extremely volatile at short horizons because dividend data are not adjusted to remove seasonality in dividend payments. The annualized quarterly standard deviation of real dividend growth is 28.3% in the U.S. data. At longer horizons, however, the volatility of dividend growth is intermediate between the volatility of stock returns and the volatility of consumption growth. At an annual frequency, for example, the volatility of real dividend growth is only 6% in the U.S. data.
7. Quarterly real consumption growth and real dividend growth have a very weak correlation of 0.05 in the U.S. data, but the correlation increases at lower frequencies to 0.25 at a 4-year horizon.
8. Real consumption growth and real stock returns have a quarterly correlation of 0.23 in the U.S. data. The correlation increases to 0.34 at a 1-year horizon, and declines at longer horizons.
9. Quarterly real dividend growth and real stock returns have a very weak correlation of 0.03 in the U.S. data, but the correlation increases dramatically at lower frequencies to reach 0.47 at a 4-year horizon.
10. Real U.S. consumption growth is not well forecast by its own history or by the stock market. The first-order autocorrelation of the quarterly growth rate of real nondurables and services consumption is a modest 0.2, and the log price-dividend ratio forecasts less than 4% of the variation of real consumption growth at horizons of 1 to 4 years.
11. Real U.S. dividend growth has some short-run forecastability arising from the seasonality of dividend payments. But it is not well forecast by the stock market. The log price-dividend ratio forecasts no more than 8% of the variation of real dividend growth at horizons of 1 to 4 years.
12. The real interest rate has some positive serial correlation; its first-order autocorrelation in postwar quarterly U.S. data is 0.5. However the real interest rate is not well forecast by the stock market, since the log price-dividend ratio forecasts less than 1% of the variation of the real interest rate at horizons of 1 to 4 years.
13. Excess returns on U.S. stock over Treasury bills are highly forecastable. The log price-dividend ratio forecasts 10% of the variance of the excess return at a 1-year horizon, 22% at a 2-year horizon, and 38% at a 4-year horizon.
These facts raise two important questions for students of macroeconomics and finance.
• Why is the average real stock return so high in relation to the average short-term real interest rate?
• Why is the volatility of real stock returns so high in relation to the volatility of the short-term real interest rate?
¹ Here and throughout the chapter, the word return is used to mean a log or continuously compounded return unless otherwise stated. Thus the average return corresponds to a geometric average, which is lower than the arithmetic average of simple returns.
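The chapter reports log (continuously compounded) returns, so its averages are geometric rather than arithmetic. The sketch below illustrates the gap between the two averages on a made-up four-period series of simple returns; the series and the function name are hypothetical and are not drawn from the chapter's data.

```python
import math

def average_returns(simple_returns):
    """Return (arithmetic mean, mean log return, geometric mean) of simple returns."""
    n = len(simple_returns)
    arithmetic = sum(simple_returns) / n
    # Log return of each period: log(1 + R).
    mean_log = sum(math.log1p(r) for r in simple_returns) / n
    # Geometric mean simple return implied by the average log return.
    geometric = math.expm1(mean_log)
    return arithmetic, mean_log, geometric

# Hypothetical simple returns, chosen only for illustration.
example_returns = [0.20, -0.10, 0.15, -0.05]
arithmetic, mean_log, geometric = average_returns(example_returns)

print(f"arithmetic mean of simple returns: {arithmetic:.4f}")
print(f"mean log return:                   {mean_log:.4f}")
print(f"geometric mean simple return:      {geometric:.4f}")
```

For any series with some variation, the geometric mean is below the arithmetic mean, and the gap widens as the returns become more volatile.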
Mehra and Prescott (1985) call the first question the “equity premium puzzle”.² Finance theory explains the expected excess return on any risky asset over the riskless interest rate as the quantity of risk times the price of risk. In a standard consumption-based asset pricing model of the type studied by Rubinstein (1976), Lucas (1978), Grossman and Shiller (1981) and Hansen and Singleton (1983), the quantity of stock market risk is measured by the covariance of the excess stock return with consumption growth, while the price of risk is the coefficient of relative risk aversion of a representative investor. The high average stock return and low riskless interest rate (stylized facts 1 and 2) imply that the expected excess return on stock, the equity premium, is high. But the smoothness of consumption (stylized fact 5) makes the covariance of stock returns with consumption low; hence the equity premium can only be explained by a very high coefficient of risk aversion.
² For excellent recent surveys, see Kocherlakota (1996) or Cochrane (2001).
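As a rough back-of-the-envelope illustration of the "quantity of risk times price of risk" argument, the sketch below plugs the figures quoted in stylized facts 1, 2, 3, 5 and 8 into the textbook lognormal power-utility relation, E[r_e − r_f] + σ_e²/2 = γ · cov(r_e, Δc). That relation, the function name, and the use of raw stock-return moments in place of excess-return moments are assumptions made here for illustration. The result should be read only as an order of magnitude.

```python
# Figures quoted in the stylized facts (postwar quarterly U.S. data, annualized).
MEAN_STOCK_RETURN = 0.081       # fact 1: average real log stock return
MEAN_BILL_RETURN = 0.009        # fact 2: average real log Treasury bill return
SD_STOCK_RETURN = 0.156         # fact 3: standard deviation of real stock returns
SD_CONSUMPTION_GROWTH = 0.011   # fact 5: standard deviation of consumption growth
CORR_STOCK_CONSUMPTION = 0.23   # fact 8: quarterly correlation

def implied_risk_aversion(log_premium, sd_return, sd_consumption, correlation):
    """Solve log_premium + sd_return**2 / 2 = gamma * covariance for gamma.

    Assumes joint lognormality and power utility (an assumption of this
    sketch, not something stated in the passage above).
    """
    # Quantity of risk: covariance of the return with consumption growth.
    covariance = correlation * sd_return * sd_consumption
    # Jensen's-inequality adjustment converts the log premium to a simple premium.
    adjusted_premium = log_premium + 0.5 * sd_return ** 2
    return adjusted_premium / covariance

log_premium = MEAN_STOCK_RETURN - MEAN_BILL_RETURN
gamma = implied_risk_aversion(
    log_premium, SD_STOCK_RETURN, SD_CONSUMPTION_GROWTH, CORR_STOCK_CONSUMPTION
)

print(f"log equity premium:         {log_premium:.3f}")
print(f"implied relative risk aversion: {gamma:.0f}")
```

Because consumption growth is so smooth, the covariance in the denominator is tiny, and the implied coefficient comes out in the hundreds rather than in the single digits usually considered plausible.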
Shiller (1982), Hansen and Jagannathan (1991), and Cochrane and Hansen (1992), building on the work of Rubinstein (1976), have related the equity premium puzzle to the volatility of the stochastic discount factor, or equivalently the volatility of the intertemporal marginal rate of substitution of a representative investor. Expressed in these terms, the equity premium puzzle is that an extremely volatile stochastic discount factor is required to match the ratio of the equity premium to the standard deviation of stock returns (the Sharpe ratio of the stock market). Some authors, such as Kandel and Stambaugh (1991), have responded to the equity premium puzzle by arguing that risk aversion is indeed much higher than traditionally thought. However this can lead to the “risk-free rate puzzle” of Weil (1989). If investors are very risk averse, then they have a strong desire to transfer wealth from periods with high consumption to periods with low consumption. Since consumption has tended to grow steadily over time, high risk aversion makes investors want to borrow to reduce the discrepancy between future consumption and present consumption. To reconcile this with the low real interest rate we observe, we must postulate that investors are extremely patient; their preferences give future consumption almost as much weight as current consumption, or even greater weight than current consumption. In other words they have a low or even negative rate of time preference. I will call the second question the “equity volatility puzzle”. To understand the puzzle, it is helpful to classify the possible sources of stock market volatility. Recall first that prices, dividends, and returns are not independent but are linked by an accounting identity. If an asset’s price is high today, then either its dividend must be high tomorrow, or its return must be low between today and tomorrow, or its price must be even higher tomorrow. 
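The accounting identity just described can be written out explicitly. In the sketch below the notation is an assumption introduced for illustration: P_t is the end-of-period price, D_{t+1} the dividend paid next period, and R_{t+1} the simple (not log) net return. Solving the identity forward for K periods separates the price into discounted dividends and a discounted terminal price.

```latex
% One-period return identity, rearranged as a pricing relation
1 + R_{t+1} \equiv \frac{P_{t+1} + D_{t+1}}{P_t}
\quad\Longrightarrow\quad
P_t = \frac{D_{t+1} + P_{t+1}}{1 + R_{t+1}} .

% Substituting forward K times
P_t = \sum_{k=1}^{K} \left[ \prod_{j=1}^{k} \frac{1}{1 + R_{t+j}} \right] D_{t+k}
    + \left[ \prod_{j=1}^{K} \frac{1}{1 + R_{t+j}} \right] P_{t+K} .
```

The last term is the discounted terminal price. Whether it vanishes as K grows is exactly the question of whether explosive price paths are ruled out.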
If one excludes the possibility that an asset price can grow explosively forever in a “rational bubble”, then it follows that an asset with a high price today must have some combination of high dividends over the indefinite future and low returns over the indefinite future. Investors must recognize this fact in forming their expectations, so when an asset price is high investors expect some combination of high future dividends and low future returns. Movements in prices must then be associated with some combination of changing expectations (“news”) about future dividends and changing expectations about future returns; the latter can in turn be broken into news about future riskless real interest rates and news about future excess returns on stocks over short-term debt. Until the early 1980s, most financial economists believed that there was very little predictable variation in stock returns and that dividend news was by far the most important factor driving stock market fluctuations. LeRoy and Porter (1981) and Shiller (1981) challenged this orthodoxy by pointing out that plausible measures of expected future dividends are far less volatile than real stock prices. Their work is related to stylized facts 6, 9, and 11. Later in the 1980s Campbell and Shiller (1988a,b), Fama and French (1988a,b, 1989), Poterba and Summers (1988) and others showed that there appears to be a forecastable component of stock returns that is important when returns are measured over long horizons. The variables that predict returns are ratios of stock prices to scale
factors such as dividends, earnings, moving averages of earnings, or the book value of equity. When stock prices are high relative to these scale factors, subsequent long-horizon real stock returns tend to be low. This predictable variation in stock returns is not matched by any equivalent variation in long-term real interest rates, which are comparatively stable and do not seem to move with the stock market. In the late 1970s, for example, real interest rates were unusually low yet stock prices were depressed, implying high forecast stock returns; the 1980s saw much higher real interest rates along with buoyant stock prices, implying low forecast stock returns. Thus excess returns on stock over Treasury bills are just as forecastable as real returns on stock. This work is related to stylized facts 12 and 13. Campbell (1991) used this evidence to show that much of stock market volatility is associated with changing forecasts of excess stock returns. Changing forecasts of dividend growth and real interest rates are less important empirically. The equity volatility puzzle is closely related to the equity premium puzzle. A complete model of stock market behavior must explain both the average level of stock prices and their movements over time. One strand of work on the equity premium puzzle makes this explicit by studying not the consumption covariance of measured stock returns, but the consumption covariance of returns on hypothetical assets whose dividends are determined by consumption. The same model is used to generate both the volatility of stock prices and the implied equity premium. This was the approach of Mehra and Prescott (1985), and many subsequent authors have followed their lead. Unfortunately, it is not easy to construct a general equilibrium model that fits all the stylized facts given above.
The standard model of Mehra and Prescott (1985) gets variation in stock prices relative to dividends only from predictable variation in consumption growth which moves the expected dividend growth rate and the riskless real interest rate. The model is not consistent with the empirical evidence for predictable variation in excess stock returns. Bond market data pose a further challenge to this standard model of stock returns. In the model, stocks behave very much like long-term real bonds; both assets are driven by long-term movements in the riskless real interest rate. Thus, parameter values that produce a large equity premium tend also to produce a large term premium on real bonds. While there is little direct evidence on real bond premia, nominal bond premia have historically been much smaller than equity premia. Since the data suggest that predictable variation in excess returns is an important source of stock market volatility, researchers have begun to develop models in which the quantity of stock market risk or the price of risk change through time. ARCH models and other econometric methods show that the conditional variance of stock returns is highly variable. If this conditional variance is an adequate proxy for the quantity of stock market risk, then perhaps it can explain the predictability of excess stock returns. There are several problems with this approach. First, changes in conditional variance are most dramatic in daily or monthly data and are much weaker at lower frequencies. There is some business-cycle variation in volatility, but it does not seem strong enough to explain large movements in aggregate stock prices [Bollerslev,
Chou and Kroner (1992), Schwert (1989)]. Second, forecasts of excess stock returns do not move proportionally with estimates of conditional variance [Harvey (1989, 1991), Chou, Engle and Kane (1992)]. Finally, one would like to derive stock market volatility endogenously within a model rather than treating it as an exogenous variable. There is little evidence of cyclical variation in consumption or dividend volatility that could explain the variation in stock market volatility. A more promising possibility is that the price of risk varies over time. Time-variation in the price of risk arises naturally in a model with a representative agent whose utility displays habit-formation. Campbell and Cochrane (1999), building on the work of Abel (1990), Constantinides (1990), and others, have proposed a simple asset pricing model of this sort. Campbell and Cochrane suggest that assets are priced as if there were a representative agent whose utility is a power function of the difference between consumption and “habit”, where habit is a slow-moving nonlinear average of past aggregate consumption. This utility function makes the agent more risk-averse in bad times, when consumption is low relative to its past history, than in good times, when consumption is high relative to its past history. Stock market volatility is explained by a small amount of underlying consumption (dividend) risk, amplified by variable risk aversion; the equity premium is explained by high stock market volatility, together with a high average level of risk aversion. Similar ideas have been put forward in the recent literature on behavioral finance. Kahneman and Tversky (1979) used experimental evidence to argue that agents behave as if their utility function is kinked at a reference point which is close to the current level of wealth. 
Benartzi and Thaler (1995) argued that Kahneman and Tversky’s “prospect theory” could explain the equity premium puzzle if agents frequently evaluate their utility and reset their reference points, so that the kink in utility increases their effective risk aversion. Barberis, Huang and Santos (2001), building on behavioral evidence of Thaler and Johnson (1990), argue that prospect theory should be extended to make agents effectively less risk averse if their wealth has recently risen, very much in the spirit of a habit-formation model. Time-variation in the price of risk can also arise from the interaction of heterogeneous agents. Constantinides and Duffie (1996) develop a simple framework with many agents who have identical utility functions but heterogeneous streams of labor income; they show how changes in the cross-sectional distribution of income can generate any desired behavior of the market price of risk. Dumas (1989), Grossman and Zhou (1996), Wang (1996), Sandroni (1999) and Chan and Kogan (2002) move in a somewhat different direction by exploring the interactions of agents who have different levels of risk aversion. Some aspects of asset market behavior could also be explained by irrational expectations of investors. If investors are excessively pessimistic about economic growth, for example, they will overprice short-term bills and underprice stocks; this would help to explain the equity premium and risk-free rate puzzles. If investors overestimate the persistence of variations in economic growth, they will overprice stocks when growth has been high and underprice them when growth has been low,
producing time-variation in the price of risk [Barsky and De Long (1993), Barberis, Shleifer and Vishny (1998)]. This chapter has three objectives. First, it tries to summarize recent work on stock price behavior, much of which is highly technical, in a way that is accessible to a broader professional audience. Second, the chapter summarizes stock market data from other countries and asks which of the U.S. stylized facts hold true more generally. The recent theoretical literature is used to guide the exploration of the international data. Third, the chapter systematically compares stock market data with bond market data. This is an important discipline because some popular models of stock prices are difficult to reconcile with the behavior of bond prices. The organization of the chapter is as follows. Section 2 introduces the international data and reviews stylized facts 1–9 to see which of them apply outside the USA. (Additional details are given in a Data Appendix available on the author’s web page.) Section 3 discusses the equity premium puzzle, taking the volatility of stock returns as given. Section 4 discusses the stock market volatility puzzle. This section also reviews stylized facts 10–13 in the international data. Sections 3 and 4 drive one towards the conclusion that the price of risk is both high and time-varying. It must be high to explain the equity premium puzzle, and it must be time-varying to explain the predictable variation in stock returns that seems to be responsible for the volatility of stock returns. Section 5 discusses models which produce this result, including models with habit-formation in utility, heterogeneous investors, and irrational expectations. Section 6 draws some implications for research in macroeconomics, including the modelling of investment, labor supply, and the welfare costs of economic fluctuations.
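The habit-formation mechanism described earlier in this section can be illustrated with a small calculation. The sketch assumes utility of the form u(C, X) = (C − X)^(1−γ) / (1−γ), with the habit level X held fixed when consumption changes; under that assumption local relative risk aversion, −C·u''/u', equals γ divided by the surplus consumption ratio (C − X)/C. The function name and the numerical values are hypothetical and are not taken from the chapter.

```python
def local_risk_aversion(gamma, consumption, habit):
    """Local relative risk aversion for power utility over (consumption - habit).

    With the habit level treated as fixed, -C * u''(C) / u'(C) equals
    gamma / S, where S = (C - X) / C is the surplus consumption ratio.
    """
    if consumption <= habit:
        raise ValueError("consumption must exceed habit for utility to be defined")
    surplus_ratio = (consumption - habit) / consumption
    return gamma / surplus_ratio

# Hypothetical values: the utility curvature parameter stays at 2 throughout.
GAMMA = 2.0
good_times = local_risk_aversion(GAMMA, consumption=100.0, habit=90.0)
bad_times = local_risk_aversion(GAMMA, consumption=100.0, habit=95.0)

print(f"risk aversion when consumption is well above habit: {good_times:.1f}")
print(f"risk aversion when consumption is close to habit:   {bad_times:.1f}")
```

A modest curvature parameter can therefore coexist with high and strongly countercyclical effective risk aversion, which is the feature the introduction argues is needed to account for both the equity premium and the predictability of excess returns.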
2. International stock market data

The stylized facts described in the previous section apply to postwar quarterly U.S. data. Most empirical work on stock prices uses this data set, or a longer annual U.S. time series originally put together by Shiller (1981). But data on stock prices, interest rates, and consumption are also available for many other countries. In this chapter I use an updated version of the international developed-country data set in Campbell (1999). The dataset includes Morgan Stanley Capital International (MSCI) stock market data covering the period since 1970. I combine the MSCI data with macroeconomic data on consumption, short- and long-term interest rates, and the price level from the International Financial Statistics (IFS) of the International Monetary Fund. I am able to use consumption of nondurables and services for the USA, but must use total consumption expenditure for the other countries in the dataset. For some countries the IFS data are only available quarterly over a shorter sample period, so I use the longest available sample for each country. Sample start dates range from 1970.1 to 1982.2, and sample end dates range from 1997.4 to 1999.3. I work
with data from 11 countries: Australia, Canada, France, Germany, Italy, Japan, the Netherlands, Sweden, Switzerland, UK and USA.

For some purposes it is useful to have data over a much longer span of calendar time. I have been able to obtain annual data for Sweden over the period 1920–1998 and the UK over the period 1919–1998 to complement the U.S. annual data for the period 1891–1998. The Swedish data come from Frennberg and Hansson (1992) and Hassler, Lundvik, Persson and Söderlind (1994), while the UK data come from Barclays de Zoete Wedd Securities (1995) and the Economist (1987).³

In working with international stock market data, it is important to keep in mind that different national stock markets are of very different sizes, both absolutely and in proportion to national GDP. Campbell (1999, Table 1) reports that in the quarterly MSCI data for 1993 the Japanese MSCI index was worth only 65% of the U.S. MSCI index, the UK MSCI index was worth only 30% of the U.S. index, the French and German MSCI indexes were worth only 11% of the U.S. index, and all other countries’ indexes were worth less than 10% of the U.S. index. The USA and Japan together accounted for 66% of the world MSCI capitalization, while the USA, Japan, the UK, France, and Germany together accounted for 86%.

The same table shows that different countries’ stock market values are very different as a fraction of GDP. If one thinks that total wealth-output ratios are likely to be fairly constant across countries, then this indicates that national stock markets are very different fractions of total wealth in different countries. In highly capitalized countries such as the UK and Switzerland, the MSCI index accounted for about 80% of GDP in 1993, whereas in Germany and Italy it accounted for less than 20% of GDP.
The theoretical convention of treating the stock market as a claim to total consumption, or as a proxy for the aggregate wealth of an economy, makes much more sense in the highly capitalized countries.⁴

Table 1 reports summary statistics for international asset returns. For each country the table reports the mean, standard deviation, and first-order autocorrelation of the real stock return and the real return on a short-term debt instrument.⁵ The first line of Table 1 gives numbers for the standard postwar quarterly U.S. data set summarized in the introduction. The top panel gives numbers for the 11-country quarterly MSCI data, and the bottom panel gives numbers for the long-term annual data sets.

³ The annual data end in 1994 or 1995 and are updated using the more recently available quarterly data. Full details about the construction of the quarterly and annual data are given in a Data Appendix available on the author’s web page. Dimson, Marsh and Staunton (2002) report summary statistics for a more comprehensive long-term annual international dataset.
⁴ Stock ownership also tends to be much more concentrated in the countries with low capitalization. La Porta, Lopez-de-Silanes, Shleifer and Vishny (1997) have related these international patterns to differences in the protections afforded outside investors by different legal systems.
⁵ As explained in the Data Appendix, the best available short-term interest rate is sometimes a Treasury bill rate and sometimes another money market interest rate. Both means and standard deviations are given in annualized percentage points. To annualize the raw quarterly numbers, means are multiplied by 400 while standard deviations are multiplied by 200 (since standard deviations increase with the square root of the time interval in serially uncorrelated data).

Table 1 International stock and bill returns

Country  Sample period    mean r_e    σ(r_e)    ρ(r_e)  mean r_f    σ(r_f)    ρ(r_f)
USA      1947.2–1998.4       8.085    15.645     0.083     0.896     1.748     0.508

AUL      1970.1–1999.1       3.540    22.699     0.005     2.054     2.528     0.645
CAN      1970.1–1999.2       5.431    17.279     0.072     2.713     1.855     0.667
FR       1973.2–1998.4       9.023    23.425     0.048     2.715     1.837     0.710
GER      1978.4–1997.4       9.838    20.097     0.090     3.219     1.152     0.348
ITA      1971.2–1998.2       3.168    27.039     0.079     2.371     2.847     0.691
JAP      1970.2–1999.1       4.715    21.909     0.021     1.388     2.298     0.480
NTH      1977.2–1998.4      14.070    17.228    −0.030     3.377     1.591    −0.085
SWD      1970.1–1999.3      10.648    23.839     0.022     1.995     2.835     0.260
SWT      1982.2–1999.1      13.744    21.828    −0.128     1.393     1.498     0.243
UK       1970.1–1999.2       8.155    21.190     0.084     1.301     2.957     0.478
USA      1970.1–1998.4       6.929    17.556     0.051     1.494     1.687     0.571

SWD      1920–1998           7.084    18.641     0.096     2.209     5.800     0.710
UK       1919–1998           7.713    22.170    −0.023     1.255     5.319     0.589
USA      1891–1998           7.169    18.599     0.047     2.020     8.811     0.338

The table shows that the first four stylized facts given in the introduction are fairly robust across countries.
1. Stock markets have delivered average real returns of 4.5% or better in almost every country and time period. The exceptions to this occur in short-term quarterly data, and are concentrated in markets that are particularly small relative to GDP (Italy, 3.2%), or that predominantly represent claims on natural resources (Australia, 3.5%). The very poor performance of the Japanese stock market in the 1990s has reduced the average Japanese return to 4.7%.
2. Short-term debt has rarely delivered an average real return above 3%. The exceptions to this occur in two countries, Germany and the Netherlands, whose sample periods begin in the late 1970s and thus exclude much of the surprise inflation of the oil-shock period.
3. The annualized standard deviation of stock returns ranges from 15% to 27%. It is striking that the market with the highest volatility, Italy, is the smallest market relative to GDP and the one with the lowest average return.
4. In quarterly data the annualized volatility of real returns on short debt is 2.9% for the UK, 2.8% for Italy and Sweden, 2.5% for Australia, 2.3% for Japan, and below 2% for all other countries. Volatility is higher in long-term annual data because of large swings in inflation in the interwar period, particularly in 1919–1921. Much of
the volatility in these real returns is probably due to unanticipated inflation and does not reflect volatility in the ex ante real interest rate.

These numbers show that high average stock returns, relative to the returns on short-term debt, are not unique to the USA but characterize many other countries as well. Recently a number of authors have suggested that average excess returns in the USA may be overstated by sample selection or survivorship bias. If economists study the USA because it has had an unusually successful economy, then sample average U.S. stock returns may overstate the true mean U.S. stock return. Brown, Goetzmann and Ross (1995) present a formal model of this effect. While survivorship bias may affect data from all the countries included in Table 1, it is reassuring that the stylized facts are so consistent across these countries.⁶

Table 2 turns to data on aggregate consumption and stock market dividends. The table is organized in the same way as Table 1. It illustrates the robustness of two more of the stylized facts given in the introduction.
5. In the postwar period the annualized standard deviation of real consumption growth is never above 3%. This is true even though data are used on total consumption, rather than nondurables and services consumption, for all countries other than the USA. Even in the longer annual data, which include the turbulent interwar period, consumption volatility slightly exceeds 3% only in the USA.
6. The volatility of dividend growth is much greater than the volatility of consumption growth, but generally less than the volatility of stock returns. The exceptions to this occur in countries with highly seasonal dividend payments; these countries have large negative autocorrelations for quarterly dividend growth and much smaller volatility when dividend growth is measured over a full year rather than over a quarter.
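The annualized figures in Tables 1 and 2 follow the convention stated in footnote 5: raw quarterly means are multiplied by 400 and raw quarterly standard deviations by 200. A minimal Python sketch of that conversion is given below; the function name and the quarterly input values are hypothetical and serve only to illustrate the scaling.

```python
import math

def annualize_quarterly(mean_q, std_q):
    """Convert raw quarterly moments (natural units) to annualized percentage points.

    The mean scales with the horizon (x4); the standard deviation scales with
    the square root of the horizon (x2) when the series is serially uncorrelated.
    The extra factor of 100 converts natural units to percent.
    """
    periods_per_year = 4
    mean_annual_pct = mean_q * periods_per_year * 100             # x400 overall
    std_annual_pct = std_q * math.sqrt(periods_per_year) * 100    # x200 overall
    return mean_annual_pct, std_annual_pct

# Hypothetical quarterly inputs, chosen only for illustration.
mean_pct, std_pct = annualize_quarterly(0.02, 0.08)
print(mean_pct, std_pct)
```

The square-root rule is exact only without serial correlation, which is why the text notes that dividend growth with strong seasonal autocorrelation looks much less volatile when measured over a full year.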
Table 2 International consumption and dividends

Country  Sample period     mean Δc     σ(Δc)     ρ(Δc)   mean Δd     σ(Δd)     ρ(Δd)
USA      1947.2–1998.4       1.964     1.073     0.216     2.159    28.291    −0.544

AUL      1970.1–1999.1       2.099     2.056    −0.324     0.656    34.584    −0.450
CAN      1970.1–1999.2       2.082     1.971     0.105    −0.488     5.604     0.522
FR       1973.2–1998.4       1.233     2.909     0.029    −0.255    13.108    −0.133
GER      1978.4–1997.4       1.681     2.431    −0.327     1.189     8.932     0.078
ITA      1971.2–1998.2       2.200     1.700     0.283    −3.100    19.092     0.298
JAP      1970.2–1999.1       3.205     2.554    −0.275    −2.350     4.351     0.354
NTH      1977.2–1998.4       1.841     2.619    −0.257     4.679     4.973     0.294
SWD      1970.1–1999.3       0.962     1.856    −0.266     4.977    14.050     0.386
SWT      1982.2–1999.1       0.524     2.112    −0.399     6.052     7.698     0.271
UK       1970.1–1999.2       2.203     2.507    −0.006     0.591     7.047     0.313
USA      1970.1–1998.4       1.812     0.907     0.374     0.612    16.803    −0.578

SWD      1920–1998           1.770     2.816     0.150     1.551    12.894     0.315
UK       1919–1998           1.551     2.886     0.294     1.990     7.824     0.233
USA      1891–1998           1.789     3.218    −0.116     1.516    14.019    −0.087

Table 3 reports the contemporaneous correlations among real consumption growth, real dividend growth, and stock returns. It turns out that these correlations are somewhat sensitive to the timing convention used for consumption. A timing convention is needed because the level of consumption is a flow during a quarter rather than a point-in-time observation; that is, the consumption data are time-averaged.⁷ If we think of a given quarter’s consumption data as measuring consumption at the beginning of the quarter, then consumption growth for the quarter is next quarter’s consumption divided by this quarter’s consumption. If on the other hand we think of the consumption data as measuring consumption at the end of the quarter, then consumption growth is this quarter’s consumption divided by last quarter’s consumption. Table 3 uses the former, “beginning-of-quarter” timing convention because this produces a higher contemporaneous correlation between consumption growth and stock returns. The timing convention has less effect on correlations when the data are measured at longer horizons.

⁶ Jorion and Goetzmann (1999) consider international stock-price data from earlier in the 20th Century and argue that the long-term average real growth rate of stock prices has been higher in the U.S. than elsewhere. However they do not have data on dividend yields, which are an important component of total return and are likely to have been particularly important in Europe during the troubled interwar period. Dimson, Marsh and Staunton (2002) do have dividend yields for the early 20th Century and find that U.S. stock returns were not extraordinarily high relative to other countries in that period. Li and Xu (2002) argue that survival bias can be large only if the ex ante probability that a market fails to survive is unrealistically large.
⁷ Time-averaging is one of a number of interrelated issues that arise in relating measured consumption data to the theoretical concept of consumption. Other issues include measurement error, seasonal adjustment, and durable goods. Grossman, Melino and Shiller (1987), Wheatley (1988), Miron (1986) and Heaton (1995) handle time-averaging, measurement error, seasonality, and durability, respectively, in a much more careful way than is possible here, while Wilcox (1992) provides a detailed account of the sampling procedures used to construct U.S. consumption data.

Table 3 also shows how the correlations among real consumption growth, real dividend growth, and real stock returns vary with the horizon. Each pairwise correlation among these series is calculated for horizons of 1, 4, 8 and 16 quarters in the quarterly data and for horizons of 1, 2, 4 and 8 years in the long-term annual data. The table illustrates three more stylized facts from the introduction.
7. Real consumption growth and dividend growth are generally weakly positively correlated in the quarterly data. In many countries the correlation increases strongly with the measurement horizon. However long-horizon correlations remain close to
zero for Australia and Canada, and are substantially negative for Italy (with a very small stock market) and Japan (with anomalous dividend behavior). The correlations of consumption and dividend growth are positive and often quite large in the longer-term annual data sets.
8. The correlations between real consumption growth rates and stock returns are quite variable across countries. They tend to be somewhat higher in high-capitalization countries (with the notable exception of Switzerland), which is consistent with the view that stock returns proxy more accurately for wealth returns in these countries. Correlations typically increase with the measurement horizon out to 1 or 2 years, and are moderately positive in the longer-term annual data sets.
9. The correlations between real dividend growth rates and stock returns are small at a quarterly horizon but increase dramatically with the horizon. This pattern holds in every country. The correlations also increase strongly with the horizon in the longer-term annual data.

Table 3 Horizon effects on correlations of real consumption growth, dividend growth, and stock returns

                                 ρ(Δc, Δd)                      ρ(Δc, r_e)                      ρ(Δd, r_e)
Country  Sample period         1       4       8      16       1       4       8      16       1       4       8      16
USA      1947.3–1998.3     0.053   0.135   0.249   0.229   0.205   0.340   0.267   0.029   0.034   0.055   0.215   0.471

AUL      1970.2–1998.4    −0.058  −0.044   0.076  −0.039   0.162   0.282   0.261   0.422   0.091  −0.007   0.191   0.390
CAN      1970.2–1999.1    −0.107  −0.120  −0.057  −0.088   0.188   0.352   0.272   0.068  −0.046   0.176   0.414   0.476
FR       1973.2–1998.3     0.076   0.059   0.053   0.166  −0.099  −0.117  −0.320  −0.138   0.094   0.176   0.147   0.134
GER      1978.4–1997.3     0.029   0.118   0.291   0.278   0.027  −0.151  −0.091  −0.249   0.057   0.298   0.421   0.481
ITA      1971.2–1998.1     0.107  −0.106  −0.226  −0.193  −0.028  −0.033  −0.040  −0.201   0.080   0.296   0.382   0.716
JAP      1970.2–1998.4    −0.030  −0.147  −0.217  −0.230   0.112   0.398   0.400   0.235   0.029   0.103   0.120   0.317
NTH      1977.2–1998.3     0.087   0.211   0.331   0.348   0.024   0.174   0.238   0.183   0.122   0.285   0.466   0.624
SWD      1970.2–1999.2     0.074   0.201   0.285   0.370   0.027   0.092   0.114   0.082   0.099   0.090   0.294   0.575
SWT      1982.2–1998.4    −0.019  −0.009   0.138   0.237  −0.119   0.015   0.092  −0.087   0.173   0.218   0.556   0.732
UK       1970.2–1999.1     0.046   0.104   0.118   0.328   0.125   0.197   0.359   0.441  −0.100   0.007   0.284   0.650
USA      1970.2–1998.3    −0.029   0.131   0.256   0.420   0.286   0.359   0.324   0.144   0.015  −0.049   0.020   0.330

SWD      1920–1997         0.261   0.359   0.354   0.084   0.209   0.287   0.387   0.137   0.285   0.447   0.648   0.684
UK       1920–1997         0.083   0.335   0.516   0.422   0.422   0.467   0.458   0.390   0.161   0.442   0.594   0.782
USA      1891–1997         0.178   0.151   0.202   0.098   0.452   0.491   0.396   0.138   0.476   0.503   0.676   0.784

After this preliminary look at the data, I now use some simple finance theory to interpret the stylized facts.

3. The equity premium puzzle

3.1. The stochastic discount factor

To understand the equity premium puzzle, consider the intertemporal choice problem of an investor, indexed by $k$, who can trade freely in some asset $i$ and can obtain a gross simple rate of return $(1 + R_{i,t+1})$ on the asset held from time $t$ to time $t+1$. If the investor consumes $C_{k,t}$ at time $t$ and has time-separable utility with discount factor $\delta$ and period utility $U(C_{k,t})$, then her first-order condition is

$$ U'(C_{k,t}) = \delta E_t\left[(1 + R_{i,t+1})\, U'(C_{k,t+1})\right]. \tag{1} $$

The left-hand side of Equation (1) is the marginal utility cost of consuming one real dollar less at time $t$; the right-hand side is the expected marginal utility benefit from investing the dollar in asset $i$ at time $t$, selling it at time $t+1$, and consuming the proceeds.
The investor equates marginal cost and marginal benefit, so Equation (1) must describe the optimum. Dividing Equation (1) by $U'(C_{k,t})$ yields

$$ 1 = E_t\left[(1 + R_{i,t+1})\,\delta\,\frac{U'(C_{k,t+1})}{U'(C_{k,t})}\right] = E_t\left[(1 + R_{i,t+1})\, M_{k,t+1}\right], \tag{2} $$

where $M_{k,t+1} = \delta U'(C_{k,t+1})/U'(C_{k,t})$ is the intertemporal marginal rate of substitution of the investor, also known as the stochastic discount factor. This way of writing the model in discrete time is due originally to Rubinstein (1976), while the continuous-time version of the model is due to Breeden (1979). Grossman and Shiller (1981), Shiller (1982), Hansen and Jagannathan (1991) and Cochrane and Hansen (1992) have developed the implications of the discrete-time model in detail. Cochrane (2001) gives a textbook exposition of finance using this framework.

The derivation just given for Equation (2) assumes the existence of an investor maximizing a time-separable utility function, but in fact the equation holds more generally. The existence of a positive stochastic discount factor is guaranteed by the absence of arbitrage in markets in which non-satiated investors can trade freely without transactions costs. In general there can be many such stochastic discount factors – for example, different investors $k$ whose marginal utilities follow different stochastic processes will have different $M_{k,t+1}$ – but each stochastic discount factor must satisfy Equation (2). It is common practice to drop the subscript $k$ from this equation and simply write

$$ 1 = E_t\left[(1 + R_{i,t+1})\, M_{t+1}\right]. \tag{3} $$

In complete markets the stochastic discount factor $M_{t+1}$ is unique because investors can trade with one another to eliminate any idiosyncratic variation in their marginal utilities.

To understand the implications of Equation (3) it is helpful to write the expectation of the product as the product of expectations plus the covariance,

$$ E_t\left[(1 + R_{i,t+1})\, M_{t+1}\right] = E_t\left[1 + R_{i,t+1}\right] E_t\left[M_{t+1}\right] + \mathrm{Cov}_t\left[R_{i,t+1}, M_{t+1}\right]. \tag{4} $$

Substituting into Equation (3) and rearranging gives

$$ 1 + E_t\left[R_{i,t+1}\right] = \frac{1 - \mathrm{Cov}_t\left[R_{i,t+1}, M_{t+1}\right]}{E_t\left[M_{t+1}\right]}. \tag{5} $$

An asset with a high expected simple return must have a low covariance with the stochastic discount factor. Such an asset tends to have low returns when investors have high marginal utility. It is risky in that it fails to deliver wealth precisely when wealth is most valuable to investors. Investors therefore demand a large risk premium to hold it.

Equation (5) must hold for any asset, including a riskless asset whose gross simple return is $1 + R_{f,t+1}$. Since the simple riskless return has zero covariance with the stochastic discount factor (or any other random variable), it is just the reciprocal of the expectation of the stochastic discount factor:

$$ 1 + R_{f,t+1} = \frac{1}{E_t\left[M_{t+1}\right]}. \tag{6} $$

This can be used to rewrite Equation (5) as

$$ 1 + E_t\left[R_{i,t+1}\right] = \left(1 + R_{f,t+1}\right)\left(1 - \mathrm{Cov}_t\left[R_{i,t+1}, M_{t+1}\right]\right). \tag{7} $$
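Equations (3) to (7) can be checked numerically in a small discrete-state example. The sketch below assumes a hypothetical two-state economy; the probabilities, the values of the stochastic discount factor, and the payoffs are illustrative assumptions, not numbers from the chapter. The asset is priced so that Equation (3) holds by construction, and the code then checks that Equations (5) and (7) give the same expected return.

```python
# Hypothetical two-state economy: every number here is an illustrative assumption.
probs = [0.5, 0.5]       # state probabilities
sdf = [1.10, 0.85]       # stochastic discount factor M (high in the "bad" state)
payoff = [0.90, 1.20]    # risky payoff next period (low exactly when M is high)

def expect(values):
    """Probability-weighted mean over states."""
    return sum(p * v for p, v in zip(probs, values))

def cov(xs, ys):
    """Covariance over states."""
    mx, my = expect(xs), expect(ys)
    return expect([(x - mx) * (y - my) for x, y in zip(xs, ys)])

# Price the payoff with the SDF, so Equation (3) holds by construction.
price = expect([m * x for m, x in zip(sdf, payoff)])
net_ret = [x / price - 1.0 for x in payoff]        # simple return R_i in each state
riskfree = 1.0 / expect(sdf) - 1.0                 # Equation (6)

lhs = 1.0 + expect(net_ret)
eq5 = (1.0 - cov(net_ret, sdf)) / expect(sdf)            # Equation (5)
eq7 = (1.0 + riskfree) * (1.0 - cov(net_ret, sdf))       # Equation (7)

print(f"1 + E[R_i] = {lhs:.6f}, Eq. (5) = {eq5:.6f}, Eq. (7) = {eq7:.6f}")
print(f"risk premium = {expect(net_ret) - riskfree:.6f}")
```

Because the payoff is low in the state where the discount factor is high, the covariance is negative and the asset earns a positive premium over the riskless rate.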
For simplicity I now follow Hansen and Singleton (1983) and assume that the joint conditional distribution of asset returns and the stochastic discount factor is lognormal and homoskedastic. While these assumptions are not literally realistic – stock returns in particular have fat-tailed distributions with variances that change over time – they do make it easier to discuss the main forces that should determine the equity premium.

When a random variable $X$ is conditionally lognormally distributed, it has the convenient property that

$$ \log E_t X = E_t \log X + \tfrac{1}{2}\mathrm{Var}_t \log X, \tag{8} $$

where $\mathrm{Var}_t \log X \equiv E_t[(\log X - E_t \log X)^2]$. If in addition $X$ is conditionally homoskedastic, then $\mathrm{Var}_t \log X = E[(\log X - E_t \log X)^2] = \mathrm{Var}(\log X - E_t \log X)$. Thus, with joint conditional lognormality and homoskedasticity of asset returns and consumption, I can take logs of Equation (3) and obtain

$$ 0 = E_t r_{i,t+1} + E_t m_{t+1} + \tfrac{1}{2}\left[\sigma_i^2 + \sigma_m^2 + 2\sigma_{im}\right]. \tag{9} $$

Here $m_t = \log(M_t)$ and $r_{i,t} = \log(1 + R_{i,t})$, while $\sigma_i^2$ denotes the unconditional variance of log return innovations $\mathrm{Var}(r_{i,t+1} - E_t r_{i,t+1})$, $\sigma_m^2$ denotes the unconditional variance of innovations to the stochastic discount factor $\mathrm{Var}(m_{t+1} - E_t m_{t+1})$, and $\sigma_{im}$ denotes the unconditional covariance of innovations $\mathrm{Cov}(r_{i,t+1} - E_t r_{i,t+1}, m_{t+1} - E_t m_{t+1})$.

Equation (9) has both time-series and cross-sectional implications. Consider first an asset with a riskless real return $r_{f,t+1}$. For this asset the return innovation variance $\sigma_f^2$ and the covariance $\sigma_{fm}$ are both zero, so the riskless real interest rate obeys

$$ r_{f,t+1} = -E_t m_{t+1} - \frac{\sigma_m^2}{2}. \tag{10} $$

This equation is the log counterpart of Equation (6). Subtracting Equation (10) from Equation (9) yields an expression for the expected excess return on risky assets over the riskless rate:

$$ E_t\left[r_{i,t+1} - r_{f,t+1}\right] + \frac{\sigma_i^2}{2} = -\sigma_{im}. \tag{11} $$
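The lognormal pricing relations in Equations (9) to (11) can be illustrated with a short simulation. The sketch below is only a check under stated assumptions: the parameter values are hypothetical, and the log return and log discount factor are drawn as jointly normal with constant variances. The expected log return is chosen so that Equation (9) holds, and the Monte Carlo average of $(1 + R_{i,t+1}) M_{t+1}$ should then be close to one.

```python
import math
import random

# Hypothetical parameters, not estimates from the chapter.
sigma_i, sigma_m, rho_im = 0.16, 0.50, -0.60   # std. devs. of log return and log SDF, and their correlation
mean_m = -0.15                                  # conditional mean of the log SDF
sigma_im = rho_im * sigma_i * sigma_m           # covariance of the two innovations

# Equation (10): riskless log return.
r_f = -mean_m - 0.5 * sigma_m ** 2
# Equation (9) solved for the expected log return on the risky asset.
mean_r = -mean_m - 0.5 * (sigma_i ** 2 + sigma_m ** 2 + 2.0 * sigma_im)
# Left-hand side of Equation (11): Jensen-adjusted log risk premium.
premium = mean_r - r_f + 0.5 * sigma_i ** 2

# Monte Carlo check that E[(1 + R) M] = 1 under joint lognormality.
rng = random.Random(0)
n = 200_000
total = 0.0
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    m = mean_m + sigma_m * z1
    r = mean_r + sigma_i * (rho_im * z1 + math.sqrt(1.0 - rho_im ** 2) * z2)
    total += math.exp(r + m)    # (1 + R) * M = exp(r + m)
pricing_mc = total / n

print(f"premium = {premium:.4f}, -sigma_im = {-sigma_im:.4f}, E[(1+R)M] ~ {pricing_mc:.3f}")
```

The premium equals minus the covariance exactly, as Equation (11) requires, while the simulated pricing condition holds only up to sampling error.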
The variance term on the left-hand side of Equation (11) is a Jensen’s Inequality adjustment arising from the fact that we are describing expectations of log returns. In effect this term converts the expected excess return from a geometric average to an arithmetic average. It would disappear if we rewrote the equation in terms of the log expectation of the ratio of gross simple returns: $\log E_t[(1 + R_{i,t+1})/(1 + R_{f,t+1})] = -\sigma_{im}$. The right-hand side of Equation (11) says that the risk premium is determined by the negative of the covariance of the asset with the stochastic discount factor. This equation is the log counterpart of Equation (7).

The covariance $\sigma_{im}$ can be written as the product of the standard deviation of the asset return $\sigma_i$, the standard deviation of the stochastic discount factor $\sigma_m$, and the correlation between the asset return and the stochastic discount factor $\rho_{im}$. Since $\rho_{im} \ge -1$, $-\sigma_{im} \le \sigma_i \sigma_m$. Substituting into Equation (11),

$$ \sigma_m \ge \frac{E_t\left[r_{i,t+1} - r_{f,t+1}\right] + \sigma_i^2/2}{\sigma_i}. \tag{12} $$

This inequality was first derived by Shiller (1982); a multi-asset version was derived by Hansen and Jagannathan (1991) and developed further by Cochrane and Hansen (1992). The right-hand side of Equation (12) is the excess log return on an asset, adjusted for Jensen’s Inequality, divided by the standard deviation of the asset’s return – a logarithmic Sharpe ratio for the asset. Equation (12) says that the standard deviation of the log stochastic discount factor must be greater than this Sharpe ratio for all assets $i$, that is, it must be greater than the maximum possible Sharpe ratio obtainable in asset markets.

Table 4 uses Equation (12) to illustrate the equity premium puzzle. For each data set the first column of the table reports the average excess return on stock over short-term debt, adjusted for Jensen’s Inequality by adding one-half the sample variance of the excess log return to get a sample estimate of the numerator in Equation (12). This adjusted or arithmetic average excess return is multiplied by 400 to express it in annualized percentage points. The second column of the table gives the annualized standard deviation of the excess log stock return, a sample estimate of the denominator in Equation (12). This standard deviation was reported earlier in Table 1. The third column gives the ratio of the first two columns, multiplied by 100; this is a sample estimate of the lower bound on the standard deviation of the log stochastic discount factor, expressed in annualized percentage points. In the postwar U.S. data the estimated lower bound is a standard deviation greater than 50% a year; in the other quarterly data sets it is between 15% and 20% for Australia and Italy, between 20% and 30% for Canada and Japan, and above 30% for all the other countries. In the long-run annual data sets the lower bound on the standard deviation exceeds 30% for all three countries.

3.2. Consumption-based asset pricing with power utility

To understand why these numbers are disturbing, I now follow Rubinstein (1976), Lucas (1978), Breeden (1979), Grossman and Shiller (1981), Mehra and Prescott (1985) and other classic papers on the equity premium puzzle and assume that there is a representative agent who maximizes a time-separable power utility function defined over aggregate consumption $C_t$:

$$ U(C_t) = \frac{C_t^{1-\gamma} - 1}{1-\gamma}, \tag{13} $$

where $\gamma$ is the coefficient of relative risk aversion. This utility function has several important properties.
Table 4 The equity premium puzzle

Country  Sample period      aer_e   σ(er_e)     σ(m)    σ(Δc)   ρ(er_e,Δc)   cov(er_e,Δc)     RRA(1)   RRA(2)
USA      1947.2–1998.3      8.071    15.271   52.853    1.071        0.205          3.354    240.647   49.326

AUL      1970.1–1998.4      3.885    22.403   17.342    2.059        0.144          6.640     58.511    8.421
CAN      1970.1–1999.1      3.968    17.266   22.979    1.920        0.202          6.694     59.266   11.966
FR       1973.2–1998.3      8.308    23.175   35.848    2.922       −0.093         −6.315         <0   12.270
GER      1978.4–1997.3      8.669    20.196   42.922    2.447        0.029          1.446    599.468   17.542
ITA      1971.2–1998.1      4.687    27.068   17.314    1.665       −0.006         −0.252         <0   10.400
JAP      1970.2–1998.4      5.098    21.498   23.715    2.561        0.112          6.171     82.620    9.260
NTH      1977.2–1998.3     11.421    16.901   67.576    2.510        0.032          1.344    849.991   26.918
SWD      1970.1–1999.2     11.539    23.518   49.066    1.851        0.015          0.674   1713.197   26.501
SWT      1982.2–1998.4     14.898    21.878   68.098    2.123       −0.112         −5.181         <0   32.076
UK       1970.1–1999.1      9.169    21.198   43.253    2.511        0.093          4.930    185.977   17.222
USA      1970.1–1998.3      6.353    16.976   37.425    0.909        0.274          4.233    150.100   41.178

SWD      1920–1997          6.540    18.763   34.855    5.622        0.167          8.830     74.062   12.400
UK       1919–1997          8.674    21.277   40.767    5.630        0.351         21.042     41.223   14.483
USA      1891–1997          6.723    18.496   36.345    6.437        0.495         29.450     22.827   11.293
First, it is scale-invariant; with constant return distributions, risk premia do not change over time as aggregate wealth and the scale of the economy increase. This is important because over the past two centuries wealth and consumption have increased manyfold, yet riskless interest rates and risk premia do not seem to have trended up or down. Power utility is one of the few utility specifications that are consistent with this fact. Related to this, if different investors in the economy have different wealth levels but the same power utility function, then they can be aggregated into a single representative investor with the same utility function as the individual investors.

A possibly less desirable property of power utility is that the elasticity of intertemporal substitution, which I write as $\psi$, is the reciprocal of the coefficient of relative risk aversion $\gamma$. Epstein and Zin (1991) and Weil (1989) have proposed a more general utility specification that preserves the scale-invariance of power utility but breaks the tight link between the coefficient of relative risk aversion and the elasticity of intertemporal substitution. I discuss this form of utility in Section 3.4 below.

Power utility implies that marginal utility $U'(C_t) = C_t^{-\gamma}$, and the stochastic discount factor $M_{t+1} = \delta (C_{t+1}/C_t)^{-\gamma}$. The assumption made previously that the stochastic discount factor is conditionally lognormal will be implied by the assumption that aggregate consumption is conditionally lognormal [Hansen and Singleton (1983)]. Making this assumption for expositional convenience, the log stochastic discount factor is $m_{t+1} = \log(\delta) - \gamma \Delta c_{t+1}$, where $c_t = \log(C_t)$, and Equation (9) becomes

$$ 0 = E_t r_{i,t+1} + \log \delta - \gamma E_t \Delta c_{t+1} + \tfrac{1}{2}\left[\sigma_i^2 + \gamma^2 \sigma_c^2 - 2\gamma \sigma_{ic}\right]. \tag{14} $$

Here $\sigma_c^2$ denotes the unconditional variance of log consumption innovations $\mathrm{Var}(c_{t+1} - E_t c_{t+1})$, and $\sigma_{ic}$ denotes the unconditional covariance of innovations $\mathrm{Cov}(r_{i,t+1} - E_t r_{i,t+1}, c_{t+1} - E_t c_{t+1})$. Equation (10) now becomes

$$ r_{f,t+1} = -\log \delta + \gamma E_t \Delta c_{t+1} - \frac{\gamma^2 \sigma_c^2}{2}. \tag{15} $$

This equation says that the riskless real rate is linear in expected consumption growth, with slope coefficient equal to the coefficient of relative risk aversion. The conditional variance of consumption growth has a negative effect on the riskless rate which can be interpreted as a precautionary savings effect. Equation (11) becomes

$$ E_t\left[r_{i,t+1} - r_{f,t+1}\right] + \frac{\sigma_i^2}{2} = \gamma \sigma_{ic}. \tag{16} $$
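Equations (15) and (16) are simple enough to evaluate directly. The sketch below is a minimal illustration, not part of the original analysis: the function names are hypothetical, the consumption moments are the postwar U.S. entries of Table 2 converted to natural units, and the discount factor and the grid of risk aversion coefficients are assumptions chosen only to show how the two terms in Equation (15) move with $\gamma$.

```python
import math

def riskless_rate(delta, gamma, mean_dc, sigma_c):
    """Equation (15): log riskless rate under power utility and lognormal consumption."""
    return -math.log(delta) + gamma * mean_dc - 0.5 * gamma ** 2 * sigma_c ** 2

def log_risk_premium(gamma, sigma_ic):
    """Equation (16): Jensen-adjusted expected excess log return."""
    return gamma * sigma_ic

# Postwar U.S. consumption moments from Table 2, in natural units.
mean_dc, sigma_c = 0.01964, 0.01073
# The discount factor and the gamma grid are illustrative assumptions.
delta = 0.98
for gamma in (1.0, 5.0, 10.0):
    rf = riskless_rate(delta, gamma, mean_dc, sigma_c)
    precaution = -0.5 * gamma ** 2 * sigma_c ** 2   # precautionary savings term
    print(f"gamma={gamma:>4}: r_f={rf:.4f}, precautionary term={precaution:.5f}")
```

The growth term rises linearly in $\gamma$ while the precautionary term falls with its square, so with smooth consumption the first effect dominates over this range of values.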
The log risk premium on any asset is the coefficient of relative risk aversion times the covariance of the asset return with consumption growth. Intuitively, an asset with a high consumption covariance tends to have low returns when consumption is low,
822
J.Y. Campbell
that is, when the marginal utility of consumption is high. Such an asset is risky and commands a large risk premium. Table 4 uses Equation (16) to illustrate the equity premium puzzle. As already discussed, the first column of the table reports a sample estimate of the left-hand side of Equation (16), multiplied by 400 to express it in annualized percentage points. The second column reports the annualized standard deviation of the excess log stock return (given earlier in Table 1), the fourth column reports the annualized standard deviation of consumption growth (given earlier in Table 2), the fifth column reports the correlation between the excess log stock return and consumption growth, and the sixth column gives the product of these three variables which is the annualized covariance sic between the log stock return and consumption growth. Finally, the table gives two columns with implied risk aversion coefficients. The column headed RRA(1) uses Equation (16) directly, dividing the adjusted average excess return by the estimated covariance to get estimated risk aversion. 8 The column headed RRA(2) sets the correlation of stock returns and consumption growth equal to one before calculating risk aversion. While this is of course a counterfactual exercise, it is a valuable diagnostic because it indicates the extent to which the equity premium puzzle arises from the smoothness of consumption rather than the low correlation between consumption and stock returns. The correlation is hard to measure accurately because it is easily distorted by short-term measurement errors in consumption, and Table 4 indicates that the sample correlation is quite sensitive to the measurement horizon. By setting the correlation to one, the RRA(2) column indicates the extent to which the equity premium puzzle is robust to such issues. 
A correlation of one is also implicitly assumed in the volatility bound for the stochastic discount factor, Equation (12), and in many calibration exercises such as Mehra and Prescott (1985), Abel (1999) or Campbell and Cochrane (1999). Table 4 shows that the equity premium puzzle is a robust phenomenon in international data. The coefficients of relative risk aversion in the RRA(1) column are generally extremely large. They are usually many times greater than 10, the maximum level considered plausible by Mehra and Prescott (1985). In a few cases the risk aversion coefficients are negative because the estimated covariance of stock returns with consumption growth is negative, but in these cases the covariance is extremely close to zero. Even when one ignores the low correlation between stock returns and consumption growth and gives the model its best chance by setting the correlation to one, the RRA(2) column still has risk aversion coefficients above 10 in all countries except Australia and Japan. Thus, the fact shown in Table 3, that for some countries the correlation of stock returns and consumption increases with the horizon, is unable by itself to resolve the equity premium puzzle.
8 The calculation is done correctly, in natural units, even though the table reports average excess returns and covariances in percentage point units. Equivalently, the ratio of the quantities given in the table is multiplied by 100.
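The two risk aversion columns can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the author's code: the function name and argument names are assumptions, and because the Table 4 entries are not reproduced in this part of the chapter, the inputs are taken from the postwar U.S. row of Table 7 (bond returns), which applies the same calculation.

```python
def implied_risk_aversion(aer, sd_er, sd_dc, corr):
    """Return (RRA(1), RRA(2)) from moments in annualized percentage points.

    aer   : average excess log return, adjusted for Jensen's Inequality
    sd_er : standard deviation of the excess log return
    sd_dc : standard deviation of consumption growth
    corr  : correlation of the excess return with consumption growth

    The factor of 100 follows footnote 8: the ratio of quantities quoted in
    percentage points must be multiplied by 100 to be in natural units.
    A zero correlation makes RRA(1) undefined; a negative one gives a
    negative value, which the tables report as "<0".
    """
    cov_pct = sd_er * sd_dc * corr           # covariance as tabulated
    rra1 = 100.0 * aer / cov_pct             # uses the sample correlation
    rra2 = 100.0 * aer / (sd_er * sd_dc)     # sets the correlation to one
    return rra1, rra2


# Inputs: USA 1947.2-1998.3 row of Table 7 (bond returns).
rra1, rra2 = implied_risk_aversion(aer=0.663, sd_er=8.842, sd_dc=1.071, corr=0.080)
print(f"RRA(1) = {rra1:.1f}, RRA(2) = {rra2:.3f}")
```

Because the correlation is tabulated to only three decimals, RRA(1) computed this way matches the reported value only approximately, while RRA(2) should agree to within rounding.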
Gabaix and Laibson (2001) and Parker (2001) have argued that adjustment costs in consumption artificially reduce the short-run variability of consumption and its correlation with stock returns, biasing upwards the estimated risk aversion coefficients in Table 4. Adjustment costs should dampen the short-term volatility of consumption growth but not the volatility over longer horizons; equivalently, short-term consumption growth should be positively autocorrelated. The first-order autocorrelation coefficients for consumption growth, shown in Table 2, do not generally support this model since they are typically small and often negative. 9 However, Gabaix and Laibson (2001) look at higher-order autocorrelations and point out that they tend to be larger in countries with larger stock markets. The results of other studies of U.S. consumption growth are mixed. Campbell and Mankiw (1989), Cochrane (1994) and Lettau and Ludvigson (2001) find that U.S. consumption growth is almost unforecastable, although discrete-state Markov models estimated by Cecchetti, Lam and Mark (1990, 1993), Kandel and Stambaugh (1991) and Mehra and Prescott (1985) imply modest but persistent predictable variation in U.S. consumption growth.
The risk aversion estimates in Table 4 are of course point estimates and are subject to sampling error. No standard errors are reported for these estimates. However, authors such as Cecchetti, Lam and Mark (1993) and Kocherlakota (1996), studying the long-run annual U.S. data, have found small enough standard errors that they can reject risk aversion coefficients below about 8 at conventional significance levels. Of course, the validity of these tests depends on the characteristics of the data set in which they are used.
Rietz (1988) has argued that there may be a peso problem in these data. A peso problem arises when there is a small positive probability of an important event, and investors take this probability into account when setting market prices.
If the event does not occur in a particular sample period, investors will appear irrational in the sample and economists will misestimate their preferences. While it may seem unlikely that this could be an important problem in 100 years of annual data, Rietz (1988) argues that an economic catastrophe that destroys almost all stock-market value can be extremely unlikely and yet have a major depressing effect on stock prices. One difficulty with this argument is that it requires not only a potential catastrophe, but one which affects stock market investors more seriously than investors in short-term debt instruments. Many countries that have experienced catastrophes, such as Russia or Germany, have seen very low returns on short-term government debt as well as on equity. A peso problem that affects both asset returns equally will affect estimates of the average levels of returns but not estimates of the equity premium. 10 The major example of a disaster for stockholders that did not negatively affect bondholders is the Great Depression of the early 1930s, but of course this is included in the long-run annual data for Sweden, the UK, and the USA, all of which display an equity premium puzzle. Also, the consistency of the results across countries requires investors in all countries to be concerned about catastrophes. If the potential catastrophes are uncorrelated across countries, then it becomes less likely that the data set includes no catastrophes; thus the argument seems to require a potential international catastrophe that affects all countries simultaneously.
Even if the equity premium puzzle is not entirely spurious, there are several reasons to think that stock returns exceeded their true long-run mean in the late 20th Century. Dimson, Marsh and Staunton (2002) find that international returns were generally higher in the late 20th Century than in the early 20th Century. Siegel (1994) reports similar results for U.S. data going back to the early 19th Century. Fama and French (2002) point out that average U.S. stock returns in the late 20th Century were considerably higher than accountants’ estimates of the return on equity for U.S. corporations. Thus if one uses average returns as an estimate of the true cost of capital, one is forced to the implausible conclusion that corporations destroyed stockholder value by retaining and reinvesting earnings rather than paying them out.
Unusually high stock returns in the late 20th Century could have resulted from unexpectedly favorable conditions for economic growth. But they could also have resulted from a structural decline in the equity premium. Several economists have recently argued that the equity premium is now far lower than it was in the early 20th Century [Heaton and Lucas (1999), Jagannathan, McGrattan and Scherbina (2001)].
9 These autocorrelations are biased upwards by the time-averaging of consumption data, but outside the U.S. are biased downwards by the durable component of total consumption expenditure. The absence of positive autocorrelations in consumption growth is also evidence against the Constantinides (1990) model of habit formation, discussed in Section 5.1, which has similar implications for consumption growth.
10 This point is relevant for the study of Jorion and Goetzmann (1999). These authors measure average growth rates of real stock prices, as a proxy for real stock returns. They find low real stock-price growth rates in many countries in the early 20th Century, but Dimson, Marsh and Staunton (2002) show that in many cases these were accompanied by low returns to holders of short-term debt, and also by high dividend yields.
3.3. The risk-free rate puzzle
One response to the equity premium puzzle is to consider larger values for the coefficient of relative risk aversion γ. Kandel and Stambaugh (1991) have advocated this. 11 However, this leads to a second puzzle. Equation (15) implies that the unconditional mean riskless interest rate is
$$E\,r_{f,t+1} = -\log\delta + \gamma g - \frac{\gamma^2 \sigma_c^2}{2}, \tag{17}$$
where g is the mean growth rate of consumption. Since g is positive, as shown in Table 2, high values of γ imply high values of γg. Ignoring the term −γ²σ_c²/2 for the moment, this can be reconciled with low average short-term real interest rates, shown in Table 1, only if the discount factor δ is close to or even greater than one, corresponding to a low or even negative rate of time preference. This is the risk-free rate puzzle emphasized by Weil (1989).
Intuitively, the risk-free rate puzzle is that if investors are risk-averse then with power utility they must also be extremely unwilling to substitute intertemporally. Given positive average consumption growth, a low riskless interest rate and a high rate of time preference, such investors would have a strong desire to borrow from the future to reduce their average consumption growth rate. A low riskless interest rate is possible in equilibrium only if investors have a low or negative rate of time preference that reduces their desire to borrow. 12
Of course, if the risk aversion coefficient γ is high enough then the negative quadratic term −γ²σ_c²/2 in Equation (17) dominates the linear term and pushes the riskless interest rate down again. The quadratic term reflects precautionary savings; risk-averse agents with uncertain consumption streams have a precautionary desire to save, which can work against their desire to borrow. But a reasonable rate of time preference is obtained only as a knife-edge case.
Table 5 illustrates the risk-free rate puzzle in international data. The table first shows the average risk-free rate from Table 1 and the mean consumption growth rate and standard deviation of consumption growth from Table 2. These moments and the risk aversion coefficients calculated in Table 4 are substituted into Equation (17), and the equation is solved for an implied time preference rate. The time preference rate is reported in percentage points per year; it can be interpreted as the riskless real interest rate that would prevail if consumption were known to be constant forever at its current level, with no growth and no volatility. Risk aversion coefficients in the RRA(2) range imply negative time preference rates in every country except Switzerland, whereas larger risk aversion coefficients in the RRA(1) range imply time preference rates that are often positive but always implausible and vary wildly across countries.
An interesting issue is how mismeasurement of average inflation might affect these calculations. There is a growing consensus that in recent years conventional price indices have overstated true inflation by failing to fully capture the effects of quality improvements, consumer substitution to cheaper retail outlets, and price declines in newly introduced goods. If inflation is overstated by, say, 1%, the real interest rate is understated by 1%, which by itself might help to explain the risk-free rate puzzle. Unfortunately the real growth rate of consumption is also understated by 1%, which worsens the risk-free rate puzzle. When γ > 1, this second effect dominates, so correcting for overstated inflation makes the risk-free rate puzzle even harder to explain.
11 One might think that introspection would be sufficient to rule out very large values of γ, but Kandel and Stambaugh (1991) point out that introspection can deliver very different estimates of risk aversion depending on the size of the gamble considered. This suggests that introspection can be misleading or that some more general model of utility is needed.
12 As Abel (1999) and Kocherlakota (1996) point out, negative time preference is consistent with finite utility in a time-separable model provided that consumption is growing, and marginal utility shrinking, sufficiently rapidly. The question is whether negative time preference is plausible.
Table 5
The risk-free rate puzzle

Country  Sample period    r_f     Δc      σ(Δc)   RRA(1)      TPR(1)       RRA(2)    TPR(2)
USA      1947.2–1998.3    0.896   1.951   1.071    240.647    −136.270     49.326   −81.393
AUL      1970.1–1998.4    2.054   2.071   2.059     58.511     −46.512      8.421   −13.880
CAN      1970.1–1999.1    2.713   2.170   1.920     59.266     −61.154     11.966   −20.618
FR       1973.2–1998.3    2.715   1.212   2.922     <0          N/A        12.270    −5.735
GER      1978.4–1997.3    3.219   1.673   2.447    599.468    9757.265     17.542   −16.910
ITA      1971.2–1998.1    2.371   2.273   1.665     <0          N/A        10.400   −19.765
JAP      1970.2–1998.4    1.388   3.233   2.561     82.620     −41.841      9.260   −25.735
NTH      1977.2–1998.3    3.377   1.671   2.510    849.991   21349.249     26.918   −18.769
SWD      1970.1–1999.2    1.995   1.001   1.851   1713.197   48590.956     26.501   −12.506
SWT      1982.2–1998.4    1.393   0.559   2.123     <0          N/A        32.076     6.636
UK       1970.1–1999.1    1.301   2.235   2.511    185.977     676.439     17.222   −27.838
USA      1970.1–1998.3    1.494   1.802   0.909    150.100    −175.916     41.178   −65.701
SWD      1920–1997        2.209   1.730   2.811     74.062      90.793     12.400   −13.165
UK       1919–1997        1.255   1.472   2.815     41.223       7.913     14.483   −11.749
USA      1891–1997        2.020   1.760   3.218     22.827     −11.162     11.293   −11.247
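The time preference columns of Table 5 can be approximated by solving Equation (17) for −log δ, that is, −log δ = r_f − γg + γ²σ_c²/2. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, inputs are taken to be in annualized percentage points and converted to natural units before the formula is applied, and the rounded table entries are used as inputs, so the results should agree with the reported columns only up to rounding.

```python
def implied_time_preference(rf, g, sigma_c, gamma):
    """Solve Equation (17) for -log(delta), returned in percent per year.

    rf, g and sigma_c are in annualized percentage points; gamma is the
    coefficient of relative risk aversion.
    """
    rf_n, g_n, s_n = rf / 100.0, g / 100.0, sigma_c / 100.0
    tpr = rf_n - gamma * g_n + 0.5 * gamma ** 2 * s_n ** 2
    return 100.0 * tpr


# Postwar U.S. row of Table 5 (rounded inputs).
usa = dict(rf=0.896, g=1.951, sigma_c=1.071)
tpr1 = implied_time_preference(gamma=240.647, **usa)   # RRA(1) case
tpr2 = implied_time_preference(gamma=49.326, **usa)    # RRA(2) case
print(f"TPR(1) = {tpr1:.2f}, TPR(2) = {tpr2:.2f}")

# Inflation mismeasurement: raising both the real rate and real consumption
# growth by one percentage point shifts the implied rate by (1 - gamma).
shift = implied_time_preference(0.896 + 1.0, 1.951 + 1.0, 1.071, 49.326) - tpr2
print(f"shift = {shift:.3f}")
```

The last lines illustrate the inflation argument in the text: the implied time preference rate moves by 1 − γ percentage points, which is negative whenever γ > 1.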
3.4. Bond returns and the equity-premium and risk-free rate puzzles
Some authors have argued that the risk-free interest rate is low because short-term government debt is more liquid than long-term financial assets. Short-term debt is “moneylike” in that it facilitates transactions and can be traded at minimal cost. The liquidity advantage of debt reduces its equilibrium return and increases the equity premium [Bansal and Coleman (1996), Heaton and Lucas (1996)].
The difficulty with this argument is that it implies that all long-term assets should have large excess returns over short-term debt. Long-term government bonds, for example, are not moneylike and so the liquidity argument implies that they should offer a large term premium. But historically, the term premium has been many times smaller than the equity premium.
This point is illustrated in Table 6, which reports two alternative measures of the term premium. The first measure is the average log yield spread on long-term bonds over the short-term interest rate, while the second is the average quarterly excess log return on long bonds. In a long enough sample these two averages should coincide if there is no upward or downward drift in interest rates. The average yield spread is typically between 0.5% and 1.5%. A notable outlier is Italy, which has a negative average yield spread in this period. Average long bond returns are quite variable across countries, reflecting differences in inflationary experiences; however, the average excess bond return rarely exceeds 2% per year. Thus both measures suggest that term premia are far smaller than equity premia.
Table 7 develops this point further by repeating the calculations of Table 4, using bond returns rather than equity returns. The average excess log return on bonds over short debt, adjusted for Jensen’s Inequality, is divided by the standard deviation of the excess bond return to calculate a bond Sharpe ratio which is a lower bound on the standard deviation of the stochastic discount factor. The Sharpe ratio for bonds is several times smaller than the Sharpe ratio for equities, indicating that term premia are small even after taking account of the lower volatility of bond returns. This finding is not consistent with a strong liquidity effect at the short end of the term structure, but it is consistent with a consumption-based asset pricing model if bond returns have a low correlation with consumption growth. Table 7 shows that sample consumption correlations are often lower for bonds, so that RRA(1) risk aversion estimates for bonds, which use these correlations, are often comparable to those for equities.
A direct test of the liquidity story is to measure excess returns on stocks over long bonds, rather than over short debt. If the equity premium is due to a liquidity effect on short-term interest rates, then there should be no “equity-bond premium” puzzle. Table 8 carries out this exercise and finds that the equity-bond premium puzzle is just as severe as the standard equity premium puzzle. 13
13 The excess return of equities over bonds must be measured with the appropriate correction for Jensen’s Inequality to adjust from a geometric to an arithmetic mean. From Equation (9), the appropriate measure is the log excess return on equities over short-term debt, less the log excess return on bonds over short-term debt, plus one-half the variance of the log equity return, less one-half the variance of the log bond return.

Table 6
International yield spreads and bond excess returns

Country  Sample period    s        σ(s)    ρ(s)    er_b    σ(er_b)   ρ(er_b)
USA      1947.2–1998.4    1.177    0.991   0.782   0.295    8.823     0.073
AUL      1970.1–1999.1    0.946    1.604   0.751   0.817    8.582     0.171
CAN      1970.1–1999.2    1.114    1.636   0.835   1.512    9.074     0.014
FR       1973.2–1998.4    0.992    1.508   0.748   2.127    7.942     0.303
GER      1978.4–1997.4    1.073    1.486   0.884   1.335    7.256     0.132
ITA      1971.2–1998.2   −0.157    1.928   0.762   0.417    9.599     0.359
JAP      1970.2–1999.1    0.665    1.447   0.849   2.067    9.232    −0.081
NTH      1977.2–1998.4    1.336    1.735   0.613   2.592    7.712     0.060
SWD      1970.1–1999.3    1.024    1.928   0.726   1.170    7.924     0.269
SWT      1982.2–1999.1    0.653    1.595   0.782   1.816    6.382     0.240
UK       1970.1–1999.2    1.068    2.076   0.896   2.006   11.309    −0.023
USA      1970.1–1998.4    1.498    1.180   0.743   1.909   10.420     0.034
SWD      1920–1998        0.378    1.184   0.325   0.057    8.877    −0.342
UK       1919–1998        1.220    1.510   0.676   0.872    9.041    −0.087
USA      1891–1998        0.737    1.533   0.595   0.359    6.666     0.072

3.5. Separating risk aversion and intertemporal substitution
Epstein and Zin (1989, 1991) and Weil (1989) use the theoretical framework of Kreps and Porteus (1978) to develop a more flexible version of the basic power utility model. That model is restrictive in that it makes the elasticity of intertemporal substitution, ψ, the reciprocal of the coefficient of relative risk aversion, γ. Yet it is not clear that these two concepts should be linked so tightly. Risk aversion describes the consumer’s reluctance to substitute consumption across states of the world and is meaningful even in an atemporal setting, whereas the elasticity of intertemporal substitution describes the consumer’s willingness to substitute consumption over time and is meaningful even in a deterministic setting. The Epstein–Zin–Weil model retains many of the attractive features of power utility but breaks the link between the parameters γ and ψ.
The Epstein–Zin–Weil objective function is defined recursively by
$$U_t = \left\{ (1-\delta)\, C_t^{\frac{1-\gamma}{\theta}} + \delta \left( E_t\!\left[ U_{t+1}^{1-\gamma} \right] \right)^{\frac{1}{\theta}} \right\}^{\frac{\theta}{1-\gamma}}, \tag{18}$$
Table 7
The bond premium puzzle

Country  Sample period    aer_b   σ(er_b)   σ(m)     σ(Δc)   ρ(er_b, Δc)   cov(er_b, Δc)   RRA(1)     RRA(2)
USA      1947.2–1998.3    0.663    8.842     7.503   1.071     0.080          0.762          87.051     7.002
AUL      1970.1–1998.4    1.328    8.584    15.474   2.059     0.115          2.036          65.248     7.514
CAN      1970.1–1999.1    2.048    9.088    22.532   1.920     0.163          2.852          71.812    11.733
FR       1973.2–1998.3    2.446    7.981    30.642   2.922    −0.022         −0.523          <0        10.488
GER      1978.4–1997.3    1.502    7.291    20.601   2.447     0.109          1.942          77.359     8.420
ITA      1971.2–1998.1    0.852    9.643     8.840   1.665    −0.012         −0.197          <0         5.310
JAP      1970.2–1998.4    2.410    9.261    26.020   2.561     0.009          0.213        1132.286    10.160
NTH      1977.2–1998.3    2.827    7.751    36.477   2.510     0.024          0.465         608.349    14.530
SWD      1970.1–1999.2    1.677    7.885    21.265   1.851     0.080          1.170         143.256    11.486
SWT      1982.2–1998.4    2.182    6.395    34.126   2.123    −0.135         −1.832          <0        16.075
UK       1970.1–1999.1    2.774   11.337    24.468   2.511     0.139          3.967          69.917     9.743
USA      1970.1–1998.3    2.429   10.464    23.213   0.909     0.238          2.259         107.540    25.541
SWD      1920–1997        0.272    8.800     3.086   5.622    −0.069         −1.698          <0         1.098
UK       1919–1997        1.036    8.862    11.689   5.630     0.293          7.308          14.176     4.153
USA      1891–1997        0.497    6.641     7.477   6.437     0.114          2.432          20.423     2.323
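The σ(m) column of Table 7 is the bond Sharpe ratio described in Section 3.4: the Jensen-adjusted average excess return divided by the standard deviation of the excess return. A minimal sketch follows, assuming both inputs are in annualized percentage points and the result is reported in percent; the function name and the small dictionary of rows are illustrative choices, not part of the original analysis.

```python
def sharpe_bound(aer, sd_er):
    """Lower bound on the standard deviation of the stochastic discount
    factor, in percent, from moments in annualized percentage points."""
    return 100.0 * aer / sd_er


# (aer_b, sigma(er_b), reported sigma(m)) copied from four rows of Table 7.
TABLE7_ROWS = {
    "USA 1947.2-1998.3": (0.663, 8.842, 7.503),
    "AUL 1970.1-1998.4": (1.328, 8.584, 15.474),
    "JAP 1970.2-1998.4": (2.410, 9.261, 26.020),
    "UK 1970.1-1999.1": (2.774, 11.337, 24.468),
}

for label, (aer, sd, reported) in TABLE7_ROWS.items():
    print(f"{label}: computed {sharpe_bound(aer, sd):.3f}, reported {reported:.3f}")
```

Small differences from the reported column are expected because the inputs are rounded to three decimals.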
Table 8
The equity-bond premium puzzle

Country  Sample period    aer_eb   σ(er_eb)   σ(m)     σ(Δc)   ρ(er_eb, Δc)   cov(er_eb, Δc)   RRA(1)     RRA(2)
USA      1947.2–1998.3     7.408   16.097    49.253   1.071     0.150           2.592          305.896    45.967
AUL      1970.1–1998.4     2.557   21.436    12.658   2.059     0.104           4.604           58.930     6.146
CAN      1970.1–1999.1     1.920   16.371    13.330   1.920     0.122           3.843           56.786     6.941
FR       1973.2–1998.3     5.862   22.022    26.883   2.922    −0.090          −5.792           <0         9.201
GER      1978.4–1997.3     7.167   20.126    36.860   2.447    −0.010          −0.496           <0        15.065
ITA      1971.2–1998.1     3.834   26.356    15.590   1.665    −0.001          −0.055           <0         9.364
JAP      1970.2–1998.4     2.688   22.382    14.794   2.561     0.104           5.958           55.579     5.777
NTH      1977.2–1998.3     8.593   17.006    52.403   2.510     0.021           0.879         1013.977    20.874
SWD      1970.1–1999.2     9.863   22.435    44.238   1.851    −0.012          −0.497           <0        23.894
SWT      1982.2–1998.4    12.716   21.570    59.591   2.123    −0.073          −3.348           <0        28.069
UK       1970.1–1999.1     6.395   17.400    36.233   2.511     0.022           0.963          654.932    14.427
USA      1970.1–1998.3     3.924   17.646    25.998   0.909     0.123           1.974          232.408    28.606
SWD      1920–1997         6.268   20.136    34.378   5.622     0.186          10.528           65.753    12.231
UK       1919–1997         7.638   17.595    41.575   5.630     0.277          13.734           53.264    14.770
USA      1891–1997         6.226   19.110    34.337   6.437     0.439          27.018           24.287    10.669
where θ ≡ (1 − γ)/(1 − 1/ψ). When γ = 1/ψ, θ = 1 and the recursion (18) becomes linear; it can then be solved forward to yield the familiar time-separable power utility model.
The intertemporal budget constraint for a representative agent can be written as
$$W_{t+1} = (1 + R_{w,t+1})(W_t - C_t), \tag{19}$$
where W_{t+1} is the representative agent’s wealth, and (1 + R_{w,t+1}) is the gross simple return on the portfolio of all invested wealth. 14 This form of the budget constraint is appropriate for a complete-markets model in which wealth includes human capital as well as financial assets. Epstein and Zin use dynamic programming arguments to show that Equations (18) and (19) together imply an Euler equation of the form
$$1 = E_t\left[ \left\{ \delta \left( \frac{C_{t+1}}{C_t} \right)^{-\frac{1}{\psi}} \right\}^{\theta} \left\{ \frac{1}{1 + R_{w,t+1}} \right\}^{1-\theta} (1 + R_{i,t+1}) \right]. \tag{20}$$
If I assume that asset returns and consumption are homoskedastic and jointly lognormal, then this implies that the riskless real interest rate is
$$r_{f,t+1} = -\log\delta + \frac{1}{\psi} E_t[\Delta c_{t+1}] + \frac{\theta - 1}{2}\sigma_w^2 - \frac{\theta}{2\psi^2}\sigma_c^2. \tag{21}$$
The riskless interest rate is a constant, plus 1/ψ times expected consumption growth. In the power utility model, 1/ψ = γ and θ = 1, so Equation (21) reduces to Equation (15). The premium on risky assets, including the wealth portfolio itself, is
$$E_t[r_{i,t+1}] - r_{f,t+1} + \frac{\sigma_i^2}{2} = \theta\,\frac{\sigma_{ic}}{\psi} + (1 - \theta)\,\sigma_{iw}. \tag{22}$$
This says that the risk premium on asset i is a weighted combination of asset i’s covariance with consumption growth (divided by the elasticity of intertemporal substitution ψ) and asset i’s covariance with the return on wealth. The weights are θ and 1 − θ, respectively. The Epstein–Zin–Weil model thus nests the consumption CAPM with power utility (θ = 1) and the traditional static CAPM (θ = 0).
14 This is often called the “market” return and written R_{m,t+1}, but I have already used m to denote the stochastic discount factor so I write R_{w,t+1} to avoid confusion.
Equations (21) and (22) seem to indicate that Epstein–Zin–Weil utility might be helpful in resolving the equity-premium and risk-free rate puzzles. First, in Equation (21) a high risk aversion coefficient does not necessarily imply a high average risk-free rate, because
$$E\,r_{f,t+1} = -\log\delta + \frac{g}{\psi} + \frac{\theta - 1}{2}\sigma_w^2 - \frac{\theta}{2\psi^2}\sigma_c^2. \tag{23}$$
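The nesting properties of Equations (22) and (23) can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, all quantities are in natural units, and the parameter values in the usage lines are made-up round numbers rather than estimates from the tables. It confirms that setting ψ = 1/γ reproduces the power utility rate of Equation (17), and that γ = 1 (so θ = 0) leaves only the covariance with the return on wealth in the risk premium.

```python
import math


def ezw_theta(gamma, psi):
    """theta = (1 - gamma) / (1 - 1/psi); undefined when psi equals one."""
    return (1.0 - gamma) / (1.0 - 1.0 / psi)


def ezw_mean_riskfree(delta, g, gamma, psi, sigma_c, sigma_w):
    """Unconditional mean riskless rate from Equation (23)."""
    theta = ezw_theta(gamma, psi)
    return (-math.log(delta) + g / psi
            + 0.5 * (theta - 1.0) * sigma_w ** 2
            - theta / (2.0 * psi ** 2) * sigma_c ** 2)


def power_mean_riskfree(delta, g, gamma, sigma_c):
    """Unconditional mean riskless rate from Equation (17)."""
    return -math.log(delta) + gamma * g - 0.5 * gamma ** 2 * sigma_c ** 2


def ezw_premium(gamma, psi, sigma_ic, sigma_iw):
    """Right-hand side of Equation (22)."""
    theta = ezw_theta(gamma, psi)
    return theta * sigma_ic / psi + (1.0 - theta) * sigma_iw


# Illustrative (hypothetical) parameter values in natural units.
delta, g, sigma_c, sigma_w, gamma = 0.98, 0.02, 0.01, 0.15, 10.0

# Power utility case: psi = 1/gamma implies theta = 1.
print(ezw_mean_riskfree(delta, g, gamma, 1.0 / gamma, sigma_c, sigma_w))
print(power_mean_riskfree(delta, g, gamma, sigma_c))

# Static CAPM case: gamma = 1 implies theta = 0, so only sigma_iw matters.
print(ezw_premium(1.0, 0.5, sigma_ic=0.001, sigma_iw=0.02))
```

Varying ψ while holding γ fixed shows how the g/ψ term in Equation (23) can stay moderate even when γ is large, which is the first potential escape route discussed in the text.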
The average consumption growth rate is divided by ψ in Equation (23), and in the Epstein–Zin–Weil framework ψ need not be small even if γ is large. Second, Equation (22) suggests that it might not even be necessary to have a high risk aversion coefficient to explain the equity premium. If θ ≠ 1, then the risk premium on an asset is determined in part by its covariance with the wealth portfolio, σ_iw. If the return on wealth is more volatile than consumption growth, as implied by the common use of a stock index return as a proxy for the return on wealth, then σ_iw may be much larger than σ_ic, and this may help to explain the equity premium.
Unfortunately, there are serious difficulties with both these potential escape routes from the equity premium and risk-free rate puzzles. The difficulty with the first is that there is direct empirical evidence for a low elasticity of intertemporal substitution in consumption. The difficulty with the second is that consumption and wealth are linked through the intertemporal budget constraint; if consumption is smooth and wealth is volatile, this itself is a puzzle that must be explained, not an exogenous fact that can be used to resolve other puzzles. I now develop these points in detail by analyzing the dynamic behavior of stock returns and short-term interest rates in relation to consumption.
4. The dynamics of asset returns and consumption
4.1. Time-variation in conditional expectations
Equations (21) and (22) imply a tight link between rational expectations of asset returns and of consumption growth. Expected asset returns are perfectly correlated with expected consumption growth, with a standard deviation 1/ψ times as large. Equivalently, the standard deviation of expected consumption growth is ψ times as large as the standard deviation of expected asset returns. This suggests a way to estimate ψ. Hansen and Singleton (1983), followed by Campbell and Mankiw (1989), Hall (1988), and others, have proposed an instrumental variables (IV) regression approach. If we define an error term η_{i,t+1} ≡ r_{i,t+1} − E_t[r_{i,t+1}] − (1/ψ)(Δc_{t+1} − E_t[Δc_{t+1}]), then we can rewrite Equations (21) and (22) as a regression equation,
$$r_{i,t+1} = \mu_i + \frac{1}{\psi}\,\Delta c_{t+1} + \eta_{i,t+1}. \tag{24}$$
In general, the error term η_{i,t+1} will be correlated with realized consumption growth so OLS is not an appropriate estimation method. However, η_{i,t+1} is uncorrelated with
any variables in the information set at time t. Hence any lagged variables correlated with asset returns can be used as instruments in an IV regression to estimate 1/ψ.
Table 9 illustrates two-stage least squares estimation of Equation (24). In each panel the first set of results uses the short-term real interest rate, while the second set uses the real stock return. The instruments are the asset return, the consumption growth rate, and the log price-dividend ratio. The instruments are lagged twice to avoid difficulties caused by time-aggregation of the consumption data [Campbell and Mankiw (1989, 1991), Wheatley (1988)]. For each asset and set of instruments, the table first reports the R² statistics, with significance levels below, for first-stage regressions of the asset return and consumption growth rate onto the instruments. The table then shows the IV estimate of 1/ψ with its standard error below, and (in the column headed “Test (24)”) the R² statistic for a regression of the residual on the instruments together with the associated significance level of a test of the over-identifying restrictions of the model.
The quarterly results in Table 9 show that the short-term real interest rate is highly forecastable in every country except Germany. The real stock return is also forecastable in some countries, but there is relatively weak evidence for forecastability in consumption growth. In fact the R² statistic for forecasting consumption growth is lower than the R² statistic for stock returns in many of the quarterly data sets. The IV estimates of 1/ψ are very imprecise; they are sometimes large and positive, often negative, but they are almost never significantly different from zero. The overidentifying restrictions of the model are often strongly rejected, particularly when the short-term interest rate is used in the model. Results are similar for the annual data sets in Table 10, except that twice-lagged instruments have almost no ability to forecast real interest rates or stock returns in the annual U.S. data. 15
Campbell and Mankiw (1989, 1991) have explored this regression in more detail, using both U.S. and international data, and have found that predictable variation in consumption growth is often associated with predictable variation in income growth. This suggests that some consumers keep their consumption close to their income, either because they follow “rules of thumb”, or because they are liquidity-constrained, or because they are “buffer-stock” savers [Deaton (1991), Carroll (1992)]. After controlling for the effect of predictable income growth, there is little remaining predictable variation in consumption growth to be explained by consumers’ response to variation in real interest rates.
One problem with IV estimation of Equation (24) is that the instruments are only very weakly correlated with the regressor because consumption growth is hard to forecast in this data set. Nelson and Startz (1990) and Staiger and Stock (1997) have
15 Campbell, Lo and MacKinlay (1997, Table 8.2) show much greater forecastability of returns using once-lagged instruments in a similar annual U.S. data set. Even with twice-lagged instruments, U.S. annual returns become forecastable once one increases the return horizon beyond one year, as shown in Table 11 below.
Table 9 Predictable variation in returns and consumption growth (see Equations 24 and 25)
Country Sample period
USA
1947.2–1998.3
Asset
rf re
AUL
1970.2–1998.4
rf re
CAN
1970.2–1999.1
rf re
FR
1973.2–1998.3
rf re
GER
1978.4–1997.3
rf re
ITA
1971.2–1998.1
rf re
JAP
1970.2–1998.4
rf re
NTH
1977.2–1998.3
rf re
ri
Δc
1/ψ (s.e.)
0.167
0.037
0.705
0.066
0.155
0.037
0.000
0.064
0.734
0.107
0.000
0.024
0.052
0.037
−9.455
−0.030
0.022
0.019
0.015
0.064
7.376
0.032
0.106
0.146
0.414
0.010
7.064
0.107
0.003
0.002
0.000
0.575
5.167
0.101
0.839
0.875
0.051
0.010
17.405
0.027
0.009
0.005
0.043
0.575
13.796
0.026
0.601
0.749
0.271
0.026
−0.899
−0.090
0.160
0.024
0.000
0.379
0.838
0.172
0.000
0.255
0.020
0.026
7.239
0.126
0.001
0.001
0.414
0.379
5.679
0.101
0.929
0.928
0.503
0.005
−4.044
−0.095
0.007
0.003
0.000
0.495
3.573
0.144
0.693
0.867
0.084
0.005
−2.872
−0.002
0.078
0.005
0.029
0.495
10.190
0.032
0.021
0.795
0.045
0.052
0.331
1.494
0.017
0.018
0.431
0.095
0.338
1.147
0.537
0.519
0.030
0.052
−4.984
−0.091
0.013
0.016
0.089
0.095
4.821
0.083
0.628
0.567
0.413
0.017
−4.126
−0.055
0.059
0.014
0.000
0.736
3.559
0.108
0.045
0.489
0.042
0.017
4.867
0.007
0.036
0.016
0.400
0.736
10.455
0.034
0.150
0.432
0.176
0.043
−0.362
−0.106
0.146
0.041
0.018
0.051
0.365
0.276
0.000
0.103
0.115
0.043
10.622
0.054
0.022
0.017
0.000
0.051
4.727
0.028
0.289
0.393
0.350
0.036
−0.601
−0.163
0.178
0.033
0.000
0.025
0.406
0.266
0.001
0.256
0.015
0.036
−2.612
−0.133
0.008
0.012
0.809
0.025
5.678
0.163
0.707
0.597
First-stage regressions
ψ (s.e.)
Test (24)
Test (25)
continued on next page
Table 9, continued Country Sample period
SWD
1970.2–1998.3
Asset
rf re
SWT
1982.2–1998.4
rf re
UK
1970.2–1999.1
rf re
USA
1970.2–1998.3
rf re
SWD
1920–1997
rf re
UK
1920–1997
rf re
USA
1891–1997
rf re
ri
Δc
1/ψ (s.e.)
0.290
0.003
10.873
0.043
0.004
0.002
0.000
0.919
15.863
0.079
0.812
0.905
0.083
0.003
−3.313
−0.001
0.079
0.003
0.056
0.919
20.570
0.018
0.012
0.826
0.112
0.006
2.514
0.360
0.001
0.001
0.000
0.909
3.951
0.526
0.982
0.983
0.010
0.006
11.801
0.063
0.001
0.001
0.741
0.909
26.241
0.153
0.966
0.968
0.310
0.069
1.902
0.302
0.046
0.030
0.000
0.009
0.806
0.131
0.074
0.181
0.065
0.069
−3.372
−0.051
0.043
0.044
0.222
0.009
3.417
0.042
0.089
0.081
0.283
0.067
1.765
0.120
0.150
0.055
0.000
0.071
0.815
0.110
0.000
0.047
0.050
0.067
1.446
0.006
0.052
0.070
0.129
0.071
8.429
0.027
0.056
0.020
0.317
0.041
3.327
0.174
0.026
0.017
0.000
0.356
2.084
0.151
0.379
0.535
0.018
0.041
−1.575
−0.075
0.014
0.025
0.624
0.356
3.560
0.101
0.586
0.390
0.261
0.063
2.269
0.196
0.067
0.037
0.001
0.178
1.401
0.127
0.082
0.250
0.129
0.063
4.064
0.031
0.125
0.062
0.124
0.178
3.948
0.024
0.009
0.098
0.013
0.059
−0.229
−0.142
0.012
0.051
0.760
0.009
0.938
0.295
0.524
0.069
0.018
0.059
−0.326
−0.031
0.017
0.049
0.526
0.009
2.411
0.126
0.417
0.079
First-stage regressions
ψ (s.e.)
Test (24)
Test (25)
shown that in this situation asymptotic theory can be a poor guide to inference in finite samples; the asymptotic standard error of the coefficient tends to be too small and the overidentifying restrictions of the model may be rejected even when the model is true. To circumvent this problem, one can reverse the regression (24) and estimate
$$\Delta c_{t+1} = \tau_i + \psi\, r_{i,t+1} + \zeta_{i,t+1}. \tag{25}$$
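The relation between the forward regression (24) and the reversed regression (25) can be illustrated with a small simulation. The sketch below is deliberately simplified and is not the procedure behind Table 9: it uses a single hypothetical instrument in a just-identified IV estimator (Table 9 uses three instruments and two-stage least squares), a made-up data-generating process with ψ = 0.5, and helper names chosen for illustration. With one instrument the two slope estimates are exact reciprocals in any sample; with more instruments than regressors the reciprocal relation holds only asymptotically.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def iv_slope(y, x, z):
    """Just-identified IV slope from regressing y on x with instrument z."""
    return cov(z, y) / cov(z, x)


def simulate(psi=0.5, n=20000, seed=0):
    """Hypothetical data: the instrument z forecasts consumption growth, and
    the expected return moves 1/psi for one with expected consumption growth.
    A common shock makes the regression error correlated with the regressor,
    which is why OLS would be inappropriate here."""
    rng = random.Random(seed)
    z, dc, r = [], [], []
    for _ in range(n):
        z_t = rng.gauss(0.0, 1.0)
        common = rng.gauss(0.0, 1.0)
        expected_dc = 0.02 + 0.01 * z_t
        dc.append(expected_dc + 0.01 * common)
        r.append(0.01 + expected_dc / psi + 0.05 * common + 0.05 * rng.gauss(0.0, 1.0))
        z.append(z_t)
    return z, dc, r


z, dc, r = simulate()
inv_psi_hat = iv_slope(r, dc, z)   # Equation (24): return on consumption growth
psi_hat = iv_slope(dc, r, z)       # Equation (25): consumption growth on return
print(f"estimate of 1/psi: {inv_psi_hat:.3f}, estimate of psi: {psi_hat:.3f}")
```

The instrument in this simulation is strong by construction, so both estimates are well behaved; the weak-instrument problems discussed in the text arise when the instrument has little forecasting power for the regressor.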
If the orthogonality conditions hold, then the estimate of ψ in Equation (25) will asymptotically be the reciprocal of the estimate of 1/ψ in Equation (24). In a finite sample, however, if ψ is small then IV estimates of Equation (25) will be better behaved than IV estimates of Equation (24).
In Table 9 ψ is estimated to be close to zero everywhere except in Germany, where the standard error of the estimate is also large. The estimates are typically more precise than those for 1/ψ. The overidentifying restrictions of the model are sometimes rejected, but less often and less strongly than when Equation (24) is estimated. These results suggest that the elasticity of intertemporal substitution ψ is small, so that the generality of the Epstein–Zin–Weil model, which allows ψ to be large even if γ is large, does not actually help one fit the data on consumption and asset returns. Ogaki and Reinhart (1998) and Yogo (2003) reach similar conclusions taking account of durable goods and using weak-instrument econometric methods, respectively.
Several caveats are worth noting. First, Attanasio and Weber (1993) and Beaudry and van Wincoop (1996) have found higher values for ψ using disaggregated cohort-level and state-level consumption data. Second, Vissing-Jorgensen (2002) points out that many consumers do not participate actively in asset markets; using household data she finds a higher value for ψ among asset market participants. Third, Bansal and Yaron (2000) have pointed out that estimates like those reported in Table 9 depend on the assumption that consumption growth and asset returns are homoskedastic. Time-varying second moments would introduce a time-varying intercept into the IV regression and could bias the estimates of the elasticity of intertemporal substitution.
4.2. A loglinear asset-pricing framework
In order to understand the second moments of stock returns, it is essential to have a framework relating movements in stock prices to movements in expected future dividends and discount rates. The present value model of stock prices is intractably nonlinear when expected stock returns are time-varying, and this has forced researchers to use one of several available simplifying assumptions. The most common approach is to assume a discrete-state Markov process either for dividend growth [Mehra and Prescott (1985)] or, following Hamilton (1989), for conditionally expected dividend growth [Abel (1994, 1999), Cecchetti, Lam and Mark (1990, 1993), Kandel and Stambaugh (1991)]. The Markov structure makes it possible to solve the present value model, but the derived expressions for returns tend to be extremely complicated and
so these papers usually emphasize numerical results derived under specific numerical assumptions about parameter values. 16

An alternative framework, which produces simpler closed-form expressions and hence is better suited for an overview of the literature, is the loglinear approximation to the exact present value model suggested by Campbell and Shiller (1988a). Campbell and Shiller’s loglinear relation between prices, dividends, and returns provides an accounting framework: High prices must eventually be followed by high future dividends or low future returns, and high prices must be associated with high expected future dividends or low expected future returns. Similarly, high returns must be associated with upward revisions in expected future dividends or downward revisions in expected future returns.

The loglinear approximation starts with the definition of the log return on some asset i, $r_{i,t+1} \equiv \log(P_{i,t+1} + D_{i,t+1}) - \log(P_{i,t})$. The timing convention here is that prices are measured at the end of each period so that they represent claims to next period’s dividends. The log return is a nonlinear function of log prices $p_{i,t}$ and $p_{i,t+1}$ and log dividends $d_{i,t+1}$, but it can be approximated around the mean log dividend-price ratio, $\overline{d_{i,t} - p_{i,t}}$, using a first-order Taylor expansion. The resulting approximation is
$$r_{i,t+1} \approx k + \rho p_{i,t+1} + (1 - \rho) d_{i,t+1} - p_{i,t}, \tag{26}$$
where $\rho$ and $k$ are parameters of linearization defined by $\rho \equiv 1/(1 + \exp(\overline{d_{i,t} - p_{i,t}}))$ and $k \equiv -\log(\rho) - (1 - \rho)\log(1/\rho - 1)$. When the dividend-price ratio is constant, then $\rho = P_i/(P_i + D_i)$, the ratio of the ex-dividend to the cum-dividend stock price. In the postwar quarterly U.S. data shown in Table 10, the average price-dividend ratio has been 28.3 on an annual basis, implying that $\rho$ should be about 0.966 in annual data. 17

The Taylor approximation (26) replaces the log of the sum of the stock price and the dividend in the exact relation with a weighted average of the log stock price and the log dividend. The log stock price gets a weight $\rho$ close to one, while the log dividend gets a weight $1 - \rho$ close to zero because the dividend is on average much smaller than the stock price, so a given percentage change in the dividend has a much smaller effect on the return than a given percentage change in the price.

Equation (26) is a linear difference equation for the log stock price. Solving forward, imposing the terminal condition that $\lim_{j \to \infty} \rho^j p_{i,t+j} = 0$, taking expectations, and subtracting the current dividend, one gets
$$p_{i,t} - d_{i,t} = \frac{k}{1 - \rho} + E_t \sum_{j=0}^{\infty} \rho^j \left[ \Delta d_{i,t+1+j} - r_{i,t+1+j} \right]. \tag{27}$$
16 A partial exception to this statement is that Abel (1994) derives several analytical results for the first moments of returns in a Markov model for expected dividend growth.
17 Strictly speaking both $\rho$ and $k$ should have asset subscripts i, but I omit these for simplicity. The asset pricing formulas later in this chapter assume that all assets have the same $\rho$, which simplifies some expressions but does not change any of the qualitative conclusions.
This equation says that the log price–dividend ratio is high when dividends are expected to grow rapidly, or when stock returns are expected to be low. The equation should be thought of as an accounting identity rather than a behavioral model; it has been obtained merely by approximating an identity, solving forward subject to a terminal condition, and taking expectations. Intuitively, if the stock price is high today, then from the definition of the return and the terminal condition that the stock price is non-explosive, there must either be high dividends or low stock returns in the future. Investors must then expect some combination of high dividends and low stock returns if their expectations are to be consistent with the observed price.

Equation (27) can also be understood as a dynamic generalization of the famous formula, often attributed to Myron Gordon (1962) but probably due originally to John Burr Williams (1938), that applies when the discount rate is a constant R and the expected dividend growth rate is a constant G. In this case the price–dividend ratio is a constant given by
$$\frac{P_t}{D_t} = \frac{1 + G}{R - G}. \tag{28}$$
Equation (27) is equivalent to Equation (28) when expected returns and dividend growth rates are constant.

The terminal condition used to obtain Equation (27) is perhaps controversial. Models of “rational bubbles” do not impose this condition. Blanchard and Watson (1982) and Froot and Obstfeld (1991) have proposed simple, explicit models of explosive bubbles in asset prices. There are, however, several reasons to rule out such bubbles. The theoretical circumstances under which bubbles can exist are quite restrictive; Tirole (1985), for example, uses an overlapping generations framework and finds that bubbles can only exist if the economy is dynamically inefficient, a condition which seems unlikely on prior grounds and which is hard to reconcile with the empirical evidence of Abel, Mankiw, Summers and Zeckhauser (1989). Santos and Woodford (1997) also conclude that the conditions under which bubbles can exist are fragile. Empirically, bubbles imply explosive behavior of prices in relation to dividends and other measures of fundamentals; there is no evidence of this, although nonlinear bubble models are hard to reject using standard linear econometric methods. 18

18 Campbell, Lo and MacKinlay (1997, Chapter 7) gives a somewhat more detailed textbook discussion of the literature on rational bubbles.

Equation (27) describes the log price–dividend ratio rather than the log price itself. This is a useful way to write the model if dividends follow a loglinear unit root process, so that log dividends and log prices are nonstationary. In this case changes in log dividends are stationary, so from Equation (27) the log price–dividend ratio is stationary provided that the expected stock return is stationary. Thus, log stock prices and dividends are cointegrated, and the stationary linear combination of these variables involves no unknown parameters since it is just the log ratio.
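The linearization parameters of Equation (26) and the Gordon special case (28) can be checked numerically. The following is a minimal sketch, not part of the original exposition: the function name `loglinear_params` and the values of G and R are hypothetical choices made for illustration. It reproduces the value of about 0.966 quoted in the text for a price–dividend ratio of 28.3, and confirms that Equation (27) returns the Gordon price–dividend ratio when growth and returns are constant and the linearization point is the constant ratio itself.

```python
import math


def loglinear_params(price_dividend_ratio):
    """Return (rho, k) for a linearization around a given mean P/D ratio."""
    mean_dp = -math.log(price_dividend_ratio)  # mean log dividend-price ratio
    rho = 1.0 / (1.0 + math.exp(mean_dp))
    k = -math.log(rho) - (1.0 - rho) * math.log(1.0 / rho - 1.0)
    return rho, k


# Average postwar U.S. price-dividend ratio quoted in the text.
rho, k = loglinear_params(28.3)
print(round(rho, 3))  # 0.966

# Gordon special case with hypothetical constant growth G and discount rate R.
G, R = 0.02, 0.06
pd_gordon = (1.0 + G) / (R - G)            # Equation (28)
rho_g, k_g = loglinear_params(pd_gordon)   # linearize around the constant ratio
g_log, r_log = math.log(1.0 + G), math.log(1.0 + R)

# Equation (27) with constant log dividend growth and constant log return:
# the infinite sum collapses to (g - r) / (1 - rho).
log_pd_27 = k_g / (1.0 - rho_g) + (g_log - r_log) / (1.0 - rho_g)
print(log_pd_27, math.log(pd_gordon))
```

The agreement is exact here only because the dividend–price ratio is constant, so the Taylor expansion is evaluated at its expansion point; with time-varying ratios Equation (26) is an approximation.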
Table 10
International stock prices and dividends

Country  Sample period   P/D     σ(p−d)  ρ(p−d)  ADF(1)   Δp       Δd       Δ(p−d)
USA      1947.2–1998.4   28.312  0.301   0.941   −0.875    4.149    2.159    2.307
AUL      1970.1–1999.1   26.359  0.263   0.857   −3.099   −0.422    0.656   −1.163
CAN      1970.1–1999.2   32.372  0.277   0.924   −0.925    1.852   −0.488    1.999
FR       1973.2–1998.4   24.698  0.567   0.967   −1.141    3.483   −0.255    3.738
GER      1978.4–1997.4   29.033  0.322   0.918   −1.285    6.001    1.189    5.136
ITA      1971.2–1998.2   42.822  0.324   0.878   −3.442    0.554   −3.100    4.238
JAP      1970.2–1999.1   93.403  0.626   0.966   −1.843    3.142   −2.350    5.690
NTH      1977.2–1998.4   23.795  0.358   0.945   −0.390    9.207    4.679    4.498
SWD      1970.1–1999.3   39.506  0.479   0.958   −1.596    7.846    4.977    2.756
SWT      1982.2–1999.1   53.607  0.313   0.893   −1.283   11.642    6.052    6.328
UK       1970.1–1999.2   19.402  0.303   0.920   −1.185    2.599    0.591    1.533
USA      1970.1–1998.4   29.955  0.298   0.921   −0.260    3.214    0.612    2.699
SWD      1920–1998       28.441  0.379   0.818   −0.440    3.118    1.551    1.895
UK       1919–1998       21.197  0.245   0.542   −3.570    2.558    1.990    0.568
USA      1891–1998       23.751  0.321   0.787   −0.812    2.617    1.516    1.006
Table 10 reports some summary statistics for international stock prices in relation to dividends. The table gives the average price–dividend ratio, the standard deviation of the log price–dividend ratio in natural units, the first-order autocorrelation of the log price–dividend ratio, average growth rates of prices, dividends, and the log price–dividend ratio in percentage points per year, and a test statistic for the null hypothesis that the log price–dividend ratio has a unit root. Following standard practice, the price–dividend ratio is measured as the ratio of the current stock price to the total of dividends paid during the past year.

Average price–dividend ratios vary considerably across countries. The extreme outlier is Japan, which has an average price–dividend ratio of 93. Price–dividend ratios have tended to increase over the sample period, as illustrated by the faster average growth rates of prices than of dividends; this reflects both the depressed stock markets of the 1970s, when many of the sample periods begin, and the rapid increases in stock prices of the late 1990s. For similar reasons the autocorrelations of price–dividend ratios tend to be high, and unit root tests rarely reject the unit root null hypothesis.

One reaction to these findings is that price–dividend ratios are truly stationary, but unit root tests lack power to reject the null hypothesis. However, there has been increasing interest in the possibility that permanent changes in dividend growth rates or expected returns generate permanent changes in dividend–price ratios.
One way in which this can occur is through shifts in corporate financial policy. If a firm permanently reduces its dividends and devotes the resources to repurchasing shares, current dividends fall but the growth rate of dividends per share increases because the number of shares outstanding starts to decline over time. One can adjust for this change by working with the total value of the firm, rather than the price per share, and calculating dividends plus repurchases. Liang and Sharpe (1999) have done this calculation for selected S&P 500 firms in the 1990s, and have found that repurchases add 75 to 100 basis points to the conventionally measured dividend–price ratio. Concerns about the instability of corporate financial policy have stimulated research on alternative valuation ratios such as the price–earnings or market–book ratio. Vuolteenaho (2000), for example, derives an expression for the market–book ratio, analogous to Equation (27), that relates it to expected future stock returns and accounting return on equity (ROE).

So far I have written asset prices as linear combinations of expected future dividends and returns. Following Campbell (1991), I can also write asset returns as linear combinations of revisions in expected future dividends and returns. Substituting Equation (27) into Equation (26), I obtain
$$r_{i,t+1} - E_t r_{i,t+1} = (E_{t+1} - E_t) \sum_{j=0}^{\infty} \rho^j \Delta d_{i,t+1+j} - (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j r_{i,t+1+j}. \tag{29}$$
This equation says that unexpected stock returns must be associated with changes in expectations of future dividends or real returns. An increase in expected future dividends is associated with a capital gain today, while an increase in expected future returns is associated with a capital loss today. The reason is that with a given dividend stream, higher future returns can only be generated by future price appreciation from a lower current price.

4.3. The equity volatility puzzle

I now use this accounting framework to illustrate the equity volatility puzzle. The intertemporal budget constraint for a representative agent, Equation (19), implies that aggregate consumption is the dividend on the portfolio of all invested wealth, denoted by subscript w:
$$d_{wt} = c_t. \tag{30}$$
Many authors, including Grossman and Shiller (1981), Lucas (1978) and Mehra and Prescott (1985), have assumed that the aggregate stock market, denoted by subscript e for equity, is a good proxy for the portfolio of all wealth and thus is priced as if it pays consumption as its dividend. 19 Here I follow Campbell (1986) and Abel (1999) and make the slightly more general assumption that the dividend on equity equals aggregate consumption raised to a power $\lambda$. In logs, we have
$$d_{et} = \lambda c_t. \tag{31}$$

19 This does not require that measured dividends literally equal measured consumption. For example, consumption could equal dividends plus repurchases if firms are repurchasing shares. Or consumption could be financed in part by labor income, which can be thought of as the dividend on human capital. If human capital returns and stock returns are perfectly correlated, the stock market is a proxy for total wealth even though it does not equal total wealth.

Abel (1999) shows that the coefficient $\lambda$ can be interpreted as a measure of leverage. When $\lambda > 1$, dividends and stock returns are more volatile than the returns on the aggregate wealth portfolio. This framework has the additional advantage that a riskless real bond with infinite maturity – an inflation-indexed consol, denoted by subscript b – can be priced merely by setting $\lambda = 0$.

The representative-agent asset-pricing model with Epstein–Zin–Weil utility, conditional lognormality, and homoskedasticity (Equations 21 and 22) implies that
$$E_t r_{e,t+1} = \mu_e + \frac{1}{\psi} E_t \Delta c_{t+1}, \tag{32}$$
where $\mu_e$ is an asset-specific constant term. The expected log return on equity, like the expected log return on any other asset, is just a constant plus expected consumption growth divided by the elasticity of intertemporal substitution $\psi$. Power utility is the special case where the coefficient of relative risk aversion $\gamma$ is the reciprocal of $\psi$, so the effect of expected consumption growth on expected asset returns is proportional to $\gamma$; but this is not true in general.

Substituting Equations (31) and (32) into Equations (27) and (29), I find that
$$p_{et} - d_{et} = \frac{k_e}{1 - \rho} + \left( \lambda - \frac{1}{\psi} \right) E_t \sum_{j=0}^{\infty} \rho^j \Delta c_{t+1+j}, \tag{33}$$
and
$$r_{e,t+1} - E_t r_{e,t+1} = \lambda \left( \Delta c_{t+1} - E_t \Delta c_{t+1} \right) + \left( \lambda - \frac{1}{\psi} \right) (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j \Delta c_{t+1+j}. \tag{34}$$

Expected future consumption growth has offsetting effects on the log price–dividend ratio. It has a direct positive effect by increasing expected future dividends $\lambda$-for-one, but it has an indirect negative effect by increasing expected future real interest rates $(1/\psi)$-for-one. The unexpected log return on the stock market is $\lambda$ times contemporaneous unexpected consumption growth (since contemporaneous consumption growth increases the contemporaneous dividend $\lambda$-for-one), plus $(\lambda - 1/\psi)$ times the discounted sum of revisions in expected future consumption growth.
For future reference I note that Equation (34) can be inverted to express consumption growth as a function of the unexpected return on equity and revisions in expectations about future returns on equity. Rearranging Equation (34) and using Equation (32),
$$\Delta c_{t+1} - E_t \Delta c_{t+1} = \frac{1}{\lambda} \left( r_{e,t+1} - E_t r_{e,t+1} \right) + \left( \frac{1}{\lambda} - \psi \right) (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j r_{e,t+1+j}. \tag{35}$$
An innovation in the equity return raises wealth by a factor $(1/\lambda)$, and this raises consumption by the same factor. Increases in expected future equity returns have offsetting income and substitution effects on consumption; the positive income effect is $(1/\lambda)$, and the negative substitution effect is $-\psi$.

These equations can be simplified if I assume that expected aggregate consumption growth, which I write as $z_t$, follows an AR(1) process with mean $g$ and positive persistence $\phi$:
$$\Delta c_{t+1} = z_t + \varepsilon_{c,t+1}, \tag{36}$$
$$z_{t+1} = (1 - \phi) g + \phi z_t + \varepsilon_{z,t+1}. \tag{37}$$
This is a linear version of the model used by Cecchetti, Lam and Mark (1990, 1993) and Kandel and Stambaugh (1991), in which expected consumption growth follows a persistent discrete-state Markov process. The contemporaneous shocks to realized consumption growth $\varepsilon_{c,t+1}$ and expected future consumption growth $\varepsilon_{z,t+1}$ may be positively or negatively correlated. The correlation between these contemporaneous shocks controls the univariate autocovariances of consumption growth; the first-order autocovariance is $\phi \operatorname{Var}(z_t) + \operatorname{Cov}(\varepsilon_{z,t+1}, \varepsilon_{c,t+1})$, and higher-order autocovariances die out geometrically at rate $\phi$. Thus, consumption growth inherits the positive serial correlation of the $z_t$ process unless the contemporaneous shocks are sufficiently negatively correlated. An important special case of the model sets $\varepsilon_{z,t+1} = \phi \varepsilon_{c,t+1}$ to make consumption growth itself an AR(1) process; this is a linear version of the model of Mehra and Prescott (1985). 20

20 As noted earlier, the empirical evidence on serial correlation in consumption growth is mixed. Table 2 shows small negative autocorrelation in 7 out of 12 quarterly datasets, but only 1 out of 3 annual data sets. Measurement problems may bias these autocorrelations in either direction. Durability of consumption tends to bias autocorrelation downwards, but time-averaging and seasonal adjustment tend to bias it upwards. Empirical estimates of discrete-state Markov models by Cecchetti, Lam and Mark (1990, 1993), Kandel and Stambaugh (1991) and Mehra and Prescott (1985) find some evidence for modest but persistent predictable variation in U.S. consumption growth. However, Campbell and Mankiw (1989, 1991), Hall (1988), Cochrane (1994) and Lettau and Ludvigson (2001) find that U.S. consumption growth is almost unforecastable.

From Equation (21), the riskless interest rate is linear in expected consumption growth $z_t$, so this model implies a homoskedastic AR(1) process for the riskless interest rate, with persistence $\phi$. It is a discrete-time version of the Vasicek (1977) model of the term structure of interest rates. Campbell, Lo and MacKinlay (1997, Chapter 11) gives a detailed textbook exposition of this model following Singleton (1990) and Sun (1992).

Equations (36) and (37) allow me to rewrite Equations (33) and (34) as
$$p_{et} - d_{et} = \frac{k}{1 - \rho} + \left( \lambda - \frac{1}{\psi} \right) \left[ \frac{g}{1 - \rho} + \frac{z_t - g}{1 - \rho\phi} \right], \tag{38}$$
and
$$r_{e,t+1} - E_t r_{e,t+1} = \lambda \varepsilon_{c,t+1} + \left( \lambda - \frac{1}{\psi} \right) \left( \frac{\rho}{1 - \rho\phi} \right) \varepsilon_{z,t+1}. \tag{39}$$
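The autocovariance properties claimed for the consumption process (36)–(37) can be checked by simulation. The sketch below is illustrative only: the parameter values, the Gaussian shocks, and the function names are assumptions made for this example and are not taken from the text. It compares sample autocovariances of simulated consumption growth with the first-order formula and with geometric decay at the persistence rate.

```python
import numpy as np


def simulate_consumption_growth(n, g, phi, sigma_c, sigma_z, corr, seed=0):
    """Simulate Equations (36)-(37) with jointly normal shocks (an assumption)."""
    rng = np.random.default_rng(seed)
    cov = corr * sigma_c * sigma_z
    shocks = rng.multivariate_normal(
        [0.0, 0.0], [[sigma_c**2, cov], [cov, sigma_z**2]], size=n
    )
    eps_c, eps_z = shocks[:, 0], shocks[:, 1]  # dated t+1 in the text's notation
    z = np.empty(n + 1)
    z[0] = g  # start the state at its unconditional mean
    for t in range(n):
        z[t + 1] = (1.0 - phi) * g + phi * z[t] + eps_z[t]  # Equation (37)
    return z[:-1] + eps_c                                    # Equation (36)


def autocov(x, lag):
    """Sample autocovariance of x at a positive lag."""
    x = x - x.mean()
    return float(np.mean(x[:-lag] * x[lag:]))


def first_autocov_theory(phi, sigma_c, sigma_z, corr):
    """phi * Var(z_t) + Cov(eps_z, eps_c), using Var(z_t) = sigma_z^2 / (1 - phi^2)."""
    var_z_state = sigma_z**2 / (1.0 - phi**2)
    return phi * var_z_state + corr * sigma_c * sigma_z


# Hypothetical parameter values chosen only to make the effect visible.
params = dict(phi=0.8, sigma_c=0.01, sigma_z=0.004, corr=-0.3)
dc = simulate_consumption_growth(500_000, g=0.005, **params)
theory1 = first_autocov_theory(**params)
print(autocov(dc, 1), theory1)
print(autocov(dc, 2), params["phi"] * theory1)
```

With a sufficiently negative shock correlation the theoretical first-order autocovariance changes sign, which is the case the text describes as consumption growth failing to inherit the positive serial correlation of the state variable.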
Equation (39) shows why it is difficult to match the volatility of stock returns within this standard framework. The most obvious way to generate volatile stock returns is to assume a large $\lambda$, that is, a volatile dividend. Increasing $\lambda$, however, has mixed effects; it increases the volatility of the first term in Equation (39) proportionally, but as long as $\lambda < 1/\psi$ it diminishes the volatility of the second term because the dividend and real interest rate effects of expected consumption growth offset each other more exactly. The overall volatility of stock returns may actually fall, or grow only slowly, with $\lambda$ until the point is reached where $\lambda > 1/\psi$. The empirical evidence for small $\psi$ presented in Table 9 suggests that very high $\lambda$ will be needed to generate volatile stock returns. A similar point has been made by Abel (1999), who emphasizes that predictable variation in expected consumption growth can dampen stock market volatility and exacerbate the equity premium puzzle.

This model also tends to produce highly volatile returns on real (inflation-indexed) bonds. By setting $\lambda = 0$ in Equations (38) and (39), the log yield and unexpected return on a real consol bond, denoted by a subscript b, are
$$y_{bt} = d_{bt} - p_{bt} = -\frac{k_b}{1 - \rho} + \frac{1}{\psi} \left[ \frac{g}{1 - \rho} + \frac{z_t - g}{1 - \rho\phi} \right], \tag{40}$$
and
$$r_{b,t+1} - E_t r_{b,t+1} = -\frac{1}{\psi} \left( \frac{\rho}{1 - \rho\phi} \right) \varepsilon_{z,t+1}. \tag{41}$$
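The mixed effect of leverage on return volatility in Equation (39), and the bond case (41), can be made concrete with a short calculation. This is a sketch under assumed parameter values (none of the numbers below come from the text), and the function name `equity_return_vol` is hypothetical.

```python
import math


def equity_return_vol(lam, psi, rho, phi, sigma_c, sigma_z, corr):
    """Standard deviation of the unexpected log return in Equation (39).

    Setting lam = 0 gives the real consol bond of Equation (41).
    """
    loading_z = (lam - 1.0 / psi) * rho / (1.0 - rho * phi)
    sigma_cz = corr * sigma_c * sigma_z
    variance = (
        (lam * sigma_c) ** 2
        + (loading_z * sigma_z) ** 2
        + 2.0 * lam * loading_z * sigma_cz
    )
    return math.sqrt(variance)


# Hypothetical calibration: small elasticity of intertemporal substitution,
# persistent expected consumption growth, uncorrelated shocks.
calib = dict(psi=0.5, rho=0.966, phi=0.8, sigma_c=0.01, sigma_z=0.002, corr=0.0)

for lam in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(lam, equity_return_vol(lam, **calib))
```

Under this calibration the volatility at a leverage of 1 is below the bond volatility at a leverage of 0, and the expected-growth term vanishes entirely at a leverage equal to the reciprocal of the elasticity of intertemporal substitution, leaving only the first term of Equation (39).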
When $\psi$ is small, even modest variation in $z_t$ will tend to produce large variation in the risk-free interest rate and in the yields and returns on long-term real bonds. The correlation of stock and real bond returns is positive if $\lambda < 1/\psi$, but turns negative if $\lambda$ is large enough so that $\lambda > 1/\psi$.

Of course, all these calculations are dependent on the assumption made at the beginning of this subsection, that stocks are priced as if they pay a multiple $\lambda$ of log aggregate consumption. More general models, allowing separate variation in dividends and consumption, can in principle generate volatile stock returns without excessive variation in real interest rates. For example, we might modify Equation (31) to allow a second autonomous component of the dividend:
$$d_{et} = \lambda c_t + a_t, \tag{42}$$
where $\Delta a_{t+1}$ has a similar structure to consumption growth, being forecast by an AR(1) state variable:
$$\Delta a_{t+1} = y_t + \varepsilon_{a,t+1}, \tag{43}$$
$$y_{t+1} = (1 - \theta) \nu + \theta y_t + \varepsilon_{y,t+1}. \tag{44}$$
This modification of the basic model would add a term $\nu/(1 - \rho) + (y_t - \nu)/(1 - \rho\theta)$ to the formula for the log price–dividend ratio, Equation (38), and would add a term $\varepsilon_{a,t+1} + \rho \varepsilon_{y,t+1}/(1 - \rho\theta)$ to the formula for the unexpected log stock return, Equation (39). Cecchetti, Lam and Mark (1993) estimate a discrete-state Markov model allowing for this sort of separate variability in consumption and dividends. While such a model provides a more realistic description of dividends, it requires large predictable movements in dividends to explain stock market volatility. There is evidence for such dividend variation in some countries, but not in others as I show in Section 4.5.

Lettau and Ludvigson (2001) have recently approached these issues from a somewhat different point of view. They divide the aggregate wealth portfolio into two components, financial wealth and human wealth. They use the Federal Reserve Board’s Flow of Funds accounts to measure aggregate financial wealth in the USA. While human wealth is not directly observable, they argue that it should be cointegrated with current labor income since labor income follows a unit root process. They also use the intertemporal budget constraint to argue that aggregate consumption should be cointegrated with total wealth if the returns on wealth are stationary. This implies that aggregate consumption, financial wealth, and labor income should be cointegrated. Lettau and Ludvigson find evidence to support this hypothesis in U.S. data.

Cointegration among three variables implies that there exists a stationary linear combination of the variables, and that this combination forecasts the growth rates of at least one variable. Lettau and Ludvigson find that there is very little forecastability in consumption growth or labor income growth; instead, wealth growth is forecastable. These results are consistent with earlier findings of Campbell and Mankiw (1989), Hall (1988), and particularly Cochrane (1994).
Cointegration between consumption and wealth implies that in the very long run the annualized growth rates of consumption and wealth must be identical and therefore must have identical volatilities. In the short run, however, we know that consumption is far smoother than wealth; this is precisely the equity volatility puzzle. How can we reconcile the observed short-run properties of consumption and wealth with the properties we know they must have in the long run? There are only two possibilities. First, it may be that the annualized volatility of consumption growth increases with
the horizon over which it is measured, so that ultimately it reaches the high volatility of wealth growth. This would require that consumption is not a random walk, but has positive serial correlation in growth rates. Second, it may be that the annualized volatility of wealth growth decreases with the horizon over which it is measured, so that ultimately it reaches the low volatility of consumption growth. This would require that wealth is not a random walk, but has negative serial correlation in growth rates. These two possibilities represent fundamentally different views of the world. Is the world safe as suggested by consumption, or risky as suggested by the stock market? The work of Lettau and Ludvigson implies that the former view is correct: Wealth is mean-reverting and adjusts over long horizons to match the smoothness of consumption. A satisfactory model of equity volatility must be consistent with this finding.
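The link between serial correlation in growth rates and the behavior of annualized volatility across horizons can be illustrated with a variance ratio. The sketch below assumes, purely for illustration, that the one-period growth rate is a stationary AR(1) with first-order autocorrelation `phi`; that assumption and the function name are not from the text. A ratio above one means per-period variance rises with the horizon, and a ratio below one means it falls.

```python
def variance_ratio(phi, horizon):
    """Per-period variance of a `horizon`-period growth rate relative to the
    one-period variance, when one-period growth is AR(1) with autocorrelation phi.

    Uses VR(k) = 1 + 2 * sum_{j=1}^{k-1} (1 - j/k) * phi**j.
    """
    return 1.0 + 2.0 * sum(
        (1.0 - j / horizon) * phi**j for j in range(1, horizon)
    )


# Hypothetical autocorrelations: positive (first possibility in the text),
# zero (random walk in levels), and negative (mean reversion).
for phi in (0.3, 0.0, -0.3):
    print(phi, [round(variance_ratio(phi, k), 3) for k in (1, 4, 16)])
```

In this stylized setting, positive autocorrelation corresponds to the case in which consumption volatility rises toward that of wealth, and negative autocorrelation to the case in which wealth volatility falls toward that of consumption.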
4.4. Implications for the equity premium puzzle

I now return to the basic model in which the log dividend is a multiple of log aggregate consumption, and use the formulas derived in the previous subsection to gain a deeper understanding of the equity premium puzzle. The discussion of the puzzle in Section 3 treated the covariance of stock returns with consumption as exogenous, but given a tight link between stock dividends and consumption the covariance can be derived from the stochastic properties of consumption itself. This is the approach of many papers including Abel (1994, 1999), Kandel and Stambaugh (1991), Mehra and Prescott (1985) and Rietz (1988).

An advantage of this approach is that it clarifies the implications of Epstein–Zin–Weil utility. The Epstein–Zin–Weil Euler equation is derived by imposing a budget constraint that links consumption and wealth, and it explains risk premia by the covariances of asset returns with both consumption growth and the return on the wealth portfolio. The stochastic properties of consumption, together with the budget constraint, can be used to substitute either consumption or wealth out of the Epstein–Zin–Weil model.

To understand this point, note that Equation (34) applies to the return on the wealth portfolio when $\lambda = 1$. Setting $e = w$ and $\lambda = 1$, Equation (34) becomes
$$r_{w,t+1} - E_t r_{w,t+1} = \Delta c_{t+1} - E_t \Delta c_{t+1} + \left( 1 - \frac{1}{\psi} \right) (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j \Delta c_{t+1+j}, \tag{45}$$
an equation derived by Restoy and Weil (1998). It follows that the covariance of any asset return with the wealth portfolio must satisfy
$$\sigma_{iw} = \sigma_{ic} + \left( 1 - \frac{1}{\psi} \right) \sigma_{ig}, \tag{46}$$
where $\sigma_{ig}$ denotes the covariance of asset return i with revisions in expectations of future consumption growth:
$$\sigma_{ig} \equiv \operatorname{Cov} \left( r_{i,t+1} - E_t r_{i,t+1},\ (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j \Delta c_{t+1+j} \right). \tag{47}$$
The letter g is used here as a mnemonic for consumption growth. Substituting this expression into the formula for risk premia in the Epstein–Zin–Weil model, (22), that formula simplifies to
$$E_t r_{i,t+1} - r_{f,t+1} + \frac{\sigma_i^2}{2} = \gamma \sigma_{ic} + \left( \gamma - \frac{1}{\psi} \right) \sigma_{ig}. \tag{48}$$
The risk premium on any asset is the coefficient of risk aversion $\gamma$ times the covariance of that asset with consumption growth, plus $(\gamma - 1/\psi)$ times the covariance of the asset with revisions in expected future consumption growth. The second term is zero if $\gamma = 1/\psi$, the power utility case, or if there are no revisions in expected future consumption growth. 21

21 Using a continuous-time model, Svensson (1989) also emphasizes that risk premia in the Epstein–Zin–Weil model are determined only by risk aversion when investment opportunities and expected consumption growth are constant.

I now return to the assumption made in the previous subsection that expected consumption growth is an AR(1) process given by Equation (37). Under this assumption,
$$(E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j \Delta c_{t+1+j} = \left( \frac{\rho}{1 - \rho\phi} \right) \varepsilon_{z,t+1}. \tag{49}$$
Equations (39), (48) and (49) imply that
$$E_t r_{e,t+1} - r_{f,t+1} + \frac{\sigma_e^2}{2} = \gamma \left[ \lambda \sigma_c^2 + \left( \lambda - \frac{1}{\psi} \right) \frac{\rho}{1 - \rho\phi} \sigma_{cz} \right] + \left( \gamma - \frac{1}{\psi} \right) \left[ \frac{\lambda\rho}{1 - \rho\phi} \sigma_{cz} + \left( \lambda - \frac{1}{\psi} \right) \left( \frac{\rho}{1 - \rho\phi} \right)^2 \sigma_z^2 \right]. \tag{50}$$
This expression nests many of the leading cases explored in the literature on the equity premium puzzle. To understand it, it is helpful to break the equity premium into two components, the premium on real consol bonds over the riskless interest rate, and the premium on equities over real consol bonds:
$$E_t r_{b,t+1} - r_{f,t+1} + \frac{\sigma_b^2}{2} = \gamma \left( -\frac{1}{\psi} \right) \left( \frac{\rho}{1 - \rho\phi} \right) \sigma_{cz} + \left( \gamma - \frac{1}{\psi} \right) \left( -\frac{1}{\psi} \right) \left( \frac{\rho}{1 - \rho\phi} \right)^2 \sigma_z^2. \tag{51}$$
$$E_t r_{e,t+1} - r_{b,t+1} + \frac{\sigma_e^2}{2} - \frac{\sigma_b^2}{2} = \gamma \lambda \left[ \sigma_c^2 + \frac{\rho}{1 - \rho\phi} \sigma_{cz} \right] + \left( \gamma - \frac{1}{\psi} \right) \lambda \left[ \frac{\rho}{1 - \rho\phi} \sigma_{cz} + \left( \frac{\rho}{1 - \rho\phi} \right)^2 \sigma_z^2 \right]. \tag{52}$$
Equations (51) and (52) add up to Equation (50). The first term in each of these expressions represents the premium under power utility, while the second term represents the effect on the premium of moving to Epstein–Zin–Weil utility and allowing the coefficient of risk aversion to differ from the reciprocal of the intertemporal elasticity of substitution.

Under power utility, the real bond premium in Equation (51) is determined by the covariance $\sigma_{cz}$ of realized consumption growth and innovations to expected future consumption growth. If this covariance is positive, then an increase in consumption is associated with higher expected future consumption growth, higher real interest rates, and lower bond prices. Real bonds accordingly have hedge value and the real bond premium is negative. If $\sigma_{cz}$ is negative, then the real bond premium is positive. 22

22 Campbell (1986) develops this intuition in a univariate model for consumption growth.

Under Epstein–Zin–Weil utility with $\gamma < 1/\psi$, assets that covary negatively with expected future consumption growth have higher risk premia. Since real bonds have this characteristic, Epstein–Zin–Weil utility with $\gamma < 1/\psi$ tends to produce large term premia. This runs counter to the empirical observation in Tables 6 and 7 that term premia are only modest; while the term premia measured in the tables are on nominal rather than real bonds, nominal term premia should if anything be larger than real term premia because they include a reward for bearing inflation risk which is unlikely to be negative. This suggests that Epstein–Zin–Weil utility will require $\gamma > 1/\psi$ to fit the behavior of bond prices.

The premium on equities over real bonds is proportional to the coefficient $\lambda$ that governs the volatility of dividend growth. Under power utility the equity-bond premium is just risk aversion $\gamma$ times $\lambda$ times terms in $\sigma_c^2$ and $\sigma_{cz}$.
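The decomposition of the equity premium in Equations (50)–(52) can be verified numerically. The following sketch is only an illustration: the function name `premia` and all parameter values are hypothetical, and the second moments are supplied directly rather than estimated from data.

```python
import math


def premia(gamma, psi, lam, rho, phi, var_c, var_z, cov_cz):
    """Return (equity, bond, equity_over_bond) premia from Equations (50)-(52).

    Each premium includes the Jensen's-inequality variance adjustment,
    as on the left-hand sides of the equations.
    """
    a = rho / (1.0 - rho * phi)   # loading on the expected-growth shock, Equation (49)
    tilt = gamma - 1.0 / psi      # zero under power utility
    equity = gamma * (lam * var_c + (lam - 1.0 / psi) * a * cov_cz) + tilt * (
        lam * a * cov_cz + (lam - 1.0 / psi) * a**2 * var_z
    )
    bond = gamma * (-1.0 / psi) * a * cov_cz + tilt * (-1.0 / psi) * a**2 * var_z
    equity_over_bond = gamma * lam * (var_c + a * cov_cz) + tilt * lam * (
        a * cov_cz + a**2 * var_z
    )
    return equity, bond, equity_over_bond


# Hypothetical inputs.
eq, bd, eob = premia(
    gamma=5.0, psi=0.5, lam=3.0, rho=0.966, phi=0.8,
    var_c=1e-4, var_z=4e-6, cov_cz=-2e-6,
)
print(eq, bd, eob, math.isclose(eq, bd + eob))
```

Setting the risk aversion coefficient equal to the reciprocal of the elasticity of intertemporal substitution makes `tilt` zero, so each premium reduces to its first (power utility) term.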
Since both $\sigma_c^2$ and $\sigma_{cz}$ must be small to match the observed moments of consumption growth, it is hard to rationalize the large equity-bond premium shown in Table 8. Epstein–Zin–Weil utility with $\gamma > 1/\psi$ adds a second term in $\sigma_{cz}$ and $\sigma_z^2$. The $\sigma_z^2$ term is positive, which would help to rationalize the equity-bond premium.

In conclusion, Epstein–Zin–Weil utility might help the consumption-based model to fit the patterns of risk premia in the data if $\gamma > 1/\psi$ [Bansal and Yaron (2000)]. Given the evidence for small $\psi$ reported in Section 4.1, however, this still requires a large risk aversion coefficient $\gamma$. Thus, Epstein–Zin–Weil utility does not provide an easy solution to the equity premium puzzle.

Campbell (1993) uses these relations in a different way. Instead of substituting the wealth return out of the Epstein–Zin–Weil model, Campbell substitutes consumption out of the model to get a discrete-time version of the intertemporal CAPM of Merton (1973). Setting $e = w$ and $\lambda = 1$ in Equation (35), the innovation in consumption is
$$\Delta c_{t+1} - E_t \Delta c_{t+1} = r_{w,t+1} - E_t r_{w,t+1} + (1 - \psi) (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j r_{w,t+1+j}. \tag{53}$$
Thus, the covariance of any asset return with consumption growth must satisfy
$$\sigma_{ic} = \sigma_{iw} + (1 - \psi) \sigma_{ih}, \tag{54}$$
where $\sigma_{ih}$ denotes the covariance of asset return i with revisions in expected future returns on wealth:
$$\sigma_{ih} \equiv \operatorname{Cov} \left( r_{i,t+1} - E_t r_{i,t+1},\ (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j r_{w,t+1+j} \right). \tag{55}$$
The letter h here is used as a mnemonic for hedging demand [Merton (1973)], a term commonly used in the finance literature to describe the component of asset demand that is determined by investors’ responses to changing investment opportunities. $\sigma_{ic}$ can now be substituted out of Equation (22) to obtain
$$E_t r_{i,t+1} - r_{f,t+1} + \frac{\sigma_i^2}{2} = \gamma \sigma_{iw} + (\gamma - 1) \sigma_{ih}. \tag{56}$$
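As a consistency check, Equation (56) can be recovered from Equation (48) using only Equations (46) and (54). The steps below are a sketch of that algebra, assuming $\psi \neq 1$; they are an illustration and not a derivation given in the text.

$$
\begin{aligned}
\text{From (46) and (54):}\quad & \left( 1 - \frac{1}{\psi} \right) \sigma_{ig} + (1 - \psi) \sigma_{ih} = 0
\;\Longrightarrow\; \sigma_{ih} = \frac{\sigma_{ig}}{\psi}, \\
\gamma \sigma_{iw} + (\gamma - 1) \sigma_{ih}
&= \gamma \sigma_{ic} + \gamma \left( 1 - \frac{1}{\psi} \right) \sigma_{ig} + (\gamma - 1) \frac{\sigma_{ig}}{\psi} \\
&= \gamma \sigma_{ic} + \left( \gamma - \frac{1}{\psi} \right) \sigma_{ig},
\end{aligned}
$$

which is the right-hand side of Equation (48). The two representations therefore price the same risks, expressed in terms of either consumption or the return on wealth.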
The risk premium on any asset is the coefficient of risk aversion $\gamma$ times the covariance of that asset with the return on the wealth portfolio, plus $(\gamma - 1)$ times the covariance of the asset with revisions in expected future returns on wealth. The second term is zero if $\gamma = 1$; in this case it is well known that intertemporal asset demands are zero and asset pricing is myopic. Campbell (1996) uses this formula to study U.S. stock price data, assuming that the log return on wealth is a linear combination of the stock return and the return on human capital (proxied by innovations to labor income). He argues that mean-reversion in U.S. stock prices implies a positive covariance $\sigma_{ew}$ between U.S. stock returns and the current return on wealth, but a negative covariance $\sigma_{eh}$ between U.S. stock returns and revisions in expected future returns on wealth. Equation (56) then implies that increases in $\gamma$ above one have only a damped effect on the equity premium, so high risk aversion is needed to explain the equity premium puzzle. This conclusion is reached without any reference to measured aggregate consumption data.
4.5. What does the stock market forecast?
All the calculations in Sections 4.3 and 4.4 rely heavily on the assumptions of the representative-agent model with power utility, lognormal distributions, constant variances, and a deterministic link between stock dividends and consumption. They leave open the possibility that the stock market volatility puzzle could be resolved by relaxing these assumptions, for example to allow independent variation in dividends in the manner discussed at the end of Section 4.3.
A more direct way to understand the stock market volatility puzzle is to use the loglinear asset pricing framework to study the empirical relationships between log price–dividend ratios and future consumption or dividend growth rates, real interest rates, and excess stock returns. According to Equation (27), the log price–dividend ratio embodies rational forecasts of dividend growth rates and stock returns, which in turn are the sum of real interest rates and excess stock returns, discounted to an infinite horizon. One can compare the empirical importance of these different forecasts by regressing long-horizon consumption and dividend growth rates, real interest rates, and excess stock returns onto the log price–dividend ratio. Table 11 reports the results of this exercise. For comparative purposes real output growth, realized stock market volatility, and the excess bond return are also included as dependent variables.
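A long-horizon regression of the kind reported in Table 11 can be sketched as follows, on simulated data. This is an illustration under stated assumptions, not the chapter's code: the function name, the simulated series, the loading of -0.5, and the choice of Bartlett (Newey-West) weights with k - 1 lags for the autocorrelation-consistent standard error are all assumptions. The regressor is scaled by its standard deviation, so the slope is the effect of a one standard deviation change.

```python
import math
import random


def long_horizon_regression(y, x, k):
    """Regress sum(y[t+1..t+k]) on x[t] / sd(x); return (slope, t_stat, r2).

    The t statistic uses a Bartlett-weighted (Newey-West style) variance
    with k - 1 lags, an assumed way to handle overlapping k-period sums.
    """
    n = len(y) - k                                   # usable observations
    yk = [sum(y[t + 1:t + k + 1]) for t in range(n)]  # cumulative k-period sums
    xs = x[:n]
    mean_x = sum(xs) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in xs) / (n - 1))
    z = [(v - mean_x) / sd_x for v in xs]            # demeaned, unit variance
    mean_y = sum(yk) / n
    szz = sum(v * v for v in z)
    slope = sum(zv * (yv - mean_y) for zv, yv in zip(z, yk)) / szz
    resid = [yv - mean_y - slope * zv for zv, yv in zip(z, yk)]
    score = [zv * e for zv, e in zip(z, resid)]      # moment contributions
    lrv = sum(s * s for s in score)
    for lag in range(1, k):
        weight = 1.0 - lag / k                       # Bartlett kernel weight
        lrv += 2.0 * weight * sum(score[t] * score[t - lag] for t in range(lag, n))
    t_stat = slope / math.sqrt(lrv / szz ** 2)
    r2 = 1.0 - sum(e * e for e in resid) / sum((yv - mean_y) ** 2 for yv in yk)
    return slope, t_stat, r2


# Simulated data: a persistent predictor x and a series y with y[t+1]
# loading negatively on x[t]. All parameter values are arbitrary.
rng = random.Random(0)
T = 400
x = [0.0]
for _ in range(T - 1):
    x.append(0.9 * x[-1] + rng.gauss(0.0, 1.0))
y = [0.0] + [-0.5 * x[t] + rng.gauss(0.0, 1.0) for t in range(T - 1)]

for k in (4, 8, 16):
    slope, t_stat, r2 = long_horizon_regression(y, x, k)
    print(f"k={k:2d}  slope={slope:8.3f}  t={t_stat:7.2f}  R2={r2:.3f}")
```

Other lag choices or kernels would give different t statistics; the point of the sketch is only the structure of the regression.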
The regressions are divided into two groups; Table 11A includes the regressions whose dependent variables are growth rates of consumption, output, or dividends, while Table 11B includes the regressions whose dependent variables are the real interest rate, the excess stock return, a measure of realized volatility in excess stock returns, or the excess bond return. For each quarterly data set the dependent variables are computed in natural units over 4, 8 and 16 quarters (1, 2 and 4 years) and regressed onto the log price–dividend ratio divided by its standard deviation. Thus, the regression coefficient gives the effect of a one standard deviation change in the log price–dividend ratio on the cumulative growth rate or rate of return in natural units. The table reports the regression coefficient, heteroskedasticity- and autocorrelation-consistent t statistic and $R^2$ statistic.
In the benchmark postwar quarterly U.S. data, the log price–dividend ratio has no clear ability to forecast consumption growth, output growth, dividend growth, or the real interest rate at any horizon. What it does forecast is the excess return on stocks, with t statistics that start above 2.4 and increase, and with $R^2$ statistics that start at 0.10 and increase to 0.38 at a 4-year horizon. In the introduction these results were summarized as stylized facts 10, 11, 12 and 13. Table 11 extends them to international data.

Table 11A. Forecasting with the log price–dividend ratio
Dependent variable:                  Consumption growth           Output growth         Dividend growth
Country  Sample period  Horizon    b̂(k) t(b̂(k))   R²(k)    b̂(k) t(b̂(k))   R²(k)    b̂(k) t(b̂(k))   R²(k)
Quarterly data (horizon in quarters)
USA      1947.2–1998.3        4   0.002   1.286   0.022   0.001   0.137   0.000  −0.003  −0.352   0.001
USA      1947.2–1998.3        8   0.003   0.784   0.014  −0.002  −0.320   0.003  −0.019  −1.201   0.025
USA      1947.2–1998.3       16   0.007   0.950   0.036   0.001   0.134   0.001  −0.043  −1.723   0.073
AUL      1970.2–1998.4        4  −0.001  −0.802   0.009   0.004   1.670   0.039   0.047   1.824   0.051
AUL      1970.2–1998.4        8  −0.003  −1.238   0.029   0.002   0.523   0.005   0.083   1.926   0.101
AUL      1970.2–1998.4       16  −0.006  −1.691   0.072  −0.004  −1.350   0.017   0.045   0.953   0.027
CAN      1970.2–1999.1        4  −0.003  −0.661   0.013  −0.002  −0.494   0.004   0.044   3.000   0.237
CAN      1970.2–1999.1        8  −0.007  −0.918   0.029  −0.011  −1.836   0.048   0.055   1.964   0.112
CAN      1970.2–1999.1       16  −0.021  −1.342   0.085  −0.029  −2.004   0.131   0.018   0.388   0.006
FR       1973.2–1998.3        4  −0.002  −0.464   0.005   0.001   0.387   0.006   0.082   2.800   0.348
FR       1973.2–1998.3        8  −0.005  −0.558   0.016   0.002   0.231   0.005   0.152   3.046   0.448
FR       1973.2–1998.3       16  −0.007  −0.569   0.031   0.002   0.142   0.002   0.231   4.601   0.475
GER      1978.4–1997.3        4   0.004   1.469   0.053   0.007   2.142   0.107   0.050   3.947   0.208
GER      1978.4–1997.3        8   0.007   2.081   0.079   0.009   1.822   0.091   0.075   3.670   0.166
GER      1978.4–1997.3       16   0.012   5.976   0.172   0.009   1.767   0.050   0.030   0.723   0.014
ITA      1971.2–1998.1        4  −0.005  −1.113   0.052  −0.005  −1.079   0.034   0.106   3.670   0.221
ITA      1971.2–1998.1        8  −0.008  −1.049   0.074  −0.009  −1.119   0.083   0.210   9.351   0.402
ITA      1971.2–1998.1       16  −0.008  −0.677   0.038  −0.005  −0.453   0.014   0.283   5.298   0.243
JAP      1970.2–1998.4        4  −0.009  −2.325   0.170  −0.004  −0.887   0.040   0.034   5.298   0.362
JAP      1970.2–1998.4        8  −0.014  −2.140   0.239  −0.008  −0.887   0.057   0.062   4.108   0.423
JAP      1970.2–1998.4       16  −0.024  −2.915   0.397  −0.011  −0.982   0.058   0.102   2.636   0.374
NTH      1977.2–1998.3        4   0.011   3.223   0.224   0.008   2.377   0.139   0.032   5.299   0.212
NTH      1977.2–1998.3        8   0.024   3.979   0.382   0.017   2.758   0.228   0.049   3.088   0.152
NTH      1977.2–1998.3       16   0.039   3.287   0.391   0.030   3.322   0.308   0.017   0.458   0.007
SWD      1970.2–1998.3        4   0.002   0.767   0.013   0.003   1.124   0.015   0.099   3.947   0.308
SWD      1970.2–1998.3        8   0.003   0.799   0.024  −0.001  −0.244   0.001   0.170   2.810   0.338
SWD      1970.2–1998.3       16   0.001   0.166   0.002  −0.014  −1.372   0.061   0.176   1.934   0.186
SWT      1982.2–1998.4        4  −0.001  −0.243   0.002  −0.002  −0.516   0.011   0.056   2.167   0.338
SWT      1982.2–1998.4        8   0.000   0.020   0.000  −0.008  −0.903   0.033   0.053   1.165   0.121
SWT      1982.2–1998.4       16   0.006   1.113   0.046  −0.022  −1.418   0.074   0.064   0.857   0.087
UK       1970.2–1999.1        4   0.009   2.237   0.096   0.005   1.567   0.041   0.029   2.528   0.101
UK       1970.2–1999.1        8   0.008   1.105   0.027   0.002   0.261   0.002   0.043   2.145   0.101
UK       1970.2–1999.1       16  −0.007  −0.487   0.010  −0.010  −0.914   0.034   0.014   0.344   0.005
USA      1970.2–1998.3        4   0.001   0.379   0.005   0.003   0.772   0.013   0.010   1.530   0.011
USA      1970.2–1998.3        8  −0.001  −0.125   0.001   0.001   0.111   0.001   0.004   0.276   0.001
USA      1970.2–1998.3       16  −0.006  −0.917   0.027  −0.009  −0.916   0.023  −0.017  −0.737   0.011
Annual data (horizon in years)
SWD      1920–1997            1  −0.001  −0.587   0.003  −0.002  −0.947   0.008   0.065   5.527   0.238
SWD      1920–1997            4  −0.016  −1.477   0.047  −0.025  −1.982   0.110   0.123   2.887   0.143
SWD      1920–1997            8  −0.032  −2.419   0.160  −0.053  −2.843   0.251   0.127   2.873   0.130
UK       1920–1997            1   0.003   1.075   0.012  −0.001  −0.444   0.001   0.020   1.786   0.063
UK       1920–1997            4  −0.005  −0.784   0.006   0.006   0.683   0.007  −0.030  −1.116   0.026
UK       1920–1997            8  −0.023  −2.440   0.094   0.004   0.377   0.001  −0.110  −3.456   0.171
USA      1891–1997            1   0.000   0.010   0.000  −0.007  −1.609   0.019   0.019   1.264   0.016
USA      1891–1997            4  −0.005  −0.577   0.005  −0.010  −0.684   0.005   0.026   0.681   0.009
USA      1891–1997            8   0.005   0.413   0.002  −0.001  −0.037   0.000  −0.027  −0.492   0.005

Table 11B. Forecasting with the log price–dividend ratio
Dependent variable:                  Real interest rate     Excess stock return        Stock volatility      Excess bond return
Country  Sample period  Horizon    b̂(k) t(b̂(k))   R²(k)    b̂(k) t(b̂(k))   R²(k)    b̂(k) t(b̂(k))   R²(k)    b̂(k) t(b̂(k))   R²(k)
Quarterly data (horizon in quarters)
USA      1947.2–1998.3        4   0.004   0.777   0.016  −0.053  −2.415   0.103   0.001   1.130   0.009  −0.003  −0.229   0.001
USA      1947.2–1998.3        8   0.003   0.316   0.005  −0.107  −2.579   0.219   0.001   1.681   0.051  −0.001  −0.024   0.000
USA      1947.2–1998.3       16  −0.003  −0.143   0.001  −0.184  −3.273   0.377   0.001   1.581   0.122  −0.007  −0.205   0.001
AUL      1970.2–1998.4        4  −0.003  −0.329   0.006  −0.105  −4.555   0.223   0.005   1.978   0.089  −0.023  −1.507   0.055
AUL      1970.2–1998.4        8  −0.009  −0.434   0.012  −0.154  −4.056   0.287   0.005   2.716   0.183  −0.045  −1.902   0.096
AUL      1970.2–1998.4       16  −0.030  −0.874   0.035  −0.252  −5.271   0.535   0.004   4.559   0.303  −0.046  −1.645   0.065
CAN      1970.2–1999.1        4   0.005   0.961   0.023  −0.031  −1.024   0.025   0.000   0.136   0.001   0.003   0.153   0.001
CAN      1970.2–1999.1        8   0.009   0.666   0.014  −0.052  −0.816   0.033   0.000   0.260   0.002   0.026   0.757   0.026
CAN      1970.2–1999.1       16   0.022   0.944   0.021  −0.153  −1.674   0.172  −0.000  −0.170   0.001   0.058   0.827   0.050
FR       1973.2–1998.3        4   0.020   3.263   0.381  −0.015  −0.425   0.004   0.001   0.313   0.002   0.011   0.697   0.012
FR       1973.2–1998.3        8   0.042   2.956   0.437  −0.050  −1.228   0.028   0.001   0.583   0.020   0.021   0.794   0.020
FR       1973.2–1998.3       16   0.079   2.875   0.442  −0.159  −3.524   0.198   0.000   0.268   0.006   0.020   0.373   0.009
GER      1978.4–1997.3        4  −0.000  −0.114   0.001  −0.025  −0.578   0.012   0.005   1.987   0.104   0.011   0.575   0.016
GER      1978.4–1997.3        8  −0.003  −0.775   0.019  −0.074  −0.980   0.059   0.005   2.033   0.292   0.017   0.695   0.020
GER      1978.4–1997.3       16  −0.001  −0.465   0.005  −0.224  −4.713   0.385   0.005   6.018   0.413  −0.056  −1.999   0.143
ITA      1971.2–1998.1        4   0.011   1.289   0.045  −0.067  −1.703   0.047   0.001   0.519   0.004  −0.005  −0.234   0.001
ITA      1971.2–1998.1        8   0.027   1.679   0.081  −0.140  −2.572   0.104  −0.001  −0.515   0.008   0.020   0.677   0.011
ITA      1971.2–1998.1       16   0.064   2.219   0.129  −0.101  −1.499   0.030  −0.000  −0.656   0.006   0.057   0.960   0.043
JAP      1970.2–1998.4        4   0.011   2.262   0.098  −0.077  −2.002   0.123   0.003   1.506   0.111   0.004   0.277   0.002
JAP      1970.2–1998.4        8   0.026   2.388   0.178  −0.145  −2.301   0.208   0.003   1.430   0.190   0.014   0.492   0.012
JAP      1970.2–1998.4       16   0.057   2.185   0.294  −0.226  −2.478   0.263   0.004   1.763   0.311   0.047   1.094   0.084
NTH      1977.2–1998.3        4  −0.003  −0.703   0.025   0.013   0.313   0.005   0.001   0.763   0.010   0.018   0.826   0.037
NTH      1977.2–1998.3        8  −0.002  −0.176   0.003   0.013   0.136   0.002   0.000   0.186   0.001   0.031   0.664   0.033
NTH      1977.2–1998.3       16   0.008   0.353   0.016  −0.112  −0.738   0.066   0.000   0.401   0.007  −0.034  −0.487   0.020
SWD      1970.2–1998.3        4   0.024   5.331   0.322  −0.028  −0.723   0.014   0.002   0.851   0.019   0.014   0.745   0.022
SWD      1970.2–1998.3        8   0.050   4.648   0.371  −0.045  −0.472   0.017   0.003   0.922   0.067   0.050   2.458   0.192
SWD      1970.2–1998.3       16   0.094   3.697   0.330  −0.144  −1.070   0.079   0.004   1.442   0.156   0.075   2.654   0.281
SWT      1982.2–1998.4        4   0.003   0.739   0.029  −0.014  −0.321   0.004   0.005   2.642   0.084   0.021   2.062   0.059
SWT      1982.2–1998.4        8   0.002   0.287   0.006  −0.015  −0.133   0.002   0.004   1.514   0.118   0.049   1.847   0.108
SWT      1982.2–1998.4       16   0.000   0.004   0.000  −0.015  −0.079   0.001   0.001   0.765   0.021   0.040   0.616   0.026
UK       1970.2–1999.1        4   0.022   2.108   0.205  −0.088  −3.046   0.149  −0.005  −1.147   0.070  −0.021  −1.084   0.030
UK       1970.2–1999.1        8   0.033   1.655   0.128  −0.141  −2.628   0.226  −0.002  −0.721   0.012  −0.049  −1.553   0.076
UK       1970.2–1999.1       16   0.037   1.123   0.044  −0.222  −3.541   0.380   0.001   0.446   0.008  −0.083  −1.719   0.143
USA      1970.2–1998.3        4  −0.004  −0.498   0.015  −0.028  −0.824   0.024  −0.000  −0.503   0.003  −0.001  −0.040   0.000
USA      1970.2–1998.3        8  −0.018  −0.995   0.071  −0.059  −0.731   0.053   0.001   0.368   0.005  −0.004  −0.082   0.000
USA      1970.2–1998.3       16  −0.060  −2.507   0.224  −0.095  −0.898   0.085   0.001   0.623   0.047  −0.029  −0.461   0.009
Annual data (horizon in years)
SWD      1920–1997            1  −0.002  −0.204   0.001   0.007   0.289   0.002   0.007   0.900   0.019   0.009   0.452   0.010
SWD      1920–1997            4   0.009   0.178   0.002  −0.018  −0.343   0.002   0.012   2.110   0.144   0.011   0.457   0.007
SWD      1920–1997            8   0.021   0.207   0.005  −0.064  −0.790   0.018   0.012   3.096   0.241   0.003   0.039   0.000
UK       1920–1997            1  −0.009  −0.756   0.028  −0.087  −2.835   0.170  −0.007  −0.758   0.007  −0.035  −4.052   0.152
UK       1920–1997            4  −0.055  −1.803   0.116  −0.221  −9.801   0.451   0.010   1.005   0.035  −0.066  −3.489   0.194
UK       1920–1997            8  −0.127  −2.984   0.218  −0.213  −6.042   0.417   0.009   1.071   0.053  −0.076  −2.360   0.120
USA      1891–1997            1   0.004   0.510   0.002  −0.037  −2.183   0.037  −0.000  −0.110   0.000   0.004   0.527   0.003
USA      1891–1997            4   0.011   0.263   0.002  −0.135  −2.350   0.123  −0.000  −0.055   0.000  −0.021  −0.773   0.015
USA      1891–1997            8  −0.006  −0.073   0.000  −0.273  −4.194   0.289  −0.002  −0.620   0.003  −0.080  −1.689   0.091

10. Regressions of consumption growth on the log price–dividend ratio give very mixed results across countries. There are statistically significant positive coefficients in Germany and the Netherlands, but statistically significant negative coefficients in Japan. The other countries resemble the USA in that they have no statistically significant consumption growth forecasts. The regressions with output growth as the dependent variable show a similar pattern across countries.
11. Results are somewhat more promising for real dividend growth in many countries. Positive and statistically significant coefficients are found in Canada, France, Germany, Italy, Japan, the Netherlands, Sweden, Switzerland and the UK. It seems clear that changing forecasts of real dividend growth have some role to play in explaining stock market movements.
12. The short-term real interest rate does not seem to be a promising candidate for the driving force behind stock market fluctuations. One would expect to find high price–dividend ratios forecasting low real interest rates, but the regression coefficients are significantly positive in France, Italy, Japan, Sweden and the UK. This presumably reflects the fact that stock markets in most countries were depressed in the 1970s, when real interest rates were low, and buoyant during the 1980s, when real interest rates were high.
13. Finally, in some countries the log price–dividend ratio is a powerful forecaster of excess stock returns. The results are particularly striking in the USA, Australia and the UK, but there is also some predictability in France, Germany, and Japan.
In the long-term annual data for Sweden, the UK and the USA, I use horizons of 1 year, 4 years and 8 years. In the U.S. data the log price–dividend ratio fails to forecast real dividend growth, suggesting that authors such as Barsky and De Long (1993) overemphasize the role of dividend forecasts in interpreting long-run U.S. experience. Consistent with the quarterly results, the log price–dividend ratio also fails to forecast consumption growth, output growth, or the real interest rate, but does forecast excess stock returns.
The UK data are similar, although here the 8-year regression coefficients for consumption growth and dividend growth are even statistically significant with the wrong (negative) sign. The 8-year regression coefficient for the real interest rate is also significantly negative, consistent with the idea that the UK stock market is related to the real interest rate. But much the strongest relation is between the log price–dividend ratio and future excess returns on the UK stock market. The Swedish data are quite different; here the log price–dividend ratio forecasts short-run dividend growth positively but has little predictive power for consumption growth, output growth, the real interest rate, or the excess log stock return.
The right-hand column of Table 11B considers one more dependent variable, the excess bond return. The predictive power of the stock market for excess stock returns does not generally carry over to excess bond returns; there are significant negative coefficients only in the UK (and positive coefficients in Sweden).
There are some econometric pitfalls in interpreting the regression results in Table 11. The price–dividend ratio is extremely persistent, and its innovations are highly positively correlated with stock returns. Stambaugh (1999) shows that these two conditions create a negative bias in the coefficient of a regression that forecasts stock returns from the price–dividend ratio. The negative coefficients in Table 11 might be attributable to this bias rather than to true predictability of stock returns. Lewellen (2003) points out, however, that the negative bias is concentrated in samples where the price–dividend ratio appears to be less persistent than it truly is. He suggests a bias correction that conditions on the measured persistence of the price–dividend ratio relative to an upper bound of unity; this correction is small in recent U.S. data, since the price–dividend ratio has a measured first-order autocorrelation close to one. Campbell and Yogo (2002) use near-unit-root econometric theory [Cavanagh, Elliott and Stock (1995), Torous, Valkanov and Yan (2003)] to motivate a similar test procedure and also find some evidence for stock return predictability.
Concerns about the instability of corporate financial policy have led some authors to use other variables to forecast U.S. stock returns. For example, Campbell and Shiller (1988b, 2001) look at the price–earnings ratio, Lewellen (1999) and Vuolteenaho (2000) use the market–book ratio, Lamont (1998) combines the two by using the dividend payout ratio, Baker and Wurgler (2000) look at the share of equity in new finance, and Lettau and Ludvigson (2001) look at the level of consumption in relation to aggregate financial wealth and labor income. Other authors have considered interest-rate variables such as recent changes in short-term interest rates [Campbell (1987), Hodrick (1992)] or yield spreads between long-term and short-term interest rates [Campbell (1987), Fama and French (1989), Keim and Stambaugh (1986)]. While many of these variables have some predictive power, the dramatic runup in stock prices in the late 1990s diminished the statistical evidence for predictability in almost all cases [but see Lettau and Ludvigson (2001) and Lewellen (2003)].
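The small-sample bias attributed to Stambaugh (1999) in the discussion of Table 11 can be illustrated with a small Monte Carlo sketch. Everything here is an assumption made for illustration: the AR(1) coefficient of 0.95, the innovation correlation of 0.9, the sample length of 100, the number of replications, and the function names. Returns are generated with a true slope of zero, so any systematic departure of the average OLS slope from zero is bias.

```python
import math
import random


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx


def simulate_mean_slope(T=100, phi=0.95, corr=0.9, n_sims=2000, seed=0):
    """Average OLS slope of r[t] on x[t-1] when the true slope is zero.

    x follows an AR(1) with coefficient phi; its innovation v is
    positively correlated (corr) with the return innovation u.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi ** 2)]  # stationary start
        r = []
        for _ in range(T):
            v = rng.gauss(0.0, 1.0)
            u = corr * v + math.sqrt(1.0 - corr ** 2) * rng.gauss(0.0, 1.0)
            r.append(u)                  # return: unpredictable by construction
            x.append(phi * x[-1] + v)    # predictor moves with the return shock
        total += ols_slope(x[:-1], r)    # pair r at t with x at t-1
    return total / n_sims


mean_slope = simulate_mean_slope()
print(f"average estimated slope (true value 0): {mean_slope:.4f}")
```

Under these assumed parameters the average estimated slope comes out negative, matching the sign of the bias described in the text; lowering the persistence or the correlation shrinks it.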
While the late-1990s runup in stock prices poses a challenge to the view that stock returns are forecastable, it also poses a challenge to the view that stocks are priced with constant discount rates, because the dividend growth forecasts needed to rationalize late-1990s stock prices with a constant discount rate seem extraordinarily optimistic [Campbell and Shiller (2001), Heaton and Lucas (1999), Shiller (2000)].
Overall, these results suggest that a new model of stock market volatility may be needed. The standard model of Section 4.3 drives all stock market fluctuations from changing forecasts of long-run consumption growth, dividend growth, and real interest rates; forecasts of excess stock returns are constant. The data for many countries suggest instead that forecasts of consumption growth, dividend growth, and real interest rates are variable only in the short run, so that long-run forecasts of these variables are fairly stable; changing forecasts of excess stock returns make an important contribution to the fluctuations of the stock market.
4.6. Changing volatility in stock returns
One reason why excess stock returns might be predictable is that the risk of stock market investment, as measured for example by the volatility of stock returns, might vary over time. With a constant price of risk, shifts in the quantity of risk will lead to changes in the equity risk premium.
There is a vast literature documenting the fact that stock market volatility does change with time. However, the variation in volatility is concentrated at high frequencies; it is most dramatic in daily or monthly data and is much less striking at lower frequencies. There is some business-cycle variation in volatility, but it does not seem strong enough to explain large movements in aggregate stock prices [Bollerslev, Chou and Kroner (1992), Schwert (1989)]. A second difficulty is that there is only weak evidence that periods of high stock market volatility coincide with periods of predictably high stock returns. Some papers do find a positive relationship between conditional first and second moments of returns, particularly at long horizons [Bollerslev, Engle and Wooldridge (1988), French, Schwert and Stambaugh (1987), Harrison and Zhang (1999), Harvey (1989)], but other papers find that when short-term nominal interest rates are high, the conditional volatility of stock returns is high while the conditional mean stock return is low [Campbell (1987), Glosten, Jagannathan and Runkle (1993)]. French, Schwert and Stambaugh (1987) emphasize that innovations in volatility are strongly negatively correlated with innovations in returns. This could be indirect evidence for a positive relationship between volatility and expected returns, but it could also indicate that negative shocks to stock prices raise volatility, perhaps by raising financial or operating leverage of companies [Black (1976)]. Some researchers have built models that allow for independent variation in the quantity and price of risk. Harvey (1989, 1991) uses the Generalized Method of Moments to estimate such a system, and finds that the price of risk appears to vary countercyclically. Chou, Engle and Kane (1992) find similar results using a GARCH framework. 23 Within the confines of this chapter it is not possible to do justice to the sophistication of the econometrics used in this literature. 
Instead I illustrate the empirical findings of the literature by constructing a crude measure of ex post volatility for excess stock returns – the average over 4, 8 or 16 quarters of the squared quarterly excess stock return – and regressing it onto the log price–dividend ratio. The results of this regression are reported in the third column of Table 11B. There are several significant coefficients in these regressions, but they are all positive, indicating that high price–dividend ratios predict high, not low volatility in these data. These results reinforce the conclusion of the literature that the price of risk seems to vary over time in relation to the level of aggregate consumption. Section 5 discusses economic models that have this property.
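The crude ex post volatility measure described above, the average of squared quarterly excess returns over a k-quarter window, might be computed along these lines. The function name is hypothetical, and the timing convention (the window runs over quarters t+1 to t+k so that it can be paired with a regressor dated t) is an assumption; the text specifies only the averaging of squared returns.

```python
def realized_volatility(excess_returns, k):
    """Average squared excess return over quarters t+1..t+k, for each t.

    Returns one value per usable starting date t, so the series can be
    regressed on a predictor observed at date t.
    """
    squared = [r * r for r in excess_returns]
    n = len(squared) - k
    return [sum(squared[t + 1:t + k + 1]) / k for t in range(n)]


# Toy quarterly excess returns (made-up numbers, for illustration only).
toy_returns = [0.00, 0.10, -0.10, 0.20, 0.00]
print(realized_volatility(toy_returns, 2))
```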
23 There is also recent work emphasizing that the quantity of risk should be measured by the consumption covariance of stock returns rather than their volatility; these two measures of risk may differ if stock returns and consumption growth are not perfectly correlated. Whitelaw (2000) builds a regime-switching model with state-dependent transition probabilities in which the consumption covariance of stock returns moves with the business cycle.
4.7. What does the bond market forecast?
I conclude this section by briefly comparing the results of Table 11 with those that can be obtained using bond market data. Table 12 repeats the regressions of Table 11 using the yield spread between long-term and short-term bonds as the regressor. Many authors have found that in U.S. data, yield spreads have some ability to forecast excess bond returns [Campbell and Shiller (1991), Fama and Bliss (1987)]. This contradicts the expectations hypothesis of the term structure, the hypothesis that excess bond returns are unforecastable. Other authors have found that yield spreads are powerful forecasters of macroeconomic conditions, particularly output growth [Chen (1991), Estrella and Hardouvelis (1991)]. Fama and French (1989) have argued that both price–dividend ratios and yield spreads capture short-term cyclical conditions, although yield spreads are more highly correlated with conventional measures of the U.S. business cycle.
The results of Table 12 are strikingly different from those of Table 11. In the quarterly data, yield spreads forecast positive output growth in every country except Italy and Japan, and positive consumption growth in many countries. Outside the USA, there is also a strong tendency for yield spreads to forecast low real interest rates. Thus, the findings of Chen (1991) and Estrella and Hardouvelis (1991) carry over to international data. Yield spreads are less consistently successful as forecasters of excess stock returns, stock market volatility, or even excess bond returns; the ability of the yield spread to forecast excess bond returns appears to be primarily a U.S. rather than an international phenomenon. 24 Similar conclusions are reported by Hardouvelis (1994) and Bekaert, Hodrick and Marshall (2001). While these authors do report some evidence for predictability of excess bond returns in international data, the evidence is much weaker than in U.S. data.
One important point to note, however, is that the small-sample bias emphasized by Stambaugh (1999) tends to bias the coefficients of bond returns on yield spreads downwards, that is, towards zero. Correcting the bias therefore strengthens the evidence for predictability of excess bond returns; this is the opposite of the situation when stock returns are regressed on price–dividend ratios. Bekaert, Hodrick and Marshall (1997) explore this effect. These results are consistent with the view that there is some procyclical variation in the short-term real interest rate which is not matched by the long-term real interest rate. Thus, yield spreads tend to rise at business cycle troughs when real interest rates are predictably low and future output and consumption growth are predictably high. This interpretation is complicated by the fact that yields are measured on nominal bonds rather than real bonds. Inflationary expectations and monetary policy therefore have a large impact on yield spreads. The particular characteristics of U.S. monetary 24
Results at a one-quarter horizon, not reported in the table, are qualitatively consistent with the longhorizon results.
860
Table 12A Forecasting with the yield spread Country
USA
AUL
CAN
FR
GER
ITA
Sample period
1947.2–1998.3
1970.2–1998.4
1970.2–1999.1
1973.2–1998.3
1978.4–1997.3
1971.2–1998.1
Horizon
4
Output growth ˆ t( b(k)) R2 (k)
Dividend growth ˆ t( b(k)) R2 (k)
Consumption growth ˆ ˆ b(k) t( b(k)) R2 (k)
ˆ b(k)
0.003
0.007
2.267
0.066
0.011
1.571
0.018
1.716
0.040
ˆ b(k)
8
0.002
0.569
0.007
0.007
1.478
0.039
0.024
2.528
0.050
16
−0.004
−0.889
0.017
0.001
0.275
0.001
0.015
1.106
0.013
4
0.003
1.688
0.037
0.008
2.745
0.147
0.002
0.081
0.000
8
0.005
2.186
0.073
0.011
1.802
0.147
0.032
0.834
0.016
16
0.003
0.824
0.015
0.013
2.915
0.151
0.052
1.294
0.039
4
0.010
4.203
0.213
0.017
5.770
0.429
0.016
1.118
0.037
8
0.014
3.533
0.149
0.025
4.360
0.376
0.056
3.758
0.169 0.228
16
0.015
2.068
0.083
0.027
2.871
0.203
0.080
2.843
4
−0.001
−0.336
0.002
0.007
2.822
0.141
0.002
0.187
0.000
8
−0.005
−0.802
0.019
0.004
1.377
0.029
0.013
0.704
0.004
16
0.015
3.792
0.150
0.005
0.877
0.017
−0.002
−0.059
0.000
4
0.005
2.800
0.099
0.008
2.686
0.181
0.070
5.278
0.454
8
0.007
2.773
0.106
0.012
3.923
0.180
0.132
6.312
0.571
16
0.009
1.809
0.118
0.014
1.662
0.153
0.096
2.779
0.176
4
0.003
0.934
0.026
0.007
1.858
0.077
−0.076
−2.971
0.124
8
0.001
0.244
0.002
−0.000
−0.074
0.000
−0.052
−0.972
0.027
16
0.000
0.079
0.000
−0.003
−0.522
0.005
−0.066
−1.193
0.015 J.Y. Campbell
continued on next page
Ch. 13:
Table 12A, continued
JAP
NTH
SWD
SWT
UK
USA
Sample period
1970.2–1998.4
1977.2–1998.3
1970.2–1998.3
1982.2–1998.4
1970.2–1999.1
1970.2–1998.3
Horizon
Consumption growth ˆ ˆ b(k) t( b(k)) R2 (k)
ˆ b(k)
Output growth ˆ t( b(k)) R2 (k)
ˆ b(k)
Dividend growth ˆ t( b(k)) R2 (k)
4
−0.003
−0.837
0.014
0.001
0.518
0.004
0.005
0.470
8
−0.007
−1.227
0.055
−0.002
−0.305
0.003
0.015
1.277
0.008 0.024
16
−0.008
−1.520
0.039
−0.001
−0.128
0.000
0.032
2.405
0.035
4
0.005
1.866
0.060
0.006
2.233
0.089
0.020
2.064
0.104
8
0.010
2.402
0.122
0.011
3.559
0.151
0.045
1.944
0.223
16
0.002
0.283
0.002
0.002
0.376
0.003
0.035
1.350
0.065
4
0.002
0.967
0.014
0.008
2.503
0.120
0.057
1.541
0.110 0.049
8
0.002
0.512
0.010
0.009
1.545
0.061
0.060
1.100
16
0.002
0.442
0.005
0.006
1.082
0.016
−0.006
−0.149
0.000
4
0.004
3.097
0.114
0.009
4.123
0.290
0.032
1.998
0.148
8
0.009
3.313
0.287
0.020
6.099
0.460
0.057
3.082
0.296
16
0.010
3.618
0.381
0.032
8.304
0.439
0.026
1.710
0.040
4
0.007
1.308
0.054
0.011
2.849
0.173
−0.009
−0.806
0.010 0.000
8
0.009
1.004
0.041
0.015
1.937
0.134
0.002
0.075
16
0.001
0.109
0.001
0.009
0.774
0.032
−0.020
−0.574
0.013
4
0.006
3.439
0.194
0.013
3.972
0.276
0.020
2.723
0.053
8
0.006
1.900
0.099
0.016
3.218
0.189
0.039
3.729
0.154
16
0.002
0.515
0.004
0.009
1.920
0.046
0.026
2.501
0.049
Consumption-Based Asset Pricing
Country
continued on next page 861
862
Table 12A, continued Country
SWD
UK
USA
Sample period
1920–1997
1920–1997
1891–1997
Horizon
Consumption growth ˆ ˆ b(k) t( b(k)) R2 (k)
ˆ b(k)
Output growth ˆ t( b(k)) R2 (k)
ˆ b(k)
Dividend growth ˆ t( b(k)) R2 (k)
1
−0.004
−1.812
0.017
−0.000
−0.226
0.000
0.026
1.384
4
−0.011
−2.192
0.028
−0.017
−2.786
0.067
0.030
1.057
0.041 0.011
8
−0.021
−2.076
0.088
−0.031
−2.682
0.114
0.011
0.545
0.001
1
−0.001
−0.376
0.002
0.005
1.532
0.025
−0.009
−1.202
0.014
4
−0.014
−1.486
0.052
0.004
0.412
0.003
−0.039
−1.350
0.043
8
−0.023
−2.991
0.080
−0.009
−0.880
0.008
−0.066
−1.911
0.053
1
0.002
0.441
0.002
0.007
1.321
0.018
−0.006
−0.441
0.002
4
0.006
0.705
0.010
0.018
0.860
0.019
0.010
0.385
0.002
8
0.010
0.632
0.011
0.045
0.999
0.072
0.006
0.139
0.000
J.Y. Campbell
Ch. 13:
Table 12B Forecasting with the yield spread
USA
AUL
CAN
FR
GER
ITA
1947.2–1998.3
1970.2–1998.4
1970.2–1999.1
1973.2–1998.3
1978.4–1997.3
1971.2–1998.1
Horizon
4
Real interest rate ˆ ˆ b(k) t( b(k)) R2 (k)
Excess stock return ˆ ˆ b(k) t( b(k)) R2 (k)
0.006
0.019
1.153
1.689
0.051
8
0.007
1.061
0.025 −0.000
16
0.011
0.871
0.020
4
−0.014
8
−0.025
ˆ b(k)
Stock volatility ˆ t( b(k)) R2 (k)
0.015 −0.001
−1.328
0.038
Excess bond return ˆ ˆ b(k) t( b(k)) R2 (k) 0.033
4.100
0.131
−0.020
0.000 −0.001
−1.613
0.053
0.037
2.787
0.087
0.017
0.409
0.005 −0.000
−0.187
0.001
0.042
1.604
0.060
−2.184
0.106 −0.007
−0.236
0.001 −0.003
−1.077
0.020
0.006
0.683
0.004
−1.861
0.089
0.001
0.036
0.000 −0.003
−1.267
0.060 −0.017
−1.139
0.015
−1.084
0.030
16
−0.043
−2.168
0.078
0.065
2.300
0.038 −0.001
−1.053
0.016 −0.031
4
−0.011
−2.124
0.114
0.054
1.864
0.087 −0.002
−1.370
0.064
0.024
1.940
0.068
8
−0.023
−1.843
0.144
0.063
2.310
0.070 −0.001
−0.861
0.053
0.024
1.030
0.032
−0.307
0.001
16
−0.053
−2.109
0.225
0.053
0.925
0.037 −0.001
−0.565
0.031 −0.007
4
−0.012
−2.001
0.139
0.050
1.481
0.047 −0.000
−0.071
0.000
0.007
0.475
0.005
8
−0.017
−1.413
0.084
0.077
2.111
0.075
0.140
0.001
0.005
0.207
0.001
−0.940
0.020
0.000
16
−0.022
−1.444
0.040
0.091
2.614
0.074
0.002
1.500
0.124 −0.027
4
−0.008
−4.710
0.419
0.051
1.862
0.058
0.001
0.531
0.005
0.013
0.808
0.026
8
−0.010
−3.599
0.258
0.081
2.104
0.079
0.002
1.553
0.037 −0.003
−0.127
0.001
16
0.001
0.179
0.001 −0.078
−2.723
0.057
0.004
4.259
0.311 −0.055
−2.937
0.165
4
−0.014
−1.758
0.083
0.020
0.410
0.004 −0.003
−1.801
0.039
0.018
1.323
0.019
8
−0.027
−1.453
0.088
0.054
0.870
0.017 −0.001
−0.649
0.015
0.018
0.659
0.009
16
−0.051
−2.136
0.093
0.089
1.262
0.026
0.030 −0.007
−0.213
0.001
0.001
1.479
Consumption-Based Asset Pricing
Country Sample period
continued on next page
863
864
Table 12B, continued Country Sample period
JAP
NTH
SWD
SWT
UK
USA
1970.2–1998.4
1977.2–1998.3
1970.2–1998.3
1982.2–1998.4
1970.2–1999.1
1970.2–1998.3
Horizon
Real interest rate ˆ ˆ b(k) t( b(k)) R2 (k)
Excess stock return ˆ ˆ b(k) t( b(k)) R2 (k)
ˆ b(k)
Stock volatility ˆ t( b(k)) R2 (k)
Excess bond return ˆ ˆ b(k) t( b(k)) R2 (k)
4
−0.007
−1.425
0.041
0.016
0.520
0.005 −0.002
−1.736
0.039
0.011
1.042
0.016
8
−0.011
−1.352
0.033 −0.001
−0.021
0.000 −0.001
−1.382
0.022 −0.003
−0.145
0.000
0.001 −0.019
−0.659
0.002
0.000
0.282
0.001 −0.032
−1.482
0.038
1.889
0.126
0.000
0.036
0.000
0.026
2.405
0.104
0.159 −0.000
−0.684
0.007
16
0.003
0.373
4
−0.011
−3.702
0.450
0.057
8
−0.015
−2.798
0.273
0.092
1.846
0.032
1.658
0.062
16
−0.001
−0.122
0.000 −0.031
−0.780
0.010
0.000
0.417
0.006 −0.018
−0.591
0.011
4
−0.022
−4.943
0.303 −0.056
−2.191
0.056 −0.004
−2.478
0.065 −0.005
−0.477
0.003
8
−0.042
−5.461
0.295 −0.056
−1.725
0.031 −0.003
−2.581
0.079 −0.015
−0.928
0.019
16
−0.072
−7.219
0.271 −0.078
−1.730
0.032 −0.001
−1.436
0.027 −0.041
−5.335
0.113
4
−0.010
−5.455
0.483
0.069
2.227
0.129 −0.001
−0.497
0.007
0.014
1.042
0.037
8
−0.017
−7.014
0.596
0.108
3.007
0.184 −0.001
−0.823
0.017 −0.002
−0.111
0.000
16
−0.021
−6.257
0.559
0.016
0.354
0.003
0.001
1.317
0.057 −0.046
−1.905
0.096
4
−0.027
−4.448
0.343
0.040
1.581
0.033
0.001
0.671
0.006
0.010
0.712
0.008
8
−0.052
−5.684
0.362
0.026
0.643
0.009
0.001
0.753
0.005 −0.002
−0.073
0.000
16
−0.100
−4.544
0.396
0.009
0.173
0.001
0.003
1.627
0.067 −0.042
−1.410
0.046
4
0.004
0.823
0.021
0.036
1.524
0.049 −0.003
−1.650
0.116
0.040
3.678
0.136
8
0.003
0.344
0.002
0.028
1.748
0.019 −0.002
−2.036
0.168
0.038
1.725
0.062
16
0.001
0.046
0.000
0.082
2.910
0.123 −0.001
−1.248
0.048
0.025
1.232
0.014 J.Y. Campbell
continued on next page
Table 12B, continued

                                 Real interest rate         Excess stock return        Stock volatility           Excess bond return
Country  Sample period  Horizon  b̂(k)     t(b̂(k))  R²(k)    b̂(k)     t(b̂(k))  R²(k)    b̂(k)     t(b̂(k))  R²(k)    b̂(k)     t(b̂(k))  R²(k)
SWD      1920–1997      1        −0.014   −2.418   0.057    −0.005   −0.201   0.001    −0.011   −2.193   0.044    0.036    3.748    0.161
                        4        −0.042   −2.162   0.062    −0.028   −0.578   0.006    0.002    0.571    0.004    −0.017   −1.106   0.021
                        8        −0.045   −1.211   0.026    0.065    1.417    0.025    0.002    0.831    0.005    −0.009   −0.269   0.002
UK       1920–1997      1        −0.031   −5.615   0.339    0.018    0.612    0.007    0.022    1.496    0.041    0.010    1.088    0.013
                        4        −0.109   −4.696   0.474    0.002    0.059    0.000    0.017    1.217    0.047    −0.008   −0.528   0.003
                        8        −0.157   −3.690   0.289    0.076    1.549    0.046    0.003    0.475    0.003    0.015    0.343    0.004
USA      1891–1997      1        −0.020   −2.541   0.051    0.014    0.775    0.006    −0.000   −0.070   0.000    0.037    6.341    0.311
                        4        −0.060   −2.089   0.073    0.087    2.169    0.065    0.001    0.208    0.001    0.067    3.007    0.195
                        8        −0.105   −1.528   0.100    0.110    1.814    0.057    −0.001   −0.264   0.001    0.094    2.715    0.153
policy may help to explain why previously reported U.S. results do not carry over to other countries in Table 12. U.S. monetary policy has tended to smooth real and nominal interest rates, which reduces the forecastability of real interest rates and increases the sensitivity of the yield spread to changes in bond-market risk premia. Mankiw and Miron (1986) have found that the yield spread was a better forecaster of U.S. interest rates in the period before the founding of the Federal Reserve, while Kugler (1988) has found that the yield spread is a better forecaster of interest rates in Germany and Switzerland and has related this to the characteristics of German and Swiss monetary policy. The findings in Table 12 are consistent with this literature.
5. Cyclical variation in the price of risk

In previous sections I have documented a challenging array of stylized facts and have discussed the problems they pose for standard asset-pricing theory. Briefly, the equity premium puzzle suggests that risk aversion must be high on average to explain high average excess stock returns, while the equity volatility puzzle suggests that risk aversion must vary over time to explain predictable variation in excess returns and the associated volatility of stock prices. This section describes some models that display these features.

5.1. Habit formation

Constantinides (1990) and Sundaresan (1989) have argued for the importance of habit formation, a positive effect of today’s consumption on tomorrow’s marginal utility of consumption.

Several modeling issues arise at the outset. Writing the period utility function as $U(C_t, X_t)$, where $X_t$ is the time-varying habit or subsistence level, the first issue is the functional form for $U(\cdot)$. Abel (1990, 1999) has proposed that $U(\cdot)$ should be a power function of the ratio $C_t/X_t$, while Boldrin, Christiano and Fisher (2001), Campbell and Cochrane (1999), Constantinides (1990) and Sundaresan (1989) have used a power function of the difference $C_t - X_t$.

The second issue is the effect of an agent’s own decisions on future levels of habit. In standard “internal habit” models such as those in Constantinides (1990) and Sundaresan (1989), habit depends on an agent’s own consumption and the agent takes account of this when choosing how much to consume. In “external habit” models such as those in Abel (1990, 1999) and Campbell and Cochrane (1999), habit depends on aggregate consumption which is unaffected by any one agent’s decisions. Abel calls this “catching up with the Joneses”.

The third issue is the speed with which habit reacts to individual or aggregate consumption. Abel (1990, 1999), Dunn and Singleton (1986) and Ferson and Constantinides (1991) make habit depend on one lag of consumption, whereas Boldrin, Christiano and Fisher (2001), Constantinides (1990), Sundaresan (1989), Campbell and Cochrane (1999) and Heaton (1995) assume that habit reacts only gradually to changes in consumption.
The choice between ratio models and difference models of habit is important because ratio models have constant risk aversion whereas difference models have time-varying risk aversion. To see this, consider Abel’s (1990) specification in which an agent’s utility can be written as a power function of the ratio $C_t/X_t$,

$$U_t = \sum_{j=0}^{\infty} \delta^j \, \frac{(C_{t+j}/X_{t+j})^{1-\gamma} - 1}{1-\gamma}, \tag{57}$$

where $X_t$ summarizes the influence of past consumption levels on today’s utility. For simplicity and to keep the model conditionally lognormal, specify $X_t$ as an external habit depending on only one lag of aggregate consumption:

$$X_t = \bar{C}_{t-1}^{\kappa}, \tag{58}$$

where $\bar{C}_{t-1}$ is aggregate past consumption and the parameter $\kappa$ governs the degree of time-nonseparability. Since there is a representative agent, in equilibrium aggregate consumption equals the agent’s own consumption, so in equilibrium

$$X_t = C_{t-1}^{\kappa}. \tag{59}$$

With this specification of utility, in equilibrium the first-order condition is

$$1 = \delta E_t\left[(1 + R_{i,t+1})\,(C_t/C_{t-1})^{\kappa(\gamma-1)}\,(C_{t+1}/C_t)^{-\gamma}\right]. \tag{60}$$

Assuming homoskedasticity and joint lognormality of asset returns and consumption growth, this implies the following restrictions on risk premia and the riskless real interest rate:

$$r_{f,t+1} = -\log\delta - \gamma^2\sigma_c^2/2 + \gamma E_t\Delta c_{t+1} - \kappa(\gamma-1)\Delta c_t, \tag{61}$$

$$E_t[r_{i,t+1} - r_{f,t+1}] + \sigma_i^2/2 = \gamma\sigma_{ic}. \tag{62}$$
Equation (61) says that the riskless real interest rate equals its value under power utility, less $\kappa(\gamma-1)\Delta c_t$. Holding consumption today and expected consumption tomorrow constant, an increase in consumption yesterday increases the marginal utility of consumption today. This makes the representative agent want to borrow from the future, driving up the real interest rate.

Equation (62) describing the risk premium is exactly the same as Equation (16), the risk premium formula for the power utility model. The external habit simply adds a term to the Euler equation (60) which is known at time $t$, and this does not affect the risk premium.

Abel (1990) nevertheless argues that catching up with the Joneses can help to explain the equity premium puzzle. This argument is based on two considerations. First, the average level of the riskless rate in Equation (61) is

$$-\log\delta - \gamma^2\sigma_c^2/2 + \left(\gamma - \kappa(\gamma-1)\right) g,$$

where $g$ is the average consumption growth rate. When risk aversion $\gamma$ is very large, a positive $\kappa$ reduces the average riskless rate. Thus, catching up with the Joneses enables one to increase risk aversion to solve the equity premium puzzle without encountering the riskless rate puzzle. Second, a positive $\kappa$ is likely to make the riskless real interest rate more variable because of the term $-\kappa(\gamma-1)\Delta c_t$ in Equation (61). If one solves for the stock returns implied by the assumption that stock dividends equal consumption, a more variable real interest rate increases the covariance of stock returns and consumption $\sigma_{ic}$ and drives up the equity premium.

The second of these points can be regarded as a weakness rather than a strength of the model. The puzzle illustrated in Table 4 is that the ratio of the measured equity premium to the measured covariance $\sigma_{ic}$ is large; increasing the consumption covariance $\sigma_{ic}$ does not by itself help to explain the size of this ratio. Also, Table 1 shows that the real interest rate is fairly stable ex post, while Table 6 shows that at most half of its variance is forecastable. Thus, the standard deviation of the expected real interest rate is quite small, and this is not consistent with large values of $\kappa$ and $\gamma$ in Equation (61).

This difficulty with the riskless real interest rate is a fundamental problem for habit formation models, as Abel (1999) points out. Time-nonseparable preferences make marginal utility volatile even when consumption is smooth, because consumers derive utility from consumption relative to its recent history rather than from the absolute level of consumption. But unless the consumption and habit processes take particular forms, time-nonseparability also creates large swings in expected marginal utility at successive dates, and this implies large movements in the real interest rate. I now present an alternative specification in which it is possible to solve this problem, and in which risk aversion varies over time.
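The two effects of the catching-up parameter in Equation (61) can be illustrated with a small numerical sketch. The parameter values below are illustrative assumptions, not a calibration from the text, and the volatility calculation additionally assumes that expected consumption growth is constant, so that the riskless rate moves only through the lagged-growth term.

```python
import math

def abel_riskless_rate(delta, gamma, kappa, sigma_c, expected_growth, lagged_growth):
    """Log riskless rate from Equation (61)."""
    return (-math.log(delta)
            - gamma**2 * sigma_c**2 / 2
            + gamma * expected_growth
            - kappa * (gamma - 1) * lagged_growth)

# Illustrative (assumed) annual values: discount factor, risk aversion,
# consumption volatility and mean consumption growth.
DELTA, GAMMA, SIGMA_C, G = 0.98, 10.0, 0.015, 0.02

def average_rate(kappa):
    """Average riskless rate: Equation (61) with both growth terms set to G."""
    return abel_riskless_rate(DELTA, GAMMA, kappa, SIGMA_C, G, G)

def rate_volatility(kappa):
    """Standard deviation of the riskless rate, assuming constant expected
    growth so that only the term -kappa * (gamma - 1) * dc_t varies."""
    return abs(kappa * (GAMMA - 1)) * SIGMA_C

for kappa in (0.0, 0.5, 1.0):
    print(f"kappa={kappa:.1f}  average rate={average_rate(kappa):.4f}  "
          f"rate std={rate_volatility(kappa):.4f}")
```

Under these assumed values a larger kappa lowers the average riskless rate but raises its volatility, which is the tension described in the preceding paragraphs.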
Campbell and Cochrane (1999) build a model with external habit formation in which a representative agent derives utility from the difference between consumption and a time-varying subsistence or habit level. They assume that log consumption follows a random walk. This fits the observation that most countries do not have highly predictable consumption or dividend growth rates (Tables 9 and 11). The consumption growth process is

$$\Delta c_{t+1} = g + \varepsilon_{c,t+1}, \tag{63}$$

where $\varepsilon_{c,t+1}$ is a normal homoskedastic innovation with variance $\sigma_c^2$. This is just the ARMA(1,1) model (36) of the previous section, with constant expected consumption growth. The utility function of the representative agent takes the form

$$E_t \sum_{j=0}^{\infty} \delta^j \, \frac{(C_{t+j} - X_{t+j})^{1-\gamma} - 1}{1-\gamma}. \tag{64}$$

Here $X_t$ is the level of habit, $\delta$ is the subjective discount factor, and $\gamma$ is the utility curvature parameter. Utility depends on a power function of the difference between consumption and habit; it is only defined when consumption exceeds habit.
It is convenient to capture the relation between consumption and habit by the surplus consumption ratio $S_t$, defined by

$$S_t \equiv \frac{C_t - X_t}{C_t}. \tag{65}$$

The surplus consumption ratio is the fraction of consumption that exceeds habit and is therefore available to generate utility in Equation (64). If habit $X_t$ is held fixed as consumption $C_t$ varies, the local coefficient of relative risk aversion is

$$\frac{-C u_{CC}}{u_C} = \frac{\gamma}{S_t}, \tag{66}$$

where $u_C$ and $u_{CC}$ are the first and second derivatives of utility with respect to consumption. Risk aversion rises as the surplus consumption ratio $S_t$ declines, that is, as consumption approaches the habit level. Note that $\gamma$, the curvature parameter in utility, is no longer the coefficient of relative risk aversion in this model.

To complete the description of preferences, one must specify how the habit $X_t$ evolves over time in response to aggregate consumption. Campbell and Cochrane suggest an AR(1) model for the log surplus consumption ratio, $s_t \equiv \log(S_t)$:

$$s_{t+1} = (1-\phi)\bar{s} + \phi s_t + \lambda(s_t)\,\varepsilon_{c,t+1}. \tag{67}$$

The parameter $\phi$ governs the persistence of the log surplus consumption ratio, while the “sensitivity function” $\lambda(s_t)$ controls the sensitivity of $s_{t+1}$ and thus of log habit $x_{t+1}$ to innovations in consumption growth $\varepsilon_{c,t+1}$.

Equation (67) specifies that today’s habit is a complex nonlinear function of current and past consumption. A linear approximation may help to understand it. If I substitute the definition $s_t \equiv \log(1 - \exp(x_t - c_t))$ into Equation (67) and linearize around the steady state, I find that Equation (67) is approximately a traditional habit-formation model in which log habit responds slowly and linearly to log consumption,

$$x_{t+1} \approx (1-\phi)\alpha + \phi x_t + (1-\phi)c_t = \alpha + (1-\phi)\sum_{j=0}^{\infty} \phi^j c_{t-j}. \tag{68}$$
The linear model (68) has two serious problems. First, when consumption follows an exogenous process such as Equation (63) there is nothing to stop consumption falling below habit, in which case utility is undefined. This problem does not arise when one specifies a process for $s_t$, since any real value for $s_t$ corresponds to positive $S_t$ and hence $C_t > X_t$. Second, the linear model typically implies a highly volatile riskless real interest rate. The process (67) with a non-constant sensitivity function $\lambda(s_t)$ allows one to control or even eliminate variation in the riskless interest rate.
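The first problem can be seen in a short simulation. The sketch below treats the linear rule (68) as exact, feeds it the random-walk consumption process (63), and tracks the gap between log habit and log consumption. The parameter values are the annualized figures quoted later in this section, and the intercept `ALPHA` is an assumption chosen here so that habit averages about 94% of consumption; neither choice comes from the original derivation.

```python
import math
import random

# Annualized values quoted later in the section; used here only for illustration.
PHI, G, SIGMA_C, S_BAR = 0.87, 0.0189, 0.015, 0.057
GAP_MEAN = math.log(1 - S_BAR)            # target mean of x_t - c_t (log of 0.943)
ALPHA = GAP_MEAN + G / (1 - PHI)          # assumed intercept consistent with that mean

def simulate_gap(n_periods, seed=0):
    """Simulate d_t = x_t - c_t implied by Equations (63) and (68):
    d_{t+1} = (1 - phi) * alpha + phi * d_t - dc_{t+1}."""
    rng = random.Random(seed)
    gap, path = GAP_MEAN, []
    for _ in range(n_periods):
        growth = G + rng.gauss(0.0, SIGMA_C)   # Equation (63)
        gap = (1 - PHI) * ALPHA + PHI * gap - growth
        path.append(gap)
    return path

path = simulate_gap(10_000)
violations = sum(gap >= 0 for gap in path)     # periods with habit >= consumption
print(f"mean gap: {sum(path) / len(path):.4f}")
print(f"share of periods with utility undefined: {violations / len(path):.2%}")
```

Any period in which the gap is non-negative has consumption at or below habit, so the power function in Equation (64) cannot be evaluated. Specifying a process for $s_t$ directly, as in Equation (67), rules this out by construction.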
To derive the real interest rate implied by this model, one first calculates the marginal utility of consumption as

$$u'(C_t) = (C_t - X_t)^{-\gamma} = S_t^{-\gamma} C_t^{-\gamma}. \tag{69}$$

The gross simple riskless rate is then

$$1 + R^f_{t+1} = \left[\delta E_t \frac{u'(C_{t+1})}{u'(C_t)}\right]^{-1} = \left[\delta E_t \left(\frac{S_{t+1}}{S_t}\right)^{-\gamma} \left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}\right]^{-1}. \tag{70}$$

Taking logs, and using Equations (63) and (67), the log riskless real interest rate is

$$r^f_{t+1} = -\log(\delta) + \gamma g - \gamma(1-\phi)(s_t - \bar{s}) - \frac{\gamma^2\sigma_c^2}{2}\left[\lambda(s_t) + 1\right]^2. \tag{71}$$
The first two terms on the right-hand side of Equation (71) are familiar from the power utility model (17), while the last two terms are new. The third term (linear in $(s_t - \bar{s})$) reflects intertemporal substitution. If the surplus consumption ratio is low, the marginal utility of consumption is high. However, the surplus consumption ratio is expected to revert to its mean, so marginal utility is expected to fall in the future. Therefore, the consumer would like to borrow and this drives up the equilibrium risk free interest rate. Note that what determines intertemporal substitution is mean-reversion in marginal utility, not mean-reversion in consumption itself. In this model consumption follows a random walk so there is no mean-reversion in consumption; but habit formation causes the consumer to adjust gradually to a new level of consumption, creating mean-reversion in marginal utility.

The fourth term (linear in $[\lambda(s_t) + 1]^2$) reflects precautionary savings. As uncertainty increases, consumers become more willing to save and this drives down the equilibrium riskless interest rate. Note that what determines precautionary savings is uncertainty about marginal utility, not uncertainty about consumption itself. In this model the consumption process is homoskedastic so there is no time-variation in uncertainty about consumption; but habit formation makes a given level of consumption uncertainty more serious for marginal utility when consumption is low relative to habit.

Equation (71) can be made to match the observed stability of real interest rates in two ways. First, it is helpful if the habit persistence parameter $\phi$ is close to one, since this limits the strength of the intertemporal substitution effect. Second, the precautionary savings effect offsets the intertemporal substitution effect if $\lambda(s_t)$ declines with $s_t$. In fact, Campbell and Cochrane parameterize the $\lambda(s_t)$ function so that these two effects exactly offset each other everywhere, implying a constant riskless interest rate. With a constant riskless rate, real bonds of all maturities are also riskless and there are no real term premia. Thus in the Campbell–Cochrane model the equity premium is also an equity-bond premium.
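The exact offset can be checked numerically. The sketch below uses the functional forms from Campbell and Cochrane (1999), which this chapter does not write out: a steady-state surplus consumption ratio $\bar{S} = \sigma_c\sqrt{\gamma/(1-\phi)}$ and a sensitivity function $\lambda(s_t) = (1/\bar{S})\sqrt{1 - 2(s_t - \bar{s})} - 1$ up to a maximum value of $s_t$, and zero above it. The function names are hypothetical. The formulas are applied directly to the annualized parameter values quoted in this section, so the resulting numbers are illustrative and need not match the figures reported from the original calibration.

```python
import math

def steady_state_surplus(gamma, phi, sigma_c):
    """S-bar = sigma_c * sqrt(gamma / (1 - phi)), as in Campbell and Cochrane (1999)."""
    return sigma_c * math.sqrt(gamma / (1 - phi))

def sensitivity(s, s_bar, S_bar):
    """lambda(s) = sqrt(1 - 2 (s - s_bar)) / S_bar - 1 for s <= s_max, else 0."""
    s_max = s_bar + (1 - S_bar**2) / 2
    if s > s_max:
        return 0.0
    return math.sqrt(1 - 2 * (s - s_bar)) / S_bar - 1

def riskless_rate(s, delta, gamma, phi, g, sigma_c):
    """Equation (71) evaluated at log surplus consumption ratio s."""
    S_bar = steady_state_surplus(gamma, phi, sigma_c)
    s_bar = math.log(S_bar)
    lam = sensitivity(s, s_bar, S_bar)
    substitution = -gamma * (1 - phi) * (s - s_bar)
    precaution = -gamma**2 * sigma_c**2 / 2 * (lam + 1)**2
    return -math.log(delta) + gamma * g + substitution + precaution

# Annualized values quoted in the text (illustrative use only).
DELTA, GAMMA, PHI, G, SIGMA_C = 0.89, 2.00, 0.87, 0.0189, 0.015
S_BAR = steady_state_surplus(GAMMA, PHI, SIGMA_C)
s_bar = math.log(S_BAR)

# Evaluate the rate at several states at or below s_max.
rates = [riskless_rate(s_bar + shift, DELTA, GAMMA, PHI, G, SIGMA_C)
         for shift in (-1.0, -0.3, 0.0, 0.2)]
print("max deviation across states:", max(rates) - min(rates))
```

With this sensitivity function the state-dependent parts of the third and fourth terms of Equation (71) cancel, leaving $-\log(\delta) + \gamma g - \gamma(1-\phi)/2$ in every state below the cap.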
The sensitivity function $\lambda(s_t)$ is not fully determined by the requirement of a constant riskless interest rate. Campbell and Cochrane choose the function to satisfy three conditions:

1) The riskless real interest rate is constant. [25]
2) Habit is predetermined at the steady state $s_t = \bar{s}$.
3) Habit is predetermined near the steady state, or, equivalently, positive shocks to consumption may increase habit but never reduce it.

[25] Wachter (2001) relaxes this condition and explores implications of the model for bond prices as well as stock prices.

To understand conditions 2) and 3), recall that the traditional notion of habit makes it a predetermined variable. On the other hand habit cannot be predetermined everywhere, or a sufficiently low realization of consumption growth would leave consumption below habit. To make habit “as predetermined as possible”, Campbell and Cochrane assume that habit is predetermined at and near the steady state. This also eliminates the counterintuitive possibility that positive shocks to consumption cause declines in habit.

Using these three conditions, Campbell and Cochrane show that the steady-state surplus consumption ratio must be a function of the other parameters of the model, and that the sensitivity function $\lambda(s_t)$ must take a particular form.

Campbell and Cochrane pick parameters for the model by calibrating it to fit postwar quarterly U.S. data. They choose the mean consumption growth rate $g = 1.89$% per year and the standard deviation of consumption growth $\sigma_c = 1.50$% per year to match the moments of the U.S. consumption data. Campbell and Cochrane follow Mehra and Prescott (1985) by assuming that the stock market pays a dividend equal to consumption. They also consider a more realistic model in which the dividend is a random walk whose innovations are correlated with consumption growth. They show that results in this model are very similar because the implied regression coefficient of dividend growth on consumption growth is close to one, which produces similar asset price behavior.

They use numerical methods to find the price-dividend ratio for the stock market as a function of the state variable $s_t$. They set the persistence of the state variable, $\phi$, equal to 0.87 per year to match the persistence of the log price–dividend ratio. They choose $\gamma = 2.00$ to match the ratio of unconditional mean to unconditional standard deviation of return in U.S. stock returns. These parameter values imply that at the steady state, the surplus consumption ratio $\bar{S} = 0.057$ so habit is about 94% of consumption. Finally, Campbell and Cochrane choose the discount factor $\delta = 0.89$ to give a riskless real interest rate of just under 1% per year.

It is important to understand that with these parameter values the model uses high average risk aversion to fit the high unconditional equity premium. Steady-state risk aversion is $\gamma/\bar{S} = 2.00/0.057 = 35$. In this respect the model resembles a power utility model with a very high risk aversion coefficient.

There are however two important differences between the Campbell–Cochrane habit formation model and the power utility model with high risk aversion. First, the model with habit formation avoids the risk-free rate puzzle. Evaluating Equation (71) at the steady state surplus consumption ratio and using the restrictions on the sensitivity function $\lambda(s_t)$, the constant riskless interest rate in the Campbell–Cochrane model is

$$r^f_{t+1} = -\log(\delta) + \gamma g - \left(\frac{\gamma}{\bar{S}}\right)^2 \frac{\sigma_c^2}{2}. \tag{72}$$
In the power utility model the same large coefficient $\gamma$ would appear in the consumption growth term and the consumption volatility term (Equation 17); in the Campbell–Cochrane model the curvature parameter $\gamma$ appears in the consumption growth term, and this is much lower than the steady-state risk aversion coefficient $\gamma/\bar{S}$ which appears in the consumption volatility term. Thus, a much lower value of the discount factor $\delta$ is consistent with the average level of the risk free interest rate, and the model implies a less sensitive relationship between mean consumption growth and interest rates. This property of the model is similar to that of an Epstein–Zin–Weil model with an elasticity of intertemporal substitution $\psi$ that exceeds the reciprocal of risk aversion.

Second, the Campbell–Cochrane model has risk aversion that varies with the level of consumption, whereas a power utility model has constant risk aversion. The time-variation in risk aversion generates predictable movements in excess stock returns like those documented in Table 11, enabling the model to match the volatility of stock prices even with a smooth consumption series and a constant riskless interest rate.

It is instructive to compare the Campbell–Cochrane (1999) and Constantinides (1990) models of habit formation. The Campbell–Cochrane model assumes random walk consumption and implies negative autocorrelation of stock returns. The Constantinides model, by contrast, assumes unforecastable asset returns and implies positive autocorrelation of consumption growth. Thus, these two models take different stands on the question of whether wealth or consumption accurately represents long-run risk. The Constantinides model fits the equity premium with low risk-aversion, but it achieves this success at the cost of a positively serially correlated consumption process that contradicts the empirical findings of Cochrane (1994) and Lettau and Ludvigson (2001).
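The first difference, the contrast between Equation (72) and the power utility formula, can be made concrete by asking what discount factor each model needs in order to deliver a given riskless rate. The sketch below is rough arithmetic under stated assumptions: it plugs the rounded annual parameter values from the text into both formulas, uses a hypothetical 1% target rate, ignores time aggregation, and takes the power utility rate to have the form $-\log\delta + \gamma g - \gamma^2\sigma_c^2/2$ described in the text.

```python
import math

def implied_delta_power_utility(target_rate, gamma, g, sigma_c):
    """Solve r = -log(delta) + gamma * g - gamma^2 * sigma_c^2 / 2 for delta."""
    return math.exp(gamma * g - gamma**2 * sigma_c**2 / 2 - target_rate)

def implied_delta_habit(target_rate, gamma, S_bar, g, sigma_c):
    """Solve Equation (72) for delta."""
    return math.exp(gamma * g - (gamma / S_bar)**2 * sigma_c**2 / 2 - target_rate)

# Rounded values from the text; TARGET_RATE is an illustrative assumption.
GAMMA, S_BAR, G, SIGMA_C = 2.00, 0.057, 0.0189, 0.015
TARGET_RATE = 0.01

risk_aversion = GAMMA / S_BAR     # steady-state risk aversion, Equation (66)
delta_power = implied_delta_power_utility(TARGET_RATE, risk_aversion, G, SIGMA_C)
delta_habit = implied_delta_habit(TARGET_RATE, GAMMA, S_BAR, G, SIGMA_C)

print(f"steady-state risk aversion: {risk_aversion:.1f}")
print(f"delta needed under power utility with that risk aversion: {delta_power:.3f}")
print(f"delta needed in the habit model, Equation (72): {delta_habit:.3f}")
```

Under these assumptions the power utility model needs a discount factor above one, which is the risk-free rate puzzle, while the habit model needs a value below one and close to the 0.89 reported in the text.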
Recent work in behavioral finance has explored similar themes. Barberis, Huang and Santos (2001) combine a standard power utility function in consumption with the prospect theory of Kahneman and Tversky (1979). Appealing to experimental evidence of Thaler and Johnson (1990), they argue that aversion to losses varies with past outcomes; past success reduces effective risk aversion as investors feel they are “gambling with house money”. This creates a time-varying price of risk which explains aggregate stock market volatility in a similar manner to Campbell and Cochrane (1999). The Barberis–Huang–Santos model has a lower aversion to consumption risk than the Campbell–Cochrane model, because it generates risk-averse behavior from direct aversion to wealth fluctuations as well as from standard aversion to consumption fluctuations.
5.2. Models with heterogeneous agents
All the models considered so far assume that assets can be priced as if there is a representative agent who consumes aggregate consumption. An alternative view is that aggregate consumption is not an adequate proxy for the consumption of stock market investors.

One simple explanation for a discrepancy between these two measures of consumption is that there are two types of agents in the economy: constrained agents who are prevented from trading in asset markets and simply consume their labor income each period, and unconstrained agents. The consumption of the constrained agents is irrelevant to the determination of equilibrium asset prices, but it may be a large fraction of aggregate consumption. Campbell and Mankiw (1989) argue that predictable variation in consumption growth, correlated with predictable variation in income growth, suggests an important role for constrained agents, while Mankiw and Zeldes (1991) and Parker (2001) use U.S. panel data to show that the consumption of stockholders is more volatile and more highly correlated with the stock market than the consumption of nonstockholders. Such effects are likely to be even more important in countries with low stock market capitalization and concentrated equity ownership.

The constrained agents in the above model do not directly influence asset prices, because they are assumed not to hold or trade financial assets. Another strand of the literature argues that there may be some investors who buy and sell stocks for exogenous, perhaps psychological reasons. These “noise traders” can influence stock prices because other investors, who are rational utility-maximizers, must be induced to accommodate their shifts in demand. If utility-maximizing investors are risk-averse, then they will only buy stocks from noise traders who wish to sell if stock prices fall and expected stock returns rise; conversely they will only sell stocks to noise traders who wish to buy if stock prices rise and expected stock returns fall. Campbell and Kyle (1993), Cutler, Poterba and Summers (1991), De Long, Shleifer, Summers and Waldmann (1990) and Shiller (1984) develop this model in some detail. The model implies that rational investors do not hold the market portfolio – instead they shift in and out of the stock market in response to changing demand from noise traders – and do not consume aggregate consumption since some consumption is accounted for by noise traders. This makes the model hard to test without having detailed information on the investment strategies of different market participants.

It is also possible that utility-maximizing stock market investors are heterogeneous in important ways. If investors are subject to large idiosyncratic risks in their labor income and can share these risks only indirectly by trading a few assets such as stocks and Treasury bills, their individual consumption paths may be much more volatile than aggregate consumption. Even if individual investors have the same power utility function, so that any individual’s consumption growth rate raised to the power $-\gamma$ would be a valid stochastic discount factor, the aggregate consumption growth rate raised to the power $-\gamma$ may not be a valid stochastic discount factor.
This problem is an example of Jensen’s Inequality. Since marginal utility is nonlinear, the average of investors’ marginal utilities of consumption is not generally the same as the marginal utility of average consumption. The problem disappears when investors’ individual consumption streams are perfectly correlated with one another as they will be in a complete markets setting. Grossman and Shiller (1982) point out that it also disappears in a continuous-time model when the processes for individual consumption streams and asset prices are diffusions.

Constantinides and Duffie (1996) have provided a simple framework within which the effects of heterogeneity can be understood. Constantinides and Duffie postulate an economy in which individual investors $k$ have different consumption levels $C_{k,t}$. The cross-sectional distribution of individual consumption is lognormal, and the change from time $t$ to time $t+1$ in individual log consumption is cross-sectionally uncorrelated with the level of individual log consumption at time $t$. All investors have the same power utility function with time discount factor $\delta$ and coefficient of relative risk aversion $\gamma$.

In this economy each investor’s own intertemporal marginal rate of substitution is a valid stochastic discount factor. Hence the cross-sectional average of investors’ intertemporal marginal rates of substitution is a valid stochastic discount factor. I write this as

$$M^*_{t+1} \equiv \delta E^*_{t+1}\left[\left(\frac{C_{k,t+1}}{C_{k,t}}\right)^{-\gamma}\right], \tag{73}$$

where $E^*_t$ denotes an expectation taken over the cross-sectional distribution at time $t$. That is, for any cross-sectionally random variable $X_{k,t}$, $E^*_t X_{k,t} = \lim_{K\to\infty} (1/K)\sum_{k=1}^{K} X_{k,t}$, the limit as the number of cross-sectional units increases of the cross-sectional sample average of $X_{k,t}$. Note that $E^*_t X_{k,t}$ will in general vary over time and need not be lognormally distributed conditional on past information.
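A two-investor example illustrates the Jensen’s Inequality point and the role of Equation (73). The sketch below compares the cross-sectional average of individual marginal rates of substitution with the same power function applied to aggregate consumption growth. The preference parameters and growth rates are hypothetical, and both investors are assumed to start from the same consumption level so that aggregate growth is the simple average of individual growth.

```python
def cross_sectional_sdf(growth_rates, gamma, delta):
    """Finite-sample version of Equation (73): delta times the average of
    individual gross consumption growth raised to the power -gamma."""
    return delta * sum(x ** (-gamma) for x in growth_rates) / len(growth_rates)

def aggregate_growth_sdf(growth_rates, gamma, delta):
    """delta times aggregate gross consumption growth raised to the power -gamma,
    assuming every investor starts from the same consumption level."""
    aggregate_growth = sum(growth_rates) / len(growth_rates)
    return delta * aggregate_growth ** (-gamma)

GAMMA, DELTA = 5.0, 0.98       # illustrative preference parameters
dispersed = [0.90, 1.10]       # hypothetical growth with idiosyncratic risk
identical = [1.02, 1.02]       # hypothetical perfectly correlated growth

for label, growth in (("dispersed", dispersed), ("identical", identical)):
    m_star = cross_sectional_sdf(growth, GAMMA, DELTA)
    m_agg = aggregate_growth_sdf(growth, GAMMA, DELTA)
    print(f"{label}: average IMRS = {m_star:.4f}, aggregate-growth SDF = {m_agg:.4f}")
```

When growth rates differ across investors the average marginal rate of substitution exceeds the value computed from aggregate growth, because the power function is convex; when growth rates coincide the two measures agree.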
The assumption of cross-sectional lognormality means that the log stochastic discount factor, $m^*_{t+1} \equiv \log(M^*_{t+1})$, can be written as a function of the cross-sectional mean and variance of the change in log consumption:

$$m^*_{t+1} = \log(\delta) - \gamma E^*_{t+1}\Delta c_{k,t+1} + \frac{\gamma^2}{2}\,\mathrm{Var}^*_{t+1}\Delta c_{k,t+1}, \tag{74}$$

where $\mathrm{Var}^*_t$ is defined analogously to $E^*_t$ as $\mathrm{Var}^*_t X_{k,t} = \lim_{K\to\infty} (1/K)\sum_{k=1}^{K} (X_{k,t} - E^*_t X_{k,t})^2$, and like $E^*_t$ will in general vary over time.

An economist who knows the underlying preference parameters of investors but does not understand the heterogeneity in this economy might attempt to construct a representative-agent stochastic discount factor, $M^{RA}_{t+1}$, using aggregate consumption:

$$M^{RA}_{t+1} \equiv \delta\left(\frac{E^*_{t+1} C_{k,t+1}}{E^*_t C_{k,t}}\right)^{-\gamma}. \tag{75}$$

The log of this stochastic discount factor can also be related to the cross-sectional mean and variance of the change in log consumption:

$$m^{RA}_{t+1} = \log(\delta) - \gamma E^*_{t+1}\Delta c_{k,t+1} - \frac{\gamma}{2}\left[\mathrm{Var}^*_{t+1} c_{k,t+1} - \mathrm{Var}^*_t c_{k,t}\right] = \log(\delta) - \gamma E^*_{t+1}\Delta c_{k,t+1} - \frac{\gamma}{2}\,\mathrm{Var}^*_{t+1}\Delta c_{k,t+1}, \tag{76}$$

where the second equality follows from the relation $c_{k,t+1} = c_{k,t} + \Delta c_{k,t+1}$ and the fact that $\Delta c_{k,t+1}$ is cross-sectionally uncorrelated with $c_{k,t}$. The difference between these two variables can now be written as

$$m^*_{t+1} - m^{RA}_{t+1} = \frac{\gamma(\gamma+1)}{2}\,\mathrm{Var}^*_{t+1}\Delta c_{k,t+1}. \tag{77}$$
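Equation (77) can be checked against a simulated cross-section. The sketch below draws a large number of hypothetical investors with lognormal consumption levels and independent normal log consumption growth, then computes the log of the average marginal rate of substitution and the log of the representative-agent discount factor directly from Equations (73) and (75). The sample size, seed, dispersion of initial consumption and growth parameters are all illustrative assumptions, and the discount factor is omitted because it cancels in the difference.

```python
import math
import random

def sdf_gap_formula(gamma, cross_var):
    """Equation (77): m* - m^RA = gamma * (gamma + 1) / 2 * Var*(dc)."""
    return gamma * (gamma + 1) / 2 * cross_var

def sdf_gap_simulated(gamma, mean_dc, sd_dc, n_investors=200_000, seed=0):
    """Monte Carlo estimate of m* - m^RA from Equations (73) and (75)."""
    rng = random.Random(seed)
    sum_c0 = sum_c1 = sum_imrs = 0.0
    for _ in range(n_investors):
        c0 = rng.gauss(0.0, 0.5)            # log consumption at t (assumed dispersion)
        dc = rng.gauss(mean_dc, sd_dc)      # log growth, independent of c0
        sum_c0 += math.exp(c0)
        sum_c1 += math.exp(c0 + dc)
        sum_imrs += math.exp(-gamma * dc)
    m_star = math.log(sum_imrs / n_investors)     # log(delta) omitted: it cancels
    m_ra = -gamma * math.log(sum_c1 / sum_c0)
    return m_star - m_ra

SD_DC = 0.20                                # cross-sectional std of log growth
simulated_gap = sdf_gap_simulated(gamma=3.0, mean_dc=0.02, sd_dc=SD_DC)
print(f"gamma=3: formula {sdf_gap_formula(3.0, SD_DC**2):.3f}, "
      f"simulated {simulated_gap:.3f}")

for gamma in (1.0, 3.0, 10.0, 35.0):
    print(f"gamma={gamma:>4}: gap from Equation (77) = "
          f"{sdf_gap_formula(gamma, SD_DC**2):.2f}")
```

The simulated gap should be close to the formula for moderate risk aversion. The second loop shows how quickly the gap grows with the risk aversion coefficient for a fixed cross-sectional variance of 0.04.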
The time series of this difference can have a nonzero mean, helping to explain the risk-free rate puzzle, and a nonzero variance, helping to explain the equity premium puzzle. If the cross-sectional variance of log consumption growth is negatively correlated with the level of aggregate consumption, so that idiosyncratic risk increases in economic downturns, then the true stochastic discount factor $m^*_{t+1}$ will be more strongly countercyclical than the representative-agent stochastic discount factor constructed using the same preference parameters; this has the potential to explain the high price of risk without assuming that individual investors have high risk aversion. Mankiw (1986) makes a similar point in a two-period model.

An important unresolved question is whether the heterogeneity we can measure has the characteristics that are needed to help resolve the asset pricing puzzles. In the Constantinides–Duffie model the heterogeneity must be large to have important effects on the stochastic discount factor; a cross-sectional standard deviation of log consumption growth of 20%, for example, is a cross-sectional variance of only 0.04, and it is variation in this number over time that is needed to explain the equity premium puzzle. Interestingly, the effect of heterogeneity is strongly increasing in risk aversion since $\mathrm{Var}^*_{t+1}\Delta c_{k,t+1}$ is multiplied by $\gamma(\gamma+1)/2$ in Equation (77). This suggests that heterogeneity may supplement high risk aversion but cannot altogether replace it as an explanation for the equity premium puzzle.

Lettau (2002) uses U.S. panel data to argue that measured heterogeneity can have only small effects on the SDF. He assumes that individuals consume their income and calculates the risk-aversion coefficients needed to put model-based stochastic discount factors inside the Hansen–Jagannathan volatility bounds. This procedure is conservative in that individuals trading in financial markets are normally able to achieve some smoothing of consumption relative to income. Nevertheless Lettau finds that high individual risk aversion is still needed to satisfy the Hansen–Jagannathan bounds.

These conclusions may not be surprising given the Grossman–Shiller (1982) result that the aggregation problem disappears in a continuous-time diffusion model. In such a model, the cross-sectional variance of consumption is locally deterministic and hence the false SDF $M^{RA}_{t+1}$ correctly prices risky assets. In a discrete-time model the cross-sectional variance of consumption can change randomly from one period to the next, but in practice these changes are likely to be small. This limits the effects of consumption heterogeneity on asset pricing.

It is also important to note that idiosyncratic shocks are assumed to be permanent in the Constantinides–Duffie model. Heaton and Lucas (1996) calibrate individual income processes to micro data from the Panel Study of Income Dynamics (PSID). Because the PSID data show that idiosyncratic income variation is largely transitory, Heaton and Lucas find that investors can minimize its effects on their consumption by borrowing and lending. This prevents heterogeneity from having any large effects on aggregate asset prices.

To get around this problem, several recent papers have combined heterogeneity with constraints on borrowing. Heaton and Lucas (1996) and Krusell and Smith (1997) find that borrowing constraints or large costs of trading equities are needed to explain the equity premium. Constantinides, Donaldson and Mehra (2002) focus on heterogeneity across generations. In a stylized three-period overlapping generations model young agents have the strongest desire to hold equities because they have the largest ratio of labor income to financial wealth. If these agents are prevented from borrowing to buy equities, the equilibrium equity premium can be large.

Heterogeneity in preferences may also be important. Several authors have recently argued that trading between investors with different degrees of risk aversion or time preference, possibly in the presence of market frictions or portfolio insurance constraints, can lead to time-variation in the market price of risk [Dumas (1989), Grossman and Zhou (1996), Wang (1996), Sandroni (1999), Chan and Kogan (2002)].
Intuitively, risk-tolerant agents hold more risky assets so they control a greater share of wealth in good states than in bad states; aggregate risk aversion therefore falls in good states, producing effects similar to those of habit formation.
5.3. Irrational expectations So far I have maintained the assumption that investors have rational expectations and understand the time-series behavior of dividend and consumption growth. A number of papers have explored the consequences of relaxing this assumption. [See for example Barberis, Shleifer and Vishny (1998), Barsky and De Long (1993), Cecchetti, Lam and Mark (2000), Chow (1989) or Hansen, Sargent and Tallarini (1999)]. In the absence of arbitrage, there exist positive state prices that can rationalize the prices of traded financial assets. These state prices equal subjective state probabilities multiplied by ratios of marginal utilities in different states. Thus given any model of utility, there exist subjective probabilities that produce the necessary state prices and in this sense explain the observed prices of traded financial assets. The interesting question is whether these subjective probabilities are sufficiently close to objective
probabilities, and sufficiently related to known psychological biases in behavior, to be plausible.

Many of the papers in this area work in partial equilibrium and assume that stocks are priced by discounting expected future dividends at a constant rate. This assumption makes it easy to derive any desired behavior of stock prices directly from assumptions on dividend expectations. Barsky and De Long (1993), for example, assume that investors believe dividends to be generated by a doubly integrated process, so that the dividend growth rate has a unit root. These expectations imply that rapid dividend growth increases stock prices more than proportionally, so the price–dividend ratio rises when dividends are growing strongly. If dividend growth is in fact stationary, then the high price–dividend ratio is typically followed by dividend disappointments, low stock returns, and reversion to the long-run mean price–dividend ratio. Thus, Barsky and De Long’s model can account for the volatility puzzle and the predictability of stock returns.$^{26}$

Another potentially important form of irrationality is a failure to understand the difference between real and nominal magnitudes. Modigliani and Cohn (1979) argued that investors suffer from inflation illusion, in effect discounting real cash flows at nominal interest rates. Ritter and Warr (2002) and Sharpe (2003) argue that inflation illusion may have led investors to bid up stock prices as inflation has declined since the early 1980s.

A limitation of these models is that they do not consider general equilibrium issues, in particular the implications of irrational beliefs for aggregate consumption. Using for simplicity the fiction that dividends equal consumption, investors’ irrational expectations about dividend growth should be linked to their irrational expectations about consumption growth. Interest rates are not exogenous but, like stock prices, are determined by investors’ expectations.
Thus, it is significantly harder to build a general equilibrium model with irrational expectations.

To see how irrationality can affect asset prices, consider first a static model in which log consumption follows a random walk ($\phi = 0$) with drift $g$. Investors understand that consumption is a random walk, but they expect it to grow at rate $\hat{g}$ instead of $g$. Equation (38) implies that the log price–dividend ratio is

$$p_{et} - d_{et} = \frac{k}{1-\rho} + \left(\lambda - \frac{1}{\psi}\right)\frac{\hat{g}}{1-\rho}, \tag{78}$$

Equation (21) implies that the riskless interest rate is

$$r_{f,t+1} = -\log\delta + \frac{\hat{g}}{\psi} + \frac{\theta-1}{2}\,\sigma_w^2 - \frac{\theta}{2\psi^2}\,\sigma_c^2, \tag{79}$$
26 Shiller (2000) discusses psychological factors that contribute to the formation of extrapolative expectations, with special reference to the runup in stock prices during the 1990s. Barberis, Shleifer and Vishny (1998) present a related model.
and the rationally expected equity premium is

$$E_t\left[r_{e,t+1}\right] - r_{f,t+1} + \frac{\sigma_e^2}{2} = \gamma\lambda\sigma_c^2 + \lambda\,(g - \hat{g}). \tag{80}$$
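Equations (79) and (80) are simple enough to evaluate directly, which makes it easy to see how a gap between the true drift $g$ and the perceived drift $\hat{g}$ moves the riskless rate and the rationally expected equity premium. The following is a minimal sketch rather than a calibration: the function names are hypothetical, every parameter value is an assumption, and `theta` is passed in directly instead of being derived from the other preference parameters.

```python
import math

# Illustrative evaluation of Equations (79) and (80). Every numerical value
# below is an assumption made for this sketch, not a calibration from the text.

def riskless_rate(g_hat, delta, psi, theta, sigma_w, sigma_c):
    """Equation (79): -log(delta) + g_hat/psi
    + ((theta - 1)/2) * sigma_w**2 - (theta / (2 * psi**2)) * sigma_c**2."""
    return (-math.log(delta)
            + g_hat / psi
            + 0.5 * (theta - 1.0) * sigma_w ** 2
            - theta / (2.0 * psi ** 2) * sigma_c ** 2)


def rational_equity_premium(g, g_hat, gamma, lam, sigma_c):
    """Equation (80): gamma * lam * sigma_c**2 + lam * (g - g_hat)."""
    return gamma * lam * sigma_c ** 2 + lam * (g - g_hat)


# Assumed values: true drift g, risk aversion gamma, elasticity of
# intertemporal substitution psi, theta = 1 (taken here as the power-utility
# case), lam = 1 (dividends equal consumption), and common volatilities.
params = dict(delta=0.98, psi=0.5, theta=1.0, sigma_w=0.03, sigma_c=0.03)
g, gamma, lam = 0.02, 2.0, 1.0

for label, g_hat in (("rational   ", 0.02), ("pessimistic", 0.01)):
    rf = riskless_rate(g_hat, **params)
    premium = rational_equity_premium(g, g_hat, gamma, lam, params["sigma_c"])
    print(f"{label}  g_hat={g_hat:.2f}  riskless rate={rf:.4f}  equity premium={premium:.4f}")
```

With these assumed numbers, lowering $\hat{g}$ below $g$ reduces the riskless rate by $(g - \hat{g})/\psi$ and raises the rationally expected premium by $\lambda(g - \hat{g})$.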
The first term on the right-hand side of Equation (80) is the standard formula for the equity premium in a model with serially uncorrelated consumption growth. This is investors’ irrational expectation of the equity premium. The second term arises because dividend growth is systematically different from what investors expect. This model illustrates that irrational pessimism among investors ($\hat{g} < g$) can lower the average risk-free rate and increase the equity premium. Thus, pessimism has the same effects on asset prices as a low rate of time preference and a high coefficient of risk aversion, and it can help to explain both the risk-free rate puzzle and the equity premium puzzle.$^{27}$

To explain the volatility puzzle, a more complicated model of irrationality is needed. Suppose now that log consumption growth follows an AR(1) process, a special case of Equation (36), but that investors believe the persistence coefficient to be $\hat{\phi}$ when in fact it is $\phi$.$^{28}$ In this case the risk-free interest rate is given by

$$r_{f,t+1} = \mu_f + \frac{\hat{\phi}}{\psi}\,(\Delta c_t - g), \tag{81}$$
while the rationally expected equity premium is
$$E_t\left[r_{e,t+1}\right] - r_{f,t+1} + \frac{\sigma_e^2}{2} = \mu_e - \left(\hat{\phi} - \phi\right)\left[\left(\lambda - \frac{1}{\psi}\right)\frac{\rho\hat{\phi}}{1-\rho\hat{\phi}} + \lambda\right](\Delta c_t - g), \tag{82}$$

where $\mu_f$ and $\mu_e$ are constants. If $\hat{\phi}$ is larger than $\phi$, and if the term in square brackets in Equation (82) is positive, then the equity premium falls when consumption growth has been rapid, and rises when consumption growth has been weak. This model, which can be seen as a general equilibrium version of Barsky and De Long (1993), fits the apparent cyclical variation in the market price of risk.

One difficulty with this explanation for stock market behavior is that it has strong implications for bond market behavior. When investors become “irrationally exuberant”, their optimism should lead to a strong desire to borrow from the future, which should drive up the riskless interest rate and the real bond premium even while it drives down the equity premium. Barsky and De Long (1993) work in partial equilibrium so they do not confront this problem. Cecchetti, Lam and Mark (2000) handle it by allowing the degree of investors’ irrationality itself to be stochastic and time-varying.$^{29}$

27 The effect of pessimism on the average price–dividend ratio is ambiguous, for the usual reason that lower risk-free rates and lower expected dividend growth have offsetting effects. Hansen, Sargent and Tallarini (1999) also emphasize that irrational pessimism can be observationally equivalent to lower time preference and higher risk aversion.
28 An alternative formulation would be to assume, following Equation (36), that log consumption growth is predicted by a state variable $z_t$ that investors observe, but that investors misperceive the persistence of this process to be $\hat{\phi}$ rather than $\phi$. In this case investors correctly forecast consumption growth over the next period, but incorrectly forecast subsequent consumption growth. Investors’ irrationality has no effect on the risk-free interest rate but causes time-variation in equity and bond premia.

6. Some implications for macroeconomics

The research summarized in this chapter has important implications for various aspects of macroeconomics. I conclude by briefly discussing some of these.

A first set of issues concerns the modelling of production, and hence of investment. This chapter has followed the bulk of the asset-pricing literature by concentrating on the relation between asset prices and consumption, without asking how consumption is determined in relation to investment and production. Ultimately this is unsatisfactory, and authors such as Cochrane (1991, 1996) and Rouwenhorst (1995) have argued that asset pricing should place a renewed emphasis on the investment decisions of firms.

Standard macroeconomic models with production, such as the canonical real business cycle model of Prescott (1986), imply that asset prices are extremely stable. The real interest rate equals the marginal product of capital, which is perturbed only by technology shocks and changes in the quantity of capital; when the model is calibrated to U.S. data the standard deviation of the real interest rate is only a few basis points. The return on capital is equally stable because capital can costlessly be transformed into consumption goods, so its price is always fixed at one and uncertainty in the return comes only from uncertainty about dividends.
If real business cycle models are to generate volatile asset returns, they must be modified to include adjustment costs in investment so that changes in the demand for capital cause changes in the value of installed capital, or Tobin’s q, rather than changes in the quantity of capital. Baxter and Crucini (1993) and Jermann (1998), among others, show how this can be done. The adjustment costs affect not only asset prices, but other aspects of the model; the response of investment to shocks falls, for example, so larger shocks are needed to explain the cyclical behavior of investment.

The modelling of labor supply is an equally difficult problem. Any model in which workers choose their labor supply implies a first-order condition of the form

$$\frac{\partial U}{\partial C_t}\,G_t = -\frac{\partial U}{\partial N_t}, \tag{83}$$

where $G_t$ is the real wage and $N_t$ is labor supply. A well-known difficulty in business cycle theory is that with a constant real wage, the marginal utility of consumption $\partial U/\partial C_t$ will be perfectly correlated with the marginal disutility of work $-\partial U/\partial N_t$. Since the marginal utility of consumption is declining in consumption while the marginal disutility of work is increasing in hours, this implies that consumption and hours worked will be negatively correlated. In the data, of course, consumption and hours worked are positively correlated since they are both procyclical. This problem can be resolved if the real wage is procyclical; then when consumption and hours increase in an expansion, the decline in marginal utility of consumption is more than offset by an increase in the real wage. In a standard model with log utility of consumption only a 1% increase in the real wage is needed to offset the decline in marginal utility caused by a 1% increase in consumption. But preferences of the sort suggested by the asset pricing literature, with high risk aversion and low intertemporal elasticity of substitution, have rapidly declining marginal utility of consumption. These preferences imply that a much larger increase in the real wage will be needed to offset the effect on labor supply of a given increase in consumption. Boldrin, Christiano and Fisher (2001) try to resolve this problem by using a two-sector framework with limited mobility of labor between sectors. In this framework the first-order condition (83) does not hold contemporaneously, but only in expectation.

29 The work of Rietz (1988) can be understood in a similar way. Rietz argues that investors are concerned about an unlikely but serious event that has not actually occurred. Given the data we have, investors appear to be irrational but in fact, with a long enough data sample, they will prove to be rational.

Models with production also help one to move away from the common assumption that stock market dividends equal consumption or, equivalently, that the aggregate stock market equals total national wealth. This assumption is clearly untrue even for the United States, and is even less appropriate for countries with smaller stock markets.
While one can relax the assumption by writing down exogenous correlated time-series processes for dividends and consumption in the manner of Section 4.3, it will ultimately be more satisfactory to derive both dividends and consumption within a general equilibrium model.

Another important set of issues concerns the links between different national economies and their financial markets. In this chapter I have treated each national stock market as a separate entity with its own pricing model. That is, I have assumed that national economies are entirely closed so that there is no integrated world capital market. This assumption may be appropriate for examining long-term historical data, but it seems questionable under modern conditions. There is much work to be done on the pricing of national stock markets in a model with a perfectly or partially integrated world capital market.

Finally, the asset-pricing literature is important in understanding the welfare costs of macroeconomic fluctuations. As Atkeson and Phelan (1994) and Alvarez and Jermann (2000) emphasize, asset market data reveal the tradeoff between average growth and volatility of wealth that is offered by asset markets, and this tradeoff must reflect investors’ preferences. Economic policymakers should take this into account when they face policy tradeoffs between economic growth and macroeconomic stability.
References Abel, A.B. (1990), “Asset prices under habit formation and catching up with the Joneses”, American Economic Review Papers and Proceedings 80:38−42. Abel, A.B. (1994), “Exact solutions for expected rates of return under Markov regime switching: implications for the equity premium puzzle”, Journal of Money, Credit, and Banking 26:345−361. Abel, A.B. (1999), “Risk premia and term premia in general equilibrium”, Journal of Monetary Economics 43:3−33. Abel, A.B., N.G. Mankiw, L.H. Summers and R. Zeckhauser (1989), “Assessing dynamic efficiency: theory and evidence”, Review of Economic Studies 56:1−20. Alvarez, F., and U. Jermann (2000), “Using asset prices to measure the cost of business cycles”, NBER Working Paper No. 7978 (National Bureau of Economic Research). Atkeson, A., and C. Phelan (1994), “Reconsidering the costs of business cycles with incomplete markets”, in: S. Fischer and J.J. Rotemberg, eds., NBER Macroeconomics Annual 1994 (MIT Press, Cambridge) pp. 187−207. Attanasio, O.P., and G. Weber (1993), “Consumption growth, the interest rate, and aggregation”, Review of Economic Studies 60:631−649. Baker, M., and J. Wurgler (2000), “The equity share in new issues and aggregate stock returns”, Journal of Finance 55:2219−2257. Bansal, R., and W.J. Coleman II (1996), “A monetary explanation of the equity premium, term premium, and risk-free rate puzzles”, Journal of Political Economy 104:1135−1171. Bansal, R., and A. Yaron (2000), “Risks for the long run: a potential resolution of asset pricing puzzles”, NBER Working Paper 8059 (National Bureau of Economic Research). Barberis, N., A. Shleifer and R.W. Vishny (1998), “A model of investor sentiment”, Journal of Financial Economics 49:307−343. Barberis, N., M. Huang and T. Santos (2001), “Prospect theory and asset prices”, Quarterly Journal of Economics 116:1−53. Barclays de Zoete Wedd Securities Limited (1995), The BZW Equity-Gilt Study: Investment in the London Stock Market since 1918 (BZW, London). 
Barsky, R.B., and J.B. De Long (1993), “Why does the stock market fluctuate?”, Quarterly Journal of Economics 107:291−311. Baxter, M., and M.J. Crucini (1993), “Explaining saving-investment correlations”, American Economic Review 83:416−436. Beaudry, P., and E. van Wincoop (1996), “The intertemporal elasticity of substitution: an exploration using a US panel of state data”, Economica 63:495−512. Bekaert, G., R.J. Hodrick and D.A. Marshall (1997), “On biases in tests of the expectations hypothesis of the term structure of interest rates”, Journal of Financial Economics 44:309−348. Bekaert, G., R.J. Hodrick and D.A. Marshall (2001), “Peso problem explanations for term structure anomalies”, Journal of Monetary Economics 48:241−270. Benartzi, S., and R. Thaler (1995), “Myopic loss aversion and the equity premium puzzle”, Quarterly Journal of Economics 110:73−92. Black, F. (1976), “Studies of stock price volatility changes”, in: Proceedings of the 1976 Meetings of the Business and Economic Statistics Section (American Statistical Association) pp. 177−181. Blanchard, O.J., and M. Watson (1982), “Bubbles, rational expectations, and financial markets”, in: P. Wachtel, ed., Crises in the Economic and Financial Structure: Bubbles, Bursts, and Shocks (Lexington, Lexington, MA) pp. 295–316. Boldrin, M., L.J. Christiano and J.D.M. Fisher (2001), “Habit persistence, asset returns, and the business cycle”, American Economic Review 91:149−166. Bollerslev, T., R. Engle and J. Wooldridge (1988), “A capital asset pricing model with time varying covariances”, Journal of Political Economy 96:116−131.
Bollerslev, T., R.Y. Chou and K.F. Kroner (1992), “ARCH modeling in finance: a review of the theory and empirical evidence”, Journal of Econometrics 52:5−59. Breeden, D. (1979), “An intertemporal asset pricing model with stochastic consumption and investment opportunities”, Journal of Financial Economics 7:265−296. Brown, S., W. Goetzmann and S. Ross (1995), “Survival”, Journal of Finance 50:853−873. Campbell, J.Y. (1986), “Bond and stock returns in a simple exchange model”, Quarterly Journal of Economics 101:785−804. Campbell, J.Y. (1987), “Stock returns and the term structure”, Journal of Financial Economics 18: 373−399. Campbell, J.Y. (1991), “A variance decomposition for stock returns”, Economic Journal 101:157−179. Campbell, J.Y. (1993), “Intertemporal asset pricing without consumption data”, American Economic Review 83:487−512. Campbell, J.Y. (1996), “Understanding risk and return”, Journal of Political Economy 104:298−345. Campbell, J.Y. (1999), “Asset prices, consumption, and the business cycle”, in: J.B. Taylor and M. Woodford, eds., Handbook of Macroeconomics, Vol. 1 (Elsevier, Amsterdam) pp. 1231−1303. Campbell, J.Y., and J.H. Cochrane (1999), “By force of habit: a consumption-based explanation of aggregate stock market behavior”, Journal of Political Economy 107:205−251. Campbell, J.Y., and A.S. Kyle (1993), “Smart money, noise trading, and stock price behavior”, Review of Economic Studies 60:1−34. Campbell, J.Y., and N.G. Mankiw (1989), “Consumption, income, and interest rates: reinterpreting the time series evidence”, in: O.J. Blanchard and S. Fischer, eds., National Bureau of Economic Research Macroeconomics Annual 4:185−216. Campbell, J.Y., and N.G. Mankiw (1991), “The response of consumption to income: a cross-country investigation”, European Economic Review 35:723−767. Campbell, J.Y., and R.J. Shiller (1988a), “The dividend-price ratio and expectations of future dividends and discount factors”, Review of Financial Studies 1:195−227. 
Campbell, J.Y., and R.J. Shiller (1988b), “Stock prices, earnings, and expected dividends”, Journal of Finance 43:661−676. Campbell, J.Y., and R.J. Shiller (1991), “Yield spreads and interest rate movements: a bird’s eye view”, Review of Economic Studies 58:495−514. Campbell, J.Y., and R.J. Shiller (2001), “Valuation ratios and the long-run stock market outlook: an update”, NBER Working Paper No. 8221 (National Bureau of Economic Research). Campbell, J.Y., and M. Yogo (2002), “Efficient tests of stock return predictability”, unpublished paper (Harvard University). Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997), The Econometrics of Financial Markets (Princeton University Press, Princeton, NJ). Carroll, C.D. (1992), “The buffer-stock theory of saving: some macroeconomic evidence”, Brookings Papers on Economic Activity 2:61−156. Cavanagh, C., G. Elliott and J. Stock (1995), “Inference in models with nearly integrated regressors”, Econometric Theory 11:1131−1147. Cecchetti, S.G., P.-S. Lam and N.C. Mark (1990), “Mean reversion in equilibrium asset prices”, American Economic Review 80:398−418. Cecchetti, S.G., P.-S. Lam and N.C. Mark (1993), “The equity premium and the risk-free rate: matching the moments”, Journal of Monetary Economics 31:21−45. Cecchetti, S.G., P.-S. Lam and N.C. Mark (2000), “Asset pricing with distorted beliefs: are equity returns too good to be true?” American Economic Review 90:787−805. Chan, Y.L., and L. Kogan (2002), “Heterogeneous preferences and the dynamics of asset prices”, Journal of Political Economy 110:1255−1285. Chen, N. (1991), “Financial investment opportunities and the macroeconomy”, Journal of Finance 46:529−554.
Chou, R.Y., R.F. Engle and A. Kane (1992), “Measuring risk aversion from excess returns on a stock index”, Journal of Econometrics 52:201−224. Chow, G.C. (1989), “Rational versus adaptive expectations in present value models”, Review of Economics and Statistics 71:376−384. Cochrane, J.H. (1991), “Production-based asset pricing and the link between stock returns and economic fluctuations”, Journal of Finance 46:209−237. Cochrane, J.H. (1994), “Permanent and transitory components of GDP and stock prices”, Quarterly Journal of Economics 109:241−265. Cochrane, J.H. (1996), “A cross-sectional test of an investment-based asset pricing model”, Journal of Political Economy 104:572−621. Cochrane, J.H. (2001), Asset Pricing (Princeton University Press, Princeton, NJ). Cochrane, J.H., and L.P. Hansen (1992), “Asset pricing lessons for macroeconomics”, in: O.J. Blanchard and S. Fischer, eds., NBER Macroeconomics Annual 1992 (MIT Press, Cambridge, MA). Constantinides, G. (1990), “Habit formation: a resolution of the equity premium puzzle”, Journal of Political Economy 98:519−543. Constantinides, G., and D. Duffie (1996), “Asset pricing with heterogeneous consumers”, Journal of Political Economy 104:219−240. Constantinides, G., J. Donaldson and R. Mehra (2002), “Junior can’t borrow: a new perspective on the equity premium puzzle”, Quarterly Journal of Economics 117:269−296. Cutler, D., J. Poterba and L. Summers (1991), “Speculative dynamics”, Review of Economic Studies 58:529−546. De Long, J.B., A. Shleifer, L. Summers and M. Waldmann (1990), “Noise trader risk in financial markets”, Journal of Political Economy 98:703−738. Deaton, A.S. (1991), “Saving and liquidity constraints”, Econometrica 59:1221−1248. Dimson, E., P. Marsh and M. Staunton (2002), Triumph of the Optimists: 101 Years of Global Investment Returns (Princeton University Press, Princeton, NJ). Dumas, B. (1989), “Two-person dynamic equilibrium in the capital market”, Review of Financial Studies 2:157−188. 
Dunn, K., and K. Singleton (1986), “Modeling the term structure of interest rates under non-separable utility and durability of goods”, Journal of Financial Economics 17:27−55. Economist (1987), One Hundred Years of Economic Statistics (The Economist, London). Epstein, L., and S. Zin (1989), “Substitution, risk aversion, and the temporal behavior of consumption and asset returns: a theoretical framework”, Econometrica 57:937−968. Epstein, L., and S. Zin (1991), “Substitution, risk aversion, and the temporal behavior of consumption and asset returns: an empirical investigation”, Journal of Political Economy 99:263−286. Estrella, A., and G. Hardouvelis (1991), “The term structure as a predictor of real economic activity”, Journal of Finance 46:555−576. Fama, E.F., and R. Bliss (1987), “The information in long-maturity forward rates”, American Economic Review 77:680−692. Fama, E.F., and K.R. French (1988a), “Permanent and temporary components of stock prices”, Journal of Political Economy 96:246−273. Fama, E.F., and K.R. French (1988b), “Dividend yields and expected stock returns”, Journal of Financial Economics 22:3−27. Fama, E.F., and K.R. French (1989), “Business conditions and expected returns on stocks and bonds”, Journal of Financial Economics 25:23−49. Fama, E.F., and K.R. French (2002), “The equity premium”, Journal of Finance 57:637−659. Ferson, W.E., and G. Constantinides (1991), “Habit persistence and durability in aggregate consumption: empirical tests”, Journal of Financial Economics 29:199−240. French, K., G.W. Schwert and R.F. Stambaugh (1987), “Expected stock returns and volatility”, Journal of Financial Economics 19:3−30.
Frennberg, P., and B. Hansson (1992), “Computation of a monthly index for Swedish stock returns 1919–1989”, Scandinavian Economic History Review 40:3−27. Froot, K., and M. Obstfeld (1991), “Intrinsic bubbles: the case of stock prices”, American Economic Review 81:1189−1217. Gabaix, X., and D. Laibson (2001), “The 6D bias and the equity premium puzzle”, in: B. Bernanke and K. Rogoff, eds., National Bureau of Economic Research Macroeconomics Annual (MIT Press, Cambridge, MA). Glosten, L., R. Jagannathan and D. Runkle (1993), “On the relation between the expected value and the volatility of the nominal excess return on stocks”, Journal of Finance 48:1779−1801. Gordon, M. (1962), The Investment, Financing, and Valuation of the Corporation (Irwin, Homewood, IL). Grossman, S.J., and R.J. Shiller (1981), “The determinants of the variability of stock market prices”, American Economic Review 71:222−227. Grossman, S.J., and R.J. Shiller (1982), “Consumption correlatedness and risk measurement in economies with non-traded assets and heterogeneous information”, Journal of Financial Economics 10:195−210. Grossman, S.J., and Z. Zhou (1996), “Equilibrium analysis of portfolio insurance”, Journal of Finance 51:1379−1403. Grossman, S.J., A. Melino and R.J. Shiller (1987), “Estimating the continuous time consumption based asset pricing model”, Journal of Business and Economic Statistics 5:315−328. Hall, R.E. (1988), “Intertemporal substitution in consumption”, Journal of Political Economy 96:221−273. Hamilton, J.D. (1989), “A new approach to the analysis of nonstationary returns and the business cycle”, Econometrica 57:357−384. Hansen, L.P., and R. Jagannathan (1991), “Restrictions on intertemporal marginal rates of substitution implied by asset returns”, Journal of Political Economy 99:225−262. Hansen, L.P., and K.J. Singleton (1983), “Stochastic consumption, risk aversion, and the temporal behavior of asset returns”, Journal of Political Economy 91:249−268. Hansen, L.P., T.J. 
Sargent and T.D. Tallarini (1999), “Robust permanent income and pricing”, Review of Economic Studies 66:873−907. Hardouvelis, G.A. (1994), “The term structure spread and future changes in long and short rates in the G7 countries: is there a puzzle?”, Journal of Monetary Economics 33:255−283. Harrison, P., and H. Zhang (1999), “An investigation of the risk and return relation at long horizons”, Review of Economics and Statistics 81:399−408. Harvey, C.R. (1989), “Time-varying conditional covariances in tests of asset pricing models”, Journal of Financial Economics 24:289−317. Harvey, C.R. (1991), “The world price of covariance risk”, Journal of Finance 46:111−157. Hassler, J., P. Lundvik, T. Persson and P. Söderlind (1994), “The Swedish business cycle: stylized facts over 130 years”, in: V. Bergström and A. Vredin, eds., Measuring and Interpreting Business Cycles (Clarendon Press, Oxford) pp. 9–113. Heaton, J. (1995), “An empirical investigation of asset pricing with temporally dependent preference specifications”, Econometrica 63:681−717. Heaton, J., and D. Lucas (1996), “Evaluating the effects of incomplete markets on risk sharing and asset pricing”, Journal of Political Economy 104:668−712. Heaton, J., and D. Lucas (1999), “Stock prices and fundamentals”, NBER Macroeconomics Annual, pp. 213−242. Hodrick, R.J. (1992), “Dividend yields and expected stock returns: alternative procedures for inference and measurement”, Review of Financial Studies 5:357−386. Jagannathan, R., E.R. McGrattan and A. Scherbina (2001), “The declining US equity premium”, NBER Working Paper 8172 (National Bureau of Economic Research). Jermann, U.J. (1998), “Asset pricing in production economies”, Journal of Monetary Economics 41:257−275.
Jorion, P., and W.N. Goetzmann (1999), “Global stock markets in the twentieth century”, Journal of Finance 54:953−980. Kahneman, D., and A. Tversky (1979), “Prospect theory: an analysis of decision under risk”, Econometrica 47:263−291. Kandel, S., and R.F. Stambaugh (1991), “Asset returns and intertemporal preferences”, Journal of Monetary Economics 27:39−71. Keim, D., and R.F. Stambaugh (1986), “Predicting returns in stock and bond markets”, Journal of Financial Economics 17:357−390. Kocherlakota, N. (1996), “The equity premium: it’s still a puzzle”, Journal of Economic Literature 34:42−71. Kreps, D., and E. Porteus (1978), “Temporal resolution of uncertainty and dynamic choice theory”, Econometrica 46:185−200. Krusell, P., and A.A. Smith Jr (1997), “Income and wealth heterogeneity, portfolio choice, and equilibrium asset returns”, Macroeconomic Dynamics 1:387−422. Kugler, P. (1988), “An empirical note on the term structure and interest rate stabilization policies”, Quarterly Journal of Economics 103:789−792. La Porta, R., F. Lopez-de-Silanes, A. Shleifer and R. Vishny (1997), “Legal determinants of external finance”, Journal of Finance 52:1131−1150. Lamont, O. (1998), “Earnings and expected returns”, Journal of Finance 53:1563−1587. LeRoy, S.F., and R. Porter (1981), “The present value relation: tests based on variance bounds”, Econometrica 49:555−577. Lettau, M. (2002), “Idiosyncratic risk and volatility bounds, or, can models with idiosyncratic risk solve the equity premium puzzle?”, Review of Economics and Statistics 84:376−380. Lettau, M., and S. Ludvigson (2001), “Consumption, aggregate wealth, and expected stock returns”, Journal of Finance 56:815−849. Lewellen, J. (1999), “The time-series relations among expected return, risk, and book-to-market”, Journal of Financial Economics 54:5−43. Lewellen, J. (2003), “Predicting returns with financial ratios”, Journal of Financial Economics, forthcoming. Li, H., and Y. 
Xu (2002), “Survival bias and the equity premium puzzle”, Journal of Finance 57: 1981−1995. Liang, N., and S.A. Sharpe (1999), “Share repurchases and employee stock options and their implications for S&P500 share retirements and expected returns”, Finance and Economics Discussion Series 199959 (Board of Governors of the Federal Reserve System). Lucas Jr, R.E. (1978), “Asset prices in an exchange economy”, Econometrica 46:1429−1446. Mankiw, N.G. (1986), “The equity premium and the concentration of aggregate shocks”, Journal of Financial Economics 17:211−219. Mankiw, N.G., and J.A. Miron (1986), “The changing behavior of the term structure of interest rates”, Quarterly Journal of Economics 101:211−228. Mankiw, N.G., and S.P. Zeldes (1991), “The consumption of stockholders and non-stockholders”, Journal of Financial Economics 29:97−112. Mehra, R., and E. Prescott (1985), “The equity premium puzzle”, Journal of Monetary Economics 15:145−161. Merton, R. (1973), “An intertemporal capital asset pricing model”, Econometrica 41:867−887. Miron, J.A. (1986), “Seasonal fluctuations and the life cycle-permanent income hypothesis of consumption”, Journal of Political Economy 94:1258−1279. Modigliani, F., and R.A. Cohn (1979), “Inflation and the stock market”, Financial Analysts Journal 35:24−44. Nelson, C., and R. Startz (1990), “The distribution of the instrumental variables estimator and its t-ratio when the instrument is a poor one”, Journal of Business 63:S125-S140.
Ogaki, M., and C. Reinhart (1998), “Measuring intertemporal substitution: the role of durable goods”, Journal of Political Economy 106:1078−1098. Parker, J.A. (2001), “The consumption risk of the stock market”, Brookings Papers on Economic Activity 2:279−348. Poterba, J., and L.H. Summers (1988), “Mean reversion in stock returns: evidence and implications”, Journal of Financial Economics 22:27−60. Prescott, E.C. (1986), “Theory ahead of business cycle measurement”, Carnegie-Rochester Conference Series on Public Policy 25:11−66. Restoy, F., and P. Weil (1998), “Approximate equilibrium asset prices”, NBER Working Paper 6611 (National Bureau of Economic Research). Rietz, T. (1988), “The equity risk premium: a solution?”, Journal of Monetary Economics 21:117−132. Ritter, J.R., and R.S. Warr (2002), “The decline of inflation and the bull market of 1982 to 1999”, Journal of Financial and Quantitative Analysis 37:29−61. Rouwenhorst, K.G. (1995), “Asset pricing implications of equilibrium business cycle models”, in: T. Cooley, ed., Frontiers of Business Cycle Research (Princeton University Press, Princeton, NJ) pp. 294–330. Rubinstein, M. (1976), “The valuation of uncertain income streams and the pricing of options”, Bell Journal of Economics 7:407−425. Sandroni, A. (1999), “Asset prices and the distribution of wealth”, Economics Letters 64:203−207. Santos, M.S., and M. Woodford (1997), “Rational asset pricing bubbles”, Econometrica 65:19−57. Schwert, G.W. (1989), “Why does stock market volatility change over time?”, Journal of Finance 44:1115−1153. Sharpe, S.A. (2003), “Reexamining stock valuation and inflation: the implications of analysts’ earnings forecasts”, Review of Economics and Statistics 84:632−648. Shiller, R.J. (1981), “Do stock prices move too much to be justified by subsequent changes in dividends?”, American Economic Review 71:421−436. Shiller, R.J. 
(1982), “Consumption, asset markets, and macroeconomic fluctuations”, Carnegie Mellon Conference Series on Public Policy 17:203−238. Shiller, R.J. (1984), “Stock prices and social dynamics”, Brookings Papers on Economic Activity 2:457−498. Shiller, R.J. (2000), Irrational Exuberance (Princeton University Press, Princeton, NJ). Siegel, J. (1994), Stocks for the Long Run (Richard D. Irwin, Chicago). Singleton, K. (1990), “Specification and estimation of intertemporal asset pricing models”, in: B. Friedman and F. Hahn, eds., Handbook of Monetary Economics (Elsevier, Amsterdam) pp. 583–626. Staiger, D., and J.H. Stock (1997), “Instrumental variables regression with weak instruments”, Econometrica 65:557−586. Stambaugh, R.F. (1999), “Predictive regressions”, Journal of Financial Economics 54:375−421. Sun, T. (1992), “Real and nominal interest rates: a discrete-time model and its continuous-time limit”, Review of Financial Studies 5:581−611. Sundaresan, S.M. (1989), “Intertemporally dependent preferences and the volatility of consumption and wealth”, Review of Financial Studies 2:73−88. Svensson, L. (1989), “Portfolio choice with non-expected utility in continuous time”, Economics Letters 30:313−317. Thaler, R.H., and E.J. Johnson (1990), “Gambling with the house money and trying to break even: the effects of prior outcomes on risky choice”, Management Science 36:643−660. Tirole, J. (1985), “Asset bubbles and overlapping generations”, Econometrica 53:1499−1527. Torous, W., R. Valkanov and S. Yan (2003), “On predicting stock returns with nearly integrated explanatory variables”, Journal of Business, forthcoming. Vasicek, O. (1977), “An equilibrium characterization of the term structure”, Journal of Financial Economics 5:177−188.
Vissing-Jorgensen, A. (2002), “Limited asset market participation and the elasticity of intertemporal substitution in consumption”, Journal of Political Economy 110:825−853. Vuolteenaho, T. (2000), “Understanding the aggregate book-market ratio and its implications to current equity-premium expectations”, unpublished paper (Harvard University). Wachter, J. (2001), “Habit formation and returns on bonds and stocks”, unpublished paper (New York University). Wang, J. (1996), “The term structure of interest rates in a pure exchange economy with heterogeneous investors”, Journal of Financial Economics 41:75−110. Weil, P. (1989), “The equity premium puzzle and the risk-free rate puzzle”, Journal of Monetary Economics 24:401−421. Wheatley, S. (1988), “Some tests of the consumption-based asset pricing model”, Journal of Monetary Economics 22:193−218. Whitelaw, R.F. (2000), “Stock market risk and return: an equilibrium approach”, Review of Financial Studies 13:521−547. Wilcox, D. (1992), “The construction of US consumption data: some facts and their implications for empirical work”, American Economic Review 82:922−941. Williams, J.B. (1938), The Theory of Investment Value (Harvard University Press, Cambridge, MA). Yogo, M. (2003), “Estimating the elasticity of intertemporal substitution when instruments are weak”, Review of Economics and Statistics, forthcoming.
Chapter 14
THE EQUITY PREMIUM IN RETROSPECT
RAJNISH MEHRA °
University of California, Santa Barbara, and NBER
EDWARD C. PRESCOTT °
University of Minnesota, and Federal Reserve Bank of Minneapolis
Contents
Abstract
Keywords
1. Introduction
2. The equity premium: history
2.1. Facts
2.2. Data sources
2.2.1. Subperiod 1802–1871
2.2.1.1. Equity return data
2.2.1.2. Return on a risk-free security
2.2.2. Sub-period 1871–1926
2.2.2.1. Equity return data
2.2.2.2. Return on a risk-free security
2.2.3. Sub-period 1926–present
2.2.3.1. Equity return data
2.2.3.2. Return on a risk-free security
2.3. Estimates of the equity premium
2.4. Variation in the equity premium over time
3. Is the equity premium due to a premium for bearing non-diversifiable risk?
3.1. Standard preferences
3.2. Estimating the equity risk premium versus estimating the risk aversion parameter
3.3. Alternative preference structures
3.3.1. Modifying the conventional time- and state-separable utility function
3.3.2. Habit formation
3.3.3. Resolution
° We thank George Constantinides, John Donaldson, Ellen R. McGrattan and Mark Rubinstein for helpful discussions. Mehra acknowledges financial support from the Academic Senate of the University of California. Prescott acknowledges financial support from the National Science Foundation.
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved
3.4. Idiosyncratic and uninsurable income risk
3.5. Models incorporating a disaster state and survivorship bias
4. Is the equity premium due to borrowing constraints, a liquidity premium or taxes?
4.1. Borrowing constraints
4.2. Liquidity premium
4.3. Taxes and regulation
5. An equity premium in the future?
Appendix A
Appendix B. The original analysis of the equity premium puzzle
B.1. The economy, asset prices and returns
References
Abstract This paper is a critical review of the literature on the “equity premium puzzle”. The puzzle, as originally articulated more than fifteen years ago, underscored the inability of the standard paradigm of Economics and Finance to explain the magnitude of the risk premium, that is, the return earned by a risky asset in excess of the return to a relatively riskless asset such as a U.S. government bond. The paper summarizes the historical experience for the USA and other industrialized countries and details the intuition behind the discrepancy between model prediction and empirical data. Various research approaches that have been proposed to enhance the model’s realism are detailed and, as such, the paper reviews the major directions of theoretical financial research over the past ten years. The author argues that the majority of the proposed resolutions fail along crucial dimensions and proposes a promising direction for future research.
Keywords asset pricing, equity risk premium, CAPM, consumption CAPM, risk free rate puzzle JEL classification: G10, G12
1. Introduction

More than two decades ago, we demonstrated that the equity premium (the return earned by a risky security in excess of that earned by a relatively risk-free T-bill) was an order of magnitude greater than could be rationalized in the context of the standard neoclassical paradigms of financial economics as a premium for bearing risk. We dubbed this historical regularity “the equity premium puzzle” [Mehra and Prescott (1985)]. Our challenge to the profession has spawned a plethora of research efforts to explain it away. In this paper, we take a retrospective look at the puzzle, critically examine the data sources used to document it, attempt to explain it clearly and evaluate the various attempts to solve it.

The paper is organized into four parts. Section 2 documents the historical equity premium in the United States and in selected countries with significant capital markets in terms of market value and comments on the data sources. Section 3 examines the question: “Is the equity premium due to a premium for bearing non-diversifiable risk?” Section 4 examines the related question: “Is the equity premium due to borrowing constraints, a liquidity premium or taxes?” Finally, Section 5 examines the equity premium expected to prevail in the future.

We conclude that research to date suggests that the answer to the first question is “no”, unless one is willing to accept that individuals are implausibly risk averse. In answer to the second question, McGrattan and Prescott (2001) found that, most likely, the high equity premium observed in the post-war period was indeed the result of a combination of factors that included borrowing constraints and taxes.

2. The equity premium: history

2.1. Facts

Any discussion of the equity premium over time confronts the question of which average returns are more useful in summarizing historical information: arithmetic or geometric?
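To make the two candidate statistics concrete, the following is a minimal sketch rather than anything taken from the chapter. The function names are hypothetical, the two-period return series is the familiar case of $100 that doubles and then halves, and the last lines assume lognormally distributed gross returns with an illustrative 20% standard deviation of log returns.

```python
def arithmetic_mean(returns):
    """Simple average of per-period net returns."""
    return sum(returns) / len(returns)


def geometric_mean(returns):
    """Constant per-period net return that reproduces the compounded growth."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0


# $100 -> $200 -> $100, i.e. a return of +100% followed by -50%.
returns = [1.0, -0.5]
arith = arithmetic_mean(returns)  # average of the two yearly returns
geom = geometric_mean(returns)    # rate that takes $100 back to $100

# Assumption: log gross returns are Normal(mu, sigma^2). Then the mean gross
# return is exp(mu + sigma^2 / 2) and the median gross return is exp(mu), so
# the two differ by sigma^2 / 2 in log terms, whatever mu is.
sigma = 0.20  # illustrative standard deviation of log returns
log_gap = 0.5 * sigma ** 2

print(f"arithmetic: {arith:.2%}, geometric: {geom:.2%}, log gap: {log_gap:.2%}")
```

The arithmetic average is the relevant input for the expected future value when returns are uncorrelated over time, while the geometric average describes the compounded growth actually realized along a single path.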
It is well known that the arithmetic average return exceeds the geometric average return and that if the returns are log-normally distributed, the difference between the two is one-half the variance of the returns. Since the annual standard deviation of the equity premium is about 20%, this can result in a difference of about 2% between the two measures, which is non-trivial since the phenomenon under consideration has an arithmetic mean of between 2 and 8%. In Mehra and Prescott (1985), we reported arithmetic averages, since the best available evidence indicated that stock returns were uncorrelated over time. When this is the case, the expected future value of a $1 investment is obtained by compounding the arithmetic average of the sample return, which is the correct statistic to report if one is interested in the mean value of the investment. 1 If, however, the objective is to obtain the median future value of the investment, then the initial investment should be compounded at the geometric sample average. When returns are serially correlated, the arithmetic average 2 can lead to misleading estimates, and thus the geometric average may be the more appropriate statistic to use. In this paper, as in our 1985 paper, we report arithmetic averages. However, in instances where we cite the results of research when arithmetic averages are not available, we clearly indicate this. 3

1 We present a simple proof in Appendix A.

2.2. Data sources

A second crucial consideration in a discussion of the historical equity premium has to do with the reliability of early data sources. The data documenting the historical equity premium in the USA can be subdivided into three distinct sub-periods: 1802–1871, 1871–1926 and 1926–present. The quality of the data is very different for each sub-period. Data on stock prices for the nineteenth century is patchy, often necessarily introducing an element of arbitrariness to compensate for its incompleteness.

2.2.1. Subperiod 1802–1871

2.2.1.1. Equity return data. We find that the equity return data prior to 1871 is not particularly reliable. To the best of our knowledge, the stock return data used by all researchers for the period 1802–1871 is due to Schwert (1990), who gives an excellent account of the construction and composition of early stock market indexes. Schwert (1990) constructs a “spliced” index for the period 1802–1987; his index for the period 1802–1862 is based on the work of Smith and Cole (1935), who constructed a number of early stock indexes. For the period 1802–1820, their index was constructed from an equally weighted portfolio of seven bank stocks, while another index for 1815–1845 was composed of six bank stocks and one insurance stock. For the period 1834–1862 the index consisted of an equally weighted portfolio of (at most) 27 railroad stocks. 4 They used one price quote, per stock, per month, from local newspapers.
The prices used were the average of the bid and ask prices, rather than transaction prices, and their computation of returns ignores dividends. For the period 1863–1871, Schwert uses data from Macaulay (1938), who constructed a value-weighted index using a portfolio of about 25 north-east and mid-Atlantic railroad stocks; 5 this index also excludes dividends. Needless to say, it is difficult to assess how well this data proxies the “market”, since undoubtedly there were other industry sectors that were not reflected in the index.

2 The point is well illustrated by the textbook example where an initial investment of $100 is worth $200 after one year and $100 after two years. The arithmetic average return is 25% whereas the geometric average return is 0%. The latter coincides with the true return.
3 In this case an approximate estimate of the arithmetic average return can be obtained by adding one-half the variance of the returns to the geometric average.
4 “They chose stocks in hindsight . . . the sample selection bias caused by including only stocks that survived and were actively quoted for the whole period is obvious” [Schwert (1990)].
5 “It is unclear what sources Macaulay used to collect individual stock prices but he included all railroads with actively traded stocks” (Ibid).

2.2.1.2. Return on a risk-free security. Since there were no Treasury bills at the time, researchers have used the data set constructed by Siegel (1998) for this period, using highly rated securities with an adjustment for the default premium. It is interesting to observe, as mentioned earlier, that based on this data set the equity premium for the period 1802–1862 was zero. We conjecture that this may be due to the fact that since most financing in the first half of the nineteenth century was done through debt, the distinction between debt and equity securities was not very clear-cut. 6

2.2.2. Sub-period 1871–1926

2.2.2.1. Equity return data. Shiller (1990) is the definitive source for the equity return data for this period. His data is based on the work of Cowles (1939), which covers the period 1871–1938. Cowles used a value-weighted portfolio for his index, which consisted of 12 stocks 7 in 1871 and ended with 351 in 1938. He included all stocks listed on the New York Stock Exchange whose prices were reported in the Commercial and Financial Chronicle. From 1918 onward he used the Standard and Poor’s (S&P) industrial portfolios. Cowles reported dividends, so that, unlike the earlier indexes for the period 1802–1871, a total return calculation was possible.

2.2.2.2. Return on a risk-free security. There is no definitive source for the short-term risk-free rate in the period before 1920, when Treasury certificates were first issued. In our 1985 study, we used short-term commercial paper as a proxy for a riskless short-term security prior to 1920 and Treasury certificates from 1920–1930. Our data prior to 1920 was taken from Homer (1963). Most researchers have either used our data set or Siegel’s.

2.2.3. Sub-period 1926–present

2.2.3.1. Equity return data.
This period is the “Golden Age” with regard to accurate financial data. The NYSE database at the Center for Research in Security Prices (CRSP) was initiated in 1926 and provides researchers with high-quality equity return data.
6 The first actively traded stock was floated in the USA in 1791 and by 1801 there were over 300 corporations, although less than 10 were actively traded [Siegel (1998)]. 7 It was only from February 16, 1885, that Dow Jones began reporting an index, initially composed of 12 stocks. The S&P index dates back to 1928, though for the period 1928–1957 it consisted of 90 stocks. The S&P 500 debuted in March 1957.
Table 1
U.S. equity premium using different data sets

Data set                      % real return on a     % real return on a relatively    % equity premium
                              market index (mean)    riskless security (mean)         (mean)
1802–1998 (Siegel)            7.0                    2.9                              4.1
1871–1999 (Shiller)           6.99                   1.74                             5.75
1889–2000 (Mehra–Prescott)    8.06                   1.14                             6.92
1926–2000 (Ibbotson)          8.8                    0.4                              8.4
The Ibbotson Associates Yearbooks 8 are also a very useful compendium of post-1926 financial data.

2.2.3.2. Return on a risk-free security. Since the advent of Treasury bills in 1931, short maturity bills have been an excellent proxy for a “real” risk-free security since the innovation in inflation is orthogonal to the path of real GNP growth. 9 Of course, with the advent of Treasury Inflation Protected Securities (TIPS) on January 29, 1997, the return on these securities is the real risk-free rate.

2.3. Estimates of the equity premium

Historical data provides us with a wealth of evidence documenting that for over a century, stock returns have been considerably higher than those for Treasury bills. This is illustrated in Table 1, which reports the unconditional estimates 10 for the equity premium in the USA based on the various data sets used in the literature, going back to 1802. The average annual real (inflation-adjusted) return on the U.S. stock market over the last 110 years has been about 8.06%. Over the same period, the return on a relatively riskless security was a paltry 1.14%. The difference between these two returns, the “equity premium”, was 6.92%. Furthermore, this pattern of excess returns to equity holdings is not unique to the USA but is observed in every country with a significant capital market. The USA, together with the UK, Japan, Germany and France, accounts for more than 85% of the capitalized global equity value.

8 Ibbotson Associates, 2001, “Stocks, bonds, bills and inflation,” 2000 Yearbook (Ibbotson Associates, Chicago).
9 See Litterman (1980), who also found that in post-war data the innovation in inflation had a standard deviation of one half of one percent.
10 To obtain unconditional estimates we use the entire data set to form our estimate. The Mehra–Prescott data set covers the longest time period for which both consumption and stock return data is available. The former is necessary to test the implication of consumption-based asset-pricing models.
Table 2
Equity premium in different countries a

Country    Period       % real return on a     % real return on a relatively    % equity premium
                        market index (mean)    riskless security (mean)         (mean)
UK         1947–1999    5.7                    1.1                              4.6
Japan      1970–1999    4.7                    1.4                              3.3
Germany    1978–1997    9.8                    3.2                              6.6
France     1973–1998    9.0                    2.7                              6.3

a Source: UK from Siegel (1998), the rest are from Campbell (2001).

Table 3
Terminal value of $1 invested in stocks and bonds a

Investment period    Stocks                       T-bills
                     Real         Nominal         Real       Nominal
1802–1997            $558,945     $7,470,000      $276       $3,679
1926–2000            $266.47      $2,586.52       $1.71      $16.56

a Source: Ibbotson (2001) and Siegel (1998).
The annual return on the British stock market was 5.7% over the post-war period, an impressive 4.6% premium over the average bond return of 1.1%. Similar statistical differentials are documented for France, Germany and Japan. Table 2 illustrates the equity premium in the post-war period for these countries. The dramatic investment implications of this differential rate of return can be seen in Table 3, which maps the capital appreciation of $1 invested in different assets from 1802 to 1997 and from 1926 to 2000. As Table 3 illustrates, $1 invested in a diversified stock index yields an ending wealth of $558,945 versus a value of $276, in real terms, for $1 invested in a portfolio of T-bills for the period 1802–1997. The corresponding values for the 75-year period, 1926–2000, are $266.47 and $1.71. We assume that all payments to the underlying asset, such as dividend payments to stock and interest payments to bonds, are reinvested and that there are no taxes paid. This long-term perspective underscores the remarkable wealth-building potential of the equity premium. It should come as no surprise, therefore, that the equity premium is of central importance in portfolio allocation decisions and estimates of the cost of capital, and is front and center in the current debate about the advantages of investing Social Security funds in the stock market.

In Table 4 we report the premium for some interesting sub-periods: 1889–1933, when the USA was on a gold standard; 1934–2000, when it was off the gold standard; and 1946–2000, the postwar period. Table 5 presents 30-year moving averages, similar to those reported by the U.S. meteorological service to document ‘normal’ temperature.

Table 4
Equity premium in different sub-periods a

Time period    % real return on a     % real return on a relatively    % equity premium
               market index (mean)    riskless security (mean)         (mean)
1889–1933      7.01                   3.09                             3.92
1934–2000      8.76                   −0.17                            8.93
1946–2000      9.03                   0.68                             8.36

a Source: Mehra and Prescott (1985). Updated by the authors.

Table 5
Equity premium: 30-year moving averages a

Time period    % real return on a     % real return on a relatively    % equity premium
               market index (mean)    riskless security (mean)         (mean)
1900–1950      6.51                   2.01                             4.50
1951–2000      8.98                   1.41                             7.58

a Source: Mehra and Prescott (1985). Updated by the authors.

Although the premium has been increasing over time, this is largely due to the diminishing return on the riskless asset, rather than a dramatic increase in the return on equity, which has been relatively constant. The low premium in the nineteenth century is largely due to the fact that the equity premium for the period 1802–1861 was zero. 11 If we exclude this period, we find that the difference in the premium in the second half of the nineteenth century relative to average values in the twentieth century is less striking. We find a dramatic change in the equity premium in the post-1933 period: the premium rose from 3.92% to 8.93%, an increase of more than 125%. Since 1933 marked the end of the period when the USA was on the gold standard, this break can be seen as the change in the equity premium after the implementation of the new policy.

11 See the earlier discussion on data.
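The terminal values reported in Table 3 can be converted into the constant annual rates that would generate them. The sketch below does this for the real 1926–2000 figures ($266.47 for stocks, $1.71 for T-bills over the 75-year period stated in the text). The function names are hypothetical, and the resulting rates are compound (geometric) rates, so they need not match the arithmetic means reported in Table 1.

```python
def implied_annual_rate(terminal_value, years, initial=1.0):
    """Constant annual rate that grows `initial` into `terminal_value` in `years` years."""
    return (terminal_value / initial) ** (1.0 / years) - 1.0


def terminal_value(rate, years, initial=1.0):
    """Value of `initial` after compounding at `rate` for `years` years."""
    return initial * (1.0 + rate) ** years


# Real terminal values of $1 from Table 3, 1926-2000 (75 years).
stocks_real = implied_annual_rate(266.47, 75)
bills_real = implied_annual_rate(1.71, 75)

print(f"stocks: {stocks_real:.2%} per year, T-bills: {bills_real:.2%} per year")
print(f"check: ${terminal_value(stocks_real, 75):.2f}")
```

Under these inputs the implied real compound rates come out at roughly 7.7% per year for stocks and 0.7% per year for T-bills, which shows how a difference of a few percentage points per year accumulates into the large wealth gap in Table 3.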
Fig. 1. Realized equity risk premium per year, 1926–2000. Source: Ibbotson (2001).
Fig. 2. Equity risk premium over 20-year periods, 1926–2000. Source: Ibbotson (2001).
2.4. Variation in the equity premium over time

The equity premium has varied considerably over time, as illustrated in Figures 1 and 2. Furthermore, the variation depends on the time horizon over which it is measured. There have even been periods when the equity premium has been negative.
Fig. 3. Market value/national income and mean equity premium (averaged over time periods when MV/NI > 1 and MV/NI < 1). Horizontal axis: time, 1929–1999; vertical axes: mean equity premium and market value / national income.
The low-frequency variation has been counter-cyclical. This is shown in Figure 3, where we have plotted stock market value as a share of national income 12 and the mean equity premium averaged over certain time periods. We have divided the time period from 1929 to 2000 into sub-periods where the ratio of the market value of equity to national income was greater than 1 and where it was less than 1. Historically, as the figure illustrates, subsequent to periods when this ratio was high the realized equity premium was low. A similar result holds when stock valuations are low relative to national income. In this case the subsequent equity premium is high. Since After Tax Corporate Profits as a share of National Income are fairly constant over time, this translates into the observation that the realized equity premium was low subsequent to periods when the Price/Earnings ratio was high and vice versa. This is the basis for the returns predictability literature in Finance [Campbell and Shiller (1988) and Fama and French (1988)]. In Figure 4 we have plotted stock market value as a share of national income and the subsequent three-year mean equity premium. This provides further confirmation that historically, periods of relatively high market valuation have been followed by periods when the equity premium was relatively low.
12 In Mehra (1998) it is argued that the variation in this ratio is difficult to rationalize in the standard neoclassical framework since over the same period after tax cash flows to equity as a share of national income are fairly constant. Here we do not address this issue and simply utilize the fact that this ratio has varied considerably over time.
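The grouping described above (splitting the sample by whether market value relative to national income is above or below 1 and averaging the premium within each group) can be sketched as follows. This is only an illustration: the function name, the `lead` parameter for looking at the premium some years ahead, and above all the numbers are hypothetical placeholders, not historical data.

```python
def mean_premium_by_regime(ratio, premium, threshold=1.0, lead=0):
    """Average premium, `lead` periods ahead, for high- and low-valuation periods."""
    pairs = list(zip(ratio, premium[lead:]))  # pair ratio[t] with premium[t + lead]
    high = [p for r, p in pairs if r > threshold]
    low = [p for r, p in pairs if r <= threshold]

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return {"high": mean(high), "low": mean(low)}


# Made-up placeholder series, NOT the data behind Figures 3 and 4.
mv_ni = [0.6, 0.8, 1.2, 1.5, 0.9, 1.1]    # market value / national income
premium = [10.0, 8.0, 2.0, 4.0, 6.0, 3.0]  # equity premium, % per year

same_period = mean_premium_by_regime(mv_ni, premium)
one_ahead = mean_premium_by_regime(mv_ni, premium, lead=1)
print(same_period, one_ahead)
```

With actual data, a lower average in the "high" group than in the "low" group would correspond to the pattern the figures are meant to show.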
Fig. 4. Market value/national income and 3-year ahead equity premium (averaged over time periods when MV/NI > 1 and MV/NI < 1). Horizontal axis: time, 1929–1998; vertical axes: mean equity premium and market value / national income.
3. Is the equity premium due to a premium for bearing non-diversifiable risk?

In this section, we examine various models that attempt to explain the historical equity premium. We start with a model with standard (CRRA) preferences, then examine models incorporating alternative preference structures, idiosyncratic and uninsurable income risk, and models incorporating a disaster state and survivorship bias.

Why have stocks been such an attractive investment relative to bonds? Why has the rate of return on stocks been higher than on relatively risk-free assets? One intuitive answer is that since stocks are “riskier” than bonds, investors require a larger premium for bearing this additional risk; and indeed, the standard deviation of the returns to stocks (about 20% per annum historically) is larger than that of the returns to T-bills (about 4% per annum), so, obviously, they are considerably more risky than bills! But are they? Figures 5 and 6 illustrate the variability of the annual real rate of return on the S&P 500 index and a relatively risk-free security over the period 1889–2000. Of course, the index did not consist of 500 stocks for the entire period.

To deepen our understanding of the risk–return trade-off in the pricing of financial assets, we take a detour into modern asset-pricing theory and look at why different assets yield different rates of return. The deus ex machina of this theory is that assets are priced such that, ex-ante, the loss in marginal utility incurred by sacrificing current consumption and buying an asset at a certain price is equal to the expected gain in marginal utility, contingent on the anticipated increase in consumption when the asset pays off in the future.
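The pricing principle just stated can be illustrated with a two-state sketch. Everything specific here is an assumption made for illustration: logarithmic utility (so marginal utility is 1/c), a discount factor of 0.96, two equally likely states, and the consumption levels. The function name is hypothetical.

```python
def price(payoffs, probs, c_today, c_next, discount=0.96):
    """Price that equates the utility given up today with the expected utility gained.

    Assumes log utility, U'(c) = 1/c, so U'(c_next) / U'(c_today) = c_today / c_next.
    """
    return discount * sum(
        pi * (c_today / c1) * x for pi, c1, x in zip(probs, c_next, payoffs)
    )


probs = [0.5, 0.5]
c_today = 1.0
c_next = [1.1, 0.9]  # hypothetical good state, bad state

# Two assets with the same expected payoff (0.5) but paying in different states.
price_good = price([1.0, 0.0], probs, c_today, c_next)  # pays when consumption is high
price_bad = price([0.0, 1.0], probs, c_today, c_next)   # pays when consumption is low

expected_return_good = 0.5 / price_good
expected_return_bad = 0.5 / price_bad
print(price_good, price_bad, expected_return_good, expected_return_bad)
```

In this sketch the asset that pays off in the low-consumption state commands the higher price and therefore the lower expected return, even though both assets have the same expected payoff.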
Fig. 5. Real annual return on S&P 500, 1889–2000 (%). Source: Mehra and Prescott (1985). Data updated by the authors.
Fig. 6. Real annual return on a relatively riskless security, 1889–2000 (%). Source: Mehra and Prescott (1985). Data updated by the authors.
The operative emphasis here is the incremental loss or gain of utility of consumption, which should be differentiated from incremental consumption. This is because the same amount of consumption may result in different degrees of well-being at different times. As a consequence, assets that pay off when times are good and consumption levels are high (when the marginal utility of consumption is low) are less desirable than those that pay off an equivalent amount when times are bad and additional consumption is more highly valued. Hence consumption in period t has a different price if times are good than if times are bad.

Let us illustrate this principle in the context of the standard, popular paradigm, the Capital Asset-Pricing Model (CAPM). The model postulates a linear relationship between an asset’s ‘beta’, a measure of systematic risk, and its expected return. Thus, high-beta stocks yield a high expected rate of return. That is because in the CAPM, good times and bad times are captured by the return on the market. The performance of the market, as captured by a broad-based index, acts as a surrogate indicator for the relevant state of the economy. A high-beta security tends to pay off more when the market return is high, that is, when times are good and consumption is plentiful. Such a security provides less incremental utility than a security that pays off when consumption is low; it is therefore less valuable and consequently sells for less. Thus, higher-beta assets that pay off in states of low marginal utility will sell for a lower price than similar assets that pay off in states of high marginal utility. Since rates of return are inversely proportional to asset prices, the lower-beta assets will, on average, give a lower rate of return than the higher-beta assets.

Another perspective on asset pricing emphasizes that economic agents prefer to smooth patterns of consumption over time.
Assets that pay off a larger amount at times when consumption is already high “destabilize” these patterns of consumption, whereas assets that pay off when consumption levels are low “smooth” out consumption. Naturally, the latter are more valuable and thus require a lower rate of return to induce investors to hold them. (Insurance policies are a classic example of assets that smooth consumption. Individuals willingly purchase and hold them, despite their very low rates of return.)

To return to the original question: are stocks that much riskier than T-bills so as to justify a six percentage point differential in their rates of return? What came as a surprise to many economists and researchers in finance was the conclusion of a paper by Mehra and Prescott, written in 1979. Stocks and bonds pay off in approximately the same states of nature or economic scenarios and hence, as argued earlier, they should command approximately the same rate of return. In fact, using standard theory to estimate risk-adjusted returns, we found that stocks on average should command, at most, a one percent return premium over bills. Since, for as long as we had reliable data (about 100 years), the mean premium on stocks over bills was considerably and consistently higher, we realized that we had a puzzle on our hands. It took us six more years to convince a skeptical profession and for our paper “The equity premium: a puzzle” to be published [Mehra and Prescott (1985)].
3.1. Standard preferences

The neoclassical growth model and its stochastic variants are a central construct in contemporary finance, public finance, and business-cycle theory. This model class has been used extensively by, among others, Abel et al. (1989), Auerbach and Kotlikoff (1987), Barro and Becker (1988), Brock (1979), Cox, Ingersoll and Ross (1985), Donaldson and Mehra (1984), Lucas (1978), Kydland and Prescott (1982) and Merton (1971). In fact, much of our economic intuition is derived from this model class. A key idea of this framework is that consumption today and consumption in some future period are treated as different goods. Relative prices of these different goods are equal to people’s willingness to substitute between these goods and businesses’ ability to transform these goods into each other.

The model has had some remarkable successes when confronted with empirical data, particularly in the stream of macroeconomic research referred to as Real Business Cycle Theory, where researchers have found that it easily replicates the essential macroeconomic features of the business cycle. See, in particular, Kydland and Prescott (1982). Unfortunately, when confronted with financial market data on stock returns, tests of these models have led, without exception, to their rejection. Perhaps the most striking of these rejections is contained in our 1985 paper.

To illustrate this we employ a variation of Lucas’ (1978) pure exchange model. Since per capita consumption has grown over time, we assume that the growth rate of the endowment follows a Markov process. This is in contrast to the assumption in Lucas’ model that the endowment level follows a Markov process. Our assumption, which requires an extension of competitive equilibrium theory, 13 enables us to capture the non-stationarity in the consumption series associated with the large increase in per capita consumption that occurred over the last century.
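The distinction between a Markov process for the endowment level and one for its growth rate can be illustrated with a small simulation. This is a sketch under stated assumptions, not the calibration used in the chapter: a two-state chain, the two gross growth rates, the probability of staying in the current state, and the function name are all chosen here for illustration.

```python
import random


def simulate_consumption(n_periods, growth_states=(1.054, 0.982),
                         stay_prob=0.43, c0=1.0, seed=0):
    """Consumption path whose gross growth rate follows a two-state Markov chain."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    state = 0
    path = [c0]
    for _ in range(n_periods):
        # Stay in the current growth state with probability stay_prob, else switch.
        if rng.random() >= stay_prob:
            state = 1 - state
        path.append(path[-1] * growth_states[state])
    return path


path = simulate_consumption(100)
growth = [nxt / prev for prev, nxt in zip(path, path[1:])]
print(f"final level: {path[-1]:.2f}, "
      f"growth range: {min(growth):.3f} to {max(growth):.3f}")
```

The growth rate only ever takes the two values of the chain, so it is stationary, while the level of consumption drifts without bound, which is the kind of non-stationarity in the level the assumption is meant to accommodate.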
We consider a frictionless economy that has a single representative “stand-in” household. This unit orders its preferences over random consumption paths by

$$E_0\left\{ \sum_{t=0}^{\infty} b^t U(c_t) \right\}, \qquad 0 < b < 1, \tag{1}$$
where $c_t$ is the per capita consumption and the parameter $b$ is the subjective time discount factor, which describes how impatient households are to consume. If $b$ is small, people are highly impatient, with a strong preference for consumption now versus consumption in the future. As modeled, these households live forever, which implicitly means that the utility of parents depends on the utility of their children. In the real world, this is true for some people and not for others. However, economies with both types of people (those who care about their children’s utility and those who do not) have essentially the same implications for asset prices and returns. 14

13 This extension is developed in Mehra (1988).
14 See Constantinides, Donaldson and Mehra (2002).
Thus, we use this simple abstraction to build quantitative economic intuition about what the returns on equity and debt should be. $E_0\{\cdot\}$ is the expectation operator conditional upon information available at time zero (which denotes the present time), and $U: R_+ \to R$ is the increasing, continuously differentiable concave utility function. We further restrict the utility function to be of the constant relative risk aversion (CRRA) class

$$U(c, a) = \frac{c^{1-a}}{1-a}, \qquad 0 < a < \infty, \tag{2}$$
where the parameter $a$ measures the curvature of the utility function. When $a = 1$, the utility function is defined to be logarithmic, which is the limit of the above representation as $a$ approaches 1. The feature that makes this the “preference function of choice” in much of the literature in Growth and Real Business Cycle Theory is that it is scale invariant. This means that a household is just as likely to accept a gamble after both its wealth and the gamble amount are scaled by the same positive factor. Hence, although the levels of aggregate variables such as the capital stock have increased over time, the resulting equilibrium return process is stationary. A second attractive feature is that it is one of only two preference functions that allows for aggregation and a “stand-in” representative agent formulation that is independent of the initial distribution of endowments. One disadvantage of this representation is that it links risk preferences with time preferences. With CRRA preferences, agents who like to smooth consumption across various states of nature also prefer to smooth consumption over time, that is, they dislike growth. Specifically, the coefficient of relative risk aversion is the reciprocal of the elasticity of intertemporal substitution. There is no fundamental economic reason why this must be so. We will revisit this issue in Section 3.3, where we examine preference structures that do not impose this restriction. 15

We assume that there is one productive unit which produces output $y_t$ in period $t$, which is the period dividend. There is one equity share with price $p_t$ that is competitively traded; it is a claim to the stochastic process $\{y_t\}$. Consider the intertemporal choice problem of a typical investor at time $t$. He equates the loss in utility associated with buying one additional unit of equity to the discounted expected utility of the resulting additional consumption in the next period.
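As an aside on the CRRA class in Equation (2), the scale-invariance property can be checked numerically. The sketch below is illustrative only: the function names, the particular 50/50 gamble (gain 30 or lose 20 on wealth of 100) and the scaling factors are assumptions made here. It asks whether an expected-utility maximizer accepts the gamble and whether the answer changes when wealth and stakes are multiplied by a common factor.

```python
import math


def crra_utility(c, a):
    """U(c, a) = c**(1 - a) / (1 - a), with the logarithmic case at a == 1."""
    if a == 1:
        return math.log(c)
    return c ** (1.0 - a) / (1.0 - a)


def accepts_gamble(wealth, gain, loss, a, p_win=0.5):
    """True if expected utility with the gamble exceeds utility without it."""
    with_gamble = (p_win * crra_utility(wealth + gain, a)
                   + (1.0 - p_win) * crra_utility(wealth - loss, a))
    return with_gamble > crra_utility(wealth, a)


def decisions_under_scaling(a, scales=(0.01, 1.0, 1000.0)):
    """Accept/reject decision when wealth and stakes are scaled by the same factor."""
    return [accepts_gamble(100.0 * k, 30.0 * k, 20.0 * k, a) for k in scales]


for a in (0.5, 1, 2):
    print(a, decisions_under_scaling(a))
```

For each value of the curvature parameter the decision is the same at every scale; what changes the decision is the curvature itself, with the more risk-averse household in this sketch turning the gamble down.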
To carry over one additional unit of equity, $p_t$ units of the consumption good must be sacrificed, and the resulting loss in utility is $p_t U'(c_t)$. By selling this additional unit of equity in the next period, $p_{t+1} + y_{t+1}$ additional units of the consumption good can be consumed, and $b E_t\{(p_{t+1} + y_{t+1})\, U'(c_{t+1})\}$ is the expected value of the incremental utility next period. At an optimum these quantities must be equal. Hence the fundamental relation that prices assets is $p_t U'(c_t) = b E_t\{(p_{t+1} + y_{t+1})\, U'(c_{t+1})\}$. Versions of this expression can be found in Rubinstein (1976), Lucas (1978), Breeden
15 Epstein and Zin (1991) and Weil (1989).
R. Mehra and E.C. Prescott
(1979) and Prescott and Mehra (1980), among others. Excellent textbook treatments can be found in Cochrane (2001), Danthine and Donaldson (2001), Duffie (2001) and LeRoy and Werner (2001). We use it to price both stocks and riskless one period bonds. For equity we have

$$1 = b E_t\left\{ \frac{U'(c_{t+1})}{U'(c_t)}\, R_{e,t+1} \right\}, \tag{3}$$

where

$$R_{e,t+1} = \frac{p_{t+1} + y_{t+1}}{p_t}, \tag{4}$$

and for the riskless one-period bonds the relevant expression is

$$1 = b E_t\left\{ \frac{U'(c_{t+1})}{U'(c_t)} \right\} R_{f,t+1}, \tag{5}$$

where the gross rate of return on the riskless asset is by definition

$$R_{f,t+1} = \frac{1}{q_t}, \tag{6}$$

with $q_t$ being the price of the bond. Since $U(c)$ is assumed to be increasing, we can rewrite Equation (3) as

$$1 = b E_t\{ M_{t+1} R_{e,t+1} \}, \tag{7}$$
where $M_{t+1}$ is a strictly positive stochastic discount factor. This guarantees that the economy will be arbitrage free and that the law of one price holds. A little algebra shows that

$$E_t(R_{e,t+1}) = R_{f,t+1} + \mathrm{Cov}_t\left\{ \frac{-U'(c_{t+1}),\; R_{e,t+1}}{E_t(U'(c_{t+1}))} \right\}. \tag{8}$$

The equity premium $E_t(R_{e,t+1}) - R_{f,t+1}$ can thus be easily computed. Expected asset returns equal the risk-free rate plus a premium for bearing risk, which depends on the covariance of the asset returns with the marginal utility of consumption. Assets that co-vary positively with consumption – that is, they pay off in states when consumption is high and marginal utility is low – command a high premium since these assets “destabilize” consumption. The question we need to address is the following: is the magnitude of the covariance between the return on equity and the marginal utility of consumption large enough to justify the observed 6% equity premium in U.S. equity markets? If not, how much of this historic equity premium is a compensation for bearing non-diversifiable aggregate risk?
Ch. 14:
The Equity Premium in Retrospect
To address this issue, we present a variation on the framework used in our original paper on the equity premium. An advantage of our original approach was that we could easily test the sensitivity of our results to changes in distributional assumptions. 16 We found that our results were essentially unchanged for very different consumption processes, provided that the mean and variances of growth rates equaled the historically observed values and the coefficient of relative risk aversion was less than ten. 17 Using this insight on the robustness of the results to distributional assumptions from our earlier analysis we consider the case where the growth rate of consumption $x_{t+1} \equiv c_{t+1}/c_t$ is iid and lognormal. We do this to facilitate exposition and because this results in closed form solutions. 18 As a consequence, the gross return on equity $R_{e,t}$ (defined above) is iid and lognormal. Substituting $U'(c_t) = c_t^{-a}$ in the fundamental pricing relation

$$p_t = b E_t\left\{ (p_{t+1} + y_{t+1})\, \frac{U'(c_{t+1})}{U'(c_t)} \right\}, \tag{9}$$

and noting that in this exchange economy the equilibrium consumption process is $\{y_t\}$, we get

$$p_t = b E_t\left\{ (p_{t+1} + y_{t+1})\, x_{t+1}^{-a} \right\}. \tag{10}$$

As $p_t$ is homogeneous of degree one in $y_t$ we can represent it as $p_t = w y_t$ and hence $R_{e,t+1}$ can be expressed as

$$R_{e,t+1} = \frac{(w + 1)}{w} \cdot \frac{y_{t+1}}{y_t} = \frac{w + 1}{w} \cdot x_{t+1}. \tag{11}$$

It is easily shown 19 that

$$w = \frac{b E_t\{x_{t+1}^{1-a}\}}{1 - b E_t\{x_{t+1}^{1-a}\}}, \tag{12}$$
16 In contrast to our approach, which is in the applied general equilibrium tradition, there is another tradition of testing Euler equations (such as Equation 9) and rejecting them. Hansen and Singleton (1982) and Grossman and Shiller (1981) exemplify this approach. 17 See Mehra and Prescott (1985, pp. 156−157). The original framework also allowed us to address the issue of leverage. 18 The exposition below is based on Abel (1988). Our original analysis is presented in Appendix B. 19 See Appendix A in Mehra (2003).
hence

$$E_t\{R_{e,t+1}\} = \frac{E_t\{x_{t+1}\}}{b E_t\{x_{t+1}^{1-a}\}}. \tag{13}$$

Analogously, the gross return on the riskless asset can be written as

$$R_{f,t+1} = \frac{1}{b}\, \frac{1}{E_t\{x_{t+1}^{-a}\}}. \tag{14}$$

Since we have assumed the growth rate of consumption and dividends to be log normally distributed,

$$E_t\{R_{e,t+1}\} = \frac{\exp\left(m_x + \tfrac{1}{2} s_x^2\right)}{b \exp\left((1-a)\, m_x + \tfrac{1}{2} (1-a)^2 s_x^2\right)}, \tag{15}$$

and

$$\ln E_t\{R_{e,t+1}\} = -\ln b + a m_x - \tfrac{1}{2} a^2 s_x^2 + a s_x^2, \tag{16}$$

where $m_x = E(\ln x)$, $s_x^2 = \mathrm{Var}(\ln x)$ and $\ln x$ is the continuously compounded growth rate of consumption. Similarly

$$R_f = \frac{1}{b \exp\left(-a m_x + \tfrac{1}{2} a^2 s_x^2\right)}, \tag{17}$$

and

$$\ln R_f = -\ln b + a m_x - \tfrac{1}{2} a^2 s_x^2, \tag{18}$$

$$\therefore \quad \ln E\{R_e\} - \ln R_f = a s_x^2. \tag{19}$$

From Equation (11) it also follows that

$$\ln E\{R_e\} - \ln R_f = a s_{x,R_e}, \tag{20}$$

where

$$s_{x,R_e} = \mathrm{Cov}(\ln x, \ln R_e). \tag{21}$$
The (log) equity premium in this model is the product of the coefficient of risk aversion and the covariance of the (continuously compounded) growth rate of consumption with the (continuously compounded) return on equity or the growth rate of dividends. From Equation 19, it is also the product of the coefficient of relative risk aversion and the variance of the growth rate of consumption. As we see below, this
variance $s_x^2$ is 0.00125, so unless the coefficient of risk aversion a is large, a high equity premium is impossible. The growth rate of consumption just does not vary enough! In Mehra and Prescott (1985) we report the following sample statistics for the U.S. economy over the period 1889–1978:

Statistic                                                          Value
Mean risk-free rate $R_f$                                          1.008
Mean return on equity $E\{R_e\}$                                   1.0698
Mean growth rate of consumption $E\{x\}$                           1.018
Standard deviation of the growth rate of consumption $s\{x\}$      0.036
Mean equity premium $E\{R_e\} - R_f$                               0.0618
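As a numerical sketch of how these sample statistics feed the closed-form expressions, the snippet below converts $E\{x\}$ and $s\{x\}$ into the lognormal parameters $m_x$ and $s_x^2$ by moment matching and then evaluates Equations (18) and (19). The function names and the parameter pair a = 10, b = 0.99 are choices made for this illustration only; the snippet is not the procedure of the original study.

```python
import math

# Sample statistics from the table above (U.S. economy, 1889-1978).
MEAN_GROWTH = 1.018   # E{x}, mean gross growth rate of consumption
SD_GROWTH = 0.036     # s{x}, standard deviation of the growth rate


def lognormal_params(mean, sd):
    """Return (m_x, s_x^2), the mean and variance of ln x, for a lognormal x
    whose level has the given mean and standard deviation."""
    s2 = math.log(1.0 + (sd / mean) ** 2)
    m = math.log(mean) - 0.5 * s2
    return m, s2


def model_returns(a, b, m, s2):
    """Return (Rf, E{Re}) implied by Equations (18) and (19)."""
    ln_rf = -math.log(b) + a * m - 0.5 * a ** 2 * s2   # Equation (18)
    ln_ere = ln_rf + a * s2                            # Equation (19)
    return math.exp(ln_rf), math.exp(ln_ere)


m_x, s2_x = lognormal_params(MEAN_GROWTH, SD_GROWTH)

# Illustrative parameter choice (an assumption of this sketch): a = 10, b = 0.99.
rf, ere = model_returns(a=10.0, b=0.99, m=m_x, s2=s2_x)

print(f"m_x = {m_x:.4f}, s_x^2 = {s2_x:.5f}")
print(f"Rf = {rf:.3f}, E(Re) = {ere:.3f}, premium = {ere - rf:.3f}")
```

Changing `a` and `b` in the call to `model_returns` shows how sensitive the implied risk-free rate is to the two preference parameters, while the log premium moves only through the product of a and $s_x^2$.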
In our calibration, we are guided by the tenet that model parameters should meet the criteria of cross-model verification. Not only must they be consistent with the observations under consideration but they should not be grossly inconsistent with other observations in growth theory, business-cycle theory, labor market behavior and so on. There is a wealth of evidence from various studies that the coefficient of risk aversion a is a small number, certainly less than 10. A number of these studies are documented in Mehra and Prescott (1985). We can then pose a question: if we set the risk aversion coefficient a to be 10 and b to be 0.99 what are the expected rates of return and the risk premium using the parameterization above? Using the expressions derived earlier we have $\ln R_f = -\ln b + a m_x - \tfrac{1}{2} a^2 s_x^2 = 0.120$, or $R_f = 1.127$, that is, a risk-free rate of 12.7%! Since $\ln E\{R_e\} = \ln R_f + a s_x^2 = 0.132$, we have $E\{R_e\} = 1.141$, or a return on equity of 14.1%. This implies an equity risk premium of 1.4%, far lower than the 6.18% historically observed equity premium. In this calculation we have been very liberal in choosing the values for a and b. Most studies indicate a value for a that
is close to 3. If we pick a lower value for b, the risk-free rate will be even higher and the premium lower. So the 1.4% value represents the maximum equity risk premium that can be obtained in this class of models given the constraints on a and b. Since the observed equity premium is over 6%, we have a puzzle on our hands that risk considerations alone cannot account for. Philippe Weil (1989) has dubbed the high risk-free rate obtained above “the risk-free rate puzzle”. The short-term real rate in the USA averages less than 1%, while the high value of a required to generate the observed equity premium results in an unacceptably high risk-free rate. The risk-free rate as shown in Equation (18) can be decomposed into three components: $\ln R_f = -\ln b + a m_x - \tfrac{1}{2} a^2 s_x^2$. The first term, $-\ln b$, is a time preference or impatience term. When b < 1 it reflects the fact that agents prefer early consumption to later consumption. Thus, in a world of perfect certainty and no growth in consumption, the unique interest rate in the economy will be $R_f = 1/b$. The second term, $a m_x$, arises because of growth in consumption. If consumption is likely to be higher in the future, agents with concave utility would like to borrow against future consumption in order to smooth their lifetime consumption. The higher the curvature of the utility function and the larger the growth rate of consumption, the greater the desire to smooth consumption. In equilibrium this will lead to a higher interest rate since agents in the aggregate cannot simultaneously increase their current consumption. The third term, $\tfrac{1}{2} a^2 s_x^2$, arises due to a demand for precautionary saving. In a world of uncertainty, agents would like to hedge against future unfavorable consumption realizations by building “buffer stocks” of the consumption good. Hence, in equilibrium, the interest rate must fall to counter this enhanced demand for savings. Figure 7 plots $\ln R_f = -\ln b + a m_x - \tfrac{1}{2} a^2 s_x^2$ calibrated to the U.S.
historical values with $m_x = 0.0175$ and $s_x^2 = 0.00123$ for various values of b. It shows that the precautionary savings effect is negligible for reasonable values of a (1 < a < 5). For a = 3 and b = 0.99, $R_f = 1.065$, which implies a risk-free rate of 6.5% – much higher than the historical mean rate of 0.8%. The economic intuition is straightforward – with consumption growing at 1.8% a year with a standard deviation of 3.6%, agents with isoelastic preferences have a sufficiently strong desire to borrow to smooth consumption that it takes a high interest rate to induce them not to do so. The late Fischer Black 20 proposed that a = 55 would solve the puzzle. Indeed it can be shown that the 1889–1978 U.S. experience reported above can be reconciled with a = 48 and b = 0.55. To see this, observe that since

$$s_x^2 = \ln\left[1 + \frac{\mathrm{var}(x)}{[E(x)]^2}\right] = 0.00125$$

20 Private communication 1981.
Fig. 7. Mean risk-free rate vs. alpha. [Plot not reproduced: mean risk-free rate on the vertical axis (scale from -80 to 80) against alpha on the horizontal axis (1 to 49), with one curve each for Beta = .99, Beta = .96 and Beta = .55.]
and $m_x = \ln E(x) - \tfrac{1}{2} s_x^2 = 0.0172$, this implies

$$a = \frac{\ln E(R) - \ln R_F}{s_x^2} = 47.6.$$

Since $\ln b = -\ln R_F + a m_x - \tfrac{1}{2} a^2 s_x^2 = -0.60$, this implies b = 0.55. Besides postulating an unacceptably high a, another problem is that this is a “knife edge” solution. No other set of parameters will work, and a small change in a will lead to an unacceptable risk-free rate as shown in Figure 7. An alternate approach is
to experiment with negative time preferences; however there seems to be no empirical evidence that agents do have such preferences. 21 Figure 7 shows that for extremely high a the precautionary savings term dominates and results in a “low” risk-free rate. 22 However, then a small change in the growth rate of consumption will have a large impact on interest rates. This is inconsistent with a cross-country comparison of real risk-free rates and their observed variability. For example, throughout the 1980s, South Korea had a much higher growth than the USA but real rates were not appreciably higher. Nor does the risk-free rate vary considerably over time, as would be expected if a was large. In Section 3 we show how alternative preference structures can help resolve the risk-free rate puzzle. An alternative perspective on the puzzle is provided by Hansen and Jagannathan (1991). The fundamental pricing equation can be written as

$$E_t(R_{e,t+1}) = R_{f,t+1} - \frac{\mathrm{Cov}_t\{M_{t+1},\, R_{e,t+1}\}}{E_t(M_{t+1})}. \tag{22}$$

This expression also holds unconditionally so that

$$E(R_{e,t+1}) = R_{f,t+1} - \frac{s(M_{t+1})\, s(R_{e,t+1})\, ø_{R,M}}{E_t(M_{t+1})}, \tag{23}$$

or

$$\frac{E(R_{e,t+1}) - R_{f,t+1}}{s(R_{e,t+1})} = -\frac{s(M_{t+1})\, ø_{R,M}}{E_t(M_{t+1})}, \tag{24}$$

and since $-1 \le ø_{R,M} \le 1$,

$$\left| \frac{E(R_{e,t+1}) - R_{f,t+1}}{s(R_{e,t+1})} \right| \le \frac{s(M_{t+1})}{E(M_{t+1})}. \tag{25}$$
This inequality is referred to as the Hansen–Jagannathan lower bound on the pricing kernel. For the U.S. economy, the Sharpe Ratio, $[E(R_{e,t+1}) - R_{f,t+1}] / s(R_{e,t+1})$, can be calculated to be 0.37. Since $E_t(M_{t+1})$ is the expected price of a one-period risk-free bond, its value must be close to 1. In fact, for the parameterization discussed earlier, $E_t(M_{t+1}) = 0.96$ when a = 2. This implies that the lower bound on the standard deviation for the pricing kernel must be close to 0.3 if the Hansen–Jagannathan bound
21 In a model with growth, equilibrium can exist with b > 1. See Mehra (1988) for the restrictions on the parameters a and b for equilibrium to exist.
22 Kandel and Stambaugh (1991) have suggested this approach.
is to be satisfied. However, when this is calculated in the Mehra–Prescott framework, we obtain an estimate for s (Mt + 1 ) = 0.002, which is off by more than an order of magnitude. We would like to emphasize that the equity premium puzzle is a quantitative puzzle; standard theory is consistent with our notion of risk that, on average, stocks should return more than bonds. The puzzle arises from the fact that the quantitative predictions of the theory are an order of magnitude different from what has been historically documented. The puzzle cannot be dismissed lightly, since much of our economic intuition is based on the very class of models that fall short so dramatically when confronted with financial data. It underscores the failure of paradigms central to financial and economic modeling to capture the characteristic that appears to make stocks comparatively so risky. Hence the viability of using this class of models for any quantitative assessment, say, for instance, to gauge the welfare implications of alternative stabilization policies, is thrown open to question. For this reason, over the last 15 years or so, attempts to resolve the puzzle have become a major research impetus in Finance and Economics. Several generalizations of key features of the Mehra and Prescott (1985) model have been proposed to better reconcile observations with theory. These include alternative assumptions on preferences, 23 modified probability distributions to admit rare but disastrous events, 24 survival bias, 25 incomplete markets, 26 and market imperfections. 27 They also include attempts at modeling limited participation of consumers in the stock market, 28 problems of temporal aggregation 29 and behavioral explanations. 30 However, none have fully resolved the anomalies. We examine some of the research efforts to resolve the puzzle 31 below and in Section 4.
23 For example, Abel (1990), Bansal and Yaron (2000), Benartzi and Thaler (1995), Boldrin, Christiano and Fisher (2001), Campbell and Cochrane (1999), Constantinides (1990), Epstein and Zin (1991) and Ferson and Constantinides (1991). 24 See, Rietz (1988) and Mehra and Prescott (1988). 25 Brown, Goetzmann and Ross (1995). 26 For example, Bewley (1982), Brav, Constantinides and Geczy (2002), Constantinides and Duffie (1996), Heaton and Lucas (1997, 2000), Krebs (2000), Lucas (1994), Mankiw (1986), Mehra and Prescott (1985), Storesletten, Telmer and Yaron (2001) and Telmer (1993). 27 For example, Aiyagari and Gertler (1991), Alvarez and Jermann (2000), Bansal and Coleman (1996), Basak and Cuoco (1998), Constantinides, Donaldson and Mehra (2002), Danthine, Donaldson and Mehra (1992), Daniel and Marshall (1997), He and Modest (1995), Heaton and Lucas (1996) and Luttmer (1996), McGrattan and Prescott (2001) and Storesletten, Telmer and Yaron (2001). 28 Attanasio, Banks and Tanner (2002), Brav, Constantinides and Geczy (2002), Brav and Geczy (1995), Mankiw and Zeldes (1991) and Vissing-Jorgensen (2002). 29 Gabaix and Laibson (2001), Heaton (1995) and Lynch (1996). 30 See Barberis, Huang and Santos (2001) and Mehra and Sah (2002). 31 The reader is also referred to the excellent surveys by Narayana Kocherlakota (1996), John Cochrane (1997), Cochrane and Hansen (1992) and by John Campbell (1999, 2001).
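The Hansen–Jagannathan bound of Equation (25) can be illustrated with the two numbers quoted in the text, a Sharpe ratio of 0.37 and $E(M_{t+1}) = 0.96$. The helper names below are hypothetical and the snippet is only a sketch of the arithmetic behind the inequality, not an estimation procedure.

```python
def hj_lower_bound(sharpe_ratio, mean_sdf):
    """Lower bound on s(M) implied by Equation (25):
    s(M) >= |Sharpe ratio| * E(M)."""
    if mean_sdf <= 0:
        raise ValueError("E(M) must be positive")
    return abs(sharpe_ratio) * mean_sdf


def satisfies_bound(sd_sdf, sharpe_ratio, mean_sdf):
    """True if a candidate standard deviation of the pricing kernel
    is at least as large as the Hansen-Jagannathan lower bound."""
    return sd_sdf >= hj_lower_bound(sharpe_ratio, mean_sdf)


# Inputs quoted in the text: Sharpe ratio 0.37 and E(M) = 0.96.
bound = hj_lower_bound(sharpe_ratio=0.37, mean_sdf=0.96)
print(f"lower bound on s(M): {bound:.3f}")

# The text reports s(M) = 0.002 for the Mehra-Prescott framework.
print(satisfies_bound(0.002, sharpe_ratio=0.37, mean_sdf=0.96))
```

Any candidate pricing kernel can be checked the same way by passing its standard deviation to `satisfies_bound`.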
3.2. Estimating the equity risk premium versus estimating the risk aversion parameter

Estimating or measuring the relative risk parameter using statistical tools is very different from estimating the equity risk premium. Mehra and Prescott (1985), as discussed above, use an extension of Lucas’ (1978) asset-pricing model to estimate how much of the historical difference in yields on treasury bills and corporate equity is a premium for bearing aggregate risk. Crucial to their analysis is their use of micro observations to restrict the value of the risk aversion parameter. They did not estimate either the risk aversion parameter or the discount rate parameters. Mehra and Prescott (1985) reject extreme risk aversion based upon observations on individual behavior. These observations include the small size of premia for jobs with uncertain income and the limited amount of insurance against idiosyncratic income risk. Another observation is that people with limited access to capital markets make investments in human capital that result in very uneven consumption over time. A sharp estimate for the magnitude of the risk aversion parameter comes from macroeconomics. The evidence is that the basic growth model, when restricted to be consistent with the growth facts, generates business cycle fluctuations if and only if this risk aversion parameter is near zero. (This corresponds to the log case in standard usage.) The point is that the risk aversion parameter comes up in a wide variety of observations at both the household and the aggregate level and is not found to be large. For all values of the risk-aversion coefficient less than ten, which is an upper bound number for this parameter, Mehra and Prescott find that a premium for bearing aggregate risk accounts for little of the historic equity premium. This finding has stood the test of time.
Another tradition is to use consumption and stock market data to estimate the relative risk aversion parameter and the discount factor parameter. This is what Grossman and Shiller report they did in their American Economic Review Papers and Proceedings article (1981, p. 226). Hansen and Singleton, in a paper in which they develop “a method for estimating nonlinear rational expectations models directly from stochastic Euler equations”, illustrate their methods by estimating the risk aversion parameter and the discount factor using stock dividend consumption prices (1982, p. 1269). What the work of Grossman and Shiller (ibid) and Hansen and Singleton (ibid) establishes is that using consumption and stock market data and assuming frictionless capital markets is a bad way to estimate the risk aversion and discount factor parameters. It is analogous to estimating the force of gravity near the earth’s surface by dropping a feather from the top of the Leaning Tower of Pisa, under the assumption that friction is zero. A tradition related to statistical estimation is to statistically test whether the stochastic Euler equation arising from the stand-in household’s intertemporal optimization holds. Both Grossman and Shiller (1981) and Hansen and Singleton (1982) reject this
relation. The fact that this relation is inconsistent with time series data from the USA is no reason to conclude that the model economy used by Mehra and Prescott to estimate how much of the historical equity premium is a premium for bearing aggregate risk is not a good one for that purpose. Returning to the analogy from Physics, it would be silly to reject Newtonian mechanics as a useful tool for drawing scientific inference because the distance traveled by the feather did not satisfy $\tfrac{1}{2} g t^2$.

3.3. Alternative preference structures

3.3.1. Modifying the conventional time- and state-separable utility function

The analysis above shows that the isoelastic preferences used in Mehra and Prescott (1985) can only be made consistent with the observed equity premium if the coefficient of relative risk aversion is implausibly large. One restriction imposed by this class of preferences is that the coefficient of risk aversion is rigidly linked to the elasticity of intertemporal substitution. One is the reciprocal of the other. What this implies is that if an individual is averse to variation of consumption across different states at a particular point of time then he will be averse to consumption variation over time. There is no a priori reason that this must be so. Since, on average, consumption is growing over time, the agents in the Mehra and Prescott (1985) setup have little incentive to save. The demand for bonds is low and as a consequence the risk-free rate is counterfactually high. Epstein and Zin (1991) have presented a class of preferences that they term “Generalized Expected Utility” (GEU) which allows independent parameterization for the coefficient of risk aversion and the elasticity of intertemporal substitution. In this class of preferences utility is recursively defined by

$$U_t = \left[ (1 - b)\, c_t^{ø} + b \left\{ E_t\left( \tilde{U}_{t+1}^{a} \right) \right\}^{ø/a} \right]^{1/ø}, \tag{26}$$

where 1 − a is the coefficient of relative risk aversion and $s = \frac{1}{1-ø}$ the elasticity of intertemporal substitution. The usual isoelastic preferences follow as a special case when ø = a. In the Epstein and Zin model, agents’ wealth W evolves as $W_{t+1} = (W_t - c_t)(1 + R_{w,t+1})$ where $R_{w,t+1}$ is the return on all invested wealth and is potentially unobservable. To examine the asset-pricing implications of this modification we examine the pricing kernel 32

$$k_{t+1} = b^{a/ø} \left( \frac{c_{t+1}}{c_t} \right)^{\frac{a(ø-1)}{ø}} \left( 1 + R_{w,t+1} \right)^{\frac{a-ø}{ø}}. \tag{27}$$

Thus, the price $p_t$ of an asset with payoff $y_{t+1}$ at time t + 1 is

$$p_t = E_t(k_{t+1}\, y_{t+1}). \tag{28}$$
In this framework the asset is priced both by its covariance with the growth rate of consumption (the first term in Equation 27) and with the return on the wealth portfolio. 32
Epstein and Zin (1991) use dynamic programming to calculate this. See their Equations 8–13. Although the final result is correct, there appear to be errors in the intermediate steps.
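Two special cases of the pricing kernel in Equation (27) may help fix ideas. The lines below only substitute parameter restrictions into (27), using the same symbols as the text; they are a sketch of the algebra and not a restatement of the derivation in Epstein and Zin (1991).

```latex
\begin{align*}
% Case 1: ø = a, so a/ø = 1, a(ø-1)/ø = a-1 and (a-ø)/ø = 0.
ø = a: \quad
  & k_{t+1} = b \left( \frac{c_{t+1}}{c_t} \right)^{a-1}, \\
% Case 2: a/ø -> 0, so b^{a/ø} -> 1, (a/ø)(ø-1) -> 0 and (a/ø) - 1 -> -1.
a/ø \to 0: \quad
  & k_{t+1} \to \frac{1}{1 + R_{w,t+1}} .
\end{align*}
```

In the first case the kernel depends only on consumption growth, with exponent equal to minus the coefficient of relative risk aversion 1 − a. In the second case it depends only on the return on the wealth portfolio.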
This captures the pricing features of both the standard consumption CAPM and the traditional static CAPM. To see this, note that when a = ø, we get the consumption CAPM and with logarithmic preferences (a/ø = 0), the static CAPM. Another feature of this class of models is that a high coefficient of risk aversion, 1 − a, does not necessarily imply that agents will want to smooth consumption over time. However, the main difficulty in testing this alternative preference structure stems from the fact that the counterparts of Equations (3) and (5) using GEU depend on variables that are unobservable, and this makes calibration tricky. One needs to make specific assumptions on the consumption process to obtain first-order conditions in terms of observables. Epstein and Zin (1991) use the “market portfolio” as a proxy for the wealth portfolio and claim that their framework offers a solution to the equity premium puzzle. We feel that this proxy overstates the correlation between asset returns and the wealth portfolio and hence their claim. This modification has the potential to resolve the risk-free rate puzzle. We illustrate this below. Under the log-normality assumptions from Section 3.1, and using the market portfolio as a stand-in for the wealth portfolio we have

$$\ln R_f = -\ln b + \frac{m_x}{s} - \frac{a/ø}{2 s^2}\, s_x^2 + \frac{(a/ø) - 1}{2}\, s_m^2. \tag{29}$$
Here $s_m^2$ is the variance of the return on the “market portfolio” of all invested wealth. Since 1 − a need not equal 1/s, we can have a large a without making s small and hence obtain a reasonable risk-free rate if one is prepared to assume a large s. The problem with this is that there is independent evidence that the elasticity of intertemporal substitution is small [Campbell (2001)], hence this generality is not very useful when the model is accurately calibrated.

3.3.2. Habit formation

A second approach to modifying preferences was initiated by Constantinides (1990) by incorporating habit formation. This formulation assumes that utility is affected not only by current consumption but also by past consumption. It captures a fundamental feature of human behavior that repeated exposure to a stimulus diminishes the response to it. The literature distinguishes between two types of habit, “internal” and “external” and two modeling perspectives, “difference” and “ratio”. We illustrate these below. Internal habit formation captures the notion that utility is a decreasing function of one’s own past consumption and marginal utility is an increasing function of one’s own past consumption. Models with external habit emphasize that the operative benchmark is not one’s own past consumption but the consumption relative to other agents in the economy. Constantinides (1990) considers a model with internal habit where utility is defined over the difference between current consumption and lagged past consumption. Although the Constantinides (1990) model is in continuous time with a general lag
structure, we can illustrate the intuition behind this class of models incorporating “habit” by considering preferences with a one period lag

$$U(c) = E_t \sum_{s=0}^{\infty} b^s\, \frac{(c_{t+s} - l\, c_{t+s-1})^{1-a}}{1-a}, \qquad l > 0. \tag{30}$$

If l = 1 and the subsistence level is fixed, the period utility function specializes to the form

$$u(c) = \frac{(c - x)^{1-a}}{1-a},$$

where x is the fixed subsistence level. 33 The implied local coefficient of relative risk aversion is

$$-\frac{c\, u''}{u'} = \frac{a}{1 - x/c}. \tag{31}$$
If x/c = 0.8 then the effective risk aversion is 5a! This preference ordering makes the agent extremely averse to consumption risk even when the risk aversion is small. For small changes in consumption, changes in marginal utility can be large. Thus, while this approach cannot resolve the equity premium puzzle without invoking extreme aversion to consumption risk, it can address the risk-free rate puzzle since the induced aversion to consumption risk increases the demand for bonds, thereby reducing the risk-free rate. Furthermore, if the growth rate of consumption is assumed to be iid, an implication of this model is that the risk-free rate will vary considerably (and counterfactually) over time. Constantinides (1990) gets around this problem since the growth rate in his model is not iid. 34 An alternate approach to circumvent this problem has been expounded by Campbell and Cochrane (1999). The model incorporates the possibility of recession as a state variable so that risk aversion varies in a highly nonlinear manner. 35 The risk aversion of investors rises dramatically when the chances of a recession become larger and thus the model can generate a high equity premium. Since risk aversion increases precisely when consumption is low, it generates a precautionary demand for bonds that helps lower the risk-free rate. This model is consistent with both consumption and asset market data. However, it is an open question whether investors actually have the huge time varying counter-cyclical variations in risk aversion postulated in the model.
33 See also the discussion in Weil (1989).
34 In fact, a number of studies suggest that there is a small serial correlation in the growth rate.
35 If we linearize the “surplus consumption ratio” in the Campbell–Cochrane (1999) model, we get the same variation in the risk-free rate as in the standard habit model. The nonlinear “surplus consumption ratio” is essential to reducing this variation.
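Equation (31) is easy to check numerically. The sketch below evaluates the closed form a/(1 − x/c) and compares it with a finite-difference estimate of −c u''(c)/u'(c) for the subsistence utility function given above. The function names, the step size and the test values (a = 2, c = 1, x = 0.8) are assumptions made for this illustration.

```python
def effective_risk_aversion(a, habit_ratio):
    """Equation (31): a / (1 - x/c), where habit_ratio = x/c lies in [0, 1)."""
    if not 0.0 <= habit_ratio < 1.0:
        raise ValueError("x/c must lie in [0, 1)")
    return a / (1.0 - habit_ratio)


def numerical_risk_aversion(a, c, x, h=1e-4):
    """Estimate -c u''(c) / u'(c) by central finite differences for
    u(c) = (c - x)^(1 - a) / (1 - a), with a != 1 and c - h > x."""
    def u(z):
        return (z - x) ** (1.0 - a) / (1.0 - a)

    u1 = (u(c + h) - u(c - h)) / (2.0 * h)            # first derivative
    u2 = (u(c + h) - 2.0 * u(c) + u(c - h)) / h ** 2  # second derivative
    return -c * u2 / u1


# With x/c = 0.8 the effective risk aversion is five times a.
print(effective_risk_aversion(a=2.0, habit_ratio=0.8))
print(numerical_risk_aversion(a=2.0, c=1.0, x=0.8))
```

The two printed values should agree closely, which illustrates how a modest curvature parameter translates into a large local aversion to consumption risk when consumption is close to the subsistence level.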
Another modification of the Constantinides (1990) approach is to define utility of consumption relative to average per capita consumption. This is an external habit model where preferences are defined over the ratio of consumption to lagged 36 aggregate consumption. Abel (1990) terms his model “Catching up with the Joneses”. The idea is that one’s utility depends not on the absolute level of consumption, but on how one is doing relative to others. The effect is that, once again, an individual can become extremely sensitive and averse to consumption variation. Equity may have a negative rate of return and this can result in personal consumption falling relative to others. Equity thus becomes an undesirable asset relative to bonds. Since average per capita consumption is rising over time, the induced demand for bonds with this modification helps in mitigating the risk-free rate puzzle. Abel (1990) defines utility as the ratio of consumption relative to average per capita consumption rather than the difference between the two. It can be shown that this is not a trivial modification. 37 While “difference” habit models can, in principle, generate a high equity premium, ratio models generate a premium that is similar to that obtained with standard preferences. To illustrate, consider the framework in Abel (1990) specialized to the “catching up with the Joneses” case. At time t, the representative agent in the economy chooses the level of consumption $c_t$ to maximize

$$U(c) = E_t \sum_{t=0}^{\infty} b^t\, \frac{\left( c_t / C_{t-1}^{g} \right)^{1-a}}{1-a}, \qquad a > 0, \tag{32}$$
where $C_{t-1}$ is the lagged aggregate consumption. In equilibrium of course $C_t = c_t$, a fact we use in writing the counterparts of Equations (3) and (5) below.

$$1 = b E_t\left\{ R_{e,t+1}\, x_t^{g(a-1)}\, x_{t+1}^{-a} \right\}, \tag{33}$$

$$1 = b R_{f,t+1}\, E_t\left\{ x_t^{g(a-1)}\, x_{t+1}^{-a} \right\}, \tag{34}$$

where $x_{t+1} \equiv c_{t+1}/c_t$ is the growth rate of consumption. Under the assumptions made in Section 3.1 we can write

$$R_{f,t+1} = \frac{E_t\left\{ x_{t+1}^{g(a-1)} \right\}}{b E_t\left\{ x_{t+1}^{-a} \right\}}, \tag{35}$$

and

$$E_t\{R_{e,t+1}\} = E_t\left\{ x_{t+1}^{g(a-1)} \right\} \frac{E_t\{x_{t+1}\} + A\, E_t\left\{ x_{t+1}^{1+g(a-1)} \right\}}{A}. \tag{36}$$

We see that in the expression $\ln R_f = -\ln b + a m_x - \tfrac{1}{2} a^2 s_x^2 - g(1 - a)\, m_x$, the equity premium is $\ln E\{R_e\} - \ln R_f = a s_{x,z}$, which is exactly the same as what was obtained
36 Hence “Catching up with the Joneses” rather than “keeping up with the Joneses” [Abel (1990, footnote 1)].
37 See Campbell (2001) for a detailed discussion.
earlier. Hence the equity premium is unchanged! However, when $\gamma > 0$, a high $\alpha$ does not lead to the risk-free rate puzzle.

The statement, “External habit simply adds a term to the Euler Equation 60 which is known at time t, and this does not affect the premium” in Campbell (2001) appears to be inconsistent with the results in Table 1 Panel B in Abel (1990).

3.3.3. Resolution

Although the “set up” in Abel (1990) and Campbell (2001) is similar, Campbell’s result is based on the assumption that asset returns and the growth rate of consumption are jointly log-normally distributed in both the “standard time additive” case and the “Joneses” case. In Abel (1990) the return distributions are endogenously determined, and Campbell’s assumption is internally inconsistent in the context of that model. In Abel (1990), with “standard time additive” preferences, if consumption growth is log-normally distributed, gross asset returns will also be log-normal; however, this is not the case with the “Joneses” preferences. In the latter case, since $1 + R_{i,t+1} = x_t^{1-\alpha}\,(x_{t+1} + A x_{t+1}^{\alpha})/A$, log-normality of $x$ will not induce log-normality in $1 + R_{i,t+1}$.

Abel (1990) reports expressions for $E(1 + R_{i,t+1})$ and $E(1 + R_{f,t+1})$ in his Equations 17 and 18. Let $\Pi_{\mathrm{Abel}} = \ln E(1 + R_{i,t+1}) - \ln E(1 + R_{f,t+1})$. In the Abel model with $\theta = 0$ (the “standard time additive” case), if the growth rate of consumption is assumed to be log-normally distributed, $\Pi_{\mathrm{Abel}}$ can be written as:

$$\Pi_{\mathrm{Abel}} = E[\ln(1 + R_{i,t+1})] + 0.5\,\mathrm{Var}[\ln(1 + R_{i,t+1})] - E[\ln(1 + R_{f,t+1})] - 0.5\,\mathrm{Var}[\ln(1 + R_{f,t+1})], \tag{37}$$

or

$$\Pi_{\mathrm{Abel}} = \Pi_{\mathrm{Campbell}} + 0.5\,\{\mathrm{Var}[\ln(1 + R_{i,t+1})] - \mathrm{Var}[\ln(1 + R_{f,t+1})]\}, \tag{38}$$

or

$$\Pi_{\mathrm{Abel}} = \Pi_{\mathrm{Campbell}} + 0.5\,\mathrm{Var}(\ln x), \tag{39}$$

where $\Pi_{\mathrm{Campbell}} = E[\ln(1 + R_{i,t+1})] - E[\ln(1 + R_{f,t+1})]$ is the definition of the equity premium in Campbell (2001). With “standard time additive” preferences and log-normally distributed returns, the analyses in Abel and Campbell are equivalent. Indeed, a direct evaluation of $\Pi_{\mathrm{Abel}}$ from Equations 17 and 18 in Abel (1990) yields $\Pi_{\mathrm{Abel}} = \alpha\,\mathrm{Cov}(\ln x, \ln(1 + R_i))$.
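The gap between the two premium definitions under log-normality can be checked numerically. The following is a minimal sketch, not taken from Abel (1990) or Campbell (2001): it assumes the log equity return is normally distributed and the risk-free rate is constant, and all parameter values and function names are hypothetical. It computes the log of the expected gross return by direct integration and compares the two definitions.

```python
import math

def log_mean_of_lognormal(mu, sigma, n=20001, width=10.0):
    """Return ln E[exp(Y)] for Y ~ N(mu, sigma^2), using trapezoidal integration."""
    lo = mu - width * sigma
    step = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for k in range(n):
        y = lo + k * step
        density = math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(y) * density * step
    return math.log(total)

# Hypothetical inputs, chosen only for illustration.
mu_r = 0.07      # E[ln(1 + R_i)]
sigma_r = 0.036  # standard deviation of ln(1 + R_i)
log_rf = 0.01    # ln(1 + R_f), assumed constant so its variance is zero

# Premium as the difference of logs of expected gross returns.
premium_abel = log_mean_of_lognormal(mu_r, sigma_r) - log_rf
# Premium as the difference of expected log returns.
premium_campbell = mu_r - log_rf
gap = premium_abel - premium_campbell
```

Under these assumptions `gap` should agree with `0.5 * sigma_r ** 2` up to integration error, which is the variance adjustment separating the two definitions when the risk-free rate is constant.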
The premium obtained from Abel’s Equations 17 and 18 is identical to that obtained by adjusting Equation 62 in Campbell by adding $0.5\,\mathrm{Var}(\ln x)$. However, in Abel (1990) with “Joneses” preferences, if the growth rate of consumption is log-normally distributed, asset returns will not be log-normal; hence the analysis in Campbell (2001) after Equation 60 will not apply. In Abel (1990), as preferences change, return distributions will change; hence if the counterpart of Equation 16 (in Campbell) represents the equity premium in the “standard time additive” framework, then Equation 62 will not be the relevant expression for the premium in the “Joneses” case. Counterparts of Equations 16 and 62 in Campbell (2001) will not both hold simultaneously in Abel (1990).

To summarize, models with habit formation and relative or subsistence consumption have had success in addressing the risk-free rate puzzle but only limited success in resolving the equity premium puzzle, since in these models effective risk aversion and prudence become implausibly large.

3.4. Idiosyncratic and uninsurable income risk

At a theoretical level, aggregate consumption is a meaningful economic construct if the market is complete, or effectively so. 38 Market completeness is implicitly incorporated into asset-pricing models in finance and neoclassical macroeconomics through the assumption of the existence of a representative household. In complete markets, heterogeneous households are able to equalize, state by state, their marginal rate of substitution. The equilibrium in a heterogeneous full-information economy is isomorphic in its pricing implications to the equilibrium in a representative-household, full-information economy, if households have von Neumann–Morgenstern preferences.

Bewley (1982), Mankiw (1986) and Mehra and Prescott (1985) suggest the potential of enriching the asset-pricing implications of the representative-household paradigm by relaxing the assumption of complete markets. 39

Current financial paradigms postulate that idiosyncratic income shocks must exhibit three properties in order to explain the returns on financial assets: uninsurability, persistence, and heteroscedasticity with counter-cyclical conditional variance. In infinite horizon models, agents faced with uninsurable income shocks will dynamically self-insure, effectively smoothing consumption. Hence the difference in the equity premium in incomplete markets and complete markets is small. 40
38 This section draws on Constantinides (2002).
39 There is an extensive literature on the hypothesis of complete consumption insurance. See Altonji, Hayashi and Kotlikoff (1992), Attanasio and Davis (1997), Cochrane (1991) and Mace (1991).
40 Lucas (1994) and Telmer (1993) calibrate economies in which consumers face uninsurable income risk and borrowing or short-selling constraints. They conclude that consumers come close to the complete-markets rule of complete risk sharing, although consumers are allowed to trade in just one security in a frictionless market. Aiyagari and Gertler (1991) and Heaton and Lucas (1996) add transaction costs and/or borrowing costs and reach a similar negative conclusion, provided that the supply of bonds is not restricted to an unrealistically low level.

Constantinides and Duffie (1996) propose a model incorporating heterogeneity that captures the notion that consumers are subject to idiosyncratic income shocks that cannot be insured away. For instance, consumers face the risk of job loss, or other major personal disasters that cannot be hedged away or insured against. 41 Equities and related pro-cyclical investments exhibit the undesirable feature that they drop in value when the probability of job loss increases, as, for instance, in recessions. In economic downturns, consumers thus need an extra incentive to hold equities and other similar investment instruments; the equity premium can then be rationalized as the added inducement needed to make equities palatable to investors.

41 Storesletten, Telmer and Yaron (2001) provide empirical evidence from the Panel Study of Income Dynamics (PSID) that idiosyncratic income shocks are persistent and have counter-cyclical conditional variance. Storesletten, Telmer and Yaron (2000) corroborate this evidence by studying household consumption over the life cycle.

The model provides an explanation of the counter-cyclical behavior of the equity risk premium: the risk premium is highest in a recession since equities are a poor hedge against the potential loss of employment. It also provides an explanation of the unconditional equity premium puzzle: even though per capita consumption growth is poorly correlated with stock returns, investors require a hefty premium to hold stocks over short-term bonds because stocks perform poorly in recessions, when an investor is more likely to be laid off.

Since the proposition demonstrates the existence of equilibrium in frictionless markets, it implies that the Euler equations of household (but not necessarily of per capita) consumption must hold. Furthermore, since the given price processes have embedded in them whatever predictability of returns by the dividend-price ratios and other instruments that the researcher cares to ascribe to them, the equilibrium price processes have this predictability built into them by construction.

Constantinides and Duffie (1996) point out that periods with frequent and large uninsurable idiosyncratic income shocks are associated with both a dispersed cross-sectional distribution of household consumption growth and low stock returns. Brav, Constantinides and Geczy (2002) provide empirical evidence of the impact of uninsurable idiosyncratic income risk on pricing. They estimate the relative risk aversion (RRA) coefficient and test the set of Euler equations of household consumption on the premium of the value-weighted and the equally weighted market portfolio return over the risk-free rate, and on the premium of value stocks over growth stocks. 42 They do not reject the Euler equations of household consumption with an economically plausible RRA coefficient of between two and four, although they reject the Euler equations of per capita consumption with any value of the RRA coefficient.

42 In related studies, Jacobs (1999) studies the PSID database on food consumption; Cogley (1999) and Vissing-Jorgensen (2002) study the CEX database on broad measures of consumption; Jacobs and Wang (2001) study the CEX database by constructing synthetic cohorts; and Ait-Sahalia, Parker and Yogo (2001) measure household consumption with the purchases of certain luxury goods.
Krebs (2000) extends the Constantinides and Duffie (1996) model by introducing rare idiosyncratic income shocks that drive consumption close to zero. In his model, the conditional variance and skewness of the idiosyncratic income shocks are nearly constant over time. He provides a theoretical justification of the difficulty of empirically assessing the contribution of these catastrophic shocks in the low-order cross-sectional moments.
3.5. Models incorporating a disaster state and survivorship bias

Rietz (1988) has proposed a solution to the puzzle that incorporates a very small probability of a very large drop in consumption. He finds that in such a scenario the risk-free rate is much lower than the return on an equity security. This model requires a 1-in-100 chance of a 25% decline in consumption to reconcile the equity premium with a risk aversion parameter of 10. Such a scenario has not been observed in the USA in the years for which we have economic data. Nevertheless, one can evaluate the implications of the model.

One implication is that the real interest rate and the probability of the occurrence of the extreme event move inversely. For example, the perceived probability of a recurrence of a depression was probably very high just after World War II and subsequently declined over time. If real interest rates rose significantly as the war years receded, that evidence would support the Rietz hypothesis. Similarly, if the low probability event precipitating the large decline in consumption were a nuclear war, the perceived probability of such an event has surely varied over the last 100 years. It must have been low before 1945, the first and only year the atom bomb was used. And it must have been higher before the Cuban Missile Crisis than after it. If real interest rates had moved as predicted, that would support Rietz’s disaster scenario. But they did not.

Another attempt at resolving the puzzle, proposed by Brown et al. (1995), focuses on survival bias. The central thesis here is that the ex-post measured returns reflect the premium, in the USA, on a stock market that has successfully weathered the vicissitudes of fluctuating financial fortunes. Many other exchanges were unsuccessful and hence the ex-ante equity premium was low.
Since it was not known a priori which exchanges would survive, for this explanation to work, stock and bond markets must be differentially impacted by a financial crisis. Governments have expropriated much of the real value of nominal debt by the mechanism of unanticipated inflation. Five historical instances come readily to mind: During the German hyperinflation, holders of bonds denominated in Reich marks lost virtually all value invested in those assets. During the Poincaré administration in France in the 1920s, bond-holders lost nearly 90% of the value invested in nominal debt. And in the 1980s, Mexican holders of dollar-denominated debt lost a sizable fraction of its value when the Mexican government, in a period of rapid inflation, converted the debt to pesos and limited the rate at which these funds could be withdrawn. Czarist bonds in Russia and Chinese
debt holdings (subsequent to the fall of the Nationalists) suffered a similar fate under communist regimes. The above examples demonstrate that in times of financial crisis, bonds are as likely to lose value as stocks. Although a survival bias may impact on the levels of both the return on equity and debt, there is no evidence to support the assertion that these crises impact differentially on the returns to stocks and bonds; hence the equity premium is not impacted. In every instance where trading equity has been suspended, due to political upheavals, etc., governments have either reneged on their debt obligations or expropriated much of the real value of nominal debt through the mechanism of unanticipated inflation. The difficulty that, collectively, several model classes have had in explaining the equity premium as a compensation for bearing risk leads us to conclude that perhaps it is not a “risk premium” but rather due to other factors. We consider these in the next section.
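The disaster-state mechanism discussed earlier in this section can be illustrated with a deliberately simplified calculation. The sketch below is not Rietz's (1988) model: it assumes i.i.d. consumption growth, CRRA preferences, and equity as a claim to aggregate consumption, so that the gross risk-free rate is $1/(\beta E[x^{-\alpha}])$ and the expected gross equity return is $E[x]/(\beta E[x^{1-\alpha}])$. The function names, the two "normal" growth states, and the discount factor are all hypothetical choices made for illustration; only the 1-in-100 probability, the 25% decline, and the risk aversion of 10 come from the text.

```python
def moment(states, power):
    """E[x**power] for a discrete distribution given as (probability, growth) pairs."""
    return sum(p * x ** power for p, x in states)

def asset_returns(states, alpha, beta):
    """Gross risk-free rate and expected gross equity return under i.i.d. growth.

    The price-dividend ratio of the consumption claim is finite only if
    beta * E[x**(1 - alpha)] < 1, so that condition is checked first.
    """
    growth_discount = beta * moment(states, 1.0 - alpha)
    if growth_discount >= 1.0:
        raise ValueError("equity price is not finite for these parameters")
    risk_free = 1.0 / (beta * moment(states, -alpha))
    expected_equity = moment(states, 1.0) / growth_discount
    return risk_free, expected_equity

def growth_states(p_disaster, disaster_growth=0.75, up=1.054, down=0.982):
    """Two equally likely normal states plus a rare disaster state (assumed values)."""
    normal = (1.0 - p_disaster) / 2.0
    return [(normal, up), (normal, down), (p_disaster, disaster_growth)]

ALPHA, BETA = 10.0, 0.95  # BETA is an assumption chosen so that prices stay finite

rf_base, re_base = asset_returns(growth_states(0.0), ALPHA, BETA)
rf_disaster, re_disaster = asset_returns(growth_states(0.01), ALPHA, BETA)

premium_base = re_base - rf_base
premium_disaster = re_disaster - rf_disaster
```

In this toy setting, raising the disaster probability from zero to 0.01 lowers the risk-free rate and raises the premium, which is the inverse relation between the real interest rate and the disaster probability that the text proposes as a test of the hypothesis.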
4. Is the equity premium due to borrowing constraints, a liquidity premium or taxes?

4.1. Borrowing constraints

In models with borrowing constraints and transaction costs, the effect is to force investors to hold an inventory of bonds (precautionary demand) to smooth consumption. Hence in infinite horizon models with borrowing constraints, agents come close to equalizing their marginal rates of substitution with little effect on the equity premium. 43 Some recent attempts to resolve the puzzle incorporating both borrowing constraints and consumer heterogeneity appear promising. One approach, which departs from the representative agent model, has been proposed in Constantinides, Donaldson and Mehra (2002).

43 This is true unless the supply of bonds is unrealistically low. See Aiyagari and Gertler (1991).

In order to systematically illustrate these ideas, the authors construct an overlapping-generations (OLG) exchange economy in which consumers live for three periods. In the first period, a period of human capital acquisition, the consumer receives a relatively low endowment income. In the second period, the consumer is employed and receives wage income subject to large uncertainty. In the third period, the consumer retires and consumes the assets accumulated in the second period.

The authors explore the implications of a borrowing constraint by deriving and contrasting the stationary equilibria in two versions of the economy. In the borrowing-constrained version, the young are prohibited from borrowing and from selling equity short. The borrowing-unconstrained economy differs from the borrowing-constrained one only in that the borrowing constraint and the short-sale constraint are absent.
An unconstrained representative agent’s maximization problem is formulated as follows. An agent born in period t solves

$$\max_{\{z^e_{t,i},\, z^b_{t,i}\}} E\left(\sum_{i=0}^{2} \beta^i\, U(c_{t,i})\right), \tag{40}$$

s.t.

$$c_{t,0} + q^e_t z^e_{t,0} + q^b_t z^b_{t,0} \le w^0, \tag{41}$$

$$c_{t,1} + q^e_{t+1} z^e_{t,1} + q^b_{t+1} z^b_{t,1} \le (q^e_{t+1} + d_{t+1})\, z^e_{t,0} + (q^b_{t+1} + b)\, z^b_{t,0} + w^1_{t+1},$$
$$c_{t,2} \le (q^e_{t+2} + d_{t+2})\, z^e_{t,1} + (q^b_{t+2} + b)\, z^b_{t,1}, \tag{42}$$

where $c_{t,j}$ is the consumption in period $t + j$ ($j = 0, 1, 2$) of a consumer born in period $t$. There are two types of securities in the model: bonds and equity, with ex-coupon and ex-dividend prices $q^b_t$ and $q^e_t$, respectively. Bonds are a claim to a coupon payment $b$ every period, while the equity is a claim to the dividend stream $\{d_t\}$. The consumer born in period $t$ receives deterministic wage income $w^0 > 0$ in period $t$, when young; stochastic wage income $w^1_{t+1} > 0$ in period $t + 1$, when middle-aged; and zero wage income in period $t + 2$, when old. The consumer purchases $z^e_{t,0}$ shares of stock and $z^b_{t,0}$ bonds when young. The consumer adjusts these holdings to $z^e_{t,1}$ and $z^b_{t,1}$, respectively, when middle-aged. The consumer liquidates his/her entire portfolio when old. Thus, $z^e_{t,2} = 0$ and $z^b_{t,2} = 0$.

When considering the borrowing-constrained equilibrium, the following additional constraints are imposed: $z^e_{t,j} \ge 0$ and $z^b_{t,j} \ge 0$.

The model introduces two forms of market incompleteness. First, consumers of one generation are prohibited from trading claims against their future wage income with consumers of another generation. 44 Second, consumers of one generation are prohibited from trading bonds and equity with consumers of an unborn generation. As discussed earlier, absent a complete set of contingent claims, consumer heterogeneity in the form of uninsurable, persistent and heteroscedastic idiosyncratic income shocks, with counter-cyclical conditional variance, can potentially resolve empirical difficulties encountered by representative-consumer models. 45

44 Being homogeneous within their generation, consumers have no incentive to trade claims with consumers of their own generation.
45 See Mankiw (1986) and Constantinides and Duffie (1996).

The novelty of the paper lies in incorporating a life-cycle feature to study asset pricing. The idea is appealingly simple. As discussed earlier, the attractiveness of equity as an asset depends on the correlation between consumption and equity income. If equity pays off in states of high marginal utility of consumption, it will command a higher price (and consequently a lower rate of return) than if its payoff is in states where marginal utility is low. Since the marginal utility of consumption varies inversely with consumption, equity will command a high rate of return if it pays off in states when consumption is high, and vice versa. 46

A key insight of their paper is that as the correlation of equity income with consumption changes over the life cycle of an individual, so does the attractiveness of equity as an asset. Consumption can be decomposed into the sum of wages and equity income. A young person looking forward at his life has uncertain future wage and equity income; furthermore, the correlation of equity income with consumption will not be particularly high, as long as stock and wage income are not highly correlated. This is empirically the case, as documented by Davis and Willen (2000). Equity will thus be a hedge against fluctuations in wages and a “desirable” asset to hold as far as the young are concerned.

The same asset (equity) has a very different characteristic for the middle-aged. Their wage uncertainty has largely been resolved. Their future retirement wage income is either zero or deterministic, and the innovations (fluctuations) in their consumption occur from fluctuations in equity income. At this stage of the life cycle, equity income is highly correlated with consumption. Consumption is high when equity income is high, and equity is no longer a hedge against fluctuations in consumption; hence, for this group, it requires a higher rate of return.

The characteristics of equity as an asset therefore change, depending on who the predominant holder of the equity is. Life cycle considerations thus become crucial for asset pricing. If equity is a “desirable” asset for the marginal investor in the economy, then the observed equity premium will be low, relative to an economy where the marginal investor finds it unattractive to hold equity. The deus ex machina is the stage in the life cycle of the marginal investor.
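The life-cycle argument above turns on how strongly equity income co-moves with consumption when consumption is the sum of wage income and equity income. The following minimal sketch makes that dependence explicit. It is only an illustration: the function name and all standard deviations are hypothetical, and the only structure taken from the text is the decomposition of consumption into wages plus equity income.

```python
import math

def corr_equity_income_with_consumption(sd_wage, sd_equity, corr_wage_equity=0.0):
    """Correlation between equity income e and consumption c = w + e.

    Uses Cov(c, e) = Cov(w, e) + Var(e) and
    Var(c) = Var(w) + Var(e) + 2 Cov(w, e).
    """
    cov_w_e = corr_wage_equity * sd_wage * sd_equity
    cov_c_e = cov_w_e + sd_equity ** 2
    var_c = sd_wage ** 2 + sd_equity ** 2 + 2.0 * cov_w_e
    return cov_c_e / (math.sqrt(var_c) * sd_equity)

# Young: future wage income is highly uncertain and (by assumption) uncorrelated with equity income.
young = corr_equity_income_with_consumption(sd_wage=3.0, sd_equity=1.0)

# Middle-aged: remaining wage income is deterministic, so all consumption risk is equity risk.
middle_aged = corr_equity_income_with_consumption(sd_wage=0.0, sd_equity=1.0)
```

With no remaining wage uncertainty the correlation is exactly one, whereas large, weakly correlated wage risk pulls it well below one. This is the sense in which equity is a better hedge for the young than for the middle-aged.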
The authors argue that the young, who should be holding equity in an economy without frictions and with complete contracting, are effectively shut out of this market because of borrowing constraints. The young are characterized by low wages; ideally they would like to smooth lifetime consumption by borrowing against future wage income (consuming a part of the loan and investing the rest in higher return equity). However, they are prevented from doing so because human capital alone does not collateralize major loans in modern economies for reasons of moral hazard and adverse selection.

In the presence of borrowing constraints, equity is thus exclusively priced by the middle-aged investors, since the young are effectively excluded from the equity markets, and we observe a high equity premium. If the borrowing constraint is relaxed, the young will borrow to purchase equity, thereby raising the bond yield. The increase in the bond yield induces the middle-aged to shift their portfolio holdings from equity to bonds. The increase in demand for equity by the young and the decrease in the demand for equity by the middle-aged work in opposite directions. On balance, the effect is to increase both the equity and the bond return while simultaneously shrinking the equity premium. Furthermore, the relaxation of the borrowing constraint reduces the net demand for bonds and the risk-free rate puzzle re-emerges.

46 This is precisely the reason, as explained earlier, why high-beta stocks in the simple CAPM framework have a high rate of return. In that model, the return on the market is a proxy for consumption. High-beta stocks pay off when the market return is high, i.e., when marginal utility is low; hence their price is (relatively) low and their rate of return high.

4.2. Liquidity premium

Bansal and Coleman (1996) develop a monetary model that offers an explanation of the equity premium. In their model, some assets other than money play a key role in facilitating transactions. This affects the rate of return they offer in equilibrium. Considering the role of a variety of assets in facilitating transactions, they argue that, on the margin, the transaction service return of money relative to interest-bearing checking accounts should be the interest rate paid on these accounts. They estimate this to be 6%, based on the rate offered on NOW accounts for the period they analyze. Since this is a substantial number, they suggest that other money-like assets may also implicitly include a transaction service component in their return. Insofar as T-bills and equity have a different service component built into their returns, this may offer an explanation for the observed equity premium. In fact, if this service component differential were about 5%, there would be no equity premium puzzle.

We argue that this approach can be challenged on three accounts. First, the majority of T-bills are held by institutions that do not use them as compensatory balances for checking accounts, and it is difficult to imagine their having a significant transaction service component. Second, the returns on NOW and other interest-bearing accounts have varied over time. These returns have been higher post-1980 than in earlier periods. In fact, for most of the twentieth century, checking accounts were not interest bearing. However, contrary to the implications of this model, the equity premium has not diminished in the post-1980 period, when presumably the implied transaction service component was the greatest. Third, this model implies that there should be a significant yield differential between T-bills and long-term government bonds, which presumably do not have a significant transaction service component. However, this has not been the case.

4.3. Taxes and regulation

McGrattan and Prescott (2000, 2001) take the position that factors other than a premium for bearing non-diversifiable risk account for the large difference in the return on corporate equity and the after-tax real interest rate in the 1960–2000 period. They find that changes in the tax and legal-regulatory systems in the USA that permitted retirement accounts and pension funds to hold corporate equity, together with reductions in marginal income tax rates, account for the high return on corporate equity in this period.
Subsequent to the writing of our equity premium paper [Mehra and Prescott (1985)], real business-cycle theory was developed by Kydland and Prescott (1982). Real business-cycle theory uses the stochastic growth model augmented to include the labor–leisure decision. One finding of the real business-cycle literature is that the real after-tax interest rate varies in the range from 4 to 4.5%. Another finding is that the predicted after-tax return on corporate equity is essentially equal to this real interest rate. These results are closely related to and consistent with what Mehra and Prescott (1985) found in their “Equity Premium Puzzle” paper. The key difference is the empirical counterpart of the real interest rate. Mehra and Prescott (1985) use the highly liquid T-bill rate, corrected for expected inflation. Business-cycle theorists [see McGrattan and Prescott (2000, 2001), who incorporate the details of the tax system] use the intertemporal marginal rate of substitution for consumption to determine this interest rate. Why was the average real return on T-bills significantly below the real interest rate as found in the business-cycle literature? Why was the average real return on corporate equity significantly above this real interest rate in the 1960–2000 period? The low realized real return on T-bills in this period probably has to do with the liquidity services that T-bills provide. The total T-bill real return, including liquidity services, could very well have been in the range from 4 to 4.5%. A more interesting question is, why was the return on corporate equity so high in the 1960–2000 period? McGrattan and Prescott (2000) answer this question in the process of estimating the fundamental value of the stock market in 1962 and 2000. 
They chose these two points in time because, relative to GDP, after-tax corporate earnings, net corporate debt, and corporate tangible capital stock were approximately the same and the tax system had been stable for a number of years. Further, at neither point in time was there any fear of full or partial expropriation of capital. What differed was that the value of the stock market relative to GDP in 2000 was nearly twice as large as in 1962.

What changed between 1962 and 2000 were the tax and legal-regulatory systems. The marginal tax rate on corporate distributions was 43% in the 1955–1962 period and only 17% in the 1987–2000 period. This marginal tax rate on dividends does not have consequences for steady-state after-tax earnings or steady-state corporate capital, provided that tax revenues are returned lump-sum to households. This tax rate does, however, have consequences for the value of corporate equity.

The important changes in the legal-regulatory system, most of which occurred in the late 1970s and early 1980s, were that corporate equity was permitted to be held as pension fund reserves and that people could invest on a before-tax basis in individual retirement accounts that could include equity. The threat of a lawsuit is why debt assets, and not equity with higher returns, were held as pension fund reserves in 1962. At that time, little equity was held in defined-contribution retirement accounts because the total assets in these accounts were then small. Thus, in 1962 debt, but not equity, could be and was held tax free. In 2000, both could be held tax free in defined benefit and defined contribution pension funds and in individual retirement accounts.
Not surprisingly, the assets held in untaxed retirement accounts were large in 2000, being approximately 1.3 times GDP [McGrattan and Prescott (2000)].

McGrattan and Prescott (2000, 2001), in determining whether the stock market was overvalued or undervalued vis-à-vis standard growth theory, exploit the fact that the value of a set of real assets is the sum of the values of the individual assets in the set. They develop a method for estimating the value of intangible corporate capital, something that is not reported on balance sheets and, like tangible capital, adds to the value of corporations. Their method uses only national account data and the equilibrium condition that after-tax returns are equated across assets. They also incorporate the most important features of the U.S. tax system into the model they use to determine the value of corporate equity, in particular, the fact that capital gains are only taxed upon realization.

The formula they develop for the fundamental value of corporate equities $V$ is

$$V = (1 - \tau_d)\, K_T + (1 - \tau_d)(1 - \tau_c)\, K_I, \tag{43}$$

where $\tau_d$ is the tax rate on distributions; $\tau_c$ is the tax rate on corporate income; $K_T$ is the end-of-period tangible corporate capital stock; and $K_I$ is the end-of-period intangible corporate capital stock.

The reasons for the tax factors are as follows. Corporate earnings significantly exceed corporate investment. As a result, aggregate corporate distributions are large and positive. Historically, these distributions have been in the form of dividends. Therefore, the cost of a unit of tangible capital on the margin is only $1 - \tau_d$ units of forgone consumption. In the case of intangible capital, the consumption cost of a unit of capital is even smaller because investments in intangible capital reduce corporate tax liabilities. 47

47 In fact, formula (1) must be adjusted if economic depreciation and accounting depreciation are not equal and if there is an investment tax credit. See McGrattan and Prescott (2001).

The tricky part of the calculation is in constructing a measure of intangible capital. These investments reduce current accounting profits and they increase future economic profits. The formula for steady-state before-tax accounting profits is

$$\pi = \frac{i}{1 - \tau_c}\, K_T + i K_I - g K_I, \tag{44}$$

where $g$ is the steady-state growth rate of the economy and $i$ is the steady-state after-tax real interest rate. Note that $g K_I$ is steady-state net investment in intangible capital, which reduces accounting profits because it is expensed. Note also that all the variables in formula (44) are reported in the system of national accounts with the exception of $i$ and $K_I$. McGrattan and Prescott (2001) estimate $i$ using national income data. Their estimate of $i$ is the after-tax real return on capital in the noncorporate sector, which has as much capital as does the corporate sector.

They find that the stock market was neither overvalued nor undervalued in 1962 and 2000. The primary reason for the low valuation in 1962 relative to GDP and high valuation in 2000 relative to GDP is that $\tau_d$ was much higher in 1962 than it was in 2000. The secondary reason is that the value of foreign subsidiaries of U.S. corporations grew in the period. An increase in the size of the corporate intangible capital stock was also a contributing factor.

McGrattan and Prescott (2001) find that in the economically and politically stable 1960–2000 period, the after-tax real return on holding corporate equity was as predicted by theory if the changes in the tax and regulatory system were not anticipated. These unanticipated changes led to a large unanticipated capital gain on holding corporate equity. Evidence of the importance of these changes is that the share of corporate equity held in retirement accounts and as pension fund reserves increased from essentially zero in 1962 to slightly over 50% in 2000. This is important because it means that half of corporate dividends are now subject to zero taxation.

In periods of economic uncertainty, such as those that prevailed in the 1930–1955 period with the Great Depression, World War II, and the fear of another great depression, the survival of the capitalistic system was in doubt. In such times, low equity prices and high real returns on holding equity are not surprising. This is the Brown, Goetzmann and Ross (1995) explanation of the equity premium. By 1960, the fears of another great depression and of an abandonment of the capitalistic system in the USA had vanished, and clearly other factors gave rise to the high return on equity in the 1960–2000 period.
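The valuation formula (43) and the accounting-profit relation (44) in this section lend themselves to a small numerical sketch. The code below is only an illustration under stated assumptions: the function names are hypothetical, the capital stocks, corporate tax rate, interest rate and growth rate are made-up values, and the inversion of (44) for intangible capital is plain algebra rather than the estimation procedure of McGrattan and Prescott. Only the two distribution tax rates, 43% and 17%, are taken from the text.

```python
def equity_value(k_tangible, k_intangible, tau_d, tau_c):
    """Formula (43): V = (1 - tau_d) K_T + (1 - tau_d)(1 - tau_c) K_I."""
    return (1.0 - tau_d) * k_tangible + (1.0 - tau_d) * (1.0 - tau_c) * k_intangible

def accounting_profits(k_tangible, k_intangible, i, g, tau_c):
    """Formula (44): pi = i / (1 - tau_c) * K_T + i K_I - g K_I."""
    return i / (1.0 - tau_c) * k_tangible + i * k_intangible - g * k_intangible

def implied_intangible_capital(profits, k_tangible, i, g, tau_c):
    """Solve formula (44) for K_I; requires i != g."""
    return (profits - i / (1.0 - tau_c) * k_tangible) / (i - g)

# Hypothetical capital stocks (in units of GDP) and rates, for illustration only.
K_T, K_I = 1.0, 0.5
TAU_C = 0.35
I_RATE, G_RATE = 0.04, 0.03

# Distribution tax rates quoted in the text for 1955-1962 and 1987-2000.
v_1962 = equity_value(K_T, K_I, tau_d=0.43, tau_c=TAU_C)
v_2000 = equity_value(K_T, K_I, tau_d=0.17, tau_c=TAU_C)

# Round trip: recover K_I from the profits it implies.
profits = accounting_profits(K_T, K_I, I_RATE, G_RATE, TAU_C)
k_i_recovered = implied_intangible_capital(profits, K_T, I_RATE, G_RATE, TAU_C)
```

Because $(1 - \tau_d)$ multiplies both terms of (43), holding the capital stocks and $\tau_c$ fixed makes the ratio `v_2000 / v_1962` equal to $(1 - 0.17)/(1 - 0.43)$, about 1.46, whatever values are assumed for `K_T` and `K_I`. That isolates the contribution of the distribution tax rate from the other factors mentioned in the text.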
5. An equity premium in the future?

There is a group of academicians and professionals who claim that at present there is no equity premium, and by implication, no equity premium puzzle. To address these claims we need to differentiate between two different interpretations of the term “equity premium”. One is the ex-post or realized equity premium. This is the actual, historically observed difference between the return on the market, as captured by a stock index, and the risk-free rate, as proxied by the return on government bills. This is what we addressed in Mehra and Prescott (1985). However, there is a related concept: the ex-ante equity premium. This is a forward-looking measure of the premium, that is, the equity premium that is expected to prevail in the future or the conditional equity premium given the current state of the economy.

To elaborate, after a bull market, when stock valuations are high relative to fundamentals, the ex-ante equity premium is likely to be low. However, it is precisely in these times, when the market has risen sharply, that the ex-post, or realized, premium is high. Conversely, after a major downward correction, the ex-ante (expected) premium is likely to be high while the realized premium will be low. This should not come as a surprise since returns to stock have been documented to be mean-reverting.
Dimson, Marsh and Staunton (2000), Siegel (1998) and Fama and French (2002) document that equity returns over the past 50 years have been higher than their expected values. Fama and French argue that since the average realized return over this period exceeds the one-year ahead conditional forecast (based on the price dividend ratio) by an average of 3.11 to 4.88% per year, the expected equity premium should have declined by this amount. The key implication here is that the expected equity premium is small. If investors have overestimated the equity premium over the second half of this century, Constantinides (2002) argues that “we now have a bigger puzzle on our hands”. Why have investors systematically biased their estimates over such a long horizon? He, however, finds no statistical support for the Fama and French claim. 48 Which of these interpretations of the equity premium is relevant for an investment advisor? Clearly this depends on the planning horizon. The equity premium that we documented in our 1985 paper is for very long investment horizons. It has little to do with what the premium is going to be over the next couple of years. The ex-post equity premium is the realization of a stochastic process over a certain period and as shown earlier (see Figures 1, 2 and 3) it has varied considerably and counter-cyclically over time. Market watchers and other professionals who are interested in short-term investment planning will wish to project the conditional equity premium over their planning horizon. This is by no means a simple task. Even if the conditional equity premium given current market conditions is small, and there appears to be general consensus that it is, this in itself does not imply that it was obvious either that the historical premium was too high or that the equity premium has diminished. 
The data used to document the equity premium over the past 100 years is as good an economic data set as we have, and this is a long series when it comes to economic data. Before we dismiss the premium, not only do we need to understand the observed phenomena but we also need a plausible explanation of why the future is likely to be any different from the past. In the absence of this, and based on what we currently know, we can make the following claim: over the long horizon the equity premium is likely to be similar to what it has been in the past and the returns to investment in equity will continue to substantially dominate those to T-bills for investors with a long planning horizon.

48 “Notwithstanding the possibility that regime shifts may well have occurred during this period and that behavior deviations from rationality may have been at work, the simple present-value model matches the gross features of the equity return and the price–dividend ratio without having to resort to regime shifts or deviations from rationality” [Constantinides (2002)].

Appendix A

Suppose that returns are independently and identically distributed from period to period. Then, as the number of periods tends to infinity, the future value of the investment, computed at the arithmetic average of returns, tends to the expected value of the investment with probability 1. To see this, let $V_T = \prod_{t=1}^{T} (1 + r_t)$, where $r_t$ is the asset return in period $t$ and $V_T$ is the terminal value of one dollar at time $T$. Then

$$E(V_T) = E\left[\prod_{t=1}^{T} (1 + r_t)\right].$$

Since the $r_t$'s are assumed to be independent, we have

$$E(V_T) = \prod_{t=1}^{T} E(1 + r_t),$$

or

$$E(V_T) = \prod_{t=1}^{T} \bigl(1 + E(r_t)\bigr).$$

Let the arithmetic average be $AA = \frac{1}{T} \sum_{t=1}^{T} r_t$. Then, by the strong law of large numbers [Billingsley (1995, Theorem 22.1)],

$$E(V_T) \to \prod_{t=1}^{T} (1 + AA) \quad \text{as } T \to \infty,$$

or $E(V_T) \to (1 + AA)^T$ as the number of periods $T$ becomes large.

If asset returns, $r_t$, are identically and independently lognormally distributed, then, as the number of periods tends to infinity, the future value of an investment compounded at the continuously compounded geometric average rate tends to the median value of the investment. Let $V_T = \prod_{t=1}^{T} (1 + r_t)$, where $r_t$ is the asset return in period $t$ and $V_T$ is the terminal value of one dollar at time $T$. The geometric average is defined by

$$GA = \left[\prod_{t=1}^{T} (1 + r_t)\right]^{1/T} - 1,$$

hence $V_T = (1 + GA)^T$ and

$$\ln(1 + GA) = \frac{1}{T} \sum_{t=1}^{T} \ln(1 + r_t).$$
Let the continuously compounded geometric rate of return be $\mu_{rc}$. Then by definition $\ln(1 + GA) = \mu_{rc}$, or $1 + GA = \exp[\mu_{rc}]$, and $(1 + GA)^T = \exp[T \mu_{rc}]$. By the properties of the lognormal distribution, the median value of $V_T$ is $\exp[E(\ln V_T)]$, and by the strong law of large numbers $E(\ln V_T) = \sum_{t=1}^{T} E \ln(1 + r_t) \to T \mu_{rc}$ as $T \to \infty$ [Billingsley (1995, Theorem 22.1)]. Hence the median value of $V_T = \exp[T \mu_{rc}] = (1 + GA)^T$, as claimed above.
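The two limiting statements of Appendix A can be checked numerically. The following is only a minimal simulation sketch, not part of the original analysis: the function names, the parameter values (a mean log return of 0.05, a standard deviation of 0.2, ten periods) and the use of the population mean in place of the sample arithmetic average are all assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_terminal_values(mu, sigma, T, n_paths, seed=0):
    """Simulate V_T = prod(1 + r_t) with ln(1 + r_t) i.i.d. N(mu, sigma^2)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_paths):
        # Summing log gross returns and exponentiating gives the product.
        log_sum = sum(rng.gauss(mu, sigma) for _ in range(T))
        values.append(math.exp(log_sum))
    return values

def summarize(mu=0.05, sigma=0.2, T=10, n_paths=50_000, seed=0):
    """Compare the mean and median of V_T with the two compounding rules."""
    v = simulate_terminal_values(mu, sigma, T, n_paths, seed)
    # Population value of E(1 + r_t) for a lognormal gross return.
    expected_gross = math.exp(mu + 0.5 * sigma ** 2)
    return {
        "mean_V_T": statistics.fmean(v),
        "arithmetic_compounding": expected_gross ** T,   # (1 + E r)^T
        "median_V_T": statistics.median(v),
        "geometric_compounding": math.exp(T * mu),       # exp(T * mu_rc)
    }

if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the cross-path mean of the terminal value should lie close to compounding at the expected return, while the cross-path median should lie close to compounding at the mean log return, which is the lower of the two figures.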
Appendix B. The original analysis of the equity premium puzzle

In this Appendix we present our original analysis of the equity premium puzzle. Needless to say, it draws heavily from Mehra and Prescott (1985).

B.1. The economy, asset prices and returns

We employ a variation of Lucas’ (1978) pure exchange model. Since per capita consumption has grown over time, we assume that the growth rate of the endowment follows a Markov process. This is in contrast to the assumption in Lucas’ model that the endowment level follows a Markov process. Our assumption, which requires an extension of competitive equilibrium theory, enables us to capture the non-stationarity in the consumption series associated with the large increase in per capita consumption that occurred in the 1889–1978 period. The economy we consider was judiciously selected so that the joint process governing the growth rates in aggregate per capita consumption and asset prices would be stationary and easily determined. The economy has a single representative “stand-in” household. This unit orders its preferences over random consumption paths by

$$E_0\left\{\sum_{t=0}^{\infty} \beta^t U(c_t)\right\}, \qquad 0 < \beta < 1, \tag{B.1}$$

where $c_t$ is per capita consumption, $\beta$ is the subjective time discount factor, $E_0\{\cdot\}$ is the expectation operator conditional upon information available at time zero (which denotes the present time) and $U: R_+ \to R$ is the increasing concave utility function.
To ensure that the equilibrium return process is stationary, we further restrict the utility function to be of the constant relative risk aversion (CRRA) class

$$U(c, \alpha) = \frac{c^{1-\alpha}}{1-\alpha}, \qquad 0 < \alpha < \infty. \tag{B.2}$$

The parameter $\alpha$ measures the curvature of the utility function. When $\alpha$ is equal to one, the utility function is defined to be the logarithmic function, which is the limit of the above function as $\alpha$ approaches one (after subtracting the constant $1/(1-\alpha)$, which does not affect the preference ordering). We assume there is one productive unit which produces output $y_t$ in period $t$, which is the period dividend. There is one equity share with price $p_t$ that is competitively traded; it is a claim to the stochastic process $\{y_t\}$. The growth rate in $y_t$ is subject to a Markov chain; that is,

$$y_{t+1} = x_{t+1} y_t, \tag{B.3}$$

where $x_{t+1} \in \{\lambda_1, \ldots, \lambda_n\}$ is the growth rate, and

$$\Pr\{x_{t+1} = \lambda_j ; x_t = \lambda_i\} = \phi_{ij}. \tag{B.4}$$

It is also assumed that the Markov chain is ergodic. The $\lambda_i$ are all positive and $y_0 > 0$. The random variable $y_t$ is observed at the beginning of the period, at which time dividend payments are made. All securities are traded ex-dividend. We also assume that the matrix $A$ with elements $a_{ij} \equiv \beta \phi_{ij} \lambda_j^{1-\alpha}$ for $i, j = 1, \ldots, n$ is stable; that is, $\lim A^m$ as $m \to \infty$ is zero. In Mehra (1988) it is shown that this is necessary and sufficient for expected utility to exist if the stand-in household consumes $y_t$ every period. The paper also defines and establishes the existence of a Debreu (1954) competitive equilibrium with a price system having a dot product representation under this condition.

Next we formulate expressions for the equilibrium time $t$ price of the equity share and the risk-free bill. We follow the convention of pricing securities ex-dividend or ex-interest payments at time $t$, in terms of the time $t$ consumption good. For any security with process $\{d_s\}$ on payments, its price in period $t$ is

$$P_t = E_t\left\{\sum_{s=t+1}^{\infty} \beta^{s-t} \frac{U'(y_s)\, d_s}{U'(y_t)}\right\}, \tag{B.5}$$

as the equilibrium consumption is the process $\{y_s\}$ and the equilibrium price system has a dot product representation. The dividend payment process for the equity share in this economy is $\{y_s\}$. Consequently, using the fact that $U'(c) = c^{-\alpha}$,

$$P_t^e = P^e(x_t, y_t) = E\left\{\sum_{s=t+1}^{\infty} \beta^{s-t} \frac{y_t^{\alpha}}{y_s^{\alpha}}\, y_s \,\Big|\, x_t, y_t\right\}. \tag{B.6}$$
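The stability requirement on the matrix $A$ (its powers tend to zero) is equivalent to the spectral radius of $A$ being below one. As a rough illustration only, the sketch below builds $A$ from hypothetical inputs and estimates its spectral radius by power iteration, which is adequate when every entry of $A$ is strictly positive. The function names and the numerical inputs in the usage example are assumptions, not part of the original analysis.

```python
def build_A(beta, alpha, lam, phi):
    """a_ij = beta * phi_ij * lam_j ** (1 - alpha), as defined in the text."""
    n = len(lam)
    return [[beta * phi[i][j] * lam[j] ** (1 - alpha) for j in range(n)]
            for i in range(n)]

def spectral_radius(A, iters=1000):
    """Power iteration for a matrix with strictly positive entries."""
    n = len(A)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = max(w)               # growth factor, since max(v) == 1
        v = [x / rho for x in w]   # renormalize so that max(v) == 1 again
    return rho

def is_stable(A):
    """A^m tends to zero exactly when the spectral radius is below one."""
    return spectral_radius(A) < 1.0

if __name__ == "__main__":
    # Hypothetical two-state inputs, for illustration only.
    lam = [1.054, 0.982]
    phi = [[0.43, 0.57], [0.57, 0.43]]
    print(is_stable(build_A(0.99, 1.0, lam, phi)))
```

With logarithmic utility ($\alpha = 1$) the matrix reduces to $\beta$ times the transition matrix, so the spectral radius is simply $\beta$ and the condition holds for any $\beta < 1$.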
Variables $x_t$ and $y_t$ are sufficient relative to the entire history of shocks up to, and including, time $t$ for predicting the subsequent evolution of the economy. They thus constitute legitimate state variables for the model. Since $y_s = y_t x_{t+1} \cdots x_s$, the price of the equity security is homogeneous of degree one in $y_t$, which is the current endowment of the consumption good. As the equilibrium values of the economies being studied are time-invariant functions of the state $(x_t, y_t)$, the subscript $t$ can be dropped. This is accomplished by redefining the state to be the pair $(c, i)$, if $y_t = c$ and $x_t = \lambda_i$. With this convention, the price of the equity share from Equation (B.6) satisfies

$$p^e(c, i) = \beta \sum_{j=1}^{n} \phi_{ij} (\lambda_j c)^{-\alpha} \left[p^e(\lambda_j c, j) + \lambda_j c\right] c^{\alpha}. \tag{B.7}$$
Using the result that $p^e(c, i)$ is homogeneous of degree one in $c$, we represent this function as

$$p^e(c, i) = w_i c, \tag{B.8}$$

where $w_i$ is a constant. Making this substitution in Equation (B.7) and dividing by $c$ yields

$$w_i = \beta \sum_{j=1}^{n} \phi_{ij} \lambda_j^{(1-\alpha)} (w_j + 1) \quad \text{for } i = 1, \ldots, n. \tag{B.9}$$
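In matrix form, system (B.9) reads $w = A(w + \mathbf{1})$ with $a_{ij} = \beta \phi_{ij} \lambda_j^{1-\alpha}$. One simple way to solve it, sketched below under the assumption that $A$ is stable, is to iterate on this relation from $w = 0$; a direct linear solve would work equally well. The function name, tolerance and example inputs are illustrative choices rather than anything prescribed by the original analysis.

```python
def price_dividend_ratios(A, tol=1e-12, max_iter=100_000):
    """Iterate w <- A (w + 1); the iteration converges when A is stable."""
    n = len(A)
    w = [0.0] * n
    for _ in range(max_iter):
        new_w = [sum(A[i][j] * (w[j] + 1.0) for j in range(n))
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_w, w)) < tol:
            return new_w
        w = new_w
    raise RuntimeError("no convergence; A may not be stable")

if __name__ == "__main__":
    # Hypothetical example: log utility (alpha = 1), so A = beta * phi.
    beta = 0.95
    phi = [[0.43, 0.57], [0.57, 0.43]]
    A = [[beta * phi[i][j] for j in range(2)] for i in range(2)]
    print(price_dividend_ratios(A))
```

In the logarithmic case used in the example, each $w_i$ equals $\beta/(1-\beta)$ regardless of the state, which gives a convenient check on the iteration.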
This is a system of $n$ linear equations in $n$ unknowns. The assumption that guaranteed existence of equilibrium guarantees the existence of a unique positive solution to this system. The period return if the current state is $(c, i)$ and next period state $(\lambda_j c, j)$ is

$$r_{ij}^e = \frac{p^e(\lambda_j c, j) + \lambda_j c - p^e(c, i)}{p^e(c, i)} = \frac{\lambda_j (w_j + 1)}{w_i} - 1. \tag{B.10}$$
The equity’s expected period return if the current state is $i$ is

$$R_i^e = \sum_{j=1}^{n} \phi_{ij} r_{ij}^e. \tag{B.11}$$
Capital letters are used to denote expected return. With the subscript i, it is the expected return conditional upon the current state being (c, i). Without this subscript it is the expected return with respect to the stationary distribution. The superscript indicates the type of security.
The other security considered is the one-period real bill or riskless asset, which pays one unit of the consumption good next period with certainty. From Equation (B.6),

$$p_i^f = p^f(c, i) = \beta \sum_{j=1}^{n} \phi_{ij} \frac{U'(\lambda_j c)}{U'(c)} = \beta \sum_{j=1}^{n} \phi_{ij} \lambda_j^{-\alpha}. \tag{B.12}$$
The certain return on this riskless security is

$$R_i^f = \frac{1}{p_i^f} - 1, \tag{B.13}$$

when the current state is $(c, i)$. As mentioned earlier, the statistics that are probably most robust to the modeling specification are the means over time. Let $\pi \in R^n$ be the vector of stationary probabilities on $i$. This exists because the chain on $i$ has been assumed to be ergodic. The vector $\pi$ is the solution to the system of equations

$$\pi = \phi^T \pi,$$

with

$$\sum_{i=1}^{n} \pi_i = 1 \quad \text{and} \quad \phi^T = \{\phi_{ji}\}.$$
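The stationary vector defined by $\pi = \phi^T \pi$ can be approximated by applying $\phi^T$ repeatedly to any starting probability vector. The sketch below does this for a small transition matrix; it assumes the chain is aperiodic as well as ergodic so that the iteration settles down, and the function name and example matrices are hypothetical.

```python
def stationary_distribution(phi, tol=1e-14, max_iter=100_000):
    """Approximate the solution of pi = phi^T pi with sum(pi) = 1."""
    n = len(phi)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # (phi^T pi)_j = sum_i phi[i][j] * pi[i]
        new_pi = [sum(phi[i][j] * pi[i] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("no convergence; the chain may be periodic")

if __name__ == "__main__":
    # Hypothetical asymmetric chain and a symmetric two-state chain.
    print(stationary_distribution([[0.9, 0.1], [0.5, 0.5]]))
    print(stationary_distribution([[0.43, 0.57], [0.57, 0.43]]))
```

For a symmetric two-state chain the two states are equally likely in the long run, so the weights used to average state-contingent returns are one half each.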
The expected returns on the equity and the risk-free security are, respectively,

$$R^e = \sum_{i=1}^{n} \pi_i R_i^e \quad \text{and} \quad R^f = \sum_{i=1}^{n} \pi_i R_i^f. \tag{B.14}$$
Time sample averages will converge in probability to these values given the ergodicity of the Markov chain. The risk premium for equity is $R^e - R^f$, a parameter that is used in the test. The parameters defining preferences are $\alpha$ and $\beta$ while the parameters defining technology are the elements of $[\phi_{ij}]$ and $[\lambda_i]$. Our approach is to assume two states for the Markov chain and to restrict the process as follows:

$$\lambda_1 = 1 + \mu + \delta, \qquad \lambda_2 = 1 + \mu - \delta,$$

$$\phi_{11} = \phi_{22} = \phi, \qquad \phi_{12} = \phi_{21} = (1 - \phi).$$

The parameters $\mu$, $\phi$, and $\delta$ now define the technology. We require $\delta > 0$ and $0 < \phi < 1$. This particular parameterization was selected because it permitted us to independently vary the average growth rate of output by changing $\mu$, the variability of consumption by altering $\delta$, and the serial correlation of growth rates by adjusting $\phi$.

The parameters were selected so that the average growth rate of per capita consumption, the standard deviation of the growth rate of per capita consumption and the first-order serial correlation of this growth rate, all with respect to the model’s stationary distribution, matched the sample values for the U.S. economy between 1889 and 1978. The sample values for the U.S. economy were 0.018, 0.036 and −0.14, respectively. The resulting parameter values were $\mu = 0.018$, $\delta = 0.036$ and $\phi = 0.43$. Given these values, the nature of the test is to search for parameters $\alpha$ and $\beta$ for which the model’s averaged risk-free rate and equity risk premium match those observed for the U.S. economy over this ninety-year period.

The parameter $\alpha$, which measures people’s willingness to substitute consumption between successive yearly time periods, is an important one in many fields of economics. As mentioned in the text, there is a wealth of evidence from various studies that the coefficient of risk aversion $\alpha$ is a small number, certainly less than 10. A number of these studies are documented in Mehra and Prescott (1985). This is an important restriction, for with large $\alpha$ virtually any pair of average equity and risk-free returns can be obtained by making small changes in the process on consumption.

Given the estimated process on consumption, Figure 8 depicts the set of values of the average risk-free rate and equity risk premium which are both consistent with the model and result in average real risk-free rates between zero and four percent. These are values that can be obtained by varying preference parameters $\alpha$ between zero and ten and $\beta$ between zero and one. The observed real return of 0.80% and equity premium of 6% are clearly inconsistent with the predictions of the model. The largest premium obtainable with the model is 0.35%, which is not close to the observed value.

[Fig. 8. Set of admissible average equity risk premia and real returns. Horizontal axis: average risk-free rate $R^f$ (%), from 0 to 4; vertical axis: average risk premia $R^e - R^f$ (%), from 0 to 2; the admissible region is marked.]
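The mapping between the three sample moments and the three technology parameters can be made explicit. For the symmetric two-state chain, the stationary mean of the growth rate is $\mu$, its standard deviation is $\delta$, and its first-order serial correlation is $2\phi - 1$, so a serial correlation of −0.14 corresponds to $\phi = 0.43$. The sketch below is only an illustrative check of this arithmetic; the function names are hypothetical.

```python
import math

def chain_moments(mu, delta, phi):
    """Stationary mean growth, std, and first-order autocorrelation."""
    lam = (1 + mu + delta, 1 + mu - delta)
    pi = (0.5, 0.5)  # symmetric chain: both states equally likely
    P = ((phi, 1 - phi), (1 - phi, phi))
    mean = sum(p * l for p, l in zip(pi, lam))
    var = sum(p * (l - mean) ** 2 for p, l in zip(pi, lam))
    # Covariance between successive growth rates under stationarity.
    cov = sum(pi[i] * P[i][j] * (lam[i] - mean) * (lam[j] - mean)
              for i in range(2) for j in range(2))
    return mean - 1, math.sqrt(var), cov / var

def calibrate(mean_growth, std_growth, autocorr):
    """Invert the moment conditions: mu, delta, and phi = (1 + rho) / 2."""
    return mean_growth, std_growth, (1 + autocorr) / 2

if __name__ == "__main__":
    mu, delta, phi = calibrate(0.018, 0.036, -0.14)
    print(mu, delta, phi)
    print(chain_moments(mu, delta, phi))
```

Because each moment depends on only one parameter, the three parameters can be varied independently, which is the property emphasized in the text.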
An advantage of our approach is that we can easily test the sensitivity of our results to such distributional assumptions. With $\alpha$ less than ten, we found that our results were essentially unchanged for very different consumption processes, provided that the mean and variances of growth rates equaled the historically observed values. We use this fact in motivating the discussion in the text.
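The pieces of this appendix can be assembled into a small numerical experiment. The following is a hedged sketch rather than a reproduction of the original computations: it specializes (B.9) through (B.14) to the two-state chain, solves the 2×2 system by Cramer's rule, and searches a coarse grid of preference parameters for the largest premium with an average risk-free rate between zero and four percent. The function names, the grid resolution and the default arguments are assumptions.

```python
import math

def average_returns(alpha, beta, mu=0.018, delta=0.036, phi=0.43):
    """Return (R^e, R^f) for the two-state economy, or None if A is unstable."""
    lam = (1 + mu + delta, 1 + mu - delta)
    P = ((phi, 1 - phi), (1 - phi, phi))
    pi = (0.5, 0.5)  # stationary probabilities of the symmetric chain
    a = [[beta * P[i][j] * lam[j] ** (1 - alpha) for j in range(2)]
         for i in range(2)]
    # Largest eigenvalue of the nonnegative 2x2 matrix A (stability check).
    tr = a[0][0] + a[1][1]
    disc = (a[0][0] - a[1][1]) ** 2 + 4 * a[0][1] * a[1][0]
    if (tr + math.sqrt(disc)) / 2 >= 1:
        return None
    # Solve (I - A) w = A 1 by Cramer's rule, i.e. system (B.9).
    b = [a[0][0] + a[0][1], a[1][0] + a[1][1]]
    m = [[1 - a[0][0], -a[0][1]], [-a[1][0], 1 - a[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    w = [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
         (m[0][0] * b[1] - b[0] * m[1][0]) / det]
    # Average equity return: (B.10), (B.11) and (B.14).
    Re = sum(pi[i] * P[i][j] * (lam[j] * (w[j] + 1) / w[i] - 1)
             for i in range(2) for j in range(2))
    # Average risk-free return: (B.12), (B.13) and (B.14).
    Rf = sum(pi[i] * (1 / (beta * sum(P[i][j] * lam[j] ** (-alpha)
                                      for j in range(2))) - 1)
             for i in range(2))
    return Re, Rf

def largest_premium(alpha_max=10.0, n_alpha=200, n_beta=200,
                    rf_lo=0.0, rf_hi=0.04):
    """Grid search for the largest R^e - R^f with rf_lo <= R^f <= rf_hi."""
    best = None
    for ia in range(1, n_alpha + 1):
        alpha = alpha_max * ia / n_alpha
        for ib in range(1, n_beta):
            beta = ib / n_beta
            out = average_returns(alpha, beta)
            if out is None:
                continue
            Re, Rf = out
            if rf_lo <= Rf <= rf_hi:
                cand = (Re - Rf, alpha, beta)
                if best is None or cand > best:
                    best = cand
    return best

if __name__ == "__main__":
    print(largest_premium())
```

If the sketch is faithful to the model, the search should return a premium well below one percent, which is the order of magnitude of the 0.35% bound reported above. The exact figure depends on the grid and should not be read as a replication of the published number. Replacing the default chain with a different consumption process that has the same mean and variance is one way to explore the sensitivity claim made in the text.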
References Abel, A.B. (1988), “Stock prices under time varying dividend risk: an exact solution in an infinite horizon general equilibrium model”, Journal of Monetary Economics 22:375−394. Abel, A.B. (1990), “Asset prices under habit formation and catching up with the Joneses”, A.E.R. Papers and Proceedings 80:38−42. Abel, A.B., N.G. Mankiw, L.H. Summers and R.J. Zeckhauser (1989), “Assessing dynamic efficiency: theory and evidence”, Review of Economic Studies 56:1−20. Ait-Sahalia, Y., J.A. Parker and M. Yogo (2001), “Luxury goods and the equity premium”, Working Paper 8417 (NBER). Aiyagari, S.R., and M. Gertler (1991), “Asset returns with transactions costs and uninsured individual risk”, Journal of Monetary Economics 27:311−331. Altonji, J.G., F. Hayashi and L.J. Kotlikoff (1992), “Is the extended family altruistically linked?” American Economic Review 82:1177−1198. Alvarez, F., and U. Jermann (2000), “Asset pricing when risk sharing is limited by default”, Econometrica 48:775−797. Attanasio, O.P., and S.J. Davis (1997), “Relative wage movements and the distribution of consumption”, Journal of Political Economy 104:1227−1262. Attanasio, O.P., J. Banks and S. Tanner (2002), “Asset holding and consumption volatility”, Journal of Political Economy 110:771−792. Auerbach, A.J., and L.J. Kotlikoff (1987), Dynamic Fiscal Policy (Cambridge University Press). Bansal, R., and J.W. Coleman (1996), “A monetary explanation of the equity premium, term premium and risk free rate puzzles”, Journal of Political Economy 104:1135−1171. Bansal, R., and A. Yaron (2000), “Risks for the long run: a potential resolution of asset pricing puzzles”, Working Paper 8059 (NBER). Barberis, N., M. Huang and T. Santos (2001), “Prospect theory and asset prices”, Quarterly Journal of Economics 116:1−53. Barro, R.J., and G.S. Becker (1988), “Population growth and economic growth”, Working Paper (Harvard University). Basak, S., and D. 
Cuoco (1998), “An equilibrium model with restricted stock market participation”, Review of Financial Studies 11:309−341. Benartzi, S., and R.H. Thaler (1995), “Myopic loss aversion and the equity premium puzzle”, Quarterly Journal of Economics 110:73−92. Bewley, T.F. (1982), “Thoughts on tests of the intertemporal asset pricing model”, Working Paper (Northwestern University, IL). Billingsley, P. (1995), Probability and Measure (Wiley, New York). Boldrin, M., L.J. Christiano and J.D.M. Fisher (2001), “Habit persistence, asset returns, and the business cycle”, American Economic Review 91:149−166. Brav, A., and C.C. Geczy (1995), “An empirical resurrection of the simple consumption CAPM with power utility”, Working Paper (University of Chicago). Brav, A., G.M. Constantinides and C.C. Geczy (2002), “Asset pricing with heterogeneous consumers and limited participation: empirical evidence”, Journal of Political Economy 110:793−824.
Breeden, D. (1979), “An intertemporal asset pricing model with stochastic consumption and investment opportunities”, Journal of Financial Economics 7:265−296. Brock, W.A. (1979), “An integration of stochastic growth theory and the theory of finance, Part 1: The growth model”, in: J. Green and J. Scheinkman, eds., General Equilibrium, Growth and Trade (Academic Press, New York) pp. 165–190. Brown, S., W. Goetzmann and S. Ross (1995), “Survival”, Journal of Finance 50:853−873. Campbell, J.Y. (1999), “Asset prices, consumption, and the business cycle”, in: J.B. Taylor and M. Woodford, eds., Handbook of Macroeconomics, Vol. 1 (Elsevier, Amsterdam) pp. 1231−1303. Campbell, J.Y. (2001), “Asset pricing at the millennium”, Journal of Finance 55:1515−1567. Campbell, J.Y., and J.H. Cochrane (1999), “By force of habit: a consumption-based explanation of aggregate stock market behavior”, Journal of Political Economy 107:205−251. Campbell, J.Y., and R.J. Shiller (1988), “Valuation ratios and the long-run stock market outlook”, Journal of Portfolio Management 24:11−26. Cochrane, J.H. (1991), “A simple test of consumption insurance”, Journal of Political Economy 99: 957−976. Cochrane, J.H. (1997), “Where is the market going? Uncertain facts and novel theories”, Economic Perspectives 21:3−37. Cochrane, J.H. (2001), Asset Pricing (Princeton University Press, NJ). Cochrane, J.H., and L.P. Hansen (1992), “Asset pricing explorations for macroeconomics”, in: O.J. Blanchard and S. Fischer, eds., NBER Macroeconomics Annual (MIT Press, MA). Cogley, T. (1999), “Idiosyncratic risk and the equity premium: evidence from the consumer expenditure survey”, Working Paper (Arizona State University). Constantinides, G.M. (1990), “Habit formation: a resolution of the equity premium puzzle”, Journal of Political Economy 98:519−543. Constantinides, G.M. (2002), “Rational asset prices”, Journal of Finance 57:1567−1591. Constantinides, G.M., and D. 
Duffie (1996), “Asset pricing with heterogeneous consumers”, Journal of Political Economy 104:219−240. Constantinides, G.M., J.B. Donaldson and R. Mehra (2002), “Junior can’t borrow: a new perspective on the equity premium puzzle”, Quarterly Journal of Economics 118:269−296. Cowles and Associates (1939), “Common stock indexes”, Cowles Commission Monograph 3, 2nd Edition (Principia Press, Bloomington, IN). Cox, J.C., J.E. Ingersoll Jr and S.A. Ross (1985), “A theory of the term structure of interest rates”, Econometrica 53:385−407. Daniel, K., and D. Marshall (1997), “The equity premium puzzle and the risk-free rate puzzle at long horizons”, Macroeconomic Dynamics 1:452−484. Danthine, J.-P., and J.B. Donaldson (2001), Intermediate Financial Theory (Prentice Hall, NJ). Danthine, J.-P., J.B. Donaldson and R. Mehra (1992), “The equity premium and the allocation of income risk”, Journal of Economic Dynamics and Control 16:509−532. Davis, S.J., and P. Willen (2000), “Using financial assets to hedge labor income risk: estimating the benefits”, Working Paper (University of Chicago). Debreu, G. (1954), “Valuation equilibrium and pareto optimum”, Proceedings of the National Academy of Sciences 70:588−592. Dimson, E., P. Marsh and M. Staunton (2000), “The millennium book: a century of investment returns”, Working Paper (ABN Amro; London Business School). Donaldson, J.B., and R. Mehra (1984), “Comparative dynamics of an equilibrium intertemporal asset pricing model”, Review of Economic Studies 51:491−508. Duffie, D. (2001), Dynamic Asset Pricing Theory, 3rd Edition (Princeton University Press, NJ). Epstein, L.G., and S.E. Zin (1991), “Substitution, risk aversion, and the temporal behavior of consumption and asset returns: an empirical analysis”, Journal of Political Economy 99:263−286. Fama, E.F., and K.R. French (1988), “Dividend yields and expected stock returns”, Journal of Financial Economics 22:3−25.
Fama, E.F., and K.R. French (2002), “The equity premium”, Journal of Finance 57:637−659. Ferson, W.E., and G.M. Constantinides (1991), “Habit persistence and durability in aggregate consumption”, Journal of Financial Economics 29:199−240. Gabaix, X., and D. Laibson (2001), “The 6D bias and the equity premium puzzle,” in: B. Bernanke and K. Rogoff, eds., NBER Macroeconomics Annual (MIT Press, MA). Grossman, S.J., and R.J. Shiller (1981), “The determinants of the variability of stock market prices”, American Economic Review 71:222−227. Hansen, L.P., and R. Jagannathan (1991), “Implications of security market data for models of dynamic economies”, Journal of Political Economy 99:225−262. Hansen, L.P., and K.J. Singleton (1982), “Generalized instrumental variables estimation of nonlinear rational expectations models”, Econometrica 50:1269−1288. He, H., and D.M. Modest (1995), “Market frictions and consumption-based asset pricing”, Journal of Political Economy 103:94−117. Heaton, J. (1995), “An empirical investigation of asset pricing with temporally dependent preference specifications”, Econometrica 66:681−717. Heaton, J., and D.J. Lucas (1996), “Evaluating the effects of incomplete markets on risk sharing and asset pricing”, Journal of Political Economy 104:443−487. Heaton, J., and D.J. Lucas (1997), “Market frictions, savings behavior and portfolio choice”, Journal of Macroeconomic Dynamics 1:76−101. Heaton, J.C., and D.J. Lucas (2000), “Portfolio choice and asset prices: the importance of entrepreneurial risk”, Journal of Finance 55:1163−1198. Homer, S. (1963), A History of Interest Rates (Rutgers University Press, New Brunswick, NJ). Ibbotson Associates (2001), Stocks, Bonds, Bills and Inflation. 2000 Yearbook (Ibbotson Associates, Chicago). Jacobs, K. (1999), “Incomplete markets and security prices: do asset-pricing puzzles result from aggregation problems?” Journal of Finance 54:123−163. Jacobs, K., and K.Q. 
Wang (2001), “Idiosyncratic consumption risk and the cross-section of asset returns”, Working Paper (McGill University and University of Toronto). Kandel, S., and R.F. Stambaugh (1991), “Asset returns and intertemporal preferences”, Journal of Monetary Economics 27:39−71. Kocherlakota, N.R. (1996), “The equity premium: it’s still a puzzle”, Journal of Economic Literature 34:42−71. Krebs, T. (2000), “Consumption-based asset pricing with incomplete markets”, Working Paper (Brown University, Providence, RI). Kydland, F., and E.C. Prescott (1982), “Time to build and aggregate fluctuations”, Econometrica 50: 1345−1371. LeRoy, S.H., and J. Werner (2001), Principles of Financial Economics (Cambridge University Press, New York). Litterman, R.B. (1980), “Bayesian procedure for forecasting with vector auto-regressions”, Working Paper (MIT, MA). Lucas, D.J. (1994), “Asset pricing with undiversifiable risk and short sales constraints: deepening the equity premium puzzle”, Journal of Monetary Economics 34:325−341. Lucas Jr, R.E. (1978), “Asset prices in an exchange economy”, Econometrica 46:1429−1445. Luttmer, E.G.J. (1996), “Asset pricing in economies with frictions”, Econometrica 64:1439−1467. Lynch, A.W. (1996), “Decision frequency and synchronization across agents: implications for aggregate consumption and equity returns”, Journal of Finance 51:1479−1497. Macaulay, F.R. (1938), The Movements of Interest Rates, Bond Yields and Stock Prices in the United States since 1856 (National Bureau of Economic Research, New York). Mace, B.J. (1991), “Full insurance in the presence of aggregate uncertainty”, Journal of Political Economy 99:928−956.
Mankiw, N.G. (1986), “The equity premium and the concentration of aggregate shocks”, Journal of Financial Economics 17:211−219. Mankiw, N.G., and S.P. Zeldes (1991), “The consumption of stockholders and nonstockholders”, Journal of Financial Economics 29:97−112. McGrattan, E.R., and E.C. Prescott (2000), “Is the market overvalued?” Federal Reserve Bank of Minneapolis Quarterly Review 24:20−40. McGrattan, E.R., and E.C. Prescott (2001), “Taxes, regulations, and asset prices”, Working Paper 610 (Federal Reserve Bank of Minneapolis). Mehra, R. (1988), “On the existence and representation of equilibrium in an economy with growth and nonstationary consumption”, International Economic Review 29:131−135. Mehra, R. (1998), “On the volatility of stock prices: an exercise in quantitative theory”, International Journal of Systems Science 29:1203−1211. Mehra, R. (2003), “The equity premium: why is it a puzzle”, Financial Analysts Journal January/February, pp. 54–69. Mehra, R., and E.C. Prescott (1985), “The equity premium: a puzzle”, Journal of Monetary Economics 15:145−161. Mehra, R., and E.C. Prescott (1988), “The equity premium: a solution?” Journal of Monetary Economics 22:133−136. Mehra, R., and R. Sah (2002), “Mood fluctuations, projection bias, and volatility of equity prices”, Journal of Economic Dynamics and Control 26:869−887. Merton, R.C. (1971), “Optimum consumption and portfolio rules in a continuous time model”, Journal of Economic Theory 3:373−413. Prescott, E.C., and R. Mehra (1980), “Recursive competitive equilibrium: the case of homogeneous households”, Econometrica 48:1365−1379. Rietz, T.A. (1988), “The equity risk premium: a solution”, Journal of Monetary Economics 22:117−131. Rubinstein, M. (1976), “The valuation of uncertain income streams and the pricing of options”, Bell Journal of Economics 7:407−425. Schwert, G.W. (1990), “Indexes of U.S. stock prices from 1802 to 1987”, Journal of Business 63:399−426. Shiller, R.J. (1990), Market Volatility (MIT Press, Cambridge, MA). Siegel, J. (1998), Stocks for the Long Run, 2nd Edition (Irwin, New York). Smith, W.B., and A.H. Cole (1935), Fluctuations in American Business, 1790–1860 (Harvard University Press, Cambridge, MA). Storesletten, K., C.I. Telmer and A. Yaron (2000), “Consumption and risk sharing over the lifecycle”, Working Paper (Carnegie Mellon University, Pittsburgh, PA). Storesletten, K., C.I. Telmer and A. Yaron (2001), “Asset pricing with idiosyncratic risk and overlapping generations”, Working Paper (Carnegie Mellon University, Pittsburgh, PA). Telmer, C.I. (1993), “Asset-pricing puzzles and incomplete markets”, Journal of Finance 49:1803−1832. Vissing-Jorgensen, A. (2002), “Limited asset market participation and the elasticity of intertemporal substitution”, Journal of Political Economy, forthcoming. Weil, P. (1989), “The equity premium puzzle and the risk-free rate puzzle”, Journal of Monetary Economics 24:401−421.
Chapter 15
ANOMALIES AND MARKET EFFICIENCY
G. WILLIAM SCHWERT *
University of Rochester, and NBER

Contents
Abstract
Keywords
1. Introduction
2. Selected empirical regularities
2.1. Predictable differences in returns across assets
2.1.1. Data snooping
2.1.2. The size effect
2.1.3. The turn-of-the-year effect
2.1.4. The weekend effect
2.1.5. The value effect
2.1.6. The momentum effect
2.2. Predictable differences in returns through time
2.2.1. Short-term interest rates, expected inflation, and stock returns
2.2.2. Dividend yields and stock returns
3. Returns to different types of investors
3.1. Individual investors
3.1.1. Closed-end funds
3.2. Institutional investors
3.2.1. Mutual funds
3.2.2. Hedge funds
3.2.3. Returns to IPOs
3.3. Limits to arbitrage
4. Long-run returns
4.1. Returns to firms issuing equity
4.2. Returns to bidder firms
5. Implications for asset pricing
5.1. The search for risk factors
5.2. Conditional asset pricing
5.3. Excess volatility
5.4. The role of behavioral finance
6. Implications for corporate finance
6.1. Firm size and liquidity
6.2. Book-to-market effects
6.3. Slow reaction to corporate financial policy
7. Conclusions
References

* The Bradley Policy Research Center, William E. Simon Graduate School of Business Administration, University of Rochester, provided support for this research. I received helpful comments from Yakov Amihud, Brad Barber, John Cochrane, Eugene Fama, Murray Frank, Ken French, David Hirshleifer, Tim Loughran, Randall Mørck, Jeff Pontiff, Jay Ritter, René Stulz, A. Subrahmanyam, Sheridan Titman, Janice Willett and Jerold Zimmerman. The views expressed herein are those of the author and do not necessarily reflect the views of the National Bureau of Economic Research.

Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved
Ch. 15:
Anomalies and Market Efficiency
Abstract Anomalies are empirical results that seem to be inconsistent with maintained theories of asset-pricing behavior. They indicate either market inefficiency (profit opportunities) or inadequacies in the underlying asset-pricing model. After they are documented and analyzed in the academic literature, anomalies often seem to disappear, reverse, or attenuate. This raises the question of whether profit opportunities existed in the past, but have since been arbitraged away, or whether the anomalies were simply statistical aberrations that attracted the attention of academics and practitioners. One of the interesting findings from the empirical work in this chapter is that many of the well-known anomalies in the finance literature do not hold up in different sample periods. In particular, the size effect and the value effect seem to have disappeared after the papers that highlighted them were published. At about the same time, practitioners began investment vehicles that implemented the strategies implied by the academic papers. The weekend effect and the dividend yield effect also seem to have lost their predictive power after the papers that made them famous were published. In these cases, however, I am not aware of any practitioners who have tried to use these anomalies as a major basis of their investment strategy. The small-firm turn-of-the-year effect became weaker in the years after it was first documented in the academic literature, although there is some evidence that it still exists. Interestingly, however, it does not seem to exist in the portfolio returns of practitioners who focus on small-capitalization firms. Likewise, the evidence that stock market returns are predictable using variables such as dividend yields or inflation is much weaker in the periods after the papers that documented these findings were published. All of these findings raise the possibility that anomalies are more apparent than real. 
The notoriety associated with the findings of unusual evidence tempts authors to further investigate puzzling anomalies and later to try to explain them. But even if the anomalies existed in the sample period in which they were first identified, the activities of practitioners who implement strategies to take advantage of anomalous behavior can cause the anomalies to disappear (as research findings cause the market to become more efficient).
Keywords market efficiency, anomaly, size effect, value effect, selection bias, momentum JEL classification: G14, G12, G34, G32
G.W. Schwert
1. Introduction Anomalies are empirical results that seem to be inconsistent with maintained theories of asset-pricing behavior. They indicate either market inefficiency (profit opportunities) or inadequacies in the underlying asset-pricing model. After they are documented and analyzed in the academic literature, anomalies often seem to disappear, reverse, or attenuate. This raises the question of whether profit opportunities existed in the past, but have since been arbitraged away, or whether the anomalies were simply statistical aberrations that attracted the attention of academics and practitioners. Surveys of the efficient markets literature date back at least to Fama (1970), and there are several recent updates, including Fama (1991) and Keim and Ziemba (2000), that stress particular areas of the finance literature. By their nature, surveys reflect the views and perspectives of their authors, and this one will be no exception. My goal is to highlight some interesting findings that have emerged from the research of many people and to raise questions about the implications of these findings for the way academics and practitioners use financial theory. 1 There are obvious connections between this chapter and other chapters by Ritter (5: Investment Banking and Security Issuance), Stoll (9: Market Microstructure), Dybvig and Ross (10: Arbitrage, State Prices and Portfolio Theory), Duffie (11: Intertemporal Asset Pricing Models), Ferson (12: Tests of Multi-Factor Pricing Models, Volatility, and Portfolio Performance), Campbell (13: Equilibrium Asset Pricing Models), Easley and O’Hara (17: Asset Prices Market Microstructure) and Barberis and Thaler (18: Behavioral Issues in Asset Pricing). In fact, those chapters draw on some of the same findings and papers that provide the basis for my conclusions. At a fundamental level, anomalies can only be defined relative to a model of “normal” return behavior. 
Fama (1970) noted this fact early on, pointing out that tests of market efficiency also jointly test a maintained hypothesis about equilibrium expected asset returns. Thus, whenever someone concludes that a finding seems to indicate market inefficiency, it may also be evidence that the underlying asset-pricing model is inadequate. It is also important to consider the economic relevance of a presumed anomaly. Jensen (1978) stressed the importance of trading profitability in assessing market efficiency. In particular, if anomalous return behavior is not definitive enough for an efficient trader to make money trading on it, then it is not economically significant. This definition of market efficiency directly reflects the practical relevance of academic research into return behavior. It also highlights the importance of transactions costs and other market microstructure issues for defining market efficiency. The growth in the amount of data and computing power available to researchers, along with the growth in the number of active empirical researchers in finance since
1
This chapter is not meant to be a survey of all of the literature on market efficiency or anomalies. Failure to cite particular papers should not be taken as a reflection on those papers.
Ch. 15:
Anomalies and Market Efficiency
Fama’s (1970) survey article, has created an explosion of findings that raise questions about the first, simple models of efficient capital markets. Many people have noted that the normal tendency of researchers to focus on unusual findings (which could be a byproduct of the publication process, if there is a bias toward the publication of findings that challenge existing theories) could lead to the over-discovery of “anomalies”. For example, if a random process results in a particular sample that looks unusual, thereby attracting the attention of researchers, this “sample selection bias” could lead to the perception that the underlying model was not random. Of course, the key test is whether the anomaly persists in new, independent samples.
Some interesting questions arise when perceived market inefficiencies or anomalies seem to disappear after they are documented in the finance literature: Does their disappearance reflect sample selection bias, so that there was never an anomaly in the first place? Or does it reflect the actions of practitioners who learn about the anomaly and trade so that profitable transactions vanish?
The remainder of this chapter is organized as follows. Section 2 discusses cross-sectional and time-series regularities in asset returns, including the size, book-to-market, momentum, and dividend yield effects. Section 3 discusses differences in returns realized by different types of investors, including individual investors (through closed-end funds and brokerage account trading data) and institutional investors (through mutual fund performance and hedge fund performance). Section 4 evaluates the role of measurement issues in many of the papers that study anomalies, including the difficult issues associated with long-horizon return performance. Section 5 discusses the implications of the anomalies literature for asset-pricing theories, and Section 6 discusses the implications of the anomalies literature for corporate finance.
Section 7 contains brief concluding remarks.
2. Selected empirical regularities
2.1. Predictable differences in returns across assets
2.1.1. Data snooping
Many analysts have been concerned that the process of examining data and models affects the likelihood of finding anomalies. Authors in search of an interesting research paper are likely to focus attention on “surprising” results. To the extent that subsequent authors reiterate or refine the surprising results by examining the same or at least positively correlated data, there is really no additional evidence in favor of the anomaly. Lo and MacKinlay (1990) illustrate the data-snooping phenomenon and show how the inferences drawn from such exercises are misleading.
One obvious solution to this problem is to test the anomaly on an independent sample. Sometimes researchers use data from other countries, and sometimes they use data from prior time periods. If sufficient time elapses after the discovery of an
Table 1
Size and value effects (a), January 1982 – May 2002
Sample period    a_i        t(a_i = 0)    b_i      t(b_i = 1)
DFA 9-10 Small company portfolio
1982–2002         0.0020     0.67         1.033     0.68
1982–1987        −0.0019    −0.44         1.000     0.00
1988–1993         0.0038     0.80         1.104     1.21
1994–2002         0.0035     0.66         1.013     0.15
DFA US 6-10 value portfolio (b)
1994–2002        −0.0022    −0.59         0.816    −2.14
(a) Performance of DFA US 9-10 Small Company Portfolio relative to the CRSP value-weighted portfolio of NYSE, Amex, and Nasdaq stocks (R_m) and the one-month Treasury bill yield (R_f), January 1982 – May 2002. The intercept in this regression, a_i, is known as “Jensen’s alpha” (1968) and it measures the average difference between the monthly return to the DFA fund and the return predicted by the CAPM (see also Equation 1).
(b) The performance of the DFA US 6-10 Value Portfolio from January 1994 – May 2002.
Heteroskedasticity-consistent standard errors are used to compute the t-statistics.
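The regression summarized in Table 1 can be sketched in a few lines. This is a minimal illustration rather than the chapter's estimation code: the function name `capm_alpha` and the simulated inputs are hypothetical, and the White (HC0) form of the heteroskedasticity-consistent standard errors is an assumption, since the table notes do not say which variant was used.

```python
import math
import random


def capm_alpha(fund_excess, mkt_excess):
    """Regress fund excess returns on market excess returns.

    Returns (a, t_a, b, t_b1): the intercept, its t-statistic against zero,
    the slope, and the slope's t-statistic against one, using White (HC0)
    heteroskedasticity-consistent standard errors.
    """
    n = len(fund_excess)
    mean_x = sum(mkt_excess) / n
    mean_y = sum(fund_excess) / n
    sxx = sum((x - mean_x) ** 2 for x in mkt_excess)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(mkt_excess, fund_excess))
    b = sxy / sxx
    a = mean_y - b * mean_x
    resid = [y - a - b * x for x, y in zip(mkt_excess, fund_excess)]

    # Each OLS coefficient is a weighted sum of the observations; HC0 sums
    # the squared weights times the squared residuals.
    w_b = [(x - mean_x) / sxx for x in mkt_excess]
    w_a = [1.0 / n - mean_x * w for w in w_b]
    se_a = math.sqrt(sum((w * e) ** 2 for w, e in zip(w_a, resid)))
    se_b = math.sqrt(sum((w * e) ** 2 for w, e in zip(w_b, resid)))
    return a, a / se_a, b, (b - 1.0) / se_b


if __name__ == "__main__":
    # Simulated monthly excess returns (hypothetical, for illustration only).
    rng = random.Random(0)
    mkt = [rng.gauss(0.006, 0.045) for _ in range(245)]
    fund = [0.002 + 1.03 * m + rng.gauss(0.0, 0.03) for m in mkt]
    a, t_a, b, t_b1 = capm_alpha(fund, mkt)
    print(f"a = {a:.4f} (t = {t_a:.2f}), b = {b:.3f} (t vs 1 = {t_b1:.2f})")
```

The two t-statistics correspond to the columns t(a_i = 0) and t(b_i = 1) in the table: the first asks whether abnormal performance differs from zero, the second whether market risk differs from that of the market portfolio.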
anomaly, the analysis of subsequent data also provides a test of the anomaly. I supply some evidence below on the post-publication performance of several anomalies.
2.1.2. The size effect
Banz (1981) and Reinganum (1981) showed that small-capitalization firms on the New York Stock Exchange (NYSE) earned higher average returns than is predicted by the Sharpe (1964) – Lintner (1965) capital asset-pricing model (CAPM) from 1936–75. This “small-firm effect” spawned many subsequent papers that extended and clarified the early papers. For example, a special issue of the Journal of Financial Economics contained several papers that extended the size-effect literature. 2 Interestingly, at least some members of the financial community picked up on the small-firm effect, since the firm Dimensional Fund Advisors (DFA) began in 1981 with Eugene Fama as its Director of Research. 3 Table 1 shows the abnormal performance
2
Schwert (1983) discusses all of these papers in more detail.
3
Information about DFA comes from their web page: http://www.dfafunds.com and from the Center for Research in Security Prices (CRSP) Mutual Fund database. Ken French maintains current data for the Fama–French factors on his web site: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/.
of the DFA US 9–10 Small Company Portfolio, which closely mimics the strategy described by Banz (1981). The measure of abnormal return a_i in Table 1 is called Jensen’s (1968) alpha, from the following familiar model:
\[ R_{it} - R_{ft} = a_i + b_i (R_{mt} - R_{ft}) + e_{it}, \tag{1} \]
where R_{it} is the return on the DFA fund in month t, R_{ft} is the yield on a one-month Treasury bill, and R_{mt} is the return on the CRSP value-weighted market portfolio of NYSE, Amex, and Nasdaq stocks. The intercept a_i in (1) measures the average difference between the monthly return to the DFA fund and the return predicted by the CAPM. The market risk of the DFA fund, measured by b_i, is insignificantly different from 1.0 in the period January 1982–May 2002, as well as in each of the three subperiods, 1982–1987, 1988–1993, and 1994–2002. The estimates of abnormal monthly returns are between −0.2% and 0.4% per month, although none are reliably different from zero. Thus, it seems that the small-firm anomaly has disappeared since the initial publication of the papers that discovered it. Alternatively, the differential risk premium for small-capitalization stocks has been much smaller since 1982 than it was during the period 1926–1982.
2.1.3. The turn-of-the-year effect
Keim (1983) and Reinganum (1983) showed that much of the abnormal return to small firms (measured relative to the CAPM) occurs during the first two weeks in January. This anomaly became known as the “turn-of-the-year effect”. Roll (1983) hypothesized that the higher volatility of small-capitalization stocks caused more of them to experience substantial short-term capital losses that investors might want to realize for income tax purposes before the end of the year. This selling pressure might reduce prices of small-cap stocks in December, leading to a rebound in early January as investors repurchase these stocks to reestablish their investment positions. 4
4
There are many mechanisms that could mitigate the size of such an effect, including the choice of a tax year different from a calendar year, the incentive to establish short-term losses before December, and the opportunities for other investors to earn higher returns by providing liquidity in December.
Table 2 shows estimates of the turn-of-the-year effect for the period 1962–2001, as well as for the 1962–1979 period analyzed by Reinganum (1983), and the subsequent 1980–1989 and 1990–2001 sample periods. The dependent variable is the difference in the daily return to the CRSP NYSE small-firm portfolio (decile 1) and the return to the CRSP NYSE large-firm portfolio (decile 10), (R_{1t} − R_{10t}). The independent variable, January, equals one when the daily return occurs during the first 15 calendar days of January, and zero otherwise. Thus, the coefficient a_J measures the difference between the average daily return during the first 15 calendar days of January and the rest of the
Table 2
Small firm/turn-of-the-year effect (a), daily returns, 1962–2001
Sample period    a_0         t(a_0 = 0)    a_J        t(a_J = 0)
1962–2001        −0.00007    −0.92         0.00641    9.87
1962–1979         0.00009     0.97         0.00815    7.14
1980–1989        −0.00014    −0.73         0.00433    4.55
1990–2001        −0.00026    −1.72         0.00565    5.37
(a) (R_{1t} − R_{10t}) = a_0 + a_J January_t + e_t. R_{1t} is the return to the CRSP NYSE small-firm portfolio (decile 1) and R_{10t} is the return to the CRSP NYSE large-firm portfolio (decile 10). January = 1 when the daily return occurs during the first 15 calendar days of January, and zero otherwise. The coefficient of January measures the difference in average return between small- and large-firm portfolios during the first two weeks of the year versus other days in the year. Heteroskedasticity-consistent standard errors are used to compute the t-statistics.
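The regression in Table 2 has a single 0/1 regressor, so its OLS coefficients reduce to a comparison of group means. The sketch below shows that equivalence on made-up numbers; the function names `is_turn_of_year` and `dummy_regression` and the sample returns are hypothetical and not taken from the chapter.

```python
from datetime import date


def is_turn_of_year(day):
    """Return 1 if the date falls in the first 15 calendar days of January."""
    return 1 if (day.month == 1 and day.day <= 15) else 0


def dummy_regression(values, dummy):
    """OLS of values on a constant and a 0/1 dummy.

    With one dummy regressor, the intercept is the mean of the dummy = 0
    observations and the slope is the difference between the two group means.
    """
    on = [v for v, d in zip(values, dummy) if d == 1]
    off = [v for v, d in zip(values, dummy) if d == 0]
    a0 = sum(off) / len(off)
    a_j = sum(on) / len(on) - a0
    return a0, a_j


if __name__ == "__main__":
    # Made-up daily small-minus-large return differences (R1 - R10).
    days = [date(2000, 1, 3), date(2000, 1, 14), date(2000, 1, 18),
            date(2000, 6, 1), date(2000, 12, 29)]
    spread = [0.008, 0.006, 0.001, -0.001, 0.000]
    january = [is_turn_of_year(d) for d in days]
    a0, a_j = dummy_regression(spread, january)
    print(f"a0 = {a0:.5f}, aJ = {a_j:.5f}")
```

The same reading applies to any regression on a constant and one indicator variable: the intercept is the average on ordinary days and the slope is the extra average return on flagged days.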
year. If small firms earn higher average returns than large firms during the first half of January, a_J should be reliably positive.
Unlike the results in Table 1, it does not seem that the turn-of-the-year anomaly has completely disappeared since it was originally documented. The estimates of the turn-of-the-year coefficient a_J are around 0.4% per day over the periods 1980–1989 and 1990–2001, which is about half the size of the estimate over the 1962–1979 period of 0.8%. Thus, while the effect is smaller than observed by Keim (1983) and Reinganum (1983), it is still reliably positive.
Interestingly, Booth and Keim (2000) have shown that the turn-of-the-year anomaly is not reliably different from zero in the returns to the DFA 9–10 portfolio over the period 1982–1995. They conclude that the restrictions placed on the DFA fund (no stocks trading at less than $2 per share or with less than $10 million in equity capitalization, and no stocks whose IPO was less than one year ago) explain the difference between the behavior of the CRSP small-firm portfolio and the DFA portfolio. Thus, it is the lowest-priced and least-liquid stocks that apparently explain the turn-of-the-year anomaly. This raises the possibility that market microstructure effects, especially the costs of illiquidity, play an important role in explaining some anomalies (see chapters 9 and 17 by Stoll and Easley and O’Hara, respectively).
2.1.4. The weekend effect
French (1980) observed another calendar anomaly. He noted that the average return to the Standard and Poor’s (S&P) composite portfolio was reliably negative over weekends in the period 1953–1977. Table 3 shows estimates of the weekend effect from February 1885 to May 2002, as well as for the 1953–1977 period analyzed by French (1980) and the 1885–1927, 1928–1952, and 1978–2002 sample periods not included in French’s study. The dependent variable is the daily return to a broad
Table 3
Day-of-the-week effects in the U.S. stock returns (a), February 1885 – May 2002
Sample period    a_0       t(a_0 = 0)    a_W        t(a_W = 0)
1885–2002        0.0005    8.52          −0.0017    −10.13
1885–1927        0.0004    4.46          −0.0013     −4.96
1928–1952        0.0007    3.64          −0.0030     −6.45
1953–1977        0.0007    6.80          −0.0023     −8.86
1978–2002        0.0005    4.00          −0.0005     −1.37
(a) R_t = a_0 + a_W Weekend_t + e_t. Weekend = 1 when the return spans Sunday (e.g., Friday to Monday), and zero otherwise. The coefficient of Weekend measures the difference in average return over the weekend versus other days of the week. From 1885–1927, Dow Jones portfolios are used [see Schwert (1990)]. From 1928–May 2002, the Standard & Poor’s composite portfolio is used. Heteroskedasticity-consistent standard errors are used to compute the t-statistics.
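Because Weekend is a 0/1 variable, the average return over weekends implied by each row of Table 3 is the sum of the two coefficients. The short sketch below does that arithmetic with the published point estimates; the helper name `weekend_mean` is hypothetical, and the results inherit the rounding in the table.

```python
# (a0, aW) point estimates copied from Table 3.
TABLE_3 = {
    "1885-2002": (0.0005, -0.0017),
    "1885-1927": (0.0004, -0.0013),
    "1928-1952": (0.0007, -0.0030),
    "1953-1977": (0.0007, -0.0023),
    "1978-2002": (0.0005, -0.0005),
}


def weekend_mean(a0, a_w):
    """Average daily return over weekends implied by the dummy regression."""
    return round(a0 + a_w, 4)


if __name__ == "__main__":
    for period, (a0, a_w) in TABLE_3.items():
        print(period, f"{weekend_mean(a0, a_w):+.4f}")
```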
portfolio of U.S. stocks. For the 1885–1927 period, the Schwert (1990) portfolio based on Dow Jones indexes is used. For 1928–2002, the S&P composite portfolio is used. The independent variable, Weekend, equals one when the daily return spans a weekend (e.g., Friday to Monday), and zero otherwise. Thus, the coefficient a_W measures the difference between the average daily return over weekends and the other days of the week. If weekend returns are reliably lower than returns on other days of the week, a_W should be reliably negative (and the sum a_0 + a_W should be reliably negative to confirm French’s (1980) results).
The results for 1953–1977 replicate the results in French (1980). The estimate of the weekend effect for 1928–1952 is even more negative, as previously noted by Keim and Stambaugh (1984). The estimate of the weekend effect from 1885–1927 is smaller, about half the size for 1953–1977 and about one-third the size for 1928–1952, but still reliably negative.
Interestingly, the estimate of the weekend effect since 1978 is not reliably different from the other days of the week. While the point estimate of a_W is negative from 1978–2002, it is about one-quarter as large as the estimate for 1953–1977, and it is not reliably less than zero. The estimate of the average return over weekends is the sum a_0 + a_W, which is essentially zero for 1978–2002. Thus, like the size effect, the weekend effect seems to have disappeared, or at least substantially attenuated, since it was first documented in 1980.
2.1.5. The value effect
Around the same time as early size-effect papers, Basu (1977, 1983) noted that firms with high earnings-to-price (E/P) ratios earn positive abnormal returns relative to the CAPM. Many subsequent papers have noted that positive abnormal returns seem to accrue to portfolios of stocks with high dividend yields (D/P) or to stocks with high book-to-market (B/M) values.
Ball (1978) made the important observation that such evidence was likely to indicate a fault in the CAPM rather than market inefficiency, because the characteristics that would cause a trader following this strategy to add a firm to his or her portfolio would be stable over time and easy to observe. In other words, turnover and transactions costs would be low and information collection costs would be low. If such a strategy earned reliable “abnormal” returns, it would be available to a large number of potential arbitrageurs at a very low cost. More recently, Fama and French (1992, 1993) have argued that size and value (as measured by the book-to-market value of common stock) represent two risk factors that are missing from the CAPM. In particular, they suggest using regressions of the form:
\[ R_{it} - R_{ft} = a_i + b_i (R_{mt} - R_{ft}) + s_i SMB_t + h_i HML_t + e_{it}, \tag{2} \]
to measure abnormal performance, ai . In Equation (2), SMB represents the difference between the returns to portfolios of small- and large-capitalization firms, holding constant the B/M ratios for these stocks, and HML represents the difference between the returns to portfolios of high and low B/M ratio firms, holding constant the capitalization for these stocks. Thus, the regression coefficients si and hi represent exposures to size and value risk in much the same way that bi measures the exposure to market risk. Fama and French (1993) used their three-factor model to explore several of the anomalies that have been identified in earlier literature, where the test of abnormal returns is based on whether ai = 0 in Equation (2). They found that abnormal returns from the three-factor model in Equation (2) are not reliably different from zero for portfolios of stocks sorted by: equity capitalization, B/M ratios, dividend yield, or earnings-to-price ratios. The largest deviations from their three-factor model occur in the portfolio of low B/M (i.e., growth) stocks, where small-capitalization stocks have returns that are too low and large-capitalization stocks have returns that are too high (ai > 0). Fama and French (1996) extended the use of their three-factor model to explain the anomalies studied by Lakonishok, Shleifer and Vishny (1994). They found no estimates of abnormal performance in Equation (2) that are reliably different from zero based on variables such as B/M, E/P, cash flow over price (C/P), and the rank of past sales growth rates. In 1993, Dimensional Fund Advisors (DFA) began a mutual fund that focuses on small firms with high B/M ratios (the DFA US 6–10 Value Portfolio). Based on the results in Fama and French (1993), this portfolio would have earned significantly positive “abnormal” returns of about 0.5% per month over the period 1963–1991 relative to the CAPM. 
The estimate of the abnormal return to the DFA Value portfolio from 1994–2002 in the last row of Table 1 is −0.2% per month, with a t-statistic of −0.59. Thus, as with the DFA US 9–10 Small Company Portfolio, the apparent anomaly that motivated the fund’s creation seems to have disappeared, or at least attenuated.
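The three-factor regression in Equation (2) is an ordinary least-squares fit with three regressors. The sketch below solves the normal equations directly on simulated data; the helper `ols`, the factor series, and the coefficient values are all hypothetical, and no standard errors are computed.

```python
import random


def ols(y, columns):
    """OLS coefficients for y on a constant plus the given regressor columns.

    Solves the normal equations (X'X) beta = X'y by Gaussian elimination
    with partial pivoting. Returns [intercept, coef_1, coef_2, ...].
    """
    n = len(y)
    rows = [[1.0] + [col[t] for col in columns] for t in range(n)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yt for r, yt in zip(rows, y))]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (aug[i][k] - tail) / aug[i][i]
    return beta


if __name__ == "__main__":
    # Simulated monthly factor returns (hypothetical, for illustration only).
    rng = random.Random(1)
    mkt = [rng.gauss(0.006, 0.045) for _ in range(120)]
    smb = [rng.gauss(0.002, 0.030) for _ in range(120)]
    hml = [rng.gauss(0.004, 0.030) for _ in range(120)]
    fund = [0.9 * m + 0.8 * s + 0.5 * h + rng.gauss(0.0, 0.01)
            for m, s, h in zip(mkt, smb, hml)]
    a, b, s, h = ols(fund, [mkt, smb, hml])
    print(f"a = {a:.4f}, b = {b:.3f}, s = {s:.3f}, h = {h:.3f}")
```

In this simulated example the fund earns no abnormal return by construction, so the estimated intercept should be close to zero once the size and value exposures are included.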
Davis, Fama and French (2000) collected and analyzed B/M data from 1929 through 1963 to study a sample that does not overlap the data studied in Fama and French (1993). They found that the apparent premium associated with value stocks is similar in the pre-1963 data to the post-1963 evidence. They also found that the size effect is subsumed by the value effect in the earlier sample period. Fama and French (1998) have shown that the value effect exists in a sample covering 13 countries (including the USA) over the period 1975–1995. Thus, in samples that pre-date the publication of the original Fama and French (1993) paper, the evidence supports the existence of a value effect. Daniel and Titman (1997) have argued that size and M/B characteristics dominate the Fama–French size and B/M risk factors in explaining the cross-sectional pattern of average returns. They conclude that size and M/B are not risk factors in an equilibrium pricing model. However, Davis, Fama and French (2000) found that Daniel and Titman’s results do not hold up outside their sample period. 2.1.6. The momentum effect Fama and French (1996) have also tested two versions of momentum strategies. DeBondt and Thaler (1985) found an anomaly whereby past losers (stocks with low returns in the past three to five years) have higher average returns than past winners (stocks with high returns in the past three to five years), which is a “contrarian” effect. On the other hand, Jegadeesh and Titman (1993) found that recent past winners (portfolios formed on the last year of past returns) out-perform recent past losers, which is a “continuation” or “momentum” effect. Using their three-factor model in Equation (2), Fama and French found no estimates of abnormal performance that are reliably different from zero based on the long-term reversal strategy of DeBondt and Thaler (1985), which they attribute to the similarity of past losers and small distressed firms. 
On the other hand, Fama and French are not able to explain the short-term momentum effects found by Jegadeesh and Titman (1993) using their three-factor model. The estimates of abnormal returns are strongly positive for short-term winners. Table 4 shows estimates of the momentum effect using both the CAPM benchmark in Equation (1) and the Fama–French three-factor benchmark in Equation (2). The measure of momentum is the difference between the returns to portfolios of high and low prior return firms, UMD, where prior returns are measured over months −2 to −13 relative to the month in question. 5 The sample periods shown are the 1965–1989 period used by Jegadeesh and Titman (1993), the 1927–1964 period that preceded their sample, the 1990–2001 period that occurred after their paper was published, and the overall 1927–2001 period. Compared with the CAPM benchmark in the top panel of Table 4, the momentum effect seems quite large and reliable. The intercept a is about
5
This Fama–French momentum factor for the period 1927–2001 is available from Ken French’s web site, http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor.zip.
Table 4
Momentum effects (a), 1927–2001
Sample period   Sample size (T)   a        t(a = 0)   b        t(b = 0)   s        t(s = 0)   h        t(h = 0)
Single-factor CAPM benchmark
1926–2001       900               0.0095   6.98       −0.280   −3.48
1926–1964       456               0.0100   5.33       −0.415   −4.06
1965–1989       300               0.0082   4.00        0.016    0.22
1926–1989       756               0.0091   6.37       −0.303   −3.50
1990–2001       144               0.0107   2.71       −0.063   −0.56
Three-factor Fama–French benchmark
1926–2001       900               0.0110   8.25       −0.193   −3.75      −0.102   −1.14      −0.484   −4.65
1926–1964       456               0.0103   5.72       −0.204   −3.45      −0.137   −0.95      −0.525   −3.67
1965–1989       300               0.0100   4.61       −0.010   −0.13      −0.132   −1.17      −0.276   −2.08
1926–1989       756               0.0107   7.77       −0.170   −3.27      −0.128   −1.25      −0.519   −4.50
1990–2001       144               0.0123   2.95       −0.201   −1.83       0.093    0.54      −0.245   −1.35
(a) UMD_t = a + b(R_{mt} − R_{ft}) + s SMB_t + h HML_t + e_t. UMD_t is the return to a portfolio that is long stocks with high returns and short stocks with low returns in recent months (months −13 through −2). The market risk premium is measured as the difference in return between the CRSP value-weighted portfolio of NYSE, Amex and Nasdaq stocks (R_m) and the one-month Treasury bill yield (R_f). SMB_t is the difference between the returns to portfolios of small- and large-capitalization firms, holding constant the B/M ratios for these stocks, and HML_t is the difference between the returns to portfolios of high and low B/M ratio firms, holding constant the capitalization for these stocks. Heteroskedasticity-consistent standard errors are used to compute the t-statistics.
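The timing convention behind the momentum sort (prior returns measured over months −13 through −2, skipping the most recent month) is easy to get wrong, so a small sketch may help. This is not the construction of the factor used in Table 4: the equal weighting, the 30% cutoffs, and the names `formation_return` and `winners_minus_losers` are illustrative assumptions.

```python
def formation_return(returns, t):
    """Compounded return over months t-13 through t-2 (month t-1 is skipped)."""
    growth = 1.0
    for r in returns[t - 13:t - 1]:
        growth *= 1.0 + r
    return growth - 1.0


def winners_minus_losers(panel, t, fraction=0.3):
    """Month-t return of an equal-weighted long-short momentum portfolio.

    panel maps a stock identifier to its list of monthly returns, and t must
    be at least 13. The equal weighting and the default 30% cutoffs are
    illustrative assumptions.
    """
    ranked = sorted(panel, key=lambda name: formation_return(panel[name], t))
    k = max(1, int(len(ranked) * fraction))
    losers, winners = ranked[:k], ranked[-k:]
    long_leg = sum(panel[name][t] for name in winners) / k
    short_leg = sum(panel[name][t] for name in losers) / k
    return long_leg - short_leg


if __name__ == "__main__":
    # Three made-up stocks with 15 months of returns each (indices 0..14).
    panel = {
        "W": [0.02] * 13 + [-0.50, 0.03],
        "M": [0.00] * 13 + [0.00, 0.01],
        "L": [-0.02] * 13 + [0.50, -0.01],
    }
    print(f"{winners_minus_losers(panel, 14):.4f}")
```

In the made-up panel, the large return in month t−1 (index 13) does not affect the ranking, which is the point of skipping that month.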
1% per month, with t-statistics between 2.7 and 7.0. In fact, the smallest estimate of abnormal returns occurs in the 1965–1989 period used by Jegadeesh and Titman (1993) and the largest estimate occurs in the 1990–2001 sample after their paper was published. 6 Fama and French (1996) noted that their three-factor model does not explain the momentum effect, since the intercepts in the bottom panel of Table 4 are all reliably positive. In fact, the intercepts from the three-factor models are larger than those from the single-factor CAPM in the upper panel. Lewellen (2002) has presented evidence that portfolios of stocks sorted on size and B/M characteristics have momentum effects similar to those seen by Jegadeesh and Titman (1993, 2001) and Fama and French (1996). He argues that the existence of momentum in large diversified portfolios makes it unlikely that behavioral biases in information processing explain the evidence on momentum. Brennan, Chordia and Subrahmanyam (1998) found that size and B/M characteristics do not explain differences in average returns, given the Fama and French three-factor model. Like Fama and French (1996), they found that the Fama–French model does not explain the momentum effect. Finally, they found a negative relation between average returns and recent past dollar trading volume. They argue that this reflects a relation between expected returns and liquidity as suggested by Amihud and Mendelson (1986) and Brennan and Subrahmanyam (1996). Thus, while many of the systematic differences in average returns across stocks can be explained by the three-factor characterization of Fama and French (1993), momentum cannot. Interestingly, the average returns to index funds that were created to mimic the size and value strategies discussed above have not matched up to the historical estimates, as shown in Table 1.
The evidence on the momentum effect seems to persist, but may reflect predictable variation in risk premiums that are not yet understood.
2.2. Predictable differences in returns through time
In the early years of the efficient markets literature, the random walk model, in which returns should not be autocorrelated, was often confused with the hypothesis of market efficiency [see, for example, Black (1971)]. Fama (1970, 1976) made clear that the assumption of constant equilibrium expected returns over time is not a part of the efficient markets hypothesis, although that assumption worked well as a rough approximation in many of the early efficient markets tests.
6
Jegadeesh and Titman (2001) also show that the momentum effect remains large in the post 1989 period. They tentatively conclude that momentum effects may be related to behavioral biases of investors.
Since then, many papers have documented a small degree of predictability in stock returns based on prior information. Examples include Fama and Schwert (1977) [short-term interest rates], Keim and Stambaugh (1986) [spreads between high-risk corporate
bond yields and short-term interest rates], Campbell (1987) [spreads between long- and short-term interest rates], French, Schwert and Stambaugh (1987) [stock volatility], Fama and French (1988) [dividend yields on aggregate stock portfolios], and Kothari and Shanken (1997) [book-to-market ratios on aggregate stock portfolios]. Recently, Baker and Wurgler (2000) have shown that the proportion of new securities issues that are equity issues is a negative predictor of future equity returns over the period 1928–1997. An obvious question given evidence of the time-series predictability of returns is whether this is evidence of market inefficiency, or simply evidence of time-varying equilibrium expected returns. Fama and Schwert (1977) found weak evidence that excess returns to the CRSP value-weighted portfolio of NYSE stocks (in excess of the one-month Treasury bill yield) are predictably negative. Many subsequent papers have used similar metrics to judge whether the evidence of time variation in expected returns seems to imply profitable trading strategies. I am not aware of a paper that claims to find strong evidence that excess stock returns have been predictably negative, although that may be an extreme standard for defining market inefficiency since it ignores risk. 2.2.1. Short-term interest rates, expected inflation, and stock returns Using data from 1953–1971, Fama and Schwert (1977) documented a reliable negative relation between aggregate stock returns and short-term interest rates. Since Fama (1975) had shown that most of the variation in short-term interest rates was due to variation in expected inflation rates during this period, Fama and Schwert concluded that expected stock returns are negatively related to expected inflation. Table 5 shows estimates of the relation between stock returns and short-term interest rates or expected inflation rates for the period January 1831–May 2002, as well as for the 1953–1971 period analyzed by Fama and Schwert (1977). 
The dependent variable R_{mt} is the monthly return to an aggregate stock portfolio [based on the Schwert (1990) data for 1831–1925 and the CRSP value-weighted portfolio for 1926–2001, and the Standard and Poor’s composite for 2002],
\[ R_{mt} = a + g R_{ft} + e_t, \tag{3} \]
where R_{ft} is the yield on a short-term low-risk security (commercial paper yields from 1831–1925 and Treasury yields from 1926–2002). 7 The negative relation between expected stock returns and short-term interest rates is strongest for the 1953–1971 period, but the estimate is negative in all of the sample periods in Table 5 except 1926–1952, where it is close to zero, and it is reliably different from zero over 1831–1925. The t-statistic for 1972–2002 is −1.08. It is common to use the average difference between the return from a large portfolio of stocks and the yield on a short-term bond (R_{mt} − R_{ft}) as an estimate of the market risk
7
Schwert (1989) describes the sources and methods used to derive the short-term interest rate series.
Table 5
Relation between stock market returns and short-term interest rates or expected inflation (a), January 1831 – May 2002
Sample period (b)      R_ft               E(PPI_t) (c)       E(CPI_t) (d)
1831–2002 [2,053]      −2.073 (−3.50)      0.139 (0.93)      −0.591 (−0.68)
1831–1925 [1,136]      −3.958 (−4.58)      0.223 (1.53)
1926–1952 [324]         0.114 (0.03)      −0.056 (−0.10)     −0.580 (−0.46)
1953–1971 [228]        −5.559 (−2.57)     −0.412 (−0.43)     −2.448 (−1.13)
1972–2002 [365]        −1.140 (−1.08)     −0.612 (−0.95)     −1.258 (−1.29)
(a) R_{mt} = a + g X_t + e_t; X_t = R_{ft}, E(PPI_t), or E(CPI_t). R_{ft} is the yield on a one-month security (commercial paper from 1831–1925 and Treasury securities from 1926–2002). E(PPI_t) is the one-month-ahead forecast from a predictive model for PPI inflation: PPI_t = a_0 + g_0 R_{ft} + [(1 − \theta L)/(1 − \phi L)] e_t, which is a regression of PPI inflation on the short-term interest rate with ARMA(1,1) errors estimated with the prior 120 months of data. Similarly, E(CPI_t) is the one-month-ahead forecast from a predictive model for CPI inflation. Heteroskedasticity-consistent t-statistics are in parentheses next to the coefficient estimates.
(b) Sample size between brackets.
(c) 120 PPI observations are used to create the forecasting model, so the sample size from 1831–2002 is 1,932 and from 1831–1925 it is 1,015.
(d) CPI data are available from 1913–2002, and 120 observations are used to create the forecasting model, so the sample size from 1831–2002 is 952.
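Table 5 reports each coefficient with its t-statistic against zero, which is enough to recover an approximate t-statistic against any other null value. The sketch below backs out the standard error and re-centers the test; the helper name `t_stat_against` is hypothetical, and the result is only as precise as the rounded numbers in the table.

```python
def t_stat_against(coef, t_vs_zero, null_value):
    """Re-center a reported t-statistic on a different null hypothesis.

    The standard error is backed out as coef / t_vs_zero, so the result
    inherits the rounding in the published coefficient and t-statistic.
    """
    std_err = coef / t_vs_zero
    return (coef - null_value) / std_err


if __name__ == "__main__":
    # 1972-2002 row of Table 5: coefficient of R_ft and its t-statistic.
    print(round(t_stat_against(-1.140, -1.08, 1.0), 2))
```

Applied to the 1972–2002 interest-rate coefficient, this gives the test of a unit coefficient, which is the value implied when the market risk premium is measured as the stock return minus the short-term yield.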
premium [e.g., Ibbotson Associates (1998) and Brealey and Myers (2000)]. This model of the market risk premium implies that the coefficient of R_{ft} in Equation (3) should be 1.0, so that the negative estimates are even more surprising. For example, the t-statistic for the hypothesis that the coefficient of R_{ft} equals 1.0 for 1972–2002 is −2.03.
Table 5 also shows estimates of the relation between stock returns and two measures of the expected inflation rate, using the Consumer Price Index (CPI) and the Producer Price Index (PPI). The model for expected inflation uses a regression of the inflation rate on the short-term interest rate with ARMA(1,1) errors,
\[ PPI_t = a_0 + g_0 TB_t + \frac{(1 - \theta L)}{(1 - \phi L)} e_t, \tag{4} \]
where L is the lag operator, L^k X_t = X_{t-k}, estimated using the most recent 120 months of data to forecast inflation in month t + 1. 8 It is notable that the negative relation with stock returns is stronger for the interest rate R_{ft} than for either measure of the expected inflation rate, even though R_{ft} is a part of the prediction model for inflation.
8
This model is similar to the model used by Nelson and Schwert (1977) to model the CPI inflation rate from 1953–1977. It is a flexible model that is capable of representing a wide variety of persistence in the inflation data.
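The ratio of lag polynomials in the inflation model is compact but hides the recursion it implies. Writing u_t for the regression error (a symbol introduced here only for illustration), and \theta and \phi for the moving-average and autoregressive parameters, multiplying through by (1 − \phi L) gives the familiar ARMA(1,1) form:

```latex
\begin{align*}
PPI_t &= a_0 + g_0\, TB_t + u_t, \\
(1 - \phi L)\, u_t &= (1 - \theta L)\, e_t, \\
u_t &= \phi\, u_{t-1} + e_t - \theta\, e_{t-1}.
\end{align*}
```

The last line uses only the definition of the lag operator, L u_t = u_{t-1} and L e_t = e_{t-1}.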
This shows that the interest rate is not a close proxy for the expected inflation rate outside the 1953–1971 period. It also shows that the negative relation between stock returns and short-term interest rates is not always due to expected inflation.
Thus, the apparent ability of short-term interest rates to predict stock returns is strongest in the period used by Fama and Schwert (1977). Nevertheless, it does seem that excess returns on stocks are negatively related to interest rates, suggesting a slowly time-varying market risk premium. If the market risk premium varies because of underlying economic fundamentals, this is not an anomaly that would allow investors to trade to make abnormal profits.
2.2.2. Dividend yields and stock returns
Using CRSP data for the period 1927–1986, Fama and French (1988) showed that aggregate dividend yields predict subsequent stock returns. Many subsequent papers have amplified this finding and several have questioned aspects of the statistical procedures used, including Goyal and Welch (1999). Table 6 reproduces some of the main results from Fama and French (1988), but also uses the Cowles (1939) data for 1872–1926 and additional CRSP data for 1987–2000. The equation estimated by Fama and French is
\[ r(t, t + T) = a + d\, Y(t) + e(t, t + T), \tag{5} \]
where Y(t) = D(t)/P(t − 1), P(t) is the price at time t, D(t) is the dividend for the year preceding t, and r(t, t + T) is the continuously compounded nominal return from t to t + T. What is clear from Table 6 is that the incremental data both before and after the 1927–1986 period studied by Fama and French show a much weaker relation between aggregate dividend yields and subsequent stock returns. None of the t-statistics for the slope coefficient d are larger than 2.0, even for the 1872–2000 sample which includes the 1927–1986 data used by Fama and French (about half of the sample). This occurs because the slope estimates are much smaller and the explanatory power of the models (R²) is negligible.
Figure 1 illustrates the limitations of the dividend yield model for predicting stock returns. Figure 1a shows the predictions of stock returns from the model based on lagged dividend yield, D(t)/P(t − 1), for a one-year horizon based on estimates for 1927–1986 (the top row in the right-hand panel of Table 6). It also shows the one-year return to short-term commercial paper and Treasury securities. The model for 1927–1986 is used to predict stock returns both before and after the estimation sample, for the 1872–2000 period. Until 1961, the predicted stock return is always higher than the interest rate. However, starting in 1990, the predicted stock return is always below the interest rate. 9
9 Campbell and Shiller (1998) also stress the pessimistic implications of low aggregate dividend yields and apparently followed the advice of their model (Wall Street Journal, January 13, 1997).
Table 6
Relation between stock market returns and aggregate dividend yields (a), 1872–2000

                      Y(t) = D(t)/P(t − 1)                Y(t) = D(t)/P(t)
Return horizon, T     d       t(d)    R²      S(e)        d       t(d)    R²      S(e)

1927–1986, N = 60
1                     2.21    1.00    0.01    0.21        5.25    3.03    0.07    0.20
2                     6.88    2.78    0.08    0.30        8.85    3.53    0.09    0.29
3                     9.28    3.23    0.12    0.33       11.25    3.82    0.12    0.33
4                    12.05    4.00    0.16    0.36       12.55    4.54    0.12    0.37

1872–2000, N = 129
1                     0.53    0.52   −0.01    0.18        1.27    1.16    0.00    0.18
2                     2.03    1.44    0.01    0.26        1.11    0.66   −0.01    0.26
3                     2.30    1.33    0.00    0.30        2.17    1.04    0.00    0.30
4                     3.87    1.83    0.02    0.34        3.40    1.42    0.01    0.34

1872–1926, N = 55
1                     0.84    0.64   −0.01    0.16        0.55    0.29   −0.02    0.16
2                     2.29    1.20    0.00    0.22       −1.14   −0.47   −0.02    0.22
3                     1.49    0.70   −0.01    0.24        1.16    0.42   −0.02    0.24
4                     3.51    1.40    0.01    0.28        4.48    1.39    0.01    0.28

(a) r(t, t + T) = a + dY(t) + e(t, t + T). P(t) is the price at time t. Y(t) equals either D(t)/P(t) or D(t)/P(t − 1), where D(t) is the dividend for the year preceding t. r(t, t + T) is the continuously compounded nominal return from t to t + T to the CRSP value-weighted portfolio from 1926–2000 and to the Cowles portfolio from 1872–1925. The regressions for two-, three- and four-year returns use overlapping annual observations. The t-statistics t(d) use heteroskedasticity- and autocorrelation-consistent standard error estimates. R² is the coefficient of determination, adjusted for degrees of freedom, and S(e) is the standard error of the regression.
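The regression described in the note to Table 6 can be sketched in a few lines. The following is only an illustration under stated assumptions, not the procedure used to produce the table: the function name, the input layout, the Bartlett (Newey-West) weighting and the choice of T − 1 lags for the autocorrelation correction are all assumptions introduced here, and the numbers in the usage lines are made up.

```python
import math

def predictive_regression(returns, yields, horizon):
    """OLS of r(t, t+T) on Y(t) with a Newey-West standard error for the slope.

    Hypothetical input layout: returns[t] is the continuously compounded
    return from t to t+1, and yields[t] is the dividend yield known at t.
    """
    n = len(returns) - horizon + 1
    # Overlapping T-period log returns are sums of one-period log returns.
    y = [sum(returns[t:t + horizon]) for t in range(n)]
    x = list(yields[:n])
    x_bar, y_bar = sum(x) / n, sum(y) / n
    xd = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in xd)
    d = sum(v * yi for v, yi in zip(xd, y)) / sxx      # slope
    a = y_bar - d * x_bar                              # intercept
    resid = [yi - a - d * xi for xi, yi in zip(x, y)]
    # Bartlett-weighted autocovariances of (x - x_bar) * residual.
    u = [v * e for v, e in zip(xd, resid)]
    lags = horizon - 1  # assumption: overlap induces correlation up to T-1 lags
    s = sum(v * v for v in u)
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)
        s += 2.0 * weight * sum(u[t] * u[t - j] for t in range(j, n))
    se_d = math.sqrt(max(s, 0.0)) / sxx
    ssr = sum(e * e for e in resid)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    adj_r2 = 1.0 - (ssr / (n - 2)) / (sst / (n - 1))   # degrees-of-freedom adjusted
    return {"a": a, "d": d, "se_d": se_d, "adj_r2": adj_r2, "n": n}

# Toy usage with made-up numbers (not data from Table 6).
fit = predictive_regression([0.1, 0.9, 2.1, 2.9], [0.0, 1.0, 2.0, 3.0], horizon=1)
print(fit["d"], fit["a"], fit["se_d"], fit["adj_r2"])
```

With horizon = 1 the correction uses no lags and reduces to a heteroskedasticity-consistent standard error; for longer horizons the overlapping observations make the residuals serially correlated, which is why the table note reports autocorrelation-consistent t-statistics.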
Figure 1b shows the investment results that would have occurred from following a strategy of investing in short-term bonds, rather than stocks, when the dividend yield model in Table 6 predicts stock returns lower than interest rates. Both that strategy and a benchmark buy-and-hold strategy start with a $1000 investment in 1872. By the end of 1999, the buy-and-hold strategy is worth almost $6.7 million, whereas the dividend yield asset allocation strategy is worth just over $2.2 million. This large difference reflects the high stock returns during the 1990s when the dividend yield model would have predicted low stock returns. In short, the out-of-sample prediction performance of this model would have been disastrous. 10
10 Of course, it is possible that a less extreme asset-allocation model that reduced exposure to stocks when dividend yields were low relative to interest rates would perform better.
Fig. 1a. Predictions of stock returns based on lagged dividend yields, D(t)/P(t − 1), and the regression sample from 1927–1986 versus interest rates, 1872–2000. Solid line, interest rate; dashed line, predicted stock return.
Fig. 1b. Value of $1000 invested in stocks (“buy-and-hold”) versus a strategy based on predictions of stock returns from a regression on lagged dividend yields, D(t)/P(t − 1), from 1927–1986. When predicted stock returns exceed interest rates, invest in stocks for that year. When predicted stock returns are below interest rates, invest in short-term money market instruments, 1872–2000. Black line, buy-and-hold; grey line, dividend yield strategy.
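The switching rule described for Figure 1b can be written down directly. The sketch below is illustrative only: the function name and inputs are hypothetical, returns are treated as simple annual returns, and a tie between the predicted stock return and the interest rate is assumed to go to the money market instrument, a case the text does not specify.

```python
def simulate_strategies(stock_returns, bill_returns, predicted_returns, initial=1000.0):
    """Compare buy-and-hold with the dividend-yield switching rule.

    Each list holds one simple annual return per year; predicted_returns[t]
    is the forecast made at the start of year t (hypothetical inputs).
    """
    buy_and_hold = initial
    switching = initial
    for r_stock, r_bill, r_hat in zip(stock_returns, bill_returns, predicted_returns):
        buy_and_hold *= 1.0 + r_stock
        # Hold stocks only when the forecast exceeds the interest rate.
        switching *= 1.0 + (r_stock if r_hat > r_bill else r_bill)
    return buy_and_hold, switching

# Made-up three-year example: the forecast stays out of stocks in years 2 and 3.
bh, sw = simulate_strategies(
    stock_returns=[0.10, -0.20, 0.30],
    bill_returns=[0.05, 0.05, 0.05],
    predicted_returns=[0.08, 0.02, 0.03],
)
print(bh, sw)
```

In this toy case the rule avoids the loss in the second year but also misses the gain in the third, which is the kind of trade-off behind the gap between the two lines in Figure 1b.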
3. Returns to different types of investors
3.1. Individual investors
One simple corollary of the efficient markets hypothesis is that uninformed investors should be able to earn “normal” rates of return. It should be just as hard to select stocks that will under-perform as to select stocks that will out-perform the market; otherwise, a strategy of short-selling or similarly taking opposite positions would earn above-normal returns. Of course, investors who trade too much and incur unnecessary
and unproductive transactions costs should earn below-normal returns net of these costs.
Odean (1999) examined data from 10 000 individual accounts randomly selected from a large national discount brokerage firm for the period 1987–1993. This sample covers over 160 000 trades. Because the data source is a discount brokerage firm, recommendations from a retail broker are presumably not the source of information used by investors to make trading decisions. Odean found that traders lower their returns through trading, even ignoring transactions costs, because the stocks they sell earn higher subsequent returns than the stocks they purchase.
Barber and Odean (2000, 2001) used different data from the same discount brokerage firm and found that active trading accounts earn lower risk-adjusted net returns than less-active accounts. They have also found that men trade more actively than women and thus earn lower risk-adjusted net returns and that the stocks that individual investors buy subsequently under-perform the stocks that they sell.
The results in these papers are anomalies, but not because trading costs reduce net returns, or because men trade more often than women. They are anomalies because it seems that these individual investors can identify stocks that will systematically under-perform the Fama–French three-factor model in Equation (2). One potential clue in Odean (1999) is that these investors tend to sell stocks that have risen rapidly in the recent weeks, suggesting that the subsequent good performance of these stocks is due to the momentum effect described earlier. By going against momentum, these individual investors may be earning lower returns.
3.1.1. Closed-end funds
The closed-end fund puzzle has been recognized for many years. Closed-end funds generally trade in organized secondary trading markets, such as the NYSE. Since marketable securities of other firms constitute most of the assets of closed-end funds, it is relatively easy to observe both the value of the stock of the closed-end fund and the value of its assets. On average, in most periods, the fund trades at less than the value of its underlying assets, which leads to the “closed-end fund discount” anomaly.
Thompson (1978) was one of the first to carefully show that closed-end fund discounts could be used to predict above-normal returns to the shares of closed-end funds. Lee, Shleifer and Thaler (1991) argued that the time-series behavior of closed-end fund discounts is driven by investor sentiment, with discounts shrinking when individual investors are optimistic. They found that discounts shrink at the same time that returns to small-capitalization stocks are relatively high.
Pontiff (1995) updated and extended Thompson’s tests and found that the abnormal returns to closed-end funds are due to mean reversion in the discount, not to unusual returns to the assets held by the funds. In other words, when the prices of closed-end fund shares depart too much from their asset values, the difference tends to grow smaller, leading to higher-than-average returns to these shares.
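A small numerical sketch may help show why a shrinking discount raises the return to the fund's shares even when the underlying assets earn nothing unusual. The definitions below (discount measured relative to net asset value, no distributions during the period) and the function names are assumptions made for illustration, not the measures used in the papers cited.

```python
def discount(price, nav):
    """Closed-end fund discount as a fraction of net asset value (NAV)."""
    return (nav - price) / nav

def share_return(nav_return, discount_start, discount_end):
    """Return to the fund's shares implied by the NAV return and the change
    in the discount, ignoring dividends and other distributions."""
    return (1.0 + nav_return) * (1.0 - discount_end) / (1.0 - discount_start) - 1.0

# Made-up example: assets earn 0%, the discount narrows from 20% to 10%.
print(discount(80.0, 100.0))
print(share_return(0.0, 0.20, 0.10))
```

In this example the shares earn 12.5% although the portfolio itself is flat, so the abnormal return comes entirely from the discount moving back toward zero.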
Since the anomaly here pertains to the prices of the closed-end fund shares, not to the underlying investment portfolios, and since closed-end fund shares are predominantly held by individual investors, this evidence sheds light on the investment performance of some individual investors.
3.2. Institutional investors
Studies of the investment performance of institutional investors date back at least to Cowles (1933). Cowles concluded that professional money managers did not systematically outperform a passive index fund strategy (although he did not use the term “index fund”). There is an extensive literature studying the returns to large samples of open-end mutual funds and, more recently, to private hedge funds.
3.2.1. Mutual funds
Hendricks, Patel and Zeckhauser (1993) have found short-run persistence in mutual fund performance, although the strongest evidence is of a “cold-hands” phenomenon whereby poor performance seems more likely to persist than would be true by random chance.
Malkiel (1995) studied a database from Lipper that includes all open-end equity funds that existed in each year of the period 1971–1991. Unlike many mutual fund databases that retroactively omit funds that go out of business or merge, Malkiel’s data do not suffer from the survivorship bias stressed by Brown, Goetzmann, Ibbotson and Ross (1992). Malkiel found that mutual funds earn gross returns that are consistent with the CAPM in Equation (1) and net returns that are inferior because of the expenses of active management. He also found evidence of performance persistence for the 1970s, but not for the 1980s.
Carhart (1997) also used a mutual-fund database that is free of survivorship bias and found that the persistence identified by Hendricks, Patel and Zeckhauser (1993) is explainable by the momentum effect for individual stocks described earlier. After taking this into account, the only evidence of persistent performance of open-end funds is that poorly performing managers have “cold hands”.
3.2.2. Hedge funds
The problem of assessing performance for hedge funds is complicated by the unusual strategies used by many of these funds. Fung and Hsieh (1997) showed that hedge fund returns are not well characterized as fixed linear combinations of traditional asset classes, similar to the Fama–French three-factor model. Because of changing leverage, contingent claims, and frequent changes in investment positions, traditional fund performance measures are of dubious value.
3.2.3. Returns to IPOs
The large returns available to investors who can purchase stocks in underwritten firm-commitment initial public offerings (IPOs) at the offering price have been the subject of many papers, dating at least to Ibbotson (1975). Most of the literature on high average initial returns to IPOs focuses on the implied underpricing of the IPO stock and the effects on the issuing firm, but this evidence has equivalent implications for abnormal profits to IPO investors. Several theories have been developed to explain the systematic underpricing of IPO stocks (see chapter 5 in this Handbook by Ritter). Many of these theories point to the difficulty of individual investors in acquiring the most underpriced of IPOs, which is why I include this discussion in the section under returns to institutional investors.
How large are the returns to IPO investing? Figure 2a shows the cumulative value of a strategy of investing $1000 starting in January 1960 in a random sample of IPOs, selling after one month, and then re-investing in a new set of IPOs in the next month. The returns to IPOs are from Ibbotson, Sindelar and Ritter (1994) and are updated on Jay Ritter’s website [http://bear.cba.ufl.edu/ritter/ipoall.htm]. For comparison, Figure 2a also shows the value of investing in the CRSP value-weighted portfolio over the same period. By December 2001, the CRSP portfolio is worth about $74,000. On the other hand, the IPO portfolio strategy is worth over $533 × 10^33. Clearly, no one has been able to follow this strategy, or people like Bill Gates and Warren Buffett would be viewed as rank amateurs in the wealth-creation business!
Fig. 2a. Value each month of $1000 invested in January 1960 in a random sample of IPOs. At the end of each month, the IPO stocks are sold and the proceeds invested in a new sample of IPOs in the next month. The scale is logarithmic and the December 2001 value of the IPO strategy is over $533 × 10^33. For comparison, the strategy of investing $1000 in the CRSP value-weighted market portfolio in January 1960 is worth almost $74,000 by December 2001. Plotted series: IPO strategy; CRSP market strategy.
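To get a feel for the magnitudes in Figure 2a, the terminal values can be converted into implied average compound monthly returns. This is a back-of-the-envelope sketch: it assumes 504 monthly compounding periods (January 1960 through December 2001) and uses the rounded terminal values quoted in the text, so the results are approximate; the function name is hypothetical.

```python
def implied_periodic_return(start_value, end_value, periods):
    """Constant per-period return that grows start_value into end_value."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0

months = 42 * 12  # assumption: January 1960 through December 2001

# Terminal values as quoted in the text ("over $533 x 10^33", "about $74,000").
ipo_monthly = implied_periodic_return(1000.0, 533e33, months)
crsp_monthly = implied_periodic_return(1000.0, 74000.0, months)

print(f"IPO strategy:  {ipo_monthly:.1%} per month")
print(f"CRSP strategy: {crsp_monthly:.2%} per month")
```

Under these assumptions the IPO strategy compounds at roughly 16% per month against a little under 1% per month for the market portfolio, which makes clear why the strategy cannot be feasible at any meaningful scale.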
What are the impediments to IPO investing as a strategy for earning abnormal returns? First, it is difficult to be included in the allocations made by the underwriters. Investment banks usually allocate shares first to large institutional customers (see, e.g., Wall Street Journal, January 27, 2000). If the institutional customers can distinguish between deals that are more underpriced and those that are less underpriced, then the shares available to individual investors are likely to offer lower initial returns. It has also been alleged that in exchange for potential favors (“spinning”), investment banks allocate shares to preferred individual clients such as politicians, including House Speaker Thomas Foley (Wall Street Journal, July 20, 1993) and Senator Alphonse D’Amato, a prominent member of the Senate Banking Committee (Wall Street Journal, June 6, 1996), or to the executives of private firms that are considering going public in the near future (see, e.g., Wall Street Journal, November 12, 1997). Thus, a typical individual investor would have difficulty acquiring shares in the IPOs that are most underpriced. Second, many investment banks discourage the practice of buying shares in an IPO and then selling the shares in the secondary market (“flipping”). Forcing IPO investors to hold shares for more than a month, for example, would increase the risk and costs of pursuing the IPO strategy outlined above (although it would still seem extremely profitable). To the extent that underwriters sometimes provide informal price support in the after-market by buying shares at a price close to the IPO price, it is clear why they would want to discourage flipping when initial returns are negative. On the other hand, when the after-market price rises dramatically and volume is high, flipping is beneficial to the underwriter by increasing market-maker profits. 
It is necessary for some investors who purchased shares in the IPO to sell their shares to create a public float and therefore liquidity. Indeed, there has been recent acknowledgement that flipping is useful in helping to create liquidity (see, e.g., Wall Street Journal, February 2, 2000).
Another unusual feature of IPO returns is their apparent persistence, shown in Figure 2b. While average IPO returns are positive in almost every month from 1960 to 2001, there seem to be very noticeable cycles in these returns, with high returns following high returns and vice versa. According to Lowry and Schwert (2002), these cycles are explained by two important factors. First, the types of firms that go public tend to be clustered in time, so that cross-sectional differences in IPO returns that may be due to information asymmetry, for example, show up in average returns across IPOs. Second, the learning that occurs during the registration period (as underwriters talk to potential investors) affects IPO prices and subsequent returns for the similar-type firms that are in the IPO process at the same time, and this process usually lasts more than one month. Lowry and Schwert argue that firms cannot use the persistence in IPO returns shown in Figure 2b to optimally time their IPOs (trying to minimize initial returns). By analogy, investors cannot time their participation in the IPO market (trying to maximize their returns).
Fig. 2b. Ibbotson, Sindelar and Ritter’s (1994) monthly data on the average initial returns to IPO investors, January 1960 to December 2001. Vertical axis: monthly percentage return to IPOs.
Thus, while IPOs seem to offer large abnormal returns to investors who can obtain shares in the IPO allocation, it is not clear that this is an anomaly that can benefit most investors.
3.3. Limits to arbitrage
It has long been recognized that transactions costs can limit the ability of traders to profit from mispricing [e.g., Jensen (1978)]. The question of how market frictions affect asset prices and allow apparent anomalies to persist has received increasing attention in recent years. Shleifer and Vishny (1997) have argued that agency problems associated with professional money managers, along with transactions costs, can cause mispricing to persist and that many anomalies are a result of such market frictions. Pontiff (1996) has shown that the absolute values of closed-end fund discounts and premiums are correlated with various measures of the costs of trying to arbitrage mispricing, including the composition of the funds’ portfolios and the level of interest rates.
Table 7 lists nine papers that appeared in a special issue of the Journal of Financial Economics, all of which study the effects on asset prices of various kinds of frictions. Several of these papers contain evidence similar to Pontiff’s in that the extent of apparent pricing anomalies is correlated with the size of transactions costs.
Table 7
Contents of the Special Issue of the Journal of Financial Economics on the Limits to Arbitrage, Vol. 66(2–3), November/December 2002

Authors                                                      Paper title
Joseph Chen, Harrison Hong and Jeremy C. Stein               Breadth of ownership and stock returns
Charles M. Jones and Owen A. Lamont                          Short sale constraints and stock returns
Christopher C. Geczy, David K. Musto and Adam V. Reed        Stocks are special too: An analysis of the equity lending market
Gene D’Avolio                                                The market for borrowing risk
Darrell Duffie, Nicolae Garleanu and Lasse Heje Pedersen     Securities lending, shorting, and pricing
Dilip Abreu and Markus K. Brunnermeier                       Synchronization risk and delayed arbitrage
Denis Gromb and Dimitri Vayanos                              Equilibrium and welfare in markets with financially constrained arbitrageurs
Randolph B. Cohen, Paul A. Gompers and Tuomo Vuolteenaho     Who underreacts to cash-flow news? Evidence from trading between individuals and institutions
Arvind Krishnamurthy                                         The bond/old-bond spread

4. Long-run returns
DeBondt and Thaler (1985) tracked the returns to “winner” and “loser” portfolios for 36 months after portfolio formation and noted a slow drift upward in the cumulative abnormal returns (CARs) of loser stocks that had performed poorly in the recent past. They interpret this result as evidence of excessive pessimism following poor performance, making the stocks of loser firms profitable investments.
Ball, Kothari and Shanken (1995) have argued that poor stock return performance will generally lead to higher leverage, because the value of the stock drops more than the value of the firm’s debt. The increase in leverage should lead to higher risk and higher expected returns than would be reflected in risk estimates from a period before the drop in stock price. They have also pointed out that many of the stocks earning the highest returns have very low prices, so that microstructure effects, such as a large proportional bid–ask spread, can reduce subsequent performance by large amounts.
4.1. Returns to firms issuing equity
Using both CARs and buy-and-hold abnormal returns (BHARs), Ritter (1991) measured post-IPO stock performance and concluded that IPO stocks yield below-normal returns in the 36 months following the IPO. He interpreted this result as evidence that investors become too optimistic about IPO firms, inflating the initial IPO return (from the IPO price to the secondary market trading price), and lowering
subsequent returns. Loughran and Ritter (1995) extended Ritter’s analysis using a sample of IPOs from 1970–1990.
Brav and Gompers (1997) and Brav, Geczy and Gompers (2000) have studied the returns to IPO firms for the period 1975–1992 and found that underperformance is concentrated primarily in small firms with low book-to-market ratios. They argue that this is the same behavior as seen by Fama and French (1993) in their tests of their three-factor model and that the IPO anomaly is thus a manifestation of a general problem in pricing small firms with low book-to-market ratios. Brav, Geczy and Gompers (2000) also studied seasoned equity offerings (SEOs) and found that momentum, in addition to the Fama–French three-factor model, helps explain the behavior of returns after SEOs.
Eckbo, Masulis and Norli (2000) have shown that the reduction in leverage that occurs when new equity is issued reduces subsequent equity risk exposure and thus contributes to the apparent unusual behavior of returns following SEOs.
Schultz (2003) used simulations to study the behavior of abnormal return measures after events that are triggered by prior stock price performance. For example, if a firm chooses to issue stock after its price has risen in the recent past, even if the stock price is fully rational, many of the popular measures of long-run abnormal returns will falsely reveal subsequent poor performance (he refers to this as “pseudo-market timing”). The driving force behind his result is that the covariance between current excess returns and the number of future offerings is positive.
Many papers have analyzed long-run stock returns following a variety of events and a large number of papers have also analyzed the properties of these long-run stock return tests and alternative hypotheses to explain these types of results.
Fama (1998) has argued that the problem of measuring normal returns is particularly important when measuring long-run returns, because model problems that may be small in a day or a month can be compounded into larger apparent effects over three or five years. He has also argued that most papers that attribute apparent abnormal stock returns to behavioral effects are not testing a specific alternative model. Recent papers by Barberis, Shleifer and Vishny (1998), Daniel, Hirshleifer and Subrahmanyam (1998, 2001) and Barberis and Shleifer (2003) are examples of models that make predictions for short- and long-run stock returns from irrational investor behavior. At this point, however, it is unclear whether these models have refutable predictions that differ from tests that have already been performed. Several papers have studied the statistical properties of long-run CARs and BHARs, including Barber and Lyon (1997), Kothari and Warner (1997) and Mitchell and Stafford (2000). All of these papers conclude that it is difficult to find long-run abnormal return measures that have well-specified statistical properties and reasonable power. Mitchell and Stafford (2000) argue that the calendar-time regression approach originally used by Jaffe (1974) and Mandelker (1974), and advocated by Fama (1998), provides more reliable inferences than long-run CARs or BHARs.
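The two long-run measures discussed in this section differ only in how abnormal performance is aggregated over time. The sketch below uses common textbook definitions, which are assumptions here rather than the exact constructions in any of the papers cited: the CAR sums period-by-period differences between the stock's return and a benchmark return, while the BHAR compares compounded returns. Function names and numbers are made up for illustration.

```python
def car(returns, benchmark):
    """Cumulative abnormal return: sum of per-period abnormal returns."""
    return sum(r - b for r, b in zip(returns, benchmark))

def bhar(returns, benchmark):
    """Buy-and-hold abnormal return: compounded return minus compounded benchmark."""
    firm, bench = 1.0, 1.0
    for r, b in zip(returns, benchmark):
        firm *= 1.0 + r
        bench *= 1.0 + b
    return firm - bench

# Made-up example: +10% then -10% against a flat benchmark.
stock = [0.10, -0.10]
flat = [0.0, 0.0]
print(car(stock, flat), bhar(stock, flat))
```

Here the CAR is zero while the BHAR is about −1%, because compounding a gain followed by an equal percentage loss leaves the investor below the starting value. Differences of this kind, together with any misspecification of the benchmark, accumulate over multi-year horizons.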
4.2. Returns to bidder firms
The returns to bidder firms’ stocks provide another example of potentially anomalous post-event behavior. Since at least Asquith (1983), researchers have noted that there is a pronounced downward drift in the cumulative abnormal returns to the stocks of firms that are bidders in mergers. One interpretation of this evidence is that bidders overpay and that it takes the market some time to gradually learn about this mistake.
Schwert (1996) analyzed the returns to 790 NYSE and Amex-listed bidders for the period 1975–1991 and found a negative drift of about 7% in the year following the announcement of the bid. He concluded, however, that the explanation for this drift is the unusually good stock return performance of the bidder firms in the period prior to the bid. To measure abnormal performance, he used a market model regression,
$$ R_{it} = a_i + b_i R_{mt} + e_{it}, \qquad (6) $$
where Rit is the return to the bidder firm and Rmt is the return to the CRSP value-weighted portfolio in period t, based on 253 daily returns in the year before the event analysis (which starts six months before the first bid is announced). Using the estimates of ai and bi, abnormal returns are estimated, averaged, and cumulated for the period from 127 trading days before the bid announcement to 253 trading days after the bid announcement,
$$ e_{ik} = R_{ik} - a_i - b_i R_{mk}, \qquad AR_k = \frac{1}{790} \sum_{i=1}^{790} e_{ik}, \qquad CAR_J = \sum_{k=-127}^{J} AR_k . \qquad (7) $$
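The steps in Equations (6) and (7) can be sketched as follows. This is a simplified illustration under stated assumptions, not the original computation: the function names and input layout are hypothetical, the market model is fitted by ordinary least squares, and the optional zero_intercept flag drops the estimated intercept ai when forming abnormal returns so that the two versions of the CAR can be compared.

```python
def market_model(stock, market):
    """OLS estimates (a_i, b_i) of R_it = a_i + b_i * R_mt + e_it."""
    n = len(stock)
    m_bar, s_bar = sum(market) / n, sum(stock) / n
    sxx = sum((m - m_bar) ** 2 for m in market)
    b = sum((m - m_bar) * (s - s_bar) for m, s in zip(market, stock)) / sxx
    return s_bar - b * m_bar, b

def cumulative_abnormal_returns(firms, market_event, zero_intercept=False):
    """firms: list of (a_i, b_i, event_returns_i); returns the CAR path."""
    path, total = [], 0.0
    for k, r_m in enumerate(market_event):
        # Average abnormal return across firms on event day k.
        ar_k = sum(
            r[k] - (0.0 if zero_intercept else a) - b * r_m
            for a, b, r in firms
        ) / len(firms)
        total += ar_k
        path.append(total)
    return path

# Made-up example: one firm that beat the market by 0.1% per day during
# estimation, then earned exactly the market return in the event window.
market_est = [0.0, 0.02, -0.02, 0.01]
stock_est = [m + 0.001 for m in market_est]
a_i, b_i = market_model(stock_est, market_est)
market_evt = [0.01, -0.005, 0.02]
firms = [(a_i, b_i, list(market_evt))]
print(cumulative_abnormal_returns(firms, market_evt))
print(cumulative_abnormal_returns(firms, market_evt, zero_intercept=True))
```

In this toy case the conventional CAR drifts down by the estimated intercept each day even though the stock merely tracks the market after the event, whereas the zero-intercept version stays flat.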
The dashed line in Figure 3 represents the CAR to the bidder firms in Equation (7). It drifts downward after the first bid announcement to about −8% a year afterwards. The solid line in Figure 3 represents a simple adjustment to the calculation of abnormal returns to bidders’ stocks: the intercept ai is set equal to zero. This adjusted cumulative abnormal return does not have a noticeable drift in Figure 3, which is consistent with the efficient markets hypothesis. The adjustment eliminates the negative drift in abnormal returns because the estimated intercepts in the market model are systematically positive for bidder stocks in the year and a half before the bid, reflecting the fact that bidder firms are more likely to have recently experienced good performance, at least in terms of their stock prices. This abnormally good performance vanishes after the first bid (as it should in an efficient market). Note that this does not mean that the bid somehow caused something bad to happen to the bidder firm; it simply means that bidders’ stock returns were normal in the period
Fig. 3. Cumulative average abnormal returns to bidder firms’ stocks from trading day −126 to +253 relative to the first bid for NYSE- and AMEX-listed target firms for the period 1975–1991. Market model parameters used to define abnormal returns are estimated using the CRSP value-weighted portfolio for days −379 to −127. The solid line shows the effect of setting the intercepts to zero, since the bidder firms seem to have abnormally high stock returns during the estimation period (shown by the dotted line that drifts downward from day −126 to day +253). There are 790 NYSE- or Amex-listed bidder firms that made the first bid for exchange-listed target firms in this period. Solid line, first bidder CAR (intercept = 0); dotted line, first bidder CAR. Vertical axis: cumulative average abnormal returns; horizontal axis: event date relative to first bid.
following the announcement of the bid. The unusually positive performance of bidders’ stocks before the bid is an example of sample selection bias: the decisions of bidder firms to pursue acquisitions are correlated with their past stock price performance.
It is important to note that it is not necessary to adjust the CAR for the sample of target firms. The CAR for target firms rises gradually before the first bid announcement, reflecting bid anticipation, and jumps on the day of the announcement. After that, it remains flat for the next year. In contrast with the bidder firms, the target firms’ intercepts from the estimated market models are not unusually large, reflecting neither positive nor negative stock price performance in the year and a half before they become targets.
Mitchell and Stafford (2000) used the calendar-time portfolio method suggested by Fama (1998) to measure abnormal returns to acquiring firms. They concluded that an equal-weighted portfolio of acquirers seems to earn negative abnormal returns over a three-year window following an acquisition, but that a value-weighted portfolio does not, using the Fama–French three-factor model in Equation (2) as a benchmark. This method of measuring the size and significance of abnormal returns is not affected by unusual prior performance in the same way as the CARs in Figure 3.
Loughran and Vijh (1997) compared buy-and-hold returns to bidders’ stocks measured five years after acquisitions with returns to control firms that are matched on size and book-to-market characteristics. They found that stock mergers are followed
by negative excess returns and cash tender offers are followed by positive excess returns. Since the choice of payment by the bidder is similar to a choice concerning equity financing, the sample selection issues raised by Schultz (2003) might affect the Loughran and Vijh (1997) results.
5. Implications for asset pricing Consistent with Fama’s observation (1970, 1976, 1998) that tests of market efficiency are necessarily joint tests of a model of expected returns, evidence of anomalies is also potentially evidence of a short-coming in the implied asset-pricing model used for the test. One example of this phenomenon that has created much activity in the finance literature in recent years is the Fama and French (1993) three-factor model, which incorporates the size and book-to-market anomalies into the asset-pricing model. 5.1. The search for risk factors An obvious question that arises from empirically motivated adjustments of assetpricing models is whether the new, extended model accurately describes equilibrium behavior, or is just a convenient offshoot of the anomalous findings that motivated the extension. For example, the simple two-parameter CAPM of Sharpe (1964) and Lintner (1965) was motivated by portfolio theory. Many people have developed extensions of theoretical asset-pricing models that include multiple factors (see, for example, chapters 10, 11 and 12 in this Handbook by Dybvig and Ross, Duffie, and Ferson, respectively), although none of these models match closely with the empirical Fama– French model. On the other hand, as Fama and French (1993) have pointed out, some versions of multifactor models are vague about the risk factors that might lead to differences in expected returns across assets, so that their empirical proxies (size and book-to-market) may be reflecting equilibrium trade-offs between risk and expected return. The Fama and French (1993, 1996) tests are consistent with their three-factor model being an adequate asset-pricing model, in the sense that the intercepts in their regression tests (measuring average abnormal returns to different portfolio strategies) are not reliably different from zero. 
11 There is at least one other issue that must be addressed, however, before concluding that the three-factor model is an accurate equilibrium-pricing model. As noted by MacKinlay (1995), the estimates of factor risk premiums from the Fama–French model seem very high, particularly for the book-to-market factor. In some ways,
11 An exception is that the Fama–French (1993) portfolio of the smallest firms with the lowest book-tomarket ratios has a reliably negative intercept. Also, as mentioned above, the Fama–French model does not seem to explain the momentum evidence.
this is analogous to the “equity premium puzzle” that has been frequently discussed in the macro-finance literature (see chapter 13 in this Handbook by Campbell). If the estimates of risk premiums are too high (or too low) to be consistent with the underlying economic theory that motivates the model, the fact that average returns are linearly related to the risk factors is not sufficient to conclude that the market is efficient. If the book-to-market premium is too high, as argued by MacKinlay, then returns vary too much with this risk factor. From this perspective, the evidence that the three-factor model provides a good linear model of risk and return may be just a fortuitous description of an anomaly.

5.2. Conditional asset pricing

The evidence on time-varying expected returns has obvious implications for the growing literature on conditional asset-pricing models. On the other hand, the poor out-of-sample performance of some of the predictor variables raises questions about their role in asset prices.

5.3. Excess volatility

I have not addressed the question raised by Shiller (1981a,b) of whether stock market volatility is “too high”. His provocative papers on “excess volatility” stimulated many rebuttals, including Kleidon (1986) and Marsh and Merton (1986), that raised questions about the validity and robustness of his statistical methods. While I have written many papers on the behavior of stock volatility, some of which raise questions about why volatility varies over time as much as it does [e.g., Schwert (1989)], I do not believe that this literature is closely linked with the literature on anomalies and market efficiency. In my 1991 review of Shiller (1989), I argue that Shiller’s research on excess volatility is really a test of a particular valuation model and provides no guidance on how to identify or profit from mispricing.

5.4. The role of behavioral finance

Finally, there is the issue of whether the findings in the anomalies literature can be combined with behavioral theories from the psychology literature to create improved asset-pricing models that join economic equilibrium concepts with psychological ones (see chapter 18 by Barberis and Thaler). My impression, to date, is that the attempts to proceed in this direction have produced models that might explain some of the existing anomalies, but they make no predictions for observable behavior that have not already been tested extensively. 12 In other words, the new behavioral theories have not yet made predictions that are
12 Fama (1998) is less sympathetic to the ability of these new models to explain existing anomalies.
refutable with new tests. Going beyond the stage of building theories to explain the “stylized facts” that already exist will be a significant challenge.
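The intercept tests described in Section 5.1 can be illustrated with a minimal sketch. It regresses one portfolio's excess returns on three factor returns and reports the intercept with its t-statistic. Everything specific here is an assumption made for illustration: the data are simulated rather than actual Fama–French factors, the function name `intercept_test` is hypothetical, and the standard error is the plain OLS one, which is not necessarily the procedure used by Fama and French (1993, 1996).

```python
import numpy as np


def intercept_test(excess_ret, factors):
    """OLS of excess returns on a constant and factor returns.

    Returns (alpha, t-statistic of alpha, factor loadings).
    Hypothetical helper for illustration only.
    """
    y = np.asarray(excess_ret, dtype=float)
    f = np.asarray(factors, dtype=float)
    n_obs = len(y)
    x = np.column_stack([np.ones(n_obs), f])       # constant + factors
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)   # OLS coefficients
    resid = y - x @ coef
    dof = n_obs - x.shape[1]                       # degrees of freedom
    sigma2 = resid @ resid / dof                   # residual variance
    cov = sigma2 * np.linalg.inv(x.T @ x)          # classical OLS covariance
    se = np.sqrt(np.diag(cov))
    return coef[0], coef[0] / se[0], coef[1:]


# Simulated monthly data (assumed means and volatilities, not estimates).
rng = np.random.default_rng(0)
n_months = 360
factors = rng.normal([0.005, 0.002, 0.004], [0.045, 0.03, 0.03],
                     size=(n_months, 3))
true_betas = np.array([1.0, 0.5, 0.3])
# The portfolio is generated with a zero intercept, i.e. no abnormal return.
excess_ret = factors @ true_betas + rng.normal(0.0, 0.02, size=n_months)

alpha, t_alpha, betas = intercept_test(excess_ret, factors)
print(f"alpha = {alpha:.4f} per month, t(alpha) = {t_alpha:.2f}")
print("estimated loadings:", np.round(betas, 2))
```

With actual data, the three columns of `factors` would hold the market, size and book-to-market factor returns, and an intercept far from zero relative to its standard error would indicate an average abnormal return. The sketch tests one portfolio at a time; a joint test across many portfolios would need a different statistic.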
6. Implications for corporate finance

What implications do market efficiency and anomalies have for corporate finance? The standard textbook treatment of corporate finance in an efficient market [for example, Brealey and Myers (2000)] tells firms to choose projects that maximize value, and perhaps choose capital structures or dividend policies that create value, but to take the market prices of their stocks and bonds as given and more or less correct.

6.1. Firm size and liquidity

How would the kinds of anomalies discussed above change this advice, if at all? To the extent that the small-firm effect is real, firms that merge and become larger would have a lower cost of capital, and therefore a higher value. But this kind of financial synergy is hard to believe. In fact, it raises the question of whether firm size somehow proxies for a more fundamental source of risk or value.

Amihud and Mendelson (1986) have argued that firm size proxies for the illiquidity of the stock and that higher transactions costs for small firms raise the required gross return so that net expected returns are equalized, given the risk of the stock. In their empirical work, they found that the cross-sectional dispersion in average returns across portfolios of NYSE stocks sorted on bid–ask spreads is similar to the dispersion in average returns across portfolios sorted on risk estimates. From this perspective, size is not a risk factor, but rather a proxy for differential transactions costs. 13 Thus, actions that increase the liquidity of a firm’s stock would reduce required returns and increase the stock price if such actions were costless. Decisions on whether the firm should undertake policies that increase liquidity depend on whether the benefits exceed the costs. There has been much recent work on the linkages between market microstructure, asset pricing, and corporate finance (see chapter 17 in this Handbook by Easley and O’Hara).

6.2. Book-to-market effects

Fama and French interpret the book-to-market ratio as an indicator of “value” versus “growth” stocks, and the HML risk factor as reflecting “distress risk”. In their tests, firms with high book-to-market ratios or risk sensitivities are often firms whose value
13 The apparent disappearance of the size effect discussed in Section 2.1, if true, would be problematic for the liquidity effect unless small-capitalization stocks have relatively low transactions costs in recent years.
has fallen recently because of bad performance. These firms are more likely to suffer financial distress costs in future periods if further bad news hits.

To the extent that Fama and French (1993) are correct that SMB and HML reflect priced risk factors, then reducing a firm’s exposure to these types of risk would lower the expected return on its stock, and therefore, its cost of capital. Such a change would not increase the value of the firm, however, so there is no obvious prescription for managerial behavior. If Daniel and Titman (1997) are correct that firms with lower book-to-market ratios have lower expected returns, holding risk constant, then corporate financial policies designed to lower B/M would improve firm value by lowering the cost of capital. Of course, holding book value constant, this is equivalent to increasing the market value of the stock, which is generally good for shareholders (and not a new insight).

In the corporate finance literature, the book-to-market ratio has been interpreted as a measure of the type of investment opportunities that are available to the firm. For example, Smith and Watts (1992) have interpreted high book-to-market firms as those with “assets-in-place” and low book-to-market firms as those with relatively more “growth options”. From this perspective, the fact that accounting book values make no attempt to measure the value of growth options drives the cross-sectional dispersion in book-to-market ratios. Interpreted this way, the book-to-market ratio is exogenous and reflects the investment opportunity set facing the firm. It would not make sense, for example, to advise firms to sell assets in place and invest in growth options just to lower book-to-market and, from the perspective of Daniel and Titman, to lower the cost of capital.

There has also been a substantial literature using Tobin’s Q-ratio (a close relative of book-to-market) as a proxy for the efficiency with which managers use corporate assets. Dating back at least to Mørck, Shleifer and Vishny (1988), high book-to-market ratios have been interpreted as indicating poor performance and possibly the existence of agency problems between stockholders and managers.

The fact that the same empirical proxy has been used in three quite different ways raises serious questions about interpreting any of this evidence in a normative way to give firms or managers advice about corporate financial policy.

6.3. Slow reaction to corporate financial policy

Much of the literature studying long-horizon returns focuses on corporate financial policy decisions such as IPOs, seasoned equity offerings, share repurchases, merger bids, and so forth. A common theme in this literature is that there is a slow drift in the stock price of the firm after the event, apparently reflecting a gradual process of learning the good or bad news associated with the event. A slow reaction is inconsistent with the efficient markets hypothesis.

As mentioned above, the papers that have systematically studied the behavior of long-horizon performance measures found that they have low power and unreliable statistical properties in most situations. Even if one were to accept the premise that
the market learns very slowly about the implications of changes in corporate financial policy, the uncertainty associated with the future price performance for an individual firm over a period of one to five years is so great that it would be senseless to advise that firms choose their financial policies so as to take advantage of market mispricing that is only corrected after five years.

7. Conclusions

This chapter highlights some interesting findings that have emerged from empirical research on the behavior of asset prices and discusses the implications of these findings for the way academics and practitioners use financial theory. In the process, I have replicated and extended some puzzling findings that have been called anomalies because they do not conform with the predictions of accepted models of asset pricing.

One of the interesting findings from the empirical work in this chapter is that many of the well-known anomalies in the finance literature do not hold up in different sample periods. In particular, the size effect and the value effect seem to have disappeared after the papers that highlighted them were published. At about the same time, practitioners began offering investment vehicles that implemented the strategies implied by the academic papers.

The weekend effect and the dividend yield effect also seem to have lost their predictive power after the papers that made them famous were published. In these cases, however, I am not aware of any practitioners who have tried to use these anomalies as a major basis of their investment strategy.

The small-firm turn-of-the-year effect became weaker in the years after it was first documented in the academic literature, although there is some evidence that it still exists. Interestingly, however, it does not seem to exist in the portfolio returns of practitioners who focus on small-capitalization firms. Likewise, the evidence that stock market returns are predictable using variables such as dividend yields or inflation is much weaker in the periods after the papers that documented these findings were published.

All of these findings raise the possibility that anomalies are more apparent than real. The notoriety associated with the findings of unusual evidence tempts authors to investigate puzzling anomalies further and later to try to explain them. But even if the anomalies existed in the sample period in which they were first identified, the activities of practitioners who implement strategies to take advantage of anomalous behavior can cause the anomalies to disappear (as research findings cause the market to become more efficient).

References

Abreu, D., and M.K. Brunnermeier (2002), “Synchronization risk and delayed arbitrage”, Journal of Financial Economics 66:341–360.
Amihud, Y., and H. Mendelson (1986), “Asset pricing and the bid-ask spread”, Journal of Financial Economics 17:223–249.
Asquith, P. (1983), “Merger bids, uncertainty, and stockholder returns”, Journal of Financial Economics 11:51–83.
Baker, M., and J. Wurgler (2000), “The equity share in new issues and aggregate stock returns”, Journal of Finance 55:2219–2257.
Ball, R. (1978), “Anomalies in relationships between securities’ yields and yield-surrogates”, Journal of Financial Economics 6:103–126.
Ball, R., S.P. Kothari and J. Shanken (1995), “Problems in measuring portfolio performance: an application to contrarian investment strategies”, Journal of Financial Economics 38:79–107.
Banz, R. (1981), “The relationship between return and market value of common stocks”, Journal of Financial Economics 9:3–18.
Barber, B.M., and J.D. Lyon (1997), “Detecting long-run abnormal stock returns: empirical power and specification of test statistics”, Journal of Financial Economics 43:341–372.
Barber, B.M., and T. Odean (2000), “Trading is hazardous to your wealth: the common stock investment performance of individual investors”, Journal of Finance 55:773–806.
Barber, B.M., and T. Odean (2001), “Boys will be boys: gender, overconfidence, and common stock investment”, Quarterly Journal of Economics 116:261–292.
Barberis, N., and A. Shleifer (2003), “Style investing”, Journal of Financial Economics 68:161–199.
Barberis, N., A. Shleifer and R. Vishny (1998), “A model of investor sentiment”, Journal of Financial Economics 49:307–343.
Basu, S. (1977), “Investment performance of common stocks in relation to their price-earnings ratios: a test of the efficient market hypothesis”, Journal of Finance 32:663–682.
Basu, S. (1983), “The relationship between earnings’ yield, market value and return for NYSE common stocks: further evidence”, Journal of Financial Economics 12:129–156.
Black, F. (1971), “Random walk and portfolio management”, Financial Analysts Journal 27:16–22.
Booth, D.G., and D.B. Keim (2000), “Is there still a January effect?” in: D.B. Keim and W.T. Ziemba, eds., Security Market Imperfections in Worldwide Equity Markets (Cambridge University Press, Cambridge) pp. 169–178.
Brav, A., and P.A. Gompers (1997), “Myth or reality? The long-run underperformance of initial public offerings: evidence from venture and nonventure capital-backed companies”, Journal of Finance 52:1791–1821.
Brav, A., C. Geczy and P.A. Gompers (2000), “Is the abnormal return following equity issuances anomalous?”, Journal of Financial Economics 56:209–249.
Brealey, R.A., and S.C. Myers (2000), Principles of Corporate Finance, 6th Edition (McGraw-Hill, New York).
Brennan, M.J., and A. Subrahmanyam (1996), “Market microstructure and asset pricing: on the compensation for illiquidity in stock returns”, Journal of Financial Economics 41:441–464.
Brennan, M.J., T. Chordia and A. Subrahmanyam (1998), “Alternative factor specifications, security characteristics, and the cross-section of expected stock returns”, Journal of Financial Economics 49:345–373.
Brown, S.J., W. Goetzmann, R.G. Ibbotson and S.A. Ross (1992), “Survivorship bias in performance studies”, Review of Financial Studies 5:679–698.
Campbell, J.Y. (1987), “Stock returns and the term structure”, Journal of Financial Economics 18:373–400.
Campbell, J.Y., and R.J. Shiller (1998), “Valuation ratios and the long-run stock market outlook”, Journal of Portfolio Management 24:11–26.
Carhart, M.M. (1997), “On the persistence in mutual fund performance”, Journal of Finance 52:57–82.
Chen, J., H. Hong and J.C. Stein (2002), “Breadth of ownership and stock returns”, Journal of Financial Economics 66:171–205.
Cohen, R.B., P.A. Gompers and T. Vuolteenaho (2002), “Who underreacts to cash-flow news? Evidence from trading between individuals and institutions”, Journal of Financial Economics 66:409–462.
Cowles, A. (1933), “Can stock market forecasters forecast?” Econometrica 1:309–324.
Cowles, A. (1939), Common Stock Indexes, 2nd Edition, Cowles Commission Monograph 3 (Principia Press, Inc., Bloomington, IN).
Daniel, K., and S. Titman (1997), “Evidence on the characteristics of cross-sectional variation in stock returns”, Journal of Finance 52:1–33.
Daniel, K., D. Hirshleifer and A. Subrahmanyam (1998), “Investor psychology and security market under- and overreactions”, Journal of Finance 53:1839–1885.
Daniel, K., D. Hirshleifer and A. Subrahmanyam (2001), “Overconfidence, arbitrage, and equilibrium asset pricing”, Journal of Finance 56:921–965.
Davis, J.L., E.F. Fama and K.R. French (2000), “Characteristics, covariances and average returns”, Journal of Finance 55:389–406.
D’Avolio, G. (2002), “The market for borrowing stock”, Journal of Financial Economics 66:271–306.
DeBondt, W.F.M., and R. Thaler (1985), “Does the stock market overreact?” Journal of Finance 40:793–805.
Duffie, D., N. Garleanu and L.H. Pedersen (2002), “Securities lending, shorting, and pricing”, Journal of Financial Economics 66:307–339.
Eckbo, B.E., R.W. Masulis and Ø. Norli (2000), “Seasoned public offerings: resolution of the ‘new issues puzzle’”, Journal of Financial Economics 56:251–291.
Fama, E.F. (1970), “Efficient capital markets: a review of theory and empirical work”, Journal of Finance 25:383–417.
Fama, E.F. (1975), “Short-term interest rates as predictors of inflation”, American Economic Review 65:269–282.
Fama, E.F. (1976), Foundations of Finance (Basic Books, New York).
Fama, E.F. (1991), “Efficient capital markets II”, Journal of Finance 46:1575–1617.
Fama, E.F. (1998), “Market efficiency, long-term returns, and behavioral finance”, Journal of Financial Economics 49:283–306.
Fama, E.F., and K.R. French (1988), “Dividend yields and expected stock returns”, Journal of Financial Economics 22:3–25.
Fama, E.F., and K.R. French (1992), “The cross-section of expected stock returns”, Journal of Finance 47:427–465.
Fama, E.F., and K.R. French (1993), “Common risk factors in the returns on stocks and bonds”, Journal of Financial Economics 33:3–56.
Fama, E.F., and K.R. French (1996), “Multifactor explanations of asset pricing anomalies”, Journal of Finance 51:55–84.
Fama, E.F., and K.R. French (1998), “Value versus growth: the international evidence”, Journal of Finance 53:1975–1999.
Fama, E.F., and G.W. Schwert (1977), “Asset returns and inflation”, Journal of Financial Economics 5:115–146.
French, K.R. (1980), “Stock returns and the weekend effect”, Journal of Financial Economics 8:55–69.
French, K.R., G.W. Schwert and R.F. Stambaugh (1987), “Expected stock returns and volatility”, Journal of Financial Economics 19:3–29.
Fung, W., and D.A. Hsieh (1997), “Empirical characteristics of dynamic trading strategies: the case of hedge funds”, Review of Financial Studies 10:275–302.
Geczy, C.C., D.K. Musto and A.V. Reed (2002), “Stocks are special too: an analysis of the equity lending market”, Journal of Financial Economics 66:241–269.
Goyal, A., and I. Welch (1999), “Predicting the equity premium with dividend ratios”, manuscript (Yale University).
Gromb, D., and D. Vayanos (2002), “Equilibrium and welfare in markets with financially constrained arbitrageurs”, Journal of Financial Economics 66:361–407.
Hendricks, D., J. Patel and R. Zeckhauser (1993), “Hot hands in mutual funds: short-run persistence of relative performance”, Journal of Finance 48:93–130.
Ibbotson, R. (1975), “Price performance of common stock new issues”, Journal of Financial Economics 2:235–272.
Ibbotson, R., J. Sindelar and J. Ritter (1994), “The market’s problem with the pricing of initial public offerings”, Journal of Applied Corporate Finance 7:66–74.
Ibbotson Associates (1998), Stocks, Bonds, Bills, and Inflation: 1998 Yearbook (Ibbotson Associates, Chicago).
Jaffe, J. (1974), “Special information and insider trading”, Journal of Business 47:411–428.
Jegadeesh, N., and S. Titman (1993), “Returns to buying winners and selling losers: implications for stock market efficiency”, Journal of Finance 48:65–91.
Jegadeesh, N., and S. Titman (2001), “Profitability of momentum strategies: an evaluation of alternative explanations”, Journal of Finance 56:699–720.
Jensen, M.C. (1968), “The performance of mutual funds in the period 1945–64”, Journal of Finance 23:389–416.
Jensen, M.C. (1978), “Some anomalous evidence regarding market efficiency”, Journal of Financial Economics 6:95–102.
Jones, C.M., and O.A. Lamont (2002), “Short sale constraints and stock returns”, Journal of Financial Economics 66:207–239.
Keim, D.B. (1983), “Size-related anomalies and stock return seasonality: further empirical evidence”, Journal of Financial Economics 12:13–32.
Keim, D.B., and R.F. Stambaugh (1984), “A further investigation of the weekend effect in stock returns”, Journal of Finance 39:819–835.
Keim, D.B., and R.F. Stambaugh (1986), “Predicting returns in the stock and bond markets”, Journal of Financial Economics 17:357–390.
Keim, D.B., and W.T. Ziemba (2000), “Security market imperfections: an overview”, in: D.B. Keim and W.T. Ziemba, eds., Security Market Imperfections in Worldwide Equity Markets (Cambridge University Press, Cambridge) pp. xv–xxvii.
Kleidon, A.W. (1986), “Variance bounds tests and stock price valuation models”, Journal of Political Economy 94:953–1001.
Kothari, S.P., and J. Shanken (1997), “Book-to-market, dividend yield, and expected market returns: a time series analysis”, Journal of Financial Economics 44:169–203.
Kothari, S.P., and J.B. Warner (1997), “Measuring long-horizon security price performance”, Journal of Financial Economics 43:301–339.
Krishnamurthy, A. (2002), “The bond/old-bond spread”, Journal of Financial Economics 66:463–506.
Lakonishok, J., A. Shleifer and R.W. Vishny (1994), “Contrarian investment, extrapolation, and risk”, Journal of Finance 49:1541–1578.
Lee, C.M.C., A. Shleifer and R.H. Thaler (1991), “Investor sentiment and the closed-end fund puzzle”, Journal of Finance 46:75–110.
Lewellen, J. (2002), “Momentum and autocorrelation in stock returns”, Review of Financial Studies 15:533–563.
Lintner, J. (1965), “The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets”, Review of Economics and Statistics 47:13–37.
Lo, A.W., and A.C. MacKinlay (1990), “Data-snooping biases in tests of financial asset pricing models”, Review of Financial Studies 3:431–467.
Loughran, T., and J.R. Ritter (1995), “The new issues puzzle”, Journal of Finance 50:23–51.
Loughran, T., and A.M. Vijh (1997), “Do long-term shareholders benefit from corporate acquisitions?” Journal of Finance 52:1765–1790.
Lowry, M., and G.W. Schwert (2002), “IPO market cycles: bubbles or sequential learning?” Journal of Finance 57:1171–1200.
MacKinlay, A.C. (1995), “Multifactor models do not explain deviations from the CAPM”, Journal of Financial Economics 38:3–28.
Malkiel, B.G. (1995), “Returns from investing in equity mutual funds, 1971 to 1991”, Journal of Finance 50:549–572.
Mandelker, G. (1974), “Risk and return: the case of merging firms”, Journal of Financial Economics 1:303–335.
Marsh, T.A., and R.C. Merton (1986), “Dividend variability and variance bounds tests for the rationality of stock market prices”, American Economic Review 76:483–498.
Mitchell, M.L., and E. Stafford (2000), “Managerial decisions and long-term stock price performance”, Journal of Business 73:287–320.
Mørck, R., A. Shleifer and R.W. Vishny (1988), “Management ownership and market valuation: an empirical analysis”, Journal of Financial Economics 20:293–315.
Nelson, C.R., and G.W. Schwert (1977), “On testing the hypothesis that the real rate of interest is constant”, American Economic Review 67:478–486.
Odean, T. (1999), “Do investors trade too much?” American Economic Review 89:1279–1298.
Pontiff, J. (1995), “Closed-end fund premia and returns: implications for financial market equilibrium”, Journal of Financial Economics 37:341–370.
Pontiff, J. (1996), “Costly arbitrage: evidence from closed-end funds”, Quarterly Journal of Economics 111:1135–1151.
Reinganum, M.R. (1981), “Misspecification of capital asset pricing: empirical anomalies based on earnings’ yields and market values”, Journal of Financial Economics 9:19–46.
Reinganum, M.R. (1983), “The anomalous stock market behavior of small firms in January: empirical tests for tax-loss selling effects”, Journal of Financial Economics 12:89–104.
Ritter, J.R. (1991), “The long-run performance of initial public offerings”, Journal of Finance 46:3–27.
Roll, R. (1983), “Vas ist das? The turn-of-the-year effect and the return premia of small firms”, Journal of Portfolio Management 9:18–28.
Schultz, P. (2003), “Pseudo market timing and the long-run underperformance of IPOs”, Journal of Finance 58:483–517.
Schwert, G.W. (1983), “Size and stock returns, and other empirical regularities”, Journal of Financial Economics 12:3–12.
Schwert, G.W. (1989), “Why does stock market volatility change over time?” Journal of Finance 44:1115–1153.
Schwert, G.W. (1990), “Indexes of United States stock prices from 1802 to 1987”, Journal of Business 63:399–426.
Schwert, G.W. (1991), “Review of market volatility by R. Shiller: Much ado about . . . very little”, Journal of Portfolio Management 17:74–78.
Schwert, G.W. (1996), “Markup pricing in mergers & acquisitions”, Journal of Financial Economics 41:153–192.
Sharpe, W.F. (1964), “Capital asset prices: a theory of market equilibrium under conditions of risk”, Journal of Finance 19:425–442.
Shiller, R.J. (1981a), “Do stock prices move too much to be justified by subsequent changes in dividends?” American Economic Review 71:421–436.
Shiller, R.J. (1981b), “The use of volatility measures in assessing market efficiency”, Journal of Finance 36:291–304.
Shiller, R.J. (1989), Market Volatility (MIT Press, Cambridge, MA).
Shleifer, A., and R.W. Vishny (1997), “The limits of arbitrage”, Journal of Finance 52:35–55.
Smith, C.W., and R.L. Watts (1992), “The investment opportunity set and corporate financing, dividend, and compensation policies”, Journal of Financial Economics 32:263–292.
Thompson, R. (1978), “The information content of discounts and premiums on closed-end fund shares”, Journal of Financial Economics 6:151–186.
Chapter 16
ARE FINANCIAL ASSETS PRICED LOCALLY OR GLOBALLY?
G. ANDREW KAROLYI °
Ohio State University
RENÉ M. STULZ ∗
Ohio State University, NBER
Contents
Abstract
Keywords
1. Introduction
2. The perfect financial markets model
2.1. Identical consumption-opportunity sets across countries
2.2. Different consumption-opportunity sets across countries
2.3. A general approach
2.4. Empirical evidence on asset pricing using perfect market models
3. Home bias
4. Flows, spillovers, and contagion
4.1. Flows and returns
4.2. Correlations, spillovers, and contagion
5. Conclusion
References
° Respectively, Charles R. Webb Professor of Finance and Reese Chair of Banking and Monetary Economics, Fisher College of Business, Ohio State University. René Stulz is also a Research Associate at the National Bureau of Economic Research. We are grateful to Dong Lee and Boyce Watkins for research assistance, and to Michael Adler, Kee-Hong Bae, Warren Bailey, Soehnke Bartram, Laura Bottazzi, Magnus Dahlquist, Craig Doidge, Vihang Errunza, Cheol Eun, Jeff Frankel, Thomas Gehrig, John Griffin, Cam Harvey, Mervyn King, Paul O’Connell, Sergio Schmukler, Ravi Schukla, Enrique Sentana, Bruno Solnik, Christof Stahel, Lars Svensson, Linda Tesar, Frank Warnock, and Simon Wheatley for comments.
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved
G.A. Karolyi and R.M. Stulz
Abstract

We review the international finance literature to assess the extent to which international factors affect financial asset demands and prices. International asset-pricing models with mean-variance investors predict that an asset’s risk premium depends on its covariance with the world market portfolio and, possibly, with exchange rate changes. The existing empirical evidence shows that a country’s risk premium depends on its covariance with the world market portfolio and that there is some evidence that exchange rate risk affects expected returns. However, the theoretical asset-pricing literature relying on mean-variance optimizing investors fails to explain the portfolio holdings of investors, equity flows, and the time-varying properties of correlations across countries. The home bias has the effect of increasing local influences on asset prices, while equity flows and cross-country correlations increase global influences on asset prices.
Keywords

international capital asset-pricing model, home bias puzzle, segmented markets, integrated markets, equity flows

JEL classification: G12, G15, G11
Ch. 16:
Are Financial Assets Priced Locally or Globally?
1. Introduction

Over the last forty years, financial markets throughout the world have steadily become more open to foreign investors. Yet, most academic research on portfolio choice and asset pricing focuses only on local factors when investigating the determinants of portfolio choice and of expected returns on risky assets. For instance, a vast literature looks at the relation between the U.S. risk premium and the volatility of the U.S. stock market even though in global markets the U.S. risk premium ought to depend at least on the relation between U.S. stocks, a global market portfolio, and possibly exchange rates. In this paper, we examine the lessons from the theoretical and empirical finance literature on the extent to which global factors – i.e., foreign stock markets and exchange rates – affect asset demands and prices, and on when these factors can be ignored or have to be taken seriously.

We start this paper by examining how portfolios would be chosen and asset prices determined in a world where financial markets are assumed to be internationally perfect. In such a world, an asset has the same price regardless of where it is traded and no finance is local. Markets where assets have the same price regardless of where they are traded are said to be integrated, while markets where the price of an asset depends on where it is traded are said to be segmented. In examining the empirical evidence on asset-pricing models that assume financial markets to be internationally integrated, a case can be made for studying the cross-section of expected returns without regard for international influences, at least for large markets like the USA. But the case for ignoring international determinants of national stock market risk premiums and how they evolve over time has no basis.

Models with internationally perfect financial markets have severe limitations in explaining portfolio holdings and how portfolio holdings change over time. The home bias puzzle, which refers to the phenomenon that investors overweight the securities of their country in their portfolio, is inconsistent with models where investors have the same information across countries and where financial markets are assumed to be perfect. Recent research provides a wealth of empirical results on how foreign investors choose their asset holdings and on their performance. The picture that emerges from this literature is that there are systematic patterns in ownership of foreign stocks that are hard to reconcile with models assuming perfect financial markets. The only way to rationalize these patterns would be to argue that the gains from international diversification are too small to make holding foreign assets worthwhile. Though there is evidence suggesting that the gains from international diversification have become weaker over time, only investors with extremely strong priors against the hypothesis that assets are priced in internationally integrated markets are likely to conclude that international diversification is not worth it. The home bias decreases the relevance of international determinants of domestic stock prices.

The 1990s showed that cross-country equity flows are highly volatile and raised important questions about whether and how flows affect stock prices. In 1998, net equity flows to Latin America amounted to $1.7 billion in contrast to $27.2 billion in
1993. Yet, at the same time, net capital flows to Latin America in 1998 were more than twice their amount in 1993. In other words, both the amounts of net equity flows and net equity flows in proportion to total capital flows are highly variable. Further, from 1994 to 1995, net equity flows to Latin America fell by roughly 40% while net equity flows to East Asia increased by a bit more than 40%. 1

The growing importance of capital flows in the 1990s has led to concerns about contagion. Since national stock markets are correlated because of interdependence among countries, one would naturally expect a shock in one country to affect other countries. Some have called this transmission of shocks contagion, even though it has little to do with the traditional definition of contagion in epidemiology according to which a healthy individual is made sick by a disease transmitted in some way from a stricken individual. Contagion that makes a healthy country sick has been a source of concern in the 1990s, leading many to believe that such irrational or non-fundamental contagion can destabilize economies. Neither the high volatility of equity flows nor contagion is consistent with models where investors act rationally, are similarly informed, and trade in perfect financial markets. Some models suggest that factors that have been advanced to explain the home bias can also help explain the volatility of equity flows and perhaps some forms of contagion. However, the volatility of equity flows and contagion increase the importance of international determinants of stock prices – at least for the countries affected by such phenomena.

The paper proceeds as follows. Section 2 reviews the various international asset-pricing models that assume internationally integrated financial markets, shows the conditions under which assets can be priced locally with these models, and surveys the empirical evidence on these models. Section 3 discusses the home bias. Section 4 addresses the issues of the volatility of equity flows and contagion. Section 5 concludes.
2. The perfect financial markets model
How would individuals choose portfolios and how would asset prices be set in a world with perfect financial markets? In an international setting, the most important implication of perfect financial markets is that all investors face the same investment-opportunity set because there are no barriers to international investment. However, in a world of perfect financial markets, the assumptions one makes about how goods prices in different currencies are related have important implications for how individuals choose portfolios and how asset prices are determined. If consumption-opportunity sets are the same across countries, it does not matter where an investor is located. An investor can achieve the same expected lifetime utility given his wealth anywhere in the world. If consumption-opportunity sets differ across countries and investors
¹ See Edison and Warnock (2002, table 1).
Ch. 16:
Are Financial Assets Priced Locally or Globally?
are not mobile, then an investor’s expected lifetime utility depends on where she is located. In that case, an investor may hold a different portfolio depending on her country of residence. We consider successively the case where investors have the same consumption-opportunity sets across countries and the case where they do not. We then present a general approach that encompasses the special cases we discuss in the first two parts of this section. We conclude the section with a discussion of the empirical evidence on the predictions of the perfect markets model for security returns.
2.1. Identical consumption-opportunity sets across countries
Consider a world where goods and financial markets are perfect, so that we have no transportation costs, no tariffs, no taxes, no transaction costs, and no restrictions on short sales. Grauer, Litzenberger and Stehle (1976) modeled such a world using a state-preference framework. We assume further that there is only one consumption good.² In such a world, every investor has the same consumption and investment-opportunity sets regardless of where she resides. Further, the law of one price holds for the consumption good, so that if \(e(t)\) is the price of foreign currency at date t, \(P(t)\) is the price of the good in the domestic country, and \(P^*(t)\) is the price in the foreign currency, it must be that \(P(t) = e(t) P^*(t)\). In such a world, an investor can use the consumption good as the numéraire, so that all prices and returns are expressed in units of the consumption good. We now consider a one-period economy in which real returns are multivariate normal and there is one asset that has a risk-free return in real terms, earning r. Investors care only about the distribution of their real terminal wealth. The properties of the multivariate normal distribution [see Fama (1976, Chs 4 and 8)] imply that:
\[ E(r_i) - r = \alpha_i + \beta_i^d \, [E(r_d) - r], \tag{1} \]
where E(·) denotes an expectation, \(r_i\) is the real return on asset i, \(r_d\) is the real return on the domestic market portfolio, \(\beta_i^d\) is the domestic beta of asset i defined as \(\mathrm{Cov}(r_i, r_d)/\mathrm{Var}(r_d)\), where Cov(·, ·) denotes a covariance and Var(·) denotes a variance, and \(\alpha_i\) is a constant. If domestic investors can only hold domestic assets and if foreign investors cannot hold domestic assets, then we know that \(\alpha_i\) must be equal to zero for all assets because then the capital asset-pricing model (CAPM) must hold in the domestic country. However, in a world where domestic investors have access to foreign assets, the CAPM must hold in real terms using the world market portfolio. Consequently:
\[ E(r_i) - r = \beta_i^w \, [E(r_w) - r], \tag{2} \]
where \(r_w\) is the real return on the world market portfolio and \(\beta_i^w\) is the world beta of asset i, defined as \(\mathrm{Cov}(r_i, r_w)/\mathrm{Var}(r_w)\). We call Equation (2) the “world CAPM” to contrast it with
² The consumption good can be a basket of goods where the spending proportions on each good are constant.
the traditional implementation of the CAPM that uses the domestic market portfolio, which we call the “domestic CAPM”. We now build on Stulz (1995) to analyze the mistake in using the domestic CAPM when the world CAPM is appropriate. Equations (1) and (2), with Equation (2) also applied to the domestic market portfolio itself, imply that \(\alpha_i\) must satisfy
\[ \alpha_i = \left[ \beta_i^w - \beta_d^w \beta_i^d \right] [E(r_w) - r]. \tag{3} \]
Equation (3) shows that systematic mistakes are possible when one uses the domestic CAPM and when domestic investors have access to world markets. At the same time, though, Equation (3) puts a bound on the economic importance of these mistakes. Since the domestic market portfolio has a beta of one, the weighted average of the alphas in Equation (3) must be equal to zero. To understand the nature of the mistakes, we have to understand how the world beta of asset i, \(\beta_i^w\), differs from the product of the world beta of the domestic market, \(\beta_d^w\), and the domestic beta of asset i, \(\beta_i^d\). Using the multivariate normal distribution, we can write the return of asset i as
\[ r_i - r = \alpha_i + \beta_i^d \, [r_d - r] + \varepsilon_i^d. \tag{4} \]
Similarly, the return of the domestic market portfolio can be written as
\[ r_d - r = \beta_d^w \, [r_w - r] + \varepsilon_d^w. \tag{5} \]
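Substituting Equation (5) into Equation (4), we have:
\[ r_i - r = \alpha_i + \beta_i^d \left( \beta_d^w \, [r_w - r] + \varepsilon_d^w \right) + \varepsilon_i^d. \tag{6} \]
Because the residual \(\varepsilon_d^w\) is uncorrelated with \(r_w\), the world market beta of asset i is therefore:
\[ \beta_i^w = \beta_i^d \beta_d^w + \frac{\mathrm{Cov}(\varepsilon_i^d, r_w)}{\mathrm{Var}(r_w)}. \tag{7} \]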
Substituting Equation (7) into Equation (3), we get the result of Stehle (1977) that the pricing mistake from using the domestic CAPM when the global CAPM is appropriate is
\[ \alpha_i = \frac{\mathrm{Cov}(\varepsilon_i^d, r_w)}{\mathrm{Var}(r_w)} \, [E(r_w) - r]. \tag{8} \]
Equation (8) shows that the domestic CAPM understates the expected return of assets whose market model residual is positively correlated with the world market portfolio. However, the domestic CAPM correctly prices those assets whose market model residual is uncorrelated with the world market portfolio. If high domestic beta stocks have risk diversifiable internationally but not domestically, using the domestic CAPM inappropriately would lead one to conclude that the security market line is too flat.
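The equivalence of Equations (3) and (8) can be checked numerically. The following is a minimal sketch, assuming a made-up covariance matrix for the returns of asset i, the domestic market, and the world market, and an assumed world risk premium of 6%. None of these numbers come from the chapter, and the variable names are hypothetical.

```python
# Illustrative check that Equations (3) and (8) give the same alpha.
# All second moments below are hypothetical, not taken from the chapter.

# Covariance matrix of real returns, ordered (asset i, domestic market d, world market w).
cov = [
    [0.0900, 0.0300, 0.0200],
    [0.0300, 0.0400, 0.0180],
    [0.0200, 0.0180, 0.0225],
]
I, D, W = 0, 1, 2
world_premium = 0.06  # assumed E(r_w) - r

beta_i_d = cov[I][D] / cov[D][D]  # domestic beta of asset i
beta_d_w = cov[D][W] / cov[W][W]  # world beta of the domestic market
beta_i_w = cov[I][W] / cov[W][W]  # world beta of asset i

# Equation (3): alpha from the gap between the world beta and the product of betas.
alpha_eq3 = (beta_i_w - beta_d_w * beta_i_d) * world_premium

# Equation (8): alpha from the covariance between the domestic market-model
# residual, r_i - r - alpha_i - beta_i^d (r_d - r), and the world market return.
cov_resid_w = cov[I][W] - beta_i_d * cov[D][W]
alpha_eq8 = cov_resid_w / cov[W][W] * world_premium

print(f"alpha via Eq. (3): {alpha_eq3:.6f}")
print(f"alpha via Eq. (8): {alpha_eq8:.6f}")
```

With these inputs the residual covaries positively with the world market, so the alpha is positive and the domestic CAPM would understate the asset's expected return.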
One would expect multinational corporations to have returns correlated with foreign markets in such a way that the domestic market portfolio return does not capture all their systematic risk. Our analysis so far shows that, even when the assumptions required for the CAPM to hold are made, the domestic CAPM does not hold in global markets. One inevitably makes pricing mistakes using the domestic CAPM as long as \(\mathrm{Cov}(\varepsilon_i^d, r_w)\) is not zero for every security. There exists an upper bound on the absolute value of the pricing mistakes. Let \(R_{ij}^2\) be the R-square of a regression of the return of asset i on asset j. With this notation, the bound can be written as:
\[ |\alpha_i| \le \sqrt{\left(1 - R_{id}^2\right)\left(1 - R_{wd}^2\right) \frac{\mathrm{Var}(r_i)}{\mathrm{Var}(r_w)}} \; [E(r_w) - r]. \tag{9} \]
If the variance of security i is four times the variance of the world market, the R-square of a regression of the security on the domestic market portfolio is 0.3, the R-square of a regression of the world market on the domestic market is 0.2, and the world market risk premium is 6%, the bound is 9%. Consider now an asset with the same volatility but in the USA, where a regression of the world market portfolio on the domestic market portfolio would have an R-square of at least 0.8. In this case, the bound would be 4.49%. If we are willing to make assumptions about the correlation between the domestic market model residual of an asset and the return of the world market portfolio that is not explained by the domestic market portfolio, we can compute the limits of the pricing mistakes. Consider, for example, a correlation of 0.2. In this case, the asset in a country whose market portfolio is poorly explained by the world market portfolio has a pricing mistake no greater than 1.8%, while the U.S. asset has a pricing mistake of 0.90%. It follows from this analysis that using a domestic CAPM is more of a problem in countries whose market has a lower beta relative to the world market portfolio.
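The arithmetic behind these bounds can be reproduced with a short sketch. It assumes the bound takes the form of a square root of the two unexplained fractions times the variance ratio, scaled by the world risk premium. The function and variable names are hypothetical.

```python
import math

def alpha_bound(var_ratio, r2_id, r2_wd, world_premium):
    """Upper bound on |alpha_i| in the form of Equation (9).

    var_ratio is Var(r_i) / Var(r_w); r2_id and r2_wd are the two R-squares.
    """
    return math.sqrt((1.0 - r2_id) * (1.0 - r2_wd) * var_ratio) * world_premium

premium = 0.06  # world market risk premium used in the text

# Country whose market is poorly explained by the world market (R-square 0.2).
bound_low_r2 = alpha_bound(4.0, 0.3, 0.2, premium)
# U.S. case from the text (R-square 0.8).
bound_usa = alpha_bound(4.0, 0.3, 0.8, premium)

# Scaling by an assumed residual correlation of 0.2, as in the text's example.
rho = 0.2
print(f"bound, low R-square country: {bound_low_r2:.4%}")
print(f"bound, USA:                  {bound_usa:.4%}")
print(f"with correlation {rho}: {rho * bound_low_r2:.4%} and {rho * bound_usa:.4%}")
```

The first two printed values round to the 9% and 4.49% quoted in the text.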
Since the USA is a country where the R-square of a regression of the world market portfolio on the domestic market portfolio is high, the mistakes made by using the domestic market portfolio for U.S. risky assets are smaller than for risky assets of other countries where the R-square of similar regressions is much smaller. Further, the mistakes are likely to be less important for larger firms simply because the domestic market portfolio explains more of the return of these firms.
We now turn to the issue of the determinants of the market risk premium. We keep the same assumptions, but now we have an investor who chooses her portfolio to maximize her expected utility of terminal real wealth, \(E[U(W)]\), with \(U(W)\) strictly increasing and concave in W. Consider first an investor who can only hold U.S. assets, so that the investor holds the U.S. market portfolio. Using a first-order Taylor-series expansion, the first-order conditions of the investor’s portfolio choice problem imply that
\[ E(r_d) - r = T^R \, \mathrm{Var}(r_d), \tag{10} \]
where \(T^R\) is the investor’s coefficient of relative risk tolerance. Equation (10) implies that the risk premium on the domestic market portfolio increases with the variance of the return of the domestic market. This equation has been the basis for a large literature starting from Merton (1980). However, Equation (2) provides a different formula for the risk premium on the domestic market portfolio:
\[ E(r_d) - r = \beta_d^w \, [E(r_w) - r]. \tag{11} \]
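Equation (11) can be illustrated by writing the world beta of the domestic market as its correlation with the world market times the ratio of volatilities. The sketch below uses this standard identity with made-up inputs. The two markets and all parameter values are hypothetical.

```python
def domestic_premium(corr_dw, vol_d, vol_w, world_premium):
    """Risk premium on a national market under Equation (11).

    Uses beta_d^w = corr(r_d, r_w) * vol_d / vol_w, which follows from the
    definition of beta as a covariance divided by a variance.
    """
    beta_d_w = corr_dw * vol_d / vol_w
    return beta_d_w * world_premium

vol_w = 0.15          # assumed world market volatility
world_premium = 0.06  # assumed E(r_w) - r

# Hypothetical market with moderate volatility and high world correlation.
premium_high_corr = domestic_premium(0.8, 0.18, vol_w, world_premium)
# Hypothetical market with high volatility but low world correlation.
premium_low_corr = domestic_premium(0.2, 0.35, vol_w, world_premium)
# Same correlation as the first market, higher volatility.
premium_more_vol = domestic_premium(0.8, 0.25, vol_w, world_premium)

print(f"high correlation, vol 18%: {premium_high_corr:.4f}")
print(f"low correlation,  vol 35%: {premium_low_corr:.4f}")
print(f"high correlation, vol 25%: {premium_more_vol:.4f}")
```

For a given positive correlation, more volatility means a higher premium, yet the more volatile market with the low correlation earns the lower premium of the first two.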
With Equation (11), the risk premium on the domestic market portfolio depends on its world beta. Since stock markets typically have a positive correlation with the world market portfolio, an increase in the variance of the domestic market portfolio will increase its risk premium. However, if one were to look at the cross-section of expected returns across national stock markets, there is no reason for markets with higher variances to have higher risk premiums since a higher variance for a country’s market portfolio does not imply that the country’s market has a higher correlation with the world market portfolio. In fact, one would expect the opposite, in that emerging markets tend to have high variances but low correlations with the world market portfolio [see Harvey (1995)]. So far, we considered a one-period model. If one is willing to assume that Equation (11) holds each period, the volatility of the domestic market portfolio could increase at the same time that the risk premium on the domestic market portfolio could fall. This could happen because the correlation of the domestic market portfolio with the world market portfolio falls or the risk premium on the world market portfolio falls.
2.2. Different consumption-opportunity sets across countries
In the previous section, we considered a model where all investors have the same consumption-opportunity set. In such a model, the residence of an investor who maximizes the expected utility of terminal wealth and whose terminal wealth consists only of the value of securities held is irrelevant. In the presence of inflation, purchasing power parity must hold under such conditions. With purchasing power parity, a percentage depreciation of the domestic currency is offset by an equivalent percentage increase in domestic prices. There is a large literature in international economics that documents that purchasing power parity does not hold – at least for countries with moderate and low inflation.³ The assumption that consumption-opportunity sets are the same across countries is stronger than the assumption of purchasing power parity. It requires that the relative price of any two consumption goods be the same in each country at all times. Even with purchasing power parity holding, it would be possible that a particular good is not available in one country but is available in another country.
³ Froot and Rogoff (1995, p. 1648) state that “price level movements do not begin to offset exchange rate swings on a monthly or even annual basis”. In a comprehensive study using 64 real exchange rates, O’Connell (1998) concludes that “no evidence against the random walk null can be found”.
The law of one price would not apply for that good even though it has to apply for every good when all investors have the same consumption-opportunity set. The empirical evidence shows that there are significant departures from the law of one price across as well as within countries, but keeping distance constant, the departures are more volatile across countries than within countries.⁴ A simple way to model a world with departures from the law of one price is to assume that there is a different consumption good in each country.⁵ This approach was first used by Solnik (1974) to derive an asset-pricing model where foreign exchange rate risk is priced. Such an assumption implies that trade takes place in intermediate goods. Solnik (1974) assumes that there is no inflation, so that the price of the consumption good in a country is fixed and the price of a foreign currency is simply the price of the foreign good in terms of the domestic good. With such an approach, investors in each country use their consumption good as the numéraire. If one considers the portfolio choice of a domestic investor in this model, there is only one difference from the model discussed in the previous section but it is of critical importance. In the previous section, the consumption good had the same price everywhere using the same numéraire. This meant that all investors would evaluate the riskiness of an asset in the same way. Now, this is no longer true. Consider an asset that is riskless in terms of the domestic good. This asset is riskless for a domestic investor, but it is not so for a foreign investor. Let r be the return per unit of time of the domestic risk-free asset in domestic currency and \(r^*\) be the return of the foreign currency risk-free asset in foreign currency. For a domestic investor, the instantaneous domestic currency return on the foreign currency riskless asset depends on the exchange rate dynamics.
We assume that the exchange rate follows a geometric Brownian motion, so that
\[ \frac{de}{e} = \mu_e \, dt + \sigma_e \, dz_e, \tag{12} \]
where \(\mu_e\) is the instantaneous expected rate of change of the exchange rate, \(\sigma_e\) is the instantaneous volatility of the rate of change of the exchange rate, and \(dz_e\) is the increment over dt of a Brownian motion with zero drift and unit volatility. With this assumption about the foreign exchange dynamics, the instantaneous domestic currency return on the foreign risk-free asset is \(r^* dt + (de/e)\), while the foreign currency return on the foreign risk-free asset is \(r^* dt\). In contrast, the foreign currency return on the
⁴ See Froot and Rogoff (1995) and Sarno and Taylor (2001) for reviews of the evidence. Engel and Rogers (1996) examine deviations from the law of one price across cities in the USA and Canada. They find that the volatility of deviations increases as distance between two cities increases but there is also a substantial border effect on this volatility.
⁵ Alternatively, there could be many consumption goods but all investors would consume the goods in the same constant proportions so that the basket of goods they consume is constant. Such an extension would lead to the same results as Solnik’s model provided that the price of a country’s basket of goods is constant in the currency of that country.
domestic risk-free asset is \(r\,dt - (de/e) + \sigma_e^2\,dt\). The volatility term comes from Jensen’s inequality: the price of the domestic currency for foreign investors is a convex function of the domestic currency price of the foreign currency. Whenever a domestic investor holds a foreign asset, her return in domestic currency depends on the exchange rate, so that she bears exchange rate risk. Let \(I_{fj}\) be the price in domestic currency of foreign asset j and let \(I_{fj}^*\) be the price of the same asset in foreign currency. The instantaneous return in domestic currency on foreign asset j is
\[ \frac{dI_{fj}}{I_{fj}} = \frac{dI_{fj}^*}{I_{fj}^*} + \sigma_{fj,e}^* \, dt + \frac{de}{e}, \tag{13} \]
where \(\sigma_{fj,e}^*\) is the instantaneous covariance of the foreign currency return of foreign asset j with the rate of change of the exchange rate. A domestic investor can make the domestic currency return on her holdings of foreign assets uncorrelated with the rate of change of the exchange rate, so that she does not have to bear foreign exchange rate risk if she does not wish to do so. Let \(I_{ff}^*\) be the value in foreign currency of a domestic investor’s investment in foreign risky assets. If the investor shorts n units of the foreign risk-free asset per unit of investment in the foreign risky assets and puts the proceeds of the short sale in the domestic risk-free asset, the domestic currency return on the investor’s holdings of foreign currency risky assets over dt is
\[ \frac{dI_{df}}{I_{df}} = \frac{dI_{ff}^*}{I_{ff}^*} + \frac{de}{e} + \sigma_{ff,e}^* \, dt - n \left[ r^* dt + \frac{de}{e} - r\,dt \right]. \tag{14} \]
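The exposure of the hedged position in Equation (14) to the exchange rate is linear in n, which makes the zero-exposure hedge easy to compute. The sketch below relies on one observation: the random part of Equation (14) is the foreign currency return on the risky assets plus (1 − n) times de/e. The function names and numerical inputs are hypothetical.

```python
def hedged_cov_with_fx(n, cov_asset_fx, var_fx):
    """Covariance per unit of time between the return in Equation (14) and de/e.

    cov_asset_fx is the covariance of the foreign currency asset return with
    de/e, and var_fx is the variance of de/e per unit of time.
    """
    return cov_asset_fx + (1.0 - n) * var_fx

def zero_exposure_hedge_ratio(cov_asset_fx, var_fx):
    """Value of n that sets the covariance above to zero."""
    return 1.0 + cov_asset_fx / var_fx

var_fx = 0.10 ** 2     # assumed exchange rate variance (10% volatility)
cov_asset_fx = 0.0005  # assumed small covariance of stock returns with de/e

n_star = zero_exposure_hedge_ratio(cov_asset_fx, var_fx)
residual_exposure = hedged_cov_with_fx(n_star, cov_asset_fx, var_fx)
n_uncorrelated = zero_exposure_hedge_ratio(0.0, var_fx)

print(f"hedge ratio with small covariance: {n_star:.3f}")
print(f"remaining covariance with de/e:    {residual_exposure:.2e}")
print(f"hedge ratio with zero covariance:  {n_uncorrelated:.3f}")
```

When the asset's own-currency return is uncorrelated with the exchange rate, the hedge ratio collapses to one.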
If the investor chooses n to be equal to \(1 + \sigma_{ff,e}^*/\sigma_e^2\), the domestic currency return of the investor’s investment in the foreign currency risky assets is uncorrelated with the exchange rate. In the risk management literature, n is the hedge ratio – i.e., the ratio of the size of the hedge relative to the size of the position being hedged. In Solnik’s model, the returns of risky assets in their own currency are assumed to be uncorrelated with the rate of change of the exchange rate, so that the optimal hedge ratio is one. Sercu (1980) relaxes this assumption of Solnik’s model. We call the extended model the Solnik/Sercu model in the following discussion. With multiple countries, each risky asset is exposed to the risk of multiple currencies, so that an asset has an optimal hedge ratio with respect to each foreign currency. Since the investor can hedge her foreign stocks against exchange rate risk, the two decisions of how much foreign exchange rate risk to bear and of how much foreign equity to hold are separable. Empirically, \(\sigma_{ff,e}^*/\sigma_e^2\) is close to 0, so that an investor hedges the foreign exchange risk of a foreign currency investment by shorting the currency in which the risky assets are denominated for an amount approximately equal to her holdings of these risky assets. If our investor chose to take on no foreign exchange rate risk, she would therefore sell short an amount of the foreign risk-free asset roughly equal to her holdings of foreign risky assets. Foreign investors would have to hold
an amount of the foreign risk-free asset equal to the amount of that asset shorted by domestic investors. If the domestic investor holds some foreign exchange risk, she does not hedge all her holdings of foreign stocks and, consequently, foreign investors hold a smaller amount of the foreign risk-free bond. Perold and Schulman (1988) argued in an influential article that hedging is a free lunch since it reduces volatility at no cost in expected return because on average there is no compensation for bearing foreign exchange rate risk. Their argument is controversial on empirical grounds because generally high interest rate currencies earn a risk premium (see Froot and Thaler (1990), for a review of the evidence). If correct, however, their argument would imply that the exchange rate exposure of stocks would not affect their expected returns, so that one could ignore exchange rates when considering the cross-section of expected returns for common stocks. To consider the validity of the Perold and Schulman argument on theoretical grounds, note that it implies that foreign investors short the domestic currency for an amount equal to their holdings of domestic currency risky assets. Domestic investors must be the counterparties of these short sales, so that if foreign investors hedge all their domestic exchange rate risk, domestic investors must hold an amount of the domestic risk-free asset equal to foreign investors’ short position in that asset. In this case, foreign investors take no foreign exchange risk. This makes sense if foreign investors receive no reward for being long the domestic currency, which requires that the expected excess return of the domestic currency risk-free asset over the foreign risk-free asset is zero for them. The foreign currency excess return of the domestic currency risk-free asset over the foreign risk-free asset is \(r\,dt + \sigma_e^2\,dt - de/e - r^* dt\).
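The Jensen's inequality term in this excess return can be checked numerically. The following is a minimal sketch, assuming the geometric Brownian motion of Equation (12) with made-up drift, volatility, and interest rates. It computes the expected growth of e and of 1/e by direct integration over the normal shock, then sets the domestic interest rate so that foreign investors expect a zero excess return and reports what domestic investors then expect. All names and numbers are hypothetical.

```python
import math

def expected_growth(mu_e, sigma_e, horizon, power, steps=20001, width=10.0):
    """E[(e_T / e_0) ** power] when de/e = mu_e dt + sigma_e dz_e.

    Midpoint integration over a standard normal shock z, using
    log(e_T / e_0) = (mu_e - sigma_e**2 / 2) * horizon + sigma_e * sqrt(horizon) * z.
    """
    drift = (mu_e - 0.5 * sigma_e ** 2) * horizon
    scale = sigma_e * math.sqrt(horizon)
    dz = 2.0 * width / steps
    total = 0.0
    for k in range(steps):
        z = -width + (k + 0.5) * dz
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += math.exp(power * (drift + scale * z)) * density * dz
    return total

mu_e, sigma_e, horizon = 0.02, 0.10, 1.0  # hypothetical exchange rate dynamics

# Expected rate of change of e (seen by domestic investors) and of 1/e
# (the price of the domestic currency, seen by foreign investors).
rate_e = math.log(expected_growth(mu_e, sigma_e, horizon, power=1)) / horizon
rate_inv_e = math.log(expected_growth(mu_e, sigma_e, horizon, power=-1)) / horizon

r_foreign = 0.03
# Choose r so that r + sigma_e**2 - mu_e - r* = 0 for foreign investors.
r_domestic = r_foreign + mu_e - sigma_e ** 2

excess_for_foreign = r_domestic + rate_inv_e - r_foreign
excess_for_domestic = r_foreign + rate_e - r_domestic

print(f"expected rate of change of 1/e: {rate_inv_e:.6f}")
print(f"foreign investor's expected excess return:  {excess_for_foreign:.6f}")
print(f"domestic investor's expected excess return: {excess_for_domestic:.6f}")
```

The expected rate of change of 1/e is not simply the negative of that of e: it exceeds it by the variance of the exchange rate, which is the source of the volatility term above.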
Foreign investors will not take a long position in the domestic risk-free asset if the expected excess return of the domestic currency risk-free asset over the foreign risk-free asset for them is zero. That is, the following condition must hold: \(r\,dt + \sigma_e^2\,dt - \mu_e\,dt - r^* dt = 0\). For the Perold and Schulman argument to be theoretically valid, it must be that when this condition is satisfied, domestic investors do not take exchange rate risk either. However, the excess return on the foreign risk-free asset over the domestic risk-free asset for a domestic investor is \(r^* dt + de/e - r\,dt\), so that the expected excess return for a domestic investor on a long position in the foreign risk-free asset is \(r^* dt + \mu_e\,dt - r\,dt\). If the expected excess return of holding the domestic risk-free asset is zero for foreign investors, then the expected excess return of holding the foreign risk-free asset is positive and equal to \(\sigma_e^2\,dt\) for a domestic investor. Consequently, it is not possible for both domestic and foreign investors to contemplate an expected excess return of zero when investing in the other country’s risk-free asset.⁶ In a symmetric world, one would think neither domestic nor foreign investors would be rewarded for bearing exchange rate risk since rewarding one group of investors would seem to penalize the other group of investors. Jensen’s inequality makes this reasoning incorrect. If foreign investors receive no reward for being long the domestic currency risk-free asset, then domestic investors are rewarded for being long the foreign currency risk-free asset. In equilibrium, in a symmetric world where consumption-opportunity sets differ across countries, there will therefore exist some reward for taking currency risk, so that exposures to foreign exchange rates will be priced in assets.
To present a key insight of Solnik’s model, we use his assumption that the domestic risky assets have domestic currency returns uncorrelated with the exchange rate and that the foreign risky assets have foreign currency returns uncorrelated with the exchange rate. In this case, the domestic investor can hedge her holdings of the foreign currency risky assets against exchange rate risk by borrowing an amount in foreign currency equal to her holdings of foreign currency risky assets. Similarly, foreign investors hedge their currency risk by borrowing in domestic currency an amount equal to their holdings of domestic currency risky assets. If there is no reward for taking foreign currency risk, domestic investors will short the foreign currency risk-free asset for an amount exactly equal to their investment in foreign currency risky assets and will bear no foreign exchange rate risk. As the reward for taking foreign exchange rate risk increases, domestic investors decrease the extent to which they hedge the foreign exchange risk associated with their holdings of assets risky in foreign currency.
We assume that there are N risky assets for each investor. Let the Nth risky asset be the foreign currency risk-free asset for a domestic investor and the domestic currency risk-free asset for a foreign investor. Define \(w_N^i\) as the fraction of her wealth the ith domestic investor puts in the Nth domestic risky asset, which is the foreign currency risk-free asset for her, and \(w_N^{fj}\) be the fraction of her wealth the jth foreign investor puts in the Nth foreign risky asset, which is the domestic currency risk-free asset for her. Solving for these asset demands, we have:
\[ w_N^i = T_i^d \left[ \frac{r^* + \mu_e - r}{\sigma_e^2} \right] - w_{fd}^i, \tag{15} \]
\[ w_N^{fj} = T_j^f \left[ \frac{r - \mu_e + \sigma_e^2 - r^*}{\sigma_e^2} \right] - w_{df}^j, \tag{16} \]
where \(T_i^d\) and \(T_j^f\) are, respectively, the relative risk tolerances of the domestic investor and of the foreign investor, and \(w_{fd}^i\) and \(w_{df}^j\) are, respectively, the fraction of the wealth
⁶ This result holds more generally and is known as Siegel’s Paradox [Siegel (1972)]. This Paradox states that it is not possible for the forward exchange rate to be an unbiased predictor of the spot exchange rate from the perspective of investors in the domestic as well as in the foreign country. If the forward exchange rate is F and the spot exchange rate at maturity of the contract is e, the forward exchange rate is an unbiased predictor from the perspective of the domestic country if F = E(e). However, from the perspective of investors in the foreign country, the forward exchange rate is an unbiased predictor if 1/F = E(1/e). Because of Jensen’s inequality, E(e) is not equal to 1/E(1/e) as long as e is random. Black (1990) argues that investors bear exchange rate risk in equilibrium because of Siegel’s paradox. Solnik (1993) clarifies the implication of Siegel’s paradox for the pricing of exchange rate risk.
of the ith domestic investor invested in foreign risky assets and of the wealth of the jth foreign investor invested in domestic risky assets. Let \(T^d\) be the wealth-weighted average of the risk-tolerances of the domestic investors and \(T^f\) the corresponding quantity for foreign investors. Further, let \(W^d\) be the aggregate wealth of domestic investors, \(W^f\) the aggregate wealth of foreign investors, and \(W^w\) the sum of \(W^d\) and \(W^f\). We can use Equations (15) and (16) to obtain the equilibrium risk premium on an investment in a foreign risk-free bond financed through risk-free borrowing in the domestic currency. Assuming that there is no net supply of the risk-free asset, the domestic holdings of the domestic risk-free asset must correspond to the short position of foreign investors in the same asset. Another way to put this is that a dollar invested in the domestic risk-free asset by domestic investors is a dollar that foreign investors borrow from domestic investors. Equilibrium in the market for the domestic risk-free asset therefore requires that
\[ \frac{W^d - w_d^w W^w + T^f W^f}{T^d W^d + T^f W^f} = \frac{r^* + \mu_e - r}{\sigma_e^2}, \tag{17} \]
where \(w_d^w\) is the weight of domestic currency risky assets in the world market portfolio. The numerator on the right-hand side of this equation is the domestic currency expected excess return on an investment in the foreign risk-free bond. Alternatively, it is the expected gain from a long forward position to buy the foreign currency. This is the forward risk premium for a domestic investor. An important implication of Equation (17) is that the forward risk premium can be different from zero when (1) the foreign exchange rate is uncorrelated with equities in their own currency and (2) there is no net supply of riskless bonds in either domestic or foreign currency.⁷ In such a world, there would be no risk premium for foreign exchange in the world CAPM in real terms. The domestic currency return on foreign stocks would depend on the exchange rate, but the real return of foreign stocks would not. Consequently, the beta of foreign exchange with respect to the world market portfolio would be zero. In contrast, foreign exchange rate risk would be priced in Solnik’s model so that the markets for the foreign currency and domestic currency risk-free assets are in equilibrium. Using the domestic CAPM leads to pricing mistakes in this model not only because the market portfolio should include foreign assets, but also because it omits the foreign currency risk factor when the assumptions that lead to Solnik’s model hold. To understand the determinants of the risk premium on foreign exchange, consider an increase in domestic wealth, \(W^d\). As a result of this increase, the demand for
⁷ One might argue that assuming no riskless bonds outstanding is unrealistic since there are government bonds outstanding. However, government bonds are obligations of domestic residents, so that in a world where the Ricardian equivalence result holds, one can view domestic residents as supplying these bonds [see Barro (1974)].
foreign stocks by domestic investors increases. Absent increased borrowing in foreign currency, the foreign currency exposure of domestic investors increases. To reduce this foreign currency exposure, domestic investors want to borrow more abroad to hedge holdings of foreign stocks. Keeping all expected returns the same, there is now excess demand for borrowing in foreign currency. To reduce that excess demand, the reward for bearing foreign currency risk has to increase for domestic investors, so that they borrow less to hedge their foreign stock holdings. An increase in the value of domestic stocks means that the weight of domestic stocks in the portfolios of foreign investors increases and hence these investors want to borrow more in domestic currency to reduce their foreign currency exposure. To reduce the excess demand for borrowing in domestic currency by foreign investors, the expected excess return of the domestic risk-free asset in foreign currency has to increase, which is equivalent to a decrease of the expected excess return of the foreign risk-free asset in domestic currency. This pricing of foreign exchange risk is the central feature of Solnik’s model. It is important to note, however, that his model takes exchange rate dynamics as exogenous. In an equilibrium model, the exchange rate and its dynamics would be functions of more primitive quantities. In such a model, there would be no exchange rate risk premium – the risk premium paid for taking on foreign exchange rate risk would be the risk premium for taking on exposure to the primitive variables that affect the exchange rate. However, it would still be correct that taking on foreign exchange rate risk could be rewarded even when there is no correlation between the return on foreign exchange and the return on the market portfolio.⁸
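The equilibrium condition in Equation (17) and the comparative statics discussed above can be sketched numerically. The sketch assumes the demands in Equations (15) and (16) aggregate with wealth-weighted risk tolerances, and that total holdings of domestic risky assets equal their weight in the world market portfolio times world wealth. The market-clearing function is our own aggregation of those demands, not a formula stated in the text, and all names and inputs are hypothetical.

```python
def fx_risk_premium(T_d, T_f, W_d, W_f, w_dw, var_fx):
    """Expected excess return r* + mu_e - r implied by Equation (17)."""
    W_w = W_d + W_f
    return var_fx * (W_d - w_dw * W_w + T_f * W_f) / (T_d * W_d + T_f * W_f)

def net_domestic_riskfree_holdings(premium, T_d, T_f, W_d, W_f, w_dw, var_fx):
    """Aggregate holdings of the domestic risk-free asset (zero in equilibrium).

    Domestic investors hold what is left of their wealth after risky assets
    and the foreign bond position of Equation (15); foreign investors hold
    the position of Equation (16). Domestic stock holdings of both groups
    sum to w_dw * W_w.
    """
    W_w = W_d + W_f
    domestic_side = W_d - T_d * W_d * premium / var_fx
    foreign_side = T_f * W_f * (var_fx - premium) / var_fx
    return domestic_side + foreign_side - w_dw * W_w

params = dict(T_d=0.5, T_f=0.5, W_d=100.0, W_f=100.0, w_dw=0.5, var_fx=0.01)
premium_base = fx_risk_premium(**params)
clearing_gap = net_domestic_riskfree_holdings(premium_base, **params)

# Higher domestic wealth, holding the other inputs fixed.
premium_more_wealth = fx_risk_premium(**{**params, "W_d": 120.0})
# Higher weight of domestic stocks in the world market portfolio.
premium_more_domestic_stock = fx_risk_premium(**{**params, "w_dw": 0.6})

print(f"baseline premium:                {premium_base:.5f}")
print(f"market-clearing gap:             {clearing_gap:.2e}")
print(f"premium with higher W_d:         {premium_more_wealth:.5f}")
print(f"premium with higher stock weight: {premium_more_domestic_stock:.5f}")
```

In this parameterization the premium is positive even though no asset return is correlated with the exchange rate. It rises with domestic wealth and falls when domestic stocks become a larger share of the world market.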
2.3. A general approach
Merton (1973) provides a general representation of the demand function for risky assets when the investment-opportunity set changes over time. His approach was subsequently extended to the case where commodity prices change randomly over time. Let S be a vector of state variables that describe the investment and consumption opportunity sets. We now assume that there are K countries, with \(n_j\) assets in country j, and N risky assets in domestic currency. V is the instantaneous variance–covariance matrix of returns of the assets risky in domestic currency, \(V_S\) is the instantaneous covariance matrix between asset returns and the changes in the state variables, and \(\mu\) is the vector of instantaneous expected excess returns using the domestic currency as the numéraire. Investors are assumed to maximize their expected lifetime utility of consumption of goods. Let \(J(W, S, t)\) be the indirect utility function of lifetime
⁸ International asset-pricing models that make the exchange rate endogenous include Lucas (1982), Svensson (1985), and Stulz (1987) for monetary economies. Uppal (1993) extends the model of Dumas (1992), where it is costly to transport the only consumed good between two countries. In his model, the exchange rate is the relative price of the foreign good in terms of the domestic good.
consumption as a function of wealth, the state variables, and time. With this notation, the asset demands for the ith investor of country k must satisfy:
\[ w^{ki} = \left( \frac{-J_w^{ki}}{J_{ww}^{ki} W^{ki}} \right) V^{-1} \mu + V^{-1} V_S \left( \frac{-J_{wS}^{ki}}{J_{ww}^{ki} W^{ki}} \right), \tag{18} \]
where \(J_w^{ki}\) is the partial derivative of \(J(W, S, t)\) with respect to wealth. \(J_{ww}^{ki}\) and \(J_{wS}^{ki}\) are, respectively, the partial derivatives of \(J_w^{ki}\) with respect to wealth and the vector of state variables. Equation (18) has a straightforward interpretation. All investors’ holdings of risky assets can be decomposed into holdings of the so-called tangency portfolio, \(V^{-1}\mu\), and S so-called hedge portfolios, where the weights of these portfolios sum to one through riskless borrowing and lending. The hedge portfolios hedge the investor against the impact of unexpected changes in state variables on her marginal utility of wealth. The asset demands of Equation (18) accommodate any assumptions about consumption opportunity sets. Note that all investors can solve their portfolio choice problem using domestic currency as their numéraire. We can use Equation (18) to understand the asset demands of the model discussed in Section 2.1. With that model, all investors consume the same good, the price of the good differs across countries, the law of one price applies to the good, and the investment-opportunity set is constant. The real wealth of the investor depends only on the price of the good in the numéraire currency. If the price dynamics of the good in that currency are exogenously specified and the price of the good is a sufficient statistic for the consumption-opportunity set, then the price of the good in the numéraire currency is the only state variable. If there is a risk-free asset in real terms, then a nominal risk-free bond is a perfect hedge against unanticipated changes in the price of the good. Hence, Equation (18) can be used to characterize asset demands when there is a risk-free asset in real terms as well as when such an asset does not exist. This equation shows how inconsequential exchange rates are in the model of Section 2.1. A foreign currency risk-free asset is treated no differently than any other risky asset in the asset demand equations.
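The decomposition in Equation (18) into a tangency portfolio and hedge portfolios can be sketched for two risky assets and one state variable. The sketch assumes the second asset is a perfect hedge for the state variable, so that the covariance vector between asset returns and the state variable equals the second column of V. The two scalar coefficients stand in for the ratios of derivatives of the indirect utility function. All numbers and names are hypothetical.

```python
def solve_2x2(a, b):
    """Solve a x = b for a 2x2 matrix a using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (b[0] * a[1][1] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - a[1][0] * b[0]) / det,
    ]

# Hypothetical variance-covariance matrix and expected excess returns.
V = [[0.04, 0.006],
     [0.006, 0.01]]
mu = [0.05, 0.01]

# Asset 2 replicates the state variable, so V_S is the second column of V.
V_S = [V[0][1], V[1][1]]

risk_tolerance = 0.5  # stands for -J_w / (J_ww W), assumed value
hedge_coeff = -0.3    # stands for -J_wS / (J_ww W), assumed value

tangency = solve_2x2(V, mu)       # V^{-1} mu
hedge_portfolio = solve_2x2(V, V_S)  # V^{-1} V_S

weights = [
    risk_tolerance * t + hedge_coeff * h
    for t, h in zip(tangency, hedge_portfolio)
]

print("tangency portfolio V^-1 mu: ", [round(x, 4) for x in tangency])
print("hedge portfolio V^-1 V_S:   ", [round(x, 4) for x in hedge_portfolio])
print("risky asset demands:        ", [round(x, 4) for x in weights])
```

Because the state variable is spanned by asset 2 alone, the hedge portfolio puts all its weight on that asset and none on asset 1, so the hedging demand leaves the relative holdings of the other risky assets untouched.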
Put differently, exchange rates play no special role in the model of Section 2.1.

If the assumptions of the Solnik/Sercu model are used to characterize asset demands using Equation (18), then the only state variables are exchange rates. Each investor consumes only one commodity, but that commodity depends on the country of residence of the investor. The indirect utility function of an investor depends only on her wealth and on the exchange rate of her country. This means that the indirect utility function of investors will vary across countries because they care about different prices. The role of the exchange rate is not trivial in that model because the price of each currency relative to the numéraire currency is a sufficient statistic for the consumption-opportunity set of investors from the country of that currency. Using the domestic currency as the numéraire, there is a perfect hedge against fluctuations of the exchange rate of country i, namely the risk-free bond of country i. Since that hedge is part of the risky securities, each column of $V_S$ is also a row of $V$
in the Solnik/Sercu model, so that $V^{-1} V_S$ is an identity matrix. An investor from the jth country would therefore add to her holdings of the tangency portfolio an investment of $-J_{wS}/(J_{ww} W^j)$ in the risk-free bond of her country. Interestingly, in this model, all investors hold the same tangency portfolio and in addition take on a position in the risk-free bond of their country to hedge against changes in the price of the commodity they consume. Since stocks are not used to hedge, every investor in the world holds any two stocks in the same proportion. Consider an investor from country j. In addition to her holdings of the risk-free bond of her country through the tangency portfolio, she holds $-J_{wS}/(J_{ww} W^j)$ in the risk-free asset of her country.

We could construct the asset demands in any currency that we want. If we use currency k for the asset demands for the ith investor in country k, then no state variable enters the investor’s indirect utility function of wealth. This investor will put $T^{ki} W^{ki}$ in the tangency portfolio and $(1 - T^{ki}) W^{ki}$ in the risk-free asset of country k. Consider then stock j that has a weight of $w_j^w$ in the world market portfolio. Since there is no hedging demand for stocks in the Solnik/Sercu model, it must be that

$$ w_j^w W^W = w_j^T \sum_{i}^{K} T^i W^i = w_j^T\, l\, W^W, \qquad (19) $$
where $w_j^T$ is the weight of asset j in the tangency portfolio, $T^i$ is the wealth-weighted average of risk tolerances in country i, $W^i$ is the wealth of country i, $l$ is the wealth-weighted average of risk tolerances across countries, and $W^W$ is world wealth. It follows from this that the weight of a stock in the tangency portfolio is $1/l$ times its weight in the world market portfolio. We can now get to the weight of country q’s risk-free bond in the tangency portfolio. The risk-free bond has zero capitalization but has a hedging demand from investors of country q. Let the weight in the tangency portfolio of the risk-free bond from country q be $w_q^T$. Consequently:

$$ (1 - T^q)\, W^q + w_q^T \sum_{i}^{K} T^i W^i = 0. \qquad (20) $$
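Equations (19) and (20) can be solved directly for the composition of the tangency portfolio. The sketch below is illustrative only: the function name `tangency_composition` and all numerical inputs are hypothetical, and it assumes that total stock market capitalization equals world wealth because bonds are in zero net supply. It also computes the ratio of the bond position to the country's stock weight, which is the hedge ratio of Equation (21).

```python
import numpy as np


def tangency_composition(T, W, M):
    """Solve Equations (19) and (20) for tangency-portfolio weights.

    T : country risk tolerances T^i
    W : country wealth W^i
    M : country stock market capitalizations M^i (assumed to sum to world wealth)
    """
    T = np.asarray(T, dtype=float)
    W = np.asarray(W, dtype=float)
    M = np.asarray(M, dtype=float)

    world_wealth = W.sum()
    invested_in_tangency = (T * W).sum()          # sum_i T^i W^i
    avg_tolerance = invested_in_tangency / world_wealth   # l in Equation (19)

    stock_w = (M / world_wealth) / avg_tolerance  # Eq. (19): world weight divided by l
    bond_w = -(1.0 - T) * W / invested_in_tangency  # Eq. (20) solved for w_q^T
    hedge_ratio = -bond_w / stock_w               # Eq. (21)
    return avg_tolerance, stock_w, bond_w, hedge_ratio


# Symmetric world: same risk tolerance everywhere, no net foreign investment (W = M).
sym = tangency_composition(T=[0.6, 0.6, 0.6], W=[100, 100, 100], M=[100, 100, 100])
print("symmetric hedge ratios: ", sym[3])

# Asymmetric world: risk tolerances and net foreign investment differ across countries.
asym = tangency_composition(T=[0.4, 0.6, 0.8], W=[150, 100, 50], M=[100, 100, 100])
print("asymmetric hedge ratios:", asym[3])
```

With identical risk tolerances and wealth equal to market capitalization, every country's ratio collapses to one minus the common risk tolerance; once either condition is relaxed, the ratios differ by country.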
We can solve for $w_q^T$ to find out the exposure to currency q in the tangency portfolio. Black (1990) created a considerable amount of controversy by using the currency exposure in the tangency portfolio to point out that, in a symmetric world, there is a universal hedge ratio. To understand the issues raised by Black (1990), it is better to follow Solnik (1993) and consider the amount of the risk-free bond sold in the tangency portfolio as a fraction of the share of the risky assets of country q in the tangency portfolio, which we write as $w_T^q$. This expression, which has the interpretation of a hedge ratio, is

$$ h^q = \frac{-w_q^T}{w_T^q} = \frac{(1 - T^q)\, W^q}{M^q}, \qquad (21) $$

where $M^q$ is the market capitalization of country q. Suppose that we are in a symmetric world with relative risk tolerance less than one. In such a world, there cannot be net
foreign investment, so that $W^q = M^q$, and everybody has the same risk tolerance. Hence, the hedge ratio is the same for stocks of every country: it is exactly $1 - T$, where $T$ is the common risk tolerance. Black (1990) called this the universal hedge ratio.

In general, the world is not as simple as we presumed in the analysis just concluded, so that hedge ratios differ for stocks of different countries. For instance, differences in risk tolerance across countries or net foreign investment invalidate the universal hedge ratio formula. More importantly, the result relies on the assumption that investors hold the log portfolio and the riskless asset when they compute their asset demands in their own currency. In general, this is not the case because investors want to hedge a variety of risks. For example, a major limitation of the Solnik/Sercu model is that it assumes away inflation. Introducing inflation makes it possible to distinguish between nominal exchange rate changes and real exchange rate changes. If there is inflation, investors will attempt to hedge against its effects. Adler and Dumas (1983) develop a model where the price level in each country evolves randomly and where deviations from purchasing power parity are possible. In their work, they make clear the role of the utility function of investors, of price level changes, and of deviations from purchasing power parity in international portfolio choice and international asset pricing.

More generally, the instantaneous distribution of the rate of change of the exchange rate can change over time, consumption baskets may differ across investors, the investment-opportunity set can change over time, investors may have non-investment income, and so on. In such a world, Stulz (1981a), building on Breeden (1979), shows that instantaneous expected excess returns must satisfy

$$ m_{I_j} - r - s_{I_j,P} = \frac{b_{I_j,c}}{b_{M,c}} \left[ m_M - s_{M,P} \right], \qquad (22) $$
where $m_{I_j}$ is the instantaneous expected domestic-currency return of the jth asset (the return is given by Equation (13) for a foreign asset), $r$ is the instantaneous domestic nominal rate of interest, $s_{I_j,P}$ is the instantaneous covariance between the return on asset j and the domestic rate of inflation, $b_{I_j,c}$ is the instantaneous consumption beta of the return of asset j, where $c$ is real consumption in the domestic country, $b_{M,c}$ is the instantaneous consumption beta of portfolio M, $m_M$ is the instantaneous expected excess return of portfolio M, and $s_{M,P}$ is the instantaneous covariance between the excess return of portfolio M and the rate of inflation. This equation holds regardless of the composition of the reference portfolio M, so that it holds if that portfolio is the world market portfolio. The CAPM is a special case of this equation that obtains when the market portfolio has a return perfectly correlated with consumption. It is immediately clear from Equation (22) that the domestic CAPM cannot hold if investors hold internationally diversified portfolios because consumption will not be perfectly correlated with the domestic market portfolio. Yet, if domestic assets are not correlated with the residual of a regression of consumption on the domestic
market portfolio, the relative pricing of domestic risky assets will still be correct when using the domestic market portfolio. More generally, however, even foreign assets can be priced domestically when using the consumption beta model. As emphasized in Stulz (1981a), with the consumption capital asset-pricing model, one does not have to make a priori assumptions about foreign exchange rate dynamics to price assets since the same version of the model is consistent with any type of foreign exchange rate dynamics. Further, one does not have to make assumptions about the importance of global factors on asset pricing since the consumption capital asset-pricing model implemented with domestic consumption holds regardless of the importance of these factors. The model can be used to price foreign exchange risk and implies that the risk premium on foreign currency increases with the consumption beta of the exchange rate. Finally, the earlier models of Grauer, Litzenberger and Stehle, of Solnik, and of Sercu, as well as the Adler and Dumas model, obtain as special cases of Equation (22).

2.4. Empirical evidence on asset pricing using perfect market models

How well do these international asset-pricing models perform? The empirical literature can be classified into those studies that provide tests of the world CAPM single-factor model (Section 2.1), those that examine multi-factor asset-pricing models, such as the international arbitrage pricing model or the Solnik/Sercu international asset-pricing model (IAPM) with currency hedging portfolios (Section 2.2), and finally, the general intertemporal or consumption-based asset-pricing models (Section 2.3). We assess each of these separate classes of studies in this section, but it is worthwhile to note several common themes. First, empirical tests for each class of models adopted unconditional approaches early on.
Over time, the focus shifted to conditional tests that assume the model holds period by period and thus allow the joint distribution of asset returns to change dynamically over time. Second, studies in each class differentiate themselves according to the test assets that are under examination by the authors. For example, while equities are often the focus and, usually, in the form of national indexes, there are some studies with individual stocks and portfolios constructed using various stock characteristics, such as growth, value, size, and industry group. Further, some of the international models are tested with fixed income securities, currency futures, and forward contracts. Finally, tests employ a return horizon that is dictated by data availability and low frequency quarterly and monthly returns are most commonly featured. However, a number of more recent tests consider higher frequency daily or weekly returns. Overall, the evidence shows that country risk premiums change dynamically and predictably over time with their covariance with the world market portfolio return, a result that is supportive of international asset pricing. However, it is much less clear how the cross-section of expected returns across global securities is affected by global factors when one goes from a purely-domestic asset-pricing model to an international asset-pricing model. The early empirical studies of international asset-pricing models provide unconditional tests of the world CAPM with monthly returns on national stock indexes. Stehle
(1977) evaluates the world CAPM for the USA and eight other national indexes to investigate whether U.S. assets are priced internationally. He tests whether the risks that are diversifiable internationally but not domestically are priced. He finds support for this hypothesis, but in his sample the zero-beta portfolio return is too high and the international security market line slope is too flat to be consistent with the model. Overall, this rather weak inference in favor of international pricing stems from the low power of his tests. After all, there is strong collinearity between the U.S. index and the world index with more than 40% weight of the world market portfolio placed on the former. This challenge motivates Jorion and Schwartz (1986) to focus on the North American case, using monthly returns on about 750 individual stocks in Canada. They take advantage of a control sample of Canadian stocks that were also listed on U.S. exchanges and use an orthogonalization of the world market portfolio (NYSE Composite index) relative to the domestic market portfolio (Toronto Stock Exchange 300). They find that, while national factors are priced, risk exposures to the U.S. market are not priced for Canadian stocks, even for those cross-listed in the USA. Mittoo (1992) finds evidence, using the CAPM and the arbitrage pricing model, that Canadian stocks were priced as in a segmented market from 1977 to 1981, but as in an integrated market from 1982 through 1986. Results supportive of international pricing are found in Korajczyk and Viallet (1989) who provide evidence using over 6000 individual stocks from the USA, Japan, and the UK with 15 years of monthly data. They show that the world CAPM outperforms the domestic CAPM in that average model mispricing errors are smaller. 
However, in their various tests, they also find that the world CAPM still retains large pricing errors for small stocks, that these average pricing errors are significant for the world CAPM for each market in at least one specification, and that the world CAPM dominates only in the more recent (1979–83) subperiod.

A number of recent studies investigate the world CAPM in a conditional setting with time-varying expected returns, variances, and covariances. This feature can have important implications for the empirical performance of asset-pricing models. The first efforts in this regard by Mark (1988), Giovannini and Jorion (1989) and McCurdy and Morgan (1992) investigate the risk premium in currency forward contracts. They all find that the conditional covariances implied by the world CAPM indeed have some success in forecasting the conditional risk premia over time. Unfortunately, the risk premia are quite large in certain subperiods and the overidentifying restrictions implied by the world CAPM are rejected easily. Giovannini and Jorion, in particular, use weekly returns constructed from government bonds denominated in six currencies and a simple multivariate GARCH specification. They find that there are negligible differences in the huge statistical rejections for either the static (unconditional) world CAPM or those specifications with more general covariance dynamics. However, their tests and others like theirs make the additional assumption that government bonds are net wealth. Rejecting the world CAPM in such tests can be due to this critical auxiliary assumption. Harvey (1991) employs a conditional framework using monthly returns on a number of Morgan Stanley Capital
International (MSCI) stock indexes allowing the expected excess index returns, their betas, and even the world market price of risk to vary over time. Estimated with GMM and local and world instrumental variables, the conditional world CAPM model is useful for predicting expected excess returns. Although the overidentifying restrictions of the model cannot be rejected for the G-7 countries, model pricing errors associated with several countries, and especially Japan, suggest either these markets are not integrated with world markets or the model fails to capture some important priced risks. Chan, Karolyi and Stulz (1992) test implications of the world CAPM using daily data for the 1980s with a multivariate GARCH structure for time-varying conditional expected excess returns and covariances. They cannot reject the world CAPM restriction in a model with a domestic (USA) and foreign (MSCI EAFE index) portfolio, but they find that a simple two-beta model where each portfolio is a source of risk outperforms the world CAPM. One of the limitations of their approach is the low cross-sectional power of the tests. As a result, De Santis and Gerard (1997) extend this multivariate GARCH approach to eight national MSCI indexes with a more parsimonious but less general dynamic GARCH structure. Their tests cannot reject that the price of risk is equal across markets or that the model pricing errors equal zero. Moreover, they cannot reject that country-specific risks are zero. The results are somewhat sensitive to bear market conditions and high interest rate economic environments. The relevance of economic cycles for conditional risk and expected returns is featured in the specification tests of Zhang (2001) that are applied to size and book-to-market portfolios of stocks from the USA, the UK, and Japan. She shows that the world CAPM (Equation 1) is not rejected with these business cycle conditioning instrumental variables. 
Strikingly, she mostly seems to find that exchange rate risk is not priced. The inability of the world CAPM to explain the cross-section of global security expected returns may stem from the potential importance of sources of priced risk other than the world market portfolio in international asset-pricing models. A number of studies establish the conditions under which a multi-beta model, such as the Arbitrage Pricing Theory (APT) or intertemporal CAPM, can hold internationally [see, for instance, Solnik (1983)] assuming a perfect markets model with identical consumption and investment-opportunity sets across countries (Section 2.1). Cho, Eun and Senbet (1986) are the first to provide tests of the international APT. They employ factor analysis with monthly returns for 349 stocks from 11 countries and find that three or four factors are reliably identified. However, the cross-sectional tests (equality of intercepts across pairings of country groups of stocks) lead them to reject international market integration and the APT. Curiously, the results are not sensitive to the currency denomination of the returns and the model does reliably hold for certain pairings of country groupings. Gultekin, Gultekin and Penati (1989) and Korajczyk and Viallet (1989) show that the performance of the international APT depends on the sample period because markets were less open in the 1970s or the early 1980s than they are later on. For example, Korajczyk and Viallet (1989) find that the international APT
outperforms the world CAPM in terms of average pricing errors (though strangely both models are dominated by the domestic APT) but the international APT performs best with capital control dummies during the 1969–74 period. Finally, building on the early success of the three-factor model of Fama and French (1993) in capturing market, size, and book-to-market effects in the USA, Fama and French (1998) propose a similar multi-beta model for the cross-section of global stock returns of the MSCI universe of 7500 stocks from thirteen countries between 1975 and 1995. They show that, while the world CAPM fails, a world two-factor model (world market beta and value-growth beta) can explain the cross-section of expected returns, including most of the annual 7.68% average global value premium for size and book-to-market portfolios during this period. Griffin (2002) finds that the success of the world value-growth factor for some countries stems from the domestic components of that factor. The world CAPM would perform poorly if differences in consumption baskets across countries play an important role in the cross-section of expected returns. In this case, the consumption asset-pricing model or the Solnik/Sercu model would perform better. Despite the conceptual superiority of the consumption-beta model, only a few papers have attempted to implement tests of that model, and even fewer have done so for international stock returns because of the problems of measuring consumption flows. Nevertheless, Wheatley (1988) and Cumby (1990) provide evidence that the Stulz (1981a) consumption beta model (Equation 22) works reasonably for monthly MSCI national index returns during the 1970s and 1980s. Wheatley tests the model’s restrictions on unconditional mean returns, and shows that the model performs well for all but certain markets (France, Hong Kong, and Italy) and for all but the early 1980s. 
Cumby employs vector autoregressions to test the model’s restrictions on conditional mean returns, and finds the model works well during the 1980s, although not during the late 1970s. Conditional tests of international APT or multi-beta asset-pricing models have been the focus of most recent studies and have consistently shown a lot of promise. Campbell and Hamao (1992), Bekaert and Hodrick (1992), Ferson and Harvey (1993, 1994, 1995, 1997) and Harvey (1995) allow cross-sectional variation using extra-market factors related to macroeconomic and fundamental stock characteristics and show consistently that expected returns of individual countries are forecastable. Each of these studies examines national index returns for developed markets and, in the case of Harvey (1995), emerging markets, but their focus is typically on the role of local versus global conditioning information and exposures to pre-specified factors for expected returns and not the ability of the various multi-beta models to explain the cross-section of expected returns. In fact, Ferson and Harvey (1994) and Harvey (1995) show that both the international CAPM and the multi-beta models they study fail to explain the cross-section of expected returns for 21 national equity markets and new emerging equity markets in Europe, Latin America, Asia, Mideast and Africa. Rouwenhorst (1999) offers further evidence of the importance of local factors, such as momentum, turnover, size, and value, using data from emerging markets. Griffin (2002) shows that size and book-to-market are local factors as well. Finally, Ilmanen (1995) presents
conditional tests for long-maturity government bonds in six countries using a small set of global instruments and a latent factor model, like Bekaert and Hodrick (1992). He demonstrates significant predictable variation in excess bond returns, but he identifies one global risk factor that captures the cross-section of expected returns that is strongly correlated with a world excess bond return factor.

When exchange rates are perfectly correlated with relative prices of goods as in the model of Section 2.2 and the only hedging portfolios investors hold are foreign-currency denominated bonds, expected asset returns depend on the covariances of assets with the return of the world market portfolio and with currency returns. Nevertheless, remarkably few of the empirical tests have paid attention to the role of foreign exchange risk. An early paper by Jorion (1991) showed that a trade-weighted exchange rate risk measure is priced in the 1980s for test assets consisting of portfolios of about 900 U.S. multinational firms constructed by the proportion of overseas sales. Later, Dumas and Solnik (1995) and De Santis and Gerard (1998) find empirical support for conditional versions of multi-factor asset-pricing models that include a world market portfolio and a series of foreign exchange return factors. Dumas and Solnik combine a “pricing kernel” formulation with a generalized instrumental variables approach for four developed markets and find that exchange rate risk is priced and cannot reject the international asset-pricing model. De Santis and Gerard (1998) conduct similar tests using the multivariate GARCH representation of their 1997 paper, but also document that the currency risk premium captures almost 64% of the total premium during the 1980s for each market, except the USA.
Dahlquist and Sällström (2002) investigate the ability of asset-pricing models to price 19 national portfolios, national industry portfolios sorted into 25 characteristic-sorted portfolios (such as value, growth, and size), and 39 global industry portfolios. They find that using national portfolios they cannot reject any of the models. However, when they investigate the characteristic-sorted portfolios, they find that the CAPM without foreign exchange risk factors performs poorly. Even that model faces difficulties because characteristics are priced when added to the pricing equation and it does poorly when they try to price the global industry portfolios. Given the considerable success of industrial production as a conditioning variable in Zhang’s (2001) work, it would be interesting to see how this variable performs in the tests implemented by Dahlquist and Sällström.

Though the empirical evidence on the world CAPM suggests that, for the USA, it may not be necessary to use the world market portfolio to investigate the cross-section of expected returns, some of the empirical evidence on the pricing of exchange rate risk indicates that one may have to take into account the exposure of stocks to foreign exchange risk. Paradoxically, the evidence on foreign exchange rate exposures of stocks is weak as discussed in Griffin and Stulz (2001), making it puzzling that exchange rates would matter so much in the cross-section of returns in some studies.
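One common way to quantify the exposure of a stock to foreign exchange risk is as a regression coefficient. The sketch below is a hedged illustration on simulated data, not a reproduction of any study cited above: the return-generating process, the parameter values, and the variable names are all assumptions. It regresses a stock's return on a world market return and a currency return, and reads the currency coefficient as the exposure.

```python
import numpy as np

# Simulated returns; every number here is hypothetical.
rng = np.random.default_rng(0)
n = 5000
world = rng.normal(0.005, 0.04, n)     # world market portfolio return
fx = rng.normal(0.0, 0.03, n)          # return on a foreign currency position

true_beta_world, true_beta_fx = 1.1, 0.2
noise = rng.normal(0.0, 0.02, n)
stock = true_beta_world * world + true_beta_fx * fx + noise

# Ordinary least squares: stock = alpha + beta_world * world + beta_fx * fx + error
X = np.column_stack([np.ones(n), world, fx])
coef, *_ = np.linalg.lstsq(X, stock, rcond=None)
alpha, beta_world, beta_fx = coef

print("estimated world beta:       ", beta_world)
print("estimated currency exposure:", beta_fx)
```

A small estimated currency coefficient in a regression of this kind is what the text means by weak exposure; whether the exposure carries a risk premium is a separate, cross-sectional question.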
3. Home bias

The models discussed in Sections 2.1 and 2.2 have strong implications for the equity portfolios of investors. With these models, all investors hold any two stocks in the same proportion. The generalized model discussed in Section 2.3 predicts that all investors invest in the portfolio an investor with logarithmic utility would hold plus hedge portfolios. Except for hedge portfolios, therefore, all investors also hold any two stocks in the same proportion. If hedge portfolios do not include stocks, then all three models imply that investors hold any two stocks in the same proportion. Unless hedge portfolios contain stocks, it must therefore be the case that all investors hold the world market portfolio of stocks. As shown in Ahearne, Griever and Warnock (2002), at the end of 1997 U.S. stocks comprised 48.3% of the world market portfolio. At that time, foreign stocks represented only 10.1% of the stock portfolios of U.S. investors. The holdings of foreign stocks predicted by the world CAPM and the Solnik/Sercu model were, therefore, about five times the actual holdings of foreign stocks of U.S. investors. This dramatic underweighting of foreign stocks in portfolios is called the home bias.⁹

The home bias is pervasive across countries. French and Poterba (1991) and Tesar and Werner (1994) show that at the beginning of the 1990s, the fraction of stock market wealth invested domestically was in excess of 90% for the USA and Japan, and in excess of 80% for the UK and Germany. Figure 1 uses flow of funds data to show how the home bias has evolved over time for U.S. investors from 1973 through the end of 2000. This figure shows holdings of foreign stocks for American investors as a fraction of their stock market wealth and the share of U.S. stocks in the world market portfolio. Though the home bias has decreased for these investors, most of the decrease occurred over a few years from 1985 through 1994.
Before 1985, holdings of foreign stocks represented a trivial fraction of the stock market wealth of American investors. From 1945 to 1973, holdings of foreign stocks as a fraction of stock market wealth never exceeded 1%. Though holdings of foreign stocks increased sharply from 1985 through 1994 to roughly 10% of the stock market wealth of American investors, this fraction has stayed remarkably constant since then. It was 9.91% in 1994 and 10.53% at the end of 2000. Though the portfolio weight of foreign assets has stayed relatively stable since 1994, the dollar value of foreign holdings more than doubled.

For a number of years after World War II, most countries had strong barriers to foreign investment. Because most currencies were not convertible, investing abroad required access to scarce foreign currencies. Many countries also had prohibitions on foreign investment by their own citizens and often limits on, or outright prohibitions of, ownership of domestic stocks by foreign investors. Even when countries did not have outright restrictions on foreign investment, equity investment by foreign investors could be disadvantaged because of tax considerations, because of the necessity to
⁹ Lewis (1999) provides a recent extensive review of the home bias literature.
[Figure 1 plot: horizontal axis, years 1973 to 2000; vertical axis, 0% to 80%. Series: foreign equities share of world market portfolio; foreign equities share in U.S. stock market wealth.]
Fig. 1. US home bias, 1973–2000. Source: Flow of Funds Accounts of the United States, Flows and Outstandings, Federal Reserve Board.
acquire foreign exchange, and because of the costs of hedging exchange rate risk. In the presence of barriers to international investment, one expects investors to hold more domestic stocks than predicted by the world CAPM or the Solnik/Sercu model because barriers to international investment lessen the benefits of international diversification. Black (1974) derives a model of international portfolio choice and asset pricing where barriers to international investment take the form of a proportional tax that is rebated for short sales. Barriers of this type might correspond to some types of taxes but, generally, obstacles to investment are such that they reduce the return both for short and long positions. Stulz (1981b) models such barriers as the equivalent of a tax paid on the absolute value of holdings of foreign stocks and shows that they imply that some foreign stocks are not held by domestic residents. This tax can represent explicit direct costs of holding foreign stocks as well as proxy for other indirect costs, such as information costs. Other barriers to international investment take the form of outright ownership restrictions. Eun and Janakiramanan (1986) and Errunza and Losq (1985) provide models that examine the portfolio and asset-pricing implications of such barriers. A number of papers investigate empirically the implications of partial segmentation, where partial segmentation is defined to mean that there are some equity flows that take place either in or out of a country, but these flows are limited because of explicit constraints on or because of barriers to international investment. Many of the papers discussed in this section use the hypothesis of segmented markets as their alternative hypothesis. In particular, it is common in the literature to contrast global pricing
of assets to local pricing. When using developed markets for the 1980s and 1990s, authors generally find that they can reject local pricing. However, for a number of countries, some of the barriers to international investment are known explicitly, so that a model that reflects these barriers can be tested. Errunza and Losq (1985) derive explicit predictions for the expected returns of securities that cannot be held freely by foreign investors and test these predictions on a short sample period, obtaining results that are consistent with their predictions. Hietala (1989) tests a model where he incorporates the ownership restrictions that applied to Finnish companies over his sample period and finds supportive evidence that the magnitude of the premium of unrestricted shares relative to restricted shares can be explained by his model. Bailey and Jagtiani (1994) investigate the determinants of the premium of shares available to foreign investors in Thailand and show how that premium varies over time. With explicit or implicit barriers to international investment, securities available to foreign investors may not be equally attractive to all foreign investors since the barriers may differ across investors depending, for instance, on their tax status. In this case, the demand curves for domestic securities from foreign investors may be downward-sloping, creating incentives for firms to restrict their supply of shares available to foreign investors to increase the price of these shares. Stulz and Wasserfallen (1995) expand models with barriers to international investment to take into account the downward-sloping demand curves for domestic securities from foreign investors. They find supportive evidence for Switzerland. Domowitz, Glen and Madhavan (1997) investigate the pricing of restricted and unrestricted shares in Mexico and find support for the hypothesis that the demand curve for Mexican shares from foreign investors is downward sloping. 
Over the last thirty years, however, barriers to international investment have fallen dramatically. Emerging markets took longer to remove explicit barriers to international investment. However, even for emerging markets that have few such explicit barriers, sovereign risk often remains a significant barrier to international investment. Further, in these markets, there have been instances where barriers have been restored, the most visible case being Malaysia in 1998. Bekaert and Harvey (1995) estimate a multivariate GARCH model where the degree of integration of an emerging market changes over time and then extract the extent to which the market is segmented from the data. With their approach, the risk premium on a market depends on its volatility if the market is completely segmented and depends on its world market beta if it is completely integrated. They allow a market’s expected return to depend both on its volatility and on its world beta. The degree of segmentation of a market decreases when the market’s world beta becomes a more important determinant of the market’s expected return. They find that the degree of segmentation varies over time, sometimes decreasing and sometimes increasing.
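The logic of a time-varying degree of integration can be illustrated with a minimal sketch. It is not the multivariate GARCH specification of Bekaert and Harvey (1995). It only assumes that the expected excess return is a weighted average of a segmented premium (a price of local risk times the market's own variance) and an integrated premium (a price of world risk times the covariance with the world market), with a weight phi on the integrated regime. The function name, the weight phi, and all numbers are hypothetical.

```python
def expected_excess_return(phi, price_world_risk, cov_with_world,
                           price_local_risk, own_variance):
    """Stylized mixture of integrated and segmented risk premia.

    phi is the assumed degree of integration:
    0 means fully segmented, 1 means fully integrated.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    integrated = price_world_risk * cov_with_world   # premium under global pricing
    segmented = price_local_risk * own_variance      # premium under local pricing
    return phi * integrated + (1.0 - phi) * segmented


# Hypothetical annualized inputs: a volatile market with a low world beta.
own_variance = 0.35 ** 2             # 35% own volatility (assumed)
cov_with_world = 0.5 * 0.15 ** 2     # world beta 0.5 times world variance (assumed)
price_of_risk = 3.0                  # same price of risk in both regimes (assumed)

for phi in (0.0, 0.5, 1.0):
    premium = expected_excess_return(phi, price_of_risk, cov_with_world,
                                     price_of_risk, own_variance)
    print(f"phi={phi:.1f}  expected excess return={premium:.4f}")
```

Under these assumed inputs the premium falls as phi rises, which is the sense in which a larger role for the world beta corresponds to less segmentation.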
Bekaert and Harvey (2000) and Henry (2000) provide evidence on the impact of removing barriers to international investment for emerging markets.¹⁰ When a country’s risk premium is determined locally, the mean-variance model implies that the risk premium increases with the volatility of that country’s market. In contrast, when a country’s risk premium is determined globally, the risk premium depends on the covariance of the return of that country’s market portfolio with the return of the world market portfolio. Because emerging markets typically have high volatility but low betas, one would expect their equity to appreciate substantially when they move from local to global pricing. Henry (2000) shows that this is so. Investigating twelve countries, he finds that on average the equity of these countries appreciates by more than 25% over the seven months that precede the opening of their markets and the month of the opening. Further, Chari and Henry (2001) provide evidence supportive of the hypothesis that the expected return of individual stocks is determined by their local beta before their markets open up to foreign investors and by their global beta afterwards. At the same time, however, it is clear that opening a country’s markets to foreign investors has a relatively small impact on the risk premium of that market. While Henry (2000) investigates stock index returns around stock market openings, Bekaert and Harvey (2000) investigate changes in dividend yields. They provide a battery of econometric tests showing that, even though the risk premium falls, a reasonable estimate of this decrease is at most between 100 and 200 basis points. The small decrease in the risk premium following the opening of a market to foreign investors is puzzling. Since the beta of emerging markets is so low and their volatility is so high, one would expect a much more dramatic impact.
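The size of the puzzle can be made concrete with a back-of-the-envelope sketch. It assumes a mean-variance world in which the local premium is risk aversion times the market's own variance and the global premium is the world beta times the world market premium. It also assumes a constant-growth dividend discount model to translate discount rates into prices, which is an illustrative device and not part of the studies cited. All parameter values are hypothetical.

```python
def local_premium(risk_aversion, own_volatility):
    """Risk premium when the market is priced locally (own variance matters)."""
    return risk_aversion * own_volatility ** 2


def global_premium(risk_aversion, world_beta, world_volatility):
    """Risk premium when the market is priced globally (world beta matters)."""
    world_market_premium = risk_aversion * world_volatility ** 2
    return world_beta * world_market_premium


def constant_growth_price(dividend, discount_rate, growth):
    """Price of a perpetuity growing at a constant rate (assumed valuation model)."""
    if discount_rate <= growth:
        raise ValueError("discount rate must exceed growth rate")
    return dividend / (discount_rate - growth)


# Hypothetical inputs for a high-volatility, low-beta emerging market.
risk_aversion = 2.0
risk_free, growth = 0.03, 0.02
premium_local = local_premium(risk_aversion, own_volatility=0.30)
premium_global = global_premium(risk_aversion, world_beta=0.5, world_volatility=0.15)

price_local = constant_growth_price(1.0, risk_free + premium_local, growth)
price_global = constant_growth_price(1.0, risk_free + premium_global, growth)

print(f"local premium:  {premium_local:.4f}")
print(f"global premium: {premium_global:.4f}")
print(f"implied fall in premium (basis points): {(premium_local - premium_global) * 1e4:.0f}")
print(f"price ratio, global over local: {price_global / price_local:.2f}")
```

With inputs of this kind, the implied fall in the premium is far larger than the 100 to 200 basis points reported in the text, which is why the small measured effect is described as puzzling.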
The home bias may help explain this low impact of stock market liberalization on the risk premium of emerging markets. The dramatic change from local to global pricing takes place only if, after liberalization, foreign investors begin to hold stocks of the liberalizing country in proportion to their weight in the world market portfolio. We know that they do not. If they did, the holdings of domestic stocks by domestic investors in emerging markets would be trivial since the share of an emerging market in the world market portfolio is typically less than 1%. Obviously, foreign holdings of emerging market stocks are not as extensive. There are at least four reasons why one has to be skeptical about explanations of the home bias that rely on explicit direct barriers to international investment. First, these barriers have fallen over time. In fact, Errunza, Hogan and Hung (1999) show that investors can obtain most of the benefits from international diversification by investing in securities traded in the USA such as American Depositary Receipts (ADRs) and country funds. Despite the growth in these securities, as shown in Figure 1, the home
10 Stulz and Wasserfallen (1995) provide an event study of the removal of barriers to international investment for a Swiss stock. They show that when Nestlé made shares that could be held only by Swiss residents freely available, the price of these shares increased sharply as predicted.
bias for U.S. investors has remained unchanged over the last seven years. Second, the gross equity flows are very large compared to the net equity flows.¹¹ For instance, in 1999, equity transactions between U.S. investors and foreign investors totaled $4.6 trillion. Further, during the first three quarters of 2000, U.S. investors bought foreign shares for $1.376 trillion and sold foreign shares for $1.364 trillion, buying a net amount of foreign shares of about $12 billion. Such dramatic gross purchases and gross sales seem hard to reconcile with the existence of important differential transaction costs for the purchase and sale of foreign shares compared to domestic shares. Third, Glassman and Riddick (2001) calibrate a mean-variance model of portfolio choice with transaction costs in order to explain the deviations from the world market portfolio for U.S. investors when considering foreign holdings in Canada, France, Germany, Japan, and the UK. They show that explicit direct barriers to international investment would have to be of an extraordinary magnitude (more than 1% per month for an investor with a coefficient of relative risk aversion of 3) to provide a reasonable explanation of the home bias. Fourth, Ahearne, Griever and Warnock (2002) show that ownership restrictions and transaction costs are second-order effects in explaining the cross-country holdings of foreign stocks for U.S. investors. Liberalization does not mean that foreign investors will immediately acquire a large stake in the liberalizing country. Absent foreign equity investment, the risk-sharing benefits of liberalization will not be obtained [see Stulz (1999)].
If explicit direct barriers to international investment and ownership restrictions fail to explain the home bias for U.S. investors, what can? The models discussed in Section 2.3 predict that investors will hold portfolios to hedge against unanticipated changes in state variables on which their expected lifetime utility depends.
As long as the impact of some state variables depends on the investor’s country of residence and hedging these state variables requires common stocks, one would expect investors to hold stock portfolios that depend on their country of residence. The literature has focused on consumption good prices and human capital as the main state variables that could lead to differences in portfolios across countries. In Krugman (1981) and Stulz (1983), investors hold bonds rather than stocks to hedge against relative goods prices because the price of foreign bonds in domestic currency is correlated with the relative price of foreign goods, so that their models cannot explain the home bias. Uppal (1993) provides a general equilibrium model that builds on Dumas (1992). The Dumas model is a two-country model where there is only one consumption good, but the location of the inventory of that good matters because changes in the stock of the good in one country can only take place as a flow and it is costly to move goods from one country to the other. In the Dumas model, the exchange rate is interpreted as the price of the good in the foreign country in terms of the price of the good in the domestic country. The setup of the model makes it
11 Tesar and Werner (1995) were the first to notice that non-resident investors trade very actively compared to resident investors.
possible to generate interesting dynamics for the real exchange rate. In particular, it is possible for the real exchange rate to differ from one and no trade in the good to take place. However, there are upper and lower limits to the extent to which the real exchange rate can depart from one because trade takes place when the real exchange rate deviates too much from one. Uppal’s model investigates portfolio choice in such a setting. He finds that investors with a coefficient of relative risk aversion that exceeds unity end up overweighting foreign stocks. Cooper and Kaplanis (1994) examine whether hedging against inflation risk can explain the home bias and conclude that it cannot. In a simple mean-variance model where human capital is a non-traded asset, Mayers (1973) shows that investors hold stocks to hedge that non-traded asset. If the return to human capital of an investor is negatively correlated with the return of the stocks from that investor’s country, then that investor would increase her position in domestic stocks relative to what would be predicted in a model without non-traded assets. Fama and Schwert (1977) were first to examine this issue for the USA and found little correlation. However, Baxter and Jermann (1997) argue that the correlation between human capital and the stock market of the country of an investor is positive, so that their analysis leads them to conclude that domestic investors should hold fewer domestic stocks than predicted by the mean-variance model. In contrast, Bottazzi, Pesenti and Van Wincoop (1996) present estimates showing a negative correlation, so that they can conclude that hedging of human capital can explain up to 40% of the home bias. 
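The hedging role of a non-traded asset described above can be sketched in a one-stock mean-variance setting. This is a simplified illustration in the spirit of Mayers (1973), not his model: it assumes a single domestic stock, an investor who maximizes expected return minus a variance penalty, and human capital that enters wealth in a fixed ratio to financial wealth. Under these assumptions the optimal stock weight is the usual speculative term plus a hedge term whose sign is the opposite of the covariance between stock returns and the return to human capital. All names and numbers are hypothetical.

```python
def domestic_stock_weight(mean_excess, variance, risk_aversion,
                          human_capital_ratio, cov_with_human_capital):
    """Optimal weight on the domestic stock with non-traded human capital.

    Derived from maximizing  w*mu - (gamma/2) * Var(w*r_s + h*r_h)  over w,
    where h is the ratio of human capital to financial wealth (assumed fixed).
    """
    speculative = mean_excess / (risk_aversion * variance)
    # Negative covariance makes the stock a hedge, so the weight rises.
    hedge = -human_capital_ratio * cov_with_human_capital / variance
    return speculative + hedge


# Hypothetical inputs: 6% excess return, 20% volatility, risk aversion of 3,
# human capital worth twice financial wealth.
base = dict(mean_excess=0.06, variance=0.20 ** 2, risk_aversion=3.0,
            human_capital_ratio=2.0)

for cov in (-0.004, 0.0, 0.004):
    weight = domestic_stock_weight(cov_with_human_capital=cov, **base)
    print(f"cov={cov:+.3f}  domestic stock weight={weight:.2f}")
```

The sign of the covariance therefore decides whether human capital pushes investors toward or away from domestic stocks, which is the point on which the empirical studies cited here disagree.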
However, Glassman and Riddick (2001) show that for human capital to explain the home bias, the return to human capital has to have positive correlation with stock market returns, this positive correlation has to be larger with most foreign markets than it is with the U.S., and the volatility of the return to human capital has to be large compared to the volatility of the U.S. stock market. Though explicit direct barriers to international investment and ownership restrictions cannot explain the home bias, there are also indirect barriers to international investment. It has been often argued that investors know less about foreign stocks than they know about domestic stocks. The hypothesis that foreign investors are less well informed about domestic stocks forms the starting point for several theoretical models. In particular, Gehrig (1993) derives the optimal portfolio when foreign investors know less and shows that this assumption leads to an overweighting of domestic assets. Kang and Stulz (1997) provide some evidence about holdings of foreign assets that is consistent with an information advantage for domestic investors. They show that foreign investors in Japan hold more stocks of large companies than they do of small companies. In other words, the home bias is stronger for small stocks than for large stocks. One would expect that the information disadvantage of foreign investors would be smaller for large stocks. Dahlquist and Robertsson (2001) investigate the holdings of Swedish stocks by non-resident investors. Their results are consistent with Kang and Stulz (1997). They point out that non-resident investors are mostly institutional investors and that the holdings of stocks by non-resident investors exhibit biases that are also typical of resident institutional investors.
Foreign investors could be disadvantaged because of distance, because of differences in language and culture, and because of time zone differences. There is evidence that distance matters. In two papers, Coval and Moskowitz (1999, 2001) show that the weight of a U.S. stock in U.S. mutual funds is negatively related to the distance between the location of the fund and the location of the headquarters of the firm and that mutual fund managers do better with their holdings of stocks of firms located more closely to where the mutual fund is located. Grinblatt and Keloharju (2001) show that, in Finland, language matters in an investor’s portfolio allocation. Finnish investors whose native language is Swedish are more likely to own stocks of companies in Finland that have annual reports in Swedish and whose CEOs speak Swedish than those investors whose native language is Finnish. Choe, Kho and Stulz (2001) find evidence that foreign investors buy at higher prices than resident investors in Korea and sell at lower prices. Shukla and van Inwegen (1995) show that UK money managers underperform American money managers when picking U.S. stocks. Hau (2001) finds that proprietary trades on the German stock market do better when they are geographically closer to Frankfurt. There is however some evidence that conflicts with the view that foreign investors are less well informed about domestic stocks. Grinblatt and Keloharju (2000), Seasholes (2000), and Karolyi (2002) provide evidence that foreign institutional investors outperform residents. Grinblatt and Keloharju use Finnish data to show that over their sample period foreign investors are better at picking Finnish stocks than domestic investors. Seasholes shows that in Taiwan foreign institutional investors buy stocks before positive earnings announcements and sell stocks before negative earnings announcements. 
Both papers argue that foreign institutional investors do better because they are more skilled at acquiring and interpreting information. Karolyi (2002) shows that foreign investors in Japanese equities outperformed Japanese individuals and institutions, including banks, trust and life insurance companies, and corporations themselves during the Asian financial crisis period. In a mean-variance model where investors choose among all stocks, the information disadvantage of foreign investors matters in their portfolio allocation if they perceive that foreign stocks are riskier. Glassman and Riddick (2001) show that investors would have to scale up perceived market portfolio standard deviations typically by a factor from 2 to 5 depending on risk aversion to produce a home bias similar to the one observed for U.S. investors. Such scaling makes little sense. Yet, the confidence intervals around portfolio weights obtained from mean-variance optimization are large enough that they include the case of zero weight on foreign assets [Britten-Jones (1999)], so that investors with different priors might reach dramatically different conclusions about the benefits of international diversification. However, Pastor (2000) and Li (2001) provide evidence that Bayesian investors would have to have rather extreme beliefs in favor of domestic asset pricing to stay away from foreign stocks. It may well be, though, that investors overestimate the risk of foreign stocks and underestimate their expected returns because of behavioral factors. Shiller, Kon-Ya and Tsutsui (1996) provide survey evidence that domestic investors typically expect domestic stocks to earn more than foreign investors do.
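The scaling exercise attributed to Glassman and Riddick (2001) above can be illustrated with a two-asset sketch. It is not their calibration. It assumes one domestic and one foreign portfolio with equal expected excess returns, and it normalizes the mean-variance weights to sum to one, so that risk aversion drops out (unlike in the study cited). The perceived foreign standard deviation is multiplied by a scale factor while the correlation is held fixed. All parameter values are hypothetical.

```python
def risky_portfolio_weights(mu_dom, mu_for, sd_dom, sd_for, corr):
    """Mean-variance weights on (domestic, foreign), normalized to sum to one."""
    var_dom = sd_dom ** 2
    var_for = sd_for ** 2
    cov = corr * sd_dom * sd_for
    det = var_dom * var_for - cov ** 2
    # Unnormalized weights are proportional to inverse(covariance) times means.
    raw_dom = (var_for * mu_dom - cov * mu_for) / det
    raw_for = (var_dom * mu_for - cov * mu_dom) / det
    total = raw_dom + raw_for
    return raw_dom / total, raw_for / total


def foreign_share(scale, mu=0.06, sd=0.16, corr=0.2):
    """Foreign weight when the perceived foreign standard deviation is scale * sd."""
    return risky_portfolio_weights(mu, mu, sd, scale * sd, corr)[1]


for scale in (1, 2, 3, 4):
    print(f"scale={scale}  foreign share={foreign_share(scale):.3f}")
```

In this stylized setting the foreign share falls steadily as the perceived foreign risk is scaled up, which conveys why large scale factors are needed to reproduce a strong home bias.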
Merton (1987) develops a model where investors hold stocks that they know. Such a model would be equivalent to one where investors think that the risk of stocks they do not know is extremely high. With that model, one would also see investors overweight domestic stocks. Strikingly, Ahearne, Griever and Warnock (2002) find that the most important variable in explaining the underweighting of stocks of various countries by American investors is the fraction of a country’s market capitalization that corresponds to firms with ADR programs: the greater this fraction for a country, the greater the weight of that country in the portfolio of U.S. investors. The impact of ADR programs might be due to the fact that having a foreign stock traded in the USA represents a certification effect and reduces information asymmetries because of the adoption of U.S. GAAP. Ahearne, Griever and Warnock (2002) find that the impact of ADRs comes from listings that require firms to use U.S. GAAP and to provide SEC disclosures as opposed to OTC listings. At the same time, however, an ADR program makes firms known, especially when it involves listing on an exchange, which would be supportive of Merton’s model. The empirical evidence on ADR programs [surveyed by Karolyi (1998)] shows that there is a positive abnormal return when a firm announces or lists an ADR program [Foerster and Karolyi (1999) and Miller (1999)], that ADR programs experience a pre-listing stock price run-up and post-listing stock price decline [Foerster and Karolyi (1999, 2000)], and that global factors affect the pricing of ADR firms more than firms from the countries of ADR firms that do not list in the USA [Mittoo (1992) and Karolyi (1998)]. This evidence is consistent with the hypothesis that a U.S. listing reduces some barriers to international investment. The home bias has generated a great deal of research. 
However, most explanations for the home bias have trouble explaining the fact that the allocation to foreign stocks of U.S. investors remained stable from 1994 through 2000. During that period, the size of ADR programs increased dramatically and it became much easier for U.S. investors to get exposure to foreign stocks, yet there was little impact on portfolio allocations. It is plausible that the success of the U.S. stock market led U.S. investors to keep their allocation to that market high, so that momentum trading explains why the home bias has not decreased. However, this explanation for why the home bias of U.S. investors has not decreased makes it even more puzzling why the home bias of foreign investors is so large. Pinkowitz, Stulz and Williamson (2003) explain this home bias by the fact that, in most countries, firms have controlling investors who do not trade their shares. Perfect financial markets models with investors who are mean-variance optimizers cannot explain any of this.
4. Flows, spillovers, and contagion
Cross-border capital flows have grown dramatically in the past three decades, especially to developing economies. For example, in 1975, gross cross-border transactions in bonds and equities for the USA were equivalent to 4% of GDP. As
[Figure 2 omitted. Vertical axis: US $ millions (0 to 25,000,000); horizontal axis: years 1988 to 2000. Series: gross sales to U.S. residents by foreigners; gross purchases from U.S. residents by foreigners; nominal US GDP (constant US$).]
Fig. 2. Foreign purchases of bonds and stocks from US residents and foreign sales of bonds and stocks to domestic US residents versus US GDP. Source: Treasury International Capital (TIC) Reporting System, US Treasury (http://www.treas.gov/tic/), International Monetary Fund, World Economic Outlook: Fiscal Policy and Macroeconomic Stability, World Economic and Financial Surveys, May 2001. (http://www.imf.org/external/pubs/ft/weo/2001/01/index.htm)
shown in Figure 2, these transactions exceeded GDP for the first time in 1992 and by 2000 they had grown to 245% of GDP. Moreover, net portfolio flows have become an economically significant component of total capital flows. Figure 3 shows that total net capital flows to emerging markets were positive each year, rising to a high of $180 billion in 1996 and subsequently declining to a low of $45 billion in 2000. Flows are also remarkably volatile: the East Asian crisis countries went from net private capital flows of $62.4 billion in 1996 to net private capital flows of −$46.2 billion in 1998. The dramatic change in net private capital flows that takes place in crisis periods, as evidenced by the statistics for East Asia as well as the other statistics given in Section 1, has led many policymakers to question whether the financial liberalization process has gone too far and whether controls on capital flows should be reintroduced. The recent upheavals in Asia and Russia have led to the reimposition of some barriers to international investment in some countries. Prominent economists have argued that, while trade liberalization should be encouraged, this is not true of financial liberalization. For instance, Stiglitz (1998) calls for greater regulation of capital flows, arguing that “. . . developing countries are more vulnerable to vacillations in international flows than ever before”. Krugman (1998) argues as follows: “What turned a bad financial situation into a catastrophe was the way a loss of confidence turned into self-reinforcing panic. In 1996 capital was flowing into emerging Asia at the rate
[Figure 3 omitted. Vertical axis: net capital flows, US $ billions (−150 to 200); horizontal axis: years 1993 to 2002. Series: total private flows; portfolio investment; other private flows; direct investment.]
Fig. 3. Net capital flows to developing countries, 1993–2002. Source: International Monetary Fund, World Economic Outlook: Fiscal Policy and Macroeconomic Stability, World Economic and Financial Surveys, May 2001. Statistical Appendix (http://www.imf.org/external/pubs/ft/weo/2001/01/index.htm)
of about $100 billion a year; by the second half of 1997 it was flowing out at about the same rate. Inevitably, with that kind of reversal Asia’s asset markets plunged, its economies went into recession, and it only got worse from there”. Bhagwati (1998) states that: “This is a seductive idea: freeing up trade is good, why not also let capital move freely across borders? But the claims of enormous benefits from free capital mobility are not persuasive. (. . . ) It is time to shift the burden of proof from those who oppose to those who favor liberated capital”. Policymakers, economists, and other observers have been concerned that countries are adversely affected by volatile flows and by contagion. These channels through which global forces affect countries cannot be understood with the perfect financial markets model discussed in Section 2. There is little role for equity flows in models where all investors hold stocks in the same proportions. In this section, we review two growing branches of the international finance literature that examine the channels through which global forces affect asset prices and that the economists quoted in the previous paragraph are concerned about. The first set of studies examines the joint dynamics of capital flows and asset returns. This research investigates whether flows reflect changes in expected returns as globalization and liberalization forces would predict or whether they may just as likely impact returns themselves. The second set of studies examines an important by-product of free capital flow: how global stock market returns move together. With free flows, markets are more closely connected. Investors who think that one market will have higher expected returns can move their investments to that market and this connection implies that markets move together more
than they would if they were segmented. With growing liberalization and integration of markets, one would expect that international comovements of stock returns have also grown, but a number of studies have rejected this notion. Moreover, while international stock return comovements fluctuate over time, these fluctuations seem difficult to explain with economic fundamentals in those markets, leading some to suggest that they reflect irrational market contagion, especially around crisis periods [Mussa and Richards (1999)]. Evidence that is supportive of this view is provided by Kaminsky and Schmukler (1999) who show that it is often impossible to find public information justifying large stock-price movements and that large negative (but not positive) stock-price movements are followed by reversals.
One important caveat that applies to our discussion in this section has to be noted here. There is a critical difference between equity flows and debt flows. Equity outflows can only take place if foreign investors sell stocks. Selling stocks in a hurry is expensive because of the price impact effect. Hence, the volatility of equity flows is naturally reduced by the price impact expense incurred by investors when they try to make rapid large changes in their stock holdings. In contrast, the properties of short-term debt make short-term debt flows more volatile. Consider a firm that has short-term creditors. If a short-term creditor can call his loan at par or can choose not to renew his loan, he gets the par amount of his loan and makes no loss: he suffers no price impact. However, when the firm has limited liquidity, not all short-term creditors can get their money back: the first ones to ask for it will get it, but not the last ones. Consequently, short-term creditors have an incentive to ask for their money back and not renew loans at the first sign of risk of default. The literature we review here is focused on equity flows rather than on short-term debt flows.
It therefore does not study the causes and consequences of non-renewal of short-term debt. To the extent that short-term funding is used to invest in illiquid projects, non-renewal of short-term debt can lead to a classic liquidity crisis. Radelet and Sachs (1998) emphasize this mechanism in the context of the East Asian crisis. They state that “international loan markets are prone to self-fulfilling crises in which individual creditors may act rationally and yet market outcomes produce sharp, costly, and fundamentally panicked reversals in capital flows” [Radelet and Sachs (1998, p. 5)]. King (1999) points out that “Virtually the whole of the $125 billion reversal of flows to the five Asian countries was accounted for by swings in short-term debt finance. (. . . ) Liquidity runs, although not the sole cause of problems, did play a major part in recent financial crises”. The literature we review is focused on equity flows rather than debt flows and therefore pays little attention to the issue of a liquidity crisis.
4.1. Flows and returns
Can changes in equity valuations be traced directly to capital flows? If so, does the impact of capital inflows and outflows on valuations reflect information that foreign investors have that is not yet incorporated into prices, or are changes in valuations just the destabilizing by-product of excessively volatile flows as foreign investors
come and go on a whim? That foreign investors are systematically better informed than domestic investors about events that affect the country as a whole is unlikely and, in fact, the asymmetry of information in favor of domestic investors is one of the leading explanations of the home bias phenomenon in Section 3. Studies by Tesar and Werner (1994, 1995) are the first to uncover positive, contemporaneous correlations between U.S. portfolio flows in developed and emerging foreign markets and market index returns. They employ quarterly data from the U.S. Treasury Bulletin between 1982 and 1994. Brennan and Cao (1997) develop and test a theoretical model that relates international investment flows to differences in information endowments between foreign and domestic investors. In their model, foreign investors are less well informed than domestic investors. As a result, public information is more valuable to foreign investors and has a greater impact on their forecasts of asset payoffs than on the forecasts of local investors. Foreign purchases of equity are volatile because foreign investors have more diffuse priors, so that news affects expected payoffs more for them than it does for local investors. They corroborate the Tesar and Werner findings of contemporaneous correlations in returns and flows using the same U.S. Treasury Bulletin data, but broaden the sample from Canada, Japan, USA, and UK to sixteen emerging markets. The coarseness of low-frequency, quarterly data can mask the true dynamics of the relation between flows and returns. Indeed, Bohn and Tesar (1996) extend the analysis to monthly data for a large number of countries and, with the higher frequency data, uncover evidence of a delayed response of U.S. net portfolio flows to returns. That is, they find that foreign investors are positive feedback traders, buying following positive returns and selling following negative returns. 
Froot, O’Connell and Seasholes (2001) corroborate this positive feedback trading using proprietary data from State Street Bank and Trust as custodian for institutional investors in 44 countries. Their advantage is the availability of daily reporting of these trades, which allows them to decompose the total quarterly covariance relationship into components due to flows leading returns, flows lagging returns, and flows contemporaneously correlated with returns. The results of this decomposition using their covariance ratio statistic clearly show that flows lagging returns account for 80% of the total, while contemporaneous and flows leading returns capture 4% and 16%, respectively. Froot, O’Connell and Seasholes (2001) also estimate a bivariate VAR and impulse responses that allow them to take into account the dynamics of flows and to show that flows are highly persistent, especially for emerging markets. They then show that a one basis point increase in flows leads to a 40 basis point increase in equity prices over the first 30 days or so, but after controlling for the persistence of flows, they find a 100 basis point shock to returns leads to only a 0.05 basis point additional inflow over the next two or three months. Though significant, this effect is small. Using an extended structural VAR model, Froot, O’Connell and Seasholes find that a one basis point shock to flows has a contemporaneous effect of increasing prices by 0.6 basis points. Most of the price impact of a shock to flows seems to come after the increase in flows. This could be evidence that flows contain information about future returns.
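The idea of splitting a low-frequency covariance between flows and returns into lead, lag, and contemporaneous parts can be sketched as follows. This is a simplified analogue, not the covariance ratio statistic of Froot, O’Connell and Seasholes (2001). It relies on the identity that, for stationary daily series, the covariance between K-day sums of flows and K-day sums of returns equals the sum over daily leads and lags j of (K − |j|) times the daily cross-covariance at lag j. The data are simulated so that flows follow the previous day's return; the function names, the horizon, and the simulation parameters are all hypothetical.

```python
import random


def cross_covariance(x, y, lag):
    """Sample covariance between x[t] and y[t + lag]; lag may be negative."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = 0.0
    for t in range(n):
        s = t + lag
        if 0 <= s < n:
            total += (x[t] - mean_x) * (y[s] - mean_y)
    return total / n


def covariance_shares(flows, returns, horizon):
    """Split the covariance of horizon-day sums into lead, lag, and same-day shares."""
    same_day = horizon * cross_covariance(flows, returns, 0)
    # Flow at t paired with a later return: flows lead returns.
    flows_lead = sum((horizon - j) * cross_covariance(flows, returns, j)
                     for j in range(1, horizon))
    # Flow at t paired with an earlier return: flows lag returns.
    flows_lag = sum((horizon - j) * cross_covariance(flows, returns, -j)
                    for j in range(1, horizon))
    total = same_day + flows_lead + flows_lag
    if total == 0:
        raise ValueError("total covariance is zero; shares are undefined")
    return {"flows_lead": flows_lead / total,
            "same_day": same_day / total,
            "flows_lag": flows_lag / total}


# Simulated (not real) daily data in which flows chase the previous day's return.
rng = random.Random(0)
returns = [rng.gauss(0.0, 1.0) for _ in range(2000)]
flows = [0.0] + [0.5 * returns[t - 1] + rng.gauss(0.0, 0.1) for t in range(1, 2000)]

shares = covariance_shares(flows, returns, horizon=5)
for name, share in shares.items():
    print(f"{name}: {share:.2f}")
```

In data built this way, the share attributed to flows lagging returns dominates, which mirrors the qualitative pattern reported in the text for positive feedback trading.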
This result is quite different from the results in the literature on mutual fund flows and returns in the USA. Warther (1995) with monthly data and Edelen and Warner (2001) with daily data both find a permanent increase in prices following net inflows, but this increase is the result of the contemporaneous effect rather than of a lag effect. Clark and Berko (1996) document permanent price changes associated with foreign flows using Mexican data for the period from 1989 to 1993 when there was a significant acceleration in foreign ownership of Mexican equities. Froot and Ramadorai (2001) offer some new evidence indicating that the relation between flows and returns may be due to flows forecasting returns rather than the price pressure effect of flows. They examine the impact of cross-border portfolio flows using the prices of closed-end country funds to control for country fundamentals. One possibility is that structural breaks, due to shifts in fundamental economic factors or crisis events, complicate empirical analysis. Choe, Kho and Stulz (1999) examine portfolio equity flows to Korea during the Asian crisis period of 1997 at the stock level and again confirm the positive-feedback effect among foreign investors. Surprisingly, this positive-feedback effect weakens during the three months of their sample corresponding to the Korean phase of the Asian crisis.¹² They argue that, if the positive-feedback effect dissipates, the trading practices of foreign investors are even less likely to be destabilizing. In fact, since their data were available on an intraday, transaction-by-transaction basis for individual stocks, they test whether large block purchases and sales by foreign investors have a permanent impact on prices like those of institutional investors on the NYSE [Chan and Lakonishok (1993, 1995, 1997), Keim and Madhavan (1997)].
They find that these trades are incorporated into prices within 10 minutes with no subsequent lasting impact on prices, which is not consistent with the lag effect documented by Froot, O’Connell, and Seasholes. Unlike Choe et al. for Korea, Karolyi (2002) documents a significant structural break in flows and returns in Japan during the Asian crisis, but, surprisingly, he finds that positive feedback trading intensifies. Kaminsky, Lyons and Schmukler (2000) investigate the trading of U.S. emerging markets mutual funds and, at a lower frequency, find that there is strong momentum trading on the part of these funds. Distinguishing between contemporaneous and lagged momentum trading, they find that contemporaneous momentum trading increases during crises, more so because of the investors in these funds than because of the managers, but that lagged momentum trading weakens during crisis periods. They also demonstrate that funds engage in what they call “contagion trading”, namely that they sell in one country when returns in another country are poor. Such a trading practice might reflect the need to rebalance portfolios during crises,
12 Kim and Wei (2002) also investigate positive feedback trading of investors. They use monthly data and find strong positive feedback trading behavior during the crisis months. They attribute the differences between their results and those of Choe, Kho and Stulz to the fact that their dataset has holdings of investors and to the fact that their sample period is longer. They extend the crisis period to June 1998. Because their crisis period is so long, they cannot say anything about the importance of feedback trading in the unfolding of the crisis. It is perfectly possible that all their feedback trading is in 1998.
G.A. Karolyi and R.M. Stulz
might be due to the fact that it is harder to sell in crisis countries because of lower liquidity, or might simply reflect an element of panic.
Bekaert, Harvey and Lumsdaine (2002a,b) explicitly model the importance of liberalization events and crises on the joint dynamics of flows and returns for a number of emerging markets using an econometric technique devised by Bai, Lumsdaine and Stock (1998), though with monthly data. They find sharply different results if their VAR model is estimated over the entire 20-year sample and if the nonstationarity in the flows and returns relationship is ignored. Edison and Warnock (2002), using a somewhat different approach, find that flows to emerging markets depend little on the fundamentals of these markets but are related to U.S. interest rates. Their result contradicts the result in Bekaert, Harvey and Lumsdaine. A possible explanation for the difference in results is that, whereas Bekaert, Harvey and Lumsdaine normalize flows by the market capitalization, Edison and Warnock do not.
4.2. Correlations, spillovers, and contagion
There is growing evidence that risk premia are determined globally, as we showed in Section 2.4. This effect naturally induces comovements in stock prices around the world. In a conditional setting, time-variation in those comovements reflects economic fundamentals that would not naturally occur if markets were segmented. There are a number of studies that have documented patterns in comovements of global asset returns. Their findings are difficult to rationalize with models of international asset pricing that assume perfect financial markets.
Much of the analysis of stock price comovements focuses on one measure: correlations. Typically, studies have shown that correlations with foreign indexes, particularly for emerging markets, are low.
At the same time, these correlations change over time, which makes it difficult to determine whether correlations are greater now than they were when capital flows were more restricted. Institutional factors cloud the picture further; for example, differences in how national indexes are constructed, in terms of scope, coverage, and industrial composition of the companies they include, have always plagued studies of cross-country correlations [Roll (1992), Heston and Rouwenhorst (1994), Griffin and Karolyi (1998)].
Longin and Solnik (1995) test the equality of covariance and correlation matrices of returns for seven developed markets across different 5-year periods between 1960 and 1990. They reject the hypothesis of equality in 10 of 16 comparisons and find that correlations increase over time. They estimate a multivariate GARCH model, similar in structure to that in Chan, Karolyi and Stulz (1992) and De Santis and Gerard (1997), to evaluate and confirm the significance of a trend factor in conditional correlations across these markets.
There are other studies that challenge the trend toward higher correlations. De Santis (1993) focuses on emerging markets and finds that the correlation structure is essentially the same for the 1976–1984 and 1984–1992 subperiods, which is surprising given the rapid pace of liberalizations in those markets. Bekaert and Harvey (1995) employ a multivariate GARCH model with a Hamilton (1989) regime-switching feature to test
for time-variation in market integration. They develop an integration index that is a conditional regime probability statistic that captures the same conditional correlations as Longin and Solnik (1995), but nested within an asset-pricing structure. Their model is applied to 21 developed and 12 emerging markets, and they examine the time series of conditional regime probabilities country by country. Overall, they find that there is little trend in these conditional probabilities. Finally, Bekaert and Harvey (2000) estimate a model that allows correlations between emerging markets and the world market to change over time. They then estimate these correlations before and after liberalizations. Out of seventeen emerging markets, they find the correlation to be higher for only nine markets, hardly overwhelming evidence in favor of higher correlations over time. Bekaert, Harvey and Lumsdaine (2002a) suggest that results on the impact of liberalization are often hampered by an incorrect identification of the actual liberalization event, which they estimate endogenously.
While there is no clear evidence that correlations are increasing with greater liberalization, there is evidence that correlations are changing over time. Unfortunately, it is difficult to associate changes in correlations with economic fundamentals. Ilmanen (1995) shows that there is a strong common factor in interest rate movements across developed countries. He suggests that this increase in correlations with this common factor reflects the weakening ability of national monetary authorities to pursue different inflation or interest rate policies. But it is not clear why this would be revealed in nominal bond yields rather than expected real yields. For equities, Longin and Solnik (1995) estimate their multivariate GARCH model allowing not only for a trend factor in conditional correlations but also for market returns, dividend yields, and interest rates.
Overall, these market and economic factors are not reliably significant. King, Sentana and Wadhwani (1994) find only weak evidence of association between correlations in monthly national index returns and economic factors. These weak results with monthly returns motivate Karolyi and Stulz (1996) to examine higher frequency, intraday returns across the USA and Japan for indexes, portfolios of individual stocks and ADRs, and even Nikkei index futures contracts. They find that correlations are time-varying, but again are not significantly related to macroeconomic news, interest rate and exchange rate shocks, dividend yields, or even trading volume. The only instrument with reliable predictive power for conditional correlations is the magnitude and direction of market movements themselves, especially for negative returns. De Santis and Gerard (1997) find a similar volatility and asymmetry effect in conditional correlations for bear markets across ten developed markets for monthly returns. Longin and Solnik (2001) employ extreme value statistics to model this same pattern of “threshold” or “extreme” correlations more formally.
A possible explanation for the weak evidence of association between correlations and fundamentals is that the correlations are inadequately measured. For example, there have been a number of studies of statistically significant leading and lagging relationships among national index returns and the volatility of their returns. Eun and
Shim (1989) is one of the early studies to examine the joint dynamics of returns across national markets. They employ a VAR model with variance decomposition and impulse response analysis to show that the U.S. market often leads other market returns by one or two days and that the variation in U.S. market returns could capture a significant fraction of the total variation in other developed market returns. The aftermath of the international October 1987 market crisis prompted studies by King and Wadhwani (1990), Hamao, Masulis and Ng (1990) and Lin, Engle and Ito (1994) to model the joint dynamics of high-frequency, intraday return volatilities using multivariate GARCH models. They find that unexpectedly high volatility in the USA, when the U.S. market is open, leads to high volatility in Japan. The “volatility spillover” results provide similar inferences to Eun and Shim, but they indicate that the structure of conditional correlations is more complex than observed with daily, weekly, or monthly returns and when ignoring the volatility process. For example, for the USA, UK, and Japan, Hamao et al. show that the relationship is more asymmetric: unexpected volatility shocks in the USA lead to higher volatility in Japan the next day, and unexpected shocks to volatility in Japan lead to higher volatility in the UK but not in the USA. There have been dozens of studies that seek to relate economic fundamentals to volatility spillovers, but, as with returns themselves, with limited success [asymmetry of positive and negative returns, Bae and Karolyi (1994); regional versus international factors, Ng (2000); macroeconomic news announcements, Connolly and Wang (2003)].
The problem with the evidence on comovements in returns and volatility spillovers is that it is consistent with two hypotheses that are difficult to separate empirically, yet have very different implications for the efficiency of financial markets.
One hypothesis is that the markets have common, unobservable global components and that the changes in correlations and spillovers reflect innovations in these common components. Under this view, spillovers show that markets incorporate information efficiently. The second hypothesis is that correlations and spillovers are the work of uninformed investors who systematically overreact to news in one market, corresponding to shifts in sentiment. That is, they become more risk averse following bad news and less risk averse following good news irrespective of fundamentals in their own markets. The weak results with regard to fundamentals and the finding that comovements increase systematically during bear markets, and especially during the 1994 Mexican and 1997 Asian crises, prompted a number of researchers to focus on the latter hypothesis, commonly referred to as “contagion”.
The traditional view of contagion has to do with banking panics. The idea is that a bank fails and depositors start withdrawing funds from other banks that are healthy, thereby weakening these banks [Diamond and Dybvig (1983)]. King and Wadhwani (1990) is among the first studies to apply the same concept to international stock returns: a shock in one market leads investors to withdraw funds from other markets because of irrational fears and thus leads to unusually high comovements of asset prices, particularly on the
downside. 13 The concept, its definition and measurement issues have been addressed in a number of important recent papers, a subset of which are surveyed by Claessens, Dornbusch and Park (2001) and featured in the book by Claessens and Forbes (2001).
The literature often defines contagion to be an increase in correlations among asset returns in different markets in periods of crisis. Calvo and Reinhart (1996), Frankel and Schmukler (1998), and Bailey, Chan and Chung (2000) have suggested that the aftermath of the peso crisis in Mexico in 1994 was evidence of contagion, especially to other Latin American markets, often referred to as a “tequila effect”. Calvo and Reinhart (1996) find evidence that correlations of weekly returns on equities and Brady bonds for Asian and Latin American emerging markets were higher after the Mexican crisis than before. Frankel and Schmukler (1998) provide evidence that emerging market disturbances spread via the international investor community in New York. Bailey, Chan and Chung (2000) offer more powerful tests with transaction data at 30-minute intervals between December 1994 and April 1995 on Asian and Latin American ADRs on the NYSE and country funds. They show that on the critical days of the depreciation of the peso, exchange rate shocks had a significant and rapid (within 60 minutes) adverse effect on non-Mexican Latin American ADRs and country funds, but no measurable impact on Asian ADRs or country funds.
A marked increase in correlations among different markets may, however, not be sufficient proof of contagion. Forbes and Rigobon (2001, 2002) show that in the presence of heteroscedasticity (increases in volatility around crises), an increase in correlation could simply be a continuation of strong transmission mechanisms that exist in more stable periods.
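The heteroscedasticity argument can be made concrete with a small sketch. It assumes the simple two-market linear setting analyzed by Forbes and Rigobon (2002), with no omitted variables or feedback, in which the correlation measured in a high-volatility period overstates the underlying linkage by an amount that depends on delta, the proportional increase in the variance of returns in the market where the shock originates. The function names and the sample numbers are illustrative assumptions.

```python
import math

def conditional_correlation(rho_u, delta):
    """Correlation observed when the source-market variance rises by a factor (1 + delta).

    rho_u is the correlation in the stable period; the linear transmission
    coefficient and the idiosyncratic variance are held fixed.
    """
    return rho_u * math.sqrt((1.0 + delta) / (1.0 + delta * rho_u ** 2))

def adjusted_correlation(rho_c, delta):
    """Invert the bias: recover the underlying correlation from the crisis-period one."""
    return rho_c / math.sqrt(1.0 + delta * (1.0 - rho_c ** 2))

# Hypothetical numbers: a stable-period correlation of 0.30 and a fivefold
# rise in source-market variance during the crisis (delta = 4).
rho_stable = 0.30
delta = 4.0
rho_crisis = conditional_correlation(rho_stable, delta)
rho_recovered = adjusted_correlation(rho_crisis, delta)
print(round(rho_crisis, 4), round(rho_recovered, 4))
```

In this example the measured crisis-period correlation rises to about 0.58 even though the transmission mechanism is unchanged, and the adjustment recovers the original 0.30. A test for contagion along these lines would compare the adjusted crisis-period correlation, rather than the raw one, with the stable-period correlation.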
Forbes and Rigobon also show that increases in correlations of asset prices may result when changes in fundamentals, risk perceptions, or preferences (endogenous or omitted variables) are correlated without any additional contagion being present. In fact, they show that less than 10% of the pairwise correlations among 28 developed and emerging markets during the October 1987 crash, the 1994 Mexican peso crisis, and the 1997 Asian crisis increased after accounting for problems of heteroscedasticity and of endogenous and omitted variables.
Another way to control for the fundamentals is to study conditional probabilities rather than raw correlations. Eichengreen, Rose and Wyplosz (1996), Sachs, Tornell and Velasco (1996) and Bae, Karolyi and Stulz (2003) examine whether the likelihood of an extreme outcome (in terms of exchange rates, interest rates, or stock returns) in one country increases when there are extreme outcomes in another country or in several countries. These authors employ limited dependent variable models (multinomial probit and logit) to control for economic and financial channels through which linkages would occur in stable periods. These models do not rely on correlations as linear measures of
13 Claessens, Dornbusch and Park (2001, p. 4) define contagion as “the spread of market disturbances – mostly on the downside – from one (emerging market) country to the other, a process observed through co-movements in exchange rates, stock prices, sovereign spreads and capital flows”.
association, but there is much discretion in defining an extreme outcome and what constitutes contagion.
5. Conclusion
In this paper, we have attempted to assess what we learn from the literature on the influence of global factors on portfolio choices and asset pricing. Intentionally, we have mostly focused on equities. The literature has provided clear evidence that national market risk premiums are determined internationally, but less clear evidence that international factors affect the cross-section of expected returns. Among international factors that affect asset returns, it may well be that omitting exchange rates is more of a problem than omitting the return of foreign assets uncorrelated with the domestic market portfolio for the USA.
Models that rely on perfect financial markets do not explain important stylized facts in international finance, such as the home bias and the volatility of capital flows. Though introducing barriers to international investment, especially differences in information between local and foreign investors, helps in understanding these facts better, our understanding of these facts is quite incomplete. Yet, existing explanations for the home bias cannot explain why this bias has not decreased in the USA since 1995. It may well be that investors buy foreign stocks when domestic stocks have done poorly or domestic interest rates are low, so that they did not increase their share of foreign stocks during the U.S. bull market, but such behavior seems a long way from what one would expect in a world of perfect financial markets populated with mean-variance optimizers. Perhaps a good indication of how much we still have to learn is that, while the policymakers and the press have taken it as a fact that there is contagion and that it is a first-order phenomenon, existing research raises more questions about contagion than it resolves and some authors fail to find it altogether.
References
Adler, M., and B. Dumas (1983), “International portfolio choice and corporation finance: a synthesis”, Journal of Finance 38:925–984.
Ahearne, A.G., W.L. Griever and F.E. Warnock (2002), “Information costs and home bias: an analysis of U.S. holdings of foreign equities”, Journal of International Economics, forthcoming.
Bae, K., and G.A. Karolyi (1994), “Good news, bad news and international spillovers of stock return volatility between Japan and the U.S.”, Pacific-Basin Finance Journal 2:405–438.
Bae, K., G.A. Karolyi and R.M. Stulz (2003), “A new approach to measuring financial contagion”, Review of Financial Studies, forthcoming.
Bai, J., R. Lumsdaine and J.H. Stock (1998), “Testing for and dating common breaks in multivariate time series”, Review of Economic Studies 65:395–432.
Bailey, W., and J. Jagtiani (1994), “Foreign ownership restrictions and stock prices in the Thai capital market”, Journal of Financial Economics 36:57–87.
Bailey, W., K. Chan and Y.P. Chung (2000), “Depositary receipts, country funds, and the peso crash: the intraday evidence”, Journal of Finance 55:2693–2717.
Barro, R.J. (1974), “Are government bonds net wealth?”, Journal of Political Economy 82:1095–1118.
Baxter, M., and U.J. Jermann (1997), “The international diversification puzzle is worse than you think”, American Economic Review 87:170–180.
Bekaert, G., and C. Harvey (1995), “Time-varying world market integration”, Journal of Finance 50:403–444.
Bekaert, G., and C. Harvey (2000), “Foreign speculators and emerging equity markets”, Journal of Finance 55:565–613.
Bekaert, G., and R.J. Hodrick (1992), “Characterizing predictable components in excess returns on equity and foreign exchange markets”, Journal of Finance 47:467–510.
Bekaert, G., C. Harvey and R. Lumsdaine (2002a), “Dating the integration of world capital markets”, Journal of Financial Economics 65:203–249.
Bekaert, G., C. Harvey and R. Lumsdaine (2002b), “The dynamics of emerging market equity flows”, Journal of International Money and Finance 21:295–350.
Bhagwati, J. (1998), “The capital myth”, Foreign Affairs 77(3):7–12.
Black, F. (1974), “International capital market equilibrium with investment barriers”, Journal of Financial Economics 1:337–352.
Black, F. (1990), “Equilibrium exchange rate hedging”, Journal of Finance 45:899–908.
Bohn, H., and L. Tesar (1996), “U.S. equity investment in foreign markets: portfolio rebalancing or return chasing?”, American Economic Review 86:77–81.
Bottazzi, L., P. Pesenti and E. Van Wincoop (1996), “Wages, profits and the international portfolio puzzle”, European Economic Review 40:219–254.
Breeden, D. (1979), “An intertemporal asset pricing model with stochastic consumption and investment opportunities”, Journal of Financial Economics 7:265–296.
Brennan, M.J., and H. Cao (1997), “International portfolio flows”, Journal of Finance 52:1851–1880.
Britten-Jones, M. (1999), “The sampling error in estimates of mean–variance efficient portfolio weights”, Journal of Finance 54:655–671.
Calvo, G., and C.M. Reinhart (1996), “Capital flows to Latin America: is there evidence of a contagion effect?”, in: G.A. Calvo, M. Goldstein and E. Hochreiter, eds., Private Capital Flows to Emerging Markets after the Mexican Crisis (Institute for International Economics, Washington, DC) pp. 151–171.
Campbell, J.Y., and Y. Hamao (1992), “Predictable stock returns in the United States and Japan: a study of long-term capital market integration”, Journal of Finance 47:43–70.
Chan, K.C., G.A. Karolyi and R.M. Stulz (1992), “Global financial markets and the risk premium on U.S. equity”, Journal of Financial Economics 32:137–168.
Chan, L., and J. Lakonishok (1993), “Institutional trades and intraday stock price behavior”, Journal of Financial Economics 33:173–200.
Chan, L., and J. Lakonishok (1995), “The behavior of stock prices around institutional trades”, Journal of Finance 50:1147–1174.
Chan, L., and J. Lakonishok (1997), “Institutional equity trading costs: NYSE versus Nasdaq”, Journal of Finance 52:713–735.
Chari, A., and P. Henry (2001), “Stock market liberalization and the repricing of systematic risk”, NBER Working Paper 8265 (NBER, Cambridge, MA).
Cho, D.C., C.S. Eun and L.W. Senbet (1986), “International arbitrage pricing theory: an empirical investigation”, Journal of Finance 41:313–330.
Choe, H., B. Kho and R.M. Stulz (1999), “Do foreign investors destabilize stock markets? The Korean experience in 1997”, Journal of Financial Economics 54:3–46.
Choe, H., B. Kho and R.M. Stulz (2001), “Do domestic investors have more valuable information about individual stocks than foreign investors?”, NBER Working Paper 8073 (NBER, Cambridge, MA).
Claessens, S., and K. Forbes, eds (2001), International Financial Contagion (Kluwer Academic Publishers, New York).
Claessens, S., R. Dornbusch and Y.C. Park (2001), “Contagion: why crises spread and how this can be stopped”, in: S. Claessens and K. Forbes, eds., International Financial Contagion (Kluwer Academic Publishers, New York) pp. 19–42.
Clark, J., and E. Berko (1996), “Foreign investment fluctuations and emerging market stock returns: the case of Mexico”, Working Paper (Federal Reserve Bank of New York, New York) unpublished.
Connolly, R., and F.A. Wang (2003), “International equity market co-movement: economic fundamentals or contagion?”, Pacific-Basin Finance Journal 11:23–43.
Cooper, I.A., and E. Kaplanis (1994), “Home bias in equity portfolios, inflation hedging and international capital market equilibrium”, Review of Financial Studies 7:45–60.
Coval, J.A., and T.J. Moskowitz (1999), “Home bias at home: local equity preference in domestic portfolios”, Journal of Finance 54:2045–2073.
Coval, J.A., and T.J. Moskowitz (2001), “The geography of investment: informed trading and asset prices”, Journal of Political Economy 109:811–841.
Cumby, R.E. (1990), “Consumption risk and international equity returns: some empirical evidence”, Journal of International Money and Finance 9:182–192.
Dahlquist, M., and G. Robertsson (2001), “Direct foreign ownership, institutional investors, and firm characteristics”, Journal of Financial Economics 59:413–440.
Dahlquist, M., and T. Sällström (2002), “An evaluation of international asset pricing models”, Working Paper (Duke University, Durham, NC).
De Santis, G. (1993), “Asset pricing and portfolio diversification: evidence from emerging financial markets”, in: S. Claessens and S. Gooptu, eds., Portfolio Investment in Developing Countries (World Bank, Washington, DC) pp. 145–168.
De Santis, G., and B. Gerard (1997), “International asset pricing and portfolio diversification with time-varying risk”, Journal of Finance 52:1881–1912.
De Santis, G., and B. Gerard (1998), “How big is the premium for currency risk?”, Journal of Financial Economics 49:375–412.
Diamond, D., and P. Dybvig (1983), “Bank runs, deposit insurance, and liquidity”, Journal of Political Economy 91:401–419.
Domowitz, I., J. Glen and A. Madhavan (1997), “Market segmentation and stock prices: evidence from an emerging market”, Journal of Finance 52:1059–1085.
Dumas, B. (1992), “Dynamic equilibrium and the real exchange rate in a spatially separated world”, Review of Financial Studies 5:153–180.
Dumas, B., and B. Solnik (1995), “The world price of foreign exchange risk”, Journal of Finance 50:445–479.
Edelen, R., and J. Warner (2001), “Aggregate price effects of institutional trading: a study of mutual fund flow and market returns”, Journal of Financial Economics 59:195–220.
Edison, H.J., and F.E. Warnock (2002), “Cross-border listings, capital controls, and equity flows to emerging markets”, unpublished paper (Federal Reserve Board, Washington, DC).
Eichengreen, B., A. Rose and C. Wyplosz (1996), “Contagious currency crises: first tests”, Scandinavian Journal of Economics 98:463–484.
Engel, C., and J.H. Rogers (1996), “How wide is the border?”, American Economic Review 86:1112–1125.
Errunza, V., and E. Losq (1985), “International asset pricing under mild segmentation: theory and test”, Journal of Finance 40:105–124.
Errunza, V., K. Hogan and M. Hung (1999), “Can the gains from international diversification be achieved without trading abroad?”, Journal of Finance 54:2075–2107.
Eun, C.S., and S. Janakiramanan (1986), “A model of international asset pricing with a constraint on the foreign equity ownership”, Journal of Finance 41:897–914.
Eun, C.S., and S. Shim (1989), “International transmission of stock market movements”, Journal of Financial and Quantitative Analysis 24:241–256.
Fama, E. (1976), Foundations of Finance (Basic Books, New York).
Fama, E., and K. French (1993), “Common risk factors in the returns on stocks and bonds”, Journal of Financial Economics 33:3–56.
Fama, E., and K. French (1998), “Value versus growth: the international evidence”, Journal of Finance 53:1975–1999.
Fama, E., and G. Schwert (1977), “Human capital and capital market equilibrium”, Journal of Financial Economics 4:95–125.
Ferson, W., and C. Harvey (1993), “The risk and predictability of international equity returns”, Review of Financial Studies 6:527–566.
Ferson, W., and C. Harvey (1994), “Sources of risk and expected returns in global equity markets”, Journal of Banking and Finance 18:775–803.
Ferson, W., and C. Harvey (1995), “Predictability and time-varying risk in world equity markets”, Research in Finance 13:25–87.
Ferson, W., and C. Harvey (1997), “Fundamental determinants of national equity market returns: a perspective on conditional asset pricing”, Journal of Banking and Finance 21:1625–1665.
Foerster, S., and A. Karolyi (1999), “The effects of market segmentation and investor recognition on asset prices: evidence from foreign stocks listing in the U.S.”, Journal of Finance 54:981–1014.
Foerster, S.R., and G.A. Karolyi (2000), “The long run performance of global equity offerings”, Journal of Financial and Quantitative Analysis 35:499–528.
Forbes, K., and R. Rigobon (2001), “Measuring contagion: conceptual and empirical issues”, in: S. Claessens and K. Forbes, eds., International Financial Contagion (Kluwer Academic Publishers, New York).
Forbes, K., and R. Rigobon (2002), “No contagion, only interdependence: measuring stock market comovements”, Journal of Finance 57:2223–2261.
Frankel, J., and S. Schmukler (1998), “Crises, contagion, and country funds: effects on East Asia and Latin America”, in: R. Glick, ed., Managing of Capital Flows and Exchange Rates: Lessons from the Pacific Rim (Cambridge University Press, Cambridge) pp. 232–266.
French, K., and J.M. Poterba (1991), “International diversification and international equity markets”, American Economic Review 81:222–226.
Froot, K., and T. Ramadorai (2001), “The information content of international portfolio flows”, NBER Working Paper 8472 (NBER, Cambridge, MA).
Froot, K., and K. Rogoff (1995), “Perspectives on PPP and long-run real exchange rates”, in: G. Grossman and K. Rogoff, eds., Handbook of International Economics, Vol. III (Elsevier, Amsterdam) pp. 1647–1688.
Froot, K., and R.H. Thaler (1990), “Anomalies: foreign exchange”, Journal of Economic Perspectives 4:179–192.
Froot, K., P. O’Connell and M. Seasholes (2001), “The portfolio flows of international investors”, Journal of Financial Economics 59:151–193.
Gehrig, T. (1993), “An information based explanation of the domestic bias in international equity investment”, Scandinavian Journal of Economics 95:97–109.
Giovannini, A., and P. Jorion (1989), “The time variation of risk and return in the foreign exchange and stock markets”, Journal of Finance 44:307–326.
Glassman, D.A., and L.A. Riddick (2001), “What causes home asset bias and how should it be measured?”, Journal of Empirical Finance 8:35–54.
Grauer, F.L.A., R.H. Litzenberger and R.E. Stehle (1976), “Sharing rules and equilibrium in an international market under uncertainty”, Journal of Financial Economics 3:233–256.
Griffin, J.M. (2002), “Are the Fama and French factors global or country-specific?”, The Review of Financial Studies 15:783–803.
Griffin, J.M., and G.A. Karolyi (1998), “Another look at the role of the industrial structure of markets for international diversification strategies”, Journal of Financial Economics 50:351–373.
Griffin, J.M., and R.M. Stulz (2001), “International competition and exchange rate shocks: a cross-country industry analysis of stocks”, Review of Financial Studies 14:215–241.
Grinblatt, M., and M. Keloharju (2000), “The investment behavior and performance of various investor types: a study of Finland’s unique data set”, Journal of Financial Economics 55:43–67.
Grinblatt, M., and M. Keloharju (2001), “How distance, language, and culture influence stockholdings and trades”, Journal of Finance 56:1053–1073.
Gultekin, M.N., N.B. Gultekin and A. Penati (1989), “Capital controls and international capital market segmentation: the evidence from the Japanese and American stock markets”, Journal of Finance 44:849–870.
Hamao, Y., R.W. Masulis and V. Ng (1990), “Correlations in price changes and volatility across international stock markets”, Review of Financial Studies 3:281–308.
Hamilton, J.D. (1989), “A new approach to the economic analysis of nonstationary time series and the business cycle”, Econometrica 57:357–384.
Harvey, C. (1991), “The world price of covariance risk”, Journal of Finance 46:111–158.
Harvey, C. (1995), “Predictable risk and returns in emerging markets”, Review of Financial Studies 8:773–816.
Hau, H. (2001), “Location matters”, Journal of Finance 56:1959–1983.
Henry, P. (2000), “Stock market liberalization, economic reform, and emerging market prices”, Journal of Finance 55:529–564.
Heston, S.L., and K.G. Rouwenhorst (1994), “Does industrial structure explain the benefits of international diversification?”, Journal of Financial Economics 36:3–27.
Hietala, P. (1989), “Asset pricing in partially segmented markets: evidence from the Finnish market”, Journal of Finance 44:697–718.
Ilmanen, A. (1995), “Time-varying expected returns in international bond markets”, Journal of Finance 50:481–506.
Jorion, P. (1991), “The pricing of exchange rate risk in the stock market”, Journal of Financial and Quantitative Analysis 26:363–376.
Jorion, P., and E. Schwartz (1986), “Integration vs. segmentation in the Canadian stock market”, Journal of Finance 41:603–613.
Kaminsky, G., and S. Schmukler (1999), “What triggers market jitters? A chronicle of the East Asian crisis”, Journal of International Money and Finance 18:537–560.
Kaminsky, G., R. Lyons and S. Schmukler (2000), “Managers, investors, and crises: mutual fund strategies in emerging markets”, NBER Working Paper 7855 (NBER, Cambridge, MA).
Kang, J., and R.M. Stulz (1997), “Why is there a home bias? An analysis of foreign portfolio equity ownership in Japan”, Journal of Financial Economics 46:3–28.
Karolyi, A. (1998), “Why do companies list shares abroad? A survey of the evidence and its managerial implications”, Financial Markets, Institutions, and Instruments 7:1–60.
Karolyi, G.A. (2002), “Did the Asian financial crisis scare foreign investors out of Japan?”, Pacific-Basin Finance Journal 10:411–442.
Karolyi, G.A., and R.M. Stulz (1996), “Why do markets move together? An investigation of U.S.–Japan stock return comovements”, Journal of Finance 51:951–986.
Keim, D., and A. Madhavan (1997), “Transactions costs and investment style: an interexchange analysis of institutional equity trades”, Journal of Financial Economics 46:265–292.
Kim, W., and S.-J. Wei (2002), “Foreign portfolio investors before and during a crisis”, Journal of International Economics 56:77–96.
King, M. (1999), Reforming the International Monetary System: The Middle Way (Bank of England, London).
King, M., and S. Wadhwani (1990), “Transmission of volatility between stock markets”, Review of Financial Studies 3:5–33.
King, M., E. Sentana and S. Wadhwani (1994), “Volatility and links between national stock markets”, Econometrica 62:901–933.
Korajczyk, R.A., and C.J. Viallet (1989), “An empirical investigation of international asset pricing”, Review of Financial Studies 2:553–586.
Krugman, P. (1981), “Consumption preferences, asset demands, and distribution effects in international financial markets”, NBER Working Paper 651 (NBER, Cambridge, MA). Krugman, P. (1998), “Saving Asia: it’s time to get radical”, Fortune, September 7. Lewis, K.K. (1999), “Trying to explain home bias in equities and consumption”, Journal of Economic Literature 37:571−608. Li, K. (2001), “Home bias in equities: an investor perspective”, UBC Working Paper (University of British Columbia, Vancouver, BC). Lin, W., R. Engle and T. Ito (1994), “Do bulls and bears move across borders? International transmission of stock returns and volatility”, Review of Financial Studies 7:507−538. Longin, F., and B. Solnik (1995), “Is the correlation in international equity returns constant: 1970−1990?”, Journal of International Money and Finance 14:3−26. Longin, F., and B. Solnik (2001), “Extreme correlation of international equity markets”, Journal of Finance 56:649−676. Lucas Jr, R. (1982), “Interest rates and currency prices in a two-country world”, Journal of Monetary Economics 10:335−360. Mark, N. (1988), “Time-varying betas and risk premia in the pricing of forward foreign exchange contracts”, Journal of Financial Economics 22:335−354. Mayers, D. (1973), “Nonmarketable assets and the determination of capital asset prices in the absence of a riskless asset”, Journal of Business 46:258−267. McCurdy, T., and I. Morgan (1992), “Evidence of risk premiums in foreign currency futures markets”, Review of Financial Studies 5:65−84. Merton, R.C. (1973), “An intertemporal capital asset pricing model”, Econometrica 41:867−888. Merton, R.C. (1980), “On estimating the expected return on the market: an exploratory investigation”, Journal of Financial Economics 8:323−361. Merton, R.C. (1987), “Presidential address: a simple model of capital market equilibrium with incomplete information”, Journal of Finance 42:483−510. Miller, D. 
(1999), “The market reaction to international cross-listing: evidence from depositary receipts”, Journal of Financial Economics 51:103−123. Mittoo, U. (1992), “Additional evidence on integration in the Canadian stock market”, Journal of Finance 47:2035−2054. Mussa, M., and A. Richards (1999), “Capital flows in the 1990s before and after the Asian crisis”, Research Department Working Paper (International Monetary Fund, Washington, DC). Ng, A. (2000), “Volatility spillover effects from Japan and the US to the pacific-basin”, Journal of International Money and Finance 19:207−233. O’Connell, P.J. (1998), “The overvaluation of purchasing power parity”, Journal of International Economics 44:1−19. Pastor, L. (2000), “Portfolio selection and asset pricing models”, Journal of Finance 55:179−223. Perold, A.F., and E.C. Schulman (1988), “The free lunch in currency hedging: implications for investment policy and performance standards”, Financial Analyst Journal 44:45−52. Pinkowitz, L., R. Stulz and R. Williamson (2003), “Corporate control and the home bias”, Journal of Financial and Quantitative Analysis 38:87−110. Radelet, S., and J. Sachs (1998), “The onset of the East Asian financial crisis”, Working Paper (Harvard Institute for Economic Development, Cambridge, MA). Roll, R. (1992), “Industrial structure and the comparative behavior of international stock market indexes”, Journal of Finance 47:3−42. Rouwenhorst, K.G. (1999), “Local return factors and turnover in emerging stock markets”, Journal of Finance 54:1439−1464. Sachs, J.D., A. Tornell and A. Velasco (1996), “Financial crises in emerging markets: the lessons from 1995”, Brookings Papers 27:147−199. Sarno, L., and M.P. Taylor (2001), “Purchasing power parity and the real exchange rate”, CEPR Discussion Paper 2913 (CEPR, London).
Seasholes, M. (2000), “Smart foreign traders in emerging markets”, Working Paper (University of California at Berkeley, Berkeley, CA). Sercu, P. (1980), “A generalization of the international asset pricing model”, Revue de l’Association Fran¸caise de Finance 1:91−135. Shiller, R.J., F. Kon-Ya and Y. Tsutsui (1996), “Why did the Nikkei crash? Expanding the scope of expectations data collection”, Review of Economics and Statistics 78:156−164. Shukla, R.K., and G.B. van Inwegen (1995), “Do locals perform better than foreigners? An analysis of UK and US mutual fund managers”, Journal of Economics and Business 47:241−254. Siegel, J.J. (1972), “Risk, interest, and forward exchange”, Quarterly Journal of Economics 86:303−309. Solnik, B. (1974), “An equilibrium model of the international capital market”, Journal of Economic Theory 8:500−524. Solnik, B. (1983), “International arbitrage pricing”, Journal of Finance 38:449−457. Solnik, B. (1993), “Currency hedging and Siegel’s paradox: on Black’s universal hedging rule”, Review of International Economics 1:180−187. Stehle, R. (1977), “An empirical test of the alternative hypotheses of national and international pricing of risky assets”, Journal of Finance 32:493−502. Stiglitz, J. (1998), “Boats, planes and capital flows”, Financial Times, March 25. Stulz, R.M. (1981a), “A model of international asset pricing”, Journal of Financial Economics 9: 383−406. Stulz, R.M. (1981b), “On effects of barriers to international investment”, Journal of Finance 36:923−934. Stulz, R.M. (1983), “The demand for foreign bonds”, Journal of International Economics 26:271−289. Stulz, R.M. (1987), “An equilibrium model of exchange rate determination and asset pricing with non-traded goods”, Journal of Political Economy 95:1024−1040. Stulz, R.M. (1995), “The cost of capital in internationally integrated markets: the case of Nestl´e”, European Financial Management 1:11−22. Stulz, R.M. (1999), “International portfolio flows and security markets”, in: M. 
Feldstein, ed., International Capital Flows (University of Chicago Press, Chicago, IL) pp. 257–293. Stulz, R.M., and W. Wasserfallen (1995), “Foreign equity investment restrictions, capital flight, and shareholder wealth maximization: theory and evidence”, Review of Financial Studies 8:1019−1057. Svensson, L.E.O. (1985), “Currency prices, terms of trade, and interest rates: a general equilibrium asset-pricing cash-in-advance approach”, Journal of International Economics 18:17−42. Tesar, L., and I. Werner (1994), “International equity transactions and U.S. portfolio choice”, in: J. Frankel, ed., Internationalization of Equity Markets, NBER Project Report (University of Chicago Press, Chicago, IL) pp. 185–216. Tesar, L., and I.M. Werner (1995), “Home bias and high turnover”, Journal of International Money and Finance 14:467−493. Uppal, R. (1993), “A general equilibrium model of international portfolio choice”, Journal of Finance 48:529−553. Warther, V.A. (1995), “Aggregate mutual fund flows and security returns”, Journal of Financial Economics 39:209−235. Wheatley, S. (1988), “Some tests of international equity market integration”, Journal of Financial Economics 21:177−212. Zhang, X. (2001), “Specification tests of asset pricing models in international markets”, Working Paper (Columbia University, New York).
Chapter 17
MICROSTRUCTURE AND ASSET PRICING
DAVID EASLEY °
Department of Economics, Cornell University
MAUREEN O’HARA ∗
Johnson Graduate School of Management, Cornell University
Contents
Abstract
Keywords
1. Introduction
2. Equilibrium asset pricing
3. Asset pricing in the short-run
3.1. The mechanics of pricing behavior
3.2. The adjustment of prices to information
3.3. Statistical and structural models of microstructure data
3.4. Volume and price movements
4. Asset pricing in the long-run
4.1. Liquidity
4.2. Information
5. Linking microstructure and asset pricing: puzzles for researchers
References
° We would like to thank Joel Hasbrouck, Ravi Jagannathan, René Stulz, and Ivo Welch for helpful comments.
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved
Abstract

Market microstructure and asset pricing both consider the behavior and formation of prices in asset markets. Yet neither literature explicitly recognizes the importance and role of the factors so crucial to the other approach. This survey seeks to join the two literatures by surveying the work linking microstructure factors to asset price dynamics. In the short run, these dynamics involve issues such as the autocorrelation and cross-correlation structure of stocks, and our survey will examine the literature relating these correlation structures to microstructure factors such as nonsynchronous trading and dealer behavior. In the longer run, issues such as liquidity and the relation of private information to asset price dynamics are important. We survey the theoretical work linking microstructure factors to long-run returns, and we consider why stock prices might be expected to reflect premia related to liquidity or informational asymmetries. We also survey the empirical literature testing these relationships. We then discuss what issues remain contentious, and we provide some guidance for future research. We hope to show in this survey that asset-pricing dynamics may be better understood by recognizing the role played by microstructure factors, and that microstructure research can be enhanced by a greater understanding of its linkages with fundamental economic variables.
Keywords

market microstructure, non-synchronous trading, dealer behavior, bid–ask spread

JEL classification: G12, G14
1. Introduction Market microstructure analyzes the behavior and formation of prices in asset markets. Fundamental to this approach is the belief that features of the particular trading mechanisms used in markets are important in influencing the behavior of asset prices. The asset-pricing literature also considers the behavior and formation of asset prices. This literature focuses on linking asset-price dynamics to underlying economic fundamentals. While these two literatures share a common focus, they also share a common flaw: neither literature explicitly recognizes the importance and role of the factors so crucial to the other approach. Thus, microstructure provides extensive characterizations of short-term price behavior, but fails to consider how these microstructure factors may influence asset returns. The asset-pricing literature extensively studies asset returns, but it rarely looks to see if these returns are related to underlying asset specifics such as private information or trading practices. Given the central importance of asset pricing in finance, a junction of these two very important literatures would seem beneficial. In this article, we seek to foster this process by surveying the work linking microstructure factors to asset price dynamics. In the short run, these asset price dynamics involve issues such as the auto-correlation and cross-correlation structure of stocks, and our survey will examine the literature relating these correlation structures to microstructure factors such as non-synchronous trading and dealer behavior. In the longer run, issues such as liquidity and the relation of private information to asset price dynamics dominate the research agenda. Here our work will survey the theoretical work linking microstructure factors to long-run returns, and in particular consider why stock prices might be expected to reflect premia related to liquidity or informational asymmetries. 
We also survey the extensive empirical literature testing these relationships. We note at the outset that while our focus in this survey is broad, it is not exhaustive. Market microstructure factors can exert great influence over the level of transactions costs, the way rents are split between dealers and investors (or between different classes of investors), and the interactions between markets. All of these undoubtedly affect asset returns but, while important, they will remain largely outside of the scope of our inquiry. Similarly, a wide range of factors including behavioral biases may influence asset prices, and these factors, in turn, may be affected by microstructure variables. 1 These issues, too, will generally be left to the side. Our exclusion of these factors reflects the inevitable difficulty of trying to tie together large and diffuse literatures, requiring us to focus our efforts on only a limited number of issues. There are, however, a number of excellent survey articles and other sources on market microstructure
1 For example, the market structure may greatly influence the volatility of asset prices. If investors react to volatility in predictably biased ways, then the microstructure may be an important determinant of these effects.
issues [see Madhavan (2000), Biais, Glosten and Spatt (2002), Lyons (2001), Hasbrouck (1996), O’Hara (1995) and chapter 9 in this Handbook by H.R. Stoll] to which we refer the interested reader. What we hope to accomplish in this survey is to show that asset-pricing dynamics may be better understood by recognizing the role played by microstructure factors. We also hope to influence the direction of microstructure research towards greater analysis of the linkages of microstructure and fundamental economic variables. To do so, we will try to highlight what is known and not known about the effect of microstructure variables on short-run and long-run asset price behavior. The paper’s final section will then summarize what issues remain contentious, and provide some guidance for future research in this area.
2. Equilibrium asset pricing A useful starting point for our analysis is the standard explanation for asset-price behavior [see, for example, Cochrane (2001)]. The basic result is that the price of an asset at date t is the expectation at t of the return on the asset at t + 1 times a stochastic discount factor. In the consumption-based capital asset-pricing approach (in which rational individuals choose asset holdings to maximize their expected utility of consumption over time) this stochastic discount factor is the discounted ratio of marginal utility of consumption at t + 1 to marginal utility of consumption at t. From this basic pricing equation follow the two principles that much of the literature examines. First, idiosyncratic risk should not be priced. The price of an asset is its expected payoff discounted by the risk-free rate plus the covariance of its payoff with the stochastic discount factor. If this covariance is zero there is no adjustment in price for the risk in the asset’s payoff; or more generally, if the payoff on the asset is correlated with the market only the correlation is priced and any idiosyncratic risk is not priced. Second, viewing the return on an asset as its dividend plus its future price, it follows from the basic equation that asset prices net of dividends follow martingales. The distribution used in computing the expectation is the original distribution of payoffs transformed by the stochastic discount factor. This is simply the risk neutral probability measure. In the case in which individuals are risk neutral, or there is no aggregate risk, and discounting is ignored, or included in a drift term, this reduces to the random walk hypothesis for prices. This has the important implication that asset prices are not predictable in the sense that simple trading strategies based on past price behavior cannot be profitable. 
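The two statements above (price equals expected payoff discounted at the risk-free rate plus the covariance of the payoff with the stochastic discount factor, and idiosyncratic risk is not priced) can be checked numerically. The following is a minimal sketch on a hypothetical three-state economy; the state probabilities, discount-factor values, and payoffs are illustrative assumptions and do not come from the literature surveyed here.

```python
import numpy as np

# Hypothetical three-state economy; every number here is an illustrative assumption.
prob = np.array([0.3, 0.4, 0.3])       # state probabilities
m = np.array([1.10, 0.95, 0.80])       # stochastic discount factor in each state
x = np.array([90.0, 100.0, 115.0])     # asset payoff in each state


def expect(z):
    """Expectation of a state-contingent variable under `prob`."""
    return float(np.sum(prob * z))


def cov(a, b):
    """Covariance of two state-contingent variables under `prob`."""
    return expect(a * b) - expect(a) * expect(b)


risk_free_gross = 1.0 / expect(m)      # gross risk-free rate, R_f = 1 / E[m]
price = expect(m * x)                  # basic pricing equation, p = E[m x]
price_decomposed = expect(x) / risk_free_gross + cov(m, x)

# Idiosyncratic noise: a payoff component with zero mean and zero covariance
# with m. The cross product is orthogonal to both `prob` and `prob * m`.
noise = 1000.0 * np.cross(prob, prob * m)
price_with_noise = expect(m * (x + noise))

print(price, price_decomposed, price_with_noise)
```

In this sketch the payoff is low when the discount factor is high, so the covariance term is negative and the asset trades below its expected payoff discounted at the risk-free rate. Adding the noise term changes the payoff distribution but not the price.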
From this standard asset-pricing point of view, regardless of how asset prices are actually attained in the economy, an asset’s risk and return can be analyzed from the underlying decision problems confronting agents in the economy. This simple story, or a variant of it, has proved a useful construct for countless economic analyses of asset prices. Unfortunately, the elegance of this story may be deceptive. Asset price dynamics are much more complex than this characterization
allows. Individuals can face substantial transaction costs in buying and selling securities, and these costs can influence their demand for and supply of securities. 2 Price adjustments may be complicated by market makers’ inventory positions, by price discreteness, or even by exchange price continuity rules. The informativeness of asset prices is complicated by the existence of private information that transforms securities trading from a simple transaction into a strategic game between traders. This transformation also imparts importance to other market data such as volume, trade size, and the timing of trade, as each of these variables can be informative of future security price movements. These difficulties raise the specter that equilibrium in asset prices is best viewed not as an outcome, but as a process in which the asset price cannot be viewed independently of the mechanism by which it is attained.

How, then, to reconcile this picture of asset-pricing dynamics with the idealized one given above? One answer is to argue that these complications are second-order effects; that the idealized version of asset price dynamics is “close enough”, particularly if asset prices are looked at over long time intervals. Moreover, since microstructure research focuses on the trade-by-trade determination of prices, any microstructure effects must either be very short-lived or diversifiable across securities in any case. Yet, numerous researchers have found that asset-price movements exhibit predictable patterns both in the short run and in the long run. These patterns raise the potential for profitable arbitrage, and at the very least imply a disequilibrium in the market inconsistent with the Olympian perspective given above. This predictability suggests that insights derived from microstructure analyses of the trading process may improve our understanding of asset-pricing dynamics, even in the long run.
Of particular importance here are market features such as dealer behavior, trading patterns, and the nature of intra-day and inter-day price adjustment to information. In the next section, we consider how such microstructure variables may relate to some puzzles in short-term asset pricing.

3. Asset pricing in the short-run

Consider the following empirical findings:
(1) Weekly and monthly portfolio returns are positively autocorrelated [Conrad and Kaul (1988), Conrad, Kaul and Nimalendran (1991), Lo and MacKinlay (1988), Mech (1993)].
(2) Short horizon returns on individual securities are negatively autocorrelated [Fama (1965), French and Roll (1986), Lo and MacKinlay (1990b), Conrad, Kaul and Nimalendran (1991)].
(3) Stock returns on large firms lead stock returns on small firms [Lo and MacKinlay (1990b), Boudoukh, Richardson and Whitelaw (1994)].

2 These costs may even influence the type of securities we see being issued. Specifically, some securities may have such high trading costs that we never observe them in actual markets.
(4) High volume security returns lead low volume security returns even on a monthly basis [Gervais, Kaniel and Mingelgrin (1999), Chordia and Swaminathan (2000)].
(5) Return reversals follow high volume days for large firms, but return continuations follow high volume days for smaller firms [Llorente, Michaely, Saar and Wang (2002), Antoniewicz (1993), Stickel and Verrecchia (1994)].

Each of these findings suggests that short-term stock price movements are predictable. Not surprisingly, each of these findings has also unleashed a torrent of research designed to uncover what is responsible for such behavior [see for example, Boudoukh, Richardson and Whitelaw (1994)]. The usual suspects investigated include measurement error, irrational or biased trading behavior, and microstructure effects. While not denying the potential importance of the first two causes, the fact that many of these analyses are based on transactions prices lends support to the notion that microstructure effects are involved. Rather than focus on the specific asset-pricing anomalies, let us instead consider more generally how specific microstructure factors affect price behavior.

3.1. The mechanics of pricing behavior

Suppose we consider a simple depiction of asset prices. 3 Let the observable asset price $p_t$ be given by

$$p_t = v_t + s_t, \tag{1}$$
where $v_t$ is the underlying “efficient” price and the $s_t$ are random variables representing frictions potentially due to microstructure effects. At this point, we leave the origins of these frictions unspecified, but we assume that whatever their cause they have mean zero and are uncorrelated with $v_t$. A standard example of the $s_t$ is to link these frictions explicitly to trades, or $s_t = \tfrac{1}{2} s x_t$, where $x_t$ is a trade indicator taking the value $+1$ if the trade at time $t$ is a buy and $-1$ if the trade at time $t$ is a sell, buys and sells are equally likely, and $s$ is the bid–ask spread. In an efficient market, prices follow a martingale, which simply means that $v_t$ can be thought of as a conditional expectation given market information of the random value of the asset, $V$ (i.e., $v_t = E[V \mid I_t]$, where $I_t$ is the market’s information). The true efficient price $v_t$ is not, of course, observable, a point made with admirable clarity by Hasbrouck (2000), so the actual estimation of this efficient price is a non-trivial problem. In general, we will only be able to make inferences about this true value by looking at transactions prices, bid and ask quotes, or possibly other market data. A traditional approach in finance is to argue that if we look at prices with long periodicity then we would expect that price movements are driven by changes in this underlying efficient price. Given reasonable stationarity assumptions, these price
3 Hasbrouck (1996) provides an excellent and more detailed description of these statistical models.
changes would be expected to follow a random walk, so that the microstructure related price movements captured by $s_t$ would be unimportant. Looking over shorter intervals, however, the change in prices can have predictable patterns unrelated to changes in the underlying efficient price. As Hasbrouck (1996) notes: “At the level of transactions prices, ..., the random walk conjecture is a straw man, a hypothesis that is very easy to reject in most markets even in small data samples. In microstructure, the question is not ‘whether’ transactions prices diverge from a random walk, but rather ‘how much?’ and ‘why?’”. To see this, let us look more closely at price changes:

$$R_t = F_t + s_t - s_{t-1} = F_t + \tfrac{1}{2} s \,(x_t - x_{t-1}), \tag{2}$$
where $F_t = E[V \mid I_t] - E[V \mid I_{t-1}]$, or the change in conditional expectations. In the absence of new information, $F_t$ should be zero. The change in the market frictions term, however, need not be zero, and in particular, may introduce particular patterns into security prices. Suppose we assume that this market friction is a given bid/ask spread which arises due to exogenous factors such as order processing costs. Then $\tfrac{1}{2} s$ is half the effective spread and the $v_t$ term in the pricing equation can be proxied by the spread midpoint. Roll (1984) was the first to point out that market frictions such as the bid/ask spread could lead to divergences in price behavior from the simple random walk story. In particular, Roll demonstrated that in the statistical framework given above $\mathrm{Cov}(R_t, R_{t-1}) = -\tfrac{1}{4} s^2$. The intuition for this is that in the absence of any new information, observed trade prices move between the bid and the ask price. If the price is already at the bid, for example, the following trade at the bid results in a zero price change, while a trade at the ask results in a mean reverting process. Thus, observable price movements become negatively serially correlated. 4

The market friction term may include more complex effects. For example, suppose that dealers try to manage their inventories by adjusting prices. 5 If a dealer has a preferred inventory position $Q^*$, then the pricing equation can be modified as

$$p_t = v_t - b\,(Q_t - Q^*) + \tfrac{1}{2} s x_t. \tag{3}$$
Now, the observed spreads will include both the order processing component and an inventory related factor. Because the dealer is setting prices to induce movement toward his optimum, dealer quotes (and thus prices) and inventory will be mean reverting.

4 Roll argued that “an interesting feature of this spread-induced serial covariance is that it is independent of the time interval chosen for collecting successive prices”. Thus, whether we are looking at trade by trade movements or across days, this negative serial correlation will be present. See also Harris (1990) for statistical analysis of the Roll measure.
5 For the dealer’s inventory control to be effective trades must be price sensitive. See Hasbrouck (1996) for more detail.
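Roll's covariance result can be illustrated by simulation. The sketch below is a minimal example under stated assumptions: a constant spread, independent and equally likely buys and sells, and Gaussian efficient-price innovations that are independent of the trade direction. The parameter values and variable names are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trades = 200_000
spread = 0.10      # s, the bid-ask spread (assumed constant)
sigma_f = 0.02     # std dev of efficient-price innovations F_t (assumption)

x = rng.choice([-1.0, 1.0], size=n_trades)     # trade indicator: +1 buy, -1 sell
f = rng.normal(0.0, sigma_f, size=n_trades)    # innovations in the efficient price
v = 50.0 + np.cumsum(f)                        # efficient price follows a random walk
p = v + 0.5 * spread * x                       # observed price: p_t = v_t + s * x_t / 2
r = np.diff(p)                                 # observed price changes R_t

# First-order autocovariance of observed price changes.
autocov = float(np.cov(r[1:], r[:-1])[0, 1])
theory = -0.25 * spread ** 2                   # Roll's result: -s^2 / 4

# Inverting the relation recovers the spread from prices alone.
implied_spread = 2.0 * np.sqrt(max(-autocov, 0.0))

print(autocov, theory, implied_spread)
```

The sample autocovariance should lie close to $-\tfrac{1}{4}s^2$ even though the efficient price itself is a random walk, and the implied spread should lie close to the value of `spread` used to generate the data.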
Moreover, the mid-price of the observed spread need not be a good proxy for the underlying efficient price. An extensive literature has tested for these inventory effects in prices [see for example, Hasbrouck (1988), Madhavan and Smidt (1993), Hasbrouck and Sofianos (1993), Manaster and Mann (1996), Lyons (1998)]. In general, results from equity markets have found positive but rather small effects, while results from other markets such as futures or foreign exchange find stronger effects. Another large literature [see Glosten and Harris (1988), Stoll (1989), George, Kaul and Nimalendran (1991)] decomposes spreads into components due to inventory, order processing, and information (an issue we will address shortly). From our perspective here, these results suggest that over a short time horizon, prices can exhibit patterns unrelated to changes in the underlying efficient price, patterns that provide at least the potential for price predictability.

An important feature of these particular frictions is that they are linked to trading. For example, trades bouncing between the bid and ask quotes result in the observed trade price process exhibiting negative serial correlation, even though the underlying true price process remains a random walk. However, an important feature of actual markets is that trades do not occur with regular periodicity, but instead depend upon order arrivals. Even the most active stocks exhibit intra-day patterns in these order arrivals, so that trades do not occur regularly throughout the day. This has the important implication that trade prices provide only a censored sample of the underlying true price [see Lo and MacKinlay (1990a), Easley and O’Hara (1992)]. An immediate implication of this censoring is that comparing trade prices across stocks can result in spurious inferences. In particular, suppose that we have two stocks with identical underlying true price processes. Now, let an innovation occur to this true price.
If orders do not arrive simultaneously for both stocks, then trades will occur at different times for the two stocks. Looking at the most recent trade price in the two stocks will reveal different prices, even though the underlying true price of the assets is in fact the same. This non-synchronous trading introduces lags into observed price adjustment, and thus cross-sectional differences in the auto-correlation patterns in returns. Numerous authors [see for example, Lo and MacKinlay (1990a), Mech (1993), Kadlec and Patterson (1999)] have shown that non-synchronous trading plays an important role in explaining the observed pattern of large firm weekly returns leading small firm weekly returns. Indeed, Boudoukh, Richardson and Whitelaw (1994) argue a “large fraction” of the observed effect may be explained by this factor, while Kadlec and Patterson (1999) estimate it to be more on the order of 25%. An important feature of these analyses is that the order arrival process, while dependent on firm type, is not assumed linked to the underlying true process per se. If new information causes trading, however, then non-trading may be correlated with the underlying true price process, suggesting a greater complexity to the observed patterns in asset price dynamics. To understand this linkage, we need to look more closely at the role of information in price adjustment.
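The two-stock thought experiment above can be sketched in a short simulation. This is a minimal illustration under stated assumptions, not a replication of any of the cited studies: both stocks share one true price, there is no bid–ask spread, each stock trades in a given period with a fixed probability that is independent of the price innovation, and the observed price is the most recent trade price. The trading probabilities (0.95 and 0.30) and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

n_periods = 100_000
# One true price shared by both stocks (random walk with unit-variance innovations).
true_price = np.cumsum(rng.normal(0.0, 1.0, size=n_periods))


def last_trade_price(price, trade_prob, rng):
    """Observed price: the true price at the most recent period with a trade."""
    traded = rng.random(price.size) < trade_prob
    traded[0] = True                                   # anchor the first observation
    idx = np.where(traded, np.arange(price.size), 0)
    return price[np.maximum.accumulate(idx)]           # index of the latest trade so far


p_active = last_trade_price(true_price, 0.95, rng)     # frequently traded stock
p_thin = last_trade_price(true_price, 0.30, rng)       # thinly traded stock

r_active = np.diff(p_active)
r_thin = np.diff(p_thin)


def lagged_corr(later, earlier):
    """Correlation between `later` at t and `earlier` at t-1."""
    return float(np.corrcoef(later[1:], earlier[:-1])[0, 1])


active_leads_thin = lagged_corr(r_thin, r_active)
thin_leads_active = lagged_corr(r_active, r_thin)

print(active_leads_thin, thin_leads_active)
```

Under these assumptions the returns of the frequently traded stock should appear to lead those of the thinly traded stock, while the reverse cross-correlation should be close to zero, even though the underlying true price is identical for both.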
3.2. The adjustment of prices to information

To this point, we have considered models in which the underlying value of the asset and the trade process were independent. Trades mattered only because they caused transaction prices to move between bid and ask prices that bracket the underlying value. However, much of the microstructure literature considers a more complex story; see Kyle (1985), Glosten and Milgrom (1985), Easley and O’Hara (1987) or O’Hara (1995) for an overview. In this story, some traders, the informed, know more about the underlying value of the asset than do other traders, the uninformed, or the market maker. The market maker cannot distinguish between informed traders and uninformed traders who are trading for non-informational reasons such as liquidity shocks. He loses to the informed traders, so to have zero expected profits he must make profits from the uninformed traders. He does this by setting a bid–ask spread. The actual spread may, of course, include compensation for the fixed cost of doing business and an inventory component, as well as this asymmetric information component. We focus first on this asymmetric information component.

In this setting, buy and sell orders convey information so they affect the expected value of the asset. Let $v_t = E[V \mid I_t]$ be the prior expected value of the asset, where $I_t$ is the public information prior to the trade at time $t$. The trade indicator is again $x_t = +1$ if the trade at $t$ is a buy and $-1$ if the trade at $t$ is a sell. The ask price, the price a trader has to pay for a unit of the asset, is $p_t^a = E[V \mid I_t, x_t = +1]$ and the bid price, the price a trader will receive if he sells a unit of the asset, is $p_t^b = E[V \mid I_t, x_t = -1]$. This pricing strategy induces a spread, $s_t = p_t^a - p_t^b$, but only if the order type affects the expected value of the asset. This occurs in markets in which some orders come from informed traders.
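A small numerical example may help fix ideas about how conditioning on the order type generates a spread. The sketch below is a deliberately simplified setting in the spirit of the sequential trade models cited above, not the specification of any one paper. Its assumptions: the asset value takes one of two values, a fixed share of orders comes from informed traders who buy when the value is high and sell when it is low, and uninformed traders buy or sell with equal probability. The function name `quotes` and all parameter values are hypothetical.

```python
def quotes(v_low, v_high, prior_high, informed_share):
    """Return (bid, ask) as conditional expectations of V given a sell or a buy.

    Assumptions (illustrative only): V is v_high with probability prior_high,
    otherwise v_low; informed traders trade in the direction of the true value;
    uninformed traders buy or sell with probability 1/2 each.
    """
    buy_given_high = informed_share + 0.5 * (1.0 - informed_share)
    buy_given_low = 0.5 * (1.0 - informed_share)

    # Bayes' rule: probability that V is high after observing a buy or a sell.
    p_buy = prior_high * buy_given_high + (1.0 - prior_high) * buy_given_low
    p_sell = 1.0 - p_buy
    high_given_buy = prior_high * buy_given_high / p_buy
    high_given_sell = prior_high * (1.0 - buy_given_high) / p_sell

    ask = high_given_buy * v_high + (1.0 - high_given_buy) * v_low    # E[V | buy]
    bid = high_given_sell * v_high + (1.0 - high_given_sell) * v_low  # E[V | sell]
    return bid, ask


# With 20% informed order flow the quotes bracket the prior mean of 100.
bid, ask = quotes(v_low=90.0, v_high=110.0, prior_high=0.5, informed_share=0.2)

# With no informed traders the order type carries no information.
bid_no_info, ask_no_info = quotes(90.0, 110.0, 0.5, 0.0)

print(bid, ask, ask - bid)
print(bid_no_info, ask_no_info)
```

In this example the spread is positive only because some orders may come from informed traders; when the informed share is set to zero, the bid and ask collapse to the prior expected value.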
When an order to buy arrives at the market, the expected value of the asset rises because this order is a signal of good news about the value of the asset. Similarly, the arrival of an order to sell is bad news, the expected value of the asset falls and the transaction price is lower. The observable price of the asset at time $t$, $p_t$, is thus

$$p_t = \begin{cases} p_t^a & \text{if } x_t = +1, \\ p_t^b & \text{if } x_t = -1. \end{cases} \tag{4}$$

Here again trade causes prices to bounce between a bid price and an ask price. In the previous section (ignoring active inventory control for now), this bounce was purely transitory; there was no effect of a trade at time $t$ on the price at time $t+1$. Now there is also a permanent component. An order to buy at time $t$ causes a transaction at the time $t$ ask price, $p_t^a$, and it changes the expected value of the asset to $p_t^a$. Bid and ask prices at time $t+1$ bracket this new expected value of the asset. Orders at time $t+1$ also cause prices to adjust, and this adjustment process can be complex. In particular, if prices have not adjusted to true values, then informed traders can be expected to continue trading all on the same side of the market, buying when there is good news and selling when there is bad. In a simple world where information events are known to have occurred, this repeated trading is incorporated by
the market maker into his price-setting decision, and prices exhibit Markov behavior. If there is uncertainty about information, however, then the sequence of trades may also be informative. Sequence effects can also arise from the actions of uninformed traders who may opt to reduce the price effects of large trades by splitting orders [see Bernhardt and Hughson (1997)]. Numerous authors [see Hasbrouck (1996) for a review] have shown that trades are positively auto-correlated, a result consistent with this non-Markovian pattern.

Returning to Equation (4), it is also possible to add back into this model fixed components of the spread and inventory control components. Glosten (1987) analyzes a model with both asymmetric information and fixed components. He shows that the asymmetric information component has different implications for prices than do fixed components; in particular, the true price of the asset and the random spread effect are correlated. Of particular importance is the part of the bid–ask spread that is due to asymmetric information versus order processing costs or inventory control. Glosten and Harris (1988) show that a permanent adverse selection effect is important for a sample of 250 NYSE stocks. Stoll (1989) analyzes NASDAQ stocks and concludes that 43% of the spread is due to adverse selection, 10% is due to inventory cost and 47% is due to order processing costs. Huang and Stoll (1997) use a more general model on 19 actively traded NYSE stocks to conclude that 10% of the spread is due to adverse selection, 29% is due to inventory cost and 62% is due to order processing costs. The variation in results suggests that the matter is not yet settled, but it is clear that all three factors are important, and that they have differing implications for statistical biases in asset returns. Huang and Stoll (1994) reinforce these results by examining whether microstructure effects can explain quote revision or transaction price revision.
In particular, these authors demonstrate that the movement of both quotes and prices can be predicted by factors linked to both inventory theories and information-based theories. 6 Consequently, at least over very short intervals, microstructure variables appear to predict subsequent stock price movements. Microstructure spread effects may also explain the behavior of stock prices over somewhat longer time intervals. In particular, an enduring puzzle in asset pricing is the role of taxes in affecting valuation. Numerous researchers [most notably, Elton and Gruber (1970)] argued that the ex-dividend behavior of stock prices was evidence that the market values capital gains more than cash dividends. Miller and Scholes (1982) pointed out, however, that transactions costs are not negligible when compared with the usual magnitude of cash dividends. Boyd and Jagannathan (1994) demonstrated that the implicit bid–ask spread can explain the price pattern observed on ex-dividend days, thus undermining the argument for tax effects. Frank and Jagannathan (1998) provide further evidence for microstructure effects by showing that trading costs can
6 Huang and Stoll (1994) note, however, that this need not imply the existence of arbitrage profits as the transactions costs of implementing trades may preclude such profits.
explain the ex-dividend behavior of stock prices in Hong Kong (a country without capital gains taxes). A natural conclusion from this work is that microstructure effects may be large enough to explain a variety of short-run asset price movements.

3.3. Statistical and structural models of microstructure data

In the previous two sections we considered simple reduced-form models of microstructure data that were designed to allow estimation of parametric effects of microstructure features on asset prices. An alternative approach is to forgo structure and employ a purely statistical time series model of high frequency microstructure data. This data consists of trade characteristics such as price and quantity as well as the time of the trade, or the elapsed time since the last trade. Much of the literature in this area attempts to use time series models to forecast prices, price changes or volatility and to infer the information content of particular variables. 7

We first consider time-series models in which time is taken to be trade-time rather than clock-time. Here, the data is treated as if observations of trades and price changes occurred at equally spaced intervals in real time. The statistical models in the previous section are examples of this approach. In their reduced form, these models are bivariate vector autoregressions with structure imposed by the underlying economic model. Hasbrouck (1991) takes this approach to estimate the information content of trades, which in this setting is given by the persistent price impact of the unexpected component of trades. Numerous authors have used this permanent and temporary price effects approach to estimate various asset price regularities such as the informativeness of small trades [see Hasbrouck (1991)] or the impact of block trades [see Saar (2001)].

An alternative approach is to consider trades in real time and ask if there is information content in the time between trades. 
Two papers here are particularly relevant. Diamond and Verrecchia (DV) (1987) develop a model in which short sale constraints impart information content to the time between trades. In their model, traders learning bad news may be unable to short the stock, so longer times between trades may signal bad news. Easley and O’Hara (EOH) (1992) show that if the existence of new information is uncertain, then the time between trades carries information. Their idea is that when there has been an information event, orders arrive from both uninformed and informed traders, resulting in more trades per time interval. So when trades occur fast they have more information content, and produce greater price impacts than when trades occur slowly. This implies that the volume of trade now signals information. We discuss the literature on volume later in this section. A direct implication of both the DV and EOH models is that time itself may be predictive of future price movements. This functional role for time is investigated empirically by Engle and Russell (ER) (1998) and by Hasbrouck (1999). Engle and Russell develop a new approach to
7 For a review of some of the issues in modeling high frequency data see Goodhart and O’Hara (1997).
modeling irregularly spaced data called the autoregressive conditional duration model. The ACD model focuses on the inter-temporal correlations of the time interval between events. In this setting, the events can be trades, or quote updates, or even depth changes. The ACD model “treats the arrival times of the data as a point process with an intensity defined conditional on past activity”. The ACD model essentially estimates how long it will be until prices or quotes change given this past activity. With its focus on price changes, the ACD model is related to models of volatility. Indeed, Engle (2000) finds that, as suggested by the asymmetric information model, longer time between trades and longer expected times between trades are associated with lower price volatility. Engle and Russell (1997) apply this technique to foreign exchange data and they find that changes in the bid–ask spread are predictive of future price changes. These future price changes are defined over a short horizon, but these analyses show how patterns in trade and quote data can result in predictable price variation. Treating volatility in this way allows for the explicit inclusion of microstructure features such as intra-day seasonalities. While the asset-pricing literature is replete with models of volatility [for an excellent survey see Bollerslev, Chou and Kroner (1992)], typically these models require a high degree of stationarity in the data. Thus, analyses using ARCH-type models (i.e. GARCH, P-GARCH, E-GARCH, etc.) typically remove seasonalities or other microstructure factors from the data. Hasbrouck (1999) makes the important point that such real-time stationarity is refuted by the clustering of trades, price and quote changes that characterize microstructure data. Thus, he notes “a fast market is not merely a normal market that is speeded up, but one in which the relationships between component events differ”. 
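The mechanics of the ACD model can be illustrated with a minimal sketch. The text above does not spell out a specification, so the code assumes the familiar ACD(1,1) recursion, in which the expected duration psi_i depends on the previous observed duration and the previous expected duration. The function names, the parameter values (omega, alpha, beta) and the sample durations are hypothetical and chosen only for illustration; nothing here is estimated from data.

```python
def acd_expected_durations(durations, omega, alpha, beta):
    """Return psi_i, the expected duration of spell i given earlier spells.

    Assumed ACD(1,1) recursion:
        psi_i = omega + alpha * x_{i-1} + beta * psi_{i-1}
    where x_i is the observed time between event i-1 and event i.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    if not durations:
        raise ValueError("need at least one observed duration")
    # Start the recursion at the unconditional mean omega / (1 - alpha - beta).
    psi = [omega / (1.0 - alpha - beta)]
    for x_prev in durations[:-1]:
        psi.append(omega + alpha * x_prev + beta * psi[-1])
    return psi


def forecast_next_duration(durations, omega, alpha, beta):
    """One-step-ahead expected time until the next event."""
    psi = acd_expected_durations(durations, omega, alpha, beta)
    return omega + alpha * durations[-1] + beta * psi[-1]


# Hypothetical inter-trade durations in seconds: a quiet spell, then a burst.
durations = [10.0, 2.0, 1.0]
psi = acd_expected_durations(durations, omega=1.0, alpha=0.2, beta=0.6)
forecast = forecast_next_duration(durations, omega=1.0, alpha=0.2, beta=0.6)
```

With these illustrative parameters the unconditional mean duration is 5 seconds, and the forecast after the burst of short durations falls below that mean. This is the clustering of trading activity that the recursion is designed to capture.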
Hasbrouck’s characterization of fast markets is consistent with the Easley–O’Hara (1992) argument that the information structure differs between fast and slow markets.

Investigating the linkage of microstructure variables to price volatility may also provide insights into more basic questions regarding the evolution of prices. In an intriguing paper, French and Roll (1986) showed that return volatility is significantly higher when measured during the trading day than it is during non-trading hours. They attribute this result to noise generated by the trading process. But how exactly is this occurring? Is it the case that trading creates volatility, in effect inducing traders to transact simply because the market is open? Or, if trading in fact reveals information, then might not prices be volatile because of learning? These issues are important in microstructure because, at least over short horizons, most microstructure time-series are non-linear. Thus, while French and Roll noted the distinction between market-open volatility and market-closed volatility, other authors have shown distinct volatility patterns during the day. In particular, numerous authors have shown that price volatility overall tends to be U-shaped, while the volatility of ask price changes actually declines over the day [see Madhavan, Richardson and Roomans (1997)].
From our earlier discussion, it seems natural to attribute these volatility patterns to factors relating to information flows and to factors related to market frictions. 8 Hasbrouck (1993) uses a VAR approach to decompose volatility into these components, while Madhavan, Richardson and Roomans (MRR) (1997) develop a structural model to do so. This latter approach uses a simple model in which the intra-day variance arises from four components: price discreteness, asymmetric information, trading costs, and an interaction term. Estimating the model using intra-day NYSE data reveals a number of interesting findings. In particular, MRR find that the public information component of volatility declines by one-third over the day, consistent with the revelation of information through trading. Conversely, the market frictions component of volatility increases over the day, accounting for 65% of the price variance by the close of trading. Of these market frictions, the bid–ask spread plays an important role, explaining more than a third of total volatility by the close of trading.

3.4. Volume and price movements

Because the time between trades is inversely related to the number of transactions, the findings discussed above imply that there may be information content in the number of trades and in volume. A widely-cited aphorism is that “it takes volume to move prices”, or simply that volume and volatility are positively correlated. Such an empirical finding has been demonstrated by numerous authors; see, for example, Karpoff (1987), Gallant, Rossi and Tauchen (1992), or Campbell, Grossman and Wang (1993). Whether this volatility linkage is due to the volume of trades or to the number of transactions is contentious. Jones, Kaul and Lipson (1994) argue that it is the number of trades that matters; they present empirical evidence that there is no additional information in volume beyond that conveyed in the number of trades. 
Other researchers (particularly research on liquidity which we address in the next section) argue the opposite.

It is well known that volume is also serially correlated. There is a recent theoretical literature showing how the serial correlation in volume and the correlation between volume and volatility can arise in a competitive market. He and Wang (1995) show that even when information is uncorrelated over time volume will be serially correlated in a partially revealing rational expectations equilibrium, and that volume and volatility will be contemporaneously correlated. This model relies on private information affecting trade in both the current period and possibly in future periods as traders learn from current and past prices. 9 Blume, Easley and O’Hara (1994) provide a model in which volume and volatility are positively correlated and in which volume has predictive content for future price
8 Stoll and Whaley (1990) present evidence that the structural procedure for opening stocks on the New York Stock Exchange appears to affect price volatility.
9 In effect, traders here can profit from using technical trading rules based on prices. Such technical trading strategies have also been analyzed by Brown and Jennings (1989) and Grundy and McNichols (1989).
changes. In this model, volume itself is informative because it provides data on the quality or precision of information in past price movements. Thus, traders watching volume can learn information regarding the future movement of prices. BEO argue that this informative effect of volume is likely to be particularly important for small, less widely held firms, a result confirmed empirically by Conrad, Hameed and Niden (1994) in their analysis of the relation of volume and weekly returns. Campbell, Grossman and Wang (1993) develop a model in which volume can help distinguish between price changes due to public information and those that reflect changes in expected return. In their model, variation in the aggregate demands of liquidity traders can generate high volume, as can days in which there is new information. Their model predicts “price changes accompanied by high volume tend to be reversed while this is less true of price changes on days with low volume”. Empirical evidence in Llorente, Michaely, Saar and Wang (2002) using daily data on volume and first-order autocorrelations for individual stocks listed on the NYSE and AMEX confirms these results. That volume may be predictive of short-run movements in prices is consistent with microstructure effects arising from the adjustment of prices to public and private information. What is more difficult to understand is the apparent role of volume in predicting longer-term price movements. In particular, a number of researchers have found that conditioning on past volume is useful in predicting asset returns for months in advance. Gervais, Kaniel and Mingelgrin (1999) find that stocks experiencing unusually high trading volume over a period of one day to a week tend to appreciate over the next month and continue to generate significant returns for horizons as long as 20 weeks. 
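The Campbell, Grossman and Wang prediction quoted above, that price changes on high-volume days tend to be reversed, lends itself to a simple informal check. The sketch below splits days by a volume threshold and compares the slope of next-day returns on same-day returns in the two groups. It is an illustration under assumed inputs, not the estimator used in the studies cited; the function names, the threshold and the sample data are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def return_persistence_by_volume(returns, volumes, threshold):
    """Slope of r_{t+1} on r_t, separately for high- and low-volume days t."""
    high_x, high_y, low_x, low_y = [], [], [], []
    for t in range(len(returns) - 1):
        if volumes[t] > threshold:
            high_x.append(returns[t])
            high_y.append(returns[t + 1])
        else:
            low_x.append(returns[t])
            low_y.append(returns[t + 1])
    return ols_slope(high_x, high_y), ols_slope(low_x, low_y)


# Hypothetical daily returns and volumes (arbitrary units), for illustration only.
returns = [0.02, -0.01, -0.02, 0.01, 0.04, -0.02]
volumes = [10, 1, 10, 1, 10, 1]
slope_high, slope_low = return_persistence_by_volume(returns, volumes, threshold=5)
```

A negative slope on high-volume days together with a non-negative slope on low-volume days is the pattern the model predicts. In this constructed sample the high-volume slope is negative by design.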
Brennan, Chordia and Subrahmanyam (1998), looking at normal as opposed to abnormal volume, find that high volume stocks tend to be accompanied by lower expected returns, a result they attribute to liquidity effects. We consider these liquidity issues more fully in the next section.

What is more puzzling still are results of Lee and Swaminathan (2000). These authors find that price momentum is stronger among high volume stocks, and that past trading volume predicts the magnitude and timing of price momentum reversals. Jegadeesh and Titman (1993) demonstrated that portfolios of “winner” stocks tend to experience higher returns and “loser” stocks experience lower returns over the next three-year period. These momentum effects are puzzling because they suggest the market somehow “under-reacts” to good and bad news. Lee and Swaminathan confirm these effects, but they also show that over years 3 to 5 the return pattern reverses, with winners now under-performing, and conversely. Of particular importance is their finding that high volume winners and losers experience faster momentum reversals. Thus, volume predicts both the level and turning points of returns.

Momentum effects pose serious challenges to virtually all asset-pricing theories. Various explanations have been proposed, but momentum appears robust to obvious causes such as measurement errors and transactions costs [see, for example, Grundy and Martin (2001), Conrad and Kaul (1998), Jegadeesh and Titman (2002)]. Could microstructure effects provide part of the solution? Possibly, but the link is not obvious.
It is hard to envision how dealer inventory management or price discreteness, or bid–ask spread changes or intra-day volatility patterns could produce such long-run effects. And more puzzling still is seeing how these factors could explain the non-linearity in momentum effects. The linkage of volume and momentum suggests that the answer might lie in the complex role played by information. Microstructure analyses highlight the important role of private information in affecting price behavior. Indeed, Hvidkjaer (2000) finds that buy–sell imbalances in momentum portfolios are consistent with traders acting on the basis of such information. What is needed is an understanding of why microstructure could influence asset returns in the long run, and it is this issue we address in the next section.
4. Asset pricing in the long-run

The previous section examined the effects of microstructure variables on short-run asset-pricing dynamics. While the specific influences discussed differ from one another, a feature common to these variables is their relation to the mechanics of trading and the subsequent adjustment of prices to equilibrium levels. Thus, the negative serial correlation found in transactions prices induced by bid/ask bounce or by market maker inventory behavior can be viewed as an artifact of the market clearing process. Similarly, the informative role of volume or the time between trades arises because of the price discovery role of markets. Over short horizons, it should not be surprising that the mechanics of trading can affect prices in predictable ways. What may be more perplexing is why microstructure variables should have any effect on long-term asset returns.

There are two issues to consider here. First, and foremost, is the economic origin of such effects. Certainly, apart from short time intervals, the problems of non-synchronous trading or bid/ask bounce will not influence asset returns. And the excessive volume on any given day will quickly dissipate once prices have adjusted to new equilibrium levels. But other, more fundamental factors can arise in the long-term that affect the risk and return faced by traders. In particular, liquidity and the underlying information risk of the asset may influence the utility of investors, an issue we address further shortly.

But even if these effects exist, there is a second problem, namely the ability to find them empirically. There is an overarching problem of econometric power when looking for microstructure effects in long-run data. 10 One immediate difficulty is that expected returns may be small relative to return variation (in effect, a low signal-to-noise ratio). A second difficulty is multi-collinearity, both with respect to economic factors and with the components of trading cost. 
Thus, microstructure effects may be correlated with other economic features such as firm size. Similarly, high variance firms may
10 We thank Joel Hasbrouck for bringing this point to our attention.
have more asymmetric information, so that higher returns attributed to volatility may actually reflect compensation due to more complex informational risks. These econometric concerns dictate both caution in interpreting extant findings, and support for the need to develop better, more sensitive econometric techniques. Abstracting from these empirical difficulties, we now turn to the basic question of how, or why, microstructure variables could affect long-run asset returns.

4.1. Liquidity

Virtually all would agree that the liquidity of an asset is an important feature; exactly what this liquidity is would elicit more debate. In its simplest form, liquidity relates to trading costs, with more liquid markets having lower costs. Indeed, this view of liquidity is put forth by Amihud and Mendelson (1986) who note “illiquidity can be measured by the cost of immediate execution”. Abstracting from the definitional battles of what is meant by cost (or immediate, for that matter), this view of liquidity leads to the simple axiom that liquidity is a desirable property for an asset; that other things equal, traders would prefer assets in which execution costs are lower.

Whether liquidity is valued enough to affect asset returns is more controversial. Over a short period of time, higher levels of transactions costs must lower the return available to investors, and ceteris paribus, lower the price they are willing to pay for the asset. But given a long enough time horizon, are such effects large enough to actually affect returns? The traditional view in asset pricing is no. For example, Constantinides (1986) shows theoretically that transactions costs can only have a second-order effect on the liquidity premium implied by the equilibrium asset returns in an inter-temporal portfolio selection model. A similar conclusion is reached by Aiyagari and Gertler (1991), Heaton and Lucas (1996), Vayanos (1998) and Vayanos and Vila (1999). 
In effect, these authors all argue that the transactions costs are just too small relative to the equilibrium risk premium to make any real difference. Huang (2001) agrees that in general this is true, but that it need not be the case if traders are constrained from borrowing against their future income stream. 11 Holmstrom and Tirole (2001) develop a model in which firms demand liquidity to meet future cash flow needs. In this model, assets’ expected returns are affected by the covariance of their returns with market liquidity. The counter-argument is put forth by Amihud and Mendelson (AM) (1986, 1988), who argue that liquidity can more generally affect required returns. In the AM model, traders seek to maximize the present value of expected cash flows. The model allows traders to diversify and to have different time horizons. Traders buy and sell assets as
11 Huang (2001) looks at how liquidity shocks affect traders in a continuous time setting. His model uses uncertain holding periods and liquidity shocks to show how borrowing constraints may result in substantial return premia for less liquid securities.
part of their portfolio problem and they face execution costs in doing so. AM proxy illiquidity by the bid/ask spread, and so higher spreads result in lower overall returns for traders. AM show that a clientele develops in the market for assets with different liquidity. In particular, only traders with long horizons will hold illiquid assets, and they will demand compensation for doing so. Thus, their model predicts that in equilibrium an asset’s return will be an increasing and concave function of its bid/ask spread. The simplicity of the AM argument is quite appealing. In this setting, illiquidity functions as a type of exogenous tax, and while some traders avoid it altogether by eschewing such assets in their portfolio, others bear the tax but demand compensation in the form of higher returns. Because bid/ask spreads measure this liquidity, the question becomes are spreads linked in an economically meaningful way to returns? There is an extensive body of empirical research investigating this question. 12 AM (1986, 1988) find a significant positive effect of bid–ask spreads on stock returns, and their results are supported by Eleswarapu (1997) and by Chalmers and Kadlec (1998) who link amortized spreads to returns. However, Chen, Grundy and Stambaugh (1990), Chen and Kan (1996) and Eleswarapu and Reinganum (1993) conclude the opposite, as does recent research by Easley, Hvidkjaer and O’Hara (2002) who find no direct link between spreads and returns for the period 1983–1998. 13 Indeed, Chen and Kan argue that the positive findings of AM and others are due to mis-specification of risk; that “when returns are not “properly” adjusted for risk, variables that are functions of the most recently observed price of a stock, such as size, dividend yield, and the relative bid–ask spread, are often found to possess explanatory power on the crosssectional difference in the risk-adjusted return”. 
Their argument captures Berk’s (1995) observation that any price-related variable will be related to returns under improper risk-adjustment. 14

But are spreads necessarily the correct measure of liquidity? Certainly, as a proxy for trading costs, spreads are only a part of the story; factors such as trading commissions, the overall volume in the market, the price impact of trades or even the trading mechanism itself also are important (see Keim and Madhavan (1995) for empirical analysis of the trading costs facing institutional traders). Indeed, the recent dramatic declines in spreads due to structural changes in tick sizes and regulatory changes
12 We limit our discussion here to studies involving the equity markets, but this linkage of liquidity and asset returns has been investigated in other settings as well. See for example, Amihud and Mendelson (1991) and Kamara (1994).
13 Eleswarapu and Reinganum (1993) find that the association between bid/ask spreads and stock returns is mainly confined to the month of January, a result hard to reconcile with the underlying AM arguments. Easley, Hvidkjaer and O’Hara (2002) find that spreads are insignificant when added to a Fama–French (1992) regression. However, they do find that spreads may have an effect when added to a return regression that also includes the standard deviation of returns, volume, and the volatility of turnover. This spread effect appears to arise because of the high correlation between spreads and standard deviations.
14 Essentially, the argument here is that spreads are derived from prices, and as Miller and Scholes (1982) pointed out prices may be correlated with a security’s beta, so any finding of a link between spreads and returns could simply be a measurement error of beta.
highlight the limitations of this measure. A natural direction for research, therefore, is to investigate how returns are affected by alternative measures of liquidity. Datar, Naik and Radcliffe (1998) propose share turnover (i.e. the number of shares traded over the number of shares outstanding) as such a proxy for liquidity, arguing that liquidity should be correlated with trading frequency. They find that stock returns are a decreasing function of turnover rates. Brennan, Chordia and Subrahmanyam (1998) demonstrate a similar negative relation between average returns and dollar volume. Anshuman, Chordia, and Subrahmanyam (ACS) (2001) also find negative relationships between returns and measures of turnover and dollar trading volume. More puzzling is their finding that returns are negatively related to the volatility of these measures. ACS argue that traders prefer volatility as it enhances their ability to implement trading strategies, but this explanation is contentious given the negative impact on utility usually associated with volatility.

That volume-related measures appear related to returns is intriguing, but is this effect due to liquidity? In the previous section we saw that volume plays a complex role in markets, in part because of its relationship with information on the stock’s underlying true value. Indeed, Lee and Swaminathan (2000) argue that turnover may be a less than perfect proxy for liquidity because the relation between turnover and expected returns depends on how stocks have performed in the past. As we discussed in the last section, these authors argue that volume may be most useful in understanding the mysteries of momentum.

There are, of course, other measures of liquidity to consider. A variety of authors propose variants on the price impact of trading. In particular, markets in which trades move prices a great deal are considered less liquid than those with smaller price effects. 
Breen, Hodrick and Korajczyk (2000) calculate liquidity as the relation between price changes and net turnover (defined over 5 and 30 minute periods, respectively). They find these price impacts exhibit substantial cross-sectional variation, while remaining remarkably stable when measured over their four-year sample period. Amihud, Mendelson and Lauterbach (AML) (1997) use the liquidity ratio, defined as the daily volume divided by the absolute value of the daily return, to capture this liquidity effect [see also Cooper, Groth and Avera (1986) and Berkman and Eleswarapu (1998) for studies using the liquidity ratio]. AML use a natural market experiment, the transition of trading on the Tel Aviv Stock Exchange from a call auction mechanism to more continuous trading, to investigate how trading mechanisms affect stock prices. This clever study shows that the transition between trading mechanisms resulted in both substantial liquidity gains and significant price increases for the transferred shares. Whether the enhanced liquidity caused the price increase, however, is not something this study can determine, in part, because the liquidity ratio itself is defined by daily returns. Muscarella and Piwowar (2001) show similar effects accompanying the movement of stocks from call market trading to continuous trading on the Paris Bourse. They find that stocks moving to continuous trading experience a price gain of more than 5%. Such stocks also exhibit increases in volume and, not surprisingly, in the liquidity ratio.
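The liquidity ratio used in these studies is simple to compute, and writing it out makes the circularity noted above concrete: the measure is built from the same daily returns whose behavior it is meant to explain. The sketch below follows the definition given in the text (daily volume divided by the absolute daily return) and averages it over days. Skipping zero-return days is an assumption made here to avoid division by zero, and the function name and sample data are hypothetical.

```python
def liquidity_ratio(daily_volume, daily_returns):
    """Average of daily volume divided by the absolute daily return.

    Days with a zero return are skipped (an assumption of this sketch),
    because the ratio is undefined on those days.
    """
    ratios = [
        volume / abs(ret)
        for volume, ret in zip(daily_volume, daily_returns)
        if ret != 0
    ]
    if not ratios:
        raise ValueError("no days with a non-zero return")
    return sum(ratios) / len(ratios)


# Hypothetical data: shares traded and simple daily returns over three days.
volume = [1000.0, 2000.0, 500.0]
returns = [0.01, -0.02, 0.0]
ratio = liquidity_ratio(volume, returns)
```

A higher value means that more volume trades per unit of price movement, so the stock is treated as more liquid. Any event that changes daily returns therefore also changes the ratio mechanically.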
Interestingly, Muscarella and Piwowar (2001) also document that inactive stocks moving from continuous trading to call trading did not fare well; prices generally fell, volume remained relatively stable, and there was a (weakly) significant decline in the liquidity ratio. These results suggest that call market trading did not enhance liquidity for inactive stocks as has been conjectured by numerous authors.

Brennan and Subrahmanyam (BS) (1996) propose linking illiquidity to the price impact of trades as captured by the Kyle λ. The λ variable is essentially the slope coefficient in a regression relating the price change to trade-by-trade signed order flow. In the Kyle (1985) model, the λ arises because of strategic trading by an informed investor, and so in that context it is a measure of adverse selection. BS argue that adverse selection is “a primary cause of illiquidity”, and they use the Kyle measure to provide a proxy for this trading cost. This is also one of the first studies to use transactions data to measure the nature of illiquidity. BS conclude “there is a significant return premium associated with the fixed and variable costs of transacting”. They find that the relation between the premium and the variable cost of transacting is concave, but that it is convex when measured with respect to the fixed costs of transacting. This latter result is inconsistent with the clientele argument of Amihud and Mendelson, but it may, as the authors note, reflect more the difficulty of measuring this cost variable using transactions data.

Amihud (2000) also investigates the idea that illiquidity is the relationship between the price change and the associated order flow. In this analysis, illiquidity is defined by the average ratio of the daily absolute return to the (dollar) trading volume on that day. In effect, this measure gives the percentage daily price change per dollar of daily volume, and conceptually it is the inverse of the liquidity ratio defined earlier. 
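The two price-impact measures just described can be sketched in a few lines. The first function estimates a Kyle-style λ as the OLS slope of price changes on signed order flow. It is a deliberately stripped-down version of the regression described in the text and omits the fixed-cost component that Brennan and Subrahmanyam also estimate. The second computes the illiquidity measure as the average of absolute daily return over daily dollar volume. Function names and sample data are hypothetical, and dropping zero-volume days is an assumption of this sketch.

```python
def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def kyle_lambda(price_changes, signed_order_flow):
    """Price impact per unit of signed order flow (buys positive, sells negative)."""
    return ols_slope(signed_order_flow, price_changes)


def amihud_illiquidity(daily_returns, daily_dollar_volume):
    """Average of |daily return| / daily dollar volume, skipping zero-volume days."""
    ratios = [
        abs(ret) / volume
        for ret, volume in zip(daily_returns, daily_dollar_volume)
        if volume > 0
    ]
    if not ratios:
        raise ValueError("no days with positive dollar volume")
    return sum(ratios) / len(ratios)


# Hypothetical trade-by-trade data: price changes respond linearly to order flow.
order_flow = [100.0, -50.0, 200.0, -150.0]
price_changes = [0.02 + 0.001 * q for q in order_flow]
lam = kyle_lambda(price_changes, order_flow)

# Hypothetical daily data for the volume-normalized measure.
illiq = amihud_illiquidity([0.01, -0.02], [1_000_000.0, 4_000_000.0])
```

The contrast in data requirements is the practical point: the λ regression needs signed, trade-by-trade order flow, whereas the daily measure needs only returns and dollar volume, which is why it can be computed over long samples.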
15 Amihud (2000) finds a positive cross-sectional link between asset returns and this illiquidity measure. A puzzle with this and other measures based on daily volume-normalized price movements, however, is that the theoretical link to investors’ trading problems is not straightforward. Moreover, the earlier findings that volume alone is linked to returns raise difficulties in knowing exactly how to interpret these composite variables. Of course, given the difficulty of even defining liquidity, such problems should not be unexpected.

A more controversial argument in this research is that expected market illiquidity affects expected stock returns. The notion here is that liquidity can be time-varying for the market as a whole, and investors demand compensation for bearing this market-related risk. Amihud calculates expected market illiquidity by averaging the illiquidity measure defined above over all firms in the market, and then assuming that investors expect this market variable to follow an autoregressive process. The analysis hypothesizes that a rise in expected market illiquidity induces both an income and a substitution effect. For all stocks, there is a general fall in stock prices to compensate
15 In some sense, this measure is like a Kyle λ but defined over total daily price change and volume.
for the reduced liquidity. However, traders also substitute from less liquid to more liquid stocks, resulting in an increase in some stock prices.

Pastor and Stambaugh (2001) take this argument one step further by arguing that liquidity per se is a priced factor in asset returns. These authors use a variant of a volume-linked price change as their liquidity measure. Their individual liquidity measure is a complicated one, entailing “the average effect that a given volume on day d has on the return for day d + 1, when the volume is given the same sign as the return on day d”. This measure is similar to that of Amihud (2000), but its use of today’s volume and tomorrow’s return makes this essentially the same variable investigated by Llorente, Michaely, Saar and Wang (2002). Pastor and Stambaugh then calculate each stock’s sensitivity, or “liquidity beta”, by constructing a market liquidity measure, given by the equally weighted average of the liquidity measures of individual stocks on the NYSE and Amex for the years 1962–1999. Their empirical findings suggest that such liquidity betas are a highly significant factor in explaining asset returns. 16

Why would liquidity have a common factor or act as one in asset returns? Chordia, Roll, and Subrahmanyam (CRS) (2000), Hasbrouck and Seppi (HS) (2001) and Huberman and Halka (HH) (2001) address this issue. CRS argue that commonality in liquidity could arise for reasons related to common variation in dealer inventories. If trading volume induces co-movements in dealers’ inventories, then this in turn could cause co-movements in liquidity measures such as spreads and depths. Asymmetric information, one of the usual suspects in microstructure studies, is less likely to be the source “because few traders possess privileged information about broad market movements”. CRS calculate the time series averages of market spreads, and they provide evidence of weak commonality in several liquidity variables. 
Huberman and Halka (2001) provide similar empirical results, and they conjecture that a systematic component of liquidity arises “because of the presence and effect of noise traders”. They note, however, that their empirical results do not provide hard evidence of this explanation. 17
Hasbrouck and Seppi (2001), by contrast, question the importance of any commonality in liquidity. These authors use a principal components analysis to show that common factors exist in the order flows of the 30 stocks in the Dow Jones Industrial Average. Using canonical correlation analysis, they document that the common factor in returns is highly correlated with the common factor in order flows. Their results on liquidity are quite different, in that variations in liquidity are found to be largely idiosyncratic. This suggests that variations in liquidity at the aggregate level could be diversified away. They conclude “any liquidity-linked differences in expected return are
16 Liquidity betas are calculated using volume signed by future returns to predict returns. As with all price-linked measures, the issue of mis-specification of risk raised by Berk (1995) must be considered.
17 Sias, Starks and Tinic (2001) investigate whether such noise trader risk is priced. Using closed-end fund data, they find no evidence that it is.
most likely due to predictable changes in the level of liquidity, rather than to variability in liquidity, per se”.
Whether liquidity affects asset returns remains contentious. What does appear robust is that asset returns behave in ways not predicted by traditional asset-pricing theories. While these divergences may be due to liquidity, there is an alternative microstructure-based explanation to which we now turn.
4.2. Information
In standard consumption-based asset-pricing models, asset prices are such that the representative individual, or a collection of individuals with homogeneous beliefs, chooses to hold the existing supply of assets. As the individuals’ beliefs about the value of the assets change over time, asset prices change, and this movement, along with dividends, generates returns. This basic model leads to an elegant asset-pricing theory in which the price of each asset is primarily dependent on the covariance of its returns with the returns on the entire collection of assets, or the “market”. Individuals need not hold idiosyncratic risk, and so in equilibrium they will not be compensated for holding this risk. Only market risk is priced.
Much of the market microstructure literature, on the other hand, focuses on differences in information between individuals, and on how the flows of differential information generate trade, spreads and price changes. Typically in this literature one asset is priced at a time. 18 There is no mention of market versus idiosyncratic risk; everything seems to be idiosyncratic risk as it is asset-specific. For this reason, it is natural to suspect that the issues microstructure examines cannot affect long-run asset prices.
Verifying this conjecture requires integrating these two paradigms of asset prices and their evolution. At one level this is simple. If everything else is held constant (endowments and preferences), then in the consumption-based asset-pricing approach, prices evolve because beliefs change.
Beliefs change because the information available to individuals about the fundamental value of the assets changes. So, in a very important sense, these paradigms are alike: both theories analyze the pricing of assets in response to information flows. But pushing the information story a little further reveals differences. If individuals begin with a common prior on asset values and they receive common information (such as observations of past prices and realized asset payoffs), then nothing changes in the standard asset-pricing theory: there is no role for information-based effects on returns. If individuals receive differential information, and the economy is in a revealing rational expectations equilibrium, then individuals still have common equilibrium beliefs and, again, nothing changes. But, more realistically, if the equilibrium in the
18 Exceptions to this are work on the influence of information on basket securities and the work on stock index futures; see Subrahmanyam (1991).
asset market is not fully revealing, then individuals receiving differential information will have differing beliefs in equilibrium. Assets may still be priced according to some “market expectation” as in Lintner (1969), but no individual’s belief will be characterized by this market expectation. Individuals will have different perceptions of market and idiosyncratic risk, and they will hold differing portfolios. Now, it is not just fundamental values that matter for asset prices; the distribution of information also matters, just as it does in the market microstructure literature.
The literature on partially revealing rational expectations, beginning with Grossman and Stiglitz (1980), shows how differential information affects asset prices. Admati’s (1985) paper generalized this analysis to multiple assets, and she showed how individuals face differing risk-return tradeoffs when differential information is not fully revealed in equilibrium. Wang (1993) showed in a two-asset, multi-period model that private information causes uninformed traders to demand compensation for the adverse selection problem they face, but that this effect is mitigated by the reduction in risk caused by partial revelation of information. Brennan and Cao (1997) use a similar idea to explain how superior information about home country assets can help explain international equity flows. Jones and Slezak (1999) use a multi-asset, multi-period partially revealing rational expectations model to show that standard risk-return predictions are altered in predictable ways. Easley and O’Hara (2001) construct a multi-asset partially revealing rational expectations equilibrium in which the distribution of information affects the required rate of return on assets. This analysis shows that if information about an asset is private, rather than public, then uninformed investors demand a higher rate of return on the asset to compensate them for the risk of trading with better informed traders.
The market microstructure literature demonstrates that the existence of differential information has a significant impact on the fine details of short-run asset prices. The literature above shows that theoretically it could also have an impact on long-run asset prices. But does it?
A natural approach to answer this question is to measure the extent of differential information asset-by-asset and ask whether this measure of differential information is priced. Since information, particularly private information, is not directly observable, its presence can only be inferred from market data: trades and prices. Fortunately, the microstructure literature provides ways to do this. Both the Kyle λ, from Kyle (1985), and the probability of information-based trade (PIN), from Easley, Kiefer and O’Hara (1997b), are measures of the importance of private information in a microstructure setting. Kyle’s λ measures the responsiveness of prices to signed order flow. It can be estimated by regressing price changes on signed order flow. PIN measures the fraction of orders that arise from informed traders. It can be estimated from data on trades. There is a substantial literature estimating each of these measures and showing that they provide insights into microstructure phenomena. See Glosten and Harris (1988), Hasbrouck (1991), Foster and Viswanathan (1993), Brennan and Subrahmanyam (1996) and Amihud (2000) on the Kyle λ; and, Easley, Kiefer and O’Hara (1996,
1997a,b), Easley, Kiefer, O’Hara and Paperman (1996) and Easley, Hvidkjaer and O’Hara (2002) on the probability of information-based trade.
Brennan and Subrahmanyam (1996) and Amihud (2000) argue that stocks with high λ’s are less attractive to uninformed investors, and they find support for this argument in transactions data and daily data, respectively. Easley, Hvidkjaer and O’Hara (2002) use a structural microstructure model to estimate the probability of information-based trade in each NYSE common stock yearly for the period 1983 to 1998. They show this information variable is priced by including it in a Fama–French (1992) asset-pricing regression. Stocks with higher probabilities of information-based trade are shown to require higher rates of return. A difference of 10 percentage points in the probability of information-based trade between two stocks leads to a difference in their expected returns of 2.5% per year. These results provide evidence that asymmetric information affects long-run asset pricing.
There is another role that information may play in long-run asset pricing that does not fit into the asymmetric information approach discussed above. Merton (1987) proposes an asset-pricing model in which agents are unaware of the existence of some assets. In Merton’s model, all agents who know of an asset agree on its return distribution, but information is incomplete in that not all agents know about every asset. Merton shows that assets with incomplete information have a smaller investor base, and, with limited demand, these assets command a lower price. In the standard microstructure approach, all investors know about every asset but information is asymmetric: some investors know more than others about returns. Both approaches lead to cross-sectional differences in asset prices due to information, or to the lack of information.
A related literature considers the role of participation constraints on traders [see, for example, Basak and Cuoco (1998) and Shapiro (2002)]. Here the story is that some traders are prohibited for exogenous reasons from holding certain assets. These prohibitions may stem from lack of knowledge as in Merton, or they could reflect market imperfections such as suitability requirements or institutional portfolio restrictions that limit the securities available for investment. Such prohibitions result in the same structure analyzed in Merton, although potentially for a different reason. This research also shows that cross-sectional differences can arise in asset returns. Whether these constraints are actually binding in the market, or, even if they are, whether they can be viewed as part of the market microstructure, is debatable. 19 But research in this general area does suggest that the simple story exposited in traditional asset pricing may be too rudimentary, and that features of trade and trading do matter for asset price behavior.
19 A related issue is the impact of short-sale restrictions and other market constraints on investing such as margin requirements. These features introduce heterogeneity into the investor pool, and so may also affect cross-sectional asset pricing.
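The two information measures used in this section can be illustrated with a short Python sketch. The first function estimates a Kyle-style λ as the slope from regressing price changes on signed order flow, as described above. The second computes PIN as the share of expected order arrivals that come from informed traders. Its parameterization (alpha for the probability of an information event, mu for the informed arrival rate, eps_buy and eps_sell for the uninformed buy and sell arrival rates) follows the usual sequential-trade notation and is an assumption here, since the text does not spell it out. In practice these parameters would be estimated from buy and sell counts, a step the sketch does not attempt.

```python
from typing import Sequence


def kyle_lambda(price_changes: Sequence[float],
                signed_order_flow: Sequence[float]) -> float:
    """Slope of an OLS regression (with intercept) of price changes
    on signed order flow; a simple stand-in for a Kyle-style lambda."""
    n = len(price_changes)
    if n != len(signed_order_flow) or n < 2:
        raise ValueError("need two equally long series with at least 2 observations")
    mean_dp = sum(price_changes) / n
    mean_q = sum(signed_order_flow) / n
    cov = sum((q - mean_q) * (dp - mean_dp)
              for q, dp in zip(signed_order_flow, price_changes))
    var = sum((q - mean_q) ** 2 for q in signed_order_flow)
    if var == 0.0:
        raise ValueError("signed order flow has no variation")
    return cov / var


def pin(alpha: float, mu: float, eps_buy: float, eps_sell: float) -> float:
    """Probability of information-based trade: expected informed arrivals
    divided by expected total arrivals. Parameter names are assumptions
    of this sketch, not notation taken from the text."""
    informed = alpha * mu
    return informed / (informed + eps_buy + eps_sell)
```

For instance, with alpha = 0.4, mu = 50 and uninformed rates of 20 on each side, informed orders account for an expected 20 of 60 arrivals, so PIN is one third.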
5. Linking microstructure and asset pricing: puzzles for researchers
Our contention in this paper is that features of the trading process provide important insights into short-term and long-term asset price behavior. We have reviewed a wide range of literature investigating these asset-pricing effects, but the vastness of the area dictates that many other papers not mentioned here are relevant to this general question. What has emerged from this survey is cogent and compelling evidence that microstructure factors affect asset-pricing behavior. But also apparent from our review are the many puzzles and challenges that impair our understanding. In this final section, we outline those issues we think most important for future research.
At the top of the list is the role of volume. As we have seen in this survey, volume appears to play an important role both in the short run and in the long run. Numerous empirical papers find that statistical significance almost magically appears when volume (or a member of its extended family such as turnover) is added to a regression. But theory provides few reasons for this significance, and the range of stories advanced to explain this phenomenon is impressive, but ultimately unconvincing.
Why should volume matter? We believe it is because volume, as with other market statistics, proxies for features related to price discovery in markets. Thus, the accounting literature [see, for example, Kim and Verrecchia (1991)] has linked volume to dispersion of beliefs regarding public information. Blume, Easley and O’Hara (1994) have suggested the link is to the quality of private information. Research on dividend recapture, or portfolio insurance, or “triple witching” days, has pointed to particular trading strategies as responsible for episodic high volume. We suspect that all of these theories may be correct, as well as others not yet discovered.
A challenge for theoretical researchers is that volume is extremely difficult to work with mathematically. Unlike prices, which can, at least with some transformations, be viewed as approximately normally distributed, volume has a distribution that is non-normal and often complex. In the theoretical microstructure literature, traders (and market makers) learn from watching market data, and it is this learning process that leads to market efficiency. Yet our models of learning from volume are few, and our corresponding inability to understand volume may follow as a result. Increased empirical focus on volume may lead to greater insights into its correlation with other variables, and this in turn may aid our understanding of its role theoretically. We may also need to think more deeply about the process of price formation, and recognize that our existing results are based on models that are mathematically tractable, but not necessarily economically valid. 20
The role of information in long-run asset pricing is a second puzzle for researchers. More theoretical work integrating the microstructure and asset-pricing approaches is
20 Thus, models of behavioral finance in which prices reflect alternative learning structures may yield insights into market statistics such as volume, and into the behavior of short- and long-run asset prices.
needed. So far there is no complete analysis in which: asset prices are determined in a microstructure setting, without the fiction of a Walrasian auctioneer; traders are fully optimizing, with rational expectations but differential information and unrestricted risk preferences; and prices are fully endogenous, so that the fiction of an exogenous value process common in the microstructure literature is dropped. Indeed, one way to characterize the shortcomings of both fields is to recognize that microstructure analyzes the transition to a “true” value process, while asset pricing assumes this transition has occurred (flawlessly) and then investigates what causes the value process to arise. As we noted earlier in the paper, this dichotomy is unlikely to capture the complex process of price discovery.
Addressing these shortcomings may require rethinking basic concepts such as risk and efficiency. In asset pricing, risk relates only to aggregate uncertainty. The combination of diversification with the presumed efficiency of the price discovery process negates any importance of asset-specific risk. 21 Is this view too optimistic? In microstructure models, prices are always efficient with respect to public information, but this is relatively meaningless; it is the evolution of prices to incorporate private information that is important. Why then should the same not be true for the market as a whole? Does it make sense to assume that the overall market is priced “correctly” when it is so difficult for each individual asset to achieve this state?
One response to this criticism is simply that if markets are not efficient then profitable trading opportunities will arise. Thus, asset-pricing models have been supported largely by empirical findings against such profit opportunities. Yet, recent empirical research is more troubling.
That beta seems, at best, mis-priced, that the explanatory power of traditional asset-pricing approaches (at least as captured by R²) is now vanishingly small, and that momentum persists all suggest that this base of empirical support has eroded. Moreover, the limits to arbitrage literature [see Shleifer and Vishny (1997)] argues that heterogeneity may more accurately characterize markets, undermining further the story that assets can be viewed interchangeably. Researchers have responded with the search for new and improved factors in asset pricing, but the theoretical justification for such variables is often lacking. In our view, the problem is more basic: if asset prices are not always “efficient”, then traders may face a wide variety of risks in portfolio selection. And foremost among these may be risks arising from asymmetric information, which can inhibit the ability of simple diversification strategies to remove asset-specific risks.
More empirical work is needed to determine how robust are the recent findings that measures of differential information are priced. Microstructure models provide measures of private information, but these measures are admittedly crude. Both the Kyle λ and the PIN measure developed by Easley et al. (1997b) require an exogenous specification of the periodicity of information events (conveniently chosen to be a
21 For elaboration of these issues, see O’Hara (2003).
day). 22 The λ measure involves both prices and aggregated quantities, leading to interpretation difficulties due to a wide range of factors. The PIN measure uses imbalances of buy and sell orders to infer the population parameters of informed and uninformed traders as well as the probabilities of new information and its direction. We would be the first to agree that it is surely too simple a measure to capture all the dimensions of information, and that more complex market statistics might prove more accurate and have greater predictive ability. In our own research, we are investigating whether integrating ACD-based approaches with our sequential trade structural model might lead to better specifications of information-based trading [see Easley, Engle, O’Hara and Wu (2000)]. We are also investigating the cross-sectional determinants of PIN, with a view to understanding its correlation with accounting-based measures.
Another approach that may prove fruitful in understanding how traders view risk in markets is experimental economics. Research by Bossaerts and Plott (1999), for example, shows that the CAPM does not hold when traders face limited numbers of illiquid assets. Experimental research by Bloomfield, O’Hara and Saar (2002) shows how informed and uninformed traders contribute to the production of liquidity in an electronic market. Because experimental analyses can “hold constant” other factors, such research may also suggest ways to deal with the econometric power problem in empirical long-term asset-pricing studies noted earlier.
A third puzzle for researchers is the role and importance of liquidity in asset pricing. Here, the challenges are legion, as even defining liquidity entails controversy. Nonetheless, liquidity issues seem to be important in a wide range of markets, and for assets ranging from equities to bonds to derivatives and real estate.
Is liquidity best viewed as a type of tax borne by investors, or is it something more complex, and potentially more important? Is liquidity time-varying, suggesting that traders need be concerned with more than the first moment of liquidity? Is liquidity a priced factor, or are “liquidity” shocks really serving as a proxy for more fundamental disturbances in the economy? These questions suggest revisiting the fundamental issue of whether liquidity effects, per se, are simply too small relative to aggregate shocks to be important in asset pricing. The argument advanced by Huang (2001) that liquidity matters when investors cannot borrow seems intriguing, but it seems more likely to provide insights into the liquidity of different classes of assets, rather than the cross-sectional differences within an asset class. Moreover, the liquidity spillovers between markets that prompt the infamous “flights to quality” suggest that liquidity issues may be important in a portfolio context [see Chordia, Sarkar and Subrahmanyam (2002)]. Increased empirical research on liquidity seems particularly promising in shedding light on these issues.
22 Hasbrouck (1999) argues that such a one-day specification is not supported by analyses of actual market data.
References
Admati, A. (1985), “A noisy rational expectations equilibrium for multi-asset securities markets”, Econometrica 53:629–658.
Aiyagari, S.R., and M. Gertler (1991), “Asset returns and transaction costs and uninsured individual risk: a stage III exercise”, Journal of Monetary Economics 27:309–331.
Amihud, Y. (2000), “Illiquidity and stock returns: cross-section and time series effects”, Working Paper (New York University).
Amihud, Y., and H. Mendelson (1986), “Asset pricing and the bid–ask spread”, Journal of Financial Economics 17:223–249.
Amihud, Y., and H. Mendelson (1988), “Liquidity and asset prices: financial management implications”, Financial Management 17(1):5–15.
Amihud, Y., and H. Mendelson (1991), “Liquidity, maturity, and the yields on U.S. government securities”, Journal of Finance 46:1411–1426.
Amihud, Y., H. Mendelson and B. Lauterbach (1997), “Market microstructure and security values: evidence from the Tel Aviv stock exchange”, Journal of Financial Economics 45:365–390.
Antoniewicz, R.L. (1993), “Relative volume and subsequent price movements”, Working Paper (Board of Governors of the Federal Reserve System, Washington, DC).
Anshuman, V.R., T. Chordia and A. Subrahmanyam (2001), “Trading activity and expected stock returns”, Journal of Financial Economics 59(1):3–32.
Basak, S., and D. Cuoco (1998), “An equilibrium model with restricted stock market participation”, Review of Financial Studies 11(2):309–341.
Berk, J. (1995), “A critique of size related anomalies”, Review of Financial Studies 8:275–286.
Berkman, H., and V. Eleswarapu (1998), “Short-term traders and liquidity: a test using Bombay stock exchange data”, Journal of Financial Economics 47:339–355.
Bernhardt, D., and E. Hughson (1997), “Splitting orders”, Review of Financial Studies 10(1):69–101.
Biais, B., L. Glosten and C. Spatt (2002), “The microstructure of stock markets”, Working Paper (Carnegie-Mellon University, Pittsburgh, PA).
Bloomfield, R., M. O’Hara and G. Saar (2002), “The make or take decision in an electronic market: evidence for the evolution of liquidity”, Working Paper (Johnson Graduate School of Management, Cornell University, Ithaca, NY).
Blume, L., D. Easley and M. O’Hara (1994), “Market statistics and technical analysis: the role of volume”, Journal of Finance 49(1):153–181.
Bollerslev, T., R.Y. Chou and K.F. Kroner (1992), “ARCH modeling in finance: a review of the theory and empirical evidence”, Journal of Econometrics 52(1):91–113.
Bossaerts, P., and C. Plott (1999), “Basic principles of asset pricing theory: evidence from large-scale experimental financial markets”, CalTech Working Paper 1070.
Boudoukh, J., M. Richardson and R. Whitelaw (1994), “A tale of three schools: insights on autocorrelations of short-horizon returns”, Review of Financial Studies 7(3):539–574.
Boyd, J.H., and R. Jagannathan (1994), “Ex-dividend price behavior of common stocks”, Review of Financial Studies 7(4):711–741.
Breen, W.J., L.S. Hodrick and R.A. Korajczyk (2000), “Predicting equity liquidity”, Working Paper (Kellogg Graduate School of Management, Northwestern University, Evanston, IL).
Brennan, M.J., and H.H. Cao (1997), “International portfolio investment flows”, Journal of Finance 52(5):1851–1880.
Brennan, M.J., and A. Subrahmanyam (1996), “Market microstructure and asset pricing: on the compensation for illiquidity in stock returns”, Journal of Financial Economics 41:441–464.
Brennan, M.J., T. Chordia and A. Subrahmanyam (1998), “Alternative factor specifications, security characteristics, and the cross-section of expected stock returns”, Journal of Financial Economics 49:345–373.
Brown, D.P., and R.H. Jennings (1989), “On technical analysis”, Review of Financial Studies 2:527–551.
Campbell, J., S. Grossman and J. Wang (1993), “Trading volume and serial correlation in stock returns”, Quarterly Journal of Economics 108:905–939.
Chalmers, J.M., and G.B. Kadlec (1998), “An empirical examination of the amortization spread”, Journal of Financial Economics 48:159–188.
Chen, N., and R. Kan (1996), “Expected returns and the bid–ask spread”, in: S. Saitov, K. Sawaki and K. Kubota, eds., Modern Portfolio Theory and Applications (Gakujutsu Shuppon Center, Osaka) pp. 65–80.
Chen, N., B. Grundy and R. Stambaugh (1990), “Changing risk, changing risk premiums, and dividend yield effects”, Journal of Business 63:S51–S70.
Chordia, T., and B. Swaminathan (2000), “Trading volume and cross-autocorrelations in stock returns”, Journal of Finance 55(2):913–935.
Chordia, T., R. Roll and A. Subrahmanyam (2000), “Commonality in liquidity”, Journal of Financial Economics 56:3–28.
Chordia, T., A. Sarkar and A. Subrahmanyam (2002), “Common determinants of bond and stock market liquidity: the impact of financial crises, monetary policy and mutual fund flows”, Working Paper (University of California, Los Angeles, CA).
Cochrane, J. (2001), Asset Pricing (Princeton University Press, Princeton, NJ).
Conrad, J., and G. Kaul (1988), “Time variation in expected returns”, Journal of Business 61:409–425.
Conrad, J., and G. Kaul (1998), “An anatomy of trading strategies”, Review of Financial Studies 11:489–519.
Conrad, J., G. Kaul and N. Nimalendran (1991), “Components of short-horizon individual security returns”, Journal of Financial Economics 28.
Conrad, J., A. Hameed and C. Niden (1994), “Volume and autocovariances in short-horizon individual security returns”, Journal of Finance 49(4):1305–1329.
Constantinides, G. (1986), “Capital market equilibrium with transaction costs”, Journal of Political Economy 94:842–862.
Cooper, S., J. Groth and W. Avera (1986), “Liquidity, exchange listing, and common stock performance”, Journal of Economics and Business 37:19–33.
Datar, V., N. Naik and R. Radcliffe (1998), “Liquidity and stock returns: an alternative test”, Journal of Financial Markets 1(2):203–219.
Diamond, D., and R. Verrecchia (1987), “Constraints on short-selling and asset price adjustments to private information”, Journal of Financial Economics 18:277–311.
Easley, D., and M. O’Hara (1987), “Price, trade size and information in securities markets”, Journal of Financial Economics 19:69–90.
Easley, D., and M. O’Hara (1992), “Time and the process of security price adjustment”, Journal of Finance 47:577–605.
Easley, D., and M. O’Hara (2001), “Information and the cost of capital”, Working Paper (Cornell University, Ithaca, NY).
Easley, D., N.M. Kiefer and M. O’Hara (1996), “Cream-skimming or profit-sharing? The curious role of purchased order flow”, Journal of Finance 51:811–833.
Easley, D., N.M. Kiefer, M. O’Hara and J.B. Paperman (1996), “Liquidity, information and infrequently traded stocks”, Journal of Finance 51(4):1405–1436.
Easley, D., N.M. Kiefer and M. O’Hara (1997a), “The information content of the trading process”, Journal of Empirical Finance 4:159–186.
Easley, D., N.M. Kiefer and M. O’Hara (1997b), “One day in the life of a very common stock”, Review of Financial Studies 10:805–835.
Easley, D., R. Engle, M. O’Hara and L. Wu (2000), “Time varying arrival rates of informed and uninformed trades”, Working Paper (New York University).
Easley, D., S. Hvidkjaer and M. O’Hara (2002), “Is information risk a determinant of asset returns?”, Journal of Finance 57(5):2185–2221.
Eleswarapu, V.R. (1997), “Cost of transacting and expected return in the Nasdaq market”, Journal of Finance 52(5):2113–2127.
Eleswarapu, V.R., and M.R. Reinganum (1993), “The seasonal behavior of liquidity premium in asset pricing”, Journal of Financial Economics 34:373–386.
Elton, E., and M. Gruber (1970), “Marginal stockholders tax rates and the clientele effect”, Review of Economics and Statistics 52:68–74.
Engle, R.F. (2000), “The econometrics of ultra-high frequency data”, Econometrica 68(1):1–22.
Engle, R.F., and J.R. Russell (1997), “Forecasting the frequency of changes in quoted foreign exchange prices with the autoregressive conditional duration model”, Journal of Empirical Finance 4:187–212.
Engle, R.F., and J.R. Russell (1998), “Autoregressive conditional duration: a new model for irregularly spaced transaction data”, Econometrica 66:1127–1162.
Fama, E.F. (1965), “The behavior of stock market prices”, Journal of Business 38:34–105.
Fama, E.F., and K.R. French (1992), “The cross-section of expected returns”, Journal of Finance 47(2):427–465.
Foster, F.D., and S. Viswanathan (1993), “Variations in trading volume, return volatility, and trading costs: evidence on recent price formation models”, Journal of Finance 48:187–211.
Frank, M., and R. Jagannathan (1998), “Why do stock prices drop by less than the value of the dividend? Evidence from a country without taxes”, Journal of Financial Economics 47:161–188.
French, K.R., and R. Roll (1986), “Stock return variances: the arrival of information and the reaction of traders”, Journal of Financial Economics 17:5–26.
Gallant, R., P.E. Rossi and G. Tauchen (1992), “Stock prices and volume”, Review of Financial Studies 5:871–908.
George, T., G. Kaul and M. Nimalendran (1991), “Estimation of the bid–ask spread and its components: a new approach”, Review of Financial Studies 4:623–656.
Gervais, S., R. Kaniel and D. Mingelgrin (1999), “The high volume return premium”, Working Paper (Wharton School, University of Pennsylvania, Philadelphia, PA).
Glosten, L. (1987), “Components of the bid–ask spread and the statistical properties of transaction prices”, Journal of Finance 42:1293–1307.
Glosten, L., and L. Harris (1988), “Estimating the components of the bid–ask spread”, Journal of Financial Economics 21:123–142.
Glosten, L., and P. Milgrom (1985), “Bid, ask and transaction prices in a specialist market with heterogeneously informed traders”, Journal of Financial Economics 14:71–100.
Goodhart, C., and M. O’Hara (1997), “High frequency data in financial markets: issues and applications”, Journal of Empirical Finance 4:73–114.
Grossman, S., and J. Stiglitz (1980), “On the impossibility of informationally efficient markets”, American Economic Review 70:393–408.
Grundy, B., and S. Martin (2001), “Understanding the nature of the risks and the source of the rewards to momentum trading”, Review of Financial Studies 14:29–78.
Grundy, B., and M. McNichols (1989), “Trade and the revelation of information through prices and direct disclosure”, Review of Financial Studies 2:495–526.
Harris, L. (1990), “Statistical properties of the Roll serial covariance bid/ask spread”, Journal of Finance 45:579–590.
Hasbrouck, J. (1988), “Trades, quotes, inventories and information”, Journal of Financial Economics 22:229–252.
Hasbrouck, J. (1991), “Measuring the information content of stock trades”, Journal of Finance 46(1):179–207.
Hasbrouck, J. (1993), “Assessing the quality of a security market: a new approach to measuring transactions costs”, Review of Financial Studies 6:191–212.
Hasbrouck, J. (1996), “Modeling microstructure time series”, in: G.S. Maddala and C.R. Rao, eds., The Handbook of Statistics, Vol. 14 (Elsevier, Amsterdam) pp. 647–691.
Hasbrouck, J. (1999), “Trading fast and trading slow: security market events in real time”, Working Paper (Stern School of Management, New York University). Hasbrouck, J. (2000), “Stalking the efficient price in market microstructure specifications: an overview”, Working Paper (New York University). Hasbrouck, J., and D. Seppi (2001), “Common factors in prices, order flows, and liquidity”, Journal of Financial Economics 59(2):383−411. Hasbrouck, J., and G. Sofianos (1993), “The trades of market makers: an empirical analysis of NYSE specialists”, Journal of Finance 48:1565−1593. He, H., and J. Wang (1995), “Differential information and dynamic behavior of stock trading volume”, Review of Financial Studies 8(4):919−972. Heaton, J., and D. Lucas (1996), “Evaluating the effects of incomplete markets on risk sharing and asset prices”, Journal of Political Economy 104:443−487. Holmstrom, B., and J. Tirole (2001), “LAPM: a liquidity-based asset pricing model”, Journal of Finance 56(3):1837−1867. Huang, M. (2001), “Liquidity shocks and equilibrium liquidity premia”, Working Paper (Stanford University, Stanford, CA). Huang, R., and H. Stoll (1994), “Market microstructure and stock return predictions”, Review of Financial Studies 7(1):179−213. Huang, R., and H. Stoll (1997), “The components of the bid–ask spread: a general approach”, The Review of Financial Studies 10(4):995−1034. Huberman, G., and D. Halka (2001), “Systematic liquidity”, Journal of Financial Research 24(2): 161−178. Hvidkjaer, S. (2000), “A trade-based analysis of momentum”, Working Paper (Johnson Graduate School of Management, Cornell University, Ithaca, NY). Jegadeesh, N., and S. Titman (1993), “Returns to buying winners and selling losers: implications for stock market efficiency”, Journal of Finance 48:65−91. Jegadeesh, N., and S. Titman (2002), “Cross-sectional and time-series determinants of momentum profits”, Review of Financial Studies 15:143−157. Jones, C., and S. 
Slezak (1999), “The theoretical implications of asymmetric information on the dynamic and cross-sectional characteristics of asset returns”, Working Paper (Columbia University, New York). Jones, C., G. Kaul and M. Lipson (1994), “Transactions, volume, and volatility”, Review of Financial Studies 7(4):631−651. Kadlec, G., and D. Patterson (1999), “A transactions data analysis of nonsynchronous trading”, Review of Financial Studies 12(3):609−630. Kamara, A. (1994), “Liquidity, taxes, and short-term treasury yields”, Journal of Financial and Quantitative Analysis 29:403−416. Karpoff, J. (1987), “The relation between price change and trading volume: a survey”, Journal of Financial and Quantitative Analysis 22:109−126. Keim, D., and A. Madhavan (1995), “Anatomy of the trading process: empirical evidence on the behavior of institutional traders”, Journal of Financial Economics 37:371−398. Kim, O., and R.E. Verrecchia (1991), “Market reactions to anticipated announcements”, Journal of Financial Economics 30:273−310. Kyle, A.S. (1985), “Continuous auctions and insider trading”, Econometrica 53:1315−1335. Lee, C., and B. Swaminathan (2000), “Price momentum and trading volume”, Journal of Finance 55:2017−2069. Lintner, J. (1969), “The aggregation of investor’s diverse judgments and preferences in purely competitive security markets”, Journal of Financial and Quantitative Analysis 4:347−400. Llorente, G., R. Michaely, G. Saar and J. Wang (2002), “Dynamic volume-return relation of individual stocks”, Review of Financial Studies 15:1005−1047. Lo, A., and C. MacKinlay (1988), “Stock market prices do not follow random walks: evidence from a simple specification test”, Review of Financial Studies 1:41−66.
Ch. 17:
Microstructure and Asset Pricing
Lo, A., and C. MacKinlay (1990a), “An econometric analysis of nonsynchronous trading”, Journal of Econometrics 45:181−211. Lo, A., and C. MacKinlay (1990b), “When are contrarian profits due to stock market over-reactions?”, Review of Financial Studies 3:175−208. Lyons, R. (1998), “Profits and position control: a week of FX dealing”, Journal of International Money and Finance 17:97−115. Lyons, R. (2001), The Microstructure Approach to Exchange Rates (MIT Press, Cambridge, MA). Madhavan, A. (2000), “Market microstructure: a survey”, Journal of Financial Markets 3:205−258. Madhavan, A., and S. Smidt (1993), “An analysis of daily changes in specialist inventories and quotations”, Journal of Finance 48:1595−1628. Madhavan, A., M. Richardson and M. Roomans (1997), “Why do security prices change? A transaction-level analysis of NYSE stocks”, Review of Financial Studies 10:1035−1064. Manaster, S., and S. Mann (1996), “Life in the pits: competitive market making and inventory control”, Review of Financial Studies 9:953−976. Mech, T. (1993), “Portfolio return autocorrelation”, Journal of Financial Economics 34:307−344. Merton, R.C. (1987), “A simple model of capital market equilibrium with incomplete information”, Journal of Finance 42:483−510. Miller, M., and M. Scholes (1982), “Dividends and taxes: empirical evidence”, Journal of Political Economy 90:1118−1141. Muscarella, C.J., and M.S. Piwowar (2001), “Market microstructure and securities values: evidence from the Paris Bourse”, Journal of Financial Markets 4(3):209−230. O’Hara, M. (1995), Market Microstructure Theory (Basil Blackwell, Cambridge, MA). O’Hara, M. (2003), “Liquidity and price discovery”, Journal of Finance, forthcoming. Pastor, L., and R. Stambaugh (2001), “Liquidity risk and expected stock returns”, Working Paper (Wharton School, University of Pennsylvania, Philadelphia, PA). Roll, R. (1984), “A simple implicit measure of the effective bid–ask spread in an efficient market”, Journal of Finance 39:1127−1139.
Saar, G. (2001), “Price impact of block trades: an institutional trading explanation”, Review of Financial Studies 14:1153−1181. Shapiro, A. (2002), “The investor recognition hypothesis in a dynamic general equilibrium: theory and evidence”, Review of Financial Studies 15:97−141. Shleifer, A., and R. Vishny (1997), “The limits to arbitrage”, Journal of Finance 52:35−55. Sias, R., L.T. Starks and S.M. Tinic (2001), “Is noise trader risk priced?”, Journal of Financial Research 24(3):311−329. Stickel, S.E., and R.E. Verrecchia (1994), “Evidence that volume sustains price changes”, Financial Analysts Journal (November–December), pp. 57–67. Stoll, H. (1989), “Inferring the components of the bid–ask spread”, Journal of Finance 44:115−134. Stoll, H., and R.E. Whaley (1990), “Stock market structure and volatility”, Review of Financial Studies 3:37−71. Subrahmanyam, A. (1991), “A theory of trading in stock index futures”, Review of Financial Studies 4(1):17−52. Vayanos, D. (1998), “Transactions costs and asset prices: a dynamic equilibrium model”, Review of Financial Studies 11:1−58. Vayanos, D., and J.-L. Vila (1999), “Equilibrium interest rates and liquidity premium with transactions costs”, Economic Theory 13:509−539. Wang, J. (1993), “A model of inter-temporal asset prices under asymmetric information”, Review of Economic Studies 60:249−282.
Chapter 18
A SURVEY OF BEHAVIORAL FINANCE °
NICHOLAS BARBERIS
University of Chicago
RICHARD THALER
University of Chicago
Contents
Abstract
Keywords
1. Introduction
2. Limits to arbitrage
  2.1. Market efficiency
  2.2. Theory
  2.3. Evidence
    2.3.1. Twin shares
    2.3.2. Index inclusions
    2.3.3. Internet carve-outs
3. Psychology
  3.1. Beliefs
  3.2. Preferences
    3.2.1. Prospect theory
    3.2.2. Ambiguity aversion
4. Application: The aggregate stock market
  4.1. The equity premium puzzle
    4.1.1. Prospect theory
    4.1.2. Ambiguity aversion
  4.2. The volatility puzzle
    4.2.1. Beliefs
    4.2.2. Preferences
5. Application: The cross-section of average returns
  5.1. Belief-based models
° We are very grateful to Markus Brunnermeier, George Constantinides, Kent Daniel, Milt Harris, Ming Huang, Owen Lamont, Jay Ritter, Andrei Shleifer, Jeremy Stein and Tuomo Vuolteenaho for extensive comments.
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved
N. Barberis and R. Thaler
  5.2. Belief-based models with institutional frictions
  5.3. Preferences
6. Application: Closed-end funds and comovement
  6.1. Closed-end funds
  6.2. Comovement
7. Application: Investor behavior
  7.1. Insufficient diversification
  7.2. Naive diversification
  7.3. Excessive trading
  7.4. The selling decision
  7.5. The buying decision
8. Application: Corporate finance
  8.1. Security issuance, capital structure and investment
  8.2. Dividends
  8.3. Models of managerial irrationality
9. Conclusion
Appendix A
References
Abstract

Behavioral finance argues that some financial phenomena can plausibly be understood using models in which some agents are not fully rational. The field has two building blocks: limits to arbitrage, which argues that it can be difficult for rational traders to undo the dislocations caused by less rational traders; and psychology, which catalogues the kinds of deviations from full rationality we might expect to see. We discuss these two topics, and then present a number of behavioral finance applications: to the aggregate stock market, to the cross-section of average returns, to individual trading behavior, and to corporate finance. We close by assessing progress in the field and speculating about its future course.

Keywords

behavioral finance, market efficiency, prospect theory, limits to arbitrage, investor psychology, investor behavior

JEL classification: G11, G12, G30
Ch. 18:
A Survey of Behavioral Finance
1. Introduction

The traditional finance paradigm, which underlies many of the other articles in this handbook, seeks to understand financial markets using models in which agents are “rational”. Rationality means two things. First, when they receive new information, agents update their beliefs correctly, in the manner described by Bayes’ law. Second, given their beliefs, agents make choices that are normatively acceptable, in the sense that they are consistent with Savage’s notion of Subjective Expected Utility (SEU).

This traditional framework is appealingly simple, and it would be very satisfying if its predictions were confirmed in the data. Unfortunately, after years of effort, it has become clear that basic facts about the aggregate stock market, the cross-section of average returns and individual trading behavior are not easily understood in this framework.

Behavioral finance is a new approach to financial markets that has emerged, at least in part, in response to the difficulties faced by the traditional paradigm. In broad terms, it argues that some financial phenomena can be better understood using models in which some agents are not fully rational. More specifically, it analyzes what happens when we relax one, or both, of the two tenets that underlie individual rationality. In some behavioral finance models, agents fail to update their beliefs correctly. In other models, agents apply Bayes’ law properly but make choices that are normatively questionable, in that they are incompatible with SEU.¹

This review essay evaluates recent work in this rapidly growing field. In Section 2, we consider the classic objection to behavioral finance, namely that even if some agents in the economy are less than fully rational, rational agents will prevent them from influencing security prices for very long, through a process known as arbitrage. One of the biggest successes of behavioral finance is a series of theoretical papers showing that in an economy where rational and irrational traders interact, irrationality can have a substantial and long-lived impact on prices. These papers, known as the literature on “limits to arbitrage”, form one of the two building blocks of behavioral finance.

¹ It is important to note that most models of asset pricing use the Rational Expectations Equilibrium framework (REE), which assumes not only individual rationality but also consistent beliefs [Sargent (1993)]. Consistent beliefs means that agents’ beliefs are correct: the subjective distribution they use to forecast future realizations of unknown variables is indeed the distribution that those realizations are drawn from. This requires not only that agents process new information correctly, but that they have enough information about the structure of the economy to be able to figure out the correct distribution for the variables of interest. Behavioral finance departs from REE by relaxing the assumption of individual rationality. An alternative departure is to retain individual rationality but to relax the consistent beliefs assumption: while investors apply Bayes’ law correctly, they lack the information required to know the actual distribution variables are drawn from. This line of research is sometimes referred to as the literature on bounded rationality, or on structural uncertainty. For example, a model in which investors do not know the growth rate of an asset’s cash flows but learn it as best as they can from available data would fall into this class. Although the literature we discuss also uses the term bounded rationality, the approach is quite different.
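As a rough illustration of the structural-uncertainty case described in the footnote, the sketch below shows an investor who applies Bayes’ law correctly while learning an unknown mean growth rate from noisy observations. The normal prior, the known noise variance, the function name `update_growth_belief` and all of the numbers are assumptions made for this illustration; they are not taken from any model discussed in this survey.

```python
import math


def update_growth_belief(prior_mean, prior_var, observations, noise_var):
    """Conjugate normal update for an unknown mean growth rate.

    Assumes (hypothetically) that each observed growth rate equals the true
    mean plus independent normal noise with known variance `noise_var`, and
    that the investor's prior over the mean is normal.
    """
    n = len(observations)
    # Precisions (inverse variances) add under Bayes' law in this setting.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


# Hypothetical numbers: prior belief of 2% growth (sd 2%), noisy data (sd 10%).
observed = [0.08, -0.01, 0.05, 0.04]
mean, var = update_growth_belief(0.02, 0.02 ** 2, observed, 0.10 ** 2)
print(f"posterior mean = {mean:.4f}, posterior sd = {math.sqrt(var):.4f}")
```

The investor in this sketch is individually rational in the sense used above; what is relaxed is only the assumption that the true distribution is known at the outset.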
To make sharp predictions, behavioral models often need to specify the form of agents’ irrationality. How exactly do people misapply Bayes’ law or deviate from SEU? For guidance on this, behavioral economists typically turn to the extensive experimental evidence compiled by cognitive psychologists on the biases that arise when people form beliefs, and on people’s preferences, or on how they make decisions, given their beliefs. Psychology is therefore the second building block of behavioral finance, and we review the psychology most relevant for financial economists in Section 3.²

In Sections 4–8, we consider specific applications of behavioral finance: to understanding the aggregate stock market, the cross-section of average returns, and the pricing of closed-end funds in Sections 4, 5 and 6 respectively; to understanding how particular groups of investors choose their portfolios and trade over time in Section 7; and to understanding the financing and investment decisions of firms in Section 8. Section 9 takes stock and suggests directions for future research.³
² The idea, now widely adopted, that behavioral finance rests on the two pillars of limits to arbitrage and investor psychology is originally due to Shleifer and Summers (1990).

³ We draw readers’ attention to two other recent surveys of behavioral finance. Shleifer (2000) provides a particularly detailed discussion of the theoretical and empirical work on limits to arbitrage, which we summarize in Section 2. Hirshleifer’s (2001) survey is closer to ours in terms of material covered, although we devote less space to asset pricing, and more to corporate finance and individual investor behavior. We also organize the material somewhat differently.

2. Limits to arbitrage

2.1. Market efficiency

In the traditional framework where agents are rational and there are no frictions, a security’s price equals its “fundamental value”. This is the discounted sum of expected future cash flows, where in forming expectations, investors correctly process all available information, and where the discount rate is consistent with a normatively acceptable preference specification. The hypothesis that actual prices reflect fundamental values is the Efficient Markets Hypothesis (EMH). Put simply, under this hypothesis, “prices are right”, in that they are set by agents who understand Bayes’ law and have sensible preferences. In an efficient market, there is “no free lunch”: no investment strategy can earn excess risk-adjusted average returns, or average returns greater than are warranted for its risk.

Behavioral finance argues that some features of asset prices are most plausibly interpreted as deviations from fundamental value, and that these deviations are brought about by the presence of traders who are not fully rational. A long-standing objection to this view that goes back to Friedman (1953) is that rational traders will quickly undo any dislocations caused by irrational traders. To illustrate the argument, suppose that the fundamental value of a share of Ford is $20. Imagine that a group of irrational traders becomes excessively pessimistic about Ford’s future prospects and through its selling, pushes the price to $15. Defenders of the EMH argue that rational traders, sensing an attractive opportunity, will buy the security at its bargain price and at the same time, hedge their bet by shorting a “substitute” security, such as General Motors, that has similar cash flows to Ford in future states of the world. The buying pressure on Ford shares will then bring their price back to fundamental value.

Friedman’s line of argument is initially compelling, but it has not survived careful theoretical scrutiny. In essence, it is based on two assertions. First, as soon as there is a deviation from fundamental value – in short, a mispricing – an attractive investment opportunity is created. Second, rational traders will immediately snap up the opportunity, thereby correcting the mispricing. Behavioral finance does not take issue with the second step in this argument: when attractive investment opportunities come to light, it is hard to believe that they are not quickly exploited. Rather, it disputes the first step. The argument, which we elaborate on in Sections 2.2 and 2.3, is that even when an asset is wildly mispriced, strategies designed to correct the mispricing can be both risky and costly, rendering them unattractive. As a result, the mispricing can remain unchallenged.

It is interesting to think about common finance terminology in this light. While irrational traders are often known as “noise traders”, rational traders are typically referred to as “arbitrageurs”. Strictly speaking, an arbitrage is an investment strategy that offers riskless profits at no cost. Presumably, the rational traders in Friedman’s fable became known as arbitrageurs because of the belief that a mispriced asset immediately creates an opportunity for riskless profits.
Behavioral finance argues that this is not true: the strategies that Friedman would have his rational traders adopt are not necessarily arbitrages; quite often, they are very risky.

An immediate corollary of this line of thinking is that “prices are right” and “there is no free lunch” are not equivalent statements. While both are true in an efficient market, “no free lunch” can also be true in an inefficient market: just because prices are away from fundamental value does not necessarily mean that there are any excess risk-adjusted average returns for the taking. In other words,

“prices are right” ⇒ “no free lunch”

but

“no free lunch” ⇏ “prices are right”.

This distinction is important for evaluating the ongoing debate on market efficiency. First, many researchers still point to the inability of professional money managers to beat the market as strong evidence of market efficiency [Rubinstein (2001), Ross (2001)]. Underlying this argument, though, is the assumption that “no free lunch” implies “prices are right.” If, as we argue in Sections 2.2 and 2.3, this link is broken, the
performance of money managers tells us little about whether prices reflect fundamental value. Second, while some researchers accept that there is a distinction between “prices are right” and “there is no free lunch”, they believe that the debate should be more about the latter statement than about the former. We disagree with this emphasis. As economists, our ultimate concern is that capital be allocated to the most promising investment opportunities. Whether this is true or not depends much more on whether prices are right than on whether there are any free lunches for the taking.

2.2. Theory

In the previous section, we emphasized the idea that when a mispricing occurs, strategies designed to correct it can be both risky and costly, thereby allowing the mispricing to survive. Here we discuss some of the risks and costs that have been identified. In our discussion, we return to the example of Ford, whose fundamental value is $20, but which has been pushed down to $15 by pessimistic noise traders.

Fundamental risk. The most obvious risk an arbitrageur faces if he buys Ford’s stock at $15 is that a piece of bad news about Ford’s fundamental value causes the stock to fall further, leading to losses. Of course, arbitrageurs are well aware of this risk, which is why they short a substitute security such as General Motors at the same time that they buy Ford. The problem is that substitute securities are rarely perfect, and often highly imperfect, making it impossible to remove all the fundamental risk. Shorting General Motors protects the arbitrageur somewhat from adverse news about the car industry as a whole, but still leaves him vulnerable to news that is specific to Ford – news about defective tires, say.⁴

Noise trader risk. Noise trader risk, an idea introduced by De Long et al. (1990a) and studied further by Shleifer and Vishny (1997), is the risk that the mispricing being exploited by the arbitrageur worsens in the short run.
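The two risks introduced so far can be made concrete with a small numerical sketch of the hedged Ford position. Everything in it is a stylized assumption for illustration: the names `Scenario` and `hedged_pnl`, the one-for-one response of both shares to industry news, the premise that General Motors itself is correctly priced, and the dollar amounts in each scenario.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical per-share price changes, in dollars, over one period."""
    industry_news: float    # news moving the fundamental value of both Ford and GM
    ford_news: float        # news specific to Ford's fundamental value
    sentiment_shift: float  # change in the noise-trader mispricing of Ford


def hedged_pnl(s: Scenario, hedge_ratio: float = 1.0) -> float:
    """Profit per share of being long one Ford share and short `hedge_ratio` GM shares."""
    ford_move = s.industry_news + s.ford_news + s.sentiment_shift
    gm_move = s.industry_news  # assumption: GM is exposed to industry news only
    return ford_move - hedge_ratio * gm_move


# Ford starts at $15 against a fundamental value of $20, as in the example in the text.
scenarios = {
    "bad industry news (hedged away)": Scenario(-3.0, 0.0, 0.0),
    "bad Ford-specific news (fundamental risk)": Scenario(0.0, -2.0, 0.0),
    "pessimism deepens (noise trader risk)": Scenario(0.0, 0.0, -2.5),
    "mispricing closes": Scenario(0.0, 0.0, 5.0),
}
for name, s in scenarios.items():
    print(f"{name}: P&L per share = {hedged_pnl(s):+.2f}")
```

Under these assumptions only the first scenario is neutralized by the short position; the second and third produce losses even though the trade is on the right side of the mispricing.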
Even if General Motors is a perfect substitute security for Ford, the arbitrageur still faces the risk that the pessimistic investors causing Ford to be undervalued in the first place become even more pessimistic, lowering its price even further. Once one has granted the possibility that a security’s price can be different from its fundamental value, then one must also grant the possibility that future price movements will increase the divergence.

⁴ Another problem is that even if a substitute security exists, it may itself be mispriced. This can happen in situations involving industry-wide mispricing: in that case, the only stocks with similar future cash flows to the mispriced one are themselves mispriced.

Noise trader risk matters because it can force arbitrageurs to liquidate their positions early, bringing them potentially steep losses. To see this, note that most real-world arbitrageurs – in other words, professional portfolio managers – are not managing their own money, but rather managing money for other people. In the words of Shleifer and Vishny (1997), there is “a separation of brains and capital”. This agency feature has important consequences. Investors, lacking the specialized knowledge to evaluate the arbitrageur’s strategy, may simply evaluate him based on his returns. If a mispricing that the arbitrageur is trying to exploit worsens in the short run, generating negative returns, investors may decide that he is incompetent, and withdraw their funds. If this happens, the arbitrageur will be forced to liquidate his position prematurely. Fear of such premature liquidation makes him less aggressive in combating the mispricing in the first place.

These problems can be severely exacerbated by creditors. After poor short-term returns, creditors, seeing the value of their collateral erode, will call their loans, again triggering premature liquidation.

In these scenarios, the forced liquidation is brought about by the worsening of the mispricing itself. This need not always be the case. For example, in their efforts to remove fundamental risk, many arbitrageurs sell securities short. Should the original owner of the borrowed security want it back, the arbitrageur may again be forced to close out his position if he cannot find other shares to borrow. The risk that this occurs during a temporary worsening of the mispricing makes the arbitrageur more cautious from the start.

Implementation costs. Well-understood transaction costs such as commissions, bid–ask spreads and price impact can make it less attractive to exploit a mispricing. Since shorting is often essential to the arbitrage process, we also include short-sale constraints in the implementation costs category. These refer to anything that makes it less attractive to establish a short position than a long one. The simplest such constraint is the fee charged for borrowing a stock.
In general these fees are small – D’Avolio (2002) finds that for most stocks, they range between 10 and 15 basis points – but they can be much larger; in some cases, arbitrageurs may not be able to find shares to borrow at any price. Other than the fees themselves, there can be legal constraints: for a large fraction of money managers – many pension fund and mutual fund managers in particular – short-selling is simply not allowed.⁵

⁵ The presence of per-period transaction costs like lending fees can expose arbitrageurs to another kind of risk, horizon risk, which is the risk that the mispricing takes so long to close that any profits are swamped by the accumulated transaction costs. This applies even when the arbitrageur is certain that no outside party will force him to liquidate early. Abreu and Brunnermeier (2002) study a particular type of horizon risk, which they label synchronization risk. Suppose that the elimination of a mispricing requires the participation of a sufficiently large number of separate arbitrageurs. Then in the presence of per-period transaction costs, arbitrageurs may hesitate to exploit the mispricing because they don’t know how many other arbitrageurs have heard about the opportunity, and therefore how long they will have to wait before prices revert to correct values.

We also include in this category the cost of finding and learning about a mispricing, as well as the cost of the resources needed to exploit it [Merton (1987)]. Finding mispricing, in particular, can be a tricky matter. It was once thought that if noise traders influenced stock prices to any substantial degree, their actions would quickly show up in the form of predictability in returns. Shiller (1984) and Summers (1986) demonstrate that this argument is completely erroneous, with Shiller (1984) calling it “one of the most remarkable errors in the history of economic thought”. They show that even if noise trader demand is so strong as to cause a large and persistent mispricing, it may generate so little predictability in returns as to be virtually undetectable.

In contrast, then, to straightforward-sounding textbook arbitrage, real world arbitrage entails both costs and risks, which under some conditions will limit arbitrage and allow deviations from fundamental value to persist. To see what these conditions are, consider two cases.

Suppose first that the mispriced security does not have a close substitute. By definition then, the arbitrageur is exposed to fundamental risk. In this case, sufficient conditions for arbitrage to be limited are (i) that arbitrageurs are risk averse and (ii) that the fundamental risk is systematic, in that it cannot be diversified by taking many such positions. Condition (i) ensures that the mispricing will not be wiped out by a single arbitrageur taking a large position in the mispriced security. Condition (ii) ensures that the mispricing will not be wiped out by a large number of investors each adding a small position in the mispriced security to their current holdings. The presence of noise trader risk or implementation costs will only limit arbitrage further.

Even if a perfect substitute does exist, arbitrage can still be limited. The existence of the substitute security immunizes the arbitrageur from fundamental risk. We can go further and assume that there are no implementation costs, so that only noise trader risk remains. De Long et al.
(1990a) show that noise trader risk is powerful enough, that even with this single form of risk, arbitrage can sometimes be limited. The sufficient conditions are similar to those above, with one important difference. Here arbitrage will be limited if: (i) arbitrageurs are risk averse and have short horizons and (ii) the noise trader risk is systematic. As before, condition (i) ensures that the mispricing cannot be wiped out by a single, large arbitrageur, while condition (ii) prevents a large number of small investors from exploiting the mispricing. The central contribution of Shleifer and Vishny (1997) is to point out the real world relevance of condition (i): the possibility of an early, forced liquidation means that many arbitrageurs effectively have short horizons. In the presence of certain implementation costs, condition (ii) may not even be necessary. If it is costly to learn about a mispricing, or the resources required to exploit it are expensive, that may be enough to explain why a large number of different individuals do not intervene in an attempt to correct the mispricing. It is also important to note that for particular types of noise trading, arbitrageurs may prefer to trade in the same direction as the noise traders, thereby exacerbating the mispricing, rather than against them. For example, De Long et al. (1990b)
consider an economy with positive feedback traders, who buy more of an asset this period if it performed well last period. If these noise traders push an asset’s price above fundamental value, arbitrageurs do not sell or short the asset. Rather, they buy it, knowing that the earlier price rise will attract more feedback traders next period, leading to still higher prices, at which point the arbitrageurs can exit at a profit.

So far, we have argued that it is not easy for arbitrageurs like hedge funds to exploit market inefficiencies. However, hedge funds are not the only market participants trying to take advantage of noise traders: firm managers also play this game. If a manager believes that investors are overvaluing his firm’s shares, he can benefit the firm’s existing shareholders by issuing extra shares at attractive prices. The extra supply this generates could potentially push prices back to fundamental value.

Unfortunately, this game entails risks and costs for managers, just as it does for hedge funds. Issuing shares is an expensive process, both in terms of underwriting fees and time spent by company management. Moreover, the manager can rarely be sure that investors are overvaluing his firm’s shares. If he issues shares, thinking that they are overvalued when in fact they are not, he incurs the costs of deviating from his target capital structure, without getting any benefits in return.

2.3. Evidence

From the theoretical point of view, there is reason to believe that arbitrage is a risky process and therefore that it is only of limited effectiveness. But is there any evidence that arbitrage is limited? In principle, any example of persistent mispricing is immediate evidence of limited arbitrage: if arbitrage were not limited, the mispricing would quickly disappear.
The problem is that while many pricing phenomena can be interpreted as deviations from fundamental value, it is only in a few cases that the presence of a mispricing can be established beyond any reasonable doubt. The reason for this is what Fama (1970) dubbed the “joint hypothesis problem”. In order to claim that the price of a security differs from its properly discounted future cash flows, one needs a model of “proper” discounting. Any test of mispricing is therefore inevitably a joint test of mispricing and of a model of discount rates, making it difficult to provide definitive evidence of inefficiency.

In spite of this difficulty, researchers have uncovered a number of financial market phenomena that are almost certainly mispricings, and persistent ones at that. These examples show that arbitrage is indeed limited, and also serve as interesting illustrations of the risks and costs described earlier.

2.3.1. Twin shares

In 1907, Royal Dutch and Shell Transport, at the time completely independent companies, agreed to merge their interests on a 60:40 basis while remaining separate entities. Shares of Royal Dutch, which are primarily traded in the USA and in the Netherlands, are a claim to 60% of the total cash flow of the two companies, while Shell, which trades primarily in the UK, is a claim to the remaining 40%. If prices equal fundamental value, the market value of Royal Dutch equity should always be 1.5 times the market value of Shell equity. Remarkably, it isn’t.

Figure 1, taken from Froot and Dabora’s (1999) analysis of this case, shows the ratio of Royal Dutch equity value to Shell equity value relative to the efficient markets benchmark of 1.5. The picture provides strong evidence of a persistent inefficiency. Moreover, the deviations are not small. Royal Dutch is sometimes 35% underpriced relative to parity, and sometimes 15% overpriced.

Fig. 1. Log deviations from Royal Dutch/Shell parity. Source: Froot and Dabora (1999).

This evidence of mispricing is simultaneously evidence of limited arbitrage, and it is not hard to see why arbitrage might be limited in this case. If an arbitrageur wanted to exploit this phenomenon – and several hedge funds, Long-Term Capital Management included, did try to – he would buy the relatively undervalued share and short the other. Table 1 summarizes the risks facing the arbitrageur.

Table 1
Arbitrage costs and risks that arise in exploiting mispricing

Example              Fundamental risk (FR)   Noise trader risk (NTR)   Implementation costs (IC)
Royal Dutch/Shell    ×                       √                         ×
Index Inclusions     √                       √                         ×
Palm/3Com            ×                       ×                         √

Since one share is a good substitute for the other, fundamental risk is nicely hedged: news about fundamentals should affect the two shares equally, leaving the arbitrageur immune. Nor are there
any major implementation costs to speak of: shorting shares of either company is an easy matter. The one risk that remains is noise trader risk. Whatever investor sentiment is causing one share to be undervalued relative to the other could also cause that share to become even more undervalued in the short term. The graph shows that this danger is very real: an arbitrageur buying a 10% undervalued Royal Dutch share in March 1983 would have seen it drop still further in value over the next six months. As discussed earlier, when a mispriced security has a perfect substitute, arbitrage can still be limited if (i) arbitrageurs are risk averse and have short horizons and (ii) the noise trader risk is systematic, or the arbitrage requires specialized skills, or there are costs to learning about such opportunities. It is very plausible that both (i) and (ii) are true, thereby explaining why the mispricing persisted for so long. It took until 2001 for the shares to finally sell at par.

This example also provides a nice illustration of the distinction between “prices are right” and “no free lunch” discussed in Section 2.1. While prices in this case are clearly not right, there are no easy profits for the taking.

2.3.2. Index inclusions

Every so often, one of the companies in the S&P 500 is taken out of the index because of a merger or bankruptcy, and is replaced by another firm. Two early studies of such index inclusions, Harris and Gurel (1986) and Shleifer (1986), document a remarkable fact: when a stock is added to the index, it jumps in price by an average of 3.5%, and much of this jump is permanent. In one dramatic illustration of this phenomenon, when Yahoo was added to the index, its shares jumped by 24% in a single day. The fact that a stock jumps in value upon inclusion is once again clear evidence of mispricing: the price of the share changes even though its fundamental value does not.
Standard and Poor’s emphasizes that in selecting stocks for inclusion, they are simply trying to make their index representative of the U.S. economy, not to convey any information about the level or riskiness of a firm’s future cash flows. [6] This example of a deviation from fundamental value is also evidence of limited arbitrage. When one thinks about the risks involved in trying to exploit the anomaly, its persistence becomes less surprising. An arbitrageur needs to short the included security and to buy as good a substitute security as he can. This entails considerable fundamental risk because individual stocks rarely have good substitutes. It also carries substantial noise trader risk: whatever caused the initial jump in price – in all likelihood, buying by S&P 500 index funds – may continue, and cause the price to rise still further in the short run; indeed, Yahoo went from $115 prior to its S&P inclusion announcement to $210 a month later.

Wurgler and Zhuravskaya (2002) provide additional support for the limited arbitrage view of S&P 500 inclusions. They hypothesize that the jump upon inclusion should be particularly large for those stocks with the worst substitute securities, in other words, for those stocks for which the arbitrage is riskiest. By constructing the best possible substitute portfolio for each included stock, they are able to test this, and find strong support. Their analysis also shows just how hard it is to find good substitute securities for individual stocks. For most regressions of included stock returns on the returns of the best substitute securities, the R² is below 25%.

[6] After the initial studies on index inclusions appeared, some researchers argued that the price increase might be rationally explained through information or liquidity effects. While such explanations cannot be completely ruled out, the case for mispricing was considerably strengthened by Kaul, Mehrotra and Morck (2000). They consider the case of the TS300 index of Canadian equities, which in 1996 changed the weights of some of its component stocks to meet an innocuous regulatory requirement. The reweighting was accompanied by significant price effects. Since the affected stocks were already in the index at the time of the event, information and liquidity explanations for the price jumps are extremely implausible.

2.3.3. Internet carve-outs

In March 2000, 3Com sold 5% of its wholly owned subsidiary Palm Inc. in an initial public offering, retaining ownership of the remaining 95%. After the IPO, a shareholder of 3Com indirectly owned 1.5 shares of Palm. 3Com also announced its intention to spin off the remainder of Palm within 9 months, at which time they would give each 3Com shareholder 1.5 shares of Palm. At the close of trading on the first day after the IPO, Palm shares stood at $95, putting a lower bound on the value of 3Com at $142. In fact, 3Com’s price was $81, implying a market valuation of 3Com’s substantial businesses outside of Palm of about −$60 per share! This situation surely represents a severe mispricing, and it persisted for several weeks. To exploit it, an arbitrageur could buy one share of 3Com, short 1.5 shares of Palm, and wait for the spin-off, thus earning certain profits at no cost.
This strategy entails no fundamental risk and no noise trader risk. Why, then, is arbitrage limited? Lamont and Thaler (2003), who analyze this case in detail, argue that implementation costs played a major role. Many investors who tried to borrow Palm shares to short were either told by their broker that no shares were available, or else were quoted a very high borrowing price. This barrier to shorting was not a legal one, but one that arose endogenously in the marketplace: such was the demand for shorting Palm, that the supply of Palm shorts was unable to meet it. Arbitrage was therefore limited, and the mispricing persisted. [7]

Some financial economists react to these examples by arguing that they are simply isolated instances with little broad relevance. [8] We think this is an overly complacent view. The “twin shares” example illustrates that in situations where arbitrageurs face only one type of risk – noise trader risk – securities can become mispriced by almost 35%. This suggests that if a typical stock trading on the NYSE or NASDAQ becomes subject to investor sentiment, the mispricing could be an order of magnitude larger. Not only would arbitrageurs face noise trader risk in trying to correct the mispricing, but fundamental risk as well, not to mention implementation costs.

[7] See also Mitchell, Pulvino and Stafford (2002) and Ofek and Richardson (2003) for further discussion of such “negative stub” situations, in which the market value of a company is less than the sum of its publicly traded parts.

[8] During a discussion of these issues at a University of Chicago seminar, one economist argued that these examples are “the tip of the iceberg”, to which another retorted that “they are the iceberg”.
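The arithmetic behind two of the examples in Section 2.3 can be made concrete. The following Python sketch is illustrative only: the function names and the equity values passed to `log_parity_deviation` are hypothetical, while the 1.5 parity ratio and the Palm/3Com figures ($95, $81, 1.5 Palm shares per 3Com share) are the ones quoted in the text.

```python
import math


def log_parity_deviation(royal_dutch_value, shell_value, parity_ratio=1.5):
    """Log deviation of Royal Dutch equity value from parity with Shell.

    Zero means the 60:40 split is priced exactly; a negative value means
    Royal Dutch is cheap relative to Shell.
    """
    return math.log(royal_dutch_value / (parity_ratio * shell_value))


def stub_value(parent_price, subsidiary_price, subsidiary_shares_per_parent):
    """Implied per-share value of the parent's businesses outside the subsidiary."""
    return parent_price - subsidiary_shares_per_parent * subsidiary_price


# Hypothetical equity values: Royal Dutch worth 1.35 times Shell instead of 1.5.
deviation = log_parity_deviation(135.0, 100.0)

# Figures quoted in the text for the first day after the Palm IPO.
stub = stub_value(parent_price=81.0, subsidiary_price=95.0,
                  subsidiary_shares_per_parent=1.5)

print(f"log deviation from parity: {deviation:.3f}")
print(f"implied 3Com stub value per share: {stub:.2f}")
```

With the quoted prices the stub comes out at 81 − 1.5 × 95 = −61.5, in line with the “about −$60 per share” in the text.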
3. Psychology

The theory of limited arbitrage shows that if irrational traders cause deviations from fundamental value, rational traders will often be powerless to do anything about it. In order to say more about the structure of these deviations, behavioral models often assume a specific form of irrationality. For guidance on this, economists turn to the extensive experimental evidence compiled by cognitive psychologists on the systematic biases that arise when people form beliefs, and on people’s preferences. [9]

In this section, we summarize the psychology that may be of particular interest to financial economists. Our discussion of each finding is necessarily brief. For a deeper understanding of the phenomena we touch on, we refer the reader to the surveys of Camerer (1995) and Rabin (1998) and to the edited volumes of Kahneman, Slovic and Tversky (1982), Kahneman and Tversky (2000) and Gilovich, Griffin and Kahneman (2002).

[9] We emphasize, however, that behavioral models do not need to make extensive psychological assumptions in order to generate testable predictions. In Section 6, we discuss Lee, Shleifer and Thaler’s (1991) theory of closed-end fund pricing. That theory makes numerous crisp predictions using only the assumptions that there are noise traders with correlated sentiment in the economy, and that arbitrage is limited.

3.1. Beliefs

A crucial component of any model of financial markets is a specification of how agents form expectations. We now summarize what psychologists have learned about how people appear to form beliefs in practice.

Overconfidence. Extensive evidence shows that people are overconfident in their judgments. This appears in two guises. First, the confidence intervals people assign to their estimates of quantities – the level of the Dow in a year, say – are far too narrow. Their 98% confidence intervals, for example, include the true quantity only about 60% of the time [Alpert and Raiffa (1982)]. Second, people are poorly calibrated when estimating probabilities: events they think are certain to occur actually occur only around 80% of the time, and events they deem impossible occur approximately 20% of the time [Fischhoff, Slovic and Lichtenstein (1977)]. [10]

[10] Overconfidence may in part stem from two other biases, self-attribution bias and hindsight bias. Self-attribution bias refers to people’s tendency to ascribe any success they have in some activity to their own talents, while blaming failure on bad luck, rather than on their ineptitude. Doing this repeatedly will lead people to the pleasing but erroneous conclusion that they are very talented. For example, investors might become overconfident after several quarters of investing success [Gervais and Odean (2001)]. Hindsight bias is the tendency of people to believe, after an event has occurred, that they predicted it before it happened. If people think they predicted the past better than they actually did, they may also believe that they can predict the future better than they actually can.

Optimism and wishful thinking. Most people display unrealistically rosy views of their abilities and prospects [Weinstein (1980)]. Typically, over 90% of those surveyed think they are above average in such domains as driving skill, ability to get along with people and sense of humor. They also display a systematic planning fallacy: they predict that tasks (such as writing survey papers) will be completed much sooner than they actually are [Buehler, Griffin and Ross (1994)].

Representativeness. Kahneman and Tversky (1974) show that when people try to determine the probability that a data set A was generated by a model B, or that an object A belongs to a class B, they often use the representativeness heuristic. This means that they evaluate the probability by the degree to which A reflects the essential characteristics of B. Much of the time, representativeness is a helpful heuristic, but it can generate some severe biases. The first is base rate neglect. To illustrate, Kahneman and Tversky present this description of a person named Linda:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

When asked which of “Linda is a bank teller” (statement A) and “Linda is a bank teller and is active in the feminist movement” (statement B) is more likely, subjects typically assign greater probability to B. This is, of course, impossible. Representativeness provides a simple explanation. The description of Linda sounds like the description of a feminist – it is representative of a feminist – leading subjects to pick B. Put differently, while Bayes law says that

$$
p(\text{statement B} \mid \text{description}) = \frac{p(\text{description} \mid \text{statement B})\; p(\text{statement B})}{p(\text{description})},
$$

people apply the law incorrectly, putting too much weight on p(description | statement B), which captures representativeness, and too little weight on the base rate, p(statement B).
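A small numerical sketch may help separate the two quantities in Bayes law for the Linda problem. Every number below is hypothetical and chosen only for illustration, as are the function names; the point is that statement B can be far more “representative” than statement A (a higher p(description | statement)) while still being less probable, because B is a special case of A.

```python
# Hypothetical prior beliefs over (is a bank teller, is a feminist).
prior = {
    (True, True): 0.01,
    (True, False): 0.04,
    (False, True): 0.19,
    (False, False): 0.76,
}

# Hypothetical likelihoods p(description | cell): the description fits feminists best.
likelihood = {
    (True, True): 0.20,
    (True, False): 0.01,
    (False, True): 0.30,
    (False, False): 0.02,
}


def statement_a(cell):
    """Linda is a bank teller."""
    return cell[0]


def statement_b(cell):
    """Linda is a bank teller and is active in the feminist movement."""
    return cell[0] and cell[1]


def representativeness(statement):
    """p(description | statement)."""
    base_rate = sum(prior[c] for c in prior if statement(c))
    joint = sum(prior[c] * likelihood[c] for c in prior if statement(c))
    return joint / base_rate


def posterior(statement):
    """p(statement | description), computed with Bayes law."""
    evidence = sum(prior[c] * likelihood[c] for c in prior)
    joint = sum(prior[c] * likelihood[c] for c in prior if statement(c))
    return joint / evidence


for name, statement in (("A", statement_a), ("B", statement_b)):
    print(f"statement {name}: representativeness = {representativeness(statement):.3f}, "
          f"posterior = {posterior(statement):.3f}")
```

Under these assumed numbers B scores higher on representativeness, yet its posterior cannot exceed that of A, whatever values are used.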
Representativeness also leads to another bias, sample size neglect. When judging the likelihood that a data set was generated by a particular model, people often fail to take the size of the sample into account: after all, a small sample can be just as representative as a large one. Six tosses of a coin resulting in three heads and three tails are as representative of a fair coin as 500 heads and 500 tails are in a total of 1000 tosses. Representativeness implies that people will find the two sets of tosses equally informative about the fairness of the coin, even though the second set is much more so. Sample size neglect means that in cases where people do not initially know the data-generating process, they will tend to infer it too quickly on the basis of too few data points. For instance, they will come to believe that a financial analyst with four good stock picks is talented because four successes are not representative of a bad or mediocre analyst. It also generates a “hot hand” phenomenon, whereby sports fans become convinced that a basketball player who has made three shots in a row is on a hot streak and will score again, even though there is no evidence of a hot hand in the data [Gilovich, Vallone and Tversky (1985)]. This belief that even small samples will reflect the properties of the parent population is sometimes known as the “law of small numbers” [Rabin (2002)]. In situations where people do know the data-generating process in advance, the law of small numbers leads to a gambler’s fallacy effect. If a fair coin generates five heads in a row, people will say that “tails are due”. Since they believe that even a short sample should be representative of the fair coin, there have to be more tails to balance out the large number of heads. Conservatism. While representativeness leads to an underweighting of base rates, there are situations where base rates are over-emphasized relative to sample evidence. 
In an experiment run by Edwards (1968), there are two urns, one containing 3 blue balls and 7 red ones, and the other containing 7 blue balls and 3 red ones. A random draw of 12 balls, with replacement, from one of the urns yields 8 reds and 4 blues. What is the probability the draw was made from the first urn? While the correct answer is 0.97, most people estimate a number around 0.7, apparently overweighting the base rate of 0.5.

At first sight, the evidence of conservatism appears at odds with representativeness. However, there may be a natural way in which they fit together. It appears that if a data sample is representative of an underlying model, then people overweight the data. However, if the data is not representative of any salient model, people react too little to the data and rely too much on their priors. In Edwards’ experiment, the draw of 8 red and 4 blue balls is not particularly representative of either urn, possibly leading to an overreliance on prior information. [11]

[11] Mullainathan (2001) presents a formal model that neatly reconciles the evidence on underweighting sample information with the evidence on overweighting sample information.
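The Bayesian benchmarks behind the conservatism and sample size neglect examples can be checked directly. The sketch below is a minimal illustration, not part of the original experiments: the function name is hypothetical, and the 60%-heads alternative used for the coin example is an assumption introduced here, since the text does not specify the alternative to a fair coin.

```python
import math


def posterior_model_1(successes, trials, p1, p2, prior_1=0.5):
    """Posterior probability of model 1 after `successes` in `trials` draws.

    Model 1 gives each draw success probability p1, model 2 gives p2.
    The binomial coefficient cancels, so only the log-likelihood ratio matters.
    """
    failures = trials - successes
    log_lr = (successes * math.log(p1 / p2)
              + failures * math.log((1 - p1) / (1 - p2)))
    odds = prior_1 / (1 - prior_1) * math.exp(log_lr)
    return odds / (1 + odds)


# Edwards (1968): the first urn is 70% red, the second 30% red; 8 reds in 12 draws.
edwards = posterior_model_1(successes=8, trials=12, p1=0.7, p2=0.3)

# Sample size neglect: fair coin (model 1) versus an assumed 60%-heads coin.
small_sample = posterior_model_1(successes=3, trials=6, p1=0.5, p2=0.6)
large_sample = posterior_model_1(successes=500, trials=1000, p1=0.5, p2=0.6)

print(f"Edwards urn posterior: {edwards:.3f}")
print(f"P(fair | 3 heads in 6 tosses): {small_sample:.3f}")
print(f"P(fair | 500 heads in 1000 tosses): {large_sample:.6f}")
```

The first number reproduces the 0.97 quoted in the text. The other two show why the two coin samples are not equally informative: against the assumed alternative, six tosses barely move the posterior away from 0.5, while a thousand tosses settle the question almost completely.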
Belief perseverance. There is much evidence that once people have formed an opinion, they cling to it too tightly and for too long [Lord, Ross and Lepper (1979)]. At least two effects appear to be at work. First, people are reluctant to search for evidence that contradicts their beliefs. Second, even if they find such evidence, they treat it with excessive skepticism. Some studies have found an even stronger effect, known as confirmation bias, whereby people misinterpret evidence that goes against their hypothesis as actually being in their favor. In the context of academic finance, belief perseverance predicts that if people start out believing in the Efficient Markets Hypothesis, they may continue to believe in it long after compelling evidence to the contrary has emerged. Anchoring. Kahneman and Tversky (1974) argue that when forming estimates, people often start with some initial, possibly arbitrary value, and then adjust away from it. Experimental evidence shows that the adjustment is often insufficient. Put differently, people “anchor” too much on the initial value. In one experiment, subjects were asked to estimate the percentage of United Nations’ countries that are African. More specifically, before giving a percentage, they were asked whether their guess was higher or lower than a randomly generated number between 0 and 100. Their subsequent estimates were significantly affected by the initial random number. Those who were asked to compare their estimate to 10, subsequently estimated 25%, while those who compared to 60, estimated 45%. Availability biases. When judging the probability of an event – the likelihood of getting mugged in Chicago, say – people often search their memories for relevant information. While this is a perfectly sensible procedure, it can produce biased estimates because not all memories are equally retrievable or “available”, in the language of Kahneman and Tversky (1974). 
More recent events and more salient events – the mugging of a close friend, say – will weigh more heavily and distort the estimate. Economists are sometimes wary of this body of experimental evidence because they believe (i) that people, through repetition, will learn their way out of biases; (ii) that experts in a field, such as traders in an investment bank, will make fewer errors; and (iii) that with more powerful incentives, the effects will disappear. While all these factors can attenuate biases to some extent, there is little evidence that they wipe them out altogether. The effect of learning is often muted by errors of application: when the bias is explained, people often understand it, but then immediately proceed to violate it again in specific applications. Expertise, too, is often a hindrance rather than a help: experts, armed with their sophisticated models, have been found to exhibit more overconfidence than laymen, particularly when they receive only limited feedback about their predictions. Finally, in a review of dozens of studies on the topic, Camerer and Hogarth (1999, p. 7) conclude that while incentives can
sometimes reduce the biases people display, “no replicated study has made rationality violations disappear purely by raising incentives”.

3.2. Preferences

3.2.1. Prospect theory

An essential ingredient of any model trying to understand asset prices or trading behavior is an assumption about investor preferences, or about how investors evaluate risky gambles. The vast majority of models assume that investors evaluate gambles according to the expected utility framework, EU henceforth. The theoretical motivation for this goes back to Von Neumann and Morgenstern (1944), VNM henceforth, who show that if preferences satisfy a number of plausible axioms – completeness, transitivity, continuity, and independence – then they can be represented by the expectation of a utility function.

Unfortunately, experimental work in the decades after VNM has shown that people systematically violate EU theory when choosing among risky gambles. In response to this, there has been an explosion of work on so-called non-EU theories, all of them trying to do a better job of matching the experimental evidence. Some of the better known models include weighted-utility theory [Chew and MacCrimmon (1979), Chew (1983)], implicit EU [Chew (1989), Dekel (1986)], disappointment aversion [Gul (1991)], regret theory [Bell (1982), Loomes and Sugden (1982)], rank-dependent utility theories [Quiggin (1982), Segal (1987, 1989), Yaari (1987)], and prospect theory [Kahneman and Tversky (1979), Tversky and Kahneman (1992)].

Should financial economists be interested in any of these alternatives to expected utility? It may be that EU theory is a good approximation to how people evaluate a risky gamble like the stock market, even if it does not explain attitudes to the kinds of gambles studied in experimental settings.
On the other hand, the difficulty the EU approach has encountered in trying to explain basic facts about the stock market suggests that it may be worth taking a closer look at the experimental evidence. Indeed, recent work in behavioral finance has argued that some of the lessons we learn from violations of EU are central to understanding a number of financial phenomena. Of all the non-EU theories, prospect theory may be the most promising for financial applications, and we discuss it in detail. The reason we focus on this theory is, quite simply, that it is the most successful at capturing the experimental results. In a way, this is not surprising. Most of the other non-EU models are what might be called quasinormative, in that they try to capture some of the anomalous experimental evidence by slightly weakening the VNM axioms. The difficulty with such models is that in trying to achieve two goals – normative and descriptive – they end up doing an unsatisfactory job at both. In contrast, prospect theory has no aspirations as a normative theory: it simply tries to capture people’s attitudes to risky gambles as parsimoniously as possible. Indeed, Tversky and Kahneman (1986) argue convincingly that normative approaches are doomed to failure, because people routinely make choices that are
simply impossible to justify on normative grounds, in that they violate dominance or invariance.

Kahneman and Tversky (1979), KT henceforth, lay out the original version of prospect theory, designed for gambles with at most two non-zero outcomes. They propose that when offered a gamble (x, p; y, q), to be read as “get outcome x with probability p, outcome y with probability q”, where x ≤ 0 ≤ y or y ≤ 0 ≤ x, people assign it a value of

$$
\pi(p)\, v(x) + \pi(q)\, v(y), \tag{1}
$$

where v and π are shown in Figure 2. When choosing between different gambles, they pick the one with the highest value.

Fig. 2. Kahneman and Tversky’s (1979) proposed value function v and probability weighting function π.
This formulation has a number of important features. First, utility is defined over gains and losses rather than over final wealth positions, an idea first proposed by Markowitz (1952). This fits naturally with the way gambles are often presented and discussed in everyday life. More generally, it is consistent with the way people perceive attributes such as brightness, loudness, or temperature relative to earlier levels, rather than in absolute terms. Kahneman and Tversky (1979) also offer the following violation of EU as evidence that people focus on gains and losses. Subjects are asked: [12]

In addition to whatever you own, you have been given 1000. Now choose between
A = (1000, 0.5)
B = (500, 1).

B was the more popular choice. The same subjects were then asked:

In addition to whatever you own, you have been given 2000. Now choose between
C = (−1000, 0.5)
D = (−500, 1).

This time, C was more popular. Note that the two problems are identical in terms of their final wealth positions and yet people choose differently. The subjects are apparently focusing only on gains and losses. Indeed, when they are not given any information about prior winnings, they choose B over A and C over D.

[12] All the experiments in Kahneman and Tversky (1979) are conducted in terms of Israeli currency. The authors note that at the time of their research, the median monthly family income was about 3000 Israeli lira.

The second important feature is the shape of the value function v, namely its concavity in the domain of gains and convexity in the domain of losses. Put simply, people are risk averse over gains, and risk-seeking over losses. Simple evidence for this comes from the fact just mentioned, namely that in the absence of any information about prior winnings [13]

$$
B \succ A, \qquad C \succ D.
$$

[13] In this section G1 ≻ G2 should be read as “a statistically significant fraction of Kahneman and Tversky’s subjects preferred G1 to G2.”

The v function also has a kink at the origin, indicating a greater sensitivity to losses than to gains, a feature known as loss aversion. Loss aversion is introduced to capture aversion to bets of the form:

$$
E = \left(110, \tfrac{1}{2};\; -100, \tfrac{1}{2}\right).
$$

It may seem surprising that we need to depart from the expected utility framework in order to understand attitudes to gambles as simple as E, but it is nonetheless true. In a remarkable paper, Rabin (2000) shows that if an expected utility maximizer rejects gamble E at all wealth levels, then he will also reject

$$
\left(20000000, \tfrac{1}{2};\; -1000, \tfrac{1}{2}\right),
$$

an utterly implausible prediction. The intuition is simple: if a smooth, increasing, and concave utility function defined over final wealth has sufficient local curvature to reject E over a wide range of wealth levels, it must be an extraordinarily concave function, making the investor extremely risk averse over large stakes gambles.

The final piece of prospect theory is the nonlinear probability transformation. Small probabilities are overweighted, so that π(p) > p. This is deduced from KT’s finding that

$$
(5000, 0.001) \succ (5, 1),
$$

and

$$
(-5, 1) \succ (-5000, 0.001),
$$

together with the earlier assumption that v is concave (convex) in the domain of gains (losses). Moreover, people are more sensitive to differences in probabilities at higher probability levels. For example, the following pair of choices,

$$
(3000, 1) \succ (4000, 0.8;\; 0, 0.2),
$$

and

$$
(4000, 0.2;\; 0, 0.8) \succ (3000, 0.25),
$$

which violate EU theory, imply

$$
\frac{\pi(0.25)}{\pi(0.2)} < \frac{\pi(1)}{\pi(0.8)}.
$$

The intuition is that the proportional jump in probability from 0.8 to 1 is more striking to people than the same proportional jump from 0.2 to 0.25. In particular, people place much more weight on outcomes that are certain relative to outcomes that are merely probable, a feature sometimes known as the “certainty effect”.

Along with capturing experimental evidence, prospect theory also simultaneously explains preferences for insurance and for buying lottery tickets. Although the concavity of v in the region of gains generally produces risk aversion, for lotteries which offer a small chance of a large gain, the overweighting of small probabilities in Figure 2 dominates, leading to risk-seeking. Along the same lines, while the convexity of v in the region of losses typically leads to risk-seeking, the same overweighting of small probabilities induces risk aversion over gambles which have a small chance of a large loss.

Based on additional evidence, Tversky and Kahneman (1992) propose a generalization of prospect theory which can be applied to gambles with more than two
outcomes. Specifically, if a gamble promises outcome x_i with probability p_i, Tversky and Kahneman (1992) propose that people assign the gamble the value

$$
\sum_i \pi_i\, v(x_i), \tag{2}
$$

where

$$
v(x) =
\begin{cases}
x^{\alpha} & \text{if } x \ge 0, \\
-\lambda (-x)^{\alpha} & \text{if } x < 0,
\end{cases}
$$

and

$$
\pi_i = w(P_i) - w(P_i^{*}), \qquad
w(P) = \frac{P^{\gamma}}{\left(P^{\gamma} + (1 - P)^{\gamma}\right)^{1/\gamma}}.
$$

Here, P_i (P_i*) is the probability that the gamble will yield an outcome at least as good as (strictly better than) x_i. Tversky and Kahneman (1992) use experimental evidence to estimate α = 0.88, λ = 2.25, and γ = 0.65. Note that λ is the coefficient of loss aversion, a measure of the relative sensitivity to gains and losses. Over a wide range of experimental contexts λ has been estimated in the neighborhood of 2.

Earlier in this section, we saw how prospect theory could explain why people made different choices in situations with identical final wealth levels. This illustrates an important feature of the theory, namely that it can accommodate the effects of problem description, or of framing. Such effects are powerful. There are numerous demonstrations of a 30 to 40% shift in preferences depending on the wording of a problem. No normative theory of choice can accommodate such behavior since a first principle of rational choice is that choices should be independent of the problem description or representation.

Framing refers to the way a problem is posed for the decision maker. In many actual choice contexts the decision maker also has flexibility in how to think about the problem. For example, suppose that a gambler goes to the race track and wins $200 in his first bet, but then loses $50 on his second bet. Does he code the outcome of the second bet as a loss of $50 or as a reduction in his recently won gain of $200? In other words, is the utility of the second loss v(−50) or v(150) − v(200)? The process by which people formulate such problems for themselves is called mental accounting [Thaler (2000)]. Mental accounting matters because in prospect theory, v is nonlinear.

One important feature of mental accounting is narrow framing, which is the tendency to treat individual gambles separately from other portions of wealth.
In other words, when offered a gamble, people often evaluate it as if it is the only gamble they face in the world, rather than merging it with pre-existing bets to see if the new bet is a worthwhile addition.
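The Tversky and Kahneman (1992) specification given earlier in this section is simple enough to evaluate numerically. The Python sketch below is a hedged illustration rather than a reference implementation: the function names are hypothetical, the parameter values are the estimates quoted in the text, and a single γ is used for gains and losses. One assumption goes beyond the text: for losses the decision weights are cumulated from the worst outcome upward, which is the convention in cumulative prospect theory, whereas the text states the weights only in terms of outcomes “at least as good as” x_i.

```python
ALPHA = 0.88   # curvature of the value function
LAMBDA = 2.25  # coefficient of loss aversion
GAMMA = 0.65   # curvature of the probability weighting function


def v(x):
    """Value function defined over gains and losses."""
    return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** ALPHA


def w(P):
    """Probability weighting function."""
    P = min(max(P, 0.0), 1.0)  # guard against floating-point overshoot
    return P ** GAMMA / (P ** GAMMA + (1 - P) ** GAMMA) ** (1 / GAMMA)


def prospect_value(gamble):
    """Value of a gamble given as (outcome, probability) pairs.

    Zero outcomes may be omitted because v(0) = 0.
    """
    gains = sorted((g for g in gamble if g[0] > 0), reverse=True)  # best first
    losses = sorted(g for g in gamble if g[0] < 0)                 # worst first (assumed convention)
    total = 0.0
    for ranked in (gains, losses):
        cumulative = 0.0
        for outcome, prob in ranked:
            weight = w(cumulative + prob) - w(cumulative)
            total += weight * v(outcome)
            cumulative += prob
    return total


# Gamble E from the text: loss aversion leads to rejection (negative value).
value_e = prospect_value([(110, 0.5), (-100, 0.5)])

# Lottery and insurance choices reported by Kahneman and Tversky (1979).
lottery, sure_gain = prospect_value([(5000, 0.001)]), prospect_value([(5, 1.0)])
sure_loss, rare_loss = prospect_value([(-5, 1.0)]), prospect_value([(-5000, 0.001)])

# Mental accounting: two ways of coding the $50 loss after a $200 win.
segregated, integrated = v(-50), v(150) - v(200)

print(f"value of E: {value_e:.2f}")
print(f"lottery {lottery:.2f} vs sure gain {sure_gain:.2f}")
print(f"sure loss {sure_loss:.2f} vs rare large loss {rare_loss:.2f}")
print(f"v(-50) = {segregated:.2f}, v(150) - v(200) = {integrated:.2f}")
```

Under these parameters the sketch reproduces the qualitative patterns described in the text: E is rejected, the long-shot lottery beats the sure gain, the sure small loss beats the rare large loss, and the two codings of the second bet give different values because v is nonlinear.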
Redelmeier and Tversky (1992) provide a simple illustration, based on the gamble

$$
F = \left(2000, \tfrac{1}{2};\; -500, \tfrac{1}{2}\right).
$$

Subjects in their experiment were asked whether they were willing to take this bet; 57% said they would not. They were then asked whether they would prefer to play F five times or six times; 70% preferred the six-fold gamble. Finally they were asked:

Suppose that you have played F five times but you don’t yet know your wins and losses. Would you play the gamble a sixth time?

60% rejected the opportunity to play a sixth time, reversing their preference from the earlier question. This suggests that some subjects are framing the sixth gamble narrowly, segregating it from the other gambles. Indeed, the 60% rejection level is very similar to the 57% rejection level for the one-off play of F.

3.2.2. Ambiguity aversion

Our discussion so far has centered on understanding how people act when the outcomes of gambles have known objective probabilities. In reality, probabilities are rarely objectively known. To handle these situations, Savage (1964) develops a counterpart to expected utility known as subjective expected utility, SEU henceforth. Under certain axioms, preferences can be represented by the expectation of a utility function, this time weighted by the individual’s subjective probability assessment.

Experimental work in the last few decades has been as unkind to SEU as it was to EU. The violations this time are of a different nature, but they may be just as relevant for financial economists. The classic experiment was described by Ellsberg (1961). Suppose that there are two urns, 1 and 2. Urn 2 contains a total of 100 balls, 50 red and 50 blue. Urn 1 also contains 100 balls, again a mix of red and blue, but the subject does not know the proportion of each. Subjects are asked to choose one of the following two gambles, each of which involves a possible payment of $100, depending on the color of a ball drawn at random from the relevant urn:

a1: a ball is drawn from Urn 1, $100 if red, $0 if blue,
a2: a ball is drawn from Urn 2, $100 if red, $0 if blue.

Subjects are then also asked to choose between the following two gambles:

b1: a ball is drawn from Urn 1, $100 if blue, $0 if red,
b2: a ball is drawn from Urn 2, $100 if blue, $0 if red.

a2 is typically preferred to a1, while b2 is chosen over b1. These choices are inconsistent with SEU: the choice of a2 implies a subjective probability that fewer than 50% of the balls in Urn 1 are red, while the choice of b2 implies the opposite.
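The inconsistency with SEU can be verified by brute force. The following Python sketch, with a hypothetical function name and an arbitrary grid of candidate beliefs, searches for a subjective probability of drawing red from Urn 1 under which an SEU maximizer would strictly prefer both a2 to a1 and b2 to b1. Payoffs are compared in dollars, which is an innocuous assumption here since any utility function with u(100) > u(0) gives the same ranking.

```python
def beliefs_consistent_with_ellsberg(grid_size=101):
    """Subjective P(red from Urn 1) values that rationalize both typical choices."""
    consistent = []
    for i in range(grid_size):
        p_red_urn1 = i / (grid_size - 1)
        # Expected payoffs; Urn 2 is known to be 50 red and 50 blue.
        a1, a2 = 100 * p_red_urn1, 100 * 0.5
        b1, b2 = 100 * (1 - p_red_urn1), 100 * 0.5
        if a2 > a1 and b2 > b1:
            consistent.append(p_red_urn1)
    return consistent


print(beliefs_consistent_with_ellsberg())
```

The list is empty: preferring a2 requires a belief below 0.5, preferring b2 requires a belief above 0.5, and no single probability satisfies both.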
The experiment suggests that people do not like situations where they are uncertain about the probability distribution of a gamble. Such situations are known as situations of ambiguity, and the general dislike for them, as ambiguity aversion.¹⁴ SEU does not allow agents to express their degree of confidence about a probability distribution and therefore cannot capture such aversion.

Ambiguity aversion appears in a wide variety of contexts. For example, a researcher might ask a subject for his estimate of the probability that a certain team will win its upcoming football match, to which the subject might respond 0.4. The researcher then asks the subject to imagine a chance machine, which will display 1 with probability 0.4 and 0 otherwise, and asks whether the subject would prefer to bet on the football game – an ambiguous bet – or on the machine, which offers no ambiguity. In general, people prefer to bet on the machine, illustrating aversion to ambiguity.

Heath and Tversky (1991) argue that in the real world, ambiguity aversion has much to do with how competent an individual feels he is at assessing the relevant distribution. Ambiguity aversion over a bet can be strengthened by highlighting subjects’ feelings of incompetence, either by showing them other bets in which they have more expertise, or by mentioning other people who are more qualified to evaluate the bet [Fox and Tversky (1995)].

Further evidence that supports the competence hypothesis is that in situations where people feel especially competent in evaluating a gamble, the opposite of ambiguity aversion, namely a “preference for the familiar”, has been observed. In the example above, people chosen to be especially knowledgeable about football often prefer to bet on the outcome of the game rather than on the chance machine. Just as with ambiguity aversion, such behavior cannot be captured by SEU.

¹⁴ An early discussion of this aversion can be found in Knight (1921), who defines risk as a gamble with known distribution and uncertainty as a gamble with unknown distribution, and suggests that people dislike uncertainty more than risk.

4. Application: The aggregate stock market

Researchers studying the aggregate U.S. stock market have identified a number of interesting facts about its behavior. Three of the most striking are:

The Equity Premium. The stock market has historically earned a high excess rate of return. For example, using annual data from 1871–1993, Campbell and Cochrane (1999) report that the average log return on the S&P 500 index is 3.9% higher than the average log return on short-term commercial paper.

Volatility. Stock returns and price–dividend ratios are both highly variable. In the same data set, the annual standard deviation of excess log returns on the S&P 500 is 18%, while the annual standard deviation of the log price–dividend ratio is 0.27.
Predictability. Stock returns are forecastable. Using monthly, real, equal-weighted NYSE returns from 1941–1986, Fama and French (1988) show that the dividend–price ratio is able to explain 27% of the variation of cumulative stock returns over the subsequent four years.¹⁵

All three of these facts can be labelled puzzles. The first fact has been known as the equity premium puzzle since the work of Mehra and Prescott (1985) [see also Hansen and Singleton (1983)]. Campbell (1999) calls the second fact the volatility puzzle and we refer to the third fact as the predictability puzzle. The reason they are called puzzles is that they are hard to rationalize in a simple consumption-based model.

To see this, consider the following endowment economy, which we come back to a number of times in this section. There are an infinite number of identical investors, and two assets: a risk-free asset in zero net supply, with gross return $R_{f,t}$ between time $t$ and $t+1$, and a risky asset – the stock market – in fixed positive supply, with gross return $R_{t+1}$ between time $t$ and $t+1$. The stock market is a claim to a perishable stream of dividends $\{D_t\}$, where

$$\frac{D_{t+1}}{D_t} = \exp\left[g_D + \sigma_D \varepsilon_{t+1}\right], \tag{3}$$

and where each period’s dividend can be thought of as one component of a consumption endowment $C_t$, where

$$\frac{C_{t+1}}{C_t} = \exp\left[g_C + \sigma_C \eta_{t+1}\right], \tag{4}$$

and

$$\begin{pmatrix} \varepsilon_t \\ \eta_t \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \omega \\ \omega & 1 \end{pmatrix} \right), \quad \text{i.i.d. over time.} \tag{5}$$
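The endowment process in Equations (3)–(5) can be simulated directly. The sketch below is only an illustration: the parameter values are the ones the chapter reports in Table 2, while the function names, the sample size, the seed and the auxiliary shock `xi` (used to build a pair of standard normals with correlation ω) are assumptions made for the example.

```python
import math
import random
import statistics


def simulate_growth(n, g_d, sigma_d, g_c, sigma_c, omega, seed=0):
    """Draw n periods of log dividend and log consumption growth, Eqs. (3)-(5)."""
    rng = random.Random(seed)
    dlog_d, dlog_c = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        xi = rng.gauss(0.0, 1.0)  # independent helper shock (not in the text)
        # eta is standard normal with corr(eps, eta) = omega, as in Eq. (5).
        eta = omega * eps + math.sqrt(1.0 - omega ** 2) * xi
        dlog_d.append(g_d + sigma_d * eps)  # log of Eq. (3)
        dlog_c.append(g_c + sigma_c * eta)  # log of Eq. (4)
    return dlog_d, dlog_c


def correlation(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))


dlog_d, dlog_c = simulate_growth(
    50_000, g_d=0.015, sigma_d=0.12, g_c=0.0184, sigma_c=0.0379, omega=0.15, seed=1
)
print("std of log dividend growth:", round(statistics.pstdev(dlog_d), 4))
print("std of log consumption growth:", round(statistics.pstdev(dlog_c), 4))
print("correlation:", round(correlation(dlog_d, dlog_c), 3))
```

The simulated series make the key feature of the economy visible: consumption growth is much smoother than dividend growth and only weakly correlated with it.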
Investors choose consumption $C_t$ and an allocation $S_t$ to the risky asset to maximize

$$E_0 \sum_{t=0}^{\infty} \rho^t \frac{C_t^{1-\gamma}}{1-\gamma}, \tag{6}$$

subject to the standard budget constraint.¹⁶ Using the Euler equation of optimality,

$$1 = \rho E_t\left[\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} R_{t+1}\right], \tag{7}$$

it is straightforward to derive expressions for stock returns and prices. The details are in the Appendix.

¹⁵ These three facts are widely agreed on, but they are not completely uncontroversial. A large literature has debated the statistical significance of the time series predictability, while others have argued that the equity premium is overstated due to survivorship bias [Brown, Goetzmann and Ross (1995)].

¹⁶ For $\gamma = 1$, we replace $C_t^{1-\gamma}/(1-\gamma)$ with $\log(C_t)$.
Table 2
Parameter values for a simple consumption-based model

Parameter     Value
$g_C$         1.84%
$\sigma_C$    3.79%
$g_D$         1.5%
$\sigma_D$    12.0%
$\omega$      0.15
$\gamma$      1.0
$\rho$        0.98
We can now examine the model’s quantitative predictions for the parameter values in Table 2. The endowment process parameters are taken from U.S. data spanning the 20th century, and are standard in the literature. It is also standard to start out by considering low values of $\gamma$. The reason is that when one computes, for various values of $\gamma$, how much wealth an individual would be prepared to give up to avoid a large-scale timeless wealth gamble, low values of $\gamma$ match best with introspection as to what the answers should be [Mankiw and Zeldes (1991)]. We take $\gamma = 1$, which corresponds to log utility.

In an economy with these parameter values, the average log return on the stock market would be just 0.1% higher than the risk-free rate, not the 3.9% observed historically. The standard deviation of log stock returns would be only 12%, not 18%, and the price–dividend ratio would be constant (implying, of course, that the dividend–price ratio has no forecast power for future returns).

It is useful to recall the intuition for these results. In an economy with power utility preferences, the equity premium is determined by risk aversion $\gamma$ and by risk, measured as the covariance of stock returns and consumption growth. Since consumption growth is very smooth in the data, this covariance is very low, thus predicting a very low equity premium. Stocks simply do not appear risky to investors with the preferences in Equation (6) and with low $\gamma$, and therefore do not warrant a large premium.

Of course, the equity premium predicted by the model can be increased by using higher values of $\gamma$. However, apart from making counterintuitive predictions about individuals’ attitudes to large-scale gambles, this would also predict a counterfactually high risk-free rate, a problem known as the risk-free rate puzzle [Weil (1989)].

To understand the volatility puzzle, note that in the simple economy described above, both discount rates and expected dividend growth are constant over time. A direct application of the present value formula implies that the price–dividend ratio, P/D henceforth, is constant. Since

$$R_{t+1} = \frac{D_{t+1} + P_{t+1}}{P_t} = \frac{1 + P_{t+1}/D_{t+1}}{P_t/D_t}\,\frac{D_{t+1}}{D_t}, \tag{8}$$

it follows that

$$r_{t+1} = \Delta d_{t+1} + \text{const.} \equiv d_{t+1} - d_t + \text{const.}, \tag{9}$$

where lower case letters indicate log variables. The standard deviation of log returns will therefore only be as high as the standard deviation of log dividend growth, namely 12%.
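The model’s predictions can be checked with a few lines of arithmetic. The sketch below is a reader’s derivation under the joint lognormality of Equations (3)–(5), not a transcription of the chapter’s Appendix: it applies the Euler equation (7) once to a riskless payoff and once to a stock with a constant P/D ratio. The premium is measured here as the log of the expected gross stock return minus the log risk-free rate, which works out to $\gamma \omega \sigma_C \sigma_D$; the function and key names are illustrative.

```python
import math

# Parameter values as reported in Table 2 of the chapter.
G_C, SIGMA_C = 0.0184, 0.0379
G_D, SIGMA_D = 0.015, 0.12
OMEGA, GAMMA, RHO = 0.15, 1.0, 0.98


def model_moments(g_c, sigma_c, g_d, sigma_d, omega, gamma, rho):
    """Closed-form implications of Eqs. (3)-(7) with a constant P/D ratio."""
    # Log risk-free rate: Eq. (7) applied to a riskless gross return.
    log_rf = -math.log(rho) + gamma * g_c - 0.5 * gamma ** 2 * sigma_c ** 2
    # Variance of log[(C'/C)^(-gamma) * D'/D] under Eq. (5).
    var_md = (sigma_d ** 2 + gamma ** 2 * sigma_c ** 2
              - 2.0 * gamma * omega * sigma_c * sigma_d)
    # With constant P/D = f, Eq. (7) pins down log((1 + f) / f).
    log_gross_pd = -math.log(rho) + gamma * g_c - g_d - 0.5 * var_md
    if log_gross_pd <= 0.0:
        raise ValueError("parameters imply an infinite price-dividend ratio")
    pd_ratio = 1.0 / math.expm1(log_gross_pd)
    # Log of the expected gross stock return, using Eq. (8).
    log_expected_return = log_gross_pd + g_d + 0.5 * sigma_d ** 2
    return {
        "log_rf": log_rf,
        "pd_ratio": pd_ratio,
        "premium": log_expected_return - log_rf,  # equals gamma*omega*sigma_c*sigma_d
        "return_vol": sigma_d,  # Eq. (9): log returns inherit dividend volatility
    }


moments = model_moments(G_C, SIGMA_C, G_D, SIGMA_D, OMEGA, GAMMA, RHO)
print({k: round(v, 4) for k, v in moments.items()})
```

With the Table 2 values the premium comes out below one tenth of a percentage point, and return volatility is exactly the 12% volatility of dividend growth, which is the sense in which the simple model misses both facts.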
The particular volatility puzzle seen here illustrates a more general point, first made by Shiller (1981) and LeRoy and Porter (1981), namely that it is difficult to explain the historical volatility of stock returns with any model in which investors are rational and discount rates are constant. To see the intuition, consider the identity in Equation (8) again. Since the volatility of log dividend growth is only 12%, the only way for a model to generate an 18% volatility of log returns is to introduce variation in the P/D ratio. But if discount rates are constant, a quick glance at a present-value formula shows that the only way to do that is to introduce variation in investors’ forecasts of the dividend growth rate: a higher forecast raises the P/D ratio, a lower forecast brings it down. There is a catch here, though: if investors are rational, their expectations for dividend growth must, on average, be confirmed. In other words, times of higher (lower) P/D ratios should, on average, be followed by higher (lower) cash-flow growth. Unfortunately, price–dividend ratios are not reliable forecasters of dividend growth, either in the U.S. or in most international markets [see Campbell (1999) for recent evidence].

Shiller and LeRoy and Porter’s results shocked the profession when they first appeared. At the time, most economists felt that discount rates were close to constant over time, apparently implying that stock market volatility could only be fully explained by appealing to investor irrationality. Today, it is well understood that rational variation in discount rates can help explain the volatility puzzle, although we argue later that models with irrational beliefs also offer a plausible way of thinking about the data.

Both the rational and behavioral approaches to finance have made progress in understanding the three puzzles singled out at the start of this section. The advances on the rational side are well described in other articles in this handbook.
Here, we discuss the behavioral approaches, starting with the equity premium puzzle and then turning to the volatility puzzle. We do not consider the predictability puzzle separately, because in any model with a stationary P/D ratio, a resolution of the volatility puzzle is simultaneously a resolution of the predictability puzzle. To see this, recall from Equation (8) that any model which captures the empirical volatility of returns must involve variation in the P/D ratio. Moreover, for a model to be a satisfactory resolution of the volatility puzzle, it should not make the counterfactual prediction that P/D ratios forecast subsequent dividend growth. Now suppose that the P/D ratio is higher than average. The only way it can return to its mean is if cash flows D subsequently go up, or if prices P fall. Since the P/D ratio is not allowed to forecast cash flows, it must forecast lower returns, thereby explaining the predictability puzzle.
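The link between a stationary P/D ratio and return predictability can be illustrated with a small simulation. This is only a sketch under assumed dynamics: the log P/D ratio follows a mean-reverting AR(1) process whose shocks are independent of dividend news, so by construction it cannot forecast dividend growth. The AR(1) specification, its parameters (`pd_mean`, `phi`, `sigma_pd`), the seed and the function names are hypothetical; only the dividend growth parameters come from Table 2. Log returns are computed from the log of Equation (8).

```python
import math
import random
import statistics


def simulate_returns(n, pd_mean, phi, sigma_pd, g_d, sigma_d, seed=0):
    """Simulate lagged log P/D, next-period log returns and log dividend growth."""
    rng = random.Random(seed)
    x = pd_mean  # current log price-dividend ratio
    pd_lag, returns, dgrowth = [], [], []
    for _ in range(n):
        # Mean-reverting log P/D with shocks unrelated to dividends (assumption).
        x_next = pd_mean + phi * (x - pd_mean) + sigma_pd * rng.gauss(0.0, 1.0)
        dd = g_d + sigma_d * rng.gauss(0.0, 1.0)
        # Log of Eq. (8): r = log(1 + P'/D') - log(P/D) + log(D'/D).
        r = math.log1p(math.exp(x_next)) - x + dd
        pd_lag.append(x)
        returns.append(r)
        dgrowth.append(dd)
        x = x_next
    return pd_lag, returns, dgrowth


def ols_slope(x, y):
    """Slope coefficient from a univariate least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


pd_lag, returns, dgrowth = simulate_returns(
    20_000, pd_mean=math.log(25.0), phi=0.9, sigma_pd=0.05,
    g_d=0.015, sigma_d=0.12, seed=2,
)
print("slope of next-period return on log P/D:", round(ols_slope(pd_lag, returns), 3))
print("slope of dividend growth on log P/D:", round(ols_slope(pd_lag, dgrowth), 3))
```

In this setup a high P/D ratio is followed by lower returns on average, while it carries essentially no information about dividend growth, which is the pattern the argument above describes.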
4.1. The equity premium puzzle

The core of the equity premium puzzle is that even though stocks appear to be an attractive asset – they have high average returns and a low covariance with consumption
growth – investors appear very unwilling to hold them. In particular, they appear to demand a substantial risk premium in order to hold the market supply.

To date, behavioral finance has pursued two approaches to this puzzle. Both are based on preferences: one relies on prospect theory, the other on ambiguity aversion. In essence, both approaches try to understand what it is that is missing from the popular preference specification in Equation (6) that makes investors fear stocks so much, leading them to charge a high premium in equilibrium.

4.1.1. Prospect theory

One of the earliest papers to link prospect theory to the equity premium is Benartzi and Thaler (1995), BT henceforth. They study how an investor with prospect theory-type preferences allocates his financial wealth between T-Bills and the stock market. Prospect theory argues that when choosing between gambles, people compute the gains and losses for each one and select the one with the highest prospective utility. In a financial context, this suggests that people may choose a portfolio allocation by computing, for each allocation, the potential gains and losses in the value of their holdings, and then taking the allocation with the highest prospective utility. In other words, they choose $w$, the fraction of financial wealth in stocks, to maximize

$$E_{\pi}\, v\left[(1 - w) R_{f,t+1} + w R_{t+1} - 1\right], \tag{10}$$

where $\pi$ and $v$ are defined in Equation (2). In particular, $v$ captures loss aversion, the experimental finding that people are more sensitive to losses than to gains. $R_{f,t+1}$ and $R_{t+1}$ are the gross returns on T-Bills and the stock market between $t$ and $t+1$, respectively, making the argument of $v$ the return on financial wealth.

In order to implement this model, BT need to stipulate how often investors evaluate their portfolios. In other words, how long is the time interval between $t$ and $t+1$? To see why this matters, compare two investors: energetic Nick who calculates the gains and losses in his portfolio every day, and laid-back Dick who looks at his portfolio only once per decade. Since, on a daily basis, stocks go down in value almost as often as they go up, the loss aversion built into $v$ makes stocks appear unattractive to Nick. In contrast, loss aversion does not have much effect on Dick’s perception of stocks since, at ten-year horizons, stocks offer only a small risk of losing money.

Rather than simply pick an evaluation interval, BT calculate how often investors would have to evaluate their portfolios to make them indifferent between stocks and T-Bills: in other words, given historical U.S. data on stocks and T-Bills, for what evaluation interval would substituting $w = 0$ and $w = 1$ into Equation (10) give the same prospective utility? Roughly speaking, this calculation can be thought of as asking what kind of equity premium might be sustainable in equilibrium: how often would investors need to evaluate their gains and losses so that even in the face of the large historical equity premium, they would still be happy to hold the market supply of T-Bills.
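The contrast between Nick and Dick can be made concrete with a back-of-the-envelope calculation. The sketch below is not BT’s procedure, which uses historical data and the full prospect theory functions. It assumes, purely for illustration, that excess log returns on stocks are i.i.d. normal with the 3.9% mean and 18% standard deviation quoted earlier in this section, and it treats a trading year as 252 days. What it reports is the probability that stocks underperform the short-term rate over a given horizon.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def loss_probability(horizon_years, mu=0.039, sigma=0.18):
    """P(cumulative excess log return < 0) under i.i.d. normal returns (assumption)."""
    # Mean grows with T and standard deviation with sqrt(T),
    # so the ratio mean / sd grows with sqrt(T).
    z = mu * math.sqrt(horizon_years) / sigma
    return normal_cdf(-z)


for label, years in [("one day", 1.0 / 252.0), ("one year", 1.0), ("ten years", 10.0)]:
    print(f"{label:>9}: {loss_probability(years):.3f}")
```

Under these assumptions a daily evaluator sees a loss almost half the time, whereas the frequency of losses falls steadily as the evaluation interval lengthens, which is why the interval matters so much for a loss-averse investor.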
BT find that for the parametric forms for $\pi$ and $v$ estimated in experimental settings, the answer is one year, and they argue that this is indeed a natural evaluation period for investors to use. The way people frame gains and losses is plausibly influenced by the way information is presented to them. Since we receive our most comprehensive mutual fund reports once a year, and do our taxes once a year, it is not unreasonable that gains and losses might be expressed as annual changes in value.

The BT calculation therefore suggests a simple way of understanding the high historical equity premium. If investors get utility from annual changes in financial wealth and are loss averse over these changes, their fear of a major drop in financial wealth will lead them to demand a high premium as compensation. BT call the combination of loss aversion and frequent evaluations myopic loss aversion.

BT’s result is only suggestive of a solution to Mehra and Prescott’s equity premium puzzle. As emphasized at the start of this section, that puzzle is in large part a consumption puzzle: given the low volatility of consumption growth, why are investors so reluctant to buy a high return asset, stocks, especially when that asset’s covariance with consumption growth is so low? Since BT do not consider an intertemporal model with consumption choice, they cannot address this issue directly.

To see if prospect theory can in fact help with the equity premium puzzle, Barberis, Huang and Santos (2001), BHS henceforth, make a first attempt at building it into a dynamic equilibrium model of stock returns. A simple version of their model, an extension of which we consider later, examines an economy with the same structure as the one described at the start of Section 4, but in which investors have the preferences

$$E_0 \sum_{t=0}^{\infty} \rho^t \left[ \frac{C_t^{1-\gamma}}{1-\gamma} + b_0 \bar{C}_t^{-\gamma}\, \hat{v}(X_{t+1}) \right]. \tag{11}$$

The investor gets utility from consumption, but over and above that, he gets utility from changes in the value of his holdings of the risky asset between $t$ and $t+1$, denoted here by $X_{t+1}$. Motivated by BT’s findings, BHS define the unit of time to be a year, so that gains and losses are measured annually. The utility from these gains and losses is determined by $\hat{v}$, where

$$\hat{v}(X) = \begin{cases} X & \text{for } X \geq 0, \\ 2.25X & \text{for } X < 0. \end{cases} \tag{12}$$

The 2.25 factor comes from Tversky and Kahneman’s (1992) experimental study of attitudes to timeless gambles. This functional form is simpler than the function $v$ used by BT. It captures loss aversion, but ignores other elements of prospect theory, such as the concavity (convexity) over gains (losses) and the probability transformation. In part this is because it is difficult to incorporate all these features into a fully dynamic framework; but also, it is based on BT’s observation that it is mainly loss aversion that drives their results.¹⁷
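The piecewise linear function in Equation (12) is simple enough to evaluate in closed form. The sketch below implements it and, as an illustrative assumption that is not part of the BHS model, takes the gain $X$ to be normally distributed; it then finds the ratio of mean to standard deviation at which the expected value of $\hat{v}(X)$ is zero. The function names and the normality assumption are hypothetical, and the calculation looks at the loss aversion term in isolation, ignoring consumption utility and equilibrium effects.

```python
import math

LAMBDA = 2.25  # loss-aversion factor from Eq. (12)


def v_hat(x, lam=LAMBDA):
    """Piecewise linear gain-loss utility of Eq. (12)."""
    return x if x >= 0 else lam * x


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_v_hat(mean, sd, lam=LAMBDA):
    """E[v_hat(X)] for X ~ N(mean, sd^2); normality is an assumption of this sketch."""
    z = mean / sd
    # E[X; X >= 0] + lam * E[X; X < 0], using truncated-normal moments.
    return mean * (normal_cdf(z) + lam * normal_cdf(-z)) - (lam - 1.0) * sd * normal_pdf(z)


def break_even_ratio(lam=LAMBDA, lo=0.0, hi=5.0, tol=1e-10):
    """Bisection for the mean/sd ratio at which E[v_hat(X)] = 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_v_hat(mid, 1.0, lam) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print("v_hat(+100), v_hat(-100):", v_hat(100.0), v_hat(-100.0))
print("E[v_hat] for a zero-mean gamble:", round(expected_v_hat(0.0, 1.0), 4))
print("break-even mean/sd ratio:", round(break_even_ratio(), 4))
```

A zero-mean gamble has strictly negative value under Equation (12), so a loss-averse investor accepts a normally distributed gamble only if its mean is a sizeable fraction of its standard deviation.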
¹⁷ The $b_0 \bar{C}_t^{-\gamma}$ coefficient on the loss aversion term is a scaling factor which ensures that risk premia in the economy remain stationary even as aggregate wealth increases over time. It involves per capita consumption $\bar{C}_t$, which is exogenous to the investor, and so does not affect the intuition of the model. The constant $b_0$ controls the importance of the loss aversion term in the investor’s preferences; setting $b_0 = 0$ reduces the model to the much studied case of power utility over consumption. As $b_0 \to \infty$, the investor’s decisions are driven primarily by concern about gains and losses in financial wealth, as assumed by Benartzi and Thaler.

BHS show that loss aversion can indeed provide a partial explanation of the high Sharpe ratio on the aggregate stock market. However, how much of the Sharpe ratio it can explain depends heavily on the importance of the second source of utility in Equation (11), or in short, on $b_0$. As a way of thinking about this parameter, BHS note that when $b_0 = 0.7$, the psychological pain of losing $100 in the stock market, captured by the second term, is roughly equal to the consumption-related pain of having to consume $100 less, captured by the first term. For this $b_0$, the Sharpe ratio of the risky asset is 0.11, about a third of its historical value.

BT and BHS are both effectively assuming that investors engage in narrow framing, both cross-sectionally and temporally. Even if they have many forms of wealth, both financial and non-financial, they still get utility from changes in the value of one specific component of their total wealth: financial wealth in the case of BT, and stock holdings in the case of BHS. And even if investors have long investment horizons, they still evaluate their portfolio returns on an annual basis.

The assumption about cross-sectional narrow framing can be motivated in a number of ways. The simplest possibility is that it captures non-consumption utility, such as regret. Regret is the pain we feel when we realize that we would be better off if we had not taken a certain action in the past. If the investor’s stock holdings fall in value, he may regret the specific decision he made to invest in stocks. Such feelings are naturally captured by defining utility directly over changes in the investor’s financial wealth or in the value of his stock holdings.

Another possibility is that while people actually care only about consumption-related utility, they are boundedly rational. For example, suppose that they are concerned that their consumption might fall below some habit level. They know that the right thing to do when considering a stock market investment is to merge the stock market risk with other pre-existing risks that they face – labor income risk, say – and then to compute the likelihood of consumption falling below habit. However, this calculation may be too complex. As a result, people may simply focus on gains and losses in stock market wealth alone, rather than on gains and losses in total wealth.

What about temporal narrow framing? We suggested above that the way information is presented may lead investors to care about annual changes in financial wealth even if they have longer investment horizons. As further evidence for this, Thaler, Tversky, Kahneman and Schwartz (1997) provide an experimental test of the idea that the manner in which information is presented affects the frame people adopt in their decision-making.¹⁸

¹⁸ See also Gneezy and Potters (1997) for a similar experiment.
In their experiment, subjects are asked to imagine that they are portfolio managers for a small college endowment. One group of subjects – Group I, say – is shown monthly observations on two funds, Fund A and Fund B. Returns on Fund A (B) are drawn from a normal distribution calibrated to mimic bond (stock) returns as closely as possible, although subjects are not given this information. After each monthly observation, subjects are asked to allocate their portfolio between the two funds over the next month. They are then shown the realized returns over that month, and asked to allocate once again.

A second group of investors – Group II – is shown exactly the same series of returns, except that it is aggregated at the annual level; in other words, these subjects do not see the monthly fund fluctuations, but only cumulative annual returns. After each annual observation, they are asked to allocate their portfolio between the two funds over the next year. A final group of investors – Group III – is shown exactly the same data, this time aggregated at the five-year level, and they too are asked to allocate their portfolio after each observation.

After going through a total of 200 months’ worth of observations, each group is asked to make one final portfolio allocation, which is to apply over the next 400 months. Thaler et al. (1997) find that the average final allocation to stocks chosen by subjects in Group I is much lower than that chosen by people in Groups II and III. This result is consistent with the idea that people code gains and losses based on how information is presented to them. Subjects in Group I see monthly observations and hence more frequent losses. If they adopt the monthly distribution as a frame, they will be more wary of stocks and will allocate less to them.

4.1.2. Ambiguity aversion

In Section 3, we presented the Ellsberg paradox as evidence that people dislike ambiguity, or situations where they are not sure what the probability distribution of a gamble is.
This is potentially very relevant for finance, as investors are often uncertain about the distribution of a stock’s return.

Following the work of Ellsberg, many models of how people react to ambiguity have been proposed; Camerer and Weber (1992) provide a comprehensive review. One of the more popular approaches is to suppose that when faced with ambiguity, people entertain a range of possible probability distributions and act to maximize the minimum expected utility under any candidate distribution. In effect, people behave as if playing a game against a malevolent opponent who picks the actual distribution of the gamble so as to leave them as badly off as possible. Such a decision rule was first axiomatized by Gilboa and Schmeidler (1989). Epstein and Wang (1994) showed how such an approach could be incorporated into a dynamic asset pricing model, although they did not try to assess the quantitative implications of ambiguity aversion for asset prices.

Quantitative implications have been derived using a closely related framework known as robust control. In this approach, the agent has a reference probability
distribution in mind, but wants to ensure that his decisions are good ones even if the reference model is misspecified to some extent. Here too, the agent essentially tries to guard against a “worst-case” misspecification. Anderson, Hansen and Sargent (1998) show how such a framework can be used for portfolio choice and pricing problems, even when state equations and objective functions are nonlinear.

Maenhout (1999) applies the Anderson et al. framework to the specific issue of the equity premium. He shows that if investors are concerned that their model of stock returns is misspecified, they will charge a substantially higher equity premium as compensation for the perceived ambiguity in the probability distribution. He notes, however, that to explain the full 3.9% equity premium requires an unreasonably high concern about misspecification. At best then, ambiguity aversion is only a partial resolution of the equity premium puzzle.

4.2. The volatility puzzle

Before turning to behavioral work on the volatility puzzle, it is worth thinking about how rational approaches to this puzzle might proceed. Since, in the data, the volatility of returns is higher than the volatility of dividend growth, Equation (8) makes it clear that we have to make up the gap by introducing variation in the price–dividend ratio. What are the different ways we might do this? A useful framework for thinking about this is a version of the present value formula originally derived by Campbell and Shiller (1988). Starting from

$$R_{t+1} = \frac{P_{t+1} + D_{t+1}}{P_t}, \tag{13}$$

where $P_t$ is the value of the stock market at time $t$, they use a log-linear approximation to show that the log price–dividend ratio can be written

$$p_t - d_t = E_t \sum_{j=0}^{\infty} \rho^j \Delta d_{t+1+j} - E_t \sum_{j=0}^{\infty} \rho^j r_{t+1+j} + E_t \lim_{j \to \infty} \rho^j \left(p_{t+j} - d_{t+j}\right) + \text{const.}, \tag{14}$$

where lower case letters represent log variables – $p_t = \log P_t$, for example – and where $\Delta d_{t+1} = d_{t+1} - d_t$. If the price–dividend ratio is stationary, so that the third term on the right is zero, this equation shows clearly that there are just two reasons price–dividend ratios can move around: changing expectations of future dividend growth or changing discount rates. Discount rates, in turn, can change because of changing expectations of future risk-free rates, changing forecasts of risk or changing risk aversion.

While there appear to be many ways of introducing variation in the P/D ratio, it has become clear that most of them cannot form the basis of a rational explanation of the volatility puzzle. We cannot use changing forecasts of dividend growth to drive the P/D ratio: restating the argument of Shiller (1981) and LeRoy and Porter (1981), if
these forecasts are indeed rational, it must be that P/D ratios predict cash-flow growth in the time series, which they do not.¹⁹ Nor can we use changing forecasts of future risk-free rates: again, if the forecasts are rational, P/D ratios must predict interest rates in the time series, which they do not. Even changing forecasts of risk cannot work, as there is little evidence that P/D ratios predict changes in risk in the time series. The only story that remains is therefore one about changing risk aversion, and this is the idea behind the Campbell and Cochrane (1999) model of aggregate stock market behavior. They propose a habit formation framework in which changes in consumption relative to habit lead to changes in risk aversion and hence variation in P/D ratios. This variation helps to plug the gap between the volatility of dividend growth and the volatility of returns.

Some rational approaches try to introduce variation in the P/D ratio through the third term on the right in Equation (14). Since this requires investors to expect explosive growth in P/D ratios forever, they are known as models of rational bubbles. The idea is that prices are high today because they are expected to be higher next period; and they are higher next period because they are expected to be higher the period after that, and so on, forever. While such a model might initially seem appealing, a number of papers, most recently Santos and Woodford (1997), show that the conditions under which rational bubbles can survive are extremely restrictive.²⁰

We now discuss some of the behavioral approaches to the volatility puzzle, grouping them by whether they focus on beliefs or on preferences.

4.2.1. Beliefs

One possible story is that investors believe that the mean dividend growth rate is more variable than it actually is. When they see a surge in dividends, they are too quick to believe that the mean dividend growth rate has increased.
Their exuberance pushes prices up relative to dividends, adding to the volatility of returns. A story of this kind can be derived as a direct application of representativeness and in particular, of the version of representativeness known as the law of small numbers, whereby people expect even short samples to reflect the properties of the parent population. If the investor sees many periods of good earnings, the law of small numbers leads him to believe that earnings growth has gone up, and hence that earnings will continue to be high in the future. After all, the earnings growth rate cannot be “average”. If it were, then according to the law of small numbers, earnings should appear average, even in short samples: some good earnings news, some bad earnings news, but not several good pieces of news in a row.

¹⁹ There is an important caveat to the statement that changing cash-flow forecasts cannot be the basis of a satisfactory solution to the volatility puzzle. A large literature on structural uncertainty and learning, in which investors do not know the parameters of the cash-flow process but learn them over time, has had some success in matching the empirical volatility of returns [Brennan and Xia (2001), Veronesi (1999)]. In these models, variation in price–dividend ratios comes precisely from changing forecasts of cash-flow growth. While these forecasts are not subsequently confirmed in the data, investors are not considered irrational – they simply don’t have enough data to infer the correct model. In related work, Barsky and De Long (1993) generate return volatility in an economy where investors forecast cash flows using a model that is wrong, but not easily rejected with available data.

²⁰ Brunnermeier (2001) provides a comprehensive review of this literature.

Another belief-based story relies more on private, rather than public information, and in particular, on overconfidence about private information. Suppose that an investor has seen public information about the economy, and has formed a prior opinion about future cash-flow growth. He then does some research on his own and becomes overconfident about the information he gathers: he overestimates its accuracy and puts too much weight on it relative to his prior. If the private information is positive, he will push prices up too high relative to current dividends, again adding to return volatility.²¹

Price–dividend ratios and returns might also be excessively volatile because investors extrapolate past returns too far into the future when forming expectations of future returns. Such a story might again be based on representativeness and the law of small numbers. The same argument for why investors might extrapolate past cash flows too far into the future can be applied here to explain why they might do the same thing with past returns.

The reader will have noticed that we do not cite any specific papers in connection with these behavioral stories. This is because these ideas were originally put forward in papers whose primary focus is explaining cross-sectional anomalies such as the value premium, even though they also apply here in a natural way. In brief, many of those papers – which we discuss in detail in Section 5 – generate certain cross-sectional anomalies by building excessive time series variation into the price–earnings ratios of individual stocks.
It is therefore not surprising that the mechanisms proposed there might also explain the substantial time series variation in aggregate-level price–earnings ratios. In fact, it is perhaps satisfying that these behavioral theories simultaneously address both aggregate and firm-level evidence.

²¹ Campbell (2000), among others, notes that behavioral models based on cash-flow forecasts often ignore potentially important interest rate effects. If investors are forecasting excessively high cash-flow growth, pushing up prices, interest rates should also rise, thereby dampening the price rise. One response is that interest rates are governed by expectations about consumption growth, and in the short run, consumption and dividends can be somewhat delinked: even if dividend growth is expected to be high, this need not necessarily trigger an immediate interest rate response. Alternatively, one can try to specify investors’ expectations in such a way that interest rate effects become less important. Cecchetti, Lam and Mark (2000) take a step in this direction.

We close this section with a brief mention of “money illusion”, the confusion between real and nominal values first discussed by Fisher (1928), and more recently investigated by Shafir et al. (1997). In financial markets, Modigliani and Cohn (1979) and more recently, Ritter and Warr (2002), have argued that part of the variation in P/D ratios and returns may be due to investors mixing real and nominal quantities when forecasting future cash flows. The value of the stock market can be determined by discounting real cash flows at real rates, or nominal cash flows at nominal rates. At times of especially high or especially low inflation though, it is possible that some investors mistakenly discount real cash flows at nominal rates. If inflation increases, so will the nominal discount rate. If investors then discount the same set of cash flows at this higher rate, they will push the value of the stock market down. Of course, this calculation is incorrect: the same inflation which pushes up the discount rate should also push up future cash flows. On net, inflation should have little effect on market value. Such real vs. nominal confusion may therefore cause excessive variation in P/D ratios and returns and seems particularly relevant to understanding the low market valuations during the high inflation years of the 1970s, as well as the high market valuations during the low inflation 1990s.

4.2.2. Preferences

Barberis, Huang and Santos (2001) show that a straightforward extension of the version of their model discussed in Section 4.1 can explain both the equity premium and volatility puzzles. To do this, they appeal to experimental evidence about dynamic aspects of loss aversion. This evidence suggests that the degree of loss aversion is not the same in all circumstances but depends on prior gains and losses. In particular, Thaler and Johnson (1990) find that after prior gains, subjects take on gambles they normally do not, and that after prior losses, they refuse gambles that they normally accept. The first finding is sometimes known as the “house money effect”, reflecting gamblers’ increasing willingness to bet when ahead. One interpretation of this evidence is that losses are less painful after prior gains because they are cushioned by those gains. However, after being burned by a painful loss, people may become more wary of additional setbacks.²²

To capture these ideas, Barberis, Huang and Santos (2001) modify the utility function in Equation (11) to

$$E_0 \sum_{t=0}^{\infty} \rho^t \left[ \frac{C_t^{1-\gamma}}{1-\gamma} + b_0 \bar{C}_t^{-\gamma}\, \tilde{v}(X_{t+1}, z_t) \right]. \tag{15}$$
Here, $z_t$ is a state variable that tracks past gains and losses on the stock market. For any fixed $z_t$, the function $\tilde{v}$ is a piecewise linear function similar in form to $\hat{v}$, defined in Equation (12). However, the investors’ sensitivity to losses is no longer constant at
22 It is important to distinguish Thaler and Johnson’s (1990) evidence from other evidence presented by Kahneman and Tversky (1979) and discussed in Section 3, showing that people are risk averse over gains and risk seeking over losses. One set of evidence pertains to one-shot gambles, the other to sequences of gambles. Kahneman and Tversky’s (1979) evidence suggests that people are willing to take risks in order to avoid a loss; Thaler and Johnson’s (1990) evidence suggests that if these efforts are unsuccessful and the investor suffers an unpleasant loss, he will subsequently act in a more risk averse manner.
Ch. 18:
A Survey of Behavioral Finance
2.25, but is determined by $z_t$, in a way that reflects the experimental evidence described above. A model of this kind can help explain the volatility puzzle. Suppose that there is some good cash-flow news. This pushes the stock market up, generating prior gains for investors, who are now less scared of stocks: any losses will be cushioned by the accumulated gains. They therefore discount future cash flows at a lower rate, pushing prices up still further relative to current dividends and adding to return volatility.
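The idea that sensitivity to losses depends on prior outcomes can be made concrete with a small sketch. The functional form below is an assumption chosen for illustration and is not the specification used by Barberis, Huang and Santos (2001): the names `loss_sensitivity` and `v_tilde`, the slope `k`, and the convention that a state below 1 stands for prior gains and a state above 1 for prior losses are all hypothetical.

```python
BASE_LAMBDA = 2.25  # loss sensitivity with no prior gains or losses (value from the text)


def loss_sensitivity(z, k=3.0):
    """Hypothetical rule: sensitivity rises after prior losses (z > 1)
    and falls after prior gains (z < 1), but never drops below 1."""
    return max(1.0, BASE_LAMBDA + k * (z - 1.0))


def v_tilde(x, z, k=3.0):
    """Piecewise-linear utility of a gain or loss x, given the state z."""
    if x >= 0:
        return x  # gains are valued one for one
    return loss_sensitivity(z, k) * x  # losses are scaled by a state-dependent penalty


# The same one-unit loss hurts least after prior gains and most after prior losses.
for state in (0.8, 1.0, 1.2):
    print(state, v_tilde(-1.0, state))
```

Under these assumptions a loss following good news is penalized less than the baseline 2.25, which is the channel through which prior gains lower the discount rate applied to future cash flows.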
5. Application: The cross-section of average returns
While the behavior of the aggregate stock market is not easy to understand from the rational point of view, promising rational models have nonetheless been developed and can be tested against behavioral alternatives. Empirical studies of the behavior of individual stocks have unearthed a set of facts which is altogether more frustrating for the rational paradigm. Many of these facts are about the cross-section of average returns: they document that one group of stocks earns higher average returns than another. These facts have come to be known as “anomalies” because they cannot be explained by the simplest and most intuitive model of risk and return in the financial economist’s toolkit, the Capital Asset Pricing Model, or CAPM. We now outline some of the more salient findings in this literature and then consider some of the rational and behavioral approaches in more detail.
The size premium. This anomaly was first documented by Banz (1981). We report the more recent findings of Fama and French (1992). Every year from 1963 to 1990, Fama and French group all stocks traded on the NYSE, AMEX, and NASDAQ into deciles based on their market capitalization, and then measure the average return of each decile over the next year. They find that for this sample period, the average return of the smallest stock decile is 0.74% per month higher than the average return of the largest stock decile. This is certainly an anomaly relative to the CAPM: while stocks in the smallest decile do have higher betas, the difference in risk is not enough to explain the difference in average returns. 23
Long-term reversals. Every three years from 1926 to 1982, De Bondt and Thaler (1985) rank all stocks traded on the NYSE by their prior three year cumulative return and form two portfolios: a “winner” portfolio of the 35 stocks with the best prior record and a “loser” portfolio of the 35 worst performers.
They then measure the average return of these two portfolios over the three years subsequent to their formation. They
23 The last decade of data has served to reduce the size premium considerably. Gompers and Metrick (2001) argue that this is due to demand pressure for large stocks resulting from the growth of institutional investors, who prefer such stocks.
find that over the whole sample period, the average annual return of the loser portfolio is higher than the average return of the winner portfolio by almost 8% per year. The predictive power of scaled-price ratios. These anomalies, which are about the cross-sectional predictive power of variables like the book-to-market (B/M) and earnings-to-price (E/P) ratios, where some measure of fundamentals is scaled by price, have a long history in finance going back at least to Graham (1949), and more recently Dreman (1977), Basu (1983) and Rosenberg, Reid and Lanstein (1985). We concentrate on Fama and French’s (1992) more recent evidence. Every year, from 1963 to 1990, Fama and French group all stocks traded on the NYSE, AMEX and NASDAQ into deciles based on their book-to-market ratio, and measure the average return of each decile over the next year. They find that the average return of the highest-B/M-ratio decile, containing so called “value” stocks, is 1.53% per month higher than the average return on the lowest-B/M-ratio decile, “growth” or “glamour” stocks, a difference much higher than can be explained through differences in beta between the two portfolios. Repeating the calculations with the earnings–price ratio as the ranking measure produces a difference of 0.68% per month between the two extreme decile portfolios, again an anomalous result. 24 Momentum. Every month from January 1963 to December 1989, Jegadeesh and Titman (1993) group all stocks traded on the NYSE into deciles based on their prior six month return and compute average returns of each decile over the six months after portfolio formation. They find that the decile of biggest prior winners outperforms the decile of biggest prior losers by an average of 10% on an annual basis. Comparing this result to De Bondt and Thaler’s (1985) study of prior winners and losers illustrates the crucial role played by the length of the prior ranking period. 
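The sorting procedure shared by the size, scaled-price and momentum studies above can be sketched as follows. This is a minimal illustration on synthetic data, not a replication of any of the cited studies: the function name `decile_spread`, the simulated characteristic and the return process are assumptions made for the example.

```python
import random
from statistics import mean


def decile_spread(stocks, n_groups=10):
    """stocks: list of (characteristic, next_period_return) pairs.
    Sort by the characteristic, cut into n_groups equal-sized groups and
    return (group_means, spread), where spread is the mean return of the
    top group minus the mean return of the bottom group."""
    ranked = sorted(stocks, key=lambda s: s[0])
    n = len(ranked)
    groups = [ranked[i * n // n_groups:(i + 1) * n // n_groups]
              for i in range(n_groups)]
    group_means = [mean(ret for _, ret in group) for group in groups]
    return group_means, group_means[-1] - group_means[0]


# Synthetic cross-section (an assumption, not real data): returns load
# weakly on a book-to-market-like characteristic plus noise.
rng = random.Random(0)
sample = []
for _ in range(1000):
    characteristic = rng.uniform(0.1, 3.0)
    next_return = 0.005 * characteristic + rng.gauss(0.0, 0.02)
    sample.append((characteristic, next_return))

means, spread = decile_spread(sample)
print(f"top-minus-bottom decile spread: {spread:.4f}")
```

In the studies themselves the sort is repeated every formation period and the spread is averaged over time; changing the sorting variable (market capitalization, B/M, E/P, prior return) and the holding horizon gives the different anomalies.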
In one case, prior winners continue to win; in the other, they perform poorly. 25 A challenge to both behavioral and rational approaches is to explain why extending the formation period switches the result in this way. There is some evidence that tax-loss selling creates seasonal variation in the momentum effect. Stocks with poor performance during the year may later be subject to selling by investors keen to realize losses that can offset capital gains elsewhere. This selling pressure means that prior losers continue to lose, enhancing the momentum effect. At the turn of the year, though, the selling pressure eases off, allowing prior losers to rebound and weakening the momentum effect. A careful analysis by Grinblatt
24 Ball (1978) and Berk (1995) point out that the size premium and the scaled-price ratio effects emerge naturally in any model where investors apply different discount rates to different stocks: if investors discount a stock’s cash flows at a higher rate, that stock will typically have a lower market capitalization and a lower price-earnings ratio, but also higher returns. Note, however, that this view does not shed any light on whether the variation in discount rates is rationally justifiable or not.
25 In fact, De Bondt and Thaler (1985) also report that one-year big winners outperform one-year big losers over the following year, but do not make much of this finding.
and Moskowitz (1999) finds that on net, tax-loss selling may explain part of the momentum effect, but by no means all of it. In any case, while selling a stock for tax purposes is rational, a model of predictable price movements based on such behavior is not. Roll (1983) calls such explanations “stupid” since investors would have to be stupid not to buy in December if prices were going to increase in January. A number of studies have examined stock returns following important corporate announcements, a type of analysis known as an event study. Chapter 5 in this Handbook discusses many of these studies in detail; here, we summarize them briefly. Event studies of earnings announcements. Every quarter from 1974 to 1986, Bernard and Thomas (1989) group all stocks traded on the NYSE and AMEX into deciles based on the size of the surprise in their most recent earnings announcement. “Surprise” is measured relative to a simple random walk model of earnings. They find that on average, over the 60 days after the earnings announcement, the decile of stocks with surprisingly good news outperforms the decile with surprisingly bad news by an average of about 4%, a phenomenon known as post-earnings announcement drift. Once again, this difference in returns is not explained by differences in beta between the two portfolios. A later study by Chan, Jegadeesh and Lakonishok (1996) measures surprise in other ways – relative to analyst expectations, and by the stock price reaction to the news – and obtains similar results. 26 Event studies of dividend initiations and omissions. Michaely, Thaler and Womack (1995) study firms which announced initiation or omission of a dividend payment between 1964 and 1988. They find that on average, the shares of firms initiating (omitting) dividends significantly outperform (underperform) the market portfolio over the year after the announcement. Event studies of stock repurchases. 
Ikenberry, Lakonishok and Vermaelen (1995) look at firms which announced a share repurchase between 1980 and 1990, while Mitchell and Stafford (2001) study firms which did either self-tenders or share repurchases between 1960 and 1993. The latter study finds that on average, the shares of these firms outperform a control group matched on size and book-to-market by a substantial margin over the four year period following the event.
Event studies of primary and secondary offerings. Loughran and Ritter (1995) study firms which undertook primary or secondary equity offerings between 1970 and 1990.
26 Vuolteenaho (2002) combines a clean-surplus accounting version of the present value formula with Campbell’s (1991) log-linear decomposition of returns to estimate a measure of cash-flow news that is potentially more accurate than earnings announcements. Analogous to the post-earnings announcement studies, he finds that stocks with good cash-flow news subsequently have higher average returns than stocks with disappointing cash-flow news.
They find that the average return of shares of these firms over the five-year period after the issuance is markedly below the average return of shares of non-issuing firms matched to the issuing firms on size. Brav and Gompers (1997) and Brav, Geczy and Gompers (2000) argue that this anomaly may not be distinct from the scaled-price anomaly listed above: when the returns of event firms are compared to the returns of firms matched on both size and book-to-market, there is very little difference.
Long-term event studies like the last three analyses summarized above raise some thorny statistical problems. In particular, conducting statistical inference with long-term buy-and-hold post-event returns is a treacherous business. Barber and Lyon (1997), Lyon, Barber and Tsai (1999), Brav (2000), Fama (1998), Loughran and Ritter (2000) and Mitchell and Stafford (2001) are just a few of the papers that discuss this topic. Cross-sectional correlation is one important issue: if a certain firm announces a share repurchase shortly after another firm does, their four-year post event returns will overlap and cannot be considered independent. Although the problem is an obvious one, it is not easy to deal with effectively. Some recent attempts to do so, such as Brav (2000), suggest that the anomalous evidence in the event studies on dividend announcements, repurchase announcements, and equity offerings is statistically weaker than initially thought, although how much weaker remains controversial.
A more general concern with all the above empirical evidence is data-mining. After all, if we sort and rank stocks in enough different ways, we are bound to discover striking – but completely spurious – cross-sectional differences in average returns. A first response to the data-mining critique is to note that the above studies do not use the kind of obscure firm characteristics or marginal corporate announcements that would suggest data-mining.
Indeed, it is hard to think of an important class of corporate announcements that has not been associated with a claim about anomalous post-event returns. A more direct check is to perform out-of-sample tests. Interestingly, a good deal of the above evidence has been replicated in other data sets. Fama, French and Davis (2000) show that there is a value premium in the subsample of U.S. data that precedes the data set used in Fama and French (1992), while Fama and French (1998) document a value premium in international stock markets. Rouwenhorst (1998) shows that the momentum effect is alive and well in international stock market data.
If the empirical results are taken at face value, then the challenge to the rational paradigm is to show that the above cross-sectional evidence emerges naturally from a model with fully rational investors. In special cases, models of this form reduce to the CAPM, and we know that this does not explain the evidence. More generally, rational models predict a multifactor pricing structure,
$$\bar{r}_i - r_f = \beta_{i,1}(\bar{F}_1 - r_f) + \cdots + \beta_{i,K}(\bar{F}_K - r_f), \qquad (16)$$
where the factors proxy for marginal utility growth and where the loadings $\beta_{i,k}$ come from a time series regression of excess stock returns on excess factor returns,
$$r_{i,t} - r_{f,t} = \alpha_i + \beta_{i,1}(F_{1,t} - r_{f,t}) + \cdots + \beta_{i,K}(F_{K,t} - r_{f,t}) + \varepsilon_{i,t}. \qquad (17)$$
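The time series regression in Equation (17) can be sketched in a few lines. The code below is an illustrative ordinary least squares fit written with the standard library only; the function names and the two-factor synthetic data are assumptions for the example, and a real application would use actual excess returns and a numerical library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def estimate_loadings(excess_returns, excess_factors):
    """Regress one asset's excess returns on a constant and K excess factor
    returns. Returns (intercept, [loading_1, ..., loading_K])."""
    rows = [[1.0] + list(f) for f in excess_factors]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, excess_returns))
           for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    return coef[0], coef[1:]


# Synthetic example: returns built from two factors with known loadings.
factors = [(0.02, 0.01), (-0.01, 0.03), (0.03, -0.02),
           (0.00, 0.01), (-0.02, -0.01), (0.01, 0.02)]
returns = [0.001 + 1.2 * f1 - 0.5 * f2 for f1, f2 in factors]
intercept, loadings = estimate_loadings(returns, factors)
print(intercept, loadings)
```

A factor model prices a portfolio well when the estimated intercept is close to zero; a large intercept is the regression counterpart of an anomaly.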
To date, it has proved difficult to derive a multi-factor model which explains the cross-sectional evidence, although this remains a major research direction.
Alternatively, one can skip the step of deriving a factor model, and simply try a specific model to see how it does. This is the approach of Fama and French (1993, 1996). They show that a certain three factor model does a good job explaining the average returns of portfolios formed on size and book-to-market rankings. Put differently, the $\alpha_i$ intercepts in regression (17) are typically close to zero for these portfolios and for their choice of factors. The specific factors they use are the return on the market portfolio, the return on a portfolio of small stocks minus the return on a portfolio of large stocks – the “size” factor – and the return on a portfolio of value stocks minus the return on a portfolio of growth stocks – the “book-to-market” factor. By constructing these last two factors, Fama and French are isolating common factors in the returns of small stocks and value stocks, and their three factor model can be loosely motivated by the idea that this comovement is a systematic risk that is priced in equilibrium.
The low $\alpha_i$ intercepts obtained by Fama and French (1993, 1996) are not necessarily cause for celebration. After all, as Roll (1977) emphasizes, in any specific sample, it is always possible to mechanically construct a one factor model that prices average returns exactly. 27 This sounds a cautionary note: just because a factor model happens to work well does not necessarily mean that we are learning anything about the economic drivers of average returns. To be fair, Fama and French (1996) themselves admit that their results can only have their full impact once it is explained what it is about investor preferences and the structure of the economy that leads people to price assets according to their model.
One general feature of the rational approach is that it is loadings or betas, and not firm characteristics, that determine average returns. For example, a risk-based approach would argue that value stocks earn high returns not because they have high book-to-market ratios, but because such stocks happen to have a high loading on the book-to-market factor. Daniel and Titman (1997) cast doubt on this specific prediction by performing double sorts of stocks on both book-to-market ratios and loadings on book-to-market factors, and showing that stocks with different loadings but the same book-to-market ratio do not differ in their average returns. These results appear quite damaging to the rational approach, but they have also proved controversial. Using a longer data set and a different methodology, Fama, French and Davis (2000) claim to reverse Daniel and Titman’s findings. More generally, rational approaches to the cross-sectional evidence face a number of other obstacles. First, rational models typically measure risk as the covariance of
27 For any sample of observations on individual returns, choose any one of the ex-post mean-variance efficient portfolios. Roll (1977) shows that there is an exact linear relationship between the sample mean returns of the individual assets and their betas, computed with respect to the mean-variance efficient portfolio.
returns with marginal utility of consumption. Stocks are risky if they fail to pay out at times of high marginal utility – in “bad” times – and instead pay out when marginal utility is low – in “good” times. The problem is that for many of the above findings, there is little evidence that the portfolios with anomalously high average returns do poorly in bad times, whatever plausible measure of bad times is used. For example, Lakonishok, Shleifer and Vishny (1994) show that in their 1968 to 1989 sample period, value stocks do well relative to growth stocks even when the economy is in recession. Similarly, De Bondt and Thaler (1987) find that their loser stocks have higher betas than winners in up markets and lower betas in down markets – an attractive combination that no one would label “risky”. Second, some of the portfolios in the above studies – the decile of stocks with the lowest book-to-market ratios for example – earn average returns below the risk-free rate. It is not easy to explain why a rational investor would willingly accept a lower return than the T-Bill rate on a volatile portfolio. Third, Chopra, Lakonishok and Ritter (1992) and La Porta et al. (1997) show that a large fraction of the high (low) average returns to prior losers (winners) documented by De Bondt and Thaler (1985), and of the high (low) returns to value (growth) stocks, is earned over a very small number of days around earnings announcements. It is hard to tell a rational story for why the premia should be concentrated in this way, given that there is no evidence of changes in systematic risk around earnings announcements. Finally, in some of the examples given above, it is not just that one portfolio outperforms another on average. In some cases, the outperformance is present in almost every period of the sample. For example, in Bernard and Thomas’ (1989) study, firms with surprisingly good earnings outperform those with surprisingly poor earnings in 46 out of the 50 quarters studied. 
It is not easy to see any risk here that might justify the outperformance.
5.1. Belief-based models
There are a number of behavioral models which try to explain some of the above phenomena. We classify them based on whether their mechanism centers on beliefs or on preferences.
Barberis, Shleifer and Vishny (1998), BSV henceforth, argue that much of the above evidence is the result of systematic errors that investors make when they use public information to form expectations of future cash flows. They build a model that incorporates two of the updating biases from Section 3: conservatism, the tendency to underweight new information relative to priors; and representativeness, and in particular the version of representativeness known as the law of small numbers, whereby people expect even short samples to reflect the properties of the parent population.
When a company announces surprisingly good earnings, conservatism means that investors react insufficiently, pushing the price up too little. Since the price is too low, subsequent returns will be higher on average, thereby generating both post-earnings
announcement drift and momentum. After a series of good earnings announcements, though, representativeness causes people to overreact and push the price up too high. The reason is that after many periods of good earnings, the law of small numbers leads investors to believe that this is a firm with particularly high earnings growth, and hence to forecast high earnings in the future. After all, the firm cannot be “average”. If it were, then according to the law of small numbers, its earnings should appear average, even in short samples. Since the price is now too high, subsequent returns are too low on average, thereby generating long-term reversals and a scaled-price ratio effect.
To capture these ideas mathematically, BSV consider a model with a representative risk-neutral investor in which the true earnings process for all assets is a random walk. Investors, however, do not use the random-walk model to forecast future earnings. They think that at any time, earnings are being generated by one of two regimes: a “mean-reverting” regime, in which earnings are more mean-reverting than in reality, and a “trending” regime in which earnings trend more than in reality. The investor believes that the regime generating earnings changes exogenously over time and sees his task as trying to figure out which of the two regimes is currently generating earnings.
This framework offers one way of modelling the updating biases described above. Including a “trending” regime in the model captures the effect of representativeness by allowing investors to put more weight on trends than they should. Conservatism suggests that people may put too little weight on the latest piece of earnings news relative to their prior beliefs. In other words, when they get a good piece of earnings news, they effectively act as if part of the shock will be reversed in the next period, that is, as if they believe in a “mean-reverting” regime.
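The investor's inference problem in this two-regime setup can be sketched as a simple Bayesian update. The code below is a stylized illustration rather than the BSV model itself: it assumes that each earnings shock is either positive or negative, and the parameter names and values (`pi_low`, `pi_high`, `lam1`, `lam2`) are assumptions chosen for the example.

```python
def update_regime_belief(q, same_sign, pi_low=1/3, pi_high=3/4,
                         lam1=0.1, lam2=0.3):
    """One update of q, the probability the investor assigns to the
    mean-reverting regime.

    pi_low / pi_high: believed probability that the next shock has the same
        sign as the last one in the mean-reverting / trending regime.
    lam1: believed probability of switching from mean-reverting to trending.
    lam2: believed probability of switching from trending to mean-reverting.
    same_sign: True if the new shock has the same sign as the previous one.
    """
    # Belief about the regime before seeing the new shock.
    prior_mr = (1 - lam1) * q + lam2 * (1 - q)
    prior_tr = lam1 * q + (1 - lam2) * (1 - q)
    # Likelihood of the observed continuation or reversal in each regime.
    like_mr = pi_low if same_sign else 1 - pi_low
    like_tr = pi_high if same_sign else 1 - pi_high
    return prior_mr * like_mr / (prior_mr * like_mr + prior_tr * like_tr)


def belief_path(shock_signs, q0=0.5):
    """Track q through a sequence of shock signs (+1 or -1)."""
    q, path = q0, []
    for previous, current in zip(shock_signs, shock_signs[1:]):
        q = update_regime_belief(q, same_sign=(previous == current))
        path.append(q)
    return path


streak = belief_path([+1] * 6)            # a run of good news
alternating = belief_path([+1, -1] * 3)   # news that keeps reversing
print([round(q, 3) for q in streak])
print([round(q, 3) for q in alternating])
```

Under these assumptions a run of same-sign shocks shifts weight toward the trending regime, which is the overreaction channel, while an isolated shock is read mostly through the mean-reverting regime, which is the underreaction channel.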
BSV confirm that for a wide range of parameter values, this model does indeed generate post-earnings announcement drift, momentum, long-term reversals and cross-sectional forecasting power for scaled-price ratios. 28 Daniel, Hirshleifer and Subrahmanyam (1998, 2001), DHS henceforth, stress biases in the interpretation of private, rather than public information. Imagine that the investor does some research on his own to try to determine a firm’s future cash flows. DHS assume that he is overconfident about this information; in particular, they argue that investors are more likely to be overconfident about private information they have worked hard to generate than about public information. If the private information is positive, overconfidence means that investors will push prices up too far relative to fundamentals. Future public information will slowly pull prices back to their correct value, thus generating long-term reversals and a scaled-price effect. To get momentum and a post-earnings announcement effect, DHS assume that the public information alters the investor’s confidence in his original private information in
28 Poteshman (2001) finds evidence of a BSV-type expectations formation process in the options market. He shows that when pricing options, traders appear to underreact to individual daily changes in instantaneous variance, while overreacting to longer sequences of increasing or decreasing changes in instantaneous variance.
an asymmetric fashion, a phenomenon known as self-attribution bias: public news which confirms the investor’s research strongly increases the confidence he has in that research. Disconfirming public news, though, is given less attention, and the investor’s confidence in the private information remains unchanged. This asymmetric response means that initial overconfidence is on average followed by even greater overconfidence, generating momentum.
If, as BSV and DHS argue, long-term reversals and the predictive power of scaled-price ratios are driven by excessive optimism or pessimism about future cash flows followed by a correction, then most of the correction should occur at those times when investors find out that their initial beliefs were too extreme, in other words, at earnings announcement dates. The findings of Chopra, Lakonishok and Ritter (1992) and La Porta et al. (1997), who show that a large fraction of the premia to prior losers and to value stocks is earned around earnings announcement days, strongly confirm this prediction.
Perhaps the simplest way of capturing much of the cross-sectional evidence is positive feedback trading, where investors buy more of an asset that has recently gone up in value [De Long et al. (1990b), Barberis and Shleifer (2003)]. If a company’s stock price goes up this period on good earnings, positive feedback traders buy the stock in the following period, causing a further price rise. On the one hand, this generates momentum and post-earnings announcement drift. On the other hand, since the price has now risen above what is justified by fundamentals, subsequent returns will on average be too low, generating long-term reversals and a scaled-price ratio effect.
The simplest way of motivating positive feedback trading is extrapolative expectations, where investors’ expectations of future returns are based on past returns. This in turn, may be due to representativeness and to the law of small numbers in particular.
The same argument made by BSV as to why investors might extrapolate past cash flows too far into the future can be applied here to explain why they might extrapolate past returns too far into the future. De Long et al. (1990b) note that institutional features such as portfolio insurance or margin calls can also generate positive feedback trading.
Positive feedback trading also plays a central role in the model of Hong and Stein (1999), although in this case it emerges endogenously from more primitive assumptions. In this model, two boundedly rational groups of investors interact, where bounded rationality means that investors are only able to process a subset of available information. “Newswatchers” make forecasts based on private information, but do not condition on past prices. “Momentum traders” condition only on the most recent price change. Hong and Stein also assume that private information diffuses slowly through the population of newswatchers. Since these investors are unable to extract each other’s private information from prices, the slow diffusion generates momentum. Momentum traders are then added to the mix. Given what they are allowed to condition on, their optimal strategy is to engage in positive feedback trading: a price increase last period is a sign that good private information is diffusing through the economy. By buying,
momentum traders hope to profit from the continued diffusion of information. This behavior preserves momentum, but also generates price reversals: since momentum traders cannot observe the extent of news diffusion, they keep buying even after price has reached fundamental value, generating an overreaction that is only later reversed.
These four models differ most in their explanation of momentum. In two of the models – BSV and Hong and Stein (1999) – momentum is due to an initial underreaction followed by a correction. In De Long et al. (1990b) and DHS, it is due to an initial overreaction followed by even more overreaction. Within each pair, the stories are different again. 29
Hong, Lim and Stein (2000) present supportive evidence for the view of Hong and Stein (1999) that momentum is due simply to slow diffusion of private information through the economy. They argue that the diffusion of information will be particularly slow among small firms and among firms with low analyst coverage, and that the momentum effect should therefore be more prominent there, a prediction they confirm in the data. They also find that among firms with low analyst coverage, momentum is almost entirely driven by prior losers continuing to lose. They argue that this, too, is consistent with a diffusion story. If a firm not covered by analysts is sitting on good news, it will do its best to convey the news to as many people as possible, and as quickly as possible; bad news, however, will be swept under the carpet, making its diffusion much slower.
5.2. Belief-based models with institutional frictions
Some authors have argued that models which combine mild assumptions about investor irrationality with institutional frictions may offer a fruitful way of thinking about some of the anomalous cross-sectional evidence. The institutional friction that has attracted the most attention is short-sale constraints.
As mentioned in Section 2.2, these can be thought of as anything which makes investors less willing to establish a short position than a long one. They include the direct cost of shorting, namely the lending fee; the risk that the loan is recalled by the lender at an inopportune moment; as well as legal restrictions: a large fraction of mutual funds are not allowed to short stocks. Several papers argue that when investors differ in their beliefs, the existence of short-sale constraints can generate deviations from fundamental value and in particular, explain why stocks with high price–earnings ratios earn lower average returns in the cross-section. The simplest way of motivating the assumption of heterogeneous beliefs is overconfidence, which is why that assumption is often thought of as capturing a mild form of irrationality. In the absence of overconfidence, investors’ beliefs converge
29 In particular, the models make different predictions about how individual investors would trade following certain sequences of past returns. Armed with transaction-level data, Hvidkjaer (2001) exploits this to provide initial evidence that may distinguish the theories.
rapidly as they hear each other’s opinions and hence deduce each other’s private information.
There are at least two mechanisms through which differences of opinion and short-sale constraints can generate price–earnings ratios that are too high, and thereby explain why price–earnings ratios predict returns in the cross-section. Miller (1977) notes that when investors hold different views about a stock, those with bullish opinions will, of course, take long positions. Bearish investors, on the other hand, want to short the stock, but being unable to do so, they sit out of the market. Stock prices therefore reflect only the opinions of the most optimistic investors which, in turn, means that they are too high and that they will be followed by lower returns.
Harrison and Kreps (1978) and Scheinkman and Xiong (2003) argue that in a dynamic setting, a second, speculation-based mechanism arises. They show that when there are differences in beliefs, investors will be happy to buy a stock for more than its fundamental value in anticipation of being able to sell it later to other investors even more optimistic than themselves. Note that short-sale constraints are essential to this story: in their absence, an investor can profit from another’s greater optimism by simply shorting the stock. With short-sale constraints, the only way to do so is to buy the stock first, and then sell it on later.
Both types of models make the intriguing prediction that stocks which investors disagree about more will have higher price–earnings ratios and lower subsequent returns. Three recent papers test this prediction, each using a different measure of differences of opinion. Diether, Malloy and Scherbina (2002) use IBES data on analyst forecasts to obtain a direct measure of heterogeneity of opinion.
They group stocks into quintiles based on the level of dispersion in analysts’ forecasts of current year earnings and confirm that the highest dispersion portfolio earns lower average returns than the lowest dispersion portfolio. Chen, Hong and Stein (2002) use “breadth of ownership” – defined roughly as the fraction of mutual funds that hold a particular stock – as a proxy for divergence of opinion about the stock. The more dispersion in opinions there is, the more mutual funds will need to sit out the market due to short sales constraints, leading to lower breadth. Chen et al. predict, and confirm in the data, that stocks experiencing a decrease in breadth subsequently have lower average returns compared to stocks whose breadth increases. Jones and Lamont (2002) use the cost of short-selling a stock – in other words, the lending fee – to measure differences of opinion about that stock. The idea is that if there is a lot of disagreement about a stock’s prospects, many investors will want to short the stock, thereby pushing up the cost of doing so. Jones and Lamont confirm that stocks with higher lending fees have higher price–earnings ratios and earn lower subsequent returns. It is interesting to note that their data set spans the years from 1926 to 1933. At that time, there existed a centralized market for borrowing stocks and lending fees were published daily in the Wall Street Journal. Today, by contrast, stock lending is an over-the-counter market, and data on lending fees is harder to come by.
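The Miller (1977) mechanism discussed above, in which prices reflect only the views of the most optimistic investors, can be illustrated with a toy market. The setup is an assumption made for the example and is not Miller's own model: each investor holds at most one share, nobody can short, and the fixed supply of shares is smaller than the number of investors. The function names and valuations are hypothetical.

```python
def constrained_price(valuations, shares_outstanding):
    """Market-clearing price when nobody can short and each investor holds
    at most one share: the shares end up with the most optimistic investors,
    and the price equals the valuation of the least optimistic holder."""
    ranked = sorted(valuations, reverse=True)
    return ranked[shares_outstanding - 1]


def mean_valuation(valuations):
    """Average opinion across all investors, pessimists included."""
    return sum(valuations) / len(valuations)


# Two stocks with the same average opinion but different disagreement.
low_disagreement = [95, 98, 100, 102, 105]
high_disagreement = [60, 80, 100, 120, 140]

for label, vals in (("low", low_disagreement), ("high", high_disagreement)):
    print(label, constrained_price(vals, 2), mean_valuation(vals))
```

In this toy market both stocks have the same average valuation, yet the stock with more disagreement clears at a higher price because the pessimists' views are not reflected. This is the sense in which greater dispersion of opinion goes together with higher prices and lower subsequent returns.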
In other related work, Hong and Stein (2003) show that short-sale constraints and differences of opinion also have implications for higher order moments, in that they can lead to skewness. The intuition is that when a stock’s price goes down, more information is revealed: by seeing at what point they enter the market, we learn the valuations of those investors whose pessimistic views could not initially be reflected in the stock price, because of short-sale constraints. When the stock market goes up, the sidelined investors stay out of the market and there is less information revelation. This increase in volatility after a downturn is the source of the skewness. One prediction of this idea is that stocks which investors disagree about more should exhibit greater skewness. Chen, Hong and Stein (2001) test this idea using increases in turnover as a sign of investor disagreement. They show that stocks whose turnover increases subsequently display greater skewness.
5.3. Preferences
Earlier, we discussed Barberis, Huang and Santos (2001), which tries to explain aggregate stock market behavior by combining loss aversion and narrow framing with an assumption about how the degree of loss aversion changes over time. Barberis and Huang (2001) show that applying the same ideas to individual stocks can generate the evidence on long-term reversals and on scaled-price ratios. The key idea is that when investors hold a number of different stocks, narrow framing may induce them to derive utility from gains and losses in the value of individual stocks. The specification of this additional source of utility is exactly the same as in BHS, except that it is now applied at the individual stock level instead of at the portfolio level: the investor is loss averse over individual stock fluctuations and the pain of a loss on a specific stock depends on that stock’s past performance. 
To see how this model generates a value premium, consider a stock which has had poor returns several periods in a row. Precisely because the investor focuses on individual stock gains and losses, he finds this painful and becomes especially sensitive to the possibility of further losses on the stock. In effect, he perceives the stock as riskier, and discounts its future cash flows at a higher rate: this lowers its price–earnings ratio and leads to higher subsequent returns, generating a value premium. In one sense, this model is narrower than those in the “beliefs” section, Section 5.1, as it does not claim to address momentum. In another sense, it is broader, in that it simultaneously explains the equity premium and derives the risk-free rate endogenously. The models we describe in Sections 5.1, 5.2 and 5.3 focus primarily on momentum, long-term reversals, the predictive power of scaled-price ratios and post-earnings announcement drift. What about the other examples of anomalous evidence with which we began Section 5? In Section 7, we argue that the long-run return patterns following equity issuance and repurchases may be the result of rational managers responding to the kinds of noise traders analyzed in the preceding behavioral models. In short, if investors cause prices to swing away from fundamental value, managers may try to
time these cycles, issuing equity when it is overpriced, and repurchasing it when it is cheap. In such a world, equity issues will indeed be followed by low returns, and repurchases by high returns. The models we have discussed so far do not, however, shed light on the size anomaly, nor on the dividend announcement event study.
6. Application: Closed-end funds and comovement
6.1. Closed-end funds
Closed-end funds differ from more familiar open-end funds in that they only issue a fixed number of shares. These shares are then traded on exchanges: an investor who wants to buy a share of a closed-end fund must go to the exchange and buy it from another investor at the prevailing price. By contrast, should he want to buy a share of an open-end fund, the fund would create a new share and sell it to him at its net asset value, or NAV, the per share market value of its asset holdings. The central puzzle about closed-end funds is that fund share prices differ from NAV. The typical fund trades at a discount to NAV of about 10% on average, although the difference between price and NAV varies substantially over time. When closed-end funds are created, the share price is typically above NAV; when they are terminated, either through liquidation or open-ending, the gap between price and NAV closes.
A number of rational explanations for the average closed-end fund discount have been proposed. These include expenses, expectations about future fund manager performance, and tax liabilities. These factors can go some way to explaining certain aspects of the closed-end fund puzzle. However, none of them can satisfactorily explain all aspects of the evidence. For example, management fees can explain why funds usually sell at discounts, but not why they typically initially sell at a premium, nor why discounts tend to vary from week to week.
Lee, Shleifer and Thaler (1991), LST henceforth, propose a simple behavioral view of these closed-end fund puzzles. They argue that some of the individual investors who are the primary owners of closed-end funds are noise traders, exhibiting irrational swings in their expectations about future fund returns. Sometimes they are too optimistic, while at other times, they are too pessimistic. 
Changes in their sentiment affect fund share prices and hence also the difference between prices and net asset values. 30
30 For the noise traders to affect the difference between price and NAV rather than just price, it must be that they are more active traders of closed-end fund shares than they are of assets owned by the funds. As evidence for this, LST point out that while funds are primarily owned by individual investors, the funds’ assets are not.
This view provides a clean explanation of all aspects of the closed-end fund puzzle. Owners of closed-end funds have to contend with two sources of risk: fluctuations
in the value of the funds’ assets, and fluctuations in noise trader sentiment. If this second risk is systematic – we return to this issue shortly – rational investors will demand compensation for it. In other words, they will require that the fund’s shares trade at a discount to NAV. This also explains why new closed-end funds are often sold at a premium. Entrepreneurs will choose to create closed-end funds at times of investor exuberance, when they know that they can sell fund shares for more than they are worth. On the other hand, when a closed-end fund is liquidated, rational investors no longer have to worry about changes in noise trader sentiment because they know that at liquidation, the fund price will equal NAV. They therefore no longer demand compensation for this risk, and the fund price rises towards NAV.
An immediate prediction of the LST view is that prices of closed-end funds should comove strongly, even if the cash-flow fundamentals of the assets held by the funds do not: if noise traders become irrationally pessimistic, they will sell closed-end funds across the board, depressing their prices regardless of cash-flow news. LST confirm in the data that closed-end fund discounts are highly correlated.
The LST story depends on noise trader risk being systematic. There is good reason to think that it is. If the noise traders who hold closed-end funds also hold other assets, then negative changes in sentiment, say, will drive down the prices of closed-end funds and of their other holdings, making the noise trader risk systematic. To check this, LST compute the correlation of closed-end fund discounts with another group of assets primarily owned by individuals, small stocks. Consistent with the noise trader risk being systematic, they find a significant positive correlation.
6.2. Comovement
The LST model illustrates that behavioral models can make interesting predictions not only about the average level of returns, but also about patterns of comovement. 
In particular, it explains why the prices of closed-end funds comove so strongly, and also why closed-end funds as a class comove with small stocks. This raises the hope that behavioral models might be able to explain other puzzling instances of comovement as well. Before studying this in more detail, it is worth setting out the traditional view of return comovement. This view, derived from economies without frictions and with rational investors, holds that comovement in prices reflects comovement in fundamental values. Since, in a frictionless economy with rational investors, price equals fundamental value – an asset’s rationally forecasted cash flows discounted at a rate appropriate for their risk – any comovement in prices must be due to comovement in fundamentals. There is little doubt that many instances of return comovement can be explained by fundamentals: stocks in the automotive industry move together primarily because their earnings are correlated. The closed-end fund evidence shows that the fundamentals-based view of comovement is, at best, incomplete: in that case, the prices of closed-end funds comove even
though their fundamentals do not. 31 Other evidence is just as puzzling. Froot and Dabora (1999) study “twin stocks”, which are claims to the same cash-flow stream, but are traded in different locations. The Royal Dutch/Shell pair, discussed in Section 2, is perhaps the best known example. If return comovement is simply a reflection of comovement in fundamentals, these two stocks should be perfectly correlated. In fact, as Froot and Dabora show, Royal Dutch comoves strongly with the S&P 500 index of U.S. stocks, while Shell comoves with the FTSE index of UK stocks. Fama and French (1993) uncover salient common factors in the returns of small stocks, as well as in the returns of value stocks. In order to test the rational view of comovement, Fama and French (1995) investigate whether these strong common factors can be traced to common factors in news about the earnings of these stocks. While they do uncover a common factor in the earnings news of small stocks, as well as in the earnings news of value stocks, these cash-flow factors are weaker than the factors in returns and there is little evidence that the return factors are driven by the cash-flow factors. Once again, there appears to be comovement in returns that has little to do with fundamentals-based comovement. 32 In response to this evidence, researchers have begun to posit behavioral theories of comovement. LST is one such theory. To state their argument more generally, they start by observing that many investors choose to trade only a subset of all available securities. As these investors’ risk aversion or sentiment changes, they alter their exposure to the particular securities they hold, thereby inducing a common factor in the returns of these securities. Put differently, this “habitat” view of comovement predicts that there will be a common factor in the returns of securities that are the primary holdings of a specific subset of investors, such as individual investors. 
31 Bodurtha et al. (1993) and Hardouvelis et al. (1994) provide further interesting examples of a delinking between fundamentals-based comovement and return comovement in the closed-end fund market. They study closed-end country funds, whose assets trade in a different location from the funds themselves and find that the funds comove as much with the national stock market in the country where they are traded as with the national stock market in the country where their assets are traded. For example, a closed-end fund invested in German equities but traded in the USA typically comoves as much with the U.S. stock market as with the German stock market.
32 In principle, comovement can also be rationally generated through changes in discount rates. However, changes in interest rates or risk aversion induce a common factor in the returns on all stocks, and do not explain why a particular group of stocks comoves. A common factor in news about the risk of certain assets may also be a source of comovement for those assets, but there is little direct evidence to support such a mechanism in the case of small stocks or value stocks.
This story seems particularly appropriate for thinking about closed-end funds, and also for Froot and Dabora’s evidence. A second behavioral view of comovement was recently proposed by Barberis and Shleifer (2003). They argue that to simplify the portfolio allocation process, many investors first group stocks into categories such as small-cap stocks or automotive industry stocks, and then allocate funds across these various categories. If these categories are also adopted by noise traders, then as these traders move funds from
one category to another, the price pressure from their coordinated demand will induce common factors in the returns of stocks that happen to be classified into the same category, even if those stocks’ cash flows are largely uncorrelated. In particular, this view predicts that when an asset is added to a category, it should begin to comove more with that category than before. Barberis, Shleifer and Wurgler (2001) test this “category” view of comovement by taking a sample of stocks that have been added to the S&P 500, and computing the betas of these stocks with the S&P 500 both before and after inclusion. Based on both univariate and multivariate regressions, they show that upon inclusion, a stock’s beta with the S&P 500 rises significantly, as does the fraction of its variance that is explained by the S&P 500, while its beta with stocks outside the index falls. 33 This result does not sit well with the cash-flow view of comovement – addition to the S&P 500 is not intended to carry any information about the covariance of a stock’s cash flows with other stocks’ cash flows – but emerges naturally from a model where prices are affected by category-level demand shocks.
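The before-and-after comparison used to test the category view can be sketched in a few lines. The example below is only an illustration of the measurement, not a replication of Barberis, Shleifer and Wurgler (2001): the returns are simulated, the rise in beta is built in by construction, and the parameter values (betas of 0.8 and 1.2, 250 daily observations, the noise levels) and the function names are all assumptions chosen for the sketch.

```python
import random


def ols_beta_and_r2(stock, index):
    """Univariate OLS slope of stock returns on index returns, and R-squared."""
    n = len(index)
    mean_x = sum(index) / n
    mean_y = sum(stock) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(index, stock)) / n
    var_x = sum((x - mean_x) ** 2 for x in index) / n
    var_y = sum((y - mean_y) ** 2 for y in stock) / n
    beta = cov / var_x
    r2 = cov * cov / (var_x * var_y)  # fraction of variance explained
    return beta, r2


def simulate(true_beta, n_days, rng):
    """Simulated (not real) daily returns: stock = beta * index + noise."""
    index = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    stock = [true_beta * x + rng.gauss(0.0, 0.005) for x in index]
    return stock, index


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pre_stock, pre_index = simulate(0.8, 250, rng)     # hypothetical pre-inclusion window
post_stock, post_index = simulate(1.2, 250, rng)   # hypothetical post-inclusion window

beta_pre, r2_pre = ols_beta_and_r2(pre_stock, pre_index)
beta_post, r2_post = ols_beta_and_r2(post_stock, post_index)
print(f"beta before: {beta_pre:.2f}  R2 before: {r2_pre:.2f}")
print(f"beta after:  {beta_post:.2f}  R2 after:  {r2_post:.2f}")
```

With real data, the two windows would be replaced by a stock’s returns before and after its addition to the index; the multivariate version would add the returns of non-index stocks as a second regressor.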
7. Application: Investor behavior
Behavioral finance has also had some success in explaining how certain groups of investors behave, and in particular, what kinds of portfolios they choose to hold and how they trade over time. The goal here is less controversial than in the previous three sections: it is simply to explain the actions of certain investors, and not necessarily to claim that these actions also affect prices. Two factors make this type of research increasingly important. First, now that the costs of entering the stock market have fallen, more and more individuals are investing in equities. Second, the worldwide trend toward defined contribution retirement savings plans, and the possibility of individual accounts in social security systems mean that individuals are more responsible for their own financial well-being in retirement. It is therefore natural to ask how well they are handling these tasks. We now describe some of the evidence on the actions of investors and the behavioral ideas that have been used to explain it.
7.1. Insufficient diversification
A large body of evidence suggests that investors diversify their portfolio holdings much less than is recommended by normative models of portfolio choice. First, investors exhibit a pronounced “home bias”. French and Poterba (1991) report that investors in the USA, Japan and the UK allocate 94%, 98%, and 82% of their overall equity investment, respectively, to domestic equities. It has not been easy to
33 Similar results from univariate regressions can also be found in earlier work by Vijh (1994).
explain this fact on rational grounds [Lewis (1999)]. Indeed, normative portfolio choice models that take human capital into account typically advise investors to short their national stock market, because of its high correlation with their human capital [Baxter and Jermann (1997)]. Some studies have found an analog to home bias within countries. Using an especially detailed data set from Finland, Grinblatt and Keloharju (2001) find that investors in that country are much more likely to hold and trade stocks of Finnish firms which are located close to them geographically, which use their native tongue in company reports, and whose chief executive shares their cultural background. Huberman (2001) studies the geographic distribution of shareholders of U.S. Regional Bell Operating Companies (RBOCs) and finds that investors are much more likely to hold shares in their local RBOC than in out-of-state RBOCs. Finally, studies of allocation decisions in 401(k) plans find a strong bias towards holding own company stock: over 30% of defined contribution plan assets in large U.S. companies are invested in employer stock, much of this representing voluntary contributions by employees [Benartzi (2001)]. In Section 3, we discussed evidence showing that people dislike ambiguous situations, where they feel unable to specify a gamble’s probability distribution. Often, these are situations where they feel that they have little competence in evaluating a certain gamble. On the other hand, people show an excessive liking for familiar situations, where they feel they are in a better position than others to evaluate a gamble. Ambiguity and familiarity offer a simple way of understanding the different examples of insufficient diversification. 
Investors may find their national stock markets more familiar – or less ambiguous – than foreign stock indices; they may find firms situated close to them geographically more familiar than those located further away; and they may find their employer’s stock more familiar than other stocks. 34 Since familiar assets are attractive, people invest heavily in those, and invest little or nothing at all in ambiguous assets. Their portfolios therefore appear undiversified relative to the predictions of standard models that ignore the investor’s degree of confidence in the probability distribution of a gamble. Not all evidence of home bias should be interpreted as a preference for the familiar. Coval and Moskowitz (1999) show that U.S. mutual fund managers tend to hold stocks whose company headquarters are located close to their funds’ headquarters. However, Coval and Moskowitz’s (2001) finding that these local holdings subsequently perform well suggests that an information story is at work here, not a preference for the familiar. It is simply less costly to research local firms and so fund managers do indeed focus on those firms, picking out the stocks with higher expected returns. There is no obvious information-based explanation for the results of French and Poterba (1991), Huberman
34 Particularly relevant to this last point is survey data showing that people consider their own company stock less risky than a diversified index [Driscoll et al. (1995)].
(2001) or Benartzi (2001), while Grinblatt and Keloharju (2001) argue against such an interpretation of their findings.
7.2. Naive diversification
Benartzi and Thaler (2001) find that when people do diversify, they do so in a naive fashion. In particular, they provide evidence that in 401(k) plans, many people seem to use strategies as simple as allocating 1/n of their savings to each of the n available investment options, whatever those options are. Some evidence that people think in this way comes from the laboratory. Benartzi and Thaler ask subjects to make an allocation decision in each of the following three conditions: first, between a stock fund and a bond fund; next, between a stock fund and a balanced fund, which invests 50% in stocks and 50% in bonds; and finally, between a bond fund and a balanced fund. They find that in all three cases, a 50:50 split across the two funds is a popular choice, although of course this leads to very different effective choices between stocks and bonds: the average allocation to stocks in the three conditions was 54%, 73% and 35%, respectively. The 1/n diversification heuristic and other similar naive diversification strategies predict that in 401(k) plans which offer predominantly stock funds, investors will allocate more to stocks. Benartzi and Thaler test this in a sample of 170 large retirement savings plans. They divide the plans into three groups based on the fraction of funds – low, medium, or high – they offer that are stock funds. The allocation to stocks increases across the three groups, from 49% to 60% to 64%, confirming the initial prediction.
7.3. Excessive trading
One of the clearest predictions of rational models of investing is that there should be very little trading. In a world where rationality is common knowledge, I am reluctant to buy if you are ready to sell. In contrast to this prediction, the volume of trading on the world’s stock exchanges is very high. 
Furthermore, studies of individuals and institutions suggest that both groups trade more than can be justified on rational grounds. Barber and Odean (2000) examine the trading activity from 1991 to 1996 in a large sample of accounts at a national discount brokerage firm. They find that after taking trading costs into account, the average return of investors in their sample is well below the return of standard benchmarks. Put simply, these investors would do a lot better if they traded less. The underperformance in this sample is largely due to transaction costs. However, there is also some evidence of poor security selection: in a similar data set covering the 1987 to 1993 time period, Odean (1999) finds that the average gross return of stocks that investors buy, over the year after they buy them, is lower than the average gross return of stocks that they sell, over the year after they sell them.
The most prominent behavioral explanation of such excessive trading is overconfidence: people believe that they have information strong enough to justify a trade, whereas in fact the information is too weak to warrant any action. This hypothesis immediately predicts that people who are more overconfident will trade more and, because of transaction costs, earn lower returns. Consistent with this, Barber and Odean (2000) show that the investors in their sample who trade the most earn by far the lowest average returns. Building on evidence that men are more overconfident than women, and using the same data as in their earlier study, Barber and Odean (2001) predict and confirm that men trade more and earn lower returns on average. Working with the same data again, Barber and Odean (2002a) study the subsample of individual investors who switch from phone-based to online trading. They argue that for a number of reasons, the switch should be accompanied by an increase in overconfidence. First, better access to information and a greater degree of control – both features of an online trading environment – have been shown to increase overconfidence. Moreover, the investors who switch have often earned high returns prior to switching, which may only increase their overconfidence further. If this is indeed the case, they should trade more actively after switching and perform worse. Barber and Odean confirm these predictions.
7.4. The selling decision
Several studies find that investors are reluctant to sell assets trading at a loss relative to the price at which they were purchased, a phenomenon labelled the “disposition effect” by Shefrin and Statman (1985). Working with the same discount brokerage data used in the Odean (1999) study from above, Odean (1998) finds that the individual investors in his sample are more likely to sell stocks which have gone up in value relative to their purchase price, rather than stocks which have gone down. 
It is hard to explain this behavior on rational grounds. Tax considerations point to the selling of losers, not winners. 35 Nor can one argue that investors rationally sell the winners because of information that their future performance will be poor. Odean reports that the average performance of stocks that people sell is better than that of stocks they hold on to.
35 Odean (1998) does find that in December, investors prefer to sell past losers rather than past winners, but overall, this effect is swamped by a strong preference for selling past winners in the remaining 11 months.
Two behavioral explanations of these findings have been suggested. First, investors may have an irrational belief in mean-reversion. A second possibility relies on prospect theory and narrow framing. We have used these ingredients before, but this time it is not loss aversion that is central, but rather the concavity (convexity) of the value function in the region of gains (losses). To see the argument, suppose that a stock that was originally bought at $50 now sells for $55. Should the investor sell it at this point? Suppose that the gains and losses of
prospect theory refer to the sale price minus the purchase price. In that case, the utility from selling the stock now is v(5). Alternatively, the investor can wait another period, whereupon we suppose that the stock could go to $50 or $60 with equal probability; in other words, we abstract from belief-based trading motives by saying that the investor expects the stock price to stay flat. The expected value of waiting and selling next period is then ½ v(0) + ½ v(10). Since the value function v is concave in the region of gains, v(5) exceeds this expected value and the investor sells now. In a different scenario, the stock may currently be trading at $45. This time, the comparison is between v(−5) and ½ v(−10) + ½ v(0), assuming a second period distribution of $40 and $50 with equal probability. Convexity of v in the region of losses makes the second expression the larger one, which pushes the investor to wait. Intuitively, by not selling, he is gambling that the stock will eventually break even, saving him from having to experience a painful loss.
The disposition effect is not confined to individual stocks. In an innovative study, Genesove and Mayer (2001) find evidence of a reluctance to sell at a loss in the housing market. They show that sellers whose expected selling price is below their original purchase price set an asking price that exceeds the asking price of other sellers with comparable houses. Moreover, this is not simply wishful thinking on the sellers’ part that is later corrected by the market: sellers facing a possible loss do actually transact at considerably higher prices than other sellers.
Coval and Shumway (2000) study the behavior of professional traders in the Treasury Bond futures pit at the CBOT. If the gains and losses of prospect theory are taken to be daily profits and losses, the curvature of the value function implies that traders with profits (losses) by the middle of the trading day will take less (more) risk in their afternoon trading. This prediction is borne out in the data. 
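The sell-or-wait comparison in the $50 purchase example above can be checked numerically. The sketch below assumes a specific functional form that the passage itself does not state: the power value function of Tversky and Kahneman (1992), with curvature 0.88 and loss-aversion coefficient 2.25. Any value function that is concave over gains and convex over losses would give the same qualitative answer; the function names are illustrative.

```python
ALPHA = 0.88   # curvature parameter (assumed, Tversky-Kahneman 1992 estimate)
LAMBDA = 2.25  # loss-aversion coefficient (assumed, same source)


def value(x):
    """Prospect-theory value of a gain (x >= 0) or loss (x < 0)."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * (-x) ** ALPHA


def prefers_to_sell(purchase_price, current_price, move):
    """True if selling now beats waiting one period.

    Waiting means the price moves up or down by `move` with equal
    probability, so the investor expects the price to stay flat.
    """
    gain_now = current_price - purchase_price
    sell_now = value(gain_now)
    wait = 0.5 * value(gain_now - move) + 0.5 * value(gain_now + move)
    return sell_now > wait


# Paper gain: bought at $50, now $55, next period $50 or $60.
print(prefers_to_sell(50, 55, 5))  # True: v(5) > 0.5 v(0) + 0.5 v(10)
# Paper loss: bought at $50, now $45, next period $40 or $50.
print(prefers_to_sell(50, 45, 5))  # False: v(-5) < 0.5 v(-10) + 0.5 v(0)
```

Note that the loss-aversion coefficient scales both sides of the loss comparison equally, so the decision to wait is driven by the curvature alone, as the text states.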
Grinblatt and Han (2001) argue that the investor behavior inherent in the disposition effect may be behind a puzzling feature of the cross-section of average returns, namely momentum in stock returns. Due to the concavity of the value function in the region of gains, investors will be keen to sell a stock which has earned them capital gains on paper. The selling pressure that results may initially depress the stock price, generating higher returns later. On the other hand, if the holders of a stock are facing capital losses, convexity in the region of losses means that they will only sell if offered a price premium; the price is therefore initially inflated, generating lower returns later. Grinblatt and Han provide supportive evidence for their story by regressing, in the cross-section, a stock’s return on its past 12-month return as well as on a measure of the capital gain or loss faced by its holders. This last variable is computed as the current stock price minus investors’ average cost basis, itself inferred from past volume. They find that the capital gain or loss variable steals a substantial amount of explanatory power from the past return.
7.5. The buying decision
Odean (1999) presents useful information about the stocks the individual investors in his sample choose to buy. Unlike “sells”, which are mainly prior winners, “buys” are evenly split between prior winners and losers. Conditioning on the stock being a prior
winner (loser) though, the stock is a big prior winner (loser). In other words, a good deal of the action is in the extremes. Odean argues that the results for stock purchases are in part due to an attention effect. When buying a stock, people do not tend to systematically sift through the thousands of listed shares until they find a good “buy”. They typically buy a stock that has caught their attention and perhaps the best attention draw is extreme past performance, whether good or bad. Among individual investors, attention is less likely to matter for stock sales because of a fundamental way in which the selling decision differs from the buying decision. Due to short-sale constraints, when individuals are looking for a stock to sell, they limit their search to those stocks that they currently own. When buying stocks, though, people have a much wider range of possibilities to choose from, and factors related to attention may enter the decision more. Using the same discount brokerage data as in their earlier papers, Barber and Odean (2002b) test the idea that for individual investors, buying decisions are more driven by attention than are selling decisions. On any particular day, they create portfolios of “attention-getting” stocks using a number of different criteria: stocks with abnormally high trading volume, stocks with abnormally high or low returns, and stocks with news announcements. They find that the individual investors in their sample are more likely, on the following day, to be purchasers of these high-attention stocks than sellers.
8. Application: Corporate finance
8.1. Security issuance, capital structure and investment
An important strand of research in behavioral finance asks whether irrational investors such as those discussed in earlier sections affect the financing and investment decisions of firms. We first address this question theoretically, and ask how a rational manager interested in maximizing true firm value – in other words, the stock price that will prevail once any mispricing has worked its way out of valuations – should act in the face of irrational investors. Stein (1996) provides a useful framework for thinking about this, as well as about other issues that arise in this section. He shows that when a firm’s stock price is too high, the rational manager should issue more shares so as to take advantage of investor exuberance. Conversely, when the price is too low, the manager should repurchase shares. We refer to this model of security issuance as the “market timing” view. What evidence there is to date on security issuance appears remarkably consistent with this framework. First, at the aggregate level, the share of new equity issues among total new issues – the “equity share” – is higher when the overall stock market is more highly valued. In fact, Baker and Wurgler (2000) show that the equity share is a reliable predictor of future stock returns: a high share predicts low, and sometimes negative,
stock returns. This is consistent with managers timing the market, issuing more equity at its peaks, just before it sinks back to more realistic valuation levels. At the individual firm level, a number of papers have shown that the book-to-market ratio of a firm is a good cross-sectional predictor of new equity issuance [see Korajczyk, Lucas and McDonald (1991), Jung, Kim and Stulz (1996), Loughran, Ritter and Rydqvist (1994), Pagano, Panetta and Zingales (1998), Baker and Wurgler (2002a)]. Firms with high valuations issue more equity while those with low valuations repurchase their shares. Moreover, long-term stock returns after an IPO or SEO are low [Loughran and Ritter (1995)], while long-term returns after the announcement of a repurchase are high [Ikenberry, Lakonishok and Vermaelen (1995)]. Once again, this evidence is consistent with managers timing the market in their own securities. More support for the market-timing view comes from survey evidence. Graham and Harvey (2001) report that 67% of surveyed CFOs said that “the amount by which our stock is undervalued or overvalued” was an important consideration when issuing common stock.
The success of the market-timing framework in predicting patterns of equity issuance offers the hope that it might also be the basis of a successful theory of capital structure. After all, a firm’s capital structure simply represents its cumulative financing decisions over time. Consider, for example, two firms which are similar in terms of characteristics like firm size, profitability, fraction of tangible assets, and current market-to-book ratio, which have traditionally been thought to affect capital structure. Suppose, however, that in the past, the market-to-book ratio of firm A has reached much higher levels than that of firm B. 
Since, under the market timing theory, managers of firm A may have issued more shares at that time to take advantage of possible overvaluation, firm A may have more equity in its capital structure today. In an intriguing recent paper, Baker and Wurgler (2002a) confirm this prediction. They show that all else equal, a firm’s weighted-average historical market-to-book ratio, where more weight is placed on years in which the firm made an issuance of some kind, whether debt or equity, is a good cross-sectional predictor of the fraction of equity in the firm’s capital structure today. There is some evidence, then, that irrational investor sentiment affects financing decisions. We now turn to the more critical question of whether this sentiment affects actual investment decisions. Once again, we consider the benchmark case in Stein’s (1996) model, in which the manager is both rational and interested in maximizing the firm’s true value. Suppose that a firm’s stock price is too high. As discussed above, the manager should issue more equity at this point. More subtly, though, Stein shows that he should not channel the fresh capital into any actual new investment, but instead keep it in cash or in another fairly priced capital market security. While investors’ exuberance means that, in their view, the firm has many positive net present value (NPV) projects it could undertake, the rational manager knows that these projects are not, in fact, positive NPV and that in the interest of true firm value, they should be avoided. Conversely, if the manager thinks that his firm’s stock price is irrationally low, he should repurchase
shares at the advantageously low price but not scale back actual investment. In short, irrational investors may affect the timing of security issuance, but they should not affect the firm’s investment plans. Once we move beyond this simple benchmark case, though, there emerge several channels through which sentiment might affect investment after all. First, the above argument properly applies only to non-equity dependent firms; in other words, to firms which because of their ample internal funds and borrowing capacity do not need the equity markets to finance their marginal investments. For equity-dependent firms, however, investor sentiment and, in particular, excessive investor pessimism, may distort investment: when investors are excessively pessimistic, such firms may have to forgo attractive investment opportunities because it is too costly to finance them with undervalued equity. This thinking leads to a cross-sectional prediction, namely that the investment of equity-dependent firms should be more sensitive to gyrations in stock price than the investment of non-equity dependent firms. Other than this equity-dependence mechanism, there are other channels through which investor sentiment might distort investment. Consider the case where investors are excessively optimistic about a firm’s prospects. Even if a manager is in principle interested in maximizing true value, he faces the danger that if he refuses to undertake projects investors perceive as profitable, they may depress stock prices, exposing him to the risk of a takeover, or more simply, try to have him fired. 36 Even if the manager is rational, this does not mean he will choose to maximize the firm’s true value. The agency literature has argued that some managers may maximize other objectives – the size of their firm, say – as a way of enhancing their prestige. 
This suggests another channel for investment distortion: managers might use investor exuberance as a cover for doing negative NPV “empire building” projects. Finally, investor sentiment can also affect investment if managers put some weight on investors’ opinions, perhaps because they think investors know something they don’t. Managers may then mistake excessive optimism for well-founded optimism and get drawn into making negative NPV investments.
36 Shleifer and Vishny (2004) argue that in a situation such as this, where the manager feels forced to undertake some kind of investment, the best investment of all may be an acquisition of a less overvalued firm, in other words, one more likely to retain its value in the long run. This observation leads to a parsimonious theory of takeover waves, which predicts, among other things, an increase in stock-financed acquisitions at times of high dispersion in valuations.
An important goal of empirical research, then, is to try to understand whether sentiment does affect investment, and if so, through which channel. Early studies produced little evidence of investment distortion. In aggregate data, Blanchard, Rhee and Summers (1993) find that movements in price apparently unrelated to movements in fundamentals have only weak forecasting power for future investment: the effects are marginally statistically significant and weak in economic terms. To pick out two particular historical episodes: the rise in stock prices through the 1920s did not lead to
a commensurate rise in investment, nor did the crash of 1987 slow investment down appreciably. Morck, Shleifer and Vishny (1990) reach similar conclusions using firm level data, as do Baker and Wurgler (2002a): in their work on capital structure, they show that not only do firms with higher market-to-book ratios in their past have more equity in their capital structure today, but also that the equity funds raised are typically used to increase cash balances and not to finance new investment.
More recently though, Polk and Sapienza (2001) report stronger evidence of investment distortion. They identify overvalued firms as firms with high accruals, defined as earnings minus actual cash flow, and as firms with high net issuance of equity. Firms with high accruals may become overvalued if investors fail to understand that earnings are overstating actual cash flows, and Chan et al. (2001) confirm that such firms indeed earn low returns. Overvalued firms may also be identified through their opportunistic issuance of equity, and we have already discussed the evidence that such firms earn low long-run returns. Controlling for actual investment opportunities as accurately as possible, Polk and Sapienza find that the firms they identify as overvalued appear to invest more than other firms, suggesting that sentiment does influence investment.
Further evidence of distortion comes from Baker, Stein and Wurgler’s (2003) test of the cross-sectional prediction that equity-dependent firms will be more sensitive to stock price gyrations than will non-equity dependent firms. They identify equity-dependent firms on the basis of their low cash balances, among other measures, and find that these firms have an investment sensitivity to stock prices about three times as high as that of non-equity dependent firms. This study therefore provides initial evidence that for some firms at least, sentiment may distort investment, and that it does so through the equity-dependence channel.
8.2. Dividends
A major open question in corporate finance asks why firms pay dividends. Historically, dividends have been taxed at a higher rate than capital gains. This means that stockholders who pay taxes would always prefer that the firm repurchase shares rather than pay a dividend. Since the tax-exempt shareholders would be indifferent between the dividend payment and the share repurchase, the share repurchase is a Pareto-improving action. Why then, do investors seem perfectly happy to accept a substantial part of their return in the form of dividends? Or, using behavioral language, why do firms choose to frame part of their return as an explicit payment to stockholders, and in so doing, apparently make some of their shareholders worse off?
Shefrin and Statman (1984) propose a number of behavioral explanations for why investors exhibit a preference for dividends. Their first idea relies on the notion of self-control. Many people exhibit self-control problems. On the one hand, we want to deny ourselves an indulgence, but on the other hand, we quickly give in to temptation: today, we tell ourselves that tomorrow we will not overeat, and yet, when tomorrow arrives, we again eat too much. To deal with self-control problems, people often set rules, such
as “bank the wife’s salary, and only spend from the husband’s paycheck”. Another very natural rule people might create to prevent themselves from overconsuming their wealth is “only consume the dividend, but don’t touch the portfolio capital”. In other words, people may like dividends because dividends help them surmount self-control problems through the creation of simple rules. A second rationale for dividends is based on mental accounting: by designating an explicit dividend payment, firms make it easier for investors to segregate gains from losses and hence to increase their utility. To see this, consider the following example. Over the course of a year, the value of a firm has increased by $10 per share. The firm could choose not to pay a dividend and return this increase in value to investors as a $10 capital gain. Alternatively, it could pay a $2 dividend, leaving an $8 capital gain. In the language of prospect theory, investors will code the first option as v(10). They may also code the second option as v(10), but the explicit segregation performed by the firm may encourage them to code it as v(2) + v(8). This will, of course, result in a higher perceived utility, due to the concavity of v in the domain of gains. This manipulation is equally useful in the case of losses. A firm whose value has declined by $10 per share over the year can offer investors a $10 capital loss or a $12 capital loss combined with a $2 dividend gain. While the first option will be coded as v(−10), the second is more likely to be coded as v(2) + v(−12), again resulting in a higher perceived utility, this time because of the convexity of v in the domain of losses. The utility enhancing trick in these examples depends on investors segregating the overall gain or loss into different components. The key insight of Shefrin and Statman is that by paying dividends, firms make it easier for investors to perform this segregation. 
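The coding argument in the two dividend examples above can be checked numerically. The sketch below assumes a power value function, v(x) = x^alpha for gains and v(x) = -lam * (-x)^alpha for losses. The functional form, alpha = 0.88 and the loss-aversion coefficient lam = 2.25 are illustrative assumptions and are not given in this section; `value` and `coded_values` are hypothetical helper names.

```python
def value(x, alpha=0.88, lam=1.0):
    """Assumed prospect-theory value function: concave over gains,
    convex over losses, with losses scaled by the coefficient lam."""
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** alpha


def coded_values(lam):
    """Perceived value of the $10 gain and $10 loss examples,
    coded either as one amount or as dividend plus capital gain/loss."""
    return {
        "gain_integrated": value(10, lam=lam),                      # v(10)
        "gain_segregated": value(2, lam=lam) + value(8, lam=lam),   # v(2) + v(8)
        "loss_integrated": value(-10, lam=lam),                     # v(-10)
        "loss_segregated": value(2, lam=lam) + value(-12, lam=lam), # v(2) + v(-12)
    }


curvature_only = coded_values(lam=1.0)       # curvature of v alone
with_loss_aversion = coded_values(lam=2.25)  # assumed loss-aversion coefficient

for label, result in [("lam = 1.00", curvature_only),
                      ("lam = 2.25", with_loss_aversion)]:
    print(label)
    for name, v in result.items():
        print(f"  {name}: {v:.3f}")
```

With curvature alone (lam = 1), segregation raises perceived value in both the gain and the loss example, which is the mechanism the text describes. Under the assumed lam = 2.25 the gain-side ranking is unchanged, but the loss-side ranking reverses for these particular dollar amounts, because the extra $2 of loss is weighted more heavily than the $2 dividend. The loss example is therefore sensitive to the parameters chosen.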
Finally, Shefrin and Statman argue that by paying dividends, firms help investors avoid regret. Regret is a frustration that people feel when they imagine having taken an action that would have led to a more desirable outcome. It is stronger for errors of commission – cases where people suffer because of an action they took – than for errors of omission – where people suffer because of an action they failed to take. Consider a company which does not pay a dividend. In order to finance consumption, an investor has to sell stock. If the stock subsequently goes up in value, the investor feels substantial regret because the error is one of commission: he can readily imagine how not selling the stock would have left him better off. If the firm had paid a dividend and the investor was able to finance his consumption out of it, a rise in the stock price would not have caused so much regret. This time, the error would have been one of omission: to be better off, the investor would have had to reinvest the dividend. Shefrin and Statman try to explain why firms pay dividends at all. Another question asks how dividend paying firms decide on the size of their dividend. The classic paper on this subject is Lintner (1956). His treatment is based on extensive interviews with executives of large American companies in which he asked the respondent, often the CFO, how the firm set dividend policy. Based on these interviews Lintner proposed what we would now call a behavioral model. In his model, firms first establish a target
dividend payout rate based on notions of fairness, in other words, on what portion of the earnings it is fair to return to shareholders. Then, as earnings increase and the dividend payout ratio falls below the target level, firms increase dividends only when they are confident that they will not have to reduce them in the future. There are several behavioral aspects to this model. First, the firm is not setting the dividend to maximize firm value or shareholder after-tax wealth. Second, perceptions of fairness are used to set the target payout rate. Third, the asymmetry between an increase in dividends and a decrease is explicitly considered. Although fewer firms now decide to start paying dividends, for those that do Lintner’s model appears to be valid to this day [Benartzi, Michaely and Thaler (1997), Fama and French (2001)]. Baker and Wurgler (2002b) argue that changes in dividend policy may also reflect changing investor sentiment about dividend-paying firms relative to their sentiment about non-paying firms. They argue that for some investors, dividend-paying firms and non-paying firms represent salient categories and that these investors exhibit changing sentiment about the categories. For instance, when investors become more risk averse, they may prefer dividend-paying stocks because of a confused notion that these firms are less risky (the well-known “bird in the hand” fallacy). If managers are interested in maximizing short-run value, perhaps because it is linked to their compensation, they may be tempted to change their dividend policy in the direction favored by investors. Baker and Wurgler find some supportive evidence for their theory. 
They measure relative investor sentiment about dividend-paying firms as the log market-to-book ratio of paying firms minus the log market-to-book ratio of non-paying firms, and find that in the time series, a high value of this measure one year predicts that in the following year, a higher fraction of non-paying firms initiate a dividend and a larger fraction of newly-listed firms choose to pay one. Similar results obtain for other measures of sentiment about dividend-paying firms.
8.3. Models of managerial irrationality
The theories we have discussed so far interpret the data as reflecting actions taken by rational managers in response to irrationality on the part of investors. Other papers have argued that some aspects of managerial behavior are the result of irrationality on the part of managers themselves.
Much of Section 2 was devoted to thinking about whether rational agents might be able to correct dislocations caused by irrational traders. Analogously, before we consider models of irrational managers, we should ask to what extent rational agents can undo their effects. On reflection, it doesn’t seem any easier to deal with irrational managers than irrational investors. It is true that many firms have mechanisms in place designed to solve agency problems and to keep the manager’s mind focused on maximizing firm value: giving him stock options for example, or saddling him with debt. The problem is that these mechanisms are unlikely to have much of an effect on irrational managers. These managers think that they are maximizing firm value, even if in reality, they are
not. Since they think that they are already doing the right thing, stock options or debt are unlikely to change their behavior. In the best known paper on managerial irrationality, Roll (1986) argues that much of the evidence on takeover activity is consistent with an economy in which there are no overall gains to takeovers, but in which managers are overconfident, a theory he terms the “hubris hypothesis”. When managers think about taking over another firm, they conduct a valuation analysis of that firm, taking synergies into account. If managers are overconfident about the accuracy of their analysis, they will be too quick to launch a bid when their valuation exceeds the market price of the target. Just as overconfidence among individual investors may lead to excessive trading, so overconfidence among managers may lead to excessive takeover activity. The main predictions of the hubris hypothesis are that there will be a large amount of takeover activity, but that the total combined gain to bidder and target will be zero; and that on the announcement of a bid, the price of the target will rise and the value of the bidder will fall by a similar amount. Roll examines the available evidence and concludes that it is impossible to reject any of these predictions. Heaton (2002) analyses the consequences of managerial optimism whereby managers overestimate the probability that the future performance of their firm will be good. He shows that it can explain pecking order rules for capital structure: since managers are optimistic relative to the capital markets, they believe their equity is undervalued, and are therefore reluctant to issue it unless they have exhausted internally generated funds or the debt market. 
Managerial optimism can also explain the puzzlingly high correlation of investment and cash flow: when cash flow is low, managers’ reluctance to use external markets for financing means that they forgo an unusually large number of projects, lowering investment at the same time. Malmendier and Tate (2001) test Heaton’s model by investigating whether firms with excessively optimistic CEOs display a greater sensitivity of investment to cash flow. They detect excessive optimism among CEOs by examining at what point they exercise their stock options: CEOs who hold on to their options longer than recommended by normative models of optimal exercise are deemed to have an overly optimistic forecast of their stock’s future price. Malmendier and Tate find that the investment of these CEOs’ firms is indeed more sensitive to cash flow than the investment of other firms. 37
37 Another paper which can be included in the managerial irrationality category is Loughran and Ritter’s (2002) explanation for why managers issuing shares appear to leave significant amounts of money “on the table”, as evidenced by the high average return of IPOs on their first day of trading. The authors note that the IPOs with good first day performance are often those IPOs in which the price has risen far above its filing range, giving the managers a sizeable wealth gain. One explanation is therefore that since managers are already enjoying a major windfall, they do not care too much about the fact that they could have been even wealthier.
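The comparison behind the Malmendier and Tate (2001) test described above, whether investment responds more strongly to cash flow in one group of firms than in another, can be illustrated with a small simulation. This is a stylized sketch on simulated data, not the authors' specification or data: the sensitivities (0.50 and 0.30), the noise level, the sample sizes and the helper names `slope` and `simulate_firms` are all assumptions made for illustration.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def simulate_firms(n, sensitivity, rng):
    """Simulated (cash flow, investment) pairs with a known sensitivity.
    All numbers are made up for illustration only."""
    cash_flow = [rng.uniform(0.0, 1.0) for _ in range(n)]
    investment = [0.10 + sensitivity * cf + rng.gauss(0.0, 0.05)
                  for cf in cash_flow]
    return cash_flow, investment


rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed true sensitivities: higher for firms whose CEOs are
# classified as optimistic (late option exercisers), lower otherwise.
cf_opt, inv_opt = simulate_firms(2000, 0.50, rng)
cf_other, inv_other = simulate_firms(2000, 0.30, rng)

slope_optimistic = slope(cf_opt, inv_opt)
slope_other = slope(cf_other, inv_other)

print(f"estimated sensitivity, optimistic-CEO firms: {slope_optimistic:.3f}")
print(f"estimated sensitivity, other firms:          {slope_other:.3f}")
```

An empirical study would estimate the difference in one regression with an interaction term and controls for investment opportunities. Estimating two separate slopes is used here only because it keeps the sketch short.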
9. Conclusion
Behavioral finance is a young field, with its formal beginnings in the 1980s. Much of the research we have discussed was completed in the past five years. Where do we stand? Substantial progress has been made on numerous fronts.
Empirical investigation of apparently anomalous facts. When De Bondt and Thaler’s (1985) paper was published, many scholars thought that the best explanation for their findings was a programming error. Since then their results have been replicated numerous times, both by authors sympathetic to their view and by those with alternative views. At this stage, we think that most of the empirical facts are agreed upon by most of the profession, although the interpretation of those facts is still in dispute. This is progress. If we all agree that the planets do orbit the sun, we can focus on understanding why.
Limits to arbitrage. Twenty years ago, many financial economists thought that the Efficient Markets Hypothesis had to be true because of the forces of arbitrage. We now understand that this was a naive view, and that the limits to arbitrage can permit substantial mispricing. It is now also understood by most that the absence of a profitable investment strategy does not imply the absence of mispricing. Prices can be very wrong without creating profit opportunities.
Understanding bounded rationality. Thanks largely to the work of cognitive psychologists such as Daniel Kahneman and Amos Tversky, we now have a long list of robust empirical findings that catalogue some of the ways in which actual humans form expectations and make choices. There has also been progress in writing down formal models of these processes, with prospect theory being the most notable. Economists once thought that behavior was either rational or impossible to formalize. We now know that models of bounded rationality are both possible and also much more accurate descriptions of behavior than purely rational models.
Behavioral finance theory building. In the past few years there has been a burst of theoretical work modelling financial markets with less than fully rational agents. These papers relax the assumption of individual rationality either through the belief formation process or through the decision-making process. Like the work of psychologists discussed above, these papers are important existence proofs, showing that it is possible to think coherently about asset pricing while incorporating salient aspects of human behavior.
Investor behavior. We have now begun the important job of trying to document and understand how investors, both amateurs and professionals, make their portfolio choices. Until recently such research was notably absent from the repertoire of financial economists.
This is a lot of accomplishment in a short period of time, but we are still much closer to the beginning of the research agenda than we are to the end. We know enough about the perils of forecasting to realize that most of the future progress of the field is unpredictable. Still, we cannot resist venturing a few observations on what may be coming next.
First, much of the work we have summarized is narrow. Models typically capture something about investors’ beliefs, or their preferences, or the limits to arbitrage, but not all three. This comment applies to most research in economics, and is a natural implication of the fact that researchers are boundedly rational too. Still, as progress is made, we expect theorists to begin to incorporate more than one strand into their models. An example can, perhaps, illustrate the point. The empirical literature repeatedly finds that the asset pricing anomalies are more pronounced in small and mid-cap stocks than in the large cap sector. It seems likely that this finding reflects limits to arbitrage: the costs of trading smaller stocks are higher, keeping many potential arbitrageurs uninterested. While this observation may be an obvious one, it has not found its way into formal models. We expect investigation of the interplay between limits to arbitrage and cognitive biases to be an important research area in the coming years.
Second, there are obviously competing behavioral explanations for some of the empirical facts. Some critics view this as a weakness of the field. It is sometimes said that the long list of cognitive biases summarized in Section 3 offers behavioral modelers so many degrees of freedom that anything can be explained. We concede that there are numerous degrees of freedom, but note that rational modelers have just as many options to choose from. As Arrow (1986) has forcefully argued, rationality per se does not yield many predictions. The predictions in rational models often come from auxiliary assumptions.
There is really only one scientific way to compare alternative theories, behavioral or rational, and that is with empirical tests. One kind of test looks for novel predictions the theory makes. For example, Lee, Shleifer and Thaler (1991) test their model’s prediction that small firm returns will be correlated with closed-end fund discounts, while Hong, Lim and Stein (2000) test the implication of the Hong and Stein (1999) model that momentum will be stronger among stocks with thinner analyst coverage. Another sort of test is to look for evidence that agents actually behave the way a model claims they do. The Odean (1998) and Genesove and Mayer (2001) investigations of the disposition effect using actual market behavior fall into this category. Bloomfield and Hales (2002) offers an experimental test of the behavior theorized by Barberis, Shleifer and Vishny (1998). Of course, such tests are never airtight, but we should be skeptical of theories based on behavior that is undocumented empirically. Since behavioral theories claim to be grounded in realistic assumptions about behavior, we hope behavioral finance researchers will continue to give their
assumptions empirical scrutiny. We would urge the same upon authors of rational theories. 38
We have two predictions about the outcome of direct tests of the assumptions of economic models. First, we will find that most of our current theories, both rational and behavioral, are wrong. Second, substantially better theories will emerge.
38 Directly testing the validity of a model’s assumptions is not common practice in economics, perhaps because of Milton Friedman’s influential argument that one should evaluate theories based on the validity of their predictions rather than the validity of their assumptions. Whether or not this is sound scientific practice, we note that much of the debate over the past 20 years has occurred precisely because the evidence has not been consistent with the theories, so it may be a good time to start worrying about the assumptions. If a theorist wants to claim that fact X can be explained by behavior Y, it seems prudent to check whether people actually do Y.
Appendix A
We show that for the economy laid out in Equations (3–6), there is an equilibrium in which the risk-free rate is constant and given by
$$R_f = \frac{1}{\rho} \exp\left[\gamma g_C - \tfrac{1}{2}\gamma^2 \sigma_C^2\right], \tag{18}$$
and in which the price–dividend ratio is a constant $f$, and satisfies
$$1 = \rho\,\frac{1+f}{f}\,\exp\left[g_D - \gamma g_C + \tfrac{1}{2}\left(\sigma_D^2 + \gamma^2 \sigma_C^2 - 2\gamma \sigma_C \sigma_D \omega\right)\right]. \tag{19}$$
In this equilibrium, returns are therefore given by
$$R_{t+1} = \frac{D_{t+1} + P_{t+1}}{P_t} = \frac{1 + P_{t+1}/D_{t+1}}{P_t/D_t} \cdot \frac{D_{t+1}}{D_t} = \frac{1+f}{f}\,\exp\left[g_D + \sigma_D \varepsilon_{t+1}\right]. \tag{20}$$
To see this, start from the Euler equations of optimality, obtained through the usual perturbation arguments,
$$1 = \rho R_f\, E_t\left[\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}\right], \tag{21}$$
$$1 = \rho\, E_t\left[R_{t+1}\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}\right]. \tag{22}$$
Computing the expectation in Equation (21) gives Equation (18). We conjecture that in this economy, there is an equilibrium in which the price–dividend ratio is a constant $f$, so that returns are given by Equation (20). Substituting this into Equation (22) and computing the expectation gives Equation (19), as required. For given parameter values, the quantitative implications for P/D ratios and returns are now easily computed.
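As a sketch of that computation, the code below evaluates Equation (18) for the risk-free rate and solves Equation (19) for the constant price–dividend ratio $f$. Writing Equation (19) as 1 = x(1 + f)/f, where x is the discount factor times the exponential term, gives f = x/(1 - x), which is finite and positive only when x < 1. The parameter values are illustrative assumptions chosen for this example and are not taken from this appendix. The function names are hypothetical.

```python
import math


def risk_free_rate(rho, gamma, g_c, sigma_c):
    """Gross risk-free rate R_f from Equation (18)."""
    return math.exp(gamma * g_c - 0.5 * gamma ** 2 * sigma_c ** 2) / rho


def price_dividend_ratio(rho, gamma, g_c, sigma_c, g_d, sigma_d, omega):
    """Constant price-dividend ratio f that solves Equation (19)."""
    exponent = g_d - gamma * g_c + 0.5 * (
        sigma_d ** 2 + gamma ** 2 * sigma_c ** 2
        - 2.0 * gamma * sigma_c * sigma_d * omega
    )
    x = rho * math.exp(exponent)  # Equation (19) reads 1 = x * (1 + f) / f
    if x >= 1.0:
        raise ValueError("no finite positive f: rho * exp(...) must be below 1")
    return x / (1.0 - x)


def expected_gross_return(f, g_d, sigma_d):
    """E[R_{t+1}] from Equation (20), using E[exp(s*e)] = exp(s^2 / 2)
    for a standard normal shock e."""
    return (1.0 + f) / f * math.exp(g_d + 0.5 * sigma_d ** 2)


# Illustrative parameter values (assumptions, not from the appendix).
params = dict(rho=0.98, gamma=1.0, g_c=0.0184, sigma_c=0.0379,
              g_d=0.015, sigma_d=0.12, omega=0.15)

r_f = risk_free_rate(params["rho"], params["gamma"],
                     params["g_c"], params["sigma_c"])
f = price_dividend_ratio(**params)
expected_r = expected_gross_return(f, params["g_d"], params["sigma_d"])

print(f"gross risk-free rate R_f: {r_f:.4f}")
print(f"price-dividend ratio f:   {f:.2f}")
print(f"expected gross return:    {expected_r:.4f}")
print(f"log equity premium:       {math.log(expected_r / r_f):.5f}")
```

Combining Equations (18) to (20) implies that the log of the ratio of the expected gross return to the risk-free rate equals $\gamma \sigma_C \sigma_D \omega$, which gives a convenient consistency check on the numbers.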
References
Abreu, D., and M. Brunnermeier (2002), “Synchronization risk and delayed arbitrage”, Journal of Financial Economics 66:341–360.
Alpert, M., and H. Raiffa (1982), “A progress report on the training of probability assessors”, in: D. Kahneman, P. Slovic and A. Tversky, eds., Judgment Under Uncertainty: Heuristics and Biases (Cambridge University Press, Cambridge) pp. 294–305.
Anderson, E., L. Hansen and T. Sargent (1998), “Risk and robustness in equilibrium”, Working Paper (University of Chicago).
Arrow, K. (1986), “Rationality of self and others”, in: R. Hogarth and M. Reder, eds., Rational Choice (University of Chicago Press, Chicago) pp. 201–215.
Baker, M., and J. Wurgler (2000), “The equity share in new issues and aggregate stock returns”, Journal of Finance 55:2219–2257.
Baker, M., and J. Wurgler (2002a), “Market timing and capital structure”, Journal of Finance 57:1–32.
Baker, M., and J. Wurgler (2002b), “A catering theory of dividends”, Working Paper (Harvard University).
Baker, M., J. Stein and J. Wurgler (2003), “When does the market matter? Stock prices and the investment of equity dependent firms”, Quarterly Journal of Economics, forthcoming.
Ball, R. (1978), “Anomalies in relations between securities’ yields and yield surrogates”, Journal of Financial Economics 6:103–126.
Banz, R. (1981), “The relation between return and market value of common stocks”, Journal of Financial Economics 9:3–18.
Barber, B., and J. Lyon (1997), “Detecting long-run abnormal stock returns: the empirical power and specification of test statistics”, Journal of Financial Economics 43:341–372.
Barber, B., and T. Odean (2000), “Trading is hazardous to your wealth: the common stock performance of individual investors”, Journal of Finance 55:773–806.
Barber, B., and T. Odean (2001), “Boys will be boys: gender, overconfidence, and common stock investment”, Quarterly Journal of Economics 116:261–292.
Barber, B., and T. Odean (2002a), “Online investors: do the slow die first?”, Review of Financial Studies 15:455–487.
Barber, B., and T. Odean (2002b), “All that glitters: the effect of attention and news on the buying behavior of individual and institutional investors”, Working Paper (University of California, Berkeley, CA).
Barberis, N., and M. Huang (2001), “Mental accounting, loss aversion and individual stock returns”, Journal of Finance 56:1247–1292.
Barberis, N., and A. Shleifer (2003), “Style investing”, Journal of Financial Economics 68:161–199.
Barberis, N., A. Shleifer and R. Vishny (1998), “A model of investor sentiment”, Journal of Financial Economics 49:307–345.
Barberis, N., M. Huang and T. Santos (2001), “Prospect theory and asset prices”, Quarterly Journal of Economics 116:1–53.
Barberis, N., A. Shleifer and J. Wurgler (2001), “Comovement”, Working Paper (University of Chicago).
Barsky, R., and B. De Long (1993), “Why does the stock market fluctuate?”, Quarterly Journal of Economics 108:291–311.
Basu, S. (1983), “The relationship between earnings yield, market value and return for NYSE common stocks: further evidence”, Journal of Financial Economics 12:129–156.
Baxter, M., and U. Jermann (1997), “The international diversification puzzle is worse than you think”, American Economic Review 87:170–180.
Bell, D. (1982), “Regret in decision making under uncertainty”, Operations Research 30:961–981.
Benartzi, S. (2001), “Excessive extrapolation and the allocation of 401(k) accounts to company stock”, Journal of Finance 56:1747–1764.
Benartzi, S., and R. Thaler (1995), “Myopic loss aversion and the equity premium puzzle”, Quarterly Journal of Economics 110:75–92.
Benartzi, S., and R. Thaler (2001), “Naïve diversification strategies in defined contribution savings plans”, American Economic Review 91:79–98.
Benartzi, S., R. Michaely and R. Thaler (1997), “Do changes in dividends signal the future or the past?”, Journal of Finance 52:1007–1034.
Berk, J. (1995), “A critique of size related anomalies”, Review of Financial Studies 8:275–286.
Bernard, V., and J. Thomas (1989), “Post-earnings announcement drift: delayed price response or risk premium?”, Journal of Accounting Research (Supplement), pp. 1–36.
Blanchard, O., C. Rhee and L. Summers (1993), “The stock market, profit, and investment”, Quarterly Journal of Economics 108:115–136.
Bloomfield, R., and J. Hales (2002), “Predicting the next step of a random walk: experimental evidence of regime-shifting beliefs”, Journal of Financial Economics 65:397–414.
Bodurtha, J., D. Kim and C.M. Lee (1993), “Closed-end country funds and U.S. market sentiment”, Review of Financial Studies 8:879–918.
Brav, A. (2000), “Inference in long-horizon event studies”, Journal of Finance 55:1979–2016.
Brav, A., and P. Gompers (1997), “Myth or reality? The long-run underperformance of initial public offerings: evidence from venture and non-venture-backed companies”, Journal of Finance 52:1791–1821.
Brav, A., C. Geczy and P. Gompers (2000), “Is the abnormal return following equity issuances anomalous?”, Journal of Financial Economics 56:209–249.
Brennan, M., and Y. Xia (2001), “Stock return volatility and the equity premium”, Journal of Monetary Economics 47:249–283.
Brown, S., W. Goetzmann and S. Ross (1995), “Survival”, Journal of Finance 50:853–873.
Brunnermeier, M. (2001), Asset Pricing under Asymmetric Information – Bubbles, Crashes, Technical Analysis, and Herding (Oxford University Press).
Buehler, R., D. Griffin and M. Ross (1994), “Exploring the planning fallacy: why people underestimate their task completion times”, Journal of Personality and Social Psychology 67:366–381.
Camerer, C. (1995), “Individual decision making”, in: J. Kagel and A. Roth, eds., Handbook of Experimental Economics (Princeton University Press).
Camerer, C., and R. Hogarth (1999), “The effects of financial incentives in experiments: a review and capital-labor production framework”, Journal of Risk and Uncertainty 19:7–42.
Camerer, C., and M. Weber (1992), “Recent developments in modeling preferences: uncertainty and ambiguity”, Journal of Risk and Uncertainty 5:325–70.
Campbell, J.Y. (1991), “A variance decomposition for stock returns”, Economic Journal 101:157–179.
Campbell, J.Y. (1999), “Asset prices, consumption and the business cycle”, in: J. Taylor and M. Woodford, eds., Handbook of Macroeconomics (Elsevier, Amsterdam) pp. 1231–1303.
Campbell, J.Y. (2000), “Asset pricing at the millennium”, Journal of Finance 55:1515–1567.
Campbell, J.Y., and J. Cochrane (1999), “By force of habit: a consumption-based explanation of aggregate stock market behavior”, Journal of Political Economy 107:205–251.
Campbell, J.Y., and R. Shiller (1988), “Stock prices, earnings and expected dividends”, Journal of Finance 43:661–676.
Cecchetti, S., P. Lam and N. Mark (2000), “Asset pricing with distorted beliefs: are equity returns too good to be true?”, American Economic Review 90:787–805.
Chan, K., L. Chan, N. Jegadeesh and J. Lakonishok (2001), “Earnings quality and stock returns”, Working Paper (University of Illinois, Urbana, IL).
1118
N. Barberis and R. Thaler
Chan, L., N. Jegadeesh and J. Lakonishok (1996), “Momentum strategies”, Journal of Finance 51: 1681−1713. Chen, J., H. Hong and J. Stein (2001), “Forecasting crashes: trading volume, past returns and conditional skewness in stock prices”, Journal of Financial Economics 61:345−381. Chen, J., H. Hong and J. Stein (2002), “Breadth of ownership and stock returns”, Journal of Financial Economics 66:171−205. Chew, S. (1983), “A generalization of the quasilinear mean with applications to the measurement of income inequality and decision theory resolving the allais paradox”, Econometrica 51:1065−1092. Chew, S. (1989), “Axiomatic utility theories with the betweenness property”, Annals of Operations Research 19:273−98. Chew, S., and K. MacCrimmon (1979), “Alpha-nu choice theory: an axiomatization of expected utility”, Working Paper (University of British Columbia, Vancouver, BC). Chopra, N., J. Lakonishok and J. Ritter (1992), “Measuring abnormal performance: do stocks overreact?”, Journal of Financial Economics 31:235−268. Coval, J., and T. Moskowitz (1999), “Home bias at home: local equity preference in domestic portfolios”, Journal of Finance 54:2045−2073. Coval, J., and T. Moskowitz (2001), “The geography of investment: informed trading and asset prices”, Journal of Political Economy 109:811−841. Coval, J., and T. Shumway (2000), “Do behavioral biases affect prices?”, Working Paper (University of Michigan, Ann Arbor, MI). Daniel, K., and S. Titman (1997), “Evidence on the characteristics of cross-sectional variation in stock returns”, Journal of Finance 52:1−33. Daniel, K., D. Hirshleifer and A. Subrahmanyam (1998), “Investor psychology and security market under- and overreactions”, Journal of Finance 53:1839−1885. Daniel, K., D. Hirshleifer and A. Subrahmanyam (2001), “Overconfidence, arbitrage and equilbrium asset pricing”, Journal of Finance 56:921−965. D’Avolio, G. (2002), “The market for borrowing stock”, Journal of Financial Economics 66:271−306. 
De Bondt, W., and R. Thaler (1985), “Does the stock market overreact?”, Journal of Finance 40:793−808. De Bondt, W., and R. Thaler (1987), “Further evidence on investor overreaction and stock market seasonality”, Journal of Finance 42:557−581. De Long, J.B., A. Shleifer, L. Summers and R. Waldmann (1990a), “Noise trader risk in financial markets”, Journal of Political Economy 98:703−738. De Long, J.B., A. Shleifer, L. Summers and R. Waldmann (1990b), “Positive feedback investment strategies and destabilizing rational speculation”, Journal of Finance 45:375−395. Dekel, E. (1986), “An axiomatic characterization of preferences under uncertainty: weakening the independence axiom”, Journal of Economic Theory 40:304−18. Diether, K., C. Malloy and A. Scherbina (2002), “Stock prices and differences of opinion: empirical evidence that stock prices reflect optimism”, Journal of Finance 57:2113−2141. Dreman, D. (1977), Psychology and the Stock Market: Investment Strategy Beyond Random Walk (Warner Books, New York). Driscoll, K., J. Malcolm, M. Sirul and P. Slotter (1995), Gallup Survey of Defined Contribution Plan Participants (John Hancock Financial Services). Edwards, W. (1968), “Conservatism in human information processing”, in: B. Kleinmutz, ed., Formal Representation of Human Judgment (Wiley, New York) pp. 17–52. Ellsberg, D. (1961), “Risk, ambiguity, and the savage axioms”, Quarterly Journal of Economics 75: 643−69. Epstein, L., and T. Wang (1994), “Intertemporal asset pricing under Knightian uncertainty”, Econometrica 62:283−322. Fama, E. (1970), “Efficient capital markets: a review of theory and empirical work”, Journal of Finance 25:383−417.
Ch. 18:
A Survey of Behavioral Finance
1119
Fama, E. (1998), “Market efficiency, long-term returns and behavioral finance”, Journal of Financial Economics 49:283−307. Fama, E., and K. French (1988), “Dividend yields and expected stock returns”, Journal of Financial Economics 22:3−25. Fama, E., and K. French (1992), “The cross-section of expected stock returns”, Journal of Finance 47:427−465. Fama, E., and K. French (1993), “Common risk factors in the returns of bonds and stocks”, Journal of Financial Economics 33:3−56. Fama, E., and K. French (1995), “Size and book-to-market factors in earnings and returns”, Journal of Finance 50:131−155. Fama, E., and K. French (1996), “Multifactor explanations of asset pricing anomalies”, Journal of Finance 51:55−84. Fama, E., and K. French (1998), “Value vs. growth: the international evidence”, Journal of Finance 53:1975−1999. Fama, E., and K. French (2001), “Disappearing dividends: changing firm characteristics or lower propensity to pay?”, Journal of Financial Economics 60:3−43. Fama, E., K. French and J. Davis (2000), “Characteristics, covariances and average returns 1929–1997”, Journal of Finance 55:389−406. Fischhoff, B., P. Slovic and S. Lichtenstein (1977), “Knowing with certainty: the appropriateness of extreme confidence”, Journal of Experimental Pyschology: Human Perception and Performance 3: 552−564. Fisher, I. (1928), Money Illusion (Adelphi, New York). Fox, C., and A. Tversky (1995), “Ambiguity aversion and comparative ignorance”, Quarterly Journal of Economics 110:585−603. French, K., and J. Poterba (1991), “Investor diversification and international equity markets”, American Economic Review 81:222−226. Friedman, M. (1953), “The case for flexible exchange rates”, in: Essays in Positive Economics (University of Chicago Press) pp. 157–203. Froot, K., and E. Dabora (1999), “How are stock prices affected by the location of trade?”, Journal of Financial Economics 53:189−216. Genesove, D., and C. 
Mayer (2001), “Loss aversion and seller behavior: evidence from the housing market”, Quarterly Journal of Economics 116:1233−1260. Gervais, S., and T. Odean (2001), “Learning to be overconfident”, Review of Financial Studies 14:1−27. Gilboa, I., and D. Schmeidler (1989), “Maxmin expected utility with a non-unique prior”, Journal of Mathematical Economics 18:141−153. Gilovich, T., R. Vallone and A. Tversky (1985), “The hot hand in basketball: on the misperception of random sequences”, Cognitive Psychology 17:295−314. Gilovich, T., D. Griffin and D. Kahneman, eds (2002), Heuristics and Biases: The Psychology of Intuitive Judgment (Cambridge University Press). Gneezy, U., and J. Potters (1997), “An experiment on risk taking and evaluation periods”, Quarterly Journal of Economics 112:631−645. Gompers, P., and A. Metrick (2001), “Institutional investors and equity prices”, Quarterly Journal of Economics 116:229−259. Graham, B. (1949), The Intelligent Investor: A Book of Practical Counsel (Harper and Row, New York). Graham, J., and C. Harvey (2001), “The theory and practice of corporate finance: evidence from the field”, Journal of Financial Economics 60:187−243. Grinblatt, M., and B. Han (2001), “The disposition effect and momentum”, Working Paper (University of California, Los Angeles, CA). Grinblatt, M., and M. Keloharju (2001), “How distance, language, and culture influence stockholdings and trades”, Journal of Finance 56:1053−1073.
1120
N. Barberis and R. Thaler
Grinblatt, M., and T. Moskowitz (1999), “The cross-section of expected returns and its relation to past returns”, Working Paper (University of Chicago). Gul, F. (1991), “A theory of disappointment in decision making under uncertainty”, Econometrica 59:667−686. Hansen, L., and K. Singleton (1983), “Stochastic consumption, risk aversion and the temporal behavior of asset returns”, Journal of Political Economy 91:249−268. Hardouvelis, G., R. La Porta and T. Wizman (1994), “What moves the discount on country equity funds?”, in: J. Frankel, ed., The Internationalization of Equity Markets (University of Chicago Press) pp. 345–397. Harris, L., and E. Gurel (1986), “Price and volume effects associated with changes in the S&P 500: new evidence for the existence of price pressure”, Journal of Finance 41:851−860. Harrison, J.M., and D. Kreps (1978), “Speculative investor behavior in a stock market with heterogeneous expectations”, Quarterly Journal of Economics 92:323−336. Heath, C., and A. Tversky (1991), “Preference and belief: ambiguity and competence in choice under uncertainty”, Journal of Risk and Uncertainty 4:5−28. Heaton, J.B. (2002), “Managerial optimism and corporate finance”, Financial Managment (Summer), pp. 33–45. Hirshleifer, D. (2001), “Investor psychology and asset pricing”, Journal of Finance 56:1533−1597. Hong, H., and J. Stein (1999), “A unified theory of underreaction, momentum trading, and overreaction in asset markets”, Journal of Finance 54:2143−2184. Hong, H., and J. Stein (2003), “Differences of opinion, short-sale constraints and market crashes”, Review of Financial Studies 16:487−525. Hong, H., T. Lim and J. Stein (2000), “Bad news travels slowly: size, analyst coverage, and the profitability of momentum strategies”, Journal of Finance 55:265−295. Huberman, G. (2001), “Familiarity breeds investment”, Review of Financial Studies 14:659−680. Hvidkjaer, S. 
(2001), “A trade-based analysis of momentum”, Working Paper (University of Maryland, College Park, MD). Ikenberry, D., J. Lakonishok and T. Vermaelen (1995), “Market underreaction to open market share repurchases”, Journal of Financial Economics 39:181−208. Jegadeesh, N., and S. Titman (1993), “Returns to buying winners and selling losers: implications for stock market efficiency”, Journal of Finance 48:65−91. Jones, C., and O. Lamont (2002), “Short-sale constraints and stock returns”, Journal of Financial Economics 66:207−239. Jung, K., Y. Kim and R. Stulz (1996), “Timing, investment opportunities, managerial discretion, and the security issue decision”, Journal of Financial Economics 42:159−185. Kahneman, D., and A. Tversky (1974), “Judgment under uncertainty: heuristics and biases”, Science 185:1124−1131. Kahneman, D., and A. Tversky (1979), “Prospect theory: an analysis of decision under risk”, Econometrica 47:263−291. Kahneman, D., and A. Tversky, eds (2000), Choices, Values and Frames (Cambridge University Press). Kahneman, D., P. Slovic and A. Tversky, eds (1982), Judgment Under Uncertainty: Heuristics and Biases (Cambridge University Press). Kaul, A., V. Mehrotra and R. Morck (2000), “Demand curves for stocks do slope down: new evidence from an index weights adjustment”, Journal of Finance 55:893−912. Knight, F. (1921), Risk, Uncertainty and Profit (Houghton Mifflin, Boston, New York). Korajczyk, R., D. Lucas and R. McDonald (1991), “The effects of information releases on the pricing and timing of equity issues”, Review of Financial Studies 4:685−708. La Porta, R., J. Lakonishok, A. Shleifer and R. Vishny (1997), “Good news for value stocks: further evidence on market efficiency”, Journal of Finance 49:1541−1578. Lakonishok, J., A. Shleifer and R. Vishny (1994), “Contrarian investment, extrapolation and risk”, Journal of Finance 49:1541−1578.
Ch. 18:
A Survey of Behavioral Finance
1121
Lamont, O., and R. Thaler (2003), “Can the market add and subtract? Mispricing in tech stock carve-outs”, Journal of Political Economy 111:227−268. Lee, C., A. Shleifer and R. Thaler (1991), “Investor sentiment and the closed-end fund puzzle”, Journal of Finance 46:75−110. LeRoy, S., and R. Porter (1981), “The present-value relation: tests based on implied variance bounds”, Econometrica 49:97−113. Lewis, K. (1999), “Trying to explain home bias in equities and consumption”, Journal of Economic Literature 37:571−608. Lintner, J. (1956), “Distribution of incomes of corporations among dividends, retained earnings and taxes”, American Economic Review 46:97−113. Loomes, G., and R. Sugden (1982), “Regret theory: an alternative theory of rational choice under uncertainty”, The Economic Journal 92:805−824. Lord, C., L. Ross and M. Lepper (1979), “Biased assimilation and attitude polarization: the effects of prior theories on subsequently considered evidence”, Journal of Personality and Social Psychology 37:2098−2109. Loughran, T., and J. Ritter (1995), “The new issues puzzle”, Journal of Finance 50:23−50. Loughran, T., and J. Ritter (2000), “Uniformly least powerful tests of market efficiency”, Journal of Financial Economics 55:361−389. Loughran, T., and J. Ritter (2002), “Why don’t issuers get upset about leaving money on the table?”, Review of Financial Studies 15:413−443. Loughran, T., J. Ritter and K. Rydqvist (1994), “Initial public offerings: international insights”, Pacific Basin Finance Journal 2:165−199. Lyon, J., B. Barber and C. Tsai (1999), “Improved methods for tests of long-run abnormal stock returns”, Journal of Finance 54:165−201. Maenhout, P. (1999), “Robust portfolio rules and asset pricing”, Working Paper (INSEAD, Paris). Malmendier, U., and G. Tate (2001), “CEO overconfidence and corporate investment”, Working Paper (Harvard University). Mankiw, N.G., and S. 
Zeldes (1991), “The consumption of stockholders and non-stockholders”, Journal of Financial Economics 29:97−112. Markowitz, H. (1952), “The utility of wealth”, Journal of Political Economy 60:151−158. Mehra, R., and E. Prescott (1985), “The equity premium: a puzzle”, Journal of Monetary Economics 15:145−161. Merton, R. (1987), “A simple model of capital market equilibrium with incomplete information”, Journal of Finance 42:483−510. Michaely, R., R. Thaler and K. Womack (1995), “Price reactions to dividend initiations and omissions”, Journal of Finance 50:573−608. Miller, E. (1977), “Risk, uncertainty and divergence of opinion”, Journal of Finance 32:1151−1168. Mitchell, M., and E. Stafford (2001), “Managerial decisions and long-term stock price performance”, Journal of Business 73:287−329. Mitchell, M., T. Pulvino and E. Stafford (2002), “Limited arbitrage in equity markets”, Journal of Finance 57:551−584. Modigliani, F., and R. Cohn (1979), “Inflation and the stock market”, Financial Analysts Journal 35:24−44. Morck, R., A. Shleifer and R. Vishny (1990), “The stock market and investment: is the market a sideshow?”, Brookings Papers on Economic Activity 0:157−202. Mullainathan, S. (2001), “Thinking through categories”, Working Paper (MIT, Cambridge, MA). Odean, T. (1998), “Are investors reluctant to realize their losses?”, Journal of Finance 53:1775−1798. Odean, T. (1999), “Do investors trade too much?”, American Economic Review 89:1279−1298. Ofek, E., and M. Richardson (2003), “Dot-com mania: market inefficiency in the internet sector”, Journal of Finance 58:1113−1137.
1122
N. Barberis and R. Thaler
Pagano, M., F. Panetta and L. Zingales (1998), “Why do companies go public? An empirical analysis”, Journal of Finance 53:27−64. Polk, C., and P. Sapienza (2001), “The real effects of investor sentiment”, Working Paper (Northwestern University, Evanston, IL). Poteshman, A. (2001), “Underreaction, overreaction and increasing misreaction to information in the options market”, Journal of Finance 56:851−876. Quiggin, J. (1982), “A theory of anticipated utility”, Journal of Economic Behavior and Organization 3:323−343. Rabin, M. (1998), “Psychology and economics”, Journal of Economic Literature 36:11−46. Rabin, M. (2000), “Risk aversion and expected utility theory: a calibration theorem”, Econometrica 68:1281−1292. Rabin, M. (2002), “Inference by believers in the law of small numbers”, Quarterly Journal of Economics 117:775−816. Redelmeier, D., and A. Tversky (1992), “On the framing of multiple prospects”, Psychological Science 3:191−193. Ritter, J., and R. Warr (2002), “The decline of inflation and the bull market of 1982 to 1997”, Journal of Financial and Quantitative Analysis 37:29−61. Roll, R. (1977), “A critique of the asset pricing theory’s tests: part I”, Journal of Financial Economics 4:129−174. Roll, R. (1983), “Vas ist das?”, Journal of Portfolio Management 9:18−28. Roll, R. (1986), “The hubris hypothesis of corporate takeovers”, Journal of Business 59:197−216. Rosenberg, B., K. Reid and R. Lanstein (1985), “Persuasive evidence of market inefficiency”, Journal of Portfolio Management 11:9−17. Ross, S. (2001), Lectures Notes on Market Efficiency (MIT, Cambridge, MA). Rouwenhorst, G. (1998), “International momentum strategies”, Journal of Finance 53:267−284. Rubinstein, M. (2001), “Rational markets: yes or no? The affirmative case”, Financial Analysts Journal (May-June), pp. 15–29. Santos, M., and M. Woodford (1997), “Rational asset pricing bubbles”, Econometrica 65:19−58. Sargent, T. (1993), Bounded Rationality in Macroeconomics (Oxford University Press). 
Savage, L. (1964), The Foundations of Statistics (Wiley, New York). Scheinkman, J., and W. Xiong (2003), “Overconfidence and speculative bubbles”, Journal of Political Economy, forthcoming. Segal, U. (1987), “Some remarks on Quiggin’s anticipated utility”, Journal of Economic Behavior and Organization 8:145−154. Segal, U. (1989), “Anticipated utility: a measure representation approach”, Annals of Operations Research 19:359−373. Shafir, E., P. Diamond and A. Tversky (1997), “Money illusion”, Quarterly Journal of Economics 112:341−374. Shefrin, H., and M. Statman (1984), “Explaining investor preference for cash dividends”, Journal of Financial Economics 13:253−282. Shefrin, H., and M. Statman (1985), “The disposition to sell winners too early and ride losers too long”, Journal of Finance 40:777−790. Shiller, R. (1981), “Do stock prices move too much to be justified by subsequent changes in dividends?”, American Economic Review 71:421−436. Shiller, R. (1984), “Stock prices and social dynamics”, Brookings Papers on Economic Activity 2: 457−498. Shleifer, A. (1986), “Do demand curves for stocks slope down?”, Journal of Finance 41:579−90. Shleifer, A. (2000), Inefficient Markets: An Introduction to Behavioral Finance (Oxford University Press). Shleifer, A., and L. Summers (1990), “The noise trader approach to finance”, Journal of Economic Perspectives 4:19−33.
Ch. 18:
A Survey of Behavioral Finance
1123
Shleifer, A., and R. Vishny (1997), “The limits of arbitrage”, Journal of Finance 52:35−55. Shleifer, A., and R. Vishny (2004), “Stock market driven acquisitions”, Journal of Financial Economics, forthcoming. Stein, J. (1996), “Rational capital budgeting in an irrational world”, Journal of Business 69:429−455. Summers, L. (1986), “Does the stock market rationally reflect fundamental values?”, Journal of Finance 41:591−601. Thaler, R. (2000), “Mental accounting matters”, in: D. Kahneman and A. Tversky, eds., Choice, Values and Frames (Cambridge University Press, Cambridge, UK) pp. 241–268. Thaler, R., and E. Johnson (1990), “Gambling with the house money and trying to break even: the effects of prior outcomes on risky choice”, Management Science 36:643−660. Thaler, R., A. Tversky, D. Kahneman and A. Schwartz (1997), “The effect of myopia and loss aversion on risk-taking: an experimental test”, Quarterly Journal of Economics 112:647−661. Tversky, A., and D. Kahneman (1986), “Rational choice and the framing of decisions”, Journal of Business 59:251−278. Tversky, A., and D. Kahneman (1992), “Advances in prospect theory: cumulative representation of uncertainty”, Journal of Risk and Uncertainty 5:297−323. Veronesi, P. (1999), “Stock market overreaction to bad news in good times: a rational expectations equilibrium model”, Review of Financial Studies 12:975−1007. Vijh, A. (1994), “S&P 500 trading strategies and stock betas”, Review of Financial Studies 7:215−251. von Neumann, J., and O. Morgenstern (1944), Theory of Games and Economic Behavior (Princeton University Press). Vuolteenaho, T. (2002), “What drives firm-level stock returns?”, Journal of Finance 57:233−264. Weil, P. (1989), “The equity premium puzzle and the risk-free rate puzzle”, Journal of Monetary Economics 24:401−421. Weinstein, N. (1980), “Unrealistic optimism about future life events”, Journal of Personality and Social Psychology 39:806−820. Wurgler, J., and K. 
Zhuravskaya (2002), “Does arbitrage flatten demand curves for stocks?”, Journal of Business 75:583−608. Yaari, M. (1987), “The dual theory of choice under risk”, Econometrica 55:95−115.
FINANCE, OPTIMIZATION, AND THE IRREDUCIBLY IRRATIONAL COMPONENT OF HUMAN BEHAVIOR
ROBERT J. SHILLER
Yale University
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved

Financial theory has been least successful in systematizing our knowledge about the sources of volatility in financial markets. This problem with financial theory has influenced my own changing research direction, and that of many others, in recent years. The most successful applications of financial theory have been in areas where we do not need to know all the sources of volatility. For example, the theory of derivative pricing, covered here by Robert Whaley (Chapter 19), has led to an explosion of new risk management instruments. Market microstructure theory, covered here by Hans Stoll (Chapter 9) and by David Easley and Maureen O’Hara (Chapter 17), has led to significant changes in the design and regulation of trading markets. Agency theory, covered here by Jeremy Stein (Chapter 2), has led to a revolution in managerial compensation that, despite some transitory glitches, promises to make our economy much more efficient. But none of these theories depends on a systematic understanding of the ultimate sources of market volatility.

The recent international stock market boom, which peaked in early 2000 and was followed by declines of half or more in many countries, is a stark example of this volatility. This example does not prove anything about anyone’s model; it is only one observation. But the reasons for this boom and crash, which have occupied the attention and anxieties of hundreds of millions of people, are certainly not elucidated by any well-established financial theory. The circumstantial evidence about this event seems to suggest some phenomena that are hardly discussed systematically today.

The boom and crash in the stock market correspond somewhat to a boom and crash in measured earnings. But these earnings movements do not really explain the market price changes, for there is no clear evidence of an exogenous cause for the earnings change: the earnings changes might just as well be caused by the stock price changes affecting individual behavior, or by human factors that caused the stock market changes, through various feedback effects. It seems as if many of the “states of nature” that must be priced in financial markets include states of our own collective psychology.

Casual observers of the stock market boom and crash note an array of cultural changes that accompanied these events.
People seemed to become more enthusiastic about investing in the stock market as the 1990s progressed, and survey evidence supports this notion. People increasingly came to believe in business and were willing to accept its earnings statements and pronouncements unquestioningly, and young people changed their aspirations in response to this market. A dramatic “new era” theory of our economy entered public perceptions, and began to be regarded as an established fact. The same theory had resonance over much of the world, and threw the stock markets of the world into unusual synchrony. The news media have a name for this phenomenon, “irrational exuberance”, but this subject has had little academic scrutiny.

There are many different approaches to understanding such events as a stock market boom and crash, and different departments in the university have their own special strengths in doing so. The psychology department offers insights into such human factors as overconfidence, self-esteem, social behavior, and attention anomalies. The sociology department offers us insights into patterns of collective thinking and belief in authority. The history department will have a different kind of insight, perhaps a more comprehensive, more inductive approach to understanding financial phenomena. We must not forget that when we study financial data we are studying history.

But most financial theory derives formally from an abstract paradigm of individual optimization. Optimizing models have been the cornerstone of economic and financial theory. Such models are indeed the dominant tool of papers in this book. It is common to see behavioral finance as quite a different thing, based on a “nonoptimizing” theory. But what does it mean to be optimizing or not? Presumably, one could describe any human behavior as the solution to some optimization problem.

We see described in this handbook, under the rubric of optimizing models, many variations on traditional expected utility theory. We see discussions, in Chapter 13 by John Campbell and Chapter 14 by Rajnish Mehra and Edward Prescott, of Epstein–Zin utility, which is not technically an expected utility maximization theory. We see discussions as well of habit formation, or keeping up with the Joneses. These are pushing out the borders of rational optimization theory even if they can be modeled in terms of some form of expected utility. Even prospect theory, the most influential construct of behavioral economics, was couched by its framers, Daniel Kahneman and Amos Tversky, as an optimization problem.

What sets the behavioral theories apart from mainstream optimizing models in most cases is their insistence on the inconsistencies of human judgment. People’s responses to any given situation are affected by framing, by salience, and by the internal dynamics of their own attention.

An insistence that people are inconsistent in their behavior will never be the basis of an elegant financial theory. If one is to have a theory of stock prices, of derivatives prices, of corporate investment, of banking and other intermediaries, all subjects of this volume, one is naturally drawn to the idea that people are consistently rational optimizers of some well-defined and sensible objective. For all these phenomena are related to financial instruments: securities, options and corporate loans are tools that people use with purpose, and so a theory of these would naturally begin with an understanding of this
real purpose. Just as understanding a hammer or a drill requires reflecting on its use, understanding financial instruments and institutions requires understanding their complex uses. We cannot proceed on behavioral finance alone.

Dybvig and Ross, in Chapter 10 in this volume, get at what they view as a big problem with behavioral finance. They argue that a problem with the psychological theories is that “. . . they tend to be isolated stories rather than general specifications and are often hard to generalize. For example, prospect theory says that agents put extra weight on very unlikely outcomes, but it is not clear what this means in a model with a continuum of states” (p. 609). They are indirectly referring to the issue of psychological framing, which is discussed in detail in this Handbook by Nicholas Barberis and Richard Thaler (Chapter 18). Framing refers to people’s tendency to let their behavior depend on a suggested or currently convenient frame of reference, so that it can be inconsistent from one time to the next. Prospect theory says that people often tend to overestimate small probabilities that are salient to them from their frame of reference, but of course it cannot mean that people overestimate all small probabilities. Prospect theory is ultimately a theory of people’s faulty attention mechanism and cannot be the basis for an overarching financial theory. But it should be part of the adjustments made to the theory before it is applied.

There is a tendency for some observers to dismiss behavioral finance because of the transience of financial anomalies. After an anomaly, such as the January effect or the small-firm effect, is discovered and the discovery is given news-media attention, the anomaly tends to disappear. William Schwert, in his Chapter 15, asserts that “All these findings raise the possibility that the anomalies are more apparent than real” (p. 941). But these anomalies are very real even though they are transient, since they ultimately account for a significant part of market volatility. If our paradigm emphasizes the inconsistency of individual behavior, then it accords with changing anomalies.

I wish to argue here that both approaches to finance, the behavioral approach and the rational optimizing approach, have their own contributions to make, and that much work remains to be done on integrating them. There are not enough people who take an active and constructive interest in both approaches. When there is a conflict of paradigms, as appears to be the case here, it is often most fruitful for research to be conducted at the point of conflict between the paradigms. There is a definite schism in much of this volume, with the references covered in the one chapter on behavioral finance figuring at most peripherally in the other essays.

I find it odd that there should be a “field” called behavioral finance. Today, we have graduate students naming this as a field for their oral examinations. But all branches of finance should take account of the various social sciences, and ideally we would neither be giving oral examinations in behavioral finance nor be corralling all of the resulting insights into a single chapter of the handbook. I take the isolation of the single behavioral finance chapter in this handbook as a sign that acceptance and understanding of insights from other social sciences has only just begun to permeate the finance profession.

When the next major edition of this handbook appears, let us suppose in a decade or more, I would not expect to see a chapter on behavioral finance. The material
should be thoroughly dispersed among most of the chapters. One may hope that by then some way of integrating it in a productive way with the insights that we have from our optimizing models will be generally understood.
Chapter 19
DERIVATIVES
ROBERT E. WHALEY °
Fuqua School of Business, Duke University
Contents Abstract Keywords 1. Introduction 2. Background 3. No-arbitrage pricing relations 3.1. Carrying costs 3.2. Valuing forward/futures using the no-arbitrage principle 3.3. Valuing options using the no-arbitrage principle 3.3.1. Call options 3.3.2. Put options 3.3.3. Put–call parity 3.3.4. Summary
4. Option valuation 4.1. The Black–Scholes/Merton option valuation theory 4.2. Analytical formulas 4.2.1. The infamous “Black–Scholes/Merton formula” 4.2.2. Special cases of the Black–Scholes/Merton formula 4.2.2.1. Non-dividend-paying stock options 4.2.2.2. Constant-dividend-yield stock options 4.2.2.3. Futures options 4.2.2.4. Futures-style futures options 4.2.2.5. Foreign currency options 4.2.3. Valuation by replication 4.2.3.1. Dynamic portfolio insurance 4.2.3.2. Static replication
1131 1131 1132 1133 1139 1140 1141 1143 1143 1145 1146 1147 1148 1149 1151 1151 1153 1153 1153 1153 1153 1154 1154 1154 1155
° T. Austin Finch Foundation Professor of Business Administration, Fuqua School of Business, Duke University, Durham, NC 27706; E-mail:
[email protected]. Comments and suggestions by Nick Bollen, George Constantinides, Jeff Fleming, Tom Smith, Ren´e Stulz, and Seth Wechsler are gratefully acknowledged.
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved
1130
R.E. Whaley
4.2.4. Extensions: Single underlying asset 4.2.4.1. Compound options 4.2.4.2. American-style call options on dividend-paying stocks 4.2.4.3. Chooser options 4.2.4.4. Reset options 4.2.4.5. Lookback options 4.2.4.6. Barrier options 4.2.5. Extensions: Multiple underlying assets 4.2.5.1. Exchange options 4.2.5.2. Options on the minimum and the maximum 4.3. Approximation methods 4.3.1. Lattice-based methods 4.3.1.1. Binomial method 4.3.1.2. Trinomial method 4.3.1.3. Finite difference methods 4.3.2. Monte Carlo methods 4.3.3. Quasi-analytical methods 4.3.3.1. Compound option approximation 4.3.3.2. Quadratic approximation 4.3.4. Moore’s law 4.4. Generalizations 4.4.1. Deterministic volatility functions 4.4.2. Stochastic volatility functions
5. Studies of no-arbitrage price relations 5.1. Forward/futures prices 5.1.1. Cost of carry relation 5.1.2. Forward/futures price relation 5.2. Option prices 5.2.1. Intrinsic value relations 5.2.2. Put–call parity relations 5.3. Summary and analysis
6. Studies of option valuation models 6.1. Pricing errors/implied volatility anomalies 6.2. Trading simulations 6.3. Informational content of implied volatility 6.4. Summary and analysis
7. Social costs/benefits of derivatives trading 7.1. Contract introductions 7.1.1. Institutional factors 7.1.2. Changes in return volatility 7.1.3. Price effects 7.2. Contract expirations
Ch. 19:
Derivatives
7.3. Market synchronization 7.3.1. Stock market versus option market 7.3.2. Stock market versus index derivatives markets 7.4. Summary and analysis
8. Summary References
Abstract
The area of derivatives is arguably the most fascinating area within financial economics during the past thirty years. This chapter reviews the evolution of derivatives contract markets and derivatives research over the past thirty years. The chapter has six complementary sections. The first contains a brief history of contract markets. The most important innovations occurred in the 1970s and 1980s, when contracts written on financial contracts were introduced. Concurrent with these important industry innovations was the development of modern-day option valuation theory, which is reviewed in the second and third sections. The key contribution is the seminal theoretical framework of the Black–Scholes (1973) and Merton (1973) (“BSM”) model. The key economic insight of their model is that a risk-free hedge can be formed between a derivatives contract and its underlying asset. This implies that contract valuation is possible under the assumption of risk-neutrality without loss of generality. The final three sections summarize the three main strands of empirical work in the derivatives area. In the first group are studies that focus on testing no-arbitrage pricing relations that link the prices of derivatives contracts with their underlying asset and with each other. The second group contains studies that evaluate the empirical performance of option valuation models. The approaches used include investigating the in-sample properties of option values by examining pricing errors or patterns in implied volatilities, examining the performance of different option valuation models by simulating a trading strategy based on under- and over-pricing, and examining the informational content of the volatility implied by option prices. The final group focuses on the social costs and/or benefits that arise from derivatives trading.
The main conclusion that can be drawn from the empirical work is that the BSM model is one of the most resilient in the history of financial economics.
Keywords
derivatives, options, forwards, futures, swaps
JEL classification: G100, G120, G130, G140
1. Introduction
Arguably the most fascinating area within financial economics during the past thirty years is derivatives. With virtually no derivatives contracts written on financial assets at the beginning of the 1970s, the industry has grown to a level exceeding $100 trillion. This growth would not have been possible without the powerful theoretical contributions of Black–Scholes (1973) and Merton (1973). Their concept of forming a risk-free hedge between a derivatives contract and its underlying asset serves as the foundation for valuing an enormous array of different contract structures.
The purpose of this chapter is to provide an overview of the key contributions to the derivatives literature over the past thirty years. This review has six complementary sections. Section 2 contains a brief history of derivatives contracts and contract markets. Although the origin of derivatives use dates back thousands of years, the most important innovations occurred only recently, in the 1970s and 1980s. Not coincidentally, these two decades are also the most important in terms of theoretical developments in the derivatives literature.
Section 3 is the first of two that focus on derivative contract valuation. The key assumption in the development of the valuation results presented in this section is the law of one price – two perfect substitutes will have the same price in a rationally functioning marketplace. Under this seemingly innocuous assumption, a myriad of pricing relations can be developed for derivatives contracts including forwards, futures, options, and swaps.
Section 4 focuses particularly on the valuation of contingent claims. To value such claims, it is necessary to know the character of the asset-price distribution at time(s) in the future as well as the appropriate discount rate to apply in bringing the expected future cash flows of the derivatives contract (which, of course, depend on the asset price distribution) back to the present.
It is this style of claim that underlies the seminal theoretical framework of the Black–Scholes (1973) and Merton (1973) (“BSM”) model. The key economic insight of their model is that a risk-free hedge can be formed between an option and its underlying asset. This implies that option valuation is possible without knowing investor risk preferences, hence the risk-free rate of interest can be used as the appropriate discount rate to apply to the expected future cash flows. To develop the expectations of future cash flows, BSM assume that the underlying asset price follows geometric Brownian motion with constant volatility. Among other things, this implies that, at any point in time in the future, the asset price will be log-normally distributed, eliminating the prospect of negative asset prices that had plagued earlier work. Not only did this framework provide BSM with the ability to value standard call and put options, it has provided other researchers with the ability to value thousands of differently structured agreements including caps, collars, floors, binary options, and quantos. Many of these contributions, as well as other extensions to the BSM model, are summarized. Sections 5 through 7 of this chapter summarize empirical work that investigates the pricing and valuation of derivatives contracts and the efficiency of the markets within
which they trade. The studies are divided into three groups. In the first group are studies that focus on testing no-arbitrage pricing conditions. These are contained in Section 5. A review of tests of the no-arbitrage price relations between forwards and futures and their underlying assets as well as tests of lower price bounds and put–call parity in the options markets is provided.
The second group contains studies that attempt to evaluate the empirical performance of option valuation models. Approaches differ. Some investigate the in-sample properties of option values by examining pricing errors or patterns in implied volatilities. Others examine the performance of different option valuation models by simulating a trading strategy based on under- and over-pricing. Yet others examine the informational content of the volatility implied by option prices. Discussions of each approach are included in Section 6.
The third and final group of studies focuses on the social costs and/or benefits that arise from derivatives trading. One sub-group examines whether the introduction of derivatives trading disrupts the market for the underlying asset by generating abnormal price movements and/or increased volatility. A second sub-group examines whether the expiration of derivatives contracts disrupts the underlying asset market. A final sub-group examines the inter-temporal relation of price movements in the derivatives and asset markets to ascertain, among other things, where private information is being traded first. All of these discussions are contained in Section 7. The final section contains a brief summary.
2. Background
Derivatives, while seemingly new, have been used for thousands of years. In his treatise, Politics, 1 Aristotle tells the story of Thales, a philosopher (and reasonably good meteorologist). Based on studying the winter sky, Thales predicted an unusually large olive harvest. He was so confident of his prediction that he bought rights to rent all of the olive presses in the region for the following year. The fall arrived, and the harvest was unusually plentiful. The demand and price for the use of olive presses soared.
Thales’ call options were early examples of over-the-counter (OTC) derivatives. OTC derivatives are private contracts negotiated between parties. Thales bought, and the olive press owners sold, call options. The prices of the options were negotiated, and Thales paid for them in the form of cash deposits. The chief advantage of OTC derivatives markets is limitless flexibility in contract design. The underlying asset can be anything, the size of the contract can be any amount, and the delivery can be made at any time and at any location. The only requirement of an OTC contract is a willing buyer and seller.
1 See Politics by Aristotle (350 BC, Book 1, Part XI).
Among the disadvantages of OTC markets, however, is that willing buyers and sellers must spend time identifying each other. Thousands of years ago, before the advent of high-speed communication and computer technology, such searches were costly. Consequently, centralized markets evolved. The Romans organized commodity markets with specific locations and fixed times for trading. Medieval fairs in England and France during the 12th and 13th centuries served the same purpose. While centralized commodity markets were originally developed to facilitate immediate cash transactions, the practice of contracting for future delivery (i.e., forward transactions) was also introduced. Another disadvantage of OTC derivatives is credit risk, that is, the risk that a counterparty will renege on his contractual obligation. Perhaps the most colorful example of this type of risk involves forward and option contracts on tulip bulbs. In what can be characterized as a speculative bubble, rare and beautiful tulips became collectors’ items for the upper class in Holland in the early 17th century. Prices soared to incredible levels. 2 Homes, jewels, livestock – nothing was too precious that it could not be sacrificed for the purchase of tulips. In an attempt to cash-in on this craze, it was not uncommon for tulip bulb dealers to sell bulbs for future delivery. They did so based on call options provided by tulip bulb growers. In this way, if bulb prices rose significantly prior to delivery, the dealers would simply exercise their options and acquire the bulbs to be delivered on the forward commitments at a fixed (lower) price. The tulip bulb growers also engaged in risk management by buying put options from the dealers. In this way, if prices fell, the growers could exercise their puts and sell their bulbs at a price higher than that prevailing in the market. In retrospect, both the tulip bulb dealers and growers were managing the risk of their positions quite sensibly. 
Everything could have worked out fine, except that the bubble burst in the winter of 1637 when a gathering of bulb merchants could not get the usual inflated prices for their bulbs. Panic ensued. Prices sank to levels of 1/100th of what they had once been. This set off an unfortunate chain of events. Individuals who had agreed to buy bulbs from dealers did not do so. Consequently, dealers did not have the cash necessary to buy the bulbs when the growers attempted to exercise their puts. Some legal attempts were made to enforce the contracts; however, the attempts were unsuccessful. These contract defaults left an indelible mark on OTC derivatives trading.
By the 1800s, the pendulum had swung from undisciplined derivatives trading in OTC markets toward more structured trading on organized exchanges. The first derivatives exchange in the USA was the Chicago Board of Trade (CBT). While the CBT was originally formed in 1848 as a centralized marketplace for exchanging grain, forward contracts were also negotiated. The earliest recorded forward contract trade was made on March 13, 1851 and called for 3000 bushels of corn to be delivered in June at a price of one cent per bushel below the March 13th spot price. 3 Forward
2 Garber (2000) provides a detailed recount of tulip bulb price levels during this period.
3 See Chicago Board of Trade (1994, ch. 1, p. 14).
contracts had their drawbacks, however. They were not standardized according to quality or delivery time. In addition, like in the case of the tulip bulb fiasco, merchants and traders often did not fulfill their forward commitments. In 1865, the CBT made three important changes to the structure of their grain trading market. First, they introduced the use of standardized contracts called futures contracts. Unlike forward contracts in which the parties are free to choose the terms of the contract, the terms of futures contracts are set by the exchange and are standardized with respect to quality, quantity, and time and place of delivery for the underlying commodity. By concentrating hedging and speculative demands on fewer contracts, the depth and liquidity of the market are enhanced. This facilitates position unwinding. If a party to a trade wants to exit his position prior to the delivery date of the contract, he need only execute an opposite trade (i.e., reverse his trade) in the same contract. There is no need to seek out the counterparty of the original trade and attempt to negotiate the contract’s termination. The second and third changes were made in an effort to promote market integrity. The second was the introduction of a clearinghouse to stand between the buyer and the seller and guarantee the performance of each party. This crucial step eliminated the counterparty risk that had plagued OTC trading. In the event a buyer defaults, the clearinghouse “makes good” on the seller’s position, and then holds the buyer’s clearing firm liable for the consequences. The buyer’s clearing firm, in turn, passes the liability onto the buyer’s broker, and ultimately the buyer. Note that, at any point in time, the clearinghouse has no net position since there are as many long contracts outstanding as there are short. The third was the introduction of a margining system. 
When the buyer and seller enter a futures position, they are both required to deposit good-faith collateral designed to show that they can fulfill the terms of the contract.
From the late 1800s through the early 1980s, the majority of derivatives trading took place on exchanges. Futures contracts were the dominant contract design, and agricultural commodities were the dominant underlying asset. The list of contracts that have been active for more than a century includes the CBT’s corn, oats and wheat futures launched in 1865, the New York Cotton Exchange’s cotton futures launched in 1870, the butter and egg futures launched in 1874 by the Chicago Produce Exchange 4 (which was formed by a group of agricultural dealers and later became the Chicago Mercantile Exchange), and the Coffee Exchange’s coffee futures launched in 1882. 5
The move to non-agricultural commodities was slow. Indeed, 51 years elapsed before the Commodity Exchange (COMEX) in New York was formed to trade the first metals
4 The Chicago Produce Exchange also traded futures on other perishable commodities. In 1898, the butter and egg dealers withdrew to form their own market, the Chicago Butter and Egg Board. In 1919, it was reorganized to trade other commodity futures and was renamed the Chicago Mercantile Exchange.
5 In 1914, the Coffee Exchange expanded to include sugar futures, and, in 1916, it changed its name to the New York Coffee and Sugar Exchange. In 1979, it merged with the New York Cocoa Exchange to form today’s Coffee, Sugar & Cocoa Exchange.
contract – silver futures. The New York Mercantile Exchange (NYMEX) followed with platinum futures in December 1956 and palladium futures in January 1968. The introduction of futures on livestock occurred in the 1960s. The Chicago Mercantile Exchange (CME) launched pork belly futures in September 1961, live cattle futures in November 1964, and live hog futures in February 1966. Futures contracts on energy products did not emerge until November 1978, at which time the NYMEX introduced the heating oil futures contract. The pace of innovation in derivatives markets increased remarkably in the 1970s. Many of the important events occurring during this decade, as well as the next, are summarized in Table 1. The first major innovation occurred in February 1972, when the CME began trading futures on currencies in its International Monetary Market (IMM) division. This marked the first time a futures contract was written on anything other than a physical commodity. The second was in April 1973, when the CBT formed the Chicago Board Options Exchange (CBOE) to trade options on common stocks. 6 This marked the first time an option was traded on an exchange. The third major innovation occurred in October 1975, when the CBT introduced the first futures contract on an interest rate instrument – Government National Mortgage Association futures. In January 1976, the CME launched Treasury bill futures and, in August 1977, the CBT launched Treasury bond futures. The 1980s brought yet another round of important innovations. The first was the use of cash settlement. In December 1981, the IMM launched the first cash settlement contracts, the 3-month Eurodollar futures. At expiration, the Eurodollar futures is settled in cash based on the interest rate prevailing for a three-month Eurodollar time deposit. 7 Cash settlement made feasible the introduction of derivatives on stock index futures, the second major innovation of the 1980s. 
In February 1982, the Kansas City Board of Trade (KCBT) listed futures on the Value Line Composite stock index, and, in April 1982, the CME listed futures on the S&P 500. These contract introductions marked the first time that futures contracts were written on stock indexes. The third major innovation of the 1980s was the introduction of exchange-traded option contracts written on “underlyings” 8 other than individual common stocks. 9 The CBOE and AMEX listed interest rate options in October 1982 and the Philadelphia Stock Exchange (PHLX) listed currency options in December 1982. In the same year, options
6 Initially, only call options were listed in the USA. Put options were not listed until June 1977, and, even then, only on an experimental basis.
7 A Eurodollar time deposit is a U.S. dollar deposit in a London bank, and the interest rate quoted on such deposits is called the London Interbank Offered Rate (i.e., LIBOR). Since different banks may offer different rates on deposits of the same maturity, the settlement rate is based on an average of rates across banks.
8 From this point forward, the term “underlying” refers to the asset or instrument that underlies the derivative contract.
9 For a comprehensive review of these new option introductions and their economic purposes, see Stoll and Whaley (1985).
Table 1
Milestones in the history of derivative contract market development

1750 BC  Options to default on interest payments are described in the Code of Hammurabi.
350 BC   Options to rent olive presses are described in Aristotle’s Politics.
1600 AD  Forward and option contracts on tulip bulbs flourish in Holland. Tulip bulb prices collapse in the winter of 1637 causing contract defaults.
1848     Chicago Board of Trade (CBT) is formed to provide a centralized marketplace for cash and forward transactions in grains.
1865     CBT revamps forward markets by introducing futures contracts on agricultural commodities. Futures contracts are standardized contracts in terms of quality, quantity, and time and place of delivery. Futures trading involved a clearinghouse and a system of margining.
1870     New York Cotton Exchange (NYCE) is formed to trade cotton futures.
1874     Chicago Produce Exchange (CPE) is formed to trade futures on butter, eggs, poultry, and other perishable products.
1878     London Corn Trade Association introduces the first futures contract in the UK.
1882     Coffee Exchange (CE) is formed by a group of coffee merchants to trade coffee futures.
1898     Butter and egg dealers withdraw from the CPE to form the Chicago Butter and Egg Board (CBEB).
1904     Winnipeg Commodity Exchange (WCE) introduces first commodity (oat) futures contracts in Canada.
1919     São Paulo Commodities Exchange (BMSP) introduces first commodity futures in Brazil.
         CBEB becomes the Chicago Mercantile Exchange (CME).
1933     Commodity Exchange (COMEX) is formed and introduces first futures contract on a non-agricultural commodity – silver.
1952     October: London Metal Exchange (LME) lists the first metal (lead) futures contract in the UK.
1960     Sydney Futures Exchange (SFE), originally called the Greasy Wool Futures Exchange, is formed to trade greasy wool futures.
1961     September: CME introduces first futures contract on livestock – frozen pork bellies.
1972     February: CME introduces first futures contract on a financial instrument – foreign currencies.
1973     April 26th: CBT organizes the Chicago Board Options Exchange (CBOE) for the purpose of trading call options on sixteen New York Stock Exchange (NYSE) common stocks. Trading begins in a small smokers’ lounge overlooking the futures exchange.
1975     CBT introduces first interest rate futures contract – Government National Mortgage Association (GNMA) futures.
         Montreal Exchange (ME) begins to list stock options in Canada.
         January 13th: American Stock Exchange (AMEX) begins to list call options on stocks.
         June 27th: Philadelphia Stock Exchange (PHLX) begins to list call options on stocks.
1976     Australian Options Market (AOA) is formed in Australia to list stock options.
         January: CME begins to list T-bill futures contracts.
         March: Toronto Stock Exchange (TSE) lists stock options in Canada.
         April: Pacific Stock Exchange (PSE) begins to list stock options.
         December: Midwest Stock Exchange (MSE) begins to list call options on stocks.
1977     June 3rd: Put options on common stocks are listed for the first time in the USA on the CBOE, AMEX, MSE, PHLX, and PSE.
         August: CBT begins to list T-bond futures contracts.
1978     London Traded Options Market (LTOM) is formed and begins to list stock options.
         European Options Exchange (EOE), formed in November 1977, begins to list stock options in The Netherlands.
         November: NYMEX introduces first energy futures – heating oil.
1980     International Petroleum Exchange (IPE) is formed in the UK to list futures on petroleum and petroleum products.
         First over-the-counter (OTC) Treasury bond option takes place.
         September: Toronto Futures Exchange (TFE) is formed to list futures contracts on financial assets in Canada.
1981     First over-the-counter (OTC) interest rate swap transaction takes place.
         December: CME introduces the first cash settlement futures contract – the Eurodollar futures.
1982     London International Financial Futures Exchange (LIFFE) is formed in the UK to trade futures on financial instruments.
         February: Kansas City Board of Trade (KCBT) introduces first futures on a stock index (i.e., the Value Line stock index).
         April: CME begins to list S&P 500 index futures.
         October: First options listed on instruments other than common stocks:
           • CBOE and AMEX begin to list options on Treasury bonds, notes, and bills.
           • CBT begins to list options on T-bond futures.
           • Coffee, Sugar, and Cocoa (CSCE) begins to list options on sugar futures.
           • COMEX begins to list options on gold futures.
         December: PHLX begins to list options on currencies.
1983     January: CME and New York Futures Exchange (NYFE) begin to list options on stock index futures.
         February: SFE begins to list futures on the All Ordinaries Share Price Index in Australia.
         March: CBOE begins to list options on stock indexes.
1984     Singapore International Monetary Exchange (SIMEX) is inaugurated as the first financial futures exchange in Asia.
         May: LIFFE begins to list futures on the FT-SE 100 index in the UK.
1985     June 3rd: Options on NASDAQ stocks begin trading.
         June 3rd: New York Stock Exchange (NYSE) begins trading stock options.
1986     May: Hong Kong Futures Exchange begins to list futures on the Hang Seng Index.
         September: SIMEX begins to list futures on the Nikkei 225 Stock Average.
1991     Notional amount of OTC derivatives trading surpasses exchange-traded derivatives.
on futures appeared for the first time. In October 1982, the CBT began to list Treasury bond futures options, the Coffee, Sugar and Cocoa Exchange (CSCE) began to list options on sugar futures, and COMEX began to list options on gold futures. In January 1983, the CME and the New York
Futures Exchange (NYFE) began to list options directly on stock index futures, and, in March 1983, the CBOE began to list options on stock indexes. These two decades of innovation have transformed the nature of derivatives trading activity on exchanges. While derivatives exchanges were originally developed to help market participants manage the price risk of physical commodities, today’s trading activity is focused on hedging the financial risks associated with unanticipated price movements in stocks, bonds, and currencies. The 1980s also saw the re-emergence of OTC derivatives trading. As derivatives on financial assets became increasingly popular, investment banks began to think of new ways to tailor contracts to meet customer needs. Some innovations were minor changes in the standard terms of exchange-traded derivatives contracts on financial instruments (e.g., modifications to the expiration date and/or the contract denomination). In 1980, for example, the first OTC Treasury bond option was traded. Other contracts were new and seemingly different. They fall under the generic heading of “swaps”. A swap contract is a contract to “swap” a series of periodic future cash flows, where the terms of the swap are usually set such that the up-front payment is zero. The first interest rate swap was in 1981, when the Student Loan Marketing Association (i.e., “Sallie Mae”) swapped interest payments on intermediate-term fixed rate debt for floating-rate payments indexed to the three-month Treasury bill rate. The cash flows of the two legs of a swap can be linked to virtually any asset or index. A basis rate swap, for example, is an exchange of floating rate payments where the two floating rates are linked to, say, a three-month Treasury bill rate and a three-month Eurodollar time deposit rate. A currency swap is an exchange of interest payments (either fixed or floating) in one currency for payments (either fixed or floating) in another. 
An equity swap involves the exchange of an interest rate payment and a payment based on the performance of a stock index, while an equity basis swap involves an exchange of payments on two different indexes. Swap agreements may appear different from standard forward and option contracts, but they are not. Every swap can be decomposed into a portfolio of forwards and options. The benefit a swap provides is that several transactions are bundled into a single product.
3. No-arbitrage pricing relations
A great deal can be learned about valuing derivatives under minimal assumptions. The sole necessary assumption is the law of one price (LOP). Stated simply, the LOP says that two perfect substitutes must have the same price. If they do not, a costless arbitrage profit can be earned by simultaneously buying the cheaper asset and selling the more expensive one. 10 Because the same asset is bought and sold simultaneously,
10 This “law of one price” argument is the fundamental theoretical underpinning to the Nobel prize-winning corporate finance theory of Modigliani/Miller (1958) and Miller/Modigliani (1961).
the position is risk-free. This is the key attribute of an arbitrage strategy. 11 The fact that the strategy involves no initial cash outlay makes it costless. The absence of costless arbitrage opportunities is fundamental in derivatives contract valuation.
A second assumption is that markets are frictionless. Frictionless markets have a number of attributes including:
(a) No trading costs.
(b) No differential tax rates.
(c) Unlimited borrowing and lending at the risk-free rate of interest.
(d) Freedom to sell (short), with full use of any proceeds.
(e) Can trade at any time and in any quantity.
The frictionless market assumption is made largely for convenience. By ignoring market frictions, pricing relations can more easily be identified. In most cases, the impact of considerations such as trading costs, taxes, and divergent borrowing and lending rates can be and have been introduced into the valuation framework straightforwardly. Indeed, the very presence of these market restrictions has caused many derivatives markets to thrive.
The focus of this section is to describe some important no-arbitrage relations for derivative contract prices.
3.1. Carrying costs
Derivative contracts are written on four types of assets – stocks, bonds, foreign currencies and commodities. The derivatives literature contains seemingly independent developments of derivative valuation principles for each asset category. Generally speaking, however, the valuation principles are not asset-specific. The only distinction among assets is how carry costs are modeled. 12
The cost of carry refers to the difference between the costs and the benefits that accrue while holding an asset. Suppose a breakfast cereal producer needs 5000 bushels of wheat for processing in two months. To lock in the price of the wheat today, he can buy it and carry it for two months. One cost of this strategy is the opportunity cost of funds. To come up with the purchase price, he must either borrow money or reduce his earning assets by that amount. Beyond interest cost, however, carry costs vary depending upon the nature of the asset. For a physical asset such as wheat, he incurs storage costs (e.g., rent and insurance). At the same time, by storing wheat, he avoids the costs of possibly running out of his regular inventory before two months are up and having to pay extra for emergency deliveries. This benefit is called convenience
11 The term arbitrage is frequently misapplied. Risk arbitrage, for example, refers to a trading strategy in which the shares of a firm rumored to be on the verge of being acquired are purchased and the shares of the acquiring firm are simultaneously sold. Since the merger may or may not take place and the stock prices may change, this activity is not arbitrage.
12 See, for example, Stoll and Whaley (1986b).
yield. Thus, the cost of carry for a physical asset equals interest cost plus storage costs less convenience yield, that is,

$$\text{Carry costs} = \text{Cost of funds} + \text{storage cost} - \text{convenience yield}. \tag{1a}$$

For a financial asset such as a stock or a bond, storage costs are negligible. Moreover, income (yield) accrues in the form of quarterly cash dividends or semi-annual coupon payments. The cost of carry for a financial asset is

$$\text{Carry costs} = \text{Cost of funds} - \text{income}. \tag{1b}$$
Carry costs and benefits are modeled either as continuous rates or as discrete flows. Some costs/benefits such as the cost of funds (i.e., the risk-free interest rate) are best modeled continuously. The dividend yield on a broadly-based stock portfolio and the interest income on a foreign currency deposit also fall into this category. Other costs/benefits like quarterly cash dividends on individual common stocks, semi-annual coupons on bonds, and warehouse rent payments for holding an inventory of grain are best modeled as discrete cash flows. In the interest of brevity, only continuous costs are considered here. 13
Dividend income from holding a broadly-based stock index portfolio or interest income from holding a foreign currency is typically modeled as a constant, continuous rate. 14 The income, as it accrues, is re-invested in more units of the asset. In this way, buying $\exp[-iT]$ units of a stock index portfolio today grows to exactly one unit at time $T$, and produces a net terminal value of $\tilde{S}_T - S \exp[(r - i)T]$. The cost of carry rate equals the difference between the risk-free rate of interest $r$ and the dividend yield rate $i$ for a stock index portfolio investment, and equals the difference between the domestic interest rate $r$ and the foreign interest rate $i$ for a foreign currency investment. The total cost of carry paid at time $T$ is

$$\text{Carry costs} = S \left( \exp[(r - i)T] - 1 \right). \tag{2}$$
13 For a detailed discussion of the ways in which carrying costs can be modeled, see Whaley (2002).
14 The carry cost rates within this framework are deterministic. In the short run, this assumption is reasonable for most types of assets.
1142
R.E. Whaley
3.2. Valuing forward/futures using the no-arbitrage principle
The value of a forward contract is inextricably linked to the cost of carry of the underlying asset. Since a forward contract requires its buyer to accept delivery of the underlying asset at time \(T\), buying a forward contract today is a perfect substitute for buying the asset today and carrying it until time \(T\). The present value of the payment obligation under the forward contract strategy is \(f\exp[-rT]\), where \(f\) is the forward price, and the present value of the latter strategy is \(S\exp[-iT]\). Since both strategies provide exactly one unit of the asset at time \(T\) (i.e., \(\tilde{S}_T\)), their costs must be identical, \(f\exp[-rT] = S\exp[-iT]\).
(3a)
If the relation (3a) does not hold, costless arbitrage profits would be possible by selling the over-priced instrument and simultaneously buying the under-priced one. The relation (3a) is the present value version of the cost of carry relation. A more familiar version is the future value form, f = S exp[(r − i) T ].
(3b)
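As a minimal numerical sketch of relations (2), (3a) and (3b), the snippet below computes the full-carry forward price and checks that the forward strategy and the buy-and-carry strategy cost the same today. The function names and the input values (asset price 100, r = 5%, i = 2%, T = 0.5 years) are illustrative assumptions, not figures from the text.

```python
import math

def forward_price(S, r, i, T):
    """Full-carry forward price, Equation (3b): f = S * exp((r - i) * T)."""
    return S * math.exp((r - i) * T)

def carry_cost(S, r, i, T):
    """Total carry cost paid at time T, Equation (2)."""
    return S * (math.exp((r - i) * T) - 1.0)

# Hypothetical inputs, chosen only for illustration
S, r, i, T = 100.0, 0.05, 0.02, 0.5

f = forward_price(S, r, i, T)
pv_forward = f * math.exp(-r * T)   # present value of paying f at time T
pv_asset = S * math.exp(-i * T)     # cost of exp(-iT) units of the asset today
basis = f - S                       # forward price less asset price

print(f"forward price      : {f:.4f}")
print(f"PV forward strategy: {pv_forward:.4f}")
print(f"PV asset strategy  : {pv_asset:.4f}")
print(f"basis = carry cost : {basis:.4f} vs {carry_cost(S, r, i, T):.4f}")
```

Under these assumptions the two present values coincide, and the basis at full carry equals the total carry cost of Equation (2).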
When the prices of the forward and the asset are such that Equation (3a) and/or Equation (3b) hold exactly, the forward market is said to be at full carry. Unless costless arbitrage is somehow impeded, the forward market will always be at full carry. The difference between the forward (or futures) price and the asset price is frequently referred to as the basis.
Futures contracts are like forward contracts, except that price movements are marked-to-market each day rather than receiving a single, once-and-for-all settlement on the contract's expiration day. 15 Obviously, the sum of the daily mark-to-market price moves over the life of the futures equals the overall price movement of a forward with the same maturity. With the futures position, however, the mark-to-market profits (losses) are invested (carried) at the risk-free interest rate until the futures expires. The value of the futures position at time \(T\), therefore, may be greater or less than the terminal value of the forward position, depending on the path that the futures price follows over the life of the contract.
15 Recall that, in Section 2, we discussed the historical fact that the CBT changed from making markets in forward contracts to making markets in futures contracts in 1865 as a means of ensuring market integrity.
Cox, Ingersoll and Ross (1981) (hereafter “CIR”) and Jarrow and Oldfield (1981), among others, use no-arbitrage arguments to show the equivalence of forward and futures prices when interest rates are deterministic. To illustrate their argument, assume that the term structure of interest rates is flat and does not change through time. Also, assume that \(r\) is the continuously compounded interest rate on a daily basis, with \(T\) measured in days. Now, consider a “rollover” futures position that begins, on day 0, with \(\exp[-r(T-1)]\) futures contracts and that increases the number of futures each day by a factor \(\exp[r]\). At the end of day 1, the position is marked-to-market, generating proceeds of \(\exp[-r(T-1)](\tilde{F}_1 - F)\). Assuming this gain/loss is carried forward until day \(T\), the terminal gain/loss will be \(\exp[-r(T-1)](\tilde{F}_1 - F)\exp[r(T-1)] = \tilde{F}_1 - F\). For day 2, the position is increased by a factor \(\exp[r]\) and is marked-to-market at \(\exp[-r(T-2)](\tilde{F}_2 - \tilde{F}_1)\), generating proceeds of \(\exp[-r(T-2)](\tilde{F}_2 - \tilde{F}_1)\exp[r(T-2)] = \tilde{F}_2 - \tilde{F}_1\) on day \(T\), and so on. Because the number of futures is chosen to exactly offset the accumulated interest factor on the daily mark-to-market gain/loss, the daily gains sum to \(\tilde{F}_T - F\), and the rollover futures position has exactly the same terminal value as the long forward position. Under the no-arbitrage assumption, the valuation equation for a futures contract is the same as that of the forward, that is, \(F = f = S\exp[(r-i)T]\).
(4)
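The rollover argument can be checked numerically. The sketch below is an illustration under stated assumptions: the futures price path is a simulated, hypothetical series, the daily rate is an arbitrary 5%/365, and the function names are invented for this example. It accumulates the daily mark-to-market proceeds of the rollover position to day T and compares the total with the gain on a long forward, taken here as the change in the futures price over the contract's life.

```python
import math
import random

def rollover_terminal_value(path, r):
    """Terminal value of the rollover futures position.

    path[t] is the futures price at the end of day t (path[0] is today's
    price); r is the continuously compounded interest rate per day.
    """
    T = len(path) - 1
    total = 0.0
    for t in range(1, T + 1):
        contracts = math.exp(-r * (T - t))            # contracts held over day t
        gain = contracts * (path[t] - path[t - 1])    # mark-to-market proceeds
        total += gain * math.exp(r * (T - t))         # carried forward to day T
    return total

def single_contract_terminal_value(path, r):
    """Same accumulation, but holding exactly one futures contract each day."""
    T = len(path) - 1
    return sum((path[t] - path[t - 1]) * math.exp(r * (T - t))
               for t in range(1, T + 1))

# Hypothetical 90-day futures price path
random.seed(0)
r_daily = 0.05 / 365
path = [100.0]
for _ in range(90):
    path.append(path[-1] * math.exp(random.gauss(0.0, 0.01)))

rollover = rollover_terminal_value(path, r_daily)
forward_gain = path[-1] - path[0]
single = single_contract_terminal_value(path, r_daily)

print(f"rollover futures position: {rollover:.6f}")
print(f"long forward gain        : {forward_gain:.6f}")
print(f"one futures held each day: {single:.6f}")
```

The rollover total matches the forward gain up to floating-point error, whereas the single-contract total generally differs because its interim gains and losses earn interest.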
CIR also use no-arbitrage arguments to show the relation between forward and futures prices when interest rates are stochastic. They find that the futures price will be less (greater) than the forward price if (a) the price changes of the futures contract and the default-free discount bond are positively (negatively) correlated and/or (b) the variance of bond price changes is less than (exceeds) the covariance between spot price changes and bond price changes. They also show that if the covariance between spot price changes and bond price changes is positive, the futures price is less than the forward price, and the futures-forward price difference is a decreasing function of the expected forward-bond covariance.
3.3. Valuing options using the no-arbitrage principle
The no-arbitrage pricing results for options come in two primary forms: lower price bounds and put–call parity conditions. Each is discussed in turn.
3.3.1. Call options
The lower price bound of a European-style call option is \(c \ge \max(0, S\exp[-iT] - X\exp[-rT])\),
(5)
where \(c\) is the price of a European-style call with exercise price \(X\) and time to expiration \(T\). The price of the call must be greater than or equal to zero since it is a privilege. The reason the call price must equal or exceed \(S\exp[-iT] - X\exp[-rT]\) is based on the following no-arbitrage argument. Suppose a portfolio is formed by selling \(\exp[-iT]\) units of the underlying asset and buying a European-style call. To ensure that enough cash is on hand to exercise the call at expiration, \(X\exp[-rT]\) in risk-free securities are also purchased. At time \(T\), the net value of the portfolio depends on whether the asset price is above or below the exercise price. If the asset price is below the exercise price, the call expires worthless. The risk-free securities (plus accrued interest) are used to buy a unit of the asset to cover the short sale obligation. A cash balance of \(X - \tilde{S}_T\) remains. If the asset price is greater than the exercise price on day \(T\), the call will be exercised. This requires a cash payment of \(X\), which can be made exactly using the risk-free securities. The unit of the asset received upon exercising the call is used to retire the short sale obligation. Thus, if \(\tilde{S}_T > X\), the net terminal value of the portfolio is certain to be 0. Considering both possible outcomes, this portfolio is certain to have a net terminal value of at least 0. This means that the net proceeds from forming the portfolio must be less than or equal to 0; otherwise a costless arbitrage opportunity would exist. With \(S\exp[-iT] - X\exp[-rT] - c \le 0\), the lower price bound of a European-style call is \(c \ge S\exp[-iT] - X\exp[-rT]\).
In general, the lower price bound of an option is called its intrinsic value, and the difference between the option's market price and its intrinsic value is called its time value. A European-style call has an intrinsic value of \(\max(0, S\exp[-iT] - X\exp[-rT])\) and a time value of \(c - \max(0, S\exp[-iT] - X\exp[-rT])\). The no-arbitrage principle identifies the intrinsic value of an option. The determinants of time value are the focus of Section 4.
American-style options are like European-style options except that they can be exercised at any time up to and including the expiration day. Since this additional right cannot have a negative value, the relation between the prices of American-style and European-style call options is \(C \ge c\),
(6)
where \(C\) represents the price of an American-style call option with the same exercise price and time to expiration and on the same underlying asset as the European-style call. The lower price bound of an American-style call option is \(C \ge \max(0, S\exp[-iT] - X\exp[-rT], S - X)\).
(7)
Equation (7) is the same as Equation (5), except that the term \(S - X\) is added within the maximum value operator on the right-hand side since the American-style call cannot sell for less than its early exercise proceeds, \(S - X\). If \(C < S - X\), a costless arbitrage profit of \(S - X - C\) can be earned by simultaneously buying the call (and exercising it) and selling the asset. The structure of the lower price bound of the American-style call (7) provides important insight regarding the motivation (or lack thereof) for early exercise. The second term in the parentheses, \(S\exp[-iT] - X\exp[-rT]\), is the minimum price at which the call can be sold in the marketplace. 16 The third term is the value of the American-style call if it is exercised immediately. If the value of the second term is greater than the third term (for a certain set of call options), the call's market price will always be greater than its exercise proceeds and it will never be optimal to exercise early.
16 To exit a long position in an American-style call option, you have three alternatives. First, you can hold it to expiration, at which time you will (a) let it expire worthless if it is out of the money or (b) exercise it if it is in the money. Second, you can exercise it immediately, receiving the difference between the current asset price and the exercise price. Third, you can sell it in the same marketplace. There is, after all, an active secondary market for standard calls and puts.
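The bounds (5) and (7), and the comparison of the second and third terms of (7), can be sketched in a few lines. The helper names and the numerical inputs below are assumptions made for illustration only; the three values of i stand for an asset with storage costs (i < 0), no carry income (i = 0), and a positive income rate (i > 0).

```python
import math

def european_call_lower_bound(S, X, r, i, T):
    """Equation (5): max(0, S*exp(-iT) - X*exp(-rT))."""
    return max(0.0, S * math.exp(-i * T) - X * math.exp(-r * T))

def american_call_lower_bound(S, X, r, i, T):
    """Equation (7): the European bound with S - X added inside the max."""
    return max(european_call_lower_bound(S, X, r, i, T), S - X)

def early_exercise_ruled_out(S, X, r, i, T):
    """True when the second term of (7) exceeds the third term, S - X."""
    return S * math.exp(-i * T) - X * math.exp(-r * T) > S - X

# Hypothetical inputs
S, X, r, T = 110.0, 100.0, 0.05, 0.5
for i in (-0.02, 0.0, 0.08):
    print(f"i = {i:+.2f}: "
          f"European bound = {european_call_lower_bound(S, X, r, i, T):.4f}, "
          f"American bound = {american_call_lower_bound(S, X, r, i, T):.4f}, "
          f"early exercise ruled out = {early_exercise_ruled_out(S, X, r, i, T)}")
```

With these inputs the second term dominates for i = -0.02 and i = 0, while for i = 0.08 the exercise proceeds S - X set the American bound.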
To identify this set of calls, examine the conditions under which the relation S exp[−iT ] − X exp[−rT ] > S − X , or S (exp[−iT ] − 1) > −X (1 − exp[−rT ]) ,
(8)
holds. Since the risk-free interest rate is positive, the expression on the right-hand side is negative. Hence, if the left-hand side is positive or zero, early exercise will never be optimal. This condition is met in cases in which \(i \le 0\). If \(i \le 0\), an American-style call will never optimally be exercised early, and the value of the American-style call is equal to the value of the European-style call, \(C = c\). Merton (1973) was the first to identify this result and refers to the situation as the call being “worth more alive than dead”.
The intuition underlying the “worth more alive than dead” result can be broken down into two components: interest cost, \(r\), and non-interest cost, \(i\). Holding other factors constant, a call option holder prefers to defer exercise. Immediate exercise requires a cash payment of \(X\) today. On the other hand, if exercise is deferred until the call's expiration, the cash is allowed to earn interest. The present value of the exercise cost is only \(X\exp[-rT]\). With respect to non-interest cost, recall that \(i < 0\) for physical assets that require storage. If a call on such an asset is exercised early, the asset is received immediately and storage costs begin to accrue. On the other hand, if exercise is deferred by continuing to hold the claim on the asset rather than the asset itself, storage costs are avoided. Note that, even if storage costs are zero (i.e., with \(i = 0\)), condition (8) holds because the interest cost incentive remains.
For American-style call options on assets with \(i > 0\) (e.g., stock index portfolios paying a dividend yield and foreign currencies paying foreign interest), early exercise may be optimal. The intuition is that, while there remains the incentive to defer exercise and earn interest on the exercise price, deferring exercise means forfeiting the income being generated on the underlying asset. The only way to capture this income is by exercising the call and taking delivery of the asset. Because early exercise may be optimal for such calls, \(C > c\).
3.3.2. Put options
The lower price bound of a European-style put option is \(p \ge \max(0, X\exp[-rT] - S\exp[-iT])\),
(9)
where \(p\) is the price of a European-style put with exercise price \(X\) and time to expiration \(T\). The reason that the put price must equal or exceed \(X\exp[-rT] - S\exp[-iT]\) is based on a no-arbitrage portfolio involving a long position in the put, a long position of \(\exp[-iT]\) units of the asset, and a short position of \(X\exp[-rT]\) in risk-free securities. If the asset price is less than or equal to the exercise price at the option's expiration, the put will be exercised. The cash proceeds from exercise are used to cover the risk-free borrowing. If the asset price is greater than the exercise price, the put expires worthless, and the asset is sold to cover the risk-free borrowing, leaving \(\tilde{S}_T - X\) in cash. Since the net terminal value of the portfolio is always greater than or equal to zero, the net proceeds from forming it, \(X\exp[-rT] - S\exp[-iT] - p\), must be less than or equal to zero.
An American-style put has an early exercise privilege, which means that the relation between the prices of American-style and European-style put options is \(P \ge p\),
(10)
where \(P\) represents the price of an American-style put option with the same exercise price, time to expiration and underlying asset as the European-style put. The lower price bound of an American-style put option is \(P \ge \max(0, X\exp[-rT] - S\exp[-iT], X - S)\).
(11)
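The no-arbitrage portfolio behind the put bound can be traced through both terminal outcomes. The sketch below is illustrative only: the function names, the terminal prices and the other inputs are assumptions. It evaluates bounds (9) and (11) and confirms that the portfolio (long put, long asset, short risk-free securities) never ends with a negative value.

```python
import math

def european_put_lower_bound(S, X, r, i, T):
    """Equation (9): max(0, X*exp(-rT) - S*exp(-iT))."""
    return max(0.0, X * math.exp(-r * T) - S * math.exp(-i * T))

def american_put_lower_bound(S, X, r, i, T):
    """Equation (11): the European bound with X - S added inside the max."""
    return max(european_put_lower_bound(S, X, r, i, T), X - S)

def put_portfolio_terminal_value(S_T, X):
    """Value at T of: long put, one unit of the asset, repayment of X borrowed."""
    put_payoff = max(X - S_T, 0.0)
    return put_payoff + S_T - X

# Hypothetical inputs
S, X, r, i, T = 90.0, 100.0, 0.05, 0.02, 0.5
print(f"European put bound: {european_put_lower_bound(S, X, r, i, T):.4f}")
print(f"American put bound: {american_put_lower_bound(S, X, r, i, T):.4f}")
for S_T in (60.0, 100.0, 140.0):
    print(f"S_T = {S_T:6.1f}: portfolio terminal value = "
          f"{put_portfolio_terminal_value(S_T, X):.1f}")
```

The terminal value is zero when the put is exercised and equals the asset price less the exercise price otherwise, which is the outcome the argument relies on.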
Equation (11) is the same as Equation (9), except that \(X - S\) is added within the maximum value operator. If \(P < X - S\), a costless arbitrage profit of \(X - S - P\) can be earned by simultaneously buying the put (and exercising it) and buying the asset.
In the case of an American-style call, early exercise is never optimal if the asset's income rate is less than or equal to zero (i.e., \(i \le 0\)). In the case of an American-style put, no comparable condition exists; 17 there is always a possibility of early exercise depending on the value of \(S\). To see this, suppose the asset price falls to 0. The put option holder will exercise immediately since (a) there is no chance that the asset price will fall further, and (b) deferring exercise means forfeiting the interest income that can be earned on the exercise proceeds. An American-style put is always worth more than the European-style put, \(P > p\).
3.3.3. Put–call parity
Put–call parity uses trades in the call, the put, and the asset simultaneously to create a risk-free portfolio. Put–call parity for European-style options is given by \(c - p = S\exp[-iT] - X\exp[-rT]\),
(12)
where the call and the put have the same exercise price and time to expiration, and are written on the same underlying asset. The pricing relation is driven by a no-arbitrage argument. In this case, the no-arbitrage portfolio consists of buying \(\exp[-iT]\) units of the asset, buying a put, selling a call with the same exercise price, and borrowing \(X\exp[-rT]\). It is straightforward to show that this portfolio will be worthless when the options expire at time \(T\) regardless of the relation between the asset price and the option's exercise price. Since no one would pay a positive amount to hold such a portfolio (or a portfolio with reverse investments), the put–call parity relation (12) must hold.
17 In the expression on the right-hand side of Equation (11), the third term is greater than the second term over some range for \(S\), independent of the level of \(i\).
The set of trades used to derive put–call parity is called a conversion. If all of the trades are reversed (i.e., sell the asset, sell the put, buy the call, and buy risk-free securities), it is called a reverse conversion. These names arise from the fact that you can create any position in the asset, options, or risk-free securities by trading (or converting) the remaining securities.
The concept of conversion/reverse conversion arbitrage was introduced into the academic literature about 30 years ago. 18 Some market participants were well aware of the concept decades earlier, however. Russell Sage, one of the great U.S. railroad speculators of the 1800s, used conversions to circumvent usury laws. Sage extended credit to individuals under three conditions: (a) they post collateral in the form of stock (with the loan amount capped at the current stock price, \(S\)), (b) they provide a written guarantee that Sage could sell back the stock at \(S\), and (c) they pay a cash premium to Sage for the right to buy the stock (when the loan is repaid) at \(S\). Ignoring the cash premium, the borrower has received an interest-free loan, borrowing \(S\) and then repaying \(S\). In reality, however, the loan is anything but interest free. The cost of the call embeds the interest cost. Conveniently, the usury laws did not apply to implicit interest rates.
The early exercise feature of American-style options complicates the put–call parity relation. The specification of the relation depends on the non-interest carry cost, \(i\).
The American-style put–call parity relations are
\[ S - X \le C - P \le S\exp[-iT] - X\exp[-rT] \quad \text{if } i \le 0, \]
(13a)
and
\[ S\exp[-iT] - X \le C - P \le S - X\exp[-rT] \quad \text{if } i > 0. \]
(13b)
Each inequality in Equations (13a) and (13b) has a separate set of no-arbitrage trades. Proofs are provided in Stoll and Whaley (1986b).
18 The appearance of put–call parity in the academic literature is in Stoll (1969).
3.3.4. Summary
The purpose of this section was to show some of the derivatives pricing relations that can be developed under the seemingly innocuous assumption that two perfect substitutes must have the same price. Some of these relations will be used in the next section to gather intuition about the specification of option valuation formulas. The relations also serve as the basis for the empirical investigations discussed in Section 5.
4. Option valuation
Valuing claims to uncertain income streams is one of the central problems in finance. The exercise is straightforward conceptually. First, the amount and the timing of the expected cash flows from holding the claim must be identified. Next, the expected cash flows must be discounted to the present. The valuation of a European-style call option, therefore, requires the estimation of (a) the mean of the call option's payoff distribution on the day it expires, and (b) the risk-adjusted discount rate to apply to the option's expected terminal payoff.
In his dissertation, Theory of Speculation, Bachelier (1900) provides the first known valuation of the European-style call option. His valuation equation, which may be written
\[ c = \int_X^{\infty} (S - X)\, f(S)\, dS, \]
(14)
shows that the option's value depends on its expected terminal value. Bachelier assumes that the underlying asset price follows arithmetic Brownian motion, 19 which means \(f(S)\) is a normal density function. Unfortunately, this assumption implies that asset prices can be negative. 20 To circumvent this problem, Sprenkle (1961) and Samuelson (1965) value the call under the assumption that the asset price follows geometric Brownian motion. By letting asset prices have multiplicative, rather than additive, fluctuations through time, the asset-price distribution at the option's expiration is lognormal, rather than normal, and the prospect of the asset price becoming negative is eliminated. Under lognormality, Sprenkle and Samuelson show that the call option valuation formula has the form
\[ c = \exp[-\alpha_c T]\,\bigl(S\exp[\alpha_S T]\,N(d_1) - X N(d_2)\bigr), \]
(15)
where
\[ d_1 = \frac{\ln(S/X) + (\alpha_S + 0.5\sigma^2)\,T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}, \]
\(\alpha_S\) and \(\alpha_c\) are the expected risk-adjusted rates of price appreciation for the asset and the call, respectively, \(\sigma\) is the asset's volatility rate, \(S\) is the current asset price, \(X\) is the option's exercise price, and \(T\) is the option's time to expiration. The expression \(N(\cdot)\) is the cumulative univariate normal probability function.
19 Many consider Bachelier to be the father of modern option pricing theory. For an interesting recount of Bachelier's life and his insights into option valuation, see Sullivan and Weithers (1991).
20 Bachelier also assumes that the asset's expected price change is zero. This implies investors are risk-neutral and money has no time value.
The structure of Equation (15) shows that call option value is the present value of its expected terminal value. The expected terminal value depends on a number of factors including the expected growth rate of the asset price, \(\alpha_S\). The call is a claim to buy the asset, and the expected asset price at the option's expiration is \(S\exp[\alpha_S T]\). The expression \(S\exp[\alpha_S T]\,N(d_1)\) is the expected asset price conditional on the asset price exceeding the exercise price at the option's expiration times the probability that the option will be exercised. The expression \(XN(d_2)\) is the expected exercise cost (i.e., the exercise price) times the probability that the option will be exercised.
As simple and elegant as formula (15) appears, it is not very useful. To implement the formula requires estimates of the risk-adjusted rates of price appreciation for both the asset and the option. The estimation of these values is difficult. In the case of the call, estimation is particularly troublesome because its return depends on the asset's rate of price appreciation as well as the passage of time.
4.1. The Black–Scholes/Merton option valuation theory
The breakthrough came in the early 1970s when Black and Scholes (1973) and Merton (1973) proved that a risk-free hedge could be formed between an option and its underlying asset. The intuition underlying their argument can be illustrated using a simple one-period binomial framework. Consider a European call option that allows its holder to buy one unit of an asset in one month at an exercise price of $40. For the sake of simplicity, suppose that the current asset price is also $40 and that, at the end of one month, the asset price will be either $45 or $35. Now, consider selling call options against the unit investment in the asset.
At expiration, each call will have a value of $5 or $0, depending on whether the asset price is $45 or $35. Under this scenario, selling two call options against each unit of the asset will create a terminal portfolio value of $35, regardless of the level of asset price. Since the terminal portfolio value is certain, the value of the portfolio today must be $35 discounted at the risk-free rate of interest. If the simple risk-free rate of interest is one percent over the life of the option, the current value of the portfolio must be $34.65, and the current value of the call $2.675 (i.e., ($40.00 − 34.65)/2). If the observed price of the call is above (below) its theoretical level of $2.675, risk-free arbitrage profits are possible by selling (buying) the call and buying (selling) a portfolio consisting of a long position in a half unit of the asset and a short position of $17.325 in risk-free bonds. In equilibrium, no such arbitrage opportunities can exist.
The Black–Scholes/Merton (hereafter “BSM”) model is the continuous-time analogue of this illustration. First, asset-price movements are assumed to follow the geometric Brownian motion, \(dS = \alpha_S S\,dt + \sigma S\,dz\).
(16)
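Before turning to the continuous-time mechanics, the arithmetic of the one-period binomial illustration can be reproduced directly. The function below is a sketch with invented names. It solves for the number of calls that makes the terminal portfolio value the same in both states, discounts that value at the simple risk-free rate, and backs out the call price. It assumes the call payoffs differ across the two states.

```python
def one_period_hedge(S, X, S_up, S_down, r_simple):
    """Value a one-period call via a risk-free hedge (long asset, short calls)."""
    c_up = max(S_up - X, 0.0)
    c_down = max(S_down - X, 0.0)
    # Calls sold per unit of asset so both states give the same portfolio value
    n_calls = (S_up - S_down) / (c_up - c_down)
    terminal_value = S_up - n_calls * c_up          # equals S_down - n_calls * c_down
    present_value = terminal_value / (1.0 + r_simple)
    # Portfolio cost today is S - n_calls * call, which must equal present_value
    call_value = (S - present_value) / n_calls
    return n_calls, terminal_value, present_value, call_value

# Inputs from the illustration: S = X = 40, prices 45 or 35, 1% simple rate
n_calls, terminal_value, present_value, call_value = one_period_hedge(
    40.0, 40.0, 45.0, 35.0, 0.01)

print(f"calls sold per unit of asset: {n_calls:.1f}")
print(f"certain terminal value      : {terminal_value:.2f}")
print(f"portfolio value today       : {present_value:.4f}")
print(f"call value                  : {call_value:.4f}")
```

Without intermediate rounding the portfolio is worth about 34.6535 today and the call about 2.6733; the figures of $34.65 and $2.675 in the illustration follow from rounding the portfolio value to the cent before dividing.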
Equation (16) states that, over the next infinitesimally small interval of time \(dt\), the change in asset price, \(dS\), equals an expected price increment (i.e., the product of the instantaneous expected rate of change in asset price, \(\alpha_S\), times the current asset price, \(S\), times the length of the interval) plus a random increment proportional to the instantaneous standard deviation of the rate of change in asset price, \(\sigma\), times the asset price. The term \(dz\) denotes an increment to a Wiener process. If the asset price follows the dynamics described by Equation (16), it can be shown by Ito's lemma that derivative contracts written on the asset have price movements described by
\[ df = \left( \frac{\partial f}{\partial S}\,\alpha_S S + \frac{\partial f}{\partial t} + \frac{1}{2}\,\frac{\partial^2 f}{\partial S^2}\,\sigma^2 S^2 \right) dt + \frac{\partial f}{\partial S}\,\sigma S\,dz, \]
(17)
where \(f\) is the value of the derivatives contract. Note that the underlying source of uncertainty, \(dz\), in (16) and (17) is the same.
The key insight of the BSM option valuation model is that, if the derivative contract and the underlying asset share the same source of risk, it is possible to create a risk-free hedge portfolio by buying \(\partial f/\partial S\) units of the asset and selling the derivative contract (or vice versa). This portfolio has an initial value of
\[ V = -f + \frac{\partial f}{\partial S}\,S. \]
Over the next instant in time, the portfolio value changes in response to changes in the prices of the derivative contract and the asset, as well as a result of collecting income on the asset at the constant, continuous rate, \(i\). Algebraically,
\[ dV = -df + \frac{\partial f}{\partial S}\,dS + \frac{\partial f}{\partial S}\,iS\,dt. \]
Substituting Equations (17) and (16) for \(df\) and \(dS\),
\[ \begin{aligned} dV &= -\left( \frac{\partial f}{\partial S}\,\alpha_S S + \frac{\partial f}{\partial t} + \frac{1}{2}\,\frac{\partial^2 f}{\partial S^2}\,\sigma^2 S^2 \right) dt - \frac{\partial f}{\partial S}\,\sigma S\,dz + \frac{\partial f}{\partial S}\,(\alpha_S S\,dt + \sigma S\,dz) + \frac{\partial f}{\partial S}\,iS\,dt \\ &= -\left( \frac{\partial f}{\partial t} + \frac{1}{2}\,\frac{\partial^2 f}{\partial S^2}\,\sigma^2 S^2 - \frac{\partial f}{\partial S}\,iS \right) dt. \end{aligned} \]
Note that by constructing the portfolio in this manner, the only source of risk, \(dz\), has been eliminated. Since the portfolio is risk-free and perfect substitutes must have the same price, holding this portfolio is equivalent to holding an equal dollar investment in risk-free bonds, that is,
\[ -\left( \frac{\partial f}{\partial t} + \frac{1}{2}\,\frac{\partial^2 f}{\partial S^2}\,\sigma^2 S^2 - \frac{\partial f}{\partial S}\,iS \right) dt = r\left( -f + \frac{\partial f}{\partial S}\,S \right) dt. \]
(18)
By rearranging Equation (18), the BSM partial differential equation is identified,
\[ \frac{\partial f}{\partial t} + (r - i)\,S\,\frac{\partial f}{\partial S} + \frac{1}{2}\,\sigma^2 S^2\,\frac{\partial^2 f}{\partial S^2} = rf. \]
(19)
Equation (19) is the Black–Scholes/Merton model, and should not be confused with the Black–Scholes/Merton formula. The latter is a special case of the model to be discussed shortly. The BSM model (19) applies to all derivatives written on \(S\) including calls, puts, European-style options, American-style options, caps, floors, and collars: any derivative contract for which it is appropriate to assume the asset price dynamics follow geometric Brownian motion. 21 What distinguishes each derivative is the set of boundary equations applied to Equation (19). For a European-style call option, the boundary condition is \(f = \max(0, S - X)\) at time \(T\). For a European-style put option, the boundary condition is \(f = \max(0, X - S)\) at time \(T\). For American-style calls and puts, the respective boundary conditions apply at all times between the current time 0 and the expiration date \(T\). Sometimes the partial differential equation subject to a boundary condition has a solution that can be expressed as an analytical formula. This is true for European-style options, for example. At other times, no analytical formula is possible and approximation methods must be used.
21 This, of course, eliminates many derivatives contracts written on interest rate instruments whose underlying asset price cannot rise above a certain level (e.g., an option on a Treasury bill). In these instances, it is more common to let the underlying source of uncertainty be the short-term interest rate.
4.2. Analytical formulas
For many types of derivatives contracts, analytical valuation formulas are possible, with the most well-known case being European-style options. These are the focus here.
4.2.1. The infamous “Black–Scholes/Merton formula”
The BSM formula for a European-style call can be derived from Equation (19). It can also be obtained by applying the Black–Scholes/Merton insight to the Sprenkle/Samuelson valuation formula (15). More specifically, if it is possible to create a risk-free hedge portfolio by buying the asset and selling the option or vice versa, the option value will not depend on an individual's risk preferences. A risk-averse individual will value a European-style call at the same level as a risk-neutral individual. Consequently, for tractability, it is convenient to assume a risk-neutral world in which all assets (including options) have an expected rate of return equal to the risk-free interest rate, \(r\). That is not to say that all assets have the same expected rate of price appreciation. Some assets pay out income in the form of dividends or coupon interest. With the asset's income modeled as a constant, continuous proportion of the asset price, \(i\), the expected rate of price appreciation on the asset, \(\alpha_S\), equals the interest rate less the cash disbursement rate, \(i\), that is, \(\alpha_S = r - i\). On the other hand, some assets like the call option pay out nothing through time, in which case \(\alpha_c = r\). Substituting the risk-neutral levels of \(\alpha_S\) and \(\alpha_c\) into Equation (15), the “Black–Scholes/Merton formula” for the value of a European-style call option becomes
\[ c = \exp[-rT]\,\bigl(S\exp[(r-i)T]\,N(d_1) - XN(d_2)\bigr) = S\exp[-iT]\,N(d_1) - X\exp[-rT]\,N(d_2), \]
(20)
where
\[ d_1 = \frac{\ln(S/X) + (r - i + 0.5\sigma^2)\,T}{\sigma\sqrt{T}} \quad \text{and} \quad d_2 = d_1 - \sigma\sqrt{T}. \]
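Equation (20) translates into a few lines of code. The sketch below is a minimal illustration, not a production implementation: the function names are invented here, the cumulative normal is computed from the standard library's error function, and the input values are hypothetical. The put is obtained from the call through the put–call parity relation (12).

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Cumulative standard normal probability, N(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bsm_call(S, X, r, i, sigma, T):
    """European-style call value, Equation (20)."""
    d1 = (log(S / X) + (r - i + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * exp(-i * T) * norm_cdf(d1) - X * exp(-r * T) * norm_cdf(d2)

def bsm_put(S, X, r, i, sigma, T):
    """European-style put value from put-call parity, Equation (12)."""
    return bsm_call(S, X, r, i, sigma, T) - S * exp(-i * T) + X * exp(-r * T)

# Hypothetical inputs: S = X = 100, r = 5%, i = 0, volatility 20%, T = 1 year
call = bsm_call(100.0, 100.0, 0.05, 0.0, 0.20, 1.0)
put = bsm_put(100.0, 100.0, 0.05, 0.0, 0.20, 1.0)
print(f"call = {call:.4f}, put = {put:.4f}")

# With a very small volatility, the in-the-money call approaches bound (5)
low_vol_call = bsm_call(100.0, 95.0, 0.05, 0.0, 1e-6, 1.0)
lower_bound = max(0.0, 100.0 - 95.0 * exp(-0.05))
print(f"low-volatility call = {low_vol_call:.4f}, lower bound = {lower_bound:.4f}")
```

For these inputs the call is worth about 10.45 and the put about 5.57.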
Several things about Equation (20) are noteworthy. First, neither the risk premium of the call nor the risk premium of the asset appears in the formula, so the estimation problems that arose in applying the Sprenkle/Samuelson formula are eliminated. Second, where the underlying asset's volatility rate is zero (i.e., \(\sigma = 0\)), the formula (20) reduces to the lower price bound (5) in Section 3. This implies that the asset's return volatility over the life of the option, \(\sigma\sqrt{T}\), drives the time value of the option. Finally, the value of the corresponding European-style put can be easily obtained by substituting Equation (20) into the put–call parity relation (12) from Section 3.
Before proceeding with a description of other analytical formulas, it is worthwhile to show that the BSM call option formula (20) is a solution to Equation (19). The partial derivatives of Equation (20) are as follows:
delta:
\[ \frac{\partial f}{\partial S} = \exp[-iT]\,N(d_1), \]
gamma:
\[ \frac{\partial^2 f}{\partial S^2} = \frac{\exp[-iT]\,n(d_1)}{S\sigma\sqrt{T}}, \]
and theta:
\[ \frac{\partial f}{\partial t} = S\exp[-iT]\,n(d_1)\,\frac{\sigma}{2\sqrt{T}} - iS\exp[-iT]\,N(d_1) + rX\exp[-rT]\,N(d_2), \]
where \(n(\cdot)\) is the standard normal density function. Substituting these expressions, as well as the valuation Equation (20), into the partial differential Equation (19), the expressions on the two sides of Equation (19) are shown to be equal, 22
\[ \begin{aligned} &-S\exp[-iT]\,n(d_1)\,\frac{\sigma}{2\sqrt{T}} + iS\exp[-iT]\,N(d_1) - rX\exp[-rT]\,N(d_2) \\ &\quad + (r - i)\,S\exp[-iT]\,N(d_1) + \frac{1}{2}\,\sigma^2 S^2\,\frac{\exp[-iT]\,n(d_1)}{S\sigma\sqrt{T}} \\ &= r\,\bigl(S\exp[-iT]\,N(d_1) - X\exp[-rT]\,N(d_2)\bigr). \end{aligned} \]
22 Note that the sign of theta needs to be reversed since the option's life is growing shorter through time.
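The substitution can also be checked numerically. The sketch below uses hypothetical inputs and invented function names. It evaluates the analytical delta, gamma and theta given above at one point and reports the difference between the two sides of Equation (19), with the sign of theta reversed as the footnote indicates.

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    """Cumulative standard normal probability, N(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density, n(x)."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def bsm_pde_residual(S, X, r, i, sigma, T):
    """Left-hand side minus right-hand side of Equation (19) for the call (20)."""
    d1 = (log(S / X) + (r - i + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    value = S * exp(-i * T) * norm_cdf(d1) - X * exp(-r * T) * norm_cdf(d2)
    delta = exp(-i * T) * norm_cdf(d1)
    gamma = exp(-i * T) * norm_pdf(d1) / (S * sigma * sqrt(T))
    # Theta as written in the text: derivative with respect to time to expiration
    theta = (S * exp(-i * T) * norm_pdf(d1) * sigma / (2.0 * sqrt(T))
             - i * S * exp(-i * T) * norm_cdf(d1)
             + r * X * exp(-r * T) * norm_cdf(d2))
    # Calendar time runs opposite to time to expiration, hence -theta
    lhs = -theta + (r - i) * S * delta + 0.5 * sigma ** 2 * S ** 2 * gamma
    rhs = r * value
    return lhs - rhs

# Hypothetical inputs
residual = bsm_pde_residual(100.0, 95.0, 0.05, 0.02, 0.25, 0.5)
print(f"PDE residual: {residual:.3e}")
```

A residual at the level of floating-point rounding is what the algebra predicts; a materially non-zero value would point to an error in one of the derivative expressions.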
4.2.2. Special cases of the Black–Scholes/Merton formula
The BSM formula covers a wide range of underlying assets. To show its versatility, first re-write the European-style call option formula as
\[ c = \exp[-rT]\,\bigl(S\exp[bT]\,N(d_1) - XN(d_2)\bigr), \]
(21)
where
\[ d_1 = \frac{\ln(S/X) + (b + 0.5\sigma^2)\,T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}, \]
and \(b\) is the asset's expected risk-neutral rate of price appreciation.
4.2.2.1. Non-dividend-paying stock options. The most well-known option valuation problem is that of valuing options on non-dividend-paying stocks. This is, in fact, the valuation problem addressed by Black and Scholes (1973). With no dividends paid on the underlying stock, the expected price appreciation rate of the stock equals the risk-free rate of interest (i.e., \(b = r\)), and the call option valuation equation becomes the familiar Black/Scholes formula,
\[ c = SN(d_1) - X\exp[-rT]\,N(d_2), \]
where
\[ d_1 = \frac{\ln(S/X) + (r + 0.5\sigma^2)\,T}{\sigma\sqrt{T}} \quad \text{and} \quad d_2 = d_1 - \sigma\sqrt{T}. \]
4.2.2.2. Constant-dividend-yield stock options. Merton (1973) generalizes stock option valuation by assuming that stocks pay dividends at a constant, continuous dividend yield. The “Merton model”, used for valuing many options on broad-based stock indexes, is Equation (21) with \(b = r - i\), where \(i\) is the index's dividend yield rate.
4.2.2.3. Futures options. Black (1976b) values options on futures. In a risk-neutral world with constant interest rates, the expected rate of price appreciation on a futures, because it involves no cash outlay, is zero. Substituting \(b = 0\) into Equation (21) provides what is commonly known as the “Black model”,
\[ c = \exp[-rT]\,\bigl(FN(d_1) - XN(d_2)\bigr), \]
where
\[ d_1 = \frac{\ln(F/X) + 0.5\sigma^2 T}{\sigma\sqrt{T}} \quad \text{and} \quad d_2 = d_1 - \sigma\sqrt{T}. \]
4.2.2.4. Futures-style futures options. Following the work of Black, Asay (1982) values futures-style futures options. Such options trade on a number of exchanges
1154
R.E. Whaley
including the London International Financial Futures Exchange (LIFFE) and the Sydney Futures Exchange (SFE). They have the distinguishing feature that the option premium is not paid up front. Instead, the option position is marked-to-market in the same manner as the underlying futures. To value this option, the cost of carry rates for the asset and the option are both set equal to zero. In a risk-neutral world, any security whose upfront investment is zero has an expected return equal to zero. With $b = 0$ and $r = 0$, the resulting formula, called the “Asay model,” is

$$c = F N(d_1) - X N(d_2),$$

where

$$d_1 = \frac{\ln(F/X) + 0.5\sigma^2 T}{\sigma \sqrt{T}}$$

and

$$d_2 = d_1 - \sigma \sqrt{T}.$$
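The stock, index, futures, and futures-style cases differ only in the values assigned to the interest rate $r$ and the cost-of-carry rate $b$. The sketch below makes this explicit. It assumes that Equation (21), which is stated earlier in the chapter and not reproduced here, has the usual cost-of-carry form $c = S \exp[(b - r)T]\, N(d_1) - X \exp[-rT]\, N(d_2)$ with $d_1 = [\ln(S/X) + (b + 0.5\sigma^2)T] / (\sigma\sqrt{T})$. All function names and inputs are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def carry_call(S, X, T, r, b, sigma):
    """European call with cost-of-carry rate b (assumed form of Equation (21))."""
    d1 = (math.log(S / X) + (b + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return (S * math.exp((b - r) * T) * norm_cdf(d1)
            - X * math.exp(-r * T) * norm_cdf(d2))

def stock_call(S, X, T, r, sigma):
    """Black/Scholes case: b = r."""
    return carry_call(S, X, T, r, r, sigma)

def index_call(S, X, T, r, i, sigma):
    """Merton case: b = r - i, where i is the dividend yield rate."""
    return carry_call(S, X, T, r, r - i, sigma)

def futures_call(F, X, T, r, sigma):
    """Black case: b = 0."""
    return carry_call(F, X, T, r, 0.0, sigma)

def futures_style_call(F, X, T, sigma):
    """Asay case: b = 0 and r = 0."""
    return carry_call(F, X, T, 0.0, 0.0, sigma)

F, X, T, r, sigma = 100.0, 100.0, 1.0, 0.05, 0.20   # hypothetical inputs
print(futures_call(F, X, T, r, sigma), futures_style_call(F, X, T, sigma))
```

Because the premium of the futures-style option is not paid up front, its value exceeds the Black value by exactly the factor $\exp[rT]$ in this sketch.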
4.2.2.5. Foreign currency options. Finally, Garman and Kohlhagen (1983) develop a formula to value options on foreign currency. In this case, the expected rate of price appreciation of a foreign currency equals the domestic rate of interest less the foreign rate of interest. The “Garman–Kohlhagen model” is specified exactly in the manner of the Merton model, except that $r$ represents the domestic risk-free interest rate and $i$ represents the foreign risk-free interest rate.

4.2.3. Valuation by replication
The key contribution of the BSM model is the recognition that a risk-free hedge can be formed between an option and its underlying asset. Consequently, the payoffs of a call option can be replicated with a portfolio consisting of the asset and some risk-free bonds. The BSM formula provides the composition of the asset/bond portfolio that mimics the payoffs of the call. A long call position can be replicated by buying $\exp[-iT]\, N(d_1)$ units of the asset (each unit with price, $S$) and selling $N(d_2)$ units of risk-free bonds (each unit with price, $X \exp[-rT]$). As time passes and as the asset price moves, the units invested in the asset and risk-free bonds change. Nonetheless, with continuous rebalancing, the portfolio’s payoffs will be identical to those of the call.

4.2.3.1. Dynamic portfolio insurance. 23 Dynamic replication is at the heart of one of the most popular financial products of the 1980s – dynamic portfolio insurance. Because long-term index put options were not traded at the time, stock portfolio managers had to create their own insurance by dynamically rebalancing a portfolio
23
For a lucid description of portfolio insurance, see Rubinstein (1985a).
consisting of stocks and risk-free bonds. The mechanism for identifying the portfolio weights is given by the BSM put option formula,

$$p = X \exp[-rT]\, N(-d_2) - S \exp[-iT]\, N(-d_1).$$

The objective is to create an “insured” portfolio whose payoffs mimic the portfolio, $S \exp[-iT] + p$. Substituting the BSM put formula, we find

$$\begin{aligned} S \exp[-iT] + p &= S \exp[-iT] + X \exp[-rT]\, N(-d_2) - S \exp[-iT]\, N(-d_1) \\ &= S \exp[-iT]\, N(d_1) + X \exp[-rT]\, N(-d_2). \end{aligned}$$

Hence, a dynamically insured portfolio has $\exp[-iT]\, N(d_1)$ units of stocks and $N(-d_2)$ units of risk-free bonds. The weights show that as stock prices rise, funds are transferred from bonds to stocks and vice versa.

4.2.3.2. Static replication. The valuation-by-replication technique can also be applied in a static context. Many multiple contingency financial products such as caps, collars, and floors (so-called “exotic” options) are valued as portfolios of standard options. Even a standard call option can be valued in this manner. Consider a portfolio that consists of (a) a long position in an asset-or-nothing call that pays the asset price at expiration if the asset price exceeds $X$ and (b) a short position in a cash-or-nothing call that pays $X$ if the asset price exceeds $X$. 24 Under the assumptions of risk-neutrality and lognormally distributed asset prices, the value of the asset-or-nothing call is $S \exp[-iT]\, N(d_1)$, and the value of the cash-or-nothing call option is $X \exp[-rT]\, N(d_2)$. Combining these option values produces the BSM formula (20).

4.2.4. Extensions: Single underlying asset
The BSM option valuation framework has been extended in several important ways. Some involve the valuation of more complex claims on a single underlying asset. Others involve claims on two or more underlying assets. The extensions involving a single underlying asset are discussed first.

4.2.4.1. Compound options.
An important extension of the BSM model that falls in the single underlying asset category is the compound option valuation theory developed by Geske (1979a). Compound options are options on options. A call on a call, for example, provides its holder with the right to buy a call on the underlying asset at some future date. Geske shows that, if these options are European-style, valuation formulas can be derived.
24
Asset-or-nothing and cash-or-nothing options are commonly referred to as “binary” or “digital” options, and, themselves, are generally considered to be “exotics”.
4.2.4.2. American-style call options on dividend-paying stocks. The Geske (1979a) compound option model has been applied in other contexts. Roll (1977), Geske (1979b) and Whaley (1981), for example, develop a formula for valuing an American-style call option on a stock with known discrete dividends. If a stock pays a cash dividend during the call’s life, it may be optimal to exercise the call early, just prior to dividend payment. An American-style call on a dividend-paying stock, therefore, can be modeled as a compound option providing its holder with the right, on the ex-dividend date, either to exercise early and collect the dividend, or to leave the position open. In this application, the stock price, net of the present value of the promised dividends, is assumed to follow geometric Brownian motion. 25

4.2.4.3. Chooser options. Rubinstein (1991) uses the compound option framework to value the “chooser” or “as-you-like-it” options traded in the OTC market. The holder of a chooser option has the right to decide at some future date whether the option is a call or a put. The call and the put usually have the same exercise price and the same time remaining to expiration.

4.2.4.4. Reset options. Gray and Whaley (1997) use the compound option framework to value yet another type of contingent claim. S&P 500 bear market warrants with a periodic reset trade at the CBOE and the NYSE. The warrants are originally issued as at-the-money put options; however, they have the distinguishing feature that if the underlying index level is above the original exercise price on some pre-specified future date, the exercise price of the warrant is reset at the then prevailing index level. These warrants offer an intriguing form of portfolio insurance whose floor value adjusts automatically as the index level rises. The structure of the valuation problem is again a compound option.

4.2.4.5. Lookback options. A lookback option is another exotic with only one underlying source of price uncertainty.
A lookback option is an option whose exercise price is determined at the end of the option’s life. For a call, the exercise price is set equal to the lowest price that the asset reached during the life of the option, and, for a put, the exercise price equals the highest asset price. These “buy at the low” and “sell at the high” options can be valued analytically. Formulas are provided in Goldman, Sosin and Gatto (1979).

4.2.4.6. Barrier options. Barrier options are the final type of option in this category to be discussed. Barrier options are options that either cease to exist or come into existence when some pre-defined asset price barrier is hit during the option’s life. A down-and-out call, for example, is a call that gets “knocked out” when the asset
25
Equivalently, the forward price of the stock is assumed to follow geometric Brownian motion.
price falls to some pre-specified level prior to the option’s expiration. Rubinstein and Reiner (1991) provide valuation equations for a large family of barrier options.

4.2.5. Extensions: Multiple underlying assets
The BSM option valuation framework has also been extended to include multiple underlying assets. As long as each asset is traded, the BSM risk-free hedge argument remains intact and risk-neutral valuation is permitted without loss of generality.

4.2.5.1. Exchange options. The first important development along this line was by Margrabe (1978). He derives a valuation formula for an exchange option. An exchange option gives its holder the right to exchange one risky asset for another. The BSM formula is a special case of the Margrabe formula in the sense that if the call is in the money at expiration the option holder exchanges risk-free bonds for the asset.

4.2.5.2. Options on the minimum and the maximum. Stulz (1982) and Johnson (1987) derive valuation formulas for options on the maximum and the minimum of two or more risky assets. Many exchange-traded futures contracts can be valued as options on the minimum. The CBT’s T-bond futures, for example, provide the seller with the right to deliver the cheapest of a number of deliverable T-bond issues.
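The exchange option of Section 4.2.5.1 lends itself to a short numerical illustration. The sketch below uses the standard statement of the Margrabe (1978) formula for two assets that pay no income. The formula is not reproduced in the text, so its form, the function names, and the inputs are supplied here as assumptions. The final lines check the special case noted in that section: when the asset given up is a risk-free bond currently worth $X \exp[-rT]$, the exchange option value collapses to the Black/Scholes call value.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def margrabe_exchange(S1, S2, sigma1, sigma2, rho, T):
    """Value of the right to give up asset 2 and receive asset 1 at time T."""
    # Volatility rate of the price ratio S1/S2.
    sigma = math.sqrt(sigma1 ** 2 + sigma2 ** 2 - 2.0 * rho * sigma1 * sigma2)
    d1 = (math.log(S1 / S2) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S1 * norm_cdf(d1) - S2 * norm_cdf(d2)

# Special case: asset 2 is a risk-free bond worth X exp[-rT] today, with zero
# volatility, so the holder exchanges bonds for the asset at expiration.
S, X, T, r, vol = 100.0, 100.0, 1.0, 0.05, 0.20     # hypothetical inputs
bond_value = X * math.exp(-r * T)
print(round(margrabe_exchange(S, bond_value, vol, 0.0, 0.0, T), 4))
```

The sketch does not guard against a zero volatility of the price ratio (for example, perfectly correlated assets with equal volatility rates), in which case the option has no time value and the formula's $d_1$ is undefined.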
4.3. Approximation methods
Many option valuation problems do not have explicit closed-form solutions. Probably the best known example is the valuation of standard American-style options. With American-style options, the option holder has an infinite number of exercise opportunities between the current date and the option’s expiration date, making the problem intractable from a mathematical standpoint. 26 But many other examples also exist. Hundreds of different types of exotic options trade in the OTC market, and many, if not most, of these options do not have analytical formulas. Nonetheless, all of them can be valued accurately using the BSM model. If a risk-free hedge can be formed between the option and the underlying asset, the BSM risk-neutral valuation theory can be applied, albeit through the use of numerical methods. Below, three types of commonly applied approximation methods are described. 27
26
An exception is, of course, an American-style call option on an asset where $i \le 0$, as was discussed in Section 2.
27
The techniques included in this discussion are lattice-based methods, Monte Carlo simulation methods, and quasi-analytical methods. A less traveled route to valuing American-style options is numerical integration. See, for example, Parkinson (1977).
4.3.1. Lattice-based methods
A number of numerical methods for valuing options are lattice-based. These methods replace the BSM assumption that asset price moves smoothly and continuously through time with an assumption that the asset price moves in discrete jumps over discrete intervals during the option’s life.

4.3.1.1. Binomial method. Perhaps the best-known lattice-based method is the binomial method, developed independently by Cox, Ross and Rubinstein (1979) and Rendleman and Bartter (1979). Given the importance of the role that this approximation method plays within the derivatives industry, its development and relation to the BSM model are described more fully.

To develop the binomial method, it is more convenient to use the dynamics of the logarithm of asset price rather than asset price. Under the BSM model, asset price follows the geometric Brownian motion described by Equation (16). It can be shown by Ito’s lemma that, if asset price follows Equation (16), the logarithm of asset price follows

$$d \ln S = \mu\, dt + \sigma\, dz, \tag{22}$$

where $\mu = \alpha - \sigma^2/2$ and the subscript on $\alpha$ has been suppressed. Since the binomial method replaces the assumption of continuous asset price movements with price movements over a discrete interval, Equation (22) is re-written as

$$\Delta \ln S = \mu \Delta t + \sigma \varepsilon \sqrt{\Delta t}, \tag{23}$$

where $\varepsilon$ is a normally distributed random variable with zero mean and unit standard deviation.

Under the binomial method, the option’s life is divided into fixed length time steps, and, in each time step, the asset price is allowed to jump up or down. If $n$ is the number of time steps, each time increment has length $\Delta t = T/n$, where $T$ is the time to expiration of the option. The binomial distribution is characterized by the size of its price steps and their probabilities. The parameters are chosen in such a way that the mean and the variance of the discrete binomial distribution are consistent with the mean and the variance of the continuous log-normal distribution underlying the BSM model. Under the BSM assumptions, the logarithm of the asset price at the end of the time increment $\Delta t$ is normally distributed with mean $\ln S + \mu \Delta t$ and variance $\sigma^2 \Delta t$. First, set the mean of the binomial distribution equal to the mean of the logarithm of asset price distribution, that is,

$$p(\ln S + v) + (1 - p)(\ln S + w) = \ln S + \mu \Delta t. \tag{24}$$
In Equation (24), $p$ is the probability that the logarithm of asset price changes by $v$, and $1 - p$ is the probability that the logarithm of asset price changes by $w$. The $\ln S$ terms cancel, and what remains is

$$pv + (1 - p)\, w = \mu \Delta t. \tag{25}$$
Next, set the variance of the binomial distribution equal to the variance of the logarithm of asset-price distribution,

$$p \left( \ln S + v - (\ln S + \mu \Delta t) \right)^2 + (1 - p) \left( \ln S + w - (\ln S + \mu \Delta t) \right)^2 = \sigma^2 \Delta t. \tag{26}$$
The $\ln S$ terms are again irrelevant, and, with a little additional algebra, Equation (26) becomes

$$pv^2 + (1 - p)\, w^2 = \sigma^2 \Delta t + \mu^2 \Delta t^2. \tag{27a}$$
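The intermediate algebra can be written out as follows. This is a reconstruction from Equations (25) and (26), not a quotation of the original derivation. It reads both deviations in Equation (26) as taken from the same mean, $\ln S + \mu \Delta t$, where $\mu$, $\sigma$ and $\Delta t$ are the mean rate, the volatility rate and the time increment of Equations (22) and (23). After the $\ln S$ terms cancel, expand the squares and substitute Equation (25) for the bracketed term:

```latex
\begin{aligned}
\sigma^2 \Delta t
  &= p\,(v - \mu \Delta t)^2 + (1 - p)\,(w - \mu \Delta t)^2 \\
  &= p v^2 + (1 - p)\, w^2
     - 2 \mu \Delta t \left[ p v + (1 - p)\, w \right]
     + \mu^2 \Delta t^2 \\
  &= p v^2 + (1 - p)\, w^2 - 2 \mu^2 \Delta t^2 + \mu^2 \Delta t^2 \\
  &= p v^2 + (1 - p)\, w^2 - \mu^2 \Delta t^2 .
\end{aligned}
```

Moving the last term to the left-hand side gives $p v^2 + (1 - p)\, w^2 = \sigma^2 \Delta t + \mu^2 \Delta t^2$.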
Equation (27a) is a little unusual in the sense that it has a term that includes the time increment squared, $\Delta t^2$. In applying the binomial method to value options, however, a large number of time steps is usually used, so $\Delta t$ is very small. Consequently, terms with higher order values of $\Delta t$ can safely be ignored. Ignoring the higher order term, Equation (27a) can be written

$$pv^2 + (1 - p)\, w^2 = \sigma^2 \Delta t. \tag{27b}$$
Note that the values on the right-hand side of Equations (25) and (27) are known. They are parameters of the normal distribution of the logarithm of asset prices. The objective is to find the values of $v$, $w$, and $p$, which characterize the binomial distribution. With two equations (i.e., 25 and either 27a or 27b) and three unknowns, we cannot solve for the parameters $v$, $w$ and $p$ uniquely, so another constraint must be imposed. Below, the constraints used in two well-known implementations of the binomial method are discussed.

Cox, Ross and Rubinstein (1979) (hereafter “CRR”) impose the symmetry constraint, $w = -v$, where $v$ is a positive increment. This implies that the logarithm of the asset price will either rise to a level, $\ln S + v$, or fall to a level, $\ln S - v$, over the next increment in time $\Delta t$. CRR use Equation (27b) to tie the variance of the binomial distribution to the variance of the logarithm of asset prices. The value of $v$ becomes

$$v = \sigma \sqrt{\Delta t}. \tag{28}$$

With $v$ and, hence, $w\ (= -v)$ known, only the level of probability, $p$, remains to be identified. Substituting Equation (28) into Equation (25) and rearranging,

$$p = \tfrac{1}{2} + \tfrac{1}{2}\, \frac{\mu}{\sigma} \sqrt{\Delta t}. \tag{29}$$

Substituting the relation between the mean continuously compounded rate of price appreciation, $\mu$, and the continuously compounded mean rate of price appreciation, $b$, that is, $\mu = b - 0.5\sigma^2$,

$$p = \tfrac{1}{2} + \tfrac{1}{2} \left( \frac{b - 0.5\sigma^2}{\sigma} \right) \sqrt{\Delta t}. \tag{30}$$

In another well-known implementation of the binomial method, Jarrow and Rudd (1983) (hereafter “JR”) impose the constraint that the up-step and the down-step
probabilities are both equal to $\tfrac{1}{2}$. This means that the relation between the mean of the binomial distribution and the mean of the change in the logarithm of prices, Equation (25), may be written

$$v + w = 2\mu \Delta t. \tag{31}$$

To express the relation between the variances, JR use Equation (27a). The variance relation can be re-written as

$$v^2 + w^2 = 2\sigma^2 \Delta t + \tfrac{1}{2} \left( 4\mu^2 \Delta t^2 \right). \tag{32}$$

Substituting the square of Equation (31) into the parentheses on the right-hand side of Equation (32), rearranging, factoring, taking the square root and then simplifying,

$$v - w = 2\sigma \sqrt{\Delta t}. \tag{33}$$

Equations (31) and (33) can now be used to identify $v$ and $w$. With the probability set equal to $\tfrac{1}{2}$, the up-step coefficient is

$$v = \mu \Delta t + \sigma \sqrt{\Delta t} = \left( b - 0.5\sigma^2 \right) \Delta t + \sigma \sqrt{\Delta t}, \tag{34a}$$

and the down-step coefficient is

$$w = \mu \Delta t - \sigma \sqrt{\Delta t} = \left( b - 0.5\sigma^2 \right) \Delta t - \sigma \sqrt{\Delta t}. \tag{34b}$$
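The two parameterizations can be checked numerically against the moments they are meant to match. The following is a minimal sketch with hypothetical function names and inputs, in which `mu`, `sigma` and `dt` stand for the mean rate, the volatility rate and the time increment of the derivation above. It computes the one-step mean and variance of the change in the logarithm of asset price under each approach and prints the deviations from the targets.

```python
import math

def crr_parameters(b, sigma, dt):
    """CRR: symmetric log steps (w = -v); drift enters through p."""
    mu = b - 0.5 * sigma ** 2
    v = sigma * math.sqrt(dt)                       # Equation (28)
    p = 0.5 + 0.5 * (mu / sigma) * math.sqrt(dt)    # Equation (29)
    return v, -v, p

def jr_parameters(b, sigma, dt):
    """JR: equal probabilities; drift enters through the step sizes."""
    mu = b - 0.5 * sigma ** 2
    v = mu * dt + sigma * math.sqrt(dt)             # Equation (34a)
    w = mu * dt - sigma * math.sqrt(dt)             # Equation (34b)
    return v, w, 0.5

def log_moments(v, w, p):
    """Mean and variance of the one-step change in ln S."""
    mean = p * v + (1.0 - p) * w
    var = p * (v - mean) ** 2 + (1.0 - p) * (w - mean) ** 2
    return mean, var

b, sigma, dt = 0.05, 0.20, 1.0 / 250.0              # hypothetical inputs
mu = b - 0.5 * sigma ** 2
for name, params in (("CRR", crr_parameters(b, sigma, dt)),
                     ("JR", jr_parameters(b, sigma, dt))):
    mean, var = log_moments(*params)
    # Deviation of each moment from its target value.
    print(name, mean - mu * dt, var - sigma ** 2 * dt)
```

Both approaches reproduce the mean. The JR variance matches its target up to rounding, while the CRR variance falls short by the squared-increment term that is dropped in moving from Equation (27a) to Equation (27b).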
The distinction between the two approaches is that the CRR method handles the rate of drift in the asset price through the up-step and down-step probabilities, while the JR method handles the drift through the step sizes.

With the probabilities and step sizes linked to the parameters of the BSM lognormal price distribution, the steps of the approximation method are now described. The first step is to enumerate the possible paths that the asset price may take between now and the option’s expiration. The user chooses the number of time steps, $n$, and thereby sets the time increment, $\Delta t = T/n$, and step sizes (i.e., Equation 28 under the CRR approach, and Equations 34a and 34b under the JR approach). Generally, the implementations use an asset price lattice rather than the logarithm of asset price, so the jumps at each step are multiplicative rather than additive. In the CRR method, for example, the initial asset price $S$ moves to $uS$ or $dS$ at the end of the first time step, where the up-step coefficient is $u = \exp[v] = \exp[\sigma \sqrt{\Delta t}]$ and the down-step coefficient is $d = 1/u$. At the end of the second time step, the asset prices are $uuS$, $S$ and $ddS$, and so on. With $n$ time steps, there will be $n + 1$ terminal asset price nodes. The greater the number of time steps, the more precise the method. The cost of the increased precision, however, is computational speed. With $n$ time steps, $2^n$ asset price paths over the life of the option are considered. With 20 time steps, this means over a million paths.
The second step of the binomial method is to value the option at expiration at each of the possible asset price levels. At expiration, the option value at each asset price node is simply the option’s intrinsic value, that is, $\max(0, S_j - X)$ for a call option and $\max(0, X - S_j)$ for a put, where $j$ represents the $j$th node. Once the option values at all nodes at time $n$ are identified, the procedure steps backward one time step.

The third step is recursive. At time $n - 1$, the value of the option at each node is computed by taking the present value of the expected future value of the option. The expected future value is simply the probability of an up-step times the option’s value if the asset price steps up plus the probability of a down-step times the option’s value if the asset price steps down. The discount rate in the present value computation is the risk-free rate of interest. Before proceeding backward another step in time, it is necessary to determine whether any of the computed option values at time $n - 1$ are affected by a feature of the contract. If the option is American-style, for example, the computed option value at each node must be compared with its early-exercise proceeds. If the early exercise proceeds exceed the computed value, the computed value is replaced by the amount of the exercise proceeds. The interpretation is, of course, that if the option holder finds himself standing at that time in the option’s life, with the underlying asset priced at that level, he will exercise his option. If the proceeds are less, the option “is worth more alive than dead”, and the computed value is left undisturbed. Note that, if the check of the early exercise condition is not performed, the binomial method will produce an approximate value for a European-style option. 28 The procedure now takes another step back in time, repeats the computations of all nodes, and then checks for early exercise.
The procedure is repeated again and again until only a single node remains at time 0. This node will contain the value of the American-style option, as approximated by the binomial method. The binomial method has wide applicability. Aside from the American-style option feature, which is easily incorporated within the framework, the binomial method can be used to value many types of exotic options. Knockout options, for example, can be valued using this technique. A different check on the computed option values at the nodes of the intermediate time steps between 0 and n is imposed. If the underlying asset price falls below the option’s barrier, the option value at that node is set equal to zero. The method can also be extended to handle multiple sources of asset price uncertainty. Boyle (1988) and Boyle, Evnine and Gibbs (1989) adapt the binomial procedure to handle exotics with multiple sources of uncertainty including options on the minimum and maximum, spread options, 29 and so on.
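The three steps described above can be sketched in code. This is an illustrative implementation under stated assumptions, not a production routine: it uses the CRR step sizes, takes the up-step probability from Equation (30), treats `b` as the cost-of-carry rate (`b = r` for a non-dividend-paying stock), and uses hypothetical function and argument names. The early-exercise check is applied only when `american=True`.

```python
import math

def crr_binomial(S, X, T, r, b, sigma, n, is_call=True, american=True):
    """Value an option on a CRR lattice with n time steps."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))      # up-step coefficient, exp[v]
    d = 1.0 / u                              # down-step coefficient
    # Up-step probability from Equation (30).
    p = 0.5 + 0.5 * ((b - 0.5 * sigma ** 2) / sigma) * math.sqrt(dt)
    discount = math.exp(-r * dt)

    def exercise_value(price):
        return max(0.0, price - X) if is_call else max(0.0, X - price)

    # Option values at the n + 1 terminal nodes (j counts the up-steps).
    values = [exercise_value(S * u ** j * d ** (n - j)) for j in range(n + 1)]

    # Recursive step: move backward through the lattice one step at a time.
    for step in range(n - 1, -1, -1):
        for j in range(step + 1):
            # Present value of the expected future value of the option.
            alive = discount * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                # Replace the computed value if early exercise is worth more.
                alive = max(alive, exercise_value(S * u ** j * d ** (step - j)))
            values[j] = alive
    return values[0]

S, X, T, r, sigma = 100.0, 100.0, 1.0, 0.05, 0.20   # hypothetical inputs
euro_call = crr_binomial(S, X, T, r, r, sigma, 200, is_call=True, american=False)
euro_put = crr_binomial(S, X, T, r, r, sigma, 200, is_call=False, american=False)
amer_put = crr_binomial(S, X, T, r, r, sigma, 200, is_call=False, american=True)
print(euro_call, euro_put, amer_put)
```

Running the European-style case against the analytical formula is one way to gauge the approximation error for a given number of time steps. A knockout feature could be handled in the same loop by setting the node value to zero whenever the node's asset price is below the barrier.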
28
Indeed, a useful way to gauge the approximation error of the various numerical methods is to implement them on valuation problems for which there is an analytical formula.
29
A spread option is an option whose underlying source of uncertainty is the difference between two asset prices. Since the difference between two log-normally distributed variables is not log-normal, valuing spread options is not merely a matter of applying the BSM model to the difference in asset prices. For an application of the binomial method in valuing spread options, see Whaley (1996).
4.3.1.2. Trinomial method. The trinomial method is another popular lattice-based method. The trinomial method, as outlined by Kamrad and Ritchken (1991), allows the asset price to move up, down, or stay the same at each time increment. Again, the parameters of the discrete distribution are chosen in a manner consistent with the lognormal distribution, and the procedure begins at the end of the option’s life and works backward. By having three branches as opposed to two, the trinomial method provides greater accuracy than the binomial method for a given number of time steps. The cost, of course, is that the greater the number of branches, the slower the computational speed. The trinomial method is also useful in valuing options that depend on the prices of two underlying assets.

4.3.1.3. Finite difference methods. Finite difference methods solve the BSM differential Equation (19) by converting it into a set of difference equations and then solving the difference equations iteratively. The simplest finite difference method, and, indeed, the first application of a lattice-based procedure to value an option, is the explicit method, applied by Schwartz (1977) to value warrants and by Brennan and Schwartz (1977) to value American-style put options on stocks. The explicit finite difference method is the functional equivalent of the trinomial method in the sense that the asset price moves up, down, or stays the same at each time step during the option’s life. The difference in the techniques arises only from how the price increments and the probabilities are set. Once the lattice is traced out, the valuation computations begin at the end of the option’s life and work backwards. The implicit method is computationally more robust than the explicit method (i.e., it converges to the differential equation as the asset price and time increments approach zero); however, it requires simultaneous solution of the difference equations and therefore considerably more computational time.
The chief advantage of the implicit method is its accuracy.

4.3.2. Monte Carlo methods
Boyle (1977) introduced Monte Carlo simulation as a means of valuing options. Like the lattice-based procedures, the technique involves simulating possible paths that the asset price may take over the life of the option. And, again, the simulation is performed in a manner consistent with the lognormal asset price process. To value a European-style option, each sample run is used to produce a terminal asset price, which, in turn, is used to determine the terminal option value. Over the course of many sample runs, a distribution of terminal option values is obtained. The mean of the distribution is then discounted to the present to value the option. An advantage of the Monte Carlo method is that the degree of valuation error can be assessed directly using the standard error of the estimate. The standard error equals the standard deviation of the terminal option values divided by the square root of the number of trials.

Another advantage of the Monte Carlo technique is its flexibility. Since the path of the asset price beginning at time 0 and continuing through the life of the option is observed, the technique is well-suited for handling barrier-style options, Asian-style
options, Bermuda-style options, and other exotics. Moreover, it can be easily adapted to handle multiple sources of price uncertainty. The technique’s chief disadvantage is that it can only be applied when the option payout does not depend on its value at future points in time. This eliminates the possibility of applying the technique to American-style option valuation, where the decision to exercise early depends on the value of the option that will be forfeited. In addition, a large number of trials is required to bring valuation accuracy to a reasonable level.

4.3.3. Quasi-analytical methods
In valuing American-style options, the chief difficulty lies in identifying a simple expression for the optimal early exercise boundary. Within the lattice-based procedures, the matter is handled by brute force, and the approximation procedure proceeds backwards through the option’s life, comparing computed “alive” values of the option with the option’s early exercise proceeds. Quasi-analytical methods make different simplifying assumptions regarding the optimal early exercise boundary and then proceed analytically. Below, the compound option and quadratic approximations are discussed. 30

4.3.3.1. Compound option approximation. Geske and Johnson (1984) use the Geske (1979a) compound option valuation model to develop an approximate value of an American-style option. The approach is intuitively appealing. An American-style option is, after all, a compound option with an infinite number of early exercise opportunities. While valuing an option in this manner makes intuitive sense, the problem is intractable from a computational standpoint.
The Geske/Johnson insight is that, although an option with an infinite number of early exercise opportunities cannot be valued analytically, its value can be extrapolated from the values of a sequence of “pseudo-American” options with zero, one, two, and perhaps more early exercise opportunities at discrete, equally-spaced intervals during the option’s life. The advantage of this approach is that each of these pseudo-American options can be valued analytically. Unfortunately, with each new option added to the sequence, the valuation of a higher-order multivariate normal integral is required. With no early exercise opportunities, only a univariate function is required; however, with one early exercise opportunity, a bivariate, with two opportunities, a trivariate, and so on. The more of these options used in the series, the greater the precision in approximating the limiting value of the sequence. The cost of increased precision is that higher-order multivariate integral valuations are time-consuming computationally.
30
Carr (1998) develops a new approach for determining American-style option values and exercise boundaries based on a technique called randomization.
4.3.3.2. Quadratic approximation. Barone-Adesi and Whaley (1987) present a quadratic approximation for valuing American-style options. Their approach, based
on the work of MacMillan (1986), separates the value of an American-style option into two components: the European-style option value and an early exercise premium. Since the BSM formula provides the value of the European-style option, they focus on approximating the value of the early exercise premium. By imposing a subtle change to the BSM partial differential equation, they obtain an analytical expression for the early exercise premium, which they then add to the European-style option value, thereby providing an approximation of the American-style option value. The advantages of the quadratic approximation method are speed and accuracy.

4.3.4. Moore’s law
For many years, the search for quasi-analytical approximations was an important research pursuit. Using lattice-based procedures or Monte Carlo simulation was impractical in real-time applications. This pursuit has become much less critical, thanks to Moore’s Law. In April 1965, Gordon Moore, an engineer and co-founder of Intel, predicted that integrated circuit complexity would double every two years. The prediction has been surprisingly accurate. In the late 1970s, when the lattice-based and Monte Carlo simulation methods were first applied to option valuation problems, Intel’s most advanced microprocessor technology was the 8086 chip. Today, the Pentium IV microprocessor is more than 2000 times faster, and the impracticality of lattice-based and simulation-based methods has been substantially reduced.

4.4. Generalizations
The generalizations of the BSM option valuation theory focus mostly on relaxing the constant volatility assumption. 31 Some valuation models assume that the local volatility rate is a deterministic function of the asset price or time or both. Others assume that volatility, like asset price, is stochastic.
31
The assumption of constant interest rates is relaxed in Merton (1973), Bailey and Stulz (1989) and Amin and Jarrow (1992), among others. These studies have attracted less attention than work exclusively on stochastic volatility because empirical investigations focus on exchange-traded options, exchange-traded options are generally short-term, and short-term options are relatively insensitive to interest rates (and the assumed interest rate process).

4.4.1. Deterministic volatility functions
The BSM risk-free hedge mechanics are preserved under the assumption that the local volatility rate is a deterministic function of time or the asset price, so risk-neutral valuation remains possible. The simplest in this class of models is the case where the local volatility rate is a deterministic function of time. The asset price follows the process,

$$dS = \alpha S\, dt + \sigma(t)\, S\, dz.$$

Under this assumption, Merton (1973) shows that the valuation equation for a
European-style call option is the BSM formula (20), where the volatility parameter is the average local volatility rate over the life of the option.

Other models focus on the relation between asset price and volatility. These models attempt to account for the empirical fact that, in at least some markets, volatility varies inversely with the level of asset price. 32 One such model is the constant elasticity of variance (CEV) model proposed by Cox and Ross (1976). The CEV asset-price dynamics are

$$dS = \alpha S\, dt + \delta S^{\theta/2}\, dz,$$

where the instantaneous variance of the stock price is $\delta^2 S^{\theta}$, the elasticity of variance with respect to stock price equals $\theta$, and $0 \le \theta \le 2$. The instantaneous variance of return, $\sigma^2$, is given by $\sigma^2 = \delta^2 S^{\theta - 2}$. If the value of $\theta$ equals 2, the instantaneous variance of return is constant, which is the assumption underlying Black and Scholes. If $\theta = 0$, volatility is inversely proportional to the asset price, and a European-style call option can also be valued analytically using a formula called the “absolute diffusion model”. 33 For the general case in which $0 < \theta < 2$, analytical solutions are not possible. Valuation can be handled straightforwardly, however, using lattice-based or Monte Carlo simulation procedures.

Recently, Derman and Kani (1994a,b), Dupire (1994) and Rubinstein (1994) developed a valuation framework in which the local volatility rate is a deterministic, but unspecified, function of asset price and time,

$$dS = \alpha S\, dt + \sigma(S, t)\, S\, dz.$$

Rather than positing a structural form for their deterministic volatility function (DVF), they search for a binomial or trinomial lattice that achieves an exact cross-sectional fit of reported option prices. 34 Rubinstein uses an “implied binomial tree” whose branches at each node are designed (either by choice of up-and-down increment sizes or probabilities) to reflect the time variation of volatility.

4.4.2. Stochastic volatility functions
The effects of stochastic volatility on option valuation are modeled by either superimposing jumps on the asset price process, or allowing volatility to have its own
32
See, for example, Black (1976a).
33
See Cox and Ross (1976).
34
In contrast, Dumas, Fleming and Whaley (1998) implemented the deterministic volatility function option valuation model by expanding the local volatility rate function in a Taylor series and estimating the parameters of the function directly.
diffusion process, or both. Merton (1976), for example, adds a jump term to the usual geometric Brownian motion governing asset price dynamics, that is,

$$dS = (\alpha - \lambda k)\, S\, dt + \sigma S\, dz + S\, dq,$$

where $dq$ is the Poisson process generating the jumps and $dz$ and $dq$ are independent. The parameter $\alpha$ is the instantaneous expected return on the asset, $\lambda$ is the mean number of arrivals per unit time, and $k \equiv E(Y - 1)$, where $(Y - 1)$ is the random variable percentage change in asset price if the Poisson event occurs. By assuming that the jump component of an asset’s return is unsystematic, Merton creates a risk-free portfolio in the BSM sense and applies risk-neutral valuation for European-style options.

Merton’s case is an exception to the rule, however. In general, risk-neutral valuation is not possible where volatility is stochastic because volatility is not a traded asset and, consequently, the BSM risk-free hedging argument does not apply. In studies of option valuation under stochastic volatility, asset price and asset price volatility are modeled as separate, but correlated, diffusion processes. Asset price is usually assumed to follow geometric Brownian motion with a stochastic volatility rate. The assumptions governing volatility vary. Hull and White (1987) assume volatility follows geometric Brownian motion. Scott (1987) models volatility using a mean-reverting process, and Wiggins (1987) uses a general Wiener process. Bates (1996) combines both jump and volatility diffusions in valuing foreign currency options. 35 For the asset price process, he uses the Merton (1976) assumption. For volatility movements, he assumes that variance follows the mean reverting, square root process. In all of these pairings of price and volatility process assumptions, however, the resulting differential equation describing the option price dynamics is utility-dependent and, therefore, difficult to implement.
Heston (1993) derives a closed-form solution for valuing options with stochastic volatility by assuming that the risk premium is proportional to the volatility rate.
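To illustrate why Merton's unsystematic-jump assumption makes valuation tractable, the following sketch prices a European call under the jump-diffusion described above. It is not taken from this chapter. It assumes, as in Merton (1976), that the jump size Y is lognormal, in which case the call value is a Poisson-weighted sum of BSM values. The function names, the parameter delta (the standard deviation of ln Y), and the truncation at n_terms are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(S, X, T, r, sigma):
    """BSM value of a European call on a non-dividend-paying asset."""
    d1 = (math.log(S / X) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - X * math.exp(-r * T) * norm_cdf(d2)


def merton_jump_call(S, X, T, r, sigma, lam, k, delta, n_terms=60):
    """European call under jump-diffusion with lognormal jumps (assumption).

    lam   : mean number of jumps per unit time
    k     : expected percentage jump, E(Y - 1)
    delta : standard deviation of ln(Y) (hypothetical parameter name)
    """
    lam_prime = lam * (1.0 + k)
    total = 0.0
    for n in range(n_terms):
        # Conditional on n jumps, the asset is lognormal with adjusted
        # variance rate and adjusted drift.
        sigma_n = math.sqrt(sigma ** 2 + n * delta ** 2 / T)
        r_n = r - lam * k + n * math.log(1.0 + k) / T
        weight = math.exp(-lam_prime * T) * (lam_prime * T) ** n / math.factorial(n)
        total += weight * bsm_call(S, X, T, r_n, sigma_n)
    return total


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    print(bsm_call(100.0, 100.0, 0.5, 0.05, 0.2))
    print(merton_jump_call(100.0, 100.0, 0.5, 0.05, 0.2, lam=1.0, k=0.0, delta=0.3))
```

With lam set to zero the sum collapses to its first term and the function returns the ordinary BSM value, which is a convenient sanity check.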
5. Studies of no-arbitrage price relations
In Section 3, a number of no-arbitrage price relations were described. These relations are based on the absence of costless arbitrage opportunities in an efficiently functioning market. This section reviews some of the studies that have empirically tested the no-arbitrage bounds on the prices of derivatives contracts. The studies are divided into two categories – forward/futures and options.
35 The listed studies are by no means exhaustive. Other examples include Melino and Turnbull (1990, 1995) and Stein and Stein (1991).
Ch. 19:
Derivatives
5.1. Forward/futures prices
5.1.1. Cost of carry relation
Studies of the cost of carry relation (Equations 3a,b in Section 3) using futures prices are few. 36 There are two primary reasons. First, a no-arbitrage price relation is a relative pricing relation, which means that the prices of the derivatives contract and the asset must be observed simultaneously. This price synchronization requirement eliminates the possibility of meaningful empirical investigation using daily closing price data for most futures markets including bonds, currencies, and commodities. The CBT’s T-bond futures market, for example, closes at 2:00 PM CST while the underlying cash Treasury bond market closes at 4:30 PM. Using daily closing prices from these markets to examine the empirical performance of the cost of carry model would produce frequent (and large) violations of the cost of carry relation. Second, for many asset classes, the cost of carry relation is mis-specified. Physical commodities, for example, cannot be sold short. This means that the cost of carry relation will hold only as a weak inequality in which the futures price will be less than or equal to the asset price plus the cost of carry. Another reason is that many futures contracts have embedded options. The grain contracts traded on the CBT, for example, allow the short to deliver one of a number of different qualities of the underlying asset at one of a number of different delivery locations. Naturally, the short will choose the least expensive. Other contracts provide the short a great deal of flexibility regarding the timing of delivery during the delivery month. This timing option may also have significant value. Certain financial futures like the CBT’s T-bond and T-note futures have both options. Since the purchaser of the futures will want payment for the options being provided to the seller, the futures price will lie below the asset price by the cumulative value of these options.
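The tests described here amount to comparing an observed futures price with a theoretical level. The sketch below shows one way such a comparison might be coded. Because Equations 3a,b are not reproduced in this section, it assumes the continuously compounded form F = S·exp((r − d)T), with d a hypothetical carry offset such as a dividend yield. The function names, the cost band, and the can_short flag are illustrative assumptions; the flag captures the weak-inequality case in which the asset cannot be sold short.

```python
import math


def theoretical_futures_price(spot, r, d, T):
    """Cost-of-carry futures price, assuming continuous compounding."""
    return spot * math.exp((r - d) * T)


def carry_violation(futures, spot, r, d, T, cost=0.0, can_short=True):
    """Classify an observed futures price against the cost-of-carry level.

    cost      : round-trip trading cost, in price units (hypothetical)
    can_short : if False, only the upper bound can be enforced by arbitrage
    """
    fair = theoretical_futures_price(spot, r, d, T)
    if futures > fair + cost:
        return "futures overpriced"
    if can_short and futures < fair - cost:
        return "futures underpriced"
    return "no violation"


if __name__ == "__main__":
    # Hypothetical quotes, for illustration only.
    print(theoretical_futures_price(100.0, 0.05, 0.02, 0.5))
    print(carry_violation(99.0, 100.0, 0.05, 0.02, 0.5, cost=0.5))
    print(carry_violation(99.0, 100.0, 0.05, 0.02, 0.5, cost=0.5, can_short=False))
```

The check is only meaningful when the futures and spot quotes are observed at the same time, which is the synchronization problem discussed above.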
37 A market ideally suited for empirical examination of the cost of carry relation is the stock index futures market. Nearly synchronous price data are easily accessible, and the contract design is unencumbered by embedded options. Stock index futures began trading in 1982. The Kansas City Board of Trade (KCBT) was the first to launch such a market, introducing the Value Line index futures in February 1982. The Chicago Mercantile Exchange (CME) followed two months later with the S&P 500 index futures. Price synchronization is not as serious a problem as in other markets, since the futures market closes at 3:15 PM CST, only fifteen minutes after the stock market. Index futures contracts are cash settled, with no embedded quality or timing options. In addition, unlike commodities markets, stocks can usually
36 This ignores, of course, a large literature examining forward price relations such as interest rate parity. 37 The values of these options can, of course, be modeled. The “cheapest-to-deliver” option, for example, is an option on the minimum of several assets and can be valued using the Stulz (1982) and Johnson (1987) framework.
be sold short freely with full use of proceeds (at least by market professionals such as index arbitrageurs). This leaves only two sources for price deviations in the cost of carry relation – trading costs and staleness in the reported index level. The staleness or infrequent trading issue has to do with the fact that the reported index level is an amalgam of the last trade prices of individual stocks, some of which may not have traded for several hours. Hence, the reported index is always “stale”, so to speak. 38 On average, however, neither trading costs nor infrequent trading should cause the actual futures price to be different from its theoretical level. Empirical studies of stock index futures generally focus on the CME’s S&P 500 futures contract. It is by far the most active index futures contract in the world. The earliest study of the cost of carry relation for the S&P 500 futures is Figlewski (1984). During the period June 1982 through September 1983 (essentially the first fifteen months of trading of the S&P 500 futures), he finds that the average futures price is too low relative to the index level. The behavior is temporary, however. Using fifteen-minute (simultaneous) price observations, MacKinlay and Ramaswamy (1988) examine the difference between the actual futures price and theoretical futures price for all S&P 500 futures during the period April 1982 through June 1987. They find that, while the average deviation is positive and as high as 0.78 index points for contracts as recent as December 1984, the average deviation is consistently below 0.20 for all contract maturities after the September 1985 contract. The evidence indicates that in the early days of trading, index arbitrageurs had not yet fully developed effective mechanisms for short selling the basket of stocks underlying the S&P 500. Consequently, the futures price did not rise to its proper theoretical level.
5.1.2. Forward/futures price relation
The difference between forward and futures prices is driven by the marking-to-market practice of futures markets. When interest rates are known, the forward and futures prices will be equal, as was demonstrated in Section 3. When interest rates are uncertain, however, the price differential between the forward and the futures is driven by the covariance between futures price changes and discount bond price changes. A positive (negative) covariance implies that the futures price is less (greater) than the forward price, as was discussed in Section 3. Cornell and Reinganum (1981) compare the daily closing prices of five different FX futures contracts traded on the Chicago Mercantile Exchange (CME) with FX forward rates quoted at the same time by the Continental Illinois Bank during the period June 1974 through June 1979. They conclude that any differences are statistically and economically insignificant. Not surprisingly, they also document that the sizes of the covariances are all very small, on the order of 1.0 × 10⁻⁷. Chang and Chang (1990) argue that the Cornell and Reinganum results may be
38 Miller, Muthuswamy and Whaley (1994) modeled the effects that infrequent trading has on the time-series properties of the theoretical basis in the S&P 500 futures market.
undermined by (a) the mismatches in the delivery dates of the futures and forwards, and (b) a sample period that is dominated by an economic cycle in which interest rates are volatile but currency rates are not. They adjust for the mismatch problem in the 1974–1979 period as well as investigate the differences in prices for the 1979– 1987 period. They, too, conclude that there is no meaningful difference in the levels of forward and futures prices for currencies. Early tests of the forward/futures price relation in bond markets focused on the CME’s T-bill futures market and forward prices implied from spreads in the cash T-bill market. Rendleman and Carabini (1979), for example, examine daily closing price data during the first two years of market operation – January 1976 through March 1978. They report frequent violations of the no-arbitrage relation (of equal prices), but that the potential arbitrage gains are not worth exploiting due to trading costs, monitoring costs, maturity mismatches, and so on. Cornell (1981) and Viswanath (1989) investigate whether the differential tax treatment of cash T-bills and T-bill futures gains/losses in the early years of the market is the cause. A more recent study by Meulbroek (1992) focuses on the more liquid Eurodollar futures market using daily data during the period March 1982 through June 1987. She finds strong support of the CIR predictions. Among other things, she finds that the covariance between futures and bond price changes and the covariance of forward price changes and bond price changes are positive. Under these conditions, the futures price should be (and is documented as being) less than the forward price. The relation between forward and futures prices has also been examined using commodity futures prices. Using copper and silver data, French (1983) finds that the forward-futures price differential has the sign predicted by the CIR model, but not the magnitude. 
French ascribes the latter result to measurement error. Park and Chen (1985) examine daily price data for forwards and futures written on six physical commodities during the period July 1977 through December 1981 and find that the differences between futures and forward prices are positive and significant in a statistical sense and are consistent with the CIR predictions regarding the difference between the variance of the bond price changes and the covariance of the bond price changes with the price changes of the underlying commodity. They conclude that differences between futures and forward prices can be attributed to the marking-to-market process.
5.2. Option prices
Empirical tests of no-arbitrage option price relations generally fall into two groups: those that examine violations of intrinsic value relations and those that examine violations of put–call parity relations. 39 In both cases, however, the structure of the
39 A third no-arbitrage condition called the convexity relation is occasionally examined. The convexity relation, as it applies to call options, is C(X2) ≤ qC(X1) + (1 − q) C(X3), where the option exercise prices
experiment is the same. Are there costless arbitrage opportunities available in the options market?
5.2.1. Intrinsic value relations
Intrinsic value relations were derived in Section 3. The earliest investigations of intrinsic value violations were conducted on prices from the CBOE’s stock option market. Galai (1978) examines the prices of call options 40 on 32 stocks during the period April through November 1973, the first five months of CBOE operations. Using daily closing prices, he finds frequent violations. A trading strategy that exploits these violations generates positive profits, but the magnitude of the profits is small. Bhattacharya (1983) uses intraday trade prices to examine call options on 58 stocks during the period August 1976 through June 1977, 86 137 option records in all. Of these, 1304 quotes of 54 375 violate the European-style option intrinsic value and 442 quotes of 32 432 violate an early exercise lower bound. The average mis-pricings for the violations are small – $9.88 per contract 41 and $10.85 per contract, respectively. After reasonable trading costs, profitable arbitrage trading opportunities virtually disappear. The CBOE introduced the first stock index option contract on March 11, 1983, on the S&P 100 index. The American Exchange (AMEX) launched the Major Market Index contract on April 29th of the same year. Evnine and Rudd (1985) examine the trade prices of these contracts using on-the-hour data over the period June 1984 through August 1984. They report 30 violations of the immediate exercise bound in a sample of 1091 (2.7%) for the S&P 100 calls; and 11 violations in a sample of 707 observations (1.6%) for MMI calls. Interestingly, all violations occur during the first week of August 1984, which was a particularly turbulent time in the stock market. In such a period, the likelihood of reporting errors is higher than normal. Furthermore, at the time, the indexes were not traded contracts.
To exploit such violations, it is necessary to short sell the index portfolio. Finally, as noted earlier, the reported index level is always a “stale” indicator of the true level of the index.
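The violation counts reported in these studies come from comparing each option price with a lower bound. The sketch below shows how such a screen might look. Since the Section 3 relations are not reproduced here, it assumes the standard textbook forms: a European call is worth at least max(0, S·exp(−dT) − X·exp(−rT)), and an American option is worth at least its immediate exercise proceeds. The function names, the continuous yield d, and the per-contract multiplier of 100 are illustrative assumptions.

```python
import math


def european_call_lower_bound(S, X, T, r, d=0.0):
    """Assumed European call bound with a continuous yield d on the asset."""
    return max(0.0, S * math.exp(-d * T) - X * math.exp(-r * T))


def immediate_exercise_bound(S, X, is_call=True):
    """Proceeds from exercising an American-style option immediately."""
    return max(0.0, S - X) if is_call else max(0.0, X - S)


def violation_per_contract(price, bound, cost=0.0, multiplier=100):
    """Dollar size of a bound violation, net of a per-unit trading cost.

    Returns 0.0 when the option price is at or above the bound less costs.
    """
    shortfall = bound - price - cost
    return max(0.0, shortfall) * multiplier


if __name__ == "__main__":
    # Hypothetical quote: call at 4.90 with the stock at 105 and a strike of 100.
    bound = immediate_exercise_bound(105.0, 100.0)
    print(violation_per_contract(4.90, bound))             # before costs
    print(violation_per_contract(4.90, bound, cost=0.25))  # after costs
```

As the studies above emphasize, a flagged violation is only an arbitrage signal if the option and asset prices are synchronous and the trade can actually be executed at those prices.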
have the order X1 < X2 < X3 and q is defined by X2 = qX1 + (1 − q) X3. In the event the convexity relation is violated, a costless arbitrage profit may be earned by engaging in a “butterfly spread”, that is, by selling the call with exercise price X2 and buying q and 1 − q units of the calls with exercise prices X1 and X3, respectively. Galai (1979) examines CBOE stock options traded during the period April to October 1973. When closing prices are used, he reports violations in 24 of 1000 cases. When intraday trade prices are used, however, all of these violations disappear. Similarly, Bhattacharya (1983) examines 1006 triplets of CBOE call options written on the same underlying stock during the period August 1976 through June 1977 and finds no case in which the convexity condition is violated. 40 The exclusive focus on call options should not be surprising. Recall that in Section 2 it was noted that trading in put options on stocks did not commence until June 3, 1977, and even then, only on an experimental basis. 41 A contract is for 100 shares. The profit per share is $.0988.
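The convexity relation and the butterfly spread in footnote 39 can be made concrete with a short sketch. Solving X2 = qX1 + (1 − q)X3 for q gives q = (X3 − X2)/(X3 − X1). The function names and the sample prices below are hypothetical and serve only to illustrate the test.

```python
def butterfly_weight(X1, X2, X3):
    """Weight q solving X2 = q*X1 + (1 - q)*X3 for ordered strikes."""
    if not X1 < X2 < X3:
        raise ValueError("exercise prices must satisfy X1 < X2 < X3")
    return (X3 - X2) / (X3 - X1)


def convexity_violation(C1, C2, C3, X1, X2, X3):
    """Amount by which C(X2) exceeds q*C(X1) + (1 - q)*C(X3).

    A positive value signals a violation: selling the X2 call and buying
    q and (1 - q) units of the X1 and X3 calls yields that amount up front.
    """
    q = butterfly_weight(X1, X2, X3)
    return C2 - (q * C1 + (1.0 - q) * C3)


if __name__ == "__main__":
    # Hypothetical call prices at strikes 90, 100, and 110.
    print(butterfly_weight(90.0, 100.0, 110.0))
    print(convexity_violation(12.0, 7.5, 2.0, 90.0, 100.0, 110.0))
    print(convexity_violation(12.0, 6.0, 2.0, 90.0, 100.0, 110.0))
```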
The Philadelphia Exchange (PHLX) launched trading in American-style options on five different currencies in December 1982. Bodurtha and Courtadon (1986) compare option prices to their immediate exercise proceeds during the period February 1983 through September 1984. When end-of-day prices for the option and the underlying exchange rate are used, frequent violations are found. When option trade prices are matched against the currency rate prevailing at the time (from Telerate bid/ask quotations provided by the PHLX), they find that only 0.9% of call option transaction prices and 6.7% of put option transaction prices violate the immediate-exercise lower bounds. Finally, the percentages drop to 0.03% for calls and 0.2% for puts when transaction costs are taken into account. Ogden and Tucker (1987) perform a similar experiment using the American-style British pound, Deutschemark, and Swiss franc futures options traded on the CME during the calendar year 1986. They carefully match each futures option trade price with the price of the underlying futures at the time of its nearest preceding trade, eliminating those with time differentials greater than thirty minutes. In all, 81 257 call trades and 44 362 put trades are identified. Before trading costs, 1756 (2.2%) call option violations and 251 (.6%) put violations are reported. After trading costs, the percentages drop to 1.8% and 0.5% for calls and puts, respectively. The Chicago Board of Trade (CBT) launched T-bond futures option trading on October 1, 1982. Blomeyer and Boyd (1988) examine the immediate exercise proceeds bounds for call and put options trades during the period October 1982 through June 1983. They examine both ex post and ex ante trading strategies. An ex post opportunity is signaled only if the arbitrage trade is profitable after transaction costs. An ex ante trade is predicated on an ex post signal, and the trade is executed at the next available prices for the option and the futures. 
For calls, only 377 of 50 477 (.7%) trades signal an ex post opportunity, and, if the ex ante strategy is executed, the trader incurs an average loss of $38 per trade. Of the 30 065 put trades, 90 (.3%) satisfy the ex post requirement and the average ex ante loss is $42 per trade.
5.2.2. Put–call parity relations
Recall from Section 3 that put–call parity arises from conversion (and reverse conversion) arbitrage. Borrowing to buy a put and its underlying asset, for example, is tantamount to buying a call. The earliest systematic empirical examination of the put–call parity relation appears in Stoll (1969). Stoll uses OTC option price data that the Put and Call Dealers Association provided the Securities and Exchange Commission on a weekly basis during the two-year period 1966–67. The put–call parity relation that Stoll tests does not include an expression for cash dividends since the OTC stock options that traded at the time were dividend-protected. 42 Rather than examining violations
42 At the time of a cash dividend payment during the option’s life, the exercise price is reduced by the dividend amount.
of put–call parity per se, Stoll uses a pooled time-series cross-sectional regression framework. Although the coefficients in the regression model differ slightly from their theoretical predictions, Stoll concludes that the evidence supports the theory of put– call parity. 43 Klemkosky and Resnick (1979) perform the first test of put–call parity using exchange-traded options. They collect monthly observations for calls and puts traded on 15 underlying stocks during the period July 1977 through June 1978 44 including options traded on the CBOE, AMEX and PHLX. These data are obtained from Francis Emery Fitch, Inc. and include the price, volume, and time of each trade of each option and its underlying stock. To ensure simultaneity of prices, they require that the call, put, and underlying stock trade within one minute of each other. They conclude that their test results are consistent with put–call parity and the efficiency of the option market. Evnine and Rudd (1985) examine the put–call parity for S&P 100 and MMI options. In general, they find more violations than are common in studies of options in other markets. Considering the put–call parity relation does not apply when trading in the underlying asset is restricted, this is not surprising. The most common violation is where the call appears overpriced relative to the put and the underlying index. This is surprising considering that this is the easiest arbitrage to execute (i.e., the stock index portfolio is bought rather than sold short). One possible explanation for this result is that the market generally trended upward during this period. Given infrequent trading of the stocks comprising the index, the reported index level always lags the “true” value. In an upward trending market, index call prices are reacting more quickly than the prices of all of the stocks in the index portfolio. 
This explanation is further supported by the fact that the relative frequency of over-priced calls is lower for the 20-stock MMI than for the 100-stock S&P 100 index. In addition to the intrinsic value tests discussed earlier, Bodurtha and Courtadon (1986) examine the American-style put–call parity relation for PHLX options using simultaneous spot and option prices during the period February 1983 through September 1984. Across the five currencies, they observe only 25 violations of put–call parity in 8509 tests (0.3%), all but one of which disappear when reasonable trading costs are considered. Using approximately simultaneous trade prices of CME options on three currency futures during the calendar year 1986, Ogden and Tucker (1987) report 466 violations in 29 288 tests (1.6%) net of transaction costs.
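A put–call parity test of the kind described above can be sketched as follows. The exact relations from Section 3 are not reproduced here, so the sketch assumes the standard textbook forms for an asset paying a continuous yield d: for European options, c − p = S·exp(−dT) − X·exp(−rT); for American options, S·exp(−dT) − X ≤ C − P ≤ S − X·exp(−rT). The function names and the symmetric cost band are illustrative assumptions.

```python
import math


def european_parity_gap(call, put, S, X, T, r, d=0.0):
    """Call minus put, less the parity value (zero when parity holds)."""
    return (call - put) - (S * math.exp(-d * T) - X * math.exp(-r * T))


def american_parity_bounds(S, X, T, r, d=0.0):
    """Assumed lower and upper bounds on C - P for American-style options."""
    return S * math.exp(-d * T) - X, S - X * math.exp(-r * T)


def american_parity_violated(call, put, S, X, T, r, d=0.0, cost=0.0):
    """True if C - P lies outside the bounds by more than the trading cost."""
    lower, upper = american_parity_bounds(S, X, T, r, d)
    spread = call - put
    return spread < lower - cost or spread > upper + cost


if __name__ == "__main__":
    # Hypothetical quotes, for illustration only.
    print(european_parity_gap(10.0, 5.0, 100.0, 100.0, 1.0, 0.05))
    print(american_parity_violated(10.0, 5.0, 100.0, 100.0, 1.0, 0.05))
    print(american_parity_violated(10.0, 5.0, 100.0, 100.0, 1.0, 0.05, cost=0.25))
```

A positive gap in the European test corresponds to the case discussed above in which the call appears overpriced relative to the put and the underlying asset.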
43 Gould and Galai (1974) re-examine put–call parity using the American-style option relation and reach a similar conclusion. 44 Beginning on June 3, 1977, the Securities and Exchange Commission allowed each of the five stock option exchanges to begin trading put options on five different stocks. It is unclear why the put options traded on the Midwest Exchange and the Pacific Exchange were not included in the sample.
5.3. Summary and analysis
The conclusion that must be drawn from the empirical investigations discussed in this section is that it is difficult, if not impossible, to earn abnormal profits from violations of no-arbitrage price relations. Violations, where they have been reported, are usually in the early stages of the market’s development. The fact that virtually no violations appear in established markets is reassuring, since, were they to occur, the fundamental financial-economics underpinning that individuals prefer more wealth to less would have to be reconsidered. Reported violations of no-arbitrage relations in today’s markets are most likely the result of (a) stale or non-synchronous prices, (b) data recording problems, and/or (c) mis-measured trading costs.
6. Studies of option valuation models The empirical performance of competing option valuation models has been evaluated using three different types of methodology. The first type of methodology focuses in-sample on either deviations of observed prices from model values (i.e., pricing errors) or on systematic patterns in implied volatilities. In the pricing error tests, a single volatility estimate is used to value all option series within the class, 45 and then deviations of the prices of the individual series from their model values are tabulated. In the implied volatility tests, the volatility level of each option series is inferred (or implied) by setting its price equal to the model value, and then the implied volatilities for the different option series in the class are tabulated. These studies are labeled “Pricing error/implied volatility anomalies” and are discussed first. Another methodology for investigating the performance of different option valuation models is to simulate a trading strategy. To understand the structure of these investigations, recall that, according to the Black–Scholes/Merton model, a risk-free hedge can be formed between an option and its underlying asset (and that the return of this portfolio will therefore be equal to the risk-free rate of interest). If the BSM assumptions hold and the BSM formula identifies a particular call option as being overpriced, then a portfolio formed by selling the call and “delta-hedging” continuously over its remaining life should produce a rate of return in excess of the risk-free rate. Option valuation studies using strategies such as this are discussed under the heading, “Trading simulations”. The final category is called “Informational content of implied volatility”. Studies falling in this category are characterized by their focus on understanding the predictive power of implied volatility. 
Some tests are cross-sectional in nature, comparing implied volatility to the realized volatility of the underlying asset’s daily returns during the
45 An option class refers to all options written on the same underlying asset (e.g., all options written on the shares of IBM). An option series is a single type of option written on the asset, and is identified by three attributes: (a) exercise price, (b) expiration date, and (c) call or put.
option’s life. Such cross-sectional investigations are possible only in cases where there are multiple option classes on the same type of asset. This limits this type of analysis to stock option markets. Consequently, most studies of implied volatility are time-series in nature. They tend to focus on stock index options. In the early 1980s, a number of index options were launched, with varying degrees of success. Three contract markets have dominated in terms of trading volume – the CBOE’s S&P 100 and S&P 500 index options, and the CME’s S&P 500 futures options. The fact that intraday data for these option classes are widely available and that, historically, there has been intense empirical interest in the phenomenon of “market volatility” has made this category of study the most voluminous.
6.1. Pricing errors/implied volatility anomalies
The first empirical studies in this category appeared in the late 1970s and early 1980s. Black (1975) reports that the BSM formula systematically under-priced deep out-of-the-money calls and overpriced deep in-the-money calls during the first two years of stock option trading (i.e., 1973–1975) at the CBOE. One reason that this “moneyness bias” may appear is that the BSM formula values European-style options while the stock options traded on the CBOE are American-style. If the stocks underlying the options paid no dividends, this would be a non-issue; 46 however, most of the stocks at the time were ‘blue-chip’ stocks with generous dividend payouts. Whaley (1982) uses weekly closing prices for CBOE call options during the period January 1975 through February 1978 to examine whether the American-style call option valuation formula of Roll (1977)–Geske (1979b)–Whaley (1981) eliminates the moneyness bias. He finds that explicit recognition of dividends and the early exercise premium reduces, but does not eliminate, the moneyness bias.
MacBeth and Merville (1980) examine the daily closing prices of CBOE call options on six stocks during the calendar year 1976 and find that the BSM formula produces option values that are too high for out-of-the-money call options and too low for in-the-money call options, exactly opposite the bias reported by Black and Whaley. Such behavior, they contend, may be driven by the fact that the BSM model assumes a constant volatility rate throughout the life of the option. They suggest that stock price dynamics should be modeled as a constant elasticity of variance process described in Section 4. To support their claim, they cite anecdotal evidence that suggests that return volatility falls when stock prices rise. When they use q < 2 and examine the pricing errors of the CEV option valuation model, they find it fits better than the BSM formula and that the moneyness bias is reduced. 47
46 Recall that, in Section 3, it was shown that a call option on a non-dividend-paying stock will never optimally be exercised early.
47 Based on the daily returns of 47 stocks during the period September 1972 through September 1977 (1254 trading days), Beckers (1980) concludes that the constant elasticity of variance model provides a better descriptor of stock price behavior than does the constant variance lognormal model.
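The link between the elasticity parameter and the anecdotal evidence that volatility falls as prices rise can be illustrated numerically. The sketch below assumes the common CEV specification dS = μS dt + δ S^(θ/2) dz, which may differ in notation from the Section 4 description; under that assumption the instantaneous volatility of returns is δ S^(θ/2 − 1). The names delta and theta are hypothetical, with theta playing the role of the elasticity parameter written q in the passage above.

```python
def cev_return_volatility(S, delta, theta):
    """Instantaneous return volatility, delta * S**(theta/2 - 1).

    Assumes the CEV form dS = mu*S*dt + delta*S**(theta/2)*dz.
    """
    if S <= 0:
        raise ValueError("asset price must be positive")
    return delta * S ** (theta / 2.0 - 1.0)


if __name__ == "__main__":
    # With theta < 2, volatility declines as the price rises;
    # theta = 2 recovers the constant-volatility (lognormal) case.
    for price in (25.0, 50.0, 100.0):
        print(price,
              cev_return_volatility(price, 2.0, 1.0),
              cev_return_volatility(price, 0.3, 2.0))
```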
Apparently perplexed by the reversal in sign of the moneyness bias, Emanuel and MacBeth (1982) gather an updated sample (calendar year 1978) of daily closing price data for the options on the same six stocks as MacBeth and Merville. Interestingly enough, they find that the moneyness bias reversed itself yet again, with the pattern in pricing errors reverting to that described by Black (1975). 48 The flip-flopping of the moneyness bias led Emanuel and MacBeth to conclude that the CEV model with stationary parameters cannot explain the mispricing of call options any better than BSM. While the CEV model fits better than the BSM formula in-sample due to the presence of an extra parameter, the movement in the parameter estimates undermines the model’s usefulness. Like stock option prices (and, as we will see shortly, currency option prices), stock index option prices exhibit moneyness biases, and the biases are not stationary through time. Whaley (1986) examines the prices of call and put options written on the S&P 500 futures during the period January 1983 through December 1983 (i.e., the first calendar year of trading). Using the Barone-Adesi and Whaley (1987) approximation method to value these American-style options, he finds that out-of-the-money calls have model values that are too high and that in-the-money call options have model values that are too low. 49 Translated into implied volatility terms, this means implied volatility is a decreasing function of the option’s exercise price. Sheikh (1991) examines BSM implied volatility patterns for S&P 100 index call options 50 during the period July 1983 through December 1985. He divides his sample into three sub-periods and documents a different moneyness bias in each. In the first sub-period, for example, he finds that a call option’s implied volatility falls monotonically with its exercise price across option maturities.
In the second sub-period, the relation is more complex: short-term options had a “smile” shape, with the at-the-money option having the lowest implied volatility, while longer-term options’ volatilities decreased with exercise price. Finally, in the third sub-period, he finds an implied volatility smile for all option maturities.
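The implied volatility patterns discussed in this subsection are obtained by inverting a valuation formula series by series. The sketch below shows a minimal version of that step, assuming the plain BSM formula for a European call on a non-dividend-paying asset and a bisection search. As footnote 50 cautions, index options with discrete dividends or American-style exercise require a richer model. The function names, search bounds, and sample quotes are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(S, X, T, r, sigma):
    """BSM value of a European call on a non-dividend-paying asset."""
    d1 = (math.log(S / X) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - X * math.exp(-r * T) * norm_cdf(d2)


def implied_vol(price, S, X, T, r, lo=1e-4, hi=5.0, tol=1e-8):
    """Volatility that equates the BSM value to the observed price."""
    if not bsm_call(S, X, T, r, lo) <= price <= bsm_call(S, X, T, r, hi):
        raise ValueError("price is outside the range attainable by the model")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The BSM call value is increasing in volatility, so bisection applies.
        if bsm_call(S, X, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical call quotes across exercise prices for one option class.
    quotes = {90.0: 12.60, 100.0: 5.40, 110.0: 1.80}
    for strike, price in sorted(quotes.items()):
        print(strike, implied_vol(price, 100.0, strike, 0.25, 0.05))
```

Tabulating the resulting volatilities against exercise price is what reveals a monotonic pattern or a smile.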
48 Rubinstein (1985b) also documents the moneyness patterns found in the MacBeth and Merville (1980) and Emanuel and MacBeth (1982) studies. 49 Naturally, due to put–call parity, Whaley documents the exact opposite bias for put options written on the S&P 500 futures. 50 On one hand, using price data for S&P 100 index options is the best of the available alternatives, since, at the time, they were by far the most actively traded index options. On the other hand, accurately computing implied volatilities from S&P 100 index options requires handling multiple discrete cash dividends during the option’s life [see Harvey and Whaley (1991)]. Such an exercise, while computationally burdensome, is mandatory. Moreover, the S&P 100 index options have a wildcard feature that allows the option holder to wait until 3:15 PM to decide upon exercise while the settlement proceeds are established at 3:00 PM. Fleming and Whaley (1994) show that the wildcard option may have significant value, and increases in value as the option goes further in the money. Sheikh’s use of the BSM formula to compute implied volatilities most assuredly accounts for some part of the positive relation between implied volatility and moneyness. It does not, however, explain the smile observed in the last sub-period.
The moneyness biases for foreign currency options are also non-stationary. Bodurtha and Courtadon (1987) examine pricing errors of five sets of foreign currency options traded on the PHLX during the period February 1983 through March 1985. They find that in-the-money calls (out-of-the-money puts) are over-priced and out-of-the-money calls (in-the-money puts) are under-priced. This means that option-implied volatility is a decreasing function of the option’s exercise price. Shastri and Tandon (1986a) find similar evidence for the CME’s call option on Deutschemark futures during the period February 1984 through December 1984. Hsieh and Manas-Anton (1988), on the other hand, find evidence of a U-shape relation between implied volatility and exercise price for Deutschemark futures option prices during the period January 23, 1984 through October 10, 1984. Similarly, Bates (1996) provides evidence of a “volatility smile” for Deutschemark options in sample periods after 1988. Bollen and Rasiel (2003) explicitly measure time variation in the slope of the implied volatility function derived from OTC currency option quotes from 1998. In this single year, options on British Pounds Sterling and Japanese Yen exhibit both symmetric and asymmetric patterns, with significant changes from week to week.
6.2. Trading simulations
Under the BSM option valuation assumptions, a mis-priced option, delta-hedged over its remaining life, should provide a risk-free return different from the prevailing risk-free rate. This proposition underlies all of the trading simulation tests of option valuation models, including that of Black and Scholes (1972). The strategies take on various forms, as discussed below. The basic procedure involves selling over-priced and buying under-priced options, simultaneously hedging the positions with a position in the underlying asset so that the net portfolio delta equals zero.
The position is then held until the option’s expiration, with the zero-delta hedge being maintained at the close each day by adjusting the asset position. At the end of the option’s life, profits before and after transaction costs are tallied. 51 Black and Scholes (1972) design the first trading simulation test of an option valuation model. They use a sample of 2039 six-month call option transactions on
51 Trading simulations cannot exactly portray the BSM world in that (a) portfolios must be rebalanced discretely rather than continuously, (b) investors face significant trading costs, (c) contracts are indivisible, and (d) future volatility is not known. Nonetheless, tests can manage the effects of these market imperfections. First, the contract-indivisibility constraint usually arises because researchers typically buy or sell a single mis-priced option and hedge using a fractional number of units of the underlying asset. The remedy is simple – buy or sell more option contracts in the simulation. Second, while discrete rebalancing and imperfect volatility foresight can and do undermine hedging effectiveness, researchers can and do handle the problem by risk-adjusting reported profits based on the realized hedge portfolio volatility. Finally, trading costs are known, and are usually incorporated directly into the simulation analysis. Figlewski (1989) carries out an extensive set of Monte Carlo simulations showing the effects of each of these factors on the expected return and risk of the hedge portfolio. Boyle and Emanuel (1980) show how discrete rebalancing affects the properties of the hedge portfolio return distribution.
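The basic trading-simulation procedure (sell an option, hold an offsetting delta position in the asset, rebalance at discrete intervals, and tally the profit at expiration) can be sketched as below. This is an illustrative simulation on synthetic lognormal price paths, not a reconstruction of any study cited here. It ignores trading costs and contract indivisibility, and all function and parameter names are hypothetical.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call_and_delta(S, X, T, r, sigma):
    """BSM call value and delta (non-dividend-paying asset)."""
    if T <= 0:
        return max(S - X, 0.0), (1.0 if S > X else 0.0)
    d1 = (math.log(S / X) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - X * math.exp(-r * T) * norm_cdf(d2), norm_cdf(d1)


def hedged_short_call_pnl(S0, X, T, r, mu, sigma_true, sigma_model,
                          sale_price, n_steps, rng):
    """Terminal profit from selling one call and delta-hedging discretely.

    sigma_true drives the simulated path; sigma_model is the volatility
    the hedger plugs into the BSM delta.
    """
    dt = T / n_steps
    S = S0
    _, delta = bsm_call_and_delta(S, X, T, r, sigma_model)
    cash = sale_price - delta * S  # premium received, less cost of shares
    for i in range(1, n_steps + 1):
        z = rng.gauss(0.0, 1.0)
        S *= math.exp((mu - 0.5 * sigma_true ** 2) * dt
                      + sigma_true * math.sqrt(dt) * z)
        cash *= math.exp(r * dt)  # cash balance accrues at the risk-free rate
        if i < n_steps:
            _, new_delta = bsm_call_and_delta(S, X, T - i * dt, r, sigma_model)
            cash -= (new_delta - delta) * S  # rebalance to the new delta
            delta = new_delta
    return cash + delta * S - max(S - X, 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    fair, _ = bsm_call_and_delta(100.0, 100.0, 0.5, 0.05, 0.3)
    # Sell calls 1.00 above model value and hedge daily (126 steps).
    pnls = [hedged_short_call_pnl(100.0, 100.0, 0.5, 0.05, 0.10, 0.3, 0.3,
                                  fair + 1.0, 126, rng) for _ in range(200)]
    print(sum(pnls) / len(pnls))
```

Setting sigma_model different from sigma_true gives a simple way to explore the effect of imperfect volatility foresight mentioned in footnote 51.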
545 NYSE securities taken from the diaries of an option broker for the period May 1966 through July 1969 (766 trading days). 52 BSM formula values are computed each day. The volatility parameter is based on the daily returns of the underlying stock over the past year. The six-month commercial paper rate is used as a proxy for the riskfree rate of interest. In all, they implement four trading strategies, each one involving delta-hedging the mis-priced option using the underlying stock. To assess whether the valuation model values are, on average, too high or too low, they conduct a strategy whereby all calls are purchased at model values. To assess whether the option writer’s premium is too high or too low on average, they conduct a second strategy whereby all calls are purchased at market prices. To test whether or not the model should be used to value contracts, they simulate a third strategy whereby under-priced calls are purchased and over-priced calls are sold at model values. Finally, to test whether or not profit opportunities existed in the option market over the sample period, they conduct a final test whereby under-priced calls are purchased and over-priced calls are sold at market prices. Most subsequent empirical studies implement only the last procedure. Their results are interesting in a number of respects. First, buying options at model values produces trading profits that are not significantly different from zero. In other words, the model produces unbiased estimates of option prices. This implies, of course, that the BSM implied volatility is an unbiased predictor of future realized volatility for the stocks in their sample. Second, buying options at market prices produces significant losses. This stands to reason since the trades contained in the sample are only sales by the option broker (i.e. customer buys). The option broker, of course, earned significant profits. 
The fourth strategy of buying under-priced and selling over-priced calls using market prices produces significant profits. The third strategy of buying under-priced calls and selling over-priced calls at model prices produces significant losses. The explanation for this is subtle. The estimates of volatility based on historical returns have measurement error. Holding other factors constant, calls on high-volatility stocks tend to appear under-priced because their volatility estimates, and hence their model values, are higher than they should be. Buying these options at model prices will tend to produce losses. Conversely, calls on low-volatility stocks will tend to appear over-priced because their volatility estimates are lower than they should be. Selling such calls at model prices will tend to produce losses. To test this explanation, Black and Scholes re-run the strategy using the realized rate of return volatility over the life of the contract. The profits from the third strategy become insignificantly different from zero. 53 Interestingly, using the realized volatility over the life of the call makes the profits from the fourth strategy larger and more significant. This is analogous to saying the implied volatility from the Black–Scholes formula is a better predictor of future realized volatility than is past realized volatility.
52
Recall the Chicago Board Options Exchange did not begin trading stock options until April 1973.
53 Karolyi (1993) documents the same type of behavior for a sample of CBOE call options written on 74 different stocks during the period January 1984 through December 1985. His test results show that the mean squared prediction error is reduced when a Bayesian shrinkage estimator is used.
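The basic procedure described in this section (sell an option judged over-priced, hold an offsetting position in the underlying asset, and rebalance to a zero net delta at each close) can be sketched in a few lines of Python. The sketch is illustrative only and does not reproduce the procedure of any study cited here: the function names, the simulated geometric Brownian motion path, the number of rebalancing dates, and all parameter values are assumptions, and trading costs are ignored.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(S, K, r, sigma, T):
    """BSM value and delta of a European call on a non-dividend-paying asset."""
    if T <= 0.0:
        return max(S - K, 0.0), (1.0 if S > K else 0.0)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    value = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    return value, norm_cdf(d1)


def simulate_path(S0, mu, sigma, T, n, rng):
    """Hypothetical price path: n equal steps of geometric Brownian motion."""
    dt = T / n
    path = [S0]
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        step = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(step))
    return path


def hedged_short_call_pnl(path, K, r, sigma, T, premium):
    """Terminal profit from selling one call at `premium` and delta-hedging.

    The hedge holds `delta` units of the asset, financed through a cash
    account earning the risk-free rate, and is rebalanced at every
    observation in `path` (path[0] is the trade date, path[-1] is expiration).
    """
    n = len(path) - 1
    dt = T / n
    _, delta = bsm_call(path[0], K, r, sigma, T)
    cash = premium - delta * path[0]
    for i in range(1, n + 1):
        cash *= math.exp(r * dt)          # interest on the cash account
        if i < n:                          # rebalance to zero net delta
            _, new_delta = bsm_call(path[i], K, r, sigma, T - i * dt)
            cash -= (new_delta - delta) * path[i]
            delta = new_delta
    payoff = max(path[-1] - K, 0.0)        # obligation on the short call
    return cash + delta * path[-1] - payoff


if __name__ == "__main__":
    rng = random.Random(0)
    path = simulate_path(100.0, 0.10, 0.20, 0.5, 126, rng)
    fair, _ = bsm_call(100.0, 100.0, 0.05, 0.20, 0.5)
    print("sold at model value:  ", hedged_short_call_pnl(path, 100.0, 0.05, 0.20, 0.5, fair))
    print("sold 1.00 above model:", hedged_short_call_pnl(path, 100.0, 0.05, 0.20, 0.5, fair + 1.0))
```

On any single path, raising the sale price by one dollar raises the terminal profit by one dollar compounded at the risk-free rate. The residual profit when the option is sold at its model value reflects discrete rebalancing and, in real data, imperfect volatility foresight.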
Whaley (1982) simulates a similar trading strategy using closing prices of CBOE call options written on 91 different dividend-paying stocks during a 160-week period from January 1975 through February 1978. Prices for options and stocks, as well as T-bill rates (which proxy for the risk-free rate of return), are drawn from the Wall Street Journal. Dividend information and stock returns are drawn from the CRSP daily files. All options are valued using an average implied volatility for the option class from the previous week. All over-priced options on all stocks are sold, and all underpriced options on all stocks are purchased. Weights are assigned to the options in each portfolio so that both the dollar investment and the systematic risk of the option portfolios are the same. 54 Under this scheme, the expected rate of return of the hedge portfolio is zero (i.e., the equilibrium return on a portfolio with no risk and no capital investment). The positions are liquidated at the end of each week, and new positions are established. Over the sample period, the mean hedge portfolio return is 2.46% per week and is statistically significant. After trading costs equal to the bid/ask spread are applied, however, the profits disappear. Trading simulation tests have also been performed for index options as well as foreign currency options. Whaley (1986) conducts a trading simulation using all S&P 500 futures option trades reported in the CME’s trade and quote file during the period January 1983 through December 1983 (i.e., the first year of trading in this contract market). His strategy involves selling all over-priced options and buying all under-priced options. The volatility used in valuing the options is the at-the-money implied volatility from the previous day. Each option position is then delta-hedged and held to expiration. Any subsequent movement in the delta is corrected at the close each day by re-aligning the number of futures in the hedge portfolio. 
Overall, the before-transaction-cost profits from the trading strategy are both positive and statistically significant. He also computes the breakeven trading cost rate and finds that, while it is below the rate a retail customer would face, it is well above the rates market makers would face. It is also interesting to note that when he categorizes the options by moneyness and option type (i.e., call or put), the out-of-the-money puts have by far the largest trading profits, followed by the in-the-money calls. This is not surprising considering the downward-sloping implied volatility function usually found in the S&P 500 futures options market. Shastri and Tandon (1986a) also use the CME’s trade and quote information to perform a trading strategy test for German mark futures options during the period February 1984 through December 1984. They find significant excess trading profits, even after the incorporation of modest trading costs. Concerned that the trade may not be executable at the prices that signaled the profit opportunity, they repeat the simulation exercise by executing the trade at the next available trade prices. This delay
54 Whaley uses the Sharpe (1964)–Lintner (1965) capital asset-pricing model to create his hedge portfolio. To estimate the call option’s beta, he takes the product of the estimated beta of the underlying stock and the option’s elasticity with respect to stock price.
causes the trading profits to disappear. Using the next trade price likely overstates the case, however, since the trade and quote data contain only prices that have changed from the previously recorded prices. Successive trades at the same price do not appear. Nonetheless, the evidence clearly indicates that the profit opportunities are fleeting. Bollen, Gray and Whaley (2000) examine the trading profit opportunities in the PHLX’s markets for options on British pounds, German marks, and Japanese yen during the period February 1983 through May 1996. They conclude that in these currency option markets, trading profits are better identified using a regime-switching option valuation model than using the standard models such as BSM. Reasonable trading costs mitigate the viability of the trading strategy, however.
6.3. Informational content of implied volatility
A good deal of energy has been focused on the information content of implied volatility in the empirical options literature. While not talking about implied volatility per se, Black and Scholes (1972) observe that their option valuation formula better described the cross-sectional structure of observed option prices when realized volatility over the option’s life is used as the volatility parameter in the model rather than the realized volatility over the year preceding the cross section. This implies that the BSM implied volatility is a better predictor of future realized return volatility than is past realized return volatility. A number of direct investigations of this phenomenon ensued. The first study to appear is by Latané and Rendleman (1976). They use closing option and stock prices from the Wall Street Journal for 24 firms whose options traded on the CBOE during the period October 1973 through June 1974. Weekly price observations are used, and each option class is required to have prices for at least two option series each week.
Because options on the same stock but at different exercise prices have different implied volatilities, Latané and Rendleman create a weighted implied standard deviation (WISD) by weighting each series’ estimate by its “vega” (i.e., the partial derivative of the option price with respect to volatility). They then compare the WISDs across stocks with estimates of realized volatilities for the period before, during, and after the sample period. They discover that implied volatility is more highly correlated with concurrent and subsequent realized volatility than is historical volatility. Chiras and Manaster (1978) examine the predictive power of stock option implied volatility for all stock options traded on the CBOE each month during the period June 1973 through April 1975. On day t, they calculate a weighted-average implied volatility from all option series on a particular stock. Next they compute the standard deviations of realized returns over the past twenty trading days and over the next twenty trading days. Realized volatility over the next twenty days serves as the dependent variable in a cross-sectional regression. The weighted-average implied volatility and the historical realized volatility serve as regressors. They conclude that the cross-sectional implied volatility has become more informative over time based on the increasing R² values in the regression of future volatility on implied volatility. Based on the regressions that
include both implied volatility and historical volatility as independent variables, they conclude that historical volatility is insignificant. Beckers (1981) performs a similar cross-sectional regression using daily closing price data on CBOE stock options over a 75-trading-day period from October 1975 through January 1976. His independent variables include historical volatility, the Latané and Rendleman WISD, and the implied volatility of the at-the-money option. He concludes that at-the-money implied volatility predicts as well as or better than the other alternatives. In addition, in contrast to Chiras and Manaster, Beckers finds that historical volatility provides incremental explanatory power when included in the same regression with at-the-money implied volatility. For options on assets other than common stocks, the information content of implied volatility is assessed in a time-series fashion. Day and Lewis (1992) compare the implied volatility of S&P 100 index option prices to GARCH and EGARCH models of conditional volatility over a 319-week period from November 1983 to December 1989. They conclude that S&P 100 options provide unbiased forecasts of future volatility but that GARCH and EGARCH volatility assessments contain additional information. Harvey and Whaley (1992) develop and test a conditional market volatility prediction using S&P 100 index option prices during the period March 1983 through December 1989. They find that, although volatility changes are predictable in a statistical sense, the profits generated by simulated trading of S&P 100 index options based on the volatility predictions are insignificant after trading costs. 55 Fleming (1993, 1998) regresses first-differenced realized volatility (options’ lifetime and 28-day) on first-differenced implied volatility using daily transaction data over the period October 1985 through April 1992, excluding the 1987 crash period.
He concludes that implied volatility is a biased but substantially informative forecast of future volatility. He also examines the profits from trading volatility straddles (i.e., buying an at-the-money call and put) on the S&P 100 index, and reports that apparent trading profits disappear after reasonable trading costs are imposed. In other words, the bias in implied volatility is not economically significant. The CBOE computes intraday levels of implied volatilities for S&P 100 and NASDAQ 100 index options. They are disseminated under the ticker symbols “VIX” and “VXN”, respectively. The methodology used to create the volatility indexes is described in Whaley (1993). Fleming, Ostdiek and Whaley (1995) examine the time series properties of the VIX over the seven-year period 1986 through 1992 on a daily and weekly basis. They report that changes in the VIX have a strong inverse and asymmetric contemporaneous association with the returns of the S&P 100 index. On days that the S&P 100 rises the VIX falls, but on days that the S&P 100 index falls the VIX rises by even more in absolute terms. This evidence is consistent with the work of Schwert (1989, 1990), who finds asymmetry in the relation between stock returns and expected volatility, that is, the increase in expected volatility corresponding
55 Lamoureux and Lastrapes (1993) conduct a similar analysis using stock option implied volatilities.
to a given negative stock market return is larger than the decrease in expected volatility corresponding to a similar size positive return. Schwert (2002) examines the contemporaneous relation between changes in the VIX and the VXN, the dynamics of which he ascribes, in part, to IPO activity. Shastri and Tandon (1986b) compare the volatility predictions using daily closing prices of British pound, German mark, Japanese yen, and Swiss franc options traded on the PHLX during the period December 1982 through February 1984. To predict future realized volatility, they use: (a) the historical realized volatility, (b) the weighted-average implied volatility advocated by Latané and Rendleman (1976), and (c) the implied volatility of the at-the-money option. Using a goodness-of-fit criterion, the WISD performs best for British pounds and Japanese yen, the historical estimator for German marks, and the at-the-money implied volatility for Swiss francs. The results indicate that the implied volatility is biased, however. In contrast, Lyons (1988) uses weekly price observations (from a transaction database of PHLX currency options) to compare the at-the-money implied volatility to historical volatility of exchange rates for the Deutschemark, pound and yen during the sample period July 1983 through May 1986. He finds no difference on average between the levels of implied volatility and historical realized volatility. Jorion (1995) examines Deutschemark options over the period January 1985 to February 1992. He finds that implied volatilities are almost unbiased forecasts of the next day’s absolute return, but are slightly biased forecasts of the volatility over the option’s life. He also finds that neither historical volatility nor GARCH-based volatility assessments provide additional forecast power. Finally, a recent study by Ederington and Guan (2000) examines the informational content of implied volatilities at different exercise prices.
Using S&P 500 futures options over the period January 1988 through April 1998, they regress return volatility estimated over the option’s life separately on implied volatilities of options with different exercise prices. They find that the “best” predictor of realized volatility is the implied volatility of out-of-the-money calls/in-the-money puts, as evidenced by the least amount of bias and the highest correlation. Recall our earlier discussion of the fact that, over the past ten or fifteen years, the implied volatilities of S&P 500 futures options have been a decreasing function of exercise price, but the slope of the relation changes through time. This suggests that the prices of out-of-the-money puts/in-the-money calls are driven by factors other than volatility.
6.4. Summary and analysis
The studies described in this section provide a number of stylized facts regarding the performance of option valuation models:
(1) The shape of the BSM implied volatility function 56 (IVF) is not stationary through time for stocks, stock indexes, or currencies. Sometimes it appears as a smile,
56 The implied volatility function is defined as the relation between an option’s BSM implied volatility and the option’s exercise price and time to expiration.
with at-the-money options having the lowest implied volatility. At other times it is downward sloping in exercise price. At still other times, the relation between implied volatility and exercise price is different for different option maturities.
(2) Certain categories of options appear to generate higher abnormal trading profits than other categories of options. In particular, selling out-of-the-money index puts has been shown to consistently generate significant risk-adjusted profits before trading costs, even well before the October 1987 market crash.
(3) Implied volatility appears to be an upward biased estimate of future volatility for index options, and, perhaps, for stock options. The economic significance of this bias appears small, however. For foreign currency options, implied volatility appears to be unbiased.
Of these results, the absence of a flat IVF is probably the most perplexing. A number of studies argue that the “volatility smiles” have appeared because asset prices do not follow the assumed geometric Brownian motion with constant volatility. Under geometric Brownian motion, the conditional risk-neutral density function of the underlying asset price is lognormal (or, alternatively, the distribution of asset returns is normal). The fact that the IVF is not a horizontal line, it is argued, indicates that the risk-neutral density function has different skewness and kurtosis parameters than the lognormal distribution. A downward sloping IVF indicates that the risk-neutral distribution is more negatively skewed than the BSM model assumes. A smile-shaped IVF indicates that the risk-neutral density function is leptokurtic (i.e., has fatter tails). Under this explanation, the theoretical challenge is to find a stochastic process capable of generating the moments of the risk-neutral distribution that match those implied by option prices. 57 One family of models advanced under this explanation is the deterministic volatility function (DVF) models.
Derman and Kani (1994a,b), Dupire (1994) and Rubinstein (1994) develop variations of a model that assumes the local volatility rate of the index is a function of the index level and time. Rather than positing a structural form for their DVF, they search for a binomial or trinomial lattice that achieves an exact cross-sectional fit of reported option prices (or, alternatively, an exact match of the moments of the risk-neutral distribution). This model cannot be tested in-sample, since all pricing errors are equal to zero. To test such a model, it is necessary to use out-of-sample data.
57 This argument presupposes that there are no trading costs or other impediments to hedging one option against another. A number of studies provide methods for estimating the moments of the risk-neutral distribution for arbitrary stochastic processes, an idea that originally appears in Breeden and Litzenberger (1978). Jarrow and Rudd (1982) are the first to attempt such estimations, and Corrado and Su (1996) improve upon the Jarrow and Rudd methodology. Jackwerth and Rubinstein (1996) and Bondarenko (2000) recover risk-neutral densities using nonparametric approaches. Bakshi, Kapadia and Madan (2001) and Dennis and Mayhew (2002) use the technique to infer the moments of risk-neutral stock return distributions.
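The idea attributed above to Breeden and Litzenberger (1978) rests on a standard result: the risk-neutral density of the terminal asset price equals the second derivative of the European call price with respect to the exercise price, grossed up at the risk-free rate. A minimal Python sketch follows, assuming call prices are available on an equally spaced grid of exercise prices. The function names and the grid are hypothetical, and BSM values stand in for market prices so that the recovered density can be compared with the lognormal.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(S, K, r, sigma, T):
    """BSM value of a European call on a non-dividend-paying asset."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def risk_neutral_density(strikes, calls, r, T):
    """Approximate f(K) = exp(rT) * d2C/dK2 by a central second difference.

    `strikes` must be equally spaced; the result covers interior strikes only
    and is returned as a list of (exercise price, density) pairs.
    """
    h = strikes[1] - strikes[0]
    growth = math.exp(r * T)
    density = []
    for i in range(1, len(strikes) - 1):
        second_diff = (calls[i - 1] - 2.0 * calls[i] + calls[i + 1]) / (h * h)
        density.append((strikes[i], growth * second_diff))
    return density


if __name__ == "__main__":
    # Hypothetical inputs: BSM values play the role of observed call prices.
    S, r, sigma, T = 100.0, 0.05, 0.20, 0.5
    strikes = [40.0 + 0.5 * i for i in range(321)]
    calls = [bsm_call(S, K, r, sigma, T) for K in strikes]
    for K, f in risk_neutral_density(strikes, calls, r, T)[99:140:10]:
        print(K, f)
```

With real quotes the second difference amplifies noise in the observed prices, which is one reason the studies cited in the footnote smooth prices or implied volatilities before differentiating.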
Dumas, Fleming and Whaley (1998) perform such an experiment for S&P 500 index options during the period June 1988 through December 1993. They conclude that, although there is unlimited flexibility in specifying the DVF and it is always possible to describe exactly the observed structure of option prices, a parsimonious model works best in-sample according to the Akaike Information Criterion. More importantly, they also show that, when the fitted volatility function is used to value options one week later, the DVF model’s prediction errors grow large as the volatility function becomes less parsimonious. These results imply that models such as the DVF model are vulnerable to over-fitting the data. In an attempt to evaluate the economic content of the DVF parameter estimates, Dumas et al. evaluate the predictive performance of an ad hoc implementation of the BSM model that smooths the implied volatilities of the S&P 500 options across exercise prices and times to expiration, 58 and then uses the estimated IVF to calculate option values one week later. Surprisingly, the ad hoc model outperforms all of the DVF models that they consider. 59 Another family of models that can explain the absence of a flat IVF is option valuation models based on stochastic volatility. Stochastic volatility models can generate a downward sloping implied volatility function through volatility innovations that are negatively correlated with index returns. Bakshi, Cao and Chen (1997) advocate the use of a stochastic volatility model with jumps for valuing S&P 500 index options. They conduct a comprehensive empirical study on the relative merits of competing option valuation models based on stochastic volatility (SV), stochastic interest rates (SI), and random jumps in the asset price (J). In all, their “SVSI-J” model has eleven parameters, which they estimate each day using a cross-section of S&P 500 index option prices. The sample period extends from June 1988 through May 1991.
As in the experiments of Dumas, Fleming, and Whaley, the parameter estimates are obtained by minimizing the sum of squared errors between the last bid-ask quote midpoint and the model value. 60 In-sample pricing errors, out-of-sample pricing errors, and hedging errors are tabulated. Overall, their results appear to support the claim that a model with stochastic volatility and random jumps is a better alternative to the BSM formula with constant volatility across exercise prices. The results of the Bakshi et al. study are not conclusive, however. First, the fact that the more complicated models fit the cross-section of index option prices better than the BSM formula in-sample is simply a result of using more parameters. The Bakshi et al. tests do not penalize the goodness-of-fit for the addition of more parameters. Second, out-of-sample pricing and hedging errors are computed only one day after the parameters are estimated. As such, they are practically in-sample and are subject to the 58
This “model” is intended to mimic the practice of market makers.
59 A recent study by Brandt and Wu (2002) reaches similar conclusions using cross-sectional data for European- and American-style FT-SE 100 index options.
60 An important implementation issue that goes largely unnoted is that available numerical methods cannot (and do not) guarantee a global minimum in an estimation problem with so many parameters.
same criticisms. In all likelihood, the ad hoc BSM model used by Dumas et al. would have performed as well in such a test design. Both approaches are doing nothing more than indirectly smoothing a function through the cross-section of option prices, and the estimated surface happens to be fairly stable over short intervals of time. More frequent recalibration, in principle, should bring the prediction errors and hedging errors of the two methods closer together. This is not the issue, however. The objective of the study is to identify the stochastic process governing index movements through time. Third, the fact that the implied volatility smile is so steeply sloped (and changes through time) cannot be reconciled with the parameters of the empirical distribution or risk aversion. Bakshi et al. find that the volatility of volatility coefficient implied from options differs significantly from the one estimated directly from returns. Similarly, Bates (2000) examines the ability of a stochastic volatility model, with and without jumps, to generate the negative skewness consistent with a steep IVF. He finds that the inclusion of a jump process improves the model’s ability to generate IVFs consistent with market prices, but that the parameter values are unreasonable. Searching for an economic explanation, Jackwerth (2000) attempts to recover risk aversion functions from S&P 500 index option prices, but winds up concluding that they are “irreconcilable with a representative investor”. Using a new trading strategy test methodology, Bondarenko (2001) examines prices of out-of-the-money puts written on the S&P 500 futures during the period 1988 through 2000 and concludes the market is inefficient. Finally, it is also worth noting that different estimation methods for identifying the parameters of the risk-neutral distribution can provide dramatically different parameter estimates. 
Campa, Chang and Reider (1998) use three methods to estimate the parameters of the risk-neutral distribution from OTC currency option prices: (a) cubic splines based on the volatility-smoothing approach of Shimko (1993), (b) the implied binomial tree approach of Rubinstein (1994) (both untrimmed and trimmed), and (c) the mixture of lognormals approach of Melick and Thomas (1997). Using one-month ITL/DEM options in the period April 1996 through March 1997, for example, they find that the average implied kurtosis is 2.379 using the cubic spline method, 16.23 and 1.346 using the untrimmed and trimmed binomial trees, and 2.192 using the mixture of lognormals. Based on these anomalies, spending additional resources developing more elaborate theoretical models (with even more parameters) and more sophisticated computational techniques seems imprudent, at least in the short run. A more promising avenue of investigation, perhaps, is the study of the option market participants’ supply and demand for different option series in different option markets. One way to think of the IVF is as a series of market clearing option prices quoted in terms of BSM implied volatilities. In the BSM model, dynamic replication ensures that the supply curve for all option series in a given class is a horizontal line. No matter how large the demand for buying options, price and implied volatility are unaffected. In reality, however, a market maker will not stand ready to sell an unlimited number of contracts in a particular option series. As his position grows large, so do his expected hedging costs,
[Figure 1: average implied volatility (vertical axis) plotted against moneyness categories 1 through 5 (horizontal axis) for S&P 500 options and stock options.]
Fig. 1. Average implied volatility by moneyness for S&P 500 index options and twenty stock options traded on the Chicago Board Options Exchange during the period January 1995 through December 2000. Stock option classes are the twenty most active that traded continuously on the CBOE throughout the sample period. Implied volatilities are computed daily based on the midpoint of the bid/ask quotes as of 3PM (CST). The moneyness categories are based on the option deltas:
1: deep out-of-the-money puts (Δp > −0.125) / deep in-the-money calls (Δc > 0.875).
2: out-of-the-money puts (−0.125 ≥ Δp > −0.375) / in-the-money calls (0.625 < Δc ≤ 0.875).
3: at-the-money puts (−0.375 ≥ Δp > −0.625) / at-the-money calls (0.375 < Δc ≤ 0.625).
4: in-the-money puts (−0.625 ≥ Δp > −0.875) / out-of-the-money calls (0.125 < Δc ≤ 0.375).
5: deep in-the-money puts (−0.875 ≥ Δp) / deep out-of-the-money calls (Δc ≤ 0.125).
Source: Bollen and Whaley (2002).
not only in the form of direct trading costs such as the bid/ask spread of other options he will need to hedge, but also in the very availability of the other option series needed to hedge, so-called “limits to arbitrage”. 61 To help distinguish between the “stochastic process” and the “buying pressure” explanations for the shape of the IVF, consider Figure 1, in which the average daily implied volatility for S&P 500 index options and the average daily implied volatility for twenty stock options over the period January 1995 through December 2000 are plotted by moneyness category. The data are drawn from Bollen and Whaley (2002). The stock option classes considered are the twenty most active that traded continuously on the CBOE during the sample period. The underlying stocks have both high market capitalization and highly liquid markets. Category 1 includes deep out-of-the-money puts and deep in-the-money calls. Moneyness is based on the option’s delta. 62 Category 1 contains puts with deltas above −0.125 and calls with deltas above 0.875. Calls and puts are included in the same category since their implied volatilities are linked through put–call parity. Category 2 includes out-of-the-money
61 See Shleifer and Vishny (1997).
62 In comparing IVFs across option series and across option classes, it is necessary to account for differing times to expiration and volatility rates in the definition of moneyness.
[Figure 2: cumulative probability (vertical axis, 0.00 to 1.00) plotted against standardized daily return (horizontal axis, −3.00 to 3.00) for three series: S&P 500, Stocks, and Normal.]
Fig. 2. Empirical cumulative distribution functions (CDF) of standardized daily returns for the S&P 500 index, average empirical CDF of the standardized returns of twenty stocks, and the analytical CDF of a standard normal. Return data are from January 1995 through December 2000. Source: Bollen and Whaley (2002).
puts (−0.125 to −0.375)/in-the-money calls (0.625 to 0.875), Category 3 contains at-the-money puts (−0.375 to −0.625) and calls (0.375 to 0.625), and so on. With respect to distinguishing between the competing hypotheses, Figure 1 has two important features. First, the familiar IVF sneer appears for S&P 500 index options. Index option implied volatility decreases monotonically as the exercise price rises. Implied volatilities range from 26.2% for Category 1 options to 16.9% for Category 5 options, about 970 basis points in total. Second, the IVF for stock options appears as a smile, and its range is less than 400 basis points, from 35.6% for Category 3 (at-the-money) options to 39.4% for Category 5 options. Since the shape of the IVF is tied to the moments of the risk-neutral distribution of the underlying asset, the IVFs shown in Figure 1 suggest that the return distribution for the S&P 500 index is highly skewed to the left while the return distribution for typical stock options is leptokurtic (i.e., has fat tails). As Bollen and Whaley show (and as is reproduced in Figure 2), however, the cumulative standardized empirical return distributions of the S&P 500 index and the individual stocks are not discernibly different from one another in terms of skewness, although both seem to have fatter tails than the normal. The subtle differences shown in Figure 2 can be translated into BSM implied volatilities by computing hypothetical risk-neutral put option prices based on the empirical distribution and then BSM implied volatilities from the hypothetical put prices. Figure 3 shows the resulting IVFs. The figure clearly shows the effects of the leptokurtosis in the empirical distributions, as the IVFs are smile-shaped. On the left-hand side of the figure, the slight differences in the skewness of the distributions appear.
Overall, however, the proposition that the shape of the IVF is driven by a failure to identify the appropriate stochastic processes governing the movements of the asset price and volatility seems unjustified.
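The translation from a return distribution to an implied volatility function described above can be sketched as follows. This is only an illustration and not Bollen and Whaley's actual procedure: the function names, the bisection solver, and the way the standardized returns are rescaled to the option horizon and re-centered so that the expected terminal price equals the forward price are all assumptions made here. The sketch prices a European put by averaging discounted payoffs over a set of standardized returns and then inverts the BSM put formula for volatility.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_put(s, k, r, t, sigma):
    """BSM value of a European put on a non-dividend-paying asset."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return k * math.exp(-r * t) * norm_cdf(-d2) - s * norm_cdf(-d1)


def implied_vol_put(price, s, k, r, t, lo=1e-4, hi=5.0, tol=1e-10):
    """Invert the BSM put formula by bisection (put value rises with sigma)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bsm_put(s, k, r, t, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def empirical_put(std_returns, s, k, r, t, sigma):
    """Hypothetical put price from a sample of standardized log returns.

    Assumption: returns are scaled to the horizon volatility and shifted so
    that the average terminal price equals s * exp(r * t).
    """
    scaled = [sigma * math.sqrt(t) * z for z in std_returns]
    mean_gross = sum(math.exp(x) for x in scaled) / len(scaled)
    shift = r * t - math.log(mean_gross)
    payoffs = [max(k - s * math.exp(x + shift), 0.0) for x in scaled]
    return math.exp(-r * t) * sum(payoffs) / len(payoffs)


# Usage with the parameters quoted in the Fig. 3 caption (asset price 100,
# volatility 20%, risk-free rate 5%, one month); the return sample is made up.
sample = [-2.5, -1.0, -0.4, 0.0, 0.3, 0.8, 1.2, 1.6]
price = empirical_put(sample, 100.0, 95.0, 0.05, 1.0 / 12.0, 0.20)
iv = implied_vol_put(price, 100.0, 95.0, 0.05, 1.0 / 12.0)
```

Repeating the last two lines across a range of exercise prices would trace out a hypothetical IVF of the kind plotted in Figure 3.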
Ch. 19:
Derivatives
[Figure 3: implied volatility (vertical axis, 0.16 to 0.24) plotted against put option delta (horizontal axis, 0.000 to −1.000) for S&P 500 options and stock options.]
Fig. 3. Hypothetical prices of one-month European-style put options based on an asset price of 100, a volatility rate of 20%, and a risk-free rate of interest of 5%. The underlying empirical distributions for the S&P 500 index and the individual stocks are tabulated using daily returns over the period January 1995 through December 2000. Source: Bollen and Whaley (2002).
Figure 4 provides more clues about what may be the root cause. Plotted are the average differences between the implied volatilities of Figure 1 and the realized historical volatility for each asset over the most recent sixty trading days. Again, the data are drawn from Bollen and Whaley. Interestingly, all of the differences between implied and historical volatility for S&P 500 index options are greater than zero, with the greatest difference being for Category 1 options (deep out-of-the-money puts/deep in-the-money calls). This finding is consistent with Longstaff (1995), who finds that the implied index level from S&P 100 call option prices exceeds the observed level of the S&P 100 index. For stock options, on the other hand, the deviations across all moneyness categories average about zero. For the individual categories, the differences are positive only for Category 1 and Category 5 options and negative for the rest. One possible explanation for these results is that supply and demand conditions differ for different option series on different underlying assets. It is well known, for example, that institutional investors buy S&P 500 puts for portfolio insurance. Unfortunately, there are no natural counterparties to these trades, and market makers must step in and absorb the imbalance, sometimes taking ever larger positions in a particular option series. Since hedging these positions becomes more and more costly as the size of the position increases, the market maker has no choice but to raise prices. The fact that all S&P 500 implied volatilities are higher than historical volatility suggests that trading activity in index options is largely buyer-initiated. The fact that out-of-the-money puts have higher implied volatilities than at-the-money puts reveals the institutional preference for out-of-the-money puts.
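The quantity plotted in Figure 4, implied volatility less realized volatility over the most recent sixty trading days, can be computed along the following lines. This is a minimal sketch under assumptions that the text does not state: daily log returns, a sample standard deviation, and annualization with 252 trading days per year. The function names are hypothetical.

```python
import math


def realized_vol(prices, window=60, periods_per_year=252):
    """Annualized standard deviation of the last `window` daily log returns."""
    if len(prices) < window + 1:
        raise ValueError("need at least window + 1 prices")
    recent = prices[-(window + 1):]
    rets = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    mean = sum(rets) / len(rets)
    var = sum((x - mean) ** 2 for x in rets) / (len(rets) - 1)
    return math.sqrt(var * periods_per_year)


def vol_spread(implied_vol, prices, window=60):
    """Implied volatility less trailing realized volatility."""
    return implied_vol - realized_vol(prices, window)


# Usage: a made-up price path whose log return alternates between +1% and -1%.
path = [100.0]
for i in range(60):
    path.append(path[-1] * math.exp(0.01 if i % 2 == 0 else -0.01))
spread = vol_spread(0.20, path)
```

Averaging such spreads within each moneyness category and across days would give one point per category, as in the figure.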
In contrast, the flatness of the stock option IVF in Figure 1 and the small differences between implied volatility and historical volatility for stock options in Figure 4 suggest that the public order flow for stock options is much more
R.E. Whaley
[Figure 4: implied volatility less historical volatility (vertical axis, −0.10 to 0.10) by moneyness category (horizontal axis, 1 to 5) for S&P 500 options and stock options.]
Fig. 4. Average difference between implied volatility and historical volatility over the most recent sixty trading days by moneyness for S&P 500 index options and twenty stock options traded on the Chicago Board Options Exchange during the period January 1995 through December 2000. Stock option classes are the twenty most active that traded continuously on the CBOE throughout the sample period. Implied volatilities are computed daily based on the midpoint of the bid/ask quotes as of 3 PM (CST). The moneyness categories are based on the option deltas: 1: deep out-of-the-money puts (Δp > −0.125) / deep in-the-money calls (Δc > 0.875). 2: out-of-the-money puts (−0.125 ≥ Δp > −0.375) / in-the-money calls (0.625 < Δc ≤ 0.875). 3: at-the-money puts (−0.375 ≥ Δp > −0.625) / at-the-money calls (0.375 < Δc ≤ 0.625). 4: in-the-money puts (−0.625 ≥ Δp > −0.875) / out-of-the-money calls (0.125 < Δc ≤ 0.375). 5: deep in-the-money puts (−0.875 ≥ Δp) / deep out-of-the-money calls (Δc ≤ 0.125). Source: Bollen and Whaley (2002).
evenly balanced between buyers and sellers, with market makers absorbing less of the imbalance. The above supply–demand argument relates to the average demand over time. If buying pressure explains the behavior of the IVF, the shape of the IVF should change through time as the demands for options at different exercise prices change. Such evidence has already appeared. Longstaff (1995), for example, finds that the difference between the call option-implied and observed S&P 100 index levels is related to trading costs and the level of trading activity, and Bates (1996) finds a strong relation between the slope of the IVF (implied risk-neutral skewness) and the relative trading activity in calls versus puts. Bollen and Whaley (2002) document that movements in the implied volatilities of S&P 500 index option and stock option series are strongly correlated with their net buying pressure. 63 The evidence indicates that net buying pressure along with upward sloping supply curves affects the average shape of IVFs as well as their movements through time. Until these effects are better understood and quantified, the practice of using option prices
63 Net buying pressure is defined as the total index-equivalent contract volume executed at the ask price less the total index-equivalent contract volume executed at the bid price.
to deduce the stochastic process of the underlying asset will produce questionable results.
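The net buying pressure measure in footnote 63 can be sketched as a simple tally. The trade record layout and the classification rule below (a trade at or above the prevailing ask counts as executed at the ask, a trade at or below the bid counts as executed at the bid, and trades strictly inside the spread are ignored) are assumptions for illustration. The index-equivalent volume of each trade is taken as given rather than derived.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    price: float               # trade price
    bid: float                 # prevailing bid quote
    ask: float                 # prevailing ask quote
    index_equiv_volume: float  # index-equivalent contract volume (taken as given)


def net_buying_pressure(trades):
    """Index-equivalent volume executed at the ask less volume executed at the bid."""
    at_ask = sum(t.index_equiv_volume for t in trades if t.price >= t.ask)
    at_bid = sum(t.index_equiv_volume for t in trades if t.price <= t.bid)
    return at_ask - at_bid


# Usage with made-up trades in one option series on one day.
day = [
    Trade(price=5.25, bid=5.00, ask=5.25, index_equiv_volume=10.0),
    Trade(price=5.00, bid=5.00, ask=5.25, index_equiv_volume=4.0),
    Trade(price=5.125, bid=5.00, ask=5.25, index_equiv_volume=3.0),
]
nbp = net_buying_pressure(day)
```

A positive value indicates that buyer-initiated volume dominated in that series over the interval.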
7. Social costs/benefits of derivatives trading
A number of empirical studies have attempted to address issues related to the social cost/benefits of derivatives contracts. These studies fall into three categories. The first category includes studies that examine movements (either price or volatility) in the underlying asset market when a derivatives contract market is introduced, and the second category includes studies that examine movements in the underlying asset market when a derivatives contract expires. The final category examines the intertemporal relation between price changes in the derivative and asset markets.
7.1. Contract introductions
In complete and frictionless markets, any new security can be synthesized from existing securities. Consequently, the introduction of derivative contracts should have no effect on the price or return volatility of the underlying asset. The studies that examine the effects of contract introduction focus primarily on stock options. The reason is that the sample of stock option introduction events is large, permitting more reliable statistical inference. Options on more than 2000 stocks have been introduced at different times over a period now spanning nearly 30 years. In contrast, the number of futures contracts on different financial assets is quite small, and there is only one event associated with each contract introduction. If an investigator wanted to assess the effect of the introduction of long-term interest rate futures contracts on interest rate volatility, the sample would consist of only one observation, the CBT’s introduction of the T-bond futures in August 1977. Separating the effects of contract introduction from other contemporaneous market events in such circumstances is virtually impossible. The effects of stock option introductions are measured on stock return volatility and/or on stock price movements. Below, studies in each category are discussed in turn.
Before doing so, however, it is useful to consider certain institutional details regarding the stock options market in the USA, as they influence how the results should be interpreted.
7.1.1. Institutional factors
The Chicago Board of Trade launched the Chicago Board Options Exchange on April 26, 1973, making it the first exchange-traded options market in the world. At the time, only call options were listed on sixteen NYSE stocks. Between April 1973 and December 1976, four more option exchanges began making markets in options but, again, only in calls. Puts on 25 stocks began trading in June 1977, but, even then, only on an experimental basis. The Securities and Exchange Commission (SEC) allowed
each of the five stock options exchanges in the USA to trade puts on five stocks. Later in the year, however, the SEC declared a moratorium on option introductions while it reviewed the practices of the option exchanges and considered the economic impact of option trading. About three years later, the SEC lifted its moratorium and adopted a lottery system called the “Stock Allocation Plan”. According to this plan, exchanges randomly selected options from a pool of stocks and were each granted an exclusive franchise to trade options in those chosen stocks. The first options traded under the plan were listed June 2, 1980. On June 3, 1985, the first options on NASDAQ stocks were listed. These options were exempt from the SEC’s allocation plan and have always been eligible for multiple listing. On January 22, 1990, the SEC abolished its allocation plan and allowed all options to be multiple-listed. A program to “roll out” the grandfathered option classes began in November 1992 and went quarterly through February 1995. A second set of institutional factors that may have a bearing on the empirical results are the stock option listing criteria. The CBOE, for example, requires that the firm has (a) a minimum of seven million shares outstanding not including those held by insiders, and (b) a minimum of 2000 shareholders. In addition, it requires that the stock (c) traded at least 2 400 000 shares in the last twelve months, and (d) closed at a market price of at least $7.50 per share for the majority of the business days during the last three months. 64 Note that the first criterion requires that the prospective stock have at least seven million shares outstanding excluding those held by insiders. This helps ensure the availability of shares to short sell the underlying stocks, which market makers may have to do to hedge an open option position. To identify candidates for listing, the CBOE monitors the trading activity of all stocks satisfying the listing criteria. 
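The four listing criteria just described can be expressed as a simple screen. The sketch below is illustrative only: the record layout and function name are hypothetical, the thresholds are those quoted in the text, and "the majority of the business days" is read here as strictly more than half of the closing prices supplied.

```python
from dataclasses import dataclass

MIN_PUBLIC_FLOAT = 7_000_000    # shares outstanding, excluding insider holdings
MIN_SHAREHOLDERS = 2_000
MIN_ANNUAL_VOLUME = 2_400_000   # shares traded in the last twelve months
MIN_CLOSE = 7.50                # dollars per share


@dataclass
class Candidate:
    shares_outstanding: int
    insider_shares: int
    shareholders: int
    volume_last_12_months: int
    closes_last_3_months: list  # daily closing prices


def meets_listing_criteria(c):
    """Apply criteria (a) through (d) as quoted in the text."""
    public_float = c.shares_outstanding - c.insider_shares
    days_ok = sum(1 for p in c.closes_last_3_months if p >= MIN_CLOSE)
    majority_ok = days_ok > len(c.closes_last_3_months) / 2
    return (
        public_float >= MIN_PUBLIC_FLOAT
        and c.shareholders >= MIN_SHAREHOLDERS
        and c.volume_last_12_months >= MIN_ANNUAL_VOLUME
        and majority_ok
    )


# Usage with a made-up firm.
firm = Candidate(10_000_000, 2_000_000, 5_000, 3_000_000, [8.0] * 40 + [7.0] * 23)
eligible = meets_listing_criteria(firm)
```

Passing the screen only makes a stock a candidate; as the text goes on to explain, the exchange also weighs expected option activity.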
Among the factors considered in gauging the market’s potential interest are the stock’s trading volume and return volatility. The higher the trading volume and the greater the return volatility, the greater the projected option activity. Once the CBOE decides to list options on a particular stock, it registers with the SEC. Trading begins a few days later. As a matter of courtesy, the exchange sends a letter informing the firm of its decision.
7.1.2. Changes in return volatility
A number of theoretical arguments on how option contract introduction affects the underlying stock’s return volatility have been advanced. A report by Nathan Associates Inc. (1969), for example, suggests that opening stock option markets might divert speculative trading volume away from the stock market, thereby reducing stock market liquidity and increasing stock return volatility. Others argue that return volatility will fall, albeit for different reasons. One explanation is based on selection bias. Exchanges use historical return volatility as one of the selection criteria for choosing stock
64 Chicago Board Options Exchange Constitution and Rules (May 1995), Paragraph 2113.
options to list (i.e., the higher the volatility, the greater the prospective option trading activity). Since stock return volatility tends to be mean-reverting, exchanges may be systematically listing options on stocks whose volatilities are temporarily higher than their long-run levels. Similarly, the use of this criterion will pick off stocks whose return volatility is high due to sampling error. In both cases, return volatility is expected to fall after option introduction. A second measurement-based explanation is that, if the introduction of stock option markets allows stock market makers to hedge their inventories, the bid–ask spreads in the stock market should narrow after options listing due to reduced inventory holding costs. With smaller spreads, there will be less bid–ask price bounce included in the measurement of stock return volatility. Yet another measurement-based explanation is based on “noise”. Noise, as defined in the context of Black (1986), is the difference between a security’s observed price and its intrinsic (but unobservable) value. One source of noise, as has just been discussed, is the market maker’s bid/ask spread. Another is that observed market prices do not convey everyone’s opinions because the market is incomplete. To the extent that the introduction of options affords potential market participants new trading opportunities (i.e., certain investors may be attracted by the availability of increased financial leverage at low transaction costs and/or the ability to short sell) and that arbitrageurs link the prices of options to the underlying stocks, stock prices will become less noisy and return volatility will be reduced. 65 None of the above arguments is particularly compelling. Moreover, they have little empirical support.
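The bid–ask bounce argument above can be made concrete with a stylized calculation. The setup is an assumption introduced here, not something taken from the studies cited: each trade is equally likely to occur at the bid or the ask, trade direction is independent across trades and independent of changes in the true price, and the spread is constant. Under those assumptions, the sketch enumerates the four equally likely sequences of trade direction to obtain the variance that bounce adds to observed price changes.

```python
from itertools import product


def bounce_variance(spread):
    """Variance added to observed price changes by bid-ask bounce.

    Observed price = true price + (spread / 2) * q, with q = +1 at the ask
    and q = -1 at the bid, each with probability one half.
    """
    half = spread / 2.0
    # Four equally likely (previous, current) trade-direction pairs.
    changes = [half * (q_now - q_prev) for q_prev, q_now in product((-1, 1), repeat=2)]
    mean = sum(changes) / len(changes)
    return sum((c - mean) ** 2 for c in changes) / len(changes)


def observed_variance(true_variance, spread):
    """Measured variance of price changes: true variance plus bounce."""
    return true_variance + bounce_variance(spread)


# Usage: a narrower spread lowers measured variance even though the
# variance of the true price change is unchanged (numbers are made up).
before = observed_variance(0.04, 0.25)
after = observed_variance(0.04, 0.125)
```

In this setup the bounce component equals half the squared spread, so a narrower spread after listing mechanically reduces measured volatility.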
Studies by Bansal, Pruitt and Wei (1989), Skinner (1989) and Damodaran and Lim (1991) report increased trading volume in the stock market following option listing, refuting the notion in the Nathan (1969) report that stock market liquidity would be reduced. In addition, Stephan and Whaley (1990) report that price changes in the stock market lead the option market by as much as fifteen minutes, which suggests that informed traders prefer the depth and anonymity of the stock market. Fleming, Ostdiek and Whaley (1996) show that the relative illiquidity of the option market creates a price effect for block trades, such that an informed trader would rationally prefer the stock market. Fedenia and Grammatikos (1992) document that bid–ask spreads in stock markets narrow after options listing. They do not, however, directly assess whether the variance reduction was caused by some true economic phenomenon or by the reduction in spread. The early empirical evidence examining the effects of option introduction on return
65 Grossman (1988) offers a similar argument regarding the usefulness of exchange-traded puts versus portfolio insurance. With dynamic portfolio insurance, the put is synthesized by trading the stocks in the underlying portfolio or index futures. The absence of real put option trading prevents the dissemination to market participants of important information regarding expected future price volatility. The less information transmitted to liquidity providers, the more difficult it is for the market to absorb the trades implied by dynamic hedging strategies, and the higher the return volatility.
volatility finds post-listing reductions. Trennepohl and Dukes (1979) estimate betas for optioned and non-optioned stocks for the periods 1970–1973 and 1973–1976 and find that betas for optioned stocks decrease more than the betas of non-optioned stocks. Skinner (1989) finds that the total stock return variance falls by an average of 4.8% after options are introduced. Conrad (1989) estimates that the excess return variance decreases from 2.29% for the 200 days prior to listing to 1.79% for the 200 days after listing. Bansal, Pruitt and Wei (1989) find that variance drops by 6.4% after options are listed, while Damodaran and Lim (1991) detect a 20% drop. More recent evidence suggests that there is no change in stock return volatility when options are introduced. Bollen (1998) examines a sample of 1010 stock option listings ending in December 1992. Calls and puts are included, as well as exchange-traded and NASDAQ stocks. He carefully matches each stock with a control stock in the same industry, where each member of the control group has no options listed during the sample period. Over the full sample period, Bollen shows that there is no significant difference between the return variances of optioned stocks versus non-optioned stocks. Furthermore, Bollen examines the NYSE–AMEX subsample used by Skinner and finds that, while return variance fell for optioned stocks after introduction, it also fell for the control group stocks, with the difference between the return variances being insignificantly different from zero. For the post-1986 period, Bollen documents increased return variance for the optioned stocks as well as non-optioned stocks, with no significant difference between the groups.
7.1.3. Price effects
The primary argument supporting the notion that the introduction of stock options will induce a price reaction is one of market completeness.
Suppose that the introduction of call options allows a subset of investors to take leveraged positions in an underlying stock that they were previously impeded from taking. Assuming all of these investors step into the market at once and buy call options, their aggregate demand to buy would be met by market makers, who would delta-hedge their inventory by buying the stocks. If the aggregate demand for calls is large enough, stock prices will rise. Similarly, suppose that a subset of investors cannot short sell stocks or at least find it costly to do so. If put option markets are introduced, the excess demand to short sell will finally come to fruition through the purchase of puts. Market makers will then have to hedge their short put positions by short selling the stock, and the stock price will fall. Conrad (1989) examines the price effect of option introductions. Using CBOE and AMEX stock option introduction dates gathered from the Wall Street Journal during the period 1974 through 1980, she analyzes abnormal stock returns and finds that the introduction of individual stock options caused a permanent price increase of about 2% in the underlying security, beginning approximately three days before the introduction. While the size of the increase is statistically significant, it is
not large from an economic standpoint. Round-trip trading costs at the time would almost certainly have eliminated trading profits. 66 The fact that the average return is positive, however, is consistent with the market completeness argument. Given her sample period, the option introductions fall squarely within the period in which the option exchanges listed only call options. The only put options in the sample are the puts introduced in June 1977. Thus, a potential explanation for her reported stock price increase is price pressure exerted by the market maker as a result of hedging short call option positions. Sorescu (2000) confirms Conrad’s findings for option introductions prior to 1980. For introductions after 1980, however, he finds a significant decrease. An explanation for the Sorescu results is that, after 1980, call and put option introductions, for the most part, took place at the same time. Assuming the pent-up demand to short sell stocks exceeds the demand to buy, purchases of puts will exceed those of calls and the stock price will fall. 67 In related work, Figlewski and Webb (1993) examine the introduction of put options in June 1977. Here the market completeness argument is probably stronger than in the case of calls since a significant number of investors cannot short sell stocks in a cost-effective manner. While Figlewski and Webb do not examine abnormal price effects per se, they report that (a) optionable stocks exhibit a significantly higher level of short interest than stocks without options, and (b) a stock’s short interest increases after option listing.
7.2. Contract expirations
Studies of the effects of derivative contract expirations were motivated largely by public criticism of derivatives markets in the mid-1980s. This criticism was a consequence of abnormal price movements in the stock market during the “triple-witching” hour.
The term “triple” in triple-witching refers to the fact that, at the time, stock index futures, stock index options, and stock options all expired at the close of trading on the third Friday of the contract month. Why stock options are included is not obvious since they are settled by delivery. 68 Stock index futures and options, on the other hand, are cash-settled, a feature that lies at the heart of the allegedly “abnormal” price movements. To understand the phenomenon, consider the S&P 500 index futures contract. Over the life of a typical contract, index arbitrageurs accumulate large positions in the S&P 500 futures and the stocks comprising the S&P 500 index portfolio. These positions are usually carried into expiration, at which time the futures contract is cash-settled at the reported index level. But, in order for the index arbitrageur to exit his stock positions at the cash settlement value of the index, he must place market-on-close orders to buy or short the stocks in the S&P 500 index portfolio, depending
66 The trading strategy would involve buying the stock and shorting the stock index portfolio.
67 See also Danielsen and Sorescu (2001).
68 Klemkosky (1978) examines the stock returns in the week before and the week after stock option expirations.
upon whether his net stock position is short or long. When arbitrageurs exit these positions in unison on the quarterly expirations of the S&P 500 futures market, the reported level of the index moves. Stoll and Whaley (1986a, 1987) investigate these price movements for all quarterly and monthly contract expirations since contract market inception. They find that, while the reported index levels did move unpredictably in one direction or the other in the last half-hour of trading on the quarterly expirations, the size of the move is generally no more than should be expected given the size of the bid/ask spreads of the underlying stocks. A half-hour before the market close, the reported level of the index is based on the last trades of the index’s component stocks. Absent significant news in the marketplace, the last trades of about half the stocks occurred at a bid price and the last trades of about half the stocks occurred at an ask price, so the reported index level is at about the midpoint. At the market close, index arbitrageurs must unwind with market orders to buy or to sell, driving trade prices of all of the stocks in the index to an ask or bid. This movement from the midpoint to one side of the spread constitutes a 0.25% price move for the S&P 500 index, more than half of the average price move of 0.40% observed during the last half hour of trading on expiration days. The incremental 0.15% is an additional liquidity cost, no different in spirit (or in size) from the liquidity cost incurred in a large block trade. Although the evidence at the time documented abnormal trading volume but no abnormal price movements at the close on expiration day, the CME changed the cash settlement from the close on expiration day to the open beginning with the June 1987 contract.
Stoll and Whaley (1991) investigate the impact of this change and find that the size of the price reversal did not change relative to their earlier study – only its location during the day. Karolyi (1996) examines Nikkei 225 futures contract expirations, and, like Stoll and Whaley, concludes that the expiration of the Nikkei 225 futures induces abnormal trading volume but economically insignificant price effects. Stoll and Whaley (1997) find similar results for the Australian All Ordinaries Share Price Index futures and options expirations, as do Bollen and Whaley (1999) for the Hang Seng index futures in Hong Kong. 69
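The decomposition of the expiration-day price move discussed above is simple arithmetic, sketched below using the 0.25% and 0.40% figures from the text. The helper that computes the index move from individual spreads is an illustrative assumption: it takes index weights and relative bid/ask spreads (in percent) and supposes that every stock moves from its quote midpoint to the same side of its spread.

```python
def midpoint_to_side_move(weights, relative_spreads_pct):
    """Index move (percent) when every stock moves from its quote midpoint
    to the same side of its spread: the weighted average half-spread."""
    total = sum(weights)
    return sum(w * s / 2.0 for w, s in zip(weights, relative_spreads_pct)) / total


# Figures quoted in the text (percent of index level).
spread_component = 0.25   # move from midpoint to one side of the spread
average_move = 0.40       # average move in the last half hour on expiration days

liquidity_component = average_move - spread_component  # incremental liquidity cost
spread_share = spread_component / average_move         # fraction explained by spreads

# Usage: two made-up stocks with equal weights and relative spreads of 0.4% and 0.6%.
example_move = midpoint_to_side_move([1.0, 1.0], [0.4, 0.6])
```

On these numbers the spread accounts for roughly five eighths of the average move, and the remaining 0.15% is the liquidity cost referred to in the text.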
7.3. Market synchronization
In frictionless markets, derivative contracts are redundant securities. Hence, the price changes of derivative contracts should be perfectly positively correlated with the price changes of their underlying assets. With friction, however, the situation may change.
69 Day and Lewis (1988) examine changes in implied volatility of index options in the days surrounding contract expiration.
7.3.1. Stock market versus option market
Early evidence in the stock option market indicates that price changes in the option market tend to lead those in the stock market. Manaster and Rendleman (1982), for example, analyze close-to-close returns of portfolios based on the relative difference between stock prices and stock prices implied by option prices. They conclude that closing option prices contain information that is not contained in closing stock prices. Furthermore, they claim that it takes up to one day for the stock market to adjust. This evidence is consistent with the view that the extreme leverage provided by stock option contracts makes them the preferred trading vehicle for informed traders. The use of closing price data, however, seriously undermines the interpretation of the Manaster and Rendleman results. The stock option market closes at 3:10 PM (CST), ten minutes after the close of the stock market. Information that Manaster and Rendleman ascribe to option prices and not stock prices may only be information that has disseminated into the marketplace between closing times in the two markets. Bhattacharya (1987) uses observed intraday bid/ask call prices to compute implied bid/ask stock prices. These prices are then compared to actual bid/ask stock prices in order to identify arbitrage opportunities. The stock is considered underpriced (overpriced) if the implied bid (ask) is higher (lower) than the actual ask (bid). A simulated trading strategy based on these arbitrage signals indicates that profits are insufficient to overcome trading costs for all intraday holding periods. He also confirms the Manaster and Rendleman results by documenting statistically significant excess returns for overnight holding periods (after the costs of transacting but before information search and exchange seat opportunity costs). A flaw in Bhattacharya’s test design is that it can only detect whether the option market leads the stock market and not vice versa.
70 In order to test for the possibility of stock prices leading option prices, he should have used observed bid/ask stock prices to compute implied bid/ask call prices to identify over- (under)-pricing. Although he shows that option price changes have some predictive power, his results do not preclude the possibility that the stock price changes have enormous predictive power on option price changes. Anthony (1988) uses daily data to examine whether trading in the option market causes trading in the stock market (or vice versa). 71 He concludes “. . . trading in call options leads trading in the underlying shares, with one day lag”. His results are subject to the same caveats as Manaster and Rendleman due to the non-simultaneity of closing times in the two markets. Moreover, his evidence is not overwhelming. He finds that (a) option volume leads stock volume for 13 firms, (b) stock volume leads option volume for 4 firms, and (c) there is no unambiguous direction of causality for the remaining 8 firms.
70 Bhattacharya recognizes this problem but does not perform the simulations in the reverse way.
71 Anthony uses the terms “cause” and “causality” in the sense of Granger (1969).
Stephan and Whaley (1990) use intraday transaction data to re-examine the price change and trading volume results. Their sample consists of all trades of CBOE call options on individual stocks during the first quarter of 1986, and all individual stock trades from the NYSE. From the trade price files, they compute five-minute stock price changes using the NYSE data, and five-minute implied stock price changes using the CBOE data. They also aggregate trading volume within each five-minute interval. When implied stock price changes are regressed on leading, contemporaneous, and lagged stock price changes, the stock market appears to lead the option market by as much as fifteen to twenty minutes on average. They divide the sample by intraday stock return and by option delta to see if option price changes tend to lead on days with large positive or negative stock returns due to the release of information. The results for the different sub-samples are qualitatively the same. Stephan and Whaley also perform the lead/lag test using the number of trades and the number of contracts/shares traded in each interval and find that the stock market’s lead can be as much as 30 minutes or more. Chan, Chung and Johnson (1993) extend the Stephan and Whaley results in an important way. They use the same sample period, but, in addition to trade data, they use bid/ask quote data for the options as well as the stocks. Consistent with Stephan and Whaley, they find that when trade prices are used to generate five-minute price changes, the stock market leads the option market by as much as fifteen minutes or more. When bid/ask midpoints are used to generate five-minute price changes, however, the prices in the two markets appear to move simultaneously. Taken together, these two results suggest that the documented lead of the stock market is largely driven by stock options trading less frequently than stocks.
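The idea behind these lead/lag tests can be illustrated with a simple cross-correlation at different offsets. This is a stand-in chosen for brevity and not the multiple regression on leads and lags that Stephan and Whaley estimate. The function names and the synthetic data, in which option-implied price changes simply repeat stock price changes three five-minute intervals later, are assumptions for illustration.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lead_lag_correlations(stock, option, max_lag):
    """Correlation of option changes at t with stock changes at t - k.

    A peak at positive k means the stock market leads by k intervals;
    a peak at negative k means the option market leads.
    """
    n = len(stock)
    out = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            xs, ys = stock[: n - k], option[k:]
        else:
            xs, ys = stock[-k:], option[: n + k]
        out[k] = correlation(xs, ys)
    return out


# Usage: synthetic five-minute changes in which the option market lags
# the stock market by three intervals (fifteen minutes).
rng = random.Random(0)
stock_changes = [rng.gauss(0.0, 1.0) for _ in range(500)]
option_changes = [0.0, 0.0, 0.0] + stock_changes[:-3]
corrs = lead_lag_correlations(stock_changes, option_changes, max_lag=6)
best_lag = max(corrs, key=corrs.get)
```

With trade prices, infrequent trading in the lagging market can produce a peak at a nonzero offset even when quotes move together, which is the point of the comparison with quote midpoints.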
The fact that the lead disappears when quote midpoints are used is not surprising considering that option market makers condition quotes on the prevailing stock price and, when the stock price changes, the quotes on all option series are updated automatically using an option valuation model.
7.3.2. Stock market versus index derivatives markets
More recent work on market synchronization has tended to focus on index derivatives. Stoll and Whaley (1990), for example, examine five-minute returns for the S&P 500 futures in relation to the returns of the underlying S&P 500 index. At face value, the stock market appears to lag the futures market by about five minutes on average, but occasionally as long as ten minutes or more. But this lag is illusory. The observed level of the S&P 500 index is an amalgam of 500 last trade prices, where many stocks have not traded in some time. The observed index level, therefore, is always a stale indicator of the true index level. This “infrequent trading” effect induces positive serial correlation in observed intraday index returns that does not appear in the returns of the index futures. To purge the infrequent trading effect, Stoll and Whaley model the observed index returns as an ARMA(2,3) process, and then correlate the S&P 500 index return innovations with the returns of the S&P 500 futures. They find that the pronounced lag that appears in raw returns disappears when return innovations are used and conclude that the returns
in the futures market and the stock market are, for the most part, contemporaneous. They interpret the mild evidence that the futures market still leads, even after accounting for the effects of infrequent trading, as support for the price discovery hypothesis that new market information disseminates in the market where it is least expensive to trade – in this case, the futures market. A number of subsequent empirical investigations re-examine the relation using more recent data, more refined measurement of the infrequent trading effect, and different index futures markets domestically and internationally, and find that the lead of the futures market persists. 72 The evidence regarding the lead-lag relation between stock index options and the stock market is less plentiful because of the need to convert index option price movements into index movements. Kleidon and Whaley (1992) examine intraday price movements of the S&P 500 index, the S&P 500 futures, and the S&P 100 index options in the days surrounding the October 1987 crash. They use five-minute price changes. To compute the five-minute price changes for the S&P 100 index options, they take all index option trade prices in a given five-minute interval and perform a non-linear regression to solve for the implied index level and volatility rate. The implied index levels are used to compute S&P 100 index returns, and the implied S&P 100 index returns are compared to the returns of the S&P 500 index and index futures. They find that in the first half of October 1987 the returns of stocks, index futures and index options behave in a very similar fashion. During the October 1987 crash, however, the usual integration broke down, but the problem lay in the stock market. The usual links between the futures and options markets remained largely intact. By contrast, both the futures and options markets were delinked from the cash market, which showed price levels much higher than those in either of the other markets.
The delinking of the cash market is not surprising considering that order queues in printers at the specialists’ posts were up to seventy-five minutes long by noon on October 19th. On this day, the index derivatives prices were better reflections of the level of the stock market than the stock market itself.
72 See, for example, Chan (1992), Wahab and Lashgari (1993), Shyy, Vijayraghavan and Scott-Quinn (1996), Fleming, Ostdiek and Whaley (1996), Booth, So and Tse (1999) and Frino, Walter and West (2000).
7.4. Summary and analysis
The fact that equity derivatives markets are so active attests to the social benefits they provide. Call options offer market participants increased leverage and limited liability. Put options offer similar advantages, but in addition provide some market participants (such as retail customers) with the ability to short sell without prohibitive restrictions. In addition, index futures and options are inexpensive ways for the public to trade portfolios of stocks. Of the categories of studies in this section, the most promising area for future empirical research is market completeness/trading restrictions. Studies of
contract expirations reach the common conclusion that while trading volume is high, price movements are no larger than one would expect in block trades, and studies of market synchronization reach the conclusion that prices in derivatives and asset markets are closely integrated. The studies on stock option contract introductions, however, appear to indicate that stock prices may move as a result of option introduction. While this evidence supports the notion that stock options help complete the market, the evidence is surprising considering that (a) stock option trading activity pales in comparison with stock market trading in the best of times, and (b) stock options are seldom highly active from the outset. For stock prices to move significantly, stock option trading must be substantial. Showing the level of option trading (and open interest) relative to stock trading (and shares outstanding) just after options are introduced would add to the credibility of the reported price movement results. Along the same lines, examining the time-series behavior of differences between the put and call option-implied volatilities (or, equivalently, violations of put–call parity) in stock option markets also seems a worthwhile pursuit. As shown in D’Avolio (2002), stocks are frequently difficult to borrow, and rebate rates on borrowed stocks are frequently negative. In such circumstances, the implied volatilities of puts will exceed those of calls since the market maker is faced with extraordinary costs when shorting stocks to hedge short put positions. This fact was noted recently by Lamont and Thaler (2003) for stock options written on Palm after its spin-off from 3Com, and is examined more systematically in Ofek, Richardson and Whitelaw (2002). Much work remains to be done, however.
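The link between costly stock borrowing, put–call parity violations and the gap between put and call implied volatilities can be illustrated with a small numerical sketch. It rests on a simplifying assumption that is not taken from the studies cited above: the borrowing cost is treated like a continuous yield (`fee`) that lowers the forward price embedded in option values. The options are European, all parameter values are hypothetical, and `option_price` and `implied_vol` are illustrative names.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def option_price(S, K, T, r, sigma, is_call, borrow_fee=0.0):
    """European option value with the borrow fee treated like a continuous yield."""
    vol = sigma * math.sqrt(T)
    forward = S * math.exp((r - borrow_fee) * T)
    d1 = (math.log(forward / K) + 0.5 * vol ** 2) / vol
    d2 = d1 - vol
    disc = math.exp(-r * T)
    if is_call:
        return disc * (forward * norm_cdf(d1) - K * norm_cdf(d2))
    return disc * (K * norm_cdf(-d2) - forward * norm_cdf(-d1))


def implied_vol(price, S, K, T, r, is_call, borrow_fee=0.0):
    """Bisection on volatility; the option value is increasing in sigma."""
    lo, hi = 1e-4, 5.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if option_price(S, K, T, r, mid, is_call, borrow_fee) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical hard-to-borrow stock: option values embed a 10% annual borrow fee.
S, K, T, r, sigma, fee = 100.0, 100.0, 0.25, 0.05, 0.40, 0.10
call = option_price(S, K, T, r, sigma, True, fee)
put = option_price(S, K, T, r, sigma, False, fee)

# Implied volatilities backed out while ignoring the borrow fee.
iv_call = implied_vol(call, S, K, T, r, True)
iv_put = implied_vol(put, S, K, T, r, False)

# Put-call parity gap C - P - (S - K e^{-rT}); zero when borrowing is costless.
parity_gap = call - put - (S - K * math.exp(-r * T))
```

In this sketch the put implied volatility comes out above the volatility used to generate the prices and the call implied volatility below it, and the parity gap is negative, which is the direction of the effect described in the text.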
8. Summary
The purpose of this chapter is to provide an overview of the evolution of derivatives contract markets and derivatives research over the past thirty years. The chapter has six complementary sections. Section 2 contains a brief history of derivatives contracts and contract markets. Although the origin of derivatives use dates back thousands of years, the most important innovations occurred only recently, in the 1970s and 1980s. Concurrent with these industry innovations was the development of modern-day option valuation theory. These advances are reviewed in the second and third sections. The key contribution is the seminal theoretical framework of the Black–Scholes (1973) and Merton (1973) (“BSM”) model. The key economic insight of their model is that a risk-free hedge can be formed between a derivatives contract and its underlying asset. This implies that contract valuation is possible under the assumption of risk-neutrality without loss of generality. Not only does this framework provide BSM with the ability to value standard call and put options, it has provided other researchers with the ability to value thousands of different derivatives contract structures such as caps, collars, floors, binary options, and quantos. Many of these contributions, as well as other extensions to the BSM model, are summarized.
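The step from a risk-free hedge to risk-neutral valuation can be seen in a one-period, two-state example. This is a discrete-time illustration in the spirit of the binomial approach rather than the continuous-time BSM derivation itself, and all numbers are hypothetical.

```python
S0, K = 100.0, 100.0      # current asset price and call strike (hypothetical)
u, d = 1.20, 0.90         # gross up and down moves over the period
R = 1.05                  # gross risk-free return over the period

Su, Sd = S0 * u, S0 * d
Cu, Cd = max(Su - K, 0.0), max(Sd - K, 0.0)

# Hedge ratio that makes "long delta shares, short one call" riskless.
delta = (Cu - Cd) / (Su - Sd)
hedge_payoff_up = delta * Su - Cu
hedge_payoff_down = delta * Sd - Cd

# No arbitrage: the riskless hedge must earn R, which pins down the call value.
call_by_hedge = delta * S0 - hedge_payoff_up / R

# The same value, written as a discounted expectation under the
# risk-neutral probability q of an up move.
q = (R - d) / (u - d)
call_risk_neutral = (q * Cu + (1.0 - q) * Cd) / R
```

The actual probability of an up move never enters the calculation: the hedge argument alone fixes the price, and the price coincides with the expected payoff computed as if investors were risk-neutral.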
The final three sections of this chapter summarize empirical work that investigates the pricing and valuation of derivatives contracts and the efficiency of the markets within which they trade. The studies are divided into three groups. In the first group are studies that focus on testing no-arbitrage pricing conditions. A review of tests of the no-arbitrage price relations between forwards and futures and their underlying assets, as well as tests of lower price bounds and put–call parity in the options markets, is provided. The second group contains studies that attempt to evaluate the empirical performance of option valuation models. The approaches used include investigating the in-sample properties of option values by examining pricing errors or patterns in implied volatilities, examining the performance of different option valuation models by simulating a trading strategy based on under- and over-pricing, and examining the informational content of the volatility implied by option prices. The third and final group of studies focuses on the social costs and/or benefits that arise from derivatives trading. Through this review, one important fact emerges – the BSM model is one of the most resilient in the history of financial economics.
References
Amin, K., and R.A. Jarrow (1992), “Pricing options on risky assets in a stochastic interest rate economy”, Mathematical Finance 2:217−237. Anthony, J.H. (1988), “The interrelation of stock and options market trading-volume data”, Journal of Finance 43:949−964. Aristotle (350 B.C.), Politics (Benjamin Jowett translation). http://classics.mit.edu/Aristotle/politics.html. Asay, M.R. (1982), “A note on the design of commodity option contracts”, Journal of Futures Markets 2:1−8. Bachelier, L. (1964), “Theory of speculation”, in: P.H. Cootner, ed., The Random Character of Stock Market Prices (MIT Press, Cambridge, MA). Originally published in 1900. Bailey, W., and R. Stulz (1989), “The pricing of stock index options in a general equilibrium model”, Journal of Financial and Quantitative Analysis 24:1−12. Bakshi, G., C. Cao and Z. Chen (1997), “Empirical performance of alternative option pricing models”, Journal of Finance 52:2003−2049. Bakshi, G., N. Kapadia and D. Madan (2001), “Stock return characteristics, skewness laws, and the differential pricing of individual equity options”, Review of Financial Studies. Bansal, V.K., S.W. Pruitt and K.C.J. Wei (1989), “An empirical examination of the impact of CBOE option initiation on the volatility and trading volume of the underlying equities: 1973–1986”, Financial Review 24:19−29. Barone-Adesi, G., and R.E. Whaley (1987), “Efficient analytic approximation of American option values”, Journal of Finance 42:301−320. Bates, D.S. (1996), “Dollar jump fears – 1984–1992: distributional abnormalities implicit in currency futures options”, Journal of International Money and Finance 15:65−93. Bates, D.S. (2000), “Post-’87 crash fears in the S&P 500 futures options market”, Journal of Econometrics 94:181−238. Beckers, S. (1980), “The constant elasticity of variance model and its implications for option pricing”, The Journal of Finance 35(3):661−673. Beckers, S.
(1981), “Standard deviations implied in option prices as predictors of future stock price variability”, Journal of Banking and Finance 5:363−381.
Bhattacharya, M. (1983), “Transaction data tests of efficiency of the Chicago Board Options Exchange”, Journal of Financial Economics 12:161−185. Bhattacharya, M. (1987), “Price changes of related securities: the case of call options and stocks”, Journal of Financial and Quantitative Analysis 22:1−15. Black, F. (1975), “Fact and fantasy in the use of options”, Financial Analysts Journal 31:36−41; 61−72. Black, F. (1976a), “Studies of stock price volatility changes”, in: Proceedings of the 1976 Meetings of the Business and Economics Section, American Statistical Association:177−181. Black, F. (1976b), “The pricing of commodity contracts”, Journal of Financial Economics 3:167−179. Black, F. (1986), “Noise”, Journal of Finance 41:529−543. Black, F., and M. Scholes (1972), “The valuation of option contracts and a test of market efficiency”, Journal of Finance 27(2):399−417. Black, F., and M. Scholes (1973), “The pricing of options and corporate liabilities”, Journal of Political Economy 81:637−659. Blomeyer, E.C., and J.C. Boyd (1988), “Empirical tests of boundary conditions for options on Treasury bond futures”, Journal of Futures Markets 4:185−198. Bodurtha, J.N., and G.R. Courtadon (1986), “Efficiency tests of the foreign currency options market”, Journal of Finance 41:151−162. Bodurtha, J.N., and G.R. Courtadon (1987), “Tests of an American pricing model on the foreign currency options market”, Journal of Financial and Quantitative Analysis 22:153−167. Bollen, N.P.B. (1998), “A note on the impact of options on stock return volatility”, Journal of Banking and Finance 22:1181−1191. Bollen, N.P.B., and E. Rasiel (2003), “The performance of alternative valuation models in the OTC currency options market”, Journal of International Money and Finance 22:33−64. Bollen, N.P.B., and R.E. Whaley (1999), “Do expirations of Hang Seng index derivatives affect stock market volatility?”, Pacific-Basin Journal 7:453−470. Bollen, N.P.B., and R.E. 
Whaley (2002), “Does price pressure affect the shape of implied volatility functions?” Journal of Finance, forthcoming. Bollen, N.P.B., S.A. Gray and R.E. Whaley (2000), “Regime-switching in foreign exchange rates: evidence from currency option prices”, Journal of Econometrics 94:239−276. Bondarenko, O. (2000), “Recovering risk-neutral densities: a new nonparametric approach”, Working Paper (University of Illinois at Chicago). Bondarenko, O. (2001), “On market efficiency and joint hypothesis”, Working Paper (University of Illinois at Chicago). Booth, G., R. So and Y. Tse (1999), “Price discovery in the German equity index derivatives markets”, Journal of Futures Markets 19:619−643. Boyle, P.P. (1977), “Options: a Monte Carlo approach”, Journal of Financial Economics 4:323−338. Boyle, P.P. (1988), “A lattice framework for option pricing with two state variables”, Journal of Financial and Quantitative Analysis 23:1−12. Boyle, P.P., and D. Emanuel (1980), “Discretely-adjusted option hedges”, Journal of Financial Economics 8:259−282. Boyle, P.P., J. Evnine and S. Gibbs (1989), “Numerical evaluation of multivariate contingent claims”, Review of Financial Studies 2:241−250. Brandt, M.W., and T. Wu (2002), “Cross-sectional tests of deterministic volatility functions”, Journal of Empirical Finance 9:525−550. Breeden, D.T., and R.H. Litzenberger (1978), “Prices of state-contingent claims implicit in option prices”, Journal of Business 51:621−651. Brennan, M.J., and E.S. Schwartz (1977), “The valuation of American put options”, Journal of Finance 32:449−462. Campa, J., K. Chang and R. Reider (1998), “Implied exchange rate distributions: evidence from the OTC option markets”, Journal of International Money and Finance 17:117−160. Carr, P. (1998), “Randomization and the American put”, Review of Financial Studies 11:597−626.
Chan, K. (1992), “A further analysis of the lead/lag relationship between the cash market and stock index futures market”, Review of Financial Studies 5:123−152. Chan, K., Y.P. Chung and H. Johnson (1993), “Why option prices lag stock prices: a trading-based explanation”, Journal of Finance 48:1957−1967. Chang, C.W., and J.S.K. Chang (1990), “Forward and futures prices: evidence from the foreign exchange markets”, Journal of Finance 45:1333−1336. Chicago Board of Trade (1994), Commodities Trading Manual (Chicago Board of Trade). Chicago Board Options Exchange (1995), Constitution and Rules (Chicago Board Options Exchange). Chiras, D.P., and S. Manaster (1978), “The information content of option prices and a test of market efficiency”, Journal of Financial Economics 6:213−234. Conrad, J. (1989), “The price effect of option introduction”, Journal of Finance 44:487−498. Cornell, B. (1981), “The relationship between volume and price variability in futures markets”, Journal of Futures Markets 1:303−316. Cornell, B., and M.R. Reinganum (1981), “Forward and futures prices: evidence from the foreign exchange market”, Journal of Finance 36:1035−1045. Corrado, C.J., and T. Su (1996), “Skewness and kurtosis in S&P 500 index returns implied by option prices”, Journal of Financial Research 19:175−192. Cox, J.C., and S.A. Ross (1976), “The valuation of options for alternative stochastic processes”, Journal of Financial Economics 3:145−166. Cox, J.C., S.A. Ross and M. Rubinstein (1979), “Option pricing: a simplified approach”, Journal of Financial Economics 7:229−264. Cox, J.C., J.E. Ingersoll and S.A. Ross (1981), “The relation between forward and futures prices”, Journal of Financial Economics 9:321−346. Damadoran, A., and J. Lim (1991), “The effects of option listing on the underlying stocks’ return process”, Journal of Banking and Finance 15:647−664. Danielsen, B.R., and S.M. Sorescu (2001), “Why do option introductions depress stock prices? 
A study of diminishing short sale constraints”, Journal of Financial and Quantitative Analysis 36:451−484. D’Avolio, G. (2002), “The market for borrowing stock”, Journal of Financial Economics 66:271−306. Day, T.E., and C.M. Lewis (1988), “The behavior of the volatility implicit in the prices of stock index options”, Journal of Financial Economics 22:103−122. Day, T.E., and C.M. Lewis (1992), “Stock market volatility and the information content of stock index options”, Journal of Econometrics 52:267−287. Dennis, P., and S. Mayhew (2002), “Risk-neutral skewness: evidence from stock options”, Journal of Financial and Quantitative Analysis 37:471−493. Derman, E., and I. Kani (1994a), “Riding on the smile”, Risk 7:32−39. Derman, E., and I. Kani (1994b), “The volatility smile and its implied tree”, Working Paper (Goldman Sachs). Dumas, B., J. Fleming and R.E. Whaley (1998), “Implied volatility functions: empirical tests”, Journal of Finance 53:2059−2106. Dupire, B. (1994), “Pricing with a smile”, Risk 7:18−20. Ederington, L.H., and W. Guan (2000), “The information content of implied volatility and the ‘frown’ in option prices”, Working Paper (University of Oklahoma). Emanuel, D.C., and J.D. MacBeth (1982), “Further results on the constant elasticity of variance call option pricing model”, Journal of Financial and Quantitative Analysis 4:533−554. Evnine, J., and A. Rudd (1985), “Index options: the early evidence”, Journal of Finance 40:743−756. Fedenia, M., and T. Grammatikos (1992), “Options trading and the bid-ask spread of the underlying stocks”, Journal of Business 65(3):335−351. Figlewski, S. (1984), “Hedging performance and basis risk in stock index futures”, Journal of Finance 39:657−669. Figlewski, S. (1989), “Option arbitrage in imperfect markets”, Journal of Finance 44:1289−1311.
Figlewski, S., and G.P. Webb (1993), “Options, short sales, and market completeness”, Journal of Finance 48:761−777. Fleming, J. (1993), “The valuation and information content of S&P 100 index option prices”, Ph.D. Dissertation (Duke University, Durham, NC). Fleming, J. (1998), “The quality of market volatility forecasts implied by S&P 100 index option prices”, Journal of Empirical Finance 5:317−345. Fleming, J., and R.E. Whaley (1994), “The value of wildcard options”, Journal of Finance 49:215−236. Fleming, J., B. Ostdiek and R.E. Whaley (1995), “Predicting stock market volatility: a new measure”, Journal of Futures Markets 15(3):265−302. Fleming, J., B. Ostdiek and R.E. Whaley (1996), “Trading costs and the relative rates of price discovery in stock, futures, and options markets”, Journal of Futures Markets 16:353−387. French, K.R. (1983), “A comparison of forward and futures prices”, Journal of Financial Economics 12:311−342. Frino, A., T. Walter and A. West (2000), “The lead–lag relationship between equities and stock index futures around information releases”, Journal of Futures Markets 20:467−487. Galai, D. (1978), “Empirical tests of boundary conditions for CBOE options”, Journal of Financial Economics 6:187−211. Galai, D. (1979), “A convexity test for traded options”, Quarterly Review of Economics and Business 19:83−90. Garber, P.M. (2000), Famous First Bubbles: The Fundamentals of Early Manias (MIT Press, Cambridge, MA). Garman, M.B., and S.W. Kohlhagen (1983), “Foreign currency option values”, Journal of International Money and Finance 2:231−237. Geske, R. (1979a), “The valuation of compound options”, Journal of Financial Economics 7:63−81. Geske, R. (1979b), “A note on an analytical formula for unprotected American call options on stocks with known dividends”, Journal of Financial Economics 7:375−380. Geske, R., and H.E. Johnson (1984), “The American put valued analytically”, Journal of Finance 39:1511−1524. Goldman, B., H. Sosin and M.A. 
Gatto (1979), “Path dependent options: buy at the low, sell at the high”, Journal of Finance 34:1111−1127. Gould, J.P., and D. Galai (1974), “Transaction costs and the relationship between put and call prices”, Journal of Financial Economics 1:105−129. Granger, C.W.J. (1969), “Investigating causal relations by econometric models and cross spectral methods”, Econometrica 37:424−438. Gray, S.F., and R.E. Whaley (1997), “Valuing S&P 500 bear market warrants with a periodic reset”, Journal of Derivatives 5:99−106. Grossman, S. (1988), “An analysis of the implications for stock and futures price volatility of program trading and dynamic hedging strategies”, Journal of Business 61:275−298. Harvey, C.R., and R.E. Whaley (1991), “S&P 100 index option volatility”, Journal of Finance 46:1551−1561. Harvey, C.R., and R.E. Whaley (1992), “Market volatility prediction and the efficiency of the S&P 100 index option market”, Journal of Financial Economics 30:43−73. Heston, S.L. (1993), “A closed-form solution for options with stochastic volatility with applications to bond and currency options”, Review of Financial Studies 6:327−343. Hsieh, D.A., and L. Manas-Anton (1988), “Empirical regularities in the Deutschemark futures options”, Advances in Futures and Options Research 3:183−208. Hull, J., and A. White (1987), “The pricing of options on assets with stochastic volatilities”, Journal of Finance 42:281−300. Jackwerth, J.C. (2000), “Recovering risk aversion from option prices and realized returns”, Review of Financial Studies 13:433−451.
Jackwerth, J.C., and M. Rubinstein (1996), “Recovering probability distributions from option prices”, Journal of Finance 51:1611−1631. Jarrow, R.A., and G.S. Oldfield (1981), “Forward contracts and futures contracts”, Journal of Financial Economics 9:373−382. Jarrow, R.A., and A. Rudd (1982), “Approximate option valuation for arbitrary stochastic processes”, Journal of Financial Economics 10:347−369. Jarrow, R.A., and A. Rudd (1983), Option Pricing (Irwin, Homewood, IL). Johnson, H.E. (1987), “Options on the minimum and maximum of several assets”, Journal of Financial and Quantitative Analysis 22:277−283. Jorion, P. (1995), “Predicting volatility in the foreign exchange market”, Journal of Finance 50:507−528. Kamrad, B., and P. Ritchken (1991), “Multinomial approximating models for options with k state variables”, Management Science 37(12):1640−1652. Karolyi, G.A. (1993), “A Bayesian approach to modeling stock return volatility for option valuation”, Journal of Financial and Quantitative Analysis 28:579−594. Karolyi, G.A. (1996), “Stock market volatility around expiration days in Japan”, Journal of Derivatives 4:23−43. Kleidon, A.W., and R.E. Whaley (1992), “One market? Stocks, futures, and options during October 1987”, Journal of Finance 47(3):851−877. Klemkosky, R.C. (1978), “The impact of option expirations on stock prices”, Journal of Financial and Quantitative Analysis 13:507−518. Klemkosky, R.C., and B.G. Resnick (1979), “Put–call parity and market efficiency”, Journal of Finance 34:1141−1155. Lamont, O., and R. Thaler (2003), “Can the market add and subtract? Mispricing in tech stock carve-outs”, Journal of Political Economy 111:227−268. Lamoureux, C.G., and W.D. Lastrapes (1993), “Forecasting stock-return variance: toward understanding stochastic implied volatilities”, Review of Financial Studies 6:293−326. Latane, H.A., and R.J. Rendleman (1976), “Standard deviations of stock price ratios implied in option prices”, Journal of Finance 31:369−381.
Lintner, J. (1965), “The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets”, Review of Economics and Statistics 47:13−37. Longstaff, F.A. (1995), “Option pricing and the martingale restriction”, Review of Financial Studies 8:1091−1124. Lyons, R.K. (1988), “Tests of the foreign exchange risk premium using the expected second moments implied by option pricing”, Journal of International Money and Finance 7:91−108. MacBeth, J.D., and L.J. Merville (1980), “Tests of the Black/Scholes and Cox call option valuation models”, Journal of Finance 35:285−301. MacKinlay, C.A., and K. Ramaswamy (1988), “Index futures arbitrage and the behavior of stock index futures prices”, Review of Financial Studies 1:137−158. MacMillan, L.W. (1986), “Analytic approximation for the American put option”, Advances in Futures and Options Research 1:119−139. Manaster, S., and R. Rendleman (1982), “Option prices as predictors of equilibrium stock prices”, Journal of Finance 37:1043−1048. Margrabe, W. (1978), “The value of an option to exchange one asset for another”, Journal of Finance 33:177−186. Melick, W.R., and C.P. Thomas (1997), “Recovering an asset’s implied probability density function from option prices: an application to oil prices during the Gulf crisis”, Journal of Financial and Quantitative Analysis 32:91−115. Melino, A., and S. Turnbull (1990), “Pricing foreign currency options with stochastic volatility”, Journal of Econometrics 45:239−265. Melino, A., and S. Turnbull (1995), “Misspecification and the pricing and hedging of long-term foreign currency options”, Journal of International Money and Finance 14:373−393.
Merton, R.C. (1973), “Theory of rational option pricing”, Bell Journal of Economics and Management Science 4:141−183. Merton, R.C. (1976), “Option pricing when underlying stock returns are discontinuous”, Journal of Financial Economics 3:125−143. Meulbroek, L. (1992), “A comparison of forward and futures prices of an interest-rate sensitive financial asset”, Journal of Finance 47:381−396. Miller, M.H., and F. Modigliani (1961), “Dividend policy, growth and the valuation of shares”, Journal of Business 34:411−433. Miller, M.H., J. Muthuswamy and R.E. Whaley (1994), “Mean reversion in S&P 500 index basis changes: arbitrage-induced or statistical illusion?” Journal of Finance 49:479−513. Modigliani, F., and M.H. Miller (1958), “The cost of capital, corporation finance and the theory of investment”, American Economic Review 48:261−297. Nathan Associates Inc. (1969), Public Policy of a Futures-Type Market in Options on Securities (Chicago Board of Trade). Ofek, E., M. Richardson and R.F. Whitelaw (2002), “Limited arbitrage and short sales restrictions: evidence from the options market”, Working Paper (New York University). Ogden, J.P., and A.L. Tucker (1987), “Empirical tests of the efficiency of the currency futures option markets”, Journal of Futures Markets 7:695−703. Park, H.Y., and A.Y. Chen (1985), “Differences between futures and forward prices: a further investigation of marking to market effects”, Journal of Futures Markets 5:77−88. Parkinson, M. (1977), “Option pricing: the American put”, Journal of Business 50:21−36. Rendleman Jr, R.J., and B.J. Bartter (1979), “Two-state option pricing”, Journal of Finance 34:1093−1110. Rendleman Jr, R.J., and C. Carabini (1979), “The efficiency of the Treasury bill futures markets”, Journal of Finance 34:895−914. Roll, R. (1977), “An analytic valuation formula for unprotected American call options on stocks with known dividends”, Journal of Financial Economics 5:251−258. Rubinstein, M. 
(1985a), “Alternative paths to portfolio insurance”, Financial Analysts Journal 41:42−52. Rubinstein, M. (1985b), “Nonparametric tests of alternative option pricing models using all reported trades and quotes on the 30 most active CBOE option classes from August 23, 1976 through August 31, 1978”, Journal of Finance 40:455−480. Rubinstein, M. (1991), “Pay now, choose later”, Risk 4:13. Rubinstein, M. (1994), “Implied binomial trees”, Journal of Finance 49:771−818. Rubinstein, M., and E. Reiner (1991), “Breaking down the barriers”, Risk 4:31−35. Samuelson, P.A. (1965), “Rational theory of warrant pricing”, Industrial Management Review 10:13−31. Schwartz, E.S. (1977), “The valuation of warrants: implementing a new approach”, Journal of Financial Economics 4:79−93. Schwert, G.W. (1989), “Why does stock market volatility change over time?” Journal of Finance 44:1115−1154. Schwert, G.W. (1990), “Stock volatility and the crash of ’87”, Review of Financial Studies 3:77−102. Schwert, G.W. (2002), “Stock volatility in the new millennium: how wacky is NASDAQ?”, Journal of Monetary Economics 49:3−26. Scott, L.O. (1987), “Option pricing when the variance changes randomly: theory, estimation, and an application”, Journal of Financial and Quantitative Analysis 22:419−438. Sharpe, W.F. (1964), “Capital asset prices: a theory of market equilibrium under conditions of risk”, Journal of Finance 19:425−442. Shastri, K., and K. Tandon (1986a), “An empirical test of a valuation model for American options on futures contracts”, Journal of Financial and Quantitative Analysis 21:377−392. Shastri, K., and K. Tandon (1986b), “Valuation of foreign currency options: some empirical tests”, Journal of Financial and Quantitative Analysis 21:145−160.
Sheikh, A.M. (1991), “Transaction data tests of S&P 100 call option pricing”, Journal of Financial and Quantitative Analysis 26:459−475. Shimko, D. (1993), “Bounds of probability”, Risk 6:33−37. Shleifer, A., and R. Vishny (1997), “The limits of arbitrage”, Journal of Finance 52:35−55. Shyy, G., V. Vijayraghavan and B. Scott-Quinn (1996), “A further investigation of the lead–lag relationship between the cash market and stock index futures market with the use of bid/ask quotes: the case of France”, Journal of Futures Markets 16:405−420. Skinner, D. (1989), “Options markets and stock return volatility”, Journal of Financial Economics 23:61−78. Sorescu, S.M. (2000), “The effect of options on stock prices: 1973 to 1995”, Journal of Finance 55(1):487−514. Sprenkle, C.M. (1961), “Warrant prices as indicators of expectations and preferences”, Yale Economic Essays 1:172−231. Stein, E.M., and J.C. Stein (1991), “Stock price distributions with stochastic volatility: an analytic approach”, Review of Financial Studies 4:727−752. Stephan, J.A., and R.E. Whaley (1990), “Intraday price change and trading volume relations in the stock and stock option markets”, Journal of Finance 45:191−220. Stoll, H.R. (1969), “The relationship between put and call prices”, Journal of Finance 24:802−824. Stoll, H.R., and R.E. Whaley (1985), “The new options markets”, in: A. Peck, ed., Futures Markets: Their Economic Role (American Enterprise Institute, Washington, DC) pp. 205–289. Stoll, H.R., and R.E. Whaley (1986a), “Expiration day effects of index options and futures”, Monograph Series in Economics and Finance, No. 1986-3 (Graduate School of Business Administration, New York University). Stoll, H.R., and R.E. Whaley (1986b), “New option instruments: arbitrageable linkages and valuation”, Advances in Futures and Options Research 1(A):25−62. Stoll, H.R., and R.E. Whaley (1987), “Program trading and expiration-day effects”, Financial Analysts Journal 43:16−28. Stoll, H.R., and R.E. 
Whaley (1990), “The dynamics of stock index and stock index futures returns”, Journal of Financial and Quantitative Analysis 25:441−468. Stoll, H.R., and R.E. Whaley (1991), “Expiration-day effects: what has changed?” Financial Analysts Journal 47:58−72. Stoll, H.R., and R.E. Whaley (1997), “Expiration-day effects of the All Ordinaries Share Price Futures: empirical evidence and alternative settlement procedures”, Australian Journal of Management 22: 139−174. Stulz, R. (1982), “Options on the minimum or maximum of two assets”, Journal of Financial Economics 10:161−185. Sullivan, E.J., and T.M. Weithers (1991), “Louis Bachelier: the father of modern option pricing theory”, Journal of Economic Education 22:165−171. Trennepohl, G.L., and W.P. Dukes (1979), “CBOE options and stock volatility”, Review of Business and Economic Research 14:49−60. Viswanath, P.V. (1989), “Taxes and futures-forward price difference in the 91-day T-bill market”, Journal of Money, Credit and Banking 21(2):190−205. Wahab, M., and M. Lashgari (1993), “Price dynamics and error correction in stock index and stock index futures markets”, Journal of Futures Markets 13:711−742. Whaley, R.E. (1981), “On the valuation of American call options on stocks with known dividends”, Journal of Financial Economics 9:207−211. Whaley, R.E. (1982), “Valuation of American call options on dividend-paying stocks: empirical tests”, Journal of Financial Economics 10:29−58. Whaley, R.E. (1986), “Valuation of American futures options: theory and empirical tests”, Journal of Finance 41:127−150.
Whaley, R.E. (1993), “Derivatives on market volatility: hedging tools long overdue”, Journal of Derivatives 1:71−84. Whaley, R.E. (1996), “Valuing spread options”, Energy in the News (Summer), pp. 42–45. Whaley, R.E. (2002), Derivatives: Markets, Valuation, and Risk Management (Irwin McGraw-Hill). Wiggins, J.B. (1987), “Option values under stochastic volatility: theory and empirical estimates”, Journal of Financial Economics 19:351−372.
Chapter 20
FIXED-INCOME PRICING °
QIANG DAI
Stern School of Business, New York University
KENNETH J. SINGLETON °°
Graduate School of Business, Stanford University
Contents
Abstract
Keywords
1. Introduction
2. Fixed-income pricing in a diffusion setting
2.1. The term structure
2.2. Fixed-income securities with deterministic payoffs
2.3. Fixed-income securities with state-dependent payoffs
2.4. Fixed-income securities with stopping times
3. Dynamic term-structure models for default-free bonds
3.1. One-factor dynamic term-structure models
3.2. Multi-factor dynamic term-structure models
3.2.1. Affine models
3.2.2. Quadratic Gaussian models
4. Dynamic term-structure models with jump diffusions
5. Dynamic term-structure models with regime shifts
6. Dynamic term-structure models with rating migrations
6.1. Fractional recovery of market value
6.2. Fractional recovery of par, payable at maturity
6.3. Fractional recovery of par, payable at default
6.4. Pricing defaultable coupon bonds
° We are grateful to Len Umantsev and Mariusz Rabus for research assistance; to Len Umantsev for comments and suggestions; and to the Financial Research Initiative, The Stanford Program in Finance, and the Gifford Fong Associates Fund, at the Graduate School of Business, Stanford University, for financial support.
°° Dai is with the Stern School of Business, New York University, New York, NY. Singleton is with the Graduate School of Business, Stanford University, Stanford, CA 94305, and NBER.
Handbook of the Economics of Finance, Edited by G.M. Constantinides, M. Harris and R. Stulz © 2003 Elsevier B.V. All rights reserved
6.5. Pricing Eurodollar swaps
7. Pricing of fixed-income derivatives
7.1. Derivatives pricing using dynamic term-structure models
7.2. Derivatives pricing using forward-rate models
7.3. Defaultable forward-rate models with rating migrations
7.3.1. Fractional recovery of market value
7.3.2. Fractional recovery of face value, payable at maturity
7.3.3. Fractional recovery of face value, payable at default
7.4. The LIBOR market model
7.5. The swaption market model
References
Abstract This chapter surveys the literature on fixed-income pricing models, including dynamic term-structure models, and interest-rate sensitive, derivative pricing models. Our overview of conceptual approaches highlights the tradeoffs that have emerged between the complexity of the probability model for the “risk factors”, data availability, the pricing objective, and the tractability of the resulting pricing model. Initially, we examine term-structure models that price both bonds (default-free and defaultable) and fixed-income derivatives with payoffs in terms of prices or yields on these bonds. These include affine, quadratic-Gaussian, and various stochastic volatility models of the term structure. Then we turn to models designed to price fixed-income derivatives, taking the current yield curve as an input into the pricing framework. These include models based on forward rates and the LIBOR and Swaption Market models.
Keywords term structure of interest rates, defaultable bonds, fixed income derivatives, forward rates, risk-neutral pricing JEL classification: G13, E43
1. Introduction
This chapter surveys the literature on fixed-income pricing models, including dynamic term-structure models (DTSMs) and interest-rate-sensitive derivative pricing models. This literature is vast, with both the academic and practitioner communities having proposed a wide variety of models and model-selection criteria. Central to all pricing models, implicitly or explicitly, are: (i) the identity of the state vector: whether it is latent or observable and, in the latter case, which observable series; (ii) the law of motion (conditional distribution) of the state vector under the pricing measure; and (iii) the functional dependence of the short-term interest rate on this state vector. A primary objective, then, of research on fixed-income pricing has been the selection of these ingredients to capture relevant features of history, given the objectives of the modeler, while maintaining tractability, given available data and computational algorithms. Accordingly, we overview alternative conceptual approaches to fixed-income pricing, highlighting some of the tradeoffs that have emerged in the literature between the complexity of the probability model for the state, data availability, the pricing objective, and the tractability of the resulting model.

A pricing model may be “monolithic” in the sense that it prices both bonds (as functions of a set of underlying state variables or “risk factors”, i.e., it is a “term-structure model”) and fixed-income derivatives (with payoffs expressed in terms of the prices or yields on these underlying bonds). Alternatively, a model may be designed to price fixed-income derivatives, taking as given the current shape of the underlying yield curve. The former modeling strategy is certainly more comprehensive than the latter. However, researchers have often found that the latter approach offers more flexibility in calibration and tractability in computation when pricing certain derivatives.
Initially, taking the monolithic approach, we overview a variety of models for pricing default-free bonds and associated derivatives written on these (or portfolios of these) bonds. Basic issues in pricing fixed-income securities (FIS) for the case where the state vector follows a diffusion are discussed in Section 2. “Yield-based” DTSMs are reviewed in Section 3. Extensions of these pricing models to allow for jumps or regime shifts are explored in Sections 4 and 5, respectively. Then, in Section 6, we turn to the case of defaultable securities. Here we start by considering a quite general framework in which there are multiple credit classes (possibly indexed by rating) and deriving pricing relations for the case where issuers may transition between classes according to a Markov process. Several of the most widely studied models for pricing defaultable bonds are compared by specializing to the case of a single credit class. The pricing of fixed-income derivatives is overviewed in Section 7. Initially, we continue our discussion of DTSMs and overview recent research on the pricing of derivatives using yield-based term-structure models. Then we shift our focus from monolithic models to models for pricing derivatives in which the current yield curve, and possibly the associated yield volatilities, are taken as inputs into the pricing
problem. These include models based on forward rates (both for default-free and defaultable securities), and the LIBOR and Swaption Market models.

To keep our overview of the literature manageable we focus, for the most part, on term-structure models and fairly standard derivatives on zero-coupon and coupon bonds (both default-free and defaultable), plain-vanilla swaps, caps, and swaptions. In particular, we do not delve deeply into many of the complex structured products that are increasingly being traded. Of particular note, we have chosen to side-step the important issue of pricing securities in which correlated defaults play a central role in valuation.[1] Additionally, we focus almost exclusively on pricing and the associated “pricing measures”. Our companion paper Dai and Singleton (2003) explores in depth the specifications of the market prices of risk that connect the pricing with the actual measures, as well as the empirical goodness-of-fit of models[2] under alternative specifications of the market prices of risk.

2. Fixed-income pricing in a diffusion setting

A standard framework for pricing FIS has the riskless rate \(r_t\) being a deterministic function of an \(N \times 1\) vector of risk factors \(Y_t\),
\[ r_t = r(Y_t, t), \tag{1} \]
and the risk-neutral dynamics of \(Y_t\) following a diffusion process,[3]
\[ dY_t = \mu(Y_t, t)\,dt + \sigma(Y_t, t)\,dW_t^Q. \tag{2} \]
Here, \(W_t^Q\) is a \(K \times 1\) vector of standard and independent Brownian motions under the risk-neutral measure \(Q\), \(\mu(Y, t)\) is an \(N \times 1\) vector of deterministic functions of \(Y\) and possibly time \(t\), and \(\sigma(Y, t)\) is an \(N \times K\) matrix of deterministic functions of \(Y\) and possibly \(t\).

2.1. The term structure

Central to the pricing of FIS is the term structure of zero-coupon bond prices. The time-\(t\) price of a zero-coupon bond with maturity \(T\) and face value of $1 is given by
\[ D(t, T) = E^Q\left[ \exp\left( -\int_t^T r_s\,ds \right) \,\Big|\, \mathcal{F}_t \right], \tag{3} \]
where \(\mathcal{F}_t\) is the information set at time \(t\), and \(E^Q(\cdot \mid \mathcal{F}_t)\) denotes the conditional expectation under the risk-neutral measure \(Q\). Since a diffusion process is Markov, we can take \(\mathcal{F}_t\) to be the information set generated by \(Y_t\). Thus, the discount function \(\{D(t, T): T \ge t\}\) is completely determined by the risk-neutral distribution of the riskless rate and \(Y_t\).[4]

[1] Musiela and Rutkowski (1997b) discuss the pricing of a wide variety of fixed-income products, and Duffie and Singleton (2003) discuss pricing of structured products in which correlated default is a central consideration.
[2] See also Chapman and Pearson (2001) for another survey of the empirical term-structure literature.
[3] See Duffie (1996) for sufficient technical conditions for a solution to (2) to exist.

As an application of the Feynman–Kac theorem, the price of a zero-coupon bond can alternatively be characterized as a solution to a partial differential equation (PDE). Heuristically, this PDE is obtained by applying Ito’s lemma to the pricing function \(D(t, T)\), for some fixed \(T \ge t\):
\[ dD(t, T) = \mu_D(Y_t, t; T)\,dt + \sigma_D(Y_t, t; T)^\top dW_t^Q, \]
\[ \mu_D(Y, t; T) = \left[ \frac{\partial}{\partial t} + \mathcal{A} \right] D(t, T), \qquad \sigma_D(Y, t; T) = \sigma(Y, t)^\top \frac{\partial D(t, T)}{\partial Y}, \]
where \(\mathcal{A}\) is the infinitesimal generator for the diffusion \(Y_t\):
\[ \mathcal{A} = \mu(Y, t)^\top \frac{\partial}{\partial Y} + \tfrac{1}{2}\,\mathrm{Trace}\left[ \sigma(Y, t)\,\sigma(Y, t)^\top \frac{\partial^2}{\partial Y\,\partial Y^\top} \right]. \]
No-arbitrage requires that, under \(Q\), the instantaneous expected return on the bond be equal to the riskless rate \(r_t\). Imposing this requirement gives
\[ \left[ \frac{\partial}{\partial t} + \mathcal{A} \right] D(t, T) - r(Y, t)\,D(t, T) = 0, \tag{4} \]
with the boundary condition \(D(T, T) = \$1\) for all \(Y_T\).

2.2. Fixed-income securities with deterministic payoffs

The price of a security with a set of deterministic cash flows \(\{C_j : j = 1, 2, \ldots, n\}\) at some given relative payoff dates \(\tau_j\) (\(j = 1, 2, \ldots, n\)) is given by
\[ P\left(t; \{C_j, \tau_j : j = 1, 2, \ldots, n\}\right) = \sum_{j=1}^{n} C_j\,D(t, t + \tau_j). \]
In particular, the price of a coupon-bond with face value \(F\), semi-annual coupon rate of \(c\), and maturity \(T = J \times .5\) years (where \(J\) is an integer) is
\[ P(t; \{c, T\}) = \sum_{j=1}^{J} F \times \frac{c}{2} \times D(t, t + .5j) + F \times D(t, T). \]
It follows that the par yield, i.e., the semi-annually compounded yield on a par bond (with \(P_t = F\)), is given by
\[ PY(t, T) = \frac{2\,[1 - D(t, T)]}{\sum_{j=1}^{J} D(t, t + .5j)}. \tag{5} \]

[4] Here we assume that sufficient regularity conditions (that may depend on the functional form of \(r(Y, t)\)) have been imposed to ensure that the conditional expectation in Equation (3) is well-defined and finite.
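The coupon-bond price and the par yield in Equation (5) depend only on the discount function, so they are easy to check numerically. The following is a minimal sketch, assuming a hypothetical flat, continuously compounded curve (`discount_flat` and its 5% level are illustrative choices, not part of the chapter); it confirms that a bond whose coupon rate equals the par yield prices at its face value.

```python
import math

def discount_flat(y, tau):
    """Hypothetical discount function D(t, t + tau) for a flat, continuously compounded yield y."""
    return math.exp(-y * tau)

def coupon_bond_price(face, coupon_rate, n_periods, discount):
    """Semi-annual coupon bond: sum_j F * c/2 * D(t, t + .5j) + F * D(t, T), with T = .5 * J."""
    coupons = sum(face * coupon_rate / 2.0 * discount(0.5 * j)
                  for j in range(1, n_periods + 1))
    return coupons + face * discount(0.5 * n_periods)

def par_yield(n_periods, discount):
    """Equation (5): 2 * [1 - D(t, T)] / sum_j D(t, t + .5j)."""
    annuity = sum(discount(0.5 * j) for j in range(1, n_periods + 1))
    return 2.0 * (1.0 - discount(0.5 * n_periods)) / annuity

# Illustrative 5-year bond (J = 10 semi-annual periods) on a flat 5% curve.
D = lambda tau: discount_flat(0.05, tau)
py = par_yield(10, D)
price_at_par = coupon_bond_price(100.0, py, 10, D)
```

On a flat curve the par yield reduces to the semi-annually compounded equivalent of the continuous yield, which gives a second independent check on the formula.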
2.3. Fixed-income securities with state-dependent payoffs

The price of a FIS with coupon flow payment \(h_s\), \(t \le s \le T\), and terminal payoff \(g_T\) is
\[ P\left(t; \{h_s : t \le s \le T;\ g_T\}\right) = E^Q\left[ \int_t^T \exp\left[-\int_t^s r_u\,du\right] h_s\,ds \,\Big|\, \mathcal{F}_t \right] + E^Q\left[ \exp\left[-\int_t^T r_u\,du\right] g_T \,\Big|\, \mathcal{F}_t \right]. \tag{6} \]
When \(r_u = r(Y_u, u)\), \(h_u = h(Y_u, u)\), and \(g_u = g(Y_u, u)\) are deterministic functions of the state vector \(Y_u\), this price is obtained as a solution to the PDE
\[ \left[ \frac{\partial}{\partial t} + \mathcal{A} \right] P(t) - r(Y, t)\,P(t) + h(Y, t) = 0, \tag{7} \]
under the boundary condition \(P(T; \{h_T; g_T\}) = g(Y_T, T)\), for all \(Y_T\). (Equation (4) is obtained as the special case of Equation (7) with \(h_u \equiv 0\) and \(g_T = \$1\).)

A mathematically equivalent way of characterizing the price \(P(t; \{h_s : t \le s \le T;\ g_T\})\) is in terms of the Green’s function. Let \(\delta(x)\) denote the Dirac function, with the property that \(\delta(x) = 0\) at \(x \ne 0\), \(\delta(0) = \infty\), and \(\int dx\,\delta(x - y)\,f(x) = f(y)\) for any continuous and bounded function \(f(\cdot)\). The price, \(G(Y_t, t; Y, T)\), of a security with a payoff \(\delta(Y_T - Y)\) at \(T\), and nothing otherwise, is referred to as the Green’s function. By definition, the Green’s function is given by
\[ G(Y_t, t; Y, T) = E^Q\left[ \exp\left[-\int_t^T r_u\,du\right] \delta(Y_T - Y) \,\Big|\, \mathcal{F}_t \right]. \]
It is easy to see that \(G\) solves the PDE (7) with \(h(Y, t) \equiv 0\) under the boundary condition \(G(Y_T, T; Y, T) = \delta(Y_T - Y)\). If \(G\) is known, then the price of any FIS with payment flow \(h(Y_t, t)\) and terminal payoff \(g(Y_T, T)\) is given by
\[ P\left(t; \{h_s : t \le s \le T;\ g_T\}\right) = \int_t^T ds \int dY\,G(Y_t, t; Y, s)\,h(Y, s) + \int dY\,G(Y_t, t; Y, T)\,g(Y, T). \tag{8} \]
Essentially, the Green’s function represents the set of Arrow–Debreu prices for the case of a continuous state space. When the Green’s function is known, Equation (8) is
often convenient for the numerical computation of the prices of a wide variety of FIS [see Steenkiste and Foresi (1999) for applications of the Green’s function for affine term-structure models].

In the absence of default risk, some fixed-income derivative securities with state-dependent payoffs can be priced using the discount function alone, because they can be perfectly hedged or replicated by a (static) portfolio of spot instruments. These include:
• Forward Contracts: a forward contract with settlement date \(T\) and forward price \(F\) on a zero-coupon bond with par $100 and maturity date \(T + \tau\) can be replicated by a portfolio of spot instruments consisting of long a zero-coupon bond with maturity \(T + \tau\) and par $100 and short a zero-coupon bond with maturity \(T\) and par \(F\). Thus, the market value of the forward contract is \(\$100 \times D(t, T + \tau) - F \times D(t, T)\). Consequently the forward price is given by
\[ F = \$100 \times \frac{D(t, T + \tau)}{D(t, T)}. \]
• A Floating Payment: a floating payment indexed to a riskless rate with tenor \(\tau\), with coupon rate reset at \(T\) and payment made at \(T + \tau\), can be replicated by a portfolio of spot instruments consisting of long a zero-coupon bond with maturity \(T\) and par $100 and short a zero-coupon bond with maturity \(T + \tau\) and par $100. Thus, the price of the floating payment is \(\$100 \times [D(t, T) - D(t, T + \tau)]\). This implies immediately that a floating rate note with payment in arrears is always priced at par on any reset date.
• A Plain Vanilla Interest Rate Swap: a plain-vanilla interest rate swap with the tenor of the floating index matching the payment frequency can be perfectly replicated by a portfolio of spot instruments consisting of long a floating rate note with the same floating index, payment frequency, and maturity and short a coupon bond with the same maturity and payment frequency, and with coupon rate equal to the swap rate. It follows that, at the inception of the swap, the swap rate is equal to the par rate:
\[ s(t, T) = \frac{1 - D(t, T)}{\sum_{j=0}^{N-1} \delta_j\,D(t, T_{j+1})}, \]
where \(t \equiv T_0 < T_1 < \cdots < T_N \equiv T\), \(\delta_j = T_{j+1} - T_j\) is the length of the accrual payment period indexed by \(j\), \(0 \le j \le N - 1\), based on an appropriate day-count convention, with the payment for period \(j\) made at \(T_{j+1}\), \(N\) is the number of payments, and \(T\) is the maturity of the swap.

In the presence of default risk, the above pricing results may not hold except under specific conditions (see, e.g., Section 6.5 for pricing of Eurodollar swaps).

2.4. Fixed-income securities with stopping times

For some fixed-income securities, including American options and defaultable securities, the cash-flow payoff dates are also random. A random payoff date is
typically modeled as a stopping time that may be exogenously given or endogenously determined (in the sense that it must be determined jointly with the price of the security under consideration). The optimal exercise policy of an American option can be characterized as an endogenous stopping time. Valuation of American options in general, and valuation of fixed-income securities containing features of an American option in particular, is challenging, because closed-form solutions are rarely available and numerical computations (finite-difference, binomial-lattice, or Monte Carlo simulation) are typically very expensive (especially when there are multiple risk factors). As a result, approximation schemes are often used [see, e.g., Longstaff and Schwartz (2001)], and considerable attention has been given to establishing upper and lower bounds on American option prices [e.g., Haugh and Kogan (2001) and Andersen and Broadie (2001)]. In the light of these complexities in pricing, some have questioned whether the optimal exercise strategies implicit in the parsimonious models typically used in practice are correctly valuing the American option feature of many products [e.g., Andersen and Andreasen (2001) and Longstaff, Santa-Clara and Schwartz (2001)]. Of course, characterizing the optimal exercise policy itself can be challenging, particularly in the case of mortgage-backed securities, because factors other than interest rates may influence the prepayment behavior [e.g., Stanton (1995)].

In “reduced-form” pricing models for defaultable securities [e.g., Jarrow, Lando and Turnbull (1997), Lando (1998), Madan and Unal (1998), and Duffie and Singleton (1999)], the default time is typically modeled as the exogenous arrival time of an autonomous counting process.
The claim to the recovery value of a defaultable security with maturity \(T\) is the present value of the payoff \(q_\tau = q(Y_\tau, \tau)\) (recovery upon default) at the default arrival time \(\tau\) whenever \(\tau \le T\):
\[ P\left(t; \{q(Y_\tau, \tau)\}\right) = E^Q\left[ \exp\left[-\int_t^\tau r_u\,du\right] q_\tau\,1_{\{\tau \le T\}} \,\Big|\, Y_t \right]. \tag{9} \]
This expression simplifies if \(\tau\) is the arrival time of a doubly stochastic Poisson process with state-dependent intensity \(\lambda_t = \lambda(Y_t, t)\). At date \(t\), the cumulative distribution of arrival of a stopping time before date \(s\), conditional on \(\{Y_u : t \le u \le s\}\), is \(\Pr(\tau \le s; t \mid Y_u : t \le u \le s) = 1 - \exp[-\int_t^s \lambda_u\,du]\). It follows that [see, e.g., Lando (1998)]
\[ P\left(t; \{q(Y_\tau, \tau)\}\right) = E^Q\left[ \int_t^T \exp\left[-\int_t^s r_u\,du\right] q_s\,d\Pr(\tau \le s; t \mid Y_u : t \le u \le s) \,\Big|\, Y_t \right] = E^Q\left[ \int_t^T \exp\left[-\int_t^s (r_u + \lambda_u)\,du\right] \lambda_s\,q_s\,ds \,\Big|\, Y_t \right]. \]
This pricing equation is a special case of Equation (6) with \(h_s = \lambda_s q_s\), \(g_T = 0\), and an “effective riskless rate” of \(r_s + \lambda_s\).

In “structural” pricing models of defaultable securities, the default time is typically modeled as the first passage time of firm value below some default boundary. With
a constant default boundary and exogenous firm value process [e.g., Merton (1974), Black and Cox (1976), and Longstaff and Schwartz (1995)], the pricing of the default risk amounts to the computation of the first-passage probability under the forward measure. With an endogenously determined default boundary [e.g., Leland (1994) and Leland and Toft (1996)], the probability of the first passage time and the value of the risky debt must be jointly determined.[5]
3. Dynamic term-structure models for default-free bonds

In this section we overview the pricing of default-free bonds within DTSMs. We begin with an overview of one-factor models (\(N = 1\)) and then turn to the case of multi-factor models.

3.1. One-factor dynamic term-structure models

Some of the more widely studied one-factor models are:

Nonlinear CEV Model. \(r\) follows the one-dimensional Feller (1951) process
\[ dr(t) = \left( \kappa\theta\,r(t)^{2\eta - 1} - \kappa\,r(t) \right) dt + \sigma\,r(t)^{\eta}\,dW^Q(t). \tag{10} \]
In this model, the admissible range for \(\eta\) is \([0, 1)\), and the zero boundary is entrance (cannot be reached from the interior of the state space) if \(\kappa\theta > \sigma^2/2\) (the so called “Feller condition”). Further, the distribution of \(r_t\) conditional on \(r_{t-1}\) is known to be a generalized Bessel process [Eom (1998)]. The solution to Equation (10) has \(r > 0\) for all \(\eta \in [0, 1)\), including \(\eta = 0\), so long as the Feller condition is satisfied. However, we are not aware of closed-form solutions for \(D(t, T)\) in this model outside of the case of \(\eta = .5\).

[5] Similar to an American option, the price of the risky debt can be characterized as the solution to a PDE with a “free boundary”, with the boundary conditions given by the so-called “value-matching” and the “smooth-pasting” conditions.

Square-root model [Cox, Ingersoll and Ross (1985)]. For this special case of Equation (10) with \(\eta = .5\) the discount function is given by
\[ D(t, T) = A(\tau) \exp[-B(\tau)\,r_t], \qquad T \ge t, \quad \tau \equiv (T - t), \]
where, with \(\gamma \equiv \sqrt{\kappa^2 + 2\sigma^2}\),
\[ A(\tau) = \left[ \frac{2\gamma \exp[(\kappa + \gamma)\tau/2]}{(\kappa + \gamma)(\exp[\gamma\tau] - 1) + 2\gamma} \right]^{2\kappa\theta/\sigma^2}, \qquad B(\tau) = \frac{2(\exp[\gamma\tau] - 1)}{(\kappa + \gamma)(\exp[\gamma\tau] - 1) + 2\gamma}. \]
The Green’s function for the CIR model is given by
\[ G(r_t, t; r, T) = D(t, T)\,2c\,\chi^2(2c\,r;\ \nu, \zeta), \]
where \(\chi^2(\cdot\,; \nu, \zeta)\) is the non-central chi-square density with the degrees of freedom \(\nu\) and the parameter of noncentrality \(\zeta\), defined by
\[ c = \frac{\kappa + \gamma - (\kappa - \gamma)\exp[-\gamma(T - t)]}{\sigma^2\,(1 - \exp[-\gamma(T - t)])}, \qquad \nu = \frac{4\kappa\theta}{\sigma^2}, \]
\[ \zeta = \frac{8\gamma^2 \exp[-\gamma(T - t)]\,r_t}{\sigma^2\,(1 - \exp[-\gamma(T - t)])\,[2\gamma + (\kappa - \gamma)(1 - \exp[-\gamma(T - t)])]}. \]
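The square-root discount function is simple enough to code directly. The sketch below is illustrative only: the function names and the parameter values are assumptions, and the checks rest on the standard fact that, for this model, the coefficients satisfy dB/dτ = 1 − κB − σ²B²/2 and d log A/dτ = −κθB with A(0) = 1 and B(0) = 0 (obtained by substituting the exponential-affine form into the pricing PDE).

```python
import math

def cir_coefficients(tau, kappa, theta, sigma):
    """A(tau) and B(tau) of the square-root model, with gamma = sqrt(kappa^2 + 2 sigma^2)."""
    gamma = math.sqrt(kappa ** 2 + 2.0 * sigma ** 2)
    egt = math.exp(gamma * tau)
    denom = (kappa + gamma) * (egt - 1.0) + 2.0 * gamma
    A = (2.0 * gamma * math.exp((kappa + gamma) * tau / 2.0) / denom) ** (
        2.0 * kappa * theta / sigma ** 2)
    B = 2.0 * (egt - 1.0) / denom
    return A, B

def cir_discount(r, tau, kappa, theta, sigma):
    """Zero-coupon price D(t, T) = A(tau) * exp(-B(tau) * r_t), tau = T - t."""
    A, B = cir_coefficients(tau, kappa, theta, sigma)
    return A * math.exp(-B * r)

# Illustrative risk-neutral parameters (hypothetical); Feller condition: 0.03 > 0.005.
kappa, theta, sigma = 0.5, 0.06, 0.1
price_5y = cir_discount(0.05, 5.0, kappa, theta, sigma)
```

A finite-difference check of the two ODEs, as in the accompanying test, is a cheap way to catch transcription errors in the closed-form coefficients.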
Log-normal model [Black, Derman and Toy (1990)]. As \(\eta \to 1\) in Equation (10), the process for \(r\) converges to that of a log-normal process. Though this model is widely used in the financial industry (often in this one-factor formulation with time-dependent parameters), we are not aware of closed-form solutions for discount curves in this model.

Three-halves model [Cox, Ingersoll and Ross (1980)]. \(r\) follows the process
\[ dr(t) = \kappa\,(\theta - r(t))\,r(t)\,dt + \sigma\,r(t)^{1.5}\,dW^Q(t). \tag{11} \]
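This process is stationary and zero is entrance if \(\kappa\) and \(\sigma\) are greater than 0. \(D(t, T)\) is given by [see Ahn and Gao (1999)]
\[ D(t, T) = \frac{\Gamma(\beta - \gamma)}{\Gamma(\beta)}\,M(\gamma, \beta, -x(\tau))\,x(\tau)^{\gamma}, \]
where \(\Gamma(\cdot)\) is the “gamma” function, \(M(\cdot)\) is a confluent hypergeometric function (computed through a series expansion),
\[ x(\tau) = \frac{-2b}{\sigma^2\,(\exp[b(T - t)] - 1)\,r(t)}, \]
and
\[ \gamma = \frac{1}{\sigma^2} \left[ \sqrt{(.5\sigma^2 - a)^2 + 2\sigma^2} - (.5\sigma^2 - a) \right], \qquad \beta = \frac{2}{\sigma^2} \left[ -a + (1 + \gamma)\,\sigma^2 \right]. \]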
Gaussian Model [Vasicek (1977)]. \(r\) follows the diffusion with linear drift \(\kappa(\theta - r(t))\,dt\) and constant diffusion coefficient \(\sigma\). In this case, the discount function is given by \(D(t, T) = \exp[-A(\tau) - B(\tau)\,r_t]\), \(T \ge t\), \(\tau \equiv T - t\), where
\[ A(\tau) = \left( \theta - \frac{\sigma^2}{2\kappa^2} \right) [\tau - B(\tau)] + \frac{\sigma^2}{4\kappa}\,B(\tau)^2, \qquad B(\tau) = \frac{1 - \exp[-\kappa\tau]}{\kappa}. \]
The Green’s function for this model is given by [see Jamshidian (1989)]:
\[ G(r_t, t; r_T, T) = D(t, T)\,\frac{\exp\left[ -\dfrac{(r_T - f(t, T))^2}{2\,v(t, T)} \right]}{\sqrt{2\pi\,v(t, T)}}, \]
where
\[ f(t, T) = \exp[-\kappa(T - t)]\,r_t + (1 - \exp[-\kappa(T - t)])\,\theta - \frac{\sigma^2}{2\kappa^2}\,(1 - \exp[-\kappa(T - t)])^2 \]
is the instantaneous forward rate and
\[ v(t, T) = \frac{\sigma^2}{2\kappa}\,(1 - \exp[-2\kappa(T - t)]) \]
is the conditional variance of the spot short rate.

An alternative means of constructing tractable one-factor models is to maintain a simpler representation of the state \(Y\) and to let \(r_t = g(Y_t, t)\), for some nonlinear function \(g\). For example, the one-factor Quadratic Gaussian (QG) model [see Beaglehole and Tenney (1991)] is obtained by letting \(g(Y_t, t) = \alpha + \beta Y_t + \gamma Y_t^2\) and \(Y_t\) following a Gaussian diffusion. The discount function in this model is given by
\[ D(t, T) = \exp[-A(\tau) - B(\tau)\,Y_t - C(\tau)\,Y_t^2], \]
where
\[ B(\tau) = C(\tau) \left[ \frac{2\kappa\left(\theta + \frac{\beta}{2\gamma}\right)}{\Gamma}\,\frac{\exp[\Gamma\tau] - 1}{\exp[\Gamma\tau] + 1} + \frac{\beta}{\gamma} \right], \]
\[ C(\tau) = \frac{\gamma\,(\exp[2\Gamma\tau] - 1)}{(\Gamma + \kappa)(\exp[2\Gamma\tau] - 1) + 2\Gamma}, \]
and \(\Gamma = \sqrt{\kappa^2 + 2\gamma\sigma^2}\). (\(A(\tau)\) is also known, as a relatively complicated function of the underlying parameters.)
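The one-factor QG coefficients can be sanity-checked against the scalar Riccati equations they should solve. The sketch below makes two assumptions that are not spelled out at this point in the text: that \(Y_t\) follows \(dY_t = \kappa(\theta - Y_t)\,dt + \sigma\,dW_t^Q\), and that \(B(\tau)\) is read as \(C(\tau)\) times the bracketed sum of the ratio term and \(\beta/\gamma\), the reading under which \(B(0) = 0\). Under those assumptions the ODEs are dC/dτ = γ − 2κC − 2σ²C² and dB/dτ = β − κB − 2σ²CB + 2κθC. Function names and parameter values are hypothetical.

```python
import math

def qg_coefficients(tau, kappa, theta, sigma, beta, gamma):
    """Closed-form B(tau) and C(tau) of the one-factor QG model (as read from the text)."""
    Gam = math.sqrt(kappa ** 2 + 2.0 * gamma * sigma ** 2)
    e1 = math.exp(Gam * tau)
    e2 = e1 * e1
    C = gamma * (e2 - 1.0) / ((Gam + kappa) * (e2 - 1.0) + 2.0 * Gam)
    scale = 2.0 * kappa * (theta + beta / (2.0 * gamma)) / Gam
    B = C * (scale * (e1 - 1.0) / (e1 + 1.0) + beta / gamma)
    return B, C

def riccati_residuals(tau, kappa, theta, sigma, beta, gamma, h=1e-5):
    """Central-difference residuals of the two scalar Riccati ODEs at maturity tau."""
    Bp, Cp = qg_coefficients(tau + h, kappa, theta, sigma, beta, gamma)
    Bm, Cm = qg_coefficients(tau - h, kappa, theta, sigma, beta, gamma)
    B, C = qg_coefficients(tau, kappa, theta, sigma, beta, gamma)
    dB = (Bp - Bm) / (2.0 * h)
    dC = (Cp - Cm) / (2.0 * h)
    res_C = dC - (gamma - 2.0 * kappa * C - 2.0 * sigma ** 2 * C ** 2)
    res_B = dB - (beta - kappa * B - 2.0 * sigma ** 2 * C * B + 2.0 * kappa * theta * C)
    return res_B, res_C

# Illustrative (hypothetical) parameter values.
params = dict(kappa=0.3, theta=0.1, sigma=0.05, beta=0.2, gamma=1.0)
res_B, res_C = riccati_residuals(2.0, **params)
```

Both residuals should vanish up to finite-difference error, and both coefficients should be zero at τ = 0.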
Although the QG model is driven by one risk factor, it can be viewed equivalently as a degenerate two-factor model (with two state variables driven by the same Brownian motion). To see this, note that, with \(dY_t = \kappa(\theta - Y_t)\,dt + \sigma\,dW_t^Q\), Ito’s lemma implies
\[ dr_t = \left[ 2\kappa\alpha + \kappa\theta\beta + \gamma\sigma^2 + \kappa(\beta + 2\theta\gamma)\,Y_t - 2\kappa\,r_t \right] dt + (\beta + 2\gamma Y_t)\,\sigma\,dW_t^Q. \]
Using the fact that \(r\) is affine in \(Y\) and \(Y^2\), we see that the instantaneous conditional means and covariance of \(r_t\) and \(Y_t\) are affine in \((r_t, Y_t)\).

One can build up multi-factor DTSMs from these one-factor examples by simply assuming that the short rate is the sum of \(N\) independent risk factors, \(r_t = \sum_{i=1}^{N} Y_t^i\), with each \(Y^i\) following one of the preceding one-factor models for which a solution for zero prices is known [see Cox, Ingersoll and Ross (1985), Chen and Scott (1993), Pearson and Sun (1994), Duffie and Singleton (1997) and Jagannathan, Kaplan and Sun (2001) for multi-factor versions of the square-root model]. In this case, the discount function is given by \(D(t, T) = \prod_{i=1}^{N} D(t, T)_i\), where \(D(t, T)_i\) is the discount function in a single-factor model with the short rate given by \(r_t = Y_t^i\). This approach leads, however, to rather restrictive formulations of multi-factor models, particularly with regard to the assumption of zero correlations among the risk factors. We turn next to multi-factor models with correlated risk factors.

3.2. Multi-factor dynamic term-structure models

A quite general formulation of multi-factor models has \(r_t = g(Y_t, t)\), where \(Y_t = (Y_t^i : 1 \le i \le N)\) and these risk factors may be mutually correlated. Specifications of the function \(g(\cdot, t)\) and the dynamics of \(Y_t\) are constrained only by the so-called admissibility conditions which stipulate that (i) \(Y_t\) must be a well-defined stochastic process; and (ii) the conditional expectation in Equation (3) exists and is finite (equivalently, the PDE (4) has a well-defined and finite solution). In practice, however, model specifications are often influenced by their computational tractability in pricing FIS.
Two classes of diffusion-based multi-factor models have been the focal points of much of the literature on pricing default-free bonds: affine models [see, e.g., Duffie and Kan (1996) and Dai and Singleton (2000)] and quadratic Gaussian models [see, e.g., Beaglehole and Tenney (1991), Ahn, Dittmar and Gallant (2002) and Leippold and Wu (2001)]. These models have the common feature that the discount function has the exponential form:
\[ D(t, T) = \exp[-G(Y_t, t; T)], \qquad T \ge t, \tag{12} \]
where \(G(Y_T, T; T) = 0\) for all \(Y_T\), and \(\lim_{T \to t} G_T(Y_t, t; T)\) exists and is finite for all \(Y_t\) and \(t \le T\). Heuristically, the affine and quadratic Gaussian models are “derived” from the requirement that \(G(Y, \cdot\,; \cdot)\) be, respectively, an affine and quadratic function of the state vector \(Y\). Naturally, such a requirement restricts the functional form of \(g(Y_t, t)\) and the \(Q\)-dynamics of \(Y_t\). By definition,
\[ r_t = -\lim_{T \to t} \frac{\log D(t, T)}{T - t}, \]
from which it follows that
\[ r_t = g(Y_t, t) = \lim_{T \to t} \frac{G(Y_t, t; T)}{T - t}. \tag{13} \]
Thus, \(r_t\) must be affine in \(Y\) in affine models and quadratic in \(Y\) in quadratic Gaussian models. Furthermore, substituting Equation (12) into Equation (4) yields
\[ G_t + \mu(Y, t)^\top G_Y + \tfrac{1}{2}\,\mathrm{Trace}\left[ \sigma(Y, t)\,\sigma(Y, t)^\top \left( G_{YY} - G_Y G_Y^\top \right) \right] + g(Y, t) = 0, \tag{14} \]
which may be viewed as a restriction on the risk-neutral drift \(\mu(Y, t)\) and diffusion \(\sigma(Y, t)\) for the state vector \(Y\).

3.2.1. Affine models

Affine term-structure models are characterized by the requirement that \(G(Y_t, t; T)\) be affine in \(Y_t\); i.e., \(G(Y_t, t; T) = A(T - t) + B(T - t)^\top Y_t\). In this case, \(r_t\) must also be affine in \(Y_t\): \(r_t = \alpha + \beta^\top Y_t\), where \(\alpha = A'(0)\) and \(\beta = B'(0)\). Furthermore, Equation (14) reduces to
\[ \dot{A}(\tau) + \dot{B}(\tau)^\top Y_t = \mu(Y_t, t)^\top B(\tau) - \tfrac{1}{2}\,B(\tau)^\top \sigma(Y_t, t)\,\sigma(Y_t, t)^\top B(\tau) + (\alpha + \beta^\top Y_t), \tag{15} \]
where \(\tau \equiv T - t\),
\[ \dot{A}(\tau) = \frac{\partial A(\tau)}{\partial \tau} = -\frac{\partial A(T - t)}{\partial t}, \]
and \(\dot{B}(\tau)\) is similarly defined. Duffie and Kan (1996) show that, in order for Equation (15) to hold for any \(Y_t\), it is sufficient that[6]
(1) \(\mu(Y_t, t)\) be affine in \(Y_t\): \(\mu(Y_t, t) = a + bY_t\), where \(a\) is an \(N \times 1\) vector and \(b\) is an \(N \times N\) matrix;
(2) \(\sigma(Y_t, t)\,\sigma(Y_t, t)^\top\) be affine in \(Y_t\): \(\sigma(Y_t, t)\,\sigma(Y_t, t)^\top = h_0 + \sum_{j=1}^{N} h_1^j\,Y_t^j\), where \(h_0\) and \(h_1^j\), \(j = 1, 2, \ldots, N\), are \(N \times N\) matrices;
(3) \(A(\tau)\) and \(B(\tau)\) satisfy the following ODEs:
\[ \dot{A} = \alpha + a^\top B(\tau) - \tfrac{1}{2}\,B(\tau)^\top h_0\,B(\tau), \tag{16} \]
\[ \dot{B} = \beta + b^\top B(\tau) - \tfrac{1}{2}\,v(\tau), \tag{17} \]
where \(v_j(\tau) \equiv B(\tau)^\top h_1^j\,B(\tau)\).

[6] A more general mathematical characterization of affine models is presented in Duffie, Filipovic and Schachermayer (2001). See Gourieroux, Monfort and Polimenis (2002) for a formal development of multi-factor affine models in discrete time.

For suitable choices of \((\alpha, \beta; a, b; h_0, h_1^j : 1 \le j \le N)\), the Riccati equations (16)–(17) admit a unique solution \((A(\tau), B(\tau))\) under the initial conditions \(A(0) = 0\) and \(B(0) = 0_{N \times 1}\). It is easy to verify that the solution has the property \(A'(0) = \alpha\) and \(B'(0) = \beta\).

Dai and Singleton (2000) examine multi-factor affine models with the following structure:
\[ r_t = \delta_0 + \delta^\top Y_t, \]
\[ dY_t = \mathcal{K}\,(\Theta - Y_t)\,dt + \Sigma \sqrt{S_t}\,dW_t^Q, \]
where \(S_t\) is a diagonal matrix with \([S_t]_{ii} = \alpha_i + Y_t^\top \beta_i\). Letting \(\mathcal{B}\) be the \(N \times N\) matrix with \(i\)th column given by \(\beta_i\), they construct admissible affine models (models that give unique, well-defined solutions for \(D(t, T)\)) by restricting the parameter vector \((\delta_0, \delta, \mathcal{K}, \Theta, \Sigma, \alpha, \mathcal{B})\). Specifically, for given \(m = \mathrm{rank}(\mathcal{B})\), they introduce the canonical model with the structure
\[ \mathcal{K} = \begin{pmatrix} \mathcal{K}^{BB}_{m \times m} & 0_{m \times (N - m)} \\ \mathcal{K}^{DB}_{(N - m) \times m} & \mathcal{K}^{DD}_{(N - m) \times (N - m)} \end{pmatrix}, \]
for \(m > 0\), and \(\mathcal{K}\) is either upper or lower triangular for \(m = 0\),
\[ \Theta = \begin{pmatrix} \Theta^{B}_{m \times 1} \\ 0_{(N - m) \times 1} \end{pmatrix}, \qquad \Sigma = I, \]
\[ \alpha = \begin{pmatrix} 0_{m \times 1} \\ 1_{(N - m) \times 1} \end{pmatrix}, \qquad \mathcal{B} = \begin{pmatrix} I_{m \times m} & \mathcal{B}^{BD}_{m \times (N - m)} \\ 0_{(N - m) \times m} & 0_{(N - m) \times (N - m)} \end{pmatrix}, \]
with the following parametric restrictions imposed:
\[ \delta_i \ge 0, \quad m + 1 \le i \le N, \]
\[ \mathcal{K}_i \Theta \equiv \sum_{j=1}^{m} \mathcal{K}_{ij}\,\Theta_j > 0, \quad 1 \le i \le m, \]
\[ \mathcal{K}_{ij} \le 0, \quad 1 \le j \le m, \quad j \ne i, \]
\[ \Theta_i \ge 0, \quad 1 \le i \le m, \]
\[ \mathcal{B}_{ij} \ge 0, \quad 1 \le i \le m, \quad m + 1 \le j \le N. \]
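The parametric restrictions on the canonical model can be checked mechanically for a candidate parameter set. The following is a minimal sketch, not the authors' procedure: the function name, the list-of-lists matrix representation, and the example \(A_1(2)\) parameter values are all hypothetical, and the off-diagonal restriction on \(\mathcal{K}\) is assumed to apply to rows \(1 \le i \le m\).

```python
def satisfies_canonical_restrictions(m, delta, K, Theta, B_cal):
    """Check the sign restrictions of the canonical A_m(N) model.

    m      : number of factors driving conditional variances
    delta  : list of N loadings of the short rate on Y
    K      : N x N mean-reversion matrix (list of rows)
    Theta  : list of N long-run means
    B_cal  : N x N matrix whose i-th column is beta_i (list of rows)
    Indices in the text are 1-based; Python indices here are 0-based.
    """
    N = len(delta)
    # delta_i >= 0 for m+1 <= i <= N
    ok_delta = all(delta[i] >= 0 for i in range(m, N))
    # K_i Theta = sum_{j<=m} K_ij Theta_j > 0 for 1 <= i <= m
    ok_drift = all(sum(K[i][j] * Theta[j] for j in range(m)) > 0 for i in range(m))
    # K_ij <= 0 for 1 <= j <= m, j != i (rows 1..m assumed)
    ok_offdiag = all(K[i][j] <= 0 for i in range(m) for j in range(m) if j != i)
    # Theta_i >= 0 for 1 <= i <= m
    ok_theta = all(Theta[i] >= 0 for i in range(m))
    # B_ij >= 0 for 1 <= i <= m, m+1 <= j <= N
    ok_B = all(B_cal[i][j] >= 0 for i in range(m) for j in range(m, N))
    return ok_delta and ok_drift and ok_offdiag and ok_theta and ok_B

# Hypothetical A_1(2) example: one volatility factor, two factors in total.
K = [[0.5, 0.0], [-0.1, 0.3]]
B_cal = [[1.0, 0.2], [0.0, 0.0]]
admissible = satisfies_canonical_restrictions(1, [1.0, 1.0], K, [0.04, 0.0], B_cal)
inadmissible = satisfies_canonical_restrictions(1, [1.0, 1.0], K, [-0.04, 0.0], B_cal)
```

The check covers only the sign restrictions; the zero and identity blocks of the canonical structure would need to be verified separately.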
Then the sub-family \(A_m(N)\) of affine term-structure models is obtained by inclusion of all models that are invariant transformations of this canonical model or nested special cases of such transformed models. For the case of \(N\) risk factors, this gives \(N + 1\) non-nested sub-families of admissible affine models.[7] Members of the families \(A_m(N)\) include, among others, Vasicek (1977), Langetieg (1980), Cox, Ingersoll and Ross (1985), Longstaff and Schwartz (1992), Chen and Scott (1993), Pearson and Sun (1994), Duffie and Singleton (1997), Balduzzi, Das, Foresi and Sundaram (1996), Balduzzi, Das and Foresi (1998), Duffie and Liu (2001) and Collin-Dufresne and Goldstein (2001a).

3.2.2. Quadratic Gaussian models

If \(G(Y_t, t; T)\) is quadratic in \(Y_t\), i.e., \(G(Y_t, t; T) = A(T - t) + B(T - t)^\top Y_t + Y_t^\top C(T - t)\,Y_t\), then it must be the case that \(r_t = \alpha + \beta^\top Y_t + Y_t^\top \gamma\,Y_t\), where \(\alpha = A'(0)\), \(\beta = B'(0)\), and \(\gamma = C'(0)\). Without loss of generality, we can assume that \(C(\tau)\) is symmetric. Thus \(\gamma\) must also be symmetric and Equation (14) becomes
\[ \dot{A} + Y_t^\top \dot{B} + Y_t^\top \dot{C}\,Y_t = \alpha + Y_t^\top \beta + Y_t^\top \gamma\,Y_t + \mu(Y_t, t)^\top [B(\tau) + 2CY_t] + \mathrm{Trace}\left[ \sigma(Y_t, t)^\top C\,\sigma(Y_t, t) \right] - \tfrac{1}{2}\,[B + 2CY_t]^\top \sigma(Y_t, t)\,\sigma(Y_t, t)^\top [B + 2CY_t]. \tag{18} \]
In order for Equation (18) to hold for any \(Y_t\), it is sufficient that
(1) \(\mu(Y_t, t) = a + bY_t\), where the \(N \times 1\) vector \(a\) and \(N \times N\) matrix \(b\) are constants;
(2) \(\sigma(Y_t, t) = \sigma\), where \(\sigma\) is an \(N \times N\) constant matrix;
(3) \(A(\tau)\), \(B(\tau)\) and \(C(\tau)\) satisfy the following ODEs:
\[ \dot{A} = \alpha + a^\top B - \tfrac{1}{2}\,B^\top \sigma\sigma^\top B + \mathrm{Trace}\left[ \sigma^\top C\,\sigma \right], \tag{19} \]
\[ \dot{B} = \beta + b^\top B - 2C\,\sigma\sigma^\top B + 2Ca, \tag{20} \]
\[ \dot{C} = \gamma + b^\top C + Cb - 2C\,\sigma\sigma^\top C. \tag{21} \]
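When closed forms are unavailable or unwieldy, ODE systems of this kind can simply be integrated numerically from the zero initial conditions. The sketch below does this for the scalar (\(N = 1\)) case of Equations (19)–(21) with a hand-written fourth-order Runge–Kutta scheme; the function names and parameter values are hypothetical, and with \(b = -\kappa\) the numerical \(C(\tau)\) is compared in the test against the one-factor closed form \(\gamma(e^{2\Gamma\tau} - 1)/[(\Gamma + \kappa)(e^{2\Gamma\tau} - 1) + 2\Gamma]\), \(\Gamma = \sqrt{\kappa^2 + 2\gamma\sigma^2}\).

```python
import math

def qg_rhs(state, alpha, beta, gamma, a, b, sigma):
    """Right-hand sides of the scalar versions of ODEs (19)-(21)."""
    A, B, C = state
    s2 = sigma * sigma
    dA = alpha + a * B - 0.5 * s2 * B * B + s2 * C   # Trace[sigma' C sigma] = sigma^2 C
    dB = beta + b * B - 2.0 * s2 * C * B + 2.0 * a * C
    dC = gamma + 2.0 * b * C - 2.0 * s2 * C * C      # b'C + Cb = 2bC in the scalar case
    return [dA, dB, dC]

def solve_qg(tau, n_steps, **p):
    """Integrate (A, B, C) from A(0) = B(0) = C(0) = 0 to maturity tau with classical RK4."""
    h = tau / n_steps
    y = [0.0, 0.0, 0.0]
    for _ in range(n_steps):
        k1 = qg_rhs(y, **p)
        k2 = qg_rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k1)], **p)
        k3 = qg_rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k2)], **p)
        k4 = qg_rhs([yi + h * ki for yi, ki in zip(y, k3)], **p)
        y = [yi + h / 6.0 * (c1 + 2.0 * c2 + 2.0 * c3 + c4)
             for yi, c1, c2, c3, c4 in zip(y, k1, k2, k3, k4)]
    return y

def qg_discount(Y, tau, n_steps=2000, **p):
    """Zero-coupon price exp[-A - B*Y - C*Y^2] for a scalar state Y."""
    A, B, C = solve_qg(tau, n_steps, **p)
    return math.exp(-A - B * Y - C * Y * Y)

# Hypothetical parameters: a = kappa * theta and b = -kappa with kappa = 0.3, theta = 0.1.
p = dict(alpha=0.01, beta=0.0, gamma=1.0, a=0.03, b=-0.3, sigma=0.05)
A5, B5, C5 = solve_qg(5.0, 2000, **p)
price = qg_discount(0.1, 5.0, **p)
```

The same loop extends to the matrix case by replacing the scalar products with matrix products, at the cost of a linear-algebra dependency.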
For suitable choices of \((\alpha, \beta, \gamma; a, b; \sigma)\), the Riccati equations (19)–(21) admit a unique solution \((A(\tau), B(\tau), C(\tau))\) under the initial conditions \(A(0) = 0\), \(B(0) = 0_{N \times 1}\) and \(C(0) = 0_{N \times N}\). It is easy to verify that the solution has the property that \(A'(0) = \alpha\), \(B'(0) = \beta\) and \(C'(0) = \gamma\).

The canonical representation of the QG models is simpler than in the case of affine models, because shocks to \(Y\) are homoskedastic. To derive their canonical model, Ahn, Dittmar and Gallant (2002) normalize the diagonal elements of \(\gamma\) to unity, set \(\beta = 0\), have \(\mathcal{K}\) (the mean reversion matrix for \(Y\)) being lower triangular, and have \(\Sigma\) diagonal. They show that the QG models in Longstaff (1989), Constantinides (1992) and Lu (1999) are restricted special cases of their most flexible canonical model.

[7] Although the classification scheme was originally used by Dai and Singleton (2000) to characterize the state dynamics under the actual measure, it is equally applicable to the state dynamics under the \(Q\)-measure. Note that not all admissible affine DTSMs are subsumed by this classification scheme (i.e., not all admissible models are invariant transformations of a canonical model).
4. Dynamic term-structure models with jump diffusions

Suppose that \(r_t = r(Y_t, t)\) is a function of a jump-diffusion process \(Y\) with risk-neutral dynamics
\[ dY_t = \mu(Y_t, t)\,dt + \sigma(Y_t, t)\,dW_t^Q + \Delta Y_t\,dZ_t, \tag{22} \]
where \(Z_t\) is a Poisson counter with risk-neutral intensity \(\lambda_t\), and the jump size \(\Delta Y_t\) is drawn from a risk-neutral distribution \(\nu_t(x) \equiv \nu(x; Y_t, t)\). No arbitrage implies that the zero-coupon bond price \(D(t, T)\) satisfies
\[ \left[ \frac{\partial}{\partial t} + \mathcal{A} \right] D(t, T) - r(Y, t)\,D(t, T) = 0, \tag{23} \]
where \(\mathcal{A}\) is the risk-neutral infinitesimal generator defined by
\[ \mathcal{A} f(Y, t) = \mu_t^\top f_Y + \tfrac{1}{2}\,\mathrm{Trace}\left[ \sigma_t \sigma_t^\top f_{YY} \right] + \lambda_t \int [f(Y + x, t) - f(Y, t)]\,d\nu_t(x), \]
for any test function \(f(Y_t, t)\). If \(f(Y, t)\) is exponential in \(Y\) (i.e., \(f(Y, t) = \exp[A(t) + B(t)^\top Y]\)) and \(\nu(x; Y, t)\) is independent of \(Y\), then
\[ \int [f(Y_t + x, t) - f(Y_t, t)]\,d\nu_t(x) = [C(B(t), t) - 1]\,f(Y_t, t), \]
where \(C(u, t) = \int \exp[u^\top x]\,d\nu_t(x)\) is the Laplace transform of the jump distribution. This observation underlies many of the analytic pricing relations that have been derived.

Specifically, for the case of affine jump-diffusions, analytic expressions for zero-coupon bond prices are obtained under the following assumptions: \(r_t\), \(\mu_t\), \(\sigma_t \sigma_t^\top\), and \(\lambda_t\) are affine in \(Y_t\), and the Laplace transform of the distribution \(\nu_t(x)\), \(\theta(u, t) = \int \exp[u^\top x]\,d\nu_t(x)\), depends at most on \(u\) and \(t\). With these assumptions, \(D(t, T) = \exp[A(t, T) + B(t, T)^\top Y_t]\), with the coefficients \(A(t, T)\) and \(B(t, T)\) again determined by a set of ODEs [Duffie, Pan and Singleton (2000)]. Ahn and Thompson (1988) extend the equilibrium framework of Cox, Ingersoll and Ross (1985) to the case of \(Y\) following a square-root process with jumps. Brito and Flores (2001) develop an affine jump-diffusion model, and Piazzesi (2001) develops a mixed affine-QG model, in which the jumps are linked to the resetting of target interest rates by the Federal Reserve.
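The reason exponential test functions are so convenient is that the jump integral factors into the transform of the jump-size distribution (less one, because \(\nu_t\) integrates to one) times \(f\) itself. The sketch below illustrates this for a scalar state and a hypothetical normal jump-size distribution \(N(m, s^2)\), whose transform \(\exp[um + u^2 s^2/2]\) is standard; the function names, the quadrature settings, and all numbers are illustrative assumptions.

```python
import math

def normal_jump_transform(u, mean, std):
    """Integral of exp(u*x) against a N(mean, std^2) jump-size distribution."""
    return math.exp(u * mean + 0.5 * (u * std) ** 2)

def jump_integral(A, B, Y, mean, std, n=20001, width=10.0):
    """Trapezoid approximation of the integral of [f(Y + x) - f(Y)] dnu(x), f(y) = exp(A + B*y)."""
    f = lambda y: math.exp(A + B * y)
    lo, hi = mean - width * std, mean + width * std
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        pdf = math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n - 1) else 1.0   # trapezoid end-point weights
        total += weight * (f(Y + x) - f(Y)) * pdf * h
    return total

# Hypothetical values: bond-price-like loading B < 0, small positive mean jump.
A, B, Y, mean, std = 0.0, -2.0, 0.05, 0.01, 0.02
numerical = jump_integral(A, B, Y, mean, std)
analytic = (normal_jump_transform(B, mean, std) - 1.0) * math.exp(A + B * Y)
```

With the transform available in closed form, the jump term contributes only an extra algebraic term to the ODEs for the coefficients, which is what keeps affine jump-diffusion models tractable.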
5. Dynamic term-structure models with regime shifts

The “regime switching” framework was introduced by Hamilton (1989) to model business-cycle fluctuations in real variables, and was subsequently adapted by Gray (1996) to model short-term interest rates with state-dependent regime switching probabilities. Only recently has Hamilton’s framework been extended to bond pricing [see, e.g., Naik and Lee (1997), Evans (2000), Landen (2000) and Bansal and Zhou (2002)].

Following Dai and Singleton (2003), we present a continuous-time formulation of fixed-income pricing with regime shifts. The evolution of “regimes” is governed by an $(S+1)$-state continuous-time conditionally Markov chain $s_t : \Omega \to \{0, 1, \ldots, S\}$ with an $(S+1) \times (S+1)$ rate or generator matrix $R_t$ with the property that all rows sum to zero.$^{8}$ Intuitively, $R^{ij}_t\,dt$, $i \ne j$, which may be state-dependent (i.e., $R_t = R(Y_t, t)$), represents the probability of moving from regime $i$ to regime $j$ over the next interval $dt$, and $1 + R^{ii}_t\,dt$ is the probability of staying in regime $i$ in the next interval $dt$. The relation between $r$ and $Y$ may be indexed by regime in that $r_t \equiv r(s_t; Y_t, t)$. Additionally, the state vector $Y_t$ is a solution to

$$dY_t = \mu^j(Y_t, t)\,dt + \sigma^j(Y_t, t)\,dW^Q_t,$$

with the conditional moments of $Y_t$ indexed by the regime $j$. Though these moments may change across regimes, the sample path of $Y$ remains continuous. For simplicity, we assume that regime shifts and Brownian shocks are mutually independent.

To facilitate analytical development of bond pricing under regime switching, we introduce $(S+1)$ regime indicator functions: $z^j_t = 1_{\{s_t = j\}}$, $j = 0, 1, \ldots, S$. Clearly, $E[dz^j_t \mid s_t, Y_t] = R^j_t\,dt$; therefore $m^j_t \equiv z^j_t - \int_0^t R^j_u\,du$ is a martingale. A useful property of these random variables is that any regime-dependent variable $F(s_t; Y_t, t)$ can be written as

$$F(s_t; Y_t, t) \equiv \sum_{j=0}^{S} z^j_t\, F^j(Y_t, t), \tag{24}$$

where $F^j(Y_t, t) \equiv F(s_t = j; Y_t, t)$. Conversely, given a set of $(S+1)$ functions $F^j(Y_t, t)$, $j = 0, 1, \ldots, S$, a regime-dependent random variable $F(s_t; Y_t, t)$ can be defined through Equation (24). In particular, each column of the matrix $R_t$ defines a regime-dependent random variable $R^j_t \equiv R^j(s_t; Y_t, t) = \sum_{i=0}^{S} z^i_t R^{ij}_t$, $j = 0, 1, \ldots, S$. Furthermore, the drift and diffusion functions of the state vector under the $(S+1)$ different regimes can be
8 See Bielecki and Rutkowski (2001) for a formal definition of a “conditionally Markov chain”, where $R_t$ is also referred to as the “conditional infinitesimal generator” of $s_t$ under a proper extension of the probability measure $Q$, given the $\sigma$-field $\mathcal{F}_t$.
represented by two regime-dependent random variables: $\mu(s_t; Y_t, t) \equiv \sum_{j=0}^{S} z^j_t\, \mu^j(Y_t, t)$ and $\sigma(s_t; Y_t, t) \equiv \sum_{j=0}^{S} z^j_t\, \sigma^j(Y_t, t)$.

Writing $D(t,T) \equiv \sum_{j=0}^{S} z^j_t D^j(t,T)$, where $D^j(t,T) \equiv D(s_t = j; Y_t, t; T)$, Ito’s lemma implies that

$$\frac{dD(t,T)}{D(t,T)} = \mu^D_{t,T}\,dt + \sigma^{D\prime}_{t,T}\,dW^Q_t - \sum_{j=0}^{S}\left[1 - \frac{D^j(t,T)}{D(t,T)}\right]\left(dz^j_t - R^j_t\,dt\right),$$

$$\mu^D_{t,T} = \frac{1}{D(t,T)}\left[\frac{\partial}{\partial t} + \mathcal{A}\right]D(t,T) + \sum_{j=0}^{S} R^j_t\,\frac{D^j(t,T)}{D(t,T)},$$

$$\sigma^D_{t,T} = \sigma(s_t; Y_t, t)'\,\frac{\partial}{\partial Y_t}\log D(t,T), \qquad \mathcal{A} = \sum_{j=0}^{S} z^j_t\,\mathcal{A}^j,$$

$$\mathcal{A}^j = \mu^j(Y_t,t)'\frac{\partial}{\partial Y} + \tfrac{1}{2}\,\mathrm{Trace}\left[\sigma^j(Y_t,t)\,\sigma^j(Y_t,t)'\,\frac{\partial^2}{\partial Y\,\partial Y'}\right], \qquad 0 \le j \le S.$$

No arbitrage requires that $\mu^D_{t,T} = r_t$ for all $0 \le s_t \le S$ and all $Y_t = Y$ in the admissible state space. This implies $(S+1)$ partial differential equations:

$$\left[\frac{\partial}{\partial t} + \mathcal{A}^i\right]D^i(t,T) + \sum_{j=0}^{S} R^{ij}_t D^j(t,T) - r^i_t D^i(t,T) = 0, \qquad 0 \le i \le S,$$

where $r^i_t \equiv r(s_t = i; Y_t, t)$, $0 \le i \le S$. In general, the matrix $R_t$ is not diagonal. Therefore the above PDEs are coupled and the $(S+1)$ functions $(D^i(t,T): 0 \le i \le S)$ must be solved jointly. The boundary condition is $D(T,T) = 1$ for all $s_T$, which is equivalent to $(S+1)$ boundary conditions: $(D^i(T,T) = 1: 0 \le i \le S)$.

Dai and Singleton (2003) derive a closed-form solution for $D(t,T)$ in this framework under two additional assumptions. First, under $Q$, the state dynamics for each regime $i$ are described by

$$r^i_t \equiv r(s_t = i; Y_t, t) = \delta^i_0 + Y_t'\delta_Y,$$
$$\mu^i_t \equiv \mu(s_t = i; Y_t, t) = \kappa(\theta^i - Y_t),$$
$$\sigma^i_t \equiv \sigma(s_t = i; Y_t, t) = \mathrm{diag}\left(\sqrt{\alpha^i_k + Y_t'\beta_k}\right)_{k = 1, 2, \ldots, N},$$

with regime dependence entering through the scalar constants $\delta^i_0$ and $\alpha^i_k$ and the $N \times 1$ constant vectors $\theta^i$ (the $N \times N$ constant matrix $\kappa$ and the $N \times 1$ vectors $\delta_Y$ and $\beta_k$ are regime independent). Second, the risk-neutral rate matrix $R_t$ is state-independent. Under these assumptions, the discount functions are given by

$$D(i; t, T) = \exp\left[-A^i(T-t) - Y_t' B(T-t)\right], \qquad 0 \le i \le S,$$
where Ai (·) and B(·) are explicitly known up to a set of ODEs. Regime-dependence of bond prices is captured through the “intercept” term Ai (T − t); the derivative of zero-coupon bond yields with respect to Y does not depend on the regime.
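The coupling of the regime-specific discount functions through $R_t$ can be seen in a deliberately stripped-down sketch. It assumes, purely for illustration, that there is no diffusive state $Y$, that the short rate is a constant $r^i$ within regime $i$, and that the rate matrix $R$ is constant. The $(S+1)$ PDEs then collapse to the linear ODE system $dD/d\tau = (R - \mathrm{diag}(r))\,D$ with $D(0) = 1$ and $\tau = T - t$, which the code integrates by classical Runge-Kutta. The function names are hypothetical and not part of any model in the text.

```python
import math

def mat_vec(M, v):
    """Matrix-vector product for small dense matrices stored as lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def regime_bond_prices(R, r, tau, n_steps=2000):
    """Solve dD/dtau = (R - diag(r)) D with D(0) = 1 by classical RK4.

    R : generator matrix of the regime chain (rows must sum to zero)
    r : short rate in each regime (assumed constant within a regime)
    """
    n = len(r)
    for row in R:
        assert abs(sum(row)) < 1e-12, "rows of a generator matrix must sum to zero"
    M = [[R[i][j] - (r[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    h = tau / n_steps
    D = [1.0] * n
    for _ in range(n_steps):
        k1 = mat_vec(M, D)
        k2 = mat_vec(M, [d + 0.5 * h * k for d, k in zip(D, k1)])
        k3 = mat_vec(M, [d + 0.5 * h * k for d, k in zip(D, k2)])
        k4 = mat_vec(M, [d + h * k for d, k in zip(D, k3)])
        D = [d + h * (a + 2 * b + 2 * c + e) / 6.0
             for d, a, b, c, e in zip(D, k1, k2, k3, k4)]
    return D
```

With no switching, each price reduces to $\exp(-r^i\tau)$; with switching between a low-rate and a high-rate regime, both prices lie strictly between the two no-switching prices, with the low-rate regime commanding the higher price.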
The one-factor, two-regime model developed in Naik and Lee (1997), with $\beta^j_k = 0$ (for $k = 1$ and $j = 0, 1$), is a special case. Evans (2000) and Bansal and Zhou (2002) develop discrete-time, regime-switching models, with regime-dependent $\kappa^i$ and $\beta^i_t$. The continuous-time limits of their models are special cases of the above general pricing framework.
6. Dynamic term-structure models with rating migrations

With some technically minor, but conceptually important, modifications, the framework developed in the last section can be adapted to model defaultable term structures with rating migrations. To illustrate this, we consider the case of a single economic regime and $S + 1$ credit rating classes. The rating history of a defaultable bond is represented by a conditionally Markov chain $s_t$ taking values in the set of rating classes $\{0, 1, 2, \ldots, S\}$, with risk-neutral rate matrix $R_t$. The mathematical constructions of $s_t$ and $R_t$ are exactly the same as in Section 5. Without loss of generality, we will designate $S$ as the default state. As usual, we will assume that the default state is absorbing, so that $R^{Sj}_t = 0$, $0 \le j \le S$. Letting, for each rating class $j$, $(B^j(t,T): T \ge t)$ denote the rating-specific discount function at time $t$, the price of a defaultable zero-coupon bond can be expressed as

$$B(t,T) \equiv B(s_t; Y_t, t; T) \equiv \sum_{j=0}^{S-1} z^j_t B^j(t,T) + z^S_t B^S(s_{t-}; Y_t, t; T),$$

where $z^j_t \equiv 1_{\{s_t = j\}}$ is now interpreted as the rating indicator (at time $t$, the bond is in the rating class $j$ if and only if $z^j_t = 1$). The bond price in the default state is treated separately in order to account for recovery.

The nature of the defaultable bond pricing relations depends on the nature of the recovery assumption. We begin by developing pricing relations under the assumption of fractional recovery of market value, proposed by Duffie and Singleton (1999), followed by a parallel development based on the assumption of fractional recovery of face value, proposed in various forms by Jarrow, Lando and Turnbull (1997), Duffie (1998) and Bielecki and Rutkowski (2000).

6.1. Fractional recovery of market value

If, in the event of default, a fraction $1 - \ell(k; Y_t, t)$ of the pre-default market value of the bond in rating class $k$ is recovered, then

$$B^S(s_{t-}; Y_t, t; T) = \left[1 - \ell(s_{t-}; Y_t, t)\right] B(s_{t-}; Y_t, t; T). \tag{25}$$

We assume that while $\ell(k; Y_t, t)$, the loss rate, may be state-dependent, it does not depend explicitly on the pre-default bond price. To characterize the defaultable
discount functions $(B^j(t,T): 0 \le j \le S-1,\ T \ge t)$, consider a defaultable bond rated $i \ne S$ at time $t$, with price $B^i(t,T)$. In the next instant, $t + dt$, the rating may change to $j$ with probability $p^{ij}_t$, where $p^{ij}_t = R^{ij}_t\,dt$ for $0 \le j \ne i \le S$ and $p^{ii}_t = 1 + R^{ii}_t\,dt$. The risk-neutral instantaneous expected return on the bond is therefore given by

$$\begin{aligned}
\mu^B(i; Y_t, t; T) &= \lim_{dt \to 0} \frac{1}{B^i(t,T)\,dt}\left[\sum_{j=0}^{S-1}\left(B^j(t+dt,T) - B^i(t,T)\right)p^{ij}_t + \left(B^S(i; Y_t, t; T) - B^i(t,T)\right)p^{iS}_t\right] \\
&= \frac{1}{B^i(t,T)}\left\{\left[\frac{\partial}{\partial t} + \mathcal{A}\right]B^i(t,T) + \sum_{j=0}^{S-1}\tilde R^{ij}_t B^j(t,T)\right\},
\end{aligned}$$

where $\tilde R^{ij}_t = R^{ij}_t$ for $j \ne i$, $\tilde R^{ii}_t = -\sum_{j \ne i}^{S-1} R^{ij}_t - R^{iS}_t\,\ell^i_t$, $\ell^i_t \equiv \ell(s_{t-} = i; Y_t, t)$, and $\mathcal{A}$ is the infinitesimal generator for the state vector $Y_t$. No arbitrage requires that $\mu^B(i; Y_t, t; T) = r_t$ for all $0 \le i \le S-1$. Thus, the defaultable discount functions are jointly determined by the PDEs:

$$\left[\frac{\partial}{\partial t} + \mathcal{A}\right]B^i(t,T) + \sum_{j=0}^{S-1}\tilde R^{ij}_t B^j(t,T) - r_t B^i(t,T) = 0, \qquad 0 \le i \le S-1. \tag{26}$$
The matrix $\tilde R_t$ in Equation (26) has an intuitive interpretation:$^{9}$ it is obtained from the risk-neutral rate matrix $R_t$ by shifting a portion, $1 - \ell_t$, of the risk-neutral default intensity to the risk-neutral “no-transition” intensity, $\tilde R^{ii}_t = R^{ii}_t + R^{iS}_t(1 - \ell^i_t)$. The “thinning” of the default intensity, with compensated adjustment to the “no-transition” intensity, captures the effect of default recovery. Letting $B(t,T) = \{B^i(t,T)\}_{i=0}^{S-1}$ denote the $(S \times 1)$ vector of defaultable discount factors, Equation (26) implies that

$$B(t,T) = E^Q_t\left[\int_t^T \left(\tilde R_u - r_u\right) B(u,T)\,du + B(T,T)\right],$$

the solution of which, when it exists, is given by

$$B(t,T) = E^Q_t\left[\exp\left[-\int_t^T r_u\,du\right]\tilde\Psi(t,T)\,B(T,T)\right], \tag{27}$$

where $\tilde\Psi(t,T)$ solves the backward differential equation

$$d\tilde\Psi(t,T) = -\tilde R_t\,\tilde\Psi(t,T)\,dt, \qquad \tilde\Psi(T,T) = I_{S \times S}. \tag{28}$$

Lando (1998) first derived Equation (27) under the assumption of zero recovery ($\ell^i_t = 100\%$ for all $i$). Li (2000) and Duffie and Singleton (2003) extended Lando’s result to the case of nonzero fractional recovery of market value.

9 Note that, except under full recovery, the $(S \times S)$ “modified rate matrix” $\tilde R_t$ is not a valid transition matrix, because its rows do not sum to zero.
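The construction of the modified rate matrix can be illustrated with a short sketch. It assumes a constant $(S+1) \times (S+1)$ generator whose last state is default and one constant loss rate per non-default class; the function name and the numerical inputs are hypothetical. The $S \times S$ result keeps the off-diagonal migration intensities and moves the fraction $1 - \ell^i$ of each default intensity onto the diagonal, so that row $i$ sums to $-R^{iS}\ell^i$ rather than to zero.

```python
def modified_rate_matrix(R, loss):
    """Build the S x S modified rate matrix from an (S+1) x (S+1) generator.

    R    : generator matrix as a list of rows; the last state is default
    loss : loss rate upon default for each of the S non-default classes
    """
    S = len(R) - 1
    # Start from the upper-left S x S block of migration intensities.
    R_mod = [[R[i][j] for j in range(S)] for i in range(S)]
    for i in range(S):
        # Shift the recovered share of the default intensity to "no transition".
        R_mod[i][i] = R[i][i] + R[i][S] * (1.0 - loss[i])
    return R_mod
```

Under full recovery (`loss` all zero) the rows sum to zero, and under zero recovery (`loss` all one) the matrix is just the upper-left block of the generator.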
Of important practical interest is the question of under what conditions Equations (26) or (27) admit an analytic solution. If there is only one rating class, as in Duffie and Singleton (1999), an analytical solution obtains under an affine structure. Specifically, in this case, $\tilde R_t = -R_t\,\ell_t$ is a scalar, where $R_t$ is the default intensity and $\ell_t$ is the loss rate upon default. It follows that $\tilde\Psi(t,T) = \exp\left[-\int_t^T R_u \ell_u\,du\right]$, and the defaultable discount function is given by

$$B(t,T) = E^Q_t\left[\exp\left[-\int_t^T (r_u + R_u \ell_u)\,du\right] B(T,T)\right].$$

Duffie and Singleton (1999) show that if the “risk-adjusted” short rate $r_t + R_t \ell_t$ is an affine function of an affine diffusion $Y$, then $B(t,T)$ is exponential affine in $Y$. This result is easily extended to the case where there are multiple rating classes, but there is no migration across non-default ratings (i.e., an issuer can only migrate to default).

With multiple ratings and migration across rating classes, the backward differential Equation (28) typically does not admit an analytic solution. This is because the matrices $\tilde R_t$ and $\tilde\Psi(t,T)$ do not commute in general. To circumvent this difficulty, Lando (1998) assumed zero recovery and that (i) the risk-neutral rate matrix $R_t$ admits an eigenvalue decomposition $R_t = J\Gamma_t J^{-1}$, where $J$ is a constant $(S+1) \times (S+1)$ matrix and $\Gamma_t$ is a diagonal $(S+1) \times (S+1)$ matrix (henceforth we refer to this type of decomposition as a Lando decomposition); (ii) the state vector $Y$ follows an affine process under $Q$; and (iii) the riskless rate $r_t$ and the diagonal elements of the matrix $\Gamma_t$ are affine functions of $Y$. Under these assumptions, $B^i(t,T) =$
$$\sum_{j=0}^{S-1} J_{ij}\,\exp\left[-\gamma_{0j} - Y_t'\gamma_{Yj}\right], \tag{29}$$
where $\gamma_{0j}$ and $\gamma_{Yj}$ are explicitly known up to a set of ODEs. This follows from the observation that $A(t,T) = J^{-1}B(t,T)$ satisfies

$$\frac{\partial A(t,T)}{\partial t} + \frac{\partial A(t,T)}{\partial Y'}\,\tilde\mu_t + \tfrac{1}{2}\,\mathrm{Trace}\left[\sigma_t\sigma_t'\,\frac{\partial^2 A(t,T)}{\partial Y\,\partial Y'}\right] + \Gamma_t A(t,T) - r_t A(t,T) = 0, \tag{30}$$

with boundary conditions $A(T,T) = J^{-1}B(T,T)$. Since these equations are decoupled, each element of $A(t,T)$ can be solved individually. Furthermore, under the assumed affine structure, $A(t,T)_j = \exp[-\gamma_{0j} - Y_t'\gamma_{Yj}]$, where $\gamma_{0j}$ and $\gamma_{Yj}$ depend in general on $j$, because the diagonal elements of $\Gamma_t$ need not be the same across different rating classes.

Inspired by Lando (1998), Li (2000) shows that the pricing formula (29) obtains for the nonzero-recovery case under the following assumptions: (i) the defective rate matrix $\tilde R_t$ admits a Lando-decomposition: $\tilde R_t = \tilde J\,\tilde\Gamma_t\,\tilde J^{-1}$, where $\tilde J$ is a constant $S \times S$ matrix and $\tilde\Gamma_t$ is a diagonal $S \times S$ matrix; (ii) $Y_t$ is affine under the risk-neutral
measure, in the sense of Duffie and Kan (1996); and (iii) $r_t$ and the diagonal elements of $\tilde\Gamma_t$ are affine in $Y_t$. To see that Li (2000) is a direct extension of Lando (1998), note first that $R_t = J\Gamma_t J^{-1}$ if and only if $\tilde R_t = \tilde J\,\tilde\Gamma_t\,\tilde J^{-1}$, where

$$J = \begin{pmatrix} \tilde J & \mathbf{1} \\ 0 & 1 \end{pmatrix}, \qquad J^{-1} = \begin{pmatrix} \tilde J^{-1} & -\tilde J^{-1}\mathbf{1} \\ 0 & 1 \end{pmatrix}, \qquad \Gamma_t = \begin{pmatrix} \tilde\Gamma_t & 0 \\ 0 & 0 \end{pmatrix}.$$

It follows immediately that $R_t$ admits a Lando-decomposition if and only if $\tilde R_t$ admits a Lando-decomposition. Letting $\Psi(t,T)$ be the solution to the backward differential equation $d\Psi(t,T) = -R_t\,\Psi(t,T)\,dt$, with boundary condition $\Psi(T,T) = I_{(S+1) \times (S+1)}$, it is easy to see that $\tilde\Psi(t,T)$ is the upper-left $S \times S$ sub-matrix of $\Psi(t,T)$.

6.2. Fractional recovery of par, payable at maturity

Suppose that, in the event of default, a fraction $\omega(s_{t-}; Y_t, t)$ of face value is recovered, and that payment of $\omega(s_{t-}; Y_t, t)$ is postponed until the original maturity date of the defaultable bond. Then

$$B^S(s_{t-}; Y_t, t; T) = \omega(s_{t-}; Y_t, t)\,D(t,T); \tag{31}$$

that is, the recovery at the default time is simply the recovery at maturity, $\omega(s_{t-}; Y_t, t)$, discounted back to $t$ by the default-free discount factor $D(t,T)$. For this case of zero-coupon bonds, this recovery convention agrees with that proposed by Jarrow, Lando and Turnbull (1997) in which bond holders recover, at the time of default, a fraction ($\omega(s_{t-}; Y_t, t)$) of an otherwise equivalent Treasury bond ($D(t,T)$). Letting $\omega^k_t \equiv \omega(k; Y_t, t)$, $\forall k$, under recovery assumption (31), the defaultable discount functions solve the following PDEs:

$$\left[\frac{\partial}{\partial t} + \mathcal{A}\right]B^k(t,T) + \sum_{j=0}^{S-1}\bar R^{kj}_t B^j(t,T) - \left(r_t + R^{kS}_t\right)B^k(t,T) + \omega^k_t R^{kS}_t D(t,T) = 0, \tag{32}$$

with boundary condition $B^k(T,T) = 1$, $0 \le k \le S-1$. Here $\bar R_t$ denotes the $S \times S$ matrix of migration intensities among the non-default classes, with $\bar R^{kj}_t = R^{kj}_t$ for $j \ne k$ and $\bar R^{kk}_t = R^{kk}_t + R^{kS}_t$, so that the default intensity enters only through the terms in which $R^{kS}_t$ appears explicitly. As far as we are aware, the solution to the joint PDEs (32) with rating migrations ($S > 1$) has yet to be developed. However, for the special case of $S = 1$ (no rating migration) and state-independent (constant) $\omega$,

$$B(t,T) = E^Q_t\left[\exp\left(-\int_t^T (r_u + \lambda_u)\,du\right)\right] + \omega\int_t^T E^Q_t\left[\exp\left(-\int_t^T r_s\,ds\right)\exp\left(-\int_t^u \lambda_v\,dv\right)\lambda_u\right]du, \tag{33}$$

where the default intensity is $R^{0S}_t = \lambda_t$. Each of the expectations in Equation (33) is known in closed form when $r_t$ and $\lambda_t$ are affine functions of an affine diffusion, so
$B(t,T)$ is known up to a one-dimensional numerical integration.$^{10}$ The model of Jarrow, Lando and Turnbull (1997) is the special case of Equation (33) in which $r_t$ and $\lambda_t$ are statistically independent under $Q$, in which case

$$B(t,T) = D(t,T)\left[Q(t,T) + \omega\left(1 - Q(t,T)\right)\right], \tag{34}$$

where $Q(t,T) = E^Q_t\left(\exp\left[-\int_t^T \lambda_u\,du\right]\right)$ is the risk-neutral survival probability; i.e., the probability under $Q$ that default occurs after $T$.

6.3. Fractional recovery of par, payable at default

Duffie (1998) adopted an alternative timing convention for recovery: a fraction $\omega(s_{t-}; Y_t, t)$ of par is recovered and paid at the time of default,

$$B^S(s_{t-}; Y_t, t; T) = \omega(s_{t-}; Y_t, t). \tag{35}$$

Under this assumption, the defaultable discount functions jointly solve, for $0 \le k \le S-1$,

$$\left[\frac{\partial}{\partial t} + \mathcal{A}\right]B^k(t,T) + \sum_{j=0}^{S-1}\bar R^{kj}_t B^j(t,T) - \left(r_t + R^{kS}_t\right)B^k(t,T) + \omega^k_t R^{kS}_t = 0, \tag{36}$$

with boundary conditions $B^k(T,T) = 1$, where $\bar R^{kj}_t = R^{kj}_t$ for $j \ne k$ and $\bar R^{kk}_t = R^{kk}_t + R^{kS}_t$. Again, we are not aware of any explicit solutions of Equation (36) in the presence of rating migrations. For the special case of $S = 1$, this model gives an expression that is identical to Equation (33), except that each of the expectations in the second term is replaced by $E^Q_t\left(\exp\left[-\int_t^u (r_v + \lambda_v)\,dv\right]\lambda_u\right)$. Thus, as shown by Duffie (1998), this model also admits closed-form expressions for the $B(t,T)$ up to a numerical integration.

6.4. Pricing defaultable coupon bonds

Up to this point we have been focusing on defaultable zero-coupon bonds. Intuitively, a coupon bond with rating $i$ should have a price equal to the present value of its promised cash flows, discounted by the defaultable discount function $B^i(t,T)$. This is true under the assumption that the loss rates $\ell^i(Y_t,t)$ depend only on the rating class $i$ and the economy-wide state vector, but not on characteristics of the cash flows of the bond being priced. In this case, given $Y_t$, all defaultable securities with the same rating $i$ lose the same fraction $\ell^i(Y_t,t)$ if default occurs at $t$.

10 Even with state-dependent $\omega$, tractability need not be lost. For instance, if $\omega(Y_t,t) = \exp[\gamma_0 + \gamma_Y' Y_t]$, then all of the expectations in Equation (33) are still known in closed form in the case of an affine state process.
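As a numerical check on the single-rating formulas of Sections 6.2 and 6.3, the sketch below evaluates Equation (33) by one-dimensional quadrature and compares it with Equation (34). It assumes, only to keep the example self-contained, a constant short rate `r`, a constant default intensity `lam` and a constant recovery fraction `omega`, so that the expectations inside the integral are deterministic and $r_t$ and $\lambda_t$ are trivially independent. The function names are hypothetical.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def price_recovery_at_maturity(r, lam, omega, tau):
    """Equation (33) with constant r and lam: recovery of par paid at maturity."""
    survival_leg = math.exp(-(r + lam) * tau)
    integrand = lambda u: math.exp(-r * tau) * math.exp(-lam * u) * lam
    return survival_leg + omega * simpson(integrand, 0.0, tau)

def price_independent_case(r, lam, omega, tau):
    """Equation (34): D * [Q + omega * (1 - Q)]."""
    D = math.exp(-r * tau)      # default-free discount factor
    Q = math.exp(-lam * tau)    # risk-neutral survival probability
    return D * (Q + omega * (1.0 - Q))

def price_recovery_at_default(r, lam, omega, tau):
    """Section 6.3 variant: recovery of par paid at the default time."""
    survival_leg = math.exp(-(r + lam) * tau)
    integrand = lambda u: math.exp(-(r + lam) * u) * lam
    return survival_leg + omega * simpson(integrand, 0.0, tau)
```

With a positive interest rate, receiving the same fraction of par at default rather than at maturity is worth more, so the Section 6.3 price exceeds the Section 6.2 price, and both remain below the default-free discount factor.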
When the loss rates depend on cash flow characteristics, then strictly speaking, there does not exist a universal defaultable discount function. Instead, each security (or subset of securities with particular cash flow patterns) will have its own set of defaultable discount functions, reflecting the unique impact of default on its pricing under non-default states. For the recovery-of-market-value models, the loss rate $\ell^i(Y_t,t)$ is applied uniformly to the construction of the $B^i(t,T)$ for discounting coupons and face value. On the other hand, for the recovery-of-face-value models, in constructing the discount factors $B^i(t,T)$, it is typically assumed that $\ell^i(Y_t,t) = 1$ when discounting coupon payments (zero recovery of coupon payments) and $0 < \ell^i(Y_t,t) < 1$ when discounting the face value of a bond.

Finally, we note that, in the case of coupon bonds, the assumption in Jarrow, Lando and Turnbull (1997) that bond holders recover a fraction of an otherwise equivalent treasury bond represents a third and distinct recovery convention: creditors now recover a fraction of both face value and promised future coupons through their recovery of a coupon-paying treasury bond. Outside of the special case they examined, this recovery convention has not, to our knowledge, been widely studied.

6.5. Pricing Eurodollar swaps

We can, and the literature often does, treat LIBOR-based plain-vanilla swaps as a special case of the pricing relations developed under the fractional-recovery-of-market-value assumption. We let $(B(t,T): T \ge t)$ be the discount curve for the LIBOR rating class.

Following Duffie and Singleton (1997), if (i) both counterparties have the same credit rating as LIBOR issuers and maintain this rating up to the time of any defaults (“refreshed” LIBOR quality issuers); (ii) upon default, the counterparty who is in the money recovers a fraction $1 - \ell_t$ of the marked-to-market value of the swap, where the loss rate $\ell_t$ does not depend on cash flow characteristics; and (iii) the floating index is a LIBOR rate with tenor matching the payment frequency, then the swap rate $s(t,T)$ on any reset date $t < T$ is equal to the par coupon yield with maturity $T$ for the LIBOR rating class:

$$s(t,T) = \frac{1}{\delta}\,\frac{1 - B(t,T)}{\sum_{j=1}^{(T-t)/\delta} B(t, t + \delta j)}, \tag{37}$$

where $\delta$ is the length of each payment period, typically three months or half a year.

The presumption of “refreshed” LIBOR quality clearly makes Equation (37) an approximation to the true pricing relation. In fact, the counterparties to a swap may have asymmetric credit qualities and, even if this is not true at the inception of a swap, the relative qualities of the counterparties may change over time. In these cases of two-sided credit risk, the effective default arrival intensity and recovery for pricing depend on the current market price of the swap (that is, on the counterparty to which the swap is “in the money”). Duffie and Huang (1996) and Duffie and Singleton (1997) treat
these issues in the case of a single ratings class. Huge and Lando (1999) discuss swap pricing with two-sided risk in a ratings-based model. Gupta and Subrahmanyam (2000) document and explain the mispricing of Eurodollar swaps when they are priced off the Eurodollar futures strips. Since futures rates are higher than the forward rates due to marking to market of futures contracts, swap rates computed by treating futures rates as forward rates are higher than the true “fair market” swap rates. The difference is referred to as the “convexity bias”.
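The par-coupon relation in Equation (37) is simple enough to sketch directly. The function below takes a discount curve for the LIBOR rating class, here a hypothetical callable `discount(x)` returning $B(t, t+x)$, together with the swap maturity and the payment period, and returns the swap rate. The flat curve used in the usage note is an assumption made only so that the answer can be checked by hand.

```python
import math

def par_swap_rate(discount, maturity, delta):
    """Swap rate from Equation (37): (1 - B(t,T)) / (delta * sum_j B(t, t + delta*j)).

    discount : callable x -> B(t, t + x), the LIBOR-quality discount curve
    maturity : T - t, assumed to be a whole number of payment periods
    delta    : length of each payment period in years
    """
    n = round(maturity / delta)
    annuity = sum(discount(delta * j) for j in range(1, n + 1))
    return (1.0 - discount(delta * n)) / (delta * annuity)
```

For a flat continuously compounded curve $B(t, t+x) = e^{-yx}$, the sum is geometric and the swap rate equals $(e^{y\delta} - 1)/\delta$ for every maturity, that is, the flat rate re-expressed with compounding period $\delta$.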
7. Pricing of fixed-income derivatives

This section overviews the pricing of fixed-income derivatives using DTSMs, forward-rate-based models, and models that adopt specialized “pricing measures” to simplify the computation of derivatives prices.

7.1. Derivatives pricing using dynamic term-structure models

As with term-structure modeling, much of the academic literature on derivatives pricing using DTSMs has focused on affine and QG models. The tractability of affine models is captured by the “extended transform” result of Duffie, Pan and Singleton (2000) which gives

$$G(Y_t, t; T; \rho_0, \rho_1, v_0, v_1, u) = E^Q\left[\exp\left(-\int_t^T (\rho_0 + \rho_1' Y_s)\,ds\right)\left(v_0 + v_1' Y_T\right)\exp[u' Y_T]\ \Big|\ Y_t\right], \tag{38}$$

for $Y$ following an affine jump-diffusion, in closed form. Specifically,

$$G(Y_t, t; T; \rho_0, \rho_1, 1, 0, u) = \exp\left[a(\tau) + b(\tau)' Y_t\right], \tag{39}$$

$$G(Y_t, t; T; \rho_0, \rho_1, 0, v, u) = \exp\left[a(\tau) + b(\tau)' Y_t\right]\left[A(\tau) + B(\tau)' Y_t\right], \tag{40}$$

where $a(\tau)$, $b(\tau)$, $A(\tau)$ and $B(\tau)$ are all explicitly known up to a set of ODEs (and, again, $\tau = T - t$). See Bakshi and Madan (2000) and Chacko and Das (2001) for related pricing results.

The Green’s function can be obtained for affine jump-diffusion models by inverse Fourier transform of $G$, based on the fact that

$$G(Y_t, t; T; \rho_0, \rho_1, 1, 0, u) = \int \exp[u' Y]\,G(Y_t, t; Y, T)\,dY.$$

It follows that European-style derivatives prices can be easily computed by integrating the product of the payoff function and the Green’s function. This observation underlies the pricing formulas for various fixed-income derivatives discussed, for example, in
Buttler and Waldvogel (1996), Das and Foresi (1996), Nunes, Clewlow and Hodges (1999), Duffie, Pan and Singleton (2000), Bakshi and Madan (2000) and Chacko and Das (2001). From Equation (38) we see that payoffs that are exponential affine in the state or the product of an affine and exponential affine function of Y are accommodated by these pricing models. This covers options on zero-coupon bonds, for example, because the prices of zero-coupon bonds are exponential affine functions of Y . However, these results do not cover the most common form of bond option, namely, options on coupon bonds. Jamshidian (1987) derived coupon bond option pricing formulas for the case of one-factor models in which zero-coupon bond prices are strictly monotonic functions of the (one-dimensional) state. Gaussian and square-root diffusion models are examined in Jamshidian (1989) and Longstaff (1993), respectively. Taking a different approach, Wei (1997) showed that the price of a European option on a coupon bond is approximately proportional to the price of an option on a zero-coupon bond with maturity equal to the stochastic duration [Cox, Ingersoll and Ross (1979)] of the coupon bond. Subsequently, Munk (1999) extended Wei’s Stochastic Duration approximation to the general case of multi-factor affine models. These approximations work very well for options that are either far in or far out of the money, while having relatively large approximation errors (though still absolutely small) for options that are near the money. Approximate pricing formulas for coupon options that are computationally fast and very accurate over a wider range of moneyness, including nearly at the money options, were proposed by Collin-Dufresne and Goldstein (2001b) and Singleton and Umantsev (2002). The former approach uses an Edgeworth expansion of the probability distribution of the future price of a coupon bond. 
The latter exploits the empirical observation that the optimal exercise boundary for coupon bond options in affine DTSMs can be accurately approximated by straight line segments.

Leippold and Wu (2001) derive the counterpart to the transform (39) for QG models which allows them to price derivatives with payoffs that are exponential quadratic functions of the state (which includes zero-coupon bonds as a special case). The approximate pricing of options on coupon bonds developed in Collin-Dufresne and Goldstein (2001b) and Singleton and Umantsev (2002) for affine models may be adapted to QG models, though to our knowledge this adaptation has not been developed.

7.2. Derivatives pricing using forward-rate models

A significant part of the literature on fixed-income pricing has focused on forward-rate models in which the terminal payoff $Z(T)$ is assumed to be completely determined by the discount function $(D(t,T): T \ge t)$ [as in Ho and Lee (1986)], or equivalently, the forward curve $(f(t,T): T \ge t)$ [as in Heath, Jarrow and Morton (1992)] defined by

$$f(t,T) = -\frac{\partial \log D(t,T)}{\partial T}, \qquad \text{for any } T \ge t. \tag{41}$$
The time $t$ price of a fixed-income derivative with terminal payoff $Z(T) = Z(f(T, T+x): x \ge 0)$ is then given by

$$Z(t) = E^Q\left[\exp\left(-\int_t^T f(u,u)\,du\right) Z\left(f(T, T+x): x \ge 0\right)\ \Big|\ f(t, t+x): x \ge 0\right]. \tag{42}$$

For this model to be free of arbitrage opportunities, Heath, Jarrow and Morton (1992) show that the risk-neutral dynamics of the forward curve must be given by

$$df(t,T) = \left[\sigma(t,T) \cdot \int_t^T \sigma(t,u)\,du\right]dt + \sigma(t,T) \cdot dW^Q(t), \qquad \text{for any } T \ge t, \tag{43}$$

for a suitably chosen volatility function $\sigma(t,T)$. This forward-rate representation of prices is particularly convenient in practice, because the forward curve can be taken as an input for pricing derivatives and, once the functions $\sigma(t,T)$, for all $T \ge t$, are specified, then so are the processes $f(t,T)$ under $Q$. This approach, as typically used in practice, allows the implied $r_t$ and $\Lambda_t$ to follow general Ito processes (up to mild regularity conditions); there is no presumption that the underlying state is Markov in this forward-rate formulation. Additionally, taking $(f(t,T): T \ge t)$ as an input for pricing means that a forward-rate based model can be completely agnostic about the behavior of yields under the actual data generating process.

Building off of the original insights of Heath, Jarrow and Morton, a variety of different forward-rate-based models have been developed and used in practice. The finite dimensionality of $W^Q$ was relaxed by Musiela (1994), who models the forward curve as a solution to an infinite-dimensional stochastic partial differential equation (SPDE) [see Da Prato (1992) and Pardoux (1993) for some mathematical characterizations of the SPDE]. Specific formulations of infinite-dimensional SPDEs have been developed under the labels of “Brownian sheets” [Kennedy (1994)], “random fields” [Goldstein (2000)], and “stochastic string shocks” [Santa-Clara and Sornette (2001)]. The high dimensionality of these models gives a better fit to the correlation structure, particularly at high frequencies. Since solutions to SPDEs can be expanded in terms of a countable basis [cylindrical Brownian motions; see, e.g., Da Prato (1992) and Cont (1999)], the SPDE models can also be viewed as infinite-dimensional factor models. Though these formulations are mathematically rich, in practice, they often add little generality beyond finite-state forward-rate models, because practical considerations often lead modelers to work with a finite-dimensional $W^Q$.
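Because the drift in Equation (43) is pinned down entirely by the volatility function, it can be computed mechanically once $\sigma(t,T)$ is chosen. The sketch below does this for a scalar (one-factor) volatility by numerical quadrature. The two volatility functions used to check it, a constant and an exponentially decaying one, are illustrative assumptions, and the function names are hypothetical.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def hjm_drift(sigma, t, T, n=200):
    """Risk-neutral drift of f(t, T) in Equation (43) for a one-factor model:
    sigma(t, T) times the integral of sigma(t, u) over u in [t, T]."""
    return sigma(t, T) * simpson(lambda u: sigma(t, u), t, T, n)

def constant_vol(sigma0):
    """Constant volatility, sigma(t, T) = sigma0."""
    return lambda t, T: sigma0

def exponential_vol(sigma0, a):
    """Exponentially damped volatility, sigma(t, T) = sigma0 * exp(-a (T - t))."""
    return lambda t, T: sigma0 * math.exp(-a * (T - t))
```

For constant volatility the drift is $\sigma_0^2 (T - t)$, and for the exponentially damped case it is $\sigma_0^2 e^{-a(T-t)}\,(1 - e^{-a(T-t)})/a$; both closed forms can be used to validate the quadrature.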
Key to all of these formulations is the specification of the volatility function, since this determines the drift of the relevant forward rates under $Q$ [as in Heath, Jarrow and Morton (1992)]. Amin and Morton (1994) examine a class of one-factor models with the volatility function given by

$$\sigma(t,T) = \left[\sigma_0 + \sigma_1 (T - t)\right]\exp\left[-\lambda (T - t)\right] f(t,T)^{\gamma}. \tag{44}$$

This specification nests many widely used volatility functions, including the continuous-time version of Ho and Lee (1986) ($\sigma(t,T) = \sigma_0$), the lognormal model ($\sigma(t,T) = \sigma_0 f(t,T)$), and the Gaussian model with time-dependent parameters as in Hull and White (1993).

When $\gamma \ne 0$, Equation (44) is a special case of the “separable specification” $\sigma(t,T) = x(t,T)\,h(t)$ with $x(t,T)$ a deterministic function of time and $h(t)$ a possibly stochastic function of $Y$. The state vector may include the current spot rate $r(t)$ [see, e.g., Jeffrey (1995)], a set of forward rates with fixed time-to-maturity, or an autonomous Markovian vector of latent state variables [Cheyette (1994), Brace and Musiela (1994) and Andreasen, Collin-Dufresne and Shi (1997)]. In practice, the specification of $h(t)$ has been kept simple to preserve computational tractability, often simpler than the specifications of stochastic volatility in yield-based models. On the other hand, $Y$ often has a large dimension (many forward rates are used) and $x(t,T)$ is given a flexible functional form. Thus, there is the risk with forward-rate models of mis-specifying the dynamics through restrictive specifications of $h(t)$, while “overfitting” to current market information through the specification of $x(t,T)$.

More discipline, as well as added computational tractability, is obtained by imposing a Markovian structure on the forward rate processes. Two logically distinct approaches to deriving Markov HJM models have been explored in the literature. Ritchken and Sankarasubramanian (1995), Bhar and Chiarella (1997) and Inui and Kijima (1998) ask under what conditions, taking as given the current forward-rate curve, the evolution of future forward rates can be described by a Markov process in an HJM model. These papers show that an $N$-factor HJM model can be represented, under certain restrictions, as a Markov system in $2N$ state variables. While these results lead to simplifications in the computation of the prices of fixed-income derivatives, they do not build a natural bridge to Markov, spot-rate-based DTSMs.
The distributions of both spot and forward rates depend on the date and shape of the initial forward-rate curve.

Carverhill (1994), Jeffrey (1995) and Bjork and Svensson (2001) explore conditions under which an $N$-factor HJM model implies an $N$-factor Markov representation of the short rate $r$. In the case of $N = 1$, the question can be posed as: Under what conditions does a one-factor HJM model (that by construction matches the current forward curve) imply a diffusion model for $r$ with drift and volatility functions that depend only on $r$ and $t$? Under the assumption that the instantaneous variance of the $T$-period forward rate is a function only of $(r, t, T)$, $\sigma_f^2(r, t, T)$, Jeffrey proved the remarkable result that $\sigma_f^2(r, t, T)$ must be an affine function of $r$ (with time-dependent coefficients) in order for $r$ to follow a Markov process. Put differently, his result essentially says that the only family of “internally consistent” one-factor HJM models [see also Bjork and Christensen (1999)] that match the current forward curve and imply a Markov model for $r$ is the family of affine DTSMs with time-dependent coefficients. Bjork and Svensson discuss the multi-factor counterpart to Jeffrey’s result.

7.3. Defaultable forward-rate models with rating migrations

No-arbitrage restrictions on the risk-neutral drifts of defaultable forward rates have been derived in rating-migration models [see, e.g., Schonbucher (1998), Duffie (1998), Bielecki and Rutkowski (2000) and Acharya, Das and Sundaram (2002)]. The resulting
risk-neutral specifications of the defaultable forward curves can be used to construct arbitrage-free pricing models for credit derivatives, in very much the same way as Equation (43) can be used to construct an arbitrage-free pricing model for default-free interest rate derivatives. In general, the no-arbitrage restrictions depend on the recovery scheme for the underlying default pricing model. We illustrate this by giving a heuristic derivation of the no-arbitrage restrictions under two widely used recovery schemes.

For a partition $t = T_0 < T_1 < \cdots < T_{N-1} < T_N \equiv T$ of the time interval $[t, T]$, let $\{g^k_{t,i}: 0 \le i \le N-1\}$ be a consecutive sequence of forward rates for rating class $k$ with settlement dates $T_i$:

$$g^k_{t,i} = \frac{1}{\delta_i}\left[\frac{B^k(t, T_i)}{B^k(t, T_i + \delta_i)} - 1\right],$$

where $\delta_i \equiv T_{i+1} - T_i$ is the tenor of the underlying zero-coupon bond for the $i$th forward contract. Inverting (to first order in the $\delta_i$, which suffices for the continuous-time limit taken below), the discount function for rating class $k$ with maturity $T_n$, $1 \le n \le N$, is given by

$$B^k(t, T_n) = \exp\left[-\sum_{i=0}^{n-1} \delta_i\, g^k_{t,i}\right]. \tag{45}$$

Assuming that the defaultable forward rates in rating class $k$ have the following risk-neutral dynamics:

$$dg^k_{t,i} = \mu^k_{t,i}\,dt + \sigma^k_{t,i} \cdot dW^Q_t, \qquad 0 \le i \le N-1,$$

we now proceed to derive no-arbitrage restrictions on $\mu^k_{t,i}$, $0 \le k \le S-1$, $0 \le i \le N-1$, under different recovery schemes.
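7.3.1. Fractional recovery of market value

In this case, the loss rate $\ell^k_t$ does not depend on $B^k(t, T_n)$, for all $k$. Substituting the discount functions (45) into Equation (26) yields, for all $1 \le n \le N$ and all $0 \le k \le S-1$,

$$-\sum_{i=0}^{n-1} \delta_i\,\mu^k_{t,i} + \frac{1}{2}\sum_{i=0}^{n-1}\sum_{i'=0}^{n-1} \delta_i \delta_{i'}\,\sigma^k_{t,i} \cdot \sigma^k_{t,i'} + \sum_{k'=0}^{S-1} \tilde R^{kk'}_t\,\frac{B^{k'}(t, T_n)}{B^k(t, T_n)} - r = 0. \tag{46}$$

For a given $k$, differencing Equation (46) with respect to index $n$ yields

$$\mu^k_{t,n} = \sigma^k_{t,n} \cdot \sum_{i=0}^{n-1} \delta_i\,\sigma^k_{t,i} + \frac{\delta_n}{2}\,\sigma^k_{t,n} \cdot \sigma^k_{t,n} + \sum_{k'=0}^{S-1} \tilde R^{kk'}_t\,\frac{\exp\left[\delta_n s^{kk'}_{t,n}\right] - 1}{\delta_n}\,\exp\left[\sum_{i=0}^{n-1} \delta_i\, s^{kk'}_{t,i}\right],$$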
kk k k where st,i ≡ gt,i − gt,i is the spread between two forward rates with the same settlement date Ti but different rating classesk and k . Taking the limit as N → ∞ and N −1 supNi =−01 di → 0, in such a way that i = 0 di = T , we obtain
T
m (t, T ) = s (t, T ) · k
k
k
s (t, u) du + t
S−1
kk Rkk t s (t, T ) exp
k = 0
T
s
kk
(t, u) du ,
t
(47) k k and s k (t, T ) = limdN → 0 st,N are the risk-neutral where m k (t, T ) = limdN → 0 mt,N k , and drift and diffusion of the instantaneous forward rate g k (t, T ) = limdN → 0 gt,N kk k k s (t, T ) = g (t, T ) − g (t, T ) is the spread between two forward curves with different ratings k and k . Equation (47) generalizes Duffie and Singleton (1999) for the case of S = 1 (no rating migrations). Under the same recovery scheme, Acharya, Das and Sundaram (2002) derive no arbitrage restrictions on the risk-neutral drifts of inter-rating spreads skk (t, T ) on a lattice. Due to their discrete-time and discrete state-space setup, these risk-neutral drifts must be determined numerically by solving a system of equations. The continuous-time and continuous state-space limit of their result, when expressed in terms of risk-neutral drifts of defaultable forward rates, converges to Equation (47). Schonbucher (1998) derives Equation (47) under slightly more general assumptions about default events and recovery. In his setup default does not lead to a liquidation, but rather a reorganization of the issuer. Defaulted bonds lose a fraction of their face value and continue to trade, and the fractional loss is a random variable drawn from an exogenous distribution. 7.3.2. Fractional recovery of face value, payable at maturity In this case, the defaultable discount functions (45) must satisfy the PDEs (32), which implies that, for 0 ∀ k S − 1 and 1 ∀ n N , n−1

$$
-\sum_{i=0}^{n-1} \delta_i \mu^k_{t,i} + \frac{1}{2} \left( \sum_{i=0}^{n-1} \delta_i \sigma^k_{t,i} \right) \cdot \left( \sum_{i=0}^{n-1} \delta_i \sigma^k_{t,i} \right) + \sum_{k'=0}^{S-1} R^{kk'}_t \frac{B^{k'}(t,T_n)}{B^k(t,T_n)} - r + \omega^k_t R^{kS}_t \frac{D(t,T_n)}{B^k(t,T_n)} = 0. \tag{48}
$$

Differencing with respect to the index $n$, dividing both sides by $\delta_n$, and taking the limit as $N \to \infty$ and $\sup_{0 \le i \le N-1} \delta_i \to 0$ in such a way that $\sum_{i=0}^{N-1} \delta_i = T$, we obtain

$$
\mu^k(t,T) = \sigma^k(t,T) \cdot \int_t^T \sigma^k(t,u)\, du + \sum_{k'=0}^{S-1} R^{kk'}_t \, s^{kk'}(t,T) \exp\!\left( \int_t^T s^{kk'}(t,u)\, du \right) + \omega^k_t R^{kS}_t \, s^k(t,T) \exp\!\left( \int_t^T s^k(t,u)\, du \right), \quad 0 \le k \le S-1, \tag{49}
$$

where $\mu^k(t,T)$ and $\sigma^k(t,T)$ are the instantaneous drift and diffusion of the instantaneous forward rate $\gamma^k(t,T)$, and $s^k(t,T) \equiv \gamma^k(t,T) - f(t,T)$ is the forward credit
spread of rating class $k$ relative to the default-free forward curve. Equation (49) was first derived by Bielecki and Rutkowski (2000) as one of the “consistency conditions” for an arbitrage-free pricing model with rating migrations under the current recovery scheme.

7.3.3. Fractional recovery of face value, payable at default

In this case, the defaultable discount functions must satisfy the PDEs (36). It is straightforward to show that the no-arbitrage restriction now takes the following form:

$$
\mu^k(t,T) = \sigma^k(t,T) \cdot \int_t^T \sigma^k(t,u)\, du + \sum_{k'=0}^{S-1} R^{kk'}_t \, s^{kk'}(t,T) \exp\!\left( \int_t^T s^{kk'}(t,u)\, du \right) + \omega^k_t R^{kS}_t \, \gamma^k(t,T) \exp\!\left( \int_t^T \gamma^k(t,u)\, du \right), \quad 0 \le k \le S-1, \tag{50}
$$

which generalizes the result of Duffie (1998) for the case $S = 1$.

Under all of the recovery schemes discussed above, the risk-neutral drift of the defaultable forward curve for a given rating class depends on the diffusion and the initial forward curves for all rating classes. The defaultable forward curves are coupled, because the defaultable discount functions are strongly coupled through rating migrations. When $S = 1$, the no-arbitrage restriction under fractional recovery of market value has the same form as in the default-free case. This is not the case under fractional recovery of face value.

7.4. The LIBOR market model

An important recent development in the HJM modeling approach, based on the work of Miltersen, Sandmann and Sondermann (1997), Brace, Gatarek and Musiela (1997), Musiela and Rutkowski (1997a) and Jamshidian (1997), is the construction of arbitrage-free models for forward LIBOR rates at an observed discrete tenor structure. Besides the practical benefit of working with observable forward rates (in contrast to the unobservable instantaneous forward rates), this shift overcomes a significant conceptual limitation of continuous-rate formulations. Namely, as shown by Morton (1988) and Sandmann and Sondermann (1997), a lognormal volatility structure for $f(t,T)$ is inadmissible, because it may imply zero prices for positive-payoff claims and, hence, arbitrage opportunities. With the use of discrete-tenor forwards, the lognormal assumption becomes admissible. The resulting LIBOR market model (LMM) is consistent with the industry-standard Black model for pricing interest rate caps.

In addition to taking full account of the observed discrete-tenor structure, the LMM framework also facilitates tailoring the choice of “pricing measures” to the specific derivative products. In the absence of arbitrage opportunities, Harrison and Kreps (1979) and Harrison and Pliska (1981) demonstrated that, for each traded security
with price $P_t$, there exists a measure $M(P)$ under which the price of any other traded security with payoffs denominated in units of the numeraire security is a martingale. The probability measure $M(P)$ is referred to as the pricing measure induced by the price $P$ of the numeraire security. The risk-neutral measure, underlying our preceding discussions of both DTSMs and HJM models, is one example of a pricing measure. The LIBOR market model is based on either one of the following two pricing measures: the terminal (forward) measure proposed by Musiela and Rutkowski (1997a) and the spot LIBOR measure proposed by Jamshidian (1997).

To fix the notation for the tenor structure, let us suppose that, at time $t = 0$, there are $N$ consecutive LIBOR forward contracts, with delivery dates $T_n$, $n = 1, 2, \ldots, N$. The underlying of the $n$th forward contract is a Eurodollar deposit with tenor $\delta_n$. Clearly, $\delta_n = T_{n+1} - T_n$, $n = 1, 2, \ldots, N$ (with $T_{N+1} \equiv T_N + \delta_N$). For $0 < t \le T_N$, let $n(t) = \inf_{n \le N} \{ n : T_n \ge t \}$ denote the index of the next delivery date. Let $B(t,T)$ be the LIBOR discount factor at time $t$ with maturity date $T$. Then the time-$t$ forward LIBOR rate with reset date $T_n$, $t < T_n \le T_N$, is given by

$$
L_n(t) = \frac{1}{\delta_n} \left[ \frac{B(t,T_n)}{B(t,T_{n+1})} - 1 \right].
$$

A caplet is a security with payoff $\delta_n [L_n(T_n) - k]^+$, determined at the reset date $T_n$ and paid at the settlement date $T_{n+1}$ (payment in arrears), where $L_n(T_n)$ is the spot LIBOR rate at $T_n$ and $k$ is the strike rate. Letting $C_n(t)$ denote the price of the caplet, Brace, Gatarek and Musiela (1997) show that, in the absence of arbitrage, both $B(t,T_n)/B(t,T_{n+1})$ (and hence $L_n(t)$) and $C_n(t)/B(t,T_{n+1})$ are martingales under the forward measure, $P^{n+1} \equiv M(B(t,T_{n+1}))$, induced by the LIBOR discount factor $B(t,T_{n+1})$. Furthermore, under the assumption that $L_n(t)$ is log-normally distributed,$^{11}$ the Black model for caplet pricing obtains:

$$
C_n(t) = \delta_n B(t,T_{n+1}) \left[ L_n(t) N(d_1) - k N(d_2) \right], \tag{51}
$$

$$
d_1 \equiv \frac{\log \frac{L_n(t)}{k} + \frac{v_n}{2}}{\sqrt{v_n}}, \qquad d_2 \equiv \frac{\log \frac{L_n(t)}{k} - \frac{v_n}{2}}{\sqrt{v_n}}, \tag{52}
$$
where $N(\cdot)$ is the cumulative normal distribution function and $v_n$ is the cumulative volatility of the forward LIBOR rate from the trade date to the delivery date: $v_n \equiv \int_t^{T_n} \sigma_n(u)' \sigma_n(u)\, du$. The price of a cap is simply the sum of all unsettled caplet prices (including the value of the caplet paid at settlement date $T_{n(t)}$, which is known at $t$). The Black–Scholes type pricing formula (51)–(52) for caps is commonly referred to as the cap market model.

11 That is, $dL_n(t)/L_n(t) = \sigma_n(t)'\, dW^n(t)$, where $W^n$ is a vector of standard and independent Brownian motions under $P^{n+1}$, and $\sigma_n(t)$ is a deterministic vector commensurate with $W^n$ (the prime denotes transposition).

The simplicity of the cap market model derives from the facts that (a) each caplet with reset date $T_n$ and payment date $T_{n+1}$ is priced under its own forward measure $P^{n+1}$; (b) we can be completely agnostic about the exact nature of the forward measures and their relationship with each other; and (c) we can be completely agnostic about the factor structure: the caplet price $C_n$ does not depend on how the total cumulative volatility $v_n$ is distributed across different shocks $W^n$.

The simplicity of the cap market model does not immediately extend to the pricing of securities whose payoffs depend on two or more spot LIBOR rates with different maturities, or equivalently two or more forward LIBOR rates with different reset dates. A typical example is a European swaption with expiration date $T_n$, $n \ge n(t)$, final maturity date $T_{N+1}$, and strike $k$. Let

$$
S_{n,N}(t) = \frac{B(t,T_n) - B(t,T_{N+1})}{\sum_{j=n}^{N} \delta_j B(t,T_{j+1})}
$$

be the forward swap rate, with delivery date $T_n$ and final maturity date $T_{N+1}$. The payoff of the payer swaption at $T_n$ is a stream of cash flows paid at $T_{j+1}$ in the amount $\delta_j [S_{n,N}(T_n) - k]^+$, $n \le j \le N$, where the spot swap rate $S_{n,N}(T_n)$ is completely determined by the forward LIBOR rates $L_j(T_n)$, $n \le j \le N$. The market value of these payments, as of $T_n$, is given by

$$
\sum_{j=n}^{N} \delta_j B(T_n,T_{j+1}) \left[ S_{n,N}(T_n) - k \right]^+ = \left[ 1 - B(T_n,T_{N+1}) - k \sum_{j=n}^{N} \delta_j B(T_n,T_{j+1}) \right]^+ .
$$

In order to price instruments of this kind, we need the joint distribution of the forward LIBOR rates $\{L_j(t): n \le j \le N,\ 0 \le t \le T_n\}$, under a single measure. The LIBOR market model arises precisely in order to meet this requirement.
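As an illustration of the forward-rate definition and the caplet formula (51)–(52), the following Python sketch computes forward LIBOR rates from discount factors and adds up Black caplet prices to obtain a cap price. The flat 5% curve, the semi-annual tenor, the constant scalar volatility of 20%, the valuation date $t = 0$ and all function names are assumptions introduced here for illustration only; they are not taken from the chapter.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def forward_libor(b_reset, b_settle, delta):
    """L_n(t) = (B(t,T_n) / B(t,T_{n+1}) - 1) / delta_n."""
    return (b_reset / b_settle - 1.0) / delta

def black_caplet(b_settle, fwd, strike, delta, v):
    """Caplet price from Eqs. (51)-(52); v is the cumulative quantity v_n."""
    if v <= 0.0:
        # No remaining uncertainty: discounted intrinsic value.
        return delta * b_settle * max(fwd - strike, 0.0)
    sqrt_v = math.sqrt(v)
    d1 = (math.log(fwd / strike) + 0.5 * v) / sqrt_v
    d2 = d1 - sqrt_v   # equals (log(fwd/strike) - v/2) / sqrt(v)
    return delta * b_settle * (fwd * norm_cdf(d1) - strike * norm_cdf(d2))

# Hypothetical market data (assumptions for illustration only).
delta = 0.5                                   # semi-annual tenor
tenor_dates = [0.5, 1.0, 1.5, 2.0, 2.5]       # T_1, ..., T_5 with T_5 = T_4 + delta
discount = [math.exp(-0.05 * T) for T in tenor_dates]   # B(0, T_n)
sigma = 0.20                                  # flat scalar volatility
strike = 0.05

caplets = []
for n in range(4):                            # caplets resetting at T_1, ..., T_4
    fwd = forward_libor(discount[n], discount[n + 1], delta)
    v_n = sigma ** 2 * tenor_dates[n]         # v_n for constant sigma, valued at t = 0
    caplets.append(black_caplet(discount[n + 1], fwd, strike, delta, v_n))

cap_price = sum(caplets)                      # cap = sum of unsettled caplets
print(f"last forward rate: {fwd:.6f}, cap price: {cap_price:.6f}")
```

With a constant scalar volatility, $v_n$ reduces to $\sigma^2 (T_n - t)$, so only the total $v_n$ enters each caplet price and not its allocation across factors, in line with point (c) above.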
Musiela and Rutkowski (1997a) show that under the terminal measure $P^* \equiv P^{N+1}$, i.e., the probability measure induced by the LIBOR discount factor $B(t,T_{N+1})$, the forward LIBOR rates can be modeled as a joint solution to the following stochastic differential equations (SDEs): for $n(t) \le n \le N$,

$$
\frac{dL_n(t)}{L_n(t)} = \sigma_n(t)' \left[ -\sum_{j=n+1}^{N} \frac{\delta_j L_j(t)}{1 + \delta_j L_j(t)} \, \sigma_j(t)\, dt + dW^*(t) \right], \tag{53}
$$

where $W^*$ is a vector of standard and independent Brownian motions under $P^*$. These SDEs have a recursive structure that can be exploited in simulating the LIBOR forward rates: first, the drift of $L_N(t)$ is identically zero, because it is a martingale under $P^*$; second, for $n < N$, the drift of $L_n(t)$ is determined by $L_j(t)$, $n \le j \le N$.

Jamshidian (1997) proposes an alternative construction of the LIBOR market model based on the so-called spot LIBOR measure, $P^B$, induced by the price of a “rolling
zero-coupon bond” or “rolling C.D.” (rather than a continuously compounded bank deposit account, which induces the risk-neutral measure):

$$
B(t) \equiv \frac{B(t,T_{n(t)})}{B(0,T_1)} \prod_{j=1}^{n(t)-1} \left[ 1 + \delta_j L_j(T_j) \right].
$$

He shows that, under this measure, the set of LIBOR forward rates can be modeled as a joint solution to the following set of SDEs: for $n(t) \le n \le N$,

$$
\frac{dL_n(t)}{L_n(t)} = \sigma_n(L_n(t), t)' \left[ \sum_{j=n(t)}^{n} \frac{\delta_j L_j(t)}{1 + \delta_j L_j(t)} \, \sigma_j(L_j(t), t)\, dt + dW^B(t) \right], \tag{54}
$$

where $W^B$ is a vector of standard and independent Brownian motions under $P^B$ and the possible state-dependence of the volatility function is also made explicit. These SDEs also have a recursive structure: starting at $n = n(t)$, $L_{n(t)}(t)$ solves an autonomous SDE; for $n > n(t)$, the drift of $L_n(t)$ is determined by $L_j(t)$, $n(t) \le j \le n$.

Under the LIBOR market model, the time-$t$ price of a security with payoff $g(\{L_j(T_n): n \le j \le N\})$ at $T_n$ is given by

$$
P_t = B(t,T_{N+1})\, E^*_t\!\left[ \frac{g(\{L_j(T_n): n \le j \le N\})}{B(T_n,T_{N+1})} \right] = B(t,T_{n(t)})\, E^B_t\!\left[ \frac{g(\{L_j(T_n): n \le j \le N\})}{\prod_{j=n(t)}^{n-1} (1 + \delta_j L_j(T_j))} \right], \tag{55}
$$
where $E^*_t[\cdot]$ denotes the conditional expectation operator under the terminal measure $P^*$ and $E^B_t[\cdot]$ denotes the conditional expectation operator under the spot LIBOR measure $P^B$. The Black model for caplet pricing, or the cap market model, is recovered under the assumption that the proportional volatility functions $\sigma_j(t)$ are deterministic.$^{12}$

12 The pricing equation (55) holds even when the proportional volatilities of the forward LIBOR rates are stochastic. Narrowly defined, the LIBOR market model refers to the pricing model based on the assumption that the proportional volatilities of the forward LIBOR rates are deterministic. Broadly defined, the LIBOR market model refers to the pricing model based on any specification of state-dependent proportional volatilities (as long as appropriate Lipschitz and growth conditions are satisfied).
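
The recursive structure of (53) and the pricing rule (55) can be illustrated with a small Monte Carlo sketch in Python. It simulates three forward LIBOR rates under the terminal measure and prices the caplet on the first rate, which should be close to the Black price (51)–(52) when volatilities are deterministic. The single Brownian factor, the constant volatilities, the log-Euler scheme with the drift frozen over each time step, the numerical inputs and all function names are assumptions made here for illustration; the chapter does not prescribe a discretization scheme.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_caplet(b_settle, fwd, strike, delta, v):
    """Black caplet price, Eqs. (51)-(52), for v > 0."""
    d1 = (math.log(fwd / strike) + 0.5 * v) / math.sqrt(v)
    d2 = d1 - math.sqrt(v)
    return delta * b_settle * (fwd * norm_cdf(d1) - strike * norm_cdf(d2))

def step_terminal(L, sigma, delta, dt, dW):
    """One log-Euler step of Eq. (53) for a single Brownian factor."""
    new_L = list(L)
    tail = 0.0   # sum over j > n of delta_j L_j sigma_j / (1 + delta_j L_j)
    for n in range(len(L) - 1, -1, -1):      # last rate first: its drift is zero
        mu = -sigma[n] * tail
        new_L[n] = L[n] * math.exp((mu - 0.5 * sigma[n] ** 2) * dt + sigma[n] * dW)
        tail += delta[n] * L[n] * sigma[n] / (1.0 + delta[n] * L[n])
    return new_L

def mc_caplet_terminal(L0, sigma, delta, b_first, strike, reset, n_steps, n_paths, seed):
    """Caplet on the first forward rate via Eq. (55) under the terminal measure."""
    rng = random.Random(seed)
    dt = reset / n_steps
    # B(0, T_{N+1}) implied by B(0, T_1) and the initial forward rates.
    b_terminal = b_first
    for d, L in zip(delta, L0):
        b_terminal /= 1.0 + d * L
    total, total_last = 0.0, 0.0
    for _ in range(n_paths):
        L = list(L0)
        for _ in range(n_steps):
            L = step_terminal(L, sigma, delta, dt, rng.gauss(0.0, math.sqrt(dt)))
        # Caplet value at the reset date divided by B(T_1, T_{N+1}).
        deflated = delta[0] * max(L[0] - strike, 0.0)
        for d, Lj in zip(delta[1:], L[1:]):
            deflated *= 1.0 + d * Lj
        total += deflated
        total_last += L[-1]
    return b_terminal * total / n_paths, total_last / n_paths

# Hypothetical inputs (assumptions for illustration only).
L0 = [0.05, 0.05, 0.05]
sigma = [0.20, 0.20, 0.20]
delta = [0.5, 0.5, 0.5]
b_first, strike, reset = 0.975, 0.05, 0.5     # B(0, T_1), strike k, reset date T_1

mc_price, mean_last = mc_caplet_terminal(L0, sigma, delta, b_first, strike,
                                         reset, n_steps=10, n_paths=20000, seed=12345)
black_price = black_caplet(b_first / (1.0 + delta[0] * L0[0]), L0[0], strike,
                           delta[0], sigma[0] ** 2 * reset)
print(f"Monte Carlo: {mc_price:.6f}, Black: {black_price:.6f}")
```

The loop in `step_terminal` runs from the last rate down to the first so that the running sum holds exactly the terms with $j > n$ when rate $n$ is updated; the sample mean of the last rate also gives a rough check of its martingale property under $P^*$.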

7.5. The swaption market model

According to Equation (55), the price of a payer swaption with expiration date $T_n$ and final maturity date $T_{N+1}$ is given by

$$
\begin{aligned}
P_{n,N}(t) &= B(t,T_{N+1})\, E^*_t\!\left[ \frac{\left[ 1 - B(T_n,T_{N+1}) - k \sum_{j=n}^{N} \delta_j B(T_n,T_{j+1}) \right]^+}{B(T_n,T_{N+1})} \right] \\
&= B(t,T_{n(t)})\, E^B_t\!\left[ \frac{\left[ 1 - B(T_n,T_{N+1}) - k \sum_{j=n}^{N} \delta_j B(T_n,T_{j+1}) \right]^+}{\prod_{j=n(t)}^{n-1} (1 + \delta_j L_j(T_j))} \right].
\end{aligned}
$$

Under the assumption of deterministic proportional volatility for forward LIBOR rates, the above expression cannot be evaluated analytically. In order to calibrate theoretical swaption prices directly to market-quoted Black volatilities for swaptions, a more tractable model for pricing European swaptions is desirable. Jamshidian (1997) shows that such a model can be obtained by assuming that the proportional volatilities of forward swap rates, rather than those of forward LIBOR rates, are deterministic. The resulting model is referred to as the swaption market model.

The swaption market model is based on the forward swap measure, $P^{n,N}$, induced by the price of a set of fixed cash flows paid at $T_{j+1}$, $n \le j \le N$, namely,

$$
B_{n,N}(t) \equiv \sum_{j=n}^{N} \delta_j B(t,T_{j+1}), \qquad t \le T_{n+1}.
$$

Under $P^{n,N}$, the forward swap rate $S_{n,N}(t)$ is a martingale:

$$
\frac{dS_{n,N}(t)}{S_{n,N}(t)} = \sigma_{n,N}(t)'\, dW^{n,N},
$$

where $W^{n,N}$ is a vector of standard and independent Brownian motions under $P^{n,N}$. Thus, the price of a European payer swaption with expiration date $T_n$ and final settlement date $T_{N+1}$ is given by

$$
P_{n,N}(t) = B_{n,N}(t)\, E^{n,N}_t\!\left[ \left( S_{n,N}(T_n) - k \right)^+ \right], \qquad t \le T_n. \tag{56}
$$

Under the assumption that the proportional volatility of the forward swap rate is deterministic, the swaption is priced by a Black–Scholes type formula:

$$
P_{n,N}(t) = B_{n,N}(t) \left[ S_{n,N}(t) N(d_1) - k N(d_2) \right],
$$

$$
d_1 \equiv \frac{\log \frac{S_{n,N}(t)}{k} + \frac{v_{n,N}}{2}}{\sqrt{v_{n,N}}}, \qquad d_2 \equiv \frac{\log \frac{S_{n,N}(t)}{k} - \frac{v_{n,N}}{2}}{\sqrt{v_{n,N}}},
$$

where $v_{n,N} \equiv \int_t^{T_n} \sigma_{n,N}(u)' \sigma_{n,N}(u)\, du$ is the cumulative volatility of the forward swap rate from the trade date to the expiration date of the swaption.
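
The swaption formula above can be sketched in Python as follows. The example builds the numeraire $B_{n,N}(t)$, the forward swap rate $S_{n,N}(t)$ and the Black-type payer swaption price from a set of discount factors. The flat 5% curve, the semi-annual tenor, the 15% swap-rate volatility, the dictionary layout keyed by tenor index and all function names are hypothetical choices made here for illustration.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def annuity(discount, delta, n, N):
    """B_{n,N}(t) = sum_{j=n}^{N} delta_j * B(t, T_{j+1})."""
    return sum(delta[j] * discount[j + 1] for j in range(n, N + 1))

def forward_swap_rate(discount, delta, n, N):
    """S_{n,N}(t) = (B(t,T_n) - B(t,T_{N+1})) / B_{n,N}(t)."""
    return (discount[n] - discount[N + 1]) / annuity(discount, delta, n, N)

def black_payer_swaption(discount, delta, n, N, strike, v):
    """Payer swaption price; v is the cumulative quantity v_{n,N}."""
    a = annuity(discount, delta, n, N)
    s = forward_swap_rate(discount, delta, n, N)
    if v <= 0.0:
        # No remaining uncertainty: intrinsic value in units of the numeraire.
        return a * max(s - strike, 0.0)
    sqrt_v = math.sqrt(v)
    d1 = (math.log(s / strike) + 0.5 * v) / sqrt_v
    d2 = d1 - sqrt_v
    return a * (s * norm_cdf(d1) - strike * norm_cdf(d2))

# Hypothetical inputs: discount[i] = B(0, T_i) with T_i = 0.5 * i, i = 1, ..., 7.
discount = {i: math.exp(-0.05 * 0.5 * i) for i in range(1, 8)}
delta = {j: 0.5 for j in range(1, 7)}
n, N = 2, 5                                   # expiry T_2 = 1.0, last payment at T_6
sigma_swap = 0.15                             # constant scalar swap-rate volatility
v = sigma_swap ** 2 * 1.0                     # v_{n,N} for constant volatility, t = 0

swap_rate = forward_swap_rate(discount, delta, n, N)
price = black_payer_swaption(discount, delta, n, N, strike=0.05, v=v)
print(f"forward swap rate: {swap_rate:.6f}, payer swaption: {price:.6f}")
```

For $n = N$ the swap has a single payment, $S_{n,n}(t) = L_n(t)$ and $B_{n,n}(t) = \delta_n B(t,T_{n+1})$, so the function reproduces the caplet formula (51)–(52).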
References Acharya, V.V., S.R. Das and R.K. Sundaram (2002), “Pricing credit derivatives with rating transitions”, Financial Analyst Journal 58:28−44. Ahn, C., and H. Thompson (1988), “Jump diffusion processes and term structure of interest rates”, Journal of Finance 43:155−174. Ahn, D.-H., and B. Gao (1999), “A parametric nonlinear model of term structure dynamics”, Review of Financial Studies 12:721−762. Ahn, D.-H., R.F. Dittmar and A.R. Gallant (2002), “Quadratic Gaussian models: theory and evidence”, Review of Financial Studies 15:243−288. Amin, K.I., and A.J. Morton (1994), “Implied volatility function in arbitrage-free term structure models”, Journal of Financial Economics 35:141−180. Andersen, L., and J. Andreasen (2001), “Factor dependence of Bermudan swaptions: fact or fiction”, Journal of Financial Economics 62:3−37. Anderson, L., and M. Broadie (2001), “A primal–dual simulation algorithm for pricing multi-dimensional American options”, Working Paper (Columbia University, New York). Andreasen, J., P. Collin-Dufresne and W. Shi (1997), “Applying the HJM-approach when volatility is stochastic”, Working Paper (Carnegie Mellon University, Pitsburgh, PA). Bakshi, G., and D. Madan (2000), “Spanning and derivative-security valuation”, Journal of Financial Economics 55:205−238. Balduzzi, P., S.R. Das, S. Foresi and R.K. Sundaram (1996), “A simple approach to three factor affine term structure models”, Journal of Fixed Income 6:43−53. Balduzzi, P., S.R. Das and S. Foresi (1998), “The central tendency: a second factor in bond yields”, Review of Economics and Statistics 80:62−72. Bansal, R., and H. Zhou (2002), “Term structure of interest rates with regime shifts”, Journal of Finance 57:1997−2048. Beaglehole, D.R., and M.S. Tenney (1991), “General solutions of some interest rate-contingent claim pricing equations”, Journal of Fixed Income 1:69−83. Bhar, R., and C. 
Chiarella (1997), “Transformation of Heath–Jarrow–Morton models to Markovian systems”, European Journal of Finance 3:1−26. Bielecki, T., and M. Rutkowski (2000), “Multiple ratings model of defaultable term structure”, Mathematical Finance 10:125−139. Bielecki, T., and M. Rutkowski (2001), “Modeling of the defaultable term structure: conditionally Markov approach”, Working Paper (Northeastern Illinois University; Warsaw University of Technology). Bjork, T., and B.J. Christensen (1999), “Interest rate dynamics and consistent forward rates curves”, Mathematical Finance 9(4):323−348. Bjork, T., and L. Svensson (2001), “On the existence of finite-dimensional realizations for nonlinear forward rate models”, Mathematical Finance 11(2):205−243. Black, F., and J. Cox (1976), “Valuing corporate securities: some effects of bond indenture provisions”, Journal of Finance 43:351−367. Black, F., E. Derman and W. Toy (1990), “A one-factor model of interest rates and its application to treasury bond options”, Financial Analysts Journal 1:33−39. Brace, A., and M. Musiela (1994), “A multifactor Gauss Markov implementation of Heath, Jarrow, and Morton”, Mathematical Finance 4:259−283. Brace, A., D. Gatarek and M. Musiela (1997), “The market model of interest rate dynamics”, Mathematical Finance 7:127−154. Brito, R., and R. Flores (2001), “A jump-diffusion yield-factor model of interest rates”, Working Paper (EPGE/FGV, Brazil). Buttler, H., and J. Waldvogel (1996), “Pricing callable bonds by means of Green’s function”, Mathematical Finance 6:53−88. Carverhill, A. (1994), “When is the short rate Markovian”, Mathematical Finance 4:305−312.
Chacko, G., and S. Das (2001), “Pricing interest rate derivatives: a general approach”, Review of Financial Studies 15:195−241. Chapman, D., and N. Pearson (2001), “What can be learned from recent advances in estimating models of the term structure”, Financial Analysts Journal 57:77−95. Chen, R., and L. Scott (1993), “Maximum likelihood estimation for a multifactor equilibrium model of the term structure of interest rates”, Journal of Fixed Income 3:14−31. Cheyette, O. (1994), “Markov representation of the Heath–Jarrow–Morton model”, Working Paper (BARRA, California). Collin-Dufresne, P., and R.S. Goldstein (2001a), “Do bonds span the fixed income markets? Theory and evidence for unspanned stochastic volatility”, Working Paper (GSIA, Carnegie Mellon University, Philadelphia, PA; Ohio State University, Columbus, OH). Collin-Dufresne, P., and R.S. Goldstein (2001b), “Pricing swaptions within the affine framework”, Working Paper (Carnegie Mellon University, Philadelphia, PA). Constantinides, G. (1992), “A theory of the nominal term structure of interest rates”, Review of Financial Studies 5:531−552. Cont, R. (1999), “Modeling term structure dynamics: an infinite dimensional approach”, Working Paper ´ (Ecole Polytechnique, Paris, France). Cox, J.C., J.E. Ingersoll and S.A. Ross (1979), “Duration and the measurement of basis risk”, Journal of Business 52:51−61. Cox, J.C., J.E. Ingersoll and S.A. Ross (1980), “An analysis of variable loan contracts”, Journal of Finance 35:389−403. Cox, J.C., J.E. Ingersoll and S.A. Ross (1985), “A theory of the term structure of interest rates”, Econometrica 53:385−408. Da Prato, G. (1992), Stochastic Equations in Infinite Dimensions (Cambridge University Press, Cambridge). Dai, Q., and K. Singleton (2000), “Specification analysis of affine term structure models”, Journal of Finance 55:1943−1978. Dai, Q., and K. Singleton (2003), “Term structure modeling in theory and reality”, Review of Financial Studies, forthcoming. Das, S., and S. 
Foresi (1996), “Exact solutions for bond and option prices with systematic jump risk”, Review of Derivatives Research 1:7−24. Duffie, D. (1996), Dynamic Asset Pricing Theory, 2nd Edition (Princeton University Press, Princeton, NJ). Duffie, D. (1998), “Defaultable term structure models with fractional recovery of par”, Working Paper (Graduate School of Business, Stanford University, Stanford, CA). Duffie, D., and M. Huang (1996), “Swap rates and credit quality”, Journal of Finance 51:921−949. Duffie, D., and R. Kan (1996), “A yield-factor model of interest rates”, Mathematical Finance 6:379−406. Duffie, D., and J. Liu (2001), “Floating-fixed credit spreads”, Financial Analysts Review 57(3):76−87. Duffie, D., and K. Singleton (1997), “An econometric model of the term structure of interest rate swap yields”, Journal of Finance 52:1287−1321. Duffie, D., and K. Singleton (1999), “Modeling term structures of defaultable bonds”, Review of Financial Studies 12:687−720. Duffie, D., and K. Singleton (2003), Credit Risk Pricing and Risk Management for Financial Institutions (Princeton University Press, Princeton, NY). Duffie, D., J. Pan and K. Singleton (2000), “Transform analysis and asset pricing for affine jumpdiffusions”, Econometrica 68:1343−1376. Duffie, D., D. Filipovic and W. Schachermayer (2001), “Affine processes and applications in finance”, Working Paper (Stanford University, Stanford, CA). Eom, Y. (1998), “An efficient GMM estimation of continuous-time asset dynamics: implications for the term structure of interest rates”, Working Paper (Yonsei University, Seoul, South Korea).
Evans, M.D. (2000), “Regime shifts, risk and the term structure”, Working Paper (Georgetown University, Washington, DC). Feller, W. (1951), “Two singular diffusion problems”, Annals of Mathematics 54:173−182. Goldstein, R.S. (2000), “The term structure of interest rates as a random field”, Review of Financial Studies 13:365−384. Gourieroux, C., A. Monfort and V. Polimenis (2002), “Affine term structure models”, Working Paper (University of Toronto, Canada). Gray, S. (1996), “Modeling the conditional distribution of interest rates as a regime-switching process”, Journal of Financial Economics 42:27−62. Gupta, A., and M. Subrahmanyam (2000), “An empirical investigation of the convexity bias in the pricing of interest rate swaps”, Journal of Financial Economics 55:239−279. Hamilton, J. (1989), “A new approach to the economic analysis of nonstationary time series and the business cycle”, Econometrica 57:357−384. Harrison, J.M., and S.R. Pliska (1981), “Martingales and stochastic integrals in the theory of continuous trading”, Stochastic Process and Their Applications 11:215−260. Harrison, M., and D. Kreps (1979), “Martingales and arbitrage in multi-period securities markets”, Journal of Economic Theory 20:381−408. Haugh, M.B., and L. Kogan (2001), “Pricing American options: a duality approach”, Working Paper (Sloan School of Management, MIT, Cambridge, MA). Heath, D., R. Jarrow and A. Morton (1992), “Bond pricing and the term structure of interest rates: a new methodology”, Econometrica 60:77−105. Ho, T.S., and S. Lee (1986), “Term structure movements and pricing interest rate contingent claims”, Journal of Finance 41:1011−1028. Huge, B., and D. Lando (1999), “Swap pricing with two-sided default risk in a ratings based model”, European Finance Review 3:239−268. Hull, J., and A. White (1993), “One-factor interest-rate models and the valuation of interest-rate derivative securities”, Journal of Financial and Quantitative Analysis 28:235−254. Inui, K., and M. 
Kijima (1998), “A Markovian framework in multi-factor Heath–Jarrow–Morton models”, Journal of Financial and Quantitative Analysis 33(3):423−440. Jagannathan, R., A. Kaplan and S. Sun (2001), “An evaluation of multi-factor CIR models using LIBOR, swap rates, and swaption prices”, Working Paper (Northwestern University, Evanston, IL). Jamshidian, F. (1987), “Pricing of contingent claims in the one-factor term structure model”, Working Paper (Merrill Lynch Capital Markets). Jamshidian, F. (1989), “An exact bond option formula”, Journal of Finance 44:205−209. Jamshidian, F. (1997), “Libor and swap market models and measures”, Finance Stochastics 1:293−330. Jarrow, R.A., D. Lando and S.M. Turnbull (1997), “A Markov model for the term structure of credit spreads”, Review of Financial Studies 10(2):481−523. Jeffrey, A. (1995), “Single factor Heath–Jarrow–Morton term structure models based on Markov spot interest rate dynamics”, Journal of Financial and Quantitative Analysis 30:619−642. Kennedy, D. (1994), “The term structure of interest rates as a gaussian random field”, Mathematical Finance 4:247−258. Landen, C. (2000), “Bond pricing in a hidden Markov model of the short rate”, Finance and Stochastics 4:371−389. Lando, D. (1998), “Cox processes and credit-risky securities”, Review of Derivatives Research 2:99−120. Langetieg, T. (1980), “A multivariate model of the term structure”, Journal of Finance 35:71−97. Leippold, M., and L. Wu (2001), “Design and estimation of quadratic term structure models”, Working Paper (Fordham University, New York). Leland, H. (1994), “Corporate debt value, bond covenants, and optimal capital structure”, Journal of Finance 49:1213−1252. Leland, H., and K. Toft (1996), “Optimal capital structure, endogenous bankruptcy, and the term structure of credit spreads”, Journal of Finance 51:987−1019.
Li, T. (2000), “A model of pricing defaultable bonds and credit ratings”, Working Paper (Olin School of Business, Washington University at St. Louis, MO). Longstaff, F. (1989), “A nonlinear general equilibrium model of the term structure of interest rates”, Journal of Financial Economics 2:195−224. Longstaff, F. (1993), “The valuation of option on coupon bonds”, Journal of Banking and Finance 17:27−42. Longstaff, F., and E. Schwartz (1992), “Interest rate volatility and the term structure: a two-factor general equilibrium model”, Journal of Finance 47:1259−1282. Longstaff, F., and E. Schwartz (1995), “A simple approach to valuing risky fixed and floating rate debt”, Journal of Finance 50(3):789−819. Longstaff, F., and E. Schwartz (2001), “Valuing American options by simulation: a simple least-squares approach”, Review of Financial Studies 14:113−147. Longstaff, F., P. Santa-Clara and E. Schwartz (2001), “Throwing away a billion dollars”, Journal of Financial Economics 62:39−66. Lu, B. (1999), “An empirical analysis of the Constantinides model of the term structure”, Working Paper (University of Michigan, Ann Arbor, MI). Madan, D., and H. Unal (1998), “Pricing the risks of default”, Review of Derivatives Research 2: 121−160. Merton, R. (1974), “On the pricing of corporate debt: the risk structure of interest rates”, The Journal of Finance 29:449−470. Miltersen, K.R., K. Sandmann and D. Sondermann (1997), “Closed form solutions for term structure derivatives with log-normal interest rates”, Journal of Finance 52(1):409−430. Morton, A. (1988), “Arbitrage and martingales”, Working Paper [Technical Report 821] (Cornell University, Ithaca, NY). Munk, C. (1999), “Stochastic duration and fast coupon bond option pricing in multi-factor models”, Review of Derivatives Research 3:157−181. Musiela, M. (1994), “Stochastic PDEs and term structure models”, Working Paper (University of New South Wales, Sydney). Musiela, M., and M. 
Rutkowski (1997a), “Continuous-time term structure models: a forward measure approach”, Finance and Stochastics 1:261−291. Musiela, M., and M. Rutkowski (1997b), Martingale Methods in Financial Modelling (Springer, Berlin). Naik, V., and M.H. Lee (1997), “Yield curve dynamics with discrete shifts in economic regimes: theory and estimation”, Working Paper (Faculty of Commerce, University of British Columbia, Vancouver, BC). Nunes, J.P.V., L. Clewlow and S. Hodges (1999), “Interest rate derivatives in a Duffie and Kan model with stochastic volatility: an Arrow–Debreu pricing approach”, Review of Derivatives Research 3:5−66. Pardoux, E. (1993), “Stochastic partial differential equations: a review”, Bulletin des Sciences Math´ematiques 117(1):29−47. Pearson, N.D., and T. Sun (1994), “Exploiting the conditional density in estimating the term structure: an application to the Cox, Ingersoll, and Ross model”, Journal of Finance 49:1279−1304. Piazzesi, M. (2001), “An econometric model of the term structure with macroeconomic jump effects”, Working Paper (University of California, Los Angeles, CA). Ritchken, P., and L. Sankarasubramanian (1995), “Volatility structure of forward rates and the dynamics of the term structure”, Mathematical Finance 5:55−72. Sandmann, K., and D. Sondermann (1997), “A note on the stability of lognormal interest rate models and the pricing of Eurodollar futures”, Mathematical Finance 7:119−125. Santa-Clara, P., and D. Sornette (2001), “The dynamics of the forward interest rate curve with stochastic string shocks”, Review of Financial Studies 14:149−185. Schonbucher, P.J. (1998), “Term-structure modeling of defaultable bonds”, Review of Derivatives Research 2:161−192. Singleton, K., and L. Umantsev (2002), “The price of volatility risk implicit in swaption prices”, Working Paper (Stanford University, Stanford, CA).
Stanton, R. (1995), “Rational prepayment and the valuation of mortgage-backed securities”, Review of Financial Studies 8:677−708. Steenkiste, R.J.V., and S. Foresi (1999), “Arrow–Debreu prices for affine models”, Working Paper (Salomon Smith Barney and Goldman Sachs, New York). Vasicek, O. (1977), “An equilibrium characterization of the term structure”, Journal of Financial Economics 5:177−188. Wei, J. (1997), “A simple approach to bond option pricing”, Journal of Futures Markets 17:131−160.
SUBJECT INDEX
alternative stabilization policies 911 alternative trading systems (ATS) 595 ambiguity 1102 ambiguity aversion 1074, 1075, 1082, 1083 American call 660, 661 American call, with dividends 661 American Depositary Receipts (ADRs) 1000 American Depository Receipts (ADRs) 278, 592 American Exchange (AMEX) 267, 297, 1170 American Free Banking Era 454 American option 1213 American option valuation 658 American put 661 American put option 657 American security 657 American security valuation 658 American Stock Exchange (AMSE) 589 AMEX-NYSE issuers 271 amortized spreads 1037 analyst recommendations 288 anchoring 1068 Anglo-American market-based system compared to Germany and Japan 42 announcement drift 389, 403, 410, 1089, 1093, 1094, 1097 announcement effects 261, 263–265, 274, 387, 395–398, 403, 410, 412, 438, 517, 1093 open market share repurchase 406 anomalies 779, 942, 1127 implied volatility anomalies 1174–1176 anonymity 587 anonymous traders 580 anti-takeover devices 46 classified boards 54 cross-shareholdings 54 enhanced voting rights 54 fair price amendments 20 staggered boards 20 super-majority amendments 20 super-majority requirements 54 voting right restrictions 54 anti-takeover laws 52
abandonment options 280 abnormal returns 270, 758 absolute priority 713 absorbing boundary 203 active traders 559 active trading 791 adapted process 643 additive utility model 647 adjusted present value 191 admissibility conditions 1218 adverse selection 117, 263, 288, 377–383, 408–410, 414, 415, 566, 567, 923, 1030, 1039 adverse selection costs 408 adverse selection effect 563 adverse-selection models 135, 339 adverse-selection problem 117, 141, 444, 445 affine function 691 affine jump-diffusions 1222 affine models 696, 709 characteristic function 696, 699 affine term-structure model 691 affirmative obligation 560 after-tax cash flows 189 after-tax corporate profits 898 after-tax pricing kernel 189 after-tax return 925 agency-based pecking order 241 agency concerns 315 agency conflicts 130 agency costs 217, 218, 226, 228, 233, 240, 398, 415 agency issues 316 agency problem 114, 247, 264, 339, 353, 383–386, 396–399, 762, 961, 969 top-level 140 agency theories 218, 239–243, 1125 aggregate risk 912 aggregation 620 allocation of shares 289 alpha 758, 773 alternative assumptions on preferences 911 alternative preference structures 910 I-1
I-2 appropriability problem 444 approximate factor structure 758 arbitrage 369, 371, 373, 611, 612, 644, 645, 666, 668, 670–672, 690, 817 absence of 614, 646, 655, 668 limits to 961, 1045, 1056–1065 arbitrage free 904 arbitrage pricing theory (APT) 170, 178, 183, 633, 754, 757, 758, 767, 994 international 994, 995 arbitrageurs 275 ARCH, see autoregressive conditional heteroscedastic (ARCH) models arithmetic average 891, 892 arithmetic mean 183 arm’s-length debt or equity 472 arm’s-length finance 463, 466 Arrow–Debreu model 610 Arrow–Debreu prices 1212 Arrow–Debreu problem 608 Arrow–Debreu security 608 Arrow–Debreu world 618 Asay model 1154 Asian crisis 8, 43, 1003, 1005, 1007, 1012, 1013 ask price 565 assessable common stock 316 asset allocation 422 asset payoffs 171 asset-price dynamics 1023 asset pricing 339, 340, 598, 621 fundamental theorem 614, 615 market effects 598 asset-pricing models 746, 942 6-factor 272 asset-pricing model-based estimates 181 consumption-based 753 multi-beta 754, 757 asset-pricing theory 170, 899 asset substitution 469 asset substitution effect 118 assets-in-place 233, 969 asymmetric response 387 asymptotic principal components 767 attention effect 1106 auction market 557 auctions 284 auto-correlation 1023 automated market 580 automation 587
Subject Index autoregressive conditional duration model 1032 autoregressive conditional heteroscedastic (ARCH) models 706, 808 availability 1068 average returns 265, 891 arithmetic 891 average annual real return 894 average real stock return 805 cross-section 1087–1098 determination 1091 geometric 891 seasoned equity offerings (SEO) 267
bailouts 30 balance sheets 217 bank capital crunch 134 bank capital requirements 529–531 bank coalitions 503 bank-existence theories 458, 459 bank failures 439 number of 520 bank fragility 492 bank lending channel 134, 135, 490, 491 bank-like intermediaries 434, 437, 447 bank loans 276, 433 bank monitoring 30 bank panics 436, 494–518, 1012 asymmetric information 509 Canada 500, 501 contagion 514, 516–518 definition 495 Great Depression 511–516 bank regulation 518, 519 bank relationships 440, 464 bank run 495 bank–industry ties 65 Banking Act of 1933 (USA) 529 banking crises 434, 495, 498 bankruptcy 119, 226, 227 costs 15, 218, 442, 717, 718 economies of scale 226 Bankruptcy Act 1978 (USA) 51 bank’s obligations 435 barrier options 1156 base rate neglect 1066 Basel Accord 493, 529–531 basis rate swap 1139 Bayesian shrinkage estimator 1177 beginning-of-quarter timing convention 814
behavioral finance 126, 401, 1053–1116, 1126 belief-based models 1092–1097 beliefs 1065–1069 preferences 1069–1075 behavioral theories 169 belief perseverance 1068 Bellman equation 749 benchmark 790 Berle–Dodd dialogue 7 Bermudan swaption 692 Bessel function 694 beta coefficient 754, 778 beta pricing 747, 794 beta-pricing approaches 784 beta-pricing model 756 betweenness-certainty-equivalent model 649 biased trading behavior 1026 bid and ask quotes 1026 bid price 564, 565 bid–ask bounce 1035 bid–ask spread 414, 553–599, 1029 block trading 577 components 572 cross-section variation 575 herding 578 big bang 592 binomial distribution 1158 binomial returns 651 Black model 1153 Black–Scholes (1973) partial differential equation 196 Black–Scholes formula 666, 676, 677, 690, 705, 1153 Black–Scholes implied volatilities 705 Black–Scholes/Merton formula 1150, 1151, 1153, 1154, 1174 call options 1152 Black–Scholes/Merton model 1150, 1154, 1155 Black–Scholes/Merton option valuation theory 1149–1166 Black–Scholes model bonds 665 option pricing 665 options 666 stocks 665 Black–Scholes–Merton stock option pricing paradigm 175 block sales 58 block trades 374, 1031
block trading 577 crossing the block 577 price impacts 578 work the order 577 blockholders 24, 64, 70, 84, 285 boards of directors 13, 31, 49, 71 appointments 59, 66 as monitors of managers 83 composition 32, 72 employee representation 38 independence 32 role of 32 two-tier boards 81 working of 72 bond market 596 bond spreads 596 bondholders 171 bonds 276 book-building 281, 284, 286, 289, 293 book-to-market 265 book-to-market ratio 269, 752, 968 book-to-market stocks 765 book value 765 borrowing 381 borrowing-constrained economy 921 borrowing constraints 14, 364, 891, 921, 923 borrowing-unconstrained economy 921 boundary condition 667 breach of trust 23 breadth of ownership 1096 brokers 556 Brownian model of uncertainty 662 space of adapted processes 662 stochastic integral 663 Brownian motion 695, 697, 698, 705–707, 712, 1210, 1218 arithmetic 1148 geometric 178, 202, 665, 983, 1148, 1149, 1151, 1166, 1182 Ito’s formula 668 log-normal 665 standard 662 budget-feasible consumption set 646 buffer stock 452 buffer-stock savers 833 business cycle, USA 859 business-cycle fluctuations 133 business judgement 18 business risk 222 buy-and-hold abnormal returns (BHARs) 962 buy-and-hold returns 268, 269
buy-sell imbalances 1035 buybacks 407
Cadbury Report and Recommendations 47 calendar-time portfolio method 965 calibration 688, 706 call-auction markets 558, 580 call-auction mechanism 1038 call market 579 call options, American-style 1144, 1151, 1174 early exercise 1144 lower price bound 1144 on dividend-paying stocks 1156 call options, European-style 666, 1149, 1151 Black–Scholes/Merton formula 1152, 1153 lower price bound 1143 valuation 1148 Call Reports 435, 508 callable bonds 724 canonical correlation analysis 1040 canonical model 1220 canonical model of asset pricing 208 capital allocation 169 internal 147 oil industry 147 within-firm 136 capital-allocation problem 114 capital-allocation process 174 capital asset pricing model (CAPM) 170, 181, 182, 190, 359, 395, 624, 633, 756, 767, 775, 831, 901, 914, 979, 1087 after-tax 364, 366 consumption-based 653 domestic 980, 981, 991, 993 global 980 high-beta shocks 923 international 995 mean absolute valuation errors 192 Merton’s intertemporal capital asset pricing model 178 of Lintner and Sharpe 178 risk and return relation 626 Roll critique 764 single-factor model 992 world 979, 980, 992–995, 997, 998 capital budgeting 340 capital budgeting practices 145 capital cash flows 191 capital crunches 134 capital expenditures 413, 414
capital flows 1004, 1005, 1007 barriers to 1005 capital gains 339, 347, 353, 359, 367, 372, 422, 926 long-term 347 tax rate 361 capital market definition 169 external 114 internal 114, 137, 140, 144, 152, 433 capital market equilibrium 174 capital market integration 12 capital-market line 200 capital markets, complete definition 353 capital rationing, internal 145 capital requirements 520 capital structure 54, 217, 339, 340, 1106 capital-structure effect 170 capital-structure irrelevance 217 capital structure policy 122 capital structure theories 217 tests of 217 “caplet” payments 692 carrying costs financial asset 1141 physical asset 1141 cash 385 cash distributions to equityholders 343 cash dividends 402 cash-flow non-linearities 195 cash-flow rights 50, 246 cash-flow volatility 179, 358 cash flows 339, 351, 382, 386, 412, 419 cash M&As 356 cash payments 421 cash reserves 413 cash settlement 704 catastrophic shocks 920 “catching up with the Joneses” 916–918 categories 1100 Cauchy–Schwarz inequality 675 Center for Research in Security Prices (CRSP) 893, 944 centralization 593 CEO agency problems 13 compensation 49, 85 monitoring 31 certainty effect 1072 certainty-equivalent approach 175, 197
certainty-equivalent cash flow 195 certainty-equivalent pricing 171, 195 certainty-equivalent valuation approach 197 certificates of deposit (CDs) 437 CEX database 919 Chicago Board of Trade (CBT) 1134–1136, 1138, 1157, 1167, 1171, 1189 Chicago Board Options Exchange (CBOE) 597, 1136, 1139, 1156, 1170, 1174, 1179, 1180, 1189, 1190, 1192, 1196 Chicago Mercantile Exchange (CME) 1135, 1167–1169 Chinese walls 260 chooser options 1156 CIR model (Cox, Ingersoll, Ross) 690, 691, 694, 698, 707 Citizens Utilities 401, 402 class-action suits 18 clearing and settlement 562 clearinghouses 454, 497, 501, 510, 512, 513, 1135 Clearing House of Montreal 503 clearinghouse loan certificates 502 clientele effect 363 CLOB (consolidated limit order book) 591 clock-time 1031 closed-end fund discount anomaly 957 closed-end fund puzzle 957 closed-end funds 1098, 1099 co-movement 1099–1101 codetermination 38, 81 coefficient of loss aversion 1073 coefficient of relative risk tolerance 982 coefficient of risk aversion 906, 907, 913 relative 806, 903, 906, 913 Coffee, Sugar and Cocoa Exchange (CSCE) 1138 cold-hands phenomenon 958 collateral 15, 718 collateral constraints 651 collective action problem 4 commissions 292, 590 soft dollars 590 commitment to trade 592 Commodity Exchange (COMEX) 1135 common agency problem 13 common bank lender 518 common stock 226 comparable firm multiples 280 compensated counting process 719 compensation 419
compensation package 240 competition from satellite markets 593 complementary-slackness condition 683 complete contracting possibilities 353 complete contracts 240 complete markets 351–353, 359, 368, 611, 618, 619, 630, 650, 653, 657, 677, 918 complete-markets model 831 completing markets 329 computation of equilibrium 651 computational models 232, 233 conditional alpha 787 conditional beta 756 conditional efficiency 763 conditional equity premium 928 conditional market timing 792 conditional multiple-beta model 789 conditional performance attribution 791 conditional performance evaluation 748, 785 conditional regression coefficients 789 conditional theories 247 conditional weight measure (CWM) 790 conditionally efficient 760 conditioning information 747, 768, 770, 772 confirmation bias 1068 conflicts of interest 228, 407 Connor–Korajczyk approach 765 conservatism 1067, 1092, 1093 consol 712 inflation-indexed 841 real consol bonds 843, 847 consolidated quote system (CQS) 561, 592 consolidated trade system (CTS) 561, 592 constant absolute risk aversion (CARA) 631 constant-dividend-yield stock options 1153 constant elasticity of variance (CEV) model 1165 constant relative risk aversion (CRRA) 630, 903, 931 preferences 899 constant relative risk condition (CRRC) 178, 180, 185, 194, 197 constant risk premium 176 consumer heterogeneity 921 consumer price index (CPI) 953 consumption 354, 679 consumption-based asset pricing 806, 1024, 1041 consumption capital asset pricing model (CCAPM) 622 consumption good 979
consumption model 754 consumption opportunity set 982 consumption process 646 contagion 495, 507, 516, 517, 978, 1010–1014 contagion trading 1009 contingent claims 922 contingent orders 588 continually compounding short rate 665 continuous auction market 558 continuous-branching process 698 continuous log-normal distribution 1158 continuous market 580 continuous-time economy 176 continuous utility model 646 continuously compounding yield 687 contract expirations 1193 contract introductions 1189 contracting 339 contractual failures 40 contrarian effect 949 control 4 control rights 119, 136, 138, 246 convenience yield 198, 1141 convergence 704 convertible bonds 276, 693, 719, 724 convexity bias 1231 convexity relation 1169 core capital 530 core deposits 458 corporate charter 13 corporate democracy 5 corporate discount rates 169 corporate earnings 342 corporate feudalism 5 corporate finance 968 effect of irrationality 1106–1112 corporate financing actions, reactions to 274–279 corporate governance 4, 433 codes 47 comparative 4 employees 37 Germany 33, 35, 38, 40, 42, 43, 58, 59, 65, 71, 73, 81, 84 Japan 9, 29, 30, 42–44, 46, 60, 62, 65, 67, 73, 75, 81, 84 less-developed countries 294 world-wide convergence 46 corporate Jacksonians 7 corporate law 4 corporate suffrage 5
corruption 288 cost of capital 43, 175, 180, 189, 339, 351, 383, 895 effects of taxation 170 weighted average 190, 219 cost-of-carry formula 703 cost-of-carry relation 1142, 1167 cost of debt 43 cost of equity 43, 181 effects of taxation 170 cost of equity capital 170, 180, 181, 185, 766 estimation 180 costless arbitrage profit 1139 costly-external-finance models 122, 129 costly state verification 441 costly-state-verification models 119 costs of illiquidity 946 counter-cyclical conditional variance 922 counterparty risk 228, 1135 counting process 719 country funds 1000 court decisions 319 covariance ratio statistic (CVR) 1008 covenants 228, 316, 462 credible signal 381 credit crunch 434, 493, 530, 531 credit cycles 436 credit default swaps 459 credit derivatives 436, 724 credit line 456 credit quality 465 credit rationing 117, 464 credit risk 233 credit risk channel 490 cross-correlation 1023 cross-listing 592 cross-sectional regression 774, 775 cross-sectional regression coefficients 778 cross-sectional regression model 780 cross-subsidization 149, 150 cultural changes 1125 cum-day 359 cum-dividend 670 cum-dividend security price 643 cumulative abnormal returns (CARs) 961 cumulative standard normal distribution function 666 currency market 596 currency swap 1139 customer relationship 448
daily dollar volume 577 daily return variance 577 data-mining 1090 data recording problems 1173 data snooping 943 daytime returns 579 de novo banks 488, 489 deadweight costs 120 of external finance 119 dealer behavior 1023 dealer market 558 dealers 556 debt average ratios 231 contracts 228 convertible 236 covenants 227 debt financing 117 debt overhang 15, 227, 228 long-term debt 225 optimal ratios 220 perpetual 222 ratio 191, 217 renege on a debt commitment 384 restructuring 439 straight 236 tax benefits 218, 223, 224 declining return on assets 397 on capital expenditures 397 on cash 397 default 226–228, 233, 241 default-adjusted short-rate process 725 default premium 893 default recovery 723 defaultable forward rates 1234 defined benefit 925 defined contribution 925 defined-contribution plans 925 deflated gain process 645 deflator 645, 670 delegated monitoring 29, 65 delivery arbitrage 703 delta-hedging 1173 demand deposits 434 density process 655 deposit insurance 30, 436, 511, 512, 518–532 Depository Institutions Deregulation and Monetary Control Act of 1980 522 depth 588 deregulation 12, 404, 419
derivatives 1129–1199 carrying costs 1140, 1141 cash settlement 1136 centralized marketplace 1134 centralized markets 1134 credit risk 1134 forward transactions 1134 market innovations 1136 over-the-counter (OTC) derivatives 1133–1135, 1139, 1156, 1157, 1171, 1176, 1184 over-the-counter (OTC) market 70 over-the-counter (OTC) Treasury bond option 1139 reverse trading 1135 social costs/benefits of derivatives trading 1189–1198 tulip bulbs 1134 derivatives pricing 1231–1241 deterministic volatility functions (DVF) 1164, 1182 Deutsche Boerse (Germany) 592 diff swaps 693 diffusion process 664 Dimensional Fund Advisors (DFA) 944 Dirac measure 694 directional derivative ∇U(c∗; c) 646 director and officer (D&O) liability insurance 19 disaster state 899 discount factor 184 discount factor parameter 912 discount rate 383 discounted cash-flow (DCF) 185 analysis 170, 195 approach 170, 181, 194 estimates 188 method 181 discrete-state Markov model 844 dispersed ownership 4, 17 disposition effect 1104 distortionary taxes 189 distress costs 718 distress risk 968 diversification 1101 diversification discount 146, 148 dividend capture trading 375 dividend changes 277, 350, 388, 393 dividend discount model 185 dividend forecast error 417 dividend growth rate
constant 185 time-varying 185 dividend initiation 1089 dividend irrelevance theorem 341 dividend omission 366 dividend payments 45 dividend payout 186 dividend policy 341, 351, 362, 417 dividend-price pair 644 dividend process 643 dividend puzzle 349 dividend recapture 1044 dividend signaling models 383 dividend smoothing 349 dividend tax effect 366 dividend yield 345, 358, 362, 364, 367, 400, 420, 752, 793, 954 dividend yield effect 970 dividends 348, 354, 356, 903, 1109 DOT (designated turnaround system) 562 Dothan (log-normal) short-rate model 694 doubling strategy 671 drift process 664 drift restriction 700 dual board system 33 dual class stock 56 duality techniques 686 Dutch auction 404–406 Dutch “structural regime” 59 dynamic clientele models 359 dynamic information acquisition 286 dynamic portfolio insurance 1154 dynamic programming 831 dynamic spanning 650 dynamic strategies 368, 789 dynamic term-structure models (DTSM) 1209, 1215–1232, 1234, 1238 affine models 1219–1221 derivatives pricing 1231, 1232 for default-free bonds 1215–1222 jump diffusions 1222 multi-factor models 1218–1222 one-factor models 1215–1218 quadratic Gaussian models 1221, 1222 rating migrations 1225–1231 regime shifts 1223–1225 yield-based 1209 early exercise 660 earnings 377, 379, 386, 390, 392, 1125 dividends as predictor 390
mean reversion 390 momentum 390 surprises 393 earnings announcements, market reactions to 274 Easdaq 294 economic variables 765 economies of scale 593 economies of scope 479 effective half-spread 572 effective risk aversion 918 effective spread 1027 efficiency in finance 620 efficient financial markets 746 efficient frontier 625 efficient-market theories 126 efficient markets hypothesis (EMH) 1056 efficient portfolio 619, 772 efficient portfolio bounds 771 efficient price 1026 efficient-set constants 771 ETFs (exchange traded funds) 322 eigenvalue 758, 765 electronic communications system (ECN) 585, 590 Ellsberg paradox 1074, 1082 embedded options 1167 emerging markets 247 empire building 121, 129–131, 137, 141, 145, 242, 264 empirical factors 765 employee cooperative 39 Employee Retirement Income Security Act (ERISA) 9, 377, 399 endowment process 646 enforceable contracts 353 Enron 13, 35, 42, 47, 69, 83, 85, 228, 309 enterprise value 280 entrenching investments 240 entrenchment behavior 23 Epstein–Zin utility 1126 Epstein–Zin–Weil Euler equation 845 Epstein–Zin–Weil framework 832 Epstein–Zin–Weil model 828, 831, 846, 848, 872 Epstein–Zin–Weil objective function 828 Epstein–Zin–Weil utility 841, 848 equally weighted portfolio 892 equilibrium 650 indeterminate 651 equilibrium pricing model 966
equilibrium risk premium 987 equities markets 553–599 equity basis swap 1139 equity-bond premium puzzle 827 equity carveouts 276, 298 equity flows, cross-country 977 equity issues 354 equity offerings 132 equity premium 43, 806, 891, 905, 921, 925, 927, 1075 historical-based estimates 183 history 891–898 log 906 taxation 891 equity premium puzzle 769, 806, 808, 816–832, 867, 878, 891, 911, 918, 927, 967, 1076, 1078–1083 ambiguity aversion 1082, 1083 original analysis 930 prospect theory 1079–1082 equity-risk premium 169, 170, 183 Fama–French model 188 equity share 903 equity swaps 317, 319, 1139 equity volatility puzzle 807, 808, 840 equivalent martingale measure 646, 654 uniqueness 675 ERISA, see Employee Retirement Income Security Act (ERISA) errors-in-betas 780 Euler equation 750, 787, 831, 867, 919, 1076 Epstein–Zin–Weil 845 stochastic 647, 912 Eurex 592 Euronext 592 Europe 27 shareholder suits 70 European Company Statute (ECS) 35 European option 1232 European-style derivatives 1231 European swaptions 1239, 1241 European Union 12 European Works Council 81 event studies 1089, 1090 evergreen plans 77 ex-ante efficiency 13, 19, 28 ex-ante equity premium 927 ex-day 359 ex-day price anomaly 375 ex-day trading 376 ex-dividend day 359, 363, 369, 402
ex-dividend security price 643 exact factor structure 758 excess cash 397, 415 excess volatility 358, 967 excessive trading 1103 exchange options 1157 exchange rate dynamics 983 exchange traded funds (ETF) 322 exclusivity provisions 451 execution 562 executive compensation 33, 73 pay–performance sensitivity 75 sensitivity of pay 74 executive stock options 83, 415 exercise, American calls 660 exercise opportunities 1157 exercise policy 657 exercise price 657 expectations 386 expectations theory of interest rates 178 expected inflation 952 expected returns 262, 748 expected risk premiums 755 expected utility 1069 subjective 1074 experimental economics 1046 expiration time 657 exponential-affine form 709 external auditors 13 external equity financing 125 external financing 257 banks 434, 437 external habit models 866 extrapolative expectations 1094
factor loadings 170, 181 factor (market) risk premia 181 factor-mimicking portfolios 756 factor model 633 factor risk premiums 181, 754, 966 fair price 658 Fama–French (1992) asset-pricing regression 1043 Fama–French factors 183 Fama–French model 391, 966 equity risk premiums 188 Fama–French momentum factor 949 Fama–French regressions 269, 393 Fama–French three-factor model 181, 182, 389, 395, 766–768, 948, 957, 958, 963, 1091
Fama–MacBeth approach 776 Fama–MacBeth coefficients 778 Fama–MacBeth procedure 778 Fama–MacBeth regressions 393 familiarity 1102 family-owned firms 35 Federal Deposit Insurance Corporation (FDIC) 521 Federal Deposit Insurance Corporation Improvement Act (USA) 529 Federal Reserve Board’s Flow of Funds accounts 844 Federal Reserve System (USA) 496, 512 Feller condition 1215 Feynman–Kac theorem 1211 fiat money 453 Fidelity 283 fiduciary duties 5, 49 filtration 643, 721 financial assets 746 financial crisis 495 financial development 136 financial distress 15, 224, 231 costs of 222, 226–229, 241 indirect costs of 226–229 financial flexibility 339 financial innovation 221, 307–331 benefits accruing to innovators 325 cost minimization 317 definition 310–313 evolution 324 financial innovation spiral 324 functional approach 313 globalization 320 history 311 investment banking 325 legal engineering 320 Liberty Loan program (1917) 317 Merton Miller (1986) 309 Million Adventure (1694) 312 process innovations 310, 317 processes of diffusion 326 product innovation 310 response to taxation and regulation 318 risk perception 320 role of failure 324 size of innovator 325 social welfare implication 327–329 State Street Bank v. Signature Financial 330 taxonomy 311–313
technological shocks 321 volatility of financial markets 320 wealth impacts 327 welfare effects in incomplete markets 329 financial intermediation 431–534 bank-like financial intermediaries 434 financial leases 225 financial liberalization 1005 financial-market imperfections 133 Financial Services Modernization Act of 1999 (USA) 532 financially distressed companies 115 financing strategy 217, 244 effect of taxes 225 financing tactics 217, 244 tax-driven 225 finite-horizon growth model 186 finite-sample adjustment 772 firm investment decisions 1106 firm-specific human-capital investments 37 firm systematic risk 193 firm value 380, 384 first-come–first-served rule 505 first-day returns 281, 284, 290, 291, 293, 294 first-mover 593 first-order Taylor expansion 837 First Welfare Theorem 650 fixed-income derivatives, pricing 1209 fixed-income pricing 1207–1241 fixed-income securities (FIS) 686, 1209, 1210, 1212, 1213, 1218 with deterministic payoffs 1211 with state-dependent payoffs 1212 with stopping times 1213 fixed-price offer 281 fixed-price tender offer 405, 412 flight to quality 1046 floating payment 1213 focus lists 68 Fokker–Planck equation 694 follow-on stock offerings 259, 263 foreign currency options 1154 foreign exchange-rate risk 984 forward contracts 1213 at full carry 1142 basis 1142 value 1141 forward/futures price relation 1168 forward Kolmogorov equation 694 forward-measure approach 701 forward-rate agreement (FRA) 692
arrears 692 at market 692 interest-rate swap 692 forward rates 699 forward risk premium 987 forward swap measure 1241 Fourier transform 710 fragmentation 594 framing 609, 1073, 1126 free-cash-flow problem 407 free-cash-flow theory 243 Jensen 243 free cash flows 15, 383, 394, 396, 414 free float 64 free lunch 985 free-rider problems 439 free-riding 137 free-trading option 566, 580, 587 frictionless capital market 912 frictionless market 1140 Fubini’s theorem 700, 710, 723 full-information risk-sharing equilibrium 449 fundamental pricing equation 910 fundamental solution 693, 694 fundamental value of corporate equities 926 futures 373 futures, interest rate 693 futures and forwards 692 futures contracts 1135 energy 1136 interest rate 1136 marked-to-market 1142 valuation 1143 futures markets 597 scalpers 597 futures options 1153 futures-position process 704 futures-price process 703 futures-style futures options 1153 gain 662 gain process 645, 664 gains and losses 1070, 1073 gamble for resurrection 36 game-theoretic models 325 GARCH, see generalized autoregressive conditional heteroscedastic (GARCH) models Garman–Kohlhagen model 1154 Garn–St. Germain Depository Institutions Act of 1982 522
Gauss–Wiener processes 176 Gaussian model 691, 697 Gaussian short-rate models 688, 690 General Electric 144 general equilibrium theory 642 generalized autoregressive conditional heteroscedastic (GARCH) models 706, 858, 993, 994, 999, 1010 GARCH-based volatility 1181 generalized expected utility (GEU) 913 generalized method of moments (GMM) 747, 774, 858 asymptotically efficient 774 asymptotically normal 774 consistent 774 geometric average 892 geometric mean 183 German corporations 38 German universal banks 452 Gilded Age 6 Girsanov’s theorem 661, 673, 675–677, 681, 712 glamour stocks 1088 Glass–Steagall Act 65, 257, 258, 511, 532 global factors 977 global markets 592 golden parachute 23, 33 good governance 55 Gordon approach 184 Gordon growth model 395 Great Depression 134, 439, 495–497, 500, 501, 508–516, 520, 532, 824 Canada 515 Green Shoe option 292 greenmail 405 Green’s function 694, 1212, 1217, 1231 gross spread 291 growth 395 growth and real business cycle theory 903 growth model 912 growth opportunities 222, 230, 233, 422 growth-optimal portfolio 675 growth options 969 growth rate of consumption 906 continuously compounded 906 variance 906 growth stocks 1092 habit formation 809, 866, 914, 918 external 914 internal 914
habit-formation utility 647, 648 habit-formation utility model 648 Hamilton–Jacobi–Bellman equation 678, 686 Hansen–Jagannathan bounds 768 Hansen–Jagannathan distance measure 773 Hansen–Jagannathan lower bound 910 Hansen–Jagannathan volatility bounds 875 Heath–Jarrow–Morton model 699 hedge funds 958 hedge portfolios 989 hedging demand 848 herding 123, 132, 578 Heston model 707, 709, 711 heterogeneous beliefs 1095 high-contact condition 204 high-order-contact condition 715 highly leveraged companies 226 highly leveraged transactions (HLTs) 229 historical volatility 1181 holding companies 6 holdup problem 37 home bias 977, 997–1004, 1101, 1102 hedging against inflation risk 1002 home bias puzzle 977 hostile stakes 58 hostile takeover 16 hot hand 1067 hot-issue markets 293 house money effect 1086 household consumption growth 919 housing market 1105 hubris hypothesis 1112 hurdle rate 262 i.o.u.’s 453 Ibbotson Associates Yearbooks 894 idiosyncratic income risk 899 idiosyncratic income shocks 918, 920 idiosyncratic risk 626, 1024 illiquidity 968 imbalance 577, 579 implementation costs 1059 implicit incentives 78 implied binomial tree 1165 implied-tree model 706 implied volatility 1185, 1188 Black–Scholes 705 information content 1179–1181 weighted average 1179, 1181 implied volatility function 705, 1181 Inada conditions 608, 684
incentive compensation 418 incentive-driven theory of capital structure 244 incentive plans 418 income bonds 316 income shocks 918–920, 922 incomplete contracting 119 incomplete contracts 14, 342, 383–386, 492 incomplete markets 172, 314, 329, 651, 686, 911 index funds 322 index inclusions 1063, 1064 individual investors 356 industry risk loadings 182 inefficiencies 53 infinite horizon 921 infinite horizon models 918 inflation-adjusted return 894 inflation illusion 877 information 386, 389, 421 content of implied volatility 1179–1181 effects on long-run asset returns 1041 price adjustment to 1029 information asymmetries 114, 115, 135, 174, 233–235, 313, 315, 316, 339, 342, 377–386, 398, 408, 412, 422, 444–448, 563, 566, 569, 572, 584, 1023, 1029, 1030, 1043 bank panics 509 information-based theories 1030 information costs 409 information set 353 information/signaling hypothesis 413 informational cascades 289 informational content 387 informationally efficient 126 informationally efficient markets 767 informed investors 581 informed traders 559, 566 infrequent trading issue 1168 initial inventory 565 initial public offerings (IPO) 43, 256–260, 262, 263, 272, 279–299, 355, 356, 403, 946, 959, 1064, 1181 abnormal returns 295 allocating and pricing 281, 284 allocation 960 as a marketing event 290 auctions 284 cycles 960 Europe 294 first-day returns 281
flipping 960 hot-issue markets 293 innovation 321 lock-up period 298 long-run performance 295–299 post-issue returns 297 price support 292 quiet period 298 returns 962 short-run underpricing 281 spinning 960 stabilization 292 supply and demand shifts affecting the price 298 systematic risk 298 underperformance 299 underpricing 286–291, 300, 959 corruption 288 dynamic information acquisition 286 informational cascades 289 IPO as a marketing event 290 lawsuit avoidance 290 prospect theory 286 signalling 290 winner’s curse 288 underwriter compensation 291 uniform price mechanisms 281 volume fluctuation 293 insider-trading laws 50 insolvency risk 488 “instantaneous” expected rate of return 665 institutional constraints 342, 408 institutional factors 1010, 1189 institutional investors 283 institutional or regulatory constraints 217 institutional ownership 377 institutional shareholder activism 29 intangible assets 222, 229, 439 intangible capital 926 integrated markets 977 intellectual property 329 intensity, risk-neutral 721 inter-corporate dividends 224 inter-dealer brokers 596 inter-dealer market 596 inter-temporal portfolio selection model 1036 interest rate swap 692, 1139 interest rates, negative 690 interest tax shields 222–225, 231 interim trading bias 790, 791 interlocking directorates 65
intermarket trading system (ITS) 592 intermediation theory 440 internal capital 408 internal funds 349 internal habit models 866 internal rate of return (IRR) 174 international arbitrage-pricing model 992 international asset-pricing model (IAPM) 992 international investment barriers to 997–1005 International Monetary Fund 497 International Monetary Market (IMM) 1136 international security market line 993 internet bubble 284, 294, 298, 300, 1125 internet carve-outs 1064 intertemporal choice problem 903 intertemporal marginal rate of substitution 750, 751, 753, 807, 925 intertemporal optimization 912 intertemporal pooling 289 intertemporal substitution effect 870 intertemporal substitution elasticity 821, 828, 841, 880, 903, 913, 914 intra-day seasonalities 1032 intra-day transaction data 1196 invariant transformations 1220 inventory 570 inventory effects 1027, 1028, 1030 inventory risk 563–566 inventory theories 1030 investment 379, 396, 411, 422, 607 investment appraisal 169 investment bankers 291 investment banking 255–300 investment behavior 115 Investment Company Act of 1940 (USA) 399 investment decisions 420 investment horizon 360 investment opportunities 383, 397, 413 investment opportunity set 230 investment outcomes, within-firm 137 investment policy 169, 341, 352 investment restrictions 793 investment returns 339 investment spending 339 investment with constraints 686 investor behavior 400, 403, 1101–1106 investor optimization 749 investor preferences 371 investor sentiment 957 IPO, see initial public offerings (IPO)
irrational expectations 809 irrational investor behavior 963 irrationality 876–879, 1095, 1126 effect on corporate finance 1106–1112 managerial irrationality 1111, 1112 irrationally exuberant 879 Islamic prohibition 320 isoelastic preferences 913 isomorphic equilibrium 918 Ito process 663, 665, 669, 670, 673, 679, 695, 699, 704 H² 670 unique decomposition property 664, 667 L¹ 663 L(S) 664 L(X) 669 Ito’s formula 664, 667–669, 674, 676, 680, 700, 714, 715 D 669 Ito’s lemma 177, 196, 664, 1150, 1158, 1211, 1218, 1224 Jensen’s alpha 272, 626, 944, 945 Jensen’s inequality 184, 631, 818, 819, 827, 874, 984, 986 job-rotation policy 144 joint hypothesis 767 joint hypothesis problem 1061 jointly log-normally distributed 917 junk bonds 227 just-in-time 43 k-fund separation 632 401(k) plans 69 Kalman filter 202 Kansas City Board of Trade (KCBT) 1167 keiretsu 44, 65, 440, 464 knife-edge solution 909 knockout options 657 Kreps–Porteus utility 648 Kyle model 581 Kyle’s λ 1039, 1042, 1045 labor-market reputations 122 laddering 284 lagged instruments 752, 767 Lagrange multiplier 683, 684 Lagrange multiplier test 782 large creditors 29 large investors 4
1136,
large markets 757 large shareholders 4, 50 law of one price 904, 979, 983, 989, 1139 law of small numbers 1067, 1092 lawsuit avoidance 290 lead manager 259 lean production 43 legal origin 45 correlation with size of stock markets 45 legal systems 45 legal-tender bonds 321 lender-of-last-resort 495, 502, 512 letters of credit 434 leverage 43, 229, 230, 962 leverage-decreasing transactions 261 leverage-increasing transactions 261 leveraged buyouts (LBOs) 227, 242 leveraged restructurings 242 Lévy inversion formula 710 LIBOR, see London Interbank Offering Rate likelihood ratio test 782 limit order 558 limited liability 39, 220, 713 limited participation 911 linear pricing rule 614, 616 linear risk tolerance (LRT) 631 links among markets 595 liquidity 286, 377, 380, 381, 400 asymmetries 1023 beta 1040 effects on long-run asset returns 1036 premium 891, 1036 ratio 1038 runs 1007 shock 455, 1029 traders 454, 559 livestock 1136 loan commitments 434 loan structure 469–474 local expected utility 649 local factors 977 local volatility rate 1182 lock-up period 280, 298 logarithmic preferences 914 London Interbank Offering Rate (LIBOR) 693 London Interbank Offering Rate (LIBOR)-based plain-vanilla swaps 1230 London Interbank Offering Rate (LIBOR) model 1136, 1210, 1230, 1237–1240
London International Financial Futures Exchange (LIFFE) 1154 London Stock Exchange (LSE) 59, 592 long-run returns 963 long-run shareholder value 122 Long-Term Capital Management (LTCM) debacle 517, 530 long-term large investor models compared to Anglo-American market-based system 42 long-term reversals 1087, 1093, 1094, 1097 lookback options 1156 looting 523, 525 loss aversion 1071, 1079, 1097 loss of reputation 442 losses to informed 575 lower-level agency 140 Lucas’ (1978) pure exchange model 902 LYONs (liquid yield option notes) 317 macroeconomics 879 main bank (hausbank) relationship 464 manager–shareholder agency conflict 124 manager–stockholder agency conflict 120 managerial agency problem 14 managerial corporation 6 managerial discretion 5 managerial entrenchment 23 managerial “hubris” 125 managerial irrationality 1111, 1112 managerial overoptimism 274 managerial ownership 64 managerial quiet life 131 mandatory regulatory intervention 16 Manning Rule 559 marginal-rate-of-substitution process 645, 647, 671 marginal rates of substitution 354, 642 marginal utility 899 marginal utility of wealth 751 margining system 1135 market capitalization 377, 400 market closing volatility 1032 market completeness 918 market efficiency 389, 1056–1058 tests of 273 market expectations 393, 394 market for corporate control 7 market frictions 376 market imperfections 217, 678, 911 market incompleteness 922
market inefficiency 942 market integrity 1135 market makers 556, 1025 market microstructure 1023 market microstructure theory 1125 market opening 581 market-opening volatility 582, 1032 market order 558 market perception of dividend-changing firms 388 market portfolio 364, 764 market-price-of-risk process 673 market risk 626 market risk premium 179, 981 market synchronization 1194 market timing 262, 785 market-timing models 792 market-to-book ratio 230 market underreaction 403, 415 market value 577 market-value balance sheet 218 market value of equity 765 market volatility 794 marketed subspace 644 Markov behavior 1030 Markov process 902 martingale 177, 567, 643, 645, 663, 671, 672, 674, 675, 681, 696, 1026, 1223, 1238, 1239, 1241 forward measure 689 local measure 672 theory of 170 unique equivalent martingale measure 657 martingale approach to asset valuation 196 martingale approach to optimal investment 678 martingale pricing 171, 195 martingale pricing theory 196 martingale representation 616 martingale representation property 677 martingale representation theorem 673 material-adverse-change clause 464 maximum likelihood 365, 781, 784 maximum risk-premium for a cash flow 179 mean-reversion parameter 688 mean-reverting debt ratios 237 mean-reverting process 1027 mean-reverting stock returns 927 mean standard-deviation boundaries 763 mean-variance efficiency 756, 757, 761 mean-variance optimization 612
mean-variance theory 624 measurement error 376, 1026 mechanisms of control 241 mental accounting 1073, 1110 mergers 12 mergers and acquisitions (M&As) 277, 340, 354–356 Merton model 1153 Merton’s problem 678, 682 Microsoft 129, 340 microstructure effects 1026 mimicking portfolios 778 minimum-variance portfolios 759, 764, 781 well-diversified 759 minority shareholders 46 mis-measured trading costs 1173 misspecified model 272 modified probability distributions 911 Modigliani and Miller’s leverage-irrelevance theory 221 Modigliani–Miller 258 Modigliani–Miller (1958) paradigm 127 Modigliani–Miller theorem 130 Modigliani–Miller value-irrelevance propositions 218–221 momentum 767, 1088, 1093–1095, 1097, 1105 momentum effect 949, 957, 958, 1034 momentum trading 1009 money, as a security with no dividends 690 money illusion 1085 money left on the table 284 moneyness bias 1174–1176 monitoring 384, 385, 397, 469–474, 507, 520 banks 440–444, 459, 463–468 blockholders 285 by banks 65 by shareholders 240 effect of shareholder suits 70 firms monitored by banks 433 monitoring-the-monitor problem 441–443 Monte Carlo simulation 1162 Moore’s law 1164 moral hazard 30, 117, 313, 315, 436, 453, 455, 458, 470, 494, 518–529, 923 thrift industry 522 more-money effect 140, 142 mortgage-backed securities 693 multi-beta pricing 757 multi-collinearity 1035 multi-constituency models 35
multi-factor asset-pricing models 178, 181 multi-factor models 183, 789 multi-factor term-structure model 695 multiple constituencies 14, 80 debtholders 80 employees 81 multiple-factor models 750 multiple principals 13 multiple underlying assets 1157 multiplicative bounds 771 multivariate regressions 774, 781 mutual funds 29, 399, 767, 785 performance 958 separation 626, 629 Myers and Majluf model 274 Myers–Majluf (1984) adverse-selection model 119 myopic behavior 123 myopic loss aversion 1080 naive diversification 1103 naked short position 293 narrow framing 1073, 1081, 1097, 1104 Nasdaq 42, 267, 271, 275, 297, 583, 589 Instinet 585 odd eighths 584 SelectNet 584, 587 small order execution system (SOES) 584, 587 National Association of Securities Dealers 260 National Banking Era 496, 497, 509 national income 898 national market system 591 near-cash assets 360 necessary and sufficient conditions for valuation 177 negative interest rates 690 neoclassical growth model 902 neoclassical theory of the firm 171 net payout 351 net present value (NPV) rule 169, 173, 174, 193, 204 Marshallian 204, 205 netting by novation 454 network externalities 593 Neuer Markt (Germany) 42, 294 “new era” theory 1126 New Jersey registered holding companies 6 New York Cotton Exchange 1135 New York Futures Exchange (NYFE) 1139
New York Mercantile Exchange (NYMEX) 1136 New York Stock Exchange (NYSE) 22, 70, 260, 267, 268, 271, 275, 295, 297, 589, 893 DOT system 584, 587 fixed commissions 591 floor brokers 591 overseeing the book 583 Rule 390 591 Rule 394 589–591 seats 583, 589, 591 specialists 591 no-arbitrage condition 749 no-arbitrage price relations 1166 no-arbitrage principle 746, 749, 1139–1148 no free lunch with vanishing risk 672 noise 1191 noise trader risk 1058, 1059, 1063–1065, 1099 noise traders 454, 873, 1057, 1058, 1060, 1061, 1097–1100 nominal dividends 651 non-competitive pricing 563 non-diversifiable risk 891, 924 non-dividend-paying stock options 1153 non-expected utility 649 non-satiated investors 817 non-synchronous trading 1023, 1028, 1035 non-traditional bank activities 436 nonlinear rational expectations 912 Novikov’s condition 674 NOW accounts 924 number of trades 577 numeraire 645 numeraire deflator 673 numeraire invariance theorem 671 numeraire portfolio 675 objective of the firm 171 October 1987 crash 1182, 1197 OECD Principles 47, 48, 50, 71 offer price 281 omissions 1089 omitted risk factors 367 one-factor term-structure model 687 one-fund separation 630 one-share–one-vote 5, 55, 84 one-shot auction 581 online trading 1104 open market repurchases 276
opportunity cost 362 opportunity cost of capital 341 optimal bounds 771 optimal compensation schemes 236 optimal investment decision 172 optimal investments 351 optimal monitoring incentives 24 optimal portfolio 619 optimal risk diversification 24 optimal risk sharing 369 optimal stopping problem 657 optimal stopping theory 659 optimism 1066, 1112 option effect 563 option introductions price effects 1192 option repricing 77 option spreads 597 option valuation 1148–1166 approximation methods 1157–1164 binomial method 1158 compound option approximation 1163 compound options 1155 finite difference methods 1162 lattice-based methods 1158 Monte Carlo methods 1162 multiple underlying assets 1157 quadratic approximation 1163 quasi-analytical methods 1163 single underlying asset 1155–1157 static replication 1155 stochastic volatility 1165 trading simulations 1176–1179 trinomial method 1162 valuation by replication 1154 option valuation models empirical performance 1173–1189 optional sampling theorem 645 options 373 interest rate 693 on the minimum and the maximum 1157 on yields 693 time value 1152 options markets 597 American exchange 597 Pacific exchange 597 Philadelphia exchange 597 primary market makers 597 order cancellation 581 order-driven auction market 580 order handling costs 563
order handling rules 559, 583, 585 order processing 569 Ornstein–Uhlenbeck model 697 Ornstein–Uhlenbeck process 200, 202, 698 outside banks 465 outside equity 240 outside financing 408 outside funds 349 outside investors 35, 217, 444 rights of 217 overallotment option 292 overconfidence 1065, 1085, 1093, 1094, 1104 overinvestment 43, 116, 121–123, 125, 130, 131, 134, 142, 151, 243, 384 oil industry 129 overlapping generations 452, 838 overlapping-generations (OLG) exchange economy 921 overnight returns 579 overshooting 582 ownership 4
Panel Study of Income Dynamics (PSID) 876, 919 par coupon rate 692 parallel trading 578 price effects 578 Pareto efficiency 14 Pareto improvement 452 Pareto inefficiency of incomplete markets 651 Pareto optimal 620, 650 Pareto-optimal allocation 171 Paris Bourse (France) 592 partial adjustment 286 partial differential equation 196, 667, 689 Feynman–Kac 698 participation constraints 1043 partnership 39 passive index fund strategy 958 passive portfolio managers 437 passive traders 559 patenting 329 path dependency 293 path-dependent derivative securities 693 payment for order flow 561, 594 payoff distribution pricing 622 payout decisions 379 payout policy 337–422 asymmetric information 377–386 incomplete contracts 383–386
Miller and Modigliani 339, 341, 342, 351–354, 362, 378, 384, 386 taxation 358–377 payout ratios 349, 354, 398, 417 payout to individual investors 357 payout yield 345, 417, 421 pecking-order theory 218, 233–239, 243 Jensen and Meckling 240 of capital structure 235 time-series tests 237 pension fund performance 794 pension funds 925 per-capita consumption 902, 919 perfect financial markets 233 performance benchmarks 793 permanent earnings 419 personal funds 323 peso crisis 1013 peso problem 823 Philadelphia Exchange (PHLX) 1136, 1171 plain-vanilla interest rate swap 1213 poison pills 20, 23, 46, 54, 55, 67 Poisson process 1166 portfolio allocation 364, 895 portfolio choice 977 portfolio choice problem 610 portfolio constraints 678 portfolio efficiency 781 portfolio insurance 1044 portfolio manager 447 portfolio of securities 609 portfolio optimization 760, 762 portfolio strategies 763 portfolio theory 607 portfolio weights 772, 790 positive feedback trading 1094 positive linear pricing rule 614 post-bid defenses 54 post-dividend abnormal performance 389 post-dividend announcement drift 389, 403 post-dividend-change performance 389 post-earnings announcement drift 1089, 1093, 1094, 1097 post-earnings announcement effect 1093 power indices 62 power utility 821, 828, 831, 841, 846, 847, 867 power utility model 831, 870 pre-bid defenses 54 pre-default price 722 precautionary savings 452, 825, 870, 910
predictability 753, 767, 919, 1076 predictability puzzle 1076 predictable process 719 preference-free results 612 preferencing 584 preferred stocks 316 premium 370, 372 premium for bearing risk 891 present value 748 price adjustment to information 1029 price discovery 1044 price discreteness 375, 376 price drop 375 price/earnings ratio 898 price formation 1044 price-impact coefficient 581 price improvement 572 price matching 561 price priority 560 price support 292 price synchronization 1167 price-to-earnings multiple 280 price volatility 358 pricing and allocation rules 286 pricing behavior 1026 pricing errors 758, 1174–1176 pricing kernel 202, 608, 645, 671, 748, 910, 913 pricing of derivative securities 702 pricing of forward and futures contracts 702 pricing of shares in the secondary market 26 pricing rule representation theorem 616 primary equity offerings 1089 principal-agent 401 principal-agent model 243 principal components 765 principal components analysis 1040 priority rules 594 private benefits 56, 122 private debt 459 private equity 241 private negotiations 68 private placements of equity 276 privately held firms 27 privatization 8, 281 probability of information-based trade (PIN) 1042 producer price index (PPI) 953 productive unit 903 profit-related bonus 34 profitability 383
property rights 37 proprietary trading systems 590 prospect theory 286, 609, 809, 1069–1074, 1079–1082, 1104, 1126 prospective cash flows 172 prospectuses 260 protective covenant 713 proxy fights 67, 84, 275 prudence 918 Prudent Man Investment Act 399 ‘prudent man’ laws 399 Prudent Man Rule 399, 400 pseudo market timing 299, 963 psychology 1126 public corporation 7 public debt 459 public finance 416 public float 280 public information 748 public policy 221 purchasing power parity 982 pure discount bond 176 put options, American-style 1146, 1151 lower price bound 1146 put options, European-style 1151, 1187 lower price bound 1145 put–call parity 1146, 1171 American-style 1147 conversion 1147 European-style options 1146 reverse conversion 1147 puttable bonds 719 quantitative puzzle 911 quantos 693 quiet period 260 quote-driven dealer market 580 quote midpoint 568 quote revision 1030 quoted half-spread 572
Radon–Nikodym derivative 655 random field model 702 random jumps in the asset price 1183 random walk 1027 random-walk model 751, 951 rate of return 611 rating migrations 1225–1231, 1234–1237 rational bubbles 807, 838 rational exercise policy 657, 659 rational expectations 748, 1042
rational expectations equilibrium (REE) 1055 rational investors 169 rational markets paradigm 169 rationality 389, 403 rationing 289 R&D financing organization 315 R&D investments 193, 194 real after-tax interest rate 925 real business-cycle theory 902, 925 real costs 572, 575 real options 170, 195, 198, 203, 204 real options approach 175 real risk-free rate 894 real terminal wealth 979 realized half-spread 574 realized spread 569, 570 rebate rates 1198 recovery assumption 1225 recursive utility 647, 648 redistribution 54 reduced-form models 119, 1031, 1214 redundant securities 656 regime shifts 1223–1225 regime switching 1223 regional exchanges 589, 590 regulation 319, 925 regulatory dialectic 319 relationship banking 66 relationship investing 60 relative risk aversion 912, 919 coefficient of 819, 821, 828, 841 relative risk parameter 912 reliability problem 444 remuneration committee 34 rent-seeking agents 142 rent-seeking division managers 143 reorganization 226, 227 replicated random variables 677 representative agent 651, 652 representative-agent state-pricing model 651 representative-consumer models 922 representative-household paradigm 918 representativeness 1066, 1084, 1092, 1093 repurchase yield 345, 417 repurchases 339, 354, 356, 403–420 announcement effects 406, 410 Dutch auction 404, 405 fixed-price tender offer 404, 405 open-market share repurchase 404 signaling costs 408 reputation 460
reputation-building 16 reputational herding models 124 research coverage 259 reserve requirements 437 reset options 1156 resettlement 704 resolution of uncertainty 649 retained earnings 397, 418, 420 retirement accounts 925 return volatility, changes 1190 returns predictability 751, 898 returns to bidder firms 964 returns to individual investors 956 returns to institutional investors 958 revealing rational expectations equilibrium 1041 Riccati equation generalized 697, 725 ordinary differential equation 710 Riesz representation theorem 644, 647 Rietz’s disaster scenario 920 rights issues 258 risk 359, 363, 373, 374, 377, 383, 395, 400, 403, 413 risk-adjusted discount rate 175, 180 risk-adjusted drift 197 risk-adjusted returns 367 risk aversion 1063 risk-aversion coefficient 912 risk exposure 789 risk factors 170, 754 risk-free hedge 1149 risk-free rate 364 risk-free rate puzzle 807, 825, 878, 908, 914–918, 924, 1077 risk-free security 894 risk-neutral 361, 369 risk-neutral dynamics 1210 risk-neutral intensity process 721 risk-neutral measure 1210 risk-neutral probabilities 616, 654 risk-neutral process 199 risk-neutral valuation 1166 risk preferences 1151 risk premia 170, 171, 176, 179 time variation in 200 risk premium 626, 634, 750, 919 risk profile 396, 413, 414 risk reduction 415 risk-return tradeoff 364, 899 risk shifting 227, 233, 469, 483
riskless real interest 805 road show 283 Robber Barons 6 Roll implied spread 575, 582, 584 roll out 1190 routing orders 562 Royal Dutch 1063 run-up in stock prices 857 Russian crisis 530 Saddle Point Theorem 683 safe-harbor rule 404 salary 34 salience 1126 sample path 662 sample selection bias 943, 965 sample selection criteria 270 sample size neglect 1067 sampling error 773 Savings-and-Loan crisis 53, 498, 525 savings-and-loan institutions 522 savings–investment process 434, 437 scale economies 479, 483 scale-invariant 821 scaled-price effect 1093 scaled-price ratio effect 1093, 1094 scaled-price ratios 1088, 1094, 1097 scandals 13 scope economies 483, 484 seasoned equity offerings (SEO) 256–258, 262–276, 290, 292, 293, 295, 298, 300, 355, 356, 403, 963 3-factor regressions 270 announcement effect 263–265 buy-and-hold returns 268 Japan 277 long-run performance 265–271 mean percentage returns 267 performance 271 post-issue returns 266 systematic risk 272 underperformance 271–274, 299 secondary equity offerings 1089 secondary market liquidity 64 secondary priority 560 securities, design innovation 220 Securities Acts Amendments (USA) 590, 591 Securities and Exchange Commission (SEC) 260, 261, 399, 404, 585, 1189, 1190 moratorium on options introductions 1190 Order Handling Rules 583
Regulation FD (fair disclosure) 258 Regulation M 292 Rule 415 261 Rule 10b-18 404, 408 Rule 19c-3 591 Stock Allocation Plan 1190 Securities Exchange Act of 1934 404 Securities Exchange Act (SEA) 404 securities issuance 255–300 information 261, 262 long-run performance 262 securities regulation 259–261 securitization 435, 436 security 643 security characteristics 766 security issuance 1106 security market line (SML) 626, 633 security-price process 643 security prices 169 segmented markets 977 self-attribution bias 1094 self-control 401, 1109 self-dealing 28 self-insuring agents 918 self-regulatory obligations 595 SEO, see seasoned equity offerings (SEO) Separating Hyperplane Theorem 644 separation of ownership and control 6 separation of variables 710 serial covariance 571 serial covariance of price changes 570 serial dependence 752 share turnover 1038 shareholder activism 6, 49, 68 shareholder rights 59 shareholder suits 84 shareholder value 14 shareholder–debtholder conflict 398 Sharpe ratio 179, 200, 202, 626, 628, 759, 769, 771, 782, 783, 807, 819, 827, 910, 1081 law of conservation of squared Sharpe ratios 783 market 200 shelf issues 261 Sherman Antitrust Act 1890 (USA) 6 shock contagion 978 short horizons 1060, 1063 short position 703 short-rate process 670 short-run underpricing 279
short-sale constraints 315, 329, 364, 1059, 1095–1097, 1106 short-term bonds 919 short-term interest rates 952, 954 short-term performance 122 short-term price behavior 568, 1023 short-term riskless borrowing 653 short-termism models 131 Siegel’s paradox 985 signal-to-noise ratio 1035 signaling 290, 339, 377–383, 386–396, 407, 408, 412, 413, 421 signaling theories 379 single underlying asset 1155–1157 size effect 944, 970 size-matched benchmark 265 size premium 1087 small-firm effect 944 small-firm turn-of-the-year effect 970 smarter-money effect 140, 141 smile 1175 smile curves 705–708 skews 707 smooth consumption 918, 921 smooth-pasting condition 714 Snell envelope 659 soft budget constraint 137 Solnik/Sercu model 984, 989–991, 995, 997, 998 Solnik’s model 984, 986–988 sovereign risk 999 spanning 172, 610 spanning markets 329 specialist 556, 581, 583 specialist intervention 582 speculative bubble 1134 spillovers 1010–1014 spinning 288 spinoffs 151 spot interest rate 176 spot LIBOR measure 1239 spread midpoint 1027 spread portfolios 779 spurious inferences 1028 spurious risk factor 779 square root process 1166 squeeze (freeze) out 21 stabilization 292 stabilized (inflation-indexed) bonds 321 stable dividend policy 349 staged financing 280
stale or non-synchronous prices 1173 staleness 1168 Standard and Poor’s (S&P) industrial portfolios 893 standard filtration 662, 668 standard growth theory 926 standard time-additive 917 state-contingent control allocation 36 state contingent payoffs 447, 456 state-price deflator 645, 671 state-price density 608, 616, 645–647, 655, 671, 672 state prices 616, 642, 644 state variables 176, 749, 988 state vector elements, latent 695 states of nature 1125 static clientele models 358, 362, 364 stationary equilibria 921 steady-state after-tax real interest rate 926 steady-state before-tax accounting profits 926 steady-state growth rate of the economy 926 Stiemke’s lemma 644 stochastic differential equation 687 stochastic discount factor (SDF) 608, 747–749, 751, 754, 755, 757, 767, 770, 772, 773, 784, 787, 807, 816–819, 873, 875, 876, 904 alpha 787 lognormal and homoskedastic 817 stochastic dominance 632 stochastic growth model 925 stochastic integral 664, 668–670 stochastic integration, linearity 663 stochastic interest rates 1183 stochastic partial differential equation (SPDE) 701, 702 stochastic process 176, 903 stochastic variation 171 stochastic variation in the premium 183 stochastic volatility 1165, 1183 econometric estimation 708 stochastic volatility models 1183 stock dividends 402 stock issue 233 stock market boom 1125 stock market indexes 892 stock market liberalization 1000 stock market liquidity 4 stock markets 461 size of 45 stock option listing criteria 1190 stock option market 1170
stock options 411, 415, 416, 418 stock ownership 241 stock participation 34 stock repurchases 1089 stockholders 171 stopping time doubly stochastic 720 non-trivial 720 totally inaccessible 720 strict priority 713 string model 702 structural breaks 1009 structural pricing models 1214 Student Loan Marketing Association 1139 style matching 265 subjective-time discount factor 902 Suffolk banking system 502 sunshine trading 587 sunspots 504, 506, 508 super-replicating trading strategy 658 surplus consumption ratio 869 survival bias 184, 911, 920 survivorship bias 813, 899, 958 swap contract 1139 swap markets 692 swaps 1230 Swaption Market model 1210 swaptions 692, 1241 switching costs 594 Sydney Futures Exchange (SFE) 1154 symmetric information 353 syndicate 259 systematic risk 364, 365, 751, 754, 901 systemic risk 626 T-bills 924, 925 takeover activity 1112 takeover pressure 123 takeovers 12, 19, 20, 49, 50, 58, 83, 242, 243, 358, 385, 405, 407 City Code (UK) 45 defenses 54 Delaware 52 Europe 53 targets 51 tangency portfolio 989 tangible capital 926 tape revenue 561 target-adjustment financing 231 target-adjustment model 231, 237 target debt ratio 231
tax-free institutions 368 tax-induced trading 370 tax-loss carry-forwards 223, 225 tax-loss selling 1088 taxation 222–225, 313, 318, 319, 339, 341, 353, 358–377, 422, 924, 925 1986 Tax Reform Act 51, 347, 360, 372, 394, 402 as barrier to international investment 998 average future tax rate 223 capital gains tax rate 361 corporation income tax 360 dividend income tax rate 361 effective marginal tax rate 225 effects on financing strategies 225 financing tactics 225 implied tax rate 363 interest expense deduction 717 marginal tax rate on dividends 347 tax advantages of corporate debt 223 tax advantages of debt 224 tax avoidance 368 tax benefits of debt 218 tax clienteles 359, 361, 370, 372 tax disadvantage 409, 410 tax effect 364, 372 tax heterogeneity 376 tax liability 363 tax penalty 368, 402 tax rates 372 tax reforms 377 tax returns 346 tax shield 718 Taylor approximation 837 Taylor-series expansion 981 temporal aggregation 911 tender offer game 20 tequila effect 1013 term-structure derivatives 693 term-structure modeling 698 term-structure models 691 caps 692 caps, valuation of 692 floor 692 term structure of riskless interest rates 175, 180 terminal measure 1239 theory of taxes and portfolio choice 223 third market 590 Thompson Financial Securities Data 312 three-factor time-series regressions 269
thrift industry 522 tick size 375, 561, 584, 587 time-nonseparability 867 time-nonseparable preferences 868 time-separable power utility 819 time-separable utility 816 time-series behavior 567 time-series models 1031 time-series return variation 367 time steps 1160 time-varying equilibrium expected returns 952 time-varying market risk premium 954 timing convention 813 timing strategies 239 Tobin’s Q 51, 55, 72, 76, 397, 521, 879, 969 toehold 21 Tokyo Stock Exchange 269, 274 too-big-to-fail 487, 517, 524 total execution cost 572 trace of a square matrix (tr(A)) 669 tracking error 793 trade indicator 568 trade internalization 584 trade liberalization 1001, 1005, 1010, 1011 trade-off theory 218, 221–233, 243 time-series tests 237 trade-time 1031 traded factor models 788 traded half-spreads 574 trading behavior 371 trading centralization 593 trading costs of institutional investors 578 trading fragmentation 593 trading gains 670 trading halts 582 trading pressure 579 trading process 1044 trading simulation tests 1178 trading strategy 643, 662, 670 generated gain process 664 self-financing 665, 667, 670, 674, 677 trading volume 371, 373, 374 ex-day 370, 374–376 transaction costs 339, 342, 353, 354, 359, 369, 373, 374, 376, 379, 399–404, 422, 668, 678, 921, 961, 1023, 1036, 1059–1061, 1193 transaction prices 1026 transform 710 transmission of shocks 978 transparency 580, 581, 586, 593, 595
Treasury auctions 596 Treasury bills 893 Treasury Inflation Protected Securities (TIPS) 894 Treasury stock 411 Treynor Index 626–628 trial-opening prices 581 tribe 643 triple witching 1044 turn-of-the-year effect 945 twin shares 1061–1063, 1065 twin stocks 1100 two-fund separation 625, 632 two-sided credit risk 1230 two-step approach 775 unanticipated inflation 921 unconditional beta 786 unconditional efficiency 763 unconditional equity premium 919 underinvestment 116, 118, 121–123, 125, 129, 131, 140, 227, 233 underpricing 286–291 corruption 288 dynamic information acquisition 286 informational cascades 289 IPO as a marketing event 290 lawsuit avoidance 290 prospect theory 286 signalling 290 winner’s curse 288 underreaction 273 undervaluation 380, 381, 414 underwriter compensation 291 underwriters 259, 463 underwriting discount 260 underwriting spreads 234, 325 uninsurable idiosyncratic income risk 919 uninsurable idiosyncratic income shocks 919 uninsurable income risk 899 uninsurable income shocks 918 universal hedge ratio 990, 991 unlisted trading privileges (UTP) 589 unseasoned equity offering 263 upper bound 981 upward sloping supply curves 1188 US House of Representatives Financial Services Committee 288 U-shaped volatility 1032 utility function 608 utility maximization theory 1126
utility-maximizing strategies 761 utility of consumption 749 utility theory 1069
valuation 339, 341, 617 valuation of nominal bonds 202 value effect 947, 970 value-matching condition 203 value maximization 171, 172 value-maximizing decisions 172 value premium 1097 value stocks 407, 1088, 1092 value-weighted index 892 value-weighted portfolio 893 variance bounds 747, 768 Vasicek (1977) model of the term structure 843 venture capital 26 venture-capital firms 123, 131, 132 venture capitalists 280 VIX 1180 volatility 665, 1075, 1125 volatility patterns 1033 volatility puzzle 1076, 1083–1087 beliefs 1084–1086 preferences 1086, 1087 volatility smiles 1182 volatility spillover 1012 von Neumann–Morgenstern 1069 expected utility 611 preferences 608, 609, 918 restriction 630 utility function 608, 619
voting control 60, 63 voting trusts 6 VXN 1180
Wald test 782–784 Walrasian auctioneer 1045 wealth constraints 14 wealth transfer 398 weekend effect 946, 970 weight-based performance measures 790 weighting matrix 775 Wiener process 1150 wildcard options 693 windows of opportunity 262 windows-of-opportunity model of capital structure 274 windows-of-opportunity theory 239 winner picking 141, 152 winner’s curse 288, 465–467, 475, 484, 485, 487, 489 wishful thinking 1066 WorldCom 35
yield curve 687 yield options 693
zero-beta portfolio 775, 993 zero-coupon bonds 317, 318, 687, 1210, 1211 defaultable 721, 1229 zero-drift property 177
HANDBOOKS IN ECONOMICS
1. HANDBOOK OF MATHEMATICAL ECONOMICS (in 4 volumes) Volumes 1, 2 and 3 edited by Kenneth J. Arrow and Michael D. Intriligator Volume 4 edited by Werner Hildenbrand and Hugo Sonnenschein 2. HANDBOOK OF ECONOMETRICS (in 6 volumes) Volumes 1, 2 and 3 edited by Zvi Griliches and Michael D. Intriligator Volume 4 edited by Robert F. Engle and Daniel L. McFadden Volume 5 edited by James J. Heckman and Edward Leamer Volume 6 is in preparation (editors James J. Heckman and Edward Leamer) 3. HANDBOOK OF INTERNATIONAL ECONOMICS (in 3 volumes) Volumes 1 and 2 edited by Ronald W. Jones and Peter B. Kenen Volume 3 edited by Gene M. Grossman and Kenneth Rogoff 4. HANDBOOK OF PUBLIC ECONOMICS (in 4 volumes) Edited by Alan J. Auerbach and Martin Feldstein 5. HANDBOOK OF LABOR ECONOMICS (in 5 volumes) Volumes 1 and 2 edited by Orley C. Ashenfelter and Richard Layard Volumes 3A, 3B and 3C edited by Orley C. Ashenfelter and David Card 6. HANDBOOK OF NATURAL RESOURCE AND ENERGY ECONOMICS (in 3 volumes) Edited by Allen V. Kneese and James L. Sweeney 7. HANDBOOK OF REGIONAL AND URBAN ECONOMICS (in 4 volumes) Volume 1 edited by Peter Nijkamp Volume 2 edited by Edwin S. Mills Volume 3 edited by Paul C. Cheshire and Edwin S. Mills Volume 4 is in preparation (editors J. Vernon Henderson and Jacques-François Thisse)
8. HANDBOOK OF MONETARY ECONOMICS (in 2 volumes) Edited by Benjamin Friedman and Frank Hahn 9. HANDBOOK OF DEVELOPMENT ECONOMICS (in 4 volumes) Volumes 1 and 2 edited by Hollis B. Chenery and T.N. Srinivasan Volumes 3A and 3B edited by Jere Behrman and T.N. Srinivasan 10. HANDBOOK OF INDUSTRIAL ORGANIZATION (in 3 volumes) Volumes 1 and 2 edited by Richard Schmalensee and Robert R. Willig Volume 3 is in preparation (editors Mark Armstrong and Robert H. Porter) 11. HANDBOOK OF GAME THEORY with Economic Applications (in 3 volumes) Edited by Robert J. Aumann and Sergiu Hart 12. HANDBOOK OF DEFENSE ECONOMICS (in 1 volume) Edited by Keith Hartley and Todd Sandler 13. HANDBOOK OF COMPUTATIONAL ECONOMICS (in 2 volumes) Volume 1 edited by Hans M. Amman, David A. Kendrick and John Rust Volume 2 is in preparation (editors Kenneth L. Judd and Leigh Tesfatsion) 14. HANDBOOK OF POPULATION AND FAMILY ECONOMICS (in 2 volumes) Edited by Mark R. Rosenzweig and Oded Stark 15. HANDBOOK OF MACROECONOMICS (in 3 volumes) Edited by John B. Taylor and Michael Woodford 16. HANDBOOK OF INCOME DISTRIBUTION (in 1 volume) Edited by Anthony B. Atkinson and François Bourguignon 17. HANDBOOK OF HEALTH ECONOMICS (in 2 volumes) Edited by Anthony J. Culyer and Joseph P. Newhouse 18. HANDBOOK OF AGRICULTURAL ECONOMICS (in 4 volumes) Edited by Bruce L. Gardner and Gordon C. Rausser
19. HANDBOOK OF SOCIAL CHOICE AND WELFARE (in 2 volumes) Volume 1 edited by Kenneth J. Arrow, Amartya K. Sen and Kotaro Suzumura Volume 2 is in preparation (editors Kenneth J. Arrow, Amartya K. Sen and Kotaro Suzumura) 20. HANDBOOK OF ENVIRONMENTAL ECONOMICS (in 3 volumes) Volume 1 is edited by Karl-Göran Mäler and Jeffrey R. Vincent Volumes 2 and 3 are in preparation (editors Karl-Göran Mäler and Jeffrey R. Vincent) 21. HANDBOOK OF THE ECONOMICS OF FINANCE (in 2 volumes) Editors George M. Constantinides, Milton Harris and René M. Stulz
FORTHCOMING TITLES HANDBOOK OF EXPERIMENTAL ECONOMICS RESULTS Editors Charles Plott and Vernon L. Smith HANDBOOK ON THE ECONOMICS OF GIVING, RECIPROCITY AND ALTRUISM Editors Serge-Christophe Kolm and Jean Mercier Ythier HANDBOOK ON THE ECONOMICS OF ART AND CULTURE Editors Victor Ginsburgh and David Throsby HANDBOOK OF ECONOMIC GROWTH Editors Philippe Aghion and Steven N. Durlauf HANDBOOK OF LAW AND ECONOMICS Editors A. Mitchell Polinsky and Steven Shavell HANDBOOK OF ECONOMIC FORECASTING Editors Graham Elliott, Clive W.J. Granger and Allan Timmermann HANDBOOK OF THE ECONOMICS OF EDUCATION Editors Eric Hanushek and Finis Welch All published volumes available