Journal of Econometrics 162 (2011) 1–5
Guest editorial
The economics and econometrics of risk: An introduction to the special issue

1. Introduction

Since the late 1940s, understanding risky choices has become a major area of emphasis in economic research. The expected utility (EU) paradigm, ushered in by Von Neumann and Morgenstern (1944), and applied and expanded by Friedman and Savage (1948) as well as Arrow (1971), became the dominant theory of decision making under uncertainty and served as the basis for the Bayesian statistical decision theory that also includes the well-known learning model, Bayes' theorem; see, e.g., Raiffa and Schlaifer (1961) and Zellner (1971) for some early results and applications in statistics and econometrics. Also, the EU and Bayesian approaches have had major applications in the analysis and design of insurance contracts, assessment of the behavior of risky assets using the capital asset pricing model, and in the management of investment portfolios as explained in Markowitz (1959, 1991). For example, Putnam and Quintana (1993, 1994) have developed a Bayesian state space model with time-varying parameters which they have implemented using Bayesian shrinkage techniques (Zellner et al., 1991) and monthly stock price and other data to derive optimal portfolios – optimal in the sense of maximizing an investor's EU – subject to a wealth constraint, month by month. They found that their optimal portfolios have returns exceeding the cumulative returns on the S&P 500, a benchmark ''hold the market'' strategy. These techniques have become part of the core operational tools of the financial sector.
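For readers who want a concrete sense of the EU-based portfolio rules referred to above, the following sketch is a minimal illustration, not the Putnam–Quintana implementation: it assumes CARA utility and a multivariate normal predictive distribution of returns (with made-up moments), in which case maximizing expected utility of end-of-period wealth subject to a budget constraint reduces to a mean–variance trade-off.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical monthly predictive moments for three assets (illustrative numbers,
# standing in for the posterior mean and covariance a Bayesian state-space model
# would deliver each month).
mu = np.array([0.010, 0.007, 0.004])             # predictive mean returns
Sigma = np.array([[0.0040, 0.0012, 0.0006],
                  [0.0012, 0.0025, 0.0008],
                  [0.0006, 0.0008, 0.0016]])     # predictive covariance
gamma = 4.0                                      # CARA (absolute risk aversion) coefficient
wealth = 1.0                                     # budget to allocate

# Under CARA utility and normal returns, maximizing E[-exp(-gamma * W)] is
# equivalent to maximizing the mean-variance criterion below.
def neg_certainty_equivalent(x):
    return -(x @ mu - 0.5 * gamma * x @ Sigma @ x)

budget = {"type": "eq", "fun": lambda x: x.sum() - wealth}
res = minimize(neg_certainty_equivalent, np.full(3, wealth / 3), constraints=[budget])
print("optimal weights:", res.x.round(3))
print("certainty equivalent return:", -res.fun)
```

In the Bayesian applications cited above, the predictive mean and covariance would be updated each month from the posterior of the state space model rather than held fixed as in this sketch.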
2. Concerns about the EU paradigm

As with any new approach, concerns about the ability of the EU approach to explain economic behavior under risk and/or uncertainty emerged almost from its inception. Allais's (1953) hypothetical experiments provided some results contradicting the predictions of the EU theory. Over time, numerous paradoxes were put forward that contradicted important implications of the EU theory (see, e.g. Kahneman and Tversky, 1979). Furthermore, as Just and Just show in their paper, included in this volume, the EU model faces serious statistical identification problems that have not yet been overcome in revealed-preference applications to usual production and consumption choice problems. In addition, the various anomalies associated with the EU model found in experimental and nonexperimental empirical studies have resulted in a literature presenting alternatives to the EU theory.

An early alternative to the EU model was suggested by Arrow and Hurwicz (1977) based on their theory of rational behavior under ignorance that produces outcomes that are very different from those derived from the full information EU theory. The Arrow–Hurwicz theory predicts that individuals go to extremes when making decisions under ignorance, yielding solutions that are quite different from those of EU decision-makers who are assumed to have enough information to compute EU and to maximize it. Given that policymakers, academics, farmers, and others typically make decisions about complicated matters with limited information, an adequate theory of rational behavior under ignorance and incomplete information, as well as complete information, is essential.

The EU approach has also been challenged by non-EU models arising primarily from experimental economics and psychology. Initially, such work concentrated on producing empirical evidence mainly critical of the EU model. However, over the years, it led to the creation of alternative hypotheses and theories and new methods for comparing and evaluating them. In a recent presentation, Hey (1995) identified three phases in the evolution of research on risk. In each of these stages, theoretical innovations were followed by empirical research. While the EU model is a key element in the second stage of evolution, he characterizes non-EU models and other approaches (similarity and hysteresis state-contingent utility models) as the third stage.

In this volume, we introduce empirical models that address the transition from the EU model to alternative models that explain the behavior of individuals in making risky choices. In this regard, a broad set of alternative models is considered. Some present corrections and adjustments to the EU framework and others incorporate new biases in the modeling of how individuals make choices among risky alternatives.

3. Challenges in estimation of EU models
The first set of papers in this volume contributes to the growing literature on improving the estimation of key parameters within the EU framework. This literature includes studies such as Chetty and Szeidl (2008), which derives an alternative method of estimating risk aversion using income and substitution effects for the duration of unemployment. They find that the level of risk aversion due to unemployment shocks is substantially higher than previously thought, suggesting large benefits for social insurance programs. Further, many studies on choices under risk using the EU approach have been carried out in agriculture and economic development (see the survey by Chavas and Holt, 2002). Farmers generally manage risk associated with production, consumption, and investment, and encounter risk arising from both markets and natural phenomena, such as weather and pest infestations. Availability of data on land allocation and other farmer choices, as well as the need to assess a variety of government programs, including crop and revenue insurance schemes, has led to extensive analyses of risky choices in agriculture. The econometric challenge in estimating risk behavior in agriculture is to identify and estimate the parameters of both
production functions and risk preferences separately. Several techniques are based on approximations. For example, many rely on a mean–variance approach where risk preferences can be reduced to a simple trade-off between the first two moments of a distribution. Serra et al., in their paper included in this volume, apply such a mean–variance approach to assess the impacts of various government policies. They distinguish between price supports that depend on production levels and ''decoupled'' payment policies that are intended to transfer payments without allocation effects. Their analysis finds that the decoupled payments still affect production through a wealth effect that reduces absolute risk aversion. Thus, risk considerations complicate the establishment of pure transfer payments. They also find that the supply elasticities of decoupled payments are smaller than those of coupled payments. Their study confirms the results of many others: that farmers' land allocation, input use, and time allocation reflect decreasing absolute risk aversion (DARA) and increasing relative risk aversion (IRRA).

The econometric techniques used to simultaneously estimate risk preference coefficients and the coefficients of economic activity that generate the risk (for example, a production function) have been ingenious but problematic. Just and Just show that common techniques fail to identify risk-preference parameters, even approximately, except under arbitrary assumptions. The use of field data in econometric estimation requires estimating not only parameters of risk preference and the technical relationships that determine outcomes, but also the characteristics of the payoff distributions driving risk behavior. This leads to a fundamental problem where risk preference parameters, technical parameters, and risk distribution parameters are confounded, preventing separate identification. This is a problem that has been virtually ignored. In the past 20 years, risk attitude estimation with field data has focused almost exclusively on the use of first-order conditions of the decision problem. By focusing only on the first-order conditions, researchers have ignored many of the underlying relationships that provide the necessary information to separately identify the risk, production, and decision processes. This leads to a crucial trade-off between flexibility in estimation and parametric identification. In order to maintain identifiability, flexibility can be introduced in one of the estimable functions only by eliminating flexibility in one of the remaining functions, arbitrarily impacting the parameter estimates. Thus, one cannot truly gain flexibility in risk estimation without more information. Jointly estimating production equations with first-order conditions could alleviate the problem. Alternatively, one could address the problem of confounding parameters using independent data sources to estimate the parameters of preferences and economic activity separately. Unfortunately, in most cases, we operate with a paucity of data, severely limiting our ability to discover the underlying structure.

While several studies have found weak evidence to support Arrow's hypotheses (DARA and IRRA), determining the extent of the role of wealth effects has been difficult. The paper by Just derives a nonparametric method to calibrate the changes in concavity necessary to rationalize a change in risk behavior given an increase in wealth. He shows that wealth effects may be much less important in issues such as investment and trade than is portrayed in the theoretical literature.

4. Dynamic application

While much of the literature that leads to the estimation of risk-aversion parameters is based on static optimization, Pope et al., in this volume, expand the EU framework to develop a dynamic discounted EU maximization model for farm enterprises involving production, investment, and consumption choices. The optimization process yields arbitrage conditions that were estimated econometrically using cross-section and time-series data on returns from farming and other activities in the various states in the North Central region of the United States. While the results suggest that a standard arbitrage equilibrium does not hold when fixed effects are added to the econometric model, it yields a significant risk aversion coefficient, and the hypothesis of homogeneity of risk preferences cannot be rejected.

5. Use of storage to reduce risk

One of the major tools to address financial and other risks is storage. Inventory management is crucial to understanding the behavior of prices of agricultural and other storable commodities, as well as financial assets. Much of the analysis of the management and pricing of storable commodities subject to production risk under rational expectations builds on Gustafson's (1958) model. This model's predictions have been challenged by Deaton and Laroque (1992, 1996), in particular, with respect to the model's inability to explain serial correlation in the prices of major commodities. The paper by Cafiero et al. evaluates this criticism by recasting the Gustafson model, identifying the properties of its equilibrium, and re-estimating both the Gustafson and the Deaton–Laroque model with a much finer grid. The results suggest much less autocorrelation than was implied by the earlier Deaton and Laroque paper for many of the commodities. The paper also suggests that estimation of data-intensive models of commodity price series may yield different outcomes depending on the precision of the data; therefore, reliable estimation requires higher levels of disaggregation of the time series.

6. Exchange rate risk

Another major application of risk modeling is in the analysis of exchange rate risk. This risk is affected by many factors, and capturing their impact on the dynamics of currency prices is challenging. Previous techniques have been unable to account for multiple sources of risk, thus simplifying decisions beyond reason. Egorov, Li, and Ng propose a new technique based on principal components for estimating joint term structures for bond portfolios involving multiple countries. They identify four factors – two common and two country specific – that explain the joint term structure of the US dollar and the Euro reasonably well.

7. Stochastic dominance alternatives

Stochastic dominance techniques were introduced to compare the overall riskiness of different distributions. While these techniques cannot provide a complete ordering, they provide a comparison whenever they are applicable. Second-order stochastic dominance is an especially useful concept since, when it applies, it ranks outcomes in a way consistent with risk aversion. There is a growing literature on testing for stochastic dominance among distributions (Eubank et al., 1993; Davidson and Duclos, 2006). Schumann expands these frameworks, proposing a nonparametric method for testing between functional forms for EU using field data. He demonstrates that standard parametric tests of alternative models of decision making depend crucially on assumptions about the properties of the distribution of payoffs. His analysis confirms that EU only performs well when choices have very similar moments, severely reducing the power of any tests regarding the properties of these distributions. The nonparametric methods allow consistent tests of hypotheses about risk-preference parameters and risky choices.
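As a minimal illustration of the second-order stochastic dominance concept used in this literature (not the Eubank et al. or Davidson–Duclos test statistics), the sketch below checks the defining condition on two simulated payoff samples: one distribution dominates another at second order if its integrated empirical CDF lies nowhere above the other's.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical payoff samples with equal means but different spreads.
payoffs_a = rng.normal(loc=100.0, scale=10.0, size=5000)
payoffs_b = rng.normal(loc=100.0, scale=20.0, size=5000)

def integrated_ecdf(sample, grid):
    """Integral of the empirical CDF up to each grid point,
    i.e. (1/n) * sum_i max(t - x_i, 0)."""
    return np.array([np.maximum(t - sample, 0.0).mean() for t in grid])

grid = np.linspace(min(payoffs_a.min(), payoffs_b.min()),
                   max(payoffs_a.max(), payoffs_b.max()), 200)
d_a = integrated_ecdf(payoffs_a, grid)
d_b = integrated_ecdf(payoffs_b, grid)

# A second-order dominates B if its integrated CDF is never above B's
# (in finite samples the comparison is subject to sampling noise).
print("A second-order dominates B:", bool(np.all(d_a <= d_b + 1e-12)))
print("B second-order dominates A:", bool(np.all(d_b <= d_a + 1e-12)))
```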
8. Addressing the paradoxes

The studies discussed above rely on traditional economic models where decision makers have well-defined objective functions, be it EU or expected profit, that are used to derive decision rules for estimating parameters of portfolio choice, inventory management, risk management policies, etc. However, the paradoxes in some of the experimental studies suggest that the assumption of a single objective function may not explain the data adequately. As a result, alternative theories and econometric techniques have been introduced to estimate key parameters.

Conte, Hey, and Moffatt, in their paper, recognize that the literature suggests a large number of theories to explain choices under uncertainty, and there is extensive evidence that shows heterogeneity in risky choices. Thus, one may need different models for different agents. Their solution is a mixture model that combines the EU model and rank-dependent expected utility theory (RDEU) (Quiggin, 1982), fitted to survey data relating to university students. Using a simulated maximum likelihood approach, they estimate both the probability of behavior according to EU vs. RDEU and the resulting parameters under each model. The results suggest heterogeneity, both within and between subjects. Their analysis estimates that 20% of the population is EU and 80% is RDEU, suggesting that behavior that systematically violates the EU assumption is prevalent. Their analysis also found significant differences in the risk aversion parameters across individuals, and it provides a methodological basis for future studies with larger populations that further investigate heterogeneity in risk decision rules and attitudes across and within individuals.

Wilcox, in his paper, suggests an alternative explanation of the choice paradoxes. He proposes a paradigm of multiple criteria decision models, where the decision criteria are affected by context. More specifically, he introduces the notion of ''stochastic risk aversion,'' where individuals display heteroskedastic error in choice depending on the choice context. Wilcox derives a notion of ranking stochastic risk aversion that coincides with non-stochastic notions of risk aversion (e.g. Pratt, 1964). Wilcox's contextual utility provides a theoretically appealing, observable link between Pratt's definition of a ''more risk averse'' person in the deterministic sense and a stochastic counterpart of being ''more risk averse.'' Using this contextual utility approach, he was able to obtain superior performance in explaining the Hey and Orme (1994) data set. Wilcox's approach provides a strong case to expand the traditional view of economic decision making under uncertainty, where there exists a strong preferential foundation for decision making that is modified by conditions.

The approaches suggesting that individuals follow varying heuristics for risk management under different circumstances are consistent with some of the findings in the neuroscience literature that decisions of varying complexity or context occur in different areas of the brain (Camerer, 2003). One explanation for the use of different decision rules is the decision–cost argument, where simpler decision rules are applied to address less significant choice problems. This logic is behind the similarity approach introduced by Rubinstein (1988), who argued that EU is used for making choices among dissimilar alternatives, while simpler rules (random selection, expected benefits) are used to choose among similar alternatives. Rubinstein introduced several measures of similarity, and these measures have been generalized.

9. Similarity

Buschena and Atwood's paper introduces several measures of similarity and compares their performance empirically using grid search maximum likelihood estimation routines and Bayesian
information criteria measures, extending the work of Hey and Orme (1994). Their analysis found that Euclidean distance, as well as cross-entropy, provides the measures of similarity that best explain observed choices among risky pairs.

The experiments of Buschena and Atwood, like many others in the literature, consider outcomes that would not drastically affect the well-being of the participants. They find that EU explains these choices when the alternatives are not similar. But does EU apply to choices corresponding to low-probability, high-loss events? These are the types of choices List and Mason consider in their experimental study. They conduct experiments on two pools of subjects consisting of college students and CEOs. Each is asked to make choices regarding lotteries with small probabilities of large losses. When the results are interpreted within an EU perspective, CEOs appear to show greater aversion to losses than the student population. However, both groups violate EU axioms substantially. Their results demonstrate how standard EU models may underestimate the willingness to pay to reduce low-probability, high-stakes risks.

10. Statistical inference

While Buschena and Atwood develop a similarity approach to choice among alternatives, the paper by Gilboa, Lieberman, and Schmeidler investigates the role of similarity in statistical inference. Their work relates analogical reasoning, which emphasizes similarities, to statistical decision rules for inference and prediction under uncertainty, with an application to problems of density estimation. Their basic premise is that people make decisions (including predictions) in the present based on analogies to the past. Past situations that are similar to the current choice problem receive heavier weight in decision making. They develop an axiomatic framework for decision making, which, under various assumptions, leads to prediction rules based on different measures of similarity that are identical to well-known statistical estimation techniques. They find that basic principles embodied by the ''combination axiom'' result in a particular kernel technique. They apply their approach to derive estimates of density functions, which complements the results of previous studies that develop estimates or predictors for other problems. There is a need for further investigation of the relative performance of these new techniques based on ''axiomatic decision theory'' and those embodied in current statistical theory (e.g., the estimation, testing, and prediction techniques presented in statistical textbooks that have been widely used and have various theoretical justifications).

While Gilboa et al. take an axiomatic approach to assess rules of inference under uncertainty, Russo and Yong take a behavioral approach. They aim to investigate whether preferences affect risk assessment. The psychological literature documents a number of phenomena, such as cognitive dissonance, under which individuals modify their perceptions to accommodate their specific situations (smokers are less likely to believe that smoking causes cancer; see Hsieh et al., 1996). They view risk as a function of the likelihood of an outcome and the loss associated with it. Using new experimental techniques, they find evidence of a desirability bias, namely, that individuals' expectations of future events are biased by their preferences for outcomes.
In particular, individuals believe the best outcomes are more likely than they truly are, and their perceptions are immune to safeguards, like presenting information in numerical format. They further illustrate the devastating effects this phenomenon may have on preference estimation, even in the sterile environment of a laboratory, as preferences and beliefs are now fully endogenous even when experimenters specify the probabilities. They also consider techniques to reduce this bias. The results suggest the need for further quantitative research of the links between preferences and assessment of risks.
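A minimal sketch of the similarity-weighted prediction rule discussed above in connection with Gilboa, Lieberman, and Schmeidler: past cases are weighted by their similarity to the current problem, and with a Gaussian similarity function the resulting predictor coincides with a standard kernel (Nadaraya–Watson) regression estimator. The data and bandwidth below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical past cases: a one-dimensional characteristic and an observed outcome.
x_past = rng.uniform(0.0, 10.0, size=200)
y_past = np.sin(x_past) + rng.normal(scale=0.3, size=200)

def similarity_weighted_prediction(x_new, x_past, y_past, bandwidth=0.5):
    """Predict the outcome at x_new as a similarity-weighted average of past
    outcomes, using a Gaussian similarity function (a kernel regression)."""
    similarity = np.exp(-0.5 * ((x_new - x_past) / bandwidth) ** 2)
    return np.sum(similarity * y_past) / np.sum(similarity)

print(similarity_weighted_prediction(2.0, x_past, y_past))
print(similarity_weighted_prediction(7.5, x_past, y_past))
```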
11. Dynamic perceptions

The experimental study of Heiman and Lowengart finds that the availability of new risk information may affect the decision-making process. In particular, they find that the decision process for buying frequently purchased food may shift from a single-attribute to a multi-attribute choice in response to health risk information. Using factor analysis, they find that, initially, when food is perceived to be relatively ''safe'', the attribute that dominates consumer choice of food is taste. However, the introduction of new evidence of health hazards modifies the evaluation process to consider multiple attributes including health, value, convenience, and taste. They find that modifications of the decision process, and the weights given to various attributes, depend on the strength of the information. The findings of these papers are consistent with studies that suggest decision making is costly (e.g. similarity models). As a result, decision-making processes evolve gradually, and changes in the processes are likely to occur when there are likely to be substantial gains.

12. Conclusions

The EU framework, combined with Bayesian procedures for learning and updating, good model-formulation techniques, and well-trained specialists to implement these techniques, provides a foundation for a large body of ongoing empirical and real-world applications. However, the papers in this Annals Issue identify some possible limitations of the EU framework. First, some of the papers show that choices made by individuals frequently violate the basic assumptions of the EU framework. A key point in this regard is whether these individuals are novices or well-trained specialists in the use of the EU framework. Still, it may be that the modeling of preferences has to be modified to allow for greater flexibility, say, either multiple criteria that vary according to circumstances or more flexible stochastic outcomes. Note that, in the innovative work of Putnam and Quintana (1993, 1994), monthly utility functions were employed that can vary from month to month. Their approach also used dynamic state space models for individual stocks that had time-varying coefficients and time-varying error term covariance matrices. This very flexible modeling and decision-making approach has produced very good investment returns. Second, econometric analyses that simultaneously estimate parameters of risk preference and the process that generates risk are found to have significant limitations. Just and Just, in particular, suggest procedures that may overcome these limitations. Third, experimental studies suggest the need for a better conceptual understanding and applied modeling of choices associated with low-probability, high-loss outcomes, where the performance of EU is especially weak. However, it must be recognized that the subjects involved in these studies (e.g., college students and CEOs) are not well trained in the statistical methodology associated with applying the EU framework, and this may be responsible for some of the decision-making errors encountered in these experiments.

In addition to presenting papers on modeling and estimation of risk preferences, this special issue also presents papers on the econometrics of risk perception. One line of research in this area derives risk estimation procedures that are based on a rigorous axiomatic foundation, while the other is based on a behavioral approach and suggests that the assessment of risk by decision makers cannot be separated from their preferences.
Some links between information, risk evaluation, and decision making are demonstrated by the empirical study of food choices. Finally, the major challenge facing us is to find a unifying framework that accommodates the elements of decision making under
uncertainty discussed above, and provides us with much improved empirical solutions to public policy and private decision problems. Empirical work in risky decisions faces special challenges in estimating unobserved behavioral parameters from decisions between unobserved outcomes. With the expanding pantheon of risky choice models, new tools are necessary to assess predictive power, applicability and feasibility. The work of this issue outlines several potential new directions to address these important issues. As recent world events have reminded us, risky choice is essential to the analysis of economic behavior and the formation of economic policy. So much of policy is designed to influence individual or joint risk. Econometric technique must develop and adapt to allow the furtive expansion of choice theories to inform policy makers. References Allais, M., 1953. Le comportement de l’Homme rationnel devant le risque. Econometrica 21, 503–546. Arrow, K.J., 1971. The theory of risk aversion. In: Essays in the Theory of Risk Bearing. Markham Publishing Co., Chicago. Arrow, K.J., Hurwicz, L., 1977. An optimality criterion for decision-making under ignorance. In: Arrow, K., Hurwicz, L. (Eds.), Studies in Resource Allocation Processes. Cambridge University Press, Cambridge. Camerer, C.F., 2003. Psychology and economics: Strategizing in the brain. Science 300, 1673–1675. Chavas, J.-P., Holt, M.T., 2002. The econometrics of risk. In: A Comprehensive Assessment of the Role of Risk in US Agriculture. Kluwer Academic Publishers, Norwell, MA. Chetty, R., Szeidl, A., 2008. Do consumption commitments affect risk preferences? Evidence from portfolio choice, Working paper, Department of Economics, UC Berkeley and NBER. Davidson, R., Duclos, J.-Y., 2006. Testing for restricted stochastic dominance, Departmental Working Papers, 2006-20, Department of Economics, McGill University. Deaton, A., Laroque, G., 1992. On the behaviour of commodity prices. Review of Economic Studies 59, 1–23. Deaton, A., Laroque, G., 1996. Competitive storage and commodity price dynamics. Journal of Political Economy 104, 896–923. Eubank, R., Schechtman, E., Yitzhaki, S., 1993. Test for second order stochastic dominance. Communications in Statistics—Theory and Methods 22, 1893–1905. Friedman, M., Savage, L.J., 1948. The utility analysis of choices involving risk. Journal of Political Economy 56, 279–304. Gustafson, R.L., 1958. Implications of recent research on optimal storage rules. Journal of Farm Economics 40, 290–300. Hey, J.D., Orme, C., 1994. Investigating generalizations of expected utility theory using experimental data. Econometrica 62, 1291–1326. Hey, J., 1995. Experiments and the economics of individual decision making under uncertainty and risk, an invited paper prepared for the Symposium on Experimental Economics at the 7th World Congress of the Econometric Society, Tokyo, Japan. Hsieh, C.-R., Yen, L.-L., Liu, J.-T., Lin, C.J., 1996. Smoking, health knowledge, and antismoking campaigns: An empirical study in Taiwan. Journal of Health Economics 15, 87–104. Kahneman, D., Tversky, A., 1979. Prospect theory: An analysis of decision under risk. Econometrica 47, 263–291. Markowitz, H.M., 1959. Portfolio Selection: Efficient Diversification of Investments. Yale University Press, New Haven, London. Markowitz, H.M., 1991. Portfolio Selection: Efficient Diversification of Investments. Blackwell, Oxford, UK. Pratt, J.W., 1964. Risk aversion in the small and in the large. Econometrica 32, 122–136. 
Putnam, B.J., Quintana, J.M., 1993. Bayesian global portfolio management. In: 1993 Proceedings of the Section on Bayesian Statistical Science. American Statistical Association, Alexandria. Putnam, B.J., Quintana, J.M., 1994. New Bayesian statistical approaches to estimating and evaluating models of exchange rates determination. In: Proceedings of the ASA Section on Bayesian Statistical Science, 1994 Joint Statistical Meetings. American Statistical Association. Quiggin, J., 1982. A theory of anticipated utility. Journal of Economic Behavior and Organization 3, 323–343. Raiffa, H., Schlaifer, R., 1961. Applied Statistical Decision Theory. Cambridge University Press, Cambridge. Rubinstein, A., 1988. Similarity and decision making under risk: Is there a utility theory resolution to the Allais paradox? Journal of Economic Theory 46, 145–153. Von Neumann, J., Morgenstern, O., 1944. The Theory of Games and Economic Behavior. Princeton University Press, Princeton, New Jersey. Zellner, Arnold, 1971. An Introduction to Bayesian Inference in Econometrics. Wiley, New York, (reprinted in 1996 in the Wiley Classics Series).
Zellner, A., Hong, C., Min, C., 1991. Forecasting turning points in international output growth rates using Bayesian exponentially weighted autoregression, time-varying parameter, and pooling techniques. Journal of Econometrics 49, 275–304.
David Zilberman ∗,1
Department of Agricultural and Resource Economics, University of California, Berkeley, United States
E-mail address: [email protected].

Arnold Zellner
Graduate School of Business, University of Chicago, United States

Available online 9 October 2009

∗ Corresponding address: Department of Agricultural and Resource Economics, University of California Berkeley, 207 Giannini Hall 94720, Berkeley, CA, United States. Tel.: +1 510 642 6570.
1 David Zilberman is a Member of the Giannini Foundation of Agricultural Economics.
Journal of Econometrics 162 (2011) 6–17
Global identification of risk preferences with revealed preference data

Richard E. Just a,∗, David R. Just b

a University of Maryland, United States
b Cornell University, United States

∗ Corresponding address: 2200 Symons Hall, University of Maryland, College Park, MD 20742, United States. Tel.: +1 301 405 1289; fax: +1 301 879 4501. E-mail address: [email protected] (R.E. Just).

Article history: Available online 12 October 2009
JEL classification: C51; C52; D21; D81; Q12
Keywords: Risk; Risk preferences; Identification; Stochastic technology

Abstract

The concept of parameter identification (for a given specification) is differentiated from global identification (which specification is right). First-order conditions for production under risk are shown to admit many alternative specification pairs representing risk preferences and either perceived price risk, production risk, or the deterministic production structure. Imposing an arbitrary specification on any of the latter three determines which risk preference specification fits a given dataset, undermining global identification even when parameter identification is suggested by typical statistics. This lack of identification is not relaxed by increasing the number of observations. Critical implications for estimation of mean–variance specifications are derived.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction

Historically, production technologies and related input demand systems were estimated without regard to risk preferences both in primal approaches (Marschak and Andrews, 1944; Mundlak and Hoch, 1965; Zellner et al., 1966) and dual approaches (see studies in Fuss and McFadden, 1978), even for highly risky production problems such as in agriculture (Lau and Yotopoulos, 1972). Two subsequent developments have emphasized the role of risk. First, Sandmo (1971) demonstrated the implications of Von Neumann and Morgenstern's (1944) framework for risky decision making. Second, factor inputs were recognized as tools to control production risk as well as to increase output (Just and Pope, 1978). Nevertheless, empirical modeling of risky production has progressed slowly, first testing for risk response (Behrman, 1968) and later for the structure of risk preferences (Binswanger, 1981). Many later studies show that inputs affect risk. Pope and Just (1996, 2002) have shown that consistent estimation requires modeling these effects even under expected profit maximization or cost minimization. While risk effects of factor inputs were originally demonstrated with experimental data, revealed preference data have been regarded as nearly essential for identifying risk preferences to overcome common concerns regarding the size and realism of payoffs
among others (e.g. Smith, 1982). Using revealed preference data (as assumed hereafter), empirical analysis requires estimation of both risk preferences and risk.1 Estimation in these problems raises two concerns. First, behavioral equations (defined henceforth as first-order conditions or their implications) confound risk preferences and risk so that their estimation alone cannot identify risk preferences and risk separately. Thus, identification of risk preferences requires identifying the risk distribution by some other means. Second, if the risk distribution is estimated by specifying a structure for the effects of inputs on risk, then risk preference estimation and inference are influenced by the imposed risk structure to the point that arbitrary specification choices determine which specification of risk preferences will fit a given dataset. We first review the few studies that attempt to estimate non-trivial risk preferences and risk jointly to illustrate the extent of strong restrictions that have been required to achieve parameter identification. Each of these studies relies on estimation of behavioral equations to discern risk preferences and risk separately. Sections 3–5 then present general theoretical results and illustrative examples to show that estimation of behavioral equations
1 Throughout this paper, the term ‘‘risk’’ refers to risk perceived by the decision maker, including both production and price risk, except where otherwise noted. Obviously, objective risk does not drive actions if it is not perceived. Experimental evidence has suggested some systematic irregularities in the relationship of objective and perceived risk, which add complications beyond those represented explicitly here. These irregularities present problems of the same character as those discussed herein for the identification of risk preferences.
alone is insufficient to identify (i) risk preferences when estimated jointly with either (ii) perceived price risk, (iii) perceived production risk, or (iv) perceived production structure. An inflexible or incorrect specification of risk (meaning any of the latter three) indirectly determines the risk preference specification that fits the data. Results prove that a false sense of parameter identification (of a given specification) has been attained at the expense of global identification (which specification is right). Sections 6 and 7 explain the implications of these results for mean–variance and two-moment models. A Monte Carlo exercise in Section 8 demonstrates the potential inability to differentiate even the polar cases of constant absolute risk aversion (CARA) from constant relative risk aversion (CRRA) with common risk specifications. Section 9 presents concrete suggestions for how the lack (or false sense) of identification can be overcome by additional structural modeling, elicitation, or experimentation. Section 10 concludes.2 2. Parameter identification at the expense of global identification In this section, we review the extent of restrictions used in practice to achieve parameter identification in joint estimation of risk preferences and risk as motivation for the remainder of the paper. The studies that jointly estimate non-trivial risk preferences and risk yield reasonably precise parameter identification as judged by estimated standard errors. The question we raise is whether parameter identification achieved by imposing strong restrictions yields misleading estimates that are not statistically discernable from what might be a globally correct specification in a broader context. Assuming estimation is intended to discern risk preferences globally rather than simply find a model that fits the data, we thus juxtapose the conventional notion of parameter identification for a given specification with a broader notion of global identification, defined as the ability to discern the correct specification among the set of possible specifications. Of the few studies that jointly estimate risk preferences and risk, the state of the art appears to be characterized by Antle (1987), Love and Buccola (1991, hereafter LB), Saha et al. (1994, hereafter SST) and Chavas and Holt (1996, hereafter CH). While several of these studies begin with a general model of preferences and technology, they all impose restrictions to achieve identification. By comparison, the vast majority of empirical production studies, many of which are based on duality, ignore risk aversion and/or risk-controlling factor inputs.3 Table 1 illustrates the mix of restrictions that have been imposed on some combination of risk preferences and risk. Columns two and three indicate the flexibility of the risk preference specification. The last column indicates the flexibility of other components. The LB study imposes restrictive CARA preferences equivalent to traditional linear mean–variance expected utility.4 The SST
2 All results in this paper are derived in the case of von Neumann–Morgenstern expected utility maximization. Newer theories of risky choice that weight probabilities to better represent certain aspects of human behavior, such as prospect theory (Kahneman and Tversky, 1979), Quiggin’s (1982) rank-dependent utility, or Machina’s (1987) generalized expected utility model can be represented by adding a simple weighting function to our mathematics. Thus, the issues we develop clearly apply to these more general frameworks as well. 3 Most duality studies also ignore price and production risk by assumption. Appelbaum and Kohli (1997) provide a partial exception under linear mean–variance preferences. The state-contingent approach permits application of duality without abridging risk preferences (Chambers and Quiggin, 2000) but, of the few empirical studies, both O’Donnell and Griffiths (2006) and Chavas (2008) indicate that only two or three states can be modeled due to multicollinearity, which would also bias estimates of risk preferences. 4 They find wide variation in absolute risk aversion among the adjacent counties, which seems incongruent with assuming CARA within counties.
study espouses the flexibility of the expo-power utility function, u(π) = θ − e^(−γ1 π^γ2), which permits decreasing absolute risk aversion (DARA), CARA, or increasing absolute risk aversion (IARA) as γ1 < 1, γ1 = 1, or γ1 > 1, and decreasing relative risk aversion (DRRA) or increasing relative risk aversion (IRRA) as γ2 < 0 or γ2 > 0, respectively. However, they assume γ2 = 1/γ1 for estimation, yielding only one preference parameter.5 The CH utility function permits CARA with IRRA, DARA with either IRRA or DRRA, as well as IARA with three parameters.6 Antle achieves an intermediate two-parameter flexibility in estimating preferences with a very general, though approximate, specification of prices and production.

The last column of Table 1 describes the flexibility of specifications for risk components that permit tractable estimation of these preference specifications. Studies by LB and SST estimate a more flexible technology than CH, but their one-parameter utility functions are the most restrictive, as is their assumption of nonstochastic prices.7 The CH preference specification is clearly the most flexible, but identification is attained by imposing the most restrictive technology.8 Antle's preference specification has second-order flexibility but is unable to represent globally such plausible phenomena as DARA with IRRA.9 By considering third-order flexibility in the risk effects of inputs (downside as well as conventional risk), Antle's technology is seemingly the most general. However, harsh identifying restrictions are used to justify ignoring other outputs in estimation (by assuming covariances of outputs are unaffected by inputs) and to restrict attention to a single first-order condition.10 Thus, while Antle's application apparently represents the most general distributional specification, the assumptions required for tractability and elimination of conflicting results undermine the apparent generality.

A review of these studies shows that parameter identification has been attained only by restricting either the preference specification or the risk specification so that major opportunities for producers to deal with risk are ignored (e.g., use of risk-controlling inputs or diversification). We determine a fundamental reason why this is the case, regardless of data availability.
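The risk-attitude behavior of the expo-power form quoted above can be checked numerically for any parameter values; the sketch below evaluates Arrow–Pratt absolute and relative risk aversion of u(π) = θ − exp(−γ1 π^γ2) by finite differences at illustrative (hypothetical) parameter values rather than relying on a closed-form expression.

```python
import numpy as np

def expo_power(pi, theta=1.0, g1=0.5, g2=0.8):
    # Expo-power utility as printed above: u(pi) = theta - exp(-g1 * pi**g2).
    return theta - np.exp(-g1 * pi ** g2)

def risk_aversion(u, pi, eps=1e-4):
    """Arrow-Pratt absolute risk aversion -u''/u' by central finite differences."""
    u1 = (u(pi + eps) - u(pi - eps)) / (2 * eps)
    u2 = (u(pi + eps) - 2 * u(pi) + u(pi - eps)) / eps ** 2
    return -u2 / u1

for pi in (1.0, 5.0, 25.0):
    ara = risk_aversion(expo_power, pi)
    print(f"pi={pi:5.1f}  absolute RA={ara:.4f}  relative RA={pi * ara:.4f}")
```

At the values shown, the printout exhibits absolute risk aversion falling and relative risk aversion rising in π (DARA with IRRA); other parameter choices produce the other patterns listed above.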
5 They claim that γ2 is merely a ''scale parameter'' that ''plays no role in determining the structure of risk preferences'' (pp. 180–181). But absolute risk aversion is (1 − γ1 + γ2 π^γ1)/π and relative risk aversion is 1 − γ1 + γ2 π^γ1, both of which are non-trivial functions of γ2.
6 Because CH do not consider initial wealth, their estimates of ''relative risk aversion'' are more accurately estimates of partial relative risk aversion (Menezes and Hanson, 1970).
7 Using a Just–Pope specification with flexible risk effects of inputs, SST specify profit as π = pf(x) + pg(x)ε − wx where the output price is p, the input vector is x, and corresponding input prices are w. Assuming all prices are nonstochastic, they simplify the first-order conditions E[u′(pf_x − w + pg_x ε)] = 0 to f_x − w/p + E(u′ g_x ε)/E(u′) = 0, even though their output price, the Kansas wheat harvest price, is clearly stochastic as anticipated at planting time.
8 The CH technology has fixed stochastic proportions, q_i = y_i x_i, i = 1, 2, where q_i is total production of crop i, y_i is the (normally distributed) yield per acre of crop i, and x_i is acreage allocated to crop i. Assuming a ''technology'' constraint, x_2 = β0 + β1 x_1 + β2 x_1^2, that is additive in x_2 permits elimination of x_2 from the problem, thus permitting estimation of the utility function from a single first-order condition.
9 Just as Arrow (1965) discredited quadratic utility functions because they generate IARA, the third moment can dominate risk preferences in a cubic utility specification.
10 With conflicting results for labor and fertilizer, Antle argues that the fertilizer condition is misspecified due to credit constraints and dismisses it based on a misspecification test. However, if one input is constrained, then first-order conditions for other inputs are altered (except under implausible substitution assumptions inconsistent with his quadratic revenue moments). Antle also avoids further potential for conflicting results by not estimating a first-order condition for land with the justification that it is not ''variable (at the margin) in the short run'' (p. 516), which ignores potential reallocation of land among crops at the margin as emphasized by the CH model. Antle permits at least indirect inference regarding the heterogeneity of individual farmer risk preferences by considering them as random draws. However, the strong evidence he finds against CARA means that risk aversion parameters are not simply random draws as his methodology assumes, but vary systematically with wealth and income.
Table 1
Assumptions used for joint estimation of risk preferences and risk. For each study, the entries report the expected utility specification, the number of estimated preference parameters, and the stochastic price, stochastic production, and deterministic production specification.

Love and Buccola (1991). Expected utility: E(−e^(−γπ)). Estimated preference parameters: 1. Specification: 1 output; nonstochastic prices; normally distributed production; 3 inputs control mean and variance independently.

Saha et al. (1994). Expected utility: E(−e^(−γ π^(1/γ))). Estimated preference parameters: 1. Specification: 1 output; nonstochastic prices; Weibull distributed production; 2 inputs control mean and variance independently.

Antle (1987). Expected utility: E(π) + γ1 E(π^2) + γ2 E(π^3). Estimated preference parameters: 2. Specification: multiple outputs but only one is estimated (a); 3 revenue distribution moments controlled by 4 inputs; only 2 first-order conditions are estimated, one of which is discarded (b).

Chavas and Holt (1996). Expected utility: E[∫ e^(γ0 + γ1 π + γ2 π^2) dπ]. Estimated preference parameters: 3 (c). Specification: 2 outputs controlled by one decision (d); truncated normal prices; normal yields; stochastic fixed-proportions production (e); 1 allocated input.

(a) Other outputs are disregarded assuming that inputs do not affect the covariance of profits with other outputs.
(b) Of the two inputs for which first-order conditions are estimated, the fertilizer condition is ignored, arguing that failure to represent credit constraints may cause bias (although that can also bias estimation of other first-order conditions).
(c) The CH specification also includes an additional term to estimate a time trend associated with γ0.
(d) With a land allocation constraint, the input decision for the first output determines the input decision for the second.
(e) The mean and standard deviation of each output is proportional to the amount of input allocated to it.
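The first row of Table 1 combines CARA preferences with normally distributed production. Under normality, E[−exp(−γπ)] = −exp(−γE(π) + γ²Var(π)/2), so expected-utility rankings coincide with rankings by the linear mean–variance criterion E(π) − (γ/2)Var(π), which is the sense in which the LB specification is equivalent to traditional linear mean–variance expected utility. A quick Monte Carlo check of that equivalence, with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(2)
gamma = 0.02
# Two hypothetical normally distributed profit prospects.
prospects = {"A": (100.0, 15.0), "B": (110.0, 40.0)}  # (mean, std. dev.)

for name, (mean, sd) in prospects.items():
    draws = rng.normal(mean, sd, size=200_000)
    eu = np.mean(-np.exp(-gamma * draws))                    # simulated expected CARA utility
    eu_closed = -np.exp(-gamma * mean + 0.5 * gamma**2 * sd**2)
    mv = mean - 0.5 * gamma * sd**2                          # linear mean-variance criterion
    print(f"{name}: simulated EU={eu:.5f}  closed form={eu_closed:.5f}  mean-variance={mv:.2f}")
```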
3. Dependence of estimated risk preferences on the price risk specification

In the next three sections we show that parameter identification in joint estimation of risk preferences and risk from behavioral equations can be achieved only by imposing such restrictions. For theoretical clarity, we show this separately for each of three components of the risk specification. In this section we consider possibilities for separate identification of risk preferences and perceived price risk abstracting from production risk and production structure.

To demonstrate the generality of the identification problem, suppose (i) u(π) represents a general utility function of a random outcome π satisfying standard properties, u′ > 0 and u′′ < 0, (ii) the outcome π is a function π = π(x, ε) of a vector of choices x chosen from a feasible set x ∈ X and a vector of random prices ε where π is differentiable, increasing, and concave in x for every ε ∈ P and has a non-vanishing Jacobian with respect to ε for every x ∈ X, and (iii) the producer's subjective probability density of ε is represented by g(ε) on support ε ∈ P where g satisfies standard properties, g(ε) > 0 for ε ∈ P and ∫_P g(ε)dε = 1. The density g induces a density h(π|x) on support π ∈ Π(x) = {z | z = π(x, ε), ε ∈ P} for the random outcome π conditioned on the vector of actions x. Each of these functions (u, π, g, and, hence, h) is assumed to depend on unknown parameters requiring estimation. The expected utility maximization criterion is thus
max_{x∈X} ∫_{Π(x)} u(π) h(π|x) dπ.    (1)
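A minimal numerical instance of criterion (1), under assumed functional forms that are not those of the paper: CARA utility, profit π(x, ε) = εx − 0.5x², and a lognormal subjective price density, with the integral approximated by Gauss–Hermite quadrature.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative assumptions (not from the paper): CARA utility, quadratic cost,
# and a lognormal subjective density for the random price epsilon.
gamma = 1.0                      # absolute risk aversion
mu, sigma = 0.0, 0.3             # log-price parameters: eps = exp(mu + sigma * z), z ~ N(0,1)
nodes, weights = np.polynomial.hermite.hermgauss(40)   # Gauss-Hermite quadrature rule

def expected_utility(x):
    eps = np.exp(mu + sigma * np.sqrt(2.0) * nodes)     # change of variables for N(0,1)
    profit = eps * x - 0.5 * x**2                       # pi(x, eps)
    utility = -np.exp(-gamma * profit)                  # CARA utility
    return (weights * utility).sum() / np.sqrt(np.pi)   # approximates criterion (1)

res = minimize_scalar(lambda x: -expected_utility(x), bounds=(0.0, 3.0), method="bounded")
print("EU-maximizing choice x* =", round(res.x, 4))
```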
Proposition 1 (Parameter Identification). Suppose outcomes are subjectively random after conditioning on actions and that a given revealed preference dataset is sufficient to achieve identification of a joint parametric specification of the utility function and probability density of outcomes {u(π), h(π|x)} satisfying standard properties. Then many alternative specification pairs, {ũ(π), h̃(π|x)}, exist that each generate identical behavior for all x.
Proof. Consider an arbitrary monotonically increasing transformation π̃ = t(π) with inverse function π = t⁻¹(π̃) that maps Π(x) onto Π̃(x). This transformation can induce any arbitrary continuous distribution on π̃ that merely orders the profit outcomes from high to low in the same way. Applying this transformation reveals that

∫_{Π(x)} u(π) h(π|x) dπ = ∫_{Π̃(x)} u(t⁻¹(π̃)) h(t⁻¹(π̃)|x) t⁻¹_π̃ dπ̃ = ∫_{Π̃(x)} ũ(π̃) h̃(π̃|x) dπ̃

where ũ(π̃) ≡ u(t⁻¹(π̃)) t⁻¹_π̃ and h̃(π̃|x) ≡ h(t⁻¹(π̃)|x). Monotonically increasing utility requires ũ_π̃ = u_π (t⁻¹_π̃)² + u t⁻¹_π̃π̃ > 0, which holds if t is either concave or sensibly convex, defined by t_ππ < u_π t_π/u.11 Because the decision criterion is an identical function of x within the set of specifications that satisfies this condition, behavior is identical. The condition on t merely requires a sensible pair of specifications, {ũ(π), h̃(π|x)}, i.e., monotonically increasing utility.12 This condition clearly includes widely differing possibilities with both more or less risk aversion compared to u(π), and significant redistributions of risk compared to h(π|x). Because the result holds for all concave t as well as all convex t satisfying the convexity restriction, an infinite set of nontrivial pairs of specifications of risk preferences and price risk fits the same behavior equivalently. Since paired specifications in this set are indiscernible econometrically, the choice of specification for either the utility function or the subjective distribution of price risk is arbitrary.
11 Differentiating the identity π = t^{-1}(t(π)) successively implies t^{-1}_π̃ = 1/t_π, t^{-1}_π̃π̃ = −t^{-1}_π̃ t_ππ/t_π^2 and t^{-1}_π̃π̃π̃ = 3t_ππ^2/t_π^5 − t_πππ/t_π^4, which simplifies interpretation.
12 As proved, Proposition 1 admits risk loving behavior. Risk averse (neutral) (loving) behavior occurs as ũ_π̃π̃ = u_ππ(t^{-1}_π̃)^3 + 3u_π t^{-1}_π̃ t^{-1}_π̃π̃ + u t^{-1}_π̃π̃π̃ < (=)(>) 0, or equivalently, as u_ππ t_π − 3ũ_π̃ t_π^2 t_ππ − u t_πππ < (=)(>) 0, where the first right hand term is negative and the second and third right hand terms are of the opposite signs as t_ππ and t_πππ, respectively. However, the substantive implications of the proposition also hold if risk loving behavior is excluded, although the range of arbitrary specifications is reduced. Imposing u_ππ t_π − 3ũ_π̃ t_π^2 t_ππ − u t_πππ ≤ 0 merely means that arbitrary choices for ũ(π) among the range of possible {ũ(π), h̃(π|x)} must satisfy ũ″ < 0.
Proposition 2 (Failure of Parameter Identification in a Global Context). Let H be the set of all possible probability densities h̃(π|x) and let U be the set of all possible utility specifications ũ(π) defined by the indistinguishable pairs of specifications associated with {u(π), h(π|x)} in Proposition 1. If a flexible specification of utility that admits all choices in U can be identified by imposing a particular parametric probability density specification ĥ(π|x) ∈ H, then estimated risk preferences are determined by the arbitrary choice of ĥ(π|x) (if revealed preference data are adequate to identify both by estimation of behavioral equations).

Proof. If the probability density specification ĥ(π|x) is imposed, then alternative pairs of specifications {ũ(π), h̃(π|x)} with ĥ(π|x) ≠ h̃(π|x) are eliminated from consideration, even though one of the eliminated pairs may represent the true pair of specifications. Thus, imposing ĥ(π|x) determines u(π) as the utility specification(s) û(π) that, when paired with ĥ(π|x), generate(s) the same behavior as {u(π), h(π|x)}.

Proposition 3 (Failure of Global Identification). Under the assumptions of Proposition 2, if flexible specifications of risk preferences admitting all choices in U and of risk admitting all choices in H can be found, then neither is identified by estimation of behavioral equations.

Proof. If many pairs of specifications in U × H generate identical objective criteria and first-order conditions, then parameters that select u(π) in U and h(π|x) in H are not identified.

Intuitively, Propositions 2 and 3 show that estimation of behavioral equations cannot distinguish curvature in a utility function from a redistribution of price risk. The choice of either the utility function specification ũ(π) or the probability density specification h̃(π|x) is arbitrary within the range of possible pairs and determines which specification of the other will fit behavior. For example, an inability to identify the distribution of price risk necessarily implies an inability to identify risk preferences from estimation of behavioral equations. Thus, global identification is not achieved regardless of the precision of parameter identification suggested by estimated standard errors.

3.1. Some examples

These propositions apply to a wide variety of practical problems involving price risk such as the portfolio problem of finance and producer problems with price risk involving either fixed allocable resources or purchased inputs. With a fixed allocable resource constraint related to credit, capital, or plant size, profit is π = εx − c(x) with resource constraint vx = w, where x is a column vector of allocations, ε is a row vector of revenues per unit of the resource, c(x) represents production costs (sometimes characterized by constant marginal costs as in activity analysis), v is a row vector of ones, and w is the binding resource limit. If c(x) ≡ 0, this is the standard portfolio problem where π = εx is ex post wealth, w is initial wealth, ε is a vector of rates of return including the original investment, and x is the portfolio allocation.^13 Early portfolio work with this model assumed CARA and normality (e.g. Frankfurter et al., 1971), in which case, the decision
13 While we focus on estimation of a utility function, many portfolio studies have a simpler objective of measuring returns to a mean–variance efficiency frontier (e.g. Shanken, 1990). Our results show that such models are ill suited to infer risk preferences (as Markowitz, 1959, suggested) except under strict assumptions that can invalidate tests.
criterion is^14

$$u^{-1}(Eu(\pi)) = \mu_\pi - (\phi/2)\sigma_\pi^{2} = \mu x - c(x) - (\phi/2)x^{T}\Sigma x, \qquad (2)$$
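The scalar identity behind (2) can be checked with a quick simulation; the numbers below are arbitrary illustrations.

```python
# Monte Carlo check of the CARA-normality certainty equivalent used in Eq. (2)
# (illustrative numbers): u^{-1}(E u(pi)) = mu_pi - (phi/2) sigma_pi^2.
import numpy as np

rng = np.random.default_rng(1)
phi, mu_pi, sigma_pi = 0.5, 3.0, 1.2
pi = rng.normal(mu_pi, sigma_pi, size=2_000_000)

ce_simulated = -np.log(np.mean(np.exp(-phi * pi))) / phi   # u^{-1}(E[-exp(-phi*pi)])
ce_formula = mu_pi - 0.5 * phi * sigma_pi**2
print(ce_simulated, ce_formula)                            # both approximately 2.64
```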
where decision maker perceptions are represented by ε ∼ N(μ, Σ). The Lagrangian for max_x {Eu(εx − c(x)) | vx = w} is L = μx − c(x) − (φ/2)x^TΣx − λ(vx − w). First-order conditions are μ − c′ − φx^TΣ = λv and vx = w. Eliminating the unobserved λ,

$$x = (\phi\Sigma)^{-1}\left[(\mu - c')^{T} - \frac{v(\phi\Sigma)^{-1}(\mu - c')^{T} - w}{v(\phi\Sigma)^{-1}v^{T}}\,v^{T}\right]. \qquad (3)$$
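The closed form in (3) can be verified numerically. The sketch below uses arbitrary illustrative values for φ, μ, Σ, and w, with c(x) ≡ 0 as in the standard portfolio case, and confirms that the computed allocation satisfies both the resource constraint and the first-order conditions.

```python
# Sketch of the closed-form allocation in Eq. (3) for the fixed-resource (portfolio)
# case under CARA-normality; the numbers below are arbitrary illustrations.
import numpy as np

phi = 2.0                                    # absolute risk aversion
mu = np.array([1.08, 1.12, 1.20])            # mean returns per unit of the resource
Sigma = np.array([[0.010, 0.002, 0.001],
                  [0.002, 0.030, 0.004],
                  [0.001, 0.004, 0.080]])    # perceived covariance of returns
c_prime = np.zeros(3)                        # c(x) = 0: the standard portfolio problem
v = np.ones(3)                               # row vector of ones in the constraint vx = w
w = 10.0                                     # binding resource (initial wealth) limit

A = np.linalg.inv(phi * Sigma)
unconstrained = A @ (mu - c_prime)                   # (phi*Sigma)^{-1} (mu - c')^T
lam = (v @ unconstrained - w) / (v @ A @ v)          # Lagrange multiplier eliminated in Eq. (3)
x = A @ ((mu - c_prime) - lam * v)                   # Eq. (3)

print("allocation x:", x)
print("constraint vx:", v @ x)                                       # equals w
print("FOC residual:", mu - c_prime - phi * (Sigma @ x) - lam * v)   # approximately zero
```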
In the corresponding case with purchased variable inputs, stochastic prices, and nonstochastic production, profit is π = εx(z) − wz, where x(z) is a multi-output production function, z is a vector of inputs, and w is a nonstochastic row vector of input prices.^15 Defining a cost function, c(x̃) = min_z {wz | x(z) = x̃}, profit can again be stated as π = εx − c(x), so that under CARA and normality of ε the decision criterion in (2) applies. In this case, choices satisfy max_x {Eu(εx − c(x))} and first-order conditions imply directly that

$$x = (\phi\Sigma)^{-1}(\mu - c')^{T}. \qquad (4)$$
While the limitations of the models in (2)–(4) are widely recognized, they provide a transparent illustration of the general risk preference identification problem proved in Propositions 1–3. In the realistic case where decision makers' subjective parameters μ and Σ must be estimated, estimation based on behavioral equations alone cannot identify separately either risk aversion or perceived risk. In every instance, φ and Σ appear multiplied together just as u and h are multiplied together in the propositions. Even if population averages of μ and Σ are known but individuals are heterogeneous in both preferences and perceptions, variations in risk aversion among individuals cannot be distinguished from variations in beliefs.

3.2. Practical implications for functional flexibility

The practical implications of Propositions 2 and 3 are that estimates of risk preferences and price risk based on behavioral equations alone cannot be taken seriously because theory implies that the two cannot be identified separately. Any separate identification of these two model components suggested by statistical results must be a false product of imposing a specification insufficiently flexible to represent alternatives with distinctly different risk preferences that cannot be discerned econometrically. In contrast, defensible estimates of risk preference and price risk are possible only if the interpretation of revealed preference data is supplemented by an estimate of either risk preferences or price risk that is developed independent of behavioral equations.

4. Dependence of estimated risk preferences on the stochastic production specification

Flexibility in the risk preference specification and the production risk specification can also serve as substitutes in fitting behavioral equations. To isolate production risk in the framework of Section 3, assume prices are nonstochastic and let x represent a
14 The differential equation φ = −u″/u′ defining CARA implies that the utility function, aside from affine transformations, can be represented as u(π) = −e^{−φπ}, where u″ < 0 implies φ > 0. Where π ∼ N(μ_π, σ_π^2), the moment generating function M_π(t) = E(e^{tπ}) = e^{tμ_π + t^2σ_π^2/2} yields E[u(π)] = −e^{−φμ_π + φ^2σ_π^2/2}, from which (2) follows.
15 Prior conditions, such as wealth and fixed production factors, are suppressed for convenience. While input price risk can be added, we do not do so because input prices are generally assumed known at the time of input choices.
vector of factor inputs so that technology generates profits in the form π = π(x, ε) with input choice set x ∈ X, where ε represents a vector of production shocks that are subjectively random at the time factor input choices are made (e.g., weather, breakdowns, disease). With this reinterpretation of the model, Propositions 1–3 immediately apply to joint estimation of risk preferences and production risk. Thus, an infinite set of paired specifications of risk preferences and production risk are also econometrically indiscernible based on estimation of behavioral equations with revealed preference data, implying that curvature in the utility function (risk aversion) and redistribution in the production risk specification are, in principle, perfect substitutes in fitting data. The choice of either the utility function specification ũ(π) or the production risk specification that induces h̃(π|x) is arbitrary within a wide range of possible pairs and determines which specification of the other will fit behavior. Global identification is thus not possible from estimation of behavioral equations alone regardless of statistical precision suggested by estimates with particular specifications.

4.1. An example

A simple but flexible way of representing the distinct roles of factor inputs in controlling both production risk and expected production is provided by the Just–Pope production function, q = f(x) + σ(x)ε, E(ε) = 0, where f and σ are both second-order differentiable and f is increasing and concave. While σ is assumed positive (without loss of generality) and concave (to assure second-order conditions), it can be either increasing or decreasing depending on whether inputs increase or decrease production risk (Just and Pope, 1978). Suppose utility follows CARA, ε ∼ N(0, 1), and that both f and σ have unknown parameters to be estimated in their derivatives. Then π ∼ N(pf(x) − wx, p^2σ^2(x)), so the decision criterion is

$$\max_{x}\; pf(x) - wx - (\phi/2)p^{2}\sigma^{2}(x). \qquad (5)$$
The associated first-order conditions are

$$pf_{x} - w - (\phi/2)p^{2}\sigma^{2}_{x} = 0. \qquad (6)$$
First-order conditions thus include an equation associated with each factor input (as well as for each output if generalized to multiple outputs). While one might think observing input choices that can control risk would provide rich information to discern risk preferences, estimation of behavioral equations alone cannot identify risk preferences and production risk separately. Any multiplicative parameter in σ^2_x cannot be distinguished from the risk aversion parameter φ because they appear multiplicatively in every first-order condition. Thus, imposing one determines the other. Even worse, assuming σ^2_x = 0, as in studies that ignore risk-controlling inputs, makes the risk term in (6) vanish so that risk preferences cannot be estimated. Conversely, imposing expected profit maximization (φ = 0), as in typical studies using duality, makes the risk term in (6) vanish so that effects of factor inputs on risk cannot be estimated.^16
16 A further complication arises when producers can determine input use after observing stochastic conditions during a production cycle (Antle and Hatchett, 1986). If ε is known when actions are determined, then profit is maximized in each state of nature, first-order conditions become pf′(x) − w + pσ^2_x ε = 0, and the producer's attitude toward risk is not revealed. Estimation of (6) causes E(pσ^2_x ε) to be falsely interpreted as an estimate of (φ/2)p^2σ^2_x. Thus, the estimated risk aversion coefficient reflects the correlation of σ^2_x and ε, which depends solely on the technology. Hence, correct inference of risk preferences also depends on correctly characterizing individual inputs by whether they are determined before or after observing or partially observing ε.
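The confounding of φ with any multiplicative risk parameter in (5) and (6) can be illustrated with a short sketch; the functional forms and numbers below are our own illustrative assumptions. Scaling perceived production risk by k while scaling φ by 1/k leaves the criterion, and hence the chosen input level, exactly unchanged.

```python
# Sketch of the confounding in Eqs. (5)-(6) (illustrative functional forms):
# scaling perceived production risk sigma^2(x) by k while scaling the risk
# aversion coefficient phi by 1/k leaves the CARA-normality criterion, and
# hence the chosen input level, exactly unchanged.
import numpy as np
from scipy import optimize

p, w = 2.0, 1.0                          # output and input prices
f = lambda x: 10 * np.sqrt(x)            # mean production (increasing, concave)
sigma2 = lambda x: 0.5 * x**2            # production risk rising in the input

def chosen_x(phi, k):
    # maximize p f(x) - w x - (phi/2) p^2 [k * sigma2(x)]  as in Eq. (5)
    obj = lambda x: -(p * f(x) - w * x - 0.5 * phi * p**2 * k * sigma2(x))
    return optimize.minimize_scalar(obj, bounds=(1e-6, 100), method="bounded").x

for phi, k in [(0.10, 1.0), (0.05, 2.0), (0.01, 10.0)]:   # phi*k held constant
    print(f"phi={phi:.2f}, k={k:>4}: x* = {chosen_x(phi, k):.4f}")
```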
4.2. Practical implications for functional flexibility

Because risk preferences and production risk cannot be identified separately, any apparent statistical identification of both achieved by estimating behavioral equations with imperfectly flexible forms is artificial and cannot be taken seriously in the context of global identification. Valid separate estimation of risk preference and production risk requires a separate means of identifying either production risk or risk preferences to interpret revealed preference data.

5. Dependence of estimated risk preferences on the production structure specification

It has long been recognized that typical concavity in production under expected profit maximization can cause risk response in behavior (Just, 1975). We show more generally that nonlinearity in the production structure can substitute for risk preferences in fitting data. To isolate production structure, suppose again that x represents a vector of production inputs rather than outputs, and that ε represents a random vector reflecting either random prices or random shocks to production, or both, that are not realized until after input quantities are chosen. For this case, we restate the criterion determining actions in (1) as

$$\max_{x\in X}\int_{P} u(\pi(x,\varepsilon))\,g(\varepsilon)\,d\varepsilon.$$
Again the functions (u, π, and g) are assumed to depend on unknown parameters to be estimated.

Proposition 4 (Parameter Identification). Suppose some factors affecting production are subjectively random when actions are determined and that a given revealed preference dataset is sufficient to achieve identification of a joint parametric specification of the utility function and technology {u(π), π(x, ε)} satisfying standard properties. Then many alternative specification pairs, {ũ(π), π̃(x, ε)}, exist such that each generates identical behavior.

Proof. Let ũ(π) be an arbitrary alternative utility function satisfying standard properties. Then define an alternative specification of the technology, π̃(x, ε) ≡ ũ^{-1}(u(π(x, ε))), where ũ^{-1}(·) is the inverse function such that π_0 = ũ^{-1}(u_0) if and only if u_0 = ũ(π_0), which is well defined when ũ′ > 0. Then π̃_x(x, ε) = ũ^{-1}_u u_π π_x(x, ε) > 0 is assured by ũ′ > 0 and u′ > 0; concavity in x requires π̃_xx(x, ε) = ũ^{-1}_uu u_π^2 π_x π_x′ + ũ^{-1}_u u_ππ π_x π_x′ + ũ^{-1}_u u_π π_xx < 0. Thus, any ũ(π) satisfying this condition (as well as ũ′ > 0 and ũ″ < 0) coupled with π̃(x, ε) generates the same decision criterion as {u(π), π(x, ε)}, ∫_P u(π(x, ε))g(ε)dε = ∫_P ũ(π̃(x, ε))g(ε)dε. Because the decision criterion is identical, behavior is identical for this wide range of specification pairs.

The latter two terms of the concavity condition are negative, representing the concavity of u in π and of π in x, while the first term represents the inverse curvature of ũ, where ũ^{-1}_uu > 0 as ũ_ππ < 0. Thus, the condition merely requires that the curvature of ũ does not exceed the effective sum of curvatures of u in π and of π in x.^17 This condition thus requires only a combination of ũ(π) and π̃(x, ε) that satisfies second-order conditions. For all such cases u(π(x, ε)) = ũ(π̃(x, ε)) for all x ∈ X and ε ∈ P, which implies that
17 Differentiating the identity u = ũ(ũ^{-1}(u)) implies ũ^{-1}_u = 1/ũ_π and ũ^{-1}_uu = −ũ_ππ/ũ_π^3, from which the concavity condition can be expressed as ũ_π π̃_xx = −ũ_ππ(u_π/ũ_π)^2 π_x π_x′ + u_ππ π_x π_x′ + u_π π_xx < 0. The first two right hand terms just offset one another when u = ũ so that the third term offers additional possibilities for sharper curvature in ũ.
the two pairs of specifications, {u(π), π(x, ε)} and {ũ(π), π̃(x, ε)}, generate identical behavior. Because the choice of ũ(π) is arbitrary aside from the concavity restriction (which permits a wide range of possibilities with either greater or less risk aversion than u), an infinite set of nontrivial pairs of specifications representing widely different preferences exists that generate exactly the same behavior.^18 Proofs parallel to those of Propositions 2 and 3 thus yield similar propositions.

Proposition 5 (Failure of Parameter Identification in a Global Context). Let U be the set of all possible utility specifications ũ(π) and let T be the set of all possible technology specifications π̃(x, ε) defined by the indistinguishable pairs of specifications associated with {u(π), π(x, ε)} in Proposition 4. If a flexible specification of utility that admits all choices in U can be identified by imposing a particular parametric technology specification π̂(x, ε) ∈ T, then estimated risk preferences are determined by the arbitrary choice of π̂(x, ε) (if revealed preference data are adequate to identify both from estimation of behavioral equations).

Proposition 6 (Failure of Global Identification). Under the assumptions of Proposition 5, if flexible specifications of risk preferences admitting all choices in U and of technology admitting all choices in T can be found, then neither is identified by estimation of behavioral equations.

Propositions 4–6 show that revealed preference data can identify only the sum of curvatures attributable to utility and technology rather than the amount of curvature attributable to each individually. In particular, assuming risk neutrality, i.e., linearity of ũ(π), will always satisfy the conditions of Proposition 4. This has two implications. First, when both technology and preferences must be estimated from behavioral equations, risk averse and risk loving behavior cannot be distinguished without some knowledge of the technology that cannot be obtained from estimation of behavioral equations alone. Second, derivatives of the popular second-order flexible profit function forms of duality (translog, generalized Leontief, generalized Cobb–Douglas, etc.), which assume (expected) profit maximization, can fit revealed preference data in production problems when, in fact, concavity of the "estimated" technology estimates the sum of concavities of the true utility function and technology combined. Thus, dual forms with flexible technology specifications may fit a wide range of datasets even though the characterization of underlying behavior is far off the mark.

5.1. Some examples

As a simple example, consider the CARA-normality case with a normalized nonstochastic output price, scalar input x, and profit π = f(x) + σ(x)ε where f(x) = bx − cx^2, σ(x) = d^{1/2}x, and ε ∼ N(0, 1). Where w is the relative input price, the certainty equivalent criterion is bx − cx^2 − wx − (φ/2)dx^2 with first-order condition x = (b − w)/(2c + φd). Thus, any combination of c, d, and φ such that 2c + φd = k for some constant k will fit a given dataset on x and w identically. If the risk specification incorrectly assumes σ(x) does not depend on x, then 2c is incorrectly overestimated by φd. Alternatively, if the deterministic specification of production, f(x), incorrectly omits cx^2, then the marginal risk premium (per unit of x), φd, is overestimated by 2c. An indirect estimate of φ derived from an estimate of φd (for example, by estimating d from a heteroskedasticity correction in the π equation) is overestimated accordingly. Thus, the estimate of risk preferences depends on the deterministic production specification.

Another example where this problem may be important is the flexible utility function with fixed-proportions technology assumed by CH. They estimate a highly flexible utility function where marginal utility is exponential in a quadratic of profit, u′ = e^{γ_0 + γ_1π + γ_2π^2}, profit follows fixed proportions, π = r_1x_1 + r_2x_2, and r_i is a random net return per unit of x_i. Suppose alternatively that utility omits the quadratic term (γ_2 = 0), and that the production structure permits diminishing marginal productivity, π* = r_1(x_1 − α_1x_1^2 + α_3x_1x_2) + r_2(x_2 − α_2x_2^2 + α_3x_1x_2). Then the two marginal utilities, e^{γ_0 + γ_1π*} = e^{γ_0 + γ_1[r_1(x_1 − α_1x_1^2 + α_3x_1x_2) + r_2(x_2 − α_2x_2^2 + α_3x_1x_2)]} and e^{γ_0 + γ_1π + γ_2π^2} = e^{γ_0 + γ_1(r_1x_1 + r_2x_2) + γ_2(r_1x_1 + r_2x_2)^2}, are identical and produce identical first-order conditions if γ_1α_1 = γ_1α_2 = γ_2, aside from approximating the multiplier 2γ_2r_1r_2 of x_1x_2 in the last second-order cross term by γ_1α_3(r_1 + r_2). Thus, the more general deterministic production specification is highly substitutable for the more general preference specification even in the context of explicit specifications. As a result, while parameter identification is attained with a narrow technology specification, global identification of risk preferences appears to be lacking.
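Returning to the first (scalar) example above, a brief numerical sketch (with arbitrary illustrative parameter combinations) shows that every (c, d, φ) with the same value of 2c + φd produces exactly the same input choice at every input price, so the curvature of f and the marginal risk premium cannot be separated by the behavioral equation.

```python
# Numerical sketch of the scalar example above (arbitrary illustrative values):
# every (c, d, phi) combination with 2c + phi*d equal to the same constant k
# generates exactly the same input choice x = (b - w)/(2c + phi*d) at every
# input price w, so curvature of f and the marginal risk premium are confounded.
import numpy as np

b = 0.9
w_grid = np.linspace(0.1, 0.8, 5)                 # relative input prices
combos = [(0.05, 0.00, 0.0),                      # risk neutral, all curvature in f
          (0.02, 0.30, 0.2),                      # part curvature, part risk premium
          (0.00, 0.50, 0.2)]                      # linear f, all curvature from risk
for c, d, phi in combos:                          # 2c + phi*d = 0.10 in every case
    x = (b - w_grid) / (2 * c + phi * d)
    print(f"c={c:.2f}, d={d:.2f}, phi={phi:.1f}: x = {np.round(x, 3)}")
```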
5.2. Practical implications for functional flexibility

Propositions 4–6 show, for practical econometric purposes, that restricting the technology specification can indirectly determine the estimated risk preference structure just as can restricting the price or production risk specification. Unless a flexible specification is used for technology, both the estimated extent of risk aversion and risk preferences, in general, may reflect nothing more than diminishing marginal returns. While a typical suggestion for this problem might be to use flexible specifications for both preferences and technology, the implication of Proposition 6 is that specifying fully flexible forms for both results in a failure of identification. Any apparent statistical identification achieved by estimating imperfectly flexible forms based on behavioral equations alone is thus artificial. Valid estimates of either risk preferences or technology require some means of identification independent of behavioral equations.

6. Misinterpretation of mean–variance approximations

This section considers implications for the common practice of regarding mean–variance models as approximations. The most common model for joint estimation of risk preferences and risk has been the linear mean–variance model, often justified as a two-term Taylor series approximation of utility,

$$\max_{x}\; \mu_\pi(x) - (\phi/2)\sigma_\pi^{2}(x), \qquad (7)$$
where φ = −u″(π̄)/u′(π̄) is absolute risk aversion at the mean of profit. Treating φ as a constant in this approximation, first-order conditions are represented as

$$\partial\mu_\pi/\partial x - (\phi/2)\,\partial\sigma_\pi^{2}/\partial x = 0, \qquad (8)$$

and are typically assumed to represent a local approximation by constant absolute risk aversion. Much of the empirical literature fails to recognize that φ in this approximation does not generally approximate absolute risk aversion. In fact, φ depends on expected profit and, thus, on factor input levels, which (8) fails to recognize. As a result, non-trivial effects of inputs on both risk aversion and risk are generally confounded in the second left-hand term of (8).

18 Some might argue that a more explicit representation of profit such as π(x, ε) = pq(x, ε) − wx with an additive term wx might provide sufficient structure to avoid this lack of identification. However, even if prices are assumed nonstochastic to maximize possibilities for identification, the implied technology form is q(x, ε) = [π(x, ε) + wx]/p under utility u and q̃(x, ε) = [π̃(x, ε) + wx]/p under utility ũ. Thus, the implications of imposing an arbitrary utility function translate directly into an implied failure to identify the technology function q(x, ε) as well.
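This point can be confirmed symbolically. The sketch below uses generic functions in sympy (an illustration, not the paper's derivation): when φ varies with x, the first-order condition from maximizing μ_π(x) − (φ(x)/2)σ_π^2(x) differs from (8) by the extra term −(σ_π^2/2)∂φ/∂x.

```python
# Symbolic check (illustrative, with generic scalar functions) that Eq. (8) omits
# the term -(sigma2/2) * dphi/dx whenever phi depends on x.
import sympy as sp

x = sp.symbols("x")
mu = sp.Function("mu")(x)
sigma2 = sp.Function("sigma2")(x)
phi = sp.Function("phi")(x)

true_foc = sp.diff(mu - (phi / 2) * sigma2, x)                 # from criterion (7) with phi(x)
naive_foc = sp.diff(mu, x) - (phi / 2) * sp.diff(sigma2, x)    # Eq. (8) with phi held fixed
print(sp.simplify(true_foc - naive_foc))                       # -sigma2(x)*Derivative(phi(x), x)/2
```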
Proposition 7. Suppose the distribution of outcomes is completely characterized by the mean and variance as in h(π|μ_π, σ_π^2). For any given joint specification of the effects of inputs on the mean and variance {μ_π(x), σ_π^2(x)} and utility function u(π) satisfying standard properties, first-order conditions can be represented in a linear mean–variance form as ∂μ_π/∂x − (φ̂/2)∂σ_π^2/∂x = 0 for some φ̂ = φ̂(μ_π, σ_π^2), assuming an internal maximum.

Proof. Expected utility can be represented as ū(μ_π, σ_π^2) ≡ ∫_{−∞}^{∞} u(π)h(π|μ_π, σ_π^2)dπ, for which first-order conditions for maximization are ū_1 ∂μ_π/∂x + ū_2 ∂σ_π^2/∂x = 0 where ū_1 = ∂ū/∂μ_π and ū_2 = ∂ū/∂σ_π^2. Thus, ∂μ_π/∂x − (φ̂/2)∂σ_π^2/∂x = 0 where φ̂/2 = φ̂(μ_π, σ_π^2)/2 = −ū_2/ū_1.

Proposition 8. Under the conditions of Proposition 7, if either CARA or normality does not hold, then φ̂ = φ̂(μ_π, σ_π^2) does not generally measure absolute risk aversion, nor is the implied decision criterion generally of the form max_x μ_π − (φ̂/2)σ_π^2.

Proof. Suppressing the arguments of μ_π and σ_π for convenience, utility is u(π) = −e^{−φπ} under CARA, so that ū(μ_π, σ_π^2) = Eu(π) = −e^{−φμ_π + φ^2σ_π^2/2} if π ∼ N(μ_π, σ_π^2), where ū_2/ū_1 = −φ/2 is correctly half of φ = −u″/u′. As proof by exception under CARA, suppose π ∼ γ(α, β) where μ_π = αβ and σ_π^2 = αβ^2 implies α = μ_π^2/σ_π^2 and β = σ_π^2/μ_π. The gamma moment generating function, E(e^{πt}) = (1 − βt)^{−α}, implies that expected utility is ū(μ_π, σ_π^2) = Eu(π) = E(−e^{−φπ}) = −(1 + φβ)^{−α} = −(1 + φσ_π^2/μ_π)^{−μ_π^2/σ_π^2}. If σ_π^2 > 0, then ū_2/ū_1 = −(μ_π/σ_π^2)[(φσ_π^2 + μ_π) ln(1 + φσ_π^2/μ_π) − φσ_π^2]/[2(φσ_π^2 + μ_π) ln(1 + φσ_π^2/μ_π) − φσ_π^2], which in general does not approximate half of φ = −u″/u′. As an exception under normality, suppose π ∼ N(μ_π, σ_π^2) and that utility follows a truncated Taylor series expansion of ln π, u(π) = ln μ_π + (π − μ_π)/μ_π − (π − μ_π)^2/(2μ_π^2) + (π − μ_π)^3/(6μ_π^3) − (π − μ_π)^4/(24μ_π^4), for which ū(μ_π, σ_π^2) = Eu(π) = ln μ_π − σ_π^2/(2μ_π^2) − 3σ_π^4/(24μ_π^4). Thus, ū_2/ū_1 = −[(2μ_π^2 + σ_π^2)μ_π]/[2μ_π^2(2μ_π^2 + σ_π^2) + 2σ_π^4], which is not equal to half of −u″/u′ = 3(5μ_π^2 − 4πμ_π + π^2)/(16μ_π^3 − 15πμ_π^2 + 6π^2μ_π − π^3) except where σ_π^2 = 0. To prove the latter assertion, if φ is not constant then the first-order conditions for maximizing μ_π(x) − (φ/2)σ_π^2(x) differ from ∂μ_π/∂x − (φ/2)∂σ_π^2/∂x = 0 by the additive term −(σ_π^2/2)∂φ/∂x. Thus, the first-order condition is not generally in the form of (8) if the objective criterion is in the form of (7). Conversely, solving the associated differential equation, the only function φ̂ of x for which maximizing μ_π − (φ̂/2)σ_π^2 yields first-order conditions of the form ∂μ_π/∂x − (φ̂/2)∂σ_π^2/∂x = 0 is a constant.

Propositions 7 and 8 show that, while production risk problems can be represented and estimated in a certainty equivalent form given by mean profit plus some term multiplied by the variance of profit, this form does not convey evidence in support of CARA and the term multiplying the variance does not correspond to any particular Arrow–Pratt measure of risk aversion as many have suggested.^19 If ū_2/ū_1 is not constant in x, then the problem cannot be described by (7) and (8). These results validate Coyle's (1999) claim (without proof) that the certainty equivalent of profit can be represented as μ_π − φ̂(μ_π, σ_π^2)σ_π^2/2, but invalidate his claims to estimate risk aversion using duality for cases beyond CARA-normality.

19 This result is suggested by the seminal work of Pratt (1964), who differentiated risk aversion in the small from in the large. The linear mean–variance approximation was offered only for small marginal introductions of risk.

The CRRA-log-normal case gives a concrete exception. Where ln π ∼ N(μ_{ln π}, σ_{ln π}^2), μ_{ln π} = ln μ_π − ln(1 + σ_π^2/μ_π^2)/2 and σ_{ln π}^2 = ln(1 + σ_π^2/μ_π^2).^20 Thus, the decision criterion is^21

$$u^{-1}(Eu(\pi)) = \mu_{\ln\pi} - (\psi - 1)\sigma_{\ln\pi}^{2}/2 = \ln\mu_\pi - (\psi/2)\ln(1 + \sigma_\pi^{2}/\mu_\pi^{2}). \qquad (9)$$

The certainty equivalent, which approaches μ_π as σ_π^2 approaches zero, is thus e^{ln μ_π − (ψ/2) ln(1 + σ_π^2/μ_π^2)} = μ_π(1 + σ_π^2/μ_π^2)^{−ψ/2} = μ_π − φ̂(μ_π, σ_π^2)σ_π^2/2 where φ̂(μ_π, σ_π^2) = 2(μ_π/σ_π^2)[1 − (1 + σ_π^2/μ_π^2)^{−ψ/2}].

As this CRRA-log-normal case shows, φ̂(μ_π, σ_π^2) can reflect something far different than a standard risk aversion measure. This can also be shown for many other distributions.^22 Thus, while Proposition 7 proves the pervasive applicability of the linear mean–variance specification, μ_π − φ̂(μ_π, σ_π^2)σ_π^2/2, Proposition 8 shows that φ̂(μ_π, σ_π^2) has no direct relationship to absolute risk aversion. Rather, the general applicability of μ_π − φ̂(μ_π, σ_π^2)σ_π^2/2 may explain why CARA-normality specifications find widespread positive empirical support even though results may be grossly misinterpreted as identifying risk preferences.

7. Implications for two-moment models

The results here also have implications for the common practice of modeling risky decisions with two-moment approximations. Meyer (1987) has shown equivalence of expected utility and mean-standard-deviation models under a location and scale condition that requires profit to be of the form π = μ + σε where actions influence only μ and σ and ε is a scalar error with a two-parameter distribution. In practice, this result has been used to justify mean–variance or mean-standard-deviation models. However, this equivalence fails when profit is a nonlinear transformation of μ + σε. The CRRA-log-normality case is an example where profit is an exponential function of μ + σε. Some have failed to recognize these exceptions to Meyer's condition, which he acknowledged. The generality of this failure is clarified by:

Proposition 9. Suppose the utility function satisfies standard properties and that the effects of inputs on the profit distribution are determined uniquely by the mean and variance of a monotonic transformation of profit, μ(x) = E(t(π)) and σ^2(x) = V(t(π)), respectively. Then for any given joint specification of the utility function u(π) and mean and variance effects of inputs, μ(x) and σ^2(x), many alternative joint specifications of these three functions generate identical behavior under different risk preferences.

Proof. Suppressing the arguments of μ and σ^2 for convenience, let π̃ = t(π) be an arbitrary monotonically increasing function of profit with inverse function denoted by π = t̃(π̃) such that ũ(π̃) ≡ u(t̃(π̃)) satisfies standard utility function properties. If x
20 The moment generating function for normality implies μ_π = E(e^{ln π}) = M_{ln π}(1) = e^{μ_{ln π} + σ_{ln π}^2/2} and E(π^2) = E(e^{2 ln π}) = M_{ln π}(2) = e^{2μ_{ln π} + 2σ_{ln π}^2}. The expressions for μ_{ln π} and σ_{ln π}^2 are obtained by solving these relationships where σ_π^2 = E(π^2) − μ_π^2 = e^{2μ_{ln π} + σ_{ln π}^2}(e^{σ_{ln π}^2} − 1).
21 The differential equation ψ = −u″π/u′ defining CRRA implies that the utility function, aside from affine transformations, can be represented as u(π) = (1 − ψ)^{−1}π^{1−ψ} = (1 − ψ)^{−1}e^{(1−ψ) ln π}, where u″ < 0 implies ψ > 0. Where ln π ∼ N(μ_{ln π}, σ_{ln π}^2), the moment generating function yields Eu(π) = (1 − ψ)^{−1}E(e^{(1−ψ) ln π}) = (1 − ψ)^{−1}e^{(1−ψ)μ_{ln π} + (1−ψ)^2σ_{ln π}^2/2}, from which (9) follows.
22 For example, a moment generating function approach under CARA can be used to derive behavioral equations under a variety of distributions, such as the gamma, to find a nontrivial φ̂ = φ̂(μ_π, σ_π^2).
is chosen to maximize u(π) where the distribution of π = μ + σε is determined by x and an arbitrary distribution of ε, then maximization of the utility function ũ(π̃) generates identical behavior where π̃ = t(μ + σε) because u(t̃(t(π))) = ũ(π̃) for every choice of x. Where expected utility and first-order conditions are represented by ū(μ_π, σ_π^2) ≡ ∫_{−∞}^{∞} u(π)h(π|μ_π, σ_π^2)dπ and ū_1 ∂μ_π/∂x + ū_2 ∂σ_π^2/∂x = 0 as in the proof of Proposition 7, let μ̃ = μ̃(μ_π) be an arbitrary monotonically increasing function of μ_π that is increasing and concave in x, and let σ̃^2 = σ̃(σ_π^2) be an arbitrary monotonically increasing function of σ_π^2, with corresponding inverse functions μ_π = μ̃^{−1}(μ̃) and σ_π^2 = σ̃^{−1}(σ̃^2), respectively. Define an alternative expected utility function ũ(μ̃, σ̃^2) ≡ ū(μ̃^{−1}(μ̃), σ̃^{−1}(σ̃^2)). Then maximization of ũ(μ̃, σ̃^2) with mean function μ̃(x) ≡ μ̃(μ_π(x)) and variance function σ̃^2(x) ≡ σ̃(σ_π^2(x)) generates the same first-order conditions as maximization of ū(μ_π, σ_π^2) because μ̃^{−1}(μ̃(μ_π)) = μ_π, (∂μ̃^{−1}/∂μ̃)(∂μ̃/∂μ_π) = 1, and (∂σ̃^{−1}/∂σ̃^2)(∂σ̃^2/∂σ_π^2) = 1.

Proposition 9 shows that a host of expected utility specifications generate identical behavior when Meyer's location and scale condition is relaxed. Log-normality, where t is the exponential transformation, is only one of many possibilities. We conclude that inferring preferences from mean-standard deviation or mean–variance models is not meaningful for global identification of risk preferences.

8. A Monte Carlo investigation of global identification

As an illustration of the inability to differentiate the two popular but rather polar cases of CARA with normality from CRRA with log-normality, we present a brief Monte Carlo analysis. For the CARA-normality case, suppose profit with a normalized nonstochastic output price is π = a + bx − cx^2 + d^{1/2}xε where x is a scalar input and ε ∼ N(0, 1).^23 Data are simulated for estimation of this model by solving the first-order condition for maximization of μ_π − (φ/2)σ_π^2 = a + bx − cx^2 − (φ/2)dx^2 and adding a decision making error, δ ∼ N(0, σ_δ^2), thus obtaining x = b/(2c + φd) + δ. For the CRRA-log-normal case, let ln π = a + bx − cx^2 + d^{1/2}xε where ε ∼ N(0, 1). Data are simulated for estimation of this model by solving the first-order condition for maximization of μ_{ln π} − (ψ − 1)σ_{ln π}^2/2 and adding a decision making error, δ ∼ N(0, σ_δ^2), thus obtaining x = b/[2c + (ψ − 1)d] + δ.^24

8.1. The Monte Carlo experiment

We first generate distributions of wealth among 40, 100, and 400 producers with density Aw_0^α on support w_0 ∈ (4, 25) where α = −0.1. The associated distributions of absolute and relative risk aversion are induced by φ = w_0^{−α} and ψ = w_0^{1−α}, respectively. These forms have the advantage that the distribution of risk aversion is independent of the scale of wealth, i.e., φ = (tw_0)^{−α} and ψ = (tw_0)^{1−α} if the density of wealth is A(tw_0)^α on support tw_0 ∈ (4t, 25t). We next generate data with the CARA-normality model and use it to estimate both the CARA-normality and CRRA-log-normality specifications. Then we generate data with the CRRA-log-normality model and use it to estimate
23 For example, suppose production follows q = f(x) + g(x)ε where f(x) = a + b*x − cx^2, g(x) = d^{1/2}x, and variable cost is wx where w is the price of x and b = b* − w.
24 For example, under CRRA with Just–Pope technology and a normalized output price suppose π = f(x) + g(x)ε* where f(x) = e^{a + bx − c*x^2}, g(x) = e^{a + bx − cx^2}, c* = c − d/2, and ε* = e^{d^{1/2}xε} − e^{dx^2/2}, which satisfies E(ε*) = 0.
both the CRRA-log-normality and CARA-normality specifications. In each case, risk is estimated using the applicable π or ln π equations, and risk preferences are estimated using the x equation.

Parameters used for the CARA-normality case are a = 4.875, b = 0.91, c = 0.0042, and d = 0.09, which are chosen to fit the production structure of profit (or, equivalently, to fit production assuming a normalized nonstochastic price) to arbitrary data means x̄ = 25 and π̄ = 25. Slope and curvature are chosen to match a Cobb–Douglas function with production elasticity 0.7 at data means, which assures a curvature within accepted wisdom for the single variable input case (SST estimate a materials elasticity of 0.72 where the only other input is capital). To assure plausible variability, d is chosen so the coefficient of variation for production is 0.30 (variability of yields for corn, soybeans, sorghum and wheat in Kansas reported by Just and Weninger (1999) corresponds to coefficients of variation from 0.19 to 0.32). The error in the decision equation is assumed to have a standard deviation equal to 15 percent of x̄. Parameters for the CRRA-log-normality case are chosen to fit the same calibration.

The means and standard deviations of parameter estimates after 100,000 repetitions for each sample size are reported in Tables 2 and 3, where the data for Table 2 are generated by the CARA-normality model and the data for Table 3 are generated by the CRRA-log-normality model. For each repetition, the technology parameters are estimated by OLS regression of π or ln π on a constant, x, and x^2. Squared residuals are then regressed on x^2 to estimate d. Because summaries of standard errors and t-ratios of individual regressions are not reported for brevity, our purposes are served without heteroskedasticity improvements. The estimated risk preference parameter, obtained by estimation of the decision equation conditional on the estimated production technology, is reported in the penultimate column of both tables.

As expected, average errors in parameter estimates decline with the number of observations, although randomness in averages is surprisingly high even with 100,000 repetitions. Much larger random variation among averages was confirmed successively at 10,000 and 1000 repetitions. Standard deviations of both estimated technology coefficients and the estimated preference parameter, as well as the ratio of estimated parameters to standard deviations of both (not shown), decline with sample size in both tables regardless of whether the correct specification is estimated.^25 Interestingly, the ratios of estimated coefficients to the standard deviations are lower in the incorrect case in Table 2 but higher in the incorrect case in Table 3.

A Monte Carlo analysis also allows illustrating the errors-in-variables (EIV) problem that afflicts joint estimation of risk preference and risk because risk preferences can be estimated only using imperfect estimates of risk. To measure errors-in-variables bias, the last column reports the estimated α when the decision equation follows the correct specification and is conditioned on the true production coefficients. Both Tables 2 and 3 reveal a substantial EIV bias, which is relatively larger at small sample sizes (declining from 30% to 3%) under CARA but is near 5% regardless of sample size under CRRA. Standard deviations of α estimates are much higher in the practical EIV cases than where the technology is known.
By comparison, the case without EIV generates highly precise estimates of α for all small sample sizes. Assuming goodness-of-fit measures are used to determine which specification is appropriate, the sums of squared errors (SSE)
25 The standard deviations among parameter estimates in Tables 2 and 3 do not represent standard errors for individual regressions because they include variation among randomly generated datasets. Boot-strapped standard errors, for example, would be calculated by repetitions using the same data. Rather, our focus is on variation among datasets generated by the same structure.
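Before turning to the tables, the following compact sketch illustrates one repetition of the CARA-normality design. Parameter values are taken from the text, but the implementation details (the inverse-CDF draw of wealth, the bounded search for α, and the decision-error scale) are our own illustrative reading of the design rather than the authors' code.

```python
# Compact sketch of one Monte Carlo repetition for the CARA-normality design
# described in the text (parameter values from the text; implementation details
# are illustrative assumptions).
import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)
a, b, c, d, alpha = 4.875, 0.91, 0.0042, 0.09, -0.1
n, sd_delta = 400, 0.15 * 25                      # sample size and decision-error s.d.

# Wealth with density proportional to w0^alpha on (4, 25), drawn by inverse CDF.
u = rng.uniform(size=n)
lo, hi = 4 ** (alpha + 1), 25 ** (alpha + 1)
w0 = (lo + u * (hi - lo)) ** (1 / (alpha + 1))
phi = w0 ** (-alpha)                              # CARA coefficients across producers

# Data generation: decision equation, then realized profit.
x = b / (2 * c + phi * d) + rng.normal(0, sd_delta, n)
pi = a + b * x - c * x**2 + np.sqrt(d) * x * rng.normal(size=n)

# Step 1: technology by OLS of pi on (1, x, x^2); step 2: d from squared residuals on x^2.
X = np.column_stack([np.ones(n), x, x**2])
coefs = np.linalg.lstsq(X, pi, rcond=None)[0]
a_hat, b_hat, c_hat = coefs[0], coefs[1], -coefs[2]          # pi = a + b x - c x^2
d_hat = np.linalg.lstsq((x**2)[:, None], (pi - X @ coefs)**2, rcond=None)[0][0]

# Step 3: risk preference parameter alpha from the decision equation,
# conditional on the estimated technology (nonlinear least squares).
def sse(alpha_try):
    return np.sum((x - b_hat / (2 * c_hat + w0 ** (-alpha_try) * d_hat)) ** 2)

alpha_hat = optimize.minimize_scalar(sse, bounds=(-1.0, 1.0), method="bounded").x
print(f"a={a_hat:.3f}  b={b_hat:.3f}  c={c_hat:.5f}  d={d_hat:.4f}  alpha={alpha_hat:.3f}")
```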
Table 2
Monte Carlo results for data generated under CARA behavior.

Coefficient                    a           b           c           d           α            α w/o EIV
Actual value                   4.875000    0.910000    0.004200    0.090000    −0.100000    −0.100000

Results for data generated by CARA behavior and estimated as CARA behavior
40 observations
  Mean                         4.884101    0.907752    0.004084    0.078090    −0.033784    −0.099433
  Standard deviation           4.133478    0.833146    0.039009    0.026154     0.169943     0.021770
100 observations
  Mean                         4.892751    0.906086    0.004004    0.084716    −0.071681    −0.099776
  Standard deviation           2.212012    0.466728    0.022416    0.018199     0.101584     0.013725
400 observations
  Mean                         4.879336    0.909184    0.004168    0.088493    −0.092089    −0.099952
  Standard deviation           1.023425    0.222211    0.010823    0.009638     0.048662     0.006852

Results for data generated by CARA behavior and estimated as CRRA behavior
40 observations
  Mean                         1.760356    0.101592    0.001994    0.000335     1.156563     N/A
  Standard deviation           0.296320    0.055883    0.002496    0.000182     0.326265     N/A
100 observations
  Mean                         1.750636    0.102945    0.002035    0.000357     1.080078     N/A
  Standard deviation           0.162006    0.031165    0.001410    0.000132     0.202987     N/A
400 observations
  Mean                         1.742332    0.104207    0.002079    0.000368     1.018365     N/A
  Standard deviation           0.080115    0.015405    0.000693    0.000070     0.085914     N/A

Comparison of mean sum of squares of errors with CARA data
Sum of squares/n               SSE/n for π    SSE/n for ln π    SSE/n for x    SSE/n w/o EIV
40 observations
  CARA estimated as CARA       0.061988       12.324793         18.293304      13.713492
  CARA estimated as CRRA       0.059721       12.557125         44.154411      N/A
100 observations
  CARA estimated as CARA       0.064626       13.019477         15.632176      13.925473
  CARA estimated as CRRA       0.062813       13.277681         37.177597      N/A
400 observations
  CARA estimated as CARA       0.065470       13.371700         14.106424      14.021340
  CARA estimated as CRRA       0.064315       13.647158         30.672016      N/A

All cases are simulated with 100,000 repetitions. See the text for details. N/A corresponds to cases with incorrect specifications so comparison to true parameters is inapplicable.
for both specifications are reported in the last part of Tables 2 and 3. The SSEs for the production equations are almost identical between correct and incorrect specifications, preventing discernment with nonnested tests. Also, the differences that exist give conflicting indications. Suppose model selection is based on SSEs from predicting raw profit if CARA is assumed and on SSEs from predicting logged profit if CRRA is assumed. Then comparisons of SSEs in Table 2, where the true model is CARA, always favor the model that is not assumed for estimation, regardless of which is correct. However, comparisons in Table 3, where the true model is CRRA, always favor the model that is assumed for estimation, regardless of which is correct. Thus, typical approaches of model selection based on comparisons of SSEs are meaningless. Turning to the behavioral equations, comparisons of SSEs for the two specifications show that CARA is favored for sample sizes up to 100 regardless of whether the data were generated by CARA or CRRA. Only with a very large number of observations (400) do SSEs for the behavioral equations favor the correct model. Finally, consider the comparison of SSEs for the behavioral equations with and without EIV. With CARA data (Table 2), the SSE for the correct CARA model is considerably higher with EIV than without at small sample sizes but fades as sample size increases (as the error in estimating the behavioral coefficient declines). With CRRA data (Table 3), however, the difference in SSEs for the behavioral equation with and without EIV is huge and remains so as the number of observations increases (as the error in estimating the behavioral coefficient remains). Upon careful scrutiny, this occurs because of the nonlinearity of the EIV problem (α appears in the denominator of the x equation). As a result, the EIV problem can cause significant skewness in the errors so that typical random
variation in estimates of b, c, and d can, by chance, cause a very long tail in the distribution, which dramatically affects the SSE.

To examine the generality of these results, we altered the parameter values over a wide but plausible range. We found that seemingly innocuous variations in parameter values (such as units of measurement for π and x) can determine whether empirical results favor one specification versus the other regardless of which is the true model. We also found that the major difference in SSEs for the behavioral equation with and without EIV can occur in the case of CARA data because it has the same nonlinear EIV problem in its structure. In addition to estimation of behavior conditioned on estimates of technology as in Tables 2 and 3, we also attempted joint estimation of the production and behavioral equations, but encountered serious problems of non-convergence. Upon reflection, this identification problem is exactly what is suggested by the lack of model identification in Tables 2 and 3.

8.2. Misleading implications

These results illustrate the difficulty of global identification compared to the parameter identification of narrow specifications. Lack of empirical discernment is explained by the econometric similarities of the implicit behavioral criteria, μ − φσ^2/2 and μ − (ψ − 1)σ^2/2. Applied econometricians know that models involving logged data are often hard to discern from similar specifications with non-logged data. Nevertheless, the two specifications have vastly different behavioral implications for the purposes of welfare or policy analysis. The risk premium with CRRA-log-normality in (9) is P_{lnN}(ψ, μ_π, σ_π^2) ≡ μ_π[1 − (1 + σ_π^2/μ_π^2)^{−ψ/2}]
Table 3
Monte Carlo results for data generated under CRRA behavior.

Coefficient                    a           b           c           d           1 − α        1 − α w/o EIV
Actual value                   1.468876    0.056000    0.000629    0.000138    0.900000     0.900000

Results for data generated by CRRA behavior and estimated as CRRA behavior
40 observations
  Mean                         1.474243    0.055487    0.000618    0.000122    0.949554     0.900003
  Standard deviation           0.979660    0.087966    0.001892    0.000039    0.296689     0.000793
100 observations
  Mean                         1.470554    0.055874    0.000627    0.000132    0.963921     0.899999
  Standard deviation           0.602250    0.054002    0.001158    0.000026    0.219109     0.000514
400 observations
  Mean                         1.469322    0.055957    0.000628    0.000136    0.954874     0.900001
  Standard deviation           0.295749    0.026489    0.000567    0.000013    0.136949     0.000282

Results for data generated by CRRA behavior and estimated as CARA behavior
40 observations
  Mean                         2.822951    0.425837    0.001705    0.022480    −0.110564    N/A
  Standard deviation           13.282839   1.216754    0.026717    0.012315     0.250898    N/A
100 observations
  Mean                         2.749702    0.433251    0.001880    0.024624    −0.147579    N/A
  Standard deviation           8.221429    0.752484    0.016483    0.008728     0.139504    N/A
400 observations
  Mean                         2.729177    0.434913    0.001911    0.025717    −0.170021    N/A
  Standard deviation           4.035185    0.368952    0.008068    0.004595     0.083324    N/A

Comparison of mean sum of squares of errors with CRRA data
Sum of squares/n               SSE/n for π    SSE/n for ln π    SSE/n for x    SSE/n w/o EIV
40 observations
  CRRA estimated as CRRA       11.44465       0.06809978        62.73532       0.02195355
  CRRA estimated as CARA       11.17653       0.07026530        38.08284       N/A
100 observations
  CRRA estimated as CRRA       12.21891       0.07203491        66.10721       0.02230486
  CRRA estimated as CARA       11.93544       0.07394611        47.48484       N/A
400 observations
  CRRA estimated as CRRA       12.59943       0.07400521        36.18299       0.02246308
  CRRA estimated as CARA       12.31147       0.07577511        63.38168       N/A

All cases are simulated with 100,000 repetitions. See the text for details. N/A corresponds to cases with incorrect specifications so comparison to true parameters is inapplicable.
compared to P_N(φ, Σ) ≡ (φ/2)σ_π^2 under CARA-normality. These differ substantially in form and marginal effects, which would significantly alter the evaluation of policies affecting risk. These results may explain why CARA models perform well empirically even though theoretical logic tends to favor CRRA or some intermediate form with DARA and IRRA (Arrow, 1965). In fact, the case of ψ < 1 in the CRRA-log-normal criterion, E(ln π) − (ψ − 1)V(ln π)/2, may explain why some estimates of the CARA-normality model, E(π) − φV(π)/2, have suggested risk loving behavior. Estimates of φ may actually estimate ψ − 1.^26 Further, according to theoretical results above, these are only two of many specifications that may be confused.

9. Possibilities for global identification of risk preferences and risk

The propositions in this paper imply a triple jeopardy whereby an arbitrary or inappropriately narrow specification of price risk, production risk, or production structure biases risk preference estimates. Even though seemingly accurate parameter identification is achieved as suggested by estimated standard errors and t-ratios, global identification is not possible from estimation of behavioral equations alone.
26 While ψ < 1 may seem to imply risk-loving behavior contrary to Jensen's inequality given concavity of utility, Jensen's inequality applies further for E(ln π) as a function of π as verified by (9), which declines in V(π).
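To make the contrast concrete, the sketch below (with arbitrary illustrative values for μ_π, ψ, and the calibration point) compares the two premia: even when φ is chosen so that P_N matches P_lnN at one level of risk, the two diverge at other risk levels, so the marginal effects relevant for policy evaluation differ.

```python
# Sketch comparing the two risk premia discussed above (illustrative calibration):
# P_lnN(psi, mu, sigma^2) = mu*[1 - (1 + sigma^2/mu^2)^(-psi/2)] under CRRA-log-normality
# versus P_N = (phi/2)*sigma^2 under CARA-normality. Even when phi is calibrated so the
# two coincide at one risk level, their values and marginal effects diverge elsewhere.
import numpy as np

mu, psi = 25.0, 2.0
sigma = np.array([2.5, 5.0, 7.5, 10.0])

P_lnN = mu * (1 - (1 + sigma**2 / mu**2) ** (-psi / 2))
phi = 2 * P_lnN[1] / sigma[1]**2            # match the CARA premium at sigma = 5
P_N = (phi / 2) * sigma**2

for s, a, b in zip(sigma, P_lnN, P_N):
    print(f"sigma={s:4.1f}:  P_lnN={a:6.3f}  P_N={b:6.3f}")
```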
The typical approach to achieving global identification (as opposed to parameter identification) is to use more flexible functional forms. In most estimation problems, limits on specification flexibility can be relaxed by adding observations. However, adding observations does not solve the identification problem for joint estimation of risk preferences and risk. Many specifications of risk preferences paired with particular specifications of price risk, production risk, or production structure fit behavioral equations with a given revealed preference dataset equally well regardless of the number of observations. As a result, inference about any individual model component without regard to the specificity of others is of little value. Testing a restrictive hypothesis about u(π ) if h(π|x) is inflexible biases the test toward rejection of the true specification of u(π ) if h(π |x) is misspecified. On the other hand, testing a restrictive hypothesis about u(π ) if h(π|x) is too flexible biases the test toward nonrejection because many alternative sets of parameters fit equally well.27 Propositions 3 and 6 imply that no sensible specific hypothesis can be rejected in the context of global identification. The implications of our results are that global identification can be achieved only by supplementing estimation of behavioral equations. We suggest two general approaches for this purpose.
27 The same applies in reverse for testing a restrictive hypothesis about h(π|x) depending on the specification of u(π). These reverse implications offer an explanation for the discrepancies SST find between joint and separate estimates of preferences and technology. They find standard errors of their production coefficient are ‘‘consistently and considerably lower’’ (p. 182) in the case of joint estimation, which could reflect greater parameter identification for the technology imposed by a specific inflexible preference structure, compared to lacking global identification.
9.1. Estimation of additional structural equations

Many datasets are sufficient to allow not only estimation of behavioral equations but also additional relationships that identify price risk, production risk, and production structure independent of behavioral equations. Estimation of these relationships can resolve identification problems if three critical requirements can be met.

First, when data are available on physical input and output quantities, the technology can be estimated directly, for example, in the form of a production function to allow inference independent of behavioral equations about the production structure, thus eliminating the ambiguities of Proposition 4. Rejection of a common technology specification between the production function and behavioral equations implies invalidation of the risk preference specification. Before the rise of duality, estimated production systems commonly included both the production relationship and first-order conditions (Mundlak and Hoch, 1965; Zellner et al., 1966). More recently, typical dual approaches have estimated only the first derivatives of the profit function, so that all estimated equations reflect first-order conditions. Thus, an equation that can ultimately identify separate functions in behavioral equations has been dropped from common practice rather than used to test preference assumptions.^28

Second, the additional production equation must be used not only to identify the mean of production but to infer the distributional form given the mean. This requires inference to determine whether production risk follows normality or some other distribution to eliminate the ambiguities of Proposition 1 as applied to production risk. As suggested by Just and Weninger (1999), this can be difficult because estimation of random deviations depends on correctly estimating expected production. Further, for both production structure and production risk, an additional problem is adequate reflection of the subjective risk perceived by the producer rather than the objective risk that can be estimated econometrically. This likely requires data specific to the producer in panel form and representing perceptions of the production function and its risk with a Bayesian decision theory approach whereby future production responses do not influence earlier risk perceptions and more recent data may have higher weight (Just, 1977).

Third, if data are available on both observed prices and the information on which subjective price distributions depend, e.g., as in a Bayesian approach that represents the dynamics of perceptions, then estimation of this relationship may be useful for eliminating the price risk ambiguities of Proposition 1. However, simply estimating price expectation equations or embedding price expectation functions in behavioral equations in place of prices, as has often been done, does not suffice.^29 To eliminate the ambiguities identified by Proposition 1, this inference must determine the proper distributional form of price perceptions independent of behavioral equations to allow identification of risk preferences. Reviews by Bollerslev et al. (1992) show that ARCH/GARCH modeling of perceptions with normality is most common (e.g. Diebold and Nerlove, 1989). However, a more recent technique by Anderson et al. (2001) suggests that individual stock prices are distributed approximately log-normally. The theoretical and Monte Carlo results above for alternatives as similar as normality and log-normality show that this practical
28 Alternatively, dropping the physical relationship among inputs and outputs from the estimated system can also be viewed as a loss of information causing inefficient estimation (Mundlak, 1996). 29 A variety of price expectation mechanisms have been embedded in behavioral equations to characterize individual perceptions including naïve, adaptive, and rational expectations (with the latter based on either reduced-form or structural models). But such models typically include arbitrary distributional assumptions.
ambiguity in the distribution (given expectations) means that no confidence can be placed in estimated risk preferences. Tests of this distributional structure independent of behavioral equations are critical to the identification of risk preferences.

In summary, the results of this paper show that if additional inference does not go so far as to identify the distribution of price risk, the distribution of production risk, and the functional structure of production by estimating equations that supplement behavioral equations, then one cannot claim to have identified risk preferences for producers. Further, these models must ideally generate observation-specific predictions that can be matched to observations on behavioral equations to allow appropriate estimation and inference of risk preferences.

9.2. Elicitation and experimentation

Because the estimation of additional structural models requires more complex estimation than in typical practice,^30 approaches associated with elicitation or experimentation may be attractive. The key to identifying risk preferences from revealed preference data is to observe risk in isolation. One possibility is to elicit perceptions of risk from the same agents that generate the revealed preference behavioral data. While this would entail a considerable expansion of traditional data, the feasibility of this approach was demonstrated by Just et al. (1999), who used a survey of perceptions piggy-backed onto an annual survey conducted by the US Department of Agriculture. By separating preferences from perceptions, they were able to differentiate adverse selection incentives from risk aversion incentives for crop insurance participation.

Another possibility is the application of experimental methods. While experimental methods have become popular, they have been focused on isolated use rather than conjunctive use with revealed preference methods.^31 While lab experiments are unsuitable because the risk perceptions must be paired with the revealed preference data, field experiments might in some cases be administered to the producers who generate the behavioral data. This would require designing experiments based on actual outcomes so as to eliminate or separate out the effect of preferences on responses. Of course, field experiments can also be used to estimate preferences directly. But field experiments with real economic consequences of a scale consistent with many modern production problems likely have infeasible expense or require questionable scaling.

10. Conclusions and implications for joint estimation of risk preferences and risk

The hope of modern econometrics is to employ sufficiently flexible specifications and tests to overcome most concerns of misspecification. But the estimation of risk preferences presents an unusual challenge because any such tests are necessarily nested in problems of identifying the underlying perceptions of price risk, production risk and production structure. This paper shows that global identification of risk preferences is not possible by
30 Additional problems include bias from simultaneous estimation of the system if any component of the system is misspecified (Kaplan, 1988), nonlinear errors in variables when the non-behavioral components are estimated separately before estimating behavioral equations (Amemiya, 1985), and pretest estimation bias (Judge et al., 1985) when the same dataset is used to draw inference by successive testing of many interactive model components. 31 Experimental data would not necessarily be subject to the identification problems discussed here because risk can be controlled in economic experiments (Hey and Orme, 1994). However, similar problems can occur if individuals perceive the risk distribution incorrectly or incorrectly transform it into a value function, e.g., as in non-expected utility theories.
estimating only behavioral equations with revealed preference data. The behavioral equations from utility maximization confound risk preference and risk so that additional structure must be estimated to infer risk preferences. We show that an inflexible specification for any one of these components indirectly influences which risk preference specification will fit a given dataset, and can suggest apparent parameter identification by small standard errors even though global identification fails. The results also demonstrate that the multiplier of variance in the widely-used linear mean–variance model does not generally approximate local absolute risk aversion as many assume. We show that a linear mean–variance form applies under expected utility maximization regardless of CARA or normality. But the function that multiplies variance does not measure absolute risk aversion if it is not constant. In this case, mean–variance frontiers may provide misleading information for expected utility maximization. Finally, we offer some concrete suggestions for how to overcome the failure to identify risk preferences by estimating additional relationships that do not depend on behavior implied by first-order conditions. Additional structure is required for identification and inference regarding the form of subjective price distributions for portfolio choice models as well as the structural form and distribution of production for production models. This expands upon the once typical practice of joint estimation of first-order conditions and the production structure, but is feasible depending on data limitations. An alternative approach is to measure perceived risk by elicitation or field experiments, but these approaches require measurements paired with behavioral data. Short of these expanded estimation approaches, we conclude that risk preferences cannot be estimated in the form of a utility function of profit. Rather, the scope must be reduced to a more modest investigation of the simple characteristics of risk preferences (such as DARA versus CARA, and IRRA versus CRRA) by testing certain behavioral restrictions following Pope (1980, 1988) and Chavas and Pope (1985). Alternatively, studies should clearly acknowledge their "as if" basis. For example, separate identification of risk preference and risk may not be necessary for certain types of ex post policy analysis (e.g., estimating impacts on measurable variables). Further, approximations of $\phi(\hat{\mu}_\pi, \sigma_\pi^2)$ can be estimated and welfare can be evaluated via a certainty equivalent criterion $\mu_\pi - \phi(\hat{\mu}_\pi, \sigma_\pi^2)\,\sigma_\pi^2/2$ based on our results, even though neither the utility function nor the Arrow–Pratt measures of risk aversion can be inferred thereby. With this approach, many of the uses of estimated decision models are possible, e.g., ex ante policy and welfare analysis, even though risk preferences are not identified.

References

Amemiya, Y., 1985. Instrumental variable estimator for the nonlinear errors-in-variables model. Journal of Econometrics 28, 273–289. Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H., 2001. The distribution of realized stock return volatility. Journal of Financial Economics 61, 43–76. Antle, J.M., 1987. Econometric estimation of producers' risk attitudes. American Journal of Agricultural Economics 69, 509–522. Antle, J., Hatchett, S., 1986. Dynamic input decisions in econometric production models. American Journal of Agricultural Economics 68, 939–949. Appelbaum, E., Kohli, U., 1997.
Import price uncertainty and the distribution of income. Review of Economics and Statistics 79, 620–630. Arrow, K.J., 1965. Aspects of the Theory of Risk Bearing. Yrjö Jahnsson Foundation, Helsinki. Behrman, J.R., 1968. Supply Response in Underdeveloped Agriculture: A Case Study of Four Major Crops in Thailand, 1937–63. North Holland, Amsterdam. Binswanger, H.P., 1981. Attitudes toward risk: Theoretical implications of an experiment in rural India. Economic Journal 91, 867–890. Bollerslev, T., Chou, R.Y., Kroner, K.F., 1992. ARCH modeling in finance: A selective review of the theory and empirical evidence. Journal of Econometrics 52, 5–59. Chambers, R.G., Quiggin, J., 2000. Uncertainty, Production, Choice, and Agency. Cambridge University Press, Cambridge, UK.
Chavas, J.-P., 2008. A cost approach to economic analysis under state-contingent production uncertainty. American Journal of Agricultural Economics 90, 435–446. Chavas, J.-P., Holt, M.T., 1996. Economic behavior under uncertainty: A joint analysis of risk preferences and technology. Review of Economics and Statistics 78, 329–335. Chavas, J.-P., Pope, R., 1985. Price uncertainty and competitive firm behavior: Testable hypotheses from expected utility maximization. Journal of Economics and Business 37, 223–235. Coyle, B.T., 1999. Risk aversion and yield uncertainty in duality models of production: A mean–variance approach. American Journal of Agricultural Economics 81, 553–567. Diebold, F.X., Nerlove, M., 1989. The dynamics of exchange rate volatility: A multivariate latent factor ARCH model. Journal of Applied Econometrics 4, 1–22. Frankfurter, G.M., Phillips, H.E., Seagle, J.P., 1971. Portfolio selection: The effects of uncertain means, variances, and covariances. Journal of Financial and Quantitative Analysis 6, 1251–1262. Fuss, M., McFadden, D., 1978. Production Economics: A Dual Approach to Theory and Applications, vol. 1. North-Holland, Amsterdam. Hey, J.D., Orme, C., 1994. Investigating generalizations of expected utility theory using experimental data. Econometrica 62, 1291–1326. Judge, G.G., Griffiths, W.E., Hill, R.C., Lütkepohl, H., Lee, T.-C., 1985. The Theory and Practice of Econometrics, 2nd ed. John Wiley & Sons, New York. Just, R.E., 1975. Risk aversion under profit maximization. American Journal of Agricultural Economics 57, 347–352. Just, R.E., 1977. Existence of stable distributed lags. Econometrica 45, 1467–1480. Just, R.E., Calvin, L., Quiggin, J., 1999. Adverse selection in crop insurance: Actuarial and asymmetric information incentives. American Journal of Agricultural Economics 81, 834–849. Just, R.E., Pope, R.D., 1978. Stochastic specification of production functions and economic implications. Journal of Econometrics 7, 67–86. Just, R.E., Weninger, Q., 1999. Are crop yields normally distributed? American Journal of Agricultural Economics 81, 287–304. Kahneman, D., Tversky, A., 1979. Prospect theory: An analysis of decision under risk. Econometrica 47, 263–291. Kaplan, D., 1988. The impact of specification error on the estimation, testing, and improvement of structural equation models. Multivariate Behavioral Research 23, 69–86. Lau, L.J., Yotopoulos, P.A., 1972. Profit, supply, and factor demand functions. American Journal of Agricultural Economics 54, 11–18. Love, H.A., Buccola, S.T., 1991. Joint risk preference-technology estimation with a primal system. American Journal of Agricultural Economics 73, 765–774. Machina, M.J., 1987. Choice under uncertainty: Problems solved and unsolved. Journal of Economic Perspectives 1, 121–154. Markowitz, H., 1959. Portfolio Selection: Efficient Diversification of Investments. John Wiley & Sons, New York. Marschak, J., Andrews Jr., W.H., 1944. Random simultaneous equations and the theory of production. Econometrica 12, 143–205. Menezes, C.F., Hanson, D.L., 1970. On the theory of risk aversion. International Economic Review 11, 481–487. Meyer, J., 1987. Two-moment decision models and expected utility maximization. American Economic Review 77, 421–430. Mundlak, Y., 1996. Production function estimation: Reviving the primal. Econometrica 64, 431–438. Mundlak, Y., Hoch, I., 1965. Consequences of alternative specifications of Cobb–Douglas production functions. Econometrica 33, 814–828. O'Donnell, C.J., Griffiths, W.E., 2006. Estimating state-contingent production frontiers. American Journal of Agricultural Economics 88, 249–266. Pope, R.D., 1980. The generalized envelope theorem and price uncertainty. International Economic Review 21, 75–86. Pope, R.D., 1988. A new parametric test for the structure of risk preferences. Economics Letters 27, 117–121. Pope, R.D., Just, R.E., 1996. Empirical implementation of ex ante cost functions. Journal of Econometrics 25, 231–249. Pope, R.D., Just, R.E., 2002. Random profits and duality. American Journal of Agricultural Economics 84, 1–7. Pratt, J.W., 1964. Risk aversion in the small and in the large. Econometrica 32, 122–136. Quiggin, J., 1982. A theory of anticipated utility. Journal of Economic Behavior and Organisation 3, 323–343. Saha, A., Shumway, C.R., Talpaz, H., 1994. Joint estimation of risk preference structure and technology using expo-power utility. American Journal of Agricultural Economics 76, 173–184. Sandmo, A., 1971. On the theory of the competitive firm under price uncertainty. American Economic Review 61, 65–73. Shanken, J., 1990. Intertemporal asset pricing: An empirical investigation. Journal of Econometrics 45, 90–120. Smith, V.L., 1982. Microeconomic systems as an experimental science. American Economic Review 72, 923–955. Von Neumann, J., Morgenstern, O., 1944. Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ. Zellner, A., Kmenta, J., Dreze, J., 1966. Specification and estimation of the Cobb–Douglas production function. Econometrica 34, 784–795.
Journal of Econometrics 162 (2011) 18–24
Risk behavior in the presence of government programs

Teresa Serra a,∗, Barry K. Goodwin b, Allen M. Featherstone c

a Centre de Recerca en Economia i Desenvolupament Agroalimentaris (CREDA)-UPC-IRTA, 08860 Castelldefels, Spain
b Agricultural and Resource Economics and Economics Departments, North Carolina State University, Raleigh, NC 27695, USA
c Department of Agricultural Economics, Kansas State University, Manhattan, KS 66506, USA
Article history: Available online 12 October 2009.
JEL classification: Q12.
Keywords: Policy; Risk; Risk preferences; Intensive margin; Extensive margin.
Abstract
Our paper assesses the impacts of the 1996 US Farm Bill on production decisions. We apply the expected utility model to analyze farmers' behavior under risk and assess how farmers' production decisions change in the presence of government programs. Specifically, we empirically evaluate the relative price and the risk-related effects of farm policy changes at the intensive margin of production, as well as the extra value that these policies add to farmers' certainty equivalent. We use farm-level data collected in Kansas to estimate the model. We find evidence that decoupled government programs have only negligible impacts on production decisions.
1. Introduction

Researchers have widely used the expected utility model in order to shed light on individuals' behavior under risk. This approach has been widely criticized in previous research. Buschena (2003) argued that the expected utility theory's axiomatic framework has been repeatedly violated in experimental settings. Rabin (2000) showed that attributing all risk behavior to diminishing marginal utility of wealth in small gambles can lead to unlikely levels of risk aversion.1 As Bardsley and Harris (1987) noted, estimates of risk attitudes based on experiments are artificial and may offer poor guidance regarding behavior in real economic environments (see Wik et al., 2004, for an example of measurement of risk aversion from experimental data). Following this argument, Bar-Shira et al. (1997) and Roberts et al. (2004) pointed out that, though it is unclear that individuals have the ability to connect their beliefs to the rational concept of probability, it is possible that they actually behave according to expected utility theory outside an experimental framework. The reason why the expected utility model may be a good approach to assess individual firms' behavior is that firms are often
∗ Corresponding address: Centre de Recerca en Economia i Desenvolupament Agroalimentaris (CREDA-UPC-IRTA), Parc Mediterrani de la Tecnologia, Edifici ESAB, C/ Esteve Terrades 8, 08860 Castelldefels, Spain. Tel.: +34 93 552 1209. E-mail address: [email protected] (T. Serra).
1 Just and Peterson (2003) suggested that this criticism could be extended to more realistic risky situations.
evaluated on the basis of their short-run profits. Shareholders may be unsure about the ability of the manager, who in turn may be willing to give up some short-run profits in order to reduce longer-term losses. It is also true that many applied analyses based on the expected utility framework have obtained very reasonable results. Examples of these studies include Bar-Shira et al. (1997) and Isik and Khanna (2003). Additionally, one should note that the expected utility framework is useful in that it allows a determination of the welfare costs of a firm's profit risk through the computation of the certainty equivalent (CE) of farm profits (Morduch, 1995). Farming, like other entrepreneurial activities, is subject to many financial risks. Concerns about farm risks and farmers' abilities to manage these risks have led to a variety of government agricultural support programs in developed countries. As Roberts et al. (2004) explained, these programs are very relevant to US agriculture: direct payments from the Federal Government to farmers were around $22 billion annually in the period 1999–2001, while total crop sales receipts for this same period were about $92–$93 billion per year, suggesting that direct payments tended to be about 24% of total crop sales in each of the three years. Direct, decoupled payments, counter-cyclical payments, and price supports averaged about 75% of total direct payment outlays between 1990 and 2005, suggesting that most payments were made to major field crop producers. Roberts and Nigel (2003) noted that, in 2001, the year of the largest government outlays under the Federal Agriculture Improvement and Reform (FAIR) Act, of the $23 billion in total farm program payments, direct decoupled payments accounted for about $5 billion, marketing loan (price support) payments accounted for about $8 billion, and emergency assistance
was about $8 billion. Given the magnitude of these payments, they may strongly influence agricultural production decisions, even in cases where payments are not directly tied to production. In recent years, there have been important agricultural policy reforms worldwide that have often involved a change in the way farm incomes are supported (Gardner, 1992; Hennessy, 1998; Rude, 2001). Until recently, support was mainly provided through policies explicitly linked to production decisions (i.e., coupled policies). However, recent policy changes have attempted to break this link through a process known as "decoupling". Decoupling aims to support farm incomes while reducing efficiency losses related to coupled policies such as price-support measures or deficiency payments (Chambers, 1995). Our empirical study focuses on US agricultural policy reforms introduced by the 1996 Farm Bill. The literature that assesses production impacts of policy instruments has shown that, in the context of a deterministic world with complete (credit) markets, or under the assumption that agents are risk neutral, only those policies that distort relative market prices have an impact on producers' decisions. In addition, an extensive literature shows that in a world with uncertainty, decoupled transfers, by means of altering total farm household wealth, can have an effect on economic agents' risk attitudes and thus on their production decisions (see Sandmo, 1971; Hennessy, 1998; or Young and Westcott, 2000). Under the assumption that farmers are characterized by decreasing absolute risk aversion preferences,2 lump sum payments may potentially reduce farmers' absolute risk aversion, but likely not their relative risk aversion. Hennessy (1998) has shown that the willingness to assume more risk may result in an increase in production. These "second-order" effects are known as the wealth effects of policy. While a change in price supports is likely to exert a significant impact on production, it is less clear that producers will strongly react to decoupled government transfers. Second-order effects might be expected to be small relative to the effects of coupled policies. The existence of these effects and their magnitude are issues requiring empirical analysis. The objective of this paper is to assess whether the 1996 US Farm Bill, aimed at decoupling government program payments from production decisions, succeeded in its objective. We apply the expected utility model to analyze farmers' behavior under risk and to determine the extent to which this framework leads to reasonable results when implemented in realistic, risky situations. Specifically, we evaluate the relative price and the risk-related effects of farm policy changes at the intensive margin of production, as well as the extra value that these policies add to farmers' certainty equivalent (CE). A large body of literature exists on the impacts of risk preferences and uncertainties on economic agents' decisions (see, for example, Just and Zilberman, 1986; Chavas and Holt, 1990; Pope, 1982; Pope and Just, 1991; Leathers and Quiggin, 1991; Just and Pope, 2002; or Kumbhakar, 2002). Most of these studies, however, have imposed restrictive assumptions on producers' risk preferences (see Love and Buccola, 1991). Isik and Khanna (2003), whose approach we follow in this analysis, use a functional form based on Saha's (1997) work that adds flexibility to previously used forms.
Though several published empirical studies have assessed the effects of (partially) decoupled policies, most of these analyses have assumed risk neutrality (see, for example Moro and Sckokai, 1999; Guyomard et al., 1996; Adams et al., 2001). As a result, they have ignored the potential risk effects of policy. A few studies have considered risk and risk preferences (Ridier and Jacket, 2002; Serra
2 A number of previous studies that have tested for economic agents’ risk preferences have provided evidence in favor of decreasing absolute risk aversion (see, for example, Isik and Khanna, 2003; Saha, 1997; or Bar-Shira et al., 1997).
et al., 2006)3 but have not distinguished between the relative price and risk-related effects of policy, nor assessed the influence of policy on farmers' CE.4 We extend this literature by providing an empirical investigation of the effects of decoupling that explicitly considers producers' risks and risk attitudes, that identifies the risk-related effects of policy, and that analyzes how the CE changes when farm programs are modified.

The article is organized as follows. In the next section, we present the methods employed in our analysis. In the empirical application section, we offer specifics on the data and the definition of the model variables. We then present estimation results derived from the analysis of farm-level data obtained from a sample of Kansas farms. Concluding remarks are offered in the last section.

2. Methods

Our model of production under uncertainty follows Meyer (1987) in that we assume that farmers rank alternative choices using a utility function that is defined on the first two moments of the random payoff. Farmers' risk attitudes are represented using Saha's (1997) nonlinear mean-standard deviation utility function. The same approach was recently used by Isik and Khanna (2003) to assess the value of site-specific technologies. Agricultural policy in developed countries usually involves the use of price-support measures (such as US deficiency payments) that keep market prices artificially high. We compare production impacts of lump sum payments with the effects of price supports, which represent a coupled element of support. A decoupled payment is defined as an income-support payment that is exogenously fixed and that does not depend on actual, current production or prices. The effects of government cash transfers on land values have been widely considered in the literature (see, for example, Barnard et al., 1997; Goodwin and Ortalo-Magné, 1992; Weersink et al., 1999; Just and Miranowski, 1993; Schertz and Johnston, 1998) and there seems to be a general agreement that economic rents from policy are likely to influence land prices, which in turn are likely to cause changes in relative input prices. By considering government transfers as fully decoupled, our model does not capture these changes, which certainly constitute an interesting avenue for future research.

Suppose a single-output firm produces output $y$ using a technology that can be represented by $y = f(x) + u$, where $x$ is the quantity utilized of a variable input, and $u$ is a random error term with zero mean and variance $\sigma_u^2$. The mean and variance of output can be expressed as $\bar{y} = f(x)$ and $\sigma_y^2 = \sigma_u^2$, respectively. Farmers are assumed to maximize the expected utility of wealth. A farm's total wealth is represented by $W = \omega + p f(x) - wx + G$, where $\omega$ is a farm's initial wealth, $p$ is the market output price, which is assumed to be a random variable that is independently distributed from the production disturbance, with mean $\bar{p}$ and variance $\sigma_p^2$, $w$ is the variable input price, and $G$ are decoupled government payments. The producer's optimization problem is
$$\max_x U(\bar{W}, \sigma_W), \qquad (1)$$

where $\bar{W}$, the farmer's expected wealth, is $\bar{W} = \omega + \bar{p} f(x) - wx + G$, and $\sigma_W$, the standard deviation of wealth, is $\sigma_W = \left[\bar{p}^2\sigma_y^2 + f(x)^2\sigma_p^2 + \sigma_y^2\sigma_p^2\right]^{1/2}$.

3 Goodwin and Mishra (2006) estimate a model which is only indirectly related to the expected utility conceptual model and do not derive risk preference parameters.
4 Sckokai and Moro (2006) is an exception. They assess the risk-related impacts of the partially decoupled payments of the 1992 Common Agricultural Policy (CAP) reform by using a dual approach of the EU model. As noted by an anonymous referee, with the use of duality, the ability to reflect risk issues adequately is highly questionable. Antón and Le Mouël (2004), on the other hand, compare the risk-reducing incentives to produce created by US loan deficiency and counter-cyclical payments.
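To fix ideas, the following is a minimal numerical sketch of the two wealth moments defined in (1); it is not taken from the paper, and all parameter values are purely illustrative.

```python
import numpy as np

def wealth_moments(omega, p_bar, sigma_p, f_x, sigma_y, w, x, G):
    """Expected wealth and its standard deviation for the two-moment model in (1).

    omega    : initial wealth
    p_bar    : expected output price (price variance is sigma_p**2)
    f_x      : expected output f(x) (production variance is sigma_y**2)
    w, x     : variable input price and quantity
    G        : decoupled government payments
    """
    W_bar = omega + p_bar * f_x - w * x + G
    sigma_W = np.sqrt(p_bar**2 * sigma_y**2
                      + f_x**2 * sigma_p**2
                      + sigma_y**2 * sigma_p**2)
    return W_bar, sigma_W

# Purely illustrative numbers for a hypothetical farm.
W_bar, sigma_W = wealth_moments(omega=670_000, p_bar=0.92, sigma_p=0.06,
                                f_x=150_000, sigma_y=120_000,
                                w=1.0, x=60_000, G=12_000)
print(W_bar, sigma_W)
```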
We analyze production decisions at the intensive margin by assuming an internal solution, $x > 0$, which allows us to express optimal input application as

$$\frac{\partial \bar{W}}{\partial x} = R\,\frac{\partial \sigma_W}{\partial x}, \qquad (2)$$

where $\partial \bar{W}/\partial x = \bar{p}\,\partial f(x)/\partial x - w$ represents an input's expected marginal income, $\partial \sigma_W/\partial x = f(x)\,[\partial f(x)/\partial x]\,\sigma_p^2/\sigma_W$ measures the impact of a change in input use on the standard deviation of wealth, and $R = -\,(\partial U/\partial \sigma_W)/(\partial U/\partial \bar{W})$ is the marginal utility ratio of the utility function representing a farmer's risk attitudes. Risk averse (neutral) [seeking] attitudes involve $R > (=)[<]\,0$. The expression $R\,\partial\sigma_W/\partial x$ is known as the marginal risk premium (MRP) and measures the impacts of uncertainty and risk preferences on production decisions. The first-order condition in (2) shows that a farmer reaches an optimum when the MRP equals the wedge between the expected value of the marginal product and the input price. Under risk neutrality, the first-order condition simply consists of equating the expected value of the marginal product to the input price $(\partial \bar{W}/\partial x = 0)$. While the left-hand side of the first-order condition captures the impacts of a change in relative prices on input use, the right-hand side measures the risk effects of policy changes. We attempt to empirically quantify the magnitude of both effects. To our knowledge, no previous analysis has assessed the overall risk-related effects of US agricultural policies.

We assume the following flexible utility function $U(\bar{W}, \sigma_W) = \bar{W}^{\theta} - \sigma_W^{\gamma}$ (Saha, 1997), which allows measuring a producer's risk preferences by the following expression $R = \frac{\gamma}{\theta}\,\bar{W}^{1-\theta}\sigma_W^{\gamma-1}$, where $\theta > 0$ and $\gamma$ are parameters. If farmers are risk averse (neutral) [loving], $\gamma > (=)[<]\,0$. Under the assumption of risk aversion, decreasing absolute risk aversion (DARA), constant absolute risk aversion (CARA), and increasing absolute risk aversion (IARA) are represented by $\theta > 1$, $\theta = 1$, and $\theta < 1$, respectively. Decreasing relative risk aversion (DRRA), constant relative risk aversion (CRRA) and increasing relative risk aversion (IRRA) involve $\theta > \gamma$, $\theta = \gamma$, and $\theta < \gamma$, respectively.5

5 As an anonymous referee notes, although Saha's (1997) utility function is flexible with respect to DARA, CARA, versus IARA and DRRA, CRRA, versus IRRA, it should be acknowledged that it also admits forms representing an expected utility that cannot be derived from any underlying utility function.

2.1. Comparative statics analysis

Farmers are assumed to have two income sources: lump sum transfers and market revenue. We choose market prices to represent a form of coupled income support, and compare their production effects with production responses originated by decoupled payments. We assume throughout the comparative statics analysis that farmers have DARA ($\theta > 1$) and that the marginal productivity of the variable input is positive ($f_x > 0$). The elasticity of agricultural output with respect to decoupled transfers is given by

$$\varepsilon_{y\_G} = \frac{1}{A}\,\frac{R}{x}\,\frac{\partial \sigma_W}{\partial x}\,\varepsilon_{y\_x}\,\varepsilon_{R\_G} > 0, \qquad (3)$$

where $A < 0$ is the second-order condition of the optimization problem, $\varepsilon_{R\_G} = (1-\theta)\,G/\bar{W}$ is the marginal utility ratio elasticity with respect to $G$, which shows the risk preference adjustment to a change in decoupled transfers, and $\varepsilon_{y\_x}$ is the input elasticity of output. In accordance with previous literature (see Hennessy, 1998; or Leathers and Quiggin, 1991), expression (3) suggests that an increase in decoupled government transfers increases DARA farmers' willingness to assume more risk, which in turn stimulates production $(\varepsilon_{y\_G} > 0)$. From Eq. (3) it is also clear that, since decoupled government transfers do not alter relative prices, they
do not have an impact on a farm's marginal income, thus causing only a risk effect.

The elasticity of agricultural output with respect to the output price can be measured as follows

$$\varepsilon_{y\_\bar{p}} = -\frac{\bar{p}}{A x}\,\varepsilon_{y\_x}\left[\frac{\partial f(x)}{\partial x} - \frac{\partial R}{\partial \bar{p}}\frac{\partial \sigma_W}{\partial x} - \frac{R}{\sigma_W}\frac{\partial \sigma_W}{\partial x}\frac{\partial \sigma_W}{\partial \bar{p}}\right], \qquad (4)$$

where $\frac{\partial R}{\partial \bar{p}} = R\left[(1-\theta)\frac{\bar{y}}{\bar{W}} + (\gamma-1)\frac{\sigma_y^2}{\sigma_W^2}\,\bar{p}\right]$ measures the impact of a price change on farmers' risk preferences. Expression (4) shows that output price variation generates two changes that can impact the level of output. The first is the relative price effect, which is measured through the marginal income effect $(\partial f(x)/\partial x > 0)$ and corresponds to the first summand in (4). The risk effect corresponds to the second summand and measures the impacts of price changes on risk and risk preferences. Specifically, $\frac{\partial R}{\partial \bar{p}}\frac{\partial \sigma_W}{\partial x}$ represents the risk aversion effect of price and captures the responses of a farmer's risk preferences to price changes. The sign of this effect depends upon the sign of $\partial R/\partial \bar{p}$, which cannot be anticipated by theory and thus must be empirically determined. Expression $\frac{R}{\sigma_W}\frac{\partial \sigma_W}{\partial x}\frac{\partial \sigma_W}{\partial \bar{p}} > 0$ is the risk effect of price and measures the impact of a price change on the risk faced by the farmer.

Our comparative statics analysis allows derivation of expressions (3) and (4) that capture the effects of coupled and decoupled policy instruments on the level of production. A comparison of these two expressions, however, does not allow a clear conclusion regarding their relative magnitudes. This is an issue requiring empirical determination. However, output price is likely to have a stronger impact on production relative to decoupled payments. This is because lump sum transfers only impact producers' behavior through a risk effect whereas output price influences production by means of the relative price and risk effects. However, it is unclear whether the risk-related effects of the change in price will be bigger or smaller than the risk effects of decoupled payments. While price influences both risk and risk attitudes, lump sum transfers only impact risk attitudes. Hence, if a decline in the output price occurs, a farm's expected wealth will be smaller, which is likely to increase the degree of risk aversion and stimulate a reduction in output levels. However, this effect may be constrained by the fact that a reduction in price will also cause a decline in a farm's variance of wealth. Nevertheless, to the extent that cash receipts are larger than decoupled transfers, output prices may have stronger effects on wealth and thus on risk preferences than is the case for decoupled transfers.

In order to determine the extra value that government policies add to the farm business, we compute a farm's certainty equivalent (CE) at different levels of governmental support. By using the elasticities previously determined, we assess the sensitivity of a farm's CE to governmental direct payments and price-supports. Specifically, this measure is computed for different levels of decoupled transfers and price-supports. This analysis can also be used to predict the impacts of government programs on the extensive margin of production, since a negative value of the CE indicates that the farmer would be better off by abandoning the activity. As a result, we estimate the number of farms that are likely to stop production under different support levels. A reduction in both price supports and subsidies will reduce a farm's profit, but will also increase a farmer's degree of risk aversion. Both changes are likely to trigger a decline in farmers' CE.

2.2. Empirical specification

In order to estimate our model, we provide a parametric representation. Generalizing the model outlined above, we allow
for a multi-input production technology $f(x)$, where $x$ is a vector of $n$ inputs.6 The technology structure is approximated through a quadratic production function $\bar{y} = \alpha_0 + \sum_{i=1}^{n}\alpha_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{ij} x_i x_j$, with the $\alpha$'s being parameters. Following expression (2), a system of $n$ first-order conditions can be derived as

$$\bar{p}\,\frac{\partial f(x)}{\partial x_i} - w_i - \frac{\gamma}{\theta}\,\bar{W}^{1-\theta}\sigma_W^{\gamma-1}\,f(x)\,\frac{\partial f(x)}{\partial x_i}\,\frac{\sigma_p^2}{\sigma_W} = 0, \quad i = 1, \ldots, n. \qquad (5)$$
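As a reading aid only, here is a hedged Python sketch of the quadratic technology and of the left-hand side of system (5); the function names and the matrix form of the quadratic are mine, not the authors'.

```python
import numpy as np

def f_quadratic(x, a0, a, A):
    """Quadratic technology: ybar = a0 + a'x + x'Ax, with A collecting the alpha_ij."""
    return a0 + a @ x + x @ A @ x

def f_gradient(x, a, A):
    """Marginal products d f / d x_i implied by the quadratic form above."""
    return a + (A + A.T) @ x

def foc_residuals(x, p_bar, w, sigma_p, sigma_y, omega, G, a0, a, A, gamma, theta):
    """Left-hand side of the first-order conditions (5); zero at an interior optimum."""
    fx = f_quadratic(x, a0, a, A)
    grad = f_gradient(x, a, A)
    W_bar = omega + p_bar * fx - w @ x + G
    sigma_W = np.sqrt(p_bar**2 * sigma_y**2 + fx**2 * sigma_p**2
                      + sigma_y**2 * sigma_p**2)
    R = (gamma / theta) * W_bar**(1.0 - theta) * sigma_W**(gamma - 1.0)
    return p_bar * grad - w - R * fx * grad * sigma_p**2 / sigma_W
```

Given first-stage estimates of the alphas, the risk-preference parameters (gamma, theta) are the values that make these residuals as close to zero as possible across farms.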
Parameters of the production function are estimated by ordinary least squares. Parameters that represent producers' risk attitudes are derived through the estimation of the system of Eq. (5) by non-linear three-stage least squares, conditional on the parameters obtained in the first step. A two-stage estimation approach such as the one followed in this analysis will usually lead to inaccurate estimates of the parameters' standard errors. We address this problem by using bootstrapping methods. Specifically, we draw 500 pseudo-samples with replacement and re-estimate the model for each one, thus generating a sample of parameter estimates. Bootstrapping allows one to derive estimates that are both efficient and robust to heteroskedasticity. Parameter estimates and their variance–covariance matrix are derived from the distribution of the replicated estimates (i.e., estimates are given by the means and variances of the replicated sample). The elasticities of output with respect to coupled and decoupled policies are constructed based on the generalization of expressions (3) and (4) to an n-input model7 and are computed from the replicated estimates generated in the bootstrap.

In this section we have shown that, though the sign of the effects of decoupled government programs can, to a certain extent, be predicted by theory, the sign of the impacts of a price change and the magnitude of both payment and price effects need to be determined by empirical analysis. We devote the next two sections to studying the impacts of decoupling on production decisions in a sample of Kansas farms.

3. Empirical application

US farm policy underwent substantial changes in 1996. These changes involved a decoupling of certain aspects of US farm policy in that income-support payments were untied from production. Loan rates are used in US agricultural policy to determine marketing loans and loan deficiency payments which provide income-support to farmers. Subsequent to the 1996 legislation, producers could take the difference between the loan rate and the market price (if the price were beneath the loan rate) as a direct payment form of a price support, known as a "loan deficiency payment" or LDP. Thus, the loan rate served as a support price. The marketing loan program has long been providing agricultural price support by removing crops from the market when prices are below the loan rate, or below the loan rate plus the interest charged on the loan. While the 1996 Farm Bill continued marketing loans, it introduced important changes in the program that reduced its potential influence on market prices. The 1996 Act eliminated base acreage requirements to be eligible for price supports and instituted fully decoupled payments that were intended to compensate growers as policies were transitioned to a greater market orientation.8
6 See the empirical application section for a definition of the inputs considered. 7 Details of this generalization are available from the authors upon request. 8 To receive PFC payments, farmers who had participated in the wheat, feed grains, rice, and upland cotton programs in any of the years of the period 1991–95, had to enter a 7-year PFC program (1996–2002).
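Before turning to the Kansas data, the following sketch shows one way the two-step procedure with 500 bootstrap replications described in Section 2.2 could be organized. It is not the authors' code: the helper foc_fun is hypothetical (it would stack the residuals of (5) for the selected farms, e.g. built on the foc_residuals sketch above), and plain nonlinear least squares stands in for the paper's nonlinear three-stage least squares.

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_once(Y, X_design, foc_fun, rows):
    """One pass of the two-step estimator on the farms indexed by `rows`."""
    # Step 1: OLS for the quadratic production-function parameters.
    alpha, *_ = np.linalg.lstsq(X_design[rows], Y[rows], rcond=None)
    # Step 2: fit (gamma, theta) by driving the stacked FOC residuals toward zero,
    # conditional on the first-stage alpha (nonlinear least squares here).
    sol = least_squares(lambda gt: foc_fun(gt[0], gt[1], alpha, rows), x0=[1.0, 1.0])
    return alpha, sol.x

def bootstrap_estimates(Y, X_design, foc_fun, n_boot=500, seed=0):
    """Resample farms with replacement and re-estimate, as described in the text."""
    rng = np.random.default_rng(seed)
    n, draws = len(Y), []
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)              # one pseudo-sample
        _, gamma_theta = estimate_once(Y, X_design, foc_fun, rows)
        draws.append(gamma_theta)
    draws = np.asarray(draws)
    return draws.mean(axis=0), draws.std(axis=0)       # point estimates, standard errors
```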
Table 1
Summary statistics for the variables of interest (n = 2,241).

Variable               Mean         Standard deviation
y (output)             146,910.66   123,608.74
p̄ (expected price)     0.92         0.06
x1 (chemical input)    15,022.37    17,390.11
w1 (x1 price)          0.99         0.01
x2 (fertilizer)        19,798.01    21,076.48
w2 (x2 price)          1.01         0.06
x3 (seed)              13,861.41    16,678.80
w3 (x3 price)          1.02         0.03
x4 (energy)            15,028.54    16,471.51
w4 (x4 price)          1.01         0.03
G (PFC payments)       12,014.92    9,233.03
ω (initial wealth)     669,663.10   587,319.18

Note: All monetary values are expressed in constant 1998 currency units.
Our empirical analysis focuses on a sample of Kansas farmers. Farm-level data were taken from farm account records from the Kansas Farm Management Association database for the period 1998–2001.9 Thus, our period of analysis corresponds to a time during which the 1996 FAIR Act was effective. FAIR Act payments, also known as "Agricultural Market Transition Act" or AMTA payments, correspond to our definition of fixed payments per farm. Though the analysis is based on individual farm data, aggregate data are needed to define several important variables not recorded in the Kansas dataset. These aggregates are taken from the National Agricultural Statistics Service (NASS), the United States Department of Agriculture (USDA), and the Commodity Research Bureau (CRB) historical Commodity database. NASS provides country-level price indices and state-level output prices and quantities. USDA data include state-level marketing assistance loan rates (LR) and PFC payment rates. From the CRB database we extracted information on agricultural commodity futures prices. Summary statistics for the variables of interest are presented in Table 1. Four variable inputs are considered: x1 is the chemicals input, x2 is fertilizers, x3 is seed use, and x4 is a composite input that includes energy (irrigation energy, gas and fuel-oil and electricity). Because input prices are not identified in the Kansas database, country-level input price indices are taken from NASS. Implicit quantity indices for variable inputs are derived through the ratio of input use in currency units to the corresponding price index. Following the theoretical model, a single output category is considered as a quantity index that includes the production of wheat, corn, grain sorghum and soybeans, the predominant crops in Kansas.10 An aggregate output price index is defined as a Paasche index that represents a farmer's expected price. To build the expected price index, unit prices for the crops are defined as expected prices in the following way: $\bar{p} = \max[E(Cp), LR]$, where $E(Cp)$, the expected cash price, is computed as the futures price adjusted by the basis.11 The basis is calculated as the five preceding years' average of the difference between the cash price and the futures
9 Retrospective data for these farms are used to define several lagged variables used in the analysis. To be able to do so, a complete, balanced panel is built from our sample. 10 Our database does not provide information on input allocation among different outputs and thus a single aggregate output is defined. In a recent paper, Davis et al. (2000) extend the generalized composite commodity theorem and provide support for consistent aggregation of US agricultural outputs into as few as two categories: crops and livestock. However the composite commodity theorem ignores risk aversion and thus may entail problems in a framework that allows for risk and risk preferences. 11 When the futures price is unavailable, the lagged cash price is taken as the proxy for the expected price, thus assuming naïve expectations. This only occurs for the sorghum futures price. Chicago Board of Trade futures prices were used for wheat, corn and soybeans.
Table 2
Parameter estimates and summary statistics for the production function.

Parameter   Coefficient estimate   Standard error
α0          −1688.82               3512.69
α1          3.45**                 0.37
α2          2.23**                 0.28
α3          2.15**                 0.46
α4          2.44**                 0.38
α11 a       −1.79**                0.62
α22         −0.45*                 0.27
α33         −0.14                  1.02
α44         −2.18**                0.71
α12         −0.94                  0.79
α13         0.50                   1.36
α14         0.66                   1.41
α23         0.48                   11.60
α24         1.72                   1.08
α34         0.11                   1.64

* Statistical significance at the α = 0.1 level.
** Statistical significance at the α = 0.05 level.
a All cross terms were scaled by 1/100,000.
price. The cash price is the state-level output price. The futures price is defined as the daily average price during the planting season for the harvest month contract. State-level production is employed to derive the aggregate expected Paasche index. The Kansas database does not register PFC government payments. Instead, a single measure including all government payments received by each farm is recorded. To derive an estimate of farm-level PFC payments, the acreage of the program crops (base acreage) and the base yield for each crop are approximated using farm-level data. The approximation uses the 1986–88 average acreage and yield for each program crop and farm. PFC payments per crop are derived by multiplying 0.85 by the base acreage, yield and the PFC payment rate.12 PFC payments per crop are then added to get total direct payments per farm.13 A farm's initial wealth is defined as the farm's net worth.

4. Results

Our article studies farmers' behavior under risk and uncertainty. The analysis is based on the expected utility model and explicitly models government farm programs. Results of the estimation of technology and risk preference parameters are presented in Tables 2 and 3, respectively. Parameter estimates for the production function have the expected signs and linear and quadratic terms are, with one exception, statistically significant. At the data means, $f_{x_i} > 0$ and $f_{x_i x_i} < 0$, which implies that an increase in input "i" use will increase output at a decreasing rate. Production technology for the farms in our sample is characterized by decreasing returns to scale. The mean and standard deviation of the scale elasticity take the value of 0.98 and 0.016, respectively. Parameter estimates for the risk preferences function are all statistically significant and, as expected, provide evidence that farms in our sample exhibit DARA and IRRA preferences. The Wald test indicates the model is globally significant (see Table 3). The estimated risk preference parameters are close to those derived by Isik and Khanna (2003) for a sample of Illinois farmers and by Saha (1997) for Kansas wheat farmers. Bar-Shira et al. (1997) also found evidence of DARA and IRRA risk preferences for a sample of Israeli farmers. Hence, the expected utility approach yields results that
12 Base acreage for most crops was established in the 1980s and thus we use actual acreage during this period to measure a farm’s base acreage. 13 This estimate is compared to actual government payments received by each farm. If estimated PFC payments exceed actual payments, the first measure is replaced by the second. This happens to 7% of our observations.
Table 3
Parameter estimates and summary statistics for the risk preferences function.

Parameter   Coefficient estimate   Standard error
γ           1.26**                 0.03
θ           1.02**                 0.01

Wald test for global significance, H0: αi = αii = αij = γ = θ = 0: 23,647.00**

* Statistical significance at the α = 0.1 level.
** Statistical significance at the α = 0.05 level.
Table 4
Elasticity estimates at the data means.

                          Elasticity value   Standard error
Total effect
  εy_p                    1.64*              0.88
  εy_G                    4.43E−4            6.66E−4
Risk effect
  εy_p                    0.19*              0.10
  εy_G                    4.43E−4            6.66E−4
Relative price effect
  εy_p                    1.45*              0.88
  εy_G                    –                  –

Government payments have no relative price effect, which is denoted by '–'.
* Statistical significance at the α = 0.1 level.
are reasonable and compatible with previous research. Similarities between our risk preference parameters and the ones derived by previous analyses should also allow one to use our results to predict farmers’ behavior in other circumstances. Price and payment elasticities of agricultural output are offered in Table 4. Focusing first on the total effect, coupled instruments have a much stronger impact on production than decoupled public transfers, as expected. While the output price elasticity is 1.64, the decoupled payment elasticity is 0.00043, thus suggesting that very large decoupled payments would be required to generate perceptible production effects. This conclusion is reinforced by the fact that the payment elasticity is not statistically different from zero and is compatible with Hennessy’s (1998) findings that, under DARA preferences, an increase in decoupled payments will have only a minor effect on production. From these results we can conclude that a policy reform involving a reduction in price supports compensated by an increase in lump-sum payments is very likely to have the effect of reducing agricultural output. Focusing on the decomposition of the total output effect of price into a relative price effect and a risk effect, the first effect is found to dominate the second. In a risk neutral scenario, an increase in output prices by 1% would generate an increase in output of the order of 1.45%. The difference between the relative price effect and the total effect is 0.19 which represents the risk effect of a price change. The risk effect of prices dominates the risk effect of payments (0.19 versus 0.00043). Again, this result is not surprising. For our sample farms and for the crops, decoupled payments represent less than 10% of total cash receipts. Hence, a change in output prices is likely to have a substantially stronger impact on wealth than a change in PFC payments, which reinforces the risk effect of price. We evaluate the impacts of government programs on farmers’ CE by computing this magnitude for different levels of public support and on a per acre basis. In Figs. 1 and 3 we represent mean CE and their 95% confidence intervals (CI). Figs. 2 and 4 are the result of counting the number of farms that obtain a negative CE for different levels of support. According to USDA baseline policy variables (see USDA, 2000), marketing assistance loan rates for the crops considered were reduced by about 6.3% over the period of analysis. We assume that any cut in marketing assistance loan rates will correspond to an equivalent decrease in market output prices and consider three possible scenarios involving a 5%, 10% and 15%
Fig. 1. Farm's certainty equivalent for different levels of PFC payments.
Fig. 2. Percentage of farms that will obtain a negative certainty equivalent if PFC payments are reduced.
Fig. 3. Farm's certainty equivalent for different levels of price supports.
Fig. 4. Percentage of farms that will obtain a negative certainty equivalent if price supports are reduced.
decline in loan rates, which in turn involves a reduction in output prices of the same magnitudes. We analyze CE levels for a wide array of PFC cuts, from 10% to 100%. Results confirm our findings at the intensive margin of production, i.e., that PFC payments have a negligible impact on farmers’ production decisions, while the influence of price supports is potentially very powerful. If PFC payments were cut by half (Fig. 1), the average CE could suffer a decline of the order of 10%. In spite of this decline in the CE, only 0.1% of the farms would be better off by abandoning their activity (Fig. 2). The elimination of PFC payments would reduce the mean CE by almost 20%, causing 0.22% of the farms to have a negative CE. Conversely, if one assumes a reduction in market prices, the results are substantially different. A reduction in output prices of the order of 10% implies a 62% decline in the CE and triggers the abandonment of almost 9% of the sample farms. A 15% cut in market prices would reduce the CE by 86% and would motivate the abandonment of almost 45% of the farms. Finally, a 5% decline in prices has an impact at the extensive margin that is equivalent to the elimination of decoupled transfers, i.e., it causes abandonment of 0.2% of the farms.
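The paper does not spell out the certainty equivalent formula behind these scenarios; under the Saha utility used above, one natural implementation (my assumption, with purely illustrative numbers, input use held fixed, and magnitudes not comparable to the per-acre figures reported in Figs. 1–4) is sketched below.

```python
import numpy as np

GAMMA, THETA = 1.26, 1.02          # risk-preference estimates reported in Table 3

def certainty_equivalent(W_bar, sigma_W, gamma=GAMMA, theta=THETA):
    """CE solving CE**theta = W_bar**theta - sigma_W**gamma under U = W**theta - sigma**gamma.
    The sign/abs trick extends the root to negative arguments (the abandonment region)."""
    inner = W_bar**theta - sigma_W**gamma
    return np.sign(inner) * np.abs(inner)**(1.0 / theta)

# Hypothetical farm; scale matters because theta and gamma differ.
BASE = dict(omega=670_000.0, p_bar=0.92, sigma_p=0.06,
            f_x=150_000.0, sigma_y=30_000.0, wx=60_000.0, G=12_000.0)

def ce_scenario(p_cut=0.0, g_cut=0.0, s=BASE):
    """CE after cutting the expected (support) price by p_cut and PFC payments by g_cut."""
    p = s["p_bar"] * (1.0 - p_cut)
    G = s["G"] * (1.0 - g_cut)
    W_bar = s["omega"] + p * s["f_x"] - s["wx"] + G
    sigma_W = np.sqrt(p**2 * s["sigma_y"]**2 + s["f_x"]**2 * s["sigma_p"]**2
                      + s["sigma_y"]**2 * s["sigma_p"]**2)
    return certainty_equivalent(W_bar, sigma_W)

for cut in (0.05, 0.10, 0.15):
    print(f"price cut {cut:.0%}: CE = {ce_scenario(p_cut=cut):,.0f}")
print(f"PFC payments eliminated: CE = {ce_scenario(g_cut=1.0):,.0f}")
```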
5. Concluding remarks

This article develops a model of production under uncertainty based on the Isik and Khanna (2003) framework and evaluates the effects of the FAIR Act policy changes both at the intensive and extensive margins. Though the theoretical framework allows a prediction of the sign of the effects of decoupled government transfers, the impacts of a price change cannot be predicted by theory. Additionally, the theoretical framework does not allow drawing a conclusion regarding the relative magnitudes of the price and payment effects. Farm-level data for a sample of Kansas farms are used to estimate the model. Results show that farmers in the sample have risk preferences that are characterized by decreasing absolute risk aversion (DARA) and increasing relative risk aversion (IRRA). The expected utility framework leads to reasonable results that are comparable with previous research. Similarities between our risk preference parameters and the ones derived by previous analyses suggest that our results could be used to predict farmers' behavior in other circumstances. Though decoupled government transfers are found to motivate an increase in input use and thus in agricultural output, elasticity values are very small, requiring substantial changes in payments to generate perceptible effects on production. Conversely, the effects of coupled policies such as price-supports are found to be substantially higher. Hence, a decoupling process consisting of a reduction in price supports in favor of decoupled government transfers is very likely to involve a reduction in output. The risk effect of a price change is found to dominate the risk effect of payments, a result which is not surprising in light of the magnitude of crop cash receipts relative to decoupled transfers. The impact of lump sum payments on farmers' CE is found to be negligible. An elimination of PFC payments may reduce the mean CE by almost 20%, but would result in only 0.22% abandonment. In comparison, a 15% price decline would cause almost 45% of the farms to have a negative CE. Hence, though PFC payments are not fully decoupled in the presence of risk and uncertainty, their effects on agricultural production seem to be of a very small magnitude.

Acknowledgements

We would like to thank the Spanish Ministerio de Ciencia e Innovación for providing support to this analysis within the project AGL2006-00949/AGR. We are also very grateful to the Editor and the anonymous referees for comments and suggestions.

References

Adams, G., Westhoff, P., Willott, B., Young II, R.E., 2001. Do 'decoupled' payments affect US crop area? Preliminary evidence from 1997–2000. American Journal of Agricultural Economics 83, 1190–1195. Antón, J., Le Mouël, C., 2004. Do counter-cyclical payments in the 2002 US Farm Act create incentives to produce? Agricultural Economics 31, 277–284.
Bardsley, P., Harris, M., 1987. An approach to the econometric estimation of attitudes to risk in agriculture. Australian Journal of Agricultural Economics 31, 112–126. Bar-Shira, Z., Just, R.E., Zilberman, D., 1997. Estimation of farmers’ risk attitude: An econometric approach. Agricultural Economics 17, 211–222. Barnard, C.H., Whittaker, G., Westenbarger, D., Ahearn, M., 1997. Evidence of capitalization of direct government payments into US cropland values. American Journal of Agricultural Economics 79, 1642–1650. Buschena, D.E., 2003. Expected utility violations: Implications for agricultural and natural resource economics. American Journal of Agricultural Economics 85, 1242–1248. Chambers, R.G., 1995. The incidence of agricultural policies. Journal of Public Economics 57, 317–335. Chavas, J.P., Holt, M.T., 1990. Acreage decisions under risk: The case of corn and soybeans. American Journal of Agricultural Economics 72, 529–538. Davis, G.C., Lin, N., Shumway, C.R., 2000. Aggregation without separability: Tests of the United States and Mexican agricultural production data. American Journal of Agricultural Economics 82, 214–230. Gardner, B.L., 1992. Changing economic perspectives on the farm problem. Journal of Economic Literature 30, 62–101. Goodwin, B.K., Mishra, A.K., 2006. Are ‘decoupled’ farm program payments really decoupled? An empirical evaluation. American Journal of Agricultural Economics 88, 73–89. Goodwin, B.K., Ortalo-Magné, F., 1992. The capitalization of wheat subsidies into agricultural land values. Canadian Journal of Agricultural Economics 40, 37–54. Guyomard, H., Baudry, M., Carpentier, A., 1996. Estimating crop supply response in the presence of farm programmes: Application to the CAP. European Review of Agricultural Economics 23, 401–420. Hennessy, D.A., 1998. The production effects of agricultural income support policies under uncertainty. American Journal of Agricultural Economics 80, 46–57. Isik, M., Khanna, M., 2003. Stochastic technology, risk preferences, and adoption of site-specific technologies. American Journal of Agricultural Economics 85, 305–317. Just, D.R., Peterson, H.H., 2003. Diminishing marginal utility of wealth and calibration of risk in agriculture. American Journal of Agricultural Economics 85, 1234–1241. Just, R.E., Pope., R.D. (Eds.), 2002. A Comprehensive Assessment of the Role of Risk in US Agriculture. Kluwer Academic Publishers, Norwell, MA. Just, R.E., Miranowski, J.A., 1993. Understanding farmland price changes. American Journal of Agricultural Economics 75, 156–168. Just, R.E., Zilberman, D., 1986. Does the law of supply hold under uncertainty. The Economic Journal 96, 514–524. Kumbhakar, S.C., 2002. Specification and estimation of production risk, risk preferences and technical efficiency. American Journal of Agricultural Economics 84, 8–22. Leathers, H.D., Quiggin, J.C., 1991. Interactions between agricultural and resource policy: The importance of attitudes toward risk. American Journal of Agricultural Economics 73, 757–764. Love, H.A., Buccola, S.T., 1991. Joint risk preference-technology estimation with a primal system. American Journal of Agricultural Economics 73, 765–774.
Meyer, J., 1987. Two-moment decision models and expected utility maximization. The American Economic Review 77, 421–430. Morduch, J., 1995. Income smoothing and consumption smoothing. The Journal of Economic Perspectives 9, 103–114. Moro, D., Sckokai, P., 1999. Modelling the CAP arable crop regime in Italy: Degree of decoupling and impact of agenda 2000. Cahiers d’Economie at Sociologie Rurales 53, 49–73. Pope, R., 1982. Empirical estimation and use of risk preferences: An appraisal of estimation methods that use actual economic decisions. American Journal of Agricultural Economics 64, 376–383. Pope, R.D., Just, R.E., 1991. On testing the structure of risk preferences in agricultural supply analysis. American Journal of Agricultural Economics 80, 743–748. Rabin, M., 2000. Diminishing marginal utility of wealth cannot explain risk aversion. In: Kahneman, D., Tversky, A. (Eds.), Choices, Values, and Frames. Cambridge University Press, New York, NY, pp. 202–208. Ridier, A., Jacket, F., 2002. Decoupling direct payments and the dynamics of decisions under price risk in cattle farms. Journal of Agricultural Economics 53, 549–565. Roberts, M.J., Nigel, K., 2003. Who benefits from government farm payments? Relationships between payments received and farm household well-being. Choices, Third Quarter 7–14. Roberts, M.J., Osteen, C., Soule, M., 2004. Risk, government programs, and the environment. Technical Bulletin Number 1908, Economic Research Service, United States Department of Agriculture. Rude, J.I., 2001. Under the green box. The WTO and farm subsidies. Journal of World Trade 35, 1015–1033. Saha, A., 1997. Risk preference estimation in the nonlinear mean standard deviation approach. Economic Inquiry 35, 770–782. Sandmo, A., 1971. On the theory of the competitive firm under price uncertainty. The American Economic Review 61, 65–73. Serra, T., Zilberman, D., Goodwin, B.K., Featherstone, A.M., 2006. Effects of decoupling on agricultural output mean and variability. European Review of Agricultural Economics 33, 269–288. Schertz, L., Johnston, W., 1998. Landowners: They get the 1996 farm act benefits. Choices 13, 4–7. Sckokai, P., Moro, D., 2006. Modeling the reforms of the common agricultural policy for arable crops under uncertainty. American Journal of Agricultural Economics 88, 43–56. USDA, 2000. USDA Agricultural Baseline Projections to 2009. Staff Report No. WAOB-2001-1, World Agricultural Outlook Board, Office of the Chief Economist, Prepared by the Interagency Agricultural Projections Committee. Available from URL: http://www.ers.usda.gov/Publications/WAOB001 (accessed 23.01.07). Weersink, A., Clark, S., Turvey, C.G., Sarker, R., 1999. The effect of agricultural policy on farmland values. Land Economics 75, 425–439. Wik, M., Aragie Kebede, T., Bergland, O., Holden, S.T., 2004. On the measurement of risk aversion from experimental data. Applied Economics 36, 2443–2451. Young, C.E., Westcott, P.C., 2000. How decoupled is US agricultural support for major crops? American Journal of Agricultural Economics 82, 762–767.
Journal of Econometrics 162 (2011) 25–34
Calibrating the wealth effects of decoupled payments: Does decreasing absolute risk aversion matter?

David R. Just ∗

Applied Economics and Management, Cornell University, 254 Warren Hall, Ithaca, NY 14853, USA
Article history: Available online 12 October 2009.
Keywords: Decreasing absolute risk aversion; Calibration; Diminishing marginal utility of wealth; Arrow–Pratt risk aversion.
Abstract
Arrow's hypotheses regarding the relationship between wealth and risk aversion measures have formed the basis for a large body of empirical research and theory. For example, many have suggested that decoupled farm subsidy payments may increase production as they decrease farmers' risk aversion. This paper develops a new calibration technique designed to measure the minimum change in concavity of a utility of wealth function necessary to describe a particular change in production behavior for some discrete change in wealth. I conclude that measurable changes in production levels should not be produced by changing levels of risk aversion except when wealth changes are a substantial portion of wealth. This tool draws into question the usefulness of Arrow's hypotheses in many current applications.
1. Introduction

In the 1990s, spurred by rising costs and increased pressure from trading partners, the United States and the European Union began substituting lump sum payments to producers in place of more traditional production subsidies. Traditional subsidy schemes encouraged greater production by increasing the revenue received on each unit of production, creating trade distortions. Alternatively, lump sum payments, called decoupled payments, were intended to reduce trade distortions because they were not directly tied to the amount produced. Hennessy (1998), however, argues that despite not being directly tied to the amount produced, decoupled payments can influence production decisions by altering the risk attitude of producers. Hennessy's argument depends heavily on the expected utility hypothesis, and the rich literature that has developed this model into the workhorse of applied risk research. Built on a set of three rationality axioms, the expected utility hypothesis implies that all risk preferences stem from a diminishing marginal utility of wealth, that is, the concavity of a utility of wealth function. In Arrow's (1971) seminal work, he proposes a simple measure of risk aversion based on the concavity of a utility function, called absolute risk aversion. Further, he argues that the absolute risk aversion must decrease with wealth. Sandmo (1971) used Arrow's measures to show that the greater the absolute risk aversion demonstrated by a producer, the less they would produce. Combining these two arguments, Hennessy suggests that decoupled payments increase the wealth of producers, thus reducing their level of risk aversion. This decreased level of risk aversion leads producers to increase
production, thus potentially distorting trade. One is left to wonder how large a production effect could be caused by such decoupled payments. Indeed some have argued that these effects may be large and have important policy implications. Others have argued that they are small and may be negligible. The underlying theory issue is not unique to production subsidies and trade. Friedman and Savage (1948) used similar arguments to reconcile simultaneous lottery play and insurance purchases with expected utility. More recently, some have proposed that the relationship between wealth and risk may be responsible for behavior leading to persistent poverty (Lybbert and Barrett, 2007). Arrow’s hypotheses regarding wealth and risk aversion have led to the introduction of ever more flexible functional forms for utility of wealth (e.g. the expo-power utility function, Saha (1993)) and increasing numbers of empirical studies aimed at detecting relationships between wealth and risk behavior. The majority of these studies find significant relationships between wealth and risk behavior, although some of the relationships differ from study to study. Yet, for the relationship between wealth and risk aversion to be an important component of behavior, the changes in risk aversion due to differences in wealth must be substantial. As yet, no tools have been developed to determine if the relationship between wealth and risk aversion required for a specific application is potentially beyond the bounds of reasonable behavior. Estimating the level of risk aversion would seem a much simpler task than estimating the relationship between wealth and risk aversion. However, in many cases, estimated levels of risk aversion are seemingly implausible and possibly irrational (see, for example Siegel (1992), Just and Peterson (forthcoming)). As a result, some have begun to question the validity of the methods used to estimate utility functions (e.g. Mundlak (1996), Just and Peterson (2003a), Just and Pope (2003)) or whether any methods may suffice.
This problem has become apparent only as more effort has been devoted to finding reasonable ranges for risk aversion coefficients. The body of work examining phenomena like decreasing absolute risk aversion (DARA) is somewhat smaller and must be subject to the same difficulties in estimation. While many have proposed DARA as a plausible explanation for various behaviors, little has been done to find the reasonable bounds on changes in absolute risk aversion for various changes in wealth. Without the knowledge of such bounds, it is impossible to know whether DARA is an important driving force behind economic behavior, or simply an interesting, yet negligible, side effect of changes in wealth. Building on the calibration techniques of Rabin (2000) and Just and Peterson (forthcoming), I propose a method to calibrate changes in absolute risk aversion given revealed choices. Just and Peterson’s method determined the minimum level of concavity of the utility of wealth function necessary to rationalize a revealed preference choice under risk. In this paper, I use similar methods to bound changes in absolute risk aversion. Specifically, given a change in risk behavior corresponding to a given change in wealth, this paper derives the necessary minimum change in relative curvature of a utility function. Thus, using revealed preference data, it is possible to place a lower bound on the observed decrease in absolute risk aversion. If this change in risk aversion is unreasonable given the size of wealth change (for example, if the change requires that the individual become risk loving after wealth has increased) then we might reasonably exclude DARA as the primary cause of the change in behavior. To give some context to the importance of wealth effects in risky choice, I apply this calibration method to the recent debate surrounding World Trade Organization (WTO) allowance of decoupled payments, and their possible distortionary effect on trade. In this paper I focus primarily on production risk because of the many contemporary examples employing wealth–risk relationships in production. All of the results derived herein can (more easily) be applied to individual risk decisions, as will be discussed. I show that any substantial effects on production are unlikely to be a result of DARA. Any finding of substantial production changes due to decoupled payments are likely due to econometric misspecification, or exclusion of important variables. More generally, Arrow’s hypotheses regarding wealth appear to be of less importance than other potential drivers of risk behavior. 2. Literature review From a normative point of view, expected utility theory is appealing because it is built on three simple and hard to dispute rational axioms (requiring, for example, transitive preferences). These axioms imply that individuals behave as if maximizing the expectation of a utility of wealth function. Friedman and Savage (1948) were the first to apply expected utility theory to explain economic decisions. Fundamental to Friedman and Savage’s explanation of risk taking behavior is the notion that the utility of wealth function changes its degree of concavity (or convexity) dramatically as one moves from a situation with a low level of wealth to a situation in which they have large amounts of wealth. In particular, they suppose that utility of wealth functions must display risk-loving behavior (convexity) for intermediate amounts of wealth and risk-averse behavior (concavity) for large or small amounts of wealth. 
A utility function of this shape would explain why poor individuals may buy insurance and lottery tickets simultaneously, while wealthier individuals would reject lottery tickets. Later, Arrow addressed the relationship between wealth and risk behavior, taking the mathematical theory to its limits. Much of the simplicity and usefulness of expected utility theory in practical applications stems from the very simple measures of absolute and relative risk aversion developed by Pratt (1964) and Arrow (1971). Absolute risk aversion, or RA = −U′′(w)/U′(w), where w is wealth and U represents utility as a function of wealth, is a measure
Table 1
Studies on risk preference structure (updated from Saha et al. (1994)).a

Study                          Year   Main conclusions
Cohn et al.                    1975   Strong evidence for DRRA
Landskroner                    1977   Cannot reject CRRA
Lins et al.                    1981   DARA; IARA, CRRA and DRRA evident from different farm type
Siegel & Hoban                 1982   IRRA for less wealthy, DRRA for more wealthy
Morin & Suarez                 1983   IRRA for lower net worth households, DRRA for intermediate net worth households, weak DRRA (approx. CRRA) for wealthiest households
Bellante & Saba                1986   DRRA
Chavas & Holt                  1990   Reject CARA in favor of DARA
Pope & Just                    1991   Reject CARA; CRRA not rejected
Saha, Shumway & Talpaz         1994   DARA and IRRA
Chavas & Holt                  1996   DARA
Bar-Shira, Just & Zilberman    1997   DARA and IRRA

a This table focuses primarily on agricultural production studies.
of an individual’s willingness to take on risk. However, RA cannot be compared across currencies unless the proper adjustments are made. Relative risk aversion, RR = −U ′′ (w) w/U ′ (w), was introduced as a unit-independent measure of risk aversion, allowing much more general comparisons. Relative risk aversion measures willingness to take on risk as a proportion of wealth. Arrow hypothesized that utility of wealth functions would demonstrate
\[
-\,\frac{U'''(w)\,U'(w) - \big[U''(w)\big]^{2}}{\big[U'(w)\big]^{2}} < 0 \quad \text{(Decreasing absolute risk aversion)}, \tag{1}
\]
and
\[
-\,\frac{\big[U'''(w)\,w + U''(w)\big]U'(w) - \big[U''(w)\big]^{2} w}{\big[U'(w)\big]^{2}} > 0 \quad \text{(Increasing relative risk aversion)}. \tag{2}
\]
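As a concrete, purely illustrative check of the signs in (1) and (2), the short sketch below evaluates them numerically for an expo-power utility of the form discussed later in the paper (Saha, 1993), U(w) = θ − exp(−βw^α). The parameter values are assumptions chosen only so the example runs; they are not estimates from any study cited here.

```python
# Rough numerical check of conditions (1) and (2) for an expo-power utility
# U(w) = theta - exp(-beta * w**alpha) (Saha, 1993). Parameter values below
# are purely illustrative assumptions.
import sympy as sp

w = sp.symbols('w', positive=True)
alpha, beta, theta = sp.Rational(1, 2), 2, 1   # assumed parameters

U = theta - sp.exp(-beta * w**alpha)
RA = -sp.diff(U, w, 2) / sp.diff(U, w)   # absolute risk aversion, -U''/U'
RR = w * RA                              # relative risk aversion, -U''w/U'

dRA = sp.diff(RA, w)   # condition (1): should be negative (DARA)
dRR = sp.diff(RR, w)   # condition (2): should be positive (IRRA)

for wealth in (1, 5, 10, 50):
    print(wealth, float(dRA.subs(w, wealth)), float(dRR.subs(w, wealth)))
```

For this functional form the derivative of absolute risk aversion is negative and the derivative of relative risk aversion is positive at every wealth level checked, consistent with Arrow's two hypotheses.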
The intuition behind these relationships is simple. Individuals should be more willing to take any particular risk as their wealth increases and less willing to risk any percentage of their wealth. These two hypotheses, while simple, have fueled many applied studies generally finding support (though some find counter evidence). The studies examine both production decisions facing profit risk (e.g. Saha et al. (1994), Chavas and Holt (1990), Pope and Just (1991), Lins et al. (1981), Chavas and Holt (1996), Bar-Shira et al. (1997)) and more conventional individual decisions (e.g. Cohn et al. (1975), Landskroner (1977), Morin and Suarez (1983), Siegel and Hoban (1982), Bellante and Saba (1986)). Table 1 summarizes the major results of this literature (updated from Table 1 of Saha et al. (1994)). While there is some evidence for DARA, the wealth effects on decisions (such as production) are generally found to be small and barely detectable. Indeed, if risk aversion is a secondary behavioral effect that is considered to be small, DARA can only have tertiary effects that must also be small in nature. The impact of wealth on production behavior in general is minimal (Chavas and Holt (1990) estimate a planting elasticity of wealth of 0.087 for corn). Estimations of risk aversion parameters in production generally fall under two procedural regimes (Holt and Chavas, 2002)— reduced form methods and structural estimation. Both of these methods have been used to estimate the relationship of wealth to risk preferences (e.g. Pope and Just (1991) use a reduced form and Saha et al. (1994) use structural estimation), though both methods present significant drawbacks. Holt and Chavas (2002, p. 219) point out that estimation of risk preferences ‘‘is inextricably linked to the
framework used to specify and estimate the empirical distribution of risk’’. As such, a researcher wishing to test for the relationship between risk and wealth must not only specify a utility functional form (or reduced form), but also the production technology and the distribution of prices or yields used in decision-making. The chosen functional forms may severely affect the risk preference estimates (Just and Just, this issue). For example, it may be difficult (or impossible) to identify risk behavioral parameters without strict assumptions regarding the relationship of input variables to profit risk. Goodwin (2003) notes that much of the current research regarding production risk treats a single decision (or set of decisions) in isolation—ignoring other important behaviors that could explain the risk response (e.g., the holding of significant non-production related assets). Just and Pope (2003) argue that ‘‘alternative explanations can be offered for seemingly risk responsive behavior, only one of which is curvature of the utility function’’. They illustrate how individual responses to exogenous restrictions on input choices (which may relate to wealth) could affect behavior in a way that is similar to the curvature of a utility function. Without exploring other potential explanations we cannot draw conclusions about the shape of the utility function and aversion to risk. For this and other reasons, there is ample reason to doubt the veracity of the behavioral risk estimates obtained from econometric estimation. Indeed, Just and Peterson (2003a, forthcoming) show that estimation results may not be possible to reconcile with the majority of data used to estimate them. Their work builds on Rabin’s (2000) results, showing that risk-averse responses to small risks imply a severely concave utility function, and often imply non-monotonic preferences for money.

3. A calibration tool

In this section I use the standard production risk model to derive a new calibration tool. The purpose of this tool is to determine if a pair of revealed preference decisions could be reasonably due to DARA. In particular, I assume that we are able to observe a production choice both before and after a change in wealth. Then, it is possible to find the minimum decline in absolute risk aversion that would be necessary to describe the change in behavior. This decline together with the change in wealth can be used to construct a lower bound on the level of DARA. If the implied level of DARA falls outside the realm of plausible behavior, then it is unlikely that DARA could be the sole reason for the change in behavior after the change in wealth. To derive the calibration tool, suppose that an individual faces a production problem similar to that proposed by Sandmo (1971). It can be shown that the same tools apply to the more common portfolio investment problem, and are actually much simpler to derive because they lack an arbitrary cost function.1 Suppose that the individual solves
\[
\max_{y}\; EU\big(p \cdot y - c(\bar y)\big) \tag{3}
\]
where p is a random vector of output prices, y is a vector of outputs, c (·) represents the cost of production as a function of planned output, y¯ , and EU (·) is the expected utility function. For simplicity, suppose that y = y¯ +ε where E(ε ) = 0. Because the price is generally dependant on the aggregate production levels, there may be a stochastic relationship between ε and p. For example, the price and production of the same good may be negatively
1 Additionally, production problems implicitly assume that the decision-maker faces an individual specific risk (i.e. yield risk) that cannot be traded or shared in the same way investment risks can, representing an important market failure. The market nature of risk in portfolio analysis may complicate the estimation techniques required to represent revealed choices, but will not complicate the calibration tool itself.
27
correlated, while prices for related goods may have more complex relationships. Because production shocks are usually isolated to small areas, these correlations are likely to be small. For example, Roberts and Key (2002) find that while neighboring counties in Kansas may display production shocks that are highly correlated, counties that are further apart (though within the same state) display no significant correlation. We are interested in the change in behavior that occurs when the individual is given a lump-sum payment of D and solves
\[
\max_{y}\; EU\big(p \cdot y - c(\bar y) + D\big). \tag{4}
\]
Assuming decreasing absolute risk aversion, Sandmo shows that individuals will increase their production in the case of a single output, and alter their behavior significantly if multiple outputs are produced, in the case of price risk without production risk.2 Several have claimed to detect such an effect in general estimation (see Table 1).3 In solving (4), the first-order conditions are given by
\[
\iint_{p,\,y} U'\big(p \cdot y - c(\bar y) + D\big)\big(p_i - c_i(\bar y)\big)\, f(p, \varepsilon)\, dp\, d\varepsilon = 0. \tag{5}
\]
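To make the mechanism behind (3)–(5) concrete, the following sketch solves a small Monte Carlo version of the producer's problem under two assumed utility functions: a DARA (log) utility, for which a lump-sum payment D weakly raises optimal planned output, and a CARA utility, for which the optimum is unchanged. The price distribution, cost function, utility forms and payment size are illustrative assumptions, not values estimated or used in the paper.

```python
# Illustrative Monte Carlo version of problems (3)-(4): under an assumed DARA
# utility a lump-sum payment D raises the optimal planned output (slightly),
# while under CARA it does not. All numbers below are made-up assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
p = rng.normal(1.0, 0.3, 50_000)          # random output price draws
cost = lambda y: 0.5 * y ** 2             # convex cost of planned output

def optimal_output(u, D):
    """Planned output maximizing E[u(p*y - c(y) + D)] over a bounded interval."""
    objective = lambda y: -np.mean(u(p * y - cost(y) + D))
    return minimize_scalar(objective, bounds=(0.01, 1.5), method='bounded').x

u_dara = lambda w: np.log(w + 2.0)        # DARA (log), shifted so wealth stays positive
u_cara = lambda w: -np.exp(-0.4 * w)      # CARA with a comparable baseline risk aversion

for D in (0.0, 1.0):
    print(f"D={D}: y*_DARA={optimal_output(u_dara, D):.4f}, "
          f"y*_CARA={optimal_output(u_cara, D):.4f}")
```

Because the CARA objective is only rescaled by a positive constant when D changes, its maximizer is identical across payments, while the log (DARA) maximizer creeps upward, which is the channel Hennessy (1998) emphasizes.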
From (5) it is clear that the only change in behavior due to a decoupled payment, under expected utility, must be due to the difference in marginal utility over the supports of π = p · y − c(ȳ) and π_D = p · y_D − c(ȳ_D) + D, or more precisely, the difference in the relative changes in marginal utility with the addition of D over the two supports for p · y. We can find the minimum level of change in concavity (or utility jerk) over the support that justifies a particular change in output. Assuming that the utility function is first-order (but not necessarily second-order) continuously differentiable, the simple one-output case of this problem can be written as
\[
\min_{U}\; -\big[U'(\bar\pi) - U'(\underline{\pi})\big] + \big[U'(\bar\pi_D) - U'(\underline{\pi}_D)\big] \tag{6}
\]
such that
\[
\int_{\underline{\pi}}^{\bar\pi} U'(\pi)\big(p - c'(\bar y)\big) f(\pi)\, d\pi = 0, \tag{7}
\]
\[
\int_{\underline{\pi}}^{\bar\pi} U'(\pi + D)\big(p - c'(\bar y_D)\big) f(\pi)\, d\pi = 0, \tag{8}
\]
\[
U'(x + \varepsilon) \le U'(x), \quad \forall \varepsilon > 0, \tag{9}
\]
and
\[
\int_{\underline{\pi}}^{\bar\pi_D} U'(\pi)\, d\pi = \bar\pi_D - \underline{\pi}. \tag{10}
\]
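Once profit is discretized on a grid, (6)–(10) become a linear program in the values of U′ at the grid points, which is one practical way to compute the minimum change in concavity. The sketch below sets up such a discretized version under assumed primitives (a normal price distribution, a quadratic cost function, an arbitrary pair of observed outputs, and a payment D). It is a schematic illustration rather than the paper's implementation; if the assumed choice pair cannot be rationalized by any nonnegative, nonincreasing marginal utility, the solver simply reports infeasibility.

```python
# Discretized sketch of the RMCC problem (6)-(10) as a linear program in the
# values of U'. The price distribution, cost function, observed output pair
# and payment D are illustrative assumptions, not the paper's data.
import numpy as np
from scipy.optimize import linprog
from scipy.stats import norm

p = np.linspace(0.4, 1.6, 41)                 # assumed price grid
f = norm.pdf(p, loc=1.0, scale=0.2)
f /= f.sum()                                  # discrete price probabilities
cost, mc = lambda y: 0.5 * y**2, lambda y: y  # assumed cost and marginal cost

y0, yD, D = 0.90, 0.91, 0.20      # assumed outputs without/with the payment
pi0 = p * y0 - cost(y0)           # profit support without the payment
piD = p * yD - cost(yD) + D       # profit support with the payment

# Unknowns: U' evaluated on the combined, sorted profit grid.
grid = np.concatenate([pi0, piD])
order = np.argsort(grid)
grid = grid[order]
pos = np.argsort(order)           # pos[k] = position of original point k in grid
pos0, posD = pos[:p.size], pos[p.size:]
n = grid.size

# Objective (6): -[U'(pi_max) - U'(pi_min)] + [U'(piD_max) - U'(piD_min)].
c_obj = np.zeros(n)
c_obj[pos0[-1]] -= 1.0; c_obj[pos0[0]] += 1.0
c_obj[posD[-1]] += 1.0; c_obj[posD[0]] -= 1.0

# Equalities: revealed-preference conditions with and without the payment
# (cf. (7)-(8)) and the normalization (10) via trapezoid weights.
A_eq = np.zeros((3, n))
A_eq[0, pos0] = f * (p - mc(y0))
A_eq[1, posD] = f * (p - mc(yD))
w = np.zeros(n)
w[:-1] += 0.5 * np.diff(grid); w[1:] += 0.5 * np.diff(grid)
A_eq[2] = w
b_eq = [0.0, 0.0, grid[-1] - grid[0]]

# Inequalities (9): U' nonincreasing. Bounds add U' >= 0 (an extra assumption).
A_ub = np.zeros((n - 1, n))
A_ub[np.arange(n - 1), np.arange(1, n)] = 1.0
A_ub[np.arange(n - 1), np.arange(n - 1)] = -1.0

res = linprog(c_obj, A_ub=A_ub, b_ub=np.zeros(n - 1), A_eq=A_eq, b_eq=b_eq,
              bounds=(0, None), method='highs')
print(res.status, res.message)
if res.success:
    print("minimum change in concavity (objective of (6)):", res.fun)
```

The linear structure of this program is what drives Proposition 1 below: the optimum can be supported by a marginal utility made of very few line segments.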
The first bracketed term in the objective function of Eq. (6) is the change in marginal utility over the support of the random variable representing profit. The second term is the same change in profit when receiving the decoupled payment. Thus the objective is simply to minimize the discrete representation of the change in diminishing marginal utility between the two supports of these random variables. Restrictions (7) and (8) force the utility function to imply the observed output levels with and without the payment. Restriction (9) imposes decreasing marginal utility, and (10) is included without loss of generality to ensure a unique solution. Under expected utility theory, utility functions that differ only by a multiplicative constant imply the same behavior. Thus Eq. (10)
2 The point will hold generally with yield risk if the amount of yield risk is weakly increasing in planned production. 3 Traditionally DARA has been tested using a static model. Dynamic issues may also play an important role, but they present a much more challenging identification problem. In particular, the EU model does not explicitly separate issues of intertemporal risk substitution and risk aversion (Epstein and Zin, 1989).
28
D.R. Just / Journal of Econometrics 162 (2011) 25–34
\[
U'_{(\pi^*,\beta_0,\beta_1)}(\pi) =
\begin{cases}
\beta_0 + \beta_1(\pi - \underline{\pi}) & \text{if } \pi < \pi^* \\[6pt]
\beta_0 + \beta_1(\pi^* - \underline{\pi}) + \dfrac{\beta_1(\pi^* - \underline{\pi})(\underline{\pi} + \pi^* - 2\bar\pi_D) + 2(1 - \beta_0)(\bar\pi_D - \underline{\pi})}{(\bar\pi_D - \pi^*)^{2}}\,(\pi - \pi^*) & \text{if } \pi^* < \pi
\end{cases}
\]
where β_0 > 1, β_1 < −2(β_0 − 1)/(π̄ − π̲), ∫_{π̲}^{π̄} U′(π)(p − c′(ȳ)) f(π) dπ = 0, and ∫_{π̲_D}^{π̄_D} U′(π_D)(p − c′(ȳ_D)) f_D(π_D) dπ_D = 0.

Box I.
is required for identification, imposing the necessary scale constraint on the utility function, without imposing any restriction on behavior. I call this problem the Revealed Minimum Change in Concavity (RMCC). It is possible to show that the RMCC takes on a very convenient form.

Proposition 1. Given nontrivial differences in f_D(·) and f(·), the derivative of the solution to the RMCC can be represented as two line segments, with utility function as given in Box I.

Proof. If Proposition 1 were not true, it should be possible to improve upon any utility function of the form of the expressions in Box I by constructing some utility function whose derivative consists of some arbitrary number and length of connected line segments. Let U be the solution to the RMCC (and thus continuously differentiable). Then, let Ũ be any approximation to U such that the derivative of Ũ can be represented as a series of line segments, with at least three distinct line segments of different slope:
\[
\tilde U'(\pi) =
\begin{cases}
\beta_0 + \beta_1(\pi - \underline{\pi}) & \text{if } \underline{\pi} \le \pi < \pi_1^* \\
\qquad\vdots & \qquad\vdots \\
\beta_0 + \sum_{i=1}^{I} \beta_i\big(\pi_i^* - \pi_{i-1}^*\big) + \beta_{I+1}\big(\pi - \pi_I^*\big) & \text{if } \pi_I^* \le \pi.
\end{cases} \tag{11}
\]
We can now show that it is possible to weakly improve upon (11) with a function described by only two line segments. Let Ũ be a function satisfying (6)–(10) with the additional restriction that it is of the form (11). Let R = py, and R̄ ≡ p̄(ȳ + ε̄), where p̄ and ε̄ are the maximum possible values for p and ε, respectively. This problem can be written as
\[
\min_{\beta,\pi^*}\; -U'\big(\bar R - c(\bar y)\mid\{\beta_i,\pi_i^*\}_{i=1}^{I}\big) + U'\big({-c(\bar y)}\mid\{\beta_i,\pi_i^*\}_{i=1}^{I}\big) + U'\big(\bar R_D - c(\bar y_D) + D\mid\{\beta_i,\pi_i^*\}_{i=1}^{I}\big) - U'\big(D - c(\bar y_D)\mid\{\beta_i,\pi_i^*\}_{i=1}^{I}\big) \tag{12}
\]
subject to
\[
\int_0^{\bar R} U'\big(R - c(\bar y)\mid\{\beta_i,\pi_i^*\}_{i=1}^{I}\big)\big(p - c'(\bar y)\big) f(R)\, dR = 0, \tag{13}
\]
\[
\int_0^{\bar R_D} U'\big(R_D - c(\bar y_D) + D\mid\{\beta_i,\pi_i^*\}_{i=1}^{I}\big)\big(p - c'(\bar y_D)\big) f(R_D)\, dR_D = 0, \tag{14}
\]
\[
\int_{\underline{\pi}}^{\bar\pi_D} U'\big(\pi\mid\{\beta_i,\pi_i^*\}_{i=1}^{I}\big)\, d\pi = \bar\pi_D - \underline{\pi}, \tag{15}
\]
and
\[
\beta_{i-1} - \beta_i \ge 0. \tag{16}
\]
Upon inspection, this is a simple linear program. Eqs. (13)–(15) must bind. If each line segment is distinct, then (16) cannot bind for any pair of β_i, β_{i−1}. Thus, the resulting first-order conditions require that
\[
\frac{\partial L}{\partial \beta_i} = -U'_{\beta_i}\big(\bar R - c(\bar y)\mid\cdot\big) + U'_{\beta_i}\big({-c(\bar y)}\mid\cdot\big) + U'_{\beta_i}\big(\bar R_D - c(\bar y_D) + D\mid\cdot\big) - U'_{\beta_i}\big(D - c(\bar y_D)\mid\cdot\big) - \lambda_1 \int_0^{\bar R} U'_{\beta_i}\big(R - c(\bar y)\mid\cdot\big)\big(p - c'(\bar y)\big) f(R)\, dR - \lambda_2 \int_0^{\bar R_D} U'_{\beta_i}\big(R_D - c(\bar y_D) + D\mid\cdot\big)\big(p - c'(\bar y_D)\big) f(R_D)\, dR_D - \lambda_3 \int_{\underline{\pi}}^{\bar\pi_D} U'_{\beta_i}\big(\pi\mid\cdot\big)\, d\pi = 0 \tag{17}
\]
for all β_i, and
\[
\frac{\partial L}{\partial \pi_i^*} = -U'_{\pi_i^*}\big(\bar R - c(\bar y)\mid\cdot\big) + U'_{\pi_i^*}\big({-c(\bar y)}\mid\cdot\big) + U'_{\pi_i^*}\big(\bar R_D - c(\bar y_D) + D\mid\cdot\big) - U'_{\pi_i^*}\big(D - c(\bar y_D)\mid\cdot\big) - \lambda_1 \int_0^{\bar R} U'_{\pi_i^*}\big(R - c(\bar y)\mid\cdot\big)\big(p - c'(\bar y)\big) f(R)\, dR - \lambda_2 \int_0^{\bar R_D} U'_{\pi_i^*}\big(R_D - c(\bar y_D) + D\mid\cdot\big)\big(p - c'(\bar y_D)\big) f(R_D)\, dR_D - \lambda_3 \int_{\underline{\pi}}^{\bar\pi_D} U'_{\pi_i^*}\big(\pi\mid\cdot\big)\, d\pi = 0 \tag{18}
\]
for all π_i^*, where "·" abbreviates the conditioning set {β_i, π_i^*}_{i=1}^{I}, U'_{β_0}(· | {β_i, π_i^*}_{i=1}^{I}) = 1,
\[
U'_{\beta_i}\big(\pi\mid\{\beta_i,\pi_i^*\}_{i=1}^{I}\big) =
\begin{cases}
0 & \text{if } \pi < \pi_{i-1}^* \\
\pi - \pi_{i-1}^* & \text{if } \pi_{i-1}^* \le \pi < \pi_i^* \\
\pi_i^* - \pi_{i-1}^* & \text{if } \pi_i^* \le \pi
\end{cases} \tag{19}
\]
and
\[
U'_{\pi_i^*}\big(\pi\mid\{\beta_i,\pi_i^*\}_{i=1}^{I}\big) =
\begin{cases}
\beta_i - \beta_{i+1} & \text{if } \pi \ge \pi_i^* \\
0 & \text{if } \pi < \pi_i^*.
\end{cases} \tag{20}
\]
No slope variable βj appears in (17). Additionally, βi −βi+1 appears as a multiplicative constant before every non-zero term of (18). Thus, given that βi − βi+1 ̸= 0 (because we assumed that the
inequality constraints did not bind), we can divide all terms of each Eq. (18) by βi − βi+1 , leaving each devoid of any slope variable, βj . If U˜ contains n knots (πi∗ ), the first-order conditions embody n equations of the form (18) and n + 2 of the form (17). However, these equations only contain three unknown LaGrangian multipliers, λi , and n unknown knot values, πi∗ . Barring a set of revealed preference problems producing redundant conditions, the system can only have a solution if 2n + 2 = n + 3, or n = 1. But this is a contradiction. Because we cannot improve on a two-line segment U˜ with any number of line segments, it must be that the two-line segment U˜ is the solution to the RMCC. Finally, redundant revealed preference conditions will occur if the cumulative distribution of returns between sets of knots is linearly dependent between the pair of revealed preference constraints.4 In this case, multiple solutions may exist, including some with n > 1. Proposition 1 allows us to find a simple solution to the RMCC by focusing only on those functions that can be written in the form of the expressions in Box I. While the proposition is simple to extend to a portfolio investment case, it is difficult to extend to more complicated problems. In particular, problems represented by a cost function with multiple outputs, or a profit function with multiple combinations of inputs that produce the same level of expected output, prevent the use of the simple linear form encompassed in the proof above. Rather, because the first-order conditions must now be considered for each input or output decision, the RMCC problem must include multiple revealed preference restrictions for both wealth conditions. In this more general case, the multiple first-order conditions will almost always require a solution to the RMCC with multiple knots, though still piecewise linear in nature. The use of a cost function with a single variable is not a necessary assumption, but one that simplifies the problem tremendously. By examining the minimum change in concavity given a pair of revealed choices, one can make judgments about the plausibility of the DARA explanation for those choices. In the following section I present a detailed example of this procedure and how it may be used to draw conclusions regarding risk aversion and behavior in a policy setting. 4. Decoupled payments and risk aversion To demonstrate the use of the calibration tool derived in this paper, I will examine how production behavior may be affected by subsidy payments not directly linked to production. Traditional production subsidies have been awarded on a per unit basis, essentially increasing the output price and driving up production. Many countries, including the US, now employ decoupled subsidies, or transfer payments that are not based on the units of production. The WTO has allowed such subsidies based on the notion that they will have minimal impacts on production and, thus, be minimally trade distorting. The 1996 Federal Agriculture Improvement and Reform (FAIR) Act was intended as a transition away from market intervention by the US government. The FAIR act introduced production flexibility contracts (PFCs), which were continued under the 2002 Farm Bill under the title Fixed Direct Payments (FDPs). FDPs are determined based on the historical production for either the years 1998 to 2001, or the acreage used under the PFCs based on the years prior to 1996. 
Basing payments on historical production is intended to provide support for farm producers, without encouraging greater
4 This is an unlikely occurrence when the choices are distinct and the risk distributions are non-trivial. This would occur if, for example, the distribution of wealth was partially discrete, so that shifting the distribution to the right would not change the probability distribution between knots.
incentives to produce. These payments have often been referred to as decoupled, though there is serious debate about their influence on production. For example, if farmers expect the opportunity to update their base acreage after every new farm bill, these payments may still encourage production. The 2002 farm bill also introduced a counter-cyclical payment (CCP) providing support in the event that the market price for a crop falls below a certain target. The amount of the payment is based on historical production and the market price, providing a type of insurance against poor market conditions. In this way, the CCP may affect production, by ensuring against some portion of the price risk inherent in production. The CCPs are closely related to loan deficiency payments (LDPs), which allow farmers to take out loans using crop as collateral. If the market price falls below the loan rate, the farmer can thus default on the loan, in essence selling their crop to the government for the loan rate. The LDP is considered to be fully coupled because it is based on actual production and on the market price. Thus, the LDP provides a direct incentive to produce more. If producers display DARA, Sandmo (1971) has shown that increasing wealth will increase production. This will happen because producers become more willing to take on risk. The argument has been made that decoupled payments should lead to increased production as a result of DARA (Chau and de Gorter, 2000; Hennessy, 1998).5 Much debate has centered around how large these effects could be, and whether the risk effects of decoupled payments warrant the attention of the WTO (see De Gorter et al. (2008) for some alternative arguments against seemingly decoupled payments). Young and Westcott (2000) conclude that the impact of decoupled payments must be small if the payments are small. According to Makki et al. (2004) PFC payments make up about 3% of farmer wealth (calculated as net assets) in the US. However, this number was calculated by simply dividing aggregate payments by aggregate net assets, and thus ignores potential variability in both. In the absence of individual data that would be necessary to calculate reasonable bounds on decoupled payments, it is difficult to assess how important the individual variability may be. However, using the Agricultural Resource Management Survey summary statistics of assets, income and total government payments by region and revenue class (available from USDA), it is possible to find some benchmarks for the amount of decoupled payments. The summary statistics only report total government payments, and thus place an upper bound on the potential decoupled payments. The highest aggregate average government payments as a percent of net assets by region and farm revenue is found among farms in the plains with revenues between $ 500,000 and $ 1,000,000 at 5.18%. The amount of decoupled payments would necessarily be smaller than this. The highest aggregate average government payments as a percentage of total income is also found in the plains among farms with revenues between $ 250,000 and $ 500,000 at 43.6%.6 Thus, while payments may generate a substantial portion of income, they are generally a very minor portion of wealth. Roberts (2004) suggests that any change in risk aversion due to decoupled payments must be small. Empirically, Goodwin and Mishra (2006), using a reduced form approach, find decoupled
5 Some also argue that decoupled payments provide a certain source of income and could thus result in increased production at the same level of risk aversion. This potential effect should also be captured by later analysis. 6 Assuming that payments and wealth are uncorrelated (a dubious though functional assumption) and that both are distributed normally, the 99th percentile ratios for any region/revenue group would be 7.0% and 63.9% for wealth and income, respectively.
payments to have a significant, though not a particularly large, effect on production. The authors argue that their method is likely to display a positive bias, leading them to conclude that the production effects of payments may be negligible. Westcott and Young (2000) place the effect at between 180,000 to 570,000 additional acres of planned production. Alternatively, Anton and LeMouel (2002) suggest that the wealth effects of decoupled payments can lead to substantial effects relative to fully coupled subsidies. Using estimates of cost functions and the distribution of profits facing individual farms, Proposition 1 facilitates calibration of the change in concavity of the utility function necessary to induce various production responses to decoupled payments. I follow the estimation techniques of Pope and Just (1996) in using a dual distance function computation of expected output in estimating the cost function and the distribution of profits. Pope and Just propose the estimation of an ex ante cost function, or the cost as a function of planned – rather than realized – production. This method eliminates the substantial bias introduced by ignoring the stochastic nature of production. 4.1. Data and estimation Data are annual observations on US agriculture, aggregated by state, from 1960 to 1999. This is an updated version of the same dataset used by Pope and Just (1996, 1998), which is widely used elsewhere. The data, aggregated at the state level, are constructed by the US Department of Agriculture and are readily available for verification. Ideally, ex ante cost functions and revenue distributions would be estimated using panel data allowing observations of yields for individual farms for several consecutive years that might reasonably be considered draws from identical distributions conditional on inputs. This method would allow for efficient estimation of the idiosyncratic risk faced by an individual farmer. Unfortunately, data do not exist for the primary crops under consideration. To the author’s knowledge, while the method introduced by Pope and Just (1996) is widely considered the best and most widely cited method for estimating cost functions in agricultural production (and the only method that allows for uncertainty) the only applications other than those appearing in Pope and Just (1996, 1998) employ Monte Carlo simulations (for example Moschini (2001)). All other existing cost function estimates fail to account for risk entirely. The data used in this study are derived from the same source, though aggregated at the state level rather than nationally across all crops. Hence, while the cost functions I estimate still face substantial aggregation bias, they can be seen as a significant improvement over all cost function estimates previously appearing in the literature. Given these estimates and some reasonable assumptions regarding the relationship between idiosyncratic and aggregate risk, I can calibrate the RMCC. Further, I conduct sensitivity analysis to show that my calibration is relatively insensitive to the assumptions regarding individual risk. The data consist of four inputs: use of capital machinery, purchased materials, land and labor (both hired and self employed); and an aggregate output variable as well as corresponding prices and rental values.7 These aggregate input and output data were constructed using Tornqvist–Theil indices (see Ball (1985) for a complete description of their construction). The data were mean
7 It would be desirable to account for storage of grains and other mechanisms like hedging that may act somewhat like insurance. Certainly risk and wealth may influence storage behavior. There is no current study allowing both for ex ante risk and storage, etc., in estimation of a cost function. This is due both to a gap in estimation theory and a dearth of applicable data. Developing estimates that would account for such issues is beyond the scope of the current exercise.
scaled for estimation. All estimation was conducted at the state level. The ex ante cost function in (3) as a function of planned output is estimated following Pope and Just to eliminate the bias introduced by ignoring the stochastic nature of production. Specifically, using Diewert’s (1971) homothetic generalized Leontief specification, where c̃ is the cost of production, r_t ∈ R^n_{++} a vector of input prices, and ȳ_t ∈ R_+ the planned level of crop output, the cost function is represented by
\[
\tilde c(r_t, \bar y_t; B) = \phi(\bar y_t)\, r_t^{1/2\,\prime} \beta\, r_t^{1/2} + t\,\alpha' r_t, \tag{21}
\]
where the subscript t represents the time period, B is the set of unknown parameters in φ, β and α, r_t^{1/2} is a vector with elements r_{it}^{1/2}, β is a symmetric matrix with positive parameters, and α a vector of parameters representing technical change.8 As shown by Pope and Just, the resulting input demand function is
\[
x_t = \frac{1}{\hat\mu_t}\, \hat r_t^{-1/2} \beta\, r_t^{1/2} + t\,\alpha + u_t, \tag{22}
\]
where μ̂_t is the maximal root of x̂_t′ r̂_t^{−1/2} β r̂_t^{−1/2} x̂_t + t α̂ x̂_t, r̂_t^{1/2} is a diagonal matrix with diagonal element r_{it}^{1/2}, α̂ is a diagonal matrix with elements corresponding to the values of α, and u_t
T
(23) (1 + ln 2π ) − ln |Ω | , 2 ∑ T 1 where Ω = t =1 ut ut . This estimation was conducted T L=−
2
individually for each state. A summary of the estimates can be found in Table 2. Notably, there are several outlying estimates, which can be observed where the mean is larger than the 75th percentile estimates of some parameters. This is particularly true of the off-diagonal elements of β . This reflects a substantial variation in production between states, which can also be observed in the high standard deviation of the state level parameter estimates. The majority of state level estimates and standard errors (not reported) look very similar to those found by Pope and Just (1996), as the data are derived from the same source (though for different years). Summary statistics are also reported for the corn-belt states (Illinois, Indiana, Iowa, Missouri, Ohio) and the Southeast and Delta region states (Arkansas, Louisiana, Mississippi, North Carolina, South Carolina, Georgia, Alabama, Florida).9 The corn-belt and southern regions receive the largest average direct payments per crop acre, and thus are of particular interest in this exercise. The state level estimates look very similar within region, with the notable exception of Iowa. The data for Iowa yield estimates for the parameters α that are large and negative (hence the negative mean values for corn-belt states). This appears to be due primarily to a differing pattern of changes in land values relative to other inputs within Iowa. Despite this difference, as the following simulations illustrate, the required risk behaviors fall within the same bounds and patterns as other states analyzed.
8 Estimating the cost function without simultaneously estimating the utility function may introduce some bias, as noted by Pope and Just (1998). Including the utility function would dampen the curvature of the cost function and increase the amount of change in curvature needed to explain the observed choices. Thus, my estimates produce a more conservative minimum change in concavity estimates, with the additional advantage of relating more directly to cost curve estimates in the current literature. 9 I classify states using the National Agricultural Statistical Service production region definitions.
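As a small consistency check on the generalized Leontief specification in (21), the sketch below verifies numerically that differentiating the cost function with respect to input prices (Shephard's lemma) yields input demands of the form underlying (22). The parameter values and the choice φ(ȳ) = ȳ are assumptions made only so the code runs; they are not the estimates reported in Table 2.

```python
# Numerical Shephard's-lemma check for a generalized Leontief cost function of
# the form in (21): x_i = dc/dr_i = phi(ybar) * (beta r^{1/2})_i / r_i^{1/2} + t*alpha_i.
# Parameter values and phi(.) are made-up assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 4                                   # four inputs, as in the data description
A = rng.uniform(0.1, 1.0, (n, n))
beta = 0.5 * (A + A.T)                  # symmetric matrix with positive entries
alpha = rng.uniform(-0.5, 0.5, n)       # technical-change parameters
phi = lambda ybar: ybar                 # assumed form for phi(.)

def cost(r, ybar, t):
    s = np.sqrt(r)
    return phi(ybar) * s @ beta @ s + t * alpha @ r

def input_demand(r, ybar, t):
    s = np.sqrt(r)
    return phi(ybar) * (beta @ s) / s + t * alpha

r, ybar, t = np.array([1.0, 2.0, 0.5, 1.5]), 1.2, 3.0
x_analytic = input_demand(r, ybar, t)

eps = 1e-6                              # central finite-difference gradient of cost(.)
x_numeric = np.array([
    (cost(r + eps * np.eye(n)[i], ybar, t) - cost(r - eps * np.eye(n)[i], ybar, t)) / (2 * eps)
    for i in range(n)
])
print(np.max(np.abs(x_analytic - x_numeric)))   # should be tiny (around 1e-7 or less)
```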
Table 2 Description of state level cost function estimates. Parameters
25th percentile
50th percentile all states
75th percentile
Mean all states
Standard deviation (across states)a
50th percentile corn belt
Mean corn belt
50th percentile Southeast-Delta
Mean Southeast-Delta
β11 β12 β13 β14 β22 β23 β24 β33 β34 α1 α2 α3 α4 α5
< 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001 < 0.0001
< 0.0001
0.0004 0.0010 0.0004 0.0008 1.1124 0.0003 0.0003 1.6687 0.0002 1.0460 9.1585 1.1110 1.7309 0.7806
0.0003 0.0023 0.0033 0.0103 3.1961 0.0004 0.0012 2.6080 0.0008 5.1803 8.8165 −1.6421 −0.7136 −3.4414
0.0007 0.0060 0.0120 0.0536 11.6024 0.0011 0.0064 7.4507 0.0023 16.9479 12.2475 7.5934 5.9299 13.8206
< 0.0001 < 0.0001 < 0.0001 < 0.0001
< 0.0001
0.0003 0.0002 0.0001 0.0002 < 0.0001 < 0.0001 < 0.0001 < 0.0001 0.0001 < 0.0001 8.3797 0.4020 1.1868 −2.5468
0.0008 0.0007 0.0007 0.0012 0.0354 0.0004 0.0004 0.0620 0.0007 0.1035 8.7715 0.2179 1.6569 −2.2562
a
2.9203 −1.0359 −0.9531 −2.1304
0.0001 0.0001 0.0002 0.1301 < 0.0001 < 0.0001 0.0902 < 0.0001 0.1186 4.5106 0.2171 0.5221 −0.2377
0.4382
< 0.0001 < 0.0001 1.4774
< 0.0001 0.2951 3.3778 0.1989 0.6720 0.1764
0.0060 0.0030 0.0733 0.4558 0.0013 0.0089 1.2571 0.0013 0.6823 12.2311 −4.1339 −3.5588 −18.0369
These are not estimated standard errors, rather these are the standard deviations of parameter estimates resulting from each individual state estimation.
4.2. Simulating the response to decoupled payments The object of this simulation is to determine what levels of response to decoupled payments are plausible. Using the econometric estimates described above, in addition to estimates of the distribution of revenues, and arbitrary values for the subsidy and change in production, we can determine the change in concavity of the utility function that must take place. Clearly, the level of risk evidenced in state level aggregate production will be far lower than that found at the individual level. To allow for this difference in simulation, I appeal to existing estimates relating individual production risk to aggregate risk.10 For the purposes of simulation, the revenues were assumed to be normally distributed.11 The cost structure itself is a deterministic function of planned production, with all uncertainty being due to variation in revenue (as described in Section 3). A time trend was removed from the price, subtracting the estimated price resulting from an ordinary least squares regression of price on a constant and year.12 This de-trended price was used to calculate revenue for each period. The mean and variance of revenue were then calculated for each state. It has been noted by Just and Weninger (1999) that the percentage variation in individual farm profits may be as much as 5 times that observed in the aggregate. Additionally, decoupled payments can vary substantially with the size and scope of the production operation. Thus, to examine the impacts of the level of risk and the level of payments, I use several values for each. Because
10 Panel data with repeated observations of farm level production would be ideal for this purpose and could provide much more convincing results. In the absence of the possibility of obtaining such a panel, I attempt to use aggregate data, simulating the individual idiosyncratic risk using relationships that are reasonable given the literature. The following sensitivity analysis shows that my results appear to be generally invariant to the actual level of revenue risk. 11 The estimates can be very dependent on the distribution chosen, as this will affect the curvature of the required functions. Given the relatively wide dispersion in this case, the choice of distribution can alter the required change in concavity significantly, though not by orders of magnitude. For example, conducting this exercise with a Gamma distribution results in changes between 300% and 600%, still well outside of what we might consider the reasonable range. The wider dispersion of results derives from the heavy dependence of the skewness of the distribution on mean and variance estimates. 12 If production shocks are not independent across states within a given year, it may be possible to improve the estimation using a seemingly unrelated regression technique to estimate jointly. As argued earlier, I expect these correlations to be relatively small. Alternatively, prices may be much more correlated across states. However, given the index nature of the data, determining the appropriate structure for joint estimation of price volatility would be a difficult exercise and may distract from the primary point of this analysis. Given the robustness of the results, it is unlikely that marginal changes in price distributions would have any discernable impact.
the variance faced by a farmer may be substantially greater than that of the aggregate, I use values for the standard deviation of revenue equal to 1, 2 and 10 times the estimated variation. Decoupled payments make up about 3% of farmer wealth in the US (Makki et al., 2004). Payments for corn were originally capped at 10% of the price, which would increase profits by about 35%. However, larger production responses are more likely with larger payments. Hence I simulate payments that are equal to 15, 30 and 50% of the average profits in the absence of payments.13 Finally, I examine the possibility of decoupled payments increasing planned production by 1, 5 and 10%. The median results for each simulated scenario are reported in Table 3 as well as the medians for each production region. Notably, all changes of 1% or more in production resulting from any change in profit of 50% or less require a decrease in absolute risk aversion of at least 452% (calculated using the standard arcpercentage change formula). Each of these changes in production requires the producer to be risk averse without the payment, but risk loving with the payment. This requirement would seem implausible considering the substantial evidence of risk aversion found in the literature. The individual region results are virtually identical, illustrating the low variability in the simulation results relative to the magnitude of the required change in risk aversion. This should not be too surprising given the primary drivers of the required change are the change in scale of production risk and the change in wealth—both of which are arbitrarily chosen for the purpose of this simulation. Alternatively, the convexity of the cost function could influence the required change. This result suggests that (i) DARA does not drive production behavior, (ii) there is some fundamental problem with expected utility as a positive model of behavior, or (iii) the cost function model is misspecified. While I cannot rule out (iii), prior experiments using several other existing cost function estimates in the literature produce similar results. Thus any misspecification problem is common among agricultural production studies. In order to show that these results are not skewed by outliers, it may be useful to view a histogram of the percentage change in absolute risk aversion resulting from the RMCC for each state. Fig. 1 displays this histogram for the case of a 1% change in production, a payment totaling 50% of profits, and variation equal
13 There is a long and storied debate among risk economists about what constitutes wealth in a utility function setting. Meyer and Meyer (2005, 2006) discuss this literature thoroughly. Only wealth that is allocable to the risk should be considered in the utility of wealth functions. I assume that it is impossible to allocate future land rents to production within this framework (except to sell one’s land). Hence, the appropriate frame is current year profit.
Table 3
Median minimum percentage change in absolute risk aversion.

Percentage change in planned production    Payments equal 15% of profits    Payments equal 30% of profits    Payments equal 50% of profits

Standard deviation of revenue equals 1 times the aggregate
10% increase
  All states          480.24%    480.08%    479.86%
  Corn belt           480.23%    480.07%    479.85%
  Southeast-Delta     480.24%    480.08%    479.86%
  Lake states         480.23%    480.06%    479.84%
  NE-Appalachia       480.30%    480.15%    479.94%
  Mountain-Pacific    480.24%    480.08%    479.86%
5% increase
  All states          473.07%    472.37%    471.45%
  Corn belt           473.06%    472.37%    471.45%
  Southeast-Delta     473.07%    472.38%    471.46%
  Lake states         473.05%    472.35%    471.43%
  NE-Appalachia       473.13%    472.45%    471.55%
  Mountain-Pacific    473.06%    472.37%    471.45%
1% increase
  All states          465.31%    459.47%    452.16%
  Corn belt           465.30%    459.45%    452.13%
  Southeast-Delta     465.31%    459.48%    452.17%
  Lake states         465.28%    459.43%    452.11%
  NE-Appalachia       465.41%    459.64%    452.41%
  Mountain-Pacific    465.29%    459.47%    452.16%

Standard deviation of revenue equals 2 times the aggregate
10% increase
  All states          480.23%    480.07%    479.85%
  Corn belt           480.23%    480.06%    479.85%
  Southeast-Delta     480.23%    480.07%    479.85%
  Lake states         480.22%    480.06%    479.84%
  NE-Appalachia       480.26%    480.10%    479.89%
  Mountain-Pacific    480.22%    480.06%    479.85%
5% increase
  All states          473.05%    472.36%    471.44%
  Corn belt           473.06%    472.36%    471.44%
  Southeast-Delta     473.06%    472.36%    471.44%
  Lake states         473.04%    472.34%    471.42%
  NE-Appalachia       473.07%    472.38%    471.48%
  Mountain-Pacific    473.04%    472.35%    471.43%
1% increase
  All states          465.29%    459.44%    452.12%
  Corn belt           465.29%    459.44%    452.11%
  Southeast-Delta     465.30%    459.44%    452.13%
  Lake states         465.26%    459.41%    452.09%
  NE-Appalachia       465.31%    459.51%    452.23%
  Mountain-Pacific    465.26%    459.42%    452.12%

Standard deviation of revenue equals 10 times the aggregate
10% increase
  All states          480.21%    480.05%    479.83%
  Corn belt           480.22%    480.06%    479.84%
  Southeast-Delta     480.22%    480.06%    479.84%
  Lake states         480.21%    480.05%    479.83%
  NE-Appalachia       480.21%    480.05%    479.83%
  Mountain-Pacific    480.21%    480.05%    479.83%
5% increase
  All states          473.03%    472.33%    471.41%
  Corn belt           473.05%    472.35%    471.43%
  Southeast-Delta     473.06%    472.36%    471.44%
  Lake states         473.03%    472.34%    471.41%
  NE-Appalachia       473.01%    472.32%    471.40%
  Mountain-Pacific    473.01%    472.32%    471.40%
1% increase
  All states          465.25%    459.39%    452.07%
  Corn belt           465.28%    459.42%    452.09%
  Southeast-Delta     465.29%    459.43%    452.10%
  Lake states         465.25%    459.39%    452.07%
  NE-Appalachia       465.21%    459.37%    452.07%
  Mountain-Pacific    465.22%    459.38%    452.06%
to 10 times the aggregate variance. Note that only two outliers appear to the right, and have no real effect on the resulting median. The two outliers found in all three figures are Rhode Island and New Hampshire—states characterized by much smaller farms that receive fewer direct payments. The resulting outliers reflect estimated cost curves that are highly convex relative to all other states in the sample.
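The percentage changes in Table 3 are described only as arc percentage changes. If the standard midpoint ("arc") formula is what is meant — an assumption here — then a decline larger than 200% is only possible when the measure of absolute risk aversion changes sign, which is why the calibrated producers must switch from risk averse to risk loving. A minimal sketch with made-up numbers:

```python
# Midpoint ("arc") percentage change: the absolute change divided by the
# average of the before and after levels. Whether this is exactly the formula
# used in the paper is an assumption; the inputs below are made up.
def arc_pct_change(r_before, r_after):
    return abs(r_before - r_after) / abs((r_before + r_after) / 2.0) * 100.0

print(arc_pct_change(0.04, 0.02))      # 66.7: a same-sign decline
print(arc_pct_change(0.04, -0.0155))   # ~453: declines this large require a sign flip
```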
It is also important to determine the relationship between the minimum change in concavity and the level of production. If the producers who are most likely to change their production by a significant percentage are the largest producers, then the effects of DARA may still be a policy issue. Figs. 2 and 3 display the estimated percentage decline in average risk aversion against total production and average farm size by state, respectively.
Fig. 1. Distribution of the minimum percentage changes in absolute risk aversion by state resulting from a 1% increase in production, decoupled payment equal to 50% of profit, and variation equal to 10 times the aggregate.

Both figures make clear that while there appears to be a negative relationship between the necessary DARA and production, the slope is relatively shallow for larger production. Thus, it seems highly unlikely that DARA would be an interesting point of policy in the debate over production subsidies in trade. Alternatively, if large changes in risk behavior resulting from subsidies are found, it could be an indication of a poor choice of behavioral model.

5. Conclusion
This paper introduces a new tool for calibrating the effects of DARA in revealed choice problems. While DARA intuitively seems like an important economic effect, it may not be very important in many current applications. In particular, the size of wealth transfer necessary to induce substantial changes in risk aversion – according to the expected utility model – must be extremely large to make a substantial difference. This provides some justification for the use of constant absolute risk aversion models as an approximation of risk behavior in many settings. If we observe changes in risk behavior over even the substantial changes in wealth used in this simulation, it could be an indicator that expected utility maximization is an inappropriate model or that the model is misspecified. Several alternative models have been proposed, but are seldom employed in policy applications. One example, Kahneman and Tversky’s (1979) prospect theory, would also seem an inappropriate choice, as increasing production at any level appears to imply risk-loving behavior despite increasing wealth above a reference point. Econometric estimation of models of risk aversion is unable to expose large calibration errors, due to the restrictive utility function specifications most often found in the literature. For example, Saha et al. (1994) estimate seemingly reasonable risk aversion parameters imposing the expo-power utility function. Just and Peterson (2003a) show that, using Saha, Shumway and Talpaz estimates of the potential choices, the average choices could
Fig. 2. Minimum percentage change in absolute risk aversion and total output by state.
Fig. 3. Minimum percentage change in absolute risk aversion and average farm size by state.
only be consistent with a negatively sloped utility function. This problem was hidden by the structure of the expo-power utility function, and an estimation approach which assumes that all behavior that does not maximize expected returns must be due to concavity of a utility function. Similarly, approaching policy issues regarding wealth and risk aversion by imposing a structural utility function, and assuming that DARA is the only driver of heterogeneous behavior across wealth levels, will sweep calibration problems under the rug. The tools outlined in this paper can overcome these problems, narrowing our search for behavioral explanations. References Anton, J., LeMouel, C., 2002. Risk effects of crop support measures. Presented at the European Association of Agricultural Economists, Zaragoza, Spain. Arrow, K.J., 1971. Essays on the Theory of Risk Bearing. Markham Publishing Co, Chicago, Chapter 3, pp. 90–133. Ball, V.E., 1985. Output, input, and productivity measurement. American Journal of Agricultural Economics 67, 475–486. Bar-Shira, Z., Just, R.E., Zilberman, D., 1997. Estimation of farmers’ risk attitude: An econometric approach. Agricultural Economics 17, 211–222. Bellante, D., Saba, R.P., 1986. Human capital and life-cycle effects on risk aversion. Journal of Financial Research 9, 41–51. Chau, N.H., de Gorter, H., 2000. Disentangling the production and export consequences of direct farm income payments. Paper presented at the AAEA Meetings, Tampa, Florida. Chavas, J.-P., Holt, M.T., 1990. Acreage decisions under risk: The case of corn and soybeans. American Journal of Agricultural Economics 72, 529–538. Chavas, J.-P., Holt, M.T., 1996. Economic behavior under uncertainty: A joint analysis of risk preferences and technology. Review of Economics and Statistics 78, 329–335. Cohn, R.A., Lewellen, W.G., Lease, R.C., Schlarbaum, G.S., 1975. Individual investor risk aversion and investment portfolio composition. Journal of Finance 30, 605–620. De Gorter, H., Just, D.R., Kropp, J.D., 2008. Cross-subsidization due to infra-marginal support in agriculture: A general theory and empirical evidence. American Journal of Agricultural Economics 90, 42–54. Diewert, W.E., 1971. An application of the Shephard duality theorem: A generalized Leontief production function. Journal of Political Economy 79, 481–507. Epstein, L.G., Zin, S.E., 1989. Substitution, risk aversion, and the temporal behavior of consumption and asset returns: A theoretical framework. Econometrica 57, 937–969. Friedman, Milton, Savage, Leonard J., 1948. The utility analysis of choices involving risk. Journal of Political Economy 56, 279–304. Goodwin, B.K., 2003. Does risk matter? Discussion. American Journal of Agricultural Economics 85, 1257–1258. Goodwin, B.K., Mishra, A.K., 2006. Are ‘Decoupled’ farm program payments really decoupled? An empirical evaluation. American Journal of Agricultural Economics 88, 73–89. Hennessy, D.A., 1998. The production effects of agricultural income support policies under uncertainty. American Journal of Agricultural Economics 80, 46–57. Holt, M.T., Chavas, J.-P., 2002. The Econometrics of Risk. In: Just, R.E., Pope, R.D. (Eds.), A Comprehensive Assessment of the Role of Risk in US Agriculture. Kluwer Academic Publishers, Boston, pp. 213–242. Just, D.R., Peterson, H.H., 2003a. Diminishing marginal utility of wealth and calibration of risk in agriculture. American Journal of Agricultural Economics 85, 1234–1241. Just, D.R., Peterson, H.H., Is expected utility theory applicable? 
Journal of Econometrics 162 (2011) 35–43
Agricultural arbitrage and risk preferences

Rulon D. Pope a,∗, Jeffrey T. LaFrance b,c, Richard E. Just d

a Brigham Young University, United States
b Washington State University, United States
c University of California, Berkeley, United States
d University of Maryland, United States

∗ Corresponding address: Department of Economics, Brigham Young University, Provo, UT 84602, United States. Tel.: +1 801 422 3178; fax: +1 801 422 0194. E-mail address: [email protected] (R.D. Pope).

Article history: Available online 12 October 2009
JEL classification: G11; Q12
Keywords: Arbitrage; Risk aversion; Agriculture

Abstract: A structural intertemporal model of agricultural asset arbitrage equilibrium is developed and applied to agriculture in the North Central region of the US. The data are consistent with a unifying level of risk aversion. The levels of risk aversion are more plausible than previous estimates for agriculture. However, the standard arbitrage equilibrium is rejected, perhaps because of the particular period studied and its shortness. © 2009 Elsevier B.V. All rights reserved.

1. Introduction

Many studies have applied portfolio theory (Sharpe, 1970) to explain acreage allocation in production agriculture (e.g., Behrman, 1968; Estes et al., 1981; Just, 1974; Lin et al., 1974; Lin, 1977). These applications and many since have been applied primarily to crop acreage decisions assuming linear technology, and most are in a static setting. This literature, which grew out of Nerlovian models of supply response (Nerlove, 1956), is generally based upon adaptive interpretations of risk and has evolved into a more rational risk approach (Holt and Aradhyula, 1998; Saha et al., 1994; Holt and Moschini, 1992). The general finding of this by now large literature is that the allocation of total acreage to specific production activities is significantly influenced by risk as generally modeled with variances and covariances. The most common finding is that an increase in the own variance of price or revenue reduces the acreage allocated to that activity. This is generally interpreted as the impact of risk aversion.1

1 Some authors have modeled production with explicit substitution among inputs along with risk aversion over profit or initial wealth (e.g., Saha et al., 1994; Love and Buccola, 1991; Antle, 1987).

From the perspective of more recent developments in portfolio theory, two general findings beg application in this empirical agricultural risk literature. First, explicit attempts to measure risk aversion structurally, such as those in the equity premium puzzle, are preferred (Mehra and Prescott, 1985). Only this way will researchers be able to distinguish risk aversion from other behaviors. Second, the structural approach provides a way to determine whether estimated risk aversion is credible (Siegel and Thaler, 1997). Thus, we argue that the structural approach is a sensible way to proceed, at least at this stage in the development of the risk literature in agriculture. Specifically, this literature suggests advantages for a more integrative examination of the broader portfolio problem in agriculture that includes consumption, investment, and other risk sharing activities as well as production. Modern agriculture is characterized by much off-farm investment (Mishra and Morehart, 2001). At the very least, reduced form production- or acreage-oriented models may misinterpret the level of risk aversion (Just and Peterson, 2003). Worse, parameters can be biased if relevant variables are omitted. For example, if markets are incomplete, Fisher separation may not hold, implying inconsistent estimation of parameters (Saha et al., 1993). A third issue concerns the advantages and disadvantages of using typical Euler equation representations of intertemporal arbitrage. Euler equations may yield important information from which to identify parameters, but imply that the dynamics must be properly specified (Carroll, 2001). For example, one must choose between the non-expected utility model of Kreps and Porteus (1978) and the standard model of discounting with additive preferences (Laibson, 1997).

After building a dynamic model of consumption, investment, and production, we obtain fundamental arbitrage equations that govern allocations of wealth to financial assets and agricultural
capital as well as the allocation of acreage. This enables econometric choice from a larger set of first-order (arbitrage) conditions in order to estimate risk preferences. We develop general conditions and then adapt them for empirical use with the available data. The crucial variable of interest driving decisions is consumption, which is facilitated by accumulation of net worth (wealth).2 For agricultural households, these both are notoriously difficult to measure.3 After developing the arbitrage conditions, empirical estimates are obtained by generalized method of moments (GMM) for eight states in the North Central region of the US using stock market returns, bonds, and agricultural land allocations.4 For these eight states in the period 1991–2000, reasonably good measurements of wealth are available, which are essential for our approach.5 To the extent that we measure the impact of policy, it must be found in the distribution of crop returns, which include government payments. While this is a relatively short and somewhat anomalous time period when compared with typical studies in finance, we suggest this comprehensive approach to arbitrage structure can be beneficial when compared with typical incomplete approaches to the estimation of risk behavior in agriculture. Using contemporaneous arbitrage equations implied by Euler conditions, an econometric model is specified over future wealth and excess returns conditional on the current information set. In spite of limited data, we find evidence of aggregate risk aversion that is rationalized by a single set of representative consumer preferences using an unconventional but reasonable specification.

2. Variable definitions and timing for a micro-model of farm behavior

Although the organizational form of farms varies, a recent report by Hoppe and Banker (2006) finds that 98% of US farms remained family farms as of 2003. In a family farm, the entrepreneur controls the means of production and makes investment, consumption, and production decisions. We begin by modeling the intertemporal interactions of these decisions. The starting point is a model similar in spirit to Hansen and Singleton's (1983) but generalized to include consumption decisions and farm investments as well as financial investments and production decisions. Variable definitions are as follows, where $t$ denotes the time period:

$W_t$ = beginning-of-period total wealth,
$B_t$ = current holding of bonds with a risk free rate of return $r_t$,
$f_{i,t}$ = current holding of the $i$th risky financial asset, $i = 1, \ldots, n_F$,
$\delta_{i,t+1}$ = dividend rate on the $i$th risky financial asset,
$p_{F_i,t}$ = beginning-of-period market price of the $i$th risky financial asset,
$\gamma_{i,t+1} = (p_{F_i,t+1} - p_{F_i,t})/p_{F_i,t}$ = capital gain rate on the $i$th risky financial asset,
$a_{i,t}$ = current allocation of land to the $i$th crop, $i = 1, \ldots, n_Y$,
$A_t$ = total quantity of farm land,
$p_{L,t}$ = beginning-of-period market price of land,
$\psi_{t+1} = (p_{L,t+1} - p_{L,t})/p_{L,t}$ = capital gain rate on land,
$x_{i,t}$ = vector of input quantities employed in producing the $i$th crop,
$w_t$ = vector of market prices for the inputs,
$\bar{y}_{i,t}$ = expected yield per acre for the $i$th crop, $i = 1, \ldots, n_Y$,
$y_{i,t+1}$ = realized yield of the $i$th crop,
$p_{Y_i,t+1}$ = end-of-period realized market price for the $i$th farm product,
$q_t$ = vector of quantities of consumption goods,
$p_{Q,t}$ = vector of market prices for consumer goods,
$m_t$ = total consumption expenditures,
$u(q_t)$ = periodic utility from consumption.

As with all discrete time models, timing can be represented in multiple ways. In the model used here, all financial returns and farm asset gains are assumed to be realized at the end of each time period (where depreciation is represented by a negative asset gain). Variable inputs are assumed to be committed to farm production activities at the beginning of each decision period, and the current period market prices for the variable inputs are known when these use decisions are made. Agricultural production per acre is realized stochastically at the end of the period such that $y_{i,t+1} = \bar{y}_{i,t}(1 + \varepsilon_{i,t+1})$, $i = 1, \ldots, n_Y$, where $\varepsilon_{i,t+1}$ is a random output shock with $E(\varepsilon_{i,t+1}) = 0$. Consumption decisions are made at the beginning of the decision period and the current market prices of consumption goods are known when these purchases are made. Utility is assumed to be strictly increasing and concave in $q_t$. The total beginning-of-period values of financial assets and land are, respectively,
$$F_t = \sum_{i=1}^{n_F} p_{F_i,t} f_{i,t} \equiv p_{F,t}^{T} f_t, \quad\text{and}\quad L_t = p_{L,t} A_t \equiv p_{L,t}\,\iota^{T} a_t, \tag{1}$$
where the total beginning-of-period quantity of land is $A_t = \iota^{T} a_t$, with $\iota$ denoting an $n$-vector of ones and, throughout, bolded notation represents the vector form of its unbolded and $i$-subscripted counterpart. Note that homogeneous land is assumed with a scalar price, $p_{L,t}$. For an arbitrary $n$-vector $z$, denote the $n \times n$ diagonal matrix whose typical diagonal element is $z_i$ by $\Delta(z)$.

3. Behavior and constraints

We assume non-joint crop production with constant returns to scale so that the production function for the $i$th crop in per acre terms is $\bar{y}_{i,t} = g_{i,t}(x_{i,t})$, $i = 1, \ldots, n_Y$. For each crop, the cost function per acre satisfies
$$c_{i,t}(w_t, \bar{y}_{i,t}) \equiv \min_{x_{i,t}}\left\{ w_t^{T} x_{i,t} : \bar{y}_{i,t} = g_{i,t}(x_{i,t}) \right\}, \quad i = 1, \ldots, n_Y, \tag{2}$$
and total cost across all crops is additively separable (Hall, 1978; Muellbauer, 1974; Samuelson, 1966),6
$$c_t(w_t, \bar{y}_t, a_t) = c_t(w_t, \bar{y}_t)^{T} a_t. \tag{3}$$
Revenue at $t+1$ is the random price times production7
$$R_{t+1} = \sum_{i=1}^{n_Y} p_{Y_i,t+1}\,\bar{y}_{i,t}\, a_{i,t}(1 + \varepsilon_{i,t+1}) \equiv (\iota + \varepsilon_{t+1})^{T}\Delta(p_{Y,t+1})\Delta(a_t)\bar{y}_t. \tag{4}$$

2 Net worth and wealth are used here interchangeably. For proprietorships, it is especially difficult to measure and untangle the contributions of human and physical capital.
3 Lence (2000) apparently had reasonable success calculating aggregate consumption by agricultural households in the US.
4 Difficulty in measuring agricultural capital services at the state level led us to omit this arbitrage equation.
5 These are the wealth data used by Lin and Dismukes (2007) in a recent Economic Research Service (ERS) study of the USDA.
6 Although $x_t$ has a $t$ subscript, this is a typical simplification for problems without intra-seasonal states.
7 A futures market activity can also be added for each output. In this case,
$$\pi_{t+1} = \sum_{i=1}^{n_Y}\left[ p_{h_i,t} h_{i,t} + p_{Y_i,t+1}\big(\bar{y}_{i,t}(1 + \varepsilon_{i,t+1}) - h_{i,t}\big)\right] - (1 + r_t)\, c(w_t, \bar{y}_t, a_t),$$
where $h_t$ is the vector of hedging activities with associated forward or futures price $p_{h,t}$.
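As a concrete reading of the vector notation in (1)–(4), the following short numpy sketch (with made-up numbers for a hypothetical two-asset, two-crop example; none of these values come from the paper) computes $F_t$, $L_t$, and $R_{t+1}$ exactly as written, using $\Delta(z) = \mathrm{diag}(z)$.

```python
import numpy as np

# Hypothetical two-asset, two-crop example (numbers are illustrative only).
p_F  = np.array([50.0, 20.0])      # beginning-of-period financial asset prices
f    = np.array([100.0, 250.0])    # holdings of the risky financial assets
p_L  = 2000.0                      # land price (scalar, homogeneous land)
a    = np.array([300.0, 200.0])    # acres allocated to each crop
ybar = np.array([150.0, 45.0])     # expected yields per acre
p_Y  = np.array([2.5, 6.0])        # realized output prices at t+1
eps  = np.array([0.10, -0.05])     # realized yield shocks at t+1
iota = np.ones(2)

F_t  = p_F @ f                                             # eq. (1): p_F' f
L_t  = p_L * (iota @ a)                                    # eq. (1): p_L * total acres
R_t1 = (iota + eps) @ np.diag(p_Y) @ np.diag(a) @ ybar     # eq. (4), matrix form

# The same revenue written crop by crop, as in the summation form of (4):
R_check = np.sum(p_Y * ybar * a * (1.0 + eps))
print(F_t, L_t, R_t1, np.isclose(R_t1, R_check))
```

The matrix form and the crop-by-crop sum agree, which is all the $\Delta(\cdot)$ notation is doing.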
Wealth is allocated at the beginning of period $t$ to assets, production costs, and consumption, satisfying
$$W_t = B_t + F_t + L_t + c_t(w_t, \bar{y}_t, a_t) + m_t = B_t + p_{F,t}^{T} f_t + p_{L,t}\,\iota^{T} a_t + c_t(w_t, \bar{y}_t)^{T} a_t + m_t. \tag{5}$$
Thus, beginning-of-period wealth consists of holdings of bonds $B_t$ with numeraire price 1, risky financial assets $F_t$, and land $L_t$, plus cash on hand used to finance production costs and consumption throughout the period. Although some costs occur at or near harvest (near $t+1$), we include all costs in (5) at time $t$ because they are incurred before revenues are received. Consumer utility maximization yields the quasi-convex indirect utility function conditioned on consumer good prices and expenditures,
$$\upsilon(p_{Q,t}, m_t) \equiv \max_{q \in \mathbb{R}_{+}^{n_Q}}\left\{ u(q) : p_{Q,t}^{T} q = m_t \right\}. \tag{6}$$
Realized end-of-period wealth is
$$W_{t+1} = (1 + r_t)B_t + (\iota + \delta_{t+1} + \gamma_{t+1})^{T}\Delta(p_{F,t}) f_t + (1 + \psi_{t+1}) p_{L,t}\,\iota^{T} a_t + (\iota + \varepsilon_{t+1})^{T}\Delta(p_{Y,t+1})\Delta(a_t)\bar{y}_t. \tag{7}$$
Thus, the decision maker's wealth is increased by the return on assets including interest, dividends, asset appreciation (less depreciation), and farm revenue. The decision maker's intertemporal utility is assumed to follow
$$U_T(q_1, \ldots, q_T) = \sum_{t=0}^{T} (1 + \rho)^{-t} u(q_t). \tag{8}$$
The producer is assumed to maximize Von Neumann–Morgenstern expected utility of the discounted present value of periodic utility flow from consumption.

4. A solution approach

The problem is solved by stochastic dynamic programming, treating $T$ as fixed and finite, and working backwards from the last period in the planning horizon to the first. In the last period of the planning horizon, the decision is simply to invest or produce nothing and consume all remaining wealth, $m_T = W_T$. Denote the last period's optimal value function by $v_T(W_T)$. Then, $v_T(W_T) = \upsilon(p_{Q,T}, W_T)$ is the optimal utility for the terminal period. For other time periods, stochastic dynamic programming using (6)–(8) to optimize agricultural production, asset ownership and net investment decisions in each period yields the (Bellman) backward recursion problem for arbitrary $t < T$,
$$v_t(W_t) \equiv \max_{(m,B,f,k,a,\bar{y})}\Big\{ \upsilon(p_{Q,t}, m) + (1 + \rho)^{-1} E_t v_{t+1}\big((1 + r_t)B + (\iota + \delta_{t+1} + \gamma_{t+1})^{T}\Delta(p_{F,t})f + (1 + \psi_{t+1})p_{L,t}\,\iota^{T} a + (\iota + \varepsilon_{t+1})^{T}\Delta(p_{Y,t+1})\Delta(\bar{y})a\big) : W_t = B + p_{F,t}^{T} f + p_{L,t}\,\iota^{T} a + c_t(w_t, \bar{y})^{T} a + m \Big\}. \tag{9}$$
The associated Lagrangean is
$$\mathcal{L} = \upsilon(p_{Q,t}, m) + (1 + \rho)^{-1} E_t v_{t+1}\big((1 + r_t)B + (\iota + \delta_{t+1} + \gamma_{t+1})^{T}\Delta(p_{F,t})f + (1 + \psi_{t+1})p_{L,t}\,\iota^{T} a + (\iota + \varepsilon_{t+1})^{T}\Delta(p_{Y,t+1})\Delta(\bar{y})a\big) + \lambda_t\big(W_t - B - p_{F,t}^{T} f - p_{L,t}\,\iota^{T} a - c_t(w_t, \bar{y})^{T} a - m\big). \tag{10}$$
We assume that bonds and consumption expenditures are positive and that the wealth constraint holds with equality. Hence, $\lambda_t > 0$ and
$$\frac{\partial \upsilon(p_{Q,t}, m_t)}{\partial m_t} = \frac{1 + r_t}{1 + \rho}\, E_t v'_{t+1}(W_{t+1}) = \lambda_t. \tag{11}$$
Substituting the middle term for the Lagrange multiplier, $\lambda_t$, the remainder of the Kuhn–Tucker conditions for an optimal solution can be written as
$$\frac{\partial \mathcal{L}}{\partial f} = \Delta(p_{F,t})\, E_t\big[v'_{t+1}(W_{t+1})(\delta_{t+1} + \gamma_{t+1} - r_t\iota)\big] \le 0, \quad f \ge 0, \quad f^{T}\frac{\partial \mathcal{L}}{\partial f} = 0; \tag{12}$$
$$\frac{\partial \mathcal{L}}{\partial a} = E_t\big[v'_{t+1}(W_{t+1})\big(\Delta(p_{Y,t+1})\Delta(\bar{y})(\iota + \varepsilon_{t+1}) + (\psi_{t+1} - r_t)p_{L,t}\,\iota - (1 + r_t)c_t\big)\big] \le 0, \quad a \ge 0, \quad a^{T}\frac{\partial \mathcal{L}}{\partial a} = 0; \tag{13}$$
$$\frac{\partial \mathcal{L}}{\partial \bar{y}} = E_t\Big[v'_{t+1}(W_{t+1})\Big(\Delta(p_{Y,t+1})\Delta(a)(\iota + \varepsilon_{t+1}) - (1 + r_t)\frac{\partial c_t^{T}}{\partial \bar{y}}\, a\Big)\Big] \le 0, \quad \bar{y} \ge 0, \quad \bar{y}^{T}\frac{\partial \mathcal{L}}{\partial \bar{y}} = 0. \tag{14}$$
Eq. (11) is the fundamental consumption smoothing equation. By the envelope theorem, $\lambda_t = v'_t(W_t)$, so that this Euler equation can be represented equivalently either in terms of the marginal utility of consumption or of wealth. The conditions in (12) represent financial asset arbitrage. Agricultural asset and production choices are found similarly from (13) and (14). The marginal net benefit of holding land in (13) includes future appreciation (or depreciation) $\psi_{t+1} p_{L,t}$, the marginal impact on costs of production, and the opportunity cost, $r_t p_{L,t}$. The marginal net benefit of expected crop yields in (14) includes random marginal revenue and marginal cost. Assuming an interior solution and dividing (13) by $p_{L,t}$ yields
$$E_t\big[v'_{t+1}(W_{t+1})\big((\psi_{t+1} - r_t)\iota + \pi_{t+1}/p_{L,t}\big)\big] = 0, \tag{15}$$
where $\pi_{t+1}$ is a vector of short run per acre profits from crop production, defined as $\pi_{t+1} = \Delta(p_{Y,t+1})y_{t+1} - (1 + r_t)c_t$. Further defining the excess return rate for the $i$th crop, $e_{i,t+1} = \psi_{t+1} + (\pi_{i,t+1}/p_{L,t}) - r_t$, Eq. (15) can be expressed as
$$E_t\big[v'_{t+1}(W_{t+1})\, e_{i,t+1}\big] = 0, \quad i = 1, \ldots, n_Y. \tag{16}$$
Eq. (16) thus resembles other asset equations with dividends, where the dividend for production is the per acre profit from land relative to land prices.

5. Aggregation across choices and households
Data sets that contain all required farm financial and production data for implementation of the above model at the farm household level are lacking. Although a few surveys of farm households are available, they have serious shortcomings for this application. For example, the Agricultural Resource Management Survey conducted by the National Agricultural Statistics Service (NASS) for the ERS is periodic but is not a panel as necessary for estimating dynamic relationships. On the other hand, other micro-level data such as the Kansas State KMAR data are focused on production. Financial variables occur as holdings or expenditures without identifying whether an increase is due to asset appreciation or additional investment. Alternatively, aggregation both over households and observable asset categories facilitates application with available aggregate data. To illustrate the implied arbitrage equation for aggregate financial assets, consider (12). Kuhn–Tucker conditions imply that
$$f^{T}\frac{\partial \mathcal{L}}{\partial f} = f^{T}\Delta(p_{F,t})\, E_t\big[v'_{t+1}(W_{t+1})(\delta_{t+1} + \gamma_{t+1} - r_t\iota)\big] = E_t\Big[v'_{t+1}(W_{t+1})\sum_{i=1}^{n_F} F_{i,t}(\delta_{i,t+1} + \gamma_{i,t+1} - r_t)\Big] = 0, \tag{17}$$
where $F_{i,t} = p_{F_i,t} f_{i,t}$. Dividing this expression by $F_t$, assuming $F_t > 0$, obtains the financial arbitrage condition,
$$E_t\big[v'_{t+1}(W_{t+1})\, e_{F,t+1}\big] = 0, \tag{18}$$
where $e_{F,t+1} = \sum_{i=1}^{n_F}(F_{i,t}/F_t)(\delta_{i,t+1} + \gamma_{i,t+1}) - r_t$ is the excess return on the weighted average of financial investments. Eq. (18) is a crucial equation for our application because it contributes to the identification of properties of $v'_{t+1}(W_{t+1})$, which contributes to the identification of (15).

Another important issue of aggregation occurs across consuming units. Our data are available at the state level and are thus an aggregation across micro units. Aggregation usually washes out some variation in data so that perceived variability at the micro-level is inconsistent with variation reflected in aggregate data. Although many of the variables may be subject to heterogeneity, the most serious heterogeneity likely occurs in wealth, the production disturbance, and cost. We illustrate briefly the difficulty with heterogeneous production disturbances. Adding $h$ subscripts to represent households and $j$ subscripts to represent states, (16) can be expressed in the form $E_t[v'_{t+1}(W_{h,j,t+1})\, e_{i,h,j,t+1}] = 0$. Summing over all $H_j$ households in state $j$ yields the implications of first-order conditions at the state level,
$$E_t\Big[\frac{1}{H_j}\sum_{h=1}^{H_j} v'_{t+1}(W_{h,j,t+1})\, e_{i,h,j,t+1}\Big] = 0. \tag{19}$$
This is not the condition imposed by a representative household approach with average state-level data, $E_t[v'_{t+1}(\bar{W}_{j,t+1})\,\bar{e}_{i,j,t+1}] = 0$, where overbars denote averaging. Techniques popularized by Aczel (1966) could be used to obtain restrictions on (19) required for exact aggregation. Alternatively, suppose state-level excess returns are related to individual excess returns by $e_{i,h,j,t+1} = \bar{e}_{i,j,t+1} + u_{i,h,j,t+1}$, and that the individual marginal utility of wealth is related to the marginal utility of wealth at the state-level average wealth by $v'_{t+1}(W_{h,j,t+1}) = v'_{t+1}(\bar{W}_{j,t+1}) + \varsigma_{h,j,t+1}$. Then, using state-level data yields
$$E_t[v'_{t+1}(W_{h,j,t+1})\, e_{i,h,j,t+1}] = E_t\big[(v'_{t+1}(\bar{W}_{j,t+1}) + \varsigma_{h,j,t+1})(\bar{e}_{i,j,t+1} + u_{i,h,j,t+1})\big] = E_t[v'_{t+1}(\bar{W}_{j,t+1})\,\bar{e}_{i,j,t+1}] + E_t[v'_{t+1}(\bar{W}_{j,t+1})\, u_{i,h,j,t+1}] + E_t[\varsigma_{h,j,t+1}\,\bar{e}_{i,j,t+1}] + E_t[\varsigma_{h,j,t+1}\, u_{i,h,j,t+1}]. \tag{20}$$
None of the latter three terms of (20) need vanish in general. In particular, because $u_{i,h,j,t+1}$ is a deviation in return realized at $t+1$ and $\varsigma_{h,j,t+1}$ depends on wealth at $t+1$, which includes this realized return, these two terms are likely to be jointly determined. Therefore, as in a typical setting with state-level panel data, we consider fixed state effects in the form
$$E_t[v'_{t+1}(\bar{W}_{j,t+1})\,\bar{e}_{i,j,t+1}] - (\alpha_i + \phi_j) = 0, \tag{21}$$
where $\alpha_i$ and $\phi_j$ are parameters to be estimated, $i = 1, \ldots, n_Y$, $j = 1, \ldots, 8$, and $n_Y$ represents the number of moment equations. These parameters appear here with a minus sign so that positive estimates will correspond to overshooting of an arbitrage condition. Time effects can also be added, $t = 1, \ldots, T$, but because of the relatively short time period used in our data set, time effects do not appear useful. As usual, an idiosyncratic disturbance with zero expectation is added later for econometric purposes.
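To see concretely why the latter three terms in (20) generally prevent the representative-household condition from holding exactly, the following small Monte Carlo sketch compares the average of household-level moments with the moment evaluated at state-level averages. All parameter values (number of households, wealth range, shock sizes, the CARA marginal utility) are purely illustrative assumptions, not numbers from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings: H households in one state, CARA marginal utility exp(-beta*W).
H, n_sim, beta = 200, 20000, 8.5e-7
W0 = rng.uniform(2e5, 8e5, size=H)     # heterogeneous beginning-of-period wealth
ebar = 0.03                             # state-level mean excess return

mom_household, mom_representative = [], []
for _ in range(n_sim):
    u = rng.normal(0.0, 0.10, size=H)   # household-level return deviations u
    e = ebar + u                        # household excess returns
    W1 = W0 * (1.0 + e)                 # wealth at t+1 embeds the same shock
    mu = np.exp(-beta * W1)             # household marginal utilities
    mom_household.append(np.mean(mu * e))                         # as in (19)
    mom_representative.append(np.exp(-beta * W1.mean()) * e.mean())  # averages only

print("mean of household-level moments:", np.mean(mom_household))
print("moment at state-level averages :", np.mean(mom_representative))
print("aggregation wedge              :",
      np.mean(mom_representative) - np.mean(mom_household))
```

Because the realized wealth and the return deviation share the same shock, the covariance terms in (20) do not vanish; this wedge is exactly what the $\alpha_i + \phi_j$ intercepts in (21) are meant to absorb.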
6. Identification of utility in arbitrage conditions

To model a set of endogenous choices, a system of equations that determines all choice variables is preferred. This requires as many equations as decisions. However, any identified equation can be helpful to answer a particular question. For example, a single arbitrage condition such as (21) may serve to identify preference parameters. However, potential endogeneity issues must be addressed. We apply GMM with instruments to correct for endogeneity. When compared with the Euler formulations of arbitrage conditions, however, conditions in the form of (16) and (18) can present a numerical challenge for identification of utility. For econometric purposes, the marginal utility of wealth is typically formulated as $v'_{t+1}(W_{t+1}, \beta)$, where $\beta$ is a parameter or set of parameters to be estimated. Letting $\beta$ be a scalar for illustration, $v'_{t+1}: \mathbb{R}_{+} \times \mathbb{R} \to \mathbb{R}_{+}$, where $W_{t+1} \in \mathbb{R}_{+}$ and $\beta \in \mathbb{R}$. However, the model is not identified in the domain $\mathbb{R}$ of the parameter space in a way that makes economic sense. For example, consider the usual GMM estimator for an arbitrage condition defined by
$$g(Y_t, \beta) = E_t[Z_t^{T} v'_{t+1}(W_{t+1}, \beta)\, e_{t+1}] = E_t[Z_t^{T}\Phi(W_{t+1}, e_{t+1}; \beta)] = 0, \tag{22}$$
where $W_t$ represents wealth, $e_{i,t+1}$ is excess returns, $\beta$ is the preference parameter to be estimated, $\Phi$ represents the arbitrage conditions associated with first-order conditions, $Z_t$ represents a set of instruments in the information set that forms expectations at time $t$, and $Y_t = (W_{t+1}, e_{t+1}, Z_t)$ represents all the data collectively. Iterated expectations rationalizes the use of a sample moment of the form $\bar{m}(\beta) = \sum_{t=1}^{T} Z_t^{T}\Phi(W_{t+1}, e_{t+1}; \beta)/T$ to minimize
$$\hat{\beta} = \arg\min\big\{\bar{m}(\beta)^{T} A\,\bar{m}(\beta)\big\}, \tag{23}$$
where $A$ is the positive definite weighting matrix. If there exists $\beta_0 \in \mathbb{R}$ such that marginal utility is (numerically indistinguishable from) zero, $v'_{t+1}(W_{t+1}, \beta_0) = 0$ for all $t$, which implies that $E_t\Phi(W_{t+1}, e_{t+1}; \beta_0) = 0$ and $\bar{m}(\beta_0) = 0$, then $\hat{\beta} = \beta_0$ solves (23) regardless of the true parameter. Unfortunately, this problem occurs with many common utility functional forms. For example, marginal utility under constant relative risk aversion (CRRA), $v'_{t+1}(W_{t+1}) = W_{t+1}^{-\beta}$, implies that the minimum of (23) will occur for $\beta$ large enough to make this marginal utility numerically zero. A similar result occurs for constant absolute risk aversion (CARA) and generalizations of these two forms in common use (e.g., see Meyer and Meyer, 2006).

Several approaches can be taken to solve this problem. Among them is restriction of the domain of $\beta$ to make utility strongly monotonic. This is not easy because the domain of positive marginal utility depends on the arbitrage equation and the data. Instead, we consider a Taylor series approximation of the utility function. With a small number of terms as required for practical application, the accuracy of the approximation is less, but the parameters are estimable. In the consumption based Euler equations of Hansen and Singleton, a typical CRRA equation would be of the form
$$E_t\left[\left(\frac{q_{t+1}}{q_t}\right)^{-\beta}\frac{\tilde{R}_{t+1}}{1 + \rho}\right] - 1 = 0, \tag{24}$$
where $q$ is the single consumption good, $\tilde{R}$ is one plus the rate of return on the asset including dividends, and $\beta$ is the Arrow–Pratt measure of relative risk aversion. No restriction on the parameter space is needed here because no value of $\beta$ forces the expectation to equal one for all values of the data. A Taylor series approximation of marginal utility possesses this same virtue for the arbitrage conditions estimated here. To illustrate, the marginal utility for a CARA utility function can be written
as $e^{-\beta W_{t+1}}$ and the arbitrage condition is $E_t[e_{t+1}/e^{\beta W_{t+1}}] = 0$. Using GMM with this utility function would choose $\beta$ large enough to make the moment condition identically zero, at least numerically. However, the arbitrage conditions in (16) and (18) with a second-order Taylor series approximation of $e^{-\beta W_{t+1}}$ around $e^{-\beta W_t}$ can be stacked and parameterized as
$$\Phi(W_{t+1}, e_{t+1}; \beta) = e^{-\beta W_t} E_t\big[\big(1 - \beta(W_{t+1} - W_t) + \tfrac{1}{2}\beta^2(W_{t+1} - W_t)^2\big)e_{t+1}\big] = 0. \tag{25}$$
Presuming that $e^{-\beta W_t} \ne 0$ for the true $\beta$, the root of (25) is such that the term in parentheses is not zero.8 That is, no $\beta \in \mathbb{R}$ exists such that the approximated marginal utility is zero for all values of wealth. One can easily expand this argument to higher order approximations.

8 Indeed, one can then divide by $e^{-\beta W_t}$ to eliminate it.
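A one-line algebraic check makes the identification point concrete: the bracketed factor in (25) satisfies $1 - x + \tfrac12 x^2 = \tfrac12\big((x-1)^2 + 1\big) \ge \tfrac12$ for $x = \beta(W_{t+1} - W_t)$, so no finite $\beta$ can drive the approximated marginal utility to zero. The sketch below (with entirely illustrative numbers, not the paper's data) contrasts this with the exact CARA moment, which collapses numerically as $\beta$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical panel of wealth levels and excess returns (illustration only).
T = 200
W_t  = rng.uniform(2e5, 8e5, size=T)
W_t1 = W_t * (1.0 + rng.normal(0.02, 0.05, size=T))
e_t1 = rng.normal(0.03, 0.10, size=T)

def exact_cara_moment(beta):
    # Exact CARA moment: mean of exp(-beta*W_{t+1}) * e_{t+1}; vanishes for large beta.
    return np.mean(np.exp(-beta * W_t1) * e_t1)

def taylor_cara_moment(beta):
    # Second-order approximation as in (25), after dividing out exp(-beta*W_t):
    # the quadratic factor is bounded below by 1/2, so it cannot vanish.
    dW = W_t1 - W_t
    return np.mean((1.0 - beta * dW + 0.5 * (beta * dW) ** 2) * e_t1)

for beta in [1e-7, 1e-6, 1e-5, 1e-4]:
    print(f"beta={beta:.0e}  exact={exact_cara_moment(beta): .3e}"
          f"  taylor={taylor_cara_moment(beta): .3e}")
```

As $\beta$ increases, the exact moment is numerically indistinguishable from zero regardless of the data, while the Taylor-approximated moment remains informative about $\beta$.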
7. Data and estimation strategy

No carefully constructed publicly available panel of agricultural data including farm and off-farm decisions and wealth variables exists. The periodic Survey of Consumer Finances and the Panel Study of Income Dynamics have too little farm information to give a very complete picture of decisions and representation of farm households. The best available data on wealth are found in the Agricultural Resource Management Survey and the US Census of Agriculture, which are conducted by NASS. For reasons explained above, this survey does not suffice for application of our model at the micro level. However, data from this survey have been used within ERS to estimate average farm household net worth (wealth) by state for the period 1991–2001. These data include eight states in the North Central Region of the US: Illinois, Indiana, Iowa, Michigan, Minnesota, Missouri, Ohio, and Wisconsin. These data are not without issues but seem to be the best available source for net worth and are actively used by government personnel in the ERS for research. Alternative data (such as land value) would omit non-farm assets, which are a substantial portion of farm households' net worth (Mishra and Morehart, 2001) and are intended as a key source of identification for this study. Although the time-period is short and in some ways atypical due to the run up of the stock market in the 1990s, this variation is ideal for identifying the arbitrage effects on agriculture of the returns to financial assets. This period also has the advantage that the impact of government policy on crop substitution is relatively reduced and less complicated. For example, the Freedom to Farm Act of 1996 culminated a growing effort to decouple farm subsidies from acreage allocation decisions.

For each state, net returns and acres allocated to corn, soybeans, and wheat are from NASS (http://www.nass.usda.gov/index.asp), as are per acre production costs by crop (http://www.ers.usda.gov/data/costsandreturns/testpick.htm). The rate of return on bonds is the annualized return on 90-day treasury bills in secondary markets from the Federal Reserve (http://www.federalreserve.gov/releases/h15/data/Annual/H15_TB_M3.txt). The return on financial assets is the share-weighted rate of return on the S&P from Shiller (1992) as updated (http://www.econ.yale.edu/~shiller/data/chapt26.xls). The average per acre value of farmland and buildings by state is obtained from the ''Farm resources, income, and expenses'' chapter of the NASS publication Agricultural Statistics published annually from 1995 to 2005. Where discrepancies exist from one year's publication to the next, the most recently published data are used because NASS updates estimates as further information becomes available. All monetary variables (asset values and net farm returns) are deflated by the GDP implicit price deflator. The nominal rate of return for agricultural assets is calculated as the percentage change in nominal annual average value per acre of farmland and buildings.

Wealth varies from $221,665 to $778,139 with a mean wealth of $413,855. Excess returns for the market vary from −.1351 to .1319 with a mean of .1072. Excess rates of return for the crops vary from a low of −.1354 for corn (mean = .0268) to a high of .1585 for soybeans (mean = .0639). The excess return that seems to be particularly low is wheat, which has a mean across all states of near zero. Clearly, however, these returns vary by state. For instance, Iowa has very low returns to wheat yielding second-degree stochastic dominance, and very little is grown. The standard deviations are $112,507 for wealth and, respectively, for the four arbitraged goods, .1319, .0532, .0447, and .0978 for the market, corn, soybean, and wheat land.

8. Econometric specification
Thus, our estimated arbitrage conditions include (18) and land asset arbitrage equations as in (16) for the three crops. These arbitrage conditions are parameterized by the additional $\alpha_i$ and $\phi_j$ parameters as defined in (21) to account for the use of aggregated state-level rather than micro-level data. These parameters allow testing for departures from full arbitrage, but also represent heterogeneity within states and crops as explained above. Given the nature of the data, we suspected that there might be a difficulty in measuring precisely the curvature of risk aversion. However, in order to explore this issue, consider the marginal utility function $\nu'(W_t) = e^{-\beta f(W_t)}$, where $f(W_t)$ is parametrized as $(W_t^{\kappa} + \kappa - 1)/\kappa$. CARA is represented by $\kappa = 1$, while CRRA is given by $\kappa = 0$. Absolute risk aversion, $\beta W^{\kappa-1}$, is decreasing if $\kappa < 1$, while relative risk aversion, $\beta W^{\kappa}$, is increasing if $\kappa > 0$. In the interval $0 \le \kappa \le 1$, in our later empirical model, we could not reject CARA and indeed found that the lowest value of the criterion function occurs at $\kappa = 1$.9 Hence, CARA (25) is used throughout the analysis. Since the time period is short, a quadratic approximation of the utility function is believed to adequately capture the curvature of utility as wealth changes (estimation with a cubic approximation showed that the estimate of risk aversion was altered by less than 2%). Specifically, the four estimated equations are
$$E_t[v'_{t+1}(W_{j,t})\, e_{i,t+1}] - \alpha_i - \phi_j = \zeta_{i,j,t+1}, \quad E(\zeta_{i,j,t+1}) = 0, \quad i = F, C, S, W, \; j = 1, \ldots, 8, \; t = 1, \ldots, 9, \tag{26}$$
where subscripts $F$, $C$, $S$, and $W$ refer to financial assets, corn, soybeans, and wheat, respectively, $v'_{t+1}$ is approximated as in (25) with state per farm wealth $W_{j,t}$ (dropping the overbars for convenience), and $\zeta_{i,j,t+1}$ is an idiosyncratic disturbance added for econometric purposes. Only 9 time series observations are available for estimation because the tenth year is used to represent time $t+1$ for year 9. Neither the selection of states nor crops is conceptualized as a random draw from among states and crops, respectively. Hence, equations are modeled as having common fixed effects, where $\alpha_i$ is the overall intercept for the $i$th moment equation. This model is referred to as Model 1. Since Wisconsin is omitted, its fixed effects are included in the equation-specific constants so each $\phi_j$ can be interpreted as relative to Wisconsin. In addition, we assume that $\zeta_{i,j,t+1}$ is potentially serially correlated, correlated across equations, and heteroskedastic. Due to the paucity of data, we present the Newey–West procedure and use a Bartlett kernel with lag 1.

9 For example, corresponding to the first column in Table 1, the objective function in (29) is .8872 for $\kappa = 1$ and .8901 when $\kappa = 0$ with a constant covariance matrix.
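The $\kappa$-parametrization can be checked numerically. The brief sketch below (code written for this exposition, not the authors') verifies that $\nu'(W) = e^{-\beta(W^{\kappa}+\kappa-1)/\kappa}$ reproduces CARA at $\kappa = 1$, approaches the CRRA form $W^{-\beta}$ (up to the constant factor $e^{-\beta}$) as $\kappa \to 0$, and yields absolute and relative risk aversion $\beta W^{\kappa-1}$ and $\beta W^{\kappa}$; the wealth levels used are the minimum, mean, and maximum reported in Section 7, and $\beta$ is of the order of the Table 1 estimate.

```python
import numpy as np

def marginal_utility(W, beta, kappa):
    """nu'(W) = exp(-beta * f(W)) with f(W) = (W**kappa + kappa - 1) / kappa."""
    if kappa == 0.0:
        # Limit kappa -> 0: f(W) -> ln(W) + 1, so nu'(W) = exp(-beta) * W**(-beta) (CRRA shape).
        return np.exp(-beta) * W ** (-beta)
    return np.exp(-beta * (W ** kappa + kappa - 1.0) / kappa)

def absolute_risk_aversion(W, beta, kappa):
    return beta * W ** (kappa - 1.0)      # -nu''(W)/nu'(W)

def relative_risk_aversion(W, beta, kappa):
    return beta * W ** kappa

W = np.array([221665.0, 413855.0, 778139.0])   # min, mean, max state per-farm wealth (Section 7)
beta = 8.5e-7                                   # order of magnitude of the Table 1 estimate

# CARA case (kappa = 1): marginal utility is exp(-beta*W) exactly.
print(np.allclose(marginal_utility(W, beta, 1.0), np.exp(-beta * W)))

# CRRA limit: a very small kappa is numerically close to the kappa = 0 closed form.
print(np.allclose(marginal_utility(W, beta, 1e-8), marginal_utility(W, beta, 0.0)))

# Relative risk aversion at the CARA estimate across the observed wealth range.
print("RRA at min/mean/max wealth:", relative_risk_aversion(W, beta, 1.0))
```

At these wealth levels the implied relative risk aversion runs from roughly .19 to .66, with about .35 at mean wealth, consistent with the magnitudes discussed in Section 9.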
There are other ways to introduce heterogeneity and/or think about $\alpha_i + \phi_j + \zeta_{i,j,t+1}$ in (26). To the extent that $\zeta$ has systematic time effects, $\alpha_i$ represents the mean of the regression disturbance, leaving $\zeta$ to have mean zero. That is, the equation intercepts may be interpreted as including an average time effect to the extent that time effects cause a deviation from full arbitrage on average over the period of estimation. For example, the volatility coupled with rapid growth in the stock market during 1991–2000 may have caused larger estimated deviations from full arbitrage in the form of larger covariances as represented in (21). A second way to introduce heterogeneity into our model is to have state-level utility functions. In this case, the utility function parameter is represented as
$$\beta = \beta_1 + \sum_{j=2}^{8}\beta_j d_j, \tag{27}$$
where $d_j$ denotes a dummy variable for state $j$. This is called the ''heterogeneous preference'' model or Model 2. Although combining this specification with (21) may be useful conceptually, our data do not appear to allow estimation with both the state effects in (21) and the state-level utility functions implied by (27) simultaneously. Hence, we do not estimate fixed state effects in the heterogeneous preference model.

As in Lence (2000), timing issues in the net worth data are problematic. Apparently, from all econometric tests and analysis, returns are not added into net worth until the following year. Thus, excess returns at time $t$ are added to $W_{t+1}$. Since our approximation of utility uses $W_t$ as the mean of wealth at $t+1$, we have chosen to use wealth at $t-1$ rather than at time $t$ as an instrument. Specifically, we use $W_{t-1}$ and $W_{t-1}^2$ along with state dummies, unemployment at $t$, and the GDP growth rate at $t$ as the instruments that comprise $Z_t$. Thus, only eight observations on the eight states are available for estimation. Stacking the equations in (26) yields the initial moment conditions for GMM, which are represented compactly as
$$g(Y_t, \theta) = Z_t^{T}\Phi(W_{t+1}, e_{t+1}; \theta)/T = Z_t^{T}\zeta_{t+1}/T, \tag{28}$$
where $T$ is the number of observations, $\Phi$ is as specified in (25) with additional parameters included as specified in (21), (26), and (27) to account for the use of aggregate data, $g_t$ is vector-valued with the four equations in (26), and $\theta = (\beta, \alpha, \phi) \in \Theta$, where $\beta$ represents the risk aversion parameter including parameters reflecting state-level variations of it, $\alpha$ represents the vector of wedges in arbitrage conditions in (26) associated with source of returns, and $\phi$ represents the vector of state fixed effects. Since we have a fixed effects model, we invoke the usual regularity conditions with pooled asymptotics (as opposed to panel asymptotics, which concern whether time or the size of the cross-section grows larger or faster) (e.g., Hall, 2005 or Newey and McFadden, 1994). Thus, for simplicity of notation, we will use $T$ to represent the number of observations across states and years. The standard GMM estimator for our problem is
$$\hat{\theta} = \arg\min Q(\theta), \qquad Q(\theta) = g_T^{T} A_T\, g_T, \tag{29}$$
where $g_T = \sum_{t=1}^{T} g(Y_t, \theta)/T$ and $A_T$ is positive definite and converges to a positive definite constant matrix. Under these assumptions, the estimator $\hat{\theta}$ is consistent, $\hat{\theta} \to_P \theta_0$. Furthermore,
$$\sqrt{T}(\hat{\theta} - \theta_0) \to_D N\big(0,\ (\nabla g_T^{T} A\nabla g_T)^{-1}\,\nabla g_T^{T} A S A^{T}\nabla g_T\,(\nabla g_T^{T} A\nabla g_T)^{-1}\big), \tag{30}$$
where $\nabla g_T \equiv \partial g_T(\hat{\theta})/\partial\theta$, assuming that $E(Z^{T}\zeta\zeta^{T} Z)$ is finite and positive definite, and that $E[\lim_{T\to\infty}\mathrm{Var}(\sqrt{T}\, g_T(\theta_0))] = S$ is a finite, positive definite matrix. With $A$ efficiently chosen as proportional to $S^{-1}$ in (30), the covariance matrix estimate reduces to $[T\,\nabla g_T^{T} S^{-1}\nabla g_T]^{-1}$.
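For concreteness, the following sketch shows how (28)–(30) can be operationalized, using a Bartlett-kernel HAC estimate of $S$ of the kind defined in (31) just below. The array layout, the Nelder–Mead minimizer, and the toy moment function at the end are illustrative assumptions for exposition, not the authors' actual code or data.

```python
import numpy as np
from scipy.optimize import minimize

def hac_bartlett(u, K=1):
    """HAC estimate of S from a (T x q) array of moment contributions u_t,
    using Bartlett weights 1 - k/(K+1) as in Newey-West."""
    T, q = u.shape
    u = u - u.mean(axis=0)
    S = u.T @ u / T
    for k in range(1, K + 1):
        w = 1.0 - k / (K + 1.0)
        s_k = u[k:].T @ u[:-k] / T        # sample autocovariance at lag k
        S += w * (s_k + s_k.T)
    return S

def gmm(moments, theta0, data, K=1):
    """Two-step GMM: identity weight first, then A = S^{-1} from the HAC estimate."""
    def objective(theta, A):
        u = moments(theta, data)          # (T x q) instrument-weighted residuals Z_t' Phi_t
        gbar = u.mean(axis=0)
        return gbar @ A @ gbar
    q = moments(theta0, data).shape[1]
    step1 = minimize(objective, theta0, args=(np.eye(q),), method="Nelder-Mead")
    S = hac_bartlett(moments(step1.x, data), K=K)
    step2 = minimize(objective, step1.x, args=(np.linalg.inv(S),), method="Nelder-Mead")
    return step2.x, S

# Toy usage with a made-up scalar-parameter moment: E[Z_t (x_t - theta)] = 0.
rng = np.random.default_rng(0)
T = 72                                    # e.g., 8 states x 9 years, pooled
x = rng.normal(1.0, 0.5, size=T)
Z = np.column_stack([np.ones(T), rng.normal(size=T)])

def moments(theta, data):
    xx, ZZ = data
    return ZZ * (xx - theta[0])[:, None]

theta_hat, S_hat = gmm(moments, np.array([0.0]), (x, Z), K=1)
print("theta_hat:", theta_hat)
```

The lag-1 Bartlett weights correspond to the choice reported above; the Newey–West weight $1 - k/(K+1)$ keeps the estimated $S$ positive semi-definite.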
In practice, we estimate $S$ by HAC with a Bartlett kernel,
$$\hat{S} = \hat{s}(0) + \sum_{k=1}^{K}\left(1 - \frac{k}{K+1}\right)\big(\hat{s}(k) + \hat{s}(k)^{T}\big), \tag{31}$$
where $\hat{s}(k)$ is the sample autocovariance of the errors of the moment conditions in (28) (e.g., Newey and West, 1987; Hall, 2005, p. 127).

9. Empirical results

We first estimate Model 1—the model in (26) using fixed effects as in (21). The states are: 1—Illinois, 2—Indiana, 3—Iowa, 4—Michigan, 5—Minnesota, 6—Missouri, 7—Ohio, 8—Wisconsin. Clearly, both the equation intercept $\alpha_i$ and the coefficients of the dummy variables $\phi_1, \ldots, \phi_8$ cannot be identified without some restriction. One common restriction is $\alpha_i = 0$, where all eight fixed effects are estimated. Alternatively, the average of the state effects can be zero (an averaging analogue to the random effects assumption). This can be done in Model 1 by including dummy variables for each of seven states and having an intercept. In what follows, the omitted reference state is Wisconsin. Thus, for example, $\alpha_F + \phi_1$ is the coefficient for Illinois in the financial arbitrage equation. Model 2 uses the heterogeneous preference specification in (27) with Wisconsin also omitted as the reference state.

Table 1 presents the GMM estimates for the fixed effects model (Model 1) with heteroskedastic corrections as well as HAC estimators of the covariance matrix. Standard errors in both cases are similar in magnitude for most parameters. However, the standard error for the risk aversion parameter is considerably lower with the HAC estimator. Both present strong evidence against risk neutrality in favor of risk aversion. Given these estimates for $\alpha_F$–$\alpha_W$, the evidence is also convincing that the simple aggregate versions of the representative arbitrage conditions do not hold. In each case, the standard first-order conditions based on aggregate data overshoot zero, leaving a positive and significant intercept $\alpha_i$, $i = F, C, S, W$. In the $i$th equation and the $j$th state, the estimated intercept is $\hat{\alpha}_i + \hat{\phi}_j$. Using the first column to calculate average intercepts, they are .0860 for financial assets, .0215 for corn acreage, .0453 for soybean acreage, and .0039 for wheat acreage. The arbitrage condition with the largest mean deviation is for the stock market (financial assets) while wheat has the lowest. The same qualitative conclusion holds with Newey–West standard errors. These results are unsurprising given the period under consideration. However, for a few states and assets, the result differs substantially from the average. For example, for wheat in Illinois, $\alpha_W = .0215$ and the fixed effect coefficient is $\phi_1 = -.0266$, indicating that the wheat arbitrage equation underestimates full arbitrage for Illinois by .0051 (a similar condition holds for Missouri and Minnesota wheat). The strong overshooting of the arbitrage conditions for other crops likely drives this. For example, for Illinois corn and soybeans, the wedges are .0139 and .0391, respectively. The states with the largest average deviations include Michigan, Ohio and Indiana.

Another issue is whether one should consider separate risk preferences for financial and cropland investors, which is in contrast to the model. This would involve specification of separate $\beta$s for each equation in (26). Though not reported, the computed quasi-likelihood statistic (which is asymptotically distributed as a $\chi^2(3)$ random variable under the null hypothesis, $H_0: \beta_F = \beta_C = \beta_S = \beta_W$) is 1.27 with a p-value of .75. Hence, the evidence supports the conclusion of an integrated investor as modeled in (10). Fixed effects often can be usefully ignored by differencing the data.
Due to the small sample size in this case, differencing by equation rather than temporally appears more desirable. This has the advantage of eliminating possible misspecification due to the included state effects.
Table 1
Estimated arbitrage conditions with fixed effects.

Parameter     Model 1: Fixed effects,            Model 1: Fixed effects,
              Panel White standard errors (a)    Panel Newey–West standard errors
α_F           .1040 (.0127)*                     .1053 (.0102)*
α_C           .0395 (.0059)*                     .0405 (.0050)*
α_S           .0633 (.0083)*                     .0657 (.0073)*
α_W           .0219 (.0051)*                     .0215 (.0047)*
β             .08504 (.0283)*                    .0705 (.0241)*
φ_1           −.0271 (.0074)*                    −.0266 (.0057)*
φ_2           −.0069 (.0078)                     −.0069 (.0064)
φ_3           −.0288 (.0093)*                    −.0031 (.0093)*
φ_4           −.0001 (.0051)                     .0026 (.0041)
φ_5           −.0291 (.0083)*                    −.0304 (.0076)*
φ_6           −.0244 (.0063)*                    −.0244 (.0054)*
φ_7           −.0075 (.0061)                     −.0068 (.0052)
J-statistic   57.3634 (.037) (b)                 33.3908 (.761)

(a) The reported coefficients for risk aversion, the β's, and their standard errors are scaled up by 10^5. The omitted state in the fixed effects model is Wisconsin. See the text for the pairing of the φ's with states. Durbin–Watson statistics are very similar for the two columns. For the second column, the statistics are 1.594, .936, .999 and 1.353, respectively, for the four equations (financial, corn, soybeans, and wheat).
(b) For the J-test, p-values are reported in parentheses.
* Denotes statistical significance at the .01 level.
Table 2
Difference in excess returns under homogeneous preferences.

Coefficient   Differenced Model 1 (a): Fixed effects,   Differenced Model 1: Fixed effects,
              Panel White standard errors               Panel Newey–West standard errors
β             .000000865 (.000000562)                   .000001108 (.000000510)
α_C,F         −.0737* (.0209)                           −.05947 (.0150)
α_S,F         −.0468* (.0145)                           −.0368 (.0105)
α_W,F         −.0879* (.02294)                          −.0728 (.0173)
J-statistic   42.7996 (.000) (b)                        27.1752 (.018)

(a) Durbin–Watson statistics are very similar for the two columns. For the second column, they are 1.412, 1.4777, and 1.6874, respectively, for the corn, soybean, and wheat equations.
(b) For the J-test, p-values are reported in parentheses.
* Denotes statistical significance at the .01 level.

Table 3
Estimated arbitrage conditions with heterogeneous preferences.

Parameter     Model 2 (a):                       Model 2:
              Panel White standard errors        Panel Newey–West standard errors
α_F           .0991 (.0194)*                     .0941 (.0129)*
α_C           .0280 (.0061)*                     .0259 (.0042)*
α_S           .0575 (.0083)*                     .0552 (.0089)*
α_W           .0050 (.0031)                      .0039 (.0022)
β             .1236 (.0474)*                     .1334 (.0369)*
β_1           −.1534 (.0256)*                    −.1514 (.0204)*
β_2           −.1132 (.0028)*                    −.1239 (.0270)*
β_3           −.0189 (.0263)                     −.0329 (.0222)
β_4           −.0828 (.0183)*                    .0998 (.0126)*
β_5           −.0830 (.0266)*                    −.0864 (.0274)*
β_6           −.1005 (.0242)*                    −.0926 (.0169)*
β_7           −.1541 (.0329)*                    −.1519 (.0310)*
J-statistic   52.0327 (.041) (b)                 31.0758 (.702)

(a) The reported coefficients for risk aversion, the β's, and their standard errors are scaled up by 10^5. The omitted state in the heterogeneous risk aversion model is Wisconsin. Durbin–Watson statistics for the two columns are very similar. For column two, they are 1.5997, .9332, 1.0083 and 1.332, respectively, for the financial, corn, soybean, and wheat equations.
(b) For the J-test, p-values are reported in parentheses.
* Denotes statistical significance at the .01 level.
Subtracting the excess return for financial assets from the three crops defines excess returns in terms of the difference of returns on crops compared to the stock market. Given the unusual stock market period, we might expect negative intercepts. Table 2 presents the estimates for the fixed effects model with differencing. The estimated risk aversion values are similar but higher for HAC covariance estimates. However, $.1108 \times 10^{-5}$ is not statistically different from $.7050 \times 10^{-6}$; the t-value is 1.1649 under the null hypothesis of equality of the coefficients. Moreover, all the intercepts in Table 2 are negative, indicating that the arbitrage condition on average undershoots zero. Taking the absolute value of the coefficients, the largest is wheat, with soybeans slightly smaller than corn. Thus, the most significant negative departure from the arbitrage conditions is for wheat, which has significantly lower risk-adjusted returns than the market portfolio. Durbin–Watson statistics are also considerably improved.10 In summary, the estimates of the more parsimonious parametric structure in Table 2 seem commensurate with the results of fixed effects found in Table 1.

Turning to the heterogeneous preference structure of Model 2, Table 3 presents results for both White and Newey–West covariance estimation. Wisconsin is again omitted and all states are measured with $\beta$ plus the state-level coefficient for the other shifters.
10 Durbin–Watson statistics are computed as
$$d_i = \frac{\sum_{j=1}^{8}\sum_{t=2}^{9}\left(e_{i,j,t} - e_{i,j,t-1}\right)^{2}}{\sum_{j=1}^{8}\sum_{t=1}^{9} e_{i,j,t}^{2}}, \quad i = F, C, S, W.$$
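A direct implementation of this panel Durbin–Watson statistic, assuming the moment-equation residuals for one equation are stored as an 8-state by 9-year array (purely an illustrative layout), is:

```python
import numpy as np

def panel_durbin_watson(resid):
    """resid: array of shape (8, 9) holding e_{i,j,t} for one equation i,
    with states j on the rows and years t on the columns."""
    num = np.sum(np.diff(resid, axis=1) ** 2)   # sum over j and t of (e_t - e_{t-1})^2
    den = np.sum(resid ** 2)
    return num / den

# Sanity check: serially uncorrelated residuals give a statistic roughly near 2.
rng = np.random.default_rng(2)
print(panel_durbin_watson(rng.normal(size=(8, 9))))
```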
Thus, using the first column, for example, risk aversion is measured for Illinois as .000001236 − .000001534, which would suggest risk-loving preferences. Similarly, Ohio is estimated to have a negative level of risk aversion. However, the corresponding t-values for Illinois and Ohio, respectively, for a null hypothesis of risk neutrality using the White standard errors are −.5451 and −.5083. Hence, no clear evidence of risk preference is indicated for these states. Consistent with Table 1, the arbitrage conditions tend to overshoot zero. All the $\alpha$ coefficients are positive; however, the wheat alpha coefficient provides relatively weak evidence that it is not zero. Yet, the standard error is much lower under Newey–West standard errors, with a p-value of .074. Performing a nested test for homogeneity of risk preferences (with the null hypothesis $H_0: \beta_2 = \beta_3 = \cdots = \beta_8 = 0$ against a two-sided alternative) using a quasi-likelihood ratio test yields a statistic of 1.68, for which $TQ(\hat{\theta}) \to_D \chi^2(7)$. This provides virtually no support for rejecting the hypothesis of a single risk preference parameter (the upper tail p-value is approximately .95). Hence, little evidence supports further pursuit of Model 2 (although no completely correct nested or non-nested test of Model 1 versus Model 2 was done due to the paucity of the data).

Another issue of model performance is the incidence of positive marginal utilities. For Table 1, marginal utilities ranged from approximately .80 to 1.20 with a mean of approximately .98. The sample standard deviation was approximately .088. Thus, marginal utilities were positive as required, with considerable variation. Similar conclusions emerge from Table 2 as well.

Finally, we consider whether our estimates of risk aversion are credible in magnitude. Estimates of .00000071–.00000085 in Table 1 represent a lower level of risk aversion than measured by many studies. However, the level of wealth in our model is different than in many studies. Saha et al. (1994) in their Table 2 report several estimates of absolute risk aversion in and out of agriculture ranging from 0 to 14.75, with all but one study finding estimates above .0012. Many of these seem implausibly large and are certainly large in comparison with our estimates. For example, using the methodology of Rabin (2000), Just and Peterson (2003) have found the estimates of Saha, Shumway, and Talpaz, ranging from .0045 to .0083, to be implausibly high. To illustrate simply by equating $E(e^{-\beta X}) = e^{-\beta c}$, where $X$ is a random payoff and $c$ is a certain payoff, a person with absolute risk aversion equal to .2 would reject a bet with a 99% chance of $1,000,000 and a 1% chance of nothing in favor of a certain $24. Or, a person with absolute risk aversion of .01 would reject the bet in favor of a certain $461. Each of these has a risk premium over $989,000 on a nearly sure gain of $1,000,000. These are levels commonly estimated but appear to represent unreasonably cautious behavior. By comparison, our estimate of .00000085 in the first column of Table 1 compares to a certain $985,527 and a risk premium of $5,659 for this bet. This absolute risk aversion implies relative risk aversion from .188 to .661, with relative risk aversion of .352 at mean wealth. This is lower than most estimates in other settings. However, we believe these results are more plausible given the nature of agricultural risk and the self-selected choices of producers to enter agriculture. We suggest that an advantage of our arbitrage model is to avoid over-attribution of behavior to risk aversion, a proven weakness of some preceding approaches.

10. Conclusions

This article has considered asset choice for a rational, expected utility maximizing, and forward-looking producer/investor/consumer. After deriving the optimal conditions for choice by backward recursion, Euler equations for consumption and asset accumulation are derived. Of particular interest are the arbitrage
conditions for a point in time involving expected future returns on investment multiplied by the marginal utility of wealth. These conditions do not require the measurement of consumption but require reasonably good wealth measurements. The model is estimated using state per-farm aggregates for net worth, and a decade of data from the 1990s for the North Central region of US crop production as well as market and bond returns. Due to the relatively short time series, CARA risk preferences are specified. The risk preference parameter is statistically different from zero and estimated to be positive, indicating risk aversion. There is no evidence for segmented risk preferences where the risk preference parameter differs by asset. Further, the hypothesis of homogeneity of risk preferences against the hypothesis of state differences in risk preferences cannot be rejected. Thus, a single risk preference parameter rationalizes the data. However, evidence suggests that the standard arbitrage equilibrium does not hold, either due to aggregation errors or the short anomalous time period. Fixed effects are added to the econometric model. These are statistically significant deviations from the standard model of arbitrage equilibrium and may prove useful in other settings. Due to the annualized nature of agricultural crops, long time series comparable to off-farm financial returns are unlikely to be available for future research. However, a longer series of net worth or consumption at the disaggregated level or development of a continuous-time production approach, such as with some types of livestock, may offer fruitful possibilities to continue and improve on this line of research.

References

Aczel, J., 1966. Lectures on Functional Equations and Their Applications. Academic Publishers, New York.
Antle, J.M., 1987. Econometric estimation of producers' risk attitudes. American Journal of Agricultural Economics 69, 509–522.
Behrman, J.R., 1968. Supply Response in Underdeveloped Agriculture: A Case Study of Four Major Crops in Thailand, 1937–63. North-Holland, Amsterdam.
Carroll, C., 2001. Death to the log-linearized consumption Euler equation! (And very poor health to the second-order approximation). Advances in Macroeconomics 1, 1–36 (Article 6).
Estes, E.A., Blakeslee, L.L., Mittelhammer, R.C., 1981. On variances of conditional linear least-squares search parameter estimates. American Journal of Agricultural Economics 63, 141–145.
Hall, Robert E., 1978. Stochastic implications of the life cycle-permanent income hypothesis: Theory and evidence. Journal of Political Economy 86, 971–987.
Hall, A.R., 2005. Generalized Method of Moments. Oxford University Press, Oxford, UK.
Hansen, L.P., Singleton, K.J., 1983. Stochastic consumption, risk aversion, and the temporal behavior of asset returns. Journal of Political Economy 91, 249–265.
Holt, M.T., Aradhyula, S.V., 1998. Endogenous risk in rational-expectations commodity models: A multivariate generalized ARCH-M approach. Journal of Empirical Finance 5, 99–129.
Holt, M.T., Moschini, G., 1992. Alternative measures of risk in commodity supply models: An analysis of sow farrowing decisions in the United States. Journal of Agricultural and Resource Economics 17, 1–12.
Hoppe, R.A., Banker, D.E., 2006. Structure and finances of US farms: 2005 family farm report. Economic Information Bulletin No. EIB-12, Economic Research Service, USDA, Washington DC.
Just, R.E., 1974. An investigation of the importance of risk in farmers' decisions. American Journal of Agricultural Economics 56, 14–25.
Just, D.R., Peterson, H.H., 2003. Diminishing marginal utility of wealth and calibration of risk in agriculture. American Journal of Agricultural Economics 85, 1234–1241.
Kreps, D., Porteus, E., 1978. Temporal resolution of uncertainty and dynamic choice theory. Econometrica 46, 185–200.
Laibson, D., 1997. Golden eggs and hyperbolic discounting. Quarterly Journal of Economics 112, 443–477.
Lence, S., 2000. Using consumption and asset return data to estimate farmers' time preferences and risk attitudes. American Journal of Agricultural Economics 82, 934–947.
Lin, W., 1977. Measuring aggregate supply response under instability. American Journal of Agricultural Economics 59, 903–907.
Lin, W., Dean, G.W., Moore, C.V., 1974. An empirical test of utility vs. profit maximum in agricultural production. American Journal of Agricultural Economics 56, 497–508.
Lin, W., Dismukes, R., 2007. Supply response under risk: Implications for counter-cyclical payments' production impact. Review of Agricultural Economics 29, 64–86.
Love, A., Buccola, S., 1991. Joint risk preference-technology estimation in a primal system. American Journal of Agricultural Economics 73, 765–774.
Mehra, R., Prescott, E.C., 1985. The equity-premium puzzle. Journal of Monetary Economics 15, 145–161.
Meyer, J., Meyer, D.J., 2006. Measuring risk aversion. Foundations and Trends in Microeconomics 2, 107–203.
Mishra, A.K., Morehart, M.J., 2001. Off-farm investment of farm households: A logit analysis. Agricultural Finance Review 61, 87–101.
Muellbauer, J., 1974. Household production theory, quality, and the 'hedonic technique'. American Economic Review 64, 977–994.
Nerlove, M., 1956. Estimates of the elasticities of supply of selected agricultural commodities. Journal of Farm Economics 38, 496–509.
Newey, W., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R.F., McFadden, D.L. (Eds.), Handbook of Econometrics, vol. IV. North-Holland, New York, NY.
Newey, W.K., West, K.D., 1987. A simple positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Rabin, M., 2000. Diminishing marginal utility of wealth cannot explain risk aversion. In: Kahneman, D., Tversky, A. (Eds.), Choices, Values, and Frames. Cambridge University Press, New York.
Saha, A., Innes, R., Pope, R., 1993. Production and savings under uncertainty. International Review of Economics and Finance 2, 365–375.
Saha, A., Shumway, C.R., Talpaz, H., 1994. Joint estimation of risk preference structure and technology using the expo-power utility. American Journal of Agricultural Economics 76, 173–184.
Samuelson, P.A., 1966. The fundamental singularity theorem for non-joint production. International Economic Review 7, 34–41.
Sharpe, W.F., 1970. Portfolio Theory and Capital Markets. McGraw-Hill, New York.
Shiller, R.J., 1992. Market Volatility. MIT Press, Cambridge, MA.
Siegel, J.J., Thaler, R.H., 1997. Anomalies: The equity premium puzzle. Journal of Economic Perspectives 11, 191–200.
Journal of Econometrics 162 (2011) 44–54
The empirical relevance of the competitive storage model✩

Carlo Cafiero a,∗, Eugenio S.A. Bobenrieth H. b, Juan R.A. Bobenrieth H. c, Brian D. Wright d

a Università degli Studi di Napoli Federico II, Italy
b Universidad de Concepción, Chile
c Universidad del Bío-Bío, Chile
d University of California at Berkeley, USA

✩ Work on this paper was supported by USDA CSREES National Research Initiative Grant No. 2005-35400-15978, Energy Biosciences Initiative, CONICYT/Fondo Nacional de Desarrollo Cientifico y Tecnologico (FONDECYT) Projects 1090017 and 1050562, Dirección de Investigación Universidad de Concepción and Dirección de Investigación Universidad del Bío-Bío. Brian Wright is a member of the Giannini Foundation. We would like to thank two excellent anonymous referees for their advice, Elisabeth Sadoulet and Alain de Janvry for their support, and Betty Dow of the Development Prospects Group at the World Bank, for her help in providing the recent price data and resolving other data issues. We acknowledge the excellent assistance of Ernesto Guerra V.
∗ Corresponding author. E-mail address: [email protected] (C. Cafiero).

Article history: Available online 13 October 2009
Keywords: Autocorrelation; Commodity prices; Pseudo maximum likelihood; Simulation; Storage

Abstract: The empirical relevance of models of competitive storage arbitrage in explaining commodity price behavior has been seriously challenged in a series of pathbreaking papers by Deaton and Laroque (1992, 1995, 1996). Here we address their major criticism, that the model is in general unable to explain the degree of serial correlation observed in the prices of twelve major commodities. First, we present a simple numerical version of their model which, contrary to Deaton and Laroque (1992), can generate the high levels of serial correlation observed in commodity prices, if it is parameterized to generate realistic levels of price variation. Then, after estimating the Deaton and Laroque (1995, 1996) model using their data set, model specification and econometric approach, we show that the use of a much finer grid to approximate the equilibrium price function yields quite different estimates for most commodities. Results are obtained for coffee, copper, jute, maize, palm oil, sugar and tin that support the specifications of the storage model with positive constant marginal storage cost and no deterioration as in Gustafson (1958a). Consumption demand has a low response to price and, except for sugar, stockouts are infrequent. The observed magnitudes of serial correlation of price match those implied by the estimated model. © 2009 Elsevier B.V. All rights reserved.

1. Introduction

Commodity price risk has long been an important concern for consumers and producers, and the potential of storage for moderating such risks is widely recognized. In 1958, Gustafson (1958a,b) made a major contribution to the study of the relation between storage and price risk when he presented his model of the market for a storable commodity subject to random supply disturbances, anticipating the concept of rational expectations of Muth (1961). Gustafson's model showed that competitive intertemporal storage arbitrage can smooth the effects of temporary gluts and, when stocks are available, temporary shortages. Subsequent numerical models in the Gustafson tradition, including Johnson and Sumner
(1976), Gardner (1979), Newbery and Stiglitz (1981, Ch. 30) and Wright and Williams (1982), have confirmed that the qualitative features of the price behavior of some important commodities are consistent with the effects of such arbitrage. In addition, numerical storage models (for example, Park, 2006) can explain the key qualitative features of farmers’ economic behavior when they face high transaction costs, and the threat of hunger if local food crops fail and prices soar. The estimation of theoretically acceptable models of price smoothing by storage arbitrage, however, was delayed for decades by the absence of satisfactory time series of aggregate production and stocks for major commodities. Deaton and Laroque pioneered the empirical estimation of models of storage arbitrage, given such data limitations, by developing an estimation strategy that used only deflated price data, assuming a fixed interest rate and specifying the cost of storage as proportional deterioration of the stock. Their conclusions were discouraging regarding the contribution of storage models to our understanding of the nature of commodity price risk. They furnished a body of numerical and empirical evidence (Deaton and Laroque, 1992, 1995, 1996) against the ability of their model to explain commodity price behavior, nicely summarized by Deaton and Laroque (2003, p. 290): ‘‘[T]he speculative model, although capable of introducing some autocorrelation into an otherwise i.i.d. process, appears to be incapable of generating the high degree of serial correlation of most commodity prices.’’
Indeed they find the failure in this respect to be a general feature of the competitive storage model, rather than a question of whether their specifications could yield high correlations that are consistent with the data (Deaton and Laroque, 1995, p. S28). In this paper we re-assess the relevance of speculative storage in explaining commodity price behavior. To do so, we must first address the claim that the inability to match the high correlations observed in commodity price data is a general feature of the models, regardless of the parameterization. One set of evidence presented by Deaton and Laroque consists of simulations of various numerical specifications of the model (Deaton and Laroque, 1992, p. 11), all of which fail to generate sufficiently high autocorrelation. We demonstrate that even their high variance simulation model with linear consumption demand, like key illustrative examples in Gustafson (1958a,b), Gardner (1979), and Williams and Wright (1991), fails to generate as much price variation as observed for the commodities they consider. With a less price-sensitive consumption demand curve, we show that storage can generate in their model levels of sample correlations and variation of price in the ranges observed for a number of major commodities. Thus the relevance of the storage model is re-established as an empirical question. Our numerical examples assume no storage cost apart from interest. It is clear that very high decay rates for stored commodities, such as those estimated by Deaton and Laroque (1995, 1996) (ranging from 6% to 18% per annum), would greatly reduce the correlations produced in our numerical examples, and make it less likely that storage would in fact induce the high correlations observed in price. A brief review of information on storage costs for some commodities and time periods yields no cases consistent with such high decay rates. Indeed the evidence in general points to a specification presented in Gustafson (1958a), with positive constant marginal storage cost. Using the econometric approach of Deaton and Laroque (1995, 1996) and the same dataset of 13 commodity prices,1 we move on to estimation. First, we re-evaluate the empirical results of the PML estimates of Deaton and Laroque (1995, 1996) for the case of i.i.d. production. Using a model based on our understanding of their empirical model and its implementation, we replicate the results for most commodities quite accurately, including the very high decay rates estimated by Deaton and Laroque. However, investigation of their estimation procedure reveals their fit of the price function to be unsatisfactory, due to use of insufficient grid points in approximating the price functions through splines. Re-estimation with finer grids yields quite different estimates, with the estimated decay cost of storage reduced or eliminated when the number of grid points is substantially increased, for most commodities. Simulations based on the models estimated with the finer grids reveal that, for five commodities (coffee, copper, maize, palm oil and sugar), the observed value of first-order correlation of prices lies within their symmetric 90% confidence regions. We then estimate a model that allows for a fixed positive marginal cost of storage, as well as for the possibility of positive deterioration of stocks, which therefore nests the model of Deaton and Laroque (1995, 1996). We obtain results for seven commodities: coffee, copper, jute, maize, palm oil, sugar, and tin. 
The estimates indicate a fixed positive marginal storage cost with no deterioration, providing empirical support to the specification used in Gustafson (1958a). Simulations based on each of our estimates using this specification produce sample distributions of the first- and second-order autocorrelation that include observed values within the 90% symmetric confidence regions. Estimation using the alternate 2% real interest rate assumed by Gustafson (1958a) shows even better matches of mean prices, predicted autocorrelations, and coefficients of variation with the observed data. Thus we have established that competitive storage can generate the high levels of autocorrelation observed for the prices of major commodities. Further, the application of Deaton and Laroque's econometric approach, modified to improve its numerical accuracy, using their own data set, can yield empirical results that are consistent with observed levels of price variation and autocorrelation for seven major commodities.

1 The commodities are bananas, cocoa, coffee, copper, cotton, jute, maize, palm oil, rice, sugar, tea, tin and wheat. The original price indexes, attributed to World Bank sources, and a series for the United States Consumer Price Index, are available on-line at http://qed.econ.queensu.ca/jae/1995-v10.S/deaton-laroque/. The data reported as the US CPI for the period 1900–1913 appear to be from the deflator presented in Rees (1961).

2. Can storage generate high serial correlation?

We begin by focusing on a preliminary question: can a simple storage model with i.i.d. production disturbances generate price autocorrelations that are similar to those observed in time series for major commodities? To address this question, we consider specifications of the storage model that are special cases of models presented in Gustafson (1958a), and Deaton and Laroque (1992). Production is given by an i.i.d. sequence ω_t (t ≥ 1) with bounded support. The available supply at time t is z_t ≡ ω_t + x_{t−1}, where x_{t−1} ≥ 0 are stocks carried from time t − 1 to time t. Consumption c_t is the difference between available supply z_t and stocks x_t carried forward to the next period. The inverse consumption demand F(c) is strictly decreasing. There is no storage cost apart from an interest rate r > 0. Storage and price satisfy the arbitrage conditions:

x_t = 0,   if (1 + r)^{−1} E_t p_{t+1} < p_t,
x_t ≥ 0,   if (1 + r)^{−1} E_t p_{t+1} = p_t,
where pt represents the price at time t, and Et is the expectation conditional on information at time t. The above complementary inequalities are consistent with profit-maximizing speculation by risk-neutral price-takers. To investigate whether there exist, within the parameter space of the model, specifications that yield price behavior characteristic of observed commodity markets, one can solve the model for each of a set of parameterizations by numerical approximation of the equilibrium price function, and then derive by numerical methods the implications for time series of price behavior. In the numerical approximations of Deaton and Laroque (1992, Table 2, p. 11), the highest autocorrelation of price that they report is produced by a specification that they denote the ‘‘high-variance case’’, which matches an example in Williams and Wright (1991, pp. 59–60), with no deterioration or other physical storage cost, r = 0.05, linear inverse consumption demand, F (c ) = 600−5c, and production realizations drawn from a discrete approximation to the normal distribution (with mean 100 and standard deviation 10). This case implies a price autocorrelation of 0.48, far below the sample correlations calculated from the 88-year time series of prices of 13 commodities (bananas, cocoa, coffee, copper, cotton, jute, maize, palm oil, rice, sugar, tea, tin, and wheat as listed in Table 1) which are all in excess of 0.62. They conclude that perhaps the autocorrelation observed in commodity prices needs to be explained by phenomena other than storage (Deaton and Laroque, 1992, page 19). Our solution of the storage model for the same specification, when simulated for 100,000 periods, yields first- and second-order autocorrelations of prices, over this long sample, of 0.47 and 0.31. These values are close to those obtained by Deaton and Laroque (1992) for the invariant distribution (0.48 and 0.31, respectively).
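The numerical exercise just described is easy to reproduce in outline. The sketch below is a rough illustration under our own assumptions, not the authors' code: it solves the arbitrage conditions above for the high-variance case by successive approximation of the price function on a grid with linear interpolation, simulates a long price series, and computes its sample autocorrelations and coefficient of variation. The grid limits, the ten-point discretization of the supply shock and the convergence tolerance are our own choices; the resulting statistics can then be compared with the long-run values reported above.

import numpy as np

# Illustrative parameters for the "high-variance case" discussed above:
# inverse consumption demand F(c) = a + b*c, interest rate r, i.i.d. supply
# shocks with mean 100 and standard deviation 10.
a, b, r = 600.0, -5.0, 0.05

# Ten-point discretization of the shock (conditional means of equiprobable
# intervals of the standard normal; the same discretization appears in the
# estimation section later in the paper).
std_normal_points = np.array([-1.755, -1.045, -0.677, -0.386, -0.126,
                              0.126, 0.386, 0.677, 1.045, 1.755])
omega = 100.0 + 10.0 * std_normal_points
probs = np.full(len(omega), 1.0 / len(omega))

F = lambda c: a + b * c          # inverse consumption demand
F_inv = lambda p: (p - a) / b    # consumption implied by a price

# Successive approximation of the equilibrium price function p(z) on a grid
# of availabilities z (linear interpolation between grid points).
z_grid = np.linspace(omega.min(), omega.max() + 150.0, 1000)
p = F(z_grid)                    # starting guess: no storage
for _ in range(5000):
    x = np.maximum(z_grid - F_inv(p), 0.0)          # stocks implied by p
    z_next = omega[:, None] + x[None, :]            # next-period availability
    expected_price = probs @ np.interp(z_next, z_grid, p)
    p_new = np.maximum(expected_price / (1.0 + r), F(z_grid))
    if np.max(np.abs(p_new - p)) < 1e-8:
        break
    p = p_new

# Simulate a long price series and report its sample statistics.
rng = np.random.default_rng(0)
T = 100_000
shocks = rng.choice(omega, size=T, p=probs)
prices = np.empty(T)
stocks = 0.0
for t in range(T):
    z = shocks[t] + stocks
    prices[t] = np.interp(z, z_grid, p)
    stocks = max(z - F_inv(prices[t]), 0.0)

def autocorr(series, lag):
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

print(autocorr(prices, 1), autocorr(prices, 2), prices.std() / prices.mean())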
Fig. 1. Price characteristics implied by the storage model with linear inverse consumption demand F(c) = 600 − 5c and production realizations drawn from a normal distribution with mean 100 and standard deviation 10, truncated at five standard deviations from the mean.

Table 1
Variation and correlation in the commodity price time series (1900–1987).

Commodity    First-order autocorrelation    Second-order autocorrelation    Coefficient of variation
Bananas      0.92                           0.83                            0.17
Cocoa        0.84                           0.66                            0.54
Coffee       0.81                           0.61                            0.45
Copper       0.85                           0.67                            0.38
Cotton       0.88                           0.69                            0.34
Jute         0.71                           0.45                            0.33
Maize        0.75                           0.54                            0.38
Palm oil     0.72                           0.48                            0.48
Rice         0.84                           0.63                            0.36
Sugar        0.62                           0.39                            0.60
Tea          0.80                           0.64                            0.26
Tin          0.89                           0.75                            0.42
Wheat        0.86                           0.68                            0.38
In order to assess the implications of the model for samples of the same length as those of the observed commodity price series used for this paper, we take successive samples of size 88 from the simulated series, the first starting from period t = 1, the second from period t = 2, and so on, and measure the autocorrelation and coefficient of variation for each of them. Fig. 1 shows histograms of simulated sample first-order correlations and coefficients of variation for this exercise. The median of the first-order autocorrelations is 0.45. The 90th percentile is 0.61, a little below the lowest value in the commodity price series, which is 0.62, for sugar. For all twelve others in Table 1, the values are above 0.7, the 98.5 percentile of the distribution of simulated values; it is clear that the example does not match the data for these others at all well. The same criticism applies to many of the other examples in Wright and Williams (1982), and Williams and Wright (1991), with similar specifications. However this ‘‘high-variance case’’ has another problem. It does not generate sufficient price variation to match the values for most of the commodities in the 88-year samples. The long run estimate of the coefficient of variation of price is 0.25, half its value when storage is not possible. The coefficients of variation for the time series of prices of all the commodities in Table 1 but bananas and tea lie above the 98th percentile of the distribution of sample values generated from simulation of this numerical model. It is clear that this specification, and the others considered in Gustafson (1958a,b), Gardner (1979), Wright and Williams (1982), Williams and Wright
(1991) and Deaton and Laroque (1992, Table 2, p. 11), in fact imply lower price variation than observed in major commodity markets. Although it is conceivable that variation in production has been substantially underestimated, it appears more likely that the consumption demand functions specified in the numerical models, with price elasticities (at consumption equal to mean production) in the range −0.5 to −0.1, are more sensitive to price than are consumption demands in the markets we consider.2 Hence the simulations exhibit too little storage, too many stockouts, and consequently values for price variation and serial correlation that are too low to match those observed in the time series of prices of major commodities. To increase the price variation in the model, we rotate the linear consumption demand around its mean, changing its price elasticity at that point from −0.2 to −0.067, not, a priori, an unreasonable value for the demand for a basic commodity. Once again we solve the model and generate a simulated sample of 100,000 periods. The results are presented in Fig. 2. The median of the sample coefficients of variation derived from this numerical exercise is 0.46, quite close to the observed values for many of the commodities. Only bananas and tea have values less than the 5th percentile of the generated sample distribution. The median of the distribution of sample first-order correlations generated by simulation is 0.60. The values for six commodities (coffee, jute, maize, palm oil, sugar, and tea) lie between 5th and the 95th percentiles. Figs. 1 and 2 together show that tripling the price variation that would occur without storage leads to sufficiently greater arbitrage that the median price variation only doubles. The greater arbitrage is also reflected in much higher serial correlation. The simulations discussed above favor storage and high serial correlation by assuming no storage cost other than interest charges. But physical storage costs are not in general zero. Before moving to a discussion of estimation of the model, we discuss the choice of storage cost specification for the estimated model.
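The sampling experiment behind Figs. 1 and 2 amounts to computing the statistics of every 88-period window of the simulated series. A minimal sketch, assuming a prices array such as the one generated in the earlier sketch (again our own illustration, not the authors' code):

import numpy as np

def rolling_sample_stats(prices, window=88):
    # First-order autocorrelation and coefficient of variation for every
    # window of `window` consecutive simulated prices.
    ac1, cv = [], []
    for start in range(len(prices) - window + 1):
        s = prices[start:start + window]
        ac1.append(np.corrcoef(s[:-1], s[1:])[0, 1])
        cv.append(s.std() / s.mean())
    return np.array(ac1), np.array(cv)

# ac1, cv = rolling_sample_stats(prices)
# np.percentile(ac1, [5, 50, 90, 95]); np.percentile(cv, [5, 50, 98])

The percentiles of these simulated sampling distributions are what the observed values in Table 1 are compared against.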
2 Choice of the appropriate demand elasticity is a challenge, due to the difficulty in empirically distinguishing the consumption and storage demand responses. This problem is noted by Gardner (1979) in his discussion of the finding of Hillman et al. (1975) that the wheat demand elasticity is smaller at higher prices.
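For concreteness, the rotation described above works out as follows (our own arithmetic, using the stated elasticities and the mean point c = 100, p = 100):

ε = (dc/dp)(p/c) = (1/b)(p/c), so at (c, p) = (100, 100):
ε = −0.2 implies b = −5 (the demand F(c) = 600 − 5c of Fig. 1);
ε = −0.067 ≈ −1/15 implies b = −15;
rotating around (100, 100) then gives F(c) = 100 − 15(c − 100) = 1600 − 15c,
which triples the standard deviation of the no-storage price F(ω) from 50 to 150.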
Fig. 2. Price characteristics implied by the model with linear inverse consumption demand F(c) = 1600 − 15c and production realizations drawn from a normal distribution with mean 100 and standard deviation 10, truncated at five standard deviations from the mean.
3. The cost of storage

Gustafson (1958a,b) and much of the subsequent literature (including Johnson and Sumner, 1976; Newbery and Stiglitz, 1979, chapter 29; Wright and Williams, 1982; Miranda and Helmberger, 1988 and Williams and Wright, 1991) focus on models where the marginal physical cost of storage is constant. In contrast, Samuelson (1971) and Deaton and Laroque (1992, 1995, 1996) specify the storage cost as a constant proportional deterioration or shrinkage of the stock. This implies that, since the price is decreasing in stocks, the marginal cost of storage is high when stocks are low. The fees for storage in public warehouses might be considered to be upper bounds on annual storage costs. They furnish some evidence regarding the choice between these cost specifications. When a commodity such as a grain or a metal is deposited in a warehouse, the warehouse receipt specifies the grade and quantity, and the depositor receives the right to withdraw later an equal quantity of the same grade. Any shrinkage or other deterioration is implicitly covered in the storage fee. There is evidence for some commodities that, within the sample period, the fee for storing one unit of commodity per unit of time has remained constant and independent of price movements over substantial time intervals.3 This suggests that the cost of deterioration, which is proportional to the value, might be too small to justify price-contingent storage fees. To allow for this possibility, in our empirical model we specify the cost of storage to include both a fixed marginal physical cost and non-negative deterioration.4

3 For example, Holbrook Working reports that daily charges for wheat storage in public elevators in Chicago were constant from December 1910 through December 1916 (Working, 1929, p. 22). A detailed analysis of the cost of storing a number of major commodities around the decade of the 1970s, when prices were highly volatile, is found in UNCTAD (1975). The reported costs are not presented as contingent on the commodity prices. Where relevant, costs of rotation of stocks to prevent deterioration are explicitly recognized. Williams (1986, pp. 213–214) reports that for cocoa, which spoils more easily than major grains, warehouse storage fees in New York stayed around $5 per ton per month from 1975 through 1984 while the cocoa price fluctuated wildly, between $1063 and $4222 per ton. In Oklahoma, Texas, Arkansas and Kansas, public elevators charge the same fees per bushel for several grains, and these fees, which implicitly cover any shrinkage or deterioration, remain constant for considerable periods of time. For example, in Oklahoma, the grain storage cost per bushel was 2.5 cents per month from 1985 through 2000 (Anderson, 2005).
4 Like Deaton and Laroque (1992, 1995, 1996), we ignore the cost of initially placing the commodity in a warehouse, and the cost of withdrawal. Implications of the costs of withdrawal for commodity prices are explored in Bobenrieth et al. (2004).

4. The model and the estimation procedure

We model a competitive commodity market with constant, strictly positive marginal and average storage cost and proportional deterioration. All agents have rational expectations. Supply shocks ω_t are i.i.d., with support in R that has lower bound ω ∈ R. Storers are risk neutral and have a constant discount rate r > 0. Stocks physically deteriorate at rate d, with 0 ≤ d < 1, and the cost of storing x_t ≥ 0 units from time t to time t + 1, paid at time t, is given by k x_t, with k > 0. The state variable z_t is the total available supply at time t, z_t ≡ ω_t + (1 − d)x_{t−1}, with z_t ∈ Z ≡ [ω, ∞[. The inverse consumption demand, F : R → R, is continuous, strictly decreasing, with {z : F(z) = 0} ≠ ∅, lim_{z→−∞} F(z) = ∞, and (1 − d)/(1 + r) E F(ω_t) − k > 0, where E denotes the expectation taken with respect to the random variable ω_t. A stationary rational expectations equilibrium (SREE) is a price function p : Z → R which describes the current price p_t as a function of the state z_t, and satisfies, for all z_t,

p_t = p(z_t) = max{ (1 − d)/(1 + r) E_t p(ω_{t+1} + (1 − d)x_t) − k, F(z_t) },   (1)

where

x_t = z_t − F^{−1}(p(z_t)).   (2)

Since the ω_t's are i.i.d., p is the solution to the functional equation

p(z) = max{ (1 − d)/(1 + r) E p(ω + (1 − d)x(z)) − k, F(z) },

and x(z) = z − F^{−1}(p(z)). The existence and uniqueness of the SREE, as well as some properties, are given by the following theorem:
Theorem. There is a unique stationary rational expectations equilibrium p in the class of continuous non-increasing functions. Furthermore, for p∗ ≡ (1 − d)/(1 + r) E p(ω) − k,

p(z) = F(z), for z ≤ F^{−1}(p∗),
p(z) > F(z), for F^{−1}(p∗) < z.

p is strictly decreasing. The equilibrium level of inventories, x(z), is strictly increasing for z > F^{−1}(p∗).

Our proof of this theorem follows the same structure as the proof of Theorem 1 in Deaton and Laroque (1992).5 We estimate the model described in this section assuming that the inverse consumption demand is F(c) = a + bc, where c is consumption, using the pseudo-likelihood maximization procedure of Deaton and Laroque (1995, 1996).6 First, we choose values ω^n_{t+1} and Pr(ω^n_{t+1}) to discretize the standard normal distribution,7 so that condition (1) can be expressed as

p_t = p(z_t) = max{ (1 − d)/(1 + r) Σ_{n=1}^{N} p(ω^n_{t+1} + (1 − d)x_t) Pr(ω^n_{t+1}) − k, a + b z_t }.   (3)

Next, we solve (3) numerically by approximating the function p with cubic splines on a grid of points over a suitable range of values of z_t, imposing the restriction represented by (2). Then, using the approximate SREE price function p, we calculate the first two moments of p_{t+1} conditional on p_t:

m(p_t) = Σ_{n=1}^{N} p(ω^n_{t+1} + (1 − d)[p^{−1}(p_t) − F^{−1}(p_t)]) Pr(ω^n_{t+1}),

s(p_t) = Σ_{n=1}^{N} [p(ω^n_{t+1} + (1 − d)[p^{−1}(p_t) − F^{−1}(p_t)])]^2 Pr(ω^n_{t+1}) − m^2(p_t).

To match the prediction of the model with the actual price data, we form the logarithm of the pseudo-likelihood function as

ln L = Σ_{t=1}^{T−1} ln l_t = 0.5 [ −(T − 1) ln(2π) − Σ_{t=1}^{T−1} ln s(p_t) − Σ_{t=1}^{T−1} (p_{t+1} − m(p_t))^2 / s(p_t) ].   (4)

Keeping the interest rate fixed, we maximize the log pseudo-likelihood function (4) with respect to the vector of parameters θ̃ ≡ {a, b̃, d̃, k̃}, where b = −e^{b̃}, d = e^{d̃}, and k = e^{k̃}. The transformation is used to impose the restrictions b < 0, d > 0, and k > 0. Even though (4) is not the true log-likelihood (in the presence of storage, prices will not be distributed normally), the estimates are consistent (Gourieroux et al., 1984).

To estimate the variance–covariance matrix of the vector of original parameters θ ≡ {a, b, d, k}, we first obtain a consistent estimate of the variance–covariance matrix of the parameters θ̃ by forming the following expression:

Ṽ = J^{−1} G′ G J^{−1},

where the matrices J and G have typical elements

J_{i,j} = ∂² ln L / (∂θ̃_i ∂θ̃_j)   and   G_{t,i} = ∂ ln l_t / ∂θ̃_i,

calculated by taking numerical derivatives8 of the log-pseudo-likelihood, ln L, and of its components, ln l_t, all evaluated at the point estimates of the parameters θ̃ (see Deaton and Laroque, 1996, Eq. 18). A consistent estimate of the variance–covariance matrix of the original parameters θ is obtained using the delta method as

V = D̃ Ṽ D̃′,

where D̃ is a diagonal matrix of the derivatives of the transformation functions:

D̃ = diag(1, −e^{b̃}, e^{d̃}, e^{k̃}).

5 When there is a constant additive positive marginal storage cost, equilibrium price realizations can be negative. Recognition of free disposal avoids this problem. A proof of a version of the theorem for a model with positive marginal storage cost, possibly unbounded realized production, and free disposal is available from the authors.
6 We are grateful to Angus Deaton for sending us their estimation code. Based on this generous assistance, we developed our MATLAB code drawing on our interpretation of the original code, which was, quite understandably, not documented for third-party use. We added code for the estimation of standard errors.
7 In practice, as in Deaton and Laroque (1995, 1996), ω^n_{t+1} is restricted to take one of the conditional means of N = 10 equiprobable intervals of the standard normal distribution, ±1.755, ±1.045, ±0.677, ±0.386, ±0.126. The restrictions of zero mean and unit variance for the distribution of the supply shocks are imposed to identify the model (see Deaton and Laroque, 1996, Proposition 1, p. 906).
8 All numerical derivatives are obtained with a MATLAB routine coded following Miranda and Fackler (2002, pp. 97–104).

Table 2
Our replication of the estimates of Deaton and Laroque (1995, 1996).

Commodity    Parameters^a                                                 PL
             a                  b                   d
Cocoa        0.1612 (0.0103)    −0.2190 (0.0326)    0.1154 (0.0405)       124.6209
Cocoa^b      0.1412 (0.0167)    −0.2228 (0.0260)    0.0550 (0.0345)       129.9174
Coffee       0.2620 (0.0215)    −0.1617 (0.0261)    0.1360 (0.0191)       112.0541
Copper       0.5447 (0.0348)    −0.3268 (0.0536)    0.0687 (0.0189)       74.0137
Cotton       0.6410 (0.0338)    −0.3131 (0.0341)    0.1685 (0.0280)       29.8815
Jute         0.5681 (0.0269)    −0.3624 (0.0565)    0.0933 (0.0510)       45.2556
Maize        0.5800 (0.0468)    −0.9619 (0.1549)    0.0122 (0.0322)       37.0061
Palm oil     0.4618 (0.0510)    −0.4288 (0.0601)    0.0579 (0.0282)       22.1912
Rice         0.5979 (0.0262)    −0.3358 (0.0294)    0.1471 (0.0389)       26.0648
Sugar        0.6451 (0.0471)    −0.6240 (0.0656)    0.1790 (0.0308)       −10.7309
Tea          0.4762 (0.0174)    −0.2156 (0.0251)    0.1190 (0.0329)       69.6786
Tin          0.2531 (0.0433)    −0.1728 (0.0482)    0.1441 (0.0514)       110.1603
Wheat        0.6358 (0.0381)    −0.4236 (0.0322)    0.0575 (0.0240)       28.5261
Wheat^c      1.0711 (0.1112)    −1.0403 (0.5006)    0.0936 (0.0713)       10.5416

For all commodities but cocoa and wheat, we use the same grid limits and sizes as in Deaton and Laroque (1995, Table I). For cocoa, we replicate Deaton and Laroque's estimates with 21 grid points instead of 20 and for wheat the lower limit is set at −5 rather than −3.
a Asymptotic standard errors in parentheses. PL is the value of the maximized log-pseudo-likelihood.
b Estimates for a grid of 20 points.
c Estimates for a lower limit of the grid of −3.
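The solution and estimation steps described in this section can be sketched compactly in code. The block below is a simplified illustration under our own choices (linear interpolation instead of the cubic splines used by the authors, and none of their MATLAB implementation details): it solves the functional equation for the price function at a candidate parameter vector and then evaluates the conditional moments m(p_t), s(p_t) and the log pseudo-likelihood (4) for an observed price series.

import numpy as np

def solve_price_function(a, b, d, k, r, omega, probs, z_grid,
                         tol=1e-8, max_iter=5000):
    # Successive approximation of the SREE price function p(z) on z_grid,
    # for inverse demand F(c) = a + b*c, deterioration rate d, marginal
    # storage cost k and interest rate r (cf. the functional equation above).
    F = lambda c: a + b * c
    F_inv = lambda p: (p - a) / b
    beta = (1.0 - d) / (1.0 + r)
    p = F(z_grid)                                   # no-storage starting guess
    for _ in range(max_iter):
        x = np.maximum(z_grid - F_inv(p), 0.0)      # x(z) = z - F^{-1}(p(z))
        z_next = omega[:, None] + (1.0 - d) * x[None, :]
        expected_price = probs @ np.interp(z_next, z_grid, p)
        p_new = np.maximum(beta * expected_price - k, F(z_grid))
        if np.max(np.abs(p_new - p)) < tol:
            break
        p = p_new
    return p

def log_pseudo_likelihood(prices, a, b, d, k, r, omega, probs, z_grid):
    # Gaussian pseudo-likelihood of the observed price series, Eq. (4).
    p_fun = solve_price_function(a, b, d, k, r, omega, probs, z_grid)
    F_inv = lambda p: (p - a) / b
    # p(.) is decreasing, so invert it by interpolating on reversed arrays.
    p_inverse = lambda price: np.interp(price, p_fun[::-1], z_grid[::-1])

    pt, pt1 = prices[:-1], prices[1:]
    stocks = np.maximum(p_inverse(pt) - F_inv(pt), 0.0)    # x_t implied by p_t
    z_next = omega[:, None] + (1.0 - d) * stocks[None, :]
    p_next = np.interp(z_next, z_grid, p_fun)

    m = probs @ p_next                      # m(p_t) = E[p_{t+1} | p_t]
    s = probs @ p_next**2 - m**2            # s(p_t) = Var[p_{t+1} | p_t]
    s = np.maximum(s, 1e-12)                # numerical guard
    n = len(pt)
    return 0.5 * (-n * np.log(2.0 * np.pi) - np.log(s).sum()
                  - ((pt1 - m) ** 2 / s).sum())

# Estimation then maximizes this function over (a, b_tilde, d_tilde, k_tilde),
# with b = -exp(b_tilde), d = exp(d_tilde), k = exp(k_tilde), so that the
# restrictions b < 0, d > 0 and k > 0 hold automatically.

A numerical optimizer (for instance scipy.optimize.minimize) can be used for the maximization, and the sandwich estimator Ṽ = J^{−1}G′GJ^{−1} together with the delta method described above then gives the standard errors.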
Fig. 3. Sugar: Implications of grid density for numerical approximation of the equilibrium price function.
5. Data and empirical results Our initial data set, which is identical to that reported by Deaton and Laroque (1995), consists of a widely used set of commodity price indices, deflated by the United States Consumer Price Index, for bananas, cocoa, coffee, copper, cotton, jute, maize, palm oil, rice, sugar, tea, tin, and wheat for the period 1900–1987, with features summarized in Table 1.9 5.1. Replication of the PML results of Deaton and Laroque To check our estimation routine, we first estimate the model with k = 0, adopting parameterizations and grid specifications of Deaton and Laroque (1995, 1996), assuming the same interest rate, 5%. As shown in Table 2, we essentially replicate the point estimates of the parameters for 10 of the 13 commodities. Like Deaton and Laroque, we were unable to obtain an estimate for bananas, and do not consider this commodity further. For another two commodities, maize and wheat, our estimates have higher pseudolikelihood values, and lower estimates of the rate of deterioration. 5.2. Estimation of the constant-decay model with a finer grid In considering the estimation procedure, we have been concerned that the use of cubic splines to approximate the function p in the region of zero inventories might induce non-negligible errors if the grid is sparse, due to the fact that p is kinked (see Michaelides and Ng, 2000, p. 243; Cafiero, unpublished).10 To investigate the extent of the approximation error, we first solve the numerical model with the grid sizes and limits used by Deaton and Laroque (1995), and then with a much finer grid of 1000 points with the same limits. In both numerical exercises we assume a linear inverse consumption demand, F (c ) = a + bc, with parameters
9 For sources of these data see footnote 1. 10 Deaton and Laroque use spline smoothing, to obtain faster convergence of their numerical algorithm. See for example Deaton and Laroque (1995, p. S26).
a = 0.645, b = −0.624, and decay rate d = 0.179.11 Fig. 3 shows the effect of the change in grid size on the accuracy of approximation of the price function. Notice that the fine grid of 1000 points allows for clear identification of the kink in the price function, which occurs at a price equal to p∗ , and that the inaccuracy of the approximation of the price function with a sparse grid is especially large around that point, within a range where many prices are observed. This affects the accuracy of the evaluation of the pseudo-likelihood function, which makes use of the approximated price function to map from the observed price to the implied availability (see for example Eqs. 41 and 43 in Deaton and Laroque (1995)).12 To assess the extent of the effect induced by the approximation error on the estimation, we experiment by estimating the model for various numbers of grid points, on the presumption that a finer grid would reduce the errors associated with the spline approximation of the price function. The results of this experiment are reported in Table 3 for cotton and sugar. The estimates appear to become robust to the number of grid points only when the grid is sufficiently fine; 1000 grid points appears to be adequate. Using 1000 grid points, we are unable to obtain estimates for rice, tin and wheat, while for sugar we identify two maxima of the pseudo-likelihood (we report the maximum with the higher pseudo-likelihood value). For the other commodities, we find only one well-behaved maximum of the pseudo-likelihood. Increasing the number of grid points to 1000 decreases the point estimate of the depreciation rate substantially for every commodity, with the exception of tea (see Table 4).
11 These are the values obtained in our replication of the estimates of Deaton and Laroque for sugar, reported in Table 2. 12 For prices above the kink point, the implied levels of stock, i.e. the difference between implied availability and consumption, should be zero. Use of the smoother function in Fig. 3 would predict negative stocks. The effect of this appears to be reflected in Fig. 7, and in the dotted line of Fig. 9 of Deaton and Laroque (1995) that represents the predictions from their estimation of the i.i.d. storage model. Such a prediction should coincide with p∗ (1 + r )/(1 − d) whenever current price is above p∗ .
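The approximation problem discussed here is easy to see in a toy example: fit a cubic spline to a kinked, decreasing function on a sparse grid and on a fine grid, and compare the errors near the kink. The function and the grids below are purely illustrative; they are not the estimated price function.

import numpy as np
from scipy.interpolate import CubicSpline

# A toy kinked, decreasing curve (two linear branches meeting at z = 1),
# standing in for the equilibrium price function near its kink at p*.
f = lambda z: np.maximum(2.0 - z, 1.5 - 0.5 * z)

z_eval = np.linspace(0.0, 3.0, 2001)
for n_points in (10, 1000):            # sparse versus fine grid
    grid = np.linspace(0.0, 3.0, n_points)
    spline = CubicSpline(grid, f(grid))
    error = np.abs(spline(z_eval) - f(z_eval))
    print(n_points, "grid points, maximum absolute error:", error.max())

The spline fitted on the sparse grid noticeably misses the function around the kink, the region where many observed prices fall in the sugar example of Fig. 3, while the error on the fine grid is negligible.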
Table 3 Estimation of Deaton and Laroque models for varying grid size. Grid size
a
b
d
Cotton 10 19 37 73 145 577 1000 1153
0.6410 0.6343 0.6219 0.5716 0.5301 0.5292 0.5295 0.5311
−0.3131 −0.3281 −0.3560 −0.4366 −0.5191 −0.5123 −0.5133 −0.5114
0.1685 0.1515 0.1254 0.0805 0.0462 0.0478 0.0478 0.0485
Sugar 10 19 37 73 145 289 577 1000 1153 1500
0.6451 0.2296 0.2588 0.2436 0.2491 0.2514 0.2535 0.2545 0.2546 0.2521
−0.6240 −1.2345 −1.2874 −1.2615 −1.2722 −1.2742 −1.2650 −1.2650 −1.2666 −1.2670
0.1790 0.0000 0.0000 0.0000 0.0000 0.0003 0.0016 0.0020 0.0021 0.0006
Table 5 Grids used in the estimation. PL 29.8815 28.3221 28.4064 29.5948 29.6861 29.6761 29.6761 29.6783
−10.73 −6.745 −6.978 −6.815 −6.7660 −6.788 −6.791 −6.785 −6.783 −6.774
Other than for 1000 points, from one step to the next the number of grid points has been changed to increase the number of grid nodes without affecting the position of the existing ones, to avoid introducing further instabilities in the pseudo-likelihood maximization routine. The previous estimates are used as starting values for the estimates using the next grid size.
Commodity
Minimum z
Maximum z
Points
Coffee Copper Jute Maize Palm oil Sugar Tin
−5 −5 −5 −5 −5 −5 −5
30 40 30 40 30 20 45
1000 1000 1000 1000 1000 1000 1000
Table 6 Estimation of the constant marginal storage cost model (r = 0.05). Commodity
Coffee Copper Jute Maize Palm oil Sugar Tina
Table 4 Estimation of the Deaton and Laroque model with fine grids of 1000 points. Commodity
a
b
d
PL
Cocoa Coffee Copper Cotton Jute Maize Palm oil Sugar Tea
0.1276 0.6804 1.0482 0.5295 0.5572 1.3842 1.0975 0.2545 0.5108
−0.2651 −6.4599 −2.9135 −0.5133 −0.5738 −6.4838 −5.5795 −1.2650 −0.1687
0.0520 0.0 0.0 0.0478 0.0360 0.0 0.0 0.0020 0.1554
118.814 131.722 96.798 29.676 38.599 41.425 65.155 −6.785 63.865
The effects of use of a finer grid for function approximation on the estimation results are illustrated in Fig. 4, taking sugar as an example. With the finer grid, the model estimates a substantially steeper consumption demand (the slope of the inverse demand function changes from −0.6249 to −1.2661) and the estimated cutoff value p∗ increases from 0.6199 (87.3% of the mean price, located close to the 52nd percentile of the observed price distribution) to 0.9018 (127.1% of the mean price, located at the 74th percentile), that is, by an amount that is large relative to the distribution of observed prices. These changes in the estimated values imply much more storage (the average amount of stocks held over a long simulated series of 100,000 periods increases from 0.44, as predicted by the parameters estimated with the sparse grid, to 4.17, as predicted instead with the parameters obtained with the fine grid) and much higher price autocorrelations than reported in Deaton and Laroque (1995, 1996): the model estimated with the 1000 grid points implies a first-order autocorrelation of 0.647 in a simulation of 100,000 periods, as opposed to the values of 0.264, as reported by Deaton and Laroque (1996, Table I) and of 0.223, as implied by our replication of the Deaton and Laroque model reported in Table 2. 5.3. Estimation of the model with constant marginal storage cost In this section, we set the number of grid points at 1000 and estimate the model allowing for a positive k, assuming initially an interest of 5%, as in Deaton and Laroque (1995, 1996) and Gustafson (1958a). The lowest value of the range of z over which the price function is approximated is lower than the lowest possible
p∗
Parameters a
b
k
PL
0.5595 (0.1206) 0.9952 (0.1142) 1.1786 (0.1884) 1.1395 (0.1235) 1.2535 (0.1314) 0.6053 (0.0969) 5.8695 (0.0420)
−3.0740
0.0014 (0.0019) 0.0008 (0.0026) 0.0064 (0.0075) 0.0096 (0.0072) 0.0053 (0.0032) 0.0329 (0.0196) 0.0024 (0.0004)
131.8955
2.1443
96.8285
2.1775
53.5851
2.9230
41.4971
2.2195
66.0274
3.2685
−2.4657
0.9019
152.4536
18.0644
(0.9098) −2.4775 (0.6974) −3.5997 (0.5692) −2.3858 (0.3745) −4.1113 (0.4387) −0.8838 (0.1099) −24.1231 (1.2814)
Asymptotic standard errors in parentheses. a The estimate reported for tin is one of several that generate the same value of the maximized pseudo-likelihood. Table 7 Estimation of the constant marginal storage cost model (r = 0.02). p∗
Commodity
Parameters a
b
k
PL
Coffee
0.3047 (0.1032) 0.6787 (0.0800) 0.8615 (0.2358) 0.9217 (0.2017) 0.8427 (0.1299) 0.5829 (0.1666) 0.5741 (0.0178)
−1.8866
0.0035 (0.0022) 0.0053 (0.0028) 0.0115 (0.0087) 0.0129 (0.0087) 0.0099 (0.0040) 0.0429 (0.0313) 0.0039 (0.0004)
132.6319
1.3657
99.8395
1.7463
55.3096
2.6210
43.7939
2.4331
68.8829
2.6060
−2.7104
0.9051
155.6304
2.0350
Copper Jute Maize Palm oil Sugar Tin
(0.4443) −1.9770 (0.3391) −3.2399 (0.7406) −2.8352 (0.5665) −3.2297 (0.4808) −0.8769 (0.1287) −2.6172 (0.0493)
Asymptotic standard errors in parentheses.
production. The upper bound of the range for approximation should be large enough to ensure that the approximated function would cover even the lowest price data point. Finding this required some experimentation for the various commodities, with results reported in Table 5. Estimating the model presented in Section 4, we find maxima for the pseudo-likelihood function for seven commodities: coffee, copper, jute, maize, palm oil, sugar and tin. We are unable to locate well-behaved maxima for cocoa, cotton, rice, tea and wheat. For each of the seven commodities for which we obtain estimates, the estimated value of d approaches zero,13 while k is estimated to be strictly positive. Results given d = 0 are presented in Table 6. The log-pseudo-likelihood values for our estimated
13 We estimate d˜ = log(d), which tends to large negative numbers as d approaches zero. At some point, the slope of the objective function with respect to d˜ falls below the preset tolerance. When this occurs, we set d = 0 and re-run the estimations.
Fig. 4. SUGAR: Dependence of estimation results on grid density.

Table 8
Maximized log pseudo-likelihood values for various models.

Commodity   Proportional decay^a   Proportional decay^b        AR(1)^c   Fixed marginal cost^d
            Sparse grid            Dense grid, 1000 points               r = 0.05    r = 0.02
Cocoa       125.2                  118.8                       124.1     –           –
Coffee      111.0                  131.7                       118.9     131.9       132.6
Copper      73.9                   96.8                        81.1      96.8        99.8
Cotton      29.8                   29.7                        74.2      –           –
Jute        44.8                   38.6                        50.2      53.5        55.3
Maize       32.1                   41.4                        27.0      41.5        43.8
Palm oil    22.2                   65.1                        27.6      65.9        68.9
Rice        26.0                   –                           61.0      –           –
Sugar       −10.7                  −6.8                        −27.0     −2.5        −2.7
Tea         69.3                   63.9                        100.9     –           –
Tin         108.9                  –                           150.9     152.4       155.6
Wheat       24.6                   –                           52.8      –           –

a Model estimated by Deaton and Laroque (1995, 1996), values reported in Deaton and Laroque (1995, Table III, column 3).
b Deaton and Laroque specification estimated with a fine grid of 1000 points.
c Reported by Deaton and Laroque (1995, Table III, column 2).
d Specifications used by Gustafson (1958a), estimated with a dense grid of 1000 points.
models are all higher than the corresponding values reported by Deaton and Laroque (1995, 1996) for their storage model with i.i.d. shocks and proportional deterioration (see Table 8). They are also substantially higher than the log-likelihood values reported for the AR(1) model by Deaton and Laroque (1995, 1996) and reproduced in Table 8. Table 7 shows estimates of the constant marginal storage cost model using Gustafson's alternate interest rate of 2%. Other than for sugar, the latter had the highest maximized pseudo-likelihood values.

5.4. Empirical distributions of implied time series characteristics

To explore the characteristics of time series of prices implied by the econometric results, we simulate all of the estimated models to generate price series of 100,000 periods.14 Table 9 shows the values of mean price, first-order autocorrelation (a.c. 1), second-order autocorrelation (a.c. 2), and coefficient of variation (CV) measured on the observed prices, 1900–1987. These values are then located within the empirical distributions of the same parameters generated from all possible samples of 88 consecutive periods drawn from each series of 100,000 prices. The table reports the corresponding percentiles.

14 In all the simulations, we use a series of 100,000 independent draws from a normal distribution with mean zero and variance one, truncated at ±5 standard deviations.

Our replication of the estimates of Deaton and Laroque (1995, 1996), identified in Table 9 as ‘‘proportional decay, sparse grid’’ with the caveats noted in Table 2, implies much too little price autocorrelation, consistent with their conclusions, for all commodities but maize. For maize, our estimation results (which differ from those of Deaton and Laroque) appear to imply sample distributions quite consistent with the observed mean, correlations, and coefficient of variation of maize price indexes. With the finer grid, estimates of the same model imply symmetric 90% confidence intervals for coffee, copper, maize, palm oil and sugar which contain the observed values. Though the estimates for all of these commodities except palm oil present other problems, they cannot support rejection of the storage model for failure to reproduce observed levels of price autocorrelation.
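The percentile locations reported in Table 9 can be computed directly from the simulated sampling distributions described above. A minimal sketch, with our own convention for ties, which may differ in detail from the authors' calculation:

import numpy as np

def percentile_location(observed, simulated_stats):
    # Percent of simulated 88-period sample statistics lying below the
    # observed value of the same statistic.
    simulated_stats = np.asarray(simulated_stats)
    return 100.0 * np.mean(simulated_stats < observed)

# Example: percentile_location(0.8058, ac1_samples) for coffee's observed
# first-order autocorrelation, given `ac1_samples` from the simulation.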
Table 9
Characteristics of price series and model predictions.

Commodity / model                                          Mean      a.c. 1    a.c. 2    CV

Cocoa
  Observed values                                          0.1971    0.8357    0.6618    0.5444
  Percentiles:
    Proportional decay, sparse grid                        71.0      100       100       79.9
    Proportional decay, dense grid, r = 5%                 97.7      99.94     99.49     29.1
    Constant marginal storage cost, dense grid, r = 5%     n.a.      n.a.      n.a.      n.a.
    Constant marginal storage cost, dense grid, r = 2%     n.a.      n.a.      n.a.      n.a.

Coffee
  Observed values                                          0.226     0.8058    0.6146    0.4524
  Percentiles:
    Proportional decay, sparse grid                        0.21      100       100       99.7
    Proportional decay, dense grid, r = 5%                 20.47     44.11     33.41     4.52
    Constant marginal storage cost, dense grid, r = 5%     12.31     59.61     47.51     4.55
    Constant marginal storage cost, dense grid, r = 2%     33.75     42.9      32.48     7.33

Copper
  Observed values                                          0.4912    0.8514    0.6615    0.3802
  Percentiles:
    Proportional decay, sparse grid                        1.74      100       100       98.44
    Proportional decay, dense grid, r = 5%                 2.65      88.67     75.43     8.34
    Constant marginal storage cost, dense grid, r = 5%     2.07      91.11     79.21     9.45
    Constant marginal storage cost, dense grid, r = 2%     16.92     75.2      57.66     22.24

Cotton
  Observed values                                          0.6463    0.8842    0.6808    0.3464
  Percentiles:
    Proportional decay, sparse grid                        44.4      100       100       69.0
    Proportional decay, dense grid, r = 5%                 96.45     100       99.97     11.47
    Constant marginal storage cost, dense grid, r = 5%     n.a.      n.a.      n.a.      n.a.
    Constant marginal storage cost, dense grid, r = 2%     n.a.      n.a.      n.a.      n.a.

Jute
  Observed values                                          0.5994    0.7057    0.4549    0.325
  Percentiles:
    Proportional decay, sparse grid                        64.0      99.99     99.38     52.8
    Proportional decay, dense grid, r = 5%                 62.92     98.82     91.57     7.51
    Constant marginal storage cost, dense grid, r = 5%     5.79      56.83     35.78     0.84
    Constant marginal storage cost, dense grid, r = 2%     20.75     35.2      20.02     1.86

Maize
  Observed values                                          0.7141    0.753     0.526     0.3834
  Percentiles:
    Proportional decay, sparse grid                        87.31     67.09     47.77     28.8
    Proportional decay, dense grid, r = 5%                 14.59     49.77     35.52     3.55
    Constant marginal storage cost, dense grid, r = 5%     4.52      81.47     63.95     6.71
    Constant marginal storage cost, dense grid, r = 2%     23.91     51.77     35.66     9.96

Palm oil
  Observed values                                          0.5425    0.7246    0.4723    0.4772
  Percentiles:
    Proportional decay, sparse grid                        91.57     99.98     98.74     72.3
    Proportional decay, dense grid, r = 5%                 15.4      41.84     25.95     11.61
    Constant marginal storage cost, dense grid, r = 5%     3.37      68.06     45.43     16.32
    Constant marginal storage cost, dense grid, r = 2%     16.84     36.81     21.2      22.96

Sugar
  Observed values                                          0.7096    0.6202    0.3836    0.6037
  Percentiles:
    Proportional decay, sparse grid                        70.1      99.98     99.6      85.2
    Proportional decay, dense grid, r = 5%                 100       29.83     17.56     30.5
    Constant marginal storage cost, dense grid, r = 5%     87.67     84.06     67.48     44.27
    Constant marginal storage cost, dense grid, r = 2%     92.06     80.8      63.61     42.5

Tea
  Observed values                                          0.5133    0.7989    0.6161    0.257
  Percentiles:
    Proportional decay, sparse grid                        90.65     100       100       33.1
    Proportional decay, dense grid, r = 5%                 45.46     100       100       40.61
    Constant marginal storage cost, dense grid, r = 5%     n.a.      n.a.      n.a.      n.a.
    Constant marginal storage cost, dense grid, r = 2%     n.a.      n.a.      n.a.      n.a.

Tin
  Observed values                                          0.2221    0.8859    0.7554    0.415
  Percentiles:
    Proportional decay, sparse grid                        0.58      100       100       83.6
    Proportional decay, dense grid, r = 5%                 n.a.      n.a.      n.a.      n.a.
    Constant marginal storage cost, dense grid, r = 5%     0         88.06     81.64     6.49
    Constant marginal storage cost, dense grid, r = 2%     6.93      74.75     65.15     17.48
Estimation of the constant marginal storage cost model with the 1000-point grid and, as above, a 5% interest rate, implies that the observed first- and second-order correlations lie within symmetric 90% confidence regions for seven commodities (coffee, copper, jute, maize, palm oil, sugar and tin), as shown in Table 9. In this sense, the speculative storage model is
consistent with observed autocorrelation of the prices of these commodities.
Table 10
Implied probability of at least n stockout in periods of 88 years.

             Proportional decay^a (r = 5%)       Constant marginal cost^b (r = 5%)    Constant marginal cost^b (r = 2%)
             n = 1     n = 5     n = 10          n = 1     n = 5     n = 10           n = 1     n = 5     n = 10
Cocoa        0.9917    0.8252    0.3909          n.a.      n.a.      n.a.             n.a.      n.a.      n.a.
Coffee       0.3315    0.0574    0.0030          0.5219    0.1213    0.0097           0.3683    0.0542    0.0030
Copper       0.7435    0.2603    0.0362          0.7835    0.2973    0.0464           0.5656    0.1219    0.0100
Cotton       0.9999    0.9802    0.8165          n.a.      n.a.      n.a.             n.a.      n.a.      n.a.
Jute         0.9986    0.9539    0.6915          0.7463    0.2585    0.0359           0.5491    0.1083    0.0092
Maize        0.5589    0.1346    0.0127          0.8859    0.4252    0.0958           0.6322    0.1450    0.0130
Palm oil     0.5286    0.1241    0.0109          0.7183    0.2307    0.0310           0.5289    0.0989    0.0088
Sugar        0.6137    0.1455    0.0145          0.9963    0.9044    0.5518           0.9939    0.8649    0.4633
Tea          1.0000    1.0000    1.0000          n.a.      n.a.      n.a.             n.a.      n.a.      n.a.
Tin          0.4294    0.0919    0.0070          0.6069    0.1575    0.0165           0.4056    0.0639    0.0041

a Deaton and Laroque specification, estimated with fine grids of 1000 points.
b Specifications proposed by Gustafson, estimated with fine grids of 1000 points.
Table 11
Average profits.

                      Coffee     Copper     Jute       Maize      Palm oil   Sugar      Tin
r = 0.05
  Average profits     −0.0123    −0.0302    −0.0381    −0.0489    −0.0389    −0.0130    −0.0130
  Percentiles         40.74      20.95      23.01      8.33       16.30      31.58      58.34
r = 0.02
  Average profits     −0.0081    −0.0210    −0.0265    −0.0321    −0.0261    −0.0074    −0.0083
  Percentiles         29.97      6.70       17.38      8.62       14.60      39.45      48.30
The table reports the results for the constant marginal storage cost model, estimated for the two alternate interest rate values. For each model, average profits implied by the estimated models evaluated on the actual 88 year price series are reported in the first row. Percentiles of the corresponding distribution of average profits over 88-period samples taken from one long series of 100,000 prices are reported in the second row.
However, for jute and coffee the empirical 90% symmetric confidence regions do not contain the observed coefficient of variation of price. For four of the seven commodities, the observed mean price lies below its confidence interval. For jute in particular, our estimation of the Gustafson specification of the speculative model implies too much price variation, rather than too little correlation. Finally, simulation of the models estimated assuming Gustafson's alternate 2% interest rate provided the best match, for each commodity, of the estimated mean price, serial correlations, and coefficient of variation with the observed data.

A feature of the results is that, according to the estimated models, stockouts occurred over the sample interval only for sugar. For other commodities, the cutoff price for storage, p∗, exceeds the highest price observed between 1900 and 1987. For all commodities but sugar, Table 10 shows that the probability of at least one stockout, in an 88-period sample drawn from the simulated series of 100,000 observations, is less than 0.88 at r = 0.05, and less than 0.63 at r = 0.02.

A stringent check on our results is to calculate the realized profits for a speculator who buys one unit of the commodity when the price is below p∗, and resells the unit in the next period. For the observed time series for each commodity, we compare the realized profits from such a strategy with the simulated sample distributions of profits for the 88-period sequences. Percentiles for realized average profits are presented in Table 11. For all seven commodities, imputed profits lie within each corresponding 90% symmetric confidence interval.

6. Conclusion

Our numerical and empirical results offer a new, more positive assessment of the empirical relevance of the commodity storage model. The pathbreaking and influential work of Deaton and Laroque includes an empirical implementation that exhibits problems of accuracy of approximation, which we show lead to substantial errors in estimation of the consumption demand functions and decay rates. When a finer grid is used, Deaton and Laroque's
model yields estimates that are consistent with observed levels of price autocorrelation, for five commodities. Our estimates of the model that allows also for constant marginal storage cost in addition to proportional deterioration imply distributions of sample autocorrelations that generate 90% confidence intervals that include observed values for seven major commodities, coffee, copper, jute, maize, palm oil, sugar, and tin. The estimates imply constant marginal storage cost with no significant deterioration and lower price elasticities of consumption demand than assumed in most numerical storage models. Though no stockouts are indicated, except for sugar, over the 1900–1987 period, the average speculative profits implied by the model for those years are well within reasonable confidence regions for samples of that size. Numerical models in the tradition of Gustafson have tended to assume higher sensitivity of consumption demand (as distinct from market demand) to price, and lower price variability, than indicated by our empirical results for the seven commodities we consider. With less flexibility of consumption than previously assumed, storage arbitrage is more active, and stockouts are less frequent, inducing the high levels of serial correlation observed in the prices of these commodities. Note that the implications of such price behavior for producer risk management are not straightforward. The short-run price variation is in general lower, but price slumps are more persistent, than in an equivalent market with the lower levels of price autocorrelation indicated in previous empirical estimates. These results open the way for further empirical exploration of the role of commodity storage in reducing the amplitude, and increasing the persistence, of price variations encountered in commodity markets. References Anderson, K., 2005. Personal communication. Agricultural Extension Service. Oklahoma State University. Bobenrieth, E.S.A., Bobenrieth, J.R.A., Wright, B.D., 2004. A model of supply of storage. Economic Development and Cultural Change 52, 605–616.
Cafiero, C., 2002. Estimation of the commodity storage model. Ph.D. Dissertation, Department of Agricultural and Resource Economics, The University of California at Berkeley (unpublished). Deaton, A., Laroque, G., 1992. On the behaviour of commodity prices. Review of Economic Studies 59 (1), 1–23. Deaton, A., Laroque, G., 1995. Estimating a nonlinear rational expectations commodity price model with unobservable state variables. Journal of Applied Econometrics 10, S9–S40. Deaton, A., Laroque, G., 1996. Competitive storage and commodity price dynamics. Journal of Political Economy 104 (5), 896–923. Deaton, A., Laroque, G., 2003. A model of commodity prices after Sir Arthur Lewis. Journal of Development Economics 71, 289–310. Gardner, B.L., 1979. Optimal Stockpiling of Grain. Lexington Books, Lexington, Mass. Gourieroux, C., Monfort, A., Trognon, A., 1984. Pseudo maximum likelihood methods: Theory. Econometrica 52, 681–700. Gustafson, R.L., 1958a. Carryover levels for grains. Washington DC: USDA, Technical bulletin 1178. Gustafson, R.L., 1958b. Implications of recent research on optimal storage rules. Journal of Farm Economics 38, 290–300. Hillman, J., Johnson, D.G., Gray, R., 1975. Food reserve policies for world food security. Food and Agriculture Organization of the United Nations, ESC/75/2, January 1975. Johnson, D.G., Sumner, D., 1976. An optimization approach to grain reserves for developing countries. In: Eaton, D.J., Steele, W.S., (Eds.), Analysis of Grain Reserves: A Proceeding. US Department of Agriculture, Economic Research Service Report No 634, pp. 56–76. Michaelides, A., Ng, S., 2000. Estimating the rational expectations model of speculative storage: A Monte Carlo comparison of three simulation estimators. Journal of Econometrics 96, 231–266.
Miranda, M.J., Fackler, P.L., 2002. Applied Computational Economics and Finance. The MIT Press. Miranda, M.J., Helmberger, P., 1988. The effects of commodity price stabilization programs. American Economic Review 78, 46–58. Muth, J.F., 1961. Rational expectations and the theory of price movement. Econometrica 29 (3), 315–335. Newbery, D.M.G., Stiglitz, J.E., 1979. The theory of commodity price stabilization rules: Welfare impacts and supply responses. Economic Journal 89, 799–817. Newbery, D.M.G., Stiglitz, J.E., 1981. The Theory of Commodity Price Stabilization: A Study in the Economics of Risk. Oxford University Press, New York. Park, A., 2006. Risk and household grain management in developing countries. Economic Journal 116, 1088–1115. Rees, A., 1961. Real Wages in Manufacturing, 1890–1914. Princeton University Press, Princeton (A Study by the National Bureau of Economic Research). Samuelson, P.A., 1971. Stochastic speculative price. Proceedings of the National Academy of Sciences 68, 335–337. UNCTAD, 1975. Second progress report on storage costs and warehouse facilities. TD/B/C.1/198, Trade and Development Board, Committee on Commodities, Geneva, United Nations Conference on Trade and Development. Williams, J.C., 1986. The Economic Function of Futures Markets. Cambridge University Press, Cambridge, UK. Williams, J.C., Wright, B.D., 1991. Storage and Commodity Markets. Cambridge University Press, Cambridge. Working, H., 1929. The post-harvest depression of wheat prices. Wheat studies of the Food Research Institute 6, 1–40. Wright, B.D., Williams, J.C., 1982. The economic role of commodity storage. Economic Journal 92, 596–614.
Journal of Econometrics 162 (2011) 55–70
A tale of two yield curves: Modeling the joint term structure of dollar and euro interest rates
Alexei V. Egorov a, Haitao Li b, David Ng c,∗
a College of Business and Economics, West Virginia University, Morgantown, WV 26506, USA
b Stephen M. Ross School of Business, University of Michigan, Ann Arbor, MI 48109, USA
c Department of Applied Economics and Management, Cornell University, Ithaca, NY 14853, USA
Article history: Available online 13 October 2009
JEL classification: C4; C5; G1
Keywords: Affine term structure models; International term structure models; Approximate maximum likelihood; LIBOR; Euribor; Specification analysis of term structure of interest rates; Out-of-sample model evaluation
abstract Modeling the joint term structure of interest rates in the United States and the European Union, the two largest economies in the world, is extremely important in international finance. In this article, we provide both theoretical and empirical analysis of multi-factor joint affine term structure models (ATSM) for dollar and euro interest rates. In particular, we provide a systematic classification of multi-factor joint ATSM similar to that of Dai and Singleton (2000). A principal component analysis of daily dollar and euro interest rates reveals four factors in the data. We estimate four-factor joint ATSM using the approximate maximum likelihood method of Aït-Sahalia (2002, forthcoming) and compare the in-sample and out-ofsample performances of these models using some of the latest nonparametric methods. We find that a new four-factor model with two common and two local factors captures the joint term structure dynamics in the US and the EU reasonably well. © 2009 Elsevier B.V. All rights reserved.
1. Introduction Financial markets have become increasingly globalized in recent years. Banks and institutional investors regularly borrow, lend and invest around the globe and take on huge fixed income positions in many different countries. These international fixed income positions create exposures not only to exchange rate risks, but also to interest rate risks in different countries.1 The multiple sources of risks involved make risk management of these fixed income portfolios challenging. Therefore, characterizing the joint dynamics of exchange rates and interest rates in different countries is extremely important for banks and investors, who are interested in assessing and managing the risks in their portfolios, as well as for regulators, who are keen to understand the underlying risks for setting adequate bank capital requirements and monitoring systemic risks.
∗ Corresponding author. Tel.: +1 607 255 0145; fax: +1 607 255 9984. E-mail address: [email protected] (D. Ng).
1 Previous literature largely focuses on whether to hedge the exchange rate risk in such portfolios. However, the portfolio values are driven by interest rates in both regions, in addition to the exchange rate risk.
doi:10.1016/j.jeconom.2009.10.010
Our article contributes to the literature by providing one of the first studies of multi-factor affine models of the joint term structure of interest rates in the two largest economies in the world, the United States and the European Union. We focus on the US and the EU because of the dominance of their currencies in the international financial markets as well as the importance of dollarand euro-denominated bond markets. Since the introduction of euro in 1999, the currency has been gaining dominance as one of the two major currencies in the world. Countries that have adopted the euro include Austria, Belgium, Finland, France, Germany, Greece, Italy, Ireland, Luxembourg, The Netherlands, Spain and Portugal. With more than 610 billion euros in circulation at the end of 2006, the euro has surpassed the US dollar in terms of cash value in circulation (Atkins, 2006). Among international issuers outside the eurozone, euros and dollars are the favorite currencies. Most international issuers choose to issue their bonds and notes in euros (43.5% of total volume) or in US dollars (40.5% of total volume), according to the European Central Bank (2004). Among different bond markets, the euro and dollar bond markets are the two most important ones. As of the end of 2003, eurozone domestic governments and corporations had an outstanding volume of US $ 5462 billion worth of euro-denominated bonds issued in their domestic countries (European Central Bank, 2004). This represents 22.3% of outstanding volume of domestic-issued debt among all
56
A.V. Egorov et al. / Journal of Econometrics 162 (2011) 55–70
developed countries, and this size is second only to that of the United States. Our analysis of the joint dollar–euro affine term structure models (ATSM) makes both theoretical and empirical contributions to the literature. Theoretically, our article systematically examines joint term structure models with four (both local and common) factors. In a domestic setting, Dai and Singleton (2000) provide a systematic classification of N-factor ATSM into (N + 1)-subfamilies and derive the maximal model that nests existing models within each subfamily. However, international joint term structure models add another layer of complexity due to the presence of both local and common factors. We provide one of the first classifications of all four-factor joint ATSM within the maximally admissible classification scheme. Empirically, we provide new evidence on the joint term structure using daily LIBOR and Euribor rates (and corresponding swap rates) and dollar/euro exchange rates between July 1999 and June 2003.2 While Litterman and Scheinkman (1991) show that three factors are needed to capture the US term structure, we find that more than three factors are needed to capture the joint dollar–euro term structure dynamics focus on four-factor models in our estimation. One class of four-factor term structure models, with two common and two local factors, is particularly promising in terms of in-sample goodness of fit and out-of-sample density forecasting ability. The model that works the best is a four-factor affine model in which two local factors drive volatilities and two common factors follow Gaussian processes. We conduct our empirical analysis in three steps. First, using principal component analysis, we examine the total number of factors and the numbers of common vs. local factors needed to capture the joint dollar–euro term structure. Domestic term structure models usually use up to three factors (e.g., Dai and Singleton, 2000). But in an international setting, how many factors should be included and how many of them should be common or local remain open questions. Most international articles use two or three factors with different combinations of local and common factors (e.g., Inci and Lu, 2004; Hodrick and Vassalou, 2002; Tang and Xia, 2006; Backus et al., 2001; Mosburger and Schneider, 2005). Motivated by empirical evidence from our principal component analysis, we go beyond the usual three-factor models to examine four-factor models. Second, we estimate the models using the approximate maximum likelihood estimation, a powerful method developed recently by Aït-Sahalia (2002, forthcoming) and Aït-Sahalia and Kimmel (2002). The absence of a closed-form solution for transition density of affine models makes maximum likelihood estimation infeasible. Existing studies on international term structure models have resorted to other estimation methods, which could be either inaccurate in small samples or computationally burdensome. The approximate maximum likelihood estimation provides extremely fast and accurate estimations for affine models. We show that AïtSahalia’s approximate maximum likelihood estimation techniques work very well in estimating international term structure models. Third, we examine in-sample goodness of fit and out-of-sample density forecasting performances of all models using the nonparametric tests developed in Hong and Li (2005), Egorov et al. (2006), and Hong et al. (forthcoming). 
The difficulty in comparing different subfamilies of affine models, which generally are nonnested, has been recognized in the literature since Dai and Singleton (2000). Similarly, Tang and Xia (2006) note that ‘‘a suitable test in the current [non-nested] setting is. . . still not available in the literature’’. The nonparametric tests we consider overcome this
2 We use data from July 2003 to June 2006 to examine out-of-sample performance of our models.
difficulty by allowing direct comparisons of non-nested models. Moreover, these tests can reveal whether a given term structure model can satisfactorily capture the joint conditional densities of bond yields in both in-sample and out-of-sample settings. Therefore, unlike most existing studies that mainly focus on in-sample goodness of fit, we study the performances of joint ATSM in forecasting the joint conditional densities of bond yields, which are extremely important for managing international bond portfolios. Though in this article we focus on the completely affine models of Dai and Singleton (2002), we can easily extend our analysis to the essentially affine models of Duffee (2002) and the more flexible risk premium specifications of Cheridito et al. (2006). Since we estimate our term structure models in conjunction with the exchange rate data, we can easily extend our analysis to examine the forward premium puzzle and exchange rate dynamics. While we focus on forecasting the joint conditional density of bond yields in this article, we can extend the approximate maximum likelihood to calculate and forecast the Value-at-Risk (VaR) of international bond portfolios. Our article belongs to the recent fast-growing literature on twocountry joint term structure models, which has mainly focused on the forward premium puzzle or the benefits of international diversification.3 While these studies have made important contributions, our article is the first that examines the joint dollar–euro term structures. In addition, while each of these studies uses a different specification of the term structure model, we provide a systematic analysis of four-factor joint ATSM. While most existing studies mainly focus on in-sample fit, our article is the first that examines the out-of-sample density forecasting performances of joint ATSM. This article is the closest in spirit to a concurrent working article, Mosburger and Schneider (2005). They investigate the performance of models driven by a mutual set of global state variables in explaining joint term structures in the US and the UK. They then discuss which mixture of Gaussian and square root processes is best suited for modeling international bond markets. They also derive necessary conditions for the correlation and volatility structure of mixture models to explain the forward premium puzzle and differently shaped yield curves. Our study is different due to our focus on the euro and dollar markets, the econometric methodology used for comparing non-nested models, and the use of four-factor joint term structure models. The article is organized as follows. Section 2 describes the data and reports on the principal component analysis for dollar and euro term structures. Section 3 introduces our two-country affine term structure model and provides a systematic classification of fourfactor affine models with both global and local factors. Section 4 discusses the econometric methods for estimating and testing joint dollar–euro term structure models. Section 5 presents empirical results, and Section 6 concludes the article. 2. Data description and principal component analysis In this section, we provide necessary background information on the economies of the US and the EU, as well as the dollar/euro exchange rate. We also provide summary statistics on the dollar and euro term structure data. Finally, we provide principal component analysis on the number of factors that drive the joint term structure in the two economies.
3 Backus et al. (2001), Inci and Lu (2004), Han and Hammond (2003), and Brennan and Xia (2006) look at the forward premium puzzle. Dewachter and Maes (2001) and Ahn (2004) examine the benefits of international diversification. Hodrick and Vassalou (2002) and Tang and Xia (2006) empirically examine term structures in multiple countries.
A.V. Egorov et al. / Journal of Econometrics 162 (2011) 55–70
57
2.1. Data description Our primary data sample consists of daily euro interest rates (Euribor), US interest rates (LIBOR), and dollar/euro exchange rates between July 1, 1999 and June 30, 2003. We also use the next three years’ data from July 1, 2003 to June 30, 2006 to examine the out-of-sample performance of our estimates. Specifically, we consider Euribor with maturities of 1, 3, 6, and 9 months, LIBOR with maturities of 1, 3, and 6 months, euro-Euribor interest rate swap rates with maturities of 1–10 years, US dollar-LIBOR interest rate swap rates with maturities of 1 to 10 years, and dollar/euro exchange rates. While the interest rate data are obtained from Datastream, exchange rate data are provided by GTIS. The euro currency was officially launched in 1999, while the European Central Bank (ECB) was formed the year before. The ECB is an independent entity in charge of monetary policy for the European Union, with responsibility towards the euro. It holds the official central bank reserves of member countries in the European Union. As specified under the Maastricht Treaty, the main goal of the ECB is to achieve price stability within the European Union. At the same time, the ECB is under pressure from European governments to balance its mandate to fight inflation with the objective of maintaining economic growth in the EU (Open University, 2006). As our empirical inquiry is specific to dollars and euros, we are limited to the relatively short time sample after the euro was launched. The euro was launched on January 1, 1999 for accounting purposes and electronic fund transfers, although euro notes and coins were not issued as legal tender until 2002. To minimize any noise due to the initial adoption of the euro, we start our data series on July 1, 1999. Both Euribor and LIBOR, denoted here for simplicity as L (τ ), are simply compounded interest rates for euro and dollar, respectively. They are related to continuously compounded spot rates Y (τ ) by L(τ ) =
1 τ Y (τ ) e −1 ,
(1)
τ
where the time-to-maturity τ follows the actual-over-360-daycounting convention for both currencies, starting two business days forward. The swap rates for both LIBOR and Euribor have payment intervals of six months and are related to the zero prices D (τ ) (discount factors) by S w ap(τ ) = 2
1 − D(τ ) 2τ
∑
.
(2)
D (i/2)
Fig. 1. Term structures of LIBOR and Euribor zeros from July 1, 1999 to June 30, 2006. The top panel shows the LIBOR term structures, while the bottom panel shows the Euribor term structures. The vertical axis shows the annualized interest rates. The two horizonal axes represent the maturities of the bonds and the date (i.e., no of years after July 1, 1999).
In June 2003, the discount rate was at a level of 1%, the lowest in 50 years, while the 10-year interest rate in the US also dropped to 3.84%. From 2003 to 2006, the US short-term interest rate has gone up and the yield curve has become flatter. In our study, we model the yield curves of the two economies during this period. Fig. 2 shows the dollar–euro exchange rates along with the dollar and euro 6-month interest rates during our sample period. During the first two years of its launch, the euro depreciated against the dollar, partially driven by portfolio flows to the US where a higher interest rate prevailed. Since 2001, the euro has started to appreciate against the dollar. The appreciation of the euro has coincided with a flip in the interest rate differentials when euro interest rates exceeded US interest rates. The appreciation of the euro slowed down in the middle of 2005, when the US interest rate became higher than the euro interest rate again. Our theoretical model and empirical estimation incorporate the effect of the term structures on exchange rate movement. 2.2. Principal component analysis
i=1
Based on the above relations, we recover LIBOR and Euribor zero-coupon yields for maturities ranging from 1 year to 10 years using swap rates. Table 1 presents the summary statistics for the levels and changes in these zero-coupon bond yields over the sample period. Panel A reports the mean, median, standard deviation, skewness, and kurtosis for the levels and changes of LIBOR of different maturities. Panel B reports corresponding summary statistics for Euribor. Panel C contains the correlations between the levels and changes of LIBOR and Euribor zero-coupon yields at different maturities. Fig. 1 provides time series plots of the zero curves for both LIBOR and Euribor. From late 2000 to 2003, the ECB consistently cut interest rates to stimulate slow economic growth in the midst of a stable inflationary environment (European Commission, 2005). By 2003, in response to the ECB’s concerted effort in interest rate cutting, long-term bond yields in the eurozone dropped to the lowest point in decades. This low interest rate policy was pursued not only by the ECB, but also by its counterpart in the US, the Federal Reserve Bank. After the stock market crash in 2000, the Federal Reserve cut the discount rates 13 times to stimulate the economy.
The above results and existing studies suggest that term structures in the US and the eurozone could be driven by both common factors (affecting interest rates in both economies) and local or country-specific factors. In this section, we examine the number and nature of risk factors needed to satisfactorily capture the joint dynamics of LIBOR and Euribor term structures using principal component analysis. We perform principal component analysis using both LIBOR and Euribor rates jointly and refer to thus obtained principal components (PCs) as common-PCs. We also perform similar analysis using only LIBOR or Euribor rates and refer to thus obtained PCs as LIBOR- or Euribor-PCs, respectively. Table 2 reports the variations of LIBOR, Euribor, and the joint LIBOR–Euribor rates that are explained by the first seven LIBOR-, Euribor-, and common-PCs, respectively. Consistent with existing studies, such as Litterman and Scheinkman (1991), we find that the first three PCs capture 99.97% and 99.93% of the variations in LIBOR and Euribor rates, respectively. In the case of the joint term structure, the first two, three, and four common-PCs explain 98.28%, 99.68%, and 99.84% of the total variations, respectively.
58
A.V. Egorov et al. / Journal of Econometrics 162 (2011) 55–70
Table 1 Summary statistics. Mean
Median
Std dev
Skewness
Kurtosis
0.0554 0.0839 −0.0450 −0.2392 −0.2606
1.3036 1.3104 1.5888 1.8970 2.0225
1.3412 0.1589 0.2979 0.2593
138.8800 28.6190 5.3032 4.6131 5.8087
Panel A: LIBOR series Levels 1-month LIBOR 6-month LIBOR 2-year LIBOR swap 5-year LIBOR swap 10-year LIBOR swap Changes in 1-month LIBOR 6-month LIBOR 2-year LIBOR swap 5-year LIBOR swap 10-year LIBOR swap
3.7980 3.8950 4.6016 5.3947 5.9291
3.7500 3.7200 4.6050 5.4550 5.9650
2.0759 2.1608 1.9382 1.4539 1.1090
−0.0039 −0.0043 −0.0044 −0.0036 −0.0028
0 0 −0.0050 0 0
0.0513 0.0412 0.0643 0.0691 0.0684
3.6501 3.7280 4.1152 4.6898 5.2237
3.3820 3.5380 4.2600 4.8300 5.3450
0.7956 0.8095 0.8384 0.6868 0.5487
0.1994 0.0895 −0.4313 −0.7307 −0.8148
1.8049 2.0770 2.5734 2.9384 2.9866
−0.0005 −0.0007 −0.0011 −0.0011 −0.0010
−0.0010 −0.0010
0.0331 0.0266 0.0437 0.0467 0.0425
−0.7725 −0.6282
72.2850 19.0240 4.6783 4.2775 4.1020
−2.7039
Panel B: Euribor series Levels 1-month Euribor 6-month Euribor 2-year Euribor swap 5-year Euribor swap 10-year Euribor swap Changes in 1-month Euribor 6-month Euribor 2-year Euribor swap 5-year Euribor swap 10-year Euribor swap
0 0 0
0.5543 0.5008 0.4178
Panel C: Correlations Levels
6-month LIBOR 2-year LIBOR 5-year LIBOR 6-month Euribor 2-year Euribor
6-month LIBOR
2-year LIBOR
5-year LIBOR
6-month Euribor
2-year Euribor
5-year Euribor
1
0.979 1
0.941 0.989 1
0.697 0.684 0.672 1
0.786 0.831 0.844 0.922 1
0.802 0.870 0.898 0.837 0.979
6-month LIBOR
2-year LIBOR
5-year LIBOR
6-month Euribor
2-year Euribor
5-year Euribor
1
0.395 1
0.295 0.897 1
0.380 0.262 0.218 1
0.279 0.699 0.678 0.363 1
0.179 0.650 0.686 0.200 0.922
Changes in
6-month LIBOR 2-year LIBOR 5-year LIBOR 6-month Euribor 2-year Euribor
This table reports the summary statistics of the level and change series of 1-month, 6-month, 2-year, 5-year, and 10-year zero-coupon bond yields. The sample is from July 1, 1999 to June 30, 2003. Panel A shows the summary statistics for LIBOR. Panel B shows the summary statistics for Euribor. Panel C shows the correlations between levels of LIBOR and Euribor (upper table), and correlations between changes in LIBOR and Euribor (lower table).
While domestic term structure models typically include up to three factors, there is not a consensus on how many factors are needed for the joint dollar–euro term structures and how many of these factors should be common and local. To examine these issues, we regress LIBOR- and Euribor-PCs on common-PCs to establish the relations among these factors. Then, we perform principal component analysis on the regression residuals of LIBOR- and Euribor-PCs on the first one or two common-PCs. Panel A of Table 3 reports regression results of the first three LIBOR- and Euribor-PCs on the first six common-PCs. As seen in the first row, when the first LIBOR-PC is regressed upon the first six common-PCs, the coefficient on the first common-PC is 0.905. This suggests that the first common-PC is mostly associated with the first LIBOR-PC. Similarly, we find that the second common-PC is mostly associated with the first Euribor-PC with a coefficient of 0.874. The third common-PC is mostly associated with the second LIBOR-PC, with a coefficient of 0.868. The fourth common-PC is mostly associated with the second Euribor-PC, with a coefficient of −0.831. Panel B of Table 3 reports the R-squares from regressing LIBORand Euribor-PCs on the first five common-PCs. The first commonPC explains 99.245% of the variations of the first LIBOR-PC and
almost zero variation of the second and third LIBOR-PCs. On the other hand, the first common-PC explains only 86.267% of the variations of the first Euribor-PC. If we include the first three commonPCs in the regression, we can explain most of the variations in the first LIBOR-PC (99.99%), the second LIBOR-PC (96.5%), and the first Euribor-PC (99.97%). However, we are not able to fully capture the variations in the second Euribor-PC (only 83.37%). Only when the fourth common-PC is included in the regression, we can explain 99.96% of the second Euribor-PC. Therefore, if we want to capture the first two LIBOR- and Euribor-PCs, we need to include at least four factors. Furthermore, if the goal is to capture the first three LIBOR- and Euribor-PC, we would need to include at least five, or even six, factors. Therefore, the results in Table 3 suggest that at least a four-factor model is needed to capture the joint dollar–euro term structure. Table 4 provides further analysis on the number of local versus common factors we should include in the four term structure factors. Each column presents the result of a principal component analysis and reports the variations explained by the first seven PCs. To set the benchmark, column 2 reports the result of principal component analysis conducted on LIBOR rates only (same as that in Table 1). Column 3 (4) contains the results of principal component
A.V. Egorov et al. / Journal of Econometrics 162 (2011) 55–70 Table 2 Principal component analysis for LIBOR, Euribor, and joint term structure models. Number of principal components
Percentage of variations explained LIBOR (%)
Euribor (%)
Joint
1 2 3 4 5 6 7 First two First three First four First five First six First seven
98.456 1.377 0.137 0.018 0.006 0.002 0.001 99.833 99.970 99.988 99.994 99.996 99.998
96.368 3.263 0.301 0.046 0.012 0.005 0.001 99.631 99.932 99.978 99.990 99.995 99.996
94.738 3.540 1.401 0.162 0.086 0.049 0.001 98.278 99.679 99.841 99.927 99.976 99.984
Table 2 reports the variations of LIBOR, Euribor, and joint LIBOR–Euribor term structure models that are explained by the first seven PCs. The estimation is done using daily LIBOR and Euribor from July 1, 1999 till June 30, 2003. Column 1 lists the number of PCs. Column 2 reports the LIBOR term structure model variation that is explained by the corresponding PCs. Column 3 reports the Euribor term structure model variation that is explained by the corresponding number of principal components. Column 4 reports the joint term structure model variation that is explained by the corresponding number of PCs. The results for PCs ranging from 1 to 7 are first reported, followed by the results from the first two up to first seven PCs.
analysis on the residuals of LIBOR rates after regressions on the first one (two) common-PC(s). One local LIBOR factor captures 60.75% of the regression residuals on the first common-PC, and two local LIBOR factors are needed to capture most of the variations of the residuals. However, one local LIBOR factor explains about 90.56% of the regression residuals on the first two common-PCs. Therefore, it seems that the best combination is to have two common factors and one local LIBOR factor. Columns 5, 6, and 7 present the corresponding results for Euribor. One local Euribor factor captures 88.23% of the regression residuals of Euribor rates on the first two common-PCs. Again, the best combination here is to have two common factors and a local Euribor factor. Overall, Tables 3 and 4 suggest that four factors are needed to satisfactorily capture the two yield curves, and that the four factors
59
should include one LIBOR local factor, one Euribor local factor, and two common factors. If we want to include only three factors, then we should have a local LIBOR factor, a local Euribor factor and a common factor. In this case, while the first LIBOR- and Euribor-PC can be explained by the common factor reasonably well, at 99.25% and 86.27%, respectively, the second and third LIBOR- and EuriborPCs cannot be explained well by the local factors. Only 60.76% of the LIBOR residuals and about 85% of Euribor residuals can be explained, which is not completely satisfactory. Motivated by the above evidence, in our estimation we consider four-factor models in the article. Another motivation for consideration of models with more than three factors comes from recent articles by Cochrane and Piazzesi (2005) and Dai et al. (2004). This literature addresses bond risk premia, and, in particular, it shows that up to five factors might be needed for affine models to forecast bond risk premia in the US government bonds. Hence, a four-factor joint term structure model would be preferable to a three-factor model. 3. Theoretical model 3.1. Joint affine term structure models In this section, we develop a two-country joint ATSM for the US and the eurozone.4 Suppose the uncertainty of the two economies are described by a complete probability space (Ω , F , P ), where P denotes the physical measure. Let Q and Q ∗ be the equivalent martingale measures for the US and the eurozone, respectively. Also denote r (t ) and r ∗ (t ) as the instantaneous risk-free rate in the US and the eurozone, respectively. Then the time-t prices of zero-coupon bonds maturing at t + τ denominated in dollar and euro are given as, respectively, P (t , τ ) = Et
Q
[
t +τ
∫
r (u) du
exp −
] and
t
4 It should be noted that Euro is issued by the European Union but not by a single country. Nevertheless, we decide to use the term two-country joint term structure model throughout the article for the sake of exposition.
Table 3 Common factors in the term structure models. Panel A: Results from regressing LIBOR and Euribor factors on all six common factors Regression coefficients on common factor 1st factor LIBOR 1st factor LIBOR 2nd LIBOR 3rd Euribor 1st Euribor 2nd Euribor 3rd
0.905
−0.001 0.001
−0.425 0.017
−0.004
2nd factor
3rd factor
0.406 0.008 −0.085 0.874 0.246 −0.046
0.055 0.868 0.002 0.097 −0.478 −0.077
4th factor 0.107
−0.483 0.022 0.196 −0.831 0.031
5th factor
6th factor
R2
−0.349 −0.021 −0.974 −0.077 −0.032 −0.138
−0.019 −0.052
0.99999 0.99992 0.99933 0.99999 0.99980 0.98548
0.128
−0.031 0.041
−0.932
Panel B: R2 from regressing LIBOR and Euribor factors on the first five common factors R2
LIBOR 1st factor LIBOR 2nd LIBOR 3rd Euribor 1st Euribor 2nd Euribor 3rd
Only 1st factor
First 2 factors
First 3 factors
First 4 factors
First 5 factors
0.99245 0.00006 0.00167 0.86267 0.03921 0.02197
0.99920 0.00029 0.23574 0.99900 0.35896 0.14200
0.99997 0.96525 0.23581 0.99966 0.83370 0.27543
0.99999 0.99977 0.23652 0.99997 0.99955 0.27801
0.99999 0.99980 0.99198 0.99999 0.99969 0.30464
Table 3 reports the results from regressing the LIBOR and Euribor factors on to the common factors. LIBOR and Euribor factors are PCs from LIBOR and Euribor term structure models, while common factors are PCs from the joint term structure model. Panel A reports the results from regressing LIBOR and Euribor factors on six common factors. Panel B reports the R-square from regressing LIBOR and Euribor factors on the first six common factors. In panel A, the first column lists the dependent variables, i.e., the first three LIBOR or Euribor factors. The next six columns reports the regression coefficients on the first six common factors. The last column reports the R2 of the regressions. In panel B, the first column lists the dependent variables, i.e., the first three LIBOR and the first three Euribor factors. The next column reports the R2 when the dependent variable is regressed upon the first factor of the joint term structure model. The next four columns report the result when the regressions are run on the first two factors, first three factors, first four factors, and first five factors, respectively. The estimation is done using daily LIBOR and Euribor data from July 1, 1999 till June 30, 2003.
60
A.V. Egorov et al. / Journal of Econometrics 162 (2011) 55–70
6-mth Euribor 6-mth LIBOR $/Euro exchange rate
Fig. 2. Dollar and euro interest rates and exchange rates. This shows the interest rates and exchange rates from July 1, 1999 to June 30, 2006. Six-month LIBOR and Euribor interest rates in % are displayed under the scale shown on the left y-axis. Dollar-to-euro exchange rate is shown under the scale on the right y-axis. Table 4 Principal component analysis for LIBOR and Euribor Models after taking out 1 or 2 common factors. Principal component
Percentage of variations explained
1 2 3 4 5 6 7
LIBOR only
LIBOR after 1 common factor
LIBOR after 2 common factors
Euribor only
Euribor after 1 common factor
Euribor after 2 common factors
0.98456 0.01377 0.00137 0.00018 0.00006 0.00002 0.00001
0.60756 0.33526 0.04449 0.00753 0.00265 0.00101 0.00054
0.90956 0.06909 0.01182 0.00409 0.00181 0.00140 0.00082
0.96368 0.03263 0.00301 0.00046 0.00012 0.00004 0.00002
0.85074 0.13176 0.01409 0.00212 0.00070 0.00027 0.00013
0.88229 0.09397 0.01453 0.00468 0.00181 0.00110 0.00060
Table 4 reports the variations of LIBOR and Euribor residuals that are explained by the first seven PCs. The estimation is done using daily LIBOR and Euribor data from July 1, 1999 till June 30, 2003. Column 1 lists the number of principal components. Column 2 reports the LIBOR term structure model variation that is explained by the corresponding number of PCs. Column 3 reports the variations explained for the residuals of LIBOR after it is regressed on the first common factor. Column 4 reports the variations explained for the residuals of LIBOR after it is regressed on the first two common factors. Column 5 reports the Euribor term structure model variation that is explained by the corresponding number of PCs. Column 6 reports the variations explained for the residuals of Euribor after it is regressed on the first common factors. Column 7 reports the variations explained for the residuals of Euribor after it is regressed on the first two common factors. Q∗
P ∗ (t , τ ) = Et
[
t +τ
∫
]
r ∗ (u) du ,
exp − t
Q where Et
Q∗ and Et
denote Ft -conditional expectations under Q and Q ∗ , respectively. We assume that term structure dynamics in the US and the eurozone are driven by a vector of N latent state variables X (t ) = [X1 (t ), X2 (t ), . . . , XN (t )]′ . We assume that X (t ) follows an affine diffusion under the physical measure: dX (t ) = κ [ϑ − X (t )] dt + Σ St dW (t ),
(3)
where W (t ) is an N × 1 independent standard Brownian motion under measure P , κ and Σ are N × N parameter matrices, and ϑ is an N × 1 parameter vector. The matrix St is diagonal with (i, i)-th elements St (ii) ≡
αi + βi′ X (t ),
i = 1, . . . , N ,
(4)
where αi is a scalar parameter and βi is an N × 1 parameter vector. We also assume that the spot rates r (t ) and r ∗ (t ) are affine functions of the N latent state variables: r (t ) = δ0 + δ ′ X (t ) and
r ∗ (t ) = δ0∗ + δ ∗′ X (t ),
(5)
where δ0 and δ0∗ are scalars, and δ and δ ∗ are N × 1 vectors. X (t ) denotes all factors including both domestic and common factors. If all elements of vectors δ and δ ∗ are non-zero, then all factors affect both spot rates r (t ) and r ∗ (t ). If only those elements of δ are non-zero for which the corresponding elements of δ ∗ are zero, then only domestic factors affect spot rates r (t ) and r ∗ (t ). Common factors are defined as the factors that enter the expressions for both r (t ) and r ∗ (t ). All other factors in vector X (t ) are country-specific or local factors. We require local factors from one country to be conditionally independent of local factors from another country. That is, local factors from the two countries could be correlated through their dependence on common factors. However, conditioning on common factors, they should not be correlated. For the rest of the article, we refer to a two-country joint term structure model as decomposable if it can be decomposed under the physical measure P into two single-country affine models. It can be shown that all affine joint term structure models are decomposable as long as the common factors do not depend on the local factors. The local factors may or may not depend on the common factors. The basic intuition of the above result is that if the common factors do not depend on the local factors, then the term structure in
A.V. Egorov et al. / Journal of Econometrics 162 (2011) 55–70
each country is completely determined by the common factors and its local factors. Consequently, we can estimate term structure dynamics in each country without using any information from the other country if our model is correctly specified. If a model is decomposable under measure P , it is also decomposable under measure Q provided that the market prices of risk for each country do not depend on the local factors of the other country. We restrict our analysis to decomposable models, which has the important advantage of reducing the dimensionality of the joint model by the number of local factors from the other country. A two-country joint term structure model is symmetric if the submodels for domestic and foreign countries have the same structure. We restrict our analysis to symmetric models. In these cases, each country has the same number of local and common factors, although the extent to which they are affected by these factors can be different. While the symmetry assumption simplifies the analysis, it can be easily relaxed and the methodology can be extended to non-symmetric models. The country risk premiums are assumed to follow completely affine specifications. Hence, the domestic country risk premium is defined by Λ(t ) = St λ, where λ is an N × 1 parameter vector with zero components corresponding to the foreign country-specific factors. Similarly, the foreign country risk premium is defined by Λ∗ (t ) = St λ∗ , where λ∗ is an N × 1 parameter vector with zero components corresponding to the domestic countryspecific factors. Our decomposition from joint to single-country term structure models can be applied in a similar way towards decomposition in the risk-neutral measure. Under the Q -measure, dX (t ) = κ Q ϑ Q − X (t ) dt + Σ St dW Q (t ),
(6)
where dW (t ) = dW (t ) − Λ(t )dt , ϑ and κ represent the riskneutral parameters for the US. Similarly, under the Q ∗ -measure, Q
dX (t ) = κ Q
Q
∗
Q
∗ ∗ ϑ Q − X (t ) dt + Σ St dW Q (t ),
∗
∗
(7)
P (X (t ), τ ) = exp −A(τ ) − B(τ )′ X (t ) ,
P ∗ (X (t ), τ ) = exp −A∗ (τ ) − B∗ (τ )′ X (t ) .
The yields of zero-coupon bonds (denoted by Y (X (t ), τ ) = − ln(P (X (t ), τ ))/τ ) are an affine function of the state variables,5 Y (X (t ), τ ) = A(τ )/τ + B(τ )′ /τ X (t ),
(8)
Y (X (t ), τ ) = A (τ )/τ + B (τ ) /τ X (t ),
(9)
∗
∗
∗
′
where the scalar function A(·) (A (·)) and the N × 1 vector-valued function B(·) (B∗ (·)) either have a closed-form or can be easily solved via numerical methods. The affine structure significantly simplifies bond pricing and empirical analysis of term structure dynamics. To completely characterize the risk exposures of an international bond portfolio, we also need to model the dynamics of the dollar/euro exchange rate. Assuming complete market and noarbitrage pricing, Backus et al. (2001) and Ahn (2004) have shown that the exchange rate, S (t ), defined as the number of dollars per unit of euro, equals ∗
S (t + τ ) S (t )
=
M ∗ (t + τ ) M (t + τ )
or equivalently s (t + τ ) − s (t ) = m∗ (t + τ ) − m (t + τ ) , where M (t ) and M ∗ (t ) are the pricing kernels in the US and the eurozone, respectively, s (t ) = log S (t ) , m∗ (t ) = log M ∗ (t ), and m (t ) = log M (t ) . We assume that the pricing kernels in the US and the eurozone are given by the following specifications dM (t )
= −r (t ) dt − Λ′ (t )dW (t ) ,
M (t ) dM ∗ (t ) M ∗ (t )
= −r ∗ (t ) dt − Λ∗′ (t )dW (t ) .
Given the dynamics of the pricing kernels, following Backus et al. (2001) and Inci and Lu (2004), we obtain the dynamics of the log exchange rate by using Ito’s lemma, ds (t ) = r (t ) − r ∗ (t ) dt +
+
N −
λi − λ∗i
N 1 −
2 i=1
λ2i − λ∗i 2
αi + βi′ X (t ) dt
αi + βi′ X (t )dW (t ) .
(10)
i =1
Therefore, both bond yields and exchange rates follow affine diffusions. In the next section, we will conduct specification analysis of the number of common and local factors assumed in X (t ). Currency depreciation is driven by the difference in loadings on the common factors as well as the difference in the local factors times the loadings on them. In our empirical work, we will estimate the joint term structure model together with the exchange rate equation. Recognizing that there may be additional noise or observational error in the exchange rate process, we will include an error term in (10) which is normally distributed with zero mean.
∗
where dW Q (t ) = dW (t ) − Λ∗ (t )dt , ϑ Q and κ Q represent the risk-neutral parameters for the eurozone. Given the risk-neutral dynamics of X (t ) under both Q and Q ∗ , prices of zero-coupon bonds in the US and the eurozone are given as:
61
3.2. Classification of four-factor joint term structure models In the above section, we have developed a general N-factor joint ATSM for the US and the eurozone. In this section, we specialize the general model to a four-factor model based on the empirical results from principal component analysis using LIBOR and Euribor data. Following the analysis of Dai and Singleton (2000) for singlecountry models, we classify all admissible symmetric four-factor joint term structure models into subfamilies, and within each subfamily, derive the maximal model that nests existing models. The main idea is to treat the two-country model as a single model in which some factor(s) are common and the others are local. We then study possible specifications for common factors. Dropping trivial cases, we come up with specifications for all interesting models. Specifically, for the joint affine term structure, we assume the canonical representation Am (N ), where m ∈ {0, 1, . . . , N } is the number of state variables that affect the instantaneous variance of X (t ). In particular, for i = m + 1, . . . , N, we have St (ii) = Xi (t )1/2
1/2
for i = 1, . . . , m, and St (ii) = 1 + βi′ Xi (t )
′ where βi = βi1 , . . . , βim, 0, . . . , 0 .6
for i = m + 1, . . . , N,
For all the four-factor joint affine models we consider, our symmetry assumption implies that each model has two common factors and one local factor for each country. All four-factor models can be divided into the following six subfamilies.
,
5 See, e. g., Dai and Singleton (2000) and references therein.
6 Admissibility restrictions in Dai and Singleton (2000) and Aït-Sahalia and Kimmel (2002) also apply to our two-country affine term structure model. Additional restrictions on the structure of the affine two-country term structure are discussed later.
62
A.V. Egorov et al. / Journal of Econometrics 162 (2011) 55–70
• A0 (4) Model
In this model, the dynamics of X (t ) are given as
κ11 X1t κ21 X2t d = κ31 X3t X4t κ41
0
0 0
κ22 κ32 κ33 κ42 0
0 −X1t 0 −X2t dt 0 −X3t −X4t κ44
W1t
W + d 2t . W
(11)
3t
Without loss of generality, we assume that X1 and X2 are the common factors. We also assume that X3 is the US local factor and that X4 is the euro local factor. The restriction that κ43 = 0 is necessary for X3 and X4 to be local factors. Thus for each country we have an A0 (3) model with two common factors. Specifically, the spot rate in the US equals r (t ) = δ0 + δ1 X1 (t ) + δ2 X2 (t ) + δ3 X3 (t ), and the spot rate in the eurozone equals r ∗ (t ) = δ0∗ + δ1∗ X1 (t ) + δ2∗ X2 (t ) + δ4∗ X4 (t ) .
• A1 (4) Model
The dynamics of X (t ) are given as
κ11 X1t X2t κ21 d = κ31 X3t κ41 X4t
0
κ22 κ32 κ42
X1t 0 + 0 0
0 ϑ1 − X1t 0 −X2t dt 0 −X3t −X4t κ44
0 0
κ33 0
0
κ11 X1t 0 X2t d = X3t 0 X4t 0
W4t
Now, we have two possibilities for non-trivial symmetric choice of common factors. The first one is the A2,1 (4) model, where κ34 = κ43 = 0, and X1 and X2 are the two common factors. In this model, both common factors affect both the volatility and drift terms of (13), and none of the local factors affects the volatility term. For each country, we have an A2 (3) model. The spot rate in the US equals r (t ) = δ0 + δ1 X1 (t ) + δ2 X2 (t ) + δ3 X3 (t ), and the euro spot rate is r ∗ (t ) = δ0∗ + δ1∗ X1 (t ) + δ2∗ X2 (t ) + δ4∗ X4 (t ). • A2,2 (4) Model The model has the following dynamics:
0 0
1 + β21 X1t 0 0
1 + β31 X1t 0
0 0 0
1 + β41 X1t
W1t W2t . ×d W3t W4t
(12)
The model has the following dynamics: X1t κ11 X2t κ21 d = X3t κ31 X4t κ41
X1t
X2t 0 0
0 W1t 3t
κ33 0
0 ϑ1 − X1t 0 ϑ2 − X2t dt 0 −X3t κ44 −X4t
0
0 1 + β31 X1t + β32 X2t 0
0
0 0 1 + β41 X1t + β42 X2t
(13)
X1t
0 ϑ1 − X1t 0 ϑ2 − X2t dt 0 −X3t −X4t κ44
0
W1t 0 d W2t . W
0
X2t 0 0
0
0 1 0
0 1
(14)
3t
W4t
In A2,2 (4), X3 and X4 are the two common factors, while X1 and X2 are the local factors in (14). This leads to restrictions κ12 = κ21 = κ31 = κ32 = κ41 = κ42 = 0 and β31 = β32 = β41 = β42 = 0. We further impose, without loss of generality, κ34 = 0. In this case, the common factors affect only correlation structure (i.e., the drift term). For the US, we have r (t ) = δ0 + δ1 X1 (t ) + δ3 X3 (t ) + δ4 X4 (t ), and for the euro zone, we have r ∗ (t ) = δ0∗ + δ2∗ X2 (t ) + δ3∗ X3 (t ) + δ4∗ X4 (t ). Thus for each country, we have a special case of the A1 (3) model. The common factors X3 and X4 follow a bivariate Gaussian process, while each local factor follows a univariate square root process. • A3 (4) Model In this model, Eq. (3) specializes to X1t κ11 X2t κ21 d = X3t κ31 X4t κ41
0
κ22 0 0
0
0 + 0
X2t
0 0
0
0 0
κ33 0
0 ϑ1 − X1t 0 ϑ2 − X2t dt 0 ϑ3 − X3t κ44 −X4t
0
0
0
0
X3t 0
W1t W2t d . W3t
0 1 + β41 X1t
(15)
W4t
Without loss of generality, non-trivial symmetric common factors are X1 and X4 . This implies that β42 = β43 = 0, and κ12 = κ13 = κ23 = κ32 = κ42 = κ43 = 0. For the US, we have r (t ) = δ0 + δ1 X1 (t ) + δ2 X2 (t ) + δ4 X4 (t ), and for eurozone the spot rate is r ∗ (t ) = δ0∗ + δ1∗ X1 (t ) + δ3∗ X3 (t ) + δ4∗ X4 (t ). • A4 (4) Model In this model, Eq. (3) simplifies to X1t κ11 X2t κ21 d = X3t κ31 X4t κ41
κ12 κ22 κ32 κ42
W × d 2t . W W4t
0 0
0
0 + 0
κ12 κ22 κ32 κ42
κ33 κ43
0 0
0
X1t
• A2,1 (4) Model
0 0
κ22
0 + 0
The symmetry assumption suggests that X1 should be a common factor. Without loss of generality, we assume that X2 is the second common factor. To ensure that X3 and X4 are local factors, we include additional restrictions: κ23 = κ24 = κ34 = κ43 = 0. For each country, we have an A1 (3) model with two common factors X1 and X2 . The first factor X1 affects both the drift and volatility terms in (12), while the second factor X2 affects only the drift term in (12). Moreover, none of the local factors affects volatility directly since the equation for X1 in the system (12) does not depend on any other factor. In the US, the spot rate equals r (t ) = δ0 + δ1 X1 (t ) + δ2 X2 (t ) + δ3 X3 (t ), and in the euro zone, the spot rate is r ∗ (t ) = δ0∗ + δ1∗ X1 (t ) + δ2∗ X2 (t ) + δ4∗ X4 (t ).
0
X1t
0 + 0 0
0 0
κ33 0
0
X2t
0 0
0 ϑ1 − X1t 0 ϑ2 − X2t dt 0 ϑ3 − X3t κ44 ϑ4 − X4t
0
0
X3t
0
W1t 0 W2t d W3t . (16) 0 W 0
X4t
4t
Without loss of generality, non-trivial symmetric common factors are X1 and X2 . This implies that κ13 = κ14 = κ23 = κ24 = κ34 = κ43 = 0. For the US, we have r (t ) = δ0 + δ1 X1 (t ) + δ2 X2 (t ) + δ3 X3 (t ), and for eurozone, the spot rate is r ∗ (t ) = δ0∗ + δ1∗ X1 (t ) + δ2∗ X2 (t ) + δ4∗ X4 (t ). For each country, we get a particular case of A3 (3) model.
A.V. Egorov et al. / Journal of Econometrics 162 (2011) 55–70
4. Estimation and testing methodologies In this section, we discuss the econometric methodologies for estimating and testing the multi-factor joint ATSM. Specifically, we adopt the approximate maximum likelihood method of AïtSahalia (forthcoming) to estimate model parameters. We also use the nonparametric Portmanteau test of Hong and Li (2005) to examine the in-sample performance of the models. Similarly, we use the test of Egorov et al. (2006) and Hong et al. (forthcoming) to evaluate the out-of-sample density forecasting performances of the models. 4.1. Approximate maximum likelihood estimation One challenge we face in our analysis of the joint affine term structure models is parameter estimation. The most desirable estimation method is maximum likelihood, given its consistency and asymptotic efficiency. However, except for the case of a multi-factor Gaussian model, the transition density of most affine models generally has no closed-form. For these models, maximum likelihood is infeasible and alternative estimation methods have to be used. Many existing studies use the quasi-maximum likelihood estimation (e.g., Han and Hammond, 2003; Dewachter and Maes, 2001; Brennan and Xia, 2006; Tang and Xia, 2006) because of its ease of implementation. Two assumptions are needed for the quasi-maximum likelihood method to work. First, the density of the state vector conditional on the previous observation is assumed to follow a multivariate Gaussian distribution. Second, the mean vector and covariance matrix of the state vector are assumed to be proportional to the length of time between observations. As Aït-Sahalia and Kimmel (2002) point out, both these assumptions are unlikely to hold. Only some affine yield models have a Gaussian transition density, and even in those cases, the assumptions of quasi-maximum likelihood estimation regarding the mean and variance of the transition density are not accurate. Another popular method is the simulation-based efficient method of moments of Gallant and Tauchen (1996, 2001). Dai and Singleton (2000) use this in their affine term structure estimation. The efficient method of moments is efficient as the number of moment conditions goes to infinity with the number of data observations. However, Duffee and Stanton (2004) find that this method performs poorly in a small sample in the context of the affine term structure model. Other potential estimation methods include simulated maximum likelihood estimation method of Brandt and Santa-Clara (2002) or the empirical characteristic function method of Singleton (2001). These methods are computationally intensive for a scalar diffusion and especially difficult for multivariate diffusions. Following Aït-Sahalia (2002, forthcoming) and Aït-Sahalia and Kimmel (2002), we estimate the joint term structure models using the approximate maximum likelihood method. The approximate likelihood method provides extremely fast and accurate estimations for affine models (see the comparison in Egorov et al. (2003)) when the data are sampled daily, as in our case. One disadvantage of this method is that it requires preliminary work in obtaining a closed-form formula for the approximate likelihood through linear expansions. Aït-Sahalia and Kimmel (2002) derive this analytic formula for two-factor and three-factor models. Since the relation between the state vector and bond yields is affine, as in Eqs. 
(8) and (9), we can derive the transition function of the bond yields and currency change from the transition function of the state vector through a change of variables and multiplication by a Jacobian. Let pX (∆t , x|xo ; θ ) denote the transition function, that is the conditional density of X (t + ∆t ) = x given X (t ) = xo . Let pY (∆t , y|yo ; θ ) also denote the transition function of the vector of
63
yields and currency changes Y (t + ∆t ) = y given Y (t ) = yo . In our case, with daily data, ∆t is the inverse of the number of trading days in a year (∆t ≈ 1/250). We obtain latent factors X by inverting a system of Eqs. (8) and (9) by taking enough yields. To guarantee invertability of state vector X , the rank of this system should be equal to the number factors N. This system can be written in matrix form as Y = Γo (θ ) + Γ ′ (θ )X . It follows that X = Γ ′−1 (θ )(Y − Γo (θ )). Hence, pY (∆t , y|yo ; θ ) ≡ Γ ′−1 (θ )pX (∆t , Γ ′−1 (θ )(y − Γo (θ ))|Γ ′−1 (θ )
× (yo − Γo (θ )); θ ).
(17)
Noting that the yields vector follows a Markov process and applying the Bayes rule, the log-likelihood function for discrete data on the yield vector yt sampled at dates t0 , t1 , . . . ., tn is obtained: Ln (θ ) ≡ n−1
n −
ln pY ti − ti−1 , yti |yti−1 ; θ .
(18)
i=1
To estimate this likelihood function, we need to derive a closedform approximation for pY and for the log-likelihood function of the discretely sampled vector of yields. Aït-Sahalia and Kimmel (2002) use the highly accurate expansion method described in Aït-Sahalia (forthcoming) to derive the analytical formula in twoand three-factor ATSM. In our classification, we break down the joint term structure model into two-country models with a lower dimension of factors. The zero restrictions make the derivation of the closed-form solution for the likelihood expansions for the joint four-factor term structure model much easier, and make it possible to adapt the results in Aït-Sahalia and Kimmel’s (2002) three-factor models for our specifications. In principle, it is also possible to derive directly the likelihood expansions for the 4-dimensional, unconstrained model without assuming decomposability. In summary, here is an overview of the approximate maximum likelihood estimation procedure. Given an initial value of parameter vector θ we can estimate Γo (θ ) and Γ (θ ). Affine structure implies a system of ordinary differential equations for Γo (θ ) and Γ (θ ). It also provides us linear transformation from the observed yields and currency change Y (ti ) to the latent factors (or state variables) X (ti ) for i = 0, 1, 2, . . . , n. Closed-form approximation of Aït-Sahalia provides transition density pY ti − ti−1 , yti |yti−1 ; θ for i = 1, 2, . . . , n and thus likelihood function Ln (θ ). In the end, we maximize the likelihood function Ln (θ ). Thus, the only role the affine structure plays in the estimation method is to simplify the transformation from observed yields to state variables. This procedure can be extended for the case in which we want to use more yields than the number of factors in our model. In this situation, we usually assume that some yields and exchange rates are observed with errors and make assumptions on the structure of these errors. In particular, in our estimation we assume that 6month and 10-year LIBOR and Euribor are observed without errors, while 1-year, 2-year, and 5-year yields in the LIBOR and Euribor as well as exchange rate are observed with errors. Like Inci and Lu (2004), the observation errors are assumed to be normally distributed with zero means but different variances, serially uncorrelated, cross-sectionally uncorrelated and independent of the state variables. The log-likelihood function Ln (θ ) in that case should be augmented by additional term accounting for these errors. 4.2. Nonparametric tests for in-sample and out-of-sample performance Dai and Singleton (2000) point out that it is difficult to formally compare the relative goodness of fit of different affine models, which are generally not nested. To overcome this difficulty, we apply the nonparametric tests developed by Hong and Li (2005) to
64
A.V. Egorov et al. / Journal of Econometrics 162 (2011) 55–70
assess the in-sample goodness of fit of different models. More importantly, we provide one of the first analyses on the performances of different models in out-of-sample forecasting of the joint conditional densities of bond yields using the tests developed in Egorov et al. (2006) and Hong et al. (forthcoming). The nonparametric tests we used are based upon model transition density, which captures the full dynamics of a continuous time process. The basic idea of the tests is that if a model is correctly specified, then the probability integral transform of data via the model transition density, which is often called model ‘‘generalized residuals’’, should be i.i.d. U [0, 1]. Then, one can test the i.i.d. U [0, 1] hypothesis by comparing the kernel estimator of the joint density of the generalized residuals with the product of two U [0, 1] densities. One great advantage of this nonparametric approach is that it allows comparison across different non-nested models via a metric measuring the distance of the model generalized residuals from i.i.d. U [0, 1]. As the transition density can capture the full dynamics of a continuous-time process, these tests have power against most model misspecifications. Specifically, suppose we have a random sample of interest rates {rτ ∆ }Lτ =1 of size L, where ∆ is the time interval at which the data are observed or recorded. For a given continuous-time interest rate model, there is a model-implied transition density of
∂ P rτ ∆ ≤ r |I(τ −1)∆ , θ = p(r , τ ∆|I(τ −1)∆ , θ ), 0 < r < ∞, ∂r where θ is an unknown finite-dimensional parameter vector, I(τ −1)∆ = {r(τ −1)∆ , r(τ −2)∆ , . . . , r∆ } is the information set available at time (τ − 1) ∆. We divide the whole sample into two subsamples: an estimation sample {rτ ∆ }Rτ =1 of size R, which is used to estimate model parameters, and a forecast sample {rτ ∆ }Lτ =R+1 of size n = L − R, which is used to evaluate out-of-sample density
forecast.7 We can then define the probability integral transform of the data with respect to the model-implied transition density: Zτ (θ) ≡
∫
rτ ∆
p(r , τ ∆|I(τ −1)∆ , θ )dr ,
τ = 1, 2, . . . , L.
(19)
−∞
If the continuous-time model is correctly specified in the sense that there exists some θ0 such that the model-implied transition density p(r , τ ∆|I(τ −1)∆ , θ0 ) coincides with the true transition density of interest rates, then the transformed sequence {Zτ (θ0 )} is i.i.d. U [0, 1]. Intuitively, the U [0, 1] distribution indicates proper specification of the stationary distribution of rτ ∆ , and the i.i.d. property characterizes the correct specification of its dynamic structure. If {Zτ (θ )} is not i.i.d. U [0, 1] for all θ ∈ Θ , then p(r , τ ∆|I(τ −1)∆ , θ ) does not capture the true data-generating process. Thus, we can evaluate the in-sample and out-of-sample performance of a model by testing whether {Zτ (θ )}Rτ =1 and {Zτ (θ)}Lτ =R+1 are i.i.d. U [0, 1], respectively. Hong and Li (2005) propose an in-sample specification test that uses a quadratic form between gˆj (z1 , z2 ), a kernel estimator of the joint density of {Zτ , Zτ −j }, and 1, the product of two U [0, 1] densities.8 This test has been extended to the out-of-sample context in Egorov et al. (2006) and Hong et al. (forthcoming). Specifically, the test statistics in Hong and Li (2005) are given below
Qˆ (j) ≡
[
(n − j)h
1
∫
]
2
gˆj (z1 , z2 ) − 1
dz1 dz2
0
0
− hΨh0
1
∫
1/2
V0 ,
j = 1, 2, . . . ,
(20)
where j is a prespecified lag order.9 Hong and Li (2005) show that under suitable regularity conditions, Qˆ (j) → N (0, 1) in distribution when the continuous-time model is correctly specified. With various choices for lag order j, Qˆ (j) can reveal useful information regarding which lag order significantly departs from i.i.d. U [0, 1]. However, if a large set of {Qˆ (j)} is considered, then some of them will probably be significant even if the null is true, due to statistical sampling variation. Moreover, when comparing two different models, it is desirable to use a single Portmanteau test statistic. For this purpose, the above studies consider the following Portmanteau evaluation statistic p − ˆ (p) = √1 W Qˆ (j). p j=1
(21)
Egorov et al. (2006) and Hong et al. (forthcoming) have shown ˆ (p) → N (0, 1) in distribution when a model is that for any p, W correctly specified. Intuitively, when a model is correctly specified, cov[Qˆ (i), Qˆ (j)] → 0 in probability for i ̸= j as n → ∞. That is, Qˆ (i) and Qˆ (j) are asymptotically independent whenever i ̸= ˆ (p) is a normalized j. Thus, the Portmanteau test statistic W sum of approximately i.i.d. N (0, 1) random variables, and so is asymptotically N (0, 1). This test may be viewed as a generalization of the popular Box–Pierce–Ljung type autocorrelation test from a linear time series context to a continuous-time context with an out-of-sample setting. Under model misspecification, as n → ∞, Qˆ (j) → ∞ in probability whenever {Zτ , Zτ −j } are not independent or U [0, 1]. As long as model misspecification occurs such that there exists some ˆ (p) → ∞ lag order j ∈ {1, . . . , p} at which Qˆ (j) → ∞, we have W ˆ (p) can in probability. Therefore, the Portmanteau test statistic W be used as an omnibus procedure to evaluate the in-sample and out-of-sample performance of a continuous-time model. The key to evaluate multi-factor ATSM is to compute their generalized residuals. Suppose we have a time series observations of the yields of N zero-coupon bonds with different maturities,
L
Yτ ∆,k τ =1 , k = 1, . . . , N. Assuming that the yields are observed without error, given a parameter estimator θˆ using the estimation
R
sample Yτ ∆,k τ =1 , k = 1, . . . , N , we can solve for the underlying L state variables Xτ ∆,k τ =1 , k = 1, . . . , N. To examine whether the model transition density p(Xτ ∆ |I(τ −1)∆ , θ ) of Xτ ∆ given I(τ −1)∆ ≡ {X(τ −1)∆ , . . . , X∆ } under the physical measure captures the joint density of the process X (t ), we can test whether the probability
L
integral transforms of Yτ ∆,k τ =1 , k = 1, . . . , N, with respect to the model-implied transition density is i.i.d. U [0, 1]. There are different ways to conduct the probability integral transform for ATSM. Following Diebold et al. (1999), we partition
9 The nonstochastic centering and scaling factors are
7 One can also use rolling estimation or recursive estimation. We expect that our test procedures are applicable to these different estimation methods under suitable regularity conditions. 8 For more detailed discussion of this kernel estimator, see Hong and Li (2005). Simulation studies in Hong and Li (2005) show that the tests perform well in small samples even for highly persistent financial data.
1
[ ∫ Ψh0 ≡ (h−1 − 2)
k2 (u)du + 2 −1
∫
1
[∫
1
k(u + v)k(v)dv
V0 ≡ 2 −1
−1
b
and kb (·) ≡ k(·)/ −1 k(v)dv.
1
∫
∫
]2
b
k2b (u)dudb −1
0
2
]2 du
,
− 1,
A.V. Egorov et al. / Journal of Econometrics 162 (2011) 55–70
Following Diebold et al. (1999), we partition the joint density of the N different yields Y_{τ∆,1}, . . . , Y_{τ∆,N} at time τ∆ under the physical measure into the product of N conditional densities,
\[
p\big(Y_{\tau\Delta,1}, Y_{\tau\Delta,2}, \dots, Y_{\tau\Delta,N} \mid I_{(\tau-1)\Delta}, \hat\theta\big)
= \prod_{k=1}^{N} p\big(Y_{\tau\Delta,k} \mid Y_{\tau\Delta,k-1}, \dots, Y_{\tau\Delta,1}, I_{(\tau-1)\Delta}, \hat\theta\big),
\]
where the conditional density p(Y_{τ∆,k}|Y_{τ∆,k−1}, . . . , Y_{τ∆,1}, I_{(τ−1)∆}, θ̂) of Y_{τ∆,k} depends not only on the past information I_{(τ−1)∆} but also on {Y_{τ∆,l}}^{k−1}_{l=1}, the yields at τ∆ with shorter maturities.10 We then transform the yield Y_{τ∆,k} through its corresponding model-implied transition density:
\[
Z^{(1)}_{\tau,k}(\hat\theta) = \int_{0}^{Y_{\tau\Delta,k}} p\big(y \mid Y_{\tau\Delta,k-1}, \dots, Y_{\tau\Delta,1}, I_{(\tau-1)\Delta}, \hat\theta\big)\,dy,
\quad k = 1, \dots, N. \tag{22}
\]
This approach produces N generalized residual samples, {Z^{(1)}_{τ,k}(θ̂)}^L_{τ=1}, k = 1, . . . , N. We can use {Z^{(1)}_{τ,k}(θ̂)}^R_{τ=1} and {Z^{(1)}_{τ,k}(θ̂)}^L_{τ=R+1} to evaluate the in-sample and out-of-sample performance of ATSMs in capturing the dynamics of the k-th yield, respectively. For each k, both series should be approximately i.i.d. U[0, 1] under correct model specification.

We can also combine the N generalized residuals {Z^{(1)}_{τ,k}(θ̂)}^L_{τ=1} in (22) in a suitable manner to generate a long sequence, which we may call the combined generalized residuals of an ATSM. Define U = (Y_{∆,1}, Y_{∆,2}, . . . , Y_{∆,N}, Y_{2∆,1}, Y_{2∆,2}, . . . , Y_{2∆,N}, . . . , Y_{L∆,1}, Y_{L∆,2}, . . . , Y_{L∆,N}). We can then conduct the probability integral transforms of U_τ with respect to the model-implied transition density that depends on all the past yields and the contemporaneous yields with shorter maturities:
\[
Z^{(2)}_{\tau}(\hat\theta) = \int_{0}^{U_\tau} p\big(y \mid U_{\tau-1}, \dots, U_1, \hat\theta\big)\,dy,
\quad \tau = 1, \dots, LN. \tag{23}
\]
We can use {Z^{(2)}_τ(θ̂)}^{RN}_{τ=1} and {Z^{(2)}_τ(θ̂)}^{LN}_{τ=RN+1} to measure the in-sample and out-of-sample performance of ATSMs, respectively. Both series should also be approximately i.i.d. U[0, 1] under correct model specification and can be used to check the overall performance of an ATSM. In contrast, each individual sequence of generalized residuals {Z^{(1)}_{τ,k}}^L_{τ=R+1} in (22) can be used to check the performance of an ATSM in forecasting the probability density of each individual yield.
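To make the construction in (22)–(23) concrete, the following minimal sketch computes a single generalized residual by numerically integrating a conditional density up to the observed yield. The function `cond_density` is a hypothetical placeholder for the model-implied conditional density p(y | ·, θ̂); it is not the ATSM density itself.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

def generalized_residual(y_obs, cond_density):
    """Probability integral transform of one observed yield, as in Eq. (22):
    Z = integral from 0 to y_obs of the model-implied conditional density."""
    z, _ = quad(cond_density, 0.0, y_obs)
    return z

# Illustration with a placeholder density (not an ATSM density).
cond_density = gamma(a=2.0, scale=0.02).pdf
print(generalized_residual(0.05, cond_density))  # the PIT lies in [0, 1]
```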
5. Empirical results

In this section, we provide an empirical analysis of the six four-factor joint ATSMs. Specifically, we provide approximate maximum likelihood estimates of A0(4), A1(4), A2,1(4), A2,2(4), A3(4) and A4(4) using dollar and euro yield data, as well as the dollar/euro exchange rate. We also examine the in-sample and out-of-sample performance of these models based on the nonparametric tests.
10 In general, there are N! ways of factoring the joint transition density of yields with different maturities. In our application, the transition density of the yields of long-term bonds depends on the contemporaneous yields of shorter-maturity bonds, because the short end of the yield curve is generally more sensitive to various economic shocks and is more volatile. In fact, one is often interested in knowing how short-term interest rate movements, which may be initiated or changed by the central banks, are transmitted into long-term interest rate movements. For LIBOR and Euribor of the same maturity, we choose to condition Euribor on LIBOR.
Finally, we study the information contained in the term structure factors of the best model for forecasting macroeconomic variables in both the US and the eurozone.

Table 5 contains parameter estimates and standard errors for the six models we consider. We also report the estimated log-likelihood function for each model. However, as the models are non-nested, their log-likelihood estimates cannot be easily compared.

Table 6 reports the nonparametric portmanteau statistics for both in-sample and out-of-sample performance. We choose our main sample (from July 1, 1999 to June 30, 2003) as the estimation sample and the next three years (from July 1, 2003 to June 30, 2006) as the forecast sample. We consider W(p) for p = 5, 10, 15, and 20, where p is the lag truncation order. Using W(p) with various lag orders can reveal which lag orders depart significantly from i.i.d. U[0, 1]. As a robustness check, we separately report results when we limit our sample to only the 6-month and 10-year zero-coupon bonds for LIBOR and Euribor. The W(p) statistics are asymptotically standard normal under correct specification, and the results show that all models are rejected.

One caveat about this nonparametric test is that it is extremely powerful. As an in-sample statistic, it tests whether the entire distribution is captured by the model. As an out-of-sample forecast statistic, it tests whether the forecast of the entire distribution is accurate. As a result, this test often rejects a model that other specification tests fail to reject. We therefore mainly use this specification test as a way to compare and rank different models. Out-of-sample performance is especially important, because introducing more complicated models with larger numbers of estimated parameters creates a potential danger of overfitting noise in the data. Thus, the model that performs better out-of-sample might be more effective in capturing the underlying data-generating process.

Our results show that the A2,2(4) model is the best model in capturing the joint term structure based on both the in-sample and out-of-sample statistics. A4(4) has by far the worst out-of-sample performance, although it has reasonably good in-sample performance. This illustrates the importance of examining both in-sample and out-of-sample performance. The other models perform worse in both the in-sample and out-of-sample results.

We can interpret our results in light of the trade-off, pointed out in Dai and Singleton (2000), between flexibility in modeling time-varying volatility and flexibility in modeling the correlation of bond yields. In Am(N), m is the number of state variables that affect the instantaneous variance of X(t). In the Gaussian model (m = 0), none of the state variables affects the conditional variance of bond yields, but bond yields can have arbitrary correlation structures. At the other extreme, when m = 4, all state variables affect the variances. Since parameter restrictions are imposed to ensure admissibility, models with m = 4 have zero correlation across the state variables. Hence, by increasing m from 0 to 4, the model gains flexibility in modeling the conditional variance of bond yields but becomes more restrictive in modeling their correlation.

Our findings are consistent with this trade-off. We find that A2,2(4) is the best model. In the A2,2(4) model, the term structure in each individual country follows an A1(3) model. Thus, volatility in each country is driven by the local factor, while both common factors are Gaussian. As a result, both countries have rich volatility structures, while the model retains some freedom in the correlation of the factors.

A3(4) also provides a reasonably good fit of the term structure. Model A3(4) assumes a restricted A2(3) model for each country: in each country, volatility is driven by one common factor and the country-specific factor. This provides an even richer volatility structure in each country than A2,2(4). However, A3(4) is too restrictive in modeling the conditional correlations and has an overall worse performance than A2,2(4). The other models also prove to be too restrictive in modeling the dynamics of the joint term structure.
Table 5
Parameter estimates from the factor models.
[For each of the six four-factor models, A0(4), A1(4), A2,1(4), A2,2(4), A3(4) and A4(4), the table lists the estimate (Coeff.) and standard error (SE) of each model parameter (the δ, κ, β, λ, ϑ and σ coefficients described in the text), together with the estimated log-likelihood (Log L).]
Table 5 reports the estimation results for each of the six four-factor models. The estimation is done using daily LIBOR and Euribor data from July 1, 1999 to June 30, 2003. For each model, we report the parameter estimates, the standard errors and the estimate of the log-likelihood function (Log L). σFX is the standard deviation of the noise or observational error associated with the log exchange rate change. σ1y, σ2y, σ5y, σ*1y, σ*2y, and σ*5y are the standard deviations of the noise or observational errors associated with the 2-year, 5-year and 10-year LIBOR, and the 2-year, 5-year and 10-year Euribor. All other coefficients for the models are described in the text.
Table 6
Nonparametric portmanteau statistics for the in-sample and out-of-sample performance of affine models.

                      In-sample                                            Out-of-sample
                      Combined   US6m     EU6m     US10y    EU10y          Combined   US6m     EU6m     US10y    EU10y
A0(4)      W(5)       210.52     47.59    58.56    46.40    68.86          49.31      10.25    11.29    5.06     20.03
           W(10)      291.49     68.37    79.35    69.27    97.96          68.11      13.29    16.21    9.6      28.77
           W(15)      355.74     84.38    97.21    84.92    119.00         82.25      16.94    20.57    11.12    36.85
           W(20)      410.20     97.75    110.65   98.72    135.45         95.12      20.08    24.13    11.95    43.43
A1(4)      W(5)       188.32     52.73    60.94    67.91    47.60          236.43     57.4     74.21    84.23    72.54
           W(10)      263.41     73.91    86.28    95.37    66.34          333.85     81.01    101.36   119.79   101.79
           W(15)      320.41     89.32    106.31   116.87   82.01          406.03     98.12    123.56   147.38   123.49
           W(20)      369.73     103.46   122.49   134.72   93.59          469.80     112.37   142.64   170.52   140.77
A2,1(4)    W(5)       221.66     53.66    64.71    63.77    58.15          116.13     22.47    29.79    49.05    33.38
           W(10)      308.65     74.59    93.33    87.66    80.37          166.85     31.00    42.71    68.20    46.04
           W(15)      375.57     90.54    115.20   107.74   99.15          205.00     37.23    51.80    83.49    57.94
           W(20)      436.64     105.28   133.07   124.37   113.86         238.07     43.62    59.33    96.02    67.77
A2,2(4)    W(5)       106.34     27.04    35.56    26.44    25.74          42.62      12.39    11.52    14.40    10.64
           W(10)      151.57     39.19    49.93    37.28    37.92          60.83      16.34    16.35    19.35    15.48
           W(15)      184.55     48.18    61.48    45.39    48.60          74.42      19.84    19.34    22.73    19.44
           W(20)      213.91     54.81    71.32    51.51    54.65          85.73      22.55    22.12    26.95    23.22
A3(4)      W(5)       159.72     47.79    37.46    37.93    46.10          127.33     38.47    32.44    32.43    32.35
           W(10)      224.49     66.41    52.21    54.47    64.72          179.12     54.25    45.60    46.45    46.24
           W(15)      274.11     80.78    64.44    67.05    78.53          219.10     65.20    56.53    57.50    55.79
           W(20)      317.51     94.03    74.14    77.29    54.65          253.96     75.06    64.97    66.07    63.83
A4(4)      W(5)       85.77      21.13    22.69    23.89    25.12          875.41     211.53   214.29   205.70   208.57
           W(10)      119.31     28.12    35.17    33.55    36.42          1194.45    292.02   301.85   277.71   283.73
           W(15)      144.63     34.58    43.97    41.65    45.29          1468.09    347.60   361.63   334.49   338.15
           W(20)      166.48     38.65    49.99    47.53    52.60          1716.35    392.16   407.18   374.95   382.21

Table 6 reports the nonparametric portmanteau statistics W(p), defined in the text, for the in-sample and out-of-sample performance of six specifications of four-factor affine models. p represents the lag truncation order and equals 5, 10, 15, and 20 in our case. We separately report results when our sample is limited to only 6-month and 10-year zero-coupon bonds for LIBOR and Euribor, respectively. The in-sample estimation is done using daily LIBOR and Euribor data from July 1, 1999 to June 30, 2003, and the out-of-sample test is done using data from July 1, 2003 to June 30, 2006. W(p) is asymptotically standard normal under correct model specification.
In the A1(4) model, both countries' term structures are A1(3). In A1(4), the volatility is described by a single common factor, while the correlation is driven by two common factors and the country-specific factors. This provides maximum flexibility in fitting the correlation of factors among the models that have a non-Gaussian structure (i.e., among A1(4), A2,1(4), A2,2(4), A3(4) and A4(4)). Model A2,1(4) has two factors driving volatility: for each country, the volatility is driven by both common factors, and A2,1(4) assumes a restricted A2(3) model for each country. Due to the restrictions on their volatility structures, both A1(4) and A2,1(4) perform worse than A2,2(4), in-sample and out-of-sample. Model A0(4) allows for maximal flexibility in fitting the factor correlation, and thus the correlation of interest rates, but none of the state variables can affect the variance. The variance is therefore restricted to be homoskedastic, and the model performs substantially worse than A2,2(4). Finally, in model A4(4), each country's term structure is described by a restricted A3(3) model. Although the model allows for time-varying volatility and correlated factors, it imposes zero conditional correlations with positive unconditional correlation, which is clearly counterfactual. As a result, A4(4) has the worst out-of-sample fit of all six models, although it performs well in-sample.

In summary, our empirical results so far provide important insights into the kind of affine models needed to capture the joint dollar–euro term structure. Tables 2–4 suggest that such models should contain at least four factors. In particular, Table 4 suggests that two of these four factors should be common factors and the other two should be local factors. Tables 5 and 6 suggest that among all the completely affine models we consider, the A2,2(4) model has the best in-sample and out-of-sample performance in capturing the joint dynamics of the dollar–euro term structure. In this four-factor model, two local factors drive the volatilities of bond yields, while the common factors follow Gaussian processes.

Next, we provide some interpretation of the factors in the best affine model, A2,2(4). Specifically, we relate the four implied latent factors of A2,2(4) to the LIBOR and Euribor PCs and the exchange rate changes, as well as to subsequent growth in Gross Domestic Product (GDP) and inflation in both economies. We standardize the PCs and the A2,2(4) factors to have means of 0 and standard deviations of 1 so that the coefficients can be compared. Table 7 provides regression results of the PCs and the exchange rate change on the latent factors, while Table 8 shows the regressions of subsequent US and EU GDP growth and inflation on the latent factors. The variables in Table 7 are at a monthly frequency, while those in Table 8 are at a quarterly frequency. The regressors in Tables 7 and 8 are the four A2,2(4) factors. These factors are originally estimated on a daily basis; we use the estimated month-end and quarter-end factors as regressors in Tables 7 and 8, respectively.

In Table 7 Panel A, we regress the first two LIBOR- and Euribor-PCs on the four latent factors of the A2,2(4) model. When the first two LIBOR-PCs are regressed on the four latent factors, the coefficients of the US local factor and the two common factors are highly significant. As expected, the Europe local factor is insignificant. Similarly, when the first two Euribor-PCs are regressed on the four latent factors, the US local factor has by far the smallest effect. In general, both the local and the common factors are important in explaining the PCs. This confirms our initial analysis that both local and common factors are needed to explain the first two PCs of the dollar and euro term structures.

In Table 7 Panel B, we regress the percentage change in the dollar/euro exchange rate on the four latent factors. The first column shows the results for the exchange rate change from last month to the current month, and the second column shows the results from the current month to the next month. We find that none of the four factors is significant in explaining exchange rate movements: the A2,2(4) term structure factors are largely orthogonal to exchange rate movements. This shows that the movement of the exchange rate is not a simple linear function of the interest rate factors and that it needs to be modeled separately as an additional factor. We presently incorporate the exchange rate movement as a constraint in our yield curve estimation through (10).

In Table 8, we regress the subsequent 6-month, 1-year, and 2-year GDP growth and inflation on the four latent factors.
Table 7
Regression of US and EU principal components and exchange rate on A2,2(4) factors.

            Panel A: Principal components                                     Panel B: Exchange rate
            pc1_us           pc2_us          pc1_eu          pc2_eu           Lag FX change     FX change
Constant    −0.20 (−12.68)   0.07 (2.63)     0.14 (3.07)     0.02 (2.68)      0.30 (0.29)       −0.38 (−0.36)
X1          0.67 (79.74)     0.28 (21.86)    0.08 (3.54)     −0.01 (−1.35)    −0.21 (−0.38)     −0.97 (−1.79)
X2          0.00 (0.25)      0.03 (1.08)     0.36 (6.30)     0.43 (47.12)     0.65 (0.51)       0.41 (0.32)
X3          0.22 (28.14)     0.60 (50.21)    0.32 (14.35)    0.43 (118.73)    −0.60 (−1.19)     0.13 (0.25)
X4          0.80 (107.04)    0.68 (59.38)    0.49 (22.76)    0.57 (166.48)    −0.69 (−1.42)     −0.23 (−0.48)
Adj. R²     1.00             1.00            0.96            1.00             0.03              0.01
N           48               48              48              48               48                48

(t-statistics in parentheses)

Table 7 reports the results when the US and EU PCs, as well as the exchange rate change, are regressed on the A2,2(4) factors. Panel A shows the results for the PCs. Principal component analysis is conducted for US and EU interest rates separately and the first two factors are extracted. In each region, we include 1-month, 3-month, 6-month, 1-year, 10-year, and 30-year interest rates in the PCs. Panel B shows the results for the exchange rate change. Lag FX change is defined as the percentage change of the dollar/euro exchange rate from last month to the current month. FX change is defined as the percentage change of the dollar/euro exchange rate from the current month to the next month. We report the results using data from July 1, 1999 to June 30, 2003. All variables are standardized to have means of 0 and standard deviations of 1.
Table 8
Regression of US and EU growth and inflation, 6 months and 1 year ahead, on A2,2(4) factors.
[Panel A: 6-month and 1-year ahead GDP growth. Columns: US 6m growth, EU 6m growth, US–EU 6m growth difference; US 1y growth, EU 1y growth, US–EU 1y growth difference. Rows: Constant, X1, X2, X3, X4 (coefficient with t-statistic in parentheses), Adj. R², N.
Panel B: 6-month and 1-year ahead inflation. Columns: US 6m inflation, EU 6m inflation, US–EU 6m inflation difference; US 1y inflation, EU 1y inflation, US–EU 1y inflation difference. Rows as in Panel A.]
Table 8 reports the results when US and EU 6-month and 1-year ahead GDP growth and inflation are regressed on the A2,2(4) factors. We use quarterly data from Q3 of 1999 to Q2 of 2006. From Q3 2003 to Q2 2006, we construct the A2,2(4) factors using the coefficients estimated from Q3 1999 to Q2 2003. Panel A reports the results for GDP growth. For each time horizon, the dependent variables are the US GDP growth, the EU GDP growth and the difference between the US and EU growth rates. Panel B reports the results for inflation. For each time horizon, the dependent variables are the US inflation, the EU inflation and the difference between the US and EU inflation rates. The A2,2(4) factors are standardized to have a mean of 0 and a standard deviation of 1. Hansen–Hodrick (1980) standard errors are reported.
As we only have a short sample of data, this part of our analysis is highly exploratory. We obtain seasonally adjusted quarterly US GDP and the US consumer price index from the St. Louis Fed and the corresponding EU data from Eurostat, and construct GDP growth and inflation from one quarter-end to the next. We use the estimated quarter-end A2,2(4) factors as regressors. Since the 1-year and 2-year macro variables are sampled less frequently than the regressors, there is an overlapping-data problem which affects the standard errors. We use Hansen and Hodrick (1980)'s adjustment to correct the standard errors. To increase the sample, from 2003 to 2006 we construct the A2,2(4) factors using the coefficients estimated from 1999 to 2003. Fig. 3 shows the four A2,2(4) factors through time.
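The overlapping-data correction can be illustrated with a short sketch. The code below is a generic implementation of a Hansen–Hodrick (1980) type covariance estimator with uniform weights out to the overlap horizon; the function name, variable names and the `overlap` argument are illustrative assumptions, not the authors' code.

```python
import numpy as np

def hansen_hodrick_se(X, y, overlap):
    """OLS coefficients with Hansen-Hodrick (1980) standard errors.

    X       : (T, k) regressor matrix (include a column of ones for the constant)
    y       : (T,) dependent variable measured over `overlap`-period overlapping horizons
    overlap : number of overlapping periods (e.g. 4 quarters for 1-year-ahead growth)
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta

    # "Meat" of the sandwich: autocovariances of x_t * e_t with uniform (truncated) weights.
    S = np.zeros((k, k))
    for lag in range(-(overlap - 1), overlap):
        j = abs(lag)
        G = (X[j:] * e[j:, None]).T @ (X[:T - j] * e[:T - j, None])
        S += G if lag >= 0 else G.T
    cov = XtX_inv @ S @ XtX_inv
    return beta, np.sqrt(np.diag(cov))
```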
Panel A of Table 8 shows the regressions based on subsequent GDP growth in the US and in Europe, as well as the difference in subsequent GDP growth rates between the two economies. Estrella and Hardouvelis (1991) and Ang et al. (2006), among others, find that yield curve factors predict future GDP growth. We find that our A2,2(4) factors predict future GDP growth rates in both the US and Europe. The A2,2(4) factors have more predictive power for future US GDP growth than for European GDP growth: the four factors explain 74% of the variation in subsequent 6-month GDP growth in the US but only 12% of that in Europe. We also find that the predictive power is higher at longer horizons. The factors also predict the difference in growth rates between the US and Europe.
Fig. 3. A2,2(4) factors (X1, X2, X3, X4) over time. Fig. 3 shows the four A2,2(4) factors from July 1, 1999 to June 30, 2006. From 2003 to 2006, the A2,2(4) factors are constructed based on the coefficients estimated from 1999 to 2003. The factors are standardized with means of 0 and standard deviations of 1.
For instance, the second common factor X4 loads negatively on US GDP growth and positively on European GDP growth; hence, it predicts the difference in GDP growth between the two economies. In a longer sample, one could potentially use the term structure factors to forecast future GDP growth rates more accurately and to assess whether the business cycles of the two economies are in sync.

Panel B of Table 8 reports the regression results of future inflation on the four latent factors. Mishkin (1990) and others find that the term structure contains information about future inflation. We find that in our sample the A2,2(4) factors are more closely related to future US inflation and have less predictive power for future European inflation. The four factors also have some predictive power for the future difference in inflation rates. With a longer sample, our factors may be able to forecast the future inflation difference between the two economies more accurately. We leave this to future research.

6. Conclusion

In today's global financial markets, many investors face the challenge of managing huge international fixed income portfolios. In this article, we provide a thorough analysis of multi-factor affine models for the joint dollar–euro term structure. Such models are essential for managing interest rate risks in the two largest economies in the world. Our article systematically classifies joint term structure models with up to four local and common factors. We then provide new evidence on the joint term structure using daily LIBOR and Euribor data from July 1999 to June 2003. We estimate the joint term structure models using the approximate maximum likelihood method of Aït-Sahalia (2002, forthcoming) and Aït-Sahalia and Kimmel (2002). We also evaluate the in-sample and out-of-sample performances of the joint term structure models using the nonparametric tests of Hong and Li (2005), Egorov et al. (2006), and Hong et al. (forthcoming). We find that a four-factor model with two Gaussian common factors and two local square-root factors best describes the term structures of Euribor and LIBOR. This model appears to provide the best trade-off between flexibility in modeling correlation and flexibility in modeling volatility.
Acknowledgements

The authors thank an anonymous referee, Warren Bailey, Jonathan Batten, Pierre Collin-Dufresne, Bob Jarrow, David Just, Yongmiao Hong, Georg Mosburger, Monika Piazzesi, Martin Schneider, David Zilberman and the seminar participants at West Virginia University, the Third Vienna Symposium on Asset Management, and the 2007 European Financial Management Association Annual Meeting for helpful comments.

References

Ahn, D., 2004. Common factors and local factors: Implications for term structures and exchange rates. Journal of Financial and Quantitative Analysis 39, 69–101.
Aït-Sahalia, Y., 2002. Maximum-likelihood estimation of discretely sampled diffusions: A closed-form approach. Econometrica 70, 223–262.
Aït-Sahalia, Y., 2007. Closed-form likelihood expansions for multivariate diffusions. Annals of Statistics (forthcoming).
Aït-Sahalia, Y., Kimmel, R., 2002. Estimating affine multifactor term structure models using closed-form likelihood expansions. Princeton University working paper.
Ang, A., Piazzesi, M., Wei, M., 2006. What does the yield curve tell us about GDP growth? Journal of Econometrics 131, 359–403.
Atkins, R., 2006. Euro notes cash in to overtake dollar. Financial Times, December 27.
Backus, D.K., Foresi, S., Telmer, C., 2001. Affine term structure models and the forward premium anomaly. Journal of Finance 56, 279–304.
Brandt, M., Santa-Clara, P., 2002. Simulated likelihood estimation of diffusions with an application to exchange rate dynamics in incomplete markets. Journal of Financial Economics 63, 161–210.
Brennan, M., Xia, Y., 2006. International capital markets and foreign exchange risk. Review of Financial Studies 19, 753–795.
Cheridito, P., Filipović, D., Kimmel, R., 2006. Market price of risk specifications for affine models: Theory and evidence. Journal of Financial Economics 83, 123–170.
Cochrane, J., Piazzesi, M., 2005. Bond risk premia. American Economic Review 95, 138–160.
Dai, Q., Singleton, K., 2000. Specification analysis of affine term structure models. Journal of Finance 55, 1943–1978.
Dai, Q., Singleton, K., 2002. Term structure dynamics in theory and reality. Review of Financial Studies 16, 631–678.
Dai, Q., Singleton, K., Yang, W., 2004. Predictability of bond risk premia and affine term structure models. Stanford University working paper.
Dewachter, H., Maes, K., 2001. An admissible affine model for joint term structure dynamics of interest rates. KU Leuven working paper.
Diebold, F., Hahn, J., Tay, A., 1999. Multivariate density forecast evaluation and calibration in financial risk management: High-frequency returns on foreign exchange. Review of Economics and Statistics 81, 661–673.
Duffee, G., 2002. Term premia and interest rate forecasts in affine models. Journal of Finance 57, 405–443.
Duffee, G., Stanton, R., 2004. Estimation of dynamic term structure models. University of California at Berkeley working paper.
Egorov, A., Li, H., Xu, Y., 2003. Maximum likelihood estimation of time-inhomogeneous diffusions. Journal of Econometrics 114, 107–139.
Egorov, A., Hong, Y., Li, H., 2006. Validating forecasts of the joint probability density of bond yields: Can affine models beat random walk? Journal of Econometrics 135, 255–284.
European Central Bank, 2004. The euro bond market study. Frankfurt, Germany, December.
European Commission, 2005. EMU after five years. European Economy Special Report 1, February.
Estrella, A., Hardouvelis, G., 1991. The term structure as a predictor of real economic activity. Journal of Finance 45, 555–576.
Gallant, A.R., Tauchen, G., 1996. Which moments to match? Econometric Theory 12, 657–681.
Gallant, A.R., Tauchen, G., 2001. Efficient method of moments. University of North Carolina working paper.
Han, B., Hammond, P., 2003. Affine models of joint dynamics of exchange rates and interest rates. Stanford University working paper.
Hansen, L., Hodrick, R., 1980. Forward exchange rates as optimal predictors of future spot rates: An econometric analysis. Journal of Political Economy 88, 829–853.
Hodrick, R., Vassalou, M., 2002. Do we need multi-country models to explain exchange rate and interest rate dynamics? Journal of Economic Dynamics and Control 26, 1275–1299.
Hong, Y., Li, H., 2005. Nonparametric specification testing for continuous-time models with applications to term structure of interest rates. Review of Financial Studies 18, 37–84.
Hong, Y., Li, H., Zhao, F., 2007. Can the random walk model be beaten in out-of-sample density forecasts: Evidence from the intraday exchange rates. Journal of Econometrics (forthcoming).
Inci, A., Lu, B., 2004. Exchange rates and interest rates: Can term structure models explain currency movements? Journal of Economic Dynamics and Control 28, 1595–1624.
Litterman, R., Scheinkman, J., 1991. Common factors affecting bond returns. Journal of Fixed Income 1, 54–61.
Mishkin, F., 1990. What does the term structure tell us about future inflation? Journal of Monetary Economics 25, 77–95.
Mosburger, G., Schneider, P., 2005. Modelling international bond markets with affine term structure models. University of Vienna working paper.
Open University, 2006. Managing the European economy after the introduction of the Euro. Open University working paper.
Singleton, K., 2001. Estimation of affine asset pricing models using the empirical characteristic function. Journal of Econometrics 102, 111–141.
Tang, H., Xia, Y., 2006. An international examination of affine term structure models and the expectation hypothesis. Journal of Financial and Quantitative Analysis 42, 41–80.
Journal of Econometrics 162 (2011) 71–78
Semi-nonparametric test of second degree stochastic dominance with respect to a function

Keith D. Schumann*
Texas A&M University, Department of Agricultural Economics, Agricultural and Food Policy Center, 77843-2124 College Station, TX, United States
* Tel.: +1 979 845 8014. E-mail address: [email protected].
Article info
Article history: Available online 12 October 2009

Abstract

In an expected utility framework, assuming a decision maker operates under utility k(·|θ), for two risky alternatives X and Y with respective distribution functions F and G, alternative X is said to dominate alternative Y with respect to k(·|θ) if ∫_{−∞}^{y} [F(t) − G(t)] dk(t|θ) ≤ 0 for all y. Utilizing the empirical distribution functions of F and G, a statistical test is presented to test the null hypothesis of indifference between X and Y given k(·|θ) against the hypothesis that X dominates Y with respect to k(·|θ). This is a large sample testing application of stochastic dominance with respect to a function. The asymptotic distribution of the test statistic associated with the null hypothesis given a sub-set of the utility function parameter space is developed. Based on large sample rejection regions, the hypothesis of preference of one alternative over another is demonstrated with an empirical example.
© 2011 Published by Elsevier B.V.
1. Introduction
Risky investment alternatives can be preference-ranked by the ordering of representations that attempt to take risk attitudes into account. Stochastic dominance measures attempt to evaluate preference rankings based on the distributions of the alternatives and an assumed decision mechanism. The goal of this paper is to extend the literature on statistical methods for testing second degree stochastic dominance between alternatives. A testing procedure for preference ranking given a utility function as the mechanism for choice will be presented. This procedure generalizes previous methods developed for testing preferences.

The formal expression of the nature of rational decisions in the context of stochastic outcomes was initially presented by Von Neumann and Morgenstern (1953), and later expounded upon, most notably by Friedman and Savage (1948, 1952), Pratt (1964) and Arrow (1965), by way of describing risk aversion. Built on the fundamentals of expected utility, stochastic dominance methods were developed as a technical means to rank risky alternatives. Hadar and Russell (1969) as well as Hanoch and Levy (1969) are commonly credited with outlining the concepts of first and second degree stochastic dominance. Meyer (1975) extended these measures of stochastic dominance by explicitly incorporating the utility function in the comparison rather than general assumptions about the choice mechanism of the decision-making agent. Assuming a strictly increasing utility function, he found that, without loss of generality, for a restricted support random variable standardized to the support range [0, 1],
\[
\int_{0}^{y} [F(t) - G(t)]\,dr(t) \le 0
\]
for all y ∈ [0, 1], given an increasing, twice differentiable function r(·), if and only if G is at least as risky as F. The riskiness of a random variable is understood with respect to the utility preference or decision-making mechanism an agent employs. Given equivalent expectations, the more risky prospect is the one that has a higher relative variability. For an agent who is averse to risk, i.e., a person who chooses to insure against risk, minimizing risk or variability in the lower tail of a distribution is a primary concern.

Meyer (1975) formalized the concept of second degree stochastic dominance with respect to a function (SSD(kθ) or SSD(k)), with an application (Meyer, 1977a) and theoretical development (Meyer, 1977b). For two restricted support random variables standardized to the [0, 1] interval with corresponding distribution functions F and G, F SSD(k) G if and only if
\[
\int_{0}^{y} [F(t) - G(t)]\,dk(t) \le 0
\]
for all y ∈ [0, 1] and a function k(·|θ) with a given value of θ. This is a necessary and sufficient condition for F to be preferred to or indifferent to G by all agents with a utility function u(·) that exhibits equal or more risk aversion, or concavity, than the function k(·). The SSD(k) ordering result is unique at y. It can be seen that second degree stochastic dominance (SSD) is a special case of SSD(kθ) where k(x) = x.
Another generalization of stochastic dominance with respect to a function is called stochastic efficiency with respect to a function (SERF) (Hardaker et al., 2004). This method seeks to examine the dominance of one alternative over another given a continuous range of a risk aversion measure, as originally specified by Meyer (1977b). A potential problem with a particular application of stochastic dominance with respect to a function is that order preferences are evaluated only at specified boundary points of the risk aversion measure. For two alternatives with distribution functions that cross multiple times, this could lead to misidentification of the efficient set of alternatives over that range. The SERF method evaluates, over the relevant parameter space of the utility function, the certainty equivalent CE(θ) of each alternative, that is, the fixed amount at which a risk averse agent would be indifferent between the random variable and that amount.

Several methods have been proposed to develop statistical hypothesis tests for ranking one random alternative over another, or for placing alternatives in an efficient set of preferred alternatives. One often used assumption is that nonparametric estimates of the underlying distributions can be effectively utilized in these assessments. In particular, McFadden (1989) and Klecan et al. (1991) developed tests, based on equally numbered observations, which are variations of the Kolmogorov–Smirnov procedure and deal with the issue of sampling independence. Anderson (1996) proposed nonparametric tests for the first, second, and third degree stochastic dominance criteria based on analogs of Pearson goodness-of-fit tests. These tests were shown to be comparable in size and power to generalized Lorenz curve methods (as in Bishop et al., 1989) for comparing distributional differences in wealth. Multivariate extensions of Anderson's test have also been introduced (see Crawford (2005) and Post and Versijp (2005)).

In addition to mean–variance methods and stochastic dominance criteria, efforts to link these and related numerical procedures have been undertaken. Yitzhaki (1982) introduced methods to incorporate Gini's mean difference (GMD) in the analysis of preference ranking, and he elaborated on this work by utilizing resampling methods to calculate the variance of estimates of this type (Yitzhaki, 1991). A comparison of mean-GMD analyses and stochastic dominance results, as well as an application to agricultural commodities, was illustrated in McDonald et al. (1997), where it was suggested that mean-GMD methods of analyzing the efficient set of preferred alternatives had superior properties over stochastic dominance methods. Shalit and Yitzhaki (1994) introduced the concept of marginal conditional stochastic dominance (MCSD), and Seiler (2001) proposed a nonparametric test for this preference ranking method.

Another work in the area of hypothesis testing for preferences is Eubank et al. (1993), hereafter referred to as ESY. ESY described a nonparametric method to test for second degree stochastic dominance. This work was a generalization of a methodology developed by Deshpande and Singh (1985), where the distribution of one of the risky prospects was assumed to be known for testing purposes. The asymptotic properties of the SSD test statistic, including its asymptotic power, were derived in ESY. The large sample variance of the statistic had a form that was cumbersome from an estimation perspective, so a resampling method to approximate the variance was suggested. Another variation of this testing procedure was developed by Kaur et al. (1994), which can be implemented across unequal sample sizes but requires a search across a finite number of possibilities to determine the proper test statistic. Davidson and Duclos (2006) evaluate bootstrapping methods that show an improvement in the asymptotic efficiency of existing statistics for testing stochastic dominance in their simulated experiments.
2. Hypothesis test

Drawing on many of the former concepts and methodologies, the conditions for a test of second degree stochastic dominance with respect to a function are specified in this section. Consider an agent with an initial level of wealth who wishes to invest in a subset of a finite number of risky investment alternatives. The agent's initial wealth will be labeled w0, and two risky opportunities, X and Y, will be considered for investment. It is assumed that the agent has a single utility function for decision-making involving wealth, measured with a proxy utility function k(·|θ), where the unknown parameter set θ describes the shape specifications of that utility function. The risk measure is utility function-specific and is a function of θ as it relates to the variables X and Y. The value of θ is considered to be known a priori. The class of utility functions to be considered comprises those that are increasing and twice differentiable with respect to the risky variable. This ensures that the local risk aversion coefficient exists for all X and Y.

Independent samples of size n and m from the prospective investments X and Y, respectively, will be considered as a basis for estimating the distribution functions of the alternatives. The observations on the investments will be denoted (xi) and (yj) for i = 1, . . . , n and j = 1, . . . , m. Given the previous conditions, the expected utility of the first investment, for example, is defined as
\[
Ek(X|\theta) = \int_{\chi} k(t|\theta)\,dF(t), \tag{1}
\]
where F is the cumulative distribution function of the investment X over χ, the support of X. Often, there is not enough information to make parametric distributional assumptions on F. Even so, imposing assumed distributions when making comparisons can be detrimental to a testing procedure employing utility-weighting, given the relative emphasis on the tails of the distributions. Since the distribution functions of the alternatives are assumed to be unknown, the empirical distribution function (EDF), Fn, will be considered as an estimate of F. The EDF is defined as
\[
F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{(-\infty,\,x_i]}(x), \tag{2}
\]
where I_A(z) is the indicator function, which takes the value one if z ∈ A and zero otherwise.

Referring again to the definition of second degree stochastic dominance with respect to a function, where F SSD(kθ) G if and only if
\[
\int_{-\infty}^{x} [F(t) - G(t)]\,dk(t|\theta) \le 0
\]
for all x in the support of X and Y, a function k(·|θ), and a given value of θ, a formal testing procedure can be elicited. Given that the utility functions in the general class in question are one-to-one, and assuming these functions are locally monotonic, it is straightforward to conjecture that the desired result might be attained by transforming the variables to utility measures and then using these as the basis for the quantities of interest in the test procedure for second degree stochastic dominance described in ESY. Following the previous assumptions, we wish to test hypotheses of the type:
\[
H_0: F = G \quad \forall\,\theta \in \Theta_1, \tag{3}
\]
\[
H_1: F\ \mathrm{SSD}(k_\theta)\ G \quad \forall\,\theta \in \Theta_1, \tag{4}
\]
where Θ1 is the parameterization of interest of the given utility function k(·|θ). It will be shown that the test statistic described in
ESY can be generalized for the present testing procedure.¹ To begin, let²
\[
d_{F,G}(x|\theta) = \int_{-\infty}^{x} [F(t) - G(t)]\,dk(t|\theta), \tag{5}
\]
where x ∈ (−∞, ∞). Thus, if F SSD(kθ) G, then d_{F,G}(x|θ) ≤ 0 for all x and is strictly less than zero for at least one x. Define
\[
D_{F,G}(\theta) = \frac{1}{2}\left[\int_{-\infty}^{\infty} d_{F,G}(x|\theta)\,dG(x) + \int_{-\infty}^{\infty} d_{F,G}(y|\theta)\,dF(y)\right] \tag{6}
\]
as the average of the expectations of d_{F,G}(·|θ) with respect to the distributions of both X and Y. If F SSD(kθ) G, any strict inequality is elicited over the combined support, and D_{F,G} will be less than zero. Substituting the EDFs Fn and Gm for the actual distributions F and G, respectively, the corresponding sample statistics for the previous quantities are
\[
d_{n,m}(x|\theta) = \int_{-\infty}^{x} [F_n(t) - G_m(t)]\,dk(t|\theta) \tag{7}
\]
and
\[
D_{n,m}(\theta) = \frac{1}{2}\left[\int_{-\infty}^{\infty} d_{n,m}(x|\theta)\,dG_m(x) + \int_{-\infty}^{\infty} d_{n,m}(y|\theta)\,dF_n(y)\right]. \tag{8}
\]
The specific form of the sample statistic D_{n,m}(θ) is based on the following two theorems:

Theorem 1. Let X1, . . . , Xn and Y1, . . . , Ym be two independent samples of size n and m from distribution functions F and G, respectively, let Fn and Gm be the corresponding empirical distribution functions, and let kθ be an increasing utility function that is twice differentiable. Then
\[
D_{n,m}(\theta) = \frac{1}{2}\left[\frac{1}{n}\sum_{i=1}^{n} W_i - \frac{1}{m}\sum_{j=1}^{m} T_j\right]
+ \frac{1}{2}\left[\int_{-\infty}^{\infty} F_n(x)\,[1 - F_n(x)]\,dk(x|\theta) - \int_{-\infty}^{\infty} G_m(y)\,[1 - G_m(y)]\,dk(y|\theta)\right],
\]
where W_i = \int_{x_i}^{\infty}[k(t|\theta) - k(x_i|\theta)]\,dG_m(t) and T_j = \int_{y_j}^{\infty}[k(t|\theta) - k(y_j|\theta)]\,dF_n(t).

Theorem 2. Let X1, . . . , Xn and Y1, . . . , Ym be two independent samples of size n and m from distribution functions F and G, respectively, let Fn and Gm be the corresponding empirical distribution functions, and let kθ be an increasing utility function that is twice differentiable. Then
\[
D_{n,m}(\theta) = \frac{1}{2}\left\{\overline{k_\theta(Y)} - \overline{k_\theta(X)} + \frac{1}{2}\Big(\mathrm{GMD}[k_\theta(X)] - \mathrm{GMD}[k_\theta(Y)]\Big)\right\},
\]
where, for the n realizations of the k(Xi|θ) variables, \overline{k_\theta(X)} is the average and GMD[kθ(X)] is the Gini's mean difference,³ and \overline{k_\theta(Y)} and GMD[kθ(Y)] are defined similarly.

The proofs of Theorems 1 and 2 are given in the Appendix. We would be inclined to conclude that F SSD(kθ) G if D_{n,m}(θ) < 0. The following section develops the basis by which we would reject the hypothesis that F = G in favor of the hypothesis that F SSD(kθ) G.
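A direct computation of D_{n,m}(θ) follows from Theorem 2: it only requires the sample means and Gini's mean differences of the utility-transformed observations. The sketch below assumes a user-supplied utility function k(x, θ) operating on arrays; the function names are illustrative.

```python
import numpy as np

def gini_mean_difference(z):
    """Gini's mean difference: 2/(n(n-1)) * sum_{i<j} |z_i - z_j| (see footnote 3)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    diffs = np.abs(z[:, None] - z[None, :])
    return diffs.sum() / (n * (n - 1))   # off-diagonal sum counts each pair twice

def D_nm(x, y, k, theta):
    """Sample statistic D_{n,m}(theta) via the closed form in Theorem 2."""
    kx = k(np.asarray(x, dtype=float), theta)
    ky = k(np.asarray(y, dtype=float), theta)
    return 0.5 * (ky.mean() - kx.mean()
                  + 0.5 * (gini_mean_difference(kx) - gini_mean_difference(ky)))
```

A negative value of D_{n,m}(θ) points toward F SSD(kθ) G, in line with the discussion above; the formal rejection rule requires the standard error developed in the next subsection.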
2.1. Asymptotic properties of the test statistic

The utility-transformed variables can be utilized to determine the large sample properties of the test statistic D_{n,m}(θ). Define V = kθ(X) and U = kθ(Y), and let F_V and G_U be their respective distribution functions.⁴ Letting N = n + m, the sum of the two sample sizes, the limiting distribution of the test statistic can be developed under local alternatives. Define a mean-preserving spread function δ similar to that used by Rothschild and Stiglitz (1970, 1971) and Diamond and Stiglitz (1974). This type of spread function was used to establish first degree dominance. The spread function δ is defined through
\[
G_U(v) = F_V(v) + \delta(v)/\sqrt{N}. \tag{9}
\]
The purpose of this function is to determine whether G_U can be attained by adding a function to F_V such that the means of the distributions remain the same. For evaluating the dominance of F_V over G_U, δ must be a function such that ∫_{−∞}^{v} δ(t) dt ≥ 0 for all v and ∫_{−∞}^{v} δ(t) dt > 0 for some v.

Under some additional conditions⁵ on the spread function δ, which hold in general for the distribution functions of investment returns, the null hypothesis implies that δ = 0 on the boundary, or when the distributions are equivalent. Recall that the utility transformation of the random variables is monotonic, so if the spread function is zero, the null hypothesis of equivalent distributions is true. Given the previous conditions on δ, the statistic D_{n,m}(θ) has an asymptotic normal distribution of the form
\[
\sqrt{N}\,D_{n,m}(\theta) \xrightarrow{D} N(\mu_D, \sigma_D^2). \tag{10}
\]
To find the value of µ_D, first note that if the relationship in (9) holds, then with k(v|θ) = v we find that
\[
d_{F,G}(v|\theta) = -\frac{1}{\sqrt{N}}\int_{-\infty}^{v}\delta(t)\,dt, \tag{11}
\]
and
\[
D_{F,G}(\theta) = \frac{1}{2}\left[-\frac{2}{\sqrt{N}}\int\left[\int_{-\infty}^{v}\delta(t)\,dt\right]dF(v)
- \frac{1}{N}\int\left[\int_{-\infty}^{v}\delta(t)\,dt\right]\delta'(v)\,dv\right]
= -\frac{1}{\sqrt{N}}\int [1 - F_V(t)]\,\delta(t)\,dt + O(N^{-1}); \tag{12}
\]
thus the asymptotic mean is
\[
\mu_D = -\int [1 - F_V(t)]\,\delta(t)\,dt. \tag{13}
\]
Finding the asymptotic variance requires outlining some of the asymptotic properties of the statistic. Note first that D_{n,m}(θ) is a two-sample U-statistic with kernel component functions of g(z1, z2) = (z1 + z2 − |z1 − z2|)/2 (cf. Serfling, 1980, Section 5) for both V and U. From Lemma A of Section 5.2.1 of Serfling, the variance of the kernel for this type of statistic is
\[
\sigma^2 = \frac{4\zeta_{1,Z}}{n} + O(n^{-2}), \tag{14}
\]
where ζ_{1,Z} is the variance of the expected value of the kernel conditional on Z_1 = z_1.

¹ As Davidson and Duclos point out, the structure of the hypothesis in testing for stochastic dominance often confuses the issue of what can be concluded. Given that the current test is set up as a one-sided alternative, the conclusion of interest is whether F SSD(kθ) G. If the null hypothesis is rejected, then there exists evidence to support this type of conclusion; failing to reject the null hypothesis does not necessarily imply that G dominates F. The only inference that can be made is that F does not dominate G with respect to kθ.
² Notation adopted from ESY.
³ For n observations of a random variable z, Gini's mean difference is defined as \frac{2}{n(n-1)}\sum\sum_{i<j} |z_i - z_j|.
⁴ For simplicity, θ will be taken as given and omitted from most of the remainder of the development.
⁵ For the local alternative relation on the continuous distributions, G_U(v) = F_V(v) + δ(v)/√N, δ must be a function that is absolutely continuous, with ∫|∂δ(t)/∂t| dt, ∫|t|³|∂δ(t)/∂t| dt, and ∫δ²(t) dt finite, and 0 ≤ δ ≤ 1 − F_V. Additionally, V must be finite in its absolute third moment, or E|V³| < ∞.
Focusing on the components that correspond to V, we have
\[
E[g(v_1, V_2)] = \frac{1}{2}\left[v_1 + m_V - \int |v_1 - z|\,dF_V(z)\right], \tag{15}
\]
and so
\[
\zeta_{1,V} = \frac{1}{4}\,\mathrm{Var}\left[V - \int |V - z|\,dF_V(z)\right]. \tag{16}
\]
The calculation yields a similar result for the component variance⁶ related to U, such that ζ_{1,U} → ζ_{1,V} = ζ as N → ∞. Let λ represent the limit of the proportion of the data observations represented by F_V. More specifically,
\[
\lambda = \lim_{N\to\infty} \frac{n}{N}, \tag{17}
\]
and it is assumed that 0 < λ < 1. The overall variance of D_{n,m}(θ) is the weighted combination of the component-wise variances based on λ, not unlike a two-sample t-test, and thus the asymptotic variance is
\[
\sigma_D^2 = \lim_{N\to\infty} N\,\mathrm{Var}[D_{n,m}(\theta)] = \frac{4\zeta}{\lambda(1-\lambda)}, \tag{18}
\]
and the asymptotic distribution is
\[
\sqrt{N}\,D_{n,m}(\theta) \xrightarrow{D} N\!\left(-\int [1 - F_V(t)]\,\delta(t)\,dt,\; \frac{4\zeta}{\lambda(1-\lambda)}\right). \tag{19}
\]
Since δ = 0 for all v at the boundary of the null hypothesis, µ_D is also zero, which results⁷ in
\[
N^{-1/2}\,D_{n,m}(\theta)\big/\big(2\sqrt{\zeta/nm}\big) \xrightarrow{D} N(0, 1). \tag{20}
\]
Thus, the null hypothesis of equal distributions given a specified utility function is rejected if
\[
N^{-1/2}\,D_{n,m}(\theta)\big/\big(2\sqrt{\zeta/nm}\big) < -z_\alpha \tag{21}
\]
for large N, where −z_α is the lower quantile associated with the 100(1 − α)% percentile of the standard normal distribution. The corresponding asymptotic power of the statistic is
\[
1 - \Phi\!\left(z_\alpha - \frac{\int (1 - F_V(t))\,\delta(t)\,dt}{2\sqrt{\zeta/[\lambda(1-\lambda)]}}\right), \tag{22}
\]
where Φ(·) is the standard normal cumulative distribution function.

Given that estimating the asymptotic variance can be cumbersome, and, again, since D_{n,m}(θ) is a two-sample U-statistic, the variance can be adequately approximated using a jackknife estimator (see Arvesen, 1969). The sample form of the statistic, obtained by replacing the variance term with the jackknife estimate of the variance, is defined as
\[
T_N(\theta) = D_{n,m}(\theta)/\mathrm{SE}_{D_{n,m}(\theta)}. \tag{23}
\]
Under the assumptions given in Theorems 1 and 2, calculations for the statistic D_{n,m}(θ) can be made for n ≠ m, and the order of the sample data is not important. The formulations for the standard error of this statistic, especially when employing the jackknife method for estimation, are more specific. If the sample sizes are not equal but the samples are independent of each other, the component-wise jackknife estimator can be readily computed, as shown in Arvesen (1969). If the sample sizes are equal but the between-sample independence assumption is invalidated, the jackknife method can still be used to formulate an estimate of the variance, although it may be biased, as long as the observational order of the samples is retained in the case-wise deletion.
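A minimal sketch of the jackknife-based test statistic T_N(θ) follows, assuming independent samples of possibly unequal size; the delete-one, component-wise form is one standard reading of Arvesen (1969), and the function names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def jackknife_se(x, y, stat):
    """Component-wise delete-one jackknife standard error of a two-sample statistic
    stat(x, y), assuming the two samples are independent (cf. Arvesen, 1969)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = x.size, y.size
    dx = np.array([stat(np.delete(x, i), y) for i in range(n)])
    dy = np.array([stat(x, np.delete(y, j)) for j in range(m)])
    var_x = (n - 1) / n * np.sum((dx - dx.mean()) ** 2)
    var_y = (m - 1) / m * np.sum((dy - dy.mean()) ** 2)
    return np.sqrt(var_x + var_y)

def ssdf_test(x, y, stat, alpha=0.05):
    """T_N(theta) = D_{n,m}(theta) / SE and the decision rule in (21):
    reject H0: F = G in favour of F SSD(k_theta) G when T_N(theta) < -z_alpha."""
    t = stat(x, y) / jackknife_se(x, y, stat)
    return t, t < -norm.ppf(1.0 - alpha)

# Usage, with D_nm from the earlier sketch and a user-supplied utility k(x, theta):
# t_stat, reject = ssdf_test(x_sample, y_sample, lambda a, b: D_nm(a, b, k, theta))
```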
⁶ Additional higher order terms result in the variance calculation for U due to δ. See ESY for a thorough treatment of the derivations.
⁷ In ESY, the rate of convergence of the statistic was inaccurately presented as N^{3/2}. The proper rate of convergence is N^{−1/2}.
Table 1
Data and summary statistics for 20 observations of pseudo returns of 6 alternative investments.

Obs.         X1        X2        X3        X4        X5        X6
1          103.00    100.11     96.39     58.22     82.91     68.09
2           91.09    100.52     85.70    135.89     82.54    135.33
3          101.53    100.04     99.56    135.52    128.44     96.59
4           10.19     −1.62     81.52     45.50     52.85    166.09
5          104.72    116.11     61.66     57.67     65.42     89.61
6           95.35     99.84     78.45    107.21    102.74    167.76
7          110.42    100.22    113.17     82.21    114.72    103.42
8           92.62    109.42     77.19    144.52     59.71    140.95
9           94.03     98.68    101.97    133.94     89.73     84.30
10          96.77     98.32     73.66    138.18     82.02    133.83
11          88.64     96.60    110.02     64.47     66.57     96.61
12          99.02    117.54    109.97     71.51     82.03     95.82
13         110.04    177.08     72.16     60.69     87.36     57.39
14         191.34    102.04    161.46    105.22    140.57     41.52
15         104.78    100.42    118.06     83.45     70.67    115.77
16         106.91    101.75     96.23     59.17    113.66     98.13
17          94.64     89.07    136.83    133.70     93.75    150.73
18         104.11     94.32    131.03    139.95    117.08    137.00
19         101.08    100.04    114.32    124.25    112.03     86.36
20          99.70     99.49     80.68    118.75     55.22    134.71

Mean       100.00    100.00    100.00    100.00     90.00    110.00
St. error   30.00     30.00     25.00     35.00     25.00     35.00
Minimum     10.19     −1.62     61.66     45.50     52.85     41.52
Maximum    191.34    177.08    161.46    144.52    140.57    167.76
Skewness     0.08     −1.26      0.71     −0.16      0.33     −0.08
Kurtosis     8.47      9.14      0.39     −1.73     −0.72     −0.64
3. Empirical method comparison

This section outlines an example designed to compare the previously developed test mechanism with some other preference ranking methods. In keeping with common notation in the expected utility literature (introduced in a simpler form in Von Neumann and Morgenstern (1953) in terms of comparing utilities), let ≻h indicate a binary preference ordering between two risky prospects, i.e., X1 ≻h X2 indicates that X1 is preferred to X2. In addition, ≽h indicates ''is preferred or equal to'' and ≼h indicates ''is equal or not preferred to''. Thus, X1 ≽h X2 and X2 ≽h X1 together are equivalent to X1 ∼h X2. The determination of the preference ordering is based on the preference criterion, e.g., mean–variance, stochastic dominance, or stochastic dominance with respect to a function. Thus, in the notation of the previous section,
\[
X_2\ \mathrm{SSD}(k_\theta)\ X_1 \iff CE_2(\theta) > CE_1(\theta) \iff X_2 \succ_k X_1 \tag{24}
\]
for a given θ, where ≻k designates that one alternative is ''preferred with respect to the utility function k'' over the other.

The data in Table 1 will be used for the example analyses. These are pseudo data contrived to highlight strengths and illustrate some of the drawbacks of some of the preference ranking methods, and to detail the empirical methods for calculating preferences. Each variable consists of 20 observations, which would generally be considered a small sample but is typical of investment ventures that only yield annual returns, such as those found in analyses of agronomic data. The small sample size assists in comparing where one preference-ranking method might make stronger conclusions than one where limited data produce less confidence. Noting the sample statistics, the data were chosen to result in pairwise indifference in most cases. Four of the series have equal means of $100, one series has a mean of $90, and the last has a mean of $110. Two of each of the series have standard errors of $25, $30, and $35, respectively. It is assumed that the sample is independent between observations. Where applicable, an initial wealth level of $100 will be assumed.
Fig. 1. Linear-smoothed empirical distribution functions (edfs) for the returns of 6 alternative investments of sample size 20.
Fig. 2. Mean–variance diagram for the returns of 6 alternative investments of sample size 20.
Because of the small sample size and the distribution properties evidenced by some of the sample statistics, it would be difficult to assume a parametric distributional form for any or all of the alternatives. The sample exhibits varying levels of modality, skewness, and kurtosis. It should also be noted that there are dependencies between the samples, as shown in the sample covariance matrix in Table 2. The dependency relationship can be an important factor in preference ranking, since returns on investments are often functions of similar economic drivers. As illustrated in Fig. 1, no distribution exhibits first degree dominance over any of the alternatives.

Preference rankings using the mean–variance method are not altogether clear. Alternative Xj is preferred to alternative Xl in the sense of the mean–variance criterion (≻mv) if it lies in the southeast quadrant of the Xl cross centered on the point given by the coordinates of the sample mean and standard deviation. Under this criterion, as can be seen in the diagram in Fig. 2, based on the sample estimates,
\[
X_3 \succ_{mv} X_2 \sim_{mv} X_1 \succ_{mv} X_4. \tag{25}
\]
Also note that X3 ≻mv X5.
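The mean–variance criterion just described reduces to a pairwise comparison of sample moments. A minimal sketch follows; the function name and the tie-handling convention (equal moments counted as indifference) are illustrative assumptions.

```python
import numpy as np

def mv_preference(xj, xl):
    """Return '>' if Xj dominates Xl in the mean-variance sense (mean at least as
    large and standard deviation no larger, with at least one strict inequality),
    '<' for the reverse, '~' if both moments are equal, and '?' if ambiguous."""
    mj, ml = np.mean(xj), np.mean(xl)
    sj, sl = np.std(xj, ddof=1), np.std(xl, ddof=1)
    if mj == ml and sj == sl:
        return "~"
    if mj >= ml and sj <= sl:
        return ">"
    if ml >= mj and sl <= sj:
        return "<"
    return "?"
```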
Table 3 lists all of the preference orders for the six options. The table is read as ''Xj, in the leftmost column, is preferred to, equal or not preferred to, or subordinate to Xj+1, Xj+2, . . ., listed in the top row, in terms of mean–variance preferences''. Note that several of the comparisons are ambiguous. The mean–variance analysis does not indicate the preference trade-off when expected values and variability increase simultaneously. Based on the sample estimates of the first two moments, it is clear that only alternative X4 can be eliminated from the efficient set of preferred alternatives. This type of mean–variance analysis does not account for dependency relationships between the variables.

For the remaining comparisons, a power utility function will be utilized. This utility function is characterized by constant relative risk aversion. Deference is not being shown to this function as a preferred decision mechanism; it is only being used for example purposes. Consequently, the relative ranges on the utility parameter are often outside the bounds of ranges used in applied analyses; this is done simply to show the shape of the decision mechanism. The power utility function is specified as
k(x) = (1/(1 − θ)) x^(1−θ),   if θ ≠ 1;
k(x) = ln x,                  otherwise.    (26)
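Eq. (26) and the certainty equivalents built from it can be illustrated numerically. The sketch below is only an illustration under an assumed return vector; the sample values and function names are mine, not the paper's.

```python
# A minimal numerical sketch of the power utility in Eq. (26) and the implied
# certainty equivalent CE = k^{-1}(average utility), which is the quantity the
# SERF analysis plots against theta.  The return vector is hypothetical.

import numpy as np

def k(x, theta):
    """Power utility of Eq. (26); theta indexes relative risk aversion."""
    x = np.asarray(x, dtype=float)
    if np.isclose(theta, 1.0):
        return np.log(x)
    return x ** (1.0 - theta) / (1.0 - theta)

def certainty_equivalent(returns, theta):
    """Invert the mean utility back to a money amount."""
    u_bar = k(returns, theta).mean()
    if np.isclose(theta, 1.0):
        return np.exp(u_bar)
    return ((1.0 - theta) * u_bar) ** (1.0 / (1.0 - theta))

sample = np.array([60., 80., 95., 105., 120., 140.])   # hypothetical returns
for theta in (-2.0, 0.0, 1.0, 3.0):
    print(theta, round(certainty_equivalent(sample, theta), 2))
```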
The SERF method was employed to compare the alternatives. Based on the sample, only three of the six alternatives could ever belong to the efficient set, depending on which subset of θ, the relative risk aversion parameter, is chosen. Fig. 3 illustrates the certainty equivalent lines over a range of θ.
Fig. 3. Stochastic efficiency with respect to a function (SERF) analysis for the returns of 6 alternative investments of sample size 20. θ, an index of relative risk aversion, is the parameter of the power utility function used in formulating the certainty equivalents.

Table 2
Sample covariance matrix of the 6 alternatives.

        X1        X2        X3        X4        X5        X6
X1    855.00
X2    508.84    855.02
X3    364.81    −68.07    593.76
X4    179.40     54.84    150.21   1163.75
X5    441.84    140.26    354.53    216.00    593.76
X6   −640.30   −538.74   −226.20    362.97   −316.19   1163.73
For all risk loving agents (θ ∈ (−∞, 0)), only X1 and X6 would be preferred: X1 would be preferred by highly risk loving individuals, and X6 by those who are moderately risk loving to risk neutral. For all risk averse agents (θ ∈ (0, ∞)), X6, X3, or both would be preferred: X6 would be preferred by risk neutral to moderately risk averse agents, and highly risk averse agents would prefer alternative X3. Pairwise comparisons can be made in the context of a SERF analysis given a specified utility function parameterization. For θ = 0, or risk neutral agents, X6 is preferred over the remaining alternatives and X5 is dominated by all other alternatives; a risk neutral agent would be indifferent among the remaining four alternatives. Over the entire range of θ there are several preference changes, which illustrates why comparisons at discrete values of θ can be misleading when allocating alternatives to the efficient set.
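A SERF-style computation of the kind summarised above can be sketched as follows. The return series are simulated placeholders and the θ grid is arbitrary, so the output is illustrative only; nothing here reproduces the Table 1 data.

```python
# A rough sketch of a SERF-style comparison: compute certainty equivalents over
# a grid of theta for several alternatives and record which alternative has the
# highest certainty equivalent at each theta.

import numpy as np

def ce(returns, theta):
    """Certainty equivalent under the power utility of Eq. (26)."""
    if np.isclose(theta, 1.0):
        return np.exp(np.mean(np.log(returns)))
    u_bar = np.mean(returns ** (1.0 - theta) / (1.0 - theta))
    return ((1.0 - theta) * u_bar) ** (1.0 / (1.0 - theta))

rng = np.random.default_rng(0)
alternatives = {name: rng.normal(loc, scale, size=20).clip(min=1.0)
                for name, (loc, scale) in
                {'A': (100, 25), 'B': (100, 35), 'C': (110, 35)}.items()}

for theta in np.linspace(-5, 5, 11):
    ces = {name: ce(x, theta) for name, x in alternatives.items()}
    best = max(ces, key=ces.get)
    print(f"theta={theta:5.1f}  preferred: {best}")
```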
Table 3
Preference orders based on a sample mean–variance analysis of the 6 alternatives.

        X2      X3      X4      X5      X6
X1      ∼mv     ≺mv     ≻mv     ?       ?
X2              ≺mv     ≻mv     ?       ?
X3                      ≻mv     ≻mv     ?
X4                              ?       ≺mv
X5                                      ?
Table 4
Preference orders based on pairwise tests of second degree stochastic dominance of 6 alternatives from unknown distributions, based on a sample size of 20 and a 95% confidence level (θ = 0).

        X2      X3      X4      X5      X6
X1      ∼k      ∼k      ∼k      ≻k      ∼k
X2              ∼k      ∼k      ∼k      ∼k
X3                      ∼k      ≻k      ∼k
X4                              ∼k      ∼k
X5                                      ∼k
Table 5
Preference orders based on a sample second degree stochastic dominance with respect to a function analysis of 6 alternatives from unknown distributions, based on a sample size of 20 and a 95% confidence level. k is specified to be the power utility function. The pairwise preference orderings are based on θ = −6 and θ = 6, where θ is the coefficient of relative risk aversion.

θ = −6
        X2      X3      X4      X5      X6
X1      ∼k      ∼k      ∼k      ≻k      ∼k
X2              ∼k      ∼k      ≻k      ∼k
X3                      ∼k      ≻k      ∼k
X4                              ∼k      ∼k
X5                                      ∼k

θ = 6
        X2      X3      X4      X5      X6
X1      ∼k      ∼k      ∼k      ∼k      ∼k
X2              ∼k      ∼k      ∼k      ∼k
X3                      ≻k      ≻k      ∼k
X4                              ∼k      ∼k
X5                                      ∼k
Fig. 4. Test statistic, T40(θ), based on samples of 20 returns of 6 alternative investments, X1, . . . , X6. The null hypothesis of indifference between two given alternatives is rejected in favor of H1: Xj SSD(kθ) X5, j ≠ 5, in large samples for values of T40(θ) < zα=0.05, where kθ is the power utility function with θ as the coefficient of relative risk aversion.
For example, numerically evaluating the preferences at only θ = −15 and θ = 15, albeit impractical extremes in most cases, would exclude X6 from the efficient set even though it is the dominant alternative for moderately risk loving to moderately risk averse agents.
The formal testing procedure developed in Section 3 will now be utilized. The hypothesis framework necessitates that the direction of preference and the utility parameterization be specified a priori. Once this is done, the testing method indicates the magnitude of preference based on the given sample and confidence level. Also, since the jackknife method was employed to estimate the variance of the test statistics at given values of θ, the between-variable dependencies have been maintained and are reflected in the results.
Based on the mean–variance and SERF analyses, Alternative 5 appears to be at the lower end of preferences. This alternative will be used as a base for pairwise hypothesis tests of second degree stochastic dominance with respect to a function. Fig. 4 shows the results of the pairwise tests of the hypothesis H1: Xj SSD(k) X5 for j ≠ 5 over the set θ ∈ Θ1 = (−20, 20) using a 95% confidence level. Each test, if conducted independently, rejects the null hypothesis of indifference if T40(θ) = D20,20(θ)/SE_D20,20(θ) < zα ≈ −1.645. From this illustration, it can be seen that only Alternatives 1, 2, or 3 dominate Alternative 5 with respect to the power utility function, depending on the subset of θ. There is not enough information to reject the null hypothesis for any of the other comparisons. Comparing these results to the pairwise comparisons in the mean–variance analysis in Table 3, it can be seen that the strict preference rankings resulting from the mean–variance analysis are not everywhere statistically significant over the given range of θ, which might indicate over-confidence in the mean–variance rankings, especially since an asymptotic statistic is used in the test procedure.
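The general shape of this testing recipe can be sketched as follows. The statistic used in the sketch is only a placeholder (a signed difference in mean utilities), not the D20,20(θ) of Section 3, and the return series are made up; only the jackknife standard error and the one-sided comparison with −1.645 mirror the procedure described in the text.

```python
# A schematic, not the paper's code: a pairwise statistic comparing two return
# series under the power utility k(.|theta), with its standard error estimated
# by a delete-one (paired) jackknife so that dependence between the series is
# preserved.  Large negative values of T favour x dominating y.

import numpy as np

def k(x, theta):
    x = np.asarray(x, dtype=float)
    return np.log(x) if np.isclose(theta, 1.0) else x ** (1 - theta) / (1 - theta)

def stat(x, y, theta):
    """Placeholder statistic: mean utility of y minus mean utility of x."""
    return k(y, theta).mean() - k(x, theta).mean()

def jackknife_t(x, y, theta):
    """Statistic divided by its delete-one jackknife standard error."""
    n = len(x)
    full = stat(x, y, theta)
    loo = np.array([stat(np.delete(x, i), np.delete(y, i), theta) for i in range(n)])
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return full / se

rng = np.random.default_rng(1)
x = rng.normal(105, 25, 20).clip(min=1.0)   # two hypothetical return series
y = rng.normal(95, 25, 20).clip(min=1.0)
for theta in (-6.0, 0.0, 6.0):
    t = jackknife_t(x, y, theta)
    print(f"theta = {theta:4.0f}   T = {t:7.3f}   reject at 5%: {t < -1.645}")
```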
On the other hand, where comparisons to X5 lead to indeterminate results in the mean–variance analysis, a sense of the direction of preference relative to this alternative can be gained from whether or not the statistic is negative for a given θ.
Let ≻k and ∼k denote ''dominates'' and ''does not dominate'' with respect to a utility function k, respectively, where k is the power utility function in this example. When θ = 0, the comparisons are based on a risk neutral decision maker and the test reduces to the form in ESY, i.e., H1: Xj SSD X5 for j ≠ 5. Table 4 presents the results of these tests for pairwise comparisons. The tests reveal indifference in general, apart from X1 and X3 second degree dominating X5.
Similar to Meyer's (1977b) method, let the arbitrary range θ ∈ [−6, 6], representing moderately risk loving to moderately risk averse agents, be a range of interest. Table 5 presents the preference orderings at the outer bounds of this range. For these values of θ, the preference of X3 over X5 found in the mean–variance analysis is reinforced. Again, statistical indifference between two alternatives provides more information than the indeterminate results of the mean–variance analysis. It should also be noted that only one dominance relationship is shown to be significant in all of the examples of risk loving, risk neutral, and risk averse comparisons.

4. Conclusions

Risk preferences in practice are typically restricted to moderately risk loving to moderately risk averse ranges, with special emphasis on risk averse agents because of the way they operate in marketplaces. Methods using stochastic dominance with respect to a function do not rely strictly on the mean, or on the mean and variance, of the alternatives,8 but rather on a weighted comparison of the entire distribution represented by the sample. The assumed utility function and specific parameterization are attempts to model decision behavior by weighting the distributions.
The testing procedure developed here is a generalization of other tests for second degree stochastic dominance. Mean–variance type analyses depend on the first two moments of the alternatives, but rankings based on this procedure may not imply stochastic dominance of one alternative over another. The testing procedure developed here is an analog to mean–variance comparisons in that it is a statistical assessment of the mean–Gini's mean difference comparison described by Yitzhaki (1982). It is also a generalization of Eubank, Schechtman, and Yitzhaki's nonparametric method for testing second degree stochastic dominance. Methods comparing the certainty equivalents of alternatives, such as stochastic efficiency with respect to a function, produce rankings over ranges of risk indices but may also result in over-confident assertions about the efficient set if the properties of the sample are not adequately taken into account. Testing for stochastic dominance with respect to a function is a more formal means of making these comparisons, and it is consistent with Meyer's approach of comparing risky alternatives by explicitly incorporating the utility function. As with other methods that attempt to model the decision mechanism of an agent, much has to be known about the preferences of this agent so that the a priori conditions for the hypothesis can be satisfied. In addition, a notion of the direction of preference needs to be in place, because the test does not distinguish preferences in both directions.

8 For example, use of the quadratic utility function in expected utility analyses, especially in conjunction with distributions fully specified by their first two moments, yields mean–variance analyses. The common criticism of using this type of utility function is that it exhibits increasing absolute risk aversion (cf. Pratt, 1964, for a critique).
Appendix

Proof of Theorem 1. The following will show the result of the theorem, that

Dn,m(θ) = (1/2) ∫_{−∞}^{∞} dn,m(x|θ) dGm(x) + (1/2) ∫_{−∞}^{∞} dn,m(y|θ) dFn(y)
        = (1/2) [ (1/n) Σ_{i=1}^{n} Wi − ∫_{−∞}^{∞} Gm(y)[1 − Gm(y)] dk(y|θ) ]
          + (1/2) [ ∫_{−∞}^{∞} Fn(x)[1 − Fn(x)] dk(x|θ) − (1/m) Σ_{j=1}^{m} Tj ],

where Wi = ∫_{xi}^{∞} [k(t|θ) − k(xi|θ)] dGm(t) and Tj = ∫_{yj}^{∞} [k(t|θ) − k(yj|θ)] dFn(t).

Proof. For the first component of Dn,m(θ), we show that

∫_{−∞}^{∞} dn,m(x|θ) dGm(x) = ∫_{−∞}^{∞} [ ∫_{−∞}^{x} [Fn(t) − Gm(t)] dk(t|θ) ] dGm(x)
  = ∫_{−∞}^{∞} [1 − Gm(t)] [Fn(t) − Gm(t)] dk(t|θ)
  = ∫_{−∞}^{∞} [1 − Gm(t)] Fn(t) dk(t|θ) − ∫_{−∞}^{∞} [1 − Gm(t)] Gm(t) dk(t|θ).

It then must be shown that ∫_{−∞}^{∞} [1 − Gm(t)] Fn(t) dk(t|θ) = (1/n) Σ_{i=1}^{n} Wi. Now let x(1) < x(2) < · · · < x(n) be the ordered x observations and let I[a,b](t) be the indicator function that takes on a value of one for values in [a, b] and zero otherwise. Then Fn(t) = (1/n) Σ_{i=1}^{n} I[x(i),∞](t), and the first term in our previous expression becomes

∫_{−∞}^{∞} [1 − Gm(t)] Fn(t) dk(t|θ) = (1/n) Σ_{i=1}^{n} ∫_{−∞}^{∞} [1 − Gm(t)] I[x(i),∞](t) dk(t|θ)
  = (1/n) Σ_{i=1}^{n} ∫_{xi}^{∞} [1 − Gm(t)] dk(t|θ).

Integration by parts produces

(1/n) Σ_{i=1}^{n} [ −k(xi|θ)(1 − Gm(xi)) + ∫_{xi}^{∞} k(t|θ) dGm(t) ]
  = (1/n) Σ_{i=1}^{n} ∫_{xi}^{∞} [k(t|θ) − k(xi|θ)] dGm(t) = (1/n) Σ_{i=1}^{n} Wi,

which is in accordance with the previous definition for Wi. A similar process yields (1/m) Σ_{j=1}^{m} Tj for the complementary component of Dn,m(θ).

Proof of Theorem 2. Now, recall the result of Theorem 1:

Dn,m(θ) = (1/2) [ (1/n) Σ_{i=1}^{n} Wi − ∫_{−∞}^{∞} Gm(t)[1 − Gm(t)] dk(t|θ) ]
          + (1/2) [ ∫_{−∞}^{∞} Fn(t)[1 − Fn(t)] dk(t|θ) − (1/m) Σ_{j=1}^{m} Tj ].

The proof proceeds by first showing that (1/n) Σ_{i=1}^{n} Wi − (1/m) Σ_{j=1}^{m} Tj = k̄θ(Y) − k̄θ(X), where k̄θ(X) and k̄θ(Y) denote the sample means of the transformed observations. Note that, from the previous proof,

(1/n) Σ_{i=1}^{n} Wi = ∫_{−∞}^{∞} [1 − Gm(t)] Fn(t) dk(t|θ) = (1/m) Σ_{j=1}^{m} ∫_{−∞}^{yj} Fn(t) dk(t|θ),

and by integration by parts we find that

(1/m) Σ_{j=1}^{m} ∫_{−∞}^{yj} Fn(t) dk(t|θ) = (1/m) Σ_{j=1}^{m} [ k(yj|θ)Fn(yj) − ∫_{−∞}^{yj} k(t|θ) dFn(t) ]
  = (1/m) Σ_{j=1}^{m} ∫_{−∞}^{yj} [k(yj|θ) − k(t|θ)] dFn(t).

Recall also that Tj = ∫_{yj}^{∞} [k(t|θ) − k(yj|θ)] dFn(t), and so

(1/m) Σ_{j=1}^{m} Tj = (1/m) Σ_{j=1}^{m} ∫_{yj}^{∞} [k(t|θ) − k(yj|θ)] dFn(t).

Therefore

(1/n) Σ_{i=1}^{n} Wi − (1/m) Σ_{j=1}^{m} Tj
  = (1/m) Σ_{j=1}^{m} [ ∫_{−∞}^{yj} [k(yj|θ) − k(t|θ)] dFn(t) − ∫_{yj}^{∞} [k(t|θ) − k(yj|θ)] dFn(t) ]
  = (1/m) Σ_{j=1}^{m} [ ∫_{−∞}^{yj} k(yj|θ) dFn(t) − ∫_{−∞}^{yj} k(t|θ) dFn(t) + ∫_{yj}^{∞} k(yj|θ) dFn(t) − ∫_{yj}^{∞} k(t|θ) dFn(t) ]
  = (1/m) Σ_{j=1}^{m} [ k(yj|θ) − ∫_{−∞}^{∞} k(t|θ) dFn(t) ]
  = (1/m) Σ_{j=1}^{m} k(yj|θ) − (1/m) Σ_{j=1}^{m} (1/n) Σ_{i=1}^{n} k(xi|θ)
  = k̄θ(Y) − k̄θ(X).

Now, with respect to the second term of ∫_{−∞}^{∞} dn,m(x|θ) dGm(x), that being ∫_{−∞}^{∞} [1 − Gm(t)] Gm(t) dk(t|θ), a change of variable, vθ = k(t), results in

∫_V [1 − Gm(k⁻¹(v|θ))] Gm(k⁻¹(v|θ)) dv,

assuming k⁻¹(·) exists. And thus, reviewing Kendall and Stuart (1977, Chapter 2) and Yitzhaki (1982), the term becomes

(1/2) [m(m − 1)/2]⁻¹ Σ_{i<j} |k(Yi|θ) − k(Yj|θ)|,

or (1/2) GMD[kθ(Y)]. Again, a similar result holds for ∫_{−∞}^{∞} [1 − Fn(t)] Fn(t) dk(t|θ). Combining these results gives

Dn,m(θ) = (1/2) [ k̄θ(Y) − k̄θ(X) + (1/2) GMD[kθ(X)] − (1/2) GMD[kθ(Y)] ],

which is the result of the theorem.
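The identity used in the proof above, (1/n) Σ Wi − (1/m) Σ Tj = k̄θ(Y) − k̄θ(X), can be checked numerically. The sketch below is only an illustration under the definitions of Wi and Tj given in the proofs; the samples and the value of θ are made up.

```python
# A quick numerical check of the identity used in the proof of Theorem 2: with
# Wi and Tj evaluated against the empirical distributions Gm and Fn, the
# difference of their averages equals the difference of the sample means of
# the transformed observations.

import numpy as np

def k(x, theta=0.5):                      # power utility, Eq. (26)
    return np.log(x) if np.isclose(theta, 1) else x ** (1 - theta) / (1 - theta)

rng = np.random.default_rng(3)
x = rng.uniform(50, 150, size=20)         # hypothetical X sample (size n)
y = rng.uniform(40, 160, size=25)         # hypothetical Y sample (size m)
kx, ky = k(x), k(y)

W = np.array([np.mean(np.clip(ky - kxi, 0, None)) for kxi in kx])   # Wi
T = np.array([np.mean(np.clip(kx - kyj, 0, None)) for kyj in ky])   # Tj

lhs = W.mean() - T.mean()
rhs = ky.mean() - kx.mean()
print(lhs, rhs, np.isclose(lhs, rhs))     # the two sides agree
```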
References

Anderson, G., 1996. Nonparametric tests of stochastic dominance in income distributions. Econometrica 64, 1183–1193.
Arrow, K.J., 1965. Aspects of the Theory of Risk-Bearing. Yrjö Hahnsson Foundation, Helsinki.
Arvesen, J.N., 1969. Jackknifing U-statistics. The Annals of Mathematical Statistics 40 (6), 2076–2100. Bishop, J.A., Chakraborti, S., Thistle, P.D., 1989. Asymptotically distribution-free statistical inference for generalized Lorenz curves. The Review of Economics and Statistics 71, 725–727. Crawford, Ian, 2005. A nonparametric test of stochastic dominance in multivariate distributions, presented at Cemmap Workshop: Testing Stochastic Dominance Restrictions (in honour of the contribution of Professor Haim Levy), Institute for Fiscal Studies, London. Davidson, R., Duclos, J., 2006. Testing for restricted stochastic dominance, Unpublished Working Paper, Montr’eal, Qu’ebec, Canada. Deshpande, J.V., Singh, S.P., 1985. Testing for second order stochastic dominance. Communications in Statistics: Theory and Methods 14, 887–893. Diamond, P.A., Stiglitz, J.E., 1974. Increases in risk and in risk aversion. Journal of Economic Theory 8, 337–360. Eubank, R., Schechtman, E., Yitzhaki, S., 1993. A test for second order stochastic dominance. Communications in Statistics: Theory and Methods 22, 1893–1905. Friedman, M., Savage, L.J., 1948. The utility analysis of choices involving risk. The Journal of Political Economy 56, 279–304. Friedman, M., Savage, L.J., 1952. The expected-utility hypothesis and the measurability of utility. The Journal of Political Economy 60, 463–474. Hadar, J., Russell, W.R., 1969. Rules for ordering uncertain prospects. The American Economic Review 59, 25–34. Hanoch, G., Levy, H., 1969. The efficiency analysis of choices involving risk. The Review of Economic Studies 36, 335–346. Hardaker, J.B, Richardson, J.W., Lien, G., Schumann, K.D., 2004. Stochastic efficiency analysis with risk aversion bounds: A simplified approach. Australian Journal of Agricultural and Resource Economics 48, 253–270. Kaur, A., Prakasa Rao, B.L.S., Singh, H., 1994. Testing for second-order stochastic dominance of two distributions. Econometric Theory 10, 849–866. Kendall, M., Stuart, A., 1977. The Advanced Theory of Statistics, Volume 1, 4th ed. MacMillan Publishing Co., Inc., New York. Klecan, L., McFadden, R., McFadden, D., 1991. A robust test for stochastic dominance, Working paper, Economics Dept., MIT. McDonald, J.D., Moffitt, L.J., Williams, C.E., 1997. Application of mean-Gini stochastic efficiency analysis. Australian Journal of Agricultural and Resource Economics 41, 45–62. McFadden, D., 1989. Testing for stochastic dominance. In: Fombay, T., Seo, T.K. (Eds.), Studies in Economics of Uncertainty. Springer, New York. Meyer, J., 1975. Increasing risk. Journal of Economic Theory 11, 119–132. Meyer, J., 1977a. Further applications of stochastic dominance to mutual fund performance. Journal of Financial and Quantitative Analysis 12, 235–242. Meyer, J., 1977b. Second degree stochastic dominance with respect to a function. International Economic Review 18, 477–487. Post, T., Versijp, P., 2005. Multivariate tests for stochastic dominance efficiency of a given portfolio, presented at Cemmap Workshop: Testing Stochastic Dominance Restrictions (in honour of the contribution of Professor Haim Levy), Institute for Fiscal Studies, London. Pratt, J.W., 1964. Risk aversion in the small and in the large. Econometrica 32, 122–136. Rothschild, M., Stiglitz, J.E., 1970. Increasing risk: A definition. Journal of Economic Theory 2, 225–243. Rothschild, M., Stiglitz, J.E., 1971. Increasing risk: Its economic consequences. Journal of Economic Theory 3, 66–84. Seiler, E.J., 2001. 
A nonparametric test for marginal conditional stochastic dominance. Applied Financial Economics 11, 173–177. Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. John Wiley, New York. Shalit, H., Yitzhaki, S., 1994. Marginal conditional stochastic dominance. Management Science 40, 670–684. Von Neumann, J., Morgenstern, O., 1953. Theory of Games and Economic Behavior, 3rd ed. Princeton University Press, Princeton, NJ. Yitzhaki, S., 1982. Stochastic dominance, mean variance, and Gini’s mean difference. The American Economic Review 72, 178–185. Yitzhaki, S., 1991. Calculating Jackknife variance estimators for parameters of the Gini method. Journal of Business and Economics Statistics 9, 235–239.
Journal of Econometrics 162 (2011) 79–88
Mixture models of choice under risk

Anna Conte a,b, John D. Hey c,d,∗, Peter G. Moffatt e

a Strategic Interaction Group, Max-Planck-Institut für Ökonomik, Jena, Germany
b Centre for Employment Research, University of Westminster, London, UK
c LUISS, Rome, Italy
d University of York, UK
e School of Economics, University of East Anglia, Norwich, UK

Article history: Available online 13 October 2009.
JEL classification: C15; C29; C51; C87; C91; D81.
Keywords: Expected utility theory; Maximum simulated likelihood; Mixture models; Rank dependent expected utility theory; Heterogeneity.

Abstract: This paper is concerned with estimating preference functionals for choice under risk from the choice behaviour of individuals. We note that there is heterogeneity in behaviour between individuals and within individuals. By ''heterogeneity between individuals'' we mean that people are different, in terms of both their preference functionals and their parameters for these functionals. By ''heterogeneity within individuals'' we mean that the behaviour may be different even by the same individual for the same choice problem. We propose methods of taking into account all forms of heterogeneity, concentrating particularly on using a Mixture Model to capture the heterogeneity of preference functionals. © 2009 Elsevier B.V. All rights reserved.
1. Introduction As is clear from Starmer (2000), the past five decades have witnessed intensive theoretical and empirical research into finding a good descriptive theory of behaviour under risk. Since the general acceptance of the criticisms of Expected Utility made by Allais (for example, in Allais, 1953) and others, theorists have been active in developing new theories to explain the deficiencies of Expected Utility theory. Hey (1997) provides a list1 of the major theories at that time: Allais’ 1952 theory, Anticipated Utility theory, Cumulative Prospect theory, Disappointment theory, Disappointment Aversion theory, Implicit Expected (or linear) Utility theory, Implicit Rank Linear Utility theory, Implicit Weighed Utility theory, Lottery Dependent Expected Utility theory, Machina’s Generalised Expected Utility theory, Perspective theory, Prospect theory, Prospective Reference theory, Quadratic Utility theory, Rank Dependent Expected (or Linear) Utility theory, Regret theory, SSB theory, Weighted Expected Utility theory, and Yaari’s Dual theory. All these theories were motivated by the inability of Expected Utility
∗ Corresponding address: Department of Economics and Related Studies, University of York, Heslington, York, YO10 5DD, United Kingdom. Tel.: +44 1904 433786; fax: +44 1904 433759. E-mail address:
[email protected] (J.D. Hey). 1 Full references can be found in Hey (1997). 0304-4076/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2009.10.011
theory to explain all observed behaviour. This burst of theoretical activity took place in the last thirty years or so of the 20th century. Since then, activity has been concentrated more on discovering which of these theories are empirically most plausible and robust; see, for example, Hey and Orme (1994). This period of empirical work revealed clearly that there is considerable heterogeneity of behaviour both between individuals and within individuals. By ‘heterogeneity between individuals’ we mean that people are different, not only in terms of which type of preference functional that they have, but also in terms of their parameters for these functionals. By ‘heterogeneity within individuals’ we mean that the behaviour may be different even for the same choice problem. Econometric investigation has to take these heterogeneities into account. Some of the empirical literature adopted the strategy of trying to find the best preference functional individual by individual; see, for example, Hey and Orme (1994) and Gonzales and Wu (1999). Another part of the literature attempted to find the best preference functional across a group of individuals, by, in some way, pooling or aggregating the data; see, for example, Harless and Camerer (1994). In fitting data subject by subject, the problem of heterogeneity within subjects becomes immediately apparent in two different ways. First, when confronted with the same decision problem on different occasions, people respond differently. Second, and perhaps more importantly, it was soon realised that none of the long list of preference functionals listed above fitted any (non-trivial) data exactly. Economists responded
80
A. Conte et al. / Journal of Econometrics 162 (2011) 79–88
in their usual fashion — by declaring that individuals were noisy in their behaviour, or that they made errors of some kind when taking decisions. At this point, interest centred on ways of describing such noise and incorporating it into the econometric investigation. A number of solutions were proposed: the constant-probabilityof-making-a-mistake model of Harless and Camerer (1994), the Fechner-error model adopted by Hey and Orme (1994), and the random-preference model of Loomes and Sugden (1998), implemented econometrically by Loomes et al. (2002). In the first of these, subjects in experiments are thought of as implementing their choices with a constant error; in the second, subjects were perceived as measuring the value of each option with some error; in the third, subjects were thought of as not having precisely defined preferences, but preferences drawn randomly from some probability distribution. The tremble model, analysed in Moffatt and Peters (2001), can be considered like the constant-probability model but perhaps appended to one of the other two types. A useful discussion of the relative merits of these different models can be found in Ballinger and Wilcox (1997), which concludes that the constant-probability model on its own is dominated by the other two approaches. Further results can be found in Buschena and Zilberman (2000). Those economists who followed the measurement error story soon realised that the error might not be homoscedastic and could well depend on the nature of the choice problem (see, for example Hey, 1995). Indeed, Blavatskyy (2007) argues that, with the appropriate heteroscedastic error specification, Expected Utility theory can explain the data at least as well as any of the generalisations (after allowing for degrees of freedom). Not all would go as far as this, but the incorporation of some kind of error story has led to the demise of many of the theories noted in the list above. Two remain pre-eminent: Expected Utility – henceforth EU – theory; and Rank Dependent Expected Utility – henceforth RDEU – theory (Quiggin, 1982). Machina (1994) comments that the Rank Dependent model is ‘‘the most natural and useful modification of the classical expected utility formula’’. In certain contexts, for example the Cumulative Prospect theory of Tversky and Kahneman (1992), the theory is enriched with a context dependent reference point. Nevertheless, the consensus seems to be that EU theory and RDEU theory remain the leading contenders for the description of behaviour under risk. As we have already remarked, some of the investigations of the appropriate preference functional have taken each individual separately and have carried out econometric work individual by individual. There are problems here with degrees of freedom and with possible over-fitting. Other investigations have proceeded with pooled data — from a set of individuals. The problem with this latter approach, even though it saves on degrees of freedom, is that individuals are clearly different. They are different, not only in terms of which type of preference functional that they have, but also in terms of their parameters for these functionals. The latter can be taken care of by assuming a distribution of the relevant parameters over the individuals concerned and in estimating the parameters of this distribution. This heterogeneity may depend on observable and observed (demographic) characteristics of the individuals or it may just be unobserved heterogeneity. 
In either case, estimating the parameters of the distribution saves on degrees of freedom compared with estimating the underlying economic parameters for each individual. Moreover, the resulting estimates may be preferred if they are going to be used for predicting the behaviour of the same, or a similar, group of individuals. Some economists are now taking into account such heterogeneity. The dangers of not so doing are well illustrated by Wilcox (2006), who shows that serious distortions in the econometric results may well be the consequence. Similarly, the paper by Myung et al. (2000) shows clearly the problems with fitting a single agent model to a heterogeneous population.
Taking into account the fact that different individuals may have different preference functionals is more difficult. In this paper we adopt a solution: that of using a Mixture Model; see McLachlan and Peel (2000). We emphasise that we are by no means the first to use such a solution in such a context: a very useful reference is Harrison and Rutstrom (2009), which includes a discussion of the previous use of mixture models in economics.2 We restrict our attention to EU theory and RDEU theory, and we proceed by assuming that a proportion (1 − p) of the population from which the sample is drawn have EU preference functionals, and the remaining proportion have RDEU preference functionals. The parameter p is known as the mixing proportion, and it is estimated along with the other parameters of the model. Obviously the method can be extended to more than two functionals, but the purpose of this paper is to illustrate the power of the approach. Moreover, within each model we shall assume heterogeneity of parameters. Thus we take into account both types of heterogeneity between individuals, without sacrificing degrees of freedom, and without getting distorted results. Finally, to take into account heterogeneity within subjects we shall incorporate both a Fechnertype error and a tremble. We illustrate the approach with data from an experiment reported in Hey (2001). The next section describes the experiment. Section 3 details the specification of EU theory and RDEU theory, while Section 4 discusses the econometric detail, including the application of the Mixture Model (with unobserved heterogeneity) in this context. Section 5 discusses the results and Section 6 concludes. 2. The experiment and the data The data used in this study, previously analysed by Hey (2001) and more recently by Moffatt (2005), was obtained from 53 subjects, drawn from the student population of the University of York. Each subject faced a set of 100 pairwise-choice problems between two different lotteries, repeated on five different days over a two-week period, so that the total number of problems faced by each subject is 500. The ordering of the problems changed between days and also between subjects. The probabilities defining the 100 problems are listed in Table A.1 in the Appendix A. All 100 problems involved three of the four outcomes £0, £50, £100 and £150. The random lottery incentive system was applied: at the end of the final session, one of the subject’s 500 chosen lotteries was selected at random and played for real. For each subject and for each pairwise-choice problem we know the lottery chosen by the subject. The resulting matrix, of size 500 by 53, is our data. 3. The preference functionals under consideration3 We denote the four outcomes in the experiment by xi (i = 1, 2, 3, 4).4 In both the EU formulation and the RDEU formulation, there is a utility function, and we denote the corresponding utility values by ui (i = 1, 2, 3, 4). We normalise5 so that u1 = 0 and u4 = 1. Each choice problem involves two lotteries: the p-lottery and the q-lottery. We denote the probabilities of the four outcomes in these two lotteries in pairwise-choice problem t (t = 1, . . . , 500) by p1t , p2t , p3t , p4t and q1t , q2t , q3t , q4t respectively. The EU specification envisages subjects evaluating the expected utilities EU (pt ) and EU (qt ) of the two lotteries in pairwise-choice
2 We note that, while this paper and that of Harrison and Rutstrom (2009), are similar in many respects, there are differences, in particular that we include unobserved heterogeneity of parameter values across individuals. They, however, include demographic effects, which we do not. 3 A glossary of notation can be found in Table A.2. 4 Respectively £0, £50, £100 and £150. 5 The utility function in both specifications is unique only up to a linear transformation.
problem t as in Eq. (1):

EU(pt) = p2t u2 + p3t u3 + p4t,
EU(qt) = q2t u2 + q3t u3 + q4t.    (1)

In the absence of error, the EU specification envisages the subject choosing pt (qt) if and only if

d2t u2 + d3t u3 + d4t > (<) 0,    (2)

where djt = pjt − qjt (j = 2, 3, 4). To incorporate the fact that subjects are noisy in their choice behaviour, we add a (Fechnerian) stochastic term to (2), implying that the subject chooses pt (qt) if and only if

d2t u2 + d3t u3 + d4t + εt > (<) 0,    (3)
where we assume that each εt is independently and normally distributed with mean 0 and standard deviation σ. The magnitude of σ indicates the noisiness in the choices: the larger the value of this parameter, the greater the noise. We estimate σ along with the other parameters. Later we add a tremble, and at the end of Section 4 we note how our estimation procedure would need to be modified for the random preferences story of randomness in behaviour.
The RDEU specification looks similar to that of EU but the subjects are envisaged as transforming the objective probabilities in a specific way. As a consequence, under RDEU theory, subjects evaluate the rank dependent expected utilities RDEU(pt) and RDEU(qt) of the two lotteries as in Eq. (4):

RDEU(pt) = P2t u2 + P3t u3 + P4t,
RDEU(qt) = Q2t u2 + Q3t u3 + Q4t,    (4)

where the P's and Q's are not the correct probabilities though they are derived from the correct probabilities in the manner specified in (5):

P2t = w(p2t + p3t + p4t) − w(p3t + p4t),    Q2t = w(q2t + q3t + q4t) − w(q3t + q4t),
P3t = w(p3t + p4t) − w(p4t),                Q3t = w(q3t + q4t) − w(q4t),    (5)
P4t = w(p4t),                               Q4t = w(q4t).

Here the function w(.) is a probability weighting function which is monotonically non-decreasing everywhere in the range [0, 1] and for which w(0) = 0 and w(1) = 1. Note that, if w(p) = p everywhere, RDEU reduces to EU. In the absence of error, the RDEU specification envisages the subject choosing pt (qt) if and only if

D2t u2 + D3t u3 + D4t > (<) 0,    (6)

where Djt = Pjt − Qjt (j = 2, 3, 4). Once again, to incorporate the fact that subjects are noisy in their choice behaviour, we add a (Fechnerian) stochastic term to Eq. (6), implying that the subject chooses pt (qt) if and only if

D2t u2 + D3t u3 + D4t + εt > (<) 0.    (7)

To proceed to estimation, we now need to parameterise both the utility function and the weighting function. For the former, there are two possibilities: (1) we could estimate u2 and u3; or (2) we could adopt a particular functional form and estimate the parameter(s) of that function. As we want to have a parsimonious specification, and as we want to introduce unobserved heterogeneity, we follow the second route. The most obvious contenders for the functional form are Constant Absolute Risk Aversion (CARA) and Constant Relative Risk Aversion (CRRA). In each of these there is one parameter. Given our normalisation, these forms can be written as in Eqs. (8) and (9):

CARA: u(x) = [1 − exp(−rx)]/[1 − exp(−150r)],  r ∈ (−∞, 0) ∪ (0, ∞);   u(x) = x/150,  r = 0.    (8)
CRRA: u(x) = (x/150)^r,  r > 0.    (9)

We will assume later that the parameter r in both formulations is distributed over the population (from which our subjects were recruited) and we will estimate the parameters of that distribution. For the CARA function, the parameter r can take any value between −∞ and +∞, and r is positive for risk averters, 0 for risk-neutral agents (for whom the functions become linear), and negative for risk-loving agents. For the CRRA function, the parameter r has to be positive,6 and r is less than 1 for risk-averse agents, equal to 1 for risk-neutral agents, and greater than 1 for risk-loving agents.
For the weighting function we follow a similar route. The most parsimonious functions used in the literature are the Quiggin (1982) function and the Power function. These can be written as in Eq. (10), where we call γ the weighting-function parameter:

Quiggin: w(p) = p^γ / [p^γ + (1 − p)^γ]^(1/γ),  γ > γ*;
Power:   w(p) = p^γ,  γ > 0.    (10)

For those subjects who act in accordance with RDEU theory we will assume that the risk-aversion parameter, r, and the weighting-function parameter, γ, are jointly distributed over the population from which our sample is drawn and we will estimate the parameters of that distribution. For the Quiggin function, γ must be greater than γ* (otherwise w(.) is not monotonic),7 while for the Power function, γ must be positive. For both functions, RDEU theory reduces to EU theory when γ = 1. When γ ≠ 1, the Power function is either completely above or completely below the 45°-line. In contrast, the Quiggin function does cross the 45°-line, either with an S-shape or an inverted S-shape. This is often seen as an advantage of the Quiggin function. In what follows, we estimate all four combinations: CARA with Quiggin, CARA with Power, CRRA with Quiggin and CRRA with Power, so we can test for the robustness of our results.8
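Eqs. (9) and (10) are straightforward to transcribe. The following sketch (an illustration in Python, not the authors' GAUSS program) codes the normalised CRRA utility and the Quiggin weighting function and confirms that the latter reduces to w(p) = p at γ = 1.

```python
# Direct transcriptions of the CRRA utility of Eq. (9) and the Quiggin
# weighting function of Eq. (10), with outcomes normalised so that u(0) = 0
# and u(150) = 1.  Illustrative only.

import numpy as np

def u_crra(x, r):
    """CRRA utility of Eq. (9), outcomes measured in pounds (0..150)."""
    return (np.asarray(x, dtype=float) / 150.0) ** r

def w_quiggin(p, gamma):
    """Quiggin probability weighting function of Eq. (10)."""
    p = np.asarray(p, dtype=float)
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

print(u_crra([0, 50, 100, 150], r=0.5))          # utilities u1..u4
print(w_quiggin([0.25, 0.5, 0.75], gamma=1.0))   # gamma = 1 recovers w(p) = p
print(w_quiggin([0.25, 0.5, 0.75], gamma=0.7))   # an inverse S-shape
```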
4. The econometric specification

Let us use the binary indicator yt = 1 (−1) to indicate that the subject chose pt (qt) on problem t. We start with the EU specification. From the choice rule given by (3), we obtain the likelihood contribution for a single subject's choice in problem t:

P(yt | r, σ) = Φ[yt (d2t u2 + d3t u3 + d4t)/σ],   yt ∈ {−1, 1},    (11)

where Φ(.) is the unit normal cumulative distribution function. Note that this depends on the risk-aversion parameter r and the standard deviation of the error term σ. We now introduce a tremble. By this we mean that the individual implements the choice indicated by Eq. (3) with probability (1 − ω), and chooses at random between the two lotteries with probability ω. The parameter ω is called the ''tremble probability''. Introducing this parameter into (11), the likelihood contribution becomes

P(yt | r, σ, ω) = (1 − ω)Φ[yt (d2t u2 + d3t u3 + d4t)/σ] + ω/2,   yt ∈ {−1, 1}.    (12)

Following the same route for the RDEU specification, we obtain the likelihood contribution:

P(yt | r, γ, σ, ω) = (1 − ω)Φ[yt (D2t u2 + D3t u3 + D4t)/σ] + ω/2,   yt ∈ {−1, 1}.    (13)
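The likelihood contributions (12) and (13) can be evaluated directly for given parameters. The snippet below (Python rather than the authors' GAUSS, with hypothetical lotteries and parameter values) does so for a single choice problem under the CRRA utility and the Quiggin weighting function.

```python
# A sketch of the likelihood contributions in Eqs. (12) and (13) for one
# pairwise-choice problem.  Probabilities p, q are over the outcomes
# (0, 50, 100, 150); y = +1 if the p-lottery was chosen and -1 otherwise.

import numpy as np
from scipy.stats import norm

def u(x, r):                      # CRRA utility, Eq. (9), normalised to [0, 1]
    return (x / 150.0) ** r

def w(p, gamma):                  # Quiggin weighting function, Eq. (10)
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def rdeu_weights(probs, gamma):   # decumulative transform of Eq. (5)
    p2, p3, p4 = probs[1], probs[2], probs[3]
    P4 = w(p4, gamma)
    P3 = w(p3 + p4, gamma) - w(p4, gamma)
    P2 = w(p2 + p3 + p4, gamma) - w(p3 + p4, gamma)
    return P2, P3, P4

def prob_choice_eu(y, p, q, r, sigma, omega):
    u2, u3 = u(50.0, r), u(100.0, r)
    d = (p[1] - q[1]) * u2 + (p[2] - q[2]) * u3 + (p[3] - q[3])
    return (1 - omega) * norm.cdf(y * d / sigma) + omega / 2       # Eq. (12)

def prob_choice_rdeu(y, p, q, r, gamma, sigma, omega):
    u2, u3 = u(50.0, r), u(100.0, r)
    P2, P3, P4 = rdeu_weights(p, gamma)
    Q2, Q3, Q4 = rdeu_weights(q, gamma)
    D = (P2 - Q2) * u2 + (P3 - Q3) * u3 + (P4 - Q4)
    return (1 - omega) * norm.cdf(y * D / sigma) + omega / 2       # Eq. (13)

p = np.array([0.0, 0.5, 0.5, 0.0])    # a hypothetical p-lottery
q = np.array([0.1, 0.2, 0.5, 0.2])    # a hypothetical q-lottery
print(prob_choice_eu(+1, p, q, r=0.5, sigma=0.05, omega=0.01))
print(prob_choice_rdeu(+1, p, q, r=0.5, gamma=0.8, sigma=0.05, omega=0.01))
```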
6 We note that in some formulations the parameter r can be negative (in which case a different functional form is required). Wakker (2008) has an extended discussion of this case. However, we exclude this for a number of reasons, not least that the utility of zero is minus infinity, which makes it impossible to apply our normalisation and renders meaningless the interpretation of σ as a measure of the noisiness of the subjects’ responses. 7 γ ∗ = 0.279095. 8 Here we use just four possible combinations. There are clearly other possibilities, as is discussed in Stott (2006).
Note that the rank dependent parameter γ now enters the likelihood in (13), through the D variables defined in (5) and (6). We now assume that, for the population of EU individuals, the parameter r (ln(r) in the case of the CRRA specification) is distributed normally over the population with mean θ and variance δ², and we denote this normal density function as f(r; θ, δ). For the population of RDEU individuals, we assume that the parameters r and γ have a joint distribution, such that the two quantities r (ln(r) in the case of the CRRA specification) and ln(γ − γmin) have a bivariate normal distribution, where γmin is 0 in the Power case and γ* in the Quiggin case (see (10)). The parameters of this bivariate normal are specified for CARA in (14) and for CRRA in (15):

(r, ln(γ − γmin))′ ∼ N( (Θ, M)′, [ ∆²  ρ∆S ; ρ∆S  S² ] )    (14)

(ln(r), ln(γ − γmin))′ ∼ N( (Θ, M)′, [ ∆²  ρ∆S ; ρ∆S  S² ] ).    (15)

The joint density function of r and γ will be denoted as g(r, γ; Θ, ∆, M, S, ρ). Note that this function is not actually a bivariate normal density, since it is not the case that both arguments are normally distributed; either one or both arguments are lognormally distributed. Note that this formulation is assuming that the distribution of the risk-aversion parameter for the EU subjects may be different from that for the RDEU subjects. Finally we assume that a proportion p of the population is RDEU and a proportion (1 − p) is EU. Hence, the contribution to the likelihood for any given subject is as given in (16):

L(θ, δ, Θ, ∆, M, S, ρ, σ, ω, p)
  = (1 − p) ∫_{−∞}^{∞} ∏_{t=1}^{500} [(1 − ω)Φ[yt (d2t u2 + d3t u3 + d4t)/σ] + ω/2] f(r; θ, δ) dr
  + p ∫_{−∞}^{∞} ∫_{γmin}^{∞} ∏_{t=1}^{500} [(1 − ω)Φ[yt (D2t u2 + D3t u3 + D4t)/σ] + ω/2] g(r, γ; Θ, ∆, M, S, ρ) dγ dr.    (16)
The overall log-likelihood for all 53 subjects is just the sum of the log of L given by (16) over all 53 subjects. Estimation proceeds by maximum simulated likelihood9 (see Gouriéroux and Monfort, 2002, for the general principles) because of the computational problems with the double integral in the likelihood function. We estimate the parameters θ , δ , Θ , ∆, M, S, ρ , ω, σ , and p. The program (written in GAUSS) is available on request. We carry out estimation for all four combinations of the utility function and the weighting function. A final note before proceeding is in order. It would be possible to modify our formulation to take into account the random preference model of Loomes and Sugden (1998) – where subjects are envisaged as taking a fresh drawing from their set of preferences on every problem – by re-writing (16) as in (17).
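A schematic of the simulated-likelihood step for the EU component of (16) is given below. Plain pseudo-random draws stand in for the Halton sequences of footnote 9, the data are a handful of made-up problems rather than the 500 of the experiment, and the parameter values are merely in the neighbourhood of the Table 1 estimates; it is an illustration, not the authors' program.

```python
# The integral over r in the EU part of Eq. (16), approximated by averaging the
# product of choice probabilities over R draws of r, where ln(r) ~ N(theta,
# delta^2) as in the CRRA specification.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def eu_choice_logprob(y, d, r, sigma, omega):
    """log P(y_t | r) for one problem; d = (d2, d3, d4), y = +1 or -1."""
    u2, u3 = (50 / 150) ** r, (100 / 150) ** r        # CRRA utilities
    index = d[0] * u2 + d[1] * u3 + d[2]
    return np.log((1 - omega) * norm.cdf(y * index / sigma) + omega / 2)

def simulated_loglik_eu(ys, ds, theta, delta, sigma, omega, R=100, seed=0):
    """log of (1/R) sum over draws of prod_t P(y_t | r_draw) for one subject."""
    rng = np.random.default_rng(seed)
    draws = np.exp(theta + delta * rng.standard_normal(R))   # ln(r) normal
    per_draw = np.array([sum(eu_choice_logprob(y, d, r, sigma, omega)
                             for y, d in zip(ys, ds)) for r in draws])
    return logsumexp(per_draw) - np.log(R)

# Hypothetical data for one subject: 5 problems instead of 500.
ds = [(0.1, -0.2, 0.1), (0.0, 0.3, -0.1), (-0.2, 0.1, 0.1),
      (0.05, 0.0, -0.05), (0.2, -0.1, -0.1)]
ys = [+1, -1, +1, +1, -1]
print(simulated_loglik_eu(ys, ds, theta=-0.76, delta=0.32, sigma=0.07, omega=0.011))
```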
L(θ, δ, Θ, ∆, M, S, ρ, σ, ω, p)
  = (1 − p) ∏_{t=1}^{500} ∫_{−∞}^{∞} [(1 − ω)Φ[yt (d2t u2 + d3t u3 + d4t)/σ] + ω/2] f(r; θ, δ) dr
  + p ∏_{t=1}^{500} ∫_{−∞}^{∞} ∫_{γmin}^{∞} [(1 − ω)Φ[yt (D2t u2 + D3t u3 + D4t)/σ] + ω/2] g(r, γ; Θ, ∆, M, S, ρ) dγ dr.    (17)

Note the difference: in formulation (16) it is as if the subject's preferences are a random drawing from the set of all preferences, but these preferences then remain fixed for the duration of the experiment, while in (17) there is a fresh drawing on every choice problem, necessitating a reversal of the product and integration operations in the formula.

9 Integration over the distribution of r in the EU model and over the joint distribution of r and γ in the RDEU model is performed by simulation. In particular we use 100 draws for each subject based on Halton sequences (Train, 2003).

5. Results

Table 1
Estimates for CRRA/Quiggin specification. Parameter estimates, maximum simulated likelihood, CRRA specification, Quiggin weighting function (53 individuals, 500 observations each, standard errors in parentheses).

                        EU only        RDEU only      Mixture model
                                                      EU-type        RDEU-type
θ (EU) / Θ (RDEU)       −1.15599       −0.95338       −0.76438       −0.95425
                        (0.07217)      (0.02332)      (0.09492)      (0.01996)
δ (EU) / ∆ (RDEU)       0.55348        0.54652        0.32431        0.53947
                        (0.03040)      (0.01755)      (0.06369)      (0.01500)
M                       –              −0.49629       –              −0.55465
                                       (0.02043)                     (0.03041)
S                       –              0.24676        –              0.24031
                                       (0.01965)                     (0.01820)
ρ                       –              0.25563        –              0.33793
                                       (0.10011)                     (0.08296)
ω                       0.01375        0.01534        0.01139        –
                        (0.00173)      (0.00184)      (0.00157)
σ                       0.04823        0.04134        0.07438        0.03398
                        (0.00089)      (0.00089)      (0.00297)      (0.00081)
p                       –              –              0.80266 (0.05623)
Log-likelihood          −7210.27887    −6860.18750    −6716.49907

Table 2
The maximised log-likelihoods for the different specifications.

                EU only        RDEU only      Mixture model
CRRA/Quiggin    −7210.27887    −6860.18750    −6716.49907
CRRA/Power      −7210.27887    −6845.15422    −6773.37108
CARA/Quiggin    −7766.48390    −7485.00936    −7285.69205
CARA/Power      −7766.48390    −7488.98040    −7419.30892

Fig. 1. Log-likelihood function (CRRA/Quiggin).

Fig. 2. Log-likelihood function (CRRA/Quiggin).

The results for the CRRA/Quiggin specification are reported in Table 1. Those for the other three specifications are reported in the Appendix Tables A.3–A.5 and Appendix Fig. A.1. We conclude from these tables that the CRRA/Quiggin specification fits best,
and so we will concentrate the discussion that follows on this specification. Our justification for this can be found in the following Table 2, which reports the maximised log-likelihoods for each of the specifications. The column headed ‘EU only’ (‘RDEU only’) shows the maximised log-likelihoods when it is assumed that all the subjects are EU (RDEU); that headed ‘Mixture Model’ shows the maximised log-likelihoods when our Mixture Model, as specified by Eq. (16), is fitted to the data. Whether we assume that all the subjects are EU or all are RDEU, the CRRA specification clearly emerges as the better utility function. If we assume that all subjects are RDEU, then Quiggin is marginally better when combined with the CARA and marginally worse when combined with CRRA. However, the Vuong (1989) tests reported in Table 3, while showing that the CRRA/Quiggin specification is superior to the other specifications, do not show it to be significantly better than the other specifications. However, and crucially for the purposes of this paper, the log-likelihoods in the table above show clearly that, for all specifications, the Mixture Model fits significantly better (at very small significance levels10 ) than either of the two preference functionals individually. This is one of the crucial points of this paper, and we expand on it here, in relation to the CRRA/Quiggin specification. Table 1 shows that the Mixture Model fits the data significantly better than either of the two preference functionals individually. Hence, it follows that assuming that, in the population from which our subjects were drawn, agents are either all EU or all RDEU gives a distorted view of the truth. The mixing proportion, p, is estimated to be slightly above 0.8, suggesting that 20% of the population are EU and 80% are RDEU. Figs. 1 and 2 show the log-likelihood as a function of the mixing proportion and the peak is well-defined. A 95% confidence interval for p is (0.692, 0.913). We can use our results to calculate the posterior probabilities of each subject being either EU or RDEU, conditional on their 500 choices. Using Bayes rule we have the posterior probabilities given in Eq. (18), shown in Box I. where
Table 3
Vuong tests between the various specifications.
H0: Model 1 and model 2 are equally close to the true model. H1: Model 1 is closer to the true model than model 2.

Model 1/model 2                Vuong statistic    p-value
cara_quiggin / cara_power      0.16708            0.43365
crra_quiggin / crra_power      0.03012            0.48799
crra_quiggin / cara_quiggin    0.28884            0.38635
crra_quiggin / cara_power      0.32326            0.37325
crra_power / cara_quiggin      0.19832            0.42140
crra_power / cara_power        0.60596            0.27227
The Vuong statistic is distributed N(0, 1) under the null hypothesis; one-sided test.

10 The log-likelihood test statistics for the Mixture Model v EU are 988, 874, 961 and 694 (for CRRA/Quiggin, CRRA/Power, CARA/Quiggin and CARA/Power respectively); the critical value at 1% is 18.475 (7 degrees of freedom). The log-likelihood test statistics for the Mixture Model v RDEU are 287, 143, 399 and 139 (for CRRA/Quiggin, CRRA/Power, CARA/Quiggin and CARA/Power respectively); the critical value at 1% is 13.277 (4 degrees of freedom).
Fig. 3. Posterior probabilities (CRRA/Quiggin).
L is as given in Eq. (16). The resulting histogram of the posterior probabilities is shown in Fig. 3. Apart from one apparently very confused subject, that partition is close to perfect. In the Mixture Model (with the CRRA/Quiggin specification), the estimated parameters of the distribution of the log of the risk-aversion parameter show a mean and standard deviation of −0.76438 and 0.32431 for the EU subjects and a mean and standard deviation of −0.95425 and 0.53947 for the RDEU subjects. This implies a mean and standard deviation of the risk-aversion parameter, r, for the EU subjects of 0.491 and 0.027. It also implies a 95% confidence interval for the EU subjects for ln(r ) given by (−1.400, −0.129), implying a 95% confidence interval for r given by (0.247, 0.839). To interpret these figures it may be useful to note that, for a subject with a CRRA parameter of 0.247 (0.839), his or
her certainty equivalent for a 50–50 gamble between £0 and £150 is £9.06 (£68.17). An equivalent calculation for the RDEU subjects shows a 95% confidence interval for r given by (0.134, 1.109), with corresponding certainty equivalents for a 50–50 gamble between £0 and £150 given by (£0.85, £80.29). A small fraction of the RDEU subjects are risk-loving. Again within the Mixture Model and the CRRA/Quiggin specification, our results show that the distribution of ln(γ − γ*) has an estimated mean and standard deviation of −0.55465 and 0.24031, respectively. This implies that approximately 95% of the values of γ in the population lie between 0.637 and 1.199. The implied weighting functions at these two ''extremes'' and the weighting function at the mean are plotted in Fig. 4. It is interesting to note that this range of the possible weighting functions includes the (unique) function estimated by Tversky and Kahneman (1992). It can also be seen from Fig. 1 that the RDEU estimates also include some subjects whose weighting function is close to that of the EU subjects (for whom w(p) = p). The proportion of the population who are strictly RDEU is therefore somewhat less than the 80% implied by the estimate of the weighting parameter. Finally we should comment on the within-subject errors. The estimates of σ (the standard deviation of the Fechnerian error) are 0.07438 and 0.03398 for the EU and RDEU subjects, respectively. These have meaning with respect to the normalisation of the utility function, constrained to lie between 0 and 1 for the outcomes in the experiment. So, for example, a 50–50 gamble between £0 and £150, which has an expected utility of 0.5, is evaluated by subjects to have an expected utility with mean 0.5 and standard deviation 0.07438 (0.03398) by the EU (RDEU) subjects. So the EU [RDEU] subjects evaluate it with 95% probability in the range (0.354, 0.646) [(0.433, 0.567)]. This error appears in line with previous estimates. The tremble probability is estimated to be 0.01139, indicating a tremble of just over 1%.
As we have already noted, the CRRA/Quiggin specification appears superior to the others. Appendix Tables A.3–A.5 rather obviously show variations in the estimates obtained, particularly in the estimates of the mixing parameter. But all specifications show clearly that the Mixture Model fits the data better than either of the two models (assuming subjects are either all EU or all RDEU). To demonstrate this result is one of the main purposes of this paper.

P(subject is EU | y1 · · · y500)
  = (1 − p) ∫_{−∞}^{∞} ∏_{t=1}^{500} [(1 − ω)Φ[yt (d2t u2 + d3t u3 + d4t)/σ] + ω/2] f(r; θ, δ) dr / L

P(subject is RDEU | y1 · · · y500)
  = p ∫_{−∞}^{∞} ∫_{γmin}^{∞} ∏_{t=1}^{500} [(1 − ω)Φ[yt (D2t u2 + D3t u3 + D4t)/σ] + ω/2] g(r, γ; Θ, ∆, M, S, ρ) dγ dr / L    (18)

Box I.

Fig. 4. The mean and 95% bounds for the weighting function: Mixture model CRRA/Quiggin specification.
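For given component likelihoods, the Bayes-rule calculation in Eq. (18) reduces to a simple ratio. The sketch below illustrates it with made-up log-likelihood values; the function name and the numbers are assumptions, not output from the paper.

```python
# A small numerical illustration of Eq. (18): the posterior probability that a
# subject is of the RDEU type is p*L_RDEU / ((1-p)*L_EU + p*L_RDEU), computed
# here on the log scale for numerical stability.

import numpy as np

def posterior_rdeu(log_lik_eu, log_lik_rdeu, p):
    a = np.log(1 - p) + log_lik_eu        # log of (1-p) * L_EU
    b = np.log(p) + log_lik_rdeu          # log of p * L_RDEU
    return 1.0 / (1.0 + np.exp(a - b))    # equals exp(b) / (exp(a) + exp(b))

print(posterior_rdeu(log_lik_eu=-310.0, log_lik_rdeu=-295.0, p=0.8))
```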
6. Conclusions

This paper started from the observation that there is considerable heterogeneity in the behaviour of subjects in experiments.11 This heterogeneity is both within subjects and between subjects. If we want to use the data to estimate the underlying preference functionals of subjects, we need to take this heterogeneity into account. Heterogeneity within subjects can be incorporated by appending some kind of error story into our analysis. This is already a common feature of empirical work in this area. As for heterogeneity between subjects, if we wish to save on degrees of freedom by pooling our data in some way, we cannot ignore this heterogeneity. This heterogeneity can be taken into account by modelling the parameters as being distributed within the population from which our subjects are drawn. This kind of heterogeneity has already been considered in the literature (see Botti et al., 2008). Heterogeneity of preference functionals across individuals is more difficult to take into account, and this we do by using a Mixture Model (as in Harrison and Rutstrom, 2009): we assume that different agents in the population have different functionals and we estimate the proportion of each type. This is the main contribution of the paper. We show that such a Mixture Model adds significantly to the explanatory power of our estimates. We thus present a method of using the data to take into account all forms of heterogeneity. We have applied the Mixture Model to the problem of estimating preference functionals from a sample of 53 subjects for each of whom we have 500 observations. Our results show that it is misleading to assume a representative agent model – not all agents are EU and not all agents are RDEU – there is a mixture in the population. Moreover, there is significant heterogeneity in both the risk-aversion parameter and the weighting function parameter. And, of course, there is considerable heterogeneity of

11 The same is true in data obtained from other sources.
Table A.1 The 100 choice problems (Note that the questions were presented to the subjects in a random sequence with left and right randomly interchanged). t
q1
q2
q3
q4
p1
p2
p3
p4
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.125 0.250 0.250 0.250 0.250 0.375 0.375 0.375 0.375 0.250 0.750 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.250
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.250 0.250 0.250 0.250 0.250 0.250 0.250 0.375 0.125 0.375 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.750 0.750 0.750 0.875 0.875 0.875 0.875 0.875 0.750 0.750 0.750 0.750 0.750 0.875 0.875 0.875 0.875 0.875 0.875 0.875 0.875 0.375
0.875 0.875 0.875 0.875 0.875 0.875 0.875 0.500 0.500 0.875 0.875 0.875 0.875 0.875 0.875 0.750 0.750 0.750 0.750 0.750 0.750 0.750 0.500 0.875 0.125 0.500 0.500 0.875 0.875 0.875 0.875 0.875 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.625 0.750 0.750 0.750 0.750 0.375 0.625 0.625 0.625 0.750 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.375 0.375 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.125 0.000 0.500 0.500 0.500 0.125 0.125 0.125 0.125 0.125 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.250 0.000 0.000 0.000 0.000 0.250 0.000 0.000 0.000 0.000 0.250 0.250 0.250 0.250 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.375
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.125 0.125 0.125 0.125 0.375 0.500 0.750 0.125 0.125 0.375 0.500 0.750 0.750 0.750 0.500 0.375 0.500 0.750 0.750 0.500 0.500 0.750 0.750 0.375 0.750 0.250 0.375 0.625 0.250 0.375 0.500 0.625 0.625 0.250 0.375 0.500 0.625 0.625 0.250 0.375 0.500 0.625 0.625 0.750 0.875 0.875 0.375
0.125 0.125 0.125 0.375 0.375 0.375 0.625 0.375 0.375 0.375 0.375 0.375 0.375 0.625 0.875 0.375 0.375 0.375 0.375 0.375 0.625 0.875 0.625 0.250 0.375 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.375 0.125 0.000 0.375 0.125 0.250 0.000 0.125 0.375 0.125 0.250 0.000 0.125 0.375 0.125 0.250 0.000 0.125 0.125 0.000 0.000 0.125
0.000 0.000 0.500 0.000 0.125 0.250 0.000 0.000 0.125 0.000 0.125 0.250 0.500 0.000 0.000 0.000 0.125 0.250 0.500 0.500 0.000 0.000 0.000 0.750 0.250 0.250 0.250 0.250 0.625 0.375 0.000 0.000 0.250 0.625 0.375 0.000 0.000 0.000 0.125 0.000 0.375 0.000 0.000 0.125 0.000 0.000 0.000 0.125 0.625 0.125 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.875 0.875 0.375 0.625 0.500 0.375 0.375 0.625 0.500 0.625 0.500 0.375 0.125 0.375 0.125 0.625 0.500 0.375 0.125 0.125 0.375 0.125 0.375 0.000 0.375 0.625 0.625 0.625 0.250 0.250 0.500 0.250 0.625 0.250 0.250 0.500 0.250 0.250 0.125 0.500 0.250 0.500 0.250 0.125 0.500 0.500 0.250 0.125 0.000 0.125 0.375 0.500 0.375 0.375 0.500 0.250 0.375 0.250 0.375 0.500 0.250 0.375 0.250 0.375 0.500 0.250 0.375 0.250 0.125 0.125 0.125 0.500
(continued on next page)
Table A.1 (continued) t
q1
q2
q3
q4
p1
p2
p3
p4
73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
0.500 0.500 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.250 0.250 0.250 0.250 0.375 0.375 0.375 0.375 0.375 0.375
0.250 0.250 0.750 0.750 0.750 0.750 0.750 0.750 0.750 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.625 0.625 0.625 0.625 0.250 0.250 0.625 0.625 0.625 0.125
0.000 0.000 0.000 0.250 0.250 0.250 0.250 0.250 0.250 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.125 0.125 0.125 0.125 0.375 0.375 0.000 0.000 0.000 0.500
0.250 0.250 0.250 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.625 0.625 0.125 0.125 0.125 0.375 0.375 0.500 0.500 0.125 0.125 0.250 0.375 0.375 0.500 0.500 0.500 0.750 0.375 0.375 0.500 0.500 0.500 0.500 0.500 0.500 0.750 0.500
0.000 0.000 0.750 0.000 0.375 0.125 0.250 0.000 0.125 0.000 0.375 0.625 0.125 0.250 0.000 0.000 0.125 0.125 0.125 0.250 0.000 0.125 0.000 0.000 0.000 0.125 0.125 0.125
0.000 0.000 0.000 0.875 0.500 0.500 0.375 0.500 0.375 0.875 0.500 0.125 0.500 0.375 0.500 0.500 0.375 0.125 0.500 0.375 0.500 0.375 0.500 0.500 0.500 0.375 0.125 0.375
0.375 0.375 0.125 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Table A.2
Glossary of notation.

Variables
xi: ith outcome
ui: utility of ith outcome = u(xi)
pt: P-lottery on problem t
qt: Q-lottery on problem t
pit (qit): probability of outcome i in P- (Q-) lottery on problem number t
Pit (Qit): modified probability of outcome i in P- (Q-) lottery on problem number t (see (5))
dit: pit − qit
Dit: Pit − Qit
yt: decision on lottery t (1: P-lottery; −1: Q-lottery)
εt: measurement error on problem number t
r: risk-aversion parameter
γ: weighting-function parameter

Functions
EU(.): Expected Utility function (see (1))
RDEU(.): Rank Dependent Expected Utility function (see (4))
u(.): utility function (see (8) and (9))
w(.): weighting function (see (10))
Φ(.): unit normal cumulative density function
f(.): probability density function of risk-aversion parameter (EU agents)
g(.): joint probability density function of risk-aversion and (log of) weighting-function parameters (RDEU agents)
L(.): likelihood function (see (16))

Parameters
θ: mean of the (marginal) distribution of the risk-aversion parameter (EU agents)
Θ: mean of the (marginal) distribution of the risk-aversion parameter (RDEU agents)
δ: standard deviation of the (marginal) distribution of the risk-aversion parameter (EU agents)
∆: standard deviation of the (marginal) distribution of the risk-aversion parameter (RDEU agents)
M: mean of the (marginal) distribution of the (transformation of the) weighting-function parameter (RDEU agents)
S: standard deviation of the (marginal) distribution of the (transformation of the) weighting-function parameter (RDEU agents)
ρ: correlation between the risk-aversion parameter and the (transformation of the) weighting-function parameter (RDEU agents)
ω: probability of a tremble
σ: standard deviation of the Fechnerian error
p: the mixing parameter (proportion of RDEU agents in population)
behaviour within subjects. Our estimations take all these forms of heterogeneity into account.12

12 Clearly there is scope for further investigations. For example, one of the parameters that we have assumed the same for all members of the population, ω, might also be distributed over the population.
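To fix ideas, the subject-level structure of such a mixture likelihood, and of the posterior type probabilities reported in Fig. A.1, can be sketched as follows. This is only an illustrative outline, not the paper's estimation code: the arrays prob_eu and prob_rdeu are hypothetical placeholders for the per-problem EU and RDEU choice probabilities, which in the paper already embody the Fechnerian error, the tremble ω and the simulated integration over the random parameters.

import numpy as np

def subject_mixture_loglik(choices, prob_eu, prob_rdeu, mix_p):
    """Log of one subject's contribution to the mixture likelihood.

    choices   : 0/1 array over problems (1 = P-lottery chosen)
    prob_eu   : per-problem probability of choosing the P-lottery for an EU agent
    prob_rdeu : the same for an RDEU agent
    mix_p     : mixing proportion p (share of RDEU agents in the population)
    """
    lik_eu = np.prod(np.where(choices == 1, prob_eu, 1.0 - prob_eu))
    lik_rdeu = np.prod(np.where(choices == 1, prob_rdeu, 1.0 - prob_rdeu))
    # Types are mixed at the level of subjects, not of individual choices.
    return np.log((1.0 - mix_p) * lik_eu + mix_p * lik_rdeu)

def posterior_rdeu(choices, prob_eu, prob_rdeu, mix_p):
    """Posterior probability that a subject is of RDEU type (Bayes' rule)."""
    lik_eu = np.prod(np.where(choices == 1, prob_eu, 1.0 - prob_eu))
    lik_rdeu = np.prod(np.where(choices == 1, prob_rdeu, 1.0 - prob_rdeu))
    return mix_p * lik_rdeu / ((1.0 - mix_p) * lik_eu + mix_p * lik_rdeu)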
Acknowledgements
The authors would like to thank two referees for extremely helpful suggestions, which led to significant improvements in the paper.
Table A.3
Estimates for CRRA power specification. Parameter estimates, maximum simulated likelihood. CRRA specification, power weighting function. 53 individuals, 500 observations each; standard errors in parentheses.

Parameter          | EU only            | RDEU only          | Mixture, EU-type   | Mixture, RDEU-type
θ (EU) / Θ (RDEU)  | −1.15599 (0.07217) | −1.06849 (0.03109) | −0.94253 (0.07489) | −1.04301 (0.05346)
δ (EU) / ∆ (RDEU)  | 0.55348 (0.03040)  | 0.72066 (0.02313)  | 0.46515 (0.03238)  | 0.80237 (0.04294)
M                  | –                  | −0.02251 (0.02224) | –                  | −0.30098 (0.03833)
S                  | –                  | 0.34156 (0.01314)  | –                  | 0.87514 (0.02717)
ρ                  | –                  | −0.21909 (0.05699) | –                  | −0.02117 (0.02247)
ω                  | 0.01375 (0.00173)  | 0.01235 (0.00333)  | 0.01288 (0.00168)  |
σ                  | 0.04823 (0.00089)  | 0.04309 (0.00089)  | 0.05522 (0.00178)  | 0.03049 (0.00125)
p                  | –                  | –                  | 0.58638 (0.07542)  |
Log-likelihood     | −7210.27887        | −6845.15422        | −6773.37108        |

Note: in the mixture model, the tremble probability ω, the mixing proportion p and the log-likelihood are common to the two types.
Table A.4
Estimates for CARA/Quiggin specification. Parameter estimates, maximum simulated likelihood. CARA specification, Quiggin weighting function. 53 individuals, 500 observations each; standard errors in parentheses.

Parameter          | EU only            | RDEU only          | Mixture, EU-type   | Mixture, RDEU-type
θ (EU) / Θ (RDEU)  | 0.02440 (0.00146)  | 0.02548 (0.00108)  | 0.03618 (0.00499)  | 0.01693 (0.00034)
δ (EU) / ∆ (RDEU)  | 0.01905 (0.00142)  | 0.01915 (0.00088)  | 0.02180 (0.02769)  | 0.01492 (0.00039)
M                  | –                  | −0.46542 (0.02492) | –                  | −0.10515 (0.01987)
S                  | –                  | 0.31122 (0.02058)  | –                  | 0.59426 (0.01988)
ρ                  | –                  | −0.55347 (0.05475) | –                  | 0.10194 (0.03750)
ω                  | 0.01272 (0.00483)  | 0.01172 (0.00177)  | 0.00724 (0.00146)  |
σ                  | 0.06142 (0.00110)  | 0.05609 (0.00109)  | 0.09635 (0.00326)  | 0.04778 (0.00099)
p                  | –                  | –                  | 0.75248 (0.05998)  |
Log-likelihood     | −7766.48390        | −7485.00936        | −7285.69205        |

Note: in the mixture model, the tremble probability ω, the mixing proportion p and the log-likelihood are common to the two types.
Table A.5
Estimates for CARA/Power specification. Parameter estimates, maximum simulated likelihood. CARA specification, power weighting function. 53 individuals, 500 observations each; standard errors in parentheses.

Parameter          | EU only            | RDEU only          | Mixture, EU-type   | Mixture, RDEU-type
θ (EU) / Θ (RDEU)  | 0.02440 (0.00146)  | 0.02998 (0.00092)  | 0.01873 (0.00240)  | 0.03760 (0.00114)
δ (EU) / ∆ (RDEU)  | 0.01905 (0.00142)  | 0.02347 (0.00088)  | 0.00853 (0.00141)  | 0.02406 (0.00094)
M                  | –                  | −0.17251 (0.03479) | –                  | −0.29090 (0.04924)
S                  | –                  | 0.36198 (0.01542)  | –                  | 0.50443 (0.02153)
ρ                  | –                  | −0.25988 (0.08560) | –                  | −0.05176 (0.08508)
ω                  | 0.01272 (0.00483)  | 0.01033 (0.00007)  | 0.00873 (0.00146)  |
σ                  | 0.06142 (0.00110)  | 0.05183 (0.00122)  | 0.07657 (0.00268)  | 0.04257 (0.00170)
p                  | –                  | –                  | 0.69953 (0.07494)  |
Log-likelihood     | −7766.48390        | −7488.98040        | −7419.30892        |

Note: in the mixture model, the tremble probability ω, the mixing proportion p and the log-likelihood are common to the two types.
[Fig. A.1 appears here.] Fig. A.1. The log-likelihoods and the posterior probabilities for the CARA and CRRA/Power specifications. (Panels for the CRRA power, CARA power and CARA Quiggin specifications: for each, the log-likelihood function, and a detail of it, is plotted against the mixing proportion, and a frequency histogram of the posterior probability of being type RDEU is shown.)
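The log-likelihood profiles plotted against the mixing proportion in Fig. A.1 can be approximated by re-evaluating the sample log-likelihood on a grid of mixing proportions while the other parameter estimates are held fixed. A minimal sketch, reusing the hypothetical subject_mixture_loglik helper above:

import numpy as np

def loglik_profile(subject_data, grid=None):
    """Sample log-likelihood on a grid of mixing proportions, other estimates held fixed.

    subject_data: list of (choices, prob_eu, prob_rdeu) tuples, one per subject, with the
    type-specific choice probabilities evaluated at their estimates.
    """
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    return [(p, sum(subject_mixture_loglik(ch, pe, pr, p) for ch, pe, pr in subject_data))
            for p in grid]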
Appendix A

This Appendix contains Tables A.1–A.5 and Fig. A.1.

References

Allais, M., 1953. Le comportement de l'Homme Rationnel devant le Risque. Econometrica 21, 503–546.
Ballinger, T.P., Wilcox, N., 1997. Decisions, error and heterogeneity. Economic Journal 107, 1090–1105.
Blavatskyy, P., 2007. Stochastic expected utility theory. Journal of Risk and Uncertainty 34, 259–286.
Botti, F., Conte, A., Di Cagno, D.T., D'Ippoliti, C., 2008. Risk attitude in real decision problems. The B.E. Journal of Economic Analysis & Policy 8 (1) (Advances), Article 6.
Buschena, D.E., Zilberman, D., 2000. Generalized expected utility, heteroscedastic error, and path dependence in risky choice. Journal of Risk and Uncertainty 20, 67–88.
Gonzalez, R., Wu, G., 1999. On the shape of the probability weighting function. Cognitive Psychology 38, 129–166.
Gouriéroux, C., Monfort, A., 2002. Simulation-based Econometric Methods. Oxford University Press, Oxford.
Harless, D.W., Camerer, C.F., 1994. The predictive power of generalized expected utility theories. Econometrica 62, 1251–1289.
Harrison, G.W., Rutstrom, E., 2009. Expected utility theory and prospect theory: One wedding and a decent funeral. Experimental Economics 12, 133–158.
Hey, J.D., Orme, C.D., 1994. Investigating generalisations of expected utility theory using experimental data. Econometrica 62, 1291–1326.
Hey, J.D., 1995. Experimental investigations of errors in decision making under risk. European Economic Review 39, 633–640.
Hey, J.D., 1997. Experiments and the economics of individual decision making under risk and uncertainty. In: Kreps, D.M., Wallis, K.F. (Eds.), Advances in Economics and Econometrics: Theory and Applications, Vol. 1. Cambridge University Press, Cambridge, pp. 173–205.
Hey, J.D., 2001. Does repetition improve consistency? Experimental Economics 4, 5–54.
Loomes, G., Sugden, R., 1998. Testing different specifications of risky choice. Economica 61, 581–598.
Loomes, G., Moffatt, P.G., Sugden, R., 2002. A microeconometric test of alternative stochastic theories of risky choice. Journal of Risk and Uncertainty 24, 103–130.
Machina, M.J., 1994. Review of 'Generalized expected utility theory: The rank-dependent model'. Journal of Economic Literature 32, 1237–1238.
McLachlan, G., Peel, D., 2000. Finite Mixture Models. Wiley, New York.
Moffatt, P.G., 2005. Stochastic choice and the allocation of cognitive effort. Experimental Economics 8, 369–388.
Moffatt, P.G., Peters, S.A., 2001. Testing for the presence of a tremble in economic experiments. Experimental Economics 4, 221–228.
Myung, J., Kim, C., Pitt, M.A., 2000. Towards an explanation of the power law artefact: Insights from response surface analysis. Memory and Cognition 28, 832–840.
Quiggin, J., 1982. A theory of anticipated utility. Journal of Economic Behavior and Organization 3, 323–343.
Starmer, C., 2000. Developments in non-expected utility theory: The hunt for a descriptive theory of choice under risk. Journal of Economic Literature 38, 332–382.
Stott, H.P., 2006. Cumulative prospect theory's functional menagerie. Journal of Risk and Uncertainty 32, 101–130.
Train, K., 2003. Discrete Choice Methods with Simulation. Cambridge University Press, Cambridge.
Tversky, A., Kahneman, D., 1992. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty 2, 235–264.
Vuong, Q.H., 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57, 303–333.
Wakker, P.P., 2008. Explaining the characteristics of the power (CRRA) utility family. Health Economics 17 (12), 1329–1344.
Wilcox, N., 2006. Theories of learning in games and heterogeneity bias. Econometrica 74, 1271–1292.
Journal of Econometrics 162 (2011) 89–104
‘Stochastically more risk averse:’ A contextual theory of stochastic discrete choice under risk
Nathaniel T. Wilcox ∗
Economic Science Institute, Chapman University, United States
Department of Economics, University of Houston, United States
Article info
Article history: Available online 13 October 2009

Abstract
Microeconometric treatments of discrete choice under risk are typically homoscedastic latent variable models. Specifically, choice probabilities are given by preference functional differences (given by expected utility, rank-dependent utility, etc.) embedded in cumulative distribution functions. This approach has a problem: Estimated utility function parameters meant to represent agents’ degree of risk aversion in the sense of Pratt (1964) do not imply a suggested ‘‘stochastically more risk averse’’ relation within such models. A new heteroscedastic model called ‘‘contextual utility’’ remedies this, and estimates in one data set suggest it explains (and especially predicts) as well as or better than other stochastic models. © 2009 Elsevier B.V. All rights reserved.
JEL classification: C25; C91; D81
Keywords: Risk; More risk averse; Discrete choice; Stochastic choice; Heteroscedasticity
There is no doubt of the importance of two papers in the development of expected utility (EU) theory. Pratt (1964) gave us an EU-based understanding of ''agent a is more risk averse than agent b:'' Write this as a ≻_mra b and call this relation ''MRA
in Pratt’s sense.’’ Pratt also gave us risk aversion measures and parametric utility functions for money outcomes that represent these measures of risk aversion in their parameters. Rothschild and Stiglitz (1970) then considered possible definitions of the relation ‘‘lottery T is riskier than lottery S’’ and proved that several definitions are equivalent under EU. Call {S , T } an MPS pair when T is a mean preserving spread of S (defined subsequently). Rothschild and Stiglitz showed that any EU agent with a strictly concave utility of money prefers S to T when choosing from any mean-preserving spread (MPS) pair {S , T }. An accumulated experimental literature suggests various problems with EU, and this has spawned alternative structural theories such as prospect theory (Kahneman and Tversky, 1979), rank-dependent utility (RDU) (Quiggin, 1982; Chew, 1983) and cumulative prospect theory (CPT) (Tversky and Kahneman, 1992). However, the experimental literature also established a fact that is not directly addressed by these theories: Choice under risk appears to be highly stochastic. Beginning with Mosteller and Nogee (1951), experiments with repeated trials of pairs reveal
∗ Corresponding address: Economic Science Institute, Chapman University, United States. Tel.: +1 714 628 7212; fax: +1 714 628 2881.
substantial choice switching by the same subject between trials. In some cases, the trials span days (e.g. Tversky and Russo, 1969; Hey and Orme, 1994; Hey, 2001) and one might worry that decision-relevant conditions may have changed between trials. Yet substantial switching occurs even between trials separated by bare minutes, with no intervening change in wealth, background risk, or any other obviously decision-relevant variable (Camerer, 1989; Starmer and Sugden, 1989; Ballinger and Wilcox, 1997; Loomes and Sugden, 1998). How do we generalize the relation ‘‘more risk averse’’ to stochastic choice under risk? Suppose P n is the probability that agent n chooses S from MPS pair {S , T }, and suppose we regard these probabilities as the theoretical primitive. What might it mean for agent a to be stochastically more risk averse than agent b? Consider this proposal: Stochastically More Risk Averse (SMRA). Agent a is stochastically more risk averse than agent b, written a ≻ b, iff P a > P b for every MPS pair {S , T }.1
That is, the stochastically more risk averse agent is more likely to choose the relatively safe lottery in every MPS pair. If
1 To my knowledge, Hilton (1989) first suggested definitions of ‘‘more risk averse’’ for stochastic choice. Let EV (T ) be the expected value of lottery T and consider pairs of the form{EV (T ), T }. Hilton defined ‘‘agent a is more risk averse in selection than agent b’’ as P a > P b for all pairs {EV (T ), T }. SMRA simply generalizes Hilton’s definition from a specific kind of MPS pair (a choice between a lottery and its expected value) to any MPS pair.
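As a concrete illustration of the definition, a minimal sketch of the two checks involved — equal expected values for a mean-preserving-spread pair, and P^a > P^b across all such pairs — might look as follows (the helper names are illustrative, not part of the paper):

def equal_means(s, t, outcomes, tol=1e-9):
    """True if lotteries s and t (probability tuples on `outcomes`) have equal expected value."""
    ev_s = sum(p * z for p, z in zip(s, outcomes))
    ev_t = sum(p * z for p, z in zip(t, outcomes))
    return abs(ev_s - ev_t) < tol

def is_smra(prob_safe_a, prob_safe_b):
    """True if agent a's probability of the safe lottery strictly exceeds agent b's in every MPS pair."""
    return all(pa > pb for pa, pb in zip(prob_safe_a, prob_safe_b))

# Example: a sure 100 versus even chances of 50 or 150 on the outcomes (50, 100, 150)
assert equal_means((0.0, 1.0, 0.0), (0.5, 0.0, 0.5), (50, 100, 150))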
sample proportions of relatively safe choices from MPS pairs vary significantly across subjects in an experiment, and we call this heterogeneous risk aversion, then we probably have something like SMRA in mind. Today we are witnessing rapid growth of a ‘‘new structural econometrics’’ of discrete choice under risk, usually based on discrete choices from lottery pairs. In practice, a parametric functional form from Pratt (1964) or a generalization of them (Saha, 1993) is used to specify an EU, RDU or CPT value difference between lotteries in a pair—what I will call a V -difference. These V -differences are then embedded within a c.d.f. to specify choice probabilities for maximum likelihood or other M-estimation. Call this approach a V -difference latent variable model. I argue that there is a deep problem with this approach: Parameters in such models that are meant to represent degrees of risk aversion in Pratt’s sense cannot represent degrees of stochastic risk aversion across all agents and all decision contexts in the SMRA sense. For example, suppose we use a V -difference latent variable model to estimate coefficients of relative risk aversion ρ for Anne and Bob, getting estimates ρˆ Anne > ρˆ Bob . We would like to say that Anne ≻ Bob in Pratt’s sense. However, Sections 3 and 4 below will mra
show that the V -difference latent variable model and ρ Anne > ρ Bob cannot imply that Anne ≻ Bob in the sense defined above. This smra
occurs because the utility or value of money u(z ) in theories such as EU, RDU and CPT is only unique up to an affine transformation and, therefore, it is not a theorem that MRA in Pratt’s sense implies a greater V -difference between lotteries in all MPS pairs. However, MRA in Pratt’s sense does imply orderings of ratios of differences of utilities. Psychologically, when we say Anne ≻ Bob mra
in Pratt’s sense, on some interval of outcomes [z1 , z3 ], we mean that Anne perceives the ratio [u(z2 ) − u(z1 )]/[u(z3 ) − u(z1 )] as larger than Bob does for all z2 ∈ (z1 , z3 ). Economically, this is an implication of Pratt’s main theorem. Therefore, both economics and psychology suggest that if we wish MRA in Pratt’s sense to imply SMRA in a latent variable model, the latent variable will of necessity involve ratios of utility differences rather than utility differences. Put differently, for MRA in Pratt’s sense to imply SMRA, our stochastic model may need to assume that agents perceive EU, RDU or CPT V -differences relative to some salient utility difference. In Section 5, I suggest a contextual utility stochastic choice model: It assumes that this salient utility difference is the difference between the utilities of the maximum and minimum possible outcomes in a lottery pair. This model will allow SMRA to be an implication of MRA in Pratt’s sense, in suitably defined ways. More generally, it seems that when we move from deterministic to stochastic choice under risk, context and risk aversion are inextricably entwined with one another. Put differently: If choice under risk is stochastic, a globally coherent notion of greater risk aversion necessarily implies that context matters. As a relational preference concept, MRA is ubiquitous in professional economic discourse. Fig. 1 illustrates this for the years 1977 to 2001, comparing counts of articles with either of the phrases ‘‘more risk averse’’ or ‘‘greater risk aversion’’ in their text to counts of articles with either of the phrases ‘‘more substitutable’’ or ‘‘greater elasticity of substitution’’ in their text.2 The latter is clearly a central relational concept of economics. The comparative levels of the two trend lines are somewhat arbitrary, since searches for
2 Some journals withhold JSTOR availability for recent years: 2001 is the most recent widely available year. Some important journals either do not have JSTOR availability, or did not exist, back to 1977 (e.g. the Journal of Labor Economics and the Journal of Economic Perspectives, respectively) and so are not included. Three important business journals (the Journal of Business, Journal of Finance and Management Science) are included as well.
Fig. 1. Articles with text references to risk aversion and substitutability relations, 1977–2001 (JSTOR Economics journals and selected Business journals) and Citations of Pratt (1964), 1977–2006 (Science and Social Science Citation Indices).
different text strings would affect the levels. Moreover, instances of the substitutability phrases include articles about technologies as well as preferences. The comparative trends are more meaningful. MRA gained on ‘‘more substitutable’’ as a relational concept over the quarter century from 1977 to 2001. Fig. 1 also shows that although citations3 to Pratt (1964) peaked in the early 1980s, they have continued at a fairly constant rate of about 35 to 40 a year over the last two decades and show no sign of decrease. Probably, very few economics articles show such steady and long-lived influence. Given the ubiquity of MRA in economic discourse, it would be nice if model parameters meant to represent MRA in Pratt’s sense had a theoretically appealing observable meaning such as SMRA. Contextual utility delivers such meaning; common alternative models do not. Section 6 shows that contextual utility may be empirically better too: both EU and RDU models estimated with the contextual utility stochastic model explain (and especially predict) risky choices better than V -difference latent variable models and the random preference stochastic model in the well-known Hey and Orme (1994) data set. 1. Preliminaries Let the vector c = (z1c , z2c , z3c ) be three money outcomes with z1c < z2c < z3c , called an outcome context or simply a context. Let Sm be a discrete probability distribution (sm1 , sm2 , sm3 ) on a vector (z1 , z2 , z3 ); then a three-outcome lottery Smc on context c is the distribution Sm on the outcome vector (z1c , z2c , z3c ). Sometimes lotteries are viewed as cumulative distribution functions Smc (z ) ≡ ∑ i|zic ≤z smi . A pair mc is two distinct lotteries {Smc , Tmc } on context c, and a basic pair is one where neither lottery first-order stochastically dominates the other. In basic pairs, we may choose lottery names so that sm2 + sm3 > tm2 + tm3 and sm3 < tm3 , and say that Smc is safer than Tmc in the sense that Smc has more chance of the center outcome z2c , and less chance of the extreme outcomes z1c and z3c , than does Tmc . If the expected values E (Smc ) and E (Tmc )
3 The citation counts are from 1977 to 2006 in all fields (not just economics) in the ISI Science and Social Science Citation Indices. The temporal pattern of citations in economics publications alone is essentially similar.
are equal, the pair is a mean-preserving spread (MPS) pair. Let mps be the set of all MPS pairs: risk averse EU agents prefer Smc to Tmc ∀ mc ∈ mps (Rothschild and Stiglitz, 1970). To connect deterministic theory to probabilistic primitives, let the structure of choice under risk be defined as a function V of lotteries and a vector of structural parameters β n such that n V (Smc |β n ) − V (Tmc |β n ) ≥ 0 ⇔ Pmc ≥ 0.5, 4
(1.1)
where P^n_mc
is the probability that agent n chooses Smc from pair mc. Call V (Smc |β n ) − V (Tmc |β n ) the V -difference in pair mc; when this is nonnegative, it represents the deterministic primitive ‘‘n weakly prefers Smc to Tmc .’’ In turn, (1.1) equates this with the probabilistic primitive ‘‘n is not more likely to choose Tmc from mc,’’ a common approach since Edwards (1954).5 For example, expected utility (EU) with the constant relative n risk aversion (CRRA) utility of money un (z ) = (1 − ρ n )−1 z 1−ρ is the structure V (Smc |ρ n ) = (1 − ρ n )−1
Σ_{i=1}^{3} s_mi z_ic^{1−ρ^n}, so that
(1 − ρ^n)^{−1} Σ_{i=1}^{3} (s_mi − t_mi) z_ic^{1−ρ^n} ≥ 0 ⇔ P^n_mc ≥ 0.5,   (1.2)
where β = ρ n is the sole structural parameter, called agent n’s coefficient of relative risk aversion. Call this the CRRA EU structure. The rank-dependent utility (RDU) structure (Quiggin, 1982; Chew, 1983) replaces the probabilities weights w ∑ smi inEU with∑ smi . These weights are w smi = w − w z ≥i smz z >i smz , where a continuous and strictly increasing weighting function w(q) takes the unit interval onto itself. Writers suggest several parametric forms for the weighting function; here, I use Prelec’s (1998) onen parameter form, which is w(q|γ n ) = exp(−[− ln(q)]γ ) ∀q ∈ (0,1), w(0) = 0 and w(1) = 1. Then a CRRA RDU structure is n
V(S_mc | ρ^n, γ^n) = (1 − ρ^n)^{−1} Σ_{i=1}^{3} w̄s_mi(γ^n) z_ic^{1−ρ^n}, so that
(1 − ρ^n)^{−1} Σ_{i=1}^{3} [w̄s_mi(γ^n) − w̄t_mi(γ^n)] z_ic^{1−ρ^n} ≥ 0 ⇔ P^n_mc ≥ 0.5.   (1.3)
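A short numerical sketch of the two structures just defined may be helpful: it values a three-outcome lottery under the CRRA EU structure (1.2) and the CRRA RDU structure (1.3) with Prelec's one-parameter weighting function, constructing the decision weights from decumulative probabilities as in the text. Function names are illustrative, and ρ < 1 is assumed so that a zero outcome is handled.

import math

def prelec_w(q, gamma):
    """Prelec (1998) one-parameter weighting function w(q) = exp(-(-ln q)^gamma)."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    return math.exp(-((-math.log(q)) ** gamma))

def crra_u(z, rho):
    """Textbook CRRA utility of money, u(z) = z^(1-rho)/(1-rho), for rho < 1."""
    return z ** (1.0 - rho) / (1.0 - rho)

def eu_value(probs, outcomes, rho):
    """CRRA EU value of a lottery, as in the structure (1.2)."""
    return sum(p * crra_u(z, rho) for p, z in zip(probs, outcomes))

def rdu_value(probs, outcomes, rho, gamma):
    """CRRA RDU value with Prelec weights, as in the structure (1.3).

    Outcomes must be listed in increasing order; the decision weight of outcome i is
    w(prob. of z_i or better) - w(prob. of strictly better than z_i).
    """
    value = 0.0
    for i, z in enumerate(outcomes):
        weight = prelec_w(sum(probs[i:]), gamma) - prelec_w(sum(probs[i + 1:]), gamma)
        value += weight * crra_u(z, rho)
    return value

# V-differences for an example mean-preserving-spread pair on the context (0, 50, 100)
S, T, context = (0.0, 7 / 8, 1 / 8), (3 / 8, 1 / 8, 4 / 8), (0.0, 50.0, 100.0)
v_diff_eu = eu_value(S, context, 0.5) - eu_value(T, context, 0.5)
v_diff_rdu = rdu_value(S, context, 0.5, 0.7) - rdu_value(T, context, 0.5, 0.7)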
Structures with weighting functions attribute risk attitudes to both utility function and weighting function shape (Quiggin, 1982; Tversky and Kahneman, 1992). This is why I have (so far) written ‘‘MRA in Pratt’s sense.’’ My issue is the partial contribution of utility function shape to risk attitude, holding weighting functions constant. Henceforth, when I write a ≻ b and a ≻ b, or refer to mra
the MRA or SMRA relation, I am always considering agents with identical weighting functions. The constant reminder ‘‘in Pratt’s sense’’ will now cease. However, a certain fact about RDU weights in basic pairs will be used in Section 5 to derive properties of the contextual utility model. Since w(q) is strictly increasing, and since sm2 + sm3 > tm2 + tm3 and sm3 < tm3 in basic pairs, w(sm2 + sm3 ) > w(tm2 + tm3 ) and w(sm3 ) < w(tm3 ). So in all basic pairs,
wsm2 − w tm2 ≡ w(sm2 + sm3 ) − w(sm3 ) − [w(tm2 + tm3 ) − w(tm3 )] > 0.
(1.4)
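Inequality (1.4) is easy to confirm numerically for any strictly increasing weighting function; a minimal check, reusing the illustrative prelec_w function sketched above:

# For a basic pair, s2 + s3 > t2 + t3 and s3 < t3 give (1.4) for any increasing w.
def weight_gap(s, t, w):
    """w-bar s_m2 - w-bar t_m2 = [w(s2+s3) - w(s3)] - [w(t2+t3) - w(t3)]."""
    return (w(s[1] + s[2]) - w(s[2])) - (w(t[1] + t[2]) - w(t[2]))

S, T = (0.0, 7 / 8, 1 / 8), (3 / 8, 1 / 8, 4 / 8)   # a basic pair (S safer, T riskier)
assert weight_gap(S, T, lambda q: prelec_w(q, 0.7)) > 0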
2. V -difference latent variable models: Strong and strict utility To estimate any vector β n in (1.1), we need a stochastic model to complete the relationship between V -difference and choice probability. Latent variable models are one way to do this. Let ynmc = 1 if agent n chooses Smc from pair mc (ynmc = 0 otherwise). In general, such models assume that there is an underlying but ∗ unobserved continuous random latent variable ynmc such that ynmc = n∗ n n∗ 1 ⇔ ymc ≥ 0; then we have Pmc = Pr(ymc ≥ 0). Here, the latent variable is ∗
y^{n∗}_mc = V(S_mc | β^n) − V(T_mc | β^n) − ε/λ,   (2.1)
where ε is a mean zero random variable with some standard variance and c.d.f. H (x) such that H (0) = 0.5 and H (x) = 1 − H (−x), usually assumed to be the standard normal or logistic c.d.f. n The resulting latent variable model of Pmc is then n Pmc = H λ[V (Smc |β n ) − V (Tmc |β n )] .
P^n_mc = µ^n(S_mc)/[µ^n(S_mc) + µ^n(T_mc)].   (2.3)
As Luce and Suppes (1965, p. 335) point out, every strict utility model is algebraically identical to
P^n_mc = Λ(ln[µ^n(S_mc)] − ln[µ^n(T_mc)]), where Λ(x) = (1 + e^{−x})^{−1} is the logistic c.d.f.   (2.4)
If V is positive valued, we could for instance choose µ^n(S_mc) ≡ V(S_mc | β^n)^λ and rewrite (2.4) as
P^n_mc = Λ(λ{ln[V(S_mc | β^n)] − ln[V(T_mc | β^n)]}).   (2.5)
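In code, the strong utility form (2.2)/(2.6) and the logarithmic V-difference (strict utility) form (2.5) differ only in whether the logistic c.d.f. is applied to the V-difference or to the difference of logarithms of V. A minimal sketch, with illustrative names:

import math

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def strong_utility_prob(v_s, v_t, lam):
    """P(choose S) = Lambda(lambda * [V(S) - V(T)]), as in (2.2)/(2.6)."""
    return logistic_cdf(lam * (v_s - v_t))

def strict_utility_log_prob(v_s, v_t, lam):
    """P(choose S) = Lambda(lambda * [ln V(S) - ln V(T)]), as in (2.5); V must be positive valued."""
    return logistic_cdf(lam * (math.log(v_s) - math.log(v_t)))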
This resembles (2.2) except that a difference of logarithms of V , or logarithmic V -difference, replaces the V -difference. Model (2.5) was employed by Holt and Laury (2002). Another very common strict utility form, used both for quantal response equilibrium (McKelvey and Palfrey, 1995) and the experienceweighted attraction learning model (Camerer and Ho, 1999), sets µn (Smc ) ≡ exp[λV (Smc |β n )]. Substituting this into (2.4) gives
equation is incorrect, such as Machina (1985). Though as yet scant, existing evidence is not promising for these alternatives (Hey and Carbone, 1995).
(2.2)
In decision theory, (2.2) is a strong utility model (Debreu, 1958; Block and Marschak et al., 1960; Luce and Suppes, 1965). Letting L be the set of all lotteries, choice probabilities follow strong utility if there is a function µn : L → R and an increasing function ϕ n : R →[0,1], with ϕ n (0) = 0.5 and ϕ n (x) = 1 − ϕ n (−x), n such that Pmc = ϕ n (µn (Smc ) − µn (Tmc )). Clearly, (2.2) is a strong utility model where ϕ n (x) = H (λx) and µn (Smc ) = V (Smc |β n ). In (2.1), ε/λ may be regarded as computational, perceptual or evaluative noise in the decision maker’s apprehension of the V -difference V (Smc |β n ) − V (Tmc |β n ), with λ−1 proportional to the standard deviation of this noise. Microeconometric doctrine usually views ε/λ as a perturbation to V -difference observed by agents but not observed by the econometrician. In either case, as λ approaches infinity, choice probabilities converge on either zero or one, depending on the sign of the V -difference; put differently, the observed choice becomes increasingly likely to express the underlying preference direction of the structure. We can call λ−1 the noise parameter or call λ the precision parameter. Strict utility (Luce and Suppes, 1965) is more restrictive. Again there is a scale µn defined on lotteries, but it must be strictly positive and choice probabilities must take the form
P^n_mc = Λ(λ[V(S_mc | β^n) − V(T_mc | β^n)]),   (2.6)
4 This restricts attention to transitive structures; this would be an unsatisfactory equation for a nontransitive theory.
5 There are stochastic choice models under which this innocent-sounding
which is identical to the strong utility model (2.2) with the logistic c.d.f. as the choice of H. My strong utility estimations will in fact be (2.2) with the logistic c.d.f., and hence also equivalent to this very common strict utility form. What I will call my strict utility
Fig. 2. Behavior of five CRRA EU V -differences using various transformations, with homoscedastic precision: MPS pair 1 on context (0,50,100) from Hey (2001).
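The computation behind Figs. 2 and 3 — checking whether the EU V-difference for an MPS pair is monotone increasing in ρ under a given CRRA transformation of utility — can be sketched as follows. The transformations are those defined in the surrounding text (the logarithmic form, which compares ln V(S) − ln V(T), is omitted here), and the pair and grid follow the text's example; this is an illustrative sketch, not the paper's code.

import numpy as np

# CRRA transformations of utility discussed in the text, for Hey's (2001)
# outcome vector 0, 50, 100, 150 (rho < 1 assumed so that a zero outcome is handled).
def u_basic(z, rho):      return z ** (1 - rho) / (1 - rho)
def u_power(z, rho):      return z ** (1 - rho)
def u_unit_range(z, rho): return (z / 150.0) ** (1 - rho)
def u_min_unit(z, rho):   return (z / 50.0) ** (1 - rho)

def v_difference(u, S, T, context, rho):
    """EU V-difference V(S) - V(T) under the utility transformation u."""
    return sum((s - t) * u(z, rho) for s, t, z in zip(S, T, context))

# MPS pair 1 on context (0, 50, 100): S = (0, 7/8, 1/8), T = (3/8, 1/8, 4/8)
S, T, context = (0.0, 7 / 8, 1 / 8), (3 / 8, 1 / 8, 4 / 8), (0.0, 50.0, 100.0)
rhos = np.linspace(0.0, 0.99, 200)
for name, u in [("basic", u_basic), ("power", u_power),
                ("unit range", u_unit_range), ("minimum unit", u_min_unit)]:
    diffs = [v_difference(u, S, T, context, r) for r in rhos]
    print(name, "monotone increasing in rho:",
          all(b >= a for a, b in zip(diffs, diffs[1:])))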
estimations instead use the logarithmic V -difference form and logistic c.d.f., as in (2.5) and Holt and Laury (2002). Homoscedasticity with respect to pairs mc (constant λ across pairs and contexts) is the essence of strong and strict utility models: without it, they do not imply strong stochastic transitivity or SST, their definitive property (Block and Marschak et al., 1960; Luce and Suppes, 1965). However, homoscedasticity with respect to agents n is not, and various transformations τ of V -difference may be interpreted as agent heteroscedasticity determined by risk parameters. For example, call the textbook CRRA utility function ub (z |ρ) = (1 − ρ)−1 z 1−ρ the basic transformation, and call up (z |ρ) = z 1−ρ the power transformation. When we write the V difference in a strong utility model using the basic transformation, this is algebraically identical to writing the V -difference with the power transformation and additionally assuming agentspecific precision parameters λn that are proportional to |(1 − ρ n )|. Section 4 will take up agent heteroscedasticity in a more theoretically grounded manner. In experiments with K ≥ 4 distinct outcomes z1 < z2 < · · · < zK (and several distinct three-outcome contexts), call uur 1−ρ 1−ρ 1−ρ (z |ρ) = (z 1−ρ − z1 )/(zK − z1 ) the unit range transformation (making the utility range across all outcomes equal to unity 1−ρ 1−ρ 1−ρ for all ρ ) and call umu (z |ρ) = (z 1−ρ − z1 )/(z2 − z1 ) the minimum unit transformation (making the utility difference between the two smallest outcomes equal to unity for all ρ ). With minor terminological abuse, I will say that the logarithmic V difference latent variable in (2.5) is written using the logarithmic transformation. Let ∆τ Vmc (ρ) be the CRRA EU V -difference (or logarithmic V -difference) in pair mc, as written with any of these five transformations τ . With any constant λ, ∆τ Vmc (ρ) would need to be monotone increasing in ρ for strong or strict utility models to imply SMRA. 3. Critique of strong and strict utility Pratt’s (1964) original EU definition of a ≻ b was extended by mra
Chew et al. (1987, p. 374) for RDU. I quote them here at length, with notation modified suitably to fit mine. To compare the attitudes toward risk of two preference relations ≽b and ≽a on DJ [a set of probability distributions on an interval J ⊂ R], we define Tmc ∈ DJ to differ from Smc ∈ DJ by a simple compensated spread from the viewpoint of ≽b , if and only if Tmc ∼b Smc and ∃z 0 ∈ J such that Tmc (z ) ≥ Smc (z ) for all z < z 0 and Tmc (z ) ≤ Smc (z ) for all z ≥ z 0 .
DEFINITION 5: A preference relation ≽a ∈ G [the set of preference relations representable by Gateaux-differentiable RDU functionals V ] is said to be more risk averse than ≽b ∈ G if Smc ≽a Tmc for every Tmc , Smc ∈ DJ such that Tmc differs from Smc by a simple compensated spread from the point of view of ≽b . . .. THEOREM 1. The following conditions on a pair of Gateaux differentiable [RDU] preference functionals V b and V a on DJ with respective utility functions ub and ua and [weighting] functions w b and w a are equivalent: . . . (ii) w a and ua are concave transformations of w b and ub , respectively. (iii) If Tmc differs from Smc by a simple compensated spread from the point of view of ≽b [implying that V b (Tmc ) = V b (Smc )] then V a (Tmc ) ≤ V a (Smc ) [all italics in original]. Consider what this means for a CRRA EU V -difference. In this case both of the weighting functions w a and w b are identity functions; and in every MPS pair, Tmc will be a simple compensated spread of Smc from the viewpoint of a risk-neutral decision maker, that is an agent b with ρ b = 0, so that V b (Tmc ) = V b (Smc ). Equivalently, we may write ∆τ Vmc (0) = 0 in any MPS pair. The results (ii) and (iii) of Theorem 1 then let us conclude that V a (Tmc ) ≤ V a (Smc ) for any more risk averse agent a, in this case any agent a with ρ a > 0. Equivalently, we may write ∆τ Vmc (ρ a ) ≥ 0 ∀ρ a > 0 in any MPS pair. However, nothing in the theorem implies that ∂ ∆τ Vmc (ρ)/∂ρ > 0 ∀ρ > 0; nothing suggests that any V -difference, written using any transformation, is monotone increasing in any parameter meant to represent MRA. The theorem only says that indifference becomes weak preference when we substitute any ‘‘more risk averse’’ utility function (or probability weighting function) for the less risk averse one that generated indifference; it says nothing about monotone increasing V -difference with greater risk aversion. But the latter is exactly what is required for a ≻ b ⇒ a ≻ b in strong and mra
smra
strict utility models when λ is constant across agents. In particular, strong utility specifies no necessary relationship between ρ and λ, so CRRA EU strong utility and MRA do not imply SMRA. Moreover, it seems that MRA cannot imply SMRA in any strong or strict utility model that assumes constant λ across agents. Graphs illustrate this using MPS pairs from Hey’s (2001) four-outcome design (these are common, e.g. Hey and Orme, 1994; Harrison and Rutström, 2009). The design employs the four outcomes 0, 50, 100 and 150 UK pounds and these generate four
Fig. 3. Behavior of five CRRA EU V -differences using various transformations, with homoscedastic precision: MPS pair 1 on context (50,100,150) from Hey (2001).
distinct three-outcome contexts: All pairs are on one of these contexts. Index contexts by their omitted outcome, e.g. context c = −0 is (50,100,150), while context c = −150 is (0, 50, 100). Fig. 2 shows how λτ ∆τ V1,−150 (ρ) behaves for the five transformations τ defined previously,6 computed for the MPS pair 1 on context c = −150, given by S1,−150 = (0, 7/8, 1/8) and T1,−150 = (3/8, 1/8, 4/8) taken from Hey (2001), as ρ varies over the interval [0, 0.99]. To sketch Fig. 2, a constant value of λτ has been chosen for each transformation to make λτ ∆τ V1,−150 (ρ) reach a common maximum of 10 for ρ ∈ [0, 0.99], for easy visual comparisons; the important point is that λτ is held constant to draw each graph of λτ ∆τ V1,−150 (ρ), for each transformation τ . If λτ ∆τ V1,−150 (ρ) is nonmonotone on [0, 0.99], then MRA cannot imply SMRA under transformation τ given constant λ across agents. Fig. 2 shows that the basic, power and log transformations are not monotonically increasing in ρ , though the nonmonotonicity for the basic transformation is mild for this particular pair. The implication is that in a strong utility CRRA EU model with λ constant across agents, MRA defined in terms of ρ cannot imply SMRA using either the basic, power or log transformations, for MPS pair 1 on context c = −150. However, the unit range and minimum unit transformations can do this, at least for this MPS pair on this context. Unfortunately, this property of the unit range and minimum unit transformations disappears as soon as we consider a context that does not share the minimum outcome z1 = 0 used to define those transformations. Consider now MPS pair 1 on context c = −0, that is (50,100,150), given by S1,−0 = (0, 7/8, 1/8) and T1,−0 = (3/8, 1/8, 4/8), also taken from Hey (2001). Fig. 3 shows that for this MPS pair, λτ ∆τ V1,−0 (ρ) is severely nonmonotonic for all five transformations. MRA defined in terms of ρ cannot imply SMRA with any of these five transformations, given constant λ across agents, across all contexts constructed from a four-outcome vector. If we want to explain agent heterogeneity of safe choices from MPS pairs by means of a strong or strict utility model solely in terms of a risk parameter like ρ , and we use any of the five transformations discussed above, we cannot succeed if we assume that precision λ is constant across agents. It seems that a ≻ b ̸⇒
a ≻ b in strong and strict utility models with λ constant across smra
agents. The fact that the minimum unit and unit range transformations make MRA and SMRA congruent on a context that shares the minimum outcome used to define them (see Fig. 2) suggests one escape from these difficulties: Why not use a context-dependent transformation? This approach abandons homoscedasticity with respect to pairs mc; hence it is not a strong or strict utility model. This is in fact the approach taken by the contextual utility model in Section 5, and it can be given both good psychological motivation and firm grounding in terms of Pratt’s (1964) main theorem. First, however, it is worth considering more carefully whether a wellchosen form of agent heteroscedasticity might be adequate. We will see that it is not. 4. Construction and critique of an agent heteroscedasticity approach Fig. 4 shows two concave CRRA utility functions, for two agents a and b: a square-root (ρ b = 0.5) CRRA utility function, and a ‘‘near-log’’ (ρ a = 0.99) CRRA utility function; clearly a ≻ b. mra
Consider the MPS pair S2,−0 = (0, 1, 0) and T2,−0 = (1/2, 0, 1/2)—a sure 100 versus even chances of either 50 or 150. For a ≻ b mra
to imply a ≻ b under strong utility, the V -difference in this pair smra
with ρ b = 0.99 must exceed that with ρ a = 0.50: Otherwise the more risk averse utility function will not imply a higher probability of choosing the safer S2,−0 . A special transformation that equates first derivatives of utility functions at the intermediate outcome 100 allows Fig. 4 to be drawn that way, nesting the V -difference with ρ b = 0.5 inside the V -difference with ρ a = 0.99.7 The local absolute risk aversion measure −u′′ (z )/u′ (z ) may be thought of as measuring concavity relative to slope at z: with slopes equated at z = 100, Fig. 4 does this visually. Reflection on Fig. 4 suggests that for a ≻ b ⇒ a ≻ b on some mra
smra
collection of contexts, we might succeed with a transformation of u(z ) that equalizes first derivatives u′ (z ) of all utility functions at some z˙ sufficiently greater than z¯1 ≡ maxc {z1c }, the maximum
mra
6 Here, the CRRA unit range transformation is u (z |ρ) = (z 1−ρ − ur 01−ρ )/(1501−ρ − 01−ρ ) = (z /150)1−ρ , while the minimum unit transformation is umu (z |ρ) = (z 1−ρ − 01−ρ )/(501−ρ − 01−ρ ) = (z /50)1−ρ .
7 In Fig. 4, a ‘‘level’’ is also chosen for each utility function such that u(100) = 100 for both utility functions. This choice is irrelevant to any strong utility V -difference (it differences out) but it allows for easy visual comparison. The matching of first derivatives at z = 100 is the important move. Similar remarks apply to Fig. 6.
94
N.T. Wilcox / Journal of Econometrics 162 (2011) 89–104
Fig. 4. How agent heteroscedasticity might allow MRA to imply SMRA.
of the minimum outcomes found in any of several contexts c for which we have choice data. Without this, utility functions become arbitrarily flat over one or more contexts as their coefficients of relative risk aversion get large. This in turn implies that all V -differences between lotteries on such contexts approach zero as coefficients of relative risk aversion get large and, hence, that all strong or strict utility choice probabilities approach 0.5 for sufficiently great relative risk aversion on such contexts. The first derivative of the CRRA basic transformation is u′b (z ) = z −ρ ; multiplying the basic transformation by z˙ ρ , or (what is the same thing) eαρ where α = ln(˙z ) for some z˙ > z¯1 , we give all CRRA utility functions a common unit slope at z˙ = eα . Therefore, call usc (z |ρ) = eαρ z 1−ρ /(1 − ρ) an SMRA-compatible CRRA transformation. Fig. 4 is drawn using this transformation, with α = ln(100) = 4.61 giving all CRRA functions a unit slope at z = 100. If we write the CRRA EU V -difference using the SMRAcompatible transformation, the term eαρ factors out of the V -difference. We then have the strong utility model n Pmc
n = H λeαρ ∆b Vmc (ρ n ) .
(4.1)
Suppose, then, that we estimate a CRRA EU strong utility model one subject at a time, using the basic transformation: this would be the n model Pmc = H [λn ∆b Vmc (ρ n )]. Let λˆ n and ρˆ n be our estimates for each subject n. If (4.1) is correct, ln(λn ) ≡ ln(λ) + αρ n , and we ˆ n ) and should therefore expect a linear relationship between ln(λ ρˆ n . If this linear relationship has a slope α significantly greater than ln(˙z ), then a ≻ b ⇒ a ≻ b in the population from which subjects mra
smra
are drawn on all contexts found in the experiment that generated the data. Hey’s (2001) remarkable data set contains 500 lottery choices per subject, for 53 subjects. This allows one to estimate model parameters separately for each subject n with relative precision. The specification of choice probabilities actually used for this individual subject estimation adds a few small features for certain peripheral reasons. That specification is n Pmc = (1 − δmc ) (1 − ωn )H λn ∆b Vmc (ρ n ) + ωn /2
+ δmc (1 − ωn /2).
(4.2)
As is the case with several existing experimental data sets, Hey’s experiment contains a small number of pairs mc in which Smc first-order stochastically dominates Rmc . Letting fosd be the set
Fig. 5. Relationship between risk aversion and precision in Hey (2001), CRRA EU estimation.
of all such first order stochastic dominance (FOSD) pairs, let δmc = 1 ∀m ∈ fosd (δm = 0 otherwise). It is well known that strong and strict utility do a poor job of predicting the rareness of violations of FOSD (Loomes and Sugden, 1998). n Eq. (4.2) takes account of this, yielding Pmc = 1 − ωn /2 ∀mc ∈ fosd . The new parameter ωn is a tremble probability. Some randomness of observed choice has been thought to arise from attention lapses or simple responding mistakes that are independent of pairs mc, and adding a tremble probability as above takes account of this. It also provides a convenient way to model the low probability of FOSD violations. Moffatt and Peters (2001) find significant evidence of positive tremble probabilities even in data from experiments where there are no FOSD pairs, such as Hey and Orme (1994). The specification (4.2) follows Moffatt and Peters in assuming that ‘‘tremble events’’ occur with probability ωn independent of pairs mc and, in the event of a tremble, that choices of Smc or Tmc from pair mc are equiprobable. ˆ n ) and ρˆ n from Fig. 5 plots maximum likelihood estimates ln(λ Eq. (4.2) for each of Hey’s (2001) 53 subjects, along with a robust regression line between them. It does appear to be a remarkably linear relationship, as suggested by the heteroscedastic form ln(λn ) ≡ ln(λ) + αρ n . The robust regression coefficient is αˆ =
N.T. Wilcox / Journal of Econometrics 162 (2011) 89–104
95
Fig. 6. SMRA-compatible CRRA utility functions, slope and height matched at z = 69.41 = exp(4.24) (point estimate of α = 4.24).
Fig. 7. Behavior of CRRA EU V -differences using SMRA-compatible transformation: MPS pair 1 on the new context (100, 150, 200), at various values of α .
4.24, with a 90% confidence interval [4.11, 4.38].8 There are 16 MPS pairs in Hey’s experimental design. Computational methods easily find the minimum value z˙ > z¯1 ≡ maxc {z1c } = 50 for those MPS pairs, such that any α > ln(˙z ) allows MRA to imply SMRA for those MPS pairs.9 This minimum value is about z˙ = 59.7, so we need α > ln(59.7) = 4.09 for a ≻ b ⇒ a ≻ b. As can be seen, the point mra
smra
estimate αˆ = 4.24 does indeed allow it, and the 90% confidence interval for αˆ just allows one to reject (at 5%) the directional null α ≤ 4.09 in favor of the alternative that a ≻ b ⇒ a ≻ b mra
smra
amongst Hey’s subjects for all MPS pairs in all of the experiment’s contexts. The results are essentially similar if, instead, CRRA RDU is estimated using Prelec’s (1998) one-parameter weighting function. There is, however, an obvious difficulty with this agent heteroscedasticity solution. Suppose we invited Hey’s 53 subjects
8 The robust regression technique used is due to Yohai (1987), and deals with outliers in both the dependent and independent variable. This is appropriate here, since both are estimates. 9 Starting at α = ln(¯z ) = ln(50), α is incremented in small steps until 1
eαρ ∆b Vmc (ρ n ) is monotonically increasing in ρ (also checked by computational methods at each value of α ) for all MPS pairs mc in Hey’s design. n
back for another experimental session in which they made choices from lottery pairs on a new fifth context c ∗ = (100, 150, 200). If we believe the relationship shown in Fig. 5 is a fixed one for these subjects, independent of any outcome context they might face, then a ≻ b ̸⇒ a ≻ b on the new context c ∗ . The estimated mra
smra
upper confidence limit on αˆ , 4.38, implies a value of z˙ = 79.8 < 100 = z1c ∗ (since e4.38 = 79.8). Therefore, λe4.38ρ ∆b Vmc ∗ (ρ) will converge to zero for sufficiently large ρ and so cannot be monotonically increasing in ρ on the new context c ∗ . Figs. 6 and 7 illustrate the difficulty. Fig. 6 shows six CRRA utility functions (for ρ = 0.25, 0.5, 1, 2, 4 and 8), all drawn using the SMRA-compatible transformation and α = 4.24, the point estimate from the robust regression shown in Fig. 5, so that all slopes are unity at e4.24 = 69.41. While the relative curvature of these utility functions is substantial across the outcome range [0, 150] of Hey’s experiment, this is not true on the new context (100, 150, 200). There, increasing ρ eventually results in arbitrarily flat utility functions with arbitrarily small differences u(200)− u(100), and this eventually implies arbitrarily small V -differences on the new context for sufficiently large ρ . Fig. 7 graphs eαρ ∆b V1,c ∗ (ρ), the CRRA EU V -difference in the MPS pair S1,c ∗ = (0, 7/8, 1/8) and T1,c ∗ = (3/8, 1/8, 4/8), at the lower and upper confidence limits
96
N.T. Wilcox / Journal of Econometrics 162 (2011) 89–104
of α from the robust regression, as well as at α = 4.61 = ln(100) and α = 4.79 = ln(120). As can be seen, this function is not monotone increasing in ρ until we choose α sufficiently greater than the natural logarithm of the minimum outcome on the new context c ∗ . This problem is entirely general. Estimate or choose any finite value of α you wish. A new context c ∗ with z1c ∗ > eα always exists such that a ≻ b ̸⇒ a ≻ b on that new context, given that mra
smra
value of α . Although a suitably chosen α makes the transformation usc (z |ρ) = eαρ z 1−ρ /(1 − ρ) ‘‘SMRA compatible’’ on a given collection of contexts, it cannot be SMRA compatible on all contexts with arbitrarily large values of z1c . Of course, one could make α context dependent, but this means abandoning strong and strict utility, since there will then be heteroscedasticity with respect to contexts. If this is what we must do in order to make a ≻ b ⇒
Since (w sm1 − w tm1 ) ≡ −(w sm2 − w tm2 ) − (w sm3 − w tm3 ), the equation in Box I may be rewritten as n Pmc = H λ[(w sm2 − w tm2 )υcn (z2c ) + (w sm3 − w tm3 )] ,
mra
mra
5. Contextual utility
λ
V (Smc |β n ) − V (Tmc |β n ) V (z3c |β n ) − V (z1c |β n )
.
Under RDU, this may be rewritten as in Box I.
(5.1)
n
n
n
smra
Theorem 1. Let r n (z ) [equal to −u′′ (z )/u′ (z ) for utility function un ] be the local risk aversion . . . corresponding to the utility function u n , n = a, b. Then the following conditions are equivalent, in either the strong form (indicated in brackets), or the weak form (with the bracketed material omitted).
This is what the contextual utility model does.
n
across agents) on all three-outcome contexts, as hinted by Fig. 2. Here I quote the relevant parts of Pratt’s theorem, with notation modified to fit mine (all italics in original):
smra
n Pmc =H
(5.2)
where υ (z ) ≡ [u (z ) − u (z1c )]/[u (z3c ) − u (z1c )] can be recognized as a context-specific unit range transformation of agent n’s utility function. We may therefore view contextual utility as employing a ‘‘contextual unit range transformation’’ of the utility function. Suppose that pair mc in model (5.2) is an MPS pair. Pratt’s (1964, p. 128) Theorem 1 quickly shows that contextual utility guarantees that a ≻ b ⇒ a ≻ b (holding weighting functions and λ constant n c
a ≻ b, it is better to approach it in a more basic theoretical way.
Psychological motivation for contextual heteroscedasticity has its origin in signal detection and stimulus discrimination experiments. In this literature, stimulus categorization errors are known to increase with the subjective range taken by the stimulus or signal. For instance, Pollack (1953) and Hartman (1954) presented subjects with tones equally spaced over a range of tones. The range of tones used varies across subjects, but all subjects encounter specific target tones. Confusion of target tones is more common when the overall range of tones encountered is wider. Such observations gave rise to models of stimulus categorization and discrimination error predicting that classification error variance increases with the subjective range of the stimulus (Parducci, 1965, 1974; Holland 1968; Gravetter and Lockhead, 1973). In fact, a rough proportionality between subjective stimulus range and the standard deviation of latent error seemed descriptive of much data from categorization experiments, though some of the formal models allowed for some deviation from this (e.g. Holland, 1968 and Gravetter and Lockhead, 1973). In categorization experiments, where stimuli are presented one at a time and the subjects’ task is to assign the stimulus to a category, the subjective range of the stimulus was usually taken to be determined (after a period of adaptation) by the whole range of stimuli presented over the course of the experiment—what one might call the ‘‘global context’’ of the stimulus. The contextual utility model borrows the idea that the standard deviation of evaluative noise is proportional to the subjective range of stimuli from this literature on the perception of stimuli. However, being a model of choice from lottery pairs rather than a model of categorization of singly presented stimuli, it assumes that choice pairs create their own ‘‘local context’’ or idiosyncratic subjective stimulus range, in the form of the range of outcome utilities found in the pair. We may think of agents as perceiving lottery value on context c relative to the range of possible lottery values on context c. Letting V (z |β n ) be agent n’s value of a degenerate lottery that pays z with certainty, the subjective stimulus range for agent n on any context c is assumed proportional to V (z3c |β n ) − V (z1c |β n ). Assume that evaluative noise is proportional to this subjective stimulus range. For nonFOSD pairs on a three-outcome context, and ignoring any tremble for clarity, the contextual utility choice probabilities are:
(a) r a (z ) ≥ r b (z ) for all z [and > for at least one z in every interval] (e)
...
ua (z3c ) − ua (z2c )
≤ [<]
ub (z3c ) − ub (z2c )
ua (z2c ) − ua (z1c ) ub (z2c ) − ub (z1c ) with z1c < z2c < z3c .
forall z1c , z2c , z3c
Adding unity to both sides of (e) in the form [un (z2c ) − un (z1c )]/[un (z2c ) − un (z1c )], and then taking reciprocals of both sides, we get the following equivalence from the theorem: r a (z ) > r b (z ) ∀z ∈ [z1c , z3c ] ⇔
>
ub (z2c ) − ub (z1c ) ub (z3c ) − ub (z1c )
ua (z2c ) − ua (z1c ) ua (z3c ) − ua (z1c )
∀z2c ∈ (z1c , z3c ).
(5.3)
Since υcn (z2c ) = [un (z2c ) − un (z1c )]/[un (z3c ) − un (z1c )], Eq. (5.3) shows that r a (z ) > r b (z ) ∀z ∈ [z1c , z3c ] ⇔
υca (z2c )
> υcb (z2c ) ∀z2c ∈ (z1c , z3c ).
(5.4)
Eq. (5.2) shows that the RDU V -difference between Smc and Tmc under the contextual unit range transformation is (w sm2 − w tm2 )υcn (z2c ) + (w sm3 − w tm3 ), which is obviously increasing in υcn (z2c ) for all MPS pairs since (w sm2 − wtm2 ) > 0 in these pairs, as shown in (1.4). Therefore, by Eq. (5.4), it is increasing in local risk aversion r n (z ) as well. As a result, the contextual utility model easily implies what strong and strict utility cannot: That a ≻ b ⇒ a ≻ b on all three-outcome contexts when λ and mra
smra
weighting functions are held constant across agents. Some agents may be less precise than others: the precision λ may vary across agents. The following two propositions refine the result to allow this (proofs are obvious and so omitted). Proposition 1. Consider two EU agents such that λa ≥ λb and r a (z ) > r b (z ) ≥ 0 for all z. Then in an EU contextual utility model, a ≻ b. Put differently, a ≻ b and λa ≥ λb ⇒ a ≻ b under an EU smra
mra
smra
contextual utility model. The restriction to EU matters since RDU agents may prefer the riskier lottery in some MPS pairs. When they do, they have a negative V -difference in the pair. A larger value of λ will magnify that and possibly offset greater risk aversion in Pratt’s sense. For RDU, then, we confine attention to MPS pairs where the less risk averse agent b (in Pratt’s sense) prefers the safe lottery:
N.T. Wilcox / Journal of Econometrics 162 (2011) 89–104
n Pmc
97
(w sm1 − w tm1 )un (z1c ) + (w sm2 − w tm2 )un (z2c ) + (w sm3 − w tm3 )un (z3c ) =H λ . un (z3c ) − un (z1c ) Box I.
Proposition 2. Consider two RDU agents such that λa ≥ λb , r a (z ) > r b (z ) ≥ 0 for all z, and w a (q) ≡ w b (q). Then in an RDU contextual a b b utility model, Pmc > Pmc for all MPS pairs in which Pmc ≥ 0.5. Note that contextual utility does not imply that safe choices become certain as risk aversion (in Pratt’s sense) becomes infinite. This is easily seen by noting that υcn (z2c ) → 1 as r n (z ) → ∞, in which case Eq. (5.2) with agent-dependent precision λn becomes n Pmc = H λn [w n (sm2 + sm3 ) − wn (tm2 − tm3 )] .
(5.5)
Since w (sm2 + sm3 ) − w (tm2 + tm3 ) is obviously finite, contextual n utility does not imply that Pmc → 1 as r n (z ) → ∞ for all MPS n pairs; it just implies that Pmc is increasing in r n (z ) for given λn and w n (q). Of course, there will always exist a finite λn such that n Pmc approaches certainty to any specified degree as r n (z ) becomes large. When combined with either a constant absolute risk aversion (CARA) utility function u(z ) = −e−rz or a CRRA utility function, contextual utility implies some theoretically attractive invariance properties of choice probabilities. Call c + k = (z1c + k, z2c + k, z3c + k) and kc = (kz1c , kz2c , kz3c ) additive and proportional shifts of context c = (z1c , z2c , z3c ), respectively. Since u(z + k) = −e−r (z +k) = −e−rk e−rz for CARA, we have n
n
υcn+k (z + k) ≡ ≡
(−e−rk e−rz + e−rk e−rz1c ) (−e−rk e−rz3c + e−rk e−rz1c ) (−e−rz + e−rz1c ) ≡ υcn (z ). (−e−rz3c + e−rz1c )
(5.6)
n n Eq. (5.2) then implies that Pm ,c +k ≡ Pmc . Similarly, since u(kz ) = 1−ρ 1−ρ 1−ρ (zk) =k z for CRRA, we have n υkc (kz ) ≡
(k1−ρ z 1−ρ − 1−ρ
(k1−ρ z3c
−
1−ρ k1−ρ z1c 1−ρ k1−ρ z1c
) )
1−ρ
≡
(z 1−ρ − z1c ) (
1−ρ z3c
1−ρ
− z1c )
≡ υcn (z ).
(5.7)
n n Eq. (5.2) then implies that Pm ,kc ≡ Pmc . That is, contextual utility implies that choice probabilities in pairs are invariant to an additive (proportional) context shift given a CARA (CRRA) utility function. This property echoes what is well known about structural CARA and CRRA preferences, namely that their preference directions are invariant to additive and proportional context shifts, respectively.10 For sets of pairs on a single context, contextual utility will share all properties of strong utility models, such as SST (Luce and Suppes, 1965) and simple scalability (Tversky and Russo, 1969), since contextual utility is observationally identical to a strong utility model with agent heteroscedasticity on a single
10 The logarithmic V -difference strict utility form (2.5) and random preference models share these two shift invariance properties with contextual utility, but strong utility does not. Contextual utility and strong utility EU models share a different property which might be called the ‘‘false common ratio effect’’ (FCRE), while random preference EU models do not have this property. See Loomes (2005, pp. 303–305) for a lucid discussion of the manner in which strong utility EU produces the FCRE (the reasoning is identical for contextual utility) and why random preference EU does not. See Wilcox (2008) for an extensive comparison of properties of combinations of EU and RDU structures with various stochastic models.
context. However, because contextual utility is heteroscedastic across contexts, it will violate SST and simple scalability for sets of pairs on several distinct contexts. In general it only obeys moderate stochastic transitivity (MST; see Appendix A). This is a descriptive bonus. Contextual utility will explain well-known violations of simple scalability such as the Myers effect (Myers and Sadler, 1960) in much the same way other heteroscedastic models such as decision field theory do (Busemeyer and Townsend, 1993).11 Others have posited pair heteroscedasticity in discrete choice under risk. Both Hey (1995) and Buschena and Zilberman (2000) investigated several heteroscedastic forms for econometric reasons and/or theoretical reasons based on similarity relations.12 Though it is not Blavatskyy’s (2007) central innovation, part of his heteroscedastic form is precisely that posited by contextual utility and Blavatskyy’s reason for doing this is unadorned (but good) intuition. Busemeyer and Townsend’s (1993) decision field theory also produces a complex form of heteroscedasticity that varies with outcome utilities and probabilities. Its logic is stochastic sampling of outcome utilities, partially guided by outcome probabilities: Put differently, the heteroscedasticity arises from computational reasons. Contextual utility’s relatively unique and distinguishing feature is that it arrives at contextual heteroscedasticity by interrogating the logic of the relationship between MRA and SMRA in latent variable models, rather than a computational logic (though contextual utility has some empirical psychological grounding as well). It is surprising and interesting that similar (though by no means identical) forms of pair heteroscedasticity can be the conclusion of such different theoretical approaches to the issue of stochastic choice under risk. 6. An empirical comparison of contextual utility and some competitors The question naturally arises: Which stochastic model, when combined with EU or RDU, actually explains and predicts binary choice under risk best? To answer this question, we need a suitable data set. Strong utility and contextual utility are simple reparameterizations of one another for any one context. Therefore, data from any experiment where no subject makes choices from pairs on several distinct contexts (e.g. Loomes and Sugden, 1998) are not suitable. The experiment of Hey and Orme (1994), hereafter HO, is suitable since all subjects make choices from pairs on four distinct contexts. The HO experiment has another desirable design feature that is unique among experiments with multiple contexts: the same pairs of probability distributions are used to construct the lottery pairs on all four of its contexts. Therefore, when models fail to explain choices across multiple contexts in the HO data, we cannot attribute this to other differences in lottery pairs across contexts: the failure must be due to the model’s generalizability across contexts. The HO experiment builds lotteries from four equally spaced non-negative outcomes including zero, in increments of 10 UK
11 Busemeyer and Townsend (1993) explain simple scalability, how the ‘‘Myers effect’’ violates it, and the logic behind decision field theory’s explanation of the Myers effect (contextual utility gives a very similar explanation). 12 A class of heteroscedastic models described by Carroll and De Soete (1991) handles certain kinds of similarity: see Wilcox (2008) for an application of their ‘‘wandering vector model’’ to choice under risk.
pounds. Letting ‘‘1’’ represent 10 UK pounds, the experiment uses the outcome vector (0, 1, 2, 3). Exactly one-fourth of observed lottery choices in the HO data are choices from pairs on each of the four possible three-outcome contexts from this vector. In keeping with past notation, these four contexts are denoted by c ∈ {−0, −1, −2, −3}, where −0 = (1, 2, 3), −1 = (0, 2, 3), −2 = (0, 1, 3) and −3 = (0, 1, 2). HO estimate a variety of structures combined with strong utility, and do this individually—that is, they estimate each structure separately for each subject. Additionally, for all structures that specify a utility function on outcomes u(z ), HO take a nonparametric approach to the utility function. Letting unz ≡ un (z ), HO set un0 = 0 and λ = 1, and estimate un1 , un2 and un3 directly, allowing the utility function u(z ) to take on arbitrary shapes across the outcome vector (0, 1, 2, 3). (The form of strong utility models and the affine transformation properties of u(z ) imply that just three of the five parameters λ, un0 , un1 , un2 and un3 are identified.) HO found that estimated utility functions overwhelmingly fall into two classes: concave utility functions, and inflected utility functions that are concave over the context (0, 1, 2) but convex over the context (1, 2, 3). The latter class accounts for 30 to 40% of subjects (depending on the structure estimated). Because of this, I follow HO in avoiding a simple parametric functional form such as CARA or CRRA that forces concavity or convexity across the entire outcome vector (0, 1, 2, 3), instead adopting their nonparametric treatment of utility functions in strong, strict and contextual utility models. However, I will instead set un0 = 0 and un1 = 1, and estimate λ, un2 and un3 . To account for heterogeneity of subject behavior, I part company with HO’s individual estimation and use a random parameters approach for estimation, as done by Loomes et al. (2002) and Moffatt (2005). Individual estimation does avoid distributional assumptions made by random parameters methods, and seems well-behaved at HO’s sample sizes when we confine attention to ‘‘in-sample’’ comparisons of model fit, as HO did. However, Monte Carlo analysis shows that with finite samples of the HO size, comparisons of the out-of-sample predictive performance of stochastic models can be extremely misleading with individual estimation and prediction (Wilcox, 2007). I want to address both the in-sample fit and out-of-sample predictive performance of models, so this issue matters here. Proper accounting for heterogeneity is crucial when comparing structural models of discrete choice under risk (Wilcox, 2008). With both of these issues in mind, I choose a random parameters estimation approach.13 The HO experiment allowed subjects to express indifference between lotteries. HO model this with an added ‘‘threshold of discrimination’’ parameter within a strong utility model. An alternative parameter-free approach, and the one I take here, treats indifference in a manner suggested by decision theory, where the indifference relation Smc ∼n Tmc is defined as the intersection of two weak preference relations, i.e. ‘‘Smc ≽n Tmc ∩ Tmc ≽n Smc .’’ This suggests treating indifference responses as two responses in the likelihood function – one of Smc being chosen from mc, and another of Tmc being chosen from mc – but dividing that total log likelihood by two since it is really based on just one independent observation. 
Formally, the definite choice of Smc adds ln(Pmc ) to the total log likelihood; the definite choice of Tmc adds ln(1 − Pmc ) to that total;
13 This does cost some extra parameters—8 in all for EU specifications, and 11 in all for RDU specifications. This allows for salient covariation of risk and precision parameters in the sampled population. My own opinion is that guarding against aggregation biases is well worth this cost, especially in a large data set like HO with 12,000 observations. See Appendix C and Wilcox (2008) for details of the specifications.
and indifference adds [ln(Pmc) + ln(1 − Pmc)]/2 to that total. See also Papke and Wooldridge (1996) and Andersen et al. (2008) for related justifications of this approach.14

The random preference model also appears in both contemporary experimental and theoretical work on stochastic discrete choice under risk (Loomes and Sugden, 1995, 1998; Carbone, 1997; Loomes et al., 2002; Gul and Pesendorfer, 2006). Econometrically, the random preference model views stochastic choice as arising from randomness of structural parameters. We think of each agent n as having an urn filled with structural parameter vectors βn. Following Carbone (1997) for instance, we may set un0 = 0 and un1 = 1 and think of EU with random preferences as a situation where βn = (un2, un3), with un2 > 1 and un3 > un2. Each agent n has an urn filled with such utility vectors (un2, un3). At each trial of any pair mc, an agent draws one of these vectors from her urn (with replacement) and uses it to calculate both V(Smc|βn) and V(Tmc|βn) without error, choosing Smc iff V(Smc|βn) ≥ V(Tmc|βn). Let Fβ(x|αn) be the joint c.d.f. of β in agent n's ''random preference urn,'' conditioned on some vector αn of parameters determining the shape of the distribution Fβ. Then under random preferences,
Pnmc = Pr[V(Smc|βn) − V(Tmc|βn) ≥ 0 | Fβ(x|αn)].   (6.1)
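For a concrete picture of (6.1), the following is a minimal Monte Carlo sketch of a random preference EU choice probability. It is not the estimation approach used in this paper; the lognormal "urn" for (un2 − 1, un3 − un2) and all parameter values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_preference_prob(s, t, n_draws=100_000):
    """Monte Carlo estimate of Pr[EU(S|u) >= EU(T|u)] when the utility vector
    is drawn from an agent's 'random preference urn'.  Outcomes are (0, 1, 2, 3)
    with u0 = 0 and u1 = 1 fixed; the lognormal urn is an assumption for this
    example only."""
    g1 = rng.lognormal(mean=0.0, sigma=0.5, size=n_draws)   # u2 - 1 > 0
    g2 = rng.lognormal(mean=0.0, sigma=0.5, size=n_draws)   # u3 - u2 > 0
    u = np.column_stack([np.zeros(n_draws), np.ones(n_draws),
                         1.0 + g1, 1.0 + g1 + g2])          # (u0, u1, u2, u3)
    s = np.asarray(s, float)
    t = np.asarray(t, float)
    eu_s = u @ s            # EU of the safer lottery under each drawn utility vector
    eu_t = u @ t            # EU of the riskier lottery
    return float(np.mean(eu_s >= eu_t))

# Hypothetical pair: probability vectors over the outcomes 0..3.
print(random_preference_prob(s=[0.1, 0.5, 0.4, 0.0], t=[0.3, 0.1, 0.4, 0.2]))
```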
Loomes et al. (2002) show how to specify a random preferences RDU model for pairs on a single context. However, specification of a random preference RDU model across multiple contexts is more difficult. Building both on Loomes et al. (2002) and certain insights of Carbone (1997), Appendix B shows how this may be done for the three contexts −0, −2 and −3, but also shows why further extending random preferences to cover the other context −1 is not a transparent exercise for the RDU structure. For this reason, I confine all of my estimations to choices in the HO data on the contexts −0, −2 and −3, where a random preference RDU model can clearly be compared to strong, strict and contextual utility RDU models. Appendix C illustrates the random parameters specification and estimation in detail for the EU structure with strong utility; see Wilcox (2008) for detailed exposition of all specifications and their estimation. I perform two different kinds of comparisons of model fit. The first kind (very common in this literature) are ‘‘in-sample fit comparisons.’’ Models are estimated on all three of the HO contexts (−0, −2 and −3) and the resulting log likelihoods for each model across all three contexts are compared. The second kind of comparison, which is rare in this literature, compares the ‘‘out-of-sample’’ fit of models—that is, their predictive performance.15 For these comparisons, models are estimated on just the two HO contexts −2 and −3, that is contexts (0, 1, 3) and (0, 1, 2), and these estimates are used to predict choice probabilities and calculate log likelihoods of observed choices on HO context −0, that is context (1, 2, 3). This is something more than a simple out-of-sample prediction, which could simply be a prediction to new choice trials of the same pairs (and hence contexts)
14 Note that indifference responses are rare overall in the HO data (about 2.7% of all responses) and that these are concentrated amongst a relatively small number of subjects. 15 The HO design presents the same 100 pairs (25 on each of its four contexts) to a fixed group of subjects on two separate days. This is a panel of subjects with variation in pairs, contexts and days. Any such panel could be divided up into a ‘‘sample’’ for estimation, and remaining ‘‘out-of-sample’’ observations for predictive evaluation, in different ways. For instance one might divide it up by its time dimension, estimating with one day’s choices, and then trying to predict choices made on the other day. Since my interest lies with the different way in which the stochastic models treat choices across contexts, I choose to divide the panel up across its context, rather than time, dimension.
used for estimation. It is additionally an ''out-of-context'' prediction.16 This particular kind of out-of-sample fit comparison may be quite difficult in the HO data. Relatively safe choices are the norm in contexts −2 and −3 of the HO data. The mean proportion of safe choices made by HO subjects in these contexts is 0.764, and at the individual level this proportion exceeds 1/2 for 70 of the 80 subjects. But relatively risky choices are the norm in context −0 of the HO data: The mean proportion of safe choices there is just 0.379, and falls short of 1/2 for 58 of the 80 subjects. Recall that we cannot attribute this to differences in the probability vectors that make up the lottery pairs in the different HO contexts, since the HO experiment holds this constant across contexts. This out-of-sample prediction task is going to be difficult. From largely safe choices in the ''estimation contexts'' −2 and −3, the models need to predict largely risky choices in the ''prediction context'' −0.

Table 1 displays both the in-sample and out-of-sample log likelihoods for the eight models. The top four rows are EU models, and the bottom four rows are RDU models; for each structure, the four rows show results for strong utility, strict utility, contextual utility and random preferences. The first column shows total in-sample log likelihoods, and the second column shows total out-of-sample log likelihoods. Contextual utility always produces the highest log likelihood, whether it is combined with EU or RDU, and whether we look at in-sample or out-of-sample log likelihoods (though the log likelihood advantage of contextual utility is most pronounced in the out-of-sample comparisons). Buschena and Zilberman (2000) and Loomes et al. (2002) point out that the best-fitting stochastic model may depend on the structure estimated and offer empirical illustrations of this sensible econometric point. Yet in Table 1 contextual utility is the best stochastic model whether viewed from the perspective of EU or RDU, or from the perspective of in-sample or out-of-sample fit.

Decision theory has been relatively dominated by structural theory innovation over the last quarter century. Table 1 has something to say about this. Examine the in-sample fit column first. Holding stochastic models constant, the maximum log likelihood improvement of switching from EU to RDU is 142.02 (with strict utility), and the improvement is 106.64 for the best-fitting stochastic model (contextual utility). Holding structures constant instead, the maximum improvement in log likelihood associated with changing the stochastic model is 151.48 (with the EU structure, switching from strict to contextual utility), but this is atypical: omitting strict utility models, which have an unusually poor in-sample fit, the maximum improvement is 51.28 (with the EU structure, switching from strong to contextual utility) and otherwise no more than half that. Therefore, except for the especially poor strict utility fits, in-sample comparisons make choice of a stochastic model appear to be a sideshow relative to choice of a structure.

This appearance is reversed when we look at out-of-sample predictive power. Looking now at the out-of-sample fit column, notice first that under strict utility, RDU actually fits worse than EU does. We should be getting the impression by now, however, that strict utility is an unusually poor performer, so let us henceforth omit it from consideration.
Among the remaining three stochastic models, the maximum out-of-sample fit improvement associated with switching from EU to RDU is 21.19 (for contextual utility). Holding structures constant instead, the maximum out-of-sample fit difference between the stochastic models (again omitting strict utility) is 113.39 (for RDU, switching from strong to contextual utility). In out-of-sample prediction, then, structures
16 This is exactly Busemeyer and Wang's (2000) distinction between ''cross-validation'' and ''generalization.''
play Rosencrantz and Guildenstern to the stochastic model Hamlet. Decision research seems heavily preoccupied with explanation. But these results suggest that those who have more interest in prediction may want to think more about stochastic models, as repeatedly urged by Hey and Orme (1994), Hey (2001), Hey (2005) and suggested by Ballinger and Wilcox (1997).

Table 2 reports a formal comparison of stochastic models, conditional on each structure. Let D̃n be the difference between the estimated log likelihoods (in-sample or out-of-sample) from a pair of models, for subject n. Vuong (1989) shows that asymptotically, a z-score based on the D̃n follows a normal distribution under the null that two non-nested models are equally close to the truth (neither model needs to be the true model). The statistic is z = ∑Nn=1 D̃n/(s̃D √N), where s̃D is the sample standard deviation of the D̃n across subjects n (calculated without adjustment for a degree of freedom) and N is the number of subjects.
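This statistic is simple to compute from per-subject log likelihoods. The sketch below takes two hypothetical arrays of per-subject log likelihoods as inputs and is not the paper's code.

```python
import numpy as np
from scipy.stats import norm

def vuong_z(ll_model_a, ll_model_b):
    """Vuong (1989) z for two non-nested models.
    ll_model_a, ll_model_b: length-N arrays, one log likelihood per subject."""
    d = np.asarray(ll_model_a) - np.asarray(ll_model_b)   # the D~_n
    n = d.size
    s = d.std(ddof=0)                                     # no degrees-of-freedom adjustment
    z = d.sum() / (s * np.sqrt(n))
    p_one_tailed = 1.0 - norm.cdf(abs(z))                 # directionally better fit
    return z, p_one_tailed
```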
Table 2 reports these z-statistics and p-values against the null of equally good fit, with a one-tailed alternative that the directionally better fit is significantly better. While contextual utility is always directionally better than its competitors, no convincingly significant ordering of the stochastic models emerges from the in-sample comparisons in Table 2, though strict utility is clearly significantly worse than the other three stochastic models. Contextual utility shines, though, in the out-of-sample fit comparisons in Table 2, regardless of whether the structure is EU or RDU, where it beats the other three stochastic models with strong significance.

7. Conclusions

While most of the scholarly conversation about decision under risk concerns its structure, there is resurgent interest in the stochastic part of decision under risk. This has been driven both by theoretical questions and empirical findings. Theoretically, some or all of what passes for ''an anomaly'' relative to some structure (usually EU) can sometimes be attributed to stochastic models rather than the structure in question (Wilcox, 2008). This is an old point, stretching back at least to Becker et al.'s (1963a; 1963b) observation that violations of betweenness are precluded by some stochastic versions of EU (random preferences) but predicted by other stochastic versions of EU (strong utility). But this general concern has been resurrected by many writers; Loomes (2005), Gul and Pesendorfer (2006) and Blavatskyy (2007) are just three relatively recent (but very different) examples. Empirically, we know from a (now quite large) body of experimental evidence on retest reliabilities that the stochastic part of decision under risk is surprisingly large. Additionally, a ''new structural econometrics'' has emerged over the last decade in which structures like EU, RDU and CPT are combined with some stochastic model for the purpose of estimating structural risk parameters.

I have interrogated the empirical meaning of structural parameters in econometric models of risky choice. I find that in strong and strict utility models, structural parameters meant to represent the degree of risk aversion in Pratt's (1964) sense cannot order agents according to a theoretically attractive definition of the relation ''stochastically more risk averse'' across all choice contexts. I conclude that strong and strict utility are deeply troubled for the econometrics of discrete choice under risk. Contextual utility eliminates this trouble without introducing extra parameters. In the Hey and Orme (1994) data set, its in-sample explanatory performance exceeds that of strong and strict utility models and the random preference model; and its out-of-sample predictive performance is significantly the best of this particular collection of stochastic models.

I regard stochastic choice as the oldest and most robust fact of choice under risk, and believe that serious interpretive errors
Table 1
Log likelihoods of random parameters characterizations of the models in the Hey and Orme sample.

                                      Estimated on all three contexts:        Estimated on contexts (0, 1, 2) and (0, 1, 3):
Structure   Stochastic model          log likelihood on all three contexts    log likelihood on context (1, 2, 3)
                                      (in-sample fit)                         (out-of-sample fit)
EU          Strong utility            −5311.44                                −2409.38
EU          Strict utility            −5448.50                                −2373.12
EU          Contextual utility        −5297.08                                −2302.55
EU          Random preferences        −5348.36                                −2356.60
RDU         Strong utility            −5207.81                                −2394.75
RDU         Strict utility            −5306.48                                −2450.41
RDU         Contextual utility        −5190.43                                −2281.36
RDU         Random preferences        −5218.00                                −2335.55
Table 2
Vuong (1989) non-nested tests between model pairs, by structure, stochastic model and in-sample versus out-of-sample fit.

EU structure
Estimated on all three contexts, and comparing fit on all three contexts (in-sample fit comparison):
                       Random prefs.             Strong utility             Strict utility
Contextual utility     z = 1.723, p = 0.042      z = 0.703, p = 0.241       z = 6.067, p < 0.0001
Random prefs.          –                         z = −1.574, p = 0.058      z = 5.419, p < 0.0001
Strong utility         –                         –                          z = 5.961, p < 0.0001

Estimated on contexts (0, 1, 2) and (0, 1, 3), and comparing fit on context (1, 2, 3) (out-of-sample fit comparison):
                       Random prefs.             Strong utility             Strict utility
Contextual utility     z = 4.387, p < 0.0001     z = 3.044, p = 0.0012      z = 2.739, p = 0.0031
Random prefs.          –                         z = 1.639, p = 0.051       z = 1.422, p = 0.078
Strong utility         –                         –                          z = 0.028, p = 0.49

RDU structure
In-sample fit comparison:
                       Random prefs.             Strong utility             Strict utility
Contextual utility     z = 0.981, p = 0.163      z = 0.877, p = 0.190       z = 4.352, p < 0.0001
Random prefs.          –                         z = −0.44, p = 0.330       z = 3.808, p < 0.0001
Strong utility         –                         –                          z = 5.973, p < 0.0001

Out-of-sample fit comparison:
                       Random prefs.             Strong utility             Strict utility
Contextual utility     z = 3.879, p < 0.0001     z = 3.304, p = 0.0005      z = 5.978, p < 0.0001
Random prefs.          –                         z = 1.652, p = 0.049       z = 3.831, p < 0.0001
Strong utility         –                         –                          z = 3.918, p < 0.0001

Notes: Positive z means the row stochastic model fits better than the column stochastic model.
can occur when the implications of stochastic choice models are ignored. I have shown that when choice is stochastic, a globally coherent notion of greater risk aversion necessarily implies the existence of certain context effects. Decision research views some contextual labilities of choice as failures of rationality. Surely some are, but not all are: some may be the prosaic consequence of a sensible stochastic model that makes global sense of SMRA. For instance, the Myers effect (Myers and Sadler, 1960) appears to be a reversal of preferences caused by a change in a standard of comparison. Busemeyer and Townsend (1993) explain why the Myers effect may be no more (or less) than contextual heteroscedasticity.

Contextual utility also has implications for ''strength of incentives'' in experiments and more generally any mechanism. If contextual utility is correct, the marginal utility difference between taking risky actions X and Y is only part of what governs the strength of perceived incentives: the other important part is the subjective range of available utilities. This implies that scaling up outcomes may be a relatively ineffective means of strengthening incentives. If we can instead redesign an experiment or mechanism to shrink the subjective range of available utilities, while holding marginal utility improvements constant in the neighborhood of a maximum, this may be a more effective way of strengthening subjective optimization incentives.

Acknowledgements

John Hey generously made his remarkable data sets available to me. I also thank John Hey and Pavlo Blavatskyy, Soo Hong Chew, Edi Karni, Ondrej Rydval, Peter Wakker, two anonymous referees and especially Glenn Harrison for conversation, commentary and/or help, though any errors here are solely my own. I thank the National Science Foundation for support under grant SES 0350565.
Appendix A. Stochastic transitivity properties of contextual utility

Definitions. Let {C, D, E} be any triple of lotteries generating the pairs {C, D}, {D, E} and {C, E}. In a basic triple, all three pairs are basic pairs. A heteroscedastic V-difference latent variable model is PST = F[λ(VS − VT)/σST], where PST is the probability that S is chosen from pair {S, T}, VS ≡ V(S|β) is the structural value V of any lottery S, and σST is a noise component specific to pair {S, T}. (Suppress structural parameters β but assume they are fixed, so the discussion is about an individual agent.) Let VS and V̄S denote the value of degenerate lotteries that pay the minimum and maximum outcomes in lottery S with certainty, respectively. In basic triples, the intervals [VC, V̄C], [VD, V̄D] and [VE, V̄E] must overlap (if not, the outcome ranges of two lotteries in the triple would be disjoint and they would form an FOSD pair). From (5.1), σST is the utility range in pair {S, T}, that is max(V̄S, V̄T) − min(VS, VT), in the contextual utility model. I make use of Halff's Theorem (Halff, 1976): Any heteroscedastic V-difference latent variable model in which pair-specific noise components σST obey the triangle inequality across triples of pairs will satisfy moderate stochastic transitivity (MST).

Proposition. Contextual utility obeys MST, but not strong stochastic transitivity (SST), in all basic triples. (Remark: This only rules out triples with glaringly transparent FOSD pairs where all outcomes in one lottery exceed all outcomes in another lottery. See (4.2) for a treatment of FOSD pairs using trembles.)

Proof. The utility range in a pair cannot be less than the utility range in either of its component lotteries, so σCD ≥ V̄C − VC and σDE ≥ V̄E − VE: sum to get σCD + σDE ≥ V̄C − VC + V̄E − VE. Since {C, D, E} is a basic triple, [VC, V̄C] and [VE, V̄E] overlap.
Therefore, the utility range in pair {C, E} cannot exceed the sum of the utility ranges of its component lotteries C and E; that is, σCE ≤ V̄C − VC + V̄E − VE. Combining the last two inequalities, we have σCD + σDE ≥ σCE, which is the triangle inequality. By Halff's Theorem, contextual utility obeys MST for all basic triples.

An example suffices to show that contextual utility can violate SST in basic triples. Consider an expected value maximizer. Assume that C, D and E have outcome ranges [0, 200], [100, 300] and [100, 400], respectively, and expected values 162, 160 and 150, respectively. The latent variable in contextual utility is the ratio of a pair's V-difference to the pair's range of possible utilities. In this example, these ratios are 2/300 in pair {C, D}, 10/300 in pair {D, E}, and 12/400 = 9/300 in pair {C, E}. All are positive, implying that all choice probabilities (of the first lottery in each pair) exceed 0.5. But the probability that C is chosen over E will be less than the probability that D is chosen over E, since the latent variable in the former pair (9/300) is less than the latent variable in the latter pair (10/300). This violates SST.

Appendix B. Random preference RDU across multiple contexts

Using insights of Carbone (1997), I generalize Loomes et al.'s (2002) technique for single contexts. Like Loomes et al. (2002), assume that weighting function parameters are nonstochastic; that is, that the only structural parameters that vary in a subject's ''random preference urn'' are her outcome utilities. Combining (1.3) and (6.1), we have
Pnmc = Pr[∑3i=1 Wmi(γn) un(zic) ≥ 0 | Fu(x|αn)],   (B.1)
where Wmi(γn) = w(∑j≥i smj|γn) − w(∑j>i smj|γn) − [w(∑j≥i tmj|γn) − w(∑j>i tmj|γn)].

Since Wm1(γn) ≡ −Wm2(γn) − Wm3(γn), and assuming strict monotonicity of utility in outcomes so that we may divide the inequality within (B.1) through by un(z2c) − un(z1c), we have
Pnmc = Pr[Wm2(γn) + Wm3(γn)[vnc + 1] ≥ 0 | Fu(x|αn)],   (B.2)
where vnc ≡ [un(z3c) − un(z2c)]/[un(z2c) − un(z1c)],
Wm2(γn) = w(sm2 + sm3|γn) − w(sm3|γn) − [w(tm2 + tm3|γn) − w(tm3|γn)], and
Wm3(γn) = w(sm3|γn) − w(tm3|γn).

Notice that we can view random preferences as based on a context-dependent ratio of differences transformation of utility, namely vnc, as is the case with contextual utility (though the ratio of differences is not the same in the two models). Random preference models will, therefore, also display context dependence. Unlike contextual utility, however, vnc ∈ R+ is a random variable, containing all choice-relevant stochastic information about the agent's random utility function un(z) for choices on context c. Let Gvc(x|αcn) be the context-specific c.d.f. of vnc, generated by the joint distribution Fu(x|αn) of agent n's random utility vector. Since tm3 − sm3 > 0 in basic pairs, we have −Wm3(γn) = w(tm3|γn) − w(sm3|γn) > 0 for basic pairs.17 Rewriting (B.2) to make the change of random variables suggested above explicit, we then have
Pnmc = Pr[vnc ≤ −[Wm2(γn) + Wm3(γn)]/Wm3(γn) | Gvc(x|αcn)]
     = Gvc([w(sm2 + sm3|γn) − w(tm2 + tm3|γn)]/[w(tm3|γn) − w(sm3|γn)] | αcn).   (B.3)

With Eq. (B.3) we have arrived where Loomes et al. (2002) left things. Choosing some c.d.f. on R+ for Gvc, such as the lognormal or Gamma distribution, construction of a likelihood function from (B.3) and choice data is straightforward for one context, and this is the kind of experimental data Loomes et al. (2002) had. But when contexts vary in a data set, the method quickly becomes intractable except for special cases. I now work out one such special case.

By choosing the utilities of the two lowest outcomes equal to zero and one, respectively, the random utility vector for the four outcomes (0, 1, 2, 3) may be summarized by a random utility vector for just the two highest outcomes, that is the random vector (un2, un3), where un2 > 1 and un3 > un2. Let g1n ≡ un2 − 1 ∈ R+ and g2n ≡ un3 − un2 ∈ R+ be two underlying random variables generating these random utilities as un2 = 1 + g1n and un3 = 1 + g1n + g2n. A little algebra shows that vn−3 = g1n, vn−2 = g1n + g2n, vn−1 = g2n/(g1n + 1), and vn−0 = g2n/g1n. With the four vnc (for the four contexts) expressed this way, we want a joint distribution of g1n and g2n so that as many of these as possible have tractable parametric distributions. The best choice I am aware of still only works for three of these four expressions. That choice is two independent gamma variates, each with the gamma distribution's c.d.f. G(x|φ, κ), with identical ''scale parameter'' κn but possibly different ''shape'' parameters φ1n and φ2n. Under this choice, vnc is distributed
Beta-prime with c.d.f. B′(x|φ2, φ1) for pairs on context c = −0,
Gamma with c.d.f. G(x|φ1 + φ2, κ) for pairs on context c = −2, or
Gamma with c.d.f. G(x|φ1, κ) for pairs on context c = −3.   (B.4)
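A minimal simulation sketch of the ''independent gamma model'' just described can make the distributional claims in (B.4) concrete. The shape and scale values below are arbitrary illustrations, not estimates, and the checks simply compare simulated draws with the stated distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
phi1, phi2, kappa = 1.5, 2.0, 0.8          # illustrative shape and scale values
n = 200_000

g1 = rng.gamma(shape=phi1, scale=kappa, size=n)   # u2 - 1
g2 = rng.gamma(shape=phi2, scale=kappa, size=n)   # u3 - u2
u2, u3 = 1.0 + g1, 1.0 + g1 + g2                  # monotone by construction

v_minus3 = g1             # context (0, 1, 2): Gamma(phi1, kappa)
v_minus2 = g1 + g2        # context (0, 1, 3): Gamma(phi1 + phi2, kappa)
v_minus0 = g2 / g1        # context (1, 2, 3): beta-prime(phi2, phi1)

# Kolmogorov-Smirnov checks against the distributions listed in (B.4):
print(stats.kstest(v_minus2, stats.gamma(a=phi1 + phi2, scale=kappa).cdf).pvalue)
print(stats.kstest(v_minus0, stats.betaprime(phi2, phi1).cdf).pvalue)
```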
17 FOSD pair choice probabilities are modeled as tremble events, as in (4.1) and Loomes et al. (2002).
The ‘‘beta-prime’’ distribution on R+ is also called a ‘‘beta distribution of the second kind’’ (Aitchison, 1963).18 These assumptions imply a joint distribution of un2 − 1 and un3 − 1 known as ‘‘McKay’s bivariate gamma distribution’’ and a correlation coefficient φ1n /(φ1n + φ2n ) between un2 and un3 in subject n’s ‘‘random preference urn’’ (Hutchinson and Lai, 1990). An acquaintance with the literature on estimation of random utility models may make these assumptions seem very special and unnecessary. However, theories of choice under risk are special relative to the kinds of preferences that typically get treated in that literature. Consider the classic example of transportation choice well known from Domencich and McFadden (1975). Certainly we expect the value of time and money to be correlated across the population of commuters. But for a single commuter making a choice between car and bus on a specific morning, we do not require a specific relationship between the disutility of commuting time and the marginal utility of income she happens to ‘‘draw’’ from her random utility urn on that particular morning. This gives us some latitude when we choose a distribution for the unobserved parts of her utilities of various commuting alternatives. We have much less of this latitude when we think of random preferences over lottery pairs. The spirit of random preferences is that every preference ordering drawn from the urn obeys all
18 The ratio relationship here is a generalization of the well-known fact that the ratio of independent chi-square variates follows an F distribution. Chi-square variates are gamma variates with common scale parameter κ = 2. In fact, a betaprime variate can be transformed into an F variate. If x is a beta-prime variate with parameters a and b, then bx/a is an F variate with degrees of freedom 2a and 2b. This is convenient because almost all statistics software packages contain thorough call routines for F variates, but not necessarily any call routines for beta-prime variates.
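Following the footnote, a beta-prime c.d.f. can be evaluated through an F distribution when no beta-prime routine is available. The short sketch below uses arbitrary illustrative values and assumes SciPy, which happens to supply both distributions, so the identity can be checked directly.

```python
from scipy import stats

a, b, x = 1.5, 2.0, 0.7                       # illustrative shape values and evaluation point
print(stats.betaprime(a, b).cdf(x))           # direct beta-prime c.d.f.
print(stats.f(2 * a, 2 * b).cdf(b * x / a))   # same value: b*x/a is F with d.f. (2a, 2b)
```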
properties of the preference structure (Loomes and Sugden, 1995). We demand, for instance, that every vector of outcome utilities drawn from the urn respects monotonicity in z; this implies that the joint distribution of un2 and un3 must have the property that un3 ≥ un2 ≥ 1. Moreover, the assumptions we make about the vnc must be probabilistically consistent across pair contexts. Choosing a joint distribution of un2 and un3 immediately implies exact commitments regarding the distribution of any and all functions of un2 and un3. The issue does not arise in a data set where subjects make choices from pairs on just one context, as in Loomes et al. (2002): in this simplest of cases, any distribution of vnc on R+, including the lognormal choice they make, is a wholly legitimate hypothesis. But as soon as each subject makes choices from pairs on several different overlapping contexts, random preferences are much more exacting. Unless we can specify a joint distribution of g1n and g2n that implies it, we are not entitled (for instance) to assume that vnc follows lognormal distributions in all of three overlapping contexts for a single subject.19 Put differently, our choice of a joint distribution for vn−2 and vn−3 has inescapable implications for the distribution of vn−0. Carbone (1997) correctly saw this in her random preference treatment of the EU structure. Under these circumstances, a clever choice of the joint distribution of g1n and g2n is necessary.

The random preference model can be quite limiting in practical applications. For instance, notice that I have not specified a distribution of vn−1 and hence have no method for applying Eq. (B.3) to pairs on the context (0, 2, 3). Fully a quarter of the data from experiments such as Hey and Orme (1994) are on that context. On the context −1 = (0, 2, 3), vn−1 = g2n/(1 + g1n). As far as I am aware, the following is a true statement, though I may yet see it disproved.

Conjecture. There is no nondegenerate joint distribution of g1n and g2n on (R+)2 such that g1n, g1n + g2n, g2n/(g1n + 1) and g2n/g1n all have tractable parametric distributions.

This is why I limit myself here to the three-fourths of these data sets that are choices from pairs on the contexts (0, 1, 2), (0, 1, 3) and (1, 2, 3): these are the contexts that the ''independent gamma model'' of random preferences developed above can be applied to. There are no similar practical modeling constraints on RDU strict, strong or contextual utility models (a considerable practical point in their favor); these models are easily applied to choices on any context. Note, however, that Loomes et al.'s (2002) technique could be sidestepped by (say) simulated maximum likelihood techniques.

Appendix C. Estimation—An EU and strong utility illustration

Though the Hey and Orme (1994) experiment contains no FOSD pairs, Moffatt and Peters (2001) found evidence of nonzero tremble probabilities using it, so I include a tremble probability in all models. However, using Hey's (2001) still larger data set, I find no evidence that tremble probabilities vary across subjects. Therefore, I assume that ωn = ω for all subjects n, so that likelihood functions are built from the probabilities (1 − ω)Pnmc + ω/2.

Let (EU, Strong) denote an expected utility structure with the strong utility stochastic model in which ψn = (un2, un3, λn, ω) is subject n's true parameter vector governing her choices from pairs. Let J(ψ|θ) denote the joint c.d.f.
governing the distribution of ψ = (u2 , u3 , λ, ω) in the sampled population, where θ are
19 Although ratios of lognormal variates are lognormal, there is no similar simple parametric family for sums of lognormal variates. The independent gammas with common scale are the only workable choice I am aware of.
parameters governing the shape and location of J. Let θ ∗ be the true value of θ in that population. We want an estimate θˆ of θ ∗ : this is random parameters estimation. We need a procedure for choosing a reasonable and tractable form for J that appears to characterize main features of the joint distribution of ψ in the sample. What follows illustrates this procedure for the (EU, Strong) specification: see Wilcox (2008) for a more elaborated description of the procedure and exact details of all specifications estimated here. Suppressing the subject index n, the (EU, Strong) specification is, at the individual level, Pmc = (1 − ω)Λ (λ [(sm1 − tm1 )u1c + (sm2 − tm2 )u2c
+ (sm3 − tm3 )u3c ]) + ω/2,
(C.1)
where Λ(x) = [1 + exp(−x)]−1 is the logistic c.d.f. (consistently employed as the function H(x) for the strong, strict and contextual utility models). Eq. (C.1) introduces the notation uic = u(zic). In terms of the underlying utility parameters u2 and u3 of a subject, (u1c, u2c, u3c) is (1, u2, u3) for pairs on context c = −0 = (1, 2, 3), (0, 1, u3) for pairs on context c = −2 = (0, 1, 3); and (0, 1, u2) for pairs on context c = −3 = (0, 1, 2).

Begin with individual estimation of a simplified version of (C.1), using 68 of HO's 80 subjects.20 This initial estimation gives an impression of the form of the joint distribution of ψ, and how J(ψ|θ) may be chosen to represent it. At this initial step, ω is not estimated, but is instead set equal to 0.04 in (C.1) for all subjects.21 (Estimation of ω is undertaken later in the random parameters estimation.) The log likelihood function for subject n, in terms of Pmc in (C.1) with ω = 0.04, is
LLn(u2, u3, λ) = ∑mc [ynmc ln(Pmc) + (1 − ynmc) ln(1 − Pmc)].   (C.2)
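A minimal sketch of the individual-level log likelihood in (C.1)–(C.2), assuming hypothetical data arrays; it is not the author's estimation code, and the argument names are illustrative.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def loglik_eu_strong(u2, u3, lam, s, t, context, y, omega=0.04):
    """Log likelihood (C.2) for one subject under (EU, Strong), with the
    tremble omega fixed at 0.04 as in the text.
    s, t: (M, 3) probability vectors over each pair's three-outcome context;
    context: length-M sequence with values in {'-0', '-2', '-3'};
    y: 1 if the safer lottery S was chosen from the pair, else 0."""
    s, t, y = np.asarray(s, float), np.asarray(t, float), np.asarray(y, float)
    # Context utilities (u1c, u2c, u3c) as described just after (C.1).
    U = {'-0': np.array([1.0, u2, u3]),
         '-2': np.array([0.0, 1.0, u3]),
         '-3': np.array([0.0, 1.0, u2])}
    uc = np.stack([U[c] for c in context])                       # (M, 3)
    p = (1 - omega) * logistic(lam * ((s - t) * uc).sum(axis=1)) + omega / 2
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
```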
Maximizing this in u2, u3 and λ yields initial estimates ψ̃n = (ũn2, ũn3, λ̃n, 0.04) for each subject n. Fig. 8 graphs ln(ũn2 − 1), ln(ũn3 − 1) and ln(λ̃n) against their first principal component, which accounts for about 69% of their collective variance.22 The figure also shows regression lines on the first principal component. The Pearson correlation between ln(ũn2 − 1) and ln(ũn3 − 1) is fairly high (0.848), and since these are estimates containing some pure sampling error, it appears that an assumption of perfect correlation between them in the underlying population may be roughly correct. Therefore, I make this assumption about the joint distribution of ψ in the population. While ln(λ̃n) appears to share some variance with ln(ũn2 − 1) and ln(ũn3 − 1) (Pearson correlations of −0.22 and −0.45, respectively), it obviously either has independent variance of its own or is estimated with relatively low precision. These observations suggest that the joint distribution J(ψ|θ) of ψ = (u2, u3, λ, ω) can be characterized as generated by two independent standard normal deviates xu and xλ, as follows:
u2(xu, θ) = 1 + exp(a2 + b2xu), u3(xu, θ) = 1 + exp(a3 + b3xu), λ(xu, xλ, θ) = exp(aλ + bλxu + cλxλ) and ω a constant,
20 There are 12 subjects in the HO data with few or no choices of the riskier lottery in any pair. They can ultimately be included in random parameters estimations, but at this initial stage of individual estimation it is either not useful (due to poor identification) or simply not possible to estimate models for these subjects. 21 Estimation of ω is a nuisance at the individual level. Trembles are rare enough
that individual estimates of ω are typically zero for individuals. Even when estimates are nonzero, the addition of an extra parameter to estimate increases the noisiness of the remaining estimates and hides the pattern of variance and covariance of these parameters that we wish to see at this step. 22 Two hugely obvious outliers have been removed for both the principal components extraction and the graph.
Table 3
Random parameters estimates of the (EU, Strong) model, using choice data from the contexts (0, 1, 2), (0, 1, 3) and (1, 2, 3) of the Hey and Orme (1994) sample.

Structural and stochastic parameter models   Distributional parameter   Initial estimate   Final estimate   Asymptotic standard error   Asymptotic t-statistic
u2 = 1 + exp(a2 + b2 xu)                     a2                         −1.2               −1.28            0.0411                      −31.0
                                             b2                         0.57               0.514            0.0311                      16.5
u3 = 1 + exp(a3 + b3 xu)                     a3                         −0.51              −0.653           0.0329                      −16.9
                                             b3                         0.63               0.657            0.0316                      20.8
λ = exp(aλ + bλ xu + cλ xλ)                  aλ                         3.2                3.39             0.101                       33.8
                                             bλ                         −0.49              −0.658           0.124                       −5.32
                                             cλ                         0.66               0.584            0.0571                      10.2
ω constant                                   ω                          0.04               0.0446           0.0105                      4.26

Log likelihood = −5311.44
Notes: xu and xλ are independent standard normal variates. Standard errors are calculated using the ''sandwich estimator'' (Wooldridge, 2002) and treating all of each subject's choices as a single ''super-observation,'' that is, using degrees of freedom equal to the number of subjects rather than the number of subjects times the number of choices made.
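The ''super-observation'' standard errors described in the notes can be computed with a cluster-by-subject sandwich formula. The sketch below is one common way to do this under hypothetical inputs; it is not necessarily the exact routine used for Table 3.

```python
import numpy as np

def cluster_sandwich_se(hessian, subject_scores):
    """Sandwich standard errors treating each subject's choices as one cluster.
    hessian: (K, K) Hessian of the total log likelihood at the estimates;
    subject_scores: (N, K) per-subject gradients of the log likelihood."""
    A_inv = np.linalg.inv(-hessian)             # 'bread'
    B = subject_scores.T @ subject_scores       # 'meat': sum of per-subject outer products
    V = A_inv @ B @ A_inv
    return np.sqrt(np.diag(V))
```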
Fig. 8. Shared variance of initial individual parameter estimates using the (EU, Strong) specification.
where (θ, ω) = (a2, b2, a3, b3, aλ, bλ, cλ, ω) are parameters to be estimated.   (C.3)
Then the (EU, Strong) model, conditional on xu, xλ and (θ, ω), becomes
Pmc(xu, xλ, θ, ω) = (1 − ω)Λ(λ(xu, xλ, θ)[(sm1 − tm1)u1c + (sm2 − tm2)u2c + (sm3 − tm3)u3c]) + ω/2,   (C.4)
where u1c = 1 if c = −0, u1c = 0 otherwise; u2c = u2(xu, θ) if c = −0, u2c = 1 otherwise; and u3c = u2(xu, θ) if c = −3, u3c = u3(xu, θ) otherwise.
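A minimal sketch of the choice probability in (C.4), with the context utilities assigned as in the where-clause above; the argument names are illustrative and the function is a sketch, not the author's code.

```python
import numpy as np

def p_mc(x_u, x_lam, theta, omega, s, t, context):
    """Choice probability (C.4), conditional on standard normal deviates x_u, x_lam.
    theta = (a2, b2, a3, b3, a_lam, b_lam, c_lam); s, t are the pair's probability
    vectors on its three-outcome context; context is '-0', '-2' or '-3'."""
    a2, b2, a3, b3, a_l, b_l, c_l = theta
    u2 = 1.0 + np.exp(a2 + b2 * x_u)                   # per (C.3)
    u3 = 1.0 + np.exp(a3 + b3 * x_u)
    lam = np.exp(a_l + b_l * x_u + c_l * x_lam)
    U = {'-0': np.array([1.0, u2, u3]),
         '-2': np.array([0.0, 1.0, u3]),
         '-3': np.array([0.0, 1.0, u2])}
    v_diff = float(np.dot(np.asarray(s, float) - np.asarray(t, float), U[context]))
    return (1 - omega) / (1.0 + np.exp(-lam * v_diff)) + omega / 2
```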
This implies the following random parameters log likelihood function in (θ, ω):
LL(θ, ω) = ∑n ln[ ∫∫ ∏mc Pmc(xu, xλ, θ, ω)^ynmc [1 − Pmc(xu, xλ, θ, ω)]^(1−ynmc) dΦ(xu) dΦ(xλ) ],   (C.5)
where Φ is the standard normal c.d.f. and Pmc (xu , xλ , θ , ω) is as shown in (C.4).23 The regression lines in Fig. 8 provide starting
23 Such integrations must be performed numerically in some manner for estimation. I use Gauss–Hermite quadratures, which are practical up to two or three integrals; for integrals of higher dimension, simulated maximum likelihood is usually more practical. Judd (1998) and Train (2003) are good sources for these methods.
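As a rough illustration of the quadrature mentioned in the footnote, the following sketch approximates an expectation over two independent standard normal deviates with a tensor product of Gauss–Hermite nodes; here f would stand in for the bracketed product of choice probabilities in (C.5) for one subject.

```python
import numpy as np

def gauss_hermite_expectation(f, n_nodes=15):
    """Approximate E[f(x_u, x_lam)] for independent standard normal x_u, x_lam
    using a tensor product of Gauss-Hermite nodes."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    x = np.sqrt(2.0) * nodes              # change of variables to N(0, 1)
    w = weights / np.sqrt(np.pi)          # normalized weights sum to 1
    total = 0.0
    for xu, wu in zip(x, w):
        for xl, wl in zip(x, w):
            total += wu * wl * f(xu, xl)
    return total

# In (C.5), ln(gauss_hermite_expectation(f_n)) would be subject n's contribution.
```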
values for maximizing (C.5). That is, initial estimates of the a and b coefficients in θ are the intercepts and slopes from the linear regressions of ln(ũn2 − 1), ln(ũn3 − 1) and ln(λ̃n) on their first principal component; and the root mean squared error of the regression of ln(λ̃n) on the first principal component provides an initial estimate of cλ.

Table 3 shows the results of maximizing (C.5) in (θ, ω). These estimates produce the log likelihood in the first column of the top row of Table 1. Note that wherever b̂2 ≠ b̂3, very large (or small) values of the underlying standard normal deviate xu imply a violation of monotonicity (that is, u2 > u3). Rather than imposing b2 = b3 as a constraint on the estimations, I impose the weaker constraint |(a2 − a3)/(b3 − b2)| > 4.2649, making the estimated population fraction of such violations no larger than 10^−5. This constraint does not bind for the estimates shown in Table 3. Generally, it rarely binds, and is never close to significantly binding, for any of the strong, strict or contextual utility estimations done here. Recall that the nonparametric treatment of utility avoids a fixed risk attitude across the outcome vector (0, 1, 2, 3), as would be implied by a parametric form such as CARA or CRRA utility. The estimates shown in Table 3 imply a population in which about 68% of subjects have a weakly concave utility function, with the remaining 32% having an inflected ''concave then convex'' utility function, closely resembling Hey and Orme's (1994) individual estimation results. That is, the random parameters estimation used here produces utility function heterogeneity much like that suggested by individual estimation.

A very similar procedure was used to select and estimate random parameters characterizations of heterogeneity for all models. As with the detailed example of the (EU, Strong) specification, all specifications with utility parameters un2 and un3 (strong, strict or contextual utility specifications) yield quite high Pearson correlations between ln(ũn2 − 1) and ln(ũn3 − 1) across subjects, and heavy loadings of these on first principal components of estimated parameter vectors ψ̃n. Therefore, the population distributions of ψ = (u2, u3, λ, γ, ω) (strong, strict and contextual utility models, with γ ≡ 1 for EU) are in all cases modeled as having a perfect correlation between ln(un2 − 1) and ln(un3 − 1), generated by an underlying standard normal deviate xu. Similarly, individual estimations of random preference models where ψ = (φ1, φ2, κ, γ, ω) (γ ≡ 1 for EU) yield high Pearson correlations between ln(φ̃1n) and ln(φ̃2n) across subjects, and heavy loadings of these on first principal components of estimated parameter vectors ψ̃n. So joint distributions of ψ = (φ1, φ2, κ, γ, ω) are assumed to have a perfect correlation between ln(φ1n) and ln(φ2n) in the population, generated by an underlying standard normal deviate xφ.
In all cases, all other model parameters are characterized as possibly partaking of some of the variance represented by a normally distributed first principle component xu (in strong, strict or contextual utility specifications) or xφ (in random preference specifications), but also having independent variance represented by an independent standard normal variate, as with the example of λ in the (EU, Strong) specification as shown in (C.3). For EU specifications, a likelihood function like (C.5) is maximized. RDU specifications add a third integration since these models allow for independent variance in γ (the Prelec weighting function parameter) through the addition of a third standard normal variate xγ . Integrations are carried out by Gauss–Hermite quadrature. In all cases, starting values for these numerical maximizations are computed in the manner described for the ˜ n are regressed on their (EU, Strong) model: parameters in ψ first principal component, and the intercepts and slopes of these regressions are the starting values for the a and b coefficients in the models, while the root mean squared errors of these regressions are the starting values for the c coefficients found in the equations for λ, κ and/or γ . References Aitchison, J., 1963. Inverse distributions and independent gamma-distributed products of random variables. Biometrika 50, 505–508. Andersen, S., Harrison, G.W., Lau, M.I., Rutström, E.E., 2008. Eliciting risk and time preferences. Econometrica 76, 583–618. Ballinger, T.P., Wilcox, N., 1997. Decisions, error and heterogeneity. Economic Journal 107, 1090–1105. Becker, G.M., DeGroot, M.H., Marschak, J., 1963a. Stochastic models of choice behavior. Behavioral Science 8, 41–55. Becker, G.M., DeGroot, M.H., Marschak, J., 1963b. An experimental study of some stochastic models for wagers. Behavioral Science 8, 199–202. Blavatskyy, P.R., 2007. Stochastic expected utility theory. Journal of Risk and Uncertainty 34, 259–286. Block, H D, Marschak, J., 1960. Random orderings and stochastic theories of responses. In: Olkin, I., et al. (Eds.), Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford University Press, Stanford, pp. 97–132. Buschena, D.E., Zilberman, D., 2000. Generalized expected utility, heteroscedastic error, and path dependence in risky choice. Journal of Risk and Uncertainty 20, 67–88. Busemeyer, J., Townsend, J., 1993. Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review 100, 432–459. Busemeyer, J., Wang, Y-M., 2000. Model comparisons and model selections based on generalization criterion methodology. Journal of Mathematical Psychology 44, 171–189. Camerer, C., 1989. An experimental test of several generalized expected utility theories. Journal of Risk and Uncertainty 2, 61–104. Camerer, C., Ho, T.-H., 1999. Experience weighted attraction learning in normalform games. Econometrica 67, 827–874. Carbone, E., 1997. Investigation of stochastic preference theory using experimental data. Economics Letters 57, 305–311. Carroll, J.D., De Soete, G., 1991. Toward a new paradigm for the study of multiattribute choice behavior. American Psychologist 46, 342–351. Chew, S.H., 1983. A generalization of the quasilinear mean with applications to the measurement of income inequality and decision theory resolving the Allais paradox. Econometrica 51, 1065–1092. Chew, S.H., Karni, E., Safra, Z., 1987. Risk aversion in the theory of expected utility with rank-dependent preferences. 
Journal of Economic Theory 42, 370–381. Debreu, G., 1958. Stochastic choice and cardinal utility. Econometrica 26, 440–444. Domencich, T., McFadden, D., 1975. Urban Travel Demand: A Behavioral Analysis. North-Holland, Amsterdam. Edwards, W., 1954. A theory of decision making. Psychological Bulletin 51, 380–417. Gravetter, F., Lockhead, G.R., 1973. Criterial range as a frame of reference for stimulus judgment. Psychological Review 80, 203–216. Gul, F., Pesendorfer, W., 2006. Random expected utility. Econometrica 74, 121–146. Halff, H.M., 1976. Choice theories for differentially comparable alternatives. Journal of Mathematical Psychology 14, 244–246. Harrison, G.W., Rutström, E.E., 2009. Expected utility theory and prospect theory: One wedding and a decent funeral. Experimental Economics 12, 133–158. Hartman, E.B., 1954. The influence of practice and pitch-distance between tones on the absolute identification of pitch. American Journal of Psychology 67, 1–14. Hey, J.D., 1995. Experimental investigations of errors in decision making under risk. European Economic Review 39, 633–640.
Hey, J.D., 2001. Does repetition improve consistency? Experimental Economics 4, 5–54. Hey, J.D., 2005. Why we should not be silent about noise. Experimental Economics 8, 325–345. Hey, J.D., Carbone, E., 1995. Stochastic choice with deterministic preferences: An experimental investigation. Economics Letters 47, 161–167. Hey, J.D., Orme, C., 1994. Investigating parsimonious generalizations of expected utility theory using experimental data. Econometrica 62, 1291–1329. Hilton, R.W., 1989. Risk attitude under random utility. Journal of Mathematical Psychology 33, 206–222. Holland, M.K., 1968. Channel capacity and sequential effects: The influence of the immediate stimulus history on recognition performance. Unpublished doctoral dissertation, Duke University. Holt, C.A., Laury, S.K., 2002. Risk aversion and incentive effects. American Economic Review 92, 1644–1655. Hutchinson, T.P., Lai, C.D., 1990. Continuous Bivariate Distributions, Emphasizing Applications. Rumsby Scientific Publishers, Adelaide, Australia. Judd, K.L., 1998. Numerical Methods in Economics. MIT Press, Cambridge, USA. Kahneman, D., Tversky, A., 1979. Prospect theory: An analysis of decision under risk. Econometrica 47, 263–291. Loomes, G., 2005. Modeling the stochastic component of behaviour in experiments: Some issues for the interpretation of data. Experimental Economics 8, 301–323. Loomes, G., Sugden, R., 1995. Incorporating a stochastic element into decision theories. European Economic Review 39, 641–648. Loomes, G., Sugden, R., 1998. Testing different stochastic specifications of risky choice. Economica 65, 581–598. Loomes, G., Moffatt, P., Sugden, R., 2002. A microeconometric test of alternative stochastic theories of risky choice. Journal of Risk and Uncertainty 24, 103–130. Luce, R.D., Suppes, P., 1965. Preference, utility and subjective probability. In: Luce, R.D., Bush, R.R., Galanter, E. (Eds.), Handbook of Mathematical Psychology, vol. III. Wiley, New York, pp. 249–410. Machina, M., 1985. Stochastic choice functions generated from deterministic preferences over lotteries. Economic Journal 95, 575–594. McKelvey, R., Palfrey, T., 1995. Quantal response equilibria for normal form games. Games and Economic Behavior 10, 6–38. Moffatt, P., 2005. Stochastic choice and the allocation of cognitive effort. Experimental Economics 8, 369–388. Moffatt, P., Peters, S., 2001. Testing for the presence of a tremble in economics experiments. Experimental Economics 4, 221–228. Mosteller, F., Nogee, P., 1951. An experimental measurement of utility. Journal of Political Economy 59, 371–404. Myers, J.L., Sadler, E., 1960. Effects of range of payoffs as a variable in risk taking. Journal of Experimental Psychology 60, 306–309. Papke, L.E., Wooldridge, J.M., 1996. Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics 11, 619–632. Parducci, A., 1965. Category judgment: A range-frequency model. Psychological Review 72, 407–418. Parducci, A., 1974. Contextual effects: A range-frequency analysis. In: Carterette, E.C., Friedman, M.P. (Eds.), Handbook of Perception, vol. 2. Academic Press, New York, pp. 127–141. Pollack, I., 1953. The information of elementary auditory displays, II. Journal of the Acoustical Society of America 25, 765–769. Pratt, J.W., 1964. Risk aversion in the small and in the large. Econometrica 32, 122–136. Prelec, D., 1998. The probability weighting function. Econometrica 66, 497–527. Quiggin, J., 1982. A theory of anticipated utility. 
Journal of Economic Behavior and Organization 3, 323–343. Rothschild, M., Stiglitz, J.E., 1970. Increasing risk I: A definition. Journal of Economic Theory 2, 225–243. Saha, A., 1993. Expo-power utility: A ‘flexible’ form for absolute and relative risk aversion. American Journal of Agricultural Economics 75, 905–913. Starmer, C., Sugden, R., 1989. Probability and juxtaposition effects: An experimental investigation of the common ratio effect. Journal of Risk and Uncertainty 2, 159–178. Train, K., 2003. Discrete Choice Methods With Simulation. Cambridge University Press, Cambridge, UK. Tversky, A., Russo, J.E., 1969. Substitutability and similarity in binary choices. Journal of Mathematical Psychology 6, 1–12. Tversky, A., Kahneman, D., 1992. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty 5, 297–323. Vuong, Q., 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57, 307–333. Wilcox, N., 2008. Stochastic models for binary discrete choice under risk: A critical stochastic modeling primer and econometric comparison. In: Cox, J.C., Harrison, G.W. (Eds.), Research in Experimental Economics Vol. 12: Risk Aversion in Experiments. Emerald, Bingley, UK, pp. 197–292. Wilcox, N., 2007. Predicting risky choices out-of-context: A Monte Carlo study. Working Paper, University of Houston Department of Economics. Wooldridge, J.M., 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, USA. Yohai, V.J., 1987. High breakdown point and high efficiency robust estimates for regression. Annals of Statistics 15, 642–656.
Journal of Econometrics 162 (2011) 105–113
Evaluation of similarity models for expected utility violations David E. Buschena ∗ , Joseph A. Atwood Montana State University, United States
Article history: Available online 13 October 2009
JEL classification: C520; C900; D810
Keywords: Expected utility; Information criteria; Risk; Similarity
Abstract
A body of work proposes a decision cost argument to explain expected utility (EU) violations based on pair similarity. These similarity models suggest various measures over the risky pairs that define decision costs and benefits. This paper assesses the empirical modeling success of these similarity measures in explaining risky choice patterns showing EU independence violations. We also compare model fit for these similarity models relative to EU and to a selected generalized EU model. Although the candidate models exhibit some degree of substitutability, our results indicate support for models that use relatively simple measures as instruments for similarity.
© 2009 Elsevier B.V. All rights reserved.
0. Evaluation of similarity models for expected utility violations

Maximization of expected utility (EU) has been the dominant explanation for risky choices. However, violations of EU, first suggested by Allais (1953) and famously addressed by Kahneman and Tversky (1979), are of considerable interest in numerous fields and have led to a search for alternative models. The generalized EU models (GEU) have been the primary approach to these EU independence violations (see the survey by Starmer (2000)). These GEU models have modified choice axioms, giving rise to a reduced-form maximization that reflects both underlying preference and nonlinear treatment of the probabilities.

Another approach to EU violations considers pair similarity (Rubinstein, 1988; Leland, 1994; Buschena and Zilberman, 1995, 1999a,b; Coretto, 2002; Loomes, 2006), which assumes that decision makers may use different decision-making criteria for disparate risky choice pairs. These decision makers are modeled to apply a two-stage decision process: first, selecting a decision rule, and then making the actual risky choice. This decision algorithm selection depends on the similarity between the risky choices. A simple algorithm such as expected value maximization is chosen when the alternatives are similar, and a more complex algorithm (EU) is chosen when alternatives are not similar.
∗ Corresponding address: Department of Agricultural Economics and Economics, Montana State University, Bozeman, MT 59717, United States. Tel.: +1 406 994 5623; fax: +1 406 994 4838. E-mail address: [email protected] (D.E. Buschena).
This similarity approach can be interpreted as taking into account the mental transaction costs associated with making risky choices, as well as the cost of making the wrong choice. In this way, the similarity approach has much in common with bounded rationality models for risky choice such as Payne et al. (1993) and Conlisk (1996). Harrison (1994) raised a related issue questioning the saliency of the risky pairs used to show EU violations. These similarity models also relate to a growing body of literature that models error specifications for risky choice in Ballinger and Wilcox (1997), Hey and Orme (1994), Hey (1995), Loomes and Sugden (1998), Moffatt (2005), and Loomes (2005).

A primary challenge for this framework lies in finding appropriate similarity measures that explain observed behavior. This paper develops a methodology to resolve this problem empirically, and applies it to a large data set of choices over risky pairs. We test three different similarity measures, including the Kullback–Leibler cross-entropy measure, which we introduce here. The data come from experimental choice sessions with more than 300 undergraduate students facing choices over gambles offering risk-return trade-offs. Most subjects had an opportunity to play one of their selected lotteries for real payoffs. The risky pairs' similarities vary for each subject's set of decisions, resulting in several thousand observations that vary in choice pair similarity.

After first assessing previous statistical methods used to assess EU violations and the decision limitation explanations for them, we test three candidate similarity models for risky choice patterns. We also test model fit of EU, a selected GEU model, and three models of EU with heteroscedastic error. Because most of the models evaluated are non-nested, an information criterion is used to assess their empirical fit.
1. The pioneers: Simple means tests across two risky pairs

von Neumann and Morgenstern's (1953) EU model for risky choice provides the EU for a probability vector p = (p1, p2, ..., pn) over a fixed vector of outcomes x = (x1, x2, ..., xn). The probability of outcome xi is given by pi, and outcomes can take either positive or negative values. The probabilities satisfy pi ≥ 0 for all i, and \sum_i p_i = 1. The expected utility of gamble p is:

EU(p) = \sum_{i=1}^{n} p_i u(x_i).    (1)
EU remains a powerful and widely used tool for the analysis of choice under risk. Shortly after von Neumann and Morgenstern's paper, Allais (1953) raised the issue of systematic departures from EU. Although Allais did not carry out rigorous tests, a substantial body of subsequent evidence showed statistically significant violations of EU related to its independence axiom. These violations, in addition to other behavior inconsistent with EU, came to the fore in economics with the publication of Kahneman and Tversky's (1979) seminal paper. The nature of these EU violations, and especially what to do about them, remains the topic of considerable interest within economics, psychology, management science, and other fields. Kahneman and Tversky's (1979) certainty effect pairs, part of a more general family referred to as common ratio effect pairs, illustrate the nature of these violations:

Pair 1: Choose between gambles A and B:
A: gives $3000 with probability 1.0
B: gives $4000 with probability .8; gives $0 with probability .2

Pair 2: Choose between gambles C and D:
C: gives $3000 with probability .25; gives $0 with probability .75
D: gives $4000 with probability .2; gives $0 with probability .8

Most experimental subjects select lotteries A and D. This choice pattern violates EU, as shown clearly by rewriting the gambles comprising the second pair as linear combinations of gambles A and B plus a gamble, denoted ($0), that gives a zero payoff with certainty. This alternative presentation relies on the EU independence axiom. Under EU, if A is preferred to B, then C must be preferred to D because:

C = 1/4 · A + 3/4 · ($0)  and  D = 1/4 · B + 3/4 · ($0).
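To make the decomposition concrete, the short sketch below (our illustration, not part of the original paper; Python is used only for exposition) writes each gamble as a probability vector over the common outcome set {$0, $3000, $4000}, checks the two mixture identities numerically, and reports expected values, which show why a switch to expected-value maximization on the similar pair CD favors D.

```python
import math

# Illustrative sketch (not from the paper): the Kahneman-Tversky certainty-effect
# gambles as probability vectors over the common outcomes x = (0, 3000, 4000).
x = (0, 3000, 4000)
A = (0.00, 1.00, 0.00)     # $3000 with certainty
B = (0.20, 0.00, 0.80)     # $4000 w.p. .8, $0 w.p. .2
C = (0.75, 0.25, 0.00)     # $3000 w.p. .25, $0 w.p. .75
D = (0.80, 0.00, 0.20)     # $4000 w.p. .2, $0 w.p. .8
ZERO = (1.00, 0.00, 0.00)  # the ($0) gamble: zero payoff with certainty

def mix(alpha, p, q):
    """Probability mixture alpha*p + (1 - alpha)*q."""
    return tuple(alpha * pi + (1 - alpha) * qi for pi, qi in zip(p, q))

def same(p, q):
    return all(math.isclose(pi, qi) for pi, qi in zip(p, q))

# Independence-axiom decomposition: C = 1/4*A + 3/4*($0), D = 1/4*B + 3/4*($0).
assert same(mix(0.25, A, ZERO), C)
assert same(mix(0.25, B, ZERO), D)

def expected_value(p):
    return sum(pi * xi for pi, xi in zip(p, x))

for name, g in [("A", A), ("B", B), ("C", C), ("D", D)]:
    print(name, expected_value(g))
# EV(A) = 3000 < EV(B) = 3200 and EV(C) = 750 < EV(D) = 800, so an
# expected-value maximizer picks B and D; the modal pattern (A, D) is
# consistent with EU on the dissimilar pair AB and EV on the similar pair CD.
```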
The empirical analysis of these original EU violations consisted of means tests of choice proportions between pair AB and CD. These tests are simple, and, in a sense, powerful. These tests, however, offer a limited view of the nature of these violations, and in particular how robust they are to gambles that differ in the probabilities of the alternative outcomes.

2. The extensive tests: Multiple choices per person and panel data

In response to the simple EU violations above, numerous GEU models have been developed. These GEU models essentially introduced a nonlinear treatment, through a function π(·), of the probabilities for the valuation of risky alternatives, defining:

GEU(p) = \sum_{i=1}^{n} \pi_i(p) u(x_i).    (2)
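As one concrete way to generate such weights, consider the rank-dependent construction that underlies the RDEU model used later in the paper; the sketch below is our own illustration under an assumed power weighting function w(q) = q^γ, not a specification taken from the paper.

```python
# Illustrative sketch (not from the paper): rank-dependent decision weights as
# one example of a function pi(.) defined over the whole probability vector p.
# Outcomes are assumed sorted from worst to best; w is a weighting function
# with w(0) = 0 and w(1) = 1 (the power form below is a hypothetical choice).

def rank_dependent_weights(p, gamma=0.7):
    """pi_i = w(P(X >= x_i)) - w(P(X > x_i)), i = 0 (worst) ... n-1 (best)."""
    w = lambda q: q ** gamma
    n = len(p)
    tails = [sum(p[i:]) for i in range(n)] + [0.0]  # decumulative probabilities
    return [w(tails[i]) - w(tails[i + 1]) for i in range(n)]

def geu(p, u, gamma=0.7):
    """Reduced-form valuation in the spirit of Eq. (2)."""
    return sum(w_i * u_i
               for w_i, u_i in zip(rank_dependent_weights(p, gamma), u))

# With gamma = 1 the weights collapse to p itself and geu(.) reduces to EU.
```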
It is important to note that the function π(·) is defined over the entire probability vector p. The specific nature of π(·) distinguishes
the competing GEU models. Virtually all of these models retain some of the EU axioms (chiefly monotonicity and transitivity), while weakening the independence axiom in order to allow for common patterns of EU violations. Particularly good summaries of these models can be found in Fishburn (1988), and in Harless and Camerer (1994). In order to test between the many competing GEU models, Harless and Camerer (1994), Hey and Orme (1994), Hey (1995), Wilcox (1993), Ballinger and Wilcox (1997), and others evaluated choice over an extensive set of risky pairs. Some of these empirical estimations apply maximum likelihood methods and information criteria to the analysis of the competing GEU and related models. Ballinger and Wilcox, and also Hey, additionally and gainfully explored the role of heteroscedastic error structures for explaining risky choice.

3. A decision cost explanation: Similarity models

In contrast to the ''reduced form'' treatment of choice patterns through the weighting function π(·) under the GEU models, an alternative approach establishes instruments for decision costs and benefits based on the importance and/or the difficulty of risky choice selection. To motivate this approach, consider again the Kahneman and Tversky pairs AB and CD above. The similarity approach (Rubinstein, 1988; Leland, 1994; Buschena and Zilberman, 1995, 1999a,b; Coretto, 2002; Loomes, 2006) posits that the difference in the relative importance in choice between dissimilar pair AB and similar pair CD explains patterns of choice violating EU. In the similarity approach, respondents are held to place less importance on the choice between pair CD relative to the choice between pair AB. Both pairs offer a choice between a higher risk/higher return alternative (B and D) and a lower risk/lower return alternative (A and C). Respondents use EU to select between pair AB but use expected value maximization to select between the less important pair CD. This choice algorithm selection results in a higher likelihood of selection of the more risky alternative D over C than for the more risky alternative B over A. These similarity models suggest a structural approach whereby choice models are augmented with a measure that reflects the importance and difficulty of choice. Specifically, assuming that computation exacts a mental cost, the decision maker selects from two algorithms. One algorithm is more effort intensive (EU), and the other less effort intensive (in this application expected value maximization). The similarity approach can be interpreted as the outcome of an optimization process where a decision maker maximizes expected utility of choice less mental cost of calculation. When outcomes are similar, and the loss for making the wrong choice is small, the decision maker may prefer the less effort intensive approach. On the other hand, when outcomes are dissimilar, and the loss for making the wrong choice is large, the decision maker may prefer the more effort intensive and more accurate approach. Ballinger and Wilcox (1997) found some support for similarity effects in their exploration of heteroscedastic error structures and risky choice. Buschena and Zilberman (1995, 1999a,b) tested a similarity model against GEU models, and found support for similarity effects through a structural approach. One difficulty with the similarity approach lies in defining an appropriate measure because ''similarity'' is subjective. Rubinstein (1988) considers fairly simple gambles that offer only one non-zero outcome.
In his model, the more risky gamble offers a p chance at outcome x, while the safer gamble offers a q chance of y (p < q and x > y). Rubinstein provides an axiomatized model for choice under EU preference with similarity effects, and considers two measures for similarity on both outcomes and probabilities (defined here for probabilities).
Rubinstein’s first measure is the absolute difference between the probabilities, |p − q|. His relative difference measure is the ratio of the probabilities, p/q. Rubinstein also includes a qualitative similarity criterion; probability q is dissimilar to p if q = 1 and p < 1 — that is, degenerate probabilities that offer an outcome with certainty are viewed as dissimilar to all other probabilities. Because most risky pairs of interest offer more than one nonzero outcome, more general similarity measures have been proposed. Three similarity models are considered in this paper. All of the models used nonlinear measures over the probability vectors to define similarity, and all of these models differ in how they handle qualitative similarity (lotteries offering certainty of a positive payoff). Candidate model 1: Distance-based similarity. This similarity model is relatively simple, using the Euclidian distance between the probability vectors p and q to describe differences between gambles (Buschena and Zilberman, 1995, 2000). This distance is:
D(p, q) = [\sum_{i=1}^{n} (p_i − q_i)^2]^{1/2}.    (3)
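For concreteness, a minimal sketch (ours, not the authors' code; Python for illustration) of the distance measure in Eq. (3) and the quasi-certainty indicator discussed just below, evaluated for the Kahneman–Tversky pairs from Section 1:

```python
import math

def euclidean_distance(p, q):
    """Eq. (3): Euclidean distance between two probability vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def quasi_certainty(p_safe, x):
    """1 if the less risky gamble of the pair gives a positive payoff with
    certainty (the quasi-certainty indicator described in the text), else 0."""
    return int(any(pi == 1.0 and xi > 0 for pi, xi in zip(p_safe, x)))

x = (0, 3000, 4000)
A, B = (0.0, 1.0, 0.0), (0.2, 0.0, 0.8)
C, D = (0.75, 0.25, 0.0), (0.8, 0.0, 0.2)
print(euclidean_distance(A, B), quasi_certainty(A, x))  # dissimilar pair: ~1.30, 1
print(euclidean_distance(C, D), quasi_certainty(C, x))  # similar pair:    ~0.32, 0
```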
Moffatt (2005) has recently used this distance measure to assess decision time for choice over the risky gambles from an experiment by Hey and Orme (1994). This distance measure is reasonably applied for the set of gambles considered here that offer non-trivial risk-return trade-offs. A more generalized difference measure can also be considered as the absolute difference in the distributions' Cumulative Distribution Functions (Buschena and Zilberman, 2000). In our empirical estimation, both the distance measure and its square will be considered. In addition to the distance measure, some of the models estimated also include a binary measure, quasi-certainty, that generalizes the qualitative component in Rubinstein's similarity definition. Quasi-certainty takes the value 1 if the less risky of the two gambles in the pair gives a positive payoff with certainty. Candidate model 2: (Dis)similarity as defined through cross entropy. Entropy provides a useful alternative measure for describing the differences between probability distributions. Specifically, the cross-entropy measure described below serves to define the similarity between risky alternatives. Shannon's (1948) entropy stems from information theory and serves as a measure of how a particular distribution differs from a uniform distribution (Golan et al., 1996; Preckel, 2001). The entropy measure for a discrete distribution is:

Entropy(p) = −\sum_{i=1}^{n} p_i ln(p_i).    (4)
The smaller a distribution's entropy, the more information it contains. A uniform distribution has maximum entropy and is least informative, while a degenerate distribution, with pi taking only values 0 or 1, has minimum entropy and is most informative. As pointed out by a reviewer, there is a brief but far-reaching history of discussion of entropy in economics. Marschak (1959) introduced entropy as a measure of information, and Arrow (1971) extended this discussion by relating entropy to the supply price of information. More recently for risk applications, Coretto (2002) develops a risky choice model, combining EU with Shannon's entropy as an explanation of EU violations. More useful than Shannon's entropy for evaluating risky pair similarity is the Kullback and Leibler (1951) cross entropy that measures how two distributions (our p and q) differ from one another:

Cross Entropy(p, q) = \sum_{i=1}^{n} p_i ln(p_i / q_i).    (5)
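A minimal sketch of Eqs. (4) and (5) follows (our illustration, not the authors' code), using the zero-probability convention described in the text below, where a term with p_i = 0 contributes zero to the sum; the example pair is hypothetical, not one from the experiment.

```python
import math

def shannon_entropy(p):
    """Eq. (4): Shannon entropy of a discrete distribution (a 0*ln(0) term is 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Eq. (5): Kullback-Leibler cross entropy of p relative to q. Terms with
    p_i = 0 are set to zero, as in the paper's estimations; q_i is assumed
    positive wherever p_i > 0, as in the paper's empirical application."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical pair, with p the less risky and q the more risky gamble,
# following the paper's convention for the asymmetric measure:
p = (0.50, 0.50, 0.00)
q = (0.25, 0.50, 0.25)
print(shannon_entropy(p), shannon_entropy(q))  # ~0.693 vs ~1.040
print(cross_entropy(p, q))                     # ~0.347; equals 0 only if p == q
```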
Two identical (extremely similar) distributions would have a cross entropy value of 0. As the distributions differ more from one another (become increasingly dissimilar), the cross entropy measure increases. Note that the cross entropy functional is not symmetric, with CrossEntropy(p, q) ≠ CrossEntropy(q, p). Empirically, we will consistently define cross entropy with p defining the less risky and q the more risky (higher variance) gambles. Although there is a symmetric form for cross entropy, our use of the asymmetric form is not expected to affect our empirical results in a significant way.1 As additional background, there are also other candidate entropy-based measures for pair similarity (Read and Cressie, 1988). Under the entropy-based candidate similarity model, the cross entropy of distributions is used to define the similarity of the pairs. For our empirical application, qi never takes the value 0. If pi takes the value 0, the value inside the sum in (5) is set to zero for our estimations. In our empirical application, we consider subsets of models including the risky pairs' cross entropy, its square, and the previously defined quasi-certainty measure. Candidate model 3: Loomes' nonlinear similarity. Loomes (2006) recently introduced a nonlinear similarity measure (φ). This measure allows for both the ratio of the gamble probabilities and the difference in the probabilities to affect choice. The relative effects of the probability ratios and differences can vary across individuals in Loomes' measure, with these relative effects defined through two parameters, α and β, in the following formulation. Loomes' model allows for gambles over as many as three outcomes, with these gambles defined through their probability vectors p and q:

φ(α, β, p, q) = (fgh)^β [(a_I/a_J)^{(a_I + a_J)}]^α,    (6)

where f = [1 − (p1/q1)]; g = [1 − (q2/p2)]; h = [1 − (p3/q3)]; a_I = (q1 − p1); and a_J = (q3 − p3).
Loomes' development of φ(·) includes a discussion of its two component parts, relating to the two coefficients α and β. The coefficient α is restricted to be non-positive and defines the degree of divergence between the perceived (subjective) and objective (actual) probability ratios. If α = 0, there is no difference between these ratios; as α declines, this perceived vs. objective difference becomes larger. The (fgh)^β component of φ(·) ''scales down'' the bracketed portion that relates to how close the less risky alternative is to certainty. In particular, Loomes is working to define qualitative certainty-type effects more generally than through a certainty or quasi-certainty definition. The β coefficient is restricted to be nonnegative. A value of β = 0 in (6) indicates that the ratio of probabilities does not affect choice, while β values of larger absolute value indicate a person strongly affected by these ratios, and thus more influenced by the less risky gamble's relationship to certainty. As for α, the β coefficient likely differs across individuals in Loomes' formulation. Alternative models: RDEU formulations and heteroscedastic error structures. As an alternative to similarity approaches, numerous models generalizing EU have been extensively tested versus one another, versus the similarity models, and against various heteroscedastic error models (Hey and Orme, 1994; Hey, 1995; Buschena and Zilberman, 2000; Moffatt, 2005). Although the focus of our paper is on assessing the various similarity formulations,
1 Our thanks to an anonymous reviewer for pointing out the symmetric form for cross entropy: \sum_{i=1}^{n} p_i ln(p_i/q_i) + \sum_{i=1}^{n} q_i ln(q_i/p_i) = \sum_{i=1}^{n} (p_i − q_i) ln(p_i/q_i).
we include estimation results from the GEU model that has shown empirical promise in previous research. We will assess, for comparison, the empirical performance of the similarity models versus two formulations of Quiggin’s (1982) Rank-Dependent EU (RDEU) model and also three models of EU with heteroscedastic error. We selected the RDEU models because (1) they have performed well empirically for these data (Buschena and Zilberman, 1999a,b) and another extensive data set (Hey and Orme, 1994; Hey, 1995; Buschena and Zilberman, 2000), and (2) they are specified by a relatively small number of parameters among the GEU models. This second point is important for our relatively small sample. Readers are directed to Quiggin (1982) and these empirical papers testing RDEU for the definition of this model. One of our RDEU model specifications does not impose any particular form for the utility function, using instead separate parameters for the outcomes (a two-parameter utility model) as in Hey and Orme (1994). This utility function is defined parametrically for a risky alternative defined through its probability vector p over three outcomes as: EU(p) = α ∗ x2 ∗ (p2 ) + β ∗ x3 ∗ (p3 ).
(7)
In (7), no parameter is necessary for outcome $0 without any loss of generality, with the utility of a payoff of zero normalized to zero as in Hey and Orme. The other RDEU model tested imposes the one-parameter constant absolute risk aversion (CARA) utility structure used in Moffatt (2005). Relative to the formulation in (7), this model exchanges an anticipated reduction in fit for a reduction by one in the number of parameters describing the utility function. The utility function over the outcomes is defined for each outcome x_i as:

U(x_i) = [1 − exp(−α x_i)] / α.    (8)
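A minimal sketch of the two utility specifications in Eqs. (7) and (8) follows (our illustration; the parameter values and the outcome levels, taken from the Section 1 example, are hypothetical rather than estimates or design values from the paper).

```python
import math

def two_parameter_value(p, alpha, beta, x2=3000, x3=4000):
    """Eq. (7): valuation with separate parameters for the two positive
    outcomes; u($0) is normalized to zero, so p_1 does not enter."""
    p1, p2, p3 = p
    return alpha * x2 * p2 + beta * x3 * p3

def cara_utility(x_i, alpha):
    """Eq. (8): one-parameter CARA utility."""
    return (1.0 - math.exp(-alpha * x_i)) / alpha

def cara_expected_utility(p, alpha, x=(0, 3000, 4000)):
    return sum(pi * cara_utility(xi, alpha) for pi, xi in zip(p, x))

# Hypothetical parameter values, for illustration only:
A, B = (0.0, 1.0, 0.0), (0.2, 0.0, 0.8)
print(two_parameter_value(A, alpha=2e-4, beta=2.5e-4))
print(cara_expected_utility(A, alpha=5e-4), cara_expected_utility(B, alpha=5e-4))
```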
In addition to the two RDEU formulations, we consider three models for EU with heteroscedastic error. One of the heteroscedastic error structures holds that the error variance decreases with the pair's Euclidian distance. Under a logistic distribution for discrete choice, this error variance is σ_D^2 = [exp(α · D)]^2 for D as in (3) above. A generalization of this heteroscedastic error approach was tested in Buschena and Zilberman (2000). Another heteroscedastic error structure we test was introduced in Hey (1995) and also has empirical support in Buschena and Zilberman (2000), where the heteroscedastic variance depends on the number of outcomes supported (having positive probability) for each gamble in the risky pairs. Under the logistic distribution, this heteroscedastic error variance is σ_N^2 = [exp(β · N)]^2, where N is the average number of outcomes supported in the pair. The value of N ranges from 1.5 to 3 in our experiment. A third heteroscedastic error structure was also tested in Hey (1995), where the variance depends on the absolute difference between the expected utilities of the gambles. We define this heteroscedastic error through σ_A^2 = [exp(γ · A)]^2, where A is the absolute difference in the gambles' expected utilities for utility defined as in (7).

4. The data: ''Industrial strength'' probability triangle with real payoffs
Our experiment was designed to intensively test for similarity effects on EU violations. The risky pairs allow for clear tests of both quantitative (such as measured by Euclidian distance) and qualitative effects (such as measured by quasi-certainty) across a large set of risky pairs. The set of pairs given to each respondent was devised so that parametric estimation of the three candidate similarity models in explaining EU violations should be possible. Each respondent faced both hypothetical and ''real'' gambles, where, for each subject, one of their selected gambles from a randomly chosen pair was played for actual cash. Subjects generally completed the experiment in 20 min. More than 300 undergraduate students in the experiment selected between gamble pairs offering a risk/return trade-off. For each subject's set of pairs, EU would predict that (1) either the less risky choice would always be selected for every choice pair by an individual, or (2) the more risky choice would always be selected. This design characteristic is important for our empirical analysis because it allows for a relatively small number of parameters to define the differences between our candidate models. Only a few (27, 8.5%) subjects had choices completely consistent with EU, in that they always selected the least risky option in every pair (not one of the 313 subjects always selected the most risky option). Put another way, the majority of subjects provided a set of choices showing some violation of EU. Our empirical focus is to assess these violations using candidate models of similarity, RDEU, and heteroscedastic error. The experiment and data are additionally described in Buschena and Zilberman (1999b). The risky pairs in the experimental design differed considerably in their quantitative and qualitative similarity, allowing an extensive test of how similarity (defined through our three candidate measures) relates to the occurrence of EU violations. Gambles p = (p1, p2, p3) and q = (q1, q2, q3) were defined for each subject over a common set of outcomes x = (x1, x2, x3) for all questions faced by each subject. Each subject selected a lottery for every pair presented, with a total of 26 risky pairs drawn randomly from a larger set of 106 pairs. Respondents faced either two or three of these pairs using a compound lottery formulation that alters choice and EU violations (see Fishburn (1988)); these lotteries were omitted from the analysis for every subject. The pattern for this set of gambles began with Kahneman and Tversky's (1979) certainty effect pairs AB and CD, described at the outset of our paper. Kahneman and Tversky's four original gambles are included in the entire set of risky pairs we use, but are augmented by an additional 104 gamble pairs. In each of these pairs, one gamble (s) is less risky (lower expected value and variance) than the other gamble (r). The gambles making up the pairs are illustrated in Fig. 1. Lotteries on the borders of the unit triangle are listed in the table below Fig. 1, with each border pair defined by the probability vectors b1 and b2. Kahneman and Tversky's pairs are RV and kℓ in this figure. Every other lottery on the locus of points is defined using a scalar α ∈ (0, 1), as a linear combination of the border pairs: α b1 + (1 − α) b2. For example, gamble B in the figure is a combination of border pairs A and C where α = 0.5. For interior pairs on the DH, RV, and fj loci, α takes values 0.25, 0.5, and 0.75. For interior pairs on the IQ and We loci, α takes values 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, and 0.875. A risky pair was defined for every possible combination of points on each locus—e.g., additional pairs on the DH locus were DE, DF, DG, EF, EG, EH, FG, FH, and GH. There were 106 gamble pairs in total to be selected from. The complete set of risky pairs varies considerably in their values for the candidate similarity measures, RDEU predictions, and measures defining the heteroscedastic error variance.

5. What do decision makers view as similar?

Our three candidate models for risky pair similarity were designed to explain choice patterns violating EU, and that explanation is our primary interest. We do not know whether subjects view these pairs as similar or not. To further explore these measures, we assess respondents' subjective similarity over these pairs. The experiment elicited subjective responses regarding the similarity of
the gamble pairs, using a visual scale on the computer screen over a subset of risky choice pairs (5–6 per person). This similarity elicitation has a history of use for decision analysis in psychology (see, for example, Mellers et al. (1992)). Respondents moved a cursor over a computer screen to indicate their subjective similarity level on a scale anchored at ''very dissimilar'' and ''very similar''.

Fig. 1. Probability triangle for experimental risky choice pairs. [The figure plots the lettered lotteries (A–Z and a–k) in the (pL, pH) probability triangle; the accompanying table listing the border lotteries b1 and b2 is not reproduced here.]
The purpose of our inquiry into how subjective similarity relates to objective measures is to measure and test the effects of similarity on choice, as illustrated in Fig. 2. Median similarity reported over all respondents for the subset of 1672 questions using the similarity scale is graphed against the percent of respondents selecting the more risky alternative in the pair. This relationship is in the direction hypothesized, with a correlation of .72. Estimation. The three candidate models were used to assess the subjective responses via OLS regression, n = 1672. The error structure did not exhibit any apparent error assumption violations. There were no observations at the lower bound and only eight observations at the upper bound of 10. Estimations using censored models with both upper and lower bounds showed no significant improvement in fit over OLS for the two linear-in-parameters models. The Euclidian distance model for explaining subjective similarity included a constant, distance, its square, and the quasi-certainty measure. The cross-entropy model included a constant, the pairs' cross entropy, and the quasi-certainty measure; a model including the square of the cross entropy showed no significant improvement in model fit. The Loomes' power function model included a constant and the function φ(·) defined through parameters α and β in (6).
[Fig. 2 plots, for each pair, the percentage of respondents selecting the more risky lottery (vertical axis, approximately 0 to 0.7) against the median perceived similarity of the pair (horizontal axis, from ''Very dissimilar'' to ''Very similar''), showing the median values and a linear fit.]
Fig. 2. Median choice and perceived similarity.
The results of the estimation for the subjective similarity responses under the Euclidian distance model are given in Table 1. The subjects viewed a pair as increasingly less similar at a decreasing rate, as the pair’s distance increases. Subjects also viewed quasi-certain pairs as significantly less similar. The results of the cross-entropy model in Table 2 show the pairs’ subjective similarity decreasing as the cross entropy increases. Quasi-certain pairs are subjectively less similar. This cross-entropy model had the worst fit (judging by the Akaike information criteria) of the three models. The results of the Loomes’ model for predicting subjective similarity are reported in Table 3. The estimated alpha and beta
Table 1
Subjective similarity and Euclidian distance.

Variable          Coefficient est.    Standard error
Constant          6.42*               .065
Distance          −5.84*              .361
Distance SQ       3.06*               .291
Quasi certainty   −.674*              .151

LLF −3285.2, AIC = 3.93; N = 1672; R2 = .201; R2 adjusted = .200.
* Indicates significance at the 1% level.

Table 2
Subjective similarity and cross entropy.

Variable                 Coefficient est.    Standard error
Constant                 4.73*               .050
Cross entropy            −.47E−02*           .26E−02
Cross entropy squared    −.13E−04            .16E−04
Quasi certainty          −1.01*              .151

LLF −3414.1, AIC = 4.09; N = 1672; R2 = .068; R2 adjusted = .067.
* Indicates significance at the 1% level.

Table 3
Subjective similarity and Loomes' power function.

Variable    Coefficient est.    Standard error
Constant    3.61*               0.202
Alpha       −0.23*              0.011
Beta        2.20*               1.05

LLF −3286.0, AIC = 3.93.
* Indicates significance at the 1% level.
coefficients were inside the allowable range (recall that alpha must be non-positive and beta must be non-negative). Model fit using the Akaike information criterion is virtually identical to that for the distance model in Table 1. These subjective similarity estimations provide some support for the three objectively defined similarity measures. These measures' empirical value in predicting risky choice patterns is assessed below.

6. Estimated effects of candidate models on choice

6.1. Estimation and assessment

Estimation. Variants of each of the three candidate similarity models, the two RDEU formulations, and the three models using EU with heteroscedastic error, were estimated in a logit framework for discrete choice (0 = the less risky lottery selected, 1 = the more risky lottery selected). These were estimated separately for each respondent's choices, where there were generally 24 observations
per respondent.2 The degrees of freedom in these estimations ranged from 23 for EU to 19 for the models with the largest number of parameters. The error structure for the discrete choice problem enters the individual’s valuation of each pair of gambles additively, as in Hey and Orme (1994): y∗ = V (p, q) + ε.
(9)
In Eq. (9), the function V (·, ·) is defined for either EU, EU with similarity arguments, or RDEU. All of these candidate models are discussed above. The error term ε is a random draw from a logistic distribution, rather than from a normal as in Hey and Orme, to allow for the potential for our relatively small sample size to give rise to higher variance and heavier tails. The error variance is homoscedastic for some of the models we test, and heteroscedastic in others, also discussed above. Estimations were carried out through a grid search using the public domain software R. This software has become a powerful statistical tool that is particularly good at data management and model diagnostics. We estimated a total of 16 different models for each subject's choices. This fairly extensive coverage of the various models was undertaken in order to give a thorough test of each model's predictive power. The 16 candidate models are listed in Table 4, with the variable set omitting the constant terms that were included for each model. Model L2 was added to the list of estimated models after preliminary runs over model L1 gave rise to β estimates of zero for all of the first 10 subjects considered. There were some subjects who had no risky lotteries on the border of Fig. 1, so none of their observed choices had a value other than 0 for the quasi-certainty variable. For these subjects, models D3, D4, C3, and C4 were estimated, omitting the quasi-certainty variable; for example, for these subjects, model D1 was estimated instead of model D3. Our model assessment was carried out in two steps in an approach consistent with Hey and Orme, first using likelihood ratio tests for nested models and then using an information criterion. Assessment step 1. Likelihood ratio tests were used to select a model from within each family of nested models. Consider the Euclidian distance family; Model D4 was tested against D2, D1, and EU; Model D2 was subsequently tested against D1 and EU; and then D1 was tested against EU. The same procedure was used to assess D4, D3, D1, and EU. For the non-nested three-parameter models D2
2 The non-EU estimations were not carried out for the 24 subjects who always selected the least risky lottery, but these subjects are included in the summary statistics in Tables 5 and 6.
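To fix ideas on the estimation strategy described above, the sketch below (our own minimal illustration, not the authors' R code; the variable names and the grid are hypothetical) writes a per-subject log-likelihood for a logit model whose index contains a constant and the pair's Euclidian distance, in the spirit of model D1 in Table 4, and maximizes it by grid search.

```python
import math
from itertools import product

def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def log_likelihood(params, y, dist):
    """Logit log-likelihood for one subject's choices. y[t] = 1 if the more
    risky lottery was chosen in pair t, 0 otherwise; dist[t] is the pair's
    Euclidian distance from Eq. (3). The constant-plus-distance index is a
    stand-in for one similarity specification (model D1 in Table 4)."""
    const, b_dist = params
    ll = 0.0
    for yt, d in zip(y, dist):
        index = const + b_dist * d                  # latent index in the logit
        ll += -log1pexp(-index) if yt == 1 else -log1pexp(index)
    return ll

def grid_search(y, dist):
    """Crude grid search over the two parameters (hypothetical grid)."""
    grid = [g / 10.0 for g in range(-50, 51)]       # -5.0, -4.9, ..., 5.0
    return max(product(grid, grid),
               key=lambda params: log_likelihood(params, y, dist))
```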
Table 4
List of candidate models.

Model   Measure/Model         Variables (in addition to a constant)
EU      Expected utility      None
D1      Euclidian distance    Distance
D2      Euclidian distance    Distance, distance squared
D3      Euclidian distance    Distance, quasi-certainty
D4      Euclidian distance    Distance, distance squared, quasi certainty
C1      Cross entropy         Cross entropy
C2      Cross entropy         Cross entropy, cross entropy squared
C3      Cross entropy         Cross entropy, quasi certainty
C4      Cross entropy         Cross entropy, cross entropy squared, quasi certainty
L1      Loomes' power         Loomes power function, unrestricted model
L2      Loomes' power         Loomes power function, β restricted to zero
R1      RDEU Quiggin          Quiggin's RDEU model, general utility
R2      RDEU Quiggin          Quiggin's RDEU model, CARA utility
H1      Heteroscedasticity    EU with heteroscedastic error: Euclidian distance
H2      Heteroscedasticity    EU with heteroscedastic error: average number of outcomes
H3      Heteroscedasticity    EU with heteroscedastic error: absolute difference in EU
Table 5
Percentage of subjects (N = 313) with models significantly different from EU.

        Subjects with significant likelihood ratio test statistics vs. EU at various significance levels
Model   1% level   5% level   10% level   25% level
D1      19%        36%        42%         54%
D2      19%        31%        40%         60%
D3      19%        33%        41%         56%
D4      19%        31%        42%         60%
C1      10%        18%        28%         28%
C2      11%        27%        44%         66%
C3      9%         24%        35%         53%
C4      14%        33%        44%         64%
L1      0%         0%         0%          0%
L2      17%        31%        37%         51%
R1      17%        30%        36%         44%
R2      7%         20%        30%         50%
H1      1%         10%        24%         56%
H2      2%         12%        22%         56%
H3      0%         3%         5%          12%
vs. D3, the model with the superior log-likelihood was selected in the event that the likelihood ratio tests did not favor D4, D1, or EU over these two models. For clarity, consider also the cross-entropy family. Likelihood ratio tests were carried out for C4 vs. C2, C1, EU, etc. Loomes’ Model L1 was assessed using the 10% likelihood ratio tests against the restricted Model L2 and EU. L2 was then assessed relative to EU. The restricted RDEU Model R2 was tested against R1 and EU, and R2 was tested against EU using the same test statistic and critical level. Each of the heteroscedastic specifications (H1, H2, and H3) were assessed vs. EU using likelihood ratio tests. Assessment step 2. Information criteria provide a useful method of fit comparison for the multiple non-nested models we consider. Both the Akaike (1973) and the Bayesian information criteria measure (BIC) developed in Schwartz (1978) gave virtually identical ranking results. The BIC has the added benefit of providing posterior odds ratios for the likelihood the data were drawn from a particular model (discussed below), so we report the BIC ranks here. Schwartz’s BIC for model i with sample size n is: BICi = [−LLF i (.) + 1/2ki ∗ log(n)].
(10)
This BIC, like other measures such as the Akaike information criterion, ranks model j based on its log-likelihood function, LLF_j(·), plus a penalty for the number of parameters in the model, k_j. These ranks are from lowest (the best model) to highest BIC.

6.2. Results

Information from the likelihood ratio tests and information criteria rankings for each model over every individual's choices is given in Tables 5 and 6. We first report (Table 5) the number of subjects for which EU was not rejected at various significance levels in favor of the set of alternative models listed in Table 4. We then report (Table 6) the BIC ranking results and posterior odds for the selected models from each one of the seven ''families'' of nested models in Table 4. These families are distance (D1–D4), cross entropy (C1–C4), Loomes power (L1–L2), RDEU (R1–R2), and the three heteroscedastic error families H1, H2, and H3. Higher order models vs. EU. The proportion of respondents whose choices did not support each one of the higher order models over EU is given in Table 5. The likelihood ratio tests determining these proportions are listed for the 1%, 5%, 10%, and the 25% levels. Although some of the reported results are somewhat redundant given the nested nature of these models (e.g., D1 nested inside D2, D2 nested inside D4), we report all of these proportions for completeness.
There is clearly considerable difference between the models across the columns under various significance levels. The distance family (D1–D4), Loomes (L2), and the RDEU family (R1) fared best at the 1% level. Additional support was given for every family as the test significance level was weakened to 5%, 10%, and 25% levels. It is useful to gauge these results relative to another data set that has been extensively studied, in order to evaluate our experiment and to select the significance level for the likelihood ratio tests. The proportion rejecting EU under the 10% significance level for Model R1 in our experiment with (generally) 24 observations compares with the proportion rejecting EU in favor of the same RDEU formulation in Hey and Orme’s combined data set over 200 choices for 80 subjects. Hey and Orme (Table VI, page 1311) report test results for both a 1% and a 5% level, and use the 1% level for the nested tests in the remainder of their paper. We will use the 10% level of significance for our nested tests given our smaller sample. Although it is difficult to generally assess the individual specifications within each family, the unrestricted Loomes model (L1) is clearly not supported by these data. Indeed, this model was ranked quite poorly for all subjects. The restricted Loomes model (L2) alternatively shows some promise. At least one of the heteroscedastic models, H3, also appears to have quite limited statistical support. Model selection results. Table 6 provides significance test information and BIC ranking results for each of the seven families of nested models individually given in Tables 4 and 5. As discussed above, likelihood ratio tests were used to select between the specifications for the Euclidian distance family (D1–D4), the cross-entropy family (C1–C4), the Loomes power family (L1–L2), the RDEU family (R1-R2), and the three heteroscedastic error families (H1, H2, and H3). This selection process creates a set of nonnested models representing each of the families of models, plus EU. The second column in Table 6 lists the number of subjects for whom EU was not rejected in favor of the selected model from a family (e.g., D1) in the likelihood ratio test. The cross-entropy family is favored under this criterion, with the heteroscedastic models generally doing poorly. Note also that there were 47 subjects for whom EU was not rejected in favor of any of the alternative models; that is, there were 266 subjects for whom EU was rejected in favor of at least one alternative model. The third column in Table 6 provides the proportion of subjects for whom the selected model for each family was either ranked highest by the BIC criteria for non-nested models, or, in the case of EU, where it was not rejected in favor of any other model using likelihood ratio statistics. The cross-entropy family at 35%, and the distance family at 31%, of the population exhibit the strongest support using this criterion, while the three heteroscedastic error specifications have quite low support. The EU model is the best model for 15% of the respondents under this ranking criterion. Of this 15%, approximately half (24) of these subjects had choices that exhibited no EU violations; that is, they always selected the less risky lottery for every pair. An additional 23 of these respondents exhibited some EU violations, but EU was statistically the best model under the likelihood ratio test statistics. Posterior probabilities using the BIC measures. 
Even though a model might not be the most favored under the likelihood ratio tests and counts of BIC #1 rankings in column 3 of Table 6, it may have merit in explaining choice under risk. A model might, for instance, score a large number of second-place BIC ranks and perform relatively well on average for the entire set of respondents. One way to provide this information is to proceed as in Hey (1995), who provides subject counts for all 11 ranks for the set of models he considers. For some subjects, however, many of these seven model families did not have support vs. EU in likelihood ratio tests (column 2 in Table 6). The question then becomes how to rank these families
Table 6
Results for nested significance tests and Bayesian information criteria, selecting the best model for each subject. N = 313.

                                                      Number for whom EU     Favored model using        BIC posterior odds (%)
Model                                                 not rejected at 10%    nested tests or BIC (%)    Mean   Median   Max.   Min.
0. Expected utility                                   (47)a                  15                         –      –        –      –
D1–D4. Euclidian distance family                      150                    31                         28     16       100    0
C1–C4. Cross-entropy family                           129                    35                         25     14       100    0
L2. Loomes' power formulation, restricted model       195                    7                          14     10       100    0
R1. Quiggin's RDEU, no assumption on utility          219                    8                          13     9        100    0
H1. EU with hetero. error, distance                   240                    0                          7      7        26     0
H2. EU with hetero. error, number of outcomes         263                    3                          8      6        92     0
H3. EU with hetero. error, absolute EU difference     298                    0                          6      5        40     0

a There were 47 subjects whose choice patterns were such that no higher level model was significantly better than EU at the 10% level.
since they all nest EU as a submodel, and they are in a sense tied with one another when more than one of the model families do not support higher order models beyond EU. Such questions arise when using either counts of rank placement (2nd, 3rd, etc.) or summary measures such as average ranks. An additional consideration is that assessing model ranks becomes increasingly difficult as the number of models increases. We use a summary statistic that mitigates some of the ranking concerns and additionally provides useful model fit information. The Bayesian structure of the BIC statistic in Eq. (10) allows construction of an approximate measure of the (posterior) probability for observing the estimated BIC conditional on a uniform (uninformative) prior. We construct these estimated posterior probabilities for the families of the candidate models, where again the representative from each family was selected via the likelihood ratio tests. Note that, in the event that the EU model was selected as the representative for a particular family of models, the posterior conditional probabilities for this family were calculated using the EU likelihood ratio and the models' single parameter. For the 47 subjects for whom EU was not rejected for any of the candidate model families, the posterior conditional probabilities were equal for all seven of the families with a value of .143. Because EU was selected to represent multiple families for numerous subjects, we do not report the conditional probabilities for the EU model. The approximate posterior conditional probabilities, the BIC weights, for model j and respondent i are constructed as:
BIC Weight_{ji} = exp(−BIC_{ji}) / \sum_{k} exp(−BIC_{ki}).    (11)
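A minimal sketch of Eqs. (10) and (11) follows (our illustration; the log-likelihood values and parameter counts shown are hypothetical, not results from the paper).

```python
import math

def bic(llf, k, n):
    """Eq. (10): Schwartz's BIC for a model with log-likelihood llf,
    k parameters, and sample size n (smaller is better)."""
    return -llf + 0.5 * k * math.log(n)

def bic_weights(bics):
    """Eq. (11): approximate posterior probabilities (BIC weights) under a
    uniform prior. Subtracting the minimum BIC first avoids underflow and
    leaves the ratios in Eq. (11) unchanged."""
    b_min = min(bics)
    unnormalized = [math.exp(-(b - b_min)) for b in bics]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical per-subject results for the selected family representatives:
llfs = {"D": -11.2, "C": -11.5, "L2": -12.4, "R1": -12.0}
params = {"D": 3, "C": 3, "L2": 3, "R1": 3}
bics = [bic(llfs[m], params[m], n=24) for m in llfs]
print(dict(zip(llfs, bic_weights(bics))))
```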
As shown in Schwartz, these BIC weights are the approximate posterior probabilities of observing the sample conditional on an uninformative prior (see also Ramsey and Schafer, 2002). We provide summary statistics for these BIC weights across the set of respondents for each of the candidate models in the last columns in Table 6. Although omitted, minimums for these approximate posterior probabilities were uniformly zero for each model, reflecting that no one model family fits the data from every subject. The mean posterior probabilities illustrate the relative substitutability of the candidate models, with the results favoring the Euclidian distance based models (D1–D4), and the models using cross entropy (C1–C4).

7. Conclusion

Experimental studies suggest that the traditional frameworks for analyzing decision making under uncertainty have to be modified. The similarity approach can explain some of the behavioral
paradoxes detected by experimental studies, and it also suggests that different algorithms are used for different choices under uncertainty. In particular, this similarity approach suggests that EU is used when choices are dissimilar, and as a result the stakes are higher, and a simpler rule is used when the alternatives are more similar and the stakes are correspondingly lower. The task, then, is to identify which measure of similarity is triggering the algorithm choice. We introduce a new measure for risky pair similarity using the Kullback–Liebler cross entropy. We test this cross-entropy measure against the previously proposed Euclidian distance measure and against a nonlinear similarity specification proposed by Loomes. The empirical fit of each of these three non-nested candidate similarity measures was evaluated relative to the empirical fit of a selected GEU model and three models of EU with heteroscedastic error. The empirical analysis suggests that the Euclidean distance measure and the cross-entropy measures are most consistent with the observed data. The distance measure is quite simple and has been empirically supported in previous work. The heretofore untested cross-entropy models use a natural measure for differences in the information content of the distributions that define the risky pairs. The methodology presented here allows for selection between similarity measures using (1) robust grid search maximum likelihood estimation routines over the full set of choices made by a respondent, and (2) the Bayesian information criteria measures. These methods provide a general evaluation of numerous competing models such as similarity approaches, generalized EU, and heteroscedastic error formulations. In this regard, we extend work by Hey and Orme (1994), and by Hey (1995), to which risk analysts are substantially indebted. References Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (Eds.), Second International Symposium on Information Theory. Akademiai Kiado, Budapest, pp. 267–281. Allais, M., 1953. Le comportement de l’homme rationnel devant le risque: Critique de postulats et axiomes de l’ecole américaine. Econometrica 21, 503–546. Arrow, K.J., 1971. The value of and demand for information. In: McGuire, C.B., Marshack, J. (Eds.), Decision and Organization: A Volume in Honor of Jacob Marshack. North-Holland, Amsterdam, pp. 131–140. Ballinger, T.P., Wilcox, N.T., 1997. Decisions, error, and heterogeneity. The Economic Journal 107, 1090–1105. Buschena, D.E., Zilberman, D., 1995. Performance of the similarity hypothesis relative to existing models of risky choice. Journal of Risk and Uncertainty 11, 233–262. Buschena, D.E., Zilberman, D., 1999a. Testing the effects of similarity on risky choice. In: Machina, M., Munier., B. (Eds.), Beliefs, Interactions and Preferences in Decision Making. Kluwer, Dordrecht. Buschena, D.E., Zilberman, D., 1999b. Testing the effects of similarity on risky choice: Implications for violations of expected utility. Theory and Decision 46, 253–280.
Buschena, D.E., Zilberman, D., 2000. Generalized expected utility, heteroscedastic error, and path dependence in risky choice. Journal of Risk and Uncertainty 20, 67–88. Conlisk, J., 1996. Why bounded rationality. Journal of Economic Literature 34, 669–700. Coretto, P., 2002. A theory of decidability: Entropy and choice under uncertainty. Rivista di Politica Economica 92, 29–62. Fishburn, P.C., 1988. Nonlinear Preference and Utility Theory. Johns Hopkins University Press, Baltimore, MD. Golan, A., Judge, G., Miller, D., 1996. Maximum Entropy Econometrics: Robust Estimation with Limited Data. John Wiley and Sons, New York. Harless, D., Camerer, C.F., 1994. The predictive utility of generalized expected utility theory. Econometrica 62, 1251–1290. Harrison, G., 1994. Expected utility theory and the experimentalists. Empirical Economics 19, 223–253. Hey, J.D., 1995. Experimental investigations of errors in decision making under risk. European Economic Review 39, 633–640. Hey, J.D., Orme, C., 1994. Investigating generalizations of expected utility theory using experimental data. Econometrica 62, 1291–1326. Kahneman, D., Tversky, A., 1979. Prospect theory: An analysis of decision under risk. Econometrica 47, 263–291. Kullback, S., Leibler, R.A., 1951. On information and sufficiency. Annals of Mathematical Statistics 22, 79–86. Leland, J.W., 1994. Generalized similarity judgments: An alternative explanation for choice anomalies. Journal of Risk and Uncertainty 9, 151–172. Loomes, G., 2005. Modeling the stochastic component of behaviour in experiments: Some issues for the interpretation of data. Experimental Economics 8, 301–323. Loomes, G., 2006. The improbability of a general, rational and descriptively adequate theory of decision under risk. Working Paper, School of Economics, University of East Anglia. Loomes, G., Sudgen, R., 1998. Testing different stochastic specifications of risky choice. Economica 65, 581–598.
Marschak, J., 1959. Remarks on the economics of information. Contributions to Scientific Research in Management, Western Data Processing Center, University of California, Los Angeles, pp. 79–98. Mellers, B.A., Ordónez, L., Birnbaum, M.H., 1992. A change-of-process theory for contextual effects and preference reversals in risky decision making. Organizational Behavior and Human Decision Processes 52, 319–330. Moffatt, P.G., 2005. Stochastic choice and the allocation of cognitive effort. Experimental Economics 8, 369–388. Payne, J.W., Bettmann, J.R., Johnson, E.J., 1993. The Adaptive Decision Maker. Cambridge University Press, Cambridge, United Kingdom. Preckel, P., 2001. Least squares and entropy: A penalty function perspective. American Journal of Agricultural Economics 83, 366–377. Quiggin, J., 1982. A theory of anticipated utility. Journal of Economic Behavior and Organization 3, 323–343. Ramsey, F.L., Schafer, D.W., 2002. The Statistical Sleuth: A Course in Methods of Data Analysis, 2nd ed. Duxbury, Thompson Learning, Belmont, CA. Read, T.R.C., Cressie, N.A.C, 1988. Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer-Verlag, New York. Rubinstein, A., 1988. Similarity and decision making under risk: Is there a utility theory resolution to the Allais Paradox? Journal of Economic Theory 46, 145–153. Schwartz, G., 1978. Estimating the dimension of a model. The Annals of Statistics 6, 461–464. Shannon, C.E., 1948. A mathematical theory of communication. Bell System Technical Journal 27, 379–423. 623-59. Starmer, C., 2000. Developments in non-expected utility theory: The hunt for a descriptive theory of choice under risk. Journal of Economic Literature 38, 332–382. von Neumann, J., Morgenstern, O., 1953. Theory of Games and Economic Behavior, 3rd ed. Princeton University Press, Princeton, NJ. Wilcox, N.T., 1993. Lottery choice: Incentives, complexity and decision time. Economic Journal 103, 1397–1417.
Journal of Econometrics 162 (2011) 114–123
Are CEOs expected utility maximizers?

John A. List a,b, Charles F. Mason c,d,∗

a University of Chicago, United States
b NBER, United States
c University of Wyoming, United States
d Cambridge University, United Kingdom
Article info
Article history: Available online 13 October 2009.
JEL classification: C91; D81.
Keywords: Decision making under uncertainty; High stakes; Experiments.

Abstract
Are individuals expected utility maximizers? This question represents much more than academic curiosity. In a normative sense, at stake are the fundamental underpinnings of the bulk of the last half-century's models of choice under uncertainty. From a positive perspective, the ubiquitous use of benefit-cost analysis across government agencies renders the expected utility maximization paradigm literally the only game in town. In this study, we advance the literature by exploring CEOs' preferences over small probability, high loss lotteries. Using undergraduate students as our experimental control group, we find that both our CEO and student subject pools exhibit frequent and large departures from expected utility theory. In addition, as the extreme payoffs become more likely CEOs exhibit greater aversion to risk. Our results suggest that use of the expected utility paradigm in decision making substantially underestimates society's willingness to pay to reduce risk in small probability, high loss events. © 2009 Elsevier B.V. All rights reserved.
1. Introduction

Government officials around the globe are currently mapping a course of action to deal efficiently with terrorism risk, chemical plant security, potential nuclear accidents, climate change, and biodiversity loss. One common thread linking these high-profile issues is that most experts agree they are small probability, high loss events. In the US, the Bush Administration and both Houses continue to debate the most proficient level of government action in each of the specific cases with one ubiquitous line of reasoning: the expected costs and expected benefits of the various policy proposals must be compared and contrasted. Such an approach, which has become the hallmark of public policy decision making around the globe, implicitly assumes that citizens maximize expected utility (see, e.g. Chichilnisky and Heal, 1993). While the expected utility (EU) approach conveniently models probabilistic choice, a great deal of experimental evidence calls into question the empirical validity of the EU maximization paradigm (see, e.g. Machina, 1987; Viscusi, 1992; Thaler, 1992).1
∗ Corresponding address: University of Wyoming, Department of Economics and Finance, Laramie, WY 82071-3985, United States. Tel.: +1 307 766 2178; fax: +1 307 766 5090. E-mail address:
[email protected] (C.F. Mason). 1 Perhaps of greater importance for our purposes is that the preponderance of evidence points to a tendency for the EU model to fail when the uncertain events include an outcome that is relatively unlikely to occur, but has large payoff implications (Lichtenstein et al., 1978; Baron, 1992). 0304-4076/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2009.10.014
An influential group of commentators has argued that the experimental evidence warrants a major revision to the current public policy framework, but it is important to recognize that the body of evidence is based largely on student behavior over positive outcomes (see, e.g. Starmer, 2000). Since many important public policies involve small probability, high loss events, it is important to understand individual preferences over lotteries for considerable losses; and since the cost-benefit approach is linked critically to ability to pay, it is of great import to understand affluent citizens’ preferences over small probability, high loss events. With the goal of procuring a subject pool that would be on the opposite end of the ‘‘experience spectrum’’ from undergraduate students in terms of evaluating and dealing with risky outcomes, while at the same time allowing an analysis of high stakes decision making among the relatively affluent, we set out searching for such opportunities. Our search concluded when board members at the Costa Rica Coffee Institute (ICAFE) extended an invitation to their annual conference, at which we would have (i) access to chief executive officers (CEOs) and (ii) conference time and floor space on ICAFE grounds to carry out experiments. To ensure that the CEOs were compelled to treat the experimental lottery outcome as a true loss, we had them participate in other unrelated experiments over a one-hour period to earn their initial endowments. Our experiment, therefore, could be thought of as a first test of the empirical accuracy of the expected utility model over losses for agents who are players in the international marketplace. In light of the recent arguments in Harrison and List (2004) and Levitt and List (2007),
such an artefactual field experiment represents a useful advance in the area of risky decision making.2 Making use of Costa Rican undergraduate students as our experimental control group, we find that both cohorts exhibited behavior inconsistent with expected utility theory. In fact, observed departures from expected utility theory suggest that a policy approach based solely on expected benefits and expected costs would significantly understate society's actual willingness to pay to reduce risk in low probability, high loss situations. Our results indicate that for a typical CEO, willingness to pay to reduce the chance of the worst event is very similar to the corresponding willingness to pay for a typical student. Yet, we do find some important differences in behavior across subject pools; for example, as the extreme events become more likely CEOs exhibit greater aversion to risk. The remainder of our study is crafted as follows. Section 2 provides a brief background and summarizes our experimental design. In Section 3, we present our empirical findings. Section 4 concludes.

2. Background and experimental design

We begin with a brief discussion of the traditional expected utility model and note some recent literature concerning violations of the underlying modeling assumptions. Consider three events, x1, x2, and x3, where the monetary magnitudes of the events are situated as follows: x1 < x2 < x3. If pi is the probability that outcome xi will be realized, then the lottery p is the vector of probabilities (p1, p2, p3). The EU hypothesis postulates that there is an increasing function u(·) over wealth such that an agent prefers lottery p to lottery q if and only if V(p) > V(q), where

V(p) = \sum_{i=1}^{3} u(x_i) p_i.    (1)
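As background for the discussion of indifference curves that follows, a short derivation (standard, and ours rather than the authors') shows why the independence axiom built into Eq. (1) implies linear, parallel indifference curves in a probability triangle with p1 on one axis and p3 on the other, using p2 = 1 − p1 − p3:

```latex
\begin{align*}
V(p) &= p_1\,u(x_1) + (1 - p_1 - p_3)\,u(x_2) + p_3\,u(x_3), \\
0 = dV &= \bigl[u(x_1) - u(x_2)\bigr]\,dp_1 + \bigl[u(x_3) - u(x_2)\bigr]\,dp_3, \\
\left.\frac{dp_3}{dp_1}\right|_{V} &= \frac{u(x_2) - u(x_1)}{u(x_3) - u(x_2)} > 0 .
\end{align*}
```

The slope is a constant that does not depend on p, so EU indifference curves in the triangle are parallel straight lines; the non-expected utility alternatives discussed below relax exactly this property.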
Behind this framework lie three axioms: ordering, continuity, and independence, which together imply that preferences can be represented by a numerical utility index. We focus exclusively on the third axiom, as it is the independence axiom that implies linear indifference curves in probability space (i.e., indifference curves are parallel straight lines in the Marschak–Machina triangle (Marschak, 1950)). Several experimenters have tested this axiom using experiments with degenerate gambles over certain outcomes.3 For example, Harless (1992), Hey (1995), and Hey and Orme (1994) present econometric estimates of indifference curves under risk at the individual level (in gain space). Many of the proposed variants on expected utility maximization imply a representation that is quadratic in probabilities (as opposed to the linear representation induced by expected utility). The general conclusion is that neither expected utility theory nor the non-expected utility alternatives do a satisfactory job of organizing behavior at the individual level. In particular, considering shapes of indifference curves in the Marschak–Machina triangle, some ‘‘stylized facts’’ concerning individual choice include: (i) indifference curves vary in slope from risk-averse to risk-seeking; (ii) indifference curves are not straight, and indeed fan in and out in a systematic, complex pattern; and (iii) indifference curves are most curved near the boundaries of the
2 Making use of professionals in controlled experiments is not novel to this study — see List’s field experiment website that lists more than 100 artefactual field experiments as well as reviews in Harrison and List (2004) and List (2006). For example, Burns (1985) uses Australian wool traders to explore bidding patterns in auctions. More recently, Fehr and List (2004) use these same Costa Rican CEOs to explore trust and trustworthiness; Haigh and List (2005) use traders from the Chicago Board of Trade to explore myopic loss aversion. 3 For an insightful overview see Starmer (2000).
triangle. Similar patterns have also been found in market settings: Evans (1997) examined the role of the market in reducing expected utility violations and found that while it did indeed reduce violations, the improved performance may have been induced by the market price selection rules. To the best of our knowledge, the experiments we design and examine below are novel to the literature in that they are based on lotteries over losses that span the individual experience spectrum (in terms of evaluating and dealing with risky outcomes), while simultaneously providing a glimpse at preferences of the relatively affluent over high stakes. By examining the pattern of subjects’ choices, we can determine whether their choices are consistent with the expected utility representation. We are also provided with a sense of the preference structure of economic actors who are in prestigious roles within the international economy. Beyond its practical import, we find such an analysis important in establishing a dialogue on how self-selection effects might influence behaviors. As Fehr and List (2004) note, one common criticism of laboratory experiments is that the use of a student subject pool might compromise generalizability. 2.1. Experimental design Our student lottery sessions included 101 subjects from the undergraduate student body at the University of Costa Rica. Each student session was run in a large classroom on the campus of the University of Costa Rica. To ensure that decisions remained anonymous, the subjects were seated far apart from each other. The CEO subject pool included 29 CEOs from the coffee beneficio (coffee mill) sector who were gathered at the Costa Rica Coffee Institute’s (ICAFE) annual conference.4 The conference is funded by ICAFE and presents the CEOs with information related to the most recent technological advances in the coffee processing sector, regulations within Costa Rica as well as abroad, and general market conditions, among other agenda items. Each of the CEO treatments was run in a large room on-site at the institute. As in the case of the students, communication between the subjects was prohibited and the CEOs were seated such that no subject could observe another individual’s decision. Our student treatment was run in the two days directly preceding the CEO treatment. To begin the experiment, subjects signed a consent form and were informed that the entire experiment would last about two hours, and that after all parts of the experiment were completed, their earnings (losses and gains) would be determined and would constitute their take-home pay. In the first part of the experiment, student (CEO) subjects participated in unrelated treatments (reported in Fehr and List, 2004) in which they earned at least $10 ($100). Once subjects had earned their funds, we informed them that they were now entering the final stage of the experiment, and that this stage would present subjects with 40 pairs of lotteries, which we called ‘‘options’’ (see Appendix C). Following the notation above, we defined the lotteries as follows: x1, x2, and x3 represent the magnitudes of the three losses, where x1 < x2 < x3 — suggesting that the first possible outcome entails the largest loss, while the third outcome entails the smallest loss. In our experiment, we make x1 = $80, x2 = $30, and x3 = $0 for CEOs and
4 ICAFE was created in 1948, and is a semi-autonomous institution in charge of providing technical assistance, undertaking field research, supervising receipts and processing of coffee, and recording export contracts.
instructions and having all of their questions answered, subjects began Stage 2. In Stage 2, each subject was given an option sheet with 40 pairs of options, and circled his or her preferred option for each of the 40 pairs; thus our experimental design provided us with more than 1000 (4000) CEO (student) lottery choices. Each option was divided into 3 probabilities:
Fig. 1. Comparison of lotteries in our experimental design. (The figure plots P1 = prob(x1) on the horizontal axis, from 0 to 0.1, and P3 = prob(x3) on the vertical axis, and marks the reference lotteries A, B, and C, the comparison points B1–B4, and the direction of increasing preference.)
x1 = $8, x2 = $3, and x3 = $0 for students. Given what we observed during the experiment and received in feedback via postexperimental interviews, we are confident that both subject pools considered these stake levels as considerable.5 We built the set of lotteries around three reference lotteries, which we selected to reflect specific low probability risk scenarios. In lottery A, the ‘less bad’ outcome obtains with a small probability. This describes a situation in which both the worst outcome and the less bad outcome are not very likely to occur. In lottery B, the less bad outcome is more likely than the other events, but still is not highly probable. This corresponds to a situation with a substantial chance of medium-size losses. In lottery C, losses are quite likely, but they are overwhelmingly more likely to be modest than large. These different scenarios are suggestive of different types of potential catastrophes. Fig. 1 illustrates our method for selecting lotteries. The three probabilities for lottery A in this example are p1 = 0.05, p2 = 0.35, and p3 = 0.6. The three probabilities for B are p1 = 0.05, p2 = 0.55, and p3 = 0.4. The three probabilities for C are p1 = 0.05, p2 = 0.75, and p3 = 0.2. Notice that in each of these three lotteries, the probability of the worst event (lose $80) is quite small. Each of these reference lotteries was compared to twelve other points; four where p1 was reduced to 0.01, four where p1 was increased to 0.1, and four where p1 was increased to 0.2. The decrease in p1 from 0.05 to 0.01 was combined with a decrease in p3 . Conversely, the increase in p1 from 0.05 to either 0.1 or 0.2 was combined with an increase in p3 . The decreases (and increases) in p3 followed a specific path. For example, the four points where p1 was increased from 0.05 to 0.1 are labeled as points B1 (0.1, 0.49, 0.41), B2 (0.1, 0.45, 0.45), B3 (0.1, 0.4, 0.5), and B4 (0.1, 0.3, 0.6). We ran the experiment in four stages. In the first stage the monitor read the instructions, while subjects followed along on their copy.6 Subjects were told that no communication between them would be allowed during the experiment. After reading the
5 Note that it is the loss associated with an event, and not the expected loss, that is large. This interpretation of large-stakes events is in keeping with the traditional approach to modeling decision making under uncertainty (Hirshleifer and Riley, 1992). 6 See Appendix A for a copy of the experimental instructions, which followed Mason et al. (2005). Note that we took great care to ensure that the experimental instructions were understood. They were first written in English and then translated into Spanish. This translation was performed by a Costa Rican expert. To control for translation biases, a different translator located in Arizona then translated the Spanish instructions back into English. We then cross-checked the translated experimental instructions for internal consistency.
p1 is the probability of losing $80; p2 is the probability of losing $30; and p3 is the probability of losing $0. For example, if an option has p1 = 20%, p2 = 50%, and p3 = 30%, this implies a subject has a 20% chance to lose $80, a 50% chance to lose $30, and a 30% chance to lose $0. For each option, the three probabilities always sum to 100% (p1 + p2 + p3 = 100%). After all the subjects had filled out the option sheet, Stage 3 began. In Stage 3, the monitor had a subject choose one slip of paper out of an envelope that contained 40 slips of paper, numbered from 1 to 40. The number on the slip of paper determined which of the 40 options on the option sheet would be played. For example, if slip #6 was drawn, everyone in the experiment played the option he/she had circled for the pair #6 on his/her option sheet. Once the option to be played was determined, a different subject then drew a slip of paper from a different envelope that contained 100 slips of paper, numbered 1 to 100. The number on this slip of paper determined the actual outcome of the option: −$80, −$30, or $0. Continuing with our example, suppose Lottery A (option #6) is to be played, thus, P1 = 5%, P2 = 75%, P3 = 20%. If the slip of paper drawn is numbered between 1 and 5, event 1 obtains, so that the subject loses $80; if the slip of paper is numbered between 6 and 80, he loses $30; or if a slip is numbered between 81 and 100, a $0 outcome obtains. In the fourth and final stage, each subject was paid his or her take-home earnings in cash and was asked a few follow-up questions. For example, we probed into whether they interpreted the stakes as large and whether they had understood the experimental instructions. 3. Experimental results Before discussing the formal results of our econometric analysis, we first present some summary information. For each of the 40 lottery comparisons, we identified the percentage of subjects within each cohort (students or CEOs) that indicated a preference for option A over option B, and then computed the difference between the fraction of CEOs that preferred option A and the fraction of students that preferred option A. For each of the 40 comparisons, we also calculated the difference between the probability ascribed to the worst outcome under the two options (i.e., p1d = pA1 − pB1 , where pk1 is the probability ascribed to event 1 under option k = A or B). A graphical representation of the relation between these two differences is contained in Fig. 2. In that diagram, we plot the difference in probabilities, p1d , on the x-axis, and the difference between the fraction of CEOs and the fraction of students selecting option A for a given comparison (denoted aved) on the y-axis. A trend line is overlaid on this scatter plot. While this characterization of the data is quite rough, it does point to an important relationship. Specifically, there seems to be an indication that students are slightly less likely than CEOs to choose options with a larger probability on the worst outcome. This might indicate an overall pattern of students exhibiting a greater degree of risk aversion than CEOs, or it might be associated with differences in tendencies to exhibit nonexpected utility maximization between the two groups. To better understand the explanatory power of each possibility, we first analyze a regression model that assumes subject behavior is consistent with expected utility maximization. If subject i makes
Table 1
Comparison of average estimated risk indices, CEOs vs. students.

                    CEOs      Students
Average value       1.670     1.528
Population s.e.     1.161     0.6495
s.e. of mean        0.0430    0.0072

Note: t-statistic on differences in means = 0.8049. The table population estimates are derived from Eq. (4). Note that these estimates are garnered from logit estimation that assumes subject behavior is consistent with expected utility maximization.

Fig. 2. Differences in tendency to choose option A, CEOs vs. students. (The figure plots p1d on the horizontal axis and aved on the vertical axis, with an overlaid trend line.)
his/her choice on the basis of expected utility, then the criterion for selecting option A is
(p_1^A − p_1^B) u_{i1} + (p_2^A − p_2^B) u_{i2} + (p_3^A − p_3^B) u_{i3} > 0,   (2)
where p_j^k is the probability that event j = 1, 2, or 3 will occur under option k = A or B, and u_{ij} is the von Neumann–Morgenstern (VNM) utility that agent i ascribes to event j. Since probabilities must sum to one, we can simplify the expression on the left side of Eq. (2). Moreover, because the VNM utility function is only uniquely defined up to a positive affine transformation, we impose the normalization u_{i1} = 0. The resulting criterion becomes
u_{i3} (p_3^A − p_3^B) + u_{i2} (p_3^B − p_3^A + p_1^B − p_1^A) > 0.   (3)
We note that a measure of the risk aversion associated with agent i’s VNM utility function is7
q_i = u_{i3}/u_{i2}.   (4)
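As a sketch of how the criterion (3) and the index (4) can be taken to data with the logit approach described in the next paragraph, the code below estimates u_{i2} and u_{i3} for a single subject from binary choices and forms q_i. This is not the authors’ code, and the data and parameter values are simulated purely for illustration.

```python
# Hedged sketch: per-subject logit on the two probability contrasts in Eq. (3),
# with no constant because u_{i1} is normalized to zero; q_i = u_{i3}/u_{i2}.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def risk_index(pA, pB, chose_A):
    """pA, pB: (40, 3) arrays of (p1, p2, p3); chose_A: 0/1 vector of choices."""
    z1 = pA[:, 2] - pB[:, 2]                              # p3^A - p3^B
    z2 = (pB[:, 2] - pA[:, 2]) + (pB[:, 0] - pA[:, 0])    # (p3^B - p3^A) + (p1^B - p1^A)
    X = np.column_stack([z1, z2])
    fit = sm.Logit(chose_A, X).fit(disp=0)
    u3, u2 = fit.params                                    # coefficients on z1 and z2
    return u3 / u2                                         # q_i of Eq. (4)

# Simulated example: 40 option pairs and a subject with u3 = 1.6, u2 = 1.0.
pA = rng.dirichlet(np.ones(3), size=40)
pB = rng.dirichlet(np.ones(3), size=40)
idx = 1.6 * (pA[:, 2] - pB[:, 2]) + 1.0 * ((pB[:, 2] - pA[:, 2]) + (pB[:, 0] - pA[:, 0]))
chose_A = (rng.logistic(size=40) < idx).astype(int)
print("estimated q_i:", risk_index(pA, pB, chose_A))
```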
To obtain information on agents’ risk attitudes, we use a logit estimation approach.8 Such an approach produces estimates of the parameters ui3 and ui2 for each agent i; we then construct the index of risk aversion qi using Eq. (4). Following this approach for each of the subjects in our sample yields a set of estimates of risk attitudes for the entire population. The main question of interest using this approach is whether students tend to exhibit greater risk aversion than do CEOs. To evaluate this hypothesis, we compare the average estimated measures of risk aversion between the two groups. The relevant information for this test is contained in Table 1. The key finding in Table 1 is that there is no evidence of a statistically important difference in risk attitudes between CEOs and students, assuming agents’ behavior is consistent with the expected utility paradigm. With that observation in mind, we performed a more detailed evaluation of subjects’ risk attitudes. For these purposes, we ran a set of OLS regressions, all of which used the induced measure of the subject’s risk attitude as the dependent variable. Explanatory variables are taken from the ancillary data collected in our post-experiment survey (see Appendix B). The first regression, reported in the second column of Table 2,
7 The index of absolute risk aversion is typically defined by the ratio of second derivative to first derivative; given the discrete nature of our problem, that is approximately [(ui3 − ui2 )/(x3 − x2 )−(ui2 − ui1 )/(x2 − x1 )]/(ui2 − ui1 )/(x2 − x1 ). With the normalization ui1 = 0, this approximation can be reduced to (ui3 /ui2 )[(x2 − x1 )/(x3 − x2 )] − (x3 − x1 )/(x3 − x2 ). As the fractions involving differences in the x’s are constant across subjects, it follows that the ratio ui3 /ui2 summarizes the relevant information on agent i’s aversion to risk. 8 One can think of this approach as emerging naturally from a random utility model, wherein the agent’s true criterion is to choose option A when the left-hand side of Eq. (3) exceeds a random variable that follows the logistic distribution.
included information on gender, whether the subject had formal training in statistics (coded as a 1 if yes, otherwise 0), family income, and whether the subject was from the CEO pool (coded as a 1 for CEOs, otherwise 0). Empirical results from this model continue to suggest a lack of significant difference between subject pools. That said, the coefficients on gender, statistics, and CEO are each qualitatively larger than their respective standard errors, hinting at the possibility that a different specification might uncover a significant effect. With this in mind, we consider three variants of this baseline specification, each of which allows for slope differences as well as intercept heterogeneity between subject pools. These are the regressions listed as models 2, 3, and 4 in Table 2. Important slope differences do emerge for statistics in each variant; there is also some evidence of a potentially important difference in slopes for gender. With respect to statistics, it appears that CEOs have significantly smaller risk attitudes than students, as evidenced by the negative coefficient on the interaction effect. Alternatively, there is some indication that male CEOs are more risk averse than their female counterparts, though the latter group is not well represented in our sample. As indicated above, however, there is considerable experimental evidence that suggests the expected utility paradigm may not be valid. Accordingly, we investigate an expanded discrete choice model where we allow (i) divergences from the expected utility model and (ii) differences across subjects. Because we are interested in identifying the importance of non-linear effects, a natural approach is to specify V_k as a non-linear function of the probabilities. In particular, we assume that the representation V is a cubic function of the probabilities.9 This may be regarded as a third-order Taylor series approximation to a more general non-linear form. We parameterize the cubic as:
V(p) = α + β_1 p_1 + β_2 p_3 + β_3 p_1^2 + β_4 p_1 p_3 + β_5 p_3^2 + β_6 p_1^3 + β_7 p_1^2 p_3 + β_8 p_1 p_3^2 + β_9 p_3^3.   (5)
Let Y_1 = q_1 − p_1, Y_2 = q_3 − p_3, Y_3 = q_1^2 − p_1^2, Y_4 = q_1 q_3 − p_1 p_3, Y_5 = q_3^2 − p_3^2, Y_6 = q_1^3 − p_1^3, Y_7 = q_1^2 q_3 − p_1^2 p_3, Y_8 = q_1 q_3^2 − p_1 p_3^2, and Y_9 = q_3^3 − p_3^3. Based on this specification, the agent prefers lottery p over lottery q if
ε > β_1 Y_1 + β_2 Y_2 + β_3 Y_3 + β_4 Y_4 + β_5 Y_5 + β_6 Y_6 + β_7 Y_7 + β_8 Y_8 + β_9 Y_9,   (6)
where ε is a disturbance term that captures unobservable features. The approach we take towards identifying summary information for each group is based on the average agent. Under this approach, we regard each individual’s taste parameters (the coefficients in our regression) as drawn from a population; this approach is termed ‘‘mixed Logit’’ (Revelt and Train, 1998; Train, 1998, 1999; McFadden and Train, 2000). Under the mixed Logit approach, the
9 Chew et al. (1991) were the first to propose the quadratic utility approach. They replace the independence axiom with the weaker mixture symmetry axiom that allows for indifference curves to be non-linear so that predicted behavior matches up reasonably well with observed behavior. We generalize their approach.
Table 2
OLS regression analysis of risk attitudes.

Regressor rows: Male; Stat.; Income; CEO; Male*CEO; Stat*CEO; Income*CEO; Constant; R2. Columns: regression models 1–4 (model 1 is the baseline specification described in the text; models 2–4 add the interaction terms). Coefficient estimates (standard errors): 0.1761 (0.1651); 0.5417* (0.2861); 1.233 (0.7948); −0.9521*** (0.3130); 4.97E−08 (3.24E−08); 1.528*** (0.0805), R2 = 0.097; 0.7308*** (2.669); 1.459* (0.7857); −0.9643*** (0.3147); 0.1287 (0.1651); −0.2930 (0.1560); 2.71E−08 (2.03E−08); −0.0836 (0.1806); 0.2802 (0.1947); 0.7590*** (0.2809); 1.330* (0.8068); −0.8807** (0.3643); 1.491*** (0.1194), R2 = 0.057; 1.499*** (0.1159), R2 = 0.104; 1.528*** (0.0805), R2 = 0.097.
Notes: Standard errors in parentheses. The dependent variable in these models is the induced measure of the subject’s risk attitude as computed in Table 1. Explanatory variables are taken from the ancillary data collected in our post-experiment survey, as described in the text. * Significant at 10% level or better. ** Significant at 5% level or better. *** Significant at 1% level or better.

Table 3
Mixed LOGIT results, cubic version.

Regressor   Students, first moment   Students, second moment   CEOs, first moment   CEOs, second moment
Y1          111.5*** (15.13)         1.317 (1.811)             115.5*** (25.76)     0.216 (0.611)
Y2          0.937 (0.647)            0.2609 (0.355)            2.211* (1.327)       0.065 (0.208)
Y3          −1542*** (205.4)         22.11*** (5.74)           −1584*** (343.1)     7.092*** (3.267)
Y4          −0.257 (2.357)           0.030 (0.726)             2.406 (5.797)        1.913 (1.668)
Y5          −0.206 (0.893)           1.206*** (0.504)          −1.375 (1.944)       1.713** (0.717)
Y6          5003*** (673.7)          1.634 (26.19)             5181*** (1119)       50.11*** (11.56)
Y7          −7.965 (20.02)           14.70 (11.55)             −5.178* (2.720)      0.924 (1.007)
Y8          4.609 (6.577)            2.460 (2.354)             9.315 (8.597)        3.090 (2.929)
Y9          −0.057** (0.860)         0.044 (0.651)             −1.198 (1.524)       0.099 (0.382)
Log-likelihood statistic: −2600.41 (students); −750.14 (CEOs)
Test statistic on H0 (no differences): 45.28 (critical values: 5% = 28.87; 1% = 34.81)
Notes: Asymptotic standard errors in parentheses. Estimates in the table are from an expanded discrete choice model where we allow (i) divergences from the expected utility model and (ii) differences across subjects. Because we are interested in identifying the importance of non-linear effects, we assume a representation that is a cubic function of the probabilities (see Eq. (5)). * Significant at 10% level or better. ** Significant at 5% level or better. *** Significant at 1% level or better.
econometrician identifies the sample mean of the coefficient vector. This mean vector then provides the summary information for the cohort, which we use to identify the behavior of a typical subject in each cohort. The vector (β1 , . . . , β9 ) summarizes each agent’s tastes, which we regard as a draw from a multi-variate distribution. Once the distribution for this vector is specified,10 the joint likelihood function can be made explicit. This likelihood function depends on the first two sample moments of the distribution over the parameters, and the stipulated distribution over the error term (e.g., extreme value for the logit application). Estimates of the mean and standard error parameter vectors are then obtained through maximum likelihood estimation. Unfortunately, exact maximum likelihood estimation is generally impossible (Revelt and Train, 1998; Train, 1998). The alternative is to numerically simulate the distribution over the parameters, use the simulated distribution to approximate the true likelihood function, and then maximize the simulated likelihood function. Table 3 shows the results from such a procedure; we report estimated mean and standard error parameter vectors for each cohort. For students, the mean population effect for the two non-linear terms Y3 and Y6 (corresponding to the squared and cubic terms in p1 ) are statistically significant at the p < 0.01 level. In addition, the
10 We assume the parameter vector is multi-normally distributed.
population standard error associated with the variables Y3 and Y5 is also significant at the p < 0.01 level, indicating the presence of important heterogeneities among the population of students with respect to the quadratic term in p1 . Each of these parameter estimates is also significant at the p < 0.01 level for CEOs; in addition, the mean effects associated with the variables Y2 and Y7 are also statistically significant at better than the p < 0.10 level, and the standard error associated with the variable Y5 is significant at better than the p < 0.05 level. A key point in these estimates is that many of the parameters corresponding to the non-linear effects are both numerically and statistically important. We infer that the expected utility paradigm does not do a particularly good job of explaining the data from our experiment, either for students or for CEOs. We are also interested in possible differences between the behavior of students and CEOs. To investigate the hypothesis of statistically indistinguishable behavior, we compare the maximized likelihood function under the restriction that the mean parameter vector is identical for the two cohorts against the corresponding maximal value of the likelihood function when we allow for differences between the two cohorts. We report this statistic at the bottom of Table 3. Under the null hypothesis that behavior is indistinguishable between the cohorts, the test statistic (twice the difference between the maximal log-likelihood values) follows a central chi-square distribution with number of degrees of freedom equal to the number of restrictions. In the case at hand there are 18 restrictions (the first two moments are restricted to be equal for
Table 4
Mixed LOGIT results, quadratic version.

Regressor   Students, first moment   Students, second moment   CEOs, first moment   CEOs, second moment
Y1          2.140 (1.862)            0.152 (2.409)             5.631* (2.250)       0.541 (0.658)
Y2          −0.280 (0.488)           0.161 (1.304)             0.401 (1.173)        0.061 (0.219)
Y3          −28.29** (8.227)         22.99** (3.351)           −37.11** (11.69)     11.35** (3.418)
Y4          −2.640 (1.744)           0.532 (1.015)             −4.929 (3.937)       0.774 (0.582)
Y5          1.889** (0.682)          0.834 (0.959)             0.983 (1.458)        1.584* (0.692)
Log-likelihood statistic: −2683.90 (students); −777.00 (CEOs)
Test statistic on H0 (no impact from cubic terms): 166.98** (students); 53.72** (CEOs)
Notes: Asymptotic standard errors in parentheses. The regression model for these results is identical to the model used to generate Table 3, except in this case Y6 , Y7 , Y8 , and Y9 are all assumed to be zero (i.e., this is a restricted quadratic regression). * Significant at 5% level or better. ** Significant at 1% level or better.
the two cohorts for each of the nine parameters). As our test statistic is substantially larger than conventional critical values, we conclude that there are statistically important differences in behavior between the two cohorts. As mentioned above, many of the earlier experimental analyses of possible deviations from the expected utility paradigm focused on representations that were quadratic in probabilities. Such representations are nested within our analysis: they correspond to the case in which the coefficients on the terms Y6, Y7, Y8, and Y9 are all zero. Such a restriction is easily imposed by running a variant of the mixed logit regressions reported in Table 3 that excludes the last four explanatory variables; the results from this restricted regression are reported in Table 4. One can then test the joint hypothesis that none of these last four coefficients is important by means of a likelihood ratio test. We report this test statistic at the bottom of Table 4, for each of the subject pools. The interesting thing to note here is that the null hypothesis – that none of the four coefficients is important – is soundly rejected for each cohort. Accordingly, we believe our results have important implications for possible alternative forms for the representation of agents’ preferences.11 While the results we discuss above point to statistical differences in behavior, they do not necessarily imply important economic differences. To address this related issue, one must ask whether the regression model that applies to one cohort differs from the regression model for the other cohort in some significant way. We interpret the notion of ‘‘significant differences’’ as meaning the two models would imply different behavior. From a geometric perspective, such differences are manifested in terms of clear differences in the preference maps for the two groups. To investigate the possibility of such a phenomenon, we used the regression models reported in Table 3 to numerically generate indifference curves within the Marschak–Machina probability triangle. Four such level curves were generated for both students and CEOs. Each pair of curves begins from the same combination of probabilities (p1, p3) and then traces out the combinations with the same induced level of value, based on the parameters reported in Table 3 for the cohort in question. We plot these sets of level curves in Fig. 3. For each of the four starting combinations of (p1, p3), the solid curve represents the induced level curve for students while the dashed curve represents the induced level curve for CEOs. There are two noteworthy features. First, for probability combinations with middling values of p3 (say, between 0.25 and 0.65) and relatively small values of p1 (say, smaller than 0.15),12
11 A referee suggested that Yaari’s (1987) dual model might provide such an explanation. While the spirit of Yaari’s approach might be apropos, a literal application is not: Yaari’s model leads to a representation that is linear in income or wealth; as we shall see below this does not appear to be consistent with our results. 12 We note that this range of probability combinations largely conforms to the range of probabilities to which subjects were exposed in our experimental design.
Fig. 3. Level curves implied by cubic representation over lotteries. (Students’ curves are solid, CEOs’ curves dashed. Axes: P1 on the horizontal axis, up to roughly 0.15; P3 on the vertical axis, up to 1.00.)
the level curves for CEOs tend to be flatter, and to lie below the level curves for students. Accordingly, CEOs would generally accept a smaller increase in p3 for a given increase in p1 , for this range of probability combinations. This phenomenon is roughly the same as the idea that CEOs are ‘‘less risk averse’’ than students, though strictly speaking that related notion would make sense only in the context of the expected utility model. Second, it appears that the level curves for CEOs are more convex than the level curves for students. Thus, when we look at probability combinations closer to the counter-diagonal, the level curves for CEOs have a larger slope than the level curves for students, suggesting that CEOs may exhibit behavior akin to greater aversion to risk at probability combinations where p2 is relatively small. This second phenomenon seems most marked for the level curves starting from the combination p1 = 0.04, p3 = 0.5 (the pair of curves highest up in the triangle). It is interesting to contrast this observation with earlier studies, which tended to find that the most important departures from the expected utility paradigm appear in the corners of the triangle, where one probability is quite large and the others quite small. Our results would seem to suggest that CEOs are relatively more likely to exhibit a similar behavior. 4. A monetary interpretation In this section, we use results from the mixed logit model to investigate a functional form that allows us to infer willingness to pay for a specified change in a lottery faced by the average subject. This discussion is motivated by the following idea: Suppose an agent’s choices are consistent with the expected utility paradigm. Then we can use the data on his choices to estimate a linear
representation over probabilities, and this linear form can be used to infer a von Neumann–Morgenstern utility function over prizes. If the lotteries in question are defined over three prizes, as in our experiments, the inferred utility function is quadratic. This suggests an interpretation with non-linear representations over probabilities wherein the parameters on the various polynomial terms involving probabilities can be linked to some function of the associated prize. We can then use this link between parameters and prize to estimate the representative agent’s ex ante willingness to pay for a change in risk. In our application, with a cubic representation over probabilities, there are 18 terms involving probabilities:
V(p; y) = u_1 p_1 + u_2 p_2 + u_3 p_3 + u_4 p_1^2 + u_5 p_2^2 + u_6 p_3^2 + u_7 p_1 p_2 + u_8 p_1 p_3 + u_9 p_2 p_3 + u_{10} p_1^3 + u_{11} p_1^2 p_2 + u_{12} p_1^2 p_3 + u_{13} p_1 p_2^2 + u_{14} p_1 p_3^2 + u_{15} p_2^3 + u_{16} p_2^2 p_3 + u_{17} p_2 p_3^2 + u_{18} p_3^3,   (7)
where the u_i’s are functions of the prizes y_i. Since the probabilities sum to one, we reduce this specification to nine parameters, as in Eq. (5). The resultant parameters (the β’s in Eq. (5)) are therefore tied to the original functions in a specific manner. Next, we propose a functional relation between the parameters u_i in Eq. (7) and the associated prizes. The functional representation we propose is motivated by the observation that the highest-order function that can be employed with three prizes is quadratic, and by the constraint that there are only nine parameters estimated in the mixed logit application. Accordingly, we explore the functional relations:
u_i = γ_1 y_i + γ_2 y_i^2, for i = 1, 2, and 3;
u_i = φ_1 y_{i−3} + φ_2 y_{i−3}^2, for i = 4, 5, and 6;
u_7 = η y_1 y_2, u_8 = η y_1 y_3, and u_9 = η y_2 y_3;
u_{10} = ω_1 y_1 + ω_2 y_1^2, u_{15} = ω_1 y_2 + ω_2 y_2^2, and u_{18} = ω_1 y_3 + ω_2 y_3^2;
u_{11} = ξ_1 y_1 y_2 + ξ_2 y_1^2 y_2, u_{12} = ξ_1 y_1 y_3 + ξ_2 y_1^2 y_3, u_{13} = ξ_1 y_1 y_2 + ξ_2 y_1 y_2^2, u_{14} = ξ_1 y_1 y_3 + ξ_2 y_1 y_3^2, u_{16} = ξ_1 y_2 y_3 + ξ_2 y_2^2 y_3, and u_{17} = ξ_1 y_2 y_3 + ξ_2 y_2 y_3^2.
The goal is to obtain estimates of the parameters γ1 , γ2 , φ1 , φ2 , η, ω1 , ω2 , ξ1 , and ξ2 from the estimated parameters β1 through β9 , for both students and CEOs. Such a process is tedious, involving substantial algebraic manipulation; in the interest of brevity we do not reproduce these calculations here. Table 5 lists the estimates of the nine new parameters of interest, based on the result of those manipulations and the parameter estimates from Table 3. Interestingly, these representations appear to be quite similar for the two groups. As we will see below, the similarity in these parameter vectors induces strong similarities in monetary valuations.13 Armed with these values, we describe a monetary value of a policy change. For example, suppose a certain intervention could reduce the probability of the worst outcome from p1 to p′1 , with an offsetting increase in the probability of the middle outcome from p2 to p′2 . The monetary value of this intervention is the value of OP that solves V (p1 , p2 , p3 ; y) = V (p′1 , p′2 , p3 ; y − OP).
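The algebra linking the u’s of Eq. (7) to the β’s of Eq. (5) can be reproduced symbolically. The sketch below is not the authors’ code; it simply substitutes p2 = 1 − p1 − p3 into Eq. (7) and collects the coefficients on each monomial in (p1, p3).

```python
# Hedged sketch of the reduction from Eq. (7) (18 terms in p1, p2, p3) to
# Eq. (5) (9 terms in p1, p3), using the constraint p2 = 1 - p1 - p3.
import sympy as sp

p1, p3 = sp.symbols('p1 p3')
u = sp.symbols('u1:19')          # symbols u1, ..., u18 of Eq. (7)
p2 = 1 - p1 - p3                 # probabilities sum to one

V = (u[0]*p1 + u[1]*p2 + u[2]*p3
     + u[3]*p1**2 + u[4]*p2**2 + u[5]*p3**2
     + u[6]*p1*p2 + u[7]*p1*p3 + u[8]*p2*p3
     + u[9]*p1**3 + u[10]*p1**2*p2 + u[11]*p1**2*p3
     + u[12]*p1*p2**2 + u[13]*p1*p3**2 + u[14]*p2**3
     + u[15]*p2**2*p3 + u[16]*p2*p3**2 + u[17]*p3**3)

# Each printed coefficient is the combination of u's behind the corresponding
# alpha/beta parameter of Eq. (5).
poly = sp.Poly(sp.expand(V), p1, p3)
for (i, j), coeff in sorted(poly.terms()):
    print(f"coefficient on p1^{i} * p3^{j}:", sp.simplify(coeff))
```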
(8)
13 We also conducted a parallel but restricted investigation, based on the quadratic representation whose estimates are given in Table 4. The monetary value induced by this alternative investigation is similar to that reported in the text, and is available from the authors on request.
Table 5
Implied coefficients on money in non-linear representation, students vs. CEOs.

Parameter   Estimated value for students   Estimated value for CEOs
γ1          −1236.6                        −1281.8
γ2          11.484                         11.902
φ1          4.5372                         4.7003
φ2          446.13                         462.27
η           −2.3813                        −2.4681
ω1          −212.11                        −219.56
ω2          1.2434                         1.2863
ξ1          −2.5193                        −2.6076
ξ2          −0.00011                       −0.00012
The table reports estimates of the parameters γ1 , γ2 , φ1 , φ2 , η, ω1 , ω2 , ξ1 , and ξ2 from the estimated parameters β1 through β9 , for both students and CEOs. Such a process is tedious, involving substantial algebraic manipulation; but it builds on the parameter estimates from Table 3.
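A minimal numerical sketch of how OP in Eq. (8) can be recovered is given below. The prize levels and the use of the Table 5 (students) point estimates are assumptions made for illustration; the authors’ exact wealth levels are not reported here, so the output should not be read as a replication of the figures in the text.

```python
# Hedged sketch (not the authors' code): build the u-coefficients of Eq. (7)
# from the functional relations above, evaluate V(p; y), and root-find the OP
# that solves Eq. (8) for a reduction in p1 offset by an increase in p2.
import numpy as np
from scipy.optimize import fsolve

# (gamma1, gamma2, phi1, phi2, eta, omega1, omega2, xi1, xi2), Table 5, students
par = dict(g1=-1236.6, g2=11.484, f1=4.5372, f2=446.13, eta=-2.3813,
           w1=-212.11, w2=1.2434, x1=-2.5193, x2=-0.00011)

def u_terms(y):
    """Map prizes (y1, y2, y3) into the 18 u-coefficients of Eq. (7)."""
    y1, y2, y3 = y
    lin  = [par['g1']*yi + par['g2']*yi**2 for yi in (y1, y2, y3)]      # u1-u3
    quad = [par['f1']*yi + par['f2']*yi**2 for yi in (y1, y2, y3)]      # u4-u6
    cross = [par['eta']*y1*y2, par['eta']*y1*y3, par['eta']*y2*y3]      # u7-u9
    cub = {k: par['w1']*yv + par['w2']*yv**2
           for k, yv in {10: y1, 15: y2, 18: y3}.items()}               # u10, u15, u18
    mixed = {11: (y1*y2, y1**2*y2), 12: (y1*y3, y1**2*y3), 13: (y1*y2, y1*y2**2),
             14: (y1*y3, y1*y3**2), 16: (y2*y3, y2**2*y3), 17: (y2*y3, y2*y3**2)}
    mix = {k: par['x1']*a + par['x2']*b for k, (a, b) in mixed.items()} # u11-u14, u16, u17
    u = {i + 1: v for i, v in enumerate(lin + quad + cross)}
    u.update(cub); u.update(mix)
    return u

def V(p, y):
    """Cubic representation of Eq. (7) at probabilities p and prizes y."""
    p1, p2, p3 = p
    u = u_terms(y)
    mono = {1: p1, 2: p2, 3: p3, 4: p1**2, 5: p2**2, 6: p3**2,
            7: p1*p2, 8: p1*p3, 9: p2*p3, 10: p1**3, 11: p1**2*p2, 12: p1**2*p3,
            13: p1*p2**2, 14: p1*p3**2, 15: p2**3, 16: p2**2*p3, 17: p2*p3**2, 18: p3**3}
    return sum(u[i] * mono[i] for i in range(1, 19))

# Assumed prizes: losses applied to a $100 endowment (an assumption, not from the text).
y0 = np.array([100.0 - 80.0, 100.0 - 30.0, 100.0])
p_before, p_after = (0.10, 0.30, 0.60), (0.05, 0.35, 0.60)

op = fsolve(lambda OP: V(p_before, y0) - V(p_after, y0 - OP), x0=1.0)[0]
print(f"implied willingness to pay OP = {op:.4f}")
```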
The monetary value OP is the agent’s ex ante willingness to pay, irrespective of the ultimate state of nature that obtains, to influence the change in probabilities. We consider two examples to illustrate the point. First, suppose we start from the combination (p1 , p2 , p3 ) = (0.1, 0.3, 0.6) and reduce p1 by 0.05, moving to (p1 , p2 , p3 ) = (0.05, 0.3, 0.6). This change increases the expected value of the lottery by $2.50. Using the parameters in Table 5, we calculate the ex ante monetary value of this change as OP = $1.7028 for both students and CEOs. Similarly, a change that completely eliminates the risk associated with the worst outcome (i.e., moving 0.10 from p1 to p2 ) has an expected value of $5. Based on the parameters in Table 3, we calculate that the monetary value of this change is OP = $3.5340 for both students and CEOs. For both hypothetical changes, then, we conclude that there is no discernible difference in willingness to pay for the two groups. As a second example, suppose the initial combination is (p1 , p2 , p3 ) = (0.05, 0.45, 0.5); again, we reduce p1 by 50% (here, to 0.025), moving to (p1 , p2 , p3 ) = (0.025, 0.45, 0.5). This change increases the expected value of the lottery by $1.25. Using the parameters in Table 3, we calculate the ex ante monetary value of this change as OP = $1.3504 for both students and CEOs. Similarly, a change that completely eliminates the risk associated with the worst outcome (i.e., moving 0.10 from p1 to p2 ) has an expected value of $5. Based on the parameters in Table 5, we calculate that the monetary value of this change is OP = $2.6922 for both students and CEOs. Again, we conclude that there is no discernible difference in willingness to pay for the two groups. 5. Concluding remarks Small probability, high loss events are ubiquitous. Whether individual behavior in such situations follows the postulates of expected utility theory merits serious consideration from both a positive and normative perspective since it remains the dominant paradigm used throughout the public sector and it remains the most popular model of choice under uncertainty. In this study, we combined high stakes experiments with a unique subject pool – CEOs – to examine individual preferences over lotteries for losses. Examining more than 1000 (4000) CEO (student) lottery choices, we found significant departures from expected utility theory both in the student data as well as in the CEO data. In particular, our findings suggest that a policy approach based solely on expected benefits and expected costs would significantly understate society’s actual willingness to pay to reduce risk in low probability, high loss events. Our econometric results indicate that representations of the typical subject are quite similar for the two groups. Specifically, based on the estimated parameters, we find that willingness to pay to reduce the chance of the worst event for a typical CEO is very similar to the corresponding willingness to pay for a
typical student. Yet we do find interesting subtle differences in behavior across subject pools. This study represents but a first step outside of the typical laboratory exercise to more fully understand behavior over small probability, high loss events. While we have explored how results vary across students and CEOs, representativeness of the situation has been put on the sidelines. Given that List (2006) argues that representativeness of the environment, rather than representativeness of the sampled population, is the most crucial variable in determining the generalizability of results for a large class of experimental laboratory games, it is important to note that we examine behavior in one highly stylized environment, potentially far removed from domains and decision tasks present in certain markets. We trust that future research will soon begin these important next steps. Acknowledgements Thanks to Colin Camerer, Glenn Harrison, Kerry Smith, Chris Starmer, Robert Sugden, David Zilberman, and two referees for supplying comments. We thank Kenneth Train for supplying the GAUSS program with which we conducted our empirical estimation. An earlier version of this paper was presented at the World Conference for Environmental and Natural Resource Economists and the Southern Economic Association Meetings. Appendix A. Experimental instructions Instructions Welcome. This is an experiment in decision making that will take about an hour to complete. You will be paid in cash for participating at the end of the experiment. How much you earn depends on your decisions and chance. Please do not talk and do not try to communicate with any other subject during the experiment. If you have a question, please raise your hand and a monitor will come over. If you fail to follow these instructions, you will be asked to leave and forfeit any moneys earned. You can leave the experiment at any time without prejudice. Please read these instructions carefully, and then review your answers on the Questions and Answers page. An overview: You will be presented with 40 pairs of options. For each pair, you will pick the option you prefer. After you have made all 40 choices, you will then play one of the 40 options to determine your take-home earnings. The experiment Stage #1: The option sheet: After filling out the waiver and the survey forms, the experiment begins. You start with $100, and your choices and chance affect how much of this money you can keep as your take-home earnings. You will be given an option sheet with 40 pairs of options. For each pair, you will circle the option you prefer. Each option is divided into 3 probabilities: P1 is the probability you will lose $80; P2 is the probability you will lose $30; and P3 is the probability you will lose $0. For each option, the three probabilities always add up to 100% (P1 + P2 + P3 = 100%). For example, if an option has P1 = 20%, P2 = 50% and P3 = 30%, this implies you have a 20% chance to lose $80, a 50% chance to lose $30, and a 30% chance to lose $0. On your option sheet, you circle your preferred option for each of the 40 pairs. For example, consider the pair of options, A and B, presented below. Suppose after examining the pair of options carefully, you prefer option A to B — then you would circle A (as
shown below). If you prefer B, you would circle B.
A: P1 = 10%, P2 = 20%, P3 = 70%
B: P1 = 20%, P2 = 20%, P3 = 60%
Stage #2: The tan pitcher: After filling out your option sheet, please wait until the monitor calls you to the front of the room. When called, bring your waiver form, survey, and option sheet with you. On the front table is a tan pitcher with 40 chips inside, numbered 1–40. The number on the chip represents the option you will play from your option sheet. You will reach into the tan pitcher without looking at the chips, and pick out a chip. The number on the chip determines which option you will play to determine your take-home earnings. For example, if you draw chip #23, you will play the option you circled for the pair #23 on your option sheet. Stage #3: The blue pitcher: After you have selected the option you will play, you then draw a different chip from a second pitcher — the blue pitcher. The blue pitcher has 100 chips, numbered 1–100. The number on the chip determines the actual outcome of the option — a loss of either $80, $30, or $0. For example, if your option played has P1 = 10% P2 = 50% P3 = 40%, then if you pick a chip numbered between 1 and 10, you lose $80; if you pick a chip between 11 and 60, you lose $30; or if you pick a chip between 61 and 100, you lose $0. If instead, your option played has P1 = 20% P2 = 20% P3 = 60%, then if you pick a chip between 1 and 20, you lose $80; if you pick a chip between 21 and 40, you lose $30; or if you pick a chip between 41 and 100, you lose $0. Stage #4: Ending the experiment: After playing the option, you fill out a tax form. The monitor will then hand over your take-home earnings, and you can leave the room. Now please read through the questions and answers on the next page. Questions and Answers 1. When I make a choice, I will choose between how many options? 2. 2. I will make how many choices? 40. 3. My initial $$ endowment is how much? $100. 4. P1 represents what? The probability of losing $80. 5. P2 represents what? The probability of losing $30. 6. P3 represents what? The probability of losing $0. 7. For each option, the three probabilities sum to what? 100%. 8. What does the number drawn from the tan pitcher represent? The option (1–40) played from your option sheet. 9. What does the number drawn from the blue pitcher represent? The outcome (1–100) of the option played — determining whether you lose either $80, $30, or $0. Are there any questions?
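The two-pitcher payoff mechanism described in the instructions is straightforward to simulate. The sketch below is not part of the original instructions; the circled options passed to the function are hypothetical.

```python
# Hedged sketch of the payoff mechanism: one draw from 1-40 selects which
# circled option is played, and one draw from 1-100 resolves that option into
# a loss of $80, $30, or $0, deducted from the $100 endowment.
import numpy as np

rng = np.random.default_rng(123)

def play(circled_options, endowment=100.0):
    """circled_options: list of 40 tuples (P1, P2, P3), the options a subject circled."""
    k = rng.integers(1, 41)                 # tan pitcher: chip numbered 1-40
    P1, P2, P3 = circled_options[k - 1]
    chip = rng.integers(1, 101)             # blue pitcher: chip numbered 1-100
    if chip <= 100 * P1:
        loss = 80.0                         # e.g. P1 = 10% -> chips 1-10
    elif chip <= 100 * (P1 + P2):
        loss = 30.0
    else:
        loss = 0.0
    return k, chip, endowment - loss        # take-home earnings

# Example: a subject who circled the same (hypothetical) option for every pair.
print("option played, chip drawn, take-home pay:", play([(0.10, 0.50, 0.40)] * 40))
```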
Appendix B. The survey sheet
1. Social Security Number: _______________________________
2. Gender: (circle) Male Female
3. Birthdate: _________________ (month/day/year)
Highest Level of School Completed: (please circle) Junior High School; High School or Equivalency; College or Trade School; Graduate or Professional School
4. Courses Taken in Mathematics: (please circle all that apply) College Algebra; Calculus or Business Calculus; Linear Algebra; Statistics or Business Statistics
5. Family’s Annual Income: _____________________
6. Personal Annual Income: _____________________
Thank you

Appendix C. The option sheet
Social Security Number: _________________________
An Example:
A: P1 = 10%, P2 = 20%, P3 = 70% (a 10% chance of losing $80, a 20% chance of losing $30, and a 70% chance of losing $0)
B: P1 = 20%, P2 = 20%, P3 = 60% (a 20% chance of losing $80, a 20% chance of losing $30, and a 60% chance of losing $0)

#    Option A                         Option B
1.   P1 = 5%, P2 = 35%, P3 = 60%      P1 = 1%, P2 = 40%, P3 = 59%
2.   P1 = 20%, P2 = 0%, P3 = 80%      P1 = 20%, P2 = 39%, P3 = 41%
3.   P1 = 5%, P2 = 35%, P3 = 60%      P1 = 1%, P2 = 49%, P3 = 50%
4.   P1 = 5%, P2 = 35%, P3 = 60%      P1 = 5%, P2 = 55%, P3 = 40%
5.   P1 = 5%, P2 = 55%, P3 = 40%      P1 = 1%, P2 = 69%, P3 = 30%
6.   P1 = 5%, P2 = 75%, P3 = 20%      P1 = 20%, P2 = 50%, P3 = 30%
7.   P1 = 5%, P2 = 55%, P3 = 40%      P1 = 1%, P2 = 60%, P3 = 39%
8.   P1 = 5%, P2 = 75%, P3 = 20%      P1 = 20%, P2 = 59%, P3 = 21%
9.   P1 = 5%, P2 = 75%, P3 = 20%      P1 = 1%, P2 = 89%, P3 = 10%
10.  P1 = 5%, P2 = 55%, P3 = 40%      P1 = 20%, P2 = 30%, P3 = 50%
11.  P1 = 5%, P2 = 75%, P3 = 20%      P1 = 1%, P2 = 80%, P3 = 19%
12.  P1 = 5%, P2 = 55%, P3 = 40%      P1 = 20%, P2 = 39%, P3 = 41%
13.  P1 = 5%, P2 = 35%, P3 = 60%      P1 = 10%, P2 = 25%, P3 = 65%
14.  P1 = 5%, P2 = 35%, P3 = 60%      P1 = 20%, P2 = 10%, P3 = 70%
15.  P1 = 5%, P2 = 35%, P3 = 60%      P1 = 10%, P2 = 10%, P3 = 80%
16.  P1 = 5%, P2 = 35%, P3 = 60%      P1 = 20%, P2 = 19%, P3 = 61%
17.  P1 = 5%, P2 = 55%, P3 = 40%      P1 = 10%, P2 = 45%, P3 = 45%
18.  P1 = 5%, P2 = 75%, P3 = 20%      P1 = 10%, P2 = 60%, P3 = 30%
19.  P1 = 5%, P2 = 55%, P3 = 40%      P1 = 10%, P2 = 30%, P3 = 60%
20.  P1 = 5%, P2 = 75%, P3 = 20%      P1 = 10%, P2 = 69%, P3 = 21%
21.  P1 = 5%, P2 = 35%, P3 = 60%      P1 = 1%, P2 = 44%, P3 = 55%
22.  P1 = 10%, P2 = 10%, P3 = 80%     P1 = 10%, P2 = 49%, P3 = 41%
23.  P1 = 5%, P2 = 35%, P3 = 60%      P1 = 1%, P2 = 59%, P3 = 40%
24.  P1 = 1%, P2 = 40%, P3 = 59%      P1 = 1%, P2 = 79%, P3 = 20%
25.  P1 = 5%, P2 = 55%, P3 = 40%      P1 = 1%, P2 = 79%, P3 = 20%
26.  P1 = 5%, P2 = 75%, P3 = 20%      P1 = 20%, P2 = 40%, P3 = 40%
27.  P1 = 5%, P2 = 55%, P3 = 40%      P1 = 1%, P2 = 64%, P3 = 35%
28.  P1 = 5%, P2 = 75%, P3 = 20%      P1 = 20%, P2 = 55%, P3 = 25%
29.  P1 = 5%, P2 = 75%, P3 = 20%      P1 = 1%, P2 = 99%, P3 = 0%
30.  P1 = 5%, P2 = 55%, P3 = 40%      P1 = 20%, P2 = 20%, P3 = 60%
31.  P1 = 5%, P2 = 75%, P3 = 20%      P1 = 1%, P2 = 84%, P3 = 15%
32.  P1 = 5%, P2 = 55%, P3 = 40%      P1 = 20%, P2 = 35%, P3 = 45%
33.  P1 = 5%, P2 = 35%, P3 = 60%      P1 = 10%, P2 = 29%, P3 = 61%
34.  P1 = 5%, P2 = 35%, P3 = 60%      P1 = 20%, P2 = 0%, P3 = 80%
35.  P1 = 5%, P2 = 35%, P3 = 60%      P1 = 10%, P2 = 20%, P3 = 70%
36.  P1 = 5%, P2 = 35%, P3 = 60%      P1 = 20%, P2 = 15%, P3 = 65%
37.  P1 = 5%, P2 = 55%, P3 = 40%      P1 = 10%, P2 = 49%, P3 = 41%
38.  P1 = 5%, P2 = 75%, P3 = 20%      P1 = 10%, P2 = 50%, P3 = 40%
39.  P1 = 5%, P2 = 55%, P3 = 40%      P1 = 10%, P2 = 40%, P3 = 50%
40.  P1 = 5%, P2 = 75%, P3 = 20%      P1 = 10%, P2 = 65%, P3 = 25%
References Baron, J., 1992. Thinking and Deciding. Cambridge University Press, New York. Burns, P., 1985. Experience and decisionmaking: A comparison of students and businessmen in a simulated progressive auction. In: Smith, Vernon L. (Ed.), Research in Experimental Economics. JAI, pp. 139–153. Chew, S., Epstein, L., Segal, U., 1991. Mixture symmetry and quadratic utility. Econometrica 59, 139–163. Chichilnisky, G., Heal, G., 1993. Global environmental risks. Journal of Economic Perspectives 7, 65–86. Evans, D., 1997. The role of markets in reducing expected utility violations. Journal of Political Economy 105, 622–636. Fehr, E., List, J.A., 2004. The hidden costs and returns of incentives—Trust and trustworthiness among CEOS. Journal of the European Economic Association 2 (5), 743–771. Haigh, M., List, J.A., 2005. Do professional traders exhibit myopic loss aversion? An experimental analysis. Journal of Finance 60 (1), 523–534.
Harless, D., 1992. Predictions about indifference curves inside the unit triangle: A test of variants of expected utility theory. Journal of Economic Behavior and Organization 18, 391–414. Harrison, G.W., List, J.A., 2004. Field experiments. Journal of Economic Literature 42, 1009–1055. Hey, J., 1995. Experimental investigations of errors in decision making under risk. European Economic Review 39, 633–640. Hey, J., Orme, C., 1994. Investigating generalizations of expected utility theory using experimental data. Econometrica 62, 1291–1326. Hirshleifer, J., Riley, J.G., 1992. The Analytics of Uncertainty and Information. Cambridge University Press, New York. Levitt, S.D., List, J.A., 2007. What do laboratory experiments measuring social preferences tell us about the real world? Journal of Economic Perspectives 21 (2), 153–174. Lichtenstein, S., et al., 1978. The judged frequency of lethal events. Journal of Experimental Psychology 4, 551–578. List, J.A., 2006. Field experiments: A bridge between lab and naturally occurring data. Advances in Economic Analysis & Policy 6 (2), Article 8. Available at: http://www.bepress.com/bejeap/advances/vol6/iss2/art8. Machina, M., 1987. Choice under uncertainty: Problems resolved and unresolved. Journal of Economic Perspectives 1, 121–154.
Marschak, J., 1950. Rational behavior, uncertain prospects, and measurable utility. Econometrica 18, 111–141. Mason, C.F., Shogren, J.F., Settle, C., List, J.A., 2005. Investigating risky choices over losses using experimental data. The Journal of Risk and Uncertainty 31 (2), 187–215. McFadden, D., Train, K., 2000. Mixed MNL models for discrete response. Journal of Applied Econometrics 15, 447–470. Revelt, D., Train, K., 1998. Mixed logit with repeated choices: Households’ choices of appliance efficiency level. Review of Economics and Statistics 53, 647–657. Starmer, C., 2000. Developments in non-expected utility theory: The hunt for a descriptive theory of choice under risk. Journal of Economic Literature 38, 332–382. Thaler, R., 1992. The Winner’s Curse. Free Press, New York. Train, K., 1998. Recreation demand models with taste differences over people. Land Economics 74, 230–239. Train, K., 1999. Mixed logit models of recreation demand. In: Kling, C., Herriges, J. (Eds.), Valuing the Environment Using Recreation Demand Models. Elgar Press. Viscusi, W.K, 1992. Fatal Tradeoffs. Oxford University Press, New York. Yaari, M.E., 1987. The dual theory of choice under risk. Econometrica 55 (1), 95–115.
Journal of Econometrics 162 (2011) 124–131
A similarity-based approach to prediction✩
Itzhak Gilboa (a, b, c, ∗), Offer Lieberman (d), David Schmeidler (a, e)
a Tel-Aviv University, Israel
b HEC, Paris, France
c Cowles Foundation, Yale University, USA
d University of Haifa, Israel
e The Ohio State University, USA
Article info
Article history: Available online 29 October 2009.
Keywords: Density estimation; Empirical similarity; Kernel; Spatial models
Abstract: Assume we are asked to predict a real-valued variable y_t based on certain characteristics x_t = (x_{1t}, . . . , x_{dt}), and on a database consisting of (x_{1i}, . . . , x_{di}, y_i) for i = 1, . . . , n. Analogical reasoning suggests combining past observations of x and y with the current values of x to generate an assessment of y by similarity-weighted averaging. Specifically, the predicted value of y, y_t^s, is the weighted average of all previously observed values y_i, where the weight of y_i, for every i = 1, . . . , n, is the similarity between the vector x_{1t}, . . . , x_{dt}, associated with y_t, and the previously observed vector, x_{1i}, . . . , x_{di}. The ‘‘empirical similarity’’ approach suggests estimation of the similarity function from past data. We discuss this approach as a statistical method of prediction, study its relationship to the statistical literature, and extend it to the estimation of probabilities and of density functions. © 2009 Elsevier B.V. All rights reserved.
1. Introduction Reasoning by analogies is a basic method of predicting future events based on past experience. Hume (1748), who famously questioned the logical validity of inductive reasoning, also argued that analogical reasoning is the fundamental tool by which we learn from the past about the future. Analogical reasoning has been widely studied in psychology and artificial intelligence (see Schank, 1986; Riesbeck and Schank, 1989), and it is very common in everyday discussions of political and economic issues. Furthermore, it is a standard approach to teaching in various professional domains such as medicine, law, and business. However, analogical reasoning has not been explicitly applied to statistics. The goal of this paper is to present an analogy-based statistical method, and to explore its relationships to existing statistical techniques. Suppose that we are trying to assess the value of a variable yt based on the values of relevant variables, xt = (x1t , . . . , xdt ), and on a database consisting of the variables (x1i , . . . , xdi , yi ) for
✩ We are grateful to two anonymous referees for their comments. We also gratefully acknowledge financial support from Israel Science Foundation Grant No. 355/06. Gilboa and Schmeidler also acknowledge support from the Pinhas Sapir Center for Development and from the Polarization and Conflict Project CIT-2-CT2004-506084 funded by the European Commission-DG Research Sixth Framework Programme. ∗ Corresponding address: Tel-Aviv University, Tel-Aviv 69978, Israel. E-mail addresses:
[email protected] (I. Gilboa),
[email protected] (O. Lieberman),
[email protected] (D. Schmeidler).
0304-4076/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2009.10.015
i = 1, . . . , n. For example, yt may be the price of an antique piece of furniture, where xt denotes certain characteristics thereof, such as its style, period, size, and so forth. Alternatively, yt may be an indicator variable, denoting whether a PhD candidate completes her studies successfully, where xt specifies what is known about the candidate at the time of admission, including such variables as GRE and GPE scores, the ranking of the college from which the candidate graduated, etc. How should we combine past observations of x and y with the current values of x to generate an assessment of y? If we were to follow Hume’s idea, we would need a notion of similarity, indicating which past conditions xi = (x1i , . . . , xdi ) were more similar and which xi ’s were less similar to xt . We would like to give the observations that were obtained under more similar conditions a higher weight in the prediction of yt than those who were obtained under less similar conditions. In the examples above, it makes sense to assess the price of an antique by the price of other, similar antiques that have recently been sold. Moreover, the more similar is a previous observation to the current one – in terms of style, period, size, and even time of sale – the greater is the weight we would like to put on this observation in the current assessment. Similarly, in assessing the probability of success of a PhD candidate, it seems desirable to put more weight on the observed outcomes involving more similar candidates as compared to less similar ones. In attempting to let previous cases matter for a current prediction problem, but to do so in varying degrees, a similarity-weighted average is arguably the most natural formula. Formally, one may assume that there is a similarity function s : Rd × Rd → R++ =
(0, ∞) such that, given a database (x_i, y_i)_{i≤n} and a new data point x_t = (x_{1t}, . . . , x_{dt}) ∈ R^d, the similarity-based predictor of y_t is
y_t^s = \frac{\sum_{i \le n} s(x_i, x_t) y_i}{\sum_{i \le n} s(x_i, x_t)}.   (1)
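A minimal sketch of the similarity-weighted predictor in Eq. (1) appears below. The exponential similarity function is only one convenient choice and is not imposed by the formula; the data are made up for illustration.

```python
# Hedged sketch of Eq. (1): predict y for a new case as the similarity-weighted
# average of past observations, with an (assumed) exponential similarity.
import numpy as np

def similarity_predict(X, y, x_new, w=1.0):
    """X: (n, d) past characteristics, y: (n,) past outcomes, x_new: (d,) new case."""
    s = np.exp(-w * np.sum((X - x_new) ** 2, axis=1))   # s(x_i, x_t) > 0
    return np.sum(s * y) / np.sum(s)                    # Eq. (1)

# Tiny illustration with made-up data: the prediction leans on the similar cases.
X = np.array([[1.0, 0.0], [0.9, 0.1], [5.0, 5.0]])
y = np.array([10.0, 12.0, 100.0])
print(similarity_predict(X, y, np.array([1.0, 0.1])))
```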
Observe that, in the case when all similarity values are constant, this formula boils down to a simple average of past observations. The sample average is arguably the most basic and most widely used statistic. As such, the formula (1) appears to be a minor variation on the averaging principle. Rather than a simple average, we suggest using a weighted one, where the weights reflect the relevant similarity. If we consider a limiting case where the function s is the indicator s∗ (xi , xt ) = 1 if xi = xt and s∗ (xi , xt ) = 0 otherwise, (1) becomes the conditional sample average of y, given that x = xt . Thus, (1) may be viewed as a continuous family of formulae spanning the range between the conditional and the unconditional average of past observations. However, formula (1) is not the only way to simultaneously generalize averaging and conditional averaging. Is it more or less reasonable than others? What properties does it have? Such questions call for an axiomatic treatment. Gilboa and Schmeidler (1995, 2001) suggested an axiomatic theory of case-based decision making. Gilboa and Schmeidler (2003) specialized the general theory to prediction problems. Their approach studies the way that possible predictions are ranked, as a function of the database of given observations. A key axiom in this paper is the so-called combination axiom, stating that a ranking that follows from two disjoint databases should also follow from their union. The main result uses the combination axiom, coupled with a few other axioms, to characterize a general prediction rule. It turns out that several statistical techniques are special cases of this general rule. In particular, kernel estimation of a density function, kernel classification, and maximum likelihood estimation are such special cases. The axiomatic approach to statistical problems allows one to study the properties that characterize various techniques, to ask how reasonable these techniques are, and to find similarities between them. For example, Gilboa and Schmeidler (2003) discuss the combination axiom and attempt to come up with general guidelines for the classification of applications in which it may be reasonable. Such a discussion may enrich our understanding of the statistical techniques that satisfy this axiom. Moreover, the axiomatic treatment exposes similarities that may not be otherwise obvious, such as the similarity between kernel classification and maximum likelihood estimation. At the same time, the axiomatic analysis also makes it easier to come up with ‘‘counter-examples’’, that is, with situations in which axioms are implausible, thereby delineating the scope of applicability of various techniques. In particular, the combination axiom appears less compelling for time series than it is for cross-sectional datasets. Correspondingly, applying formula (1) where t denotes time may be inappropriate.1 We maintain that the axiomatic approach may benefit statistical theory in general, because axioms may be viewed as criteria for the evaluation of statistical techniques in finite samples. Applying the axiomatic approach to the problem at hand, Gilboa, Lieberman, and Schmeidler (GLS, 2006) axiomatized formula (1) for the case that y is a real-valued variable, while Billot, Gilboa, Samet, and Schmeidler (BGSS, 2005) axiomatized it for the case that y is a multi-dimensional probability vector. These papers do not assume that the similarity function is given. Rather, they
consider a certain observable measure – such as a likelihood ordering or a probability assessment – and ask how this observable measure varies with the database that is the input to the problem. The axiomatizations impose certain constraints on the way the observable measure varies with the input database, and prove that the constraints are satisfied if and only if there exists a similarity function such that (1) holds. The formula (1) may be used with any function s : Rd × Rd → R++ . Which function should we choose? GLS (2006) suggest obtaining the similarity function from the data, selecting the function s that best fits the data. The notion of ‘‘best fit’’ can be defined within a statistical model or otherwise. A non-statistical approach, often used in machine learning, does not specify a data generating process (DGP). Rather, it selects a best-fit criterion such as minimal sum of squared errors. Alternatively, the formula (1) can be embedded within a statistical model, parametric or non-parametric. In either case, the optimal s is computed from the data. (See details in Section 2 below.) The right-hand side of formula (1) is mathematically equivalent to a kernel estimator of a non-parametric function, where the similarity function plays the role of the kernel. Thus, the axiomatic derivations of this formula in GLS (2006) and BGSS (2005) may be viewed as axiomatizing kernel-based non-parametric methods. If one takes GLS (2006) and BGSS (2005) as a descriptive model of human reasoning, one might argue that the Nadaraya–Watson estimator of an unknown function coincides with the way the human mind has evolved to predict variables. Indeed, since the human mind is supposed to be a general inference tool, capable of making predictions in unknown environments, it stands to reason that it solves a non-parametric statistical prediction problem. The main contributions of the present paper are to relate the empirical similarity approach to the statistical literature, and to extend it to the problem of density estimation, where the density of a variable yt is assumed to depend on observable variables xt = (x1t , . . . , xdt ). Section 2 describes the empirical similarity statistical models. We devote Section 3 to a more detailed discussion of the relationship between kernel-based estimation and empirical similarity. We then briefly discuss the relationship of our method to spatial models in Section 4. Section 5 discusses the case of a binary random variable. In Section 6 we apply our method to the non-parametric estimation of a density function, and provide an axiomatization of a ‘‘double-kernel’’ estimation method. Finally, Section 7 concludes with a discussion of additional directions for future research. 2. Empirical similarity models Which function s : Rd × Rd → R++ best explains the database (xi , yi )i≤n ? This question, which may or may not be couched in a statistical model, would take a different form depending on whether the data are naturally ordered. If they are, such that for every i > j, (xi , yi ) was realized after (xj , yj ), it is natural to consider the similarity-based predictor of yi , for a given s, to be
y_i^s = \frac{\sum_{j<i} s(x_j, x_i)\, y_j}{\sum_{j<i} s(x_j, x_i)}.  (2)
If, however, the order of the datapoints in (xi , yi )i≤n is arbitrary, it is more natural to define
y_i^s = \frac{\sum_{j \neq i} s(x_j, x_i)\, y_j}{\sum_{j \neq i} s(x_j, x_i)}.  (3)
1 In GLS (2006) we suggest that time series may be analyzed by defining similarities over patterns, or subsequences of observations.
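To make the predictors in (1)–(3) concrete, here is a minimal sketch in Python (NumPy). The exponential similarity used below is only one illustrative choice of s, and the function names (similarity, predict_new, etc.) are ours, not the paper's.

```python
# A minimal sketch of the similarity-weighted predictors (1)-(3).
# The exponential similarity below is one illustrative choice of s.
import numpy as np

def similarity(x_i, x_t, w=1.0):
    """s(x_i, x_t) = exp(-sum_j w_j (x_ij - x_tj)^2); strictly positive."""
    x_i, x_t = np.atleast_1d(x_i), np.atleast_1d(x_t)
    return np.exp(-np.sum(w * (x_i - x_t) ** 2))

def predict_new(X, y, x_t, w=1.0):
    """Formula (1): similarity-weighted average of all past y's."""
    s = np.array([similarity(x, x_t, w) for x in X])
    return np.sum(s * y) / np.sum(s)

def predict_ordered(X, y, i, w=1.0):
    """Formula (2): predictor of y_i using only observations that precede i."""
    s = np.array([similarity(X[j], X[i], w) for j in range(i)])
    return np.sum(s * y[:i]) / np.sum(s)

def predict_unordered(X, y, i, w=1.0):
    """Formula (3): leave-one-out predictor of y_i (order of the data is arbitrary)."""
    idx = [j for j in range(len(y)) if j != i]
    s = np.array([similarity(X[j], X[i], w) for j in idx])
    return np.sum(s * y[idx]) / np.sum(s)

# Example with d = 2 predictors and n = 50 synthetic observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=50)
print(predict_new(X, y, x_t=np.array([0.2, -0.1]), w=np.array([1.0, 1.0])))
```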
In either case, the choice of the function s may be guided partly by theoretical considerations. Billot et al. (2008) provide
conditions on similarity-weighted averages that are equivalent to the similarity function taking the form s(x, x') = exp(−‖x − x'‖), where ‖·‖ is a norm on Rd. For concreteness, we focus on the family of norms defined by weighted Euclidean distances,

s_w(x, x') = \exp\left(-d_w(x, x')\right),

where w ∈ Rd+ is a weight vector such that the distance between two vectors x, x' ∈ Rd is given by

d_w(x, x') = \sum_{j=1}^{d} w_j \left(x_j - x'_j\right)^2.  (4)
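As an illustration of how the weight vector w in (4) might be fitted to data, the following hedged sketch chooses w by minimizing a leave-one-out sum of squared prediction errors, one of the best-fit criteria mentioned above. The optimizer, the simulated data, and the function names are our own choices and not part of the paper.

```python
# Sketch: choose the similarity weights w in (4) by minimizing the
# leave-one-out sum of squared errors of the predictor (3).
import numpy as np
from scipy.optimize import minimize

def s_w(x, xp, w):
    """Exponential similarity with weighted squared Euclidean distance (4)."""
    return np.exp(-np.sum(w * (x - xp) ** 2))

def loo_sse(w, X, y):
    """Sum over i of (y_i - similarity-weighted leave-one-out prediction)^2."""
    w = np.abs(w)                      # keep weights non-negative
    n = len(y)
    sse = 0.0
    for i in range(n):
        s = np.array([s_w(X[j], X[i], w) if j != i else 0.0 for j in range(n)])
        sse += (y[i] - np.sum(s * y) / np.sum(s)) ** 2
    return sse

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=60)   # only the first variable matters
res = minimize(loo_sse, x0=np.ones(3), args=(X, y), method="Nelder-Mead")
print(np.abs(res.x))   # the weight on the first variable should dominate
```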
Thus, the similarity function is known up to a d-dimensional vector of parameters, one for each predictor. In order to conduct statistical inference and to obtain qualitative results by hypotheses tests, one may embed Eqs. (2) and (3) within a statistical model, namely
y_t = \frac{\sum_{i<t} s_w(x_i, x_t)\, y_i}{\sum_{i<t} s_w(x_i, x_t)} + \varepsilon_t,  (5)

and

y_t = \frac{\sum_{i \neq t} s_w(x_i, x_t)\, y_i}{\sum_{i \neq t} s_w(x_i, x_t)} + \varepsilon_t,  (6)

respectively, where {εt} are iid(0, σ²).
Model (5) can be interpreted as an explicit causal model. Consider, for example, a process of price formation by case-based economic agents. These agents determine the prices of unique goods such as apartments or art pieces according to the similarity of these goods to other goods, whose prices have already been determined in the past.2 Thus, (5) can be thought of as a model of the mental process that economic agents engage in when determining prices. The estimation of sw in such a model is thus an estimation of a similarity function that presumably causally determines the observed y's. The asymptotic theory for this model was developed by Lieberman (in press).
Model (6) cannot be directly interpreted in the same way. Because the distribution of each yt depends on all the other yi's, (6) cannot be a temporal account of the evolution of the process. However, such interdependencies may be quite natural in geographical, sociological, or political data, as is common in spatial statistics (see Section 4 below).
Both models (5) and (6) assume that the similarity function is fixed and does not change with the realizations of yt, nor with t itself. They rely on the axiomatizations in GLS (2006) and in BGSS (2005). Each of these axiomatizations, like Gilboa and Schmeidler (2001, 2003), uses a so-called ''combination'' (or ''concatenation'') axiom.3 Whereas axioms of this type may appear reasonable at first, they are rather restrictive. Gilboa and Schmeidler (2003) contains an extensive discussion of such an axiom and its limitations, and the latter apply to all versions of the axiom, including those that appear in GLS (2006) and in BGSS (2005). For our purposes, it is important to note that the combination axiom does not allow one to learn the similarity function from the data. Correspondingly, formula (1) does not allow the similarity function to change with the accumulation of data. But the basic idea of ''empirical similarity'' is precisely this, namely, that the similarity function be learnt from the same data that are used, in conjunction with this similarity function, for generating predictions. Hence, the axiomatic derivations mentioned above are limited. Similarly, formula (1) calls for a generalization that would allow it to refine the similarity assessment, and the statistical models (5) and (6) should be accordingly generalized.

3. Empirical similarity and kernel-based methods

For clarity of exposition, we start with the unidimensional case, that is, when d = 1 and there is only one explanatory variable X. A nonparametric regression model assumes a DGP of the following type:

y_i = m(x_i) + \varepsilon_i, \qquad (i = 1, \ldots, n), \quad \varepsilon_i \sim \mathrm{iid}(0, \sigma^2),  (7)

where xi is a scalar and m : R → R is the unknown function relating x to y. A widely used nonparametric estimator of m(·) is the Nadaraya–Watson estimator, defined as

\hat{m}(x_t) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x_i - x_t}{h}\right) y_i}{\sum_{i=1}^{n} K\!\left(\frac{x_i - x_t}{h}\right)},  (8)
where K(x) is a kernel function, that is, a non-negative function satisfying ∫ K(z) dz = 1, as well as other regularity conditions, and h is a bandwidth parameter. For instance, if we choose the Gaussian kernel, then

\frac{1}{h} K\!\left(\frac{x_i - x_t}{h}\right) = \left(2\pi h^2\right)^{-1/2} \exp\!\left(-\frac{(x_i - x_t)^2}{2h^2}\right).  (9)
The choice of h is central in the nonparametric literature, because there is a trade-off between variance and bias. One of the most common criteria for the selection of an optimal bandwidth is to minimize the mean integrated squared error (MISE). That is, the optimal h satisfies
h^{*} = \arg\min_{h}\, E_{f_0} \int \left[\hat{m}(x) - m(x)\right]^{2} dx,  (10)
where the expectation is taken under the true density f0 of y. If x is countable and m(x) is replaced by y, then we end up with a minimum expected sum of squared errors criterion.
We now turn to discuss the connection between kernel-based estimation and empirical similarity. As described above, the empirical similarity method suggests predicting yt by

y_t = \frac{\sum_{i=1}^{n} s_w(x_i, x_t)\, y_i}{\sum_{i=1}^{n} s_w(x_i, x_t)},

where

s_w(x_i, x_t) = \exp(-d_w) = (\pi/w)^{1/2} \frac{1}{\sqrt{1/(2w)}}\, K\!\left(\frac{x_i - x_t}{\sqrt{1/(2w)}}\right),
dw was defined in (4), and K is given in (9). Then, n ∑
\frac{\sum_{i=1}^{n} s_w(x_i, x_t)\, y_i}{\sum_{i=1}^{n} s_w(x_i, x_t)} = \frac{\sum_{i=1}^{n} K\!\left(\frac{x_i - x_t}{\sqrt{1/(2w)}}\right) y_i}{\sum_{i=1}^{n} K\!\left(\frac{x_i - x_t}{\sqrt{1/(2w)}}\right)}.

2 See Gayer et al. (2007).
3 A variant of this axiom is also used in the axiomatization in Section 6.
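The algebraic equivalence displayed above is easy to verify numerically. The short sketch below, which is our own construction with a Gaussian K as in (9) and synthetic data, checks that the similarity-weighted average and the Nadaraya–Watson estimate coincide when h = 1/√(2w).

```python
# Numerical check of the equivalence: the empirical-similarity prediction with
# s_w(x_i, x_t) = exp(-w (x_i - x_t)^2) equals the Nadaraya-Watson estimate
# with a Gaussian kernel and bandwidth h = 1 / sqrt(2 w).
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=100)
y = np.sin(x) + 0.1 * rng.normal(size=100)
x_t, w = 0.3, 4.0
h = 1.0 / np.sqrt(2.0 * w)

s = np.exp(-w * (x - x_t) ** 2)                 # similarity weights
K = np.exp(-0.5 * ((x - x_t) / h) ** 2)         # Gaussian kernel; constants cancel in the ratio
es_pred = np.sum(s * y) / np.sum(s)
nw_pred = np.sum(K * y) / np.sum(K)
print(es_pred, nw_pred)                         # identical up to rounding
assert np.isclose(es_pred, nw_pred)
```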
It follows that, in this setting, h = 1/\sqrt{2w}. Thus, we have a direct mapping from the similarity parameter to the bandwidth parameter. Among other things, we can set w^* to satisfy the MISE criterion.
Despite the similarity between kernel-based estimation and empirical similarity, there is a fundamental difference between them. The former is a statistical technique that is used, among other things, for the estimation of model (7). By contrast, in models (5) and (6) we use the formula (1) as part of the DGP itself. This difference is accentuated when we focus on the ordered case. We can rewrite model (5) as
y_j = \hat{m}_w^{(j-1)}(x_j) + \varepsilon_j, \qquad (j = 2, \ldots, n),  (11)

where \hat{m}_w^{(j-1)}(x_j) is defined as in (8), restricted to the observations that precede j, namely

\hat{m}_w^{(j-1)}(x_j) = \frac{\sum_{i=1}^{j-1} s_w(x_i, x_j)\, y_i}{\sum_{i=1}^{j-1} s_w(x_i, x_j)}.  (12)

Model (7) assumes that the distribution of yt is a function of xt alone. If the function m were known, the best predictor of yt given xt would have been m(xt), independent of previous realizations of x and of y. In other words, model (7) specifies a rule, m, relating xt to yt. This is not the case for model (5). In this model, the DGP is case based, where the distribution of yt depends on all past and present realizations of x, as well as all past realizations of y.
Observe that this difference also has an implication regarding the type of questions that are raised about the parameters w or h. In (7), the parameter h is chosen optimally, so as to minimize an expected loss function. It has a purely statistical purpose and meaning. But in (5) and (6), w has a model meaning. Similar to a regression parameter, w may have an economic, psychological, or other substantial meaning having to do with the interpretation of the model. Indeed, in GLS we develop tests for hypotheses of the form4 H0 : w = 0. That is, in this model ''What is the true value of w?'' is a meaningful question, whereas in (7) one may only ask ''What is a useful value of h?''.
Despite these differences, the mathematical connections established above suggest that one may also use the empirical similarity approach to predict the value of y even though, in reality, the true DGP is (7). One would then expect the empirical similarity function to become ''tighter'' with an increase in the database size. To consider an extreme example, assume that a database is replicated in precisely the same way a large number of times. For every past observation (xi, yi) there will be many identical observations, and the similarity function that best explains existing data will be one with infinite w, that is, a similarity function that ignores all but the identical x values.5
The discussion above generalizes to higher dimensions (d > 1) without any fundamental modifications. Kernel estimation is used for estimation of a non-parametric model (7) where x is multidimensional, and the models (5) and (6) have also been formulated for a multi-dimensional x. Indeed, similar relationships exist between the kernel bandwidth parameters and the weights that determine the similarity function. Specifically, we may specify

s_w(x_i, x_t) = \exp(-d_w) = (2\pi)^{d/2} (\det W)^{1/2} \left[ (\det W)^{-1/2} K(x_i - x_t; W) \right],  (13)

where W^{-1} is a diagonal matrix with elements 2w_j, j = 1, \ldots, d, and the term in the square brackets of (13) integrates to one. In this setting

\frac{\sum_{i=1}^{n} s_w(x_i, x_t)\, y_i}{\sum_{i=1}^{n} s_w(x_i, x_t)} = \frac{\sum_{i=1}^{n} K(x_i - x_t; W)\, y_i}{\sum_{i=1}^{n} K(x_i - x_t; W)},

where the jth bandwidth h_j is equal to 1/\sqrt{2w_j}. The bulk of the literature on multivariate kernels focuses only on one bandwidth parameter, but there is no conceptual difficulty in optimizing a multi-dimensional bandwidth. This, indeed, has been discussed by Yang and Tchernig (1999). As in the univariate case, we find the same conceptual differences between the empirical similarity model and kernel estimation. In particular, the empirical similarity model allows one to test hypotheses of the form
H_0 : w_j = 0,

suggesting that variable xj is immaterial in similarity judgments. Rejecting such a hypothesis constitutes a statistical proof that the variable xj matters for the assessment of y. By contrast, a kernel function that is not part of the DGP does not render itself to the testing of similar qualitative hypotheses.

4. Empirical similarity and spatial models

The general spatial model can be written in at least two ways, in each case leading to a different likelihood. Besag (1974, p. 201; see also Cressie, 1993) describes the two possibilities. First, the conditional density of yi given y−i = (y1, . . . , yi−1, yi+1, . . . , yn) is specified as

p_i(y_i \mid y_{-i}) = \left(2\pi\sigma^2\right)^{-1/2} \exp\left\{-\frac{1}{2\sigma^2}\left[y_i - \mu_i - \sum_{j \neq i} \beta_{i,j}\left(y_j - \mu_j\right)\right]^2\right\}.
This results in the following joint density of y = (y1, . . . , yn):

p(y) = \left(2\pi\sigma^2\right)^{-n/2} |B|^{1/2} \exp\left[-\frac{1}{2\sigma^2}(y - \mu)' B (y - \mu)\right],
where [B]_{i,i} = 1, [B]_{i,j} = [B]_{j,i} = −β_{i,j} and B is positive definite. Alternatively, one can assume that

E(y_i \mid y_{-i}) = \mu_i + \sum_{j \neq i} \beta_{i,j}\left(y_j - \mu_j\right).
For example, this holds for the model

y_i = \mu_i + \sum_{j \neq i} \beta_{i,j}\left(y_j - \mu_j\right) + \varepsilon_i,

where ε1, . . . , εn are iid normal variables with zero mean and variance σ². In this case the joint density is

p(y) = \left(2\pi\sigma^2\right)^{-n/2} |B| \exp\left[-\frac{1}{2\sigma^2}(y - \mu)' B' B (y - \mu)\right].  (14)

4 Under the hypothesis that w = 0, s_w(x_i, x_j) = 1 for all i and j. This suggests that y is not influenced by x — past values of y are relevant to its current evaluation irrespective of the x values that were associated with them. Mathematically, setting w = 0 yields the same prediction as using a kernel approach with h = ∞, where for every x, y is evaluated by a simple average of all past y's.
5 In fact, two replications would suffice for the above argument. But a large number of replications would have a similar impact even if the database is not replicated in precisely the same way.
It is required that B is positive definite. Note that if we define

[B]_{i,j} = -\frac{s_w(x_i, x_j)}{\sum_{j \neq i} s_w(x_i, x_j)},
then (14) is the joint density of y in the similarity model (6). This model is also entitled conditional autoregression (or CAR). These spatial models resemble models (5) and (6). The latter may appear more restrictive than the spatial model, because the similarity function sw specifies a particular functional form for the coefficients βi,j (and, in (5), there are additional constraints that βi,j = 0 for i < j). However, in most spatial applications (e.g., Anselin, 1988) the βi,j ’s are taken to be fixed and given whereas in models (5) and (6) the coefficients are not assumed known. Rather, they are functions of the x’s and the w ’s and therefore, they are ultimately estimated from the data. 5. Probability estimation GLS (2006) also propose using the empirical similarity approach for the estimation of probabilities. Such probabilities may be used in a decision problem, employing expected utility maximization or some other decision procedure that is probability based, such as median-utility maximization. Our focus at this point is on probabilities per se. In this context, consider yt ∈ {0, 1}, as in the example of success in a PhD program mentioned above. GLS develop the likelihood function for the ordered model, in which the probability that yt = 1 depends only on past observations, yi for i < t, and this probability is taken to be the similarity-weighted average of these past observations, namely, the similarity-weighted frequency of 1’s in the past6 :
p_{s_w}(y_t = 1 \mid x_1, \ldots, x_t, y_1, \ldots, y_{t-1}) = \frac{\sum_{i<t} s_w(x_i, x_t)\, y_i}{\sum_{i<t} s_w(x_i, x_t)}.  (15)
However, there are many applications in which the given data are not ordered in any natural way. In this case, one may assume that the probability that each data point yt , t = 1, . . . , n, equals 1 is given by psw (yt = 1|x1 , . . . , xn , y1 , . . . , yt −1 , yt +1 , . . . , yn )
= \frac{\sum_{i \neq t} s_w(x_i, x_t)\, y_i}{\sum_{i \neq t} s_w(x_i, x_t)}.  (16)
If p (yi ) = p for all i, then psw (yt = 1|·) is evidently unbiased for p. To estimate w , we can use the idea of likelihood cross-validation, as follows. First, we define psw,−i (yi = 1|x1 , . . . , xn , y1 , . . . , yi−1 , yi+1 , . . . , yn )
= \frac{\sum_{j \neq i} s_w(x_j, x_i)\, y_j}{\sum_{j \neq i} s_w(x_j, x_i)},
6 GLS also allow the probability to depend on this similarity-weighted frequency in a monotone way. The more specific assumption, namely, that the similarityweighted frequency is the probability, suggests an interpretation of ‘‘probability’’ that generalizes the frequentist definition, while retaining its intuitive appeal. However, this model cannot describe how the process starts and generates both 0’s and 1’s.
for i, j = 1, . . . , n, which is the leave-yi-out cross-validation first step. At the second stage of the procedure we obtain

\hat{w}_{CV} = \arg\max_{w} \sum_{i=1}^{n} \log p_{s_w,-i}(y_i = 1 \mid x_1, \ldots, x_n, y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n).
Finally, for a new data point t = n + 1, we estimate (15) by

\hat{p}_{\hat{w}_{CV}}(y_t = 1 \mid x_1, \ldots, x_n, x_t, y_1, \ldots, y_n) = \frac{\sum_{i=1}^{n} s_{\hat{w}_{CV}}(x_i, x_t)\, y_i}{\sum_{i=1}^{n} s_{\hat{w}_{CV}}(x_i, x_t)}.
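A hedged sketch of this two-stage procedure — leave-one-out probabilities, a cross-validated choice of w, and a prediction for a new point — is given below. The grid search over a scalar w, the simulated data, and all function names are our own illustrative choices, not the authors' implementation.

```python
# Sketch of likelihood cross-validation for the similarity weight in the
# binary model: leave-one-out probabilities, then the w maximizing their
# summed log-likelihood, then the prediction of P(y=1) for a new point.
import numpy as np

def sim(x_i, x_t, w):
    return np.exp(-w * np.sum((x_i - x_t) ** 2))

def p_loo(i, X, y, w):
    """p_{s_w,-i}(y_i = 1 | .): similarity-weighted frequency of 1's, excluding i."""
    s = np.array([sim(X[j], X[i], w) if j != i else 0.0 for j in range(len(y))])
    return np.sum(s * y) / np.sum(s)

def cv_loglik(w, X, y):
    eps = 1e-12                                   # guard against log(0)
    return sum(np.log(np.clip(p_loo(i, X, y, w) if y[i] == 1
                              else 1.0 - p_loo(i, X, y, w), eps, 1.0))
               for i in range(len(y)))

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=80) > 0).astype(float)

grid = np.linspace(0.1, 10.0, 50)                 # scalar w for simplicity
w_cv = grid[np.argmax([cv_loglik(w, X, y) for w in grid])]

x_new = np.array([0.5, -0.2])
s = np.array([sim(X[i], x_new, w_cv) for i in range(len(y))])
print(w_cv, np.sum(s * y) / np.sum(s))            # estimated P(y=1 | x_new)
```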
Note the difference between this procedure and the one discussed in Silverman (1986, pp. 126–127). In our notation, Silverman’s equation (6.7) reduces to
\hat{p}(y_t = 1) = \frac{1}{n} \sum_{i=1}^{n} \lambda \left(\frac{1 - \lambda}{\lambda}\right)^{(y_i - y_t)^2},  (17)
where λ is a parameter, assumed to lie in [1/2, 1], to be estimated by likelihood cross-validation. That is,
\hat{\lambda}_{CV} = \arg\max_{\lambda} \sum_{i=1}^{n} \log \hat{p}_{-i}(y_i = 1)
with

\hat{p}_{-i}(y_i = 1) = \frac{1}{n} \sum_{j \neq i} \lambda \left(\frac{1 - \lambda}{\lambda}\right)^{(y_j - y_i)^2}.
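For comparison, here is a minimal sketch of the kernel-based procedure in (17), with λ chosen by likelihood cross-validation over a grid. The data are simulated, and the leave-one-out term below averages over the remaining n − 1 observations; everything is our own illustrative construction rather than the authors' code.

```python
# Sketch of the kernel estimator (17) for a binary y, with lambda chosen by
# likelihood cross-validation; illustrative code only.
import numpy as np

def p_hat(y_t, y, lam):
    """Formula (17): average Aitchison-Aitken-type kernel weight at y_t."""
    return np.mean(lam * ((1.0 - lam) / lam) ** ((y - y_t) ** 2))

def p_hat_loo(i, y, lam):
    mask = np.arange(len(y)) != i
    return np.mean(lam * ((1.0 - lam) / lam) ** ((y[mask] - y[i]) ** 2))

def cv_loglik(lam, y):
    probs = np.array([p_hat_loo(i, y, lam) if y[i] == 1
                      else 1.0 - p_hat_loo(i, y, lam) for i in range(len(y))])
    return np.sum(np.log(np.clip(probs, 1e-12, 1.0)))

rng = np.random.default_rng(4)
y = (rng.uniform(size=100) < 0.7).astype(float)   # true P(y=1) = 0.7
grid = np.linspace(0.5, 0.999, 100)               # lambda restricted to [1/2, 1]
lam_cv = grid[np.argmax([cv_loglik(lam, y) for lam in grid])]
print(lam_cv, p_hat(1.0, y, lam_cv))              # roughly the sample frequency of 1's
```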
Unlike the case of nonparametric estimation of m (x) with unordered data, it is not apparent how we can map λ into w . Also, with the ‘‘right’’ choice of sw it might be possible to find a similarity-based predicted probability which outperforms (17) in terms of the sum of squared errors. 6. Double kernel density estimation Suppose that one wishes to estimate the density function of a real-valued variable y, where this density is assumed to depend on the values of other real-valued variables x = (x1 , . . . , xd ). Assume that the jth past observation is a vector (x1j , . . . , xdj , yj ) ∈ Rd+1 ,
j = 1, . . . , t − 1. A new datapoint xt ∈ Rd is given. How should we estimate the density of y given xt ? Kernel estimation of a density function is a well-known and widely used technique for the case in which there are no explanatory variables x1 , . . . , xd . (See Akaike, 1954; Rosenblatt, 1956; Parzen, 1962; Silverman, 1986; Scott, 1992.) It is therefore a natural candidate for a starting point. One can therefore ask a more concrete question: How can we generalize kernel estimation to the current problem, in which the density of y is assumed to depend on the realization of the variables x1 , . . . , xd ? Gilboa and Schmeidler (2003) used a ‘‘combination’’ axiom to derive kernel estimation of a density function for the standard case, in which there are no explanatory variables. As mentioned above, variants of this combination axiom are at the heart of the derivation of the similarity-weighted averages in BGSS (2005) and GLS (2006). It therefore appears coherent to estimate the density of y by a kernel method, but to allow the kernel to depend on the explanatory variables x1 , . . . , xd in a way that resembles the similarity-weighted average used above. Specifically, assume that there exists a function s : Rd × Rd → R++ , where s(xt , xj ) measures the degree to which data point xt ∈ Rd is similar to data point xj ∈ Rd , and a kernel function
K : R → R+, i.e., a symmetric density function which is nonincreasing on R+. For a database (x1j, . . . , xdj, yj)_j, consider the following formula:
f_t(y) = \frac{\sum_{j} s(x_j, x_t)\, K(y - y_j)}{\sum_{j} s(x_j, x_t)}.  (18)
This formula is an (s-)similarity-weighted average of the kernel functions K (yj − y). Thus, each observation yj is thought of as inducing a density function Kyj (y) = K (yj − y) centered around yj . These density functions are aggregated so that the weight of K (yj − y) in the assessment of the density of yt is proportional to the degree that the data point xj is similar to the new data point xt . As in the other models discussed above, two special cases of (18) may be of interest. First, assume that s is constant. This is equivalent to suggesting that all past observations are equally relevant. In this case, (18) boils down to classical kernel estimation of the density f (ignoring the variables x1 , . . . , xd ). Another special case is given by s(xt , xj ) = 1{xt =xj } . 7 In this case, (18) becomes a standard kernel estimation of f given only the sub-database defined by xt . Thus, formula (18) may be viewed as offering a continuous spectrum between the unconditional kernel estimation and conditional kernel estimation given xt . In this section we justify the formula (18) on axiomatic grounds and develop a procedure for its estimation. We start with the axiomatic model, considering the estimated density as a function of the database. We then proceed to interpret the formula we obtain as a data-generating process. This implies that the functions s and K , whose existence follows from the axioms, can be viewed as functions of unknown parameters of a distribution, and thus as the object of statistical inference. We proceed to develop the statistical theory for the estimation of these functions. 6.1. Axiomatization Let F be the set of continuous, Rieman-integrable density functions on R.8 Let C = Rd+1 be the set of possible observations.9 A database is a sequence of cases, D ∈ C n for n ≥ 1. The set of all databases is denoted C ∗ = ∪n≥1 C n . The concatenation of two databases, D = (c1 , . . . , cn ) ∈ C n and E = (c1′ , . . . , ct′ ) ∈ C t , is denoted by D ◦ E and it is defined by D ◦ E = (c1 , . . . , cn , c1′ , . . . , ct′ ) ∈ C n+t . Observe that the same element of C may appear more than once in a given database. Fix a prediction problem, xt ∈ Rd . We suppress it from the notation through the statement of Theorem 1. For each D ∈ C ∗ , the predictor has a density f (D) ∈ F reflecting her beliefs over the value of yt in the problem under discussion. Thus, we study functions f : C ∗ → F , and our axioms will take the form of consistency requirements imposed on such functions. For n ≥ 1, let Πn be the set of all permutations on {1, . . . , n}, i.e., all bijections π : {1, . . . , n} → {1, . . . , n}. For D ∈ C n and a permutation π ∈ Πn , let π D be the permuted database, that is, π D ∈ C n is defined by (π D)i = Dπ(i) for i ≤ n. We formulate the following axioms. A1, Order Invariance: For every n ≥ 1, every D ∈ C n , and every permutation π ∈ Πn , f (D) = f (π D).
7 We assume that the function s is strictly positive. This simplifies the analysis as one need not deal with vanishing denominators. Yet, for the purposes of the present discussion it is useful to consider the more general case, allowing zero similarity values. This case is not axiomatized in this paper. 8 Our results can be extended to Rm with no major complications. 9 For the purposes of the axiomatization, C may be an abstract set of arbitrarily large cardinality.
A2, Concatenation: For every D, E ∈ C∗, f(D ◦ E) = λf(D) + (1 − λ)f(E) for some λ ∈ (0, 1).
Almost identical axioms appear in BGSS (2005). They deal with probability vectors over a finite space, rather than with densities. In their model, for every database D there exists a probability vector p(D) in a finite-dimensional simplex, and the axioms they impose are identical to A1 and A2 with p playing the role of f . The Order Invariance axiom states that a permuted database will result in the same estimated density. This axiom is not too restrictive provided that the variables x = (x1 , . . . , xd ) specify all relevant information (such as the time at which the observation was made). The Concatenation axiom has the following behavioral interpretation. Assume that, given database D, an expected utility maximizer has to make decisions, where the state of the world is y ∈ R, and assume that her beliefs are given by the density f (D). The Concatenation axiom is equivalent to saying that, for any integrable bounded utility function, if act a has a higher expected utility than does act b given each of two disjoint databases D and E, then a will be preferred to b also given their union D ◦ E. Equivalently, the Concatenation axiom requires that, for any two integrable bounded functions ϕ, ψ : R → R, if the expectation of ϕ(Y ) is at least as large as that of ψ(Y ) given each of two disjoint databases D and E, then this inequality holds also given their union D ◦ E. This axiom is a variation of the Combination axiom in Gilboa and Schmeidler (2003), where it is extensively discussed. In particular, the Combination axiom is unlikely to hold when the data may reflect patterns. Thus, when time series are involved, a straightforward application of our method may lead to poor predictions. The following theorem is an adaptation of the main result of BGSS (2005) to our context. Theorem 1. Let there be given a function f : C ∗ → F and assume that not all {f (D)}D∈C ∗ are collinear. Then the following are equivalent: (i) f satisfies A1 and A2; (ii) There exists a function f0 : C → F , and a function s : C → R++ such that, for every n ≥ 1 and every D = (c1 , . . . , cn ) ∈ C n ,
f(D) = \frac{\sum_{j \le n} s(c_j)\, f_0(c_j)}{\sum_{j \le n} s(c_j)}.  (∗)
Moreover, in this case the function f0 is unique, and the function s is unique up to multiplication by a positive number.
Recall that the discussion has been relative to a new datapoint xt, and that cj = (x1j, . . . , xdj, yj). Abusing notation, we write (xj, yj) for (x1j, . . . , xdj, yj). Thus, an explicit formulation of (∗) would be

f(D, x_t)(y) = \frac{\sum_{j \le n} s((x_j, y_j), x_t)\, f_0((x_j, y_j))(y)}{\sum_{j \le n} s((x_j, y_j), x_t)}.  (19)
We interpret this formula as follows. Let s (xj , yj ), xt be the degree to which past observation (xj , yj ) is considered to be relevant to the present datapoint xt . We would like to think of this degree of relevance as the similarity of the past case to the present one. Let f0 ((xj , yj ))(y) be the value of the density function, given a single observation (xj , yj ), at the point y. Then, given database D, the estimated density of y is a similarity-weighted average of the densities f0 ((xj , yj ))(y) given each past observation, where more similar observations get proportionately higher weight in the average. We now make the following additional assumptions: (i) the similarity function depends only on the variables x = (x1 , . . . , xd ),
thus, s((xj, yj), xt) = s(xj, xt); (ii) the density function f0((xj, yj))(y) does not depend on xj, i.e., f0((xj, yj))(y) = f0(yj)(y); and (iii) the density f0(yj)(y) is a non-increasing function of the distance between yj and y, that is, f0(yj)(y) = K(yj − y) for a kernel function K ∈ F.10 Under these assumptions, (19) boils down to (18). We refer to (19) as a ''double-kernel'' density function: each observation yj for predictor values xj affects the density of y values that are close to yj, and it does so not only for the density of y given the specific xj, but also for values of x that are close to xj.
6.2. Statistical analysis The formula (18) can be viewed either parametrically or nonparametrically. If the former approach is taken, then (18) is assumed to be correctly specified up to a finite dimensional vector of parameters, say, ψ = (w, θ)′ , where w = (w1 , . . . , wd ) are the weights of the similarity function as above, and θ = (θ1 , . . . , θr ) are parameters that specify the kernel function K .11 To estimate this model, let Ft = σ (x1 , . . . , xt , y1 , . . . , yt −1 ) and assume that the true conditional density of yt , given Ft −1 , is given by
f_t(y; \psi) = \frac{\sum_{j<t} s_w(x_t, x_j)\, K_\theta(y - y_j)}{\sum_{j<t} s_w(x_t, x_j)}, \qquad t = 2, 3, \ldots, n.

The joint density of y = (y1, . . . , yn), conditional on x = (x1, . . . , xn), is

f(y; \psi) = \prod_{t=1}^{n} f_t(y_t; \psi) = \prod_{t=1}^{n} \frac{\sum_{j<t} s_w(x_t, x_j)\, K_\theta(y_t - y_j)}{\sum_{j<t} s_w(x_t, x_j)}.
We can proceed with any classical approach, such as maximum likelihood estimation (MLE), where the MLE of ψ is defined as
\hat{\psi} = \arg\max_{\psi} \sum_{t=1}^{n} \log \frac{\sum_{j<t} s_w(x_t, x_j)\, K_\theta(y_t - y_j)}{\sum_{j<t} s_w(x_t, x_j)}.

Then, the estimated conditional density of yt is f_t(y; \hat{\psi}).
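A hedged sketch of such an estimation is given below, taking K_θ to be a Gaussian density whose only parameter θ is a scale and maximizing the likelihood numerically. The optimizer, the parameterization, and the simulated data are our own illustrative choices, not the paper's implementation.

```python
# Sketch: maximum likelihood for psi = (w, theta) in the parametric
# double-kernel model, with K_theta a Gaussian density of standard deviation
# theta and the conditional density of y_t built from observations j < t.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, X, y):
    w, theta = np.abs(params[:-1]), abs(params[-1]) + 1e-6
    ll = 0.0
    for t in range(1, len(y)):                    # start at the second observation
        s = np.exp(-np.sum(w * (X[:t] - X[t]) ** 2, axis=1))
        dens = np.sum(s * norm.pdf(y[t] - y[:t], scale=theta)) / np.sum(s)
        ll += np.log(max(dens, 1e-300))
    return -ll

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))
y = X[:, 0] + 0.2 * rng.normal(size=100)
res = minimize(neg_loglik, x0=np.array([1.0, 1.0, 0.5]), args=(X, y),
               method="Nelder-Mead")
w_hat, theta_hat = np.abs(res.x[:2]), abs(res.x[2])
print(w_hat, theta_hat)
```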
Alternatively, we can take a nonparametric approach, viewing (18) as a nonparametric conditional density estimator. If we consider a kernel function given up to a single bandwidth parameter h, we obtain the following double-kernel, adaptive non-parametric density estimator,
f_t(y) = \frac{\sum_{j} s_w(x_t, x_j)\, \frac{1}{h} K\!\left(\frac{y - y_j}{h}\right)}{\sum_{j} s_w(x_t, x_j)},  (20)
depending on d + 1 parameters, w1 , . . . , wd , h. In the special case where w1 = · · · = wd = 0 (i.e., when all the sw ’s are equal), the formula reduces to the usual kernel density estimate, ft (y) =
\frac{1}{(t-1)h} \sum_{j=1}^{t-1} K\!\left(\frac{y - y_j}{h}\right).  (21)

In order to make (20) operational, we can choose h and w jointly so as to satisfy any reasonable criterion, such as the minimum of the MISE.
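The following sketch implements the double-kernel estimator (20) and picks (w, h) jointly on a small grid. Because the MISE itself requires knowledge of the true density, a feasible leave-one-out likelihood criterion is used here as a stand-in; the scalar similarity weight, the grids, and the data are our own illustrative assumptions.

```python
# Sketch of the double-kernel conditional density estimator (20), with a
# scalar similarity weight w and bandwidth h chosen jointly on a grid by
# leave-one-out likelihood cross-validation (a feasible proxy for the MISE).
import numpy as np

def f_hat(y_val, x_t, X, y, w, h):
    """Estimate of the conditional density of y at y_val, given x_t."""
    s = np.exp(-w * np.sum((X - x_t) ** 2, axis=1))
    k = np.exp(-0.5 * ((y_val - y) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return np.sum(s * k) / np.sum(s)

def cv_score(w, h, X, y):
    n = len(y)
    score = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        score += np.log(max(f_hat(y[i], X[i], X[mask], y[mask], w, h), 1e-300))
    return score

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 2))
y = X[:, 0] + 0.3 * rng.normal(size=120)

grid_w = [0.5, 1.0, 2.0, 4.0]
grid_h = [0.1, 0.2, 0.4, 0.8]
w_best, h_best = max(((w, h) for w in grid_w for h in grid_h),
                     key=lambda p: cv_score(p[0], p[1], X, y))
print(w_best, h_best, f_hat(0.0, np.zeros(2), X, y, w_best, h_best))
```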
10 These simplifying assumptions can be written in terms of axioms on f : C∗ → F. However, this translation is straightforward and therefore omitted.
11 Of course, one may consider richer parametric models, such as a quadratic distance function that depends on \binom{d}{2} + d parameters.

7. Discussion

Analogical reasoning is a cornerstone of human intelligence. Formal and axiomatically based models of such reasoning have resulted in the empirical similarity approach discussed above. The formulae used in this approach turn out to be very similar to kernel methods in statistics. While the differences between the empirical similarity approach and kernel methods should not be underestimated, the striking similarity between the formulae used in both methods is probably not coincidental.
Our findings suggest that a closer interaction between statistical theory and axiomatic decision theory may be fruitful for both disciplines. Statistical techniques may be interpreted as models of human reasoning and decision making. Just as kernel techniques may be viewed as formal models of reasoning by analogies, other statistical methods may also inform us regarding the way people think. In particular, regression analysis suggests a simple model of reasoning that goes beyond mere analogies to the identification of trends. It appears obvious that decision makers engage in such reasoning, and decision theory should incorporate it into its formal models.
Conversely, the axiomatic approach may further our understanding of statistical techniques and help us see connections among them. For instance, we find that a basic principle, namely the Combination axiom, appears to be at the foundation of several techniques, such as kernel estimation, kernel classification, likelihood maximization as well as the empirical similarity approach. Studying the underlying principles of various methods may suggest new ways to combine them in order to tackle new problems.

Appendix. Proof of Theorem 1

The necessity of the axioms is straightforward. We now prove sufficiency. Consider the sequence of partitions of R defined by

P_m = \{(-\infty, -m), [m, \infty)\} \cup \left\{\left[T + \frac{l}{2^m},\, T + \frac{l+1}{2^m}\right) : -m \le T \le m-1,\; 0 \le l \le 2^m - 1\right\}.

Thus, Pm contains m·2^{m+1} + 2 intervals, of which two are infinite. For f ∈ F, let fm be the distribution induced by f on Pm. Specifically, for A ∈ Pm, fm(A) = ∫_A f(y) dy. Observe that, for every f ∈ F, max{fm(A) | A ∈ Pm} → 0 as m → ∞.
Fix Pm and consider fm(D) for D ∈ C∗. Observe that fm satisfies the axioms of BGSS (2005). Hence for every m ≥ 1 there exists a function sm : C → R++ such that, for every n ≥ 1, every D = (c1, . . . , cn) ∈ Cn, and every A ∈ Pm,

f_m(D)(A) = \frac{\sum_{j \le n} s_m(c_j)\, f_m(c_j)(A)}{\sum_{j \le n} s_m(c_j)}.  (22)

It follows that (22) holds also for every event A that is Pm-measurable. Consider two consecutive partitions, Pm and Pm+1. Since every event A ∈ Pm is also Pm+1-measurable, we conclude that, for every n ≥ 1, every D = (c1, . . . , cn) ∈ Cn, and every A ∈ Pm,

f_{m+1}(D)(A) = \frac{\sum_{j \le n} s_{m+1}(c_j)\, f_{m+1}(c_j)(A)}{\sum_{j \le n} s_{m+1}(c_j)}.  (23)

However, f_{m+1}(D)(A) = f_m(D)(A) = ∫_A f(D)(y) dy and f_m(c_j)(A) = f_{m+1}(c_j)(A) = ∫_A f(c_j)(y) dy. Combining these with (22) and (23), we conclude that sm+1 can replace sm in (22). By the uniqueness result of BGSS (2005), sm+1 is a multiple of sm. Without loss of generality, we may assume that sm+1 = sm. Thus, there
exists a function s : C → R++ , and, for each c ∈ C , a density f (c ) ∈ F , such that, for every m ≥ 1, for every n ≥ 1, every D = (c1 , . . . , cn ) ∈ C n , and every A ∈ Pm ,
f_m(D)(A) = \frac{\sum_{j \le n} s(c_j)\, f(c_j)(A)}{\sum_{j \le n} s(c_j)}.  (24)
Next consider an arbitrary finite interval (u, v) (where −∞ ≤ u < v ≤ ∞). Observe that, for every n ≥ 1 and every D = (c1 , . . . , cn ) ∈ C n ,
f(D)((u, v)) = \lim_{m \to \infty} \sum_{\{A \in P_m \mid A \subset (u, v)\}} f_m(D)(A)
= \lim_{m \to \infty} \sum_{\{A \in P_m \mid A \subset (u, v)\}} \frac{\sum_{j \le n} s(c_j)\, f(c_j)(A)}{\sum_{j \le n} s(c_j)}
= \lim_{m \to \infty} \sum_{j \le n} \frac{s(c_j)}{\sum_{j \le n} s(c_j)} \sum_{\{A \in P_m \mid A \subset (u, v)\}} f(c_j)(A)
= \sum_{j \le n} \frac{s(c_j)}{\sum_{j \le n} s(c_j)} \lim_{m \to \infty} \sum_{\{A \in P_m \mid A \subset (u, v)\}} f(c_j)(A)
= \sum_{j \le n} \frac{s(c_j)}{\sum_{j \le n} s(c_j)}\, f(c_j)((u, v));
hence (∗) is proved. Finally, the uniqueness of f is obvious, and the uniqueness of s (up to multiplication by a positive number) follows from the uniqueness result in BGSS (2005).
References

Akaike, H., 1954. An approximation to the density function. Annals of the Institute of Statistical Mathematics 6, 127–132.
Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer, Dordrecht.
Besag, J., 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society Series B 36, 192–236.
Billot, A., Gilboa, I., Samet, D., Schmeidler, D., 2005. Probabilities as similarity-weighted frequencies. Econometrica 73, 1125–1136.
Billot, A., Gilboa, I., Schmeidler, D., 2008. An axiomatization of an exponential similarity function. Mathematical Social Sciences 55, 107–115.
Cressie, N., 1993. Statistics for Spatial Data. John Wiley & Sons, New York.
Gayer, G., Gilboa, I., Lieberman, O., 2007. Rule-based and case-based reasoning in housing prices. BE Journals in Economics 7. Article 10.
Gilboa, I., Schmeidler, D., 1995. Case-based decision theory. Quarterly Journal of Economics 110, 605–639.
Gilboa, I., Schmeidler, D., 2001. A Theory of Case-Based Decisions. Cambridge University Press, Cambridge.
Gilboa, I., Schmeidler, D., 2003. Inductive inference: An axiomatic approach. Econometrica 71, 1–26.
Gilboa, I., Lieberman, O., Schmeidler, D., 2006. Empirical similarity. Review of Economics and Statistics 88, 433–444.
Hume, D., 1748. Enquiry into the Human Understanding. Clarendon Press, Oxford.
Lieberman, O., 2010. Asymptotic theory for empirical similarity models. Econometric Theory 26 (in press).
Parzen, E., 1962. On the estimation of a probability density function and the mode. Annals of Mathematical Statistics 33, 1065–1076.
Riesbeck, C.K., Schank, R.C., 1989. Inside Case-Based Reasoning. Lawrence Erlbaum Associates, Inc., Hillsdale, NJ.
Rosenblatt, M., 1956. Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics 27, 832–837.
Schank, R.C., 1986. Explanation Patterns: Understanding Mechanically and Creatively. Lawrence Erlbaum Associates, Hillsdale, NJ.
Scott, D.W., 1992. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley and Sons, New York.
Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London and New York.
Yang, L., Tchernig, R., 1999. Multivariate bandwidth selection for local linear regression. Journal of the Royal Statistical Society Series B 61 (4), 793–815.
Journal of Econometrics 162 (2011) 132–139
The distortion of information to support an emerging evaluation of risk

J.E. Russo a,∗, Kevyn Yong b
a Cornell University, United States
b HEC Paris, France

Article history: Available online 11 July 2010
Keywords: Desirability bias; Distortion of information; Format of information; Risk; Risk assessment
Abstract: A persistent problem in the assessment of the risk of an event is a bias driven by the desirability of different outcomes. However, such a desirability bias should not occur in the absence of prior dispositions toward those outcomes. This assumption is tested in an experiment designed to track the evaluation of information during an emerging evaluation of risk. Results confirm the presence of a substantial desirability bias even when there is no prior disposition toward any outcome. These findings bear implications for the assessment of risk not only in the presence of prior desirability, but also in situations currently considered benign.
‘‘We are all children of the Enlightenment, whatever other forebears we may acknowledge. It has been a cardinal principle of our upbringing that we must never believe things simply because we want them to be true’’. Mary Midgley, Science as Salvation: Modern Myth and Its Meaning, Routledge, London: 1992, p. 119 For policy makers, practitioners, and researchers who must deal with risk, a constant concern is biased risk assessment. If people systematically over- or underestimate their reported level of risk, they greatly complicate the challenge of formulating successful public policy or designing effective managerial practice. Further, when individuals lack awareness of their own misestimation of the risk in their decision options, they make it much more difficult to choose wisely. The present work investigates a relatively new form of the well known desirability bias, one that occurs in the absence of any prior disposition. Before turning to the desirability bias, we acknowledge how we use the terms disposition and risk. A disposition is an evaluative position, a current and tentative evaluation of an outcome or alternative. An investor confronted with a new prospect forms a tentative disposition toward that opportunity as information is received and evaluated. We use the term ‘‘disposition’’ rather than the familiar ‘‘preference’’ in order to distinguish the emerging, reversible characteristics of a disposition from the innate, stable qualities usually associated with preference. Risk is viewed as a function of both the likelihood of an event’s outcome and the perceived loss associated with that
∗ Corresponding address: Johnson Graduate School of Management, Cornell University, 443 Sage Hall, Ithaca, NY 14853, United States. Tel.: +1 607 255 5440; fax: +1 607 255 5993. E-mail address:
[email protected] (J.E. Russo). doi:10.1016/j.jeconom.2010.07.004
outcome. (The perceived loss encompasses any outcome felt as a loss, including foregone gains or opportunity losses.) We do not attempt to specify risk more precisely. Indeed, because felt risk is, like utility, inherently subjective, we accept an individual’s subjective perception of risk without attempting to analyze or explain it (e.g., Koonce et al., 2005). Although both the magnitude of an event’s likelihood and the magnitude of its associated loss contribute to risk, comparisons among risks are often reduced to comparisons among likelihoods only (see, for example, Kernic et al., 2000 and Williams, 2003; but for a challenge to this assumption, see Hay et al., 2005). In such cases, the events whose risks need to be considered are typically negative, the loss component of risk is presumed rather than explicitly recognized, and only likelihoods are assessed and analyzed. Some analyses that treat risk as synonymous with likelihood do explicitly acknowledge the loss component of risk (i.e., as distinct from the risk’s likelihood). This loss component is labeled the ‘‘severity’’ of the risk. Both likelihood and ‘‘risk severity’’ may then be assessed separately but in parallel (e.g., Fihn et al., 1996). We adopt the more general view of risk as comprising both likelihood and loss in a subjective manner (e.g., Koonce et al., 2005; Weber et al., 2002). Besides accepting risk as personal and thereby avoiding the challenge of defining it, we also do not address certain other well known problems associated with risk assessment, viz. (a) different meanings of risk across cultures (e.g., Goszczynska et al., 1991; Weber and Hsee, 1998), (b) systematic misestimates of likelihood when it is considered synonymous with risk (e.g., DeJoy, 1992; Slovic, 1993; Zeckhauser and Viscusi, 1990) and (c) different subjective interpretations of likelihood terms across people (e.g., Budescu et al., 2003; Smits and Hoorens, 2005).
Desirability bias The desirability bias is an increase in the estimated likelihood of an outcome when it is desirable and a decrease when it is undesirable. For instance, in a political election voters tend to believe that their preferred candidate is more likely to win (Fischer and Budescu, 1995). This bias, in its different contexts and variations, is known as wishful thinking (McGuire and McGuire, 1991; Slovic, 1966), overoptimism (Messick and Bazerman, 1996), outcome bias (e.g., Cohen and Wallsten, 1992), and value bias (Slovic, 1966; Yates, 1990).1 The desirability bias is a clear violation of rationality. How can an investor rationally estimate the financial risk of a new investment or a parent the safety risk of a vehicle for their college-bound daughter if they cannot estimate the likelihoods of the various outcomes independent of their own preferences (over those same outcomes)? The desirability bias’ violation of the rational use of information transgresses the rules of the Bayesian calculus. In the Bayesian updating of likelihood, a prior disposition (the prior probability) is combined with the evaluation of new information (a datum) to yield a revised or updated disposition (the posterior probability). Crucial to this Bayesian calculus is the assumption of independence between its two inputs, the prior probability and the datum. What the desirability bias amounts to is failure of that independence assumption. The prior disposition/probability not only contributes directly to its posterior counterpart, but also biases or distorts the evaluation of the datum as well (see also Boulding et al., 1999). As irrational and patently embarrassing as such a bias might appear, it seems to be remarkably widespread. For instance, Olsen (1997) found it exhibited by highly trained professionals whose job is to evaluate risk so as to select investment portfolios. He showed that professional investment managers bias their estimates of the likelihoods of events to correlate positively with their desirability ratings of the same events.2 This said, the evidence for the desirability bias from controlled experiments is not great. Krizan and Windschitl (2007) reviewed this phenomenon and noted that ‘‘despite the prevalence of the idea that desires bias optimism, the empirical evidence regarding this possibility is limited (p. 2)’’. The present work contributes to this body of evidence by investigating the presence of the desirability bias when the driving disposition is not pre-existing and stable but only emerges tentatively and reversibly during a choice or judgment. Further, we track a specific mechanism through which the desirability bias may operate, namely the distortion of new information to accord with the emerging disposition. A controlled experiment then tests for (a) the presence of a significant bias, (b) its difference in magnitude between the evaluation of a single option and the
1 Although our focus is entirely on risk, an equivalent form of the desirability bias has been observed in the domain of preferences. During a choice, an existing preference for one of the options leads to the distortion of new information to support that preferred alternative. That is, instead of a prior preference distorting the judged risk derived from new information, it distorts the judged preferential value of that information. To take a classic demonstration, Lord et al. (1979) showed that when individuals have a prior disposition toward capital punishment, a combination of information for and against capital punishment ‘‘polarizes’’ those existing opinions. That is, people who are opposed to capital punishment before reading the new, mixed information are more opposed after, while those who supported capital punishment before support it even more strongly after. 2 Olsen described his experimental participants as follows: ‘‘106 US Chartered Financial Analysts [who] were personally responsible for the ‘investment positioning’ of large institutional investment portfolios, such as mutual funds, pension funds, etc. and [who] had a high degree of professional training and experience. Most have at least one academic degree in finance or a related discipline as well as the CFA designation. The CFA is a professional certification awarded by the Financial Analysts Federation upon completion of a rigorous three-year program of practice, study, and examination. Only about 14% of all candidates successfully complete the CFA program and possession of the ‘charter’ should signify an analyst of exceptional skill and judgment (1997, p. 66)’’.
choice between two options, and (c) whether switching from a verbal to a numerical information format can eliminate, or at least reduce, the bias. Distortion of information during a decision In all documented cases of the desirability bias, it is driven by a prior commitment to a preferred position, often a clear and sometimes long-standing one. That is, decision makers begin their task with a stable disposition in favor of an option or outcome. Thus, voters are often positively predisposed toward one candidate’s political party; and Olsen’s (1997) investment managers desire one outcome of a financially consequential event over all the others. In contrast to such pre-existing dispositions is the finding of the same desirability-driven bias when there is no prior position. Instead, this form of the desirability bias is based on the tentative leaning toward one option that inevitably emerges during a decision (Russo et al., 1996) or for the tentative judgment that emerges during the evaluation of a single option (Bond et al., 2007). Although this bias, by definition, appears in the final choice or judgment, it can be traced to the distortion of the information on which that final conclusion is based. Specifically, after a tentative disposition emerges based on the initial information, new information is biased or ‘‘distorted’’ to support it. That is, the new information is interpreted and evaluated as more supportive of the emerging, tentative disposition than it should be. The result, of course, is a biased final choice or judgment. What may be more important, however, is tracing the cause of the desirability bias in the decision output to its origin in the distortion of information during the decision process. The phenomenon of information distortion in the assessment of risk may best be explained by describing the procedure that exposes and measures it during a binary choice. Consider the decision (taken from the study that follows) between two investments, both resort hotels similar in size and cost and in the same sunny community. The choice between them is based solely on their investment risk. That is, the resort hotel with the lower risk of failing to be financially profitable should be chosen. Our risk-based task may be viewed as changing the choice criterion from the maximization of overall value to the minimization of risk (where the former characterizes nearly all prior work using this experimental paradigm3 ). In the choice driven by the goal of selecting the less risky of two hotel investments, the decision maker must evaluate relevant information in a set of attributes like location, beach and management, described for both hotels. Each such attribute is a brief paragraph like the following one for the location of both hotels. Hotel J is within a very short walk of the entertainment center of town, which offers many pubs and nightclubs, a good variety of cafés and restaurants, and a grocery store that sells wine and beer. The hotel will provide a free shuttle service to the heart of town. Besides a small gift shop at the hotel, there are a couple of stores only a few minutes walk that sell T-shirts, beachwear, souvenirs, and the like — at local prices. Hotel C ’s location is a brisk walk to the heart of town where there is a wide variety of bars and dance clubs. 
The area offers excellent restaurants, good shopping with several stores that sell souvenirs or a range of clothing from fairly dressy to totally casual, and two liquor stores (hotels are not allowed to sell ‘‘packaged’’ alcohol). Also, the hotel will provide a free shuttle service to the heart of town.
3 The only exception that we are aware of is recent work by DeKay and several colleagues (DeKay et al., 2009a,b, in press).
The key to exposing the distortion of this information is to segment the choice process at the evaluation of each attribute. Then the evolution of the leaning toward one option can be tracked as the decision maker progresses stepwise through the attributes until the eventual choice of one alternative (Meloy and Russo, 2004). In this stepwise method (described in detail below) each attribute’s risk value is subjectively assessed by the decision maker. This assessment, by comparison with a control group’s unbiased evaluation of the same attribute information, enables the measurement of information distortion. Specifically, if some decision makers lean toward Hotel J (the leading alternative) as the less risky investment after having seen the first attribute, the presence of distortion is manifest as an evaluation of the second attribute that is biased toward favoring Hotel J as less risky (relative to the control group’s evaluation of the second attribute). Similarly, those decision makers who lean toward Hotel C after the first attribute tend to bias their evaluation of the second attribute to favor Hotel C. This distortion of information amounts to a desirability bias based on an emerging disposition that favors the leading alternative. That is, its starting point is the complete absence of any prior disposition toward, or even knowledge of, the choice options. Further, the bias is manifest not only in the chosen alternative, but in the evaluation of the information on which the decision is based. Thus, it takes the desirability bias into the decision process itself, not merely as a final consequence of that process. Surprisingly, given the absence of a prior disposition, the distortion of information based on an emerging preference seems to be nearly ubiquitous. It has been found in decisions made by professionals (Boyle et al., 2009; Russo et al., 2000) and in the evaluation of single options as well as in choices between options (Bond et al., 2007). Further, it is material, and not some curiosity of the decision process that has no impact on the decision itself. Russo et al. (2006) demonstrate that an inferior alternative can be installed as the leader, then supported by the distortion of subsequent information and, as a consequence, chosen a majority of the time. Finally, we note that it has proved nearly impossible to eliminate, for instance, by offering a financial incentive for unbiased evaluation of the information (Meloy et al., 2006). Indeed, the distortion of information to support an emerging preference seems to broadly characterize human judgment and choice. We note that the cause of the distortion of information during a decision is still being explored. It appears to result from a process that Holyoak and Simon (1999) describe as the bidirectional influence of the current disposition on the new information and vice versa. This influence, in turn, is driven by the desire for consistency between old and new information (Russo et al., 2008), where the latter is the running evaluation of the attributes already seen in our experimental paradigm. We emphasize, however, that the phenomenon of the distortion of information is new enough that its underlying mechanism and driving causes have not been empirically verified by multiple researchers. Research goals The present work has four goals. 
The first and most fundamental is to test whether the desirability bias leads to the distorted evaluation of risk-relevant information even in the absence of any prior disposition toward the riskiness of the options. Second, we test for this desirability bias in both binary choice and the evaluation of a single option and contrast their relative magnitudes. Third, the argument is often made for the superiority of numerical information over purely verbal. Numerical is seen as more objective and less susceptible to the kind of pliant interpretation that must take place for distortion to occur. Thus, we test two information formats, verbal and numerical, keeping the information content as equivalent as possible across them. Finally, we address the question of awareness. The potential danger from
distorting risk-relevant information should be greater if decision makers are unaware of any tendency to do so. Thus, our fourth goal is to test for a relation between the magnitude of any emerging desirability bias and the reported awareness of it. 1. Method 1.1. Design Two factors were manipulated, task (binary choice versus the evaluation of a single option) and information format (verbal versus numerical). This created four experimental conditions that were administered in a between-participants design. In addition, each of these four had a corresponding control condition in which the same information was evaluated but not in the context of a choice or judgment. The control groups provided an estimate of the unbiased value of each attribute of information against which any desirability bias (i.e., distortion) could be measured. Finally, two domains were used to provide generality, a financially risky investment in a resort hotel and the safety risk of a motor vehicle. 1.2. Participants The experimental participants were 302 volunteers recruited at a large Northeastern university. Each completed an hour-long session that included other studies and for which each received a total payment of $10. Of these participants, 191 were assigned to one of the four experimental conditions (range: 45–50 per condition), while the remainder served as control participants. Because the control tasks were relatively short, each control participant completed both the hotel and vehicle domains while experimental participants completed either a hotel or vehicle task One participant failed to complete all elements of the task and was excluded. 1.3. Materials Each participant received a paper packet that consisted of a consent form, a cover story, the appropriate task (binary choice or single-option evaluation), and a concluding section that ended with a standard suspicion check. The cover story for the hotel investment involved a college Entrepreneurship Team with a make-or-break opportunity to invest in one of two new, youthoriented resort hotels. The cover story for the vehicles involved recommending the one with the lower risk of accident or injury to the parents of a cousin who was receiving a new car as she went off to college. The Appendix contains the full cover stories for two of the four packets, judging the investment attractiveness of a single hotel and choosing the less risky of two vehicles. For each domain, the hotel investment and vehicle safety, an option was described by six attributes (for hotels: beach, dining options, location, management, rooms, and target market; and for vehicles: accident record, brakes, crash worthiness, passenger protection, safety features, and size). For each attribute an ‘‘attribute description’’ was written for both of the options. Thus, there were six pairs of descriptions in each of the hotel and vehicle domains (24 individual descriptions in total). In the binary choice task each pair appeared together, creating six attributes like the hotel location attribute shown above on which the choice of the less risky investment or the less risky vehicle was to be based. In the single option conditions, the attribute descriptions were shown one at a time. In the numerical information format, the variable information was changed to numbers that reflected approximately the same values as their verbal counterparts. That is, we substituted equivalent numbers for several of the value-conveying words
in the verbal version of an attribute. Pretesting revealed the numerical values that approximated the purely verbal information. Below is the resulting numerical format of the location attribute. This can be contrasted with the purely verbal representation of the same attribute shown above. Hotel J is within a short 5-min walk to the entertainment center of town, which offers 15 pubs, 6 nightclubs, a good variety of cafés and restaurants, and a grocery store that sells wine and beer. The hotel will provide a free shuttle service to the heart of town. Besides a small gift shop at the hotel, there are two stores only a 2-min walk away that sell T-shirts, beachwear, souvenirs, and the like — at local prices. Hotel C ’s location is a brisk 20-min walk to the heart of town where there is a wide variety of bars (at least 12) and 4 dance clubs. The area offers excellent restaurants and good shopping, with something like 8 to 10 stores that sell souvenirs or a range of clothing from fairly dressy to totally casual, and two liquor stores (hotels are not allowed to sell ‘‘packaged’’ alcohol). The hotel provides a free shuttle service. The order of attributes was chosen randomly and reversed for half of the participants. Similarly, the option that appeared first, either Hotel C or J in the example above, was reversed for half of the participants. Both counterbalances reflected standard precautionary methodology. Neither had any effect on the results and, therefore, will not be discussed further. After the main task, participants assigned an importance weight to each attribute on a 100-point constant-sum scale, indicated their awareness of any distortion, and completed a standard suspicion check. Importance weights were used to check that, on average, no attribute was considered irrelevant. The minimum mean importance weight was 12.5 for hotels (the dining options attribute) and 10.4 for vehicles (the size attribute). These values were judged to be large enough to ensure that all attributes were sufficiently important, at least on average. The suspicion check consisted of three questions, ‘‘Were you suspicious of the experimenter?’’, ‘‘Were you suspicious of the instructions?’’, and ‘‘Were you suspicious of the materials?’’. A ‘‘yes’’ to any question, with no requirement of an explanation, was sufficient to disqualify a participant. The suspicion check was failed by 21 participants, who were dropped from all analyses. 1.4. Procedure Participants were given a packet, read and signed a consent form, and then asked to proceed through the remaining pages at their own pace. Packets were randomly assigned to participants with the exception that all controls were completed first. The task took between 10 and 20 min in total. Payment was made, in cash, at the completion of the session. Note that no performance-based financial incentives were offered. First, there was no objectively correct final choice or judgment, nor any objectively correct evaluations of the individual attributes, on which to base an incentive. Thus, deception would have been required. Second, incentives might have introduced their own problems, including causing an increase in information distortion (Meloy et al., 2006). 1.5. Intermediate responses The identification of information distortion was achieved by partitioning the choice process at the individual attributes and collecting three responses after each one. 
The first response was an evaluation of the risk of the current attribute on a 1-to-9 risk scale specific to the decision domain. It is this evaluation of the attribute information that was expected to be distorted. For the two hotels, the endpoints of the scale were 1 = ‘‘very low financial risk in
investing in Hotel J’’ and 9 = ‘‘very low financial risk in investing in Hotel C’’, while the midpoint 5 = ‘‘equal financial risk’’. ‘‘For the vehicles, 1[9] = ‘‘Vehicle J[C] poses a far greater risk to passenger safety’’ and the midpoint 5 = ‘‘both vehicles pose an equal risk to passenger safety’’. For the evaluation of single options, the labeled points of the 1-to-9 scale read, for investments, 1 = ‘‘very low financial risk in investing in this hotel’’, 5 = ‘‘neither low nor high’’, and 9 = ‘‘very high financial risk in investing in this hotel’’. For the evaluation of a single vehicle, 1 = ‘‘a very low risk to passenger safety’’, 5 = ‘‘neither a low nor a high risk to passenger safety’’, and 9 = ‘‘a very high risk to passenger safety’’. Note that participants provided an evaluation of risk (or of relative risk in binary choice), not an estimate of likelihood or probability. As noted earlier, we allowed participants their own interpretations of risk and presumed that these conceptions included both some estimate of likelihood and some estimate of the magnitude of loss. In the hotel domain, these would translate, respectively, into the probabilities and levels of financial loss. Thus, the evaluation measure was intended to include a more complete notion of risk, rather than only an estimated probability. The next two responses captured the emerging disposition favoring one option that was expected to orient the distortion of an attribute. In the binary choice, the second and third responses indicated which alternative was ‘‘leading’’ and by how much. For instance, participants making the hotel investment choice were asked which of the two posed the greater financial risk and their confidence in that identification on a scale from 50 (‘‘50–50; a complete toss-up’’) to 100 (‘‘absolutely certain’’). For the evaluation of single options, only one response was needed to capture the information in the second and third binary choice responses. This was a rating on a 0-to-100 scale whose labeled points read, for hotels, 0 = ‘‘an absolute minimum financial risk in investing in this hotel’’ and 100 = ‘‘an absolute maximum financial risk in investing in this hotel’’. For the evaluation of single vehicles, 0 = ‘‘an absolute minimum risk to passenger safety’’ and 100 = ‘‘an absolute maximum risk to passenger safety’’. 1.6. Control conditions In order to obtain unbiased estimates of the attributes’ values, the control groups evaluated the same information, but not in the context of binary choice or single-option judgment. Because attribute evaluations were biased in these contexts (Russo et al., 1998), it was necessary to avoid them in order to obtain the needed unbiased estimates of the diagnostic value of each attribute. This was achieved by preventing the formation of the cumulative preference that drives the distortion of new information. First, in the choice and judgment tasks described above, the second and third responses were omitted, so that there was no request to maintain a running, cumulative evaluation. Second, to prevent a cumulative evaluation from developing spontaneously, the letters that identified each hotel or vehicle were changed from attribute to attribute. Thus, if the first attribute for hotels was dining options, the information was presented for Hotels J and C. Then the second attribute’s hotels were identified by L and S; the third attribute’s letters were P and F; and so on. 
This effectively precluded cumulating the attribute evaluations into a leaning toward one alternative, which could drive the distortion of the new information. The result of the control groups’ evaluations should have been the unbiased diagnostic value of each attribute. The control conditions also enabled a test for the equivalence of the verbal and numerical versions of the same information. Recall that we wanted only the format, not the diagnostic values, of the information to change. For the 24 individual attribute descriptions, the mean of the verbal–numerical differences on the 1-to-9 scale was +0.05 (range: −1.10 to +1.39). For the 12 binary attributes, the mean difference was −0.05 (range: −1.03 to +0.62). Thus, on average, both versions were similar in their evaluations.
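As a concrete illustration, the equivalence check just described amounts to computing the mean and range of the per-description differences between the two formats. The sketch below shows that calculation; it is a minimal sketch, not the authors’ code, and the simulated values and variable names are hypothetical placeholders for the control groups’ mean ratings.

```python
# Minimal sketch of the verbal-numerical equivalence check described above.
# The data are simulated placeholders; in the study, each entry would be a
# control group's mean 1-to-9 rating of one attribute description.
import numpy as np

rng = np.random.default_rng(0)
n_descriptions = 24                                   # 24 individual attribute descriptions
verbal_means = rng.uniform(3, 7, n_descriptions)      # hypothetical control means, verbal format
numerical_means = verbal_means + rng.normal(0, 0.5, n_descriptions)  # hypothetical, numerical format

diff = verbal_means - numerical_means                 # verbal minus numerical, per description
print(f"mean difference: {diff.mean():+.2f}")
print(f"range: {diff.min():+.2f} to {diff.max():+.2f}")
```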
Table 1
Mean information distortion (by information format).

Task                 Hotel investment        Vehicle safety          Both domains^a
                     Numerical   Verbal      Numerical   Verbal      Numerical   Verbal
Binary choice        0.72        0.61        0.59        0.58        0.66        0.60
Single evaluation    0.43        0.52        0.35        0.28        0.39        0.40
Overall              0.58        0.57        0.47        0.43        0.52        0.50

^a The values combined over domains and over tasks are equally weighted means that ignore any differences in sample sizes.
1.7. Computation of information distortion

In the binary choice task, the distortion of one participant’s rating of one attribute (on the 1-to-9 scale) was computed as follows. The control group’s mean for that attribute was subtracted from the participant’s rating. That calculated difference was signed positively if its direction favored the ‘‘leading’’ alternative (i.e., accorded with the current disposition). In such cases the evaluation of the attribute was distorted to support the leading alternative, hence the positive sign. In contrast, if the direction of the difference between the attribute’s evaluation and the control group’s unbiased value pointed toward the ‘‘trailing’’ alternative (i.e., ran counter to the current disposition), the sign of that difference was negative. Suppose, for example, that the individual’s rating was 7 and the control group’s mean for the same attribute was 5.4. If the alternative that anchored the high end of the scale (i.e., 9) was leading immediately prior to the attribute’s presentation, distortion was computed to be positive, (7 − 5.4) = +1.6. In contrast, if the alternative that anchored the low end of the scale (i.e., 1) had been leading, then distortion would have been −1.6.

Note that distortion cannot be calculated when there is no leader immediately prior to an attribute. Thus, there cannot be any distortion of the first attribute, nor did we calculate distortion when the confidence rating of the previous attribute indicated no leaning toward one of the two alternatives (‘‘50–50; a complete toss-up’’). One participant provided only responses of no preference, and so made no contribution to the database and was effectively self-disqualified.

In the binary choice task, the ‘‘leading’’ alternative defined the current disposition and, therefore, the direction of distortion. When participants evaluated single options, identification of the ‘‘leader’’ required an additional step. Recall that the evidence of the current disposition was a value on a scale from 0 to 100, where the lower end indicated, for the hotel investment, ‘‘will certainly NOT fail’’ and the upper end ‘‘will certainly fail’’. To achieve the dichotomy between the ‘‘leader’’ being ‘‘not fail’’ or ‘‘fail’’, we simply partitioned this scale into lower and higher segments. Values in the lower part were assigned to ‘‘not fail’’ and those in the higher part to ‘‘fail’’. This partition enabled us to identify either ‘‘not fail’’ or ‘‘fail’’ as the ‘‘leading’’ direction. The point of partition of the scale into lower and higher segments was set as the mean rating on the scale from 0 to 100. This was considered the best estimate of the subjective midpoint of the scale. For each of the four conditions, this calculated midpoint was: verbal-investment, 32.2; numerical-investment, 38.0; verbal-vehicle, 38.4; and numerical-vehicle, 37.7. Once a participant’s position was identified as being on one side or the other of these condition-specific midpoints, we could compute distortion following the procedure described above for binary choice.
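The sketch below restates the binary choice distortion computation just described. It is an illustrative implementation, not the authors’ code; the function and variable names are hypothetical, and the worked example reuses the numbers from the text.

```python
def binary_choice_distortion(rating, control_mean, leader_high_anchor):
    """Distortion of one attribute rating in the binary choice task.

    rating: participant's 1-to-9 evaluation of the current attribute.
    control_mean: the control group's (unbiased) mean evaluation of the same attribute.
    leader_high_anchor: True if the alternative anchoring the high end of the scale
        (i.e., 9) was leading immediately before this attribute; False if the
        alternative anchoring the low end (i.e., 1) was leading; None if there was
        no leader ("50-50"), in which case distortion is undefined.
    """
    if leader_high_anchor is None:
        return None                       # no leader, so no distortion score
    diff = rating - control_mean          # deviation from the unbiased value
    # Positive when the deviation favors the leading alternative, negative otherwise.
    return diff if leader_high_anchor else -diff

# Worked example from the text: rating 7, control-group mean 5.4.
print(f"{binary_choice_distortion(7, 5.4, leader_high_anchor=True):+.1f}")   # +1.6
print(f"{binary_choice_distortion(7, 5.4, leader_high_anchor=False):+.1f}")  # -1.6
```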
2. Results

The data analysis must answer three questions about the magnitude of information distortion: was it present, did it differ between single-option evaluation and binary choice, and did the numerical format reduce it relative to the verbal format? The answers to all three questions were obtained from a 2 × 2 × 2 ANOVA with the three factors of task (single-option evaluation versus binary choice), format (numerical versus verbal) and domain (the investment risk of a resort hotel versus the safety risk of a motor vehicle). The mean levels of information distortion are displayed in Table 1.

The presence of information distortion was confirmed. The unweighted mean across the eight cells of the 2 × 2 × 2 design was 0.51. The corresponding test of statistical significance is the test of the intercept against the null hypothesis µ = 0; the resulting F(1, 161) = 34.93 (p < 0.001). Thus, an emerging disposition, such as the judgment that one investment is less risky than another, distorted the estimate of the risk conveyed by new information.

The second goal was to compare the magnitudes of distortion when single-option evaluations and binary choices were based on the same information. The only other study to compare the two evaluation modes did so in the context of preferential choice, not risk (Bond et al., 2007). These researchers found no reliable difference in distortion between single evaluations and choice. The analysis of our data revealed mean information distortion of 0.40 in single-option evaluation and 0.63 in binary choice. Although the difference seemed substantial, it was not statistically reliable, F(1, 161) = 1.80, p > 0.18. Thus, we concluded that there was no evidence that the amount of information distortion differed between single-option evaluations and binary choices.

Finally, might the bias have been substantially reduced, or possibly eliminated, when the information was numerical rather than verbal? The mean information distortion was 0.50 in the verbal format and 0.52 in the numerical. These values were not even directionally supportive of numerical superiority, F(1, 161) = 0.015, p > 0.90. Thus, contrary to expectation, there was no evidence of less bias when the information was numerical rather than purely verbal. No other terms in the ANOVA were significant. That is, there was no main effect of the hotel versus vehicle domain nor any significant interactions.

2.1. Awareness of bias

Were participants aware of their information distortion? If they were, there is more hope that they can self-correct. If they were not, then some external intervention would seem to be needed. Reported awareness was correlated with mean distortion across participants, separately for all binary choices and for all single-option evaluations. A significantly positive correlation would have indicated participants’ awareness of their own tendency to distort information. The resulting values were 0.01 for binary choices and −0.12 for the single-option evaluations. Neither was significantly different from zero (for both, two-sided p > 0.25). Thus, there was no evidence that participants were aware of their own information distortion.
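For readers who wish to run this style of analysis on their own data, the sketch below shows one way to carry out the 2 × 2 × 2 ANOVA (including the intercept test against µ = 0) and the awareness–distortion correlations. It is a minimal sketch under stated assumptions: the data frame, its column names, and the simulated values are hypothetical placeholders, not the study’s data or code.

```python
# Minimal analysis sketch, assuming a per-participant data frame with hypothetical
# columns; the simulated values below stand in for the study's data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n = 168                                               # placeholder sample size
df = pd.DataFrame({
    "distortion": rng.normal(0.5, 1.0, n),            # mean distortion per participant
    "task": rng.choice(["binary_choice", "single_eval"], n),
    "info_format": rng.choice(["numerical", "verbal"], n),
    "domain": rng.choice(["hotel", "vehicle"], n),
    "awareness": rng.integers(1, 8, n),                # reported awareness rating
})

# 2 x 2 x 2 between-participants ANOVA. With sum-to-zero contrasts the intercept
# is the unweighted mean of the cell means, so its test corresponds to the
# "is mean distortion different from zero" question addressed above.
model = smf.ols(
    "distortion ~ C(task, Sum) * C(info_format, Sum) * C(domain, Sum)", data=df
).fit()
print(sm.stats.anova_lm(model, typ=3))

# Awareness-distortion correlation, computed separately within each task.
for task, sub in df.groupby("task"):
    r, p = pearsonr(sub["awareness"], sub["distortion"])
    print(f"{task}: r = {r:.2f}, two-sided p = {p:.2f}")
```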
3. Discussion

The potential for the desirability bias to taint the assessment of risk is well known (e.g., Krizan and Windschitl, 2007). The dismaying contribution of the present work is to demonstrate that the desirability bias may affect risk assessment even in the absence of a prior disposition. One consequence of this result is a creeping and unappreciated overconfidence. Consider the process of forming an evaluative disposition, whether a choice or a judgment. The desirability bias that is driven by the emerging disposition appears during the process as distorted information, a distortion of which individuals seem unaware. Consequently, these same individuals are likely to be just as unaware that the resulting confidence in their final position is unjustified. Indeed, one can almost describe them as having talked themselves into that final confidence, as they distort their evaluation of new information to support the current (and tentative) disposition. Thus, they end up with an unwarranted conviction in their final choice or judgment, which may be manifest both as overly optimistic risk estimates and as premature commitment to that final action.

The issue that we address in the remainder of this discussion is the challenge of reducing the desirability bias, especially the emerging form that our empirical work has exposed. We take the challenge of remediation seriously, both per se and for what it may teach us about the underlying phenomenon. Three aspects of this topic are considered: (a) why the familiar form of the bias is not self-corrected; (b) the role of numerical formats, particularly in light of our own results; and (c) two potential tactics for remediation.

3.1. Persistence of the desirability bias

We make two presumptions about the standard desirability bias, that is, when it is driven by a clear prior disposition favoring one outcome. First, people are at least partially aware of the bias, even if such awareness is restricted to perceiving the bias in others. Second, people would be embarrassed to realize that they are distorting information about what will occur in order to accord with what they want to occur, as suggested by Mary Midgley’s head quote. Yet, in spite of awareness and embarrassment, the desirability bias seems to endure, at least for many people in many situations.

There are at least two barriers to self-correction of the desirability bias. First, people are often not aware of it when it occurs. John Stuart Mill observed, ‘‘while everyone knows himself to be fallible, few . . . admit the supposition that any opinion of which they feel very certain may be one of the examples of the error to which they acknowledge themselves to be liable’’ (Mill, 1926, p. 21). That is, awareness of a bias in general may be quite different from recognition at the time it is being exhibited and needs to be corrected. A second reason that the desirability bias endures is that correcting it may require eliminating the information distortion during the decision process that is, at least in part, its source. The latter is made especially difficult because, as our study reveals, people are unaware of such distortion. Thus, limited awareness of the desirability bias as an outcome of a decision process would not seem to translate into awareness of information distortion as it occurs during the decision process. Both causes bring to mind the saying, ‘‘You can’t fix it if you can’t find it’’. The result is the continued presence of a bias that people are at least somewhat aware of and almost certainly reject.
3.2. The value of numbers over words

There is a general belief that the precision of numbers should not only inhibit any self-serving distortion, but more generally encourage objective analysis (e.g., Nakao and Axelrod, 1983; Stone and Dilla, 1994). Why then did the switch to the numerical
format in our study not reduce the observed desirability bias? We believe that the presence of multiple subattributes within the same descriptive paragraph may have provided enough maneuvering room for distortion to occur. Our participants could easily alter either the diagnostic value or the importance weight that they assigned to such subattributes as ‘‘10 inexpensive restaurants’’ or ‘‘none more than a 15-min walk’’. If the emerging disposition pointed toward the hotel described by these two subattributes so that information distortion should make them more positive, individuals might think, respectively, ‘‘lots of choice’’ and ‘‘Good. I’m not going there to hike to dinner!’’. In contrast, if the competing hotel were leading so that desirability/distortion pointed in the opposite direction, decision makers could just have easily thought more negatively, ‘‘How many restaurants do I need for six nights?’’ and ‘‘Okay, but they don’t tell you how many are actually at the hotel—and it does rain there!’’. The point is that if decision makers want to rationalize an evaluation of an attribute one way or the other, it can be easily done by interpreting the information in the subattributes, whether they are numerical or not. Does this suggest that the claimed advantages of numerical representations over verbal ones are illusory? If the advantage is founded on numbers inserting greater analytic rigor into the process of evaluating information, the present results are discouraging. Possibly the advantage is realized, or is more likely to be realized, when the information consists only of a single number. For instance, Kagehiro (1990) shows that in judgments of legal culpability, verbal phrases conveying evidentiary standards (e.g., ‘‘preponderance of evidence’’ and ‘‘clear and convincing evidence’’) have far less effect than the same phrases augmented by a single number (‘‘>50%’’ and ‘‘>70%’’, respectively). In the case of information distortion, in as yet unpublished work, Meloy (personal communication) found that when an attribute consists only of a single number for each alternative, distortion is reduced by about half. In the hotel example, such information for the dining options attribute might look something like, ‘‘Hotel J has 12 restaurants at the hotel itself or within easy walking distance, while Hotel C has 10 such restaurants’’. Alternatively, the attribute could be presented as a rating by experts on a 1-to-7 scale, ‘‘Hotel J’s dining, both at the hotel itself and nearby, earned it a 5.1 from Resorts Magazine’s experts while Hotel C received a 5.3 from the same panel’’. However, we note that even in this sparsest case of a single number for each alternative, information distortion was not eliminated.4 Thus, whatever the ameliorative value of a numerical representation may be, it is not a guaranteed solution to the information distortion that underlies the desirability bias. 3.3. Two ameliorative tactics The two tactics that we discuss deal, respectively, with the cases of the presence and absence of a prior disposition to favor one outcome or alternative. The absence of such a prior disposition is, of course, the phenomenon that distinguishes the present from all prior work on the desirability bias. 3.3.1. Concealment of identity When there is a stable prior disposition, one might try to conceal the identity of the risk that is to be evaluated. That is,
4 Again, it seems that the key to enabling distortion is altering the numerical values themselves or their importance to the decision. For instance, if Hotel J were leading, the clear numerical advantage of 5.3 over 5.1 could be dismissed by thinking that ‘‘they’re practically equal’’. In contrast, when Hotel C is the leader, an easy response is, ‘‘That’s a real advantage for C’’. These quotations are based on actual justifications from pilot data.
maybe the best one can do to reduce the bias is to attempt to disguise the context or identity of the risk, describing it only by its constituent characteristics. This is similar to asking consumers to evaluate a potential new product by describing its attributes or functionalities without naming the brand or manufacturer. The present work suggests that this tactic may have, at best, partial success once a tentative disposition emerges and begins to drive information distortion. As soon as one unit of information has been received, its evaluation is likely to lead to the distortion of the evaluation of the next information. Nonetheless, this bias may be less, possibly substantially less, than when the risk’s identity is known from the start. 3.3.2. Precommitment If concealment is inadequate, how else might we eliminate this emerging desirability bias, or at least reduce it? One potential alternative is to require a precommitment to the information’s value before it appears in the context of a risky decision or judgment. For instance, individuals could be required to evaluate each unit of information (i.e., every attribute description in our study) on a numerical scale prior to any evaluation during a decision. Carlson and Bond (2006) (see also Carlson and Pearo, 2004) have successfully done this in preferential choice. Their work suggests that the same tactic may succeed here as well, though this claim should be empirically verified. 3.4. Conclusion In this paper, we present a study investigating the question of how risk-relevant information is processed during a risky decision or judgment. We uncover one sobering truth about decision making in a risk context: the distortion of information is widespread and seems immune to intuitively obvious safeguards, like presenting risk information in a numerical format. The discussion considered one promising technique for reducing this desirability bias, precommitment to a numerical evaluation. A better understanding of the various mechanisms that underlie information distortion may lead to delimiting its scope and to additional methods for less biased assessment of risk. Acknowledgements This work was supported by Grant #SES-0112039 from the National Science Foundation to the first-listed author. Both authors thank Kurt A. Carlson, Margaret G. Meloy, and Nicholas A. Seybert for their insightful comments on an earlier draft, and Nicole Grospe, Karen Schandler, and Debbie Trinh for their assistance in collecting and analyzing the data. Appendix A.1. Cover story for the evaluation of a single hotel You have just been made the head of your college’s Entrepreneurship Team. Several years ago an alum who had been financially successful as an entrepreneur donated ‘‘seed money’’ to give students the chance to learn how to evaluate investment proposals made by new companies. The money funded investments that were chosen by the Entrepreneurship Team that you now lead. In the early years of the Team, which is really a club, student members were successful in picking investments that were generally able to make a profit. However, during the last two years the performance of the club’s investments has been disappointing. Simply put, too many of the recent investments have failed and led to significant overall losses. If this downward trend continues, the club will have to close down. To give you a last chance to turn around
the club’s investment portfolio, the alum benefactor has allocated an additional $100,000 in discretionary funds to enable you to seek out new investments. The hope is that these new investments will yield profits that enable the club to survive. However, should these new investments fail to yield a sufficient return in time, the benefactor will not provide further funding and the university will have no choice but to close down the club. In response, you form a team whose job is to seek and, especially, to evaluate potential investment opportunities. The two potential investments that have survived the initial screening process are both youth-oriented resort hotels currently in development. You must choose one of them. The choice of youthoriented resort hotels is important because your team feels that they have the necessary knowledge and background to make a good decision. The two properties are being developed in similar communities and have similar construction costs. The real question is their relative chances of commercial success or, in terms of the Entrepreneurship Team’s current situation, the relative risk of financial failure. To evaluate carefully these two development projects, this special team is divided to cover six aspects of resort hotels: beach, location, management, restaurants, rooms, and target market. For each aspect a small subgroup researches and submits its analysis of both investment opportunities. Following good practice, these reports are prepared separately so that each subgroup contributes its analysis independently of the others. After several weeks of research, the six subgroups report their analysis of the two investments. The following pages present summaries of the six reports for the two different hotels. Your task is to evaluate each of the summaries and then decide which hotel you should invest in. As you learn about the hotels, keep in mind that you must decide between them on the basis of financial risk. A.2. Cover story for the binary vehicle choice A younger cousin of yours is about to start college and her parents want to buy her a car. Because of the notoriously high teenage accident rate, her parents are worried about her entering an unsafe driving situation with a friend that has poor driving skills or is untrustworthy. After reading a recent study claiming that there has been a significant increase in the number of students that have been in an accident in the short time they have been driving, her parents were even more insistent that she have her own car. Her parents are most concerned with purchasing a safe and economical car. Together, they sought to gather available information to find the perfect car. After looking through various magazines, including Consumer Reports and Motor Trend, they narrowed down their options to two different possibilities, a sport utility vehicle and a sports sedan. Both are comparably priced. The parents cannot seem to decide between the two. They realize that vehicle safety is not only a function of the vehicle itself, but also of how it is driven, including what kind of driving behavior it invites. Because you are a college student, who rides in cars with other students, and has a sense of how each of these two vehicles will actually be driven, the parents are asking for your opinion in order to help them decide. They have gathered a lot of relevant information from reliable sources about the safety risk of both vehicles. 
They give this to you and ask for your evaluation about which vehicle poses less safety risk to their daughter and, consequently, which one they should buy. (Because they cost essentially the same, the determining factor is only the safety risk.)
References

Bond, S.D., Carlson, K.A., Meloy, M.G., Russo, J.E., Tanner, R.J., 2007. Precommitment bias in the evaluation of a single option. Organizational Behavior and Human Decision Processes 102 (2), 240–254.
Boulding, W., Kalra, A., Staelin, R., 1999. The quality double whammy. Marketing Science 18, 463–484.
Boyle, P.J., Hanlon, D., Russo, J.E., 2009. The Act of Decision Making as a Source of Entrepreneurs’ Unwarranted Confidence. Working Paper. Central Washington University.
Budescu, D.V., Karelitz, T.M., Wallsten, T.S., 2003. Predicting the directionality of probability words from their membership functions. Journal of Behavioral Decision Making 16, 159–180.
Carlson, K.A., Bond, S.D., 2006. Improving preference assessment through pre-exposure to attribute levels. Management Science 52 (3), 410–421.
Carlson, K.A., Pearo, L.K., 2004. Limiting predecisional distortion by prior valuation of attribute components. Organizational Behavior and Human Decision Processes 94, 48–59.
Cohen, B.L., Wallsten, T.S., 1992. The effect of consistent outcome value on judgments and decision making given linguistic probabilities. Journal of Behavioral Decision Making 5, 53–72.
DeJoy, D.M., 1992. An examination of gender differences in traffic accident risk perception. Accident Analysis and Prevention 24 (3), 237–246.
DeKay, M.L., Patino-Echeverri, D., Fischbeck, P.S., 2009a. Better safe than sorry: precautionary reasoning and implied dominance in risky decisions. Journal of Behavioral Decision Making 22, 338–361.
DeKay, M.L., Patino-Echeverri, D., Fischbeck, P.S., 2009b. Distortion of probability and outcome information in risky decisions. Organizational Behavior and Human Decision Processes 109, 79–92.
DeKay, M.L., Stone, E.R., Miller, S.A., Leader-driven distortion of probability and payoff information affects choices between risky prospects. Journal of Behavioral Decision Making (in press).
Fihn, S.D., Callahan, C.M., Martin, D.M., McDonell, M.B., Henikoff, J.G., White, R.H., 1996. The risk for and severity of bleeding complications in elderly patients treated with warfarin. Annals of Internal Medicine 124 (11), 970–979.
Fischer, I., Budescu, D.V., 1995. Desirability and hindsight biases in predicting results in a multi-party election. In: Caverni, J.P., Bar-Hillel, M., Barron, F.H., Jungermann, H. (Eds.), Contributions to Decision Making I. Elsevier Science, New York, pp. 193–211.
Goszczynska, M., Tyszka, T., Slovic, P., 1991. Risk perception in Poland: a comparison with three other countries. Journal of Behavioral Decision Making 4 (3), 179–193.
Hay, J., Shuk, E., Cruz, G., Ostroff, J., 2005. Thinking through cancer risk: changing smokers’ process of risk determination. Qualitative Health Research 15 (8), 1074–1085.
Holyoak, K.J., Simon, D., 1999. Bidirectional reasoning in decision making by constraint satisfaction. Journal of Experimental Psychology: General 123 (1), 3–31.
Kagehiro, D.K., 1990. Defining the standard of proof in jury instructions. Psychological Science 1, 194–200.
Kernic, M.A., Wolf, M.E., Holt, V.L., 2000. Rates and relative risk of hospital admission among women in violent intimate partner relationships. American Journal of Public Health 90 (9), 1416–1420.
Koonce, L., McAnally, M.L., Mercer, M., 2005. How do investors judge the risk of financial items? The Accounting Review 80 (1), 221–241.
Krizan, Z., Windschitl, P.D., 2007. The influence of outcome desirability on optimism. Psychological Bulletin 133 (1), 95–121.
Lord, C.G., Ross, L., Lepper, M.R., 1979. Biased assimilation and attitude polarization: the effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology 37, 2098–2110.
McGuire, W.J., McGuire, C.V., 1991. The content, structure, and operation of thought systems. In: Wyer Jr., R.S., Srull, T.K. (Eds.), Advances in Social Cognition, vol. 4. Erlbaum, Hillsdale, NJ, pp. 1–78.
Meloy, M.G., Personal communication, 1 July 2006.
Meloy, M.G., Russo, J.E., 2004. Binary choice under instructions to select versus reject. Organizational Behavior and Human Decision Processes, 114–128.
Meloy, M.G., Russo, J.E., Miller, E.G., 2006. Monetary incentives and mood. Journal of Marketing Research, 267–275.
Messick, D.M., Bazerman, M.H., 1996. Ethical leadership and the psychology of decision making. Sloan Management Review 37 (2), 9–22.
Mill, J.S., 1926. On Liberty. Routledge, London.
Nakao, M.A., Axelrod, S., 1983. Numbers are better than words: verbal specifications of frequency have no place in medicine. American Journal of Medicine 74, 1061–1065.
Olsen, R.A., 1997. Desirability bias among professional investment managers: some evidence from experts. Journal of Behavioral Decision Making 10 (1), 65–72.
Russo, J.E., Carlson, K.A., Meloy, M.G., 2006. Choosing an inferior option. Psychological Science 17 (10), 899–904.
Russo, J.E., Carlson, K.A., Meloy, M.G., Yong, K., 2008. The goal of consistency as a cause of information distortion. Journal of Experimental Psychology: General 137 (3), 456–470.
Russo, J.E., Medvec, V.H., Meloy, M.G., 1996. The distortion of information during decisions. Organizational Behavior and Human Decision Processes 66 (1), 102–110.
Russo, J.E., Meloy, M.G., Medvec, V.H., 1998. The distortion of product information during brand choice. Journal of Marketing Research 35 (4), 438–452.
Russo, J.E., Meloy, M.G., Wilks, T.J., 2000. Predecisional distortion of information by auditors and salespersons. Management Science 46 (1), 13–27.
Slovic, P., 1966. Cue consistency and cue utilization in judgment. American Journal of Psychology 79, 427–434.
Slovic, P., 1993. Perceived risk, trust, and democracy. Risk Analysis 13, 675–682.
Smits, T., Hoorens, V., 2005. How probable is probably? It depends on whom you’re talking about. Journal of Behavioral Decision Making 18, 83–96.
Stone, D.N., Dilla, W.N., 1994. When numbers are better than words: the joint effects of response representation and experience on inherent risk judgments. Auditing: A Journal of Practice and Theory 13 (Suppl.), 1–19.
Weber, E.U., Blais, A.-R., Betz, N.E., 2002. A domain-specific risk-attitude scale: measuring risk perceptions and risk behaviors. Journal of Behavioral Decision Making 15 (4), 263–290.
Weber, E.U., Hsee, C.K., 1998. Cross-cultural differences in risk perception, but cross-cultural similarities in attitudes toward perceived risk. Management Science 44 (9), 1205–1217.
Williams, A.F., 2003. Teenage drivers: patterns of risk. Journal of Safety Research 34, 5–15.
Yates, J.F., 1990. Judgment and Decision Making. Prentice-Hall, Englewood Cliffs, NJ.
Zeckhauser, R.J., Viscusi, W.K., 1990. Risk within reason. Science 248, 559–564.
The effects of information about health hazards in food on consumers’ choice process

Amir Heiman a,∗, Oded Lowengart b

a The Department of Agricultural Economics and Management, The Hebrew University of Jerusalem, Israel
b The Department of Business Administration, School of Management, Ben-Gurion University of the Negev, Israel

Article history: Available online 11 July 2010
Keywords: D81-Criterion for decision making under uncertainty; Negative information; Choice process; Health hazards
Abstract

This study examines the effects of context (health hazard), direction (positive versus negative) and intensity of information about health hazards on consumers’ choice processes. We propose that choice of frequently purchased food commodities, ceteris paribus, is based on a single dimension—taste. We develop a set of hypotheses regarding the type of choice process to be employed under various types of information and empirically test them in a field experiment design. Our results indicate that a single-dimension choice process is employed under a nonsevere message and a multidimensional process under high-intensity negative information.
1. Introduction The role of information about risk in decision making has been studied in the behavioral literature, predominantly in the context of individuals’ suboptimal choices. Deviations from normative decision outcomes are mainly attributed to the inherent biases in information processing, memorizing and decision making. A large body of research has been devoted to exploring biases in the integration of new information and the effect of these biases on choices. Most of the studies in this general area are primarily concerned with examining the effects of various factors that can affect individuals’ judgments. These factors include, among others, availability of cognitive resources, time pressure, limited information, integration of new information, representativeness and affect. The effects of a single factor or the interactive effects of two (or more) factors on consumer judgment are generally examined in a controlled experimental design. Individual judgment tasks under such manipulations are characterized by a choice among lotteries (i.e., risky) or probability judgments of outcomes for which consumers, in general, have limited knowledge. Testing for differences among individuals in this context is typically carried out by comparing attitude ratings, opinions or likelihood of choices under the different manipulations, with an
∗ Corresponding address: Department of Agricultural Economics and Management, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, P.O. Box 12, Rehovot, 76100, Israel. Tel.: +972 8 9489143; fax: +972 8 9466267. E-mail address: [email protected] (A. Heiman).
doi:10.1016/j.jeconom.2010.07.003
emphasis on the overall utility or evaluations of products by individuals. We offer a different approach for analyzing the role of various types of information on consumer choice by analyzing the consumer choice-decision process and not outcome judgments. Specifically, we are interested in the domain of choices with information on risk. Therefore, our approach differs from the main stream of research in this area in three dimensions. First, we are interested in the process of decision making and in that context, we are aiming to gain additional insight into the effect of information about health risk on this process. Analyzing the saliency of the dimensions (i.e., product characteristics) of the choice process under different information types makes it possible to characterize different choice-process strategies. Furthermore, the emphasis on choice between alternatives enables us to capture the relative competitiveness intensity between products. To this end, a choice model is employed that provides us with diagnostic information about the choice processes and not just a comparison of attitude ratings or similar measures. Second, the research design is a field experiment that allows us to depict a more realistic range of responses from individuals. Third, we study choice of food commodities, a product category that is frequently purchased by consumers and about which they have a relatively high degree of knowledge. Their choice process under ‘‘regular’’ information, therefore, might be based on some ‘‘simple’’ decision rules which may be altered by new product information. 2. Conceptual framework Previous studies that explored aspects of negative information about health hazards on consumer behavior were primarily
concerned with the effect of negative information on attitude and purchase intention through food labeling (Viscusi and O’Connor, 1984; Zuckerman and Chaiken, 1998) and education and nutrition values (Wansink, 2005). Choices can be conceptualized as the outcome of a process (decision making) of selecting the alternative that will yield the highest utility. Resources, therefore, are allocated to the decision task when a consumer thinks that this will lead to a better choice. In frequently purchased commodities such as food in general, and meat products in particular, consumers are generally familiar with the products’ characteristics and will, therefore, minimize the cognitive effort in product choice (Brock and Brannon, 1992). Minimizing cognitive efforts induces the use of a mechanism that simplifies the choice process yet provides a good outcome. This is the basic concept behind Simon’s (1955) theory of bounded rationality that consumers can adopt an array of simplifying heuristics. Choices based on attribute comparisons are less cognitively demanding than product-based comparisons (Tversky, 1972). Choices based on noncompensatory decision rules are less resource demanding than compensatory choice processes (Bettman et al., 1998). The noncompensatory choice process assumes that there is no tradeoff between attributes. That is, high value in one attribute does not compensate for low value in another (Payne et al., 1992). In its essence, a noncompensatory process is a less intensive (shortcut) decision strategy, where a screening process of alternatives that is based on some heuristics (i.e., conjunctive, disjunctive, lexicographic1 ) is used (Gilbride and Allenby, 2004). Formally, consumers employ a screening rule that screens out all the alternatives that do not meet a specific criterion. Let xij denote the level of attribute i (i = 1 · · · n) in alternative j, and γi the threshold level set for attribute i. The disjunctive screening rule is defined by
∏n I(xjn > γn) = 1,  with I(xjn > γn) = 1 if xjn > γn and I(xjn > γn) = 0 otherwise
(see Gilbride and Allenby (2004)). Choices that are based on comparing alternatives on a single attribute are a special case of disjunctive screening rules if all the alternatives pass the threshold levels on all attributes. Employing a single-attribute choice rule, which is not compensatory by definition, is consistent with the notion of utility maximization if the perceived differences between the alternatives on the other attributes are small, the importance of the other attributes is small, or both (see Kohli and Jedidi, 2007, for a formulation of these conditions for the lexicographic rule). In the case of a noncompensatory choice strategy that is based on a single attribute, the probability that product k will be chosen is given by

Pr(k) = Pr(x1k + εk > x1j + εj, ∀ j ∈ J)

where x1k represents the deterministic evaluation of the first and most important attribute of product k and εk denotes the random
1 Conjunctive rule—a screening rule where the alternative is accepted if and only if it passes the thresholds on all the relevant attributes. The disjunctive screening rule requires that the alternative pass a threshold level on one attribute. The lexicographic choice rule screens alternatives on the most important attribute, and the alternative that yields the highest value on that attribute is chosen. Eliminating by aspects (Tversky, 1972) combines lexicographic and threshold strategies (i.e., screening of alternatives on the most important attribute, and eliminating alternatives that do not pass the cutoff level). Lexicographic choice strategy is consistent with the conjunctive rule (Gilbride and Allenby, 2004). It is more likely to be employed by consumers when the choice task is between products (e.g., meat commodities), and not brands (Hoyer and MacInnis, 2004, p. 230).
noise. The choice probability is, therefore, represented by

Pr(k) = ∫_{−∞}^{+∞} [ ∫_{−∞}^{x1k − x1j + εk} f(εj) dεj · · · ∫_{−∞}^{x1k − x1J + εk} f(εJ) dεJ ] f(εk) dεk

where J is the number of alternatives considered.

In contrast, the compensatory choice process assumes a tradeoff between attributes. Formally, the probability that a consumer will choose alternative k is given by

Pr(k) = Pr(Vk(x1k · · · xnk) + εk > Vj(x1j · · · xnj) + εj, ∀ j ∈ J)

where Vk denotes the utility derived from the consumption of alternative k, and J denotes the consideration set. Vk is a function of the vector x1k · · · xnk that denotes the perception of the i = 1 · · · n attributes of alternative k. An additive utility function is an example of a compensatory choice rule (Keeney and Raiffa, 1976, p. 295; Roberts and Lattin, 1991), i.e., Vk(x1k · · · xnk) = Σ_{i=1}^{n} wi xik. Multi-attribute choice processes that are based on the sum or weighted sum of attribute ratings are considered to yield a good prediction of the choice task (e.g., Lynch, 1985). In this case, the probability that alternative k will be chosen is defined by

Pr(k) = ∫_{−∞}^{+∞} [ ∫_{−∞}^{Σ_{i=1}^{n} wi xik − Σ_{i=1}^{n} wi xij + εk} f(εj) dεj · · · ∫_{−∞}^{Σ_{i=1}^{n} wi xik − Σ_{i=1}^{n} wi xiJ + εk} f(εJ) dεJ ] f(εk) dεk

where J is the number of alternatives considered.
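To make the contrast between these two choice rules concrete, the sketch below approximates the choice probabilities by Monte Carlo simulation over the random terms, once for a single-attribute (noncompensatory) rule and once for an additive compensatory rule. It is an illustrative sketch only; the attribute values, weights, and the Gumbel error distribution are assumptions for the example and are not taken from the paper.

```python
# Hypothetical sketch contrasting the two choice rules formalized above.
# Rows of X are alternatives, columns are attribute perceptions x_ik; w are weights.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[7.0, 4.0, 5.0],     # alternative 1 (e.g., strong on the first attribute, taste)
              [6.0, 6.0, 6.0],     # alternative 2
              [5.0, 7.0, 6.5]])    # alternative 3
w = np.array([0.5, 0.3, 0.2])      # attribute weights for the compensatory rule
n_draws = 100_000

def choice_shares(deterministic_values):
    """Monte Carlo estimate of Pr(k) when utility = deterministic value + iid noise."""
    noise = rng.gumbel(size=(n_draws, len(deterministic_values)))
    winners = np.argmax(deterministic_values + noise, axis=1)
    return np.bincount(winners, minlength=len(deterministic_values)) / n_draws

# Noncompensatory single-attribute rule: compare alternatives on attribute 1 only.
print("single-attribute rule:", choice_shares(X[:, 0]))

# Compensatory additive rule: V_k = sum_i w_i * x_ik.
print("compensatory rule:    ", choice_shares(X @ w))
```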
The compensatory choice process requires more cognitive effort, as it is composed of several inter-related aspects: collecting and integrating information on multiple product attributes across all alternative products being considered, weighing the different attributes according to their subjective contribution to utility, computing the utility of each alternative, and choosing the alternative that yields the highest utility. In low-involvement choice tasks such as frequently purchased, undifferentiated commodities (e.g., meat products), purchases are made with low cognitive effort (Wansink, 2005; Brock and Brannon, 1992). Compensatory processes, therefore, are less likely to occur and consumers will try to reduce the cognitive load by adopting a noncompensatory heuristic rule (see Bettman et al., 1998, for an extensive review).

An important issue to consider is the number of attributes used in the choice process. We draw analogously from studies on products’ choice sets (e.g., Roberts and Lattin, 1991; Hauser and Wernerfelt, 1990) and suggest that an attribute will be employed in the choice process if its inclusion adds more to utility than the cost of collecting additional information (on the attribute) and processing it. Bryant (2007) argued that in most low-involvement decision-making processes, the choice between alternatives is based on the attribute that guarantees the greatest distinctive criterion between the alternatives. It is logical to expect that a choice process based on a single attribute will minimize the cognitive resources and, therefore, will be employed in a purchase decision of frequently purchased products without new information. Taste is a more observable attribute than health in such product categories and, therefore, can provide a greater distinctive value in evaluating alternative products. Shepherd and Stockley (1985) and Shepherd and Towler (1992) argue that consumers’ experience with (and valuation of) food products is shaped by sensory attributes, in particular taste. In that context, Koivisto and Sjóden (1997) argue that taste is a good explanatory variable for food choices. ‘‘. . . some researchers suggest that taste, in particular, is the only criterion used when deciding whether to purchase a
particular food’’ (Holm and Kildevang, 1996; Moskovich et al., 2005, p. 9). Based on the above we postulated that
H1: With no new information, the choice of repeatedly purchased food commodities will be based solely on the taste attribute. Now suppose that consumers are exposed to new information about possible health hazards associated with the consumption of a meat product that was previously considered to be healthy. Integration of new and unfavorable information with previous favorable beliefs about a product is supposed to reduce product evaluation. The reduction in evaluation will occur even if the new information is perceived to be inaccurate or unreliable, since individuals cannot completely ignore it. As Tybout et al. (1981) argue, ‘‘these thoughts are less positive than those that could have been retrieved in the absence of the rumor’’ (p. 74). Low cognitive effort, peripheral processing and a message that contradicts previously held beliefs may cause ambiguity. Ambiguity, in turn, encourages biased processing (Chaiken and Maheswaran, 1994). A cognitive dissonance explanation would argue that the conflict between previously held beliefs and new information may generate dissonance2 (Festinger, 1957). Feelings of dissonance can be reduced by decreasing the importance of the elements that might be causing this inconsistency. In the case of low-intensity information, an efficient way of doing this would be to ignore or reject the new information (Petty et al., 1997). This is also consistent with the self-affirmation framework (Petty et al., 1997) and defense motivation3 (Zuckerman and Chaiken, 1998). The literature suggests alternative explanations for why individuals deviate from the Bayesian rule in their judgments of risk or probability and information integration (e.g., Hertwig and Todd, 2000). These explanations include overweighting of new information when it is perceived to more accurately represent the situation consumers need to judge (e.g., Kahneman and Tversky, 1972), underweighting new information, amplifying priors, sticky priors (e.g., Bar-Hillel, 1980), overconfidence and biases (e.g., Alba and Hutchinson, 2000). Most of the work done in this area has focused on examining the effect of manipulated stimuli (i.e., new information and priors) on a specific outcome (i.e., choice of lotteries, rating of alternatives). Other studies have also included mediating variables to explain individuals’ judgments. However, a scarcity of research exists into the consumers’ ‘‘black box’’ that contains the decision process itself. This study aims at uncovering the effect of information about health hazards in food on choice processes. Identifying decision rules (e.g., compensatory) might facilitate better judgment of partial effects of information integration and the conditions leading to biased integration. The basic premise in this study is that information intensity (severity) will determine the choice strategy used by consumers. When the new information about health is perceived to be mild (nonacute), and it concerns an attribute that was not salient in the initial choice process, individuals will not ‘‘waste’’ scarce cognitive resources to change their decision process. Individuals will use a more demanding choice strategy when they have sufficient motivation, ability and opportunity to do so. Motivation may be increased when the new information is very damaging (Zuckerman and Chaiken, 1998). Fig. 1 illustrates the effects of information with various severities, valence, and strength of prior beliefs on the choice process.
Based on the above discussion, we propose Hypothesis (2), which deals with the effect of nonacute information on the choice process.

H2: Low-intensity messages, either positive or negative, about health hazards in low-involvement food products will not affect choice. Thus, choice will still be based solely on the taste attribute of the product.

When consumers are exposed to a high-intensity information message, which is threatening and therefore cannot be discounted, they will abandon the heuristic decision-making strategy and engage in systematic processing (Chaiken, 1980). Since the prior was favorable and the posterior very unfavorable, a potential conflict may be created, increasing the likelihood of employing a more complex decision process. Engaging in a complex decision-making process will increase the likelihood of using some sort of weighted-sum (multi-attribute) decision process (Bettman et al., 1998). Negative feelings arise in choice situations that require individuals to engage in a tradeoff between important attributes, such as health and taste in the case of food products. Sacrificing the utility derived from taste (pleasure) to gain health, or vice versa, will increase the likelihood of employing a multi-attribute choice process (Luce et al., 1997). This notion is consistent with Coupey (1994), who argued that the process of decision making may vary according to circumstances and new information (learning). The next hypothesis summarizes the above discussion:
2 Cognitive dissonance occurs when there is a discrepancy between beliefs, attitudes and values in light of new information or experience that call prior beliefs into question. The discrepancy causes psychological discomfort and, as a result, adjustments are made to reduce the discrepancy. 3 Self-affirmation and defense motivation are psychological mechanisms that are operationalized to defend beliefs and internal consistency.
H3: A high-intensity, negative message will increase consumers involvement, leading them to increase the cognitive resources allocated to the choice procedure. This will result in employing a multiattribute decision in which health will be a salient variable. Hypotheses (2) and (3) suggest that more alarming (intense negative) information on the health hazards of food commodities will increase the cognitive resources allocated to the choice process, while information that is not severe will not change the decision-making process. The purpose of this study is not to gain a better understanding of the effect of information intensity on consumers’ perceptions of attributes, or on the potential changes in market shares. We, therefore, do not construct formal hypotheses concerning the relationships or effects of different information types on such aspects, but rather develop hypotheses about the choice process itself. Naturally changes in the choice process will result in changes in shares of the commodities or perceptual discrimination between attributes, but this is a matter for future research. 3. Research design A between-subject design was used in a field experiment that allowed detection of variations in consumer decision making in food commodities as a function of different types of information on health risks that differ in their severity and valence (positive versus negative). There were four different groups of respondents: a control group and three manipulation groups. Information on health hazards resulting from the consumption of chicken were manipulated in terms of their severity and valence. The first manipulated group was exposed to positive information (reduction of health hazards); the second group was exposed to mild negative information, and the third group received aggressive negative (severe) information. Interviews were held in the meat departments of similar stores of a large supermarket chain.4 A photocopy of an article with an unrelated story about a large dairy producer that intends to launch a new line of ready-to-eat ethnic salads was given to the interviewees. At the top of the page with the unrelated story, we inserted a manipulated short article
4 Collaboration with the second largest supermarket chain in Israel facilitated the interviews in the meat department.
Fig. 1. The effects of information on the choice process.
about health hazards resulting from the consumption of chicken. It included a report describing lab findings on poultry and ready-to-eat foods aimed at tracing residual vaccines and antibiotics. For each of the three experimental groups, this part was changed according to the assigned manipulation as follows. The first version dealt with improvements in chicken quality and was manipulated through inclusion of a report claiming that chicken growers could now comply with any health criteria imposed by the European Common Market. This was claimed to be due to new breeding technology, which eliminated the need to give antibiotics to chickens during their last month of rearing. The message was framed as a reduction in losses rather than as a gain in profit. This was done to increase the ambiguity of the message and to increase the likelihood that the message would not be perceived as negative. The second version reported that only small traces of hormones and antibiotics had been found in a few chickens. It was made to sound like a sporadic finding that did not reflect information regarding most chickens. The third group received a report in which the main findings indicated that antibiotics had been found in about 60% of the sampled chickens (aggressive negative). Furthermore, the article stated that the antibiotics in question were permitted under the strict European Common Market criteria only if the chickens were not treated in the last month of rearing.5 We chose to depart from the research design of Johnson and Slovic (1995) and not to present
5 The three manipulations were pre-tested in focus groups composed of students, held at a large Israeli university. We presented the three scenarios to each group and asked participants to talk about their feelings after reading the reports. The negative reports were indicated to be alarming, and the judgments of severity were in the predicted direction. Judgments about the positive news (health improvement) were mixed: while some thought that consumption was safer, others indicated that they had had no idea they were eating something that could be hazardous. A total of three focus groups were conducted.
a specific risk measure (probability of death or illness) or a range of probabilities, which would increase uncertainty. This is mainly due to the findings of Ofir (1988) and Lynch and Ofir (1989) that perceptions of probabilities are sensitive to the relative weight of the base and the case. Additionally, consumers are exposed in real life to a similar form of information; thus our design enhances the external validity of the findings. The control group received the same questionnaire, but the manipulated part dealing with poultry was omitted. Each interviewer received a package containing a random assortment (i.e., manipulation) of information. In the field experiment, respondents were asked to express their opinion on the new product line that was not related to our study. Immediately afterwards, they were asked to participate in a study on meat-purchasing behavior and, in return, would receive a pair of quality Italian pantyhose worth about $25. The interviewer read the questions and recorded the answers. The interview took about 20 min. A total of 330 participants were interviewed for this study,6 80 in each of the manipulated groups and 90 in the control group. Consumers were asked to rate six meat commodities: chicken, turkey, beef, and ready-to-eat chicken, turkey and beef, on 10 attributes using a five-point Likert-type scale. The ready-to-eat products were used to capture the possible substitution effect resulting from the negative information about the meat. The following product attributes were rated: taste, health (positive direction), ease of preparation, price fairness (positive direction), diverse ways of cooking, unhealthiness (negative direction), low fat content, ease of preparation, taste (negative direction), and low cost.7 Evaluations with respect to taste and health were collected using two bipolar-scale questions: one with
6 Five interviewees declined to complete the questionnaire and six declined to participate, yielding a response rate of 96%. 7 See Just et al. (2002).
a positive statement and the second with a negative one.8 Fat content was separated from the health attribute because it has been found that certain segments of the population perceive fat in meat as a sign of quality and do not associate it with health. These 10 product-attribute ratings on all six product alternatives were used to explain consumer choice among the six meat products. Respondents were then asked to indicate how much of each of the six meat commodities they would choose in their next 10 shopping trips (the dependent measure).
4. Empirical analysis
4.1. Choice model
As our interest is in identifying the choice process given different information types, we employed a two-stage analysis. The first stage consisted of reducing the dimensionality of the data using principal components-based factor analysis, identifying the underlying factors, and using the new dimensions as input to the next stage of the analysis. The analysis yielded a three-factor solution, as presented in Table 1 (detailed results are reported in the Appendix). As can be seen from Table 1, the dimensions have similar meanings across groups but vary in terms of explained variance. This indicates that the different types of information affected the structure of consumer perceptions.

Recall that our study focuses on the effect of the valence and severity of information on decision making and not on building a set of assumptions about the evaluation of product attributes (e.g., health) in light of such information. We did, however, use the health-related factor values to examine how health perceptions varied across the four groups by factor analyzing the perceptions of chicken (the manipulated product) across all groups. In the case of negative information about health hazards, one would expect a monotonic relationship, i.e., the greater the severity of the message, the higher the rating of the nonhealthy attribute. However, following Tybout et al. (1981), a positive message may reduce the health perception (i.e., a higher rating on the nonhealthy attribute). Message severity may increase the perception of risk, but beyond a certain point a stronger (negative) message may be discounted (Lynch and Ofir, 1989), resulting in a counter-effect leading to a U-shaped relationship. Such U-shaped patterns have been found in studies exploring the relationship between fear and willingness to adopt protective behavior (e.g., McGuire, 1968). We employed a quadratic polynomial contrast test, as this allows the detection of relationships among the group means (instead of a simple main-effect test as in ANOVA) and a verification of these differences. The mean values of the health dimension for the control, positive, mild negative, and aggressive negative groups were 0.3544, 0.0057, −0.2852, and −0.1237, respectively, and the contrast test was significant at the 0.05 level. A contrast test for differences between the control group and the average of all manipulated groups was significant at the 0.001 level. Similar results were obtained for the differences between the control group and the positive-information group (0.05) and between the control and the negative groups (0.001). Taken together (Table 1, the contrast tests, and the pre-test), these results indicate that variation in the type of information affected respondents' health evaluations and therefore provide face validity for this relationship.

The second stage of the analysis aimed at estimating the probability of choice from the set of alternative substitute products, given the manipulation of information, using a choice model (a multinomial Logit). Under this framework, we are interested in obtaining diagnostic information about the choice process. As such, if only one factor (taste) out of the three is significant without additional information, Hypothesis (1) is supported. Hypothesis (2) is supported if exposure to mild negative or positive information does not affect the choice process, i.e., the only significant factor is still the taste aspect of the product. Hypothesis (3) suggests that at least two factors should be salient in the choice process in light of exposure to negative information; namely, the health and taste aspects of the product should be significant.
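A minimal sketch of such a quadratic polynomial contrast is given below. The group sizes and target means mirror the numbers reported above, but the simulated draws, the unit standard deviation, and the function name are assumptions made purely for illustration; the printed output is not the study's result.

```python
import numpy as np
from scipy import stats

# Hypothetical factor-score samples; sizes and means mirror the study
# (90/80/80/80; 0.3544, 0.0057, -0.2852, -0.1237), the rest is assumed.
rng = np.random.default_rng(0)
samples = [
    rng.normal(0.3544, 1.0, 90),    # control
    rng.normal(0.0057, 1.0, 80),    # positive
    rng.normal(-0.2852, 1.0, 80),   # mild negative
    rng.normal(-0.1237, 1.0, 80),   # aggressive negative
]

def contrast_test(samples, coefs):
    """t-test for an a-priori contrast over independent group means."""
    means = np.array([s.mean() for s in samples])
    ns = np.array([len(s) for s in samples])
    df = ns.sum() - len(samples)
    mse = sum(((s - s.mean()) ** 2).sum() for s in samples) / df  # pooled variance
    estimate = coefs @ means
    se = np.sqrt(mse * (coefs ** 2 / ns).sum())
    t = estimate / se
    return estimate, t, 2 * stats.t.sf(abs(t), df)

# Orthogonal quadratic contrast for four ordered groups (U-shape check).
print(contrast_test(samples, np.array([1.0, -1.0, -1.0, 1.0])))
```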
8 Cacioppo and Berntson (1994) recommend using a bipolar scale in situations in which it is suspected that good and bad are not opposites.
Following the identification of the underlying dimensions involved in purchasing such products, we applied the multinomial Logit (MNL) choice model with the factor scores from the previous stage (see Gensch and Ghose, 1992, for an application of this method). The MNL model is a simultaneous compensatory attribute choice model that incorporates the concepts of thresholds, diminishing returns to scale and saturation levels (McFadden, 1974). Let $U_{ij}$ be the utility of alternative product $j$ for customer $i$, and $m$ the number of alternative products. The utility function can be separated into a deterministic component $V_{ij}$ (measured in terms of the perceived value associated with the characteristics of the products) and an unobserved random component $\varepsilon_{ij}$, which is assumed to be independently and identically distributed, such that

$$U_{ij} = V_{ij} + \varepsilon_{ij}. \qquad (1)$$

The $\varepsilon_{ij}$ are assumed to follow a type I extreme value (Gumbel) distribution, and thus the probability that alternative product $j$ out of the $m$ alternatives is chosen by customer $i$ is

$$P_{ij} = \frac{\exp(V_{ij})}{\sum_{k=1}^{m} \exp(V_{ik})}. \qquad (2)$$
4.2. Utility specification

The deterministic component of the utility function is the weighted sum of the three factors identified in Table 1 plus a set of product-specific components, i.e.,

$$V_{ij} = \alpha_1 F1_{ij} + \alpha_2 F2_{ij} + \alpha_3 F3_{ij} + \alpha_4 PS_A + \alpha_5 PS_B + \alpha_6 PS_C + \alpha_7 PS_D + \alpha_8 PS_E \qquad (3)$$

where $Fk_{ij}$ is respondent $i$'s perception of factor (product dimension) $k$, measured through the factor scores, for product alternative $j$, with $k = 1, 2, 3$ (health, taste-value, and convenience); $PS_A, \ldots, PS_E$ are product-specific variables capturing the idiosyncratic effects of the product alternatives $j = 1, \ldots, 6$;9 and $\alpha_1, \ldots, \alpha_8$ are the parameters to be estimated, such that $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the product-dimension coefficients and $\alpha_4, \ldots, \alpha_8$ are the product-specific variables' coefficients. The chosen strategy is identified through the significance of each of the three factor coefficients $\alpha_1$, $\alpha_2$, $\alpha_3$: if only one of them is significant we obtain an indication that, among other things, a noncompensatory strategy has been adopted, and if more than one is significant, a weighted-sum (compensatory) strategy has been used.
9 Certain product-specific variables, or other variables shared by all alternatives that were not explicitly accounted for in this study, may add to the predictive power of the model. These product-specific variables capture the idiosyncratic effects of the product (see, for example, Guadagni and Little, 1983). To avoid singularity, only m − 1 product-specific variables are included in the model.
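To make the estimation step concrete, the sketch below maximizes the conditional-logit log-likelihood implied by Eqs. (2)–(3) by numerical optimization. The array shapes, the simulated factor scores and choices, and all variable names are assumptions for illustration; they are not the study's data or code.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, m = 330, 6                                   # respondents, alternatives (assumed)
factors = rng.normal(size=(n, m, 3))            # simulated factor scores F1-F3
choices = rng.integers(0, m, size=n)            # simulated chosen alternative

# Product-specific dummies for the first m-1 alternatives (last one is the base).
dummies = np.tile(np.eye(m)[:, :m - 1], (n, 1, 1))
X = np.concatenate([factors, dummies], axis=2)  # (n, m, 8) design array

def neg_log_likelihood(alpha):
    v = X @ alpha                               # deterministic utilities, Eq. (3)
    v -= v.max(axis=1, keepdims=True)           # numerical stabilization
    log_p = v - np.log(np.exp(v).sum(axis=1, keepdims=True))  # Eq. (2) in logs
    return -log_p[np.arange(n), choices].sum()

result = minimize(neg_log_likelihood, np.zeros(X.shape[2]), method="BFGS")
print(result.x)  # alpha_1..alpha_3 (factor coefficients), alpha_4..alpha_8 (dummies)
```

With the actual survey data in place of the simulated arrays, it is the significance of the first three coefficients (e.g., from standard errors based on the inverse Hessian of the log-likelihood) that separates a one-factor lexicographic pattern from a compensatory one, as described in Section 4.2.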
Table 1
Factor analysis results.a

Factor                     Control        Chicken positive   Chicken negative   Chicken negative aggressive
F1                         Taste-value    Taste-value        Taste-value        Health-value
F2                         Health         Convenience        Convenience        Taste
F3                         Convenience    Health             Health             Convenience
% of variance explained    64.34%         58.54%             60.96%             61.62%

a The loading matrix analyses for the four different treatments are presented in the Appendix, Tables A.1–A.4.
Table 2
MNL coefficients (control and the three valence manipulations).

Attribute      Control coefficient   Positive coefficient   Negative coefficient   Aggressive negative coefficient
Health         0.3046                0.2833                 0.1134                 0.3547**
Taste-Value    0.8009*               0.6507*                0.5164*                0.8013*
Convenience    0.0426                0.2496                 0.4336*                0.4019**
PS1            0.535364              1.4328                 1.8643                 1.4172
PS2            0.386713              0.8777                 1.5376                 0.8743
PS3            −0.0631673            0.7076                 1.2411                 0.8909
PS4            0.186251              0.3301                 0.0725                 0.3825
PS5            −0.250163             −1.4402                −0.5430                −0.6218

* Significant at p < 0.05.
** Significant at p < 0.10.
4.3. Results

The results of the empirical analysis are presented in Table 2 and enable us to test our set of hypotheses. It can be seen that the control group employed a noncompensatory choice strategy. Respondents in this group chose their products using a noncompensatory, lexicographic-type decision rule, in which only one dimension is relevant (i.e., significant). Further, it can be seen that the relevant dimension in this noncompensatory choice process is the taste-value dimension. Thus, we find support for our first hypothesis, H1.

Next, we examine Hypothesis 2, which relates to the effect of new, mild-intensity (nonsevere) information (positive or negative) on the choice process. The results in Table 2 indicate partial support for this hypothesis. Specifically, it can be seen that the new positive information did not result in a change in the choice process, which is still of a lexicographic type. Thus, we find support for the first part of our second hypothesis, H2. In the case of low-intensity negative information, we find partial support for Hypothesis (2): the taste dimension is still salient in the choice process, but the convenience of preparation becomes salient as well. The health dimension is still not salient in this case. These results indicate that our conceptual development is still valid. One possible explanation for the lack of full support for this hypothesis in the case of low-intensity negative information, as compared to positive information, is that these messages were not exactly equal in strength, and the negative information was perceived to be somewhat stronger than the positive one in absolute terms. Another possible explanation is the asymmetry in consumer evaluation of positive and negative deviations from common reference points (Kahneman and Tversky, 1979). This idea bears exploring in future research.

A strong unfavorable message caused individuals to base their choice on all three factors. As a result, the choice process was compensatory in nature, and the health aspects of the product became salient. That is, we find support for Hypothesis (3). In general, the product-specific dummy variables are in the right order of the aggregated market shares, thereby lending face validity to our empirical findings.

Our results suggest that the purchasing decision for frequently purchased meat products is characterized by low involvement and low cognitive-effort choice processes. With low cognitive effort and in the absence of information, consumers make their choice based on one attribute: taste.
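To illustrate how the coefficients in Table 2 translate into choice probabilities through Eq. (2), the sketch below plugs the aggressive-negative column into the utility specification for one hypothetical respondent; the factor scores and the ordering of the product-specific dummies are invented for illustration and are not estimates from the data.

```python
import numpy as np

# Aggressive-negative column of Table 2: health, taste-value, convenience,
# followed by the five product-specific dummy coefficients (PS1-PS5).
alpha = np.array([0.3547, 0.8013, 0.4019, 1.4172, 0.8743, 0.8909, 0.3825, -0.6218])

# Hypothetical factor scores (health, taste-value, convenience) for the six
# meat alternatives of a single respondent; invented for illustration only.
factors = np.array([
    [-0.8, 0.9, 0.2],
    [-0.2, 0.5, 0.1],
    [0.1, 0.6, -0.3],
    [-0.5, 0.3, 0.8],
    [0.0, 0.2, 0.7],
    [0.2, 0.3, 0.6],
])
dummies = np.vstack([np.eye(5), np.zeros(5)])            # sixth alternative is the base
V = np.concatenate([factors, dummies], axis=1) @ alpha   # Eq. (3)

P = np.exp(V - V.max()) / np.exp(V - V.max()).sum()      # Eq. (2), stabilized
print(P.round(3))
```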
As new information is presented to the consumers, the following occurs. A positive, low-intensity message about a health hazard does not affect the choice process; it continues to be based on the taste attribute. In the case of low-intensity negative information, consumers change their decision process and utilize a more complex and resource-demanding process; however, they continue to ignore the health aspects of the product. The finding that taste and convenience are salient in the choice process, while health is still not significant, might indicate that consumers hesitate to face two conflicting goals, health and taste. The exclusion of the health dimension and the inclusion of the convenience dimension in the decision process might reflect consumers' problem-solving strategy, i.e., resorting to a nonthreatening dimension to avoid conflict. When the health message is framed as being of high negative intensity, individuals use a compensatory choice model. In this case, consumers decided that they could no longer ignore the health-hazard information, and it becomes salient in the choice process. When the message is strong enough, the individual will use all three attributes, requiring high cognitive effort.

5. Conclusions

Our study focused on the effect of new information about health hazards on the decision-making process of purchasing food products. Unlike previous studies that explored the effect of information on attitude or share of consumption, our study is aimed at exploring the effect of information on the choice process. We argue that the decision-making process depends on the perception of the health risk. Before receiving new information on health, the choice of frequently purchased meats is based on a single attribute, taste. New information that contradicts previously held beliefs about the healthiness of a certain food increases uncertainty about the optimality of a choice process that is based on taste alone, without taking the health attribute into account. Our study implies that decision making on purchases of frequently purchased food is information-dependent, and the difference in strategies (i.e., single versus multi-attribute choice) is based on the severity of the message. High-intensity negative information creates a gap between the strong prior beliefs about the healthiness of the meat and the new information that creates doubts about this belief. Such information causes consumers to base their choice process on all the relevant choice attributes: taste, value, health and convenience (ease of cooking). Nonsevere (mild) negative information causes consumers to base their choice on two of the three factors: taste and convenience. We found that positive information (i.e., a message of improvement) did not affect the choice process relative to the control group. Future research might examine whether similarities exist between positive information and rumor denial, as happens in many cases in the food industry. Additional studies might investigate other types of manipulation to examine their effects on the choice process (e.g., consumers' health concerns, optimism, mood, uncertainty).

Acknowledgements

Support from the Israeli Poultry Growers Board, the Davidson Center for Agribusiness, and the Center for Research in Agricultural Economics is gratefully acknowledged. We wish to thank Shai Danziger, Moshe Givon, David R. Just, Chezy Ofir, and Yacov Tsur for helpful comments.
Appendix. Factor analysis results

See Tables A.1–A.4.

Table A.1
Rotated component matrix—control.

Attribute                  Factor 1   Factor 2   Factor 3
Taste                      0.870      −0.051     −0.039
Healthiness                0.483      0.649      −0.046
Easy to prepare            0.042      −0.065     0.896
Value for money            0.719      0.373      0.125
Diverse ways of cooking    0.547      0.384      −0.338
Good to eat frequently     −0.278     −0.678     0.031
Not fatty                  0.273      0.766      −0.025
Quick cooking              −0.043     0.052      0.879
No artificial flavor       0.681      0.358      0.030
Inexpensive                −0.029     0.634      0.018

Table A.2
Rotated component matrix—positive information.

Attribute                  Factor 1   Factor 2   Factor 3
Taste                      0.698      0.007      0.003
Healthiness                0.506      0.004      0.609
Easy to prepare            0.008      0.890      −0.003
Value for money            0.676      0.232      0.224
Diverse ways of cooking    0.565      −0.270     0.261
Good to eat frequently     0.006      0.104      −0.786
Not fatty                  0.314      0.214      0.652
Quick cooking              −0.007     0.918      0.008
No artificial flavor       0.741      −0.151     0.001
Inexpensive                0.645      0.247      0.161

Table A.3
Rotated component matrix—mild negative information.

Attribute                  Factor 1   Factor 2   Factor 3
Taste                      0.619      0.192      0.182
Healthiness                0.447      0.164      0.652
Easy to prepare            0.009      0.866      0.006
Value for money            0.741      0.226      0.009
Diverse ways of cooking    0.670      −0.302     0.001
Good to eat frequently     0.009      0.006      −0.822
Not fatty                  0.388      0.007      0.597
Quick cooking              0.117      0.868      0.002
No artificial flavor       0.473      0.005      0.331
Inexpensive                0.710      0.139      0.109

Table A.4
Rotated component matrix—aggressive negative information.

Attribute                  Factor 1   Factor 2   Factor 3
Taste                      0.161      0.675      0.179
Healthiness                0.709      0.306      0.163
Easy to prepare            0.005      0.110      0.905
Value for money            0.557      0.451      0.211
Diverse ways of cooking    0.134      0.742      −0.152
Good to eat frequently     −0.769     0.229      0.108
Not fatty                  0.601      0.425      0.139
Quick cooking              0.107      0.002      0.914
No artificial flavor       0.131      0.704      0.128
Inexpensive                0.584      0.321      0.005
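Rotated component matrices of the form shown in Tables A.1–A.4 can be reproduced, in outline, by a principal-components extraction followed by a varimax rotation. The sketch below does this on a simulated 330 × 10 ratings matrix; the data are random, so only the mechanics (three retained components, rotation, share of variance explained) mirror the paper, and the output is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated 5-point ratings: 330 respondents x 10 attributes for one product.
ratings = rng.integers(1, 6, size=(330, 10)).astype(float)

def varimax(loadings, tol=1e-6, max_iter=100):
    """Varimax rotation of a p x k loading matrix (Kaiser, 1958)."""
    p, k = loadings.shape
    R = np.eye(k)
    var_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        )
        R = u @ vt
        if s.sum() < var_old * (1 + tol):
            break
        var_old = s.sum()
    return loadings @ R

# Principal-components extraction from the correlation matrix.
corr = np.corrcoef(ratings, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:3]                  # keep three components
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

rotated = varimax(loadings)                            # rotated component matrix
explained = eigvals[order].sum() / eigvals.sum()       # share of variance explained
print(np.round(rotated, 3), round(100 * explained, 2))
```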
References

Alba, J.W., Hutchinson, J.W., 2000. Knowledge calibration: what consumers know and what they think they know. Journal of Consumer Research 27, 123–157.
Bar-Hillel, M., 1980. The base-rate fallacy in probability judgments. Acta Psychologica 44, 211–233.
Bettman, J.R., Luce, M.F., Payne, J.W., 1998. Constructive consumer choice processes. Journal of Consumer Research 25, 187–217.
Brock, T.C., Brannon, L.A., 1992. Liberalization of commodity theory. Basic and Applied Social Psychology 13, 135–144.
Bryant, D.J., 2007. Classifying simulated air threats with fast and frugal heuristics. Journal of Behavioral Decision Making 20, 37–64.
Cacioppo, J.T., Berntson, G.G., 1994. Relationship between attitudes and evaluative space: a critical review, with emphasis on the separability of positive and negative substrates. Psychological Bulletin 115, 401–423.
Chaiken, S., 1980. Heuristic versus systematic information processing and the use of source versus message cues in persuasion. Journal of Personality and Social Psychology 39, 752–766.
Chaiken, S., Maheswaran, D., 1994. Heuristic processing can bias systematic processing: effects of source credibility, argument ambiguity, and task importance on attitude judgment. Journal of Personality and Social Psychology 66, 460–473.
Coupey, E., 1994. Restructuring: constructive processing of information displays in consumer choice. Journal of Consumer Research 21, 83–99.
Festinger, L., 1957. A Theory of Cognitive Dissonance. Row, Peterson, Evanston, IL.
Gensch, D.H., Ghose, S., 1992. Elimination by dimensions. Journal of Marketing Research 29, 417–430.
Gilbride, T.J., Allenby, G.M., 2004. A choice model with conjunctive, disjunctive, and compensatory screening rules. Marketing Science 23, 391–406.
Guadagni, P.M., Little, J.D., 1983. A logit model of brand choice calibrated on scanner data. Marketing Science 2, 203–238.
Hauser, J.R., Wernerfelt, B., 1990. An evaluation cost model of consideration sets. Journal of Consumer Research 16, 393–408.
Hertwig, R., Todd, P.M., 2000. Biases to the left, fallacies to the right, stuck in the middle with null hypothesis significance testing. Psycoloquy 11 (28), Social Bias (20) (http://psycprints.ecs.soton.ac.uk/archive/00000028/).
Holm, L., Kildevang, H., 1996. Consumers' views on food quality. A qualitative interview study. Appetite 27, 1–14.
Hoyer, W.D., MacInnis, D.J., 2004. Consumer Behavior, 3rd ed. Houghton Mifflin, New York.
Johnson, B.B., Slovic, P., 1995. Presenting uncertainty in health risk assessment: initial studies of its effects on risk perception and trust. Risk Analysis 15, 485–494.
Just, D.R., Zilberman, D., Heiman, A., 2002. Characteristic uncertainty, perception and preferences: an extension of the Lancaster model. Working Paper, University of California at Berkeley.
Kahneman, D., Tversky, A., 1972. Subjective probability: a judgment of representativeness. Cognitive Psychology 3, 430–454.
Kahneman, D., Tversky, A., 1979. Prospect theory: an analysis of decision making under risk. Econometrica 47, 263–291.
Keeney, R.L., Raiffa, H., 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. John Wiley & Sons, New York.
Kohli, R., Jedidi, K., 2007. Representation and inference of lexicographic preference models and their variants. Marketing Science 26 (3), 380–399.
Koivisto, H.U.K., Sjödén, P.O., 1997. Food-related and general neophobia and their relationship with self-reported food choice: familial resemblance in Swedish families with children of ages 7–17 years. Appetite 29, 89–103.
Luce, M.F., Bettman, J.R., Payne, J.W., 1997. Choice processing in emotionally difficult decisions. Journal of Experimental Psychology: Learning, Memory, and Cognition 23, 384–405.
Lynch, J.G., 1985. Uniqueness issues in the decompositional modeling of multiattribute overall evaluations: an information integration perspective. Journal of Marketing Research 22, 1–19.
Lynch, J.G., Ofir, C., 1989. Effects of cue consistency and value on base-rate utilization. Journal of Personality and Social Psychology 56, 170–181.
McFadden, D., 1974. Conditional logit analysis of qualitative choice behavior. In: Zarembka, P. (Ed.), Frontiers in Econometrics. Academic Press, New York.
McGuire, W., 1968. Personality and susceptibility to social influence. In: Borgatta, E., Lambert, L. (Eds.), Handbook of Personality Theory and Research. Rand McNally, Chicago, pp. 1130–1188.
Moskowitz, H.R., German, J.B., Saguy, I.S., 2005. Unveiling health attitudes and creating good-for-you foods: the genomics metaphor, consumer innovative web-based technologies. Critical Reviews in Food Science and Nutrition 45, 165–191.
Ofir, C., 1988. Pseudodiagnosticity in judgment under uncertainty. Organizational Behavior and Human Decision Processes 42, 343–364.
Payne, J.W., Bettman, J.R., Johnson, E.J., 1992. Behavioral decision research: a constructive processing perspective. Annual Review of Psychology 43, 87–131.
Petty, R.E., Wegener, D.T., Fabrigar, L.R., 1997. Attitudes and attitude change. Annual Review of Psychology 48, 609–647.
Roberts, J.H., Lattin, J.M., 1991. Development and testing of a model of consideration set composition. Journal of Marketing Research 28 (4), 429–440.
Shepherd, R., Stockley, L., 1985. Fat consumption and attitude towards foods with a high fat content. Human Nutrition. Applied Nutrition 39A, 431–442.
Shepherd, R., Towler, G., 1992. Nutrition knowledge, attitudes and fat intake: application of the theory of reasoned action. Journal of Human Nutrition and Dietetics 5, 387–397.
Simon, H.A., 1955. A behavioral model of rational choice. Quarterly Journal of Economics 69, 99–111.
Tversky, A., 1972. Elimination by aspects: a theory of choice. Psychological Review 79, 281–299.
Tybout, A.M., Calder, B.J., Sternthal, B., 1981. Using information processing theory to design marketing strategies. Journal of Marketing Research 18, 73–79.
Viscusi, K.W., O'Connor, C.J., 1984. Adaptive responses to chemical labeling: are workers Bayesian decision makers? The American Economic Review 74, 942–956.
Wansink, B., 2005. Marketing Nutrition: Soy, Functional Foods, Biotechnology and Obesity. University of Illinois Press, Urbana, IL.
Zuckerman, A., Chaiken, S., 1998. A heuristic–systematic processing analysis of the effectiveness of product warning labels. Psychology and Marketing 15 (7), 621–642.