Springer Finance
Ramaprasad Bhar · Shigeyuki Hamori
Empirical Techniques in Finance

With 30 Figures and 30 Tables
Professor Ramaprasad Bhar
School of Banking and Finance
The University of New South Wales
Sydney 2052, Australia
E-mail: [email protected]

Professor Shigeyuki Hamori
Graduate School of Economics
Kobe University
Rokkodai, Nada-Ku, Kobe 657-8501, Japan
E-mail: [email protected]
Mathematics Subject Classification (2000): 62-02, 62-07
Library of Congress Control Number: 2005924539
ISBN 3-540-25123-5 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springeronline.com

© Springer-Verlag Berlin Heidelberg 2005
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: design & production
Production: Helmut Petri
Printing: Strauss Offsetdruck
SPIN 11401841
Table of Contents
1 Introduction
2 Basic Probability Theory and Markov Chains
   2.1 Random Variables
   2.2 Function of Random Variable
   2.3 Normal Random Variable
   2.4 Lognormal Random Variable
   2.5 Markov Chains
   2.6 Passage Time
   2.7 Examples and Exercises
   References

3 Estimation Techniques
   3.1 Models, Parameters and Likelihood - An Overview
   3.2 Maximum Likelihood Estimation and Covariance Matrix of Parameters
   3.3 MLE Example - Classical Linear Regression
   3.4 Dependent Observations
   3.5 Prediction Error Decomposition
   3.6 Serially Correlated Errors - Overview
   3.7 Constrained Optimization and the Covariance Matrix
   3.8 Examples and Exercises
   References
4 Non-Parametric Method of Estimation
   4.1 Background
   4.2 Non-Parametric Approach
   4.3 Kernel Regression
   4.4 Illustration 1 (EViews)
   4.5 Optimal Bandwidth Selection
   4.6 Illustration 2 (EViews)
   4.7 Examples and Exercises
   References
5 Unit Root, Cointegration and Related Issues
   5.1 Stationary Process
   5.2 Unit Root
   5.3 Dickey-Fuller Test
   5.4 Cointegration
   5.5 Residual-based Cointegration Test
   5.6 Unit Root in a Regression Model
   5.7 Application to Stock Markets
   References

6 VAR Modeling
   6.1 Stationary Process
   6.2 Granger Causality
   6.3 Cointegration and Error Correction
   6.4 Johansen Test
   6.5 LA-VAR
   6.6 Application to Stock Prices
   References

7 Time Varying Volatility Models
   7.1 Background
   7.2 ARCH and GARCH Models
   7.3 TGARCH and EGARCH Models
   7.4 Causality-in-Variance Approach
   7.5 Information Flow between Price Change and Trading Volume
   References
8 State-Space Models (I)
   8.1 Background
   8.2 Classical Regression
   8.3 Important Time Series Processes
   8.4 Recursive Least Squares
   8.5 State-Space Representation
   8.6 Examples and Exercises
   References
9 State-Space Models (II)
   9.1 Likelihood Function Maximization
   9.2 EM Algorithm
   9.3 Time Varying Parameters and Changing Conditional Variance (EViews)
   9.4 GARCH and Stochastic Variance Model for Exchange Rate (EViews)
   9.5 Examples and Exercises
   References

10 Discrete Time Real Asset Valuation Model
   10.1 Asset Price Basics
   10.2 Mining Project Background
   10.3 Example 1
   10.4 Example 2
   10.5 Example 3
   10.6 Example 4
   Appendix
   References
11 Discrete Time Model of Interest Rate
   11.1 Preliminaries of Short Rate Lattice
   11.2 Forward Recursion for Lattice and Elementary Price
   11.3 Matching the Current Term Structure
   11.4 Immunization: Application of Short Rate Lattice
   11.5 Valuing Callable Bond
   11.6 Exercises
   References
12 Global Bubbles in Stock Markets and Linkages
   12.1 Introduction
   12.2 Speculative Bubbles
   12.3 Review of Key Empirical Papers
   12.4 New Contribution
   12.5 Global Stock Market Integration
   12.6 Dynamic Linear Models for Bubble Solutions
   12.7 Dynamic Linear Models for No-Bubble Solutions
   12.8 Subset VAR for Linkages between Markets
   12.9 Results and Discussions
   12.10 Summary
   References

13 Forward FX Market and the Risk Premium
   13.1 Introduction
   13.2 Alternative Approach to Model Risk Premia
   13.3 The Proposed Model
   13.4 State-Space Framework
   13.5 Brief Description of Wolff/Cheung Model
   13.6 Application of the Model and Data Description
   13.7 Summary and Conclusions
   Appendix
   References

14 Equity Risk Premia from Derivative Prices
   14.1 Introduction
   14.2 The Theory behind the Modeling Framework
   14.3 The Continuous Time State-Space Framework
   14.4 Setting Up The Filtering Framework
   14.5 The Data Set
   14.6 Estimation Results
   14.7 Summary and Conclusions
   References
Index

About the Authors
1 Introduction
This book offers the opportunity to study and experience advanced empirical techniques in finance and, more generally, in financial economics. It is suitable not only for students with an interest in the field but also for academic researchers and researchers in industry. The book focuses on the contemporary empirical techniques used in the analysis of financial markets and shows how these are implemented using actual market data. With this emphasis on implementation, the book concentrates on strategies for rigorously combining finance theory and modeling technology to extend the extant considerations in the literature.

The main aim of this book is to equip readers with an array of tools and techniques that will allow them to explore financial market problems with a fresh perspective. In this sense it is not another volume in econometrics. Of course, the traditional econometric methods are still valid and important; the contents of this book bring in other related modeling topics that support a more in-depth exploration of finance theory and its application in practice. As derivatives analysis makes clear, modern finance theory requires a sophisticated understanding of stochastic processes. Actual data analyses also require new statistical tools that can address the unique aspects of financial data. To meet these demands, this book explains diverse modeling approaches with an emphasis on their application in the field of finance.

This book has been written for anyone with a general knowledge of the finance discipline, an interest in its principles, and a good mathematical aptitude. In presenting the material throughout the book, we have therefore focused more on a comprehensible discussion than on the rigors of mathematical derivation. We have also made extensive use of actual data in an effort to promote understanding of the topics. We have used standard software tools and packages to implement the various algorithms. Readers with a computer programming orientation will benefit enormously from the available program codes. We have illustrated the implementation of the various algorithms using contemporary data as appropriate and utilized either Excel spreadsheet (Microsoft Corporation), EViews (Quantitative Micro Software), or GAUSS
(Aptech Systems Inc.) environments. These program codes and data are made available through one of the authors' websites (www.bhar.id.au), with appropriate references to the chapters in the book. We have implemented the routines using the software package versions currently in use, so most readers should be able to experiment with them almost immediately. We sincerely hope that readers will use these software codes, enhance their capabilities, and thus contribute to the empirical finance field in the future.¹

Besides this first introductory chapter, the book comprises thirteen other chapters; a brief description of these chapters follows.

Chapter 2 reviews the basic probability and statistical techniques commonly used in quantitative finance. It also briefly covers the topic of Markov chains and the concept of first passage time.

Chapter 3 is devoted to estimation techniques. Since most empirical models consist of several unknown parameters suggested by the underlying theory, the issue of inferring these parameters from available data is of paramount importance: without these parameters the model is of little use in practice. In this chapter we mainly focus on the maximum likelihood approach to model estimation. We discuss different ways to specify the likelihood function, and these become useful in later chapters. We devote sufficient time to explaining how to deal with restrictions placed on model parameters. As most commercially available optimization routines automatically produce the covariance matrix of the parameters at the point of convergence, we include a careful analysis of how to translate this into the covariance matrix of the constrained parameters that are of interest to the user. This translated covariance matrix is used to make inferences about the statistical significance of the estimated parameters.

In chapter 4 we cover the essential elements of non-parametric regression models and illustrate the principles with examples. Here we make use of the kernel density routines available in the software package EViews.

Chapters 5 and 6 then review stationary and nonstationary time series models. The former chapter discusses the unit root, cointegration and related issues; the latter, multivariate time series models such as VAR, VECM and LA-VAR.

Chapter 7 reviews time varying volatility models, such as ARCH, GARCH, T-GARCH and E-GARCH. Since these models have dominated the literature over the last several years, we emphasize applications rather than theory. We extend this topic to include causality in
¹ Datastream is a trademark of Thomson Financial. EViews is a trademark of Quantitative Micro Software. Excel is a trademark of Microsoft Corporation. GAUSS is a trademark of Aptech Systems, Inc.
variance and demonstrate its efficacy with an application to commodity futures contracts.

Chapter 8 is devoted to explaining state-space models and their application to several time series data sets. We have attempted to demystify the concepts underlying unobserved components in a dynamical system and how these can be inferred from an application of the filtering algorithm. The filtering algorithm and its various refinements dominate the engineering literature; in this book we restrict ourselves to those problem settings most familiar to researchers in finance and economics. Some of the examples in this chapter make use of the limited facility available in EViews to estimate such models.

In chapter 9 we take up the issue of estimation of state-space models in greater detail by way of maximization of the prediction error decomposition form of the likelihood function. This analysis gives the reader an insight into how the various model parameters feed into the adaptive filtering algorithm and thus constitute the likelihood function. It becomes clear that for any reasonable practical system the likelihood function is highly non-linear in the model parameters, so its optimization is a complex problem. Although standard numerical function optimization techniques work in most situations, there are cases where an alternative method based on Expectation Maximization is preferable. For the benefit of readers we give a complete GAUSS program code implementing the EM algorithm. We sincerely hope that readers will experiment with this code structure to enhance their understanding of this very important algorithm.

In the next two chapters, 10 and 11, we move away from state-space systems and take up practical issues in modeling stochastic processes in a discrete time framework. We will, however, take up more challenging modeling exercises using state-space models in the succeeding chapters. In chapter 10 we discuss the discrete time stochastic nature of the real asset valuation problem. This approach is suitable for resource-based valuation exercises where there may be several embedded options available to the investor. After describing the basic issues in financial options valuation and real asset options valuation, we describe the approach using a mining problem. We structure the development in this chapter and the next one following the excellent book by D. G. Luenberger on Investment Science (Luenberger DG (1997) Investment science. Oxford University Press, New York). We, however, add our own interpretation of the issues as well as other illustrations. Besides, we include the relevant implementations in Excel spreadsheets.

In chapter 11, we maintain the same discrete time theme and take up the issues of modeling interest rates and securities contingent on the term
structure of interest rates. We explain the elegant algorithm of forward recursion in a fashion similar to that in D. G. Luenberger's Investment Science. There are several illustrations as well as spreadsheets of the implementation. We hope that readers will take full advantage of these spreadsheets and develop them further to suit their own research needs.

In chapter 12 we highlight recent advances in inferring the speculative component in aggregate equity prices. The topic area of this chapter should be very familiar to most researchers and students in finance. It may be less obvious, however, how to extend the standard present value models to infer an unobserved speculative price component. The implementation of the models in this chapter relies upon an understanding of the contents of chapters 8 and 9. We not only infer the speculative components, we also extend the analysis to investigate whether these components are related across different markets.

The last two chapters, 13 and 14, deal with the important issue of the risk premium in asset prices. Chapter 13 covers foreign exchange prices and chapter 14 deals with the equity market risk premium. Both of these chapters require some understanding of stochastic processes. Most students and researchers will be familiar with the topic of risk premium in a regression-based approach, which makes it a backward-looking estimation process. We, however, exploit the rich theoretical structures in the derivatives market that connect the probability distribution in the risk-neutral world with that in the real world. This leads to a convenient mechanism, with a minimum of assumptions, for uncovering the market's belief about the likely risk premium in these markets. Since the methodology is based on derivative securities, the inferred risk premium is necessarily forward-looking. These two chapters rely entirely on the unobserved component modeling framework introduced in chapters 8 and 9. The associated GAUSS programs will be of immense benefit to all readers and will help them enhance their skills in this area as well.
2 Basic Probability Theory and Markov Chains
2.1 Random Variables

Suppose x is a random variable that can take on any one of a finite number of specific values, e.g., x_1, x_2, ..., x_m. Also assume that there is a probability p_i that represents the relative chance of an occurrence x_i. In this case, the p_i satisfy

   Σ_{i=1}^{m} p_i = 1   and   p_i ≥ 0 for each i.

Each p_i can be thought of as the relative frequency with which x_i will occur when an experiment to observe x is carried out a large number of times. If the outcome variable can take on any real value in an interval, e.g., the temperature of a room, then a probability density function p(ξ) describes the probability. Since the random variable can take on a continuum of values, the probability density function has the following interpretation:

   P[a ≤ x ≤ b] = ∫_a^b p(ξ) dξ.

The probability distribution of the random variable x is the function F(ξ) defined as,

   F(ξ) = P[x ≤ ξ].

It therefore follows that F(−∞) = 0 and F(∞) = 1. In the case of a continuum of values, if F is differentiable, then dF/dξ = p(ξ). Two random variables, x and y, are described by their joint probability density or joint probability distribution. The joint distribution is the function F defined as,

   F(ξ, η) = P[x ≤ ξ, y ≤ η].
The joint density is defined in terms of derivatives for random variables that take on a continuum of values. For discrete values, the joint density at a pair (x_i, y_j) is p(x_i, y_j), which is equal to the probability of that pair occurring. This concept can be extended to n random variables. The distribution of any one of the random variables can easily be recovered from a joint distribution. Given the distribution F(ξ, η) of x and y, the distribution of x is F_x(ξ) = F(ξ, ∞). We say that the random variables x and y are independent if the density function can be written as,

   p(ξ, η) = p_x(ξ) p_y(η).

The expected value of a random variable x with density function p is defined as,

   E[x] = ∫ ξ p(ξ) dξ.

If we denote E[x] by μ, the variance of x is defined as,

   Var[x] = E[(x − μ)²] = ∫ (ξ − μ)² p(ξ) dξ.

Similarly, the covariance between two random variables x and y is defined as,

   Cov[x, y] = E[(x − μ_x)(y − μ_y)],

where E[x] = μ_x and E[y] = μ_y. If the random variables are independent of each other, the covariance is zero. Further details may be found in Ross (2000).
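As a quick numerical check of these definitions (this illustration is not part of the original text, which relies on Excel, EViews and GAUSS; a short Python/NumPy sketch is used here purely for exposition), one can verify on simulated data that Var[x] = E[x²] − μ² and that independent variables have covariance near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)   # E[x] = 1, Var[x] = 4
y = rng.normal(loc=0.0, scale=1.0, size=100_000)   # generated independently of x

mu = x.mean()
print("Var[x] two ways:", x.var(), (x**2).mean() - mu**2)        # both close to 4
print("Cov[x, y]:", np.mean((x - x.mean()) * (y - y.mean())))    # close to 0
```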
2.2 Function of Random Variable

Suppose that we are interested in finding the expected value of some function of x, say, g(x). For any real valued function of x, this is defined as,

   E[g(x)] = ∫ g(ξ) p(ξ) dξ.

Now suppose that y = g(x) defines a one-to-one correspondence between x and y and that x = w(y) is the unique inverse function. In this case, the density function h(y) of the random variable y is given by,

   h(y) = p(w(y)) |J|,

where J = dw/dy is known as the Jacobian of the transformation. For example, consider y = a + bx, where 'a' and 'b' are constants. The inverse function here is (y − a)/b, and the Jacobian is 1/b. Thus,

   h(y) = p((y − a)/b) |1/b|.
If we have a known density function of x, then the above equation allows us to determine the density function of y. Now let us assume that x_1 and x_2 are continuous random variables with joint density function p(x_1, x_2). Here we want to determine the joint distribution of the random variables y_1 and y_2, where y_1 = g_1(x_1, x_2) and y_2 = g_2(x_1, x_2) for some functions g_1 and g_2. The joint density function of y_1 and y_2, h(y_1, y_2), is given by,

   h(y_1, y_2) = p(w_1(y_1, y_2), w_2(y_1, y_2)) |J|,

where the Jacobian is the 2 × 2 determinant,

   J = det [ ∂w_1/∂y_1   ∂w_1/∂y_2
             ∂w_2/∂y_1   ∂w_2/∂y_2 ],

and x_1 = w_1(y_1, y_2), x_2 = w_2(y_1, y_2) are the unique inverse functions.
2.3 Normal Random Variable

A random variable x is said to be normally distributed if its probability density function is of the form,

   p(ξ) = (1 / (σ√(2π))) exp(−(ξ − μ)² / (2σ²)).

In this case, the expected value of x is μ and the variance is σ². A normal random variable is standardized if μ = 0 and σ² = 1. The density function of such a standard normal variable is,

   p(ξ) = (1 / √(2π)) exp(−ξ² / 2),

and the corresponding distribution is given by,

   F(ξ) = (1 / √(2π)) ∫_{−∞}^{ξ} exp(−u² / 2) du.

When more than one normal random variable is involved, it is convenient to express these equations in matrix notation. If X is a vector of n normal random variables, then the mean vector is μ = [μ_1, μ_2, ..., μ_n]′. The covariance matrix Ω is an n × n symmetric matrix with components Ω_ij = Cov[x_i, x_j]. The covariance matrix may also be expressed as,

   Ω = E[(X − μ)(X − μ)′].
If the n variables are jointly normal, then,

   p(x) = (2π)^{−n/2} |Ω|^{−1/2} exp( −(1/2)(x − μ)′ Ω^{−1} (x − μ) ).

If two jointly normal random variables are uncorrelated, then the joint density is the product of the densities of the individual variables. Another important property of jointly normal random variables is the summation property: if x and y are jointly normal, then random variables of the form (αx + βy) for constants α, β are also normal. This can be extended to sums of more than two variables.
2.4 Lognormal Random Variable

A random variable z is lognormal if ln(z) is normal. In other words, if x is normal then z = e^x is lognormal. The density function of z is,

   p(z) = (1 / (zσ√(2π))) exp(−(ln z − μ)² / (2σ²)),   z > 0,

where μ and σ² are the mean and variance of x = ln(z). The relevant parameters are,

   E[z] = exp(μ + σ²/2),
   Var[z] = exp(2μ + σ²) (exp(σ²) − 1).

From the summation result for jointly normal random variables, it follows that products and powers of jointly lognormal variables are also lognormal. If u and v are lognormal, for example, then z = u^α v^β is also lognormal.
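To see the moment formulas at work, the following short sketch (not from the original text; Python/NumPy is used here only for illustration) simulates a lognormal variable and compares its sample mean and variance with exp(μ + σ²/2) and exp(2μ + σ²)(exp(σ²) − 1).

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.1, 0.4                      # parameters of x = ln(z)
x = rng.normal(mu, sigma, size=1_000_000)
z = np.exp(x)                             # lognormal variable

print("E[z]  : sample", z.mean(), " formula", np.exp(mu + sigma**2 / 2))
print("Var[z]: sample", z.var(),  " formula",
      np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1))
```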
2.5 Markov Chains

When modeling uncertainty, we usually adopt probability distributions to describe the set of possible outcomes quantitatively. An understanding of the underlying process is very important when specifying these distributions. A stochastic process is a collection of random variables indexed by time t and a state x; for example, we can write {x_t, t ≥ 0}, t ∈ T. When the index set T is countable, the process is referred to as a discrete-time stochastic process. The indices can assume discrete or continuous values. Various stochastic processes, such as random walks, Markov chains, Wiener processes, and stochastic differential equations, are applied in different settings. This section briefly introduces Markov chains.

Markov chains were originally proposed by the Russian mathematician Markov in 1907. Over the many decades since, they have been extensively applied to problems in social science, economics and finance, computer science, computer-generated music, and other fields.

Consider a communications system that transmits the digits 0 and 1. Each digit transmitted must pass through several stages, and at each stage there is a probability p that the digit entered will be unchanged when it leaves. If we let X_n denote the digit entering the nth stage, {X_n, n = 0, 1, ...} is a two-state Markov chain having the transition probability matrix,

   P = [ p       1 − p
         1 − p   p     ].
If there are three states in a Markov chain, then the transition probability matrix has the following form,

   P = [ p_11  p_12  p_13
         p_21  p_22  p_23
         p_31  p_32  p_33 ].

In this matrix we note that p_ij ≥ 0, i = 1, ..., n, j = 1, ..., n, and

   Σ_{j=1}^{n} p_ij = 1.
The properties of the Markov chain are then defined by the mathematical properties of the probability matrix.
Consider the case (Tapiero 2000) of a new airline planning to launch operations in a newly opened market after the lifting of restrictions protecting the national airline. The new entrant will offer various inducements to attract passengers from the national airline, and it has been estimated that only 1/6 of the clients of the national airline will remain loyal (i.e., not switch) in a given month. The clients of the new company, in turn, have a 2/3 probability of remaining loyal. Our aim, given the number of passengers at the beginning, is to find the expected number of passengers in each of the two airlines after a month, after two months, and after many months, i.e., in the long run. Define,

   state 0: a customer flies with the national airline,
   state 1: a customer flies with the new airline.

The transition probabilities, arranged so that column j contains the probabilities of moving out of state j, are given by,

   P = [ 1/6  1/3
         5/6  2/3 ].

Let N_0(0) be the number of clients of the national airline at the beginning (say, 600K) and let N_1(0) = 0 (in other words, the number of clients of the new airline at month 0 is 0). After the first month, the distribution of clients among the companies is given by,

   N_0(1) = (1/6) N_0(0) + (1/3) N_1(0),
   N_1(1) = (5/6) N_0(0) + (2/3) N_1(0).

In numbers (in thousands) this becomes N_0(1) = 100 and N_1(1) = 500. In matrix notation this can be written as,

   N(1) = P N(0),   where N(t) = [N_0(t), N_1(t)]′.
Therefore, when we consider two consecutive months, the matrix notation becomes,

   N(t + 1) = P N(t).

For the second month we have N(2) = P N(1), which gives (in numerical terms) N_0(2) = 183, N_1(2) = 417. The matrix P^T gives the transition probabilities from one state to another in T steps. If the number of steps is large, the transition probabilities are called ergodic transition probabilities and are given by the equilibrium probabilities (assuming they exist):

   π_j = lim_{T→∞} p_ij^(T),   independent of the initial state i.

An application of the Chapman-Kolmogorov matrix multiplication formula yields,

   π = P π,   where π = [π_1, ..., π_M]′.   (2.27)

This provides a system of linear equations that can be used to calculate the ergodic probabilities, along with the fact that Σ_{i=1}^{M} π_i = 1 and π_i ≥ 0. For our airline problem, this implies,

   π_0 = (1/6) π_0 + (1/3) π_1,
   π_1 = (5/6) π_0 + (2/3) π_1,
   π_0 + π_1 = 1.   (2.28)
Equations (2.28) can be solved by any of several well-known methods. The solution provides the answer to our question about the distribution of passengers in the long run. The implications of these equations are very important in practice: they reveal the states and probabilities towards which a process will incline. Thus, if we compare two approaches that lead to two different Markov chains, we can study the long run effects of these approaches.

We next consider a more efficient solution approach (Kim and Nelson 1999) to a system of equations such as that given in (2.28). From equation (2.27) we obtain,

   (I_M − P) π = 0_M,   (2.29)

where I_M is the identity matrix of order M (i.e., the order of the probability transition matrix P) and 0_M is the M × 1 vector of zeros. The condition that the steady state probabilities sum to one is captured by,

   1_M′ π = 1,   (2.30)

where 1_M = [1 1 1 ... 1]′. Combining equations (2.29) and (2.30), we get,

   A π = [ 0_M
           1   ],   where A = [ I_M − P
                                1_M′    ].   (2.32)
By multiplying both sides of (2.32) by (A′A)^{−1} A′ we obtain,

   π = (A′A)^{−1} A′ [ 0_M
                       1   ].   (2.33)
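The least-squares formula (2.33) is easy to verify numerically. The book's own exercises use Excel or GAUSS; the sketch below (not from the original text) does the same computation in Python/NumPy for the airline example, with the transition matrix reconstructed from the loyalty probabilities 1/6 and 2/3 quoted above.

```python
import numpy as np

# Transition matrix in the column convention used in N(t+1) = P N(t):
# column 0 = national airline, column 1 = new airline.
P = np.array([[1/6, 1/3],
              [5/6, 2/3]])
M = P.shape[0]

# Stack (I_M - P) on top of a row of ones, per equations (2.29)-(2.32).
A = np.vstack([np.eye(M) - P, np.ones((1, M))])
e = np.append(np.zeros(M), 1.0)          # right-hand side [0_M ; 1]

pi = np.linalg.inv(A.T @ A) @ A.T @ e    # equation (2.33)
print("steady state:", pi)               # approximately [2/7, 5/7]

# Cross-check: iterating the chain from any starting split gives the same limit.
n = np.array([600.0, 0.0])
for _ in range(50):
    n = P @ n
print("long-run split of 600 clients:", n)  # approximately [171.4, 428.6]
```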
2.6 Passage Time

Economists are often required to ascertain the time needed to reach a particular state. Given a wealth process, for example, how long will it take to reach the bankrupt state, i.e., the state without any wealth? Assume that we are in state i and let f_ij(n) be the probability of a first transition from state i to state j in n steps, that is, the probability that state j is reached for the first time at the nth transition. For a transition in one step, the transition matrix gives this probability directly. For a first transition in two steps, it equals the two-step transition probability less the probability of having reached state j at the first step and of being in state j again at the second step. This can be represented by,

   f_ij(2) = p_ij(2) − f_ij(1) p_jj,

and in general,

   f_ij(n) = p_ij(n) − Σ_{k=1}^{n−1} f_ij(k) p_jj(n − k),

where p_ij(n) denotes the n-step transition probability. When the states i and j communicate (i.e., when it is possible to move from state i to state j in a finite number of transitions), we can compute the expectation of this passage time. This expectation is defined by,

   μ_ij = Σ_{n=1}^{∞} n f_ij(n).
With some further analysis we can show that the mean first passage times can be obtained by solving the set of equations given by,

   μ_ij = 1 + Σ_{k≠j} p_ik μ_kj.   (2.37)

Note here that for k = j the corresponding term drops out of the sum, since μ_jj is set to zero. Now consider the Hedge Fund Market Share example described by Tapiero (2000): the current market positions of a hedge fund and its two main competitors are 12%, 40% and 48%, respectively. Based on industry data, clients switch from one fund to another for various reasons (fund performance falls below expectation, etc.), and the monthly switching probabilities between the three funds are estimated from such data. Our aim is to find the mean first passage time for clients in funds 2 and 3 to fund 1. This is given by the following system of equations from (2.37):

   μ_21 = 1 + p_22 μ_21 + p_23 μ_31,
   μ_31 = 1 + p_32 μ_21 + p_33 μ_31.

Solving these we get μ_21 = μ_31 = 10 months. Other interesting applications of the first passage time include calculation of the time to bankruptcy, the first time cash attains a given level, etc. Continuing with the same example of hedge funds, determine the long-term market share of each of these funds.
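The original text solves this system by hand; since the book's switching matrix is not reproduced above, the sketch below (not from the original; the three-fund matrix used here is purely hypothetical) shows how equation (2.37) can be solved as a linear system in Python/NumPy for any row-stochastic transition matrix.

```python
import numpy as np

def mean_first_passage_to(P, target):
    """Mean first passage times into `target` from every other state.

    P is a row-stochastic transition matrix (rows sum to one), so that
    P[i, k] is the probability of moving from state i to state k.
    Solves mu_i = 1 + sum_{k != target} P[i, k] * mu_k  (equation (2.37)).
    """
    others = [i for i in range(P.shape[0]) if i != target]
    Q = P[np.ix_(others, others)]              # transitions among non-target states
    mu = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    return dict(zip(others, mu))

# Hypothetical monthly switching matrix for funds 1, 2, 3 (states 0, 1, 2).
P = np.array([[0.80, 0.10, 0.10],
              [0.10, 0.80, 0.10],
              [0.05, 0.05, 0.90]])

print(mean_first_passage_to(P, target=0))  # mean months to reach fund 1
```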
2.7 Examples and Exercises

Example 2.1: Suppose that x is a continuous random variable with density p(x), and let E[x] = μ. Under this condition, show that Var[x] = E[x²] − μ².
Example 2.2: Given a normally distributed variable x with mean μ and variance σ², find the mean and variance of the random variable defined by y = a + bx, where 'a' and 'b' are constants. Using the transformation result of section 2.2,

   h(y) = p((y − a)/b) |1/b|.

This shows that the random variable y has a mean of (a + bμ) and a variance of b²σ².

Exercise 2.1: Use equation (2.33) and the transition matrix from the airline problem to compute the steady state probabilities. Attempt this problem in Excel or GAUSS.
References

Kim C-J, Nelson CR (1999) State-space models with regime switching: classical and Gibbs-sampling approaches with applications. The MIT Press, Cambridge
Ross SM (2000) Introduction to probability models, 7th edn. Harcourt Academic Publishers, London
Tapiero CS (2000) Applied stochastic models and control for finance and insurance. Kluwer Academic Publishers, Dordrecht
3 Estimation Techniques
3.1 Models, Parameters and Likelihood - An Overview

When we speak about the probability of observing events, we are implicitly assuming some kind of model, even in the simple case of tossing a coin. In the case of tossing a coin, the model gives a certain, fixed probability for a particular outcome. This model has one parameter, θ, representing the probability that the coin will land on heads. If the coin is fair, then θ = 0.5. Given specific parameter values for the model, we can speak about the probability of observing an event. In this simple case, if θ = 0.5, then the probability that the coin will land on heads on any one toss is also 0.5.

This simple example does not appear to give us very much: we merely seem to be calling what was previously a simple probability the parameter of a model. As we shall see, however, this way of thinking provides a very useful framework for expressing more complex problems. In the real world, very few things have absolute, fixed probabilities, and many of the aspects of the world with which we are familiar are not truly random. Take, for instance, the probability of becoming a millionaire, and say that the proportion of millionaires in a population is 10%. If we know nothing else about an individual, we would say that the probability of this individual being a millionaire is 0.10. In mathematical notation, p(M) = 0.10, where M denotes the event of being a millionaire. We know, however, that certain people are more likely to be millionaires than others. For example, a strong academic background, such as an MBA, may greatly increase one's chances of becoming a millionaire. The probability above is essentially an average probability, taken across all individuals both with and without an MBA. The notion of conditional probability allows us to incorporate other potentially important variables, such as an MBA, into statements about the probability of an individual becoming a millionaire. Mathematically, we write p(X | Y), meaning the probability of X conditional on Y, or given Y. In our example, we could write,
   p(M | with MBA),   (3.1)

and

   p(M | without MBA).   (3.2)
Whether or not these two values differ is an indication of the influence of an MBA on an individual's chances of becoming a millionaire.

Now we are in a position to introduce the concept of likelihood. If the probability of an event X dependent on model parameters θ is written p(X | θ), then we may talk about the likelihood L(θ | X), that is, the likelihood of the parameters given the data. For most sensible models, we find that certain data are more probable than other data. The aim of maximum likelihood estimation is to find the parameter value(s) that make the observed data most likely. The likelihood of the parameters given the data is defined to be equal to the probability of the data given the parameters (technically, they are proportional to each other, but this does not affect the principle).

If we were in the business of making predictions based on a set of solid assumptions, then we would be interested in probabilities: the probabilities of certain outcomes occurring or not occurring. In the case of data analysis, however, all the data have already been observed. Once they have been observed they are fixed; no 'probabilistic' part to them remains (the word data comes from the Latin word meaning 'given'). We are much more interested in the likelihood of the model parameters that underlie the fixed data.

   Probability: knowing parameters → prediction of outcome.
   Likelihood: observation of data → estimation of parameters.
3.2 Maximum Likelihood Estimation and Covariance Matrix of Parameters

A statistical model with the parameter vector θ of dimension k specifies a joint distribution for a vector of observations ỹ_T = [y_1, y_2, ..., y_T]′:

   Joint density function: p(ỹ_T | θ).   (3.3)

The joint density is, therefore, a function of ỹ_T given θ. In econometric work, we know the ỹ_T vector, or the sample data, but we do not know the parameter vector θ of the underlying statistical model. In this sense, the joint density in equation (3.3) is a function of θ given ỹ_T. We call it the likelihood function:

   Likelihood function: L(θ | ỹ_T).   (3.4)
This is functionally equivalent to equation (3.3). Different values of θ result in different values of the likelihood function (3.4), which represents the likelihood of the parameters having generated the observed data. In the maximum likelihood method we choose the parameter estimates that maximize the probability of having generated the observed sample, by maximizing the log of the likelihood function:

   θ̂_ML = arg max_θ ln(L(θ | ỹ_T)).
Maximizing the log of the likelihood function instead of the likelihood function itself allows us to directly compute the covariance matrix, Cov[θ̂_ML], of the maximum likelihood estimate θ̂_ML. The negative of the expected value of the second derivative of the log likelihood function provides us with the information matrix summarizing the amount of information in the sample, i.e.,

   I(θ) = −E[ ∂² ln L(θ | ỹ_T) / ∂θ ∂θ′ ].

The inverse of the information matrix provides the lower bound for the covariance matrix of an unbiased estimator of θ, otherwise known as the Cramer-Rao inequality. The maximum likelihood estimator has been shown to have the following asymptotic distribution,

   θ̂_ML ~ N(θ, I(θ)^{−1})   (asymptotically).
3.3 MLE Example - Classical Linear Regression

The classical theory of maximum likelihood estimation (MLE) is based on a situation in which each of T observations is drawn independently from the same distribution. As the observations are drawn independently, the joint density function is given by,

   p(ỹ_T; θ) = Π_{t=1}^{T} p(y_t; θ),   (3.8)

where p(y_t; θ) is the probability density function of y_t. The models encountered in econometrics rarely conform to this pattern. Nonetheless, the main results for maximum likelihood estimators remain valid in most cases. Consider the linear regression model in matrix notation, where the disturbances are uncorrelated, each having mean zero and a constant but finite variance,

   y = Xβ + ε,   (3.9)

where the error vector ε = [ε_1, ε_2, ..., ε_T]′ has the properties,

   E[ε] = 0,   E[εε′] = σ² I_T.

Assuming the parameter vector θ = [β′, σ²]′, we obtain the following log likelihood function of the system in (3.9) with normally distributed errors:

   ln L(θ) = −(T/2) ln(2π) − (T/2) ln σ² − (1/(2σ²)) (y − Xβ)′(y − Xβ).   (3.11)
While the likelihood function in equation (3.11) can be maximized numerically, in this case we can show that,

   β̂_ML = (X′X)^{−1} X′y,   σ̂²_ML = (y − Xβ̂_ML)′(y − Xβ̂_ML) / T.

To find the covariance matrix we need the information matrix, which in this case turns out to be:

   I(θ) = [ X′X/σ²    0
            0         T/(2σ⁴) ].
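A compact numerical illustration of these closed-form results follows. The book implements its examples in Excel, EViews and GAUSS; the sketch below (not from the original text) uses Python/NumPy with simulated data, computing β̂, σ̂² and the parameter standard errors implied by the inverse of the information matrix above.

```python
import numpy as np

rng = np.random.default_rng(2)
T, beta_true, sigma_true = 500, np.array([1.0, 0.5]), 0.8

X = np.column_stack([np.ones(T), rng.normal(size=T)])    # constant + one regressor
y = X @ beta_true + sigma_true * rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)              # (X'X)^{-1} X'y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / T                             # ML estimate (divisor T)

# Inverse of the block-diagonal information matrix gives the covariance matrix.
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)             # (X'X / sigma^2)^{-1}
var_sigma2 = 2 * sigma2_hat**2 / T                         # (T / (2 sigma^4))^{-1}

print("beta_hat:", beta_hat, " s.e.:", np.sqrt(np.diag(cov_beta)))
print("sigma2_hat:", sigma2_hat, " s.e.:", np.sqrt(var_sigma2))
```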
3.4 Dependent Observations

By definition, the observations in time series applications are dependent. Here we explore the method for constructing the likelihood function in this situation. The function cannot be written in the form of equation (3.8). Instead, we express the joint density in terms of conditional densities. When there are two observations, the joint density can be written as,

   p(y_2, y_1) = p(y_2 | y_1) p(y_1).   (3.14)

The first term on the right-hand side is the probability density function of y_2 conditional on observing y_1. Similarly, for three observations we can write,

   p(y_3, y_2, y_1) = p(y_3 | y_2, y_1) p(y_2, y_1),   (3.15)

and with the help of equation (3.14),
   p(y_3, y_2, y_1) = p(y_3 | y_2, y_1) p(y_2 | y_1) p(y_1).   (3.16)

Thus, in general,

   p(ỹ_T; θ) = p(y_1; θ) Π_{t=2}^{T} p(y_t | y_{t−1}, ..., y_1; θ).   (3.17)
As an example, consider the first-order autoregressive model,

   y_t = φ y_{t−1} + u_t,   u_t ~ NID(0, σ²).   (3.18)

In this case, the distribution of y_t, conditional on y_{t−1}, can be specified as normal with mean φ y_{t−1} and variance σ². Under this interpretation, the likelihood function can be expressed as,

   ln L(θ) = ln p(y_1; θ) − ((T − 1)/2) ln(2πσ²) − (1/(2σ²)) Σ_{t=2}^{T} (y_t − φ y_{t−1})².   (3.19)

All that remains is to consider the treatment of the initial condition as reflected by the distribution of y_1. When this first observation is fixed, as is often the case, it does not enter the likelihood function to be maximized and can be dropped from equation (3.19). A similar analysis is possible for different forms of equation (3.18).
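As an illustration of maximizing a conditional likelihood of this form, the following sketch (not from the original text, which sets up such problems in Excel or GAUSS) codes the log likelihood (3.19) for the AR(1) model, conditional on the first observation, and maximizes it numerically; the estimate of φ can be checked against the least-squares regression of y_t on y_{t−1}.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
phi_true, sigma_true, T = 0.7, 1.0, 1000
y = np.zeros(T)
for t in range(1, T):                       # simulate an AR(1) path
    y[t] = phi_true * y[t - 1] + sigma_true * rng.normal()

def neg_loglik(params, y):
    """Negative of the conditional log likelihood (3.19), y_1 treated as fixed."""
    phi, log_sigma2 = params                # log variance keeps sigma^2 > 0
    sigma2 = np.exp(log_sigma2)
    e = y[1:] - phi * y[:-1]                # one-step prediction errors
    n = len(e)
    return 0.5 * (n * np.log(2 * np.pi * sigma2) + e @ e / sigma2)

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), args=(y,), method="BFGS")
phi_hat, sigma2_hat = res.x[0], np.exp(res.x[1])

phi_ols = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])   # closed-form check
print("MLE:", phi_hat, sigma2_hat, " OLS check:", phi_ols)
```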
3.5 Prediction Error Decomposition

The mean of the conditional distribution of y_t, E[y_t | y_{t−1}], is the optimal predictor of y_t in the sense that it minimizes the prediction mean square error. The variance of the corresponding prediction error,

   ν_t = y_t − E[y_t | y_{t−1}],
is the same as the conditional variance of y_t, that is,

   Var[ν_t] = Var[y_t | y_{t−1}].

The likelihood function in equation (3.17) can be expressed in terms of the prediction errors. This operation, otherwise known as the "prediction error decomposition," is highly relevant for normally distributed observations. Let us write the conditional variance as,

   Var[y_t | y_{t−1}] = σ² f_t,

where σ² is a parameter. The prediction error decomposition then yields,

   ln L(θ) = −(T/2) ln(2π) − (T/2) ln σ² − (1/2) Σ_{t=1}^{T} ln f_t − (1/(2σ²)) Σ_{t=1}^{T} ν_t² / f_t.

For the AR(1) model considered earlier, we can easily see that f_t = 1 for all t. For more complicated time series models, we can compute the prediction error decomposition form of the likelihood function by putting the model in state-space form (i.e., in the form of a dynamic linear model) and applying the Kalman filter. The prediction error form also offers a powerful approach for handling estimation issues in multivariate models. Further details can be found in Harvey (1990).
3.6 Serially Correlated Errors - Overview

A basic observation on prices in financial markets is that large returns tend to be followed by large returns of either sign. This, in turn, implies that the volatility of asset returns tends to be serially correlated. Here, volatility means the conditional variance of asset returns. Various econometric tests have been developed to infer the extent of this correlation in volatility. The most significant development for capturing this serial correlation has been the specification of a class of time-varying volatility models known as ARCH (autoregressive conditional heteroskedasticity) or
GARCH (generalized ARCH). This topic is extensively discussed in chapter 7; here we only outline the essential elements for the maximum likelihood estimation of such models. In this case, we write the conditional variance as,

   σ_t² = α_0 + α_1 ε_{t−1}² + α_2 ε_{t−2}² + ... + α_p ε_{t−p}².

This represents an ARCH(p) process of order p, implying that the current variance is determined by the last p surprises (disturbances). In practice, this approach may introduce a large number of parameters to be estimated, depending on the value of p. One way to simplify the model is to introduce lagged values of the conditional variance itself. This suggests,

   σ_t² = α_0 + Σ_{i=1}^{p} α_i ε_{t−i}² + Σ_{j=1}^{q} β_j σ_{t−j}².

This is referred to as a GARCH(p,q) model. In the GARCH(1,1) case the coefficient α_1 measures the extent to which a volatility shock today feeds through into the next period's volatility, and (α_1 + β) measures the rate at which this effect decays. Focusing on the GARCH(1,1) formulation, we can show by successive substitution that,

   σ_t² = α_0 / (1 − β) + α_1 Σ_{j=1}^{∞} β^{j−1} ε_{t−j}².

If (α_1 + β) < 1 in the GARCH(1,1) model, the unconditional expectation of σ_t² is given by α_0 / (1 − α_1 − β). If (α_1 + β) = 1, then today's volatility affects volatility into the indefinite future. A great many variations of this basic time-varying volatility model have been proposed in the literature and applied to different situations. For the sake of simplicity, however, we limit our focus here to the estimation of a GARCH(1,1) model by the maximum likelihood techniques discussed earlier. Consider the simple market model of stock returns that relates the return on a stock to the market return,

   r_t = a + b m_t + ε_t,   ε_t | I_{t−1} ~ N(0, σ_t²),
   σ_t² = α_0 + α_1 ε_{t−1}² + β σ_{t−1}²,
where I_{t−1} is the information set available at time t−1. We are interested in estimating the parameters of this model, i.e., θ = [a, b, α_0, α_1, β]′, given the observations on r_t and m_t. Given that this model is conditionally normal, as discussed before, its log likelihood function is expressed as,

   ln L(θ) = −(T/2) ln(2π) − (1/2) Σ_{t=1}^{T} ln σ_t² − (1/2) Σ_{t=1}^{T} ε_t² / σ_t²,   ε_t = r_t − a − b m_t.

When maximizing the likelihood function in GARCH models, we usually need to impose the condition (α_1 + β) < 1. This leads us to the next topic on constrained optimization and the covariance matrix.
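To make the construction of this likelihood concrete, the following sketch (not from the original text, which uses EViews and GAUSS for its GARCH examples) codes the log likelihood of the market model with GARCH(1,1) errors in Python/NumPy; the function can be handed to a numerical optimizer such as scipy.optimize.minimize, subject to the parameter restrictions discussed in the next section.

```python
import numpy as np

def garch11_market_loglik(theta, r, m):
    """Log likelihood of r_t = a + b*m_t + e_t with GARCH(1,1) errors.

    theta = [a, b, alpha0, alpha1, beta]; the variance recursion is started
    from the sample variance of the residuals (a common, if ad hoc, choice).
    """
    a, b, alpha0, alpha1, beta = theta
    e = r - a - b * m                         # residuals given the mean parameters
    T = len(e)
    sig2 = np.empty(T)
    sig2[0] = e.var()                         # initialization of the recursion
    for t in range(1, T):
        sig2[t] = alpha0 + alpha1 * e[t - 1] ** 2 + beta * sig2[t - 1]
    return -0.5 * np.sum(np.log(2 * np.pi * sig2) + e ** 2 / sig2)

# Example call with trial parameter values and (hypothetical) return series r and m:
# print(garch11_market_loglik([0.0, 1.0, 0.05, 0.1, 0.85], r, m))
```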
3.7 Constrained Optimization and the Covariance Matrix

The maximum likelihood estimator, θ̂_ML, can be obtained by setting the first derivative of the log likelihood function with respect to the parameter vector to zero. In practice, a closed-form solution for θ̂_ML is usually unavailable. As an alternative, a non-linear optimization procedure is generally adopted to maximize the likelihood function. Starting with an estimate of the parameter vector, say θ̂^(j−1), we obtain new estimates θ̂^(j) using the information provided by the first and sometimes second derivatives of the log likelihood function evaluated at θ̂^(j−1). The new estimates correspond to larger values of the log likelihood function. This process is repeated until no further improvement in the likelihood function can be realized. It also remains possible that the maximum found is not unique. While this numerical optimization proceeds, the algorithm searches for values of the parameters over the entire range of real numbers, from −∞ to ∞. In certain models, however, some of the parameters will have to be restricted to certain ranges, as values outside those
ranges are meaningless. In the case of GARCH models, for example, we have just seen that the sum of the two parameters should be less than one. Similarly, if the model contains parameters that represent probabilities, these must lie between 0 and 1. Restricting the parameter space while maximizing the likelihood function is referred to as constrained optimization. Kim and Nelson (1999) discuss additional issues in this context.

Suppose, firstly, that the model parameter vector θ is obtained by transforming (i.e., constraining) an unconstrained vector ψ such that θ = g(ψ), and secondly, that the numerical optimization is carried out with respect to ψ as unconstrained parameters. This procedure ultimately yields ψ̂_ML and Cov[ψ̂_ML], while we actually want θ̂_ML and Cov[θ̂_ML]. The parameters of interest are given by θ̂_ML = g(ψ̂_ML). We also find that,

   Cov[θ̂_ML] = [∂g(ψ)/∂ψ′] Cov[ψ̂_ML] [∂g(ψ)/∂ψ′]′,

where the matrix of partial derivatives is evaluated at ψ̂_ML.
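A small worked example of this covariance translation follows (not from the original text; the numbers are purely illustrative). An unconstrained parameter ψ_1 is mapped to a positive parameter by exp(·) and ψ_2 to a probability by the logistic function, and the covariance matrix delivered by the optimizer in ψ-space is translated into θ-space with the Jacobian of g.

```python
import numpy as np

def g(psi):
    """Map unconstrained psi to constrained theta: theta1 > 0, 0 < theta2 < 1."""
    return np.array([np.exp(psi[0]), 1.0 / (1.0 + np.exp(-psi[1]))])

def jacobian_g(psi):
    """Analytical Jacobian dg/dpsi' (diagonal because g acts element-wise)."""
    p = 1.0 / (1.0 + np.exp(-psi[1]))
    return np.diag([np.exp(psi[0]), p * (1.0 - p)])

# Hypothetical optimizer output in the unconstrained space.
psi_hat = np.array([-1.2, 0.8])
cov_psi = np.array([[0.04, 0.01],
                    [0.01, 0.09]])

theta_hat = g(psi_hat)
J = jacobian_g(psi_hat)
cov_theta = J @ cov_psi @ J.T              # translated covariance matrix

print("theta_hat:", theta_hat)
print("s.e.(theta):", np.sqrt(np.diag(cov_theta)))
```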
3.8 Examples and Exercises

Exercise 3.1: Assume that the short-term interest rate (r_t) process is given by the following equation,
where the model parameters are θ = [κ, γ, σ]′ and Δw_t is a normally distributed noise term with expected value zero and variance given by Δt. Given a time series sample of short-term interest rate data, set up the likelihood problem in Excel (or GAUSS) and maximize the likelihood function to obtain estimates of the parameters.

Exercise 3.2: Suggest a way to constrain the parameters during the optimization process for the following situations:
   a. Parameter must be positive,
   b. Parameter represents probability,
   c. Two parameters must be positive and sum to less than 1.
References

Harvey A (1990) The econometric analysis of time series. The MIT Press, Cambridge
Kim C-J, Nelson CR (1999) State-space models with regime switching: classical and Gibbs-sampling approaches with applications. The MIT Press, Cambridge
4 Non-Parametric Method of Estimation
4.1 Background

In some financial applications we may face a functional relationship between two variables Y and X without the benefit of a structural model to restrict the parametric form of the relation. In these situations, we can apply non-parametric estimation techniques to capture a wide variety of nonlinearities without recourse to any one particular specification of the nonlinear relation. In contrast to a highly structured or parametric approach to estimating non-linearities, non-parametric estimation requires few assumptions about the nature of the non-linearities. This is not to say that the approach is free of drawbacks. To begin with, the highly data-intensive nature of the process can make it somewhat costly. Further, non-parametric estimation is poorly suited to small samples and has been found to overfit the data.

A regression curve describes the general relationship between an explanatory variable X and a response variable Y. Having observed X, the average value of Y is given by the regression function. The form of the regression function may tell us where higher Y-values are to be expected for certain values of X or where a special sort of dependence is indicated. A pre-selected parametric model might be too restricted to fit unexpected features of the data. The term "non-parametric" refers to the flexible functional form of the regression curve.

The non-parametric approach to a regression curve serves four main functions. First, it provides a versatile method for exploring a general relationship between two variables. Second, it gives predictions of observations yet to be made without reference to a fixed parametric model. Third, it provides a tool for finding spurious observations by studying the influence of isolated points. Fourth, it constitutes a flexible method for substituting missing values or interpolating between adjacent X values. The flexibility of the method is extremely helpful in a preliminary and exploratory statistical analysis of a data set. When no a priori model information about the regression curve is available, non-parametric analysis
can help in providing simple parametric formulations of the regression relationship.
4.2 Non-Parametric Approach

The most commonly used non-parametric estimators are smoothing estimators that reduce observational errors by averaging the data in sophisticated ways. Kernel regression, orthogonal series expansion, projection pursuit, nearest-neighbor estimators, average derivative estimators, and splines are all examples of smoothing. The following example illustrates the motivation for applying this type of averaging. Suppose that we want to estimate the relation between Y_t and X_t, a pair of variables that satisfy,

   Y_t = m(X_t) + ε_t,   (4.1)

where m(.) is an arbitrary fixed but unknown nonlinear function and {ε_t} is a zero-mean IID (identically and independently distributed) process. Next, suppose that we want to estimate m(.) at a particular date t_0 for which X_{t_0} = x_0. Suppose further that for this one observation X_{t_0} we can obtain repeated independent observations of the variable Y_{t_0}, say Y¹_{t_0} = y¹, ..., Yⁿ_{t_0} = yⁿ. In this case, a natural estimator of the function m(.) at the point x_0 is,

   m̂(x_0) = (1/n) Σ_{i=1}^{n} yⁱ = m(x_0) + (1/n) Σ_{i=1}^{n} εⁱ,   (4.2)

and by the law of large numbers, the second term on the right-hand side becomes negligible for large n. If Y_t is a time series, we do not have the luxury of repeated observations for a given X_t. If, on the other hand, we assume that the function m(.) is sufficiently smooth, then for time series observations X_t near the value x_0, the corresponding values of Y_t should be close to m(x_0). In
other words, if m(.) is sufficiently smooth, then in a small neighborhood around x_0, m(x_0) will be nearly constant and may be estimated by taking an average of the Y_t's that correspond to those X_t's near x_0. The closer the X_t's are to the value x_0, the closer the average of the corresponding Y_t's will be to m(x_0). This argues for a weighted average of the Y_t's, where the weights decline as the X_t's get further away from x_0. This weighted-average procedure for estimating m(x) is the essence of smoothing. More formally, for any arbitrary x, a smoothing estimator of m(x) may be expressed as,

   m̂(x) = (1/T) Σ_{t=1}^{T} ω_{t,T}(x) Y_t,   (4.3)

where the weights {ω_{t,T}(x)} are large for those Y_t's paired with X_t's near x, and small for those Y_t's with X_t's far from x. To implement this concept we need to define the meanings of "near" and "far." If the neighborhood around x is too large for computing the average, then the weighted average will be too smooth and may not exhibit the genuine non-linearities of m(.). If, on the other hand, the neighborhood is too small, then the observational noise will have a strong influence on the computed average, and the estimate of m(.) will be highly variable. The weights {ω_{t,T}(x)} have to be chosen to balance these conflicting effects.
4.3 Kernel Regression

Kernel regression is an important smoothing technique for estimating m(.). The weight function in this approach, ω_{t,T}(x), is constructed from a probability density function K(x), also called a kernel. Therefore,

   K(x) ≥ 0,   ∫ K(u) du = 1.   (4.4)

Though K(x) is a probability density function, it plays no probabilistic part in the subsequent analysis. To the contrary, it serves merely as a convenient method for computing a weighted average. In no case does it imply, for example, that X is distributed according to K(x); if that were the case, the approach would become parametric. By rescaling the kernel with respect to a variable h > 0, we can change its spread by varying h if we define,

   K_h(u) = (1/h) K(u/h),   ∫ K_h(u) du = 1.   (4.5)

We can now define the weight function to be used in the weighted average as,

   ĝ_h(x) = (1/T) Σ_{t=1}^{T} K_h(x − X_t),   (4.6)
   ω_{t,T}(x) = K_h(x − X_t) / ĝ_h(x).   (4.7)

If h is very small, the averaging is done over a rather small neighborhood around each of the X_t's. If h is very large, the averaging is over a large neighborhood of the X_t's. Controlling the degree of averaging amounts to adjusting the smoothing parameter h, also known as the bandwidth. Substituting equations (4.6) and (4.7) into equation (4.3) yields,

   m̂_h(x) = Σ_{t=1}^{T} K_h(x − X_t) Y_t / Σ_{t=1}^{T} K_h(x − X_t).   (4.8)

This is known as the Nadaraya-Watson kernel estimator m̂_h(x) of m(x). Under certain regularity conditions on the shape of the kernel and the magnitude and behavior of the weights, we find that as the sample size grows, m̂_h(x) → m(x) asymptotically. This convergence property holds for a large class of kernels. One of the most popular kernels is the Gaussian kernel, defined by,

   K_h(x) = (1 / (h√(2π))) exp(−x² / (2h²)).   (4.9)
In analyzing different examples with the EViews package, we will make use of this Gaussian kernel.
4.4 Illustration 1 (EViews)

To illustrate the efficacy of kernel regression in capturing nonlinear relations, consider the smoothing technique applied to an artificially generated dataset obtained by Monte Carlo simulation. Let {X_t} denote a sequence of 500 observations that take on values between 0 and 2π at evenly spaced increments, and let {Y_t} be related to {X_t} through the following nonlinear relation:

   Y_t = sin(X_t) + ε_t,   (4.10)

where {ε_t} is a sequence of IID pseudorandom standard normal variates. Using the simulated data pairs {X_t, Y_t}, we attempt to estimate the conditional expectation E[Y_t | X_t] = sin(X_t) by kernel regression. We apply the Nadaraya-Watson estimator (4.8) with a Gaussian kernel to the data, and vary the bandwidth parameter h among 0.1σ̂_x, 0.3σ̂_x and 0.5σ̂_x, where σ̂_x is the sample standard deviation of {X_t}. By varying h in units of the standard deviation we are effectively normalizing the explanatory variable. The kernel estimator can be plotted for each bandwidth. We notice from the plots that the kernel estimator is too choppy when the bandwidth is too small: for a very low bandwidth the information is too sparse to recover sin(X_t), and while the estimator picks up the general shape of the function, it shows local variations due to noise. As the bandwidth increases, these variations are smoothed out. At the intermediate bandwidth, for example, the local noise is largely removed and the general appearance of the estimator is quite appealing. At a still higher bandwidth, the noise is completely removed but the estimator fails to capture the genuine profile of the sine function. In the limit, the kernel estimator approaches the sample average of {Y_t} and all the variability with respect to {X_t} is lost.
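The same experiment is easy to reproduce outside EViews. The sketch below (not from the original text; Python/NumPy is used purely for illustration, and the noise scale is an assumption) simulates the sine relation and evaluates the Nadaraya-Watson estimator (4.8) with a Gaussian kernel for the three bandwidths 0.1σ̂_x, 0.3σ̂_x and 0.5σ̂_x.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500
X = np.linspace(0.0, 2 * np.pi, T)
Y = np.sin(X) + rng.normal(size=T)          # noisy observations of sin(X)

def nw_estimator(x_grid, X, Y, h):
    """Nadaraya-Watson estimator (4.8) with a Gaussian kernel of bandwidth h."""
    # The 1/(h*sqrt(2*pi)) normalization cancels in the ratio, so it is omitted.
    weights = np.exp(-0.5 * ((x_grid[:, None] - X[None, :]) / h) ** 2)
    return (weights @ Y) / weights.sum(axis=1)

sx = X.std()
for h in (0.1 * sx, 0.3 * sx, 0.5 * sx):
    m_hat = nw_estimator(X, X, Y, h)
    rmse = np.sqrt(np.mean((m_hat - np.sin(X)) ** 2))
    print(f"h = {h:.3f}:  RMSE against sin(x) = {rmse:.3f}")
```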
This experiment may be carried out with other kernel functions (provided by EViews) as well. EViews also allows automatic selection of bandwidth. This brings us to the topic of optimal bandwidth selection.
4.5 Optimal Bandwidth Selection

Choosing the proper bandwidth is critical in kernel regression. Among the several methods available for bandwidth selection, the most common is called cross-validation. This method chooses the bandwidth to minimize the weighted average squared error of the kernel estimator. For a sample of T observations {X_t, Y_t}_{t=1}^{T}, let

   m̂_{h,j}(X_j) = (1/T) Σ_{t≠j} ω_{t,T}(X_j) Y_t.

This is basically the kernel estimator based on the dataset with the jth observation deleted, evaluated at the jth value X_j. The cross-validation function CV(h) is defined as,

   CV(h) = (1/T) Σ_{j=1}^{T} [Y_j − m̂_{h,j}(X_j)]² δ(X_j),

where δ(X_t) is a non-negative weight function required to reduce boundary (edge) effects (for additional information, see Hardle (1990)). The function CV(h) is called cross-validation since it validates the success of the kernel estimator in fitting {Y_t} across the T subsamples {X_t, Y_t}, each with one observation omitted. The optimal bandwidth is the one that minimizes this function.
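Continuing the previous sketch (again not from the original text, which relies on EViews' built-in bandwidth selection), the leave-one-out cross-validation criterion can be coded directly and minimized over a grid of candidate bandwidths; δ(X_t) is taken to be identically one here for simplicity.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
X = np.linspace(0.0, 2 * np.pi, T)
Y = np.sin(X) + rng.normal(size=T)

def cv_score(h, X, Y):
    """Leave-one-out cross-validation criterion CV(h) with unit weights."""
    K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)  # kernel matrix
    np.fill_diagonal(K, 0.0)                                  # drop own observation
    m_loo = (K @ Y) / K.sum(axis=1)                           # leave-one-out fits
    return np.mean((Y - m_loo) ** 2)

grid = np.linspace(0.05, 1.0, 40) * X.std()
scores = [cv_score(h, X, Y) for h in grid]
h_opt = grid[int(np.argmin(scores))]
print("cross-validated bandwidth:", h_opt)
```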
4.6 Illustration 2 (EViews)

This example will acquaint you with the use of EViews for applying both non-parametric and parametric estimation procedures in the modeling of the short-term interest rate. Academic researchers and practitioners have
been investigating ways to model the behavior of the short-term interest rate for many years. The short-term interest rate plays very important roles in financial economics; to cite just one use, it is crucial in the pricing of fixed income securities and interest rate contingent claims. The following example deals with a method to model the mean and variance of changes in the short-term interest rate. The behavior of the short-term interest rate is generally represented by,

   dr_t = M(r_t) dt + √V(r_t) dW_t,   (4.13)

where r_t is the interest rate, dW_t is the standard Brownian motion, M(.) is the conditional mean function of dr_t, and V(.) is the conditional variance function of dr_t. In estimating the model in equation (4.13), a non-parametric method does not need to specify the functions M(.) and V(.). As part of the exercise, a standard parametric form may also be estimated for equation (4.13), and the volatility estimated by the two methods may then be compared. The standard non-parametric regression model is,

   Y_t = f(X_t) + v_t,   (4.14)

where Y_t is the dependent variable, X_t is the independent variable, and v_t is IID with mean zero and finite variance. The aim is to obtain a non-parametric estimate of f(.) using the Nadaraya-Watson estimator. The conditional mean and variance of the interest rate changes can be defined as,

   M(r_t) = E[Δr_t | r_t],   (4.15)
   V(r_t) = E[(Δr_t)² | r_t],   (4.16)
where X_t = r_t, Y_1t = Δr_t, and Y_2t = (Δr_t)². Estimates of the conditional means E[Y_1t | X_t = x] and E[Y_2t | X_t = x] are obtained from the following non-parametric regressions:

   Y_1t = f_1(X_t) + v_1t,   (4.17)
   Y_2t = f_2(X_t) + v_2t.   (4.18)

The means in equations (4.17) and (4.18) can be estimated using the estimator in equation (4.8). This provides the estimates of M̂(r_t) and V̂(r_t) in equations (4.15) and (4.16), respectively. In this process, we use a Gaussian kernel and the optimal bandwidth suggested by Silverman (available in EViews). The variance estimate obtained from this non-parametric method may be compared with that from a parametric specification. A popular parametric specification for the short-term interest rate is the GARCH-M (GARCH-in-Mean) model, in which the conditional variance enters the mean equation for the interest rate change and itself follows a GARCH recursion (equations (4.19) and (4.20)).
We encourage you to study the two different variance estimates: the non-parametric one obtained from equation (4.16) and the parametric one obtained from equation (4.20). Both of these methods have been applied in practice in different applications. The dataset for this exercise is described in the next section.
4.7 Examples and Exercises

Exercise 4.1: The empirical model for equity returns in the CAPM (capital asset pricing model) framework most commonly adopted by researchers is given by,

   r_it − r_ft = α_i + β_i (r_mt − r_ft) + ε_it.
The left-hand side represents the excess return on the asset and the right-hand side is a linear function of the excess return on the market. The aim of this exercise is to explore, using non-parametric kernel regression, whether such a linear relation holds for a given dataset. The dataset consists of weekly Japanese market data spanning January 1990 to December 2000. It contains data on the excess return in the banking sector as well as the excess return on the total market. You may also examine the above relationship using a GARCH(1,1) structure for the residual term.
References

Hafner CM (1998) Nonlinear time series analysis with applications to foreign exchange rate volatility. Physica-Verlag, Berlin
Hardle W (1990) Applied nonparametric regression. Cambridge University Press, Cambridge
Hart JD (1997) Nonparametric smoothing and lack-of-fit tests. Springer, Berlin
Niizeki MK (1998) Empirical tests of short-term interest rate models: a nonparametric approach. Applied Financial Economics 8: 347-352
Pagan A, Ullah A (1999) Nonparametric econometrics. Cambridge University Press, Cambridge
Scott DW (1992) Multivariate density estimation. John Wiley & Sons, New York
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, New York
5 Unit Root, Cointegration and Related Issues
5.1 Stationary Process

A stochastic process {y_t} is weakly stationary or covariance stationary if it satisfies the following conditions:

   1. E[y_t] is a constant;
   2. V[y_t] is a constant;
   3. Cov[y_t, y_{t−s}] is a function of s, but not of t, where s = ±1, ±2, ....

In other words, a stochastic process {y_t} is said to be weakly stationary (or covariance stationary) if its mean, variance and autocovariances are unaffected by changes of time. Note that the covariance between observations in the series is a function only of how far apart the observations are in time. Since the covariance between y_t and y_{t−s} is a function of s, we can define the autocovariance function:

   γ(s) = Cov[y_t, y_{t−s}].   (5.1)

Equation (5.1) shows that for s = 0, γ(0) is equivalent to the variance of y_t. Further, the autocorrelation function (ACF) or correlogram between y_t and y_{t−s} is obtained by dividing γ(s) by the variance γ(0) as follows:

   ρ(s) = γ(s) / γ(0).   (5.2)

Enders (1995) and Hamilton (1994) are good references for understanding time series models; the explanations in chapters 5 and 6 rely on them.
The properties of a stationary stochastic process are represented by its autocorrelation function. Example 5.1: white noise process. The simplest stationary stochastic process is called the white noise process, u_t. This process has the following properties:

γ(s) = σ²   for s = 0,
       0     for s = ±1, ±2, ....

A sequence u_t is a white noise process if each value in the sequence has a mean of zero, a constant variance and no autocorrelation. Example 5.2: MA(1) process. The first order moving average process, i.e., the MA(1) process, is written as follows:
where u_t is a white noise process. Thus,
Cov[y_t, y_{t-s}] = γ(s) = (1 + θ²)σ²   for s = 0,
                           θσ²           for s = ±1,
                           0             for s = ±2, ±3, ...,

and

ρ(s) = 1              for s = 0,
       θ/(1 + θ²)     for s = ±1,
       0              for s = ±2, ±3, ....
Generally speaking, the MA(q) process is written as follows:
Note that finite moving average processes are stationary regardless of the parameter values. Example 5.3: AR(1) process. The first order autoregressive process, i.e., the AR(1) process, is written as:
where u_t is a white noise process. If |φ| < 1, we characterize the AR(1) process as stationary. Thus,
ρ(s) = 1        for s = 0,
       φ^|s|    for s = ±1, ±2, ....

As is clear from equation (5.17), ρ(s) → 0 as |s| → ∞. A more general AR(p) process would be written as follows:
The AR(p) process is stationary if the roots of the characteristic equation,
have modulus greater than 1 or lie outside the unit circle. The AR(1) process is the simplest case and its characteristic equation is,
with a single root of 1/φ. This lies outside the unit circle if |φ| < 1.
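This stationarity condition is easy to check numerically. The sketch below (Python, with purely illustrative coefficients) computes the roots of the characteristic equation 1 − φ_1 z − ... − φ_p z^p = 0 and verifies that all of them lie outside the unit circle.

```python
import numpy as np

def ar_is_stationary(phi):
    """phi = [φ1, ..., φp]; stationary if every root of
    1 - φ1 z - ... - φp z^p = 0 has modulus greater than one."""
    coeffs = np.r_[-np.asarray(phi, dtype=float)[::-1], 1.0]  # highest power first
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0)), roots

print(ar_is_stationary([0.5]))        # AR(1), φ = 0.5: single root 1/φ = 2, stationary
print(ar_is_stationary([1.0]))        # unit root: root equal to 1, not stationary
print(ar_is_stationary([0.6, 0.3]))   # an AR(2) example
```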
5.2 Unit Root There are important differences between stationary and nonstationary time series. Shocks to a stationary time series are temporary, while a nonstationary series has permanent components. Consider the following AR(1) process:
where u_t is a white noise process. If |φ| < 1, then equation (5.21) satisfies the stationarity condition. If φ = 1, then equation (5.21) becomes

y_t = μ + y_{t-1} + u_t,    (5.22)

which is a nonstationary stochastic process. The root φ = 1 is called the unit root. We focus on this issue as the nonstationary case. Assuming an initial value of y_0, we iterate the substitution into equation (5.22) to obtain the following:

y_t = tμ + y_0 + u_1 + u_2 + ... + u_t.    (5.23)

As we clearly see from equation (5.23), y_t is the sum of the past shocks, and each past shock has the same permanent impact on y_t. Taking the expectation of equation (5.23), we obtain the following:

E[y_t] = tμ + y_0.    (5.24)

Equation (5.24) shows that the expectation of y_t has a linear trend. That is, if a constant term is included in equation (5.22), the expectation of y_t has a time trend. The constant term in equation (5.22) is called a drift term. In order to transform the nonstationary process into a stationary process, we need to take differences. Let us define the lag operator (L) as follows:
A related operation is the first difference,
where Δ is the difference operator. Using equation (5.26), we can rewrite equation (5.19) as:
The right-hand side of equation (5.27) is a stationary process with mean μ and finite variance. For many economic data series, we can obtain a stationary process simply by taking the first difference. A process is called difference stationary if it is not stationary but its first difference is stationary. On the other hand, a process is called trend stationary if it is stationary after subtracting from it a function of time. If taking a first difference produces a stationary process, the series y_t is said to be integrated of order one and is denoted by I(1). A nonstationary series is integrated of order d and is denoted by I(d) if it becomes stationary after being differenced d times. Note that a stationary process is denoted by I(0).
5.3 Dickey-Fuller Test Suppose that y_t follows the AR(1) process,
where u_t is a white noise process. If |φ| < 1, then y_t is an I(0) process. If φ = 1, however, y_t has a unit root and is thus an I(1) process. Dickey and Fuller proposed the following approach for the analysis of the unit root. First, rewrite equation (5.28) as follows:
where ρ = φ − 1. Next, consider the following hypothesis testing:
Dickey and Fuller (1979) show that under the null hypothesis of a unit root, the t-value of ρ̂ (the estimator of ρ) does not follow the conventional t-distribution. Proceeding thus, they derive asymptotic results and simulate critical values for various test specifications and sample sizes. They specify the following three regression equations and provide statistical tables of the critical values of ρ̂ for each case. This test is called the Dickey-Fuller test (DF test).
The main difference among equations (5.31), (5.32) and (5.33) is the specification of the deterministic terms. Equation (5.31) has no deterministic term, equation (5.32) has a constant, and equation (5.33) has a constant and a time trend. In all cases, the parameter of interest in the regressions is ρ; if ρ = 0, the y_t sequence contains a unit root. The test is performed by estimating one or more of the equations using OLS in order to obtain the estimated value of ρ and the associated standard error. By comparing the resulting t-statistic with the appropriate value reported in the Dickey-Fuller tables, we can determine whether to accept or reject the null hypothesis ρ = 0. The methodology is precisely the same for all three forms of the equation estimated. Be aware, however, that the critical values of the t-statistics do depend on whether an intercept and/or time trend is included in the regression equation. Dickey and Fuller found that the critical values for ρ = 0 depend on the specification of the regression and the sample size. The augmented Dickey-Fuller test (ADF test) is the same as that above, carried out in the context of the following models:
See Fuller (1976).
The advantage of this formulation is that it can accommodate serial correlation in the error term, u_t. The unit root test is carried out against the alternative ρ < 0, as before. Note that the asymptotic distribution of the t-ratio for ρ is independent of the number of lagged first differences (Δy_{t-i}) included in the ADF regression. The coefficients of the lagged values of Δy_{t-i} in equations (5.34) through (5.36) are not generally of interest. Nevertheless, it is important to ensure that the u_t series approximates white noise. The inclusion of too many lags reduces the power of the test to reject the null of a unit root, since the increased number of lags necessitates the estimation of additional parameters and a loss of degrees of freedom. How do we select the appropriate lag length in these circumstances? One approach is to use an information criterion to determine the lag length. The most commonly used criteria for selecting models are the Akaike Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (SBIC). AIC and SBIC are defined as follows:

AIC = ln σ̂² + (2/T) N,

SBIC = ln σ̂² + (ln T / T) N,

where σ̂² is the sum of squared residuals divided by the sample size (T) and N is the number of parameters. As the number of parameters increases, the fit of the model gets better, and thus the first term in the AIC or SBIC declines. At the same time, however, the second term increases. The AIC or SBIC strikes a balance between a better fit and model parsimony. We choose the lag length that leads to the minimum AIC or SBIC. An alternative approach is to start with a relatively long lag length and pare down the model by the usual t-test. For example, one could estimate equation (5.34) using a lag length p. If the t-statistic on lag p is insignificant at some specified critical value, the regression is repeatedly estimated using a lag length p−1, and so on, until the last lag is significantly different from zero. Once a tentative lag length has been determined, diagnostic checking may be conducted.
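This lag-selection logic is automated in standard software. The sketch below (Python with statsmodels; the simulated random walk is only a placeholder for real data) runs the ADF regressions of equations (5.34) through (5.36) with the lag length chosen by the AIC.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(400))   # simulated random walk (has a unit root)

# regression="c": constant; "ct": constant and time trend
# (the no-deterministic-term case is "n", or "nc" in older statsmodels versions)
for spec, label in [("c", "constant"), ("ct", "constant and time trend")]:
    stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression=spec, autolag="AIC")
    print(f"{label:24s} stat={stat:7.3f}  p-value={pvalue:.3f}  lags={usedlag}")
```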
5.4 Cointegration The concept of cointegration applies to a wide variety of economic models. Any equilibrium relationship among a set of nonstationary variables implies that their stochastic trends must be linked. After all, the equilibrium relationship means that the variables cannot move independently of each other. This linkage among the stochastic trends requires that the variables be cointegrated. The formal analysis of cointegration begins by considering a set of economic variables in long-run equilibrium, where
The deviation from long-run equilibrium, the so-called equilibrium error, is u_t, leaving us with,
If the equilibrium is meaningful, the equilibrium error process must be stationary. The components of the vector y_t = [y_{1,t}, y_{2,t}]' are said to be cointegrated if 1. all components of y_t = [y_{1,t}, y_{2,t}]' are integrated of order one, and 2. there exists a vector π = [π_1, π_2]' such that the linear combination π'y_t = π_1 y_{1,t} + π_2 y_{2,t} is integrated of order zero.
The vector π = [π_1, π_2]' is called the cointegrating vector. Note that the cointegrating vector is not unique. If π = [π_1, π_2]' is a cointegrating vector, then [λπ_1, λπ_2]' is also a cointegrating vector for any nonzero value of λ. One of the variables is typically used to normalize the cointegrating vector by fixing its coefficient at unity. To normalize the cointegrating vector with respect to y_{1,t}, simply select λ = 1/π_1.
5.5 Residual-based Cointegration Test Engle and Granger (1987) propose a straightforward test to determine whether two I(1) variables are cointegrated. Suppose that we want to determine whether there exists an equilibrium relationship between the two variables y_{1,t} and y_{2,t}. By definition, cointegration requires that the variables be integrated. Thus, the first step in the analysis is to infer the existence of a unit root in each of the variables based on the ADF test. If the results of the unit root test indicate that both y_{1,t} and y_{2,t} are I(1), the next step is to estimate the long-run equilibrium relationship in the form:
where u_t is a disturbance term. When the variables are cointegrated, deviations from long-run equilibrium are stationary. Thus, we need to perform an ADF test on the residuals (û_t) to analyze their order of integration. Consider the autoregression of the residuals:
where v_t is a disturbance term. Then, we can carry out the following hypothesis testing:
Acceptance of the null hypothesis (ρ = 0) implies that the residual series contains a unit root. Hence, we conclude that y_{1,t} and y_{2,t} are not cointegrated. Conversely, rejection of the null hypothesis implies that the residual sequence is stationary. Given that both y_{1,t} and y_{2,t} are I(1) and that the residuals are stationary, we can conclude that the series are cointegrated. Note that we cannot use the Dickey-Fuller tables to test for cointegration. The problem is that the û_t sequence is generated from a regression equation; we can observe the residuals (û_t) but not the true error (u_t). Only if β_0 and β_1 were known in advance would an ordinary Dickey-Fuller table be appropriate. The appropriate tables for the cointegration test are provided by Engle and Yoo (1987). Although the Engle-Granger test is easily implemented, it has some limitations. First, the estimation of the long-run equilibrium regression requires that we place one variable on the left-hand side and use the others as regressors. In practice, it is possible to find that one regression indicates the variables are cointegrated whereas reversing the order indicates no cointegration. This is an undesirable feature of the procedure, since the test for cointegration should be invariant to the choice of the variable selected for normalization. Second, there may be more than one cointegrating vector in tests using three or more variables. The Engle-Granger test cannot handle this problem.
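A minimal sketch of the two-step procedure, assuming two I(1) series, is given below (Python with statsmodels). The packaged coint function applies the Engle-Granger critical values to the residual-based test, which is why it is preferred over reading the residual ADF statistic against the ordinary Dickey-Fuller tables.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(2)
trend = np.cumsum(rng.standard_normal(500))          # shared stochastic trend
y1 = trend + rng.standard_normal(500)
y2 = 0.5 * trend + rng.standard_normal(500)

# Step 1: long-run regression y1 = b0 + b1*y2 + u
step1 = sm.OLS(y1, sm.add_constant(y2)).fit()
resid = step1.resid                                  # the residual series used in step 2

# Step 2: residual-based test, reported with Engle-Granger critical values
stat, pvalue, crit = coint(y1, y2, trend="c")
print(f"Engle-Granger statistic = {stat:.3f}, p-value = {pvalue:.3f}")
```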
5.6 Unit Root in a Regression Model The unit root issue arises quite naturally in the context of the standard regression model. Consider the regression equation:
where u_t is a disturbance term. The assumptions of the classical regression model require, firstly, that both the x_t and y_t sequences be stationary and, secondly, that the error term have a zero mean and finite variance. In the presence of nonstationary variables, there might be what Granger and Newbold (1974) call a spurious regression. A spurious regression has a high
R², t-statistics that appear to be significant, and a low Durbin-Watson statistic. The regression output looks good, but the results lack any economic meaning. In any case, we must remember to be careful in working with nonstationary variables. In terms of equation (5.44), we have three cases to consider.

Case 1: x_t ~ I(0) and y_t ~ I(0). Both x_t and y_t are stationary. When both variables are stationary, the classical regression model is appropriate.

Case 2: x_t ~ I(1), y_t ~ I(1) and u_t ~ I(1). The nonstationary sequences x_t and y_t are integrated of order one and the residual sequence contains a stochastic trend. This is the case in which the regression is spurious. The frequent recommendation in this case is that the regression equation be estimated in first differences. Consider the first difference of equation (5.44):

Δy_t = β_1 Δx_t + Δu_t.    (5.45)

Since x_t, y_t, and u_t each contain unit roots, the first difference of each is stationary. Hence, the usual asymptotic results apply.

Case 3: x_t ~ I(1), y_t ~ I(1) and u_t ~ I(0). The nonstationary x_t and y_t sequences are integrated of order one and the residual sequence is stationary. In this case, x_t and y_t are cointegrated.
5.7 Application to Stock Markets This section applies the unit root test to stock price data from the USA and Japan. The prices are measured based on the logarithmic values of the prices at the end of each month over a sample period from December 1969 to March 2004. The data are taken from the Morgan Stanley Capital International Index.
Table 5.1 Unit root test: USA

                              Level                                  First Difference
Specification                 Test statistic  P-value  Lag Length    Test statistic  P-value  Lag Length
None                           2.551          0.998    0             -19.630         0.000    0
Constant                       0.121          0.967    0             -19.936         0.000    0
Constant and time trend       -2.299          0.433    0             -19.933         0.000    0

Table 5.2 Unit root test: Japan

                              Level                                  First Difference
Specification                 Test statistic  P-value  Lag Length    Test statistic  P-value  Lag Length
None                           1.537          0.970    0             -18.662         0.000    0
Constant                      -1.977          0.297    0             -18.781         0.000    0
Constant and time trend       -0.846          0.959    0             -18.907         0.000    0
Table 5.1 and Table 5.2 show the empirical results. The ADF test is used to test whether the stock price index for each country has a unit root. The unit root test statistic is the t-value of ρ̂ in equations (5.34) through (5.36). The null hypothesis is that a unit root is present, and the alternative hypothesis is that a unit root is not present. The lag length is selected using the AIC. A lag length of 0 is chosen in every case for both countries. In the case of the USA, the test statistic and its associated P-value are 2.551 and 0.998, 0.121 and 0.967, and -2.299 and 0.433 for the level of the stock price index, and -19.630 and 0.000, -19.936 and 0.000, and -19.933 and 0.000 for the first difference. The null hypothesis of a unit root is not rejected for the level of the stock price index but is rejected for the first
difference. Accordingly, the US stock price index is found to be an I(1) variable. For Japan, the test statistic and its associated P-value are 1.537 and 0.970, -1.977 and 0.297, and -0.846 and 0.959 for the level of the stock price index, and -18.662 and 0.000, -18.781 and 0.000, and -18.907 and 0.000 for the first difference. The null hypothesis of a unit root is not rejected for the former case but is rejected for the latter. Thus, the Japanese stock price index is also found to be an I(1) variable.
References
Dickey D, Fuller WA (1979) Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74: 427-431
Enders W (2004) Applied econometric time series, 2nd edn. John Wiley & Sons, New York
Engle RF, Granger CWJ (1987) Cointegration and error correction: representation, estimation and testing. Econometrica, 55: 251-276
Engle RF, Yoo BS (1987) Forecasting and testing in cointegrated systems. Journal of Econometrics, 35: 143-159
Fuller WA (1976) Introduction to statistical time series. John Wiley & Sons, New York
Granger CWJ, Newbold P (1974) Spurious regressions in econometrics. Journal of Econometrics, 2: 111-120
Hamilton JD (1994) Time series analysis. Princeton University Press, Princeton
6 VAR Modeling
6.1 Stationary Process Let {y_t} be an n-dimensional vector process, y_t = [y_{1,t}, y_{2,t}, ..., y_{n,t}]'. Then, the stochastic process {y_t} is weakly stationary or covariance stationary if it satisfies the following conditions:
1. E[y_t] is constant for all t;
2. E[(y_t − μ)(y_t − μ)'] is constant for all t;
3. E[(y_t − μ)(y_{t−s} − μ)'] depends only on s for all t, where s = ±1, ±2, ....
The first condition indicates that y_t has the same finite mean vector μ = [μ_1, μ_2, ..., μ_n]'. The second condition shows that y_t has the same finite variance matrix. The third condition requires that the autocovariances of the process depend not on t but on s. The subject of interest in this chapter is the following VAR(p) model (vector autoregression of order p model):
where Φ_0 is a fixed (n × 1) vector of intercept terms, Φ_i (i = 1, 2, ..., p) are fixed (n × n) coefficient matrices, and u_t = [u_{1t}, u_{2t}, ..., u_{nt}]' is an n-dimensional innovation process that satisfies the following conditions:
E[u_t] = 0, and

E[u_t u_{t−s}'] = Σ   for s = 0,
                 0    otherwise,

where the covariance matrix Σ is assumed to be nonsingular. The VAR(p) process is covariance stationary if all values of z satisfying

|I_n − Φ_1 z − Φ_2 z² − ... − Φ_p z^p| = 0

lie outside the unit circle. Example 6.1: The bivariate VAR(1) model can be written as follows:
Note that the right-hand side of equation (6.1) contains only predetermined variables and that the error terms are serially uncorrelated with constant variance. Hence, each equation in the system can be estimated using OLS (ordinary least squares). Moreover, the OLS estimators are consistent and asymptotically efficient. In a VAR, long lag lengths quickly consume degrees of freedom. If the lag length is p, each of the n equations contains np coefficients plus the intercept term. It is important to select the appropriate lag length. To check the lag length, we can use the multivariate version of the AIC or SBIC:
If some of the VAR equations have regressors not included in the others (including the possibility of different lag lengths), the system is called a near-VAR. In this case, SUR (seemingly unrelated regressions) can provide efficient estimates of the VAR coefficients.
AIC = ln|Σ̂| + (2/T) N,

SBIC = ln|Σ̂| + (ln T / T) N,

where |Σ̂| is the determinant of the variance-covariance matrix of the residuals and N is the total number of parameters estimated in all equations. If each equation in an n-variable VAR has p lags and an intercept, each of the n equations has np lagged regressors and an intercept. Thus, there are N = n²p + n parameters. We can determine the appropriate lag length by choosing the specification with the lowest value of the AIC or SBIC.
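A sketch of lag-length selection with the multivariate criteria is given below (Python with statsmodels; the bivariate simulation is a stand-in for real data). The reported AIC and BIC correspond to the criteria above up to inessential constants, and the chosen order minimizes the criterion.

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(3)
T = 400
y = np.zeros((T, 2))
for t in range(1, T):                         # simulate a stationary bivariate VAR(1)
    y[t] = np.array([0.5, 0.2]) * y[t - 1] + rng.standard_normal(2)

model = VAR(y)
order = model.select_order(maxlags=8)         # AIC, BIC (SBIC), FPE, HQ for lags 0..8
print(order.summary())
results = model.fit(order.aic)                # fit with the AIC-minimizing lag length
print("selected lag length:", results.k_ar)
```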
6.2 Granger Causality Granger (1969) developed a method to analyze causal relationships among variables systematically. In the Granger approach to causality from y_{2,t} to y_{1,t}, we determine whether the past values of y_{2,t} can help to explain the current y_{1,t}. We begin by defining three information sets,
The information set I_{1,t} consists of the history of y_{1,t}, the information set I_{2,t} consists of the history of y_{2,t}, and the information set I_t consists of both y_{1,t} and y_{2,t}. We say that y_{2,t} Granger-causes y_{1,t} if
Equation (6.8) implies that y_{2,t} Granger-causes y_{1,t} (y_{2,t} → y_{1,t}) if y_{2,t} helps in the prediction of y_{1,t}. If y_{2,t} does not help in the prediction of y_{1,t}, then y_{2,t} does not Granger-cause y_{1,t} (y_{2,t} ↛ y_{1,t}). Similarly, we say that y_{1,t} Granger-causes y_{2,t} if
To explain the testing procedure concretely, let us consider the following bivariate VAR(p) process:
where u_{i,t} (i = 1, 2) is a disturbance term. Then, y_{2,t} does not Granger-cause y_{1,t} if
To analyze whether y_{2,t} Granger-causes y_{1,t}, we carry out the following hypothesis testing:
Rejection of the null hypothesis (H_0) implies that some of the coefficients on the lagged y_{2,t}'s are statistically significant, whereas acceptance of the null hypothesis may indicate that none of the coefficients on the lagged y_{2,t}'s are statistically significant. In the former case y_{2,t} Granger-causes y_{1,t}, while in the latter y_{2,t} may not Granger-cause y_{1,t}. This can be tested using the F test or an asymptotic chi-square test. For the bivariate model, we have four cases to consider.
The F-statistic is

F = [(RSS − USS)/p] / [USS/(T − 2p − 1)] ~ F(p, T − 2p − 1),

where T is the sample size, RSS is the restricted residual sum of squares and USS is the unrestricted residual sum of squares. It is also shown that pF converges asymptotically to a χ²(p) distribution.
Case 1: y_{1,t} → y_{2,t} but y_{2,t} ↛ y_{1,t}. In this case, we have a one-way causality running from y_{1,t} to y_{2,t}.

Case 2: y_{2,t} → y_{1,t} but y_{1,t} ↛ y_{2,t}. In this case, we have a one-way causality running from y_{2,t} to y_{1,t}.

Case 3: y_{1,t} → y_{2,t} and y_{2,t} → y_{1,t}. Here we obtain feedback between y_{1,t} and y_{2,t}.

Case 4: y_{1,t} ↛ y_{2,t} and y_{2,t} ↛ y_{1,t}. Here we obtain no causal relationship between y_{1,t} and y_{2,t}.

While Granger causality measures precedence and information content, note that it does not measure causality by itself in the more common sense of the word.
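The F test just described is implemented in most VAR software; a brief sketch follows (Python with statsmodels, simulated data in which the first variable drives the second). The null hypothesis is that all lagged coefficients of the causing variable are zero in the equation of the caused variable.

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(4)
T = 500
y1 = np.zeros(T)
y2 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.4 * y1[t - 1] + rng.standard_normal()
    y2[t] = 0.3 * y2[t - 1] + 0.5 * y1[t - 1] + rng.standard_normal()   # y1 -> y2

res = VAR(np.column_stack([y1, y2])).fit(2)

# H0: variable 0 (first column) does not Granger-cause variable 1 (second column)
test = res.test_causality(caused=1, causing=0, kind="f")
print(test.summary())
```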
6.3 Cointegration and Error Correction As discussed in the last chapter, the principal feature of cointegrated variables is the influence of deviations from long-run equilibrium on their time paths. Thus, the deviation from the long-run relationship influences the short-run dynamics. Here, let all variables in y_t = [y_{1,t}, y_{2,t}]' be I(1). Consider the following VAR(2) model:
where u_t = [u_{1,t}, u_{2,t}]' is a vector of disturbance terms. Then, it holds that
where Π = Φ_1 + Φ_2 − I and Γ = −Φ_2. If we solve equation (6.14) in terms of Πy_{t−1}, we obtain the following:
Since the right-hand side of equation (6.15) is stationary, the left-hand side should also be stationary. Here we must consider two cases that satisfy stationarity. Case 1: Π = 0. In the first case, Π is a zero matrix. Under this condition, equation (6.14) becomes:
This corresponds to the standard VAR in first differences.
Case 2: Πy_{t−1} ~ I(0). In the second case, the product of Π and y_{t−1} is stationary. This implies that a linear combination of y_{1,t} and y_{2,t} is stationary, and thus that these two variables are cointegrated. In this case, Πy_{t−1} can be expressed as:
where
The α vector is called the adjustment vector and β is the cointegrating vector. Substituting equation (6.17) into equation (6.13) yields the following:
The term EC_{t−1} is called the error correction term and equation (6.18) is called the vector error correction model (VECM). The cointegrating relation is interpreted as the long-run equilibrium. That is, the relationship β_1 y_{1,t} + β_2 y_{2,t} = 0 is satisfied in the long run. The error correction term is interpreted as the deviation from the long-run equilibrium. Equation (6.18) is called the error correction model since y_t moves in such a way that it adjusts to the past error (EC_{t−1}). Thus, estimating y_t as a VAR in first differences is inappropriate if y_t has an error-correction representation.
6.4 Johansen Test As we discussed in the last chapter, the Engle-Granger test has several limitations. Johansen and Juselius (1990) developed an alternative approach to test for cointegration. Consider the following general model of equation (6.14):
We can use the rank of Π to determine whether or not the variables in y_t are cointegrated: the rank of Π is equal to the number of independent cointegrating vectors. The number of cointegrating vectors can be obtained by checking the significance of the characteristic roots of Π. We know that the rank of a matrix is equal to the number of its characteristic roots that differ from zero. If the variables in y_t are not cointegrated, rank(Π) = 0 and all of these characteristic roots equal zero. There is a single cointegrating vector if rank(Π) = 1. There are multiple cointegrating vectors if 1 < rank(Π) < n, where n is the number of variables. Two testing procedures can be used to find the number of cointegrating vectors, denoted by r. The first procedure, the "trace test," uses a test statistic written as λ_trace. The trace test tests the null hypothesis that the number of distinct cointegrating vectors is less than or equal to r against a general alternative. Thus:
The second procedure, the "maximum eigenvalue test," uses a statistic written as λ_max. The maximum eigenvalue test tests the null hypothesis that the number of cointegrating vectors is r against the alternative of r + 1 cointegrating vectors. Thus:
We apply these tests in a sequential manner starting from H_0: r = 0. The testing sequence terminates when H_0 is rejected for the first time. Osterwald-Lenum (1992) refines the critical values of the λ_max and λ_trace statistics originally calculated by Johansen and Juselius (1990).
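A sketch of this sequential testing procedure is given below (Python with statsmodels; the simulated pair shares one stochastic trend, so a single cointegrating vector should be found). The trace and maximum-eigenvalue statistics are compared with their 5% critical values, starting from r = 0.

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(5)
trend = np.cumsum(rng.standard_normal(500))
levels = np.column_stack([trend + rng.standard_normal(500),
                          0.7 * trend + rng.standard_normal(500)])

# det_order=0: constant term; k_ar_diff: number of lagged differences in the VECM
res = coint_johansen(levels, det_order=0, k_ar_diff=1)
print("trace statistics:         ", res.lr1)        # H0: rank <= r, r = 0, 1, ...
print("trace 5% critical values: ", res.cvt[:, 1])
print("max-eigenvalue statistics:", res.lr2)        # H0: rank = r vs rank = r + 1
print("max-eig 5% critical vals: ", res.cvm[:, 1])
```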
6.5 LA-VAR As the foregoing discussion makes clear, integration and cointegration must be tested before specifying the model. The VAR model in first differences is used when the variables are integrated of order one and have no cointegration between them, while the VECM is used when the variables are integrated of order one and do have cointegration between them (Fig. 6.1). However, the standard approach to testing economic hypotheses conditioned on the testing of a unit root and cointegration may suffer from severe pretest bias. Toda and Yamamoto (1995) developed the LA-VAR (lag-augmented VAR) approach to overcome this problem. Their approach is appealing because it remains applicable regardless of whether the VAR process is stationary, integrated, or cointegrated. Suppose that a two-dimensional vector y_t = [y_{1,t}, y_{2,t}]' is generated by the VAR(k) model as follows:
Fig. 6.1 Specification of the model
To analyze whether y_{2,t} Granger-causes y_{1,t}, we carry out the following hypothesis testing:
Next, we consider estimating a VAR formulated in levels by ordinary least squares (OLS), as follows:
where p is equal to the true lag length (k) plus the possible maximum order of integration considered in the process (d_max). Note that the order of integration of the process should not exceed the true lag length of the model (d_max ≤ k). Since the true coefficients on lags k + 1, ..., p of the causing variable are zero, note that they are not included in the restriction in (6.23). We can test the null hypothesis using an asymptotic chi-square distribution with k degrees of freedom. As noted by Toda and Yamamoto (1995), however, the LA-VAR method is inefficient in terms of power and should not totally replace conventional hypothesis testing methods, which are conditional on unit root and cointegration tests. Therefore, both methods can be used to assess the robustness of the empirical results.
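The following sketch implements the LA-VAR Wald test by ordinary least squares (Python with NumPy/SciPy; the simulated series and the choices k = 2 and d_max = 1 are purely illustrative). A VAR(k + d_max) equation is fitted in levels and only the first k lags of the hypothesized causing variable are restricted, so the statistic is asymptotically χ²(k).

```python
import numpy as np
from scipy import stats

def lags(x, p, start, stop):
    """Columns are x lagged 1..p, aligned with observations start..stop-1."""
    return np.column_stack([x[start - i: stop - i] for i in range(1, p + 1)])

def toda_yamamoto(caused, causing, k, dmax):
    p, T = k + dmax, len(caused)
    y = caused[p:T]
    X = np.column_stack([np.ones(T - p), lags(caused, p, p, T), lags(causing, p, p, T)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    idx = 1 + p + np.arange(k)                   # coefficients on lags 1..k of "causing"
    b, V = beta[idx], cov[np.ix_(idx, idx)]
    wald = b @ np.linalg.solve(V, b)
    return wald, stats.chi2.sf(wald, df=k)

rng = np.random.default_rng(6)
T = 500
x = np.cumsum(rng.standard_normal(T))            # an I(1) series
dx = np.diff(x, prepend=0.0)
z = np.zeros(T)
for t in range(1, T):                            # x Granger-causes z
    z[t] = z[t - 1] + 0.5 * dx[t - 1] + rng.standard_normal()

wald, pval = toda_yamamoto(caused=z, causing=x, k=2, dmax=1)
print(f"Wald = {wald:.2f}, p-value = {pval:.4f}")
```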
6.6 Application to Stock Prices This section applies the cointegration test to stock price data from the USA and Japan. The prices are measured based on the logarithmic values of the prices at the end of each month over a sample period from December 1969 to March 2004. The data are taken from the Morgan Stanley Capital International Index. These are the same data used in the last chapter. Table 6.1 shows the empirical results. In the last chapter, we found that both US and Japanese stock prices are I(1) variables. Thus, the Johansen test is used to test whether the stock price indexes of the two countries have a cointegrating relation. The cointegration test statistics are the trace test statistic (λ_trace) and the maximum eigenvalue statistic (λ_max). The null hypothesis holds that there is no cointegrating relation, and the alternative hypothesis holds that there is a cointegrating relation. The results confirm the absence of any cointegrating relation between the US and Japanese stock prices for all lag lengths. Thus, we specify the model as a VAR in first differences and carry out the Granger causality test. Table 6.2 shows the F-test statistic and its corresponding P-value. As we can clearly see from this table, US stock prices Granger-cause Japanese stock prices.
Table 6.1 Cointegration test

                                         lag=1          lag=3          lag=6          5% critical value

Table 6.2 Granger causality test

Null Hypothesis                          lag=1          lag=3          lag=6
USA does not Granger-cause Japan         9.628 (0.002)  3.603 (0.014)  2.271 (0.036)
Japan does not Granger-cause USA         0.064          0.299          1.206

Numbers in parentheses are P-values.
References
Enders W (2004) Applied econometric time series, 2nd edn. John Wiley & Sons, New York
Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37: 161-194
Hamilton JD (1994) Time series analysis. Princeton University Press, Princeton
Johansen S, Juselius K (1990) Maximum likelihood estimation and inferences on cointegration with application to the demand for money. Oxford Bulletin of Economics and Statistics, 52: 169-210
Osterwald-Lenum M (1992) A note with quantiles of the asymptotic distribution of the maximum likelihood cointegration rank test statistics. Oxford Bulletin of Economics and Statistics, 54: 461-472
Toda H, Yamamoto T (1995) Statistical inference in vector autoregressions with possibly near integrated processes. Journal of Econometrics, 66: 225-250
7 Time Varying Volatility Models
7.1 Background Since the seminal work by Engle (1982), the ARCH model has reached a remarkable level of sophistication.' The ARCH model has become one of the most prevalent tools for characterizing changing variance. Consider the basic nature of the forecasting problem. When the volatility of stock returns is constant, the confidence interval for the stock return is a function of the sample variance or sample standard deviation. Here, the volatility implies the conditional variance of asset returns. Yet the shocks that affect the stock returns are also likely to affect the volatility of the stock returns, hence the sample variance or standard deviation will not be constant. For this reason, the development of a reasonably accurate confidence interval for forecasting requires an understanding of the characteristics of volatility in relationship to the stock returns. The ARCH process explicitly recognizes the difference between unconditional and conditional variance and allows the latter to change over time as a function of past errors. Data has also shown that the percentage changes in stock prices have fatter tails than the percentage changes predicted by stationary normal distributions (Kon 1984). The ARCH model recognizes the temporal dependence in the second moment of stock returns and exhibits a leptokurtic distribution for the unconditional errors from the stock-return-generating process. Under the recognized phenomenon known as volatility clustering, a period of increased (decreased) volatility is frequently followed by a period of high (low) volatility that persists for some time. As the ARCH model takes the high persistence of volatility into consideration, it has often been extended to more complex models used to characterize changing variance as a function of time. Bollerslev (1986) extended it into the GARCH (general-
ized ARCH) model; Glosten, Jagannathan, and Runkle (1993) and Zakoian (1994) extended it to the TGARCH (threshold GARCH) model; and Nelson (1991) extended it to the EGARCH (exponential GARCH) model. This chapter focuses on the ARCH-type modeling approach and the causality technique developed by Cheung and Ng (1996).
A survey article by Bollerslev, Chou, and Kroner (1992) cited more than 300 papers applying ARCH, GARCH, and other related models. ARCH and GARCH models were shown to successfully model time-varying volatility in financial time-series data.
7.2 ARCH and GARCH Models We begin with a brief review of the ARCH family of statistical models. The ARCH model was originally designed by Engle (1982) to model and forecast the conditional variance. The process allows the conditional variance to change over time as a function of past errors while the unconditional variance remains constant. Let the variable y_t have the following AR(k) process:
where z_t is independently and identically distributed (i.i.d.) with E[z_t] = 0 and E[z_t²] = 1, and z_t and σ_t are statistically independent. It thus holds that
Thus, the conditional variance of y_t is equal to the conditional variance of ε_t. The kurtosis (K) of ε_t is defined as follows:
Suppose z_t has a normal distribution. Since the kurtosis of a normal distribution is equal to 3, we have K(z_t) = E[z_t⁴]/(E[z_t²])² = 3. For ε_t, it holds that
where the second equality follows from the independence of σ_t and z_t, and the inequality in the fourth line is implied by Jensen's inequality. Equation (7.5) shows that the distribution of ε_t has a fatter tail than the normal as long as σ_t is not constant (Campbell, Lo and MacKinlay, 1997). This is consistent with the idea that the percentage changes in stock prices have fatter tails than the percentage changes predicted by a stationary normal distribution (Kon 1984). The ARCH(p) model is specified as follows:
The conditional variance at time t depends on two factors: a constant (ω) and past news about volatility taken as the squared errors from the past (the ARCH term, i.e., Σ_{i=1}^{p} α_i ε_{t−i}²). The p of the ARCH(p) refers to the number of ARCH terms in equation (7.6). The condition ω ≥ 0, α_i ≥ 0 guarantees the non-negativity of the variance. As equation (7.6) clearly shows, the conditional variance is a weighted average of the squared values of past errors. For the ARCH model, it holds that Var_{t−1}[y_t] = E_{t−1}[ε_t²] = σ_t² E_{t−1}[z_t²] = σ_t², where σ_t² is the conditional variance of y_t and is called the volatility.
Let X be a random variable with mean E[X], and let g(·) be a convex function. Jensen's inequality implies E[g(X)] ≥ g(E[X]). For example, g(x) = x² is convex, hence E[X²] ≥ (E[X])².
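A small simulation sketch of an ARCH(1) process (Python, with illustrative parameter values only) makes the two points above concrete: the simulated errors display volatility clustering, and their sample kurtosis exceeds 3 even though z_t is standard normal.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 20000
omega, alpha = 0.2, 0.5                      # ω ≥ 0, 0 ≤ α < 1

eps = np.zeros(T)
sigma2 = np.full(T, omega / (1 - alpha))     # start at the unconditional variance
for t in range(1, T):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2        # conditional variance
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

kurtosis = np.mean(eps ** 4) / np.mean(eps ** 2) ** 2
print(f"sample kurtosis = {kurtosis:.2f} (normal distribution: 3)")
```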
The GARCH model developed by Bollerslev (1986) is an extension of the ARCH model. The ARCH(p) process specifies the conditional variance solely as a linear function of past sample variances, whereas the GARCH(p,q) process allows lagged conditional variances to enter as well. This corresponds to some sort of adaptive learning mechanism. The variance dynamics is thus specified as follows:
The conditional variance at time t depends on three factors: a constant (ω), past news about volatility taken as the squared errors from the past (the ARCH term, i.e., Σ_{i=1}^{p} α_i ε_{t−i}²), and past forecast variances (the GARCH term, i.e., Σ_{i=1}^{q} β_i σ_{t−i}²). The (p, q) in GARCH(p, q) refers to p ARCH terms and q GARCH terms. The condition ω ≥ 0, α_i ≥ 0, β_i ≥ 0 guarantees the non-negativity of the variance. This specification is logical, since the variance at time t is predicted by forming a weighted average of the forecasts from the past and either a long-term average or constant variance. Example 7.1: GARCH(1,1) Model. Let us consider the simple GARCH(1,1) model as follows:
As clearly seen from equation (7.8), the GARCH(1,1) model includes one ARCH term and one GARCH (σ_{t−1}²) term. If equation (7.8) is lagged by one period and substituted for the lagged variance on the right-hand side, an expression with two lagged squared errors and a two-period
Nelson and Cao (1992) show that inequality constraints less severe than those commonly imposed are sufficient to keep the conditional variance non-negative. In the GARCH(2,1) case, for example, ω > 0, α_1 ≥ 0, β_1 ≥ 0, and β_1 α_1 + α_2 ≥ 0 are sufficient to ensure σ_t² > 0, so that α_2 may be negative.
The parameter subscripts are not necessary for the GARCH(1,1), TGARCH(1,1), and EGARCH(1,1) models and are suppressed for the remainder of this section.
lagged variance is obtained. By successively substituting for the lagged conditional variance, the following expression is found:
A sample variance would give each of the past squared errors an equal weight rather than a declining weight. The GARCH variance is thus like a sample variance, but one that emphasizes the most recent observations. Since σ_t² is the variance forecast one period ahead based on past information, we call it the conditional variance or volatility. The unpredictable component of the squared returns is given by
an equation which, by definition, is unpredictable based on the past. Substituting equation (7.10) into equation (7.8) yields the following alternative expression:
We can immediately see that the squared errors (ε_t²) follow an ARMA(1,1) process. The autoregressive root is the sum of α and β, a value that governs the persistence of volatility shocks.
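Estimation of the GARCH(1,1) model is routine in packaged software; the sketch below uses the Python arch package on a simulated return series (parameter labels such as "alpha[1]" follow that package's conventions, not the book's notation). The sum of the fitted α and β measures the persistence of volatility shocks, as noted above.

```python
import numpy as np
from arch import arch_model

rng = np.random.default_rng(8)
returns = rng.standard_normal(1000)          # placeholder for a real (percent) return series

model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1, dist="normal")
res = model.fit(disp="off")
print(res.summary())

alpha, beta = res.params["alpha[1]"], res.params["beta[1]"]
print("persistence (alpha + beta):", alpha + beta)
print(res.conditional_volatility[:5])        # fitted conditional standard deviations
```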
7.3 TGARCH and EGARCH Models In the models discussed so far, the variance dynamics are treated as symmetric, wherein a positive shock and negative shock of the same magnitude will have the same effect on the present volatility. Christie (1982) and Schwert (1989), however, pointed out that downward movements in the market are often followed by higher volatility than upward movements of the same magnitude. Glosten, Jagannathan and Runkle (1993), Zakoian (1994) and Nelson (1991) explicitly treat this asymmetry in variance in their extensions of the GARCH model.
Glosten, Jagannathan and Runkle (1993) and Zakoian (1994) proposed the threshold GARCH model (TGARCH model) to specify the asymmetry of volatility. The TGARCH(p,q) model is specified as follows:
where the dummy variable D_{t−i} is equal to 0 for a positive shock (ε_{t−i} > 0) and 1 for a negative shock (ε_{t−i} < 0). Provided that γ_i > 0, the TGARCH model generates higher values of σ_t² for a negative shock ε_{t−i} < 0 than for a positive shock of equal magnitude. As with the ARCH and GARCH models, the parameters of the conditional variance are subject to non-negativity constraints. Example 7.2: TGARCH(1,1) Model. As a special case, the TGARCH(1,1) model is given as:
In this case, equation (7.13) becomes
for a positive shock (ε_{t−1} > 0), and
for a negative shock (ε_{t−1} < 0). Thus, the presence of a leverage effect can be tested by the hypothesis that γ = 0, where the impact is asymmetric if γ ≠ 0. An alternative way of describing the asymmetry in variance is through the use of the EGARCH (exponential GARCH) model proposed by Nelson (1991). The EGARCH(p, q) model is given by
where z_t = ε_t/σ_t. Note that the left-hand side of equation (7.16) is the log of the conditional variance. The log form of the EGARCH(p, q) model ensures the non-negativity of the conditional variance without the need to constrain the coefficients of the model. The asymmetric effect of positive and negative shocks is represented by inclusion of the term z_{t−i}. If γ_i > 0, volatility tends to rise (fall) when the lagged standardized shock, z_{t−i} = ε_{t−i}/σ_{t−i}, is positive (negative). The persistence of shocks to the conditional variance is given by Σ_{i=1}^{q} β_i. Since negative coefficients are not precluded, the EGARCH model allows for the possibility of cyclical behavior in volatility. Example 7.3: EGARCH(1,1) Model. As a special case, the EGARCH(1,1) model is given as follows:
Equation (7.17) becomes
for a positive shock (z_t > 0), and
for a negative shock (z_t < 0). Thus, the presence of a leverage effect can be tested by the hypothesis that γ = 0, where the impact is asymmetric if γ ≠ 0. Furthermore, the sum of α and β governs the persistence of volatility shocks in the GARCH(1,1) model, whereas only the parameter β governs the persistence of volatility shocks in the EGARCH(1,1) model.
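Both asymmetric specifications are available in the Python arch package, as sketched below on simulated data. The argument o=1 adds the asymmetry term, and the fitted parameter labeled "gamma[1]" plays the role of γ; the package's parameterization may differ in detail from equations (7.13) and (7.17).

```python
import numpy as np
from arch import arch_model

rng = np.random.default_rng(9)
returns = rng.standard_normal(1500)          # placeholder for a real return series

# GJR/threshold GARCH(1,1): o=1 adds the indicator-based asymmetry term
gjr = arch_model(returns, vol="GARCH", p=1, o=1, q=1).fit(disp="off")
print("TGARCH/GJR asymmetry term:", gjr.params["gamma[1]"])

# EGARCH(1,1): log-variance specification with a signed asymmetry term
egarch = arch_model(returns, vol="EGARCH", p=1, o=1, q=1).fit(disp="off")
print("EGARCH asymmetry term:    ", egarch.params["gamma[1]"])
# A significantly nonzero asymmetry term indicates a leverage effect.
```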
7.4 Causality-in-Variance Approach Cheung and Ng (1996) developed a testing procedure for causality-in-mean and causality-in-variance. This test is based on the residual cross-correlation function (CCF) and is robust to distributional assumptions. Their procedure for testing causality-in-mean and causality-in-variance consists of two steps. The first step involves the estimation of univariate time-series models that allow for time variation in both conditional means and conditional variances. The second step constructs the residuals standardized by conditional variances and the squared residuals standardized by conditional variances. The CCF of the standardized residuals is used to test the null hypothesis of no causality-in-mean, while the CCF of the squared standardized residuals is used to test the null hypothesis of no causality-in-variance. In the vein of Cheung and Ng (1996) and Hong (2001), let us summarize the two-step procedure for testing causality. Suppose that there are two stationary time series, X_t and Y_t. When I_{1,t}, I_{2,t} and I_t are three information sets defined by I_{1,t} = (X_t, X_{t−1}, ...), I_{2,t} = (Y_t, Y_{t−1}, ...) and I_t = (X_t, X_{t−1}, ..., Y_t, Y_{t−1}, ...), Y is said to cause X in mean if
Similarly, X is said to cause Y in mean if
Feedback in mean occurs if Y causes X in mean and X causes Y in mean. On the other hand, Y is said to cause X in variance if
where μ_{X,t} is the mean of X_t conditioned on I_{1,t−1}. Similarly, X is said to cause Y in variance if
where μ_{Y,t} is the mean of Y_t conditioned on I_{2,t−1}. Feedback in variance occurs if X causes Y in variance and Y causes X in variance. Causality-in-variance is of interest in its own right, since it is directly related to volatility spillover across different assets or markets. As the concept defined in equations (7.20) through (7.23) is too general to test empirically, additional structure is required to make the general causality concept applicable in practice. Suppose X_t and Y_t can be written as:
where ε_t and ξ_t are two independent white noise processes with zero mean and unit variance. For the causality-in-mean test, we have the standardized innovations as follows:
Since both ε_t and ξ_t are unobservable, we have to use their estimates, ε̂_t and ξ̂_t, to test the hypothesis of no causality-in-mean. Next, the sample cross-correlation coefficient at lag k, r̂_εξ(k), is computed from the consistent estimates of the conditional means and variances of X_t and Y_t. This gives us
where c_εξ(k) is the k-th lag sample cross-covariance given by
and, similarly, c_εε(0) and c_ξξ(0) are defined as the sample variances of ε_t and ξ_t, respectively. Causality in the mean of X_t and Y_t can be tested by examining r̂_εξ(k), the univariate standardized residual CCF. Under regularity conditions, it holds that
where the arrow denotes convergence in distribution. We can test the null hypothesis of no causality-in-mean using this test statistic. To test for a causal relationship at a specified lag k, we compute √T r̂_εξ(k). If the test statistic is larger than the critical value of the standard normal distribution, then we reject the null hypothesis. For the causality-in-variance test, let u_t and v_t be the squares of the standardized innovations, given by
Since both u_t and v_t are unobservable, their estimates, û_t and v̂_t, have to be used to test the hypothesis of no causality-in-variance.
Next, the sample cross-correlation coefficient at lag k, r̂_uv(k), is computed from the consistent estimates of the conditional means and variances of X_t and Y_t. This gives us:
where c_uv(k) is the k-th lag sample cross-covariance given by
and, similarly, c_uu(0) and c_vv(0) are defined as the sample variances of u_t and v_t, respectively. Causality in the variance of X_t and Y_t can be tested by examining the squared standardized residual CCF, r̂_uv(k). Under regularity conditions, it holds that
We can test the null hypothesis of no causality-in-variance using this test statistic. To test for a causal relationship at a specified lag k, we compute √T r̂_uv(k). If the test statistic is larger than the critical value of the standard normal distribution, then we reject the null hypothesis.
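The two-step procedure can be sketched as follows (Python; GARCH(1,1) models from the arch package supply the standardized residuals, and the two series here are simulated placeholders). The squared standardized residuals are cross-correlated, and √T r̂_uv(k) is compared with the standard normal critical value at each lag.

```python
import numpy as np
from arch import arch_model

def squared_std_resid(x):
    res = arch_model(x, vol="GARCH", p=1, q=1).fit(disp="off")
    return np.asarray(res.std_resid) ** 2

def cross_corr(u, v, k):
    """Sample correlation between u_t and v_{t-k}, k >= 0."""
    u, v = u - u.mean(), v - v.mean()
    cov = np.sum(u[k:] * v[:len(u) - k]) / len(u)
    return cov / np.sqrt(np.mean(u ** 2) * np.mean(v ** 2))

rng = np.random.default_rng(10)
x = rng.standard_normal(1000)                # placeholders for the two return series
y = rng.standard_normal(1000)

u, v = squared_std_resid(x), squared_std_resid(y)
T = len(u)
for k in range(6):
    stat = np.sqrt(T) * cross_corr(u, v, k)
    print(f"lag {k}: sqrt(T)*r_uv = {stat:6.2f}" + ("  *" if abs(stat) > 1.96 else ""))
```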
7.5 Information Flow between Price Change and Trading Volume Many researchers have studied the interaction between price and trading volume in financial assets. The importance of this relationship stems from the widespread belief that the arrival of new information induces trading in asset markets. The trading volume is thought to reflect information about
The content of this section is based on the following paper with permission from the journal: Bhar R, Hamori S (2004) Information flow between price change and trading volume in gold futures contracts, International Journal of Business and Economics, 3: 45-56.
aggregate changes in investor expectations. Researchers might also be interested in the prospect of devising profitable technical trading rules based on strong relationships they observe between price and trading volume. The ability to forecast better price movement in the futures market might also help improve hedging ~trategies.~ This section attempts to characterize the interaction between the percentage price change and trading volume in gold futures contracts. Gold futures are an interesting market to investigate since the events in other markets, for example, equities, are generally expected to influence the trading of gold. When the equity market underperforms, speculative trading in the gold market is likely to rise. If this occurs, the rising short sales of gold will be transacted chiefly in the futures market due to the relative difficulty of taking short positions in the physical market. In combination, these changes could lead to different patterns of information flow between the percentage price change and trading volume in gold futures as opposed to the other commodity futures contracts mentioned before. This chapter uses daily data on the gold future price and trading volume from January 3, 1990 to December 27, 2000. The continuous series of futures data are obtained from Datastream and represent NYMEX daily settlement prices. The percentage return is calculated as y,=(P, -P,-,)x 1004-,, where P, is the future price at time t. Thus, the percentage price change is obtained for the period between January 4, 1990 and December 27,2000. We model the dynamics of the percentage price change and trading volume using the AR-GARCH process as follows. The simplicity of the AR structure in the mean equation justifies its use for the single time series here. The GARCH effect in the variance process is well known for most futures contracts, particularly in the daily frequency.
where y, is the percentage price change ( R , ) or trading volume (V, ). Equation (7.36) shows the conditional mean dynamics and is specified as
Other recent studies focusing on these issues include Fujihara and Mougoue (1997), Moosa and Silvapulle (2000), and Kocagil and Shachmurove (1998).
the AR(p_1) model. Here, ε_t is the error term with conditional variance σ_t². Equation (7.37) shows the conditional variance dynamics and is
specified as the GARCH(p_2, p_3) model. The variables p_2 and p_3 are the number of ARCH terms and GARCH terms, respectively. The results from fitting the AR-GARCH model to the percentage price change and trading volume are reported in Table 7.1. The Schwarz Bayesian information criterion (SBIC) and diagnostic statistics are used to choose the final models from various possible AR-GARCH specifications. The maximum likelihood estimates confirm that the percentage price change and trading volume exhibit significant conditional heteroskedasticity. The lag order of the AR part in the mean equation (7.36) is set at five for the price data and ten for the trading volume data. The GARCH(2,1) model is chosen for the percentage price change, while the GARCH(1,1) model is chosen for the trading volume. For the price data, the coefficient of the GARCH term is 0.966 and the corresponding standard error is 0.009, indicating substantial persistence. For the trading volume data, the coefficient of the GARCH term is relatively small, 0.908, and its corresponding standard error is 0.035, indicating less persistence compared to the price data. Q(20) and Q²(20) are the Ljung-Box statistics with 20 lags for the standardized residuals and their squares. The Q(20) and Q²(20) statistics, calculated from the first 20 autocorrelation coefficients of the standardized residuals and their squares, indicate that the null hypothesis of no autocorrelation is accepted for both the price and trading volume. This suggests that the selected specifications explain the data quite well. The cross correlations computed from the standardized residuals of the AR-GARCH models of Table 7.1 are given in Table 7.2. The "lag" refers to the number of days that the trading volume data lag behind the percentage price change data. The "lead" refers to the number of days that the percentage price change data lag behind the trading volume data. The significance of a statistic in the "lag" column implies that the trading volume causes the percentage price change. Similarly, the significance of a statistic in the "lead" column implies that the percentage price change causes the trading volume. Cross correlation statistics under the "Levels" columns are based on the standardized residuals themselves and are used to test for causality in the mean. Cross correlation statistics under the "Squares" columns are based on the squares of the standardized residuals and are used to test for causality in the variance.
Table 7.1 AR-GARCH model for percentage price change and trading volume

                   Percentage Price Change        Trading Volume
                   Estimate       SE              Estimate       SE
a_0                -0.016         0.012            3.069**       0.296
a_1                -0.033         0.025            0.368**       0.021
a_2                -0.013         0.021            0.090**       0.020
a_3                -0.045         0.024            0.077**       0.022
a_4                 0.017         0.024            0.065**       0.023
a_5                 0.034         0.021            0.037         0.022
a_6                                               -0.006         0.021
a_7                                               -0.006         0.020
a_8                                               -0.016         0.020
a_9                                                0.046*        0.021
a_10                                               0.043*        0.021
ω                   0.001         0.001            0.014*        0.007
α_1                 0.177**       0.062            0.041**       0.014
α_2                -0.141*        0.060
β_1                 0.966**       0.009            0.908**       0.035
Log likelihood     -2974.926
Q(20)               15.978
P-value             0.718
Q²(20)              18.300
P-value             0.568

* indicates significance at the 5% level. ** indicates significance at the 1% level. Bollerslev-Wooldridge (1992) robust standard errors are used to calculate the t-values. Q(20) and Q²(20) are the Ljung-Box statistics with 20 lags for the standardized residuals and their squares.
Table 7.2 Cross correlation analysis for the levels and squares of the standardized residuals

             Levels                              Squares
k            Lag             Lead                Lag             Lead
             R & V(-k)       R & V(+k)           R & V(-k)       R & V(+k)
* indicates significance at the 5% level. ** indicates significance at the 1% level.
The empirical results of the cross correlations in Table 7.2 reveal a complex and dynamic causation pattern between the percentage price change and the trading volume. For instance, the feedback effects in the means involve a high-order lag structure. Trading volume causes the mean percentage price change at lag three at the 5% significance level. The percentage price change causes the mean of the trading volume at lags three and seven at the 5% significance level. Further, there is evidence of strong contemporaneous causality-in-variance and mild lagged causality-in-variance when moving from the percentage price change to the trading volume, but not vice versa. The percentage price change causes the variance of the trading volume at lag two at the 5% significance level. These results show that a proper account of conditional heteroskedasticity can have significant implications for the study of price change and trading volume spillovers. The information flows between the price change and trading volume influence not only their mean movements, but also their volatility movements in this market.
References
Bhar R, Hamori S (2004) Information flow between price change and trading volume in gold futures contracts. International Journal of Business and Economics, 3: 45-56
Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31: 307-327
Bollerslev T, Chou RY, Kroner KF (1992) ARCH modeling in finance. Journal of Econometrics, 52: 5-59
Campbell JY, Lo AW, MacKinlay AC (1997) The econometrics of financial markets. Princeton University Press, Princeton, New Jersey
Cheung Y-W, Ng LK (1996) A causality-in-variance test and its application to financial market prices. Journal of Econometrics, 72: 33-48
Christie AA (1982) The stochastic behavior of common stock variances: value, leverage and interest rate effects. Journal of Financial Economics, 10: 407-432
Enders W (2004) Applied econometric time series, 2nd edn. John Wiley & Sons, New York
Engle RF (1982) Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50: 987-1008
Fujihara RA, Mougoue M (1997) An examination of linear and nonlinear causal relationships between price variability and volume in petroleum futures markets. Journal of Futures Markets, 17: 385-416
Glosten LR, Jagannathan R, Runkle DE (1993) On the relation between the expected value and the volatility of the nominal excess return on stocks. Journal of Finance, 48: 1779-1801
Hong Y (2001) A test for volatility spillover with application to exchange rates. Journal of Econometrics, 103: 183-224
Kocagil AE, Shachmurove Y (1998) Return-volume dynamics in futures markets. Journal of Futures Markets, 18: 399-426
Kon SJ (1984) Models of stock returns: a comparison. Journal of Finance, 39: 147-165
Moosa IA, Silvapulle P (2000) The price-volume relationship in the crude oil futures market: some results based on linear and nonlinear causality testing. International Review of Economics and Finance, 9: 11-30
Nelson DB (1991) Conditional heteroskedasticity in asset returns: a new approach. Econometrica, 59: 347-370
Nelson DB, Cao CQ (1992) Inequality constraints in the univariate GARCH model. Journal of Business and Economic Statistics, 10: 229-235
Schwert GW (1989) Why does stock market volatility change over time? Journal of Finance, 44: 1115-1153
Zakoian J-M (1994) Threshold heteroskedastic models. Journal of Economic Dynamics and Control, 18: 931-955
8 State-Space Models (I)
8.1 Background Many problems that arise in the analysis of data in such diverse fields as air pollution, economics, or sociology require the researcher to work with incompletely specified noisy data. For example, one might mention pollution data where values can be missing on days when measurements were not made or economic data where several different sources are providing partially complete data relating to some given series of interest. We, therefore, need general techniques for interpolating sections of data where there are missing values and for construction of reasonable forecasts of future values. A very general model that subsumes a whole class of special cases of interest in much the same way that linear regression does is the state-space model (SSM) or the dynamic linear model (DLM). This was introduced by Kalman (1960), and Kalman and Bucy (1961). The model was originally intended for aerospace-related research but it has found immense application in economics. In this approach typically dynamic time series models that involve unobserved components are analyzed. The wide range of potential applications in econometrics that involve unobserved variables include, for example, permanent income, expectations, the ex ante real interest rate etc. Before we introduce the basic idea in the formulation of a state-space model, we will revise the concepts in classical regression, both univariate and multivariate cases.
8.2 Classical Regression We start this discussion in the time series context by assuming some output or dependent time series, say, x_t, t = 1, 2, ..., n, that is being influenced by a collection of possible input or independent series, say, z_{t1}, z_{t2}, ..., z_{tq},
where we consider the inputs as fixed and known. This relationship is expressed through the linear regression model as,

$x_t = \beta_1 z_{t1} + \beta_2 z_{t2} + \cdots + \beta_q z_{tq} + w_t$,   (8.1)
where $\beta_1, \beta_2, \ldots, \beta_q$ are unknown fixed regression coefficients and $w_t$ is the random error or noise, normally assumed to be white noise with mean zero and variance $\sigma_w^2$. The linear model above can be described more conveniently using matrix and vector notation as,

$x_t = \beta' z_t + w_t$,   (8.2)
where $\beta = [\beta_1, \beta_2, \ldots, \beta_q]'$ and $z_t = [z_{t1}, z_{t2}, \ldots, z_{tq}]'$. When the noise term $w_t$ is normally distributed, it can be shown that an estimate of the coefficients of interest is given by,

$\hat{\beta} = (Z'Z)^{-1} Z'X$,   (8.3)
when the rank of $(Z'Z)$ is q. In equation (8.3) we define $Z = [z_1, z_2, \ldots, z_n]'$ and $X = [x_1, x_2, \ldots, x_n]'$. Similarly, an estimate of the variance of the noise is given by,

$\hat{\sigma}_w^2 = \frac{1}{n-q} \sum_{t=1}^{n} \left( x_t - \hat{\beta}' z_t \right)^2$.   (8.4)
It is often necessary to explore the statistical significance of the estimates of the coefficients, and this is done with the help of the diagonal elements of the covariance matrix of the coefficients, given by,

$\mathrm{cov}(\hat{\beta}) = \hat{\sigma}_w^2 \, (Z'Z)^{-1}$.   (8.5)
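The computations in equations (8.3)-(8.5) are easy to verify numerically. The following sketch (in Python with NumPy, used purely for illustration since the book's own examples rely on EViews and GAUSS; the simulated data and variable names are our own) estimates the coefficients, the noise variance and the coefficient standard errors for a small simulated regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 200, 3                                   # sample size and number of regressors
beta_true = np.array([1.0, -0.5, 0.25])

Z = rng.normal(size=(n, q))                     # fixed, known inputs z_t
x = Z @ beta_true + 0.3 * rng.normal(size=n)    # equation (8.1)

# Equation (8.3): beta_hat = (Z'Z)^{-1} Z'X
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ x)

# Equation (8.4): residual variance estimate
resid = x - Z @ beta_hat
sigma2_hat = resid @ resid / (n - q)

# Equation (8.5): covariance matrix of the coefficient estimates
cov_beta = sigma2_hat * np.linalg.inv(Z.T @ Z)
std_err = np.sqrt(np.diag(cov_beta))

print(beta_hat, std_err)
```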
To understand the capabilities of state-space models, a basic understanding of the multivariate time series regression technique is required. We next discuss this technique in the classical regression context. Suppose that, instead of a single output series, a collection of p output variables $y_{t1}, y_{t2}, \ldots, y_{tp}$ exists, related to the inputs as,

$y_{ti} = \beta_{i1} z_{t1} + \beta_{i2} z_{t2} + \cdots + \beta_{iq} z_{tq} + w_{ti}$,   (8.6)
for each $i = 1, 2, \ldots, p$ output variable. We assume that the noise terms $w_{ti}$ are correlated across the index i but uncorrelated over t. We denote $\mathrm{cov}(w_{si}, w_{tj}) = \sigma_{ij}$ for $s = t$ and 0 otherwise. In matrix notation, let $y_t = [y_{t1}, y_{t2}, \ldots, y_{tp}]'$ be the vector of outputs, and let $B = \{\beta_{ij}\}$, $i = 1, 2, \ldots, p$, $j = 1, 2, \ldots, q$, be a $p \times q$ matrix containing the regression coefficients; then,

$y_t = B z_t + w_t$.   (8.7)
Here, the $p \times 1$ vector process $w_t$ is the collection of noises (independent over time) with covariance matrix $E\{w_t w_t'\} = \Sigma_w$, the $p \times p$ matrix containing the covariances $\sigma_{ij}$. The maximum likelihood estimator in this case (similar to the univariate case) is,

$\hat{B}' = (Z'Z)^{-1} Z'Y$,   (8.8)
where $Z' = [z_1, z_2, \ldots, z_n]$ and $Y' = [y_1, y_2, \ldots, y_n]$. The noise covariance matrix is estimated by,

$\hat{\Sigma}_w = \frac{1}{n-q} \sum_{t=1}^{n} \left( y_t - \hat{B} z_t \right)\left( y_t - \hat{B} z_t \right)'$.   (8.9)
Again, the standard errors of the estimated coefficients are given by,
where $\hat{\sigma}_{jj}$ is the j-th diagonal element of $\hat{\Sigma}_w$ and $c_{ii}$ is the i-th diagonal element of $\left( \sum_{t=1}^{n} z_t z_t' \right)^{-1}$.
8.3 Important Time Series Processes

In this section we review the structure of the most commonly encountered univariate and multivariate time series processes: autoregressive (AR), autoregressive moving average (ARMA), vector autoregressive (VAR) and vector autoregressive moving average (VARMA) processes. These processes become very useful when discussing state-space models. An AR process of order p, i.e. AR(p), for a univariate series $x_t$ is represented as,

$x_t = \mu + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t$.   (8.11)
Here $\mu$ is a constant. It is instructive to note that, by suitably defining some vector quantities, it is possible to represent equation (8.11) as a regression equation of the kind examined before. For example, let $\phi = [\phi_1, \phi_2, \ldots, \phi_p]'$ and $X_{t-1} = [x_{t-1}, x_{t-2}, \ldots, x_{t-p}]'$; then,

$x_t = \mu + \phi' X_{t-1} + w_t$.   (8.12)
There are, however, some technical difficulties with this representation when compared to the regression equation (8.2), since $z_t$ was assumed fixed, whereas here $X_{t-1}$ is not fixed. We will not pursue these technical issues further. An alternative to the autoregressive model is one in which $x_t$ on the left-hand side is determined by a linear combination of the white
noise $w_t$ on the right-hand side. This gives rise to the moving average model of order q, MA(q), represented as,

$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots + \theta_q w_{t-q}$,   (8.13)
where there are q lags in the moving average and $\theta_1, \theta_2, \ldots, \theta_q$ are parameters that determine the overall pattern of the process. A combination of these two models has also found many useful applications. This is referred to as the autoregressive moving average, or ARMA(p,q), model, with autoregressive order p and moving average order q. The general structure is,

$x_t = \mu + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}$.   (8.14)
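A quick way to get a feel for these processes is to simulate them. The short sketch below (Python/NumPy, our own illustration rather than part of the text, with arbitrary parameter values) generates an ARMA(1,1) series directly from the recursion in equation (8.14); setting the relevant coefficients to zero recovers the AR(1) or MA(1) special cases.

```python
import numpy as np

def simulate_arma11(n, mu=0.0, phi=0.6, theta=0.3, sigma=1.0, seed=0):
    """Simulate x_t = mu + phi*x_{t-1} + w_t + theta*w_{t-1}."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=sigma, size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = mu + phi * x[t - 1] + w[t] + theta * w[t - 1]
    return x

x = simulate_arma11(500)
print(x[:5])
```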
Fig. 8.1 US stock price and stock return
Another important concept that goes with time series modeling is that of stationarity. A time series is strictly stationary if the probabilistic behavior of $x_{t_1}, x_{t_2}, \ldots, x_{t_k}$ is identical to that of the shifted set $x_{t_1+h}, x_{t_2+h}, \ldots, x_{t_k+h}$, for any collection of time points $t_1, t_2, \ldots, t_k$, any number $k = 1, 2, \ldots$, and any shift $h = 0, \pm 1, \pm 2, \ldots$. This means that all multivariate distribution functions for subsets of variables must agree with their counterparts in the shifted set for all values of the shift parameter h. For a practical understanding of stationary and non-stationary series we show the US stock price movement over the period January 1980 to December 1998 in Fig. 8.1, together with the stock return, i.e. the percentage change in the stock price, over the same period. It should be visually clear that the stock price series is non-stationary, whereas the return series is stationary; there are also statistical tests that can be employed to establish this formally. A typical way to convert a non-stationary series to a stationary one is to take consecutive differences of the values, which, in essence, is what computing the percentage change in the stock price does. We next discuss some examples of processes involving multiple time series. In dealing with economic variables, the value of one variable is often related not only to its own past values but also to past values of other variables. For instance, household consumption expenditure may depend on variables such as income, interest rates and investment expenditure; if all these variables are related to consumption expenditure, it makes sense to use their possible additional information content in forecasting it. Consider a possible relation between the growth rate in GNP, money demand (M2) and an interest rate (IR). The following VAR(2) model may express the relationship between these variables:

$x_t = c + A_1 x_{t-1} + A_2 x_{t-2} + w_t, \quad x_t = (\mathrm{GNP}_t, \mathrm{M2}_t, \mathrm{IR}_t)'$,   (8.15)
where the noise sources may have a full contemporaneous covariance structure, $\Sigma_w$, whose diagonal elements are the individual innovation variances and whose off-diagonal elements capture the covariances between the innovations of the three equations.
As with the ARMA model for a univariate series, the structure of a VARMA(2,1) model for a bivariate system has the following appearance:
8.4 Recursive Least Squares

An estimator of the parameter vector in the classical regression model is computed from the whole of the given sample. Referring to equation (8.3), it is clear that such an estimate requires the inversion of a product matrix. Suppose that, instead of all n observations being available at the same time, we have access to data only up to time t, and we compute an estimate of the parameters using that available series. We then need a mechanism to update the estimate as new data become available. There are some advantages to generating regression estimates this way: (a) it enables us to track changes in the parameters over time, and (b) it reduces the computational task by avoiding large matrix inversions. This section outlines the procedure, and it will help in understanding the nature of state-space models (discussed in the next section) in a more dynamic setting. The regression estimator based on the first t observations may be written as in equation (8.3) but using only data up to time t,

$\hat{\beta}_t = (Z_t' Z_t)^{-1} Z_t' X_t$.   (8.18)
Once the (t + 1)th observation becomes available, $\hat{\beta}_{t+1}$ may be obtained as,

$\hat{\beta}_{t+1} = \hat{\beta}_t + K_{t+1} \left( x_{t+1} - z_{t+1}' \hat{\beta}_t \right)$.   (8.19)
You should note that the expression in parentheses in the above equation is the forecast error based on the regression estimate $\hat{\beta}_t$ of the parameters. This is referred to as the recursive residual and, when multiplied by the gain vector $K_{t+1}$, it represents the necessary adjustment to the parameter estimates. We can compute the gain vector as,

$K_{t+1} = \frac{(Z_t' Z_t)^{-1} z_{t+1}}{1 + z_{t+1}' (Z_t' Z_t)^{-1} z_{t+1}}$.   (8.20)
When we have processed all n observations, the final estimate of the parameters will be equal to the usual regression estimate discussed earlier. Now let us simplify equation (8.20) so that we do not need to invert the product matrix as each observation becomes available. We adopt $P_t = (Z_t' Z_t)^{-1}$ to simplify the exposition and use the following matrix inversion lemma (Maybeck 1979): for any three compatible matrices,

$(A + BC)^{-1} = A^{-1} - A^{-1} B \left( I + C A^{-1} B \right)^{-1} C A^{-1}$.   (8.21)
For our problem we identify $A = (Z_t' Z_t)$, $B = z_{t+1}$, $C = B'$; then,

$P_{t+1} = P_t - \frac{P_t z_{t+1} z_{t+1}' P_t}{1 + z_{t+1}' P_t z_{t+1}}$.   (8.22)
Note that the denominator in the above equation is a scalar and does not represent a matrix inversion. As each new observation becomes available, we update the estimate of the parameter vector via equation (8.19) and, at the same time, update $P_t$ via equation (8.22) efficiently, without the need for additional inversions. Also, we can use the final estimate of $P_t$ to obtain the covariance of the parameter estimates as given in equation (8.5). Harvey
(1990) gives additional information about the usefulness of the recursive residuals in model identification.
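To make the recursion concrete, the following sketch (Python/NumPy; an illustration of the updating equations (8.19)-(8.22) applied to our own simulated data, not code from the book) processes observations one at a time and verifies that the recursive estimate converges to the full-sample OLS estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 300, 2
Z = rng.normal(size=(n, q))
x = Z @ np.array([0.8, -0.4]) + 0.2 * rng.normal(size=n)

# Initialize with the first q observations so that Z_t'Z_t is invertible
P = np.linalg.inv(Z[:q].T @ Z[:q])           # P_t = (Z_t'Z_t)^{-1}
beta = P @ Z[:q].T @ x[:q]                   # equation (8.18)

for t in range(q, n):
    z_new = Z[t]
    denom = 1.0 + z_new @ P @ z_new          # scalar denominator
    K = P @ z_new / denom                    # gain vector, equation (8.20)
    beta = beta + K * (x[t] - z_new @ beta)  # update, equation (8.19)
    P = P - np.outer(P @ z_new, z_new @ P) / denom   # equation (8.22)

print(beta)                                  # matches the full-sample OLS estimate
print(np.linalg.solve(Z.T @ Z, Z.T @ x))
```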
8.5 State-Space Representation

The SSM in its basic form retains a VAR(1) structure for the state equation,

$y_t = \Phi y_{t-1} + w_t$,   (8.23)
where the state equation determines the rule for the generation of the states $y_{ti}$ from the past states $y_{t-1,j}$, $j = 1, 2, \ldots, p$, for $i = 1, 2, \ldots, p$ and time points $t = 1, 2, \ldots, n$. For completeness, we assume that the $w_t$ are $p \times 1$ independent and identically distributed zero-mean normal vectors with covariance matrix Q. The state process is assumed to have started with the initial value given by the vector $y_0$, taken from a normal distribution with mean vector $\mu_0$ and $p \times p$ covariance matrix $\Sigma_0$. The state vector itself is not observed, but some transformation of it is observed, in a linearly added noisy environment. Thus, the measurement equation is given by,

$z_t = \Gamma_t y_t + v_t$.   (8.24)
In this sense, the $q \times 1$ vector $z_t$ is observed through the $q \times p$ measurement matrix $\Gamma_t$, together with the $q \times 1$ Gaussian white noise $v_t$ with covariance matrix R. We also assume that the two noise sources in the state and measurement equations are uncorrelated. The model arose originally in the space-tracking setting, where the state equation defines the motion equations for the position of a spacecraft with location $y_t$, and $z_t$ reflects information that can be observed from a tracking device, such as velocity and height. The next step is to make use of the Gaussian assumptions and produce estimates of the underlying unobserved state vector given the measurements up to a particular point in time. In other words, we would like to find $y_{t|t-1} = E[y_t \mid z_{t-1}, z_{t-2}, \ldots, z_1]$ and the covariance matrix $P_{t|t-1} =$
$E[(y_t - y_{t|t-1})(y_t - y_{t|t-1})']$. This is achieved by using the Kalman filter, and the basic system of equations is described below. Given the initial conditions $y_{0|0} = \mu_0$ and $P_{0|0} = \Sigma_0$, for observations made at times $1, 2, 3, \ldots, T$,

$y_{t|t-1} = \Phi y_{t-1|t-1}$,   (8.25)
$P_{t|t-1} = \Phi P_{t-1|t-1} \Phi' + Q$,   (8.26)
$y_{t|t} = y_{t|t-1} + K_t \left( z_t - \Gamma_t y_{t|t-1} \right)$,   (8.27)
where the Kalman gain matrix is,

$K_t = P_{t|t-1} \Gamma_t' \left( \Gamma_t P_{t|t-1} \Gamma_t' + R \right)^{-1}$,   (8.28)
and the covariance matrix $P_{t|t}$ after the tth measurement has been made is,

$P_{t|t} = \left( I - K_t \Gamma_t \right) P_{t|t-1}$.   (8.29)
Equation (8.25) forecasts the state vector for the next period given the current state vector. Using this one-step-ahead forecast of the state vector, it is possible to define the innovation vector as,

$v_t = z_t - \Gamma_t y_{t|t-1}$,   (8.30)
and its covariance as,

$F_t = \Gamma_t P_{t|t-1} \Gamma_t' + R$.   (8.31)
Since in finance and economic applications all the observations are available, it is possible to improve the estimates of the state vector based upon
the whole sample. This is referred to as the Kalman smoother, and it starts with initial conditions at the last measurement point, i.e. $y_{T|T}$ and $P_{T|T}$. The following set of equations describes the smoother algorithm (for $t = T, T-1, \ldots, 1$):

$y_{t-1|T} = y_{t-1|t-1} + J_{t-1} \left( y_{t|T} - y_{t|t-1} \right)$,   (8.32)
$P_{t-1|T} = P_{t-1|t-1} + J_{t-1} \left( P_{t|T} - P_{t|t-1} \right) J_{t-1}'$,   (8.33)
where

$J_{t-1} = P_{t-1|t-1} \Phi' \, P_{t|t-1}^{-1}$.   (8.34)
It should be clear from the above that, to implement the smoothing algorithm, the quantities $y_{t|t}$ and $P_{t|t}$ generated during the filter pass must be stored. It is worth pointing out here that in the EViews implementation of the ARMA model in the state-space framework, the measurement error is constrained to be zero; in other words, R = 0 for such models. The description of the above filtering and smoothing algorithms assumes that the model parameters are known. In practice we want to determine these parameters, and this is achieved by maximizing the innovation form of the likelihood function. The one-step-ahead innovation and its covariance matrix are defined by equations (8.30) and (8.31), and since these are assumed to be independent and conditionally Gaussian, the log likelihood function (without the constant term) is given by,

$\ln L(\Theta) = -\frac{1}{2} \sum_{t=1}^{T} \ln \left| F_t(\Theta) \right| - \frac{1}{2} \sum_{t=1}^{T} v_t(\Theta)' F_t(\Theta)^{-1} v_t(\Theta)$.   (8.35)
In this expression $\Theta$ is used specifically to emphasize the dependence of the log likelihood function on the parameters of the model. Once the function has been maximized with respect to the parameters, the smoothing step can proceed using those estimated parameters. There are different numerical approaches that may be taken to carry out the maximization of the log likelihood function. The computational com-
plexity and other numerical issues are beyond the scope of this book. However, some intuition on these matters will be given in a later chapter. In order to encapsulate this adaptive algorithm, we have given a schematic in Fig. 8.2, which should help clarify the flow of the filter process as observations are processed sequentially.
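The filtering recursions (8.25)-(8.31) and the likelihood in equation (8.35) can be expressed compactly in a few lines of code. The following sketch (Python/NumPy, our own minimal illustration, not the book's GAUSS or EViews code) runs the filter for given system matrices and accumulates the innovation-form log likelihood.

```python
import numpy as np

def kalman_filter(z, Phi, Gamma, Q, R, y0, P0):
    """Kalman filter for y_t = Phi y_{t-1} + w_t, z_t = Gamma y_t + v_t.
    Returns filtered states, their covariances and the log likelihood."""
    T, p = z.shape[0], Phi.shape[0]
    y, P = y0.copy(), P0.copy()
    y_filt, P_filt, loglik = np.zeros((T, p)), np.zeros((T, p, p)), 0.0
    for t in range(T):
        # Prediction step, equations (8.25)-(8.26)
        y_pred = Phi @ y
        P_pred = Phi @ P @ Phi.T + Q
        # Innovation and its covariance, equations (8.30)-(8.31)
        v = z[t] - Gamma @ y_pred
        F = Gamma @ P_pred @ Gamma.T + R
        # Gain and update, equations (8.27)-(8.29)
        K = P_pred @ Gamma.T @ np.linalg.inv(F)
        y = y_pred + K @ v
        P = (np.eye(p) - K @ Gamma) @ P_pred
        y_filt[t], P_filt[t] = y, P
        # Innovation form of the log likelihood, equation (8.35)
        loglik += -0.5 * (np.log(np.linalg.det(F)) + v @ np.linalg.solve(F, v))
    return y_filt, P_filt, loglik

# Example: a scalar local-level model (hypothetical, purely illustrative numbers)
rng = np.random.default_rng(2)
states = np.cumsum(rng.normal(scale=0.1, size=100))
obs = (states + rng.normal(scale=0.5, size=100)).reshape(-1, 1)
out = kalman_filter(obs, np.eye(1), np.eye(1), np.array([[0.01]]),
                    np.array([[0.25]]), np.zeros(1), np.eye(1))
print(out[2])   # log likelihood
```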
8.6 Examples and Exercises

Example 8.1: State-Space Representation of a VARMA(2,1) Model

Consider the example given in equation (8.17) and assume that the series have been adjusted for their means, which implies that the constant vector is zero. In order to put this model in state-space form we need to define the system matrices as follows:
This defines the state vector, which is also the measurement vector in this setup. The state transition matrix and the state noise vector may be defined as,
It is now clear that the measurement equation would not have any noise term and the measurement matrix is,
Fig. 8.2 Filter schematic
If the matrix elements were not known and had to be estimated, this state-space representation of the model could be estimated by the maximum likelihood method as discussed earlier. For several other examples, see Lutkepohl (1993).

Example 8.2: Signal Extraction

Consider the quarterly earnings of Johnson & Johnson (data obtained from Shumway and Stoffer (2000)) shown in Fig. 8.3. It seems likely that the series has a trend component with seasonal (quarterly) variations in earnings superimposed on it. Our aim in this example is to apply the state-space modeling approach to extract the seasonal signal as well as the trend earnings component. The upward bend in the curve could not be removed by a simple functional transformation of the series, e.g. taking logarithms and/or square or cube roots. We therefore model the series in its original form and represent it as,

$x_t = T_t + S_t + v_t$,   (8.39)
where the trend component is $T_t$ and the seasonal component is $S_t$. We allow the trend component to grow exponentially; in other words,

$T_t = \phi T_{t-1} + w_{t1}$,   (8.40)
where $\phi > 1$. We assume that the seasonal (quarterly) components are such that, over the year, they sum to zero, or to white noise. Thus,

$S_t + S_{t-1} + S_{t-2} + S_{t-3} = w_{t2}$.   (8.41)
Fig. 8.3 Quarterly earnings
Table 8.1 Trend and seasonal component model of quarterly earnings

Parameter   Estimate    Std Error
phi         1.03508     0.14767
r11         1.23e-10    1.82e-11
q11         0.01971     0.00291
q22         0.04927     0.00727
The equations (8.39) - (8.41) may now be put into the state-space form with the following definitions:
The state equation may be written as,
and the state noise covariance matrix,
This structure of the state noise covariance matrix implies that the noise sources for the trend and the seasonal components are independent of each other. In this problem we have to estimate four parameters, $(\phi, r_{11}, q_{11}, q_{22})$. In addition, the estimation algorithm defined earlier provides us with the smoothed estimates of the states, i.e. the trend and the seasonal components. We used the maximum likelihood estimation method with a numerical optimization procedure in GAUSS, and Table 8.1 summarizes the parameters of the model along with their standard errors. We then include two graphs (Fig. 8.4 and Fig. 8.5). The first shows the original series with the trend component superimposed; note how well the trend line tracks the actual earnings series. The second shows the seasonal component only; it should be clear that the seasonal component fluctuates more toward the end of the sample period.
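One way to see how equations (8.39)-(8.41) map into the state-space form of Section 8.5 is to write out the system matrices explicitly. The sketch below (Python/NumPy; it follows the standard trend-plus-seasonal construction and is our own illustration with made-up parameter values, not the book's GAUSS code) builds the transition matrix, the measurement matrix and the state noise covariance for a state vector (T_t, S_t, S_{t-1}, S_{t-2}).

```python
import numpy as np

phi = 1.035                 # trend growth parameter (illustrative value)
q11, q22 = 0.02, 0.05       # trend and seasonal noise variances (illustrative)

# State vector: y_t = (T_t, S_t, S_{t-1}, S_{t-2})'
Phi = np.array([[phi, 0.0, 0.0, 0.0],     # T_t = phi*T_{t-1} + w_{t1}
                [0.0, -1.0, -1.0, -1.0],  # S_t = -(S_{t-1}+S_{t-2}+S_{t-3}) + w_{t2}
                [0.0, 1.0, 0.0, 0.0],     # carry S_{t-1} forward
                [0.0, 0.0, 1.0, 0.0]])    # carry S_{t-2} forward

Gamma = np.array([[1.0, 1.0, 0.0, 0.0]])  # x_t = T_t + S_t + v_t
Q = np.diag([q11, q22, 0.0, 0.0])         # state noise covariance
R = np.array([[1e-10]])                   # measurement noise variance

print(Phi, Gamma, Q, R, sep="\n")
```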
Fig. 8.4 Earnings: actual and estimated trend
Example 8.3: Coincident Indicator

In the classical factor analysis approach it is assumed that a set of q observed variables $z_t$ depends linearly on $p < q$ unobserved common factors $f_t$ and possibly on individual factors $u_t$. This structure may be expressed as,

$z_t = L f_t + u_t$,   (8.45)
where the matrix L is referred to as the factor loading matrix. The main objective of such an approach is to infer the unobserved factors as well as the factor loading matrix for practical use. The state-space modeling framework is very useful in dealing with this situation. In that context, equation (8.45) may be viewed as the measurement equation, and we need to define the dynamics of the unobserved factors. This is where the flexibility of the framework is most appreciated. The exact nature of the system dynamics will, however, depend upon the economic background in which the model is proposed. There have been several applications where the factors are assumed to have an ARCH effect, and where the parameters of the ARCH process in turn depend upon a hidden Markov process. A good reference for such details is Khabie-Zeitoune et al. (2000). Another interesting application is given in Chauvet and Potter (2000). The authors attempt to build a coincident financial indicator from a set of four macroeconomic variables and explore the usefulness of the extracted indicator in forecasting financial market conditions. These authors also allow the dynamics of the latent factor, or coincident indicator, to be influenced by a hidden Markov process and to switch states with some transition probability matrix. To estimate such a model the authors use the state-space framework with the added complexity of the driving hidden Markov process. In this example we refrain from that extra complexity and, to gain insight into the versatility of state-space models, we use a simpler factor structure and attempt to show its usefulness. The dynamic factor approach in Chauvet and Potter (2000) starts with a preliminary analysis of four variables that reflect public information about the state of the financial market. These are: excess return on the market index, the price/earnings ratio at the index level, a short-term interest rate and a proxy variable for the volatility in the market. The squared excess return is used for this proxy, but several other choices are possible. The authors suggest that these variables should be transformed to make them stationary. While applying this model to the Australian data we need to take
the first differences of the price/earnings ratio and the short-term interest rate to make these two variables stationary. We use symbols consistent with our earlier description of the state-space model. We assume a straightforward dynamic for the unobserved factor (representing the state of the financial market),

$f_t = \phi_0 + \phi_1 f_{t-1} + w_t$.   (8.46)
The observation vector of the four variables identified above is assumed to be generated by contributions from this unobserved factor, with varying degrees of sensitivity. Any unexplained part is captured by individual noise terms with their own variances, and these noise terms are uncorrelated. This leads us to define the measurement equation as follows:

$z_{ti} = \beta_i f_t + v_{ti}, \quad i = 1, 2, 3, 4$.   (8.47)
In the above measurement equation, the observed variables $z_{ti}$ ($i = 1, 2, 3, 4$) represent excess return, market volatility, change in the price/earnings ratio and change in the short-term interest rate, in that order. The parameters $\beta_i$ measure the sensitivity of each observed variable to the unobserved financial market coincident indicator. The state-space model given by equations (8.46) and (8.47) can now be estimated given the observations of the four variables in the measurement equation. This is achieved by numerically maximizing the log likelihood function discussed earlier. The parameters of interest are $(\phi_0, \phi_1, \sigma_w^2, \beta_1, \beta_2, \beta_3, \beta_4, \sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2)$, and at the same time we are able to draw inferences about the unobserved component, i.e. the financial market coincident indicator. We apply this to Australian monthly data covering the period August 1983 to June 2001. This is implemented using a GAUSS program, and below we summarize the correlations between the inferred coincident indicator and each of the four observed variables used in the model.
Table 8.2 Correlations between the coincident indicator and the observed variables (Excess Return, Market Volatility, Change in P/E, Change in Short Rate)
These correlations compare very well with those obtained from the U.S. market by Chauvet and Potter (2000), although those authors use a more elaborate setup that includes a hidden Markov process driving both the mean and the variance of the unobserved component. Our aim has been to demonstrate the usefulness of the state-space approach to factor analytic models; interested readers should refer to the paper by Chauvet and Potter (2000), where the authors use the coincident indicator to forecast the state of the financial markets out of sample. It is conceivable that such an approach could be extended to constructing a leading indicator, thereby making it useful for portfolio allocation.

Exercise 8.1: AR Model and State-Space Form (EViews)

The objective of this exercise is to appreciate the modeling flexibility offered by the state-space approach. You are to model the real dividend data from the aggregate US equity market covering the period January 1951 to December 1998. As a straightforward application of a time series model, first apply an AR(3) structure to this data and estimate the parameters. Next, put the AR(3) model in state-space form and estimate its parameters. Compare the estimation results.

Exercise 8.2: Time Varying Beta (EViews)

The market model of equity return is adopted for empirical work in the CAPM framework. The systematic risk of the equity portfolio is captured by beta. Although in many cases this quantity is considered time invariant, there are many articles that describe it as a time varying quantity and offer different approaches to estimating such a time varying beta. In this exercise, you are going to treat beta as an unobserved state variable and apply the state-space methodology to estimate it from known values of equity return. The data contain banking sector returns from Japan, and the Nikkei index return is the proxy for the market portfolio. In order to understand the difference between a constant beta and a time varying beta, you should also estimate beta using the simple linear regression approach.
Exercise 8.3: Stochastic Regression (EViews)

This exercise explores the relationship between the interest rate and inflation. The three-month interest rate, $y_t$, and the quarterly inflation rate, $z_t$, from the second quarter of 1976 to the first quarter of 2001 (for Australia) are to be analyzed following the stochastic regression equation given below:

$y_t = \alpha + \beta_t z_t + v_t$,
where $\alpha$ is a fixed constant, $\beta_t$ is a stochastic regression coefficient, and $v_t$ is white noise with variance $\sigma_v^2$. Your task is to formulate this as a state-space problem in which the variable $\beta_t$ is assumed to have a constant mean, b, and the associated error term has variance $\sigma_w^2$. In other words, $\beta_t$ may be expressed as,
$(\beta_t - b) = \phi (\beta_{t-1} - b) + w_t$.
How would you forecast the next period's three-month interest rate from this state-space model? Also, comment on how well this model performs for the Australian market.
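To get a feel for what this stochastic regression implies, the short simulation below (Python/NumPy; purely illustrative, with made-up parameter values, and not a substitute for the EViews state-space estimation the exercise asks for) generates data in which the regression coefficient mean-reverts around a constant level b.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 120                                   # e.g. quarterly data
alpha, b, phi = 1.0, 0.8, 0.9
sigma_v, sigma_w = 0.3, 0.1

z = 2.0 + 0.5 * rng.normal(size=n)        # "inflation" regressor (illustrative)
beta = np.zeros(n)
y = np.zeros(n)
beta_prev = b
for t in range(n):
    # (beta_t - b) = phi*(beta_{t-1} - b) + w_t
    beta[t] = b + phi * (beta_prev - b) + sigma_w * rng.normal()
    y[t] = alpha + beta[t] * z[t] + sigma_v * rng.normal()
    beta_prev = beta[t]

print(beta[:5], y[:5])
```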
References

Chauvet M, Potter S (2000) Coincident and leading indicators of the stock market. Journal of Empirical Finance, 7: 87-111
Harvey AC (1989) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge
Harvey AC (1990) The econometric analysis of time series. The MIT Press, Cambridge
Kalman RE (1960) A new approach to linear filtering and prediction problems. Journal of Basic Engineering, Transactions ASME, Series D, 82: 35-45
Kalman RE, Bucy RS (1961) New results in linear filtering and prediction theory. Journal of Basic Engineering, Transactions ASME, Series D, 83: 95-108
Khabie-Zeitoune D, Salkin G, Christofides N (2000) Factor GARCH, regime switching and the term structure. Advances in Quantitative Asset Management, vol 1, Kluwer Academic Publishers, Dordrecht
Kim C-J, Nelson CR (1999) State-space models with regime switching: classical and Gibbs-sampling approach with applications. The MIT Press, Cambridge
Lutkepohl H (1993) Introduction to multiple time series analysis. Springer-Verlag, Berlin
Maybeck PS (1979) Stochastic models, estimation and control, vol 1. Academic Press, New York and London
Shumway RH, Stoffer DS (2000) Time series analysis and its applications. Springer Texts in Statistics, New York
Tanizaki H (1993) Nonlinear filters: estimation and applications. Lecture Notes in Economics and Mathematical Systems, Springer-Verlag, Berlin
Wells C (1996) The Kalman filter in finance. Kluwer Academic Publishers, Dordrecht
9 State-Space Models (II)
9.1 Likelihood Function Maximization

We have discussed earlier that, to estimate the unknown parameters of a model cast in state-space form, we need to maximize the prediction error form of the likelihood function. In this segment we review some of the intricacies of function optimization and indicate some software implementations of these procedures. Further details in this context are normally covered in courses on numerical techniques. In order to focus our attention on the topic, we reproduce the likelihood function developed in the previous chapter, with all symbols retaining their original meaning:

$\ln L(\Theta) = -\frac{1}{2} \sum_{t=1}^{T} \ln \left| F_t(\Theta) \right| - \frac{1}{2} \sum_{t=1}^{T} v_t(\Theta)' F_t(\Theta)^{-1} v_t(\Theta)$.   (9.1)
The state process is assumed to have started with the initial value given by the vector $y_0$, taken from normally distributed variables with mean vector $\mu_0$ and $p \times p$ covariance matrix $\Sigma_0$. In carrying out this optimization we need to specify these prior quantities. When we are modeling stationary series we may be able to initialize the mean state vector using the sample means of the observation variables or any other knowledge we have about the system. For non-stationary series we usually initialize with the first set of observations. For finance and economic applications we normally initialize the prior covariance matrix with large but finite diagonal elements. This represents the fact that the information about the prior mean vector is diffuse, i.e. coming from a widely dispersed distribution. The initial specification of the covariance matrix $\Sigma_0$ is especially important, since the forecast error variance $P_{t|t-1}$ is not only dependent on the observation data but is also partially determined by $\Sigma_0$. Under certain conditions it is possible to demonstrate that the system has limited memory,
i.e. the value of $\Sigma_0$ is quickly forgotten as more data are processed. It is, therefore, sometimes important to investigate this empirically. Further details on this topic are available in Jazwinski (1970) and Bhar and Chiarella (1997). Software products like EViews and Excel have a limited number of optimization routines built in. In the GAUSS programming environment, on the other hand, there are several choices available to suit many applications. Although for most users a detailed understanding of these routines is not a priority, in practice it helps to have some knowledge of their internal workings. Most commercial implementations of optimization routines offer different tuning parameters, which are used to control the progress of the algorithm. The functions we encounter in practice for optimization are highly non-linear in the several parameters to be estimated; therefore, there is no guarantee that a sensible result can be obtained simply by running these programs. The choice of starting values and other tuning parameters is critical for obtaining meaningful results. Here we briefly describe the logic of one of the most common algorithms used, commonly known as the Newton-Raphson technique; the discussion focuses on minimizing a given function. When we are maximizing a function, as in the case of the above likelihood function, we simply define the negative of that function to be minimized. To simplify the exposition of this algorithm we write equation (9.1) simply as $L(\Theta)$, where the parameter vector $\Theta$ has n elements. The algorithm requires gradient information about the objective function. The gradient (Jacobian) vector, g, is defined as,

$g_i = \frac{\partial L}{\partial \Theta_i}, \quad i = 1, 2, \ldots, n$.   (9.2)
The $n \times n$ symmetric matrix of second order partial derivatives of L is known as the Hessian matrix and is denoted by H, where,

$H_{ij} = \frac{\partial^2 L}{\partial \Theta_i \, \partial \Theta_j}, \quad i, j = 1, 2, \ldots, n$.   (9.3)
Assuming the function L is sufficiently differentiable, we can express L near a minimum, for a small change $\Delta\Theta$ in the parameter vector, as,

$L(\Theta + \Delta\Theta) \approx L(\Theta) + g' \Delta\Theta + \tfrac{1}{2} \Delta\Theta' H \Delta\Theta$.   (9.4)
Our objective is to determine the elements of $\Delta\Theta$, i.e. $\Delta\Theta_i$, $i = 1, 2, \ldots, n$, so that we can change the parameters from their current values in order to move toward a minimum. We can write equation (9.4) in expanded form as,

$L(\Theta + \Delta\Theta) \approx L(\Theta) + \sum_{i=1}^{n} g_i \Delta\Theta_i + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} H_{ij} \Delta\Theta_i \Delta\Theta_j$,   (9.5)
and to determine $\Delta\Theta$ we treat the gradient and the Hessian as constant and partially differentiate equation (9.5) with respect to the elements $\Delta\Theta_j$, for each j from 1 to n. Setting these results to zero gives,

$g_j + \sum_{i=1}^{n} H_{ji} \Delta\Theta_i = 0, \quad j = 1, 2, \ldots, n$,   (9.6)
as the first order condition for a minimum. In matrix notation, equation (9.6) is simply,

$g + H \Delta\Theta = 0$,   (9.7)

which gives,

$\Delta\Theta = -H^{-1} g$,   (9.8)
as the approximation to the required movement to the minimum, $\Theta^*$, from a nearby point, $\Theta$. In general, the required movement to a minimum from a nearby point is approximately given by equation (9.8). As you will notice from the above analysis, the procedure requires computation of the gradient and the Hessian matrix. During the optimization these quantities have to be calculated many times; therefore, much effort has gone into developing faster algorithms to compute them. In practice, it is hardly ever possible to compute the partial derivatives analytically for complex likelihood functions, in which case they are computed using numerical methods, e.g. a forward finite differencing scheme. Another problem often encountered in practice is that of ill-conditioned matrices, and special measures are required to avoid such situations. In this book we will not delve into this topic in greater detail.
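A bare-bones version of the iteration in equation (9.8), with the gradient and Hessian obtained by forward finite differences, might look as follows (Python/NumPy; a minimal sketch for illustration only — production optimizers add line searches, safeguards against ill-conditioned Hessians and better difference schemes).

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-5):
    g = np.zeros_like(theta)
    f0 = f(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (f(theta + step) - f0) / eps      # forward difference
    return g

def numerical_hessian(f, theta, eps=1e-4):
    n = theta.size
    H = np.zeros((n, n))
    g0 = numerical_gradient(f, theta, eps)
    for j in range(n):
        step = np.zeros_like(theta)
        step[j] = eps
        H[:, j] = (numerical_gradient(f, theta + step, eps) - g0) / eps
    return 0.5 * (H + H.T)                       # symmetrize

def newton_raphson(f, theta0, max_iter=50, tol=1e-8):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = numerical_gradient(f, theta)
        H = numerical_hessian(f, theta)
        delta = -np.linalg.solve(H, g)           # equation (9.8)
        theta = theta + delta
        if np.max(np.abs(delta)) < tol:
            break
    return theta

# Simple check on a quadratic bowl with minimum at (1, -2)
f = lambda th: (th[0] - 1.0) ** 2 + 2.0 * (th[1] + 2.0) ** 2
print(newton_raphson(f, [0.0, 0.0]))
```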
9.2 EM Algorithm

EM stands for Expectation Maximization. In addition to the Newton-Raphson technique, Shumway and Stoffer (2000) suggest this procedure, which has been found to be more robust to arbitrary initial values, although researchers report that it is somewhat slower than the Newton-Raphson method. The Research Department of the Bank of Canada appears to have adopted a mixture of the two approaches: the robustness of the EM algorithm is used to get started on the complex optimization problem, and after a while the parameter values generated are used to initialize the Newton-Raphson method. They have reported some success with this hybrid mechanism. In this section we give an overview of the algorithm; the GAUSS code we have developed is available for interested readers. Referring to the state-space model developed in the previous chapter, assume that the states $Y_n = \{y_0, y_1, \ldots, y_n\}$, in addition to the observations $Z_n = \{z_1, z_2, \ldots, z_n\}$, are observed; then we would consider $\{Y_n, Z_n\}$ as the complete data. Under the normality assumption, the joint log likelihood (without the constant term) would then be,

$-2 \ln L_{Y,Z}(\Theta) = \ln|\Sigma_0| + (y_0 - \mu_0)' \Sigma_0^{-1} (y_0 - \mu_0) + n \ln|Q| + \sum_{t=1}^{n} (y_t - \Phi y_{t-1})' Q^{-1} (y_t - \Phi y_{t-1}) + n \ln|R| + \sum_{t=1}^{n} (z_t - \Gamma y_t)' R^{-1} (z_t - \Gamma y_t)$.   (9.9)
Thus, if we had the complete data, we could use the relevant theory for the multivariate normal distribution to obtain the maximum likelihood estimate (MLE) of $\Theta$. Without the complete observations, the EM algorithm gives us an iterative method for finding the MLE of $\Theta$ based on the incomplete data, $Z_n$, by successively maximizing the conditional expectation of the complete data likelihood. At iteration j ($j = 1, 2, \ldots$) we write,

$Q\left( \Theta \mid \Theta^{(j-1)} \right) = E\left[ -2 \ln L_{Y,Z}(\Theta) \mid Z_n, \Theta^{(j-1)} \right]$.   (9.10)
Equation (9.10) is referred to as the expectation step. Given the current value of the parameters, $\Theta^{(j-1)}$, we can use the smoothing algorithm described earlier. This results in,

$Q\left( \Theta \mid \Theta^{(j-1)} \right) = \ln|\Sigma_0| + \mathrm{tr}\left\{ \Sigma_0^{-1} \left[ P_0^n + (y_0^n - \mu_0)(y_0^n - \mu_0)' \right] \right\} + n \ln|Q| + \mathrm{tr}\left\{ Q^{-1} \left[ S_{11} - S_{10}\Phi' - \Phi S_{10}' + \Phi S_{00} \Phi' \right] \right\} + n \ln|R| + \mathrm{tr}\left\{ R^{-1} \sum_{t=1}^{n} \left[ (z_t - \Gamma y_t^n)(z_t - \Gamma y_t^n)' + \Gamma P_t^n \Gamma' \right] \right\}$,   (9.11)
where

$S_{11} = \sum_{t=1}^{n} \left( y_t^n y_t^{n\prime} + P_t^n \right)$,   (9.12)
$S_{10} = \sum_{t=1}^{n} \left( y_t^n y_{t-1}^{n\prime} + P_{t,t-1}^n \right)$,   (9.13)
and

$S_{00} = \sum_{t=1}^{n} \left( y_{t-1}^n y_{t-1}^{n\prime} + P_{t-1}^n \right)$.   (9.14)
The equations (9.11)-(9.14) are evaluated under the current value of the parameters, $\Theta^{(j-1)}$. Minimization of equation (9.11) with respect to the parameters is equivalent to the usual multivariate regression approach and constitutes the maximization step at iteration j. This results in,

$\Phi^{(j)} = S_{10} S_{00}^{-1}$,   (9.15)
$Q^{(j)} = \frac{1}{n} \left( S_{11} - S_{10} S_{00}^{-1} S_{10}' \right)$,   (9.16)
$R^{(j)} = \frac{1}{n} \sum_{t=1}^{n} \left[ (z_t - \Gamma y_t^n)(z_t - \Gamma y_t^n)' + \Gamma P_t^n \Gamma' \right]$.   (9.17)
In this procedure the initial mean and covariance cannot both be estimated simultaneously. In practice we fix the initial covariance matrix and use the estimator,

$\mu_0^{(j)} = y_0^n$,   (9.18)
obtained from the minimization of equation (9.11). The overall procedure simply alternates between the Kalman filtering and smoothing recursions and the multivariate normal maximum likelihood estimators given by equations (9.15)-(9.18). Convergence of the EM algorithm (in the sense that the likelihood never decreases) is guaranteed. To summarize, the EM algorithm steps are:
- Initialize the procedure with starting parameter values, $\Theta^{(0)} = (\mu_0, \Phi, Q, R)$, and fix $\Sigma_0$.
- For each $j = 1, 2, \ldots$:
  - Compute the incomplete-data likelihood, $-2 \ln L_Z(\Theta^{(j-1)})$; see equation (9.1).
  - E-step: for the current value of the parameters, $\Theta^{(j-1)}$, compute the smoothed estimates $y_t^n$, $P_t^n$ and $P_{t,t-1}^n$, and use these to compute $S_{11}$, $S_{10}$ and $S_{00}$.
  - M-step: update the estimates of $\mu_0$, $\Phi$, Q and R using equations (9.15)-(9.18), i.e. obtain $\Theta^{(j)}$.
- Repeat the above steps until convergence is achieved.
9.3 Time Varying Parameters and Changing Conditional Variance (EViews)

In the usual linear regression models the variance of the innovation is assumed constant. However, many financial and economic time series exhibit changing conditional variance. The main modeling approach for this is ARCH (autoregressive conditional heteroscedasticity), in which the current conditional variance depends on past squared innovations. It has been suggested that uncertainty about the future arises not simply from future random innovations but also from uncertainty about the current parameter values and about the model's ability to relate the present to the future. In the fixed parameter regression framework the model's parameters are constant and hence may not capture the dynamics properly. In other words, changing parameters might be able to explain the observed time varying conditional variance in regression models. In this section we describe such an attempt to model the quarterly monetary growth rate in the US over the sample period 1959:3 to 1985:4. The relevant article for this section is Kim and Nelson (1989), and the whole exercise should be carried out in EViews using the data set provided.
First we consider the model in the fixed parameter context. The model equation is given by,

$\Delta M_t = \beta_0 + \beta_1 \Delta r_{t-1} + \beta_2 INF_{t-1} + \beta_3 SURP_{t-1} + \beta_4 \Delta M_{t-1} + \varepsilon_t$,   (9.19)
where $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$. The variables are as follows: $\Delta M$ is the quarterly growth rate of M1 money in the US, $\Delta r$ is the change in the three-month interest rate, INF is the rate of inflation, and SURP is the budget surplus. Your task is to estimate this model and test for an ARCH effect in the residuals. This gives an indication that the ARCH effect has to be incorporated into the model. Again using EViews, apply an ARCH(1) specification to the model in equation (9.19) and construct the variance series. This should depict how the uncertainty evolved over time. Using a Chow test it can be shown that the parameters in equation (9.19) are not stable over the sample period analyzed. This brings us to the alternative specification: the time varying parameter model. A natural way to model the time variation of the parameters is to specify them as random walks. You should note that once we make the parameters in equation (9.19) time varying, we cannot estimate the model using the standard maximum likelihood method, since we have no observations of the parameters. In other words, given the random walk dynamics of the parameters, we want to infer the parameters from the observations of the different variables we have in the model. The easiest way to deal with this situation is to model the observations and the unobserved components in the state-space framework. We describe the model structure below, and your task is to set up and estimate the model in EViews using the state-space object:

$\Delta M_t = \beta_{0t} + \beta_{1t} \Delta r_{t-1} + \beta_{2t} INF_{t-1} + \beta_{3t} SURP_{t-1} + \beta_{4t} \Delta M_{t-1} + \varepsilon_t$,   (9.20)
$\beta_{it} = \beta_{i,t-1} + w_{it}$,   (9.21)
where $i = 0, 1, 2, 3, 4$. We also assume that the innovation processes for the parameters and for the observation are uncorrelated. Once the model has been put in state-space form, it can be estimated using the Kalman filter. The recursive nature of the filter captures the insight into how different
policy regimes force revisions of the estimates; this is not possible in the fixed parameter regression framework. While estimating this model in EViews you should also analyze the importance of the specification of the prior values of the state variables and their covariance. Once the estimation is complete, you should generate the forecast error variance from the estimated components. Your next task is to compare this forecast error variance with the ARCH variance developed earlier. In an ARCH model, changing uncertainty about the future is focused on the changing conditional variance of the disturbance term in the regression equation. In the time varying parameter model, uncertainty about the current regression coefficients contributes to the changing conditional variance of monetary growth. This is well represented by the Kalman filter equation that describes the conditional forecast error, which has two components, the first of which captures the variance associated with the uncertain parameters (states). It is interesting to plot the two variance series, one from the ARCH estimation and the other from the time varying parameter model, in the same graph. You would notice that around 1974 the variance from the time varying parameter model is higher than the one from the ARCH model. This suggests that during the oil price shock, uncertainty about future monetary policy was higher than that suggested by the ARCH model. In that period, an unusually high inflation rate caused the conditional variance to be unusually high as well.
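A time varying parameter regression such as equations (9.20)-(9.21) is simply a state-space model in which the design vector changes every period. The sketch below (Python/NumPy, an illustration with simulated data and our own variable names, rather than the EViews state-space object the exercise asks for) filters random-walk coefficients for a regression with two explanatory variables.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # regressors (constant, x_t)
beta_path = np.cumsum(rng.normal(scale=0.05, size=(n, 2)), axis=0) + [0.5, 1.0]
y = np.sum(X * beta_path, axis=1) + 0.3 * rng.normal(size=n)

sigma_eps2 = 0.09            # observation noise variance (assumed known here)
Q = 0.0025 * np.eye(2)       # random-walk innovation covariance (assumed known)

beta = np.zeros(2)           # vague prior for the coefficients
P = 100.0 * np.eye(2)
filtered = np.zeros((n, 2))

for t in range(n):
    P = P + Q                            # state prediction: random-walk coefficients
    z = X[t]
    F = z @ P @ z + sigma_eps2           # innovation variance (scalar)
    K = P @ z / F                        # Kalman gain
    beta = beta + K * (y[t] - z @ beta)  # update with the forecast error
    P = P - np.outer(K, z @ P)
    filtered[t] = beta

print(filtered[-1])   # filtered coefficients at the end of the sample
```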
9.4 GARCH and Stochastic Variance Model for Exchange Rate (EViews)

Another alternative to ARCH for modeling changing conditional variance is to model the variance as an unobserved stochastic process. In this section we describe such an approach, suggested by Harvey, Ruiz and Shepherd (1994), and as an application you will apply this technique in EViews to exchange rate data. Before formulating the stochastic variance (SV) model we first state the specification of the GARCH(1,1) model that is usually applied to exchange rate data. Let $r_t$ denote the return on the spot exchange rate, i.e. the log price difference series; then the GARCH(1,1) model is,

$r_t = a_0 + a_1 r_{t-1} + \sigma_t \varepsilon_t, \quad \varepsilon_t \sim N(0, 1)$,   (9.22)
$\sigma_t^2 = \alpha_0 + \alpha_1 \sigma_{t-1}^2 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$,   (9.23)
where $\sigma_t^2$ is the conditional variance, which depends upon the past squared innovation as well as the past conditional variance. The mean equation (9.22) in this case is a simple AR(1) process; in other cases it could include other explanatory variables. Since the model is formulated in terms of the one-step-ahead prediction error, maximum likelihood estimation is straightforward. The dynamics of a GARCH model show up in the autocorrelation function (ACF) of the squared observations. If the ARCH and GARCH parameters, $\alpha_1$ and $\beta_1$, sum close to one, the ACF decays very slowly, implying slowly changing conditional variance. In practice it is also quite difficult to establish the conditions under which $\sigma_t^2$ is always positive. Stochastic variance models are the natural discrete time version of the continuous time models on which much of modern finance theory has been developed. The main difficulty is the maximum likelihood estimation of these models; however, such models can be generalized to the multivariate case in a natural way. The estimation of the SV model is based on a quasi-maximum likelihood procedure, and it has been shown to capture exchange rate data quite well. The SV model is characterized by a Gaussian white noise process multiplied by a GARCH(1,1) factor, and in order to ensure a positive variance it has been suggested that we model the log of the variance. If we define $h_t = \ln \sigma_t^2$ and $y_t = \ln r_t^2$, then equation (9.22) can be re-expressed as,

$y_t = h_t + \ln \varepsilon_t^2$.   (9.24)
Equation (9.24) is considered the observation equation, and the stochastic variance $h_t$ is considered to be an unobserved state process. In its basic form the volatility process follows an autoregression,

$h_t = \gamma_0 + \gamma_1 h_{t-1} + w_t$,   (9.25)
where $w_t$ is Gaussian white noise with variance $\sigma_w^2$. Together, equations (9.24) and (9.25) make up the stochastic variance model. If $\varepsilon_t^2$
had a log-normal distribution, then equations (9.24)-(9.25) would form a Gaussian state-space model. Unfortunately $y_t = \ln r_t^2$ is rarely normal, so we keep the ARCH normality assumption for $\varepsilon_t$. In that case, $\ln \varepsilon_t^2$ is distributed as the log of a chi-squared random variable with one degree of freedom. Such a variable has a mean of -1.27 and variance $\pi^2/2$. Although various approaches to estimating the stochastic variance model have been proposed in the literature, we will investigate an approximate Kalman filter, and you will be able to implement the model in EViews and investigate it for a given exchange rate series. You may also examine whether a specification other than equation (9.25) fits the data better; an alternative would be a random walk specification. The estimation method suggested in Harvey, Ruiz and Shepherd (1994) is a quasi-maximum likelihood (QML) method, computed using the Kalman filter. The following state-space form may be adopted for implementing the SV model:

$y_t = -1.27 + h_t + \xi_t$,   (9.26)
$h_t = \gamma_0 + \gamma_1 h_{t-1} + w_t$,   (9.27)

where $\xi_t = \ln \varepsilon_t^2 + 1.27$ has mean zero and variance $\pi^2/2$.
Although the Kalman filter can be applied to (9.26)-(9.27), it will only yield minimum mean square linear estimators of the state and observation, rather than minimum mean square estimators. Besides, since the model is not conditionally Gaussian, the exact likelihood cannot be obtained from the resulting prediction errors. Nevertheless, estimates can be obtained by treating $\xi_t$ as $N(0, \sigma_\xi^2)$ and maximizing the resulting quasi-likelihood function. Further references to the relevant literature on the performance of these estimators are available in Harvey, Ruiz and Shepherd (1994). Before you attempt the model in EViews, the following notes will be helpful:

a. Initializing the state variable: when the state variable has an autoregressive structure, the sample variance of the observation series may be used as the prior variance.
When the state variable has a random walk structure, the prior variance may be set to a large but finite positive number.

b. Avoiding numerical problems with the data:
Remember that $y_t = \ln r_t^2$ and $r_t = \ln p_t - \ln p_{t-1}$, where $p_t$ is the observed exchange rate on date t, so it is possible that $r_t$ could be zero in some cases. This would create a numerical problem in generating $y_t$, since the log of zero is undefined. To avoid this problem, first generate the series $r_t$ and then subtract the mean of $r_t$ from each of the elements.
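The data preparation and the approximate (QML) measurement equation described above are easy to script. The sketch below (Python/NumPy; an illustration with simulated prices rather than actual exchange rate data) constructs the demeaned returns, forms y_t = ln r_t^2, and sets up the constants -1.27 and pi^2/2 used in the linear state-space approximation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated log prices standing in for an observed exchange rate series
p = np.cumsum(0.005 * rng.normal(size=1000)) + 1.0

r = np.diff(p)                  # log return (p is already in logs here)
r = r - r.mean()                # subtract the mean of r_t, as suggested in the text
y = np.log(r ** 2)              # observation series y_t = ln r_t^2

# Approximate measurement equation: y_t = -1.27 + h_t + xi_t,
# with xi_t treated as N(0, pi^2/2) in the QML approach
mean_lnchi2 = -1.27
var_lnchi2 = np.pi ** 2 / 2

y_centered = y - mean_lnchi2    # series to be filtered against the state h_t
print(y_centered[:5], var_lnchi2)
```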
9.5 Examples and Exercises

Example 9.1: EM Algorithm (Data from Shumway and Stoffer (2000))

Measurements of average temperature and salt level in an agricultural field at equally spaced intervals are shown in Fig. 9.1. Our aim is to explore the relationship between these two variables using a suitable state-space form and to estimate such a model using the EM algorithm. We assume that the bivariate vector of temperature and salt level is measured with measurement errors that are uncorrelated. Thus,
where the state vector is assumed to have the following dynamics:
The covariance matrices of the measurement errors and the state equation would be represented as,
Fig. 9.1 Soil science data
Table 9.1 Empirical results
This system therefore has nine parameters to be estimated,
We applied the EM algorithm to estimate this model, and Table 9.1 summarizes the results.
The GAUSS program using the EM algorithm may be applied to the soil science data to gain some experience with the internal operation of the code as well as with potential computational problem areas.

Example 9.2: Friedman's Plucking Model of Business Fluctuations

Milton Friedman suggested that the aggregate output of an economy cannot exceed a ceiling level; the available resources, the methods of organizing them and so on set this upper level of output. Occasionally, however, the economy is pulled down into a recession. Friedman also noted that the extent of a contraction is strongly correlated with the succeeding expansion, but the extent of an expansion is not correlated with the extent of the succeeding contraction. Although this kind of asymmetry was described by Friedman quite some time ago, it took a long time for formal modeling to test the hypothesis empirically. The Kim and Nelson (1999) article achieves this using a sophisticated approach that combines the state-space technique with hidden Markov processes. In this example, we demonstrate the essential feature of their model for testing Friedman's hypothesis, utilizing the state-space framework but avoiding hidden Markov states. Although hidden Markov states are the most suitable structures to employ in dealing with long data sets at monthly or quarterly frequencies, the added mathematical complexity would obscure the essential objective of this exercise, which is to demonstrate the versatility of the state-space framework. We focus on the following unobserved component model of the log of real GDP ($y_t$), decomposed into a trend ($\tau_t$) and a transitory component ($c_t$):

$y_t = \tau_t + c_t$.
The transitory component is assumed to follow a stationary AR(2) process with constant innovation variance. This is where we depart from the Kim and Nelson (1999) specification, which allows the innovation shocks to be driven by a hidden Markov process that captures the type of asymmetry discussed above. Thus, in our simplified example,

$c_t = \phi_1 c_{t-1} + \phi_2 c_{t-2} + u_t$.
The stochastic trend component, or time varying ceiling of the economy's output, is modeled to have a stochastically varying growth rate as well. Thus the specification becomes (again avoiding a hidden Markov process),

$\tau_t = \tau_{t-1} + g_{t-1} + v_t$,
$g_t = g_{t-1} + e_t$,

where $g_t$ is the stochastic growth rate of the trend.
We should note at this stage that this model is a prime candidate for the state-space framework, since we have unobserved components with their associated dynamics which ultimately determine a single measured output variable, the log of real GDP. We can now write the state equation from the above discussion as,
and the measurement equation is,
The covariance matrix of the state equation noise sources (under the assumption of mutual independence) is,
Fig. 9.2 LGDP and estimated trend component
Once the model is in state-space form, it is straightforward to infer the unknown parameters $(\Theta = \phi_1, \phi_2, \sigma_u^2, \sigma_v^2, \sigma_e^2)$ by maximizing the prediction error form of the likelihood function. Here we present two graphs displaying the inferred stochastic trend and the transitory components (Fig. 9.2 and Fig. 9.3). When you compare the first graph with the one in Kim and Nelson (1999), the main difference you will notice is in the pronounced plucking effect. This is not so pronounced in our example, since we have restricted the innovation variances to be constant and are therefore not able to capture the asymmetric effect discussed earlier. The transitory component is shown in Fig. 9.3. Again, with respect to this graph, we could say that the marked changes between episodes of economic expansion and contraction are less clear. This is solely due to the absence of hidden Markov states in our simplified approach. However, the example clearly establishes the versatility of a modeling framework that allows unobserved components to exist.

Example 9.3: Decomposition of Earnings

This is another example of signal extraction, in the context of earnings reported by firms and their relation to the share price of the firm. Halsey (2001) proposes that earnings data contain a stationary component in addition to a random walk trend. The author uses the unobserved component approach to infer the cyclical component of earnings and shows that it has an impact on the share price of the firm. The author extends this analysis to construct hedge portfolios at cyclical peaks in earnings and reports positive market adjusted returns from such portfolios. Since our focus here is to explore possible ways of utilizing the state-space framework to achieve such decompositions of the earnings data, we do not explore how the extracted earnings component may be exploited. The earnings model is very similar to the Friedman plucking model example; the cyclical component is, however, modeled differently. The earnings series ($x_t$) is represented as,

$x_t = \mu_t + \psi_t + \varepsilon_t$,
Fig. 9.4 Earnings and trend
and the stochastic trend is captured by,

$\mu_t = \mu_{t-1} + \eta_t$.
The cycle is a stochastic function based on a cosine function,

$\psi_t = A \cos(\lambda t - \theta), \quad t = 1, \ldots, T$,

where A is the amplitude, $\theta$ is the phase, and $\lambda$ $(0 < \lambda < \pi)$ is the frequency measured in radians. Therefore,
$\psi_t = \alpha \cos \lambda t + \beta \sin \lambda t$, where $\alpha = A \cos \theta$ and $\beta = A \sin \theta$. The stochastic cycle allows $\alpha$ and $\beta$ to evolve over time, as $\alpha_t \cos \lambda + \beta_t \sin \lambda$, along with a noise term, $\kappa_t$, that is uncorrelated with the other noise sources, and a damping factor $\rho$ $(0 \le \rho \le 1)$.
For the purpose of the state-space model this cycle component is expressed as a recursion, as follows (Harvey 1989):

$\begin{pmatrix} \psi_t \\ \psi_t^* \end{pmatrix} = \rho \begin{pmatrix} \cos\lambda & \sin\lambda \\ -\sin\lambda & \cos\lambda \end{pmatrix} \begin{pmatrix} \psi_{t-1} \\ \psi_{t-1}^* \end{pmatrix} + \begin{pmatrix} \kappa_t \\ \kappa_t^* \end{pmatrix}$,

where $\psi_0 = \alpha$, $\psi_0^* = \beta$, and the two noise terms are either uncorrelated or have the same variance. In practice, both these assumptions are imposed for computational convenience. Finally, the state-space representation of the system turns out to be as follows:
and the covariance matrix of the noise vector is,
The measurement equation is given by,
Fig. 9.5 Cycle
Fig. 9.6 Growth rate and earnings
The estimation of the parameters, as well as the inference of the unobserved components, is achieved by maximizing the prediction error form of the likelihood function. Figs. 9.5 and 9.6 display the various components. The observation data represent the aggregate banking sector earnings in the U.K. from January 1980 to February 2003.

Exercise 9.1: Evolution in Commodity Prices (EViews)

In a working paper, Robert S. Pindyck explores the long-run characteristics of commodity prices, mainly related to the energy sector (http://web.mit.edu/ceepr/www/workingpapers.htm, January 1999, working paper No. WP99001). The main idea analyzed there is whether price reversion to a stochastically fluctuating trend helps forecast prices better; the implication of such results is better investment decisions. Here is a brief outline of the main model equations required for this exercise. Let $p_t$ be the price of the commodity under investigation at time t. The reversion to a stochastically varying trend is captured by $\phi_1$ and $\phi_2$. The trend line reflects long-run marginal cost, which is unobservable. The model thus lends itself naturally to the state-space framework. The discrete time version of the model proposed by Pindyck is given by,
Given the time series of monthly copper prices from January 1971 to December 2000, your task is to put this model in the state-space form and estimate the various parameters. You should also explore the sensitivity of the model to the specification of starting values of the state variables and
the prior covariance of the state vector. Once the model has been estimated, how would you utilize the estimated parameters to forecast copper prices beyond the sample period?
References Bhar R, Chiarella C (1997) Interest rate futures: estimation of the volatility parameters in an arbitrage-free framework. Applied Mathematical Finance, 4: 181-200 Harvey AC (1989) Forecasting structural time series and the Kalman filter. Cambridge University Press, Cambridge Harvey A, Ruiz E, Shepherd N (1994) Multivariate stochastic variance models. Review of Economic Studies, 6 1: 247-264 Halsey RF (2001) Stationary components of earnings and stock prices. Advances in Quantitative Analysis of Finance and Accounting, 9: 81-1 10 J m i n s k i AH (1970) Stochastic processes and filtering theory. Academic Press, New York and London Kim C-J, Nelson CR (1989) The time varying parameter model for modeling changing conditional variance: the case of Lucas hypothesis. Journal of Business and Economic Statistics, 7: 433-440 Kim C-J, Nelson CR (1999) Friedman's plucking model of business fluctuations: tests and estimates of permanent and transitory components. Journal of Money, Credit and Banking, 3 1: 3 17-334 Shumway RH, Stoffer DS (2000) Time series analysis and its applications. Springer Text in Statistics, New York
10 Discrete Time Real Asset Valuation Model
10.1 Asset Price Basics

In multi-period investment problems the fluctuation of the value of the asset presents an important challenge to decision makers. In discrete time, binomial lattice models are a very popular way to represent such fluctuations: they are analytically simple and offer a tractable mechanism for solving investment problems. This chapter focuses on the representation of asset price dynamics in the binomial lattice framework and applies it to an investment decision in a mining project. We model the asset price as a random walk $\{S_t, t = 0, 1, 2, \ldots, T\}$. Next, we consider one time step in this model and set $\Delta t = 1$. At time 0, the asset price is $S_0 = s$. At time 1, the asset price is a random variable $S_1$. Let us assume that there are only two possible outcomes: it goes up to $S_1 = s_u$ with probability p, or down to $S_1 = s_d$ with probability $(1 - p)$. We then consider any option contract with payoff depending on the underlying asset, say a payoff of $x_u$ if the price goes up and $x_d$ if the price goes down. Linear algebra says that we can replicate the option payoff by constructing a suitable portfolio $(\phi, \psi)$ of $\phi$ units of the asset and $\psi$ bonds. Note that, in order to avoid an obvious arbitrage, $s_d/s < e^r < s_u/s$, where r is the risk-free rate for one period. Solving the following two equations we get,

$\phi s_u + \psi e^r = x_u$,   (10.1)
and

$\phi s_d + \psi e^r = x_d$,   (10.2)

which gives,

$\phi = \frac{x_u - x_d}{s_u - s_d}, \qquad \psi = e^{-r} \, \frac{s_u x_d - s_d x_u}{s_u - s_d}$.   (10.3)
Thus the value of the portfolio at time 0 is $V_0 = \phi S_0 + \psi$, which may be written as,

$V_0 = \frac{e^r s - s_d}{s_u - s_d} \, e^{-r} x_u + \frac{s_u - e^r s}{s_u - s_d} \, e^{-r} x_d = e^{-r} \left[ q x_u + (1 - q) x_d \right]$,   (10.4)

where $q = \dfrac{e^r s - s_d}{s_u - s_d}$ is the probability of moving up. In other words,

$V_0 = E_q\left[ e^{-r} X_1 \right]$,   (10.5)
i.e. the arbitrage-free price of the option. Also, note that under q, $S_0 = E_q[e^{-r} S_1]$. The probability q depends on s and is called the risk neutral or martingale probability; p is called the real world or market probability. We can examine each time step in the multi-period model as a one-period model. Hence we can find the portfolio required to hedge the option contract from one time step to the next and, working backwards down the tree, we can determine a unique arbitrage-free price for any European option. If $V_t$ denotes the value of the option contract at time t, then we can write,

$V_t = e^{-r} E_q\left[ V_{t+1} \mid S_t \right]$.   (10.6)
The trading strategy is a pair $(\phi_t, \psi_t)$, which denotes the holdings in the risky asset, S, and the risk-free bond, B. The value of the portfolio at time t is $V_t = \phi_t S_t + \psi_t B_t$. This is a dynamic strategy, which generally requires adjustment of the portfolio after each step and hence is itself a random process. It is called self-financing if $\phi_{t-1} S_t + \psi_{t-1} B_t = \phi_t S_t + \psi_t B_t$. An
arbitrage opportunity is a self-financing trading strategy with the property that $V_0 = 0$ and $V_t \ge 0$ for all t, but $V_t > 0$ with positive probability for some t. The fact that there are risk neutral probabilities in the binomial model can be used to show that this model does not allow arbitrage opportunities. The binomial model is also said to be complete, as any contingent claim can be replicated by a self-financing trading strategy.
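The one-period calculation in equations (10.1)-(10.5) is easy to verify numerically. The sketch below (Python; illustrative numbers of our own choosing) computes the replicating portfolio, the risk neutral probability q and the arbitrage-free option value for a single step.

```python
import math

s, s_u, s_d = 100.0, 120.0, 90.0     # current price and the two possible outcomes
r = 0.05                             # one-period risk-free rate (continuous compounding)
x_u, x_d = 15.0, 0.0                 # option payoffs in the up and down states

# Replicating portfolio from equations (10.1)-(10.3)
phi = (x_u - x_d) / (s_u - s_d)                              # units of the asset
psi = math.exp(-r) * (s_u * x_d - s_d * x_u) / (s_u - s_d)   # bonds (worth 1 today)

# Risk neutral probability and option value, equations (10.4)-(10.5)
q = (math.exp(r) * s - s_d) / (s_u - s_d)
value_rn = math.exp(-r) * (q * x_u + (1 - q) * x_d)
value_portfolio = phi * s + psi

print(q, value_rn, value_portfolio)   # the two values coincide
```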
10.2 Mining Project Background

Options associated with investment opportunities are not always financial options. A factory manager may have the option to hire more employees, buy new equipment, and so on. Similarly, drilling for oil on a piece of land or offshore may be viewed as a series of operational options. These are referred to as real options. The option valuation framework has great scope, since it can value virtually any contingent decision. In product development, for example, different design choices lead to different follow-on opportunities. In large irreversible investments the framework can be used to evaluate modifications to the construction schedule, or to weigh the options to delay, abandon, expand or accelerate against the additional value they create. Both real and financial option valuations can be less precise in practice than in theory, because certain asset and market features can prevent the law of one price from holding; the option valuation framework provides a clear image of the magnitude of this imprecision. The option valuation method essentially relies on constructing a tracking portfolio, which is dynamically updated as the value of the underlying asset changes. This presumes that the option and the tracking portfolio are affected by the same source of uncertainty. Two real asset features cause tracking errors: the cost of tracking and the quality of tracking. When it is costly to change the portfolio composition frequently, the portfolio may wander away from the value of the option. For real options the tracking portfolio may include specific features or commodities that make dynamic tracking difficult. For example, real options have private risk that is not contained in traded securities: the risk of failing to develop a new technology is a private risk carried by a high-tech firm, and the risk of not finding a large amount of oil in a particular prospect is a private risk borne by an oil firm. For more information regarding real options, see the excellent books by Amran and Kulatilaka (1999) and Trigeorgis (1997).
Real options can usually be analyzed by the same methods used to analyze financial options. We need to set out an appropriate representation of the uncertainty in the underlying asset, e.g. using a binomial lattice, and work backward to find the value of the option. We will analyze the issues in this approach through some examples. The examples here are taken from Luenberger (1998). We explain the intuition behind these examples as well as provide the spreadsheets to implement the models.
10.3 Example 1

A straightforward gold mine has a great deal of remaining gold deposits and you are part of a team that is considering leasing the mine from its owners for a period of 10 years. Gold can be extracted from this mine at a rate of up to 10,000 ounces per year at a cost of $200 per ounce. This cost is the total of mining and refining and does not include the leasing cost. The current market price of gold is $400 per ounce. The interest rate is 10% per annum. Assuming that the price of gold, the operating cost and the interest rate remain constant over the 10 year period, what is the value of the lease? This is very straightforward and it is clear that the mine should be operated at full capacity to maximize profit. The annual profit is, therefore,
If we further assume that this cash flow occurs at the end of each year, the present value is,
This is the value of the lease. There is, however, some inherent contradiction in analyzing the problem in this simple way. This will become clearer as we proceed to more complex and realistic situations.
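The arithmetic of Example 1 can be verified with a few lines of Python; the inputs below simply restate the problem data and the result is the present value of a ten year annuity of the annual profit.

# Example 1: constant gold price, mine operated at full capacity.
capacity = 10_000                  # ounces per year
price, cost = 400.0, 200.0         # dollars per ounce
r, years = 0.10, 10

annual_profit = capacity * (price - cost)                  # $2 million per year
pv_lease = annual_profit * (1 - (1 + r) ** -years) / r     # ordinary annuity, end-of-year cash flows

print(f"annual profit  = ${annual_profit:,.0f}")
print(f"value of lease = ${pv_lease:,.0f}")                # roughly $12.3 million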
10.4 Example 2

We extend the gold mine of Example 1 to the case where the gold price fluctuates randomly. We still maintain the assumption that the term structure of interest rates is flat. We also follow the convention that the price obtained for gold mined during the year is the price that prevailed at the beginning of that year and that the cash flow occurs at the end of the year. We represent the random nature of the gold price by a binomial lattice. Each year the price can increase by a factor of 1.2 with a probability of 0.75 or decrease by a factor of 0.9 with a probability of 0.25. All other parameters of the problem remain the same as in Example 1. We wish to find the value of the 10-year lease. We find the value of the lease by the methods developed for option pricing. The trick is to note that the mining lease can be regarded as a financial instrument whose value fluctuates with the price of gold. In fact, the value of the lease at any point can only depend on the price of gold and the interest rate (assumed constant). The value of the gold mine lease is a derivative instrument dependent on the price of gold. In the attached spreadsheet labeled 'Example-2-3' (Table 10.1) the top panel describes the movement of the gold price based on the up and down factors assumed for this problem. The value of the lease can be entered node by node based on the gold price lattice. This is done in the second panel. The lease values are easily determined at the last nodes, where they are all zero since the mine must be returned to the owner at that time. Consider a node in year 9 where the lease has one more year to go. The lease value must be equal to the profit that can be made from the gold mined in that year. Remember that our assumption is that the gold is sold at the price that prevailed at the beginning of the year and that the cash flow occurs at the end of the year. Focusing on the top node in year 9 in the second panel, the value of the lease is,
Following the same reasoning all the nodes in year 9 can be filled in with lease values.
Table 10.1 Example 2-3 (years 0 to 10). The full lattices are in the accompanying spreadsheet; only the panel headings and the year-0 entries are reproduced here.
Panel 1: Gold Price Lattice (initial price 400.00, up factor 1.2, down factor 0.9)
Panel 2: Lease Values ($ Million); year-0 value 24.22
Panel 3: Lease Values Assuming Enhancement in Place ($ Million); year-0 value 27.21
Panel 4: Lease with Option for Enhancement ($ Million); year-0 value 24.80
See main text for explanation of entries.
The lease value at any node before year 9 would be equal to the profit that can be made from the gold mined in that year plus the expected value discounted from the two succeeding nodes.
For example again, we focus on the top node in year 8 and the computation is detailed below:
In the above expression p is the risk-neutral probability of the gold price moving up. This is easily seen to be p = ((1 + r) - d)/(u - d) = (1.1 - 0.9)/(1.2 - 0.9) = 2/3.
The lease value can, therefore, be calculated by this backward recursion. We should also note that at those nodes where the price of gold is less than the cost of extraction (i.e. $200) we do not mine that year. In this way, the value of the lease today is found to be $24.2 million. (Depending on the level of accuracy you carry through the lattice you may get slightly different results.) You can probably see the similarity between pricing financial options on a binomial lattice and this example. In Example 1 we kept the gold price constant, which is clearly not realistic. The assumption of a flat term structure of interest rates, however, is commonly employed in problems of this kind. You should also note that if the gold price were known to be constant then gold would act as a risk-free asset with zero rate of return. In that case it would be incompatible with the assumption of a risk-free rate of 10%. Indeed, the lattice of gold prices must be constructed such that u > (1 + r) > d, where u is the up factor, d is the down factor and r is the risk-free rate of interest. We can now move on to the next level of complexity, where a more realistic operational scenario is analyzed.
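A compact Python sketch of this backward recursion is given below. It follows the conventions described in the text (risk-neutral probability 2/3, sale at the beginning-of-year price, cash flow at year end, no mining when the price is below $200); small rounding differences from the spreadsheet are possible.

# Example 2: lease value on a binomial gold price lattice.
u, d, r = 1.2, 0.9, 0.10
p = (1 + r - d) / (u - d)            # risk-neutral up probability (= 2/3)
capacity, cost, s0, T = 10_000, 200.0, 400.0, 10

# price[t][j]: gold price after j up moves in t years
price = [[s0 * u**j * d**(t - j) for j in range(t + 1)] for t in range(T + 1)]

lease = [0.0] * (T + 1)              # the lease is worth nothing in year 10
for t in range(T - 1, -1, -1):       # backward recursion through the lattice
    lease = [(capacity * max(price[t][j] - cost, 0.0)
              + p * lease[j + 1] + (1 - p) * lease[j]) / (1 + r)
             for j in range(t + 1)]

print(f"lease value today = ${lease[0] / 1e6:.2f} million")   # about $24.2 million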
10.5 Example 3

The gold mine we are analyzing already contains several real options, e.g. the yearly options to carry out the operations. In fact, the value of the lease can be expressed as the sum of these individual options. More interesting, however, is to consider the following situation.
Assume that there is a possibility of enhancing the production rate of the above mine by making some structural changes and buying and installing new machines. This enhancement would cost $4 million, but would raise the mine's capacity to 12,500 ounces of gold per year and raise the operating cost to $240 per ounce. This enhancement is an option since it need not be carried out, and it is available to you over the whole term of the lease. We also assume that the enhancement can be undertaken at the beginning of any year and that once in place it applies to all future years. We further assume that at the termination of the lease the enhancement becomes the property of the original owner of the mine. We focus on panels 3 and 4 of the spreadsheet labeled 'Example-2-3' to analyze this enhancement option. We proceed as follows. We first calculate the value of the lease assuming that the enhancement is already in place. This is shown in panel 3. It is constructed in the same way as panel 2 except that the annual production is 12,500 ounces of gold and the operating cost is $240 per ounce. This shows that the value of the lease is $27.2 million. Remember that this figure does not include the cost of the enhancement, which is $4 million. If the enhancement were implemented at the very beginning, the lease value would be $23.2 million, which is slightly less than the $24.2 million valuation we obtained before. This indicates that it is not optimal to carry out the enhancement at the beginning. We have the option to carry out the enhancement at any later time, and we proceed as follows to value that option. We construct panel 4 of the spreadsheet labeled 'Example-2-3' using the original parameters, i.e. a production rate of 10,000 ounces per year and a production cost of $200 per ounce. But, at each node, in addition to the computation explained before, we also compare the value at that node with the value at the corresponding node in panel 3. If the panel 3 value less the $4 million enhancement cost exceeds the value computed for panel 4, we take that net value instead. This ensures that the benefit of the enhancement is taken into account correctly. The figures in bold in panel 4 indicate where we find it advantageous to implement the enhancement. The overall value of the lease turns out to be $24.8 million - a slight improvement (since the $4 million cost of the enhancement has already been factored in) over the original value of $24.2 million. Finally, we illustrate additional operational complexity that is often encountered in real life situations.
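Continuing the previous sketch (it reuses price, p, r and T defined there), the enhancement option can be handled by building the panel 3 lattice and then, at every node, taking the better of continuing as before or paying the $4 million and switching to the enhanced lattice. This is a sketch of the procedure described in the text, not the spreadsheet itself.

def lease_lattice(cap, unit_cost):
    # Full lattice of lease values for a given capacity and unit cost.
    values = [[0.0] * (T + 1)]                       # year-10 values are zero
    for t in range(T - 1, -1, -1):
        nxt = values[0]
        values.insert(0, [(cap * max(price[t][j] - unit_cost, 0.0)
                           + p * nxt[j + 1] + (1 - p) * nxt[j]) / (1 + r)
                          for j in range(t + 1)])
    return values                                    # values[t][j]

plain    = lease_lattice(10_000, 200.0)              # panel 2
enhanced = lease_lattice(12_500, 240.0)              # panel 3: enhancement in place

option = [[0.0] * (T + 1)]                           # panel 4: lease with the option
for t in range(T - 1, -1, -1):
    nxt = option[0]
    row = []
    for j in range(t + 1):
        keep = (10_000 * max(price[t][j] - 200.0, 0.0)
                + p * nxt[j + 1] + (1 - p) * nxt[j]) / (1 + r)
        row.append(max(keep, enhanced[t][j] - 4e6))  # enhance if worth the $4m
    option.insert(0, row)

print(f"lease (panel 2)          = ${plain[0][0] / 1e6:.2f}m")     # about $24.2m
print(f"enhanced lease (panel 3) = ${enhanced[0][0] / 1e6:.2f}m")  # about $27.2m
print(f"lease with option        = ${option[0][0] / 1e6:.2f}m")    # about $24.8m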
10.6 Example 4

In this case we analyze the mining lease when the cost of extraction depends on the amount of gold remaining. If you lease the mine, you must decide how much to mine each period, taking into account that mining in one period affects future mining costs. The practical relevance of such a scenario can be seen as follows. If the mine has been worked heavily in the past and is reaching depletion, it becomes increasingly difficult to extract rich ore. Hence the cost of extraction depends on the ore body remaining. Referring to the spreadsheet labeled 'Example4' (Table 10.2), the top panel gives the fluctuation of the gold price over the ten-year period as in the previous example. We assume that the cost of extracting z ounces during a year is given by the function

C(z, x) = 500 z^2 / x,

where x is the amount of gold remaining at the beginning of the year and z is the amount of gold extracted in ounces. Initially there are x_0 = 50,000 ounces of gold in the mine. We continue with the assumption of a 10% flat term structure of interest rates, and the profit from mining is determined by the price of gold at the beginning of the year. We assume further that all cash flows occur at the beginning of the year. As before, the preliminary analysis suggests that the value of the lease is zero at the final time, so we enter zero at all the nodes for year 10. At any node in year 9 we must decide the optimal amount to mine during the tenth year. Accordingly, we need to solve,
where g is the price of gold at that particular node. We find the maximum by differentiating this with respect to z (see the appendix for these steps) and equating to zero. This gives,
Table 10.2 Example 4 (years 0 to 10). The full lattices are in the accompanying spreadsheet; only the panel headings and key entries are reproduced here.
Panel 1: Gold Price Lattice (initial price 400.00, as in Table 10.1)
Panel 2: Factor of Proportionality K; year-0 value 326.2, year-10 values all zero
See main text for explanation of the entries.
Therefore,
This shows that the value of the lease at such a node is proportional to x_9, and the proportionality constant, K_9, is given by,
Thus,
We can now set up a lattice of K values with nodes corresponding to the various gold prices. We have already shown that K_10 = 0 at all the nodes in year 10. We can then fill in the entries for year 9 using the above relation, picking the appropriate value of g from the gold price lattice in the upper panel. The next important point to realize is that for nodes in year 8 we can write the following:
where K̄_9 = p K_9^u + (1 - p) K_9^d, K_9^u is the value at the node directly to the right (i.e. the up state) and K_9^d is the value at the node just below that (i.e. the down state). The discount factor d is, for our example, 1/1.10. This leads to,
and
where
We repeat this process for each node in year 8 and then for all other nodes back to year 0. The lattice shows that K_0 = 326.2 and this gives the lease value at the beginning,

V_0 = K_0 x_0 = 326.2 x 50,000 = $16.2 million.     (10.22)
This particular approach is based on the particular cost function assumed in the process. It may not generally be true that for all cost functions such a convenient proportionality constant can be defined. In that case some form of dynamic optimization algorithm needs to be adopted.
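For completeness, the proportionality factor recursion can be sketched in Python. It implements the appendix results K_9 = g^2/2000 and K_t = (g - d K̄_{t+1})^2/2000 + d K̄_{t+1}; minor differences from the spreadsheet figures can arise from rounding.

# Example 4: lease value proportional to the gold remaining, V_t = K_t * x_t.
u, dn, r = 1.2, 0.9, 0.10
p = (1 + r - dn) / (u - dn)          # risk-neutral up probability
disc = 1.0 / (1 + r)                 # one period discount factor
s0, T, x0 = 400.0, 10, 50_000

price = [[s0 * u**j * dn**(t - j) for j in range(t + 1)] for t in range(T + 1)]

K = [0.0] * (T + 1)                  # K is zero at every year-10 node
for t in range(T - 1, -1, -1):
    new_K = []
    for j in range(t + 1):
        kbar = p * K[j + 1] + (1 - p) * K[j]          # expected K one step ahead
        g = price[t][j]
        new_K.append((g - disc * kbar) ** 2 / 2000.0 + disc * kbar)
    K = new_K

print(f"K_0 = {K[0]:.1f}")                            # about 326 (Table 10.2: 326.2)
print(f"lease value = ${K[0] * x0 / 1e6:.1f} million")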
Appendix

Node at t=9:

V_9(x_9) = max over z_9 of [ g z_9 - 500 z_9^2 / x_9 ].

Setting the derivative with respect to z_9 to zero gives g - 1000 z_9 / x_9 = 0, so z_9 = g x_9 / 1000 and

V_9(x_9) = g (g x_9 / 1000) - 500 g^2 x_9^2 / (1000^2 x_9) = g^2 x_9 / 2000.
Thus, K_9 = g^2 / 2000.
Node at t=8:

V_8(x_8) = max over z_8 of [ g z_8 - 500 z_8^2 / x_8 + d K̄_9 (x_8 - z_8) ],
where d is the discount factor based on the interest rate (assumed constant), and K̄_9 is the expected value of the factor K at t = 9 with respect to the two succeeding nodes. Thus, K̄_9 = p K_9^u + (1 - p) K_9^d.
Note that the factor K depends only on the gold price at the respective nodes and not on the amount of gold remaining in the deposit or extracted. Setting the derivative of V_8 with respect to z_8 to zero,

dV_8 / dz_8 = g - 1000 z_8 / x_8 - d K̄_9 = 0,

so that z_8 = x_8 (g - d K̄_9) / 1000.
Substituting z_8 from the above equation into equation (A10.6) leads to,
which simplifies to

K_8 = (g - d K̄_9)^2 / 2000 + d K̄_9.
References
Amran M, Kulatilaka N (1999) Real options: managing strategic investment in an uncertain world. Harvard Business School Press, Boston
Luenberger DG (1998) Investment science. Oxford University Press, New York
Trigeorgis L (1997) Real options: managerial flexibility and strategy in resource allocation. The MIT Press, Cambridge
11 Discrete Time Model of Interest Rate
11.1 Preliminaries of Short Rate Lattice

This chapter draws on the book by Luenberger (1998), and enhances its interpretation as well as illustrates its application through several examples. A binomial lattice provides a framework for constructing interest rate models. The basic time span is selected to suit the analyst, e.g. a week or a month. We then assign a short rate to each node. The short rate is simply the one period forward rate, applicable over the next period. To obtain the full probabilistic behaviour of the process each node may be assigned probabilities. In pricing securities dependent on interest rates the "real" probabilities are not important. We therefore assign risk-neutral probabilities. In this approach risk-neutral probabilities are assigned rather than derived from a replication argument (as in equity options), and it is convenient to set them equal to 0.50. To feel comfortable with the development in this area we need a good understanding of the symbols normally used and what they really mean. In this respect Figure 11.1 introduces the reader to the environment. From a given node (t, i) two successor nodes are reachable: (t+1, i) and (t+1, i+1). Let V_{t,i} be the value of a security at (t, i) and D_{t,i} be the cash inflow at (t, i). Then the lattice rule suggests,

V_{t,i} = D_{t,i} + [0.5 V_{t+1,i} + 0.5 V_{t+1,i+1}] / (1 + r_{t,i}),

where r_{t,i} > 0 is the short rate at node (t, i).
Fig. 11.1 Short rate lattice and naming conventions
The spot rates implied by a short rate lattice can be extracted as follows. P_{00}(2) is the price of a bond that matures in period 2 and pays $1.
P_{00}(2) = [0.5 P_{10}(2) + 0.5 P_{11}(2)] / (1 + r_{00}).

Thus, the 2 period spot rate, s_2, is obtained from

P_{00}(2) = 1 / (1 + s_2)^2.
This process can be adapted to infer spot rates for other maturities. Moreover, the process can be applied with respect to any other node (t, i). Thus, the short rate lattice generates a family of spot rate curves and thereby describes the evolution of the term structure. The term structure obtained from the short rate lattice is arbitrage-free. To understand this, consider the one period risk-neutral pricing formula,

V_{t,i} = [0.5 V_{t+1,i} + 0.5 V_{t+1,i+1}] / (1 + r_{t,i}).
For the security to represent an arbitrage we must have V_{t,i} ≤ 0, V_{t+1,i} ≥ 0 and V_{t+1,i+1} ≥ 0, with at least one of these inequalities strict. This is clearly impossible, since the pricing formula makes V_{t,i} positive whenever the successor values are non-negative and one of them is positive. Hence, no arbitrage is possible over one period, and the argument can be extended to any other period. The short rate lattice therefore provides an arbitrage-free environment for modeling interest rate movements. Example: consider the following 6 period short rate lattice. To construct it we used an up factor u = 1.3 and a down factor d = 0.9. The risk-neutral probabilities are 0.5.
The four period spot rate can be obtained as follows. First, assign the final cash flow of the discount bond at period 4; then compute the discounted present value at the prior nodes using the short rate applicable to each node.
The computation involved is explained below.
Similarly, all other prices are obtained. The bond that pays $1 at t=4 is priced today at 0.7334. The four period spot rate is therefore s_4 = (1/0.7334)^(1/4) - 1 = 0.0806.
Similarly, the other spot rates can be obtained. With respect to the above short rate lattice the spot rates are 0.0700, 0.0734, 0.0769, 0.0806 and 0.0844. Now consider the price of a call option on the 4 period bond, with option maturity at t=2 and strike price 84.00. We reproduce the bond price lattice up to t=2 for a $100 face value.
The payoff from the call option at t=2, given that the strike price is 84.00, is:
The option payoffs at the three t=2 nodes are 0.00, 0.81 and 5.09; the values C_11 and C_10 at t=1 and C_00 at t=0 remain to be computed by backward recursion.
We want to find C_00. Referring to the short rate lattice,
Thus, the value of the call option today is 1.4703.
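The whole example can be reproduced with a short Python sketch. The initial short rate of 0.07 is inferred from the one period spot rate quoted above; u = 1.3, d = 0.9 and risk-neutral probabilities of 0.5 are as stated in the text.

# Six-period short rate lattice: r(t, i) = r00 * u**i * d**(t - i).
r00, u, d = 0.07, 1.3, 0.9
short = [[r00 * u**i * d**(t - i) for i in range(t + 1)] for t in range(6)]

def zero_bond(maturity, face=1.0):
    # Backward recursion for a bond paying `face` at `maturity`.
    values = [[face] * (maturity + 1)]
    for t in range(maturity - 1, -1, -1):
        nxt = values[0]
        values.insert(0, [0.5 * (nxt[i] + nxt[i + 1]) / (1 + short[t][i])
                          for i in range(t + 1)])
    return values                        # values[t][i]

p4 = zero_bond(4)
print(f"P_00(4) = {p4[0][0]:.4f}")                          # 0.7334
print(f"4-period spot rate = {p4[0][0] ** -0.25 - 1:.4f}")  # 0.0806

# Call option on the 4-period bond, option maturity t=2, strike 84 (face 100).
payoff = [max(100.0 * v - 84.0, 0.0) for v in p4[2]]        # 5.09, 0.81, 0.00
for t in (1, 0):
    payoff = [0.5 * (payoff[i] + payoff[i + 1]) / (1 + short[t][i])
              for i in range(t + 1)]
print(f"call value today = {payoff[0]:.4f}")                # 1.4703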
The discussion above relies upon the short rate lattice. The most critical task is to obtain this lattice such that it matches the observed term structure. This may be achieved by following the methodologies suggested by Ho and Lee (1986) or Black, Derman and Toy (1990). In the next section we illustrate how this may be achieved. Before embarking on the actual process, however, we need to understand how to speed up the computation of spot rates from a given short rate lattice without traversing the tree several times.
11.2 Forward Recursion for Lattice and Elementary Price

We have seen how to use the lattice for valuing interest rate sensitive securities with a backward recursion approach. To construct the term structure of interest rates from the short rate lattice, many passes have to be made through the lattice. There is another approach - forward recursion - which is more efficient in terms of computation time. A single pass through the lattice determines the whole term structure. It depends, however, on the concept of elementary prices. With reference to the short rate lattice, the elementary price P_0(k, s) is the price at time 0 of a security that pays $1 at the node (k, s) and zero everywhere else. Here k refers to the time axis and s refers to the state at that time. Fig. 11.2 shows the naming conventions. P_0(1, 1) refers to the price of a security at time 0 which pays $1 at that node and zero everywhere else. These are called elementary prices since they refer to a security with a $1 payoff at only one node. Given a short rate lattice we can construct the elementary prices for each node in one forward pass. Once the elementary prices are known, constructing the term structure is straightforward. Assume that all elementary prices have been computed up to and including time k, starting at 0. There are (k+1) states (s) at that time period and these are labelled 0 through k. Consider now the time period (k+1) and any state in the range 1 through k at time (k+1), i.e. excluding the two end states. For a node (k+1, s) there are two predecessor nodes at time k: (k, s) and (k, s-1).
Fig. 11.2 Elementary prices
Fig. 11.3 Forward recursion
In Fig. 11.3, d_{k,s} and d_{k,s-1} are the discount factors determined by the corresponding short rates; if r denotes the short rate then d_{k,s} = 1/(1 + r_{k,s}), and similarly for the other one. Recall that we have assumed P_0(k, s) and P_0(k, s-1) have already been determined. Corresponding to an elementary security that pays $1 at node (k+1, s), the time zero values of the contributions through the two predecessor nodes are 0.5 d_{k,s-1} P_0(k, s-1) and 0.5 d_{k,s} P_0(k, s) respectively. Therefore, the time zero value of the elementary security corresponding to the node (k+1, s) is the sum of these two components. For the two end nodes at time (k+1), since
there is only one predecessor node, the value consists of only one component. This is summarized below:

P_0(k+1, s) = 0.5 d_{k,s-1} P_0(k, s-1) + 0.5 d_{k,s} P_0(k, s),   for s = 1, ..., k,
P_0(k+1, 0) = 0.5 d_{k,0} P_0(k, 0),
P_0(k+1, k+1) = 0.5 d_{k,k} P_0(k, k).
Once all the elementary prices are computed, the price of any security can easily be found. For example, the time zero price of a discount bond that pays $1 at time n is the sum over states, Σ_{s=0}^{n} P_0(n, s).
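A sketch of the forward recursion, reusing the short rate lattice from the previous sketch, is given below; a single forward pass yields all the elementary prices and hence the entire spot rate curve quoted earlier.

# Elementary (state) prices P0(k, s) built forward from P0(0, 0) = 1.
elem = [[1.0]]
for k in range(5):
    d_k = [1.0 / (1 + short[k][s]) for s in range(k + 1)]
    nxt = [0.0] * (k + 2)
    for s in range(k + 1):
        nxt[s]     += 0.5 * d_k[s] * elem[k][s]    # branch to (k+1, s)
        nxt[s + 1] += 0.5 * d_k[s] * elem[k][s]    # branch to (k+1, s+1)
    elem.append(nxt)

# Discount bond prices are sums of elementary prices; spot rates follow.
spot = [sum(elem[n]) ** (-1.0 / n) - 1 for n in range(1, 6)]
print([f"{s:.4f}" for s in spot])    # 0.0700, 0.0734, 0.0769, 0.0806, 0.0844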
Fig. 11.4 presents another intuitive explanation of the usefulness of the elementary prices. Other aspects of the forward recursion algorithm are also highlighted. The elementary prices at t=2 are labelled, and the discount factors applicable at two different nodes and branches are also identified.
Fig. 11.4 Elementary prices and discount bond
From the elementary prices identified in the diagram, we can write the price of the two-period zero coupon bond as,
11.3 Matching the Current Term Structure

The previous discussion relied upon having the short rate lattice. We will now explore how to build the short rate lattice from the currently observed term structure of interest rates. The procedure should also account for the fluctuation, or volatility, of interest rates. There are a number of methods available. We outline the method based on the Ho and Lee (1986) modelling framework. The short rate is given by r_{k,s} = a_k + b_k s, where a_k and b_k are parameters to be determined. It is clear that in the Ho-Lee framework these parameters are time varying. At any given time period the difference in the short rate between states is controlled by b_k. In fact it can be shown that this parameter represents the volatility of rate movements, while a_k represents the aggregate drift from period 0 through k. In the basic application we will assume that this volatility is constant. The problem, therefore, is to choose the parameters a_k such that the short rate lattice reproduces the spot rate structure observed at time zero covering periods 0, 1, 2, ..., n. The process is described in the context of an eight period term structure in the spreadsheet snapshot presented below. Notes: the relation between the Ho-Lee model parameters and the spot rates is an indirect one, so the solution has to be based on numerical methods optimising some suitable criterion. It should become apparent that the method can be extended to the situation where the volatility parameter is not constant and there is in fact a term structure of volatility. The figures seen in the printout of the spreadsheet are essentially those at the point of convergence of the optimisation algorithm. In Excel, we have used the Solver function to minimise the sum of
squared differences between the model-implied spot rates and the given spot rates. Of course, any other meaningful optimisation criterion could be used depending on the application. To get a proper feel for the model implementation, the reader should experiment with it in Excel or any other suitable software environment.
Spreadsheet snapshot (Ho-Lee calibration; not reproduced in full). For each period the sheet lists the observed spot rate, the parameters a_k and b_k, the resulting short rate lattice by state, the elementary prices, the discount bond prices p(0), the model spot rates and the squared errors. At convergence the discount bond prices are 1.0000, 0.9289, 0.8530, 0.7767, 0.6993, 0.6305, 0.5564, 0.4990 and 0.4376, the corresponding model spot rates are 0.0766, 0.0827, 0.0879, 0.0935, 0.0966, 0.1026, 0.1044 and 0.1088, and the sum of squared errors is 2.83E-06.
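A programmatic analogue of the Solver step might look as follows. The observed spot rates used here are illustrative values close to those in the snapshot, and b is held fixed, so the calibrated a_k are indicative only.

import numpy as np
from scipy.optimize import least_squares

# Ho-Lee calibration sketch: choose the drifts a_k so that the lattice
# r_{k,s} = a_k + b*s reproduces a given set of spot rates.
obs_spot = np.array([0.0766, 0.0827, 0.0879, 0.0935, 0.0966, 0.1026, 0.1044, 0.1088])
n, b = len(obs_spot), 0.0001

def model_spot(a):
    elem = [np.array([1.0])]                        # elementary prices P0(k, s)
    for k in range(n):
        disc = 1.0 / (1.0 + a[k] + b * np.arange(k + 1))
        nxt = np.zeros(k + 2)
        nxt[:-1] += 0.5 * disc * elem[k]            # down branches
        nxt[1:]  += 0.5 * disc * elem[k]            # up branches
        elem.append(nxt)
    prices = np.array([elem[k].sum() for k in range(1, n + 1)])
    return prices ** (-1.0 / np.arange(1, n + 1)) - 1.0

fit = least_squares(lambda a: model_spot(a) - obs_spot, x0=np.full(n, 0.08))
print(np.round(fit.x, 4))                           # calibrated a_0, ..., a_7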
11.4 Immunization: Application of Short Rate Lattice

One of the key risk management tasks in a bond portfolio is to immunize it from anticipated changes in interest rates. Some of the methods normally used assume that the term structure changes in a parallel fashion. In other words, interest rate uncertainty is not properly addressed in such an approach. The short rate lattice can be used to immunise a bond portfolio while accounting for such uncertainty. The aim in this section is to explain how such a strategy might be implemented.
Consider a case where a series of cash obligations have to be met at specific times in the future. Compute the initial value of the obligation stream using the lattice, i.e. the present value of the obligation. We then need a bond portfolio with the same present value. After the first period the obligation stream can take one of two possible values corresponding to the two successor nodes. The bond portfolio will also have two possible values at the two successor nodes. If the bond portfolio values match the obligation values, the portfolio is immunised for one period. Therefore, for one period immunisation we need to match the present values at three places - the initial node and the two successor nodes. Due to the arbitrage-free property of the lattice, this matching can be achieved in a straightforward way using different bonds. For example, using two different bonds we can construct a portfolio having the same values as the obligation stream at each of the two successor nodes. The no-arbitrage property then ensures that the initial value of the portfolio also matches the obligation stream. After one period, the portfolio can be rebalanced to obtain immunisation for the following period. By continuing this rebalancing each period, complete immunisation for all periods can be achieved. Consider the immunisation problem below. We have a $1 million liability in 5 years time and we need to invest in two bonds so that the portfolio is immunised for one period. We will use the short rate lattice discussed before as our starting point and the bonds are:
Bond 1: 5 year, 10% coupon
Bond 2: 6 year, 6% coupon
We assume that the bonds pay annual coupons and that the short rate lattice uses one year as the period.
The attached spreadsheet describes the prices of these bonds with respect to the short rate lattice and the present value of the liability is also computed similarly. To construct the immunisation, we assume that x is the number of bond 1 and y is the number of bond 2 required. The following two equations can be solved to find x and y.
The first equation states that the present value of the liability at t=0 is matched by the value of the bond portfolio. The second equation states that the same condition holds at t=1 in state 1. There is no need to impose the condition for t=1 and state 0 explicitly, since the arbitrage-free nature of the lattice ensures that it holds. The solution is easily found to be x = 522.15 and y = 7013.72. The spreadsheet lays out the instruments as follows: Bond 1 (5 year, 10% coupon) pays 10.00 in each of years 1 to 4 and 110.00 in year 5; Bond 2 is the 6 year, 6% coupon bond; the obligation is $1 million in year 5.
Just to sum up, the present values of the bonds and of the liability are computed using the short rate tree developed before. The driving equation is (11.2). The solutions for x and y shown above are computed using any of
the pairs of nodes as outlined above. The values displayed above are shown to only two decimal places, but Excel internally carries much higher floating point precision. If you use an ordinary calculator with the two decimal place values displayed above you will not reproduce the x and y values exactly. This is also because the present values of the two bonds and that of the liability differ by orders of magnitude.
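Once the present values are available, the final step is just a two-equation linear system. In the sketch below the present values are hypothetical placeholders; in practice each of them comes from the short rate lattice calculations described above, and with the actual spreadsheet values the solution is the x and y quoted earlier.

import numpy as np

# Match the liability at node (0,0) and at node (1,1); the arbitrage-free
# lattice then guarantees the match at node (1,0).  Placeholder values only.
B1_00, B2_00, L_00 = 106.7, 87.8, 666_500.0     # present values at (0, 0)
B1_11, B2_11, L_11 = 103.5, 84.1, 640_000.0     # present values at (1, 1)

A = np.array([[B1_00, B2_00],
              [B1_11, B2_11]])
rhs = np.array([L_00, L_11])
x, y = np.linalg.solve(A, rhs)
print(f"x = {x:.2f} units of bond 1, y = {y:.2f} units of bond 2")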
11.5 Valuing Callable Bond

We will now illustrate how to use a short rate lattice to value a callable bond, and in turn find the value of the call feature, given a short rate tree. Consider first a non-callable bond with a coupon rate of 6% per period and a face value of $100. We are interested in finding its price today. This bond provides cash flows of $6 at t=1 and t=2 and $106 at t=3. As shown in the spreadsheet below, the straight bond value is $102.90.
Spreadsheet (short rate lattice, straight bond price lattice and callable bond price lattice; not reproduced in full). The straight bond is worth 102.90 at t=0, intermediate node values include 105.77, 109.30, 104.51, 106.70 and 108.27, and the cash flow at each t=3 node is 106.
Assume the bond has one period of call protection, i.e. the issuer may not exercise the call provision prior to one period from the issue date. The issuer may call the bond back at the end of period 1 or 2 at face value plus one period's coupon (i.e. $106). The issuer would want to call this bond whenever doing so reduces the value of the bondholders' claim on the firm; in other words, whenever this action increases the value of the shareholders' claim on the firm. From the previous tree we can identify the nodes at which the bond would be called as those at which the bond's value exceeds the current call price, that is, whenever the bond's value exceeds $106. Since the bondholders are aware of the issuer's optimal call strategy they would never pay more than $106 at the nodes where a call is imminent. To value the callable bond we simply replace the bond value by $106 at those identified nodes (shown in bold italics in the spreadsheet). After replacing the bond values we recompute the bond's price today using the same interest rate tree. This turns out to be $101.17. Therefore, comparing this value with the non-callable bond value, the value of the issuer's call provision is ($102.90 - $101.17) = $1.73. Once we have the short rate tree we can use it to value any type of complex bond, as long as all of the uncertainty is confined to future interest rate movements captured by the tree.
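The call rule is easy to automate: value the bond by backward recursion and cap the value at the call price wherever calling is optimal. The short rates in the sketch below are hypothetical (the text's tree is not reproduced here), so the resulting prices will not equal $102.90 and $101.17, but the logic is the same.

# Three-period coupon bond, callable at 106 at the end of periods 1 and 2.
rates = [[0.045], [0.040, 0.055], [0.035, 0.050, 0.065]]   # hypothetical r(t, i)
CALL_PRICE, COUPON = 106.0, 6.0

def bond_value(callable_bond):
    values = [106.0] * 4                       # cash flow at each t=3 node
    for t in (2, 1, 0):
        values = [0.5 * (values[i] + values[i + 1]) / (1 + rates[t][i])
                  + (COUPON if t > 0 else 0.0)
                  for i in range(t + 1)]
        if callable_bond and t > 0:            # issuer calls where value exceeds 106
            values = [min(v, CALL_PRICE) for v in values]
    return values[0]

straight, callable_ = bond_value(False), bond_value(True)
print(f"straight = {straight:.2f}, callable = {callable_:.2f}, "
      f"call provision = {straight - callable_:.2f}")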
11.6 Exercises

Exercise 11.1: Valuing a digital option. Given the following short rate tree, your task is to find the value of the following bet: you win $10 if the one period interest rate is greater than 10% four periods from now.
References
Black F, Derman E, Toy W (1990) A one-factor model of interest rates and its application to Treasury bond options. Financial Analysts Journal, January-February, 46: 33-39
Ho TSY, Lee SB (1986) Term structure movements and pricing interest rate contingent claims. Journal of Finance, 41: 1011-1029
Luenberger DG (1998) Investment science. Oxford University Press, New York
12 Global Bubbles in Stock Markets and Linkages
12.1 Introduction

During the past 30 years, theoretical models and empirical testing in asset pricing have been motivated by market efficiency. This theory claims that the price of an asset today correctly reflects the future random payoffs of this asset, conditioned on today's information and appropriately discounted by a stochastic factor. The opportunity to make a riskless profit using arbitrage strategies ensures that markets are efficient. Campbell (2000) articulates this paradigm in detail. Many economists accept market efficiency as the well-established paradigm of financial economics but also acknowledge that asset prices are too volatile. For example, the NASDAQ declined by 67.5% from its peak of 5048.62 on March 10, 2000 to its low of 1638.80 on April 4, 2001. Is this significant decline due to substantial revisions of the expected payoffs and/or changes in the discount factor? This chapter considers the speculative bubble approach as an alternative to the present value model of market efficiency. Section 12.2 reviews the general ideas of rational bubbles and section 12.3 identifies several key statistical tests proposed by economists. In section 12.4 we offer a modification of the existing tests and apply it to the stock markets of the US, Japan, England and Germany. We find evidence of rational bubbles and then proceed to examine whether these bubbles travel across mature economies. Section 12.5 presents the relevant literature that supports the hypothesis of global integration. In sections 12.6, 12.7 and 12.8 we elaborate in detail our methodological procedures to test for bubbles and for linkages of such bubbles between mature stock markets. Our main findings and conclusions are given in the last two sections.
12.2 Speculative Bubbles

Economists have long conjectured that movements in stock prices may involve speculative bubbles, as trading is often said to generate over-priced markets. Some economists, however, believe that stock price fluctuations reflect changes in the values of the underlying market fundamentals. The standard definition of fundamental value is the summed discounted value of all future cash flows. The difference, if any, between the market value of the security and its fundamental value is termed a speculative bubble. Yet confusion persists about what factors generate bubbles. Fads and irrationality have always figured prominently, and the hypothesis that these factors are important has gained some empirical support from the literature on asset price volatility. Another bubble-producing factor is the structure of information in the market. In a partial-equilibrium setting, Allen and Gorton (1988) showed that rational bubbles could exist with a finite number of agents who have asymmetric information. The existence of bubbles is inherently an empirical issue that has not yet been settled. A number of studies, such as Blanchard and Watson (1982) and West (1988), have argued that dividend and stock price data are not consistent with the "market fundamentals" hypothesis, in which prices are given by the present discounted values of expected dividends. These results have often been construed as evidence for the existence of bubbles or fads. According to Shiller (1981) and LeRoy and Porter (1981), the variability of stock price movements is too large to be explained by the present value of future earnings. Over the past century US stock prices have been five to thirteen times more volatile than can be justified by new information about future dividends. Campbell and Shiller (1988a, b) and West (1987, 1988) remove the assumption of a constant discount rate, but a variable discount rate provides only marginal help in explaining stock price volatility. They reject the null hypothesis of no bubbles. See also Rappoport and White (1993, 1994). A major problem with such arguments is that evidence for bubbles can be reinterpreted in terms of market fundamentals that are unobserved by the econometricians (Flood and Garber 1980; Hamilton and Whiteman 1985; Hamilton 1986). Diba and Grossman (1984, 1988a, 1988b) have recommended the alternative strategy of testing for rational bubbles by investigating the stationarity properties of asset prices and observable fundamentals. In essence, the argument for equities is that if stock prices are not more explosive than dividends, then it can be concluded that rational bubbles are
not present, since they would generate an explosive component in stock prices. Using unit-root tests, autocorrelation patterns, and cointegration tests to implement this procedure, several authors reach the conclusion that stock prices do not contain explosive rational bubbles (see Dezhbakhsh and Demirguc-Kunt (1990)). Evans (1991) criticizes tests for bubbles based on an investigation of the stationarity properties of stock prices and dividends. Using Monte-Carlo simulations he demonstrates that an important class of rational bubbles cannot be detected by these tests even though the bubbles are explosive. Froot and Obstfeld (1991) introduce the concept of an intrinsic bubble, which they define as exclusively dependent on market fundamentals and not on extraneous events. Assuming that a stock price should go to zero as dividends go to zero, they derive a bubble solution composed of stable and unstable components. This bubble is unstable and implies an explosive price-dividend ratio. They find significant evidence of such a bubble and demonstrate that incorporating an intrinsic bubble into the simple present-value model helps account for the long-run variability of US stock data. Furthermore, as their bubble is a deterministic function of dividends, once the bubble gets started it will never burst as long as dividends remain positive. In practice, tests for intrinsic bubbles are easily implemented only when dividends are assumed to follow a very simple process, for example a geometric random walk. When a more general dynamic specification, such as an ARIMA (p, 1, q) process, is introduced for dividends, the test procedure for intrinsic bubbles becomes virtually intractable. Using a stochastic dividend-growth model, Ikeda and Shibata (1992) specify the stock price as a function of dividends (i.e. market fundamentals) as well as of time. The resulting bubble solution bridges the gap between time-driven bubbles (Flood and Garber 1980) and bubbles exclusively dependent on fundamentals (Froot and Obstfeld 1991). Depending on a parameter that determines the relative degrees of fundamental dependency and time dependency, the bubble solution exhibits various dynamic properties which cannot be derived by linearly combining the two special solutions. Wu (1997) examines a stochastic bubble that is able to burst and restart continuously. The specification is parsimonious and allows easy estimation. The model fits the data reasonably well, especially during several bull and bear markets in this century. Such rational stochastic bubbles can explain much of the deviation of US stock prices from the simple present-value model. Miller and Weller (1990) and Buiter and Pesenti (1990) examine the effects of fundamental-dependent bubbles on exchange rate dynamics, using
log linear models in which consideration of the free-disposal or price-sensitivity constraint is not required. A rational speculative bubble is non-negative by definition: it represents what an investor might be willing to pay to buy a stock forever stripped of its dividends. As soon as either dividends or discount rates depend on the presence or absence of a bubble, however, the fundamental is affected by the presence of a bubble. For instance, the existence of a bubble may lead to an increase in interest rates, which so depresses the fundamental that the sum of the positive bubble and the bubbly fundamental falls short of the non-bubbly fundamental. Hence, a positive rational bubble may in fact decrease the overall price of a stock, contrary to what is commonly believed. Weil (1990), for example, provides empirical tests of this hypothesis. Most of the references above address issues of rational bubbles either theoretically or in the context of a mature economy. To complete this rapid bibliographical review we need to mention two additional trends in this literature. First, several studies have tested for the existence of bubbles in emerging markets. For example, Richards (1996) claims that emerging markets have not consistently been subject to fads or bubbles. Chan et al. (1998) test for rational bubbles in Asian stock markets, and Chen (1999) specializes his search for bubbles to the Hong Kong market. Sarno and Taylor (1999) find evidence of bubbles in all East Asian economies. Significant increases in cross-market linkages after a shock have now become a topic of important research under the term "contagion". Several important papers collected in Claessens and Forbes (2001) discuss both methodological issues and case studies of contagion. Beyond the existence or not of bubbles, economists have also studied in detail the implications of a stock market bubble for the economy at large. Binswanger (1999) offers a comprehensive review of these issues, and Chirinko and Schaller (1996) argue that bubbles existed in the US stock market but real investment decisions were based on fundamentals.
12.3 Review of Key Empirical Papers

To motivate our methodological contribution we review several influential papers.
12.3.1 Flood and Garber (1980)

We begin with the classic paper by Flood and Garber (1980). The authors test the hypothesis that price-level bubbles did not exist in a particular historical period.
The existence of a price-level bubble places such extraordinary restrictions on the data that such bubbles are not an interesting research problem during normal times. Since hyperinflations generated series of data extraordinary enough to admit the existence of a price-level bubble, the German episode is an appropriate and interesting period in which to search for bubbles. The authors build a theoretical model of hyperinflation in which they allow price-level bubbles. Then they translate the theoretical model into data restrictions and use these restrictions to test the hypothesis that price-level bubbles were not partly responsible for Germany's massive inflation during the early 1920s.
The quantities m and p are the natural logarithms of money and price at time t. The anticipated rate of inflation between t and t+l is n and E is a stochastic disturbance term. The rational-expectations assumption requires:
where n, = p,+, - p, is the mathematical expectations operator, and I is the information set available for use at time t. The solution of the equation (12.1) is:
where y = ( a - 1 ) / a >1, p,,, = m,+,+I-m,+,, w,+~ = E,+~+,-E,+, and A is an arbitrary constant. For this model, market fundamental is defined as
price level bubbles are then captured by the term -aAoyt .
Rational-expectations models normally contain the assumption A = 0, which rules out bubbles. Notice that if A ≠ 0, then the price will change with t even if market fundamentals are constant. The definition of a price-level bubble as a situation in which A ≠ 0 is appropriate for two reasons. First, A is an arbitrary and self-fulfilling element in expectations. Second, if A ≠ 0, then agents expect prices to change through time at an ever-accelerating rate, even if market fundamentals do not change. Since economists usually consider price bubbles to be episodes of explosive price movement unexplained by the normal determinants of market price, A ≠ 0 will produce a price-level bubble. Finally, the results of the empirical analysis support the hypothesis of no price-level bubbles.

12.3.2 West (1987)
The test compares two sets of estimates of the parameters needed to calculate the expected present discounted value (PDV) of a given stock's dividend stream, with expectations conditional on current and all past dividends. In a constant discount rate model the two sets are obtained as follows. One set may be obtained simply by regressing the stock price on a suitable set of lagged dividends. The other set may be obtained indirectly from a pair of equations. One of the pair is an arbitrage equation yielding the discount rate, and the other is the ARIMA equation of the dividend process. The Hansen and Sargent (1980) formulas, familiar from rational expectations tests of cross-equation restrictions, may be applied to this pair of equations' coefficients to obtain a second set of estimates of the expected PDV parameters. Under the null hypothesis that the stock price is set in accord with a standard efficient markets model, the regression coefficients in all equations may be estimated consistently. When the two sets of estimates of the expected PDV parameters are compared, then, they should be the same, apart from sampling error. But this equality will not hold under the alternative hypothesis that the stock price equals the sum of two components: the price implied by the efficient markets model and a speculative bubble. A stock price is determined by the arbitrage condition:
where p_t is the real stock price in period t, b is the constant ex ante real discount rate, 0 < b < 1, and d_t is the real dividend in period t. If the transversality condition lim_{k→∞} b^k E_t[p_{t+k}] = 0 holds, the
unique forward solution to (12.4) is
But if this condition fails, there is a family of solutions to (12.4). Any p that satisfies
is also a solution. Note that c_t is by definition a speculative bubble. The aim of the paper is to test

p_t = p_t^*   versus   p_t = p_t^* + c_t.     (12.7)
Checking for the equality of the two sets in long-term annual data on the Standard & Poor's 500 Index (1871-1980) and the Dow Jones Index (1928-1978) the author finds that the data reject the null hypothesis of no bubbles and the coefficients in the regression of price on dividends are biased upwards.
12.3.3 Ikeda and Shibata (1992)

Using a stochastic dividend-growth model, the paper provides a general analysis of fundamental-dependent bubbles in stock prices. Given that dividends follow a continuous Markov process, the stock price is specified as a function of dividends as well as of time. The authors derive a partial differential equation for this price function from an arbitrage equation. Provided that a free-disposal condition is satisfied, the fundamental price process is defined as the forward-looking particular solution of this equation, and a price bubble as the general solution of the corresponding homogeneous equation.
equation and a price bubble as the general solution of the corresponding homogeneous equation. Lets consider a stock share, which yields dividends D(t) at time t. These dividends follow a geometric Brownian motion with positive drift:
The constants g and σ are, respectively, the expected value and the standard deviation of the instantaneous rate of dividend growth; dz is an independent increment of a standard Wiener process z, with the initial condition z(0) = 0. Since ln D follows a normal distribution, the time series of dividend payments has a positive trend. The stochastic dividend-payment process (12.8) is the only source of randomness. Supposing risk neutrality of investors and free disposability of the stock, we assume that the cum-dividend stock price is determined by the following two conditions:
P(t) ≥ 0, with r > 0, ∀t ∈ [0, ∞) w.p. 1,     (12.10)
where E[· | Ω_t] represents the mathematical expectation conditional on Ω_t and the parameter r denotes the riskless interest rate, which is assumed to be constant. The rational expectations stochastic process of the stock price is then obtained by solving the nonhomogeneous partial differential equation (12.9), subject to the dividend payment process (12.8) and the price positivity condition (12.10). The authors find that fundamentals dependency stabilizes bubble dynamics and that stock prices with fundamentals-dependent bubbles can be less volatile than fundamentals. Furthermore, fundamentals-dependent bubbles exhibit various transition patterns, such as nonmonotonic movements and monotonic shrinkage in magnitude and volatility.
12.3.4 Wu (1997)

This paper estimates a rational stochastic bubble using the Kalman filtering technique. The bubble grows at the discount rate in expectation and it can collapse and restart continuously, allowing for the possibility of a negative bubble. The log dividends follow a general ARIMA (p, 1, q) process. The model for stock prices with the bubble component, the dividend process and the bubble process are expressed in state-space form, with the bubble being treated as an unobserved state vector. The model parameters are estimated by the method of maximum likelihood, and optimal estimates of the stochastic bubble are obtained through the Kalman filter. Let us consider the standard linear rational expectations model of stock price determination:
where P_t is the real stock price at time t, D_t is the real dividend at time t, E_t[·] is the mathematical expectation conditional on information available at time t and r is the required real rate of return (r > 0). The log-linear approximation of (12.11) can be written as follows:
where q is the required log gross return rate, ψ is the average ratio of the stock price to the sum of the stock price and the dividend, k = -ln(ψ) - (1 - ψ) ln(1/ψ - 1), p_t = ln(P_t), and d_t = ln(D_t). The general solution to (12.12) is given by:
where b_t satisfies the following homogeneous difference equation:
In equation (12.12), the no-bubble solution p_t^f is exclusively determined by dividends, while b_t can be driven by events extraneous to the market and is referred to as a rational speculative bubble. After defining the stock price equation, the parametric bubble process and the dividend process in state-space form, the bubble is treated as an unobserved state vector which can be estimated by the Kalman filtering technique. The author finds a statistically significant estimate of the innovation variance for the bubble process. During the 1960s bull market the bubble accounts for between 40% and 50% of actual stock prices. Negative bubbles are found during the 1919-1921 bear market, in which the bubble explains between 20% and 30% of the decline in stock prices.
The same model has also been used to estimate the unobserved bubble component of the exchange rate and to test whether it is significantly different from zero. Using the monetary model of exchange rate determination, the solution for the exchange rate is the sum of two components. The first component, called the fundamental solution, is a function of the observed market fundamental variables. The second component is an unobserved process which satisfies the monetary model and is called the stochastic bubble. The monetary model, the market fundamental process and the bubble process are expressed in state-space form, with the bubble being treated as a state variable. The Kalman filter can then be used to estimate the state variable. The author finds no significant estimate of a bubble component at any point in the period 1974-1988. Similar results were obtained for the sub-sample from 1981 through 1985, in which the US dollar appreciated most drastically and a bubble might more likely have occurred.
12.4 New Contribution

The purpose of our study is to search empirically for bubbles in national stock markets using state-of-the-art methodology such as Wu (1995, 1997), with emphasis on the U.S., Japan, Germany and the United Kingdom. We focus on the post-war period in these four countries, as opposed to Wu (1997), which concentrates only on the U.S. annual data series dating back to 1871. All data are monthly returns of the S&P 500, Nikkei 225, Dax-30 and FT-100 indexes ranging from January 1951 to December 1998, that is,
576 observations. All data are converted to real values using the corresponding CPI measures; Global Financial Data provided the data. In order to establish the soundness of our methodology we attempted to reproduce the results of Wu (1997) using annual U.S. data (also obtained from Global Financial Data) covering the period 1871-1998. Although we employ an unobserved component modeling approach similar to Wu (1997), our implementation of the state-space form (or Dynamic Linear Model, DLM; see chapter 8) is quite different from that of Wu. We treat both the dividend process and the bubble process as part of the unobserved components, i.e. the state vector. The state equations also include their own system errors, which are assumed uncorrelated. The measurement vector in this case contains the price and the realized dividend without any measurement errors. The advantage of this way of modeling is that the comparison with the no-bubble solution becomes much more straightforward. Wu (1997) had to resort to an alternative method (GMM) for estimating the no-bubble solution, and model adequacy tests are not performed there. Besides, the precise moment conditions used in the GMM estimation are not reported there. In our approach, on the other hand, we are able to subject both the bubble and the no-bubble solutions to a battery of diagnostic tests applicable to state-space systems. In the following subsections we describe in detail the mathematical structures of our models and the estimation strategies. Once bubbles are confirmed empirically, we proceed to test for linkages between the four markets in terms of both the fundamental price and the bubble price series. In this context we adopt a sub-set VAR methodology (Lutkepohl 1993, p.179). The approach builds in the causal relations between the series and this gives us the opportunity to analyze the potential global contagion among these national equity markets through the speculative component of prices. The potential existence of global linkages among equity markets would further decrease the expected benefits of global diversification. The literature associated with global integration and diversification is briefly reviewed next.
12.5 Global Stock Market Integration

During the past thirty years, world stock markets have become more integrated, primarily because of financial deregulation and advances in computer technology. Financial researchers have examined various aspects of the evolution of this particular aspect of world integration. For example, the early studies by Grubel (1968), Levy and Sarnat (1970), Grubel and
Fadner (1971), Agmon (1972, 1973), Ripley (1973), Solnik (1974), and Lessard (1973, 1974, 1976) investigated the benefits of international portfolio diversification. While some studies, such as Solnik (1976), were exclusively theoretical in extending the capital asset pricing model to a world economy, others, such as Levy and Sarnat (1970), used both theory and empirical testing to confirm the existence of financial benefits from international diversification. Similar benefits were also confirmed by Grubel (1968), Grubel and Fadner (1971), Ripley (1973), Lessard (1973, 1974, 1976), Agmon (1972, 1973), Makridakis and Wheelwright (1974), and others, who studied the relations among equity markets in various countries. Specifically, Agmon (1972, 1973) investigated the relationships among the equity markets of the U.S., United Kingdom, Germany and Japan, while Lessard (1973) considered a group of Latin American countries. By 1976, eight years after the pioneering work of Grubel (1968), enough knowledge had been accumulated on this subject to induce Panton, Lessing and Joy (1976) to offer a taxonomy. It seems reasonable to argue that although these studies used different methodologies and diverse data from a variety of countries, their main conclusions confirmed that correlations among national stock market returns were low and that national speculative markets were largely responding to domestic economic fundamentals. Theoretical developments in continuous time stochastic processes and arbitrage theory were quickly incorporated into international finance. Stulz (1981) developed a continuous time model of international asset pricing, while Solnik (1983) extended arbitrage theory to an international setting. Adler and Dumas (1983) integrated international portfolio choice and corporate finance. Empirical research also continued to flow, for example Hilliard (1979), Maldonado and Saunders (1981), Christofi and Philippatos (1987), Philippatos, Christofi and Christofi (1983), and also Grauer and Hakansson (1987), Schollhammer and Sand (1987), Wheatley (1988), Eun and Shim (1989), von Furstenberg and Jeon (1989), Becker, Finnerty and Gupta (1990), Fisher and Palasvirta (1990), French and Poterba (1991) and Harvey (1991). These numerous studies employ various recent methodologies and larger databases than the earlier studies to test for interdependencies between the time series of national stock market returns. The underlying issue remains the empirical assessment of how much integration exists among national stock markets. In contrast to earlier results, and despite some reservations, several of these new studies find a high and statistically significant
level of interdependence between national markets, supporting the hypothesis that global stock markets are becoming more integrated. In comparing the results of the earlier studies with those of the more recent ones, one could deduce that greater global integration implies fewer benefits from international portfolio diversification. If this is true, how can one explain the ever-increasing flow of large sums of money invested in international markets? To put it differently, while Tesar and Werner (1992) confirm the home bias in the globalization of stock markets, why are increasing amounts of funds invested in non-home equity markets? For instance, currently about 10% of all trading in U.S. equities takes place outside of the United States. The June 14, 1993 issue of Barron's reported that U.S. investors had tripled their ownership of foreign equities over the previous five years, from $63 billion to over $200 billion in 1993. The analysis of the October 19, 1987 stock market crash may offer some insight in answering this question. Roll (1988, 1989), King and Wadhwani (1990), Hamao, Masulis and Ng (1990) and Malliaris and Urrutia (1992) confirm that almost all stock markets fell together during the October 1987 crash despite the differences between the national economies, while no significant interrelationships seem to exist for the periods prior to and after the crash. Malliaris and Urrutia (1997) also confirm the simultaneous fall of national stock market returns following the Iraqi invasion of Kuwait in July 1990. This evidence supports the hypothesis that certain global events, such as the crash of October 1987 or the invasion of Kuwait in July 1990, tend to move world equity markets in the same direction, thus reducing the effectiveness of international diversification. On the other hand, in the absence of global events, national markets are dominated by domestic fundamentals, and international investing increases the benefits of diversification. Exceptions exist, as in the case of regional markets, such as the European stock markets examined in Malliaris and Urrutia (1996). Longin and Solnik (2001) distinguish between bear and bull markets in international equity markets and find that correlation increases in bear markets, but not in bull markets.
12.6 Dynamic Linear Models for Bubble Solutions

Our starting point in this approach is equations (12.13) and (12.14) described earlier. As our preliminary investigations reveal that both the log real price and the log real dividend series are non-stationary, we choose to work with the first differenced series. Thus, equation (12.13) becomes,
\Delta p_t = \Delta p_t^f + \Delta b_t, \qquad (12.15)

where \Delta p_t^f = (1-\psi)\sum_{i=0}^{\infty}\psi^i E_t[d_{t+i}] - (1-\psi)\sum_{i=0}^{\infty}\psi^i E_{t-1}[d_{t-1+i}]. Assuming the following parametric representation of equation (12.14),
In order to express the fundamental component of the price, \Delta p_t^f, in terms of the dividend process, we fit an AR model of sufficient order so that the AIC information criterion is minimized. We find that for the Japanese data an AR(1) model is sufficient, whereas for the other three countries we need AR(3) models. The infinite sums in the expression for \Delta p_t^f may be expressed in terms of the parameters of the dividend process once we note the following conditions: (i) the differenced log real dividend series is stationary, therefore the infinite sum converges; (ii) any finite-order AR process can be expressed in companion form (a VAR of order 1) by using extended state variables, i.e. suitable lags of the original variables (Campbell, Lo and MacKinlay 1997, p. 280); (iii) using demeaned variables, the VAR(1) process can easily be used for multiperiod-ahead forecasts (Campbell, Lo and MacKinlay 1997, p. 280). Assuming the demeaned log real dividend process has the following AR(3) representation,

\Delta d_t = \phi_1 \Delta d_{t-1} + \phi_2 \Delta d_{t-2} + \phi_3 \Delta d_{t-3} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2), \qquad (12.18)

the companion form may be written as,
\begin{pmatrix} \Delta d_t \\ \Delta d_{t-1} \\ \Delta d_{t-2} \end{pmatrix}
=
\begin{pmatrix} \phi_1 & \phi_2 & \phi_3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} \Delta d_{t-1} \\ \Delta d_{t-2} \\ \Delta d_{t-3} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_t \\ 0 \\ 0 \end{pmatrix}, \qquad (12.19)

or, in compact notation,

X_t = \Phi X_{t-1} + \varepsilon_t, \qquad (12.20)
where the definitions of X_t, \Phi and \varepsilon_t are obvious from comparison of equations (12.19) and (12.20). Following Campbell, Lo and MacKinlay (1997, p. 280), \Delta p_t^f may be expressed as (with I being the identity matrix of the same dimension as \Phi)

\Delta p_t^f = \Delta d_t + e'\psi\Phi(I - \psi\Phi)^{-1}\Delta X_t. \qquad (12.21)
We can now express equation (12.15) in terms of the fundamental component and the bubble component,

\Delta p_t = \Delta d_t + e'\psi\Phi(I - \psi\Phi)^{-1}\Delta X_t + \Delta b_t, \qquad (12.22)
where e' = [1\ 0\ 0]. Equation (12.22) represents the measurement equation of the DLM, and we need to define a suitable state equation for the model. An examination of equations (12.17) and (12.19) suggests that the following state equation adequately represents the dynamics of the dividend and the bubble process:
\begin{pmatrix} \Delta d_t \\ \Delta d_{t-1} \\ \Delta d_{t-2} \\ \Delta d_{t-3} \\ b_t \\ b_{t-1} \end{pmatrix}
=
\begin{pmatrix}
\phi_1 & \phi_2 & \phi_3 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/\psi & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} \Delta d_{t-1} \\ \Delta d_{t-2} \\ \Delta d_{t-3} \\ \Delta d_{t-4} \\ b_{t-1} \\ b_{t-2} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_t \\ 0 \\ 0 \\ 0 \\ \eta_t \\ 0 \end{pmatrix}. \qquad (12.23a)
We are now in a position to define the measurement equation of the DLM in terms of the state vector in equation (12.23a). This is achieved by examining equation (12.22) and defining the row vector M \equiv e'\psi\Phi(I - \psi\Phi)^{-1} = [m_1\ m_2\ m_3], as follows:
Equation (12.24) determines the measurement equation of the DLM without any measurement error. In other words, the evolution of the state
vector in equation (12.23a) generates the measurement vector through equation (12.24). Equations (12.23a) and (12.24) represent the DLM for the bubble solution when the dividend process is described by the AR(3) system in equation (12.19). In our sample this is the case for the German, U.K. and U.S. data. Since the Japanese data required only an AR(1) process for the dividend in equation (12.19), the DLM in this case may be written directly as:
Similarly, the measurement equation for the DLM of the bubble solution for the Japanese data becomes,
where M \equiv e'\psi\Phi(I - \psi\Phi)^{-1} = [m_1], since e' = [1] and \Phi = [\phi_1]. We have now set up the DLM for the bubble solution for the data for Germany, the U.K. and the U.S.A., given by equations (12.23a) and (12.24). For the Japanese data, on the other hand, these are given by equations (12.25a) and (12.26). The parameters of the models embedded in these equations, and the filtered and smoothed estimates of the bubble series, are to be estimated from the observed price and dividend series.
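To make the mechanics concrete, the following is a minimal Python sketch of how such a bubble DLM could be assembled and its innovation-form likelihood evaluated. It is not the authors' GAUSS implementation: the parameter values are illustrative placeholders, the measurement matrix is built from the reconstructed equations (12.22)-(12.24) above, and a simple diffuse prior replaces whatever initialization was actually used.

```python
import numpy as np

# Sketch of the bubble DLM of (12.23a)-(12.24), with placeholder parameter values.
# State: alpha_t = [dd_t, dd_{t-1}, dd_{t-2}, dd_{t-3}, b_t, b_{t-1}]'
# Measurements: y_t = [dp_t, dd_t]' (log real price and dividend differences).

def build_system(phi, psi, sig_d, sig_b):
    phi1, phi2, phi3 = phi
    T = np.zeros((6, 6))                      # transition matrix of (12.23a)
    T[0, :3] = [phi1, phi2, phi3]
    T[1, 0] = T[2, 1] = T[3, 2] = 1.0         # shift the dividend lags down
    T[4, 4] = 1.0 / psi                       # bubble: b_t = (1/psi) b_{t-1} + eta_t
    T[5, 4] = 1.0
    Q = np.zeros((6, 6))                      # state error covariance
    Q[0, 0], Q[4, 4] = sig_d**2, sig_b**2
    Phi = np.array([[phi1, phi2, phi3], [1, 0, 0], [0, 1, 0]])
    m = psi * Phi @ np.linalg.inv(np.eye(3) - psi * Phi)
    m1, m2, m3 = m[0]                         # row vector M of (12.24)
    Z = np.array([[1 + m1, m2 - m1, m3 - m2, -m3, 1.0, -1.0],   # dp_t, from (12.22)
                  [1.0, 0, 0, 0, 0, 0]])                         # dd_t
    return T, Q, Z

def kalman_loglik(y, T, Q, Z):
    """Innovation-form log-likelihood; y has shape (n, 2), no measurement error."""
    n, a, P = len(y), np.zeros(6), np.eye(6) * 1e4   # diffuse-ish prior
    ll = 0.0
    for t in range(n):
        a, P = T @ a, T @ P @ T.T + Q                # prediction
        v = y[t] - Z @ a                             # innovation
        F = Z @ P @ Z.T
        K = P @ Z.T @ np.linalg.inv(F)
        ll += -0.5 * (np.log(np.linalg.det(F)) + v @ np.linalg.inv(F) @ v)
        a, P = a + K @ v, P - K @ Z @ P              # update
    return ll

# Illustrative values only; in practice these are estimated by maximizing kalman_loglik.
T, Q, Z = build_system(phi=(0.1, 0.06, 0.09), psi=0.95, sig_d=0.05, sig_b=0.02)
```

In practice the likelihood returned by kalman_loglik would be maximized numerically over the dividend coefficients, the discount parameter and the two innovation standard deviations, along the lines of chapter 9.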
The details of the estimation procedure are described in chapter 9. In the next section we proceed to set up the DLMs for the no-bubble solutions.
12.7 Dynamic Linear Models for No-Bubble Solutions

In order to compare the performance of the bubble solution discussed in the previous sub-section, we develop the DLM for a no-bubble solution. We maintain the same framework so that the comparison is more meaningful. This is in contrast to the approach taken in Wu (1997), where the no-bubble solution was estimated in the GMM framework. We also note that the model should account for the correlation in the variance of the stock return series. This is done by incorporating a GARCH(1,1) effect in the price equation (12.15) without the bubble component. In this context we adopt the methodology of Harvey, Ruiz and Sentana (1992) and follow Kim and Nelson (1999, page 144) to suitably augment the state vector of the DLM. For the German, U.K. and U.S. data sets the state equation (12.23a) becomes,
and \Omega_{t-1} is the information set at time t-1. The corresponding measurement equation becomes,
For the Japanese data with an AR(1) dividend process, the no-bubble DLM may be written following the approach above. The state equation (12.25a) becomes,
The corresponding measurement equation (12.26) becomes,
In the no-bubble solutions the parameters to be estimated are those of the dividend process and the GARCH(1,1) coefficients. The estimation procedure is the same as that for the bubble solutions and is described in detail in appendix A. The next sub-section takes up the issues in modeling the linkages between the markets in the subset VAR framework.
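As a rough, simplified stand-in for the GARCH(1,1) error structure of the no-bubble benchmark, one can gauge the volatility dynamics of the log price changes directly with the arch package. This is only an approximation: the chapter embeds the GARCH errors inside the state-space system following Harvey, Ruiz and Sentana (1992) and Kim and Nelson (1999), whereas the snippet below fits the GARCH model to the return series on its own, with placeholder data.

```python
import numpy as np
from arch import arch_model

# Simplified stand-in: fit GARCH(1,1) directly to log real price changes.
rng = np.random.default_rng(0)
dp = rng.normal(scale=0.04, size=576)          # placeholder for the 576 monthly observations

res = arch_model(100 * dp, mean="Constant", vol="GARCH", p=1, q=1).fit(disp="off")
print(res.params)                              # omega, alpha[1], beta[1]
```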
12.8 Subset VAR for Linkages between Markets

Linkages among international stock markets have been extensively investigated, and a good review of the literature can be found in McCarthy and Najand (1995). These authors adopt the state-space methodology to infer the linkage relationships between the stock markets in Canada, Germany, Japan, the U.K. and the U.S.A. They claim that this approach not only determines the causal relationships (in the Granger sense) but also delivers the result with the minimum number of parameters necessary. They report that the U.S. market exerts the most influence on other markets. Since these authors use daily data there is some overlap in market trading times, and they attempt to take care of this in the interpretation of their results. The main finding is consistent with similar findings by other researchers, e.g. Eun and Shim (1989), who examine nine stock markets in North America and Europe over the period 1980-1985 in a VAR framework. In this chapter we adopt a similar approach but at two different levels. The methodology developed in this chapter allows us to decompose the stock prices into their fundamental and bubble components. We, therefore, analyze the linkage relationships both through the fundamental and through the speculative component. This helps us understand whether the market linkages operate through the fundamental or through the speculative components of the price series. Also, since we are dealing with monthly data, the time overlap problem between markets is largely non-existent. The econometric procedure we adopt is referred to as the subset VAR. The standard VAR approach is frequently employed to study causal relations between variables. A typical VAR model involves a large number of coefficients to be estimated, and thus estimation uncertainty remains. Some of the coefficients may in fact be zero. When we impose zero constraints on the coefficients in the full VAR estimation problem, what results is the subset VAR. But, since most often no a priori knowledge is available to guide us in constraining certain coefficients, we base the modeling strategy on information-based model selection criteria, e.g. AIC (Akaike Information Criterion) and HQ (Hannan-Quinn). The mathematical definitions and the details of this approach can be found in Lutkepohl (1993, chapter 5). In the paragraphs below we very briefly summarize the procedure. We first obtain the order of the VAR process for the four variables using the information criteria mentioned above. The top-down strategy starts from this full VAR model, and the coefficients are deleted one at a time (from the highest lag term) from the four equations separately. Each time a coefficient is deleted the model is estimated using a least-squares algorithm and the information criterion is compared with the previous minimum.
If the current value of the criterion is greater than the previous minimum value, the coefficient is retained; otherwise it is deleted. The process is repeated for each of the four equations in the system. Once all the zero restrictions are determined, the final set of equations is estimated again, which gives the most parsimonious model. We also check the adequacy of this model by examining the multivariate version of the portmanteau test for whiteness of the residuals (Lutkepohl 1993, p. 188). Once the subset VAR model is estimated there is no further need for testing causal relations and/or linkages between the variables: the causality testing is built into the model development process. We will, therefore, examine linkages between the four markets in our study using this subset VAR model. As mentioned earlier, we explore linkages between these markets in two stages. In the first stage, the fundamental price series are all found to be stationary, and hence in this case the modeling is done using the levels of the variables. We find evidence of one unit root in the speculative components of the price series for all four markets. As we suspect the existence of a cointegrating relation between these speculative components, we explore this using Johansen's cointegration test and find evidence of one cointegrating vector. It is, therefore, natural to estimate a vector error correction model, which is essentially a restricted VAR model with the cointegrating relation designed into it. As suggested in Lutkepohl (1993, p. 378), we examine the causal relations between these variables in the same way as for a stable system. In other words, we explore the linkages as for the fundamental price component, but in this case we use the first differenced form and include the lagged values of the cointegrating vector as well.
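A minimal sketch of this top-down elimination is given below. It assumes four stationary series, uses an equation-by-equation Hannan-Quinn criterion rather than the system-wide criteria of Lutkepohl (1993), and all names are hypothetical; it is intended only to illustrate the mechanics.

```python
import numpy as np

def make_regressors(y, p):
    """Stack lags 1..p of the (T x k) array y plus a constant; return X and targets."""
    T, k = y.shape
    X = np.hstack([y[p - j - 1:T - j - 1] for j in range(p)])
    X = np.hstack([X, np.ones((T - p, 1))])
    return X, y[p:]

def hq(resid, n_par, T):
    """Single-equation Hannan-Quinn criterion (an approximation of the system criterion)."""
    sigma2 = resid @ resid / T
    return np.log(sigma2) + 2 * n_par * np.log(np.log(T)) / T

def subset_equation(X, yi):
    """Delete regressors one at a time (last lag first) whenever HQ does not worsen."""
    T = len(yi)
    keep = list(range(X.shape[1]))
    best = hq(yi - X @ np.linalg.lstsq(X, yi, rcond=None)[0], len(keep), T)
    for j in reversed(range(X.shape[1] - 1)):            # keep the constant term
        trial = [c for c in keep if c != j]
        b = np.linalg.lstsq(X[:, trial], yi, rcond=None)[0]
        crit = hq(yi - X[:, trial] @ b, len(trial), T)
        if crit <= best:                                 # deletion accepted
            keep, best = trial, crit
    return keep

# usage sketch: y = np.column_stack([gr, jp, uk, us]); X, Y = make_regressors(y, p=1)
# restrictions = [subset_equation(X, Y[:, i]) for i in range(Y.shape[1])]
```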
12.9 Results and Discussions

First we discuss the estimation results of the dynamic linear model with the bubble solution for the annual U.S. data series. In Table 12.1 we find that all the parameter estimates are statistically significant. The significance of the bubble innovation volatility parameter implies a highly variable bubble component of the price throughout the period 1871 to 1998. The parameters describing the real dividend process are very close to the univariate estimation results (not included) for the dividend series. Besides, the discount parameter, \psi, is close to its sample value. In Table 12.2 we present the estimates of the no-bubble solution with a GARCH(1,1) error structure for the price equation for the same U.S. annual data. Here also, most of the parameters are statistically significant.
The significance of the GARCH parameter, \beta_1, implies persistence in the residual volatility. This model is used for comparison with the results of the bubble solution. We would like to stress that we implemented the GARCH(1,1) model also in the state-space framework so that the comparison with the bubble solution would be more realistic. This is, however, not the case with Wu (1997), which uses the GMM methodology. This approach also allows us to check the performance of both models by analyzing the residual diagnostics. We present these test results in Table 12.3. The portmanteau tests support the whiteness of the residuals and the ARCH tests indicate no remaining heteroscedasticity in the residuals. Besides, the Kolmogorov-Smirnov tests support the normality of the residuals. These three tests overwhelmingly support the modeling approach adopted here and, therefore, the conclusions drawn are statistically meaningful. In addition to the three tests just outlined, we also include two additional tests particularly designed for the recursive residuals produced by the dynamic linear systems developed in this study. The modified von Neumann ratio tests against serial correlation in the residuals, whereas the recursive t-test is used to check for correct model specification. As the entries in Table 12.3 suggest, the dynamic linear models of the bubble and the no-bubble solutions both perform extremely well with respect to these two tests. There is overwhelming support for the adequacy of the models in describing the price process. In view of the battery of tests discussed in the preceding two paragraphs, we can now proceed to analyze the other observations. As discussed in Wu (1997), the rational stochastic bubble can alternate between positive and negative values. It is argued there that securities may be overvalued when the participants are bullish and undervalued when the participants are bearish. Figure 12.1 shows a negative bubble in the very early part of the sample as well as during the early 1920s. It is obvious, though, that the stochastic bubbles account for a substantial percentage of the stock price in the sample. It is also interesting to note that, in spite of the drop in the bubble percentage during the oil shock of the 1970s and the stock market crash of 1987, there has been an upward trend in the bubble percentage throughout the later part of the sample period considered. Next, we compare the performance of the bubble and the no-bubble solutions by examining the in-sample fitting of the stock prices. In Table 12.4 we display the criteria used, which are defined as,
RMSE = \left[\frac{1}{T}\sum_{t=1}^{T}(p_t - \hat{p}_t)^2\right]^{1/2} \quad \text{and} \quad MAE = \frac{1}{T}\sum_{t=1}^{T}\left|p_t - \hat{p}_t\right|,
where \hat{p}_t is the fitted log price and T is the number of observations. The entries in Table 12.4 clearly demonstrate the superiority of the bubble solution in capturing the price process over the sample period. We next proceed to analyze the monthly data, covering the post-war period, for the four countries, Germany, Japan, the U.K. and the U.S.A. In Table 12.5 we present the estimation results of the bubble solutions, and it is clear that most of the parameters are statistically significant. The discount parameter, \psi, as before is close to the respective sample values, while the significant bubble innovation volatility for all four countries implies highly variable bubble components. Needless to say, the estimated parameters of the dividend processes are close to their respective univariate estimation results (not reported here). As is evident from Table 12.6, the significant ARCH and GARCH parameters indicate the appropriateness of the error specification for the log price difference series. There is substantial persistence in the variance process. We now move to analyze the residual diagnostics in order to ascertain the appropriateness of the model for the monthly data series for all four countries. As with the annual data (for the U.S.), we find evidence of whiteness of the residuals from the portmanteau test and lack of ARCH effects in the residuals from the ARCH test results. The U.S. data also support the normality of the residuals. More importantly, though, the tests for model adequacy are captured by the von Neumann ratio and the recursive t-test. As pointed out in Harvey (1990, page 157), the von Neumann test provides the most appropriate basis for a general test of misspecification with recursive residuals. In this context the dynamic linear models for the bubble and the no-bubble solutions both perform extremely well. Figure 12.2 plots the bubble price ratio for the sample period, and the substantial variation of the bubble component is visible for all the countries. Except for the U.S., there is evidence of a negative bubble for the other three countries in the initial part of the sample period. All countries were affected by the oil price shock of the 1970s, but to different extents, the most severe apparently being in the U.K. The fall in the bubble percentage during the October 1987 stock market crash is evident for all the countries. It is also interesting to note that there is a general upward trend for the
bubble price ratio toward the later part of the sample period for Germany, the U.K. and the U.S.A., but not for Japan. This provides visual evidence of the collapsing and self-starting nature of the rational stochastic bubble we have attempted to capture in this study. In order to quantify the performance improvement of the bubble solution compared to the no-bubble case with GARCH(1,1) errors, we present the in-sample fitting statistics, RMSE and MAE, in Table 12.8. The entries in Table 12.8 prove beyond doubt that the bubble solution does a credible job in terms of both metrics. For example, the bubble solution reduces the RMSE to 7% and the MAE to 52% of the no-bubble solution, respectively, for the U.S. monthly data. We indicated earlier the importance and extent of the investigation into market linkages by various researchers. In this chapter we are able to focus on this aspect at two different levels. The study of the rational stochastic bubble through the dynamic linear models enables us to separate the price series into a fundamental and a bubble component. It is, therefore, natural to examine whether the market linkages exist via both these components. McCarthy and Najand (1995) demonstrated the influence of the U.S. market on several other OECD countries using daily data, which might have the unintended consequence of trading time overlap in these markets. Using monthly data over a period of 48 years we are in a better position to analyze the market interrelationships. VAR methodology is often employed to study causal relationships. If some variables are not Granger-causal for the others, then zero coefficients are obtained. Besides, the information in the data may not be sufficient to provide precise estimates of the coefficients. In this context the top-down strategy of the subset VAR approach described in the earlier section is most suitable. For the fundamental price series we adopt this approach in the levels of the variables since these are all found to be stationary. Using the Hannan-Quinn criterion we start our VAR model with a lag of one and follow the subset analysis process described before. This gives us the model presented in Table 12.9. As with McCarthy and Najand (1995), we find strong evidence of U.S. dominance over all the other three countries, but no reverse causality. This is a particularly important finding in the sense that this causality exists in the fundamental components of the prices. Intuitively, this evidence suggests that the U.S. economy, as represented by the stock market data, acts as the engine of global growth. For Germany and Japan the causality from the U.S. is significant at the 5% level, whereas for the U.K. it is significant at the 10% level only. The overall adequacy of this modeling approach is also established by the multivariate version of the portmanteau test for whiteness of the residuals.
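The bubble-component analysis that follows relies on Johansen's cointegration test and an error correction model. A minimal sketch with statsmodels, on placeholder data, might look as follows; the deterministic specification is an assumption, chosen because the reported cointegrating vector contains a constant and a trend.

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen, VECM

# Johansen's test for the cointegration rank among the four bubble price series,
# then a VECM with one cointegrating relation. `bubbles` is a placeholder array.
bubbles = np.cumsum(np.random.default_rng(1).normal(size=(576, 4)), axis=0)

jres = coint_johansen(bubbles, det_order=1, k_ar_diff=1)   # det_order=1: linear trend
print(jres.lr1)            # trace statistics
print(jres.cvt)            # critical values (90/95/99%)

vecm = VECM(bubbles, k_ar_diff=1, coint_rank=1, deterministic="cili").fit()
print(vecm.alpha)          # adjustment (error-correction) coefficients
print(vecm.beta)           # cointegrating vector
```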
We also apply the top-down strategy of the subset VAR approach to the bubble components to examine the causality between the four markets. Since the bubble components are found to be non-stationary (results of the unit root tests not included), we model these using the first differences of the log prices. With non-stationary bubble price series it is natural to expect some long-term equilibrium relationship between these variables. We detected one cointegrating vector using Johansen's procedure, and this is described in Table 12.10. We follow the same process (as for the fundamental prices) to obtain the subset VAR model, including the cointegrating vector, that describes the causal relationships between these markets. We find (from Table 12.10) that causality also runs from the U.S. to the other three markets, and these linkages are significant at the 5% level for Germany and Japan and only at the 10% level for the U.K. As with the fundamental prices, there is no reverse causality in the bubble price components. It is also observed that the strength of this causality from the U.S. to Japan is slightly stronger for the bubble price process, 0.1915 as opposed to 0.1878 for the fundamental prices. It is also noted from Table 12.10 that the coefficients of the error correction term, i.e. 'Coint(-1)', are statistically significant. This implies that the modeled variables, i.e. the changes in log prices, adjust to departures from the equilibrium relationship. The magnitude of the 'Coint(-1)' coefficient for the Japanese log price difference is much higher than the others, capturing first the upward and later the downward trend in the Japanese market. Although the existence of an error correction model implies some form of forecasting ability (see e.g. Ghosh (1993)), we do not pursue this in this chapter. Finally, we note the multivariate portmanteau test for whiteness of residuals in Table 12.10. This again supports the model adequacy, and hence the inferences drawn are statistically meaningful.

Table 12.1 Parameter estimates of bubble solution, USA yearly data
Estimates reported here are obtained from maximizing the innovation form of the likelihood function. Numerical optimization in GAUSS is used without any parameter restriction. The standard errors (reported below the parameters in parentheses) are obtained from the Hessian matrix at the point of convergence. These estimates are robust to different starting values including different specification of the prior covariance matrix. Significance at 5% level is indicated by *
Table 12.2 No-bubble GARCH(1,1) solution, USA yearly data
Notes for Table 12.1 apply here. The GARCH(1,1) error for the state-space system is implemented following Harvey, Ruiz and Sentana (1992).
Table 12.3 Diagnostics and model adequacy tests, USA yearly data

            MNR     Rec. T   Port.    ARCH     KS
Bubble      0.035   0.385    0.138    0.531    0.952
No Bubble   0.033   0.519    0.119    0.422    0.931

Entries are p-values for the respective statistics except for the KS statistic. These diagnostics are computed from the recursive residuals of the measurement equation, which corresponds to the real dividend process. The null hypothesis in the portmanteau test is that the residuals are serially uncorrelated. The ARCH test checks for no serial correlation in the squared residuals up to lag 26. Both these tests are applicable to recursive residuals as explained in Wells (1996, page 27). MNR is the modified von Neumann ratio test using recursive residuals for model adequacy (see Harvey 1990, chapter 5). Similarly, if the model is correctly specified then the recursive t statistic has a Student's t-distribution (see Harvey 1990, page 157). KS represents the Kolmogorov-Smirnov test statistic for normality. The 95% and 99% significance levels for this test are 0.121 and 0.145, respectively. When the KS statistic is less than 0.121 or 0.145 the null hypothesis of normality cannot be rejected at the indicated level of significance.
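Rough counterparts of the portmanteau, ARCH and KS entries of Table 12.3 can be computed from the recursive residuals with standard routines; the modified von Neumann ratio and recursive t-test of Harvey (1990) are not shown. The residual series below is a placeholder.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox, het_arch

# Approximate diagnostics on the recursive residuals `e` of the measurement equation.
e = np.random.default_rng(2).normal(size=128)

port = acorr_ljungbox(e, lags=[26], return_df=True)["lb_pvalue"].iloc[0]   # portmanteau p-value
arch_p = het_arch(e, nlags=26)[1]                                          # ARCH LM p-value
ks = stats.kstest((e - e.mean()) / e.std(ddof=1), "norm").statistic        # KS statistic

print(round(port, 3), round(arch_p, 3), round(ks, 3))
```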
Table 12.4 Models compared, USA yearly data

         RMSE    MAE
Bubble   0.25    0.34
RMSE and MAE stand for 'root mean squared error' and 'mean absolute error' respectively. These are computed from the differences between the actual log prices and the fitted log prices from the corresponding estimated model. Additional details are in the text.
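A small sketch of the two criteria, computed from actual and fitted log prices (illustrative numbers only):

```python
import numpy as np

# RMSE and MAE as defined above, from actual vs fitted log prices.
def rmse_mae(p, p_hat):
    err = np.asarray(p) - np.asarray(p_hat)
    return np.sqrt(np.mean(err**2)), np.mean(np.abs(err))

p = np.log([100.0, 104.0, 101.0])       # actual log prices (made-up)
p_hat = np.log([99.0, 105.0, 102.0])    # fitted log prices (made-up)
print(rmse_mae(p, p_hat))
```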
Table 12.5 Parameter estimates of bubble solution, monthly data

           Germany    Japan    U.K.    U.S.A.
(Columns report, for each market, the estimates of the discount parameter, the dividend AR coefficients \phi_1, \phi_2, \phi_3 and the innovation standard deviations, with standard errors in parentheses; only \phi_1 enters for Japan, whose dividend process is AR(1).)

Notes of Table 12.1 apply here.

Table 12.6 Parameter estimates of no-bubble solution, monthly data
           Germany    Japan    U.K.    U.S.A.
(Columns report, for each market, the dividend process parameters and the GARCH(1,1) parameters \omega, \alpha_1 and \beta_1, with standard errors in parentheses.)

Notes of Table 12.2 apply here.
Table 12.7 Diagnostics and model adequacy tests, monthly data

Bubble       Port.    ARCH     KS      MNR     Rec. T
Germany      0.253    0.158    0.176   0.586   0.903
Japan        0.061    0.206    0.093   0.379   0.972
U.K.         0.366    0.199    0.136   0.467   0.931
U.S.A.       0.377    0.327    0.048   0.425   0.894

No Bubble    Port.    ARCH     KS      MNR     Rec. T
Germany      0.254    0.195    0.175   0.466   0.806
Japan        0.017    0.194    0.089   0.186   0.771
U.K.         0.307    0.179    0.139   0.571   0.907
U.S.A.       0.353    0.283    0.047   0.418   0.846

Notes of Table 12.3 apply here. Critical values for the KS statistic are 0.057 and 0.068, respectively.
Table 12.8 Models compared, monthly data

                        RMSE     MAE
Bubble
  Germany               0.796    0.795
  Japan                 1.730    1.730
  U.K.                  0.247    0.366
  U.S.A.                0.117    0.895
No Bubble GARCH(1,1)
  Germany               2.945
  Japan                 4.394
  U.K.                  0.719

Notes for Table 12.4 apply here.
Table 12.9 Subset VAR results: linkages in fundamental prices

        GR(-1)    JP(-1)    UK(-1)    US(-1)    Constant
GR
JP
UK
US
Details of the methodology for determining the subset VAR relations are given in the text. This has been done in the level variables since the fundamental price series are stationary. The numbers in parentheses are t-statistics for the corresponding coefficient. Significance at 5% and 10% level are indicated by * and ** respectively. The p-value for the multivariate portmanteau statistic for residual white noise is 0.017. This is described in Lutkepohl (1993) page 188. This indicates that the model adequately represents the relationship documented here.
Table 12.10 Subset VAR results: linkages in bubble prices

            \Delta GR(-1)   \Delta JP(-1)   \Delta UK(-1)   \Delta US(-1)      Coint(-1)   Const.
\Delta GR                                                   0.194* (3.91)
\Delta JP                                                   0.195* (3.20)
\Delta UK                                                   0.106** (1.73)
\Delta US
The bubble prices are found to be non-stationary and Johansen's procedure identified the existence of one cointegrating vector. The lagged value of this cointegrating vector (Coint) has been used in estimating the subset VAR relations for the linkages between the markets. The details of the unit root and cointegration tests are not reported here but can be obtained from the authors. The estimated cointegrating vector (normalized on GR), including TREND and constant terms, is given below. The numbers in parentheses are t-statistics for the corresponding coefficients. Significance at the 5% and 10% levels is indicated by * and ** respectively. GR(-1) - 1.5826 JP(-1) + 2.7303 UK(-1) - 3.2545 US(-1) + 0.0054 TREND + 2.3772. The p-value for the multivariate portmanteau statistic for residual white noise is 0.068. This is described in Lutkepohl (1993), page 188. This indicates that the model adequately represents the relationships documented here.
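The multivariate portmanteau p-values quoted under Tables 12.9 and 12.10 follow Lutkepohl (1993, p. 188). An analogous whiteness check on an estimated VAR is available in statsmodels, sketched here on placeholder data; it tests an unrestricted VAR rather than the subset model, so it only approximates the reported statistic.

```python
import numpy as np
from statsmodels.tsa.api import VAR

# Multivariate portmanteau check for residual whiteness (placeholder data).
y = np.random.default_rng(3).normal(size=(200, 4))

res = VAR(y).fit(1)
white = res.test_whiteness(nlags=12)     # Portmanteau test on the VAR residuals
print(white.summary())
```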
Bubble/Price Ratio (%)
Fig. 12.1 Plot using smoothed estimates from bubble, yearly US data
Bubble/Price Ratio (%)
Fig. 12.2a Plot using smoothed estimates from bubble, monthly German data
Bubble/Price Ratio (%)
Fig. 12.2b Plot using smoothed estimates from bubble, monthly Japanese data
Bubble/Price Ratio (%)
Fig. 12.2c Plot using smoothed estimates from bubble, monthly UK data
Bubble/Price Ratio (%)
Fig. 12.2d Plot using smoothed estimates from bubble, monthly US data
12.10 Summary

Economists have long conjectured that movements in stock prices may involve speculative bubbles because trading often generates over-priced markets. A speculative bubble is usually defined as the difference between the market value of a security and its fundamental value. Although there are several important theoretical issues surrounding the topic of asset bubbles, the existence of bubbles is inherently an empirical issue that has not been settled yet. This chapter reviews several important tests and offers a new methodology that improves upon the existing ones. In addition, the new methodology is applied to the four mature markets of the U.S., Japan, England and Germany to test whether a bubble was present during the period of January 1951 to December 1998. Once we find evidence of bubbles in these four mature stock markets, we next ask whether these bubbles are interrelated. We avoid using the technical term contagion because it has a very specific meaning. Several authors use contagion to mean a significant increase in cross-market linkages, usually after a major shock. For example, when the Thai economy experienced a major devaluation of its currency during the summer of 1997, the spreading of the crisis across several Asian countries has
been viewed as a contagion. Unlike the short-term cross-market linkages that emerge as a result of a major, often regional, economic shock, we are here interested in long-run linkages. Bubbles often take a long time, that is, several years, to inflate, and one is interested in knowing whether such processes travel from one mature economy to another. The bursting of a bubble, as in the case of the Thai market with its impact on the Asian stock markets, can be viewed as a contagion. However, our methodology captures long-term characteristics describing the markets studied over the entire sample period. Our statistical tests of the long-term linkages between the four mature stock markets provide evidence that U.S. bubbles cause bubbles in the other three markets, but we find no evidence of reverse causality.
References

Adler M, Dumas B (1983) International portfolio choice and corporation finance: a synthesis. Journal of Finance, 38: 925-984
Agmon T (1972) The relations among equity markets: a study of share price comovements in the United States, United Kingdom, Germany and Japan. Journal of Finance, 27: 839-855
Agmon T (1973) Country risk: significance of country factor for share price movements in the United Kingdom, Germany, and Japan. Journal of Business, 46: 24-32
Allen F, Gorton G (1988) Rational finite bubbles. mimeo, The Wharton School, University of Pennsylvania
Becker KG, Finnerty JE, Gupta M (1990) The intertemporal relation between the U.S. and Japanese stock markets. Journal of Finance, 45: 1297-1306
Binswanger M (1999) Stock markets, speculative bubbles and economic growth. Elgar Publishing, Cheltenham
Blanchard OJ, Watson M (1982) Bubbles, rational expectations and financial markets. In: Wachtel P (ed) Crises in the Economic and Financial Structure, Lexington Books, Lexington
Buiter WH, Pesenti PA (1990) Rational speculative bubbles in an exchange rate target zone. NBER Working Paper No. 3467
Cagan P (1956) The monetary dynamics of hyperinflation. In: Friedman M (ed) Studies in the Quantity Theory of Money, University of Chicago Press, Chicago
Campbell JY (2000) Asset pricing at the millennium. Journal of Finance, 55: 1515-1568
Campbell JY, Shiller RJ (1988a) The dividend-price ratio and the expectations of future dividends and discount factors. Review of Financial Studies, 1: 195-228
Campbell JY, Shiller RJ (1988b) Stock prices, earnings and expected dividends. Journal of Finance, 43: 661-676
Chan K, McQueen G, Thorley S (1998) Are there rational speculative bubbles in Asian stock markets? Pacific-Basin Finance Journal, 6: 125-151
Chen J (1999) When the bubble is going to burst. International Journal of Theoretical and Applied Finance, 2: 285-292
Chirinko R, Schaller H (1996) Bubbles, fundamentals, and investment: a multiple equation testing strategy. Journal of Monetary Economics, 38: 47-76
Christofi AC, Granger CWJ (1987) Co-integration and error correction: representation, estimation, testing. Econometrica, 55: 251-276
Christofi AC, Philippatos GC (1987) An empirical investigation of the international arbitrage pricing theory. Management International Review, 27: 13-22
Claessens S, Forbes K (eds) (2001) International financial contagion. Kluwer Academic Publishers, Amsterdam
Dezhbakhsh H, Demirguc-Kunt A (1990) On the presence of speculative bubbles in stock prices. Journal of Financial and Quantitative Analysis, 25: 101-112
Diba BT, Grossman HI (1984) Rational bubbles in the price of gold. National Bureau of Economic Research, Working Paper No. 1300
Diba BT, Grossman HI (1988a) The theory of rational bubbles in stock prices. Economic Journal, 98: 746-754
Diba BT, Grossman HI (1988b) Explosive rational bubbles in stock prices. American Economic Review, 78: 520-530
Eun C, Shim S (1989) International transmission of stock market movements. Journal of Financial and Quantitative Analysis, 24: 241-256
Evans GE (1991) Pitfalls in testing for explosive bubbles in asset prices. American Economic Review, 81: 922-930
Fischer KP, Palasvirta AP (1990) High road to a global marketplace: the international transmission of stock market fluctuations. Financial Review, 25: 371-394
Flood RP, Garber PM (1980) Market fundamentals versus price level bubbles: the first tests. Journal of Political Economy, 88: 745-770
French K, Poterba J (1991) Investor diversification and international equity markets. American Economic Review, 81: 222-226
Froot KA, Obstfeld M (1991) Intrinsic bubbles: the case of stock prices. American Economic Review, 81: 1189-1214
Geweke J, Meese R, Dent W (1983) Comparing alternative tests of causality in temporal systems: analytic results and experimental evidence. Journal of Econometrics, 21: 161-194
Ghosh A (1993) Cointegration and error correction models: intertemporal causality between index spot and future prices. Journal of Futures Markets, 13: 193-198
Gilles C, LeRoy SF (1992) Bubbles and charges. International Economic Review, 33: 323-339
Granger CWJ (1969) Investigating causal relations by econometric models and cross spectral methods. Econometrica, 37: 428-438
Grauer RR, Hakansson NH (1987) Gains from international diversification: 1968-1985 returns on portfolios of stocks and bonds. Journal of Finance, 42: 721-741
Grubel HG, Fadner K (1971) The interdependence of international equity markets. Journal of Finance, 26: 89-94
Grubel HG (1968) Internationally diversified portfolios: welfare gains and capital flows. American Economic Review, 58: 1299-1314
Guilkey DK, Salemi MK (1982) Small sample properties of three tests of Granger causal ordering in a bivariate stochastic system. Review of Economics and Statistics, 64: 668-680
Hamao Y, Masulis RW, Ng V (1990) Correlations in price changes and volatility across international stock markets. Review of Financial Studies, 3: 281-307
Hamilton JD, Whiteman CH (1985) The observable implications of self-fulfilling expectations. Journal of Monetary Economics, 16: 353-373
Hamilton JD (1985) On testing for self-fulfilling speculative price bubbles. International Economic Review, 27: 545-552
Hansen LP, Sargent T (1980) Formulating and estimating dynamic linear rational expectations models. Journal of Economic Dynamics and Control, 2: 7-46
Harvey AC (1990) The econometric analysis of time series. Second edition, The MIT Press, Cambridge
Harvey AC, Ruiz E, Sentana E (1992) Unobserved component time series models with ARCH disturbances. Journal of Econometrics, 52: 129-157
Harvey CR (1991) The world price of covariance risk. Journal of Finance, 46: 111-157
Hilliard J (1979) The relationship between equity indices on world exchanges. Journal of Finance, 34: 103-114
Ikeda S, Shibata A (1992) Fundamentals-dependent bubbles in stock prices. Journal of Monetary Economics, 30: 143-168
Jeon BN, von Furstenberg GM (1990) Growing international co-movement in stock price indexes. Quarterly Review of Economics and Business, 30: 15-30
Kim C-J, Nelson CR (1999) State-space models with regime switching: classical and Gibbs-sampling approaches with applications. The MIT Press, Cambridge
King MA, Wadhwani S (1990) Transmission of volatility between stock markets. Review of Financial Studies, 3: 5-33
LeRoy S, Porter RD (1981) The present-value relation: tests based on implied variance bounds. Econometrica, 49: 555-574
Lessard DR (1973) International portfolio diversification: a multivariate analysis for a group of Latin American countries. Journal of Finance, 28: 619-633
Lessard DR (1974) World, national, and industry factors in equity returns. Journal of Finance, 24: 379-391
Lessard DR (1976) World, country, and industry relationships in equity returns. Financial Analysts Journal, 32: 2-8
Levy H, Sarnat M (1970) International diversification of investment portfolios. American Economic Review, 50: 668-675
Longin F, Solnik B (2001) Extreme correlation of international equity markets. Journal of Finance, 56: 649-676
Lutkepohl H (1993) Introduction to multiple time series analysis. Second edition, Springer-Verlag, New York
McCarthy J, Najand M (1995) State-space modeling of linkages among international markets. Journal of Multinational Financial Management, 5: 1-9
Makridakis SG, Wheelwright SC (1974) An analysis of the interrelationships among the major world stock exchanges. Journal of Business Finance and Accounting, Summer, 195-215
Maldonado R, Saunders A (1981) International portfolio diversification and the inter-temporal stability of international stock market relationships. Financial Management, 10: 54-63
Malliaris AG, Urrutia JL (1992) The international crash of October 1987: causality tests. Journal of Financial and Quantitative Analysis, 27: 353-364
Malliaris AG, Urrutia JL (1996) European stock market fluctuations: short and long term links. Journal of International Financial Markets, Institutions and Money, 6: 21-34
Malliaris AG, Urrutia JL (1997) Equity and oil markets under external shocks. In: Ghosh D, Ortiz E (eds) Global Structure of Financial Markets, Routledge Publishers, London, pp. 103-116
Miller M, Weller P (1990) Currency bubbles which affect fundamentals: a qualitative treatment. Economic Journal, 100: 170-179
Panton DB, Lessing VP, Joy OM (1976) Co-movement of international equity markets: a taxonomic approach. Journal of Financial and Quantitative Analysis, 11: 415-432
Philippatos G, Christofi AC, Christofi P (1983) The inter-temporal stability of international stock market relationships: another view. Financial Management, 12: 63-69
Pierce DA, Haugh LD (1977) Causality in temporal systems: characterizations and a survey. Journal of Econometrics, 5: 265-293
Rappoport P, White EN (1993) Was there a bubble in the 1929 stock market? Journal of Economic History, 53: 549-574
Rappoport P, White EN (1994) The New York stock market in the 1920s and 1930s: did stock prices move together too much? NBER Working Paper No. 4627
Ripley TM (1973) Systematic elements in the linkage of national stock market indices. Review of Economics and Statistics, 55: 356-361
Richards A (1996) Volatility and predictability in national stock markets: how do emerging and mature markets differ? International Monetary Fund Staff Papers, 43: 461-501
Roll R (1988) The international crash of October 1987. In: Kamphuis R, Kormendi R, Watson H (eds) Black Monday and the Future of Financial Markets, Irwin, Homewood
Roll R (1989) Price volatility, international market links and their implications for regulatory policies. Journal of Financial Services Research, 3: 211-236
Sarno L, Taylor M (1999) Moral hazard, asset price bubbles, capital flows and the East Asian crisis: the first tests. Journal of International Money and Finance, 18: 637-657
Schollhammer H, Sand OC (1987) Lead-lag relationships among national equity markets: an empirical investigation. In: Khoury SJ, Ghosh A (eds) Recent Developments in International Banking and Finance, Lexington Books, Lexington
Shiller RJ (1978) Rational expectations and the dynamic structure of macroeconomic models. Journal of Monetary Economics, 4: 1-44
Shiller RJ (1981) Do stock prices move too much to be justified by subsequent changes in dividends? American Economic Review, 71: 421-436
Shumway RH, Stoffer DS (2000) Time series analysis and its applications. Springer, New York
Solnik BH (1974) Why not diversify internationally rather than domestically? Financial Analysts Journal, 30: 91-135
Solnik BH (1976) An equilibrium model of the international capital market. Journal of Economic Theory, 8: 500-524
Solnik BH (1983) International arbitrage pricing theory. Journal of Finance, 38: 449-457
Stulz R (1981) A model of international asset pricing. Journal of Financial Economics, 9: 383-406
Tesar L, Werner I (1992) Home bias and the globalization of securities markets. NBER Working Paper No. 4218
Tirole J (1982) On the possibility of speculation under rational expectations. Econometrica, 50: 1163-1181
Tirole J (1985) Asset bubbles and overlapping generations. Econometrica, 53: 1499-1528
von Furstenberg GM, Jeon BN (1989) International stock price movements: links and messages. Brookings Papers on Economic Activity, 1: 125-167
Weil P (1990) On the possibility of price decreasing bubbles. Econometrica, 58: 1467-1474
Wells C (1996) The Kalman filter in finance. Kluwer Academic Publishers, Amsterdam
West KD (1987) A specification test for speculative bubbles. Quarterly Journal of Economics, 102: 553-580
West KD (1988) Bubbles, fads, and stock price volatility tests: a partial evaluation. NBER Working Paper No. 2574
Wheatley S (1988) Some tests of international equity integration. Journal of Financial Economics, 21: 177-212
Wu Y (1995) Are there rational bubbles in foreign exchange markets? Journal of International Money and Finance, 14: 27-46
Wu Y (1997) Rational bubbles in the stock market: accounting for the U.S. stock-price volatility. Economic Inquiry, 35: 309-319
13 Forward FX Market and the Risk Premium
13.1 Introduction

Several regression-based studies have attempted to explain the ability (or otherwise) of forward exchange rates to predict the subsequently realized spot exchange rates. Although the nature of the tests employed has changed with improvements in econometric theory, the basic approach has remained essentially within the regression framework. For example, Wu and Zhang (1997) employ a non-parametric test and not only reject the unbiasedness hypothesis but also conclude that the forward premium contains either no information or wrong information about future currency depreciation. On the other hand, Bakshi and Naka (1997) derive an error correction model under the assumption that the spot and forward rates are cointegrated and conclude, using the generalized method of moments, that the unbiasedness hypothesis cannot be rejected. Phillips and McFarland (1997) develop a robust test and reject the unbiasedness hypothesis, but conclude that the forward rate has an important role as a predictor of the future spot rate. The failure of the unbiasedness hypothesis has been attributed to the existence of a foreign exchange risk premium. This has led to a great deal of research on the modeling of risk premia in the forward exchange rate market. However, models of risk premia have been unsuccessful in explaining the magnitude of the failure of unbiasedness (Engel 1996, page 124). We define the term rp_t = f_t - E_t[s_{t+k}] as the foreign exchange risk premium. Under risk-neutrality the market participants would behave in such a way that f_t equals E_t[s_{t+k}] and the expected profit from forward market speculation would be zero. Stulz (1994) discusses a model of the foreign exchange risk premium based on the optimizing behavior of international investors. However, alongside such theoretical developments, pure time series studies of rp_t have also assumed a renewed importance. These are useful in describing the behavior of f_t - E_t[s_{t+k}]. Models of the foreign exchange
risk premium that assume rational expectations should be able to explain the observed time series properties. Examples of such studies include Backus et al. (1993) and Bekaert (1994). The modeling of time varying risk premia has been inadequately addressed in the literature, since there is little theory to guide us in this respect. Wolff (1987) and Cheung (1993) have modeled the risk premium as an unobserved component and estimated it using the Kalman filter. In their signal extraction approach they empirically determine the temporal behavior of the risk premium using only data on the forward exchange rate and the spot exchange rate. Although the signal extraction approach avoids specifying any particular form for the risk premium, it offers little insight into the relation between the risk premium and other economic variables. In fact, Cheung (1993) attempts to link the estimated risk premia with other macroeconomic variables in the intertemporal asset-pricing model of Lucas (1982). However, the results are not very encouraging and the estimated regression models have very low R-squares. Both Wolff (1987) and Cheung (1993) analyze the quantity (f_t - s_{t+k}) to determine the time series characteristics of the unobserved risk premium. This in turn determines the dynamics of the unobserved component, the risk premium. For the different currencies they examine, the dynamics of the risk premium can be captured by a low order ARMA process. Wolff (2000) further extends the number of currencies studied in the same framework. In those papers, therefore, the observed difference between the forward exchange rate at time t for the period t+k and the subsequently realized spot rate at time t+k is the main driver of the structure of the risk premium. This difference is assumed to be composed of the unobserved risk premium and the unexpected depreciation of the exchange rate. In this chapter we also adopt the unobserved component model approach and its estimation by the Kalman filter. However, we attempt to model the market price of risk (and hence the risk premium) by utilizing the no-arbitrage relation between the spot and the forward markets and by assuming a certain dynamic process for the market price of risk. This allows us to obtain the state-space system for the spot and forward exchange rates and the market price of risk. The filtered estimates of the market price of risk and the other parameters of its dynamic process allow us to compute the risk premium. This approach essentially differs from those of Wolff (1987) and Cheung (1993) in that it models the spot and forward dynamics as well as the market price of risk. We obtain a characterization of the risk premia similar to Wolff and Cheung, which we interpret as confirmation of this methodology of using the no-arbitrage relation under the historical measure. The advantage of the
methodology is that it is extendible to other derivative markets, such as FX options, which are a rich source of untapped information about the market's view of risk premia.
13.2 Alternative Approach to Model Risk Premia

It is apparent from the preceding discussion that the risk premium plays an important role in explaining the divergence between the forward exchange rate and the subsequently realised spot exchange rate. Most previous studies have employed some form of regression-based approach, and the conclusions are based on asymptotic inference. To avoid data correlation problems with overlapping samples, researchers normally align the data for the realised spot exchange rate with the time period spanned by the forward exchange rate. For example, when the one-month forward rate is used, the data frequency for the spot exchange rate should also be one month. This has the unwanted side effect of reducing the effective sample size, even if one starts with a large data set. This procedure may, therefore, lead to a loss of information from the intervening period. Furthermore, the use of just the forward exchange rate and the subsequently realised spot exchange rate, as in Wolff (1987) and Cheung (1993), does not make use of the no-arbitrage relation that exists between the spot asset and the derivative instrument written on that asset. The derivative market, being essentially forward looking, impounds a great deal of market information. As explained in Hull (1997, chapter 13), it is the market price of risk that connects the spot asset and all the derivative assets written on that spot asset. In this chapter we make use of this interconnection in a framework that starts with the usual assumptions of the Black-Scholes option-pricing model and lets the spot exchange rate follow a geometric diffusion process. The standard arbitrage argument is then applied to relate the forward exchange rate (a derivative instrument) to the spot exchange rate through the contract period and the related interest rates in the two countries. By applying Ito's lemma we are able to express the dynamics of the forward price as another stochastic differential equation. Following the argument in Hull (1997, chapter 13), the investors in the forward contracts may be assumed risk-neutral provided they are compensated by an additional return for holding the derivative asset. This additional return is related to the market price of risk and the volatility of the underlying spot asset. Since this market price of risk is not observable, we adopt
the unobserved component modelling approach after specifying a suitable stochastic process for the market price of risk. Since we are not pricing the forward contracts as such in this chapter we incorporate the market price of risk and treat this as an unobserved state variable in the system dynamics under the historical measure. This is where the main innovation of this chapter enters. Once we express the dynamics of the market price of risk we can treat the observations of the forward exchange rates and the spot exchange rates in the historical measure as opposed to an equivalent risk neutral measure. This leads to a partially observed system involving three variables, the spot exchange rate, the forward exchange rate and the market price of risk. This system can be put into a state-space form and then suitably discretised for estimation by the Kalman filter. The advantage of this approach is that we get the filtered estimates of the market price of risk, which form the basis for estimation of the risk premia. Since we are modelling the dynamics of the three variables simultaneously through the discretisation period, there is no longer a need to align the spot exchange rates with the forward exchange rate period. We, therefore, have the advantage of benefiting from the utilisation of the information generated through the discretisation period. This is normally not possible in regression-based approaches. However, Dunis and Keller (1995) suggest a panel approach that avoids such a loss of data in the regression-based approach
13.3 The Proposed Model

Let the spot exchange rate follow the one-dimensional geometric diffusion process,

dS = \mu S\,dt + \sigma_s S\,dW(t), \qquad (13.1)
where \mu is the expected return from the spot asset and \sigma_s is the volatility of this return, both measured per unit of time, dW is the increment of a Wiener process under the so-called historical (statistical) probability measure Q, r is the domestic risk-free interest rate and r_f is its counterpart in the foreign currency. Since r_f can be interpreted as a continuous dividend yield, the instantaneous return to an investor holding foreign exchange is (\mu + r_f). Thus the relationship between the excess return demanded and the market price of risk (\lambda) may be written

\frac{\mu + r_f - r}{\sigma_s} = \lambda. \qquad (13.2)

Thus, under the historical measure Q, equation (13.1) can be rewritten

dS = (r - r_f + \lambda\sigma_s)S\,dt + \sigma_s S\,dW(t), \quad \text{under } Q. \qquad (13.3)
Alternatively, under the risk-neutral measure \tilde{Q} the last equation becomes

dS = (r - r_f)S\,dt + \sigma_s S\,d\tilde{W}(t), \qquad (13.4)

where \tilde{W}(t) = W(t) + \int_0^t \lambda(u)\,du.
We recall that under Q the process \tilde{W}(t) is not a standard Wiener process, since E[d\tilde{W}(t)] = \lambda\,dt \neq 0 in general. However, Girsanov's theorem allows us to obtain the equivalent measure \tilde{Q} under which \tilde{W}(t) does become a standard Wiener process. The measures Q and \tilde{Q} are related via the Radon-Nikodym derivative. Using standard arguments for pricing derivative securities (see, for example, Hull (1997), chapter 13), the forward price at time t for a contract maturing at T (> t) is

F(t, T) = \tilde{E}_t[S(T)]. \qquad (13.5)
But from equation (13.4), by Ito's lemma,

d\left(S(t)e^{-(r - r_f)t}\right) = \sigma_s S(t)e^{-(r - r_f)t}\,d\tilde{W}(t),
so that under \tilde{Q} the quantity S(t)e^{-(r - r_f)t} is a martingale, and it follows immediately that

\tilde{E}_t[S(T)] = S(t)e^{(r - r_f)(T - t)}, \quad \text{i.e.}

F(t, T) = S(t)e^{(r - r_f)(T - t)}. \qquad (13.6)
If the maturity date of the contract is a constant period, x, ahead, then (13.6) may be written as

F(t, x) = S(t)e^{(r - r_f)x}. \qquad (13.7)
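As a quick numerical illustration of (13.7) with made-up inputs:

```python
import math

# Forward rate implied by (13.7): F(t, x) = S_t * exp((r - r_f) * x).
# Illustrative values: S in domestic units per foreign unit, rates p.a., x in years.
S, r, r_f, x = 1.1000, 0.05, 0.03, 1.0 / 12.0
F = S * math.exp((r - r_f) * x)
print(round(F, 6))   # approximately 1.101835
```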
Then from (13.3), (13.4) and (13.7), and by a trivial application of Ito's lemma, we obtain the stochastic differential equation for F under Q and \tilde{Q}. Thus, under Q,

dF = (r - r_f + \lambda\sigma_s)F\,dt + \sigma_s F\,dW(t), \qquad (13.8)

whilst under \tilde{Q},

dF = (r - r_f)F\,dt + \sigma_s F\,d\tilde{W}(t), \qquad (13.9)
with F(0, x) = S_0 e^{(r - r_f)x}. We now assume that under the historical measure Q the market price of risk, \lambda, follows the mean-reverting stochastic process

d\lambda(t) = \kappa(\bar{\lambda} - \lambda(t))\,dt + \sigma_\lambda\,dW(t), \qquad (13.10)
where \bar{\lambda} is the long-term average of the market price of risk and \kappa defines the speed of mean reversion. Here we assume that the same noise process drives both the spot exchange rate and the market price of risk. It would of course also be possible to consider a second, independent Wiener process driving the stochastic differential equation for \lambda; however, we leave investigation of this issue for future research. It should be pointed out here that, when discretised, the stochastic differential equation (13.10) becomes a low order ARMA type process of the kind reported in Wolff (1987) and Cheung (1993). The parameters in
equation (13.10) may be estimated from the data using the Kalman filter as pointed out earlier. Considering that we have one forward price, F(t, x), we have a system of three stochastic differential equations. These are (under the measure Q)

dS = (r - r_f + λσ_s)S dt + σ_s S dW(t) ,   (13.11a)
dλ = κ(λ̄ - λ) dt + σ_λ dW(t) ,   (13.11b)
dF = (r - r_f + λσ_s)F dt + σ_s F dW(t) ,   (13.11c)
where S(0) = S_0, λ(0) = λ_0 and F(0, x) = S_0 e^{(r-r_f)x}. It should be noted that the information contained in equations (13.11a)-(13.11c) is also contained in the pricing relationship

F(t, x) = S(t) e^{(r-r_f)x} .   (13.12)
To estimate the parameters in the filtering framework, however, we choose to work with equation (13.11c). From equation (13.3), using s(t) = ln S(t), we can write the spot price at time t + x as

s(t + x) = s(t) + (r - r_f - σ_s²/2)x + σ_s ∫_t^{t+x} λ(u) du + σ_s ∫_t^{t+x} dW(u) .   (13.13)
From equation (13.13) we can write the expected value of s(t + x) as

E_t[s(t + x)] = s(t) + (r - r_f - σ_s²/2)x + σ_s E_t[∫_t^{t+x} λ(u) du] .   (13.14)
The calculations outlined in the appendix then allow us to write

E_t[s(t + x)] = s(t) + (r - r_f - σ_s²/2)x + σ_s [λ̄x + (λ(t) - λ̄)(1 - e^{-κx})/κ] .   (13.15)
The above equation may also be expressed (via use of equation (13.7)) as

E_t[s(t + x)] = f(t, x) - σ_s²x/2 + σ_s [λ̄x + (λ(t) - λ̄)(1 - e^{-κx})/κ] ,   (13.16)

where f(t, x) = ln F(t, x).
Let π(t, x) represent the risk premium (under Q) for the x period ahead spot rate; then from equation (13.16)

π(t, x) = f(t, x) - E_t[s(t + x)] = σ_s²x/2 - σ_s [λ̄x + (λ(t) - λ̄)(1 - e^{-κx})/κ] .   (13.17)
We pointed out in the introduction that previous studies attributed the difference between the forward rate and the subsequently realised spot exchange rate to a risk premium and the unexpected depreciation of the exchange rate. Equation (13.17) gives an explicit expression for the risk premium, characterising how the market price of risk enters the expectation formation and thus influences the risk premium. The integral terms involving the Wiener increments in equation (13.13) should be related to the noise terms identified in Wolff (1987) and Cheung (1993). We would now like to compare the time variation of risk premia for one-month forward rates obtained from equation (13.17) for different exchange rates. This will require estimates of the parameters describing the stochastic process for λ given by equation (13.10). In the next section, we describe the state-space formulation of the system and estimation of these parameters as well as the filtered and smoothed estimates of λ(t). We would also like to compare this with the risk premia obtained from the approach outlined in Wolff (1987) and Cheung (1993). It should be pointed out that our method can easily be applied to multiple forward exchange rates and thereby help us examine the term structure of forward risk premia present in quoted forward exchange rates. Besides, our method is not reliant upon data synchronisation with respect to matching the forward rate period, as opposed to the methods in Wolff (1987) and Cheung (1993). In
the following section we also briefly describe the method of Wolff/Cheung so as to facilitate comparison with the approach presented in this chapter.
13.4 State-Space Framework

The dynamics of the spot exchange rate, forward exchange rate and the market price of risk are described in equations (13.11a) through (13.11c). The main consideration in state-space formulation is the separation of the noise driving the system dynamics and the observational noise. The measurements in practical systems are not necessarily the variables driving the system but some transformation of these masked by the measurement noise. Also, in most cases not all the driving variables are directly observable, thus leading to partially observed systems. Similarly, in our model the market price of risk is not observable and hence we are dealing with a partially observed system of three variables. We assume that the contribution to the observation noise in our system comes from different sources, e.g. the time of measurement (i.e. at the beginning of the trading day or the end of the trading day, etc.), or the spread in the quoted forward exchange rates. We also assume that this measurement noise is independent of the noise source driving the system dynamics, i.e. W. For the purposes of implementation and estimation we need to discretise the continuous time dynamics given by equations (13.11a) through (13.11c). Although a number of different approaches are available (see Kloeden and Platen (1992)) we choose to work with the Euler-Maruyama scheme. As can be seen from equations (13.11a) and (13.11c), the diffusion terms are dependent on the state variables themselves and are thus stochastic in nature. We can avoid dealing with the stochastic diffusion system by a simple transformation of variables and application of Ito's lemma. We use the natural logarithm of the spot and forward exchange rates and this transforms the system into one with constant diffusion terms. The transformed stochastic differential equations, with s = ln(S) and f = ln(F), are

ds = (r - r_f + λσ_s - σ_s²/2) dt + σ_s dW(t) ,   (13.18)
df = (r - r_f + λσ_s - σ_s²/2) dt + σ_s dW(t) .   (13.19)
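To make the discretisation step concrete, the following sketch simulates the transformed system above, together with (13.10), using the Euler-Maruyama scheme just described; the discretised equations (13.20)-(13.22) below take exactly this form. All parameter values, the monthly time step and the starting values are illustrative assumptions, not the estimates reported later in Table 13.1.

```python
import numpy as np

# Illustrative Euler-Maruyama simulation of the system in (13.18), (13.19) and (13.10).
# A single Wiener increment drives the spot rate, the forward rate and the market
# price of risk, as assumed in the text.  All parameter values are hypothetical.
def simulate_fx_system(s0, f0, lam0, r, rf, sigma_s, kappa, lam_bar, sigma_lam,
                       dt=1.0 / 12.0, n_steps=156, seed=0):
    rng = np.random.default_rng(seed)
    s = np.empty(n_steps + 1)
    f = np.empty(n_steps + 1)
    lam = np.empty(n_steps + 1)
    s[0], f[0], lam[0] = s0, f0, lam0
    for k in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal()                 # common shock
        drift = (r - rf + lam[k] * sigma_s - 0.5 * sigma_s ** 2) * dt
        s[k + 1] = s[k] + drift + sigma_s * dW                   # discretised (13.18)
        f[k + 1] = f[k] + drift + sigma_s * dW                   # discretised (13.19)
        lam[k + 1] = lam[k] + kappa * (lam_bar - lam[k]) * dt + sigma_lam * dW  # (13.10)
    return s, f, lam

s, f, lam = simulate_fx_system(s0=np.log(0.65), f0=np.log(0.65) + 0.002, lam0=0.1,
                               r=0.05, rf=0.04, sigma_s=0.03,
                               kappa=8.0, lam_bar=0.1, sigma_lam=0.05)
```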
After discretisation of equations (13.11b), (13.18) and (13.19) we obtain, for the time interval between k and k+1,

s_{k+1} = s_k + (r - r_f + λ_k σ_s - σ_s²/2)Δt + σ_s ΔW(t) ,   (13.20)
λ_{k+1} = λ_k + κ(λ̄ - λ_k)Δt + σ_λ ΔW(t) ,   (13.21)
f_{k+1} = f_k + (r - r_f + λ_k σ_s - σ_s²/2)Δt + σ_s ΔW(t) ,   (13.22)
where ΔW(t) = W(t) - W(t - Δt) ~ N(0, Δt). Equations (13.20)-(13.22) describe the dynamics of the partially observed system and in the state-space framework they are generally referred to as the state transition equations. In a multivariate situation it is convenient to express these in matrix notation and, following Harvey (1990), this turns out as follows:
where
and η_t is a (2×1) vector of noise sources that are serially uncorrelated, with expected values zero and covariance matrix
The observations in our system are related to the state variables in an obvious way as
where
The variance of the measurement errors is represented by h for both the observed variables. We are also assuming in this set-up that the noise sources in the state and the measurement equations are independent of each other. The estimation process for the state-space system is adaptive in nature and thus requires specification of the initial state vector. As suggested in Harvey, Ruiz and Shephard (1994), the first observations can be used to initialise it if non-stationarity is suspected. Another application of the state-space formulation by the authors, in the context of a non-Markovian term structure model, can be found in Bhar and Chiarella (1997). The full mathematical details of the Kalman filter algorithm are discussed in chapters 8 and 9.
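As a companion to the description above, the following sketch implements a generic linear Gaussian Kalman filter recursion of the kind referred to here (and treated fully in chapters 8 and 9), together with the prediction error decomposition of the log likelihood. The system matrices T, R, Z and H and the initialisation are placeholders for the model-specific quantities; this is a sketch under those assumptions, not the book's own implementation.

```python
import numpy as np

# Hedged sketch of the Kalman filter recursion and the prediction-error log likelihood
# for a generic linear Gaussian state-space model:
#   state:       a_{k+1} = T a_k + R eps_k,   eps_k ~ N(0, I)
#   measurement: y_k     = Z a_k + e_k,       e_k  ~ N(0, H)
# A diffuse prior can be mimicked by a large P0, or a0 set from the first observations.
def kalman_filter_loglik(y, T, R, Z, H, a0, P0):
    a, P, loglik = a0.copy(), P0.copy(), 0.0
    filtered = []
    for y_k in y:                                   # y has shape (N, p)
        a_pred = T @ a                              # prediction step
        P_pred = T @ P @ T.T + R @ R.T
        v = y_k - Z @ a_pred                        # prediction error
        F = Z @ P_pred @ Z.T + H                    # prediction error covariance
        F_inv = np.linalg.inv(F)
        loglik += -0.5 * (len(y_k) * np.log(2.0 * np.pi)
                          + np.log(np.linalg.det(F)) + v @ F_inv @ v)
        K = P_pred @ Z.T @ F_inv                    # Kalman gain
        a = a_pred + K @ v                          # update step
        P = P_pred - K @ Z @ P_pred
        filtered.append(a.copy())
    return loglik, np.array(filtered)
```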
13.5 Brief Description of the Wolff/Cheung Model

The main idea in modelling the forward exchange rate risk premia in their model is the assumption that the forecast error resulting from the forward rate as the predictor of the future spot exchange rate consists of a premium component and a white noise error. In this context,

F_{t,t+1} - S_{t+1} = P_t + ε_{t+1} .
Here ε_t is an uncorrelated zero-mean sequence and P_t is the unobserved risk premium. In terms of the state-space system representation this is the measurement equation. Both Wolff and Cheung determine the dynamics of the risk premium by studying the time series properties (in the Box-Jenkins sense) of the quantity (F_{t,t+1} - S_{t+1}). As suggested in Wolff (2000), for most currencies either an ARMA(1,1) or an MA(1) representation is adequate. The corresponding equations for P_t for AR(1), ARMA(1,1) and MA(1), respectively, are given by

P_t = φP_{t-1} + v_t ,
P_t = φP_{t-1} + v_t + θv_{t-1} ,
P_t = v_t + θv_{t-1} .
The state equation matrices (with reference to equation (13.23)) are given below for each of these time series representations (assuming ν² is the variance of the innovation v_t). The estimation process for these models is the same as described earlier in the context of our model. For the AR(1) representation:
For the ARMA(1,1) representation:
For the MA(1) representation the matrices are similar to those in the case of ARMA(1,1) with the restriction that φ = 0.
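The following sketch shows one common way of writing the AR(1), ARMA(1,1) and MA(1) premium dynamics in state-space form. The exact arrangement of the matrices in equation (13.23) may differ in the book; the function name and the illustrative parameter values are assumptions.

```python
import numpy as np

# Hedged sketch: state-space matrices for the Wolff/Cheung premium dynamics.
# T: transition, R: selection of the innovation v_t, Z: measurement loading,
# Q: state innovation covariance (nu2 is the variance of v_t).  MA(1) is obtained
# from ARMA(1,1) by restricting phi = 0, as noted in the text.
def premium_state_space(model, phi=0.0, theta=0.0, nu2=1.0):
    if model == "AR1":                       # P_t = phi P_{t-1} + v_t
        T = np.array([[phi]])
        R = np.array([[1.0]])
        Z = np.array([[1.0]])
    elif model in ("ARMA11", "MA1"):         # P_t = phi P_{t-1} + v_t + theta v_{t-1}
        if model == "MA1":
            phi = 0.0
        T = np.array([[phi, 1.0],            # state alpha_t = (P_t, theta * v_t)'
                      [0.0, 0.0]])
        R = np.array([[1.0],
                      [theta]])
        Z = np.array([[1.0, 0.0]])
    else:
        raise ValueError("model must be 'AR1', 'ARMA11' or 'MA1'")
    Q = nu2 * (R @ R.T)
    return T, R, Z, Q

T, R, Z, Q = premium_state_space("ARMA11", phi=0.9, theta=-0.7, nu2=0.0004)
```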
13.6 Application of the Model and Data Description

As part of our empirical investigation we apply the methodology developed here to five different exchange rates, all against the U.S. dollar. These are the Australian dollar (AUD), German mark (DEM), French franc (FRF), British pound (GBP) and the Japanese yen (JPY). The data set covers the period January 1986 to December 1998 with 156 observations in each series. We use only the one-month forward exchange rates so that the results can be compared directly with those from the implementation of the Wolff/Cheung methodology. It should be pointed out that our modelling approach does not require that the observations be properly aligned with the maturity of the forward rates. The exchange rate data reflect the daily 4PM London quotation obtained from Datastream and the interest rate data are the daily closing one-month Euro currency deposit rates. To start the adaptive algorithm of the Kalman filter we initialise the state vector with the first observations. The algorithm also requires specifying the prior covariance matrix for the state vector. In the absence of any specific knowledge about the prior distribution we use the diffuse prior specification following Harvey (1990, p. 121). See also chapters 8 and 9 for additional information in this respect. The parameter estimates are obtained by maximizing the log likelihood function given by equation (8.35) in chapter 8. The numerical optimization algorithm called 'Newton' in GAUSS is used for this purpose without any parameter constraints. The results of the estimation procedure are shown in Table 13.1. The t-statistics reported in that table are computed from the standard errors obtained from the heteroscedasticity consistent covariance matrix of the parameters at the point of convergence. All the parameters of the model except the long-term average market price of risk are statistically significant for each of the currencies. The estimated parameter σ_s compares favourably with the sample estimates obtained from the spot exchange rate series (not reported separately). How the model fits the data is best analysed by examining the residuals from the
estimation process. These are reported in Table 13.2. One of the main requirements is that the residual be serially uncorrelated both in its level and its squared form. The portmanteau test and the ARCH test support this requirement for all the currencies examined. As the Kalman filter generated residuals are recursive in nature, two other tests are carried out to judge the model adequacy. These are the modified Von Neumann ratio and the recursive t-tests (Harvey 1990, page 157). Both these tests support our modelling approach. Finally, the conditional normality assumption made in the modelling is also supported by the Kolmogorov-Smirnov test statistics reported in Table 13.2. Since we are interested in comparing and contrasting the modelling approach developed here with that reported earlier by Wolff/Cheung, we present the estimation results of their models for the same set of currencies in Table 13.3. As can be seen, most parameters are statistically significant. We subject the model residuals to the same set of tests as in Table 13.2 (although in their original papers Wolff/Cheung do not report these diagnostics). The results reported in Table 13.4 support all the model adequacy tests. For the model developed in this chapter the risk premia contained in the one-month forward exchange rate can be computed easily from equation (13.17) with the help of the estimated parameters from Table 13.1 and the filtered (or smoothed) estimates of the market price of risk. Since the Wolff/Cheung method does not provide these risk premia directly we do not analyse this aspect any further. Next, we compare the one-month ahead prediction of the spot exchange rate with the realised exchange rate and thus generate the mean absolute prediction error and the root mean squared prediction error for each of the exchange rate series. In the context of the Kalman filter this is really an ex ante prediction error since the prediction of the measurement variable for time k+1 is made utilising information up to and including time k. This is true for the Wolff/Cheung model as well because of the way we have implemented it. The comparative results are shown in Table 13.5 for our model, the Wolff/Cheung model as well as a martingale process. The overall conclusion from examining Table 13.5 is that both our model and the Wolff/Cheung model perform better than the martingale process. There is, however, not much difference in forecasting performance between our model and the Wolff/Cheung model. It should, however, be remembered that our model can be implemented for a data set of any observation frequency whereas the Wolff/Cheung approach is limited to data sets where the spot exchange rate frequency aligns with the forward rate data used.
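The forecast comparison reported in Table 13.5 can be reproduced, in outline, as follows. The arrays of realised log spot rates and of one-step-ahead predictions from the filter are assumed inputs; this is only a sketch of the error calculations, not of the full estimation.

```python
import numpy as np

# Illustrative computation of the MAE and MSE comparison behind Table 13.5.
# 's' holds realised log spot rates and 's_pred' the ex ante one-step-ahead
# predictions produced during the Kalman filter recursion (assumed inputs).
def mae_mse(errors):
    return np.mean(np.abs(errors)), np.mean(errors ** 2)

def model_forecast_errors(s, s_pred):
    return mae_mse(s[1:] - s_pred[1:])       # prediction for k+1 uses information up to k

def martingale_forecast_errors(s):
    return mae_mse(s[1:] - s[:-1])           # benchmark: today's rate predicts tomorrow's
```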
Table 13.1 Parameter estimates for the model based on market price of risk (columns: AUD, DEM, FRF, GBP, JPY)
Numbers in parentheses are t-statistics computed from standard errors obtained using the heteroscedasticity consistent covariance matrix at the point of convergence.
Table 13.2 Residual diagnostics and model (in Table 13.1) adequacy tests
         AUD     DEM     FRF     GBP     JPY
Port.    0.226   0.080   0.482   0.091   0.286
ARCH     0.702   0.474   0.494   0.342   0.608
MNR      0.832   0.996   0.871   0.897   0.600
Rec. T   0.597   0.887   0.917   0.857   0.956
KS       0.042   0.055   0.082   0.068   0.064
Note: diagnostics are computed from the recursive residual of the measurement equation, which corresponds to the spot exchange rate process. The null hypothesis in the portmanteau test is that the residuals are serially uncorrelated. The ARCH test checks for no serial correlation in the squared residual up to lag 26. Both these tests are applicable to recursive residuals as explained in Wells (1996, page 27). MNR is the modified Von Neumann ratio test using the recursive residual for model adequacy (see Harvey (1990, chapter 5)). Similarly, if the model is correctly specified then Recursive T has a Student's t-distribution (see Harvey (1990, page 157)). The KS statistic represents the Kolmogorov-Smirnov test statistic for normality. The 95% significance level in this test is 0.109. When the KS statistic is less than 0.109 the null hypothesis of normality cannot be rejected.
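A sketch of how the residual diagnostics reported in Tables 13.2 and 13.4 might be computed from the recursive residuals is given below. The portmanteau, ARCH and Kolmogorov-Smirnov tests use standard library routines; the modified Von Neumann ratio and the recursive t-test are omitted here, and the lag length of 26 follows the text.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox, het_arch

# Hedged sketch of residual diagnostics on the recursive residuals from the filter.
def residual_diagnostics(resid, lags=26):
    resid = np.asarray(resid, dtype=float)
    port_p = acorr_ljungbox(resid, lags=[lags])["lb_pvalue"].iloc[0]   # portmanteau test
    arch_p = het_arch(resid, nlags=lags)[1]                            # ARCH LM test on squares
    z = (resid - resid.mean()) / resid.std(ddof=1)
    ks_stat = stats.kstest(z, "norm").statistic                        # KS normality statistic
    return {"Port.": port_p, "ARCH": arch_p, "KS": ks_stat}
```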
Table 13.3 Parameter estimates for Wolff (1987) and Cheung (1993) models
Columns: φ, θ, ν, ε.
AUD: 0.9439 (23.18), -0.7150 (-6.57), -0.7183 (-7.40), 0.0261 (17.01), 0.0000 (0.000)
DEM: 0.9206 (5.12), 0.0142 (1.64), 0.0277 (4.28)
FRF: 0.9189 (9.70), 0.0077 (5.56), 0.0288 (15.31)
GBP: 0.6318 (5.90), 0.0269 (6.14), 0.0159 (2.25)
JPY: 0.9311 (3.91), 0.0368 (10.27), 0.0054 (0.57)
Numbers in parentheses are t-statistics computed from standard errors obtained using the heteroscedasticity consistent covariance matrix at the point of convergence.
Table 13.4 Residual diagnostics and model (in Table 13.3) adequacy tests
       Port.   ARCH    MNR     Rec. T   KS
AUD    0.130   0.769   0.297   0.925    0.035
DEM    0.428   0.938   0.604   0.482    0.055
FRF    0.591   0.937   0.379   0.275    0.059
GBP    0.270   0.420   0.486   0.287    0.083
JPY    0.458   0.551   0.539   0.942    0.063
Notes of Table 13.2 apply here.
Table 13.5 One step ahead forecast error for spot exchange rate
                                   MAE       MSE
AUD   Market price of risk         0.0205    0.0007
      Cheung/Wolff                 0.0206    0.0007
      Martingale process           0.0279    0.0013
DEM   Market price of risk         0.0258    0.0011
      Cheung/Wolff                 0.0254    0.0010
      Martingale process           0.0451    0.0035
FRF   Market price of risk         0.0248    0.0010
      Cheung/Wolff                 0.0241    0.0009
      Martingale process           0.1446    0.0344
GBP   Market price of risk         0.0235    0.0010
      Cheung/Wolff                 0.0250    0.0011
      Martingale process           0.0143    0.0004
JPY   Market price of risk         0.0286    0.0014
      Cheung/Wolff                 0.0290    0.0014
'MAE' and 'MSE' represent mean absolute error and mean squared error respectively. These are computed from the one step ahead forecast errors obtained during the Kalman filter recursion. These forecast errors are used to develop the prediction error form of the likelihood function. The Cheung/Wolff model refers to our somewhat modified implementation of their approach.
13.7 Summary and Conclusions

In this chapter we have presented a new approach to analyse the risk premium in forward exchange rates. This involves exploiting the relationship that links the spot exchange rate and the forward exchange rate through the market price of risk. By directly modelling the market price of risk as a mean reverting process we are able to show how the market price of risk enters into expectation formation for a future spot exchange rate. This methodology allows us to quantify the risk premium associated with a particular forward exchange rate in terms of the parameters of the process describing the market price of risk. We also demonstrate how these parameters can be estimated in a state-space framework by application of the Kalman filter. This procedure, in turn, generates the filtered and the smoothed estimates for the unobserved market price of risk.
We apply the procedure developed in this chapter to the AUD, DEM, FRF, GBP and JPY, all against the USD, and use one-month forward exchange rates. Various model diagnostics support the modelling approach. We also compare our results with the models proposed by Wolff (1987) and Cheung (1993). We also point out that our approach can be applied to any frequency of data whereas the model of Wolff/Cheung would not work unless the forward rate maturity matches the observation frequency.
Appendix: Calculation of E_t[∫_t^{t+x} λ(u) du]
By an application of Ito's lemma the stochastic differential equation for λ (equation 13.10) can be expressed as

d(e^{κt} λ(t)) = κλ̄ e^{κt} dt + σ_λ e^{κt} dW(t) .   (A13.1)
Integrating (A13.1) from t to τ (> t),

e^{κτ} λ(τ) = e^{κt} λ(t) + λ̄(e^{κτ} - e^{κt}) + σ_λ ∫_t^τ e^{κu} dW(u) ,   (A13.2)
from which

λ(τ) = λ̄ + (λ(t) - λ̄)e^{-κ(τ-t)} + σ_λ ∫_t^τ e^{-κ(τ-u)} dW(u) .   (A13.3)
Now integrating (A13.3) from t to t + x,

∫_t^{t+x} λ(τ) dτ = λ̄x + (λ(t) - λ̄)(1 - e^{-κx})/κ + σ_λ ∫_t^{t+x} ∫_t^τ e^{-κ(τ-u)} dW(u) dτ .
The first two integrals in the foregoing equation are readily evaluated. However, in order to proceed, the third integral needs to be expressed as a standard stochastic integral, having the dW(u) term in the outer integration. This is achieved by an application of Fubini's theorem (see Kloeden and Platen (1992)), which essentially allows us to interchange the order of integration in the obvious way. Thus,

∫_t^{t+x} ∫_t^τ e^{-κ(τ-u)} dW(u) dτ = ∫_t^{t+x} [(1 - e^{-κ(t+x-u)})/κ] dW(u) .
Thus,

E_t[∫_t^{t+x} λ(u) du] = λ̄x + (λ(t) - λ̄)(1 - e^{-κx})/κ .
References

Backus D, Gregory A, Telmer C (1993) Accounting for forward rates in markets for foreign currency. Journal of Finance, 48: 1887-1908
Baillie R, Bollerslev T (1990) A multivariate generalized ARCH approach to modelling risk premia in forward exchange markets. Journal of International Money and Finance, 9: 309-324
Bakshi GS, Naka A (1997) Unbiasedness of the forward exchange rates. The Financial Review, 32: 145-162
Bekaert G (1994) Exchange rate volatility and deviation from unbiasedness in a cash-in-advance model. Journal of International Economics, 36: 29-52
Bhar R, Chiarella C (2000) Analysis of time varying exchange rate risk premia. In: Dunis CL (ed) Advances in Quantitative Asset Management, Kluwer, Dordrecht, pp. 255-273
Bhar R, Chiarella C (1997) Interest rate futures: estimation of volatility parameters in an arbitrage-free framework. Applied Mathematical Finance, 4: 1-19
Boudoukh J, Richardson M, Smith T (1993) Is the ex ante risk premium always positive? Journal of Financial Economics, 34: 387-408
Canova F (1991) An empirical analysis of ex ante profits from forward speculation in foreign exchange markets. Review of Economics and Statistics, 73: 489-496
Canova F, Ito T (1991) The time series properties of the risk premium in the yen/dollar exchange market. Journal of Applied Econometrics, 6: 125-142
Canova F, Marrinan J (1993) Profits, risk and uncertainty in foreign exchange markets. Journal of Monetary Economics, 32: 259-286
Cheung Y (1993) Exchange rate risk premiums. Journal of International Money and Finance, 12: 182-194
Cochrane JH (1999) New facts in finance. NBER Working Paper No. 7169
Dumas B (1993) Partial- vs. general-equilibrium models of the international capital market. NBER Working Paper No. 4446
Dunis C, Keller A (1995) Efficiency tests with overlapping data: an application to the currency option market. European Journal of Finance, 1: 345-366
Engel C (1996) The forward discount anomaly and the risk premium: a survey of recent evidence. Journal of Empirical Finance, 3: 123-192
Harvey AC (1990) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge
Harvey AC, Ruiz E, Shephard N (1994) Multivariate stochastic variance models. Review of Economic Studies, 61: 247-264
Hull JC (1997) Options, futures, and other derivatives. 3rd edn. Prentice Hall International Inc
Jazwinski AH (1970) Stochastic processes and filtering theory. Academic Press, New York
Kloeden PE, Platen E (1992) Numerical solution of stochastic differential equations. Springer-Verlag, Berlin
Lucas RE (1982) Interest rates and currency prices in a two country world. Journal of Monetary Economics, 10: 335-360
Nijman TE, Palm FC, Wolff CCP (1993) Premia in forward exchange rate as unobserved components. Journal of Business and Economic Statistics, 11: 361-365
Ostdiek B (1998) The world ex ante risk premium: an empirical investigation. Journal of International Money and Finance, 17: 967-999
Phillips PCB, McFarland JW (1997) Forward exchange market unbiasedness: the case of the Australian dollar since 1984. Journal of International Money and Finance, 16: 885-907
Ross SA, Westerfield RW, Jordan BJ (1998) Fundamentals of corporate finance. Irwin-McGraw-Hill
Stulz R (1994) International portfolio choice and asset pricing: an integrative survey. NBER Working Paper No. 4645
Wolff CCP (1987) Forward foreign exchange rates, expected spot rates, and premia: a signal-extraction approach. Journal of Finance, 42: 395-406
Wolff CCP (2000) Measuring the forward exchange risk premium: multi-country evidence from unobserved component models. Journal of International Financial Markets, Institutions and Money, 10: 1-8
Wu Y, Zhang H (1997) Forward premiums as unbiased predictors of future currency depreciation: a non-parametric analysis. Journal of International Money and Finance, 16: 609-623
14 Equity Risk Premia from Derivative Prices
14.1 Introduction

This chapter focuses on a topical and important area of finance theory and practice, namely the analysis of the equity market risk premium. In particular the chapter suggests a new approach to the estimation of the equity market risk premium by making use of the theoretical relationship that links it to the prices of traded derivatives and their underlying assets. The volume of trading in equity derivatives, particularly on broad indices, is enormous and it seems reasonable that prices of the underlying and the derivative should impound in them the market's view on the risk premium associated with the underlying. To our knowledge no attempt has been made to get at the risk premium from this perspective. The approach we adopt also has the advantage of quite naturally leading to a dynamic specification of the equity risk premium. This aspect of our framework is pertinent in the context that a great deal of recent research has pointed to the significance of time varying risk premia. Consider for instance research on the predictability of asset returns and capital market integration. If markets are completely integrated then assets with the same risk should have the same expected return irrespective of the particular market. Bekaert and Harvey (1995) use a time varying weight to capture differing price of variance risk across countries. Ferson and Harvey (1991) and Evans (1994) showed that although changes in covariance of returns induce changes in betas, most of the predictable movements in returns could be attributed to time changes in risk premia. Some authors have investigated the time variation of both the systematic and specific risks of portfolios in a number of equity markets using suitable dynamic specifications (such as GARCH-M type models) for return volatility, e.g. Giannopoulos (1995). According to the equilibrium capital asset pricing model (CAPM), expected return from a risky asset is directly related to the market risk premium through its covariance with the market return (i.e. its beta). Although
in CAPM beta is assumed to be time invariant, many studies (e.g. Bos and Newbold (1984), Bollerslev, Engle and Wooldridge (1988), Chan, Karolyi and Stulz (1992)) have confirmed instability of betas over time. These authors also show that betas of financial assets can be better described by some type of stochastic model and hence explore the conditional CAPM. It is in this context that the modeling of risk premia across time is important, particularly from the point of view of domestic fund managers looking to diversify their portfolios internationally. The fact that risk is time varying has significant implications for portfolio managers. This is because many risk management strategies are based on the assumption of a static measure of risk, which does not offer a satisfactory guide to its possible future evolution. The modeling of the dynamic behaviour of risk premia is a difficult exercise since the risk premium is not directly observable in the financial market. It can only be inferred from the prices of other related observable financial variables. Evans (1994) points out a number of information sources that can be used to measure risk premia. These are, for example, the lagged realized return on a one-month Treasury bill, the spread between the yield on one- and six-month Treasury bills, and the spread between the dividend-price ratio on the S&P 500 and the one-month Treasury bill. However, one encounters some significant econometric problems such as multicollinearity when attempting to estimate risk premia from these variables. Besides, the dynamic behaviour of risk premia is still not well captured by such regression-based techniques. In this chapter we propose to model the dynamic behaviour of risk premia using the stochastic differential equations for underlying price processes that arise from an application of the arbitrage arguments used to price derivatives on the underlying, such as index futures and options on such futures contracts. This stochastic differential system is considered under the so-called historical (or real world) probability measure rather than the risk neutral probability measure required for derivative security pricing. The link between these two probability measures is the risk premium. The price process can thus be expressed in a dynamic form involving observable prices of the derivative securities and their underlying assets and the unobservable risk premium. A mean reverting process for the dynamics of the risk premium is considered. This system of prices and risk premium can be treated as a partially observed stochastic dynamic system. In order to cater for the time variation of volatility we use the option implied volatility in the dynamic equations for the index and its derivatives. This quantity is in a sense treated as a signal that impounds the market's forward looking view on the equity risk premium. The resulting system of
stochastic differential equations can then be cast into a state-space form from which the risk-premia can be estimated using Kalman filtering methodology. We apply this approach to estimate the market risk premium at monthly frequency in the Australian and US markets over the period January 1995 to December 1999. The plan of the chapter is as follows. Section 14.2 lays out the theoretical framework linking the index, the futures on the index and the index futures option. The stochastic differential equations driving these quantities are expressed under both the risk-neutral measure and the historical measure. The role of the equity risk premium linking these two measures is then made explicit. In section 14.3 a stochastic differential equation modeling the dynamics of the market price of equity risk is specified. The dynamics of the entire system of index, index futures, index futures option and market price of equity risk is then laid out and interpreted in the language of state-space filtering, as in chapter 8. Section 14.4 describes the Kalman filtering set-up and how the equity risk premium is estimated following the ideas in chapter 9. Section 14.5 describes the data set used for empirical implementation. Section 14.6 gives the estimation results and various interpretations. Section 14.7 summarizes and concludes and makes suggestions for future research.
14.2 The Theory behind the Modeling Framework

We use S to denote the index value, F a futures contract on the index and C an option on the futures. We assume that S follows the standard lognormal diffusion process

dS = μS dt + σS dZ ,   (14.1)
where Z is a Wiener process under the historical probability measure P, μ is the expected instantaneous index return and σ its volatility. The spot/futures price relationship is

F = S e^{(r-q)(T-t)} ,   (14.2)
where q is the continuous dividend yield on the index and T is the maturity date of the index futures. Applying Ito's lemma to (14.2) we derive the stochastic differential equation (SDE) for F, viz.
dF = μ_F F dt + σ_F F dZ ,   (14.3)

where

μ_F = μ - (r - q) ,  σ_F = σ .   (14.4)
Application of the standard Black-Scholes hedging argument to a portfolio containing the call option and a position in the futures yields the stochastic differential equations for S, F and C, namely

dS = (r - q) S dt + σ S dZ̃ ,   (14.5a)
dF = σ_F F dZ̃ ,   (14.5b)
dC = r C dt + σ_c C dZ̃ .   (14.5c)

Here, Z̃ is a Wiener process under the risk-neutral measure and is related to the Wiener process Z under the historical measure P according to

Z̃(t) = Z(t) + ∫_0^t λ(u) du ,   (14.6)
where λ(t) is the instantaneous market price of risk of the index. This latter quantity can be interpreted from the expected excess return relation¹

μ - (r - q) = λσ ,   (14.7)
¹ We recall that the expected excess return relation (14.7) arises from expressing the condition of no riskless arbitrage between the index option and the underlying as (μ_C - r)/σ_C = μ_F/σ_F = λ.
as the amount investors require instantaneously to be compensated for a unit increase in the volatility of the index. In this study we interpret λσ as the risk premium of the market, as it measures the compensation that an investor would require above the cost-of-carry (= r - q for the index) to hold the market portfolio. The option return volatility σ_c is given by

σ_c = (F/C)(∂C/∂F)σ_F ,   (14.8)
and the partial derivative is the option delta with respect to the futures price. Equation (14.5) is converted into the traditional Black's (1976) futures call option pricing formula via the observation that Ce^{-rt} is a martingale under the risk-neutral probability measure; the option price is given by

C = e^{-r(T-t)} [F N(d₁) - K N(d₂)] ,   (14.9)
where K is the exercise price, N(·) is the standard normal distribution function and

d₁ = [ln(F/K) + σ²(T-t)/2] / (σ√(T-t)) ,  d₂ = d₁ - σ√(T-t) .   (14.10)
In the expression (14.10), T is the maturity of the option contract and is typically a few days before the futures delivery date². Our purpose is to use market values of S, F and C to extract information about the market price of risk, λ. Thus, we use equation (14.6) to convert the dynamic system (14.5) into a diffusion process under the historical measure P, namely

dS = (r - q + λσ) S dt + σ S dZ ,   (14.11a)
dF = λσ_F F dt + σ_F F dZ ,   (14.11b)
dC = (r + λσ_c) C dt + σ_c C dZ ,   (14.11c)
² In this study we treat the option maturity and futures maturity as contemporaneous.
where σ_c can be calculated from Black's model as

σ_c = σ (F/C) e^{-r(T-t)} N(d₁) .   (14.12)
Equations (14.11) describe the dynamic evolution of the value of the index, its futures price and the price of a call option on the futures under the historical probability measure, assuming that there are no arbitrage opportunities between these assets. The volatility σ and the market price of risk λ are the only unobservable quantities. In the next section we describe how filtering techniques may be used to infer these quantities from the market prices.
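For reference, Black's (1976) futures call price in equations (14.9)-(14.10), and the option return volatility of equation (14.12), can be evaluated as in the sketch below. The strike K and the numerical inputs in the example call are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Black's (1976) futures call price, cf. equations (14.9)-(14.10).
def black76_call(F, K, sigma, r, tau):
    d1 = (np.log(F / K) + 0.5 * sigma ** 2 * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return np.exp(-r * tau) * (F * norm.cdf(d1) - K * norm.cdf(d2))

# Option return volatility sigma_c = sigma * (F/C) * exp(-r*tau) * N(d1),
# cf. equations (14.8) and (14.12).
def option_return_volatility(F, C, K, sigma, r, tau):
    d1 = (np.log(F / K) + 0.5 * sigma ** 2 * tau) / (sigma * np.sqrt(tau))
    return sigma * (F / C) * np.exp(-r * tau) * norm.cdf(d1)

C = black76_call(F=1400.0, K=1375.0, sigma=0.18, r=0.05, tau=0.25)
sigma_c = option_return_volatility(F=1400.0, C=C, K=1375.0, sigma=0.18, r=0.05, tau=0.25)
```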
14.3 The Continuous Time State-Space Framework

A fundamental question is how the time variation of λ should be modelled. Here we have little theory to guide us, though we could appeal to a dynamic general equilibrium framework. However this in turn requires many assumptions such as specification of the utility function and process(es) for the underlying factor(s). For our empirical application we prefer to simply assume λ follows the mean reverting diffusion process

dλ = κ(λ̄ - λ) dt + σ_λ dZ .   (14.13)
Here, λ̄ is the long-run value of λ, κ is the speed of reversion and σ_λ is the standard deviation of changes in λ. We assume that the process for λ is driven by the same Wiener process that drives the index. The motivation for this assumption is the further assumption that the market price of risk is some function of S and t. An application of Ito's lemma would then imply that the dynamics for λ are driven by the Wiener process Z(t).
The specification (14.13) has a certain intuitive appeal. Through the mean reverting drift it captures the observation that ex-post empirical estimates of λ appear to be mean reverting. The only open issue with the specification (14.13) is whether we should specify a more elaborate volatility structure rather than just assuming σ_λ is constant. Here we prefer to let the data speak; if the specification (14.13) does not provide a good fit then it would seem appropriate to consider more elaborate volatility structures (and indeed also for the drift). Thus we end up considering a four dimensional stochastic dynamic system for S, F, C and λ which we write in full here:

dS = (r - q + λσ) S dt + σ S dZ ,   (14.14a)
dF = λσ_F F dt + σ_F F dZ ,   (14.14b)
dC = (r + λσ_c) C dt + σ_c C dZ ,   (14.14c)
dλ = κ(λ̄ - λ) dt + σ_λ dZ .   (14.14d)
It will be computationally convenient to express the system (14.14) in terms of logarithms of the quantities S, F and C. Thus our system becomes

ds = (r - q + λσ - σ²/2) dt + σ dZ ,   (14.15a)
df = (λσ_F - σ_F²/2) dt + σ_F dZ ,   (14.15b)
dc = (r + λσ_c - σ_c²/2) dt + σ_c dZ ,   (14.15c)
dλ = κ(λ̄ - λ) dt + σ_λ dZ ,   (14.15d)
where we set s = ln(S), f = ln(F) and c = ln(C). In filtering language equation (14.15) is in state-space form and we are dealing with a partially observed system since the prices s, f and c are observed but the market price of risk, λ, is not. In setting up the filtering framework in the next section it is most convenient to view λ as the unobserved state vector (here a scalar) and changes in s, f and c as observations dependent on the evolution of the state. We know from a great deal of empirical work that the assumption of a constant σ is not valid. Perhaps the most theoretically satisfactory way to cope with the non-constancy of σ would be to develop a stochastic volatility model. However we then would not have a simple option pricing model such as (14.9); furthermore this would introduce a further market price of risk, namely that for volatility, into our framework. Thus as a practical solution to handling the non-constancy of σ we shall use the implied volatility calculated from market prices using Black's model. Given a set of observations f and c we can use equation (14.9) to infer the implied volatility σ̂(f,c,t). Here we use a notation that emphasizes the functional dependence of σ̂ on f, c and t. This dependence becomes important when we set up the filtering algorithm in the next section. The corresponding option price volatility σ_c would be calculated from equation (14.12), bearing in mind that the quantity d₁ in equation (14.10) is now also viewed as a function of σ̂(f,c,t). Thus we write
σ_c(f, c, t) = σ̂(f, c, t) e^{(f-c)} e^{-r(T-t)} N(d₁(f, σ̂(f, c, t), t)) .   (14.16)

We can view the system (14.15) as a state-space system with (s, f, c, λ) being the state vector. This is a partially observed system in that we have observations of s, f and c but not of λ. It is worth making the point that by using the implied volatility we are using a forward-looking measure of volatility, as this quantity can be regarded as a signal that impounds the market's most up-to-date view about risk in the underlying index.
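The implied volatility σ̂(f, c, t) referred to above can be backed out from the observed futures and option prices by inverting Black's formula numerically. The sketch below uses a simple root-bracketing search; the strike, the bracketing interval and the sample inputs are assumptions.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Hedged sketch: invert Black's (1976) formula for the implied volatility.
def black76_call(F, K, sigma, r, tau):
    d1 = (np.log(F / K) + 0.5 * sigma ** 2 * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return np.exp(-r * tau) * (F * norm.cdf(d1) - K * norm.cdf(d2))

def implied_vol(C_obs, F, K, r, tau, lo=1e-4, hi=3.0):
    # find the sigma whose model price matches the observed option price
    return brentq(lambda sig: black76_call(F, K, sig, r, tau) - C_obs, lo, hi)

sigma_hat = implied_vol(C_obs=45.0, F=1400.0, K=1375.0, r=0.05, tau=0.25)
```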
14.4 Setting Up the Filtering Framework

The ideal framework to deal with estimation of partially observed dynamical systems is the Kalman filter. See, for example, Jazwinski (1970) and Lipster and Shiryaev (2000) as general references, and Harvey (1989) and Wells (1996) for economic and financial applications. Financial implementations of the Kalman filter are usually carried out in a discrete time setting as data are observed discretely. To this end we discretise the system (14.15) using the Euler-Maruyama discretisation, which has as one advantage that it retains the linear (conditionally) Gaussian feature of the continuous time counterpart. Considering first equation (14.15d) for the (unobserved) state variable X (= λ), after time discretisation its evolution from time period k (t = kΔt) to k+1 is given by

X_{k+1} = a_k + T X_k + R ε_{k+1} ,   (14.17)

where

a_k = κλ̄Δt ,  T = 1 - κΔt ,  R = σ_λ√Δt ,   (14.18)

and the disturbance term ε_k ~ N(0, 1) is serially uncorrelated. In filtering terminology equation (14.17) is known as the state transition equation. The observation equation in this system consists of changes in the log of the spot index, index futures and call option prices (obtained by discretising equations (14.15a)-(14.15c)). In matrix notation these are,
Here, H_k is the matrix relating the system noise ε_k to the observations, and we use σ̂_k and σ_{c,k} to denote the values of σ̂ and σ_c respectively at time kΔt. In addition to the system noise ε_k we have assumed in (14.19) the existence of an observation noise term Q_k η_k, where η_k ~ N(0, I) is serially uncorrelated and independent of the ε_k. The (3×3) diagonal matrix Q_k has elements whose values would depend on features (such as the bid-ask spread) of the market for each of the assets in the observation vector. Equation (14.19) can be written more compactly as

Y_k = d_k + D_k X_k + H_k ε_k + Q_k η_k ,   (14.21)
where we use Y_k to indicate the observation vector over the interval k to k+1, and its elements consist of the log price changes in s, f and c. In order to express the observation equation (14.21) in standard form we define the combined noise term

v_k = H_k ε_k + Q_k η_k ,   (14.22)

so that v_k ~ N(0, V_k), where

V_k = H_k H_k' + Q_k Q_k' .   (14.23)
With these notations the observation equation (14.21) may then be written

Y_k = d_k + D_k X_k + v_k .   (14.24)
The state transition equation (14.17) together with the observation equation (14.24) constitute a state-space representation to which the Kalman filter as outlined in Jazwinski (1970) and Harvey (1989) may be applied. The relevant issues are also given in chapters 8 and 9. It needs to be noted that we are dealing with the case in which there is correlation between the system noise and the observation noise³, since

E[v_k ε_k] = H_k .   (14.25)
With the system now in state-space form, the recursive Kalman filter algorithm can be applied to compute the optimal estimator of the state at time k, based on the information available at time k. This information set consists of the observations of Y up to and including time k. We also note that the basic assumption of Kalman filtering, viz. that the distribution of the evolution of the state vector is conditionally normal, is satisfied in our case since the Wiener increments are normal and the implied volatilities σ̂_k and σ_{c,k} (that affect the coefficients in the observation equation) depend on Y up to time (k-1). Therefore, the state variable is completely specified by the first two moments. It is these quantities that the Kalman filter computes as it proceeds from one time step to the next. Here we merely summarize these updating equations, full details of which are available in Jazwinski (1970), Lipster and Shiryaev (2000), Harvey (1989) and Wells (1996). Given the values of X_{k|k} and P_{k|k}, the optimal one step ahead predictor of X_{k+1|k} is given by (for k = 0, 1, ..., N-1)

X_{k+1|k} = a_k + T X_{k|k} ,   (14.26)
while the covariance matrix (here a scalar) of the predictor is given by,
³ See Jazwinski (1970), section 7.3, pp. 209-210.
P_{k+1|k} = T P_{k|k} T' + R R' .   (14.27)
The equations (14.26) and (14.27) are known as the prediction equations. Once the next new observation becomes available, the estimator of X_{k+1} in equation (14.26) can be updated as,
and
where
In order to clarify the notation we note that X_{k|k}, X_{k+1|k}, X_{k+1|k+1}, P_{k+1|k}, P_{k+1|k+1}, a_k, T and R are scalars, d_k, D_k and v_k are 3-dimensional column vectors, C_k is a 3-dimensional row vector and F_k, V_k are 3×3 matrices. The set of equations (14.26)-(14.30) essentially describes the Kalman filter and these are specified in terms of the initial values X_0 and Var(X_0) = P_0. Once these initial values are given, the Kalman filter produces the optimal estimator of the state vector as each new observation becomes available. It should be noted that equations (14.28) and (14.29) assume that the inverse of the matrix F_{k+1} exists. It may, however, be replaced, if needed, by a pseudo inverse. The updating equations step forward through the N observations. For in-sample estimation, as we are doing here, it is possible to improve the estimates of the state vector based upon the whole sample information. This is referred to as the Kalman smoother; it uses as the initial conditions the last observation, N, and steps backwards through the observations, at each step adjusting the mean and covariance matrix so as to better fit the observed data. The estimated mean and the associated covariance matrix at the Nth
observation are X_{N|N} and P_{N|N}, respectively. The following set of equations describes the smoother algorithm, for k = N, N-1, ..., 2. Although the smoothing procedure has been introduced in chapter 8, for the sake of completeness it is summarized below:

X_{k-1|N} = X_{k-1|k-1} + P*_{k-1} (X_{k|N} - X_{k|k-1}) ,
P_{k-1|N} = P_{k-1|k-1} + P*_{k-1} (P_{k|N} - P_{k|k-1}) P*_{k-1}' ,

where

P*_{k-1} = P_{k-1|k-1} T' P_{k|k-1}^{-1} .
Clearly, to implement the smoothing algorithm the quantities X_{k|k}, P_{k|k} generated during the forward filter pass must be stored. The quantity within the second parentheses on the R.H.S. in equation (14.28) is known as the prediction error. For the conditional Gaussian model studied here, it can be used to form the likelihood function, viz.

log L(θ) = -(Nm/2) ln(2π) - (1/2) Σ_{k=1}^{N} [ln|F_k| + v_k' F_k^{-1} v_k] ,   (14.32)

where m is the number of elements in the state vector (in this study equal to 1). To estimate the parameter vector θ ≡ (κ, λ̄, σ_λ) the likelihood function (14.32) can be maximized using a suitable numerical optimization procedure. This will yield the consistent and asymptotically efficient estimator θ̂ (Lo 1988).
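The backward (smoothing) pass described above can be sketched as follows for the scalar state. The routine consumes the filtered and one-step-ahead predicted moments stored during the forward pass; the array names and the scalar transition coefficient T are assumptions, and the book's own matrix arrangement may differ.

```python
import numpy as np

# Hedged sketch of the fixed-interval smoother for the scalar state X (= lambda).
# x_filt, P_filt : filtered means/variances X_{k|k}, P_{k|k} from the forward pass
# x_pred, P_pred : one-step predictions X_{k|k-1}, P_{k|k-1}
# T              : scalar state transition coefficient
def kalman_smoother(x_filt, P_filt, x_pred, P_pred, T):
    N = len(x_filt)
    x_sm, P_sm = x_filt.copy(), P_filt.copy()    # smoothed values start from X_{N|N}, P_{N|N}
    for k in range(N - 2, -1, -1):               # step backwards through the sample
        gain = P_filt[k] * T / P_pred[k + 1]     # smoother gain
        x_sm[k] = x_filt[k] + gain * (x_sm[k + 1] - x_pred[k + 1])
        P_sm[k] = P_filt[k] + gain * (P_sm[k + 1] - P_pred[k + 1]) * gain
    return x_sm, P_sm
```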
14.5 The Data Set

The estimation methodology is applied to monthly data from the Australian and US markets for the period January 1995 to December 1999. For the Australian market, we use the market index (All Ordinaries Index), index futures, and call options on the index futures for all four delivery months (March, June, September and December). For the US market we use the S&P 500, index futures and call options on the index futures for all four delivery months (March, June, September and December). The data were taken for the first trading day of each month. To avoid possible thin trading problems we construct a time series that uses only the last three months of a particular futures contract before switching to the next. For the Australian market, we collected all futures and futures options market data, including the implied volatility, from the Sydney Futures Exchange, and all the spot market data from Datastream. The 13-week Treasury note rate approximates the risk-free interest rate and the information on dividend yield is provided by the Australian Stock Exchange. For the US market, futures and futures options market data, including the implied volatility, were collected from the Futures Industry Institute and the US 3-month T-bill rate was taken from Datastream.
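The contract-rolling rule described above (use each quarterly futures contract only for its last three months before switching to the next) might be implemented along the following lines. The column names and the data layout are assumptions made purely for illustration; they do not describe the actual SFE or Futures Industry Institute files.

```python
import pandas as pd

# Hedged sketch of rolling quarterly contracts: keep each contract only when it is
# between one and three months from delivery, then take the nearest such contract
# on every observation date.  Columns 'date', 'delivery' and 'price' are assumed.
def roll_contracts(df):
    df = df.copy()
    df["months_to_delivery"] = ((df["delivery"].dt.year - df["date"].dt.year) * 12
                                + (df["delivery"].dt.month - df["date"].dt.month))
    window = df[(df["months_to_delivery"] >= 1) & (df["months_to_delivery"] <= 3)]
    return (window.sort_values(["date", "months_to_delivery"])
                  .groupby("date", as_index=False)
                  .first())
```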
14.6 Estimation Results

The estimation results are set out in Tables 14.1 and 14.2. Table 14.1 gives the results for the estimation of the coefficients κ, λ̄ and σ_λ for both the Australian and US markets. The numbers in parentheses below the parameters represent t-ratios and * indicates significance at the 5% level. The t-statistic focuses on the significance of the parameter estimates. We have also applied a range of other tests that focus on the goodness-of-fit of the model itself, in particular residual diagnostics and model adequacy. The relevant tests are the portmanteau test, the ARCH test, the KS (Kolmogorov-Smirnov) test, the MNR (modified von Neumann ratio) test and the recursive t-test. The results of these are displayed in Table 14.2. Entries are p-values for the respective statistics except for the KS statistic. These diagnostics are computed from the recursive residual of the measurement equation, which corresponds to the spot index process. The null hypothesis in the portmanteau test is that the residuals are serially uncorrelated and this hypothesis is clearly accepted. The ARCH test checks for no serial correlation in the squared residual up to lag 26 and the results in Table 14.2 indicate there is very little ARCH effect in the residuals. Both these tests are
applicable to recursive residuals as explained in Wells (1996, page 27). MNR is the modified Von Neumann ratio test using recursive residuals for model adequacy (Harvey 1990, chapter 5) and the results confirm model adequacy. Similarly, we conclude correct model specification on the basis of the recursive T, since if the model is correctly specified then the recursive T has a Student's t-distribution (Harvey 1990, page 157). The KS (Kolmogorov-Smirnov) statistic represents the test statistic for normality. The 95% and 99% significance levels in this test are 0.088 and 0.105 respectively (when the KS statistic is less than 0.088 or 0.105 the null hypothesis of normality cannot be rejected at the indicated level of significance) and so the results provide support for the normality assumption underpinning the Kalman filter approach. Overall the set of tests in Table 14.2 indicates a good fit for the model in both markets.

Table 14.1 Estimated parameters of market price of risk
                  κ               λ̄                 σ_λ
Australia (AOI)   17.41* (1.90)   1.1541* (0.3526)   0.0468* (0.0075)
USA (S&P)         9.81*           2.0218*            0.0242*
Data set spans monthly (beginning of month) observations from January 1995 to December 1999. The numbers in parentheses below the parameters represent standard errors. Significance at the 5% level is indicated by * and at the 1% level by **.

Table 14.2 Residual diagnostics and model adequacy tests
            Port.   ARCH    KS      MNR
Australia   0.785   0.603   0.099   0.465
USA         0.493   0.748   0.133   0.956
Entries are p-values for the respective statistics except for the KS statistic. These diagnostics are computed from the recursive residual of the measurement equation, which corresponds to the spot index process. The null hypothesis in the portmanteau test is that the residuals are serially uncorrelated. The ARCH test checks for no serial correlation in the squared residual up to lag 26. Both these tests are applicable to recursive residuals as explained in Wells (1996, page 27). MNR is the modified Von Neumann ratio test using the recursive residual for model adequacy (Harvey 1990, chapter 5). The KS statistic represents the Kolmogorov-Smirnov test statistic for normality. The 95% and 99% significance levels in this test are 0.179 and 0.214 respectively. When the KS statistic is less than 0.179 or 0.214 the null hypothesis of normality cannot be rejected at the indicated level of significance.
Fig. 14.1 Inferred risk premium for S&P 500 (USA): ex-post premium, smoothed estimate and smoothed ±2 S.D. band, monthly, March 1995 to November 1999.

Fig. 14.2 Inferred risk premium for AOI (Australia): ex-post premium, smoothed estimate and smoothed ±2 S.D. band, monthly, March 1995 to November 1999.
Table 14.3 Filtered mean and standard deviations (S.D.) for S&P risk premium
Month   Ex-post   Model   Model-2S.D.   Model+2S.D.
Table 14.3 Continued.
Average, S.D. and Corr. rows (recovered entries): 0.2071, 0.2744, 0.1541, 0.5488.
Corr. indicates the correlation between the model risk premium and the ex-post risk premium.
Table 14.4 Filtered mean and standard deviations (S.D.) for AOI risk premium
Month   Ex-post   Model   Model-2S.D.   Model+2S.D.
Table 14.4 Continued.
Average, S.D. and Corr. rows (recovered entries): Ex-post 0.0943; 0.1412, 0.2053, 0.7391.
Corr. indicates the correlation between the model risk premium and the ex-post risk premium.
The result of the procedure of stepping forward and then stepping backward through the filter updating equations yields estimates of the conditional mean X_{k|N} and variance P_{k|N} of the distribution of the market price of risk λ. These are turned into estimates of the conditional mean and variance of the equity risk premium at each k by appropriately scaling with the implied volatility σ̂_k. In Fig. 14.1 (for the S&P 500) and Fig. 14.2 (for the SFE) we plot the estimates of the conditional mean of the equity risk premium together with a two standard deviations band. The actual computed values are given in Tables 14.3 and 14.4. For comparison purposes we have also calculated the ex-post equity risk premium. This has been calculated simply by subtracting from monthly returns the proxy for the risk free interest rate. For the S&P 500 the ex-post estimates remain within the two standard deviations band about 60% of the time; furthermore most movements out of the band are in the downward direction. So compared to the estimates of the equity risk premium implied by index futures options prices the ex-post estimates tend to be underestimates. The two standard deviation band of the SFE is wider than that for the S&P 500, and the ex-post estimates remain within the band about 84% of the time. This could indicate a greater degree of uncertainty about the equity risk premium in the smaller Australian market.
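The ex-post comparison summarised above amounts to the simple calculation sketched below: subtract the risk-free proxy from the monthly index return and count how often the result falls inside the filtered (or smoothed) ±2 standard deviation band. The input arrays are assumptions, not the book's data.

```python
import numpy as np

# Illustrative check of the two-standard-deviation band coverage discussed above.
def expost_premium(index_return, risk_free):
    return np.asarray(index_return) - np.asarray(risk_free)

def band_coverage(expost, mean, sd, width=2.0):
    expost, mean, sd = map(np.asarray, (expost, mean, sd))
    inside = (expost >= mean - width * sd) & (expost <= mean + width * sd)
    return inside.mean()              # fraction of months inside the band
```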
14.7 Summary and Conclusions

In this chapter we have expressed the no-riskless arbitrage relationship between the value of the stock market index, the prices of futures on the index and the prices of options on the futures as a system of stochastic differential equations under the historical probability measure, rather than the risk neutral measure used for derivative pricing. As a consequence the stochastic differential equation system involves the market price of risk for the stochastic factor driving the index. This market price of risk is an unobserved quantity and we posit for its dynamics a simple mean-reverting process. We view the resulting stochastic dynamic system in the state-space framework with the changes in index value, futures prices and option prices as the observed components and the market price of risk as the unobserved component. In order to cater for time varying (and possibly stochastic) volatility we replace the volatility of the index by the implied volatility calculated by use of Black's model. We use Kalman filtering methodology to estimate the parameters of this system and use these to
estimate the time varying conditional normal distribution of the equity risk premium implied by futures options prices. The method has been applied to daily data on the Australian All Ordinaries index and options on the SPI futures, and to the S&P 500 and its index futures options, for the period 1995-1999. Estimations were performed at monthly frequency. As well as applying the usual t-test to determine the significance of the parameter estimates, a range of tests was conducted to determine the adequacy of the model. It was found that the parameter estimates are significant and the model fit is quite good based on a range of goodness-of-fit tests. The estimates of the conditional mean and standard deviation of the distribution of the equity risk premium seem reasonable when compared with point estimates computed simply from ex-post returns. For the S&P 500 the filtered estimates yield a much tighter band than the ex-post estimates. Overall we conclude that the approach of using filtering methodology to infer risk premia from derivative prices is a viable one and is worthy of further research effort. One advantage, as we have discussed in section 14.3, is that it gives a forward-looking measure of the risk premium. Also it gives a time varying distribution of the equity risk premium as opposed to the point estimates of the ex-post calculation. A number of avenues for future research suggest themselves. First, a careful comparison of the equity risk premium computed by the methods of this chapter with that calculated using the traditional method based on ex-post returns should be carried out. Second, the technique could be extended to options on heavily traded stocks and risk premia for individual stocks could be calculated. These could be used to determine the beta for the stock implied by the option prices. These in turn could be used as the basis of portfolio strategies and the results could be compared with the use of the beta calculated by traditional regression based methods.
References

Bekaert G, Harvey CR (1995) Time-varying world market integration. Journal of Finance, 50: 403-444
Black F (1976) The pricing of commodity contracts. Journal of Financial Economics, 3: 167-179
Bollerslev T, Engle RF, Wooldridge JM (1988) A capital asset pricing model with time varying covariances. Journal of Political Economy, 96: 116-131
Bos T, Newbold P (1984) An empirical investigation of the possibility of stochastic systematic risk in the market model. Journal of Business, 57: 35-41
Chan KC, Karolyi GA, Stulz RM (1992) Global financial markets and the risk premium on U.S. equity. Journal of Financial Economics, 32: 137-167
Evans MD (1994) Expected returns, time varying risk, and risk premia. Journal of Finance, 49: 655-679
Ferson WE, Harvey CR (1991) The variation of economic risk premiums. Journal of Political Economy, 99: 385-415
Giannopoulos K (1995) Estimating the time varying component of international stock market risk. European Journal of Finance, 1: 129-164
Harvey AC (1989) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge, New York
Jazwinski AH (1970) Stochastic processes and filtering theory. Academic Press, New York, London
Lipster RS, Shiryaev AN (2000) Statistics of random processes II. Springer-Verlag
Lo AW (1988) Maximum likelihood estimation of generalized Ito processes with discretely sampled data. Econometric Theory, 4: 231-247
Wells C (1996) The Kalman filter in finance. Kluwer Academic Publishers, Boston
Index
ACF (autocorrelation function), 41
ADF (augmented Dickey-Fuller) test, 47-48
AIC (Akaike information criterion), 48, 56
AR (autoregressive), 43, 86
AR(1), 43
AR(p), 44, 86
ARMA (autoregressive moving average), 86
ARMA(p, q), 87
ARCH (autoregressive conditional heteroskedasticity), 26, 68-71, 111
asset price basics, 127-129
callable bond, 152
causality-in-mean test, 75-76
causality-in-variance test, 76-77
CCF (cross correlation function), 74
classical regression, 83-86
coincident indicator, 99-101
cointegration, 49-50, 59-61
constrained optimization, 27-28
continuous time state space framework, 220-222
correlogram, 41
countable stochastic process, 10
decomposition of earnings, 121-125
dependent observations, 23-24
DLM (dynamic linear model), 83
Dickey-Fuller test, 46-49
discrete time model of interest rate, 141
discrete time real asset valuation model, 127
dynamic linear models for bubble solutions, 167-172
dynamic linear models for no-bubble solutions, 172-173
EGARCH (exponential GARCH), 71-73
EGARCH(1, 1), 73
EM (expectation maximization) algorithm, 108-111, 116-118
equity risk premia from derivative prices, 215
error correction, 59
evolution of commodity prices, 125-126
forward FX market and the risk premium, 193
forward recursion for lattice and elementary price, 145
Friedman's plucking model of business fluctuations, 118-121
function of random variable, 7-8
GARCH (generalized ARCH), 26-27, 68-71, 113-116
GARCH(1, 1), 26-27, 70-71, 113-116
GARCH-M (GARCH-in-Mean), 38
Granger causality, 57-59
global bubbles in stock markets, 155
global stock market integration, 165-167
I(0), 46
I(1), 46
information flow between price change and trading volume, 77-81
immunization, 149
Johansen test, 61-62
kernel regression, 33-35
kurtosis, 69
lognormal random variable, 9-10
LA-VAR (lag-augmented VAR), 62-64
MA (moving average), 42
MA(1), 42
MA(q), 43, 87
Markov chains, 10-14
matching the current term structure, 148-149
mining project, 129
MLE (maximum likelihood estimation), 21-22, 22-23
maximum eigenvalue test, 62
Nadaraya-Watson (kernel) estimator, 34-35, 37
non-parametric approach, 32-33
normal random variable, 8-9
optimal bandwidth selection, 36
passage time, 14-15
prediction error decomposition, 24-25
random variables, 5-6
recursive least squares, 89-91
residual-based cointegration test, 50-51
SBIC (Schwarz Bayesian Information Criterion), 48, 57
short rate lattice, 141-145, 149
signal extraction, 95-99
speculative bubbles, 156-158
spurious regression, 51
SSM (state-space model), 83, 105
state-space framework, 201-203
state-space representation, 91-94
state-space representation of a VARMA(2,1) model, 94-95
stationary process, 41-44, 55-57
stochastic regression, 102
stochastic variance, 113-116
term structure of interest rate, 148
TGARCH (threshold GARCH), 71-73
TGARCH(1, 1), 72
trace test, 61
unit root, 44-46
unit root in a regression model, 51-54
valuing callable bond, 152-153
VAR (vector autoregression), 55, 86
VARMA (vector autoregressive moving average), 86
VECM (vector error correction model), 61
volatility, 68
white noise process, 42
Wolff/Cheung model, 204-205
About the Authors
Dr. Ramaprasad Bhar is an Associate Professor in the School of Banking and Finance at The University of New South Wales in Australia.
Dr. Shigeyuki Hamori is a Professor in the Graduate School of Economics at Kobe University in Japan.