0; $i = 1, \ldots, N$.

4. $b$ is a vector of uncorrelated random variables such that $E(b_t) = 0$ and $\mathrm{var}(b_t) = \sigma_b^2$, where $\sigma_b^2 > 0$ and $t = 1, \ldots, T$.

5. $e_i = (e_{i1}, \ldots, e_{iT})'$ is a sample of a realization of a finite moving-average time series of order $m < T - 1$ for each $i$; hence,

\[ e_{it} = \alpha_0 \epsilon_t + \alpha_1 \epsilon_{t-1} + \cdots + \alpha_m \epsilon_{t-m}, \qquad t = 1, \ldots, T;\ i = 1, \ldots, N \]

where $\alpha_0, \alpha_1, \ldots, \alpha_m$ are unknown constants such that $\alpha_0 \neq 0$ and $\alpha_m \neq 0$, and $\{\epsilon_j\}_{j=-\infty}^{j=\infty}$ is a white noise process, that is, a sequence of uncorrelated random variables with $E(\epsilon_t) = 0$, $E(\epsilon_t^2) = \sigma_\epsilon^2$, and $\sigma_\epsilon^2 > 0$.

6. The sets of random variables $\{a_i\}_{i=1}^{N}$, $\{b_t\}_{t=1}^{T}$, and $\{e_{it}\}_{t=1}^{T}$ for $i = 1, \ldots, N$ are mutually uncorrelated.

7. The random terms have normal distributions $a_i \sim N(0, \sigma_a^2)$, $b_t \sim N(0, \sigma_b^2)$, and $\epsilon_{t-k} \sim N(0, \sigma_\epsilon^2)$, for $i = 1, \ldots, N$; $t = 1, \ldots, T$; and $k = 1, \ldots, m$.

If assumptions 1–6 are satisfied, then

\[ E(y) = X\beta \]

and

\[ \mathrm{var}(y) = \sigma_a^2 (I_N \otimes J_T) + \sigma_b^2 (J_N \otimes I_T) + (I_N \otimes \Psi_T) \]
1304 F Chapter 19: The PANEL Procedure
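where $\Psi_T$ is a $T \times T$ matrix with elements $\psi_{ts}$ as follows:

\[ \mathrm{Cov}(e_{it}\, e_{is}) = \begin{cases} \gamma(|t - s|) & \text{if } |t - s| \le m \\ 0 & \text{if } |t - s| > m \end{cases} \]

where $\gamma(k) = \sigma_\epsilon^2 \sum_{j=0}^{m-k} \alpha_j \alpha_{j+k}$ for $k = |t - s|$. For the definition of $I_N$, $I_T$, $J_N$, and $J_T$, see the section "Fuller and Battese's Method" on page 1292.

The covariance matrix, denoted by $V$, can be written in the form

\[ V = \sigma_a^2 (I_N \otimes J_T) + \sigma_b^2 (J_N \otimes I_T) + \sum_{k=0}^{m} \gamma(k) \left( I_N \otimes \Psi_T^{(k)} \right) \]

where $\Psi_T^{(0)} = I_T$, and, for $k = 1, \ldots, m$, $\Psi_T^{(k)}$ is a band matrix whose $k$th off-diagonal elements are 1's and all other elements are 0's.

Thus, the covariance matrix of the vector of observations $y$ has the form

\[ \mathrm{Var}(y) = \sum_{k=1}^{m+3} \nu_k V_k \]

where

\[ \nu_1 = \sigma_a^2, \qquad \nu_2 = \sigma_b^2, \qquad \nu_k = \gamma(k - 3), \quad k = 3, \ldots, m + 3 \]

\[ V_1 = I_N \otimes J_T, \qquad V_2 = J_N \otimes I_T, \qquad V_k = I_N \otimes \Psi_T^{(k-3)}, \quad k = 3, \ldots, m + 3 \]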
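The moving-average autocovariances and the banded matrix they populate can be sketched numerically. The following is a minimal illustration only, not PROC PANEL's implementation; the function names `ma_autocovariances` and `psi_matrix` and the example coefficients are assumptions made for this sketch. It evaluates gamma(k) as the white noise variance times the sum of alpha_j * alpha_(j+k) for j = 0, ..., m - k, and fills a T x T matrix whose (t, s) element is gamma(|t - s|) when |t - s| <= m and 0 otherwise.

```python
def ma_autocovariances(alphas, sigma2):
    """Return [gamma(0), ..., gamma(m)] for an MA(m) error with
    coefficients alphas = [alpha_0, ..., alpha_m] and white noise variance sigma2."""
    m = len(alphas) - 1
    return [
        sigma2 * sum(alphas[j] * alphas[j + k] for j in range(m - k + 1))
        for k in range(m + 1)
    ]


def psi_matrix(alphas, sigma2, T):
    """T x T covariance matrix of one cross section's MA(m) errors:
    element (t, s) is gamma(|t - s|) inside the band and 0 outside it."""
    gammas = ma_autocovariances(alphas, sigma2)
    m = len(gammas) - 1
    return [
        [gammas[abs(t - s)] if abs(t - s) <= m else 0.0 for s in range(T)]
        for t in range(T)
    ]


# Hypothetical MA(1) example: alpha_0 = 1, alpha_1 = 0.5, white noise variance 2.
gammas = ma_autocovariances([1.0, 0.5], sigma2=2.0)
psi = psi_matrix([1.0, 0.5], sigma2=2.0, T=4)
for row in psi:
    print(row)
```

Because the band is filled directly from gamma(|t - s|), the result equals the sum over k of gamma(k) times the k-th band matrix, which is the decomposition used for V.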
The estimator of $\beta$ is a two-step GLS-type estimator, that is, GLS with the unknown covariance matrix replaced by a suitable estimator of $V$. It is obtained by substituting Seely estimates for the scalar multiples $\nu_k$, $k = 1, 2, \ldots, m + 3$.

Seely (1969) presents a general theory of unbiased estimation when the choice of estimators is restricted to finite dimensional vector spaces, with a special emphasis on quadratic estimation of functions of the form $\sum_{i=1}^{n} \delta_i \nu_i$. The parameters $\nu_i$ ($i = 1, \ldots, n$) are associated with a linear model $E(y) = X\beta$ with covariance matrix $\sum_{i=1}^{n} \nu_i V_i$, where $V_i$ ($i = 1, \ldots, n$) are real symmetric matrices. The method is also discussed by Seely (1970a, 1970b) and Seely and Zyskind (1971). Seely and Soong (1971) consider the MINQUE principle, using an approach along the lines of Seely (1969).
Dynamic Panel Estimator

For an example of dynamic panel estimation using the GMM option, see "Example 19.6: The Cigarette Sales Data: Dynamic Panel Estimation with GMM" on page 1343.

Consider the case of the following general model:

\[ y_{it} = \sum_{l=1}^{\mathrm{maxlag}} \phi_l\, y_{i(t-l)} + \sum_{k=1}^{K} \beta_k x_{itk} + \gamma_i + \alpha_t + \epsilon_{it} \]

The $x$ variables can include ones that are correlated or uncorrelated to the individual effects, predetermined, or strictly exogenous. The $\gamma_i$ and $\alpha_t$ are cross-sectional and time series fixed effects, respectively. Arellano and Bond (1991) show that it is possible to define conditions that should result in a consistent estimator.

Consider the simple case of an autoregression in a panel setting (with only individual effects):

\[ y_{it} = \phi\, y_{i(t-1)} + \gamma_i + \epsilon_{it} \]

Differencing the preceding relationship results in:

\[ \Delta y_{it} = \phi\, \Delta y_{i(t-1)} + \nu_{it} \]

where $\nu_{it} = \epsilon_{it} - \epsilon_{i(t-1)}$.
Obviously, $y$ is not exogenous. However, Arellano and Bond (1991) show that it is still useful as an instrument, if properly lagged.

For $t = 2$ (assuming the first observation corresponds to time period 1) you have,

\[ \Delta y_{i2} = \phi\, \Delta y_{i1} + \nu_{i2} \]

Using $y_{i1}$ as an instrument is not a good idea since $\mathrm{Cov}(\epsilon_{i1}, \nu_{i2}) \neq 0$. Therefore, since it is not possible to form a moment restriction, you discard this observation.

For $t = 3$ you have,

\[ \Delta y_{i3} = \phi\, \Delta y_{i2} + \nu_{i3} \]

Clearly, you have every reason to suspect that $\mathrm{Cov}(\epsilon_{i1}, \nu_{i3}) = 0$. This condition forms one restriction.

For $t = 4$, both $\mathrm{Cov}(\epsilon_{i1}, \nu_{i4}) = 0$ and $\mathrm{Cov}(\epsilon_{i2}, \nu_{i4}) = 0$ must hold.

Proceeding in that fashion, you have the following matrix of instruments,

\[ Z_i = \begin{pmatrix} y_{i1} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & y_{i1} & y_{i2} & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & y_{i1} & y_{i2} & y_{i3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & \cdots & y_{i1} \cdots y_{i(T-2)} \end{pmatrix} \]

Using the instrument matrix, you form the weighting matrix $A_N$ as

\[ A_N = \left( \frac{1}{N} \sum_{i} Z_i' H_i Z_i \right)^{-1} \]
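The staircase layout of the instrument matrix is easier to see on a small case. The sketch below is an illustration under stated assumptions, not the procedure's internal code: the function name `instrument_matrix` is hypothetical, the input is the list of levels y_i1, ..., y_iT for a single cross section, and only the lagged dependent variable is used as an instrument. Row r (r = 1, ..., T - 2) places y_i1, ..., y_ir in its own block of columns.

```python
def instrument_matrix(y):
    """Build the block-staircase instrument matrix for one cross section.

    y holds the levels y_i1, ..., y_iT. Row r (1-based) carries
    y_i1, ..., y_ir in its own column block and zeros elsewhere."""
    T = len(y)
    n_rows = T - 2
    n_cols = n_rows * (n_rows + 1) // 2  # 1 + 2 + ... + (T - 2)
    Z = []
    offset = 0
    for r in range(1, n_rows + 1):
        row = [0.0] * n_cols
        row[offset:offset + r] = y[:r]  # instruments valid for this row
        Z.append(row)
        offset += r
    return Z


# Hypothetical series with T = 5 periods.
Z = instrument_matrix([1.0, 2.0, 3.0, 4.0, 5.0])
for row in Z:
    print(row)
```

With T = 5 the matrix has 3 rows and 6 columns, which shows how quickly the instrument count grows with T.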
The initial weighting matrix is

\[ H_i = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 2 \end{pmatrix} \]

Note that the maximum size of the $H_i$ matrix is $T - 2$. The initial weighting matrix originates from the expected error covariances. On the diagonal,

\[ E(\nu_{it} \nu_{it}) = E\left( \epsilon_{it}^2 - 2 \epsilon_{it} \epsilon_{i(t-1)} + \epsilon_{i(t-1)}^2 \right) = 2\sigma^2 \]

and on the off-diagonals,

\[ E(\nu_{it} \nu_{i(t-1)}) = E\left( \epsilon_{it} \epsilon_{i(t-1)} - \epsilon_{it} \epsilon_{i(t-2)} - \epsilon_{i(t-1)} \epsilon_{i(t-1)} + \epsilon_{i(t-1)} \epsilon_{i(t-2)} \right) = -\sigma^2 \]
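The pattern of twos and minus ones can be checked by writing first differencing as a matrix D and forming D D', which is the covariance of the differenced errors up to the factor sigma squared when the level errors are uncorrelated with equal variance. This is a minimal sketch of that check; the helper names are hypothetical and are not part of PROC PANEL.

```python
def first_difference_operator(n):
    """n x (n + 1) matrix D such that D applied to (e_1, ..., e_{n+1})
    gives the differences (e_2 - e_1, ..., e_{n+1} - e_n)."""
    return [
        [1 if c == r + 1 else -1 if c == r else 0 for c in range(n + 1)]
        for r in range(n)
    ]


def initial_weight_block(n):
    """Compute D D', the covariance of n consecutive first differences
    of unit-variance, uncorrelated errors."""
    D = first_difference_operator(n)
    return [
        [sum(D[r][k] * D[c][k] for k in range(n + 1)) for c in range(n)]
        for r in range(n)
    ]


H = initial_weight_block(4)
for row in H:
    print(row)
```

The product has 2 on the main diagonal, -1 on the first off-diagonals, and 0 elsewhere, matching the expectations derived above.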
If you let the vector of lagged differences (in the series $y_{it}$) be denoted as $\Delta y_i^{-}$ and the dependent variable as $\Delta y_i$, then the optimal GMM estimator is

\[ \phi = \left[ \left( \sum_i \Delta y_i^{-\prime} Z_i \right) A_N \left( \sum_i Z_i' \Delta y_i^{-} \right) \right]^{-1} \left( \sum_i \Delta y_i^{-\prime} Z_i \right) A_N \left( \sum_i Z_i' \Delta y_i \right) \]

Using the estimate, $\hat{\phi}$, you can obtain estimates of the errors, $\hat{\epsilon}$, or the differences, $\hat{\nu}$. From the errors, the variance is calculated as

\[ \sigma^2 = \frac{\hat{\epsilon}' \hat{\epsilon}}{M - 1} \]

where $M = \sum_{i=1}^{N} T_i$ is the total number of observations.

Furthermore, you can calculate the variance of the parameter as

\[ \sigma^2 \left[ \left( \sum_i \Delta y_i^{-\prime} Z_i \right) A_N \left( \sum_i Z_i' \Delta y_i^{-} \right) \right]^{-1} \]

Alternatively, you can view the initial estimate of $\phi$ as a first step. That is, by using $\hat{\phi}$, you can improve the estimate of the weight matrix, $A_N$. Instead of imposing the structure of the weighting, you form the $H_i$ matrix through the following:

\[ H_i = \hat{\nu}_i \hat{\nu}_i' \]

You then complete the calculation as previously shown. The PROC PANEL option TWOSTEP specifies this estimation.
The case of multiple right-hand-side variables illustrates more clearly the power of Arellano and Bond (1991) and Arellano and Bover (1995). Considering the general case you have:

\[ y_{it} = \sum_{l=1}^{\mathrm{maxlag}} \phi_l\, y_{i(t-l)} + \alpha X_i + \gamma_i + \alpha_t + \epsilon_{it} \]
It is clear that lags of the dependent variable are both not exogenous and correlated to the fixed effects. However, the independent variables can fall into one of several categories. An independent variable can be correlated and exogenous, uncorrelated and exogenous, correlated and predetermined, or uncorrelated and predetermined. The category in which an independent variable is found influences when or whether it becomes a suitable instrument. Note, however, that neither PROC PANEL nor Arellano and Bond require that a regressor be an instrument or that an instrument be a regressor.

First, consider the question of exogenous or endogenous. An exogenous variable is not correlated with the error term in the model at all. Therefore, all observations (on the exogenous variable) become valid instruments at all time periods. If the model has only one instrument and it happens to be exogenous, then the optimal instrument matrix looks like,

\[ Z_i = \begin{pmatrix} x_{i1} \cdots x_{iT} & 0 & 0 & \cdots & 0 \\ 0 & x_{i1} \cdots x_{iT} & 0 & \cdots & 0 \\ 0 & 0 & x_{i1} \cdots x_{iT} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & x_{i1} \cdots x_{iT} \end{pmatrix} \]

The situation for the predetermined variables becomes a little more difficult. A predetermined variable is one whose future realizations can be correlated to current shocks in the dependent variable. With such an understanding, it is admissible to allow all current and lagged realizations as instruments. In other words you have,

\[ Z_i = \begin{pmatrix} x_{i1} & 0 & 0 & \cdots & 0 \\ 0 & x_{i1}\ x_{i2} & 0 & \cdots & 0 \\ 0 & 0 & x_{i1} \cdots x_{i3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & x_{i1} \cdots x_{i(T-2)} \end{pmatrix} \]
When the data contain a mix of endogenous, exogenous, and predetermined variables, the instrument matrix is formed by combining the three. The third observation would have one observation on the dependent variable as an instrument, three observations on the predetermined variables as instruments, and all observations on the exogenous variables.

There is yet another set of moment restrictions that can be employed. An uncorrelated variable means that the variable's level is not affected by the individual specific effect. You write the general model presented above as:

\[ y_{it} = \sum_{l=1}^{\mathrm{maxlag}} \phi_l\, y_{i(t-l)} + \sum_{k=1}^{K} \beta_k x_{itk} + \alpha_t + \mu_{it} \]

where $\mu_{it} = \gamma_i + \epsilon_{it}$.

Since the variables are uncorrelated with $\gamma$ and uncorrelated with the error, you can perform a system estimation with the difference and level equations. That is, the uncorrelated variables imply moment restrictions on the level equation. If you denote the new instrument matrix with the full complement of instruments available by an asterisk (*) and both $x^p$ and $x^e$ are uncorrelated, then you have:

\[ Z_i^{*} = \begin{pmatrix} Z_i & 0 & \cdots & 0 \\ 0 & x_{i1}^{p}\ x_{i1}^{e} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{iT}^{p}\ x_{iT}^{e} \end{pmatrix} \]

The formation of the initial weighting matrix becomes somewhat problematic. If you denote the new weighting matrix with an asterisk (*), then you can write the following:

\[ A_N^{*} = \left( \frac{1}{N} \sum_i Z_i^{*\prime} H_i^{*} Z_i^{*} \right)^{-1} \]

where

\[ H_i^{*} = \begin{pmatrix} H_i & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix} \]
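To finish, you write out the two equations (or two stages) that are estimated.

\[ \Delta y_{it} = \beta^{*} \Delta S_i + \alpha_t - \alpha_{t-1} + \nu_{it} \]

\[ y_{it} = \beta^{*} S_i + \gamma_i + \alpha_t + \epsilon_{it} \]

where $S_i$ is the matrix of all explanatory variables, lagged endogenous, exogenous, and predetermined. Let $y_{it}^{*}$ be given by

\[ y_{it}^{*} = \begin{pmatrix} \Delta y_{it} \\ y_{it} \end{pmatrix}, \qquad \beta^{*} = \begin{pmatrix} \phi \\ \beta \end{pmatrix}, \qquad S_i^{*} = \begin{pmatrix} \Delta S_i \\ S_i \end{pmatrix} \]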
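Using the information above,

\[ \beta^{*} = \left[ \left( \sum_i S_i^{*\prime} Z_i^{*} \right) A_N^{*} \left( \sum_i Z_i^{*\prime} S_i^{*} \right) \right]^{-1} \left( \sum_i S_i^{*\prime} Z_i^{*} \right) A_N^{*} \left( \sum_i Z_i^{*\prime} y_i^{*} \right) \]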
If the TWOSTEP or ITGMM option is not requested, estimation terminates here. If it terminates, you can obtain the following information.

Variance of the error term comes from the second stage equation, that is,

\[ \sigma^2 = \frac{\hat{\epsilon}' \hat{\epsilon}}{M - p} \]

where $p$ is the number of regressors.

The variance covariance matrix can be obtained from

\[ \left[ \left( \sum_i S_i^{*\prime} Z_i^{*} \right) A_N^{*} \left( \sum_i Z_i^{*\prime} S_i^{*} \right) \right]^{-1} \sigma^2 \]

Alternatively, a robust estimate of the variance covariance matrix can be obtained by specifying the ROBUST option. Without further reestimation of the model, the $H_i$ matrix is recalculated as follows:

\[ H_{i,2} = \begin{pmatrix} \hat{\nu}_i \hat{\nu}_i' & 0 \\ 0 & \hat{\epsilon}_i \hat{\epsilon}_i' \end{pmatrix} \]

And the weighting matrix becomes

\[ A_{N,2} = \left( \frac{1}{N} \sum_i Z_i^{*\prime} H_{i,2} Z_i^{*} \right)^{-1} \]
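Using the information above, you construct the robust variance covariance matrix from the following. Let $G$ denote a temporary matrix.

\[ G = \left[ \left( \sum_i S_i^{*\prime} Z_i^{*} \right) A_N^{*} \left( \sum_i Z_i^{*\prime} S_i^{*} \right) \right]^{-1} \left( \sum_i S_i^{*\prime} Z_i^{*} \right) A_N^{*} \]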
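The robust variance covariance estimate of $\beta^{*}$ is:

\[ V^{\mathrm{robust}}(\beta^{*}) = G A_{N,2}^{-1} G' \]

Alternatively, the new weighting matrix can be used to form an updated estimate of the regression parameters. This results when the TWOSTEP option is requested. In short,

\[ \beta^{*} = \left[ \left( \sum_i S_i^{*\prime} Z_i^{*} \right) A_{N,2} \left( \sum_i Z_i^{*\prime} S_i^{*} \right) \right]^{-1} \left( \sum_i S_i^{*\prime} Z_i^{*} \right) A_{N,2} \left( \sum_i Z_i^{*\prime} y_i^{*} \right) \]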
The variance covariance estimate of the two-step $\beta^{*}$ becomes

\[ V(\beta^{*}) = \left[ \left( \sum_i S_i^{*\prime} Z_i^{*} \right) A_{N,2} \left( \sum_i Z_i^{*\prime} S_i^{*} \right) \right]^{-1} \]

As a final note, it is possible to iterate more than twice by specifying the ITGMM option. Such a multiple iteration should result in a more stable estimate of the variance covariance matrix. PROC PANEL allows two convergence criteria. Convergence can occur in the parameter estimates or in the weighting matrices. Iterate until

\[ \max_{i,j \le \dim(A_{N,k})} \frac{\left| A_{N,k+1}(i,j) - A_{N,k}(i,j) \right|}{\left| A_{N,k}(i,j) \right|} \le \mathrm{ATOL} \]

or

\[ \max_{i \le \dim(\beta_k)} \frac{\left| \beta_{k+1}(i) - \beta_k(i) \right|}{\left| \beta_k(i) \right|} \le \mathrm{BTOL} \]

where ATOL is the tolerance for convergence in the weighting matrix and BTOL is the tolerance for convergence in the parameter estimate matrix. The default convergence criterion is BTOL = 1E–8 for PROC PANEL.
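The stopping rule amounts to a maximum relative change between successive iterates. The sketch below illustrates only that check and makes two assumptions that the text does not state: the iterates are passed as flat sequences, and no element of the previous iterate is exactly zero (the relative change is undefined there). The function names are hypothetical.

```python
def max_relative_change(new, old):
    """Largest |new - old| / |old| over paired elements.

    Assumes every element of old is nonzero."""
    return max(abs(n - o) / abs(o) for n, o in zip(new, old))


def converged(beta_new, beta_old, btol=1e-8):
    """True when the parameter estimates satisfy the BTOL-style criterion."""
    return max_relative_change(beta_new, beta_old) <= btol


# Hypothetical successive parameter vectors.
print(converged([0.5, 1.25], [0.5, 1.25]))  # identical iterates
print(converged([0.55, 1.25], [0.5, 1.25]))  # 10 percent change in one element
```

The same helper can be applied to the flattened weighting matrices with an ATOL-style tolerance in place of `btol`.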
Specification Testing For Dynamic Panel

Specification tests under the GMM in PROC PANEL follow Arellano and Bond (1991) very generally. The first test available is a Sargan/Hansen test of over-identification. The test for a one-step estimation is constructed as

\[ \left( \sum_i \eta_i' Z_i \right) A_N \left( \sum_i Z_i' \eta_i \right) \frac{1}{\sigma^2} \]

where $\eta_i$ is the stacked error term (of the differenced equation and level equation). When the robust weighting matrix is used, the test statistic is computed as

\[ \left( \sum_i \eta_i' Z_i \right) A_{N,2} \left( \sum_i Z_i' \eta_i \right) \]

This definition of the Sargan test is used for all iterated estimations. The Sargan test is distributed as a $\chi^2$ with degrees of freedom equal to the number of moment conditions minus the number of parameters.

In addition to the Sargan test, PROC PANEL tests for autocorrelation in the residuals. These tests are distributed as standard normal. PROC PANEL tests the hypothesis that the autocorrelation of the $l$th lag is significant.

Define $\omega_l$ as the lag of the differenced error, with zero padding for the missing values generated. Symbolically,

\[ \omega_l = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \nu_1 \\ \vdots \\ \nu_{T-2-l} \end{pmatrix} \]
You define the constant $k_0$ as

\[ k_0(l) = \sum_i \omega_{l,i}' \nu_i \]

You next define the constant $k_1$ as

\[ k_1(l) = \sum_i \omega_{l,i}' H_i\, \omega_{l,i} \]

Note that the choice of $H_i$ is dependent on the stage of estimation. If the estimation is first stage, then you would use the matrix with twos along the main diagonal, and minus ones along the primary subdiagonals. In a robust estimation or multi-step estimation, this matrix would be formed from the outer product of the residuals (from the previous step).

Define the constant $k_2$ as

\[ k_2(l) = -2 \left( \sum_i \omega_{l,i}' S_i \right) G \left( \sum_i S_i' Z_i \right) A_{N,k} \left( \sum_i Z_i' H_i\, \omega_{l,i} \right) \]

The matrix $G$ is defined as

\[ G = \left[ \left( \sum_i S_i' Z_i \right) A_{N,k} \left( \sum_i Z_i' S_i \right) \right]^{-1} \]

The constant $k_3$ is defined as

\[ k_3(l) = \left( \sum_i \omega_{l,i}' S_i \right) V(\beta) \left( \sum_i S_i' \omega_{l,i} \right) \]

Using the four quantities, the test for autoregressive structure in the differenced residual is

\[ m(l) = \frac{k_0(l)}{\sqrt{k_1(l) + k_2(l) + k_3(l)}} \]

The $m$ statistic is distributed as a normal random variable with mean zero and standard deviation of one.
Instrument Choice

Arellano and Bond's technique is a very useful method for dealing with any autoregressive characteristics in the data. However, there is one caveat to consider. Too many instruments bias the estimator to the within estimate. Furthermore, many instruments make this technique not scalable. The weighting matrix becomes very large, so every operation that involves it becomes more computationally intensive. The PANEL procedure enables you to specify a bandwidth for instrument selection. For example, specifying MAXBAND=10 means that at most there will be ten time observations for each variable entering as an instrument. The default is to follow the Arellano-Bond methodology.

In specifying a maximum bandwidth, you can also specify the selection of the time observations. There are three possibilities: leading, trailing (default), and centered. The exact consequence of choosing any of those possibilities depends on the variable type (correlated, exogenous, or predetermined) and the time period of the current observation.

If the MAXBAND option is specified, then the following is true under any selection criterion (let $t$ be the time subscript for the current observation). The first observation for the endogenous variable (as instrument) is $\max(t - \mathrm{MAXBAND}, 2)$ and the last instrument is $t - 2$. For a predetermined variable, the first observation is $\max(t - \mathrm{MAXBAND}, 2)$ and the last is $t - 1$. The first and last observations for an exogenous variable are given in the list below.

Trailing: If $t < \mathrm{MAXBAND}$, then the first instrument is for the first time period and the last observation is MAXBAND. Otherwise, if $t \ge \mathrm{MAXBAND}$, then the first observation is $t - \mathrm{MAXBAND}$, while the last instrument to enter is $t$.

Centered: If $t \le \frac{\mathrm{MAXBAND}}{2}$, then the first observation is the first time period and the last observation is MAXBAND. If $t > T - \frac{\mathrm{MAXBAND}}{2}$, then the first instrument included is $T - \mathrm{MAXBAND} + 1$ and the last observation is $T$. If $\frac{\mathrm{MAXBAND}}{2} < t \le T - \frac{\mathrm{MAXBAND}}{2}$, then the first included instrument is $t - \frac{\mathrm{MAXBAND}}{2} + 1$ and the last observation is $t + \frac{\mathrm{MAXBAND}}{2}$. If the MAXBAND value is an odd number, the procedure decrements by one.

Leading: If $t > T - \mathrm{MAXBAND}$, then the first instrument corresponds to time period $T - \mathrm{MAXBAND}$, while the last observation is $T$. Otherwise, if $t \le T - \mathrm{MAXBAND}$, then the first observation is $t$ and the last observation is $t +$ ….

The PANEL procedure enables you to include dummy variables to deal with the presence of time effects not captured by including the lagged dependent variable. The dummy variables directly affect the level equations. However, this implies that the difference of the dummy variables for time periods $t$ and $t - 1$ enters the difference equation. The first usable observation occurs at $t = 3$. If the level equation is not used in the estimation, then there is no way to identify the dummy variables. Selecting the TIME option gives the same result as that which would be obtained by creating dummy variables in the data set and using those in the regression.

The PANEL procedure gives you several options when it comes to missing values and unbalanced panels. By default, any time period for which there are missing values is skipped. The corresponding rows and columns of the $H$ matrices are zeroed, and the calculation is continued. Alternatively, you can elect to replace missing values and missing observations with zeros (ZERO), the overall mean of the series (OAM), the cross-sectional mean (CSM), or the time series mean (TSM).
Linear Hypothesis Testing

For a linear hypothesis of the form $R\beta = r$, where $R$ is $J \times K$ and $r$ is $J \times 1$, the F-statistic with $J, M - K$ degrees of freedom is computed as

\[ (R\beta - r)' \left[ R \hat{V} R' \right]^{-1} (R\beta - r) \]

However, it is also possible to write the F statistic as

\[ F = \frac{(\hat{u}_{*}' \hat{u}_{*} - \hat{u}' \hat{u}) / J}{\hat{u}' \hat{u} / (M - K)} \]

where

$\hat{u}_{*}$ is the residual vector from the restricted regression

$\hat{u}$ is the residual vector from the unrestricted regression

$J$ is the number of restrictions

$(M - K)$ are the degrees of freedom
The Wald, likelihood ratio (LR) and Lagrange multiplier (LM) tests are all related to the F test. You use this relationship of the F test to the likelihood ratio and Lagrange multiplier tests. The Wald test is calculated from its definition.

The Wald test statistic is:

\[ W = (R\beta - r)' \left[ R \hat{V} R' \right]^{-1} (R\beta - r) \]

The advantage of calculating Wald in this manner is that it enables you to substitute a heteroscedasticity-corrected covariance matrix for the matrix $V$. PROC PANEL makes such a substitution if you request the HCCME option in the MODEL statement.

The likelihood ratio is:

\[ LR = M \ln \left[ 1 + \left( \frac{1}{M - K} \right) J F \right] \]

The Lagrange multiplier test statistic is:

\[ LM = M \left[ \frac{J F}{M - K + J F} \right] \]

Note that only the Wald is changed when the HCCME option is selected. The LR and LM tests are unchanged.

The distribution of these test statistics is the $\chi^2$ with degrees of freedom equal to the number of restrictions imposed ($J$). The three tests are asymptotically equivalent, but they have differing small sample properties. Greene (2000, p. 392) and Davidson and MacKinnon (1993, pp. 456-458) discuss the small sample properties of these statistics.
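The two conversions from F are short enough to evaluate directly. The following sketch simply codes the LR and LM expressions given in this section as functions of F, J, M, and K; the function names and the numbers in the example are hypothetical.

```python
import math


def lr_from_f(F, J, M, K):
    """Likelihood ratio statistic: M * ln(1 + J * F / (M - K))."""
    return M * math.log(1.0 + J * F / (M - K))


def lm_from_f(F, J, M, K):
    """Lagrange multiplier statistic: M * J * F / (M - K + J * F)."""
    return M * (J * F) / (M - K + J * F)


# Hypothetical case: F = 2 with J = 3 restrictions, M = 100 observations, K = 5 regressors.
lr = lr_from_f(2.0, 3, 100, 5)
lm = lm_from_f(2.0, 3, 100, 5)
print(lr, lm)
```

For any positive F the LM value computed this way is smaller than the LR value, which is one concrete form of the differing small sample behavior mentioned above.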
Heteroscedasticity-Corrected Covariance Matrices

The MODEL statement HCCME= option is used to select the type of heteroscedasticity-consistent covariance matrix. In the presence of heteroscedasticity, the covariance matrix has a complicated structure which can result in inefficiencies in the OLS estimates and biased estimates of the variance covariance matrix.

Consider the simple linear model (this discussion parallels the discussion in Davidson and MacKinnon, 1993, pp. 548-562):

\[ y = X\beta + \epsilon \]

The assumptions that make the linear regression best linear unbiased estimator (BLUE) are $E(\epsilon) = 0$ and $E(\epsilon \epsilon') = \Omega$, where $\Omega$ has the simple structure $\sigma^2 I$. Heteroscedasticity results in a general covariance structure, so that it is not possible to simplify $\Omega$. The result is the following:

\[ \tilde{\beta} = (X'X)^{-1} X' y = (X'X)^{-1} X' (X\beta + \epsilon) = \beta + (X'X)^{-1} X' \epsilon \]

As long as the following is true, then you are assured that the OLS estimate is consistent and unbiased:

\[ \operatorname{plim}_{n \to \infty} \left( \frac{1}{n} X' \epsilon \right) = 0 \]

If the regressors are nonrandom, then it is possible to write the variance of the estimated $\beta$ as the following:

\[ \mathrm{Var}(\beta - \tilde{\beta}) = (X'X)^{-1} X' \Omega X (X'X)^{-1} \]

The effect of structure in the variance covariance matrix can be ameliorated by using generalized least squares (GLS), provided that $\Omega^{-1}$ can be calculated. Using $\Omega^{-1}$, you premultiply both sides of the regression equation,

\[ \Omega^{-1} y = \Omega^{-1} X\beta + \Omega^{-1} \epsilon \]

The resulting GLS $\beta$ is

\[ \hat{\beta} = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} y \]
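Using the GLS $\beta$, you can write

\[ \hat{\beta} = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} y = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} (X\beta + \epsilon) = \beta + (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} \epsilon \]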
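The resulting variance expression for the GLS estimator is

\[ \mathrm{Var}(\beta - \hat{\beta}) = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1}\, \Omega\, \Omega^{-1} X (X' \Omega^{-1} X)^{-1} = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} X (X' \Omega^{-1} X)^{-1} = (X' \Omega^{-1} X)^{-1} \]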
The difference in variance between the OLS estimator and the GLS estimator can be written as

\[ (X'X)^{-1} X' \Omega X (X'X)^{-1} - (X' \Omega^{-1} X)^{-1} \]
By the Gauss Markov Theory, the difference matrix must be positive definite under most circumstances (zero if OLS and GLS are the same, when the usual classical regression assumptions are met). Thus, OLS is not efficient under a general error structure. It is crucial to realize that OLS does not produce biased results. It would suffice if you had a method for estimating a consistent covariance matrix and you used the OLS $\beta$. Estimation of the $\Omega$ matrix is by no means simple. The matrix is square and has $M^2$ elements, so unless some sort of structure is assumed, it becomes an impossible problem to solve. However, the heteroscedasticity can have quite a general structure. White (1980) shows that it is not necessary to have a consistent estimate of $\Omega$. On the contrary, it suffices to get an estimate of the middle expression. That is, you need an estimate of:

\[ \Lambda = X' \Omega X \]

This matrix, $\Lambda$, is easier to estimate because its dimension is $K$. PROC PANEL provides the following classical HCCME estimators. The matrix $\Lambda$ is approximated by:

HCCME=N0: This is the simple OLS estimator so $(X'X)$. If you do not request the HCCME= option, then PROC PANEL defaults to this estimator.

HCCME=0:

\[ \frac{1}{M} \sum_{i=1}^{M} \hat{\epsilon}_i^2\, x_i x_i' \]

The $x_i$ constitutes the $i$th row of the matrix $X_s$.

HCCME=1:

\[ \frac{1}{M} \sum_{i=1}^{M} \frac{M}{M - K}\, \hat{\epsilon}_i^2\, x_i x_i' \]

HCCME=2:

\[ \frac{1}{M} \sum_{i=1}^{M} \frac{\hat{\epsilon}_i^2}{1 - \hat{h}_i}\, x_i x_i' \]

The $\hat{h}_i$ term is the $i$th diagonal element of the so-called hat matrix. The expression for $\hat{h}_i$ is $X_i (X'X)^{-1} X_i'$. The hat matrix attempts to adjust the estimates for the presence of influence or leverage points.

HCCME=3:

\[ \frac{1}{M} \sum_{i=1}^{M} \frac{\hat{\epsilon}_i^2}{(1 - \hat{h}_i)^2}\, x_i x_i' \]

HCCME=4: This is the Arellano (1987) version of the White (1980) HCCME for panel. PROC PANEL includes an option for the calculation of the Arellano (1987) version of the White HCCME in the panel setting. Arellano's insight is that in a panel there are $N$ covariance matrices, each corresponding to a cross section. After forming the White (1980) HCCME for each panel, you need only take the average of those $N$ estimators to obtain the Arellano estimator. The details of the estimation follow. First, you arrange the data such that the first cross section occupies the first $T_i$ observations. You treat the panels as separate regressions with the form:

\[ y_i = \alpha_i \mathbf{i} + X_{is} \tilde{\beta} + \epsilon_i \]

The parameter estimates $\tilde{\beta}$ and $\alpha_i$ are the result of the LSDV or within estimator. $\mathbf{i}$ is a vector of ones of length $T_i$. The estimate of the $i$th cross section's $X' \Omega X$ matrix (where the $s$ subscript indicates that no constant column has been suppressed to avoid confusion) is $X_i' \Omega X_i$. The estimate for the whole sample is:

\[ X_s' \Omega X_s = \sum_{i=1}^{N} X_i' \Omega X_i \]

The Arellano standard error is in fact a White-Newey-West estimator with constant and equal weight on each component. It should be noted that, in the between estimators, selecting HCCME=4 returns the HCCME=0 result since there is no 'other' variable to group by.

In their discussion, Davidson and MacKinnon (1993, p. 554) argue that HCCME=1 should always be preferred to HCCME=0. While generally HCCME=3 is preferred to 2 and 2 is preferred to 1, the calculation of HCCME=1 is as simple as the calculation of HCCME=0. Therefore, it is clear that HCCME=1 is preferred when the calculation of the hat matrix is too tedious.

All HCCME estimators have well defined asymptotic properties. The small sample properties are not well known, and care must be exercised when sample sizes are small.

The HCCME estimator of $\mathrm{Var}(\beta)$ is used to derive the covariance matrices for the fixed effects and the Lagrange multiplier standard errors. Robust estimates of the variance covariance matrix for $\beta$ imply robust variance covariance matrices for all other parameters.
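To make the difference between the HCCME=0 through HCCME=3 weightings concrete, the sketch below evaluates them for the simplest possible case. It is an illustration under loud assumptions and not PROC PANEL's code: there is a single regressor and no intercept (K = 1), so x_i x_i' reduces to the scalar x_i squared and the hat value is x_i squared divided by the sum of all squared x values; the sum is scaled by 1/M; and the function name, data, and residuals are hypothetical.

```python
def hccme_lambda(x, resid, kind, K=1):
    """Approximate Lambda = X' Omega X for a single-regressor model.

    kind selects the weighting: 0 (White), 1 (degrees-of-freedom scaled),
    2 (divide by 1 - h_i), 3 (divide by (1 - h_i) squared)."""
    M = len(x)
    sxx = sum(v * v for v in x)  # X'X when X has one column
    total = 0.0
    for xi, ei in zip(x, resid):
        h = xi * xi / sxx  # hat (leverage) value for this observation
        if kind == 0:
            w = 1.0
        elif kind == 1:
            w = M / (M - K)
        elif kind == 2:
            w = 1.0 / (1.0 - h)
        elif kind == 3:
            w = 1.0 / (1.0 - h) ** 2
        else:
            raise ValueError("kind must be 0, 1, 2, or 3")
        total += w * ei * ei * xi * xi
    return total / M


# Hypothetical regressor values and residuals.
x = [1.0, 2.0, 3.0]
resid = [1.0, -1.0, 1.0]
values = [hccme_lambda(x, resid, kind) for kind in range(4)]
print(values)
```

In this toy case the leverage-based versions inflate the contribution of the high-leverage observation (x = 3) the most, which is the adjustment the hat matrix is meant to provide.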
R-Square

The conventional R-square measure is inappropriate for all models that the PANEL procedure estimates by using GLS since a number outside the [0,1] range might be produced. Hence, a generalization of the R-square measure is reported. The following goodness-of-fit measure (Buse 1973) is reported:

\[ R^2 = 1 - \frac{\hat{u}' \hat{V}^{-1} \hat{u}}{y' D' \hat{V}^{-1} D y} \]

where $\hat{u}$ are the residuals of the transformed model, $\hat{u} = y - X (X' \hat{V}^{-1} X)^{-1} X' \hat{V}^{-1} y$, and

\[ D = I_M - j_M j_M' \left( \frac{\hat{V}^{-1}}{j_M' \hat{V}^{-1} j_M} \right) \]
This is a measure of the proportion of the transformed sum of squares of the dependent variable that is attributable to the influence of the independent variables.
If there is no intercept in the model, the corresponding measure (Theil 1961) is

\[ R^2 = 1 - \frac{\hat{u}' \hat{V}^{-1} \hat{u}}{y' \hat{V}^{-1} y} \]
However, the fixed-effects models are somewhat different. In the case of a fixed-effects model, the choice of including or excluding an intercept becomes merely a choice of classification. Suppressing the intercept in the FIXONE or FIXONETIME case merely changes the name of the intercept to a fixed effect. It makes no sense to redefine the R-square measure since nothing material changes in the model. Similarly, for the FIXTWO model there is no reason to change R-square. In the case of the FIXONE, FIXONETIME, and FIXTWO models, the R-square is defined as the Theil (1961) R-square (detailed above). This makes intuitive sense since you are regressing a transformed (demeaned) series on transformed regressors, excluding a constant. In other words, you are looking at one minus the sum of squared errors divided by the sum of squares of the (transformed) dependent variable. In the case of OLS estimation, both of the R-square formulas given here reduce to the usual R-square formula.
Specification Tests

The PANEL procedure outputs the results of one specification test for fixed effects and two specification tests for random effects.

For fixed effects, let $\beta_f$ be the $n$ dimensional vector of fixed-effects parameters. The specification test reported is the conventional F statistic for the hypothesis $\beta_f = 0$. The F statistic with $n, M - K$ degrees of freedom is computed as

\[ \hat{\beta}_f' \hat{S}_f^{-1} \hat{\beta}_f / n \]

where $\hat{S}_f$ is the estimated covariance matrix of the fixed-effects parameters.

Hausman's (1978) specification test or $m$ statistic can be used to test hypotheses in terms of bias or inconsistency of an estimator. This test was also proposed by Wu (1973) and further extended in Hausman and Taylor (1982). Hausman's $m$ statistic is as follows.

Consider two estimators, $\hat{\beta}_a$ and $\hat{\beta}_b$, which under the null hypothesis are both consistent, but only $\hat{\beta}_a$ is asymptotically efficient. Under the alternative hypothesis, only $\hat{\beta}_b$ is consistent. The $m$ statistic is

\[ m = (\hat{\beta}_b - \hat{\beta}_a)' (\hat{S}_b - \hat{S}_a)^{-1} (\hat{\beta}_b - \hat{\beta}_a) \]

where $\hat{S}_b$ and $\hat{S}_a$ are consistent estimates of the asymptotic covariance matrices of $\hat{\beta}_b$ and $\hat{\beta}_a$. Then $m$ is distributed $\chi^2$ with $k$ degrees of freedom, where $k$ is the dimension of $\hat{\beta}_a$ and $\hat{\beta}_b$.

In the random-effects specification, the null hypothesis of no correlation between effects and regressors implies that the OLS estimates of the slope parameters are consistent and inefficient but the GLS estimates of the slope parameters are consistent and efficient. This facilitates a Hausman specification test. The reported $\chi^2$ statistic has degrees of freedom equal to the number of slope parameters.

Breusch and Pagan (1980) lay out a Lagrange multiplier test for random effects based on the simple OLS (pooled) estimator. If $\hat{u}_{it}$ is the $it$th residual from the OLS regression, then the Breusch-Pagan (BP) test for one-way random effects is

\[ BP = \frac{NT}{2(T - 1)} \left[ \frac{\sum_{i=1}^{N} \left[ \sum_{t=1}^{T} \hat{u}_{it} \right]^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat{u}_{it}^2} - 1 \right]^2 \]

The BP test generalizes to the case of a two-way random-effects model (Greene 2000, page 589). Specifically,

\[ BP2 = \frac{NT}{2(T - 1)} \left[ \frac{\sum_{i=1}^{N} \left[ \sum_{t=1}^{T} \hat{u}_{it} \right]^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat{u}_{it}^2} - 1 \right]^2 + \frac{NT}{2(N - 1)} \left[ \frac{\sum_{t=1}^{T} \left[ \sum_{i=1}^{N} \hat{u}_{it} \right]^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat{u}_{it}^2} - 1 \right]^2 \]

is distributed as a $\chi^2$ statistic with two degrees of freedom. Since the BP2 test generalizes (nests) the BP test for random effects, the absence of random effects (nonrejection of the null of no random effects) in the BP2 is a fairly clear indication that there will probably not be any one-way effects either. In both cases (BP and BP2), the residuals are obtained from a pooled regression. There is very little extra cost in selecting both the BP and BP2 test. Notice that in the case of just groupwise heteroscedasticity, the BP2 test approaches BP. In the case of time based heteroscedasticity, the BP2 test reduces to a BP test of time effects. In the case of unbalanced panels, neither the BP nor BP2 statistics are valid.

Finally, you should be aware that the BP option generates different results depending on whether the estimation is FIXONE or FIXONET. Specifically, under the FIXONE estimation technique, the BP tests for cross-sectional random effects. Under the FIXONET estimation, the BP tests for time random effects.

While the Hausman statistic is automatically generated, you request Breusch-Pagan via the BP or BP2 option (see Baltagi 1995 for details).
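The BP and BP2 statistics depend only on sums of pooled residuals, so they can be sketched in a few lines. The code below is an illustration that assumes a balanced panel stored as `resid[i][t]` (cross section i, time period t); the function name and the toy residuals are hypothetical, and the residuals themselves are taken as given rather than estimated here.

```python
def breusch_pagan(resid):
    """Return (BP, BP2) for a balanced panel of pooled OLS residuals.

    resid[i][t] is the residual for cross section i at time period t."""
    N = len(resid)
    T = len(resid[0])
    total_ss = sum(u * u for row in resid for u in row)
    # Squared sums within each cross section (one-way, cross-sectional part).
    cross_ss = sum(sum(row) ** 2 for row in resid)
    # Squared sums within each time period (time part used by BP2).
    time_ss = sum(sum(resid[i][t] for i in range(N)) ** 2 for t in range(T))
    bp = N * T / (2.0 * (T - 1)) * (cross_ss / total_ss - 1.0) ** 2
    bp_time = N * T / (2.0 * (N - 1)) * (time_ss / total_ss - 1.0) ** 2
    return bp, bp + bp_time


# Toy residuals: each cross section has a constant residual of opposite sign.
bp, bp2 = breusch_pagan([[1.0, 1.0], [-1.0, -1.0]])
print(bp, bp2)
```

Because the second term is a squared quantity, BP2 computed this way is never smaller than BP, which is consistent with BP2 nesting the one-way test.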
Troubleshooting

Some guidelines need to be followed when you use PROC PANEL for analysis. For each cross section, PROC PANEL requires at least two time series observations with nonmissing values for all model variables. There should be at least two cross sections for each time point in the data. If these two conditions are not met, then an error message is printed in the log stating that there is only one cross section or time series observation and further computations will be terminated. You have to give adequate data for an estimation method to produce results, and you should check the log for any data related errors.

If the number of cross sections is greater than the number of time series observations per cross section, PROC PANEL with the PARKS method produces an error message stating that the phi matrix is singular. This is analogous to seemingly unrelated regression with fewer observations than equations in the model. To avoid the problem, reduce the number of cross sections.

Your data set could have multiple observations for each time ID within a particular cross section. However, PROC PANEL is applicable only in cases where you have only a single observation for each time ID within each cross section. In such a case, after you have sorted the data, an error message stating that the data have not been sorted in ascending sequence with respect to time series ID appears in the log. The error is caused by multiple observations for each time ID for a given cross section. PROC PANEL allows only one observation for each time ID within each cross section.

The data set shown in Figure 19.2 illustrates the correct representation.

Figure 19.2 Single Observation for Each Time Series

Obs  firm  year  production     cost
  1     1  1955     5.36598  1.14867
  2     1  1960     6.03787  1.45185
  3     1  1965     6.37673  1.52257
  4     1  1970     6.93245  1.76627
  5     2  1955     6.54535  1.35041
  6     2  1960     6.69827  1.71109
  7     2  1965     7.40245  2.09519
  8     2  1970     7.82644  2.39480
In this case, you can observe that there are no multiple observations with respect to a given time series ID within a cross section. This is the correct representation of a data set where PROC PANEL is applicable.

If for firm 1 you have two observations for year=1955, then PROC PANEL produces the following error message:

"The data set is not sorted in ascending sequence with respect to time series ID. The current time period has year=1955 and the previous time period has year=1955 in cross section firm=1."

A data set similar to the previous example, but with multiple observations for YEAR=1955, is shown in Figure 19.3; using this data set with PROC PANEL results in an error message because of the multiple observations.
Figure 19.3 Multiple Observations for Each Time Series

   Obs   firm   year   production      cost
     1      1   1955      5.36598   1.14867
     2      1   1955      6.37673   1.52257
     3      1   1960      6.03787   1.45185
     4      1   1970      6.93245   1.76627
     5      2   1955      6.54535   1.35041
     6      2   1960      6.69827   1.71109
     7      2   1965      7.40245   2.09519
     8      2   1970      7.82644   2.39480
In order to use PROC PANEL, you need to aggregate the data so that you have unique time ID values within each cross section. One possible way to do this is to run a PROC MEANS on the input data set and compute the mean of all the variables by FIRM and YEAR, and then use the output data set.
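The aggregation described above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the data set names DUPS, DUP_CHECK, and UNIQUE_OBS are hypothetical, the variables are those shown in Figure 19.3, and the final MODEL statement is only a placeholder specification.

```sas
/* Hypothetical input: DUPS holds the observations shown in Figure 19.3 */
proc sort data=dups;
   by firm year;
run;

/* Optional check: list every (firm, year) pair that occurs more than once */
data dup_check;
   set dups;
   by firm year;
   if not (first.year and last.year);
run;

/* Average all analysis variables within each (firm, year) pair */
proc means data=dups noprint;
   by firm year;
   var production cost;
   output out=unique_obs(drop=_type_ _freq_) mean=;
run;

/* UNIQUE_OBS now has one observation per time ID within each cross section */
proc panel data=unique_obs;
   id firm year;
   model cost = production / fixone;   /* placeholder model for illustration */
run;
```

Averaging is only one possible aggregation rule; whether a mean, a sum, or the last reported value is appropriate depends on what the duplicated records represent.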
ODS Graphics

This section describes the use of ODS for creating graphics with the PANEL procedure. Both the graphical results and the syntax for specifying them are subject to change in a future release. To request these graphs, you must specify the ODS GRAPHICS statement. Table 19.2 lists the graph names, the plot descriptions, and the options used.

Table 19.2 ODS Graphics Produced by PROC PANEL

   ODS Graph Name      Plot Description                          PLOTS= Option
   ResidualPlot        Plot of the residuals                     Residual, Resid
   FitPlot             Predicted versus actual plot              Fitplot
   QQPlot              Plot of the quantiles of the residuals    QQ
   ResidSurfacePlot    Surface plot of the residuals             Residsurface
   PredSurfacePlot     Surface plot of the predicted values      Predsurface
   ActSurfacePlot      Surface plot of the actual values         Actsurface
   ResidStackPlot      Stack plot of the residuals               Residstack, Resstack
   ResidHistogram      Plot of the histogram of residuals        Residualhistogram, Residhistogram
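As a minimal sketch of requesting a single graph from Table 19.2, the following statements enable ODS Graphics, ask for the Q-Q plot only, and then turn ODS Graphics off. The AIRLINE data set and model are borrowed from Example 19.2, and the placement of the PLOTS= option follows the examples in this chapter; because the syntax is described as subject to change, treat the exact form as an assumption to verify against your release.

```sas
ods graphics on;                       /* required before any graph is produced */

proc panel data=airline;
   id i t;
   /* ONLY suppresses the default panels; QQ is the option listed for QQPlot */
   model lC = lQ lPF LF / fixtwo plots(only) = qq;
run;

ods graphics off;
```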
The OUTPUT OUT= Data Set

PROC PANEL writes the initial data of the estimated model, predicted values, and residuals to an output data set when the OUTPUT OUT= statement is specified. The OUT= data set contains the following variables:
_MODELL_
is a character variable that contains the label for the MODEL statement if a label is specified.
_METHOD_
is a character variable that identifies the estimation method.
_MODLNO_
is the number of the model estimated.
_ACTUAL_
contains the value of the dependent variable.
_WEIGHT_
contains the weighting variable.
_CSID_
is the value of the cross section ID.
_TSID_
is the value of the time period in the dynamic model.
regressors
are the values of regressor variables specified in the MODEL statement.
name
is a user-supplied column name. If the PRED=name1 and/or RESIDUAL=name2 options are specified, then name1 and name2 are the columns that contain the predicted values of the dependent variable and the residuals of the regression, respectively.
The OUTEST= Data Set

PROC PANEL writes the parameter estimates to an output data set when the OUTEST= option is specified. The OUTEST= data set contains the following variables:

_MODEL_
is a character variable that contains the label for the MODEL statement if a label is specified.
_METHOD_
is a character variable that identifies the estimation method. Current methods are FULLER, PARKS, and DASILVA.
_TYPE_
is a character variable that identifies the type of observation. Values of the _TYPE_ variable are CORRB, COVB, CSPARMS, and PARMS; the CORRB observation contains correlations of the parameter estimates; the COVB observation contains covariances of the parameter estimates; the CSPARMS observation contains cross-sectional parameter estimates; and the PARMS observation contains parameter estimates.
_NAME_
is a character variable that contains the name of a regressor variable for COVB and CORRB observations and is left blank for other observations. The _NAME_ variable is used in conjunction with the _TYPE_ values COVB and CORRB to identify rows of the correlation or covariance matrix.
_DEPVAR_
is a character variable that contains the name of the response variable.
_MSE_
is the mean square error of the transformed model.
_CSID_
is the value of the cross section ID for CSPARMS observations. The _CSID_ variable is used with the _TYPE_ value CSPARMS to identify the cross section for the first-order autoregressive parameter estimate contained in the observation. The _CSID_ variable is missing for observations with other _TYPE_ values. (Currently, only the _A_1 variable contains values for CSPARMS observations.)
_VARCS_
is the variance component estimate due to cross sections. The _VARCS_ variable is included in the OUTEST= data set when either the FULLER or DASILVA option is specified.
_VARTS_
is the variance component estimate due to time series. The _VARTS_ variable is included in the OUTEST= data set when either the FULLER or DASILVA option is specified.
_VARERR_
is the variance component estimate due to error. The _VARERR_ variable is included in the OUTEST= data set when the FULLER option is specified.
_A_1
is the first-order autoregressive parameter estimate. The _A_1 variable is included in the OUTEST= data set when the PARKS option is specified. The values of _A_1 are cross-sectional parameters, meaning that they are estimated for each cross section separately. The _A_1 variable has a value only for _TYPE_=CSPARMS observations. The cross section to which the estimate belongs is indicated by the _CSID_ variable.
INTERCEP
is the intercept parameter estimate. (INTERCEP is missing for models when the NOINT option is specified.)
regressors
are the regressor variables specified in the MODEL statement. The regressor variables in the OUTEST= data set contain the corresponding parameter estimates for the model identified by _MODEL_ for _TYPE_=PARMS observations, and the corresponding covariance or correlation matrix elements for _TYPE_=COVB and _TYPE_=CORRB observations. The response variable contains the value -1 for the _TYPE_=PARMS observation for its model.
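The following sketch shows one way to pull specific rows out of an OUTEST= data set by using the _TYPE_ variable described above. The data set A and the model come from Example 19.1; the output data set name EST is hypothetical.

```sas
proc panel data=a outest=est;
   id state year;
   model d = y rd rt rs / parks;
run;

/* Regression parameter estimates only */
proc print data=est;
   where _type_ = 'PARMS';
run;

/* First-order autoregressive estimates, one row per cross section (PARKS) */
proc print data=est;
   where _type_ = 'CSPARMS';
   var _csid_ _a_1;
run;
```

Filtering on _TYPE_ rather than on row position keeps the selection valid when additional observation types are written to the data set.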
The OUTTRANS= Data Set

PROC PANEL writes the transformed series to an output data set. That is, if the user selects FIXONE, FIXONETIME, or RANONE and supplies the OUTTRANS= option, the transformed dependent variable and independent variables are written out to a SAS data set; other variables in the input data set are copied unchanged. Say that your data set contains variables y, x1, x2, x3, and z2. The following statements result in a SAS data set with several characteristics:

   proc panel data=datain outtrans=dataout;
      id cs ts;
      model y = x1 x2 x3 / fixone;
   run;
First, z2 is copied over. Then y, x1, x2, and x3 are replaced with their mean deviates (deviations from the cross-sectional means). Furthermore, two new variables are created.

_MODELL_
is the model’s label (if it exists).
_TRTYPE_
is the model’s transformation type. In the FIXONE case, this is _FIXONE_ or _FIXONET_. If the RANONE model is selected, the _TRTYPE_ variable is either _Ran1FB_, _Ran1WK_, _Ran1WH_, or _Ran1NL_, depending on the variance component estimators chosen.
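To make the idea of mean deviates concrete, the following sketch computes deviations from cross-sectional means by hand for the DATAIN example above. It is an illustration of the within transformation only; the output names (DEMEANED and the _dev suffixes) are hypothetical, and the sketch does not claim to reproduce every detail of what OUTTRANS= writes.

```sas
/* PROC SQL remerges the group means back onto each row (a NOTE appears in the log) */
proc sql;
   create table demeaned as
   select cs, ts, z2,
          y  - mean(y)  as y_dev,
          x1 - mean(x1) as x1_dev,
          x2 - mean(x2) as x2_dev,
          x3 - mean(x3) as x3_dev
   from datain
   group by cs              /* means are taken within each cross section */
   order by cs, ts;
quit;
```

Within each cross section the new columns sum to zero, which is what removes the cross-sectional fixed effect from the regression.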
Printed Output

For each MODEL statement, the printed output from PROC PANEL includes the following:

   - a model description, which gives the estimation method used, the model statement label if specified, the number of cross sections and the number of observations in each cross section, and the order of moving average error process for the DASILVA option. For fixed-effects model analysis, an F test for the absence of fixed effects is produced, and for random-effects model analysis, a Hausman test is used for the appropriateness of the random-effects specification.
   - the estimates of the underlying error structure parameters
   - the regression parameter estimates and analysis. For each regressor, this includes the name of the regressor, the degrees of freedom, the parameter estimate, the standard error of the estimate, a t statistic for testing whether the estimate is significantly different from 0, and the significance probability of the t statistic.

Optionally, PROC PANEL prints the following:

   - the covariance and correlation of the resulting regression parameter estimates for each model and assumed error structure
   - the $\hat{\Phi}$ matrix that is the estimated contemporaneous covariance matrix for the PARKS option
ODS Table Names

PROC PANEL assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 19.3.

Table 19.3 ODS Tables Produced in PROC PANEL

   ODS Table Name             Description                                       Option

   ODS Tables Created by the MODEL Statement
   ModelDescription           Model description                                 Default
   FitStatistics              Fit statistics                                    Default
   FixedEffectsTest           F test for no fixed effects                       FIXONE, FIXTWO, FIXONET
   ParameterEstimates         Parameter estimates                               Default
   CovB                       Covariance of parameter estimates                 COVB
   CorrB                      Correlations of parameter estimates               CORRB
   VarianceComponents         Variance component estimates                      RANONE, RANTWO, DASILVA
   RandomEffectsTest          Hausman test for random effects                   RANONE, RANTWO
   AR1Estimates               First-order autoregressive parameter estimates    RHO(PARKS)
   TestResults                Tests of linear restrictions
   BreuschPaganTest           Breusch-Pagan one-way test                        BP
   BreuschPaganTest2          Breusch-Pagan two-way test                        BP2
   Sargan                     Sargan’s test for overidentification              GMM
   ARTest                     Autoregression test for the residuals             GMM
   IterHist                   Iteration history                                 ITGMM
   ConvergenceStatus          Convergence status of iterated GMM estimator      ITGMM
   EstimatedPhiMatrix         Estimated phi matrix                              PARKS
   EstimatedAutocovariances   Estimates of autocovariances                      PARKS

   ODS Tables Created by the TEST Statement
   TestResults                Test results
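As a brief sketch of how the table names in Table 19.3 can be used, the following statements capture two of the default tables as data sets with the ODS OUTPUT statement. The AIRLINE data set and model are taken from Example 19.2; the output data set names PE and FIT are hypothetical.

```sas
/* Route the named ODS tables to data sets; applies to the next procedure step */
ods output ParameterEstimates=pe FitStatistics=fit;

proc panel data=airline;
   id i t;
   model lC = lQ lPF LF / fixtwo;
run;

proc print data=pe;
run;
```

Tables that depend on an option (for example, CovB) are available to ODS OUTPUT only when that option is specified in the MODEL statement.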
Example: PANEL Procedure
Example 19.1: Analyzing Demand for Liquid Assets

In this example, the demand equations for liquid assets are estimated. The demand function for the demand deposits is estimated under three error structures while demand equations for time deposits and savings and loan (S&L) association shares are calculated using the Parks method. The data for seven states (CA, DC, FL, IL, NY, TX, and WA) are selected out of 49 states. See Feige (1964) for data description. All variables were transformed via natural logarithm. The data set A is shown below.

   data a;
      length state $ 2;
      input state $ year d t s y rd rt rs;
      label d  = 'Per Capita Demand Deposits'
            t  = 'Per Capita Time Deposits'
            s  = 'Per Capita S & L Association Shares'
            y  = 'Permanent Per Capita Personal Income'
            rd = 'Service Charge on Demand Deposits'
            rt = 'Interest on Time Deposits'
            rs = 'Interest on S & L Association Shares';
      datalines;
   CA 1949 6.2785 6.1924 4.4998 7.2056 -1.0700 0.1080 1.0664
   CA 1950 6.4019 6.2106 4.6821 7.2889 -1.0106 0.1501 1.0767
   CA 1951 6.5058 6.2729 4.8598 7.3827 -1.0024 0.4008 1.1291
   CA 1952 6.4785 6.2729 5.0039 7.4000 -0.9970 0.4492 1.1227
   CA 1953 6.4118 6.2538 5.1761 7.4200 -0.8916 0.4662 1.2110

   ... more lines ...
As shown in the following statements, the SORT procedure is used to sort the data into the required time series cross-sectional format; then PROC PANEL analyzes the data.

   proc sort data=a;
      by state year;
   run;

   proc panel data=a;
      model d = y rd rt rs / fuller parks dasilva m=7;
      model t = y rd rt rs / parks;
      model s = y rd rt rs / parks;
      id state year;
   run;
The income elasticities for liquid assets are greater than 1 except for the demand deposit income elasticity (0.692757) estimated by the Da Silva method. In Output 19.1.1, Output 19.1.2, and Output 19.1.3, the coefficient estimates (–0.29094, –0.43591, and –0.27736) of the service charge on demand deposits (RD) imply that demand deposits increase significantly as the service charge is reduced. The price elasticities (0.227152 and 0.408066) for time deposits (RT) and S&L association shares (RS) have the expected sign. Thus an increase in the interest rate on time deposits or S&L shares will increase the demand for the corresponding liquid asset. Demand deposits and S&L shares appear to be substitutes (see Output 19.1.2, Output 19.1.3, and Output 19.1.5). Time deposits are also substitutes for S&L shares in the time deposit demand equation (see Output 19.1.4), while these liquid assets are independent of each other in Output 19.1.5 (insignificant coefficient estimate of RT, –0.02705). Demand deposits and time deposits appear to be weak complements in Output 19.1.3 and Output 19.1.4, while the cross elasticities between demand deposits and time deposits are not significant in Output 19.1.2 and Output 19.1.5.
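The reason the coefficients can be read as elasticities is that all variables enter in natural logarithms. As a sketch, writing $D$, $Y$, $RD$, $RT$, and $RS$ for the untransformed series (these uppercase symbols are introduced here for illustration; the data set variables d, y, rd, rt, and rs are already their logs), the demand deposit equation is

```latex
\[
\ln D_{it} = \beta_0 + \beta_y \ln Y_{it} + \beta_{rd} \ln RD_{it}
           + \beta_{rt} \ln RT_{it} + \beta_{rs} \ln RS_{it} + u_{it},
\qquad
\beta_y = \frac{\partial \ln D_{it}}{\partial \ln Y_{it}}
\]
```

so each slope is the approximate percentage change in the dependent variable for a 1% change in the regressor. For example, the Fuller-Battese estimate of 1.064058 for y in Output 19.1.1 corresponds to roughly a 1.06% increase in per capita demand deposits for a 1% increase in permanent per capita income, other regressors held fixed.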
Output 19.1.1 Demand for Demand Deposits, Fuller-Battese Method

   The PANEL Procedure
   Fuller and Battese Variance Components (RanTwo)
   Dependent Variable: d Per Capita Demand Deposits

   Model Description
   Estimation Method           Fuller
   Number of Cross Sections         7
   Time Series Length              11

   Fit Statistics
   SSE         0.0795    DFE             72
   MSE         0.0011    Root MSE    0.0332
   R-Square    0.6786

   Variance Component Estimates
   Variance Component for Cross Sections    0.03427
   Variance Component for Time Series       0.00026
   Variance Component for Error             0.00111

   Hausman Test for Random Effects
   DF    m Value    Pr > m
    4       5.51    0.2385

   Parameter Estimates
   Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
   Intercept    1   -1.23606           0.7252     -1.70     0.0926   Intercept
   y            1   1.064058           0.1040     10.23     <.0001   Permanent Per Capita Personal Income
   rd           1   -0.29094           0.0526     -5.53     <.0001   Service Charge on Demand Deposits
   rt           1   0.039388           0.0278      1.42     0.1603   Interest on Time Deposits
   rs           1   -0.32662           0.1140     -2.86     0.0055   Interest on S & L Association Shares
Output 19.1.2 Demand for Demand Deposits, Parks Method

   The PANEL Procedure
   Parks Method Estimation
   Dependent Variable: d Per Capita Demand Deposits

   Model Description
   Estimation Method           Parks
   Number of Cross Sections        7
   Time Series Length             11

   Fit Statistics
   SSE         40.0198    DFE             72
   MSE          0.5558    Root MSE    0.7455
   R-Square     0.9263

   Parameter Estimates
   Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
   Intercept    1   -2.66565           0.4250     -6.27     <.0001   Intercept
   y            1   1.222569           0.0573     21.33     <.0001   Permanent Per Capita Personal Income
   rd           1   -0.43591           0.0272    -16.03     <.0001   Service Charge on Demand Deposits
   rt           1   0.041237           0.0284      1.45     0.1505   Interest on Time Deposits
   rs           1   -0.26683           0.0886     -3.01     0.0036   Interest on S & L Association Shares
Output 19.1.3 Demand for Demand Deposits, DaSilva Method

   The PANEL Procedure
   Da Silva Method Estimation
   Dependent Variable: d Per Capita Demand Deposits

   Model Description
   Estimation Method            DaSilva
   Number of Cross Sections           7
   Time Series Length                11
   Order of MA Error Process          7

   Fit Statistics
   SSE         21609.8923    DFE              72
   MSE           300.1374    Root MSE    17.3245
   R-Square        0.4995

   Variance Component Estimates
   Variance Component for Cross Sections     0.03063
   Variance Component for Time Series       0.000148

   Estimates of Autocovariances
   Lag          Gamma
     0   0.0008558553
     1   0.0009081747
     2   0.0008494797
     3   0.0007889687
     4   0.0013281983
     5   0.0011091685
     6   0.0009874973
     7   0.0008462601
Output 19.1.4 Demand for Time Deposits, Parks Method

   The PANEL Procedure
   Parks Method Estimation
   Dependent Variable: t Per Capita Time Deposits

   Model Description
   Estimation Method           Parks
   Number of Cross Sections        7
   Time Series Length             11

   Fit Statistics
   SSE         34.5713    DFE             72
   MSE          0.4802    Root MSE    0.6929
   R-Square     0.9517

   Parameter Estimates
   Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
   Intercept    1   -5.33334           0.6780     -7.87     <.0001   Intercept
   y            1   1.516344           0.1097     13.82     <.0001   Permanent Per Capita Personal Income
   rd           1   -0.04791           0.0399     -1.20     0.2335   Service Charge on Demand Deposits
   rt           1   0.227152           0.0449      5.06     <.0001   Interest on Time Deposits
   rs           1   -0.42569           0.1708     -2.49     0.0150   Interest on S & L Association Shares
Output 19.1.5 Demand for Savings and Loan Shares, Parks Method

   The PANEL Procedure
   Parks Method Estimation
   Dependent Variable: s Per Capita S & L Association Shares

   Model Description
   Estimation Method           Parks
   Number of Cross Sections        7
   Time Series Length             11

   Fit Statistics
   SSE         39.2550    DFE             72
   MSE          0.5452    Root MSE    0.7384
   R-Square     0.9017

   Parameter Estimates
   Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
   Intercept    1   -8.09632           1.0628     -7.62     <.0001   Intercept
   y            1   1.832988           0.1567     11.70     <.0001   Permanent Per Capita Personal Income
   rd           1   0.576723           0.0589      9.80     <.0001   Service Charge on Demand Deposits
   rt           1   -0.02705           0.0423     -0.64     0.5242   Interest on Time Deposits
   rs           1   0.408066           0.1478      2.76     0.0073   Interest on S & L Association Shares
Example 19.2: The Airline Cost Data: Fixtwo Model

The Christenson Associates airline data are a frequently cited data set (see Greene 2000). The data measure costs, prices of inputs, and utilization rates for six airlines over the time span 1970–1984. This example analyzes the log transformations of the cost, price and quantity, and the raw (not logged) capacity utilization measure. You speculate the following model:

\[
\ln(TC_{it}) = \alpha_N + \gamma_T + (\alpha_i - \alpha_N) + (\gamma_t - \gamma_T)
             + \beta_1 \ln(Q_{it}) + \beta_2 \ln(PF_{it}) + \beta_3 LF_{it} + \epsilon_{it}
\]

where the $\alpha$ are the pure cross-sectional effects and $\gamma$ are the time effects. The actual model speculated is highly nonlinear in the original variables. It would look like the following:

\[
TC_{it} = \exp(\alpha_i + \gamma_t + \beta_3 LF_{it} + \epsilon_{it}) \, Q_{it}^{\beta_1} \, PF_{it}^{\beta_2}
\]

The data and preliminary SAS statements are:
   data airline;
      input Obs I T C Q PF LF;
      label obs = "Observation number";
      label I = "Firm Number (CSID)";
      label T = "Time period (TSID)";
      label Q = "Output in revenue passenger miles (index)";
      label C = "Total cost, in thousands";
      label PF = "Fuel price";
      label LF = "Load Factor (utilization index)";
      datalines;

   ... more lines ...

   data airline;
      set airline;
      lC = log(C);
      lQ = log(Q);
      lPF = log(PF);
      label lC = "Log transformation of costs";
      label lQ = "Log transformation of quantity";
      label lPF= "Log transformation of price of fuel";
   run;

The following statements fit the model.

   proc panel data=airline;
      id i t;
      model lC = lQ lPF LF / fixtwo;
   run;
First, you see the model’s description in Output 19.2.1. The model is a two-way fixed-effects model. There are six cross sections and fifteen time observations.

Output 19.2.1 The Airline Cost Data: Model Description

   The PANEL Procedure
   Fixed Two Way Estimates
   Dependent Variable: lC Log transformation of costs

   Model Description
   Estimation Method           FixTwo
   Number of Cross Sections         6
   Time Series Length              15
The R-square and degrees of freedom can be seen in Output 19.2.2. On the whole, you see a large R-square, so there is a reasonable fit. The degrees of freedom of the estimate are 90 observations minus 14 time dummy variables, minus 5 cross section dummy variables, minus 4 regression parameters (the intercept and three regressors).
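The count can be written out as a short worked check. The symbols $N$, $T$, and $K$ are introduced here for illustration: $N = 6$ cross sections, $T = 15$ time periods, and $K = 4$ regression parameters including the intercept.

```latex
\[
\mathrm{DFE} = NT - (N-1) - (T-1) - K = 90 - 5 - 14 - 4 = 67
\]
\[
\text{Num DF of the F test for no fixed effects} = (N-1) + (T-1) = 5 + 14 = 19
\]
```

Both values agree with the DFE reported in Output 19.2.2 and the numerator and denominator degrees of freedom reported in Output 19.2.3.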
Output 19.2.2 The Airline Cost Data: Fit Statistics

   Fit Statistics
   SSE         0.1768    DFE             67
   MSE         0.0026    Root MSE    0.0514
   R-Square    0.9984
The F test for fixed effects is shown in Output 19.2.3. Testing the hypothesis that there are no fixed effects, you easily reject the null of poolability. There are group effects, or time effects, or both. The test is highly significant. OLS would not give reasonable results.

Output 19.2.3 The Airline Cost Data: Test for Fixed Effects

   F Test for No Fixed Effects
   Num DF    Den DF    F Value    Pr > F
       19        67      23.10    <.0001
Looking at the parameters, you see a more complicated pattern. Most of the cross-sectional effects are highly significant (with the exception of CS2). This means that the cross sections are significantly different from the sixth cross section. Many of the time effects show significance, but this is not uniform. It looks like the significance might be driven by a large 15th period effect (the reference period), since the first six time effects are negative and of similar magnitude. The time dummy variables taper off in size and lose significance from time period 12 onward. There are many causes to which you could attribute this decay of time effects. The time period of the data spans the OPEC oil embargoes and the dissolution of the Civil Aeronautics Board (CAB). These two forces are two possible reasons to observe the decay and parameter instability. As for the regression parameters, you see that quantity affects cost positively, and the price of fuel has a positive effect, but load factors negatively affect the costs of the airlines in this sample. The somewhat disturbing result is that the fuel cost is not significant. If the time effects are proxies for the effect of the oil embargoes, then an insignificant fuel cost parameter would make some sense. If the dummy variables proxy for the dissolution of the CAB, then the effect of load factors is also not being precisely estimated.
Output 19.2.4 The Airline Cost Data: Parameter Estimates

   Parameter Estimates
   Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
   CS1          1   0.174237           0.0861      2.02     0.0470   Cross Sectional Effect 1
   CS2          1   0.111412           0.0780      1.43     0.1576   Cross Sectional Effect 2
   CS3          1   -0.14354           0.0519     -2.77     0.0073   Cross Sectional Effect 3
   CS4          1    0.18019           0.0321      5.61     <.0001   Cross Sectional Effect 4
   CS5          1   -0.04671           0.0225     -2.08     0.0415   Cross Sectional Effect 5
   TS1          1   -0.69286           0.3378     -2.05     0.0442   Time Series Effect 1
   TS2          1   -0.63816           0.3321     -1.92     0.0589   Time Series Effect 2
   TS3          1   -0.59554           0.3294     -1.81     0.0751   Time Series Effect 3
   TS4          1   -0.54192           0.3189     -1.70     0.0939   Time Series Effect 4
   TS5          1   -0.47288           0.2319     -2.04     0.0454   Time Series Effect 5
   TS6          1   -0.42705           0.1884     -2.27     0.0267   Time Series Effect 6
   TS7          1   -0.39586           0.1733     -2.28     0.0255   Time Series Effect 7
   TS8          1   -0.33972           0.1501     -2.26     0.0269   Time Series Effect 8
   TS9          1    -0.2718           0.1348     -2.02     0.0478   Time Series Effect 9
   TS10         1   -0.22734           0.0763     -2.98     0.0040   Time Series Effect 10
   TS11         1    -0.1118           0.0319     -3.50     0.0008   Time Series Effect 11
   TS12         1   -0.03366           0.0429     -0.78     0.4354   Time Series Effect 12
   TS13         1   -0.01775           0.0363     -0.49     0.6261   Time Series Effect 13
   TS14         1   -0.01865           0.0305     -0.61     0.5430   Time Series Effect 14
   Intercept    1   12.93834           2.2181      5.83     <.0001   Intercept
   lQ           1   0.817264           0.0318     25.66     <.0001   Log transformation of quantity
   lPF          1   0.168732           0.1635      1.03     0.3057   Log transformation of price of fuel
   LF           1   -0.88267           0.2617     -3.37     0.0012   Load Factor (utilization index)
ODS Graphics Plots

ODS graphics plots can be obtained to graphically analyze the results. The following statements show how to generate the plots. If the PLOTS=ALL option is specified, all available plots are produced in two panels. For a complete list of options, see the section “ODS Graphics” on page 1320.

   proc panel data=airline;
      id i t;
      model lC = lQ lPF LF / fixtwo plots = all;
   run;
The preceding statements result in the plots shown in Output 19.2.5 and Output 19.2.6.

Output 19.2.5 Diagnostic Panel 1

Output 19.2.6 Diagnostic Panel 2
The UNPACK and ONLY options produce individual detail images of paneled plots. The graph shown in Output 19.2.7 shows a detail plot of residuals by cross section. The packed version always puts all cross sections on one plot while the unpacked one shows the cross sections in groups of ten to avoid loss of detail.

   proc panel data=airline;
      id i t;
      model lC = lQ lPF LF / fixtwo plots(unpack only) = residsurface;
   run;
Output 19.2.7 Surface Plot of the Residual
Example 19.3: The Airline Cost Data: Further Analysis

Using the same data as in Example 19.2, you further investigate the ‘true’ effect of fuel prices. Specifically, you run the FixOne model, ignoring time effects. You specify the following statements in PROC PANEL to run this model:

   proc panel data=airline;
      id i t;
      model lC = lQ lPF LF / fixone;
   run;
The preceding statements result in Output 19.3.1. The fit seems to have deteriorated somewhat. The SSE rises from 0.1768 to 0.2926.
Output 19.3.1 The Airline Cost Data: Fit Statistics

   The PANEL Procedure
   Fixed One Way Estimates
   Dependent Variable: lC Log transformation of costs

   Fit Statistics
   SSE         0.2926    DFE             81
   MSE         0.0036    Root MSE    0.0601
   R-Square    0.9974
You still reject poolability based on the F test in Output 19.3.2 at all accepted levels of significance.

Output 19.3.2 The Airline Cost Data: Test for Fixed Effects

   F Test for No Fixed Effects
   Num DF    Den DF    F Value    Pr > F
        5        81      57.74    <.0001
The parameters change somewhat dramatically as shown in Output 19.3.3. The effect of fuel costs comes in very strong and significant. The load factor’s coefficient increases in magnitude, although not as dramatically. This suggests that the fixed time effects might be proxies for both the oil shocks and deregulation.

Output 19.3.3 The Airline Cost Data: Parameter Estimates

   Parameter Estimates
   Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
   CS1          1   -0.08708           0.0842     -1.03     0.3041   Cross Sectional Effect 1
   CS2          1   -0.12832           0.0757     -1.69     0.0940   Cross Sectional Effect 2
   CS3          1   -0.29599           0.0500     -5.92     <.0001   Cross Sectional Effect 3
   CS4          1   0.097487           0.0330      2.95     0.0041   Cross Sectional Effect 4
   CS5          1   -0.06301           0.0239     -2.64     0.0100   Cross Sectional Effect 5
   Intercept    1    9.79304           0.2636     37.15     <.0001   Intercept
   lQ           1   0.919293           0.0299     30.76     <.0001   Log transformation of quantity
   lPF          1   0.417492           0.0152     27.47     <.0001   Log transformation of price of fuel
   LF           1   -1.07044           0.2017     -5.31     <.0001   Load Factor (utilization index)
Example 19.4: The Airline Cost Data: Random-Effects Models

This example continues to use the Christenson Associates airline data, which measures costs, prices of inputs, and utilization rates for six airlines over the time span 1970–1984. There are six cross sections and fifteen time observations. Here, you examine the different estimates generated from the one-way random-effects and two-way random-effects models, by using four different methods to estimate the variance components: Fuller and Battese, Wansbeek and Kapteyn, Wallace and Hussain, and Nerlove. The data for this example are created by the DATA step statements shown in Example 19.2. The PROC PANEL statements necessary to generate the estimates are as follows:

   proc panel data=airline outest=estimates;
      id I T;
      RANONE:   model lC = lQ lPF lF / ranone vcomp=fb;
      RANONEwk: model lC = lQ lPF lF / ranone vcomp=wk;
      RANONEwh: model lC = lQ lPF lF / ranone vcomp=wh;
      RANONEnl: model lC = lQ lPF lF / ranone vcomp=nl;
      RANTWO:   model lC = lQ lPF lF / rantwo vcomp=fb;
      RANTWOwk: model lC = lQ lPF lF / rantwo vcomp=wk;
      RANTWOwh: model lC = lQ lPF lF / rantwo vcomp=wh;
      RANTWOnl: model lC = lQ lPF lF / rantwo vcomp=nl;
      POOLED:   model lC = lQ lPF lF / pooled;
      BTWNG:    model lC = lQ lPF lF / btwng;
      BTWNT:    model lC = lQ lPF lF / btwnt;
   run;

   data table;
      set estimates;
      VarCS  = round(_VARCS_,.00001);
      VarTS  = round(_VARTS_,.00001);
      VarErr = round(_VARERR_,.00001);
      Int    = round(Intercept,.0001);
      lQ2    = round(lQ,.0001);
      lPF2   = round(lPF,.0001);
      lF2    = round(lF,.0001);
      if _n_ >= 9 then do;
         VarCS = . ;
         VarTS = . ;
      end;
      keep _MODEL_ _METHOD_ VarCS VarTS VarErr Int lQ2 lPF2 lF2;
   run;
The parameter estimates and variance components for both models are reported in Output 19.4.1 and Output 19.4.2.

   title "Parameter Estimates";
   proc print data=table label noobs;
      label _MODEL_  = "Model"
            _METHOD_ = "Method"
            Int      = "Intercept"
            lQ2      = "lQ"
            lPF2     = "lPF"
            lF2      = "lF";
      var _METHOD_ _MODEL_ Int lQ2 lPF2 lF2;
   run;
Output 19.4.1 Parameter Estimates

   Parameter Estimates
   Method      Model       Intercept       lQ       lPF        lF
   _Ran1FB_    RANONE         9.7097   0.9187    0.4177   -1.0700
   _Ran1WK_    RANONEWK       9.6295   0.9069    0.4227   -1.0646
   _Ran1WH_    RANONEWH       9.6439   0.9090    0.4218   -1.0650
   _Ran1NL_    RANONENL       9.6406   0.9086    0.4220   -1.0648
   _Ran2FB_    RANTWO         9.3627   0.8665    0.4362   -0.9805
   _Ran2WK_    RANTWOWK       9.6436   0.8433    0.4097   -0.9263
   _Ran2WH_    RANTWOWH       9.3793   0.8692    0.4353   -0.9852
   _Ran2NL_    RANTWONL       9.9726   0.8387    0.3829   -0.9134
   _POOLED_    POOLED         9.5169   0.8827    0.4540   -1.6275
   _BTWGRP_    BTWNG         85.8094   0.7825   -5.5240   -1.7509
   _BTWTME_    BTWNT         11.1849   1.1333    0.3343   -1.3509
   title "Variance Component Estimates";
   proc print data=table label noobs;
      label _MODEL_  = "Model"
            _METHOD_ = "Method"
            VarCS    = "Variance Component for Cross Sections"
            VarTS    = "Variance Component for Time Series"
            VarErr   = "Variance Component for Error";
      var _METHOD_ _MODEL_ VarCS VarTS VarErr;
   run;
   title '';
Output 19.4.2 Variance Component Estimates

   Variance Component Estimates
                            Variance Component    Variance Component    Variance Component
   Method      Model        for Cross Sections       for Time Series             for Error
   _Ran1FB_    RANONE                  0.47442                     .               0.00361
   _Ran1WK_    RANONEWK                0.01602                     .               0.00361
   _Ran1WH_    RANONEWH                0.01871                     .               0.00328
   _Ran1NL_    RANONENL                0.01745                     .               0.00325
   _Ran2FB_    RANTWO                  0.01744               0.00108               0.00264
   _Ran2WK_    RANTWOWK                0.01561               0.03913               0.00264
   _Ran2WH_    RANTWOWH                0.01875               0.00085               0.00250
   _Ran2NL_    RANTWONL                0.01707               0.05909               0.00196
   _POOLED_    POOLED                        .                     .               0.01553
   _BTWGRP_    BTWNG                         .                     .               0.01584
   _BTWTME_    BTWNT                         .                     .               0.00051
In the random-effects model, individual constant terms are viewed as randomly distributed across cross-sectional units and not as parametric shifts of the regression function, as in the fixed-effects model. This is appropriate when the sampled cross-sectional units are drawn from a large population. Clearly, in this example, the six airlines are a sample of all the airlines in the industry and not an exhaustive, or nearly exhaustive, list.

There are four ways of computing the variance components in the one-way random-effects model. The method by Fuller and Battese (1974) (FB) uses a “fitting of constants” method to estimate them. The Wansbeek and Kapteyn (1989) (WK) method uses the true disturbances, while the Wallace and Hussain (WH) method uses ordinary least squares residuals. Looking at the estimates of the variance components for cross section and error in Output 19.4.2, you see that equal variance components for error are computed for both FB and WK, while the WH and Nerlove (NL) estimates are nearly equal. All four techniques produce different variance components for cross sections. These estimates are then used to estimate the values of the parameters in Output 19.4.1. All the parameters appear to have similar and equally plausible estimates. Both the index for output in revenue passenger miles (lQ) and fuel price (lPF) have small, positive effects on total costs, which you would expect. The load factor (LF) has a somewhat larger and negative effect on total costs, suggesting that as utilization increases, costs decrease.

In the two-way random-effects model, as in the one-way random-effects model, the variance components for error produced by the FB and WK methods are equal. However, in this case, the WH and NL methods produce variance estimates that are dissimilar. The estimates of the variance component for cross sections are all different, but in a close range. The same cannot be said for the variance component for time series.

As varied as each of the variance estimates may be, they produce parameter estimates that are similar and plausible. As with the one-way effects model, the index for output (lQ) and fuel price (lPF) are small and positive. The load factor (LF) estimates are all negative and, with the exception of the estimate produced by the WH method, somewhat smaller than the estimates produced in the one-way model. During the time the data were collected, the Civil Aeronautics Board dissolved, so it is possible that the dummy variables are proxies for this dissolution. This would lead to the decay of time effects and an imprecise estimation of the effects of the load factors, even though the estimates are statistically significant.

The pooled estimates give you something to compare the random-effects estimates against. You see that signs and magnitudes of output and fuel price are similar but that the magnitude of the load factor coefficient is somewhat larger under pooling. Since the model appears to have both cross-sectional and time series effects, the pooled model should not be used.

Finally, you examine the between groups estimators. For the between groups estimate, you are looking at each airline’s data averaged across time. You see in Output 19.4.1 that the between groups parameter estimates are radically different from all other parameter estimates. This could indicate that the time component is not being appropriately handled with this technique. For the between times estimate, you are looking at the average across all airlines in each time period. In this case, the parameter estimates are of the same sign and closer in magnitude to the previously computed estimates. Both the output and load factor effects appear to have more bearing on total costs.
Example 19.5: Using the FLATDATA Statement

Sometimes the data can be found in compressed form, where each line consists of all observations for the dependent and independent variables for the cross section. To illustrate, suppose you have a data set with 20 cross sections where each cross section consists of observations for six time periods. Each time period has values for the dependent and independent variables Y_1 ... Y_6 and X_1 ... X_6. The cs and num variables represent other character and numeric variables that are constant across each cross section. The observations for the first five cross sections along with other variables are shown in Output 19.5.1. In this example, i represents the cross section. The time period is identified by the subscript on the Y and X variables; it ranges from 1 to 6.

Output 19.5.1 Compressed Data Set

Obs   i   cs         num        X_1        X_2        X_3        X_4        X_5
  1   1   CS1   -1.56058    0.40268    0.91951    0.69482   -2.28899   -1.32762
  2   2   CS2    0.30989    1.01950   -0.04699   -0.96695   -1.08345   -0.05180
  3   3   CS3    0.85054    0.60325    0.71154    0.66168   -0.66823   -1.87550
  4   4   CS4   -0.18885   -0.64946   -1.23355    0.04554   -0.24996    0.09685
  5   5   CS5   -0.04761   -0.79692    0.63445   -2.23539   -0.37629   -0.82212

Obs        X_6        Y_1        Y_2        Y_3        Y_4        Y_5        Y_6
  1    1.92348    2.30418    2.11850    2.66009   -4.94104   -0.83053    5.01359
  2    0.30266    4.50982    3.73887    1.44984   -1.02996    2.78260    1.73856
  3    0.55065    4.07276    4.89621    3.90470    1.03437    0.54598    5.01460
  4   -0.92771    2.40304    1.48182    2.70579    3.82672    4.01117    1.97639
  5   -0.70566    3.58092    6.08917    3.08249    4.26605    3.65452    0.81826
Since the PANEL procedure cannot work directly with the data in compressed form, the FLATDATA statement can be used to transform the data. The OUT= option can be used to output transformed data to a data set.

   proc panel data=flattest;
      flatdata indid=i tsname="t" transform=(X Y)
               keep=( cs num seed ) / out=flat_out;
      id i t;
      model y = x / fixone noint;
   run;
The first six observations of the uncompressed data set and the results for the fitted one-way fixed-effects model are shown in Output 19.5.2 and Output 19.5.3.

Output 19.5.2 Uncompressed Data Set

Obs   I   t          X          Y   CS         NUM
  1   1   1    0.40268    2.30418   CS1   -1.56058
  2   1   2    0.91951    2.11850   CS1   -1.56058
  3   1   3    0.69482    2.66009   CS1   -1.56058
  4   1   4   -2.28899   -4.94104   CS1   -1.56058
  5   1   5   -1.32762   -0.83053   CS1   -1.56058
  6   1   6    1.92348    5.01359   CS1   -1.56058
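The reshaping that the FLATDATA statement performs can be illustrated outside SAS. The following Python sketch is not how PROC PANEL is implemented; the function name flatten and its parameters are hypothetical. It takes the first cross section from Output 19.5.1 and produces one record per time period, which should match the six observations in Output 19.5.2.

```python
def flatten(record, stems, n_periods, id_name="i", ts_name="t", keep=()):
    """Turn one wide record (X_1..X_n, Y_1..Y_n) into n long records."""
    long_rows = []
    for t in range(1, n_periods + 1):
        row = {id_name: record[id_name], ts_name: t}
        for stem in stems:
            # X_3 in the wide layout becomes the value of X at t = 3.
            row[stem] = record[f"{stem}_{t}"]
        for name in keep:
            # Variables that are constant within a cross section are copied.
            row[name] = record[name]
        long_rows.append(row)
    return long_rows

# First cross section, copied from Output 19.5.1.
wide = {
    "i": 1, "cs": "CS1", "num": -1.56058,
    "X_1": 0.40268, "X_2": 0.91951, "X_3": 0.69482,
    "X_4": -2.28899, "X_5": -1.32762, "X_6": 1.92348,
    "Y_1": 2.30418, "Y_2": 2.11850, "Y_3": 2.66009,
    "Y_4": -4.94104, "Y_5": -0.83053, "Y_6": 5.01359,
}

long_rows = flatten(wide, stems=("X", "Y"), n_periods=6, keep=("cs", "num"))
for row in long_rows:
    print(row)
```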
Output 19.5.3 Estimation with the FLATDATA Statement

The PANEL Procedure
Fixed One Way Estimates

Dependent Variable: Y

Parameter Estimates

                                Standard
Variable   DF    Estimate       Error     t Value   Pr > |t|   Label
CS1         1    0.945589      0.4579        2.06     0.0416   Cross Sectional Effect 1
CS2         1    2.475449      0.4582        5.40     <.0001   Cross Sectional Effect 2
CS3         1    3.250337      0.4579        7.10     <.0001   Cross Sectional Effect 3
CS4         1    3.712149      0.4617        8.04     <.0001   Cross Sectional Effect 4
CS5         1    5.023584      0.4661       10.78     <.0001   Cross Sectional Effect 5
CS6         1    6.791074      0.4707       14.43     <.0001   Cross Sectional Effect 6
CS7         1    6.11374       0.4649       13.15     <.0001   Cross Sectional Effect 7
CS8         1    8.733843      0.4580       19.07     <.0001   Cross Sectional Effect 8
CS9         1    8.916685      0.4587       19.44     <.0001   Cross Sectional Effect 9
CS10        1    8.913916      0.4614       19.32     <.0001   Cross Sectional Effect 10
CS11        1    10.82881      0.4580       23.64     <.0001   Cross Sectional Effect 11
CS12        1    11.40867      0.4603       24.79     <.0001   Cross Sectional Effect 12
CS13        1    12.8865       0.4585       28.10     <.0001   Cross Sectional Effect 13
CS14        1    13.37819      0.4580       29.21     <.0001   Cross Sectional Effect 14
CS15        1    14.72619      0.4579       32.16     <.0001   Cross Sectional Effect 15
CS16        1    15.58813      0.4580       34.04     <.0001   Cross Sectional Effect 16
CS17        1    17.77983      0.4579       38.83     <.0001   Cross Sectional Effect 17
CS18        1    17.9909       0.4618       38.96     <.0001   Cross Sectional Effect 18
CS19        1    18.87283      0.4583       41.18     <.0001   Cross Sectional Effect 19
CS20        1    19.40034      0.4579       42.37     <.0001   Cross Sectional Effect 20
X           1    2.010753      0.1217       16.52     <.0001
Example 19.6: The Cigarette Sales Data: Dynamic Panel Estimation with GMM

In this example, a dynamic panel demand model for cigarette sales is estimated. It illustrates the application of the method described in the section “Dynamic Panel Estimator” on page 1304. The data are a panel from 46 American states over the period 1963–92. See Baltagi and Levin (1992) and Baltagi (1995) for data description. All variables were transformed by taking the natural logarithm. The data set CIGAR is shown in the following statements.

   data cigar;
      input state year price pop pop_16 cpi ndi sales pimin;
      label
         state  = 'State abbreviation'
         year   = 'YEAR'
         price  = 'Price per pack of cigarettes'
         pop    = 'Population'
         pop_16 = 'Population above the age of 16'
         cpi    = 'Consumer price index with (1983=100)'
         ndi    = 'Per capita disposable income'
         sales  = 'Cigarette sales in packs per capita'
         pimin  = 'Minimum price in adjoining states per pack of cigarettes';
      datalines;
   1 63 28.6 3383 2236.5 30.6 1558.3045298 93.9 26.1
   1 64 29.8 3431 2276.7 31.0 1684.0732025 95.4 27.5
   1 65 29.8 3486 2327.5 31.5 1809.8418752 98.5 28.9
   1 66 31.5 3524 2369.7 32.4 1915.1603572 96.4 29.5
   1 67 31.6 3533 2393.7 33.4 2023.5463678 95.5 29.6
   1 68 35.6 3522 2405.2 34.8 2202.4855362 88.4 32
   1 69 36.6 3531 2411.9 36.7 2377.3346665 90.1 32.8
   1 70 39.6 3444 2394.6 38.8 2591.0391591 89.8 34.3
   1 71 42.7 3481 2443.5 40.5 2785.3159706 95.4 35.8

   ... more lines ...
The following statements sort the data by the STATE and YEAR variables.

   proc sort data=cigar;
      by state year;
   run;
Next, logarithms of the variables required for regression estimation are calculated, as shown in the following statements:

   data cigar;
      set cigar;
      lsales = log(sales);
      lprice = log(price);
      lndi   = log(ndi);
      lpimin = log(pimin);
      label lprice = 'Log price per pack of cigarettes';
      label lndi   = 'Log per capita disposable income';
      label lsales = 'Log cigarette sales in packs per capita';
      label lpimin = 'Log minimum price in adjoining states per pack of cigarettes';
   run;
The following statements create the CIGAR_LAG data set, which contains a lagged variable computed within each cross section.

   proc panel data=cigar;
      id state year;
      clag lsales(1) / out=cigar_lag;
   run;

   data cigar_lag;
      set cigar_lag;
      label lsales_1 = 'Lagged log cigarette sales in packs per capita';
   run;
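The idea of lagging within each cross section, so that the first period of one state never picks up the last period of the previous state, can be sketched in Python. This is an illustration only: the helper lag_within_groups is hypothetical, the lsales values are made up, and the sketch assumes that each state has consecutive, equally spaced years. How the CLAG statement treats gaps and missing values is not covered here.

```python
from itertools import groupby

def lag_within_groups(rows, group_key, time_key, value_key, lag=1):
    """Add '<value_key>_<lag>' holding the value from `lag` periods earlier
    in the same group; None where no earlier observation exists."""
    ordered = sorted(rows, key=lambda r: (r[group_key], r[time_key]))
    out = []
    for _, group in groupby(ordered, key=lambda r: r[group_key]):
        group = list(group)
        for k, row in enumerate(group):
            new_row = dict(row)
            # Only look back inside the current group, never across groups.
            new_row[f"{value_key}_{lag}"] = (
                group[k - lag][value_key] if k >= lag else None
            )
            out.append(new_row)
    return out

# Hypothetical values, used only to show the mechanics.
rows = [
    {"state": 1, "year": 63, "lsales": 4.5},
    {"state": 1, "year": 64, "lsales": 4.6},
    {"state": 2, "year": 63, "lsales": 4.0},
    {"state": 2, "year": 64, "lsales": 4.1},
]

lagged = lag_within_groups(rows, "state", "year", "lsales")
for row in lagged:
    print(row)
```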
Finally, the model is estimated by a two-step GMM method. Five lags (MAXBAND=5) of the dependent variable are used as instruments. The NOLEVELS option is specified to avoid the use of level equations, as shown in the following statements:

   proc panel data=cigar_lag;
      inst depvar;
      model lsales = lsales_1 lprice lndi lpimin
            / gmm nolevels twostep maxband=5;
      id state year;
   run;
Output 19.6.1 Estimation with GMM

The PANEL Procedure
GMM: First Differences Transformation

Dependent Variable: lsales Log cigarette sales in packs per capita

Model Description
Estimation Method                             GMM
Number of Cross Sections                       46
Time Series Length                             30
Estimate Stage                                  2
Maximum Number of Time Periods (MAXBAND)        5

Fit Statistics
SSE    3357.3384     DFE          1283
MSE       2.6168     Root MSE   1.6176

Parameter Estimates

                               Standard
Variable    DF    Estimate        Error    t Value   Pr > |t|
Intercept    1     0.00415     0.000873       4.75     <.0001
lsales_1     1    0.603963      0.00937      64.44     <.0001
lprice       1    -0.24099       0.0305      -7.90     <.0001
lndi         1     0.17851       0.0121      14.81     <.0001
lpimin       1    -0.07458       0.0312      -2.39     0.0169
If the theory suggests that there are other valid instruments, the PREDETERMINED, EXOGENOUS, and CORRELATED options can also be used.
References: PANEL Procedure

Arellano, M. (1987), “Computing Robust Standard Errors for Within-Groups Estimators,” Oxford Bulletin of Economics and Statistics, 49, 431-434.

Arellano, M. and Bond, S. (1991), “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations,” The Review of Economic Studies, 58(2), 277-297.

Arellano, M. and Bover, O. (1995), “Another Look at the Instrumental Variable Estimation of Error-Components Models,” Journal of Econometrics, 68(1), 29-51.

Baltagi, B. H. (1995), Econometric Analysis of Panel Data, New York: John Wiley & Sons.

Baltagi, B. H. and Chang, Y. (1994), “Incomplete Panels: A Comparative Study of Alternative Estimators for the Unbalanced One-Way Error Component Regression Model,” Journal of Econometrics, 62(2), 67-89.

Baltagi, B. H. and Levin, D. (1992), “Cigarette Taxation: Raising Revenues and Reducing Consumption,” Structural Change and Economic Dynamics, 3, 321-335.

Baltagi, B. H., Song, Seuck H., and Jung, Byoung C. (2002), “A Comparative Study of Alternative Estimators for the Unbalanced Two-Way Error Component Regression Model,” Econometrics Journal, 5, 480-493.

Breusch, T. S. and Pagan, A. R. (1980), “The Lagrange Multiplier Test and Its Applications to Model Specification in Econometrics,” The Review of Economic Studies, 47(1), 239-253.

Buse, A. (1973), “Goodness of Fit in Generalized Least Squares Estimation,” American Statistician, 27, 106-108.

Davidson, R. and MacKinnon, J. G. (1993), Estimation and Inference in Econometrics, New York: Oxford University Press.

Da Silva, J. G. C. (1975), “The Analysis of Cross-Sectional Time Series Data,” Ph.D. dissertation, Department of Statistics, North Carolina State University.

Davis, P. (2002), “Estimating Multi-Way Error Components Models with Unbalanced Data Structures,” Journal of Econometrics, 106(1), 67-95.

Feige, E. L. (1964), The Demand for Liquid Assets: A Temporal Cross-Section Analysis, Englewood Cliffs: Prentice-Hall.

Feige, E. L. and Swamy, P. A. V. (1974), “A Random Coefficient Model of the Demand for Liquid Assets,” Journal of Money, Credit, and Banking, 6, 241-252.

Fuller, W. A. and Battese, G. E. (1974), “Estimation of Linear Models with Crossed-Error Structure,” Journal of Econometrics, 2, 67-78.

Greene, W. H. (1990), Econometric Analysis, First Edition, New York: Macmillan Publishing Company.

Greene, W. H. (2000), Econometric Analysis, Fourth Edition, New York: Macmillan Publishing Company.

Hausman, J. A. (1978), “Specification Tests in Econometrics,” Econometrica, 46, 1251-1271.

Hausman, J. A. and Taylor, W. E. (1982), “A Generalized Specification Test,” Economics Letters, 8, 239-245.

Hsiao, C. (1986), Analysis of Panel Data, Cambridge: Cambridge University Press.

Judge, G. G., Griffiths, W. E., Hill, R. C., Lutkepohl, H., and Lee, T. C. (1985), The Theory and Practice of Econometrics, Second Edition, New York: John Wiley & Sons.

Kmenta, J. (1971), Elements of Econometrics, Ann Arbor: The University of Michigan Press.

Lamotte, L. R. (1994), “A Note on the Role of Independence in t Statistics Constructed from Linear Statistics in Regression Models,” The American Statistician, 48(3), 238-240.

Maddala, G. S. (1977), Econometrics, New York: McGraw-Hill Co.

Parks, R. W. (1967), “Efficient Estimation of a System of Regression Equations When Disturbances Are Both Serially and Contemporaneously Correlated,” Journal of the American Statistical Association, 62, 500-509.

SAS Institute Inc. (1979), SAS Technical Report S-106, PANEL: A SAS Procedure for the Analysis of Time-Series Cross-Section Data, Cary, NC: SAS Institute Inc.

Searle, S. R. (1971), “Topics in Variance Component Estimation,” Biometrics, 26, 1-76.

Seely, J. (1969), “Estimation in Finite-Dimensional Vector Spaces with Application to the Mixed Linear Model,” Ph.D. dissertation, Department of Statistics, Iowa State University.

Seely, J. (1970a), “Linear Spaces and Unbiased Estimation,” Annals of Mathematical Statistics, 41, 1725-1734.

Seely, J. (1970b), “Linear Spaces and Unbiased Estimation--Application to the Mixed Linear Model,” Annals of Mathematical Statistics, 41, 1735-1748.

Seely, J. and Soong, S. (1971), “A Note on MINQUE’s and Quadratic Estimability,” Corvallis, Oregon: Oregon State University.

Seely, J. and Zyskind, G. (1971), “Linear Spaces and Minimum Variance Unbiased Estimation,” Annals of Mathematical Statistics, 42, 691-703.

Theil, H. (1961), Economic Forecasts and Policy, Second Edition, Amsterdam: North-Holland, 435-437.

Wansbeek, T. and Kapteyn, A. (1989), “Estimation of the Error-Components Model with Incomplete Panels,” Journal of Econometrics, 41, 341-361.

White, H. (1980), “A Heteroscedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroscedasticity,” Econometrica, 48, 817-838.

Wu, D. M. (1973), “Alternative Tests of Independence between Stochastic Regressors and Disturbances,” Econometrica, 41(4), 733-750.

Zellner, A. (1962), “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias,” Journal of the American Statistical Association, 57, 348-368.
Chapter 20
The PDLREG Procedure

Contents
   Overview: PDLREG Procedure
   Getting Started: PDLREG Procedure
      Introductory Example
   Syntax: PDLREG Procedure
      Functional Summary
      PROC PDLREG Statement
      BY Statement
      MODEL Statement
      OUTPUT Statement
      RESTRICT Statement
   Details: PDLREG Procedure
      Missing Values
      Polynomial Distributed Lag Estimation
      Autoregressive Error Model Estimation
      OUT= Data Set
      Printed Output
      ODS Table Names
   Examples: PDLREG Procedure
      Example 20.1: Industrial Conference Board Data
      Example 20.2: Money Demand Model
   References
Overview: PDLREG Procedure

The PDLREG procedure estimates regression models for time series data in which the effects of some of the regressor variables are distributed across time. The distributed lag model assumes that the effect of an input variable X on an output Y is distributed over time. If you change the value of X at time t, Y will experience some immediate effect at time t, and it will also experience a delayed effect at times t+1, t+2, and so on up to time t+p for some limit p.

The regression model supported by PROC PDLREG can include any number of regressors with distributed lags and any number of covariates. (Simple regressors without lag distributions are called covariates.) For example, the two-regressor model with a distributed lag effect for one regressor is written

$$ y_t = \alpha + \sum_{i=0}^{p} b_i x_{t-i} + \gamma z_t + u_t $$

Here, $x_t$ is the regressor with a distributed lag effect, $z_t$ is a simple covariate, and $u_t$ is an error term.

The distribution of the lagged effects is modeled by Almon lag polynomials. The coefficients $b_i$ of the lagged values of the regressor are assumed to lie on a polynomial curve. That is,

$$ b_i = \alpha_0 + \sum_{j=1}^{d} \alpha_j i^j $$

where $d$ ($\le p$) is the degree of the polynomial. For numerically efficient estimation, the PDLREG procedure uses orthogonal polynomials. The preceding equation can be transformed into orthogonal polynomials:

$$ b_i = \alpha_0 + \sum_{j=1}^{d} \alpha_j f_j(i) $$

where $f_j(i)$ is a polynomial of degree $j$ in the lag length $i$, and $\alpha_j$ is a coefficient estimated from the data.

The PDLREG procedure supports endpoint restrictions for the polynomial. That is, you can constrain the estimated polynomial lag distribution curve so that $b_{-1} = 0$ or $b_{p+1} = 0$, or both. You can also impose linear restrictions on the parameter estimates for the covariates.

You can specify a minimum degree and a maximum degree for the lag distribution polynomial, and the procedure fits polynomials for all degrees in the specified range. (However, if distributed lags are specified for more than one regressor, you can specify a range of degrees for only one of them.)

The PDLREG procedure can also test for autocorrelated residuals and perform autocorrelated error correction by using the autoregressive error model. You can specify any order autoregressive error model and can specify several different estimation methods for the autoregressive model, including exact maximum likelihood.

The PDLREG procedure computes generalized Durbin-Watson statistics to test for autocorrelated residuals. For models with lagged dependent variables, the procedure can produce Durbin h and Durbin t statistics. You can request significance level p-values for the Durbin-Watson, Durbin h, and Durbin t statistics. See Chapter 8, “The AUTOREG Procedure,” for details about these statistics.

The PDLREG procedure assumes that the input observations form a time series. Thus, the PDLREG procedure should be used only for ordered and equally spaced time series data.
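To make the Almon polynomial concrete, the following Python sketch evaluates lag coefficients from polynomial coefficients in the plain power basis (the first polynomial form described in this overview), not the orthogonal basis that PROC PDLREG constructs internally. The function names and the numeric values are hypothetical. The quadratic was chosen so that the curve passes through zero at lag -1 and at lag p+1, which corresponds to imposing both endpoint restrictions.

```python
def polynomial_value(alphas, i):
    """Evaluate alpha_0 + alpha_1 * i + alpha_2 * i**2 + ... at lag i."""
    return sum(a * i ** j for j, a in enumerate(alphas))

def almon_lag_coefficients(alphas, p):
    """Return the lag coefficients for lags 0 through p."""
    return [polynomial_value(alphas, i) for i in range(p + 1)]

# Hypothetical quadratic: 0.1 * (i + 1) * (4 - i) = 0.4 + 0.3*i - 0.1*i**2,
# which is zero at i = -1 and at i = p + 1 = 4.
alphas = [0.4, 0.3, -0.1]
p = 3

lag_coefficients = almon_lag_coefficients(alphas, p)
print(lag_coefficients)
print(polynomial_value(alphas, -1), polynomial_value(alphas, p + 1))
```

With p + 1 = 4 lag coefficients determined by only three polynomial coefficients, the polynomial form reduces the number of free parameters; the endpoint restrictions reduce it further.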
Getting Started: PDLREG Procedure

Use the MODEL statement to specify the regression model. The PDLREG procedure’s MODEL statement is written like MODEL statements in other SAS regression procedures, except that a regressor can be followed by a lag distribution specification enclosed in parentheses. For example, the following MODEL statement regresses Y on X and Z and specifies a distributed lag for X:

   model y = x(4,2) z;
The notation X(4,2) specifies that the model includes X and 4 lags of X, with the coefficients of X and its lags constrained to follow a second-degree (quadratic) polynomial. Thus, the regression model specified by this MODEL statement is

$$ y_t = a + b_0 x_t + b_1 x_{t-1} + b_2 x_{t-2} + b_3 x_{t-3} + b_4 x_{t-4} + c z_t + u_t $$

$$ b_i = \alpha_0 + \alpha_1 f_1(i) + \alpha_2 f_2(i) $$

where $f_1(i)$ is a polynomial of degree 1 in $i$ and $f_2(i)$ is a polynomial of degree 2 in $i$.

Lag distribution specifications are enclosed in parentheses and follow the name of the regressor variable. The general form of the lag distribution specification is

   regressor-name ( length, degree, minimum-degree, end-constraint )
where

length
   is the length of the lag distribution, that is, the number of lags of the regressor to use.

degree
   is the degree of the distribution polynomial.

minimum-degree
   is an optional minimum degree for the distribution polynomial.

end-constraint
   is an optional endpoint restriction specification, which can have the value FIRST, LAST, or BOTH.
If the minimum-degree option is specified, the PDLREG procedure estimates models for all degrees between minimum-degree and degree.
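As a rough illustration of what a lag length of 4 means for the data, the following Python sketch builds the rows of current and lagged regressor values that enter a model such as X(4,2). The helper name lagged_columns and the toy series are hypothetical. The sketch simply drops the first observations for which not all lags are available; that is an assumption made here for simplicity, not a statement of how PROC PDLREG treats initial observations.

```python
def lagged_columns(x, length):
    """Rows [x_t, x_{t-1}, ..., x_{t-length}] for every t that has
    all `length` lags available."""
    return [
        [x[t - k] for k in range(length + 1)]
        for t in range(length, len(x))
    ]

# Hypothetical toy series.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rows = lagged_columns(x, 4)
for row in rows:
    print(row)
```

Each row has length + 1 entries, one for the current value and one for each lag, so six observations with four lags leave two usable rows.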
Introductory Example

The following statements generate simulated data for variables Y and X. Y depends on the first three lags of X, with coefficients .25, .5, and .25. Thus, the effect of changes of X on Y takes effect 25% after one period, 75% after two periods, and 100% after three periods.
   data test;
      xl1 = 0; xl2 = 0; xl3 = 0;
      do t = -3 to 100;
         x = ranuni(1234);
         y = 10 + .25 * xl1 + .5 * xl2 + .25 * xl3
                + .1 * rannor(1234);
         if t > 0 then output;
         xl3 = xl2; xl2 = xl1; xl1 = x;
      end;
   run;
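The 25%, 75%, and 100% figures are the running sums of the lag weights. The following Python sketch reproduces that arithmetic and mimics the structure of the DATA step above. It is an illustration only: the function name simulate is hypothetical, and Python's random number generator produces a different stream from the SAS RANUNI and RANNOR functions, so the simulated values will not match the SAS data set.

```python
import random
from itertools import accumulate

# Lag weights on x_{t-1}, x_{t-2}, x_{t-3}, as in the DATA step.
weights = [0.25, 0.5, 0.25]

# Share of the total effect realized after one, two, and three periods.
cumulative = list(accumulate(weights))

def simulate(n=100, seed=1234):
    """Generate (t, x, y) for t = 1..n after four start-up iterations."""
    rng = random.Random(seed)
    lags = [0.0, 0.0, 0.0]  # x_{t-1}, x_{t-2}, x_{t-3}
    data = []
    for t in range(-3, n + 1):
        x = rng.random()
        y = 10 + sum(w * xl for w, xl in zip(weights, lags)) \
               + 0.1 * rng.gauss(0, 1)
        if t > 0:
            data.append((t, x, y))
        # Shift the lags after y has been computed, as the DATA step does.
        lags = [x] + lags[:2]
    return data

data = simulate()
print(cumulative)
print(len(data), data[0][0], data[-1][0])
```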
The following statements use the PDLREG procedure to regress Y on a distributed lag of X. The length of the lag distribution is 4, and the degree of the distribution polynomial is specified as 3.

   proc pdlreg data=test;
      model y = x( 4, 3 );
   run;
The PDLREG procedure first prints a table of statistics for the residuals of the model, as shown in Figure 20.1. See Chapter 8, “The AUTOREG Procedure,” for an explanation of these statistics.

Figure 20.1 Residual Statistics

The PDLREG Procedure

Dependent Variable   y

Ordinary Least Squares Estimates
SSE               0.86604442    DFE                        91
MSE                  0.00952    Root MSE              0.09755
SBC               -156.72612    AIC                -169.54786
MAE               0.07761107    AICC               -168.88119
MAPE              0.73971576    Regress R-Square       0.7711
Durbin-Watson         1.9920    Total R-Square         0.7711
The PDLREG procedure next prints a table of parameter estimates, standard errors, and t tests, as shown in Figure 20.2.

Figure 20.2 Parameter Estimates

                              Standard               Approx
Variable    DF    Estimate       Error    t Value   Pr > |t|
Intercept    1     10.0030      0.0431     231.87     <.0001
x**0         1      0.4406      0.0378      11.66     <.0001
x**1         1      0.0113      0.0336       0.34     0.7377
x**2         1     -0.4108      0.0322     -12.75     <.0001
x**3         1      0.0331      0.0392       0.84     0.4007
The table in Figure 20.2 shows the model intercept and the estimated parameters of the lag distribution polynomial. The parameter labeled X**0 is the constant term, $\alpha_0$, of the distribution polynomial. X**1 is the linear coefficient, $\alpha_1$; X**2 is the quadratic coefficient, $\alpha_2$; and X**3 is the cubic coefficient, $\alpha_3$.

The parameter estimates for the distribution polynomial are not of interest in themselves. Since the PDLREG procedure does not print the orthogonal polynomial basis that it constructs to represent the distribution polynomial, these coefficient values cannot be interpreted.

However, because these estimates are for an orthogonal basis, you can use these results to test the degree of the polynomial. For example, this table shows that the X**3 estimate is not significant; the p-value for its t ratio is 0.4007, while the X**2 estimate is highly significant (p < .0001). This indicates that a second-degree polynomial might be more appropriate for this data set.

The PDLREG procedure next prints the lag distribution coefficients and a graphical display of these coefficients, as shown in Figure 20.3.

Figure 20.3 Coefficients and Graph of Estimated Lag Distribution

Estimate of Lag Distribution

                          Standard               Approx
Variable     Estimate        Error    t Value   Pr > |t|
x(0)        -0.040150       0.0360      -1.12     0.2677
x(1)         0.324241       0.0307      10.55     <.0001
x(2)         0.416661       0.0239      17.45     <.0001
x(3)         0.289482       0.0315       9.20     <.0001
x(4)        -0.004926       0.0365      -0.13     0.8929

[Bar chart: Estimate of Lag Distribution, one bar for each of x(0) through x(4), on a horizontal axis running from -0.04 to 0.4167.]
The lag distribution coefficients are the coefficients of the lagged values of X in the regression model. These coefficients lie on the polynomial curve defined by the parameters shown in Figure 20.2. Note that the estimated values for X(1), X(2), and X(3) are highly significant, while X(0) and X(4) are not significantly different from 0. These estimates are reasonably close to the true values used to generate the simulated data. The graphical display of the lag distribution coefficients plots the estimated lag distribution polynomial reported in Figure 20.2. The roughly quadratic shape of this plot is another indication that a third-degree distribution curve is not needed for this data set.
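Two quick checks on the printed results can be done by hand or in a few lines of Python. The sketch below recomputes the t ratios of the polynomial terms from the rounded estimates and standard errors in Figure 20.2, and adds up the lag coefficients in Figure 20.3 to obtain the estimated total effect of X on Y. The true lag weights in the simulation sum to .25 + .5 + .25 = 1. Because the inputs are the rounded printed values, the recomputed t ratios can differ slightly in the last digit from those printed by the procedure; the 2.0 cutoff in the comment is only a rough rule of thumb assumed here.

```python
# Estimates and standard errors copied from Figure 20.2.
polynomial_terms = {
    "x**0": (0.4406, 0.0378),
    "x**1": (0.0113, 0.0336),
    "x**2": (-0.4108, 0.0322),
    "x**3": (0.0331, 0.0392),
}
t_ratios = {name: est / se for name, (est, se) in polynomial_terms.items()}

# Lag distribution estimates copied from Figure 20.3, lags 0 through 4.
lag_estimates = [-0.040150, 0.324241, 0.416661, 0.289482, -0.004926]
total_effect = sum(lag_estimates)

for name, t in t_ratios.items():
    # |t| well below 2 suggests the term adds little to the polynomial.
    print(f"{name}: t = {t:.2f}")
print(f"sum of lag coefficients = {total_effect:.6f}")
```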
Syntax: PDLREG Procedure

The following statements can be used with the PDLREG procedure:

   PROC PDLREG option ;
      BY variables ;
      MODEL dependents = effects / options ;
      OUTPUT OUT= SAS-data-set keyword = variables ;
      RESTRICT restrictions ;
Functional Summary

The statements and options used with the PDLREG procedure are summarized in the following table.

Table 20.1 PDLREG Functional Summary

Description                                                         Statement   Option

Data Set Options
specify the input data set                                          PDLREG      DATA=
write predicted values to an output data set                        OUTPUT      OUT=

BY-Group Processing
specify BY-group processing                                         BY

Printing Control Options
request all print options                                           MODEL       ALL
print correlations of the estimates                                 MODEL       CORRB
print covariances of the estimates                                  MODEL       COVB
print DW statistics up to order j                                   MODEL       DW=j
print the marginal probability of DW statistics                     MODEL       DWPROB
print inverse of the crossproducts matrix                           MODEL       I
print details at each iteration step                                MODEL       ITPRINT
print Durbin t statistic                                            MODEL       LAGDEP
print Durbin h statistic                                            MODEL       LAGDEP=
suppress printed output                                             MODEL       NOPRINT
print partial autocorrelations                                      MODEL       PARTIAL
print standardized parameter estimates                              MODEL       STB
print crossproducts matrix                                          MODEL       XPX

Model Estimation Options
specify order of autoregressive process                             MODEL       NLAG=
suppress intercept parameter                                        MODEL       NOINT
specify convergence criterion                                       MODEL       CONVERGE=
specify maximum number of iterations                                MODEL       MAXITER=
specify estimation method                                           MODEL       METHOD=

Output Control Options
specify confidence limit size                                       OUTPUT      ALPHACLI=
specify confidence limit size for structural predicted values       OUTPUT      ALPHACLM=
output transformed intercept variable                               OUTPUT      CONSTANT=
output lower confidence limit for predicted values                  OUTPUT      LCL=
output lower confidence limit for structural predicted values       OUTPUT      LCLM=
output predicted values                                             OUTPUT      P=
output predicted values of the structural part                      OUTPUT      PM=
output residuals from the predicted values                          OUTPUT      R=
output residuals from the structural predicted values               OUTPUT      RM=
output transformed variables                                        OUTPUT      TRANSFORM=
output upper confidence limit for the predicted values              OUTPUT      UCL=
output upper confidence limit for the structural predicted values   OUTPUT      UCLM=
PROC PDLREG Statement

   PROC PDLREG option ;

The PROC PDLREG statement has the following option:

DATA= SAS-data-set
   specifies the name of the SAS data set containing the input data. If you do not specify the DATA= option, the most recently created SAS data set is used.

In addition, you can place any of the following MODEL statement options in the PROC PDLREG statement, which is equivalent to specifying the option for every MODEL statement: ALL, CONVERGE=, CORRB, COVB, DW=, DWPROB, ITPRINT, MAXITER=, METHOD=, NOINT, NOPRINT, and PARTIAL.
BY Statement

   BY variables ;
A BY statement can be used with PROC PDLREG to obtain separate analyses on observations in groups defined by the BY variables.
MODEL Statement

   MODEL dependent = effects / options ;
The MODEL statement specifies the regression model. The keyword MODEL is followed by the dependent variable name, an equal sign, and a list of independent effects. Only one MODEL statement is allowed. Every variable in the model must be a numeric variable in the input data set. Specify an independent effect with a variable name optionally followed by a polynomial lag distribution specification.
Specifying Independent Effects

The general form of an effect is

   variable ( length, degree, minimum-degree, constraint )

The term in parentheses following the variable name specifies a polynomial distributed lag (PDL) for the variable. The PDL specification is as follows:

length
   specifies the number of lags of the variable to include in the lag distribution.

degree
   specifies the maximum degree of the distribution polynomial. If not specified, the degree defaults to the lag length.

minimum-degree
   specifies the minimum degree of the polynomial. By default minimum-degree is the same as degree.

constraint
   specifies endpoint restrictions on the polynomial. The value of constraint can be FIRST, LAST, or BOTH. If a value is not specified, there are no endpoint restrictions.
If you do not specify the degree or minimum-degree parameter, but you do specify endpoint restrictions, you must use commas to show which parameter, degree or minimum-degree, is left out.
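The defaulting rules above, and the role of the placeholder commas, can be illustrated with a small Python sketch. The function parse_pdl_spec is a hypothetical helper written for this illustration; it is not part of SAS and does not reproduce the procedure's own parser. It only applies the rules as stated: degree defaults to length, minimum-degree defaults to degree, and an omitted constraint means no endpoint restriction.

```python
def parse_pdl_spec(spec):
    """Parse 'name(length,degree,minimum-degree,constraint)' and fill in
    the documented defaults. Hypothetical helper, not part of SAS."""
    name, _, rest = spec.partition("(")
    fields = [f.strip() for f in rest.rstrip(") ").split(",")]
    fields += [""] * (4 - len(fields))  # pad omitted trailing fields

    length = int(fields[0])
    degree = int(fields[1]) if fields[1] else length
    min_degree = int(fields[2]) if fields[2] else degree
    constraint = fields[3].upper() or None
    if constraint not in (None, "FIRST", "LAST", "BOTH"):
        raise ValueError(f"unknown endpoint constraint: {fields[3]!r}")

    return {
        "name": name.strip(),
        "length": length,
        "degree": degree,
        "min_degree": min_degree,
        "constraint": constraint,
    }

# Commas hold the place of the omitted degree and minimum-degree.
print(parse_pdl_spec("x(4,,,first)"))
print(parse_pdl_spec("x( 4, 3 )"))
print(parse_pdl_spec("x(8,4,2)"))
```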
MODEL Statement Options

The following options can appear in the MODEL statement after a slash (/).

ALL
   prints all the matrices computed during the analysis of the model.

CORRB
   prints the matrix of estimated correlations between the parameter estimates.

COVB
   prints the matrix of estimated covariances between the parameter estimates.

DW= j
   prints the generalized Durbin-Watson statistics up to the order of j. The default is DW=1. When you specify the LAGDEP or LAGDEP=name option, the Durbin-Watson statistic is not printed unless you specify the DW= option.

DWPROB
   prints the marginal probability of the Durbin-Watson statistic.

CONVERGE= value
   sets the convergence criterion. If the maximum absolute value of the change in the autoregressive parameter estimates between iterations is less than this amount, then convergence is assumed. The default is CONVERGE=.001.

I
   prints $(X'X)^{-1}$, the inverse of the crossproducts matrix for the model; or, if restrictions are specified, it prints $(X'X)^{-1}$ adjusted for the restrictions.

ITPRINT
   prints information on each iteration.

LAGDEP
LAGDV
   prints the t statistic for testing residual autocorrelation when regressors contain lagged dependent variables.

LAGDEP= name
LAGDV= name
   prints the Durbin h statistic for testing the presence of first-order autocorrelation when regressors contain the lagged dependent variable whose name is specified as LAGDEP=name. When the h statistic cannot be computed, the asymptotically equivalent t statistic is given.

MAXITER= number
   sets the maximum number of iterations allowed. The default is MAXITER=50.

METHOD= value
   specifies the type of estimates for the autoregressive component. The values of the METHOD= option are as follows:

   METHOD=ML     specifies the maximum likelihood method.
   METHOD=ULS    specifies unconditional least squares.
   METHOD=YW     specifies the Yule-Walker method.
   METHOD=ITYW   specifies iterative Yule-Walker estimates.

   The default is METHOD=ML if you specified the LAGDEP or LAGDEP= option; otherwise, METHOD=YW is the default.
NLAG= m
NLAG= ( number-list )
   specifies the order of the autoregressive process or the subset of autoregressive lags to be fit. If you do not specify the NLAG= option, PROC PDLREG does not fit an autoregressive model.

NOINT
   suppresses the intercept parameter from the model.

NOPRINT
   suppresses the printed output.

PARTIAL
   prints partial autocorrelations if the NLAG= option is specified.

STB
   prints standardized parameter estimates. Sometimes known as a standard partial regression coefficient, a standardized parameter estimate is a parameter estimate multiplied by the standard deviation of the associated regressor and divided by the standard deviation of the regressed variable.

XPX
   prints the crossproducts matrix, $X'X$, used for the model. $X$ refers to the transformed matrix of regressors for the regression.
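Two of the statistics named in the options above are simple enough to sketch in Python. The first function follows the definition of a standardized estimate given under the STB option. The second uses the usual textbook form of the order-j Durbin-Watson statistic, the sum of squared differences between residuals j periods apart divided by the residual sum of squares; that formula is an assumption brought in here, since the exact computation is documented in the AUTOREG chapter rather than in this section. The function names and the toy data are hypothetical.

```python
from statistics import stdev

def standardized_estimate(estimate, regressor, dependent):
    """Estimate times sd(regressor) divided by sd(dependent variable)."""
    return estimate * stdev(regressor) / stdev(dependent)

def generalized_durbin_watson(residuals, order=1):
    """Order-j Durbin-Watson statistic in its usual textbook form."""
    numerator = sum(
        (residuals[t] - residuals[t - order]) ** 2
        for t in range(order, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical toy data: y is exactly 2 * x, so the slope is 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
stb = standardized_estimate(2.0, x, y)

# Hypothetical residuals that alternate in sign (negative autocorrelation).
residuals = [1.0, -1.0, 1.0, -1.0]
dw1 = generalized_durbin_watson(residuals, order=1)
dw2 = generalized_durbin_watson(residuals, order=2)

print(stb, dw1, dw2)
```

The ratio of standard deviations does not depend on whether the sample or population form is used, as long as the same form is used for both variables.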
OUTPUT Statement

   OUTPUT OUT= SAS-data-set keyword= option . . . ;

The OUTPUT statement creates an output SAS data set with variables as specified by the following keyword options. See the section “Predicted Values” in Chapter 8, “The AUTOREG Procedure,” for a description of the associated computations for these options.

ALPHACLI= number
   sets the confidence limit size for the estimates of future values of the current realization of the response time series to number, where number is less than one and greater than zero. The resulting confidence interval has 1-number confidence. The default value for number is 0.05, corresponding to a 95% confidence interval.

ALPHACLM= number
   sets the confidence limit size for the estimates of the structural or regression part of the model to number, where number is less than one and greater than zero. The resulting confidence interval has 1-number confidence. The default value for number is 0.05, corresponding to a 95% confidence interval.

OUT= SAS-data-set
   names the output data.

The following specifications are of the form KEYWORD=names, where KEYWORD= specifies the statistic to include in the output data set and names gives names to the variables that contain the statistics.

CONSTANT= variable
   writes the transformed intercept to the output data set.

LCL= name
   requests that the lower confidence limit for the predicted value (specified in the PREDICTED= option) be added to the output data set under the name given.

LCLM= name
   requests that the lower confidence limit for the structural predicted value (specified in the PREDICTEDM= option) be added to the output data set under the name given.

PREDICTED= name
P= name
   stores the predicted values in the output data set under the name given.

PREDICTEDM= name
PM= name
   stores the structural predicted values in the output data set under the name given. These values are formed from only the structural part of the model.

RESIDUAL= name
R= name
   stores the residuals from the predicted values based on both the structural and time series parts of the model in the output data set under the name given.

RESIDUALM= name
RM= name
   requests that the residuals from the structural prediction be given.

TRANSFORM= variables
   requests that the specified variables from the input data set be transformed by the autoregressive model and put in the output data set. If you need to reproduce the data suitable for reestimation, you must also transform an intercept variable. To do this, transform a variable that only takes the value 1 or use the CONSTANT= option.

UCL= name
   stores the upper confidence limit for the predicted value (specified in the PREDICTED= option) in the output data set under the name given.

UCLM= name
   stores the upper confidence limit for the structural predicted value (specified in the PREDICTEDM= option) in the output data set under the name given.

For example, the SAS statements
   proc pdlreg data=a;
      model y=x1 x2;
      output out=b p=yhat r=resid;
   run;
create an output data set named B. In addition to the input data set variables, the data set B contains the variable YHAT, whose values are predicted values of the dependent variable Y, and RESID, whose values are the residual values of Y.
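The confidence-limit keywords described above can be combined with the predicted values in the same OUTPUT statement. The following is a minimal sketch, assuming the same placeholder data set A and variables Y, X1, and X2; the output variable names LOWER and UPPER are chosen here purely for illustration.

```sas
/* Sketch only: data set A and variables Y, X1, X2 are placeholders. */
proc pdlreg data=a;
   model y = x1 x2;
   output out=b
          p=yhat r=resid        /* predicted values and residuals     */
          lcl=lower ucl=upper   /* limits for the predicted value     */
          alphacli=0.10;        /* 90% limits instead of default 95%  */
run;
```

With ALPHACLI=0.10, the variables LOWER and UPPER in data set B would hold 90% limits around YHAT, since the interval has 1-number confidence.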
RESTRICT Statement
RESTRICT equation , . . . , equation ;
The RESTRICT statement places restrictions on the parameter estimates for covariates in the preceding MODEL statement. A parameter produced by a distributed lag cannot be restricted with the RESTRICT statement. Each restriction is written as a linear equation. If you specify more than one restriction in a RESTRICT statement, the restrictions are separated by commas. You can refer to parameters by the name of the corresponding regressor variable. Each name used in the equation must be a regressor in the preceding MODEL statement. Use the keyword INTERCEPT to refer to the intercept parameter in the model. RESTRICT statements can be given labels. You can use labels to distinguish results for different restrictions in the printed output. Labels are specified as follows:
   label : RESTRICT . . .
The following is an example of the use of the RESTRICT statement, in which the coefficients of the regressors X1 and X2 are required to sum to 1:
   proc pdlreg data=a;
      model y = x1 x2;
      restrict x1 + x2 = 1;
   run;
Parameter names can be multiplied by constants. When no equal sign appears, the linear combination is set equal to 0. Note that the parameters associated with the variables are restricted, not the variables themselves. Here are some examples of valid RESTRICT statements:
   restrict x1 + x2 = 1;
   restrict x1 + x2 - 1;
   restrict 2 * x1 = x2 + x3 , intercept + x4 = 0;
   restrict x1 = x2 = x3 = 1;
   restrict 2 * x1 - x2;
Restricted parameter estimates are computed by introducing a Lagrangian parameter for each restriction (Pringle and Rayner 1971). The estimates of these Lagrangian parameters are printed in the parameter estimates table. If a restriction cannot be applied, its parameter value and degrees of freedom are listed as 0. The Lagrangian parameter, $\lambda$, measures the sensitivity of the SSE to the restriction. If the restriction is changed by a small amount $\epsilon$, the SSE is changed by $2\lambda\epsilon$. The t ratio tests the significance of the restrictions. If $\lambda$ is zero, the restricted estimates are the same as the unrestricted ones. You can specify any number of restrictions in a RESTRICT statement, and you can use any number of RESTRICT statements. The estimates are computed subject to all restrictions specified. However, restrictions should be consistent and not redundant.
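Labels and multiple RESTRICT statements can be combined in one step. The sketch below assumes a placeholder data set A with regressors X1, X2, and X3; the labels SUM1 and EQUAL are hypothetical names chosen only to show the label : RESTRICT form described above.

```sas
/* Sketch only: data set, variables, and labels are placeholders. */
proc pdlreg data=a;
   model y = x1 x2 x3;
   sum1:  restrict x1 + x2 = 1;   /* coefficients of X1 and X2 sum to 1 */
   equal: restrict x2 = x3;       /* coefficients of X2 and X3 are equal */
run;
```

The two restrictions are consistent and not redundant, so both can be imposed at once and each should appear under its own label in the printed output.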
Details: PDLREG Procedure
Missing Values
The PDLREG procedure skips any observations at the beginning of the data set that have missing values. The procedure uses all observations with nonmissing values for all the independent and dependent variables such that the lag distribution has sufficient nonmissing lagged independent variables.
Polynomial Distributed Lag Estimation
The simple finite distributed lag model is expressed in the form
\[ y_t = \alpha + \sum_{i=0}^{p} \beta_i x_{t-i} + \epsilon_t \]
When the lag length (p) is long, severe multicollinearity can occur. Use the Almon or polynomial distributed lag model to avoid this problem, since polynomials of relatively low degree d (d ≤ p) can capture the true lag distribution. The lag coefficient can be written in the Almon polynomial lag form
\[ \beta_i = \alpha_0 + \sum_{j=1}^{d} \alpha_j i^j \]
Emerson (1968) proposed an efficient method of constructing orthogonal polynomials from the preceding polynomial equation as
\[ \beta_i = \alpha_0 + \sum_{j=1}^{d} \alpha_j f_j(i) \]
where $f_j(i)$ is a polynomial of degree j in the lag length i. The polynomials $f_j(i)$ are chosen so that they are orthogonal:
\[ \sum_{i=1}^{n} w_i f_j(i) f_k(i) = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \neq k \end{cases} \]
where $w_i$ is the weighting factor, and $n = p + 1$. PROC PDLREG uses the equal weights ($w_i = 1$) for all i. To construct the orthogonal polynomials, the following recursive relation is used:
\[ f_j(i) = (A_j i + B_j) f_{j-1}(i) - C_j f_{j-2}(i), \qquad j = 1, \ldots, d \]
The constants $A_j$, $B_j$, and $C_j$ are determined as follows:
\[ A_j = \left\{ \sum_{i=1}^{n} w_i i^2 f_{j-1}^2(i) - \left( \sum_{i=1}^{n} w_i i f_{j-1}^2(i) \right)^2 - \left( \sum_{i=1}^{n} w_i i f_{j-1}(i) f_{j-2}(i) \right)^2 \right\}^{-1/2} \]
\[ B_j = -A_j \sum_{i=1}^{n} w_i i f_{j-1}^2(i) \]
\[ C_j = A_j \sum_{i=1}^{n} w_i i f_{j-1}(i) f_{j-2}(i) \]
where $f_{-1}(i) = 0$ and $f_0(i) = 1 / \sqrt{\sum_{i=1}^{n} w_i}$.
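A small worked case may help make the recursion concrete. The sketch below assumes equal weights ($w_i = 1$) and $n = 3$ (that is, $p = 2$), values chosen here only for illustration, and computes the first-degree polynomial from the starting values $f_{-1}(i) = 0$ and $f_0(i) = 1/\sqrt{n}$.

```latex
\begin{align*}
f_0(i) &= \tfrac{1}{\sqrt{3}}, \qquad i = 1, 2, 3 \\
\sum_{i=1}^{3} i\, f_0^2(i) &= \tfrac{1 + 2 + 3}{3} = 2, \qquad
\sum_{i=1}^{3} i^2 f_0^2(i) = \tfrac{1 + 4 + 9}{3} = \tfrac{14}{3} \\
A_1 &= \left\{ \tfrac{14}{3} - 2^2 - 0 \right\}^{-1/2}
     = \left( \tfrac{2}{3} \right)^{-1/2} = \sqrt{\tfrac{3}{2}} \\
B_1 &= -2 A_1 = -2\sqrt{\tfrac{3}{2}}, \qquad C_1 = 0 \\
f_1(i) &= (A_1 i + B_1)\, f_0(i) = \frac{i - 2}{\sqrt{2}}
\end{align*}
```

The resulting values are $f_1(1) = -1/\sqrt{2}$, $f_1(2) = 0$, and $f_1(3) = 1/\sqrt{2}$, so $\sum_i f_1^2(i) = 1$ and $\sum_i f_0(i) f_1(i) = 0$, which is the orthogonality condition for this case.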
PROC PDLREG estimates the orthogonal polynomial coefficients, $\alpha_0, \ldots, \alpha_d$, to compute the coefficient estimate of each independent variable (X) with distributed lags. For example, if an independent variable is specified as X(9,3), a third-degree polynomial is used to specify the distributed lag coefficients. The third-degree polynomial is fit as a constant term, a linear term, a quadratic term, and a cubic term. The four terms are constructed to be orthogonal. In the output produced by the PDLREG procedure for this case, parameter estimates with names X**0, X**1, X**2, and X**3 correspond to $\hat{\alpha}_0$, $\hat{\alpha}_1$, $\hat{\alpha}_2$, and $\hat{\alpha}_3$, respectively. A test using the t statistic and the approximate p-value (“Approx Pr > |t|”) associated with X**3 can determine whether a second-degree polynomial rather than a third-degree polynomial is appropriate. The estimates of the 10 lag coefficients associated with the specification X(9,3) are labeled X(0), X(1), X(2), X(3), X(4), X(5), X(6), X(7), X(8), and X(9).
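The X(9,3) specification discussed above would appear in a MODEL statement as in the following minimal sketch. The data set A and the variables Y and X are placeholders assumed here for illustration.

```sas
/* Sketch only: data set A and variables Y and X are placeholders. */
proc pdlreg data=a;
   model y = x(9,3);   /* lags 0 through 9, third-degree polynomial */
run;
```

The point of the specification is economy: the 10 lag coefficients X(0) through X(9) are all derived from only four estimated polynomial terms, X**0 through X**3.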
Autoregressive Error Model Estimation
The PDLREG procedure uses the same autoregressive error model estimation methods as the AUTOREG procedure. These two procedures share the same computational resources for computing estimates. See Chapter 8, “The AUTOREG Procedure,” for details about estimation methods for autoregressive error models.

OUT= Data Set
The OUT= data set produced by the PDLREG procedure’s OUTPUT statement is similar in form to the OUT= data set produced by the AUTOREG procedure. See Chapter 8, “The AUTOREG Procedure,” for details on the OUT= data set.
Printed Output
The PDLREG procedure prints the following items:
1. the name of the dependent variable
2. the ordinary least squares (OLS) estimates
3. the estimates of autocorrelations and of the autocovariance, and if line size permits, a graph of the autocorrelation at each lag. The autocorrelation for lag 0 is 1. These items are printed if you specify the NLAG= option.
4. the partial autocorrelations if the PARTIAL and NLAG= options are specified. The first partial autocorrelation is the autocorrelation for lag 1.
5. the preliminary mean square error, which results from solving the Yule-Walker equations if you specify the NLAG= option
6. the estimates of the autoregressive parameters, their standard errors, and the ratios of estimates to standard errors (t) if you specify the NLAG= option
7. the statistics of fit for the final model if you specify the NLAG= option. These include the error sum of squares (SSE), the degrees of freedom for error (DFE), the mean square error (MSE), the root mean square error (Root MSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), the Schwarz information criterion (SBC), Akaike’s information criterion (AIC), the corrected Akaike’s information criterion (AICC), the regression R² (Regress R-Square), the total R² (Total R-Square), and the Durbin-Watson statistic (Durbin-Watson). See Chapter 8, “The AUTOREG Procedure,” for details of the regression R² and the total R².
8. the parameter estimates for the structural model (B), a standard error estimate, the ratio of estimate to standard error (t), and an approximation to the significance probability for the parameter being 0 (“Approx Pr > |t|”)
9. a plot of the lag distribution (estimate of lag distribution)
10. the covariance matrix of the parameter estimates if the COVB option is specified
ODS Table Names
PROC PDLREG assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table.

Table 20.2 ODS Tables Produced in PROC PDLREG

ODS Table Name              Description

ODS Tables Created by the MODEL Statement
ARParameterEstimates        Estimates of autoregressive parameters
CholeskyFactor              Cholesky root of gamma
Coefficients                Coefficients for first NLAG observations
ConvergenceStatus           Convergence status table
CorrB                       Correlation of parameter estimates
CorrGraph                   Estimates of autocorrelations
CovB                        Covariance of parameter estimates
DependenceEquations         Linear dependence equation
Dependent                   Dependent variable
DWTest                      Durbin-Watson statistics
DWTestProb                  Durbin-Watson statistics and p-values
ExpAutocorr                 Expected autocorrelations
FitSummary                  Summary of regression
GammaInverse                Gamma inverse
IterHistory                 Iteration history
LagDist                     Lag distribution
ParameterEstimates          Parameter estimates
ParameterEstimatesGivenAR   Parameter estimates assuming AR parameters are given
PartialAutoCorr             Partial autocorrelation
PreMSE                      Preliminary MSE
XPXIMatrix                  (X'X)^{-1} matrix
XPXMatrix                   X'X matrix
YWIterSSE                   Yule-Walker iteration sum of squared error

ODS Tables Created by the RESTRICT Statement
Restrict                    Restriction table
Option NLAG=
NLAG= default CORRB NLAG= COVB default DW= DW= DWPROB NLAG= default ITPRINT ALL default NLAG= PARTIAL NLAG= XPX XPX METHOD=ITYW
default
Examples: PDLREG Procedure
Example 20.1: Industrial Conference Board Data
In this example, a second-degree Almon polynomial lag model is fit to a model with a five-period lag, and dummy variables are used for quarter effects. The PDL model is estimated using capital appropriations data series for the period 1952 to 1967. The estimation model is written
\[ CE_t = a_0 + b_1 Q_{1t} + b_2 Q_{2t} + b_3 Q_{3t} + c_0 CA_t + c_1 CA_{t-1} + \cdots + c_5 CA_{t-5} \]
where CE represents capital expenditures and CA represents capital appropriations.

   title 'National Industrial Conference Board Data';
   title2 'Quarterly Series - 1952Q1 to 1967Q4';
   data a;
      input ce ca @@;
      qtr = mod( _n_-1, 4 ) + 1;
      q1 = qtr=1;
      q2 = qtr=2;
      q3 = qtr=3;
   datalines;
   ... more lines ...

   proc pdlreg data=a;
      model ce = q1 q2 q3 ca(5,2) / dwprob;
   run;
The printed output produced by the PDLREG procedure is shown in Output 20.1.1. The small Durbin-Watson statistic indicates autoregressive errors.

Output 20.1.1 Printed Output Produced by PROC PDLREG

National Industrial Conference Board Data
Quarterly Series - 1952Q1 to 1967Q4

The PDLREG Procedure

Dependent Variable    ce

Ordinary Least Squares Estimates

SSE              1205186.4     DFE                         48
MSE                  25108     Root MSE             158.45520
SBC              733.84921     AIC                 719.797878
MAE             107.777378     AICC                722.180856
MAPE            3.71653891     Regress R-Square        0.9834
Durbin-Watson       0.6157     Total R-Square          0.9834

                                Standard              Approx
Variable     DF    Estimate        Error   t Value   Pr > |t|
Intercept     1    210.0109      73.2524      2.87     0.0061
q1            1    -10.5515      61.0634     -0.17     0.8635
q2            1    -20.9887      59.9386     -0.35     0.7277
q3            1    -30.4337      59.9004     -0.51     0.6137
ca**0         1      0.3760     0.007318     51.38     <.0001
ca**1         1      0.1297       0.0251      5.16     <.0001
ca**2         1      0.0247       0.0593      0.42     0.6794

Estimate of Lag Distribution

                          Standard              Approx
Variable     Estimate        Error   t Value   Pr > |t|
ca(0)        0.089467       0.0360      2.49     0.0165
ca(1)        0.104317       0.0109      9.56     <.0001
ca(2)        0.127237       0.0255      5.00     <.0001
ca(3)        0.158230       0.0254      6.24     <.0001
ca(4)        0.197294       0.0112     17.69     <.0001
ca(5)        0.244429       0.0370      6.60     <.0001

Estimate of Lag Distribution

Variable     0                                    0.2444
ca(0)        |***************                          |
ca(1)        |*****************                        |
ca(2)        |*********************                    |
ca(3)        |***************************              |
ca(4)        |*********************************        |
ca(5)        |*****************************************|
The following statements use the REG procedure to fit the same polynomial distributed lag model. A DATA step computes lagged values of the regressor X, and RESTRICT statements are used to impose the polynomial lag distribution. Refer to Judge et al. (1985, pp. 357–359) for the restricted least squares estimation of the Almon distributed lag model.
   data b;
      set a;
      ca_1 = lag( ca );
      ca_2 = lag2( ca );
      ca_3 = lag3( ca );
      ca_4 = lag4( ca );
      ca_5 = lag5( ca );
   run;

   proc reg data=b;
      model ce = q1 q2 q3 ca ca_1 ca_2 ca_3 ca_4 ca_5;
      restrict - ca + 5*ca_1 - 10*ca_2 + 10*ca_3 - 5*ca_4 + ca_5;
      restrict ca - 3*ca_1 + 2*ca_2 + 2*ca_3 - 3*ca_4 + ca_5;
      restrict -5*ca + 7*ca_1 + 4*ca_2 - 4*ca_3 - 7*ca_4 + 5*ca_5;
   run;
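One way to see why three linear restrictions impose a second-degree lag polynomial is to check them against the powers of the lag index. The sketch below uses notation introduced here for illustration: $c_i$ is the coefficient on the lag-$i$ regressor ($i = 0, \ldots, 5$), and $r^{(1)}$, $r^{(2)}$, $r^{(3)}$ are the coefficient vectors of the three RESTRICT statements, read off in the order CA, CA_1, ..., CA_5.

```latex
\begin{align*}
r^{(1)} &= (-1,\; 5,\; -10,\; 10,\; -5,\; 1) \\
r^{(2)} &= (1,\; -3,\; 2,\; 2,\; -3,\; 1) \\
r^{(3)} &= (-5,\; 7,\; 4,\; -4,\; -7,\; 5) \\[4pt]
\sum_{i=0}^{5} r^{(k)}_i \, i^m &= 0
  \qquad \text{for } k = 1, 2, 3 \text{ and } m = 0, 1, 2 \\[4pt]
\text{e.g.}\quad \sum_{i=0}^{5} r^{(3)}_i \, i^2
  &= 0 + 7 + 16 - 36 - 112 + 125 = 0
\end{align*}
```

Because each restriction vector is orthogonal to $1$, $i$, and $i^2$, any quadratic $c_i = a_0 + a_1 i + a_2 i^2$ satisfies all three restrictions. The three vectors are linearly independent, so the six lag coefficients are left with $6 - 3 = 3$ free parameters, matching the three terms ca**0, ca**1, and ca**2 of the PDLREG fit.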
The REG procedure output is shown in Output 20.1.2.

Output 20.1.2 Printed Output Produced by PROC REG

National Industrial Conference Board Data
Quarterly Series - 1952Q1 to 1967Q4

The REG Procedure
Model: MODEL1
Dependent Variable: ce

Analysis of Variance

                           Sum of         Mean
Source             DF     Squares       Square   F Value   Pr > F
Model               6    71343377     11890563    473.58   <.0001
Error              48     1205186        25108
Corrected Total    54    72548564

Root MSE          158.45520    R-Square    0.9834
Dependent Mean   3185.69091    Adj R-Sq    0.9813
Coeff Var           4.97397

Parameter Estimates

                   Parameter     Standard
Variable     DF     Estimate        Error   t Value   Pr > |t|
Intercept     1    210.01094     73.25236      2.87     0.0061
q1            1    -10.55151     61.06341     -0.17     0.8635
q2            1    -20.98869     59.93860     -0.35     0.7277
q3            1    -30.43374     59.90045     -0.51     0.6137
ca            1      0.08947      0.03599      2.49     0.0165
ca_1          1      0.10432      0.01091      9.56     <.0001
ca_2          1      0.12724      0.02547      5.00     <.0001
ca_3          1      0.15823      0.02537      6.24     <.0001
ca_4          1      0.19729      0.01115     17.69     <.0001
ca_5          1      0.24443      0.03704      6.60     <.0001
RESTRICT     -1    623.63242        12697      0.05     0.9614*
RESTRICT     -1        18933        44803      0.42     0.6772*
RESTRICT     -1        10303        18422      0.56     0.5814*

* Probability computed using beta distribution.
Example 20.2: Money Demand Model
This example estimates the demand for money by using the following dynamic specification:
\[ m_t = a_0 + b_0 m_{t-1} + \sum_{i=0}^{5} c_i y_{t-i} + \sum_{i=0}^{2} d_i r_{t-i} + \sum_{i=0}^{3} f_i p_{t-i} + u_t \]
where
   $m_t$ = log of real money stock (M1)
   $y_t$ = log of real GNP
   $r_t$ = interest rate (commercial paper rate)
   $p_t$ = inflation rate
   $c_i$, $d_i$, and $f_i$ ($i > 0$) are coefficients for the lagged variables
The following DATA step reads the data and transforms the real money and real GNP variables using the natural logarithm. Refer to Balke and Gordon (1986) for a description of the data.

   data a;
      input m1 gnp gdf r @@;
      m = log( 100 * m1 / gdf );
      lagm = lag( m );
      y = log( gnp );
      p = log( gdf / lag( gdf ) );
      date = intnx( 'qtr', '1jan1968'd, _n_-1 );
      format date yyqc6.;
      label m = 'Real Money Stock (M1)'
            lagm = 'Lagged Real Money Stock'
            y = 'Real GNP'
            r = 'Commercial Paper Rate'
            p = 'Inflation Rate';
   datalines;
   ... more lines ...
Output 20.2.1 shows a partial list of the data set.

Output 20.2.1 Partial List of the Data Set A

National Industrial Conference Board Data
Quarterly Series - 1952Q1 to 1967Q4

Obs    date        m         lagm       y          r       p
  1    1968:1    5.44041     .          6.94333    5.58    .
  2    1968:2    5.44732     5.44041    6.96226    6.08    0.011513
  3    1968:3    5.45815     5.44732    6.97422    5.96    0.008246
  4    1968:4    5.46492     5.45815    6.97661    5.96    0.014865
  5    1969:1    5.46980     5.46492    6.98855    6.66    0.011005
The regression model is written for the PDLREG procedure with a MODEL statement. The LAGDEP= option is specified to test for the serial correlation in disturbances since regressors contain the lagged dependent variable LAGM.

   title 'Money Demand Estimation using Distributed Lag Model';
   title2 'Quarterly Data - 1968Q2 to 1983Q4';
   proc pdlreg data=a;
      model m = lagm y(5,3) r(2, , ,first) p(3,2) / lagdep=lagm;
   run;
The estimated model is shown in Output 20.2.2 and Output 20.2.3.

Output 20.2.2 Parameter Estimates

Money Demand Estimation using Distributed Lag Model
Quarterly Data - 1968Q2 to 1983Q4

The PDLREG Procedure

Dependent Variable    m
                      Real Money Stock (M1)

Ordinary Least Squares Estimates

SSE      0.00169815     DFE                         48
MSE       0.0000354     Root MSE               0.00595
SBC      -404.60169     AIC                  -427.4546
MAE      0.00383648     AICC                -421.83758
MAPE     0.07051345     Regress R-Square        0.9712
                        Total R-Square          0.9712

                                Standard              Approx
Variable     DF     Estimate       Error   t Value   Pr > |t|
Intercept     1      -0.1407      0.2625     -0.54     0.5943
lagm          1       0.9875      0.0425     23.21     <.0001
y**0          1       0.0132    0.004531      2.91     0.0055
y**1          1      -0.0704      0.0528     -1.33     0.1891
y**2          1       0.1261      0.0786      1.60     0.1154
y**3          1      -0.4089      0.1265     -3.23     0.0022
r**0          1    -0.000186    0.000336     -0.55     0.5816
r**1          1     0.002200    0.000774      2.84     0.0065
r**2          1     0.000788    0.000249      3.16     0.0027
p**0          1      -0.6602      0.1132     -5.83     <.0001
p**1          1       0.4036      0.2321      1.74     0.0885
p**2          1      -1.0064      0.2288     -4.40     <.0001

                                 Standard              Approx
Restriction    DF    L Value        Error   t Value   Pr > |t|
r(-1)          -1     0.0164     0.007275      2.26     0.0223
Output 20.2.3 Estimates for Lagged Variables

Estimate of Lag Distribution

                           Standard              Approx
Variable      Estimate        Error   t Value   Pr > |t|
y(0)          0.268619       0.0910      2.95     0.0049
y(1)         -0.196484       0.0612     -3.21     0.0024
y(2)         -0.163148       0.0537     -3.04     0.0038
y(3)          0.063850       0.0451      1.42     0.1632
y(4)          0.179733       0.0588      3.06     0.0036
y(5)         -0.120276       0.0679     -1.77     0.0827

Estimate of Lag Distribution

Variable     -0.196           0                   0.2686
y(0)         |                |************************|
y(1)         |****************|                        |
y(2)         |   *************|                        |
y(3)         |                |******                  |
y(4)         |                |****************        |
y(5)         |       *********|                        |

Estimate of Lag Distribution

                           Standard              Approx
Variable      Estimate        Error   t Value   Pr > |t|
r(0)         -0.001341     0.000388     -3.45     0.0012
r(1)         -0.000751     0.000234     -3.22     0.0023
r(2)          0.001770     0.000754      2.35     0.0230

Estimate of Lag Distribution

Variable     -0.001            0                  0.0018
r(0)         |*****************|                       |
r(1)         |        *********|                       |
r(2)         |                 |***********************|

Estimate of Lag Distribution

                           Standard              Approx
Variable      Estimate        Error   t Value   Pr > |t|
p(0)         -1.104051       0.2027     -5.45     <.0001
p(1)          0.082892       0.1257      0.66     0.5128
p(2)          0.263391       0.1381      1.91     0.0624
p(3)         -0.562556       0.2076     -2.71     0.0093

Estimate of Lag Distribution

Variable     -1.104                           0   0.2634
p(0)         |********************************|        |
p(1)         |                                |***     |
p(2)         |                                |********|
p(3)         |                ****************|        |
References
Balke, N. S. and Gordon, R. J. (1986), “Historical Data,” in R. J. Gordon, ed., The American Business Cycle, 781–850, Chicago: The University of Chicago Press.
Emerson, P. L. (1968), “Numerical Construction of Orthogonal Polynomials from a General Recurrence Formula,” Biometrics, 24, 695–701.
Gallant, A. R. and Goebel, J. J. (1976), “Nonlinear Regression with Autoregressive Errors,” Journal of the American Statistical Association, 71, 961–967.
Harvey, A. C. (1981), The Econometric Analysis of Time Series, New York: John Wiley & Sons.
Johnston, J. (1972), Econometric Methods, Second Edition, New York: McGraw-Hill.
Judge, G. G., Griffiths, W. E., Hill, R. C., Lutkepohl, H., and Lee, T. C. (1985), The Theory and Practice of Econometrics, Second Edition, New York: John Wiley & Sons.
Park, R. E. and Mitchell, B. M. (1980), “Estimating the Autocorrelated Error Model with Trended Data,” Journal of Econometrics, 13, 185–201.
Pringle, R. M. and Rayner, A. A. (1971), Generalized Inverse Matrices with Applications to Statistics, New York: Hafner Publishing.
Chapter 21
The QLIM Procedure

Contents
Overview: QLIM Procedure
Getting Started: QLIM Procedure
   Introductory Example: Binary Probit and Logit Models
Syntax: QLIM Procedure
   Functional Summary
   PROC QLIM Statement
   BOUNDS Statement
   BY Statement
   CLASS Statement
   ENDOGENOUS Statement
   HETERO Statement
   INIT Statement
   MODEL Statement
   NLOPTIONS Statement
   OUTPUT Statement
   RESTRICT Statement
   TEST Statement
   WEIGHT Statement
Details: QLIM Procedure
   Ordinal Discrete Choice Modeling
   Limited Dependent Variable Models
   Stochastic Frontier Production and Cost Models
   Heteroscedasticity and Box-Cox Transformation
   Bivariate Limited Dependent Variable Modeling
   Selection Models
   Multivariate Limited Dependent Models
   Tests on Parameters
   Output to SAS Data Set
   OUTEST= Data Set
   Naming
   ODS Table Names
Examples: QLIM Procedure
   Example 21.1: Ordered Data Modeling
   Example 21.2: Tobit Analysis
   Example 21.3: Bivariate Probit Analysis
   Example 21.4: Sample Selection Model
   Example 21.5: Sample Selection Model with Truncation and Censoring
   Example 21.6: Types of Tobit Models
   Example 21.7: Stochastic Frontier Models
References
Overview: QLIM Procedure
The QLIM (qualitative and limited dependent variable model) procedure analyzes univariate and multivariate limited dependent variable models where dependent variables take discrete values or dependent variables are observed only in a limited range of values. This procedure includes logit, probit, tobit, selection, and multivariate models. The multivariate model can contain discrete choice and limited endogenous variables as well as continuous endogenous variables.

The QLIM procedure supports the following models:
   linear regression model with heteroscedasticity
   Box-Cox regression with heteroscedasticity
   probit with heteroscedasticity
   logit with heteroscedasticity
   tobit (censored and truncated) with heteroscedasticity
   bivariate probit
   bivariate tobit
   sample selection and switching regression models
   multivariate limited dependent variables
   stochastic frontier production and cost models

In the linear regression models with heteroscedasticity, the assumption that error variance is constant across observations is relaxed. The QLIM procedure allows for a number of different linear and nonlinear variance specifications. Another way to make the linear model more appropriate to fit the data and reduce skewness is to apply the Box-Cox transformation. If the dependent variable is discrete and takes only two possible values, OLS estimates are inconsistent. The QLIM procedure offers probit and logit models to overcome these estimation problems. Assumptions about the error variance can also be relaxed in order to estimate probit or logit with heteroscedasticity.

The QLIM procedure also offers a class of models where the dependent variable is censored or truncated from below and/or above. When a continuous dependent variable is observed only within a certain range and values outside this range are not available, the QLIM procedure offers a class of models adjusting for truncation. In some cases, the dependent variable is continuous only in a certain range and all values outside this range are reported as being on its boundary. For example, if it is not possible to observe negative values, the value of the dependent variable is reported as equal to zero. Because the data are censored, OLS results are inconsistent, and it cannot be guaranteed that the predicted values from the model fall in the appropriate region.

Most of the models in the QLIM procedure can be extended to accommodate bivariate and multivariate scenarios. The assumption that one variable is observed only if another variable takes on certain values leads to the introduction of sample selection models. If the dependent variables are mutually exclusive and observed only for certain ranges of the selection variable, the sample selection can be extended to include cases of switching regression.

Stochastic frontier production and cost models allow for random shocks of the production or cost. They include a systematic positive component in the error term adjusting for technological or cost inefficiency.

The QLIM procedure uses maximum likelihood methods. Initial starting values for the nonlinear optimizations are typically calculated by OLS.
Getting Started: QLIM Procedure
The QLIM procedure is similar in use to the other regression or simultaneous equations model procedures in the SAS System. For example, the following statements are used to estimate a binary choice model by using the probit probability function:

   proc qlim data=a;
      model y = x1;
      endogenous y ~ discrete;
   run;

The response variable, y, is numeric and has discrete values. PROC QLIM enables the user to specify the type of endogenous variables in the ENDOGENOUS statement. The binary probit model can be also specified as follows:

   model y = x1 / discrete;

When multiple endogenous variables are specified in the QLIM procedure, these equations are estimated as a system. Multiple endogenous variables can be specified with one MODEL statement in the QLIM procedure when these models have the same exogenous variables:

   model y1 y2 = x1 x2 / discrete;

The preceding specification is equivalent to the following statements:

   proc qlim data=a;
      model y1 = x1 x2;
      model y2 = x1 x2;
      endogenous y1 y2 ~ discrete;
   run;

The standard tobit model is estimated by specifying the endogenous variable to be truncated or censored. The limits of the dependent variable can be specified with the CENSORED or TRUNCATED option in the ENDOGENOUS or MODEL statement when the data are limited by specific values or variables. For example, the two-limit censored model requires two variables that contain the lower (bottom) and upper (top) bound:

   proc qlim data=a;
      model y = x1 x2 x3;
      endogenous y ~ censored(lb=bottom ub=top);
   run;

The bounds can be numbers if they are fixed for all observations in the data set. For example, the standard tobit model can be specified as follows:

   proc qlim data=a;
      model y = x1 x2 x3;
      endogenous y ~ censored(lb=0);
   run;
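The same pattern applies to the TRUNCATED option and to specifying the limit in the MODEL statement, both of which the text above mentions. The following is a minimal sketch, assuming the same placeholder data set A and variables; it is meant only to show where the options go, not to recommend either model for particular data.

```sas
/* Sketch only: data set A and variables Y, X1-X3 are placeholders. */

/* Truncated regression: observations below the bound are absent */
proc qlim data=a;
   model y = x1 x2 x3;
   endogenous y ~ truncated(lb=0);
run;

/* Censoring limit given as a MODEL statement option instead */
proc qlim data=a;
   model y = x1 x2 x3 / censored(lb=0);
run;
```

The choice between the two options follows the distinction drawn in the overview: truncation applies when values outside the range are not available at all, and censoring applies when they are reported at the boundary.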
Introductory Example: Binary Probit and Logit Models
The following example illustrates the use of PROC QLIM. The data were originally published by Mroz (1987) and downloaded from Wooldridge (2002). This data set is based on a sample of 753 married white women. The dependent variable is a discrete variable of labor force participation (inlf). Explanatory variables are the number of children ages 5 or younger (kidslt6), the number of children ages 6 to 18 (kidsge6), the woman’s age (age), the woman’s years of schooling (educ), wife’s labor experience (exper), square of experience (expersq), and the family income excluding the wife’s wage (nwifeinc). The program (with data values omitted) is as follows:

   /*-- Binary Probit --*/
   proc qlim data=mroz;
      model inlf = nwifeinc educ exper expersq age kidslt6 kidsge6 / discrete;
   run;

Results of this analysis are shown in the following four figures. In the first table, shown in Figure 21.1, PROC QLIM provides frequency information about each choice. In this example, 428 women participate in the labor force (inlf=1).
Figure 21.1 Choice Frequency Summary

Binary Data

The QLIM Procedure

Discrete Response Profile of inlf

Index    Value    Frequency    Percent
    1        0          325      43.16
    2        1          428      56.84
The second table is the estimation summary table shown in Figure 21.2. Included are the number of dependent variables, names of dependent variables, the number of observations, the log-likelihood function value, the maximum absolute gradient, the number of iterations, AIC, and Schwarz criterion.

Figure 21.2 Fit Summary Table of Binary Probit

Model Fit Summary

Number of Endogenous Variables               1
Endogenous Variable                       inlf
Number of Observations                     753
Log Likelihood                      -401.30219
Maximum Absolute Gradient            0.0000669
Number of Iterations                        15
Optimization Method               Quasi-Newton
AIC                                  818.60439
Schwarz Criterion                    855.59691
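The two information criteria in the fit summary can be checked by hand from the log likelihood. The sketch below assumes the usual definitions $\mathrm{AIC} = -2\ln L + 2k$ and $\mathrm{SBC} = -2\ln L + k\ln N$, with $k = 8$ estimated parameters (the intercept plus the seven regressors in the MODEL statement) and $N = 753$ observations; $k$ and the formulas are supplied here for illustration rather than quoted from the output.

```latex
\begin{align*}
-2 \ln L &= -2 \times (-401.30219) = 802.60438 \\
\mathrm{AIC} &= 802.60438 + 2 \times 8 = 818.60438 \\
\mathrm{SBC} &= 802.60438 + 8 \ln 753 \approx 802.60438 + 52.9925 = 855.5969
\end{align*}
```

Both values agree with the reported AIC and Schwarz criterion up to rounding of the printed log likelihood.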
Goodness-of-fit measures are displayed in Figure 21.3. All measures except McKelvey-Zavoina’s definition are based on the log-likelihood function value. The likelihood ratio test statistic has a chi-square distribution under the null hypothesis that all slope coefficients are zero. In this example, the likelihood ratio statistic is used to test the hypothesis that kidslt6 = kidsge6 = age = educ = exper = expersq = nwifeinc = 0.
Figure 21.3 Goodness of Fit

Goodness-of-Fit Measures

Measure                   Value     Formula
Likelihood Ratio (R)      227.14    2 * (LogL - LogL0)
Upper Bound of R (U)      1029.7    - 2 * LogL0
Aldrich-Nelson            0.2317    R / (R+N)
Cragg-Uhler 1             0.2604    1 - exp(-R/N)
Cragg-Uhler 2             0.3494    (1-exp(-R/N)) / (1-exp(-U/N))
Estrella                  0.2888    1 - (1-R/U)^(U/N)
Adjusted Estrella         0.2693    1 - ((LogL-K)/LogL0)^(-2/N*LogL0)
McFadden's LRI            0.2206    R / U
Veall-Zimmermann          0.4012    (R * (U+N)) / (U * (R+N))
McKelvey-Zavoina          0.4025

N = # of observations, K = # of regressors
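Several of the measures can be reproduced directly from R, U, and N using the formulas in the table. The following worked arithmetic uses the printed values $R = 227.14$, $U = 1029.7$, and $N = 753$; small differences in the last digit would be due to rounding of the printed inputs.

```latex
\begin{align*}
\text{Aldrich-Nelson} &= \frac{R}{R + N} = \frac{227.14}{980.14} \approx 0.2317 \\
\text{Cragg-Uhler 1} &= 1 - e^{-R/N} = 1 - e^{-0.30165} \approx 1 - 0.7396 = 0.2604 \\
\text{McFadden's LRI} &= \frac{R}{U} = \frac{227.14}{1029.7} \approx 0.2206
\end{align*}
```

All three agree with the values shown in Figure 21.3.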
Finally, the parameter estimates and standard errors are shown in Figure 21.4.

Figure 21.4 Parameter Estimates of Binary Probit

Parameter Estimates

                                 Standard               Approx
Parameter    DF     Estimate        Error    t Value   Pr > |t|
Intercept     1     0.270077     0.508590       0.53     0.5954
nwifeinc      1    -0.012024     0.004840      -2.48     0.0130
educ          1     0.130905     0.025255       5.18     <.0001
exper         1     0.123348     0.018720       6.59     <.0001
expersq       1    -0.001887     0.000600      -3.14     0.0017
age           1    -0.052853     0.008477      -6.24     <.0001
kidslt6       1    -0.868329     0.118519      -7.33     <.0001
kidsge6       1     0.036005     0.043477       0.83     0.4076
When the error term has a logistic distribution, the binary logit model is estimated. To specify a logistic distribution, add the D=LOGIT option as follows:

   /*-- Binary Logit --*/
   proc qlim data=mroz;
      model inlf = nwifeinc educ exper expersq
                   age kidslt6 kidsge6 / discrete(d=logit);
   run;
The estimated parameters are shown in Figure 21.5.
Figure 21.5 Parameter Estimates of Binary Logit

   Binary Data
   The QLIM Procedure

   Parameter Estimates

                               Standard              Approx
   Parameter   DF   Estimate      Error   t Value   Pr > |t|
   Intercept    1   0.425452   0.860365      0.49     0.6210
   nwifeinc     1  -0.021345   0.008421     -2.53     0.0113
   educ         1   0.221170   0.043441      5.09     <.0001
   exper        1   0.205870   0.032070      6.42     <.0001
   expersq      1  -0.003154   0.001017     -3.10     0.0019
   age          1  -0.088024   0.014572     -6.04     <.0001
   kidslt6      1  -1.443354   0.203575     -7.09     <.0001
   kidsge6      1   0.060112   0.074791      0.80     0.4215
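The logit estimates in Figure 21.5 are larger in magnitude than the probit estimates in Figure 21.4 because the two models normalize the error variance differently: the standard normal error has variance 1, whereas the standard logistic error has variance π²/3. The Python sketch below compares the ratio of the printed estimates with π/√3. It is only a rough illustration; the ratio is a rule of thumb, and the two sets of estimates come from separate fits.

```python
import math

names = ["Intercept", "nwifeinc", "educ", "exper",
         "expersq", "age", "kidslt6", "kidsge6"]
# Estimates copied from Figure 21.4 (probit) and Figure 21.5 (logit)
probit = [0.270077, -0.012024, 0.130905, 0.123348,
          -0.001887, -0.052853, -0.868329, 0.036005]
logit = [0.425452, -0.021345, 0.221170, 0.205870,
         -0.003154, -0.088024, -1.443354, 0.060112]

# Standard deviation of the standard logistic distribution
scale = math.pi / math.sqrt(3.0)
ratios = {name: l / p for name, l, p in zip(names, logit, probit)}

print(f"pi/sqrt(3) = {scale:.4f}")
for name, ratio in ratios.items():
    print(f"{name:10s} logit/probit = {ratio:.3f}")
```

In this example the ratios fall between about 1.6 and 1.8, somewhat below π/√3, so the scaling is best read as an approximation rather than an exact identity.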
The heteroscedastic logit model can be estimated using the HETERO statement. If the variance of the logit model is a function of the family income level excluding wife’s income (nwifeinc), the variance can be specified as

\[ \mathrm{Var}(\epsilon_i) = \sigma^2 \exp(\gamma \cdot \mathit{nwifeinc}_i) \]

where $\sigma^2$ is normalized to 1 because the dependent variable is discrete. The following SAS statements estimate the heteroscedastic logit model:

   /*-- Binary Logit with Heteroscedasticity --*/
   proc qlim data=mroz;
      model inlf = nwifeinc educ exper expersq
                   age kidslt6 kidsge6 / discrete(d=logit);
      hetero inlf ~ nwifeinc / noconst;
   run;
The parameter estimate, $\gamma$, of the heteroscedasticity variable is listed as _H.nwifeinc; see Figure 21.6.

Figure 21.6 Parameter Estimates of Binary Logit with Heteroscedasticity

   Binary Data
   The QLIM Procedure

   Parameter Estimates

                                 Standard              Approx
   Parameter     DF   Estimate      Error   t Value   Pr > |t|
   Intercept      1   0.510445   0.983538      0.52     0.6038
   nwifeinc       1  -0.026778   0.012108     -2.21     0.0270
   educ           1   0.255547   0.061728      4.14     <.0001
   exper          1   0.234105   0.046639      5.02     <.0001
   expersq        1  -0.003613   0.001236     -2.92     0.0035
   age            1  -0.100878   0.021491     -4.69     <.0001
   kidslt6        1  -1.645206   0.311296     -5.29     <.0001
   kidsge6        1   0.066941   0.085633      0.78     0.4344
   _H.nwifeinc    1   0.013280   0.013606      0.98     0.3291
Syntax: QLIM Procedure

The QLIM procedure is controlled by the following statements:

   PROC QLIM options ;
      BOUNDS bound1 < , bound2 . . . > ;
      BY variables ;
      CLASS variables ;
      ENDOGENOUS variables ~ options ;
      HETERO dependent variables ~ exogenous variables / options ;
      INIT initvalue1 < , initvalue2 . . . > ;
      MODEL dependent variables = regressors / options ;
      NLOPTIONS options ;
      OUTPUT options ;
      RESTRICT restriction1 < , restriction2 . . . > ;
      TEST options ;
      WEIGHT variable ;
At least one MODEL statement is required. If more than one MODEL statement is used, the QLIM procedure estimates a system of models. Main effects and higher-order terms can be specified in the MODEL statement, as in the GLM procedure and PROBIT procedure in SAS/STAT. If a CLASS statement is used, it must precede the MODEL statement.
Functional Summary

The statements and options used with the QLIM procedure are summarized in the following table.

Table 21.1 QLIM Functional Summary

   Description                                              Statement    Option

   Data Set Options
   specify the input data set                               QLIM         DATA=
   write parameter estimates to an output data set          QLIM         OUTEST=
   write predictions to an output data set                  OUTPUT       OUT=

   Declaring the Role of Variables
   specify BY-group processing                              BY
   specify classification variables                         CLASS
   specify a weight variable                                WEIGHT

   Printing Control Options
   request all printing options                             QLIM         PRINTALL
   print correlation matrix of the estimates                QLIM         CORRB
   print covariance matrix of the estimates                 QLIM         COVB
   print a summary iteration listing                        QLIM         ITPRINT
   suppress the normal printed output                       QLIM         NOPRINT

   Options to Control the Optimization Process
   specify the optimization method                          QLIM         METHOD=
   specify the optimization options                         NLOPTIONS    see Chapter 6, “Nonlinear Optimization Methods”
   set initial values for parameters                        INIT
   set linear restrictions on parameters                    BOUNDS RESTRICT

   Model Estimation Options
   specify options specific to Box-Cox transformation       MODEL        BOXCOX()
   suppress the intercept parameter                         MODEL        NOINT
   specify a seed for pseudo-random number generation       QLIM         SEED=
   specify number of draws for Monte Carlo integration      QLIM         NDRAW=
   specify method to calculate parameter covariance         QLIM         COVEST=

   Endogenous Variable Options
   specify discrete variable                                ENDOGENOUS   DISCRETE()
   specify censored variable                                ENDOGENOUS   CENSORED()
   specify truncated variable                               ENDOGENOUS   TRUNCATED()
   specify variable selection condition                     ENDOGENOUS   SELECT()
   specify stochastic frontier variable                     ENDOGENOUS   FRONTIER()

   Heteroscedasticity Model Options
   specify the function for heteroscedasticity models       HETERO       LINK=
   square the function for heteroscedasticity models        HETERO       SQUARE
   specify no constant for heteroscedasticity models        HETERO       NOCONST

   Output Control Options
   output predicted values                                  OUTPUT       PREDICTED
   output structured part                                   OUTPUT       XBETA
   output residuals                                         OUTPUT       RESIDUAL
   output error standard deviation                          OUTPUT       ERRSTD
   output marginal effects                                  OUTPUT       MARGINAL
   output probability for the current response              OUTPUT       PROB
   output probability for all responses                     OUTPUT       PROBALL
   output expected value                                    OUTPUT       EXPECTED
   output conditional expected value                        OUTPUT       CONDITIONAL
   output inverse Mills ratio                               OUTPUT       MILLS
   include covariances in the OUTEST= data set              QLIM         COVOUT
   include correlations in the OUTEST= data set             QLIM         CORROUT

   Test Request Options
   request Wald, Lagrange multiplier, and likelihood
      ratio tests                                           TEST         ALL
   request the Wald test                                    TEST         WALD
   request the Lagrange multiplier test                     TEST         LM
   request the likelihood ratio test                        TEST         LR
PROC QLIM Statement

   PROC QLIM options ;

The following options can be used in the PROC QLIM statement.

Data Set Options

DATA=SAS-data-set
   specifies the input SAS data set. If the DATA= option is not specified, PROC QLIM uses the most recently created SAS data set.

Output Data Set Options

OUTEST=SAS-data-set
   writes the parameter estimates to an output data set.

COVOUT
   writes the covariance matrix for the parameter estimates to the OUTEST= data set. This option is valid only if the OUTEST= option is specified.

CORROUT
   writes the correlation matrix for the parameter estimates to the OUTEST= data set. This option is valid only if the OUTEST= option is specified.

Printing Options

NOPRINT
   suppresses the normal printed output but does not suppress error listings. If the NOPRINT option is set, then any other print option is turned off.

PRINTALL
   turns on all the printing-control options. The options set by PRINTALL are COVB and CORRB.

CORRB
   prints the correlation matrix of the parameter estimates.

COVB
   prints the covariance matrix of the parameter estimates.

ITPRINT
   prints the initial parameter estimates, convergence criteria, and all constraints of the optimization. At each iteration, the objective function value, step size, maximum gradient, and slope of the search direction are printed as well.
Model Estimation Options

COVEST=covariance-option
   specifies the method to calculate the covariance matrix of parameter estimates. The supported covariance types are as follows:

   OP        specifies the covariance from the outer product matrix.
   HESSIAN   specifies the covariance from the inverse Hessian matrix.
   QML       specifies the covariance from the outer product and Hessian matrices (the quasi-maximum likelihood estimates).

   The default is COVEST=HESSIAN.

NDRAW=value
   specifies the number of draws for Monte Carlo integration.

SEED=value
   specifies a seed for pseudo-random number generation in Monte Carlo integration.
Options to Control the Optimization Process

PROC QLIM uses the nonlinear optimization (NLO) subsystem to perform nonlinear optimization tasks. All the NLO options are available from the NLOPTIONS statement. For details, see Chapter 6, “Nonlinear Optimization Methods.”

METHOD=value
   specifies the optimization method. If this option is specified, it overrides the TECH= option in the NLOPTIONS statement. Valid values are as follows:

   CONGRA    performs a conjugate-gradient optimization
   DBLDOG    performs a version of double dogleg optimization
   NMSIMP    performs a Nelder-Mead simplex optimization
   NEWRAP    performs a Newton-Raphson optimization combining a line-search algorithm with ridging
   NRRIDG    performs a Newton-Raphson optimization with ridging
   QUANEW    performs a quasi-Newton optimization
   TRUREG    performs a trust region optimization

   The default method is METHOD=QUANEW.
BOUNDS Statement

   BOUNDS bound1 < , bound2 . . . > ;

The BOUNDS statement imposes simple boundary constraints on the parameter estimates. BOUNDS statement constraints refer to the parameters estimated by the QLIM procedure. Any number of BOUNDS statements can be specified. Each bound is composed of parameters, constants, and inequality operators. Parameters associated with regressor variables are referred to by the names of the corresponding regressor variables:

   item operator item < operator item < operator item . . . > >

Each item is a constant, the name of a parameter, or a list of parameter names. See the section “Naming of Parameters” on page 1414 for more details on how parameters are named in the QLIM procedure. Each operator is ’<’, ’>’, ’<=’, or ’>=’.

Both the BOUNDS statement and the RESTRICT statement can be used to impose boundary constraints; however, the BOUNDS statement provides a simpler syntax for specifying these kinds of constraints. See the “RESTRICT Statement” on page 1393 for more information. The following BOUNDS statement constrains the estimates of the parameters associated with the variable ttime and the variables x1 through x10 to be between zero and one. This example illustrates the use of parameter lists to specify boundary constraints.

   bounds 0 < ttime x1-x10 < 1;

The following BOUNDS statement constrains the estimates of the correlation (_RHO) and sigma (_SIGMA) in the bivariate model:

   bounds _rho >= 0, _sigma.y1 > 1, _sigma.y2 < 5;
BY Statement

   BY variables ;
A BY statement can be used with PROC QLIM to obtain separate analyses on observations in groups defined by the BY variables.
CLASS Statement

   CLASS variables ;
The CLASS statement names the classification variables to be used in the analysis. Classification variables can be either character or numeric. Class levels are determined from the formatted values of the CLASS variables. Thus, you can use formats to group values into levels. See the discussion of the FORMAT procedure in SAS Language Reference: Dictionary for details. If the CLASS statement is used, it must appear before any of the MODEL statements.
ENDOGENOUS Statement

   ENDOGENOUS variables ~ options ;

The ENDOGENOUS statement specifies the type of the dependent variables that appear on the left-hand side of the equation. The endogenous variables listed must be dependent variables that appear on the left-hand side of an equation. Currently, no right-hand side endogeneity is handled in PROC QLIM. All variables appearing on the right-hand side of the equation are treated as exogenous.
Discrete Variable Options

DISCRETE < (discrete-options ) >
   specifies that the endogenous variables in this statement are discrete. Valid discrete-options are as follows:

   ORDER=DATA | FORMATTED | FREQ | INTERNAL
      specifies the sorting order for the levels of the discrete variables specified in the ENDOGENOUS statement. This ordering determines which parameters in the model correspond to each level in the data. The following table shows how PROC QLIM interprets values of the ORDER= option.

      Value of ORDER=   Levels Sorted By
      DATA              order of appearance in the input data set
      FORMATTED         formatted value
      FREQ              descending frequency count; levels with the most observations come first in the order
      INTERNAL          unformatted value

      By default, ORDER=FORMATTED. For the values FORMATTED and INTERNAL, the sort order is machine dependent. For more information about sorting order, see the chapter on the SORT procedure in the Base SAS Procedures Guide.

   DISTRIBUTION=distribution-type
   DIST=distribution-type
   D=distribution-type
      specifies the cumulative distribution function used to model the response probabilities. Valid values for distribution-type are as follows:

      NORMAL     the normal distribution for the probit model
      LOGISTIC   the logistic distribution for the logit model

      By default, DISTRIBUTION=NORMAL. If a multivariate model is specified, the logistic distribution is not allowed; only the normal distribution is supported.
Censored Variable Options

CENSORED < (censored-options ) >
   specifies that the endogenous variables in this statement be censored. Valid censored-options are as follows:

   LB=value or variable
   LOWERBOUND=value or variable
      specifies the lower bound of the censored variables. If value is missing or the value in variable is missing, no lower bound is set. By default, no lower bound is set.

   UB=value or variable
   UPPERBOUND=value or variable
      specifies the upper bound of the censored variables. If value is missing or the value in variable is missing, no upper bound is set. By default, no upper bound is set.

Truncated Variable Options

TRUNCATED < (truncated-options ) >
   specifies that the endogenous variables in this statement be truncated. Valid truncated-options are as follows:

   LB=value or variable
   LOWERBOUND=value or variable
      specifies the lower bound of the truncated variables. If value is missing or the value in variable is missing, no lower bound is set. By default, no lower bound is set.

   UB=value or variable
   UPPERBOUND=value or variable
      specifies the upper bound of the truncated variables. If value is missing or the value in variable is missing, no upper bound is set. By default, no upper bound is set.
Stochastic Frontier Variable Options

FRONTIER < (frontier-options ) >
   specifies that the endogenous variable in this statement follow a production or cost frontier. Valid frontier-options are as follows:

   TYPE=
      HALF          specifies the half-normal model.
      EXPONENTIAL   specifies the exponential model.
      TRUNCATED     specifies the truncated normal model.

   PRODUCTION
      specifies that the model estimated be a production function.

   COST
      specifies that the model estimated be a cost function.

   If neither the PRODUCTION nor the COST option is specified, a production function is estimated by default.
Selection Options

SELECT (select-option )
   specifies selection criteria for the sample selection model. Select-option specifies the condition for the endogenous variable to be selected. It is written as a variable name, followed by an equality operator (=) or an inequality operator (<, >, <=, >=), followed by a number:

      variable operator number

   The variable is the endogenous variable that the selection is based on. The operator can be =, <, >, <=, or >=. Multiple select-options can be combined with the logical operators AND and OR. The following example illustrates the use of the SELECT option:

      endogenous y1 ~ select(z=0);
      endogenous y2 ~ select(z=1 or z=2);

   The SELECT option can be used together with the DISCRETE, CENSORED, or TRUNCATED option. For example:

      endogenous y1 ~ select(z=0) discrete;
      endogenous y2 ~ select(z=1) censored (lb=0);
      endogenous y3 ~ select(z=1 or z=2) truncated (ub=10);

   For more details about selection models with censoring or truncation, see the section “Selection Models” on page 1408.
HETERO Statement

   HETERO dependent variables ~ exogenous variables < / options > ;

The HETERO statement specifies variables that are related to the heteroscedasticity of the residuals and the way these variables are used to model the error variance. The heteroscedastic regression model supported by PROC QLIM is

\[ y_i = x_i'\beta + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_i^2) \]

See the section “Heteroscedasticity” on page 1405 for more details on the specification of functional forms.

LINK=value
   The functional form can be specified using the LINK= option. The following option values are allowed:

   EXP
      specifies the exponential link function
      \[ \sigma_i^2 = \sigma^2 (1 + \exp(z_i'\gamma)) \]

   LINEAR
      specifies the linear link function
      \[ \sigma_i^2 = \sigma^2 (1 + z_i'\gamma) \]

   When the LINK= option is not specified, the exponential link function is used by default.

NOCONST
   specifies that there be no constant in the linear or exponential heteroscedasticity model:
   \[ \sigma_i^2 = \sigma^2 (z_i'\gamma) \]
   \[ \sigma_i^2 = \sigma^2 \exp(z_i'\gamma) \]

SQUARE
   estimates the model by using the square of the linear heteroscedasticity function. For example, you can specify the following heteroscedasticity function:
   \[ \sigma_i^2 = \sigma^2 (1 + (z_i'\gamma)^2) \]

      model y = x1 x2 / discrete;
      hetero y ~ z1 / link=linear square;

   The SQUARE option does not apply to the exponential heteroscedasticity function, because the square of an exponential function of $z_i'\gamma$ is the same as the exponential of $2 z_i'\gamma$. Hence the only difference is that all estimates are divided by two.
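The variance functions selected by LINK=, NOCONST, and SQUARE can be summarized in a few lines. The Python sketch below is an illustration only; the function name `hetero_variance` and its arguments are hypothetical and are not part of PROC QLIM. It takes the index $z_i'\gamma$ as a single number.

```python
import math

def hetero_variance(z_gamma, link="exp", noconst=False, square=False, sigma2=1.0):
    """Error variance for one observation, given the index z_gamma = z_i' * gamma.

    Hypothetical helper that mirrors the formulas listed for the HETERO statement.
    """
    if link == "exp":
        if square:
            # The text says SQUARE is not applied to the exponential link.
            raise ValueError("SQUARE is not used with the exponential link")
        core = math.exp(z_gamma)
    elif link == "linear":
        core = z_gamma ** 2 if square else z_gamma
    else:
        raise ValueError("link must be 'exp' or 'linear'")
    # NOCONST drops the leading "1 +" from the variance function.
    return sigma2 * (core if noconst else 1.0 + core)

# Example index: gamma = 0.013280 (the _H.nwifeinc estimate in Figure 21.6)
# times a hypothetical income value of 20.
index = 0.013280 * 20.0
print(hetero_variance(index, link="exp", noconst=True))     # exp(z'gamma)
print(hetero_variance(0.5, link="linear"))                  # 1 + z'gamma
print(hetero_variance(0.5, link="linear", square=True))     # 1 + (z'gamma)^2

# Squaring an exponential only rescales the index by two.
print(math.isclose(math.exp(0.7) ** 2, math.exp(2 * 0.7)))
```

The last line shows why SQUARE adds nothing to the exponential link: the squared function is again an exponential, with every coefficient doubled.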
INIT Statement

   INIT initvalue1 < , initvalue2 . . . > ;

The INIT statement is used to set initial values for parameters in the optimization. Any number of INIT statements can be specified. Each initvalue is written as a parameter or parameter list, followed by an optional equality operator (=), followed by a number:

   parameter <=> number

MODEL Statement

   MODEL dependent = regressors < / options > ;

The MODEL statement specifies the dependent variable and independent regressor variables for the regression model. The following options can be used in the MODEL statement after a slash (/).
LIMIT1=value
   specifies the restriction of the threshold value of the first category when the ordinal probit or logit model is estimated. LIMIT1=ZERO is the default option. When LIMIT1=VARYING is specified, the threshold value is estimated.

NOINT
   suppresses the intercept parameter.

Endogenous Variable Options

The endogenous variable options are the same as the options specified in the ENDOGENOUS statement. If an endogenous variable has an endogenous option specified in both the MODEL statement and the ENDOGENOUS statement, the option in the ENDOGENOUS statement is used.

BOXCOX Estimation Options

BOXCOX (option-list )
   specifies options that are used for Box-Cox regression or regressor transformation. For example, the Box-Cox regression is specified as

      model y = x1 x2 / boxcox(y=lambda,x1 x2);

   PROC QLIM estimates the following Box-Cox regression model:

   \[ y_i^{(\lambda)} = \beta_0 + \beta_1 x_{1i}^{(\lambda_2)} + \beta_2 x_{2i}^{(\lambda_2)} + \epsilon_i \]

   The option-list takes the form variable-list < = varname > separated by ’,’. The variable-list specifies that the list of variables have the same Box-Cox transformation; varname specifies the name of this Box-Cox coefficient. If varname is not specified, the coefficient is called _Lambdai, where i increments sequentially.
NLOPTIONS Statement

   NLOPTIONS < options > ;

PROC QLIM uses the nonlinear optimization (NLO) subsystem to perform nonlinear optimization tasks. For a list of all the options of the NLOPTIONS statement, see Chapter 6, “Nonlinear Optimization Methods.”
OUTPUT Statement

   OUTPUT < OUT=SAS-data-set > < output-options > ;

The OUTPUT statement creates a new SAS data set containing all variables in the input data set and, optionally, the estimates of $x'\beta$, predicted value, residual, marginal effects, probability, standard deviation of the error, expected value, conditional expected value, and inverse Mills ratio. When the response values are missing for an observation, all output estimates except the residual are still computed as long as none of the explanatory variables is missing. This enables you to compute these statistics for prediction. You can specify only one OUTPUT statement.

Details on the specifications in the OUTPUT statement are as follows:

CONDITIONAL
   outputs estimates of conditional expected values of continuous endogenous variables.

ERRSTD
   outputs estimates of $\sigma_j$, the standard deviation of the error term.

EXPECTED
   outputs estimates of expected values of continuous endogenous variables.

MARGINAL
   outputs marginal effects.

MILLS
   outputs estimates of inverse Mills ratios of censored or truncated continuous, binary discrete, and selection endogenous variables.

OUT=SAS-data-set
   names the output data set.

PREDICTED
   outputs estimates of predicted endogenous variables.

PROB
   outputs estimates of the probability of discrete endogenous variables taking the current observed responses.

PROBALL
   outputs estimates of the probability of discrete endogenous variables for all possible responses.

RESIDUAL
   outputs estimates of residuals of continuous endogenous variables.

XBETA
   outputs estimates of $x'\beta$.
RESTRICT Statement

   RESTRICT restriction1 < , restriction2 . . . > ;

The RESTRICT statement is used to impose linear restrictions on the parameter estimates. Any number of RESTRICT statements can be specified, but the number of restrictions imposed is limited by the number of regressors. Each restriction is written as an expression, followed by an equality operator (=) or an inequality operator (<, >, <=, >=), followed by a second expression:

   expression operator expression

The operator can be =, <, >, <=, or >=. The operator and second expression are optional. Restriction expressions can be composed of parameter names, multiplication (*), addition (+), and subtraction (-) operators, and constants. Parameters named in restriction expressions must be among the parameters estimated by the model. Parameters associated with a regressor variable are referred to by the name of the corresponding regressor variable. The restriction expressions must be a linear function of the parameters. The following is an example of the use of the RESTRICT statement:

   proc qlim data=one;
      model y = x1-x10 / discrete;
      restrict x1*2 <= x2 + x3;
   run;

The RESTRICT statement can also be used to impose cross-equation restrictions in multivariate models. The following RESTRICT statement imposes an equality restriction on coefficients of x1 in equation y1 and x1 in equation y2:

   proc qlim data=one;
      model y1 = x1-x10;
      model y2 = x1-x4;
      endogenous y1 y2 ~ discrete;
      restrict y1.x1=y2.x1;
   run;
TEST Statement

   <’label’:> TEST <’string’:> equation [,equation. . . ] / options ;

The TEST statement performs Wald, Lagrange multiplier, and likelihood ratio tests of linear hypotheses about the regression parameters in the preceding MODEL statement. Each equation specifies a linear hypothesis to be tested. All hypotheses in one TEST statement are tested jointly. Variable names in the equations must correspond to regressors in the preceding MODEL statement, and each name represents the coefficient of the corresponding regressor. The keyword INTERCEPT refers to the coefficient of the intercept. The following options can be specified in the TEST statement after the slash (/):

ALL
   requests Wald, Lagrange multiplier, and likelihood ratio tests.

WALD
   requests the Wald test.

LM
   requests the Lagrange multiplier test.

LR
   requests the likelihood ratio test.

The following illustrates the use of the TEST statement:

   proc qlim;
      model y = x1 x2 x3;
      test x1 = 0, x2 * .5 + 2 * x3 = 0;
      test_int: test intercept = 0, x3 = 0;
   run;

The first test investigates the joint hypothesis that

\[ \beta_1 = 0 \quad \text{and} \quad 0.5\beta_2 + 2\beta_3 = 0 \]

Only linear equality restrictions and tests are permitted in PROC QLIM. Test expressions can be composed only of algebraic operations involving the addition symbol (+), subtraction symbol (-), and multiplication symbol (*).

The TEST statement accepts labels that are reproduced in the printed output. A TEST statement can be labeled in two ways. A TEST statement can be preceded by a label followed by a colon. Alternatively, the keyword TEST can be followed by a quoted string. If both are present, PROC QLIM uses the label preceding the colon. If no label is present, PROC QLIM automatically labels the tests.
WEIGHT Statement

   WEIGHT variable ;
The WEIGHT statement specifies a variable to supply weighting values to use for each observation in estimating parameters. The log likelihood for each observation is multiplied by the corresponding weight variable value. If the weight of an observation is nonpositive, that observation is not used in the estimation.
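Details: QLIM Procedure

Ordinal Discrete Choice Modeling

Binary Probit and Logit Model

The binary choice model is

\[ y_i^* = x_i'\beta + \epsilon_i \]

where the value of the latent dependent variable, $y_i^*$, is observed only as follows:

\[ y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \]

The disturbance, $\epsilon_i$, of the probit model has a standard normal distribution with the distribution function (CDF)

\[ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp(-t^2/2)\, dt \]

The disturbance of the logit model has a standard logistic distribution with the CDF

\[ \Lambda(x) = \frac{\exp(x)}{1 + \exp(x)} = \frac{1}{1 + \exp(-x)} \]

The binary discrete choice model has the following probability that the event $\{y_i = 1\}$ occurs:

\[ P(y_i = 1) = F(x_i'\beta) = \begin{cases} \Phi(x_i'\beta) & \text{(probit)} \\ \Lambda(x_i'\beta) & \text{(logit)} \end{cases} \]

The log-likelihood function is

\[ \ell = \sum_{i=1}^{N} \left\{ y_i \log[F(x_i'\beta)] + (1 - y_i) \log[1 - F(x_i'\beta)] \right\} \]

where the CDF $F(x)$ is defined as $\Phi(x)$ for the probit model and as $\Lambda(x)$ for the logit model. The first-order derivatives of the logit model are

\[ \frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{N} (y_i - \Lambda(x_i'\beta))\, x_i \]

The probit model has more complicated derivatives:

\[ \frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{N} \left[ \frac{(2y_i - 1)\,\phi((2y_i - 1)\,x_i'\beta)}{\Phi((2y_i - 1)\,x_i'\beta)} \right] x_i = \sum_{i=1}^{N} r_i x_i \]

where $\phi$ is the standard normal density function and

\[ r_i = \frac{(2y_i - 1)\,\phi((2y_i - 1)\,x_i'\beta)}{\Phi((2y_i - 1)\,x_i'\beta)} \]

For $y_i = 1$ this weight is $\phi(x_i'\beta)/\Phi(x_i'\beta)$, and for $y_i = 0$ it is $-\phi(x_i'\beta)/[1 - \Phi(x_i'\beta)]$, which is what differentiating the log-likelihood function gives.

Note that the logit maximum likelihood estimates are approximately $\pi/\sqrt{3}$ times greater than the probit maximum likelihood estimates, since the probit parameter estimates, $\beta$, are standardized, and the error term with logistic distribution has a variance of $\pi^2/3$.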
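The log-likelihood function and its first derivatives can be checked numerically. The Python sketch below is an illustration with hypothetical toy data, not the optimizer used by PROC QLIM. The probit weight is coded as $q\,\phi(q\,x_i'\beta)/\Phi(q\,x_i'\beta)$ with $q = 2y_i - 1$, the form that agrees with the log likelihood for both $y_i = 0$ and $y_i = 1$, and the analytic gradient is compared with a central finite difference.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood(beta, xs, ys, model="probit"):
    """Sum of y*log F(x'b) + (1-y)*log(1 - F(x'b))."""
    cdf = norm_cdf if model == "probit" else logistic_cdf
    total = 0.0
    for x, y in zip(xs, ys):
        p = cdf(sum(b * v for b, v in zip(beta, x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def gradient(beta, xs, ys, model="probit"):
    """Analytic first derivatives of the log likelihood."""
    grad = [0.0] * len(beta)
    for x, y in zip(xs, ys):
        xb = sum(b * v for b, v in zip(beta, x))
        if model == "probit":
            q = 2 * y - 1
            weight = q * norm_pdf(q * xb) / norm_cdf(q * xb)
        else:
            weight = y - logistic_cdf(xb)
        for j, v in enumerate(x):
            grad[j] += weight * v
    return grad

# Hypothetical toy data: a constant and one regressor
xs = [(1.0, -1.2), (1.0, 0.3), (1.0, 0.8), (1.0, 1.9), (1.0, -0.4)]
ys = [0, 0, 1, 1, 1]
beta = [0.1, 0.5]

def numeric_gradient(beta, model, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    out = []
    for j in range(len(beta)):
        up = list(beta)
        dn = list(beta)
        up[j] += h
        dn[j] -= h
        diff = log_likelihood(up, xs, ys, model) - log_likelihood(dn, xs, ys, model)
        out.append(diff / (2.0 * h))
    return out

for model in ("probit", "logit"):
    print(model, gradient(beta, xs, ys, model), numeric_gradient(beta, model))
```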
Ordinal Probit/Logit

When the dependent variable is observed in sequence with $M$ categories, binary discrete choice modeling is not appropriate for data analysis. McKelvey and Zavoina (1975) proposed the ordinal (or ordered) probit model.

Consider the following regression equation:

\[ y_i^* = x_i'\beta + \epsilon_i \]

where error disturbances, $\epsilon_i$, have the distribution function $F$. The unobserved continuous random variable, $y_i^*$, is identified as $M$ categories. Suppose there are $M + 1$ real numbers, $\mu_0, \ldots, \mu_M$, where $\mu_0 = -\infty$, $\mu_1 = 0$, $\mu_M = \infty$, and $\mu_0 \le \mu_1 \le \cdots \le \mu_M$. Define

\[ R_{i,j} = \mu_j - x_i'\beta \]

The probability that the unobserved dependent variable is contained in the $j$th category can be written as

\[ P[\mu_{j-1} < y_i^* \le \mu_j] = F(R_{i,j}) - F(R_{i,j-1}) \]

The log-likelihood function is

\[ \ell = \sum_{i=1}^{N} \sum_{j=1}^{M} d_{ij} \log\left[ F(R_{i,j}) - F(R_{i,j-1}) \right] \]

where

\[ d_{ij} = \begin{cases} 1 & \text{if } \mu_{j-1} < y_i^* \le \mu_j \\ 0 & \text{otherwise} \end{cases} \]

The first derivatives are written as

\[ \frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{N} \sum_{j=1}^{M} d_{ij} \left[ \frac{f(R_{i,j-1}) - f(R_{i,j})}{F(R_{i,j}) - F(R_{i,j-1})} \right] x_i \]

\[ \frac{\partial \ell}{\partial \mu_k} = \sum_{i=1}^{N} \sum_{j=1}^{M} d_{ij} \left[ \frac{\delta_{j,k}\, f(R_{i,j}) - \delta_{j-1,k}\, f(R_{i,j-1})}{F(R_{i,j}) - F(R_{i,j-1})} \right] \]

where $f(x) = dF(x)/dx$ and $\delta_{j,k} = 1$ if $j = k$ (and 0 otherwise). When the ordinal probit is estimated, it is assumed that $F(R_{i,j}) = \Phi(R_{i,j})$. The ordinal logit model is estimated if $F(R_{i,j}) = \Lambda(R_{i,j})$. The first threshold parameter, $\mu_1$, is estimated when the LIMIT1=VARYING option is specified. By default (LIMIT1=ZERO), $\mu_1 = 0$, so that $M - 2$ threshold parameters ($\mu_2, \ldots, \mu_{M-1}$) are estimated.

The ordered probit models are analyzed by Aitchison and Silvey (1957), and Cox (1970) discussed ordered response data by using the logit model. They defined the probability that $y_i^*$ belongs to the $j$th category as

\[ P[\mu_{j-1} < y_i^* \le \mu_j] = F(\mu_j + x_i'\theta) - F(\mu_{j-1} + x_i'\theta) \]

where $\mu_0 = -\infty$ and $\mu_M = \infty$. Therefore, the ordered response model analyzed by Aitchison and Silvey can be estimated if the LIMIT1=VARYING option is specified. Note that $\theta = -\beta$.
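The category probabilities of the ordinal probit model can be illustrated directly from the definition of $R_{i,j}$. The Python sketch below uses a hypothetical index value and hypothetical thresholds; the function name `category_probabilities` is not part of PROC QLIM. The first threshold is fixed at zero, as with LIMIT1=ZERO.

```python
import math

def norm_cdf(x):
    # math.erf handles +/- infinity, returning +/- 1
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def category_probabilities(xb, thresholds):
    """Ordinal probit probabilities for one observation.

    thresholds holds the finite values mu_1, ..., mu_{M-1} in increasing order;
    mu_0 = -infinity and mu_M = +infinity are added here.
    """
    mu = [-math.inf] + list(thresholds) + [math.inf]
    # P(category j) = F(mu_j - x'b) - F(mu_{j-1} - x'b)
    return [norm_cdf(mu[j] - xb) - norm_cdf(mu[j - 1] - xb)
            for j in range(1, len(mu))]

# Hypothetical example with M = 4 categories and mu_1 = 0
probs = category_probabilities(xb=0.4, thresholds=[0.0, 1.0, 2.5])
for j, p in enumerate(probs, start=1):
    print(f"P(category {j}) = {p:.4f}")
print("sum =", sum(probs))
```

Because the probabilities are differences of a CDF evaluated at increasing thresholds, they are nonnegative and sum to one for any index value.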
Goodness-of-Fit Measures

The goodness-of-fit measures discussed in this section apply only to discrete dependent variable models.

McFadden (1974) suggested a likelihood ratio index that is analogous to the $R^2$ in the linear regression model:

\[ R_M^2 = 1 - \frac{\ln L}{\ln L_0} \]

where $L$ is the value of the maximum likelihood function and $L_0$ is the value of the likelihood function when all regression coefficients except the intercept term are zero. It can be shown that $\ln L_0$ can be written as

\[ \ln L_0 = \sum_{j=1}^{M} N_j \ln\left( \frac{N_j}{N} \right) \]

where $N_j$ is the number of responses in category $j$.

Estrella (1998) proposes the following requirements for a goodness-of-fit measure to be desirable in discrete choice modeling:

   - The measure must take values in $[0, 1]$, where 0 represents no fit and 1 corresponds to perfect fit.
   - The measure should be directly related to the valid test statistic for significance of all slope coefficients.
   - The derivative of the measure with respect to the test statistic should comply with corresponding derivatives in a linear regression.

Estrella’s (1998) measure is written

\[ R_{E1}^2 = 1 - \left( \frac{\ln L}{\ln L_0} \right)^{-\frac{2}{N} \ln L_0} \]

An alternative measure suggested by Estrella (1998) is

\[ R_{E2}^2 = 1 - \left[ \frac{\ln L - K}{\ln L_0} \right]^{-\frac{2}{N} \ln L_0} \]

where $\ln L_0$ is computed with null slope parameter values, $N$ is the number of observations used, and $K$ represents the number of estimated parameters.

Other goodness-of-fit measures are summarized as follows:

\[ R_{CU1}^2 = 1 - \left( \frac{L_0}{L} \right)^{2/N} \quad \text{(Cragg-Uhler 1)} \]

\[ R_{CU2}^2 = \frac{1 - (L_0/L)^{2/N}}{1 - L_0^{2/N}} \quad \text{(Cragg-Uhler 2)} \]

\[ R_A^2 = \frac{2(\ln L - \ln L_0)}{2(\ln L - \ln L_0) + N} \quad \text{(Aldrich-Nelson)} \]

\[ R_{VZ}^2 = R_A^2\, \frac{2 \ln L_0 - N}{2 \ln L_0} \quad \text{(Veall-Zimmermann)} \]

\[ R_{MZ}^2 = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}{N + \sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2} \quad \text{(McKelvey-Zavoina)} \]

where $\hat{y}_i = x_i'\hat{\beta}$ and $\bar{\hat{y}} = \sum_{i=1}^{N} \hat{y}_i / N$.
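Limited Dependent Variable Models

Censored Regression Models

When the dependent variable is censored, values in a certain range are all transformed to a single value. For example, the standard tobit model can be defined as

\[ y_i^* = x_i'\beta + \epsilon_i \]

\[ y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases} \]

where $\epsilon_i \sim \text{iid } N(0, \sigma^2)$. The log-likelihood function of the standard censored regression model is

\[ \ell = \sum_{i \in \{y_i = 0\}} \ln\left[ 1 - \Phi(x_i'\beta/\sigma) \right] + \sum_{i \in \{y_i > 0\}} \ln\left[ \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) / \sigma \right] \]

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution and $\phi(\cdot)$ is the probability density function of the standard normal distribution.

The tobit model can be generalized to handle observation-by-observation censoring. The censored model on both of the lower and upper limits can be defined as

\[ y_i = \begin{cases} R_i & \text{if } y_i^* \ge R_i \\ y_i^* & \text{if } L_i < y_i^* < R_i \\ L_i & \text{if } y_i^* \le L_i \end{cases} \]

The log-likelihood function can be written as

\[ \ell = \sum_{i \in \{L_i < y_i < R_i\}} \ln\left[ \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) / \sigma \right] + \sum_{i \in \{y_i = L_i\}} \ln \Phi\left( \frac{L_i - x_i'\beta}{\sigma} \right) + \sum_{i \in \{y_i = R_i\}} \ln\left[ 1 - \Phi\left( \frac{R_i - x_i'\beta}{\sigma} \right) \right] \]

Log-likelihood functions of the lower- or upper-limit censored model are easily derived from the two-limit censored model. The log-likelihood function of the lower-limit censored model is

\[ \ell = \sum_{i \in \{y_i > L_i\}} \ln\left[ \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) / \sigma \right] + \sum_{i \in \{y_i = L_i\}} \ln \Phi\left( \frac{L_i - x_i'\beta}{\sigma} \right) \]

The log-likelihood function of the upper-limit censored model is

\[ \ell = \sum_{i \in \{y_i < R_i\}} \ln\left[ \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) / \sigma \right] + \sum_{i \in \{y_i = R_i\}} \ln\left[ 1 - \Phi\left( \frac{R_i - x_i'\beta}{\sigma} \right) \right] \]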
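The censored log-likelihood functions can be sketched in a few lines. The Python code below is an illustration with hypothetical data; it uses one common lower and upper limit for all observations, whereas the formulas allow observation-specific limits $L_i$ and $R_i$. It also confirms that the standard tobit log likelihood is the lower-limit case with $L_i = 0$.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def censored_loglik(beta, sigma, xs, ys, lower=-math.inf, upper=math.inf):
    """Two-limit censored log likelihood with common limits (a simplification)."""
    total = 0.0
    for x, y in zip(xs, ys):
        xb = sum(b * v for b, v in zip(beta, x))
        if y <= lower:      # censored at the lower limit
            total += math.log(norm_cdf((lower - xb) / sigma))
        elif y >= upper:    # censored at the upper limit
            total += math.log(1.0 - norm_cdf((upper - xb) / sigma))
        else:               # uncensored observation
            total += math.log(norm_pdf((y - xb) / sigma) / sigma)
    return total

def tobit_loglik(beta, sigma, xs, ys):
    """Standard tobit log likelihood, written as in the first formula."""
    total = 0.0
    for x, y in zip(xs, ys):
        xb = sum(b * v for b, v in zip(beta, x))
        if y == 0:
            total += math.log(1.0 - norm_cdf(xb / sigma))
        else:
            total += math.log(norm_pdf((y - xb) / sigma) / sigma)
    return total

# Hypothetical data: a constant and one regressor, censored at zero
xs = [(1.0, 0.5), (1.0, -1.0), (1.0, 2.0), (1.0, 0.0)]
ys = [0.0, 0.0, 1.7, 0.6]
beta, sigma = [0.2, 0.8], 1.1

standard = tobit_loglik(beta, sigma, xs, ys)
lower_limit = censored_loglik(beta, sigma, xs, ys, lower=0.0)
print(standard, lower_limit)
```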
Types of Tobit Models Amemiya (1984) classified Tobit models into five types based on characteristics of the likelihood function. For notational convenience, let P denote a distribution or density function, yji is assumed to be normally distributed with mean x0j i ˇj and variance j2 . Type 1 Tobit The Type 1 Tobit model was already discussed in the preceding section. y1i
D x01i ˇ1 C u1i
y1i
D y1i if y1i >0 D 0 if y1i 0
The likelihood function is characterized as P .y1 < 0/P .y1 /. Type 2 Tobit The Type 2 Tobit model is defined as y1i
D x01i ˇ1 C u1i
y2i
D x02i ˇ2 C u2i
y1i
D 1 if y1i >0 D 0 if y1i 0
y2i
D y2i if y1i >0 D 0 if y1i 0
Limited Dependent Variable Models F 1401
where .u1i ; u2i / N.0; †/. The likelihood function is described as P .y1 < 0/P .y1 > 0; y2 /. Type 3 Tobit of the Type 3 Tobit is observed The Type 3 Tobit model is different from the Type 2 Tobit in that y1i when y1i > 0. y1i
D x01i ˇ1 C u1i
y2i
D x02i ˇ2 C u2i
y1i
D y1i if y1i >0 D 0 if y1i 0
y2i
D y2i if y1i >0 D 0 if y1i 0
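where $(u_{1i}, u_{2i})' \sim \text{iid } N(0, \Sigma)$. The likelihood function is characterized as $P(y_1 < 0)\,P(y_1, y_2)$.

Type 4 Tobit

The Type 4 Tobit model consists of three equations:
\[
\begin{aligned}
y_{1i}^* &= x_{1i}'\beta_1 + u_{1i} \\
y_{2i}^* &= x_{2i}'\beta_2 + u_{2i} \\
y_{3i}^* &= x_{3i}'\beta_3 + u_{3i} \\
y_{1i} &= \begin{cases} y_{1i}^* & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases} \\
y_{2i} &= \begin{cases} y_{2i}^* & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases} \\
y_{3i} &= \begin{cases} y_{3i}^* & \text{if } y_{1i}^* \le 0 \\ 0 & \text{if } y_{1i}^* > 0 \end{cases}
\end{aligned}
\]
where $(u_{1i}, u_{2i}, u_{3i})' \sim \text{iid } N(0, \Sigma)$. The likelihood function of the Type 4 Tobit model is characterized as $P(y_1 < 0, y_3)\,P(y_1, y_2)$.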
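Type 5 Tobit

The Type 5 Tobit model is defined as follows:
\[
\begin{aligned}
y_{1i}^* &= x_{1i}'\beta_1 + u_{1i} \\
y_{2i}^* &= x_{2i}'\beta_2 + u_{2i} \\
y_{3i}^* &= x_{3i}'\beta_3 + u_{3i} \\
y_{1i} &= \begin{cases} 1 & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases} \\
y_{2i} &= \begin{cases} y_{2i}^* & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases} \\
y_{3i} &= \begin{cases} y_{3i}^* & \text{if } y_{1i}^* \le 0 \\ 0 & \text{if } y_{1i}^* > 0 \end{cases}
\end{aligned}
\]
where $(u_{1i}, u_{2i}, u_{3i})'$ are from an iid trivariate normal distribution. The likelihood function of the Type 5 Tobit model is characterized as $P(y_1 < 0, y_3)\,P(y_1 > 0, y_2)$.

Code examples for these models can be found in “Example 21.6: Types of Tobit Models” on page 1426.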
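Truncated Regression Models

In a truncated model, the observed sample is a subset of the population where the dependent variable falls in a certain range. For example, when neither a dependent variable nor exogenous variables are observed for $y_i^* \le 0$, the truncated regression model can be specified. Its log-likelihood function is
\[
\ell = \sum_{i \in \{y_i > 0\}} \left\{ -\ln \Phi\left( \frac{x_i'\beta}{\sigma} \right) + \ln\left[ \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) \Big/ \sigma \right] \right\}
\]
The two-limit truncation model is defined as
\[
y_i = y_i^* \quad \text{if } L_i < y_i^* < R_i
\]
The log-likelihood function of the two-limit truncated regression model is
\[
\ell = \sum_{i=1}^{N} \left\{ \ln\left[ \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) \Big/ \sigma \right]
 - \ln\left[ \Phi\left( \frac{R_i - x_i'\beta}{\sigma} \right) - \Phi\left( \frac{L_i - x_i'\beta}{\sigma} \right) \right] \right\}
\]
The log-likelihood functions of the lower- and upper-limit truncation models are
\[
\ell = \sum_{i=1}^{N} \left\{ \ln\left[ \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) \Big/ \sigma \right]
 - \ln\left[ 1 - \Phi\left( \frac{L_i - x_i'\beta}{\sigma} \right) \right] \right\} \quad \text{(lower)}
\]
\[
\ell = \sum_{i=1}^{N} \left\{ \ln\left[ \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) \Big/ \sigma \right]
 - \ln \Phi\left( \frac{R_i - x_i'\beta}{\sigma} \right) \right\} \quad \text{(upper)}
\]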
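The truncated log likelihoods differ from the censored ones in that every observation is rescaled by the probability mass between the truncation limits. The following sketch (Python standard library; the function names and the use of infinite defaults for a missing limit are assumptions for illustration, not QLIM internals) evaluates the two-limit form and recovers the one-sided forms as special cases.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def truncated_loglik(y, xb, sigma, lower=-math.inf, upper=math.inf):
    """Log likelihood of a regression truncated to lower < y < upper.

    With lower=-inf the expression reduces to the upper-limit form,
    and with upper=+inf it reduces to the lower-limit form.
    """
    total = 0.0
    for yi, xbi in zip(y, xb):
        if not (lower < yi < upper):
            raise ValueError("observation outside the truncation range")
        log_density = math.log(norm_pdf((yi - xbi) / sigma) / sigma)
        mass = norm_cdf((upper - xbi) / sigma) - norm_cdf((lower - xbi) / sigma)
        total += log_density - math.log(mass)
    return total

# Hypothetical data: truncation from below at zero
print(truncated_loglik([1.0, 2.5], [0.0, 1.0], 1.0, lower=0.0))
```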
Stochastic Frontier Production and Cost Models

Stochastic frontier production models were first developed by Aigner, Lovell, and Schmidt (1977) and Meeusen and van den Broeck (1977). The specification of these models allows for random shocks to production or cost but also includes a term for technological or cost inefficiency. Assuming that the production function takes a log-linear Cobb-Douglas form, the stochastic frontier production model can be written as
\[
\ln(y_i) = \beta_0 + \sum_{n} \beta_n \ln(x_{ni}) + \epsilon_i
\]
where $\epsilon_i = v_i - u_i$. The $v_i$ term represents the stochastic error component and $u_i$ is the nonnegative, technology inefficiency error component. The $v_i$ error component is assumed to be distributed iid normal and independently from $u_i$. If $u_i > 0$, the error term, $\epsilon_i$, is negatively skewed and represents technology inefficiency. If $u_i < 0$, the error term $\epsilon_i$ is positively skewed and represents cost inefficiency. PROC QLIM models the $u_i$ error component as a half normal, exponential, or truncated normal distribution.
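The Normal-Half Normal Model

In the case of the normal-half normal model, $v_i$ is iid $N(0, \sigma_v^2)$ and $u_i$ is iid $N^+(0, \sigma_u^2)$, with $v_i$ and $u_i$ independent of each other. Given the independence of error terms, the joint density of $v$ and $u$ can be written as
\[
f(u, v) = \frac{2}{2\pi\sigma_u\sigma_v} \exp\left\{ -\frac{u^2}{2\sigma_u^2} - \frac{v^2}{2\sigma_v^2} \right\}
\]
Substituting $v = \epsilon + u$ into the preceding equation gives
\[
f(u, \epsilon) = \frac{2}{2\pi\sigma_u\sigma_v} \exp\left\{ -\frac{u^2}{2\sigma_u^2} - \frac{(\epsilon + u)^2}{2\sigma_v^2} \right\}
\]
Integrating $u$ out to obtain the marginal density function of $\epsilon$ results in the following form:
\[
\begin{aligned}
f(\epsilon) &= \int_0^{\infty} f(u, \epsilon)\, du \\
 &= \frac{2}{\sqrt{2\pi}\,\sigma} \left[ 1 - \Phi\left( \frac{\epsilon\lambda}{\sigma} \right) \right] \exp\left\{ -\frac{\epsilon^2}{2\sigma^2} \right\} \\
 &= \frac{2}{\sigma}\, \phi\left( \frac{\epsilon}{\sigma} \right) \Phi\left( -\frac{\epsilon\lambda}{\sigma} \right)
\end{aligned}
\]
where $\lambda = \sigma_u / \sigma_v$ and $\sigma = \sqrt{\sigma_u^2 + \sigma_v^2}$.

In the case of a stochastic frontier cost model, $v = \epsilon - u$ and
\[
f(\epsilon) = \frac{2}{\sigma}\, \phi\left( \frac{\epsilon}{\sigma} \right) \Phi\left( \frac{\epsilon\lambda}{\sigma} \right)
\]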
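The log-likelihood function for the production model with $N$ producers is written as
\[
\ln L = \text{constant} - N \ln \sigma + \sum_{i} \ln \Phi\left( -\frac{\epsilon_i\lambda}{\sigma} \right) - \frac{1}{2\sigma^2} \sum_{i} \epsilon_i^2
\]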
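As a numerical check on the normal-half normal density, the sketch below (Python standard library; the function names are hypothetical) evaluates $f(\epsilon)$ for the production and cost cases and sums its logarithm over producers. Written out this way, the constant in the log likelihood equals $(N/2)\ln(2/\pi)$.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def half_normal_frontier_pdf(eps, sigma_u, sigma_v, cost=False):
    """Marginal density of eps = v - u (production) or eps = v + u (cost)."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    sign = 1.0 if cost else -1.0
    return (2.0 / sigma) * norm_pdf(eps / sigma) * norm_cdf(sign * eps * lam / sigma)

def half_normal_frontier_loglik(eps, sigma_u, sigma_v):
    """Production-model log likelihood, including the constant term."""
    return sum(math.log(half_normal_frontier_pdf(e, sigma_u, sigma_v)) for e in eps)

# Hypothetical residuals from a production frontier
print(half_normal_frontier_loglik([-0.4, 0.1, -1.2], sigma_u=1.0, sigma_v=0.5))
```

Because $u_i$ is half normal, the density has mean $-\sigma_u\sqrt{2/\pi}$ in the production case, which is one way to confirm the negative skew described above.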
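The Normal-Exponential Model

Under the normal-exponential model, $v_i$ is iid $N(0, \sigma_v^2)$ and $u_i$ is iid exponential. Given the independence of error term components $u_i$ and $v_i$, the joint density of $v$ and $u$ can be written as
\[
f(u, v) = \frac{1}{\sqrt{2\pi}\,\sigma_u\sigma_v} \exp\left\{ -\frac{u}{\sigma_u} - \frac{v^2}{2\sigma_v^2} \right\}
\]
The marginal density function of $\epsilon$ for the production function is
\[
f(\epsilon) = \int_0^{\infty} f(u, \epsilon)\, du
 = \frac{1}{\sigma_u}\, \Phi\left( -\frac{\epsilon}{\sigma_v} - \frac{\sigma_v}{\sigma_u} \right) \exp\left\{ \frac{\epsilon}{\sigma_u} + \frac{\sigma_v^2}{2\sigma_u^2} \right\}
\]
and the marginal density function for the cost function is equal to
\[
f(\epsilon) = \frac{1}{\sigma_u}\, \Phi\left( \frac{\epsilon}{\sigma_v} - \frac{\sigma_v}{\sigma_u} \right) \exp\left\{ -\frac{\epsilon}{\sigma_u} + \frac{\sigma_v^2}{2\sigma_u^2} \right\}
\]
The log-likelihood function for the normal-exponential production model with $N$ producers is
\[
\ln L = \text{constant} - N \ln \sigma_u + N\, \frac{\sigma_v^2}{2\sigma_u^2} + \sum_{i} \frac{\epsilon_i}{\sigma_u} + \sum_{i} \ln \Phi\left( -\frac{\epsilon_i}{\sigma_v} - \frac{\sigma_v}{\sigma_u} \right)
\]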
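The Normal-Truncated Normal Model

The normal-truncated normal model is a generalization of the normal-half normal model that allows the mean of $u_i$ to differ from zero. Under the normal-truncated normal model, the error term component $v_i$ is iid $N(0, \sigma_v^2)$ and $u_i$ is iid $N^+(\mu, \sigma_u^2)$. The joint density of $v_i$ and $u_i$ can be written as
\[
f(u, v) = \frac{1}{2\pi\sigma_u\sigma_v\, \Phi(\mu/\sigma_u)} \exp\left\{ -\frac{(u - \mu)^2}{2\sigma_u^2} - \frac{v^2}{2\sigma_v^2} \right\}
\]
The marginal density function of $\epsilon$ for the production function is
\[
\begin{aligned}
f(\epsilon) &= \int_0^{\infty} f(u, \epsilon)\, du \\
 &= \frac{1}{\sqrt{2\pi}\,\sigma\, \Phi(\mu/\sigma_u)}\, \Phi\left( \frac{\mu}{\sigma\lambda} - \frac{\epsilon\lambda}{\sigma} \right) \exp\left\{ -\frac{(\epsilon + \mu)^2}{2\sigma^2} \right\} \\
 &= \frac{1}{\sigma}\, \phi\left( \frac{\epsilon + \mu}{\sigma} \right) \Phi\left( \frac{\mu}{\sigma\lambda} - \frac{\epsilon\lambda}{\sigma} \right) \left[ \Phi\left( \frac{\mu}{\sigma_u} \right) \right]^{-1}
\end{aligned}
\]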
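and the marginal density function for the cost function is
\[
f(\epsilon) = \frac{1}{\sigma}\, \phi\left( \frac{\epsilon - \mu}{\sigma} \right) \Phi\left( \frac{\mu}{\sigma\lambda} + \frac{\epsilon\lambda}{\sigma} \right) \left[ \Phi\left( \frac{\mu}{\sigma_u} \right) \right]^{-1}
\]
The log-likelihood function for the normal-truncated normal production model with $N$ producers is
\[
\ln L = \text{constant} - N \ln \sigma - N \ln \Phi\left( \frac{\mu}{\sigma_u} \right) + \sum_{i} \ln \Phi\left( \frac{\mu}{\sigma\lambda} - \frac{\epsilon_i\lambda}{\sigma} \right) - \frac{1}{2} \sum_{i} \left( \frac{\epsilon_i + \mu}{\sigma} \right)^2
\]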
For more detail on normal-half normal, normal-exponential, and normal-truncated models, see Kumbhakar and Knox Lovell (2000) and Coelli, Prasada Rao, and Battese (1998).
Heteroscedasticity and Box-Cox Transformation

Heteroscedasticity

If the variance of the regression disturbance, $\epsilon_i$, is heteroscedastic, the variance can be specified as a function of variables
\[
E(\epsilon_i^2) = \sigma_i^2 = f(z_i'\gamma)
\]
The following table shows various functional forms of heteroscedasticity and the corresponding options to request each model.

No.  Model                                                                 Options
1    $f(z_i'\gamma) = \sigma^2 (1 + \exp(z_i'\gamma))$                     link=EXP (default)
2    $f(z_i'\gamma) = \sigma^2 \exp(z_i'\gamma)$                           link=EXP noconst
3    $f(z_i'\gamma) = \sigma^2 (1 + \sum_{l=1}^{L} \gamma_l z_{li})$       link=LINEAR
4    $f(z_i'\gamma) = \sigma^2 (1 + (\sum_{l=1}^{L} \gamma_l z_{li})^2)$   link=LINEAR square
5    $f(z_i'\gamma) = \sigma^2 (\sum_{l=1}^{L} \gamma_l z_{li})$           link=LINEAR noconst
6    $f(z_i'\gamma) = \sigma^2 ((\sum_{l=1}^{L} \gamma_l z_{li})^2)$       link=LINEAR square noconst
For discrete choice models, $\sigma^2$ is normalized ($\sigma^2 = 1$) since this parameter is not identified. Note that in models 3 and 5, it may be possible that variances of some observations are negative. Although the QLIM procedure assigns a large penalty to move the optimization away from such a region, it is possible that the optimization cannot improve the objective function value and gets locked in the region. Signs of such an outcome include extremely small likelihood values or missing standard errors in the estimates. In models 2 and 6, variances are guaranteed to be greater than or equal to zero, but it may be possible that variances of some observations are very close to zero. In these scenarios, standard errors may be missing. Models 1 and 4 do not have such problems. Variances in these models are always positive and never close to zero.

The heteroscedastic regression model is estimated using the following log-likelihood function:
\[
\ell = -\frac{N}{2} \ln(2\pi) - \sum_{i=1}^{N} \frac{1}{2} \ln(\sigma_i^2) - \frac{1}{2} \sum_{i=1}^{N} \left( \frac{e_i}{\sigma_i} \right)^2
\]
where $e_i = y_i - x_i'\beta$.
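The six variance functions in the table differ only in the link and in whether the constant 1 is kept. A minimal Python sketch makes the mapping explicit and shows why models 3 and 5 can yield a negative variance; the function name and its keyword arguments are hypothetical and merely mirror the option names in the table.

```python
import math

def hetero_variance(sigma2, z, gamma, link="exp", square=False, noconst=False):
    """Variance sigma_i^2 = f(z_i'gamma) for the six forms in the table."""
    zg = sum(g * zl for g, zl in zip(gamma, z))   # z_i'gamma
    if link == "exp":
        if square:
            raise ValueError("square applies only to the linear link")
        core = math.exp(zg)
    elif link == "linear":
        core = zg * zg if square else zg
    else:
        raise ValueError("link must be 'exp' or 'linear'")
    return sigma2 * (core if noconst else 1.0 + core)

z, gamma = [1.0, -2.0], [0.5, 1.0]                      # z'gamma = -1.5
print(hetero_variance(2.0, z, gamma))                   # model 1: always > sigma2
print(hetero_variance(2.0, z, gamma, link="linear"))    # model 3: negative here
print(hetero_variance(2.0, z, gamma, link="linear", square=True))  # model 4
```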
Box-Cox Modeling

The Box-Cox transformation on $x$ is defined as
\[
x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{if } \lambda \ne 0 \\ \ln(x) & \text{if } \lambda = 0 \end{cases}
\]
The Box-Cox regression model with heteroscedasticity is written as
\[
y_i^{(\lambda_0)} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ki}^{(\lambda_k)} + \epsilon_i = \mu_i + \epsilon_i
\]
where $\epsilon_i \sim N(0, \sigma_i^2)$ and transformed variables must be positive. In practice, too many transformation parameters cause numerical problems in model fitting. It is common to have the same Box-Cox transformation performed on all the variables, that is, $\lambda_0 = \lambda_1 = \cdots = \lambda_K$. It is required for the magnitude of transformed variables to be in the tolerable range if the corresponding transformation parameters are $|\lambda| > 1$.

The log-likelihood function of the Box-Cox regression model is written as
\[
\ell = -\frac{N}{2} \ln(2\pi) - \sum_{i=1}^{N} \ln(\sigma_i) - \sum_{i=1}^{N} \frac{e_i^2}{2\sigma_i^2} + (\lambda_0 - 1) \sum_{i=1}^{N} \ln(y_i)
\]
where $e_i = y_i^{(\lambda_0)} - \mu_i$.
When the dependent variable is discrete, censored, or truncated, the Box-Cox transformation can be applied only to explanatory variables.
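The two branches of the Box-Cox transformation agree in the limit: as $\lambda \to 0$, $(x^{\lambda} - 1)/\lambda$ converges to $\ln(x)$. A short Python sketch (the function name is hypothetical) shows the transformation and the positivity requirement.

```python
import math

def box_cox(x, lam):
    """Box-Cox transformation x^(lambda); x must be positive."""
    if x <= 0:
        raise ValueError("Box-Cox transformation requires a positive value")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

print(box_cox(4.0, 0.5))    # square-root-type transformation
print(box_cox(4.0, 1.0))    # lambda = 1 only shifts the variable by -1
print(box_cox(4.0, 0.0))    # log transformation
```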
Bivariate Limited Dependent Variable Modeling

The generic form of a bivariate limited dependent variable model is
\[
\begin{aligned}
y_{1i}^* &= x_{1i}'\beta_1 + \epsilon_{1i} \\
y_{2i}^* &= x_{2i}'\beta_2 + \epsilon_{2i}
\end{aligned}
\]
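where the disturbances, $\epsilon_{1i}$ and $\epsilon_{2i}$, have joint normal distribution with zero mean, standard deviations $\sigma_1$ and $\sigma_2$, and correlation of $\rho$. $y_1^*$ and $y_2^*$ are latent variables. The dependent variables $y_1$ and $y_2$ are observed if the latent variables $y_1^*$ and $y_2^*$ fall in certain ranges:
\[
\begin{aligned}
y_1 &= y_{1i} \quad \text{if } y_{1i}^* \in D_1(y_{1i}) \\
y_2 &= y_{2i} \quad \text{if } y_{2i}^* \in D_2(y_{2i})
\end{aligned}
\]
$D$ is a transformation from $(y_{1i}^*, y_{2i}^*)$ to $(y_{1i}, y_{2i})$. For example, if $y_1$ and $y_2$ are censored variables with lower bound 0, then
\[
\begin{aligned}
y_{1i} &= y_{1i}^* \ \text{if } y_{1i}^* > 0, \qquad y_{1i} = 0 \ \text{if } y_{1i}^* \le 0 \\
y_{2i} &= y_{2i}^* \ \text{if } y_{2i}^* > 0, \qquad y_{2i} = 0 \ \text{if } y_{2i}^* \le 0
\end{aligned}
\]
There are three cases for the log likelihood of $(y_{1i}, y_{2i})$. The first case is that $y_{1i} = y_{1i}^*$ and $y_{2i} = y_{2i}^*$. That is, this observation is mapped to one point in the space of latent variables. The log likelihood is computed from a bivariate normal density,
\[
\ell_i = \ln \phi_2\left( \frac{y_1 - x_1'\beta_1}{\sigma_1}, \frac{y_2 - x_2'\beta_2}{\sigma_2}, \rho \right) - \ln \sigma_1 - \ln \sigma_2
\]
where $\phi_2(u, v, \rho)$ is the density function for the standardized bivariate normal distribution with correlation $\rho$,
\[
\phi_2(u, v, \rho) = \frac{ e^{-(1/2)(u^2 + v^2 - 2\rho u v)/(1 - \rho^2)} }{ 2\pi (1 - \rho^2)^{1/2} }
\]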
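The second case is that one observed dependent variable is mapped to a point of its latent variable and the other dependent variable is mapped to a segment in the space of its latent variable. For example, in the bivariate censored model specified, if observed $y_1 > 0$ and $y_2 = 0$, then $y_1^* = y_1$ and $y_2^* \in (-\infty, 0]$. In general, the log likelihood for one observation can be written as follows (the subscript $i$ is dropped for simplicity): If one set is a single point and the other set is a range, without loss of generality, let $D_1(y_1) = \{y_1\}$ and $D_2(y_2) = [L_2, R_2]$,
\[
\begin{aligned}
\ell_i = {} & \ln \phi\left( \frac{y_1 - x_1'\beta_1}{\sigma_1} \right) - \ln \sigma_1 \\
 & + \ln\left[ \Phi\left( \frac{ \dfrac{R_2 - x_2'\beta_2}{\sigma_2} - \rho\, \dfrac{y_1 - x_1'\beta_1}{\sigma_1} }{ \sqrt{1 - \rho^2} } \right)
 - \Phi\left( \frac{ \dfrac{L_2 - x_2'\beta_2}{\sigma_2} - \rho\, \dfrac{y_1 - x_1'\beta_1}{\sigma_1} }{ \sqrt{1 - \rho^2} } \right) \right]
\end{aligned}
\]
Here the arguments of $\Phi$ are scaled by $\sqrt{1 - \rho^2}$ because, conditional on $y_1$, the standardized second latent variable is normal with mean $\rho (y_1 - x_1'\beta_1)/\sigma_1$ and variance $1 - \rho^2$.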
where $\phi$ and $\Phi$ are the density function and the cumulative distribution function for the standardized univariate normal distribution.

The third case is that both dependent variables are mapped to segments in the space of latent variables. For example, in the bivariate censored model specified, if observed $y_1 = 0$ and $y_2 = 0$, then $y_1^* \in (-\infty, 0]$ and $y_2^* \in (-\infty, 0]$. In general, if $D_1(y_1) = [L_1, R_1]$ and $D_2(y_2) = [L_2, R_2]$, the log likelihood is
\[
\ell_i = \ln \int_{\frac{L_1 - x_1'\beta_1}{\sigma_1}}^{\frac{R_1 - x_1'\beta_1}{\sigma_1}} \int_{\frac{L_2 - x_2'\beta_2}{\sigma_2}}^{\frac{R_2 - x_2'\beta_2}{\sigma_2}} \phi_2(u, v, \rho)\, du\, dv
\]
Selection Models

In sample selection models, one or several dependent variables are observed when another variable takes certain values. For example, the standard Heckman selection model can be defined as
\[
\begin{aligned}
z_i^* &= w_i'\gamma + u_i \\
z_i &= \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{if } z_i^* \le 0 \end{cases} \\
y_i &= x_i'\beta + \epsilon_i \quad \text{if } z_i = 1
\end{aligned}
\]
where $u_i$ and $\epsilon_i$ are jointly normal with zero mean, standard deviations of 1 and $\sigma$, and correlation of $\rho$. $z$ is the variable that the selection is based on, and $y$ is observed when $z$ has a value of 1. Least squares regression using the observed data of $y$ produces inconsistent estimates of $\beta$. The maximum likelihood method is used to estimate selection models. It is also possible to estimate these models by using Heckman’s method, which is more computationally efficient. But it can be shown that the resulting estimates, although consistent, are not asymptotically efficient under the normality assumption. Moreover, this method often violates the constraint on the correlation coefficient $|\rho| \le 1$.

The log-likelihood function of the Heckman selection model is written as
\[
\begin{aligned}
\ell = {} & \sum_{i \in \{z_i = 0\}} \ln\left[ 1 - \Phi(w_i'\gamma) \right] \\
 & + \sum_{i \in \{z_i = 1\}} \left\{ \ln \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) - \ln \sigma + \ln \Phi\left( \frac{ w_i'\gamma + \rho\, \dfrac{y_i - x_i'\beta}{\sigma} }{ \sqrt{1 - \rho^2} } \right) \right\}
\end{aligned}
\]
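The Heckman log likelihood has one term for unselected observations and a three-part term for selected ones. The sketch below evaluates it in Python for given linear predictors; it is illustrative only, and the argument names (`wg` for $w_i'\gamma$, `xb` for $x_i'\beta$) are assumptions.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def heckman_loglik(z, y, wg, xb, sigma, rho):
    """Log likelihood of the standard Heckman selection model.

    z  : 0/1 selection indicators
    y  : outcomes (ignored when z == 0, may be None)
    wg : selection index w_i'gamma;  xb : outcome index x_i'beta
    """
    total = 0.0
    for zi, yi, wgi, xbi in zip(z, y, wg, xb):
        if zi == 0:
            total += math.log(1.0 - norm_cdf(wgi))
        else:
            e = (yi - xbi) / sigma                      # standardized residual
            arg = (wgi + rho * e) / math.sqrt(1.0 - rho * rho)
            total += math.log(norm_pdf(e)) - math.log(sigma) + math.log(norm_cdf(arg))
    return total

# Hypothetical data: one unselected and two selected observations
print(heckman_loglik([0, 1, 1], [None, 1.2, -0.3],
                     [0.2, 0.5, 1.0], [0.0, 1.0, 0.0], sigma=1.5, rho=0.4))
```

With $\rho = 0$ the selected-observation term separates into an ordinary regression density and a probit probability, which is why least squares is consistent only in that case.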
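Only one variable is allowed for the selection to be based on, but the selection may lead to several variables. For example, in the following switching regression model,
\[
\begin{aligned}
z_i^* &= w_i'\gamma + u_i \\
z_i &= \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{if } z_i^* \le 0 \end{cases} \\
y_{1i} &= x_{1i}'\beta_1 + \epsilon_{1i} \quad \text{if } z_i = 0 \\
y_{2i} &= x_{2i}'\beta_2 + \epsilon_{2i} \quad \text{if } z_i = 1
\end{aligned}
\]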
z is the variable that the selection is based on. If z D 0, then y1 is observed. If z D 1, then y2 is observed. Because it is never the case that y1 and y2 are observed at the same time, the correlation between y1 and y2 cannot be estimated. Only the correlation between z and y1 and the correlation between z and y2 can be estimated. This estimation uses the maximum likelihood method. A brief example of the code for this model can be found in “Example 21.4: Sample Selection Model” on page 1422.
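The Heckman selection model can include censoring or truncation. For a brief example of the code for these models see “Example 21.5: Sample Selection Model with Truncation and Censoring” on page 1423. The following example shows a variable $y_i$ that is censored from below at zero.
\[
\begin{aligned}
z_i^* &= w_i'\gamma + u_i \\
z_i &= \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{if } z_i^* \le 0 \end{cases} \\
y_i^* &= x_i'\beta + \epsilon_i \quad \text{if } z_i = 1 \\
y_i &= \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}
\end{aligned}
\]
In this case, the log-likelihood function of the Heckman selection model needs to be modified to include the censored region.
\[
\begin{aligned}
\ell = {} & \sum_{\{i \mid z_i = 0\}} \ln\left[ 1 - \Phi(w_i'\gamma) \right] \\
 & + \sum_{\{i \mid z_i = 1,\, y_i = y_i^*\}} \left\{ \ln\left[ \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) \right] - \ln \sigma + \ln\left[ \Phi\left( \frac{ w_i'\gamma + \rho\, \dfrac{y_i - x_i'\beta}{\sigma} }{ \sqrt{1 - \rho^2} } \right) \right] \right\} \\
 & + \sum_{\{i \mid z_i = 1,\, y_i = 0\}} \ln \int_{-\infty}^{-x_i'\beta/\sigma} \int_{-w_i'\gamma}^{\infty} \phi_2(u, v, \rho)\, du\, dv
\end{aligned}
\]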
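In case $y_i$ is truncated from below at zero instead of censored, the likelihood function can be written as
\[
\begin{aligned}
\ell = {} & \sum_{\{i \mid z_i = 0\}} \ln\left[ 1 - \Phi(w_i'\gamma) \right] \\
 & + \sum_{\{i \mid z_i = 1\}} \left\{ \ln\left[ \phi\left( \frac{y_i - x_i'\beta}{\sigma} \right) \right] - \ln \sigma + \ln\left[ \Phi\left( \frac{ w_i'\gamma + \rho\, \dfrac{y_i - x_i'\beta}{\sigma} }{ \sqrt{1 - \rho^2} } \right) \right] - \ln\left[ \Phi\left( \frac{x_i'\beta}{\sigma} \right) \right] \right\}
\end{aligned}
\]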
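Multivariate Limited Dependent Models

The multivariate model is similar to bivariate models. The generic form of the multivariate limited dependent variable model is
\[
\begin{aligned}
y_{1i}^* &= x_{1i}'\beta_1 + \epsilon_{1i} \\
y_{2i}^* &= x_{2i}'\beta_2 + \epsilon_{2i} \\
 &\;\;\vdots \\
y_{mi}^* &= x_{mi}'\beta_m + \epsilon_{mi}
\end{aligned}
\]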
where $m$ is the number of models to be estimated. The vector $\epsilon$ has a multivariate normal distribution with mean 0 and variance-covariance matrix $\Sigma$. Similar to bivariate models, the likelihood may involve computing multivariate normal integrations. This is done using Monte Carlo integration. (See Genz (1992) and Hajivassiliou and McFadden (1998).)

When the number of equations, $N$, increases in a system, the number of parameters increases at the rate of $N^2$ because of the correlation matrix. When the number of parameters is large, sometimes the optimization converges but some of the standard deviations are missing. This usually means that the model is over-parameterized. The default method for computing the covariance is to use the inverse Hessian matrix. The Hessian is computed by finite differences, and in over-parameterized cases, the inverse cannot be computed. It is recommended that you reduce the number of parameters in such cases. Sometimes using the outer product covariance matrix (COVEST=OP option) may also help.
Tests on Parameters

In general, the hypothesis tested can be written as
\[
H_0 : h(\theta) = 0
\]
where $h(\theta)$ is an $r$ by 1 vector valued function of the parameters $\theta$ given by the $r$ expressions specified in the TEST statement.

Let $\hat{V}$ be the estimate of the covariance matrix of $\hat{\theta}$. Let $\hat{\theta}$ be the unconstrained estimate of $\theta$ and $\tilde{\theta}$ be the constrained estimate of $\theta$ such that $h(\tilde{\theta}) = 0$. Let
\[
A(\theta) = \partial h(\theta) / \partial \theta \,\big|_{\hat{\theta}}
\]
Using this notation, the test statistics for the three kinds of tests are computed as follows.

The Wald test statistic is defined as
\[
W = h'(\hat{\theta}) \left\{ A(\hat{\theta})\, \hat{V} A'(\hat{\theta}) \right\}^{-1} h(\hat{\theta})
\]
The Wald test is not invariant to reparameterization of the model (Gregory 1985; Gallant 1987, p. 219). For more information about the theoretical properties of the Wald test, see Phillips and Park (1988).

The Lagrange multiplier test statistic is
\[
LM = \lambda' A(\tilde{\theta})\, \tilde{V} A'(\tilde{\theta})\, \lambda
\]
where $\lambda$ is the vector of Lagrange multipliers from the computation of the restricted estimate $\tilde{\theta}$.

The likelihood ratio test statistic is
\[
LR = 2 \left( L(\hat{\theta}) - L(\tilde{\theta}) \right)
\]
where $\tilde{\theta}$ represents the constrained estimate of $\theta$ and $L$ is the concentrated log-likelihood value.

For each kind of test, under the null hypothesis the test statistic is asymptotically distributed as a $\chi^2$ random variable with $r$ degrees of freedom, where $r$ is the number of expressions in the TEST statement. The p-values reported for the tests are computed from the $\chi^2(r)$ distribution and are only asymptotically valid.

Monte Carlo simulations suggest that the asymptotic distribution of the Wald test is a poorer approximation to its small sample distribution than that of the other two tests. However, the Wald test has the lowest computational cost, since it does not require computation of the constrained estimate $\tilde{\theta}$.

The following is an example of using the TEST statement to perform a likelihood ratio test:

    proc qlim;
       model y = x1 x2 x3;
       test x1 = 0, x2 * .5 + 2 * x3 = 0 /lr;
    run;
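To make the likelihood ratio computation concrete, the following Python sketch forms the LR statistic from two log-likelihood values and converts it to an asymptotic p-value with $r$ degrees of freedom. The log-likelihood values are made up for illustration, and the chi-square tail probability uses a simple series for the regularized incomplete gamma function rather than whatever routine the procedure uses internally.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square(df) variable at x."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, h)
    term, total, n = 1.0, 1.0, 0
    while term > 1e-16 * total:
        n += 1
        term *= h / (a + n)
        total += term
    lower = math.exp(a * math.log(h) - h - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

def lr_test(loglik_unrestricted, loglik_restricted, r):
    """Likelihood ratio statistic and its asymptotic p-value."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return stat, chi2_sf(stat, r)

# Hypothetical fit values; r = 2 matches the two expressions in the TEST statement above
stat, p_value = lr_test(-100.0, -103.0, r=2)
print(stat, p_value)
```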
Output to SAS Data Set

XBeta, Predicted, Residual

Xbeta is the structural part on the right-hand side of the model. Predicted value is the predicted dependent variable value. For censored variables, if the predicted value is outside the boundaries, it is reported as the closest boundary. For discrete variables, it is the level whose boundaries Xbeta falls between. Residual is defined only for continuous variables and is defined as
\[
\text{Residual} = \text{Observed} - \text{Predicted}
\]

Error Standard Deviation

Error standard deviation is $\sigma_i$ in the model. It varies only when the HETERO statement is used.
Marginal Effects

Marginal effect is defined as a contribution of one control variable to the response variable. For the binary choice model with two response categories, $\mu_0 = -\infty$, $\mu_1 = 0$, $\mu_2 = \infty$; and ordinal response model with $M$ response categories, $\mu_0, \ldots, \mu_M$, define
\[
R_{i,j} = \mu_j - x_i'\beta
\]
The probability that the unobserved dependent variable is contained in the $j$th category can be written as
\[
P[\mu_{j-1} < y_i^* \le \mu_j] = F(R_{i,j}) - F(R_{i,j-1})
\]
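The marginal effect of changes in the regressors on the probability of $y_i = j$ is then
\[
\frac{\partial \text{Prob}[y_i = j]}{\partial x} = \left[ f(\mu_{j-1} - x_i'\beta) - f(\mu_j - x_i'\beta) \right] \beta
\]
where $f(x) = \frac{dF(x)}{dx}$. In particular,
\[
f(x) = \frac{dF(x)}{dx} = \begin{cases} \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2} & \text{(probit)} \\[2ex] \dfrac{e^{-x}}{\left[ 1 + e^{-x} \right]^2} & \text{(logit)} \end{cases}
\]
The marginal effects in the Box-Cox regression model are
\[
\frac{\partial E[y_i]}{\partial x} = \beta\, \frac{x^{\lambda_k - 1}}{y^{\lambda_0 - 1}}
\]
The marginal effects in the truncated regression model are
\[
\frac{\partial E[y_i \mid L_i < y_i^* < R_i]}{\partial x} = \beta \left[ 1 - \frac{(\phi(a_i) - \phi(b_i))^2}{(\Phi(b_i) - \Phi(a_i))^2} + \frac{a_i \phi(a_i) - b_i \phi(b_i)}{\Phi(b_i) - \Phi(a_i)} \right]
\]
where $a_i = \frac{L_i - x_i'\beta}{\sigma_i}$ and $b_i = \frac{R_i - x_i'\beta}{\sigma_i}$.

The marginal effects in the censored regression model are
\[
\frac{\partial E[y \mid x_i]}{\partial x} = \beta \times \text{Prob}[L_i < y_i^* < R_i]
\]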
Inverse Mills Ratio, Expected and Conditionally Expected Values

Expected and conditionally expected values are computed only for continuous variables. The inverse Mills ratio is computed for censored or truncated continuous, binary discrete, and selection endogenous variables.

Let $L_i$ and $R_i$ be the lower boundary and upper boundary, respectively, for the $y_i$. Define $a_i = \frac{L_i - x_i'\beta}{\sigma_i}$ and $b_i = \frac{R_i - x_i'\beta}{\sigma_i}$. Then the inverse Mills ratio is defined as
\[
\lambda_i = \frac{\phi(a_i) - \phi(b_i)}{\Phi(b_i) - \Phi(a_i)}
\]
for a continuous variable and defined as
\[
\lambda = \frac{\phi(x_i'\beta)}{\Phi(x_i'\beta)}
\]
for a binary discrete variable.

The expected value is the unconditional expectation of the dependent variable. For a censored variable, it is
\[
E[y_i] = \Phi(a_i) L_i + (x_i'\beta + \lambda \sigma_i)(\Phi(b_i) - \Phi(a_i)) + (1 - \Phi(b_i)) R_i
\]
For a left-censored variable ($R_i = \infty$), this formula is
\[
E[y_i] = \Phi(a_i) L_i + (x_i'\beta + \lambda \sigma_i)(1 - \Phi(a_i))
\]
where $\lambda = \frac{\phi(a_i)}{1 - \Phi(a_i)}$.

For a right-censored variable ($L_i = -\infty$), this formula is
\[
E[y_i] = (x_i'\beta + \lambda \sigma_i)\, \Phi(b_i) + (1 - \Phi(b_i)) R_i
\]
where $\lambda = -\frac{\phi(b_i)}{\Phi(b_i)}$.

For a noncensored variable, this formula is
\[
E[y_i] = x_i'\beta
\]
The conditional expected value is the expectation given that the variable is inside the boundaries:
\[
E[y_i \mid L_i < y_i < R_i] = x_i'\beta + \lambda \sigma_i
\]
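The inverse Mills ratio and the two expectations can be computed together, since both expectations reuse $\lambda$. The following Python sketch follows the formulas above for a continuous variable; the function name and its return order are assumptions, and infinite limits stand for a missing boundary.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z); returns 0.0 at +/- infinity."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def censored_moments(xb, sigma, lower=-math.inf, upper=math.inf):
    """Return (inverse Mills ratio, unconditional E[y], conditional E[y])."""
    a = (lower - xb) / sigma
    b = (upper - xb) / sigma
    mass = norm_cdf(b) - norm_cdf(a)               # Prob[L < y* < R]
    mills = (norm_pdf(a) - norm_pdf(b)) / mass
    conditional = xb + mills * sigma
    expected = conditional * mass
    if math.isfinite(lower):
        expected += norm_cdf(a) * lower            # mass piled up at L
    if math.isfinite(upper):
        expected += (1.0 - norm_cdf(b)) * upper    # mass piled up at R
    return mills, expected, conditional

# Left-censored at zero, with x'beta = 0 and sigma = 1
print(censored_moments(0.0, 1.0, lower=0.0))
```

The returned `mass` factor is also the multiplier on $\beta$ in the censored-regression marginal effect given earlier.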
Probability Probability applies only to discrete responses. It is the marginal probability that the discrete response is taking the value of the observation. If the PROBALL option is specified, then the probability for all of the possible responses of the discrete variables is computed.
OUTEST= Data Set

The OUTEST= data set contains all the parameters estimated in a MODEL statement. The OUTEST= option can be used when the PROC QLIM call contains one MODEL statement:

    proc qlim data=a outest=e;
       model y = x1 x2 x3;
       endogenous y ~ censored(lb=0);
    run;

Each parameter variable contains the estimate for the corresponding parameter in the corresponding model. In addition, the OUTEST= data set contains the following variables:

_NAME_      the name of the independent variable
_TYPE_      type of observation. PARM indicates the row of coefficients; STD indicates the row of standard deviations of the corresponding coefficients.
_STATUS_    convergence status for optimization
The rest of the columns correspond to the explanatory variables. The OUTEST= data set contains one observation for the MODEL statement, giving the parameter estimates for that model. If the COVOUT option is specified, the OUTEST= data set includes additional observations for the MODEL statement, giving the rows of the covariance matrix of parameter estimates. For covariance observations, the value of the _TYPE_ variable is COV, and the _NAME_ variable identifies the parameter associated with that row of the covariance matrix. If the CORROUT option is specified, the OUTEST= data set includes additional observations for the MODEL statement, giving the rows of the correlation matrix of parameter estimates. For correlation observations, the value of the _TYPE_ variable is CORR, and the _NAME_ variable identifies the parameter associated with that row of the correlation matrix.
Naming

Naming of Parameters

When there is only one equation in the estimation, parameters are named in the same way as in other SAS procedures such as REG and PROBIT. The constant in the regression equation is called Intercept. The coefficients on independent variables are named by the independent variables. The standard deviation of the errors is called _Sigma. If there are Box-Cox transformations, the coefficients are named _Lambdai, where i increments from 1, or as specified by the user. The limits for the discrete dependent variable are named _Limiti. If the LIMIT1=VARYING option is specified, then _Limiti starts from 1. If the LIMIT1=VARYING option is not specified, then _Limit1 is set to 0 and the limit parameters start from i = 2. If the HETERO statement is included, the coefficients of the independent variables in the hetero equation are called _H.x, where x is the name of the independent variable. If the parameter name includes interaction terms, it needs to be enclosed in quotation marks followed by N. The following example restricts the parameter that includes the interaction term to be greater than zero:

    proc qlim data=a;
       model y = x1|x2;
       endogenous y ~ discrete;
       restrict "x1*x2"N>0;
    run;

When there are multiple equations in the estimation, the parameters in the main equation are named in the format of y.x, where y is the name of the dependent variable and x is the name of the independent variable. The standard deviation of the errors is called _Sigma.y. The correlation of the errors is called _Rho for a bivariate model. For a model with three variables, the correlations are named _Rho.y1.y2, _Rho.y1.y3, and _Rho.y2.y3. The construction of correlation names for multivariate models is analogous. Box-Cox parameters are called _Lambdai.y and limit variables are called _Limiti.y. Parameters in the HETERO statement are named as _H.y.x. In the OUTEST= data set, each ’.’ in a variable name is changed to ’_’.
Naming of Output Variables

The following table shows the option in the OUTPUT statement, with the corresponding variable names and their explanation.

Option        Name           Explanation
PREDICTED     P_y            Predicted value of y
RESIDUAL      RESID_y        Residual of y, (y - PredictedY)
XBETA         XBETA_y        Structure part ($x'\beta$) of y equation
ERRSTD        ERRSTD_y       Standard deviation of error term
PROB          PROB_y         Probability that y is taking the observed value in this observation (discrete y only)
PROBALL       PROBi_y        Probability that y is taking the ith value (discrete y only)
MILLS         MILLS_y        Inverse Mills ratio for y
EXPECTED      EXPCT_y        Unconditional expected value of y
CONDITIONAL   CEXPCT_y       Conditional expected value of y, condition on the truncation
MARGINAL      MEFF_x         Marginal effect of x on y ($\partial y / \partial x$) with single equation
              MEFF_y_x       Marginal effect of x on y ($\partial y / \partial x$) with multiple equations
              MEFF_Pi_x      Marginal effect of x on y ($\partial \text{Prob}(y = i) / \partial x$) with single equation and discrete y
              MEFF_Pi_y_x    Marginal effect of x on y ($\partial \text{Prob}(y = i) / \partial x$) with multiple equations and discrete y

If you prefer to name the output variables differently, you can use the RENAME option in the data set. For example, the following statements rename the residual of y as Resid:

    proc qlim data=one;
       model y = x1-x10 / censored;
       output out=outds(rename=(resid_y=resid)) residual;
    run;
ODS Table Names

PROC QLIM assigns a name to each table it creates. You can use these names to denote the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 21.2.

Table 21.2  ODS Tables Produced in PROC QLIM by the Model Statement

ODS Table Name               Description                                  Option

ODS Tables Created by the Model Statement
ResponseProfile              Response Profile                             default
ClassLevels                  Class Levels                                 default
FitSummary                   Summary of Non-linear Estimation             default
GoodnessOfFit                Pseudo-R2 Measures                           default
ConvergenceStatus            Convergence Status                           default
ParameterEstimates           Parameter Estimates                          default
CovB                         Covariance of Parameter Estimates            COVB
CorrB                        Correlation of Parameter Estimates           CORRB
LinCon                       Linear Constraints                           ITPRINT
InputOptions                 Input Options                                ITPRINT
ProblemDescription           Problem Description                          ITPRINT
IterStart                    Optimization Start                           ITPRINT
IterHist                     Iteration History                            ITPRINT
IterStop                     Optimization Results                         ITPRINT
ConvergenceStatus            Convergence Status                           ITPRINT
ParameterEstimatesResults    Resulting Parameters                         ITPRINT
LinConSol                    Linear Constraints Evaluated at Solution     ITPRINT

ODS Tables Created by the Test Statement
TestResults                  Test Results                                 default
Examples: QLIM Procedure
Example 21.1: Ordered Data Modeling

Cameron and Trivedi (1986) studied Australian Health Survey data. Variable definitions are given in Cameron and Trivedi (1998, p. 68). The dependent variable, dvisits, has nine ordered values. The following SAS statements estimate the ordinal probit model:

    /*-- Ordered Discrete Responses --*/
    proc qlim data=docvisit;
       model dvisits = sex age agesq income levyplus
                       freepoor freerepa illness actdays hscore
                       chcond1 chcond2 / discrete;
    run;
The output of the QLIM procedure for ordered data modeling is shown in Output 21.1.1.

Output 21.1.1 Ordered Data Modeling

    Binary Data

    The QLIM Procedure

    Discrete Response Profile of dvisits

    Index    Value    Frequency    Percent
        1        0         4141      79.79
        2        1          782      15.07
        3        2          174       3.35
        4        3           30       0.58
        5        4           24       0.46
        6        5            9       0.17
        7        6           12       0.23
        8        7           12       0.23
        9        8            6       0.12

Output 21.1.1 continued

    Model Fit Summary

    Number of Endogenous Variables               1
    Endogenous Variable                    dvisits
    Number of Observations                    5190
    Log Likelihood                           -3138
    Maximum Absolute Gradient            0.0003675
    Number of Iterations                        82
    Optimization Method               Quasi-Newton
    AIC                                       6316
    Schwarz Criterion                         6447

    Goodness-of-Fit Measures

    Measure                    Value     Formula
    Likelihood Ratio (R)      789.73     2 * (LogL - LogL0)
    Upper Bound of R (U)      7065.9     - 2 * LogL0
    Aldrich-Nelson            0.1321     R / (R+N)
    Cragg-Uhler 1             0.1412     1 - exp(-R/N)
    Cragg-Uhler 2             0.1898     (1-exp(-R/N)) / (1-exp(-U/N))
    Estrella                   0.149     1 - (1-R/U)^(U/N)
    Adjusted Estrella         0.1416     1 - ((LogL-K)/LogL0)^(-2/N*LogL0)
    Veall-Zimmermann          0.2291     (R * (U+N)) / (U * (R+N))
    McFadden's LRI            0.1118     R / U
    McKelvey-Zavoina          0.2036

    N = # of observations, K = # of regressors
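Most of the goodness-of-fit measures in this table depend only on R, U, and N, so the printed values can be reproduced from the Formula column. The sketch below does this in Python for the measures that need no further inputs; the adjusted Estrella and McKelvey-Zavoina measures are left out because they require quantities (K and the fitted latent variance) that the listing does not give to full precision. The function name is hypothetical.

```python
import math

def goodness_of_fit(R, U, N):
    """Pseudo-R2 measures that depend only on R, U, and N."""
    return {
        "Aldrich-Nelson": R / (R + N),
        "Cragg-Uhler 1": 1.0 - math.exp(-R / N),
        "Cragg-Uhler 2": (1.0 - math.exp(-R / N)) / (1.0 - math.exp(-U / N)),
        "Estrella": 1.0 - (1.0 - R / U) ** (U / N),
        "McFadden's LRI": R / U,
        "Veall-Zimmermann": (R * (U + N)) / (U * (R + N)),
    }

# R, U, and N as printed in Output 21.1.1
measures = goodness_of_fit(R=789.73, U=7065.9, N=5190)
for name, value in measures.items():
    print(f"{name:18s} {value:.4f}")
```

Because R and U are themselves rounded in the listing, agreement with the printed values can be expected only to about four decimal places.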
Output 21.1.1 continued

    Parameter Estimates

                                    Standard              Approx
    Parameter    DF     Estimate       Error    t Value   Pr > |t|
    Intercept     1    -1.378705    0.147413      -9.35     <.0001
    sex           1     0.131885    0.043785       3.01     0.0026
    age           1    -0.534190    0.815907      -0.65     0.5126
    agesq         1     0.857308    0.898364       0.95     0.3399
    income        1    -0.062211    0.068017      -0.91     0.3604
    levyplus      1     0.137030    0.053262       2.57     0.0101
    freepoor      1    -0.346045    0.129638      -2.67     0.0076
    freerepa      1     0.178382    0.074348       2.40     0.0164
    illness       1     0.150485    0.015747       9.56     <.0001
    actdays       1     0.100575    0.005850      17.19     <.0001
    hscore        1     0.031862    0.009201       3.46     0.0005
    chcond1       1     0.061601    0.049024       1.26     0.2089
    chcond2       1     0.135321    0.067711       2.00     0.0457
    _Limit2       1     0.938884    0.031219      30.07     <.0001
    _Limit3       1     1.514288    0.049329      30.70     <.0001
    _Limit4       1     1.711660    0.058151      29.43     <.0001
    _Limit5       1     1.952860    0.072014      27.12     <.0001
    _Limit6       1     2.087422    0.081655      25.56     <.0001
    _Limit7       1     2.333786    0.101760      22.93     <.0001
    _Limit8       1     2.789796    0.156189      17.86     <.0001
By default, ordinal probit/logit models are estimated assuming that the first threshold or limit parameter ($\mu_1$) is 0. However, this parameter can also be estimated when the LIMIT1=VARYING option is specified. The probability that $y_i^*$ belongs to the $j$th category is defined as
\[
P[\mu_{j-1} < y_i^* < \mu_j] = F(\mu_j - x_i'\beta) - F(\mu_{j-1} - x_i'\beta)
\]
where $F(\cdot)$ is the logistic or standard normal CDF, $\mu_0 = -\infty$ and $\mu_9 = \infty$. Output 21.1.2 lists ordinal probit estimates computed in the following program. Note that the intercept term is suppressed for model identification when $\mu_1$ is estimated.

    /*-- Ordered Probit --*/
    proc qlim data=docvisit;
       model dvisits = sex age agesq income levyplus
                       freepoor freerepa illness actdays hscore
                       chcond1 chcond2 / discrete(d=normal) limit1=varying;
    run;
Output 21.1.2 Ordinal Probit Parameter Estimates with LIMIT1=VARYING

    Binary Data

    The QLIM Procedure

    Parameter Estimates

                                    Standard              Approx
    Parameter    DF     Estimate       Error    t Value   Pr > |t|
    sex           1     0.131885    0.043785       3.01     0.0026
    age           1    -0.534181    0.815915      -0.65     0.5127
    agesq         1     0.857298    0.898371       0.95     0.3399
    income        1    -0.062211    0.068017      -0.91     0.3604
    levyplus      1     0.137031    0.053262       2.57     0.0101
    freepoor      1    -0.346045    0.129638      -2.67     0.0076
    freerepa      1     0.178382    0.074348       2.40     0.0164
    illness       1     0.150485    0.015747       9.56     <.0001
    actdays       1     0.100575    0.005850      17.19     <.0001
    hscore        1     0.031862    0.009201       3.46     0.0005
    chcond1       1     0.061602    0.049024       1.26     0.2089
    chcond2       1     0.135322    0.067711       2.00     0.0457
    _Limit1       1     1.378706    0.147415       9.35     <.0001
    _Limit2       1     2.317590    0.150206      15.43     <.0001
    _Limit3       1     2.892994    0.155198      18.64     <.0001
    _Limit4       1     3.090367    0.158263      19.53     <.0001
    _Limit5       1     3.331566    0.164065      20.31     <.0001
    _Limit6       1     3.466128    0.168799      20.53     <.0001
    _Limit7       1     3.712493    0.179756      20.65     <.0001
    _Limit8       1     4.168502    0.215738      19.32     <.0001
Example 21.2: Tobit Analysis

The following statements show a subset of the Mroz (1987) data set. In these data, Hours is the number of hours the wife worked outside the household in a given year, Yrs_Ed is the years of education, and Yrs_Exp is the years of work experience. A Tobit model will be fit to the hours worked with years of education and experience as covariates.

By the nature of the data it is clear that there are a number of women who committed some positive number of hours to outside work ($y_i > 0$ is observed). There are also a number of women who did not work at all ($y_i = 0$ is observed). This gives us the following model:
\[
\begin{aligned}
y_i^* &= x_i'\beta + \epsilon_i \\
y_i &= \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}
\end{aligned}
\]
where $\epsilon_i \sim \text{iid } N(0, \sigma^2)$. The set of explanatory variables is denoted by $x_i$.

    title1 'Estimating a Tobit model';

    data subset;
       input Hours Yrs_Ed Yrs_Exp @@;
       if Hours eq 0
          then Lower=.;
          else Lower=Hours;
    datalines;
    0 8 9 0 8 12 0 9 10
    0 10 15 0 11 4 0 11 6
    1000 12 1 1960 12 29 0 13 3
    2100 13 36 3686 14 11 1920 14 38
    0 15 14 1728 16 3 1568 16 19
    1316 17 7 0 17 15
    ;

    /*-- Tobit Model --*/
    proc qlim data=subset;
       model hours = yrs_ed yrs_exp;
       endogenous hours ~ censored(lb=0);
    run;
The output of the QLIM procedure is shown in Output 21.2.1.

Output 21.2.1 Tobit Analysis Results

    Estimating a Tobit model

    The QLIM Procedure

    Model Fit Summary

    Number of Endogenous Variables               1
    Endogenous Variable                      hours
    Number of Observations                      17
    Log Likelihood                       -74.93700
    Maximum Absolute Gradient           1.18953E-6
    Number of Iterations                        23
    Optimization Method               Quasi-Newton
    AIC                                  157.87400
    Schwarz Criterion                    161.20685

    Parameter Estimates

                                        Standard               Approx
    Parameter    DF        Estimate        Error    t Value    Pr > |t|
    Intercept     1    -5598.295129    27.692220    -202.16      <.0001
    Yrs_Ed        1      373.123254    53.988877       6.91      <.0001
    Yrs_Exp       1       63.336247    36.551299       1.73      0.0831
    _Sigma        1     1582.859635   390.076480       4.06      <.0001
In the “Parameter Estimates” table there are four rows. The first three of these rows correspond to the vector estimate of the regression coefficients $\beta$. The last one is called _Sigma, which corresponds to the estimate of the error standard deviation $\sigma$.
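The summary statistics in the listing can be cross-checked by hand. Assuming the usual definitions, AIC $= -2 \ln L + 2k$ and Schwarz criterion $= -2 \ln L + k \ln N$ with $k$ estimated parameters, and a t value equal to the estimate divided by its standard error, the following Python sketch reproduces the printed numbers for this Tobit fit. The function names are hypothetical.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return -2.0 * loglik + 2.0 * k

def schwarz(loglik, k, n):
    """Schwarz (Bayesian) criterion for k parameters and n observations."""
    return -2.0 * loglik + k * math.log(n)

def t_value(estimate, std_err):
    """Ratio of an estimate to its standard error."""
    return estimate / std_err

# Values taken from Output 21.2.1: log likelihood, 4 parameters, 17 observations
print(round(aic(-74.93700, 4), 5))
print(round(schwarz(-74.93700, 4, 17), 5))
print(round(t_value(373.123254, 53.988877), 2))   # Yrs_Ed row
```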
Example 21.3: Bivariate Probit Analysis

This example shows how to estimate a bivariate probit model. Note the INIT statement in the following program, which sets the initial values for some parameters in the optimization:

    data a;
       keep y1 y2 x1 x2;
       do i = 1 to 500;
          x1 = rannor( 19283 );
          x2 = rannor( 98721 );
          u1 = rannor( 76527 );
          u2 = rannor( 65721 );
          y1l = 1 + 2 * x1 + 3 * x2 + u1;
          y2l = 3 + 4 * x1 - 2 * x2 + u1*.2 + u2;
          if ( y1l > 0 ) then y1 = 1;
          else                y1 = 0;
          if ( y2l > 0 ) then y2 = 1;
          else                y2 = 0;
          output;
       end;
    run;

    /*-- Bivariate Probit --*/
    proc qlim data=a method=qn;
       init y1.x1 2.8, y1.x2 2.1, _rho .1;
       model y1 = x1 x2;
       model y2 = x1 x2;
       endogenous y1 y2 ~ discrete;
    run;
The output of the QLIM procedure is shown in Output 21.3.1.

Output 21.3.1 Bivariate Probit Analysis Results

   Estimating a Tobit model
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              2
   Endogenous Variable                     y1 y2
   Number of Observations                    500
   Log Likelihood                     -134.90796
   Maximum Absolute Gradient          3.23363E-7
   Number of Iterations                       17
   Optimization Method              Quasi-Newton
   AIC                                 283.81592
   Schwarz Criterion                   313.31817

   Parameter Estimates
   Parameter      DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   y1.Intercept    1    1.003639         0.153678      6.53   <.0001
   y1.x1           1    2.244374         0.256062      8.76   <.0001
   y1.x2           1    3.273441         0.341581      9.58   <.0001
   y2.Intercept    1    3.621164         0.457173      7.92   <.0001
   y2.x1           1    4.551525         0.576547      7.89   <.0001
   y2.x2           1   -2.442769         0.332295     -7.35   <.0001
   _Rho            1    0.144097         0.336459      0.43   0.6685
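Because the data are simulated, the estimates can be compared with the values implied by the DATA step. The calculation below is a sketch that assumes the usual probit normalization (each latent error scaled to unit variance), which the example does not state explicitly. The first latent error is $u_1$ and the second is $e_2 = 0.2\,u_1 + u_2$, with $u_1$ and $u_2$ independent standard normal draws.

```latex
\[
\operatorname{Var}(e_2) = 0.2^2 + 1 = 1.04, \qquad
\rho = \frac{\operatorname{Cov}(u_1, e_2)}{\sqrt{1}\,\sqrt{1.04}}
     = \frac{0.2}{1.0198} \approx 0.196
\]
\[
\frac{(3,\; 4,\; -2)}{\sqrt{1.04}} \approx (2.94,\; 3.92,\; -1.96)
\]
```

Under that assumption, the implied correlation of about 0.196 lies well within one standard error of the _Rho estimate, and the rescaled y2 coefficients lie within roughly one and a half standard errors of the reported estimates.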
Example 21.4: Sample Selection Model

This example illustrates the use of PROC QLIM for sample selection models. The data set is the same one from Mroz (1987). The goal is to estimate a wage offer function for married women, accounting for potential selection bias. Of the 753 women, the wage is observed for 428 working women. The labor force participation equation estimated in the introductory example is used for selection. The wage equation uses log wage (lwage) as the dependent variable. The explanatory variables in the wage equation are the woman’s years of schooling (educ), wife’s labor experience (exper), and square of experience (expersq). The program is as follows:

   /*-- Sample Selection --*/
   proc qlim data=mroz;
      model inlf = nwifeinc educ exper expersq age kidslt6 kidsge6 /discrete;
      model lwage = educ exper expersq / select(inlf=1);
   run;
The output of the QLIM procedure is shown in Output 21.4.1.

Output 21.4.1 Sample Selection

   Binary Data
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              2
   Endogenous Variable                inlf lwage
   Number of Observations                    753
   Log Likelihood                     -832.88509
   Maximum Absolute Gradient             0.00502
   Number of Iterations                       78
   Optimization Method              Quasi-Newton
   AIC                                      1694
   Schwarz Criterion                        1759

   Parameter Estimates
   Parameter         DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   lwage.Intercept    1   -0.552716         0.260371     -2.12   0.0338
   lwage.educ         1    0.108351         0.014861      7.29   <.0001
   lwage.exper        1    0.042837         0.014878      2.88   0.0040
   lwage.expersq      1   -0.000837         0.000417     -2.01   0.0449
   _Sigma.lwage       1    0.663397         0.022706     29.22   <.0001
   inlf.Intercept     1    0.266459         0.508954      0.52   0.6006
   inlf.nwifeinc      1   -0.012132         0.004877     -2.49   0.0129
   inlf.educ          1    0.131341         0.025383      5.17   <.0001
   inlf.exper         1    0.123282         0.018728      6.58   <.0001
   inlf.expersq       1   -0.001886         0.000601     -3.14   0.0017
   inlf.age           1   -0.052829         0.008479     -6.23   <.0001
   inlf.kidslt6       1   -0.867398         0.118647     -7.31   <.0001
   inlf.kidsge6       1    0.035872         0.043476      0.83   0.4093
   _Rho               1    0.026617         0.147073      0.18   0.8564
Note that the correlation estimate (_Rho = 0.026617, with an approximate p-value of 0.8564) is not statistically significant. This indicates that selection bias is not a serious problem in the estimation of the wage equation.
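The link between the correlation and selection bias can be made explicit with the standard result for the normal sample selection model. The expression below is the textbook form, given here as a sketch rather than as the procedure's documented parameterization; $\mathbf{z}_i'\gamma$ denotes the index of the participation (inlf) equation, $\mathbf{x}_i'\beta$ the wage equation, and $\lambda(\cdot)$ the inverse Mills ratio.

```latex
\[
E\left[\, \mathit{lwage}_i \mid \mathit{inlf}_i = 1 \,\right]
   = \mathbf{x}_i'\beta + \rho\,\sigma\,\lambda(\mathbf{z}_i'\gamma),
\qquad
\lambda(c) = \frac{\phi(c)}{\Phi(c)}
\]
\[
\hat{\rho}\,\hat{\sigma} = 0.026617 \times 0.663397 \approx 0.0177
\]
```

When $\rho = 0$ the correction term vanishes and a regression on the working women alone estimates $\beta$ consistently. Here the estimated coefficient on the correction term is close to zero, which is consistent with the conclusion above.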
Example 21.5: Sample Selection Model with Truncation and Censoring

In this example the data are generated such that the selection variable is discrete and the variable Y is truncated from below by zero. The program follows, with the results shown in Output 21.5.1:

   data trunc;
      keep z y x1 x2;
      do i = 1 to 500;
         x1 = rannor( 19283 );
         x2 = rannor( 98721 );
         u1 = rannor( 76527 );
         u2 = rannor( 65721 );
         zl = 1 + 2 * x1 + 3 * x2 + u1;
         y = 3 + 4 * x1 - 2 * x2 + u1*.2 + u2;
         if ( zl > 0 ) then z = 1;
         else z = 0;
         if y>=0 then output;
      end;
   run;

   /*-- Sample Selection with Truncation --*/
   proc qlim data=trunc method=qn;
      model z = x1 x2 / discrete;
      model y = x1 x2 / select(z=1) truncated(lb=0);
   run;
Output 21.5.1 Sample Selection with Truncation

   Binary Data
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              2
   Endogenous Variable                       z y
   Number of Observations                    379
   Log Likelihood                     -344.10722
   Maximum Absolute Gradient          4.95535E-6
   Number of Iterations                       17
   Optimization Method              Quasi-Newton
   AIC                                 704.21444
   Schwarz Criterion                   735.71473

   Parameter Estimates
   Parameter     DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   y.Intercept    1    3.014158         0.128548     23.45   <.0001
   y.x1           1    3.995671         0.099599     40.12   <.0001
   y.x2           1   -1.972697         0.096385    -20.47   <.0001
   _Sigma.y       1    0.923428         0.047233     19.55   <.0001
   z.Intercept    1    0.949444         0.190265      4.99   <.0001
   z.x1           1    2.163928         0.288384      7.50   <.0001
   z.x2           1    3.134213         0.379251      8.26   <.0001
   _Rho           1    0.494356         0.176542      2.80   0.0051
In the following statements the data are generated such that the selection variable is discrete and the variable Y is censored from below by zero. The results are shown in Output 21.5.2.

   data cens;
      keep z y x1 x2;
      do i = 1 to 500;
         x1 = rannor( 19283 );
         x2 = rannor( 98721 );
         u1 = rannor( 76527 );
         u2 = rannor( 65721 );
         zl = 1 + 2 * x1 + 3 * x2 + u1;
         yl = 3 + 4 * x1 - 2 * x2 + u1*.2 + u2;
         if ( zl > 0 ) then z = 1;
         else z = 0;
         if ( yl > 0 ) then y = yl;
         else y = 0;
         output;
      end;
   run;

   /*-- Sample Selection with Censoring --*/
   proc qlim data=cens method=qn;
      model z = x1 x2 / discrete;
      model y = x1 x2 / select(z=1) censored(lb=0);
   run;
Output 21.5.2 Sample Selection with Censoring

   Binary Data
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              2
   Endogenous Variable                       z y
   Number of Observations                    500
   Log Likelihood                     -399.78508
   Maximum Absolute Gradient          2.30443E-6
   Number of Iterations                       19
   Optimization Method              Quasi-Newton
   AIC                                 815.57015
   Schwarz Criterion                   849.28702

   Parameter Estimates
   Parameter     DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   y.Intercept    1    3.074276         0.111617     27.54   <.0001
   y.x1           1    3.963619         0.085796     46.20   <.0001
   y.x2           1   -2.023548         0.088714    -22.81   <.0001
   _Sigma.y       1    0.920860         0.043278     21.28   <.0001
   z.Intercept    1    1.013610         0.154081      6.58   <.0001
   z.x1           1    2.256922         0.255999      8.82   <.0001
   z.x2           1    3.302692         0.342168      9.65   <.0001
   _Rho           1    0.350776         0.197093      1.78   0.0751
Example 21.6: Types of Tobit Models

The following five examples show how to estimate different types of Tobit models (see “Types of Tobit Models” on page 1400). Output 21.6.1 through Output 21.6.5 show the results of the corresponding programs.

Type 1 Tobit

   data a1;
      keep y x;
      do i = 1 to 500;
         x = rannor( 19283 );
         u = rannor( 76527 );
         yl = 1 + 2 * x + u;
         if ( yl > 0 ) then y = yl;
         else y = 0;
         output;
      end;
   run;

   /*-- Type 1 Tobit --*/
   proc qlim data=a1 method=qn;
      model y = x;
      endogenous y ~ censored(lb=0);
   run;
Output 21.6.1 Type 1 Tobit

   Binary Data
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              1
   Endogenous Variable                         y
   Number of Observations                    500
   Log Likelihood                     -554.17696
   Maximum Absolute Gradient          4.65556E-7
   Number of Iterations                        9
   Optimization Method              Quasi-Newton
   AIC                                      1114
   Schwarz Criterion                        1127

   Parameter Estimates
   Parameter   DF   Estimate   Standard Error   t Value   Approx Pr > |t|
   Intercept    1   0.942734         0.056784     16.60   <.0001
   x            1   2.049571         0.066979     30.60   <.0001
   _Sigma       1   1.016571         0.039035     26.04   <.0001
Type 2 Tobit

   data a2;
      keep y1 y2 x1 x2;
      do i = 1 to 500;
         x1 = rannor( 19283 );
         x2 = rannor( 98721 );
         u1 = rannor( 76527 );
         u2 = rannor( 65721 );
         y1l = 1 + 2 * x1 + 3 * x2 + u1;
         y2l = 3 + 4 * x1 - 2 * x2 + u1*.2 + u2;
         if ( y1l > 0 ) then y1 = 1;
         else y1 = 0;
         if ( y1l > 0 ) then y2 = y2l;
         else y2 = 0;
         output;
      end;
   run;

   /*-- Type 2 Tobit --*/
   proc qlim data=a2 method=qn;
      model y1 = x1 x2 / discrete;
      model y2 = x1 x2 / select(y1=1);
   run;
Output 21.6.2 Type 2 Tobit

   Binary Data
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              2
   Endogenous Variable                     y1 y2
   Number of Observations                    500
   Log Likelihood                     -476.12328
   Maximum Absolute Gradient          8.30075E-7
   Number of Iterations                       17
   Optimization Method              Quasi-Newton
   AIC                                 968.24655
   Schwarz Criterion                        1002

   Parameter Estimates
   Parameter      DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   y2.Intercept    1    3.066992         0.106903     28.69   <.0001
   y2.x1           1    4.004874         0.072043     55.59   <.0001
   y2.x2           1   -2.079352         0.087544    -23.75   <.0001
   _Sigma.y2       1    0.940559         0.039321     23.92   <.0001
   y1.Intercept    1    1.017140         0.154975      6.56   <.0001
   y1.x1           1    2.253080         0.256097      8.80   <.0001
   y1.x2           1    3.305140         0.343695      9.62   <.0001
   _Rho            1    0.292992         0.210073      1.39   0.1631
Type 3 Tobit

   data a3;
      keep y1 y2 x1 x2;
      do i = 1 to 500;
         x1 = rannor( 19283 );
         x2 = rannor( 98721 );
         u1 = rannor( 76527 );
         u2 = rannor( 65721 );
         y1l = 1 + 2 * x1 + 3 * x2 + u1;
         y2l = 3 + 4 * x1 - 2 * x2 + u1*.2 + u2;
         if ( y1l > 0 ) then y1 = y1l;
         else y1 = 0;
         if ( y1l > 0 ) then y2 = y2l;
         else y2 = 0;
         output;
      end;
   run;

   /*-- Type 3 Tobit --*/
   proc qlim data=a3 method=qn;
      model y1 = x1 x2 / censored(lb=0);
      model y2 = x1 x2 / select(y1>0);
   run;
Output 21.6.3 Type 3 Tobit

   Binary Data
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              2
   Endogenous Variable                     y1 y2
   Number of Observations                    500
   Log Likelihood                     -838.94087
   Maximum Absolute Gradient          9.71691E-6
   Number of Iterations                       16
   Optimization Method              Quasi-Newton
   AIC                                      1696
   Schwarz Criterion                        1734

   Parameter Estimates
   Parameter      DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   y2.Intercept    1    3.081206         0.080121     38.46   <.0001
   y2.x1           1    3.998361         0.063734     62.73   <.0001
   y2.x2           1   -2.088280         0.072876    -28.66   <.0001
   _Sigma.y2       1    0.939799         0.039047     24.07   <.0001
   y1.Intercept    1    0.981975         0.067351     14.58   <.0001
   y1.x1           1    2.032675         0.059363     34.24   <.0001
   y1.x2           1    2.976609         0.065584     45.39   <.0001
   _Sigma.y1       1    0.969968         0.039795     24.37   <.0001
   _Rho            1    0.226281         0.057672      3.92   <.0001
Type 4 Tobit

   data a4;
      keep y1 y2 y3 x1 x2;
      do i = 1 to 500;
         x1 = rannor( 19283 );
         x2 = rannor( 98721 );
         u1 = rannor( 76527 );
         u2 = rannor( 65721 );
         u3 = rannor( 12019 );
         y1l = 1 + 2 * x1 + 3 * x2 + u1;
         y2l = 3 + 4 * x1 - 2 * x2 + u1*.2 + u2;
         y3l = 0 - 1 * x1 + 1 * x2 + u1*.1 - u2*.5 + u3*.5;
         if ( y1l > 0 ) then y1 = y1l;
         else y1 = 0;
         if ( y1l > 0 ) then y2 = y2l;
         else y2 = 0;
         if ( y1l <= 0 ) then y3 = y3l;
         else y3 = 0;
         output;
      end;
   run;

   /*-- Type 4 Tobit --*/
   proc qlim data=a4 method=qn;
      model y1 = x1 x2 / censored(lb=0);
      model y2 = x1 x2 / select(y1>0);
      model y3 = x1 x2 / select(y1<=0);
   run;
Output 21.6.4 Type 4 Tobit

   Binary Data
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              3
   Endogenous Variable                  y1 y2 y3
   Number of Observations                    500
   Log Likelihood                          -1128
   Maximum Absolute Gradient           0.0000161
   Number of Iterations                       21
   Optimization Method              Quasi-Newton
   AIC                                      2285
   Schwarz Criterion                        2344

   Parameter Estimates
   Parameter      DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   y2.Intercept    1    2.894656         0.076079     38.05   <.0001
   y2.x1           1    4.072704         0.062675     64.98   <.0001
   y2.x2           1   -1.901163         0.076874    -24.73   <.0001
   _Sigma.y2       1    0.981655         0.039564     24.81   <.0001
   y3.Intercept    1    0.064594         0.179441      0.36   0.7189
   y3.x1           1   -0.938384         0.096570     -9.72   <.0001
   y3.x2           1    1.035798         0.123104      8.41   <.0001
   _Sigma.y3       1    0.743124         0.038240     19.43   <.0001
   y1.Intercept    1    0.987370         0.067861     14.55   <.0001
   y1.x1           1    2.050408         0.060819     33.71   <.0001
   y1.x2           1    2.982190         0.072552     41.10   <.0001
   _Sigma.y1       1    1.032473         0.040971     25.20   <.0001
   _Rho.y1.y2      1    0.291587         0.053436      5.46   <.0001
   _Rho.y1.y3      1   -0.031665         0.260057     -0.12   0.9031
Type 5 Tobit

   data a5;
      keep y1 y2 y3 x1 x2;
      do i = 1 to 500;
         x1 = rannor( 19283 );
         x2 = rannor( 98721 );
         u1 = rannor( 76527 );
         u2 = rannor( 65721 );
         u3 = rannor( 12019 );
         y1l = 1 + 2 * x1 + 3 * x2 + u1;
         y2l = 3 + 4 * x1 - 2 * x2 + u1*.2 + u2;
         y3l = 0 - 1 * x1 + 1 * x2 + u1*.1 - u2*.5 + u3*.5;
         if ( y1l > 0 ) then y1 = 1;
         else y1 = 0;
         if ( y1l > 0 ) then y2 = y2l;
         else y2 = 0;
         if ( y1l <= 0 ) then y3 = y3l;
         else y3 = 0;
         output;
      end;
   run;

   /*-- Type 5 Tobit --*/
   proc qlim data=a5 method=qn;
      model y1 = x1 x2 / discrete;
      model y2 = x1 x2 / select(y1>0);
      model y3 = x1 x2 / select(y1<=0);
   run;
Output 21.6.5 Type 5 Tobit

   Binary Data
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              3
   Endogenous Variable                  y1 y2 y3
   Number of Observations                    500
   Log Likelihood                     -734.50612
   Maximum Absolute Gradient          3.57134E-7
   Number of Iterations                       20
   Optimization Method              Quasi-Newton
   AIC                                      1495
   Schwarz Criterion                        1550

   Parameter Estimates
   Parameter      DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   y2.Intercept    1    2.887523         0.095193     30.33   <.0001
   y2.x1           1    4.078926         0.069623     58.59   <.0001
   y2.x2           1   -1.898898         0.086578    -21.93   <.0001
   _Sigma.y2       1    0.983059         0.039987     24.58   <.0001
   y3.Intercept    1    0.071764         0.171522      0.42   0.6757
   y3.x1           1   -0.935299         0.092843    -10.07   <.0001
   y3.x2           1    1.039954         0.120697      8.62   <.0001
   _Sigma.y3       1    0.743083         0.038225     19.44   <.0001
   y1.Intercept    1    1.067578         0.142789      7.48   <.0001
   y1.x1           1    2.068376         0.226020      9.15   <.0001
   y1.x2           1    3.157385         0.314743     10.03   <.0001
   _Rho.y1.y2      1    0.312369         0.177010      1.76   0.0776
   _Rho.y1.y3      1   -0.018225         0.234886     -0.08   0.9382
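For the Type 4 and Type 5 programs, the error of the third equation is $e_3 = 0.1\,u_1 - 0.5\,u_2 + 0.5\,u_3$, with $u_1$, $u_2$, and $u_3$ generated as independent standard normal draws. As a rough check on the scale of the reported parameters (a sketch that reads _Sigma.y3 as a standard deviation and _Rho.y1.y3 as the correlation between the y1 and y3 errors), the values implied by the DATA step are:

```latex
\[
\operatorname{Var}(e_3) = 0.1^2 + 0.5^2 + 0.5^2 = 0.51,
\qquad
\sqrt{0.51} \approx 0.714
\]
\[
\operatorname{Corr}(u_1, e_3) = \frac{0.1}{\sqrt{0.51}} \approx 0.140
\]
```

The implied standard deviation of about 0.714 is close to the _Sigma.y3 estimates of roughly 0.743 in Output 21.6.4 and Output 21.6.5, whereas the implied variance of 0.51 is not. The implied correlation of 0.140 is within one standard error of both _Rho.y1.y3 estimates.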
Example 21.7: Stochastic Frontier Models

This example illustrates the estimation of stochastic frontier production and cost models. First, a production function model is estimated. The data for this example were collected by Christensen Associates; they represent a sample of 125 observations on inputs and output for 10 airlines between 1970 and 1984. The explanatory variables (inputs) are fuel (LF), materials (LM), equipment (LE), labor (LL), and property (LP), and (LQ) is an index that represents passengers, charter, mail, and freight transported. The following statements create the data set:

   title1 'Stochastic Frontier Production Model';

   data airlines;
      input TS FIRM NI LQ LF LM LE LL LP;
   datalines;
   1 1 15 -0.0484 0.2473 0.2335 0.2294 0.2246 0.2124
   1 1 15 -0.0133 0.2603 0.2492 0.241 0.2216 0.1069
   2 1 15 0.088 0.2666 0.3273 0.3365 0.2039 0.0865

   ... more lines ...

The following statements estimate a stochastic frontier exponential production model that uses Christensen Associates data:

   /*-- Stochastic Frontier Production Model --*/
   proc qlim data=airlines method=congra;
      model LQ=LF LM LE LL LP;
      endogenous LQ ~ frontier (type=exponential production);
   run;
Output 21.7.1 shows the results from this production model.

Output 21.7.1 Stochastic Frontier Production Model

   Stochastic Frontier Production Model
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables                    1
   Endogenous Variable                              LQ
   Number of Observations                          125
   Log Likelihood                             83.27815
   Maximum Absolute Gradient                 0.0005771
   Number of Iterations                             78
   Optimization Method              Conjugate-Gradient
   AIC                                      -150.55630
   Schwarz Criterion                        -127.92979
   Sigma                                       0.12445
   Lambda                                      0.55766

   Parameter Estimates
   Parameter   DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   Intercept    1   -0.085048         0.024528     -3.47   0.0005
   LF           1   -0.115805         0.124178     -0.93   0.3510
   LM           1    0.756252         0.078755      9.60   <.0001
   LE           1    0.424916         0.081893      5.19   <.0001
   LL           1   -0.136417         0.089702     -1.52   0.1283
   LP           1    0.098966         0.042776      2.31   0.0207
   _Sigma_v     1    0.108688         0.010063     10.80   <.0001
   _Sigma_u     1    0.060611         0.017603      3.44   0.0006
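The Sigma and Lambda entries in the “Model Fit Summary” table are not defined in this example, but the reported numbers are consistent with Sigma being the square root of the sum of the squared _Sigma_v and _Sigma_u estimates and Lambda being their ratio. The DATA step below is a small sketch that checks this reading against Output 21.7.1. The relationship is inferred from the printed values, and the variable names are illustrative.

```sas
/* Illustrative check: relate Sigma and Lambda to _Sigma_v and _Sigma_u. */
/* Assumes Sigma = sqrt(sigma_v**2 + sigma_u**2), Lambda = sigma_u/sigma_v. */
data _null_;
   sigma_v = 0.108688;   /* _Sigma_v estimate from Output 21.7.1 */
   sigma_u = 0.060611;   /* _Sigma_u estimate from Output 21.7.1 */
   sigma   = sqrt(sigma_v**2 + sigma_u**2);
   lambda  = sigma_u / sigma_v;
   put sigma= 8.5 lambda= 8.5;
run;
```

By hand, 0.108688 squared plus 0.060611 squared is about 0.015487, whose square root is about 0.12445, and 0.060611 / 0.108688 is about 0.55766. Both agree with the table. The same relationship holds for the cost model in Output 21.7.2.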
Similarly, the stochastic frontier production function can be estimated with (type=half) or (type=truncated) options that represent half-normal and truncated normal production or cost models.

In the next step, a stochastic frontier cost function is estimated. The data for the cost model are provided by Christensen and Greene (1976). The data describe costs and production inputs of 145 U.S. electricity producers in 1955. The model being estimated follows the nonhomogeneous version of the Cobb-Douglas cost function:

\[
\log\!\left(\frac{\text{Cost}}{\text{FPrice}}\right)
  = \beta_0
  + \beta_1 \log\!\left(\frac{\text{KPrice}}{\text{FPrice}}\right)
  + \beta_2 \log\!\left(\frac{\text{LPrice}}{\text{FPrice}}\right)
  + \beta_3 \log(\text{Output})
  + \beta_4 \, \frac{1}{2} \log(\text{Output})^2
  + \epsilon
\]

All dollar values are normalized by fuel price. The quadratic log of the output is added to capture nonlinearities due to scale effects in cost functions. New variables, log_C_PF, log_PK_PF, log_PL_PF, log_y, and log_y_sq, are created to reflect transformations. The following statements create the data set and transformed variables:

   data electricity;
      input Firm Year Cost Output LPrice LShare KPrice KShare FPrice FShare;
   datalines;
   1 1955  .0820  2.0  2.090  .3164  183.000  .4521  17.9000  .2315
   2 1955  .6610  3.0  2.050  .2073  174.000  .6676  35.1000  .1251
   3 1955  .9900  4.0  2.050  .2349  171.000  .5799  35.1000  .1852
   4 1955  .3150  4.0  1.830  .1152  166.000  .7857  32.2000  .0990

   ... more lines ...

   /* Data transformations */
   data electricity;
      set electricity;
      label Firm="firm index"
            Year="1955 for all observations"
            Cost="Total cost"
            Output="Total output"
            LPrice="Wage rate"
            LShare="Cost share for labor"
            KPrice="Capital price index"
            KShare="Cost share for capital"
            FPrice="Fuel price"
            FShare="Cost share for fuel";
      log_C_PF=log(Cost/FPrice);
      log_PK_PF=log(KPrice/FPrice);
      log_PL_PF=log(LPrice/FPrice);
      log_y=log(Output);
      log_y_sq=log_y**2/2;
   run;
The following statements estimate a stochastic frontier exponential cost model that uses Christensen and Greene (1976) data:

   /*-- Stochastic Frontier Cost Model --*/
   proc qlim data=electricity;
      model log_C_PF = log_PK_PF log_PL_PF log_y log_y_sq;
      endogenous log_C_PF ~ frontier (type=exponential cost);
   run;
Output 21.7.2 shows the results.

Output 21.7.2 Exponential Distribution

   Stochastic Frontier Production Model
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              1
   Endogenous Variable                  log_C_PF
   Number of Observations                    159
   Log Likelihood                      -23.30430
   Maximum Absolute Gradient           3.0458E-6
   Number of Iterations                       21
   Optimization Method              Quasi-Newton
   AIC                                  60.60860
   Schwarz Criterion                    82.09093
   Sigma                                 0.30750
   Lambda                                1.71345

   Parameter Estimates
   Parameter   DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   Intercept    1   -4.983211         0.543328     -9.17   <.0001
   log_PK_PF    1    0.090242         0.109202      0.83   0.4086
   log_PL_PF    1    0.504299         0.118263      4.26   <.0001
   log_y        1    0.427182         0.066680      6.41   <.0001
   log_y_sq     1    0.066120         0.010079      6.56   <.0001
   _Sigma_v     1    0.154998         0.020271      7.65   <.0001
   _Sigma_u     1    0.265581         0.033614      7.90   <.0001
Similarly, the stochastic frontier cost model can be estimated with (type=half) or (type=truncated) options that represent half-normal and truncated normal errors. The following statements illustrate the half-normal option:

   /*-- Stochastic Frontier Cost Model --*/
   proc qlim data=electricity;
      model log_C_PF = log_PK_PF log_PL_PF log_y log_y_sq;
      endogenous log_C_PF ~ frontier (type=half cost);
   run;
Output 21.7.3 shows the result.

Output 21.7.3 Half-Normal Distribution

   Stochastic Frontier Production Model
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              1
   Endogenous Variable                  log_C_PF
   Number of Observations                    159
   Log Likelihood                      -34.95304
   Maximum Absolute Gradient           0.0001150
   Number of Iterations                       22
   Optimization Method              Quasi-Newton
   AIC                                  83.90607
   Schwarz Criterion                   105.38840
   Sigma                                 0.42761
   Lambda                                1.80031

   Parameter Estimates
   Parameter   DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   Intercept    1   -4.434634         0.690197     -6.43   <.0001
   log_PK_PF    1    0.069624         0.136250      0.51   0.6093
   log_PL_PF    1    0.474578         0.146812      3.23   0.0012
   log_y        1    0.256874         0.080777      3.18   0.0015
   log_y_sq     1    0.088051         0.011817      7.45   <.0001
   _Sigma_v     1    0.207637         0.039222      5.29   <.0001
   _Sigma_u     1    0.373810         0.073605      5.08   <.0001

The following statements illustrate the truncated normal option:

   /*-- Stochastic Frontier Cost Model --*/
   proc qlim data=electricity;
      model log_C_PF = log_PK_PF log_PL_PF log_y log_y_sq;
      endogenous log_C_PF ~ frontier (type=truncated cost);
   run;

Output 21.7.4 shows the results.
Output 21.7.4 Truncated Normal Distribution

   Stochastic Frontier Production Model
   The QLIM Procedure

   Model Fit Summary
   Number of Endogenous Variables              1
   Endogenous Variable                  log_C_PF
   Number of Observations                    159
   Log Likelihood                      -60.32110
   Maximum Absolute Gradient                4225
   Number of Iterations                        4
   Optimization Method              Quasi-Newton
   AIC                                 136.64220
   Schwarz Criterion                   161.19343
   Sigma                                 0.37350
   Lambda                                0.70753

   Parameter Estimates
   Parameter   DF    Estimate   Standard Error   t Value   Approx Pr > |t|
   Intercept    1   -3.770440         0.839388     -4.49   <.0001
   log_PK_PF    1   -0.045852         0.176682     -0.26   0.7952
   log_PL_PF    1    0.602961         0.191454      3.15   0.0016
   log_y        1    0.094966         0.071124      1.34   0.1818
   log_y_sq     1    0.113010         0.012225      9.24   <.0001
   _Sigma_v     1    0.304905         0.047868      6.37   <.0001
   _Sigma_u     1    0.215728         0.068725      3.14   0.0017
   _Mu          1    0.477097         0.116295      4.10   <.0001
If no (Production) or (Cost) option is specified, the stochastic frontier production model is estimated by default.
References

Abramowitz, M. and Stegun, I. A. (1970), Handbook of Mathematical Functions, New York: Dover Press.
Aigner, D., Lovell, C. A. K., and Schmidt, P. (1977), “Formulation and Estimation of Stochastic Frontier Production Function Models,” Journal of Econometrics, 6:1 (July), 21–37.
Aitchison, J. and Silvey, S. (1957), “The Generalization of Probit Analysis to the Case of Multiple Responses,” Biometrika, 44, 131–140.
Amemiya, T. (1978a), “The Estimation of a Simultaneous Equation Generalized Probit Model,” Econometrica, 46, 1193–1205.
Amemiya, T. (1978b), “On a Two-Step Estimate of a Multivariate Logit Model,” Journal of Econometrics, 8, 13–21.
Amemiya, T. (1981), “Qualitative Response Models: A Survey,” Journal of Economic Literature, 19, 483–536.
Amemiya, T. (1984), “Tobit Models: A Survey,” Journal of Econometrics, 24, 3–61.
Amemiya, T. (1985), Advanced Econometrics, Cambridge: Harvard University Press.
Ben-Akiva, M. and Lerman, S. R. (1987), Discrete Choice Analysis, Cambridge: MIT Press.
Bera, A. K., Jarque, C. M., and Lee, L.-F. (1984), “Testing the Normality Assumption in Limited Dependent Variable Models,” International Economic Review, 25, 563–578.
Bloom, D. E. and Killingsworth, M. R. (1985), “Correcting for Truncation Bias Caused by a Latent Truncation Variable,” Journal of Econometrics, 27, 131–135.
Box, G. E. P. and Cox, D. R. (1964), “An Analysis of Transformations,” Journal of the Royal Statistical Society, Series B, 26, 211–252.
Cameron, A. C. and Trivedi, P. K. (1986), “Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators,” Journal of Applied Econometrics, 1, 29–53.
Cameron, A. C. and Trivedi, P. K. (1998), Regression Analysis of Count Data, Cambridge: Cambridge University Press.
Christensen, L. and Greene, W. (1976), “Economies of Scale in U.S. Electric Power Generation,” Journal of Political Economy, 84, 655–676.
Coelli, T. J., Prasada Rao, D. S., and Battese, G. E. (1998), An Introduction to Efficiency and Productivity Analysis, London: Kluwer Academic Publisher.
Copley, P. A., Doucet, M. S., and Gaver, K. M. (1994), “A Simultaneous Equations Analysis of Quality Control Review Outcomes and Engagement Fees for Audits of Recipients of Federal Financial Assistance,” The Accounting Review, 69, 244–256.
Cox, D. R. (1970), Analysis of Binary Data, London: Methuen.
Cox, D. R. (1972), “Regression Models and Life Tables,” Journal of the Royal Statistical Society, Series B, 34, 187–220.
Cox, D. R. (1975), “Partial Likelihood,” Biometrika, 62, 269–276.
Deis, D. R. and Hill, R. C. (1998), “An Application of the Bootstrap Method to the Simultaneous Equations Model of the Demand and Supply of Audit Services,” Contemporary Accounting Research, 15, 83–99.
Estrella, A. (1998), “A New Measure of Fit for Equations with Dichotomous Dependent Variables,” Journal of Business and Economic Statistics, 16, 198–205.
Gallant, A. R. (1987), Nonlinear Statistical Models, New York: Wiley.
Genz, A. (1992), “Numerical Computation of Multivariate Normal Probabilities,” Journal of Computational and Graphical Statistics, 1, 141–150.
Godfrey, L. G. (1988), Misspecification Tests in Econometrics, Cambridge: Cambridge University Press.
Gourieroux, C., Monfort, A., Renault, E., and Trognon, A. (1987), “Generalized Residuals,” Journal of Econometrics, 34, 5–32.
Greene, W. H. (1997), Econometric Analysis, Upper Saddle River, N.J.: Prentice Hall.
Gregory, A. W. and Veall, M. R. (1985), “On Formulating Wald Tests for Nonlinear Restrictions,” Econometrica, 53, 1465–1468.
Hajivassiliou, V. A. (1993), “Simulation Estimation Methods for Limited Dependent Variable Models,” in Handbook of Statistics, Vol. 11, ed. G. S. Maddala, C. R. Rao, and H. D. Vinod, New York: Elsevier Science Publishing.
Hajivassiliou, V. A. and McFadden, D. (1998), “The Method of Simulated Scores for the Estimation of LDV Models,” Econometrica, 66, 863–896.
Heckman, J. J. (1978), “Dummy Endogenous Variables in a Simultaneous Equation System,” Econometrica, 46, 931–959.
Hinkley, D. V. (1975), “On Power Transformations to Symmetry,” Biometrika, 62, 101–111.
Kim, M. and Hill, R. C. (1993), “The Box-Cox Transformation-of-Variables in Regression,” Empirical Economics, 18, 307–319.
King, G. (1989b), Unifying Political Methodology: The Likelihood Theory and Statistical Inference, Cambridge: Cambridge University Press.
Kumbhakar, S. C. and Knox Lovell, C. A. (2000), Stochastic Frontier Analysis, New York: Cambridge University Press.
Lee, L.-F. (1981), “Simultaneous Equations Models with Discrete and Censored Dependent Variables,” in Structural Analysis of Discrete Data with Econometric Applications, ed. C. F. Manski and D. McFadden, Cambridge: MIT Press.
Long, J. S. (1997), Regression Models for Categorical and Limited Dependent Variables, Thousand Oaks, CA: Sage Publications.
McFadden, D. (1974), “Conditional Logit Analysis of Qualitative Choice Behavior,” in Frontiers in Econometrics, ed. P. Zarembka, New York: Academic Press.
McFadden, D. (1981), “Econometric Models of Probabilistic Choice,” in Structural Analysis of Discrete Data with Econometric Applications, ed. C. F. Manski and D. McFadden, Cambridge: MIT Press.
McKelvey, R. D. and Zavoina, W. (1975), “A Statistical Model for the Analysis of Ordinal Level Dependent Variables,” Journal of Mathematical Sociology, 4, 103–120.
Meeusen, W. and van Den Broeck, J. (1977), “Efficiency Estimation from Cobb-Douglas Production Functions with Composed Error,” International Economic Review, 18:2 (June), 435–444.
Mroz, T. A. (1987), “The Sensitivity of an Empirical Model of Married Women’s Hours of Work to Economic and Statistical Assumptions,” Econometrica, 55, 765–799.
Mroz, T. A. (1999), “Discrete Factor Approximations in Simultaneous Equation Models: Estimating the Impact of a Dummy Endogenous Variable on a Continuous Outcome,” Journal of Econometrics, 92, 233–274.
Nawata, K. (1994), “Estimation of Sample Selection Bias Models by the Maximum Likelihood Estimator and Heckman’s Two-Step Estimator,” Economics Letters, 45, 33–40.
Parks, R. W. (1967), “Efficient Estimation of a System of Regression Equations When Disturbances Are Both Serially and Contemporaneously Correlated,” Journal of the American Statistical Association, 62, 500–509.
Phillips, P. C. B. and Park, J. Y. (1988), “On Formulating Wald Tests of Nonlinear Restrictions,” Econometrica, 56, 1065–1083.
Powers, D. A. and Xie, Y. (2000), Statistical Methods for Categorical Data Analysis, San Diego: Academic Press.
Wooldridge, J. M. (2002), Econometric Analysis of Cross Section and Panel Data, Cambridge, MA: MIT Press.
Chapter 22

The SIMILARITY Procedure (Experimental)

Contents

Overview: SIMILARITY Procedure
Getting Started: SIMILARITY Procedure
Syntax: SIMILARITY Procedure
   Functional Summary
   PROC SIMILARITY Statement
   BY Statement
   FCMPOPT Statement
   ID Statement
   INPUT Statement
   TARGET Statement
Details: SIMILARITY Procedure
   Accumulation
   Missing Value Interpretation
   Zero Value Interpretation
   Time Series Transformation
   Time Series Differencing
   Time Series Missing Value Trimming
   Time Series Descriptive Statistics
   Input and Target Sequences
   Sliding Sequences
   Time Warping
   Sequence Normalization
   Sequence Scaling
   Similarity Measures
   User-Defined Functions and Subroutines
   Output Data Sets
   OUT= Data Set
   OUTMEASURE= Data Set
   OUTSUM= Data Set
   OUTSEQUENCE= Data Set
   OUTPATH= Data Set
   _STATUS_ Variable Values
   Printed Output
   ODS Tables Names
   ODS Graphics
Examples: SIMILARITY Procedure
   Example 22.1: Accumulating Transactional Data into Time Series Data
   Example 22.2: Similarity Analysis
   Example 22.3: Sliding Similarity Analysis
   Example 22.4: Searching for Historical Analogies
References
Overview: SIMILARITY Procedure

The SIMILARITY procedure computes similarity measures associated with time-stamped data, time series, and other sequentially ordered numeric data. The procedure computes similarity measures for time-stamped transactional data (transactions) with respect to time by accumulating the data into a time series format (time series). The procedure computes similarity measures for sequentially ordered numeric data (sequences) by respecting the ordering of the data.

Given two ordered numeric sequences (input and target), a similarity measure is a metric that measures the distance between the input and target sequences while taking into account the ordering of the data. The SIMILARITY procedure computes similarity measures between an input sequence and a target sequence, as well as similarity measures that "slide" the target sequence with respect to the input sequence. The "slides" can be by observation index (sliding-sequence similarity measures) or by seasonal index (seasonal-sliding-sequence similarity measures).

In order to compare the raw input and target time-stamped data, the raw data must be accumulated to a time series format. After the input and target time series are formed, the two accumulated time series can be compared as two ordered numeric sequences.

For raw time-stamped data, after the transactional data are accumulated to form time series and any missing values are interpreted, each accumulated time series can be functionally transformed, if desired. Transformations are useful when you want to stabilize the time series before computing the similarity measures. Transformations performed by the SIMILARITY procedure include the following:

    log (LOG)
    square-root (SQRT)
    logistic (LOGISTIC)
    Box-Cox (BOXCOX)
    user-defined transformations

Each time series can be transformed further by using simple and/or seasonal differencing. Additional time series transformations can be performed by using various time series transformation and analysis techniques provided by this procedure or other procedures in SAS/ETS.
After optionally transforming each time series, the accumulated and transformed time series can be stored in an output data set (OUT= data set).

After optional accumulation and transformation, each of these time series is a "working series," which can now be analyzed as a sequence of numeric data. Each of these sequences can be a target sequence, an input sequence, or both a target and an input sequence. Throughout the remainder of this chapter, the term "original sequence" applies to both the original input and target sequence. The term "working sequence" applies to a version of both the original input and target sequence under investigation.

Each original sequence can be normalized prior to similarity analysis. Normalizations are useful when you want to compare the "shape" or "profile" of the time series. Normalizations performed by the SIMILARITY procedure include the following:

    standard (STANDARD)
    absolute (ABSOLUTE)
    user-defined normalizations

After each original sequence is optionally normalized as described above, each working input sequence can be scaled to the target sequence prior to similarity analysis. Scaling is useful when you want to compare the input sequence to the target sequence while discounting the variation of the target sequence. Input sequence scaling performed by the SIMILARITY procedure includes the following:

    standard (STANDARD)
    absolute (ABSOLUTE)
    user-defined scaling

Once the working input sequence is optionally scaled to the target sequence as described above, similarity measures can be computed. Similarity measures computed by the SIMILARITY procedure include the following:

    squared deviation (SQRDEV)
    absolute deviation (ABSDEV)
    mean square deviation (MSQRDEV)
    mean absolute deviation (MABSDEV)
    user-defined similarity measures

Computing the similarity measure between two time series requires tasks for transforming time series, normalizing sequences, scaling sequences, and computing metrics or measures. The SIMILARITY procedure provides built-in routines to perform these tasks. The SIMILARITY procedure also provides a facility that allows you to extend the procedure with user-defined routines.
All results of the similarity analysis can be stored in output data sets, printed, or graphed using the Output Delivery System (ODS). The SIMILARITY procedure can process large amounts of time-stamped transactional data, time series, or sequential data. Therefore, the analysis results are useful for large-scale time series analysis, analogous time series forecasting, new product forecasting, or time series (temporal) data mining. In SAS/ETS, the EXPAND procedure can be used for frequency conversion and transformations of time series. The TIMESERIES procedure can be used for large-scale time series analysis. In SAS/STAT, the DISTANCE procedure can be used to compute various measures of distance, dissimilarity, or similarity between observations (rows) of a SAS data set.
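The processing order described in this overview (transformation, differencing, normalization, and then the similarity measure) can be sketched in a single step. This is a minimal illustration only: the data set WORK.SERIES, the variables X and Y, and the simulated values are assumptions invented for the example, and the option choices are arbitrary rather than recommended.

```sas
/* Hypothetical monthly data: X is the input series, Y is the target series. */
data work.series;
   format date date9.;
   do i = 0 to 35;
      date = intnx('month', '01JAN2006'd, i);
      x = 100 + 5*i + 10*sin(i);   /* strictly positive, so LOG can be applied */
      y =  50 + 2*i +  4*cos(i);
      output;
   end;
   drop i;
run;

proc similarity data=work.series out=work.working outsum=work.summary;
   id date interval=month;
   /* transform first, then difference, then normalize the input sequence */
   input  x / transform=log dif=(1) normalize=standard;
   /* the similarity measure is requested in the TARGET statement */
   target y / transform=log dif=(1) normalize=standard measure=mabsdev;
run;
```

The OUT= data set holds the transformed working series, and the OUTSUM= data set holds the summary of the requested measure.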
Getting Started: SIMILARITY Procedure

This section outlines the use of the SIMILARITY procedure and gives a cursory description of some of the analysis techniques that can be performed on time-stamped transactional data, time series, or sequentially ordered numeric data.

Given an input data set that contains numerous transaction variables recorded over time at no specific frequency, the SIMILARITY procedure can form equally spaced input and target time series as follows:

   PROC SIMILARITY DATA=
The SIMILARITY procedure forms time series from the input time-stamped transactional data. It can provide results in output data sets or in other output formats by using the Output Delivery System (ODS). The examples in this section are more fully illustrated in the section "Examples: SIMILARITY Procedure" on page 1485.

Time-stamped transactional data are often recorded at no fixed interval. Analysts often want to use time series analysis techniques that require fixed-time intervals. Therefore, the transactional data must be accumulated to form a fixed-interval time series.

Suppose that a bank wants to analyze the transactions associated with each of its customers over time. Further, suppose that the data set WORK.TRANSACTIONS contains three variables related to the customer transactions (CUSTOMER, DATE, and WITHDRAWALS) and one variable that contains an example of fraudulent behavior (FRAUD).
The following statements illustrate how to use the SIMILARITY procedure to accumulate time-stamped transactional data to form a daily time series based on the accumulated daily totals of each type of transaction (WITHDRAWALS and FRAUD).

   proc similarity data=transactions out=timedata;
      by customer;
      id date interval=day accumulate=total;
      input withdrawals;
      target fraud;
   run;
The OUT=TIMEDATA option specifies that the resulting time series data for each customer is to be stored in the data set WORK.TIMEDATA. The INTERVAL=DAY option specifies that the transactions are to be accumulated on a daily basis. The ACCUMULATE=TOTAL option specifies that the sum of the transactions is to be accumulated.

After the transactional data are accumulated into a time series format, the time series data can be normalized so that the "shape" or "profile" is analyzed. For example, the following statements build on the previous statements and demonstrate normalization of the accumulated time series.

   proc similarity data=transactions out=timedata;
      by customer;
      id date interval=day accumulate=total;
      input withdrawals / normalize=standard;
      target fraud / normalize=standard;
   run;
The NORMALIZE=STANDARD option specifies that each accumulated time series observation is normalized by subtracting the mean and then dividing by the standard deviation of the accumulated time series. The WORK.TIMEDATA data set now contains the accumulated and normalized time series data for each customer.

After the transactional data are accumulated into a time series format and normalized to a mean of zero and a standard deviation of one, similarity analysis can be performed on the accumulated and normalized time series. For example, the following statements build on the previous statements and demonstrate similarity analysis of the accumulated and normalized time series.

   proc similarity data=transactions out=timedata outsum=summary;
      by customer;
      id date interval=day accumulate=total;
      input withdrawals / normalize=standard;
      target fraud / normalize=standard measure=mabsdev;
   run;

The MEASURE=MABSDEV option specifies that the accumulated and normalized time series data associated with the variables WITHDRAWALS and FRAUD are to be compared by using mean absolute deviation. The OUTSUM=SUMMARY option specifies that the similarity analysis summary for each customer is to be stored in the data set WORK.SUMMARY.
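To try the bank example end to end, a small input data set is needed. The following sketch supplies one; every value in WORK.TRANSACTIONS is invented for illustration, and the SETMISSING=0 option is an added assumption (days with no transactions are treated as zero activity) that the statements above do not include. The measure is requested with MEASURE=, the option name listed in the functional summary for the TARGET statement.

```sas
/* Hypothetical transactions, already sorted by CUSTOMER and DATE. */
data work.transactions;
   input customer date :date9. withdrawals fraud;
   format date date9.;
   datalines;
1 01JAN2008 100  20
1 01JAN2008  40  10
1 03JAN2008  25 200
1 04JAN2008  60 150
1 06JAN2008  10  30
2 02JAN2008 500  20
2 02JAN2008  20  10
2 03JAN2008  35 200
2 05JAN2008  80 150
2 06JAN2008  15  30
;
run;

proc similarity data=work.transactions out=work.timedata
                outsum=work.summary print=summary;
   by customer;
   /* accumulate to daily totals; days without transactions become 0 */
   id date interval=day accumulate=total setmissing=0;
   input withdrawals / normalize=standard;
   target fraud / normalize=standard measure=mabsdev;
run;

/* Inspect the per-customer summary of the similarity measure. */
proc print data=work.summary;
run;
```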
Syntax: SIMILARITY Procedure

The following statements are used with the SIMILARITY procedure.

   PROC SIMILARITY options ;
      BY variables ;
      ID variable INTERVAL= interval options ;
      FCMPOPT options ;
      INPUT variable-list / options ;
      TARGET variable-list / options ;
Functional Summary

The statements and options that control the SIMILARITY procedure are summarized in the following table.

Table 22.1 SIMILARITY Functional Summary

Description                                                   Statement            Option

Statements
specify BY-group processing                                   BY
specify the time ID variable                                  ID
specify the FCMP options                                      FCMPOPT
specify input variables to analyze                            INPUT
specify target variables to analyze                           TARGET

Data Set Options
specify the input data set                                    PROC SIMILARITY      DATA=
specify the time series output data set                       PROC SIMILARITY      OUT=
specify measure summary output data set                       PROC SIMILARITY      OUTMEASURE=
specify path output data set                                  PROC SIMILARITY      OUTPATH=
specify sequence output data set                              PROC SIMILARITY      OUTSEQUENCE=
specify summary output data set                               PROC SIMILARITY      OUTSUM=

User-Defined Functions and Subroutine Options
specify FCMP quiet mode                                       FCMPOPT              QUIET=
specify FCMP trace mode                                       FCMPOPT              TRACE=

Accumulation and Seasonality Options
specify accumulation frequency                                ID                   INTERVAL=
specify length of seasonal cycle                              PROC SIMILARITY      SEASONALITY=
specify interval alignment                                    ID                   ALIGN=
specify time ID variable values are not sorted                ID                   NOTSORTED
specify starting time ID value                                ID                   START=
specify ending time ID value                                  ID                   END=
specify accumulation statistic                                ID, INPUT, TARGET    ACCUMULATE=
specify missing value interpretation                          ID, INPUT, TARGET    SETMISS=
specify zero value interpretation                             ID, INPUT, TARGET    ZEROMISS=
specify missing value trimming                                INPUT, TARGET        TRIMMISS=

Time Series Transformation Options
specify simple differencing                                   INPUT, TARGET        DIF=
specify seasonal differencing                                 INPUT, TARGET        SDIF=
specify transformation                                        INPUT, TARGET        TRANSFORM=

Input Sequence Options
specify normalization                                         INPUT                NORMALIZE=
specify scaling                                               INPUT                SCALE=

Target Sequence Options
specify normalization                                         TARGET               NORMALIZE=

Similarity Measure Options
specify compression limits                                    TARGET               COMPRESS=
specify expansion limits                                      TARGET               EXPAND=
specify similarity measure                                    TARGET               MEASURE=
specify similarity measure and path                           TARGET               PATH=
specify sequence slide                                        TARGET               SLIDE=

Printing and Graphical Control Options
specify time ID format                                        ID                   FORMAT=
specify printed output                                        PROC SIMILARITY      PRINT=
specify detailed printed output                               PROC SIMILARITY      PRINTDETAILS
specify graphical output                                      PROC SIMILARITY      PLOTS=

Miscellaneous Options
specify that analysis variables are processed                 PROC SIMILARITY      SORTNAMES
  in ascending order
specify the ordering of the processing of the                 PROC SIMILARITY      ORDER=
  input and target variables
PROC SIMILARITY Statement

PROC SIMILARITY options ;
The following options can be used in the PROC SIMILARITY statement.
DATA=SAS-data-set
names the SAS data set that contains the time series, transactional, or sequence input data for the procedure. If the DATA= option is not specified, the most recently created SAS data set is used.
ORDER=
specifies the order in which the variables listed in the INPUT and TARGET statements are processed. This ordering affects the OUTSEQUENCE=, OUTPATH=, OUTMEASURE=, and OUTSUM= data sets, as well as the printed and graphical output. The SORTNAMES option also affects the ordering of the analysis. The ORDER= option accepts the following values:
INPUT
specifies that each INPUT variable is processed and then the TARGET variables are processed. The results are stored and printed based only on the INPUT variables.
INPUTTARGET
specifies that each INPUT variable is processed and then the TARGET variables are processed. The results are stored and printed based on both the INPUT and TARGET variables. This is the default.
TARGET
specifies that each TARGET variable is processed and then the INPUT variables are processed. The results are stored and printed based only on the TARGET variables.
TARGETINPUT
specifies that each TARGET variable is processed and then the INPUT variables are processed. The results are stored and printed based on both the TARGET and INPUT variables.
OUT=SAS-data-set
names the output data set to contain the time series variables specified in the subsequent INPUT and TARGET statements. If an ID variable is specified, it is also included in the OUT= data set. The values are accumulated based on the ID statement INTERVAL= option or the ACCUMULATE= options or both. The values are transformed based on the INPUT or TARGET statement TRANSFORM=, DIF=, and/or SDIF= options, in this order. The OUT= data set is particularly useful when you want to further analyze, model, or forecast the resulting time series with other SAS/ETS procedures.
OUTMEASURE=SAS-data-set
names the output data set to contain the detailed similarity measures by time ID value. The form of the OUTMEASURE= data set is determined by the PROC SIMILARITY statement SORTNAMES and ORDER= options.
OUTPATH=SAS-data-set
names the output data set to contain the path used to compute the similarity measures for each slide and warp. The form of the OUTPATH= data set is determined by the PROC SIMILARITY statement SORTNAMES and ORDER= options.
OUTSEQUENCE=SAS-data-set
names the output data set to contain the sequences used to compute the similarity measures for each slide and warp. The form of the OUTSEQUENCE= data set is determined by the PROC SIMILARITY statement SORTNAMES and ORDER= options.
OUTSUM=SAS-data-set
names the output data set to contain the similarity measure summary. The OUTSUM= data set is particularly useful when analyzing large numbers of series and only the summary of the results is needed. The form of the OUTSUM= data set is determined by the PROC SIMILARITY statement SORTNAMES and ORDER= options.
PLOTS=option
PLOTS=( options . . . )
specifies the graphical output desired. The options are separated by spaces. By default, the SIMILARITY procedure produces no graphical output. The following graphical options are available:
ALL
same as PLOTS=(INPUTS TARGETS SEQUENCES NORMALIZED SCALED DISTANCES PATHS MAPS WARPS COSTS MEASURES).
COSTS
plots time warp costs graphics.
DISTANCES
plots similarity absolute and relative distances graphics. (OUTPATH= data set)
INPUTS
plots input variable time series graphics. (OUT= data set)
MAPS
plots time warp maps graphics. (OUTPATH= data set)
MEASURES
plots similarity measure graphics. (OUTMEASURE= data set)
NORMALIZED
plots both the input and target variable normalized sequence graphics. These plots are displayed only when the INPUT or TARGET statement NORMALIZE= option is specified.
PATHS
plots time warp paths graphics. (OUTPATH= data set)
SCALED
plots the input variable scaled sequence graphics. These plots are displayed only when the INPUT statement SCALE= option is specified.
SEQUENCES
plots both the input and target variable sequence graphics. (OUTSEQUENCE= data set)
TARGETS
plots target variable time series graphics. (OUT= data set)
WARPS
plots time warps graphics. (OUTPATH= data set)
PRINT=option
PRINT=(options . . . )
specifies the printed output desired. The options are separated by spaces. By default, the SIMILARITY procedure produces no printed output. The following printing options are available:
DESCSTATS
prints the descriptive statistics for the working time series.
PATHS
prints the path statistics table.
COSTS
prints the cost statistics table.
WARPS
prints the warp summary table.
SLIDES
prints the slides summary table.
SUMMARY
prints the similarity measure summary table.
ALL
same as PRINT=(DESCSTATS PATHS COSTS WARPS SLIDES SUMMARY).
PRINTDETAILS
specifies that output requested with the PRINT= option be printed in greater detail.
SEASONALITY=integer
specifies the length of the seasonal cycle, where integer ranges from one to 10,000. For example, SEASONALITY=3 means that every group of three time periods forms a seasonal cycle. By default, the length of the seasonal cycle is one (no seasonality) or the length implied by the INTERVAL= option specified in the ID statement. For example, INTERVAL=MONTH implies that the length of the seasonal cycle is twelve.
SORTNAMES
specifies that the variables specified in the INPUT and TARGET statements are processed in order sorted by the variable names. By default, the SIMILARITY procedure processes the variables in the order they are listed. The ORDER= option also affects the ordering of the analysis.
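The PROC SIMILARITY statement options described above can be combined in one step. The following sketch is illustrative only: the data set WORK.SERIES, the variables X1, X2, and Y, and the simulated values are assumptions made up for the example, and the particular output data sets, tables, and plots requested are arbitrary choices.

```sas
/* Hypothetical monthly data with two input series and one target series. */
data work.series;
   format date date9.;
   do i = 0 to 35;
      date = intnx('month', '01JAN2006'd, i);
      x1 = 100 + 5*i + 10*sin(i);
      x2 =  80 + 3*i +  6*sin(i/2);
      y  =  50 + 2*i +  4*cos(i);
      output;
   end;
   drop i;
run;

ods graphics on;   /* graphical output is produced through ODS Graphics */

proc similarity data=work.series
                out=work.working
                outsum=work.summary
                outmeasure=work.measures
                seasonality=12
                sortnames
                print=(descstats summary) printdetails
                plots=(inputs targets);
   id date interval=month;
   input x2 x1;              /* SORTNAMES requests processing in name order */
   target y / measure=sqrdev;
run;

ods graphics off;
```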
BY Statement

A BY statement can be used with PROC SIMILARITY to obtain separate analyses for groups of observations defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

    Sort the data by using the SORT procedure with a similar BY statement.
    Specify the option NOTSORTED or DESCENDING in the BY statement for the SIMILARITY procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
    Create an index on the BY variables by using the DATASETS procedure.

For more information about the BY statement, see SAS Language Reference: Concepts. For more information about the DATASETS procedure, see the discussion in the Base SAS Procedures Guide.
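The first alternative, sorting with the SORT procedure before BY-group processing, might look like the following sketch. The data set WORK.TRANSACTIONS and all of its values are hypothetical, and SETMISSING=0 is an assumption that days without transactions represent zero activity.

```sas
/* Hypothetical transactions recorded in no particular order. */
data work.transactions;
   input customer date :date9. withdrawals fraud;
   format date date9.;
   datalines;
2 03JAN2008  35 200
1 04JAN2008  60 150
2 02JAN2008 500  20
1 01JAN2008 100  20
1 03JAN2008  25 200
2 05JAN2008  80 150
1 06JAN2008  10  30
2 06JAN2008  15  30
;
run;

/* Sort by the BY variable (and by the time ID within each group). */
proc sort data=work.transactions;
   by customer date;
run;

proc similarity data=work.transactions outsum=work.summary;
   by customer;              /* one analysis per customer */
   id date interval=day accumulate=total setmissing=0;
   input withdrawals;
   target fraud / measure=mabsdev;
run;
```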
FCMPOPT Statement

FCMPOPT options ;

The FCMPOPT statement specifies the options related to user-defined functions and subroutines. The following options can be used with the FCMPOPT statement:
QUIET=ON | OFF
specifies whether the nonfatal errors and warnings generated by the user-defined SAS language functions and subroutines are printed to the log. Nonfatal errors are usually associated with operations with missing values. The default is QUIET=ON.
TRACE=ON | OFF
specifies whether the user-defined SAS language functions and subroutines tracings are printed to the log. Tracings are the results of every operation executed. This option is generally used for debugging. The default is TRACE=OFF.
ID Statement

ID variable INTERVAL= interval options ;

The ID statement names a numeric variable that identifies observations in the input and output data sets. The ID variable's values are assumed to be SAS date, time, or datetime values. In addition, the ID statement specifies the (desired) frequency associated with the time series. The ID statement options also specify how the observations are accumulated and how the time ID values are aligned to form the time series. The options specified affect all variables listed in subsequent INPUT and TARGET statements. If an ID statement is specified, the INTERVAL= option must also be specified. The other ID statement options are optional. If an ID statement is not specified, the observation number, with respect to the BY group, is used as the time ID.

The following options can be used with the ID statement.
ACCUMULATE=option
specifies how the data set observations are accumulated within each time period. The frequency (width of each time interval) is specified by the INTERVAL= option. The ID variable contains the time ID values. Each time ID variable value corresponds to a specific time period. The accumulated values form the time series, which is used in subsequent analysis. The ACCUMULATE= option is particularly useful when zero or more than one input observation coincides with a particular time period (for example, time-stamped transactional data). The EXPAND procedure offers additional frequency conversions and transformations that can also be useful in creating a time series. The following options determine how the observations are accumulated within each time period based on the ID variable and the frequency specified by the INTERVAL= option:
NONE
No accumulation occurs; the ID variable values must be equally spaced with respect to the frequency. This is the default option.
TOTAL
Observations are accumulated based on the total sum of their values.
AVERAGE | AVG
Observations are accumulated based on the average of their values.
MINIMUM | MIN
Observations are accumulated based on the minimum of their values.
MEDIAN | MED
Observations are accumulated based on the median of their values.
MAXIMUM | MAX
Observations are accumulated based on the maximum of their values.
N
Observations are accumulated based on the number of nonmissing observations.
NMISS
Observations are accumulated based on the number of missing observations.
NOBS
Observations are accumulated based on the number of observations.
FIRST
Observations are accumulated based on the first of their values.
LAST
Observations are accumulated based on the last of their values.
STDDEV | STD
Observations are accumulated based on the standard deviation of their values.
CSS
Observations are accumulated based on the corrected sum of squares of their values.
USS
Observations are accumulated based on the uncorrected sum of squares of their values.
If the ACCUMULATE= option is specified, the SETMISSING= option is useful for specifying how accumulated missing values are treated. If missing values should be interpreted as zero, then SETMISSING=0 should be used. The Details section describes accumulation in greater detail.
ALIGN=option
controls the alignment of SAS dates that are used to identify output observations. The ALIGN= option accepts the following values: BEGINNING|BEG|B, MIDDLE|MID|M, and ENDING|END|E. ALIGN=BEGINNING is the default.
END=option
specifies a SAS date, datetime, or time value that represents the end of the data. If the last time ID variable value is less than the END= value, the series is extended with missing values. If the last time ID variable value is greater than the END= value, the series is truncated. For example, END="&sysdate"D uses the automatic macro variable SYSDATE to extend or truncate the series to the current date. The START= and END= options can be used to ensure that data associated with each BY group contains the same number of observations.
FORMAT=format
specifies the SAS format for the time ID values. If the FORMAT= option is not specified, the default format is implied by the INTERVAL= option. For example, FORMAT=DATE9. specifies that the DATE9. SAS format is to be used. Notice that the terminating "." is required when specifying a SAS format.
INTERVAL= interval
specifies the frequency of the accumulated time series. For example, if the input data set consists of quarterly observations, then INTERVAL=QTR should be used. If the SEASONALITY= option is not specified, the length of the seasonal cycle is implied from the INTERVAL= option. For example, INTERVAL=QTR implies a seasonal cycle of length 4. If the ACCUMULATE= option is also specified, the INTERVAL= option determines the time periods for the accumulation of observations.
NOTSORTED
specifies that the time ID values are not in sorted order. The SIMILARITY procedure sorts the data with respect to the time ID prior to analysis if the NOTSORTED option is specified.
SETMISSING=option | number
specifies how missing values (either actual or accumulated) are interpreted in the accumulated time series. If a number is specified, missing values are set to that number. If a missing value indicates an unknown value, the SETMISSING= option should not be used. If a missing value indicates no value, then SETMISSING=0 should be used. You typically use SETMISSING=0 for transactional data, because no recorded data usually implies no activity. The following options can also be used to determine how missing values are assigned:
MISSING
Missing values are set to missing. This is the default option.
AVERAGE | AVG
Missing values are set to the accumulated average value.
MINIMUM | MIN
Missing values are set to the accumulated minimum value.
MEDIAN | MED
Missing values are set to the accumulated median value.
MAXIMUM | MAX
Missing values are set to the accumulated maximum value.
FIRST
Missing values are set to the accumulated first nonmissing value.
LAST
Missing values are set to the accumulated last nonmissing value.
PREVIOUS | PREV
Missing values are set to the previous period’s accumulated nonmissing value. Missing values at the beginning of the accumulated series remain missing.
NEXT
Missing values are set to the next period’s accumulated nonmissing value. Missing values at the end of the accumulated series remain missing.
START= option
specifies a SAS date, datetime, or time value that represents the beginning of the data. If the first time ID variable value is greater than the START= value, the series is prepended with missing values. If the first time ID variable value is less than the START= value, the series is truncated. The START= and END= options can be used to ensure that data associated with each BY group contains the same number of observations.
ZEROMISS=option
specifies how beginning and/or ending zero values (either actual or accumulated) are interpreted in the accumulated time series. The following options can also be used to determine how beginning and/or ending zero values are assigned:
NONE
Beginning and/or ending zeros are unchanged. This is the default.
LEFT
Beginning zeros are set to missing.
RIGHT
Ending zeros are set to missing.
BOTH
Both beginning and ending zeros are set to missing.
If the accumulated series is all missing and/or zero, the series is not changed.
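Several ID statement options are often used together when transactional data are irregular and unordered. The following sketch shows one such combination; the data set WORK.EVENTS, the variables AMOUNT and PATTERN, and every value are hypothetical, and the date range chosen for START= and END= is arbitrary.

```sas
/* Hypothetical events: irregular dates, not in time order, with a repeated date. */
data work.events;
   input date :date9. amount pattern;
   format date date9.;
   datalines;
05JAN2008 30 4
02JAN2008 10 1
02JAN2008 15 2
08JAN2008 20 3
04JAN2008 40 5
;
run;

proc similarity data=work.events out=work.daily;
   id date interval=day          /* required whenever an ID statement is used      */
           accumulate=total      /* sum the observations that fall on the same day */
           setmissing=0          /* interpret missing values as zero               */
           start='01JAN2008'd    /* extend or truncate the start of the series     */
           end='10JAN2008'd      /* extend or truncate the end of the series       */
           align=beginning
           format=date9.
           notsorted;            /* time ID values are not in sorted order         */
   input amount;
   target pattern / measure=absdev;
run;
```

Because the ID statement options apply to all variables in subsequent INPUT and TARGET statements, AMOUNT and PATTERN are accumulated in the same way here.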
INPUT Statement

INPUT variable-list < / options > ;

The INPUT statement lists the input numeric variables in the DATA= data set whose values are to be accumulated to form the time series or represent ordered numeric sequences (when no ID statement is specified). An input data set variable can be specified in only one INPUT or TARGET statement. Any number of INPUT statements can be used.

The following options can be used with an INPUT statement.
ACCUMULATE=option
specifies how the data set observations are accumulated within each time period for the variables listed in the INPUT statement. If the ACCUMULATE= option is not specified in the INPUT statement, accumulation is determined by the ACCUMULATE= option of the ID statement. If the ACCUMULATE= option is not specified in the ID statement or the INPUT statement, no accumulation is performed. See the ID statement ACCUMULATE= option for more details.
DIF=(numlist)
specifies the differencing to be applied to the accumulated time series. The list of differencing orders must be separated by spaces or commas. For example, DIF=(1,3) specifies first, then third, order differencing. Differencing is applied after time series transformation; that is, the TRANSFORM= option is applied before the DIF= option. Simple differencing is useful when you want to detrend the time series before computing the similarity measures.
NORMALIZE=option
specifies the sequence normalization to be applied to the working input sequence. The following normalization options are provided: NONE
No normalization is applied. This option is the default.
ABSOLUTE
Absolute normalization is applied.
STANDARD
Standard normalization is applied.
User-Defined
Normalization is computed by a user-defined subroutine created using the FCMP procedure, where User-Defined is the subroutine name.
Normalization is applied to the working input sequence, which can be a subset of the working input time series if the SLIDE=INDEX or SLIDE=SEASON options are specified.
SCALE=option
specifies the scaling of the working input sequence with respect to the working target sequence. Scaling is performed after normalization. The following scaling options are provided:
NONE
No scaling is applied. This option is the default.
ABSOLUTE
Absolute scaling is applied.
STANDARD
Standard scaling is applied.
User-Defined
Scaling is computed by a user-defined subroutine created using the FCMP procedure, where User-Defined is the subroutine name.
Scaling is applied to the working input sequence, which can be a subset of the working input time series if the SLIDE=INDEX or SLIDE=SEASON options are specified.
SDIF=(numlist)
specifies the seasonal differencing to be applied to the accumulated time series. The list of seasonal differencing orders must be separated by spaces or commas. For example, SDIF=(1,3) specifies first, then third, order seasonal differencing. Differencing is applied after time series transformation; that is, the TRANSFORM= option is applied before the SDIF= option. Seasonal differencing is useful when you want to deseasonalize the time series before computing the similarity measures.
SETMISSING=option | number
SETMISS=option | number
specifies how missing values (either actual or accumulated) are interpreted in the accumulated time series or ordered sequence for variables listed in the INPUT statement. If the SETMISSING= option is not specified in the INPUT statement, missing values are set based on the SETMISSING= option of the ID statement. If the SETMISSING= option is not specified in the ID statement or the INPUT statement, no missing value interpretation is performed. See the ID statement SETMISSING= option for more details.
TRANSFORM=option
specifies the time series transformation to be applied to the accumulated time series. The following transformations are provided:
NONE
No transformation is applied. This option is the default.
LOG
Logarithmic transformation is applied.
SQRT
Square-root transformation is applied.
LOGISTIC
Logistic transformation is applied.
BOXCOX(number)
Box-Cox transformation with parameter number is applied, where number is a real number between -5 and 5.
User-Defined
Transformation is computed by a user-defined subroutine created using the FCMP procedure where User-Defined is the subroutine name.
When the TRANSFORM= option is specified, the time series must be strictly positive unless a user-defined function is used.
TRIMMISSING=option
TRIMMISS=option
specifies how missing values (either actual or accumulated) are trimmed from the accumulated time series or ordered sequence for variables listed in the INPUT statement. The following trimming options are provided:
NONE
No missing value trimming is applied.
LEFT
Beginning missing values are trimmed.
RIGHT
Ending missing values are trimmed.
BOTH
Both beginning and ending missing values are trimmed. This is the default.
ZEROMISS=option
specifies how beginning and/or ending zero values (either actual or accumulated) are interpreted in the accumulated time series or ordered sequence for variables listed in the INPUT statement. If the ZEROMISS= option is not specified in the INPUT statement, beginning and/or ending zero values are set based on the ZEROMISS= option of the ID statement. If the ZEROMISS= option is not specified in the ID statement or the INPUT statement, no zero value interpretation is performed. See the ID statement ZEROMISS= option for more details.
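When no ID statement is specified, the INPUT and TARGET variables are treated as ordered numeric sequences, and the missing-value and zero-value options can be set per statement. The following sketch shows the syntax only; the data set WORK.SEQUENCES and its values are hypothetical, and the particular option values are arbitrary choices rather than recommendations.

```sas
/* Hypothetical ordered sequences; there is no time ID variable. */
data work.sequences;
   input x y;
   datalines;
. 0
3 0
5 2
. 4
8 6
6 5
4 3
. 0
;
run;

proc similarity data=work.sequences outsum=work.summary;
   /* No ID statement: the observation number is used as the time ID. */
   input  x / setmissing=previous trimmissing=both;
   /* Beginning and ending zeros in Y are interpreted as missing. */
   target y / zeromiss=both trimmissing=both measure=sqrdev;
run;
```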
TARGET Statement

TARGET variable-list < / options > ;

The TARGET statement lists the numeric target variables in the DATA= data set whose values are to be accumulated to form the time series or represent ordered numeric sequences (when no ID statement is specified). An input data set variable can be specified in only one INPUT or TARGET statement. Any number of TARGET statements can be used.

The following options can be used with a TARGET statement.
ACCUMULATE=option
specifies how the data set observations are accumulated within each time period for the variables listed in the TARGET statement. If the ACCUMULATE= option is not specified in the TARGET statement, accumulation is determined by the ACCUMULATE= option of the ID statement. If the ACCUMULATE= option is not specified on the ID statement or the
TARGET statement, no accumulation is performed. See the ID statement ACCUMULATE= option for more details.
COMPRESS=option | (options)
specifies the sliding sequence (global) and warping (local) compression range of the target sequence with respect to the input sequence. Compression of the target sequence is the same as expansion of the input sequence and vice versa. The compression limits are defined based on the length of the target sequence and are imposed on the target sequence. The following compression options are provided:
GLOBALABS=integer
specifies the absolute global compression, where integer ranges from zero to 10,000. GLOBALABS=0 implies no global compression, which is the default unless the GLOBALPCT= option is specified.
GLOBALPCT=number
specifies global compression as a percentage of the length of the target sequence, where number ranges from zero to 100. GLOBALPCT=0 implies no global compression, which is the default. GLOBALPCT=100 implies maximum allowable compression.
LOCALABS=integer
specifies the absolute local compression, where integer ranges from zero to 10,000. The default is maximum allowable absolute local compression unless the LOCALPCT= option is specified.
LOCALPCT=number
specifies local compression as a percentage of the length of the input sequence, where number ranges from zero to 100. The percentage specified by the LOCALPCT= option must be less than the GLOBALPCT= option. LOCALPCT=0 implies no local compression. LOCALPCT=100 implies maximum allowable local compression. The default is LOCALPCT=100.

If the SLIDE=NONE or the SLIDE=SEASON option is specified in the TARGET statement, the global compression options are ignored. To disallow local compression, use the option COMPRESS=(LOCALPCT=0 LOCALABS=0).

If the SLIDE=INDEX option is specified, the global compression options are not ignored. To completely disallow both global and local compression, use the option COMPRESS=(GLOBALPCT=0 LOCALPCT=0) or COMPRESS=(GLOBALABS=0 LOCALABS=0). To allow only local compression, use the option COMPRESS=(GLOBALPCT=0 GLOBALABS=0). These are the default compression options.
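When a percentage limit and an absolute limit are both specified, the formulas in the examples that follow take the smaller of the two, capped by the target sequence length minus one. The arithmetic can be checked with a short DATA step. This is only a sketch of those formulas, not of the procedure's internal code; the target length of 25 and the variable names are assumptions chosen for illustration.

```sas
data _null_;
   ny = 25;   /* hypothetical length of the target sequence */

   /* COMPRESS=(GLOBALPCT=20 GLOBALABS=10): min(floor(0.2*Ny), min(Ny-1, 10)) */
   lc_global = min(floor(0.20*ny), min(ny - 1, 10));

   /* COMPRESS=(LOCALPCT=10 LOCALABS=5): min(floor(0.1*Ny), min(Ny-1, 5)) */
   lc_local = min(floor(0.10*ny), min(ny - 1, 5));

   put lc_global= lc_local=;
run;
```

For a target sequence of length 25, the step writes lc_global=5 and lc_local=2 to the log: the percentage limit is the binding one in both cases.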
The preceding options can be used in combination to specify the desired amount of global and local compression, as the following examples illustrate. Let $L_c$ denote the global compression limit and $l_c$ denote the local compression limit.

COMPRESS=(GLOBALPCT=20) specifies the global and local compression to range from zero to $L_c = \min(\lfloor 0.2 N_y \rfloor,\, N_y - 1)$.

COMPRESS=(GLOBALPCT=20 GLOBALABS=10) allows for the global and local compression to range from zero to $L_c = \min(\lfloor 0.2 N_y \rfloor,\, \min(N_y - 1,\, 10))$.

COMPRESS=(LOCALPCT=10) allows for the local compression to range from zero to $l_c = \min(\lfloor 0.1 N_y \rfloor,\, N_y - 1)$.
1458 F Chapter 22: The SIMILARITY Procedure (Experimental)
COMPRESS=(LOCALPCT=20 LOCALABS=5) allows for the local compression to range from zero to $l_c = \min(\lfloor 0.2 N_y \rfloor,\, \min(N_y - 1,\, 5))$.

COMPRESS=(GLOBALPCT=20 LOCALPCT=20) allows for the global compression to range from zero to $L_c = \min(\lfloor 0.2 N_y \rfloor,\, N_y - 1)$ and allows for the local compression to range from zero to $l_c = \min(\lfloor 0.2 N_y \rfloor,\, N_y - 1)$.

COMPRESS=(GLOBALPCT=20 GLOBALABS=10 LOCALPCT=10 LOCALABS=5) allows for the global compression to range from zero to $L_c = \min(\lfloor 0.2 N_y \rfloor,\, \min(N_y - 1,\, 10))$ and allows for the local compression to range from zero to $l_c = \min(\lfloor 0.1 N_y \rfloor,\, \min(N_y - 1,\, 5))$.

Suppose $T_z$ is the length of the input time series and $N_y$ is the length of the target sequence. The valid global compression limit, $L_c$, is always limited by the length of the target sequence: $0 \le L_c < N_y$.

Suppose $N_x$ is the length of the input sequence and $N_y$ is the length of the target sequence. The valid local compression limit, $l_c$, is always limited by the lengths of the input and target sequences: $\max(0,\, N_y - N_x) \le l_c < N_y$.

DIF=(numlist)
specifies the differencing to be applied to the accumulated time series. The list of differencing orders must be separated by spaces or commas. For example, DIF=(1,3) specifies first, then third, order differencing. Differencing is applied after time series transformation. The TRANSFORM= option is applied before the DIF= option. Simple differencing is useful when you want to detrend the time series before computing the similarity measures.

EXPAND=option | (options)
specifies the sliding sequence (global) and warping (local) expansion range of the target sequence with respect to the input sequence. Expansion of the target sequence is the same as compression of the input sequence and vice versa. The expansion limits are defined based on the length of the input sequence, but are imposed on the target sequence. The following expansion options are provided:

GLOBALABS=integer
specifies the absolute global expansion, where integer ranges from zero to 10,000. GLOBALABS=0 implies no global expansion, which is the default unless the GLOBALPCT= option is specified.

GLOBALPCT=number
specifies global expansion as a percentage of the length of the target sequence, where number ranges from zero to 100. GLOBALPCT=0 implies no global expansion, which is the default unless the GLOBALABS= option is specified. GLOBALPCT=100 implies maximum allowable global expansion.

LOCALABS=integer
specifies the absolute local expansion, where integer ranges from zero to 10,000. The default is maximum allowable absolute local expansion unless the LOCALPCT= option is specified.

LOCALPCT=number
specifies local expansion as a percentage of the length of the target sequence, where number ranges from zero to 100. LOCALPCT=0 implies no local expansion. LOCALPCT=100 implies maximum allowable local expansion. The default is LOCALPCT=100.
If the SLIDE=NONE or the SLIDE=SEASON option is specified in the TARGET statement, the global expansion options are ignored. To disallow local expansion, use the option EXPAND=(LOCALPCT=0 LOCALABS=0).

If the SLIDE=INDEX option is specified, the global expansion options are not ignored. To completely disallow both global and local expansion, use the option EXPAND=(GLOBALPCT=0 LOCALPCT=0) or EXPAND=(GLOBALABS=0 LOCALABS=0). To allow only local expansion, use the option EXPAND=(GLOBALPCT=0 GLOBALABS=0). These are the default expansion options.

The preceding options can be used in combination to specify the desired amount of global and local expansion, as the following examples illustrate. Let $L_e$ denote the global expansion limit and $l_e$ denote the local expansion limit.

EXPAND=(GLOBALPCT=20) allows for the global and local expansion to range from zero to $L_e = \min(\lfloor 0.2 N_y \rfloor,\, N_y - 1)$.

EXPAND=(GLOBALPCT=20 GLOBALABS=10) allows for the global and local expansion to range from zero to $L_e = \min(\lfloor 0.2 N_y \rfloor,\, \min(N_y - 1,\, 10))$.

EXPAND=(LOCALPCT=10) allows for the local expansion to range from zero to $l_e = \min(\lfloor 0.1 N_y \rfloor,\, N_y - 1)$.

EXPAND=(LOCALPCT=10 LOCALABS=5) allows for the local expansion to range from zero to $l_e = \min(\lfloor 0.1 N_y \rfloor,\, \min(N_y - 1,\, 5))$.

EXPAND=(GLOBALPCT=20 LOCALPCT=10) allows for the global expansion to range from zero to $L_e = \min(\lfloor 0.2 N_y \rfloor,\, N_y - 1)$ and allows for the local expansion to range from zero to $l_e = \min(\lfloor 0.1 N_y \rfloor,\, N_y - 1)$.

EXPAND=(GLOBALPCT=20 GLOBALABS=10 LOCALPCT=10 LOCALABS=5) allows for the global expansion to range from zero to $L_e = \min(\lfloor 0.2 N_y \rfloor,\, \min(N_y - 1,\, 10))$ and allows for the local expansion to range from zero to $l_e = \min(\lfloor 0.1 N_y \rfloor,\, \min(N_y - 1,\, 5))$.

Suppose $T_z$ is the length of the input time series and $N_y$ is the length of the target sequence. The valid global expansion limit, $L_e$, is always limited by the length of the input time series: $0 \le L_e < T_z$.

Suppose $N_x$ is the length of the input sequence and $N_y$ is the length of the target sequence. The valid local expansion limit, $l_e$, is always limited by the lengths of the input and target sequences: $\max(0,\, N_x - N_y) \le l_e < N_x$.

MEASURE=option
specifies the similarity measure to be computed by using the working input and target sequences. The following similarity measures are provided:

SQRDEV
squared deviation. This option is the default.
ABSDEV
absolute deviation
MSQRDEV
mean squared deviation
MSQRDEVINP
mean squared deviation relative to the length of the input sequence
MSQRDEVTAR
mean squared deviation relative to the length of the target sequence
MSQRDEVMIN
mean squared deviation relative to the minimum valid path length
MSQRDEVMAX
mean squared deviation relative to the maximum valid path length
MABSDEV
mean absolute deviation
MABSDEVINP
mean absolute deviation relative to the length of the input sequence
MABSDEVTAR
mean absolute deviation relative to the length of the target sequence
MABSDEVMIN
mean absolute deviation relative to the minimum valid path length
MABSDEVMAX
mean absolute deviation relative to the maximum valid path length
User-Defined
Measure computed by a user-defined function created by using the FCMP procedure, where User-Defined is the function name
NORMALIZE=option
specifies the sequence normalization to be applied to the working target sequence. The following normalization options are provided:

NONE
No normalization is applied. This option is the default.
ABSOLUTE
Absolute normalization is applied.
STANDARD
Standard normalization is applied.
User-Defined
Normalization computed by a user-defined subroutine, created by using the FCMP procedure, where User-Defined is the subroutine name.
PATH=option
specifies the similarity measure and warping path information to be computed by using the working input and target sequences. The following similarity measures and warping path are provided:

User-Defined
measure and path computed by a user-defined subroutine created by using the FCMP procedure, where User-Defined is the subroutine name
For computational efficiency, use the PATH= option only when both the similarity measure and the warping path information are needed. If only the similarity measure is needed, use the MEASURE= option. If both the MEASURE= and PATH= options are specified in the TARGET statement, the PATH= option takes precedence.

SDIF=(numlist)
specifies the seasonal differencing to be applied to the accumulated time series. The list of seasonal differencing orders must be separated by spaces or commas. For example, SDIF=(1,3) specifies first, then third, order seasonal differencing. Differencing is applied after time series transformation. The TRANSFORM= option is applied before the SDIF= option. Seasonal differencing is useful when you want to deseasonalize the time series before computing the similarity measures.

SETMISSING=option | number
SETMISS=option | number
specifies how missing values (either actual or accumulated) are interpreted in the accumulated time series for variables listed in the TARGET statement. If the SETMISSING= option is not specified in the TARGET statement, missing values are set based on the SETMISSING= option of the ID statement. If the SETMISSING= option is not specified in the ID statement or the TARGET statement, no missing value interpretation is performed. See the ID statement SETMISSING= option for more details.

SLIDE=option
specifies the sliding of the target sequence with respect to the input sequence. The following slides are provided:

NONE
No sequence sliding. The input time series is compared with the target sequence directly with no sliding. This option is the default.
INDEX
Slide by time index. The input time series is compared with the target sequence by observation index.
SEASON
Slide by seasonal index. The input time series is compared with the target sequence by seasonal index.
NOTE: The SLIDE= option takes precedence over the COMPRESS= and EXPAND= options.

TRANSFORM=option
specifies the time series transformation to be applied to the accumulated time series. The following transformations are provided:

NONE
No transformation is applied. This option is the default.
LOG
Logarithmic transformation is applied.
SQRT
Square-root transformation is applied.
LOGISTIC
Logistic transformation is applied.
BOXCOX(number)
Box-Cox transformation with parameter number is applied, where the real number is between -5 and 5.
User-Defined
Transformation is computed by a user-defined subroutine created by using the FCMP procedure, where User-Defined is the subroutine name.
When the TRANSFORM= option is specified, the time series must be strictly positive unless a user-defined function is used.

TRIMMISSING=option
TRIMMISS=option
specifies how missing values (either actual or accumulated) are trimmed from the accumulated time series or ordered sequence for variables listed in the TARGET statement. The following trimming options are provided:

NONE
No missing value trimming is applied.
LEFT
Beginning missing values are trimmed.
RIGHT
Ending missing values are trimmed.
BOTH
Both beginning and ending missing values are trimmed. This is the default.
ZEROMISS=option
specifies how beginning and/or ending zero values (either actual or accumulated) are interpreted in the accumulated time series or ordered sequence for variables listed in the TARGET statement. If the ZEROMISS= option is not specified in the TARGET statement, beginning and/or ending values are set based on the ZEROMISS= option of the ID statement. See the ID statement ZEROMISS= option for more details.
Details: SIMILARITY Procedure

The SIMILARITY procedure can be used to form time series data from transactional data.

1. accumulation                    ACCUMULATE= option
2. missing value interpretation    SETMISSING= option
3. zero value interpretation       ZEROMISS= option
The accumulated time series can then be transformed to form the working time series. Transformations are useful when you want to stabilize the time series before computing the similarity measures. Simple and seasonal differencing are useful when you want to detrend or deseasonalize the time series before computing the similarity measures. Often, but not always, the TRANSFORM=, DIF=, and SDIF= options should be specified in the same way for both the target and input variables.

1. time series transformation             TRANSFORM= option
2. time series differencing               DIF= and SDIF= options
3. time series missing value trimming     TRIMMISSING= option
4. time series descriptive statistics     PRINT=DESCSTATS option

After the working series is formed, it can be treated as an ordered sequence that can be normalized or scaled. Normalizations are useful when you want to compare the "shape" or "profile" of the time series. Scaling is useful when you want to compare the input sequence to the target sequence while discounting the variation of the target sequence.

1. normalization    NORMALIZE= option
2. scaling          SCALE= option
After the working sequences are formed, similarity measures can be computed between input and target sequences.

1. sliding               SLIDE= option
2. warping               COMPRESS= and EXPAND= options
3. similarity measure    MEASURE= and PATH= options
The SLIDE= option is used to specify observation-index sliding, seasonal-index sliding, or no sliding. The COMPRESS= and EXPAND= options are used to specify the warping limits. The MEASURE= and PATH= options are used to specify how the similarity measures are computed.
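As an illustration of how the warping limits combine, the following Python sketch computes a compression limit as the smallest of the floored percentage of the target length, the target length minus one, and any absolute limit, following the COMPRESS= examples given for the TARGET statement. The function name and arguments are hypothetical; this is not the procedure's implementation.

```python
import math

def compression_limit(n_target, pct=None, abs_limit=None):
    """Sketch of L_c = min(floor(pct/100 * N_y), min(N_y - 1, abs_limit)).

    n_target  -- length of the target sequence (N_y)
    pct       -- value of a ...PCT= suboption, 0..100, or None if not given
    abs_limit -- value of a ...ABS= suboption, or None if not given
    """
    limit = n_target - 1                      # the limit is always below N_y
    if pct is not None:
        limit = min(limit, math.floor(pct * n_target / 100))
    if abs_limit is not None:
        limit = min(limit, abs_limit)
    return limit

# COMPRESS=(GLOBALPCT=20) with a target of length 50
print(compression_limit(50, pct=20))
# COMPRESS=(GLOBALPCT=20 GLOBALABS=10) with a target of length 100
print(compression_limit(100, pct=20, abs_limit=10))
```

The same arithmetic applies to the local limit by passing the LOCALPCT= and LOCALABS= values instead.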
Accumulation

If the ACCUMULATE= option is specified in the ID, INPUT, or TARGET statement, data set observations are accumulated within each time period. The frequency (width of each time interval) is specified by the INTERVAL= option in the ID statement. The ID variable contains the time ID values. Each time ID value corresponds to a specific time period. Accumulation is particularly useful when the input data set contains transactional data, whose observations are not spaced with respect to any particular time interval. The accumulated values form the time series, which is used in subsequent analyses.

For example, suppose a data set contains the following observations:

   19MAR1999    10
   19MAR1999    30
   11MAY1999    50
   12MAY1999    20
   23MAY1999    20

If the INTERVAL=MONTH option is specified, all of the preceding observations fall within three time periods: March 1999, April 1999, and May 1999. The observations are accumulated within each time period as follows:

If the ACCUMULATE=NONE option is specified, an error is generated because the ID variable values are not equally spaced with respect to the specified frequency (MONTH).

If the ACCUMULATE=TOTAL option is specified, the data are accumulated as follows:

   01MAR1999    40
   01APR1999     .
   01MAY1999    90

If the ACCUMULATE=AVERAGE option is specified, the data are accumulated as follows:
   01MAR1999    20
   01APR1999     .
   01MAY1999    30

If the ACCUMULATE=MINIMUM option is specified, the data are accumulated as follows:

   01MAR1999    10
   01APR1999     .
   01MAY1999    20

If the ACCUMULATE=MEDIAN option is specified, the data are accumulated as follows:

   01MAR1999    20
   01APR1999     .
   01MAY1999    20

If the ACCUMULATE=MAXIMUM option is specified, the data are accumulated as follows:

   01MAR1999    30
   01APR1999     .
   01MAY1999    50

If the ACCUMULATE=FIRST option is specified, the data are accumulated as follows:

   01MAR1999    10
   01APR1999     .
   01MAY1999    50

If the ACCUMULATE=LAST option is specified, the data are accumulated as follows:

   01MAR1999    30
   01APR1999     .
   01MAY1999    20

If the ACCUMULATE=STDDEV option is specified, the data are accumulated as follows:

   01MAR1999    14.14
   01APR1999      .
   01MAY1999    17.32
As can be seen from the preceding examples, even though the data set observations contained no missing values, the accumulated time series can have missing values.
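The accumulation examples above can be reproduced outside SAS with a short Python sketch. It groups the five transactions by calendar month and applies one statistic per period; `None` stands in for the SAS missing value, and the function and dictionary names are illustrative only. The sample (n - 1) standard deviation is used because it matches the STDDEV figures shown above.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median, stdev

# The five transactions from the example above, in data set order.
transactions = [
    (date(1999, 3, 19), 10), (date(1999, 3, 19), 30),
    (date(1999, 5, 11), 50), (date(1999, 5, 12), 20), (date(1999, 5, 23), 20),
]

# Hypothetical mapping from ACCUMULATE= keywords to Python statistics.
STATS = {
    "TOTAL": sum, "AVERAGE": mean, "MINIMUM": min, "MEDIAN": median,
    "MAXIMUM": max, "FIRST": lambda v: v[0], "LAST": lambda v: v[-1],
    "STDDEV": lambda v: stdev(v) if len(v) > 1 else None,
}

def accumulate_monthly(rows, statistic):
    """Accumulate (date, value) rows into one value per calendar month."""
    buckets = defaultdict(list)
    for day, value in rows:
        buckets[(day.year, day.month)].append(value)
    (year, month), last = min(buckets), max(buckets)
    series = []
    while (year, month) <= last:
        values = buckets.get((year, month))   # None when the month is empty
        result = STATS[statistic](values) if values else None
        series.append((date(year, month, 1), result))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series

for period, value in accumulate_monthly(transactions, "TOTAL"):
    print(period, value)
```

April 1999 has no transactions, so every statistic yields a missing value for that period, which is the point made in the text.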
Missing Value Interpretation

Sometimes missing values should be interpreted as unknown values. But sometimes missing values are known, such as when missing values are created from accumulation and the absence of observations should be interpreted as no (zero) value. In the latter case, the SETMISSING= option in the ID, INPUT, or TARGET statement can be used to specify how missing values are treated. The SETMISSING=0 option should be used when missing observations are to be treated as no (zero) values. In other cases, missing values should be interpreted as global values, such as minimum or maximum values of the accumulated series. The accumulated and interpreted time series is used in subsequent analyses.
Zero Value Interpretation

When querying certain databases for time-stamped data based on a particular time range, time periods that contain no data are sometimes assigned zero values. For certain analyses, it is more desirable to set these values to missing. Often, these beginning or ending zero values need to be interpreted as missing values. The ZEROMISS= option in the ID, INPUT, or TARGET statement specifies that the beginning, ending, or both the beginning and ending zero values are to be interpreted as missing values.
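A minimal Python sketch of this interpretation follows. It replaces leading and/or trailing zeros with `None` (standing in for a SAS missing value) and leaves interior zeros alone. The mode names LEFT, RIGHT, BOTH, and NONE are an assumption borrowed from the TRIMMISSING= option values; the actual ZEROMISS= values are documented with the ID statement.

```python
def zeromiss(series, mode="BOTH"):
    """Return a copy of series with beginning and/or ending zeros set to None.

    mode is assumed to be one of "NONE", "LEFT", "RIGHT", "BOTH".
    """
    out = list(series)
    if mode in ("LEFT", "BOTH"):
        i = 0
        while i < len(out) and out[i] == 0:    # walk forward over leading zeros
            out[i] = None
            i += 1
    if mode in ("RIGHT", "BOTH"):
        i = len(out) - 1
        while i >= 0 and out[i] == 0:          # walk backward over trailing zeros
            out[i] = None
            i -= 1
    return out

print(zeromiss([0, 0, 5, 0, 3, 0]))
```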
Time Series Transformation

Transformations are useful when you want to stabilize the time series before computing the similarity measures. There are four transformations available, for strictly positive series only. Let $y_t > 0$ be the original time series, and let $w_t$ be the transformed series. The transformations are defined as follows:

Log
is the logarithmic transformation, $w_t = \ln(y_t)$

Logistic
is the logistic transformation, $w_t = \ln\left(c y_t / (1 - c y_t)\right)$, where the scaling factor $c$ is $c = (1 - e^{-6})\, 10^{-\mathrm{ceil}(\log_{10}(\max(y_t)))}$ and $\mathrm{ceil}(x)$ is the smallest integer greater than or equal to $x$.

Square root
is the square root transformation, $w_t = \sqrt{y_t}$
Box-Cox
is the Box-Cox transformation, $w_t = \begin{cases} (y_t^{\lambda} - 1)/\lambda, & \lambda \ne 0 \\ \ln(y_t), & \lambda = 0 \end{cases}$
User-Defined
is the transformation computed by a user-defined subroutine created by using the FCMP procedure, where User-Defined is the subroutine name.
Other time series transformations can be performed prior to invoking the SIMILARITY procedure by using the EXPAND procedure of SAS/ETS or the DATA step.
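The four built-in transformations can be sketched in Python as follows. This is an illustration of the definitions above, not the procedure's code. The logistic scaling factor is taken to be $c = (1 - e^{-6})\,10^{-\mathrm{ceil}(\log_{10}(\max(y_t)))}$, which is this sketch's reading of the formula in this section, and the function names are hypothetical.

```python
import math

def logistic_scale(series):
    """Scaling factor c, chosen so that 0 < c * y < 1 for every y in series."""
    return (1 - math.exp(-6)) * 10 ** (-math.ceil(math.log10(max(series))))

def transform(series, kind, lam=None):
    """Apply LOG, SQRT, LOGISTIC, or BOXCOX (parameter lam) to a positive series."""
    if any(y <= 0 for y in series):
        raise ValueError("series must be strictly positive")
    if kind == "LOG":
        return [math.log(y) for y in series]
    if kind == "SQRT":
        return [math.sqrt(y) for y in series]
    if kind == "LOGISTIC":
        c = logistic_scale(series)
        return [math.log(c * y / (1 - c * y)) for y in series]
    if kind == "BOXCOX":
        if lam == 0:                           # limiting case of the Box-Cox family
            return [math.log(y) for y in series]
        return [(y ** lam - 1) / lam for y in series]
    raise ValueError("unknown transformation: " + kind)

print(transform([4.0, 9.0], "SQRT"))
print(transform([4.0, 9.0], "BOXCOX", lam=0.5))
```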
Time Series Differencing

After the series is optionally transformed, the accumulated series can be simply or seasonally differenced by using the DIF= and SDIF= options in the INPUT or TARGET statement. Simple and seasonal differencing are useful when you want to detrend or deseasonalize the time series before computing the similarity measures.

For example, suppose $y_t$ is a monthly time series. The following examples of the DIF= and SDIF= options demonstrate how to simply and seasonally difference the time series. DIF=(1,3) specifies first, then third, order differencing. SDIF=(1,3) specifies first, then third, order seasonal differencing.

Additionally, assuming that $y_t$ is strictly positive, the INPUT or TARGET statement TRANSFORM= option and the DIF= and SDIF= options can be combined.
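The following Python sketch shows one way to read these options. It assumes that each listed order d denotes a lag-d difference, $w_t = y_t - y_{t-d}$, applied in the order listed, and that a seasonal order d corresponds to a lag of d times the seasonal period (12 for monthly data). The text does not spell this out, so treat both conventions as assumptions. `None` stands in for a missing value.

```python
def difference(series, lag):
    """Lag difference; values without a usable predecessor become None."""
    out = []
    for t, y in enumerate(series):
        prev = series[t - lag] if t >= lag else None
        out.append(None if y is None or prev is None else y - prev)
    return out

def apply_differencing(series, dif=(), sdif=(), seasonality=12):
    """Apply the simple orders in dif, then the seasonal orders in sdif."""
    for d in dif:
        series = difference(series, d)
    for d in sdif:
        series = difference(series, d * seasonality)
    return series

# DIF=(1,3) under the lag interpretation assumed above
print(apply_differencing([1, 4, 9, 16, 25, 36], dif=(1, 3)))
```

Each pass introduces missing values at the start of the series, which is one reason missing value trimming follows differencing.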
Time Series Missing Value Trimming

Sometimes time series missing values are interpreted as unknown values or are generated due to functional transformation or differencing. Either way, the time series can contain missing values. The missing values can be trimmed from the beginning, the end, or both the beginning and the end of the time series with the TRIMMISS= option in the INPUT and TARGET statements.
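As a rough illustration, the trimming modes listed for the TRIMMISSING= option (NONE, LEFT, RIGHT, BOTH) can be sketched in Python as below, with `None` standing in for a missing value. Interior missing values are kept; only the ends are trimmed. The function name is hypothetical.

```python
def trim_missing(series, mode="BOTH"):
    """Drop missing values from the beginning and/or end of series."""
    start, end = 0, len(series)
    if mode in ("LEFT", "BOTH"):
        while start < end and series[start] is None:
            start += 1                         # skip beginning missing values
    if mode in ("RIGHT", "BOTH"):
        while end > start and series[end - 1] is None:
            end -= 1                           # skip ending missing values
    return list(series[start:end])

print(trim_missing([None, None, 1, None, 2, None]))
```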
Time Series Descriptive Statistics

After a series has been optionally accumulated and transformed with missing values interpreted, descriptive statistics can be computed for the resulting working series by specifying the PRINT=DESCSTATS option. This option produces an ODS table that contains the sum, mean, minimum, maximum, and standard deviation of the working series.
Input and Target Sequences

After the input and target working series are formed, they can be treated as two ordered sequences. Given an input time sequence, $x_i$, for $i = 1$ to $N_x$, where $i$ is the input sequence index, and a target time sequence, $y_j$, for $j = 1$ to $N_y$, where $j$ is the target sequence index, these sequences are analyzed for similarity.
Sliding Sequences

Similarity measures can be computed between the target sequence and any contiguous subsequences of the input time series. There are three types of sequence sliding:

No Sliding
Slide by Time Index
Slide by Season Index

For more information, see Leonard et al. (2008).
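To make the idea of sliding by time index concrete, the Python sketch below compares the target with every contiguous window of the input that has the same length as the target, using a plain squared deviation. It is a simplification under stated assumptions: no compression or expansion is allowed, there are no missing values, and the function names are hypothetical.

```python
def squared_deviation(target, window):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(window, target))

def slide_by_index(input_series, target):
    """Measure for each starting index of a target-length window of the input."""
    n = len(target)
    return [squared_deviation(target, input_series[s:s + n])
            for s in range(len(input_series) - n + 1)]

measures = slide_by_index([1, 2, 3, 4, 5], [3, 4])
print(measures)                        # one measure per window position
print(measures.index(min(measures)))   # 0-based start of the best-matching window
```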
Time Warping

Time warping allows for the comparison between target and input sequences of differing lengths by compressing or expanding the input sequence with respect to the target sequence while respecting the order of the sequence elements. For more information, see Leonard et al. (2008).
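For readers unfamiliar with the technique, the following Python sketch implements the classic unconstrained dynamic time warping recurrence with a squared-deviation cost. It is a textbook formulation offered for orientation only; it ignores the COMPRESS= and EXPAND= limits and is not claimed to be the algorithm the procedure uses.

```python
def dtw_squared_deviation(target, input_seq):
    """Minimum cumulative squared deviation over all monotone warping paths."""
    n, m = len(input_seq), len(target)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost aligning the first i input
    # elements with the first j target elements.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (input_seq[i - 1] - target[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # input advances alone
                                 cost[i][j - 1],      # target advances alone
                                 cost[i - 1][j - 1])  # both advance together
    return cost[n][m]

# The input repeats one element, which warping absorbs at no cost.
print(dtw_squared_deviation([1, 2, 3], [1, 2, 2, 3]))
```

Because the path may hold one index while the other advances, sequences of different lengths can be aligned without reordering their elements.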
Sequence Normalization

The working (input or target) sequence can be normalized prior to further analysis. Let $q_i$ be the original sequence with mean $\mu_q$ and standard deviation $\sigma_q$, and let $r_i$ be the normalized sequence. The normalizations are defined as follows:

Standard is the standard normalization $r_i = (q_i - \mu_q)/\sigma_q$
Absolute is the absolute normalization $r_i = (q_i - \min(q_i))/(\max(q_i) - \min(q_i))$

User-Defined is a user-defined normalization
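The two built-in normalizations can be sketched in Python as follows. The sketch assumes a sequence with no missing values and at least two distinct values, and it uses the sample (n - 1) standard deviation; the text does not say which divisor the procedure uses, so that choice is an assumption. Function names are illustrative.

```python
from statistics import mean, stdev

def normalize_standard(q):
    """r_i = (q_i - mean(q)) / std(q), using the sample standard deviation."""
    mu, sigma = mean(q), stdev(q)
    return [(v - mu) / sigma for v in q]

def normalize_absolute(q):
    """r_i = (q_i - min(q)) / (max(q) - min(q)), mapping q onto [0, 1]."""
    lo, hi = min(q), max(q)
    return [(v - lo) / (hi - lo) for v in q]

print(normalize_absolute([2, 4, 6]))
print(normalize_standard([2, 4, 6]))
```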
Sequence Scaling

The working input sequence can be scaled to the working target sequence. Sequence scaling is applied after normalization. Let $y_j$ be the working target sequence with mean $\mu_y$ and standard deviation $\sigma_y$. Let $x_i$ be the working input sequence, and let $q_i$ be the scaled sequence. The scaling is defined as follows:

Standard is the standard scaling $q_i = (x_i - \mu_y)/\sigma_y$

Absolute is the absolute scaling $q_i = (x_i - \min(y_j))/(\max(y_j) - \min(y_j))$

User-Defined is a user-defined scaling
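Scaling differs from normalization in that the statistics come from the target sequence while the values being rescaled come from the input sequence. A minimal Python sketch, with hypothetical function names and the same assumptions as before (no missing values, a non-constant target, sample standard deviation):

```python
from statistics import mean, stdev

def scale_standard(x, y):
    """q_i = (x_i - mean(y)) / std(y): input x scaled by target statistics."""
    mu, sigma = mean(y), stdev(y)
    return [(v - mu) / sigma for v in x]

def scale_absolute(x, y):
    """q_i = (x_i - min(y)) / (max(y) - min(y)): input x on the target's range."""
    lo, hi = min(y), max(y)
    return [(v - lo) / (hi - lo) for v in x]

# Input values outside the target's range map outside [0, 1].
print(scale_absolute([0, 5, 20], [0, 10]))
```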
Similarity Measures

The working input sequence can be compared to the working target sequence. For more information, see Leonard et al. (2008).
User-Defined Functions and Subroutines

A user-defined routine can be written in the SAS language by using the FCMP procedure, or in the C language by using both the FCMP procedure and the PROTO procedure. The SIMILARITY procedure cannot use C language routines directly. The procedure can use only SAS language routines, which might or might not call C language routines. Creating user-defined routines is more completely described in the FCMP procedure and the PROTO procedure documentation. The FCMP and PROTO procedures are part of Base SAS software.

The SAS language provides integrated memory management and exception handling such as operations on missing values. The C language provides flexibility and allows the integration of existing C language libraries. However, proper memory management and exception handling are solely the
responsibility of the user. Additionally, the support for standard C libraries is restricted. If given a choice, it is highly recommended that user-defined functions and subroutines be written in the SAS language using the FCMP procedure. For each of the tasks described above, the following sections describe the required subroutine/function signature and provide examples of using a user-defined routine with the SIMILARITY procedure.
Time Series Transformations

A user-defined transformation subroutine has the following subroutine signature:

   SUBROUTINE <SUBROUTINE-NAME> (

where the array-name is the time series to be transformed.

For example, to duplicate the functionality of the built-in TRANSFORM=LOG option in the INPUT and TARGET statements, the following SAS statements create a user-defined version of this transformation called MYTRANSFORM and store this subroutine in the catalog SASUSER.MYSIMILAR.

   proc fcmp outlib=sasuser.mysimilar.package;
      subroutine mytransform( series[*] );
         outargs series;
         length = DIM(series);
         /* log-transform positive values; set all others to missing */
         do i = 1 to length;
            value = series[i];
            if value > 0 then do;
               series[i] = log( value );
            end;
            else do;
               series[i] = .;
            end;
         end;
      endsub;
   run;
This user-defined subroutine can be specified in the TRANSFORM= option of the INPUT or TARGET statement as follows:

   options cmplib = sasuser.mysimilar;
   proc similarity ...;
      ...
      input myinput / transform=mytransform;
      target mytarget / transform=mytransform;
      ...
   run;
Sequence Normalizations

A user-defined normalization subroutine has the following signature:

   SUBROUTINE <SUBROUTINE-NAME> (

where the array-name is the sequence to be normalized.

For example, to duplicate the functionality of the built-in NORMALIZE=ABSOLUTE option in the INPUT and TARGET statements, the following SAS statements create a user-defined version of this normalization called MYNORMALIZE and store this subroutine in the catalog SASUSER.MYSIMILAR.

   proc fcmp outlib=sasuser.mysimilar.package;
      subroutine mynormalize( sequence[*] );
         outargs sequence;
         length = DIM(sequence);
         minimum = .;
         maximum = .;
         /* find the minimum and maximum of the nonmissing values */
         do i = 1 to length;
            value = sequence[i];
            if nmiss(minimum) | nmiss(maximum) then do;
               minimum = value;
               maximum = value;
            end;
            if nmiss(value) = 0 then do;
               if value < minimum then minimum = value;
               if value > maximum then maximum = value;
            end;
         end;
         /* rescale each nonmissing value by the observed range */
         do i = 1 to length;
            value = sequence[i];
            if nmiss( value ) | minimum > maximum then do;
               sequence[i] = .;
            end;
            else do;
               sequence[i] = (value - minimum) / (maximum - minimum);
            end;
         end;
      endsub;
   run;
This user-defined subroutine can be specified in the NORMALIZE= option of the INPUT or TARGET statement as follows:

   options cmplib = sasuser.mysimilar;
   proc similarity ...;
      ...
      input myinput / normalize=mynormalize;
      target mytarget / normalize=mynormalize;
      ...
   run;
Sequence Scaling

A user-defined scaling subroutine has the following signature:

   SUBROUTINE <SUBROUTINE-NAME> (

where the first array-name is the target sequence and the second array-name is the input sequence to be scaled.

For example, to duplicate the functionality of the built-in SCALE=ABSOLUTE option in the INPUT statement, the following SAS statements create a user-defined version of this scaling called MYSCALE and store this subroutine in the catalog SASUSER.MYSIMILAR.

   proc fcmp outlib=sasuser.mysimilar.package;
      subroutine myscale( target[*], input[*] );
         outargs input;
         length = DIM(target);
         minimum = .;
         maximum = .;
         /* find the minimum and maximum of the nonmissing target values */
         do i = 1 to length;
            value = target[i];
            if nmiss(minimum) | nmiss(maximum) then do;
               minimum = value;
               maximum = value;
            end;
            if nmiss(value) = 0 then do;
               if value < minimum then minimum = value;
               if value > maximum then maximum = value;
            end;
         end;
         /* loop over the input sequence, whose length can differ from the target */
         length = DIM(input);
         do i = 1 to length;
            value = input[i];
            if nmiss( value ) | minimum > maximum then do;
               input[i] = .;
            end;
            else do;
               input[i] = (value - minimum) / (maximum - minimum);
            end;
         end;
      endsub;
   run;
This user-defined subroutine can be specified in the SCALE= option of the INPUT statement as follows:

   options cmplib=sasuser.mysimilar;
   proc similarity ...;
      ...
      input myinput / scale=myscale;
      ...
   run;
Similarity Measures

A user-defined similarity measure function has the following signature:

   FUNCTION

where the first array-name is the target sequence and the second array-name is the input sequence. The return value of the function is the similarity measure associated with the target sequence and the input sequence.

For example, to duplicate the functionality of the built-in MEASURE=ABSDEV option in the TARGET statement with no warping, the following SAS statements create a user-defined version of this measure called MYMEASURE and store this function in the catalog SASUSER.MYSIMILAR.

   proc fcmp outlib=sasuser.mysimilar.package;
      function mymeasure( target[*], input[*] );
         length = min(DIM(target), DIM(input));
         sum = 0;
         num = 0;
         /* accumulate absolute deviations over pairs with no missing values */
         do i = 1 to length;
            x = input[i];
            w = target[i];
            if nmiss(x) = 0 & nmiss(w) = 0 then do;
               d = x - w;
               sum = sum + abs(d);
               num = num + 1;   /* count the pairs that contribute */
            end;
         end;
         if num <= 0 then return(.);
         return(sum);
      endsub;
   run;
This user-defined function can be specified in the MEASURE= option of the TARGET statement as follows:

   options cmplib=sasuser.mysimilar;
   proc similarity ...;
      ...
      target mytarget / measure=mymeasure;
      ...
   run;
For another example, to duplicate the functionality of the built-in MEASURE=SQRDEV and MEASURE=ABSDEV options by using the C language, the following SAS statements create user-defined C language versions of these measures called DTW_SQRDEV_C and DTW_ABSDEV_C and store these functions in the catalog SASUSER.CSIMIL.CFUNCS. DTW refers to dynamic time warping. These C language functions can then be called by SAS language functions and subroutines.

   proc proto package=sasuser.csimil.cfuncs;
      mapmiss double = 999999999;
      double dtw_sqrdev_c( double * target / iotype=input, int targetLength,
                           double * input / iotype=input, int inputLength );
      externc dtw_sqrdev_c;
      double dtw_sqrdev_c( double * target, int targetLength,
                           double * input, int inputLength )
      {
         int i,j;
         double x,w,d;
         double * prev = (double *)malloc( sizeof(double)*targetLength);
         double * curr = (double *)malloc( sizeof(double)*inputLength);
         if ( prev == 0 || curr == 0 ) return 999999999;
         x = input[0];
         for ( j=0; j
            d = d*d;
            if ( j == 0 ) prev[j] = d;
            else prev[j] = d + prev[j-1];
         }
         for (i=1; i
            d = x - w;
            d = fabs(d);
            if (j == 0) prev[j] = d;
            else prev[j] = d + prev[j-1];
         }
         for (i=1; i
The preceding SAS statements create two C language functions, which can then be used in SAS language functions and subroutines. However, these functions cannot be used directly by the SIMILARITY procedure. In order to use these C language functions in the SIMILARITY procedure, two SAS language functions must be created that call these two C language functions. The following SAS statements create two user-defined SAS language versions of these measures called DTW_SQRDEV and DTW_ABSDEV and store these functions in the catalog SASUSER.MYSIMILAR.FUNCS. These SAS language functions use the previously created C language functions; the SAS language functions can then be used by the SIMILARITY procedure.

   proc fcmp outlib=sasuser.mysimilar.funcs inlib=sasuser.cfuncs;
      function dtw_sqrdev( target[*], input[*] );
         dev = dtw_sqrdev_c(target,DIM(target),input,DIM(input));
         return( dev );
      endsub;
      function dtw_absdev( target[*], input[*] );
         dev = dtw_absdev_c(target,DIM(target),input,DIM(input));
         return( dev );
      endsub;
   run;
These user-defined functions can be specified in the MEASURE= option of the TARGET statement as follows:

   options cmplib=sasuser.mysimilar;
   proc similarity ...;
      ...
      target mytarget   / measure=dtw_sqrdev;
      target yourtarget / measure=dtw_absdev;
      ...
   run;
Similarity Measures and Warping Path

A user-defined similarity measure and warping path information function has the following signature:

   FUNCTION
where the first array-name is the target sequence, the second array-name is the input sequence, the third array-name is the returned target sequence indices, the fourth array-name is the returned input sequence indices, the fifth array-name is the returned path distances. The returned value of the function is the similarity measure. The last three returned arrays are used to compute the path and cost statistics. The returned sequence indices must represent a valid warping path, that is, integers greater than zero and less than or equal to the sequence length and recorded in ascending order. The returned path distances must be nonnegative numbers.
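The validity conditions stated above for the returned arrays can be expressed as a small Python check. This is only a sketch of those stated conditions: it reads "ascending order" as non-decreasing (a warping path can repeat an index), and it additionally assumes that the three returned arrays have equal length, which the text does not state. The function name is hypothetical.

```python
def is_valid_warping_path(target_idx, input_idx, distances, n_target, n_input):
    """Check 1-based path indices and path distances against the stated rules."""
    # Assumption: one target index, one input index, one distance per path step.
    if not target_idx or not (len(target_idx) == len(input_idx) == len(distances)):
        return False

    def indices_ok(idx, n):
        # integers greater than zero and no larger than the sequence length
        in_range = all(isinstance(k, int) and 1 <= k <= n for k in idx)
        # recorded in ascending (here: non-decreasing) order
        ascending = all(a <= b for a, b in zip(idx, idx[1:]))
        return in_range and ascending

    return (indices_ok(target_idx, n_target)
            and indices_ok(input_idx, n_input)
            and all(d >= 0 for d in distances))     # nonnegative path distances

print(is_valid_warping_path([1, 1, 2, 3], [1, 2, 3, 3], [0, 1.5, 0, 2], 3, 3))
```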
Output Data Sets

The SIMILARITY procedure can create the OUT=, OUTSUM=, OUTMEASURE=, OUTSEQUENCE=, and OUTPATH= data sets. In general, these data sets contain the variables listed in
the BY statement. The ID statement time ID variable is also included in the data sets when the time dimension is important. If an analysis step related to an output data set fails, then the values of this step are not recorded or are set to missing in the related output data set, and appropriate error and/or warning messages are recorded in the SAS log.
OUT= Data Set

The OUT= data set contains the variables specified in the BY, ID, INPUT, and TARGET statements. If the ID statement is specified, the ID variable values are aligned and extended based on the ALIGN=, INTERVAL=, START=, and END= options. The values of the variables specified in the INPUT and TARGET statements are accumulated based on the ACCUMULATE= option, missing values are interpreted based on the SETMISSING= option, and zero values are interpreted using the ZEROMISS= option. The accumulated time series is transformed based on the TRANSFORM=, DIF=, and/or SDIF= options.
OUTMEASURE= Data Set
The OUTMEASURE= data set records the similarity measures between each INPUT and TARGET statement variable with respect to each time ID value. The form of the OUTMEASURE= data set depends on the SORTNAMES and ORDER= options. The OUTMEASURE= data set contains the variables specified in the BY statement as well as the variables listed below. For ORDER=INPUTTARGET and ORDER=TARGETINPUT, the OUTMEASURE= data set has the following form:

   _INPUT_      input variable name
   _TARGET_     target variable name
   _TIMEID_     time ID values
   _INPSEQ_     input sequence values
   _TARSEQ_     target sequence values
   _SIM_        similarity measures
The OUTMEASURE= data set is ordered by the variables _INPUT_, then _TARGET_, then _TIMEID_ when ORDER=INPUTTARGET. The OUTMEASURE= data set is ordered by the variables _TARGET_, then _INPUT_, then _TIMEID_ when ORDER=TARGETINPUT. For ORDER=INPUT, the OUTMEASURE= data set has the following form:

   _INPUT_        input variable name
   _TIMEID_       time ID values
   _INPSEQ_       input sequence values
   target-names   similarity measures associated with each TARGET statement variable name
The OUTMEASURE= data set is ordered by the variables _INPUT_, then _TIMEID_. For ORDER=TARGET, the OUTMEASURE= data set has the following form:

   _TARGET_      target variable name
   _TIMEID_      time ID values
   _TARSEQ_      target sequence values
   input-names   similarity measures associated with each INPUT statement variable name
The OUTMEASURE= data set is ordered by the variables _TARGET_, then _TIMEID_.
OUTSUM= Data Set
The OUTSUM= data set summarizes the similarity measures between each INPUT and TARGET statement variable. The form of the OUTSUM= data set depends on the SORTNAMES and ORDER= options. If the SORTNAMES option is specified, each variable (INPUT or TARGET) is analyzed in ascending order. The OUTSUM= data set contains the variables specified in the BY statement as well as the variables listed below. For ORDER=INPUTTARGET and ORDER=TARGETINPUT, the OUTSUM= data set has the following form:

   _INPUT_      input variable name
   _TARGET_     target variable name
   _STATUS_     status flag that indicates whether the requested analyses were successful
   _TIMEID_     time ID values
   _SIM_        similarity measure summary
The OUTSUM= data set is ordered by the variables _INPUT_, then _TARGET_ when ORDER=INPUTTARGET. The OUTSUM= data set is ordered by the variables _TARGET_, then _INPUT_ when ORDER=TARGETINPUT. For ORDER=INPUT, the OUTSUM= data set has the following form:

   _INPUT_        input variable name
   _STATUS_       status flag that indicates whether the requested analyses were successful
   target-names   similarity measure summary associated with each TARGET statement variable name
The OUTSUM= data set is ordered by the variable _INPUT_. For ORDER=TARGET, the OUTSUM= data set has the following form:

   _TARGET_      target variable name
   _STATUS_      status flag that indicates whether the requested analyses were successful
   input-names   similarity measure summary associated with each INPUT statement variable name
The OUTSUM= data set is ordered by the variable _TARGET_.
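Because every form of the OUTSUM= data set carries the _STATUS_ variable, a quick way to find analyses that did not succeed is to list the rows with a nonzero status. The following statements are a sketch under stated assumptions: the data set name SUMMARY is arbitrary, and the SASHELP.WORKERS series ELECTRIC and MASONRY are used only because they ship with SAS.

```sas
/* Sketch: write the summary data set for one input and one target series. */
proc similarity data=sashelp.workers out=_null_ outsum=summary;
   id date interval=month;
   input electric;
   target masonry / measure=msqrdev;
run;

/* _STATUS_ = 0 indicates success; any other code indicates a failed step. */
proc print data=summary;
   where _STATUS_ ne 0;
run;
```

If every analysis succeeds, the WHERE clause selects no observations and PROC PRINT produces no listing.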
OUTSEQUENCE= Data Set
The OUTSEQUENCE= data set records the input and target sequences associated with each INPUT and TARGET statement variable. This data set records the input and target sequence values for each slide index and for each warp index associated with the slide index. The sequence values recorded are normalized and scaled based on the NORMALIZE= and SCALE= options. This data set also contains the similarity measure associated with the two sequences. The OUTSEQUENCE= data set contains the variables specified in the BY statement as well as the variables listed below.

   _INPUT_      input variable name
   _TARGET_     target variable name
   _TIMEID_     time ID values
   _SLIDE_      slide index
   _WARP_       warp index
   _INPSEQ_     input sequence values
   _TARSEQ_     target sequence values
   _SIM_        similarity measure
   _STATUS_     sequence status
The sorting of the OUTSEQUENCE= data set depends on the SORTNAMES and the ORDER= option. The OUTSEQUENCE= data set is ordered by the variables _INPUT_, then _TARGET_, then _TIMEID_ when ORDER=INPUTTARGET or ORDER=INPUT. The OUTSEQUENCE= data set is ordered by the variables _TARGET_, then _INPUT_, then _TIMEID_ when ORDER=TARGETINPUT or ORDER=TARGET. If there are a large number of slides and/or warps, this data set may be large.
OUTPATH= Data Set
The OUTPATH= data set records the path analysis between each INPUT and TARGET statement variable. This data set records the path sequences for each slide index and for each warp index associated with the slide index. The sequence values recorded are normalized and scaled based on the NORMALIZE= and SCALE= options. The OUTPATH= data set contains the variables specified in the BY statement as well as the variables listed below.

   _INPUT_      input variable name
   _TARGET_     target variable name
   _TIMEID_     time ID values
   _SLIDE_      slide index
   _WARP_       warp index
   _INPSEQ_     input sequence values
   _TARSEQ_     target sequence values
   _INPPTH_     input path index
   _TARPTH_     target path index
   _METRIC_     distance metric values
The sorting of the OUTPATH= data set depends on the SORTNAMES and the ORDER= option. The OUTPATH= data set is ordered by the variables _INPUT_, then _TARGET_, then _TIMEID_ when ORDER=INPUTTARGET or ORDER=INPUT. The OUTPATH= data set is ordered by the variables _TARGET_, then _INPUT_, then _TIMEID_ when ORDER=TARGETINPUT or ORDER=TARGET. If there are a large number of slides and/or warps, this data set may be large.
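Since the OUTPATH= data set can be large, it is often more practical to summarize it than to print it. The following statements are a minimal sketch, assuming an output data set named PATH (an arbitrary name) and the SASHELP.WORKERS series used elsewhere in this chapter; they summarize the distance metric values in _METRIC_ for each slide and warp index.

```sas
/* Sketch: record the path analysis for one input and one target series. */
proc similarity data=sashelp.workers out=_null_ outpath=path;
   id date interval=month;
   input electric;
   target masonry / measure=msqrdev;
run;

/* Summarize the distance metric values along the path.               */
/* The MISSING option keeps rows whose slide or warp index is missing. */
proc means data=path n mean min max;
   class _SLIDE_ _WARP_ / missing;
   var _METRIC_;
run;
```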
_STATUS_ Variable Values
The _STATUS_ variable contains a code that specifies whether the similarity analysis has been successful or not. The _STATUS_ variable can take the following values:

   0        Success
   3000     Accumulation failure
   4000     Missing value interpretation failure
   6000     Series is all missing
   7000     Transformation failure
   8000     Differencing failure
   9000     Unable to compute descriptive statistics
   10000    Normalization failure
   11000    Input contains imbedded missing values
   12000    Target contains imbedded missing values
   13000    Scaling failure
   14000    Measure failure
   15000    Path failure
   16000    Slide summarization failure
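The status codes above are easier to read in listings when they are displayed as text. The following statements are a sketch, not something the procedure provides: the format name SIMSTAT and the data set DEMO are hypothetical, and the labels are copied from the list above.

```sas
/* Hypothetical user-defined format that labels the documented status codes. */
proc format;
   value simstat
          0 = 'Success'
       3000 = 'Accumulation failure'
       4000 = 'Missing value interpretation failure'
       6000 = 'Series is all missing'
       7000 = 'Transformation failure'
       8000 = 'Differencing failure'
       9000 = 'Unable to compute descriptive statistics'
      10000 = 'Normalization failure'
      11000 = 'Input contains imbedded missing values'
      12000 = 'Target contains imbedded missing values'
      13000 = 'Scaling failure'
      14000 = 'Measure failure'
      15000 = 'Path failure'
      16000 = 'Slide summarization failure'
      other = 'Unlisted status code';
run;

/* Illustrative status values only; in practice use an OUTSUM= data set. */
data demo;
   input _STATUS_ @@;
   datalines;
0 6000 14000
;
run;

proc print data=demo;
   format _STATUS_ simstat.;
run;
```

The same FORMAT statement can be applied to the _STATUS_ variable of an OUTSUM= or OUTSEQUENCE= data set.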
Printed Output
The SIMILARITY procedure optionally produces printed output by using the Output Delivery System (ODS). By default, the procedure produces no printed output. All output is controlled by the PRINT= and PRINTDETAILS options in the PROC SIMILARITY statement.
The sort, order, and form of the printed output depend on both the SORTNAMES option and the ORDER= option. If the SORTNAMES option is specified, each variable (INPUT or TARGET) is analyzed in ascending order. For ORDER=INPUTTARGET, the printed output is ordered by the INPUT statement variables (row) and then by the TARGET statement variables (row). For ORDER=TARGETINPUT, the printed output is ordered by the TARGET statement variables (row) and then by the INPUT statement variables (row). For ORDER=INPUT, the printed output is ordered by the INPUT statement variables (row) and then the TARGET statement variables (column). For ORDER=TARGET, the printed output is ordered by the TARGET statement variables (row) and then the INPUT statement variables (column).
In general, if an analysis step related to printed output fails, the values of this step are not printed and appropriate error and/or warning messages are recorded in the SAS log. The printed output is similar to the output data sets; these similarities are described below.

   PRINT=COSTS        prints the cost statistics.
   PRINT=DESCSTATS    prints the descriptive statistics.
   PRINT=PATHS        prints the path statistics.
   PRINT=SLIDES       prints the sliding sequence summary.
   PRINT=SUMMARY      prints the summary of similarity measures similar to the OUTSUM= data set.
   PRINT=WARPS        prints the warp summary.
   PRINTDETAILS       prints each table with greater detail.
ODS Table Names
The table below relates the PRINT= options to ODS tables:

Table 22.2 ODS Tables Produced in PROC SIMILARITY

   ODS Table Name           Description                     Option
   CostStatistics           Cost statistics                 PRINT=COSTS
   DescStats                Descriptive statistics          PRINT=DESCSTATS
   PathLimits               Path limits                     PRINT=PATHS
   PathStatistics           Path statistics                 PRINT=PATHS
   SlideMeasuresSummary     Summary of measure per slide    PRINT=SLIDES
   MeasuresSummary          Measures summary                PRINT=SUMMARY
   InputMeasuresSummary     Measures summary                PRINT=SUMMARY
   TargetMeasuresSummary    Measures summary                PRINT=SUMMARY
   WarpMeasuresSummary      Summary of measure per warp     PRINT=WARPS
The tables are related to a single series within a BY group.
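The ODS table names in Table 22.2 can be used with the ODS SELECT statement to keep only some of the printed tables. The following statements are a sketch that assumes the SASHELP.WORKERS series used elsewhere in this chapter; PRINT=PATHS produces both the PathLimits and PathStatistics tables, and the ODS SELECT statement keeps only the second of these together with the cost statistics.

```sas
/* Keep only the cost statistics and path statistics tables. */
ods select CostStatistics PathStatistics;

proc similarity data=sashelp.workers out=_null_ print=(costs paths);
   id date interval=month;
   input electric;
   target masonry / measure=msqrdev;
run;
```

The selection applies to this procedure step only, so later steps are not affected.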
ODS Graphics
This section describes the use of ODS for creating graphics with the SIMILARITY procedure. To request these graphs, you must specify the ODS GRAPHICS statement. In addition, you can specify the PLOTS= option in the PROC SIMILARITY statement as described in Table 22.3.
ODS Graph Names
PROC SIMILARITY assigns a name to each graph it creates by using ODS. You can use these names to selectively reference the graphs. The names are listed in Table 22.3.

Table 22.3 ODS Graphics Produced by PROC SIMILARITY

   ODS Graph Name                   Plot Description                     Statement     PLOTS= Option
   CostsPlot                        Costs plot                           SIMILARITY    PLOTS=COSTS
   PathDistancePlot                 Path distances plot                  SIMILARITY    PLOTS=DISTANCES
   PathDistanceHistogram            Path distances histogram             SIMILARITY    PLOTS=DISTANCES
   PathRelativeDistancePlot         Path relative distances plot         SIMILARITY    PLOTS=DISTANCES
   PathRelativeDistanceHistogram    Path relative distances histogram    SIMILARITY    PLOTS=DISTANCES
   SeriesPlot                       Input time series plot               SIMILARITY    PLOTS=INPUTS
   TargetSequencePlot               Target sequence plot                 SIMILARITY    PLOTS=TARGETS
   SequencePlot                     Sequence plot                        SIMILARITY    PLOTS=SEQUENCES
   SimilarityPlot                   Similarity measures plot             SIMILARITY    PLOTS=MEASURES
   PathPlot                         Path plot                            SIMILARITY    PLOTS=PATHS
   PathSequencesPlot                Path sequences plot                  SIMILARITY    PLOTS=MAPS
   PathSequencesScaledPlot          Scaled path sequences map plot       SIMILARITY    PLOTS=MAPS
   WarpPlot                         Warping plot                         SIMILARITY    PLOTS=WARPS
   ScaledWarpPlot                   Scaled warping plot                  SIMILARITY    PLOTS=WARPS
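As an illustration of Table 22.3, the following statements request only the path and warping plots. This is a sketch under stated assumptions: it uses the SASHELP.WORKERS series from elsewhere in this chapter, and it assumes that the PLOTS= option accepts a parenthesized list in the same way that the PRINT= option does.

```sas
ods graphics on;

/* Request the path plot (PLOTS=PATHS) and the warping plots (PLOTS=WARPS). */
proc similarity data=sashelp.workers out=_null_ plots=(paths warps);
   id date interval=month;
   input electric;
   target masonry / measure=msqrdev;
run;

ods graphics off;
```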
Time Series Plots
The time series plots (SeriesPlot) illustrate the input time series to be compared. The horizontal axis represents the input series time ID values, and the vertical axis represents the input series values.
Sequence Plots
The sequence plots (SequencePlot) illustrate the target and input sequences to be compared. The horizontal axis represents the (target or input) sequence index, and the vertical axis represents the (target or input) sequence values.
Path Plots
The path plot (PathPlot) and path limits plot (PathPlot) illustrate the path through the distance matrix. The horizontal axis represents the input sequence index, and the vertical axis represents the target sequence index. The dots represent the path coordinates. The upper parallel line represents the compression limit and the lower parallel line represents the expansion limit. These plots visualize the path through the distance matrix. Vertical movements indicate compression, and horizontal movements represent expansion of the target sequence with respect to the input sequence. These plots are useful for visualizing the amount of expansion and compression along the path.
Time Warp Plots
The time warp plot (WarpPlot) and scaled time warp plot (ScaledWarpPlot) illustrate the time warping. The horizontal axis represents the (input and target) sequence index. The upper line plot represents the target sequence. The lower line plot represents the input sequence. The lines that connect the input and target sequence values represent the mapping between the input and target sequence indices along the optimal path. These plots visualize the warping of the time index with respect to the input and target sequence values. Expansion of a single target sequence value occurs when it is mapped to more than one input sequence value. Expansion of a single input sequence value occurs when it is mapped to more than one target sequence value. The plots are useful for visualizing the mapping between the input and target sequence values along the path, and for comparing the path sequences or the input and target sequences after time warping.
Path Sequence Plots
The path sequence plot (PathSequencesPlot) and scaled path sequence plot (PathSequencesScaledPlot) illustrate the sequence mapping along the optimal path. The horizontal axis represents the path index. The dashed line represents the time warped input sequence. The solid line represents the time warped target sequence. These plots visualize the mapping between the input and target sequence values with respect to the path index. In the scaled plot, the input and target sequence values are scaled and evenly separated for visual convenience.
Path Distance Plots
The path distance plots (PathDistancePlot) and path relative distance plots (PathRelativeDistancePlot) illustrate the path (relative) distances. The horizontal axis represents the path index. The vertical needles represent the (relative) distances. The horizontal reference lines indicate one and two standard deviations. The path distance histogram (PathDistanceHistogram) and path relative distance histogram (PathRelativeDistanceHistogram) illustrate the distribution of the path (relative) distances. The bars represent the histogram, and the solid line represents a normal distribution with the same mean and variance.
Cost Plots
The cost plot (CostsPlot) and cost limits plot (CostsPlot) illustrate the cost of traversing the distance matrix. The horizontal axis represents the input sequence index, and the vertical axis represents the target sequence index. The colors/shading within the plot illustrate the incremental cost of traversing the distance matrix. The upper parallel line represents the compression limit, and the lower parallel line represents the expansion limit.
Examples: SIMILARITY Procedure
Example 22.1: Accumulating Transactional Data into Time Series Data
This example uses the SIMILARITY procedure to illustrate the accumulation of time-stamped transactional data that has been recorded at no particular frequency into time series data at a specific frequency. After the time series is created, the various SAS/ETS procedures related to time series analysis, similarity analysis, seasonal adjustment/decomposition, modeling, and forecasting can be used to further analyze the time series data.
Suppose that the input data set WORK.RETAIL contains variables STORE and TIMESTAMP and numerous other numeric transaction variables. The BY variable STORE contains values that break up the transactions into groups (BY groups). The time ID variable TIMESTAMP contains SAS date values recorded at no particular frequency. The other data set variables contain the numeric transaction values to be analyzed. It is further assumed that the input data set is sorted by the variables STORE and TIMESTAMP.
The following statements form monthly time series from the transactional data based on the median value (ACCUMULATE=MEDIAN) of the transactions recorded within each time period. The accumulated time series values for time periods with no transactions are set to zero instead of missing (SETMISS=0). Only transactions recorded between the first day of 1998 (START='01JAN1998'D) and the last day of 2000 (END='31DEC2000'D) are considered and, if needed, are extended to include this range.

   proc similarity data=work.retail out=mseries;
      by store;
      id timestamp interval=month accumulate=median setmiss=0
                   start='01jan1998'd end='31dec2000'd;
      var _NUMERIC_;
   run;
The monthly time series data are stored in the data set WORK.MSERIES. Each BY group associated with the BY variable STORE contains an observation for each of the 36 months associated with the years 1998, 1999, and 2000. Each observation contains the variables STORE and TIMESTAMP and each of the analysis variables in the input DATA= data set.
After each set of transactions has been accumulated to form the corresponding time series, the accumulated time series can be analyzed by using various time series analysis techniques. For example, exponentially weighted moving averages can be used to smooth each series. The following statements use the EXPAND procedure to smooth the analysis variable named STOREITEM.

   proc expand data=mseries out=smoothed from=month;
      by store;
      id timestamp;
      convert storeitem=smooth / transform=(ewma 0.1);
   run;
The smoothed series is stored in the data set WORK.SMOOTHED. The variable SMOOTH contains the smoothed series.
If the time ID variable TIMESTAMP contained SAS datetime values instead of SAS date values, the INTERVAL=, START=, and END= options in the SIMILARITY procedure must be changed accordingly, and the following statements could be used to accumulate the datetime transactions to a monthly interval:

   proc similarity data=work.retail out=tseries;
      by store;
      id timestamp interval=dtmonth accumulate=median setmiss=0
                   start='01jan1998:00:00:00'dt end='31dec2000:00:00:00'dt;
      var _NUMERIC_;
   run;

The monthly time series data are stored in the data set WORK.TSERIES, and the time ID values use a SAS datetime representation.
Example 22.2: Similarity Analysis
This simple example illustrates how to compare two time sequences using similarity analysis. The following statements create an example data set that contains two time sequences of differing lengths.

   data test;
      input i y x;
   datalines;
   1  2  3
   2  4  5
   3  6  3
   4  7  3
   5  3  3
   6  8  6
   7  9  3
   8  3  8
   9  10 .
   10 11 .
   ;
   run;
The following statements perform similarity analysis on the example data set:

   ods graphics on;
   proc similarity data=test out=_null_ print=all plots=all;
      input x;
      target y / measure=absdev;
   run;

The DATA=TEST option specifies that the input data set WORK.TEST is to be used in the analysis. The OUT=_NULL_ option specifies that no output time series data set is to be created. The PRINT=ALL and PLOTS=ALL options specify that all ODS tables and graphs are to be produced. The INPUT statement specifies that the input variable is X. The TARGET statement specifies that the target variable is Y and that the similarity measure is computed using absolute deviation (MEASURE=ABSDEV).

Output 22.2.1 Descriptive Statistics of the Input Variable, x

   The SIMILARITY Procedure
   Time Series Descriptive Statistics

   Variable                           x
   Number of Observations             10
   Number of Missing Observations     2
   Minimum                            3
   Maximum                            8
   Mean                               4.25
   Standard Deviation                 1.908627
Output 22.2.2 Plot of Input Variable, x
Output 22.2.3 Target Sequence Plot
Output 22.2.4 Sequence Plot
Output 22.2.5 Path Plot
Output 22.2.6 Path Sequences Plot
Output 22.2.7 Path Sequences Scaled Plot
Output 22.2.8 Path Distance Plot
Output 22.2.9 Path Distance Histogram
Output 22.2.10 Path Relative Distance Plot
Output 22.2.11 Path Relative Distance Histogram
Output 22.2.12 Path Limits

   Path Limits

                  Specified    Specified     Minimum    Maximum
   Limit          Absolute     Percentage    Allowed    Allowed    Applied
   Compression    None         None          2          9          9
   Expansion      None         None          0          7          7
Output 22.2.13 Path Statistics

   Path Statistics

                             Path       Input      Target
   Path           Number     Percent    Percent    Percent
   Missing Map    0          0.000%     0.000%     0.000%
   Direct Maps    6          50.00%     75.00%     60.00%
   Compression    4          33.33%     50.00%     40.00%
   Expansion      2          16.67%     25.00%     20.00%
   Warps          6          50.00%     75.00%     60.00%

   Path Statistics

                              Path       Input      Target
                              Maximum    Maximum    Maximum
   Path           Maximum     Percent    Percent    Percent
   Missing Map    0           0.000%     0.000%     0.000%
   Direct Maps    2           16.67%     25.00%     20.00%
   Compression    1           8.333%     12.50%     10.00%
   Expansion      2           16.67%     25.00%     20.00%
   Warps          2           16.67%     25.00%     20.00%
Output 22.2.14 Cost Plot
Output 22.2.15 Cost Statistics

   Cost Statistics

                                                   Standard
   Cost        Number    Total       Average     Deviation    Minimum    Maximum
   Absolute    12        15.00000    1.250000    1.138180     0          3.000000
   Relative    12        2.25844     0.188203    0.160922     0          0.500000

   Cost Statistics

                                         Minimum      Maximum
   Cost        Input Mean    Target Mean    Path Mean    Path Mean
   Absolute    1.875000      1.500000       1.875000     0.8823529
   Relative    0.282305      0.225844       0.282305     0.1328495

   Relative Costs based on Target Sequence values
Output 22.2.16 Time Warp Plot
Output 22.2.17 Time Warp Scaled Plot
The following statements repeat the above similarity analysis on the example data set with warping limits:

   ods graphics on;
   proc similarity data=test out=_null_ print=all plots=all;
      input x;
      target y / measure=absdev
                 compress=(localabs=2)
                 expand=(localabs=2);
   run;
The COMPRESS=(LOCALABS=2) option limits local absolute compression to 2. The EXPAND=(LOCALABS=2) option limits local absolute expansion to 2.
Output 22.2.18 Path Plot with Warping Limits

Output 22.2.19 Warped Path Limits

   Path Limits

                  Specified    Specified     Minimum    Maximum
   Limit          Absolute     Percentage    Allowed    Allowed    Applied
   Compression    2            None          2          9          2
   Expansion      2            None          0          7          2

Output 22.2.20 Cost Plot with Warping Limits
The following statements repeat the above similarity analysis on the example data set but store the results in output data sets:

   proc similarity data=test out=series
                   outsequence=sequences outpath=path outsum=summary;
      input x;
      target y / measure=absdev
                 compress=(localabs=2)
                 expand=(localabs=2);
   run;
The OUT=SERIES, OUTSEQUENCE=SEQUENCES, OUTPATH=PATH, and OUTSUM=SUMMARY options specify that the output time series, time sequences, path analysis, and summary data sets be created, respectively.
Example 22.3: Sliding Similarity Analysis
This example illustrates how to compare two time sequences using sliding similarity analysis. The SASHELP.WORKERS data set contains two similar time series variables (ELECTRIC and MASONRY), which represent employment over time. The following statements create an example data set that contains two time series of differing lengths, where the variable MASONRY has the first 12 and last 7 observations set to missing to simulate the lack of data associated with the target series.

   data workers;
      set sashelp.workers;
      if '01JAN1978'D <= date < '01JAN1982'D then masonry = masonry;
      else masonry = .;
   run;
The goal of sliding similarity measures analysis is to find the slide index that corresponds to the most similar subsequence of the input series when compared to the target sequence. The following statements perform sliding similarity analysis on the example data set:

   proc similarity data=workers out=_NULL_ print=(slides summary);
      id date interval=month;
      input electric;
      target masonry / slide=index measure=msqrdev
                       expand=(localabs=3 globalabs=3)
                       compress=(localabs=3 globalabs=3);
   run;
The DATA=WORKERS option specifies that the input data set WORK.WORKERS is to be used in the analysis. The OUT=_NULL_ option specifies that no output time series data set is to be created. The PRINT=(SLIDES SUMMARY) option specifies that the ODS tables related to the sliding similarity measures and their summary are produced. The INPUT statement specifies that the input variable is ELECTRIC. The TARGET statement specifies that the target variable is MASONRY and that the similarity measure is computed using mean squared deviation (MEASURE=MSQRDEV). The SLIDE=INDEX option specifies observation index sliding. The COMPRESS=(LOCALABS=3 GLOBALABS=3) option limits local and global absolute compression to 3. The EXPAND=(LOCALABS=3 GLOBALABS=3) option limits local and global absolute expansion to 3.
Output 22.3.1 Summary of the Slide Measures

   The SIMILARITY Procedure
   Slide Measures Summary for Input=ELECTRIC and Target=MASONRY

                       Slide Target    Slide Input    Slide       Slide
   Slide               Sequence        Sequence       Warping     Minimum
   Index    DATE       Length          Length         Amount      Measure
   0        JAN1977    48              51              3          497.6737
   1        FEB1977    48              51              1          482.6777
   2        MAR1977    48              51              0          474.1251
   3        APR1977    48              51              0          490.7792
   4        MAY1977    48              51             -2          533.0788
   5        JUN1977    48              51             -3          605.8198
   6        JUL1977    48              51             -3          701.7138
   7        AUG1977    48              51              3          646.5918
   8        SEP1977    48              51              3          616.3258
   9        OCT1977    48              51              3          510.9836
   10       NOV1977    48              51              3          382.1434
   11       DEC1977    48              51              3          340.4702
   12       JAN1978    48              51              2          327.0572
   13       FEB1978    48              51              1          322.5460
   14       MAR1978    48              51              0          325.2689
   15       APR1978    48              51             -1          351.4161
   16       MAY1978    48              51             -2          398.0490
   17       JUN1978    48              50             -3          471.6931
   18       JUL1978    48              49             -3          590.8089
   19       AUG1978    48              48              0          595.2538
   20       SEP1978    48              47             -1          689.2233
   21       OCT1978    48              46             -2          745.8891
   22       NOV1978    48              45             -3          679.1907

Output 22.3.2 Minimum Measure

   Minimum Measure Summary

   Input Variable    MASONRY
   ELECTRIC          322.5460
This analysis results in 23 slides based on the observation index, with the minimum measure (322.5460) occurring at slide index 13, which corresponds to the time value FEB1978. Note that the original data set SASHELP.WORKERS was modified beginning at the time value JAN1978. This similarity analysis justifies the belief that ELECTRIC lags MASONRY by one month based on the time series cross-correlation analysis despite the lack of target data (MASONRY).
The goal of seasonal sliding similarity measures is to find the seasonal slide index that corresponds to the most similar seasonal subsequence of the input series when compared to the target sequence. The following statements repeat the above similarity analysis on the example data set with seasonal sliding:

   proc similarity data=workers out=_NULL_ print=(slides summary);
      id date interval=month;
      input electric;
      target masonry / slide=season measure=msqrdev;
   run;
Output 22.3.3 Summary of the Seasonal Slide Measures

   The SIMILARITY Procedure
   Slide Measures Summary for Input=ELECTRIC and Target=MASONRY

                       Slide Target    Slide Input    Slide       Slide
   Slide               Sequence        Sequence       Warping     Minimum
   Index    DATE       Length          Length         Amount      Measure
   0        JAN1977    48              48             0           1040.086
   12       JAN1978    48              48             0           641.927

Output 22.3.4 Seasonal Minimum Measure

   Minimum Measure Summary

   Input Variable    MASONRY
   ELECTRIC          641.9273
The analysis differs from the previous analysis in that the slides are performed based on the seasonal index (SLIDE=SEASON) with no warping. With a seasonality of 12, two seasonal slides are considered at slide indices 0 and 12 with the minimum measure (641.9273) occurring at slide index 12 which corresponds to the time value JAN1978. Note that the original data set SASHELP.WORKERS was modified beginning at the time value JAN1978. This similarity analysis justifies the belief that ELECTRIC and MASONRY have similar seasonal properties based on seasonal decomposition analysis despite the lack of target data (MASONRY).
Example 22.4: Searching for Historical Analogies
This example illustrates how to search for historical analogies by using seasonal sliding similarity analysis of transactional time stamp data. The SASHELP.TIMEDATA data set contains the variable (VOLUME), which represents activity over time. The following statements create an example data set that contains two time series of differing lengths, where the variable HISTORY represents the historical activity and RECENT represents the more recent activity.

   data timedata;
      set sashelp.timedata;
      drop volume;
      recent = .;
      history = volume;
      if datetime >= '20AUG2000:00:00:00'DT then do;
         recent = volume;
         history = .;
      end;
   run;
The goal of seasonal sliding similarity measures is to find the seasonal slide index that corresponds to the most similar seasonal subsequence of the input series when compared to the target sequence. The following statements perform similarity analysis on the example data set with seasonal sliding:

   proc similarity data=timedata out=_NULL_
                   outsequence=sequences outsum=summary;
      id datetime interval=dtday accumulate=total
                  start='27JUL1997:00:00:00'dt
                  end='21OCT2000:11:59:59'DT;
      input history / normalize=absolute;
      target recent / slide=season normalize=absolute measure=mabsdev;
   run;
The DATA=TIMEDATA option specifies that the input data set WORK.TIMEDATA is to be used in the analysis. The OUT=_NULL_ option specifies that no output time series data set is to be created. The OUTSEQUENCE=SEQUENCES and OUTSUM=SUMMARY options specify the output sequences and summary data sets, respectively. The ID statement specifies that the time ID variable is DATETIME, which is to be accumulated on a daily basis (INTERVAL=DTDAY) by summing the transactions (ACCUMULATE=TOTAL). The ID statement also specifies that the data is accumulated on the weekly boundaries starting on the week of 27JUL1997 and ending on the week of 15OCT2000 (START='27JUL1997:00:00:00'DT END='21OCT2000:11:59:59'DT). The INPUT statement specifies that the input variable is HISTORY, which is to be normalized using absolute normalization (NORMALIZE=ABSOLUTE). The TARGET statement specifies that the target variable is RECENT, which is to be normalized by using absolute normalization (NORMALIZE=ABSOLUTE) and that the similarity measure is computed by using mean absolute deviation (MEASURE=MABSDEV). The SLIDE=SEASON option specifies season index sliding.
To illustrate the results of the similarity analysis, the output sequence data set must be subset by using the output summary data set.

   data _NULL_;
      set summary;
      call symput('MEASURE', left(trim(putn(recent,'BEST20.'))));
   run;

   data result;
      set sequences;
      by _SLIDE_;
      retain flag 0;
      if first._SLIDE_ then do;
         if (&measure - 0.00001 < _SIM_ < &measure + 0.00001) then flag = 1;
      end;
      if flag then output;
      if last._SLIDE_ then flag = 0;
   run;
The following statements generate a cross series plot of the results:

   proc timeseries data=result out=_NULL_ crossplot=series;
      id datetime interval=dtday;
      var _TARSEQ_;
      crossvar _INPSEQ_;
   run;
The cross series plot illustrates that the historical time series analogy most similar to the most recent time series data that started on 20AUG2000 occurred on 02AUG1998.

Output 22.4.1 Cross Series Plot of the Historical Time Series
References
Berry, M.J.A. and Linoff, G.S. (1997), Data Mining Techniques: For Marketing, Sales, and Customer Support, New York: John Wiley & Sons, Inc.
Han, J. and Kamber, M. (2001), Data Mining: Concepts and Techniques, San Francisco: Morgan Kaufmann Publishers.
Leonard, M.J. and Wolfe, B.L. (2005), Mining Transactional and Time Series Data, SUGI 30.
Leonard, M.J., Elsheimer, D.B., and Sloan, J. (2008), An Introduction to Similarity Analysis Using SAS, SAS Forum 2008.
Pyle, D. (1999), Data Preparation for Data Mining, San Francisco: Morgan Kaufmann Publishers, Inc.
Sankoff, D. and Kruskal, J.B. (2001), Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Stanford, CA: CSLI Publications.
Chapter 23
The SIMLIN Procedure

Contents
   Overview: SIMLIN Procedure
   Getting Started: SIMLIN Procedure
      Prediction and Simulation
   Syntax: SIMLIN Procedure
      Functional Summary
      PROC SIMLIN Statement
      BY Statement
      ENDOGENOUS Statement
      EXOGENOUS Statement
      ID Statement
      LAGGED Statement
      OUTPUT Statement
   Details: SIMLIN Procedure
      Defining the Structural Form
      Computing the Reduced Form
      Dynamic Multipliers
      Multipliers for Higher Order Lags
      EST= Data Set
      DATA= Data Set
      OUTEST= Data Set
      OUT= Data Set
      Printed Output
      ODS Table Names
   Examples: SIMLIN Procedure
      Example 23.1: Simulating Klein's Model I
      Example 23.2: Multipliers for a Third-Order System
   References
Overview: SIMLIN Procedure

The SIMLIN procedure reads the coefficients for a set of linear structural equations, which are usually produced by the SYSLIN procedure. PROC SIMLIN then computes the reduced form and, if input data are given, uses the reduced form equations to generate predicted values. PROC SIMLIN is especially useful when dealing with sets of structural difference equations. The SIMLIN procedure can perform simulation or forecasting of the endogenous variables.

The SIMLIN procedure can be applied only to models that are:

- linear with respect to the parameters
- linear with respect to the variables
- square (as many equations as endogenous variables)
- nonsingular (the coefficients of the endogenous variables form an invertible matrix)
Getting Started: SIMLIN Procedure

The SIMLIN procedure processes the coefficients in a data set created by the SYSLIN procedure using the OUTEST= option or by another regression procedure such as PROC REG. To use PROC SIMLIN you must first produce the coefficient data set and then specify this data set on the EST= option of the PROC SIMLIN statement. You must also tell PROC SIMLIN which variables are endogenous and which variables are exogenous. List the endogenous variables in an ENDOGENOUS statement, and list the exogenous variables in an EXOGENOUS statement.

The following example illustrates the creation of an OUTEST= data set with PROC SYSLIN and the computation and printing of the reduced form coefficients for the model with PROC SIMLIN.

   proc syslin data=in outest=e;
      model y1 = y2 x1;
      model y2 = y1 x2;
   run;

   proc simlin est=e;
      endogenous y1 y2;
      exogenous x1 x2;
   run;
If the model contains lagged endogenous variables, you must also use a LAGGED statement to tell PROC SIMLIN which variables contain lagged values, which endogenous variables they are lags of, and the number of periods of lagging. For dynamic models, the TOTAL and INTERIM= options can be used on the PROC SIMLIN statement to compute and print total and interim multipliers. (See the section "Dynamic Multipliers" later in this chapter for an explanation of multipliers.)

In the following example the variables Y1LAG1, Y2LAG1, and Y2LAG2 contain lagged values of the endogenous variables Y1 and Y2. Y1LAG1 and Y2LAG1 contain values of Y1 and Y2 for the previous observation, while Y2LAG2 contains two-period lags of Y2. The LAGGED statement specifies the lagged relationships, and the TOTAL and INTERIM= options request multiplier analysis. The INTERIM=2 option prints matrices showing the impact that changes to the exogenous variables have on the endogenous variables after 1 and 2 periods.
   data in;
      set in;
      y1lag1 = lag(y1);
      y2lag1 = lag(y2);
      y2lag2 = lag2(y2);
   run;

   proc syslin data=in outest=e;
      model y1 = y2 y1lag1 y2lag2 x1;
      model y2 = y1 y2lag1 x2;
   run;

   proc simlin est=e total interim=2;
      endogenous y1 y2;
      exogenous x1 x2;
      lagged y1lag1 y1 1 y2lag1 y2 1 y2lag2 y2 2;
   run;
After the reduced form of the model is computed, the model can be simulated by specifying an input data set on the PROC SIMLIN statement and using an OUTPUT statement to write the simulation results to an output data set. The following example modifies the PROC SIMLIN step from the preceding example to simulate the model and stores the results in an output data set.

   proc simlin est=e total interim=2 data=in;
      endogenous y1 y2;
      exogenous x1 x2;
      lagged y1lag1 y1 1 y2lag1 y2 1 y2lag2 y2 2;
      output out=sim predicted=y1hat y2hat
                     residual=y1resid y2resid;
   run;
Prediction and Simulation

If an input data set is specified with the DATA= option in the PROC SIMLIN statement, the procedure reads the data and uses the reduced form equations to compute predicted and residual values for each of the endogenous variables. (If no data set is specified with the DATA= option, no simulation of the system is performed, and only the reduced form and multipliers are computed.)

The character of the prediction is based on the START= value. Until PROC SIMLIN encounters the START= observation, actual endogenous values are found and fed into the lagged endogenous terms. Once the START= observation is reached, dynamic simulation begins, where predicted values are fed into lagged endogenous terms until the end of the data set is reached.

The predicted and residual values generated here are different from those produced by the SYSLIN procedure since PROC SYSLIN uses the structural form with actual endogenous values. The predicted values computed by the SIMLIN procedure solve the simultaneous equation system. These reduced-form predicted values are functions only of the exogenous and lagged endogenous variables and do not depend on actual values of current period endogenous variables.
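As a minimal sketch of how the START= option controls the switch to dynamic simulation, the following step reuses the data sets IN and E from the "Getting Started" examples. The starting observation 10 is a hypothetical value chosen only for illustration; before that observation the lagged terms use actual values, and from it onward they use predicted values.

```sas
/* Sketch only: START=10 is a hypothetical choice, not a recommendation. */
/* Data sets IN and E are assumed to exist as in the earlier examples.   */
proc simlin est=e data=in start=10;
   endogenous y1 y2;
   exogenous x1 x2;
   lagged y1lag1 y1 1 y2lag1 y2 1 y2lag2 y2 2;
   output out=sim predicted=y1hat y2hat
                  residual=y1resid y2resid;
run;
```

Omitting START= leaves the default in place, so dynamic simulation begins at the first observation for which all variables, including lags, are nonmissing.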
Syntax: SIMLIN Procedure

The following statements can be used with PROC SIMLIN:

   PROC SIMLIN options ;
      BY variables ;
      ENDOGENOUS variables ;
      EXOGENOUS variables ;
      ID variables ;
      LAGGED lag-var endogenous-var number ... ;
      OUTPUT OUT=SAS-data-set options ;
Functional Summary

The statements and options controlling the SIMLIN procedure are summarized in the following table.

Description                                                 Statement      Option

Data Set Options
specify input data set containing structural coefficients   PROC SIMLIN    EST=
specify type of estimates read from EST= data set           PROC SIMLIN    TYPE=
write reduced form coefficients and multipliers to an       PROC SIMLIN    OUTEST=
   output data set
specify the input data set for simulation                   PROC SIMLIN    DATA=
write predicted and residual values to an output data set   OUTPUT

Printing Control Options
print the structural coefficients                           PROC SIMLIN    ESTPRINT
suppress printing of reduced form coefficients              PROC SIMLIN    NORED
suppress all printed output                                 PROC SIMLIN    NOPRINT

Dynamic Multipliers
compute interim multipliers                                 PROC SIMLIN    INTERIM=
compute total multipliers                                   PROC SIMLIN    TOTAL

Declaring the Role of Variables
specify BY-group processing                                 BY
specify the endogenous variables                            ENDOGENOUS
specify the exogenous variables                             EXOGENOUS
specify identifying variables                               ID
specify lagged endogenous variables                         LAGGED

Controlling the Simulation
specify the starting observation for dynamic simulation     PROC SIMLIN    START=
PROC SIMLIN Statement

PROC SIMLIN options ;

The following options can be used in the PROC SIMLIN statement:

DATA= SAS-data-set
   specifies the SAS data set containing input data for the simulation. If the DATA= option is used, the data set specified must supply values for all exogenous variables throughout the simulation. If the DATA= option is not specified, no simulation of the system is performed, and only the reduced form and multipliers are computed.

EST= SAS-data-set
   specifies the input data set containing the structural coefficients of the system. If EST= is omitted the most recently created SAS data set is used. The EST= data set is normally a "TYPE=EST" data set produced by the OUTEST= option of PROC SYSLIN. However, you can also build the EST= data set with a SAS DATA step. See the section "EST= Data Set" later in this chapter for details.

ESTPRINT
   prints the structural coefficients read from the EST= data set.

INTERIM= n
   requests that interim multipliers be computed for interims 1 through n. If not specified, no interim multipliers are computed. This feature is available only if there are no lags greater than 1.

NOPRINT
   suppresses all printed output.

NORED
   suppresses the printing of the reduced form coefficients.

OUTEST= SAS-data-set
   specifies an output SAS data set to contain the reduced form coefficients and multipliers, in addition to the structural coefficients read from the EST= data set. The OUTEST= data set has the same form as the EST= data set. If the OUTEST= option is not specified, the reduced form coefficients and multipliers are not written to a data set.

START= n
   specifies the observation number in the DATA= data set where the dynamic simulation is to be started. By default, the dynamic simulation starts with the first observation in the DATA= data set for which all variables (including lags) are not missing.

TOTAL
   requests that the total multipliers be computed. This feature is available only if there are no lags greater than 1.

TYPE= value
   specifies the type of estimates to be read from the EST= data set. The TYPE= value must match the value of the _TYPE_ variable for the observations that you want to select from the EST= data set (TYPE=2SLS, for example).
BY Statement

BY variables ;

A BY statement can be used with PROC SIMLIN to obtain separate analyses for groups of observations defined by the BY variables.

The BY statement can be applied to one or both of the EST= and the DATA= input data sets. When a BY statement is used and both an EST= and a DATA= input data set are specified, PROC SIMLIN checks to see if one or both of the data sets contain the BY variables.

Thus, there are three ways of using the BY statement with PROC SIMLIN:

1. If the BY variables are found in the EST= data set only, PROC SIMLIN simulates over the entire DATA= data set once for each set of coefficients read from the BY groups in the EST= data set.

2. If the BY variables are found in the DATA= data set only, PROC SIMLIN performs separate simulations over each BY group in the DATA= data set, using the single set of coefficients in the EST= data set.

3. If the BY variables are found in both the EST= and the DATA= data sets, PROC SIMLIN performs separate simulations over each BY group in the DATA= data set using the coefficients from the corresponding BY group in the EST= data set.
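The third case, where both input data sets carry the BY variables, might look like the following sketch. The BY variable REGION is hypothetical and is assumed to exist in the data set IN; the model is the two-equation system from the "Getting Started" section. Because PROC SYSLIN is run with the same BY statement, its OUTEST= data set contains one set of coefficients per BY group.

```sas
/* REGION is a hypothetical BY variable assumed to be present in IN. */
proc sort data=in;
   by region;
run;

/* One set of structural coefficients is estimated per REGION value. */
proc syslin data=in outest=e;
   by region;
   model y1 = y2 x1;
   model y2 = y1 x2;
run;

/* Each REGION group in IN is simulated with its own coefficients.   */
proc simlin est=e data=in;
   by region;
   endogenous y1 y2;
   exogenous x1 x2;
   output out=sim predicted=y1hat y2hat;
run;
```

Estimating without the BY statement in the PROC SYSLIN step would instead give the second case: a single set of coefficients applied to every BY group of the DATA= data set.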
ENDOGENOUS Statement

ENDOGENOUS variables ;

List the names of the endogenous (jointly dependent) variables in the ENDOGENOUS statement. The ENDOGENOUS statement can be abbreviated as ENDOG or ENDO.

EXOGENOUS Statement

EXOGENOUS variables ;

List the names of the exogenous (independent) variables in the EXOGENOUS statement. The EXOGENOUS statement can be abbreviated as EXOG or EXO.

ID Statement

ID variables ;

The ID statement can be used to restrict the variables copied from the DATA= data set to the OUT= data set. Use the ID statement to list the variables you want copied to the OUT= data set besides the exogenous, endogenous, lagged endogenous, and BY variables. If the ID statement is omitted, all the variables in the DATA= data set are copied to the OUT= data set.
LAGGED Statement

LAGGED lag-var endogenous-var number ... ;

For each lagged endogenous variable, specify the name of the lagged variable, the name of the endogenous variable that was lagged, and the degree of the lag. Only one LAGGED statement is allowed.

The following is an example of the use of the LAGGED statement:

   proc simlin est=e;
      endog y1 y2;
      lagged y1lag1 y1 1
             y2lag1 y2 1
             y2lag3 y2 3;
   run;

This statement specifies that the variable Y1LAG1 contains the values of the endogenous variable Y1 lagged one period; the variable Y2LAG1 refers to the values of Y2 lagged one period; and the variable Y2LAG3 refers to the values of Y2 lagged three periods.
OUTPUT Statement

OUTPUT OUT= SAS-data-set options ;

The OUTPUT statement specifies that predicted and residual values be put in an output data set. A DATA= input data set must be supplied if the OUTPUT statement is used, and only one OUTPUT statement is allowed. The following options can be used in the OUTPUT statement:

OUT= SAS-data-set
   names the output SAS data set to contain the predicted values and residuals. If OUT= is not specified, the output data set is named using the DATAn convention.

PREDICTED= names
P= names
   names the variables in the output data set that contain the predicted values of the simulation. These variables correspond to the endogenous variables in the order in which they are specified in the ENDOGENOUS statement. Specify up to as many names as there are endogenous variables. If you specify names on the PREDICTED= option for only some of the endogenous variables, predicted values for the remaining variables are not output. The names must not match any variable name in the input data set.

RESIDUAL= names
R= names
   names the variables in the output data set that contain the residual values from the simulation. The residuals are the differences between the actual values of the endogenous variables from the DATA= data set and the predicted values from the simulation. These variables correspond to the endogenous variables in the order in which they are specified in the ENDOGENOUS statement. Specify up to as many names as there are endogenous variables. The names must not match any variable name in the input data set.

The following is an example of the use of the OUTPUT statement. This example outputs predicted values for Y1 and Y2 and outputs residuals for Y1.

   proc simlin est=e;
      endog y1 y2;
      output out=b predicted=y1hat y2hat
                   residual=y1resid;
   run;
Details: SIMLIN Procedure

The following sections explain the structural and reduced forms, dynamic multipliers, input data sets, and the model simulation process in more detail.

Defining the Structural Form

An EST= input data set supplies the coefficients of the equation system. The data set containing the coefficients is normally a "TYPE=EST" data set created by the OUTEST= option of PROC SYSLIN or another regression procedure. The data set contains the special variables _TYPE_, _DEPVAR_, and INTERCEPT. You can also supply the structural coefficients of the system to PROC SIMLIN in a data set produced by a SAS DATA step as long as the data set is of the form TYPE=EST. Refer to SAS/STAT software documentation for a discussion of the special TYPE=EST type of SAS data set.

Suppose that there is a $g \times 1$ vector of endogenous variables $y_t$, an $l \times 1$ vector of lagged endogenous variables $y_t^L$, and a $k \times 1$ vector of exogenous variables $x_t$, including the intercept. Then, there are $g$ structural equations in the simultaneous system that can be written

$$ G y_t = C y_t^L + B x_t $$

where $G$ is the matrix of coefficients of current period endogenous variables, $C$ is the matrix of coefficients of lagged endogenous variables, and $B$ is the matrix of coefficients of exogenous variables. $G$ is assumed to be nonsingular.
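As an illustration of this notation, the static two-equation model from the "Getting Started" section can be arranged in structural form. The coefficient symbols $\beta_{ij}$ and $\gamma_{ij}$ are introduced here only for the illustration and do not appear elsewhere in this chapter; the model has no lagged endogenous variables, so $C$ is absent.

```latex
\begin{aligned}
y_{1,t} &= \beta_{12}\, y_{2,t} + \gamma_{11}\, x_{1,t} + \gamma_{10} \\
y_{2,t} &= \beta_{21}\, y_{1,t} + \gamma_{22}\, x_{2,t} + \gamma_{20}
\end{aligned}
\qquad\Longrightarrow\qquad
\underbrace{\begin{pmatrix} 1 & -\beta_{12} \\ -\beta_{21} & 1 \end{pmatrix}}_{G}
\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix}
=
\underbrace{\begin{pmatrix} \gamma_{11} & 0 & \gamma_{10} \\ 0 & \gamma_{22} & \gamma_{20} \end{pmatrix}}_{B}
\begin{pmatrix} x_{1,t} \\ x_{2,t} \\ 1 \end{pmatrix}
```

Moving the current-period endogenous terms to the left-hand side is what puts the negated coefficients in the off-diagonal positions of $G$. The system is nonsingular when $\det G = 1 - \beta_{12}\beta_{21} \neq 0$.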
Computing the Reduced Form

First, the SIMLIN procedure computes reduced form coefficients by premultiplying by $G^{-1}$:

$$ y_t = G^{-1} C y_t^L + G^{-1} B x_t $$

This can be written as

$$ y_t = \Pi_1 y_t^L + \Pi_2 x_t $$

where $\Pi_1 = G^{-1} C$ and $\Pi_2 = G^{-1} B$ are the reduced form coefficient matrices.

The reduced form matrices $\Pi_1 = G^{-1} C$ and $\Pi_2 = G^{-1} B$ are printed unless the NORED option is specified in the PROC SIMLIN statement. The structural coefficient matrices $G$, $C$, and $B$ are printed when the ESTPRINT option is specified.
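The matrix algebra can be checked by hand on a small system. The following is a minimal sketch that assumes SAS/IML is available; all coefficient values are hypothetical, chosen only to give a nonsingular $G$. It is not how PROC SIMLIN is implemented, only a restatement of the two formulas above.

```sas
/* Hypothetical two-equation system (coefficients are made up):  */
/*    y1 = 0.5*y2  + 0.3*y1lag + 2.0*x1                           */
/*    y2 = 0.25*y1             + 1.5*x2                           */
proc iml;
   /* G: coefficients of current-period endogenous variables     */
   G = {1     -0.5,
        -0.25  1  };
   /* C: coefficients of the lagged endogenous variable y1lag     */
   C = {0.3,
        0  };
   /* B: coefficients of the exogenous variables x1 and x2        */
   B = {2.0 0,
        0   1.5};

   Ginv = inv(G);      /* requires G to be nonsingular            */
   Pi1  = Ginv * C;    /* reduced form for lagged endogenous      */
   Pi2  = Ginv * B;    /* reduced form for exogenous (impact)     */
   print Ginv, Pi1, Pi2;
quit;
```

The three printed matrices correspond to the inverse coefficient matrix and the two reduced form tables that PROC SIMLIN prints by default.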
Dynamic Multipliers

For models that have only first-order lags, the equation of the reduced form of the system can be rewritten

$$ y_t = D y_{t-1} + \Pi_2 x_t $$

$D$ is a matrix formed from the columns of $\Pi_1$ plus some columns of zeros, arranged in the order in which the variables meet the lags. The elements of $\Pi_2$ are called impact multipliers because they show the immediate effect of changes in each exogenous variable on the values of the endogenous variables. This equation can be rewritten as

$$ y_t = D^2 y_{t-2} + D \Pi_2 x_{t-1} + \Pi_2 x_t $$

The matrix formed by the product $D \Pi_2$ shows the effect of the exogenous variables one lag back; the elements in this matrix are called interim multipliers and are computed and printed when the INTERIM= option is specified in the PROC SIMLIN statement. The $i$th period interim multipliers are formed by $D^i \Pi_2$.

The series can be expanded as

$$ y_t = D^{\infty} y_{t-\infty} + \sum_{i=0}^{\infty} D^i \Pi_2 x_{t-i} $$

A permanent and constant setting of a value for $x$ has the following cumulative effect:

$$ \left( \sum_{i=0}^{\infty} D^i \right) \Pi_2 x = (I - D)^{-1} \Pi_2 x $$

The elements of $(I-D)^{-1} \Pi_2$ are called the total multipliers. Assuming that the sum converges and that $(I-D)$ is invertible, PROC SIMLIN computes the total multipliers when the TOTAL option is specified in the PROC SIMLIN statement.
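The interim and total multiplier formulas can be sketched numerically in the same way. The step below assumes SAS/IML is available and uses hypothetical matrices $D$ and $\Pi_2$; the eigenvalues of this particular $D$ are 0.5 and 0.2, so the sum converges.

```sas
/* Hypothetical first-order system with two endogenous variables. */
proc iml;
   D   = {0.4 0.1,
          0.2 0.3};
   Pi2 = {1.0 0.5,
          0.0 2.0};

   /* Interim multipliers D**i * Pi2 for i = 1 and i = 2,         */
   /* as requested by INTERIM=2.                                   */
   IMult1 = D * Pi2;
   IMult2 = D * D * Pi2;

   /* Total multipliers inv(I - D) * Pi2, as requested by TOTAL.   */
   /* Meaningful only when the powers of D die out.                */
   Total = inv(I(2) - D) * Pi2;

   print IMult1, IMult2, Total;
quit;
```

Adding $\Pi_2$ and the interim multipliers for more and more periods should approach the Total matrix, which is a quick way to check convergence for a given $D$.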
Multipliers for Higher Order Lags

The dynamic multiplier options require the system to have no lags of order greater than one. This limitation can be circumvented, since any system with lags greater than one can be rewritten as a system where no lag is greater than one by forming new endogenous variables that are single-period lags.

For example, suppose you have the third-order single equation

$$ y_t = a y_{t-3} + b x_t $$

This can be converted to a first-order three-equation system by introducing two additional endogenous variables, $y_{1,t}$ and $y_{2,t}$, and computing corresponding first-order lagged variables for each endogenous variable: $y_{t-1}$, $y_{1,t-1}$, and $y_{2,t-1}$. The higher order lag relations are then produced by adding identities to link the endogenous and identical lagged endogenous variables:

$$ y_{1,t} = y_{t-1} $$
$$ y_{2,t} = y_{1,t-1} $$
$$ y_t = a y_{2,t-1} + b x_t $$

This conversion using the SYSLIN and SIMLIN procedures requires three steps:
1. Add the extra endogenous and lagged endogenous variables to the input data set using a DATA step. Note that two copies of each lagged endogenous variable are needed for each lag reduced, one to serve as an endogenous variable and one to serve as a lagged endogenous variable in the reduced system.

2. Add IDENTITY statements to the PROC SYSLIN step to equate each added endogenous variable to its lagged endogenous variable copy.

3. In the PROC SIMLIN step, declare the added endogenous variables in the ENDOGENOUS statement and define the lag relations in the LAGGED statement.

See Example 23.2 for an illustration of how to convert an equation system with higher-order lags into a larger system with only first-order lags.
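To confirm that the three-equation system reproduces the original third-order equation, substitute the two identities back into the last equation:

```latex
\begin{aligned}
y_t &= a\, y_{2,t-1} + b\, x_t \\
    &= a\, y_{1,t-2} + b\, x_t && \text{since } y_{2,t} = y_{1,t-1} \\
    &= a\, y_{t-3}   + b\, x_t && \text{since } y_{1,t} = y_{t-1}
\end{aligned}
```

Stacking the endogenous variables as $(y_t,\; y_{1,t},\; y_{2,t})'$ gives the first-order form $y_t = D y_{t-1} + \Pi_2 x_t$ used for the multipliers. The ordering of the stacked vector is an assumption made here for illustration:

```latex
D = \begin{pmatrix} 0 & 0 & a \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},
\qquad
\Pi_2 = \begin{pmatrix} b \\ 0 \\ 0 \end{pmatrix}
```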
EST= Data Set

Normally, PROC SIMLIN uses an EST= data set produced by PROC SYSLIN with the OUTEST= option. This data set is in the form expected by PROC SIMLIN. If there is more than one set of estimates produced by PROC SYSLIN, you must use the TYPE= option in the PROC SIMLIN statement to select the set to be simulated. Then PROC SIMLIN reads from the EST= data set only those observations with a _TYPE_ value corresponding to the TYPE= option (for example, TYPE=2SLS) or with a _TYPE_ value of IDENTITY.

The SIMLIN procedure can only solve square, nonsingular systems. If you have fewer equations than endogenous variables, you must specify IDENTITY statements in the PROC SYSLIN step to bring the system up to full rank. If there are g endogenous variables and m
DATA= Data Set

The DATA= data set must contain all of the exogenous variables. Values for all of the exogenous variables are required for each observation for which predicted endogenous values are desired. To forecast past the end of the historical data, the DATA= data set should contain nonmissing values for all of the exogenous variables and missing values for the endogenous variables for the forecast periods, in addition to the historical data. (See Example 23.1 for an illustration.)

In order for PROC SIMLIN to output residuals and compute statistics of fit, the DATA= data set must also contain the endogenous variables with nonmissing actual values for each observation for which residuals and statistics are to be computed.

If the system contains lags, initial values must be supplied for the lagged variables. This can be done by including either the lagged variables or the endogenous variables, or both, in the DATA= data set. If the lagged variables are not in the DATA= data set or if they have missing values in the early observations, PROC SIMLIN prints a warning and uses the endogenous variable values from the early observations to initialize the lags.
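One way to lay out a DATA= data set for forecasting is sketched below for the static two-equation model from the "Getting Started" section, assuming the data sets IN and E from that example. The data set names FUTURE, IN_EXT, and FCST and the future values of X1 and X2 are hypothetical.

```sas
/* Hypothetical forecast-period values for the exogenous variables; */
/* the endogenous variables are left missing for these periods.     */
data future;
   input x1 x2;
   y1 = .;
   y2 = .;
   datalines;
1.5 2.0
1.6 2.1
;
run;

/* Historical observations first, forecast periods appended after.  */
data in_ext;
   set in future;
run;

proc simlin est=e data=in_ext;
   endogenous y1 y2;
   exogenous x1 x2;
   output out=fcst predicted=y1hat y2hat;
run;
```

In the output data set, the appended observations carry predicted values, and any residuals requested for them would be missing because the actual endogenous values are missing.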
OUTEST= Data Set

The OUTEST= data set contains all the variables read from the EST= data set. The variables in the OUTEST= data set are as follows.

- the BY statement variables, if any
- _TYPE_, a character variable that identifies the type of observation
- _DEPVAR_, a character variable containing the name of the dependent variable for the observation
- the endogenous variables
- the lagged endogenous variables
- the exogenous variables
- INTERCEPT, a numeric variable containing the intercept values
- _MODEL_, a character variable containing the name of the equation
- _SIGMA_, a numeric variable containing the estimated error variance of the equation (output only if present in the EST= data set)

The observations read from the EST= data set that supply the structural coefficients are copied to the OUTEST= data set, except that the signs of endogenous coefficients are reversed. For these observations, the _TYPE_ variable values are the same as in the EST= data set.

In addition, the OUTEST= data set contains observations with the following _TYPE_ values:

REDUCED
   the reduced form coefficients. The endogenous variables for this group of observations contain the inverse of the endogenous coefficient matrix $G$. The lagged endogenous variables contain the matrix $\Pi_1 = G^{-1} C$. The exogenous variables contain the matrix $\Pi_2 = G^{-1} B$.

IMULTi
   the interim multipliers, if the INTERIM= option is specified. There are $gn$ observations for the interim multipliers, where $g$ is the number of endogenous variables and $n$ is the value of the INTERIM=n option. For these observations the _TYPE_ variable has the value IMULTi, where the interim number $i$ ranges from 1 to $n$. The exogenous variables in groups of $g$ observations that have a _TYPE_ value of IMULTi contain the matrix $D^i \Pi_2$ of multipliers at interim $i$. The endogenous and lagged endogenous variables for this group of observations are set to missing.

TOTAL
   the total multipliers, if the TOTAL option is specified. The exogenous variables in this group of observations contain the matrix $(I-D)^{-1} \Pi_2$. The endogenous and lagged endogenous variables for this group of observations are set to missing.
OUT= Data Set

The OUT= data set normally contains all of the variables in the input DATA= data set, plus the variables named in the PREDICTED= and RESIDUAL= options in the OUTPUT statement.

You can use an ID statement to restrict the variables that are copied from the input data set. If an ID statement is used, the OUT= data set contains only the BY variables (if any), the ID variables, the endogenous and lagged endogenous variables (if any), the exogenous variables, plus the PREDICTED= and RESIDUAL= variables.

The OUT= data set contains an observation for each observation in the DATA= data set. When the actual value of an endogenous variable is missing in the DATA= data set, or when the DATA= data set does not contain the endogenous variable, the corresponding residual is missing.
Printed Output

Structural Form

The following items are printed as they are read from the EST= input data set. Structural zeros are printed as dots in the listing of these matrices.

1. Structural Coefficients for Endogenous Variables. This is the $G$ matrix, with $g$ rows and $g$ columns.

2. Structural Coefficients for Lagged Endogenous Variables. These coefficients make up the $C$ matrix, with $g$ rows and $l$ columns.

3. Structural Coefficients for Exogenous Variables. These coefficients make up the $B$ matrix, with $g$ rows and $k$ columns.

Reduced Form

1. The reduced form coefficients are obtained by inverting $G$ so that the endogenous variables can be directly expressed as functions of only lagged endogenous and exogenous variables.

2. Inverse Coefficient Matrix for Endogenous Variables. This is the inverse of the $G$ matrix.

3. Reduced Form for Lagged Endogenous Variables. This is $\Pi_1 = G^{-1} C$, with $g$ rows and $l$ columns. Each value is a dynamic multiplier that shows how past values of lagged endogenous variables affect values of each of the endogenous variables.

4. Reduced Form for Exogenous Variables. This is $\Pi_2 = G^{-1} B$, with $g$ rows and $k$ columns. Its values are called impact multipliers because they show the immediate effect of each exogenous variable on the value of the endogenous variables.

Multipliers

Interim and total multipliers show the effect of a change in an exogenous variable over time.

1. Interim Multipliers. These are the interim multiplier matrices. They are formed by multiplying $\Pi_2$ by powers of $D$. The $d$th interim multiplier is $D^d \Pi_2$. The interim multiplier of order $d$ shows the effects of a change in the exogenous variables after $d$ periods. Interim multipliers are only available if the maximum lag of the endogenous variables is 1.

2. Total Multipliers. This is the matrix of total multipliers, $T = (I-D)^{-1} \Pi_2$. This matrix shows the cumulative effect of changes in the exogenous variables. Total multipliers are only available if the maximum lag is one.
Statistics of Fit

If the DATA= option is used and the DATA= data set contains endogenous variables, PROC SIMLIN prints a statistics-of-fit report for the simulation. The statistics printed include the following. (Summations are over the observations for which both $y_t$ and $\hat{y}_t$ are nonmissing.)

1. the number of nonmissing errors. (Number of observations for which both $y_t$ and $\hat{y}_t$ are nonmissing.)

2. the mean error: $\frac{1}{n} \sum (y_t - \hat{y}_t)$

3. the mean percent error: $\frac{100}{n} \sum \frac{(y_t - \hat{y}_t)}{y_t}$

4. the mean absolute error: $\frac{1}{n} \sum |y_t - \hat{y}_t|$

5. the mean absolute percent error: $\frac{100}{n} \sum \frac{|y_t - \hat{y}_t|}{y_t}$

6. the root mean square error: $\sqrt{\frac{1}{n} \sum (y_t - \hat{y}_t)^2}$

7. the root mean square percent error: $\sqrt{\frac{100}{n} \sum \left( \frac{(y_t - \hat{y}_t)}{y_t} \right)^2}$
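Several of these statistics can be recomputed from an OUT= data set as a cross-check. The sketch below assumes the data set SIM written by the OUTPUT statement in the "Getting Started" section, with actual values in Y1 and predicted values in Y1HAT; the data set name FITCHECK and the error variable names are hypothetical.

```sas
/* Per-observation error terms for the endogenous variable Y1. */
data fitcheck;
   set sim;
   if nmiss(y1, y1hat) = 0;          /* keep pairs with both values */
   err       = y1 - y1hat;
   pcterr    = 100 * err / y1;       /* missing when y1 is zero     */
   abserr    = abs(err);
   abspcterr = 100 * abs(err) / y1;
   sqerr     = err**2;
run;

/* N is the number of nonmissing errors; the means of ERR, PCTERR, */
/* ABSERR, and ABSPCTERR are statistics 2 through 5. The square    */
/* root of the mean of SQERR is the root mean square error.        */
proc means data=fitcheck n mean;
   var err pcterr abserr abspcterr sqerr;
run;
```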
ODS Table Names

PROC SIMLIN assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table.

Table 23.2 ODS Tables Produced in PROC SIMLIN

ODS Table Name      Description                                                 Option

Endogenous          Structural Coefficients for Endogenous Variables            default
LaggedEndogenous    Structural Coefficients for Lagged Endogenous Variables     default
Exogenous           Structural Coefficients for Exogenous Variables             default
InverseCoeff        Inverse Coefficient Matrix for Endogenous Variables         default
RedFormLagEndo      Reduced Form for Lagged Endogenous Variables                default
RedFormExog         Reduced Form for Exogenous Variables                        default
InterimMult         Interim Multipliers                                         INTERIM= option
TotalMult           Total Multipliers                                           TOTAL option
FitStatistics       Fit statistics                                              default
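As a brief sketch of how these names are used, the following ODS OUTPUT statement captures two of the tables as data sets. The data set names TM and IM are hypothetical, and E is assumed to hold estimates for a model with only first-order lags, since the multiplier options require that.

```sas
/* Capture the multiplier tables as data sets TM and IM (hypothetical names). */
ods output TotalMult=tm InterimMult=im;

proc simlin est=e total interim=2;
   endogenous y1 y2;
   exogenous x1 x2;
   lagged y1lag1 y1 1 y2lag1 y2 1;
run;
```

The OUTEST= option remains the documented way to obtain the multipliers in coefficient-data-set form; the ODS route simply stores the printed tables.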
Examples: SIMLIN Procedure

Example 23.1: Simulating Klein’s Model I

In this example, the SIMLIN procedure simulates a model of the U.S. economy called Klein’s Model I. The SAS data set KLEIN is used as input to the SYSLIN and SIMLIN procedures.

   data klein;
      input year c p w i x wp g t k wsum;
      date=mdy(1,1,year);
      format date year.;
      y    = c + i + g - t;
      yr   = year - 1931;
      klag = lag( k );
      plag = lag( p );
      xlag = lag( x );
      if year >= 1921;
      label c   ='consumption'
            p   ='profits'
            w   ='private wage bill'
            i   ='investment'
            k   ='capital stock'
            y   ='national income'
            x   ='private production'
            wsum='total wage bill'
            wp  ='govt wage bill'
            g   ='govt demand'
            t   ='taxes'
            klag='capital stock lagged'
            plag='profits lagged'
            xlag='private product lagged'
            yr  ='year-1931';
   datalines;
   1920     .  12.7     .    .  44.9     .    .    .  182.8     .

   ... more lines ...
First, the model is specified and estimated using the SYSLIN procedure, and the parameter estimates are written to an OUTEST= data set. The printed output produced by the SYSLIN procedure is not shown here; see Example 26.1 in Chapter 26 for the printed output of the PROC SYSLIN step.

   title1 'Simulation of Klein''s Model I using SIMLIN';
   proc syslin 3sls data=klein outest=a;
      instruments klag plag xlag wp g t yr;
      endogenous c p w i x wsum k y;
      consume: model    c = p plag wsum;
      invest:  model    i = p plag klag;
      labor:   model    w = x xlag yr;
      product: identity x = c + i + g;
      income:  identity y = c + i + g - t;
      profit:  identity p = x - w - t;
      stock:   identity k = klag + i;
      wage:    identity wsum = w + wp;
   run;
The OUTEST= data set A created by the SYSLIN procedure contains parameter estimates to be used by the SIMLIN procedure. The OUTEST= data set is shown in Output 23.1.1.
Output 23.1.1 The OUTEST= Data Set Created by PROC SYSLIN

Simulation of Klein’s Model I using SIMLIN

Obs  _TYPE_    _STATUS_     _MODEL_  _DEPVAR_  _SIGMA_  Intercept  klag      plag     xlag

  1  INST      0 Converged  FIRST    c         2.11403  58.3018    -0.14654  0.74803   0.23007
  2  INST      0 Converged  FIRST    p         2.18298  50.3844    -0.21610  0.80250   0.02200
  3  INST      0 Converged  FIRST    w         1.75427  43.4356    -0.12295  0.87192   0.09533
  4  INST      0 Converged  FIRST    i         1.72376  35.5182    -0.19251  0.92639  -0.11274
  5  INST      0 Converged  FIRST    x         3.77347  93.8200    -0.33906  1.67442   0.11733
  6  INST      0 Converged  FIRST    wsum      1.75427  43.4356    -0.12295  0.87192   0.09533
  7  INST      0 Converged  FIRST    k         1.72376  35.5182     0.80749  0.92639  -0.11274
  8  INST      0 Converged  FIRST    y         3.77347  93.8200    -0.33906  1.67442   0.11733
  9  3SLS      0 Converged  CONSUME  c         1.04956  16.4408     .        0.16314   .
 10  3SLS      0 Converged  INVEST   i         1.60796  28.1778    -0.19485  0.75572   .
 11  3SLS      0 Converged  LABOR    w         0.80149   1.7972     .        .         0.18129
 12  IDENTITY  0 Converged  PRODUCT  x         .         0.0000     .        .         .
 13  IDENTITY  0 Converged  INCOME   y         .         0.0000     .        .         .
 14  IDENTITY  0 Converged  PROFIT   p         .         0.0000     .        .         .
 15  IDENTITY  0 Converged  STOCK    k         .         0.0000     1.00000  .         .
 16  IDENTITY  0 Converged  WAGE     wsum      .         0.0000     .        .         .

Obs  wp        g        t         yr       c   p         w   i   x         wsum      k   y

  1   0.19327  0.20501  -0.36573  0.70109  -1  .         .   .   .         .         .   .
  2  -0.07961  0.43902  -0.92310  0.31941  .   -1.00000  .   .   .         .         .   .
  3  -0.44373  0.86622  -0.60415  0.71358  .   .         -1  .   .         .         .   .
  4  -0.71661  0.10023  -0.16152  0.33190  .   .         .   -1  .         .         .   .
  5  -0.52334  1.30524  -0.52725  1.03299  .   .         .   .   -1.00000  .         .   .
  6   0.55627  0.86622  -0.60415  0.71358  .   .         .   .   .         -1.00000  .   .
  7  -0.71661  0.10023  -0.16152  0.33190  .   .         .   .   .         .         -1  .
  8  -0.52334  1.30524  -1.52725  1.03299  .   .         .   .   .         .         .   -1
  9   .        .         .        .        -1  0.12489   .   .   .         0.79008   .   .
 10   .        .         .        .        .   -0.01308  .   -1  .         .         .   .
 11   .        .         .        0.14967  .   .         -1  .   0.40049   .         .   .
 12   .        1.00000   .        .        1   .         .   1   -1.00000  .         .   .
 13   .        1.00000  -1.00000  .        1   .         .   1   .         .         .   -1
 14   .        .        -1.00000  .        .   -1.00000  -1  .   1.00000   .         .   .
 15   .        .         .        .        .   .         .   1   .         .         -1  .
 16   1.00000  .         .        .        .   .         1   .   .         -1.00000  .   .
Using the OUTEST= data set A produced by the SYSLIN procedure, the SIMLIN procedure can now compute the reduced form and simulate the model. The following statements perform the simulation.

   title1 'Simulation of Klein''s Model I using SIMLIN';
   proc simlin data=klein est=a type=3sls
               estprint total interim=2 outest=b;
      endogenous c p w i x wsum k y;
      exogenous wp g t yr;
      lagged klag k 1 plag p 1 xlag x 1;
      id year;
      output out=c p=chat phat what ihat xhat wsumhat khat yhat
                   r=cres pres wres ires xres wsumres kres yres;
   run;
The reduced form coefficients and multipliers are added to the information read from EST= data set A and written to the OUTEST= data set B. The predicted and residual values from the simulation are written to the OUT= data set C specified in the OUTPUT statement.

The SIMLIN procedure first prints the structural coefficient matrices read from the EST= data set, as shown in Output 23.1.2 through Output 23.1.4.

Output 23.1.2 SIMLIN Procedure Output – Endogenous Structural Coefficients

Simulation of Klein’s Model I using SIMLIN

The SIMLIN Procedure

Structural Coefficients for Endogenous Variables

Variable        c         p         w         i

c          1.0000   -0.1249         .         .
i               .    0.0131         .    1.0000
w               .         .    1.0000         .
x         -1.0000         .         .   -1.0000
y         -1.0000         .         .   -1.0000
p               .    1.0000    1.0000         .
k               .         .         .   -1.0000
wsum            .         .   -1.0000         .

Structural Coefficients for Endogenous Variables

Variable        x      wsum         k         y

c               .   -0.7901         .         .
i               .         .         .         .
w         -0.4005         .         .         .
x          1.0000         .         .         .
y               .         .         .    1.0000
p         -1.0000         .         .         .
k               .         .    1.0000         .
wsum            .    1.0000         .         .
Output 23.1.3 SIMLIN Procedure Output – Lagged Endogenous Structural Coefficients

Structural Coefficients for Lagged Endogenous Variables

Variable      klag      plag      xlag
c                .    0.1631         .
i          -0.1948    0.7557         .
w                .         .    0.1813
x                .         .         .
y                .         .         .
p                .         .         .
k           1.0000         .         .
wsum             .         .         .
Output 23.1.4 SIMLIN Procedure Output – Exogenous Structural Coefficients

Structural Coefficients for Exogenous Variables

Variable        wp         g         t        yr  Intercept
c                .         .         .         .    16.4408
i                .         .         .         .    28.1778
w                .         .         .    0.1497     1.7972
x                .    1.0000         .         .          0
y                .    1.0000   -1.0000         .          0
p                .         .   -1.0000         .          0
k                .         .         .         .          0
wsum        1.0000         .         .         .          0
The SIMLIN procedure then prints the inverse of the endogenous variables coefficient matrix, as shown in Output 23.1.5.
Output 23.1.5 SIMLIN Procedure Output – Inverse Coefficient Matrix

Inverse Coefficient Matrix for Endogenous Variables

Variable         c         i         w         x         y         p         k      wsum
c           1.6347    0.6347    1.0957    0.6347         0    0.1959         0    1.2915
p           0.9724    0.9724   -0.3405    0.9724         0    1.1087         0    0.7682
w           0.6496    0.6496    1.4406    0.6496         0    0.0726         0    0.5132
i          -0.0127    0.9873  0.004453   -0.0127         0   -0.0145         0   -0.0100
x           1.6219    1.6219    1.1001    1.6219         0    0.1814         0    1.2815
wsum        0.6496    0.6496    1.4406    0.6496         0    0.0726         0    1.5132
k          -0.0127    0.9873  0.004453   -0.0127         0   -0.0145    1.0000   -0.0100
y           1.6219    1.6219    1.1001    0.6219    1.0000    0.1814         0    1.2815
The SIMLIN procedure next prints the reduced form coefficient matrices, as shown in Output 23.1.6.

Output 23.1.6 SIMLIN Procedure Output – Reduced Form Coefficients

Reduced Form for Lagged Endogenous Variables

Variable      klag      plag      xlag
c          -0.1237    0.7463    0.1986
p          -0.1895    0.8935   -0.0617
w          -0.1266    0.5969    0.2612
i          -0.1924    0.7440  0.000807
x          -0.3160    1.4903    0.1994
wsum       -0.1266    0.5969    0.2612
k           0.8076    0.7440  0.000807
y          -0.3160    1.4903    0.1994

Reduced Form for Exogenous Variables

Variable        wp         g         t        yr  Intercept
c           1.2915    0.6347   -0.1959    0.1640    46.7273
p           0.7682    0.9724   -1.1087   -0.0510    42.7736
w           0.5132    0.6496   -0.0726    0.2156    31.5721
i          -0.0100   -0.0127    0.0145  0.000667    27.6184
x           1.2815    1.6219   -0.1814    0.1647    74.3457
wsum        1.5132    0.6496   -0.0726    0.2156    31.5721
k          -0.0100   -0.0127    0.0145  0.000667    27.6184
y           1.2815    1.6219   -1.1814    0.1647    74.3457
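As an illustrative check that is not part of the SIMLIN output, one column of the reduced form can be reproduced by hand from the printed matrices. The sketch below assumes that the reduced form column of an exogenous variable is the inverse coefficient matrix (Output 23.1.5) times that variable's structural coefficients (Output 23.1.4). It uses the variable t, which enters only the y and p equations with coefficient -1. The Python names are chosen here for illustration only.

```python
# Rows follow the printed order of the endogenous variables.
variables = ["c", "p", "w", "i", "x", "wsum", "k", "y"]

# Columns y and p of the inverse coefficient matrix (Output 23.1.5).
inv_col_y = [0, 0, 0, 0, 0, 0, 0, 1.0000]
inv_col_p = [0.1959, 1.1087, 0.0726, -0.0145, 0.1814, 0.0726, -0.0145, 0.1814]

# Structural coefficients of t (Output 23.1.4): -1 in equations y and p.
coef_t = {"y": -1.0, "p": -1.0}

# Assumed relation: the reduced form column is the sum over equations of
# (inverse column for that equation) * (structural coefficient of t).
reduced_t = [
    coef_t["y"] * col_y + coef_t["p"] * col_p
    for col_y, col_p in zip(inv_col_y, inv_col_p)
]

for name, value in zip(variables, reduced_t):
    print(f"{name:5s} {value:8.4f}")
```

Under this assumption the computed values agree with the t column of the reduced form for exogenous variables in Output 23.1.6, up to the rounding of the printed inputs.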
The multiplier matrices (requested by the INTERIM=2 and TOTAL options) are printed next, as shown in Output 23.1.7 and Output 23.1.8.

Output 23.1.7 SIMLIN Procedure Output – Interim Multipliers

Interim Multipliers for Interim 1

Variable          wp           g           t          yr   Intercept
c           0.829130    1.049424   -0.865262   -.0054080    43.27442
p           0.609213    0.771077   -0.982167   -.0558215    28.39545
w           0.794488    1.005578   -0.710961   0.0125018    41.45124
i           0.574572    0.727231   -0.827867   -.0379117    26.57227
x           1.403702    1.776655   -1.693129   -.0433197    69.84670
wsum        0.794488    1.005578   -0.710961   0.0125018    41.45124
k           0.564524    0.714514   -0.813366   -.0372452    54.19068
y           1.403702    1.776655   -1.693129   -.0433197    69.84670

Interim Multipliers for Interim 2

Variable          wp           g           t          yr   Intercept
c           0.663671    0.840004   -0.968727   -.0456589    28.36428
p           0.350716    0.443899   -0.618929   -.0401446    10.79216
w           0.658769    0.833799   -0.925467   -.0399178    28.33114
i           0.345813    0.437694   -0.575669   -.0344035    10.75901
x           1.009485    1.277698   -1.544396   -.0800624    39.12330
wsum        0.658769    0.833799   -0.925467   -.0399178    28.33114
k           0.910337    1.152208   -1.389035   -.0716486    64.94969
y           1.009485    1.277698   -1.544396   -.0800624    39.12330
Output 23.1.8 SIMLIN Procedure Output – Total Multipliers

Total Multipliers

Variable          wp           g           t          yr   Intercept
c           1.881667    1.381613   -0.685987   0.1789624     41.3045
p           0.786945    0.996031   -1.286891   -.0748290     15.4770
w           1.094722    1.385582   -0.399095   0.2537914     25.8275
i           0.000000    0.000000   -0.000000   0.0000000      0.0000
x           1.881667    2.381613   -0.685987   0.1789624     41.3045
wsum        2.094722    1.385582   -0.399095   0.2537914     25.8275
k           2.999365    3.796275   -4.904859   -.2852032    203.6035
y           1.881667    2.381613   -1.685987   0.1789624     41.3045
The last part of the SIMLIN procedure output is a table of statistics of fit for the simulation, as shown in Output 23.1.9.

Output 23.1.9 SIMLIN Procedure Output – Simulation Statistics

Fit Statistics

                     Mean   Mean Pct   Mean Abs    Mean Abs        RMS    RMS Pct
Variable     N      Error      Error      Error   Pct Error      Error      Error
c           21     0.1367    -0.3827     3.5011     6.69769     4.3155     8.1701
p           21     0.1422    -4.0671     2.9355    19.61400     3.4257    26.0265
w           21     0.1282    -0.8939     3.1247     8.92110     4.0930    11.4709
i           21     0.1337   105.8529     2.4983   127.13736     2.9980   252.3497
x           21     0.2704    -0.9553     5.9622    10.40057     7.1881    12.5653
wsum        21     0.1282    -0.6669     3.1247     7.88988     4.0930    10.1724
k           21    -0.1424    -0.1506     3.8879     1.90614     5.0036     2.4209
y           21     0.2704    -1.3476     5.9622    11.74177     7.1881    14.2214
The OUTEST= output data set contains all the observations read from the EST= data set, and in addition contains observations for the reduced form and multiplier matrices. The following statements produce a partial listing of the OUTEST= data set, as shown in Output 23.1.10.

   proc print data=b;
      where _type_ = 'REDUCED' | _type_ = 'IMULT1';
   run;
Output 23.1.10 Partial Listing of OUTEST= Data Set Simulation of Klein’s Model I using SIMLIN
O b s 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
O b s 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
_ D E P V A R _
_ T Y P E _ REDUCED REDUCED REDUCED REDUCED REDUCED REDUCED REDUCED REDUCED IMULT1 IMULT1 IMULT1 IMULT1 IMULT1 IMULT1 IMULT1 IMULT1
_ M O D E L _
c p w i x wsum k y c p w i x wsum k y
k l a g -0.12366 -0.18946 -0.12657 -0.19237 -0.31603 -0.12657 0.80763 -0.31603 . . . . . . . .
_ S I G M A _
c
p
w
i
w s u m
x
k
y
. 1.63465 0.63465 1.09566 0.63465 0 0.19585 0 1.29151 . 0.97236 0.97236 -0.34048 0.97236 0 1.10872 0 0.76825 . 0.64957 0.64957 1.44059 0.64957 0 0.07263 0 0.51321 . -0.01272 0.98728 0.00445 -0.01272 0 -0.01450 0 -0.01005 . 1.62194 1.62194 1.10011 1.62194 0 0.18135 0 1.28146 . 0.64957 0.64957 1.44059 0.64957 0 0.07263 0 1.51321 . -0.01272 0.98728 0.00445 -0.01272 0 -0.01450 1 -0.01005 . 1.62194 1.62194 1.10011 0.62194 1 0.18135 0 1.28146 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
p l a g 0.74631 0.89347 0.59687 0.74404 1.49034 0.59687 0.74404 1.49034 . . . . . . . .
x l a g 0.19863 -0.06173 0.26117 0.00081 0.19944 0.26117 0.00081 0.19944 . . . . . . . .
w p 1.29151 0.76825 0.51321 -0.01005 1.28146 1.51321 -0.01005 1.28146 0.82913 0.60921 0.79449 0.57457 1.40370 0.79449 0.56452 1.40370
g 0.63465 0.97236 0.64957 -0.01272 1.62194 0.64957 -0.01272 1.62194 1.04942 0.77108 1.00558 0.72723 1.77666 1.00558 0.71451 1.77666
t -0.19585 -1.10872 -0.07263 0.01450 -0.18135 -0.07263 0.01450 -1.18135 -0.86526 -0.98217 -0.71096 -0.82787 -1.69313 -0.71096 -0.81337 -1.69313
y r 0.16399 -0.05096 0.21562 0.00067 0.16466 0.21562 0.00067 0.16466 -0.00541 -0.05582 0.01250 -0.03791 -0.04332 0.01250 -0.03725 -0.04332
I n t e r c e p t 46.7273 42.7736 31.5721 27.6184 74.3457 31.5721 27.6184 74.3457 43.2744 28.3955 41.4512 26.5723 69.8467 41.4512 54.1907 69.8467
The actual and predicted values for the variable C are plotted in Output 23.1.11.

   title2 'Plots of Simulation Results';
   proc sgplot data=c;
      scatter x=year y=c;
      series x=year y=chat / markers markerattrs=(symbol=plus);
      refline 1941.5 / axis=x;
   run;
Output 23.1.11 Plot of Actual and Predicted Consumption
Example 23.2: Multipliers for a Third-Order System

This example shows how to fit and simulate a single-equation dynamic model with third-order lags. It then shows how to convert the third-order equation into a three-equation system with only first-order lags, so that the SIMLIN procedure can compute multipliers. (See the section "Multipliers for Higher Order Lags" earlier in this chapter for more information.)

The input data set TEST is created from simulated data. A partial listing of the data set TEST produced by PROC PRINT is shown in Output 23.2.1.
Output 23.2.1 Partial Listing of Input Data Set

Simulate Equation with Third-Order Lags
Listing of Simulated Input Data

Obs          y     ylag1     ylag2     ylag3         x     n
  1     8.2369    8.5191    6.9491    7.8800   -1.2593     1
  2     8.6285    8.2369    8.5191    6.9491   -1.6805     2
  3    10.2223    8.6285    8.2369    8.5191   -1.9844     3
  4    10.1372   10.2223    8.6285    8.2369   -1.7855     4
  5    10.0360   10.1372   10.2223    8.6285   -1.8092     5
  6    10.3560   10.0360   10.1372   10.2223   -1.3921     6
  7    11.4835   10.3560   10.0360   10.1372   -2.0987     7
  8    10.8508   11.4835   10.3560   10.0360   -1.8788     8
  9    11.2684   10.8508   11.4835   10.3560   -1.7154     9
 10    12.6310   11.2684   10.8508   11.4835   -1.8418    10
The REG procedure processes the input data and writes the parameter estimates to the OUTEST= data set A.

   title2 'Estimated Parameters';
   proc reg data=test outest=a;
      model y=ylag3 x;
   run;

   title2 'Listing of OUTEST= Data Set';
   proc print data=a;
   run;
Output 23.2.2 shows the printed output produced by the REG procedure, and Output 23.2.3 displays the OUTEST= data set A produced.

Output 23.2.2 Estimates and Fit Information from PROC REG

Simulate Equation with Third-Order Lags
Estimated Parameters

The REG Procedure
Model: MODEL1
Dependent Variable: y

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               2    173.98377    86.99189    1691.98    <.0001
Error              27      1.38818     0.05141
Corrected Total    29    175.37196

Root MSE            0.22675    R-Square    0.9921
Dependent Mean     13.05234    Adj R-Sq    0.9915
Coeff Var           1.73721

Parameter Estimates

                   Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1      0.14239     0.23657       0.60      0.5523
ylag3         1      0.77121     0.01723      44.77      <.0001
x             1     -1.77668     0.10843     -16.39      <.0001
Output 23.2.3 The OUTEST= Data Set Created by PROC REG

Simulate Equation with Third-Order Lags
Listing of OUTEST= Data Set

Obs    _MODEL_    _TYPE_    _DEPVAR_     _RMSE_    Intercept      ylag3           x     y
  1    MODEL1     PARMS     y           0.22675      0.14239    0.77121    -1.77668    -1
The SIMLIN procedure processes the TEST data set using the estimates from PROC REG. The following statements perform the simulation and write the results to the OUT= data set OUT1.

   title2 'Simulation of Equation';
   proc simlin est=a data=test nored;
      endogenous y;
      exogenous x;
      lagged ylag3 y 3;
      id n;
      output out=out1 predicted=yhat residual=yresid;
   run;
The printed output from the SIMLIN procedure is shown in Output 23.2.4.

Output 23.2.4 Output Produced by PROC SIMLIN

Simulate Equation with Third-Order Lags
Simulation of Equation

The SIMLIN Procedure

Fit Statistics

                     Mean   Mean Pct   Mean Abs    Mean Abs        RMS    RMS Pct
Variable     N      Error      Error      Error   Pct Error      Error      Error
y           30    -0.0233    -0.2268     0.2662     2.05684     0.3408     2.6159
The following statements plot the actual and predicted values, as shown in Output 23.2.5.

   title2 'Plots of Simulation Results';
   proc sgplot data=out1;
      scatter x=n y=y;
      series x=n y=yhat / markers markerattrs=(symbol=plus);
   run;
Output 23.2.5 Plot of Predicted and Actual Values
Next, the input data set TEST is modified by creating two new variables, YLAG1X and YLAG2X, that are equal to YLAG1 and YLAG2. These variables are used in the SYSLIN procedure. (The estimates produced by PROC SYSLIN are the same as before and are not shown.) A listing of the OUTEST= data set B created by PROC SYSLIN is shown in Output 23.2.6.

   data test2;
      set test;
      ylag1x=ylag1;
      ylag2x=ylag2;
   run;

   title2 'Estimation of parameters and definition of identities';
   proc syslin data=test2 outest=b;
      endogenous y ylag1x ylag2x;
      model y=ylag3 x;
      identity ylag1x=ylag1;
      identity ylag2x=ylag2;
   run;

   title2 'Listing of OUTEST= data set from PROC SYSLIN';
   proc print data=b;
   run;
Output 23.2.6 Listing of OUTEST= Data Set Created from PROC SYSLIN

Simulate Equation with Third-Order Lags
Listing of OUTEST= data set from PROC SYSLIN

Obs  _TYPE_    _STATUS_     _MODEL_  _DEPVAR_  _SIGMA_  Intercept    ylag3         x  ylag1  ylag2   y  ylag1x  ylag2x
  1  OLS       0 Converged  y        y         0.22675    0.14239  0.77121  -1.77668      .      .  -1       .       .
  2  IDENTITY  0 Converged           ylag1x          .    0.00000        .         .      1      .   .      -1       .
  3  IDENTITY  0 Converged           ylag2x          .    0.00000        .         .      .      1   .       .      -1
The SIMLIN procedure is used to compute the reduced form and multipliers. The OUTEST= data set B from PROC SYSLIN is used as the EST= data set for the SIMLIN procedure. The following statements perform the multiplier analysis.

   title2 'Simulation of transformed first-order equation system';
   proc simlin est=b data=test2 total interim=2;
      endogenous y ylag1x ylag2x;
      exogenous x;
      lagged ylag1 y 1 ylag2 ylag1x 1 ylag3 ylag2x 1;
      id n;
      output out=out2 predicted=yhat residual=yresid;
   run;
Output 23.2.7 shows the interim 2 and total multipliers printed by the SIMLIN procedure.

Output 23.2.7 Interim 2 and Total Multipliers

Simulate Equation with Third-Order Lags
Simulation of transformed first-order equation system

The SIMLIN Procedure

Interim Multipliers for Interim 2

Variable            x    Intercept
y            0.000000    0.0000000
ylag1x       0.000000    0.0000000
ylag2x      -1.776682    0.1423865

Total Multipliers

Variable            x    Intercept
y           -7.765556    0.6223455
ylag1x      -7.765556    0.6223455
ylag2x      -7.765556    0.6223455
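The total multipliers above can be cross-checked by hand. For the single equation y(t) = intercept + a*ylag3 + b*x(t), a permanent unit change in x has the long-run effect b/(1 - a), provided |a| < 1. The sketch below is only an illustration using the rounded estimates printed by PROC REG; the function name interim_multipliers is hypothetical and is not a SIMLIN feature.

```python
# Estimates printed in Output 23.2.2 (rounded as shown there).
intercept = 0.14239
a = 0.77121      # coefficient of ylag3
b = -1.77668     # coefficient of x

# Long-run effect of a permanent unit change; valid because |a| < 1.
total_x = b / (1.0 - a)
total_intercept = intercept / (1.0 - a)

def interim_multipliers(a, b, horizon):
    """Effect on y at lags 0..horizon of a one-period unit impulse in x."""
    effects = []
    for lag in range(horizon + 1):
        # With only a third-order lag, the impulse returns every third period.
        effects.append(b * a ** (lag // 3) if lag % 3 == 0 else 0.0)
    return effects

print(round(total_x, 4), round(total_intercept, 4))
print(interim_multipliers(a, b, 3))
```

The effects on y at lags 1 and 2 are zero, which matches the y row of the interim 2 multipliers, and the sum of the effects over all lags converges to the total multiplier. Small differences from the printed values come from the rounding of the inputs.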
References

Maddala, G.S. (1977), Econometrics, New York: McGraw-Hill Book Co.

Pindyck, R.S. and Rubinfeld, D.L. (1991), Econometric Models and Economic Forecasts, Third Edition, New York: McGraw-Hill Book Co.

Theil, H. (1971), Principles of Econometrics, New York: John Wiley & Sons, Inc.
Chapter 24
The SPECTRA Procedure

Contents

Overview: SPECTRA Procedure
Getting Started: SPECTRA Procedure
Syntax: SPECTRA Procedure
   Functional Summary
   PROC SPECTRA Statement
   BY Statement
   VAR Statement
   WEIGHTS Statement
Details: SPECTRA Procedure
   Input Data
   Missing Values
   Computational Method
   Kernels
   White Noise Test
   Transforming Frequencies
   OUT= Data Set
   Printed Output
   ODS Table Names: SPECTRA procedure
Examples: SPECTRA Procedure
   Example 24.1: Spectral Analysis of Sunspot Activity
   Example 24.2: Cross-Spectral Analysis
References
Overview: SPECTRA Procedure

The SPECTRA procedure performs spectral and cross-spectral analysis of time series. You can use spectral analysis techniques to look for periodicities or cyclical patterns in data.

The SPECTRA procedure produces estimates of the spectral and cross-spectral densities of a multivariate time series. The estimates are produced by using a finite Fourier transform to obtain periodograms and cross-periodograms. The periodogram ordinates are smoothed by a moving average to produce estimated spectral and cross-spectral densities. PROC SPECTRA can also test whether or not the data are white noise.

PROC SPECTRA uses the finite Fourier transform to decompose data series into a sum of sine and cosine waves of different amplitudes and wavelengths. The Fourier transform decomposition of the series $x_t$ is

\[ x_t = \frac{a_0}{2} + \sum_{k=1}^{m} \left[ a_k \cos(\omega_k t) + b_k \sin(\omega_k t) \right] \]

where

$t$          is the time subscript, $t = 1, 2, \ldots, n$
$x_t$        are the equally spaced time series data
$n$          is the number of observations in the time series
$m$          is the number of frequencies in the Fourier decomposition: $m = \frac{n}{2}$ if $n$ is even; $m = \frac{n-1}{2}$ if $n$ is odd
$a_0$        is the mean term: $a_0 = 2\bar{x}$
$a_k$        are the cosine coefficients
$b_k$        are the sine coefficients
$\omega_k$   are the Fourier frequencies: $\omega_k = \frac{2\pi k}{n}$
Functions of the Fourier coefficients $a_k$ and $b_k$ can be plotted against frequency or against wavelength to form periodograms. The amplitude periodogram $J_k$ is defined as follows:

\[ J_k = \frac{n}{2} \left( a_k^2 + b_k^2 \right) \]

Several definitions of the term periodogram are used in the spectral analysis literature. The following discussion refers to the $J_k$ sequence as the periodogram.

The periodogram can be interpreted as the contribution of the kth harmonic $\omega_k$ to the total sum of squares (in an analysis of variance sense) in the decomposition of the process into two-degree-of-freedom components for each of the $m$ frequencies. When $n$ is even, $\sin(\omega_{n/2})$ is zero, and thus the last periodogram value is a one-degree-of-freedom component.

The periodogram is a volatile and inconsistent estimator of the spectrum. The spectral density estimate is produced by smoothing the periodogram. Smoothing reduces the variance of the estimator but introduces a bias. The weight function used for the smoothing process, W(), often called the kernel or spectral window, is specified with the WEIGHTS statement. It is related to another weight function, w(), the lag window, that is used in other methods to taper the correlogram rather than to smooth the periodogram. Many specific weighting functions have been suggested in the literature (Fuller 1976, Jenkins and Watts 1968, Priestley 1981). Table 24.3 later in this chapter gives the relevant formulas when the WEIGHTS statement is used.

Letting $i$ represent the imaginary unit $\sqrt{-1}$, the cross-periodogram is defined as follows:

\[ J_k^{xy} = \frac{n}{2} \left( a_k^x a_k^y + b_k^x b_k^y \right) + i \, \frac{n}{2} \left( a_k^x b_k^y - b_k^x a_k^y \right) \]
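The interpretation of the periodogram as a decomposition of the sum of squares can be checked numerically. The sketch below computes the Fourier coefficients directly from their defining sums, assuming the scaling 2/n (so that a_0 is twice the series mean, as stated above), and then forms J_k. The series and the function names are hypothetical, and the code is an illustration, not the procedure's implementation.

```python
import math

def fourier_coefficients(x):
    """Return (a, b): cosine and sine coefficients for k = 0..m."""
    n = len(x)
    m = n // 2                      # n/2 if n is even, (n-1)/2 if n is odd
    a, b = [], []
    for k in range(m + 1):
        w_k = 2.0 * math.pi * k / n
        # Assumed scaling 2/n, so that a[0] is twice the series mean.
        a.append(2.0 / n * sum(v * math.cos(w_k * t)
                               for t, v in enumerate(x, start=1)))
        b.append(2.0 / n * sum(v * math.sin(w_k * t)
                               for t, v in enumerate(x, start=1)))
    return a, b

def periodogram(x):
    """Amplitude periodogram J_k = (n/2) * (a_k^2 + b_k^2) for k = 0..m."""
    n = len(x)
    a, b = fourier_coefficients(x)
    return [n / 2.0 * (a_k ** 2 + b_k ** 2) for a_k, b_k in zip(a, b)]

x = [2.0, 5.0, 3.0, 8.0, 6.0]       # hypothetical series with n odd
J = periodogram(x)
mean = sum(x) / len(x)
css = sum((v - mean) ** 2 for v in x)   # corrected sum of squares
print(sum(J[1:]), css)
```

For an odd number of observations, the ordinates J_1 through J_m add up to the corrected sum of squares, and J_0 equals 2n times the squared mean.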
The cross-spectral density estimate is produced by smoothing the cross-periodogram in the same way as the periodograms are smoothed using the spectral window specified by the WEIGHTS statement. The SPECTRA procedure creates an output SAS data set whose variables contain values of the periodograms, cross-periodograms, estimates of spectral densities, and estimates of cross-spectral densities. The form of the output data set is described in the section “OUT= Data Set” on page 1552.
Getting Started: SPECTRA Procedure

To use the SPECTRA procedure, specify the input and output data sets and options for the analysis you want in the PROC SPECTRA statement, and list the variables to analyze in the VAR statement. The procedure produces no printed output unless the WHITETEST option is specified in the PROC SPECTRA statement. The periodogram, spectral density, and other results are written to the OUT= data set, depending on the options used.

For example, to compute the Fourier transform of a variable X in a data set A, use the following statements:

   proc spectra data=a out=b coef;
      var x;
   run;
This PROC SPECTRA step writes the Fourier coefficients $a_k$ and $b_k$ to the variables COS_01 and SIN_01 in the output data set B.

When a WEIGHTS statement is specified, the periodogram is smoothed by a weighted moving average to produce an estimate of the spectral density of the series. The following statements write a spectral density estimate for X to the variable S_01 in the output data set B.

   proc spectra data=a out=b s;
      var x;
      weights 1 2 3 4 3 2 1;
   run;
When the VAR statement specifies more than one variable, you can perform cross-spectral analysis by specifying the CROSS option in the PROC SPECTRA statement. The CROSS option by itself produces the cross-periodograms for all two-way combinations of the variables listed in the VAR statement. For example, the following statements write the real and imaginary parts of the cross-periodogram of X and Y to the variables RP_01_02 and IP_01_02 in the output data set B.

   proc spectra data=a out=b cross;
      var x y;
   run;
To produce cross-spectral density estimates, specify both the CROSS option and the S option. The cross-periodogram is smoothed using the weights specified by the WEIGHTS statement in the same way as the spectral density. The squared coherency and phase estimates of the cross-spectrum are computed when the K and PH options are used.

The following example computes cross-spectral density estimates for the variables X and Y.

   proc spectra data=a out=b cross s;
      var x y;
      weights 1 2 3 4 3 2 1;
   run;
The real part and imaginary part of the cross-spectral density estimates are written to the variables CS_01_02 and QS_01_02, respectively.
Syntax: SPECTRA Procedure

The following statements are used with the SPECTRA procedure:

   PROC SPECTRA options ;
      BY variables ;
      VAR variables ;
      WEIGHTS < weights > < kernel > ;
Functional Summary

Table 24.1 summarizes the statements and options that control the SPECTRA procedure.

Table 24.1 SPECTRA Functional Summary

Description                                          Statement       Option

Statements
specify BY-group processing                          BY
specify the variables to be analyzed                 VAR
specify weights for spectral density estimates       WEIGHTS

Data Set Options
specify the input data set                           PROC SPECTRA    DATA=
specify the output data set                          PROC SPECTRA    OUT=

Output Control Options
output the amplitudes of the cross-spectrum          PROC SPECTRA    A
output the Fourier coefficients                      PROC SPECTRA    COEF
output the periodogram                               PROC SPECTRA    P
output the spectral density estimates                PROC SPECTRA    S
output cross-spectral analysis results               PROC SPECTRA    CROSS
output squared coherency of the cross-spectrum       PROC SPECTRA    K
output the phase of the cross-spectrum               PROC SPECTRA    PH

Smoothing Options
specify the Bartlett kernel                          WEIGHTS         BART
specify the Parzen kernel                            WEIGHTS         PARZEN
specify the quadratic spectral kernel                WEIGHTS         QS
specify the Tukey-Hanning kernel                     WEIGHTS         TUKEY
specify the truncated kernel                         WEIGHTS         TRUNCAT

Other Options
subtract the series mean                             PROC SPECTRA    ADJMEAN
specify an alternate quadrature spectrum estimate    PROC SPECTRA    ALTW
request tests for white noise                        PROC SPECTRA    WHITETEST
PROC SPECTRA Statement

   PROC SPECTRA options ;

The following options can be used in the PROC SPECTRA statement:

A
   outputs the amplitude variables (A_nn_mm) of the cross-spectrum.

ADJMEAN
CENTER
   subtracts the series mean before performing the Fourier decomposition. This sets the first periodogram ordinate to 0 rather than 2n times the squared mean. This option is commonly used when the periodograms are to be plotted to prevent a large first periodogram ordinate from distorting the scale of the plot.

ALTW
   specifies that the quadrature spectrum estimate is computed at the boundaries in the same way as the spectral density estimate and the cospectrum estimate are computed.

COEF
   outputs the Fourier cosine and sine coefficients of each series.

CROSS
   is used with the P and S options to output cross-periodograms and cross-spectral densities when more than one variable is listed in the VAR statement.

DATA=SAS-data-set
   names the SAS data set that contains the input data. If the DATA= option is omitted, the most recently created SAS data set is used.

K
   outputs the squared coherency variables (K_nn_mm) of the cross-spectrum. The K_nn_mm variables are identically 1 unless weights are given in the WEIGHTS statement and the S option is specified.

OUT=SAS-data-set
   names the output data set created by PROC SPECTRA to store the results. If the OUT= option is omitted, the output data set is named by using the DATAn convention.

P
   outputs the periodogram variables. The variables are named P_nn, where nn is an index of the original variable with which the periodogram variable is associated. When both the P and CROSS options are specified, the cross-periodogram variables RP_nn_mm and IP_nn_mm are also output.

PH
   outputs the phase variables (PH_nn_mm) of the cross-spectrum.

S
   outputs the spectral density estimates. The variables are named S_nn, where nn is an index of the original variable with which the estimate variable is associated. When both the S and CROSS options are specified, the cross-spectral variables CS_nn_mm and QS_nn_mm are also output.

WHITETEST
   prints two tests of the hypothesis that the data are white noise. See the section “White Noise Test” on page 1551 for details.

Note that the CROSS, A, K, and PH options are meaningful only if more than one variable is listed in the VAR statement.
BY Statement

   BY variables ;

A BY statement can be used with PROC SPECTRA to obtain separate analyses for groups of observations defined by the BY variables.

VAR Statement

   VAR variables ;

The VAR statement specifies one or more numeric variables that contain the time series to analyze. The order of the variables in the VAR statement list determines the index, nn, used to name the output variables. The VAR statement is required.

WEIGHTS Statement

   WEIGHTS weight-constants | kernel-specification ;

The WEIGHTS statement specifies the relative weights used in the moving average applied to the periodogram ordinates to form the spectral density estimates. A WEIGHTS statement must be used to produce smoothed spectral density estimates. You can specify the relative weights in two ways: you can specify them explicitly as explained in the section “Using Weight Constants Specification” on page 1547, or you can specify them implicitly by using the kernel specification as explained in the section “Using Kernel Specifications” on page 1547. If the WEIGHTS statement is not used, only the periodogram is produced.
Using Weight Constants Specification

Any number of weighting constants can be specified. The constants should be positive and symmetric about the middle weight. The middle constant (or the constant to the right of the middle if an even number of weight constants are specified) is the relative weight of the current periodogram ordinate. The constant immediately following the middle one is the relative weight of the next periodogram ordinate, and so on. The actual weights used in the smoothing process are the weights specified in the WEIGHTS statement scaled so that they sum to $\frac{1}{4\pi}$.

The moving average reflects at each end of the periodogram. The first periodogram ordinate is not used; the second periodogram ordinate is used in its place.

For example, a simple triangular weighting can be specified using the following WEIGHTS statement:

   weights 1 2 3 2 1;
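The scaling and end handling described above can be sketched as follows. This is a minimal illustration, not the procedure's code: the exact index used when the moving average reflects at the ends is an assumption made here (mirroring about the first and last ordinates), and the function names are hypothetical.

```python
import math

def normalize_weights(weights):
    """Scale relative weights so that they sum to 1/(4*pi)."""
    total = sum(weights)
    return [w / (4.0 * math.pi * total) for w in weights]

def smooth(periodogram, weights):
    """Weighted moving average of periodogram ordinates.

    Assumed end handling: indexes are mirrored about the first and last
    ordinates, and ordinate 0 is replaced by ordinate 1.
    """
    scaled = normalize_weights(weights)
    half = len(weights) // 2        # position of the middle weight
    last = len(periodogram) - 1
    result = []
    for k in range(len(periodogram)):
        acc = 0.0
        for j, w in enumerate(scaled):
            idx = k + j - half
            if idx < 0:
                idx = -idx                  # reflect at the low end
            if idx > last:
                idx = 2 * last - idx        # reflect at the high end
            if idx == 0:
                idx = 1                     # first ordinate is not used
            acc += w * periodogram[idx]
        result.append(acc)
    return result

J = [9.0] + [4.0] * 7               # hypothetical periodogram, large J_0
print(smooth(J, [1, 2, 3, 2, 1]))
```

Because the first ordinate is skipped, the large value at index 0 does not leak into the smoothed estimates; a constant periodogram of 4 is mapped to 4/(4*pi) at every frequency.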
Using Kernel Specifications

You can specify five different kernels in the WEIGHTS statement. The syntax for the statement is

   WEIGHTS [PARZEN][BART][TUKEY][TRUNCAT][QS] [c e] ;

where $c \ge 0$ and $e \ge 0$ are used to compute the bandwidth parameter as

\[ l(q) = c q^e \]

and $q$ is the number of periodogram ordinates + 1:

\[ q = \mathrm{floor}(n/2) + 1 \]

To specify the bandwidth explicitly, set $c$ to the desired bandwidth and $e = 0$.

For example, a Parzen kernel can be specified using the following WEIGHTS statement:

   weights parzen 0.5 0;

For details, see the section “Kernels” on page 1549.
Details: SPECTRA Procedure
Input Data

Observations in the data set analyzed by the SPECTRA procedure should form ordered, equally spaced time series. No more than 99 variables can be included in the analysis.

Data are often detrended before analysis by the SPECTRA procedure. This can be done by using the residuals output by a SAS regression procedure. Optionally, the data can be centered using the ADJMEAN option in the PROC SPECTRA statement, since the zero periodogram ordinate corresponding to the mean is of little interest from the point of view of spectral analysis.
Missing Values

Missing values are excluded from the analysis by the SPECTRA procedure. If the SPECTRA procedure encounters missing values for any variable listed in the VAR statement, the procedure determines the longest contiguous span of data that has no missing values for the variables listed in the VAR statement and uses that span for the analysis.
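The span selection described above can be pictured with a short sketch. It is only an illustration of the idea: missing values are represented here by None or NaN, the function name is hypothetical, and choosing the earliest span when two spans have the same length is an assumption, since the text does not say how ties are resolved.

```python
import math

def longest_complete_span(rows):
    """Return (start, end) with end exclusive for the longest run of rows
    in which no analysis variable is missing."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    best = (0, 0)
    start = 0
    for i, row in enumerate(rows):
        if any(is_missing(v) for v in row):
            start = i + 1               # a missing value ends the current run
        elif i + 1 - start > best[1] - best[0]:
            best = (start, i + 1)       # strictly longer run found
    return best

rows = [(1.0, 2.0), (None, 3.0), (4.0, 5.0), (6.0, 7.0),
        (8.0, float("nan")), (9.0, 1.0)]
print(longest_complete_span(rows))
```

Here each tuple holds the values of the VAR variables for one observation, so a row is dropped from the span if any one of its variables is missing.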
Computational Method

If the number of observations n factors into prime integers that are less than or equal to 23, and the product of the square-free factors of n is less than 210, then PROC SPECTRA uses the fast Fourier transform developed by Cooley and Tukey and implemented by Singleton (1969). If n cannot be factored in this way, then PROC SPECTRA uses a Chirp-Z algorithm similar to that proposed by Monro and Branch (1976). To reduce memory requirements, when n is small, the Fourier coefficients are computed directly using the defining formulas.
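The factorization condition can be tested for a given series length with a few lines of code. The sketch below is an illustration under a stated assumption: "square-free factors" is read here as the primes that remain after squares are paired off, that is, the primes with an odd exponent in the factorization of n. The function names are hypothetical and do not correspond to anything in PROC SPECTRA.

```python
def prime_factorization(n):
    """Return {prime: exponent} for a positive integer n by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def can_use_fft(n):
    """Check the stated condition for using the fast Fourier transform."""
    factors = prime_factorization(n)
    if any(p > 23 for p in factors):
        return False
    # Assumed reading of "square-free factors": primes with an odd exponent.
    square_free = 1
    for p, e in factors.items():
        if e % 2 == 1:
            square_free *= p
    return square_free < 210

for n in (176, 210, 1024, 29):
    print(n, can_use_fft(n))
```

Under this reading, a length such as 1024 qualifies, while a prime length greater than 23 does not and would be handled by the Chirp-Z algorithm.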
Kernels

Kernels are used to smooth the periodogram by using a weighted moving average of nearby points. A smoothed periodogram is defined by the following equation.

\[ \hat{J}_i(l(q)) = \sum_{\tau=-l(q)}^{l(q)} w\!\left( \frac{\tau}{l(q)} \right) \tilde{J}_{i+\tau} \]

where $w(x)$ is the kernel or weight function. At the endpoints, the moving average is computed cyclically; that is,

\[ \tilde{J}_{i+\tau} = \begin{cases} J_{i+\tau} & 0 \le i+\tau \le q \\ J_{-(i+\tau)} & i+\tau < 0 \\ J_{q-(i+\tau)} & i+\tau > q \end{cases} \]
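The SPECTRA procedure supports the following kernels. They are listed with their default bandwidth functions.

Bartlett: KERNEL BART

\[ w(x) = \begin{cases} 1 - |x| & |x| \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad l(q) = \frac{1}{2} q^{1/3} \]

Parzen: KERNEL PARZEN

\[ w(x) = \begin{cases} 1 - 6|x|^2 + 6|x|^3 & 0 \le |x| \le \frac{1}{2} \\ 2(1 - |x|)^3 & \frac{1}{2} \le |x| \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad l(q) = q^{1/5} \]

Quadratic spectral: KERNEL QS

\[ w(x) = \frac{25}{12 \pi^2 x^2} \left( \frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5) \right) \qquad l(q) = \frac{1}{2} q^{1/5} \]

Tukey-Hanning: KERNEL TUKEY

\[ w(x) = \begin{cases} (1 + \cos(\pi x))/2 & |x| \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad l(q) = \frac{2}{3} q^{1/5} \]

Truncated: KERNEL TRUNCAT

\[ w(x) = \begin{cases} 1 & |x| \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad l(q) = \frac{1}{4} q^{1/5} \]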
The default values of the bandwidth parameters, c and e, associated with the kernel smoothers in PROC SPECTRA are listed in Table 24.2.

Table 24.2 Bandwidth Parameters

Kernel            c      e
Bartlett          1/2    1/3
Parzen            1      1/5
quadratic         1/2    1/5
Tukey-Hanning     2/3    1/5
truncated         1/4    1/5
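To make the kernel definitions and the default bandwidths concrete, the sketch below evaluates the five weight functions and the bandwidth l(q) = c q^e with q = floor(n/2) + 1. The dictionary keys reuse the kernel keywords of the WEIGHTS statement, but the Python function names and the series length are hypothetical; treat this as an illustration of the formulas rather than the procedure's implementation.

```python
import math

# Default (c, e) pairs from Table 24.2.
DEFAULTS = {
    "BART": (1 / 2, 1 / 3),
    "PARZEN": (1.0, 1 / 5),
    "QS": (1 / 2, 1 / 5),
    "TUKEY": (2 / 3, 1 / 5),
    "TRUNCAT": (1 / 4, 1 / 5),
}

def bandwidth(n, kernel, c=None, e=None):
    """Bandwidth l(q) = c * q**e with q = floor(n/2) + 1."""
    q = n // 2 + 1
    default_c, default_e = DEFAULTS[kernel]
    c = default_c if c is None else c
    e = default_e if e is None else e
    return c * q ** e

def kernel_weight(kernel, x):
    """Evaluate the kernel w(x)."""
    ax = abs(x)
    if kernel == "QS":
        if x == 0:
            return 1.0              # limit of the QS formula as x -> 0
        z = 6.0 * math.pi * x / 5.0
        return 25.0 / (12.0 * math.pi ** 2 * x ** 2) * (math.sin(z) / z - math.cos(z))
    if ax > 1:
        return 0.0                  # the other kernels vanish outside [-1, 1]
    if kernel == "BART":
        return 1.0 - ax
    if kernel == "PARZEN":
        if ax <= 0.5:
            return 1.0 - 6.0 * ax ** 2 + 6.0 * ax ** 3
        return 2.0 * (1.0 - ax) ** 3
    if kernel == "TUKEY":
        return (1.0 + math.cos(math.pi * x)) / 2.0
    if kernel == "TRUNCAT":
        return 1.0
    raise ValueError(f"unknown kernel: {kernel}")

n = 176                             # hypothetical series length
for name in DEFAULTS:
    print(name, round(bandwidth(n, name), 4), kernel_weight(name, 0.25))
```

Passing c and e explicitly mirrors a statement such as weights parzen 0.5 0, which fixes the bandwidth at 0.5 regardless of the series length.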
Figure 24.1 Kernels for Smoothing
See Andrews (1991) for details about the properties of these kernels.
White Noise Test

PROC SPECTRA prints two test statistics for white noise when the WHITETEST option is specified: Fisher’s Kappa (Davis 1941, Fuller 1976) and Bartlett’s Kolmogorov-Smirnov statistic (Bartlett 1966, Fuller 1976, Durbin 1967).

If the time series is a sequence of independent random variables with mean 0 and variance $\sigma^2$, then the periodogram, $J_k$, will have the same expected value for all k. For a time series with nonzero autocorrelation, the periodogram ordinates $J_k$ have different expected values. The Fisher’s Kappa statistic tests whether the largest $J_k$ can be considered different from the mean of the $J_k$. Critical values for the Fisher’s Kappa test can be found in Fuller (1976).

The Kolmogorov-Smirnov statistic reported by PROC SPECTRA has the same asymptotic distribution as Bartlett’s test (Durbin 1967). The Kolmogorov-Smirnov statistic compares the normalized cumulative periodogram with the cumulative distribution function of a uniform(0,1) random variable. The normalized cumulative periodogram, $F_j$, of the series is

\[ F_j = \frac{\sum_{k=1}^{j} J_k}{\sum_{k=1}^{m} J_k}, \qquad j = 1, 2, \ldots, m-1 \]

where $m = \frac{n}{2}$ if n is even or $m = \frac{n-1}{2}$ if n is odd. The test statistic is the maximum absolute difference of the normalized cumulative periodogram and the uniform cumulative distribution function. Approximate p-values for Bartlett’s Kolmogorov-Smirnov test statistics are provided with the test statistics. Small p-values cause you to reject the null hypothesis that the series is white noise.
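The two statistics can be sketched from a list of periodogram ordinates. The code below is a rough illustration under assumptions that the text does not spell out: the kappa value is taken as the largest ordinate divided by the mean ordinate, and the uniform reference value at point j is taken as j/(m - 1). The exact scaling used by PROC SPECTRA may differ, no p-values are computed, and the function name and data are hypothetical.

```python
def white_noise_statistics(J):
    """J holds the periodogram ordinates J_1..J_m (ordinate 0 excluded)."""
    m = len(J)
    total = sum(J)
    # Largest ordinate relative to the mean ordinate.
    kappa = max(J) / (total / m)
    # Normalized cumulative periodogram F_j for j = 1..m-1.
    F, running = [], 0.0
    for j in range(1, m):
        running += J[j - 1]
        F.append(running / total)
    # Assumed uniform reference: j/(m-1) at point j.
    ks = max(abs(F_j - j / (m - 1)) for j, F_j in enumerate(F, start=1))
    return kappa, ks

flat = [1.0, 1.0, 1.0, 1.0, 1.0]        # hypothetical flat periodogram
peaked = [1.0, 1.0, 10.0, 1.0, 1.0]     # one dominant frequency
print(white_noise_statistics(flat))
print(white_noise_statistics(peaked))
```

A single dominant ordinate raises both statistics relative to the flat case, which is the pattern that leads to rejection of the white noise hypothesis.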
Transforming Frequencies

The variable FREQ in the data set created by the SPECTRA procedure ranges from 0 to $\pi$. Sometimes it is preferable to express frequencies in cycles per observation period, which is equal to $\mathrm{FREQ}/(2\pi)$. To express frequencies in cycles per unit time (for example, in cycles per year), multiply FREQ by $d/(2\pi)$, where d is the number of observations per unit of time. For example, for monthly data, if the desired time unit is years then d is 12. The period of the cycle is $2\pi/(d \cdot \mathrm{FREQ})$, which ranges from $2/d$ to infinity.
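The conversions described above can be sketched in a DATA step. This is a minimal illustration, not part of the procedure: the series length n=120, the data set name FREQ_UNITS, and the new variable names are hypothetical, and the FREQ values are generated directly rather than read from an OUT= data set.

```sas
/* Sketch: convert FREQ (radians per observation) to other units. */
/* Assumes monthly data (d=12) and a hypothetical length n=120.   */
data freq_units;
   d  = 12;                         /* observations per year       */
   pi = constant('pi');
   n  = 120;                        /* hypothetical series length  */
   do k = 1 to floor(n/2);
      freq            = 2*pi*k/n;          /* same scale as FREQ   */
      cycles_per_obs  = freq/(2*pi);       /* cycles per observation */
      cycles_per_year = freq*d/(2*pi);     /* cycles per unit time   */
      period_years    = 2*pi/(d*freq);     /* period in years        */
      output;
   end;
   keep k freq cycles_per_obs cycles_per_year period_years;
run;
```

At the highest frequency (FREQ equal to pi) the period is 2/d, which is two months when d=12.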
OUT= Data Set

The OUT= data set contains $n/2 + 1$ observations, if n is even, or $(n+1)/2$ observations, if n is odd, where n is the number of observations in the time series or the span of data being analyzed if missing values are present in the data. See the section "Missing Values" on page 1548 for details.

The variables in the new data set are named according to the following conventions. Each variable to be analyzed is associated with an index. The first variable listed in the VAR statement is indexed as 01, the second variable as 02, and so on. Output variables are named by combining indexes with prefixes. The prefix always identifies the nature of the new variable, and the indices identify the original variables from which the statistics were obtained.

Variables that contain spectral analysis results have names that consist of a prefix, an underscore, and the index of the variable analyzed. For example, the variable S_01 contains spectral density estimates for the first variable in the VAR statement. Variables that contain cross-spectral analysis results have names that consist of a prefix, an underscore, the index of the first variable, another underscore, and the index of the second variable. For example, the variable A_01_02 contains the amplitude of the cross-spectral density estimate for the first and second variables in the VAR statement.

Table 24.3 shows the formulas and naming conventions used for the variables in the OUT= data set. Let X be variable number nn in the VAR statement list and let Y be variable number mm in the VAR statement list. Table 24.3 shows the output variables that contain the results of the spectral and cross-spectral analysis of X and Y.

In Table 24.3 the following notation is used. Let $W_j$ be the vector of $2p + 1$ smoothing weights given by the WEIGHTS statement, normalized to sum to $1/(4\pi)$.
Note that the weights are either explicitly provided using the constant specification or are implicitly determined by the kernel specification in the WEIGHTS statement.
The subscript of $W_j$ runs from $W_{-p}$ to $W_p$, so that $W_0$ is the middle weight in the list. Let $\omega_k = 2\pi k/n$, where $k = 0, 1, \ldots, \mathrm{floor}(n/2)$.

Table 24.3 Variables Created by PROC SPECTRA

Variable    Description

FREQ        frequency in radians from 0 to $\pi$ (Note: Cycles per observation is $\mathrm{FREQ}/(2\pi)$.)

PERIOD      period or wavelength: $2\pi/\mathrm{FREQ}$ (Note: PERIOD is missing for FREQ=0.)

COS_nn      cosine transform of X: $a_k^x = \frac{2}{n} \sum_{t=1}^{n} X_t \cos(\omega_k (t-1))$

SIN_nn      sine transform of X: $b_k^x = \frac{2}{n} \sum_{t=1}^{n} X_t \sin(\omega_k (t-1))$

P_nn        periodogram of X: $J_k^x = \frac{n}{2} \left[ (a_k^x)^2 + (b_k^x)^2 \right]$

S_nn        spectral density estimate of X: $F_k^x = \sum_{j=-p}^{p} W_j J_{k+j}^x$ (except across endpoints)

RP_nn_mm    real part of cross-periodogram of X and Y: $\mathrm{real}(J_k^{xy}) = \frac{n}{2} (a_k^x a_k^y + b_k^x b_k^y)$

IP_nn_mm    imaginary part of cross-periodogram of X and Y: $\mathrm{imag}(J_k^{xy}) = \frac{n}{2} (a_k^x b_k^y - b_k^x a_k^y)$

CS_nn_mm    cospectrum estimate (real part of cross-spectrum) of X and Y: $C_k^{xy} = \sum_{j=-p}^{p} W_j \, \mathrm{real}(J_{k+j}^{xy})$ (except across endpoints)

QS_nn_mm    quadrature spectrum estimate (imaginary part of cross-spectrum) of X and Y: $Q_k^{xy} = \sum_{j=-p}^{p} W_j \, \mathrm{imag}(J_{k+j}^{xy})$ (except across endpoints)

A_nn_mm     amplitude (modulus) of cross-spectrum of X and Y: $A_k^{xy} = \sqrt{(C_k^{xy})^2 + (Q_k^{xy})^2}$

K_nn_mm     coherency squared of X and Y: $K_k^{xy} = (A_k^{xy})^2 / (F_k^x F_k^y)$

PH_nn_mm    phase spectrum in radians of X and Y: $\Phi_k^{xy} = \arctan(Q_k^{xy} / C_k^{xy})$
Printed Output

By default PROC SPECTRA produces no printed output. When the WHITETEST option is specified, the SPECTRA procedure prints the following statistics for each variable in the VAR statement:

1. the name of the variable
2. M-1, the number of two-degree-of-freedom periodogram ordinates used in the test
3. MAX(P(*)), the maximum periodogram ordinate
4. SUM(P(*)), the sum of the periodogram ordinates
5. Fisher's Kappa statistic
6. Bartlett's Kolmogorov-Smirnov test statistic
7. Approximate p-value for Bartlett's Kolmogorov-Smirnov test statistic

See the section "White Noise Test" on page 1551 for details.
ODS Table Names: SPECTRA procedure

PROC SPECTRA assigns a name to each table it creates. You can use these names to reference the table when you use the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table.

Table 24.4 ODS Tables Produced in PROC SPECTRA

ODS Table Name    Description                               Option
WhiteNoiseTest    white noise test                          WHITETEST
Kappa             Fisher's Kappa statistic                  WHITETEST
Bartlett          Bartlett's Kolmogorov-Smirnov statistic   WHITETEST
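As a hedged illustration of how the table names in Table 24.4 might be used, the following sketch captures the Kappa and Bartlett tables as data sets with the ODS OUTPUT statement. The data set NOISE, the variable Y, and the output data set names KAPPA_DS and BARTLETT_DS are hypothetical; the columns of the captured data sets are not described in this section, so the sketch simply prints them.

```sas
/* Hypothetical white noise series used only for illustration */
data noise;
   do t = 1 to 100;
      y = rannor(123);
      output;
   end;
run;

/* Capture the white noise test tables named in Table 24.4 */
ods output Kappa=kappa_ds Bartlett=bartlett_ds;
proc spectra data=noise whitetest;
   var y;
run;

proc print data=kappa_ds;
run;
proc print data=bartlett_ds;
run;
```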
Examples: SPECTRA Procedure
Example 24.1: Spectral Analysis of Sunspot Activity

This example analyzes Wolfer's sunspot data (Anderson 1971). The following statements read and plot the data.

   title "Wolfer's Sunspot Data";
   data sunspot;
      input year wolfer @@;
   datalines;
      ... more lines ...

   proc sgplot data=sunspot;
      series x=year y=wolfer / markers markerattrs=(symbol=circlefilled);
      xaxis values=(1740 to 1930 by 10);
      yaxis values=(0 to 1600 by 200);
   run;
The plot of the sunspot series is shown in Output 24.1.1.
Output 24.1.1 Plot of Original Sunspot Data
The spectral analysis of the sunspot series is performed by the following statements:

   proc spectra data=sunspot out=b p s adjmean whitetest;
      var wolfer;
      weights 1 2 3 4 3 2 1;
   run;

   proc print data=b(obs=12);
   run;
The PROC SPECTRA statement specifies the P and S options to write the periodogram and spectral density estimates to the OUT= data set B. The WEIGHTS statement specifies a triangular spectral window for smoothing the periodogram to produce the spectral density estimate. The ADJMEAN option zeros the frequency 0 value and avoids the need to exclude that observation from the plots. The WHITETEST option prints tests for white noise. The Fisher's Kappa test statistic of 16.705 is larger than the 5% critical value of 7.2, so the null hypothesis that the sunspot series is white noise is rejected (see the table of critical values in Fuller (1976)). The Bartlett's Kolmogorov-Smirnov statistic is 0.6501, and its approximate p-value is < 0.0001. The small p-value associated with this test leads to the rejection of the null hypothesis that the spectrum represents white noise.
The printed output produced by PROC SPECTRA is shown in Output 24.1.2. The output data set B created by PROC SPECTRA is shown in part in Output 24.1.3.

Output 24.1.2 White Noise Test Results

   Wolfer's Sunspot Data
   The SPECTRA Procedure

   Test for White Noise for Variable wolfer
   M-1                87
   Max(P(*))     4062267
   Sum(P(*))    21156512

   Fisher's Kappa: (M-1)*Max(P(*))/Sum(P(*))
   Kappa        16.70489

   Bartlett's Kolmogorov-Smirnov Statistic:
   Maximum absolute difference of the standardized partial sums
   of the periodogram and the CDF of a uniform(0,1) random variable.
   Test Statistic         0.650055
   Approximate P-Value      <.0001
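The Kappa value in Output 24.1.2 follows directly from the formula printed with it. The following DATA step is only an arithmetic check using the rounded values shown in the output; the variable names are illustrative.

```sas
/* Arithmetic check of Fisher's Kappa from the printed values */
data _null_;
   m1    = 87;           /* M-1 from Output 24.1.2        */
   maxp  = 4062267;      /* Max(P(*)), as printed          */
   sump  = 21156512;     /* Sum(P(*)), as printed          */
   kappa = m1*maxp/sump; /* (M-1)*Max(P(*))/Sum(P(*))      */
   put kappa= 8.4;       /* writes the result to the log   */
run;
```

Because the printed inputs are rounded, the result should agree with the reported Kappa of 16.70489 only to about four decimal places.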
Output 24.1.3 First 12 Observations of the OUT= Data Set

   Wolfer's Sunspot Data

   Obs       FREQ     PERIOD          P_01        S_01
     1    0.00000       .             0.00    59327.52
     2    0.03570    176.000       3178.15    61757.98
     3    0.07140     88.000    2435433.22    69528.68
     4    0.10710     58.667    1077495.76    66087.57
     5    0.14280     44.000     491850.36    53352.02
     6    0.17850     35.200       2581.12    36678.14
     7    0.21420     29.333     181163.15    20604.52
     8    0.24990     25.143     283057.60    15132.81
     9    0.28560     22.000     188672.97    13265.89
    10    0.32130     19.556     122673.94    14953.32
    11    0.35700     17.600      58532.93    16402.84
    12    0.39270     16.000     213405.16    18562.13
The following statements plot the periodogram and spectral density estimate by the frequency and period.

   proc sgplot data=b;
      series x=freq y=p_01 / markers markerattrs=(symbol=circlefilled);
   run;

   proc sgplot data=b;
      series x=period y=p_01 / markers markerattrs=(symbol=circlefilled);
   run;

   proc sgplot data=b;
      series x=freq y=s_01 / markers markerattrs=(symbol=circlefilled);
   run;

   proc sgplot data=b;
      series x=period y=s_01 / markers markerattrs=(symbol=circlefilled);
   run;
The periodogram is plotted against the frequency in Output 24.1.4 and plotted against the period in Output 24.1.5. The spectral density estimate is plotted against the frequency in Output 24.1.6 and plotted against the period in Output 24.1.7.

Output 24.1.4 Plot of Periodogram by Frequency
Output 24.1.5 Plot of Periodogram by Period
Output 24.1.6 Plot of Spectral Density Estimate by Frequency
Output 24.1.7 Plot of Spectral Density Estimate by Period
Since PERIOD is the reciprocal of frequency, the plot axis for PERIOD is stretched for low frequencies and compressed at high frequencies. One way to correct for this is to use a WHERE statement to restrict the plots and exclude the low frequency components. The following statements plot the spectral density for periods less than 50.

   proc sgplot data=b;
      where period < 50;
      series x=period y=s_01 / markers markerattrs=(symbol=circlefilled);
      refline 11 / axis=x;
   run;
The spectral analysis of the sunspot series confirms a strong 11-year cycle of sunspot activity. The plot makes this clear by drawing a reference line at the 11 year period, which highlights the position of the main peak in the spectral density. Output 24.1.8 shows the plot. Contrast Output 24.1.8 with Output 24.1.7.
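The position of the main peak can also be read from the data rather than from the plot. The following is a minimal sketch that assumes the OUT= data set B created earlier in this example; the data set name PEAKS is hypothetical. The condition 0 < period excludes the frequency 0 observation, whose PERIOD is missing.

```sas
/* Sketch: list the largest spectral density estimates for periods under 50 */
proc sort data=b out=peaks;
   where 0 < period < 50;   /* also drops the missing PERIOD at FREQ=0 */
   by descending s_01;
run;

proc print data=peaks(obs=3);
   var freq period s_01;
run;
```

If the 11-year cycle dominates as the plot suggests, the first rows should have PERIOD values near 11.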
Output 24.1.8 Plot of Spectral Density Estimate by Period to 50 Years
Example 24.2: Cross-Spectral Analysis

This example uses simulated data to show cross-spectral analysis for two variables X and Y. X is generated by an AR(1) process; Y is generated as white noise plus an input from X lagged 2 periods. All output options are specified in the PROC SPECTRA statement. PROC CONTENTS shows the contents of the OUT= data set.

   data a;
      xl = 0; xll = 0;
      do i = -10 to 100;
         x = .4 * xl + rannor(123);
         y = .5 * xll + rannor(123);
         if i > 0 then output;
         xll = xl; xl = x;
      end;
   run;

   proc spectra data=a out=b cross coef a k p ph s;
      var x y;
      weights 1 1.5 2 4 8 9 8 4 2 1.5 1;
   run;

   proc contents data=b position;
   run;
The PROC CONTENTS report for the output data set B is shown in Output 24.2.1.

Output 24.2.1 Contents of PROC SPECTRA OUT= Data Set

   Wolfer's Sunspot Data
   The CONTENTS Procedure

   Alphabetic List of Variables and Attributes

    #    Variable    Type    Len    Label
   16    A_01_02     Num       8    Amplitude of x by y
    3    COS_01      Num       8    Cosine Transform of x
    5    COS_02      Num       8    Cosine Transform of y
   13    CS_01_02    Num       8    Cospectra of x by y
    1    FREQ        Num       8    Frequency from 0 to PI
   12    IP_01_02    Num       8    Imag Periodogram of x by y
   15    K_01_02     Num       8    Coherency**2 of x by y
    2    PERIOD      Num       8    Period
   17    PH_01_02    Num       8    Phase of x by y
    7    P_01        Num       8    Periodogram of x
    8    P_02        Num       8    Periodogram of y
   14    QS_01_02    Num       8    Quadrature of x by y
   11    RP_01_02    Num       8    Real Periodogram of x by y
    4    SIN_01      Num       8    Sine Transform of x
    6    SIN_02      Num       8    Sine Transform of y
    9    S_01        Num       8    Spectral Density of x
   10    S_02        Num       8    Spectral Density of y
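The relationships among these variables given in Table 24.3 can be checked numerically. The following sketch assumes the data set B created by the preceding PROC SPECTRA step; the data set name CHECK and the variables A_CHECK, K_CHECK, A_DIFF, and K_DIFF are hypothetical names introduced for illustration.

```sas
/* Sketch: recompute amplitude and squared coherency from Table 24.3 */
data check;
   set b;                    /* OUT= data set from the step above */
   /* amplitude: square root of cospectrum**2 + quadrature**2 */
   a_check = sqrt(cs_01_02**2 + qs_01_02**2);
   /* squared coherency: amplitude**2 over product of spectral densities */
   if s_01 > 0 and s_02 > 0 then
      k_check = a_check**2 / (s_01*s_02);
   a_diff = a_01_02 - a_check;
   k_diff = k_01_02 - k_check;
run;

proc means data=check min max;
   var a_diff k_diff;
run;
```

If the formulas are applied as described, both differences should be zero up to rounding error.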
The following statements plot the amplitude of the cross-spectrum estimate against frequency and against period for periods less than 25.

   proc sgplot data=b;
      series x=freq y=a_01_02 / markers markerattrs=(symbol=circlefilled);
      xaxis values=(0 to 4 by 1);
   run;
The plot of the amplitude of the cross-spectrum estimate against frequency is shown in Output 24.2.2.
Output 24.2.2 Plot of Cross-Spectrum Amplitude by Frequency
The plot of the cross-spectrum amplitude against period for periods less than 25 observations is shown in Output 24.2.3.

   proc sgplot data=b;
      where period < 25;
      series x=period y=a_01_02 / markers markerattrs=(symbol=circlefilled);
      xaxis values=(0 to 30 by 5);
   run;
Output 24.2.3 Plot of Cross-Spectrum Amplitude by Period
References

Anderson, T. W. (1971), The Statistical Analysis of Time Series, New York: John Wiley & Sons.

Andrews, D. W. K. (1991), "Heteroscedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59 (3), 817–858.

Bartlett, M. S. (1966), An Introduction to Stochastic Processes, Second Edition, Cambridge: Cambridge University Press.

Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart and Winston, Inc.

Davis, H. T. (1941), The Analysis of Economic Time Series, Bloomington, IN: Principia Press.

Durbin, J. (1967), "Tests of Serial Independence Based on the Cumulated Periodogram," Bulletin of Int. Stat. Inst., 42, 1039–1049.

Fuller, W. A. (1976), Introduction to Statistical Time Series, New York: John Wiley & Sons.

Gentleman, W. M. and Sande, G. (1966), "Fast Fourier Transforms–for Fun and Profit," AFIPS Proceedings of the Fall Joint Computer Conference, 19, 563–578.

Jenkins, G. M. and Watts, D. G. (1968), Spectral Analysis and Its Applications, San Francisco: Holden-Day.

Miller, L. H. (1956), "Tables of Percentage Points of Kolmogorov Statistics," Journal of American Statistical Association, 51, 111.

Monro, D. M. and Branch, J. L. (1976), "Algorithm AS 117. The Chirp Discrete Fourier Transform of General Length," Applied Statistics, 26, 351–361.

Nussbaumer, H. J. (1982), Fast Fourier Transform and Convolution Algorithms, Second Edition, New York: Springer-Verlag.

Owen, D. B. (1962), Handbook of Statistical Tables, Addison Wesley.

Parzen, E. (1957), "On Consistent Estimates of the Spectrum of a Stationary Time Series," Annals of Mathematical Statistics, 28, 329–348.

Priestley, M. B. (1981), Spectral Analysis and Time Series, New York: Academic Press, Inc.

Singleton, R. C. (1969), "An Algorithm for Computing the Mixed Radix Fast Fourier Transform," I.E.E.E. Transactions of Audio and Electroacoustics, AU-17, 93–103.
Chapter 25
The STATESPACE Procedure

Contents

Overview: STATESPACE Procedure  1568
   The State Space Model  1568
   How PROC STATESPACE Works  1569
Getting Started: STATESPACE Procedure  1570
   Automatic State Space Model Selection  1571
   Specifying the State Space Model  1578
Syntax: STATESPACE Procedure  1580
   Functional Summary  1581
   PROC STATESPACE Statement  1582
   BY Statement  1586
   FORM Statement  1586
   ID Statement  1586
   INITIAL Statement  1587
   RESTRICT Statement  1587
   VAR Statement  1587
Details: STATESPACE Procedure  1588
   Missing Values  1588
   Stationarity and Differencing  1588
   Preliminary Autoregressive Models  1590
   Canonical Correlation Analysis  1593
   Parameter Estimation  1596
   Forecasting  1597
   Relation of ARMA and State Space Forms  1599
   OUT= Data Set  1601
   OUTAR= Data Set  1601
   OUTMODEL= Data Set  1602
   Printed Output  1603
   ODS Table Names  1605
Examples: STATESPACE Procedure  1605
   Example 25.1: Series J from Box and Jenkins  1605
References  1611
Overview: STATESPACE Procedure

The STATESPACE procedure uses the state space model to analyze and forecast multivariate time series. The STATESPACE procedure is appropriate for jointly forecasting several related time series that have dynamic interactions. By taking into account the autocorrelations among all the variables in a set, the STATESPACE procedure can give better forecasts than methods that model each series separately.

By default, the STATESPACE procedure automatically selects a state space model appropriate for the time series, making the procedure a good tool for automatic forecasting of multivariate time series. Alternatively, you can specify the state space model by giving the form of the state vector and the state transition and innovation matrices.

The methods used by the STATESPACE procedure assume that the time series are jointly stationary. Nonstationary series must be made stationary by some preliminary transformation, usually by differencing. The STATESPACE procedure enables you to specify differencing of the input data. When differencing is specified, the STATESPACE procedure automatically integrates forecasts of the differenced series to produce forecasts of the original series.
The State Space Model

The state space model represents a multivariate time series through auxiliary variables, some of which might not be directly observable. These auxiliary variables are called the state vector. The state vector summarizes all the information from the present and past values of the time series that is relevant to the prediction of future values of the series. The observed time series are expressed as linear combinations of the state variables. The state space model is also called a Markovian representation, or a canonical representation, of a multivariate time series process. The state space approach to modeling a multivariate stationary time series is summarized in Akaike (1976).

The state space form encompasses a very rich class of models. Any Gaussian multivariate stationary time series can be written in a state space form, provided that the dimension of the predictor space is finite. In particular, any autoregressive moving average (ARMA) process has a state space representation and, conversely, any state space process can be expressed in an ARMA form (Akaike 1974). More details on the relation of the state space and ARMA forms are given in the section "Relation of ARMA and State Space Forms" on page 1599.

Let $\mathbf{x}_t$ be the $r \times 1$ vector of observed variables, after differencing (if differencing is specified) and subtracting the sample mean. Let $\mathbf{z}_t$ be the state vector of dimension s, $s \ge r$, where the first r components of $\mathbf{z}_t$ consist of $\mathbf{x}_t$. Let the notation $\mathbf{x}_{t+k|t}$ represent the conditional expectation (or prediction) of $\mathbf{x}_{t+k}$ based on the information available at time t. Then the last $s - r$ elements of $\mathbf{z}_t$ consist of elements of $\mathbf{x}_{t+k|t}$, where $k > 0$ is specified or determined automatically by the procedure.

There are various forms of the state space model in use. The form of the state space model used by the STATESPACE procedure is based on Akaike (1976). The model is defined by the following state transition equation:

$$\mathbf{z}_{t+1} = \mathbf{F}\mathbf{z}_t + \mathbf{G}\mathbf{e}_{t+1}$$

In the state transition equation, the $s \times s$ coefficient matrix $\mathbf{F}$ is called the transition matrix; it determines the dynamic properties of the model. The $s \times r$ coefficient matrix $\mathbf{G}$ is called the input matrix; it determines the variance structure of the transition equation. For model identification, the first r rows and columns of $\mathbf{G}$ are set to an $r \times r$ identity matrix.

The input vector $\mathbf{e}_t$ is a sequence of independent normally distributed random vectors of dimension r with mean $\mathbf{0}$ and covariance matrix $\boldsymbol{\Sigma}_{ee}$. The random error $\mathbf{e}_t$ is sometimes called the innovation vector or shock vector.

In addition to the state transition equation, state space models usually include a measurement equation or observation equation that gives the observed values $\mathbf{x}_t$ as a function of the state vector $\mathbf{z}_t$. However, since PROC STATESPACE always includes the observed values $\mathbf{x}_t$ in the state vector $\mathbf{z}_t$, the measurement equation in this case merely represents the extraction of the first r components of the state vector. The measurement equation used by the STATESPACE procedure is

$$\mathbf{x}_t = [\mathbf{I}_r \;\; \mathbf{0}]\,\mathbf{z}_t$$

where $\mathbf{I}_r$ is an $r \times r$ identity matrix. In practice, PROC STATESPACE performs the extraction of $\mathbf{x}_t$ from $\mathbf{z}_t$ without reference to an explicit measurement equation.

In summary:

$\mathbf{x}_t$    is an observation vector of dimension r.

$\mathbf{z}_t$    is a state vector of dimension s, whose first r elements are $\mathbf{x}_t$ and whose last $s - r$ elements are conditional predictions of future $\mathbf{x}_t$.

$\mathbf{F}$    is an $s \times s$ transition matrix.

$\mathbf{G}$    is an $s \times r$ input matrix, with the identity matrix $\mathbf{I}_r$ forming the first r rows and columns.

$\mathbf{e}_t$    is a sequence of independent normally distributed random vectors of dimension r with mean $\mathbf{0}$ and covariance matrix $\boldsymbol{\Sigma}_{ee}$.
How PROC STATESPACE Works

The design of the STATESPACE procedure closely follows the modeling strategy proposed by Akaike (1976). This strategy employs canonical correlation analysis for the automatic identification of the state space model.

Following Akaike (1976), the procedure first fits a sequence of unrestricted vector autoregressive (VAR) models and computes Akaike's information criterion (AIC) for each model. The vector autoregressive models are estimated using the sample autocovariance matrices and the Yule-Walker equations. The order of the VAR model that produces the smallest Akaike information criterion is chosen as the order (number of lags into the past) to use in the canonical correlation analysis.

The elements of the state vector are then determined via a sequence of canonical correlation analyses of the sample autocovariance matrices through the selected order. This analysis computes the sample canonical correlations of the past with an increasing number of steps into the future. Variables that yield significant correlations are added to the state vector; those that yield insignificant correlations are excluded from further consideration. The importance of the correlation is judged on the basis of another information criterion proposed by Akaike. See the section "Canonical Correlation Analysis Options" on page 1583 for details. If you specify the state vector explicitly, these model identification steps are omitted.

After the state vector is determined, the state space model is fit to the data. The free parameters in the $\mathbf{F}$, $\mathbf{G}$, and $\boldsymbol{\Sigma}_{ee}$ matrices are estimated by approximate maximum likelihood. By default, the $\mathbf{F}$ and $\mathbf{G}$ matrices are unrestricted, except for identifiability requirements. Optionally, conditional least squares estimates can be computed. You can impose restrictions on elements of the $\mathbf{F}$ and $\mathbf{G}$ matrices.

After the parameters are estimated, the Kalman filtering technique is used to produce forecasts from the fitted state space model. If differencing was specified, the forecasts are integrated to produce forecasts of the original input variables.
Getting Started: STATESPACE Procedure

The following introductory example uses simulated data for two variables X and Y. The following statements generate the X and Y series.

   data in;
      x=10; y=40;
      x1=0; y1=0;
      a1=0; b1=0;
      iseed=123;
      do t=-100 to 200;
         a=rannor(iseed);
         b=rannor(iseed);
         dx = 0.5*x1 + 0.3*y1 + a - 0.2*a1 - 0.1*b1;
         dy = 0.3*x1 + 0.5*y1 + b;
         x = x + dx + .25;
         y = y + dy + .25;
         if t >= 0 then output;
         x1 = dx; y1 = dy;
         a1 = a; b1 = b;
      end;
      keep t x y;
   run;
The simulated series X and Y are shown in Figure 25.1.
Figure 25.1 Example Series
Automatic State Space Model Selection

The STATESPACE procedure is designed to automatically select the best state space model for forecasting the series. You can specify your own model if you want, and you can use the output from PROC STATESPACE to help you identify a state space model. However, the easiest way to use PROC STATESPACE is to let it choose the model.
Stationarity and Differencing

Although PROC STATESPACE selects the state space model automatically, it does assume that the input series are stationary. If the series are nonstationary, then the process might fail. Therefore the first step is to examine your data and test to see if differencing is required. (See the section "Stationarity and Differencing" on page 1588 for further discussion of this issue.)

The series shown in Figure 25.1 are nonstationary. In order to forecast X and Y with a state space model, you must difference them (or use some other detrending method). If you fail to difference when needed and try to use PROC STATESPACE with nonstationary data, an inappropriate state space model might be selected, and the model estimation might fail to converge.

The following statements identify and fit a state space model for the first differences of X and Y, and forecast X and Y 10 periods ahead:

   proc statespace data=in out=out lead=10;
      var x(1) y(1);
      id t;
   run;
The DATA= option specifies the input data set and the OUT= option specifies the output data set for the forecasts. The LEAD= option specifies forecasting 10 observations past the end of the input data. The VAR statement specifies the variables to forecast and specifies differencing. The notation X(1) Y(1) specifies that the state space model analyzes the first differences of X and Y.
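To see what the X(1) Y(1) notation implies for the data, the first differences can be formed by hand. This is a minimal sketch, assuming the data set IN generated earlier in this section; the data set name DIFFS and the variables DX and DY are hypothetical, and the step is for inspection only since PROC STATESPACE performs the differencing itself.

```sas
/* Sketch: form the first differences that X(1) and Y(1) request */
data diffs;
   set in;             /* data set IN from the simulation step     */
   dx = dif(x);        /* (1-B)X: current value minus previous one */
   dy = dif(y);        /* (1-B)Y                                   */
run;

/* Summary statistics of the differenced series */
proc means data=diffs n mean std;
   var dx dy;
run;
```

The first observation of each differenced series is missing, so the number of nonmissing values is one less than the number of observations in IN.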
Descriptive Statistics and Preliminary Autoregressions

The first page of the printed output produced by the preceding statements is shown in Figure 25.2.

Figure 25.2 Descriptive Statistics and VAR Order Selection

   The STATESPACE Procedure

   Number of Observations    200

   Variable        Mean    Standard Error
   x           0.144316          1.233457    Has been differenced. With period(s) = 1.
   y           0.164871          1.304358    Has been differenced. With period(s) = 1.

   The STATESPACE Procedure

   Information Criterion for Autoregressive Models

      Lag=0      Lag=1      Lag=2      Lag=3      Lag=4      Lag=5      Lag=6      Lag=7      Lag=8
    149.697   8.387786   5.517099   12.05986   15.36952   21.79538   24.00638   29.88874   33.55708

   Information Criterion for Autoregressive Models

      Lag=9     Lag=10
   41.17606   47.70222

Figure 25.2 continued

   Schematic Representation of Correlations

   Name/Lag    0    1    2    3    4    5    6    7    8    9   10
   x          ++   ++   ++   ++   ++   ++   +.   ..   +.   +.   ..
   y          ++   ++   ++   ++   ++   +.   +.   +.   +.   ..   ..

   + is > 2*std error,  - is < -2*std error,  . is between
Descriptive statistics are printed first, giving the number of nonmissing observations after differencing and the sample means and standard deviations of the differenced series. The sample means are subtracted before the series are modeled (unless the NOCENTER option is specified), and the sample means are added back when the forecasts are produced.

Let $X_t$ and $Y_t$ be the observed values of X and Y, and let $x_t$ and $y_t$ be the values of X and Y after differencing and subtracting the mean difference. The series $\mathbf{x}_t$ modeled by the STATESPACE procedure is

$$\mathbf{x}_t = \begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} (1-B)X_t - 0.144316 \\ (1-B)Y_t - 0.164871 \end{bmatrix}$$

where B represents the backshift operator.

After the descriptive statistics, PROC STATESPACE prints the Akaike information criterion (AIC) values for the autoregressive models fit to the series. The smallest AIC value, in this case 5.517 at lag 2, determines the number of autocovariance matrices analyzed in the canonical correlation phase.

A schematic representation of the autocorrelations is printed next. This indicates which elements of the autocorrelation matrices at different lags are significantly greater than or less than 0.

The second page of the STATESPACE printed output is shown in Figure 25.3.

Figure 25.3 Partial Autocorrelations and VAR Model

   Schematic Representation of Partial Autocorrelations

   Name/Lag    1    2    3    4    5    6    7    8    9   10
   x          ++   +.   ..   ..   ..   ..   ..   ..   ..   ..
   y          ++   ..   ..   ..   ..   ..   ..   ..   ..   ..

   + is > 2*std error,  - is < -2*std error,  . is between

   Yule-Walker Estimates for Minimum AIC

         --------Lag=1-------    --------Lag=2-------
                x           y           x           y
   x     0.257438    0.202237    0.170812    0.133554
   y     0.292177    0.469297    -0.00537    -0.00048
Figure 25.3 shows a schematic representation of the partial autocorrelations, similar to the autocorrelations shown in Figure 25.2. The selection of a second order autoregressive model by the AIC statistic looks reasonable in this case because the partial autocorrelations for lags greater than 2 are not significant. Next, the Yule-Walker estimates for the selected autoregressive model are printed. This output shows the coefficient matrices of the vector autoregressive model at each lag.
Selected State Space Model Form and Preliminary Estimates

After the autoregressive order selection process has determined the number of lags to consider, the canonical correlation analysis phase selects the state vector. By default, output for this process is not printed. You can use the CANCORR option to print details of the canonical correlation analysis. See the section "Canonical Correlation Analysis Options" on page 1583 for an explanation of this process.

After the state vector is selected, the state space model is estimated by approximate maximum likelihood. Information from the canonical correlation analysis and from the preliminary autoregression is used to form preliminary estimates of the state space model parameters. These preliminary estimates are used as starting values for the iterative estimation process.

The form of the state vector and the preliminary estimates are printed next, as shown in Figure 25.4.

Figure 25.4 Preliminary Estimates of State Space Model

   The STATESPACE Procedure
   Selected Statespace Form and Preliminary Estimates

   State Vector
   x(T;T)    y(T;T)    x(T+1;T)

   Estimate of Transition Matrix
          0           0           1
   0.291536    0.468762    -0.00411
    0.24869     0.24484    0.204257

   Input Matrix for Innovation
          1           0
          0           1
   0.257438    0.202237

   Variance Matrix for Innovation
   0.945196    0.100786
   0.100786    1.014703
Figure 25.4 first prints the state vector as X[T;T] Y[T;T] X[T+1;T]. This notation indicates that the state vector is

$$\mathbf{z}_t = \begin{bmatrix} x_{t|t} \\ y_{t|t} \\ x_{t+1|t} \end{bmatrix}$$

The notation $x_{t+1|t}$ indicates the conditional expectation or prediction of $x_{t+1}$ based on the information available at time t, and $x_{t|t}$ and $y_{t|t}$ are $x_t$ and $y_t$, respectively.

The remainder of Figure 25.4 shows the preliminary estimates of the transition matrix $\mathbf{F}$, the input matrix $\mathbf{G}$, and the covariance matrix $\boldsymbol{\Sigma}_{ee}$.
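The same state vector could, in principle, be requested explicitly instead of being selected automatically. The following is a hedged sketch that uses the FORM statement listed in the chapter contents; it assumes that FORM takes each variable name followed by the number of times the variable enters the state vector, and the output data set name OUT2 is hypothetical. Check the FORM statement section of this chapter before relying on it.

```sas
/* Sketch: request the state vector x(T;T), y(T;T), x(T+1;T) directly. */
/* Assumes FORM takes "variable count" pairs; OUT2 is a hypothetical   */
/* output data set name.                                               */
proc statespace data=in out=out2 lead=10;
   var x(1) y(1);
   id t;
   form x 2 y 1;    /* x enters twice (x_t, x_{t+1|t}), y once (y_t) */
run;
```

When the state vector is given this way, the automatic identification steps described earlier are omitted.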
Estimated State Space Model

The next page of the STATESPACE output prints the final estimates of the fitted model, as shown in Figure 25.5. This output has the same form as in Figure 25.4, but it shows the maximum likelihood estimates instead of the preliminary estimates.

Figure 25.5 Fitted State Space Model

   The STATESPACE Procedure
   Selected Statespace Form and Fitted Model

   State Vector
   x(T;T)    y(T;T)    x(T+1;T)

   Estimate of Transition Matrix
          0           0           1
   0.297273     0.47376    -0.01998
     0.2301    0.228425    0.256031

   Input Matrix for Innovation
          1           0
          0           1
   0.257284    0.202273

   Variance Matrix for Innovation
   0.945188    0.100752
   0.100752    1.014712
1576 F Chapter 25: The STATESPACE Procedure
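The estimated state space model shown in Figure 25.5 is

$$
\begin{bmatrix} x_{t+1|t+1} \\ y_{t+1|t+1} \\ x_{t+2|t+1} \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 1 \\ 0.297 & 0.474 & -0.020 \\ 0.230 & 0.228 & 0.256 \end{bmatrix}
\begin{bmatrix} x_t \\ y_t \\ x_{t+1|t} \end{bmatrix}
+
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0.257 & 0.202 \end{bmatrix}
\begin{bmatrix} e_{t+1} \\ n_{t+1} \end{bmatrix}
$$

$$
\mathrm{var}\begin{bmatrix} e_{t+1} \\ n_{t+1} \end{bmatrix}
=
\begin{bmatrix} 0.945 & 0.101 \\ 0.101 & 1.015 \end{bmatrix}
$$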
The next page of the STATESPACE output lists the estimates of the free parameters in the F and G matrices with standard errors and t statistics, as shown in Figure 25.6.

Figure 25.6 Final Parameter Estimates

                    Parameter Estimates

   Parameter    Estimate    Standard Error    t Value
   F(2,1)       0.297273          0.129995       2.29
   F(2,2)       0.473760          0.115688       4.10
   F(2,3)       -0.01998          0.313025      -0.06
   F(3,1)       0.230100          0.126226       1.82
   F(3,2)       0.228425          0.112978       2.02
   F(3,3)       0.256031          0.305256       0.84
   G(3,1)       0.257284          0.071060       3.62
   G(3,2)       0.202273          0.068593       2.95
Convergence Failures The maximum likelihood estimates are computed by an iterative nonlinear maximization algorithm, which might not converge. If the estimates fail to converge, warning messages are printed in the output. If you encounter convergence problems, you should recheck the stationarity of the data and ensure that the specified differencing orders are correct. Attempting to fit state space models to nonstationary data is a common cause of convergence failure. You can also use the MAXIT= option to increase the number of iterations allowed, or experiment with the convergence tolerance options DETTOL= and PARMTOL=.
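As a minimal sketch of the last suggestion, the following statements rerun the introductory example with a larger iteration limit and looser tolerances. The data set IN, the variables X, Y, and T, and the differencing orders are carried over from the earlier example; the option values are illustrative assumptions, not recommended settings.

```sas
/* Illustrative values only: raise the iteration limit (default MAXIT=50) */
/* and relax the tolerances (defaults DETTOL=1E-5 and PARMTOL=0.001).     */
proc statespace data=in out=out lead=10
                maxit=200 dettol=1e-4 parmtol=0.01 itprint;
   var x(1) y(1);
   id t;
run;
```

The ITPRINT option prints the iterations, which can help show whether the estimates are moving toward a solution or wandering.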
Forecast Data Set

The following statements print the output data set. The WHERE statement excludes the first 190 observations from the output, so that only the forecasts and the last 10 actual observations are printed.

   proc print data=out;
      id t;
      where t > 190;
   run;
The PROC PRINT output is shown in Figure 25.7.

Figure 25.7 OUT= Data Set Produced by PROC STATESPACE

     t        x       FOR1       RES1       STD1         y       FOR2       RES2       STD2
   191  34.8159    33.6299    1.18600    0.97221   58.7189    57.9916    0.72728    1.00733
   192  35.0656    35.6598   -0.59419    0.97221   58.5440    59.7718   -1.22780    1.00733
   193  34.7034    35.5530   -0.84962    0.97221   59.0476    58.5723    0.47522    1.00733
   194  34.6626    34.7597   -0.09707    0.97221   59.7774    59.2241    0.55330    1.00733
   195  34.4055    34.8322   -0.42664    0.97221   60.5118    60.1544    0.35738    1.00733
   196  33.8210    34.6053   -0.78434    0.97221   59.8750    60.8260   -0.95102    1.00733
   197  34.0164    33.6230    0.39333    0.97221   58.4698    59.4502   -0.98046    1.00733
   198  35.3819    33.6251    1.75684    0.97221   60.6782    57.9167    2.76150    1.00733
   199  36.2954    36.0528    0.24256    0.97221   60.9692    62.1637   -1.19450    1.00733
   200  37.8945    37.1431    0.75142    0.97221   60.8586    61.4085   -0.54984    1.00733
   201        .    38.5068          .    0.97221         .    61.3161          .    1.00733
   202        .    39.0428          .    1.59125         .    61.7509          .    1.83678
   203        .    39.4619          .    2.28028         .    62.1546          .    2.62366
   204        .    39.8284          .    2.97824         .    62.5099          .    3.38839
   205        .    40.1474          .    3.67689         .    62.8275          .    4.12805
   206        .    40.4310          .    4.36299         .    63.1139          .    4.84149
   207        .    40.6861          .    5.03040         .    63.3755          .    5.52744
   208        .    40.9185          .    5.67548         .    63.6174          .    6.18564
   209        .    41.1330          .    6.29673         .    63.8435          .    6.81655
   210        .    41.3332          .    6.89383         .    64.0572          .    7.42114
The OUT= data set produced by PROC STATESPACE contains the VAR and ID statement variables. In addition, for each VAR statement variable, the OUT= data set contains the variables FORi, RESi, and STDi. These variables contain the predicted values, residuals, and forecast standard errors for the ith variable in the VAR statement list. In this case, X is listed first in the VAR statement, so FOR1 contains the forecasts of X, while FOR2 contains the forecasts of Y.

The following statements plot the forecasts and actuals for the series.

   proc sgplot data=out noautolegend;
      where t > 150;
      series x=t y=for1 / markers
             markerattrs=(symbol=circle color=blue)
             lineattrs=(pattern=solid color=blue);
      series x=t y=for2 / markers
             markerattrs=(symbol=circle color=blue)
             lineattrs=(pattern=solid color=blue);
      series x=t y=x / markers
             markerattrs=(symbol=circle color=red)
             lineattrs=(pattern=solid color=red);
      series x=t y=y / markers
             markerattrs=(symbol=circle color=red)
             lineattrs=(pattern=solid color=red);
      refline 200.5 / axis=x;
   run;
The forecast plot is shown in Figure 25.8. The last 50 observations are also plotted to provide context, and a reference line is drawn between the historical and forecast periods. Figure 25.8 Plot of Forecasts
Controlling Printed Output By default, the STATESPACE procedure produces a large amount of printed output. The NOPRINT option suppresses all printed output. You can suppress the printed output for the autoregressive model selection process with the PRINTOUT=NONE option. The descriptive statistics and state space model estimation output are still printed when PRINTOUT=NONE is specified. You can produce more detailed output with the PRINTOUT=LONG option and by specifying the printing control options CANCORR, COVB, and PRINT.
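As an illustration of combining these options, the following sketch suppresses the autoregressive model selection output while requesting the canonical correlation details and the covariance matrix of the parameter estimates. It assumes the data set IN and the variables X, Y, and T from the introductory example.

```sas
/* PRINTOUT=NONE hides the preliminary autoregressive output;        */
/* CANCORR and COVB add detail for the later phases of the analysis. */
proc statespace data=in out=out lead=10
                printout=none cancorr covb;
   var x(1) y(1);
   id t;
run;
```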
Specifying the State Space Model Instead of allowing the STATESPACE procedure to select the model automatically, you can use FORM and RESTRICT statements to specify a state space model.
Specifying the State Vector

Use the FORM statement to control the form of the state vector. You can use this feature to force PROC STATESPACE to estimate and forecast a model different from the model it would select automatically. You can also use this feature to reestimate the automatically selected model (possibly with restrictions) without repeating the canonical correlation analysis.

The FORM statement specifies the number of lags of each variable to include in the state vector. For example, the statement FORM X 3; forces the state vector to include $x_{t|t}$, $x_{t+1|t}$, and $x_{t+2|t}$. The following statement specifies the state vector $(x_{t|t}, y_{t|t}, x_{t+1|t})$, which is the same state vector selected in the preceding example:

   form x 2 y 1;
You can specify the form for only some of the variables and allow PROC STATESPACE to select the form for the other variables. If only some of the variables are specified in the FORM statement, canonical correlation analysis is used to determine the number of lags included in the state vector for the remaining variables not specified by the FORM statement. If the FORM statement includes specifications for all the variables listed in the VAR statement, the state vector is completely defined and the canonical correlation analysis is not performed.
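A partial specification might look like the following sketch, which fixes the number of X terms in the state vector and leaves the number of Y terms to the canonical correlation analysis. The data set and variable names are assumed from the introductory example.

```sas
/* Only X is listed in the FORM statement, so the procedure still */
/* uses canonical correlation analysis to choose the terms for Y. */
proc statespace data=in out=out lead=10;
   var x(1) y(1);
   id t;
   form x 2;
run;
```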
Restricting the F and G matrices

After you know the form of the state vector, you can use the RESTRICT statement to fix some parameters in the F and G matrices to specified values. One use of this feature is to remove insignificant parameters by restricting them to 0.

In the introductory example shown in the preceding section, the F[2,3] parameter is not significant. (The parameter estimates output shown in Figure 25.6 gives the t statistic for F[2,3] as -0.06. F[3,3] and F[3,1] also have low significance with t < 2.)

The following statements reestimate this model with F[2,3] restricted to 0. The FORM statement is used to specify the state vector and thus bypass the canonical correlation analysis.

   proc statespace data=in out=out lead=10;
      var x(1) y(1);
      id t;
      form x 2 y 1;
      restrict f(2,3)=0;
   run;
The final estimates produced by these statements are shown in Figure 25.9 and Figure 25.10.

Figure 25.9 Results Using RESTRICT Statement

                  The STATESPACE Procedure
          Selected Statespace Form and Fitted Model

                        State Vector
          x(T;T)        y(T;T)        x(T+1;T)

                Estimate of Transition Matrix
               0             0             1
        0.290051      0.467468             0
        0.227051      0.226139       0.26436

                 Input Matrix for Innovation
                      1             0
                      0             1
               0.256826      0.202022

                Variance Matrix for Innovation
               0.945175      0.100696
               0.100696      1.014733
Figure 25.10 Restricted Parameter Estimates

                    Parameter Estimates

   Parameter    Estimate    Standard Error    t Value
   F(2,1)       0.290051          0.063904       4.54
   F(2,2)       0.467468          0.060430       7.74
   F(3,1)       0.227051          0.125221       1.81
   F(3,2)       0.226139          0.111711       2.02
   F(3,3)       0.264360          0.299537       0.88
   G(3,1)       0.256826          0.070994       3.62
   G(3,2)       0.202022          0.068507       2.95
Syntax: STATESPACE Procedure

The STATESPACE procedure uses the following statements:

   PROC STATESPACE options ;
      BY variable . . . ;
      FORM variable value . . . ;
      ID variable ;
      INITIAL F(row,column)=value . . . G(row,column)=value . . . ;
      RESTRICT F(row,column)=value . . . G(row,column)=value . . . ;
      VAR variable (difference, difference, . . . ) . . . ;
Functional Summary

Table 25.1 summarizes the statements and options used by PROC STATESPACE.

Table 25.1 STATESPACE Functional Summary

   Description                                        Statement          Option

   Input Data Set Options
   specify the input data set                         PROC STATESPACE    DATA=
   prevent subtraction of sample mean                 PROC STATESPACE    NOCENTER
   specify the ID variable                            ID
   specify the observed series and differencing       VAR

   Options for Autoregressive Estimates
   specify the maximum order                          PROC STATESPACE    ARMAX=
   specify maximum lag for autocovariances            PROC STATESPACE    LAGMAX=
   output only minimum AIC model                      PROC STATESPACE    MINIC
   specify the amount of detail printed               PROC STATESPACE    PRINTOUT=
   write preliminary AR models to a data set          PROC STATESPACE    OUTAR=

   Options for Canonical Correlation Analysis
   print the sequence of canonical correlations       PROC STATESPACE    CANCORR
   specify upper limit of dimension of state vector   PROC STATESPACE    DIMMAX=
   specify the minimum number of lags                 PROC STATESPACE    PASTMIN=
   specify the multiplier of the degrees of freedom   PROC STATESPACE    SIGCORR=

   Options for State Space Model Estimation
   specify starting values                            INITIAL
   print covariance matrix of parameter estimates     PROC STATESPACE    COVB
   specify the convergence criterion                  PROC STATESPACE    DETTOL=
   specify the convergence criterion                  PROC STATESPACE    PARMTOL=
   print the details of the iterations                PROC STATESPACE    ITPRINT
   specify an upper limit of the number of lags       PROC STATESPACE    KLAG=
   specify maximum number of iterations allowed       PROC STATESPACE    MAXIT=
   suppress the final estimation                      PROC STATESPACE    NOEST
   write the state space model parameter estimates    PROC STATESPACE    OUTMODEL=
      to an output data set
   use conditional least squares for final estimates  PROC STATESPACE    RESIDEST
   specify criterion for testing for singularity      PROC STATESPACE    SINGULAR=

   Options for Forecasting
   start forecasting before end of the input data     PROC STATESPACE    BACK=
   specify the time interval between observations     PROC STATESPACE    INTERVAL=
   specify multiple periods in the time series        PROC STATESPACE    INTPER=
   specify how many periods to forecast               PROC STATESPACE    LEAD=
   specify the output data set for forecasts          PROC STATESPACE    OUT=
   print forecasts                                    PROC STATESPACE    PRINT

   Options to Specify the State Space Model
   specify the state vector                           FORM
   specify the parameter values                       RESTRICT

   BY Groups
   specify BY-group processing                        BY

   Printing
   suppresses all printed output                      PROC STATESPACE    NOPRINT
PROC STATESPACE Statement

   PROC STATESPACE options ;

The following options can be specified in the PROC STATESPACE statement.

Printing Options

NOPRINT
   suppresses all printed output.

Input Data Options

DATA=SAS-data-set
   specifies the name of the SAS data set to be used by the procedure. If the DATA= option is omitted, the most recently created SAS data set is used.

LAGMAX=k
   specifies the number of lags for which the sample autocovariance matrix is computed. The LAGMAX= option controls the number of lags printed in the schematic representation of the autocorrelations.

   The sample autocovariance matrix of lag i, denoted as $C_i$, is computed as

   $$
   C_i = \frac{1}{N-1} \sum_{t=1+i}^{N} x_t x_{t-i}'
   $$

   where $x_t$ is the differenced and centered data and N is the number of observations. (If the NOCENTER option is specified, 1 is not subtracted from N.) LAGMAX=k specifies that $C_0$ through $C_k$ are computed. The default is LAGMAX=10.

NOCENTER
   prevents subtraction of the sample mean from the input series (after any specified differencing) before the analysis.
Options for Preliminary Autoregressive Models

ARMAX=n
   specifies the maximum order of the preliminary autoregressive models. The ARMAX= option controls the autoregressive orders for which information criteria are printed, and controls the number of lags printed in the schematic representation of partial autocorrelations. The default is ARMAX=10. See the section “Preliminary Autoregressive Models” on page 1590 for details.

MINIC
   writes to the OUTAR= data set only the preliminary Yule-Walker estimates for the VAR model that produces the minimum AIC. See the section “OUTAR= Data Set” on page 1601 for details.

OUTAR=SAS-data-set
   writes the Yule-Walker estimates of the preliminary autoregressive models to a SAS data set. See the section “OUTAR= Data Set” on page 1601 for details.

PRINTOUT=SHORT | LONG | NONE
   determines the amount of detail printed. PRINTOUT=LONG prints the lagged covariance matrices, the partial autoregressive matrices, and estimates of the residual covariance matrices from the sequence of autoregressive models. PRINTOUT=NONE suppresses the output for the preliminary autoregressive models. The descriptive statistics and state space model estimation output are still printed when PRINTOUT=NONE is specified. PRINTOUT=SHORT is the default.
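The options in this group can be combined in one step. The following sketch limits the preliminary models to order 5, saves only the minimum-AIC Yule-Walker estimates, and prints the longer listing. The data set name AREST is a hypothetical choice, and IN, X, and Y are assumed from the introductory example.

```sas
/* AREST is a hypothetical output data set name. With MINIC, only */
/* the minimum-AIC model's estimates are written to it.           */
proc statespace data=in armax=5 outar=arest minic printout=long;
   var x(1) y(1);
run;

proc print data=arest;
run;
```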
Canonical Correlation Analysis Options

CANCORR
   prints the canonical correlations and information criterion for each candidate state vector considered. See the section “Canonical Correlation Analysis Options” on page 1583 for details.

DIMMAX=n
   specifies the upper limit to the dimension of the state vector. The DIMMAX= option can be used to limit the size of the model selected. The default is DIMMAX=10.

PASTMIN=n
   specifies the minimum number of lags to include in the canonical correlation analysis. The default is PASTMIN=0. See the section “Canonical Correlation Analysis Options” on page 1583 for details.

SIGCORR=value
   specifies the multiplier of the degrees of freedom for the penalty term in the information criterion used to select the state space form. The default is SIGCORR=2. The larger the value of the SIGCORR= option, the smaller the state vector tends to be. Hence, a large value causes a simpler model to be fit. See the section “Canonical Correlation Analysis Options” on page 1583 for details.
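To see how these options steer the selection toward a smaller model, consider the following sketch. The values SIGCORR=3 and DIMMAX=4 are illustrative assumptions, and the data set and variables are taken from the introductory example.

```sas
/* A larger penalty multiplier (default SIGCORR=2) and a cap on the */
/* state vector dimension (default DIMMAX=10) favor a simpler model. */
/* CANCORR prints the canonical correlations for each candidate.     */
proc statespace data=in cancorr sigcorr=3 dimmax=4;
   var x(1) y(1);
run;
```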
State Space Model Estimation Options

COVB
   prints the inverse of the observed information matrix for the parameter estimates. This matrix is an estimate of the covariance matrix for the parameter estimates.

DETTOL=value
   specifies the convergence criterion. The DETTOL= and PARMTOL= option values are used together to test for convergence of the estimation process. If, during an iteration, the relative change of the parameter estimates is less than the PARMTOL= value and the relative change of the determinant of the innovation variance matrix is less than the DETTOL= value, then iteration ceases and the current estimates are accepted. The default is DETTOL=1E-5.

ITPRINT
   prints the iterations during the estimation process.

KLAG=n
   sets an upper limit for the number of lags of the sample autocovariance matrix used in computing the approximate likelihood function. If the data have a strong moving average character, a larger KLAG= value might be necessary to obtain good estimates. The default is KLAG=15. See the section “Parameter Estimation” on page 1596 for details.

MAXIT=n
   sets an upper limit to the number of iterations in the maximum likelihood or conditional least squares estimation. The default is MAXIT=50.

NOEST
   suppresses the final maximum likelihood estimation of the selected model.

OUTMODEL=SAS-data-set
   writes the parameter estimates and their standard errors to a SAS data set. See the section “OUTMODEL= Data Set” on page 1602 for details.

PARMTOL=value
   specifies the convergence criterion. The DETTOL= and PARMTOL= option values are used together to test for convergence of the estimation process. If, during an iteration, the relative change of the parameter estimates is less than the PARMTOL= value and the relative change of the determinant of the innovation variance matrix is less than the DETTOL= value, then iteration ceases and the current estimates are accepted. The default is PARMTOL=0.001.

RESIDEST
   computes the final estimates by using conditional least squares on the raw data. This type of estimation might be more stable than the default maximum likelihood method but is usually more computationally expensive. See the section “Parameter Estimation” on page 1596 for details about the conditional least squares method.

SINGULAR=value
   specifies the criterion for testing for singularity of a matrix. A matrix is declared singular if a scaled pivot is less than the SINGULAR= value when sweeping the matrix. The default is SINGULAR=1E-7.
Forecasting Options

BACK=n
   starts forecasting n periods before the end of the input data. The BACK= option value must not be greater than the number of observations. The default is BACK=0.

INTERVAL=interval
   specifies the time interval between observations. The INTERVAL= value is used in conjunction with the ID variable to check that the input data are in order and have no missing periods. The INTERVAL= option is also used to extrapolate the ID values past the end of the input data. See Chapter 4, “Date Intervals, Formats, and Functions,” for details about the INTERVAL= values allowed.

INTPER=n
   specifies that each input observation corresponds to n time periods. For example, the options INTERVAL=MONTH and INTPER=2 specify bimonthly data and are equivalent to specifying INTERVAL=MONTH2. If the INTERVAL= option is not specified, the INTPER= option controls the increment used to generate ID values for the forecast observations. The default is INTPER=1.

LEAD=n
   specifies how many forecast observations are produced. The forecasts start at the point set by the BACK= option. The default is LEAD=0, which produces no forecasts.

OUT=SAS-data-set
   writes the residuals, actual values, forecasts, and forecast standard errors to a SAS data set. See the section “OUT= Data Set” on page 1601 for details.

PRINT
   prints the forecasts.
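The forecasting options are often used together to hold back recent data for comparison. In the following sketch, BACK=10 starts the forecasts 10 periods before the end of the input data and LEAD=20 requests 20 forecast observations, so, going by the option descriptions above, roughly the first 10 forecasts should overlap observed values. The output data set name FCST is hypothetical; IN, X, Y, and T are assumed from the introductory example.

```sas
/* FCST is a hypothetical output data set name. */
proc statespace data=in out=fcst back=10 lead=20 print;
   var x(1) y(1);
   id t;
run;
```

Comparing the FORi and actual values over the held-back periods gives an informal check of forecast accuracy.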
BY Statement BY variable . . . ;
A BY statement can be used with the STATESPACE procedure to obtain separate analyses on observations in groups defined by the BY variables.
FORM Statement FORM variable value . . . ;
The FORM statement specifies the number of times a variable is included in the state vector. Values can be specified for any variable listed in the VAR statement. If a value is specified for each variable in the VAR statement, the state vector for the state space model is entirely specified, and automatic selection of the state space model is not performed.

The FORM statement forces the state vector, $z_t$, to contain a specific variable a given number of times. For example, if Y is one of the variables in $x_t$, then the statement

   form y 3;

forces the state vector to contain $Y_t$, $Y_{t+1|t}$, and $Y_{t+2|t}$, possibly along with other variables.

The following statements illustrate the use of the FORM statement:

   proc statespace data=in;
      var x y;
      form x 3 y 2;
   run;

These statements fit a state space model with the following state vector:

$$
z_t = \begin{bmatrix} x_{t|t} \\ y_{t|t} \\ x_{t+1|t} \\ y_{t+1|t} \\ x_{t+2|t} \end{bmatrix}
$$
ID Statement ID variable ;
The ID statement specifies a variable that identifies observations in the input data set. The variable specified in the ID statement is included in the OUT= data set. The values of the ID variable are extrapolated for the forecast observations based on the values of the INTERVAL= and INTPER= options.
INITIAL Statement

   INITIAL F(row,column)=value . . . G(row,column)=value . . . ;
The INITIAL statement gives initial values to the specified elements of the F and G matrices. These initial values are used as starting values for the iterative estimation. Parts of the F and G matrices represent fixed structural identities. If an element specified is a fixed structural element instead of a free parameter, the corresponding initialization is ignored. The following is an example of an INITIAL statement: initial f(3,2)=0 g(4,1)=0 g(5,1)=0;
RESTRICT Statement RESTRICT F(row,column)= value . . . G(row,column)= value . . . ;
The RESTRICT statement restricts the specified elements of the F and G matrices to the specified values. To use the restrict statement, you need to know the form of the model. Either specify the form of the model with the FORM statement, or do a preliminary run (perhaps with the NOEST option) to find the form of the model that PROC STATESPACE selects for the data. The following is an example of a RESTRICT statement: restrict f(3,2)=0 g(4,1)=0 g(5,1)=0 ;
Parts of the F and G matrices represent fixed structural identities. If a restriction is specified for an element that is a fixed structural element instead of a free parameter, the restriction is ignored.
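The two-step workflow described for the RESTRICT statement might be sketched as follows. The first step uses NOEST to report the selected form without the final estimation; the second step repeats that form and adds a restriction. The FORM and RESTRICT values shown are those of the introductory example and are assumptions for any other data.

```sas
/* Step 1: find the state vector that the procedure selects, */
/* skipping the final maximum likelihood estimation.         */
proc statespace data=in noest;
   var x(1) y(1);
run;

/* Step 2: specify the form found in step 1 and restrict a free  */
/* parameter. Restrictions on fixed structural elements would be */
/* ignored.                                                      */
proc statespace data=in out=out lead=10;
   var x(1) y(1);
   id t;
   form x 2 y 1;
   restrict f(2,3)=0;
run;
```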
VAR Statement VAR variable (difference, difference, . . . ) . . . ;
The VAR statement specifies the variables in the input data set to model and forecast. The VAR statement also specifies differencing of the input variables. The VAR statement is required. Differencing is specified by following the variable name with a list of difference periods separated by commas. See the section “Stationarity and Differencing” on page 1588 for more information about differencing of input variables. The order in which variables are listed in the VAR statement controls the order in which variables are included in the state vector. Usually, potential inputs should be listed before potential outputs.
For example, assuming the input data are monthly, the following VAR statement specifies modeling and forecasting of the one period and seasonal second difference of X and Y:

   var x(1,12) y(1,12);

In this example, the vector time series analyzed is

$$
x_t = \begin{bmatrix} (1-B)(1-B^{12})X_t - \bar{x} \\ (1-B)(1-B^{12})Y_t - \bar{y} \end{bmatrix}
$$

where B represents the back shift operator and $\bar{x}$ and $\bar{y}$ represent the means of the differenced series. If the NOCENTER option is specified, the mean differences are not subtracted.
Details: STATESPACE Procedure
Missing Values The STATESPACE procedure does not support missing values. The procedure uses the first contiguous group of observations with no missing values for any of the VAR statement variables. Observations at the beginning of the data set with missing values for any VAR statement variable are not used or included in the output data set.
Stationarity and Differencing The state space model used by the STATESPACE procedure assumes that the time series are stationary. Hence, the data should be checked for stationarity. One way to check for stationarity is to plot the series. A graph of series over time can show a time trend or variability changes. You can also check stationarity by using the sample autocorrelation functions displayed by the ARIMA procedure. The autocorrelation functions of nonstationary series tend to decay slowly. See Chapter 7, “The ARIMA Procedure,” for more information. Another alternative is to use the STATIONARITY= option in the IDENTIFY statement in PROC ARIMA to apply Dickey-Fuller tests for unit roots in the time series. See Chapter 7, “The ARIMA Procedure,” for more information about Dickey-Fuller unit root tests. The most popular way to transform a nonstationary series to stationarity is by differencing. Differencing of the time series is specified in the VAR statement. For example, to take a simple first difference of the series X, use this statement: var x(1);
In this example, the change in X from one period to the next is analyzed. When the series has a seasonal pattern, differencing at a period equal to the length of the seasonal cycle can be desirable. For example, suppose the variable X is measured quarterly and shows a seasonal cycle over the year. You can use the following statement to analyze the series of changes from the same quarter in the previous year: var x(4);
To difference twice, add another differencing period to the list. For example, the following statement analyzes the series of second differences $(X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) = X_t - 2X_{t-1} + X_{t-2}$:

   var x(1,1);
The following statement analyzes the seasonal second difference series: var x(1,4);
The series that is being modeled is the 1-period difference of the 4-period difference: $(X_t - X_{t-4}) - (X_{t-1} - X_{t-5}) = X_t - X_{t-1} - X_{t-4} + X_{t-5}$.

Another way to obtain stationary series is to use a regression on time to detrend the data. If the time series has a deterministic linear trend, regressing the series on time produces residuals that should be stationary. The following statements write residuals of X and Y to the variables RX and RY in the output data set DETREND.

   data a;
      set a;
      t=_n_;
   run;

   proc reg data=a;
      model x y = t;
      output out=detrend r=rx ry;
   run;
You then use PROC STATESPACE to forecast the detrended series RX and RY. A disadvantage of this method is that you need to add the trend back to the forecast series in an additional step. A more serious disadvantage of the detrending method is that it assumes a deterministic trend. In practice, most time series appear to have a stochastic rather than a deterministic trend. Differencing is a more flexible and often more appropriate method. There are several other methods to handle nonstationary time series. For more information and examples, see Brockwell and Davis (1991).
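The Dickey-Fuller check mentioned earlier in this section can be sketched with PROC ARIMA as follows. The data set IN and the variable X are assumed from the introductory example, and the augmenting lags 0, 1, and 2 are an illustrative choice rather than a recommendation.

```sas
proc arima data=in;
   /* Dickey-Fuller tests on the level of X */
   identify var=x stationarity=(adf=(0,1,2));
   /* The same tests on the first difference of X */
   identify var=x(1) stationarity=(adf=(0,1,2));
run;
quit;
```

If the tests point to a unit root in the level but not in the first difference, specifying x(1) in the VAR statement of PROC STATESPACE is a reasonable starting point.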
Preliminary Autoregressive Models After computing the sample autocovariance matrices, PROC STATESPACE fits a sequence of vector autoregressive models. These preliminary autoregressive models are used to estimate the autoregressive order of the process and limit the order of the autocovariances considered in the state vector selection process.
Yule-Walker Equations for Forward and Backward Models

Unlike a univariate autoregressive model, a multivariate autoregressive model has different forms, depending on whether the present observation is being predicted from the past observations or from the future observations.

Let $x_t$ be the r-component stationary time series given by the VAR statement after differencing and subtracting the vector of sample means. (If the NOCENTER option is specified, the mean is not subtracted.) Let n be the number of observations of $x_t$ from the input data set.

Let $e_t$ be a vector white noise sequence with mean vector 0 and variance matrix $\Sigma_p$, and let $n_t$ be a vector white noise sequence with mean vector 0 and variance matrix $\Omega_p$. Let p be the order of the vector autoregressive model for $x_t$.

The forward autoregressive form based on the past observations is written as follows:

$$
x_t = \sum_{i=1}^{p} \Phi_i^p x_{t-i} + e_t
$$

The backward autoregressive form based on the future observations is written as follows:

$$
x_t = \sum_{i=1}^{p} \Psi_i^p x_{t+i} + n_t
$$

Letting E denote the expected value operator, the autocovariance sequence for the $x_t$ series, $\Gamma_i$, is

$$
\Gamma_i = E\, x_t x_{t-i}'
$$
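The Yule-Walker equations for the autoregressive model that matches the first p elements of the autocovariance sequence are

$$
\begin{bmatrix}
\Gamma_0 & \Gamma_1 & \cdots & \Gamma_{p-1} \\
\Gamma_1' & \Gamma_0 & \cdots & \Gamma_{p-2} \\
\vdots & \vdots & & \vdots \\
\Gamma_{p-1}' & \Gamma_{p-2}' & \cdots & \Gamma_0
\end{bmatrix}
\begin{bmatrix} \Phi_1^p \\ \Phi_2^p \\ \vdots \\ \Phi_p^p \end{bmatrix}
=
\begin{bmatrix} \Gamma_1 \\ \Gamma_2 \\ \vdots \\ \Gamma_p \end{bmatrix}
$$

and

$$
\begin{bmatrix}
\Gamma_0 & \Gamma_1' & \cdots & \Gamma_{p-1}' \\
\Gamma_1 & \Gamma_0 & \cdots & \Gamma_{p-2}' \\
\vdots & \vdots & & \vdots \\
\Gamma_{p-1} & \Gamma_{p-2} & \cdots & \Gamma_0
\end{bmatrix}
\begin{bmatrix} \Psi_1^p \\ \Psi_2^p \\ \vdots \\ \Psi_p^p \end{bmatrix}
=
\begin{bmatrix} \Gamma_1' \\ \Gamma_2' \\ \vdots \\ \Gamma_p' \end{bmatrix}
$$

Here $\Phi_i^p$ are the coefficient matrices for the past observation form of the vector autoregressive model, and $\Psi_i^p$ are the coefficient matrices for the future observation form. More information about the Yule-Walker equations in the multivariate setting can be found in Whittle (1963) and Ansley and Newbold (1979).

The innovation variance matrices for the two forms can be written as follows:

$$
\Sigma_p = \Gamma_0 - \sum_{i=1}^{p} \Phi_i^p \Gamma_i'
$$

$$
\Omega_p = \Gamma_0 - \sum_{i=1}^{p} \Psi_i^p \Gamma_i
$$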
The autoregressive models are fit to the data by using the preceding Yule-Walker equations with $\Gamma_i$ replaced by the sample covariance sequence $C_i$. The covariance matrices are calculated as

$$
C_i = \frac{1}{N-1} \sum_{t=i+1}^{N} x_t x_{t-i}'
$$

Let $\hat{\Phi}_p$, $\hat{\Psi}_p$, $\hat{\Sigma}_p$, and $\hat{\Omega}_p$ represent the Yule-Walker estimates of $\Phi_p$, $\Psi_p$, $\Sigma_p$, and $\Omega_p$, respectively. These matrices are written to an output data set when the OUTAR= option is specified.

When the PRINTOUT=LONG option is specified, the sequence of matrices $\hat{\Sigma}_p$ and the corresponding correlation matrices are printed. The sequence of matrices $\hat{\Sigma}_p$ is used to compute Akaike information criteria for selection of the autoregressive order of the process.
Akaike Information Criterion

The Akaike information criterion (AIC) is defined as -2(maximum of log likelihood) + 2(number of parameters). Since the vector autoregressive models are estimated from the Yule-Walker equations, not by maximum likelihood, the exact likelihood values are not available for computing the AIC. However, for the vector autoregressive model the maximum of the log likelihood can be approximated as

$$
\ln(L) \approx -\frac{n}{2} \ln(|\hat{\Sigma}_p|)
$$

Thus, the AIC for the order p model is computed as

$$
\mathrm{AIC}_p = n \ln(|\hat{\Sigma}_p|) + 2pr^2
$$

You can use the printed AIC array to compute a likelihood ratio test of the autoregressive order. The log-likelihood ratio test statistic for testing the order p model against the order p-1 model is

$$
-n \ln(|\hat{\Sigma}_p|) + n \ln(|\hat{\Sigma}_{p-1}|)
$$

This quantity is asymptotically distributed as a $\chi^2$ with $r^2$ degrees of freedom if the series is autoregressive of order p-1. It can be computed from the AIC array as

$$
\mathrm{AIC}_{p-1} - \mathrm{AIC}_p + 2r^2
$$

You can evaluate the significance of these test statistics with the PROBCHI function in a SAS DATA step or with a $\chi^2$ table.
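The likelihood ratio computation can be sketched in a DATA step as follows. The two AIC values are hypothetical placeholders, to be replaced with entries read from the printed AIC array, and r is set to 2 to match the two-series X and Y example.

```sas
data lrtest;
   r       = 2;       /* number of series in the VAR statement       */
   aic_pm1 = 12.5;    /* hypothetical AIC for the order p-1 model    */
   aic_p   = 3.1;     /* hypothetical AIC for the order p model      */
   lr      = aic_pm1 - aic_p + 2*r**2;   /* likelihood ratio statistic */
   df      = r**2;                       /* degrees of freedom         */
   pvalue  = 1 - probchi(lr, df);        /* upper-tail probability     */
run;

proc print data=lrtest;
run;
```

A small value of PVALUE suggests that the order p model fits significantly better than the order p-1 model.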
Determining the Autoregressive Order Although the autoregressive models can be used for prediction, their primary value is to aid in the selection of a suitable portion of the sample covariance matrix for use in computing canonical correlations. If the multivariate time series xt is of autoregressive order p, then the vector of past values to lag p is considered to contain essentially all the information relevant for prediction of future values of the time series. By default, PROC STATESPACE selects the order p that produces the autoregressive model with the smallest AICp . If the value p for the minimum AICp is less than the value of the PASTMIN= option, then p is set to the PASTMIN= value. Alternatively, you can use the ARMAX= and PASTMIN= options to force PROC STATESPACE to use an order you select.
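As one possible way to force a particular order, the following sketch sets ARMAX= and PASTMIN= to the same value. Because the selected order cannot exceed ARMAX= and is raised to PASTMIN= when smaller, this combination should leave order 3 as the only possibility. The value 3 and the data set and variable names are assumptions for illustration.

```sas
/* Preliminary models are fit only up to order 3 (ARMAX=3), and any */
/* smaller minimum-AIC order is raised to 3 (PASTMIN=3).            */
proc statespace data=in armax=3 pastmin=3;
   var x(1) y(1);
run;
```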
Significance Limits for Partial Autocorrelations

The STATESPACE procedure prints a schematic representation of the partial autocorrelation matrices that indicates which partial autocorrelations are significantly greater than or significantly less than 0. Figure 25.11 shows an example of this table.

Figure 25.11 Significant Partial Autocorrelations

          Schematic Representation of Partial Autocorrelations

   Name/Lag    1    2    3    4    5    6    7    8    9    10
   x           ++   +.   ..   ..   ..   ..   ..   ..   ..   ..
   y           ++   ..   ..   ..   ..   ..   ..   ..   ..   ..

   + is > 2*std error,  - is < -2*std error,  . is between
The partial autocorrelations are from the sample partial autoregressive matrices $\hat{\Phi}_p^p$. The standard errors used for the significance limits of the partial autocorrelations are computed from the sequence of matrices $\Sigma_p$ and $\Omega_p$.

Under the assumption that the observed series arises from an autoregressive process of order p-1, the pth sample partial autoregressive matrix $\hat{\Phi}_p^p$ has an asymptotic variance matrix $\frac{1}{n}\Omega_p^{-1} \otimes \Sigma_p$.

The significance limits for $\hat{\Phi}_p^p$ used in the schematic plot of the sample partial autoregressive sequence are derived by replacing $\Omega_p$ and $\Sigma_p$ with their sample estimators to produce the variance estimate, as follows:

$$
\widehat{\mathrm{Var}}\left(\hat{\Phi}_p^p\right) = \frac{1}{n - rp}\, \hat{\Omega}_p^{-1} \otimes \hat{\Sigma}_p
$$
Canonical Correlation Analysis

Given the order $p$, let $p_t$ be the vector of current and past values relevant to prediction of $x_{t+1}$:

$$p_t = (x_t', x_{t-1}', \ldots, x_{t-p}')'$$

Let $f_t$ be the vector of current and future values:

$$f_t = (x_t', x_{t+1}', \ldots, x_{t+p}')'$$

In the canonical correlation analysis, consider submatrices of the sample covariance matrix of $p_t$ and $f_t$. This covariance matrix, $V$, has a block Hankel form:

$$V = \begin{bmatrix}
C_0 & C_1' & C_2' & \cdots & C_p' \\
C_1' & C_2' & C_3' & \cdots & C_{p+1}' \\
\vdots & \vdots & \vdots & & \vdots \\
C_p' & C_{p+1}' & C_{p+2}' & \cdots & C_{2p}'
\end{bmatrix}$$
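To make the block Hankel layout concrete, the following Python sketch assembles $V$ from a list of lagged covariance blocks. The function name `block_hankel` and the nested-list matrix representation are assumptions made for illustration. Block $(i, j)$ of the result is the transpose of $C_{i+j}$, which gives $C_0$ in the upper-left corner when $C_0$ is symmetric.

```python
def block_hankel(cov, p):
    """Assemble V from covariance blocks cov[0], ..., cov[2p] (each r x r).

    Block (i, j) of V, for i, j = 0..p, is the transpose of cov[i + j].
    """
    r = len(cov[0])
    size = r * (p + 1)
    V = [[0.0] * size for _ in range(size)]
    for i in range(p + 1):
        for j in range(p + 1):
            block = cov[i + j]
            for a in range(r):
                for b in range(r):
                    V[i * r + a][j * r + b] = block[b][a]  # transposed entry
    return V


# Scalar illustration (r = 1) with made-up "covariances" 0, 1, 2, 3, 4.
for row in block_hankel([[[float(k)]] for k in range(5)], p=2):
    print(row)
```

Every anti-diagonal of the printed matrix holds a single block, which is the defining property of a Hankel matrix.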
State Vector Selection Process

The canonical correlation analysis forms a sequence of potential state vectors $z_t^j$. Examine a sequence $f_t^j$ of subvectors of $f_t$, form the submatrix $V^j$ that consists of the rows and columns of $V$ that correspond to the components of $f_t^j$, and compute its canonical correlations. The smallest canonical correlation of $V^j$ is then used in the selection of the components of the state vector. The selection process is described in the following discussion. For more details about this process, see Akaike (1976).

In the following discussion, the notation $x_{t+k|t}$ denotes the wide sense conditional expectation (best linear predictor) of $x_{t+k}$, given all $x_s$ with $s$ less than or equal to $t$. In the notation $x_{i,t+1}$, the first subscript denotes the $i$th component of $x_{t+1}$.

The initial state vector $z_t^1$ is set to $x_t$. The sequence $f_t^j$ is initialized by setting

$$f_t^1 = (z_t^{1\prime}, x_{1,t+1|t})' = (x_t', x_{1,t+1|t})'$$
1594 F Chapter 25: The STATESPACE Procedure
That is, start by considering whether to add $x_{1,t+1|t}$ to the initial state vector $z_t^1$. The procedure forms the submatrix $V^1$ that corresponds to $f_t^1$ and computes its canonical correlations. Denote the smallest canonical correlation of $V^1$ as $\rho_{min}$. If $\rho_{min}$ is significantly greater than 0, $x_{1,t+1|t}$ is added to the state vector.

If the smallest canonical correlation of $V^1$ is not significantly greater than 0, then a linear combination of $f_t^1$ is uncorrelated with the past, $p_t$. Assuming that the determinant of $C_0$ is not 0 (that is, no input series is a constant), you can take the coefficient of $x_{1,t+1|t}$ in this linear combination to be 1. Denote the coefficients of $z_t^1$ in this linear combination as $-\ell$. This gives the relationship:

$$x_{1,t+1|t} = \ell' x_t$$

Therefore, the current state vector already contains all the past information useful for predicting $x_{1,t+1}$ and any greater leads of $x_{1,t}$. The variable $x_{1,t+1|t}$ is not added to the state vector, nor are any terms $x_{1,t+k|t}$ considered as possible components of the state vector. The variable $x_1$ is no longer active for state vector selection.

The process described for $x_{1,t+1|t}$ is repeated for the remaining elements of $f_t$. The next candidate for inclusion in the state vector is the next component of $f_t$ that corresponds to an active variable. Components of $f_t$ that correspond to inactive variables that produced a zero $\rho_{min}$ in a previous step are skipped.

Denote the next candidate as $x_{l,t+k|t}$. The vector $f_t^j$ is formed from the current state vector and $x_{l,t+k|t}$ as follows:

$$f_t^j = (z_t^{j\prime}, x_{l,t+k|t})'$$

The matrix $V^j$ is formed from $f_t^j$ and its canonical correlations are computed. The smallest canonical correlation of $V^j$ is judged to be either greater than or equal to 0. If it is judged to be greater than 0, $x_{l,t+k|t}$ is added to the state vector. If it is judged to be 0, then a linear combination of $f_t^j$ is uncorrelated with the $p_t$, and the variable $x_l$ is now inactive. The state vector selection process continues until no active variables remain.
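The control flow of the selection process can be sketched as follows. This is a simplified illustration, not the procedure's code: the significance decision is delegated to a caller-supplied function `is_significant` (hypothetical), `max_lead` is an assumed parameter that limits how far ahead candidates are considered, and the decisions in the usage example are chosen by hand so that the result matches the state vector reported for Series J in Output 25.1.4.

```python
def select_state_vector(names, max_lead, is_significant):
    """Walk the candidates x_{l,t+k|t} in order of increasing lead k.

    names          -- series names, in VAR statement order
    max_lead       -- largest lead k to consider (assumed parameter)
    is_significant -- callable(state, candidate) -> bool; stands in for the
                      test on the smallest canonical correlation
    """
    state = [(name, 0) for name in names]   # initial state vector is x_t
    active = set(names)
    for lead in range(1, max_lead + 1):
        for name in names:
            if name not in active:
                continue                     # inactive variables are skipped
            candidate = (name, lead)
            if is_significant(state, candidate):
                state.append(candidate)
            else:
                active.discard(name)         # no further leads of this series
        if not active:
            break                            # stop when nothing is active
    return state


# Hand-picked decisions (an assumption) that reproduce the Series J result.
accepted = {("x", 1), ("y", 1), ("y", 2)}
result = select_state_vector(["x", "y"], 3, lambda state, cand: cand in accepted)
print(result)
```

Each pair (name, k) stands for the component written name(T+k;T) in the printed output, so the result corresponds to x(T;T), y(T;T), x(T+1;T), y(T+1;T), y(T+2;T).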
Testing Significance of Canonical Correlations

For each step in the canonical correlation sequence, the significance of the smallest canonical correlation $\rho_{min}$ is judged by an information criterion from Akaike (1976). This information criterion is

$$-n \ln(1 - \rho_{min}^2) - \lambda\,(r(p+1) - q + 1)$$

where $q$ is the dimension of $f_t^j$ at the current step, $r$ is the dimension of $x_t$ (the number of variables in the VAR statement), $p$ is the order of the vector autoregressive process, and $\lambda$ is the value of the SIGCORR= option. The default is SIGCORR=2. If this information criterion is less than or equal to 0, $\rho_{min}$ is taken to be 0; otherwise, it is taken to be significantly greater than 0. (Do not confuse this information criterion with the AIC.)

Variables in $x_{t+p|t}$ are not added in the model, even with positive information criterion, because of the singularity of $V$. You can force the consideration of more candidate state variables by increasing the size of the $V$ matrix by specifying a PASTMIN= option value larger than $p$.
Printing the Canonical Correlations

To print the details of the canonical correlation analysis process, specify the CANCORR option in the PROC STATESPACE statement. The CANCORR option prints the candidate state vectors, the canonical correlations, and the information criteria for testing the significance of the smallest canonical correlation.

Bartlett's $\chi^2$ and its degrees of freedom are also printed when the CANCORR option is specified. The formula used for Bartlett's $\chi^2$ is

$$\chi^2 = -\left(n - 0.5\,(r(p+1) - q + 1)\right)\ln(1 - \rho_{min}^2)$$

with $r(p+1) - q + 1$ degrees of freedom.
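As a numerical check of the information criterion and Bartlett's chi-square, the Python sketch below evaluates both for the first candidate state vector of the Series J example (Output 25.1.3). The function name is hypothetical. The inputs are taken from the printed output: 296 observations, two series, the minimum-AIC order of 4, and a candidate vector of dimension 3; reading $r$ as the number of series is an interpretation that is consistent with the printed 8 degrees of freedom.

```python
import math


def canonical_correlation_tests(rho_min, n, r, p, q, sigcorr=2.0):
    """Return (information criterion, Bartlett chi-square, degrees of freedom)."""
    df = r * (p + 1) - q + 1
    log_term = math.log(1.0 - rho_min ** 2)
    info_criterion = -n * log_term - sigcorr * df
    chi_square = -(n - 0.5 * df) * log_term
    return info_criterion, chi_square, df


# Series J: smallest canonical correlation 0.804883 for (x(T;T), y(T;T), x(T+1;T)).
ic, chi2, df = canonical_correlation_tests(0.804883, n=296, r=2, p=4, q=3)
print(ic, chi2, df)
```

The results agree with the printed values 292.9228 and 304.7481 to within the rounding of the six-digit canonical correlation, and the degrees of freedom come out as 8.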
Figure 25.12 shows the output of the CANCORR option for the introductory example shown in the “Getting Started: STATESPACE Procedure” on page 1570.

   proc statespace data=in out=out lead=10 cancorr;
      var x(1) y(1);
      id t;
   run;

Figure 25.12 Canonical Correlations Analysis

The STATESPACE Procedure
Canonical Correlations Analysis

x(T;T)   y(T;T)   x(T+1;T)   Information Criterion   Chi Square   DF
1        1        0.237045   3.566167                11.4505      4

New variables are added to the state vector if the information criteria are positive. In this example, $y_{t+1|t}$ and $x_{t+2|t}$ are not added to the state space vector because the information criteria for these models are negative.

If the information criterion is nearly 0, then you might want to investigate models that arise if the opposite decision is made regarding $\rho_{min}$. This investigation can be accomplished by using a FORM statement to specify part or all of the state vector.
Preliminary Estimates of F

When a candidate variable $x_{l,t+k|t}$ yields a zero $\rho_{min}$ and is not added to the state vector, a linear combination of $f_t^j$ is uncorrelated with the $p_t$. Because of the method used to construct the $f_t^j$ sequence, the coefficient of $x_{l,t+k|t}$ in this linear combination can be taken as 1. Denote the coefficients of $z_t^j$ in this linear combination as $-\mathbf{l}$. This gives the relationship:

$$x_{l,t+k|t} = \mathbf{l}' z_t^j$$

The vector $\mathbf{l}$ is used as a preliminary estimate of the first $r$ columns of the row of the transition matrix $F$ corresponding to $x_{l,t+k-1|t}$.
Parameter Estimation

The model is $z_{t+1} = F z_t + G e_{t+1}$, where $e_t$ is a sequence of independent multivariate normal innovations with mean vector 0 and variance $\Sigma_{ee}$. The observed sequence $x_t$ composes the first $r$ components of $z_t$, and thus $x_t = H z_t$, where $H$ is the $r \times s$ matrix $[I_r \;\; 0]$.

Let $E$ be the $r \times n$ matrix of innovations:

$$E = [\,e_1 \;\cdots\; e_n\,]$$

If the number of observations $n$ is reasonably large, the log likelihood $L$ can be approximated up to an additive constant as follows:

$$L = -\frac{n}{2}\ln(|\Sigma_{ee}|) - \frac{1}{2}\,\mathrm{trace}(\Sigma_{ee}^{-1} E E')$$

The elements of $\Sigma_{ee}$ are taken as free parameters and are estimated as follows:

$$S_0 = \frac{1}{n} E E'$$

Replacing $\Sigma_{ee}$ by $S_0$ in the likelihood equation, the log likelihood, up to an additive constant, is

$$L = -\frac{n}{2}\ln(|S_0|)$$
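The reason the trace term disappears after the substitution can be written out in one line. The following worked step is an added illustration; it assumes $S_0$ is nonsingular and uses only the definition $S_0 = \frac{1}{n} E E'$ and the fact that the innovations have dimension $r$.

```latex
\mathrm{trace}\left(S_0^{-1} E E'\right)
  = \mathrm{trace}\left(S_0^{-1}\, n S_0\right)
  = n\,\mathrm{trace}(I_r)
  = n r
\quad\Longrightarrow\quad
L = -\frac{n}{2}\ln(|S_0|) - \frac{n r}{2}
```

The term $-nr/2$ does not depend on the parameters in $F$ and $G$, so it is absorbed into the additive constant, and maximizing the likelihood is equivalent to minimizing $|S_0|$.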
Letting $B$ be the backshift operator, the formal relation between $x_t$ and $e_t$ is

$$x_t = H(I - BF)^{-1} G e_t$$

$$e_t = \left(H(I - BF)^{-1} G\right)^{-1} x_t = \sum_{i=0}^{\infty} \Xi_i x_{t-i}$$

Letting $C_i$ be the $i$th lagged sample covariance of $x_t$ and neglecting end effects, the matrix $S_0$ is

$$S_0 = \sum_{i,j=0}^{\infty} \Xi_i C_{-i+j} \Xi_j'$$
For the computation of $S_0$, the infinite sum is truncated at the value of the KLAG= option. The value of the KLAG= option should be large enough that the sequence $\Xi_i$ is approximately 0 beyond that point.

Let $\theta$ be the vector of free parameters in the $F$ and $G$ matrices. The derivative of the log likelihood with respect to the parameter $\theta$ is

$$\frac{\partial L}{\partial \theta} = -\frac{n}{2}\,\mathrm{trace}\left(S_0^{-1}\frac{\partial S_0}{\partial \theta}\right)$$

The second derivative is

$$\frac{\partial^2 L}{\partial \theta\,\partial \theta'} = \frac{n}{2}\left(\mathrm{trace}\left(S_0^{-1}\frac{\partial S_0}{\partial \theta'}S_0^{-1}\frac{\partial S_0}{\partial \theta}\right) - \mathrm{trace}\left(S_0^{-1}\frac{\partial^2 S_0}{\partial \theta\,\partial \theta'}\right)\right)$$

Near the maximum, the first term is unimportant and the second term can be approximated to give the following second derivative approximation:

$$\frac{\partial^2 L}{\partial \theta\,\partial \theta'} \cong -n\,\mathrm{trace}\left(S_0^{-1}\frac{\partial E}{\partial \theta}\frac{\partial E'}{\partial \theta'}\right)$$

The first derivative matrix and this second derivative matrix approximation are computed from the sample covariance matrix $C_0$ and the truncated sequence $\Xi_i$. The approximate likelihood function is maximized by a modified Newton-Raphson algorithm that employs these derivative matrices.

The matrix $S_0$ is used as the estimate of the innovation covariance matrix, $\Sigma_{ee}$. The negative of the inverse of the second derivative matrix at the maximum is used as an approximate covariance matrix for the parameter estimates. The standard errors of the parameter estimates printed in the parameter estimates tables are taken from the diagonal of this covariance matrix. The parameter covariance matrix is printed when the COVB option is specified.

If the data are nearly nonstationary, a better estimate of $\Sigma_{ee}$ and the other parameters can sometimes be obtained by specifying the RESIDEST option. The RESIDEST option estimates the parameters by using conditional least squares instead of maximum likelihood. The residuals are computed using the state space equation and the sample mean values of the variables in the model as start-up values. The estimate of $S_0$ is then computed using the residuals from the $i$th observation on, where $i$ is the maximum number of times any variable occurs in the state vector. A multivariate Gauss-Marquardt algorithm is used to minimize $|S_0|$. See Harvey (1981a) for a further description of this method.
Forecasting

Given estimates of $F$, $G$, and $\Sigma_{ee}$, forecasts of $x_t$ are computed from the conditional expectation of $z_t$.

In forecasting, the parameters $F$, $G$, and $\Sigma_{ee}$ are replaced with the estimates or by values specified in the RESTRICT statement. One-step-ahead forecasting is performed for the observation $x_t$, where $t \le n - b$. Here $n$ is the number of observations and $b$ is the value of the BACK= option. For the observation $x_t$, where $t > n - b$, $m$-step-ahead forecasting is performed for $m = t - n + b$. The forecasts are generated recursively with the initial condition $z_0 = 0$.

The $m$-step-ahead forecast of $z_{t+m}$ is $z_{t+m|t}$, where $z_{t+m|t}$ denotes the conditional expectation of $z_{t+m}$ given the information available at time $t$. The $m$-step-ahead forecast of $x_{t+m}$ is $x_{t+m|t} = H z_{t+m|t}$, where the matrix $H = [I_r \;\; 0]$.

Let $\Psi_i = F^i G$. Note that the last $s - r$ elements of $z_t$ consist of the elements of $x_{u|t}$ for $u > t$.

The state vector $z_{t+m}$ can be represented as

$$z_{t+m} = F^m z_t + \sum_{i=0}^{m-1} \Psi_i e_{t+m-i}$$

Since $e_{t+i|t} = 0$ for $i > 0$, the $m$-step-ahead forecast $z_{t+m|t}$ is

$$z_{t+m|t} = F^m z_t = F z_{t+m-1|t}$$

Therefore, the $m$-step-ahead forecast of $x_{t+m}$ is

$$x_{t+m|t} = H z_{t+m|t}$$

The $m$-step-ahead forecast error is

$$z_{t+m} - z_{t+m|t} = \sum_{i=0}^{m-1} \Psi_i e_{t+m-i}$$

The variance of the $m$-step-ahead forecast error is

$$V_{z,m} = \sum_{i=0}^{m-1} \Psi_i \Sigma_{ee} \Psi_i'$$

Letting $V_{z,0} = 0$, the variance of the $m$-step-ahead forecast error of $z_{t+m}$, $V_{z,m}$, can be computed recursively as follows:

$$V_{z,m} = V_{z,m-1} + \Psi_{m-1} \Sigma_{ee} \Psi_{m-1}'$$

The variance of the $m$-step-ahead forecast error of $x_{t+m}$ is the $r \times r$ left upper submatrix of $V_{z,m}$; that is,

$$V_{x,m} = H V_{z,m} H'$$

Unless the NOCENTER option is specified, the sample mean vector is added to the forecast. When differencing is specified, the forecasts $x_{t+m|t}$ plus the sample mean vector are integrated back to produce forecasts for the original series.
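The forecast and variance recursions can be followed step by step in a short Python sketch. Matrices are plain nested lists, and the helper names (`matmul`, `transpose`, `matadd`, `forecast`) are hypothetical; the sketch omits centering, differencing, and the BACK= logic.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def forecast(F, G, sigma_ee, z_t, r, steps):
    """Return [(x forecast, V_x)] for m = 1..steps, starting from state z_t."""
    s = len(F)
    z = [[v] for v in z_t]                  # state as a column vector
    V = [[0.0] * s for _ in range(s)]        # V_{z,0} = 0
    psi = G                                 # Psi_0 = G
    results = []
    for _ in range(steps):
        z = matmul(F, z)                     # z_{t+m|t} = F z_{t+m-1|t}
        V = matadd(V, matmul(matmul(psi, sigma_ee), transpose(psi)))
        psi = matmul(F, psi)                 # Psi_m = F Psi_{m-1}
        x_hat = [row[0] for row in z[:r]]    # H z: first r components
        V_x = [row[:r] for row in V[:r]]     # H V H': upper-left r x r block
        results.append((x_hat, V_x))
    return results


# Scalar illustration with made-up values: F = 0.5, G = 1, unit innovation variance.
for x_hat, V_x in forecast([[0.5]], [[1.0]], [[1.0]], [2.0], r=1, steps=3):
    print(x_hat, V_x)
```

In the scalar illustration the forecasts decay geometrically toward 0 while the forecast error variance accumulates one term $\Psi_{m-1}\Sigma_{ee}\Psi_{m-1}'$ per step.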
Let $y_t$ be the original series specified by the VAR statement, with some 0 values appended that correspond to the unobserved past observations. Let $B$ be the backshift operator, and let $\Delta(B)$ be the $s \times s$ matrix polynomial in the backshift operator that corresponds to the differencing specified by the VAR statement. The off-diagonal elements of $\Delta_i$ are 0. Note that $\Delta_0 = I_s$, where $I_s$ is the $s \times s$ identity matrix. Then $z_t = \Delta(B) y_t$.

This gives the relationship

$$y_t = \Delta^{-1}(B) z_t = \sum_{i=0}^{\infty} \Lambda_i z_{t-i}$$

where $\Delta^{-1}(B) = \sum_{i=0}^{\infty} \Lambda_i B^i$ and $\Lambda_0 = I_s$.

The $m$-step-ahead forecast of $y_{t+m}$ is

$$y_{t+m|t} = \sum_{i=0}^{m-1} \Lambda_i z_{t+m-i|t} + \sum_{i=m}^{\infty} \Lambda_i z_{t+m-i}$$

The $m$-step-ahead forecast error of $y_{t+m}$ is

$$\sum_{i=0}^{m-1} \Lambda_i \left(z_{t+m-i} - z_{t+m-i|t}\right) = \sum_{i=0}^{m-1} \left(\sum_{u=0}^{i} \Lambda_u \Psi_{i-u}\right) e_{t+m-i}$$

Letting $V_{y,0} = 0$, the variance of the $m$-step-ahead forecast error of $y_{t+m}$, $V_{y,m}$, is

$$V_{y,m} = \sum_{i=0}^{m-1} \left(\sum_{u=0}^{i} \Lambda_u \Psi_{i-u}\right) \Sigma_{ee} \left(\sum_{u=0}^{i} \Lambda_u \Psi_{i-u}\right)'$$

$$\phantom{V_{y,m}} = V_{y,m-1} + \left(\sum_{u=0}^{m-1} \Lambda_u \Psi_{m-1-u}\right) \Sigma_{ee} \left(\sum_{u=0}^{m-1} \Lambda_u \Psi_{m-1-u}\right)'$$
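For a single series the variance formula for the integrated forecasts reduces to sums of scalars, which makes it easy to check by hand. The Python sketch below is such a scalar illustration; the function name and the numeric values are assumptions. It uses first differencing, for which the inverse of $1 - B$ is $1 + B + B^2 + \cdots$, so every $\Lambda_i$ equals 1.

```python
def integrated_forecast_variance(psi, lam, sigma2, steps):
    """Scalar V_{y,m} for m = 1..steps, given the weights Psi_i and Lambda_i."""
    variances = []
    v = 0.0                                   # V_{y,0} = 0
    for m in range(1, steps + 1):
        i = m - 1
        # Combined weight on e_{t+m-i}: sum over u of Lambda_u * Psi_{i-u}.
        weight = sum(lam[u] * psi[i - u] for u in range(i + 1))
        v += weight * sigma2 * weight
        variances.append(v)
    return variances


psi = [0.5 ** i for i in range(3)]   # made-up impulse responses 1, 0.5, 0.25
lam = [1.0, 1.0, 1.0]                # first differencing: all Lambda_i are 1
print(integrated_forecast_variance(psi, lam, 1.0, 3))
```

Compared with the variances 1, 1.25, 1.3125 of the differenced series with the same weights, the variances of the integrated forecasts grow much faster, because each innovation affects all later levels of the original series.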
Relation of ARMA and State Space Forms

Every state space model has an ARMA representation, and conversely every ARMA model has a state space representation. This section discusses this equivalence. The following material is adapted from Akaike (1974), where there is a more complete discussion. Pham-Dinh-Tuan (1978) also contains a discussion of this material.

Suppose you are given the following ARMA model:

$$\Phi(B) x_t = \Theta(B) e_t$$

or, in more detail,

$$x_t - \Phi_1 x_{t-1} - \cdots - \Phi_p x_{t-p} = e_t + \Theta_1 e_{t-1} + \cdots + \Theta_q e_{t-q} \qquad (1)$$

where $e_t$ is a sequence of independent multivariate normal random vectors with mean 0 and variance matrix $\Sigma_{ee}$, $B$ is the backshift operator ($B x_t = x_{t-1}$), $\Phi(B)$ and $\Theta(B)$ are matrix polynomials in $B$, and $x_t$ is the observed process.

If the roots of the determinantal equation $|\Phi(B)| = 0$ are outside the unit circle in the complex plane, the model can also be written as

$$x_t = \Phi^{-1}(B)\Theta(B) e_t = \sum_{i=0}^{\infty} \Psi_i e_{t-i}$$

The $\Psi_i$ matrices are known as the impulse response matrices and can be computed from the expansion of $\Phi^{-1}(B)\Theta(B)$.

You can assume $p > q$ since, if this is not initially true, you can add more terms $\Phi_i$ that are identically 0 without changing the model.

To write this set of equations in a state space form, proceed as follows. Let $x_{t+i|t}$ be the conditional expectation of $x_{t+i}$ given $x_w$ for $w \le t$. The following relations hold:

$$x_{t+i|t} = \sum_{j=i}^{\infty} \Psi_j e_{t+i-j}$$

$$x_{t+i|t+1} = x_{t+i|t} + \Psi_{i-1} e_{t+1}$$
However, from equation (1) you can derive the following relationship:

$$x_{t+p|t} = \Phi_1 x_{t+p-1|t} + \cdots + \Phi_p x_t \qquad (2)$$

Hence, when $i = p$, you can use equation (2) to substitute for $x_{t+p|t}$ in the right-hand side of the preceding relation for $x_{t+i|t+1}$ and close the system of equations. This substitution results in the following model in the state space form $z_{t+1} = F z_t + G e_{t+1}$:

$$\begin{bmatrix} x_{t+1} \\ x_{t+2|t+1} \\ \vdots \\ x_{t+p|t+1} \end{bmatrix}
= \begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ \Phi_p & \Phi_{p-1} & \cdots & & \Phi_1 \end{bmatrix}
\begin{bmatrix} x_t \\ x_{t+1|t} \\ \vdots \\ x_{t+p-1|t} \end{bmatrix}
+ \begin{bmatrix} I \\ \Psi_1 \\ \vdots \\ \Psi_{p-1} \end{bmatrix} e_{t+1}$$

Note that the state vector $z_t$ is composed of conditional expectations of $x_t$ and the first $r$ components of $z_t$ are equal to $x_t$.

The state space form can be cast into an ARMA form by solving the system of difference equations for the first $r$ components.

When converting from an ARMA form to a state space form, you can generate a state vector larger than needed; that is, the state space model might not be a minimal representation. When going from a state space form to an ARMA form, you can have nontrivial common factors in the autoregressive and moving average operators that yield an ARMA model larger than necessary.

If the state space form used is not a minimal representation, some but not all components of $x_{t+i|t}$ might be linearly dependent. This situation corresponds to $[\Phi_p \;\; \Theta_{p-1}]$ being of less than full rank when $\Phi(B)$ and $\Theta(B)$ have no common nontrivial left factors. In this case, $z_t$ consists of a subset of the possible components of $[x_{t+i|t}]$, $i = 1, 2, \ldots, p-1$. However, once a component of $x_{t+i|t}$ (for example, the $j$th one) is linearly dependent on the previous conditional expectations, then all subsequent $j$th components of $x_{t+k|t}$ for $k > i$ must also be linearly dependent. Note that in this case, equivalent but seemingly different structures can arise if the order of the components within $x_t$ is changed.
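The ARMA-to-state-space construction can be tried out on a single series. The Python sketch below builds $F$ and $G$ for a scalar ARMA(p, q) model with $p > q$, using the sign convention of equation (1). The function names and the coefficient values are hypothetical, and the impulse responses are obtained from the standard recursion $\Psi_0 = 1$, $\Psi_i = \Theta_i + \sum_{j=1}^{\min(i,p)} \Phi_j \Psi_{i-j}$.

```python
def impulse_responses(phi, theta, count):
    """Psi_0 .. Psi_{count-1} for a scalar ARMA model written as in equation (1)."""
    psi = [1.0]
    for i in range(1, count):
        value = theta[i - 1] if i <= len(theta) else 0.0
        value += sum(phi[j - 1] * psi[i - j] for j in range(1, min(i, len(phi)) + 1))
        psi.append(value)
    return psi


def arma_to_state_space(phi, theta):
    """Return (F, G) for a scalar ARMA(p, q) model with p > q."""
    p = len(phi)
    psi = impulse_responses(phi, theta, p)
    # Rows 1..p-1 shift the state; the last row holds phi_p, ..., phi_1.
    F = [[1.0 if j == i + 1 else 0.0 for j in range(p)] for i in range(p - 1)]
    F.append([phi[p - 1 - j] for j in range(p)])
    G = [[psi[i]] for i in range(p)]          # 1, Psi_1, ..., Psi_{p-1}
    return F, G


# Made-up ARMA(2,1): x_t - 0.5 x_{t-1} - 0.2 x_{t-2} = e_t + 0.3 e_{t-1}
F, G = arma_to_state_space(phi=[0.5, 0.2], theta=[0.3])
print(F)
print(G)
```

A useful consistency check is that the first component of $F^i G$ reproduces $\Psi_i$; for the values above, the first component of $F G$ equals $\Psi_1$ and the first component of $F^2 G$ equals $\Psi_2 = 0.5\,\Psi_1 + 0.2\,\Psi_0$.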
OUT= Data Set

The forecasts are contained in the output data set specified by the OUT= option in the PROC STATESPACE statement. The OUT= data set contains the following variables:

  the BY variables

  the ID variable

  the VAR statement variables. These variables contain the actual values from the input data set.

  FORi, numeric variables that contain the forecasts. The variable FORi contains the forecasts for the ith variable in the VAR statement list. Forecasts are one-step-ahead predictions until the end of the data or until the observation specified by the BACK= option.

  RESi, numeric variables that contain the residual for the forecast of the ith variable in the VAR statement list. For forecast observations, the actual values are missing and the RESi variables contain missing values.

  STDi, numeric variables that contain the standard deviation for the forecast of the ith variable in the VAR statement list. The values of the STDi variables can be used to construct univariate confidence limits for the corresponding forecasts. However, such confidence limits do not take into account the covariance of the forecasts.
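As an illustration of how the FORi and STDi values might be combined into univariate confidence limits, the following Python sketch computes two-sided limits for one forecast. It assumes normally distributed forecast errors (in line with the normal innovations of the model); the function name, the 95% default level, and the sample numbers are hypothetical.

```python
from statistics import NormalDist


def forecast_limits(forecast, std, level=0.95):
    """Two-sided univariate confidence limits for a single forecast value."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95% limits
    return forecast - z * std, forecast + z * std


# Hypothetical FOR1 and STD1 values from one OUT= observation.
lower, upper = forecast_limits(10.0, 2.0)
print(lower, upper)
```

Because each interval is built from one variable's standard deviation only, a set of such intervals does not form a joint confidence region for several series.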
OUTAR= Data Set

The OUTAR= data set contains the estimates of the preliminary autoregressive models. The OUTAR= data set contains the following variables:

  ORDER, a numeric variable that contains the order p of the autoregressive model that the observation represents

  AIC, a numeric variable that contains the value of the information criterion $AIC_p$

  SIGFl, numeric variables that contain the estimate of the innovation covariance matrices for the forward autoregressive models. The variable SIGFl contains the lth column of $\hat{\Sigma}_p$ in the observations with ORDER=p.

  SIGBl, numeric variables that contain the estimate of the innovation covariance matrices for the backward autoregressive models. The variable SIGBl contains the lth column of $\hat{\Omega}_p$ in the observations with ORDER=p.

  FORk_l, numeric variables that contain the estimates of the autoregressive parameter matrices for the forward models. The variable FORk_l contains the lth column of the lag k autoregressive parameter matrix $\hat{\Phi}_k^p$ in the observations with ORDER=p.

  BACk_l, numeric variables that contain the estimates of the autoregressive parameter matrices for the backward models. The variable BACk_l contains the lth column of the lag k autoregressive parameter matrix $\hat{\Psi}_k^p$ in the observations with ORDER=p.

The estimates for the order p autoregressive model can be selected as those observations with ORDER=p. Within these observations, the k,lth element of $\Phi_i^p$ is given by the value of the FORi_l variable in the kth observation. The k,lth element of $\Psi_i^p$ is given by the value of the BACi_l variable in the kth observation. The k,lth element of $\Sigma_p$ is given by SIGFl in the kth observation. The k,lth element of $\Omega_p$ is given by SIGBl in the kth observation.

Table 25.2 shows an example of the OUTAR= data set, with ARMAX=3 and $x_t$ of dimension 2. In Table 25.2, (i,j) indicates the i,jth element of the matrix. In the table entries, Φ_k^p denotes the lag k matrix of the order p forward model, and Ψ_k^p denotes the lag k matrix of the order p backward model.

Table 25.2 Values in the OUTAR= Data Set

Obs  ORDER  AIC    SIGF1      SIGF2      SIGB1      SIGB2
1    0      AIC_0  Σ_0(1,1)   Σ_0(1,2)   Ω_0(1,1)   Ω_0(1,2)
2    0      AIC_0  Σ_0(2,1)   Σ_0(2,2)   Ω_0(2,1)   Ω_0(2,2)
3    1      AIC_1  Σ_1(1,1)   Σ_1(1,2)   Ω_1(1,1)   Ω_1(1,2)
4    1      AIC_1  Σ_1(2,1)   Σ_1(2,2)   Ω_1(2,1)   Ω_1(2,2)
5    2      AIC_2  Σ_2(1,1)   Σ_2(1,2)   Ω_2(1,1)   Ω_2(1,2)
6    2      AIC_2  Σ_2(2,1)   Σ_2(2,2)   Ω_2(2,1)   Ω_2(2,2)
7    3      AIC_3  Σ_3(1,1)   Σ_3(1,2)   Ω_3(1,1)   Ω_3(1,2)
8    3      AIC_3  Σ_3(2,1)   Σ_3(2,2)   Ω_3(2,1)   Ω_3(2,2)

Obs  FOR1_1       FOR1_2       FOR2_1       FOR2_2       FOR3_1       FOR3_2
1    .            .            .            .            .            .
2    .            .            .            .            .            .
3    Φ_1^1(1,1)   Φ_1^1(1,2)   .            .            .            .
4    Φ_1^1(2,1)   Φ_1^1(2,2)   .            .            .            .
5    Φ_1^2(1,1)   Φ_1^2(1,2)   Φ_2^2(1,1)   Φ_2^2(1,2)   .            .
6    Φ_1^2(2,1)   Φ_1^2(2,2)   Φ_2^2(2,1)   Φ_2^2(2,2)   .            .
7    Φ_1^3(1,1)   Φ_1^3(1,2)   Φ_2^3(1,1)   Φ_2^3(1,2)   Φ_3^3(1,1)   Φ_3^3(1,2)
8    Φ_1^3(2,1)   Φ_1^3(2,2)   Φ_2^3(2,1)   Φ_2^3(2,2)   Φ_3^3(2,1)   Φ_3^3(2,2)

Obs  BACK1_1      BACK1_2      BACK2_1      BACK2_2      BACK3_1      BACK3_2
1    .            .            .            .            .            .
2    .            .            .            .            .            .
3    Ψ_1^1(1,1)   Ψ_1^1(1,2)   .            .            .            .
4    Ψ_1^1(2,1)   Ψ_1^1(2,2)   .            .            .            .
5    Ψ_1^2(1,1)   Ψ_1^2(1,2)   Ψ_2^2(1,1)   Ψ_2^2(1,2)   .            .
6    Ψ_1^2(2,1)   Ψ_1^2(2,2)   Ψ_2^2(2,1)   Ψ_2^2(2,2)   .            .
7    Ψ_1^3(1,1)   Ψ_1^3(1,2)   Ψ_2^3(1,1)   Ψ_2^3(1,2)   Ψ_3^3(1,1)   Ψ_3^3(1,2)
8    Ψ_1^3(2,1)   Ψ_1^3(2,2)   Ψ_2^3(2,1)   Ψ_2^3(2,2)   Ψ_3^3(2,1)   Ψ_3^3(2,2)
The estimated autoregressive parameters can be used in the IML procedure to obtain autoregressive estimates of the spectral density function or forecasts based on the autoregressive models.
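Outside of SAS, the row and column layout of the OUTAR= data set can be unpacked with a few lines of code once the observations are available as records. The Python sketch below assumes each observation has been read into a dictionary keyed by variable name; the function name `ar_matrix` and the numeric values are hypothetical. It follows the layout rule that the k,lth element of the lag i matrix is the value of FORi_l in the kth observation with ORDER=p.

```python
def ar_matrix(rows, order, lag, prefix="FOR"):
    """Rebuild the lag-`lag` coefficient matrix of the order-`order` model.

    rows   -- observations of the OUTAR= data set as dictionaries
    prefix -- "FOR" for forward models; use the backward prefix analogously
    """
    block = [row for row in rows if row["ORDER"] == order]
    r = len(block)                       # one observation per series
    return [[block[k][f"{prefix}{lag}_{l + 1}"] for l in range(r)]
            for k in range(r)]


# Made-up observations for a two-series data set (orders 0 and 1 only).
rows = [
    {"ORDER": 0, "FOR1_1": None, "FOR1_2": None},
    {"ORDER": 0, "FOR1_1": None, "FOR1_2": None},
    {"ORDER": 1, "FOR1_1": 0.9, "FOR1_2": 0.1},
    {"ORDER": 1, "FOR1_1": -0.2, "FOR1_2": 0.5},
]
print(ar_matrix(rows, order=1, lag=1))
```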
OUTMODEL= Data Set

The OUTMODEL= data set contains the estimates of the F and G matrices and their standard errors, the names of the components of the state vector, and the estimates of the innovation covariance matrix. The variables contained in the OUTMODEL= data set are as follows:

  the BY variables

  STATEVEC, a character variable that contains the name of the component of the state vector corresponding to the observation. The STATEVEC variable has the value STD for standard deviations observations, which contain the standard errors for the estimates given in the preceding observation.

  F_j, numeric variables that contain the columns of the F matrix. The variable F_j contains the jth column of F. The number of F_j variables is equal to the value of the DIMMAX= option. If the model is of smaller dimension, the extraneous variables are set to missing.

  G_j, numeric variables that contain the columns of the G matrix. The variable G_j contains the jth column of G. The number of G_j variables is equal to r, the dimension of $x_t$ given by the number of variables in the VAR statement.

  SIG_j, numeric variables that contain the columns of the innovation covariance matrix. The variable SIG_j contains the jth column of $\Sigma_{ee}$. There are r variables SIG_j.

Table 25.3 shows an example of the OUTMODEL= data set, with $x_t = (x_t, y_t)'$, $z_t = (x_t, y_t, x_{t+1|t})'$, and DIMMAX=4. In Table 25.3, $F_{i,j}$ and $G_{i,j}$ are the i,jth elements of F and G respectively. Note that all elements for F_4 are missing because F is a 3 × 3 matrix.

Table 25.3 Values in the OUTMODEL= Data Set

Obs  STATEVEC   F_1        F_2        F_3        F_4  G_1        G_2        SIG_1   SIG_2
1    X(T;T)     0          0          1          .    1          0          Σ_1,1   Σ_1,2
2    STD        .          .          .          .    .          .          .       .
3    Y(T;T)     F_2,1      F_2,2      F_2,3      .    0          1          Σ_2,1   Σ_2,2
4    STD        std F_2,1  std F_2,2  std F_2,3  .    .          .          .       .
5    X(T+1;T)   F_3,1      F_3,2      F_3,3      .    G_3,1      G_3,2      .       .
6    STD        std F_3,1  std F_3,2  std F_3,3  .    std G_3,1  std G_3,2  .       .
Printed Output

The printed output produced by the STATESPACE procedure includes the following:

1. descriptive statistics, which include the number of observations used, the names of the variables, their means and standard deviations (Std), and the differencing operations used

2. the Akaike information criteria for the sequence of preliminary autoregressive models

3. if the PRINTOUT=LONG option is specified, the sample autocovariance matrices of the input series at various lags

4. if the PRINTOUT=LONG option is specified, the sample autocorrelation matrices of the input series

5. a schematic representation of the autocorrelation matrices, showing the significant autocorrelations

6. if the PRINTOUT=LONG option is specified, the partial autoregressive matrices. (These are $\Phi_p^p$ as described in the section “Preliminary Autoregressive Models” on page 1590.)

7. a schematic representation of the partial autocorrelation matrices, showing the significant partial autocorrelations

8. the Yule-Walker estimates of the autoregressive parameters for the autoregressive model with the minimum AIC

9. if the PRINTOUT=LONG option is specified, the autocovariance matrices of the residuals of the minimum AIC model. This is the sequence of estimated innovation variance matrices for the solutions of the Yule-Walker equations.

10. if the PRINTOUT=LONG option is specified, the autocorrelation matrices of the residuals of the minimum AIC model

11. if the CANCORR option is specified, the canonical correlations analysis for each potential state vector considered in the state vector selection process. This includes the potential state vector, the canonical correlations, the information criterion for the smallest canonical correlation, Bartlett's $\chi^2$ statistic (“Chi Square”) for the smallest canonical correlation, and the degrees of freedom of Bartlett's $\chi^2$.

12. the components of the chosen state vector

13. the preliminary estimate of the transition matrix, F, the input matrix, G, and the variance matrix for the innovations, $\Sigma_{ee}$

14. if the ITPRINT option is specified, the iteration history of the likelihood maximization. For each iteration, this shows the iteration number, the number of step halvings, the determinant of the innovation variance matrix, the damping factor Lambda, and the values of the parameters.

15. the state vector, printed again to aid interpretation of the following listing of F and G

16. the final estimate of the transition matrix F

17. the final estimate of the input matrix G

18. the final estimate of the variance matrix for the innovations $\Sigma_{ee}$

19. a table that lists the estimates of the free parameters in F and G and their standard errors and t statistics

20. if the COVB option is specified, the covariance matrix of the parameter estimates

21. if the COVB option is specified, the correlation matrix of the parameter estimates

22. if the PRINT option is specified, the forecasts and their standard errors
ODS Table Names

PROC STATESPACE assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table.

Table 25.4 ODS Tables Produced in PROC STATESPACE

ODS Table Name       Description                                Option
NObs                 number of observations                     default
Summary              simple summary statistics table            default
InfoCriterion        information criterion table                default
CovLags              covariance matrices of input series        PRINTOUT=LONG
CorrLags             correlation matrices of input series       PRINTOUT=LONG
PartialAR            partial autoregressive matrices            PRINTOUT=LONG
YWEstimates          Yule-Walker estimates for minimum AIC      default
CovResiduals         covariance of residuals                    PRINTOUT=LONG
CorrResiduals        residual correlations from AR models       PRINTOUT=LONG
StateVector          state vector table                         default
CorrGraph            schematic representation of correlations   default
TransitionMatrix     transition matrix                          default
InputMatrix          input matrix                               default
VarInnov             variance matrix for the innovation         default
CovB                 covariance of parameter estimates          COVB
CorrB                correlation of parameter estimates         COVB
CanCorr              canonical correlation analysis             CANCORR
IterHistory          iterative fitting table                    ITPRINT
ParameterEstimates   parameter estimates table                  default
Forecasts            forecasts table                            PRINT
ConvergenceStatus    convergence status table                   default
Examples: STATESPACE Procedure

Example 25.1: Series J from Box and Jenkins

This example analyzes the gas furnace data (series J) from Box and Jenkins. (The data are not shown; see Box and Jenkins (1976) for the data.)

First, a model is selected and fit automatically using the following statements.

   title1 'Gas Furnace Data';
   title2 'Box & Jenkins Series J';
   title3 'Automatically Selected Model';

   proc statespace data=seriesj cancorr;
      var x y;
   run;
The results for the automatically selected model are shown in Output 25.1.1.

Output 25.1.1 Results for Automatically Selected Model

Gas Furnace Data
Box & Jenkins Series J
Automatically Selected Model

The STATESPACE Procedure

Number of Observations   296

Variable   Mean       Standard Error
x          -0.05683   1.072766
y          53.50912   3.202121

Information Criterion for Autoregressive Models

Lag=0      Lag=1      Lag=2      Lag=3      Lag=4      Lag=5
651.3862   -1033.57   -1632.96   -1645.12   -1651.52   -1648.91

Lag=6      Lag=7      Lag=8      Lag=9      Lag=10
-1649.34   -1643.15   -1638.56   -1634.8    -1633.59
Output 25.1.1 continued

Schematic Representation of Correlations

Name/Lag   0     1     2     3     4     5     6     7     8     9     10
x y        +-+   +-+   +-+   +-+   +-+   +-+   +-+   +-+   +-+   +-+   +-+

+ is > 2*std error,  - is < -2*std error,  . is between
Output 25.1.2 Results for Automatically Selected Model

Schematic Representation of Partial Autocorrelations

Name/Lag   1    2    3    4    5    6    7    8    9    10
x          +.   -.   +.   ..   ..   -.   ..   ..   ..   ..
y          -+   --   -.   .+   ..   ..   ..   ..   ..   .+

+ is > 2*std error,  - is < -2*std error,  . is between
Yule-Walker Estimates for Minimum AIC

     ------Lag=1------    ------Lag=2------    ------Lag=3------    ------Lag=4------
     x          y         x          y         x          y         x          y
x    1.925887   -0.00124  -1.20166   0.004224  0.116918   -0.00867  0.104236   0.003268
y    0.050496   1.299793  -0.02046   -0.3277   -0.71182   -0.25701  0.195411   0.133417
Output 25.1.3 Results for Automatically Selected Model

Canonical Correlations Analysis

x(T;T)   y(T;T)   x(T+1;T)   Information Criterion   Chi Square   DF
1        1        0.804883   292.9228                304.7481     8
Output 25.1.4 Results for Automatically Selected Model

Selected Statespace Form and Preliminary Estimates

State Vector

x(T;T)   y(T;T)   x(T+1;T)   y(T+1;T)   y(T+2;T)

Estimate of Transition Matrix

 0           0           1           0           0
 0           0           0           1           0
-0.84718     0.026794    1.711715   -0.05019     0
 0           0           0           0           1
-0.19785     0.334274   -0.18174    -1.23557     1.787475

Input Matrix for Innovation

 1           0
 0           1
 1.925887   -0.00124
 0.050496    1.299793
 0.142421    1.361696

Output 25.1.5 Results for Automatically Selected Model

Variance Matrix for Innovation

 0.035274   -0.00734
-0.00734     0.097569
Output 25.1.6 Results for Automatically Selected Model

Selected Statespace Form and Fitted Model

State Vector

x(T;T)   y(T;T)   x(T+1;T)   y(T+1;T)   y(T+2;T)

Estimate of Transition Matrix

 0           0           1           0           0
 0           0           0           1           0
-0.86192     0.030609    1.724235   -0.05483     0
 0           0           0           0           1
-0.34839     0.292124   -0.09435    -1.09823     1.671418

Input Matrix for Innovation

 1           0
 0           1
 1.92442    -0.00416
 0.015621    1.258495
 0.08058     1.353204

Output 25.1.7 Results for Automatically Selected Model

Variance Matrix for Innovation

 0.035579   -0.00728
-0.00728     0.095577
Parameter Estimates

Parameter   Estimate    Standard Error   t Value
F(3,1)      -0.86192    0.072961         -11.81
F(3,2)       0.030609   0.026167           1.17
F(3,3)       1.724235   0.061599          27.99
F(3,4)      -0.05483    0.030169          -1.82
F(5,1)      -0.34839    0.135253          -2.58
F(5,2)       0.292124   0.046299           6.31
F(5,3)      -0.09435    0.096527          -0.98
F(5,4)      -1.09823    0.109525         -10.03
F(5,5)       1.671418   0.083737          19.96
G(3,1)       1.924420   0.058162          33.09
G(3,2)      -0.00416    0.035255          -0.12
G(4,1)       0.015621   0.095771           0.16
G(4,2)       1.258495   0.055742          22.58
G(5,1)       0.080580   0.151622           0.53
G(5,2)       1.353204   0.091388          14.81
The two series are believed to have a transfer function relation with the gas rate (variable X) as the input and the CO2 concentration (variable Y) as the output. Since the parameter estimates shown in Output 25.1.1 support this kind of model, the model is reestimated with the feedback parameters restricted to 0. The following statements fit the transfer function (no feedback) model.

   title3 'Transfer Function Model';

   proc statespace data=seriesj printout=none;
      var x y;
      restrict f(3,2)=0 f(3,4)=0
               g(3,2)=0 g(4,1)=0 g(5,1)=0;
   run;
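The t values in the parameter estimates table are the estimates divided by their standard errors, and they are what suggests which parameters could be restricted to 0. The Python sketch below recomputes them from the printed estimates of the automatically selected model and lists the parameters whose absolute t value is below 2. The cutoff of 2 is a rule of thumb assumed here, not something the procedure applies.

```python
# (estimate, standard error) pairs copied from the Parameter Estimates table
# of the automatically selected model.
estimates = {
    "F(3,1)": (-0.86192, 0.072961), "F(3,2)": (0.030609, 0.026167),
    "F(3,3)": (1.724235, 0.061599), "F(3,4)": (-0.05483, 0.030169),
    "F(5,1)": (-0.34839, 0.135253), "F(5,2)": (0.292124, 0.046299),
    "F(5,3)": (-0.09435, 0.096527), "F(5,4)": (-1.09823, 0.109525),
    "F(5,5)": (1.671418, 0.083737), "G(3,1)": (1.924420, 0.058162),
    "G(3,2)": (-0.00416, 0.035255), "G(4,1)": (0.015621, 0.095771),
    "G(4,2)": (1.258495, 0.055742), "G(5,1)": (0.080580, 0.151622),
    "G(5,2)": (1.353204, 0.091388),
}

t_values = {name: est / se for name, (est, se) in estimates.items()}

# Parameters with |t| < 2 (assumed cutoff) are candidates for restriction.
weak = [name for name, t in t_values.items() if abs(t) < 2.0]
print(weak)
```

The list contains the five parameters named in the RESTRICT statement, together with F(5,3), which the example leaves unrestricted.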
The last two pages of the output are shown in Output 25.1.8.

Output 25.1.8 STATESPACE Output for Transfer Function Model

Gas Furnace Data
Box & Jenkins Series J
Transfer Function Model

The STATESPACE Procedure

Selected Statespace Form and Fitted Model

State Vector

x(T;T)   y(T;T)   x(T+1;T)   y(T+1;T)   y(T+2;T)

Estimate of Transition Matrix

 0           0           1           0           0
 0           0           0           1           0
-0.68882     0           1.598717    0           0
 0           0           0           0           1
-0.35944     0.284179   -0.0963     -1.07313     1.650047

Input Matrix for Innovation

 1           0
 0           1
 1.923446    0
 0           1.260856
 0           1.346332

Output 25.1.9 STATESPACE Output for Transfer Function Model

Variance Matrix for Innovation

 0.036995   -0.0072
-0.0072      0.095712

Parameter Estimates

Parameter   Estimate    Standard Error   t Value
F(3,1)      -0.68882    0.050549         -13.63
F(3,3)       1.598717   0.050924          31.39
F(5,1)      -0.35944    0.229044          -1.57
F(5,2)       0.284179   0.096944           2.93
F(5,3)      -0.09630    0.140876          -0.68
F(5,4)      -1.07313    0.250385          -4.29
F(5,5)       1.650047   0.188533           8.75
G(3,1)       1.923446   0.056328          34.15
G(4,2)       1.260856   0.056464          22.33
G(5,2)       1.346332   0.091086          14.78
References

Akaike, H. (1974), “Markovian Representation of Stochastic Processes and Its Application to the Analysis of Autoregressive Moving Average Processes,” Annals of the Institute of Statistical Mathematics, 26, 363–387.

Akaike, H. (1976), “Canonical Correlations Analysis of Time Series and the Use of an Information Criterion,” in Advances and Case Studies in System Identification, eds. R. Mehra and D.G. Lainiotis, New York: Academic Press.

Anderson, T.W. (1971), The Statistical Analysis of Time Series, New York: John Wiley & Sons.

Ansley, C.F. and Newbold, P. (1979), “Multivariate Partial Autocorrelations,” Proceedings of the Business and Economic Statistics Section, American Statistical Association, 349–353.

Box, G.E.P. and Jenkins, G. (1976), Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.

Brockwell, P.J. and Davis, R.A. (1991), Time Series: Theory and Methods, 2nd Edition, Springer-Verlag.

Hannan, E.J. (1970), Multiple Time Series, New York: John Wiley & Sons.

Hannan, E.J. (1976), “The Identification and Parameterization of ARMAX and State Space Forms,” Econometrica, 44, 713–722.

Harvey, A.C. (1981a), The Econometric Analysis of Time Series, New York: John Wiley & Sons.

Harvey, A.C. (1981b), Time Series Models, New York: John Wiley & Sons.

Jones, R.H. (1974), “Identification and Autoregressive Spectrum Estimation,” IEEE Transactions on Automatic Control, AC-19, 894–897.

Pham-Dinh-Tuan (1978), “On the Fitting of Multivariate Processes of the Autoregressive Moving Average Type,” Biometrika, 65, 99–107.

Priestley, M.B. (1980), “System Identification, Kalman Filtering, and Stochastic Control,” in Directions in Time Series, eds. D.R. Brillinger and G.C. Tiao, Institute of Mathematical Statistics.

Whittle, P. (1963), “On the Fitting of Multivariate Autoregressions and the Approximate Canonical Factorization of a Spectral Density Matrix,” Biometrika, 50, 129–134.
Chapter 26
The SYSLIN Procedure

Contents

Overview: SYSLIN Procedure
Getting Started: SYSLIN Procedure
    An Example Model
    Variables in a System of Equations
    Using PROC SYSLIN
    OLS Estimation
    Two-Stage Least Squares Estimation
    LIML, K-Class, and MELO Estimation
    SUR, 3SLS, and FIML Estimation
    Computing Reduced Form Estimates
    Restricting Parameter Estimates
    Testing Parameters
    Saving Residuals and Predicted Values
    Plotting Residuals
Syntax: SYSLIN Procedure
    Functional Summary
    PROC SYSLIN Statement
    BY Statement
    ENDOGENOUS Statement
    IDENTITY Statement
    INSTRUMENTS Statement
    MODEL Statement
    OUTPUT Statement
    RESTRICT Statement
    SRESTRICT Statement
    STEST Statement
    TEST Statement
    VAR Statement
    WEIGHT Statement
Details: SYSLIN Procedure
    Input Data Set
    Estimation Methods
    ANOVA Table for Instrumental Variables Methods
    The R-Square Statistics
    Computational Details
    Missing Values
    OUT= Data Set
    OUTEST= Data Set
    OUTSSCP= Data Set
    Printed Output
    ODS Table Names
    ODS Graphics
Examples: SYSLIN Procedure
    Example 26.1: Klein's Model I Estimated with LIML and 3SLS
    Example 26.2: Grunfeld's Model Estimated with SUR
    Example 26.3: Illustration of ODS Graphics
References
Overview: SYSLIN Procedure

The SYSLIN procedure estimates parameters in an interdependent system of linear regression equations. Ordinary least squares (OLS) estimates are biased and inconsistent when current period endogenous variables appear as regressors in other equations in the system. The errors of a set of related regression equations are often correlated, and the efficiency of the estimates can be improved by taking these correlations into account. The SYSLIN procedure provides several techniques that produce consistent and asymptotically efficient estimates for systems of regression equations.

The SYSLIN procedure provides the following estimation methods:

    ordinary least squares (OLS)
    two-stage least squares (2SLS)
    limited information maximum likelihood (LIML)
    K-class
    seemingly unrelated regressions (SUR)
    iterated seemingly unrelated regressions (ITSUR)
    three-stage least squares (3SLS)
    iterated three-stage least squares (IT3SLS)
    full information maximum likelihood (FIML)
    minimum expected loss (MELO)

Other features of the SYSLIN procedure enable you to:

    impose linear restrictions on the parameter estimates
    test linear hypotheses about the parameters
    write predicted and residual values to an output SAS data set
    write parameter estimates to an output SAS data set
    write the crossproducts matrix (SSCP) to an output SAS data set
    use raw data, correlations, covariances, or cross products as input
Getting Started: SYSLIN Procedure

This section introduces the use of the SYSLIN procedure. The problem of dependent regressors is introduced using a supply and demand example. This section explains the terminology used for variables in a system of regression equations and introduces the SYSLIN procedure statements for declaring the roles the variables play. The syntax used for the different estimation methods and the output produced is shown.
An Example Model

In simultaneous systems of equations, endogenous variables are determined jointly rather than sequentially. Consider the following supply and demand functions for some product:

\[
\begin{aligned}
Q_D &= a_1 + b_1 P + c_1 Y + d_1 S + \epsilon_1 && \text{(demand)} \\
Q_S &= a_2 + b_2 P + c_2 U + \epsilon_2 && \text{(supply)} \\
Q &= Q_D = Q_S && \text{(market equilibrium)}
\end{aligned}
\]

The variables in this system are as follows:

    $Q_D$          quantity demanded
    $Q_S$          quantity supplied
    $Q$            the observed quantity sold, which equates quantity supplied and quantity demanded in equilibrium
    $P$            price per unit
    $Y$            income
    $S$            price of substitutes
    $U$            unit cost
    $\epsilon_1$   the random error term for the demand equation
    $\epsilon_2$   the random error term for the supply equation
In this system, quantity demanded depends on price, income, and the price of substitutes. Consumers normally purchase more of a product when prices are lower and when income and the price of substitute goods are higher. Quantity supplied depends on price and the unit cost of production. Producers supply more when price is high and when unit cost is low. The actual price and quantity sold are determined jointly by the values that equate demand and supply. Since price and quantity are jointly endogenous variables, both structural equations are necessary to adequately describe the observed values. A critical assumption of OLS is that the regressors are uncorrelated with the residual. When current endogenous variables appear as regressors in other equations (endogenous variables depend on each other), this assumption is violated and the OLS parameter estimates are biased and inconsistent. The bias caused by the violated assumptions is called simultaneous equation bias. Neither the demand nor supply equation can be estimated consistently by OLS.
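The source of the correlation between price and the errors can be made explicit by solving the system for price. The following is a short sketch, under the added assumption that $b_1 \neq b_2$ so that the equilibrium price is unique. Setting $Q_D = Q_S$ and collecting terms in $P$ gives

```latex
\[
a_1 + b_1 P + c_1 Y + d_1 S + \epsilon_1 = a_2 + b_2 P + c_2 U + \epsilon_2
\quad\Longrightarrow\quad
P = \frac{(a_2 - a_1) + c_2 U - c_1 Y - d_1 S + (\epsilon_2 - \epsilon_1)}{b_1 - b_2}
\]
```

Because the equilibrium price contains both $\epsilon_1$ and $\epsilon_2$, it is in general correlated with the error term of each structural equation, which is exactly the condition under which OLS is inconsistent.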
Variables in a System of Equations

Before explaining how to use the SYSLIN procedure, it is useful to define some terms. The variables in a system of equations can be classified as follows:

    Endogenous variables, which are also called jointly dependent or response variables, are the variables determined by the system. Endogenous variables can also appear on the right-hand side of equations.

    Exogenous variables are independent variables that do not depend on any of the endogenous variables in the system.

    Predetermined variables include both the exogenous variables and lagged endogenous variables, which are past values of endogenous variables determined at previous time periods. PROC SYSLIN does not compute lagged values; any lagged endogenous variables must be computed in a preceding DATA step.

    Instrumental variables are predetermined variables used in obtaining predicted values for the current period endogenous variables by a first-stage regression. The use of instrumental variables characterizes estimation methods such as two-stage least squares and three-stage least squares. Instrumental variables estimation methods substitute these first-stage predicted values for endogenous variables when they appear as regressors in model equations.
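Because PROC SYSLIN does not compute lags, a lagged endogenous variable has to be created beforehand. The following is a minimal sketch of such a DATA step, assuming the input data set is the WORK.IN data set used by the examples in this chapter and that its observations are already in time order. The names IN_LAG and Q_LAG1 are hypothetical and chosen only for illustration.

```sas
/* Sketch: create a one-period lag of the endogenous variable q.     */
/* IN_LAG and Q_LAG1 are hypothetical names used for illustration.   */
/* Assumes the observations in WORK.IN are sorted in time order.     */
data in_lag;
   set in;
   q_lag1 = lag(q);   /* missing for the first observation */
run;
```

The new variable is predetermined in the sense defined above, so it could then be treated like any other predetermined variable in the statements that follow.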
Using PROC SYSLIN

First specify the input data set and estimation method in the PROC SYSLIN statement. If any model uses dependent regressors, and you are using an instrumental variables regression method, declare the dependent regressors with an ENDOGENOUS statement and declare the instruments with an INSTRUMENTS statement. Next, use MODEL statements to specify the structural equations of the system.

The use of different estimation methods is shown by the following examples. These examples use the simulated data set WORK.IN, which is created by the following statements:

   data in;
      label q = "Quantity"
            p = "Price"
            s = "Price of Substitutes"
            y = "Income"
            u = "Unit Cost";
      drop i e1 e2;
      p = 0; q = 0;
      do i = 1 to 60;
         y = 1 + .05*i + .15*rannor(123);
         u = 2 + .05*rannor(123) + .05*rannor(123);
         s = 4 - .001*(i-10)*(i-110) + .5*rannor(123);
         e1 = .15 * rannor(123);
         e2 = .15 * rannor(123);
         demandx = 1 + .3 * y + .35 * s + e1;
         supplyx = -1 - 1 * u + e2 - .4*e1;
         q = 1.4/2.15 * demandx + .75/2.15 * supplyx;
         p = ( - q + supplyx ) / -1.4;
         output;
      end;
   run;
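It can help to know which structural equations this DATA step simulates, so that the estimates in the following sections have something to be compared against. The derivation below is a sketch obtained only by rearranging the two assignment statements for q and p; the symbols demandx, supplyx, e1, and e2 are the variables of the DATA step.

```latex
\[
\begin{aligned}
p &= \frac{q - \mathtt{supplyx}}{1.4}
   \;\Longrightarrow\; q = \mathtt{supplyx} + 1.4\,p
   = -1 - u + 1.4\,p + (e_2 - 0.4\,e_1) && \text{(supply)} \\
q &= \frac{1.4\,\mathtt{demandx} + 0.75\,\mathtt{supplyx}}{2.15},
   \quad \mathtt{supplyx} = q - 1.4\,p
   \;\Longrightarrow\; q = \mathtt{demandx} - 0.75\,p
   = 1 + 0.3\,y + 0.35\,s - 0.75\,p + e_1 && \text{(demand)}
\end{aligned}
\]
```

Under this reading, the simulated demand curve has a price coefficient of -0.75 and the supply curve has a price coefficient of 1.4, and the two error terms, $e_1$ and $e_2 - 0.4\,e_1$, are correlated by construction.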
OLS Estimation

PROC SYSLIN performs OLS regression if you do not specify a method of estimation in the PROC SYSLIN statement. OLS does not use instruments, so the ENDOGENOUS and INSTRUMENTS statements can be omitted. The following statements estimate the supply and demand model shown previously:

   proc syslin data=in;
      demand: model q = p y s;
      supply: model q = p u;
   run;
The PROC SYSLIN output for the demand equation is shown in Figure 26.1, and the output for the supply equation is shown in Figure 26.2.

Figure 26.1 OLS Results for Demand Equation

The SYSLIN Procedure
Ordinary Least Squares Estimation

Model                 DEMAND
Dependent Variable    q
Label                 Quantity

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3         9.587901      3.195967    398.31   <.0001
Error             56         0.449338      0.008024
Corrected Total   59         10.03724

Root MSE          0.08958    R-Square   0.95523
Dependent Mean    1.30095    Adj R-Sq   0.95283
Coeff Var         6.88542

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -0.47677         0.210239     -2.27     0.0272   Intercept
p            1             0.123326         0.105177      1.17     0.2459   Price
y            1             0.201282         0.032403      6.21     <.0001   Income
s            1             0.167258         0.024091      6.94     <.0001   Price of Substitutes
Figure 26.2 OLS Results for Supply Equation

The SYSLIN Procedure
Ordinary Least Squares Estimation

Model                 SUPPLY
Dependent Variable    q
Label                 Quantity

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2         9.033902      4.516951    256.61   <.0001
Error             57         1.003337      0.017602
Corrected Total   59         10.03724

Root MSE          0.13267    R-Square   0.90004
Dependent Mean    1.30095    Adj R-Sq   0.89653
Coeff Var        10.19821

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -0.30389         0.471397     -0.64     0.5217   Intercept
p            1             1.218743         0.053914     22.61     <.0001   Price
u            1             -1.07757         0.234150     -4.60     <.0001   Unit Cost
For each MODEL statement, the output first shows the model label and dependent variable name and label. This is followed by an analysis-of-variance table for the model, which shows the model, error, and total mean squares, and an F test for the no-regression hypothesis. Next, the procedure prints the root mean squared error, dependent variable mean and coefficient of variation, and the $R^2$ and adjusted $R^2$ statistics. Finally, the table of parameter estimates shows the estimated regression coefficients, standard errors, and t tests.

You would expect the price coefficient in a demand equation to be negative. However, the OLS estimate of the coefficient of price (P) in the demand equation (0.1233) is positive. This could be caused by simultaneous equation bias.
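As a worked check of how these statistics relate to one another, the values printed in Figure 26.1 for the demand equation can be reproduced from the sums of squares and the estimates. The formulas below are the usual OLS definitions, stated here as a reading aid rather than as the procedure's documented computations.

```latex
\[
\begin{aligned}
F &= \frac{\text{MS}_{\text{model}}}{\text{MS}_{\text{error}}} = \frac{3.195967}{0.008024} \approx 398.3 \\
\text{Root MSE} &= \sqrt{0.008024} \approx 0.0896 \\
\text{Coeff Var} &= 100 \times \frac{0.08958}{1.30095} \approx 6.89 \\
R^2 &= 1 - \frac{\text{SS}_{\text{error}}}{\text{SS}_{\text{total}}} = 1 - \frac{0.449338}{10.03724} \approx 0.9552 \\
t_{p} &= \frac{0.123326}{0.105177} \approx 1.17
\end{aligned}
\]
```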
Two-Stage Least Squares Estimation

In the supply and demand model, P is an endogenous variable, and consequently the OLS estimates are biased. The following example estimates this model using two-stage least squares.

   proc syslin data=in 2sls;
      endogenous p;
      instruments y u s;
      demand: model q = p y s;
      supply: model q = p u;
   run;
The 2SLS option in the PROC SYSLIN statement specifies the two-stage least squares method. The ENDOGENOUS statement specifies that P is an endogenous regressor for which first-stage predicted values are substituted. You need to declare an endogenous variable in the ENDOGENOUS statement only if it is used as a regressor; thus although Q is endogenous in this model, it is not necessary to list it in the ENDOGENOUS statement. Usually, all predetermined variables that appear in the system are used as instruments. The INSTRUMENTS statement specifies that the exogenous variables Y, U, and S are used as instruments for the first-stage regression to predict P.
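Since the ENDOGENOUS and INSTRUMENTS statements together define the first-stage regression of P on Y, U, and S, it can be useful to look at that regression directly. The following sketch repeats the 2SLS example with the FIRST option added to the PROC SYSLIN statement; nothing else is changed.

```sas
/* Sketch: the same 2SLS model as above, with the FIRST option added */
/* so that the first-stage regression for P is printed as well.      */
proc syslin data=in 2sls first;
   endogenous p;
   instruments y u s;
   demand: model q = p y s;
   supply: model q = p u;
run;
```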
The 2SLS results are shown in Figure 26.3 and Figure 26.4. The first-stage regressions are not shown. To see the first-stage regression results, use the FIRST option in the PROC SYSLIN statement.

Figure 26.3 2SLS Results for Demand Equation

The SYSLIN Procedure
Two-Stage Least Squares Estimation

Model                 DEMAND
Dependent Variable    q
Label                 Quantity

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3         9.670892      3.223631    115.58   <.0001
Error             56         1.561956      0.027892
Corrected Total   59         10.03724

Root MSE          0.16701    R-Square   0.86095
Dependent Mean    1.30095    Adj R-Sq   0.85350
Coeff Var        12.83744

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             1.901048         1.171231      1.62     0.1102   Intercept
p            1             -1.11519         0.607395     -1.84     0.0717   Price
y            1             0.419546         0.117955      3.56     0.0008   Income
s            1             0.331475         0.088472      3.75     0.0004   Price of Substitutes
Figure 26.4 2SLS Results for Supply Equation

The SYSLIN Procedure
Two-Stage Least Squares Estimation

Model                 SUPPLY
Dependent Variable    q
Label                 Quantity

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2         9.646109      4.823054    253.96   <.0001
Error             57         1.082503      0.018991
Corrected Total   59         10.03724

Root MSE          0.13781    R-Square   0.89910
Dependent Mean    1.30095    Adj R-Sq   0.89556
Coeff Var        10.59291

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -0.51878         0.490999     -1.06     0.2952   Intercept
p            1             1.333080         0.059271     22.49     <.0001   Price
u            1             -1.14623         0.243491     -4.71     <.0001   Unit Cost
The 2SLS output is similar in form to the OLS output. However, the 2SLS results are based on predicted values for the endogenous regressors from the first-stage instrumental regressions. This makes the analysis-of-variance table and the $R^2$ statistics difficult to interpret. See the sections “ANOVA Table for Instrumental Variables Methods” on page 1650 and “The R-Square Statistics” on page 1651 for details.

Note that, unlike the OLS estimate, the 2SLS estimate of the P coefficient in the demand equation (-1.115) is negative.
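The difficulty is visible in the numbers of Figure 26.3. The model and error sums of squares no longer add up to the corrected total, and the printed $R^2$ is consistent with the ratio of the model sum of squares to the sum of the model and error sums of squares. This is a check against the printed values only; the sections cited above give the procedure's own definitions.

```latex
\[
\begin{aligned}
\text{SS}_{\text{model}} + \text{SS}_{\text{error}} &= 9.670892 + 1.561956 = 11.232848 \;\neq\; 10.03724 = \text{SS}_{\text{total}} \\
\frac{\text{SS}_{\text{model}}}{\text{SS}_{\text{model}} + \text{SS}_{\text{error}}} &= \frac{9.670892}{11.232848} \approx 0.8610
\end{aligned}
\]
```

The same ratio computed from Figure 26.4, $9.646109 / (9.646109 + 1.082503) \approx 0.8991$, also agrees with the printed value for the supply equation.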
LIML, K-Class, and MELO Estimation

To obtain limited information maximum likelihood, general K-class, or minimum expected loss estimates, use the ENDOGENOUS, INSTRUMENTS, and MODEL statements as in the 2SLS case but specify the LIML, K=, or MELO option instead of 2SLS in the PROC SYSLIN statement. The following statements show this for K-class estimation.

   proc syslin data=in k=.5;
      endogenous p;
      instruments y u s;
      demand: model q = p y s;
      supply: model q = p u;
   run;
For more information about these estimation methods, see the section “Estimation Methods” on page 1647 and consult econometrics textbooks.
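The LIML and MELO cases differ from the K-class example only in the option named in the PROC SYSLIN statement. A minimal sketch for LIML, reusing the same data set and model statements, might look as follows; replacing LIML with MELO would request minimum expected loss estimates instead.

```sas
/* Sketch: the same system estimated by limited information maximum  */
/* likelihood. Only the method option in PROC SYSLIN is changed.     */
proc syslin data=in liml;
   endogenous p;
   instruments y u s;
   demand: model q = p y s;
   supply: model q = p u;
run;
```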
SUR, 3SLS, and FIML Estimation

In a multivariate regression model, the errors in different equations might be correlated. In this case, the efficiency of the estimation might be improved by taking these cross-equation correlations into account.
Seemingly Unrelated Regression

Seemingly unrelated regression (SUR), also called joint generalized least squares (JGLS) or Zellner estimation, is a generalization of OLS for multi-equation systems. Like OLS, the SUR method assumes that all the regressors are independent variables, but SUR uses the correlations among the errors in different equations to improve the regression estimates. The SUR method requires an initial OLS regression to compute residuals. The OLS residuals are used to estimate the cross-equation covariance matrix.

The SUR option in the PROC SYSLIN statement specifies seemingly unrelated regression, as shown in the following statements:

   proc syslin data=in sur;
      demand: model q = p y s;
      supply: model q = p u;
   run;
INSTRUMENTS and ENDOGENOUS statements are not needed for SUR, because the SUR method assumes there are no endogenous regressors. For SUR to be effective, the models must use different regressors. SUR produces the same results as OLS unless the model contains at least one regressor not used in the other equations.
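The textbook form of the SUR estimator makes the role of the cross-equation covariance matrix explicit. The notation below is introduced here for illustration and is not taken from this chapter: the $M$ equations $y_i = X_i \beta_i + \epsilon_i$, each with $n$ observations, are stacked as $y = X\beta + \epsilon$ with $X$ block diagonal, and $\hat{\Sigma}$ is the $M \times M$ covariance matrix estimated from the OLS residuals.

```latex
\[
\operatorname{Var}(\epsilon) = \Sigma \otimes I_n ,
\qquad
\hat{\beta}_{\text{SUR}} =
\left[ X' \left( \hat{\Sigma}^{-1} \otimes I_n \right) X \right]^{-1}
X' \left( \hat{\Sigma}^{-1} \otimes I_n \right) y
\]
```

When every equation has the same regressors, or when $\hat{\Sigma}$ is diagonal, this expression reduces to equation-by-equation OLS, which is the situation described in the preceding paragraph.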
Three-Stage Least Squares

The three-stage least squares method generalizes the two-stage least squares method to take into account the correlations between equations in the same way that SUR generalizes OLS. Three-stage least squares requires three steps: first-stage regressions to get predicted values for the endogenous regressors; a two-stage least squares step to get residuals to estimate the cross-equation correlation matrix; and the final 3SLS estimation step.

The 3SLS option in the PROC SYSLIN statement specifies the three-stage least squares method, as shown in the following statements.

   proc syslin data=in 3sls;
      endogenous p;
      instruments y u s;
      demand: model q = p y s;
      supply: model q = p u;
   run;
The 3SLS output begins with a two-stage least squares regression to estimate the cross-model correlation matrix. This output is the same as the 2SLS results shown in Figure 26.3 and Figure 26.4, and is not repeated here. The next part of the 3SLS output prints the cross-model correlation matrix computed from the 2SLS residuals. This output is shown in Figure 26.5 and includes the cross-model covariances, the correlations, the inverse of the correlation matrix, and the inverse of the covariance matrix.
Figure 26.5 Estimated Cross-Model Covariances Used for 3SLS Estimates

The SYSLIN Procedure
Three-Stage Least Squares Estimation

Cross Model Covariance

            DEMAND      SUPPLY
DEMAND    0.027892    -.011283
SUPPLY    -.011283    0.018991

Cross Model Correlation

            DEMAND      SUPPLY
DEMAND     1.00000    -0.49022
SUPPLY    -0.49022     1.00000

Cross Model Inverse Correlation

            DEMAND      SUPPLY
DEMAND     1.31634     0.64530
SUPPLY     0.64530     1.31634

Cross Model Inverse Covariance

            DEMAND      SUPPLY
DEMAND     47.1941     28.0379
SUPPLY     28.0379     69.3130
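The four matrices in Figure 26.5 are linked by standard 2 x 2 matrix algebra, which the following worked check illustrates. Here $s_{DD}$, $s_{SS}$, and $s_{DS}$ are shorthand introduced for the demand variance, the supply variance, and their covariance, and small differences in the last digit come from rounding of the printed values.

```latex
\[
\begin{aligned}
r &= \frac{s_{DS}}{\sqrt{s_{DD}\, s_{SS}}}
   = \frac{-0.011283}{\sqrt{0.027892 \times 0.018991}} \approx -0.490 \\
R^{-1} &= \frac{1}{1 - r^2}
   \begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix}
   \approx \begin{pmatrix} 1.316 & 0.645 \\ 0.645 & 1.316 \end{pmatrix} \\
\left(S^{-1}\right)_{DD} &= \frac{\left(R^{-1}\right)_{DD}}{s_{DD}}
   = \frac{1.31634}{0.027892} \approx 47.19
\end{aligned}
\]
```

The diagonal elements of the two covariance matrices in Figure 26.5 also match the error mean squares of the 2SLS fits in Figure 26.3 and Figure 26.4 (0.027892 and 0.018991).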
The final 3SLS estimates are shown in Figure 26.6.

Figure 26.6 Three-Stage Least Squares Results

System Weighted MSE         0.5711
Degrees of freedom             113
System Weighted R-Square    0.9627

Model                 DEMAND
Dependent Variable    q
Label                 Quantity

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             1.980269         1.169176      1.69     0.0959   Intercept
p            1             -1.17654         0.605015     -1.94     0.0568   Price
y            1             0.404117         0.117179      3.45     0.0011   Income
s            1             0.359204         0.085077      4.22     <.0001   Price of Substitutes

Figure 26.6 continued

Model                 SUPPLY
Dependent Variable    q
Label                 Quantity

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -0.51878         0.490999     -1.06     0.2952   Intercept
p            1             1.333080         0.059271     22.49     <.0001   Price
u            1             -1.14623         0.243491     -4.71     <.0001   Unit Cost
This output first prints the system weighted mean squared error and system weighted $R^2$ statistics. The system weighted MSE and system weighted $R^2$ measure the fit of the joint model obtained by stacking all the models together and performing a single regression with the stacked observations weighted by the inverse of the model error variances. See the section “The R-Square Statistics” on page 1651 for details.

Next, the table of 3SLS parameter estimates for each model is printed. This output has the same form as for the other estimation methods.

Note that, in some cases, the 3SLS and 2SLS results can be the same. Such a case can arise from the same principle that makes OLS and SUR results identical unless an equation includes a regressor not used in the other equations of the system. However, the application of this principle is more complex when instrumental variables are used. When all the exogenous variables are used as instruments, linear combinations of all the exogenous variables appear in the third-stage regressions through substitution of first-stage predicted values.

In this example, 3SLS produces different (and, it is hoped, more efficient) estimates for the demand equation. However, the 3SLS and 2SLS results for the supply equation are the same. This is because the supply equation has one endogenous regressor and one exogenous regressor not used in other equations. In contrast, the demand equation has fewer endogenous regressors than exogenous regressors not used in other equations in the system.
Full Information Maximum Likelihood

The FIML option in the PROC SYSLIN statement specifies the full information maximum likelihood method, as shown in the following statements.

   proc syslin data=in fiml;
      endogenous p q;
      instruments y u s;
      demand: model q = p y s;
      supply: model q = p u;
   run;
The FIML results are shown in Figure 26.7.
Figure 26.7 FIML Results

The SYSLIN Procedure
Full-Information Maximum Likelihood Estimation

NOTE: Convergence criterion met at iteration 3.

Model                 DEMAND
Dependent Variable    q
Label                 Quantity

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             1.988538         1.233632      1.61     0.1126   Intercept
p            1             -1.18148         0.652278     -1.81     0.0755   Price
y            1             0.402312         0.107270      3.75     0.0004   Income
s            1             0.361345         0.103817      3.48     0.0010   Price of Substitutes

Model                 SUPPLY
Dependent Variable    q
Label                 Quantity

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -0.52443         0.479522     -1.09     0.2787   Intercept
p            1             1.336083         0.057939     23.06     <.0001   Price
u            1             -1.14804         0.237793     -4.83     <.0001   Unit Cost
Computing Reduced Form Estimates

A system of structural equations with endogenous regressors can be represented as functions of only the predetermined variables. For this to be possible, there must be as many equations as endogenous variables. If there are more endogenous variables than regression models, you can use IDENTITY statements to complete the system. See the section “Reduced Form Estimates” on page 1653 for details.

The REDUCED option in the PROC SYSLIN statement prints reduced form estimates. The following statements show this by using the 3SLS estimates of the structural parameters.

   proc syslin data=in 3sls reduced;
      endogenous p;
      instruments y u s;
      demand: model q = p y s;
      supply: model q = p u;
   run;
The first four pages of this output were as shown previously and are not repeated here. (See Figure 26.3, Figure 26.4, Figure 26.5, and Figure 26.6.) The final page of the output from this example contains the reduced form coefficients from the 3SLS structural estimates, as shown in Figure 26.8.

Figure 26.8 Reduced Form 3SLS Results

The SYSLIN Procedure
Three-Stage Least Squares Estimation

Endogenous Variables

                 p    q
DEMAND    1.176543    1
SUPPLY    -1.33308    1

Exogenous Variables

          Intercept          y          s          u
DEMAND     1.980269   0.404117   0.359204          0
SUPPLY     -0.51878          0          0   -1.14623

Inverse Endogenous Variables

       DEMAND      SUPPLY
p    0.398466    -0.39847
q    0.531187    0.468813

Reduced Form

     Intercept          y          s          u
p     0.995788   0.161027   0.143131   0.456735
q     0.808682   0.214662   0.190804   -0.53737
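The tables in Figure 26.8 can be read as one matrix computation. The sketch below uses the symbols $G$ and $B$, which are introduced here only as names for the "Endogenous Variables" and "Exogenous Variables" tables. Each structural equation is first rewritten with the endogenous variables on the left, for example $1.176543\,p + q = 1.980269 + 0.404117\,y + 0.359204\,s$ for the demand equation.

```latex
\[
\begin{aligned}
G \begin{pmatrix} p \\ q \end{pmatrix} &= B \begin{pmatrix} 1 \\ y \\ s \\ u \end{pmatrix},
\qquad
G = \begin{pmatrix} 1.176543 & 1 \\ -1.33308 & 1 \end{pmatrix},
\qquad
\det G = 1.176543 + 1.33308 = 2.509623 \\
G^{-1} &= \frac{1}{2.509623}
\begin{pmatrix} 1 & -1 \\ 1.33308 & 1.176543 \end{pmatrix}
\approx \begin{pmatrix} 0.398466 & -0.398466 \\ 0.531187 & 0.468813 \end{pmatrix} \\
\Pi &= G^{-1} B, \qquad
\Pi_{p,\text{Intercept}} = 0.398466\,(1.980269) - 0.398466\,(-0.51878) \approx 0.9958
\end{aligned}
\]
```

The remaining entries of the "Reduced Form" table follow in the same way; for instance, the coefficient of u in the equation for q is $0.468813 \times (-1.14623) \approx -0.5374$.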
Restricting Parameter Estimates

You can impose restrictions on the parameter estimates with RESTRICT and SRESTRICT statements. The RESTRICT statement imposes linear restrictions on parameters in the equation specified by the preceding MODEL statement. The SRESTRICT statement imposes linear restrictions that relate parameters in different models.

In an SRESTRICT statement, specify the parameters in the linear hypothesis as model-label.regressor-name. (If the MODEL statement does not have a label, you can use the dependent variable name as the label for the model, provided the dependent variable uniquely labels the model.)

Tests for the significance of the restrictions are printed when RESTRICT or SRESTRICT statements are used. You can label RESTRICT and SRESTRICT statements to identify the restrictions in the output.

The RESTRICT statement in the following example restricts the price coefficient in the demand equation to equal 0.015. The SRESTRICT statement restricts the estimate of the income coefficient in the demand equation to be 0.01 times the estimate of the unit cost coefficient in the supply equation.

   proc syslin data=in 3sls;
      endogenous p;
      instruments y u s;
      demand: model q = p y s;
      peq015: restrict p = .015;
      supply: model q = p u;
      yeq01u: srestrict demand.y = .01 * supply.u;
   run;
The restricted estimation results are shown in Figure 26.9.

Figure 26.9 Restricted Estimates

The SYSLIN Procedure
Three-Stage Least Squares Estimation

Model                 DEMAND
Dependent Variable    q
Label                 Quantity

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -0.46584         0.053307     -8.74     <.0001   Intercept
p            1             0.015000                0         .      .       Price
y            1             -0.00679         0.002357     -2.88     0.0056   Income
s            1             0.325589         0.009872     32.98     <.0001   Price of Substitutes
RESTRICT    -1             50.59353         7.464988      6.78     <.0001   PEQ015

Model                 SUPPLY
Dependent Variable    q
Label                 Quantity

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -1.31894         0.477633     -2.76     0.0077   Intercept
p            1             1.291718         0.059101     21.86     <.0001   Price
u            1             -0.67887         0.235679     -2.88     0.0056   Unit Cost

Figure 26.9 continued

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
RESTRICT    -1             342.3605         38.12094      8.98     <.0001   YEQ01U
The standard error for P in the demand equation is 0, since the value of the P coefficient was specified by the RESTRICT statement and not estimated from the data. The “Parameter Estimates” table for the demand equation contains an additional row for the restriction specified by the RESTRICT statement. The parameter estimate for the restriction is the value of the Lagrange multiplier used to impose the restriction. The restriction is highly significant (t = 6.777), which means that the data are not consistent with the restriction, and the model does not fit as well with the restriction imposed. See the section “RESTRICT Statement” on page 1641 for details.

Following the “Parameter Estimates” table for the supply equation, the results for the cross-model restrictions are printed. This shows that the restriction specified by the SRESTRICT statement is not consistent with the data (t = 8.98). See the section “SRESTRICT Statement” on page 1642 for details.
Testing Parameters

You can test linear hypotheses about the model parameters with TEST and STEST statements. The TEST statement tests hypotheses about parameters in the equation specified by the preceding MODEL statement. The STEST statement tests hypotheses that relate parameters in different models.

For example, the following statements test the hypothesis that the price coefficient in the demand equation is equal to 0.015.

   /*--- 3SLS Estimation with Tests ---*/
   proc syslin data=in 3sls;
      endogenous p;
      instruments y u s;
      demand: model q = p y s;
      test_1: test p = .015;
      supply: model q = p u;
   run;
The TEST statement results are shown in Figure 26.10. This reports an F test for the hypothesis specified by the TEST statement. In this case, the F statistic is 6.79 (3.879/.571) with 1 and 113 degrees of freedom. The p value for this F statistic is 0.0104, which indicates that the hypothesis tested is almost but not quite rejected at the 0.01 level. See the section “TEST Statement” on page 1644 for details.
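A TEST statement can also carry more than one hypothesis. The following sketch assumes, as described in the section “TEST Statement”, that several equations separated by commas in one TEST statement are tested jointly. The label JOINT_1 and the hypothesized value 0.4 for the income coefficient are hypothetical and chosen only for illustration.

```sas
/* Sketch: a joint test of two hypotheses about the demand equation. */
/* JOINT_1 and the value .4 are hypothetical, for illustration only. */
proc syslin data=in 3sls;
   endogenous p;
   instruments y u s;
   demand: model q = p y s;
   joint_1: test p = .015, y = .4;
   supply: model q = p u;
run;
```

With two equations in the statement, the numerator degrees of freedom of the reported F test would be expected to be 2 rather than 1.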
Figure 26.10 TEST Statement Results

The SYSLIN Procedure
Three-Stage Least Squares Estimation

System Weighted MSE         0.5711
Degrees of freedom             113
System Weighted R-Square    0.9627

Model                 DEMAND
Dependent Variable    q
Label                 Quantity

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             1.980269         1.169176      1.69     0.0959   Intercept
p            1             -1.17654         0.605015     -1.94     0.0568   Price
y            1             0.404117         0.117179      3.45     0.0011   Income
s            1             0.359204         0.085077      4.22     <.0001   Price of Substitutes

Test Results

Num DF   Den DF   F Value   Pr > F   Label
     1      113      6.79   0.0104   TEST_1
To test hypotheses that involve parameters in different equations, use the STEST statement. Specify the parameters in the linear hypothesis as model-label.regressor-name. (If the MODEL statement does not have a label, you can use the dependent variable name as the label for the model, provided the dependent variable uniquely labels the model.)

For example, the following statements test the hypothesis that the income coefficient in the demand equation is 0.01 times the unit cost coefficient in the supply equation:

   proc syslin data=in 3sls;
      endogenous p;
      instruments y u s;
      demand: model q = p y s;
      supply: model q = p u;
      stest1: stest demand.y = .01 * supply.u;
   run;
The STEST statement results are shown in Figure 26.11. The form and interpretation of the STEST statement results are like the TEST statement results. In this case, the F test produces a p value less than 0.0001, and strongly rejects the hypothesis tested. See the section “STEST Statement” on page 1643 for details.
Figure 26.11 STEST Statement Results

The SYSLIN Procedure
Three-Stage Least Squares Estimation

System Weighted MSE         0.5711
Degrees of freedom             113
System Weighted R-Square    0.9627

Model                 DEMAND
Dependent Variable    q
Label                 Quantity

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             1.980269         1.169176      1.69     0.0959   Intercept
p            1             -1.17654         0.605015     -1.94     0.0568   Price
y            1             0.404117         0.117179      3.45     0.0011   Income
s            1             0.359204         0.085077      4.22     <.0001   Price of Substitutes

Model                 SUPPLY
Dependent Variable    q
Label                 Quantity

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -0.51878         0.490999     -1.06     0.2952   Intercept
p            1             1.333080         0.059271     22.49     <.0001   Price
u            1             -1.14623         0.243491     -4.71     <.0001   Unit Cost

Test Results

Num DF   Den DF   F Value   Pr > F   Label
     1      113     22.46   <.0001   STEST1
You can combine TEST and STEST statements with RESTRICT and SRESTRICT statements to perform hypothesis tests for restricted models. Of course, the validity of the TEST and STEST statement results depends on the correctness of any restrictions you impose on the estimates.
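As a hedged sketch of such a combination, the following statements reuse the data set IN and the demand equation from the earlier examples. The particular restriction and hypothesis, and the labels R1 and T1, are illustrative assumptions and are not taken from the examples above:

```sas
/* Sketch only: the restriction and the hypothesis are illustrative assumptions */
proc syslin data=in 2sls;
   endogenous p;
   instruments y u s;
   demand: model q = p y s;
   /* impose: income and substitute-price coefficients sum to 1 */
   r1: restrict y + s = 1;
   /* test, conditional on r1, that the price and income coefficients sum to 0 */
   t1: test p + y = 0;
run;
```

The F test reported for T1 is computed from the restricted estimates, so it is meaningful only to the extent that restriction R1 is itself correct.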
Saving Residuals and Predicted Values
You can store predicted values and residuals from the estimated models in a SAS data set. Specify the OUT= option in the PROC SYSLIN statement and use the OUTPUT statement to specify names for new variables to contain the predicted and residual values. For example, the following statements store the predicted quantity from the supply and demand equations in a data set PRED:
   /*--- Saving Output Data from 3SLS Estimation ---*/
   proc syslin data=in out=pred 3sls;
      endogenous p;
      instruments y u s;
      demand: model q = p y s;
      output predicted=q_demand;
      supply: model q = p u;
      output predicted=q_supply;
   run;
Plotting Residuals
You can plot the residuals against the regressors by using PROC SGPLOT. For example, the following statements plot the 2SLS residuals for the demand model against price, income, and price of substitutes:
   proc syslin data=in 2sls out=out;
      endogenous p;
      instruments y u s;
      demand: model q = p y s;
      output residual=residual_q;
   run;

   proc sgplot data=out;
      scatter x=p y=residual_q;
      refline 0 / axis=y;
   run;

   proc sgplot data=out;
      scatter x=y y=residual_q;
      refline 0 / axis=y;
   run;

   proc sgplot data=out;
      scatter x=s y=residual_q;
      refline 0 / axis=y;
   run;
The plot for income is shown in Figure 26.12. The other plots are not shown.
Figure 26.12 Plot of Residuals against Income
Syntax: SYSLIN Procedure
The SYSLIN procedure uses the following statements:
   PROC SYSLIN options ;
      BY variables ;
      ENDOGENOUS variables ;
      IDENTITY identities ;
      INSTRUMENTS variables ;
      MODEL response = regressors / options ;
      OUTPUT PREDICTED= variable RESIDUAL= variable ;
      RESTRICT restrictions ;
      SRESTRICT restrictions ;
      STEST equations ;
      TEST equations ;
      VAR variables ;
      WEIGHT variable ;
Functional Summary
The SYSLIN procedure statements and options are summarized in the following table.

Description                                                 Statement     Option

Data Set Options
specify the input data set                                  PROC SYSLIN   DATA=
specify the output data set                                 PROC SYSLIN   OUT=
write parameter estimates to an output data set             PROC SYSLIN   OUTEST=
write covariances to the OUTEST= data set                   PROC SYSLIN   OUTCOV, OUTCOV3
write the SSCP matrix to an output data set                 PROC SYSLIN   OUTSSCP=

Estimation Method Options
specify full information maximum likelihood estimation      PROC SYSLIN   FIML
specify iterative SUR estimation                            PROC SYSLIN   ITSUR
specify iterative 3SLS estimation                           PROC SYSLIN   IT3SLS
specify K-class estimation                                  PROC SYSLIN   K=
specify limited information maximum likelihood estimation   PROC SYSLIN   LIML
specify minimum expected loss estimation                    PROC SYSLIN   MELO
specify ordinary least squares estimation                   PROC SYSLIN   OLS
specify seemingly unrelated estimation                      PROC SYSLIN   SUR
specify two-stage least squares estimation                  PROC SYSLIN   2SLS
specify three-stage least squares estimation                PROC SYSLIN   3SLS
specify Fuller’s modification to LIML                       PROC SYSLIN   ALPHA=
specify convergence criterion                               PROC SYSLIN   CONVERGE=
specify maximum number of iterations                        PROC SYSLIN   MAXIT=
use diagonal of S instead of S                              PROC SYSLIN   SDIAG
exclude RESTRICT statements in final stage                  PROC SYSLIN   NOINCLUDE
specify criterion for testing for singularity               PROC SYSLIN   SINGULAR=
specify denominator for variance estimates                  PROC SYSLIN   VARDEF=

Printing Control Options
print all results                                           PROC SYSLIN   ALL
print first-stage regression statistics                     PROC SYSLIN   FIRST
print estimates and SSE at each iteration                   PROC SYSLIN   ITPRINT
print the reduced form estimates                            PROC SYSLIN   REDUCED
print descriptive statistics                                PROC SYSLIN   SIMPLE
print uncorrected SSCP matrix                               PROC SYSLIN   USSCP
print correlations of the parameter estimates               MODEL         CORRB
print covariances of the parameter estimates                MODEL         COVB
print Durbin-Watson statistics                              MODEL         DW
print Basmann’s test                                        MODEL         OVERID
plot residual values against regressors                     MODEL         PLOT
print standardized parameter estimates                      MODEL         STB
print unrestricted parameter estimates                      MODEL         UNREST
print the model crossproducts matrix                        MODEL         XPX
print the inverse of the crossproducts matrix               MODEL         I
suppress printed output                                     MODEL         NOPRINT
suppress all printed output                                 PROC SYSLIN   NOPRINT

Model Specification
specify structural equations                                MODEL
suppress the intercept parameter                            MODEL         NOINT
specify linear relationship among variables                 IDENTITY
perform weighted regression                                 WEIGHT

Tests and Restrictions on Parameters
place restrictions on parameter estimates                   RESTRICT
place restrictions on parameter estimates                   SRESTRICT
test linear hypothesis                                      STEST
test linear hypothesis                                      TEST

Other Statements
specify BY-group processing                                 BY
specify the endogenous variables                            ENDOGENOUS
specify instrumental variables                              INSTRUMENTS
write predicted and residual values to a data set           OUTPUT
name variable for predicted values                          OUTPUT        PREDICTED=
name variable for residual values                           OUTPUT        RESIDUAL=
include additional variables in X'X matrix                  VAR

PROC SYSLIN Statement
   PROC SYSLIN options ;
The following options can be used with the PROC SYSLIN statement.
Data Set Options
DATA=SAS-data-set
   specifies the input data set. If the DATA= option is omitted, the most recently created SAS data set is used. In addition to ordinary SAS data sets, PROC SYSLIN can analyze data sets of TYPE=CORR, TYPE=COV, TYPE=UCORR, TYPE=UCOV, and TYPE=SSCP. See the section “Special TYPE= Input Data Sets” on page 1647 for details.
OUT=SAS-data-set
   specifies an output SAS data set for residuals and predicted values. The OUT= option is used in conjunction with the OUTPUT statement. See the section “OUT= Data Set” on page 1655 for details.
OUTEST=SAS-data-set
   writes the parameter estimates to an output data set. See the section “OUTEST= Data Set” on page 1655 for details.
OUTCOV
COVOUT
   writes the covariance matrix of the parameter estimates to the OUTEST= data set in addition to the parameter estimates.
OUTCOV3
COV3OUT
   writes covariance matrices for each model in a system to the OUTEST= data set when the 3SLS, SUR, or FIML option is used.
OUTSSCP=SAS-data-set
   writes the sum-of-squares-and-crossproducts matrix to an output data set. See the section “OUTSSCP= Data Set” on page 1656 for details.
Estimation Method Options
2SLS
   specifies the two-stage least squares estimation method.
3SLS
   specifies the three-stage least squares estimation method.
ALPHA=value
   specifies Fuller’s modification to the LIML estimation method. See the section “Fuller’s Modification to LIML” on page 1654 for details.
CONVERGE=value
   specifies the convergence criterion for the iterative estimation methods IT3SLS, ITSUR, and FIML. The default is CONVERGE=0.0001.
FIML
   specifies the full information maximum likelihood estimation method.
ITSUR
   specifies the iterative seemingly unrelated estimation method.
IT3SLS
   specifies the iterative three-stage least squares estimation method.
K=value
   specifies the K-class estimation method.
LIML
   specifies the limited information maximum likelihood estimation method.
MAXITER=n
   specifies the maximum number of iterations allowed for the IT3SLS, ITSUR, and FIML estimation methods. The MAXITER= option can be abbreviated as MAXIT=. The default is MAXITER=30.
MELO
   specifies the minimum expected loss estimation method.
NOINCLUDE
   excludes the RESTRICT statements from the final stage for the 3SLS, IT3SLS, SUR, and ITSUR estimation methods.
OLS
   specifies the ordinary least squares estimation method. This is the default.
SDIAG
   uses the diagonal of S instead of S to do the estimation, where S is the covariance matrix of equation errors. See the section “Uncorrelated Errors across Equations” on page 1654 for details.
SINGULAR=value
   specifies a criterion for testing singularity of the crossproducts matrix. This is a tuning parameter used to make PROC SYSLIN more or less sensitive to singularities. The value must be between 0 and 1. The default is SINGULAR=1E-8.
SUR
   specifies the seemingly unrelated estimation method.
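The estimation method options are combined with the iteration controls in the PROC SYSLIN statement itself. The following is a minimal sketch that reuses the data set IN and the demand and supply equations from the earlier examples; the CONVERGE= and MAXITER= values are illustrative choices, not recommendations:

```sas
/* Sketch only: the CONVERGE= and MAXITER= values are illustrative assumptions */
proc syslin data=in it3sls converge=0.00001 maxiter=50 itprint;
   endogenous p;
   instruments y u s;
   demand: model q = p y s;
   supply: model q = p u;
run;
```

Tightening CONVERGE= below its default of 0.0001 generally calls for a larger MAXITER= than the default of 30, and the ITPRINT option shows how the estimates move from one iteration to the next.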
Printing Control Options
ALL
   specifies the CORRB, COVB, DW, I, OVERID, PLOT, STB, and XPX options for every MODEL statement.
FIRST
   prints first-stage regression statistics for the endogenous variables regressed on the instruments. This output includes sums of squares, estimates, variances, and standard deviations.
ITPRINT
   prints parameter estimates, system-weighted residual sum of squares, and $R^2$ at each iteration for the IT3SLS and ITSUR estimation methods. For the FIML method, the ITPRINT option prints parameter estimates, negative of log-likelihood function, and norm of gradient vector at each iteration.
NOPRINT
   suppresses all printed output. Specifying NOPRINT in the PROC SYSLIN statement is equivalent to specifying NOPRINT in every MODEL statement.
REDUCED
   prints the reduced form estimates. If the REDUCED option is specified, you should specify any IDENTITY statements needed to make the system square. See the section “Reduced Form Estimates” on page 1653 for details.
SIMPLE
   prints descriptive statistics for the dependent variables. The statistics printed include the sum, mean, uncorrected sum of squares, variance, and standard deviation.
USSCP
   prints the uncorrected sum-of-squares-and-crossproducts matrix.
USSCP2
   prints the uncorrected sum-of-squares-and-crossproducts matrix for all variables used in the analysis, including predicted values of variables generated by the procedure.
VARDEF=DF | N | WEIGHT | WGT
   specifies the denominator to use in calculating cross-equation error covariances and parameter standard errors and covariances. The default is VARDEF=DF, which corrects for model degrees of freedom. VARDEF=N specifies no degrees-of-freedom correction. VARDEF=WEIGHT specifies the sum of the observation weights. VARDEF=WGT specifies the sum of the observation weights minus the model degrees of freedom. See the section “Computation of Standard Errors” on page 1653 for details.
BY Statement
   BY variables ;
A BY statement can be used with PROC SYSLIN to obtain separate analyses on observations in groups defined by the BY variables.
ENDOGENOUS Statement
   ENDOGENOUS variables ;
The ENDOGENOUS statement declares the jointly dependent variables that are projected in the first-stage regression through the instrument variables. The ENDOGENOUS statement is not needed for the SUR, ITSUR, or OLS estimation methods. The default ENDOGENOUS list consists of all the dependent variables in the MODEL and IDENTITY statements that do not appear in the INSTRUMENTS statement.
IDENTITY Statement
   IDENTITY equation ;
The IDENTITY statement specifies linear relationships among variables to write to the OUTEST= data set. It provides extra information in the OUTEST= data set but does not create or compute variables. The OUTEST= data set can be processed by the SIMLIN procedure in a later step. The IDENTITY statement is also used to compute reduced form coefficients when the REDUCED option in the PROC SYSLIN statement is specified. See the section “Reduced Form Estimates” on page 1653 for details. The equation given by the IDENTITY statement has the same form as equations in the MODEL statement. A label can be specified for an IDENTITY statement as follows:
   label : IDENTITY . . . ;
INSTRUMENTS Statement
   INSTRUMENTS variables ;
The INSTRUMENTS statement declares the variables used in obtaining first-stage predicted values. All the instruments specified are used in each first-stage regression. The INSTRUMENTS statement is required for the 2SLS, 3SLS, IT3SLS, LIML, MELO, and K-class estimation methods. The INSTRUMENTS statement is not needed for the SUR, ITSUR, OLS, or FIML estimation methods.
MODEL Statement
   MODEL response = regressors / options ;
The MODEL statement regresses the response variable on the left side of the equal sign against the regressors listed on the right side. Models can be given labels. Model labels are used in the printed output to identify the results for different models. Model labels are also used in SRESTRICT and STEST statements to refer to parameters in different models. If no label is specified, the response variable name is used as the label for the model. The model label is specified as follows:
label : MODEL . . . ;
The following options can be used in the MODEL statement after a slash (/).
ALL
   specifies the CORRB, COVB, DW, I, OVERID, PLOT, STB, and XPX options.
ALPHA=value
   specifies the $\alpha$ parameter for Fuller’s modification to the LIML estimation method. See the section “Fuller’s Modification to LIML” on page 1654 for details.
CORRB
   prints the matrix of estimated correlations between the parameter estimates.
COVB
   prints the matrix of estimated covariances between the parameter estimates.
DW
   prints Durbin-Watson statistics and autocorrelation coefficients for the residuals. If there are missing values, $d'$ is calculated according to Savin and White (1978). Use the DW option only if the data set to be analyzed is an ordinary SAS data set with time series observations sorted in time order. The Durbin-Watson test is not valid for models with lagged dependent regressors.
I
   prints the inverse of the crossproducts matrix for the model, $(X'X)^{-1}$. If restrictions are specified, the crossproducts matrix printed is adjusted for the restrictions. See the section “Computational Details” on page 1651 for details.
K=value
   specifies K-class estimation.
NOINT
   suppresses the intercept parameter from the model.
NOPRINT
   suppresses the normal printed output.
OVERID
   prints Basmann’s (1960) test for overidentifying restrictions. See the section “Overidentification Restrictions” on page 1654 for details.
PLOT
   plots residual values against regressors. A plot of the residuals for each regressor is printed.
STB
   prints standardized parameter estimates. Sometimes known as a standard partial regression coefficient, a standardized parameter estimate is a parameter estimate multiplied by the standard deviation of the associated regressor and divided by the standard deviation of the response variable.
UNREST
   prints parameter estimates computed before restrictions are applied. The UNREST option is valid only if a RESTRICT statement is specified.
XPX
   prints the model crossproducts matrix, $X'X$. See the section “Computational Details” on page 1651 for details.
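The verbal definition of the standardized estimate printed by the STB option can be written compactly. In this sketch the symbols $b_j$, $s_{x_j}$, and $s_y$ are notation assumed here for the parameter estimate, the sample standard deviation of regressor $j$, and the sample standard deviation of the response; the numbers are made up for illustration:

```latex
\[
b_j^{\ast} = b_j \,\frac{s_{x_j}}{s_y},
\qquad\text{for example}\qquad
b_j = 0.40,\; s_{x_j} = 2.0,\; s_y = 4.0
\;\Longrightarrow\;
b_j^{\ast} = 0.40 \times \frac{2.0}{4.0} = 0.20 .
\]
```

Read this way, a standardized estimate of 0.20 says that a one-standard-deviation change in the regressor is associated with a 0.20-standard-deviation change in the response, holding the other regressors fixed.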
OUTPUT Statement
   OUTPUT < PREDICTED=variable > < RESIDUAL=variable > ;
The OUTPUT statement writes predicted values and residuals from the preceding model to the data set specified by the OUT= option in the PROC SYSLIN statement. An OUTPUT statement must come after the MODEL statement to which it applies. The OUT= option must be specified in the PROC SYSLIN statement. The following options can be specified in the OUTPUT statement:
PREDICTED=variable
   names a new variable to contain the predicted values for the response variable. The PREDICTED= option can be abbreviated as PREDICT=, PRED=, or P=.
RESIDUAL=variable
   names a new variable to contain the residual values for the response variable. The RESIDUAL= option can be abbreviated as RESID= or R=.
For example, the following statements create an output data set named B. In addition to the variables in the input data set, the data set B contains the variable YHAT, with values that are predicted values of the response variable Y, and YRESID, with values that are the residual values of Y.
   proc syslin data=a out=b;
      model y = x1 x2;
      output p=yhat r=yresid;
   run;
As a second example, the following statements create an output data set named PRED. In addition to the variables in the input data set, the data set PRED contains the variables Q_DEMAND and Q_SUPPLY, with values that are predicted values of the response variable Q for the demand and supply equations, respectively, and R_DEMAND and R_SUPPLY, with values that are the residual values of the demand and supply equations, respectively.
   proc syslin data=in out=pred;
      demand: model q = p y s;
      output p=q_demand r=r_demand;
      supply: model q = p u;
      output p=q_supply r=r_supply;
   run;
See the section “OUT= Data Set” on page 1655 for details.
RESTRICT Statement
   RESTRICT equation , . . . , equation ;
The RESTRICT statement places restrictions on the parameter estimates for the preceding MODEL statement. Any number of RESTRICT statements can follow a MODEL statement. Each restriction is written as a linear equation. If more than one restriction is specified in a single RESTRICT statement, the restrictions are separated by commas. Parameters are referred to by the name of the corresponding regressor variable. Each name used in the equation must be a regressor in the preceding MODEL statement. The keyword INTERCEPT is used to refer to the intercept parameter in the model. RESTRICT statements can be given labels. The labels are used in the printed output to distinguish results for different restrictions. Labels are specified as follows:
   label : RESTRICT . . . ;
The following is an example of the use of the RESTRICT statement, in which the coefficients of the regressors X1 and X2 are required to sum to 1.
   proc syslin data=a;
      model y = x1 x2;
      restrict x1 + x2 = 1;
   run;
Variable names can be multiplied by constants. When no equal sign appears, the linear combination is set equal to 0. Note that the parameters associated with the variables are restricted, not the variables themselves. Here are some examples of valid RESTRICT statements:
   restrict x1 + x2 = 1;
   restrict x1 + x2 - 1;
   restrict 2 * x1 = x2 + x3 , intercept + x4 = 0;
   restrict x1 = x2 = x3 = 1;
   restrict 2 * x1 - x2;
Restricted parameter estimates are computed by introducing a Lagrangian parameter $\lambda$ for each restriction (Pringle and Rayner 1971). The estimates of these Lagrangian parameters are printed in the “Parameter Estimates” table. If a restriction cannot be applied, its parameter value and degrees of freedom are listed as 0. The Lagrangian parameter $\lambda$ measures the sensitivity of the sum of squared errors (SSE) to the restriction. If the restriction is changed by a small amount $\epsilon$, the SSE is changed by $2\lambda\epsilon$. The t ratio tests the significance of the restrictions. If $\lambda$ is zero, the restricted estimates are the same as the unrestricted estimates.
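The sensitivity interpretation of the Lagrangian parameter can be made concrete with a small worked calculation. Here $\lambda$ denotes the Lagrangian parameter and $\epsilon$ the small change in the right-hand side of the restriction; the numerical values are hypothetical and chosen only to show the arithmetic:

```latex
\[
\Delta\mathrm{SSE} \approx 2\lambda\epsilon,
\qquad\text{for example}\qquad
\lambda = 1.5,\; \epsilon = 0.01
\;\Longrightarrow\;
\Delta\mathrm{SSE} \approx 2 \times 1.5 \times 0.01 = 0.03 .
\]
```

A Lagrangian parameter estimate near zero therefore indicates that the restriction costs almost nothing in fit, which is what the accompanying t ratio tests.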
Any number of restrictions can be specified in a RESTRICT statement, and any number of RESTRICT statements can be used. The estimates are computed subject to all restrictions specified. However, restrictions should be consistent and not redundant.
NOTE: The RESTRICT statement is not supported for the FIML estimation method.
SRESTRICT Statement
   SRESTRICT equation , . . . , equation ;
The SRESTRICT statement imposes linear restrictions that involve parameters in two or more MODEL statements. The SRESTRICT statement is like the RESTRICT statement but is used to impose restrictions across equations, whereas the RESTRICT statement applies only to parameters in the immediately preceding MODEL statement. Each restriction is written as a linear equation. Parameters are referred to as label.variable, where label is the model label and variable is the name of the regressor to which the parameter is attached. (If the MODEL statement does not have a label, you can use the dependent variable name as the label for the model, provided the dependent variable uniquely labels the model.) Each variable name used must be a regressor in the indicated MODEL statement. The keyword INTERCEPT is used to refer to intercept parameters. SRESTRICT statements can be given labels. The labels are used in the printed output to distinguish results for different restrictions. Labels are specified as follows:
   label : SRESTRICT . . . ;
The following is an example of the use of the SRESTRICT statement, in which the coefficient for the regressor X2 is constrained to be the same in both models.
   proc syslin data=a 3sls;
      endogenous y1 y2;
      instruments x1 x2;
      model y1 = y2 x1 x2;
      model y2 = y1 x2;
      srestrict y1.x2 = y2.x2;
   run;
When no equal sign is used, the linear combination is set equal to 0. Thus, the restriction in the preceding example can also be specified as
   srestrict y1.x2 - y2.x2;
Any number of restrictions can be specified in an SRESTRICT statement, and any number of SRESTRICT statements can be used. The estimates are computed subject to all restrictions specified. However, restrictions should be consistent and not redundant.
When a system restriction is requested for a single equation estimation method (such as OLS or 2SLS), PROC SYSLIN produces the restricted estimates by actually using a corresponding system method. For example, when SRESTRICT is specified along with OLS, PROC SYSLIN produces the restricted OLS estimates via a two-step process equivalent to using SUR estimation with the SDIAG option. First, the unrestricted OLS results are produced. Then, the GLS (SUR) estimation with the system restriction is performed, using the diagonal of the covariance matrix of the residuals. When SRESTRICT is specified along with 2SLS, PROC SYSLIN produces the restricted 2SLS estimates via a multistep process equivalent to using 3SLS estimation with the SDIAG option. First, the unrestricted 2SLS results are produced. Then, the GLS (3SLS) estimation with the system restriction is performed, using the diagonal of the covariance matrix of the residuals.
The results of the SRESTRICT statements are printed after the parameter estimates for all the models in the system. The format of the SRESTRICT statement output is the same as the “Parameter Estimates” table. In this output the parameter estimate is the Lagrangian parameter, $\lambda$, used to impose the restriction. The Lagrangian parameter $\lambda$ measures the sensitivity of the system sum of squared errors to the restriction. The system SSE is the system MSE shown in the printed output multiplied by the degrees of freedom. If the restriction is changed by a small amount $\epsilon$, the system SSE is changed by $2\lambda\epsilon$. The t ratio tests the significance of the restriction. If $\lambda$ is zero, the restricted estimates are the same as the unrestricted estimates.
The model degrees of freedom are not adjusted for the cross-model restrictions imposed by SRESTRICT statements.
NOTE: The SRESTRICT statement is not supported for the LIML and the FIML estimation methods.
STEST Statement
   STEST equation , . . . , equation / options ;
The STEST statement performs an F test for the joint hypotheses specified in the statement. The hypothesis is represented in matrix notation as $L\beta = c$, and the F test is computed as
\[
F = \frac{(Lb - c)' \left( L (X'X)^{-1} L' \right)^{-1} (Lb - c)}{m \hat{\sigma}^2}
\]
where $b$ is the estimate of $\beta$, $m$ is the number of restrictions, and $\hat{\sigma}^2$ is the system weighted mean squared error. See the section “Computational Details” on page 1651 for information about the matrix $X'X$. Each hypothesis to be tested is written as a linear equation. Parameters are referred to as label.variable, where label is the model label and variable is the name of the regressor to which the parameter is attached. (If the MODEL statement does not have a label, you can use the dependent variable name as the label for the model, provided the dependent variable uniquely labels the model.) Each variable name used must be a regressor in the indicated MODEL statement. The keyword INTERCEPT is used to refer to intercept parameters. STEST statements can be given labels. The label is used in the printed output to distinguish different tests. Any number of STEST statements can be specified. Labels are specified as follows:
   label : STEST . . . ;
The following is an example of the STEST statement:
   proc syslin data=a 3sls;
      endogenous y1 y2;
      instruments x1 x2;
      model y1 = y2 x1 x2;
      model y2 = y1 x2;
      stest y1.x2 = y2.x2;
   run;
The test performed is exact only for ordinary least squares, given the OLS assumptions of the linear model. For other estimation methods, the F test is based on large sample theory and is only approximate in finite samples. If RESTRICT or SRESTRICT statements are used, the tests computed by the STEST statement are conditional on the restrictions specified. The validity of the tests can be compromised if incorrect restrictions are imposed on the estimates. The following are examples of STEST statements:
   stest a.x1 + b.x2 = 1;
   stest 2 * b.x2 = c.x3 + c.x4 , a.intercept + b.x2 = 0;
   stest a.x1 = c.x2 = b.x3 = 1;
   stest 2 * a.x1 - b.x2 = 0;
The PRINT option can be specified in the STEST statement after a slash (/):
PRINT
   prints intermediate calculations for the hypothesis tests.
NOTE: The STEST statement is not supported for the FIML estimation method.
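For a single hypothesis ($m = 1$) the STEST F statistic reduces to a familiar form. The following derivation is a sketch under the assumption that $L$ is a single row, written here as $\ell'$ (a symbol introduced only for this illustration); $b$, $c$, $X'X$, and $\hat{\sigma}^2$ are as defined for the STEST statement:

```latex
\[
F \;=\; \frac{(\ell' b - c)\,\bigl(\ell' (X'X)^{-1} \ell\bigr)^{-1}\,(\ell' b - c)}{\hat{\sigma}^2}
  \;=\; \frac{(\ell' b - c)^2}{\hat{\sigma}^2\, \ell' (X'X)^{-1} \ell}
  \;=\; t^2,
\qquad
t \;=\; \frac{\ell' b - c}{\sqrt{\hat{\sigma}^2\, \ell' (X'X)^{-1} \ell}} .
\]
```

In other words, with one restriction the F value is the square of the t ratio for the linear combination $\ell' b - c$, with 1 numerator degree of freedom.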
TEST Statement
   TEST equation , . . . , equation / options ;
The TEST statement performs F tests of linear hypotheses about the parameters in the preceding MODEL statement. Each equation specifies a linear hypothesis to be tested. If more than one equation is specified, the equations are separated by commas. Variable names must correspond to regressors in the preceding MODEL statement, and each name represents the coefficient of the corresponding regressor. The keyword INTERCEPT is used to refer to the model intercept. TEST statements can be given labels. The label is used in the printed output to distinguish different tests. Any number of TEST statements can be specified. Labels are specified as follows:
   label : TEST . . . ;
The following is an example of the use of the TEST statement, which tests the hypothesis that the coefficients of X1 and X2 are the same:
   proc syslin data=a;
      model y = x1 x2;
      test x1 = x2;
   run;
The following statements perform F tests for the hypothesis that the coefficients of X1 and X2 are equal, for the hypothesis that the sum of the X1 and X2 coefficients is twice the intercept, and for the joint hypothesis.
   proc syslin data=a;
      model y = x1 x2;
      x1eqx2: test x1 = x2;
      sumeq2i: test x1 + x2 = 2 * intercept;
      joint: test x1 = x2, x1 + x2 = 2 * intercept;
   run;
The following are additional examples of TEST statements:
   test x1 + x2 = 1;
   test x1 = x2 = x3 = 1;
   test 2 * x1 = x2 + x3, intercept + x4 = 0;
   test 2 * x1 - x2;
The TEST statement performs an F test for the joint hypotheses specified. The hypothesis is represented in matrix notation as follows:
\[
L\beta = c
\]
The F test is computed as
\[
F = \frac{(Lb - c)' \left( L (X'X)^{-} L' \right)^{-1} (Lb - c)}{m \hat{\sigma}^2}
\]
where $b$ is the estimate of $\beta$, $m$ is the number of restrictions, and $\hat{\sigma}^2$ is the model mean squared error. See the section “Computational Details” on page 1651 for information about the matrix $X'X$. The test performed is exact only for ordinary least squares, given the OLS assumptions of the linear model. For other estimation methods, the F test is based on large sample theory and is only approximate in finite samples. If RESTRICT or SRESTRICT statements are used, the tests computed by the TEST statement are conditional on the restrictions specified. The validity of the tests can be compromised if incorrect restrictions are imposed on the estimates.
The PRINT option can be specified in the TEST statement after a slash (/):
PRINT
   prints intermediate calculations for the hypothesis tests.
NOTE: The TEST statement is not supported for the FIML estimation method.
VAR Statement
   VAR variables ;
The VAR statement is used to include variables in the crossproducts matrix that are not specified in any MODEL statement. This statement is rarely used with PROC SYSLIN and is used only with the OUTSSCP= option in the PROC SYSLIN statement.
WEIGHT Statement
   WEIGHT variable ;
The WEIGHT statement is used to perform weighted regression. The WEIGHT statement names a variable in the input data set whose values are relative weights for a weighted least squares fit. If the weight value is proportional to the reciprocal of the variance for each observation, the weighted estimates are the best linear unbiased estimates (BLUE).
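The following is a minimal sketch of a weighted fit. The data set A, the variables Y, X1, and X2, and in particular the variable SIGMA2 (assumed to hold a known or previously estimated error variance for each observation) are hypothetical names used only for illustration:

```sas
/* Sketch only: A, Y, X1, X2, and SIGMA2 are hypothetical names */
data a_w;
   set a;
   /* weight proportional to the reciprocal of the observation's error variance */
   w = 1 / sigma2;
run;

proc syslin data=a_w;
   weight w;
   model y = x1 x2;
run;
```

Because only relative weights matter for the fit, any constant multiple of W gives the same parameter estimates.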
Details: SYSLIN Procedure
Input Data Set
PROC SYSLIN does not compute new values for regressors. For example, if you need a lagged variable, you must create it with a DATA step. No values are computed by IDENTITY statements; all values must be in the input data set.
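For instance, a lagged regressor can be created with the LAG function in a DATA step before PROC SYSLIN is called. This sketch reuses the data set IN from the earlier examples and assumes that its observations are already sorted in time order; the names IN_LAG and Q_LAG are hypothetical:

```sas
/* Sketch only: IN_LAG and Q_LAG are hypothetical names */
data in_lag;
   set in;
   /* LAG returns the value of Q from the previous observation;
      the first observation receives a missing value */
   q_lag = lag(q);
run;

proc syslin data=in_lag 2sls;
   endogenous p;
   /* the lagged quantity is predetermined, so it is listed as an instrument */
   instruments y u s q_lag;
   demand: model q = p y s q_lag;
run;
```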
Special TYPE= Input Data Sets
The input data set for most applications of the SYSLIN procedure contains standard rectangular data. However, PROC SYSLIN can also process input data in the form of a crossproducts, covariance, or correlation matrix. Data sets that contain such matrices are identified by values of the TYPE= data set option.
These special kinds of input data sets can be used to save computer time. It takes $nk^2$ operations, where $n$ is the number of observations and $k$ is the number of variables, to calculate cross products; the regressions are of the order $k^3$. When $n$ is in the thousands and $k$ is much smaller, you can save most of the computer time in later runs of PROC SYSLIN by reusing the SSCP matrix rather than recomputing it.
The SYSLIN procedure can process TYPE=CORR, COV, UCORR, UCOV, or SSCP data sets. TYPE=CORR and TYPE=COV data sets, usually created by the CORR procedure, contain means and standard deviations, and correlations or covariances. TYPE=SSCP data sets, usually created in previous runs of PROC SYSLIN, contain sums of squares and cross products. See the SAS/STAT User’s Guide for more information about special SAS data sets.
When special SAS data sets are read, you must specify the TYPE= data set option. PROC CORR and PROC SYSLIN automatically set the type for output data sets; however, if you create the data set by some other means, you must specify its type with the TYPE= data set option.
When the special data sets are used, the DW (Durbin-Watson test) and PLOT options in the MODEL statement cannot be performed, and the OUTPUT statements are not valid.
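The reuse of a crossproducts matrix described above might look like the following sketch. The data set A, the variables Y, X1, X2, and X3, and the data set name SS are hypothetical; the explicit TYPE=SSCP data set option is written out for clarity even though PROC SYSLIN sets the type of its own output data sets:

```sas
/* Sketch only: A, SS, Y, X1, X2, and X3 are hypothetical names */

/* First run: compute the crossproducts once and save them */
proc syslin data=a outsscp=ss noprint;
   var x3;               /* include X3 in the SSCP matrix although no model uses it yet */
   model y = x1 x2;
run;

/* Later run: fit a different model from the saved SSCP matrix
   instead of passing over the raw observations again */
proc syslin data=ss(type=sscp);
   model y = x1 x2 x3;
run;
```

In the second run the DW and PLOT options and the OUTPUT statement are unavailable, because the individual observations are no longer present.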
Estimation Methods
A brief description of the methods used by the SYSLIN procedure follows. For more information about these methods, see the references at the end of this chapter.
There are two fundamental methods of estimation for simultaneous equations: least squares and maximum likelihood. There are two approaches within each of these categories: single equation methods (also referred to as limited information methods) and system methods (also referred to as full information methods). System methods take into account cross-equation correlations of the disturbances in estimating parameters, while single equation methods do not.
OLS, 2SLS, MELO, K-class, SUR, ITSUR, 3SLS, and IT3SLS use the least squares method; LIML and FIML use the maximum likelihood method. OLS, 2SLS, MELO, K-class, and LIML are single equation methods. The system methods are SUR, ITSUR, 3SLS, IT3SLS, and FIML.
Single Equation Estimation Methods Single equation methods do not take into account correlations of errors across equations. As a result, these estimators are not asymptotically efficient compared to full information methods; however, there are instances in which they may be preferred. (See the section “Choosing a Method for Simultaneous Equations” on page 1650 for details.) Let yi be the dependent endogenous variable in equation i , and Xi and Yi be the matrices of exogenous and endogenous variables appearing as regressors in the same equation. The 2SLS method owes its name to the fact that, in a first stage, the instrumental variables are used as regressors to obtain a projected value YOi that is uncorrelated with the residual in equation i . In a second stage, YOi replaces Yi on the right-hand side to obtain consistent least squares estimators. Normally, the predetermined variables of the system are used as the instruments. It is possible to use variables other than predetermined variables from your system as instruments; however, the estimation might not be as efficient. For consistent estimates, the instruments must be uncorrelated with the residual and correlated with the endogenous variables. The LIML method results in consistent estimates that are equal to the 2SLS estimates when an equation is exactly identified. LIML can be viewed as a least-variance ratio estimation or as a maximum likelihood estimation. LIML involves minimizing the ratio D .rvar_eq/=.rvar_sys/, where rvar_eq is the residual variance associated with regressing the weighted endogenous variables on all predetermined variables that appear in that equation, and rvar_sys is the residual variance associated with regressing weighted endogenous variables on all predetermined variables in the system. The MELO method computes the minimum expected loss estimator. 
MELO estimators “minimize the posterior expectation of generalized quadratic loss functions for structural coefficients of linear structural models” (Judge et al. 1985, p. 635).

K-class estimators are a class of estimators that depends on a user-specified parameter $k$. A $k$ value less than 1 is recommended but not required. The parameter $k$ can be deterministic or stochastic, but its probability limit must equal 1 for consistent parameter estimates. When all the predetermined variables are listed as instruments, they include all the other single equation estimators supported by PROC SYSLIN. The instance when some of the predetermined variables are not listed among the instruments is not supported by PROC SYSLIN for the general K-class estimation. However, it is supported for the other methods.

For $k = 1$, the K-class estimator is the 2SLS estimator, while for $k = 0$, the K-class estimator is the OLS estimator. The K-class interpretation of LIML is that $k = \lambda$. Note that $k$ is stochastic in the LIML method, unlike for OLS and 2SLS.
MELO is a Bayesian K-class estimator. It yields estimates that can be expressed as a matrix-weighted average of the OLS and 2SLS estimates. MELO estimators have finite second moments and hence finite risk. Other frequently used K-class estimators might not have finite moments under some commonly encountered circumstances, and hence there can be infinite risk relative to quadratic and other loss functions.

One way of comparing K-class estimators is to note that when $k = 1$, the correlation between the regressor and the residual is completely corrected for. In all other cases, it is only partially corrected for.

See “Computational Details” on page 1651 for more details about K-class estimators.
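The K-class discussion above can be tried out with a short run. The following is a minimal sketch, assuming the KLEIN data set that is created in Example 26.1 and assuming that all predetermined variables are listed as instruments; the value 0.5 is an arbitrary illustration, not a recommended setting.

```sas
/* Sketch only: K-class estimation of one equation with k = 0.5.      */
/* Assumes the KLEIN data set from Example 26.1 already exists.       */
/* The value 0.5 is arbitrary; k=1 would reproduce 2SLS and k=0 OLS.  */
proc syslin data=klein k=0.5;
   endogenous c p w i x wsum k y;
   instruments klag plag xlag wp g t yr;
   consume: model c = p plag wsum;
run;
```

Rerunning the step with different K= values is one way to see how the estimates move between the OLS and 2SLS results.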
SUR and 3SLS Estimation Methods

SUR might improve the efficiency of parameter estimates when there is contemporaneous correlation of errors across equations. In practice, the contemporaneous correlation matrix is estimated using OLS residuals. Under two sets of circumstances, SUR parameter estimates are the same as those produced by OLS: when there is no contemporaneous correlation of errors across equations (the estimate of the contemporaneous correlation matrix is diagonal) and when the independent variables are the same across equations.

Theoretically, SUR parameter estimates are always at least as efficient as OLS in large samples, provided that your equations are correctly specified. However, in small samples the need to estimate the covariance matrix from the OLS residuals increases the sampling variability of the SUR estimates. This effect can cause SUR to be less efficient than OLS. If the sample size is small and the cross-equation correlations are small, then OLS is preferred to SUR. The consequences of specification error are also more serious with SUR than with OLS.

The 3SLS method combines the ideas of the 2SLS and SUR methods. Like 2SLS, the 3SLS method uses $\hat{Y}$ instead of $Y$ for endogenous regressors, which results in consistent estimates. Like SUR, the 3SLS method takes the cross-equation error correlations into account to improve large sample efficiency. For 3SLS, the 2SLS residuals are used to estimate the cross-equation error covariance matrix.

The SUR and 3SLS methods can be iterated by recomputing the estimate of the cross-equation covariance matrix from the SUR or 3SLS residuals and then computing new SUR or 3SLS estimates based on this updated covariance matrix estimate. Continuing this iteration until convergence produces ITSUR or IT3SLS estimates.
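The iteration described above is requested simply by naming the iterated method. The sketch below assumes the KLEIN data set and the three behavioral equations from Example 26.1; it is meant only to show where the method keyword goes.

```sas
/* Sketch only: iterated 3SLS on the three Klein behavioral equations. */
/* Assumes the KLEIN data set from Example 26.1 already exists.        */
/* Replacing IT3SLS with 3SLS gives the noniterated estimates.         */
proc syslin data=klein it3sls;
   endogenous c p w i x wsum k y;
   instruments klag plag xlag wp g t yr;
   consume: model c = p plag wsum;
   invest:  model i = p plag klag;
   labor:   model w = x xlag yr;
run;
```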
FIML Estimation Method

The FIML estimator is a system generalization of the LIML estimator. The FIML method involves minimizing the determinant of the covariance matrix associated with residuals of the reduced form of the equation system. From a maximum likelihood standpoint, the LIML method involves assuming that the errors are normally distributed and then maximizing the likelihood function subject to restrictions on a particular equation. FIML is similar, except that the likelihood function is maximized subject to restrictions on all of the parameters in the model, not just those in the equation being estimated.
1650 F Chapter 26: The SYSLIN Procedure
NOTE: The RESTRICT, SRESTRICT, TEST, and STEST statements are not supported when the FIML method is used.
Choosing a Method for Simultaneous Equations

A number of factors should be taken into account in choosing an estimation method. Although system methods are asymptotically most efficient in the absence of specification error, system methods are more sensitive to specification error than single equation methods. In practice, models are never perfectly specified. It is a matter of judgment whether the misspecification is serious enough to warrant avoidance of system methods.

Another factor to consider is sample size. With small samples, 2SLS might be preferred to 3SLS. In general, it is difficult to say much about the small sample properties of K-class estimators because the results depend on the regressors used.

LIML and FIML are invariant to the normalization rule imposed but are computationally more expensive than 2SLS or 3SLS.

If the reason for contemporaneous correlation among errors across equations is a common, omitted variable, it is not necessarily best to apply SUR. SUR parameter estimates are more sensitive to specification error than OLS. OLS might produce better parameter estimates under these circumstances. SUR estimates are also affected by the sampling variation of the error covariance matrix. There is some evidence from Monte Carlo studies that SUR is less efficient than OLS in small samples.
ANOVA Table for Instrumental Variables Methods

In the instrumental variables methods (2SLS, LIML, K-class, MELO), first-stage predicted values are substituted for the endogenous regressors. As a result, the regression sum of squares (RSS) and the error sum of squares (ESS) do not sum to the total corrected sum of squares for the dependent variable (TSS). The analysis-of-variance table included in the second-stage results gives these sums of squares and the mean squares that are used for the F test, but this table is not a variance decomposition in the usual sense.

The F test shown in the instrumental variables case is a valid test of the no-regression hypothesis that the true coefficients of all regressors are 0. However, because of the first-stage projection of the regression mean square, this is a Wald-type test statistic, which is asymptotically F but not exactly F-distributed in finite samples. Thus, for small samples the F test is only approximate when instrumental variables are used.
The R-Square Statistics

As explained in the section “ANOVA Table for Instrumental Variables Methods” on page 1650, when instrumental variables are used, the regression sum of squares (RSS) and the error sum of squares (ESS) do not sum to the total corrected sum of squares. In this case, there are several ways that the $R^2$ statistic can be defined. The definition of $R^2$ used by the SYSLIN procedure is

$$R^2 = \frac{RSS}{RSS + ESS}$$

This definition is consistent with the F test of the null hypothesis that the true coefficients of all regressors are zero. However, this $R^2$ might not be a good measure of the goodness of fit of the model.
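As a worked check of this definition, the arithmetic below uses the analysis-of-variance figures that are reported for the consumption equation in Output 26.1.1 (model sum of squares 854.3541, error sum of squares 40.88419, corrected total 941.4295). It is an illustration of the formula only, not additional output from the procedure.

```latex
\begin{align*}
RSS + ESS &= 854.3541 + 40.88419 = 895.23829 \neq 941.4295 = TSS \\
R^2 &= \frac{RSS}{RSS + ESS} = \frac{854.3541}{895.23829} \approx 0.95433
\end{align*}
```

The result agrees with the R-Square value printed in Output 26.1.1, and the first line shows that the model and error sums of squares do not add up to the corrected total for this LIML fit.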
System Weighted R-Square and System Weighted Mean Squared Error

The system weighted $R^2$, printed for the 3SLS, IT3SLS, SUR, ITSUR, and FIML methods, is computed as follows:

$$R^2 = \frac{Y'WR(X'X)^{-1}R'WY}{Y'WY}$$

In this equation, the matrix $X'X$ is $R'WR$ and $W$ is the projection matrix of the instruments:

$$W = S^{-1} \otimes Z(Z'Z)^{-1}Z'$$

The matrix $Z$ is the instrument set, $R$ is the regressor set, and $S$ is the estimated cross-model covariance matrix.

The system weighted MSE, printed for the 3SLS, IT3SLS, SUR, ITSUR, and FIML methods, is computed as follows:

$$\mathrm{MSE} = \frac{1}{\mathit{tdf}}\left(Y'WY - Y'WR(X'X)^{-1}R'WY\right)$$

In this equation, tdf is the sum of the error degrees of freedom for the equations in the system.
Computational Details

This section discusses various computational details.
Computation of Least Squares-Based Estimators

Let the system be composed of $G$ equations and let the $i$th equation be expressed in this form:

$$y_i = Y_i\beta_i + X_i\gamma_i + u$$

where

   $y_i$        is the vector of observations on the dependent variable
   $Y_i$        is the matrix of observations on the endogenous variables included in the equation
   $\beta_i$    is the vector of parameters associated with $Y_i$
   $X_i$        is the matrix of observations on the predetermined variables included in the equation
   $\gamma_i$   is the vector of parameters associated with $X_i$
   $u$          is a vector of errors

Let $\hat{V}_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ is the projection of $Y_i$ onto the space spanned by the instruments matrix $Z$.

Let

$$\delta_i = \begin{bmatrix} \beta_i \\ \gamma_i \end{bmatrix}$$

be the vector of parameters associated with both the endogenous and exogenous variables.

The K-class of estimators (Theil 1971) is defined by

$$\hat{\delta}_{i,k} = \begin{bmatrix} Y_i'Y_i - k\hat{V}_i'\hat{V}_i & Y_i'X_i \\ X_i'Y_i & X_i'X_i \end{bmatrix}^{-1} \begin{bmatrix} (Y_i - k\hat{V}_i)'y_i \\ X_i'y_i \end{bmatrix}$$

where $k$ is a user-defined value.

Let $R_i = [Y_i \; X_i]$ and $\hat{R}_i = [\hat{Y}_i \; X_i]$. The 2SLS estimator is defined as

$$\hat{\delta}_{i,2SLS} = [\hat{R}_i'\hat{R}_i]^{-1}\hat{R}_i'y_i$$

Let $y$ and $\delta$ be the vectors obtained by stacking the vectors of dependent variables and parameters for all $G$ equations, and let $R$ and $\hat{R}$ be the block diagonal matrices formed by $R_i$ and $\hat{R}_i$, respectively.

The SUR and ITSUR estimators are defined as

$$\hat{\delta}_{(IT)SUR} = \left[R'\left(\hat{\Sigma}^{-1} \otimes I\right)R\right]^{-1} R'\left(\hat{\Sigma}^{-1} \otimes I\right)y$$
while the 3SLS and IT3SLS estimators are defined as

$$\hat{\delta}_{(IT)3SLS} = \left[\hat{R}'\left(\hat{\Sigma}^{-1} \otimes I\right)\hat{R}\right]^{-1} \hat{R}'\left(\hat{\Sigma}^{-1} \otimes I\right)y$$

where $I$ is the identity matrix, and $\hat{\Sigma}$ is an estimator of the cross-equation covariance matrix. For 3SLS, $\hat{\Sigma}$ is obtained from the 2SLS estimation, while for SUR it is derived from the OLS estimation. For IT3SLS and ITSUR, it is obtained iteratively from the previous estimation step, until convergence.
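The special cases of the K-class estimator that are mentioned earlier ($k = 0$ gives OLS, $k = 1$ gives 2SLS) can be checked against the definitions in this section. The sketch below introduces the projection matrix $P_Z = Z(Z'Z)^{-1}Z'$ for illustration and assumes that the columns of $X_i$ are among the columns of $Z$, so that $P_Z X_i = X_i$; both are assumptions of this sketch rather than statements from the text.

```latex
% Notation (illustrative): P_Z = Z(Z'Z)^{-1}Z', \hat{Y}_i = P_Z Y_i, \hat{V}_i = (I - P_Z) Y_i.
% Assumption: P_Z X_i = X_i (the predetermined regressors are instruments).
\begin{align*}
\hat{Y}_i'\hat{V}_i &= Y_i' P_Z (I - P_Z) Y_i = 0
   \quad\Longrightarrow\quad Y_i'Y_i - \hat{V}_i'\hat{V}_i = \hat{Y}_i'\hat{Y}_i \\
X_i'\hat{V}_i &= X_i'(I - P_Z) Y_i = 0
   \quad\Longrightarrow\quad X_i'Y_i = X_i'\hat{Y}_i \\
k = 1:&\quad
\hat{\delta}_{i,1}
 = \begin{bmatrix} \hat{Y}_i'\hat{Y}_i & \hat{Y}_i'X_i \\ X_i'\hat{Y}_i & X_i'X_i \end{bmatrix}^{-1}
   \begin{bmatrix} \hat{Y}_i'y_i \\ X_i'y_i \end{bmatrix}
 = [\hat{R}_i'\hat{R}_i]^{-1}\hat{R}_i'y_i = \hat{\delta}_{i,2SLS} \\
k = 0:&\quad
\hat{\delta}_{i,0}
 = \begin{bmatrix} Y_i'Y_i & Y_i'X_i \\ X_i'Y_i & X_i'X_i \end{bmatrix}^{-1}
   \begin{bmatrix} Y_i'y_i \\ X_i'y_i \end{bmatrix}
 = [R_i'R_i]^{-1}R_i'y_i
\end{align*}
```

The last line is the ordinary least squares estimator for the regressor matrix $R_i$.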
Computation of Standard Errors

The VARDEF= option in the PROC SYSLIN statement controls the denominator used in calculating the cross-equation covariance estimates and the parameter standard errors and covariances. The values of the VARDEF= option and the resulting denominator are as follows:

   N         uses the number of nonmissing observations.
   DF        uses the number of nonmissing observations less the degrees of freedom in the model.
   WEIGHT    uses the sum of the observation weights given by the WEIGHT statement.
   WDF       uses the sum of the observation weights given by the WEIGHT statement less the degrees of freedom in the model.
The VARDEF= option does not affect the model mean squared error, root mean squared error, or $R^2$ statistics. These statistics are always based on the error degrees of freedom, regardless of the VARDEF= option. The VARDEF= option also does not affect the dependent variable coefficient of variation (CV).
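As an illustration of where the option is specified, the following sketch requests the number of nonmissing observations as the denominator for a 3SLS fit. It assumes the KLEIN data set and the equations from Example 26.1; the choice of VARDEF=N is arbitrary.

```sas
/* Sketch only: use N as the denominator for the cross-equation        */
/* covariance estimates and the parameter standard errors.             */
/* Assumes the KLEIN data set from Example 26.1 already exists.        */
proc syslin data=klein 3sls vardef=n;
   endogenous c p w i x wsum k y;
   instruments klag plag xlag wp g t yr;
   consume: model c = p plag wsum;
   invest:  model i = p plag klag;
   labor:   model w = x xlag yr;
run;
```

Comparing this run with one that uses VARDEF=DF should change the standard errors and the cross-equation covariance estimates but, as stated above, not the per-model mean squared error or $R^2$.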
Reduced Form Estimates

The REDUCED option in the PROC SYSLIN statement computes estimates of the reduced form coefficients. The REDUCED option requires that the equation system be square. If there are fewer models than endogenous variables, IDENTITY statements can be used to complete the equation system.

The reduced form coefficients are computed as follows. Represent the equation system, with all endogenous variables moved to the left-hand side of the equations and identities, as

$$BY = \Gamma X$$

Here $B$ is the estimated coefficient matrix for the endogenous variables $Y$, and $\Gamma$ is the estimated coefficient matrix for the exogenous (or predetermined) variables $X$.

The system can be solved for $Y$ as follows, provided $B$ is square and nonsingular:

$$Y = B^{-1}\Gamma X$$

The reduced form coefficients are the matrix $B^{-1}\Gamma$.
Uncorrelated Errors across Equations

The SDIAG option in the PROC SYSLIN statement computes estimates by assuming uncorrelated errors across equations. As a result, when the SDIAG option is used, the 3SLS estimates are identical to 2SLS estimates, and the SUR estimates are the same as the OLS estimates.
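A minimal sketch of the option in use follows. It assumes the KLEIN data set and the equations from Example 26.1 and is intended only to show the placement of SDIAG in the PROC SYSLIN statement.

```sas
/* Sketch only: 3SLS with the cross-equation error covariance matrix   */
/* restricted to be diagonal. Per the text above, the 3SLS estimates   */
/* should then coincide with the 2SLS estimates.                       */
/* Assumes the KLEIN data set from Example 26.1 already exists.        */
proc syslin data=klein 3sls sdiag;
   endogenous c p w i x wsum k y;
   instruments klag plag xlag wp g t yr;
   consume: model c = p plag wsum;
   invest:  model i = p plag klag;
   labor:   model w = x xlag yr;
run;
```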
Overidentification Restrictions

The OVERID option in the MODEL statement can be used to test for overidentifying restrictions on parameters of each equation. The null hypothesis is that the predetermined variables that do not appear in any equation have zero coefficients. The alternative hypothesis is that at least one of the assumed zero coefficients is nonzero. The test is approximate and rejects the null hypothesis too frequently for small sample sizes.

The formula for the test is given as follows. Let $y_i = \beta_i Y_i + \gamma_i Z_i + e_i$ be the $i$th equation. $Y_i$ are the endogenous variables that appear as regressors in the $i$th equation, and $Z_i$ are the instrumental variables that appear as regressors in the $i$th equation. Let $N_i$ be the number of variables in $Y_i$ and $Z_i$.

Let $v_i = y_i - Y_i\hat{\beta}_i$. Let $Z$ represent all instrumental variables, $T$ be the total number of observations, and $K$ be the total number of instrumental variables. Define $\hat{l}$ as follows:

$$\hat{l} = \frac{v_i'\left(I - Z_i(Z_i'Z_i)^{-1}Z_i'\right)v_i}{v_i'\left(I - Z(Z'Z)^{-1}Z'\right)v_i}$$

Then the test statistic

$$\frac{T - K}{K - N_i}\left(\hat{l} - 1\right)$$

is distributed approximately as an F with $K - N_i$ and $T - K$ degrees of freedom. See Basmann (1960) for more information.
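The test described above is requested per equation. The following sketch assumes the KLEIN data set from Example 26.1 and uses 2SLS as an arbitrary choice of instrumental variables method.

```sas
/* Sketch only: request the test of overidentifying restrictions for   */
/* the consumption equation.                                           */
/* Assumes the KLEIN data set from Example 26.1 already exists.        */
proc syslin data=klein 2sls;
   endogenous c p w i x wsum k y;
   instruments klag plag xlag wp g t yr;
   consume: model c = p plag wsum / overid;
run;
```

Because the test tends to reject too often in small samples, a rejection with only about 20 annual observations should be read with caution.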
Fuller’s Modification to LIML

The ALPHA= option in the PROC SYSLIN and MODEL statements parameterizes Fuller’s modification to LIML. This modification is $k = \lambda - (\alpha/(n - g))$, where $\alpha$ is the value of the ALPHA= option, $\lambda$ is the LIML $k$ value, $n$ is the number of observations, and $g$ is the number of predetermined variables. Fuller’s modification is not used unless the ALPHA= option is specified. See Fuller (1977) for more information.
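The following sketch shows the option on the PROC SYSLIN statement. It assumes the KLEIN data set from Example 26.1; the value ALPHA=1 is an illustrative choice and is not taken from this section.

```sas
/* Sketch only: LIML with Fuller's modification.                       */
/* ALPHA=1 is an illustrative value, not a recommendation.             */
/* Assumes the KLEIN data set from Example 26.1 already exists.        */
proc syslin data=klein liml alpha=1;
   endogenous c p w i x wsum k y;
   instruments klag plag xlag wp g t yr;
   consume: model c = p plag wsum;
run;
```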
Missing Values

Observations that have a missing value for any variable in the analysis are excluded from the computations.
OUT= Data Set

The output SAS data set produced by the OUT= option in the PROC SYSLIN statement contains all the variables in the input data set and the variables that contain predicted values and residuals specified by OUTPUT statements.

The residuals are computed as actual values minus predicted values. Predicted values never use lags of other predicted values, as would be desirable for dynamic simulation. For these applications, PROC SIMLIN is available to predict or simulate values from the estimated equations.
OUTEST= Data Set

The OUTEST= option produces a TYPE=EST output SAS data set that contains estimates from the regressions. The variables in the OUTEST= data set are as follows:

   BY variables   identifies the BY statement variables that are included in the OUTEST= data set.

   _TYPE_         identifies the estimation type for the observations. The _TYPE_ value INST indicates first-stage regression estimates. Other values indicate the estimation method used: 2SLS indicates two-stage least squares results, 3SLS indicates three-stage least squares results, LIML indicates limited information maximum likelihood results, and so forth. Observations added by IDENTITY statements have the _TYPE_ value IDENTITY.

   _STATUS_       identifies the convergence status of the estimation. _STATUS_ equals 0 when convergence criteria are met. Otherwise, _STATUS_ equals 1 when the estimation converges with a note, 2 when it converges with a warning, or 3 when it fails to converge.

   _MODEL_        identifies the model label. The model label is the label specified in the MODEL statement or the dependent variable name if no label is specified. For first-stage regression estimates, _MODEL_ has the value FIRST.

   _DEPVAR_       identifies the name of the dependent variable for the model.

   _NAME_         identifies the names of the regressors for the rows of the covariance matrix, if the COVOUT option is specified. _NAME_ has a blank value for the parameter estimates observations. The _NAME_ variable is not included in the OUTEST= data set unless the COVOUT option is used to output the covariance of parameter estimates matrix.

   _SIGMA_        contains the root mean squared error for the model, which is an estimate of the standard deviation of the error term. The _SIGMA_ variable contains the same values reported as Root MSE in the printed output.

   INTERCEPT      identifies the intercept parameter estimates.

   regressors     identifies the regressor variables from all the MODEL statements that are included in the OUTEST= data set. Variables used in IDENTITY statements are also included in the OUTEST= data set.
The parameter estimates are stored under the names of the regressor variables. The intercept parameters are stored in the variable INTERCEPT. The dependent variable of the model is given a coefficient of -1. Variables that are not in a model have missing values for the OUTEST= observations for that model.

Some estimation methods require computation of preliminary estimates. All estimates computed are output to the OUTEST= data set. For each BY group and each estimation, the OUTEST= data set contains one observation for each MODEL or IDENTITY statement. Results for different estimations are identified by the _TYPE_ variable.

For example, consider the following statements:

   proc syslin data=a outest=est 3sls;
      by b;
      endogenous y1 y2;
      instruments x1-x4;
      model y1 = y2 x1 x2;
      model y2 = y1 x3 x4;
      identity x1 = x3 + x4;
   run;
The 3SLS method requires both a preliminary 2SLS stage and preliminary first-stage regressions for the endogenous variable. The OUTEST= data set thus contains three different kinds of estimates. The observations for the first-stage regression estimates have the _TYPE_ value INST. The observations for the 2SLS estimates have the _TYPE_ value 2SLS. The observations for the final 3SLS estimates have the _TYPE_ value 3SLS. Since there are two endogenous variables in this example, there are two first-stage regressions and two _TYPE_=INST observations in the OUTEST= data set. Since there are two model statements, there are two OUTEST= observations with _TYPE_=2SLS and two observations with _TYPE_=3SLS. In addition, the OUTEST= data set contains an observation with the _TYPE_ value IDENTITY that contains the coefficients specified by the IDENTITY statement. All these observations are repeated for each BY group in the input data set defined by the values of the BY variable B. When the COVOUT option is specified, the estimated covariance matrix for the parameter estimates is included in the OUTEST= data set. Each observation for parameter estimates is followed by observations that contain the rows of the parameter covariance matrix for that model. The row of the covariance matrix is identified by the variable _NAME_. For observations that contain parameter estimates, _NAME_ is blank. For covariance observations, _NAME_ contains the regressor name for the row of the covariance matrix and the regressor variables contain the covariances. See Example 26.1 for an example of the OUTEST= data set.
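The COVOUT layout described above can be inspected with a short run. The sketch below assumes the KLEIN data set from Example 26.1; EST_COV is a hypothetical data set name chosen for this illustration.

```sas
/* Sketch only: write parameter estimates and their covariance matrix  */
/* to an OUTEST= data set. EST_COV is a hypothetical name.             */
/* Assumes the KLEIN data set from Example 26.1 already exists.        */
proc syslin data=klein 2sls outest=est_cov covout;
   endogenous c p w i x wsum k y;
   instruments klag plag xlag wp g t yr;
   consume: model c = p plag wsum;
run;

/* Rows with a blank _NAME_ hold estimates; the other rows hold        */
/* covariance matrix rows identified by the regressor name in _NAME_.  */
proc print data=est_cov;
   var _type_ _model_ _name_ intercept p plag wsum;
run;
```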
OUTSSCP= Data Set

The OUTSSCP= option produces a TYPE=SSCP output SAS data set that contains sums of squares and cross products. The data set contains all variables used in the MODEL, IDENTITY, and VAR statements. Observations are identified by the variable _NAME_.

The OUTSSCP= data set can be useful when a large number of observations are to be explored in many different SYSLIN runs. The sum-of-squares-and-crossproducts matrix can be saved with the OUTSSCP= option and used as the DATA= data set on subsequent SYSLIN runs. This is much less expensive computationally because PROC SYSLIN never reads the original data again. In the step that creates the OUTSSCP= data set, include in the VAR statement all the variables you expect to use.
Printed Output

The printed output produced by the SYSLIN procedure is as follows:

1. If the SIMPLE option is used, a table of descriptive statistics is printed that shows the sum, mean, sum of squares, variance, and standard deviation for all the variables used in the models.

2. If the FIRST option is specified and an instrumental variables method is used, first-stage regression results are printed. The results show the regression of each endogenous variable on the variables in the INSTRUMENTS list.

3. The results of the second-stage regression are printed for each model. (See the following section “Printed Output for Each Model” on page 1657 for details.)

4. If a systems method like 3SLS, SUR, or FIML is used, the cross-equation error covariance matrix is printed. This matrix is shown four ways: the covariance matrix itself, the correlation matrix form, the inverse of the correlation matrix, and the inverse of the covariance matrix.

5. If a systems method like 3SLS, SUR, or FIML is used, the system weighted mean squared error and system weighted $R^2$ statistics are printed. The system weighted MSE and $R^2$ measure the fit of the joint model obtained by stacking all the models together and performing a single regression with the stacked observations weighted by the inverse of the model error variances.

6. If a systems method like 3SLS, SUR, or FIML is used, the final results are printed for each model.

7. If the REDUCED option is used, the reduced form coefficients are printed. These consist of the structural coefficient matrix for the endogenous variables, the structural coefficient matrix for the exogenous variables, the inverse of the endogenous coefficient matrix, and the reduced form coefficient matrix. The reduced form coefficient matrix is the product of the inverse of the endogenous coefficient matrix and the exogenous structural coefficient matrix.
Printed Output for Each Model

The results printed for each model include the analysis-of-variance table, the “Parameter Estimates” table, and optional items requested by TEST statements or by options in the MODEL statement.
The printed output produced for each model is described in the following.

The analysis-of-variance table includes the following:

• the model degrees of freedom, sum of squares, and mean square
• the error degrees of freedom, sum of squares, and mean square. The error mean square is computed by dividing the error sum of squares by the error degrees of freedom and is not affected by the VARDEF= option.
• the corrected total degrees of freedom and total sum of squares. Note that for instrumental variables methods, the model and error sums of squares do not add to the total sum of squares.
• the F ratio, labeled “F Value,” and its significance, labeled “Pr > F,” for the test of the hypothesis that all the nonintercept parameters are 0
• the root mean squared error. This is the square root of the error mean square.
• the dependent variable mean
• the coefficient of variation (CV) of the dependent variable
• the $R^2$ statistic. This $R^2$ is computed consistently with the calculation of the F statistic. It is valid for hypothesis tests but might not be a good measure of fit for models estimated by instrumental variables methods.
• the $R^2$ statistic adjusted for model degrees of freedom, labeled “Adj R-Sq”

The “Parameter Estimates” table includes the following:

• estimates of parameters for regressors in the model and the Lagrangian parameter for each restriction specified
• a degrees of freedom column labeled DF. Estimated model parameters have 1 degree of freedom. Restrictions have a DF of -1. Regressors or restrictions dropped from the model due to collinearity have a DF of 0.
• the standard errors of the parameter estimates
• the t statistics, which are the parameter estimates divided by the standard errors
• the significance of the t tests for the hypothesis that the true parameter is 0, labeled “Pr > |t|.” As previously noted, the significance tests are strictly valid in finite samples only for OLS estimates but are asymptotically valid for the other methods.
• the standardized regression coefficients, if the STB option is specified. This is the parameter estimate multiplied by the ratio of the standard deviation of the regressor to the standard deviation of the dependent variable.
• the labels of the regressor variables or restriction labels
In addition to the analysis-of-variance table and the “Parameter Estimates” table, the results printed for each model can include the following:

• If TEST statements are specified, the test results are printed.
• If the DW option is specified, the Durbin-Watson statistic and first-order autocorrelation coefficient are printed.
• If the OVERID option is specified, the results of Basmann’s test for overidentifying restrictions are printed.
• If the PLOT option is used, plots of residual against each regressor are printed.
• If the COVB or CORRB options are specified, the results for each model also include the covariance or correlation matrix of the parameter estimates. For systems methods like 3SLS and FIML, the COVB and CORRB output is printed for the whole system after the output for the last model, instead of separately for each model.

The third-stage output for 3SLS, SUR, IT3SLS, ITSUR, and FIML does not include the analysis-of-variance table. When a systems method is used, the second-stage output does not include the optional output, except for the COVB and CORRB matrices.
ODS Table Names

PROC SYSLIN assigns a name to each table it creates. You can use these names to reference the table when you use the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. If the estimation method used is 3SLS, IT3SLS, ITSUR or SUR, you can obtain tables by specifying ODS OUTPUT CorrResiduals, InvCorrResiduals, InvCovResiduals.

Table 26.2 ODS Tables Produced in PROC SYSLIN

ODS Table Name       Description                                   Option
ANOVA                Summary of the SSE, MSE for the equations     default
AugXPXMat            Model crossproducts                           XPX or USSCP
AutoCorrStat         Autocorrelation statistics                    DW
ConvergenceStatus    Convergence status                            default
CorrB                Correlations of parameters                    CORRB
CorrResiduals        Correlations of residuals
CovB                 Covariance of parameters                      COVB
CovResiduals         Covariance of residuals
EndoMat              Endogenous variables                          REDUCED
ExogMat              Exogenous variables                           REDUCED
FitStatistics        Statistics of fit                             default
InvCorrResiduals     Inverse correlations of residuals
InvCovResiduals      Inverse covariance of residuals
InvEndoMat           Inverse endogenous variables                  REDUCED
InvXPX               X'X inverse for system                        I
IterHistory          Iteration printing                            ITPRINT
MissingValues        Missing values generated by the program       default
ModelVars            Name and label for the model                  default
ParameterEstimates   Parameter estimates                           default
RedMat               Reduced form                                  REDUCED
SimpleStatistics     Descriptive statistics                        SIMPLE
SSCP                 Model crossproducts                           XPX or USSCP
TestResults          Test for overidentifying restrictions
Weight               Weighted model statistics
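As an illustration of how the table names are used, the following sketch captures two of the tables as data sets. It assumes the KLEIN data set and the equations from Example 26.1; PE and CORR_RES are hypothetical data set names chosen for this illustration.

```sas
/* Sketch only: save the parameter estimates and the residual          */
/* correlation matrix from a 3SLS fit as SAS data sets.                */
/* PE and CORR_RES are hypothetical names.                             */
/* Assumes the KLEIN data set from Example 26.1 already exists.        */
ods output ParameterEstimates=pe CorrResiduals=corr_res;

proc syslin data=klein 3sls;
   endogenous c p w i x wsum k y;
   instruments klag plag xlag wp g t yr;
   consume: model c = p plag wsum;
   invest:  model i = p plag klag;
   labor:   model w = x xlag yr;
run;

proc print data=pe;
run;
```

The ODS OUTPUT statement must precede the procedure step whose tables it captures.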
ODS Graphics

This section describes the use of ODS for creating graphics with the SYSLIN procedure.

ODS Graph Names

PROC SYSLIN assigns a name to each graph it creates using ODS. You can use these names to reference the graphs when you use ODS. The names are listed in Table 26.3. To request these graphs, you must specify the ODS GRAPHICS statement.

Table 26.3 ODS Graphics Produced by PROC SYSLIN

ODS Graph Name       Plot Description
ActualByPredicted    Predicted versus actual plot
QQPlot               Q-Q plot of residuals
ResidualHistogram    Histogram of the residuals
ResidualPlot         Residual plot
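A minimal sketch of enabling ODS Graphics around a SYSLIN step follows. It assumes the KLEIN data set from Example 26.1 and that the graphs listed in Table 26.3 are produced for this method once ODS Graphics is enabled; which graphs actually appear can depend on the release and the options in effect.

```sas
/* Sketch only: enable ODS Graphics for one PROC SYSLIN step.          */
/* Assumes the KLEIN data set from Example 26.1 already exists.        */
ods graphics on;

proc syslin data=klein 2sls;
   endogenous c p w i x wsum k y;
   instruments klag plag xlag wp g t yr;
   consume: model c = p plag wsum;
run;

ods graphics off;
```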
Examples: SYSLIN Procedure
Example 26.1: Klein’s Model I Estimated with LIML and 3SLS

This example uses PROC SYSLIN to estimate the classic Klein Model I. For a discussion of this model, see Theil (1971). The following statements read the data.

   *---------------------------Klein's Model I----------------------------*
   | By L.R. Klein, Economic Fluctuations in the United States, 1921-1941 |
   | (1950), NY: John Wiley. A macro-economic model of the U.S. with      |
   | three behavioral equations, and several identities. See Theil, p.456.|
   *----------------------------------------------------------------------*;
   data klein;
      input year c p w i x wp g t k wsum;
      date=mdy(1,1,year);
      format date monyy.;
      y   =c+i+g-t;
      yr  =year-1931;
      klag=lag(k);
      plag=lag(p);
      xlag=lag(x);
      label year='Year'
            date='Date'
            c   ='Consumption'
            p   ='Profits'
            w   ='Private Wage Bill'
            i   ='Investment'
            k   ='Capital Stock'
            y   ='National Income'
            x   ='Private Production'
            wsum='Total Wage Bill'
            wp  ='Govt Wage Bill'
            g   ='Govt Demand'
            i   ='Taxes'
            klag='Capital Stock Lagged'
            plag='Profits Lagged'
            xlag='Private Product Lagged'
            yr  ='YEAR-1931';
      datalines;
   1920 . 12.7 . . 44.9 . . . 182.8 .
   1921 41.9 12.4 25.5 -0.2 45.6 2.7 3.9 7.7 182.6 28.2
   1922 45.0 16.9 29.3 1.9 50.1 2.9 3.2 3.9 184.5 32.2
   1923 49.2 18.4 34.1 5.2 57.2 2.9 2.8 4.7 189.7 37.0
   1924 50.6 19.4 33.9 3.0 57.1 3.1 3.5 3.8 192.7 37.0
   1925 52.6 20.1 35.4 5.1 61.0 3.2 3.3 5.5 197.8 38.6
   1926 55.1 19.6 37.4 5.6 64.0 3.3 3.3 7.0 203.4 40.7
   1927 56.2 19.8 37.9 4.2 64.4 3.6 4.0 6.7 207.6 41.5
   1928 57.3 21.1 39.2 3.0 64.5 3.7 4.2 4.2 210.6 42.9
   1929 57.8 21.7 41.3 5.1 67.0 4.0 4.1 4.0 215.7 45.3
... more lines ...
The following statements estimate the Klein model using the limited information maximum likelihood method. In addition, the parameter estimates are written to a SAS data set with the OUTEST= option.

   proc syslin data=klein outest=b liml;
      endogenous c p w i x wsum k y;
      instruments klag plag xlag wp g t yr;
      consume: model c = p plag wsum;
      invest:  model i = p plag klag;
      labor:   model w = x xlag yr;
   run;

   proc print data=b;
   run;
The PROC SYSLIN estimates are shown in Output 26.1.1 through Output 26.1.3.

Output 26.1.1 LIML Estimates for Consumption

The SYSLIN Procedure
Limited-Information Maximum Likelihood Estimation

Model                 CONSUME
Dependent Variable    c
Label                 Consumption

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3         854.3541      284.7847    118.42   <.0001
Error              17         40.88419      2.404952
Corrected Total    20         941.4295

Root MSE           1.55079    R-Square    0.95433
Dependent Mean    53.99524    Adj R-Sq    0.94627
Coeff Var          2.87209

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             17.14765         2.045374      8.38     <.0001   Intercept
p            1             -0.22251         0.224230     -0.99     0.3349   Profits
plag         1             0.396027         0.192943      2.05     0.0558   Profits Lagged
wsum         1             0.822559         0.061549     13.36     <.0001   Total Wage Bill
Output 26.1.2 LIML Estimates for Investments

The SYSLIN Procedure
Limited-Information Maximum Likelihood Estimation

Model                 INVEST
Dependent Variable    i
Label                 Taxes

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3         210.3790      70.12634     34.06   <.0001
Error              17         34.99649      2.058617
Corrected Total    20         252.3267

Root MSE            1.43479    R-Square    0.85738
Dependent Mean      1.26667    Adj R-Sq    0.83221
Coeff Var         113.27274

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             22.59083         9.498146      2.38     0.0294   Intercept
p            1             0.075185         0.224712      0.33     0.7420   Profits
plag         1             0.680386         0.209145      3.25     0.0047   Profits Lagged
klag         1             -0.16826         0.045345     -3.71     0.0017   Capital Stock Lagged
Output 26.1.3 LIML Estimates for Labor

The SYSLIN Procedure
Limited-Information Maximum Likelihood Estimation

Model                 LABOR
Dependent Variable    w
Label                 Private Wage Bill

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3         696.1485      232.0495    393.62   <.0001
Error              17         10.02192      0.589525
Corrected Total    20         794.9095

Root MSE           0.76781    R-Square    0.98581
Dependent Mean    36.36190    Adj R-Sq    0.98330
Coeff Var          2.11156

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             1.526187         1.320838      1.16     0.2639   Intercept
x            1             0.433941         0.075507      5.75     <.0001   Private Production
xlag         1             0.151321         0.074527      2.03     0.0583   Private Product Lagged
yr           1             0.131593         0.035995      3.66     0.0020   YEAR-1931
The OUTEST= data set is shown in part in Output 26.1.4. Note that the data set contains the parameter estimates and root mean squared errors, _SIGMA_, for the first-stage instrumental regressions as well as the parameter estimates and root mean squared errors for the LIML estimates for the three structural equations.

Output 26.1.4 The OUTEST= Data Set

Obs   _TYPE_   _STATUS_      _MODEL_   _DEPVAR_   _SIGMA_   Intercept       klag      plag
  1   LIML     0 Converged   CONSUME   c          1.55079     17.1477          .   0.39603
  2   LIML     0 Converged   INVEST    i          1.43479     22.5908   -0.16826   0.68039
  3   LIML     0 Converged   LABOR     w          0.76781      1.5262          .         .

Obs      xlag   wp   g   t        yr    c          p    w    i         x      wsum   k   y
  1         .    .   .   .         .   -1   -0.22251    .    .         .   0.82256   .   .
  2         .    .   .   .         .    .    0.07518    .   -1         .         .   .   .
  3   0.15132    .   .   .   0.13159    .          .   -1    .   0.43394         .   .   .
The following statements estimate the model using the 3SLS method. The reduced form estimates are produced by the REDUCED option; IDENTITY statements are used to make the model complete.

   proc syslin data=klein 3sls reduced;
      endogenous c p w i x wsum k y;
      instruments klag plag xlag wp g t yr;
      consume: model c = p plag wsum;
      invest:  model i = p plag klag;
      labor:   model w = x xlag yr;
      product: identity x = c + i + g;
      income:  identity y = c + i + g - t;
      profit:  identity p = y - w;
      stock:   identity k = klag + i;
      wage:    identity wsum = w + wp;
   run;
The preliminary 2SLS results and estimated cross-model covariance matrix are not shown. The 3SLS estimates are shown in Output 26.1.5 through Output 26.1.7. The reduced form estimates are shown in Output 26.1.8 through Output 26.1.11.
Output 26.1.5 3SLS Estimates for Consumption

The SYSLIN Procedure
Three-Stage Least Squares Estimation

System Weighted MSE         5.9342
Degrees of freedom              51
System Weighted R-Square    0.9550

Model                 CONSUME
Dependent Variable    c
Label                 Consumption

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             16.44079         1.449925     11.34     <.0001   Intercept
p            1             0.124890         0.120179      1.04     0.3133   Profits
plag         1             0.163144         0.111631      1.46     0.1621   Profits Lagged
wsum         1             0.790081         0.042166     18.74     <.0001   Total Wage Bill
Output 26.1.6 3SLS Estimates for Investments

Model                 INVEST
Dependent Variable    i
Label                 Taxes

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             28.17785         7.550853      3.73     0.0017   Intercept
p            1             -0.01308         0.179938     -0.07     0.9429   Profits
plag         1             0.755724         0.169976      4.45     0.0004   Profits Lagged
klag         1             -0.19485         0.036156     -5.39     <.0001   Capital Stock Lagged
Output 26.1.7 3SLS Estimates for Labor

Model                 LABOR
Dependent Variable    w
Label                 Private Wage Bill

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             1.797218         1.240203      1.45     0.1655   Intercept
x            1             0.400492         0.035359     11.33     <.0001   Private Production
xlag         1             0.181291         0.037965      4.78     0.0002   Private Product Lagged
yr           1             0.149674         0.031048      4.82     0.0002   YEAR-1931
Output 26.1.8 Reduced Form Estimates

Endogenous Variables

                 c           p           w           i
CONSUME          1    -0.12489           0           0
INVEST           0    0.013079           0           1
LABOR            0           0           1           0
PRODUCT         -1           0           0          -1
INCOME          -1           0           0          -1
PROFIT           0           1           1           0
STOCK            0           0           0          -1
WAGE             0           0          -1           0

Endogenous Variables

                 x        wsum           k           y
CONSUME          0    -0.79008           0           0
INVEST           0           0           0           0
LABOR     -0.40049           0           0           0
PRODUCT          1           0           0           0
INCOME           0           0           0           1
PROFIT           0           0           0          -1
STOCK            0           0           1           0
WAGE             0           1           0           0
Output 26.1.9 Reduced Form Estimates

Exogenous Variables

          Intercept        plag        klag        xlag
CONSUME    16.44079    0.163144           0           0
INVEST     28.17785    0.755724    -0.19485           0
LABOR      1.797218           0           0    0.181291
PRODUCT           0           0           0           0
INCOME            0           0           0           0
PROFIT            0           0           0           0
STOCK             0           0           1           0
WAGE              0           0           0           0

Exogenous Variables

                 yr           g           t          wp
CONSUME           0           0           0           0
INVEST            0           0           0           0
LABOR      0.149674           0           0           0
PRODUCT           0           1           0           0
INCOME            0           1          -1           0
PROFIT            0           0           0           0
STOCK             0           0           0           0
WAGE              0           0           0           1
Output 26.1.10 Reduced Form Estimates

Inverse Endogenous Variables

         CONSUME      INVEST       LABOR     PRODUCT
c       1.634654    0.634654    1.095657    0.438802
p       0.972364    0.972364    -0.34048    -0.13636
w       0.649572    0.649572    1.440585    0.576943
i       -0.01272    0.987282    0.004453    0.001783
x       1.621936    1.621936     1.10011    1.440585
wsum    0.649572    0.649572    1.440585    0.576943
k       -0.01272    0.987282    0.004453    0.001783
y       1.621936    1.621936     1.10011    0.440585

Inverse Endogenous Variables

          INCOME      PROFIT       STOCK        WAGE
c       0.195852    0.195852           0    1.291509
p       1.108721    1.108721           0    0.768246
w       0.072629    0.072629           0    0.513215
i        -0.0145     -0.0145           0    -0.01005
x       0.181351    0.181351           0    1.281461
wsum    0.072629    0.072629           0    1.513215
k        -0.0145     -0.0145           1    -0.01005
y       1.181351    0.181351           0    1.281461
Output 26.1.11 Reduced Form Estimates

Reduced Form

       Intercept        plag        klag        xlag
c        46.7273    0.746307    -0.12366    0.198633
p       42.77363    0.893474    -0.18946    -0.06173
w       31.57207    0.596871    -0.12657    0.261165
i        27.6184    0.744038    -0.19237    0.000807
x        74.3457    1.490345    -0.31603     0.19944
wsum    31.57207    0.596871    -0.12657    0.261165
k        27.6184    0.744038     0.80763    0.000807
y        74.3457    1.490345    -0.31603     0.19944

Reduced Form

              yr           g           t          wp
c       0.163991    0.634654    -0.19585    1.291509
p       -0.05096    0.972364    -1.10872    0.768246
w       0.215618    0.649572    -0.07263    0.513215
i       0.000667    -0.01272    0.014501    -0.01005
x       0.164658    1.621936    -0.18135    1.281461
wsum    0.215618    0.649572    -0.07263    1.513215
k       0.000667    -0.01272    0.014501    -0.01005
y       0.164658    1.621936    -1.18135    1.281461
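The numbers in Output 26.1.9 through Output 26.1.11 are consistent with each reduced form coefficient being the product of a row of the inverse endogenous matrix and a column of the exogenous matrix. The following DATA step is a minimal sketch that checks one entry, the Intercept coefficient for c; the variable names are illustrative, and only the three behavioral equations contribute because the identities have zero Intercept entries.

```sas
/* Check one reduced form coefficient by hand (illustrative names). */
data _null_;
   /* Row c of Output 26.1.10: columns CONSUME, INVEST, LABOR */
   inv_consume = 1.634654;
   inv_invest  = 0.634654;
   inv_labor   = 1.095657;

   /* Intercept column of Output 26.1.9: rows CONSUME, INVEST, LABOR */
   b0_consume = 16.44079;
   b0_invest  = 28.17785;
   b0_labor   = 1.797218;

   /* Inner product; the identity rows add nothing to this entry */
   rf_c_intercept = inv_consume*b0_consume
                  + inv_invest*b0_invest
                  + inv_labor*b0_labor;

   put rf_c_intercept= 10.4;
run;
```

The result is approximately 46.727, which matches the Intercept entry for c in Output 26.1.11.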
Example 26.2: Grunfeld’s Model Estimated with SUR

The following example was used by Zellner in his classic 1962 paper on seemingly unrelated regressions. Different stock prices often move in the same direction at a given point in time. The SUR technique might provide more efficient estimates than OLS in this situation. The following statements read the data. (The prefix GE stands for General Electric and WH stands for Westinghouse.)

   *---------Zellner's Seemingly Unrelated Technique------------*
   | A. Zellner, "An Efficient Method of Estimating Seemingly   |
   | Unrelated Regressions and Tests for Aggregation Bias,"     |
   | JASA 57(1962) pp.348-364                                   |
   |                                                            |
   | J.C.G. Boot, "Investment Demand: an Empirical Contribution |
   | to the Aggregation Problem," IER 1(1960) pp.3-30.          |
   |                                                            |
   | Y. Grunfeld, "The Determinants of Corporate Investment,"   |
   | Unpublished thesis, Chicago, 1958                          |
   *------------------------------------------------------------*;
   data grunfeld;
      input year ge_i ge_f ge_c wh_i wh_f wh_c;
      label ge_i = 'Gross Investment, GE'
            ge_c = 'Capital Stock Lagged, GE'
            ge_f = 'Value of Outstanding Shares Lagged, GE'
            wh_i = 'Gross Investment, WH'
            wh_c = 'Capital Stock Lagged, WH'
            wh_f = 'Value of Outstanding Shares Lagged, WH';
   datalines;
   ... more lines ...
The following statements compute the SUR estimates for the Grunfeld model.

   proc syslin data=grunfeld sur;
      ge:      model ge_i = ge_f ge_c;
      westing: model wh_i = wh_f wh_c;
   run;
The PROC SYSLIN output is shown in Output 26.2.1 through Output 26.2.5.

Output 26.2.1 PROC SYSLIN Output for SUR

The SYSLIN Procedure
Ordinary Least Squares Estimation

Model                 GE
Dependent Variable    ge_i
Label                 Gross Investment, GE

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2         31632.03      15816.02     20.34   <.0001
Error              17         13216.59      777.4463
Corrected Total    19         44848.62

Root MSE           27.88272    R-Square   0.70531
Dependent Mean    102.29000    Adj R-Sq   0.67064
Coeff Var          27.25850

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -9.95631         31.37425     -0.32     0.7548   Intercept
ge_f         1             0.026551         0.015566      1.71     0.1063   Value of Outstanding Shares Lagged, GE
ge_c         1             0.151694         0.025704      5.90     <.0001   Capital Stock Lagged, GE
Output 26.2.2 PROC SYSLIN Output for SUR

The SYSLIN Procedure
Ordinary Least Squares Estimation

Model                 WESTING
Dependent Variable    wh_i
Label                 Gross Investment, WH

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2         5165.553      2582.776     24.76   <.0001
Error              17         1773.234      104.3079
Corrected Total    19         6938.787

Root MSE           10.21312    R-Square   0.74445
Dependent Mean     42.89150    Adj R-Sq   0.71438
Coeff Var          23.81153

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -0.50939         8.015289     -0.06     0.9501   Intercept
wh_f         1             0.052894         0.015707      3.37     0.0037   Value of Outstanding Shares Lagged, WH
wh_c         1             0.092406         0.056099      1.65     0.1179   Capital Stock Lagged, WH
Output 26.2.3 PROC SYSLIN Output for SUR

The SYSLIN Procedure
Seemingly Unrelated Regression Estimation

Cross Model Covariance

                 GE     WESTING
GE          777.446     207.587
WESTING     207.587     104.308

Cross Model Correlation

                 GE     WESTING
GE          1.00000     0.72896
WESTING     0.72896     1.00000

Cross Model Inverse Correlation

                 GE     WESTING
GE          2.13397    -1.55559
WESTING    -1.55559     2.13397

Cross Model Inverse Covariance

                 GE     WESTING
GE         0.002745    -.005463
WESTING    -.005463    0.020458
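The four matrices in Output 26.2.3 are related by elementary formulas for a 2 x 2 matrix, and the diagonal of the cross-model covariance matrix repeats the OLS error mean squares from Output 26.2.1 and Output 26.2.2. The following DATA step is a minimal sketch with illustrative variable names that rederives the correlation and the inverse covariance from the printed covariance entries.

```sas
/* Rederive Output 26.2.3 entries from the cross-model covariance matrix. */
/* Variable names are illustrative; inputs are copied from the output.    */
data _null_;
   s11 = 777.446;   /* GE variance            */
   s12 = 207.587;   /* GE-WESTING covariance  */
   s22 = 104.308;   /* WESTING variance       */

   /* Correlation is covariance scaled by both standard deviations */
   corr = s12 / sqrt(s11 * s22);

   /* Inverse of a symmetric 2 x 2 matrix via its determinant */
   det   = s11 * s22 - s12**2;
   inv11 =  s22 / det;
   inv12 = -s12 / det;
   inv22 =  s11 / det;

   put corr= 8.5 inv11= 9.6 inv12= 9.6 inv22= 9.6;
run;
```

The logged values agree with the printed correlation 0.72896 and the inverse covariance entries 0.002745, -.005463, and 0.020458.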
Output 26.2.4 PROC SYSLIN Output for SUR

System Weighted MSE         0.9719
Degrees of freedom              34
System Weighted R-Square    0.6284

Model                 GE
Dependent Variable    ge_i
Label                 Gross Investment, GE

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -27.7193         29.32122     -0.95     0.3577   Intercept
ge_f         1             0.038310         0.014415      2.66     0.0166   Value of Outstanding Shares Lagged, GE
ge_c         1             0.139036         0.024986      5.56     <.0001   Capital Stock Lagged, GE
Output 26.2.5 PROC SYSLIN Output for SUR

Model                 WESTING
Dependent Variable    wh_i
Label                 Gross Investment, WH

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variable Label
Intercept    1             -1.25199         7.545217     -0.17     0.8702   Intercept
wh_f         1             0.057630         0.014546      3.96     0.0010   Value of Outstanding Shares Lagged, WH
wh_c         1             0.063978         0.053041      1.21     0.2443   Capital Stock Lagged, WH
Example 26.3: Illustration of ODS Graphics

This example illustrates the use of ODS graphics. It is a continuation of the section “Example 26.1: Klein’s Model I Estimated with LIML and 3SLS” on page 1661. The graphical displays are requested by specifying the ODS GRAPHICS statement before running PROC SYSLIN. For information about the graphics available in the SYSLIN procedure, see the section “ODS Graphics” on page 1660.

The following statements show how to generate ODS graphics plots with the SYSLIN procedure. The plots of residuals for each of the equations in the model are displayed in Output 26.3.1 through Output 26.3.3.

   *---------------------------Klein's Model I----------------------------*
   | By L.R. Klein, Economic Fluctuations in the United States, 1921-1941 |
   | (1950), NY: John Wiley. A macro-economic model of the U.S. with      |
   | three behavioral equations, and several identities. See Theil, p.456.|
   *----------------------------------------------------------------------*;
   data klein;
      input year c p w i x wp g t k wsum;
      date=mdy(1,1,year);
      format date monyy.;
      y   =c+i+g-t;
      yr  =year-1931;
      klag=lag(k);
      plag=lag(p);
      xlag=lag(x);
      label year='Year'
            date='Date'
            c   ='Consumption'
            p   ='Profits'
            w   ='Private Wage Bill'
            i   ='Investment'
            k   ='Capital Stock'
            y   ='National Income'
            x   ='Private Production'
            wsum='Total Wage Bill'
            wp  ='Govt Wage Bill'
            g   ='Govt Demand'
            i   ='Taxes'
            klag='Capital Stock Lagged'
            plag='Profits Lagged'
            xlag='Private Product Lagged'
            yr  ='YEAR-1931';
   datalines;
   1920    .  12.7    .    .  44.9    .    .    .  182.8    .
   1921 41.9  12.4 25.5 -0.2  45.6  2.7  3.9  7.7  182.6 28.2
   1922 45.0  16.9 29.3  1.9  50.1  2.9  3.2  3.9  184.5 32.2
   1923 49.2  18.4 34.1  5.2  57.2  2.9  2.8  4.7  189.7 37.0
   1924 50.6  19.4 33.9  3.0  57.1  3.1  3.5  3.8  192.7 37.0
   1925 52.6  20.1 35.4  5.1  61.0  3.2  3.3  5.5  197.8 38.6
   1926 55.1  19.6 37.4  5.6  64.0  3.3  3.3  7.0  203.4 40.7
   1927 56.2  19.8 37.9  4.2  64.4  3.6  4.0  6.7  207.6 41.5
   1928 57.3  21.1 39.2  3.0  64.5  3.7  4.2  4.2  210.6 42.9
   1929 57.8  21.7 41.3  5.1  67.0  4.0  4.1  4.0  215.7 45.3

   ... more lines ...

   ods graphics on;
   proc syslin data=klein outest=b liml plots(unpack only)=residual;
      endogenous c p w i x wsum k y;
      instruments klag plag xlag wp g t yr;
      consume: model c = p plag wsum;
      invest:  model i = p plag klag;
      labor:   model w = x xlag yr;
   run;
Output 26.3.1 Residuals Diagnostic Plots for Consumption
Output 26.3.2 Residuals Diagnostic Plots for Investments
Output 26.3.3 Residuals Diagnostic Plots for Labor
References

Basmann, R.L. (1960), “On Finite Sample Distributions of Generalized Classical Linear Identifiability Test Statistics,” Journal of the American Statistical Association, 55, 650–659.

Fuller, W.A. (1977), “Some Properties of a Modification of the Limited Information Estimator,” Econometrica, 45, 939–952.

Hausman, J.A. (1975), “An Instrumental Variable Approach to Full Information Estimators for Linear and Certain Nonlinear Econometric Models,” Econometrica, 43, 727–738.

Johnston, J. (1984), Econometric Methods, Third Edition, New York: McGraw-Hill.

Judge, George G., W. E. Griffiths, R. Carter Hill, Helmut Lutkepohl, and Tsoung-Chao Lee (1985), The Theory and Practice of Econometrics, Second Edition, New York: John Wiley & Sons.

Maddala, G.S. (1977), Econometrics, New York: McGraw-Hill.

Park, S.B. (1982), “Some Sampling Properties of Minimum Expected Loss (MELO) Estimators of Structural Coefficients,” Journal of Econometrics, 18, 295–311.

Pindyck, R.S. and Rubinfeld, D.L. (1981), Econometric Models and Economic Forecasts, Second Edition, New York: McGraw-Hill.

Pringle, R.M. and Rayner, A.A. (1971), Generalized Inverse Matrices with Applications to Statistics, New York: Hafner Publishing Company.

Rao, P. (1974), “Specification Bias in Seemingly Unrelated Regressions,” in Essays in Honor of Tinbergen, Volume 2, New York: International Arts and Sciences Press.

Savin, N.E. and White, K.J. (1978), “Testing for Autocorrelation with Missing Observations,” Econometrica, 46, 59–66.

Theil, H. (1971), Principles of Econometrics, New York: John Wiley & Sons.

Zellner, A. (1962), “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias,” Journal of the American Statistical Association, 57, 348–368.

Zellner, A. (1978), “Estimation of Functions of Population Means and Regression Coefficients: A Minimum Expected Loss (MELO) Approach,” Journal of Econometrics, 8, 127–158.

Zellner, A. and Park, S. (1979), “Minimum Expected Loss (MELO) Estimators for Functions of Parameters and Structural Coefficients of Econometric Models,” Journal of the American Statistical Association, 74, 185–193.
Chapter 27
The TIMESERIES Procedure

Contents

Overview: TIMESERIES Procedure
Getting Started: TIMESERIES Procedure
Syntax: TIMESERIES Procedure
   Functional Summary
   PROC TIMESERIES Statement
   BY Statement
   CORR Statement
   CROSSCORR Statement
   DECOMP Statement
   ID Statement
   SEASON Statement
   TREND Statement
   VAR and CROSSVAR Statements
Details: TIMESERIES Procedure
   Accumulation
   Missing Value Interpretation
   Time Series Transformation
   Time Series Differencing
   Descriptive Statistics
   Seasonal Decomposition
   Correlation Analysis
   Cross-Correlation Analysis
   Data Set Output
   OUT= Data Set
   OUTCORR= Data Set
   OUTCROSSCORR= Data Set
   OUTDECOMP= Data Set
   OUTSEASON= Data Set
   OUTSUM= Data Set
   OUTTREND= Data Set
   _STATUS_ Variable Values
   Printed Output
   ODS Table Names
   ODS Graphics Names
Examples: TIMESERIES Procedure
   Example 27.1: Accumulating Transactional Data into Time Series Data
   Example 27.2: Trend and Seasonal Analysis
   Example 27.3: Illustration of ODS Graphics
References
Overview: TIMESERIES Procedure

The TIMESERIES procedure analyzes time-stamped transactional data with respect to time and accumulates the data into a time series format. The procedure can perform trend and seasonal analysis on the transactions. After the transactional data are accumulated, time domain and frequency domain analysis can be performed on the accumulated time series.

For seasonal analysis of the transaction data, various statistics can be computed for each season. For trend analysis of the transaction data, various statistics can be computed for each time period. The analysis is similar to applying the MEANS procedure of Base SAS software to each season or time period of concern.

After the transactional data are accumulated to form a time series and any missing values are interpreted, the accumulated time series can be functionally transformed using log, square root, logistic, or Box-Cox transformations. The time series can be further transformed using simple and/or seasonal differencing. After functional and difference transformations have been applied, the accumulated and transformed time series can be stored in an output data set. This working time series can then be analyzed further using various time series analysis techniques provided by this procedure or other SAS/ETS procedures.

Time series analyses performed by the TIMESERIES procedure include:

   - descriptive (global) statistics
   - seasonal decomposition/adjustment analysis
   - correlation analysis
   - cross-correlation analysis

All results of the transactional or time series analysis can be stored in output data sets or printed using the Output Delivery System (ODS).

The TIMESERIES procedure can process large amounts of time-stamped transactional data. Therefore, the analysis results are useful for large-scale time series analysis or (temporal) data mining. All of the results can be stored in output data sets in either a time series format (default) or in a coordinate format (transposed). The time series format is useful for preparing the data for subsequent analysis with other SAS/ETS procedures. For example, the working time series can be further analyzed, modeled, and forecast with other SAS/ETS procedures. The coordinate format is useful when using this procedure with SAS/STAT procedures or SAS Enterprise Miner. For example, clustering time-stamped transactional data can be achieved by using the results of this procedure with the clustering procedures of SAS/STAT and the nodes of SAS Enterprise Miner.

The EXPAND procedure can be used for the frequency conversion and transformations of time series output from this procedure.
Getting Started: TIMESERIES Procedure

This section outlines the use of the TIMESERIES procedure and gives a cursory description of some of the analysis techniques that can be performed on time-stamped transactional data.

Given an input data set that contains numerous transaction variables recorded over time at no specific frequency, the TIMESERIES procedure can form time series as follows:

   PROC TIMESERIES DATA=
The TIMESERIES procedure forms time series from the input time-stamped transactional data. It can provide results in output data sets or in other output formats by using the Output Delivery System (ODS).

Time-stamped transactional data are often recorded at no fixed interval. Analysts often want to use time series analysis techniques that require fixed-time intervals. Therefore, the transactional data must be accumulated to form a fixed-interval time series.

Suppose that a bank wants to analyze the transactions associated with each of its customers over time. Further, suppose that the data set WORK.TRANSACTIONS contains four variables that are related to these transactions: CUSTOMER, DATE, WITHDRAWALS, and DEPOSITS. The following examples illustrate possible ways to analyze these transactions by using the TIMESERIES procedure.

To accumulate the time-stamped transactional data to form a daily time series based on the accumulated daily totals of each type of transaction (WITHDRAWALS and DEPOSITS), the following TIMESERIES procedure statements can be used:

   proc timeseries data=transactions out=timeseries;
      by customer;
      id date interval=day accumulate=total;
      var withdrawals deposits;
   run;
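For experimentation, the preceding statements can be run against a small made-up data set. The following is a minimal sketch: the rows in WORK.TRANSACTIONS are hypothetical values invented for illustration, and the data are entered already sorted by CUSTOMER and DATE so that BY-group processing works without a separate sort.

```sas
/* Hypothetical transactions, invented for illustration only. */
data transactions;
   input customer date :date9. withdrawals deposits;
   format date date9.;
datalines;
1 01JAN2024 20   .
1 01JAN2024 35 100
1 03JAN2024  .  50
2 02JAN2024 10   .
2 04JAN2024 15  75
;
run;

/* Accumulate the transactions into one daily total per customer. */
proc timeseries data=transactions out=timeseries;
   by customer;
   id date interval=day accumulate=total;
   var withdrawals deposits;
run;

/* Inspect the accumulated series. */
proc print data=timeseries;
run;
```

In this sketch the two transactions for customer 1 on 01JAN2024 are combined into a single daily observation. Days that fall between a customer's first and last transaction but have no transactions of their own are expected to appear with missing values unless a different missing value interpretation is requested.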
The OUT=TIMESERIES option specifies that the resulting time series data for each customer are to be stored in the data set WORK.TIMESERIES. The INTERVAL=DAY option specifies that the transactions are to be accumulated on a daily basis. The ACCUMULATE=TOTAL option specifies that the sum of the transactions is to be calculated.

After the transactional data are accumulated into a time series format, many of the procedures provided with SAS/ETS software can be used to analyze the resulting time series data. For example, the ARIMA procedure can be used to model and forecast each customer’s withdrawal data by using an ARIMA(0,1,1)(0,1,1)s model (where the number of seasons is s=7 days in a week) using the following statements:

   proc arima data=timeseries;
      identify var=withdrawals(1,7) noprint;
      estimate q=(1)(7) outest=estimates noprint;
      forecast id=date interval=day out=forecasts;
   quit;
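Written out in backshift notation, the ARIMA(0,1,1)(0,1,1)s model with s=7 takes the following form. This is a sketch in notation that is assumed here rather than taken from the text: y_t is the accumulated withdrawal series, B is the backshift operator, a_t is the white noise error, theta_1 and Theta_1 are the nonseasonal and seasonal moving-average parameters, and mu is the constant term that the ESTIMATE statement fits unless it is suppressed.

```latex
(1 - B)\,(1 - B^{7})\, y_t \;=\; \mu + (1 - \theta_1 B)\,(1 - \Theta_1 B^{7})\, a_t
```

The differencing factors on the left correspond to VAR=WITHDRAWALS(1,7) in the IDENTIFY statement, and the two moving-average factors on the right correspond to Q=(1)(7) in the ESTIMATE statement.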
The OUTEST=ESTIMATES data set contains the parameter estimates of the model specified. The OUT=FORECASTS data set contains forecasts based on the model specified. See the SAS/ETS ARIMA procedure for more detail.

A single set of transactions can be very large and must be summarized in order to analyze them effectively. Analysts often want to examine transactional data for trends and seasonal variation. To analyze transactional data for trends and seasonality, statistics must be computed for each time period and season of concern. For each observation, the time period and season must be determined and the data must be analyzed based on this determination.

The following statements illustrate how to use the TIMESERIES procedure to perform trend and seasonal analysis of time-stamped transactional data.

   proc timeseries data=transactions out=out
                   outseason=season outtrend=trend;
      by customer;
      id date interval=day accumulate=total;
      var withdrawals deposits;
   run;
Since the INTERVAL=DAY option is specified, the length of the seasonal cycle is seven (7), where the first season is Sunday and the last season is Saturday. The output data set specified by the OUTSEASON=SEASON option contains the seasonal statistics for each day of the week by each customer. The output data set specified by the OUTTREND=TREND option contains the trend statistics for each day of the calendar by each customer.

Often it is desired to decompose a time series into seasonal, trend, cycle, and irregular components or to seasonally adjust a time series. These techniques describe how the changing seasons influence the time series.

The following statements illustrate how to use the TIMESERIES procedure to perform seasonal adjustment/decomposition analysis of time-stamped transactional data.

   proc timeseries data=transactions out=out outdecomp=decompose;
      by customer;
      id date interval=day accumulate=total;
      var withdrawals deposits;
   run;
The output data set specified by the OUTDECOMP=DECOMPOSE option contains the decomposed/adjusted time series for each customer.

A single time series can be very large. Often, a time series must be summarized with respect to time lags in order to be efficiently analyzed using time domain techniques. These techniques help describe how a current observation is related to the past observations with respect to the time (season) lag.

The following statements illustrate how to use the TIMESERIES procedure to perform time domain analysis of time-stamped transactional data.

   proc timeseries data=transactions out=out outcorr=timedomain;
      by customer;
      id date interval=day accumulate=total;
      var withdrawals deposits;
   run;
The output data set specified by the OUTCORR=TIMEDOMAIN option contains the time domain statistics, such as sample autocorrelations and partial autocorrelations, by each customer.

By default, the TIMESERIES procedure produces no printed output.
Syntax: TIMESERIES Procedure

The following statements are used with the TIMESERIES procedure:

   PROC TIMESERIES options ;
      BY variables ;
      CORR statistics-list / options ;
      CROSSCORR statistics-list / options ;
      CROSSVAR variable-list / options ;
      DECOMP component-list / options ;
      ID variable INTERVAL= interval-option ;
      SEASON statistics-list / options ;
      TREND statistics-list / options ;
      VAR variable-list / options ;
Functional Summary

Table 27.1 summarizes the statements and options that control the TIMESERIES procedure.

Table 27.1 TIMESERIES Functional Summary

Description                                                    Statement           Option

Statements
specify BY-group processing                                    BY
specify variables to analyze                                   VAR
specify cross variables to analyze                             CROSSVAR
specify the time ID variable                                   ID
specify correlation options                                    CORR
specify cross-correlation options                              CROSSCORR
specify decomposition options                                  DECOMP
specify seasonal statistics options                            SEASON
specify trend statistics options                               TREND

Data Set Options
specify the input data set                                     PROC TIMESERIES     DATA=
specify the output data set                                    PROC TIMESERIES     OUT=
specify correlations output data set                           PROC TIMESERIES     OUTCORR=
specify cross-correlations output data set                     PROC TIMESERIES     OUTCROSSCORR=
specify decomposition output data set                          PROC TIMESERIES     OUTDECOMP=
specify seasonal statistics output data set                    PROC TIMESERIES     OUTSEASON=
specify summary statistics output data set                     PROC TIMESERIES     OUTSUM=
specify trend statistics output data set                       PROC TIMESERIES     OUTTREND=

Accumulation and Seasonality Options
specify accumulation frequency                                 ID                  INTERVAL=
specify length of seasonal cycle                               PROC TIMESERIES     SEASONALITY=
specify interval alignment                                     ID                  ALIGN=
specify that time ID variable values not be sorted             ID                  NOTSORTED
specify starting time ID value                                 ID                  START=
specify ending time ID value                                   ID                  END=
specify accumulation statistic                                 ID, VAR, CROSSVAR   ACCUMULATE=
specify missing value interpretation                           ID, VAR, CROSSVAR   SETMISS=

Time-Stamped Data Seasonal Statistics Options
specify the form of the output data set                        SEASON              TRANSPOSE=

Time-Stamped Data Trend Statistics Options
specify the form of the output data set                        TREND               TRANSPOSE=
specify the number of time periods to be stored                TREND               NPERIODS=

Time Series Transformation Options
specify simple differencing                                    VAR, CROSSVAR       DIF=
specify seasonal differencing                                  VAR, CROSSVAR       SDIF=
specify transformation                                         VAR, CROSSVAR       TRANSFORM=

Time Series Correlation Options
specify the list of lags                                       CORR                LAGS=
specify the number of lags                                     CORR                NLAG=
specify the number of parameters                               CORR                NPARMS=
specify the form of the output data set                        CORR                TRANSPOSE=

Time Series Cross-Correlation Options
specify the list of lags                                       CROSSCORR           LAGS=
specify the number of lags                                     CROSSCORR           NLAG=
specify the form of the output data set                        CROSSCORR           TRANSPOSE=

Time Series Decomposition Options
specify mode of decomposition                                  DECOMP              MODE=
specify the Hodrick-Prescott filter parameter                  DECOMP              LAMBDA=
specify the number of time periods to be stored                DECOMP              NPERIODS=
specify the form of the output data set                        DECOMP              TRANSPOSE=

Printing Control Options
specify time ID format                                         ID                  FORMAT=
specify printed output                                         PROC TIMESERIES     PRINT=
specify detailed printed output                                PROC TIMESERIES     PRINTDETAILS
specify univariate graphical output                            PROC TIMESERIES     PLOTS=
specify cross-variable graphical output                        PROC TIMESERIES     CROSSPLOTS=

Miscellaneous Options
specify that analysis variables be processed in sorted order   PROC TIMESERIES     SORTNAMES
limit error and warning messages                               PROC TIMESERIES     MAXERROR=

ODS Graphics Options
specify the cross-variable graphical output                    PROC TIMESERIES     CROSSPLOT=
specify the variable graphical output                          PROC TIMESERIES     PLOT=
PROC TIMESERIES Statement

   PROC TIMESERIES options ;

The following options can be used in the PROC TIMESERIES statement:

DATA= SAS-data-set
   names the SAS data set that contains the input data for the procedure to create the time series. If the DATA= option is not specified, the most recently created SAS data set is used.

CROSSPLOTS= option | ( options )
   specifies the cross-variable graphical output desired. By default, the TIMESERIES procedure produces no graphical output. The following plotting options are available:

   SERIES   plots the time series (OUT= data set).
   CCF      plots the cross-correlation functions (OUTCROSSCORR= data set).
   ALL      same as CROSSPLOTS=(SERIES CCF).

   For example, CROSSPLOTS=SERIES plots the two time series. The CROSSPLOTS= option produces graphical output for these results by using the Output Delivery System (ODS). The CROSSPLOTS= option produces results similar to the data sets listed in parentheses next to the preceding options.

MAXERROR= number
   limits the number of warning and error messages that are produced during the execution of the procedure to the specified value. The default is MAXERROR=50. This option is particularly useful in BY-group processing, where it can be used to suppress the recurring messages.

OUT= SAS-data-set
   names the output data set to contain the time series variables specified in the subsequent VAR and CROSSVAR statements. If BY variables are specified, they are also included in the OUT= data set. If an ID variable is specified, it is also included in the OUT= data set. The values are accumulated based on the ID statement INTERVAL= option or the ACCUMULATE= option or both. The OUT= data set is particularly useful when you want to further analyze, model, or forecast the resulting time series with other SAS/ETS procedures.

OUTCORR= SAS-data-set
   names the output data set to contain the univariate time domain statistics.

OUTCROSSCORR= SAS-data-set
   names the output data set to contain the cross-correlation statistics.

OUTDECOMP= SAS-data-set
   names the output data set to contain the decomposed and/or seasonally adjusted time series.

OUTSEASON= SAS-data-set
   names the output data set to contain the seasonal statistics. The statistics are computed for each season as specified by the ID statement INTERVAL= option or the PROC TIMESERIES statement SEASONALITY= option. The OUTSEASON= data set is particularly useful when analyzing transactional data for seasonal variations.

OUTSUM= SAS-data-set
   names the output data set to contain the descriptive statistics. The descriptive statistics are based on the accumulated time series when the ACCUMULATE= and/or SETMISSING= options are specified in the ID or VAR statements. The OUTSUM= data set is particularly useful when analyzing large numbers of series and a summary of the results is needed.

OUTTREND= SAS-data-set
   names the output data set to contain the trend statistics. The statistics are computed for each time period as specified by the ID statement INTERVAL= option. The OUTTREND= data set is particularly useful when analyzing transactional data for trends.

PLOTS= option | ( options )
specifies the univariate graphical output desired. By default, the TIMESERIES procedure produces no graphical output. The following plotting options are available: SERIES
plots the time series (OUT= data set).
RESIDUAL
plots the residual time series (OUT= data set).
CYCLES
plots the seasonal cycles (OUT= data set).
CORR
plots the correlation panel (OUTCORR= data set).
ACF
plots the autocorrelation function (OUTCORR= data set).
PACF
plots the partial autocorrelation function (OUTCORR= data set).
IACF
plots the inverse autocorrelation function (OUTCORR= data set).
WN
plots the white noise probabilities (OUTCORR= data set).
DECOMP
plots the seasonal adjustment panel (OUTDECOMP= data set).
TCS
plots the trend-cycle-seasonal component (OUTDECOMP= data set).
TCC
plots the trend-cycle component (OUTDECOMP= data set).
SIC
plots the seasonal-irregular component (OUTDECOMP= data set).
SC
plots the seasonal component (OUTDECOMP= data set).
SA
plots the seasonally adjusted component (OUTDECOMP= data set).
PCSA
plots the percent change in the seasonally adjusted component (OUTDECOMP= data set).
IC
plots the irregular component (OUTDECOMP= data set).
TC
plots the trend component (OUTDECOMP= data set).
CC
plots the cycle component (OUTDECOMP= data set).
ALL
same as PLOTS=(SERIES ACF PACF IACF WN).
For example, PLOTS=SERIES plots the time series. The PLOTS= option produces graphical output for these results by using the Output Delivery System (ODS). The PLOTS= option produces results similar to the data sets listed in parentheses next to the preceding options.
1688 F Chapter 27: The TIMESERIES Procedure
PRINT= option | ( options )
specifies the printed output desired. By default, the TIMESERIES procedure produces no printed output. The following printing options are available:
DECOMP
prints the seasonal decomposition/adjustment table (OUTDECOMP= data set).
SEASONS
prints the seasonal statistics table (OUTSEASON= data set).
DESCSTATS
prints the descriptive statistics for the accumulated time series (OUTSUM= data set).
SUMMARY
prints the descriptive statistics table for all time series (OUTSUM= data set).
TRENDS
prints the trend statistics table (OUTTREND= data set).
For example, PRINT=SEASONS prints the seasonal statistics. The PRINT= option produces printed output for these results by using the Output Delivery System (ODS). The PRINT= option produces results similar to the data sets listed in parentheses next to the preceding options.
PRINTDETAILS
specifies that output requested with the PRINT= option be printed in greater detail.
SEASONALITY= number
specifies the length of the seasonal cycle. For example, SEASONALITY=3 means that every group of three time periods forms a seasonal cycle. By default, the length of the seasonal cycle is one (no seasonality) or the length implied by the INTERVAL= option specified in the ID statement. For example, INTERVAL=MONTH implies that the length of the seasonal cycle is 12.
SORTNAMES
specifies that the variables specified in the VAR and CROSSVAR statements be processed in sorted order by the variable names. This option allows the output data sets to be presorted by the variable names.
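The PROC TIMESERIES statement options above can be combined in one call. The following is a minimal sketch, not a definitive recipe: it assumes the SASHELP.AIR sample data set (monthly airline passengers, variables DATE and AIR) is available, and the WORK output data set names are arbitrary.

```sas
/* Assumption: SASHELP.AIR is installed; output data set names are illustrative */
ods graphics on;

proc timeseries data=sashelp.air
                out=work.series          /* accumulated time series     */
                outsum=work.summary      /* descriptive statistics      */
                plots=(series acf pacf)  /* ODS graphical output        */
                print=(descstats seasons);
   id date interval=month;               /* INTERVAL=MONTH implies a seasonal cycle of 12 */
   var air;
run;
```

PRINT=SEASONS is meaningful here only because the monthly interval implies a seasonal cycle longer than one.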
BY Statement
A BY statement can be used with PROC TIMESERIES to obtain separate analyses for groups of observations defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If your input data set is not sorted in ascending order, use one of the following alternatives:
Sort the data by using the SORT procedure with a similar BY statement.
Specify the option NOTSORTED or DESCENDING in the BY statement for the TIMESERIES procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
Create an index on the BY variables by using the DATASETS procedure.
For more information about the BY statement, see SAS Language Reference: Concepts. For more information about the DATASETS procedure, see the discussion in the Base SAS Procedures Guide.
CORR Statement
CORR statistics < / options > ;
A CORR statement can be used with the TIMESERIES procedure to specify options related to time domain analysis of the accumulated time series. Only one CORR statement is allowed. The following time domain statistics are available:
LAG
time lag
N
number of variance products
ACOV
autocovariances
ACF
autocorrelations
ACFSTD
autocorrelation standard errors
ACF2STD
two standard errors beyond autocorrelations
ACFNORM
normalized autocorrelations
ACFPROB
autocorrelation probabilities
ACFLPROB
autocorrelation log probabilities
PACF
partial autocorrelations
PACFSTD
partial autocorrelation standard errors
PACF2STD
two standard errors beyond partial autocorrelations
PACFNORM
normalized partial autocorrelations
PACFPROB
partial autocorrelation probabilities
PACFLPROB
partial autocorrelation log probabilities
IACF
inverse autocorrelations
IACFSTD
inverse autocorrelation standard errors
IACF2STD
two standard errors beyond inverse autocorrelations
IACFNORM
normalized inverse autocorrelations
IACFPROB
inverse autocorrelation probabilities
IACFLPROB
inverse autocorrelation log probabilities
WN
white noise test statistics
WNPROB
white noise test probabilities
WNLPROB
white noise test log probabilities
If none of the correlation statistics are specified, the default is as follows:
corr lag n acov acf acfstd pacf pacfstd iacf iacfstd wn wnprob;
The following options can be specified in the CORR statement following the slash (/):
NLAG= number
specifies the number of lags to be stored in the OUTCORR= data set or to be plotted. The default is 24 or three times the length of the seasonal cycle, whichever is smaller. The LAGS= option takes precedence over the NLAG= option.
LAGS=( numlist )
specifies the list of lags to be stored in the OUTCORR= data set or to be plotted. The list of lags must be separated by spaces or commas. For example, LAGS=(1,3) specifies the first then third lag.
NPARMS= number
specifies the number of parameters used in the model that created the residual time series. The number of parameters determines the degrees of freedom associated with the Ljung-Box statistics. The default is NPARMS=0. This option is useful when the series being analyzed consists of the residuals of a time series model; in that case, set NPARMS= to the number of parameters in that model.
TRANSPOSE= NO | YES
TRANSPOSE=YES specifies that the OUTCORR= data set be recorded with the lags as the column names instead of with the correlation statistics as the column names. The TRANSPOSE=NO option is useful for graphing the correlation results with SAS/GRAPH procedures. The TRANSPOSE=YES option is useful for analyzing the correlation results by using other SAS procedures such as the CLUSTER procedure of SAS/STAT or SAS Enterprise Miner. The default is TRANSPOSE=NO.
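As an illustration of the CORR statement, the following sketch requests a subset of the statistics listed above and limits the output to 12 lags. It assumes the SASHELP.AIR sample data set (variables DATE and AIR); the output data set name is arbitrary.

```sas
/* Assumption: SASHELP.AIR is installed; WORK.CORR is an illustrative name */
proc timeseries data=sashelp.air outcorr=work.corr;
   id date interval=month;
   var air;
   /* statistics before the slash, options after it */
   corr lag n acf acfstd pacf pacfstd wn wnprob / nlag=12;
run;
```

When the analyzed series holds residuals from a fitted model, adding NPARMS= after the slash adjusts the degrees of freedom of the white noise test as described above.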
CROSSCORR Statement
CROSSCORR statistics < / options > ;
A CROSSCORR statement can be used with the TIMESERIES procedure to specify options that are related to cross-correlation analysis of the accumulated time series. Only one CROSSCORR statement is allowed. The following time domain statistics are available:
LAG
time lag
N
number of variance products
CCOV
cross covariances
CCF
cross-correlations
CCFSTD
cross-correlation standard errors
CCF2STD
two standard errors beyond cross-correlations
CCFNORM
normalized cross-correlations
CCFPROB
cross-correlation probabilities
CCFLPROB
cross-correlation log probabilities
If none of the cross-correlation statistics are specified, the default is as follows:
crosscorr lag n ccov ccf ccfstd;
The following options can be specified in the CROSSCORR statement following the slash (/):
NLAG= number
specifies the number of lags to be stored in the OUTCROSSCORR= data set or to be plotted. The default is 24 or three times the length of the seasonal cycle, whichever is smaller. The LAGS= option takes precedence over the NLAG= option.
LAGS=( numlist )
specifies a list of lags to be stored in the OUTCROSSCORR= data set or to be plotted. The list of lags must be separated by spaces or commas. For example, LAGS=(1,3) specifies the first then third lag.
TRANSPOSE= NO | YES
TRANSPOSE=YES specifies that the OUTCROSSCORR= data set be recorded with the lags as the column names instead of with the cross-correlation statistics as the column names. The TRANSPOSE=NO option is useful for graphing the cross-correlation results with SAS/GRAPH procedures. The TRANSPOSE=YES option is useful for analyzing the cross-correlation results by using other procedures such as the CLUSTER procedure of SAS/STAT or SAS Enterprise Miner. The default is TRANSPOSE=NO.
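A cross-correlation analysis needs one series in a VAR statement and another in a CROSSVAR statement. The sketch below is illustrative only: it assumes SASHELP.AIR is available and builds a hypothetical second series, AIR_PREV, as the previous month's value of AIR, simply to have two related series.

```sas
/* Assumption: SASHELP.AIR is installed; AIR_PREV is a made-up companion series */
data work.pair;
   set sashelp.air;
   air_prev = lag(air);   /* previous observation's value of AIR */
   if _n_ > 1;            /* drop the first row, where the lag is missing */
run;

proc timeseries data=work.pair outcrosscorr=work.ccf;
   id date interval=month;
   var air;
   crossvar air_prev;
   crosscorr lag n ccov ccf ccfstd / nlag=6;
run;
```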
DECOMP Statement
DECOMP components < / options > ;
A DECOMP statement can be used with the TIMESERIES procedure to specify options related to classical seasonal decomposition of the time series data. Only one DECOMP statement is allowed. The options specified affect all variables listed in the VAR statements. Decomposition can be performed only when the length of the seasonal cycle specified by the PROC TIMESERIES statement SEASONALITY= option or implied by the ID statement INTERVAL= option is greater than one.
The following seasonal decomposition components are available:
ORIG | ORIGINAL
original series
TCC | TRENDCYCLE
trend-cycle component
SIC | SEASONIRREGULAR
seasonal-irregular component
SC | SEASONAL
seasonal component
SCSTD
seasonal component standard errors
TCS | TRENDCYCLESEASON
trend-cycle-seasonal component
IC | IRREGULAR
irregular component
SA | ADJUSTED
seasonally adjusted series
PCSA
percent change seasonally adjusted series
TC
trend component
CC | CYCLE
cycle component
If none of the components are specified, the default is as follows:
decomp orig tcc sc ic sa;
The following options can be specified in the DECOMP statement following the slash (/):
MODE= option
specifies the type of decomposition to be used to decompose the time series. The following values can be specified for the MODE= option:
ADD | ADDITIVE
additive decomposition
MULT | MULTIPLICATIVE
multiplicative decomposition
LOGADD | LOGADDITIVE
log-additive decomposition
PSEUDOADD | PSEUDOADDITIVE
pseudo-additive decomposition
MULTORADD
multiplicative or additive decomposition, depending on data
Multiplicative and log-additive decomposition require strictly positive time series. If the accumulated time series contains nonpositive values and the MODE=MULT or MODE=LOGADD option is specified, an error results. Pseudo-additive decomposition requires a nonnegative-valued time series. If the accumulated time series contains negative values and the MODE=PSEUDOADD option is specified, an error results. The MODE=MULTORADD option specifies that multiplicative decomposition be used when the accumulated time series contains only positive values, that pseudo-additive decomposition be used when the accumulated time series contains only nonnegative values, and that additive decomposition be used otherwise. The default is MODE=MULTORADD.
LAMBDA= number
specifies the Hodrick-Prescott filter parameter for trend-cycle decomposition. The default is LAMBDA=1600. Filtering applies when the trend component or the cycle component is requested. If filtering is not specified, this option is ignored.
NPERIODS= number
specifies the number of time periods to be stored in the OUTDECOMP= data set when the TRANSPOSE=YES option is specified. If the TRANSPOSE=NO option is specified, the NPERIODS= option is ignored. If the NPERIODS= option is positive, the first or beginning time periods are recorded. If the NPERIODS= option is negative, the last or ending time periods are recorded. The NPERIODS= option specifies the number of OUTDECOMP= data set variables to contain the seasonal decomposition and is therefore limited to the maximum allowable number of SAS variables. If the number of time periods exceeds this limit, a warning is printed in the log and the number of periods stored is reduced to the limit. If the NPERIODS= option is not specified, all of the periods specified between the ID statement START= and END= options are stored. If either the START= or END= option is not specified, the default magnitude is the seasonality specified by the PROC TIMESERIES statement SEASONALITY= option or implied by the ID statement INTERVAL= option. If only the START= option is specified, the default sign is positive. If only the END= option is specified, the default sign is negative.
TRANSPOSE= NO | YES
TRANSPOSE=YES specifies that the OUTDECOMP= data set be recorded with the time periods as the column names instead of with the statistics as the column names. The first and last time periods stored in the OUTDECOMP= data set correspond to the period of the ID statement START= option and END= option, respectively. If only the ID statement END= option is specified, the last time ID value of each accumulated time series corresponds to the last time period column. If only the ID statement START= option is specified, the first time ID value of each accumulated time series corresponds to the first time period column. If neither the START= option nor the END= option is specified with the ID statement, the first time ID value of each accumulated time series corresponds to the first time period column. The TRANSPOSE=NO option is useful for analyzing or displaying the decomposition results with SAS/GRAPH procedures. The TRANSPOSE=YES option is useful for analyzing the decomposition results by using other SAS procedures or SAS Enterprise Miner. The default is TRANSPOSE=NO.
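The DECOMP statement might be used as in the following sketch, which requests the default components plus the trend and cycle components so that the LAMBDA= option takes effect. It assumes the SASHELP.AIR sample data set; the series is strictly positive, so MODE=MULT is permitted.

```sas
/* Assumption: SASHELP.AIR is installed; WORK.DECOMP is an illustrative name */
proc timeseries data=sashelp.air outdecomp=work.decomp print=decomp;
   id date interval=month;       /* seasonal cycle of 12, so decomposition is possible */
   var air;
   /* TC and CC trigger Hodrick-Prescott filtering of the trend-cycle component */
   decomp orig tcc sc ic sa tc cc / mode=mult lambda=1600;
run;
```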
ID Statement
ID variable INTERVAL=interval < options > ;
The ID statement names a numeric variable that identifies observations in the input and output data sets. The ID variable’s values are assumed to be SAS date, time, or datetime values. In addition, the ID statement specifies the (desired) frequency associated with the time series. The ID statement options also specify how the observations are accumulated and how the time ID values are aligned to form the time series. The information specified affects all variables listed in subsequent VAR statements. If the ID statement is specified, the INTERVAL= option must also be used. If an ID statement is not specified, the observation number, with respect to the BY group, is used as the time ID. The following options can be used with the ID statement:
ACCUMULATE= option
specifies how the data set observations are to be accumulated within each time period. The frequency (width of each time interval) is specified by the INTERVAL= option. The ID variable contains the time ID values. Each time ID variable value corresponds to a specific time period. The accumulated values form the time series, which is used in subsequent analysis. The ACCUMULATE= option is useful when a time period can contain no input observations or more than one input observation (for example, time-stamped transactional data). The EXPAND procedure offers additional frequency conversions and transformations that can also be useful in creating a time series. The following options determine how the observations are accumulated within each time period based on the ID variable and the frequency specified by the INTERVAL= option:
NONE
No accumulation occurs; the ID variable values must be equally spaced with respect to the frequency. This is the default option.
TOTAL
Observations are accumulated based on the total sum of their values.
AVERAGE | AVG
Observations are accumulated based on the average of their values.
MINIMUM | MIN
Observations are accumulated based on the minimum of their values.
MEDIAN | MED
Observations are accumulated based on the median of their values.
MAXIMUM | MAX
Observations are accumulated based on the maximum of their values.
N
Observations are accumulated based on the number of nonmissing observations.
NMISS
Observations are accumulated based on the number of missing observations.
NOBS
Observations are accumulated based on the number of observations.
FIRST
Observations are accumulated based on the first of their values.
LAST
Observations are accumulated based on the last of their values.
STDDEV | STD
Observations are accumulated based on the standard deviation of their values.
CSS
Observations are accumulated based on the corrected sum of squares of their values.
USS
Observations are accumulated based on the uncorrected sum of squares of their values.
If the ACCUMULATE= option is specified, the SETMISSING= option is useful for specifying how accumulated missing values are to be treated. If missing values should be interpreted as zero, then SETMISSING=0 should be used. The section “Details: TIMESERIES Procedure” on page 1699 describes accumulation in greater detail.
ALIGN= option
controls the alignment of SAS dates used to identify output observations. The ALIGN= option accepts the following values: BEGINNING|BEG|B, MIDDLE|MID|M, and ENDING|END|E. BEGINNING is the default.
END= option
specifies a SAS date, datetime, or time value that represents the end of the data. If the last time ID variable value is less than the END= value, the series is extended with missing values. If the last time ID variable value is greater than the END= value, the series is truncated. For example, END="&sysdate"D uses the automatic macro variable SYSDATE to extend or truncate the series to the current date. The START= and END= options can be used to ensure that data associated with each BY group contains the same number of observations.
FORMAT= format
specifies the SAS format for the time ID values. If the FORMAT= option is not specified, the default format is implied from the INTERVAL= option.
INTERVAL= interval
specifies the frequency of the accumulated time series. For example, if the input data set consists of quarterly observations, then INTERVAL=QTR should be used. If the PROC TIMESERIES statement SEASONALITY= option is not specified, the length of the seasonal cycle is implied from the INTERVAL= option. For example, INTERVAL=QTR implies a seasonal cycle of length 4. If the ACCUMULATE= option is also specified, the INTERVAL= option determines the time periods for the accumulation of observations. The INTERVAL= option is required and must be the first option specified in the ID statement.
NOTSORTED
specifies that the time ID values are not in sorted order. The TIMESERIES procedure sorts the data with respect to the time ID prior to analysis.
SETMISSING= option | number
specifies how missing values (either actual or accumulated) are to be interpreted in the accumulated time series. If a number is specified, missing values are set to the number. If a missing value indicates an unknown value, this option should not be used. If a missing value indicates no value, SETMISSING=0 should be used. You would typically use SETMISSING=0 for transactional data because no recorded data usually implies no activity. The following options can also be used to determine how missing values are assigned:
MISSING
Missing values are set to missing. This is the default option.
AVERAGE | AVG
Missing values are set to the accumulated average value.
MINIMUM | MIN
Missing values are set to the accumulated minimum value.
MEDIAN | MED
Missing values are set to the accumulated median value.
MAXIMUM | MAX
Missing values are set to the accumulated maximum value.
FIRST
Missing values are set to the accumulated first nonmissing value.
LAST
Missing values are set to the accumulated last nonmissing value.
PREVIOUS | PREV
Missing values are set to the previous period’s accumulated nonmissing value. Missing values at the beginning of the accumulated series remain missing.
NEXT
Missing values are set to the next period’s accumulated nonmissing value. Missing values at the end of the accumulated series remain missing.
START= option
specifies a SAS date, datetime, or time value that represents the beginning of the data. If the first time ID variable value is greater than the START= value, the series is prepended with missing values. If the first time ID variable value is less than the START= value, the series is truncated. The START= and END= options can be used to ensure that data associated with each BY group contains the same number of observations.
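The ID statement options can be combined as in the following sketch, which accumulates monthly observations into quarterly totals and truncates the series to a fixed window. It assumes the SASHELP.AIR sample data set (which starts before the chosen START= date and ends after the chosen END= date); the dates and the output data set name are arbitrary.

```sas
/* Assumption: SASHELP.AIR is installed; dates and WORK.QTRLY are illustrative */
proc timeseries data=sashelp.air out=work.qtrly;
   /* INTERVAL= must be the first option in the ID statement */
   id date interval=qtr
           accumulate=total      /* sum the monthly values within each quarter */
           setmissing=0          /* treat empty quarters as zero activity      */
           align=beginning
           start='01JAN1950'd
           end='31DEC1959'd;
   var air;
run;
```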
SEASON Statement
SEASON statistics < / options > ;
A SEASON statement can be used with the TIMESERIES procedure to specify options that are related to seasonal analysis of the time-stamped transactional data. Only one SEASON statement is allowed. The options specified affect all variables specified in the VAR statements. Seasonal analysis can be performed only when the length of the seasonal cycle specified by the PROC TIMESERIES statement SEASONALITY= option or implied by the ID statement INTERVAL= option is greater than one. The following seasonal statistics are available:
NOBS
number of observations
N
number of nonmissing observations
NMISS
number of missing observations
MINIMUM
minimum value
MAXIMUM
maximum value
RANGE
range value
SUM
summation value
MEAN
mean value
STDDEV
standard deviation
CSS
corrected sum of squares
USS
uncorrected sum of squares
MEDIAN
median value
If none of the season statistics are specified, the default is as follows:
season n min max mean std;
The following option can be specified in the SEASON statement following the slash (/):
TRANSPOSE= NO | YES
TRANSPOSE=YES specifies that the OUTSEASON= data set be recorded with the seasonal indices as the column names instead of with the statistics as the column names. The TRANSPOSE=NO option is useful for graphing the seasonal analysis results with SAS/GRAPH procedures. The TRANSPOSE=YES option is useful for analyzing the seasonal analysis results by using other SAS procedures or SAS Enterprise Miner. The default is TRANSPOSE=NO.
TREND Statement
TREND statistics < / options > ;
A TREND statement can be used with the TIMESERIES procedure to specify options related to trend analysis of the time-stamped transactional data. Only one TREND statement is allowed. The options specified affect all variables specified in the VAR statements. The following trend statistics are available:
NOBS
number of observations
N
number of nonmissing observations
NMISS
number of missing observations
MINIMUM
minimum value
MAXIMUM
maximum value
RANGE
range value
SUM
summation value
MEAN
mean value
STDDEV
standard deviation
CSS
corrected sum of squares
USS
uncorrected sum of squares
MEDIAN
median value
If none of the trend statistics are specified, the default is as follows:
trend n min max mean std;
The following options can be specified in the TREND statement following the slash (/):
NPERIODS= number
specifies the number of time periods to be stored in the OUTTREND= data set when the TRANSPOSE=YES option is specified. If the TRANSPOSE=YES option is not specified, the NPERIODS= option is ignored. The NPERIODS= option specifies the number of OUTTREND= data set variables to contain the trend statistics and is therefore limited to the maximum allowable number of SAS variables. If the NPERIODS= option is not specified, all of the periods specified between the ID statement START= and END= options are stored. If either the START= or END= option is not specified, the default is the seasonality specified by the PROC TIMESERIES statement SEASONALITY= option or implied by the ID statement INTERVAL= option. If the seasonality is zero, the default is NPERIODS=5.
TRANSPOSE= NO | YES
TRANSPOSE=YES specifies that the OUTTREND= data set be recorded with the time periods as the column names instead of with the statistics as the column names. The first and last time periods stored in the OUTTREND= data set correspond to the period of the ID statement START= and END= options, respectively. If only the ID statement END= option is specified, the last time ID value of each accumulated time series corresponds to the last time period column. If only the ID statement START= option is specified, the first time ID value of each accumulated time series corresponds to the first time period column. If neither the START= option nor the END= option is specified with the ID statement, the first time ID value of each accumulated time series corresponds to the first time period column. The TRANSPOSE=NO option is useful for analyzing or displaying the trend analysis results with SAS/GRAPH procedures. The TRANSPOSE=YES option is useful for analyzing the trend analysis results by using other SAS procedures or SAS Enterprise Miner. The default is TRANSPOSE=NO.
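The SEASON and TREND statements can appear together in one procedure call. The following sketch assumes the SASHELP.AIR sample data set and treats its monthly rows as the observations to be summarized by quarter; the output data set names are arbitrary.

```sas
/* Assumption: SASHELP.AIR is installed; output data set names are illustrative */
proc timeseries data=sashelp.air
                outseason=work.season   /* one set of statistics per season (4 quarters) */
                outtrend=work.trend;    /* one set of statistics per time period         */
   id date interval=qtr accumulate=total;
   var air;
   season n min max mean std;           /* same list as the documented default */
   trend  n min max mean std;
run;
```

Because each quarter contains three monthly rows, the per-period trend statistics here summarize three observations each.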
VAR and CROSSVAR Statements
VAR variable-list < / options > ;
CROSSVAR variable-list < / options > ;
The VAR and CROSSVAR statements list the numeric variables in the DATA= data set whose values are to be accumulated to form the time series. An input data set variable can be specified in only one VAR or CROSSVAR statement. Any number of VAR and CROSSVAR statements can be used. The following options can be used with the VAR and CROSSVAR statements:
ACCUMULATE= option
specifies how the data set observations are to be accumulated within each time period for the variables listed in the VAR or CROSSVAR statement. If the ACCUMULATE= option is not specified in the VAR or CROSSVAR statement, accumulation is determined by the ACCUMULATE= option of the ID statement. See the ID statement ACCUMULATE= option for more details.
DIF=( numlist )
specifies the differencing to be applied to the accumulated time series. The list of differencing orders must be separated by spaces or commas. For example, DIF=(1,3) specifies first then third order differencing. Differencing is applied after time series transformation. The TRANSFORM= option is applied before the DIF= option.
SDIF=( numlist )
specifies the seasonal differencing to be applied to the accumulated time series. The list of seasonal differencing orders must be separated by spaces or commas. For example, SDIF=(1,3) specifies first then third order seasonal differencing. Differencing is applied after time series transformation. The TRANSFORM= option is applied before the SDIF= option.
SETMISS= option | number
SETMISSING= option | number
specifies how missing values (either actual or accumulated) are to be interpreted in the accumulated time series for variables listed in the VAR or CROSSVAR statement. If the SETMISSING= option is not specified in the VAR or CROSSVAR statement, missing values are set based on the SETMISSING= option of the ID statement. See the ID statement SETMISSING= option for more details.
TRANSFORM= option
specifies the time series transformation to be applied to the accumulated time series. The following transformations are provided:
NONE
No transformation is applied. This option is the default.
LOG
logarithmic transformation
SQRT
square-root transformation
LOGISTIC
logistic transformation
BOXCOX(n)
Box-Cox transformation with parameter n, where n is a number between –5 and 5
When the TRANSFORM= option is specified, the time series must be strictly positive.
Details: TIMESERIES Procedure
The TIMESERIES procedure can be used to perform trend and seasonal analysis on transactional data. For trend analysis, various sample statistics are computed for each time period defined by the time ID variable and INTERVAL= option. For seasonal analysis, various sample statistics are computed for each season defined by the INTERVAL= or the SEASONALITY= option. For example, if the transactional data ranges from June 1990 to January 2000 and the data are to be accumulated on a monthly basis, then the trend statistics are computed for every month: June 1990, July 1990, . . . , January 2000. The seasonal statistics are computed for each season: January, February, . . . , December.
The TIMESERIES procedure can be used to form time series data from transactional data. The accumulated time series can then be analyzed using time series techniques. The data are analyzed in the following order:
1. accumulation: ACCUMULATE= option
2. missing value interpretation: SETMISSING= option
3. time series transformation: TRANSFORM= option
4. time series differencing: DIF= and SDIF= options
5. descriptive statistics: OUTSUM= option, PRINT=DESCSTATS
6. seasonal decomposition: DECOMP statement, OUTDECOMP= option
7. correlation analysis: CORR statement, OUTCORR= option
8. cross-correlation analysis: CROSSCORR statement, OUTCROSSCORR= option
Accumulation
If the ACCUMULATE= option in the ID, VAR, or CROSSVAR statement is specified, data set observations are accumulated within each time period. The frequency (width of each time interval) is specified by the ID statement INTERVAL= option. The ID variable contains the time ID values. Each time ID value corresponds to a specific time period. Accumulation is useful when the input data set contains transactional data, whose observations are not spaced with respect to any particular time interval. The accumulated values form the time series, which is used in subsequent analyses. For example, suppose a data set contains the following observations:
   19MAR1999   10
   19MAR1999   30
   11MAY1999   50
   12MAY1999   20
   23MAY1999   20
If INTERVAL=MONTH is specified, all of the above observations fall within a three-month period of time between March 1999 and May 1999. The observations are accumulated within each time period as follows:
If the ACCUMULATE=NONE option is specified, an error is generated because the ID variable values are not equally spaced with respect to the specified frequency (MONTH).
If the ACCUMULATE=TOTAL option is specified, the resulting time series is:
   01MAR1999   40
   01APR1999   .
   01MAY1999   90
If the ACCUMULATE=AVERAGE option is specified, the resulting time series is:
   01MAR1999   20
   01APR1999   .
   01MAY1999   30
If the ACCUMULATE=MINIMUM option is specified, the resulting time series is:
   01MAR1999   10
   01APR1999   .
   01MAY1999   20
If the ACCUMULATE=MEDIAN option is specified, the resulting time series is:
   01MAR1999   20
   01APR1999   .
   01MAY1999   20
If the ACCUMULATE=MAXIMUM option is specified, the resulting time series is:
   01MAR1999   30
   01APR1999   .
   01MAY1999   50
If the ACCUMULATE=FIRST option is specified, the resulting time series is:
   01MAR1999   10
   01APR1999   .
   01MAY1999   50
If the ACCUMULATE=LAST option is specified, the resulting time series is:
   01MAR1999   30
   01APR1999   .
   01MAY1999   20
If the ACCUMULATE=STDDEV option is specified, the resulting time series is:
   01MAR1999   14.14
   01APR1999   .
   01MAY1999   17.32
As can be seen from the above examples, even though the data set observations contain no missing values, the accumulated time series can have missing values.
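The accumulation example above could be set up along the following lines. This is a sketch under stated assumptions: the data set and variable names (WORK.TRANS, AMOUNT, TOTAL, AVG, SD) are invented for illustration, and the value is copied into three variables because an input variable can appear in only one VAR statement.

```sas
/* Hypothetical names throughout; the five observations are those listed above */
data work.trans;
   input date : date9. amount;
   total = amount;   /* copies of the same value, one per accumulation method */
   avg   = amount;
   sd    = amount;
   format date date9.;
   datalines;
19MAR1999 10
19MAR1999 30
11MAY1999 50
12MAY1999 20
23MAY1999 20
;
run;

proc timeseries data=work.trans out=work.accum;
   id date interval=month accumulate=total;
   var total;                       /* uses ACCUMULATE=TOTAL from the ID statement */
   var avg / accumulate=average;    /* VAR-level option overrides the ID statement */
   var sd  / accumulate=stddev;
run;

proc print data=work.accum;
run;
```

The printed series should correspond to the TOTAL, AVERAGE, and STDDEV results shown above, with a missing value for April 1999 in each column.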
Missing Value Interpretation
Sometimes missing values should be interpreted as unknown values. But sometimes missing values are known, such as when missing values are created from accumulation and no observations should be interpreted as no value (that is, zero). In the latter case, the SETMISSING= option can be used to specify how missing values are treated. The SETMISSING=0 option should be used when missing observations are to be treated as no (zero) values. In other cases, missing values should be interpreted as global values, such as minimum or maximum values of the accumulated series. The accumulated and interpreted time series is used in subsequent analyses.
Time Series Transformation
There are four transformations available for strictly positive series only. Let $y_t > 0$ be the original time series, and let $w_t$ be the transformed series. The transformations are defined as follows:
Log
is the logarithmic transformation.
$$w_t = \ln(y_t)$$
Logistic
is the logistic transformation.
$$w_t = \ln\left(\frac{c\,y_t}{1 - c\,y_t}\right)$$
where the scaling factor $c$ is
$$c = (1 - 10^{-6})\,10^{-\mathrm{ceil}(\log_{10}(\max(y_t)))}$$
and $\mathrm{ceil}(x)$ is the smallest integer greater than or equal to $x$.
Square root
is the square root transformation.
$$w_t = \sqrt{y_t}$$
Box Cox
is the Box-Cox transformation.
$$w_t = \begin{cases} \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(y_t), & \lambda = 0 \end{cases}$$
More complex time series transformations can be performed by using the EXPAND procedure of SAS/ETS.
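As a worked illustration of the logistic scaling factor, consider a hypothetical series whose largest value is 250 (the value is an assumption chosen only for the arithmetic):

```latex
\[
\mathrm{ceil}(\log_{10} 250) = \mathrm{ceil}(2.398) = 3,
\qquad
c = (1 - 10^{-6}) \cdot 10^{-3} = 9.99999 \times 10^{-4}
\]
\[
c \, \max(y_t) = 9.99999 \times 10^{-4} \times 250 \approx 0.25 < 1
\]
```

The factor $(1 - 10^{-6})$ matters when the maximum is an exact power of ten: for a maximum of 1000, $c \max(y_t) = 0.999999$, so $1 - c y_t$ stays strictly positive and the logarithm remains defined.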
Time Series Differencing

After optionally transforming the series, the accumulated series can be simply or seasonally differenced by using the VAR and CROSSVAR statement DIF= and SDIF= options. For example, suppose $y_t$ is a monthly time series. The following examples of the DIF= and SDIF= options demonstrate how to simply and seasonally difference the time series.

   dif=(1) sdif=(1)
   dif=(1,12)
Additionally, assuming yt is strictly positive, the VAR and CROSSVAR statement TRANSFORM=, DIF=, and SDIF= options can be combined.
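A minimal sketch of combining the three options, assuming the monthly SASHELP.AIR data set used in the examples later in this chapter; the output data set name SUMMARY is hypothetical:

```sas
/* Log-transform the series, then apply one simple difference and
   one seasonal difference. Descriptive statistics are computed
   from the resulting working series. */
proc timeseries data=sashelp.air out=_null_
                outsum=summary print=descstats;
   id date interval=month;
   var air / transform=log dif=(1) sdif=(1);
run;
```

The transformation is applied before differencing, which is why the series must be strictly positive when TRANSFORM=LOG is part of the combination.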
Descriptive Statistics

Descriptive statistics can be computed from the working series by specifying the OUTSUM= option or PRINT=DESCSTATS.
Seasonal Decomposition

Seasonal decomposition/analysis can be performed on the working series by specifying the OUTDECOMP= option, the PRINT=DECOMP option, or one of the PLOTS= options associated with decomposition in the PROC TIMESERIES statement. The DECOMP statement enables you to specify options related to decomposition. The TIMESERIES procedure uses classical decomposition. More complex seasonal decomposition/adjustment analysis can be performed by using the X11 or the X12 procedure of SAS/ETS.

The DECOMP statement MODE= option determines the mode of the seasonal adjustment decomposition to be performed. There are four modes: multiplicative (MODE=MULT), additive (MODE=ADD), pseudo-additive (MODE=PSEUDOADD), and log-additive (MODE=LOGADD) decomposition. The default is MODE=MULTORADD, which specifies MODE=MULT for series that are strictly positive, MODE=PSEUDOADD for series that are nonnegative, and MODE=ADD for series that are not nonnegative. When MODE=LOGADD is specified, the components are exponentiated to the original metric.

The DECOMP statement LAMBDA= option specifies the Hodrick-Prescott filter parameter (Hodrick and Prescott 1980). The default is LAMBDA=1600. The Hodrick-Prescott filter is used to decompose the trend-cycle component into the trend component and cycle component in an additive fashion. A smaller parameter assigns less significance to the cycle; that is, LAMBDA=0 implies no cycle component.

The notation and keywords associated with seasonal decomposition/adjustment analysis are defined in Table 27.2.
Table 27.2 Seasonal Adjustment Formulas

   Component                        Keyword    MODE= Option   Formula
   original series                  ORIGINAL   MULT           $O_t = TC_t S_t I_t$
                                               ADD            $O_t = TC_t + S_t + I_t$
                                               LOGADD         $\log(O_t) = TC_t + S_t + I_t$
                                               PSEUDOADD      $O_t = TC_t (S_t + I_t - 1)$
   trend-cycle component            TCC        MULT           centered moving average of $O_t$
                                               ADD            centered moving average of $O_t$
                                               LOGADD         centered moving average of $\log(O_t)$
                                               PSEUDOADD      centered moving average of $O_t$
   seasonal-irregular component     SIC        MULT           $SI_t = S_t I_t = O_t / TC_t$
                                               ADD            $SI_t = S_t + I_t = O_t - TC_t$
                                               LOGADD         $SI_t = S_t + I_t = \log(O_t) - TC_t$
                                               PSEUDOADD      $SI_t = S_t + I_t - 1 = O_t / TC_t$
   seasonal component               SC         MULT           seasonal averages of $SI_t$
                                               ADD            seasonal averages of $SI_t$
                                               LOGADD         seasonal averages of $SI_t$
                                               PSEUDOADD      seasonal averages of $SI_t$
   irregular component              IC         MULT           $I_t = SI_t / S_t$
                                               ADD            $I_t = SI_t - S_t$
                                               LOGADD         $I_t = SI_t - S_t$
                                               PSEUDOADD      $I_t = SI_t - S_t + 1$
   trend-cycle-seasonal component   TCS        MULT           $TCS_t = TC_t S_t = O_t / I_t$
                                               ADD            $TCS_t = TC_t + S_t = O_t - I_t$
                                               LOGADD         $TCS_t = TC_t + S_t = O_t - I_t$
                                               PSEUDOADD      $TCS_t = TC_t S_t$
   trend component                  TC         MULT           $T_t = TC_t - C_t$
                                               ADD            $T_t = TC_t - C_t$
                                               LOGADD         $T_t = TC_t - C_t$
                                               PSEUDOADD      $T_t = TC_t - C_t$
   cycle component                  CC         MULT           $C_t = TC_t - T_t$
                                               ADD            $C_t = TC_t - T_t$
                                               LOGADD         $C_t = TC_t - T_t$
                                               PSEUDOADD      $C_t = TC_t - T_t$
   seasonally adjusted series       SA         MULT           $SA_t = O_t / S_t = TC_t I_t$
                                               ADD            $SA_t = O_t - S_t = TC_t + I_t$
                                               LOGADD         $SA_t = O_t / \exp(S_t) = \exp(TC_t + I_t)$
                                               PSEUDOADD      $SA_t = TC_t I_t$
The trend-cycle component is computed from the s-period centered moving average as follows:

   $TC_t = \sum_{k=-\lfloor s/2 \rfloor}^{\lfloor s/2 \rfloor} y_{t+k} / s$
The seasonal component is obtained by averaging the seasonal-irregular component for each season.

   $S_{k+js} = \sum_{t = k \bmod s} \frac{SI_t}{T/s}$

where $0 \le j \le T/s$ and $1 \le k \le s$. The seasonal components are normalized to sum to one (multiplicative) or zero (additive).
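A minimal sketch of requesting a classical decomposition, assuming the monthly SASHELP.AIR data set used in the examples later in this chapter; the output data set name DECOMP is hypothetical:

```sas
/* Multiplicative decomposition of a strictly positive monthly series.
   The component keywords select which columns are written to the
   OUTDECOMP= data set; LAMBDA=1600 is the documented default. */
proc timeseries data=sashelp.air out=_null_ outdecomp=decomp;
   id date interval=month;
   var air;
   decomp tcc sc ic sa tc cc / mode=mult lambda=1600;
run;
```

Because the series is strictly positive, omitting MODE= would select the same multiplicative mode through the MODE=MULTORADD default.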
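Correlation Analysis

Correlation analysis can be performed on the working series by specifying the OUTCORR= option or one of the PLOTS= options that are associated with correlation. The CORR statement enables you to specify options that are related to correlation analysis.

Autocovariance Statistics

   LAGS       $h \in \{0, \ldots, H\}$
   N          $N_h$ is the number of observed products at lag $h$, ignoring missing values
   ACOV       $\hat{\gamma}(h) = \frac{1}{T} \sum_{t=h+1}^{T} (y_t - \bar{y})(y_{t-h} - \bar{y})$
   ACOV       $\hat{\gamma}(h) = \frac{1}{N_h} \sum_{t=h+1}^{T} (y_t - \bar{y})(y_{t-h} - \bar{y})$ when embedded missing values are present

Autocorrelation Statistics

   ACF        $\hat{\rho}(h) = \hat{\gamma}(h) / \hat{\gamma}(0)$
   ACFSTD     $\mathrm{Std}(\hat{\rho}(h)) = \sqrt{\frac{1}{T} \left( 1 + 2 \sum_{j=1}^{h-1} \hat{\rho}(j)^2 \right)}$
   ACFNORM    $\mathrm{Norm}(\hat{\rho}(h)) = \hat{\rho}(h) / \mathrm{Std}(\hat{\rho}(h))$
   ACFPROB    $\mathrm{Prob}(\hat{\rho}(h)) = 2 \left( 1 - \Phi(|\mathrm{Norm}(\hat{\rho}(h))|) \right)$
   ACFLPROB   $\mathrm{LogProb}(\hat{\rho}(h)) = -\log_{10}(\mathrm{Prob}(\hat{\rho}(h)))$
   ACF2STD    $\mathrm{Flag}(\hat{\rho}(h)) = \begin{cases} 1 & \hat{\rho}(h) > 2\,\mathrm{Std}(\hat{\rho}(h)) \\ 0 & -2\,\mathrm{Std}(\hat{\rho}(h)) < \hat{\rho}(h) < 2\,\mathrm{Std}(\hat{\rho}(h)) \\ -1 & \hat{\rho}(h) < -2\,\mathrm{Std}(\hat{\rho}(h)) \end{cases}$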
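Partial Autocorrelation Statistics

   PACF        $\hat{\varphi}(h) = \Gamma_{(0,h-1)} \{\gamma_j\}_{j=1}^{h-1}$
   PACFSTD     $\mathrm{Std}(\hat{\varphi}(h)) = 1 / \sqrt{N_0}$
   PACFNORM    $\mathrm{Norm}(\hat{\varphi}(h)) = \hat{\varphi}(h) / \mathrm{Std}(\hat{\varphi}(h))$
   PACFPROB    $\mathrm{Prob}(\hat{\varphi}(h)) = 2 \left( 1 - \Phi(|\mathrm{Norm}(\hat{\varphi}(h))|) \right)$
   PACFLPROB   $\mathrm{LogProb}(\hat{\varphi}(h)) = -\log_{10}(\mathrm{Prob}(\hat{\varphi}(h)))$
   PACF2STD    $\mathrm{Flag}(\hat{\varphi}(h)) = \begin{cases} 1 & \hat{\varphi}(h) > 2\,\mathrm{Std}(\hat{\varphi}(h)) \\ 0 & -2\,\mathrm{Std}(\hat{\varphi}(h)) < \hat{\varphi}(h) < 2\,\mathrm{Std}(\hat{\varphi}(h)) \\ -1 & \hat{\varphi}(h) < -2\,\mathrm{Std}(\hat{\varphi}(h)) \end{cases}$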
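Inverse Autocorrelation Statistics

   IACF        $\hat{\theta}(h)$
   IACFSTD     $\mathrm{Std}(\hat{\theta}(h)) = 1 / \sqrt{N_0}$
   IACFNORM    $\mathrm{Norm}(\hat{\theta}(h)) = \hat{\theta}(h) / \mathrm{Std}(\hat{\theta}(h))$
   IACFPROB    $\mathrm{Prob}(\hat{\theta}(h)) = 2 \left( 1 - \Phi(|\mathrm{Norm}(\hat{\theta}(h))|) \right)$
   IACFLPROB   $\mathrm{LogProb}(\hat{\theta}(h)) = -\log_{10}(\mathrm{Prob}(\hat{\theta}(h)))$
   IACF2STD    $\mathrm{Flag}(\hat{\theta}(h)) = \begin{cases} 1 & \hat{\theta}(h) > 2\,\mathrm{Std}(\hat{\theta}(h)) \\ 0 & -2\,\mathrm{Std}(\hat{\theta}(h)) < \hat{\theta}(h) < 2\,\mathrm{Std}(\hat{\theta}(h)) \\ -1 & \hat{\theta}(h) < -2\,\mathrm{Std}(\hat{\theta}(h)) \end{cases}$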
White Noise Statistics

   WN          $Q(h) = T(T+2) \sum_{j=1}^{h} \rho(j)^2 / (T - j)$
   WN          $Q(h) = \sum_{j=1}^{h} N_j \, \rho(j)^2$ when embedded missing values are present
   WNPROB      $\mathrm{Prob}(Q(h)) = \chi_{\max(1,h-p)}(Q(h))$
   WNLPROB     $\mathrm{LogProb}(Q(h)) = -\log_{10}(\mathrm{Prob}(Q(h)))$
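The keywords defined above are what the CORR statement accepts as a statistics list. A minimal sketch, assuming the monthly SASHELP.AIR data set used later in this chapter; the output data set name CORR is hypothetical:

```sas
/* Write selected autocovariance, autocorrelation, and white noise
   statistics for lags 0 through 12 to the OUTCORR= data set. */
proc timeseries data=sashelp.air out=_null_ outcorr=corr;
   id date interval=month;
   var air;
   corr lag n acov acf acfstd acfprob wn wnprob / nlag=12;
run;
```

Each requested keyword becomes a variable in WORK.CORR, with one observation per lag.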
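Cross-Correlation Analysis

Cross-correlation analysis can be performed on the working series by specifying the OUTCROSSCORR= option or one of the CROSSPLOTS= options that are associated with cross-correlation. The CROSSCORR statement enables you to specify options that are related to cross-correlation analysis.

Cross-Correlation Statistics

   LAGS       $h \in \{0, \ldots, H\}$
   N          $N_h$ is the number of observed products at lag $h$, ignoring missing values
   CCOV       $\hat{\gamma}_{x,y}(h) = \frac{1}{T} \sum_{t=h+1}^{T} (x_t - \bar{x})(y_{t-h} - \bar{y})$
   CCOV       $\hat{\gamma}_{x,y}(h) = \frac{1}{N_h} \sum_{t=h+1}^{T} (x_t - \bar{x})(y_{t-h} - \bar{y})$ when embedded missing values are present
   CCF        $\hat{\rho}_{x,y}(h) = \hat{\gamma}_{x,y}(h) / \sqrt{\hat{\gamma}_x(0) \, \hat{\gamma}_y(0)}$
   CCFSTD     $\mathrm{Std}(\hat{\rho}_{x,y}(h)) = 1 / \sqrt{N_0}$
   CCFNORM    $\mathrm{Norm}(\hat{\rho}_{x,y}(h)) = \hat{\rho}_{x,y}(h) / \mathrm{Std}(\hat{\rho}_{x,y}(h))$
   CCFPROB    $\mathrm{Prob}(\hat{\rho}_{x,y}(h)) = 2 \left( 1 - \Phi(|\mathrm{Norm}(\hat{\rho}_{x,y}(h))|) \right)$
   CCFLPROB   $\mathrm{LogProb}(\hat{\rho}_{x,y}(h)) = -\log_{10}(\mathrm{Prob}(\hat{\rho}_{x,y}(h)))$
   CCF2STD    $\mathrm{Flag}(\hat{\rho}_{x,y}(h)) = \begin{cases} 1 & \hat{\rho}_{x,y}(h) > 2\,\mathrm{Std}(\hat{\rho}_{x,y}(h)) \\ 0 & -2\,\mathrm{Std}(\hat{\rho}_{x,y}(h)) < \hat{\rho}_{x,y}(h) < 2\,\mathrm{Std}(\hat{\rho}_{x,y}(h)) \\ -1 & \hat{\rho}_{x,y}(h) < -2\,\mathrm{Std}(\hat{\rho}_{x,y}(h)) \end{cases}$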
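A minimal sketch of requesting these statistics, assuming the SASHELP.WORKERS data set (with variables ELECTRIC and MASONRY) used in Example 27.3; the output data set name CCORR is hypothetical:

```sas
/* Cross-correlate the VAR series with the CROSSVAR series
   for lags 0 through 12. */
proc timeseries data=sashelp.workers out=_null_ outcrosscorr=ccorr;
   id date interval=month;
   var electric;
   crossvar masonry;
   crosscorr lag n ccov ccf ccfstd ccfprob / nlag=12;
run;
```

The VAR series plays the role of $x_t$ and the CROSSVAR series the role of $y_t$ in the formulas above.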
Data Set Output

The TIMESERIES procedure can create the OUT=, OUTCORR=, OUTCROSSCORR=, OUTDECOMP=, OUTSEASON=, OUTSUM=, and OUTTREND= data sets. In general, these data sets contain the variables listed in the BY statement. If an analysis step that is related to an output data set fails, the values of this step are not recorded or are set to missing in the related output data set, and appropriate error and/or warning messages are recorded in the log.

OUT= Data Set

The OUT= data set contains the variables specified in the BY, ID, VAR, and CROSSVAR statements. If the ID statement is specified, the ID variable values are aligned and extended based on the ALIGN= and INTERVAL= options. The values of the variables specified in the VAR and CROSSVAR statements are accumulated based on the ACCUMULATE= option, and missing values are interpreted based on the SETMISSING= option.
OUTCORR= Data Set

The OUTCORR= data set contains the variables specified in the BY statement as well as the variables listed below. The OUTCORR= data set records the correlations for each variable specified in a VAR statement (not the CROSSVAR statement).

When the CORR statement TRANSPOSE=NO option is omitted or specified explicitly, the variable names are related to correlation statistics specified in the CORR statement options and the variable values are related to the NLAG= or LAGS= option.

   _NAME_      variable name
   LAG         time lag
   N           number of variance products
   ACOV        autocovariances
   ACF         autocorrelations
   ACFSTD      autocorrelation standard errors
   ACF2STD     two standard errors beyond autocorrelation
   ACFNORM     normalized autocorrelations
   ACFPROB     autocorrelation probabilities
   ACFLPROB    autocorrelation log probabilities
   PACF        partial autocorrelations
   PACFSTD     partial autocorrelation standard errors
   PACF2STD    two standard errors beyond partial autocorrelation
   PACFNORM    normalized partial autocorrelations
   PACFPROB    partial autocorrelation probabilities
   PACFLPROB   partial autocorrelation log probabilities
   IACF        inverse autocorrelations
   IACFSTD     inverse autocorrelation standard errors
   IACF2STD    two standard errors beyond inverse autocorrelation
   IACFNORM    normalized inverse autocorrelations
   IACFPROB    inverse autocorrelation probabilities
   IACFLPROB   inverse autocorrelation log probabilities
   WN          white noise test statistics
   WNPROB      white noise test probabilities
   WNLPROB     white noise test log probabilities
The preceding correlation statistics are computed for each specified time lag.

When the CORR statement TRANSPOSE=YES option is specified, the variable values are related to correlation statistics specified in the CORR statement and the variable names are related to the NLAG= or LAGS= options.

   _NAME_      variable name
   _STAT_      correlation statistic name
   _LABEL_     correlation statistic label
   LAGh        correlation statistics for lag h
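A minimal sketch of the transposed layout, assuming the monthly SASHELP.AIR data set used later in this chapter; the output data set name CORR_T is hypothetical:

```sas
/* With TRANSPOSE=YES, each requested statistic becomes one observation
   (identified by _STAT_ and _LABEL_) and each lag becomes a variable. */
proc timeseries data=sashelp.air out=_null_ outcorr=corr_t;
   id date interval=month;
   var air;
   corr acf pacf / nlag=6 transpose=yes;
run;

proc print data=corr_t;
run;
```

This layout is convenient when the statistics for all lags of one series need to sit on a single row, for example as input to clustering or scoring steps.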
OUTCROSSCORR= Data Set

The OUTCROSSCORR= data set contains the variables specified in the BY statement as well as the variables listed below. The OUTCROSSCORR= data set records the cross-correlations for each variable specified in the VAR and CROSSVAR statements.

When the CROSSCORR statement TRANSPOSE=NO option is omitted or specified explicitly, the variable names are related to cross-correlation statistics specified in the CROSSCORR statement options and the variable values are related to the NLAG= or LAGS= option.

   _NAME_      variable name
   _CROSS_     cross variable name
   LAG         time lag
   N           number of variance products
   CCOV        cross-covariances
   CCF         cross-correlations
   CCFSTD      cross-correlation standard errors
   CCF2STD     two standard errors beyond cross-correlation
   CCFNORM     normalized cross-correlations
   CCFPROB     cross-correlation probabilities
   CCFLPROB    cross-correlation log probabilities

The preceding cross-correlation statistics are computed for each specified time lag.

When the CROSSCORR statement TRANSPOSE=YES option is specified, the variable values are related to cross-correlation statistics specified in the CROSSCORR statement and the variable names are related to the NLAG= or LAGS= options.

   _NAME_      variable name
   _CROSS_     cross variable name
   _STAT_      cross-correlation statistic name
   _LABEL_     cross-correlation statistic label
   LAGh        cross-correlation statistics for lag h
OUTDECOMP= Data Set

The OUTDECOMP= data set contains the variables specified in the BY statement as well as the variables listed below. The OUTDECOMP= data set records the seasonal decomposition/adjustments for each variable specified in a VAR statement (not the CROSSVAR statement).

When the DECOMP statement TRANSPOSE=NO option is omitted or specified explicitly, the variable names are related to decomposition/adjustments specified in the DECOMP statement and the variable values are related to the ID statement INTERVAL= option and the PROC TIMESERIES statement SEASONALITY= option.

   _NAME_      variable name
   _MODE_      mode of decomposition
   _TIMEID_    time ID values
   _SEASON_    seasonal index
   ORIGINAL    original series values
   TCC         trend-cycle component
   SIC         seasonal-irregular component
   SC          seasonal component
   SCSTD       seasonal component standard errors
   TCS         trend-cycle-seasonal component
   IC          irregular component
   SA          seasonally adjusted series
   PCSA        percent change seasonally adjusted series
   TC          trend component
   CC          cycle component

The preceding decomposition components are computed for each time period.

When the DECOMP statement TRANSPOSE=YES option is specified, the variable values are related to decomposition/adjustments specified in the DECOMP statement and the variable names are related to the ID statement INTERVAL= option, the PROC TIMESERIES statement SEASONALITY= option, and the DECOMP statement NPERIODS= option.

   _NAME_      variable name
   _MODE_      mode of decomposition name
   _COMP_      decomposition component name
   _LABEL_     decomposition component label
   PERIODt     decomposition component value for time period t
OUTSEASON= Data Set

The OUTSEASON= data set contains the variables specified in the BY statement as well as the variables listed below. The OUTSEASON= data set records the seasonal statistics for each variable specified in a VAR statement (not the CROSSVAR statement).

When the SEASON statement TRANSPOSE=NO option is omitted or specified explicitly, the variable names are related to seasonal statistics specified in the SEASON statement and the variable values are related to the ID statement INTERVAL= option or the PROC TIMESERIES statement SEASONALITY= option.

   _NAME_      variable name
   _TIMEID_    time ID values
   _SEASON_    seasonal index
   NOBS        number of observations
   N           number of nonmissing observations
   NMISS       number of missing observations
   MINIMUM     minimum value
   MAXIMUM     maximum value
   RANGE       range value
   SUM         summation value
   MEAN        mean value
   STDDEV      standard deviation
   CSS         corrected sum of squares
   USS         uncorrected sum of squares
   MEDIAN      median value

The preceding statistics are computed for each season.

When the SEASON statement TRANSPOSE=YES option is specified, the variable values are related to seasonal statistics specified in the SEASON statement and the variable names are related to the ID statement INTERVAL= option or the PROC TIMESERIES statement SEASONALITY= option.

   _NAME_      variable name
   _STAT_      season statistic name
   _LABEL_     season statistic label
   SEASONs     season statistic value for season s
OUTSUM= Data Set

The OUTSUM= data set contains the variables specified in the BY statement as well as the variables listed below. The OUTSUM= data set records the descriptive statistics for each variable specified in a VAR statement (not the CROSSVAR statement).

Variables related to descriptive statistics are based on the ACCUMULATE= and SETMISSING= options in the ID and VAR statements:

   _NAME_      variable name
   _STATUS_    status flag that indicates whether the requested analyses were successful
   NOBS        number of observations
   N           number of nonmissing observations
   NMISS       number of missing observations
   MINIMUM     minimum value
   MAXIMUM     maximum value
   AVG         average value
   STDDEV      standard deviation

The OUTSUM= data set contains the descriptive statistics of the (accumulated) time series.
OUTTREND= Data Set

The OUTTREND= data set contains the variables specified in the BY statement as well as the variables listed below. The OUTTREND= data set records the trend statistics for each variable specified in a VAR statement (not the CROSSVAR statement).

When the TREND statement TRANSPOSE=NO option is omitted or explicitly specified, the variable names are related to trend statistics specified in the TREND statement and the variable values are related to the ID statement INTERVAL= option or the PROC TIMESERIES statement SEASONALITY= option.

   _NAME_      variable name
   _TIMEID_    time ID values
   _SEASON_    seasonal index
   NOBS        number of observations
   N           number of nonmissing observations
   NMISS       number of missing observations
   MINIMUM     minimum value
   MAXIMUM     maximum value
   RANGE       range value
   SUM         summation value
   MEAN        mean value
   STDDEV      standard deviation
   CSS         corrected sum of squares
   USS         uncorrected sum of squares
   MEDIAN      median value

The preceding statistics are computed for each time period.

When the TREND statement TRANSPOSE=YES option is specified, the variable values are related to trend statistics specified in the TREND statement and the variable names are related to the ID statement INTERVAL= option, the PROC TIMESERIES statement SEASONALITY= option, and the TREND statement NPERIODS= option.

   _NAME_      variable name
   _STAT_      trend statistic name
   _LABEL_     trend statistic label
   PERIODt     trend statistic value for time period t
_STATUS_ Variable Values

The _STATUS_ variable contains a code that specifies whether the analysis has been successful or not. The _STATUS_ variable can take the following values:

   0        success
   1000     transactional trend statistics failure
   2000     transactional seasonal statistics failure
   3000     accumulation failure
   4000     missing value interpretation failure
   6000     series is all missing
   7000     transformation failure
   8000     differencing failure
   9000     unable to compute descriptive statistics
   10000    seasonal decomposition failure
   11000    correlation analysis failure
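Because _STATUS_ is recorded in the OUTSUM= data set, failed series can be listed after a run. A minimal sketch, assuming the monthly SASHELP.AIR data set used later in this chapter; the data set name SUMMARY is hypothetical:

```sas
/* Record descriptive statistics and the status flag for each series. */
proc timeseries data=sashelp.air out=_null_ outsum=summary;
   id date interval=month;
   var air;
run;

/* List only the series whose analysis did not complete successfully. */
proc print data=summary;
   where _status_ ne 0;
   var _name_ _status_;
run;
```

With many BY groups or VAR variables, this kind of filter is a quick way to find which series need attention before the results are used downstream.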
Printed Output

The TIMESERIES procedure optionally produces printed output by using the Output Delivery System (ODS). By default, the procedure produces no printed output. All output is controlled by the PRINT= and PRINTDETAILS options associated with the PROC TIMESERIES statement. In general, if an analysis step related to printed output fails, the values of this step are not printed and appropriate error and/or warning messages are recorded in the log. The printed output is similar to the output data set, and these similarities are described below.

The printed output produced by different printing option values is described as follows:

   PRINT=DECOMP      prints the seasonal decomposition similar to the OUTDECOMP= data set.
   PRINT=DESCSTATS   prints a table of descriptive statistics for each variable.
   PRINT=SEASONS     prints the seasonal statistics similar to the OUTSEASON= data set.
   PRINT=SUMMARY     prints the summary statistics similar to the OUTSUM= data set.
   PRINT=TRENDS      prints the trend statistics similar to the OUTTREND= data set.
   PRINTDETAILS      prints each table with greater detail. If the PRINT=SEASONS and PRINTDETAILS options are both specified, all seasonal statistics are printed.
ODS Table Names

Table 27.3 relates the PRINT= options to ODS tables:

Table 27.3 ODS Tables Produced in PROC TIMESERIES

   ODS Table Name          Description              Statement   Option
   SeasonalDecomposition   seasonal decomposition   PRINT       DECOMP
   DescStats               descriptive statistics   PRINT       DESCSTATS
   GlobalStatistics        global statistics        PRINT       SEASONS
   SeasonStatistics        season statistics        PRINT       SEASONS
   StatisticsSummary       statistics summary       PRINT       SUMMARY
   TrendStatistics         trend statistics         PRINT       TRENDS
   GlobalStatistics        global statistics        PRINT       TRENDS

The tables are related to a single series within a BY group.
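The table names can be used with the ODS OUTPUT statement to capture a printed table as a data set. A minimal sketch, assuming the monthly SASHELP.AIR data set used later in this chapter; the data set name SEASON_STATS is hypothetical:

```sas
/* Capture the SeasonStatistics table that PRINT=SEASONS produces. */
ods output SeasonStatistics=work.season_stats;

proc timeseries data=sashelp.air out=_null_ print=seasons;
   id date interval=month;
   var air;
run;
```

The ODS OUTPUT statement must precede the PROC TIMESERIES step that generates the table.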
ODS Graphics Names

This section describes the graphical output produced by the TIMESERIES procedure. To request these graphs, you must specify the ODS GRAPHICS ON; statement in your SAS program before the PROC TIMESERIES step, and you must specify the PLOTS= or CROSSPLOTS= option in the PROC TIMESERIES statement.

PROC TIMESERIES assigns a name to each graph it creates. These names are listed in Table 27.4.

Table 27.4 ODS Graphics Produced by PROC TIMESERIES

   ODS Graph Name                   Plot Description                                Statement    Option
   ACFPlot                          autocorrelation function                        PLOTS        ACF
   ACFNORMPlot                      normalized autocorrelation function             PLOTS        ACF
   CCFNORMPlot                      normalized cross-correlation function           CROSSPLOTS   CCF
   CCFPlot                          cross-correlation function                      CROSSPLOTS   CCF
   CorrelationPlots                 correlation graphics panel                      PLOTS        CORR
   CrossSeriesPlot                  cross series plot                               CROSSPLOTS   SERIES
   CycleComponentPlot               cycle component                                 PLOTS        CC
   DecompositionPlots               decomposition graphics panel                    PLOTS        DECOMP
   IACFPlot                         inverse autocorrelation function                PLOTS        IACF
   IACFNORMPlot                     normalized inverse autocorrelation function     PLOTS        IACF
   IrregularComponentPlot           irregular component                             PLOTS        IC
   PACFPlot                         partial autocorrelation function                PLOTS        PACF
   PACFNORMPlot                     standardized partial autocorrelation function   PLOTS        PACF
   PercentChangeAdjustedplot        percent-change seasonally adjusted              PLOTS        PCSA
   ResidualPlot                     residual time series plot                       PLOTS        RESIDUAL
   SeasonallyAdjustedPlot           seasonally adjusted                             PLOTS        SA
   SeasonalComponentPlot            seasonal component                              PLOTS        SC
   SeasonalCyclePlot                seasonal cycles plot                            PLOTS        CYCLES
   SeasonalIrregularComponentPlot   seasonal-irregular component                    PLOTS        SIC
   SeriesPlot                       time series plot                                PLOTS        SERIES
   TrendComponentPlot               trend component                                 PLOTS        TC
   TrendCycleComponentPlot          trend-cycle component                           PLOTS        TCC
   TrendCycleSeasonalPlot           trend-cycle-seasonal component                  PLOTS        TCS
   WhiteNoiseLogProbabilityPlot     white noise log probability                     PLOTS        WN
   WhiteNoiseProbabilityPlot        white noise probability                         PLOTS        WN
Examples: TIMESERIES Procedure
Example 27.1: Accumulating Transactional Data into Time Series Data

This example illustrates using the TIMESERIES procedure to accumulate time-stamped transactional data that has been recorded at no particular frequency into time series data at a specific frequency. After the time series is created, the various SAS/ETS procedures related to time series analysis, seasonal adjustment/decomposition, modeling, and forecasting can be used to further analyze the time series data.

Suppose that the input data set WORK.RETAIL contains variables STORE and TIMESTAMP and numerous other numeric transaction variables. The BY variable STORE contains values that break up the transactions into groups (BY groups). The time ID variable TIMESTAMP contains SAS date values recorded at no particular frequency. The other data set variables contain the numeric transaction values to be analyzed. It is further assumed that the input data set is sorted by the variables STORE and TIMESTAMP.

The following statements form monthly time series from the transactional data based on the median value (ACCUMULATE=MEDIAN) of the transactions recorded within each time period. Also, the accumulated time series values for time periods with no transactions are set to zero instead of to missing (SETMISS=0), and only transactions recorded between the first day of 1998 (START='01JAN1998'D) and the last day of 2000 (END='31DEC2000'D) are considered and, if needed, extended to include this range.

   proc timeseries data=retail out=mseries;
      by store;
      id timestamp interval=month accumulate=median setmiss=0
                   start='01jan1998'd
                   end  ='31dec2000'd;
      var _numeric_;
   run;

The monthly time series data are stored in the data set WORK.MSERIES. Each BY group associated with the BY variable STORE contains an observation for each of the 36 months associated with the years 1998, 1999, and 2000. Each observation contains the variables STORE and TIMESTAMP and each of the analysis variables in the input data set.

After each set of transactions has been accumulated to form corresponding time series, the accumulated time series can be analyzed using various time series analysis techniques. For example, exponentially weighted moving averages can be used to smooth each series. The following statements use the EXPAND procedure to smooth the analysis variable named STOREITEM.

   proc expand data=mseries out=smoothed from=month;
      by store;
      id timestamp;
      convert storeitem=smooth / transform=(ewma 0.1);
   run;

The smoothed series are stored in the data set WORK.SMOOTHED. The variable SMOOTH contains the smoothed series.

If the time ID variable TIMESTAMP contains SAS datetime values instead of SAS date values, the INTERVAL=, START=, and END= options must be changed accordingly, and the following statements could be used:

   proc timeseries data=retail out=tseries;
      by store;
      id timestamp interval=dtmonth accumulate=median setmiss=0
                   start='01jan1998:00:00:00'dt
                   end  ='31dec2000:00:00:00'dt;
      var _numeric_;
   run;

The monthly time series data are stored in the data set WORK.TSERIES, and the time ID values use a SAS datetime representation.
Example 27.2: Trend and Seasonal Analysis

This example illustrates using the TIMESERIES procedure for trend and seasonal analysis of time-stamped transactional data.

Suppose that the data set SASHELP.AIR contains two variables: DATE and AIR. The variable DATE contains sorted SAS date values recorded at no particular frequency. The variable AIR contains the transaction values to be analyzed.

The following statements accumulate the transactional data on an average basis to form a quarterly time series and perform trend and seasonal analysis on the transactions.

   proc timeseries data=sashelp.air
                   out=series
                   outtrend=trend
                   outseason=season print=seasons;
      id date interval=qtr accumulate=avg;
      var air;
   run;

The time series is stored in the data set WORK.SERIES, the trend statistics are stored in the data set WORK.TREND, and the seasonal statistics are stored in the data set WORK.SEASON. Additionally, the seasonal statistics are printed (PRINT=SEASONS) and the results of the seasonal analysis are shown in Output 27.2.1.

Output 27.2.1 Seasonal Statistics Table

   The TIMESERIES Procedure

   Season Statistics for Variable AIR

   Season                                                        Standard
    Index     N     Minimum     Maximum        Sum       Mean    Deviation
        1    36    112.0000    419.0000    8963.00   248.9722     95.65189
        2    36    121.0000    535.0000   10207.00   283.5278    117.61839
        3    36    136.0000    622.0000   12058.00   334.9444    143.97935
        4    36    104.0000    461.0000    9135.00   253.7500    101.34732

Using the trend statistics stored in the WORK.TREND data set, the following statements plot various trend statistics associated with each time period over time.

   title1 "Trend Statistics";
   proc sgplot data=trend;
      series x=date y=max  / lineattrs=(pattern=solid);
      series x=date y=mean / lineattrs=(pattern=solid);
      series x=date y=min  / lineattrs=(pattern=solid);
      yaxis display=(nolabel);
      format date year4.;
   run;

The results of this trend analysis are shown in Output 27.2.2.

Output 27.2.2 Trend Statistics Plot

Using the trend statistics stored in the WORK.TREND data set, the following statements chart the sum of the transactions associated with each time period for the second season over time.

   title1 "Trend Statistics for 2nd Season";
   proc sgplot data=trend;
      where _season_ = 2;
      vbar date / freq=sum;
      format date year4.;
      yaxis label='Sum';
   run;

The results of this trend analysis are shown in Output 27.2.3.

Output 27.2.3 Trend Statistics Bar Chart

Using the trend statistics stored in the WORK.TREND data set, the following statements plot the mean of the transactions associated with each time period by each year over time.

   data trend;
      set trend;
      year = year(date);
   run;

   title1 "Trend Statistics by Year";
   proc sgplot data=trend;
      series x=_season_ y=mean / group=year lineattrs=(pattern=solid);
      xaxis values=(1 to 4 by 1);
   run;

The results of this trend analysis are shown in Output 27.2.4.

Output 27.2.4 Trend Statistics

Using the season statistics stored in the WORK.SEASON data set, the following statements plot various season statistics for each season.

   title1 "Seasonal Statistics";
   proc sgplot data=season;
      series x=_season_ y=max  / lineattrs=(pattern=solid);
      series x=_season_ y=mean / lineattrs=(pattern=solid);
      series x=_season_ y=min  / lineattrs=(pattern=solid);
      yaxis display=(nolabel);
      xaxis values=(1 to 4 by 1);
   run;

The results of this seasonal analysis are shown in Output 27.2.5.
Output 27.2.5 Seasonal Statistics Plot
Example 27.3: Illustration of ODS Graphics

This example illustrates the use of ODS graphics.

The following statements use the SASHELP.WORKERS data set to study the time series of electrical workers and its interaction with the series of masonry workers. The series plot, the correlation panel, the seasonal adjustment panel, and all cross-series plots are requested. Output 27.3.1 through Output 27.3.4 show a selection of the plots created.

The graphical displays are requested by specifying the ODS GRAPHICS statement and the PLOTS= or CROSSPLOTS= options in the PROC TIMESERIES statement. For information about the graphics available in the TIMESERIES procedure, see the section "ODS Graphics Names."

   title "Illustration of ODS Graphics";
   proc timeseries data=sashelp.workers out=_null_
                   plots=(series corr decomp)
                   crossplots=all;
      id date interval=month;
      var electric;
      crossvar masonry;
   run;
Output 27.3.1 Series Plot
Output 27.3.2 Correlation Panel
Output 27.3.3 Seasonal Decomposition Panel
Output 27.3.4 Cross-Correlation Plot
References

Greene, W.H. (1999), Econometric Analysis, Fourth Edition, New York: Macmillan.

Hodrick, R. and Prescott, E. (1980), "Post-War U.S. Business Cycles: An Empirical Investigation," Discussion Paper 451, Carnegie Mellon University.

Makridakis, S. and Wheelwright, S.C. (1978), Interactive Forecasting: Univariate and Multivariate Methods, Second Edition, San Francisco: Holden-Day, 198–201.

Pyle, D. (1999), Data Preparation for Data Mining, San Francisco: Morgan Kaufmann Publishers, Inc.

Stoffer, D.S. and Toloi, C.M.C. (1992), "A Note on the Ljung-Box-Pierce Portmanteau Statistic with Missing Data," Statistics and Probability Letters, 13, 391–396.

Wheelwright, S.C. and Makridakis, S. (1973), Forecasting Methods for Management, Third Edition, New York: Wiley-Interscience, 123–133.
Chapter 28
The TSCSREG Procedure

Contents

   Overview: The TSCSREG Procedure
   Getting Started: The TSCSREG Procedure
      Specifying the Input Data
      Unbalanced Data
      Specifying the Regression Model
      Estimation Techniques
      Introductory Example
   Syntax: The TSCSREG Procedure
      Functional Summary
      PROC TSCSREG Statement
      BY Statement
      ID Statement
      MODEL Statement
      TEST Statement
   Details: The TSCSREG Procedure
      ODS Table Names
   Examples: The TSCSREG Procedure
   Acknowledgments: TSCSREG Procedure
   References: TSCSREG Procedure
Overview: The TSCSREG Procedure

The TSCSREG (time series cross section regression) procedure analyzes a class of linear econometric models that commonly arise when time series and cross-sectional data are combined. The TSCSREG procedure deals with panel data sets that consist of time series observations on each of several cross-sectional units.

The TSCSREG procedure is very similar to the PANEL procedure; for a full description, syntax details, models, and estimation methods, see Chapter 19, "The PANEL Procedure." The TSCSREG procedure is no longer being updated, and it shares its code base with the PANEL procedure.
1728 F Chapter 28: The TSCSREG Procedure
Getting Started: The TSCSREG Procedure
Specifying the Input Data

The input data set used by the TSCSREG procedure must be sorted by cross section and by time within each cross section. Therefore, the first step in using PROC TSCSREG is to make sure that the input data set is sorted. Normally, the input data set contains a variable that identifies the cross section for each observation and a variable that identifies the time period for each observation.

To illustrate, suppose that you have a data set A that contains data over time for each of several states. You want to regress the variable Y on regressors X1 and X2. Cross sections are identified by the variable STATE, and time periods are identified by the variable DATE. The following statements sort the data set A appropriately:

   proc sort data=a;
      by state date;
   run;
The next step is to invoke the TSCSREG procedure and specify the cross section and time series variables in an ID statement. List the variables in the ID statement exactly as they are listed in the BY statement.

   proc tscsreg data=a;
      id state date;
Alternatively, you can omit the ID statement and use the CS= and TS= options on the PROC TSCSREG statement to specify the number of cross sections in the data set and the number of time series observations in each cross section.
Unbalanced Data

In the case of fixed-effects and random-effects models, the TSCSREG procedure is capable of processing data with different numbers of time series observations across different cross sections. You must specify the ID statement to estimate models that use unbalanced data. The missing time series observations are recognized by the absence of time series ID variable values in some of the cross sections in the input data set. Moreover, if an observation with a particular time series ID value and cross-sectional ID value is present in the input data set, but one or more of the model variables are missing, that time series point is treated as missing for that cross section.
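The sorting requirement and the treatment of missing values described above can be checked on a small data set before the procedure is run. The following Python sketch is only an illustration and is not part of SAS: the function name describe_panel, the row layout, and the sample values are hypothetical. It assumes that a time point counts as observed only when every model variable is present, as stated above.

```python
def describe_panel(rows):
    """rows: list of (cross_section_id, time_id, values) in data set order.

    values is a tuple of model variables; None marks a missing value.
    Returns (is_sorted, is_balanced, observed_counts).
    """
    # Sorted by cross section first, then by time within each cross section
    keys = [(cs, ts) for cs, ts, _ in rows]
    is_sorted = all(a <= b for a, b in zip(keys, keys[1:]))

    # A time point is kept only if none of the model variables is missing
    observed = {}
    for cs, ts, values in rows:
        observed.setdefault(cs, [])
        if all(v is not None for v in values):
            observed[cs].append(ts)

    # Balanced: every cross section is observed at the same set of time points
    time_sets = {frozenset(ts_list) for ts_list in observed.values()}
    is_balanced = len(time_sets) == 1
    counts = {cs: len(ts_list) for cs, ts_list in observed.items()}
    return is_sorted, is_balanced, counts


# Hypothetical sample: the third CA observation has a missing regressor
sample_rows = [
    ("CA", 1, (1.0, 2.0)), ("CA", 2, (1.5, 2.5)), ("CA", 3, (2.0, None)),
    ("NY", 1, (0.5, 1.0)), ("NY", 2, (0.7, 1.2)), ("NY", 3, (0.9, 1.4)),
]
print(describe_panel(sample_rows))  # (True, False, {'CA': 2, 'NY': 3})
```

In this sample the data are sorted correctly but unbalanced, so an ID statement would be needed rather than the CS= and TS= options.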
Specifying the Regression Model

Next, specify the linear regression model with a MODEL statement, as shown in the following statements.

   proc tscsreg data=a;
      id state date;
      model y = x1 x2;
   run;
The MODEL statement in PROC TSCSREG is specified like the MODEL statement in other SAS regression procedures: the dependent variable is listed first, followed by an equal sign, followed by the list of regressor variables.

The reason for using PROC TSCSREG instead of other SAS regression procedures is that you can incorporate a model for the structure of the random errors. It is important to consider what kind of error structure model is appropriate for your data and to specify the corresponding option in the MODEL statement.

The error structure options supported by the TSCSREG procedure are FIXONE, FIXTWO, RANONE, RANTWO, FULLER, PARKS, and DASILVA. See “Details: The TSCSREG Procedure” on page 1738 for more information about these methods and the error structures they assume.

By default, the two-way random-effects error model structure is used, and the variance components are estimated by the Fuller-Battese method for balanced data and by the Wansbeek-Kapteyn method for unbalanced data. Thus, the preceding example is the same as specifying the RANTWO option, as shown in the following statements:

   proc tscsreg data=a;
      id state date;
      model y = x1 x2 / rantwo;
   run;
You can specify more than one error structure option in the MODEL statement; the analysis is repeated using each method specified. You can use any number of MODEL statements to estimate different regression models or estimate the same model by using different options.

To aid in model specification within this class of models, the procedure provides two specification test statistics. The first is an F statistic that tests the null hypothesis that the fixed-effects parameters are all zero. The second is a Hausman m-statistic that provides information about the appropriateness of the random-effects specification. It is based on the idea that, under the null hypothesis of no correlation between the effects variables and the regressors, OLS and GLS are consistent, but OLS is inefficient. Hence, a test can be based on the result that the covariance of an efficient estimator with its difference from an inefficient estimator is zero. Rejection of the null hypothesis might suggest that the fixed-effects model is more appropriate.

The procedure also provides the Buse R-square measure, which is the most appropriate goodness-of-fit measure for models estimated by using GLS. This number is interpreted as a measure of the proportion of the transformed sum of squares of the dependent variable that is attributable to the influence of the independent variables. In the case of OLS estimation, the Buse R-square measure is equivalent to the usual R-square measure.
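The zero-covariance result mentioned above implies that the variance of the difference between the two estimators equals the difference of their variances, which leads to the usual quadratic form for the Hausman statistic. The following Python sketch illustrates that computation under stated assumptions: the function names solve and hausman_m are hypothetical, the estimates and covariance matrices are made-up inputs, and this is not a description of how PROC TSCSREG computes the statistic internally.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hausman_m(b_inefficient, v_inefficient, b_efficient, v_efficient):
    """m = q' (V_inefficient - V_efficient)^(-1) q, with q the estimate difference."""
    q = [bi - be for bi, be in zip(b_inefficient, b_efficient)]
    k = len(q)
    v_diff = [[v_inefficient[i][j] - v_efficient[i][j] for j in range(k)]
              for i in range(k)]
    z = solve(v_diff, q)  # z = v_diff^(-1) q
    return sum(qi * zi for qi, zi in zip(q, z))


# Made-up two-regressor example: q = [1, 2], variance difference = diag(2, 4)
m_value = hausman_m([2.0, 3.0], [[3.0, 0.0], [0.0, 5.0]],
                    [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
print(m_value)  # 1.5
```

Under the null hypothesis the statistic is compared with a chi-square distribution whose degrees of freedom equal the number of compared coefficients; a large value points toward the fixed-effects specification.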
Estimation Techniques

If the effects are fixed, the models are essentially regression models with dummy variables that correspond to the specified effects. For fixed-effects models, ordinary least squares (OLS) estimation is equivalent to best linear unbiased estimation.

The output from TSCSREG is identical to what one would obtain from creating dummy variables to represent the cross-sectional and time (fixed) effects. The output is presented in this manner to facilitate comparisons to the least squares dummy variables estimator (LSDV). As such, the inclusion of an intercept term implies that one dummy variable must be dropped.

The actual estimation of the fixed-effects models is not LSDV, because LSDV is much too cumbersome to implement. Instead, TSCSREG operates in a two-step fashion. In the first step, the following occurs:

One-way fixed-effects model: In the one-way fixed-effects model, the data are transformed by removing the cross-sectional means from the dependent and independent variables:

\[ \tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot} \]
\[ \tilde{x}_{it} = x_{it} - \bar{x}_{i\cdot} \]

Two-way fixed-effects model: In the two-way fixed-effects model, the data are transformed by removing the cross-sectional and time means and adding back the overall means:

\[ \tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{\bar{y}} \]
\[ \tilde{x}_{it} = x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{\bar{x}} \]

where the symbols are defined as follows:

   $y_{it}$ and $x_{it}$ are the dependent variable (a scalar) and the explanatory variables (a vector whose columns are the explanatory variables not including a constant), respectively
   $\bar{y}_{i\cdot}$ and $\bar{x}_{i\cdot}$ are cross section means
   $\bar{y}_{\cdot t}$ and $\bar{x}_{\cdot t}$ are time means
   $\bar{\bar{y}}$ and $\bar{\bar{x}}$ are the overall means

The second step consists of running OLS on the properly demeaned series, provided that the data are balanced. The unbalanced case is slightly more difficult, because the structure of the missing data must be retained. For this case, PROC TSCSREG uses a slight specialization on Wansbeek and Kapteyn.
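The two-step approach for balanced data can be sketched in a few lines. The following Python example is an illustration under stated assumptions, not the procedure's implementation: the function names demean and ols_slope are hypothetical, there is a single regressor, and the data are generated without noise so that the true slope of 2 is recovered once the cross-sectional and time effects are removed.

```python
def mean(values):
    return sum(values) / len(values)


def demean(panel, two_way=False):
    """panel: dict mapping (i, t) -> value for a balanced panel.

    One-way: subtract the cross section mean.
    Two-way: also subtract the time mean and add back the overall mean.
    """
    units = sorted({i for i, _ in panel})
    times = sorted({t for _, t in panel})
    unit_mean = {i: mean([panel[i, t] for t in times]) for i in units}
    time_mean = {t: mean([panel[i, t] for i in units]) for t in times}
    overall = mean(list(panel.values()))
    out = {}
    for (i, t), value in panel.items():
        out[i, t] = value - unit_mean[i]
        if two_way:
            out[i, t] += overall - time_mean[t]
    return out


def ols_slope(x, y):
    """OLS slope of y on one regressor x; both are already demeaned."""
    return sum(x[k] * y[k] for k in x) / sum(v * v for v in x.values())


# Hypothetical balanced panel: 2 cross sections, 3 time periods
x = {(1, 1): 1.0, (1, 2): 2.0, (1, 3): 4.0,
     (2, 1): 3.0, (2, 2): 5.0, (2, 3): 6.0}
alpha = {1: 10.0, 2: -5.0}        # cross-sectional effects
lam = {1: 0.0, 2: 1.0, 3: -2.0}   # time effects
y_one = {k: alpha[k[0]] + 2.0 * x[k] for k in x}
y_two = {k: alpha[k[0]] + lam[k[1]] + 2.0 * x[k] for k in x}

slope_one = ols_slope(demean(x), demean(y_one))
slope_two = ols_slope(demean(x, two_way=True), demean(y_two, two_way=True))
print(round(slope_one, 6), round(slope_two, 6))  # 2.0 2.0
```

Because the effects are swept out by the transformation, no dummy variables have to be created, which is the practical advantage over LSDV noted above.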
The other alternative is to assume that the effects are random. In the one-way case, $E(\nu_i) = 0$, $E(\nu_i^2) = \sigma_\nu^2$, and $E(\nu_i \nu_j) = 0$ for $i \ne j$, and $\nu_i$ is uncorrelated with $\epsilon_{it}$ for all $i$ and $t$. In the two-way case, in addition to all of the preceding, $E(e_t) = 0$, $E(e_t^2) = \sigma_e^2$, and $E(e_t e_s) = 0$ for $t \ne s$, and the $e_t$ are uncorrelated with the $\nu_i$ and the $\epsilon_{it}$ for all $i$ and $t$. Thus, the model is a variance components model, with the variance components $\sigma_\nu^2$, $\sigma_e^2$, and $\sigma_\epsilon^2$ to be estimated. A crucial implication of such a specification is that the effects are independent of the regressors.

For random-effects models, the estimation method is an estimated generalized least squares (EGLS) procedure that involves estimating the variance components in the first stage and using the estimated variance covariance matrix thus obtained to apply generalized least squares (GLS) to the data.
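For the one-way model with balanced data, the second (GLS) stage is commonly written as OLS on partially demeaned data, where a fraction $\theta = 1 - \sqrt{\sigma_\epsilon^2 / (T \sigma_\nu^2 + \sigma_\epsilon^2)}$ of each cross section mean is subtracted. This is a standard textbook form and is stated here as an assumption; the chapter itself does not give it, and the two-way case is more involved. The following Python sketch uses hypothetical function names and made-up variance component values.

```python
import math


def gls_theta(var_cross, var_error, t_len):
    """Share of the cross section mean removed in the one-way GLS transform."""
    return 1.0 - math.sqrt(var_error / (t_len * var_cross + var_error))


def quasi_demean(series_by_unit, theta):
    """Subtract theta times the cross section mean from every observation."""
    out = {}
    for unit, values in series_by_unit.items():
        unit_mean = sum(values) / len(values)
        out[unit] = [v - theta * unit_mean for v in values]
    return out


# Made-up first-stage estimates: cross section variance 0.75, error variance 1, T = 4
theta = gls_theta(0.75, 1.0, 4)
print(theta)                                        # 0.5
print(quasi_demean({"firm1": [1.0, 3.0]}, theta))   # {'firm1': [0.0, 2.0]}
```

When the cross section variance component is zero, $\theta$ is 0 and the transform leaves the data unchanged (pooled OLS); as $T \sigma_\nu^2$ grows, $\theta$ approaches 1 and the transform approaches the fixed-effects demeaning.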
Introductory Example

The following example uses the cost function data from Greene (1990) to estimate the variance components model. The variable OUTPUT is the log of output in millions of kilowatt-hours, and COST is the log of cost in millions of dollars. Refer to Greene (1990) for details.

   title1;
   data greene;
      input firm year output cost @@;
      df1 = firm = 1;
      df2 = firm = 2;
      df3 = firm = 3;
      df4 = firm = 4;
      df5 = firm = 5;
      d60 = year = 1960;
      d65 = year = 1965;
      d70 = year = 1970;
   datalines;

   ... more lines ...
Usually you cannot explicitly specify all the explanatory variables that affect the dependent variable. The omitted or unobservable variables are summarized in the error disturbances. The TSCSREG procedure used with the RANTWO option specifies the two-way random-effects error model. Because the data are balanced, the variance components are estimated by the Fuller-Battese method, and the parameters are efficiently estimated by using the GLS method. The variance components model used by the Fuller-Battese method is

\[ y_{it} = \sum_{k=1}^{K} X_{itk} \beta_k + v_i + e_t + \epsilon_{it}, \quad i = 1, \ldots, N; \; t = 1, \ldots, T \]
The following statements fit this model.

   proc sort data=greene;
      by firm year;
   run;

   proc tscsreg data=greene;
      model cost = output / rantwo;
      id firm year;
   run;
The TSCSREG procedure output is shown in Figure 28.1. A model description is printed first; it reports the estimation method used and the number of cross sections and time periods. The variance components estimates are printed next. Finally, the table of regression parameter estimates shows the estimates, standard errors, and t tests.

Figure 28.1 The Variance Components Estimates

   The TSCSREG Procedure
   Fuller and Battese Variance Components (RanTwo)
   Dependent Variable: cost

   Model Description
   Estimation Method            RanTwo
   Number of Cross Sections     6
   Time Series Length           4

   Fit Statistics
   SSE         0.3481     DFE         22
   MSE         0.0158     Root MSE    0.1258
   R-Square    0.8136

   Variance Component Estimates
   Variance Component for Cross Sections    0.046907
   Variance Component for Time Series       0.00906
   Variance Component for Error             0.008749

   Hausman Test for Random Effects
   DF    m Value    Pr > m
   1     26.46      <.0001

   Parameter Estimates
   Variable     DF    Estimate    Standard Error    t Value    Pr > |t|
   Intercept    1     -2.99992    0.6478            -4.63      0.0001
   output       1     0.746596    0.0762            9.80       <.0001
Syntax: The TSCSREG Procedure

The following statements are used with the TSCSREG procedure.

   PROC TSCSREG options ;
      BY variables ;
      ID cross-section-id-variable time-series-id-variable ;
      MODEL dependent = regressor-variables / options ;
      TEST equation1 < ,equation2. . . > ;
Functional Summary

The statements and options used with the TSCSREG procedure are summarized in the following table.

Table 28.1 Functional Summary

   Description                                              Statement   Option

   Data Set Options
   specify the input data set                               TSCSREG     DATA=
   write parameter estimates to an output data set          TSCSREG     OUTEST=
   include correlations in the OUTEST= data set             TSCSREG     CORROUT
   include covariances in the OUTEST= data set              TSCSREG     COVOUT
   specify number of time series observations               TSCSREG     TS=
   specify number of cross sections                         TSCSREG     CS=

   Declaring the Role of Variables
   specify BY-group processing                              BY
   specify the cross section and time ID variables          ID

   Printing Control Options
   print correlations of the estimates                      MODEL       CORRB
   print covariances of the estimates                       MODEL       COVB
   suppress printed output                                  MODEL       NOPRINT
   perform tests of linear hypotheses                       TEST

   Model Estimation Options
   specify the one-way fixed-effects model                  MODEL       FIXONE
   specify the two-way fixed-effects model                  MODEL       FIXTWO
   specify the one-way random-effects model                 MODEL       RANONE
   specify the two-way random-effects model                 MODEL       RANTWO
   specify Fuller-Battese method                            MODEL       FULLER
   specify Parks method                                     MODEL       PARKS
   specify Da Silva method                                  MODEL       DASILVA
   specify order of the moving-average error process
      for Da Silva method                                   MODEL       M=
   print $\Phi$ matrix for Parks method                     MODEL       PHI
   print autocorrelation coefficients for Parks method      MODEL       RHO
   suppress the intercept term                              MODEL       NOINT
   control check for singularity                            MODEL       SINGULAR=
PROC TSCSREG Statement

   PROC TSCSREG options ;

The following options can be specified in the PROC TSCSREG statement.

DATA=SAS-data-set
   names the input data set. The input data set must be sorted by cross section and by time period within cross section. If you omit the DATA= option, the most recently created SAS data set is used.

TS=number
   specifies the number of observations in the time series for each cross section. The TS= option value must be greater than 1. The TS= option is required unless an ID statement is used. Note that the number of observations for each time series must be the same for each cross section and must cover the same time period.

CS=number
   specifies the number of cross sections. The CS= option value must be greater than 1. The CS= option is required unless an ID statement is used.

OUTEST=SAS-data-set
   names an output data set to contain the parameter estimates. When the OUTEST= option is not specified, the OUTEST= data set is not created.

OUTCOV
COVOUT
   writes the covariance matrix of the parameter estimates to the OUTEST= data set.

OUTCORR
CORROUT
   writes the correlation matrix of the parameter estimates to the OUTEST= data set.

In addition, any of the following MODEL statement options can be specified in the PROC TSCSREG statement: CORRB, COVB, FIXONE, FIXTWO, RANONE, RANTWO, FULLER, PARKS, DASILVA, NOINT, NOPRINT, M=, PHI, RHO, and SINGULAR=. When specified in the PROC TSCSREG statement, these options are equivalent to specifying the options for every MODEL statement.
BY Statement

   BY variables ;

A BY statement can be used with PROC TSCSREG to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the input data set must be sorted by the BY variables as well as by cross section and time period within the BY groups.

When both an ID statement and a BY statement are specified, the input data set must be sorted first with respect to BY variables and then with respect to the cross section and time series ID variables. For example,

   proc sort data=a;
      by byvar1 byvar2 csid tsid;
   run;

   proc tscsreg data=a;
      by byvar1 byvar2;
      id csid tsid;
      ...
   run;

When both a BY statement and an ID statement are used, the data set might have a different number of cross sections or a different number of time periods in each BY group. If no ID statement is used, the CS=N and TS=T options must be specified and each BY group must contain N × T observations.
ID Statement

   ID cross-section-id-variable time-series-id-variable ;

The ID statement is used to specify variables in the input data set that identify the cross section and time period for each observation.

When an ID statement is used, the TSCSREG procedure verifies that the input data set is sorted by the cross section ID variable and by the time series ID variable within each cross section. The TSCSREG procedure also verifies that the time series ID values are the same for all cross sections.

To make sure the input data set is correctly sorted, use PROC SORT with a BY statement with the variables listed exactly as they are listed in the ID statement to sort the input data set. For example,

   proc sort data=a;
      by csid tsid;
   run;

   proc tscsreg data=a;
      id csid tsid;
      ... etc. ...
   run;

If the ID statement is not used, the TS= and CS= options must be specified on the PROC TSCSREG statement. Note that the input data must be sorted by time within cross section, regardless of whether the cross section structure is given by an ID statement or by the options TS= and CS=.

If an ID statement is specified, the time series length T is set to the minimum number of observations for any cross section, and only the first T observations in each cross section are used. If both the ID statement and the TS= and CS= options are specified, the TS= and CS= options are ignored.
MODEL Statement

   MODEL response = regressors / options ;

The MODEL statement specifies the regression model and the error structure assumed for the regression residuals. The response variable on the left side of the equal sign is regressed on the independent variables listed after the equal sign. Any number of MODEL statements can be used. For each MODEL statement, only one response variable can be specified on the left side of the equal sign.

The error structure is specified by the FIXONE, FIXTWO, RANONE, RANTWO, FULLER, PARKS, and DASILVA options. More than one of these options can be used, in which case the analysis is repeated for each error structure model specified.

Models can be given labels up to 32 characters in length. Model labels are used in the printed output to identify the results for different models. If no label is specified, the response variable name is used as the label for the model. The model label is specified as follows:

   label: MODEL response = regressors / options ;

The following options can be specified on the MODEL statement after a slash (/).

CORRB
CORR
   prints the matrix of estimated correlations between the parameter estimates.

COVB
VAR
   prints the matrix of estimated covariances between the parameter estimates.

FIXONE
   specifies that a one-way fixed-effects model be estimated, where the one-way model corresponds to group effects only.

FIXTWO
   specifies that a two-way fixed-effects model be estimated.

RANONE
   specifies that a one-way random-effects model be estimated.

RANTWO
   specifies that a two-way random-effects model be estimated.

FULLER
   specifies that the model be estimated by using the Fuller-Battese method, which assumes a variance components model for the error structure.

PARKS
   specifies that the model be estimated by using the Parks method, which assumes a first-order autoregressive model for the error structure.

DASILVA
   specifies that the model be estimated by using the Da Silva method, which assumes a mixed variance-component moving-average model for the error structure.

M=number
   specifies the order of the moving-average process in the Da Silva method. The M= value must be less than T − 1. The default is M=1.

PHI
   prints the $\Phi$ matrix of estimated covariances of the observations for the Parks method. The PHI option is relevant only when the PARKS option is used.

RHO
   prints the estimated autocorrelation coefficients for the Parks method.

NOINT
NOMEAN
   suppresses the intercept parameter from the model.

NOPRINT
   suppresses the normal printed output.

SINGULAR=number
   specifies a singularity criterion for the inversion of the matrix. The default depends on the precision of the computer system.
TEST Statement

   TEST equation < , equation . . . > < / options > ;

The TEST statement performs F tests of linear hypotheses about the regression parameters in the preceding MODEL statement. Each equation specifies a linear hypothesis to be tested. All hypotheses in one TEST statement are tested jointly. Variable names in the equations must correspond to regressors in the preceding MODEL statement, and each name represents the coefficient of the corresponding regressor. The keyword INTERCEPT refers to the coefficient of the intercept.

The following statements illustrate the use of the TEST statement:

   proc tscsreg;
      model y = x1 x2 x3;
      test x1 = 0, x2 * .5 + 2 * x3= 0;
      test_int: test intercept=0, x3 = 0;

Note that a test of the following form is not permitted:

   test_bad: test x2 / 2 + 2 * x3= 0;

Do not use the division sign in test/restrict statements. Write the coefficient as a multiplier instead, as in x2 * .5 in the preceding example.
Details: The TSCSREG Procedure

Models, estimators, and methods are covered in detail in Chapter 19, “The PANEL Procedure.”
ODS Table Names

PROC TSCSREG assigns a name to each table it creates. You can use these names to reference the table when you use the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table.

Table 28.2 ODS Tables Produced in PROC TSCSREG

   ODS Table Name             Description                                       Option

   ODS Tables Created by the MODEL Statement
   ModelDescription           Model description                                 default
   FitStatistics              Fit statistics                                    default
   FixedEffectsTest           F test for no fixed effects                       FIXONE, FIXTWO, RANONE, RANTWO
   ParameterEstimates         Parameter estimates                               default
   CovB                       Covariance of parameter estimates                 COVB
   CorrB                      Correlations of parameter estimates               CORRB
   VarianceComponents         Variance component estimates                      FULLER, DASILVA, M=, RANONE, RANTWO
   RandomEffectsTest          Hausman test for random effects                   FULLER, DASILVA, M=, RANONE, RANTWO
   AR1Estimates               First order autoregressive parameter estimates    PARKS, RHO
   EstimatedPhiMatrix         Estimated phi matrix                              PARKS
   EstimatedAutocovariances   Estimates of autocovariances                      DASILVA, M=

   ODS Tables Created by the TEST Statement
   TestResults                Test results
Examples: The TSCSREG Procedure

For examples of analysis of panel data, see Chapter 19, “The PANEL Procedure.”

Acknowledgments: TSCSREG Procedure

The original TSCSREG procedure was developed by Douglas J. Drummond and A. Ronald Gallant, and contributed to the Version 5 SUGI Supplemental Library in 1979.

The original code was changed substantially over the years. Additional new methods as well as other new features are currently included in the PANEL procedure. SAS Institute would like to thank Dr. Drummond and Dr. Gallant for their contribution of the original version of the TSCSREG procedure.

References: TSCSREG Procedure

Greene, W. H. (1990), Econometric Analysis, First Edition, New York: Macmillan Publishing Company.
Chapter 29
The UCM Procedure

Contents
Overview: UCM Procedure . . . 1742
Getting Started: UCM Procedure . . . 1743
   A Seasonal Series with Linear Trend . . . 1743
Syntax: UCM Procedure . . . 1750
   Functional Summary . . . 1751
   PROC UCM Statement . . . 1753
   AUTOREG Statement . . . 1757
   BLOCKSEASON Statement . . . 1758
   BY Statement . . . 1760
   CYCLE Statement . . . 1760
   DEPLAG Statement . . . 1762
   ESTIMATE Statement . . . 1763
   FORECAST Statement . . . 1765
   ID Statement . . . 1767
   IRREGULAR Statement . . . 1768
   LEVEL Statement . . . 1771
   MODEL Statement . . . 1772
   NLOPTIONS Statement . . . 1772
   OUTLIER Statement . . . 1773
   RANDOMREG Statement . . . 1773
   SEASON Statement . . . 1774
   SLOPE Statement . . . 1777
   SPLINEREG Statement . . . 1778
   SPLINESEASON Statement . . . 1779
Details: UCM Procedure . . . 1781
   An Introduction to Unobserved Component Models . . . 1781
   The UCMs as State Space Models . . . 1787
   Outlier Detection . . . 1796
   Missing Values . . . 1797
   Parameter Estimation . . . 1797
   Computational Issues . . . 1799
   Displayed Output . . . 1800
   Statistical Graphics . . . 1800
   ODS Table Names . . . 1810
   ODS Graph Names . . . 1813
   OUTFOR= Data Set . . . 1817
   OUTEST= Data Set . . . 1818
   Statistics of Fit . . . 1819
Examples: UCM Procedure . . . 1820
   Example 29.1: The Airline Series Revisited . . . 1820
   Example 29.2: Variable Star Data . . . 1826
   Example 29.3: Modeling Long Seasonal Patterns . . . 1829
   Example 29.4: Modeling Time-Varying Regression Effects Using the RANDOMREG Statement . . . 1833
   Example 29.5: Trend Removal Using the Hodrick-Prescott Filter . . . 1839
   Example 29.6: Using Splines to Incorporate Nonlinear Effects . . . 1841
   Example 29.7: Detection of Level Shift . . . 1846
   Example 29.8: ARIMA Modeling (Experimental) . . . 1849
References . . . 1853
Overview: UCM Procedure

The UCM procedure analyzes and forecasts equally spaced univariate time series data by using an unobserved components model (UCM). The UCMs are also called structural models in the time series literature. A UCM decomposes the response series into components such as trend, seasonals, cycles, and the regression effects due to predictor series. The components in the model are supposed to capture the salient features of the series that are useful in explaining and predicting its behavior. Harvey (1989) is a good reference for time series modeling that uses the UCMs. Harvey calls the components in a UCM the “stylized facts” about the series under consideration.

Traditionally, the ARIMA models and, to some limited extent, the exponential smoothing models have been the main tools in the analysis of this type of time series data. It is fair to say that the UCMs capture the versatility of the ARIMA models while possessing the interpretability of the smoothing models. A thorough discussion of the correspondence between the ARIMA models and the UCMs, and the relative merits of UCM and ARIMA modeling, is given in Harvey (1989). The UCMs are also very similar to another set of models, called the dynamic models, that are popular in the Bayesian time series literature (West and Harrison 1999).

In SAS/ETS you can use PROC ARIMA for ARIMA modeling (see Chapter 7, “The ARIMA Procedure”), PROC ESM for exponential smoothing modeling (see Chapter 13, “The ESM Procedure”), and the Time Series Forecasting System for a point-and-click interface to ARIMA and exponential smoothing modeling.

You can use the UCM procedure to fit a wide range of UCMs that can incorporate complex trend, seasonal, and cyclical patterns and can include multiple predictors. It provides a variety of diagnostic tools to assess the fitted model and to suggest the possible extensions or modifications.

The components in the UCM provide a succinct description of the underlying mechanism governing the series. You can print, save, or plot the estimates of these component series. Along with the standard forecast and residual plots, the study of these component plots is an essential part of time series analysis using the UCMs. Once a suitable UCM is found for the series under consideration, it can be used for a variety of purposes. For example, it can be used for the following:

   - forecasting the values of the response series and the component series in the model
   - obtaining a model-based seasonal decomposition of the series
   - obtaining a “denoised” version and interpolating the missing values of the response series in the historical period
   - obtaining the full sample or “smoothed” estimates of the component series in the model
Getting Started: UCM Procedure

The analysis of time series using the UCMs involves recognizing the salient features present in the series and modeling them suitably. The UCM procedure provides a variety of models for estimating and forecasting the commonly observed features in time series. These models are discussed in detail later in the section “An Introduction to Unobserved Component Models” on page 1781. First the procedure is illustrated using an example.
A Seasonal Series with Linear Trend

The airline passenger series, given as Series G in Box and Jenkins (1976), is often used in time series literature as an example of a nonstationary seasonal time series. This series is a monthly series consisting of the number of airline passengers who traveled during the years 1949 to 1960. Its main features are a steady rise in the number of passengers from year to year and the seasonal variation in the numbers during any given year. It also exhibits an increase in variability around the trend. A log transformation is used to stabilize this variability. The following DATA step prepares the log-transformed passenger series analyzed in this example:

   data seriesG;
      set sashelp.air;
      logair = log( air );
   run;
The following statements produce a time series plot of the series by using the TIMESERIES procedure (see Chapter 27, “The TIMESERIES Procedure”). The trend and seasonal features of the series are apparent in the plot in Figure 29.1.

   proc timeseries data=seriesG plot=series;
      id date interval=month;
      var logair;
   run;
Figure 29.1 Series Plot of Log-Transformed Airline Passenger Series
In this example this series is modeled using an unobserved component model called the basic structural model (BSM). The BSM models a time series as a sum of three stochastic components: a trend component $\mu_t$, a seasonal component $\gamma_t$, and random error $\epsilon_t$. Formally, a BSM for a response series $y_t$ can be described as

\[ y_t = \mu_t + \gamma_t + \epsilon_t \]

Each of the stochastic components in the model is modeled separately. The random error $\epsilon_t$, also called the irregular component, is modeled simply as a sequence of independent, identically distributed (i.i.d.) zero-mean Gaussian random variables. The trend and the seasonal components can be modeled in a few different ways. The model for trend used here is called a locally linear time trend. This trend model can be written as follows:

\[ \mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t, \quad \eta_t \sim \text{i.i.d. } N(0, \sigma_\eta^2) \]
\[ \beta_t = \beta_{t-1} + \xi_t, \quad \xi_t \sim \text{i.i.d. } N(0, \sigma_\xi^2) \]

These equations specify a trend where the level $\mu_t$ as well as the slope $\beta_t$ is allowed to vary over time. This variation in slope and level is governed by the variances of the disturbance terms $\eta_t$ and $\xi_t$ in their respective equations. Some interesting special cases of this model arise when you manipulate these disturbance variances. For example, if the variance of $\xi_t$ is zero, the slope will be constant (equal to $\beta_0$); if the variance of $\eta_t$ is also zero, $\mu_t$ will be a deterministic trend given by the line $\mu_0 + \beta_0 t$.

The seasonal model used in this example is called a trigonometric seasonal. The stochastic equations governing a trigonometric seasonal are explained later (see the section “Modeling Seasons” on page 1783). However, it is interesting to note here that this seasonal model reduces to the familiar regression with deterministic seasonal dummies if the variance of the disturbance terms in its equations is equal to zero. The following statements specify a BSM with these three components:

   proc ucm data=seriesG;
      id date interval=month;
      model logair;
      irregular;
      level;
      slope;
      season length=12 type=trig print=smooth;
      estimate;
      forecast lead=24 print=decomp;
   run;
The PROC UCM statement signifies the start of the UCM procedure, and the input data set, seriesG, containing the dependent series is specified there. The optional ID statement is used to specify a date, datetime, or time identification variable, date in this example, to label the observations. The INTERVAL=MONTH option in the ID statement indicates that the measurements were collected on a monthly basis. The model specification begins with the MODEL statement, where the response series is specified (logair in this case). After this the components in the model are specified using separate statements that enable you to control their individual properties. The irregular component $\epsilon_t$ is specified using the IRREGULAR statement and the trend component $\mu_t$ is specified using the LEVEL and SLOPE statements. The seasonal component $\gamma_t$ is specified using the SEASON statement. The specifics of the seasonal characteristics, such as the season length and its stochastic evolution properties, are specified using the options in the SEASON statement. The seasonal component used in this example has a season length of 12, corresponding to the monthly seasonality, and is of the trigonometric type. Different types of seasonals are explained later (see the section “Modeling Seasons” on page 1783). The parameters of this model are the variances of the disturbance terms in the evolution equations of $\mu_t$, $\beta_t$, and $\gamma_t$ and the variance of the irregular component $\epsilon_t$. These parameters are estimated by maximizing the likelihood of the data. The ESTIMATE statement options can be used to specify the span of data used in parameter estimation and to display and save the results of the estimation step and the model diagnostics. You can use the estimated model to obtain the forecasts of the series as well as the components.
The options in the individual component statements can be used to display the component forecasts. For example, the PRINT=SMOOTH option in the SEASON statement requests that the smoothed forecasts of the seasonal component $\gamma_t$ be displayed. The series forecasts and forecasts of the sum of components can be requested using the FORECAST statement. The option PRINT=DECOMP in the FORECAST statement requests the printing of the smoothed trend $\mu_t$ and the trend plus seasonal component ($\mu_t + \gamma_t$). The parameter estimates for this model are displayed in Figure 29.2.
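The special cases of the locally linear trend mentioned earlier in this example can be checked numerically. The following Python sketch is only an illustration and is not part of the UCM procedure; the function name simulate_local_linear_trend and its arguments are assumptions made for this example. It iterates the level and slope recursions and shows that setting both disturbance standard deviations to zero yields the straight line $\mu_0 + \beta_0 t$.

```python
import random


def simulate_local_linear_trend(n, mu0, beta0, sd_level, sd_slope, seed=0):
    """Simulate n steps of the locally linear trend.

    mu_t   = mu_{t-1} + beta_{t-1} + eta_t,  eta_t ~ N(0, sd_level**2)
    beta_t = beta_{t-1} + xi_t,              xi_t  ~ N(0, sd_slope**2)

    Returns the list [mu_0, mu_1, ..., mu_n]. All names here are
    hypothetical and chosen only for this illustration.
    """
    rng = random.Random(seed)
    mu, beta = mu0, beta0
    levels = [mu]
    for _ in range(n):
        # The level update uses the slope from the previous period.
        mu = mu + beta + rng.gauss(0.0, sd_level)
        beta = beta + rng.gauss(0.0, sd_slope)
        levels.append(mu)
    return levels


# Both disturbance variances are zero: the trend is the line mu0 + beta0 * t.
line = simulate_local_linear_trend(5, mu0=2.0, beta0=0.5, sd_level=0.0, sd_slope=0.0)
print(line)

# Zero slope variance only: the slope stays at beta0 while the level still varies.
wandering = simulate_local_linear_trend(5, mu0=2.0, beta0=0.5, sd_level=0.1, sd_slope=0.0)
print(wandering)
```

Setting only sd_slope to zero corresponds to a trend whose slope is deterministic while its level remains stochastic.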
1746 F Chapter 29: The UCM Procedure
Figure 29.2 BSM for the Logair Series

The UCM Procedure
Final Estimates of the Free Parameters

Component   Parameter        Estimate      Approx Std Error   t Value   Approx Pr > |t|
Irregular   Error Variance   0.00023436    0.0001079          2.17      0.0298
Level       Error Variance   0.00029828    0.0001057          2.82      0.0048
Slope       Error Variance   8.47911E-13   6.2271E-10         0.00      0.9989
Season      Error Variance   0.00000356    1.32347E-6         2.69      0.0072
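As a reading aid for Figure 29.2, the following Python sketch recomputes the t Value column as the ratio of each estimate to its approximate standard error. The p-value calculation assumes a two-sided standard normal reference distribution; that is an assumption of this sketch (it happens to agree with the printed values to within rounding), not a statement of how the procedure computes them. The function name t_and_p is hypothetical.

```python
import math

# (estimate, approximate standard error) copied from Figure 29.2
rows = {
    "Irregular": (0.00023436, 0.0001079),
    "Level": (0.00029828, 0.0001057),
    "Slope": (8.47911e-13, 6.2271e-10),
    "Season": (0.00000356, 1.32347e-6),
}


def t_and_p(estimate, std_error):
    """Return the ratio estimate / std_error and a two-sided normal tail probability.

    The normal reference distribution is an assumption of this sketch.
    """
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2.0))  # P(|Z| > |t|) for standard normal Z
    return t, p


for name, (estimate, std_error) in rows.items():
    t, p = t_and_p(estimate, std_error)
    print(f"{name:10s} t = {t:5.2f}   p = {p:.4f}")
```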
The estimates suggest that, except for the slope component, the disturbance variances of all the components are significant; that is, all these components are stochastic. The slope component, however, appears to be deterministic because its error variance is quite insignificant. It might then be useful to check if the slope component can be dropped from the model, that is, if $\beta_0 = 0$. This can be checked by examining the significance analysis table of the components given in Figure 29.3.

Figure 29.3 Component Significance Analysis for the Logair Series

Significance Analysis of Components (Based on the Final State)

Component   DF   Chi-Square   Pr > ChiSq
Irregular    1         0.08       0.7747
Level        1       117867       <.0001
Slope        1        43.78       <.0001
Season      11       507.75       <.0001
This table provides the significance of the components in the model at the end of the estimation span. If a component is deterministic, this analysis is equivalent to checking whether the corresponding regression effect is significant. However, if a component is stochastic, then this analysis pertains only to the portion of the series near the end of the estimation span. In this example the slope appears quite significant and should be retained in the model, possibly as a deterministic component. Note that, on the basis of this table, the irregular component’s contribution appears insignificant toward the end of the estimation span; however, since it is a stochastic component, it cannot be dropped from the model on the basis of this analysis alone. The slope component can be made deterministic by holding the value of its error variance fixed at zero. This is done by modifying the SLOPE statement as follows:

   slope variance=0 noest;
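For the one-degree-of-freedom rows of Figure 29.3, the Pr > ChiSq column can be related to the Chi-Square column with a short calculation. The Python sketch below uses the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal variable; the function name chi2_sf_df1 is hypothetical, and the sketch does not cover the Season row, which has 11 degrees of freedom. Because the printed statistic 0.08 is rounded, the sketch brackets the printed p-value rather than reproducing it exactly.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 degree of freedom.

    Uses P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))


# Irregular row: the printed statistic 0.08 could be anything in [0.075, 0.085).
p_upper = chi2_sf_df1(0.075)
p_lower = chi2_sf_df1(0.085)
print(f"Irregular: p lies between {p_lower:.4f} and {p_upper:.4f}")

# Slope row: a statistic of 43.78 gives a p-value far below 0.0001.
print(f"Slope: p = {chi2_sf_df1(43.78):.3e}")
```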
After a tentative model is fit, its adequacy can be checked by examining different goodness-of-fit measures and other diagnostic tests and plots that are based on the model residuals. Once the model appears satisfactory, it can be used for forecasting. An interesting feature of the UCM procedure is that, apart from the series forecasts, you can request the forecasts of the individual components
in the model. The plots of component forecasts can be useful in understanding their contributions to the series. In order to obtain the plots, you need to turn ODS Graphics on by using the ODS GRAPHICS ON; statement. The following statements illustrate some of these features:

   ods graphics on;
   proc ucm data=seriesG;
      id date interval = month;
      model logair;
      irregular;
      level plot=smooth;
      slope variance=0 noest;
      season length=12 type=trig plot=smooth;
      estimate;
      forecast lead=24 plot=decomp;
   run;
The table given in Figure 29.4 shows the goodness-of-fit statistics that are computed by using the one-step-ahead prediction errors (see the section “Statistics of Fit” on page 1819). These measures indicate a good agreement between the model and the data. Additional diagnostic measures are also printed by default but are not shown here.

Figure 29.4 Fit Statistics for the Logair Series

The UCM Procedure
Fit Statistics Based on Residuals

Mean Squared Error                0.00147
Root Mean Squared Error           0.03830
Mean Absolute Percentage Error    0.54132
Maximum Percent Error             2.19097
R-Square                          0.99061
Adjusted R-Square                 0.99046
Random Walk R-Square              0.87288
Amemiya’s Adjusted R-Square       0.99017

Number of non-missing residuals used for computing the fit statistics = 131
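To give a feel for how statistics of this kind are formed from one-step-ahead prediction errors, the following Python sketch computes a few of them using common textbook definitions. These definitions are assumptions of the sketch; the section “Statistics of Fit” gives the formulas the procedure actually uses, which may differ in detail. The function name fit_statistics and the sample numbers are hypothetical.

```python
import math


def fit_statistics(actual, predicted):
    """Compute a few error summaries from paired actual and predicted values.

    Textbook definitions (assumed here, not taken from the procedure):
      MSE  = mean of squared errors
      RMSE = square root of MSE
      MAPE = mean of 100 * |error / actual|
      R2   = 1 - SSE / (sum of squared deviations of actual from its mean)
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    sse = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mse": sse / n,
        "rmse": math.sqrt(sse / n),
        "mape": sum(100.0 * abs(e / a) for e, a in zip(errors, actual)) / n,
        "r2": 1.0 - sse / sst,
    }


# Hypothetical values, used only to exercise the function.
stats = fit_statistics([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(stats)

# Consistency check on Figure 29.4: the RMSE is the square root of the MSE,
# up to the rounding of the printed values.
print(math.sqrt(0.00147))
```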
The first plot, shown in Figure 29.5, is produced by the PLOT=SMOOTH option in the LEVEL statement; it shows the smoothed level of the series.
Figure 29.5 Smoothed Trend in the Logair Series
The second plot (Figure 29.6), produced by the PLOT=SMOOTH option in the SEASON statement, shows the smoothed seasonal component by itself.
Figure 29.6 Smoothed Seasonal in the Logair Series
The plot of the sum of the trend and seasonal component, produced by the PLOT=DECOMP option in the FORECAST statement, is shown in Figure 29.7. You can see that, at least visually, the model seems to fit the data well. In all these decomposition plots the component estimates are extrapolated two years into the future, based on the LEAD=24 option specified in the FORECAST statement.
Figure 29.7 Smoothed Trend plus Seasonal in the Logair Series
Syntax: UCM Procedure

The UCM procedure uses the following statements:
   PROC UCM < options > ;
   AUTOREG < options > ;
   BLOCKSEASON options ;
   BY variables ;
   CYCLE < options > ;
   DEPLAG options ;
   ESTIMATE < options > ;
   FORECAST < options > ;
   ID variable options ;
   IRREGULAR < options > ;
   LEVEL < options > ;
   MODEL dependent variable < = regressors > ;
   NLOPTIONS options ;
   OUTLIER options ;
   RANDOMREG regressors < / options > ;
   SEASON options ;
   SLOPE < options > ;
   SPLINEREG regressor < options > ;
   SPLINESEASON options ;
The PROC UCM and MODEL statements are required. In addition, the model must contain at least one component with nonzero disturbance variance.
Functional Summary

The statements and options controlling the UCM procedure are summarized in the following table. Most commonly needed scenarios are listed; see the individual statements for additional details. You can use the PRINT= and PLOT= options in the individual component statements for printing and plotting the corresponding component forecasts.

Table 29.1 Functional Summary

Description                                           Statement         Option

Data Set Options
specify the input data set                            PROC UCM          DATA=
write parameter estimates to an output data set       ESTIMATE          OUTEST=
write series and component forecasts to an            FORECAST          OUTFOR=
  output data set

Model Specification
specify the dependent variable and simple             MODEL
  predictors
specify predictors with time-varying coefficients     RANDOMREG
specify a nonlinear predictor                         SPLINEREG
specify the irregular component                       IRREGULAR
specify the random walk trend                         LEVEL
specify the locally linear trend                      LEVEL and SLOPE
specify a cycle component                             CYCLE
specify a dummy seasonal component                    SEASON            TYPE=DUMMY
specify a trigonometric seasonal component            SEASON            TYPE=TRIG
drop some harmonics from a trigonometric              SEASON            DROPH=
  seasonal component
specify a list of harmonics to keep in a              SEASON            KEEPH=
  trigonometric seasonal component
specify a spline-season component                     SPLINESEASON
specify a block-season component                      BLOCKSEASON
specify an autoreg component                          AUTOREG
specify the lags of the dependent variable            DEPLAG

Controlling the Likelihood Optimization Process
request optimization of the profile likelihood        ESTIMATE          PROFILE
request optimization of the usual likelihood          ESTIMATE          NOPROFILE
specify the optimization technique                    NLOPTIONS         TECH=
limit the number of iterations                        NLOPTIONS         MAXITER=

Outlier Detection
turn on the search for additive outliers                                Default
turn on the search for level shifts                   LEVEL             CHECKBREAK
specify the significance level for outlier tests      OUTLIER           ALPHA=
limit the number of outliers                          OUTLIER           MAXNUM=
limit the number of outliers to a percentage          OUTLIER           MAXPCT=
  of the series length

Controlling the Series Span
exclude some initial observations from analysis       ESTIMATE          SKIPFIRST=
  during the parameter estimation
exclude some observations at the end from             ESTIMATE          BACK=
  analysis during the parameter estimation
exclude some initial observations from analysis       FORECAST          SKIPFIRST=
  during forecasting
exclude some observations at the end from             FORECAST          BACK=
  analysis during forecasting

Graphical Residual Analysis
get a panel of plots consisting of residual           ESTIMATE          PLOT=PANEL
  autocorrelation plots and residual normality
  plots
get the residual CUSUM plot                           ESTIMATE          PLOT=CUSUM
get the residual cumulative sum of squares plot       ESTIMATE          PLOT=CUSUMSQ
get a plot of p-values for the portmanteau            ESTIMATE          PLOT=WN
  white noise test
get a time series plot of residuals with              ESTIMATE          PLOT=LOESS
  overlaid LOESS smoother

Series Decomposition and Forecasting
specify the number of periods to forecast in          FORECAST          LEAD=
  the future
specify the significance level of the forecast        FORECAST          ALPHA=
  confidence interval
request printing of smoothed series                   FORECAST          PRINT=DECOMP
  decomposition
request printing of one-step-ahead and                FORECAST          PRINT=FORECASTS
  multi step-ahead forecasts
request plotting of smoothed series                   FORECAST          PLOT=DECOMP
  decomposition
request plotting of one-step-ahead and                FORECAST          PLOT=FORECASTS
  multi step-ahead forecasts

BY Groups
specify BY group processing                           BY

Global Printing and Plotting Options
turn off all the printing for the procedure           PROC UCM          NOPRINT
turn on all the printing options for the              PROC UCM          PRINTALL
  procedure
turn off all the plotting for the procedure           PROC UCM          PLOTS=NONE
turn on all the plotting options for the              PROC UCM          PLOTS=ALL
  procedure
turn on a variety of plotting options for the         PROC UCM          PLOTS=
  procedure

ID
specify a variable that provides the time index       ID
  for the series values

PROC UCM Statement

PROC UCM < options > ;
The PROC UCM statement is required. The following options can be used in the PROC UCM statement:

DATA=SAS-data-set
   specifies the name of the SAS data set containing the time series. If the DATA= option is not specified in the PROC UCM statement, the most recently created SAS data set is used.

NOPRINT
   turns off all the printing for the procedure. The subsequent print options in the procedure are ignored.

PLOTS< (global-plot-options) > < = plot-request < (options) > >
PLOTS< (global-plot-options) > < = (plot-request < (options) > < ... plot-request < (options) > >) >
   controls the plots produced with ODS Graphics. When you specify only one plot request, you can omit the parentheses around the plot request. Here are some examples:

      plots=none
      plots=all
      plots=residuals(acf loess)
      plots(noclm)=(smooth(decomp) residual(panel loess))

   You must enable ODS Graphics before requesting plots, as shown in the following example. For general information about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS” (SAS/STAT User’s Guide).

      ods graphics on;
      proc ucm;
         model y = x;
         irregular;
         level;
      run;
      proc ucm plots=all;
         model y = x;
         irregular;
         level;
      run;

   The first PROC UCM step does not specify the PLOTS= option, so the default plot that displays the series forecasts in the forecast region is produced. The PLOTS=ALL option in the second PROC UCM step produces all the plots that are appropriate for the specified model. In addition to the PLOTS= option in the PROC UCM statement, you can request plots by using the PLOT= option in other statements of the UCM procedure. This way of requesting plots provides finer control over the plot production. If you have enabled ODS Graphics but do not specify a plot request, then PROC UCM produces the plot of series forecasts in the forecast horizon by default.
Global Plot Options: The global-plot-options apply to all relevant plots generated by the UCM procedure. The following global-plot-option is supported:

NOCLM
   suppresses the confidence limits in all the component and forecast plots.

Specific Plot Options: The following list describes the specific plots and their options:

ALL
   produces all plots appropriate for the particular analysis.

NONE
   suppresses all plots.

FILTER (< filter-plot-options >)
   produces time series plots of the filtered component estimates. The following filter-plot-options are available:

   ALL
      produces all the filtered component estimate plots appropriate for the particular analysis.
   LEVEL
      produces a time series plot of the filtered level component estimate, provided the model contains the level component.
   SLOPE
      produces a time series plot of the filtered slope component estimate, provided the model contains the slope component.
   CYCLE
      produces time series plots of the filtered cycle component estimates for all cycle components in the model, if there are any.
   SEASON
      produces time series plots of the filtered season component estimates for all seasonal components in the model, if there are any.
   DECOMP
      produces time series plots of the filtered estimates of the series decomposition.

RESIDUAL ( < residual-plot-options >)
   produces the residuals plots. The following residual-plot-options are available:

   ALL
      produces all the residual diagnostics plots appropriate for the particular analysis.
   ACF
      produces the autocorrelation plot of residuals.
   CUSUM
      produces the plot of cumulative residuals against time.
   CUSUMSQ
      produces the plot of cumulative squared residuals against time.
   HISTOGRAM
      produces the histogram of residuals.
   LOESS
      produces a scatter plot of residuals against time, which has an overlaid loess-fit.
   PACF
      produces the partial-autocorrelation plot of residuals.
   PANEL
      produces a summary panel of the residual diagnostics consisting of the following:
         histogram of residuals
         normal quantile plot of residuals
         the residual-autocorrelation-plot
         the residual-partial-autocorrelation-plot
   QQ
      produces a normal quantile plot of residuals.
   RESIDUAL
      produces a needle plot of residuals against time.
   WN
      produces the plot of Ljung-Box white-noise test p-values at different lags (in log scale).

SMOOTH ( < smooth-plot-options >)
   produces time series plots of the smoothed component estimates. The following smooth-plot-options are available:

   ALL
      produces all the smoothed component estimate plots appropriate for the particular analysis.
   LEVEL
      produces a time series plot of the smoothed level component estimate, provided the model contains the level component.
   SLOPE
      produces a time series plot of the smoothed slope component estimate, provided the model contains the slope component.
   CYCLE
      produces time series plots of the smoothed cycle component estimates for all cycle components in the model, if there are any.
   SEASON
      produces time series plots of the smoothed season component estimates for all season components in the model, if there are any.
   DECOMP
      produces time series plots of the smoothed estimates of the series decomposition.

PRINTALL
   turns on all the printing options for the procedure. The subsequent NOPRINT options in the procedure are ignored.
AUTOREG Statement

AUTOREG < options > ;

The AUTOREG statement specifies an autoregressive component in the model. An autoregressive component is a special case of cycle that corresponds to the frequency of zero or $\pi$. It is modeled separately for easier interpretation. A stochastic equation for an autoregressive component $r_t$ can be written as follows:

$$r_t = \rho\, r_{t-1} + \nu_t, \qquad \nu_t \sim \text{i.i.d. } N(0, \sigma_\nu^2)$$

The damping factor $\rho$ can take any value in the interval [–1, 1), including –1 but excluding 1. If $\rho = 1$, the autoregressive component cannot be distinguished from the random walk level component. If $\rho = -1$, the autoregressive component corresponds to a seasonal component with a season length of 2, or a nonstationary cycle with period 2. If $|\rho| < 1$, then the autoregressive component is stationary. The following example illustrates the AUTOREG statement. This statement includes an autoregressive component in the model. The damping factor $\rho$ and the disturbance variance $\sigma_\nu^2$ are estimated from the data.

   autoreg;
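The role of the damping factor can be seen by iterating the autoregressive recursion directly. The following Python sketch is an illustration only; the function names autoreg_path and stationary_variance are hypothetical. It shows that a damping factor of –1 with no disturbances produces a pattern that repeats every two periods, that a damping factor of 1 turns the recursion into a running sum of the disturbances (a random walk), and it evaluates the standard first-order autoregression result that the stationary variance is $\sigma_\nu^2 / (1 - \rho^2)$ when $|\rho| < 1$.

```python
def autoreg_path(rho, r0, disturbances):
    """Iterate r_t = rho * r_{t-1} + nu_t, starting from r0.

    disturbances is the sequence nu_1, nu_2, ...; the returned list
    starts with r0.
    """
    path = [r0]
    for nu in disturbances:
        path.append(rho * path[-1] + nu)
    return path


def stationary_variance(rho, sigma2):
    """Variance of a stationary first-order autoregression: sigma2 / (1 - rho**2)."""
    if abs(rho) >= 1.0:
        raise ValueError("the component is stationary only when |rho| < 1")
    return sigma2 / (1.0 - rho * rho)


# rho = -1 and no disturbances: the value alternates in sign with period 2.
print(autoreg_path(-1.0, 1.0, [0.0] * 4))

# rho = 1: each value is the previous value plus the new disturbance.
print(autoreg_path(1.0, 0.0, [1.0, 2.0, -0.5]))

# |rho| < 1: the component has a finite stationary variance.
print(stationary_variance(0.5, 3.0))
```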
NOEST=RHO
NOEST=VARIANCE
NOEST=(RHO VARIANCE)
   fixes the values of $\rho$ and $\sigma_\nu^2$ to those specified in the RHO= and VARIANCE= options.
PLOT=FILTER
PLOT=SMOOTH
PLOT=( < FILTER > < SMOOTH > )
   requests plotting of the filtered or smoothed estimate of the autoreg component.

PRINT=FILTER
PRINT=SMOOTH
PRINT=(< FILTER > < SMOOTH >)
   requests printing of the filtered or smoothed estimate of the autoreg component.

RHO=value
   specifies an initial value for the damping factor $\rho$ during the parameter estimation process. The value of $\rho$ must be in the interval [–1, 1), including –1 but excluding 1.

VARIANCE=value
   specifies an initial value for the disturbance variance $\sigma_\nu^2$ during the parameter estimation process. Any nonnegative value, including zero, is an acceptable starting value.
BLOCKSEASON Statement

BLOCKSEASON NBLOCKS = integer BLOCKSIZE = integer < options > ;

The BLOCKSEASON or BLOCKSEASONAL statement is used to specify a seasonal component $\gamma_t$ that has a special block structure. The seasonal $\gamma_t$ is called a block seasonal of block size m and number of blocks k if its season length, s, can be factored as $s = m \times k$ and its seasonal effects have a block form; that is, the first m seasonal effects are all equal to some number $\tau_1$, the next m effects are all equal to some number $\tau_2$, and so on.

This type of seasonal structure can be appropriate in some cases; for example, consider a series that is recorded on an hourly basis. Further assume that, in this particular case, the hour-of-the-day effect and the day-of-the-week effect are additive. In this situation the hour-of-the-week seasonality, having a season length of 168, can be modeled as a sum of two components. The hour-of-the-day effect is modeled using a simple seasonal of season length 24, while the day-of-the-week is modeled as a block seasonal component that has the days of the week as blocks. This day-of-the-week block seasonal component has seven blocks, each of size 24.

A block seasonal specification requires, at the minimum, the block size m and the number of blocks in the seasonal k. These are specified using the BLOCKSIZE= and NBLOCKS= options, respectively. In addition, you might need to specify the position of the first observation of the series by using the OFFSET= option if it is not at the beginning of one of the blocks. In the example just considered, this corresponds to a situation where the first series measurement is not at the start of the day. Suppose that the first measurement of the series corresponds to the hour between 6:00 and 7:00 a.m., which is the seventh hour within that day or at the seventh position within that block. This is specified as OFFSET=7.

The other options in this statement are very similar to the options in the SEASON statement; for example, a block seasonal can also be of one of the two types, DUMMY and TRIG. There can be
more than one block seasonal component in the model, each specified using a separate BLOCKSEASON statement. No two block seasonals in the model can have the same NBLOCKS= and BLOCKSIZE= specifications. The following example illustrates the use of the BLOCKSEASON statement to specify the additive, hour-of-the-week seasonal model:

   season length=24 type=trig;
   blockseason nblocks=7 blocksize=24;
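The block structure and the meaning of the offset can be made concrete with a little index arithmetic. The following Python sketch is an illustration under stated assumptions, not a description of the procedure's internals: the function names block_index and hour_of_week_effect are hypothetical, observations are numbered from zero, and the first measurement is assumed to fall in the first block. The offset is the one-based position of the first measurement within its block, as in the hourly example with the first measurement at the seventh hour of the day.

```python
def block_index(t, nblocks, blocksize, offset=1):
    """Return the zero-based block that observation t falls in.

    t = 0 is the first measurement; offset is its one-based position
    within its block. The first measurement is assumed to lie in block 0.
    """
    if not 1 <= offset <= blocksize:
        raise ValueError("offset must be between 1 and blocksize")
    season_length = nblocks * blocksize
    position = (offset - 1 + t) % season_length
    return position // blocksize


def hour_of_week_effect(t, hour_effects, day_effects, offset=1):
    """Additive effect: hour-of-the-day effect plus day-of-the-week block effect."""
    blocksize = len(hour_effects)
    nblocks = len(day_effects)
    hour = (offset - 1 + t) % blocksize
    return hour_effects[hour] + day_effects[block_index(t, nblocks, blocksize, offset)]


# Seven blocks of 24 hours; the first measurement is the seventh hour of the day.
print(block_index(0, nblocks=7, blocksize=24, offset=7))    # first measurement
print(block_index(17, nblocks=7, blocksize=24, offset=7))   # last hour of the first day
print(block_index(18, nblocks=7, blocksize=24, offset=7))   # first hour of the second day

# Hypothetical effects: hour h contributes h, day d contributes 100 * d.
hours = [float(h) for h in range(24)]
days = [100.0 * d for d in range(7)]
print(hour_of_week_effect(18, hours, days, offset=7))
```

Every observation within a block shares the same block effect, which is what distinguishes a block seasonal from an ordinary seasonal of length 168.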
BLOCKSIZE=integer
   specifies the block size, m. This is a required option in this statement. The block size can be any integer larger than or equal to two. Typical examples of block sizes are 24, corresponding to the hours of the day when a day is being used as a block in hourly data, or 60, corresponding to the minutes in an hour when an hour is being used as a block in data recorded by minutes.

NBLOCKS=integer
   specifies the number of blocks, k. This is a required option in this statement. The number of blocks can be any integer greater than or equal to two.

NOEST
   fixes the value of the disturbance variance parameter to the value specified in the VARIANCE= option.

OFFSET=integer
   specifies the position of the first measurement within the block, if the first measurement is not at the start of a block. The OFFSET= value must be between one and the block size. The default value is one. The first measurement refers to the start of the estimation span and the forecast span. If these spans differ, their starting measurements must be separated by an integer multiple of the block size.

PLOT=FILTER
PLOT=SMOOTH
PLOT=F_ANNUAL
PLOT=S_ANNUAL
PLOT=( < plot request > . . . < plot request > )
   requests plots of the season component. When you specify only one plot request, you can omit the parentheses around the plot request. You can use the FILTER and SMOOTH options to plot the filtered and smoothed estimates of the season component $\gamma_t$. You can use the F_ANNUAL and S_ANNUAL options to get the plots of “annual” variation in the filtered and smoothed estimates of $\gamma_t$. The annual plots are useful to see the change in the contribution of a particular month over the span of years. Here “month” and “year” are generic terms that change appropriately with the interval type being used to label the observations and the season length. For example, for monthly data with a season length of 12, the usual meaning applies, while for daily data with a season length of 7, the days of the week serve as months and the weeks serve as years. The first period in each block is plotted over the years.
PRINT=FILTER
PRINT=SMOOTH
PRINT=( < FILTER > < SMOOTH > )
   requests the printing of the filtered or smoothed estimate of the block seasonal component $\gamma_t$.

TYPE=DUMMY | TRIG
   specifies the type of the block seasonal component. The default type is DUMMY.

VARIANCE=value
   specifies an initial value for the disturbance variance, $\sigma_\omega^2$, in the $\gamma_t$ equation at the start of the parameter estimation process. Any nonnegative value, including zero, is an acceptable starting value.
BY Statement

BY variables ;
A BY statement can be used in the UCM procedure to process a data set in groups of observations defined by the BY variables. The model specified using the MODEL and other component statements is applied to all the groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. The variables are one or more variables in the input data set.
CYCLE Statement

CYCLE < options > ;

The CYCLE statement is used to specify a cycle component, $\psi_t$, in the model. The stochastic equation governing a cycle component of period p and damping factor $\rho$ is as follows:

$$\begin{bmatrix} \psi_t \\ \psi_t^* \end{bmatrix} = \rho \begin{bmatrix} \cos\lambda & \sin\lambda \\ -\sin\lambda & \cos\lambda \end{bmatrix} \begin{bmatrix} \psi_{t-1} \\ \psi_{t-1}^* \end{bmatrix} + \begin{bmatrix} \nu_t \\ \nu_t^* \end{bmatrix}$$

where $\nu_t$ and $\nu_t^*$ are independent, zero-mean, Gaussian disturbances with variance $\sigma_\nu^2$ and $\lambda = 2\pi / p$ is the angular frequency of the cycle. Any p strictly greater than two is an admissible value for the period, and the damping factor $\rho$ can be any value in the interval (0, 1], including one but excluding zero. The cycles with frequency zero and $\pi$, which correspond to the periods equal to infinity and two, respectively, can be specified using the AUTOREG statement. The values of $\rho$ less than one give rise to a stationary cycle, while $\rho = 1$ gives rise to a nonstationary cycle. As a default, values of $\rho$, p, and $\sigma_\nu^2$ are estimated from the data. However, if necessary, you can fix the values of some or all of these parameters. There can be multiple cycles in a model, each specified using a separate CYCLE statement. The examples that follow illustrate the use of the CYCLE statement.
The following statements request including two cycles in the model. The parameters of each of these cycles are estimated from the data.

   cycle;
   cycle;

The following statement requests inclusion of a nonstationary cycle in the model. The cycle period p and the disturbance variance $\sigma_\nu^2$ are estimated from the data.

   cycle rho=1 noest=rho;

In the following statement a nonstationary cycle with a fixed period of 12 is specified. Moreover, a starting value is supplied for $\sigma_\nu^2$.

   cycle period=12 rho=1 variance=4 noest=(rho period);
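The effect of the period and the damping factor on a cycle can be seen by iterating the cycle recursion with the disturbances switched off. The following Python sketch is an illustration only; the function name cycle_path and its arguments are hypothetical, and omitting the disturbance terms is a simplification made so that the output is deterministic. With a damping factor of one, the component returns to its starting value after one full period; with a damping factor below one, its amplitude shrinks by that factor at every step.

```python
import math


def cycle_path(period, rho, psi0, psi_star0, n):
    """Iterate the cycle recursion for n steps with zero disturbances.

    [psi_t, psi*_t] = rho * R(lambda) [psi_{t-1}, psi*_{t-1}],
    where R(lambda) = [[cos, sin], [-sin, cos]] and lambda = 2*pi/period.
    Returns [psi_0, psi_1, ..., psi_n].
    """
    lam = 2.0 * math.pi / period
    c, s = math.cos(lam), math.sin(lam)
    psi, psi_star = psi0, psi_star0
    path = [psi]
    for _ in range(n):
        # Both components are updated from the previous pair simultaneously.
        psi, psi_star = rho * (c * psi + s * psi_star), rho * (-s * psi + c * psi_star)
        path.append(psi)
    return path


# Nonstationary cycle (rho = 1) with period 12: the pattern repeats after 12 steps.
undamped = cycle_path(12, 1.0, 1.0, 0.0, 12)
print([round(v, 3) for v in undamped])

# Damped cycle (rho = 0.9): after one period the amplitude is 0.9 ** 12.
damped = cycle_path(12, 0.9, 1.0, 0.0, 12)
print(round(damped[12], 4), round(0.9 ** 12, 4))
```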
NOEST=PERIOD
NOEST=RHO
NOEST=VARIANCE
NOEST=( < RHO > < PERIOD > < VARIANCE > )
   fixes the values of the component parameters to those specified in the RHO=, PERIOD=, and VARIANCE= options. This option enables you to fix any combination of parameter values.

PERIOD=value
   specifies an initial value for the cycle period during the parameter estimation process. The period value must be strictly greater than 2.

PLOT=FILTER
PLOT=SMOOTH
PLOT=( < FILTER > < SMOOTH > )
   requests plotting of the filtered or smoothed estimate of the cycle component.

PRINT=FILTER
PRINT=SMOOTH
PRINT=( < FILTER > < SMOOTH > )
   requests the printing of a filtered or smoothed estimate of the cycle component $\psi_t$.

RHO=value
   specifies an initial value for the damping factor in this component during the parameter estimation process. Any value in the interval (0, 1], including one but excluding zero, is an acceptable initial value for the damping factor.

VARIANCE=value
   specifies an initial value for the disturbance variance parameter, $\sigma_\nu^2$, to be used during the parameter estimation process. Any nonnegative value, including zero, is an acceptable starting value.
DEPLAG Statement

DEPLAG LAGS = order < PHI = value . . . > < NOEST > ;

The DEPLAG statement is used to specify the lags of the dependent variable to be included as predictors in the model. The following examples illustrate the use of the DEPLAG statement.

If the dependent series is denoted by $y_t$, the following statement specifies the inclusion of $\phi_1 y_{t-1} + \phi_2 y_{t-2}$ in the model. The parameters $\phi_1$ and $\phi_2$ are estimated from the data.

   deplag lags=2;

The following statement requests including $\phi_1 y_{t-1} + \phi_2 y_{t-4} - \phi_1 \phi_2 y_{t-5}$ in the model. The values of $\phi_1$ and $\phi_2$ are fixed at 0.8 and –1.2.

   deplag lags=(1)(4) phi=0.8 -1.2 noest;

The dependent lag parameters are not constrained to lie in any particular region. In particular, this implies that a UCM that contains only an irregular component and dependent lags, resulting in a traditional autoregressive model, is not constrained to be a stationary model. In the DEPLAG statement, if an initial value is supplied for any one of the parameters, the initial values must also be supplied for all other parameters.

LAGS=order
LAGS=(lag, . . . , lag ) . . . (lag, . . . , lag )
   is a required option in this statement. LAGS=(l_1, l_2, . . . , l_k) defines a model with specified lags of the dependent variable included as predictors. LAGS=order is equivalent to LAGS=(1, 2, . . . , order). A concatenation of parenthesized lists specifies a factored model. For example, LAGS=(1)(12) specifies that the lag values, 1, 12, and 13, corresponding to the following polynomial in the backward shift operator, be included in the model

   $$(1 - \phi_{1,1} B)(1 - \phi_{2,1} B^{12})$$

   Note that, in this case, the coefficient of the thirteenth lag is constrained to be the product of the coefficients of the first and twelfth lags.

NOEST
   fixes the values of the parameters to those specified in the PHI= option.

PHI=value . . .
   lists starting values for the coefficients of the lagged dependent variable. The order of the values listed corresponds with the order of the lags specified in the LAGS= option.
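The way a factored LAGS= specification expands into individual lag coefficients can be worked out by multiplying the factor polynomials. The following Python sketch is an illustration only; the function names multiply_lag_polynomials and deplag_coefficients and the dictionary representation are hypothetical. Each factor is written as a polynomial $1 - \phi B^{lag}$ in the backward shift operator, the factors are multiplied, and the signs are flipped to obtain the coefficients that multiply the lagged values of the dependent series.

```python
def multiply_lag_polynomials(p, q):
    """Multiply two polynomials in B stored as {power: coefficient} dictionaries."""
    product = {}
    for i, a in p.items():
        for j, b in q.items():
            product[i + j] = product.get(i + j, 0.0) + a * b
    return product


def deplag_coefficients(factors):
    """Expand a factored lag specification into coefficients on lagged y.

    factors is a list of {lag: phi} dictionaries, one per parenthesized
    list; each factor stands for the polynomial 1 - sum(phi * B**lag).
    The result maps each lag to the coefficient of y_{t-lag}.
    """
    polynomial = {0: 1.0}
    for factor in factors:
        factor_poly = {0: 1.0}
        for lag, phi in factor.items():
            factor_poly[lag] = -phi
        polynomial = multiply_lag_polynomials(polynomial, factor_poly)
    # Moving the lagged terms to the right-hand side flips their signs.
    return {lag: -coef for lag, coef in sorted(polynomial.items()) if lag > 0}


# LAGS=(1)(4) with PHI=0.8 -1.2, as in the example above.
coefficients = deplag_coefficients([{1: 0.8}, {4: -1.2}])
print(coefficients)
```

The coefficient on the fifth lag is not a free parameter; it is determined by the other two, which is the constraint that a factored specification imposes.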
ESTIMATE Statement

ESTIMATE < options > ;

The ESTIMATE statement is an optional statement used to control the overall model-fitting environment. Using this statement, you can control the span of observations used to fit the model by using the SKIPFIRST= and BACK= options. This can be useful in model diagnostics. You can request a variety of goodness-of-fit statistics and other model diagnostic information including different residual diagnostic plots. Note that the ESTIMATE statement is not used to control the nonlinear optimization process itself. That is done using the NLOPTIONS statement, where you can control the number of iterations, choose between the different optimization techniques, and so on. You can save the estimated parameters and other related information in a data set by using the OUTEST= option. You can request the optimization of the profile likelihood, the likelihood obtained by concentrating out a disturbance variance, for parameter estimation by using the PROFILE option. The following example illustrates the use of this statement:

   estimate skipfirst=12 back=24;

This statement requests that the initial 12 measurements and the last 24 measurements be excluded during the model-fitting process. The actual observation span used to fit the model is decided as follows: Suppose that $n_0$ and $n_1$ are the observation numbers of the first and the last nonmissing values of the response variable, respectively. As a result of SKIPFIRST=12 and BACK=24, the measurements between observation numbers $n_0 + 12$ and $n_1 - 24$ form the estimation span. Of course, the model fitting might not take place if there are insufficient data in the resulting span. The model fitting does not take place if there are regressors in the model that have missing values in the estimation span.

BACK=integer
SKIPLAST=integer
   indicates that some ending part of the data needs to be ignored during the parameter estimation. This can be useful when you want to study the forecasting performance of a model on the observed data. BACK=10 results in skipping the last 10 measurements of the response series during the parameter estimation. The default is BACK=0.

EXTRADIFFUSE=k
   enables continuation of the diffuse filtering iterations for k additional iterations beyond the first instance where the initialization of the diffuse state would have otherwise taken place. If the specified k is larger than the sample size, the diffuse iterations continue until the end of the sample. Note that one-step-ahead residuals are produced only after the diffuse state is initialized. Delaying the initialization leads to a reduction in the number of one-step-ahead residuals available for computing the residual diagnostic measures. This option is useful when you want to ignore the first few one-step-ahead residuals that often have large variance.

NOPROFILE
   requests that the usual likelihood be optimized for parameter estimation. For more information, see the section “Parameter Estimation by Profile Likelihood Optimization” on page 1798.
OUTEST=SAS-data-set
specifies an output data set for the estimated parameters.
In the ESTIMATE statement, the PLOT= option is used to obtain different residual diagnostic plots. The different possibilities are as follows:
PLOT=ACF
PLOT=MODEL
PLOT=LOESS
PLOT=HISTOGRAM
PLOT=PACF
PLOT=PANEL
PLOT=QQ
PLOT=RESIDUAL
PLOT=WN
PLOT=( < plot request > . . . < plot request > )
requests different residual diagnostic plots. The different options are as follows:
ACF
produces the residual-autocorrelation plot.
CUSUM
produces the plot of cumulative residuals against time.
CUSUMSQ
produces the plot of cumulative squared residuals against time.
MODEL
produces the plot of one-step-ahead forecasts in the estimation span.
HISTOGRAM
produces the histogram of residuals.
LOESS
produces a scatter plot of residuals against time, which has an overlaid loess fit.
PACF
produces the residual-partial-autocorrelation plot.
PANEL
produces a summary panel of the residual diagnostics consisting of a histogram of residuals, a normal quantile plot of residuals, the residual-autocorrelation plot, and the residual-partial-autocorrelation plot.
QQ
produces a normal quantile plot of residuals.
RESIDUAL
produces a needle plot of residuals against time.
WN
produces a plot of p-values, in log scale, at different lags for the Ljung-Box portmanteau white noise test statistics.
PRINT=NONE
suppresses all the printed output related to the model fitting, such as the parameter estimates, the goodness-of-fit statistics, and so on.
PROFILE
requests that the profile likelihood, obtained by concentrating out one of the disturbance variances from the likelihood, be optimized for parameter estimation. By default, the profile likelihood is not optimized if any of the disturbance variance parameters is held fixed to a nonzero value. For more information, see the section “Parameter Estimation by Profile Likelihood Optimization” on page 1798.
SKIPFIRST=integer
indicates that some early part of the data needs to be ignored during the parameter estimation. This can be useful if there is a reason to believe that the model being estimated is not appropriate for this portion of the data. SKIPFIRST=10 results in skipping the first 10 measurements of the response series during the parameter estimation. The default is SKIPFIRST=0.
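The ESTIMATE options above can be combined in a single run. The following is a minimal sketch, not a recommended model: it assumes the SASHELP.AIR sample data set (variables DATE and AIR, 144 monthly observations), and the output data set name WORK.UCM_EST is hypothetical. With $n_0 = 1$ and $n_1 = 144$, SKIPFIRST=12 and BACK=24 would give an estimation span from observation 13 to observation 120.

```sas
/* Sketch only: SASHELP.AIR is an assumed sample data set and    */
/* WORK.UCM_EST is a hypothetical name for the OUTEST= data set. */
ods graphics on;                    /* PLOT= requests need ODS Graphics */

proc ucm data=sashelp.air;
   id date interval=month;
   model air;
   irregular;
   level;
   slope;
   season length=12 type=trig;
   /* Fit on observations n0+12 through n1-24, request the      */
   /* residual summary panel, and save the parameter estimates. */
   estimate skipfirst=12 back=24 plot=panel outest=work.ucm_est;
run;
```

The model components here are placeholders; only the ESTIMATE statement is the point of the sketch.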
FORECAST Statement
FORECAST < options > ;
The FORECAST statement is an optional statement that is used to specify the overall forecasting environment for the specified model. It can be used to specify the span of observations, the historical period, to use to compute the forecasts of the future observations. This is done using the SKIPFIRST= and BACK= options. The number of periods to forecast beyond the historical period and the significance level of the forecast confidence interval are specified using the LEAD= and ALPHA= options. You can request one-step-ahead series and component forecasts by using the PRINT= option. You can save the series forecasts, and the model-based decomposition of the series, in a data set by using the OUTFOR= option. The following example illustrates the use of this statement:
   forecast skipfirst=12 back=24 lead=30;
This statement requests that the initial 12 and the last 24 response values be excluded during the forecast computations. The forecast horizon, specified using the LEAD= option, is 30 periods; that is, multistep forecasting begins at the end of the historical period and continues for 30 periods. The actual observation span used to compute the multistep forecasting is decided as follows: Suppose that $n_0$ and $n_1$ are the observation numbers of the first and the last nonmissing values of the response variable, respectively. As a result of SKIPFIRST=12 and BACK=24, the historical period, or the forecast span, begins at $n_0 + 12$ and ends at $n_1 - 24$. Multistep forecasts are produced for the next 30 periods, that is, for the observation numbers $n_1 - 23$ to $n_1 + 6$. Of course, the forecast computations can fail if the model has regressor variables that have missing values in the forecast span. If the regressors contain missing values in the forecast horizon (that is, between the observations $n_1 - 23$ and $n_1 + 6$), the forecast horizon is reduced accordingly.
ALPHA=value
specifies the significance level of the forecast confidence intervals; for example, ALPHA=0.05, which is the default, results in a 95% confidence interval.
BACK=integer
SKIPLAST=integer
specifies the holdout sample for the evaluation of the forecasting performance of the model. For example, BACK=10 results in treating the last 10 observed values of the response series as unobserved. A post-sample-prediction-analysis table is produced for comparing the predicted values with the actual values in the holdout period. The default is BACK=0.
EXTRADIFFUSE=k
enables continuation of the diffuse filtering iterations for k additional iterations beyond the first instance where the initialization of the diffuse state would have otherwise taken place. If the specified k is larger than the sample size, the diffuse iterations continue until the end of the sample. Note that one-step-ahead forecasts are produced only after the diffuse state is initialized. Delaying the initialization leads to a reduction in the number of one-step-ahead forecasts. This option is useful when you want to ignore the first few one-step-ahead forecasts that often have large variance.
LEAD=integer
specifies the number of periods to forecast beyond the historical period defined by the SKIPFIRST= and BACK= options; for example, LEAD=10 results in the forecasting of 10 future values of the response series. The default is LEAD=12.
OUTFOR=SAS-data-set
specifies an output data set for the forecasts. The output data set contains the ID variable (if specified), the response and predictor series, the one-step-ahead and out-of-sample response series forecasts, the forecast confidence intervals, the smoothed values of the response series, and the smoothed forecasts produced as a result of the model-based decomposition of the series.
PLOT=DECOMP
PLOT=DECOMPVAR
PLOT=FDECOMP
PLOT=FDECOMPVAR
PLOT=FORECASTS
PLOT=TREND
PLOT=( < plot request > . . . < plot request > )
requests forecast and model decomposition plots. The FORECASTS option provides the plot of the series forecasts, the TREND and DECOMP options provide the plots of the smoothed trend and other decompositions, the DECOMPVAR option can be used to plot the variance of these components, and the FDECOMP and FDECOMPVAR options provide the same plots for the filtered decomposition estimates and their variances.
PRINT=DECOMP
PRINT=FDECOMP
PRINT=FORECASTS
PRINT=NONE
PRINT=( < print request > . . . < print request > )
controls the printing of the series forecasts and the printing of smoothed model decomposition estimates. By default, the series forecasts are printed only for the forecast horizon specified by the LEAD= option; that is, the one-step-ahead predicted values are not printed. You can request forecasts for the entire forecast span by specifying the PRINT=FORECASTS option. Using PRINT=DECOMP, you can get smoothed estimates of the following effects: trend, trend plus regression, trend plus regression plus cycle, and sum of all components except the irregular. If some of these effects are absent in the model, then they are ignored. Similarly, you can get filtered estimates of these effects by using PRINT=FDECOMP. You can use PRINT=NONE to suppress the printing of all the forecast output.
SKIPFIRST=integer
indicates that some early part of the data needs to be ignored during the forecasting calculations. This can be useful if there is a reason to believe that the model being used for forecasting is not appropriate for this portion of the data. SKIPFIRST=10 results in skipping the first 10 measurements of the response series during the forecast calculations. The default is SKIPFIRST=0.
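As an illustration of a holdout evaluation with the FORECAST statement, the following sketch holds back the last 24 observations and forecasts 30 periods ahead. It assumes the SASHELP.AIR sample data set (variables DATE and AIR, 144 monthly observations); the output data set name WORK.UCM_FOR is hypothetical. Under these assumptions the multistep forecasts would cover observation numbers $n_1 - 23 = 121$ through $n_1 + 6 = 150$, so the first 24 of them overlap the holdout period.

```sas
/* Sketch only: SASHELP.AIR is an assumed sample data set and    */
/* WORK.UCM_FOR is a hypothetical name for the OUTFOR= data set. */
proc ucm data=sashelp.air;
   id date interval=month;
   model air;
   irregular;
   level;
   slope;
   season length=12 type=trig;
   estimate back=24;                 /* fit without the last 24 values  */
   forecast back=24 lead=30          /* same holdout, 30-period horizon */
            alpha=0.1                /* 90% confidence limits           */
            print=forecasts          /* print the whole forecast span   */
            outfor=work.ucm_for;     /* save forecasts and decomposition */
run;
```

Using the same BACK= value in the ESTIMATE and FORECAST statements is a design choice in this sketch, made so that the holdout values influence neither the parameter estimates nor the forecasts.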
ID Statement
ID variable INTERVAL=value < ALIGN=value > ;
The ID statement names a numeric variable that identifies observations in the input and output data sets. The ID variable’s values are assumed to be SAS date, time, or datetime values. In addition, the ID statement specifies the frequency associated with the time series. The ID statement options also specify how the observations are aligned to form the time series. If the ID statement is specified, the INTERVAL= option must also be specified. If the ID statement is not specified, the observation number, with respect to the BY group, is used as the time ID. The values of the ID variable are extrapolated for the forecast observations based on the values of the INTERVAL= option.
ALIGN=value
controls the alignment of SAS dates used to identify output observations. The ALIGN= option has the following possible values: BEGINNING | BEG | B, MIDDLE | MID | M, and ENDING | END | E. The default is BEGINNING. The ALIGN= option is used to align the ID variable with the beginning, middle, or end of the time ID interval specified by the INTERVAL= option.
INTERVAL=value
specifies the time interval between observations. This option is required in the ID statement. INTERVAL=value is used in conjunction with the ID variable to check that the input data are in order and have no gaps. The INTERVAL= option is also used to extrapolate the ID values past the end of the input data. For a complete discussion of the intervals supported, see Chapter 4, “Date Intervals, Formats, and Functions.”
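A short sketch of the ID statement follows. It assumes the SASHELP.AIR sample data set, whose DATE variable holds monthly SAS date values; the output data set name WORK.UCM_ID is hypothetical. ALIGN=ENDING is used here only to show the option; per the description above, it asks for the dates that identify output observations to be aligned with the end of each month.

```sas
/* Sketch only: SASHELP.AIR is an assumed sample data set and */
/* WORK.UCM_ID is a hypothetical output data set name.        */
proc ucm data=sashelp.air;
   id date interval=month align=ending;  /* INTERVAL= is required with ID */
   model air;
   irregular;
   level;
   forecast lead=6 outfor=work.ucm_id;   /* ID values are extrapolated    */
run;
```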
IRREGULAR Statement
IRREGULAR < options > ;
The IRREGULAR statement is used to include an irregular component in the model. There can be at most one IRREGULAR statement in the model specification. The irregular component corresponds to the overall random error, $\epsilon_t$, in the model. By default the irregular component is modeled as white noise, that is, as a sequence of independent, identically distributed, zero-mean, Gaussian random variables. However, as an experimental feature in this release of the UCM procedure, you can also model it as an autoregressive moving-average (ARMA) process. The options for specifying an ARMA model for the irregular component are given in a separate subsection: “ARMA Specification (Experimental)” on page 1769. The options in this statement enable you to specify the value of $\sigma^2_\epsilon$ and to output the forecasts of $\epsilon_t$. By default, $\sigma^2_\epsilon$ is estimated using the data. Two examples of the IRREGULAR statement are given next. In the first example the statement is in its simplest form, resulting in the inclusion of an irregular component that is white noise with unknown variance:
   irregular;
The following statement provides a starting value for $\sigma^2_\epsilon$, to be used in the nonlinear parameter estimation process. It also requests the printing of smoothed predictions of $\epsilon_t$. The smoothed irregulars are useful in model diagnostics.
   irregular variance=4 print=smooth;
NOEST
fixes the value of $\sigma^2_\epsilon$ to the value specified in the VARIANCE= option.
PLOT=FILTER
PLOT=SMOOTH
PLOT=( < FILTER > < SMOOTH > )
requests plotting of the filtered or smoothed estimate of the irregular component.
PRINT=FILTER
PRINT=SMOOTH
PRINT=( < FILTER > < SMOOTH > )
requests printing of the filtered or smoothed estimate of the irregular component.
VARIANCE=value
specifies an initial value for $\sigma^2_\epsilon$ during the parameter estimation process. Any nonnegative value, including zero, is an acceptable starting value.
ARMA Specification (Experimental)
This section details the options for specifying an ARMA model for the irregular component. The specification of ARMA models requires some notation, which is explained first.
Let $B$ denote the backshift operator; that is, for any sequence $\epsilon_t$, $B\epsilon_t = \epsilon_{t-1}$. The higher powers of $B$ represent larger shifts (for example, $B^3\epsilon_t = \epsilon_{t-3}$). A random sequence $\epsilon_t$ follows a zero-mean ARMA(p,q)×(P,Q)$_s$ model with nonseasonal autoregressive order $p$, seasonal autoregressive order $P$, nonseasonal moving-average order $q$, and seasonal moving-average order $Q$, if it satisfies the following difference equation specified in terms of the polynomials in the backshift operator:
$$\phi(B)\,\Phi(B^s)\,\epsilon_t = \theta(B)\,\Theta(B^s)\,a_t$$
where $a_t$ is a white noise sequence and $s$ is the season length. The polynomials $\phi$, $\Phi$, $\theta$, and $\Theta$ are of orders $p$, $P$, $q$, and $Q$, respectively, which can be any nonnegative integers. The season length $s$ must be a positive integer. For example, $\epsilon_t$ satisfies an ARMA(1,1) model (that is, $p = 1$, $q = 1$, $P = 0$, and $Q = 0$) if
$$\epsilon_t = \phi_1 \epsilon_{t-1} + a_t - \theta_1 a_{t-1}$$
for some coefficients $\phi_1$ and $\theta_1$ and a white noise sequence $a_t$. Similarly, $\epsilon_t$ satisfies an ARMA(1,1)×(1,1)$_{12}$ model if
$$\epsilon_t = \phi_1 \epsilon_{t-1} + \Phi_1 \epsilon_{t-12} - \phi_1 \Phi_1 \epsilon_{t-13} + a_t - \theta_1 a_{t-1} - \Theta_1 a_{t-12} + \theta_1 \Theta_1 a_{t-13}$$
for some coefficients $\phi_1$, $\Phi_1$, $\theta_1$, and $\Theta_1$ and a white noise sequence $a_t$.
The ARMA process is stationary and invertible if the defining polynomials $\phi$, $\Phi$, $\theta$, and $\Theta$ have all their roots outside the unit circle; that is, the absolute values of the roots are strictly larger than 1.0. It is assumed that the ARMA model specified for the irregular component is stationary and invertible; that is, the coefficients of the polynomials $\phi$, $\Phi$, $\theta$, and $\Theta$ are constrained so that the stationarity and invertibility conditions are satisfied. The unknown coefficients of these polynomials become part of the model parameter vector that is estimated using the data.
The notation for a closely related class of models, autoregressive integrated moving-average (ARIMA) models, is also given here. A random sequence $y_t$ is said to follow an ARIMA(p,d,q)×(P,D,Q)$_s$ model if, for some nonnegative integers $d$ and $D$, the differenced series $\epsilon_t = (1 - B)^d (1 - B^s)^D y_t$ follows an ARMA(p,q)×(P,Q)$_s$ model. The integers $d$ and $D$ are called nonseasonal and seasonal differencing orders, respectively. You can specify ARIMA models by using the DEPLAG statement for specifying the differencing orders and by using the IRREGULAR statement for the ARMA specification. See Example 29.8 for an example of ARIMA(0,1,1)×(0,1,1)$_{12}$ model specification. See Brockwell and Davis (1991) for additional information about ARIMA models.
You can use options of the IRREGULAR statement to specify the desired ARMA model and to request printed and graphical output. Several examples of the IRREGULAR statement are given next. The following statement specifies an irregular component that is modeled as an ARMA(1,1) process. It also requests plotting its smoothed estimate.
   irregular p=1 q=1 plot=smooth;
The following statement specifies an ARMA(1,1)×(1,1)$_{12}$ model. It also fixes the coefficient of the first-order seasonal moving-average polynomial to 0.1. The other coefficients and the white noise variance are estimated using the data.
   irregular p=1 sp=1 q=1 sq=1 s=12 sma=0.1 noest=(sma);
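As a worked check of the seasonal ARMA notation, the difference equation for the model with $p = q = P = Q = 1$ and $s = 12$ can be obtained by multiplying out the backshift polynomials. The symbols below are the ones assumed in this section: $\phi_1$ and $\Phi_1$ for the nonseasonal and seasonal autoregressive coefficients, $\theta_1$ and $\Theta_1$ for the nonseasonal and seasonal moving-average coefficients, $\epsilon_t$ for the irregular component, and $a_t$ for the white noise sequence.

```latex
\begin{align*}
(1 - \phi_1 B)(1 - \Phi_1 B^{12})
  &= 1 - \phi_1 B - \Phi_1 B^{12} + \phi_1 \Phi_1 B^{13} \\
(1 - \theta_1 B)(1 - \Theta_1 B^{12})
  &= 1 - \theta_1 B - \Theta_1 B^{12} + \theta_1 \Theta_1 B^{13} \\
\epsilon_t - \phi_1 \epsilon_{t-1} - \Phi_1 \epsilon_{t-12}
  + \phi_1 \Phi_1 \epsilon_{t-13}
  &= a_t - \theta_1 a_{t-1} - \Theta_1 a_{t-12}
  + \theta_1 \Theta_1 a_{t-13}
\end{align*}
```

Moving the lagged $\epsilon$ terms to the right-hand side gives the explicit recursion for $\epsilon_t$. The cross terms at lag 13 are not extra parameters; they are products of the four coefficients.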
AR=$\phi_1$ $\phi_2$ . . . $\phi_p$
lists the starting values of the coefficients of the nonseasonal autoregressive polynomial:
$$\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$$
where the order $p$ is specified in the P= option. The coefficients $\phi_i$ must define a stationary autoregressive polynomial.
MA=$\theta_1$ $\theta_2$ . . . $\theta_q$
lists the starting values of the coefficients of the nonseasonal moving-average polynomial:
$$\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$$
where the order $q$ is specified in the Q= option. The coefficients $\theta_i$ must define an invertible moving-average polynomial.
NOEST=(
fixes the values of the ARMA parameters and the value of the white noise variance to those specified in the AR=, SAR=, MA=, SMA=, or VARIANCE= options.
P=integer
specifies the order of the nonseasonal autoregressive polynomial. The order can be any nonnegative integer; the default value is 0. In practice the order is a small integer such as 1, 2, or 3.
Q=integer
specifies the order of the nonseasonal moving-average polynomial. The order can be any nonnegative integer; the default value is 0. In practice the order is a small integer such as 1, 2, or 3.
S=integer
specifies the season length used during the specification of the seasonal autoregressive or seasonal moving-average polynomial. The season length can be any positive integer; for example, S=4 might be an appropriate value for a quarterly series. The default value is S=1.
SAR=$\Phi_1$ $\Phi_2$ . . . $\Phi_P$
lists the starting values of the coefficients of the seasonal autoregressive polynomial:
$$\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{sP}$$
where the order $P$ is specified in the SP= option and the season length $s$ is specified in the S= option. The coefficients $\Phi_i$ must define a stationary autoregressive polynomial.
SMA=$\Theta_1$ $\Theta_2$ . . . $\Theta_Q$
lists the starting values of the coefficients of the seasonal moving-average polynomial:
$$\Theta(B^s) = 1 - \Theta_1 B^s - \cdots - \Theta_Q B^{sQ}$$
where the order $Q$ is specified in the SQ= option and the season length $s$ is specified in the S= option. The coefficients $\Theta_i$ must define an invertible moving-average polynomial.
SP=integer
specifies the order of the seasonal autoregressive polynomial. The order can be any nonnegative integer; the default value is 0. In practice the order is a small integer such as 1 or 2.
SQ=integer
specifies the order of the seasonal moving-average polynomial. The order can be any nonnegative integer; the default value is 0. In practice the order is a small integer such as 1 or 2.
LEVEL Statement
LEVEL < options > ;
The LEVEL statement is used to include a level component in the model. The level component, either by itself or together with a slope component (see the SLOPE statement), forms the trend component, $\mu_t$, of the model. If the slope component is absent, the resulting trend is a random walk (RW) specified by the following equation:
$$\mu_t = \mu_{t-1} + \eta_t, \qquad \eta_t \sim \text{i.i.d. } N(0, \sigma^2_\eta)$$
If the slope component is present, signified by the presence of a SLOPE statement, a locally linear trend (LLT) is obtained. The equations of LLT are as follows:
$$\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t, \qquad \eta_t \sim \text{i.i.d. } N(0, \sigma^2_\eta)$$
$$\beta_t = \beta_{t-1} + \xi_t, \qquad \xi_t \sim \text{i.i.d. } N(0, \sigma^2_\xi)$$
In either case, the options in the LEVEL statement are used to specify the value of $\sigma^2_\eta$ and to request forecasts of $\mu_t$. The SLOPE statement is used for similar purposes in the case of slope $\beta_t$. The following examples illustrate the use of the LEVEL statement. Assuming that a SLOPE statement is not added subsequently, a simple random walk trend is specified by the following statement:
   level;
The following statements specify a locally linear trend with the value of $\sigma^2_\eta$ fixed at 4. They also request printing of filtered values of $\mu_t$. The value of $\sigma^2_\xi$, the disturbance variance in the slope equation, is estimated from the data.
   level variance=4 noest print=filter;
   slope;
CHECKBREAK
turns on the checking of breaks in the level component.
NOEST
fixes the value of $\sigma^2_\eta$ to the value specified in the VARIANCE= option.
PLOT=FILTER
PLOT=SMOOTH
PLOT=( < FILTER > < SMOOTH > )
requests plotting of the filtered or smoothed estimate of the level component.
PRINT=FILTER
PRINT=SMOOTH
PRINT=( < FILTER > < SMOOTH > )
requests printing of the filtered or smoothed estimate of the level component.
VARIANCE=value
specifies an initial value for $\sigma^2_\eta$, the disturbance variance in the $\mu_t$ equation, at the start of the parameter estimation process. Any nonnegative value, including zero, is an acceptable starting value.
MODEL Statement
MODEL dependent < = regressors > ;
The MODEL statement specifies the response variable and, optionally, the predictor or regressor variables for the UCM model. This is a required statement in the UCM procedure. The predictors specified in the MODEL statement are assumed to have a linear and time-invariant relationship with the response. The predictors that have time-varying regression coefficients are specified separately in the RANDOMREG statement. Similarly, the predictors that have a nonlinear effect on the response variable are specified separately in the SPLINEREG statement. Only one MODEL statement can be specified.
NLOPTIONS Statement
NLOPTIONS < options > ;
PROC UCM uses the nonlinear optimization (NLO) subsystem to perform the nonlinear optimization of the likelihood function during the estimation of model parameters. You can use the NLOPTIONS statement to control different aspects of this optimization process. For most problems the default settings of the optimization process are adequate. However, in some cases it might be useful to change the optimization technique or to change the maximum number of iterations. This can be done by using the TECH= and MAXITER= options in the NLOPTIONS statement as follows:
   nloptions tech=dbldog maxiter=200;
This sets the maximum number of iterations to 200 and changes the optimization technique to DBLDOG rather than the default technique, TRUREG, used in PROC UCM. A discussion of the full range of options that can be used with the NLOPTIONS statement is given in Chapter 6, “Nonlinear Optimization Methods.” In PROC UCM all these options are available except the options related to the printing of the optimization history. In this version of PROC UCM all the printed output from the NLO subsystem is suppressed.
OUTLIER Statement
OUTLIER < options > ;
The OUTLIER statement enables you to control the reporting of the additive outliers (AO) and level shifts (LS) in the response series. The AOs are searched for by default. You can turn on the search for LSs by using the CHECKBREAK option in the LEVEL statement.
ALPHA=significance-level
specifies the significance level for reporting the outliers. The default is 0.05.
MAXNUM=number
limits the number of outliers to search for. The default is MAXNUM=5.
MAXPCT=number
is similar to the MAXNUM= option. In the MAXPCT= option you can limit the number of outliers to search for according to a percentage of the series length. The default is MAXPCT=1. When both of these options are specified, the minimum of the two search numbers is used.
PRINT=SHORT | DETAIL
enables you to control the printed output of the outlier search. The PRINT=SHORT option, which is the default, produces an outlier summary table containing the most significant outliers, either AO or LS, discovered in the outlier search. The PRINT=DETAIL option produces, in addition to the outlier summary table, separate tables containing the AO and LS structural break chi-square statistics computed at each time point in the estimation span.
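The following sketch shows how the OUTLIER statement might be combined with the CHECKBREAK option of the LEVEL statement so that both additive outliers and level shifts are searched for. It assumes the SASHELP.AIR sample data set (variables DATE and AIR); the option values are illustrative, not recommendations.

```sas
/* Sketch only: SASHELP.AIR is an assumed sample data set. */
proc ucm data=sashelp.air;
   id date interval=month;
   model air;
   irregular;
   level checkbreak;              /* turn on the level shift (LS) search */
   slope;
   season length=12 type=trig;
   /* Both limits are given, so the smaller of the two applies. */
   outlier maxnum=10 maxpct=10 alpha=0.01 print=detail;
run;
```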
RANDOMREG Statement
RANDOMREG regressors < / options > ;
The RANDOMREG statement is used to specify regressors with time-varying regression coefficients. Each regression coefficient, say $\beta_t$, is assumed to evolve as a random walk:
$$\beta_t = \beta_{t-1} + \eta_t, \qquad \eta_t \sim \text{i.i.d. } N(0, \sigma^2)$$
Of course, if the random walk disturbance variance $\sigma^2$ is zero, then the regression coefficient is not time varying, and it reduces to the standard regression setting. There can be multiple RANDOMREG statements, and each statement can contain one or more regressors. The regressors in a given RANDOMREG statement form a group that is assumed to share the same disturbance variance parameter. The random walks associated with different regressors are assumed to be independent. For an example of using this statement, see Example 29.4. See the section “Reporting Parameter Estimates for Random Regressors” on page 1794 for additional information about the way parameter estimates are reported for this type of regressor.
NOEST
fixes the value of $\sigma^2$ to the value specified in the VARIANCE= option.
PLOT=FILTER
PLOT=SMOOTH
PLOT=( < FILTER > < SMOOTH > )
requests plotting of the filtered or smoothed estimate of the time-varying regression coefficient.
PRINT=FILTER
PRINT=SMOOTH
PRINT=( < FILTER > < SMOOTH > )
requests printing of the filtered or smoothed estimate of the time-varying regression coefficient.
VARIANCE=value
specifies an initial value for $\sigma^2$ during the parameter estimation process. Any nonnegative value, including zero, is an acceptable starting value.
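A sketch of how fixed and time-varying regressors might be combined follows. The data set WORK.SALES and its variables DATE, SALES, PRICE, and PROMO are hypothetical and serve only to show where each regressor is declared: PROMO in the MODEL statement (time-invariant coefficient) and PRICE in the RANDOMREG statement (random walk coefficient).

```sas
/* Sketch only: WORK.SALES and its variables are hypothetical. */
proc ucm data=work.sales;
   id date interval=month;
   model sales = promo;             /* promo: time-invariant coefficient   */
   randomreg price / plot=smooth;   /* price: random walk coefficient;     */
                                    /* plot its smoothed estimate          */
   irregular;
   level;
run;
```

If the estimated disturbance variance for PRICE turns out to be close to zero, the coefficient is effectively constant and the regressor could be moved to the MODEL statement instead.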
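SEASON Statement
SEASON LENGTH = integer < options > ;
The SEASON or SEASONAL statement is used to specify a seasonal component, $\gamma_t$, in the model. A seasonal component can be one of the two types, DUMMY or TRIG. A DUMMY seasonal with season length $s$ satisfies the following stochastic equation:
$$\sum_{i=0}^{s-1} \gamma_{t-i} = \omega_t, \qquad \omega_t \sim \text{i.i.d. } N(0, \sigma^2_\omega)$$
The equations for a TRIG (short for trigonometric) seasonal component are as follows:
$$\gamma_t = \sum_{j=1}^{[s/2]} \gamma_{j,t}$$
where $[s/2]$ equals $s/2$ if $s$ is even and $(s-1)/2$ if it is odd. The sinusoids, also called harmonics, $\gamma_{j,t}$ have frequencies $\lambda_j = 2\pi j/s$ and are specified by the matrix equation
$$\begin{bmatrix} \gamma_{j,t} \\ \gamma^*_{j,t} \end{bmatrix} = \begin{bmatrix} \cos\lambda_j & \sin\lambda_j \\ -\sin\lambda_j & \cos\lambda_j \end{bmatrix} \begin{bmatrix} \gamma_{j,t-1} \\ \gamma^*_{j,t-1} \end{bmatrix} + \begin{bmatrix} \omega_{j,t} \\ \omega^*_{j,t} \end{bmatrix}$$
where the disturbances $\omega_{j,t}$ and $\omega^*_{j,t}$ are assumed to be independent and, for fixed $j$, $\omega_{j,t}$ and $\omega^*_{j,t} \sim N(0, \sigma^2_\omega)$. If $s$ is even, then the equation for $\gamma^*_{s/2,t}$ is not needed and $\gamma_{s/2,t}$ is given by
$$\gamma_{s/2,t} = -\gamma_{s/2,t-1} + \omega_{s/2,t}$$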
In the TRIG seasonal case, the option KEEPH= or DROPH= can be used to obtain subset trigonometric seasonals that contain only a subset of the full set of harmonics $\gamma_{j,t}$, $j = 1, 2, \ldots, [s/2]$. This is particularly useful when the season length $s$ is large and the seasonal pattern is relatively smooth. Note that whether the seasonal type is DUMMY or TRIG, there is only one parameter, the disturbance variance $\sigma^2_\omega$, in the seasonal model. There can be more than one seasonal component in the model, necessarily with different season lengths if the seasons are full. You can have multiple subset season components with the same season length, if you need to use separate disturbance variances for different sets of harmonics. Each seasonal component is specified using a separate SEASON statement. A model with multiple seasonal components can easily become quite complex and might need a large amount of data and computing resources for its estimation and forecasting. The examples that follow illustrate the use of the SEASON statement. The following statement specifies a DUMMY type (default) seasonal component with a season length of four, corresponding to the quarterly seasonality. The disturbance variance $\sigma^2_\omega$ is estimated from the data.
   season length=4;
The following statement specifies a trigonometric seasonal with monthly seasonality. It also provides a starting value for $\sigma^2_\omega$.
   season length=12 type=trig variance=4;
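As a worked illustration of the harmonics in a trigonometric seasonal, consider monthly data with season length $s = 12$, so that $[s/2] = 6$. The notation assumed here is the one used for the TRIG equations: $\gamma_{j,t}$ for the $j$th harmonic, $\lambda_j = 2\pi j / s$ for its frequency, and $\omega_{j,t}$ for its disturbance. The period of the $j$th harmonic is $s/j$ observations.

```latex
\begin{array}{c|cccccc}
j                    & 1       & 2       & 3       & 4        & 5        & 6   \\ \hline
\lambda_j = 2\pi j/12 & \pi/6   & \pi/3   & \pi/2   & 2\pi/3   & 5\pi/6   & \pi \\
\text{period } 12/j  & 12      & 6       & 4       & 3        & 2.4      & 2
\end{array}
\qquad
\cos\lambda_6 = -1,\ \sin\lambda_6 = 0
\;\Rightarrow\;
\gamma_{6,t} = -\gamma_{6,t-1} + \omega_{6,t}
```

The last implication shows why no second equation is needed for the harmonic $j = s/2$ when $s$ is even: at frequency $\pi$ the rotation matrix is diagonal, so the two rows of the matrix equation decouple. The table also suggests why dropping the high-frequency harmonics (large $j$) yields a smoother seasonal pattern.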
DROPHARMONICS|DROPH=number-list | n TO m BY p
enables you to drop some harmonics $\gamma_{j,t}$ from the full set of harmonics used to obtain a trigonometric seasonal. The drop list can include any integer between 1 and $[s/2]$, $s$ being the season length. For example, the following specification results in a trigonometric seasonal with a season length of 12 that consists of only the first four harmonics $\gamma_{j,t}$, $j = 1, 2, 3, 4$:
   season length=12 type=trig DROPH=5 6;
The last two high frequency harmonics are dropped. The DROPH= option cannot be used with the KEEPH= option.
KEEPHARMONICS|KEEPH=number-list | n TO m BY p
enables you to keep only the harmonics $\gamma_{j,t}$ listed in the option to obtain a trigonometric seasonal. The keep list can include any integer between 1 and $[s/2]$, $s$ being the season length. For example, the following specification results in a trigonometric seasonal with a season length of 12 that consists of all the six harmonics $\gamma_{j,t}$, $j = 1, \ldots, 6$:
   season length=12 type=trig KEEPH=1 to 3;
   season length=12 type=trig KEEPH=4 to 6;
However, these six harmonics are grouped into two groups, each having its own disturbance variance parameter. The DROPH= option cannot be used with the KEEPH= option.
LENGTH=integer
specifies the season length, $s$. This is a required option in this statement. The season length can be any integer greater than or equal to 2. Typical examples of season lengths are 12, corresponding to the monthly seasonality, or 4, corresponding to the quarterly seasonality.
NOEST
fixes the value of the disturbance variance parameter to the value specified in the VARIANCE= option.
PLOT=FILTER
PLOT=SMOOTH
PLOT=F_ANNUAL
PLOT=S_ANNUAL
PLOT=(
requests plots of the season component. When you specify only one plot request, you can omit the parentheses around the plot request. You can use the FILTER and SMOOTH options to plot the filtered and smoothed estimates of the season component $\gamma_t$. You can use the F_ANNUAL and S_ANNUAL options to get the plots of “annual” variation in the filtered and smoothed estimates of $\gamma_t$. The annual plots are useful to see the change in the contribution of a particular month over the span of years. Here “month” and “year” are generic terms that change appropriately with the interval type being used to label the observations and the season length. For example, for monthly data with a season length of 12, the usual meaning applies, while for daily data with a season length of 7, the days of the week serve as months and the weeks serve as years.
PRINT=HARMONICS
requests printing of the summary of harmonics present in the seasonal component. This option is valid only for the trigonometric seasonal component.
PRINT=FILTER
PRINT=SMOOTH
PRINT=( < print request > . . . < print request > )
requests printing of the filtered or smoothed estimate of the seasonal component $\gamma_t$.
TYPE=DUMMY | TRIG
specifies the type of the seasonal component. The default type is DUMMY.
VARIANCE=value
specifies an initial value for the disturbance variance, $\sigma^2_\omega$, in the $\gamma_t$ equation at the start of the parameter estimation process. Any nonnegative value, including zero, is an acceptable starting value.
SLOPE Statement
SLOPE < options > ;
The SLOPE statement is used to include a slope component in the model. The slope component cannot be used without the level component (see the LEVEL statement). The level and slope specifications jointly define the trend component of the model. A SLOPE statement without the accompanying LEVEL statement is ignored. The equations of the trend, defined jointly by the level $\mu_t$ and slope $\beta_t$, are as follows:
$$\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t, \qquad \eta_t \sim \text{i.i.d. } N(0, \sigma^2_\eta)$$
$$\beta_t = \beta_{t-1} + \xi_t, \qquad \xi_t \sim \text{i.i.d. } N(0, \sigma^2_\xi)$$
The SLOPE statement is used to specify the value of the disturbance variance, $\sigma^2_\xi$, in the slope equation, and to request forecasts of $\beta_t$. The following examples illustrate this statement:
   level;
   slope;
The preceding statements fit a model with a locally linear trend. The disturbance variances $\sigma^2_\eta$ and $\sigma^2_\xi$ are estimated from the data. You can request a locally linear trend with fixed slope by using the following statements:
   level;
   slope variance=0 noest;
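To see what fixing the slope disturbance variance at zero implies, the trend equations can be worked through directly. The symbols are the ones assumed in this section: $\mu_t$ for the level, $\beta_t$ for the slope, $\eta_t$ and $\xi_t$ for the level and slope disturbances, and $\sigma^2_\eta$ and $\sigma^2_\xi$ for their variances; $\mu_0$ and $\beta$ denote the initial level and the constant slope.

```latex
\begin{align*}
\sigma^2_\xi = 0
  &\;\Rightarrow\; \xi_t = 0
  \;\Rightarrow\; \beta_t = \beta_{t-1} = \cdots = \beta \\
\mu_t &= \mu_{t-1} + \beta + \eta_t
  \qquad \text{(random walk with drift } \beta\text{)} \\
\mu_t &= \mu_0 + \beta\, t + \sum_{k=1}^{t} \eta_k \\
\sigma^2_\eta = 0 \text{ as well}
  &\;\Rightarrow\; \mu_t = \mu_0 + \beta\, t
  \qquad \text{(deterministic linear trend)}
\end{align*}
```

So a locally linear trend with fixed slope is a random walk with constant drift, and additionally fixing the level disturbance variance at zero reduces it to a straight line.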
NOEST
fixes the value of the disturbance variance, $\sigma^2_\xi$, to the value specified in the VARIANCE= option.
PLOT=FILTER
PLOT=SMOOTH
PLOT=( < FILTER > < SMOOTH > )
requests plotting of the filtered or smoothed estimate of the slope component.
PRINT=FILTER
PRINT=SMOOTH
PRINT=( < FILTER > < SMOOTH > )
requests printing of the filtered or smoothed estimate of the slope component ˇt .
VARIANCE=value
specifies an initial value for the disturbance variance, $\sigma^2_\xi$, in the $\beta_t$ equation at the start of the parameter estimation process. Any nonnegative value, including zero, is an acceptable starting value.
SPLINEREG Statement
SPLINEREG regressor < options > ;
The SPLINEREG statement is used to specify a regressor that has a nonlinear relationship with the dependent series that can be approximated by a given B-spline. If the specified spline has degree $d$ and is based on $n$ internal knots, then it is known that it can be written as a linear combination of $(n + d + 1)$ regressors that are derived from the original regressor. The span of these $(n + d + 1)$ derived regressors includes a constant; therefore, to avoid multicollinearity with the level component, one of these regressors is dropped. Specifying the SPLINEREG statement is equivalent to specifying a RANDOMREG statement with these derived regressors. There can be multiple SPLINEREG statements. You must specify at least one interior knot, either using the NKNOTS= option or the KNOTS= option. For additional information about splines, see Chapter 90, “The TRANSREG Procedure” (SAS/STAT User’s Guide). For an example of using this statement, see Example 29.6. See the section “Reporting Parameter Estimates for Random Regressors” on page 1794 for additional information about the way parameter estimates are reported for this type of regressor.
DEGREE=integer
specifies the degree of the spline. It can be any integer larger than or equal to zero. The default value is 3. The polynomial degree should be a small integer, usually 0, 1, 2, or 3. Larger values are rarely useful. If you have any doubt as to what degree to specify, use the default.
KNOTS=number-list | n TO m BY p
specifies the interior knots or break points. The values in the knot list must be nondecreasing and must lie between the minimum and the maximum of the spline regressor values in the input data set. The first time you specify a value in the knot list, it indicates a discontinuity in the $n$th (from DEGREE=n) derivative of the transformation function at the value of the knot. The second mention of a value indicates a discontinuity in the $(n - 1)$th derivative of the transformation function at the value of the knot. Knots can be repeated any number of times for decreasing smoothness at the break points, but the values in the knot list can never decrease. You cannot use the KNOTS= option with the NKNOTS= option. You should keep the number of knots small.
NKNOTS=m
creates $m$ knots, the first at the $100/(m + 1)$ percentile, the second at the $200/(m + 1)$ percentile, and so on. Knots are always placed at data values; there is no interpolation. For example, if NKNOTS=3, knots are placed at the 25th percentile, the median, and the 75th percentile. The value specified for the NKNOTS= option must be at least 1. You cannot use the NKNOTS= option with the KNOTS= option.
SPLINESEASON Statement F 1779
N OTE : Specifying knots by using the NKNOTS= option can result in different sets of knots in the estimation and forecast stages if the distributions of regressor values in the estimation and forecast spans differ. The estimation span is based on the BACK= and SKIPFIRST= options in the ESTIMATE statement, and the forecast span is based on the BACK= and SKIPFIRST= options in the FORECAST statement. NOEST
fixes the value of the regression coefficient random walk disturbance variance to the value specified in the VARIANCE= option. PLOT=FILTER PLOT=SMOOTH PLOT=( < FILTER > < SMOOTH > )
requests plotting of filtered or smoothed estimate of the time-varying regression coefficient. PRINT=FILTER PRINT=SMOOTH PRINT=( < FILTER > < SMOOTH > )
requests printing of filtered or smoothed estimate of the time-varying regression coefficient. VARIANCE=value
specifies an initial value for the regression coefficient random walk disturbance variance during the parameter estimation process. Any nonnegative value, including zero, is an acceptable starting value.
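The knot placement described for the NKNOTS= option, and the caution in the NOTE about differing estimation and forecast spans, can be illustrated with a small Python sketch. This is not PROC UCM code: the function names are hypothetical, and the nearest-rank percentile rule is an assumption (the procedure's exact percentile definition is not stated here).

```python
def nknots_positions(values, m):
    """Place m knots at the 100*k/(m+1) percentiles, k = 1..m, at data values only.

    Assumption: nearest-rank percentile rule (no interpolation). The rule
    used by the procedure itself may differ in detail.
    """
    if m < 1:
        raise ValueError("NKNOTS= must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    knots = []
    for k in range(1, m + 1):
        rank = -(-k * n // (m + 1))          # ceil(k*n/(m+1)) in integer arithmetic
        knots.append(ordered[max(rank, 1) - 1])
    return knots


def n_derived_regressors(n_knots, degree=3):
    """A degree-d spline on n interior knots spans n + d + 1 regressors; one is dropped."""
    return n_knots + degree + 1 - 1


estimation_span = list(range(1, 101))        # hypothetical regressor values 1..100
forecast_span = list(range(51, 151))         # same length, shifted distribution

print(nknots_positions(estimation_span, 3))  # quartile knots for the estimation span
print(nknots_positions(forecast_span, 3))    # different knots for the forecast span
print(n_derived_regressors(3, degree=3))
```

Because the two spans have different distributions, the two knot sets differ, which is the situation the NOTE warns about. Listing the knots explicitly with KNOTS= avoids this dependence on the data span.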
SPLINESEASON Statement

SPLINESEASON LENGTH=integer KNOTS=integer1 integer2 . . . < options > ;

The SPLINESEASON statement is used to specify a seasonal pattern that is to be approximated by a given B-spline. If the specified spline has degree d and is based on n internal knots, then it can be written as a linear combination of $(n + d)$ regressors that are derived from the seasonal dummy regressors. The SPLINESEASON specification is equivalent to specifying a RANDOMREG specification with these derived regressors. Such an approximation is useful only if the season length is relatively large, at least larger than $(n + d)$. For additional information about splines, see Chapter 90, “The TRANSREG Procedure” (SAS/STAT User’s Guide). For an example of using this statement, see Example 29.3.

DEGREE=integer
specifies the degree of the spline. It can be any integer greater than or equal to zero. The default value is 3.

KNOTS=integer1 integer2 . . .
lists the internal knots. This list of values must be a nondecreasing sequence of integers within the range of 2 to $(s - 1)$, where s is the season length specified in the LENGTH= option. This is a required option in this statement.

LENGTH=integer
specifies the season length, s. This is a required option in this statement. The length can be any integer greater than or equal to three.

NOEST
fixes the value of the regression coefficient random walk disturbance variance to the value specified in the VARIANCE= option.

OFFSET=integer
specifies the position of the first measurement within the season, if the first measurement is not at the start of the season. The OFFSET= value must be between one and the season length. The default value is one. The first measurement refers to the start of the estimation span and the forecast span. If these spans differ, their starting measurements must be separated by an integer multiple of the season length.

PLOT=FILTER
PLOT=SMOOTH
PLOT=( < FILTER > < SMOOTH > )
requests plots of the season component. When you specify only one plot request, you can omit the parentheses around the plot request. You can use the FILTER and SMOOTH options to plot the filtered and smoothed estimates of the season component.

PRINT=FILTER
PRINT=SMOOTH
PRINT=( < FILTER > < SMOOTH > )
requests the printing of the filtered or smoothed estimate of the spline season component.

RKNOTS=(knot, . . . , knot ) . . . (knot, . . . , knot )   (Experimental)
specifies a grouping of knots such that the knots within the same group have identical seasonal values. The knots specified in this option must already be present in the list specified by the KNOTS= option. The knot groups must be non-overlapping and without any repeated knots.

VARIANCE=value
specifies an initial value for the regression coefficient random walk disturbance variance during the parameter estimation process. Any nonnegative value, including zero, is an acceptable starting value.
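The documented constraints on LENGTH=, KNOTS=, and DEGREE= can be collected in a small checker. The following Python sketch is purely illustrative: `check_splineseason` is a hypothetical helper, not part of any SAS interface, and the example values (a season of length 24 with seven knots) are made up.

```python
def check_splineseason(length, knots, degree=3):
    """Check the documented SPLINESEASON constraints.

    Returns the number of derived regressors, n + d, where n is the number
    of internal knots and d is the spline degree.
    """
    if length < 3:
        raise ValueError("LENGTH= must be an integer >= 3")
    if degree < 0:
        raise ValueError("DEGREE= must be an integer >= 0")
    if not knots:
        raise ValueError("KNOTS= is required")
    if any(b < a for a, b in zip(knots, knots[1:])):
        raise ValueError("KNOTS= must be a nondecreasing sequence")
    if any(k < 2 or k > length - 1 for k in knots):
        raise ValueError("each knot must lie in the range 2 to s - 1")
    return len(knots) + degree


n_regressors = check_splineseason(24, [3, 6, 9, 12, 15, 18, 21], degree=3)
print(n_regressors, "derived regressors for a season of length 24")
```

In this hypothetical case the spline uses fewer regressors than the season length, which is the situation in which the approximation is described as useful.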
Details: UCM Procedure
An Introduction to Unobserved Component Models

A UCM decomposes the response series into components such as trend, seasons, cycles, and the regression effects due to predictor series. The following model shows a possible scenario:

\[ y_t = \mu_t + \gamma_t + \psi_t + \sum_{j=1}^{m} \beta_j x_{jt} + \epsilon_t, \qquad \epsilon_t \sim \text{i.i.d. } N(0, \sigma^2_\epsilon) \]

The terms $\mu_t$, $\gamma_t$, and $\psi_t$ represent the trend, seasonal, and cyclical components, respectively. In fact the model can contain multiple seasons and cycles, and the seasons can be of different types. For simplicity of discussion the preceding model contains only one of each of these components. The regression term, $\sum_{j=1}^{m} \beta_j x_{jt}$, includes the contribution of regression variables with fixed regression coefficients. A model can also contain regression variables that have time-varying regression coefficients or that have a nonlinear relationship with the dependent series (see “Incorporating Predictors of Different Kinds” on page 1793). The disturbance term $\epsilon_t$, also called the irregular component, is usually assumed to be Gaussian white noise. In some cases it is useful to model the irregular component as a stationary ARMA process. See the section “Modeling the Irregular Component” on page 1785 for additional information.

By controlling the presence or absence of various terms and by choosing the proper flavor of the included terms, the UCMs can generate a rich variety of time series patterns. A UCM can be applied to variables after transforming them by transforms such as log and difference.

The components $\mu_t$, $\gamma_t$, and $\psi_t$ model structurally different aspects of the time series. For example, the trend $\mu_t$ models the natural tendency of the series in the absence of any other perturbing effects such as seasonality, cyclical components, and the effects of exogenous variables, while the seasonal component $\gamma_t$ models the correction to the level due to the seasonal effects. These components are assumed to be statistically independent of each other and independent of the irregular component. All of the component models can be thought of as stochastic generalizations of the relevant deterministic patterns in time. This way the deterministic cases emerge as special cases of the stochastic models. The different models available for these unobserved components are discussed next.
Modeling the Trend

As mentioned earlier, the trend in a series can be loosely defined as the natural tendency of the series in the absence of any other perturbing effects. The UCM procedure offers two ways to model the trend component $\mu_t$. The first model, called the random walk (RW) model, implies that the trend remains roughly constant throughout the life of the series without any persistent upward or downward drift. In the second model the trend is modeled as a locally linear time trend (LLT). The RW model can be described as

\[ \mu_t = \mu_{t-1} + \eta_t, \qquad \eta_t \sim \text{i.i.d. } N(0, \sigma^2_\eta) \]

Note that if $\sigma^2_\eta = 0$, then the model becomes $\mu_t = \text{constant}$. In the LLT model the trend is locally linear, consisting of both the level and slope. The LLT model is

\[ \begin{aligned} \mu_t &= \mu_{t-1} + \beta_{t-1} + \eta_t, & \eta_t &\sim \text{i.i.d. } N(0, \sigma^2_\eta) \\ \beta_t &= \beta_{t-1} + \xi_t, & \xi_t &\sim \text{i.i.d. } N(0, \sigma^2_\xi) \end{aligned} \]

The disturbances $\eta_t$ and $\xi_t$ are assumed to be independent. There are some interesting special cases of this model obtained by setting one or both of the disturbance variances $\sigma^2_\eta$ and $\sigma^2_\xi$ equal to zero. If $\sigma^2_\xi$ is set equal to zero, then you get a linear trend model with fixed slope. If $\sigma^2_\eta$ is set to zero, then the resulting model usually has a smoother trend. If both the variances are set to zero, then the resulting model is the deterministic linear time trend: $\mu_t = \mu_0 + \beta_0 t$. You can incorporate these trend patterns in your model by using the LEVEL and SLOPE statements.
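The special cases of the locally linear trend can be explored with a short simulation. The following Python sketch simply iterates the two LLT recursions; the function name, starting values, and standard deviations are illustrative assumptions, and nothing here reflects how PROC UCM itself computes the trend.

```python
import random


def simulate_llt(n, mu0, beta0, sd_eta, sd_xi, seed=0):
    """Simulate mu_1..mu_n from the LLT recursions.

    mu_t   = mu_{t-1} + beta_{t-1} + eta_t,   eta_t ~ N(0, sd_eta^2)
    beta_t = beta_{t-1} + xi_t,               xi_t  ~ N(0, sd_xi^2)
    """
    rng = random.Random(seed)
    mu, beta, path = mu0, beta0, []
    for _ in range(n):
        mu = mu + beta + rng.gauss(0.0, sd_eta)   # uses the previous slope
        beta = beta + rng.gauss(0.0, sd_xi)
        path.append(mu)
    return path


# Both variances zero: the deterministic line mu_t = mu_0 + beta_0 * t.
deterministic = simulate_llt(5, mu0=10.0, beta0=2.0, sd_eta=0.0, sd_xi=0.0)
# Slope variance zero: linear trend with fixed slope and a noisy level.
fixed_slope = simulate_llt(5, mu0=10.0, beta0=2.0, sd_eta=1.0, sd_xi=0.0)
# Level variance zero: the level changes only through the slope (smoother trend).
smooth = simulate_llt(5, mu0=10.0, beta0=2.0, sd_eta=0.0, sd_xi=0.5)

print(deterministic)
print(fixed_slope)
print(smooth)
```

Setting both standard deviations to zero reproduces the deterministic linear time trend exactly, which is a convenient sanity check on the recursion.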
Modeling a Cycle

A deterministic cycle $\psi_t$ with frequency $\lambda$, $0 < \lambda < \pi$, can be written as

\[ \psi_t = \alpha \cos(\lambda t) + \beta \sin(\lambda t) \]

If the argument t is measured on a continuous scale, then $\psi_t$ is a periodic function with period $2\pi/\lambda$, amplitude $\gamma = (\alpha^2 + \beta^2)^{1/2}$, and phase $\phi = \tan^{-1}(\beta/\alpha)$. Equivalently, the cycle can be written in terms of the amplitude and phase as

\[ \psi_t = \gamma \cos(\lambda t - \phi) \]

Note that when $\psi_t$ is measured only at the integer values, it is not exactly periodic, unless $\lambda = (2\pi j)/k$ for some integers j and k. The cycles in their pure form are not used very often in practice. However, they are very useful as building blocks for more complex periodic patterns. It is well known that a periodic pattern of any complexity can be written as a sum of pure cycles of different frequencies and amplitudes. In time series situations it is useful to generalize this simple cyclical pattern to a stochastic cycle that has a fixed period but time-varying amplitude and phase. The stochastic cycle considered here is motivated by the following recursive formula for computing $\psi_t$:

\[ \begin{bmatrix} \psi_t \\ \psi^*_t \end{bmatrix} = \begin{bmatrix} \cos\lambda & \sin\lambda \\ -\sin\lambda & \cos\lambda \end{bmatrix} \begin{bmatrix} \psi_{t-1} \\ \psi^*_{t-1} \end{bmatrix} \]

starting with $\psi_0 = \alpha$ and $\psi^*_0 = \beta$. Note that $\psi_t$ and $\psi^*_t$ satisfy the relation

\[ \psi_t^2 + \psi_t^{*2} = \alpha^2 + \beta^2 \quad \text{for all } t \]
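The recursion and the invariant can be checked numerically. The following Python sketch iterates the rotation from $(\psi_0, \psi^*_0) = (\alpha, \beta)$ and compares the result with the closed form $\alpha \cos(\lambda t) + \beta \sin(\lambda t)$; the values of $\alpha$, $\beta$, and $\lambda$ are arbitrary illustrative choices.

```python
import math


def cycle_path(alpha, beta, lam, n):
    """Iterate the rotation recursion n times from (psi_0, psi*_0) = (alpha, beta)."""
    c, s = math.cos(lam), math.sin(lam)
    psi, psi_star = alpha, beta
    path = [(psi, psi_star)]
    for _ in range(n):
        psi, psi_star = c * psi + s * psi_star, -s * psi + c * psi_star
        path.append((psi, psi_star))
    return path


alpha, beta, lam = 1.5, -0.5, 2 * math.pi / 12    # a cycle with period 12
path = cycle_path(alpha, beta, lam, 24)

# Largest deviation of the recursion from the closed form, and from the invariant.
max_gap = max(abs(psi - (alpha * math.cos(lam * t) + beta * math.sin(lam * t)))
              for t, (psi, _) in enumerate(path))
max_drift = max(abs(psi ** 2 + star ** 2 - (alpha ** 2 + beta ** 2))
                for psi, star in path)
print(max_gap, max_drift)    # both are at the level of floating-point rounding
```

Because $\lambda = 2\pi/12$ here, the pair $(\psi_t, \psi^*_t)$ returns to its starting value every 12 steps, up to rounding.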
A stochastic generalization of the cycle $\psi_t$ can be obtained by adding random noise to this recursion and by introducing a damping factor, $\rho$, for additional modeling flexibility. This model can be described as follows:

\[ \begin{bmatrix} \psi_t \\ \psi^*_t \end{bmatrix} = \rho \begin{bmatrix} \cos\lambda & \sin\lambda \\ -\sin\lambda & \cos\lambda \end{bmatrix} \begin{bmatrix} \psi_{t-1} \\ \psi^*_{t-1} \end{bmatrix} + \begin{bmatrix} \nu_t \\ \nu^*_t \end{bmatrix} \]

where $0 \le \rho \le 1$, and the disturbances $\nu_t$ and $\nu^*_t$ are independent $N(0, \sigma^2_\nu)$ variables. The resulting stochastic cycle has a fixed period but time-varying amplitude and phase. The stationarity properties of the random sequence $\psi_t$ depend on the damping factor $\rho$. If $\rho < 1$, $\psi_t$ has a stationary distribution with mean zero and variance $\sigma^2_\nu / (1 - \rho^2)$. If $\rho = 1$, $\psi_t$ is nonstationary.

You can incorporate a cycle in a UCM by specifying a CYCLE statement. You can include multiple cycles in the model by using separate CYCLE statements for each included cycle.

As mentioned before, the cycles are very useful as building blocks for constructing more complex periodic patterns. Periodic patterns of almost any complexity can be created by superimposing cycles of different periods and amplitudes. In particular, the seasonal patterns, general periodic patterns with integer periods, can be constructed as sums of cycles. This important topic of modeling the seasonal components is considered next.
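A simulation can make the role of the damping factor concrete. The Python sketch below iterates the damped, noisy recursion and compares the sample variance of a long simulated path with the stationary variance $\sigma^2_\nu / (1 - \rho^2)$. The function names, the seed, and the parameter values are illustrative assumptions.

```python
import math
import random


def stochastic_cycle(n, lam, rho, sd_nu, psi0=0.0, psi_star0=0.0, seed=0):
    """Simulate psi_1..psi_n from the damped rotation with Gaussian disturbances."""
    rng = random.Random(seed)
    c, s = math.cos(lam), math.sin(lam)
    psi, psi_star, out = psi0, psi_star0, []
    for _ in range(n):
        nu, nu_star = rng.gauss(0.0, sd_nu), rng.gauss(0.0, sd_nu)
        psi, psi_star = (rho * (c * psi + s * psi_star) + nu,
                         rho * (-s * psi + c * psi_star) + nu_star)
        out.append(psi)
    return out


def stationary_variance(rho, sd_nu):
    """Variance sigma_nu^2 / (1 - rho^2) of the stationary cycle (requires rho < 1)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("a stationary distribution exists only for 0 <= rho < 1")
    return sd_nu ** 2 / (1.0 - rho ** 2)


path = stochastic_cycle(50000, lam=2 * math.pi / 12, rho=0.9, sd_nu=1.0)
sample_var = sum(x * x for x in path) / len(path)
print(sample_var, stationary_variance(0.9, 1.0))   # the two should be close
```

With the disturbance standard deviation set to zero, the same function shows pure damping: the amplitude shrinks by the factor $\rho$ at every step.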
Modeling Seasons

The seasonal fluctuations are a common source of variation in time series data. These fluctuations arise because of the regular changes in seasons or some other periodic events. The seasonal effects are regarded as corrections to the general trend of the series due to the seasonal variations, and these effects sum to zero when summed over the full season cycle. Therefore the seasonal component $\gamma_t$ is modeled as a stochastic periodic pattern of an integer period s such that the sum $\sum_{i=0}^{s-1} \gamma_{t-i}$ is always zero in the mean. The period s is called the season length. Two different models for the seasonal component are considered here. The first model is called the dummy variable form of the seasonal component. It is described by the equation

\[ \sum_{i=0}^{s-1} \gamma_{t-i} = \omega_t, \qquad \omega_t \sim \text{i.i.d. } N(0, \sigma^2_\omega) \]

The other model is called the trigonometric form of the seasonal component. In this case $\gamma_t$ is modeled as a sum of cycles of different frequencies. This model is given as follows:

\[ \gamma_t = \sum_{j=1}^{[s/2]} \gamma_{j,t} \]

where $[s/2]$ equals $s/2$ if s is even and $(s - 1)/2$ if it is odd. The cycles $\gamma_{j,t}$ have frequencies $\lambda_j = 2\pi j/s$ and are specified by the matrix equation

\[ \begin{bmatrix} \gamma_{j,t} \\ \gamma^*_{j,t} \end{bmatrix} = \begin{bmatrix} \cos\lambda_j & \sin\lambda_j \\ -\sin\lambda_j & \cos\lambda_j \end{bmatrix} \begin{bmatrix} \gamma_{j,t-1} \\ \gamma^*_{j,t-1} \end{bmatrix} + \begin{bmatrix} \omega_{j,t} \\ \omega^*_{j,t} \end{bmatrix} \]

where the disturbances $\omega_{j,t}$ and $\omega^*_{j,t}$ are assumed to be independent and, for fixed j, $\omega_{j,t}$ and $\omega^*_{j,t} \sim N(0, \sigma^2_\omega)$. If s is even, then the equation for $\gamma^*_{s/2,t}$ is not needed and $\gamma_{s/2,t}$ is given by

\[ \gamma_{s/2,t} = -\gamma_{s/2,t-1} + \omega_{s/2,t} \]
The cycles $\gamma_{j,t}$ are called harmonics. If the seasonal component is deterministic, the decomposition of the seasonal effects into these harmonics is identical to its Fourier decomposition. In this case the sum of squares of the seasonal factors equals the sum of squares of the amplitudes of these harmonics. In many practical situations, the contribution of the high-frequency harmonics is negligible and can be ignored, giving rise to a simpler description of the seasonal. In the case of stochastic seasonals, the situation might not be so transparent; however, similar considerations still apply. Note that if the disturbance variance $\sigma^2_\omega = 0$, then both the dummy and the trigonometric forms of seasonal components reduce to constant seasonal effects. That is, the seasonal component reduces to a deterministic function that is completely determined by its first $s - 1$ values.

In the UCM procedure you can specify a seasonal component in a variety of ways, the SEASON statement being the simplest of these. The dummy and the trigonometric seasonal components discussed so far can be considered as saturated seasonal components that put no restrictions on the $s - 1$ seasonal values. In some cases a more parsimonious representation of the seasonal might be more appropriate. This is particularly useful for seasonal components with large season lengths. In the UCM procedure you can obtain parsimonious representations of the seasonal components in one of the following ways:

- Use a subset trigonometric seasonal component obtained by deleting a few of the $[s/2]$ harmonics used in its sum. For example, a slightly smoother seasonal component of length 12, corresponding to the monthly seasonality, can be obtained by deleting the highest-frequency harmonic of period 2. That is, such a seasonal component will be a sum of five stochastic cycles that have periods 12, 6, 4, 3, and 2.4. You can specify such subset seasonal components by using the KEEPH= or DROPH= option in the SEASON statement.

- Approximate the seasonal pattern by a suitable spline approximation. You can do this by using the SPLINESEASON statement.

- A block-seasonal pattern is a seasonal pattern where the pattern is divided into a few blocks of equal length such that the season values within a block are the same (for example, a monthly seasonal pattern that has only four different values, one for each quarter). In some situations a long seasonal pattern can be approximated by the sum of a block season and a simple season, the length of the simple season being equal to the block length of the block season. You can obtain such an approximation by using a combination of BLOCKSEASON and SEASON statements.

- Consider a seasonal component of a large season length as a sum of two or more seasonal components that are each of much smaller season lengths. This can be done by specifying more than one SEASON statement.

Note that the preceding techniques of obtaining parsimonious seasonal components can also enable you to specify seasonal components that are more general than the simple saturated seasonal components. For example, you can specify a saturated trigonometric seasonal component that has some of its harmonics evolving according to one disturbance variance parameter while the others evolve with another disturbance variance parameter.
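Two of the statements above lend themselves to a quick numerical check: the periods of the harmonics of a monthly season, and the fact that a deterministic seasonal (zero disturbance variance) is fully determined by its first $s - 1$ values. The following Python sketch uses hypothetical helper names and made-up seasonal values.

```python
def harmonic_periods(s):
    """Periods s/j of the [s/2] harmonics, which have frequencies 2*pi*j/s."""
    return [s / j for j in range(1, s // 2 + 1)]


def extend_dummy_seasonal(first_values, n):
    """Extend a deterministic dummy-form seasonal to n values.

    With zero disturbance variance the s most recent values sum to zero, so
    gamma_t = -(gamma_{t-1} + ... + gamma_{t-s+1}), where s = len(first_values) + 1.
    """
    s = len(first_values) + 1
    gamma = list(first_values)
    while len(gamma) < n:
        gamma.append(-sum(gamma[-(s - 1):]))
    return gamma


periods = harmonic_periods(12)
print(periods)          # all six harmonics of a monthly season
print(periods[:-1])     # after dropping the highest-frequency harmonic (period 2)

quarterly = extend_dummy_seasonal([3.0, -1.0, 2.0], 12)   # s = 4
print(quarterly)        # repeats with period 4; each full season sums to zero
```

The second print shows the five periods quoted in the text for the smoother monthly seasonal, and the last line shows the pattern repeating once the first three values are fixed.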
Modeling an Autoregression

An autoregression of order one can be thought of as a special case of a cycle when the frequency $\lambda$ is either 0 or $\pi$. Modeling this special case separately helps interpretation and parameter estimation. The autoregression component $r_t$ is modeled as follows:

\[ r_t = \rho\, r_{t-1} + \nu_t, \qquad \nu_t \sim \text{i.i.d. } N(0, \sigma^2_\nu) \]

where $-1 \le \rho < 1$. An autoregression can also provide an alternative to the IRREGULAR component when the model errors show some autocorrelation. You can incorporate an autoregression in your model by using the AUTOREG statement.
Modeling Regression Effects

A predictor variable can affect the response variable in a variety of ways. The UCM procedure enables you to model several different types of predictor-response relationships:

- The predictor-response relationship is linear, and the regression coefficient does not change with time. This is the simplest kind of relationship, and such predictors are specified in the MODEL statement.

- The predictor-response relationship is linear, but the regression coefficient does change with time. Such predictors are specified in the RANDOMREG statement. Here the regression coefficient is assumed to evolve as a random walk.

- The predictor-response relationship is nonlinear, and the relationship can change with time. This type of relationship can be approximated by an appropriate time-varying spline. Such predictors are specified in the SPLINEREG statement.

A response variable can depend on its own past values, that is, on lagged dependent values. Such a relationship can be specified in the DEPLAG statement.
Modeling the Irregular Component

Components such as trend, seasonal and regression effects, and nonstationary cycles are used to capture the structural dynamics of a response series. In contrast, the stationary cycles and the autoregression are used to capture the transient aspects of the response series that are important for its short-range prediction but have little impact on its long-term forecasts. The irregular component represents the residual variation that remains in the response series after it is modeled by using an appropriate selection of structural and transient effects. In most cases, the irregular component can be assumed to be simply Gaussian white noise. In some other cases, however, the residual variation can be more complicated. In such situations, it might be necessary to model the irregular component as a stationary ARMA process. Moreover, you can use the ARMA irregular component together with the dependent lag specification (see the DEPLAG statement) to specify an ARIMA$(p,d,q)(P,D,Q)_s$ model for the response series. See the IRREGULAR statement for the explanation of the ARIMA notation. See Example 29.8 for an example of modeling a series by using an ARIMA$(0,1,1)(0,1,1)_{12}$ model.
The Model Parameters

The parameter vector in a UCM consists of the variances of the disturbance terms of the unobserved components, the damping coefficients and frequencies in the cycles, the damping coefficient in the autoregression, and the regression coefficients in the regression terms. These parameters are estimated by maximizing the likelihood. It is possible to restrict the values of the model parameters to user-specified values.
Model Specification

A UCM is specified by describing the components in the model. For example, consider the model

\[ y_t = \mu_t + \gamma_t + \epsilon_t \]

consisting of the irregular, level, slope, and seasonal components. This model is called the basic structural model (BSM) by Harvey (1989). The syntax for a BSM with monthly seasonality of trigonometric type is as follows:

   model y;
   irregular;
   level;
   slope;
   season length=12 type=trig;

Similarly, the following syntax specifies a BSM with a response variable y, a regressor x, and dummy-type monthly seasonality:

   model y = x;
   irregular;
   level;
   slope variance=0 noest;
   season length=12 type=dummy;

Moreover, the disturbance variance of the slope component is restricted to zero, giving rise to a local linear trend with fixed slope. A model can contain multiple cycle and seasonal components. In such cases the model syntax contains a separate statement for each of these multiple cycle or seasonal components; for example, the syntax for a model containing irregular and level components along with two cycle components could be as follows:

   model y = x;
   irregular;
   level;
   cycle;
   cycle;
The UCMs as State Space Models

The UCMs considered in PROC UCM can be thought of as special cases of more general models, called (linear) Gaussian state space models (GSSM). A GSSM can be described as follows:

\[ \begin{aligned} y_t &= Z_t \alpha_t \\ \alpha_{t+1} &= T_t \alpha_t + \zeta_{t+1}, \qquad \zeta_t \sim N(0, Q_t) \\ \alpha_1 &\sim N(0, P) \end{aligned} \]

The first equation, called the observation equation, relates the response series $y_t$ to a state vector $\alpha_t$ that is usually unobserved. The second equation, called the state equation, describes the evolution of the state vector in time. The system matrices $Z_t$ and $T_t$ are of appropriate dimensions and are known, except possibly for some unknown elements that become part of the parameter vector of the model. The noise series $\zeta_t$ consists of independent, zero-mean, Gaussian vectors with covariance matrices $Q_t$. For most of the UCMs considered here, the system matrices $Z_t$ and $T_t$, and the noise covariances $Q_t$, are time invariant; that is, they do not depend on time. In a few cases, however, some or all of them can depend on time.

The initial state vector $\alpha_1$ is assumed to be independent of the noise series, and its covariance matrix P can be partially diffuse. A random vector has a partially diffuse covariance matrix if it can be partitioned such that one part of the vector has a properly defined probability distribution, while the covariance matrix of the other part is infinite; that is, you have no prior information about this part of the vector. The covariance of the initial state $\alpha_1$ is assumed to have the following form:

\[ P = P_* + \kappa P_\infty \]

where $P_*$ and $P_\infty$ are nonnegative definite, symmetric matrices and $\kappa$ is a constant that is assumed to be close to $\infty$. In the case of UCMs considered here, $P_\infty$ is always a diagonal matrix that consists of zeros and ones, and, if a particular diagonal element of $P_\infty$ is one, then the corresponding row and column in $P_*$ are zero.

The state space formulation of a UCM has many computational advantages. In this formulation there are convenient algorithms for estimating and forecasting the unobserved states $\{\alpha_t\}$ by using the observed series $\{y_t\}$. These algorithms also yield the in-sample and out-of-sample forecasts and the likelihood of $\{y_t\}$. The state space representation of a UCM does not need to be unique.
In the representation used here, the unobserved components in the UCM often appear as elements of the state vector. This makes the elements of the state interpretable and, more important, the sample estimates and forecasts of these unobserved components are easily obtained. For additional information about the computational aspects of the state space modeling, see Durbin and Koopman (2001).

Next, some notation is developed to describe the essential quantities computed during the analysis of the state space models. Let $\{y_t,\ t = 1, \ldots, n\}$ be the observed sample from a series that satisfies a state space model. Next, for $1 \le t \le n$, let the one-step-ahead forecasts of the series, the states, and their variances be defined as follows, using the usual notation to denote the conditional expectation and conditional variance:

\[ \begin{aligned} \hat{\alpha}_t &= E(\alpha_t \mid y_1, y_2, \ldots, y_{t-1}) \\ \Gamma_t &= \mathrm{Var}(\alpha_t \mid y_1, y_2, \ldots, y_{t-1}) \\ \hat{y}_t &= E(y_t \mid y_1, y_2, \ldots, y_{t-1}) \\ F_t &= \mathrm{Var}(y_t \mid y_1, y_2, \ldots, y_{t-1}) \end{aligned} \]
These are also called the filtered estimates of the series and the states. Similarly, for $t \ge 1$, let the following denote the full-sample estimates of the series and the state values at time t:

\[ \begin{aligned} \tilde{\alpha}_t &= E(\alpha_t \mid y_1, y_2, \ldots, y_n) \\ \Delta_t &= \mathrm{Var}(\alpha_t \mid y_1, y_2, \ldots, y_n) \\ \tilde{y}_t &= E(y_t \mid y_1, y_2, \ldots, y_n) \\ G_t &= \mathrm{Var}(y_t \mid y_1, y_2, \ldots, y_n) \end{aligned} \]
If the time t is in the historical period (that is, if $1 \le t \le n$), then the full-sample estimates are called the smoothed estimates, and if t lies in the future then they are called out-of-sample forecasts. Note that if $1 \le t \le n$, then $\tilde{y}_t = y_t$ and $G_t = 0$, unless $y_t$ is missing. All the filtered and smoothed estimates ($\hat{\alpha}_t, \tilde{\alpha}_t, \ldots, G_t$, and so on) are computed by using the Kalman filtering and smoothing (KFS) algorithm, which is an iterative process. If the initial state is diffuse, as is often the case for the UCMs, its treatment requires modification of the traditional KFS, which is called the diffuse KFS (DKFS). The details of DKFS implemented in the UCM procedure can be found in de Jong and Chu-Chun-Lin (2003). Additional information on the state space models can be found in Durbin and Koopman (2001). The likelihood formulas described in this section are taken from the latter reference.

In the case of diffuse initial condition, the effect of the improper prior distribution of $\alpha_1$ manifests itself in the first few filtering iterations. During these initial filtering iterations the distribution of the filtered quantities remains diffuse; that is, during these iterations the one-step-ahead series and state forecast variances $F_t$ and $\Gamma_t$ have the following form:

\[ \begin{aligned} F_t &= F_{*t} + \kappa F_{\infty t} \\ \Gamma_t &= \Gamma_{*t} + \kappa \Gamma_{\infty t} \end{aligned} \]
The actual number of iterations (say, I) affected by this improper prior depends on the nature of the vectors $Z_t$, the number of nonzero diagonal elements of $P_\infty$, and the pattern of missing values in the dependent series. After I iterations, $\Gamma_{\infty t}$ and $F_{\infty t}$ become zero and the one-step-ahead series and state forecasts have proper distributions. These first I iterations constitute the initialization phase of the DKFS algorithm. The post-initialization phase of the DKFS and the traditional KFS is the same. In the state space modeling literature the pre-initialization and post-initialization phases are sometimes called pre-collapse and post-collapse phases of the diffuse Kalman filtering. In certain missing value patterns it is possible for I to exceed the sample size; that is, the sample information can be insufficient to create a proper prior for the filtering process. In these cases, parameter estimation and forecasting are done on the basis of this improper prior, and some or all of the series and component forecasts can have infinite variances (or zero precision). The forecasts that have infinite variance are set to missing. The same situation can occur if the specified model contains components that are essentially multicollinear. In these situations no residual analysis is possible; in particular, no residuals-based goodness-of-fit statistics are produced.

The log likelihood of the sample ($L_\infty$), which takes account of this diffuse initialization step, is computed by using the one-step-ahead series forecasts as follows:

\[ L_\infty(y_1, \ldots, y_n) = -\frac{(n - d)}{2} \log 2\pi - \frac{1}{2} \sum_{t=1}^{I} w_t - \frac{1}{2} \sum_{t=I+1}^{n} \left( \log F_t + \frac{\nu_t^2}{F_t} \right) \]

where d is the number of diffuse elements in the initial state $\alpha_1$, $\nu_t = y_t - Z_t \hat{\alpha}_t$ are the one-step-ahead residuals, and

\[ w_t = \begin{cases} \log F_{\infty t} & \text{if } F_{\infty t} > 0 \\ \log F_{*t} + \dfrac{\nu_t^2}{F_{*t}} & \text{if } F_{\infty t} = 0 \end{cases} \]
If $y_t$ is missing at some time t, then the corresponding summand in the log likelihood expression is deleted, and the constant term is adjusted suitably. Moreover, if the initialization step does not complete (that is, if I exceeds the sample size), then the value of d is reduced to the number of diffuse states that are successfully initialized.

The portion of the log likelihood that corresponds to the post-initialization period is called the nondiffuse log likelihood ($L_0$). The nondiffuse log likelihood is given by

\[ L_0(y_1, \ldots, y_n) = -\frac{1}{2} \sum_{t=I+1}^{n} \left( \log F_t + \frac{\nu_t^2}{F_t} \right) \]
In the case of UCMs considered in PROC UCM, it often happens that the diffuse part of the likelihood, $\sum_{t=1}^{I} w_t$, does not depend on the model parameters, and in these cases the maximization of nondiffuse and diffuse likelihoods is equivalent. However, in some cases, such as when the model consists of dependent lags, the diffuse part does depend on the model parameters. In these cases the maximization of the diffuse and nondiffuse likelihood can produce different parameter estimates.

In some situations it is convenient to reparameterize the nondiffuse initial state covariance $P_*$ as $\sigma^2 P_*$ and the state noise covariance $Q_t$ as $\sigma^2 Q_t$ for some common scalar parameter $\sigma^2$. In this case the preceding log-likelihood expression, up to a constant, can be written as

\[ L_\infty(y_1, \ldots, y_n) = -\frac{1}{2} \sum_{t=1}^{I} w_t - \frac{1}{2} \sum_{t=I+1}^{n} \log F_t - \frac{1}{2\sigma^2} \sum_{t=I+1}^{n} \frac{\nu_t^2}{F_t} - \frac{(n - d)}{2} \log \sigma^2 \]
Solving analytically for the optimum, the maximum likelihood estimate of $\sigma^2$ can be shown to be

\[ \hat{\sigma}^2 = \frac{1}{(n - d)} \sum_{t=I+1}^{n} \frac{\nu_t^2}{F_t} \]
When this expression of $\hat{\sigma}^2$ is substituted back into the likelihood formula, an expression called the profile likelihood ($L_{profile}$) of the data is obtained:

\[ -2 L_{profile}(y_1, \ldots, y_n) = \sum_{t=1}^{I} w_t + \sum_{t=I+1}^{n} \log F_t + (n - d) \log\left( \sum_{t=I+1}^{n} \frac{\nu_t^2}{F_t} \right) \]
In some situations the parameter estimation is done by optimizing the profile likelihood (see the section “Parameter Estimation by Profile Likelihood Optimization” on page 1798 and the PROFILE option in the ESTIMATE statement).

In the remainder of this section the state space formulation of UCMs is further explained by using some particular UCMs as examples. The examples show that the state space formulation of the UCMs depends on the components in the model in a simple fashion; for example, the system matrix T is usually a block diagonal matrix with blocks that correspond to the components in the model. The only exception to this pattern is the UCMs that consist of the lags of the dependent variable. This case is considered at the end of the section. In what follows, $\mathrm{Diag}[a, b, \ldots]$ denotes a diagonal matrix with diagonal entries $[a, b, \ldots]$, and the transpose of a matrix T is denoted as $T'$.
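The likelihood formulas above can be traced on the simplest possible case. The following Python sketch is an illustration only and is not the DKFS implementation in PROC UCM. It assumes a random walk level plus irregular, $y_t = \mu_t + \epsilon_t$ with $\mu_t = \mu_{t-1} + \eta_t$, for which $d = I = 1$ and $w_1 = \log F_{\infty 1} = \log 1 = 0$; the exact diffuse start is replaced by its known result for this model (after the first observation the level forecast is $y_1$ with variance $\sigma^2_\epsilon + \sigma^2_\eta$). The function names and the data are hypothetical, and in the profile computation $F_t$ is taken to be the forecast variance computed with the common scale set to one, which is an interpretation of the reparameterized formula.

```python
import math


def local_level_filter(y, var_eps, var_eta):
    """One-step-ahead residuals nu_t and variances F_t for t = 2..n.

    Assumed model: y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t, diffuse mu_1.
    After the first observation, mu_2 | y_1 ~ N(y_1, var_eps + var_eta).
    """
    a, p = y[0], var_eps + var_eta
    terms = []
    for obs in y[1:]:
        nu = obs - a                 # one-step-ahead residual
        f = p + var_eps              # its variance F_t
        k = p / f                    # Kalman gain
        a = a + k * nu               # forecast of the next level
        p = p * (1.0 - k) + var_eta
        terms.append((nu, f))
    return terms


def diffuse_loglik(y, var_eps, var_eta):
    """Diffuse log likelihood; d = I = 1 and w_1 = 0 for this model."""
    n, d = len(y), 1
    terms = local_level_filter(y, var_eps, var_eta)
    return (-(n - d) / 2.0 * math.log(2.0 * math.pi)
            - 0.5 * sum(math.log(f) + nu * nu / f for nu, f in terms))


def profile_likelihood(y, q):
    """Return (sigma2_hat, -2 * L_profile) with var_eps = sigma2, var_eta = q * sigma2."""
    n, d = len(y), 1
    terms = local_level_filter(y, 1.0, q)        # filter at unit scale
    s = sum(nu * nu / f for nu, f in terms)
    minus2_lprof = sum(math.log(f) for _, f in terms) + (n - d) * math.log(s)
    return s / (n - d), minus2_lprof


y = [4.4, 4.0, 3.5, 3.9, 4.6, 5.1, 4.8, 5.5]     # made-up data
sigma2_hat, minus2_lprof = profile_likelihood(y, q=0.5)
print(sigma2_hat, minus2_lprof)
print(diffuse_loglik(y, sigma2_hat, 0.5 * sigma2_hat))
```

Profiling out the scale leaves a one-dimensional search over the variance ratio q in this toy model; evaluating the full likelihood at the concentrated scale estimate recovers the profile value up to an additive constant.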
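Local Level Model

Recall that the dynamics of a local level model are

\[ \begin{aligned} y_t &= \mu_t + \epsilon_t \\ \mu_t &= \mu_{t-1} + \beta_{t-1} + \eta_t \\ \beta_t &= \beta_{t-1} + \xi_t \end{aligned} \]

Here $y_t$ is the response series and $\epsilon_t$, $\eta_t$, and $\xi_t$ are independent, zero-mean Gaussian disturbance sequences with variances $\sigma^2_\epsilon$, $\sigma^2_\eta$, and $\sigma^2_\xi$, respectively. This model can be formulated as a state space model where the state vector $\alpha_t = [\, \epsilon_t \;\; \mu_t \;\; \beta_t \,]'$ and the state noise $\zeta_t = [\, \epsilon_t \;\; \eta_t \;\; \xi_t \,]'$. Note that the elements of the state vector are precisely the unobserved components in the model. The system matrices T and Z and the noise covariance Q corresponding to this choice of state and state noise vectors can be seen to be time invariant and are given by

\[ Z = [\, 1 \;\; 1 \;\; 0 \,], \qquad T = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \qquad Q = \mathrm{Diag}\left[ \sigma^2_\epsilon, \sigma^2_\eta, \sigma^2_\xi \right] \]

The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}\left[ \sigma^2_\epsilon, 0, 0 \right]$ and $P_\infty = \mathrm{Diag}[0, 1, 1]$. The parameter vector $\theta$ consists of all the disturbance variances; that is, $\theta = (\sigma^2_\epsilon, \sigma^2_\eta, \sigma^2_\xi)$.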
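To see that this choice of Z and T reproduces the component recursions, the following Python sketch runs the state equation and the direct recursions on the same disturbance draws and compares the resulting series. The matrices are taken from the text; the function names, the initial state, and the disturbance standard deviations are illustrative assumptions.

```python
import random

Z = [1.0, 1.0, 0.0]
T = [[0.0, 0.0, 0.0],    # irregular: no carry-over, so eps_{t+1} is pure noise
     [0.0, 1.0, 1.0],    # level: mu_{t+1} = mu_t + beta_t + eta_{t+1}
     [0.0, 0.0, 1.0]]    # slope: beta_{t+1} = beta_t + xi_{t+1}


def simulate_state_space(alpha1, noises):
    """y_t = Z alpha_t and alpha_{t+1} = T alpha_t + zeta_{t+1}; noises[k] is zeta_{k+2}."""
    alpha = list(alpha1)
    ys = [sum(z * a for z, a in zip(Z, alpha))]
    for zeta in noises:
        alpha = [sum(T[i][j] * alpha[j] for j in range(3)) + zeta[i] for i in range(3)]
        ys.append(sum(z * a for z, a in zip(Z, alpha)))
    return ys


def simulate_direct(eps1, mu1, beta1, noises):
    """The same series computed from the component recursions."""
    mu, beta, ys = mu1, beta1, [mu1 + eps1]
    for eps, eta, xi in noises:
        mu, beta = mu + beta + eta, beta + xi
        ys.append(mu + eps)
    return ys


rng = random.Random(1)
noises = [(rng.gauss(0, 1.0), rng.gauss(0, 0.5), rng.gauss(0, 0.1)) for _ in range(20)]
y_ss = simulate_state_space([0.3, 10.0, 2.0], noises)
y_direct = simulate_direct(0.3, 10.0, 2.0, noises)
print(max(abs(a - b) for a, b in zip(y_ss, y_direct)))   # agreement up to rounding
```

The zero first row of T is what makes the irregular component a white noise sequence inside the state vector.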
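Basic Structural Model

The basic structural model (BSM) is obtained by adding a seasonal component, $\gamma_t$, to the local level model. In order to economize on the space, the state space formulation of a BSM with a relatively short season length, season length = 4 (quarterly seasonality), is considered here. The pattern for longer season lengths such as 12 (monthly) and 52 (weekly) is easy to see.

Let us first consider the dummy form of seasonality. In this case the state and state noise vectors are $\alpha_t = [\, \epsilon_t \;\; \mu_t \;\; \beta_t \;\; \gamma_{1,t} \;\; \gamma_{2,t} \;\; \gamma_{3,t} \,]'$ and $\zeta_t = [\, \epsilon_t \;\; \eta_t \;\; \xi_t \;\; \omega_t \;\; 0 \;\; 0 \,]'$, respectively. The first three elements of the state vector are the irregular, level, and slope components, respectively. The remaining elements, $\gamma_{i,t}$, are lagged versions of the seasonal component $\gamma_t$: $\gamma_{1,t}$ corresponds to lag zero (that is, the same as $\gamma_t$), $\gamma_{2,t}$ to lag 1, and $\gamma_{3,t}$ to lag 2. The system matrices are

$$
Z = [\, 1 \;\; 1 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \,], \quad
T = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & -1 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
$$

and $Q = \mathrm{Diag}\left[ \sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2, 0, 0 \right]$. The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1, 1, 1, 1]$.

In the case of the trigonometric type of seasonality, $\alpha_t = [\, \epsilon_t \;\; \mu_t \;\; \beta_t \;\; \gamma_{1,t} \;\; \gamma^*_{1,t} \;\; \gamma_{2,t} \,]'$ and $\zeta_t = [\, \epsilon_t \;\; \eta_t \;\; \xi_t \;\; \omega_{1,t} \;\; \omega^*_{1,t} \;\; \omega_{2,t} \,]'$. The disturbance sequences, $\omega_{j,t}$, $1 \le j \le 2$, and $\omega^*_{1,t}$, are independent, zero-mean, Gaussian sequences with variance $\sigma_\omega^2$. The system matrices are

$$
Z = [\, 1 \;\; 1 \;\; 0 \;\; 1 \;\; 0 \;\; 1 \,], \quad
T = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & \cos\lambda_1 & \sin\lambda_1 & 0 \\
0 & 0 & 0 & -\sin\lambda_1 & \cos\lambda_1 & 0 \\
0 & 0 & 0 & 0 & 0 & \cos\lambda_2
\end{bmatrix}
$$

and $Q = \mathrm{Diag}\left[ \sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2, \sigma_\omega^2, \sigma_\omega^2 \right]$. Here $\lambda_j = (2 \pi j)/4$. The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1, 1, 1, 1]$. The parameter vector in both cases is $\theta = (\sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2)$.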
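The block-diagonal pattern for other season lengths can be sketched in a few lines. The following Python code is a minimal illustration, not the procedure's implementation; the helper names (`block_diag`, `dummy_seasonal_block`, `trig_seasonal_block`) are hypothetical. It assembles the transition matrix from a trend block and a seasonal block for an arbitrary season length `s`, and for `s = 4` it produces the two 6 x 6 matrices described above.

```python
import math


def block_diag(*blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    pos = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                out[pos + i][pos + j] = value
        pos += len(block)
    return out


# Irregular, level, and slope block (same as the trend-only model).
TREND = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0]]


def dummy_seasonal_block(s):
    """(s-1) x (s-1) block: first row sums to minus the stored lags, rest shifts."""
    k = s - 1
    block = [[0.0] * k for _ in range(k)]
    block[0] = [-1.0] * k
    for i in range(1, k):
        block[i][i - 1] = 1.0
    return block


def trig_seasonal_block(s):
    """Rotation blocks at the frequencies 2*pi*j/s, j = 1, ..., floor(s/2)."""
    blocks = []
    for j in range(1, s // 2 + 1):
        lam = 2.0 * math.pi * j / s
        c, sn = math.cos(lam), math.sin(lam)
        if 2 * j == s:
            blocks.append([[c]])            # frequency pi: starred state not needed
        else:
            blocks.append([[c, sn], [-sn, c]])
    return block_diag(*blocks)


T_dummy = block_diag(TREND, dummy_seasonal_block(4))
T_trig = block_diag(TREND, trig_seasonal_block(4))
```

For monthly data, `dummy_seasonal_block(12)` gives an 11 x 11 block, so the state vector grows to 14 elements while the structure stays the same.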
Seasons with Blocked Seasonal Values

Block seasonals are special seasonal components that impose a special block structure on the seasonal effects. Let us consider a BSM with monthly seasonality that has a quarterly block structure; that is, months within the same quarter are assumed to have identical effects except for some random perturbation. Such a seasonal component is a block seasonal with block size $m$ equal to 3 and the number of blocks $k$ equal to 4. The state space structure for such a model with dummy-type seasonality is as follows: The state and state noise vectors are $\alpha_t = [\, \epsilon_t \;\; \mu_t \;\; \beta_t \;\; \gamma_{1,t} \;\; \gamma_{2,t} \;\; \gamma_{3,t} \,]'$ and $\zeta_t = [\, \epsilon_t \;\; \eta_t \;\; \xi_t \;\; \omega_t \;\; 0 \;\; 0 \,]'$, respectively. The first three elements of the state vector are the irregular, level, and slope components, respectively. The remaining elements, $\gamma_{i,t}$, are lagged versions of the seasonal component $\gamma_t$: $\gamma_{1,t}$ corresponds to lag zero (that is, the same as $\gamma_t$), $\gamma_{2,t}$ to lag $m$, and $\gamma_{3,t}$ to lag $2m$. All the system matrices are time invariant, except the matrix $T$. They can be seen to be $Z = [\, 1 \;\; 1 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \,]$, $Q = \mathrm{Diag}\left[ \sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2, 0, 0 \right]$, and

$$
T_t = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & -1 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
$$

when $t$ is a multiple of the block size $m$, and

$$
T_t = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$

otherwise. Note that when $t$ is not a multiple of $m$, the portion of the $T_t$ matrix corresponding to the seasonal is identity. The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1, 1, 1, 1]$.

Similarly, in the case of the trigonometric form of seasonality, $\alpha_t = [\, \epsilon_t \;\; \mu_t \;\; \beta_t \;\; \gamma_{1,t} \;\; \gamma^*_{1,t} \;\; \gamma_{2,t} \,]'$ and $\zeta_t = [\, \epsilon_t \;\; \eta_t \;\; \xi_t \;\; \omega_{1,t} \;\; \omega^*_{1,t} \;\; \omega_{2,t} \,]'$. The disturbance sequences, $\omega_{j,t}$, $1 \le j \le 2$, and $\omega^*_{1,t}$, are independent, zero-mean, Gaussian sequences with variance $\sigma_\omega^2$. $Z = [\, 1 \;\; 1 \;\; 0 \;\; 1 \;\; 0 \;\; 1 \,]$, $Q = \mathrm{Diag}\left[ \sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2, \sigma_\omega^2, \sigma_\omega^2 \right]$, and

$$
T_t = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & \cos\lambda_1 & \sin\lambda_1 & 0 \\
0 & 0 & 0 & -\sin\lambda_1 & \cos\lambda_1 & 0 \\
0 & 0 & 0 & 0 & 0 & \cos\lambda_2
\end{bmatrix}
$$

when $t$ is a multiple of the block size $m$, and

$$
T_t = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$

otherwise. As before, when $t$ is not a multiple of $m$, the portion of the $T_t$ matrix corresponding to the seasonal is identity. Here $\lambda_j = (2 \pi j)/4$. The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1, 1, 1, 1]$. The parameter vector in both cases is $\theta = (\sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2)$.
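The effect of the time-varying seasonal block is easiest to see with the disturbances switched off. The following Python sketch is illustrative only: it assumes the state is advanced as $\alpha_{t+1} = T_t \alpha_t$ with $t$ counted from 1, tracks only the dummy-type seasonal portion of the state, and uses hypothetical function names. The seasonal effect then stays constant within each block of `block_size` periods and changes only at block boundaries.

```python
def seasonal_transition(t, block_size, n_blocks):
    """Seasonal portion of T_t for a dummy-type block seasonal."""
    k = n_blocks - 1
    identity = [[1.0 if j == i else 0.0 for j in range(k)] for i in range(k)]
    if t % block_size != 0:
        return identity                      # inside a block: effects carried over
    rows = [[-1.0] * k]                      # new effect = minus the sum of stored ones
    rows += identity[:k - 1]                 # remaining rows shift the stored effects
    return rows


def noise_free_effects(start, block_size, n_blocks, n_periods):
    """Seasonal effect at t = 1, ..., n_periods when all disturbances are zero."""
    state = list(start)
    effects = []
    for t in range(1, n_periods + 1):
        effects.append(state[0])
        T_t = seasonal_transition(t, block_size, n_blocks)
        state = [sum(a * b for a, b in zip(row, state)) for row in T_t]
    return effects


# Monthly data with quarterly blocks: block size 3, four blocks.
monthly = noise_free_effects([1.0, 2.0, 3.0], block_size=3, n_blocks=4, n_periods=12)
```

With the starting values used here, the twelve monthly effects form four runs of three equal values, and they sum to zero over the year, which is the constraint that the first row of the dummy seasonal block enforces.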
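Cycles and Autoregression

The preceding examples have illustrated how to build a state space model corresponding to a UCM that includes components such as irregular, trend, and seasonal. There you can see that the state vector and the system matrices have a simple block structure with blocks corresponding to the components in the model. Therefore, here only a simple model consisting of a single cycle and an irregular component is considered. The state space form for more complex UCMs consisting of multiple cycles and other components can be easily deduced from this example.

Recall that a stochastic cycle $\psi_t$ with frequency $\lambda$, $0 < \lambda < \pi$, and damping coefficient $\rho$ can be modeled as

$$
\begin{bmatrix} \psi_t \\ \psi^*_t \end{bmatrix}
= \rho \begin{bmatrix} \cos\lambda & \sin\lambda \\ -\sin\lambda & \cos\lambda \end{bmatrix}
\begin{bmatrix} \psi_{t-1} \\ \psi^*_{t-1} \end{bmatrix}
+ \begin{bmatrix} \nu_t \\ \nu^*_t \end{bmatrix}
$$

where $\nu_t$ and $\nu^*_t$ are independent, zero-mean, Gaussian disturbances with variance $\sigma_\nu^2$. In what follows, a state space form for a model consisting of such a stochastic cycle and an irregular component is given.

The state vector $\alpha_t = [\, \epsilon_t \;\; \psi_t \;\; \psi^*_t \,]'$, and the state noise vector $\zeta_t = [\, \epsilon_t \;\; \nu_t \;\; \nu^*_t \,]'$. The system matrices are

$$
Z = [\, 1 \;\; 1 \;\; 0 \,], \quad
T = \begin{bmatrix} 0 & 0 & 0 \\ 0 & \rho\cos\lambda & \rho\sin\lambda \\ 0 & -\rho\sin\lambda & \rho\cos\lambda \end{bmatrix}, \quad
Q = \mathrm{Diag}\left[ \sigma_\epsilon^2, \sigma_\nu^2, \sigma_\nu^2 \right]
$$

The distribution of the initial state vector $\alpha_1$ is proper, with $P_* = \mathrm{Diag}\left[ \sigma_\epsilon^2, \sigma_\psi^2, \sigma_\psi^2 \right]$, where $\sigma_\psi^2 = \sigma_\nu^2 (1 - \rho^2)^{-1}$. The parameter vector $\theta = (\sigma_\epsilon^2, \rho, \lambda, \sigma_\nu^2)$.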
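The expression for the stationary cycle variance can be verified numerically. The sketch below is an illustration under stated assumptions, not the procedure's code: it iterates the covariance recursion $P \leftarrow T P T' + Q$ for the 2 x 2 cycle block only, starting from zero, and the helper names are hypothetical. Because the rotation part of the block is orthogonal, each iteration shrinks the previous covariance by $\rho^2$ and adds $\sigma_\nu^2$, so the limit is $\sigma_\nu^2 / (1 - \rho^2)$ on the diagonal.

```python
import math


def cycle_block(rho, lam):
    """Damped rotation block of the transition matrix for one stochastic cycle."""
    c, s = math.cos(lam), math.sin(lam)
    return [[rho * c, rho * s],
            [-rho * s, rho * c]]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def stationary_cov(block, var_nu, iterations=2000):
    """Fixed point of P = T P T' + var_nu * I, reached by simple iteration."""
    p = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iterations):
        tpt = matmul(matmul(block, p), transpose(block))
        p = [[tpt[i][j] + (var_nu if i == j else 0.0) for j in range(2)]
             for i in range(2)]
    return p


# Example values (assumed for illustration): period of 11 time units, rho = 0.9.
P = stationary_cov(cycle_block(0.9, 2.0 * math.pi / 11.0), var_nu=2.0)
closed_form = 2.0 / (1.0 - 0.9 ** 2)
```

The iteration converges only for $|\rho| < 1$, which is the same condition under which the initial state of the cycle is proper rather than diffuse.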
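An autoregression $r_t$ can be considered as a special case of cycle with frequency $\lambda$ equal to 0 or $\pi$. In this case the equation for $\psi^*_t$ is not needed. Therefore, for a UCM consisting of an autoregressive component and an irregular component, the state space model simplifies to the following form.

The state vector $\alpha_t = [\, \epsilon_t \;\; r_t \,]'$, and the state noise vector $\zeta_t = [\, \epsilon_t \;\; \nu_t \,]'$. The system matrices are

$$
Z = [\, 1 \;\; 1 \,], \quad
T = \begin{bmatrix} 0 & 0 \\ 0 & \rho \end{bmatrix}, \quad \text{and} \quad
Q = \mathrm{Diag}\left[ \sigma_\epsilon^2, \sigma_\nu^2 \right]
$$

The distribution of the initial state vector $\alpha_1$ is proper, with $P_* = \mathrm{Diag}\left[ \sigma_\epsilon^2, \sigma_r^2 \right]$, where $\sigma_r^2 = \sigma_\nu^2 (1 - \rho^2)^{-1}$. The parameter vector $\theta = (\sigma_\epsilon^2, \rho, \sigma_\nu^2)$.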
Incorporating Predictors of Different Kinds

In the UCM procedure, predictors can be incorporated in a UCM in a variety of ways: simple time-invariant linear predictors are specified in the MODEL statement, predictors with time-varying coefficients can be specified in the RANDOMREG statement, and predictors that have a nonlinear relationship with the response variable can be specified in the SPLINEREG statement. As with earlier examples, how to obtain a state space form of a UCM consisting of such a variety of predictors is illustrated using a simple special case. Consider a random walk trend model with predictors $x$, $u_1$, $u_2$, and $v$. Let us assume that $x$ is a simple regressor specified in the MODEL statement, $u_1$ and $u_2$ are random regressors with time-varying regression coefficients that are specified in the same RANDOMREG statement, and $v$ is a nonlinear regressor specified in a SPLINEREG statement. Let us further assume that the spline associated with $v$ has degree one and is based on two internal knots. As explained in the section “SPLINEREG Statement” on page 1778, using $v$ is equivalent to using $(nknots + degree) = (2 + 1) = 3$ derived (random) regressors: say, $s_1$, $s_2$, $s_3$. In all there are $(1 + 2 + 3) = 6$ regressors, the first one being a simple regressor and the others being time-varying coefficient regressors. The time-varying regressors are in two groups, the first consisting of $u_1$ and $u_2$ and the other consisting of $s_1$, $s_2$, and $s_3$. The dynamics of this model are as follows:
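$$
\begin{aligned}
y_t &= \mu_t + \beta x_t + \kappa_{1t} u_{1t} + \kappa_{2t} u_{2t} + \sum_{i=1}^{3} \gamma_{it} s_{it} + \epsilon_t \\
\mu_t &= \mu_{t-1} + \eta_t \\
\kappa_{1t} &= \kappa_{1(t-1)} + \xi_{1t} \\
\kappa_{2t} &= \kappa_{2(t-1)} + \xi_{2t} \\
\gamma_{1t} &= \gamma_{1(t-1)} + \zeta_{1t} \\
\gamma_{2t} &= \gamma_{2(t-1)} + \zeta_{2t} \\
\gamma_{3t} &= \gamma_{3(t-1)} + \zeta_{3t}
\end{aligned}
$$

All the disturbances $\epsilon_t$, $\eta_t$, $\xi_{1t}$, $\xi_{2t}$, $\zeta_{1t}$, $\zeta_{2t}$, and $\zeta_{3t}$ are independent, zero-mean, Gaussian variables, where $\xi_{1t}$, $\xi_{2t}$ share a common variance parameter $\sigma_\xi^2$ and $\zeta_{1t}$, $\zeta_{2t}$, $\zeta_{3t}$ share a common variance $\sigma_\zeta^2$. These dynamics can be captured in the state space form by taking state $\alpha_t = [\, \epsilon_t \;\; \mu_t \;\; \beta \;\; \kappa_{1t} \;\; \kappa_{2t} \;\; \gamma_{1t} \;\; \gamma_{2t} \;\; \gamma_{3t} \,]'$, state disturbance $\zeta_t = [\, \epsilon_t \;\; \eta_t \;\; 0 \;\; \xi_{1t} \;\; \xi_{2t} \;\; \zeta_{1t} \;\; \zeta_{2t} \;\; \zeta_{3t} \,]'$, and the system matrices

$$
\begin{aligned}
Z_t &= [\, 1 \;\; 1 \;\; x_t \;\; u_{1t} \;\; u_{2t} \;\; s_{1t} \;\; s_{2t} \;\; s_{3t} \,] \\
T &= \mathrm{Diag}[0, 1, 1, 1, 1, 1, 1, 1] \\
Q &= \mathrm{Diag}\left[ \sigma_\epsilon^2, \sigma_\eta^2, 0, \sigma_\xi^2, \sigma_\xi^2, \sigma_\zeta^2, \sigma_\zeta^2, \sigma_\zeta^2 \right]
\end{aligned}
$$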
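The system matrices for this example can be assembled as follows. This Python sketch is illustrative only; the function names and the numeric values are hypothetical. It shows that the observation row changes with the regressor values at each time point while $T$ and $Q$ stay fixed, and that the coefficient of the simple regressor is carried in the state with zero disturbance variance.

```python
def diag(values):
    """Diagonal matrix stored as a list of rows."""
    n = len(values)
    return [[float(values[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]


def observation_row(x_t, u_t, s_t):
    """Z_t for the state [irregular, level, beta, kappa_1, kappa_2, gamma_1..gamma_3]."""
    return [1.0, 1.0, float(x_t)] + [float(u) for u in u_t] + [float(s) for s in s_t]


def observe(z_row, alpha):
    """y_t = Z_t alpha_t."""
    return sum(z * a for z, a in zip(z_row, alpha))


# Time-invariant pieces of the model.
T = diag([0, 1, 1, 1, 1, 1, 1, 1])


def state_noise_cov(var_eps, var_eta, var_xi, var_zeta):
    """Q: the third entry is zero because beta does not change over time."""
    return diag([var_eps, var_eta, 0.0,
                 var_xi, var_xi,
                 var_zeta, var_zeta, var_zeta])


# Hypothetical state and regressor values at one time point.
alpha = [0.5, 10.0, 2.0, 1.0, -1.0, 0.1, 0.2, 0.3]
z_row = observation_row(x_t=3.0, u_t=(4.0, 5.0), s_t=(1.0, 0.0, 2.0))
y = observe(z_row, alpha)
```

Here `y` equals the irregular plus the level plus the six regression contributions, which is the observation equation of the model written as a single inner product.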
Note that the regression coefficients are elements of the state vector and that the system vector $Z_t$ is not time invariant. The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0, 0, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1, 1, 1, 1, 1, 1]$. The parameters of this model are the disturbance variances, $\sigma_\epsilon^2$, $\sigma_\eta^2$, $\sigma_\xi^2$, and $\sigma_\zeta^2$, which get estimated by maximizing the likelihood. The regression coefficients, time-invariant $\beta$ and time-varying $\kappa_{1t}$, $\kappa_{2t}$, $\gamma_{1t}$, $\gamma_{2t}$, and $\gamma_{3t}$, get implicitly estimated during the state estimation (smoothing).

Reporting Parameter Estimates for Random Regressors

If the random walk disturbance variance associated with a random regressor is held fixed at zero, then its coefficient is no longer time-varying. In the UCM procedure the random regressor parameter estimates are reported differently if the random walk disturbance variance associated with a random regressor is held fixed at zero. The following points explain how the parameter estimates are reported in the parameter estimates table and in the OUTEST= data set.

• If the random walk disturbance variance associated with a random regressor is not held fixed, then its estimate is reported in the parameter estimates table and in the OUTEST= data set. If more than one random regressor is specified in a RANDOMREG statement, then the first regressor in the list is used as a representative of the list while reporting the corresponding common variance parameter estimate.
• If the random walk disturbance variance is held fixed at zero, then the parameter estimates table and the OUTEST= data set contain the corresponding regression parameter estimate rather than the variance parameter estimate.

Similar considerations apply in the case of the derived random regressors associated with a spline-regressor.
ARMA Irregular Component (Experimental)

The state space form for the irregular component that follows an ARMA(p,q)(P,Q)$_s$ model is described in this section. The notation for ARMA models is explained in the IRREGULAR statement. A number of alternate state space forms are possible in this case; the one given here is based on Jones (1980). With slight abuse of notation, let $p = p + sP$ denote the effective autoregressive order and $q = q + sQ$ denote the effective moving-average order of the model. Similarly, let $\phi$ be the effective autoregressive polynomial and $\theta$ be the effective moving-average polynomial in the backshift operator with coefficients $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$, obtained by multiplying the respective nonseasonal and seasonal factors. Then, a random sequence $\epsilon_t$ that follows an ARMA(p,q)(P,Q)$_s$ model with a white noise sequence $a_t$ has a state space form with state vector of size $m = \max(p, q + 1)$. The system matrices, which are time invariant, are as follows: $Z = [\, 1 \;\; 0 \;\; \ldots \;\; 0 \,]$. The state transition matrix $T$, in a blocked form, is given by

$$
T = \begin{bmatrix} 0 & I_{m-1} \\ \phi_m & \phi_{m-1} \;\; \ldots \;\; \phi_1 \end{bmatrix}
$$

where $\phi_i = 0$ if $i > p$ and $I_{m-1}$ is an $(m-1)$ dimensional identity matrix. The covariance of the state disturbance matrix $Q = \sigma^2 \psi \psi'$, where $\sigma^2$ is the variance of the white noise sequence $a_t$ and the vector $\psi = [\, \psi_0 \;\; \ldots \;\; \psi_{m-1} \,]'$ contains the first $m$ values of the impulse response function; that is, the first $m$ coefficients in the expansion of the ratio $\theta / \phi$. Since $\epsilon_t$ is a stationary sequence, the initial state is nondiffuse and $P_\infty = 0$. The description of $P_*$, the covariance matrix of the initial state, is a little involved; the details are given in Jones (1980).
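The construction of $T$ and $Q$ from the effective polynomials can be sketched as follows. This Python code is an illustration, not the procedure's implementation, and it rests on an assumed sign convention that is not stated in this section: $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$. Under that convention, matching powers of $B$ in $\theta(B) = \phi(B)\,\psi(B)$ gives $\psi_0 = 1$ and $\psi_j = -\theta_j + \sum_{i=1}^{\min(j,p)} \phi_i \psi_{j-i}$. The function names are hypothetical, and the inputs are the already-multiplied effective coefficients.

```python
def impulse_response(ar, ma, m):
    """First m coefficients psi_0, ..., psi_{m-1} of theta(B) / phi(B).

    Assumed convention: phi(B) = 1 - ar[0] B - ..., theta(B) = 1 - ma[0] B - ...
    """
    psi = []
    for j in range(m):
        if j == 0:
            value = 1.0
        else:
            value = -ma[j - 1] if j <= len(ma) else 0.0
            value += sum(ar[i - 1] * psi[j - i]
                         for i in range(1, min(j, len(ar)) + 1))
        psi.append(value)
    return psi


def arma_state_space(ar, ma, var_a=1.0):
    """Z, T, Q for the ARMA state space form with state size m = max(p, q + 1)."""
    p, q = len(ar), len(ma)
    m = max(p, q + 1)
    phi = list(ar) + [0.0] * (m - p)               # phi_i = 0 for i > p
    # First m-1 rows: [0 | I_{m-1}]; last row: phi_m, ..., phi_1.
    T = [[1.0 if j == i + 1 else 0.0 for j in range(m)] for i in range(m - 1)]
    T.append([phi[m - 1 - j] for j in range(m)])
    psi = impulse_response(ar, ma, m)
    Q = [[var_a * psi[i] * psi[j] for j in range(m)] for i in range(m)]
    Z = [1.0] + [0.0] * (m - 1)
    return Z, T, Q


# ARMA(1,1) with phi_1 = 0.5 and theta_1 = 0.3 (illustrative values).
Z, T, Q = arma_state_space([0.5], [0.3])
```

For this ARMA(1,1) example the state has size 2, the last row of `T` is `[0, 0.5]`, and the impulse response starts with 1 and 0.2. If the moving-average polynomial is instead written with plus signs, the `-ma[j - 1]` term changes sign.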
Models with Dependent Lags

The state space form of a UCM consisting of the lags of the dependent variable is quite different from the state space forms considered so far. Let us consider an example to illustrate this situation. Consider a model that has random walk trend, two simple time-invariant regressors, and that also includes a few (say, $k$) lags of the dependent variable. That is,

$$
\begin{aligned}
y_t &= \sum_{i=1}^{k} \phi_i y_{t-i} + \mu_t + \beta_1 x_{1t} + \beta_2 x_{2t} + \epsilon_t \\
\mu_t &= \mu_{t-1} + \eta_t
\end{aligned}
$$

The state space form of this augmented model can be described in terms of the state space form of a model that has random walk trend with two simple time-invariant regressors. A superscript dagger ($\dagger$) has been added to distinguish the augmented model state space entities from the corresponding entities of the state space form of the random walk with predictors model. With this notation, the state vector of the augmented model $\alpha^\dagger_t = [\, \alpha_t' \;\; y_t \;\; y_{t-1} \;\; \ldots \;\; y_{t-k+1} \,]'$ and the new state noise vector $\zeta^\dagger_t = [\, \zeta_t' \;\; u_t \;\; 0 \;\; \ldots \;\; 0 \,]'$, where $u_t$ is the matrix product $Z_t \zeta_t$. Note that the length of the new state vector is $k + \mathrm{length}(\alpha_t) = k + 4$. The new system matrices, in block form, are

$$
Z^\dagger_t = [\, 0 \;\; 0 \;\; 0 \;\; 0 \;\; 1 \;\; \ldots \;\; 0 \,], \quad
T^\dagger_t = \begin{bmatrix}
T_t & 0 \;\; \ldots \;\; 0 \\
Z_{t+1} T_t & \phi_1 \;\; \ldots \;\; \phi_k \\
0 & I_{k-1,k-1} \;\; 0
\end{bmatrix}
$$

where $I_{k-1,k-1}$ is the $k-1$ dimensional identity matrix and

$$
Q^\dagger_t = \begin{bmatrix}
Q_t & Q_t Z_t' & 0 \\
Z_t Q_t & Z_t Q_t Z_t' & 0 \\
0 & 0 & 0
\end{bmatrix}
$$

Note that the $T$ and $Q$ matrices of the random walk with predictors model are time invariant, and in the expressions above their time indices are kept because they illustrate the pattern for more general models. The initial state vector is diffuse, with

$$
P^\dagger_* = \begin{bmatrix} P_* & 0 \\ 0 & 0 \end{bmatrix}, \quad
P^\dagger_\infty = \begin{bmatrix} P_\infty & 0 \\ 0 & I_{k,k} \end{bmatrix}
$$

The parameters of this model are the disturbance variances $\sigma_\epsilon^2$ and $\sigma_\eta^2$, the lag coefficients $\phi_1, \phi_2, \ldots, \phi_k$, and the regression coefficients $\beta_1$ and $\beta_2$. As before, the regression coefficients get estimated during the state smoothing, and the other parameters are estimated by maximizing the likelihood.
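The block form of the augmented transition matrix can be made concrete with a short sketch. The Python code below is illustrative and uses a hypothetical function name; it assumes the base model state is ordered as irregular, level, and the two regression coefficients, so that the base transition matrix is $\mathrm{Diag}[0, 1, 1, 1]$ and the base observation row at time $t+1$ is $[\, 1 \;\; 1 \;\; x_{1,t+1} \;\; x_{2,t+1} \,]$.

```python
def augment_transition(T, z_next, lag_coefs):
    """Augmented transition matrix for a model with k dependent lags.

    T          base transition matrix T_t (n x n, list of rows)
    z_next     base observation row Z_{t+1} (length n)
    lag_coefs  [phi_1, ..., phi_k]
    """
    n, k = len(T), len(lag_coefs)
    size = n + k
    out = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            out[i][j] = T[i][j]                        # top-left block: T_t
    for j in range(n):                                 # row for y_{t+1}: Z_{t+1} T_t
        out[n][j] = sum(z_next[i] * T[i][j] for i in range(n))
    for j in range(k):                                 # ... followed by phi_1..phi_k
        out[n][n + j] = lag_coefs[j]
    for i in range(1, k):                              # identity block shifts the lags
        out[n + i][n + i - 1] = 1.0
    return out


# Base model: random walk trend with two time-invariant regressors.
T_base = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
# Illustrative regressor values at time t+1 and two lag coefficients.
T_aug = augment_transition(T_base, z_next=[1.0, 1.0, 2.0, 3.0], lag_coefs=[0.6, 0.3])
```

The row that produces $y_{t+1}$ combines the propagated base state with the stored lags, which is why the base irregular column is zero in that row: the new irregular enters through the state noise term $u_t$ rather than through the transition.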
Outlier Detection

In time series analysis it is often useful to detect changes over time in the characteristics of the response series. In the UCM procedure you can search for two types of changes, additive outliers (AO) and level shifts (LS). An additive outlier is an unusual value in the series, the cause of which might be a data recording error or a temporary shock to the series generation process. A level shift represents a permanent shift, either up or down, in the level of the series. You can control different aspects of the outlier search, such as the significance level of the reported outliers, by choosing different options in the OUTLIER statement. The search for AOs is done by default, whereas the CHECKBREAK option in the LEVEL statement must be used to turn on the search for LSs.

The outlier detection process implemented in the UCM procedure is based on de Jong and Penzer (1998). In this approach the fitted model is taken to be the null model, and the series values and level shifts that are not adequately accounted for by the null model are flagged as outliers. The unusualness of a response series value at a particular time point $t_0$, with respect to the fitted model, can be judged by estimating its value based on the rest of the data (that is, the series obtained by deleting the series value at $t_0$) and comparing the estimated value to the observed value. If the difference between the estimated and observed values is statistically significant, then such a value can be regarded as an AO. Note that this difference between the estimated and observed values is also the regression coefficient of a dummy regressor that takes the value 1.0 at $t_0$ and is 0.0 elsewhere, assuming such a regressor is added to the null model. In this way the series value at $t_0$ is regarded as an AO if the regression coefficient of this dummy regressor is significant. Similarly, you can say that a level shift has occurred at a time point $t_0$ if the regression coefficient of a regressor, which is 0.0 before $t_0$ and 1.0 at $t_0$ and thereafter, is statistically significant. De Jong and Penzer (1998) provide an efficient way to compute such AO and LS regression coefficients and their standard errors at all time points in the series. The outlier summary table, which is produced by default, simply lists the most statistically significant candidates among these.
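The two dummy regressors described above are simple to write down. The following Python sketch is only an illustration of their definitions, with hypothetical function names and zero-based time indices; it does not reproduce the efficient de Jong and Penzer (1998) computation, which avoids refitting the model once per candidate time point.

```python
def additive_outlier_dummy(n, t0):
    """Regressor that is 1.0 at time t0 and 0.0 elsewhere (zero-based index)."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]


def level_shift_dummy(n, t0):
    """Regressor that is 0.0 before t0 and 1.0 at t0 and thereafter."""
    return [1.0 if t >= t0 else 0.0 for t in range(n)]


ao = additive_outlier_dummy(6, 2)
ls = level_shift_dummy(6, 2)
```

The level shift regressor is the running sum of the additive outlier regressor at the same time point, which reflects the difference between a one-time shock and a permanent change in level.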
Missing Values

Embedded missing values in the dependent variable usually cause no problems in UCM modeling. However, no embedded missing values are allowed in the predictor variables. Certain patterns of missing values in the dependent variable can lead to failure of the initialization step of the diffuse Kalman filtering for some models. For example, if in a monthly series all values are missing for a certain month (say, May), then a BSM with monthly seasonality leads to such a situation. However, in this case the initialization step can complete successfully for a nonseasonal model such as a local linear model.
Parameter Estimation

The parameter vector in a UCM consists of the variances of the disturbance terms of the unobserved components, the damping coefficients and frequencies in the cycles, the damping coefficient in the autoregression, the lag coefficients of the dependent lags, and the regression coefficients in the regression terms. The regression coefficients are always part of the state vector and are estimated by state smoothing. The remaining parameters are estimated by maximizing either the full diffuse likelihood or the nondiffuse likelihood. The decision to use the full diffuse likelihood or the nondiffuse likelihood depends on the presence or absence of the dependent lag coefficients in the parameter vector. If the parameter vector does not contain any dependent lag coefficients, then the full diffuse likelihood is used. If, on the other hand, the parameter vector does contain some dependent lag coefficients, then the parameters are estimated by maximizing the nondiffuse likelihood. The optimization of the full diffuse likelihood is often unstable when the parameter vector contains dependent lag coefficients. In this sense, when the parameter vector contains dependent lag coefficients, the parameter estimates are not true maximum likelihood estimates.

The optimization of the likelihood, either full or nondiffuse, is carried out using one of several nonlinear optimization algorithms. The user can control many aspects of the optimization process by using the NLOPTIONS statement and by providing the starting values of the parameters while specifying the corresponding components. However, in most cases the default settings work quite well. The optimization process is not guaranteed to converge to a maximum likelihood estimate. In most cases the difficulties in parameter estimation are associated with the specification of a model that is not appropriate for the series being modeled.
Parameter Estimation by Profile Likelihood Optimization

If a disturbance variance, such as the disturbance variance of the irregular component, is a part of the UCM and is a free parameter, then it can be profiled out of the likelihood. This means solving analytically for its optimum and plugging this expression back into the likelihood formula, giving rise to the so-called profile likelihood. The expression of the profile likelihood and the MLE of the profiled variance are given earlier in the section “The UCMs as State Space Models” on page 1787, where the computation of the likelihood of the state space model is also discussed.

In some situations the optimization of the profile likelihood can be more efficient because the number of parameters to optimize is reduced by one; however, for a variety of reasons such gains might not always be observed. Moreover, in theory the estimates obtained by optimizing the profile likelihood and the usual likelihood should be the same, but in practice this might not hold because of numerical rounding and other conditions.

In the UCM procedure, by default the usual likelihood is optimized if any of the disturbance variance parameters is held fixed to a nonzero value by using the NOEST option in the corresponding component statement. In other cases the decision whether to optimize the profile likelihood or the usual likelihood is based on several factors that are difficult to document. You can choose which likelihood to optimize during parameter estimation by specifying the PROFILE option for the profile likelihood optimization or the NOPROFILE option for the usual likelihood optimization. In the presence of the PROFILE option, the disturbance variance to profile is checked in a specific order, so that if the irregular component disturbance variance is free then it is always chosen. The situation in other cases is more complicated.
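The idea of profiling out a variance can be illustrated with the textbook Gaussian prediction-error form of the likelihood. The sketch below is a generic illustration under stated assumptions and does not reproduce the expressions in the section referenced above: it ignores the diffuse part of the likelihood, and it takes as given hypothetical one-step-ahead prediction errors `v` and their variances `f` computed with the profiled variance set to 1, so that the actual prediction-error variances are `var * f`.

```python
import math


def loglik(var, v, f):
    """Gaussian log likelihood when the prediction-error variances are var * f_t."""
    n = len(v)
    return (-0.5 * n * math.log(2.0 * math.pi * var)
            - 0.5 * sum(math.log(ft) for ft in f)
            - 0.5 * sum(vt * vt / ft for vt, ft in zip(v, f)) / var)


def profiled_variance(v, f):
    """Closed-form maximizer of loglik over var, for fixed v and f."""
    return sum(vt * vt / ft for vt, ft in zip(v, f)) / len(v)


def profile_loglik(v, f):
    """Log likelihood with the variance replaced by its closed-form optimum."""
    n = len(v)
    var_hat = profiled_variance(v, f)
    return (-0.5 * n * (math.log(2.0 * math.pi * var_hat) + 1.0)
            - 0.5 * sum(math.log(ft) for ft in f))


# Hypothetical prediction errors and scaled variances.
v = [1.0, -2.0, 0.5, 1.5]
f = [2.0, 1.5, 1.2, 1.1]
var_hat = profiled_variance(v, f)
```

Because `var_hat` is available in closed form, a numerical optimizer only has to search over the remaining parameters (which determine `v` and `f`), which is the reduction by one parameter mentioned above.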
Profiling in the Presence of Fixed Variance Parameters
Note that when the parameter estimation is done by optimizing the profile likelihood, the interpretation of the variance parameters that are held fixed to nonzero values changes. In the presence of the PROFILE option, the disturbance variances that are held at a fixed value by using the NOEST option in their respective component statements are interpreted as being restricted to be that fixed multiple of the profiled variance rather than being fixed at that nominal value. That is, implicitly, the parameter estimation is done under the restriction of holding the disturbance variance ratio fixed at a given value rather than the disturbance variance itself. See Example 29.5 for an example of this type of restriction to obtain a UC model that is equivalent to the famous Hodrick-Prescott filter.
t values

The t values reported in the table of parameter estimates are approximations whose accuracy depends on the validity of the model, the nature of the model, and the length of the observed series. The distributional properties of the maximum likelihood estimates of general unobserved components models have not been explored fully; therefore the probability values that correspond to a t distribution should be interpreted carefully, as they can be misleading. This is particularly true if the parameters in question are close to the boundary of the parameter space. The two sources by Harvey (1989, 2001) are good references for information about this topic. For some parameters, such as the cycle period, the reported t values are uninformative because comparison of the estimated parameter with zero is never needed. In such cases the t values and the corresponding probability values should be ignored.
Computational Issues

Convergence Problems

As explained in the section “Parameter Estimation” on page 1797, the model parameters are estimated by nonlinear optimization of the likelihood. This process is not guaranteed to succeed. For some data sets, the optimization algorithm can fail to converge. Nonconvergence can result from a number of causes, including flat or ridged likelihood surfaces and ill-conditioned data. It is also possible for the algorithm to converge to a point that is not the global optimum of the likelihood.

If you experience convergence problems, the following points might be helpful:

• Data that are extremely large or extremely small can adversely affect results because of the internal tolerances used during the filtering steps of the likelihood calculation. Rescaling the data can improve stability.
• Examine your model for redundancies in the included components and regressors. If some of the included components or regressors are nearly collinear to each other, then the optimization process can become unstable.
• Experimenting with different options offered by the NLOPTIONS statement can help.
• Lack of convergence can indicate model misspecification or a violation of the normality assumption.
Computer Resource Requirements

The computing resources required for the UCM procedure depend on several factors. The memory requirement for the procedure is largely dependent on the number of observations to be processed and the size of the state vector underlying the specified model. If $n$ denotes the sample size and $m$ denotes the size of the state vector, the memory requirement for the smoothing stage of the Kalman filter is of the order of $6 \times 8 \times n \times m^2$ bytes, ignoring the lower-order terms. If the smoothed component estimates are not needed then the memory requirement is of the order of $6 \times 8 \times (m^2 + n)$ bytes. Besides $m$ and $n$, the computing time for the parameter estimation depends on the type of components included in the model. For example, the parameter estimation is usually faster if the model parameter vector consists only of disturbance variances, because in this case there is an efficient way to compute the likelihood gradient.
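The two order-of-magnitude formulas above are easy to evaluate for a planned model. The following Python sketch simply codes them; the function names are hypothetical, and the results are rough leading-order estimates, not measurements of actual memory use.

```python
def smoothing_memory_bytes(n, m):
    """Leading-order memory estimate when smoothed estimates are needed."""
    return 6 * 8 * n * m * m


def filtering_memory_bytes(n, m):
    """Leading-order memory estimate when smoothed estimates are not needed."""
    return 6 * 8 * (m * m + n)


# Illustrative sizes: 176 observations and a state vector of size 4.
with_smoothing = smoothing_memory_bytes(176, 4)
without_smoothing = filtering_memory_bytes(176, 4)
```

For these sizes the estimates are about 135 KB with smoothing and about 9 KB without. The smoothing estimate grows with the product of the series length and the square of the state size, so long series with large seasonal states dominate the requirement.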
Displayed Output

The default printed output produced by the UCM procedure is described in the following list:

• brief information about the input data set, including the data set name and label, and the name of the ID variable specified in the ID statement
• summary statistics for the data in the estimation and forecast spans, including the names of the variables in the model, their categorization as dependent or predictor, the index of the beginning and ending observations in the spans, the total number of observations and the number of missing observations, the smallest and largest measurements, and the mean and standard deviation
• information about the model parameters at the start of the model-fitting stage, including the fixed parameters in the model and the initial estimates of the free parameters in the model
• convergence status of the likelihood optimization process if any parameter estimation is done
• estimates of the free parameters at the end of the model-fitting stage, including the parameter estimates, their approximate standard errors, t statistics, and the approximate p-value
• the likelihood-based goodness-of-fit statistics, including the full likelihood, the portion of the likelihood corresponding to the diffuse initialization, the sum of squares of residuals normalized by their standard errors, and the information criteria: AIC, AICC, HQIC, BIC, and CAIC
• the fit statistics that are based on the raw residuals (observed minus predicted), including the mean squared error (MSE), the root mean squared error (RMSE), the mean absolute percentage error (MAPE), the maximum percentage error (MAXPE), the R square, the adjusted R square, the random walk R square, and Amemiya’s R square
• the significance analysis of the components included in the model that is based on the estimation span
• brief information about the components included in the model
• additive outliers in the series, if any are detected
• the multistep series forecasts
• post-sample-prediction analysis table that compares the multistep forecasts with the observed series values, if the BACK= option is used in the FORECAST statement
Statistical Graphics

This section provides information about the basic ODS statistical graphics produced by the UCM procedure. To request graphics with PROC UCM, you must first enable ODS Graphics by specifying the ODS GRAPHICS ON; statement. See Chapter 21, “Statistical Graphics Using ODS” (SAS/STAT User’s Guide), for more information.

You can obtain most plots relevant to the specified model by using the global PLOTS= option in the PROC UCM statement. The plot of series forecasts in the forecast horizon is produced by default. You can further control the production of individual plots by using the PLOT= options in the different statements. The main types of plots available are as follows:

• Time series plots of the component estimates, either filtered or smoothed, can be requested by using the PLOT= option in the respective component statements. For example, the use of the PLOT=SMOOTH option in a CYCLE statement produces a plot of the smoothed estimate of that cycle.
• Residual plots for model diagnostics can be obtained by using the PLOT= option in the ESTIMATE statement.
• Plots of series forecasts and model decompositions can be obtained by using the PLOT= option in the FORECAST statement.

The following example is a simple illustration of the available plot options.
Analysis of Sunspot Data: Illustration of ODS Graphics

In this example a well-known series, Wolfer’s sunspot data (Anderson 1971), is considered. The data consist of yearly sunspot numbers recorded from 1749 to 1924. These sunspot numbers are known to have a cyclical pattern with a period of about eleven years. The following DATA step creates the input data set:

   data sunspot;
      input year wolfer @@;
      year = mdy(1,1, year);
      format year year4.;
   datalines;
   1749 809 1750 834 1751 477 1752 478 1753 307 1754 122 1755  96
   1756 102 1757 324 1758 476 1759 540 1760 629 1761 859 1762 612

   ... more lines ...
The following statements specify a UCM that includes a cycle component and a random walk trend component:

   ods graphics on;

   proc ucm data=sunspot;
      id year interval=year;
      model wolfer;
      irregular;
      level;
      cycle plot=(filter smooth);
      estimate back=24 plot=(loess panel cusum wn);
      forecast back=24 lead=24 plot=(forecasts decomp);
   run;
The following subsections explain the graphics produced by the above statements.
Component Plots
The plots in Figure 29.8 and Figure 29.9, produced by specifying PLOT=(FILTER SMOOTH) in the CYCLE statement, show the filtered and smoothed estimates, respectively, of the cycle component in the model. Note that the smoothed estimate appears smoother than the filtered estimate. This is generally the case because the filtered estimate of a component at time $t$ is based on the observations prior to time $t$; that is, it uses measurements from the first observation up to the $(t-1)$th observation. On the other hand, the corresponding smoothed estimate uses all the available observations; that is, all the measurements from the first observation to the last. This makes the smoothed estimate of the component more precise than the filtered estimate for the time points within the historical period. In the forecast horizon, both filtered and smoothed estimates are identical, being based on the same set of observations.

Figure 29.8 Sunspots Series: Filtered Cycle

Figure 29.9 Sunspots Series: Smoothed Cycle
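The difference in precision between the filtered and smoothed estimates can be demonstrated on the simplest possible model. The following Python sketch is an illustration only, not the procedure's algorithm: it uses a scalar random walk plus noise model, the standard Kalman filter and backward state smoothing recursions, and a large initial variance `p1` as an approximation to a diffuse prior (the procedure itself uses an exact diffuse initialization). The function name and default values are hypothetical. "Filtered" here follows the usage above: the estimate at time $t$ uses observations up to $t-1$.

```python
def local_level_filter_smoother(y, var_eps, var_eta, a1=0.0, p1=1e6):
    """Filtered (one-step-ahead) and smoothed level estimates with variances."""
    n = len(y)
    a, p = a1, p1
    pred_mean, pred_var, errors, error_var = [], [], [], []
    for obs in y:                                  # forward pass: Kalman filter
        pred_mean.append(a)
        pred_var.append(p)
        errors.append(obs - a)                     # one-step-ahead prediction error
        error_var.append(p + var_eps)              # its variance
        gain = p / error_var[-1]
        a = a + gain * errors[-1]
        p = p * (1.0 - gain) + var_eta
    r, big_n = 0.0, 0.0
    smooth_mean, smooth_var = [0.0] * n, [0.0] * n
    for t in range(n - 1, -1, -1):                 # backward pass: state smoother
        shrink = 1.0 - pred_var[t] / error_var[t]
        r = errors[t] / error_var[t] + shrink * r
        big_n = 1.0 / error_var[t] + shrink * shrink * big_n
        smooth_mean[t] = pred_mean[t] + pred_var[t] * r
        smooth_var[t] = pred_var[t] - pred_var[t] ** 2 * big_n
    return pred_mean, pred_var, smooth_mean, smooth_var


series = [4.4, 4.0, 3.5, 3.9, 4.6, 5.1, 4.8, 4.2]
fm, fv, sm, sv = local_level_filter_smoother(series, var_eps=1.0, var_eta=0.5)
```

In the backward pass the quantity subtracted from the filtered variance is never negative, so the smoothed variance cannot exceed the filtered variance at any time point, which is the precision statement made above.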
Residual Diagnostics
If the fitted model is appropriate for the given data, then the corresponding one-step-ahead residuals should be approximately white—that is, uncorrelated—and approximately normal. Moreover, the residuals should not display any discernible pattern. You can detect departures from these conditions graphically. Different residual diagnostic plots can be requested by using the PLOT= option in the ESTIMATE statement. The normality can be checked by examining the histogram and the normal quantile plot of residuals. The whiteness can be checked by examining the ACF and PACF plots that show the sample autocorrelation and sample partial-autocorrelation at different lags. The diagnostic panel shown in Figure 29.10, produced by specifying PLOT=PANEL, contains these four plots.
Figure 29.10 Sunspots Series: Residual Diagnostics
The residual histogram and Q-Q plot show no serious violation of normality. The histogram appears reasonably symmetric and follows the overlaid normal density curve reasonably closely. Similarly in the Q-Q plot the residuals follow the reference line fairly closely. The ACF and PACF plots also do not exhibit any violation of the whiteness assumption; the correlations at all nonzero lags seem to be insignificant.

The residual whiteness can also be formally tested by using the Ljung-Box portmanteau test. The plot in Figure 29.11, produced by specifying PLOT=WN, shows the p-values of the Ljung-Box test statistics at different lags. In these plots the p-values for the first few lags, equal to the number of estimated parameters in the model, are not shown because they are always missing. This portion of the plot is shaded blue to indicate this fact. In the case of this model, five parameters are estimated so the p-values for the first five lags are not shown. The p-values are displayed on a log scale in such a way that higher bars imply more extreme test statistics. In this plot some early p-values appear extreme. However, these p-values are based on large sample theory, which suggests that these statistics should be examined for lags larger than the square root of sample size. In this example it means that the p-values for the first $\sqrt{154} \approx 12$ lags can be ignored. With this consideration, the plot shows no violation of whiteness since the p-values after the 12th lag do not appear extreme.
Figure 29.11 Sunspots Series: Ljung-Box Portmanteau Test
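For readers who want to see what the portmanteau statistic measures, the following Python sketch computes the standard Ljung-Box statistic $Q(h) = n(n+2) \sum_{k=1}^{h} r_k^2 / (n-k)$ from a residual series, where $r_k$ is the lag-$k$ sample autocorrelation. It is a generic illustration with hypothetical function names: it does not compute p-values or apply any degrees-of-freedom adjustment for estimated parameters, and it is not claimed to match the procedure's output. The last helper encodes the rule of thumb from the text about the square root of the sample size.

```python
import math


def autocorrelation(e, k):
    """Lag-k sample autocorrelation of the series e."""
    n = len(e)
    mean = sum(e) / n
    denom = sum((x - mean) ** 2 for x in e)
    num = sum((e[t] - mean) * (e[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box(e, h):
    """Ljung-Box statistic that pools the first h autocorrelations."""
    n = len(e)
    return n * (n + 2) * sum(autocorrelation(e, k) ** 2 / (n - k)
                             for k in range(1, h + 1))


def first_lag_to_inspect(n):
    """Smallest lag strictly larger than the integer part of sqrt(n)."""
    return math.isqrt(n) + 1


# A strongly alternating series has a large negative lag-1 autocorrelation.
alternating = [1.0, -1.0] * 10
q1 = ljung_box(alternating, 1)
```

For a sample size of 154, `first_lag_to_inspect` returns 13, in line with the statement that the first 12 lags can be ignored.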
The plot in Figure 29.12, produced by specifying PLOT=LOESS, shows the residuals plotted against time with an overlaid LOESS curve. This plot is useful for checking whether any discernible pattern remains in the residuals. Here again, no significant pattern appears to be present.
Figure 29.12 Sunspots Series: Residual Loess Plot
The plot in Figure 29.13, produced by specifying PLOT=CUSUM, shows the cumulative residuals plotted against time. This plot is useful for checking structural breaks. Here, there appears to be no evidence of structural break since the cumulative residuals remain within the confidence band throughout the sample period. Similarly you can request a plot of the squared cumulative residuals by specifying PLOT=CUSUMSQ.
Figure 29.13 Sunspots Series: CUSUM Plot
Brockwell and Davis (1991) can be consulted for additional information on diagnosing residuals. For more information on CUSUM and CUSUMSQ plots, you can consult Harvey (1989).
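The residual diagnostic requests described in this section can be combined in a single ESTIMATE statement. The following is a minimal sketch rather than the sunspot analysis itself: it assumes the airline data set sashelp.air (used in Example 29.1) and an illustrative basic structural model, chosen only so that the statements are runnable as written.

```sas
ods graphics on;

/* Illustrative model on sashelp.air; not the sunspot model discussed above */
proc ucm data=sashelp.air;
   id date interval=month;
   model air;
   irregular;
   level;
   slope;
   season length=12 type=trig;
   /* Request the diagnostic panel, the Ljung-Box p-value plot,  */
   /* the LOESS residual plot, and the CUSUM and CUSUMSQ plots   */
   estimate plot=(panel wn loess cusum cusumsq);
run;
```

Listing several plot requests in parentheses avoids refitting the model once per plot.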
Forecast and Series Decomposition Plots
You can use the PLOT= option in the FORECAST statement to obtain the series forecast plot and the series decomposition plots. The series decomposition plots show the result of successively adding different components in the model starting with the trend component. The IRREGULAR component is left out of this process. The following two plots, produced by specifying PLOT=DECOMP, show the results of successive component addition for this example. The first plot, shown in Figure 29.14, shows the smoothed trend component and the second plot, shown in Figure 29.15, shows the sum of smoothed trend and cycle.
Figure 29.14 Sunspots Series: Smoothed Trend
Figure 29.15 Sunspots Series: Smoothed Trend plus Cycle
Finally, Figure 29.16 shows the forecast plot.
Figure 29.16 Sunspots Series: Series Forecasts
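The forecast and decomposition plots are requested in the same way through the FORECAST statement. The sketch below again assumes sashelp.air and an illustrative model instead of the sunspot series; the LEAD=12 value is an arbitrary choice for the example.

```sas
ods graphics on;

/* Illustrative model on sashelp.air; not the sunspot model discussed above */
proc ucm data=sashelp.air;
   id date interval=month;
   model air;
   irregular;
   level;
   slope;
   season length=12 type=trig;
   estimate;
   /* Series forecast plot plus the smoothed decomposition plots */
   forecast lead=12 plot=(forecasts decomp);
run;
```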
ODS Table Names

The UCM procedure assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 29.2.

Table 29.2 ODS Tables Produced by PROC UCM

| ODS Table Name | Description | Statement | Option |
|---|---|---|---|
| Tables Summarizing the Estimation and Forecast Spans | | | |
| EstimationSpan | Estimation span summary information | | default |
| ForecastSpan | Forecast span summary information | | default |
| Tables Related to Model Parameters | | | |
| ConvergenceStatus | Convergence status of the estimation process | | default |
| FixedParameters | Fixed parameters in the model | | default |
| InitialParameters | Initial estimates of the free parameters | | default |
| ParameterEstimates | Final estimates of the free parameters | | default |
| Tables Related to Model Information and Diagnostics | | | |
| BlockSeasonDescription | Information about the block seasonals in the model | | default |
| ComponentSignificance | Significance analysis of the components in the model | | default |
| CycleDescription | Information about the cycles in the model | | default |
| FitStatistics | Fit statistics based on the one-step-ahead predictions | | default |
| FitSummary | Likelihood-based fit statistics | | default |
| OutlierSummary | Summary table of the detected outliers | | default |
| SeasonDescription | Information about the seasonals in the model | | default |
| SeasonHarmonics | Summary of harmonics in a trigonometric seasonal component | SEASON | PRINT=HARMONICS |
| SplineSeasonDescription | Information about the spline-seasonals in the model | | default |
| TrendInformation | Summary information of the level and slope components | | default |
| Tables Related to Filtered Component Estimates | | | |
| FilteredAutoReg | Filtered estimate of an autoreg component | AUTOREG | PRINT=FILTER |
| FilteredBlockSeason | Filtered estimate of a block seasonal component | BLOCKSEASON | PRINT=FILTER |
| FilteredCycle | Filtered estimate of a cycle component | CYCLE | PRINT=FILTER |
| FilteredIrregular | Filtered estimate of the irregular component | IRREGULAR | PRINT=FILTER |
| FilteredLevel | Filtered estimate of the level component | LEVEL | PRINT=FILTER |
| FilteredRandomReg | Filtered estimate of the time-varying random-regression coefficient | RANDOMREG | PRINT=FILTER |
| FilteredSeason | Filtered estimate of a seasonal component | SEASON | PRINT=FILTER |
| FilteredSlope | Filtered estimate of the slope component | SLOPE | PRINT=FILTER |
| FilteredSplineReg | Filtered estimate of the time-varying spline-regression coefficient | SPLINEREG | PRINT=FILTER |
| FilteredSplineSeason | Filtered estimate of a spline-seasonal component | SPLINESEASON | PRINT=FILTER |
| Tables Related to Smoothed Component Estimates | | | |
| SmoothedAutoReg | Smoothed estimate of an autoreg component | AUTOREG | PRINT=SMOOTH |
| SmoothedBlockSeason | Smoothed estimate of a block seasonal component | BLOCKSEASON | PRINT=SMOOTH |
| SmoothedCycle | Smoothed estimate of the cycle component | CYCLE | PRINT=SMOOTH |
| SmoothedIrregular | Smoothed estimate of the irregular component | IRREGULAR | PRINT=SMOOTH |
| SmoothedLevel | Smoothed estimate of the level component | LEVEL | PRINT=SMOOTH |
| SmoothedRandomReg | Smoothed estimate of the time-varying random-regression coefficient | RANDOMREG | PRINT=SMOOTH |
| SmoothedSeason | Smoothed estimate of a seasonal component | SEASON | PRINT=SMOOTH |
| SmoothedSlope | Smoothed estimate of the slope component | SLOPE | PRINT=SMOOTH |
| SmoothedSplineReg | Smoothed estimate of the time-varying spline-regression coefficient | SPLINEREG | PRINT=SMOOTH |
| SmoothedSplineSeason | Smoothed estimate of a spline-seasonal component | SPLINESEASON | PRINT=SMOOTH |
| Tables Related to Series Decomposition and Forecasting | | | |
| FilteredAllExceptIrreg | Filtered estimate of sum of all components except the irregular component | FORECAST | PRINT=FDECOMP |
| FilteredTrend | Filtered estimate of trend | FORECAST | PRINT=FDECOMP |
| FilteredTrendReg | Filtered estimate of trend plus regression | FORECAST | PRINT=FDECOMP |
| FilteredTrendRegCyc | Filtered estimate of trend plus regression plus cycles and autoreg | FORECAST | PRINT=FDECOMP |
| Forecasts | Dependent series forecasts | | default |
| PostSamplePrediction | Forecasting performance in the holdout period | FORECAST | BACK= |
| SmoothedAllExceptIrreg | Smoothed estimate of sum of all components except the irregular component | FORECAST | PRINT=DECOMP |
| SmoothedTrend | Smoothed estimate of trend | FORECAST | PRINT=DECOMP |
| SmoothedTrendReg | Smoothed estimate of trend plus regression | FORECAST | PRINT=DECOMP |
| SmoothedTrendRegCyc | Smoothed estimate of trend plus regression plus cycles and autoreg | FORECAST | PRINT=DECOMP |
NOTE: The tables are related to a single series within a BY group. In the case of models that contain multiple cycles, seasonal components, or block seasonal components, the corresponding component estimate tables are sequentially numbered. For example, if a model contains two cycles and a seasonal component and the PRINT=SMOOTH option is used for each of them, the ODS tables containing the smoothed estimates will be named SmoothedCycle1, SmoothedCycle2, and SmoothedSeason. Note that the seasonal table is not numbered because there is only one seasonal component.
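The numbering convention in the note can be illustrated with an ODS OUTPUT statement. This is a sketch under stated assumptions: it uses sashelp.air, and the two-cycle model is chosen only to generate the numbered table names, not because it is a sensible model for that series (the estimation might not converge cleanly). The output data set names work.cyc1, work.cyc2, and work.seas are arbitrary.

```sas
/* Capture the numbered smoothed-component tables as data sets */
ods output SmoothedCycle1=work.cyc1
           SmoothedCycle2=work.cyc2
           SmoothedSeason=work.seas;

/* Two cycles and one seasonal, each with PRINT=SMOOTH */
proc ucm data=sashelp.air;
   id date interval=month;
   model air;
   irregular;
   level;
   cycle print=smooth;
   cycle print=smooth;
   season length=12 type=trig print=smooth;
   estimate;
run;
```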
ODS Graph Names

To request graphics with PROC UCM, you must first enable ODS Graphics by specifying the ODS GRAPHICS ON; statement. See Chapter 21, “Statistical Graphics Using ODS” (SAS/STAT User’s Guide), for more information. You can reference every graph produced through ODS Graphics with a name. The names of the graphs that PROC UCM generates are listed in Table 29.3, along with the required statements and options.

Table 29.3 ODS Graphics Produced by PROC UCM

| ODS Graph Name | Description | Statement | Option |
|---|---|---|---|
| Plots Related to Residual Analysis | | | |
| ErrorACFPlot | Prediction error autocorrelation plot | ESTIMATE | PLOT=ACF |
| ErrorPACFPlot | Prediction error partial-autocorrelation plot | ESTIMATE | PLOT=PACF |
| ErrorHistogram | Prediction error histogram | ESTIMATE | PLOT=NORMAL |
| ErrorQQPlot | Prediction error normal quantile plot | ESTIMATE | PLOT=QQ |
| ErrorPlot | Plot of prediction errors | ESTIMATE | PLOT=RESIDUAL |
| ErrorWhiteNoiseLogProbPlot | Plot of p-values at different lags for the Ljung-Box portmanteau white noise test statistics | ESTIMATE | PLOT=WN |
| CUSUMPlot | Plot of cumulative residuals | ESTIMATE | PLOT=CUSUM |
| CUSUMSQPlot | Plot of cumulative squared residuals | ESTIMATE | PLOT=CUSUMSQ |
| ModelPlot | Plot of one-step-ahead forecasts in the estimation span | ESTIMATE | PLOT=MODEL |
| PanelResidualPlot | Panel of residual diagnostic plots | ESTIMATE | PLOT=PANEL |
| ResidualLoessPlot | Time series plot of residuals with superimposed LOESS smoother | ESTIMATE | PLOT=LOESS |
| Plots Related to Filtered Component Estimates | | | |
| FilteredAutoregPlot | Plot of filtered autoreg component | AUTOREG | PLOT=FILTER |
| FilteredBlockSeasonPlot | Plot of filtered block season component | BLOCKSEASON | PLOT=FILTER |
| FilteredCyclePlot | Plot of filtered cycle component | CYCLE | PLOT=FILTER |
| FilteredIrregularPlot | Plot of filtered irregular component | IRREGULAR | PLOT=FILTER |
| FilteredLevelPlot | Plot of filtered level component | LEVEL | PLOT=FILTER |
| FilteredRandomRegPlot | Plot of filtered time-varying regression coefficient | RANDOMREG | PLOT=FILTER |
| FilteredSeasonPlot | Plot of filtered season component | SEASON | PLOT=FILTER |
| FilteredSlopePlot | Plot of filtered slope component | SLOPE | PLOT=FILTER |
| FilteredSplineRegPlot | Plot of filtered time-varying regression coefficient | SPLINEREG | PLOT=FILTER |
| FilteredSplineSeasonPlot | Plot of filtered spline-season component | SPLINESEASON | PLOT=FILTER |
| AnnualSeasonPlot | Plot of annual variation in the filtered season component | SEASON | PLOT=F_ANNUAL |
| Plots Related to Smoothed Component Estimates | | | |
| SmoothedAutoregPlot | Plot of smoothed autoreg component | AUTOREG | PLOT=SMOOTH |
| SmoothedBlockSeasonPlot | Plot of smoothed block season component | BLOCKSEASON | PLOT=SMOOTH |
| SmoothedCyclePlot | Plot of smoothed cycle component | CYCLE | PLOT=SMOOTH |
| SmoothedIrregularPlot | Plot of smoothed irregular component | IRREGULAR | PLOT=SMOOTH |
| SmoothedLevelPlot | Plot of smoothed level component | LEVEL | PLOT=SMOOTH |
| SmoothedRandomRegPlot | Plot of smoothed time-varying regression coefficient | RANDOMREG | PLOT=SMOOTH |
| SmoothedSeasonPlot | Plot of smoothed season component | SEASON | PLOT=SMOOTH |
| SmoothedSlopePlot | Plot of smoothed slope component | SLOPE | PLOT=SMOOTH |
| SmoothedSplineRegPlot | Plot of smoothed time-varying regression coefficient | SPLINEREG | PLOT=SMOOTH |
| SmoothedSplineSeasonPlot | Plot of smoothed spline-season component | SPLINESEASON | PLOT=SMOOTH |
| AnnualSeasonPlot | Plot of annual variation in the smoothed season component | SEASON | PLOT=S_ANNUAL |
| Plots Related to Series Decomposition and Forecasting | | | |
| ForecastsOnlyPlot | Series forecasts beyond the historical period | FORECAST | DEFAULT |
| ForecastsPlot | One-step-ahead as well as multistep-ahead forecasts | FORECAST | PLOT=FORECASTS |
| FilteredAllExceptIrregPlot | Plot of sum of all filtered components except the irregular component | FORECAST | PLOT=FDECOMP |
| FilteredTrendPlot | Plot of filtered trend | FORECAST | PLOT=FDECOMP |
| FilteredTrendRegCycPlot | Plot of sum of filtered trend, cycles, and regression effects | FORECAST | PLOT=FDECOMP |
| FilteredTrendRegPlot | Plot of filtered trend plus regression effects | FORECAST | PLOT=FDECOMP |
| SmoothedAllExceptIrregPlot | Plot of sum of all smoothed components except the irregular component | FORECAST | PLOT=DECOMP |
| SmoothedTrendPlot | Plot of smoothed trend | FORECAST | PLOT=TREND |
| SmoothedTrendRegPlot | Plot of smoothed trend plus regression effects | FORECAST | PLOT=DECOMP |
| SmoothedTrendRegCycPlot | Plot of sum of smoothed trend, cycles, and regression effects | FORECAST | PLOT=DECOMP |
| FilteredAllExceptIrregVarPlot | Plot of standard error of sum of all filtered components except the irregular | FORECAST | PLOT=FDECOMPVAR |
| FilteredTrendVarPlot | Plot of standard error of filtered trend | FORECAST | PLOT=FDECOMPVAR |
| FilteredTrendRegVarPlot | Plot of standard error of filtered trend plus regression effects | FORECAST | PLOT=FDECOMPVAR |
| FilteredTrendRegCycVarPlot | Plot of standard error of filtered trend, cycles, and regression effects | FORECAST | PLOT=FDECOMPVAR |
| SmoothedAllExceptIrregVarPlot | Plot of standard error of sum of all smoothed components except the irregular | FORECAST | PLOT=DECOMPVAR |
| SmoothedTrendVarPlot | Plot of standard error of smoothed trend | FORECAST | PLOT=DECOMPVAR |
| SmoothedTrendRegVarPlot | Plot of standard error of smoothed trend plus regression effects | FORECAST | PLOT=DECOMPVAR |
| SmoothedTrendRegCycVarPlot | Plot of standard error of smoothed trend, cycles, and regression effects | FORECAST | PLOT=DECOMPVAR |
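The graph names in Table 29.3 can be used with the ODS SELECT statement to restrict the displayed output to particular graphs. The following sketch assumes sashelp.air and an illustrative model; only the graph names and PLOT= requests come from the table.

```sas
ods graphics on;

/* Show only the residual panel and the Ljung-Box p-value plot */
ods select PanelResidualPlot ErrorWhiteNoiseLogProbPlot;

proc ucm data=sashelp.air;
   id date interval=month;
   model air;
   irregular;
   level;
   slope;
   season length=12 type=trig;
   estimate plot=(panel wn);
run;

/* Restore the default selection for later steps */
ods select all;
```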
OUTFOR= Data Set

You can use the OUTFOR= option in the FORECAST statement to store the series and component forecasts produced by the procedure. This data set contains the following columns:

- the BY variables
- the ID variable. If an ID variable is not specified, then a numerical variable, _ID_, is created that contains the observation numbers from the input data set.
- the dependent series and the predictor series
- FORECAST, a numerical variable containing the one-step-ahead predicted values and the multistep forecasts
- RESIDUAL, a numerical variable containing the difference between the actual and forecast values
- STD, a numerical variable containing the standard error of prediction
- LCL and UCL, numerical variables containing the lower and upper forecast confidence limits
- S_SERIES and VS_SERIES, numerical variables containing the smoothed values of the dependent series and their variances
- S_IRREG and VS_IRREG, numerical variables containing the smoothed values of the irregular component and their variances. These variables are present only if the model has an irregular component.
- F_LEVEL, VF_LEVEL, S_LEVEL, and VS_LEVEL, numerical variables containing the filtered and smoothed values of the level component and the respective variances. These variables are present only if the model has a level component.
- F_SLOPE, VF_SLOPE, S_SLOPE, and VS_SLOPE, numerical variables containing the filtered and smoothed values of the slope component and the respective variances. These variables are present only if the model has a slope component.
- F_AUTOREG, VF_AUTOREG, S_AUTOREG, and VS_AUTOREG, numerical variables containing the filtered and smoothed values of the autoreg component and the respective variances. These variables are present only if the model has an autoreg component.
- F_CYCLE, VF_CYCLE, S_CYCLE, and VS_CYCLE, numerical variables containing the filtered and smoothed values of the cycle component and the respective variances. If there are multiple cycles in the model, these variables are sequentially numbered as F_CYCLE1, F_CYCLE2, etc. These variables are present only if the model has at least one cycle component.
- F_SEASON, VF_SEASON, S_SEASON, and VS_SEASON, numerical variables containing the filtered and smoothed values of the season component and the respective variances. If there are multiple seasons in the model, these variables are sequentially numbered as F_SEASON1, F_SEASON2, etc. These variables are present only if the model has at least one season component.
- F_BLKSEAS, VF_BLKSEAS, S_BLKSEAS, and VS_BLKSEAS, numerical variables containing the filtered and smoothed values of the blockseason component and the respective variances. If there are multiple block seasons in the model, these variables are sequentially numbered as F_BLKSEAS1, F_BLKSEAS2, etc.
- F_SPLSEAS, VF_SPLSEAS, S_SPLSEAS, and VS_SPLSEAS, numerical variables containing the filtered and smoothed values of the splineseason component and the respective variances. If there are multiple spline seasons in the model, these variables are sequentially numbered as F_SPLSEAS1, F_SPLSEAS2, etc. These variables are present only if the model has at least one splineseason component.
- Filtered and smoothed estimates, and their variances, of the time-varying regression coefficients of the variables specified in the RANDOMREG and SPLINEREG statements. A variable is not included if its coefficient is time-invariant, that is, if the associated disturbance variance is zero.
- S_TREG and VS_TREG, numerical variables containing the smoothed values of level plus regression component and their variances. These variables are present only if the model has at least one predictor variable or has dependent lags.
- S_TREGCYC and VS_TREGCYC, numerical variables containing the smoothed values of level plus regression plus cycle component and their variances. These variables are present only if the model has at least one cycle or an autoreg component.
- S_NOIRREG and VS_NOIRREG, numerical variables containing the smoothed values of the sum of all components except the irregular component and their variances. These variables are present only if the model has at least one seasonal or block seasonal component.
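As a brief illustration of the column list above, the following sketch stores the forecasts for the airline model of Example 29.1 and prints a few of the documented columns. The output data set name work.ucm_for and the LEAD=12 horizon are arbitrary choices for the example; the unnumbered name S_SEASON is assumed because the model has a single seasonal component.

```sas
data work.seriesG;
   set sashelp.air;
   logair = log(air);
run;

proc ucm data=work.seriesG;
   id date interval=month;
   model logair;
   irregular;
   level;
   slope var=0 noest;
   season length=12 type=trig;
   estimate;
   /* Store series and component forecasts */
   forecast lead=12 outfor=work.ucm_for;
run;

/* Inspect a few of the columns listed above */
proc print data=work.ucm_for(obs=5);
   var date logair forecast residual std lcl ucl s_level s_season;
run;
```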
OUTEST= Data Set

You can use the OUTEST= option in the ESTIMATE statement to store the model parameters and the related estimation details. This data set contains the following columns:

- the BY variables
- COMPONENT, a character variable containing the name of the component corresponding to the parameter being described
- PARAMETER, a character variable containing the parameter name
- TYPE, a character variable indicating whether the parameter value was fixed by the user or estimated
- _STATUS_, a character variable indicating whether the parameter estimation process converged or failed or there was an error of some other kind
- ESTIMATE, a numerical variable containing the parameter estimate
- STD, a numerical variable containing the standard error of the parameter estimate. This has a missing value if the parameter value is fixed.
- TVALUE, a numerical variable containing the t-statistic. This has a missing value if the parameter value is fixed.
- PVALUE, a numerical variable containing the p-value. This has a missing value if the parameter value is fixed.
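The parameter estimates can be stored in the same way. This sketch assumes sashelp.air and an illustrative model; the output data set name work.ucm_est is arbitrary.

```sas
proc ucm data=sashelp.air;
   id date interval=month;
   model air;
   irregular;
   level;
   slope var=0 noest;   /* a fixed parameter; STD, TVALUE, and PVALUE should be missing for it */
   season length=12 type=trig;
   /* Store parameter estimates and estimation details */
   estimate outest=work.ucm_est;
run;

proc print data=work.ucm_est;
run;
```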
Statistics of Fit

This section explains the goodness-of-fit statistics reported to measure how well the specified model fits the data. First the various statistics of fit that are computed using the prediction errors, $y_t - \hat{y}_t$, are considered. In these formulas, $n$ is the number of nonmissing prediction errors and $k$ is the number of fitted parameters in the model. Moreover, the sum of squared errors is $\mathrm{SSE} = \sum (y_t - \hat{y}_t)^2$, and the total sum of squares for the series corrected for the mean is $\mathrm{SST} = \sum (y_t - \bar{y})^2$, where $\bar{y}$ is the series mean and the sums are over all the nonmissing prediction errors.

Mean Squared Error
   The mean squared prediction error, $\mathrm{MSE} = \frac{1}{n}\mathrm{SSE}$

Root Mean Squared Error
   The root mean square error, $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$

Mean Absolute Percent Error
   The mean absolute percent prediction error, $\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} |(y_t - \hat{y}_t)/y_t|$. The summation ignores observations where $y_t = 0$.

R-square
   The R-square statistic, $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$. If the model fits the series badly, the model error sum of squares, SSE, might be larger than SST and the R-square statistic will be negative.

Adjusted R-square
   The adjusted R-square statistic, $1 - \left(\frac{n-1}{n-k}\right)(1 - R^2)$

Amemiya’s Adjusted R-square
   Amemiya’s adjusted R-square, $1 - \left(\frac{n+k}{n-k}\right)(1 - R^2)$

Random Walk R-square
   The random walk R-square statistic (Harvey’s R-square statistic that uses the random walk model for comparison), $1 - \left(\frac{n-1}{n}\right)\mathrm{SSE}/\mathrm{RWSSE}$, where $\mathrm{RWSSE} = \sum_{t=2}^{n} (y_t - y_{t-1} - \mu)^2$ and $\mu = \frac{1}{n-1} \sum_{t=2}^{n} (y_t - y_{t-1})$

Maximum Percent Error
   The largest percent prediction error, $100 \max((y_t - \hat{y}_t)/y_t)$. In this computation the observations where $y_t = 0$ are ignored.
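As a worked check of the two adjusted R-square formulas, consider the values reported later in Example 29.1 (Output 29.1.7): $R^2 = 0.98705$ with $n = 107$ nonmissing residuals. Assuming that $k = 3$, the number of estimated parameters reported in Output 29.1.6, the formulas give approximately:

```latex
\begin{aligned}
\text{Adjusted } R^2 &= 1 - \frac{n-1}{n-k}\,(1 - R^2)
   = 1 - \frac{106}{104}\,(0.01295) \approx 0.98680 \\
\text{Amemiya's adjusted } R^2 &= 1 - \frac{n+k}{n-k}\,(1 - R^2)
   = 1 - \frac{110}{104}\,(0.01295) \approx 0.98630
\end{aligned}
```

Both values agree with the Adjusted R-Square and Amemiya’s Adjusted R-Square entries in Output 29.1.7 to the displayed precision.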
The likelihood-based fit statistics are reported separately (see the section “The UCMs as State Space Models” on page 1787). They include the full log likelihood ($L_\infty$), the diffuse part of the log likelihood, the normalized residual sum of squares, and several information criteria: AIC, AICC, HQIC, BIC, and CAIC. Let $q$ denote the number of estimated parameters, $n$ be the number of nonmissing measurements in the estimation span, and $d$ be the number of diffuse elements in the initial state vector that are successfully initialized during the Kalman filtering process. Moreover, let $n^* = (n - d)$. The reported information criteria, all in smaller-is-better form, are described in Table 29.4:

Table 29.4 Information Criteria

| Criterion | Formula | Reference |
|---|---|---|
| AIC | $-2L_\infty + 2q$ | Akaike (1974) |
| AICC | $-2L_\infty + 2q\,n^*/(n^* - q - 1)$ | Hurvich and Tsai (1989); Burnham and Anderson (1998) |
| HQIC | $-2L_\infty + 2q \log\log(n^*)$ | Hannan and Quinn (1979) |
| BIC | $-2L_\infty + q \log(n^*)$ | Schwarz (1978) |
| CAIC | $-2L_\infty + q(\log(n^*) + 1)$ | Bozdogan (1987) |
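The formulas in Table 29.4 can be checked against the likelihood-based fit statistics reported in Example 29.1 (Output 29.1.6), where the full log likelihood is 180.63, $q = 3$, $n = 120$, and $d = 13$. The following DATA step is only an arithmetic sketch; the variable names are illustrative and are not part of PROC UCM.

```sas
data _null_;
   logL  = 180.63;   /* full log likelihood from Output 29.1.6  */
   q     = 3;        /* number of estimated parameters          */
   n     = 120;      /* nonmissing observations used            */
   d     = 13;       /* initialized diffuse state elements      */
   nstar = n - d;    /* n* = n - d = 107                        */

   aic  = -2*logL + 2*q;
   aicc = -2*logL + 2*q*nstar/(nstar - q - 1);
   hqic = -2*logL + 2*q*log(log(nstar));
   bic  = -2*logL + q*log(nstar);
   caic = -2*logL + q*(log(nstar) + 1);

   /* Writes the five criteria to the SAS log */
   put aic= 8.2 aicc= 8.2 hqic= 8.2 bic= 8.2 caic= 8.2;
run;
```

The computed values, approximately -355.26, -355.03, -352.01, -347.24, and -344.24, match the AIC, AICC, HQIC, BIC, and CAIC entries in Output 29.1.6 up to rounding.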
Examples: UCM Procedure
Example 29.1: The Airline Series Revisited

The series in this example, the monthly airline passenger series, has already been discussed earlier; see the section “A Seasonal Series with Linear Trend” on page 1743. Recall that the series consists of monthly numbers of international airline travelers (from January 1949 to December 1960). Here additional output features of the UCM procedure are illustrated, such as how to use the ESTIMATE and FORECAST statements to limit the span of the data used in parameter estimation and forecasting.

The following statements fit a BSM to the logarithm of the airline passenger numbers. The disturbance variance for the slope component is held fixed at value 0; that is, the trend is locally linear with constant slope. In order to evaluate the performance of the fitted model on observed data, some of the observed data are withheld during parameter estimation and forecast computations. The observations in the last two years, years 1959 and 1960, are not used in parameter estimation, while the observations in the last year, year 1960, are not used in the forecasting computations. This is done using the BACK= option in the ESTIMATE and FORECAST statements. In addition, a panel of residual diagnostic plots is obtained using the PLOT=PANEL option in the ESTIMATE statement.

   data seriesG;
      set sashelp.air;
      logair = log(air);
   run;

   proc ucm data = seriesG;
      id date interval = month;
      model logair;
      irregular;
      level;
      slope var = 0 noest;
      season length = 12 type=trig;
      estimate back=24 plot=panel;
      forecast back=12 lead=24 print=forecasts;
   run;
The following tables display the summary of data used in estimation and forecasting (Output 29.1.1 and Output 29.1.2). These tables provide simple summary statistics for the estimation and forecast spans; they include useful information such as the beginning and ending dates of the span, the number of nonmissing values, etc.

Output 29.1.1 Observation Span Used in Parameter Estimation (partial output)

| Variable | Type | First | Last | Nobs | Mean |
|---|---|---|---|---|---|
| logair | Dependent | JAN1949 | DEC1958 | 120 | 5.43035 |

Output 29.1.2 Observation Span Used in Forecasting (partial output)

| Variable | Type | First | Last | Nobs | Mean |
|---|---|---|---|---|---|
| logair | Dependent | JAN1949 | DEC1959 | 132 | 5.48654 |
The following tables display the fixed parameters in the model, the preliminary estimates of the free parameters, and the final estimates of the free parameters (Output 29.1.3, Output 29.1.4, and Output 29.1.5).
Output 29.1.3 Fixed Parameters in the Model

The UCM Procedure
Fixed Parameters in the Model

| Component | Parameter | Value |
|---|---|---|
| Slope | Error Variance | 0 |

Output 29.1.4 Starting Values for the Parameters to Be Estimated

Preliminary Estimates of the Free Parameters

| Component | Parameter | Estimate |
|---|---|---|
| Irregular | Error Variance | 6.64120 |
| Level | Error Variance | 2.49045 |
| Season | Error Variance | 1.26676 |

Output 29.1.5 Maximum Likelihood Estimates of the Free Parameters

Final Estimates of the Free Parameters

| Component | Parameter | Estimate | Approx Std Error | t Value | Approx Pr > \|t\| |
|---|---|---|---|---|---|
| Irregular | Error Variance | 0.00018686 | 0.0001212 | 1.54 | 0.1233 |
| Level | Error Variance | 0.00040314 | 0.0001566 | 2.57 | 0.0100 |
| Season | Error Variance | 0.00000350 | 1.66319E-6 | 2.10 | 0.0354 |
Two types of goodness-of-fit statistics are reported after a model is fit to the series (see Output 29.1.6 and Output 29.1.7). The first type is the likelihood-based goodness-of-fit statistics, which include the full likelihood of the data, the diffuse portion of the likelihood (see the section “Details: UCM Procedure” on page 1781), and the information criteria. The second type of statistics is based on the raw residuals, residual = observed – predicted. If the model is nonstationary, then one-step-ahead predictions are not available for some initial observations, and the number of values used in computing these fit statistics will be different from the number used in computing the likelihood-based fit statistics.
Output 29.1.6 Likelihood-Based Fit Statistics for the Airline Data

Likelihood Based Fit Statistics

| Statistic | Value |
|---|---|
| Full Log Likelihood | 180.63 |
| Diffuse Part of Log Likelihood | -13.93 |
| Non-Missing Observations Used | 120 |
| Estimated Parameters | 3 |
| Initialized Diffuse State Elements | 13 |
| Normalized Residual Sum of Squares | 107 |
| AIC (smaller is better) | -355.3 |
| BIC (smaller is better) | -347.2 |
| AICC (smaller is better) | -355 |
| HQIC (smaller is better) | -352 |
| CAIC (smaller is better) | -344.2 |

Output 29.1.7 Residuals-Based Fit Statistics for the Airline Data

Fit Statistics Based on Residuals

| Statistic | Value |
|---|---|
| Mean Squared Error | 0.00156 |
| Root Mean Squared Error | 0.03944 |
| Mean Absolute Percentage Error | 0.57677 |
| Maximum Percent Error | 2.19396 |
| R-Square | 0.98705 |
| Adjusted R-Square | 0.98680 |
| Random Walk R-Square | 0.86370 |
| Amemiya’s Adjusted R-Square | 0.98630 |

Number of non-missing residuals used for computing the fit statistics = 107
The diagnostic plots based on the one-step-ahead residuals are shown in Output 29.1.8. The residual histogram and the Q-Q plot show no reasons to question the approximate normality of the residual distribution. The remaining plots check for the whiteness of the residuals. The sample correlation plots, the autocorrelation function (ACF) and the partial autocorrelation function (PACF), also do not show any significant violations of the whiteness of the residuals. Therefore, on the whole, the model seems to fit the data well.
Output 29.1.8 Residual Diagnostics for the Airline Series Using a BSM
The forecasts are given in Output 29.1.9. To save space, the upper and lower confidence limit columns are dropped from the output, and only the rows corresponding to the year 1960 are shown. Recall that the actual measurements in the years 1959 and 1960 were withheld during the parameter estimation, and the ones in 1960 were not used in the forecast computations.

Output 29.1.9 Forecasts for the Airline Data

| Obs | date | Forecast | StdErr | logair | Residual |
|---|---|---|---|---|---|
| 133 | JAN60 | 6.050 | 0.038 | 6.033 | -0.017 |
| 134 | FEB60 | 5.996 | 0.044 | 5.969 | -0.027 |
| 135 | MAR60 | 6.156 | 0.049 | 6.038 | -0.118 |
| 136 | APR60 | 6.124 | 0.053 | 6.133 | 0.010 |
| 137 | MAY60 | 6.168 | 0.058 | 6.157 | -0.011 |
| 138 | JUN60 | 6.303 | 0.061 | 6.282 | -0.021 |
| 139 | JUL60 | 6.435 | 0.065 | 6.433 | -0.002 |
| 140 | AUG60 | 6.450 | 0.068 | 6.407 | -0.043 |
| 141 | SEP60 | 6.265 | 0.071 | 6.230 | -0.035 |
| 142 | OCT60 | 6.138 | 0.073 | 6.133 | -0.005 |
| 143 | NOV60 | 6.015 | 0.075 | 5.966 | -0.049 |
| 144 | DEC60 | 6.121 | 0.077 | 6.068 | -0.053 |
Output 29.1.10 shows the forecast plot. The forecasts in the year 1960 show that the model predictions were quite good.

Output 29.1.10 Forecast Plot of the Airline Series Using a BSM
Example 29.2: Variable Star Data

The series in this example is studied in detail in Bloomfield (2000). This series consists of brightness measurements (magnitude) of a variable star taken at midnight for 600 consecutive days. The data can be downloaded from a time series archive maintained by the University of York, England (http://www.york.ac.uk/depts/maths/data/ts/welcome.htm (series number 26)). The following DATA step statements read the data in a SAS data set.

   data star;
      input magnitude @@;
      day = _n_;
   datalines;
   25 28 31 32 33 33 32 31 28 25 22 18
   14 10  7  4  2  0  0  0  2  4  8 11
   15 19 23 26 29 32 33 34 33 32 30 27
   24 20 17 13 10  7  5  3  3  3  4  5
    7 10 13 16 19 22 24 26 27 28 29 28
   27 25 24 21 19 17 15 13 12 11 11 10

   ... more lines ...

The following statements use the TIMESERIES procedure to get a time series plot of the series (see Output 29.2.1).

   proc timeseries data=star plot=series;
      var magnitude;
   run;
Output 29.2.1 Plot of Star Brightness on Successive Days
The plot clearly shows the cyclic nature of the series. Bloomfield shows that the series is very well explained by a model that includes two deterministic cycles that have periods 29.0003 and 24.0001 days, a constant term, and a simple error term. He also mentions the difficulty involved in estimating the periods from the data (see Bloomfield 2000, Chapter 3). In his case the cycle periods are estimated by least squares, and the sum of squares surface has multiple local optima and ridges. The following statements show how to use the UCM procedure to fit this two-cycle model to the series. The constant term in the model is specified by holding the variance parameter of the level component to zero.
   proc ucm data=star;
      model magnitude;
      irregular;
      level var=0 noest;
      cycle;
      cycle;
      estimate;
   run;
The final parameter estimates and the goodness-of-fit statistics are shown in Output 29.2.2 and Output 29.2.3, respectively. The model fit appears to be good.

Output 29.2.2 Two-Cycle Model: Parameter Estimates

The UCM Procedure
Final Estimates of the Free Parameters

| Component | Parameter | Estimate | Approx Std Error | t Value | Approx Pr > \|t\| |
|---|---|---|---|---|---|
| Irregular | Error Variance | 0.09257 | 0.0053845 | 17.19 | <.0001 |
| Cycle_1 | Damping Factor | 1.00000 | 1.81175E-7 | 5519514 | <.0001 |
| Cycle_1 | Period | 29.00036 | 0.0022709 | 12770.4 | <.0001 |
| Cycle_1 | Error Variance | 0.00000882 | 5.27213E-6 | 1.67 | 0.0944 |
| Cycle_2 | Damping Factor | 1.00000 | 2.11939E-7 | 4718334 | <.0001 |
| Cycle_2 | Period | 24.00011 | 0.0019128 | 12547.2 | <.0001 |
| Cycle_2 | Error Variance | 0.00000535 | 3.56374E-6 | 1.50 | 0.1330 |

Output 29.2.3 Two-Cycle Model: Goodness of Fit

Fit Statistics Based on Residuals

| Statistic | Value |
|---|---|
| Mean Squared Error | 0.12072 |
| Root Mean Squared Error | 0.34745 |
| Mean Absolute Percentage Error | 2.65141 |
| Maximum Percent Error | 36.38991 |
| R-Square | 0.99850 |
| Adjusted R-Square | 0.99849 |
| Random Walk R-Square | 0.97281 |
| Amemiya’s Adjusted R-Square | 0.99847 |

Number of non-missing residuals used for computing the fit statistics = 599

A summary of the cycles in the model is given in Output 29.2.4.

Output 29.2.4 Two-Cycle Model: Summary

| Name | Type | period | Rho | ErrorVar |
|---|---|---|---|---|
| Cycle_1 | Stationary | 29.00036 | 1.00000 | 0.00000882 |
| Cycle_2 | Stationary | 24.00011 | 1.00000 | 0.00000535 |
Note that the estimated periods are the same as in Bloomfield’s model, the damping factors are nearly equal to 1.0, and the disturbance variances are very close to zero, implying persistent deterministic cycles. In fact, this model is identical to Bloomfield’s model.
Example 29.3: Modeling Long Seasonal Patterns This example illustrates some of the techniques you can use to model long seasonal patterns in a series. If the seasonal pattern is of moderate length and the underlying dynamics are simple, then it is easily modeled by using the basic settings of the SEASON statement and these additional techniques are not needed. However, if the seasonal pattern has a long season length and/or has a complex stochastic dynamics, then the techniques discussed here can be useful. You can obtain parsimonious models for a long seasonal pattern by using an appropriate subset of trigonometric harmonics, or by using a suitable spline function, or by using a block-season pattern in combination with a seasonal component of much smaller length. You can also vary the disturbance variances of the subcomponents that combine to form the seasonal component. The time series used in this example consists of number of calls received per shift at a call center. Each shift is six hours long, and the first shift of the day begins at midnight, resulting in four shifts per day. The observations are available from December 15, 1999, to April 30, 2000. This series is seasonal with season length 28, which is moderate, and in fact there is no particular need to use pattern approximation techniques in this case. However, it is adequate for demonstration purposes. The plan of this example is as follows. First an initial model with a full seasonal component is created. This model is used as a baseline for comparing alternate models created by the techniques that are being illustrated. In practice any candidate model is first checked for adequacy by using various diagnostic procedures. In this illustration the main focus is on the different ways a long seasonal pattern can be modeled and no model diagnostics are done for the models being entertained. The alternate models are compared by using the sum of absolute prediction errors in the holdout region. 
The following DATA step statements create the input data set used in this example.

   data callCenter;
      input calls @@;
      label calls= "Number of Calls Received in a 6 Hour Shift";
      start = '15dec99:00:00'dt;
      datetime = INTNX( 'dthour6', start, _n_-1 );
      format datetime datetime10.;
   datalines;
   18 122 244 128 19 113 230 119 17 112
   219 93 14 73 139 53 11 32 74 56
   15 137 289 153 20 125 227 106 16 101

   ... more lines ...
Initial exploration of the series clearly indicates that the series does not show any significant trend, and that time of day and day of the week have a significant influence on the number of calls received. These considerations suggest a simple random walk trend model along with a seasonal component of season length 28, the total number of shifts in a week. The following statements specify this model. Note the PRINT=HARMONICS option in the SEASON statement, which produces a table that lists the full set of harmonics contributing to the seasonal component along with the significance of their contribution. This table will be useful later in choosing a subset trigonometric model. The BACK=28 and LEAD=28 specifications in the FORECAST statement create a holdout region of 28 observations. The sum of absolute prediction errors (SAE) in this holdout region is used to compare the different models.
   proc ucm data=callCenter;
      id datetime interval=dthour6;
      model calls;
      irregular;
      level;
      season length=28 type=trig print=(harmonics);
      estimate back=28;
      forecast back=28 lead=28;
   run;
The forecasting performance of this model in the holdout region is shown in Output 29.3.1. The sum of absolute prediction errors, SAE = 516.22, appears in the last row of the holdout analysis table.

Output 29.3.1 Predictions in the Holdout Region: Baseline Model

   Obs   datetime     Actual   Forecast     Error       SAE
   525   24APR00:00       12     -4.004    16.004    16.004
   526   24APR00:06      136    110.825    25.175    41.179
   527   24APR00:12      295    262.820    32.180    73.360
   528   24APR00:18      172    145.127    26.873   100.232
   529   25APR00:00       20      2.188    17.812   118.044
   530   25APR00:06      127    105.442    21.558   139.602
   531   25APR00:12      236    217.043    18.957   158.559
   532   25APR00:18      125    114.313    10.687   169.246
   533   26APR00:00       16      2.855    13.145   182.391
   534   26APR00:06      108     95.202    12.798   195.189
   535   26APR00:12      207    194.184    12.816   208.005
   536   26APR00:18      112     97.687    14.313   222.317
   537   27APR00:00       15      1.270    13.730   236.047
   538   27APR00:06       98     85.875    12.125   248.172
   539   27APR00:12      200    184.891    15.109   263.281
   540   27APR00:18      113     93.113    19.887   283.168
   541   28APR00:00       15     -1.120    16.120   299.288
   542   28APR00:06      104     84.983    19.017   318.305
   543   28APR00:12      205    177.940    27.060   345.365
   544   28APR00:18       89     64.292    24.708   370.073
   545   29APR00:00       12     -6.020    18.020   388.093
   546   29APR00:06       68     46.286    21.714   409.807
   547   29APR00:12      116    100.339    15.661   425.468
   548   29APR00:18       54     34.700    19.300   444.768
   549   30APR00:00       10     -6.209    16.209   460.978
   550   30APR00:06       30     12.167    17.833   478.811
   551   30APR00:12       66     49.524    16.476   495.287
   552   30APR00:18       61     40.071    20.929   516.216
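The SAE column is a running sum of the absolute differences between the actual and forecast values over the 28 holdout observations. The following is a minimal sketch of how the same quantity could be recomputed from a forecast output data set. It assumes that the OUTFOR= data set (here given the hypothetical name baseFor) contains the response variable calls and a variable named FORECAST; the data set baseSAE and the variables absError and sae are also illustrative names.

```sas
/* Refit the baseline model and save the forecasts.                 */
/* baseFor is a hypothetical name for the OUTFOR= data set.         */
proc ucm data=callCenter;
   id datetime interval=dthour6;
   model calls;
   irregular;
   level;
   season length=28 type=trig;
   estimate back=28;
   forecast back=28 lead=28 outfor=baseFor;
run;

/* Accumulate the absolute errors over the last 28 observations.    */
/* Assumes baseFor holds the variables CALLS and FORECAST.          */
data baseSAE;
   set baseFor nobs=nTotal;
   if _n_ > nTotal - 28;              /* keep only the holdout region */
   absError = abs(calls - forecast);
   sae + absError;                    /* running sum, retained across rows */
   keep datetime calls forecast absError sae;
run;
```

The same DATA step can be reused for each alternate model by pointing it at that model's forecast data set.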
Now that a baseline model is created, the exploration for alternate models can begin. The review of the harmonic table in Output 29.3.2 shows that all but the last three harmonics are significant at the 5% level, and deleting any of them to form a subset trigonometric seasonal component will lead to a poorer model. The last three harmonics (the 12th, 13th, and 14th, with periods of 2.333, 2.154, and 2.0, respectively) do appear to be possible choices for deletion. Note that the disturbance variance of the seasonal component is close to significant (p = 0.0551; see Output 29.3.3); therefore the seasonal component appears to be stochastic, and the preceding logic, which is based on the final state estimate, provides only a rough guideline.

Output 29.3.2 Harmonic Analysis of the Season: Initial Model

The UCM Procedure
Harmonic Analysis of Trigonometric Seasons (Based on the Final State)
   Name     Season Length   Harmonic     Period   Chi-Square   DF   Pr > ChiSq
   Season              28          1   28.00000       234.19    2       <.0001
   Season              28          2   14.00000       264.19    2       <.0001
   Season              28          3    9.33333        95.65    2       <.0001
   Season              28          4    7.00000       105.64    2       <.0001
   Season              28          5    5.60000       146.74    2       <.0001
   Season              28          6    4.66667       121.93    2       <.0001
   Season              28          7    4.00000      4299.12    2       <.0001
   Season              28          8    3.50000       150.79    2       <.0001
   Season              28          9    3.11111        89.68    2       <.0001
   Season              28         10    2.80000         8.95    2       0.0114
   Season              28         11    2.54545         6.14    2       0.0464
   Season              28         12    2.33333         2.20    2       0.3325
   Season              28         13    2.15385         3.40    2       0.1828
   Season              28         14    2.00000         2.33    1       0.1272
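The Period and DF columns of the harmonic table follow a simple pattern: the jth harmonic of a season of length 28 has period 28/j, and every harmonic has two degrees of freedom except the last one (j = 14), which has one. The following DATA step is a small sketch that reproduces those two columns; the data set name harmonicPeriods and its variable names are illustrative and are not produced by PROC UCM.

```sas
/* Tabulate the period and degrees of freedom of each harmonic      */
/* for a trigonometric season of length 28.                         */
data harmonicPeriods;
   seasonLength = 28;
   do harmonic = 1 to floor(seasonLength / 2);
      period = seasonLength / harmonic;
      /* The last harmonic of an even-length season has a single term */
      df = ifn(2 * harmonic = seasonLength, 1, 2);
      output;
   end;
run;

proc print data=harmonicPeriods noobs;
   format period 8.5;
run;
```

Dropping harmonics 12, 13, and 14 therefore removes the three shortest periods (28/12, 28/13, and 28/14 shifts) from the seasonal component.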
Output 29.3.3 Parameter Estimates: Initial Model

Final Estimates of the Free Parameters

                                               Approx                 Approx
   Component   Parameter         Estimate    Std Error   t Value    Pr > |t|
   Irregular   Error Variance    92.14591     13.10986      7.03      <.0001
   Level       Error Variance    44.83595     10.65465      4.21      <.0001
   Season      Error Variance     0.01250    0.0065153      1.92      0.0551
The following statements fit a subset trigonometric model formed by dropping the last three harmonics by specifying the DROPH= option in the SEASON statement:
   proc ucm data=callCenter;
      id datetime interval=dthour6;
      model calls;
      irregular;
      level;
      season length=28 type=trig droph=12 13 14;
      estimate back=28;
      forecast back=28 lead=28;
   run;
The last row of the holdout region prediction analysis table for the preceding model is shown in Output 29.3.4. It shows that the subset trigonometric model has better prediction performance in the holdout region than the full trigonometric model: its SAE is 471.53, compared to 516.22 for the full model.

Output 29.3.4 SAE for the Subset Trigonometric Model

   Obs   datetime     Actual   Forecast     Error       SAE
   552   30APR00:18       61     40.836    20.164   471.534
The following statements illustrate a spline approximation to this seasonal component. In the spline specification the knot placement is quite important, and usually some experimentation is needed. In the following model the knots are placed at the beginning and the middle of each day. Note that the knots at the beginning and end of the season, 1 and 28 in this case, should not be listed in the knot list because knots are always placed there anyway.
   proc ucm data=callCenter;
      id datetime interval=dthour6;
      model calls;
      irregular;
      level;
      splineseason length=28
         knots=3 5 7 9 11 13 15 17 19 21 23 25 27
         degree=3;
      estimate back=28;
      forecast back=28 lead=28;
   run;
The spline season model takes about half as long to fit as the baseline model. The last row of the holdout region prediction analysis table for this model is shown in Output 29.3.5, which shows that the spline season model performs even better than the previous two models in the holdout region: its SAE is 313.79, compared to 471.53 for the previous model.

Output 29.3.5 SAE for the Spline Season Model

   Obs   datetime     Actual   Forecast     Error       SAE
   552   30APR00:18       61     23.350    37.650   313.792
The following statements illustrate yet another way to approximate a long seasonal component. Here a combination of BLOCKSEASON and SEASON statements results in a seasonal component that is a sum of two seasonal patterns: one seasonal pattern is simply a regular season with season length 4 that captures the within-day seasonal pattern, and the other seasonal pattern is a block seasonal pattern that remains constant during the day but varies from day to day within a week.
Note the use of the NLOPTIONS statement to change the optimization technique used during parameter estimation to DBLDOG, which in this case performs better than the default technique, TRUREG.
   proc ucm data=callCenter;
      id datetime interval=dthour6;
      model calls;
      irregular;
      level;
      season length=4 type=trig;
      blockseason nblocks=7 blocksize=4 type=trig;
      estimate back=28;
      forecast back=28 lead=28;
      nloptions tech=dbldog;
   run;
This model also takes about half as long to fit as the baseline model. The last row of the holdout region prediction analysis table for this model is shown in Output 29.3.6, which shows that the block season model does slightly better than the baseline model but not as well as the other two models: its SAE is 508.52, compared to 516.22 for the baseline model.

Output 29.3.6 SAE for the Block Season Model

   Obs   datetime     Actual   Forecast     Error       SAE
   552   30APR00:18       61     39.339    21.661   508.522
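To make the decomposition used by the block season model concrete, the following sketch labels each observation with its position within the day (which the SEASON LENGTH=4 component tracks) and its day block within the week (which the BLOCKSEASON NBLOCKS=7 BLOCKSIZE=4 component tracks). The data set shiftIndex and the variables shiftOfDay and dayBlock are hypothetical names introduced only for illustration; PROC UCM does not require them.

```sas
/* Label each six-hour shift with its within-day position and its   */
/* day block within the 28-shift week.                              */
data shiftIndex;
   set callCenter;
   shiftOfDay = mod(_n_ - 1, 4) + 1;               /* 1..4: position within the day     */
   dayBlock   = mod(floor((_n_ - 1) / 4), 7) + 1;  /* 1..7: day block within the week   */
run;

proc freq data=shiftIndex;
   tables dayBlock * shiftOfDay / norow nocol nopercent;
run;
```

Under this labeling the combined pattern is the sum of one effect per shiftOfDay and one effect per dayBlock, so roughly 3 + 6 free seasonal effects stand in for the 27 of a full season of length 28. Because the series starts on December 15, 1999, the first day block corresponds to a Wednesday.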
This example showed a few different ways to model a long seasonal pattern. It showed that parsimonious models for long seasonal patterns can be useful, and in some cases even more effective than the full model. Moreover, for very long seasonal patterns the high memory requirements and long computing times might make full models impractical.
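The holdout results reported in Output 29.3.1 and Output 29.3.4 through Output 29.3.6 can be collected in one place for comparison. The following sketch types the four SAE values into a data set and expresses each as a percentage reduction relative to the baseline; the data set name saeSummary and the variable names are illustrative.

```sas
/* Compare the holdout SAE of each model with the baseline (516.216). */
data saeSummary;
   length model $ 24;
   input model & $24. sae;     /* two blanks separate the label and the value */
   pctReduction = 100 * (516.216 - sae) / 516.216;
   format pctReduction 6.1;
   datalines;
Full trigonometric  516.216
Subset trigonometric  471.534
Spline season  313.792
Block season  508.522
;

proc print data=saeSummary noobs;
run;
```

Relative to the baseline, the subset trigonometric model reduces the SAE by roughly 9%, the spline season model by roughly 39%, and the block season model by roughly 1.5%.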
Example 29.4: Modeling Time-Varying Regression Effects Using the RANDOMREG Statement

In April 1979 the Albuquerque Police Department began a special enforcement program aimed at reducing the number of DWI (driving while intoxicated) accidents. The program was administered by a squad of police officers, who used breath alcohol testing (BAT) devices and a van that houses a BAT device (the Batmobile). These data were collected by the Division of Governmental Research of the University of New Mexico, under a contract with the National Highway Traffic Safety Administration of the U.S. Department of Transportation, to evaluate the Batmobile program. The first 29 observations are for a control period, and the next 23 observations are for the experimental (Batmobile) period. The data, freely available at http://lib.stat.cmu.edu/DASL/Datafiles/batdat.html, consist of two variables: ACC, which represents injuries and fatalities from Wednesday to Saturday nighttime accidents, and FUEL, which represents fuel consumption (millions of gallons) in Albuquerque. The variables are measured quarterly from the first quarter of 1972 through the last quarter of 1984, a span of 13 years. The following DATA step statements create the input data set.

   data bat;
      input ACC FUEL @@;
      batProgram = 0;
      if _n_ > 29 then batProgram = 1;
      date = INTNX( 'qtr', '1jan1972'd, _n_- 1 );
      format date qtr8.;
   datalines;
   192 32.592 238 37.250 232 40.032 246 35.852
   185 38.226 274 38.711 266 43.139 196 40.434
   170 35.898 234 37.111 272 38.944 234 37.717
   210 37.861 280 42.524 246 43.965 248 41.976
   269 42.918 326 49.789 342 48.454 257 45.056
   280 49.385 290 42.524 356 51.224 295 48.562
   279 48.167 330 51.362 354 54.646 331 53.398
   291 50.584 377 51.320 327 50.810 301 46.272
   269 48.664 314 48.122 318 47.483 288 44.732
   242 46.143 268 44.129 327 46.258 253 48.230
   215 46.459 263 50.686 319 49.681 263 51.029
   206 47.236 286 51.717 323 51.824 306 49.380
   230 47.961 304 46.039 311 55.683 292 52.263
   ;
There are a number of ways to study these data and the question of the effectiveness of the BAT program. One possibility is to study the before-after difference in the injuries and fatalities per million gallons of fuel consumed, by regressing ACC on FUEL and the dummy variable BATPROGRAM, which is zero before the program began and one while the program is in place. However, the effect of the Batmobiles might well be cumulative, because as awareness of the program becomes dispersed, its effectiveness as a deterrent to driving while intoxicated increases. This suggests that the regression coefficient of the BATPROGRAM variable might be time varying. The following program fits a model that incorporates these considerations. A seasonal component is included in the model because the data show strong quarterly seasonality.

   proc ucm data=bat;
      model acc = fuel;
      id date interval=qtr;
      irregular;
      level var=0 noest;
      randomreg batProgram / plot=smooth;
      season length=4 var=0 noest plot=smooth;
      estimate plot=(panel residual);
      forecast plot=forecasts lead=0;
   run;
The model seems to fit the data adequately. No data are withheld for model validation because the series is relatively short. The plot of the time-varying coefficient of BATPROGRAM is shown in Output 29.4.1. As expected, it shows that the effectiveness of the program increases as awareness of the program becomes dispersed. The effectiveness eventually seems to level off. The residual diagnostic plots are shown in Output 29.4.2 and Output 29.4.3, the forecast plot is in Output 29.4.4, the goodness-of-fit statistics are in Output 29.4.5, and the parameter estimates are in Output 29.4.6.

Output 29.4.1 Time-Varying Regression Coefficient of BATPROGRAM

Output 29.4.2 Residuals for the Time-Varying Regression Model

Output 29.4.3 Residual Diagnostics for the Time-Varying Regression Model

Output 29.4.4 One-Step-Ahead Forecasts for the Time-Varying Regression Model
Output 29.4.5 Model Fit for the Time-Varying Regression Model

Fit Statistics Based on Residuals

   Mean Squared Error                 866.75562
   Root Mean Squared Error             29.44071
   Mean Absolute Percentage Error       9.50326
   Maximum Percent Error               14.15368
   R-Square                             0.32646
   Adjusted R-Square                    0.29278
   Random Walk R-Square                 0.63010
   Amemiya's Adjusted R-Square          0.19175

   Number of non-missing residuals used for computing the fit statistics = 22
Output 29.4.6 Parameter Estimates for the Time-Varying Regression Model

Final Estimates of the Free Parameters

                                                 Approx                 Approx
   Component    Parameter          Estimate    Std Error   t Value    Pr > |t|
   Irregular    Error Variance    480.92258    109.21980      4.40      <.0001
   FUEL         Coefficient         6.23279      0.67533      9.23      <.0001
   batProgram   Error Variance     84.22334     79.88166      1.05      0.2917
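For comparison, the simpler before-after analysis mentioned at the start of this example can be sketched by listing BATPROGRAM as an ordinary regressor in the MODEL statement, which gives it a single coefficient that does not change over time. This variant is an illustration of that alternative, not part of the analysis reported above, and its results are not shown here.

```sas
/* Constant-effect variant: batProgram is a fixed regressor, so its */
/* coefficient does not evolve over time.                           */
proc ucm data=bat;
   id date interval=qtr;
   model acc = fuel batProgram;
   irregular;
   level var=0 noest;
   season length=4 var=0 noest;
   estimate;
   forecast lead=0;
run;
```

Comparing the fit of this variant with the time-varying model is one way to judge how much the RANDOMREG specification contributes, particularly because the estimated disturbance variance of the BATPROGRAM coefficient in Output 29.4.6 is not statistically significant (p = 0.2917).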
Example 29.5: Trend Removal Using the Hodrick-Prescott Filter

The Hodrick-Prescott filter (see Hodrick and Prescott (1997)) is a popular tool in macroeconomics for fitting a smooth trend to a time series. It is well known that the trend computation according to this filter is equivalent to fitting the local linear trend plus irregular model with the level disturbance variance restricted to zero and the slope disturbance variance restricted to be a suitable multiple of the irregular component variance. The multiple used depends on the frequency of the series; for example, for quarterly series the commonly recommended multiple is 1/1600 = 0.000625. For other intervals there is no consensus, but a frequently suggested value for monthly series is 1/14400, and the value for an annual series can range from 1/400 = 0.0025 to 1/7 ≈ 0.14.

The data set considered in this example consists of quarterly GNP values for the United States from 1960 to 1991. In the UCM procedure statements that follow, the presence of the PROFILE option in the ESTIMATE statement changes how the restriction on the slope disturbance variance is interpreted: instead of fixing the variance at 0.000625, it restricts the variance to be 0.000625 times the estimated irregular component variance, as needed for the Hodrick-Prescott filter. The plot of the fitted trend is shown in Output 29.5.1, and the plot of the smoothed irregular component, which corresponds to the detrended series, is given in Output 29.5.2. The detrended series can be further analyzed for business cycles.

   proc ucm data=sashelp.gnp;
      id date interval=qtr;
      model gnp;
      irregular plot=smooth;
      level var=0 noest plot=smooth;
      slope var=0.000625 noest;
      estimate PROFILE;
      forecast plot=(decomp);
   run;
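The equivalence described in this example can be written out explicitly. The following is a sketch in standard notation; the symbols (\mu_t for the level, \beta_t for the slope, \epsilon_t for the irregular, and \lambda for the smoothing parameter) are introduced here for illustration and do not appear in the text above.

```latex
% Local linear trend plus irregular model with the Hodrick-Prescott restrictions
\begin{aligned}
y_t     &= \mu_t + \epsilon_t,                 &\quad \epsilon_t &\sim N(0, \sigma_\epsilon^2), \\
\mu_t   &= \mu_{t-1} + \beta_{t-1} + \eta_t,   &\quad \sigma_\eta^2 &= 0, \\
\beta_t &= \beta_{t-1} + \xi_t,                &\quad \sigma_\xi^2 &= \sigma_\epsilon^2 / \lambda .
\end{aligned}

% The smoothed level from this model coincides with the Hodrick-Prescott trend
\hat{\tau} \;=\; \arg\min_{\tau}\;
   \sum_{t} (y_t - \tau_t)^2
   \;+\; \lambda \sum_{t} \bigl(\tau_{t+1} - 2\tau_t + \tau_{t-1}\bigr)^2 ,
\qquad \lambda = 1600 \ \text{for quarterly data}.
```

In these terms, VAR=0 NOEST in the LEVEL statement imposes \sigma_\eta^2 = 0, and VAR=0.000625 NOEST in the SLOPE statement together with the PROFILE option imposes \sigma_\xi^2 = \sigma_\epsilon^2 / 1600.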
Output 29.5.1 Smoothed Trend for the GNP Series as per the Hodrick-Prescott Filter

Output 29.5.2 Detrended GNP Series
Example 29.6: Using Splines to Incorporate Nonlinear Effects

The data in this example are created to mirror the electricity demand and temperature data recorded at a utility company in the Midwest region of the United States. The data set (not shown), utility, has three variables: load, temp, and date. The load column contains the daily electricity demand, the temp column has the average daily temperature readings, and the date column records the observation date.

The following statements produce a plot, shown in Output 29.6.1, of electricity load versus temperature. Clearly the relationship is smooth but nonlinear: the load generally increases when the temperatures are away from the comfortable sixties.

   proc sgplot data=utility;
      loess x=temp y=load / smooth=0.4;
   run;
Output 29.6.1 Load versus Temperature Plot
The time series plot of the load (not shown) also shows that, apart from a day-of-the-week seasonal effect, there are no additional easily identifiable patterns in the series. The series has no apparent upward or downward trend. The following statements fit a UCM to the series that takes these observations into account. The particular choice of the model is the result of a small modeling exercise that compared a small number of competing models. The chosen model is adequate but by no means the best possible. The temperature effect is modeled by a deterministic spline of degree three with knots at 30, 40, 50, 65, and 75. The knot locations and the degree were chosen by visual inspection of the plot (Output 29.6.1). An AUTOREG component is used in place of the simple irregular component, which improved the residual analysis. The last 60 days of data are withheld for out-of-sample forecast evaluation (note the BACK= option in both the ESTIMATE and FORECAST statements). The OUTLIER statement is used to increase the number of outliers reported to 10. Because no CHECKBREAK option is used in the LEVEL statement, only additive outliers are searched for. In this example the use of the EXTRADIFFUSE= option in the ESTIMATE and FORECAST statements is useful for discarding some early one-step-ahead forecasts and residuals with large variance.
   proc ucm data=utility;
      id date interval=day;
      model load;
      autoreg;
      level plot=smooth;
      splinereg temp knots=30 40 50 65 75 degree=3 variance=0 noest;
      season length=7 var=0 noest;
      estimate plot=panel back=60 extradiffuse=50;
      outlier maxnum=10;
      forecast back=60 lead=60 extradiffuse=50;
   run;
The parameter estimates are given in Output 29.6.2, and the residual goodness-of-fit statistics are shown in Output 29.6.3. The residual diagnostic plots are shown in Output 29.6.4. The ACF and PACF plots appear satisfactory, but the normality plots, particularly the Q-Q plot, show possible violations. It appears that, at least in part, this non-normal behavior of the residuals might be attributable to the outliers in the series. The outlier summary table, Output 29.6.5, shows the most likely outlying observations. Notice that most of these outliers are holidays, such as July 4th, when the electricity load is lower than usual for that day of the week.

Output 29.6.2 Electricity Load: Parameter Estimates

The UCM Procedure
Final Estimates of the Free Parameters
                                                      Approx                 Approx
   Component   Parameter               Estimate    Std Error   t Value    Pr > |t|
   Level       Error Variance           0.21185      0.05025      4.22      <.0001
   AutoReg     Damping Factor           0.57522      0.03466     16.60      <.0001
   AutoReg     Error Variance           2.21057      0.20478     10.79      <.0001
   temp        Spline Coefficient_1     4.72502      1.93997      2.44      0.0149
   temp        Spline Coefficient_2     2.19116      1.71243      1.28      0.2007
   temp        Spline Coefficient_3    -7.14492      1.56805     -4.56      <.0001
   temp        Spline Coefficient_4   -11.39950      1.45098     -7.86      <.0001
   temp        Spline Coefficient_5   -16.38055      1.36977    -11.96      <.0001
   temp        Spline Coefficient_6   -18.76075      1.28898    -14.55      <.0001
   temp        Spline Coefficient_7    -8.04628      1.09017     -7.38      <.0001
   temp        Spline Coefficient_8    -2.30525      1.25102     -1.84      0.0654
Output 29.6.3 Electricity Load: Goodness of Fit

Fit Statistics Based on Residuals

   Mean Squared Error                   2.90945
   Root Mean Squared Error              1.70571
   Mean Absolute Percentage Error       2.92586
   Maximum Percent Error               14.96281
   R-Square                             0.92739
   Adjusted R-Square                    0.92721
   Random Walk R-Square                 0.69618
   Amemiya's Adjusted R-Square          0.92684

   Number of non-missing residuals used for computing the fit statistics = 791

Output 29.6.4 Electricity Load: Residual Diagnostics
Output 29.6.5 Additive Outliers in the Electricity Load Series

    Obs   Time         Estimate      StdErr   ChiSq   DF   Prob ChiSq
   1281   04JUL2002    -7.99908   1.3417486   35.54    1       <.0001
    916   04JUL2001    -6.55778    1.338431   24.01    1       <.0001
    329   25NOV1999    -5.85047   1.3379735   19.12    1       <.0001
    977   03SEP2001    -5.67254   1.3389138   17.95    1       <.0001
   1341   02SEP2002    -5.49631    1.337843   16.88    1       <.0001
    693   23NOV2000    -5.27968   1.3374368   15.58    1       <.0001
    915   03JUL2001     5.06557   1.3375273   14.34    1       0.0002
   1057   22NOV2001    -5.01550   1.3386184   14.04    1       0.0002
    551   04JUL2000    -4.89965   1.3381557   13.41    1       0.0003
    879   28MAY2001    -4.76135   1.3375349   12.67    1       0.0004
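Because most of the dates in the outlier table fall on U.S. holidays, one possible follow-up is to flag such days explicitly and include the flag as a regressor. The following is only a sketch of that idea, assuming a SAS release that provides the HOLIDAY function; the data set utilityHol and the variable isHoliday are hypothetical names, and this extended model is not fitted or evaluated in this example.

```sas
/* Flag selected U.S. holidays in the (not shown) utility data set. */
data utilityHol;
   set utility;
   yr = year(date);
   isHoliday = ( date = holiday('USINDEPENDENCE', yr)
              or date = holiday('MEMORIAL', yr)
              or date = holiday('LABOR', yr)
              or date = holiday('THANKSGIVING', yr) );
   drop yr;
run;

/* Same model as before, with the holiday flag as a fixed regressor. */
proc ucm data=utilityHol;
   id date interval=day;
   model load = isHoliday;
   autoreg;
   level;
   splinereg temp knots=30 40 50 65 75 degree=3 variance=0 noest;
   season length=7 var=0 noest;
   estimate back=60 extradiffuse=50;
   forecast back=60 lead=60 extradiffuse=50;
run;
```

A single indicator forces all flagged holidays to share one effect; separate indicators per holiday would be a natural refinement if the effects differ.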
The plot of the load forecasts for the withheld data is shown in Output 29.6.6.

Output 29.6.6 Electricity Load: Forecast Evaluation of the Withheld Data
Example 29.7: Detection of Level Shift

The series in this example consists of the yearly water level readings of the Nile River recorded at Aswan, Egypt (see Cobb (1978) and de Jong and Penzer (1998)). The readings are from the years 1871 to 1970. The series does not show any apparent trend or any other distinctive patterns; however, there is a shift in the water level starting at the year 1899. This shift could be attributed to the start of construction of a dam near Aswan in that year. A time series plot of this series is given in Output 29.7.1. The following DATA step statements create the input data set.

   data nile;
      input waterlevel @@;
      year = intnx( 'year', '1jan1871'd, _n_-1 );
      format year year4.;
   datalines;
   1120 1160 963  1210 1160 1160 813  1230 1370 1140
   995  935  1110 994  1020 960  1180 799  958  1140
   1100 1210 1150 1250 1260 1220 1030 1100 774  840
   874  694  940  833  701  916  692  1020 1050 969
   831  726  456  824  702  1120 1100 832  764  821
   768  845  864  862  698  845  744  796  1040 759
   781  865  845  944  984  897  822  1010 771  676
   649  846  812  742  801  1040 860  874  848  890
   744  749  838  1050 918  986  797  923  975  815
   1020 906  901  1170 912  746  919  718  714  740
   ;

   proc timeseries data=nile plot=series;
      id year interval=year;
      var waterlevel;
   run;
Output 29.7.1 Nile Water Level
In this situation it is known that a shift in the water level occurred within the span of the series, and its effect can be easily taken into account by including an appropriate indicator variable as a regressor. However, in many situations such prior information is not available, and it is useful to detect such a shift in a data analytic fashion. You can check for breaks in the level by using the CHECKBREAK option in the LEVEL statement. The following statements fit a simple locally constant level plus error model to the series:

   proc ucm data=nile;
      id year interval=year;
      model waterlevel;
      irregular;
      level plot=smooth checkbreak;
      estimate;
      forecast plot=decomp;
   run;
The plot in Output 29.7.2 shows a noticeable drop in the smoothed water level around 1899.
Output 29.7.2 Smoothed Trend without the Shift of 1899
The “Outlier Summary” table in Output 29.7.3 shows the most likely types of breaks and their locations within the series span. The shift of 1899 is easily detected.

Output 29.7.3 Detection of Structural Breaks in the Nile River Level

Outlier Summary

                                            Standard
   Obs   year   Break Type     Estimate        Error   Chi-Square   DF   Pr > ChiSq
    29   1899   Level        -315.73791    97.639753        10.46    1       0.0012
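The reported chi-square statistic is consistent with the squared ratio of the estimate to its standard error, referred to a chi-square distribution with one degree of freedom. The following DATA step is a small check of that arithmetic using the values from Output 29.7.3; that PROC UCM computes the statistic in exactly this way is an assumption made here, not something stated in the text.

```sas
/* Recompute the chi-square statistic and p-value for the 1899 break */
/* from the reported estimate and standard error.                    */
data _null_;
   estimate = -315.73791;
   stdErr   = 97.639753;
   chiSq    = (estimate / stdErr)**2;     /* squared t-ratio            */
   pValue   = 1 - probchi(chiSq, 1);      /* upper tail, 1 degree of freedom */
   put chiSq= 8.2 pValue= 8.4;
run;
```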
The following statements specify a UCM that models the level of the river as a locally constant series with a shift in the year 1899, represented by a dummy regressor (SHIFT1899):

   data nile;
      set nile;
      shift1899 = ( year >= '1jan1899'd );
   run;

   proc ucm data=nile;
      id year interval=year;
      model waterlevel = shift1899;
      irregular;
      level;
      estimate;
      forecast plot=decomp;
   run;
The plot in Output 29.7.4 shows the smoothed trend, including the correction due to the shift in the year 1899. Notice the simplicity in the shape of the smoothed curve after the incorporation of the shift information.

Output 29.7.4 Smoothed Trend plus Shift of 1899
Example 29.8: ARIMA Modeling (Experimental)

This example shows how you can use the UCM procedure for ARIMA modeling. The parameter estimates and predictions for ARIMA models obtained by using PROC UCM will be close to those obtained by using PROC ARIMA (when the METHOD=ML option is specified in its ESTIMATE statement) if the model is stationary or if the model is nonstationary and there are no missing values in the data. See Chapter 7, “The ARIMA Procedure,” for additional details about the ARIMA procedure. However, if there are missing values in the data and the model is nonstationary, then the UCM and ARIMA procedures can produce significantly different parameter estimates and predictions. An article by Kohn and Ansley (1986) suggests a statistically sound method of estimation, prediction, and interpolation for nonstationary ARIMA models with missing data. This method is based on an algorithm that is equivalent to the Kalman filtering and smoothing algorithm used in the UCM procedure. The results of an illustrative example in their article are reproduced here using the UCM procedure. In this example an ARIMA(0,1,1)×(0,1,1)_12 model is applied to the logarithm of the air series in the sashelp.air data set. Four different missing value patterns are considered to highlight different aspects of the problem:

   Data1. The full data set of 144 observations.
   Data2. The set of 78 observations that omit January through November in each of the last 6 years.
   Data3. The data set with the 5 observations July 1949, June, July, and August 1957, and July 1960 missing.
   Data4. The data set with all July observations missing and June and August 1957 also missing.

The following DATA steps create these data sets:

   data Data1;
      set sashelp.air;
      logair = log(air);
   run;

   data Data2;
      set data1;
      if year(date) >= 1955 and month(date) < 12 then logair = .;
   run;

   data Data3;
      set data1;
      if (year(date) = 1949 and month(date) = 7) then logair = .;
      if ( year(date) = 1957 and
           (month(date) = 6 or month(date) = 7 or month(date) = 8))
         then logair = .;
      if (year(date) = 1960 and month(date) = 7) then logair = .;
   run;

   data Data4;
      set data1;
      if month(date) = 7 then logair = .;
      if year(date) = 1957 and (month(date) = 6 or month(date) = 8)
         then logair = .;
   run;
The following statements specify the ARIMA(0,1,1)×(0,1,1)_12 model for the logair series in the first data set (Data1):

   proc ucm data=Data1;
      id date interval=month;
      model logair;
      irregular q=1 sq=1 s=12;
      deplag lags=(1)(12) phi=1 1 noest;
      estimate outest=est1;
      forecast outfor=for1;
   run;
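Written out in backshift notation, the model specified by these statements is the familiar seasonal ("airline") form. The following sketch uses symbols introduced here for illustration: y_t for logair, B for the backshift operator, \theta_1 and \Theta_1 for the nonseasonal and seasonal moving-average parameters, and \epsilon_t for white noise. The sign convention for the moving-average factors is an assumption.

```latex
% ARIMA(0,1,1)x(0,1,1)_12 model for y_t = log(air)
(1 - B)\,(1 - B^{12})\, y_t \;=\; (1 - \theta_1 B)\,(1 - \Theta_1 B^{12})\, \epsilon_t ,
\qquad \epsilon_t \sim \text{white noise}(0, \sigma^2).
```

The left-hand side corresponds to the DEPLAG statement with both lag coefficients fixed at 1, and the right-hand side corresponds to the Q=1, SQ=1, and S=12 options in the IRREGULAR statement.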
Note that the moving-average part of the model is specified by using the Q=, SQ=, and S= options in the IRREGULAR statement, and the differencing operator, (1 - B)(1 - B^12), is specified by using the DEPLAG statement. The model does not contain an intercept term; therefore no LEVEL statement is needed. The parameter estimates are saved in a data set EST1 by using the OUTEST= option in the ESTIMATE statement, and the forecasts and the component estimates are saved in a data set FOR1 by using the OUTFOR= option in the FORECAST statement. The same analysis is performed on the other three data sets, but is not shown here.

Output 29.8.1 resembles Table 1 in Kohn and Ansley (1986). This table is generated by merging the parameter estimates from the four analyses. Only the moving-average parameter estimates and their standard errors are reported. The columns EST1 and STD1 correspond to the estimates for Data1. The parameter estimates and their standard errors for the other three data sets are similarly named. Note that the parameter estimates closely match the parameter estimates in the article. However, their standard errors differ slightly. This difference could be the result of different ways of computing the Hessian at the optimum. The white noise error variance estimates are not reported here, but they agree quite closely with those in the article.

Output 29.8.1 Data Sets 1–4: Parameter Estimates and Standard Errors

   PARAMETER    est1    std1    est2    std2    est3    std3    est4    std4
   MA_1        0.402   0.090   0.457   0.121   0.408   0.092   0.431   0.091
   SMA_1       0.557   0.073   0.758   0.236   0.566   0.075   0.573   0.074
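The analyses for the other three data sets, which the text says are not shown, differ from the Data1 analysis only in the data set names. One way to avoid repeating the statements is a small macro; the macro name fitAirline is hypothetical, and the output names est2 through est4 and for2 through for4 simply follow the EST1 and FOR1 naming pattern used above.

```sas
/* Fit the same ARIMA(0,1,1)x(0,1,1)_12 model to Data&n and save    */
/* the estimates and forecasts in est&n and for&n.                  */
%macro fitAirline(n);
   proc ucm data=Data&n;
      id date interval=month;
      model logair;
      irregular q=1 sq=1 s=12;
      deplag lags=(1)(12) phi=1 1 noest;
      estimate outest=est&n;
      forecast outfor=for&n;
   run;
%mend fitAirline;

%fitAirline(2)
%fitAirline(3)
%fitAirline(4)
```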
Output 29.8.2 resembles Table 2 in Kohn and Ansley (1986). It contains forecasts and their standard errors for the four data sets. The numbers are very close to those in the article.
Output 29.8.2 Data Sets 1–4: Forecasts and Standard Errors

   DATE     for1    std1    for2    std2    for3    std3    for4    std4
   JAN61   6.110   0.037   6.084   0.052   6.110   0.037   6.111   0.037
   FEB61   6.054   0.043   6.091   0.058   6.054   0.043   6.055   0.043
   MAR61   6.172   0.048   6.247   0.063   6.173   0.048   6.174   0.048
   APR61   6.199   0.053   6.205   0.068   6.199   0.053   6.200   0.052
   MAY61   6.233   0.057   6.199   0.072   6.232   0.058   6.233   0.056
   JUN61   6.369   0.061   6.308   0.076   6.367   0.062   6.368   0.060
   JUL61   6.507   0.065   6.409   0.079   6.497   0.067       .       .
   AUG61   6.503   0.069   6.414   0.082   6.503   0.069   6.503   0.067
   SEP61   6.325   0.072   6.299   0.085   6.325   0.072   6.326   0.071
   OCT61   6.209   0.075   6.174   0.087   6.209   0.076   6.209   0.074
   NOV61   6.063   0.079   6.043   0.089   6.064   0.079   6.064   0.077
   DEC61   6.168   0.082   6.174   0.086   6.168   0.082   6.169   0.080
Output 29.8.3 is based on Data2. It resembles Table 3 in Kohn and Ansley (1986). The columns S_SERIES and VS_SERIES in the OUTFOR= data set contain the interpolated values of logair and their variances. The estimate column in Output 29.8.3 reports interpolated values (which are the same as S_SERIES), and the std column reports their standard errors (which are computed as the square root of VS_SERIES) for January–November 1957. The actual logair values for these months, which are missing in Data2, are also provided for comparison. The numbers are very close to those in the article.

Output 29.8.3 Data Set 2: Interpolated Values and Standard Errors

   DATE    logair   estimate     std
   JAN57    5.753      5.733   0.045
   FEB57    5.707      5.738   0.049
   MAR57    5.875      5.893   0.052
   APR57    5.852      5.850   0.054
   MAY57    5.872      5.843   0.055
   JUN57    6.045      5.951   0.055
   JUL57    6.142      6.051   0.055
   AUG57    6.146      6.055   0.054
   SEP57    6.001      5.938   0.052
   OCT57    5.849      5.812   0.049
   NOV57    5.720      5.680   0.045
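The estimate and std columns described above can be derived from the forecast output data set with a short DATA step. The following sketch assumes that the Data2 analysis saved its OUTFOR= data set under the name for2 (following the FOR1 pattern) and that the data set carries the ID variable date; the data set name interp2 is illustrative.

```sas
/* Extract the interpolated values and their standard errors for    */
/* January through November 1957 from the Data2 forecast data set.  */
data interp2;
   set for2;                              /* assumed OUTFOR= data set for Data2 */
   where '01jan1957'd <= date <= '30nov1957'd;
   estimate = s_series;                   /* interpolated (smoothed) value      */
   std      = sqrt(vs_series);            /* standard error from the variance   */
   keep date estimate std;
run;
```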
Output 29.8.4 resembles Table 4 in Kohn and Ansley (1986). These numbers are based on Data3, and they also are very close to those in the article.
Output 29.8.4 Data Set 3: Interpolated Values and Standard Errors

   DATE    logair   estimate     std
   JUL49    4.997      5.013   0.031
   JUN57    6.045      6.024   0.030
   JUL57    6.142      6.147   0.031
   AUG57    6.146      6.148   0.030
   JUL60    6.433      6.409   0.031
Output 29.8.5 resembles Table 5 in Kohn and Ansley (1986). As before, the numbers are very close to those in the article.

Output 29.8.5 Data Set 4: Interpolated Values and Standard Errors

   DATE    logair   estimate     std
   JUN57    6.045      6.023   0.030
   AUG57    6.146      6.147   0.030
The similarity between the outputs in this example and the results shown in Kohn and Ansley (1986) demonstrates that PROC UCM can be effectively used for nonstationary ARIMA models with missing data.
References

Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Transactions on Automatic Control, AC–19, 716–723.

Anderson, T. W. (1971), The Statistical Analysis of Time Series, New York: John Wiley & Sons.

Bloomfield, P. (2000), Fourier Analysis of Time Series, Second Edition, New York: John Wiley & Sons.

Box, G. E. P. and Jenkins, G. M. (1976), Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.

Bozdogan, H. (1987), “Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions,” Psychometrika, 52, 345–370.

Brockwell, P. J. and Davis, R. A. (1991), Time Series: Theory and Methods, Second Edition, New York: Springer-Verlag.

Burnham, K. P. and Anderson, D. R. (1998), Model Selection and Inference: A Practical Information-Theoretic Approach, New York: Springer-Verlag.

Cobb, G. W. (1978), “The Problem of the Nile: Conditional Solution to a Change Point Problem,” Biometrika, 65, 243–251.

de Jong, P. and Chu-Chun-Lin, S. (2003), “Smoothing with an Unknown Initial Condition,” Journal of Time Series Analysis, vol. 24, no. 2, 141–148.

de Jong, P. and Penzer, J. (1998), “Diagnosing Shocks in Time Series,” Journal of the American Statistical Association, vol. 93, no. 442, 796–806.

Durbin, J. and Koopman, S. J. (2001), Time Series Analysis by State Space Methods, Oxford: Oxford University Press.

Hannan, E. J. and Quinn, B. G. (1979), “The Determination of the Order of an Autoregression,” Journal of the Royal Statistical Society, Series B, 41, 190–195.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge: Cambridge University Press.

Harvey, A. C. (2001), “Testing in Unobserved Components Models,” Journal of Forecasting, 20, 1–19.

Hodrick, R. and Prescott, E. (1997), “Postwar U.S. Business Cycles: An Empirical Investigation,” Journal of Money, Credit, and Banking, 29, 1–16.

Hurvich, C. M. and Tsai, C.-L. (1989), “Regression and Time Series Model Selection in Small Samples,” Biometrika, 76, 297–307.

Jones, R. H. (1980), “Maximum Likelihood Fitting of ARMA Models to Time Series with Missing Observations,” Technometrics, 22, 389–396.

Kohn, R. and Ansley, C. F. (1986), “Estimation, Prediction, and Interpolation for ARIMA Models with Missing Data,” Journal of the American Statistical Association, vol. 81, no. 395, 751–761.

Schwarz, G. (1978), “Estimating the Dimension of a Model,” Annals of Statistics, 6, 461–464.

West, M. and Harrison, J. (1999), Bayesian Forecasting and Dynamic Models, Second Edition, New York: Springer-Verlag.
Chapter 30
The VARMAX Procedure

Contents

Overview: VARMAX Procedure
Getting Started: VARMAX Procedure
   Vector Autoregressive Process
   Bayesian Vector Autoregressive Process
   Vector Error Correction Model
   Bayesian Vector Error Correction Model
   Vector Autoregressive Process with Exogenous Variables
   Parameter Estimation and Testing on Restrictions
   Causality Testing
Syntax: VARMAX Procedure
   Functional Summary
   PROC VARMAX Statement
   BY Statement
   CAUSAL Statement
   COINTEG Statement
   ID Statement
   MODEL Statement
   GARCH Statement
   NLOPTIONS Statement
   OUTPUT Statement
   RESTRICT Statement
   TEST Statement
Details: VARMAX Procedure
   Missing Values
   VARMAX Model
   Dynamic Simultaneous Equations Modeling
   Impulse Response Function
   Forecasting
   Tentative Order Selection
   VAR and VARX Modeling
   Bayesian VAR and VARX Modeling
   VARMA and VARMAX Modeling
   Model Diagnostic Checks
   Cointegration
   Vector Error Correction Modeling
   I(2) Model
   Multivariate GARCH Modeling
   Output Data Sets
   OUT= Data Set
   OUTEST= Data Set
   OUTHT= Data Set
   OUTSTAT= Data Set
   Printed Output
   ODS Table Names
   ODS Graphics
   Computational Issues
Examples: VARMAX Procedure
   Example 30.1: Analysis of U.S. Economic Variables
   Example 30.2: Analysis of German Economic Variables
   Example 30.3: Numerous Examples
   Example 30.4: Illustration of ODS Graphics
References
Overview: VARMAX Procedure

Given a multivariate time series, the VARMAX procedure estimates the model parameters and generates forecasts associated with vector autoregressive moving-average processes with exogenous regressors (VARMAX) models. Often, economic or financial variables are not only contemporaneously correlated with each other, they are also correlated with each other’s past values. The VARMAX procedure can be used to model these types of time relationships. In many economic and financial applications, the variables of interest (dependent, response, or endogenous variables) are influenced by variables external to the system under consideration (independent, input, predictor, regressor, or exogenous variables). The VARMAX procedure enables you to model the dynamic relationship both among the dependent variables and between the dependent and independent variables.

VARMAX models are defined in terms of the orders of the autoregressive or moving-average process (or both). When you use the VARMAX procedure, these orders can be specified by options or they can be automatically determined. Criteria for automatically determining these orders include the following:

   - Akaike information criterion (AIC)
   - corrected AIC (AICC)
   - Hannan-Quinn (HQ) criterion
   - final prediction error (FPE)
   - Schwarz Bayesian criterion (SBC), also known as Bayesian information criterion (BIC)

If you do not want to use the automatic order selection, the VARMAX procedure provides autoregressive order identification aids:

   - partial cross-correlations
   - Yule-Walker estimates
   - partial autoregressive coefficients
   - partial canonical correlations

For situations where the stationarity of the time series is in question, the VARMAX procedure provides tests to aid in determining the presence of unit roots and cointegration. These tests include the following:

   - Dickey-Fuller tests
   - Johansen cointegration test for nonstationary vector processes of integrated order one
   - Stock-Watson common trends test for the possibility of cointegration among nonstationary vector processes of integrated order one
   - Johansen cointegration test for nonstationary vector processes of integrated order two

For stationary vector time series (or nonstationary series made stationary by appropriate differencing), the VARMAX procedure provides for vector autoregressive and moving-average (VARMA) and Bayesian vector autoregressive (BVAR) models. To cope with the problem of high dimensionality in the parameters of the VAR model, the VARMAX procedure provides both the vector error correction model (VECM) and the Bayesian vector error correction model (BVECM). Bayesian models are used when prior information about the model parameters is available. The VARMAX procedure also allows independent (exogenous) variables with their distributed lags to influence dependent (endogenous) variables in various models such as VARMAX, BVARX, VECMX, and BVECMX models.

Forecasting is one of the main objectives of multivariate time series analysis. After successfully fitting the VARMAX, BVARX, VECMX, and BVECMX models, the VARMAX procedure computes predicted values based on the parameter estimates and the past values of the vector time series.
The model parameter estimation methods are the following:

   - least squares (LS)
   - maximum likelihood (ML)

The VARMAX procedure provides various hypothesis tests of long-run effects and adjustment coefficients by using the likelihood ratio test based on Johansen cointegration analysis. The VARMAX procedure offers the likelihood ratio test of the weak exogeneity for each variable.

After fitting the model parameters, the VARMAX procedure provides for model checks and residual analysis by using the following tests:

   - Durbin-Watson (DW) statistics
   - F test for autoregressive conditional heteroscedastic (ARCH) disturbance
   - F test for AR disturbance
   - Jarque-Bera normality test
   - Portmanteau test

The VARMAX procedure supports several modeling features, including the following:

   - seasonal deterministic terms
   - subset models
   - multiple regression with distributed lags
   - dead-start model that does not have present values of the exogenous variables
   - GARCH-type multivariate conditional heteroscedasticity models

The VARMAX procedure provides a Granger causality test to determine the Granger-causal relationships between two distinct groups of variables. It also provides the following:

   - infinite order AR representation
   - impulse response function (or infinite order MA representation)
   - decomposition of the predicted error covariances
   - roots of the characteristic functions for both the AR and MA parts to evaluate the proximity of the roots to the unit circle
   - contemporaneous relationships among the components of the vector time series
Getting Started: VARMAX Procedure

This section outlines the use of the VARMAX procedure and gives five different examples of the kinds of models supported.

Vector Autoregressive Process

Let $y_t = (y_{1t}, \ldots, y_{kt})'$, $t = 1, 2, \ldots$, denote a k-dimensional time series vector of random variables of interest. The pth-order VAR process is written as

$$ y_t = \delta + \Phi_1 y_{t-1} + \cdots + \Phi_p y_{t-p} + \epsilon_t $$

where $\epsilon_t$ is a vector white noise process with $\epsilon_t = (\epsilon_{1t}, \ldots, \epsilon_{kt})'$ such that $E(\epsilon_t) = 0$, $E(\epsilon_t \epsilon_t') = \Sigma$, and $E(\epsilon_t \epsilon_s') = 0$ for $t \ne s$; $\delta = (\delta_1, \ldots, \delta_k)'$ is a constant vector and $\Phi_i$ is a $k \times k$ matrix.

Analyzing and modeling the series jointly enables you to understand the dynamic relationships over time among the series and to improve the accuracy of forecasts for individual series by using the additional information available from the related series and their forecasts.
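Example of Vector Autoregressive Model

Consider the first-order stationary bivariate vector autoregressive model

$$ y_t = \begin{pmatrix} 1.2 & -0.5 \\ 0.6 & 0.3 \end{pmatrix} y_{t-1} + \epsilon_t, \quad \text{with } \Sigma = \begin{pmatrix} 1.0 & 0.5 \\ 0.5 & 1.25 \end{pmatrix} $$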
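The model is described as stationary. One way to check that claim is to confirm that every eigenvalue of the coefficient matrix lies inside the unit circle. The following Python sketch is not part of the SAS example; the helper name `eigenvalues_2x2` is illustrative, and the approach (roots of the characteristic polynomial) works only for the 2 x 2 case shown here.

```python
import cmath

# Lag-one coefficient matrix of the example VAR(1) model, as given in the text.
phi = [[1.2, -0.5],
       [0.6,  0.3]]


def eigenvalues_2x2(m):
    """Return both eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)  # complex sqrt handles a negative discriminant
    return (trace + disc) / 2.0, (trace - disc) / 2.0


eigs = eigenvalues_2x2(phi)
moduli = [abs(z) for z in eigs]

# A VAR(1) process is stationary when all eigenvalue moduli are below 1.
is_stationary = all(r < 1.0 for r in moduli)
print("eigenvalue moduli:", moduli)
print("stationary:", is_stationary)
```

Here the eigenvalues form a complex-conjugate pair, so both moduli equal the square root of the determinant of the matrix (0.66), which is below 1.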
The following IML procedure statements simulate a bivariate vector time series from this model to provide test data for the VARMAX procedure:

   proc iml;
      sig = {1.0 0.5, 0.5 1.25};
      phi = {1.2 -0.5, 0.6 0.3};
      /* simulate the vector time series */
      call varmasim(y,phi) sigma = sig n = 100 seed = 34657;
      cn = {'y1' 'y2'};
      create simul1 from y[colname=cn];
      append from y;
   quit;
The following statements plot the simulated vector time series $y_t$ shown in Figure 30.1:

   data simul1;
      set simul1;
      date = intnx( 'year', '01jan1900'd, _n_-1 );
      format date year4.;
   run;

   proc timeseries data=simul1 vectorplot=series;
      id date interval=year;
      var y1 y2;
   run;
Figure 30.1 Plot of Generated Data Process
The following statements fit a VAR(1) model to the simulated data. First, you specify the input data set in the PROC VARMAX statement. Then, you use the MODEL statement to designate the dependent variables, y1 and y2. To estimate a VAR model with mean zero, you specify the order of the autoregressive model with the P= option and the NOINT option. The MODEL statement fits the model to the data and prints parameter estimates and their significance. The PRINT=ESTIMATES option prints the matrix form of parameter estimates, and the PRINT=DIAGNOSE option prints various diagnostic tests. The LAGMAX=3 option is used to print the output for the residual diagnostic checks.

To output the forecasts to a data set, you specify the OUTPUT statement with the OUT= option. If you want to forecast five steps ahead, you use the LEAD=5 option. The ID statement specifies the yearly interval between observations and provides the Time column in the forecast output.

The VARMAX procedure output is shown in Figure 30.2 through Figure 30.10.

   /*--- Vector Autoregressive Model ---*/
   proc varmax data=simul1;
      id date interval=year;
      model y1 y2 / p=1 noint lagmax=3
                    print=(estimates diagnose);
      output out=for lead=5;
   run;
Figure 30.2 Descriptive Statistics

   The VARMAX Procedure

   Number of Observations        100
   Number of Pairwise Missing      0

   Simple Summary Statistics

   Variable  Type         N     Mean       Standard Deviation   Min        Max
   y1        Dependent    100   -0.21653   2.78210              -4.75826   8.37032
   y2        Dependent    100    0.16905   2.58184              -6.04718   9.58487
The VARMAX procedure first displays descriptive statistics. The Type column specifies that the variables are dependent variables. The column N stands for the number of nonmissing observations.

Figure 30.3 shows the type and the estimation method of the fitted model for the simulated data. It also shows the AR coefficient matrix in terms of lag 1, the parameter estimates, and their significance, which can indicate how well the model fits the data. The second table schematically represents the parameter estimates and allows for easy verification of their significance in matrix form. In the last table, the first column gives the left-hand-side variable of the equation; the second column is the parameter name ARl_i_j, which indicates the (i, j)th element of the lag l autoregressive coefficient; the last column is the regressor that corresponds to the displayed parameter.
Figure 30.3 Model Type and Parameter Estimates

   The VARMAX Procedure

   Type of Model        VAR(1)
   Estimation Method    Least Squares Estimation

   AR

   Lag   Variable   y1         y2
   1     y1         1.15977    -0.51058
         y2         0.54634     0.38499

   Schematic Representation

   Variable/Lag   AR1
   y1             +-
   y2             ++

   + is > 2*std error, - is < -2*std error, . is between, * is N/A

   Model Parameter Estimates

   Equation  Parameter  Estimate   Standard Error  t Value  Pr > |t|  Variable
   y1        AR1_1_1     1.15977   0.05508         21.06    0.0001    y1(t-1)
             AR1_1_2    -0.51058   0.05898         -8.66    0.0001    y2(t-1)
   y2        AR1_2_1     0.54634   0.05779          9.45    0.0001    y1(t-1)
             AR1_2_2     0.38499   0.06188          6.22    0.0001    y2(t-1)
The fitted VAR(1) model with estimated standard errors in parentheses is given as

$$ y_t = \begin{pmatrix} 1.160 & -0.511 \\ (0.055) & (0.059) \\ 0.546 & 0.385 \\ (0.058) & (0.062) \end{pmatrix} y_{t-1} + \epsilon_t $$

Clearly, all parameter estimates in the coefficient matrix $\Phi_1$ are significant.

The model can also be written as two univariate regression equations.

$$ y_{1t} = 1.160\, y_{1,t-1} - 0.511\, y_{2,t-1} + \epsilon_{1t} $$
$$ y_{2t} = 0.546\, y_{1,t-1} + 0.385\, y_{2,t-1} + \epsilon_{2t} $$
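The significance of the estimates can be checked by hand, because each t value in Figure 30.3 is the estimate divided by its standard error. The following Python sketch copies the numbers from Figure 30.3; the |t| > 2 cutoff is an assumed rule of thumb that mirrors the "2*std error" legend of the schematic table, not an exact critical value.

```python
# (estimate, standard error) pairs copied from Figure 30.3.
estimates = {
    "AR1_1_1": (1.15977, 0.05508),
    "AR1_1_2": (-0.51058, 0.05898),
    "AR1_2_1": (0.54634, 0.05779),
    "AR1_2_2": (0.38499, 0.06188),
}

# t value = estimate / standard error.
t_values = {name: est / se for name, (est, se) in estimates.items()}

# Assumed rule of thumb: an estimate is "significant" when |t| exceeds 2.
significant = {name: abs(t) > 2.0 for name, t in t_values.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:6.2f}, significant = {significant[name]}")
```

The computed ratios agree with the t Value column to two decimals, and all four exceed the cutoff by a wide margin.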
The table in Figure 30.4 shows the innovation covariance matrix estimates and the various information criteria results. When you compare candidate models, the model with the smaller value of an information criterion fits the data better. The variable names in the covariance matrix are printed for convenience; y1 means the innovation for y1, and y2 means the innovation for y2.

Figure 30.4 Innovation Covariance Estimates and Information Criteria

   Covariances of Innovations

   Variable   y1        y2
   y1         1.28875   0.39751
   y2         0.39751   1.41839

   Information Criteria

   AICC   0.554443
   HQC    0.595201
   AIC    0.552777
   SBC    0.65763
   FPEC   1.738092
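The information criteria in Figure 30.4 can be reproduced from the innovation covariance matrix. The Python sketch below rests on several assumptions that are not stated in this section: the criteria are taken to be the usual log-determinant forms, the number of observations used is T = 99 (100 observations less one presample value), and the maximum likelihood covariance estimate is obtained by rescaling the printed estimate by (T - r_b)/T, where r_b is the number of parameters per equation. With these assumptions the printed values are matched to about five decimals.

```python
import math

# Innovation covariance estimate copied from Figure 30.4.
sigma_hat = [[1.28875, 0.39751],
             [0.39751, 1.41839]]

k = 2          # number of dependent variables
T = 99         # assumption: 100 observations minus 1 presample value
r = 4          # number of estimated AR coefficients
r_b = r // k   # parameters per equation

det_hat = sigma_hat[0][0] * sigma_hat[1][1] - sigma_hat[0][1] * sigma_hat[1][0]

# Assumption: ML covariance = printed (df-adjusted) covariance * (T - r_b) / T,
# so its determinant is rescaled by that factor raised to the power k.
det_ml = det_hat * ((T - r_b) / T) ** k
log_det = math.log(det_ml)

criteria = {
    "AICC": log_det + 2.0 * r / (T - r / k),
    "HQC": log_det + 2.0 * r * math.log(math.log(T)) / T,
    "AIC": log_det + 2.0 * r / T,
    "SBC": log_det + r * math.log(T) / T,
    "FPEC": ((T + r_b) / (T - r_b)) ** k * det_ml,
}

for name, value in criteria.items():
    print(f"{name:5s} {value:.6f}")
```

Because every criterion adds a penalty that grows with the number of parameters r, a larger model is preferred only when it reduces the log determinant by more than the added penalty.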
Figure 30.5 shows the cross covariances of the residuals. The values of the lag zero are slightly different from Figure 30.4 due to the different degrees of freedom.

Figure 30.5 Multivariate Diagnostic Checks

   Cross Covariances of Residuals

   Lag   Variable   y1         y2
   0     y1          1.24909    0.36398
         y2          0.36398    1.34203
   1     y1          0.01635    0.03087
         y2         -0.07210   -0.09834
   2     y1          0.06591    0.07835
         y2          0.01096   -0.05780
   3     y1         -0.00203    0.08804
         y2         -0.02129    0.07277
Figure 30.6 and Figure 30.7 show tests for white noise residuals. The output shows that you cannot reject the null hypothesis that the residuals are uncorrelated.
Figure 30.6 Multivariate Diagnostic Checks Continued

   Cross Correlations of Residuals

   Lag   Variable   y1         y2
   0     y1          1.00000    0.28113
         y2          0.28113    1.00000
   1     y1          0.01309    0.02385
         y2         -0.05569   -0.07328
   2     y1          0.05277    0.06052
         y2          0.00847   -0.04307
   3     y1         -0.00163    0.06800
         y2         -0.01644    0.05422

   Schematic Representation of Cross Correlations of Residuals

   Variable/Lag   0    1    2    3
   y1             ++   ..   ..   ..
   y2             ++   ..   ..   ..

   + is > 2*std error, - is < -2*std error, . is between
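The cross correlations in Figure 30.6 follow from the cross covariances in Figure 30.5: each lag-l covariance is divided by the product of the two lag-zero standard deviations. The Python sketch below illustrates this. The schematic symbols are derived under an assumed approximate standard error of 1/sqrt(T) with T = 99; that approximation is an assumption of this sketch, not something stated in the output.

```python
import math

# Cross covariances of residuals copied from Figure 30.5 (rows: y1, y2; columns: y1, y2).
cross_cov = {
    0: [[1.24909, 0.36398], [0.36398, 1.34203]],
    1: [[0.01635, 0.03087], [-0.07210, -0.09834]],
    2: [[0.06591, 0.07835], [0.01096, -0.05780]],
    3: [[-0.00203, 0.08804], [-0.02129, 0.07277]],
}
T = 99  # assumption: number of residuals (100 observations, one lag)

# Lag-zero standard deviations of the two residual series.
sd = [math.sqrt(cross_cov[0][i][i]) for i in range(2)]


def to_correlation(c):
    """Scale a 2x2 cross-covariance matrix by the lag-zero standard deviations."""
    return [[c[i][j] / (sd[i] * sd[j]) for j in range(2)] for i in range(2)]


cross_corr = {lag: to_correlation(c) for lag, c in cross_cov.items()}

# Assumed approximate standard error of a cross correlation: 1 / sqrt(T).
threshold = 2.0 / math.sqrt(T)


def symbol(rho):
    return "+" if rho > threshold else "-" if rho < -threshold else "."


schematic = {
    lag: ["".join(symbol(v) for v in row) for row in m]
    for lag, m in cross_corr.items()
}

for lag in sorted(cross_corr):
    print(lag, [[round(v, 5) for v in row] for row in cross_corr[lag]], schematic[lag])
```

Only the lag-zero entries exceed the threshold, which is consistent with residuals that are contemporaneously correlated but serially uncorrelated.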
Figure 30.7 Multivariate Diagnostic Checks Continued

   Portmanteau Test for Cross Correlations of Residuals

   Up To Lag   DF   Chi-Square   Pr > ChiSq
   2           4    1.84         0.7659
   3           8    2.57         0.9582
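The portmanteau statistics in Figure 30.7 can be approximated from the cross covariances in Figure 30.5. The Python sketch below assumes the Hosking form of the statistic, Q_s = T^2 * sum over l = 1..s of tr(C_l' C_0^{-1} C_l C_0^{-1}) / (T - l), with k^2 (s - p) degrees of freedom and T = 99; these formulas and the value of T are assumptions of the sketch. The p-value helper uses a closed form that is valid only for even degrees of freedom.

```python
import math

# Cross covariances of residuals copied from Figure 30.5.
cov = {
    0: [[1.24909, 0.36398], [0.36398, 1.34203]],
    1: [[0.01635, 0.03087], [-0.07210, -0.09834]],
    2: [[0.06591, 0.07835], [0.01096, -0.05780]],
    3: [[-0.00203, 0.08804], [-0.02129, 0.07277]],
}
T, k, p = 99, 2, 1  # assumptions: residual count, series count, AR order


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


c0_inv = inverse_2x2(cov[0])
results = {}
running = 0.0
for lag in (1, 2, 3):
    c = cov[lag]
    term = matmul(matmul(transpose(c), c0_inv), matmul(c, c0_inv))
    running += (term[0][0] + term[1][1]) / (T - lag)  # trace divided by (T - lag)
    if lag > p:
        q = T * T * running
        df = k * k * (lag - p)
        results[lag] = (df, q, chi2_sf_even(q, df))

for lag, (df, q, pval) in results.items():
    print(f"up to lag {lag}: DF={df}, chi-square={q:.2f}, p={pval:.4f}")
```

Large p-values, as here, mean that the hypothesis of uncorrelated residuals is not rejected.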
The VARMAX procedure provides diagnostic checks for the univariate form of the equations. The table in Figure 30.8 describes how well each univariate equation fits the data. From the two univariate regression equations in Figure 30.3, the values of $R^2$ in the second column are 0.84 and 0.80 for each equation. The standard deviations in the third column are the square roots of the diagonal elements of the covariance matrix from Figure 30.4. The F statistics in the fourth column test the hypotheses $\phi_{11} = \phi_{12} = 0$ and $\phi_{21} = \phi_{22} = 0$, respectively, where $\phi_{ij}$ is the (i, j)th element of the matrix $\Phi_1$. The last column shows the p-values of the F statistics. The results show that each univariate model is significant.
Figure 30.8 Univariate Diagnostic Checks

   Univariate Model ANOVA Diagnostics

   Variable   R-Square   Standard Deviation   F Value   Pr > F
   y1         0.8369     1.13523              497.67    <.0001
   y2         0.7978     1.19096              382.76    <.0001
The check for white noise residuals in terms of the univariate equation is shown in Figure 30.9. This output contains information that indicates whether the residuals are correlated and heteroscedastic. In the first table, the second column contains the Durbin-Watson test statistics to test the null hypothesis that the residuals are uncorrelated. The third and fourth columns show the Jarque-Bera normality test statistics and their p-values to test the null hypothesis that the residuals have normality. The last two columns show F statistics and their p-values for ARCH(1) disturbances to test the null hypothesis that the residuals have equal covariances. The second table includes F statistics and their p-values for AR(1), AR(1,2), AR(1,2,3) and AR(1,2,3,4) models of residuals to test the null hypothesis that the residuals are uncorrelated.

Figure 30.9 Univariate Diagnostic Checks Continued

   Univariate Model White Noise Diagnostics

                               Normality                  ARCH
   Variable   Durbin Watson    Chi-Square   Pr > ChiSq    F Value   Pr > F
   y1         1.96656          3.32         0.1900        0.13      0.7199
   y2         2.13609          5.46         0.0653        2.10      0.1503

   Univariate Model AR Diagnostics

              AR1                AR2                AR3                AR4
   Variable   F Value  Pr > F    F Value  Pr > F    F Value  Pr > F    F Value  Pr > F
   y1         0.02     0.8980    0.14     0.8662    0.09     0.9629    0.82     0.5164
   y2         0.52     0.4709    0.41     0.6650    0.32     0.8136    0.32     0.8664
The table in Figure 30.10 gives forecasts, their prediction errors, and 95% confidence limits. See the section “Forecasting” on page 1930 for details.
Figure 30.10 Forecasts

   Forecasts

   Variable   Obs   Time   Forecast   Standard Error   95% Confidence Limits
   y1         101   2000   -3.59212   1.13523          -5.81713   -1.36711
              102   2001   -3.09448   1.70915          -6.44435    0.25539
              103   2002   -2.17433   2.14472          -6.37792    2.02925
              104   2003   -1.11395   2.43166          -5.87992    3.65203
              105   2004   -0.14342   2.58740          -5.21463    4.92779
   y2         101   2000   -2.09873   1.19096          -4.43298    0.23551
              102   2001   -2.77050   1.47666          -5.66469    0.12369
              103   2002   -2.75724   1.74212          -6.17173    0.65725
              104   2003   -2.24943   2.01925          -6.20709    1.70823
              105   2004   -1.47460   2.25169          -5.88782    2.93863
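The multistep forecasts and their standard errors in Figure 30.10 can be rebuilt from the fitted VAR(1) model. The Python sketch below assumes the standard VAR(1) recursions: each forecast is the coefficient matrix times the previous forecast, and the h-step prediction error covariance is the sum of Phi^j Sigma (Phi^j)' for j = 0, ..., h-1. It starts from the one-step forecast printed in the figure because the last observation is not shown in this section, and it ignores parameter estimation uncertainty. The variable names are illustrative.

```python
import math

phi = [[1.15977, -0.51058], [0.54634, 0.38499]]    # AR estimates, Figure 30.3
sigma = [[1.28875, 0.39751], [0.39751, 1.41839]]   # innovation covariance, Figure 30.4
lead1 = [-3.59212, -2.09873]                       # one-step forecasts, Figure 30.10
Z975 = 1.959964                                    # 97.5% standard normal quantile


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


forecasts, std_errors, limits = [], [], []
point = lead1
psi = [[1.0, 0.0], [0.0, 1.0]]   # Phi^0
mse = [[0.0, 0.0], [0.0, 0.0]]   # accumulated prediction error covariance

for h in range(1, 6):
    if h > 1:
        # y_hat(h) = Phi * y_hat(h-1)
        point = [sum(phi[i][j] * point[j] for j in range(2)) for i in range(2)]
    contrib = matmul(matmul(psi, sigma), transpose(psi))
    mse = [[mse[i][j] + contrib[i][j] for j in range(2)] for i in range(2)]
    se = [math.sqrt(mse[i][i]) for i in range(2)]
    forecasts.append(point)
    std_errors.append(se)
    limits.append([(point[i] - Z975 * se[i], point[i] + Z975 * se[i]) for i in range(2)])
    psi = matmul(phi, psi)       # Phi^h for the next step

for h in range(5):
    print(h + 1, [round(v, 5) for v in forecasts[h]], [round(v, 5) for v in std_errors[h]])
```

The standard errors grow with the forecast horizon because each additional step adds one more term to the prediction error covariance.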
Bayesian Vector Autoregressive Process

The Bayesian vector autoregressive (BVAR) model is used to avoid problems of collinearity and over-parameterization that often occur with the use of VAR models. BVAR models do this by imposing priors on the AR parameters.

The following statements fit a BVAR(1) model to the simulated data. You specify the PRIOR= option with the hyperparameters. The LAMBDA=0.9 and THETA=0.1 options are hyperparameters controlling the prior covariance. Part of the VARMAX procedure output is shown in Figure 30.11.

   /*--- Bayesian Vector Autoregressive Process ---*/
   proc varmax data=simul1;
      model y1 y2 / p=1 noint
                    prior=(lambda=0.9 theta=0.1);
   run;
The output in Figure 30.11 shows that parameter estimates are slightly different from those in Figure 30.3. By choosing the appropriate priors, you might be able to get more accurate forecasts by using a BVAR model rather than by using an unconstrained VAR model. See the section “Bayesian VAR and VARX Modeling” on page 1947 for details.
Figure 30.11 Parameter Estimates for the BVAR(1) Model

   The VARMAX Procedure

   Type of Model        BVAR(1)
   Estimation Method    Maximum Likelihood Estimation
   Prior Lambda         0.9
   Prior Theta          0.1

   Model Parameter Estimates

   Equation  Parameter  Estimate   Standard Error  t Value  Pr > |t|  Variable
   y1        AR1_1_1     1.05623   0.05050         20.92    0.0001    y1(t-1)
             AR1_1_2    -0.34707   0.04824         -7.19    0.0001    y2(t-1)
   y2        AR1_2_1     0.40068   0.04889          8.20    0.0001    y1(t-1)
             AR1_2_2     0.48728   0.05740          8.49    0.0001    y2(t-1)

   Covariances of Innovations

   Variable   y1        y2
   y1         1.35807   0.44152
   y2         0.44152   1.45070
Vector Error Correction Model

A vector error correction model (VECM) can lead to a better understanding of the nature of any nonstationarity among the different component series and can also improve longer term forecasting over an unconstrained model.

The VECM(p) form with the cointegration rank $r\ (\le k)$ is written as

$$ \Delta y_t = \delta + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Phi_i^* \Delta y_{t-i} + \epsilon_t $$

where $\Delta$ is the differencing operator, such that $\Delta y_t = y_t - y_{t-1}$; $\Pi = \alpha\beta'$, where $\alpha$ and $\beta$ are $k \times r$ matrices; $\Phi_i^*$ is a $k \times k$ matrix.

It has an equivalent VAR(p) representation as described in the preceding section.

$$ y_t = \delta + (I_k + \Pi + \Phi_1^*)\, y_{t-1} + \sum_{i=2}^{p-1} (\Phi_i^* - \Phi_{i-1}^*)\, y_{t-i} - \Phi_{p-1}^*\, y_{t-p} + \epsilon_t $$

where $I_k$ is a $k \times k$ identity matrix.
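Example of Vector Error Correction Model

An example of the second-order nonstationary vector autoregressive model is

$$ y_t = \begin{pmatrix} -0.2 & 0.1 \\ 0.5 & 0.2 \end{pmatrix} y_{t-1} + \begin{pmatrix} 0.8 & 0.7 \\ -0.4 & 0.6 \end{pmatrix} y_{t-2} + \epsilon_t $$

with

$$ \Sigma = \begin{pmatrix} 100 & 0 \\ 0 & 100 \end{pmatrix} \quad \text{and} \quad y_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix} $$

This process can be given the following VECM(2) representation with the cointegration rank one:

$$ \Delta y_t = \begin{pmatrix} -0.4 \\ 0.1 \end{pmatrix} (1, -2)\, y_{t-1} - \begin{pmatrix} 0.8 & 0.7 \\ -0.4 & 0.6 \end{pmatrix} \Delta y_{t-1} + \epsilon_t $$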
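The step from the VAR(2) form to the VECM(2) form can be verified numerically. For p = 2, matching the two representations gives Pi = Phi_1 + Phi_2 - I and Phi*_1 = -Phi_2. The Python sketch below applies these identities to the example coefficients and confirms that Pi equals the outer product of alpha = (-0.4, 0.1)' and beta = (1, -2)', so that it has rank one. The variable names are illustrative.

```python
# VAR(2) coefficient matrices of the example.
phi1 = [[-0.2, 0.1], [0.5, 0.2]]
phi2 = [[0.8, 0.7], [-0.4, 0.6]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# VECM(2) pieces: Pi = Phi_1 + Phi_2 - I and Phi*_1 = -Phi_2.
pi = [[phi1[i][j] + phi2[i][j] - identity[i][j] for j in range(2)] for i in range(2)]
phi1_star = [[-phi2[i][j] for j in range(2)] for i in range(2)]

# Adjustment and cointegrating vectors stated in the text.
alpha = [-0.4, 0.1]
beta = [1.0, -2.0]
alpha_beta = [[alpha[i] * beta[j] for j in range(2)] for i in range(2)]

# A 2x2 matrix of rank one has a zero determinant and a nonzero entry.
det_pi = pi[0][0] * pi[1][1] - pi[0][1] * pi[1][0]

print("Pi          =", pi)
print("alpha*beta' =", alpha_beta)
print("Phi*_1      =", phi1_star)
print("det(Pi)     =", det_pi)
```

The zero determinant reflects the single cointegrating relation: the combination y1 - 2*y2 is stationary even though each series is not.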
The following PROC IML statements generate simulated data for the VECM(2) form specified above and plot the data as shown in Figure 30.12:

   proc iml;
      sig = 100*i(2);
      phi = {-0.2 0.1, 0.5 0.2, 0.8 0.7, -0.4 0.6};
      call varmasim(y,phi) sigma=sig n=100 initial=0
                           seed=45876;
      cn = {'y1' 'y2'};
      create simul2 from y[colname=cn];
      append from y;
   quit;

   data simul2;
      set simul2;
      date = intnx( 'year', '01jan1900'd, _n_-1 );
      format date year4. ;
   run;

   proc timeseries data=simul2 vectorplot=series;
      id date interval=year;
      var y1 y2;
   run;
Figure 30.12 Plot of Generated Data Process
Cointegration Testing

The following statements use the Johansen cointegration rank test. The COINTTEST=(JOHANSEN) option performs the Johansen trace test and is equivalent to specifying COINTTEST with no additional options or the COINTTEST=(JOHANSEN=(TYPE=TRACE)) option.

   /*--- Cointegration Test ---*/
   proc varmax data=simul2;
      model y1 y2 / p=2 noint dftest cointtest=(johansen);
   run;
Figure 30.13 shows the output for Dickey-Fuller tests for the nonstationarity of each series and Johansen cointegration rank test between series.
Figure 30.13 Dickey-Fuller Tests and Cointegration Rank Test

   The VARMAX Procedure

   Unit Root Test

   Variable   Type          Rho      Pr < Rho   Tau     Pr < Tau
   y1         Zero Mean       1.47   0.9628      1.65   0.9755
              Single Mean    -0.80   0.9016     -0.47   0.8916
              Trend         -10.88   0.3573     -2.20   0.4815
   y2         Zero Mean      -0.05   0.6692     -0.03   0.6707
              Single Mean    -6.03   0.3358     -1.72   0.4204
              Trend         -50.49   0.0003     -4.92   0.0006

   Cointegration Rank Test Using Trace

   H0:      H1:                              5% Critical   Drift    Drift in
   Rank=r   Rank>r   Eigenvalue   Trace      Value         in ECM   Process
   0        0        0.5086       70.7279    12.21         NOINT    Constant
   1        1        0.0111        1.0921     4.14
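The Trace column of Figure 30.13 can be reproduced from the Eigenvalue column. The Python sketch below assumes the usual Johansen trace statistic, minus T times the sum of log(1 - lambda_i) over the eigenvalues beyond the hypothesized rank, and assumes T = 98 (100 observations less the two presample values needed for p=2). Because the printed eigenvalues are rounded to four decimals, the results agree with the printed statistics only to about two decimals. The function name is illustrative.

```python
import math

eigenvalues = [0.5086, 0.0111]   # Eigenvalue column of Figure 30.13 (rounded)
critical = [12.21, 4.14]         # 5% critical values from Figure 30.13
T = 98                           # assumption: 100 observations minus 2 presample values


def trace_statistic(eigs, r, n_obs):
    """Trace statistic for H0: rank = r, using eigenvalues r+1, ..., k."""
    return -n_obs * sum(math.log(1.0 - lam) for lam in eigs[r:])


stats = [trace_statistic(eigenvalues, r, T) for r in range(len(eigenvalues))]

# Sequential testing: the estimated rank is the first r whose H0 is not rejected.
rank = len(eigenvalues)
for r, (stat, crit) in enumerate(zip(stats, critical)):
    if stat <= crit:
        rank = r
        break

for r, (stat, crit) in enumerate(zip(stats, critical)):
    print(f"H0: rank={r}  trace={stat:8.4f}  5% critical value={crit}")
print("selected cointegration rank:", rank)
```

The hypothesis of rank 0 is rejected and the hypothesis of rank 1 is not, so the sketch selects a cointegration rank of 1.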
In Dickey-Fuller tests, the second column specifies three types of models, which are zero mean, single mean, or trend. The third column (Rho) and the fifth column (Tau) are the test statistics for unit root testing. Other columns are their p-values. You can see that both series have unit roots. For a description of Dickey-Fuller tests, see the section “PROBDF Function for Dickey-Fuller Tests” on page 158 in Chapter 5, “SAS Macros and Functions.”

In the cointegration rank test, the last two columns explain the drift in the model or process. Since the NOINT option is specified, the model is

$$ \Delta y_t = \Pi y_{t-1} + \Phi_1^* \Delta y_{t-1} + \epsilon_t $$
The column Drift in ECM means there is no separate drift in the error correction model, and the column Drift in Process means the process has a constant drift before differencing.

H0 is the null hypothesis, and H1 is the alternative hypothesis. The first row tests $r = 0$ against $r > 0$; the second row tests $r = 1$ against $r > 1$. The Trace test statistics in the fourth column are computed by $-T \sum_{i=r+1}^{k} \log(1 - \lambda_i)$ where T is the available number of observations and $\lambda_i$ is the eigenvalue in the third column. By default, the critical values at 5% significance level are used for testing. You can compare the test statistics and critical values in each row. There is one cointegrated process in this example since the Trace statistic for testing $r = 0$ against $r > 0$ is greater than the critical value, but the Trace statistic for testing $r = 1$ against $r > 1$ is not greater than the critical value.

The following statements fit a VECM(2) form to the simulated data. From the result in Figure 30.13, the time series are cointegrated with rank=1. You specify the ECM= option with the RANK=1 option. For normalizing the value of the cointegrated vector, you specify the normalized variable with the NORMALIZE= option. The PRINT=(IARR) option provides the VAR(2) representation. The VARMAX procedure output is shown in Figure 30.14 through Figure 30.16.

   /*--- Vector Error-Correction Model ---*/
   proc varmax data=simul2;
      model y1 y2 / p=2 noint lagmax=3
                    ecm=(rank=1 normalize=y1)
                    print=(iarr estimates);
   run;
The ECM= option produces the estimates of the long-run parameter, $\beta$, and the adjustment coefficient, $\alpha$. In Figure 30.14, “1” indicates the first column of the $\alpha$ and $\beta$ matrices. Since the cointegration rank is 1 in the bivariate system, $\alpha$ and $\beta$ are two-dimensional vectors. The estimated cointegrating vector is $\hat{\beta} = (1, -1.96)'$. Therefore, the long-run relationship between $y_{1t}$ and $y_{2t}$ is $y_{1t} = 1.96\, y_{2t}$. The first element of $\hat{\beta}$ is 1 since y1 is specified as the normalized variable.

Figure 30.14 Parameter Estimates for the VECM(2) Form

   The VARMAX Procedure

   Type of Model        VECM(2)
   Estimation Method    Maximum Likelihood Estimation
   Cointegrated Rank    1

   Beta

   Variable   1
   y1          1.00000
   y2         -1.95575

   Alpha

   Variable   1
   y1         -0.46680
   y2          0.10667
Figure 30.15 shows the parameter estimates in terms of lag one coefficients, $y_{t-1}$, and lag one first differenced coefficients, $\Delta y_{t-1}$, and their significance. “Alpha * Beta'” indicates the coefficients of $y_{t-1}$ and is obtained by multiplying the “Alpha” and “Beta” estimates in Figure 30.14. The parameter AR1_i_j corresponds to the elements in the “Alpha * Beta'” matrix. The t values and p-values corresponding to the parameters AR1_i_j are missing since the parameters AR1_i_j have non-Gaussian distributions. The parameter AR2_i_j corresponds to the elements in the differenced lagged AR coefficient matrix. The “D_” prefixed to a variable name in Figure 30.15 implies differencing.
Figure 30.15 Parameter Estimates for the VECM(2) Form

   Parameter Alpha * Beta' Estimates

   Variable   y1         y2
   y1         -0.46680    0.91295
   y2          0.10667   -0.20862

   AR Coefficients of Differenced Lag

   DIF Lag   Variable   y1         y2
   1         y1         -0.74332   -0.74621
             y2          0.40493   -0.57157

   Model Parameter Estimates

   Equation  Parameter  Estimate   Standard Error  t Value  Pr > |t|  Variable
   D_y1      AR1_1_1    -0.46680   0.04786                            y1(t-1)
             AR1_1_2     0.91295   0.09359                            y2(t-1)
             AR2_1_1    -0.74332   0.04526         -16.42   0.0001    D_y1(t-1)
             AR2_1_2    -0.74621   0.04769         -15.65   0.0001    D_y2(t-1)
   D_y2      AR1_2_1     0.10667   0.05146                            y1(t-1)
             AR1_2_2    -0.20862   0.10064                            y2(t-1)
             AR2_2_1     0.40493   0.04867           8.32   0.0001    D_y1(t-1)
             AR2_2_2    -0.57157   0.05128         -11.15   0.0001    D_y2(t-1)
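The “Alpha * Beta'” matrix in Figure 30.15 is the outer product of the Alpha and Beta vectors in Figure 30.14. The short Python sketch below recomputes it from the printed estimates and also restates the implied long-run relation; the variable names are illustrative, and small differences in the last decimal come from rounding in the printed output.

```python
# Estimates copied from Figure 30.14.
alpha = [-0.46680, 0.10667]   # adjustment coefficients
beta = [1.00000, -1.95575]    # cointegrating vector, normalized on y1

# Pi = alpha * beta' (outer product), the coefficient matrix of y(t-1).
pi_hat = [[alpha[i] * beta[j] for j in range(2)] for i in range(2)]

# beta' y = y1 - 1.95575 * y2 = 0 gives the long-run relation y1 = 1.95575 * y2.
long_run_slope = -beta[1] / beta[0]

for row in pi_hat:
    print([round(v, 5) for v in row])
print("long-run relation: y1 =", round(long_run_slope, 2), "* y2")
```

Because the first element of Beta is normalized to 1, the first column of the product simply repeats the Alpha estimates.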
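The fitted model is given as

$$ \Delta y_t = \begin{pmatrix} -0.467 & 0.913 \\ (0.048) & (0.094) \\ 0.107 & -0.209 \\ (0.051) & (0.100) \end{pmatrix} y_{t-1} + \begin{pmatrix} -0.743 & -0.746 \\ (0.045) & (0.048) \\ 0.405 & -0.572 \\ (0.049) & (0.051) \end{pmatrix} \Delta y_{t-1} + \epsilon_t $$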
Figure 30.16 Change the VECM(2) Form to the VAR(2) Model

   Infinite Order AR Representation

   Lag   Variable   y1         y2
   1     y1         -0.21013   0.16674
         y2          0.51160   0.21980
   2     y1          0.74332   0.74621
         y2         -0.40493   0.57157
   3     y1          0.00000   0.00000
         y2          0.00000   0.00000
The PRINT=(IARR) option in the previous SAS statements prints the reparameterized coefficient estimates. Because LAGMAX=3 is specified, coefficient matrices are printed up to lag 3; the lag 3 matrix is zero because the model is of order two. The VECM(2) form in Figure 30.16 can be rewritten as the following second-order vector autoregressive model:

$$ y_t = \begin{pmatrix} -0.210 & 0.167 \\ 0.512 & 0.220 \end{pmatrix} y_{t-1} + \begin{pmatrix} 0.743 & 0.746 \\ -0.405 & 0.572 \end{pmatrix} y_{t-2} + \epsilon_t $$
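The VAR(2) coefficients in Figure 30.16 follow from the VECM(2) estimates in Figure 30.15 through the equivalent VAR(p) representation. For p = 2 the lag one matrix is I + Pi + Phi*_1 and the lag two matrix is -Phi*_1. The Python sketch below applies these two identities to the printed estimates; the variable names are illustrative, and agreement with Figure 30.16 is limited to about four decimals because the inputs are rounded.

```python
# Estimates copied from Figure 30.15.
pi_hat = [[-0.46680, 0.91295], [0.10667, -0.20862]]          # Alpha * Beta'
phi1_star = [[-0.74332, -0.74621], [0.40493, -0.57157]]     # differenced lag 1
identity = [[1.0, 0.0], [0.0, 1.0]]

# VAR(2) representation: Phi_1 = I + Pi + Phi*_1 and Phi_2 = -Phi*_1.
var_phi1 = [[identity[i][j] + pi_hat[i][j] + phi1_star[i][j] for j in range(2)]
            for i in range(2)]
var_phi2 = [[-phi1_star[i][j] for j in range(2)] for i in range(2)]

print("lag 1:", [[round(v, 5) for v in row] for row in var_phi1])
print("lag 2:", [[round(v, 5) for v in row] for row in var_phi2])
```

There is no lag three term because the fitted VECM has only one differenced lag, which is why the lag 3 matrix in Figure 30.16 is zero.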
Bayesian Vector Error Correction Model

Bayesian inference on a cointegrated system begins by using the priors of $\beta$ obtained from the VECM(p) form. Bayesian vector error correction models can improve forecast accuracy for cointegrated processes. The following statements fit a BVECM(2) form to the simulated data. You specify both the PRIOR= and ECM= options for the Bayesian vector error correction model. The VARMAX procedure output is shown in Figure 30.17.

   /*--- Bayesian Vector Error-Correction Model ---*/
   proc varmax data=simul2;
      model y1 y2 / p=2 noint
                    prior=( lambda=0.5 theta=0.2 )
                    ecm=( rank=1 normalize=y1 )
                    print=(estimates);
   run;
Figure 30.17 shows the model type fitted to the data, the estimates of the adjustment coefficient ($\alpha$), the parameter estimates in terms of lag one coefficients ($y_{t-1}$), and lag one first differenced coefficients ($\Delta y_{t-1}$).
Figure 30.17 Parameter Estimates for the BVECM(2) Form

   The VARMAX Procedure

   Type of Model        BVECM(2)
   Estimation Method    Maximum Likelihood Estimation
   Cointegrated Rank    1
   Prior Lambda         0.5
   Prior Theta          0.2

   Alpha

   Variable   1
   y1         -0.34392
   y2          0.16659

   Parameter Alpha * Beta' Estimates

   Variable   y1         y2
   y1         -0.34392    0.67262
   y2          0.16659   -0.32581

   AR Coefficients of Differenced Lag

   DIF Lag   Variable   y1         y2
   1         y1         -0.80070   -0.59320
             y2          0.33417   -0.53480
Vector Autoregressive Process with Exogenous Variables

A VAR process can be affected by other observable variables that are determined outside the system of interest. Such variables are called exogenous (independent) variables. Exogenous variables can be stochastic or nonstochastic. The process can also be affected by the lags of exogenous variables. A model used to describe this process is called a VARX(p,s) model.

The VARX(p,s) model is written as

$$ y_t = \delta + \sum_{i=1}^{p} \Phi_i y_{t-i} + \sum_{i=0}^{s} \Theta_i^* x_{t-i} + \epsilon_t $$

where $x_t = (x_{1t}, \ldots, x_{rt})'$ is an r-dimensional time series vector and $\Theta_i^*$ is a $k \times r$ matrix.

For example, a VARX(1,0) model is

$$ y_t = \delta + \Phi_1 y_{t-1} + \Theta_0^* x_t + \epsilon_t $$

where $y_t = (y_{1t}, y_{2t}, y_{3t})'$ and $x_t = (x_{1t}, x_{2t})'$.

The following statements fit the VARX(1,0) model to the given data:

   data grunfeld;
      input year y1 y2 y3 x1 x2 x3;
      label y1='Gross Investment GE'
            y2='Capital Stock Lagged GE'
            y3='Value of Outstanding Shares GE Lagged'
            x1='Gross Investment W'
            x2='Capital Stock Lagged W'
            x3='Value of Outstanding Shares Lagged W';
   datalines;
   1935 33.1 1170.6 97.8 12.93 191.5 1.8
   1936 45.0 2015.8 104.4 25.90 516.0 .8
   1937 77.2 2803.3 118.0 35.05 729.0 7.4

      ... more lines ...
/*--- Vector Autoregressive Process with Exogenous Variables ---*/ proc varmax data=grunfeld; model y1-y3 = x1 x2 / p=1 lagmax=5 printform=univariate print=(impulsx=(all) estimates); run;
The VARMAX procedure output is shown in Figure 30.18 through Figure 30.20. Figure 30.18 shows the descriptive statistics for the dependent (endogenous) and independent (exogenous) variables with labels.
Figure 30.18 Descriptive Statistics for the VARX(1, 0) Model

The VARMAX Procedure

   Number of Observations        20
   Number of Pairwise Missing     0

   Simple Summary Statistics
   Variable  Type          N        Mean   Standard Deviation         Min         Max
   y1        Dependent    20   102.29000             48.58450    33.10000   189.60000
   y2        Dependent    20  1941.32500            413.84329  1170.60000  2803.30000
   y3        Dependent    20   400.16000            250.61885    97.80000   888.90000
   x1        Independent  20    42.89150             19.11019    12.93000    90.08000
   x2        Independent  20   670.91000            222.39193   191.50000  1193.50000

   Simple Summary Statistics
   Variable  Label
   y1        Gross Investment GE
   y2        Capital Stock Lagged GE
   y3        Value of Outstanding Shares GE Lagged
   x1        Gross Investment W
   x2        Capital Stock Lagged W
Figure 30.19 shows the parameter estimates for the constant, the lag zero coefficients of exogenous variables, and the lag one AR coefficients. From the schematic representation of parameter estimates, the significance of the parameter estimates can be easily verified. The symbol “C” means the constant and “XL0” means the lag zero coefficients of exogenous variables.
Figure 30.19 Parameter Estimates for the VARX(1, 0) Model

The VARMAX Procedure

   Type of Model        VARX(1,0)
   Estimation Method    Least Squares Estimation

   Constant
   Variable     Constant
   y1          -12.01279
   y2          702.08673
   y3          -22.42110

   XLag
   Lag   Variable          x1          x2
   0     y1           1.69281    -0.00859
         y2          -6.09850     2.57980
         y3          -0.02317    -0.01274

   AR
   Lag   Variable          y1          y2          y3
   1     y1           0.23699     0.00763     0.02941
         y2          -2.46656     0.16379    -0.84090
         y3           0.95116     0.00224     0.93801

   Schematic Representation
   Variable/Lag   C    XL0   AR1
   y1             .    +.    ...
   y2             +    .+    ...
   y3             -    ..    +.+

   + is > 2*std error, - is < -2*std error, . is between, * is N/A
Figure 30.20 shows the parameter estimates and their significance.
Figure 30.20 Parameter Estimates for the VARX(1, 0) Model Continued

   Model Parameter Estimates
   Equation  Parameter    Estimate  Standard Error  t Value  Pr > |t|  Variable
   y1        CONST1      -12.01279        27.47108    -0.44    0.6691  1
             XL0_1_1       1.69281         0.54395     3.11    0.0083  x1(t)
             XL0_1_2      -0.00859         0.05361    -0.16    0.8752  x2(t)
             AR1_1_1       0.23699         0.20668     1.15    0.2722  y1(t-1)
             AR1_1_2       0.00763         0.01627     0.47    0.6470  y2(t-1)
             AR1_1_3       0.02941         0.04852     0.61    0.5548  y3(t-1)
   y2        CONST2      702.08673       256.48046     2.74    0.0169  1
             XL0_2_1      -6.09850         5.07849    -1.20    0.2512  x1(t)
             XL0_2_2       2.57980         0.50056     5.15    0.0002  x2(t)
             AR1_2_1      -2.46656         1.92967    -1.28    0.2235  y1(t-1)
             AR1_2_2       0.16379         0.15193     1.08    0.3006  y2(t-1)
             AR1_2_3      -0.84090         0.45304    -1.86    0.0862  y3(t-1)
   y3        CONST3      -22.42110        10.31166    -2.17    0.0487  1
             XL0_3_1      -0.02317         0.20418    -0.11    0.9114  x1(t)
             XL0_3_2      -0.01274         0.02012    -0.63    0.5377  x2(t)
             AR1_3_1       0.95116         0.07758    12.26    0.0001  y1(t-1)
             AR1_3_2       0.00224         0.00611     0.37    0.7201  y2(t-1)
             AR1_3_3       0.93801         0.01821    51.50    0.0001  y3(t-1)
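The schematic representation in Figure 30.19 follows directly from the estimates and standard errors in Figure 30.20 by applying the legend rule (+ if the estimate exceeds two standard errors, - if it is below minus two standard errors, and . otherwise). The following plain-Python sketch (not SAS) rebuilds the symbols from the printed values; the function names are hypothetical, and the * (N/A) case is not handled because it does not occur in this unrestricted fit.

```python
# Illustrative sketch (plain Python, not SAS): rebuild the schematic symbols
# from the estimates and standard errors reported in Figure 30.20.
# The function names schematic_symbol and schematic_row are hypothetical.

def schematic_symbol(estimate, std_error):
    """'+' if estimate > 2*std_error, '-' if estimate < -2*std_error, else '.'."""
    if estimate > 2 * std_error:
        return "+"
    if estimate < -2 * std_error:
        return "-"
    return "."

# (estimate, standard error) pairs per equation, in the order
# CONST, XL0 (x1, x2), AR1 (y1, y2, y3), copied from Figure 30.20.
FIGURE_30_20 = {
    "y1": [(-12.01279, 27.47108), (1.69281, 0.54395), (-0.00859, 0.05361),
           (0.23699, 0.20668), (0.00763, 0.01627), (0.02941, 0.04852)],
    "y2": [(702.08673, 256.48046), (-6.09850, 5.07849), (2.57980, 0.50056),
           (-2.46656, 1.92967), (0.16379, 0.15193), (-0.84090, 0.45304)],
    "y3": [(-22.42110, 10.31166), (-0.02317, 0.20418), (-0.01274, 0.02012),
           (0.95116, 0.07758), (0.00224, 0.00611), (0.93801, 0.01821)],
}

def schematic_row(pairs):
    """Group the symbols into the C, XL0, and AR1 columns of the schematic."""
    symbols = [schematic_symbol(est, se) for est, se in pairs]
    return symbols[0], "".join(symbols[1:3]), "".join(symbols[3:])

SCHEMATIC = {eq: schematic_row(pairs) for eq, pairs in FIGURE_30_20.items()}

if __name__ == "__main__":
    for equation, (c, xl0, ar1) in SCHEMATIC.items():
        print(f"{equation}  {c:<3} {xl0:<4} {ar1}")
```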
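The fitted model is given as follows, with standard errors in parentheses:

$$
\begin{pmatrix} y_{1t} \\ y_{2t} \\ y_{3t} \end{pmatrix}
=
\begin{pmatrix} -12.013 \\ (27.471) \\ 702.086 \\ (256.480) \\ -22.421 \\ (10.312) \end{pmatrix}
+
\begin{pmatrix} 1.693 & -0.009 \\ (0.544) & (0.054) \\ -6.099 & 2.580 \\ (5.078) & (0.501) \\ -0.023 & -0.013 \\ (0.204) & (0.020) \end{pmatrix}
\begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix}
+
\begin{pmatrix} 0.237 & 0.008 & 0.029 \\ (0.207) & (0.016) & (0.049) \\ -2.467 & 0.164 & -0.841 \\ (1.930) & (0.152) & (0.453) \\ 0.951 & 0.002 & 0.938 \\ (0.078) & (0.006) & (0.018) \end{pmatrix}
\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{pmatrix}
+
\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \\ \epsilon_{3t} \end{pmatrix}
$$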
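To see how the fitted equation is used, the following plain-Python sketch (not SAS) evaluates $\delta + \Theta_0 x_t + \Phi_1 y_{t-1}$ for 1936, taking $y_{t-1}$ from the 1935 data line and $x_t$ from the 1936 data line shown earlier. The function names are hypothetical, the coefficients are the five-decimal estimates from Figure 30.20, and the result is only an in-sample fitted value computed by hand, not output of the procedure.

```python
# Illustrative sketch (plain Python, not SAS): evaluate the fitted VARX(1,0)
# equation y_t = delta + Theta0 * x_t + Phi1 * y_{t-1} at one time point.
# The function names matvec and varx10_fitted are hypothetical.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def varx10_fitted(delta, theta0, phi1, x_t, y_prev):
    """Return delta + Theta0 x_t + Phi1 y_{t-1} (the error term is omitted)."""
    exo_part = matvec(theta0, x_t)
    ar_part = matvec(phi1, y_prev)
    return [d + e + a for d, e, a in zip(delta, exo_part, ar_part)]

# Estimates from Figure 30.20 (rows: equations y1, y2, y3).
DELTA = [-12.01279, 702.08673, -22.42110]
THETA0 = [[1.69281, -0.00859],
          [-6.09850, 2.57980],
          [-0.02317, -0.01274]]
PHI1 = [[0.23699, 0.00763, 0.02941],
        [-2.46656, 0.16379, -0.84090],
        [0.95116, 0.00224, 0.93801]]

Y_1935 = [33.1, 1170.6, 97.8]   # y1, y2, y3 from the 1935 data line
X_1936 = [25.90, 516.0]         # x1, x2 from the 1936 data line

FITTED_1936 = varx10_fitted(DELTA, THETA0, PHI1, X_1936, Y_1935)

if __name__ == "__main__":
    observed_1936 = [45.0, 2015.8, 104.4]
    for name, fit, obs in zip(("y1", "y2", "y3"), FITTED_1936, observed_1936):
        print(f"{name}: fitted {fit:10.3f}   observed {obs:10.3f}")
```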
Parameter Estimation and Testing on Restrictions

In the previous example, the VARX(1,0) model is written as

$$ y_t = \delta + \Theta_0 x_t + \Phi_1 y_{t-1} + \epsilon_t $$

with

$$
\Theta_0 = \begin{pmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \\ \theta_{31} & \theta_{32} \end{pmatrix}
\qquad
\Phi_1 = \begin{pmatrix} \phi_{11} & \phi_{12} & \phi_{13} \\ \phi_{21} & \phi_{22} & \phi_{23} \\ \phi_{31} & \phi_{32} & \phi_{33} \end{pmatrix}
$$

In Figure 30.20 of the preceding section, you can see several insignificant parameters. For example, the coefficients XL0_1_2, AR1_1_2, and AR1_3_2 are insignificant.

The following statements impose the restrictions $\theta_{12} = \phi_{12} = \phi_{32} = 0$ on the VARX(1,0) model.

   /*--- Models with Restrictions and Tests ---*/
   proc varmax data=grunfeld;
      model y1-y3 = x1 x2 / p=1 print=(estimates);
      restrict XL(0,1,2)=0, AR(1,1,2)=0, AR(1,3,2)=0;
   run;

The output in Figure 30.21 shows that the three parameters $\theta_{12}$, $\phi_{12}$, and $\phi_{32}$ are replaced by the restricted values, zeros. In the schematic representation of parameter estimates, the three restricted parameters $\theta_{12}$, $\phi_{12}$, and $\phi_{32}$ are replaced by *.
Figure 30.21 Parameter Estimation with Restrictions

The VARMAX Procedure

   XLag
   Lag   Variable          x1          x2
   0     y1           1.67592     0.00000
         y2          -6.30880     2.65308
         y3          -0.03576    -0.00919

   AR
   Lag   Variable          y1          y2          y3
   1     y1           0.27671     0.00000     0.01747
         y2          -2.16968     0.10945    -0.93053
         y3           0.96398     0.00000     0.93412

   Schematic Representation
   Variable/Lag   C    XL0   AR1
   y1             .    +*    .*.
   y2             +    .+    ..-
   y3             -    ..    +*+

   + is > 2*std error, - is < -2*std error, . is between, * is N/A
The output in Figure 30.22 shows the estimates of the Lagrangian parameters and their significance. Based on the p-values associated with the Lagrangian parameters, you cannot reject the null hypotheses $\theta_{12} = 0$, $\phi_{12} = 0$, and $\phi_{32} = 0$ at the 0.05 significance level.

Figure 30.22 RESTRICT Statement Results

   Testing of the Restricted Parameters
   Parameter    Estimate  Standard Error  t Value  Pr > |t|
   XL0_1_2       1.74969        21.44026     0.08    0.9389
   AR1_1_2      30.36254        70.74347     0.43    0.6899
   AR1_3_2      55.42191       164.03075     0.34    0.7524
The TEST statement in the following example tests $\phi_{31} = 0$ and $\theta_{12} = \phi_{12} = \phi_{32} = 0$ for the VARX(1,0) model:

   proc varmax data=grunfeld;
      model y1-y3 = x1 x2 / p=1;
      test AR(1,3,1)=0;
      test XL(0,1,2)=0, AR(1,1,2)=0, AR(1,3,2)=0;
   run;
In the output in Figure 30.23, the first column is the index corresponding to each TEST statement. You can reject the hypothesis test $\phi_{31} = 0$ at the 0.05 significance level, but you cannot reject the joint hypothesis test $\theta_{12} = \phi_{12} = \phi_{32} = 0$ at the 0.05 significance level.

Figure 30.23 TEST Statement Results

The VARMAX Procedure

   Testing of the Parameters
   Test   DF   Chi-Square   Pr > ChiSq
   1       1       150.31       <.0001
   2       3         0.34       0.9522
Causality Testing

The following statements use the CAUSAL statement to compute the Granger causality test for a VAR(1) model. For the Granger causality tests, the autoregressive order should be defined by the P= option in the MODEL statement. The variable groups are defined in the MODEL statement as well. Regardless of whether the variables specified in the GROUP1= and GROUP2= options are designated as dependent or exogenous (independent) variables in the MODEL statement, the CAUSAL statement fits the VAR(p) model by considering the variables in the two groups as dependent variables.

   /*--- Causality Testing ---*/
   proc varmax data=grunfeld;
      model y1-y3 = x1 x2 / p=1 noprint;
      causal group1=(x1) group2=(y1-y3);
      causal group1=(y3) group2=(y1 y2);
   run;
The output in Figure 30.24 is associated with the CAUSAL statement. The first CAUSAL statement fits the VAR(1) model by using the variables y1, y2, y3, and x1. The second CAUSAL statement fits the VAR(1) model by using the variables y1, y3, and y2.
Figure 30.24 CAUSAL Statement Results

The VARMAX Procedure

   Granger-Causality Wald Test
   Test   DF   Chi-Square   Pr > ChiSq
   1       3         2.40       0.4946
   2       2       262.88       <.0001

   Test 1:   Group 1 Variables:   x1
             Group 2 Variables:   y1 y2 y3

   Test 2:   Group 1 Variables:   y3
             Group 2 Variables:   y1 y2
The null hypothesis of the Granger causality test is that GROUP1 is influenced only by itself, and not by GROUP2. The first column in the output is the index corresponding to each CAUSAL statement. The output shows that you cannot reject that x1 is influenced by itself and not by (y1, y2, y3) at the 0.05 significance level for Test 1. You can reject that y3 is influenced by itself and not by (y1, y2) for Test 2. See the section “VAR and VARX Modeling” on page 1941 for details.
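The decisions above compare each Wald statistic with a chi-square distribution whose degrees of freedom equal the number of restrictions. As a rough check, the following plain-Python sketch (not SAS) computes the upper-tail probabilities for the two statistics in Figure 30.24. The function name chi2_sf is hypothetical, and because the inputs are the rounded statistics from the figure, the results agree with the printed p-values only approximately.

```python
# Illustrative sketch (plain Python, not SAS): upper-tail chi-square
# probabilities for the Wald statistics printed in Figure 30.24.
# The function name chi2_sf is hypothetical.
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, d = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        q, d = math.exp(-half), 2              # closed form for df = 2
    while d < df:
        # Q(d + 2, x) = Q(d, x) + (x/2)^(d/2) * exp(-x/2) / Gamma(d/2 + 1)
        q += half ** (d / 2.0) * math.exp(-half) / math.gamma(d / 2.0 + 1.0)
        d += 2
    return q

# (chi-square statistic, degrees of freedom) from Figure 30.24.
TESTS = {1: (2.40, 3), 2: (262.88, 2)}
P_VALUES = {test: chi2_sf(stat, df) for test, (stat, df) in TESTS.items()}
REJECT_AT_05 = {test: p < 0.05 for test, p in P_VALUES.items()}

if __name__ == "__main__":
    for test, p in P_VALUES.items():
        print(f"Test {test}: p-value {p:.4g}, reject at 0.05: {REJECT_AT_05[test]}")
```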
Syntax: VARMAX Procedure

   PROC VARMAX options ;
      BY variables ;
      CAUSAL group1=(variables) group2=(variables) ;
      COINTEG rank=number < options > ;
      ID variable interval=value < option > ;
      MODEL dependent variables < =regressors >
            < , dependent variables < =regressors > . . . >
            < / options > ;
      GARCH options ;
      NLOPTIONS options ;
      OUTPUT < options > ;
      RESTRICT restrictions ;
      TEST restrictions ;
Functional Summary

The statements and options used with the VARMAX procedure are summarized in the following table:

Table 30.1 VARMAX Functional Summary

   Description                                                    Statement   Option

   Data Set Options
   specify the input data set                                     VARMAX      DATA=
   write parameter estimates to an output data set                VARMAX      OUTEST=
   include covariances in the OUTEST= data set                    VARMAX      OUTCOV
   write the diagnostic checking tests for a model and the        VARMAX      OUTSTAT=
     cointegration test results to an output data set
   write actuals, predictions, residuals, and confidence          OUTPUT      OUT=
     limits to an output data set
   write the conditional covariance matrix to an output           GARCH       OUTHT=
     data set

   BY Groups
   specify BY-group processing                                    BY

   ID Variable
   specify identifying variable                                   ID
   specify the time interval between observations                 ID          INTERVAL=
   control the alignment of SAS Date values                       ID          ALIGN=

   Options to Control the Optimization Process
   specify the optimization options                               NLOPTIONS

   Printing Control Options
   specify how many lags to print results                         MODEL       LAGMAX=
   suppress the printed output                                    MODEL       NOPRINT
   request all printing options                                   MODEL       PRINTALL
   request the printing format                                    MODEL       PRINTFORM=
   controls plots produced through ODS GRAPHICS                   VARMAX      PLOTS=

   PRINT= Option
   print the correlation matrix of parameter estimates            MODEL       CORRB
   print the cross-correlation matrices of independent            MODEL       CORRX
     variables
   print the cross-correlation matrices of dependent variables    MODEL       CORRY
   print the covariance matrices of prediction errors             MODEL       COVPE
   print the cross-covariance matrices of the independent         MODEL       COVX
     variables
   print the cross-covariance matrices of the dependent           MODEL       COVY
     variables
   print the covariance matrix of parameter estimates             MODEL       COVB
   print the decomposition of the prediction error covariance     MODEL       DECOMPOSE
     matrix
   print the residual diagnostics                                 MODEL       DIAGNOSE
   print the contemporaneous relationships among the              MODEL       DYNAMIC
     components of the vector time series
   print the parameter estimates                                  MODEL       ESTIMATES
   print the infinite order AR representation                     MODEL       IARR
   print the impulse response function                            MODEL       IMPULSE=
   print the impulse response function in the transfer function   MODEL       IMPULSX=
   print the partial autoregressive coefficient matrices          MODEL       PARCOEF
   print the partial canonical correlation matrices               MODEL       PCANCORR
   print the partial correlation matrices                         MODEL       PCORR
   print the eigenvalues of the companion matrix                  MODEL       ROOTS
   print the Yule-Walker estimates                                MODEL       YW

   Model Estimation and Order Selection Options
   center the dependent variables                                 MODEL       CENTER
   specify the degrees of differencing for the specified          MODEL       DIF=
     model variables
   specify the degrees of differencing for all independent        MODEL       DIFX=
     variables
   specify the degrees of differencing for all dependent          MODEL       DIFY=
     variables
   specify the vector error correction model                      MODEL       ECM=
   specify the estimation method                                  MODEL       METHOD=
   select the tentative order                                     MODEL       MINIC=
   suppress the current values of independent variables           MODEL       NOCURRENTX
   suppress the intercept parameters                              MODEL       NOINT
   specify the number of seasonal periods                         MODEL       NSEASON=
   specify the order of autoregressive polynomial                 MODEL       P=
   specify the Bayesian prior model                               MODEL       PRIOR=
   specify the order of moving-average polynomial                 MODEL       Q=
   center the seasonal dummies                                    MODEL       SCENTER
   specify the degree of time trend polynomial                    MODEL       TREND=
   specify the denominator for error covariance matrix            MODEL       VARDEF=
     estimates
   specify the lag order of independent variables                 MODEL       XLAG=

   GARCH Related Options
   specify the GARCH-type model                                   GARCH       FORM=
   specify the order of the GARCH polynomial                      GARCH       P=
   specify the order of the ARCH polynomial                       GARCH       Q=

   Cointegration Related Options
   print the results from the weak exogeneity test of the         COINTEG     EXOGENEITY
     long-run parameters
   specify the restriction on the cointegrated coefficient        COINTEG     H=
     matrix
   specify the restriction on the adjustment coefficient          COINTEG     J=
     matrix
   specify the variable name whose cointegrating vectors are      COINTEG     NORMALIZE=
     normalized
   specify a cointegration rank                                   COINTEG     RANK=
   print the Johansen cointegration rank test                     MODEL       COINTTEST=(JOHANSEN= )
   print the Stock-Watson common trends test                      MODEL       COINTTEST=(SW= )
   print the Dickey-Fuller unit root test                         MODEL       DFTEST=

   Tests and Restrictions on Parameters
   test the Granger causality                                     CAUSAL      GROUP1= GROUP2=
   place and test restrictions on parameter estimates             RESTRICT
   test hypotheses on parameter estimates                         TEST

   Forecasting Control Options
   specify the size of confidence limits for forecasting          OUTPUT      ALPHA=
   start forecasting before end of the input data                 OUTPUT      BACK=
   specify how many periods to forecast                           OUTPUT      LEAD=
   suppress the printed forecasts                                 OUTPUT      NOPRINT

PROC VARMAX Statement

   PROC VARMAX options ;

The following options can be used in the PROC VARMAX statement:
DATA=SAS-data-set
   specifies the input SAS data set. If the DATA= option is not specified, the PROC VARMAX statement uses the most recently created SAS data set.

OUTEST=SAS-data-set
   writes the parameter estimates to the output data set.

COVOUT
OUTCOV
   writes the covariance matrix for the parameter estimates to the OUTEST= data set. This option is valid only if the OUTEST= option is specified.

OUTSTAT=SAS-data-set
   writes residual diagnostic results to an output data set. If the COINTTEST=(JOHANSEN) option is specified, the results of this option are also written to the output data set.

   The following statements are examples of these options in the PROC VARMAX statement:

   proc varmax data=one outest=est outcov outstat=stat;
      model y1-y3 / p=1;
   run;

   proc varmax data=one outest=est outstat=stat;
      model y1-y3 / p=1 cointtest=(johansen);
   run;

PLOTS< (global-plot-option) > = plot-request-option < (options) >
PLOTS< (global-plot-option) > = ( plot-request-option < (options) > ... plot-request-option < (options) > )
   controls the plots produced through ODS Graphics. When you specify only one plot, you can omit the parentheses around the plot request. Some examples follow:

   plots=none
   plots=all
   plots(unpack)=residual(residual normal)
   plots=(forecasts model)

   You must enable ODS Graphics before requesting plots, as shown in the following example. For general information about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS” (SAS/STAT User’s Guide).

   ods graphics on;

   proc varmax data=one plots=impulse(simple);
      model y1-y3 / p=1;
   run;

   proc varmax data=one plots=(model residual);
      model y1-y3 / p=1;
   run;

   proc varmax data=one plots=forecasts;
      model y1-y3 / p=1;
      output lead=12;
   run;
   The first VARMAX program produces the simple impulse response plots. The second VARMAX program produces the plots associated with the model and prediction errors. The plots associated with prediction errors are the ACF, PACF, IACF, distribution, white-noise, and Normal quantile plots and the prediction error plot. The third VARMAX program produces the FORECASTS and FORECASTSONLY plots.

   The global-plot-option applies to the impulse and prediction error analysis plots generated by the VARMAX procedure. The following global-plot-option is available:

   UNPACK
      breaks a graphic that is otherwise paneled into individual component plots.

   The following plot-request-options are available:

   ALL
      produces all plots appropriate for the particular analysis.

   FORECASTS < (forecasts-plot-options ) >
      produces plots of the forecasts. The forecastsonly plot that shows the multistep forecasts in the forecast region is produced by default. The following forecasts-plot-options are available:

      ALL             produces the FORECASTSONLY and the FORECASTS plots. This is the default.
      FORECASTS       produces a plot that shows the one-step-ahead as well as the multistep forecasts.
      FORECASTSONLY   produces a plot that shows only the multistep forecasts.

   IMPULSE < (impulse-plot-options ) >
      produces the plots of the impulse response function and the impulse response of the transfer function. The following impulse-plot-options are available:

      ALL      produces all impulse plots. This is the default.
      ACCUM    produces the accumulated impulse plot.
      ORTH     produces the orthogonalized impulse plot.
      SIMPLE   produces the simple impulse plot.

   MODEL
      produces plots of the dependent variables listed in the MODEL statement and plots of the one-step-ahead predicted values for each dependent variable.

   NONE
      suppresses all plots.

   RESIDUAL < (residual-plot-options ) >
      produces plots associated with the prediction errors obtained after modeling the data. The following residual-plot-options are available:

      ALL           produces all plots associated with the analysis of the prediction errors. This is the default.
      RESIDUAL      produces the prediction error plot.
      DIAGNOSTICS   produces a panel of plots useful in assessing the autocorrelations and white-noise of the prediction errors. The panel consists of the following:
                       the autocorrelation plot of the prediction errors
                       the partial autocorrelation plot of the prediction errors
                       the inverse autocorrelation plot of the prediction errors
                       the log scaled white noise plot of the prediction errors
      NORMAL        produces a panel of plots useful in assessing normality of the prediction errors. The panel consists of the following:
                       the distribution of the prediction errors with the normal curve overlaid
                       the normal quantile plot of the prediction errors
Other Options

In addition, any of the following MODEL statement options can be specified in the PROC VARMAX statement, which is equivalent to specifying the option for every MODEL statement: CENTER, DFTEST=, DIF=, DIFX=, DIFY=, LAGMAX=, METHOD=, MINIC=, NOCURRENTX, NOINT, NOPRINT, NSEASON=, P=, PRINT=, PRINTALL, PRINTFORM=, Q=, SCENTER, TREND=, VARDEF=, and XLAG= options.

The following is an example of the options in the PROC VARMAX statement:

   proc varmax data=one lagmax=3 method=ml;
      model y1-y3 / p=1;
   run;
BY Statement

   BY variables ;

A BY statement can be used with PROC VARMAX to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

   Sort the data using the SORT procedure with a similar BY statement.

   Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the VARMAX procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

   Create an index on the BY variables using the DATASETS procedure.

For more information about the BY statement, see the discussion in SAS Language Reference: Concepts. For more information about the DATASETS procedure, see the discussion in the Base SAS Procedures Guide.

The following is an example of the BY statement:

   proc varmax data=one;
      by region;
      model y1-y3 / p=1;
   run;
CAUSAL Statement

   CAUSAL GROUP1=( variables) GROUP2=( variables) ;

A CAUSAL statement prints the Granger causality test by fitting the VAR(p) model by using all variables defined in GROUP1 and GROUP2. Any number of CAUSAL statements can be specified. The CAUSAL statement proceeds with the MODEL statement and uses the variables and the autoregressive order, p, specified in the MODEL statement. Variables in the GROUP1= and GROUP2= options should be defined in the MODEL statement. If the P=0 option is specified in the MODEL statement, the CAUSAL statement is not applicable.

The null hypothesis of the Granger causality test is that GROUP1 is influenced only by itself, and not by GROUP2. If the hypothesis test fails to reject the null, then the variables listed in GROUP1 might be considered as independent variables.

See the section “VAR and VARX Modeling” on page 1941 for details.

The following is an example of the CAUSAL statement. You specify the CAUSAL statement with the GROUP1= and GROUP2= options.

   proc varmax data=one;
      model y1-y3 = x1 / p=1;
      causal group1=(x1) group2=(y1-y3);
      causal group1=(y2) group2=(y1 y3);
   run;
The first CAUSAL statement fits the VAR(1) model by using the variables y1, y2, y3, and x1 and tests the null hypothesis that x1 causes the other variables, y1, y2, and y3, but the other variables do not cause x1. The second CAUSAL statement fits the VAR(1) model by using the variables y1, y3, and y2 and tests the null hypothesis that y2 causes the other variables, y1 and y3, but the other variables do not cause y2.
COINTEG Statement

   COINTEG RANK=number < H=( matrix) > < J=( matrix) > < EXOGENEITY > < NORMALIZE=variable > ;

The COINTEG statement fits the vector error correction model to the data, tests the restrictions of the long-run parameters and the adjustment parameters, and tests for the weak exogeneity in the long-run parameters. The cointegrated system uses the maximum likelihood analysis proposed by Johansen and Juselius (1990) and Johansen (1995a, 1995b). Only one COINTEG statement is allowed.

You specify the ECM= option in the MODEL statement or the COINTEG statement to fit the VECM(p). The P= option in the MODEL statement is used to specify the autoregressive order of the VECM.

The following statements are equivalent for fitting a VECM(2).

   proc varmax data=one;
      model y1-y3 / p=2 ecm=(rank=1);
   run;

   proc varmax data=one;
      model y1-y3 / p=2;
      cointeg rank=1;
   run;

To test restrictions of either $\alpha$ or $\beta$ or both, you specify either J= or H= or both, respectively. You specify the EXOGENEITY option in the COINTEG statement for tests of the weak exogeneity in the long-run parameters.

The following is an example of the COINTEG statement.

   proc varmax data=one;
      model y1-y3 / p=2;
      cointeg rank=1 h=(1 0, -1 0, 0 1)
              j=(1 0, 0 0, 0 1) exogeneity;
   run;
The following options can be used in the COINTEG statement:

EXOGENEITY
   formulates the likelihood ratio tests for testing weak exogeneity in the long-run parameters. The null hypothesis is that one variable is weakly exogenous for the others.

H=(matrix)
   specifies the restrictions H on the $k \times r$ or $(k+1) \times r$ cointegrated coefficient matrix $\beta$ such that $\beta = H\phi$, where H is known and $\phi$ is unknown. If the VECM(p) is specified with the COINTEG statement or with the ECM= option in the MODEL statement and the ECTREND option is not included with the ECM= specification, then the H matrix has dimension $k \times m$. If the VECM(p) is specified with the COINTEG statement or with the ECM= option in the MODEL statement and the ECTREND option is also used, then the H matrix has dimension $(k+1) \times m$. Here k is the number of dependent variables, and m satisfies $r \le m < k$, where r is defined with the RANK=r option.

   For example, consider a system that contains four variables and the RANK=1 option with $\beta = (\beta_1, \beta_2, \beta_3, \beta_4)'$. The restriction matrix for the test of $\beta_1 + \beta_2 = 0$ can be specified as

   cointeg rank=1 h=(1 0 0, -1 0 0, 0 1 0, 0 0 1);

   Here the matrix H is $4 \times 3$ where $k = 4$ and $m = 3$, and each row of the matrix H is separated by commas.

   When the series has no separate deterministic trend, the constant term should be restricted by $\alpha_{\perp}' \delta = 0$. In the preceding example, the $\beta$ can be either $\beta = (\beta_1, \beta_2, \beta_3, \beta_4, 1)'$ or $\beta = (\beta_1, \beta_2, \beta_3, \beta_4, t)'$. You can specify the restriction matrix for the previous test of $\beta_1 + \beta_2 = 0$ as follows:

   cointeg rank=1 h=(1 0 0 0, -1 0 0 0, 0 1 0 0, 0 0 1 0, 0 0 0 1);

   When the cointegrated system contains three dependent variables and the RANK=2 option, you can specify the restriction matrix for the test of $\beta_{1j} = -\beta_{2j}$ for $j = 1, 2$ as follows:

   cointeg rank=2 h=(1 0, -1 0, 0 1);

J=(matrix)
   specifies the restrictions J on the $k \times r$ adjustment matrix $\alpha$ such that $\alpha = J\psi$, where J is known and $\psi$ is unknown. The $k \times m$ matrix J is specified by using this option, where k is the number of dependent variables, m satisfies $r \le m < k$, and r is defined with the RANK=r option.

   For example, when the system contains four variables and the RANK=1 option is used, you can specify the restriction matrix for the test of $\alpha_j = 0$ for $j = 2, 3, 4$ as follows:

   cointeg rank=1 j=(1, 0, 0, 0);

   When the system contains three variables and the RANK=2 option, you can specify the restriction matrix for the test of $\alpha_{2j} = 0$ for $j = 1, 2$ as follows:

   cointeg rank=2 j=(1 0, 0 0, 0 1);

NORMALIZE=variable
   specifies a single dependent (endogenous) variable name whose cointegrating vectors are normalized. If the variable name is different from that specified in the COINTTEST=(JOHANSEN= ) or ECM= option in the MODEL statement, the variable name specified in the COINTEG statement is used. If the normalized variable is not specified, cointegrating vectors are not normalized.

RANK=number
   specifies the cointegration rank of the cointegrated system. This option is required in the COINTEG statement. The rank of cointegration should be greater than zero and less than the number of dependent (endogenous) variables. If the value of the RANK= option in the COINTEG statement is different from that specified in the ECM= option, the rank specified in the COINTEG statement is used.
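The effect of an H= restriction matrix can be checked by hand: any $\beta$ of the form $H\phi$ automatically satisfies the restriction that H encodes. The following plain-Python sketch (not SAS) uses the 4 x 3 matrix from the first H= example above; the function names are hypothetical, and the values of $\phi$ are arbitrary illustrative numbers, not estimates.

```python
# Illustrative sketch (plain Python, not SAS): show that beta = H * phi
# satisfies beta1 + beta2 = 0 for the H matrix of the first H= example.
# The function names matvec and restricted_beta are hypothetical.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def restricted_beta(h, phi):
    """Return beta = H * phi for a rank-1 cointegrating vector."""
    return matvec(h, phi)

# h=(1 0 0, -1 0 0, 0 1 0, 0 0 1): rows are separated by commas in SAS.
H = [[1, 0, 0],
     [-1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]

PHI = [0.7, -1.2, 0.4]          # arbitrary free parameters (m = 3)
BETA = restricted_beta(H, PHI)  # k = 4 restricted coefficients

if __name__ == "__main__":
    print("beta =", BETA)
    print("beta1 + beta2 =", BETA[0] + BETA[1])
```

The same reading applies to the J= option, with $\alpha = J\psi$ in place of $\beta = H\phi$.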
ID Statement

   ID variable INTERVAL=value < ALIGN=value > ;

The ID statement specifies a variable that identifies observations in the input data set. The datetime variable specified in the ID statement is included in the OUT= data set if the OUTPUT statement is specified. Note that the ID variable is usually a SAS datetime variable. The values of the ID variable are extrapolated for the forecast observations based on the value of the INTERVAL= option.

ALIGN=value
   controls the alignment of SAS dates used to identify output observations. The ALIGN= option allows the following values: BEGINNING | BEG | B, MIDDLE | MID | M, and ENDING | END | E. The default is BEGINNING. The ALIGN= option is used to align the ID variable to the beginning, middle, or end of the time ID interval specified by the INTERVAL= option.

INTERVAL=value
   specifies the time interval between observations. This option is required in the ID statement. The INTERVAL= option is used in conjunction with the ID variable to check that the input data are in order and have no missing periods. The INTERVAL= option is also used to extrapolate the ID values past the end of the input data when the OUTPUT statement is specified.

The following is an example of the ID statement:

   proc varmax data=one;
      id date interval=qtr align=mid;
      model y1-y3 / p=1;
   run;
MODEL Statement

   MODEL dependents < = regressors > < , dependents < = regressors > . . . > < / options > ;

The MODEL statement specifies dependent (endogenous) variables and independent (exogenous) variables for the VARMAX model. The multivariate model can have the same or different independent variables corresponding to the dependent variables. As a special case, the VARMAX procedure allows you to analyze one dependent variable. Only one MODEL statement is allowed.

For example, the following statements are equivalent ways of specifying the multivariate model for the vector (y1, y2, y3):

   model y1 y2 y3 ;
   model y1-y3 ;

The following statements are equivalent ways of specifying the multivariate model with independent variables, where y1, y2, y3, and y4 are the dependent variables and x1, x2, x3, x4, and x5 are the independent variables:

   model y1 y2 y3 y4 = x1 x2 x3 x4 x5 ;
   model y1 y2 y3 y4 = x1-x5 ;
   model y1 = x1-x5, y2 = x1-x5, y3 y4 = x1-x5 ;
   model y1-y4 = x1-x5 ;

When the multivariate model has different independent variables that correspond to each of the dependent variables, equations are separated by commas (,) and the model can be specified as illustrated by the following MODEL statement:

   model y1 = x1-x3, y2 = x3-x5, y3 y4 = x1-x5 ;

The following options can be used in the MODEL statement after a forward slash (/):

CENTER
   centers the dependent (endogenous) variables by subtracting their means. Note that centering is done after differencing when the DIF= or DIFY= option is specified. If there are exogenous (independent) variables, this option is not applicable.

   model y1 y2 / p=1 center;

DIF(variable (number-list) < ... variable (number-list) >)
DIF=(variable (number-list) < ... variable (number-list) >)
   specifies the degrees of differencing to be applied to the specified dependent or independent variables. The number-list must contain one or more numbers, each of which should be greater than zero. The differencing can be the same for all variables, or it can vary among variables. For example, the DIF=(y1 (1,4) y3 (1) x2 (2)) option specifies that the series y1 is differenced at lag 1 and at lag 4, which is

   $$ (1 - B^4)(1 - B)\, y_{1t} = (y_{1t} - y_{1,t-1}) - (y_{1,t-4} - y_{1,t-5}); $$

   the series y3 is differenced at lag 1, which is $(y_{3t} - y_{3,t-1})$; and the series x2 is differenced at lag 2, which is $(x_{2t} - x_{2,t-2})$.
   The following statement uses the data dy1, y2, x1, and dx2, where $dy1 = (1 - B)\, y_{1t}$ and, by the lag convention just described, $dx2 = (1 - B^2)\, x_{2t}$.

   model y1 y2 = x1 x2 / p=1 dif=(y1(1) x2(2));
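The lag-differencing convention of the DIF= option can be reproduced by applying one difference per entry of the number-list, in sequence. The following plain-Python sketch (not SAS) does this for a made-up series and confirms that differencing at lags 1 and 4 matches $(y_t - y_{t-1}) - (y_{t-4} - y_{t-5})$. The function name difference is hypothetical, and the handling of lost leading observations is an assumption about presentation, not a description of what the procedure stores.

```python
# Illustrative sketch (plain Python, not SAS): apply successive lag differences
# in the way the DIF= number-list is described, e.g. (1,4) -> (1 - B^4)(1 - B).
# The function name difference is hypothetical.

def difference(series, lags):
    """Difference `series` at each lag in turn; leading values are dropped."""
    out = list(series)
    for lag in lags:
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out

Y = [t * t for t in range(8)]    # made-up data: 0, 1, 4, 9, 16, 25, 36, 49

D_1_4 = difference(Y, [1, 4])    # differenced at lag 1 and at lag 4
D_2 = difference(Y, [2])         # differenced at lag 2 only

# Direct evaluation of (y_t - y_{t-1}) - (y_{t-4} - y_{t-5}) for t = 5, 6, 7.
DIRECT = [(Y[t] - Y[t - 1]) - (Y[t - 4] - Y[t - 5]) for t in range(5, len(Y))]

if __name__ == "__main__":
    print("lags (1,4):", D_1_4)
    print("direct    :", DIRECT)
    print("lag 2     :", D_2)
```

Each difference at lag d removes d leading observations, so differencing at lags 1 and 4 leaves five fewer values than the original series.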
DIFX(number-list)
DIFX=(number-list)
   specifies the degrees of differencing to be applied to all independent variables. The number-list must contain one or more numbers, each of which should be greater than zero. For example, the DIFX=(1) option specifies that all of the independent series are differenced once at lag 1. The DIFX=(1,4) option specifies that all of the independent series are differenced at lag 1 and at lag 4. If independent variables are specified in the DIF= option, then the DIFX= option is ignored.

   The following statement uses the data y1, y2, dx1, and dx2, where $dx1 = (1 - B)\, x_{1t}$ and $dx2 = (1 - B)\, x_{2t}$.

   model y1 y2 = x1 x2 / p=1 difx(1);

DIFY(number-list)
DIFY=(number-list)
   specifies the degrees of differencing to be applied to all dependent (endogenous) variables. The number-list must contain one or more numbers, each of which should be greater than zero. For details, see the DIFX= option. If dependent variables are specified in the DIF= option, then the DIFY= option is ignored.

   model y1 y2 / p=1 dify(1);

METHOD=value
   requests the type of estimates to be computed. The possible values of the METHOD= option are as follows:

   LS   specifies least squares estimates.
   ML   specifies maximum likelihood estimates.

   When the ECM=, PRIOR=, and Q= options and the GARCH statement are specified, the default ML method is used regardless of the method given by the METHOD= option.

   model y1 y2 / p=1 method=ml;
NOCURRENTX
suppresses the current values xt of the independent variables. In general, the VARX(p; s) model is yt D ı C
p X i D1
ˆi yt
i
C
s X
‚i xt
i
C t
i D0
where p is the number of lags of the dependent variables included in the model, and s is the number of lags of the independent variables included in the model, including the contemporaneous values of xt . A VARX(1,2) model can be specified as: model y1 y2 = x1 x2 / p=1 xlag=2;
If the NOCURRENTX option is specified, it suppresses the current values $x_t$ and starts with $x_{t-1}$. The VARX(p,s) model is redefined as:
$$ y_t = \delta + \sum_{i=1}^{p} \Phi_i y_{t-i} + \sum_{i=1}^{s} \Theta_i^* x_{t-i} + \epsilon_t $$
This model with p = 1 and s = 2 can be specified as:
   model y1 y2 = x1 x2 / p=1 xlag=2 nocurrentx;
NOINT
suppresses the intercept parameter $\delta$.
   model y1 y2 / p=1 noint;
NSEASON=number
specifies the number of seasonal periods. When the NSEASON=number option is specified, (number - 1) seasonal dummies are added to the regressors. If the NOINT option is specified, the NSEASON= option is not applicable.
   model y1 y2 / p=1 nseason=4;
SCENTER
centers seasonal dummies specified by the NSEASON= option. The centered seasonal dummies are generated by $c - (1/s)$, where c is a seasonal dummy generated by the NSEASON=s option.
   model y1 y2 / p=1 nseason=4 scenter;
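The centering rule $c - (1/s)$ can be illustrated with a short Python sketch. It is not the procedure's own code. The function name `seasonal_dummies` is hypothetical, and the choice of which season is the omitted reference (the last one here) is an assumption.

```python
def seasonal_dummies(n_obs, s, center=False):
    """Return n_obs rows of (s - 1) seasonal dummies, optionally centered by 1/s.

    Assumption: observation t belongs to season t % s and the last season
    is the omitted reference category.
    """
    shift = 1.0 / s if center else 0.0
    rows = []
    for t in range(n_obs):
        season = t % s
        rows.append([(1.0 if season == j else 0.0) - shift for j in range(s - 1)])
    return rows

plain = seasonal_dummies(8, 4)                  # analogous to NSEASON=4
centered = seasonal_dummies(8, 4, center=True)  # analogous to NSEASON=4 SCENTER
print(plain[0], centered[0])
# Over one full seasonal cycle, each centered dummy sums to zero.
cycle_sums = [sum(row[j] for row in centered[:4]) for j in range(3)]
print(cycle_sums)
```

Because each centered dummy sums to zero over a complete cycle, the dummies shift the seasonal pattern without changing the overall level captured by the intercept.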
TREND=value
specifies the degree of deterministic time trend included in the model. Valid values are as follows:
LINEAR
includes a linear time trend as a regressor.
QUAD
includes linear and quadratic time trends as regressors.
The TREND=QUAD option is not applicable for a cointegration analysis.
   model y1 y2 / p=1 trend=linear;
VARDEF=value
corrects for the degrees of freedom of the denominator for computing an error covariance matrix for the METHOD=LS option. If the METHOD=ML option is specified, the VARDEF=N option is always used. Valid values are as follows:
DF
specifies that the number of nonmissing observations minus the number of regressors be used.
N
specifies that the number of nonmissing observations be used.
   model y1 y2 / p=1 vardef=n;
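The effect of the two VARDEF= denominators can be sketched in Python. This is an illustration only: the function name `error_covariance` is hypothetical, the residuals are made-up numbers, and the regressor count is passed in directly rather than derived the way the procedure derives it.

```python
def error_covariance(residuals, n_regressors, vardef="DF"):
    """Residual cross-product matrix divided by T - n_regressors (DF) or T (N)."""
    T = len(residuals)
    k = len(residuals[0])
    denom = T - n_regressors if vardef.upper() == "DF" else T
    return [[sum(e[i] * e[j] for e in residuals) / denom for j in range(k)]
            for i in range(k)]

# Hypothetical residuals from a bivariate model with T = 4 observations.
resid = [[1.0, -1.0], [-2.0, 0.0], [0.0, 2.0], [1.0, -1.0]]
cov_df = error_covariance(resid, n_regressors=2, vardef="DF")
cov_n = error_covariance(resid, n_regressors=2, vardef="N")
print(cov_df)
print(cov_n)
```

The DF denominator is smaller, so it gives a larger covariance estimate than N for the same residuals.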
Printing Control Options
LAGMAX=number
specifies the maximum number of lags for which results are computed and displayed by the PRINT=(CORRX CORRY COVX COVY IARR IMPULSE= IMPULSX= PARCOEF PCANCORR PCORR) options. This option is also used to limit the printed results for the cross covariances and cross-correlations of residuals. The default is LAGMAX=min(12, T-2), where T is the number of nonmissing observations.
   model y1 y2 / p=1 lagmax=6;
NOPRINT
suppresses all printed output.
   model y1 y2 / p=1 noprint;
PRINTALL
requests all printing control options. The options set by the option PRINTALL are DFTEST=, MINIC=, PRINTFORM=BOTH, and PRINT=(CORRB CORRX CORRY COVB COVPE COVX COVY DECOMPOSE DYNAMIC IARR IMPULSE=(ALL) IMPULSX=(ALL) PARCOEF PCANCORR PCORR ROOTS YW). You can also specify this option as the option ALL.
   model y1 y2 / p=1 printall;
PRINTFORM=value
requests the printing format of the output generated by the PRINT= option and cross covariances and cross-correlations of residuals. Valid values are as follows:
BOTH
prints output in both MATRIX and UNIVARIATE forms.
MATRIX
prints output in matrix form. This is the default.
UNIVARIATE
prints output by variables.
model y1 y2 / p=1 print=(impulse) printform=univariate;
Printing Options
PRINT=(options)
The following options can be used in the PRINT=( ) option. The options are listed within parentheses. If a number in parentheses follows an option listed below, then the option prints the number of lags specified by number in parentheses. The default is the number of lags specified by the LAGMAX=number option.
CORRB
prints the estimated correlations of the parameter estimates.
CORRX CORRX(number)
prints the cross-correlation matrices of exogenous (independent) variables. The number should be greater than zero.
CORRY CORRY(number)
prints the cross-correlation matrices of dependent (endogenous) variables. The number should be greater than zero.
COVB
prints the estimated covariances of the parameter estimates.
COVPE COVPE(number)
prints the covariance matrices of number-ahead prediction errors for the VARMAX(p,q,s) model. The number should be greater than zero. If the DIF= or DIFY= option is specified, the covariance matrices of multistep prediction errors are computed based on the differenced data. This option is not applicable when the PRIOR= option is specified. See the section “Forecasting” on page 1930 for details.
COVX COVX(number)
prints the cross-covariance matrices of exogenous (independent) variables. The number should be greater than zero.
COVY COVY(number)
prints the cross-covariance matrices of dependent (endogenous) variables. The number should be greater than zero.
DECOMPOSE DECOMPOSE(number)
prints the decomposition of the prediction error covariances using up to the number of lags specified by number in parentheses for the VARMA(p,q) model. The number should be greater than zero. It can be interpreted as the contribution of innovations in one variable to the mean squared error of the multistep forecast of another variable. The DECOMPOSE option also prints proportions of the forecast error variance. If the DIF= or DIFY= option is specified, the covariance matrices of multistep prediction errors are computed based on the differenced data. This option is not applicable when the PRIOR= option is specified. See the section “Forecasting” on page 1930 for details.
DIAGNOSE
prints the residual diagnostics and model diagnostics.
DYNAMIC
prints the contemporaneous relationships among the components of the vector time series.
ESTIMATES
prints the coefficient estimates and a schematic representation of the significance and sign of the parameter estimates.
IARR IARR(number)
prints the infinite order AR representation of a VARMA process. The number should be greater than zero. If the ECM= option and the COINTEG statement are specified, then the reparameterized AR coefficient matrices are printed.
IMPULSE IMPULSE(number) IMPULSE=(SIMPLE ACCUM ORTH STDERR ALL) IMPULSE(number)=(SIMPLE ACCUM ORTH STDERR ALL)
prints the impulse response function. The number should be greater than zero. It investigates the response of one variable to an impulse in another variable in a system that involves a number of other variables as well. It is an infinite order MA representation of a VARMA process. See the section “Impulse Response Function” on page 1919 for details. The following options can be used in the IMPULSE=( ) option. The options are specified within parentheses.
ACCUM
prints the accumulated impulse response function.
ALL
is equivalent to specifying all of SIMPLE, ACCUM, ORTH, and STDERR.
ORTH
prints the orthogonalized impulse response function.
SIMPLE
prints the impulse response function. This is the default.
STDERR
prints the standard errors of the impulse response function, the accumulated impulse response function, or the orthogonalized impulse response function.
If the exogenous variables are used to fit the model, then the STDERR option is ignored.
IMPULSX IMPULSX(number) IMPULSX=(SIMPLE ACCUM ALL) IMPULSX(number)=(SIMPLE ACCUM ALL)
prints the impulse response function related to exogenous (independent) variables. The number should be greater than zero. See the section “Impulse Response Function” on page 1919 for details. The following options can be used in the IMPULSX=( ) option. The options are specified within parentheses.
ACCUM
prints the accumulated impulse response matrices for the transfer function.
ALL
is equivalent to specifying both SIMPLE and ACCUM.
SIMPLE
prints the impulse response matrices for the transfer function. This is the default.
PARCOEF PARCOEF(number )
prints the partial autoregression coefficient matrices, $\Phi_{mm}$, up to the lag number. The number should be greater than zero. With a VAR process, this option is useful for the identification of the order since the $\Phi_{mm}$ have the property that they equal zero for m > p under the hypothetical assumption of a VAR(p) model. See the section “Tentative Order Selection” on page 1935 for details.
PCANCORR PCANCORR(number)
prints the partial canonical correlations of the process at lag m and the test for testing $\Phi_m = 0$ for m > p up to the lag number. The number should be greater than zero. The lag m partial canonical correlations are the canonical correlations between $y_t$ and $y_{t-m}$, after adjustment for the dependence of these variables on the intervening values $y_{t-1}, \ldots, y_{t-m+1}$. See the section “Tentative Order Selection” on page 1935 for details.
PCORR PCORR(number)
prints the partial correlation matrices. The number should be greater than zero. With a VAR process, this option is useful for a tentative order selection by the same property as the partial autoregression coefficient matrices, as described in the PRINT=(PARCOEF) option. See the section “Tentative Order Selection” on page 1935 for details.
ROOTS
prints the eigenvalues of the $kp \times kp$ companion matrix associated with the AR characteristic function $\Phi(B)$, where k is the number of dependent (endogenous) variables, and $\Phi(B)$ is the finite order matrix polynomial in the backshift operator B, such that $B^i y_t = y_{t-i}$. These eigenvalues indicate the stationary condition of the process since the stationary condition on the roots of $|\Phi(B)| = 0$ in the VAR(p) model is equivalent to the condition in the corresponding VAR(1) representation that all eigenvalues of the companion matrix be less than one in absolute value. Similarly, you can use this option to check the invertibility of the MA process. In addition, when the GARCH statement is specified, this option prints the roots of the GARCH characteristic polynomials to check covariance stationarity for the GARCH process.
YW
prints Yule-Walker estimates of the preliminary autoregressive model for the dependent (endogenous) variables. The coefficient matrices are printed using the maximum order of the autoregressive process.
Some examples of the PRINT= option are as follows:
   model y1 y2 / p=1 print=(covy(10) corry(10));
   model y1 y2 / p=1 print=(parcoef pcancorr pcorr);
   model y1 y2 / p=1 print=(impulse(8) decompose(6) covpe(6));
   model y1 y2 / p=1 print=(dynamic roots yw);
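The companion-matrix check described for the ROOTS option can be sketched in pure Python. This is an illustration under stated assumptions, not the procedure's algorithm. All function names are hypothetical, the coefficient matrices are made-up values, and the eigenvalues are obtained from the characteristic polynomial (Faddeev-LeVerrier recursion followed by Durand-Kerner root finding) only to avoid external libraries.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def companion(phis):
    """Stack k x k matrices Phi_1..Phi_p into the kp x kp companion matrix."""
    p, k = len(phis), len(phis[0])
    n = k * p
    A = [[0.0] * n for _ in range(n)]
    for lag, phi in enumerate(phis):
        for i in range(k):
            for j in range(k):
                A[i][lag * k + j] = phi[i][j]
    for i in range(k, n):
        A[i][i - k] = 1.0   # identity blocks below the first block row
    return A

def char_poly(A):
    """Coefficients c[0..n-1] of x^n + c[0] x^(n-1) + ... + c[n-1] (Faddeev-LeVerrier)."""
    n = len(A)
    M = [[0.0] * n for _ in range(n)]
    c, coeffs = 1.0, []
    for step in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (c if i == j else 0.0) for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / step
        coeffs.append(c)
    return coeffs

def poly_roots(coeffs, iterations=500):
    """Durand-Kerner iteration for the roots of a monic polynomial."""
    roots = [(0.4 + 0.9j) ** i for i in range(len(coeffs))]
    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            value = 1.0 + 0j
            for c in coeffs:
                value = value * r + c          # Horner evaluation
            denom = 1.0 + 0j
            for j, other in enumerate(roots):
                if j != i:
                    denom *= r - other
            updated.append(r - value / denom)
        roots = updated
    return roots

def ar_eigenvalues(phis):
    return poly_roots(char_poly(companion(phis)))

# Hypothetical bivariate VAR(2) coefficients.
phi1 = [[0.5, 0.1], [0.4, 0.5]]
phi2 = [[0.0, 0.0], [0.25, 0.0]]
moduli = sorted(abs(z) for z in ar_eigenvalues([phi1, phi2]))
print(moduli)
print("stationary:", max(moduli) < 1.0)
```

The process is judged stationary when every modulus is below one, which is the condition stated for the companion matrix.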
Lag Specification Options
P=number P=(number-list)
specifies the order of the vector autoregressive process. Subset models of vector autoregressive orders can be specified by listing the desired set of lags. For example, you can specify the P=(1,3,4) option. The P=3 option is equivalent to the P=(1,2,3) option. The default is P=0. If P=0 and there are no exogenous (independent) variables, then the AR polynomial order is automatically determined by minimizing an information criterion. If P=0 and the PRIOR= or ECM= option or both are specified, then the AR polynomial order is determined automatically. If the ECM= option is specified, then subset models of vector autoregressive orders are not allowed and the AR maximum order specified is used. Examples illustrating the P= option follow:
   model y1 y2 / p=3;
   model y1 y2 / p=(1,3);
   model y1 y2 / p=(1,3) prior;
Q=number Q=(number-list)
specifies the order of the moving-average error process. Subset models of moving-average orders can be specified by listing the desired set of lags. For example, you can specify the Q=(1,5) option. The default is Q=0.
   model y1 y2 / p=1 q=1;
   model y1 y2 / q=(2);
XLAG=number XLAG=(number-list)
specifies the lags of exogenous (independent) variables. Subset models of distributed lags can be specified by listing the desired set of lags. For example, XLAG=(2) selects only a lag 2 of the exogenous variables. The default is XLAG=0. To exclude the present values of exogenous variables from the model, the NOCURRENTX option must be used.
   model y1 y2 = x1-x3 / xlag=2 nocurrentx;
   model y1 y2 = x1-x3 / p=1 xlag=(2);
Tentative Order Selection Options
MINIC MINIC=(TYPE=value P=number Q=number PERROR=number)
prints the information criterion for the appropriate AR and MA tentative order selection and for the diagnostic checks of the fitted model.
If the MINIC= option is not specified, all types of information criteria are printed for diagnostic checks of the fitted model. The following options can be used in the MINIC=( ) option. The options are specified within parentheses.
P=number P=($p_{min}$:$p_{max}$)
specifies the range of AR orders to be considered in the tentative order selection. The default is P=(0:5). The P=3 option is equivalent to the P=(0:3) option.
PERROR=number PERROR=($p_{\epsilon,min}$:$p_{\epsilon,max}$)
specifies the range of AR orders for obtaining the error series. The default is PERROR=($p_{max}$:$p_{max}+q_{max}$).
Q=number Q=($q_{min}$:$q_{max}$)
specifies the range of MA orders to be considered in the tentative order selection. The default is Q=(0:5).
TYPE=value
specifies the criterion for the model order selection. Valid criteria are as follows:
AIC
specifies the Akaike information criterion.
AICC
specifies the corrected Akaike information criterion. This is the default criterion.
FPE
specifies the final prediction error criterion.
HQC
specifies the Hannan-Quinn criterion.
SBC
specifies the Schwarz Bayesian criterion. You can also specify this value as TYPE=BIC.
   model y1 y2 / minic;
   model y1 y2 / minic=(type=aic p=5);
Cointegration Related Options
Two options are related to integrated time series; one is the DFTEST option to test for a unit root and the other is the COINTTEST option to test for cointegration.
DFTEST DFTEST=(DLAG=number) DFTEST=(DLAG=(number) . . . (number))
prints the Dickey-Fuller unit root tests. The DLAG=(number) . . . (number) option specifies the regular or seasonal unit root test. Supported values of number are 1, 2, 4, and 12. If the number is greater than one, a seasonal Dickey-Fuller test is performed. If the TREND= option is specified, the seasonal unit root test is not available. The default is DLAG=1.
For example, the DFTEST=(DLAG=(1)(12)) option produces two tables: the Dickey-Fuller regular unit root test and the seasonal unit root test. Some examples of the DFTEST= option follow:
   model y1 y2 / p=2 dftest;
   model y1 y2 / p=2 dftest=(dlag=4);
   model y1 y2 / p=2 dftest=(dlag=(1)(12));
   model y1 y2 / p=2 dftest cointtest;
COINTTEST COINTTEST=(JOHANSEN<=(options)> SW<=(options)> SIGLEVEL=number)
The following options can be used with the COINTTEST=( ) option. The options are specified within parentheses.
JOHANSEN JOHANSEN=(TYPE=value IORDER=number NORMALIZE=variable)
prints the cointegration rank test for multivariate time series based on Johansen’s method. This test is provided when the number of dependent (endogenous) variables is less than or equal to 11. See the section “Vector Error Correction Modeling” on page 1962 for details. The VARX(p,s) model can be written as the error correction model
$$ \Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{p-1} \Phi_i^* \Delta y_{t-i} + A D_t + \sum_{i=0}^{s} \Theta_i^* x_{t-i} + \epsilon_t $$
where $\Pi$, $\Phi_i^*$, $A$, and $\Theta_i^*$ are coefficient parameters; $D_t$ is a deterministic term such as a constant, a linear trend, or seasonal dummies. The I(1) model is defined by one reduced-rank condition. If the cointegration rank is r < k, then there exist $k \times r$ matrices $\alpha$ and $\beta$ of rank r such that $\Pi = \alpha\beta'$. The I(1) model is rewritten as the I(2) model
$$ \Delta^2 y_t = \Pi y_{t-1} - \Psi \Delta y_{t-1} + \sum_{i=1}^{p-2} \Psi_i \Delta^2 y_{t-i} + A D_t + \sum_{i=0}^{s} \Theta_i^* x_{t-i} + \epsilon_t $$
where $\Psi = I_k - \sum_{i=1}^{p-1} \Phi_i^*$ and $\Psi_i = -\sum_{j=i+1}^{p-1} \Phi_j^*$.
The I(2) model is defined by two reduced-rank conditions. One is that $\Pi = \alpha\beta'$, where $\alpha$ and $\beta$ are $k \times r$ matrices of full rank r. The other is that $\alpha_\perp' \Psi \beta_\perp = \xi\eta'$, where $\xi$ and $\eta$ are $(k-r) \times s$ matrices with $s \le k-r$; $\alpha_\perp$ and $\beta_\perp$ are $k \times (k-r)$ matrices of full rank $k-r$ such that $\alpha'\alpha_\perp = 0$ and $\beta'\beta_\perp = 0$. The following options can be used in the JOHANSEN=( ) option. The options are specified within parentheses.
IORDER=number
specifies the integrated order.
IORDER=1
prints the cointegration rank test for an integrated order 1 and prints the long-run parameter, $\beta$, and the adjustment coefficient, $\alpha$. This is the default. If the IORDER=1 option is specified, then the AR order should be greater than or equal to 1. When the P=0 option is specified, the value of P is set to 1 for the Johansen test.
IORDER=2
prints the cointegration rank test for integrated orders 1 and 2. If the IORDER=2 option is specified, then the AR order should be greater than or equal to 2. If the P=1 option is specified with the IORDER=2 option, then the value of IORDER is set to 1; if the P=0 option is specified with the IORDER=2 option, then the value of P is set to 2.
NORMALIZE=variable
specifies the dependent (endogenous) variable name whose cointegration vectors are to be normalized. If the normalized variable is different from that specified in the ECM= option or the COINTEG statement, then the value specified in the COINTEG statement is used.
TYPE=value
specifies the type of cointegration rank test to be printed. Valid values are as follows:
MAX
prints the cointegration maximum eigenvalue test.
TRACE
prints the cointegration trace test. This is the default.
If the NOINT option is not specified, the procedure prints two different cointegration rank tests in the presence of the unrestricted and restricted deterministic terms (constant or linear trend) models. If the IORDER=2 option is specified, the procedure automatically uses the TYPE=TRACE option. Some examples illustrating the COINTTEST= option follow:
   model y1 y2 / p=2 cointtest=(johansen=(type=max normalize=y1));
   model y1 y2 / p=2 cointtest=(johansen=(iorder=2 normalize=y1));
SIGLEVEL=value
sets the size of cointegration rank tests and common trends tests. The SIGLEVEL=value can be set to 0.1, 0.05, or 0.01. The default is SIGLEVEL=0.05.
   model y1 y2 / p=2 cointtest=(johansen siglevel=0.1);
   model y1 y2 / p=2 cointtest=(sw siglevel=0.1);
SW SW=(TYPE=value LAG=number )
prints common trends tests for a multivariate time series based on the Stock-Watson method. This test is provided when the number of dependent (endogenous) variables is less than or equal to 6. See the section “Common Trends” on page 1960 for details. The following options can be used in the SW=( ) option. The options are listed within parentheses.
LAG=number
specifies the number of lags. The default is LAG=max(1,p) for the TYPE=FILTDIF or TYPE=FILTRES option, where p is the AR maximum order specified by the P= option; LAG=$T^{1/4}$ for the TYPE=KERNEL option, where T is the number of nonmissing observations. If the specified LAG=number exceeds the default, then it is replaced by the default.
TYPE=value
specifies the type of common trends test to be printed. Valid values are as follows:
FILTDIF
prints the common trends test based on the filtering method applied to the differenced series. This is the default.
FILTRES
prints the common trends test based on the filtering method applied to the residual series.
KERNEL
prints the common trends test based on the kernel method.
   model y1 y2 / p=2 cointtest=(sw);
   model y1 y2 / p=2 cointtest=(sw=(type=kernel));
   model y1 y2 / p=2 cointtest=(sw=(type=kernel lag=3));
Bayesian VARX Estimation Options
PRIOR PRIOR=(prior-options)
specifies the prior value of parameters for the BVARX(p, s) model. The BVARX model allows for a subset model specification. If the ECM= option is specified with the PRIOR option, the BVECMX(p, s) form is fitted. To compute the standard errors of the forecasts, a bootstrap procedure is used. See the section “Bayesian VAR and VARX Modeling” on page 1947 for details. The following options can be used with the PRIOR=(prior-options) option. The prior-options are listed within parentheses.
IVAR IVAR=(variables)
specifies an integrated BVAR(p) model. The variables should be specified in the MODEL statement as dependent variables. If you use the IVAR option without variables, then it sets the overall prior mean of the first lag of each variable equal to one in its own equation and sets all other coefficients to zero. If variables are specified, it sets the prior mean of the first lag of the specified variables equal to one in its own equation and sets all other coefficients to zero. When the series $y_t = (y_1, y_2)'$ follows a bivariate BVAR(2) process, the IVAR or IVAR=(y1 y2) option is equivalent to specifying MEAN=(1 0 0 0 0 1 0 0). If the PRIOR=(MEAN=) or ECM= option is specified, the IVAR= option is ignored.
LAMBDA=value
specifies the prior standard deviation of the AR coefficient parameter matrices. It should be a positive number. The default is LAMBDA=1. As the value of the LAMBDA= option is increased, the BVAR(p) model becomes closer to a VAR(p) model.
MEAN=(vector )
specifies the mean vector in the prior distribution for the AR coefficients. If the vector is not specified, the prior value is assumed to be a zero vector. See the section “Bayesian VAR and VARX Modeling” on page 1947 for details. You can specify the mean vector by order of the equation. Let $(\delta, \Phi_1, \ldots, \Phi_p)$ be the parameter sets to be estimated and $\Phi = (\Phi_1, \ldots, \Phi_p)$ be the AR parameter sets. The mean vector is specified row-wise from $\Phi$; that is, by the MEAN=(vec($\Phi'$)) option. For the PRIOR=(mean) option in the BVAR(2),
$$ \Phi = \begin{pmatrix} \phi_{1,11} & \phi_{1,12} & \phi_{2,11} & \phi_{2,12} \\ \phi_{1,21} & \phi_{1,22} & \phi_{2,21} & \phi_{2,22} \end{pmatrix} = \begin{pmatrix} 2 & 0.1 & 1 & 0 \\ 0.5 & 3 & 0 & -1 \end{pmatrix} $$
where $\phi_{l,ij}$ is the (i, j)th element of $\Phi_l$, l is a lag, i indexes the equation of the ith dependent variable, and j indexes the lagged jth dependent variable.
   model y1 y2 / p=2 prior=(mean=(2 0.1 1 0 0.5 3 0 -1));
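The row-wise ordering of the MEAN= vector can be checked with a small Python sketch. The function name `prior_mean_vector` is hypothetical, and the sketch only reorders numbers. It reproduces the MEAN= vector of the BVAR(2) example and the vector that the IVAR option is described as equivalent to.

```python
def prior_mean_vector(phis):
    """Flatten Phi = (Phi_1, ..., Phi_p) row by row, that is, vec(Phi')."""
    k = len(phis[0])
    return [phi[i][j] for i in range(k) for phi in phis for j in range(len(phi[i]))]

# Phi_1 and Phi_2 taken from the BVAR(2) example matrix.
phi1 = [[2.0, 0.1], [0.5, 3.0]]
phi2 = [[1.0, 0.0], [0.0, -1.0]]
mean_vec = prior_mean_vector([phi1, phi2])
print(mean_vec)

# IVAR-style prior mean: Phi_1 is the identity and Phi_2 is zero.
ivar_vec = prior_mean_vector([[[1, 0], [0, 1]], [[0, 0], [0, 0]]])
print(ivar_vec)
```

Each equation contributes its own row of coefficients across all lags before the next equation starts, which is why the first four numbers belong to the y1 equation.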
The deterministic terms and exogenous variables are considered to shrink toward zero; you must omit prior means of exogenous variables and deterministic terms such as a constant, seasonal dummies, or trends. For a Bayesian error correction model estimated when both the ECM= and PRIOR= options are used, a mean vector for only lagged AR coefficients, $\Phi_i^*$, in terms of regressors $\Delta y_{t-i}$, for $i = 1, \ldots, (p-1)$, is used in the VECM(p) representation. The diffused prior variance of $\alpha$ is used, since $\beta$ is replaced by $\hat{\beta}$ estimated in a nonconstrained VECM(p) form.
$$ \Delta y_t = \alpha z_{t-1} + \sum_{i=1}^{p-1} \Phi_i^* \Delta y_{t-i} + A D_t + \sum_{i=0}^{s} \Theta_i^* x_{t-i} + \epsilon_t $$
where $z_t = \beta' y_t$. For example, in the case of a bivariate (k = 2) BVECM(2) form, the option is MEAN=($\phi_{1,11}^*$ $\phi_{1,12}^*$ $\phi_{1,21}^*$ $\phi_{1,22}^*$), where $\phi_{1,ij}^*$ is the (i, j)th element of the matrix $\Phi_1^*$.
NREP=number
specifies the number of periods to compute the measure of forecast accuracy. The default is NREP=0.5T, where T is the number of observations.
THETA=value
specifies the prior standard deviation of the AR coefficient parameter matrices. The value is in the interval (0,1). The default is THETA=0.1. As the value of the THETA= option approaches 1, the specified BVAR(p) model approaches a VAR(p) model. Some examples of the PRIOR= option follow:
   model y1 y2 / p=2 prior;
   model y1 y2 / p=2 prior=(theta=0.2 lambda=5);
   model y1 y2 = x1 / p=2 prior=(theta=0.2 lambda=5);
   model y1 y2 = x1 / p=2 prior=(theta=0.2 lambda=5 mean=(2 0.1 1 0 0.5 3 0 -1));
See the section “Bayesian VAR and VARX Modeling” on page 1947 for details.
Vector Error Correction Model Options
ECM=(RANK=number NORMALIZE=variable ECTREND)
specifies a vector error correction model. The following options can be used in the ECM=( ) option. The options are specified within parentheses.
NORMALIZE=variable
specifies a single dependent variable name whose cointegrating vectors are normalized. If the variable name is different from that specified in the COINTEG statement, then the value specified in the COINTEG statement is used.
RANK=number
specifies the cointegration rank. This option is required in the ECM= option. The value of the RANK= option should be greater than zero and less than or equal to the number of dependent (endogenous) variables, k. If the rank is different from that specified in the COINTEG statement, then the value specified in the COINTEG statement is used.
ECTREND
specifies the restriction on the drift in the VECM(p) form.
There is no separate drift in the VECM(p) form, but a constant enters only through the error correction term.
$$ \Delta y_t = \alpha (\beta', \beta_0)(y_{t-1}', 1)' + \sum_{i=1}^{p-1} \Phi_i^* \Delta y_{t-i} + \epsilon_t $$
An example of the ECTREND option follows:
   model y1 y2 / p=2 ecm=(rank=1 ectrend);
There is a separate drift and no separate linear trend in the VECM(p) form, but a linear trend enters only through the error correction term.
$$ \Delta y_t = \alpha (\beta', \beta_1)(y_{t-1}', t)' + \sum_{i=1}^{p-1} \Phi_i^* \Delta y_{t-i} + \delta_0 + \epsilon_t $$
An example of the ECTREND option with the TREND= option follows:
   model y1 y2 / p=2 ecm=(rank=1 ectrend) trend=linear;
If the NSEASON= option is specified together with the ECTREND option, then the NSEASON= option is ignored; if the NOINT option is specified, then the ECTREND option is ignored. Some examples of the ECM= option follow:
   model y1 y2 / p=2 ecm=(rank=1 normalize=y1);
   model y1 y2 / p=2 ecm=(rank=1 ectrend) trend=linear;
See the section “Vector Error Correction Modeling” on page 1962 for details.
GARCH Statement
GARCH options ;
The GARCH statement specifies a GARCH-type multivariate conditional heteroscedasticity model. The following options can be used in the GARCH statement.
FORM=value
specifies the representation for a GARCH model. Valid values are as follows:
BEKK
specifies a BEKK representation. This is the default.
CCC
specifies a constant conditional correlation representation.
OUTHT=SAS-data-set
writes the conditional covariance matrix to an output data set.
P=number P=(number-list)
specifies the order of the process or the subset of GARCH terms to be fitted. For example, you can specify the P=(1,3) option. The P=3 option is equivalent to the P=(1,2,3) option. The default is P=0.
Q=number Q=(number-list)
specifies the order of the process or the subset of ARCH terms to be fitted. This option is required in the GARCH statement. For example, you can specify the Q=(2) option. The Q=2 option is equivalent to the Q=(1,2) option.
For the VAR(1)-ARCH(1) model,
   model y1 y2 / p=1;
   garch q=1 form=bekk;
For the multivariate GARCH(1,1) model,
   model y1 y2;
   garch q=1 p=1 form=ccc;
Other multivariate GARCH-type models are
   model y1 y2 = x1 / xlag=1;
   garch q=1;

   model y1 y2 / q=1;
   garch q=1 p=1;
See the section “Multivariate GARCH Modeling” on page 1981 for details.
NLOPTIONS Statement
NLOPTIONS options ;
The VARMAX procedure uses the nonlinear optimization (NLO) subsystem to perform nonlinear optimization tasks. For a list of all the options of the NLOPTIONS statement, see Chapter 6, “Nonlinear Optimization Methods.” An example of the NLOPTIONS statement follows:
   proc varmax data=one;
      nloptions tech=qn;
      model y1 y2 / p=2;
   run;
The VARMAX procedure uses the dual quasi-Newton optimization method by default when no NLOPTIONS statement is specified. However, it uses Newton-Raphson ridge optimization when the NLOPTIONS statement is specified. The following example uses the TECH=QUANEW by default.
   proc varmax data=one;
      model y1 y2 / p=2 method=ml;
   run;
The next example uses the TECH=NRRIDG by default.
   proc varmax data=one;
      nloptions maxiter=500 maxfunc=5000;
      model y1 y2 / p=2 method=ml;
   run;
OUTPUT Statement
OUTPUT < options > ;
The OUTPUT statement generates and prints forecasts based on the model estimated in the previous MODEL statement and, optionally, creates an output SAS data set that contains these forecasts. When the GARCH model is estimated, the upper and lower confidence limits of forecasts are calculated by assuming that the error covariance has homoscedastic conditional covariance.
ALPHA=number
sets the forecast confidence limit size, where number is between 0 and 1. When you specify the ALPHA=number option, the upper and lower confidence limits define the $100(1-\alpha)$% confidence interval. The default is ALPHA=0.05, which produces 95% confidence intervals.
BACK=number
specifies the number of observations before the end of the data at which the multistep forecasts begin. The BACK= option value must be less than or equal to the number of observations minus the number of lagged regressors in the model. The default is BACK=0, which means that the forecasts start at the end of the available data.
LEAD=number
specifies the number of multistep forecast values to compute. The default is LEAD=12.
NOPRINT
suppresses the printed forecast values of each dependent (endogenous) variable.
OUT=SAS-data-set
writes the forecast values to an output data set.
Some examples of the OUTPUT statements follow:
   proc varmax data=one;
      model y1 y2 / p=2;
      output lead=6 back=2;
   run;

   proc varmax data=one;
      model y1 y2 / p=2;
      output out=for noprint;
   run;
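The relationship between ALPHA= and the width of the confidence interval can be sketched in Python. The sketch assumes that the limits are the forecast plus or minus a standard normal quantile times the forecast standard error, which this section does not state explicitly. The function name `forecast_limits` and the input numbers are hypothetical.

```python
from statistics import NormalDist

def forecast_limits(forecast, std_error, alpha=0.05):
    """Limits of a 100(1 - alpha)% interval, assuming normally distributed errors."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return forecast - z * std_error, forecast + z * std_error

# Hypothetical forecast of 10 with standard error 2.
lower, upper = forecast_limits(10.0, 2.0)               # default ALPHA=0.05
lower90, upper90 = forecast_limits(10.0, 2.0, alpha=0.1)
print(lower, upper)
print(lower90, upper90)
```

A larger ALPHA= value gives a narrower interval, since less coverage is requested.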
RESTRICT Statement
RESTRICT restriction, . . . , restriction ;
The RESTRICT statement restricts the specified parameters to the specified values. Only one RESTRICT statement is allowed, but multiple restrictions can be specified in one RESTRICT statement. The restriction’s form is parameter=value and each restriction is separated by commas. Parameters are referred to by the following keywords:
CONST(i) is the intercept parameter of the ith time series $y_{it}$
AR(l,i,j) is the autoregressive parameter of the lag l value of the jth dependent (endogenous) variable, $y_{j,t-l}$, to the ith dependent variable at time t, $y_{it}$
MA(l,i,j) is the moving-average parameter of the lag l value of the jth error process, $\epsilon_{j,t-l}$, to the ith dependent variable at time t, $y_{it}$
XL(l,i,j) is the exogenous parameter of the lag l value of the jth exogenous (independent) variable, $x_{j,t-l}$, to the ith dependent variable at time t, $y_{it}$
SDUMMY(i,j) is the jth seasonal dummy of the ith time series at time t, $y_{it}$, where $j = 1, \ldots, (nseason - 1)$, where nseason is based on the NSEASON= option in the MODEL statement
LTREND(i) is the linear trend parameter of the current value ith time series $y_{it}$
QTREND(i) is the quadratic trend parameter of the current value ith time series $y_{it}$
The following keywords are for the fitted GARCH model. The indexes i and j refer to the position of the element in the coefficient matrix.
GCHC(i,j) is the constant parameter of the covariance matrix, $H_t$, and (i,j) is $1 \le i = j \le k$ for CCC representation and $1 \le i \le j \le k$ for BEKK representations, where k is the number of dependent variables
ACH(l,i,j) is the ARCH parameter of the lag l value of $\epsilon_t \epsilon_t'$, where $i, j = 1, \ldots, k$ for BEKK representation and $i = j = 1, \ldots, k$ for CCC representation
GCH(l,i,j) is the GARCH parameter of the lag l value of covariance matrix, $H_t$, where $i, j = 1, \ldots, k$ for BEKK representation and $i = j = 1, \ldots, k$ for CCC representation
CCC(i,j) is the constant conditional correlation parameter for only the CCC representation; (i,j) is $1 \le i < j \le k$
To use the RESTRICT statement, you need to know the form of the model. If the P=, Q=, and XLAG= options are not specified, then the RESTRICT statement is not applicable. Restricted parameter estimates are computed by introducing a Lagrangian parameter for each restriction (Pringle and Rayner 1971). The Lagrangian parameter measures the sensitivity of the sum of squared errors to the restriction. The estimates of these Lagrangian parameters and their significance are printed in the restriction results table. The following are examples of the RESTRICT statement.
The first example shows a bivariate (k=2) VAR(2) model,
   proc varmax data=one;
      model y1 y2 / p=2;
      restrict AR(1,1,2)=0, AR(2,1,2)=0.3;
   run;
The AR(1,1,2) and AR(2,1,2) parameters are fixed as AR(1,1,2)=0 and AR(2,1,2)=0.3, respectively, and other parameters are to be estimated. The following shows a bivariate (k=2) VARX(1,1) model with three exogenous variables,
   proc varmax data=two;
      model y1 = x1 x2, y2 = x2 x3 / p=1 xlag=1;
      restrict XL(0,1,1)=-1.2, XL(1,2,3)=0;
   run;
The XL(0,1,1) and XL(1,2,3) parameters are fixed as XL(0,1,1)=–1.2 and XL(1,2,3)=0, respectively, and other parameters are to be estimated.
TEST Statement
TEST restriction, . . . , restriction ;
The TEST statement performs the Wald test for the joint hypothesis specified in the statement. The restriction’s form is parameter=value, and each restriction is separated by commas. The restrictions are specified in the same manner as in the RESTRICT statement. See the RESTRICT statement for description of model parameter naming conventions used by the RESTRICT and TEST statements. Any number of TEST statements can be specified. To use the TEST statement, you need to know the form of the model. If the P=, Q=, and XLAG= options are not specified, then the TEST statement is not applicable. See the section “Granger Causality Test” on page 1944 for the Wald test. The following is an example of the TEST statement. In the case of a bivariate (k=2) VAR(2) model,
   proc varmax data=one;
      model y1 y2 / p=2;
      test AR(1,1,2)=0, AR(2,1,2)=0;
   run;
After estimating the parameters, the TEST statement tests the null hypothesis that AR(1,1,2)=0 and AR(2,1,2)=0.
Details: VARMAX Procedure
Missing Values
The VARMAX procedure currently does not support missing values. The procedure uses the first contiguous group of observations with no missing values for any of the MODEL statement variables. Observations at the beginning of the data set with missing values for any MODEL statement variables are not used or included in the output data set. At the end of the data set, observations can have dependent (endogenous) variables with missing values and independent (exogenous) variables with nonmissing values.
VARMAX Model
The vector autoregressive moving-average model with exogenous variables is called the VARMAX(p,q,s) model. The form of the model can be written as

\[ y_t = \sum_{i=1}^{p} \Phi_i y_{t-i} + \sum_{i=0}^{s} \Theta_i^{*} x_{t-i} + \epsilon_t - \sum_{i=1}^{q} \Theta_i \epsilon_{t-i} \]

where the output variables of interest, $y_t = (y_{1t}, \ldots, y_{kt})'$, can be influenced by other input variables, $x_t = (x_{1t}, \ldots, x_{rt})'$, which are determined outside of the system of interest. The variables $y_t$ are referred to as dependent, response, or endogenous variables, and the variables $x_t$ are referred to as independent, input, predictor, regressor, or exogenous variables. The unobserved noise variables, $\epsilon_t = (\epsilon_{1t}, \ldots, \epsilon_{kt})'$, are a vector white noise process. The VARMAX(p,q,s) model can be written

\[ \Phi(B) y_t = \Theta^{*}(B) x_t + \Theta(B) \epsilon_t \]

where

\[ \Phi(B) = I_k - \Phi_1 B - \cdots - \Phi_p B^p \]
\[ \Theta^{*}(B) = \Theta_0^{*} + \Theta_1^{*} B + \cdots + \Theta_s^{*} B^s \]
\[ \Theta(B) = I_k - \Theta_1 B - \cdots - \Theta_q B^q \]

are matrix polynomials in the backshift operator B, such that $B^i y_t = y_{t-i}$, the $\Phi_i$ and $\Theta_i$ are $k \times k$ matrices, and the $\Theta_i^{*}$ are $k \times r$ matrices.

The following assumptions are made: $E(\epsilon_t) = 0$, $E(\epsilon_t \epsilon_t') = \Sigma$, which is positive-definite, and $E(\epsilon_t \epsilon_s') = 0$ for $t \ne s$.
For stationarity and invertibility of the VARMAX process, the roots of $|\Phi(z)| = 0$ and $|\Theta(z)| = 0$ are outside the unit circle. The exogenous (independent) variables $x_t$ are not correlated with residuals $\epsilon_t$, $E(x_t \epsilon_t') = 0$. The exogenous variables can be stochastic or nonstochastic. When the exogenous variables are stochastic and their future values are unknown, forecasts of these future values are needed to forecast the future values of the endogenous (dependent) variables. On occasion, future values of the exogenous variables can be assumed to be known because they are deterministic variables. The VARMAX procedure assumes that the exogenous variables are nonstochastic if future values are available in the input data set. Otherwise, the exogenous variables are assumed to be stochastic and their future values are forecasted by assuming that they follow the VARMA(p,q) model, prior to forecasting the endogenous variables, where p and q are the same as in the VARMAX(p,q,s) model.
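The stationarity condition can be checked numerically in the simplest case. The following is a minimal Python sketch (not part of the VARMAX procedure) for a bivariate VAR(1): the roots of $|I_k - \Phi_1 z| = 0$ lie outside the unit circle exactly when every eigenvalue of $\Phi_1$ has modulus less than 1. The function names are hypothetical, and the coefficient matrix is the phi matrix used in this chapter's PROC IML simulation example.

```python
import cmath

def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0

def is_stationary_var1(phi1):
    """A VAR(1) is stationary when all eigenvalues of Phi_1 have modulus < 1.

    Equivalently, the roots z = 1/lambda of |I - Phi_1 z| = 0 lie outside
    the unit circle.
    """
    return all(abs(lam) < 1.0 for lam in eigenvalues_2x2(phi1))

# Illustrative coefficient matrix (the phi of the simulation example).
phi1 = [[1.2, -0.5], [0.6, 0.3]]
moduli = [abs(lam) for lam in eigenvalues_2x2(phi1)]
stationary = is_stationary_var1(phi1)
print(moduli, stationary)
```

For this matrix the eigenvalues are a complex pair whose common modulus is the square root of the determinant, 0.66, so the process is stationary.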
State-Space Representation
Another representation of the VARMAX(p,q,s) model is in the form of a state-variable or a state-space model, which consists of a state equation

\[ z_t = F z_{t-1} + K x_t + G \epsilon_t \]

and an observation equation

\[ y_t = H z_t \]

where

\[ z_t = (y_t', \ldots, y_{t-p+1}', x_t', \ldots, x_{t-s+1}', \epsilon_t', \ldots, \epsilon_{t-q+1}')' \]

\[ K = \begin{bmatrix} \Theta_0^{*} \\ 0_{k \times r} \\ \vdots \\ 0_{k \times r} \\ I_r \\ 0_{r \times r} \\ \vdots \\ 0_{r \times r} \\ 0_{k \times r} \\ \vdots \\ 0_{k \times r} \end{bmatrix}, \quad
G = \begin{bmatrix} I_k \\ 0_{k \times k} \\ \vdots \\ 0_{k \times k} \\ 0_{r \times k} \\ \vdots \\ 0_{r \times k} \\ I_k \\ 0_{k \times k} \\ \vdots \\ 0_{k \times k} \end{bmatrix} \]

\[ F = \begin{bmatrix}
\Phi_1 & \cdots & \Phi_{p-1} & \Phi_p & \Theta_1^{*} & \cdots & \Theta_{s-1}^{*} & \Theta_s^{*} & -\Theta_1 & \cdots & -\Theta_{q-1} & -\Theta_q \\
I_k & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \cdots & I_k & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & I_r & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & I_r & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & I_k & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & I_k & 0
\end{bmatrix} \]

and

\[ H = [I_k, 0_{k \times k}, \ldots, 0_{k \times k}, 0_{k \times r}, \ldots, 0_{k \times r}, 0_{k \times k}, \ldots, 0_{k \times k}] \]

On the other hand, it is assumed that $x_t$ follows a VARMA(p,q) model

\[ x_t = \sum_{i=1}^{p} A_i x_{t-i} + a_t - \sum_{i=1}^{q} C_i a_{t-i} \]

The model can also be expressed as

\[ A(B) x_t = C(B) a_t \]

where $A(B) = I_r - A_1 B - \cdots - A_p B^p$ and $C(B) = I_r - C_1 B - \cdots - C_q B^q$ are matrix polynomials in B, and the $A_i$ and $C_i$ are $r \times r$ matrices. Without loss of generality, the AR and MA orders can be taken to be the same as the VARMAX(p,q,s) model, and $a_t$ and $\epsilon_t$ are independent white noise processes. Under suitable conditions such as stationarity, $x_t$ is represented by an infinite order moving-average process

\[ x_t = A(B)^{-1} C(B) a_t = \Psi^{x}(B) a_t = \sum_{j=0}^{\infty} \Psi_j^{x} a_{t-j} \]

where $\Psi^{x}(B) = A(B)^{-1} C(B) = \sum_{j=0}^{\infty} \Psi_j^{x} B^j$.
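The coefficients $\Psi_j^{x}$ follow from matching powers of B in $A(B)\Psi^{x}(B) = C(B)$, which gives $\Psi_0^{x} = I_r$ and $\Psi_j^{x} = \sum_{i=1}^{\min(j,p)} A_i \Psi_{j-i}^{x} - C_j$ with $C_j = 0$ for $j > q$. The following Python sketch applies this recursion in the scalar case (r = 1). The function name and the numeric coefficients are illustrative assumptions, not values from this chapter.

```python
def psi_weights(ar, ma, n_terms):
    """Coefficients psi_0, ..., psi_{n_terms-1} of A(B)^{-1} C(B), scalar case.

    ar = [A_1, ..., A_p] and ma = [C_1, ..., C_q] use the sign convention
    A(B) = 1 - A_1 B - ... - A_p B^p and C(B) = 1 - C_1 B - ... - C_q B^q.
    """
    psi = [1.0]
    for j in range(1, n_terms):
        # -C_j term (zero beyond the MA order)
        value = -ma[j - 1] if j <= len(ma) else 0.0
        # sum of A_i * psi_{j-i} over the available AR lags
        for i in range(1, min(j, len(ar)) + 1):
            value += ar[i - 1] * psi[j - i]
        psi.append(value)
    return psi

# Hypothetical ARMA(1,1): x_t = 0.5 x_{t-1} + a_t - 0.2 a_{t-1}
weights = psi_weights(ar=[0.5], ma=[0.2], n_terms=4)
print(weights)
```

The matrix case uses the same recursion with matrix products in place of scalar products.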
The optimal (minimum mean squared error, or minimum MSE) i-step-ahead forecast of $x_{t+i}$ is

\[ x_{t+i|t} = \sum_{j=i}^{\infty} \Psi_j^{x} a_{t+i-j} \]
\[ x_{t+i|t+1} = x_{t+i|t} + \Psi_{i-1}^{x} a_{t+1} \]

For $i > q$,

\[ x_{t+i|t} = \sum_{j=1}^{p} A_j x_{t+i-j|t} \]
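The VARMAX(p,q,s) model has an absolutely convergent representation as

\[ y_t = \Phi(B)^{-1} \Theta^{*}(B) x_t + \Phi(B)^{-1} \Theta(B) \epsilon_t \]
\[ \phantom{y_t} = \Psi^{*}(B) \Psi^{x}(B) a_t + \Phi(B)^{-1} \Theta(B) \epsilon_t \]
\[ \phantom{y_t} = V(B) a_t + \Psi(B) \epsilon_t \]

or

\[ y_t = \sum_{j=0}^{\infty} V_j a_{t-j} + \sum_{j=0}^{\infty} \Psi_j \epsilon_{t-j} \]

where $\Psi(B) = \Phi(B)^{-1} \Theta(B) = \sum_{j=0}^{\infty} \Psi_j B^j$, $\Psi^{*}(B) = \Phi(B)^{-1} \Theta^{*}(B)$, and $V(B) = \Psi^{*}(B) \Psi^{x}(B) = \sum_{j=0}^{\infty} V_j B^j$.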
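The optimal (minimum MSE) i-step-ahead forecast of $y_{t+i}$ is

\[ y_{t+i|t} = \sum_{j=i}^{\infty} V_j a_{t+i-j} + \sum_{j=i}^{\infty} \Psi_j \epsilon_{t+i-j} \]
\[ y_{t+i|t+1} = y_{t+i|t} + V_{i-1} a_{t+1} + \Psi_{i-1} \epsilon_{t+1} \]

for $i = 1, \ldots, v$ with $v = \max(p, q+1)$. For $i > q$,

\[ y_{t+i|t} = \sum_{j=1}^{p} \Phi_j y_{t+i-j|t} + \sum_{j=0}^{s} \Theta_j^{*} x_{t+i-j|t} \]
\[ \phantom{y_{t+i|t}} = \sum_{j=1}^{p} \Phi_j y_{t+i-j|t} + \Theta_0^{*} x_{t+i|t} + \sum_{j=1}^{s} \Theta_j^{*} x_{t+i-j|t} \]
\[ \phantom{y_{t+i|t}} = \sum_{j=1}^{p} \Phi_j y_{t+i-j|t} + \Theta_0^{*} \sum_{j=1}^{p} A_j x_{t+i-j|t} + \sum_{j=1}^{s} \Theta_j^{*} x_{t+i-j|t} \]
\[ \phantom{y_{t+i|t}} = \sum_{j=1}^{p} \Phi_j y_{t+i-j|t} + \sum_{j=1}^{u} (\Theta_0^{*} A_j + \Theta_j^{*}) x_{t+i-j|t} \]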
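where $u = \max(p, s)$. Define $\Pi_j = \Theta_0^{*} A_j + \Theta_j^{*}$. For $i = v > q$ with $v = \max(p, q+1)$, you obtain

\[ y_{t+v|t} = \sum_{j=1}^{p} \Phi_j y_{t+v-j|t} + \sum_{j=1}^{u} \Pi_j x_{t+v-j|t} \quad \text{for } u \le v \]
\[ y_{t+v|t} = \sum_{j=1}^{p} \Phi_j y_{t+v-j|t} + \sum_{j=1}^{r} \Pi_j x_{t+v-j|t} \quad \text{for } u > v \]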
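From the preceding relations, a state equation is

\[ z_{t+1} = F z_t + K x_t^{*} + G e_{t+1} \]

and an observation equation is

\[ y_t = H z_t \]

where

\[ z_t = (y_t', y_{t+1|t}', \ldots, y_{t+v-1|t}', x_t', x_{t+1|t}', \ldots, x_{t+v-1|t}')' \]
\[ x_t^{*} = (x_{t+v-u}', x_{t+v-u+1}', \ldots, x_{t-1}')', \quad e_{t+1} = (a_{t+1}', \epsilon_{t+1}')' \]

\[ F = \begin{bmatrix}
0 & I_k & 0 & \cdots & 0 & 0 & 0 & 0 & \cdots & 0 \\
0 & 0 & I_k & \cdots & 0 & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\Phi_v & \Phi_{v-1} & \Phi_{v-2} & \cdots & \Phi_1 & \Pi_v & \Pi_{v-1} & \Pi_{v-2} & \cdots & \Pi_1 \\
0 & 0 & 0 & \cdots & 0 & 0 & I_r & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & I_r & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0 & A_v & A_{v-1} & A_{v-2} & \cdots & A_1
\end{bmatrix} \]

\[ K = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\Pi_u & \Pi_{u-1} & \cdots & \Pi_{v+1} \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}, \quad
G = \begin{bmatrix}
V_0 & I_k \\
V_1 & \Psi_1 \\
\vdots & \vdots \\
V_{v-1} & \Psi_{v-1} \\
I_r & 0_{r \times k} \\
\Psi_1^{x} & 0_{r \times k} \\
\vdots & \vdots \\
\Psi_{v-1}^{x} & 0_{r \times k}
\end{bmatrix} \]

and

\[ H = [I_k, 0_{k \times k}, \ldots, 0_{k \times k}, 0_{k \times r}, \ldots, 0_{k \times r}] \]

Note that the matrix K and the input vector $x_t^{*}$ are defined only when $u > v$.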
Dynamic Simultaneous Equations Modeling
In the econometrics literature, the VARMAX(p,q,s) model is sometimes written in a form that is slightly different from the one shown in the previous section. This alternative form is referred to as a dynamic simultaneous equations model or a dynamic structural equations model. Since $E(\epsilon_t \epsilon_t') = \Sigma$ is assumed to be positive-definite, there exists a lower triangular matrix $A_0$ with ones on the diagonal such that $A_0 \Sigma A_0' = \Sigma_d$, where $\Sigma_d$ is a diagonal matrix with positive diagonal elements.

\[ A_0 y_t = \sum_{i=1}^{p} A_i y_{t-i} + \sum_{i=0}^{s} C_i^{*} x_{t-i} + C_0 \epsilon_t - \sum_{i=1}^{q} C_i \epsilon_{t-i} \]

where $A_i = A_0 \Phi_i$, $C_i^{*} = A_0 \Theta_i^{*}$, $C_0 = A_0$, and $C_i = A_0 \Theta_i$. As an alternative form,

\[ A_0 y_t = \sum_{i=1}^{p} A_i y_{t-i} + \sum_{i=0}^{s} C_i^{*} x_{t-i} + a_t - \sum_{i=1}^{q} C_i a_{t-i} \]

where $A_i = A_0 \Phi_i$, $C_i^{*} = A_0 \Theta_i^{*}$, $C_i = A_0 \Theta_i A_0^{-1}$, and $a_t = C_0 \epsilon_t$ has a diagonal covariance matrix $\Sigma_d$. The PRINT=(DYNAMIC) option returns the parameter estimates that result from estimating the model in this form. A dynamic simultaneous equations model involves a leading (lower triangular) coefficient matrix for $y_t$ at lag 0 or a leading coefficient matrix for $\epsilon_t$ at lag 0. Such a representation of the VARMAX(p,q,s) model can be more useful in certain circumstances than the standard representation. From the linear combination of the dependent variables obtained by $A_0 y_t$, you can easily see the relationship between the dependent variables in the current time. The following statements provide the dynamic simultaneous equations of the VAR(1) model.

   proc iml;
      sig = {1.0 0.5, 0.5 1.25};
      phi = {1.2 -0.5, 0.6 0.3};
      /* simulate the vector time series */
      call varmasim(y,phi) sigma = sig n = 100 seed = 34657;
      cn = {'y1' 'y2'};
      create simul1 from y[colname=cn];
      append from y;
   quit;

   data simul1;
      set simul1;
      date = intnx( 'year', '01jan1900'd, _n_-1 );
      format date year4.;
   run;

   proc varmax data=simul1;
      model y1 y2 / p=1 noint print=(dynamic);
   run;
This is the same data set and model used in the section “Getting Started: VARMAX Procedure” on page 1858. You can compare the results of the VARMA model form and the dynamic simultaneous equations model form.
Figure 30.25 Dynamic Simultaneous Equations (DYNAMIC Option)

The VARMAX Procedure

Covariances of Innovations
Variable         y1         y2
y1          1.28875    0.00000
y2          0.00000    1.29578

AR
Lag   Variable         y1         y2
0     y1          1.00000    0.00000
      y2         -0.30845    1.00000
1     y1          1.15977   -0.51058
      y2          0.18861    0.54247

Dynamic Model Parameter Estimates
Equation   Parameter   Estimate   Standard Error   t Value   Pr > |t|   Variable
y1         AR1_1_1      1.15977          0.05508     21.06     0.0001   y1(t-1)
           AR1_1_2     -0.51058          0.07140     -7.15     0.0001   y2(t-1)
y2         AR0_2_1      0.30845                                         y1(t)
           AR1_2_1      0.18861          0.05779      3.26     0.0015   y1(t-1)
           AR1_2_2      0.54247          0.07491      7.24     0.0001   y2(t-1)
In Figure 30.4 in the section “Getting Started: VARMAX Procedure” on page 1858, the covariance of $\epsilon_t$ estimated from the VARMAX model form is

\[ \Sigma_\epsilon = \begin{bmatrix} 1.28875 & 0.39751 \\ 0.39751 & 1.41839 \end{bmatrix} \]

Figure 30.25 shows the results from estimating the model as a dynamic simultaneous equations model. By the decomposition of $\Sigma_\epsilon$, you get a diagonal matrix ($\Sigma_a$) and a lower triangular matrix ($A_0$) such that $\Sigma_a = A_0 \Sigma_\epsilon A_0'$, where

\[ \Sigma_a = \begin{bmatrix} 1.28875 & 0 \\ 0 & 1.29578 \end{bmatrix} \quad \text{and} \quad A_0 = \begin{bmatrix} 1 & 0 \\ -0.30845 & 1 \end{bmatrix} \]

The lower triangular matrix ($A_0$) is shown on the left side of the simultaneous equations model. The parameter estimates of the equations system are shown on the right side of the two-equation system. The simultaneous equations model is written as

\[ \begin{bmatrix} 1 & 0 \\ -0.30845 & 1 \end{bmatrix} y_t = \begin{bmatrix} 1.15977 & -0.51058 \\ 0.18861 & 0.54247 \end{bmatrix} y_{t-1} + a_t \]

The resulting two-equation system can be written as

\[ y_{1t} = 1.15977\, y_{1,t-1} - 0.51058\, y_{2,t-1} + a_{1t} \]
\[ y_{2t} = 0.30845\, y_{1t} + 0.18861\, y_{1,t-1} + 0.54247\, y_{2,t-1} + a_{2t} \]
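The decomposition $\Sigma_a = A_0 \Sigma_\epsilon A_0'$ can be checked by hand in the bivariate case. The following Python sketch (hypothetical helper names, not part of the procedure) computes the unit lower triangular $A_0$ and the diagonal $\Sigma_a$ from the innovation covariance reported in Figure 30.4, and should agree with the values in Figure 30.25 up to rounding.

```python
def ldl_2x2(sigma):
    """Return (A0, Sigma_a) with A0 unit lower triangular and
    Sigma_a = A0 * sigma * A0' diagonal, for a symmetric 2x2 sigma."""
    s11, s21, s22 = sigma[0][0], sigma[1][0], sigma[1][1]
    a21 = -s21 / s11                    # removes the part of eps_2 explained by eps_1
    a0 = [[1.0, 0.0], [a21, 1.0]]
    sigma_a = [[s11, 0.0], [0.0, s22 - s21 * s21 / s11]]
    return a0, sigma_a

def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Innovation covariance from Figure 30.4.
sigma_eps = [[1.28875, 0.39751], [0.39751, 1.41839]]
a0, sigma_a = ldl_2x2(sigma_eps)

# Direct check of Sigma_a = A0 * Sigma_eps * A0'.
check = matmul(matmul(a0, sigma_eps), transpose(a0))
print(a0, sigma_a, check)
```

The off-diagonal entry of $A_0$ carries the opposite sign of the AR0_2_1 estimate because that coefficient appears on the right side of the second equation.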
Impulse Response Function

Simple Impulse Response Function (IMPULSE=SIMPLE Option)
The VARMAX(p,q,s) model has a convergent representation

\[ y_t = \Psi^{*}(B) x_t + \Psi(B) \epsilon_t \]

where $\Psi^{*}(B) = \Phi(B)^{-1} \Theta^{*}(B) = \sum_{j=0}^{\infty} \Psi_j^{*} B^j$ and $\Psi(B) = \Phi(B)^{-1} \Theta(B) = \sum_{j=0}^{\infty} \Psi_j B^j$.

The elements of the matrices $\Psi_j$ from the operator $\Psi(B)$, called the impulse response, can be interpreted as the impact that a shock in one variable has on another variable. Let $\psi_{j,in}$ be the (i,n)th element of $\Psi_j$ at lag j, where i is the index for the impulse variable, and n is the index for the response variable (impulse → response). For instance, $\psi_{j,11}$ is an impulse response to $y_{1t} \to y_{1t}$, and $\psi_{j,12}$ is an impulse response to $y_{1t} \to y_{2t}$.
Accumulated Impulse Response Function (IMPULSE=ACCUM Option)
The accumulated impulse response function is the cumulative sum of the impulse response function, $\Psi_l^{a} = \sum_{j=0}^{l} \Psi_j$.
Orthogonalized Impulse Response Function (IMPULSE=ORTH Option)
The MA representation of a VARMA(p,q) model with a standardized white noise innovation process offers another way to interpret a VARMA(p,q) model. Since $\Sigma$ is positive-definite, there is a lower triangular matrix P such that $\Sigma = PP'$. The alternate MA representation of a VARMA(p,q) model is written as

\[ y_t = \Psi^{o}(B) u_t \]

where $\Psi^{o}(B) = \sum_{j=0}^{\infty} \Psi_j^{o} B^j$, $\Psi_j^{o} = \Psi_j P$, and $u_t = P^{-1} \epsilon_t$.

The elements of the matrices $\Psi_j^{o}$, called the orthogonal impulse response, can be interpreted as the effects of the components of the standardized shock process $u_t$ on the process $y_t$ at lag j.
Impulse Response of Transfer Function (IMPULSX=SIMPLE Option)
The coefficient matrix $\Psi_j^{*}$ from the transfer function operator $\Psi^{*}(B)$ can be interpreted as the effects that changes in the exogenous variables $x_t$ have on the output variable $y_t$ at lag j; it is called an impulse response matrix in the transfer function.
Impulse Response of Transfer Function (IMPULSX=ACCUM Option)
The accumulated impulse response in the transfer function is the cumulative sum of the impulse response in the transfer function, $\Psi_l^{*a} = \sum_{j=0}^{l} \Psi_j^{*}$. The asymptotic distributions of the impulse functions can be seen in the section “VAR and VARX Modeling” on page 1941. The following statements provide the impulse response and the accumulated impulse response in the transfer function for a VARX(1,0) model.

   proc varmax data=grunfeld plot=impulse;
      model y1-y3 = x1 x2 / p=1 lagmax=5
                            printform=univariate
                            print=(impulsx=(all) estimates);
   run;
In Figure 30.26, the variables x1 and x2 are impulses and the variables y1, y2, and y3 are responses. You can read the table by matching the impulse → response pairs, such as x1 → y1, x1 → y2, x1 → y3, x2 → y1, x2 → y2, and x2 → y3. In the pair x1 → y1, you can see the long-run responses of y1 to an impulse in x1 (the values are 1.69281, 0.35399, 0.09090, and so on for lag 0, lag 1, lag 2, and so on, respectively).
Figure 30.26 Impulse Response in Transfer Function (IMPULSX= Option)

The VARMAX Procedure

Simple Impulse Response of Transfer Function by Variable
Variable
Response\Impulse   Lag          x1          x2
y1                   0     1.69281    -0.00859
                     1     0.35399     0.01727
                     2     0.09090     0.00714
                     3     0.05136     0.00214
                     4     0.04717     0.00072
                     5     0.04620     0.00040
y2                   0    -6.09850     2.57980
                     1    -5.15484     0.45445
                     2    -3.04168     0.04391
                     3    -2.23797    -0.01376
                     4    -1.98183    -0.01647
                     5    -1.87415    -0.01453
y3                   0    -0.02317    -0.01274
                     1     1.57476    -0.01435
                     2     1.80231     0.00398
                     3     1.77024     0.01062
                     4     1.70435     0.01197
                     5     1.63913     0.01187
Figure 30.27 shows the responses of y1, y2, and y3 to a forecast error impulse in x1.
Figure 30.27 Plot of Impulse Response in Transfer Function
Figure 30.28 shows the accumulated impulse response in transfer function.
Figure 30.28 Accumulated Impulse Response in Transfer Function (IMPULSX= Option)

Accumulated Impulse Response of Transfer Function by Variable
Variable
Response\Impulse   Lag           x1          x2
y1                   0      1.69281    -0.00859
                     1      2.04680     0.00868
                     2      2.13770     0.01582
                     3      2.18906     0.01796
                     4      2.23623     0.01867
                     5      2.28243     0.01907
y2                   0     -6.09850     2.57980
                     1    -11.25334     3.03425
                     2    -14.29502     3.07816
                     3    -16.53299     3.06440
                     4    -18.51482     3.04793
                     5    -20.38897     3.03340
y3                   0     -0.02317    -0.01274
                     1      1.55159    -0.02709
                     2      3.35390    -0.02311
                     3      5.12414    -0.01249
                     4      6.82848    -0.00052
                     5      8.46762     0.01135
Figure 30.29 shows the accumulated responses of y1, y2, and y3 to a forecast error impulse in x1.
Figure 30.29 Plot of Accumulated Impulse Response in Transfer Function
The following statements provide the impulse response function, the accumulated impulse response function, and the orthogonalized impulse response function with their standard errors for a VAR(1) model. Parts of the VARMAX procedure output are shown in Figure 30.30, Figure 30.32, and Figure 30.34.

   proc varmax data=simul1 plot=impulse;
      model y1 y2 / p=1 noint lagmax=5
                    print=(impulse=(all))
                    printform=univariate;
   run;
Figure 30.30 is the output in a univariate format associated with the PRINT=(IMPULSE=) option for the impulse response function. The keyword STD stands for the standard errors of the elements. The matrix for lag 0 is not printed because it is the identity. In Figure 30.30, the variables y1 and y2 in the first row are impulses, and the variables y1 and y2 in the first column are responses. You can read the table by matching the impulse → response pairs, such as y1 → y1, y1 → y2, y2 → y1, and y2 → y2. For example, in the pair y1 → y1 at lag 3, the response is 0.8055. This represents the impact on y1 of a one-unit change in y1 after 3 periods. As the lag gets higher, you can see the long-run responses of y1 to an impulse in itself.
Figure 30.30 Impulse Response Function (IMPULSE= Option)

The VARMAX Procedure

Simple Impulse Response by Variable
Variable
Response\Impulse   Lag          y1          y2
y1                   1     1.15977    -0.51058
                   STD     0.05508     0.05898
                     2     1.06612    -0.78872
                   STD     0.10450     0.10702
                     3     0.80555    -0.84798
                   STD     0.14522     0.14121
                     4     0.47097    -0.73776
                   STD     0.17191     0.15864
                     5     0.14315    -0.52450
                   STD     0.18214     0.16115
y2                   1     0.54634     0.38499
                   STD     0.05779     0.06188
                     2     0.84396    -0.13073
                   STD     0.08481     0.08556
                     3     0.90738    -0.48124
                   STD     0.10307     0.09865
                     4     0.78943    -0.64856
                   STD     0.12318     0.11661
                     5     0.56123    -0.65275
                   STD     0.14236     0.13482
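For a pure VAR(1) model the impulse response matrices reduce to powers of the coefficient matrix, $\Psi_j = \Phi_1^j$. The following Python sketch (hypothetical helper names) takes the lag 1 entries of Figure 30.30 as the estimated $\Phi_1$ and reproduces the higher lags of the figure up to rounding. Rows index the response variable and columns the impulse variable, matching the Response\Impulse layout of the table.

```python
def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def impulse_responses(phi1, max_lag):
    """Return [Psi_0, ..., Psi_max_lag] for a VAR(1), where Psi_j = Phi_1^j."""
    k = len(phi1)
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    psi = [identity]
    for _ in range(max_lag):
        psi.append(matmul(phi1, psi[-1]))
    return psi

# Estimated Phi_1, read from the lag 1 rows of Figure 30.30.
phi1_hat = [[1.15977, -0.51058], [0.54634, 0.38499]]
psi = impulse_responses(phi1_hat, 5)

for lag in range(1, 6):
    print(lag, [[round(v, 5) for v in row] for row in psi[lag]])
```

Small differences in the last printed digit can occur because the inputs are themselves rounded to five decimals.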
Figure 30.31 shows the responses of y1 and y2 to a forecast error impulse in y1 with two standard errors.
Figure 30.31 Plot of Impulse Response
Figure 30.32 is the output in a univariate format associated with the PRINT=(IMPULSE=) option for the accumulated impulse response function. The matrix in terms of the lag 0 does not print since it is the identity.
Figure 30.32 Accumulated Impulse Response Function (IMPULSE= Option)

Accumulated Impulse Response by Variable
Variable
Response\Impulse   Lag          y1          y2
y1                   1     2.15977    -0.51058
                   STD     0.05508     0.05898
                     2     3.22589    -1.29929
                   STD     0.21684     0.22776
                     3     4.03144    -2.14728
                   STD     0.52217     0.53649
                     4     4.50241    -2.88504
                   STD     0.96922     0.97088
                     5     4.64556    -3.40953
                   STD     1.51137     1.47122
y2                   1     0.54634     1.38499
                   STD     0.05779     0.06188
                     2     1.39030     1.25426
                   STD     0.17614     0.18392
                     3     2.29768     0.77302
                   STD     0.36166     0.36874
                     4     3.08711     0.12447
                   STD     0.65129     0.65333
                     5     3.64834    -0.52829
                   STD     1.07510     1.06309
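The accumulated responses are running sums of the simple responses, starting from the lag 0 identity matrix. As a small check, the following Python sketch cumulates the y1 → y1 column of Figure 30.30 (with the unprinted lag 0 value of 1 prepended) and recovers the y1 → y1 column of Figure 30.32. The variable names are illustrative.

```python
from itertools import accumulate

# Simple y1 -> y1 responses: lag 0 (identity diagonal), then lags 1-5
# as printed in Figure 30.30.
simple_y1_to_y1 = [1.0, 1.15977, 1.06612, 0.80555, 0.47097, 0.14315]

# Psi^a_l = sum_{j=0}^{l} Psi_j, applied element by element.
accumulated = list(accumulate(simple_y1_to_y1))

for lag, value in enumerate(accumulated):
    print(lag, round(value, 5))
```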
Figure 30.33 shows the accumulated responses of y1 and y2 to a forecast error impulse in y1 with two standard errors.
Figure 30.33 Plot of Accumulated Impulse Response
Figure 30.34 is the output in a univariate format associated with the PRINT=(IMPULSE=) option for the orthogonalized impulse response function. The two right-hand side columns, y1 and y2, represent the y1_innovation and y2_innovation variables. These are the impulse variables. The left-hand side column contains the response variables, y1 and y2. You can read the table by matching the impulse → response pairs, such as y1_innovation → y1, y1_innovation → y2, y2_innovation → y1, and y2_innovation → y2.
Figure 30.34 Orthogonalized Impulse Response Function (IMPULSE= Option)

Orthogonalized Impulse Response by Variable
Variable
Response\Impulse   Lag          y1          y2
y1                   0     1.13523     0.00000
                   STD     0.08068     0.00000
                     1     1.13783    -0.58120
                   STD     0.10666     0.14110
                     2     0.93412    -0.89782
                   STD     0.13113     0.16776
                     3     0.61756    -0.96528
                   STD     0.15348     0.18595
                     4     0.27633    -0.83981
                   STD     0.16940     0.19230
                     5    -0.02115    -0.59705
                   STD     0.17432     0.18830
y2                   0     0.35016     1.13832
                   STD     0.11676     0.08855
                     1     0.75503     0.43824
                   STD     0.06949     0.10937
                     2     0.91231    -0.14881
                   STD     0.10553     0.13565
                     3     0.86158    -0.54780
                   STD     0.12266     0.14825
                     4     0.66909    -0.73827
                   STD     0.13305     0.15846
                     5     0.40856    -0.74304
                   STD     0.14189     0.16765
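The lag 0 and lag 1 blocks of Figure 30.34 can be reproduced from quantities printed elsewhere in this chapter. The following Python sketch (hypothetical helper names) computes the lower triangular factor P of the innovation covariance from Figure 30.4 and then $\Psi_0^{o} = P$ and $\Psi_1^{o} = \Phi_1 P$, using the lag 1 entries of Figure 30.30 as the estimated $\Phi_1$.

```python
import math

def cholesky_2x2(sigma):
    """Lower triangular P with sigma = P * P' for a 2x2 positive-definite sigma."""
    p11 = math.sqrt(sigma[0][0])
    p21 = sigma[1][0] / p11
    p22 = math.sqrt(sigma[1][1] - p21 * p21)
    return [[p11, 0.0], [p21, p22]]

def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

sigma = [[1.28875, 0.39751], [0.39751, 1.41839]]       # Figure 30.4
phi1_hat = [[1.15977, -0.51058], [0.54634, 0.38499]]   # lag 1 of Figure 30.30

p = cholesky_2x2(sigma)
psi_o_0 = p                      # Psi_0 = I, so Psi^o_0 = P
psi_o_1 = matmul(phi1_hat, p)    # Psi_1 = Phi_1 for a VAR(1)

print([[round(v, 5) for v in row] for row in psi_o_0])
print([[round(v, 5) for v in row] for row in psi_o_1])
```

The zero in the upper right of $\Psi_0^{o}$ reflects the ordering of the variables: the y2 innovation has no contemporaneous effect on y1.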
In Figure 30.4, there is a positive correlation between $\epsilon_{1t}$ and $\epsilon_{2t}$. Therefore, a shock in y1 can be accompanied by a shock in y2 in the same period. For example, in the pair y1_innovation → y2, you can see the long-run responses of y2 to an impulse in y1_innovation. Figure 30.35 shows the orthogonalized responses of y1 and y2 to a forecast error impulse in y1 with two standard errors.
Figure 30.35 Plot of Orthogonalized Impulse Response
Forecasting
The optimal (minimum MSE) l-step-ahead forecast of $y_{t+l}$ is

\[ y_{t+l|t} = \sum_{j=1}^{p} \Phi_j y_{t+l-j|t} + \sum_{j=0}^{s} \Theta_j^{*} x_{t+l-j|t} - \sum_{j=l}^{q} \Theta_j \epsilon_{t+l-j}, \quad l \le q \]
\[ y_{t+l|t} = \sum_{j=1}^{p} \Phi_j y_{t+l-j|t} + \sum_{j=0}^{s} \Theta_j^{*} x_{t+l-j|t}, \quad l > q \]

with $y_{t+l-j|t} = y_{t+l-j}$ and $x_{t+l-j|t} = x_{t+l-j}$ for $l \le j$. For the forecasts $x_{t+l-j|t}$, see the section “State-Space Representation” on page 1913.
Covariance Matrices of Prediction Errors without Exogenous (Independent) Variables
Under the stationarity assumption, the optimal (minimum MSE) l-step-ahead forecast of $y_{t+l}$ has an infinite moving-average form, $y_{t+l|t} = \sum_{j=l}^{\infty} \Psi_j \epsilon_{t+l-j}$. The prediction error of the optimal l-step-ahead forecast is $e_{t+l|t} = y_{t+l} - y_{t+l|t} = \sum_{j=0}^{l-1} \Psi_j \epsilon_{t+l-j}$, with zero mean and covariance matrix,

\[ \Sigma(l) = \mathrm{Cov}(e_{t+l|t}) = \sum_{j=0}^{l-1} \Psi_j \Sigma \Psi_j' = \sum_{j=0}^{l-1} \Psi_j^{o} \Psi_j^{o\prime} \]

where $\Psi_j^{o} = \Psi_j P$ with a lower triangular matrix P such that $\Sigma = PP'$. Under the assumption of normality of the $\epsilon_t$, the l-step-ahead prediction error $e_{t+l|t}$ is also normally distributed as multivariate $N(0, \Sigma(l))$. Hence, it follows that the diagonal elements $\sigma_{ii}^2(l)$ of $\Sigma(l)$ can be used, together with the point forecasts $y_{i,t+l|t}$, to construct l-step-ahead prediction intervals of the future values of the component series, $y_{i,t+l}$. The following statements use the COVPE option to compute the covariance matrices of the prediction errors for a VAR(1) model. The parts of the VARMAX procedure output are shown in Figure 30.36 and Figure 30.37.

   proc varmax data=simul1;
      model y1 y2 / p=1 noint lagmax=5
                    printform=both
                    print=(decompose(5) impulse=(all) covpe(5));
   run;
Figure 30.36 is the output in a matrix format associated with the COVPE option for the prediction error covariance matrices.

Figure 30.36 Covariances of Prediction Errors (COVPE Option)

The VARMAX Procedure

Prediction Error Covariances
Lead   Variable         y1         y2
1      y1          1.28875    0.39751
       y2          0.39751    1.41839
2      y1          2.92119    1.00189
       y2          1.00189    2.18051
3      y1          4.59984    1.98771
       y2          1.98771    3.03498
4      y1          5.91299    3.04856
       y2          3.04856    4.07738
5      y1          6.69463    3.85346
       y2          3.85346    5.07010
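The entries of Figure 30.36 follow directly from the sum $\Sigma(l) = \sum_{j=0}^{l-1} \Psi_j \Sigma \Psi_j'$ with $\Psi_j = \Phi_1^j$ for a VAR(1). The following Python sketch (hypothetical helper names) accumulates that sum, taking the innovation covariance from Figure 30.4 and the lag 1 impulse responses of Figure 30.30 as the estimated $\Phi_1$.

```python
def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def prediction_error_covariances(phi1, sigma, max_lead):
    """Return [Sigma(1), ..., Sigma(max_lead)] for a VAR(1)."""
    k = len(phi1)
    psi = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]  # Psi_0
    total = [[0.0] * k for _ in range(k)]
    result = []
    for _ in range(max_lead):
        term = matmul(matmul(psi, sigma), transpose(psi))   # Psi_j Sigma Psi_j'
        total = [[total[i][j] + term[i][j] for j in range(k)] for i in range(k)]
        result.append(total)
        psi = matmul(phi1, psi)                             # Psi_{j+1} = Phi_1 Psi_j
    return result

sigma = [[1.28875, 0.39751], [0.39751, 1.41839]]
phi1_hat = [[1.15977, -0.51058], [0.54634, 0.38499]]
covpe = prediction_error_covariances(phi1_hat, sigma, 5)

for lead, matrix in enumerate(covpe, start=1):
    print(lead, [[round(v, 5) for v in row] for row in matrix])
```

Taking square roots of the diagonal entries gives the standard errors that would be used for prediction intervals at each lead.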
Figure 30.37 is the output in a univariate format associated with the COVPE option for the prediction error covariances. This printing format more easily explains the prediction error covariances of each variable.
Figure 30.37 Covariances of Prediction Errors

Prediction Error Covariances by Variable
Variable   Lead         y1         y2
y1            1    1.28875    0.39751
              2    2.92119    1.00189
              3    4.59984    1.98771
              4    5.91299    3.04856
              5    6.69463    3.85346
y2            1    0.39751    1.41839
              2    1.00189    2.18051
              3    1.98771    3.03498
              4    3.04856    4.07738
              5    3.85346    5.07010
Covariance Matrices of Prediction Errors in the Presence of Exogenous (Independent) Variables
Exogenous variables can be both stochastic and nonstochastic (deterministic) variables. Considering the forecasts in the VARMAX(p,q,s) model, there are two cases.

When exogenous (independent) variables are stochastic (future values not specified):

As defined in the section “State-Space Representation” on page 1913, $y_{t+l|t}$ has the representation

\[ y_{t+l|t} = \sum_{j=l}^{\infty} V_j a_{t+l-j} + \sum_{j=l}^{\infty} \Psi_j \epsilon_{t+l-j} \]

and hence

\[ e_{t+l|t} = \sum_{j=0}^{l-1} V_j a_{t+l-j} + \sum_{j=0}^{l-1} \Psi_j \epsilon_{t+l-j} \]

Therefore, the covariance matrix of the l-step-ahead prediction error is given as

\[ \Sigma(l) = \mathrm{Cov}(e_{t+l|t}) = \sum_{j=0}^{l-1} V_j \Sigma_a V_j' + \sum_{j=0}^{l-1} \Psi_j \Sigma_\epsilon \Psi_j' \]

where $\Sigma_a$ is the covariance of the white noise series $a_t$, and $a_t$ is the white noise series for the VARMA(p,q) model of exogenous (independent) variables, which is assumed not to be correlated with $\epsilon_t$ or its lags.

When future exogenous (independent) variables are specified:
The optimal forecast $y_{t+l|t}$ of $y_t$ conditioned on the past information and also on known future values $x_{t+1}, \ldots, x_{t+l}$ can be represented as

\[ y_{t+l|t} = \sum_{j=0}^{\infty} \Psi_j^{*} x_{t+l-j} + \sum_{j=l}^{\infty} \Psi_j \epsilon_{t+l-j} \]

and the forecast error is

\[ e_{t+l|t} = \sum_{j=0}^{l-1} \Psi_j \epsilon_{t+l-j} \]

Thus, the covariance matrix of the l-step-ahead prediction error is given as

\[ \Sigma(l) = \mathrm{Cov}(e_{t+l|t}) = \sum_{j=0}^{l-1} \Psi_j \Sigma_\epsilon \Psi_j' \]
Decomposition of Prediction Error Covariances
In the relation $\Sigma(l) = \sum_{j=0}^{l-1} \Psi_j^{o} \Psi_j^{o\prime}$, the diagonal elements can be interpreted as providing a decomposition of the l-step-ahead prediction error covariance $\sigma_{ii}^2(l)$ for each component series $y_{it}$ into contributions from the components of the standardized innovations $\epsilon_t$. If you denote the (i,n)th element of $\Psi_j^{o}$ by $\psi_{j,in}$, the MSE of $y_{i,t+l|t}$ is

\[ \mathrm{MSE}(y_{i,t+l|t}) = E(y_{i,t+l} - y_{i,t+l|t})^2 = \sum_{j=0}^{l-1} \sum_{n=1}^{k} \psi_{j,in}^2 \]

Note that $\sum_{j=0}^{l-1} \psi_{j,in}^2$ is interpreted as the contribution of innovations in variable n to the prediction error covariance of the l-step-ahead forecast of variable i. The proportion, $\omega_{l,in}$, of the l-step-ahead forecast error covariance of variable i accounted for by the innovations in variable n is

\[ \omega_{l,in} = \sum_{j=0}^{l-1} \psi_{j,in}^2 / \mathrm{MSE}(y_{i,t+l|t}) \]

The following statements use the DECOMPOSE option to compute the decomposition of prediction error covariances and their proportions for a VAR(1) model:

   proc varmax data=simul1;
      model y1 y2 / p=1 noint print=(decompose(15))
                    printform=univariate;
   run;
The proportions of the decomposition of the prediction error covariances of the two variables are given in Figure 30.38. The output shows that about 91.356% of the one-step-ahead prediction error covariance of the variable $y_{2t}$ is accounted for by its own innovations and about 8.644% is accounted for by $y_{1t}$ innovations.
Figure 30.38 Decomposition of Prediction Error Covariances (DECOMPOSE Option)

Proportions of Prediction Error Covariances by Variable
Variable   Lead         y1         y2
y1            1    1.00000    0.00000
              2    0.88436    0.11564
              3    0.75132    0.24868
              4    0.64897    0.35103
              5    0.58460    0.41540
y2            1    0.08644    0.91356
              2    0.31767    0.68233
              3    0.50247    0.49753
              4    0.55607    0.44393
              5    0.53549    0.46451
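The proportions in Figure 30.38 are ratios of cumulated squared orthogonalized impulse responses. The following Python sketch (hypothetical helper names) computes them for a VAR(1), where $\Psi_{j+1}^{o} = \Phi_1 \Psi_j^{o}$ and $\Psi_0^{o} = P$, using the innovation covariance from Figure 30.4 and the lag 1 impulse responses of Figure 30.30 as the estimated $\Phi_1$.

```python
import math

def cholesky_2x2(sigma):
    """Lower triangular P with sigma = P * P' for a 2x2 positive-definite sigma."""
    p11 = math.sqrt(sigma[0][0])
    p21 = sigma[1][0] / p11
    p22 = math.sqrt(sigma[1][1] - p21 * p21)
    return [[p11, 0.0], [p21, p22]]

def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def decomposition_proportions(phi1, sigma, max_lead):
    """Return a list over leads of matrices omega[i][n]: the share of the
    prediction error variance of variable i due to innovations in variable n."""
    k = 2
    psi_o = cholesky_2x2(sigma)                  # Psi^o_0 = P
    contrib = [[0.0] * k for _ in range(k)]      # running sums of psi^2_{j,in}
    result = []
    for _ in range(max_lead):
        for i in range(k):
            for n in range(k):
                contrib[i][n] += psi_o[i][n] ** 2
        # The row sum is the MSE of variable i at this lead.
        result.append([[contrib[i][n] / sum(contrib[i]) for n in range(k)]
                       for i in range(k)])
        psi_o = matmul(phi1, psi_o)              # next orthogonalized response
    return result

sigma = [[1.28875, 0.39751], [0.39751, 1.41839]]
phi1_hat = [[1.15977, -0.51058], [0.54634, 0.38499]]
omega = decomposition_proportions(phi1_hat, sigma, 5)

for lead, matrix in enumerate(omega, start=1):
    print(lead, [[round(v, 5) for v in row] for row in matrix])
```

Each row of a proportion matrix sums to 1 by construction, which is a quick sanity check on any such table.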
Forecasting of the Centered Series
If the CENTER option is specified, the sample mean vector is added to the forecast.
Forecasting of the Differenced Series
If dependent (endogenous) variables are differenced, the final forecasts and their prediction error covariances are produced by integrating those of the differenced series. However, if the PRIOR option is specified, the forecasts and their prediction error variances of the differenced series are produced. Let $z_t$ be the original series with some appended zero values that correspond to the unobserved past observations. Let $\Delta(B)$ be the $k \times k$ matrix polynomial in the backshift operator that corresponds to the differencing specified by the MODEL statement. The off-diagonal elements of $\Delta_i$ are zero, and the diagonal elements can be different. Then $y_t = \Delta(B) z_t$. This gives the relationship

\[ z_t = \Delta^{-1}(B) y_t = \sum_{j=0}^{\infty} \Lambda_j y_{t-j} \]

where $\Delta^{-1}(B) = \sum_{j=0}^{\infty} \Lambda_j B^j$ and $\Lambda_0 = I_k$.

The l-step-ahead prediction of $z_{t+l}$ is

\[ z_{t+l|t} = \sum_{j=0}^{l-1} \Lambda_j y_{t+l-j|t} + \sum_{j=l}^{\infty} \Lambda_j y_{t+l-j} \]
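The l-step-ahead prediction error of $z_{t+l}$ is

\[ \sum_{j=0}^{l-1} \Lambda_j \left( y_{t+l-j} - y_{t+l-j|t} \right) = \sum_{j=0}^{l-1} \left( \sum_{u=0}^{j} \Lambda_u \Psi_{j-u} \right) \epsilon_{t+l-j} \]

Letting $\Sigma_z(0) = 0$, the covariance matrix of the l-step-ahead prediction error of $z_{t+l}$, $\Sigma_z(l)$, is

\[ \Sigma_z(l) = \sum_{j=0}^{l-1} \left( \sum_{u=0}^{j} \Lambda_u \Psi_{j-u} \right) \Sigma_\epsilon \left( \sum_{u=0}^{j} \Lambda_u \Psi_{j-u} \right)' \]
\[ \phantom{\Sigma_z(l)} = \Sigma_z(l-1) + \left( \sum_{j=0}^{l-1} \Lambda_j \Psi_{l-1-j} \right) \Sigma_\epsilon \left( \sum_{j=0}^{l-1} \Lambda_j \Psi_{l-1-j} \right)' \]

If there are stochastic exogenous (independent) variables, the covariance matrix of the l-step-ahead prediction error of $z_{t+l}$, $\Sigma_z(l)$, is

\[ \Sigma_z(l) = \Sigma_z(l-1) + \left( \sum_{j=0}^{l-1} \Lambda_j \Psi_{l-1-j} \right) \Sigma_\epsilon \left( \sum_{j=0}^{l-1} \Lambda_j \Psi_{l-1-j} \right)' + \left( \sum_{j=0}^{l-1} \Lambda_j V_{l-1-j} \right) \Sigma_a \left( \sum_{j=0}^{l-1} \Lambda_j V_{l-1-j} \right)' \]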
Tentative Order Selection

Sample Cross-Covariance and Cross-Correlation Matrices
Given a stationary multivariate time series $y_t$, cross-covariance matrices are

\[ \Gamma(l) = E[(y_t - \mu)(y_{t+l} - \mu)'] \]

where $\mu = E(y_t)$, and cross-correlation matrices are

\[ \rho(l) = D^{-1} \Gamma(l) D^{-1} \]

where D is a diagonal matrix with the standard deviations of the components of $y_t$ on the diagonal. The sample cross-covariance matrix at lag l, denoted as $C(l)$, is computed as

\[ \hat{\Gamma}(l) = C(l) = \frac{1}{T} \sum_{t=1}^{T-l} \tilde{y}_t \tilde{y}_{t+l}' \]

where $\tilde{y}_t$ is the centered data and T is the number of nonmissing observations. Thus, $\hat{\Gamma}(l)$ has (i,j)th element $\hat{\gamma}_{ij}(l) = c_{ij}(l)$. The sample cross-correlation matrix at lag l is computed as

\[ \hat{\rho}_{ij}(l) = c_{ij}(l) / [c_{ii}(0) c_{jj}(0)]^{1/2}, \quad i, j = 1, \ldots, k \]

The following statements use the CORRY option to compute the sample cross-correlation matrices and their summary indicator plots in terms of +, -, and ., where + indicates significant positive cross-correlations, - indicates significant negative cross-correlations, and . indicates insignificant cross-correlations.

   proc varmax data=simul1;
      model y1 y2 / p=1 noint lagmax=3 print=(corry)
                    printform=univariate;
   run;
Figure 30.39 shows the sample cross-correlation matrices of $y_{1t}$ and $y_{2t}$. As shown, the sample autocorrelation functions for each variable decay quickly, but are significant with respect to two standard errors.

Figure 30.39 Cross-Correlations (CORRY Option)

The VARMAX Procedure

Cross Correlations of Dependent Series by Variable
Variable   Lag         y1         y2
y1           0    1.00000    0.67041
             1    0.83143    0.84330
             2    0.56094    0.81972
             3    0.26629    0.66154
y2           0    0.67041    1.00000
             1    0.29707    0.77132
             2   -0.00936    0.48658
             3   -0.22058    0.22014

Schematic Representation of Cross Correlations
Variable/Lag    0     1     2     3
y1              ++    ++    ++    ++
y2              ++    ++    .+    -+

+ is > 2*std error, - is < -2*std error, . is between
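The sample cross-correlation formula and the schematic flags can be sketched in a few lines of Python. The function names and the toy data below are illustrative assumptions. The flag rule also assumes that the standard error is approximated by $1/\sqrt{T}$; the text above states only that the threshold is two standard errors.

```python
import math

def sample_cross_correlation(series, lag):
    """Sample cross-correlation matrix at a nonnegative lag.

    series is a list of T observations, each a list of k values.
    Element [i][j] is c_ij(lag) / sqrt(c_ii(0) * c_jj(0)), with
    c_ij(l) = (1/T) * sum_{t=1}^{T-l} ytilde_{i,t} * ytilde_{j,t+l}.
    """
    t_len, k = len(series), len(series[0])
    means = [sum(row[i] for row in series) / t_len for i in range(k)]
    centered = [[row[i] - means[i] for i in range(k)] for row in series]

    def cov(l):
        return [[sum(centered[t][i] * centered[t + l][j]
                     for t in range(t_len - l)) / t_len
                 for j in range(k)] for i in range(k)]

    c0, cl = cov(0), cov(lag)
    return [[cl[i][j] / math.sqrt(c0[i][i] * c0[j][j]) for j in range(k)]
            for i in range(k)]

def schematic_flag(value, n_obs):
    """'+', '-', or '.' using the assumed threshold 2 / sqrt(n_obs)."""
    bound = 2.0 / math.sqrt(n_obs)
    if value > bound:
        return "+"
    if value < -bound:
        return "-"
    return "."

# Toy bivariate data, for illustration only.
toy = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
rho0 = sample_cross_correlation(toy, 0)
rho1 = sample_cross_correlation(toy, 1)

# Flags for the y2 row, y1 column of Figure 30.39 with T = 100.
flags = [schematic_flag(v, 100) for v in (0.67041, 0.29707, -0.00936, -0.22058)]
print(rho0, rho1, flags)
```

Under the assumed threshold of 0.2 for T = 100, the flags for the printed correlations agree with the first symbol of each y2 entry in the schematic representation.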
Partial Autoregressive Matrices
For each $m = 1, 2, \ldots, p$ you can define a sequence of matrices $\Phi_{mm}$, which is called the partial autoregression matrices of lag m, as the solution for $\Phi_{mm}$ to the Yule-Walker equations of order m,

\[ \Gamma(l) = \sum_{i=1}^{m} \Gamma(l-i) \Phi_{im}', \quad l = 1, 2, \ldots, m \]

The sequence of the partial autoregression matrices $\Phi_{mm}$ of order m has the characteristic property that if the process follows the AR(p), then $\Phi_{pp} = \Phi_p$ and $\Phi_{mm} = 0$ for $m > p$. Hence, the matrices $\Phi_{mm}$ have the cutoff property for a VAR(p) model, and so they can be useful in the identification of the order of a pure VAR model. The following statements use the PARCOEF option to compute the partial autoregression matrices:

   proc varmax data=simul1;
      model y1 y2 / p=1 noint lagmax=3
                    printform=univariate
                    print=(corry parcoef pcorr pcancorr roots);
   run;
Figure 30.40 shows that the model can be obtained by an AR order $m = 1$ since partial autoregression matrices are insignificant after lag 1 with respect to two standard errors. The matrix for lag 1 is the same as the Yule-Walker autoregressive matrix.

Figure 30.40 Partial Autoregression Matrices (PARCOEF Option)

The VARMAX Procedure

Partial Autoregression
Lag   Variable         y1         y2
1     y1          1.14844   -0.50954
      y2          0.54985    0.37409
2     y1         -0.00724    0.05138
      y2          0.02409    0.05909
3     y1         -0.02578    0.03885
      y2         -0.03720    0.10149

Schematic Representation of Partial Autoregression
Variable/Lag    1     2     3
y1              +-    ..    ..
y2              ++    ..    ..

+ is > 2*std error, - is < -2*std error, . is between
Partial Correlation Matrices
Define the forward autoregression

\[ y_t = \sum_{i=1}^{m-1} \Phi_{i,m-1} y_{t-i} + u_{m,t} \]

and the backward autoregression

\[ y_{t-m} = \sum_{i=1}^{m-1} \Phi_{i,m-1}^{*} y_{t-m+i} + u_{m,t-m}^{*} \]

The matrices $P(m)$ defined by Ansley and Newbold (1979) are given by

\[ P(m) = \{\Sigma_{m-1}^{*}\}^{-1/2} \, \Phi_{mm}' \, \{\Sigma_{m-1}\}^{1/2} \]

where

\[ \Sigma_{m-1} = \mathrm{Cov}(u_{m,t}) = \Gamma(0) - \sum_{i=1}^{m-1} \Gamma(-i) \Phi_{i,m-1}' \]

and

\[ \Sigma_{m-1}^{*} = \mathrm{Cov}(u_{m,t-m}^{*}) = \Gamma(0) - \sum_{i=1}^{m-1} \Gamma(m-i) \Phi_{m-i,m-1}^{*\prime} \]

$P(m)$ are the partial cross-correlation matrices at lag m between the elements of $y_t$ and $y_{t-m}$, given $y_{t-1}, \ldots, y_{t-m+1}$. The matrices $P(m)$ have the cutoff property for a VAR(p) model, and so they can be useful in the identification of the order of a pure VAR structure. The following statements use the PCORR option to compute the partial cross-correlation matrices:

   proc varmax data=simul1;
      model y1 y2 / p=1 noint lagmax=3
                    print=(pcorr)
                    printform=univariate;
   run;
The partial cross-correlation matrices in Figure 30.41 are insignificant after lag 1 with respect to two standard errors. This indicates that an AR order of $m = 1$ can be an appropriate choice.
Figure 30.41 Partial Correlations (PCORR Option)

The VARMAX Procedure

Partial Cross Correlations by Variable
Variable   Lag         y1         y2
y1           1    0.80348    0.42672
             2    0.00276    0.03978
             3   -0.01091    0.00032
y2           1   -0.30946    0.71906
             2    0.04676    0.07045
             3    0.01993    0.10676

Schematic Representation of Partial Cross Correlations
Variable/Lag    1     2     3
y1              ++    ..    ..
y2              -+    ..    ..

+ is > 2*std error, - is < -2*std error, . is between
Partial Canonical Correlation Matrices

The partial canonical correlations at lag $m$ between the vectors $y_t$ and $y_{t-m}$, given $y_{t-1}, \ldots, y_{t-m+1}$, are $1 \ge \rho_1(m) \ge \rho_2(m) \ge \cdots \ge \rho_k(m)$. The partial canonical correlations are the canonical correlations between the residual series $u_{m,t}$ and $u^{*}_{m,t-m}$, where $u_{m,t}$ and $u^{*}_{m,t-m}$ are defined in the previous section. Thus, the squared partial canonical correlations $\rho_i^2(m)$ are the eigenvalues of the matrix

$$\{\mathrm{Cov}(u_{m,t})\}^{-1}\, E(u_{m,t}\, u^{*\prime}_{m,t-m})\, \{\mathrm{Cov}(u^{*}_{m,t-m})\}^{-1}\, E(u^{*}_{m,t-m}\, u'_{m,t}) = \Phi^{*\prime}_{mm}\, \Phi'_{mm}$$

It follows that the test statistic to test for $\Phi_m = 0$ in the VAR model of order $m > p$ is approximately

$$(T-m)\, \mathrm{tr}\{\Phi^{*\prime}_{mm}\, \Phi'_{mm}\} \approx (T-m) \sum_{i=1}^{k} \rho_i^2(m)$$

and has an asymptotic chi-square distribution with $k^2$ degrees of freedom for $m > p$.

The following statements use the PCANCORR option to compute the partial canonical correlations:

   proc varmax data=simul1;
      model y1 y2 / p=1 noint lagmax=3 print=(pcancorr);
   run;
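The test statistic above can be checked by hand. The following Python sketch is not part of PROC VARMAX; it recomputes $(T-m)\sum_i \rho_i^2(m)$ from the partial canonical correlations that the procedure reports for this example (Figure 30.42). The series length T = 100 is an assumption, chosen because it reproduces the reported chi-square values; the function name is hypothetical.

```python
def partial_canonical_chisq(correlations, n_obs, lag):
    """Approximate statistic (T - m) * sum of squared partial canonical correlations."""
    return (n_obs - lag) * sum(r * r for r in correlations)

# Partial canonical correlations reported for lags 1 to 3 (Figure 30.42).
reported = {
    1: (0.91783, 0.77335),
    2: (0.09171, 0.01816),
    3: (0.10861, 0.01078),
}

T = 100  # assumed number of observations; not stated in this section
k = 2    # number of dependent variables, so the test has k*k degrees of freedom

stats = {m: partial_canonical_chisq(rho, T, m) for m, rho in reported.items()}
for m, value in stats.items():
    print(f"lag {m}: chi-square = {value:.2f}, df = {k * k}")
```

Under that assumption the three statistics round to the Chi-Square column of Figure 30.42.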
Figure 30.42 shows that the partial canonical correlations $\rho_i(m)$ between $y_t$ and $y_{t-m}$ are {0.918, 0.773}, {0.092, 0.018}, and {0.109, 0.011} for lags m = 1 to 3. After lag m = 1, the partial canonical correlations are insignificant with respect to the 0.05 significance level, indicating that an AR order of m = 1 can be an appropriate choice.

Figure 30.42 Partial Canonical Correlations (PCANCORR Option)

The VARMAX Procedure

Partial Canonical Correlations

   Lag   Correlation1   Correlation2   DF   Chi-Square   Pr > ChiSq
     1        0.91783        0.77335    4       142.61       <.0001
     2        0.09171        0.01816    4         0.86       0.9307
     3        0.10861        0.01078    4         1.16       0.8854
The Minimum Information Criterion (MINIC) Method

The minimum information criterion (MINIC) method can tentatively identify the orders of a VARMA(p,q) process. Note that Spliid (1983), Koreisha and Pukkila (1989), and Quinn (1980) proposed this method. The first step of this method is to obtain estimates of the innovations series, $\epsilon_t$, from the VAR($p_\epsilon$), where $p_\epsilon$ is chosen sufficiently large. The choice of the autoregressive order, $p_\epsilon$, is determined by use of a selection criterion. From the selected VAR($p_\epsilon$) model, you obtain estimates of residual series

$$\tilde{\epsilon}_t = y_t - \sum_{i=1}^{p_\epsilon} \hat{\Phi}_i^{p_\epsilon}\, y_{t-i} - \hat{\delta}^{p_\epsilon}, \quad t = p_\epsilon + 1, \ldots, T$$

In the second step, you select the order $(p, q)$ of the VARMA model for $p$ in $(p_{min} : p_{max})$ and $q$ in $(q_{min} : q_{max})$

$$y_t = \delta + \sum_{i=1}^{p} \Phi_i\, y_{t-i} - \sum_{i=1}^{q} \Theta_i\, \tilde{\epsilon}_{t-i} + \epsilon_t$$

which minimizes a selection criterion like SBC or HQ.

The following statements use the MINIC= option to compute a table that contains the information criterion associated with various AR and MA orders:

   proc varmax data=simul1;
      model y1 y2 / p=1 noint minic=(p=3 q=3);
   run;
Figure 30.43 shows the output associated with the MINIC= option. The criterion takes the smallest value at AR order 1.
Figure 30.43 MINIC= Option

The VARMAX Procedure

Minimum Information Criterion Based on AICC

   Lag         MA 0        MA 1        MA 2        MA 3
   AR 0   3.3574947   3.0570091   2.7272377   2.3526368
   AR 1   0.5544431   0.6507991   0.7120257   0.7859524
   AR 2   0.6369334   0.7407785    0.802996    0.850487
   AR 3   0.7235629   0.8131261   0.8395558   0.9094856
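Reading the tentative order off a MINIC table amounts to finding the smallest entry. A minimal Python sketch, using the values of Figure 30.43 (the helper name `best_order` is hypothetical and not part of any SAS interface):

```python
# Rows are AR orders 0-3, columns are MA orders 0-3 (values from Figure 30.43).
minic_table = [
    [3.3574947, 3.0570091, 2.7272377, 2.3526368],
    [0.5544431, 0.6507991, 0.7120257, 0.7859524],
    [0.6369334, 0.7407785, 0.8029960, 0.8504870],
    [0.7235629, 0.8131261, 0.8395558, 0.9094856],
]

def best_order(table):
    """Return (p, q, value) for the smallest criterion value in the table."""
    cells = (
        (p, q, value)
        for p, row in enumerate(table)
        for q, value in enumerate(row)
    )
    return min(cells, key=lambda cell: cell[2])

p, q, value = best_order(minic_table)
print(f"tentative order: AR {p}, MA {q} (criterion {value})")
```

The minimum sits in the AR 1, MA 0 cell, which agrees with the statement that the criterion takes its smallest value at AR order 1.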
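VAR and VARX Modeling

The pth-order VAR process is written as

$$y_t - \mu = \sum_{i=1}^{p} \Phi_i (y_{t-i} - \mu) + \epsilon_t \quad \text{or} \quad \Phi(B)(y_t - \mu) = \epsilon_t$$

with $\Phi(B) = I_k - \sum_{i=1}^{p} \Phi_i B^i$.

Equivalently, it can be written as

$$y_t = \delta + \sum_{i=1}^{p} \Phi_i\, y_{t-i} + \epsilon_t \quad \text{or} \quad \Phi(B)\, y_t = \delta + \epsilon_t$$

with $\delta = (I_k - \sum_{i=1}^{p} \Phi_i)\mu$.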
Stationarity

For stationarity, the VAR process must be expressible in the convergent causal infinite MA form as

$$y_t = \mu + \sum_{j=0}^{\infty} \Psi_j\, \epsilon_{t-j}$$

where $\Psi(B) = \Phi(B)^{-1} = \sum_{j=0}^{\infty} \Psi_j B^j$ with $\sum_{j=0}^{\infty} \|\Psi_j\| < \infty$, where $\|A\|$ denotes a norm for the matrix $A$ such as $\|A\|^2 = \mathrm{tr}\{A'A\}$. The matrix $\Psi_j$ can be recursively obtained from the relation $\Phi(B)\Psi(B) = I$; it is

$$\Psi_j = \Phi_1 \Psi_{j-1} + \Phi_2 \Psi_{j-2} + \cdots + \Phi_p \Psi_{j-p}$$

where $\Psi_0 = I_k$ and $\Psi_j = 0$ for $j < 0$.

The stationarity condition is satisfied if all roots of $|\Phi(z)| = 0$ are outside of the unit circle. The stationarity condition is equivalent to the condition in the corresponding VAR(1) representation, $Y_t = \boldsymbol{\Phi} Y_{t-1} + \boldsymbol{\varepsilon}_t$, that all eigenvalues of the $kp \times kp$ companion matrix $\boldsymbol{\Phi}$ be less than one in absolute value, where $Y_t = (y_t', \ldots, y_{t-p+1}')'$, $\boldsymbol{\varepsilon}_t = (\epsilon_t', 0', \ldots, 0')'$, and

$$\boldsymbol{\Phi} = \begin{bmatrix}
\Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\
I_k & 0 & \cdots & 0 & 0 \\
0 & I_k & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & I_k & 0
\end{bmatrix}$$
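The recursion for $\Psi_j$ and the layout of the companion matrix can be sketched in a few lines of Python. This is an illustration only: the coefficient matrices `phi1` and `phi2` are hypothetical values for a bivariate VAR(2), not estimates from this chapter, and the helper names are made up for the example.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][n] * b[n][j] for n in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def ma_weights(phis, n_weights):
    """Psi_0..Psi_n from Psi_j = sum_i Phi_i Psi_{j-i}, Psi_0 = I, Psi_j = 0 for j < 0."""
    k = len(phis[0])
    psis = [identity(k)]
    for j in range(1, n_weights + 1):
        total = [[0.0] * k for _ in range(k)]
        for i, phi in enumerate(phis, start=1):
            if j - i >= 0:
                total = mat_add(total, mat_mul(phi, psis[j - i]))
        psis.append(total)
    return psis

def companion(phis):
    """kp x kp companion matrix: [Phi_1 ... Phi_p] on top, [I, 0] below."""
    k, p = len(phis[0]), len(phis)
    top = [[value for phi in phis for value in phi[row]] for row in range(k)]
    bottom = [[1.0 if col == row else 0.0 for col in range(k * p)]
              for row in range(k * (p - 1))]
    return top + bottom

# Hypothetical bivariate VAR(2) coefficients, for illustration only.
phi1 = [[0.5, 0.1], [0.0, 0.4]]
phi2 = [[0.2, 0.0], [0.0, 0.1]]
psis = ma_weights([phi1, phi2], 2)
big_phi = companion([phi1, phi2])
print(psis[2])
print(big_phi)
```

Here `psis[2]` equals `phi1*phi1 + phi2`, as the recursion requires. Checking stationarity then reduces to computing the eigenvalues of `big_phi` with any linear algebra routine and confirming that their moduli are below one.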
If the stationarity condition is not satisfied, a nonstationary model (a differenced model or an error correction model) might be more appropriate.

The following statements estimate a VAR(1) model and use the ROOTS option to compute the characteristic polynomial roots:

   proc varmax data=simul1;
      model y1 y2 / p=1 noint print=(roots);
   run;

Figure 30.44 shows the output associated with the ROOTS option, which indicates that the series is stationary since the modulus of the eigenvalue is less than one.

Figure 30.44 Stationarity (ROOTS Option)

The VARMAX Procedure

Roots of AR Characteristic Polynomial

   Index      Real   Imaginary   Modulus    Radian     Degree
       1   0.77238     0.35899    0.8517    0.4351    24.9284
       2   0.77238    -0.35899    0.8517   -0.4351   -24.9284
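The Modulus, Radian, and Degree columns of Figure 30.44 follow directly from the real and imaginary parts. A short Python check with the standard `cmath` module (the variable names are illustrative; the input values are the Real and Imaginary columns of the figure):

```python
import cmath
import math

# Real and imaginary parts reported by the ROOTS option (Figure 30.44).
roots = [complex(0.77238, 0.35899), complex(0.77238, -0.35899)]

# For each root: modulus, angle in radians, angle in degrees.
summary = [(abs(z), cmath.phase(z), math.degrees(cmath.phase(z))) for z in roots]

# The stationarity condition requires every modulus to be below one.
is_stationary = all(modulus < 1.0 for modulus, _, _ in summary)

for modulus, radian, degree in summary:
    print(f"modulus={modulus:.4f} radian={radian:.4f} degree={degree:.4f}")
print("stationary:", is_stationary)
```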
Parameter Estimation

Consider the stationary VAR(p) model

$$y_t = \delta + \sum_{i=1}^{p} \Phi_i\, y_{t-i} + \epsilon_t$$

where $y_{-p+1}, \ldots, y_0$ are assumed to be available (for convenience of notation). This can be represented by the general form of the multivariate linear model,

$$Y = XB + E \quad \text{or} \quad y = (X \otimes I_k)\beta + e$$

where

$$Y = (y_1, \ldots, y_T)'$$
$$B = (\delta, \Phi_1, \ldots, \Phi_p)'$$
$$X = (X_0, \ldots, X_{T-1})'$$
$$X_t = (1, y_t', \ldots, y_{t-p+1}')'$$
$$E = (\epsilon_1, \ldots, \epsilon_T)'$$
$$y = \mathrm{vec}(Y'), \quad \beta = \mathrm{vec}(B'), \quad e = \mathrm{vec}(E')$$

with vec denoting the column stacking operator.

The conditional least squares estimator of $\beta$ is

$$\hat{\beta} = ((X'X)^{-1} X' \otimes I_k)\, y$$

and the estimate of $\Sigma$ is

$$\hat{\Sigma} = (T - (kp + 1))^{-1} \sum_{t=1}^{T} \hat{\epsilon}_t \hat{\epsilon}_t'$$

where $\hat{\epsilon}_t$ is the residual vector. The LS estimator is consistent and asymptotically normal:

$$\sqrt{T}(\hat{\beta} - \beta) \xrightarrow{d} N(0, \Gamma_p^{-1} \otimes \Sigma)$$

where $X'X/T$ converges in probability to $\Gamma_p$ and $\xrightarrow{d}$ denotes convergence in distribution.

The (conditional) maximum likelihood estimator in the VAR(p) model is equal to the (conditional) least squares estimator on the assumption of normality of the error vectors.
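To make the estimator concrete, the following Python sketch works through the simplest case, k = 1 and p = 1, where $X_t = (1, y_t)'$ and the normal equations are two by two. It is a hand-rolled illustration under those assumptions, not the algorithm PROC VARMAX uses internally; the function name and the test series are hypothetical.

```python
def fit_ar1(series):
    """Conditional least squares for y_t = delta + phi * y_{t-1} + e_t (k = 1, p = 1).

    series[0] plays the role of the presample value y_0, so T = len(series) - 1.
    Returns (delta_hat, phi_hat, sigma2_hat).
    """
    x = series[:-1]          # lagged values y_0, ..., y_{T-1}
    y = series[1:]           # responses y_1, ..., y_T
    n = len(y)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx  # determinant of X'X for X_t = (1, y_t)'
    phi = (n * sxy - sx * sy) / det
    delta = (sy - phi * sx) / n
    residuals = [b - delta - phi * a for a, b in zip(x, y)]
    # Divisor T - (kp + 1) with k = p = 1.
    sigma2 = sum(e * e for e in residuals) / (n - 2)
    return delta, phi, sigma2

# Noise-free series generated by y_t = 1 + 0.5 * y_{t-1}, starting from y_0 = 0.
series = [0.0]
for _ in range(5):
    series.append(1.0 + 0.5 * series[-1])

delta_hat, phi_hat, sigma2_hat = fit_ar1(series)
print(delta_hat, phi_hat, sigma2_hat)
```

Because the example series has no noise, the fit recovers the intercept and the AR coefficient up to rounding, and the residual variance is essentially zero.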
Asymptotic Distributions of Impulse Response Functions

As before, vec denotes the column stacking operator and vech is the corresponding operator that stacks the elements on and below the diagonal. For any $k \times k$ matrix $A$, the commutation matrix $K_k$ is defined as $K_k \mathrm{vec}(A) = \mathrm{vec}(A')$; the duplication matrix $D_k$ is defined as $D_k \mathrm{vech}(A) = \mathrm{vec}(A)$; the elimination matrix $L_k$ is defined as $L_k \mathrm{vec}(A) = \mathrm{vech}(A)$.

The asymptotic distribution of the impulse response function (Lütkepohl 1993) is

$$\sqrt{T}\, \mathrm{vec}(\hat{\Psi}_j - \Psi_j) \xrightarrow{d} N(0, G_j \Sigma_\beta G_j'), \quad j = 1, 2, \ldots$$

where $\Sigma_\beta = \Gamma_p^{-1} \otimes \Sigma$ and

$$G_j = \frac{\partial\, \mathrm{vec}(\Psi_j)}{\partial \beta'} = \sum_{i=0}^{j-1} J (\boldsymbol{\Phi}')^{j-1-i} \otimes \Psi_i$$

where $J = [I_k, 0, \ldots, 0]$ is a $k \times kp$ matrix and $\boldsymbol{\Phi}$ is a $kp \times kp$ companion matrix.

The asymptotic distribution of the accumulated impulse response function is

$$\sqrt{T}\, \mathrm{vec}(\hat{\Psi}_l^a - \Psi_l^a) \xrightarrow{d} N(0, F_l \Sigma_\beta F_l'), \quad l = 1, 2, \ldots$$

where $F_l = \sum_{j=1}^{l} G_j$.

The asymptotic distribution of the orthogonalized impulse response function is

$$\sqrt{T}\, \mathrm{vec}(\hat{\Psi}_j^o - \Psi_j^o) \xrightarrow{d} N(0, C_j \Sigma_\beta C_j' + \bar{C}_j \Sigma_\sigma \bar{C}_j'), \quad j = 0, 1, 2, \ldots$$

where $C_0 = 0$, $C_j = (\Psi_0^{o\prime} \otimes I_k) G_j$, $\bar{C}_j = (I_k \otimes \Psi_j) H$,

$$H = \frac{\partial\, \mathrm{vec}(\Psi_0^o)}{\partial \sigma'} = L_k' \{ L_k (I_{k^2} + K_k)(\Psi_0^o \otimes I_k) L_k' \}^{-1}$$

and $\Sigma_\sigma = 2 D_k^{+} (\Sigma \otimes \Sigma) D_k^{+\prime}$ with $D_k^{+} = (D_k' D_k)^{-1} D_k'$ and $\sigma = \mathrm{vech}(\Sigma)$.
Granger Causality Test

Let $y_t$ be arranged and partitioned in subgroups $y_{1t}$ and $y_{2t}$ with dimensions $k_1$ and $k_2$, respectively ($k = k_1 + k_2$); that is, $y_t = (y_{1t}', y_{2t}')'$ with the corresponding white noise process $\epsilon_t = (\epsilon_{1t}', \epsilon_{2t}')'$. Consider the VAR(p) model with partitioned coefficients $\Phi_{ij}(B)$ for $i, j = 1, 2$ as follows:

$$\begin{bmatrix} \Phi_{11}(B) & \Phi_{12}(B) \\ \Phi_{21}(B) & \Phi_{22}(B) \end{bmatrix}
\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} =
\begin{bmatrix} \delta_1 \\ \delta_2 \end{bmatrix} +
\begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix}$$

The variables $y_{1t}$ are said to cause $y_{2t}$, but $y_{2t}$ do not cause $y_{1t}$ if $\Phi_{12}(B) = 0$. The implication of this model structure is that future values of the process $y_{1t}$ are influenced only by its own past and not by the past of $y_{2t}$, whereas future values of $y_{2t}$ are influenced by the past of both $y_{1t}$ and $y_{2t}$. If the future $y_{1t}$ are not influenced by the past values of $y_{2t}$, then it can be better to model $y_{1t}$ separately from $y_{2t}$.

Consider testing $H_0\colon C\beta = c$, where $C$ is an $s \times (k^2 p + k)$ matrix of rank $s$ and $c$ is an $s$-dimensional vector where $s = k_1 k_2 p$. Assuming that

$$\sqrt{T}(\hat{\beta} - \beta) \xrightarrow{d} N(0, \Gamma_p^{-1} \otimes \Sigma)$$

you get the Wald statistic

$$T (C\hat{\beta} - c)' [C(\hat{\Gamma}_p^{-1} \otimes \hat{\Sigma})C']^{-1} (C\hat{\beta} - c) \xrightarrow{d} \chi^2(s)$$

For the Granger causality test, the matrix $C$ consists of zeros or ones and $c$ is the zero vector. See Lütkepohl (1993) for more details of the Granger causality test.
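As an illustration of what "zeros or ones" means here, the following Python sketch builds a selection matrix $C$ that picks the elements of $\Phi_{12}$ at every lag out of $\beta = \mathrm{vec}(B')$. It assumes the ordering implied by $B' = (\delta, \Phi_1, \ldots, \Phi_p)$ with column stacking, so that element $(r, c)$ of $\Phi_l$ sits at zero-based position $k + (l-1)k^2 + ck + r$. The function name is hypothetical.

```python
def granger_restriction(k1, k2, p):
    """Rows of C select the elements of Phi_12 at each lag from beta = vec(B').

    B' = (delta, Phi_1, ..., Phi_p) is k x (kp + 1) and vec stacks its columns,
    so beta has k*k*p + k elements.
    """
    k = k1 + k2
    n_params = k * k * p + k
    rows = []
    for lag in range(p):
        for col in range(k1, k):      # columns that multiply lagged y2
            for row in range(k1):     # equations for y1
                index = k + lag * k * k + col * k + row
                selector = [0] * n_params
                selector[index] = 1
                rows.append(selector)
    return rows

# Bivariate VAR(1): one restriction, on the (1,2) element of Phi_1.
C = granger_restriction(k1=1, k2=1, p=1)
print(C)

# Three variables (k1 = 1, k2 = 2) with two lags: s = k1 * k2 * p = 4 restrictions.
C_big = granger_restriction(k1=1, k2=2, p=2)
print(len(C_big), len(C_big[0]))
```

With $c = 0$, the hypothesis $C\beta = c$ then states that every selected coefficient is zero.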
VARX Modeling

The vector autoregressive model with exogenous variables is called the VARX(p,s) model. The form of the VARX(p,s) model can be written as

$$y_t = \delta + \sum_{i=1}^{p} \Phi_i\, y_{t-i} + \sum_{i=0}^{s} \Theta_i^{*}\, x_{t-i} + \epsilon_t$$

The parameter estimates can be obtained by representing the general form of the multivariate linear model,

$$Y = XB + E \quad \text{or} \quad y = (X \otimes I_k)\beta + e$$

where

$$Y = (y_1, \ldots, y_T)'$$
$$B = (\delta, \Phi_1, \ldots, \Phi_p, \Theta_0^{*}, \ldots, \Theta_s^{*})'$$
$$X = (X_0, \ldots, X_{T-1})'$$
$$X_t = (1, y_t', \ldots, y_{t-p+1}', x_{t+1}', \ldots, x_{t-s+1}')'$$
$$E = (\epsilon_1, \ldots, \epsilon_T)'$$
$$y = \mathrm{vec}(Y'), \quad \beta = \mathrm{vec}(B'), \quad e = \mathrm{vec}(E')$$

The conditional least squares estimator of $\beta$ can be obtained by using the same method in a VAR(p) modeling. If the multivariate linear model has different independent variables that correspond to dependent variables, the SUR (seemingly unrelated regression) method is used to improve the regression estimates.

The following example fits the ordinary regression model:

   proc varmax data=one;
      model y1-y3 = x1-x5;
   run;
This is equivalent to the REG procedure in the SAS/STAT software:

   proc reg data=one;
      model y1 = x1-x5;
      model y2 = x1-x5;
      model y3 = x1-x5;
   run;
The following example fits the second-order lagged regression model:

   proc varmax data=two;
      model y1 y2 = x / xlag=2;
   run;

This is equivalent to the REG procedure in the SAS/STAT software:

   data three;
      set two;
      xlag1 = lag1(x);
      xlag2 = lag2(x);
   run;

   proc reg data=three;
      model y1 = x xlag1 xlag2;
      model y2 = x xlag1 xlag2;
   run;
The following example fits the ordinary regression model with different regressors:

   proc varmax data=one;
      model y1 = x1-x3, y2 = x2 x3;
   run;

This is equivalent to the following SYSLIN procedure statements:

   proc syslin data=one vardef=df sur;
      endogenous y1 y2;
      model y1 = x1-x3;
      model y2 = x2 x3;
   run;

From the output in Figure 30.20 in the section “Getting Started: VARMAX Procedure” on page 1858, you can see that the parameters, XL0_1_2, XL0_2_1, XL0_3_1, and XL0_3_2 associated with the exogenous variables, are not significant. The following example fits the VARX(1,0) model with different regressors:

   proc varmax data=grunfeld;
      model y1 = x1, y2 = x2, y3 / p=1 print=(estimates);
   run;
Figure 30.45 Parameter Estimates for the VARX(1, 0) Model

The VARMAX Procedure

XLag

   Lag   Variable        x1        x2
     0   y1         1.83231         _
         y2               _   2.42110
         y3               _         _

As you can see in Figure 30.45, the symbol ‘_’ in the elements of matrix corresponds to endogenous variables that do not take the denoted exogenous variables.
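Bayesian VAR and VARX Modeling

Consider the VAR(p) model

$$y_t = \delta + \Phi_1 y_{t-1} + \cdots + \Phi_p y_{t-p} + \epsilon_t$$

or

$$y = (X \otimes I_k)\beta + e$$

When the parameter vector $\beta$ has a prior multivariate normal distribution with known mean $\beta^{*}$ and covariance matrix $V_\beta$, the prior density is written as

$$f(\beta) = \left(\frac{1}{2\pi}\right)^{k^2 p/2} |V_\beta|^{-1/2} \exp\left[-\frac{1}{2}(\beta - \beta^{*})' V_\beta^{-1} (\beta - \beta^{*})\right]$$

The likelihood function for the Gaussian process becomes

$$\ell(\beta \mid y) = \left(\frac{1}{2\pi}\right)^{kT/2} |I_T \otimes \Sigma|^{-1/2} \exp\left[-\frac{1}{2}(y - (X \otimes I_k)\beta)' (I_T \otimes \Sigma^{-1}) (y - (X \otimes I_k)\beta)\right]$$

Therefore, the posterior density is derived as

$$f(\beta \mid y) \propto \exp\left[-\frac{1}{2}(\beta - \bar{\beta})' \bar{\Sigma}_\beta^{-1} (\beta - \bar{\beta})\right]$$

where the posterior mean is

$$\bar{\beta} = [V_\beta^{-1} + (X'X \otimes \Sigma^{-1})]^{-1} [V_\beta^{-1} \beta^{*} + (X' \otimes \Sigma^{-1})\, y]$$

and the posterior covariance matrix is

$$\bar{\Sigma}_\beta = [V_\beta^{-1} + (X'X \otimes \Sigma^{-1})]^{-1}$$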
In practice, the prior mean $\beta^{*}$ and the prior variance $V_\beta$ need to be specified. If all the parameters are considered to shrink toward zero, the null prior mean should be specified. According to Litterman (1986), the prior variance can be given by

$$v_{ij}(l) = \begin{cases} (\lambda / l)^2 & \text{if } i = j \\ (\lambda \theta \sigma_{ii} / l \sigma_{jj})^2 & \text{if } i \ne j \end{cases}$$

where $v_{ij}(l)$ is the prior variance of the $(i, j)$th element of $\Phi_l$, $\lambda$ is the prior standard deviation of the diagonal elements of $\Phi_l$, $\theta$ is a constant in the interval $(0, 1)$, and $\sigma_{ii}^2$ is the $i$th diagonal element of $\Sigma$. The deterministic terms have diffused prior variance. In practice, you replace the $\sigma_{ii}^2$ by the diagonal element of the ML estimator of $\Sigma$ in the nonconstrained model.

For example, for a bivariate BVAR(2) model,

$$y_{1t} = 0 + \phi_{1,11}\, y_{1,t-1} + \phi_{1,12}\, y_{2,t-1} + \phi_{2,11}\, y_{1,t-2} + \phi_{2,12}\, y_{2,t-2} + \epsilon_{1t}$$
$$y_{2t} = 0 + \phi_{1,21}\, y_{1,t-1} + \phi_{1,22}\, y_{2,t-1} + \phi_{2,21}\, y_{1,t-2} + \phi_{2,22}\, y_{2,t-2} + \epsilon_{2t}$$

with the prior covariance matrix

$$V_\beta = \mathrm{Diag}\left(\infty,\ \lambda^2,\ (\lambda\theta\sigma_1/\sigma_2)^2,\ (\lambda/2)^2,\ (\lambda\theta\sigma_1/2\sigma_2)^2,\ \infty,\ (\lambda\theta\sigma_2/\sigma_1)^2,\ \lambda^2,\ (\lambda\theta\sigma_2/2\sigma_1)^2,\ (\lambda/2)^2\right)$$

For the Bayesian estimation of integrated systems, the prior mean is set to the first lag of each variable equal to one in its own equation and all other coefficients at zero. For example, for a bivariate BVAR(2) model,

$$y_{1t} = 0 + 1\, y_{1,t-1} + 0\, y_{2,t-1} + 0\, y_{1,t-2} + 0\, y_{2,t-2} + \epsilon_{1t}$$
$$y_{2t} = 0 + 0\, y_{1,t-1} + 1\, y_{2,t-1} + 0\, y_{1,t-2} + 0\, y_{2,t-2} + \epsilon_{2t}$$
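The Litterman prior variances are simple to tabulate. The Python sketch below evaluates $v_{ij}(l)$ and assembles the diagonal of $V_\beta$ for the bivariate BVAR(2) example, in the same coefficient order as the two equations above. The values of $\lambda$ and $\theta$ match the PRIOR=(LAMBDA=0.9 THETA=0.1) example used elsewhere in this chapter; the standard deviations in `sigma` are hypothetical, and the function names are not part of any SAS interface.

```python
import math

def litterman_variance(i, j, lag, lam, theta, sigma):
    """Prior variance v_ij(l) of the (i, j) element of Phi_l.

    sigma[i] is the residual standard deviation of equation i (zero-based index).
    """
    if i == j:
        return (lam / lag) ** 2
    return (lam * theta * sigma[i] / (lag * sigma[j])) ** 2

def prior_diagonal(k, p, lam, theta, sigma):
    """Diagonal of V_beta, equation by equation: constant first, then lags 1..p."""
    diagonal = []
    for i in range(k):
        diagonal.append(math.inf)  # diffuse prior on the deterministic term
        for lag in range(1, p + 1):
            for j in range(k):
                diagonal.append(litterman_variance(i, j, lag, lam, theta, sigma))
    return diagonal

lam, theta = 0.9, 0.1
sigma = [1.0, 2.0]  # hypothetical residual standard deviations
v_beta = prior_diagonal(k=2, p=2, lam=lam, theta=theta, sigma=sigma)
print(v_beta)
```

Own lags receive variance $(\lambda/l)^2$, while cross lags are shrunk further by $\theta$ and rescaled by the ratio of standard deviations, which is why the off-diagonal entries are much smaller.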
Forecasting of BVAR Modeling

The mean squared error is used to measure forecast accuracy (Litterman 1986). The MSE of the forecast is

$$MSE = \frac{1}{T} \sum_{t=1}^{T} (A_t - F_t^s)^2$$

where $A_t$ is the actual value at time $t$ and $F_t^s$ is the forecast made $s$ periods earlier.
Bayesian VARX Modeling

The Bayesian vector autoregressive model with exogenous variables is called the BVARX(p,s) model. The form of the BVARX(p,s) model can be written as

$$y_t = \delta + \sum_{i=1}^{p} \Phi_i\, y_{t-i} + \sum_{i=0}^{s} \Theta_i^{*}\, x_{t-i} + \epsilon_t$$

The parameter estimates can be obtained by representing the general form of the multivariate linear model,

$$y = (X \otimes I_k)\beta + e$$

The prior means for the AR coefficients are the same as those specified in BVAR(p). The prior means for the exogenous coefficients are set to zero.

Some examples of the Bayesian VARX model are as follows:

   model y1 y2 = x1 / p=1 xlag=1 prior;
   model y1 y2 = x1 / p=(1 3) xlag=1 nocurrentx
                      prior=(lambda=0.9 theta=0.1);
VARMA and VARMAX Modeling

A VARMA(p, q) process is written as

$$y_t = \delta + \sum_{i=1}^{p} \Phi_i\, y_{t-i} + \epsilon_t - \sum_{i=1}^{q} \Theta_i\, \epsilon_{t-i}$$

or

$$\Phi(B)\, y_t = \delta + \Theta(B)\, \epsilon_t$$

where $\Phi(B) = I_k - \sum_{i=1}^{p} \Phi_i B^i$ and $\Theta(B) = I_k - \sum_{i=1}^{q} \Theta_i B^i$.

Stationarity and Invertibility

For stationarity and invertibility of the VARMA process, the roots of $|\Phi(z)| = 0$ and $|\Theta(z)| = 0$ are outside the unit circle.
Parameter Estimation

Under the assumption of normality of the $\epsilon_t$ with mean vector zero and nonsingular covariance matrix $\Sigma$, consider the conditional (approximate) log-likelihood function of a VARMA(p,q) model with mean zero.

Define $Y = (y_1, \ldots, y_T)'$ and $E = (\epsilon_1, \ldots, \epsilon_T)'$ with $B^i Y = (y_{1-i}, \ldots, y_{T-i})'$ and $B^i E = (\epsilon_{1-i}, \ldots, \epsilon_{T-i})'$; define $y = \mathrm{vec}(Y')$ and $e = \mathrm{vec}(E')$. Then

$$y - \sum_{i=1}^{p} (I_T \otimes \Phi_i) B^i y = e - \sum_{i=1}^{q} (I_T \otimes \Theta_i) B^i e$$

where $B^i y = \mathrm{vec}[(B^i Y)']$ and $B^i e = \mathrm{vec}[(B^i E)']$.

Then, the conditional (approximate) log-likelihood function can be written as follows (Reinsel 1997):

$$\ell = -\frac{T}{2} \log|\Sigma| - \frac{1}{2} \sum_{t=1}^{T} \epsilon_t' \Sigma^{-1} \epsilon_t
= -\frac{T}{2} \log|\Sigma| - \frac{1}{2}\, w' \boldsymbol{\Theta}'^{-1} (I_T \otimes \Sigma^{-1}) \boldsymbol{\Theta}^{-1} w$$

where $w = y - \sum_{i=1}^{p} (I_T \otimes \Phi_i) B^i y$, and $\boldsymbol{\Theta}$ is such that $e - \sum_{i=1}^{q} (I_T \otimes \Theta_i) B^i e = \boldsymbol{\Theta} e$.
For the exact log-likelihood function of a VARMA(p,q) model, the Kalman filtering method is used after transforming the VARMA process into the state-space form (Reinsel 1997).

The state-space form of the VARMA(p,q) model consists of a state equation

$$z_t = F z_{t-1} + G \epsilon_t$$

and an observation equation

$$y_t = H z_t$$

where for $v = \max(p, q + 1)$

$$z_t = (y_t', y_{t+1|t}', \ldots, y_{t+v-1|t}')'$$

$$F = \begin{bmatrix}
0 & I_k & 0 & \cdots & 0 \\
0 & 0 & I_k & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\Phi_v & \Phi_{v-1} & \Phi_{v-2} & \cdots & \Phi_1
\end{bmatrix}, \quad
G = \begin{bmatrix} I_k \\ \Psi_1 \\ \vdots \\ \Psi_{v-1} \end{bmatrix}$$

and

$$H = [I_k, 0, \ldots, 0]$$

The Kalman filtering approach is used for evaluation of the likelihood function. The updating equation is

$$\hat{z}_{t|t} = \hat{z}_{t|t-1} + K_t\, \epsilon_{t|t-1}$$

with

$$K_t = P_{t|t-1} H' [H P_{t|t-1} H']^{-1}$$

and the prediction equation is

$$\hat{z}_{t|t-1} = F \hat{z}_{t-1|t-1}, \quad P_{t|t-1} = F P_{t-1|t-1} F' + G \Sigma G'$$

with $P_{t|t} = [I - K_t H] P_{t|t-1}$ for $t = 1, 2, \ldots, n$.

The log-likelihood function can be expressed as

$$\ell = -\frac{1}{2} \sum_{t=1}^{T} \left[ \log|\Sigma_{t|t-1}| + (y_t - \hat{y}_{t|t-1})' \Sigma_{t|t-1}^{-1} (y_t - \hat{y}_{t|t-1}) \right]$$

where $\hat{y}_{t|t-1}$ and $\Sigma_{t|t-1}$ are determined recursively from the Kalman filter procedure. To construct the likelihood function from Kalman filtering, you obtain $\hat{y}_{t|t-1} = H \hat{z}_{t|t-1}$, $\hat{\epsilon}_{t|t-1} = y_t - \hat{y}_{t|t-1}$, and $\Sigma_{t|t-1} = H P_{t|t-1} H'$.

Define the vector $\beta$

$$\beta = (\phi_1', \ldots, \phi_p', \theta_1', \ldots, \theta_q', \mathrm{vech}(\Sigma))'$$

where $\phi_i = \mathrm{vec}(\Phi_i)$ and $\theta_i = \mathrm{vec}(\Theta_i)$.

The log-likelihood equations are solved by iterative numerical procedures such as the quasi-Newton optimization. The starting values for the AR and MA parameters are obtained from the least squares estimates.
Asymptotic Distribution of the Parameter Estimates

Under the assumptions of stationarity and invertibility for the VARMA model and the assumption that $\epsilon_t$ is a white noise process, $\hat{\beta}$ is a consistent estimator for $\beta$ and $\sqrt{T}(\hat{\beta} - \beta)$ converges in distribution to the multivariate normal $N(0, V^{-1})$ as $T \to \infty$, where $V$ is the asymptotic information matrix of $\beta$.

Asymptotic Distributions of Impulse Response Functions

Defining the vector $\beta$

$$\beta = (\phi_1', \ldots, \phi_p', \theta_1', \ldots, \theta_q')'$$

the asymptotic distribution of the impulse response function for a VARMA(p, q) model is

$$\sqrt{T}\, \mathrm{vec}(\hat{\Psi}_j - \Psi_j) \xrightarrow{d} N(0, G_j \Sigma_\beta G_j'), \quad j = 1, 2, \ldots$$

where $\Sigma_\beta$ is the covariance matrix of the parameter estimates and

$$G_j = \frac{\partial\, \mathrm{vec}(\Psi_j)}{\partial \beta'} = \sum_{i=0}^{j-1} H'(A')^{j-1-i} \otimes J A^i J'$$

where $H = [I_k, 0, \ldots, 0, I_k, 0, \ldots, 0]'$ is a $k(p+q) \times k$ matrix with the second $I_k$ following after $p$ block matrices; $J = [I_k, 0, \ldots, 0]$ is a $k \times k(p+q)$ matrix; $A$ is a $k(p+q) \times k(p+q)$ matrix,

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

where

$$A_{11} = \begin{bmatrix}
\Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\
I_k & 0 & \cdots & 0 & 0 \\
0 & I_k & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & I_k & 0
\end{bmatrix}, \quad
A_{12} = \begin{bmatrix}
-\Theta_1 & \cdots & -\Theta_{q-1} & -\Theta_q \\
0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & 0
\end{bmatrix}$$

$A_{21}$ is a $kq \times kp$ zero matrix, and

$$A_{22} = \begin{bmatrix}
0 & 0 & \cdots & 0 & 0 \\
I_k & 0 & \cdots & 0 & 0 \\
0 & I_k & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & I_k & 0
\end{bmatrix}$$
An Example of a VARMA(1,1) Model

Consider a VARMA(1,1) model with mean zero

$$y_t = \Phi_1 y_{t-1} + \epsilon_t - \Theta_1 \epsilon_{t-1}$$

where $\epsilon_t$ is the white noise process with a mean zero vector and the positive-definite covariance matrix $\Sigma$.

The following IML procedure statements simulate a bivariate vector time series from this model to provide test data for the VARMAX procedure:

   proc iml;
      sig = {1.0  0.5, 0.5 1.25};
      phi = {1.2 -0.5, 0.6 0.3};
      theta = {0.5 -0.2, 0.1 0.3};
      /* to simulate the vector time series */
      call varmasim(y,phi,theta) sigma=sig n=100 seed=34657;
      cn = {'y1' 'y2'};
      create simul3 from y[colname=cn];
      append from y;
   run;
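Before simulating, it can be worth confirming that the chosen coefficients give a stationary and invertible process. For a VARMA(1,1) model the roots of $|I_k - \Phi_1 z| = 0$ are the reciprocals of the nonzero eigenvalues of $\Phi_1$, so the roots lie outside the unit circle exactly when the eigenvalues of $\Phi_1$ lie inside it; the same holds for $\Theta_1$. The Python sketch below applies this to the `phi` and `theta` matrices from the IML statements above, using the closed-form eigenvalues of a 2 x 2 matrix (the helper name is hypothetical).

```python
import cmath

def eigenvalues_2x2(m):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

# Coefficient matrices from the simulation statements.
phi = [[1.2, -0.5], [0.6, 0.3]]
theta = [[0.5, -0.2], [0.1, 0.3]]

phi_moduli = [abs(z) for z in eigenvalues_2x2(phi)]
theta_moduli = [abs(z) for z in eigenvalues_2x2(theta)]

stationary = all(m < 1.0 for m in phi_moduli)
invertible = all(m < 1.0 for m in theta_moduli)
print(phi_moduli, stationary)
print(theta_moduli, invertible)
```

Both matrices have a complex conjugate pair of eigenvalues, so each modulus equals the square root of the determinant (0.66 for `phi`, 0.17 for `theta`), and both conditions hold.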
The following statements fit a VARMA(1,1) model to the simulated data. You specify the order of the autoregressive model with the P= option and the order of moving-average model with the Q= option. You specify the quasi-Newton optimization in the NLOPTIONS statement as an optimization method.

   proc varmax data=simul3;
      nloptions tech=qn;
      model y1 y2 / p=1 q=1 noint print=(estimates);
   run;

Figure 30.46 shows the initial values of parameters. The initial values were estimated using the least squares method.

Figure 30.46 Start Parameter Estimates for the VARMA(1, 1) Model

The VARMAX Procedure

Optimization Start Parameter Estimates

                                     Gradient
                                    Objective
   N   Parameter     Estimate        Function
   1   AR1_1_1       1.013118       -0.987110
   2   AR1_2_1       0.510233        0.163904
   3   AR1_1_2      -0.399051        0.826824
   4   AR1_2_2       0.441344       -9.605845
   5   MA1_1_1       0.295872       -1.929771
   6   MA1_2_1      -0.002809        2.408518
   7   MA1_1_2      -0.044216       -0.632995
   8   MA1_2_2       0.425334        0.888222
Figure 30.47 shows the default option settings for the quasi-Newton optimization technique.

Figure 30.47 Default Criteria for the quasi-Newton Optimization

   Minimum Iterations                                 0
   Maximum Iterations                               200
   Maximum Function Calls                          2000
   ABSGCONV Gradient Criterion                  0.00001
   GCONV Gradient Criterion                        1E-8
   ABSFCONV Function Criterion                        0
   FCONV Function Criterion                2.220446E-16
   FCONV2 Function Criterion                          0
   FSIZE Parameter                                    0
   ABSXCONV Parameter Change Criterion                0
   XCONV Parameter Change Criterion                   0
   XSIZE Parameter                                    0
   ABSCONV Function Criterion              -1.34078E154
   Line Search Method                                 2
   Starting Alpha for Line Search                     1
   Line Search Precision LSPRECISION                0.4
   DAMPSTEP Parameter for Line Search                 .
   Singularity Tolerance (SINGULAR)                1E-8
Figure 30.48 shows the iteration history of parameter estimates.

Figure 30.48 Iteration History of Parameter Estimates

                                                          Max Abs              Slope
                 Func    Act    Objective    Obj Fun     Gradient     Step    Search
   Iter Restarts Calls   Con     Function     Change      Element     Size     Direc
      1        0    38     0    121.86537     0.1020       6.3068  0.00376   -54.483
      2        0    57     0    121.76369     0.1017       6.9381   0.0100   -33.359
      3        0    76     0    121.55605     0.2076       5.1526   0.0100   -31.265
      4        0    95     0    121.36386     0.1922       5.7292   0.0100   -37.419
      5        0   114     0    121.25741     0.1064       5.5107   0.0100   -17.708
      6        0   133     0    121.22578     0.0316       4.9213   0.0240    -2.905
      7        0   152     0    121.20582     0.0200       6.0114   0.0316    -1.268
      8        0   171     0    121.18747     0.0183       5.5324   0.0457    -0.805
      9        0   207     0    121.13613     0.0513       0.5829    0.628    -0.164
     10        0   226     0    121.13536   0.000766       0.2054    1.000   -0.0021
     11        0   245     0    121.13528   0.000082       0.0534    1.000   -0.0002
     12        0   264     0    121.13528   2.537E-6       0.0101    1.000   -637E-8
     13        0   283     0    121.13528   1.475E-7     0.000270    1.000     -3E-7

Figure 30.49 shows the final parameter estimates.

Figure 30.49 Results of Parameter Estimates for the VARMA(1, 1) Model

The VARMAX Procedure

Optimization Results Parameter Estimates

   N   Parameter     Estimate
   1   AR1_1_1       1.018085
   2   AR1_2_1       0.391465
   3   AR1_1_2      -0.386513
   4   AR1_2_2       0.552904
   5   MA1_1_1       0.322906
   6   MA1_2_1      -0.165661
   7   MA1_1_2      -0.021533
   8   MA1_2_2       0.586132
Figure 30.50 shows the AR coefficient matrix in terms of lag 1, the MA coefficient matrix in terms of lag 1, the parameter estimates, and their significance, which is one indication of how well the model fits the data.

Figure 30.50 Parameter Estimates for the VARMA(1, 1) Model

The VARMAX Procedure

   Type of Model        VARMA(1,1)
   Estimation Method    Maximum Likelihood Estimation

AR

   Lag   Variable         y1         y2
     1   y1          1.01809   -0.38651
         y2          0.39147    0.55290

MA

   Lag   Variable         e1         e2
     1   y1          0.32291   -0.02153
         y2         -0.16566    0.58613

Schematic Representation

   Variable/Lag   AR1   MA1
   y1             +-    +.
   y2             ++    .+

+ is > 2*std error, - is < -2*std error, . is between, * is N/A

Model Parameter Estimates

                                      Standard
   Equation   Parameter   Estimate       Error   t Value   Pr > |t|   Variable
   y1         AR1_1_1      1.01809     0.10256      9.93     0.0001   y1(t-1)
              AR1_1_2     -0.38651     0.09643     -4.01     0.0001   y2(t-1)
              MA1_1_1      0.32291     0.14529      2.22     0.0285   e1(t-1)
              MA1_1_2     -0.02153     0.14199     -0.15     0.8798   e2(t-1)
   y2         AR1_2_1      0.39147     0.10062      3.89     0.0002   y1(t-1)
              AR1_2_2      0.55290     0.08421      6.57     0.0001   y2(t-1)
              MA1_2_1     -0.16566     0.15699     -1.06     0.2939   e1(t-1)
              MA1_2_2      0.58612     0.14114      4.15     0.0001   e2(t-1)
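The fitted VARMA(1,1) model with estimated standard errors in parentheses is given as

$$y_t = \begin{pmatrix}
\underset{(0.10256)}{1.01809} & \underset{(0.09644)}{-0.38651} \\
\underset{(0.10062)}{0.39147} & \underset{(0.08421)}{0.55290}
\end{pmatrix} y_{t-1} + \epsilon_t - \begin{pmatrix}
\underset{(0.14530)}{0.32291} & \underset{(0.14199)}{-0.02153} \\
\underset{(0.15699)}{-0.16566} & \underset{(0.14115)}{0.58613}
\end{pmatrix} \epsilon_{t-1}$$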
VARMAX Modeling

A VARMAX(p, q, s) process is written as

$$y_t = \delta + \sum_{i=1}^{p} \Phi_i\, y_{t-i} + \sum_{i=0}^{s} \Theta_i^{*}\, x_{t-i} + \epsilon_t - \sum_{i=1}^{q} \Theta_i\, \epsilon_{t-i}$$

or

$$\Phi(B)\, y_t = \delta + \Theta^{*}(B)\, x_t + \Theta(B)\, \epsilon_t$$

where $\Phi(B) = I_k - \sum_{i=1}^{p} \Phi_i B^i$, $\Theta^{*}(B) = \Theta_0^{*} + \Theta_1^{*} B + \cdots + \Theta_s^{*} B^s$, and $\Theta(B) = I_k - \sum_{i=1}^{q} \Theta_i B^i$.

The dimension of the state-space vector of the Kalman filtering method for the parameter estimation of the VARMAX(p,q,s) model is large, which takes time and memory for computing. For convenience, the parameter estimation of the VARMAX(p,q,s) model uses the two-stage estimation method, which first estimates the deterministic terms and exogenous parameters, and then maximizes the log-likelihood function of a VARMA(p,q) model.

Some examples of VARMAX modeling are as follows:

   model y1 y2 = x1 / q=1;
   nloptions tech=qn;

   model y1 y2 = x1 / p=1 q=1 xlag=1 nocurrentx;
   nloptions tech=qn;
Model Diagnostic Checks

Multivariate Model Diagnostic Checks

Information Criterion

After fitting some candidate models to the data, various model selection criteria (normalized by $T$) can be used to choose the appropriate model. The following list includes the Akaike information criterion (AIC), the corrected Akaike information criterion (AICC), the final prediction error criterion (FPE), the Hannan-Quinn criterion (HQC), and the Schwarz Bayesian criterion (SBC, also referred to as BIC):

$$\mathrm{AIC} = \log(|\tilde{\Sigma}|) + 2r/T$$
$$\mathrm{AICC} = \log(|\tilde{\Sigma}|) + 2r/(T - r/k)$$
$$\mathrm{FPE} = \left(\frac{T + r/k}{T - r/k}\right)^{k} |\tilde{\Sigma}|$$
$$\mathrm{HQC} = \log(|\tilde{\Sigma}|) + 2r \log(\log(T))/T$$
$$\mathrm{SBC} = \log(|\tilde{\Sigma}|) + r \log(T)/T$$

where $r$ denotes the number of parameters estimated, $k$ is the number of dependent variables, $T$ is the number of observations used to estimate the model, and $\tilde{\Sigma}$ is the maximum likelihood estimate of $\Sigma$. When comparing models, choose the model with the smallest criterion values.

An example of the output was displayed in Figure 30.4.

Portmanteau $Q_s$ statistic

The Portmanteau $Q_s$ statistic is used to test whether correlation remains on the model residuals. The null hypothesis is that the residuals are uncorrelated. Let $C_\epsilon(l)$ be the residual cross-covariance matrices, $\hat{\rho}_\epsilon(l)$ be the residual cross-correlation matrices as

$$C_\epsilon(l) = T^{-1} \sum_{t=1}^{T-l} \epsilon_t\, \epsilon_{t+l}'$$

and

$$\hat{\rho}_\epsilon(l) = \hat{V}_\epsilon^{-1/2}\, C_\epsilon(l)\, \hat{V}_\epsilon^{-1/2} \quad \text{and} \quad \hat{\rho}_\epsilon(-l) = \hat{\rho}_\epsilon(l)'$$

where $\hat{V}_\epsilon = \mathrm{Diag}(\hat{\sigma}_{11}^2, \ldots, \hat{\sigma}_{kk}^2)$ and $\hat{\sigma}_{ii}^2$ are the diagonal elements of $\hat{\Sigma}$. The multivariate portmanteau test defined in Hosking (1980) is

$$Q_s = T^2 \sum_{l=1}^{s} (T - l)^{-1}\, \mathrm{tr}\{\hat{\rho}_\epsilon(l)\, \hat{\Sigma}^{-1}\, \hat{\rho}_\epsilon(-l)\, \hat{\Sigma}^{-1}\}$$

The statistic $Q_s$ has approximately the chi-square distribution with $k^2(s - p - q)$ degrees of freedom. An example of the output is displayed in Figure 30.7.
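The information criteria listed earlier in this section depend only on $|\tilde{\Sigma}|$, $r$, $k$, and $T$, so they are easy to reproduce outside the procedure. The Python sketch below is an illustration with hypothetical inputs (a unit determinant, four parameters, 100 observations, two dependent variables); the function name is not part of any SAS interface.

```python
import math

def information_criteria(det_sigma, r, T, k):
    """AIC, AICC, FPE, HQC, and SBC from |Sigma~|, r parameters, T observations, k variables."""
    log_det = math.log(det_sigma)
    return {
        "AIC": log_det + 2 * r / T,
        "AICC": log_det + 2 * r / (T - r / k),
        "FPE": ((T + r / k) / (T - r / k)) ** k * det_sigma,
        "HQC": log_det + 2 * r * math.log(math.log(T)) / T,
        "SBC": log_det + r * math.log(T) / T,
    }

# Hypothetical inputs chosen so that log|Sigma~| = 0 and only the penalties remain.
criteria = information_criteria(det_sigma=1.0, r=4, T=100, k=2)
for name, value in criteria.items():
    print(f"{name}: {value:.6f}")
```

With these inputs the printed values are the penalty terms alone, which makes it easy to see that SBC penalizes additional parameters more heavily than HQC, and HQC more heavily than AIC, at T = 100.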
Univariate Model Diagnostic Checks

There are various ways to perform diagnostic checks for a univariate model. For details, see the section “Heteroscedasticity and Normality Tests” on page 374 in Chapter 8, “The AUTOREG Procedure.” An example of the output is displayed in Figure 30.8 and Figure 30.9.

Durbin-Watson (DW) statistics: The DW test statistics test for the first order autocorrelation in the residuals.

Jarque-Bera normality test: This test is helpful in determining whether the model residuals represent a white noise process. This tests the null hypothesis that the residuals have normality.

F tests for autoregressive conditional heteroscedastic (ARCH) disturbances: F test statistics test for the heteroscedastic disturbances in the residuals. This tests the null hypothesis that the residuals have equal covariances.

F tests for AR disturbance: These test statistics are computed from the residuals of the univariate AR(1), AR(1,2), AR(1,2,3) and AR(1,2,3,4) models to test the null hypothesis that the residuals are uncorrelated.
Cointegration

This section briefly introduces the concepts of cointegration (Johansen 1995b).

Definition 1. (Engle and Granger 1987): If a series $y_t$ with no deterministic components can be represented by a stationary and invertible ARMA process after differencing $d$ times, the series is integrated of order $d$, that is, $y_t \sim I(d)$.

Definition 2. (Engle and Granger 1987): If all elements of the vector $y_t$ are $I(d)$ and there exists a cointegrating vector $\beta \ne 0$ such that $\beta' y_t \sim I(d - b)$ for any $b > 0$, the vector process is said to be cointegrated $CI(d, b)$.

A simple example of a cointegrated process is the following bivariate system:

$$y_{1t} = \gamma y_{2t} + \epsilon_{1t}$$
$$y_{2t} = y_{2,t-1} + \epsilon_{2t}$$

with $\epsilon_{1t}$ and $\epsilon_{2t}$ being uncorrelated white noise processes. In the second equation, $y_{2t}$ is a random walk, $\Delta y_{2t} = \epsilon_{2t}$, $\Delta \equiv 1 - B$. Differencing the first equation results in

$$\Delta y_{1t} = \gamma \Delta y_{2t} + \Delta \epsilon_{1t} = \gamma \epsilon_{2t} + \epsilon_{1t} - \epsilon_{1,t-1}$$

Thus, both $y_{1t}$ and $y_{2t}$ are $I(1)$ processes, but the linear combination $y_{1t} - \gamma y_{2t}$ is stationary. Hence $y_t = (y_{1t}, y_{2t})'$ is cointegrated with a cointegrating vector $\beta = (1, -\gamma)'$.

In general, if the vector process $y_t$ has $k$ components, then there can be more than one cointegrating vector $\beta'$. It is assumed that there are $r$ linearly independent cointegrating vectors with $r < k$, which make the $k \times r$ matrix $\beta$. The rank of matrix $\beta$ is $r$, which is called the cointegration rank of $y_t$.
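The bivariate example can be simulated in a few lines. The Python sketch below is only an illustration: the value $\gamma = 2$, the unit-variance Gaussian noise, the seed, and the function name are all assumptions made for the example. It generates the random walk $y_{2t}$, builds $y_{1t}$ from it, and confirms that the combination $y_{1t} - \gamma y_{2t}$ collapses to the stationary noise $\epsilon_{1t}$.

```python
import random

def simulate_cointegrated(n, gamma, seed=0):
    """Simulate y1_t = gamma * y2_t + eps1_t with y2_t a random walk."""
    rng = random.Random(seed)
    y1, y2, eps1 = [], [], []
    level = 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)   # y2_t = y2_{t-1} + eps2_t
        shock = rng.gauss(0.0, 1.0)    # eps1_t
        y2.append(level)
        y1.append(gamma * level + shock)
        eps1.append(shock)
    return y1, y2, eps1

gamma = 2.0  # hypothetical value for illustration
y1, y2, eps1 = simulate_cointegrated(200, gamma)

# beta = (1, -gamma)' applied to (y1_t, y2_t)' removes the common random walk.
combination = [a - gamma * b for a, b in zip(y1, y2)]
max_gap = max(abs(c - e) for c, e in zip(combination, eps1))
print(max_gap)
```

Each series wanders on its own, but `combination` differs from `eps1` only by floating-point rounding.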
Common Trends

This section briefly discusses the implication of cointegration for the moving-average representation. Let $y_t$ be cointegrated $CI(1, 1)$, then $\Delta y_t$ has the Wold representation:

$$\Delta y_t = \delta + \Psi(B)\, \epsilon_t$$

where $\epsilon_t$ is $iid(0, \Sigma)$, $\Psi(B) = \sum_{j=0}^{\infty} \Psi_j B^j$ with $\Psi_0 = I_k$, and $\sum_{j=0}^{\infty} j |\Psi_j| < \infty$.

Assume that $\epsilon_t = 0$ if $t \le 0$ and $y_0$ is a nonrandom initial value. Then the difference equation implies that

$$y_t = y_0 + \delta t + \Psi(1) \sum_{i=0}^{t} \epsilon_i + \Psi^{*}(B)\, \epsilon_t$$

where $\Psi^{*}(B) = (1 - B)^{-1}(\Psi(B) - \Psi(1))$ and $\Psi^{*}(B)$ is absolutely summable.

Assume that the rank of $\Psi(1)$ is $m = k - r$. When the process $y_t$ is cointegrated, there is a cointegrating $k \times r$ matrix $\beta$ such that $\beta' y_t$ is stationary.

Premultiplying $y_t$ by $\beta'$ results in

$$\beta' y_t = \beta' y_0 + \beta' \Psi^{*}(B)\, \epsilon_t$$

because $\beta' \Psi(1) = 0$ and $\beta' \delta = 0$.

Stock and Watson (1988) showed that the cointegrated process $y_t$ has a common trends representation derived from the moving-average representation. Since the rank of $\Psi(1)$ is $m = k - r$, there is a $k \times r$ matrix $H_1$ with rank $r$ such that $\Psi(1) H_1 = 0$. Let $H_2$ be a $k \times m$ matrix with rank $m$ such that $H_2' H_1 = 0$; then $A = \Psi(1) H_2$ has rank $m$. The matrix $H = (H_1, H_2)$ has rank $k$. By construction of $H$,

$$\Psi(1) H = [0, A] = A S_m$$

where $S_m = (0_{m \times r}, I_m)$. Since $\beta' \Psi(1) = 0$ and $\beta' \delta = 0$, $\delta$ lies in the column space of $\Psi(1)$ and can be written

$$\delta = \Psi(1) \tilde{\delta}$$

where $\tilde{\delta}$ is a $k$-dimensional vector. The common trends representation is written as

$$\begin{aligned}
y_t &= y_0 + \Psi(1)\Big[\tilde{\delta} t + \sum_{i=0}^{t} \epsilon_i\Big] + \Psi^{*}(B)\, \epsilon_t \\
&= y_0 + \Psi(1) H \Big[H^{-1} \tilde{\delta} t + H^{-1} \sum_{i=0}^{t} \epsilon_i\Big] + a_t \\
&= y_0 + A \tau_t + a_t
\end{aligned}$$

and

$$\tau_t = \pi + \tau_{t-1} + v_t$$

where $a_t = \Psi^{*}(B)\, \epsilon_t$, $\pi = S_m H^{-1} \tilde{\delta}$, $\tau_t = S_m [H^{-1} \tilde{\delta} t + H^{-1} \sum_{i=0}^{t} \epsilon_i]$, and $v_t = S_m H^{-1} \epsilon_t$.

Stock and Watson showed that the common trends representation expresses $y_t$ as a linear combination of $m$ random walks ($\tau_t$) with drift $\pi$ plus $I(0)$ components ($a_t$).
Test for the Common Trends

Stock and Watson (1988) proposed statistics for common trends testing. The null hypothesis is that the $k$-dimensional time series $y_t$ has $m$ common stochastic trends, where $m \le k$, and the alternative is that it has $s$ common trends, where $s < m$. The test procedure of $m$ versus $s$ common stochastic trends is performed based on the first-order serial correlation matrix of $y_t$. Let $\beta_\perp$ be a $k \times m$ matrix orthogonal to the cointegrating matrix such that $\beta_\perp' \beta = 0$ and $\beta_\perp' \beta_\perp = I_m$. Let $z_t = \beta' y_t$ and $w_t = \beta_\perp' y_t$. Then

$$w_t = \beta_\perp' y_0 + \beta_\perp' \delta t + \beta_\perp' \Psi(1) \sum_{i=0}^{t} \epsilon_i + \beta_\perp' \Psi^{*}(B)\, \epsilon_t$$

Combining the expression of $z_t$ and $w_t$,

$$\begin{bmatrix} z_t \\ w_t \end{bmatrix} =
\begin{bmatrix} \beta' y_0 \\ \beta_\perp' y_0 \end{bmatrix} +
\begin{bmatrix} 0 \\ \beta_\perp' \delta \end{bmatrix} t +
\begin{bmatrix} 0 \\ \beta_\perp' \Psi(1) \end{bmatrix} \sum_{i=1}^{t} \epsilon_i +
\begin{bmatrix} \beta' \Psi^{*}(B) \\ \beta_\perp' \Psi^{*}(B) \end{bmatrix} \epsilon_t$$

The Stock-Watson common trends test is performed based on the component $w_t$ by testing whether $\beta_\perp' \Psi(1)$ has rank $m$ against rank $s$.

The following statements perform the Stock-Watson test for common trends:

   proc iml;
      sig = 100*i(2);
      phi = {-0.2 0.1, 0.5 0.2, 0.8 0.7, -0.4 0.6};
      call varmasim(y,phi) sigma=sig n=100 initial=0
                           seed=45876;
      cn = {'y1' 'y2'};
      create simul2 from y[colname=cn];
      append from y;
   quit;

   data simul2;
      set simul2;
      date = intnx( 'year', '01jan1900'd, _n_-1 );
      format date year4. ;
   run;

   proc varmax data=simul2;
      model y1 y2 / p=2 cointtest=(sw);
   run;
In Figure 30.51, the first column is the null hypothesis that $y_t$ has $m \le k$ common trends; the second column is the alternative hypothesis that $y_t$ has $s < m$ common trends; the third column contains the eigenvalues used for the test statistics; the fourth column contains the test statistics using AR(p) filtering of the data. The table shows the output of the case p = 2.

Figure 30.51 Common Trends Test (COINTTEST=(SW) Option)

The VARMAX Procedure

Common Trend Test

   H0:      H1:                            5% Critical
   Rank=m   Rank=s   Eigenvalue   Filter         Value   Lag
        1        0     1.000906     0.09        -14.10     2
        2        0     0.996763    -0.32         -8.80
                 1     0.648908   -35.11        -23.00
The test statistic for testing for 2 versus 1 common trends is more negative (–35.1) than the critical value (–23.0). Therefore, the test rejects the null hypothesis, which means that the series has a single common trend.
Vector Error Correction Modeling

This section discusses the implication of cointegration for the autoregressive representation. Assume that the cointegrated series can be represented by a vector error correction model according to the Granger representation theorem (Engle and Granger 1987). Consider the vector autoregressive process with Gaussian errors defined by

\[ y_t = \sum_{i=1}^{p} \Phi_i y_{t-i} + \epsilon_t \]

or

\[ \Phi(B) y_t = \epsilon_t \]

where the initial values, $y_{-p+1}, \ldots, y_0$, are fixed and $\epsilon_t \sim N(0, \Sigma)$. Since the AR operator $\Phi(B)$ can be re-expressed as $\Phi(B) = \Phi^*(B)(1-B) + \Phi(1)B$, where $\Phi^*(B) = I_k - \sum_{i=1}^{p-1} \Phi^*_i B^i$ with $\Phi^*_i = -\sum_{j=i+1}^{p} \Phi_j$, the vector error correction model is

\[ \Phi^*(B)(1-B) y_t = \alpha\beta' y_{t-1} + \epsilon_t \]

or

\[ \Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Phi^*_i \Delta y_{t-i} + \epsilon_t \]

where $\alpha\beta' = -\Phi(1) = -I_k + \Phi_1 + \Phi_2 + \cdots + \Phi_p$.
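The mapping from VAR coefficients to the error correction form can be checked numerically. The following Python sketch (plain lists, no SAS involved) computes $\alpha\beta' = -\Phi(1)$ and $\Phi^*_1$ for a VAR(2). It assumes, as a reading of the PROC IML step earlier in this section that the text does not confirm, that the 4 x 2 matrix `phi` stacks $\Phi_1$ over $\Phi_2$; the function names are hypothetical.

```python
# Sketch: VAR(p) -> VECM(p) conversion with plain Python lists.
# Assumption: Phi_1 and Phi_2 are the top and bottom 2x2 blocks of the
# `phi` matrix passed to VARMASIM in the simulation step.

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_neg(a):
    return [[-x for x in row] for row in a]

def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def var_to_vecm(phis):
    """Return (Pi, [Phi*_1, ..., Phi*_{p-1}]) for VAR coefficients phis."""
    k = len(phis[0])
    # Pi = alpha beta' = -Phi(1) = -I_k + Phi_1 + ... + Phi_p
    pi = mat_neg(identity(k))
    for phi in phis:
        pi = mat_add(pi, phi)
    # Phi*_i = -(Phi_{i+1} + ... + Phi_p) for i = 1, ..., p-1
    stars = []
    for i in range(1, len(phis)):
        acc = [[0.0] * k for _ in range(k)]
        for phi in phis[i:]:
            acc = mat_add(acc, phi)
        stars.append(mat_neg(acc))
    return pi, stars

phi1 = [[-0.2, 0.1], [0.5, 0.2]]
phi2 = [[0.8, 0.7], [-0.4, 0.6]]
pi, stars = var_to_vecm([phi1, phi2])

# A zero determinant means Pi has reduced rank (here rank 1).
det_pi = pi[0][0] * pi[1][1] - pi[0][1] * pi[1][0]
print("Pi      =", pi)
print("Phi*_1  =", stars[0])
print("det(Pi) =", det_pi)
```

Under that assumption, $\Pi$ is approximately $\begin{bmatrix} -0.4 & 0.8 \\ 0.1 & -0.2 \end{bmatrix}$, which is singular and factors as $\alpha = (-0.4, 0.1)'$ and $\beta = (1, -2)'$, so the simulated system would have cointegration rank 1.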
One motivation for the VECM($p$) form is to consider the relation $\beta' y_t = c$ as defining the underlying economic relations and to assume that the agents react to the disequilibrium error $\beta' y_t - c$ through the adjustment coefficient $\alpha$ to restore equilibrium; that is, they satisfy the economic relations. The elements of the cointegrating vector $\beta$ are sometimes called the long-run parameters.

You can consider a vector error correction model with a deterministic term. The deterministic term $D_t$ can contain a constant, a linear trend, and seasonal dummy variables. Exogenous variables can also be included in the model.

\[ \Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{p-1} \Phi^*_i \Delta y_{t-i} + A D_t + \sum_{i=0}^{s} \Theta^*_i x_{t-i} + \epsilon_t \]

where $\Pi = \alpha\beta'$.

The alternative vector error correction representation considers the error correction term at lag $t-p$ and is written as

\[ \Delta y_t = \sum_{i=1}^{p-1} \Phi^\sharp_i \Delta y_{t-i} + \Pi^\sharp y_{t-p} + A D_t + \sum_{i=0}^{s} \Theta^*_i x_{t-i} + \epsilon_t \]
If the matrix $\Pi$ has full rank ($r = k$), all components of $y_t$ are $I(0)$. On the other hand, $y_t$ are stationary in difference if $\mathrm{rank}(\Pi) = 0$. When the rank of the matrix $\Pi$ is $r < k$, there are $k - r$ linear combinations that are nonstationary and $r$ stationary cointegrating relations. Note that the linearly independent vector $z_t = \beta' y_t$ is stationary and this transformation is not unique unless $r = 1$. There does not exist a unique cointegrating matrix $\beta$ since the coefficient matrix $\Pi$ can also be decomposed as

\[ \Pi = \alpha M M^{-1} \beta' = \alpha^* \beta^{*\prime} \]

where $M$ is an $r \times r$ nonsingular matrix.
Test for the Cointegration

The cointegration rank test determines the number of linearly independent columns of $\Pi$. Johansen (1988, 1995a) and Johansen and Juselius (1990) proposed the cointegration rank test by using the reduced rank regression.
Different Specifications of Deterministic Trends

When you construct the VECM($p$) form from the VAR($p$) model, the deterministic terms in the VECM($p$) form can differ from those in the VAR($p$) model. When there are deterministic cointegrated relationships among variables, deterministic terms in the VAR($p$) model are not present in the VECM($p$) form. On the other hand, if there are stochastic cointegrated relationships in the VAR($p$) model, deterministic terms appear in the VECM($p$) form via the error correction term or as an independent term in the VECM($p$) form. There are five different specifications of deterministic trends in the VECM($p$) form.

Case 1: There is no separate drift in the VECM($p$) form.

\[ \Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Phi^*_i \Delta y_{t-i} + \epsilon_t \]

Case 2: There is no separate drift in the VECM($p$) form, but a constant enters only via the error correction term.

\[ \Delta y_t = \alpha(\beta', \beta_0)(y_{t-1}', 1)' + \sum_{i=1}^{p-1} \Phi^*_i \Delta y_{t-i} + \epsilon_t \]

Case 3: There is a separate drift and no separate linear trend in the VECM($p$) form.

\[ \Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Phi^*_i \Delta y_{t-i} + \delta_0 + \epsilon_t \]

Case 4: There is a separate drift and no separate linear trend in the VECM($p$) form, but a linear trend enters only via the error correction term.

\[ \Delta y_t = \alpha(\beta', \beta_1)(y_{t-1}', t)' + \sum_{i=1}^{p-1} \Phi^*_i \Delta y_{t-i} + \delta_0 + \epsilon_t \]

Case 5: There is a separate linear trend in the VECM($p$) form.

\[ \Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Phi^*_i \Delta y_{t-i} + \delta_0 + \delta_1 t + \epsilon_t \]
First, focus on Cases 1, 3, and 5 to test the null hypothesis that there are at most $r$ cointegrating vectors. Let

\[ \begin{aligned}
Z_{0t} &= \Delta y_t \\
Z_{1t} &= y_{t-1} \\
Z_{2t} &= [\Delta y_{t-1}', \ldots, \Delta y_{t-p+1}', D_t']' \\
Z_0 &= [Z_{01}, \ldots, Z_{0T}]' \\
Z_1 &= [Z_{11}, \ldots, Z_{1T}]' \\
Z_2 &= [Z_{21}, \ldots, Z_{2T}]'
\end{aligned} \]

where $D_t$ can be empty for Case 1, 1 for Case 3, and $(1, t)$ for Case 5.

In Case 2, $Z_{1t}$ and $Z_{2t}$ are defined as

\[ \begin{aligned}
Z_{1t} &= [y_{t-1}', 1]' \\
Z_{2t} &= [\Delta y_{t-1}', \ldots, \Delta y_{t-p+1}']'
\end{aligned} \]

In Case 4, $Z_{1t}$ and $Z_{2t}$ are defined as

\[ \begin{aligned}
Z_{1t} &= [y_{t-1}', t]' \\
Z_{2t} &= [\Delta y_{t-1}', \ldots, \Delta y_{t-p+1}', 1]'
\end{aligned} \]
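Let $\Psi$ be the matrix of parameters consisting of $\Phi^*_1, \ldots, \Phi^*_{p-1}$, $A$, and $\Theta^*_0, \ldots, \Theta^*_s$, where the parameters $A$ correspond to the regressors $D_t$. Then the VECM($p$) form is rewritten in these variables as

\[ Z_{0t} = \alpha\beta' Z_{1t} + \Psi Z_{2t} + \epsilon_t \]

The log-likelihood function is given by

\[ \ell = -\frac{kT}{2}\log 2\pi - \frac{T}{2}\log|\Sigma| - \frac{1}{2}\sum_{t=1}^{T} (Z_{0t} - \alpha\beta' Z_{1t} - \Psi Z_{2t})' \Sigma^{-1} (Z_{0t} - \alpha\beta' Z_{1t} - \Psi Z_{2t}) \]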
The residuals, $R_{0t}$ and $R_{1t}$, are obtained by regressing $Z_{0t}$ and $Z_{1t}$ on $Z_{2t}$, respectively. The regression equation of residuals is

\[ R_{0t} = \alpha\beta' R_{1t} + \hat{\epsilon}_t \]

The crossproducts matrices are computed as

\[ S_{ij} = \frac{1}{T}\sum_{t=1}^{T} R_{it} R_{jt}', \quad i, j = 0, 1 \]

Then the maximum likelihood estimator for $\beta$ is obtained from the eigenvectors that correspond to the $r$ largest eigenvalues of the following equation:

\[ |\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}| = 0 \]
The eigenvalues of the preceding equation are squared canonical correlations between $R_{0t}$ and $R_{1t}$, and the eigenvectors that correspond to the $r$ largest eigenvalues are the $r$ linear combinations of $y_{t-1}$, which have the largest squared partial correlations with the stationary process $\Delta y_t$ after correcting for lags and deterministic terms. Such an analysis calls for a reduced rank regression of $\Delta y_t$ on $y_{t-1}$ corrected for $(\Delta y_{t-1}, \ldots, \Delta y_{t-p+1}, D_t)$, as discussed by Anderson (1951). Johansen (1988) suggests two test statistics to test the null hypothesis that there are at most $r$ cointegrating vectors:

\[ H_0\colon \lambda_i = 0 \quad \text{for } i = r+1, \ldots, k \]
Trace Test

The trace statistic for testing the null hypothesis that there are at most $r$ cointegrating vectors is as follows:

\[ \lambda_{trace} = -T \sum_{i=r+1}^{k} \log(1 - \lambda_i) \]

The asymptotic distribution of this statistic is given by

\[ \mathrm{tr}\left\{ \int_0^1 (dW)\tilde{W}' \left( \int_0^1 \tilde{W}\tilde{W}'\,dr \right)^{-1} \int_0^1 \tilde{W}(dW)' \right\} \]

where $\mathrm{tr}(A)$ is the trace of a matrix $A$, $W$ is the $k - r$ dimensional Brownian motion, and $\tilde{W}$ is the Brownian motion itself, or the demeaned or detrended Brownian motion according to the different specifications of deterministic trends in the vector error correction model.

Maximum Eigenvalue Test

The maximum eigenvalue statistic for testing the null hypothesis that there are at most $r$ cointegrating vectors is as follows:

\[ \lambda_{max} = -T \log(1 - \lambda_{r+1}) \]

The asymptotic distribution of this statistic is given by

\[ \max\left\{ \int_0^1 (dW)\tilde{W}' \left( \int_0^1 \tilde{W}\tilde{W}'\,dr \right)^{-1} \int_0^1 \tilde{W}(dW)' \right\} \]

where $\max(A)$ is the maximum eigenvalue of a matrix $A$. Osterwald-Lenum (1992) provided detailed tables of the critical values of these statistics.

The following statements use the JOHANSEN option to compute the Johansen cointegration rank trace test of integrated order 1:

    proc varmax data=simul2;
       model y1 y2 / p=2 cointtest=(johansen=(normalize=y1));
    run;
Figure 30.52 shows the output based on the model specified in the MODEL statement; an intercept term is assumed. In the “Cointegration Rank Test Using Trace” table, the column Drift in ECM means there is no separate drift in the error correction model, and the column Drift in Process means the process has a constant drift before differencing. The “Cointegration Rank Test Using Trace” table shows the trace statistics based on Case 3, and the “Cointegration Rank Test Using Trace under Restriction” table shows the trace statistics based on Case 2. The output indicates that the series are cointegrated with rank 1: the null hypothesis of rank 0 is rejected because its trace statistic exceeds the critical value, whereas the trace statistic for rank 1 is smaller than the critical value in both Case 2 and Case 3.
Figure 30.52 Cointegration Rank Test (COINTTEST=(JOHANSEN=) Option)

The VARMAX Procedure
Cointegration Rank Test Using Trace

H0: Rank=r   H1: Rank>r   Eigenvalue     Trace   5% Critical Value   Drift in ECM   Drift in Process
0            0                0.4644   61.7522               15.34   Constant       Linear
1            1                0.0056    0.5552                3.84

Cointegration Rank Test Using Trace Under Restriction

H0: Rank=r   H1: Rank>r   Eigenvalue     Trace   5% Critical Value   Drift in ECM   Drift in Process
0            0                0.5209   76.3788               19.99   Constant       Constant
1            1                0.0426    4.2680                9.13
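The trace and maximum eigenvalue statistics are simple functions of the eigenvalues, so the Case 3 table in Figure 30.52 can be approximately recomputed by hand. The following Python sketch is an illustration outside SAS, not part of PROC VARMAX. The effective sample size T = 98 is an assumption (100 simulated observations minus p = 2 presample values), and the eigenvalues are the rounded values printed in the figure, so the results agree with the printed statistics only to about two decimal places.

```python
import math

def trace_statistic(eigenvalues, T, r):
    """-T * sum_{i=r+1}^{k} log(1 - lambda_i); eigenvalues sorted descending."""
    return -T * sum(math.log(1.0 - lam) for lam in eigenvalues[r:])

def max_eigen_statistic(eigenvalues, T, r):
    """-T * log(1 - lambda_{r+1})."""
    return -T * math.log(1.0 - eigenvalues[r])

eigs = [0.4644, 0.0056]   # rounded eigenvalues printed in Figure 30.52 (Case 3)
critical = [15.34, 3.84]  # 5% critical values printed in Figure 30.52
T = 98                    # assumption: 100 observations minus p = 2 presample values

for r in range(len(eigs)):
    print(r, round(trace_statistic(eigs, T, r), 2),
          round(max_eigen_statistic(eigs, T, r), 2))

# Sequential rule: pick the first r whose trace statistic is below its critical value.
selected_rank = next(
    (r for r in range(len(eigs)) if trace_statistic(eigs, T, r) < critical[r]),
    len(eigs),
)
print("selected rank:", selected_rank)
```

The sequential rule rejects rank 0 and stops at rank 1, which matches the conclusion drawn from the figure.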
Figure 30.53 shows which result, either Case 2 (the hypothesis H0) or Case 3 (the hypothesis H1), is appropriate depending on the significance level. Since the cointegration rank is chosen to be 1 by the result in Figure 30.52, look at the last row, which corresponds to rank=1. Since the p-value is 0.054, Case 2 cannot be rejected at the 5% significance level, but it can be rejected at the 10% significance level. For the modeling of Case 2 and Case 3, see Figure 30.56 and Figure 30.57.

Figure 30.53 Cointegration Rank Test Continued

Hypothesis of the Restriction

Hypothesis    Drift in ECM   Drift in Process
H0(Case 2)    Constant       Constant
H1(Case 3)    Constant       Linear

Hypothesis Test of the Restriction

Rank   Eigenvalue   Restricted Eigenvalue   DF   Chi-Square   Pr > ChiSq
0          0.4644                  0.5209    2        14.63       0.0007
1          0.0056                  0.0426    1         3.71       0.0540
Figure 30.54 shows the estimates of long-run parameter (Beta) and adjustment coefficients (Alpha) based on Case 3.
Figure 30.54 Cointegration Rank Test Continued

Beta

Variable           1           2
y1           1.00000     1.00000
y2          -2.04869    -0.02854

Alpha

Variable           1           2
y1          -0.46421    -0.00502
y2           0.17535    -0.01275
Using the NORMALIZE= option, the first row of the “Beta” table has 1. Considering that the cointegration rank is 1, the long-run relationship of the series is

\[ \beta' y_t = \begin{bmatrix} 1 & -2.04869 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = y_{1t} - 2.04869\, y_{2t} \]

\[ y_{1t} = 2.04869\, y_{2t} \]
Figure 30.55 shows the estimates of long-run parameter (Beta) and adjustment coefficients (Alpha) based on Case 2.

Figure 30.55 Cointegration Rank Test Continued

Beta Under Restriction

Variable           1            2
y1           1.00000      1.00000
y2          -2.04366     -2.75773
1            6.75919    101.37051

Alpha Under Restriction

Variable           1           2
y1          -0.48015     0.01091
y2           0.12538     0.03722
Considering that the cointegration rank is 1, the long-run relationship of the series is

\[ \beta' y_t = \begin{bmatrix} 1 & -2.04366 & 6.75919 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ 1 \end{bmatrix} = y_{1t} - 2.04366\, y_{2t} + 6.75919 \]

\[ y_{1t} = 2.04366\, y_{2t} - 6.75919 \]
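With cointegration rank 1, the long-run impact matrix is the outer product of the first Alpha column and the first Beta column. As a quick consistency check, the following Python sketch multiplies the estimates printed in Figure 30.54 (Case 3) and Figure 30.55 (Case 2); the products should agree, up to rounding of the printed values, with the “Alpha * Beta'” tables reported for the fitted models in Figure 30.57 and Figure 30.56. The helper name `outer` is illustrative.

```python
def outer(alpha, beta):
    """Return the matrix alpha * beta' for column vectors given as lists."""
    return [[a * b for b in beta] for a in alpha]

# Case 3 (Figure 30.54): first columns of the Alpha and Beta tables
alpha3 = [-0.46421, 0.17535]
beta3 = [1.00000, -2.04869]

# Case 2 (Figure 30.55): beta carries the restricted constant as a third element
alpha2 = [-0.48015, 0.12538]
beta2 = [1.00000, -2.04366, 6.75919]

pi3 = outer(alpha3, beta3)
pi2 = outer(alpha2, beta2)

for label, mat in (("Case 3", pi3), ("Case 2", pi2)):
    print(label)
    for row in mat:
        print("  ", [round(x, 5) for x in row])
```

Because the inputs are rounded to five decimals, the last printed digit of a product can differ by one unit from the value in the figures.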
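Estimation of Vector Error Correction Model

The preceding log-likelihood function is maximized for

\[ \begin{aligned}
\hat{\beta} &= S_{11}^{-1/2} [v_1, \ldots, v_r] \\
\hat{\alpha} &= S_{01} \hat{\beta} (\hat{\beta}' S_{11} \hat{\beta})^{-1} \\
\hat{\Pi} &= \hat{\alpha} \hat{\beta}' \\
\hat{\Psi}' &= (Z_2' Z_2)^{-1} Z_2' (Z_0 - Z_1 \hat{\Pi}') \\
\hat{\Sigma} &= (Z_0 - Z_2 \hat{\Psi}' - Z_1 \hat{\Pi}')' (Z_0 - Z_2 \hat{\Psi}' - Z_1 \hat{\Pi}') / T
\end{aligned} \]

The estimators of the orthogonal complements of $\alpha$ and $\beta$ are

\[ \hat{\beta}_\perp = S_{11} [v_{r+1}, \ldots, v_k] \]

and

\[ \hat{\alpha}_\perp = S_{00}^{-1} S_{01} [v_{r+1}, \ldots, v_k] \]

The ML estimators have the following asymptotic properties:

\[ \sqrt{T}\, \mathrm{vec}([\hat{\Pi}, \hat{\Psi}] - [\Pi, \Psi]) \xrightarrow{d} N(0, \Sigma_{co}) \]

where

\[ \Sigma_{co} = \Sigma \otimes \left( \begin{bmatrix} \beta & 0 \\ 0 & I_k \end{bmatrix} \Omega^{-1} \begin{bmatrix} \beta' & 0 \\ 0 & I_k \end{bmatrix} \right) \]

and

\[ \Omega = \mathrm{plim}\, \frac{1}{T} \begin{bmatrix} \beta' Z_1' Z_1 \beta & \beta' Z_1' Z_2 \\ Z_2' Z_1 \beta & Z_2' Z_2 \end{bmatrix} \]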
The following statements are examples of fitting the five different cases of the vector error correction models mentioned in the previous section.

For fitting Case 1,

    model y1 y2 / p=2 ecm=(rank=1 normalize=y1) noint;

For fitting Case 2,

    model y1 y2 / p=2 ecm=(rank=1 normalize=y1 ectrend);

For fitting Case 3,

    model y1 y2 / p=2 ecm=(rank=1 normalize=y1);

For fitting Case 4,

    model y1 y2 / p=2 ecm=(rank=1 normalize=y1 ectrend) trend=linear;

For fitting Case 5,

    model y1 y2 / p=2 ecm=(rank=1 normalize=y1) trend=linear;
From Figure 30.53 that uses the COINTTEST=(JOHANSEN) option, you can fit the model by using either Case 2 or Case 3 because the test was not significant at the 0.05 level, but was significant at the 0.10 level. Here both models are fitted to show the difference in output display. Figure 30.56 is for Case 2, and Figure 30.57 is for Case 3.

For Case 2,

    proc varmax data=simul2;
       model y1 y2 / p=2 ecm=(rank=1 normalize=y1 ectrend)
                     print=(estimates);
    run;

Figure 30.56 Parameter Estimation with the ECTREND Option

The VARMAX Procedure
Parameter Alpha * Beta' Estimates

Variable          y1          y2           1
y1          -0.48015     0.98126    -3.24543
y2           0.12538    -0.25624     0.84748

AR Coefficients of Differenced Lag

DIF Lag   Variable          y1          y2
1         y1          -0.72759    -0.77463
          y2           0.38982    -0.55173

Model Parameter Estimates

Equation   Parameter    Estimate   Standard Error   t Value   Pr > |t|   Variable
D_y1       CONST1       -3.24543          0.33022                        1, EC
           AR1_1_1      -0.48015          0.04886                        y1(t-1)
           AR1_1_2       0.98126          0.09984                        y2(t-1)
           AR2_1_1      -0.72759          0.04623    -15.74     0.0001   D_y1(t-1)
           AR2_1_2      -0.77463          0.04978    -15.56     0.0001   D_y2(t-1)
D_y2       CONST2        0.84748          0.35394                        1, EC
           AR1_2_1       0.12538          0.05236                        y1(t-1)
           AR1_2_2      -0.25624          0.10702                        y2(t-1)
           AR2_2_1       0.38982          0.04955      7.87     0.0001   D_y1(t-1)
           AR2_2_2      -0.55173          0.05336    -10.34     0.0001   D_y2(t-1)
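Figure 30.56 can be reported as follows:

\[ \Delta y_t = \begin{bmatrix} -0.48015 & 0.98126 & -3.24543 \\ 0.12538 & -0.25624 & 0.84748 \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \\ 1 \end{bmatrix} + \begin{bmatrix} -0.72759 & -0.77463 \\ 0.38982 & -0.55173 \end{bmatrix} \Delta y_{t-1} + \epsilon_t \]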
The keyword “EC” in the “Model Parameter Estimates” table means that the ECTREND option is used for fitting the model.

For fitting Case 3,

    proc varmax data=simul2;
       model y1 y2 / p=2 ecm=(rank=1 normalize=y1)
                     print=(estimates);
    run;

Figure 30.57 Parameter Estimation without the ECTREND Option

The VARMAX Procedure
Parameter Alpha * Beta' Estimates

Variable          y1          y2
y1          -0.46421     0.95103
y2           0.17535    -0.35923

AR Coefficients of Differenced Lag

DIF Lag   Variable          y1          y2
1         y1          -0.74052    -0.76305
          y2           0.34820    -0.51194

Model Parameter Estimates

Equation   Parameter    Estimate   Standard Error   t Value   Pr > |t|   Variable
D_y1       CONST1       -2.60825          1.32398     -1.97     0.0518   1
           AR1_1_1      -0.46421          0.05474                        y1(t-1)
           AR1_1_2       0.95103          0.11215                        y2(t-1)
           AR2_1_1      -0.74052          0.05060    -14.63     0.0001   D_y1(t-1)
           AR2_1_2      -0.76305          0.05352    -14.26     0.0001   D_y2(t-1)
D_y2       CONST2        3.43005          1.39587      2.46     0.0159   1
           AR1_2_1       0.17535          0.05771                        y1(t-1)
           AR1_2_2      -0.35923          0.11824                        y2(t-1)
           AR2_2_1       0.34820          0.05335      6.53     0.0001   D_y1(t-1)
           AR2_2_2      -0.51194          0.05643     -9.07     0.0001   D_y2(t-1)
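Figure 30.57 can be reported as follows:

\[ \Delta y_t = \begin{bmatrix} -0.46421 & 0.95103 \\ 0.17535 & -0.35923 \end{bmatrix} y_{t-1} + \begin{bmatrix} -0.74052 & -0.76305 \\ 0.34820 & -0.51194 \end{bmatrix} \Delta y_{t-1} + \begin{bmatrix} -2.60825 \\ 3.43005 \end{bmatrix} + \epsilon_t \]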
Test for the Linear Restriction on the Parameters

Consider the example with the variables $m_t$ log real money, $y_t$ log real income, $i_t^d$ deposit interest rate, and $i_t^b$ bond interest rate. It seems a natural hypothesis that in the long-run relation, money and income have equal coefficients with opposite signs. This can be formulated as the hypothesis that the cointegrated relation contains only $m_t$ and $y_t$ through $m_t - y_t$. For the analysis, you can express these restrictions in the parameterization of $H$ such that $\beta = H\phi$, where $H$ is a known $k \times s$ matrix and $\phi$ is the $s \times r$ ($r \le s < k$) parameter matrix to be estimated. For this example, $H$ is given by

\[ H = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
Restriction $H_0\colon \beta = H\phi$

When the linear restriction $\beta = H\phi$ is given, it implies that the same restrictions are imposed on all cointegrating vectors. You obtain the maximum likelihood estimator of $\beta$ by reduced rank regression of $\Delta y_t$ on $H' y_{t-1}$ corrected for $(\Delta y_{t-1}, \ldots, \Delta y_{t-p+1}, D_t)$, solving the following equation

\[ |\rho H' S_{11} H - H' S_{10} S_{00}^{-1} S_{01} H| = 0 \]

for the eigenvalues $1 > \rho_1 > \cdots > \rho_s > 0$ and eigenvectors $(v_1, \ldots, v_s)$, with $S_{ij}$ given in the preceding section. Then choose $\hat{\phi} = (v_1, \ldots, v_r)$ that corresponds to the $r$ largest eigenvalues, and the $\hat{\beta}$ is $H\hat{\phi}$.

The test statistic for $H_0\colon \beta = H\phi$ is given by

\[ T \sum_{i=1}^{r} \log\{(1 - \rho_i)/(1 - \lambda_i)\} \xrightarrow{d} \chi^2_{r(k-s)} \]
If the series has no deterministic trend, the constant term should be restricted by $\alpha_\perp' \delta_0 = 0$ as in Case 2. Then $H$ is given by

\[ H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

The following statements test that $2\beta_1 + \beta_2 = 0$:

    proc varmax data=simul2;
       model y1 y2 / p=2 ecm=(rank=1 normalize=y1);
       cointeg rank=1 h=(1,-2);
    run;
Figure 30.58 shows the results of testing $H_0\colon 2\beta_1 + \beta_2 = 0$. The input $H$ matrix is $H = (1\ {-2})'$. The adjustment coefficient is reestimated under the restriction, and the test indicates that you cannot reject the null hypothesis.

Figure 30.58 Testing of Linear Restriction (H= Option)

The VARMAX Procedure
Beta Under Restriction

Variable           1
y1           1.00000
y2          -2.00000

Alpha Under Restriction

Variable           1
y1          -0.47404
y2           0.17534

Hypothesis Test

Index   Eigenvalue   Restricted Eigenvalue   DF   Chi-Square   Pr > ChiSq
1           0.4644                  0.4616    1         0.51       0.4738
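The likelihood ratio statistic for the restriction depends only on the restricted and unrestricted eigenvalues, so the last row of Figure 30.58 can be approximately reproduced by hand. The following Python sketch is an illustration, not SAS output. It assumes an effective sample size of T = 98 (100 observations minus p = 2 presample values) and uses the rounded eigenvalues printed in the figure; the p-value uses the closed form for a chi-square variate with one degree of freedom.

```python
import math

def lr_restriction_stat(T, restricted, unrestricted):
    """T * sum_i log{(1 - rho_i) / (1 - lambda_i)} over the r largest eigenvalues."""
    return T * sum(math.log((1.0 - rho) / (1.0 - lam))
                   for rho, lam in zip(restricted, unrestricted))

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

T = 98                       # assumption: 100 observations minus p = 2 presample values
r, k, s = 1, 2, 1            # rank, number of series, columns of H
df = r * (k - s)             # degrees of freedom r(k - s)

stat = lr_restriction_stat(T, restricted=[0.4616], unrestricted=[0.4644])
pval = chi2_sf_df1(stat)     # valid here because df == 1
print("df =", df, "chi-square =", round(stat, 2), "p-value =", round(pval, 3))
```

The recomputed values are close to the printed chi-square of 0.51 and p-value of 0.4738; small differences come from the rounding of the eigenvalues.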
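Test for the Weak Exogeneity and Restrictions of Alpha

Consider a vector error correction model:

\[ \Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Phi^*_i \Delta y_{t-i} + A D_t + \epsilon_t \]

Divide the process $y_t$ into $(y_{1t}', y_{2t}')'$ with dimension $k_1$ and $k_2$ and the $\Sigma$ into

\[ \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \]

Similarly, the parameters can be decomposed as follows:

\[ \alpha = \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \qquad \Phi^*_i = \begin{bmatrix} \Phi^*_{1i} \\ \Phi^*_{2i} \end{bmatrix} \qquad A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} \]

Then the VECM($p$) form can be rewritten by using the decomposed parameters and processes:

\[ \begin{bmatrix} \Delta y_{1t} \\ \Delta y_{2t} \end{bmatrix} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \beta' y_{t-1} + \sum_{i=1}^{p-1} \begin{bmatrix} \Phi^*_{1i} \\ \Phi^*_{2i} \end{bmatrix} \Delta y_{t-i} + \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} D_t + \begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix} \]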
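The conditional model for $y_{1t}$ given $y_{2t}$ is

\[ \Delta y_{1t} = \omega \Delta y_{2t} + (\alpha_1 - \omega\alpha_2) \beta' y_{t-1} + \sum_{i=1}^{p-1} (\Phi^*_{1i} - \omega\Phi^*_{2i}) \Delta y_{t-i} + (A_1 - \omega A_2) D_t + \epsilon_{1t} - \omega\epsilon_{2t} \]

and the marginal model of $y_{2t}$ is

\[ \Delta y_{2t} = \alpha_2 \beta' y_{t-1} + \sum_{i=1}^{p-1} \Phi^*_{2i} \Delta y_{t-i} + A_2 D_t + \epsilon_{2t} \]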
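where $\omega = \Sigma_{12} \Sigma_{22}^{-1}$.

The test of weak exogeneity of $y_{2t}$ for the parameters $(\alpha_1, \beta)$ determines whether $\alpha_2 = 0$. Weak exogeneity means that there is no information about $\beta$ in the marginal model or that the variables $y_{2t}$ do not react to a disequilibrium.

Restriction $H_0\colon \alpha = J\psi$

Consider the null hypothesis $H_0\colon \alpha = J\psi$, where $J$ is a $k \times m$ matrix with $r \le m < k$.

From the previous residual regression equation

\[ R_{0t} = \alpha\beta' R_{1t} + \hat{\epsilon}_t = J\psi\beta' R_{1t} + \hat{\epsilon}_t \]

you can obtain

\[ \begin{aligned}
\bar{J}' R_{0t} &= \psi\beta' R_{1t} + \bar{J}' \hat{\epsilon}_t \\
J_\perp' R_{0t} &= J_\perp' \hat{\epsilon}_t
\end{aligned} \]

where $\bar{J} = J(J'J)^{-1}$ and $J_\perp$ is orthogonal to $J$ such that $J_\perp' J = 0$.

Define $\Sigma_{JJ_\perp} = \bar{J}' \Sigma J_\perp$ and $\Sigma_{J_\perp J_\perp} = J_\perp' \Sigma J_\perp$, and let $\omega = \Sigma_{JJ_\perp} \Sigma_{J_\perp J_\perp}^{-1}$. Then $\bar{J}' R_{0t}$ can be written as

\[ \bar{J}' R_{0t} = \psi\beta' R_{1t} + \omega J_\perp' R_{0t} + \bar{J}' \hat{\epsilon}_t - \omega J_\perp' \hat{\epsilon}_t \]

Using the marginal distribution of $J_\perp' R_{0t}$ and the conditional distribution of $\bar{J}' R_{0t}$, the new residuals are computed as

\[ \begin{aligned}
\tilde{R}_{Jt} &= \bar{J}' R_{0t} - S_{JJ_\perp} S_{J_\perp J_\perp}^{-1} J_\perp' R_{0t} \\
\tilde{R}_{1t} &= R_{1t} - S_{1J_\perp} S_{J_\perp J_\perp}^{-1} J_\perp' R_{0t}
\end{aligned} \]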
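where

\[ S_{JJ_\perp} = \bar{J}' S_{00} J_\perp, \quad S_{J_\perp J_\perp} = J_\perp' S_{00} J_\perp, \quad \text{and} \quad S_{J_\perp 1} = J_\perp' S_{01} \]

In terms of $\tilde{R}_{Jt}$ and $\tilde{R}_{1t}$, the MLE of $\beta$ is computed by using the reduced rank regression. Let

\[ S_{ij.J_\perp} = \frac{1}{T} \sum_{t=1}^{T} \tilde{R}_{it} \tilde{R}_{jt}', \quad \text{for } i, j = 1, J \]

Under the null hypothesis $H_0\colon \alpha = J\psi$, the MLE $\tilde{\beta}$ is computed by solving the equation

\[ |\rho S_{11.J_\perp} - S_{1J.J_\perp} S_{JJ.J_\perp}^{-1} S_{J1.J_\perp}| = 0 \]

Then $\tilde{\beta} = (v_1, \ldots, v_r)$, where the eigenvectors correspond to the $r$ largest eigenvalues. The likelihood ratio test for $H_0\colon \alpha = J\psi$ is

\[ T \sum_{i=1}^{r} \log\{(1 - \rho_i)/(1 - \lambda_i)\} \xrightarrow{d} \chi^2_{r(k-m)} \]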
The test of weak exogeneity of $y_{2t}$ is a special case of the test $\alpha = J\psi$, considering $J = (I_{k_1}, 0)'$. Consider the previous example with four variables ($m_t, y_t, i_t^b, i_t^d$). If $r = 1$, you formulate the weak exogeneity of ($y_t, i_t^b, i_t^d$) for $m_t$ as $J = [1, 0, 0, 0]'$ and the weak exogeneity of $i_t^d$ for ($m_t, y_t, i_t^b$) as $J = [I_3, 0]'$.

The following statements test the weak exogeneity of other variables, assuming $r = 1$:

    proc varmax data=simul2;
       model y1 y2 / p=2 ecm=(rank=1 normalize=y1);
       cointeg rank=1 exogeneity;
    run;

Figure 30.59 shows that neither variable is weakly exogenous for the other: the hypothesis of weak exogeneity is rejected for both y1 and y2.

Figure 30.59 Testing of Weak Exogeneity (EXOGENEITY Option)

The VARMAX Procedure
Testing Weak Exogeneity of Each Variables

Variable   DF   Chi-Square   Pr > ChiSq
y1          1        53.46       <.0001
y2          1         8.76       0.0031
Forecasting of the VECM

Consider the cointegrated moving-average representation of the differenced process of $y_t$

\[ \Delta y_t = \delta + \Psi(B) \epsilon_t \]

Assume that $y_0 = 0$. The linear process $y_t$ can be written as

\[ y_t = \delta t + \sum_{i=1}^{t} \sum_{j=0}^{t-i} \Psi_j \epsilon_i \]

Therefore, for any $l > 0$,

\[ y_{t+l} = \delta(t+l) + \sum_{i=1}^{t} \sum_{j=0}^{t+l-i} \Psi_j \epsilon_i + \sum_{i=1}^{l} \sum_{j=0}^{l-i} \Psi_j \epsilon_{t+i} \]

The $l$-step-ahead forecast is derived from the preceding equation:

\[ y_{t+l|t} = \delta(t+l) + \sum_{i=1}^{t} \sum_{j=0}^{t+l-i} \Psi_j \epsilon_i \]

Note that

\[ \lim_{l \to \infty} \beta' y_{t+l|t} = 0 \]

since $\lim_{l \to \infty} \sum_{j=0}^{t+l-i} \Psi_j = \Psi(1)$ and $\beta' \Psi(1) = 0$. The long-run forecast of the cointegrated system shows that the cointegrated relationship holds, although there might exist some deviations from the equilibrium status in the short run.

The covariance matrix of the prediction error $e_{t+l|t} = y_{t+l} - y_{t+l|t}$ is

\[ \Sigma(l) = \sum_{i=1}^{l} \left[ \left( \sum_{j=0}^{l-i} \Psi_j \right) \Sigma \left( \sum_{j=0}^{l-i} \Psi_j' \right) \right] \]
When the linear process is represented as a VECM($p$) model, you can obtain

\[ \Delta y_t = \Pi y_{t-1} + \sum_{j=1}^{p-1} \Phi^*_j \Delta y_{t-j} + \delta + \epsilon_t \]

The transition equation is defined as

\[ z_t = F z_{t-1} + e_t \]

where $z_t = (y_{t-1}', \Delta y_t', \Delta y_{t-1}', \ldots, \Delta y_{t-p+2}')'$ is a state vector and the transition matrix is

\[ F = \begin{bmatrix}
I_k & I_k & 0 & \cdots & 0 \\
\Pi & (\Pi + \Phi^*_1) & \Phi^*_2 & \cdots & \Phi^*_{p-1} \\
0 & I_k & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & I_k & 0
\end{bmatrix} \]

where 0 is a $k \times k$ zero matrix. The observation equation can be written

\[ y_t = \delta t + H z_t \]

where $H = [I_k, I_k, 0, \ldots, 0]$.

The $l$-step-ahead forecast is computed as

\[ y_{t+l|t} = \delta(t+l) + H F^l z_t \]
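To make the state-space recursion concrete, the following Python sketch builds $F$ and $H$ for a bivariate VECM(2) with $\delta = 0$ and iterates the transition equation. The coefficient values and starting observations are hypothetical, chosen so that $\Pi = \alpha\beta'$ with $\beta = (1, -2)'$; they are not estimates from this chapter, and the function names are illustrative.

```python
def matvec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

# Hypothetical VECM(2) with k = 2 and delta = 0:
Pi = [[-0.4, 0.8], [0.1, -0.2]]        # alpha beta', beta = (1, -2)'
Phi1s = [[-0.8, -0.7], [0.4, -0.6]]    # Phi*_1
beta = [1.0, -2.0]

# For p = 2 the state is z_t = (y_{t-1}', dy_t')', so
# F = [[I, I], [Pi, Pi + Phi*_1]] and H = [I, I].
F = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    Pi[0] + [Pi[0][j] + Phi1s[0][j] for j in range(2)],
    Pi[1] + [Pi[1][j] + Phi1s[1][j] for j in range(2)],
]
H = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

def forecast(z, steps):
    """Return H F^steps z, the steps-ahead forecast when delta = 0."""
    for _ in range(steps):
        z = matvec(F, z)
    return matvec(H, z)

y_prev, y_now = [1.0, 2.0], [3.0, 0.5]          # hypothetical y_{t-1}, y_t
z_t = y_prev + [y_now[0] - y_prev[0], y_now[1] - y_prev[1]]

one_step = forecast(z_t, 1)
long_run = forecast(z_t, 2000)
disequilibrium = sum(b * y for b, y in zip(beta, long_run))

print("one-step forecast:", one_step)
print("beta' y at long horizon:", disequilibrium)
```

The one-step result equals $y_t + \Pi y_t + \Phi^*_1 \Delta y_t$, and at a long horizon $\beta' y_{t+l|t}$ is numerically zero, which illustrates the limit stated in the forecasting discussion.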
Cointegration with Exogenous Variables

The error correction model with exogenous variables can be written as follows:

\[ \Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Phi^*_i \Delta y_{t-i} + A D_t + \sum_{i=0}^{s} \Theta^*_i x_{t-i} + \epsilon_t \]

The following statements demonstrate how to fit a VECMX($p, s$) model, where $p = 2$ and $s = 1$ from the P=2 and XLAG=1 options:

    proc varmax data=simul3;
       model y1 y2 = x1 / p=2 xlag=1 ecm=(rank=1);
    run;

The following statements demonstrate how to fit a BVECMX(2,1) model:

    proc varmax data=simul3;
       model y1 y2 = x1 / p=2 xlag=1 ecm=(rank=1)
                          prior=(lambda=0.9 theta=0.1);
    run;
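I(2) Model

The VARX($p$,$s$) model can be written in the error correction form:

\[ \Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Phi^*_i \Delta y_{t-i} + A D_t + \sum_{i=0}^{s} \Theta^*_i x_{t-i} + \epsilon_t \]

Let $\Phi^* = I_k - \sum_{i=1}^{p-1} \Phi^*_i$.

If $\alpha$ and $\beta$ have full rank $r$, and $\mathrm{rank}(\alpha_\perp' \Phi^* \beta_\perp) = k - r$, then $y_t$ is an $I(1)$ process.

If the condition $\mathrm{rank}(\alpha_\perp' \Phi^* \beta_\perp) = k - r$ fails and $\alpha_\perp' \Phi^* \beta_\perp$ has reduced rank $\alpha_\perp' \Phi^* \beta_\perp = \xi\eta'$, where $\xi$ and $\eta$ are $(k - r) \times s$ matrices with $s \le k - r$, then $\alpha_\perp$ and $\beta_\perp$ are defined as $k \times (k - r)$ matrices of full rank such that $\alpha' \alpha_\perp = 0$ and $\beta' \beta_\perp = 0$.

If $\xi$ and $\eta$ have full rank $s$, then the process $y_t$ is $I(2)$, which has the implication of $I(2)$ model for the moving-average representation.

\[ y_t = B_0 + B_1 t + C_2 \sum_{j=1}^{t} \sum_{i=1}^{j} \epsilon_i + C_1 \sum_{i=1}^{t} \epsilon_i + C_0(B) \epsilon_t \]

The matrices $C_1$, $C_2$, and $C_0(B)$ are determined by the cointegration properties of the process, and $B_0$ and $B_1$ are determined by the initial values. For details, see Johansen (1995a).

The implication of the $I(2)$ model for the autoregressive representation is given by

\[ \Delta^2 y_t = \Pi y_{t-1} - \Phi^* \Delta y_{t-1} + \sum_{i=1}^{p-2} \Psi_i \Delta^2 y_{t-i} + A D_t + \sum_{i=0}^{s} \Theta^*_i x_{t-i} + \epsilon_t \]

where $\Psi_i = -\sum_{j=i+1}^{p-1} \Phi^*_j$ and $\Phi^* = I_k - \sum_{i=1}^{p-1} \Phi^*_i$.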
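The rank condition that separates the $I(1)$ case from the $I(2)$ case is easy to evaluate in a small system. The following Python sketch checks $\mathrm{rank}(\alpha_\perp' \Phi^* \beta_\perp) = k - r$ for a bivariate VECM(2) with $r = 1$, where the quantity reduces to a scalar. All numerical values are hypothetical and are not estimates from this chapter.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(a, v):
    return [dot(row, v) for row in a]

# Hypothetical bivariate VECM(2) with cointegration rank r = 1 (k = 2)
alpha = [-0.4, 0.1]
beta = [1.0, -2.0]
alpha_perp = [0.1, 0.4]     # chosen so that alpha' alpha_perp = 0
beta_perp = [2.0, 1.0]      # chosen so that beta' beta_perp = 0
Phi1s = [[-0.8, -0.7], [0.4, -0.6]]   # Phi*_1

# Phi* = I_k - sum_{i=1}^{p-1} Phi*_i, which is I_2 - Phi*_1 when p = 2
Phi_star = [[(1.0 if i == j else 0.0) - Phi1s[i][j] for j in range(2)]
            for i in range(2)]

# With k - r = 1, alpha_perp' Phi* beta_perp is a scalar; it has rank 1
# exactly when it is nonzero.
value = dot(alpha_perp, matvec(Phi_star, beta_perp))
is_i1 = abs(value) > 1e-10

print("alpha_perp' Phi* beta_perp =", value)
print("I(1) rank condition holds:", is_i1)
```

A value of zero would instead indicate reduced rank ($s = 0$ here) and point toward the $I(2)$ analysis.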
Test for I(2)

The $I(2)$ cointegrated model is given by the following parameter restrictions:

\[ H_{r,s}\colon \Pi = \alpha\beta' \quad \text{and} \quad \alpha_\perp' \Phi^* \beta_\perp = \xi\eta' \]

where $\xi$ and $\eta$ are $(k - r) \times s$ matrices with $0 \le s \le k - r$. Let $H_r^0$ represent the $I(1)$ model where $\alpha$ and $\beta$ have full rank $r$, let $H_{r,s}^0$ represent the $I(2)$ model where $\xi$ and $\eta$ have full rank $s$, and let $H_{r,s}$ represent the $I(2)$ model where $\xi$ and $\eta$ have rank $\le s$. The following table shows the relation between the $I(1)$ models and the $I(2)$ models.

Table 30.2 Relation between the I(1) and I(2) Models

                                  I(2)                                          I(1)
r \ k-r-s   k          k-1        ...    1
0           H_{0,0} ⊂  H_{0,1} ⊂  ... ⊂  H_{0,k-1}  ⊂  H_{0,k}     =  H_0^0
1                      H_{1,0} ⊂  ... ⊂  H_{1,k-2}  ⊂  H_{1,k-1}   =  H_1^0
...                                      ...           ...            ...
k-1                                      H_{k-1,0}  ⊂  H_{k-1,1}   =  H_{k-1}^0
Johansen (1995a) proposed the two-step procedure to analyze the $I(2)$ model. In the first step, the values of $(r, \alpha, \beta)$ are estimated using the reduced rank regression analysis, regressing $\Delta^2 y_t$, $\Delta y_{t-1}$, and $y_{t-1}$ on $\Delta^2 y_{t-1}, \ldots, \Delta^2 y_{t-p+2}$, and $D_t$. This gives residuals $R_{0t}$, $R_{1t}$, and $R_{2t}$, and residual product moment matrices

\[ M_{ij} = \frac{1}{T} \sum_{t=1}^{T} R_{it} R_{jt}' \quad \text{for } i, j = 0, 1, 2 \]

Perform the reduced rank regression analysis of $\Delta^2 y_t$ on $y_{t-1}$ corrected for $\Delta y_{t-1}$, $\Delta^2 y_{t-1}, \ldots, \Delta^2 y_{t-p+2}$, and $D_t$, and solve the eigenvalue problem of the equation

\[ |\lambda M_{22.1} - M_{20.1} M_{00.1}^{-1} M_{02.1}| = 0 \]

where $M_{ij.1} = M_{ij} - M_{i1} M_{11}^{-1} M_{1j}$ for $i, j = 0, 2$.

In the second step, if $(r, \alpha, \beta)$ are known, the values of $(s, \xi, \eta)$ are determined using the reduced rank regression analysis, regressing $\hat{\alpha}_\perp' \Delta^2 y_t$ on $\hat{\beta}_\perp' \Delta y_{t-1}$ corrected for $\Delta^2 y_{t-1}, \ldots, \Delta^2 y_{t-p+2}$, $D_t$, and $\hat{\beta}' \Delta y_{t-1}$.

The reduced rank regression analysis reduces to the solution of an eigenvalue problem for the equation

\[ |\rho M_{\beta_\perp \beta_\perp . \beta} - M_{\beta_\perp \alpha_\perp . \beta} M_{\alpha_\perp \alpha_\perp . \beta}^{-1} M_{\alpha_\perp \beta_\perp . \beta}| = 0 \]

where

\[ \begin{aligned}
M_{\beta_\perp \beta_\perp . \beta} &= \beta_\perp' (M_{11} - M_{11} \beta (\beta' M_{11} \beta)^{-1} \beta' M_{11}) \beta_\perp \\
M_{\beta_\perp \alpha_\perp . \beta}' = M_{\alpha_\perp \beta_\perp . \beta} &= \bar{\alpha}_\perp' (M_{01} - M_{01} \beta (\beta' M_{11} \beta)^{-1} \beta' M_{11}) \beta_\perp \\
M_{\alpha_\perp \alpha_\perp . \beta} &= \bar{\alpha}_\perp' (M_{00} - M_{01} \beta (\beta' M_{11} \beta)^{-1} \beta' M_{10}) \bar{\alpha}_\perp
\end{aligned} \]

where $\bar{\alpha} = \alpha(\alpha'\alpha)^{-1}$.

The solution gives eigenvalues $1 > \rho_1 > \cdots > \rho_s > 0$ and eigenvectors $(v_1, \ldots, v_s)$. Then, the ML estimators are

\[ \hat{\eta} = (v_1, \ldots, v_s) \]

\[ \hat{\xi} = M_{\alpha_\perp \beta_\perp . \beta}\, \hat{\eta} \]

The likelihood ratio test for the reduced rank model $H_{r,s}$ with rank $\le s$ in the model $H_{r,k-r} = H_r^0$ is given by

\[ Q_{r,s} = -T \sum_{i=s+1}^{k-r} \log(1 - \rho_i), \quad s = 0, \ldots, k - r - 1 \]

The following statements compute the rank test to test for cointegrated order 2:

    proc varmax data=simul2;
       model y1 y2 / p=2 cointtest=(johansen=(iorder=2));
    run;
The last two columns in Figure 30.60 explain the cointegration rank test with integrated order 1. The results indicate that there is a cointegrated relationship with cointegration rank 1 at the 0.05 significance level because the test statistic of 0.5552 is smaller than the critical value of 3.84. Now, look at the row associated with $r = 1$. Compare the test statistic value, 211.84512, to the critical value, 3.84, for the cointegrated order 2. There is no evidence that the series are integrated of order 2 at the 0.05 significance level.

Figure 30.60 Cointegrated I(2) Test (IORDER= Option)

The VARMAX Procedure
Cointegration Rank Test for I(2)

r\k-r-s                2            1   Trace of I(1)   5% CV of I(1)
0              720.40735    308.69199         61.7522           15.34
1                           211.84512          0.5552            3.84
5% CV I(2)      15.34000      3.84000
Multivariate GARCH Modeling

Stochastic volatility modeling is important in many areas, particularly in finance. To study the volatility of time series, GARCH models are widely used because they provide a good approach to conditional variance modeling.
BEKK Representation

Engle and Kroner (1995) propose a general multivariate GARCH model and call it a BEKK representation. Let $F(t-1)$ be the sigma field generated by the past values of $\epsilon_t$, and let $H_t$ be the conditional covariance matrix of the $k$-dimensional random vector $\epsilon_t$. Let $H_t$ be measurable with respect to $F(t-1)$; then the multivariate GARCH model can be written as

\[ \epsilon_t | F(t-1) \sim N(0, H_t) \]

\[ H_t = C + \sum_{i=1}^{q} A_i' \epsilon_{t-i} \epsilon_{t-i}' A_i + \sum_{i=1}^{p} G_i' H_{t-i} G_i \]

where $C$, $A_i$, and $G_i$ are $k \times k$ parameter matrices.
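Consider a bivariate GARCH(1,1) model as follows:

\[ H_t = \begin{bmatrix} c_{11} & c_{12} \\ c_{12} & c_{22} \end{bmatrix} + \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}' \begin{bmatrix} \epsilon_{1,t-1}^2 & \epsilon_{1,t-1}\epsilon_{2,t-1} \\ \epsilon_{2,t-1}\epsilon_{1,t-1} & \epsilon_{2,t-1}^2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix}' H_{t-1} \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} \]

or, representing the univariate model,

\[ \begin{aligned}
h_{11,t} &= c_{11} + a_{11}^2 \epsilon_{1,t-1}^2 + 2 a_{11} a_{21} \epsilon_{1,t-1} \epsilon_{2,t-1} + a_{21}^2 \epsilon_{2,t-1}^2 \\
&\quad + g_{11}^2 h_{11,t-1} + 2 g_{11} g_{21} h_{12,t-1} + g_{21}^2 h_{22,t-1} \\
h_{12,t} &= c_{12} + a_{11} a_{12} \epsilon_{1,t-1}^2 + (a_{21} a_{12} + a_{11} a_{22}) \epsilon_{1,t-1} \epsilon_{2,t-1} + a_{21} a_{22} \epsilon_{2,t-1}^2 \\
&\quad + g_{11} g_{12} h_{11,t-1} + (g_{21} g_{12} + g_{11} g_{22}) h_{12,t-1} + g_{21} g_{22} h_{22,t-1} \\
h_{22,t} &= c_{22} + a_{12}^2 \epsilon_{1,t-1}^2 + 2 a_{12} a_{22} \epsilon_{1,t-1} \epsilon_{2,t-1} + a_{22}^2 \epsilon_{2,t-1}^2 \\
&\quad + g_{12}^2 h_{11,t-1} + 2 g_{12} g_{22} h_{12,t-1} + g_{22}^2 h_{22,t-1}
\end{aligned} \]

For the BEKK representation of the bivariate GARCH(1,1) model, the SAS statements are:

    model y1 y2;
    garch q=1 p=1 form=bekk;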
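The element-by-element expansion of the bivariate BEKK recursion can be verified against the matrix form. The following Python sketch evaluates $H_t = C + A'\epsilon_{t-1}\epsilon_{t-1}'A + G'H_{t-1}G$ with plain lists and compares each element with the corresponding scalar formula. The parameter values are hypothetical and serve only to exercise the algebra; the function names are illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def bekk_step(C, A, G, eps_prev, H_prev):
    """H_t = C + A' eps eps' A + G' H_{t-1} G for a BEKK GARCH(1,1) model."""
    outer = [[ei * ej for ej in eps_prev] for ei in eps_prev]
    arch = matmul(matmul(transpose(A), outer), A)
    garch = matmul(matmul(transpose(G), H_prev), G)
    k = len(C)
    return [[C[i][j] + arch[i][j] + garch[i][j] for j in range(k)]
            for i in range(k)]

# Hypothetical values, chosen only to exercise the formulas
C = [[1.0, 0.2], [0.2, 0.5]]
A = [[0.3, 0.1], [0.05, 0.4]]
G = [[0.8, 0.02], [0.03, 0.7]]
eps = [0.5, -1.2]
H_prev = [[2.0, 0.3], [0.3, 1.0]]

H = bekk_step(C, A, G, eps, H_prev)

# Scalar expansions of h11, h12, and h22
(a11, a12), (a21, a22) = A
(g11, g12), (g21, g22) = G
e1, e2 = eps
p11, p12, p22 = H_prev[0][0], H_prev[0][1], H_prev[1][1]
h11 = (C[0][0] + a11**2 * e1**2 + 2*a11*a21*e1*e2 + a21**2 * e2**2
       + g11**2 * p11 + 2*g11*g21*p12 + g21**2 * p22)
h12 = (C[0][1] + a11*a12*e1**2 + (a21*a12 + a11*a22)*e1*e2 + a21*a22*e2**2
       + g11*g12*p11 + (g21*g12 + g11*g22)*p12 + g21*g22*p22)
h22 = (C[1][1] + a12**2 * e1**2 + 2*a12*a22*e1*e2 + a22**2 * e2**2
       + g12**2 * p11 + 2*g12*g22*p12 + g22**2 * p22)

print("matrix form :", H)
print("scalar form :", [[h11, h12], [h12, h22]])
```

Because the ARCH and GARCH terms are quadratic forms, $H_t$ stays symmetric whenever $C$ and $H_{t-1}$ are symmetric.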
CCC Representation

Bollerslev (1990) proposes a multivariate GARCH model with time-varying conditional variances and covariances but constant conditional correlations.

The conditional covariance matrix $H_t$ consists of

\[ H_t = D_t \Gamma D_t \]

where $D_t$ is a $k \times k$ stochastic diagonal matrix with element $\sigma_{i,t}$ and $\Gamma$ is a $k \times k$ time-invariant matrix with the typical element $\rho_{ij}$.

The elements of $H_t$ are

\[ \begin{aligned}
h_{ii,t} &= c_i + \sum_{l=1}^{q} a_{ii,l}\, \epsilon_{i,t-l}^2 + \sum_{l=1}^{p} g_{ii,l}\, h_{ii,t-l} \\
h_{ij,t} &= \rho_{ij} (h_{ii,t}\, h_{jj,t})^{1/2} \quad i \ne j
\end{aligned} \qquad i, j = 1, \ldots, k \]
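The CCC structure can be illustrated by computing the univariate variances first and then scaling a fixed correlation matrix. The following Python sketch covers the $p = q = 1$ case. It assumes that the diagonal element of $D_t$ is $\sigma_{i,t} = h_{ii,t}^{1/2}$, which is consistent with the off-diagonal formula for $h_{ij,t}$; the parameter values and function names are hypothetical.

```python
import math

def garch11_variance(c, a, g, eps_prev, h_prev):
    """h_ii,t = c_i + a_ii * eps_{i,t-1}^2 + g_ii * h_ii,t-1 (p = q = 1)."""
    return c + a * eps_prev ** 2 + g * h_prev

def ccc_covariance(h_diag, gamma):
    """H_t = D_t Gamma D_t with D_t = diag(sqrt(h_ii,t)) (assumed scaling)."""
    d = [math.sqrt(h) for h in h_diag]
    k = len(d)
    return [[d[i] * gamma[i][j] * d[j] for j in range(k)] for i in range(k)]

# Hypothetical parameters and lagged values
h11 = garch11_variance(c=0.10, a=0.2, g=0.7, eps_prev=0.5, h_prev=1.0)
h22 = garch11_variance(c=0.05, a=0.1, g=0.8, eps_prev=-1.0, h_prev=2.0)
gamma = [[1.0, 0.3], [0.3, 1.0]]    # constant conditional correlation matrix

H = ccc_covariance([h11, h22], gamma)
print("h11 =", h11, "h22 =", h22)
print("H_t =", H)
```

Only the variances evolve over time in this representation; the covariance follows from them through the constant correlation.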
Estimation of GARCH Model

The log-likelihood function of the multivariate GARCH model is written without a constant term

\[ \ell = -\frac{1}{2} \sum_{t=1}^{T} [\log|H_t| + \epsilon_t' H_t^{-1} \epsilon_t] \]

The log-likelihood function is maximized by an iterative numerical method such as quasi-Newton optimization. The starting values for the regression parameters are obtained from the least squares estimates. The covariance of $\epsilon_t$ is used as the starting values for the GARCH constant parameters, and the starting value used for the other GARCH parameters is either $10^{-6}$ or $10^{-3}$ depending on the GARCH model's representation. For the identification of the parameters of a BEKK representation GARCH model, the diagonal elements of the GARCH constant, the ARCH, and the GARCH parameters are restricted to be positive.
Covariance Stationarity

Define the multivariate GARCH process as

\[ h_t = \sum_{i=1}^{\infty} G(B)^{i-1} [c + A(B)\eta_t] \]

where $h_t = \mathrm{vec}(H_t)$, $c = \mathrm{vec}(C_0)$, and $\eta_t = \mathrm{vec}(\epsilon_t \epsilon_t')$. This representation is equivalent to a GARCH($p, q$) model by the following algebra:

\[ \begin{aligned}
h_t &= c + A(B)\eta_t + \sum_{i=2}^{\infty} G(B)^{i-1} [c + A(B)\eta_t] \\
&= c + A(B)\eta_t + G(B) \sum_{i=1}^{\infty} G(B)^{i-1} [c + A(B)\eta_t] \\
&= c + A(B)\eta_t + G(B) h_t
\end{aligned} \]

Defining $A(B) = \sum_{i=1}^{q} (A_i \otimes A_i)' B^i$ and $G(B) = \sum_{i=1}^{p} (G_i \otimes G_i)' B^i$ gives a BEKK representation.

The necessary and sufficient condition for covariance stationarity of the multivariate GARCH process is that all the eigenvalues of $A(1) + G(1)$ are less than one in modulus.
An Example of a VAR(1)–ARCH(1) Model

The following DATA step simulates a bivariate vector time series to provide test data for the multivariate GARCH model:

    data garch;
       retain seed 16587;
       esq1 = 0; esq2 = 0;
       ly1 = 0; ly2 = 0;
       do i = 1 to 1000;
          ht = 6.25 + 0.5*esq1;
          call rannor(seed,ehat);
          e1 = sqrt(ht)*ehat;
          ht = 1.25 + 0.7*esq2;
          call rannor(seed,ehat);
          e2 = sqrt(ht)*ehat;
          y1 = 2 + 1.2*ly1 - 0.5*ly2 + e1;
          y2 = 4 + 0.6*ly1 + 0.3*ly2 + e2;
          if i>500 then output;
          esq1 = e1*e1; esq2 = e2*e2;
          ly1 = y1; ly2 = y2;
       end;
       keep y1 y2;
    run;

The following statements fit a VAR(1)–ARCH(1) model to the data. For a VAR-ARCH model, you specify the order of the autoregressive model with the P=1 option in the MODEL statement and the Q=1 option in the GARCH statement. In order to produce the initial and final values of parameters, the TECH=QN option is specified in the NLOPTIONS statement.

    proc varmax data=garch;
       model y1 y2 / p=1
                     print=(roots estimates diagnose);
       garch q=1;
       nloptions tech=qn;
    run;
Figure 30.61 through Figure 30.65 show the details of this example. Figure 30.61 shows the initial values of parameters.

Figure 30.61 Start Parameter Estimates for the VAR(1)–ARCH(1) Model

The VARMAX Procedure
Optimization Start Parameter Estimates

 N   Parameter     Estimate   Gradient Objective Function
 1   CONST1        2.249575                      5.787988
 2   CONST2        3.902673                     -4.856056
 3   AR1_1_1       1.231775                    -17.155796
 4   AR1_2_1       0.576890                     23.991176
 5   AR1_1_2      -0.528405                     14.656979
 6   AR1_2_2       0.343714                    -12.763695
 7   GCHC1_1       9.929763                     -0.111361
 8   GCHC1_2       0.193163                     -0.684986
 9   GCHC2_2       4.063245                      0.139403
10   ACH1_1_1      0.001000                     -0.668058
11   ACH1_2_1             0                     -0.068657
12   ACH1_1_2             0                     -0.735896
13   ACH1_2_2      0.001000                     -3.126628

Figure 30.62 shows the final parameter estimates.

Figure 30.62 Results of Parameter Estimates for the VAR(1)–ARCH(1) Model

The VARMAX Procedure
Optimization Results Parameter Estimates

 N   Parameter     Estimate
 1   CONST1        1.943991
 2   CONST2        4.073898
 3   AR1_1_1       1.220945
 4   AR1_2_1       0.608263
 5   AR1_1_2      -0.527121
 6   AR1_2_2       0.303012
 7   GCHC1_1       8.359045
 8   GCHC1_2      -0.182483
 9   GCHC2_2       1.602739
10   ACH1_1_1      0.377569
11   ACH1_2_1      0.032158
12   ACH1_1_2      0.056491
13   ACH1_2_2      0.710023
Figure 30.63 shows the conditional variance using the BEKK representation of the ARCH(1) model. The ARCH parameters are estimated by the vectorized parameter matrices.

\[ \epsilon_t | F(t-1) \sim N(0, H_t) \]

\[ H_t = \begin{bmatrix} 8.35905 & -0.18248 \\ -0.18248 & 1.60274 \end{bmatrix} + \begin{bmatrix} 0.37757 & 0.05649 \\ 0.03216 & 0.71002 \end{bmatrix}' \epsilon_{t-1} \epsilon_{t-1}' \begin{bmatrix} 0.37757 & 0.05649 \\ 0.03216 & 0.71002 \end{bmatrix} \]

Figure 30.63 ARCH(1) Parameter Estimates for the VAR(1)–ARCH(1) Model

The VARMAX Procedure

Type of Model         VAR(1)-ARCH(1)
Estimation Method     Maximum Likelihood Estimation
Representation Type   BEKK

GARCH Model Parameter Estimates

Parameter    Estimate   Standard Error   t Value   Pr > |t|
GCHC1_1       8.35905          0.73116     11.43     0.0001
GCHC1_2      -0.18248          0.21706     -0.84     0.4009
GCHC2_2       1.60274          0.19398      8.26     0.0001
ACH1_1_1      0.37757          0.07470      5.05     0.0001
ACH1_2_1      0.03216          0.06971      0.46     0.6448
ACH1_1_2      0.05649          0.02622      2.15     0.0317
ACH1_2_2      0.71002          0.06844     10.37     0.0001
Figure 30.64 shows the AR parameter estimates and their significance. The fitted VAR(1) model with the previous conditional covariance ARCH model is written as follows:

\[ y_t = \begin{bmatrix} 1.94399 \\ 4.07390 \end{bmatrix} + \begin{bmatrix} 1.22095 & -0.52712 \\ 0.60826 & 0.30301 \end{bmatrix} y_{t-1} + \epsilon_t \]

Figure 30.64 VAR(1) Parameter Estimates for the VAR(1)–ARCH(1) Model

Model Parameter Estimates

Equation   Parameter    Estimate   Standard Error   t Value   Pr > |t|   Variable
y1         CONST1        1.94399          0.21017      9.25     0.0001   1
           AR1_1_1       1.22095          0.02564     47.63     0.0001   y1(t-1)
           AR1_1_2      -0.52712          0.02836    -18.59     0.0001   y2(t-1)
y2         CONST2        4.07390          0.10574     38.53     0.0001   1
           AR1_2_1       0.60826          0.01231     49.42     0.0001   y1(t-1)
           AR1_2_2       0.30301          0.01498     20.23     0.0001   y2(t-1)
Figure 30.65 shows the roots of the AR and ARCH characteristic polynomials. The eigenvalues have a modulus less than one.

Figure 30.65 Roots for the VAR(1)–ARCH(1) Model

Roots of AR Characteristic Polynomial

Index      Real   Imaginary   Modulus    Radian     Degree
    1   0.76198     0.33163    0.8310    0.4105    23.5197
    2   0.76198    -0.33163    0.8310   -0.4105   -23.5197

Roots of GARCH Characteristic Polynomial

Index      Real   Imaginary   Modulus    Radian     Degree
    1   0.51180     0.00000    0.5118    0.0000     0.0000
    2   0.26627     0.00000    0.2663    0.0000     0.0000
    3   0.26627     0.00000    0.2663    0.0000     0.0000
    4   0.13853     0.00000    0.1385    0.0000     0.0000
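As a rough cross-check of the AR roots in Figure 30.65, the eigenvalues of the fitted AR(1) coefficient matrix can be computed directly. The following sketch assumes that SAS/IML is available; it types in the rounded estimates from Figure 30.64, and the matrix names (phi, ev, modulus) are illustrative and not part of PROC VARMAX.

```sas
proc iml;
   /* AR(1) coefficient matrix, rounded estimates from Figure 30.64 */
   phi = {1.22095 -0.52712,
          0.60826  0.30301};

   /* For a nonsymmetric matrix with complex eigenvalues, EIGVAL returns */
   /* the real parts in column 1 and the imaginary parts in column 2     */
   ev = eigval(phi);

   /* Modulus of each eigenvalue */
   modulus = sqrt(ev[,1]##2 + ev[,2]##2);
   print ev modulus;
quit;
```

Because the inputs are rounded, the printed values should agree with the Real, Imaginary, and Modulus columns of the AR roots table only to about four decimal places.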
Output Data Sets

The VARMAX procedure can create the OUT=, OUTEST=, OUTHT=, and OUTSTAT= data sets. In general, if processing fails, the output is not recorded or is set to missing in the relevant output data set, and appropriate error and/or warning messages are recorded in the log.
OUT= Data Set

The OUT= data set contains the forecast values produced by the OUTPUT statement. The following output variables can be created:

• the BY variables
• the ID variable
• the MODEL statement dependent (endogenous) variables. These variables contain the actual values from the input data set.
• FORi, numeric variables that contain the forecasts. The FORi variables contain the forecasts for the ith endogenous variable in the MODEL statement list. Forecasts are one-step-ahead predictions until the end of the data or until the observation specified by the BACK= option. Multistep forecasts can be computed after that point based on the LEAD= option.
• RESi, numeric variables that contain the residual for the forecast of the ith endogenous variable in the MODEL statement list. For multistep forecast observations, the actual values are missing and the RESi variables contain missing values.
• STDi, numeric variables that contain the standard deviation for the forecast of the ith endogenous variable in the MODEL statement list. The values of the STDi variables can be used to construct univariate confidence limits for the corresponding forecasts.
• LCIi, numeric variables that contain the lower confidence limits for the corresponding forecasts of the ith endogenous variable in the MODEL statement list.
• UCIi, numeric variables that contain the upper confidence limits for the corresponding forecasts of the ith endogenous variable in the MODEL statement list.

The OUT= data set contains the values shown in Table 30.3 and Table 30.4 for a bivariate case.

Table 30.3 OUT= Data Set

Obs   ID variable   y1    FOR1   RES1   STD1   LCI1   UCI1
1     date          y11   f11    r11    σ11    l11    u11
2     date          y12   f12    r12    σ11    l12    u12
...

Table 30.4 OUT= Data Set Continued

Obs   y2    FOR2   RES2   STD2   LCI2   UCI2
1     y21   f21    r21    σ22    l21    u21
2     y22   f22    r22    σ22    l22    u22
...
Consider the following example:

   proc varmax data=simul1 noprint;
      id date interval=year;
      model y1 y2 / p=1 noint;
      output out=out lead=5;
   run;

   proc print data=out(firstobs=98);
   run;
The output in Figure 30.66 shows part of the results of the OUT= data set for the preceding example.

Figure 30.66 OUT= Data Set

Obs   date         y1       FOR1       RES1      STD1       LCI1       UCI1
 98   1997   -0.58433   -0.13500   -0.44934   1.13523   -2.36001    2.09002
 99   1998   -2.07170   -1.00649   -1.06522   1.13523   -3.23150    1.21853
100   1999   -3.38342   -2.58612   -0.79730   1.13523   -4.81113   -0.36111
101   2000          .   -3.59212          .   1.13523   -5.81713   -1.36711
102   2001          .   -3.09448          .   1.70915   -6.44435    0.25539
103   2002          .   -2.17433          .   2.14472   -6.37792    2.02925
104   2003          .   -1.11395          .   2.43166   -5.87992    3.65203
105   2004          .   -0.14342          .   2.58740   -5.21463    4.92779

Obs         y2       FOR2      RES2      STD2       LCI2      UCI2
 98    0.64397   -0.34932   0.99329   1.19096   -2.68357   1.98492
 99    0.35925   -0.07132   0.43057   1.19096   -2.40557   2.26292
100   -0.64999   -0.99354   0.34355   1.19096   -3.32779   1.34070
101          .   -2.09873         .   1.19096   -4.43298   0.23551
102          .   -2.77050         .   1.47666   -5.66469   0.12369
103          .   -2.75724         .   1.74212   -6.17173   0.65725
104          .   -2.24943         .   2.01925   -6.20709   1.70823
105          .   -1.47460         .   2.25169   -5.88782   2.93863
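The relationship between the FORi, STDi, LCIi, and UCIi variables can be checked with a short DATA step. This is a minimal sketch that assumes the confidence limits are normal-theory limits at the 95% level (that is, the OUTPUT statement was run without changing the significance level); the data set name out_check and the *_check variables are illustrative.

```sas
data out_check;
   set out;
   /* two-sided 95% normal quantile, about 1.96 (assumed level) */
   z = probit(0.975);
   lci1_check = for1 - z*std1;
   uci1_check = for1 + z*std1;
   lci2_check = for2 - z*std2;
   uci2_check = for2 + z*std2;
run;

proc print data=out_check(firstobs=98);
   var date lci1 lci1_check uci1 uci1_check;
run;
```

For observation 98 in Figure 30.66, this gives -0.13500 - 1.96 × 1.13523, or approximately -2.360, which matches the LCI1 value up to rounding.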
OUTEST= Data Set

The OUTEST= data set contains estimation results of the fitted model produced by the VARMAX statement. The following output variables can be created:

• the BY variables
• NAME, a character variable that contains the name of endogenous (dependent) variables or the name of the parameters for the covariance of the matrix of the parameter estimates if the OUTCOV option is specified
• TYPE, a character variable that contains the value EST for parameter estimates, the value STD for standard error of parameter estimates, and the value COV for the covariance of the matrix of the parameter estimates if the OUTCOV option is specified
• CONST, a numeric variable that contains the estimates of constant parameters and their standard errors
• SEASON_i, a numeric variable that contains the estimates of seasonal dummy parameters and their standard errors, where i = 1, ..., (nseason - 1), and nseason is based on the NSEASON= option
• LTREND, a numeric variable that contains the estimates of linear trend parameters and their standard errors
• QTREND, a numeric variable that contains the estimates of quadratic trend parameters and their standard errors
• XLl_i, numeric variables that contain the estimates of exogenous parameters and their standard errors, where l is the lag of the coefficient matrix and i = 1, ..., r, where r is the number of exogenous variables
• ARl_i, numeric variables that contain the estimates of autoregressive parameters and their standard errors, where l is the lag of the coefficient matrix and i = 1, ..., k, where k is the number of endogenous variables
• MAl_i, numeric variables that contain the estimates of moving-average parameters and their standard errors, where l is the lag of the coefficient matrix and i = 1, ..., k, where k is the number of endogenous variables
• ACHl_i, numeric variables that contain the estimates of the ARCH parameters of the covariance matrix and their standard errors, where l is the lag of the coefficient matrix and i = 1, ..., k for BEKK and CCC representations, where k is the number of endogenous variables
• GCHl_i, numeric variables that contain the estimates of the GARCH parameters of the covariance matrix and their standard errors, where l is the lag of the coefficient matrix and i = 1, ..., k for BEKK and CCC representations, where k is the number of endogenous variables
• GCHC_i, numeric variables that contain the estimates of the constant parameters of the covariance matrix and their standard errors, where i = 1, ..., k for BEKK representation, k is the number of endogenous variables, and i = 1 for CCC representation
• CCC_i, numeric variables that contain the estimates of the conditional constant correlation parameters for CCC representation, where i = 2, ..., k

The OUTEST= data set contains the values shown in Table 30.5 for a bivariate case.

Table 30.5 OUTEST= Data Set

Obs   NAME   TYPE   CONST     AR1_1        AR1_2        AR2_1        AR2_2
1     y1     EST    δ1        φ1,11        φ1,12        φ2,11        φ2,12
2            STD    se(δ1)    se(φ1,11)    se(φ1,12)    se(φ2,11)    se(φ2,12)
3     y2     EST    δ2        φ1,21        φ1,22        φ2,21        φ2,22
4            STD    se(δ2)    se(φ1,21)    se(φ1,22)    se(φ2,21)    se(φ2,22)
Consider the following example:

   proc varmax data=simul2 outest=est;
      model y1 y2 / p=2 noint
                    ecm=(rank=1 normalize=y1)
                    noprint;
   run;

   proc print data=est;
   run;
The output in Figure 30.67 shows the results of the OUTEST= data set.

Figure 30.67 OUTEST= Data Set

Obs   NAME   TYPE      AR1_1      AR1_2      AR2_1      AR2_2
1     y1     EST    -0.46680    0.91295   -0.74332   -0.74621
2            STD     0.04786    0.09359    0.04526    0.04769
3     y2     EST     0.10667   -0.20862    0.40493   -0.57157
4            STD     0.05146    0.10064    0.04867    0.05128
OUTHT= Data Set

The OUTHT= data set contains predictions of the fitted GARCH model produced by the GARCH statement. The following output variables can be created:

• the BY variables
• Hi_j, numeric variables that contain the prediction of covariance, where 1 ≤ i ≤ j ≤ k, where k is the number of dependent variables

The OUTHT= data set contains the values shown in Table 30.6 for a bivariate case.

Table 30.6 OUTHT= Data Set

Obs   H1_1   H1_2   H2_2
1     h111   h121   h221
2     h112   h122   h222
...
Consider the following example of the OUTHT= option:

   proc varmax data=garch;
      model y1 y2 / p=1 print=(roots estimates diagnose);
      garch q=1 outht=ht;
   run;

   proc print data=ht(firstobs=495);
   run;
The output in Figure 30.68 shows part of the OUTHT= data set.

Figure 30.68 OUTHT= Data Set

Obs      h1_1       h1_2      h2_2
495   9.36568   -1.10406   2.44644
496   8.46807   -0.17464   1.60330
497   9.19686    0.09762   1.69639
498   8.40787   -0.33463   2.07687
499   8.88429    0.03646   1.69401
500   8.60844   -0.40260   1.79703
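One common use of the predicted covariances is to derive the implied conditional correlation between the two series. The following sketch is not produced by PROC VARMAX itself; it assumes the ht data set from the preceding example, and the names ht_corr and rho12 are illustrative.

```sas
data ht_corr;
   set ht;
   /* implied conditional correlation: covariance divided by the */
   /* product of the two conditional standard deviations         */
   rho12 = h1_2 / sqrt(h1_1 * h2_2);
run;

proc print data=ht_corr(firstobs=495);
run;
```

For observation 495 in Figure 30.68, this gives -1.10406 / sqrt(9.36568 × 2.44644), or about -0.23.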
OUTSTAT= Data Set

The OUTSTAT= data set contains estimation results of the fitted model produced by the VARMAX statement. The following output variables can be created. The subindex i is 1, ..., k, where k is the number of endogenous variables.

• the BY variables
• NAME, a character variable that contains the name of endogenous (dependent) variables
• SIGMA_i, numeric variables that contain the estimate of the innovation covariance matrix
• AICC, a numeric variable that contains the corrected Akaike's information criterion value
• HQC, a numeric variable that contains the Hannan-Quinn's information criterion value
• AIC, a numeric variable that contains the Akaike's information criterion value
• SBC, a numeric variable that contains the Schwarz Bayesian's information criterion value
• FPEC, a numeric variable that contains the final prediction error criterion value
• FValue, a numeric variable that contains the F statistics
• PValue, a numeric variable that contains the p-value for the F statistics

If the JOHANSEN= option is specified, the following items are added:

• Eigenvalue, a numeric variable that contains eigenvalues for the cointegration rank test of integrated order 1
• RestrictedEigenvalue, a numeric variable that contains eigenvalues for the cointegration rank test of integrated order 1 when the NOINT option is not specified
• Beta_i, numeric variables that contain long-run effect parameter estimates, β
• Alpha_i, numeric variables that contain adjustment parameter estimates, α

If the JOHANSEN=(IORDER=2) option is specified, the following items are added:

• EValueI2_i, numeric variables that contain eigenvalues for the cointegration rank test of integrated order 2
• EValueI1, a numeric variable that contains eigenvalues for the cointegration rank test of integrated order 1
• Eta_i, numeric variables that contain the parameter estimates in integrated order 2, η
• Xi_i, numeric variables that contain the parameter estimates in integrated order 2, ξ

The OUTSTAT= data set contains the values shown in Table 30.7 for a bivariate case.

Table 30.7 OUTSTAT= Data Set

Obs   NAME   SIGMA_1   SIGMA_2   AICC   RSquare   FValue   PValue
1     y1     σ11       σ12       aicc   R²1       F1       prob1
2     y2     σ21       σ22       .      R²2       F2       prob2

Obs   EValueI2_1   EValueI2_2   EValueI1   Beta_1   Beta_2
1     e11          e12          e1         β11      β12
2     e21          .            e2         β21      β22

Obs   Alpha_1   Alpha_2   Eta_1   Eta_2   Xi_1   Xi_2
1     α11       α12       η11     η12     ξ11    ξ12
2     α21       α22       η21     η22     ξ21    ξ22
Consider the following example:

   proc varmax data=simul2 outstat=stat;
      model y1 y2 / p=2 noint
                    cointtest=(johansen=(iorder=2))
                    ecm=(rank=1 normalize=y1)
                    noprint;
   run;

   proc print data=stat;
   run;
The output in Figure 30.69 shows the results of the OUTSTAT= data set.
Figure 30.69 OUTSTAT= Data Set

Obs   NAME   SIGMA_1   SIGMA_2      AICC       HQC       AIC       SBC       FPEC
1     y1     94.7557     4.527   9.37221   9.43236   9.36834   9.52661   11712.14
2     y2      4.5268   109.570         .         .         .         .          .

Obs   RSquare    FValue       PValue   EValueI2_1   EValueI2_2   EValueI1     Beta_1     Beta_2
1     0.93905   482.782   5.9027E-57      0.98486      0.95079    0.50864    1.00000    1.00000
2     0.94085   498.423   1.4445E-57      0.81451            .    0.01108   -1.95575   -1.33622

Obs    Alpha_1    Alpha_2       Eta_1      Eta_2       Xi_1       Xi_2
1     -0.46680   0.007937   -0.012307   0.027030    54.1606   -52.3144
2      0.10667   0.033530    0.015555   0.023086   -79.4240   -18.3308
Printed Output

The default printed output produced by the VARMAX procedure is described in the following list:

• descriptive statistics, which include the number of observations used, the names of the variables, their means and standard deviations (STD), their minimums and maximums, the differencing operations used, and the labels of the variables
• a type of model to fit the data and an estimation method
• a table of parameter estimates that shows the following for each parameter: the variable name for the left-hand side of equation, the parameter name, the parameter estimate, the approximate standard error, t value, the approximate probability (Pr > |t|), and the variable name for the right-hand side of equations in terms of each parameter
• the innovation covariance matrix
• the information criteria

If PRINT=ESTIMATES is specified, the VARMAX procedure prints the following list with the default printed output:

• the estimates of the constant vector (or seasonal constant matrix), the trend vector, the coefficient matrices of the distributed lags, the AR coefficient matrices, and the MA coefficient matrices
• the ALPHA and BETA parameter estimates for the error correction model
• the schematic representation of parameter estimates

If PRINT=DIAGNOSE is specified, the VARMAX procedure prints the following list with the default printed output:

• the cross-covariance and cross-correlation matrices of the residuals
• the tables of test statistics for the hypothesis that the residuals of the model are white noise:
   – Durbin-Watson (DW) statistics
   – F test for autoregressive conditional heteroscedastic (ARCH) disturbances
   – F test for AR disturbance
   – Jarque-Bera normality test
   – Portmanteau test
ODS Table Names

The VARMAX procedure assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table:

Table 30.8 ODS Tables Produced in the VARMAX Procedure

ODS Table Name                  Description                                                     Option

ODS Tables Created by the MODEL Statement
AccumImpulse                    Accumulated impulse response matrices                           IMPULSE=(ACCUM), IMPULSE=(ALL)
AccumImpulsebyVar               Accumulated impulse response by variable                        IMPULSE=(ACCUM), IMPULSE=(ALL)
AccumImpulseX                   Accumulated transfer function matrices                          IMPULSX=(ACCUM), IMPULSX=(ALL)
AccumImpulseXbyVar              Accumulated transfer function by variable                       IMPULSX=(ACCUM), IMPULSX=(ALL)
Alpha                           α coefficients                                                  JOHANSEN=
AlphaInECM                      α coefficients when rank=r                                      ECM=
AlphaOnDrift                    α coefficients under the restriction of a deterministic term    JOHANSEN=
AlphaBetaInECM                  Π = αβ′ coefficients when rank=r                                ECM=
ANOVA                           Univariate model diagnostic checks for the residuals            PRINT=DIAGNOSE
ARCoef                          AR coefficients                                                 P=
ARRoots                         Roots of AR characteristic polynomial                           ROOTS with P=
Beta                            β coefficients                                                  JOHANSEN=
BetaInECM                       β coefficients when rank=r                                      ECM=
BetaOnDrift                     β coefficients under the restriction of a deterministic term    JOHANSEN=
Constant                        Constant estimates                                              without NOINT
CorrB                           Correlations of parameter estimates                             CORRB
CorrResiduals                   Correlations of residuals                                       PRINT=DIAGNOSE
CorrResidualsbyVar              Correlations of residuals by variable                           PRINT=DIAGNOSE
CorrResidualsGraph              Schematic representation of correlations of residuals           PRINT=DIAGNOSE
CorrXGraph                      Schematic representation of sample correlations of              CORRX
                                independent series
CorrYGraph                      Schematic representation of sample correlations of              CORRY
                                dependent series
CorrXLags                       Correlations of independent series                              CORRX
CorrXbyVar                      Correlations of independent series by variable                  CORRX
CorrYLags                       Correlations of dependent series                                CORRY
CorrYbyVar                      Correlations of dependent series by variable                    CORRY
CovB                            Covariances of parameter estimates                              COVB
CovInnovation                   Covariances of the innovations                                  default
CovPredictError                 Covariance matrices of the prediction error                     COVPE
CovPredictErrorbyVar            Covariances of the prediction error by variable                 COVPE
CovResiduals                    Covariances of residuals                                        PRINT=DIAGNOSE
CovResidualsbyVar               Covariances of residuals by variable                            PRINT=DIAGNOSE
CovXLags                        Covariances of independent series                               COVX
CovXbyVar                       Covariances of independent series by variable                   COVX
CovYLags                        Covariances of dependent series                                 COVY
CovYbyVar                       Covariances of dependent series by variable                     COVY
DecomposeCovPredictError        Decomposition of the prediction error covariances               DECOMPOSE
DecomposeCovPredictErrorbyVar   Decomposition of the prediction error covariances by variable   DECOMPOSE
DFTest                          Dickey-Fuller test                                              DFTEST
DiagnostAR                      Test the AR disturbance for the residuals                       PRINT=DIAGNOSE
DiagnostWN                      Test the ARCH disturbance and normality for the residuals       PRINT=DIAGNOSE
DynamicARCoef                   AR coefficients of the dynamic model                            DYNAMIC
DynamicConstant                 Constant estimates of the dynamic model                         DYNAMIC
DynamicCovInnovation            Covariances of the innovations of the dynamic model             DYNAMIC
DynamicLinearTrend              Linear trend estimates of the dynamic model                     DYNAMIC
DynamicMACoef                   MA coefficients of the dynamic model                            DYNAMIC
DynamicSConstant                Seasonal constant estimates of the dynamic model                DYNAMIC
DynamicParameterEstimates       Parameter estimates table of the dynamic model                  DYNAMIC
DynamicParameterGraph           Schematic representation of the parameters of the               DYNAMIC
                                dynamic model
DynamicQuadTrend                Quadratic trend estimates of the dynamic model                  DYNAMIC
DynamicSeasonGraph              Schematic representation of the seasonal dummies of             DYNAMIC
                                the dynamic model
DynamicXLagCoef                 Dependent coefficients of the dynamic model                     DYNAMIC
Hypothesis                      Hypothesis of different deterministic terms in                  JOHANSEN=
                                cointegration rank test
HypothesisTest                  Test hypothesis of different deterministic terms in             JOHANSEN=
                                cointegration rank test
EigenvalueI2                    Eigenvalues in integrated order 2                               JOHANSEN=(IORDER=2)
Eta                             η coefficients                                                  JOHANSEN=(IORDER=2)
InfiniteARRepresent             Infinite order AR representation                                IARR
InfoCriteria                    Information criteria                                            default
LinearTrend                     Linear trend estimates                                          TREND=
MACoef                          MA coefficients                                                 Q=
MARoots                         Roots of MA characteristic polynomial                           ROOTS with Q=
MaxTest                         Cointegration rank test using the maximum eigenvalue            JOHANSEN=(TYPE=MAX)
Minic                           Tentative order selection                                       MINIC, MINIC=
ModelType                       Type of model                                                   default
NObs                            Number of observations                                          default
OrthoImpulse                    Orthogonalized impulse response matrices                        IMPULSE=(ORTH), IMPULSE=(ALL)
OrthoImpulsebyVar               Orthogonalized impulse response by variable                     IMPULSE=(ORTH), IMPULSE=(ALL)
ParameterEstimates              Parameter estimates table                                       default
ParameterGraph                  Schematic representation of the parameters                      PRINT=ESTIMATES
PartialAR                       Partial autoregression matrices                                 PARCOEF
PartialARGraph                  Schematic representation of partial autoregression              PARCOEF
PartialCanCorr                  Partial canonical correlation analysis                          PCANCORR
PartialCorr                     Partial cross-correlation matrices                              PCORR
PartialCorrbyVar                Partial cross-correlations by variable                          PCORR
PartialCorrGraph                Schematic representation of partial cross-correlations          PCORR
PortmanteauTest                 Chi-square test table for residual cross-correlations           PRINT=DIAGNOSE
ProportionCovPredictError       Proportions of prediction error covariance decomposition        DECOMPOSE
ProportionCovPredictErrorbyVar  Proportions of prediction error covariance decomposition        DECOMPOSE
                                by variable
RankTestI2                      Cointegration rank test in integrated order 2                   JOHANSEN=(IORDER=2)
RestrictMaxTest                 Cointegration rank test using the maximum eigenvalue            JOHANSEN=(TYPE=MAX) without NOINT
                                under the restriction of a deterministic term
RestrictTraceTest               Cointegration rank test using the trace under the               JOHANSEN=(TYPE=TRACE) without NOINT
                                restriction of a deterministic term
QuadTrend                       Quadratic trend estimates                                       TREND=QUAD
SeasonGraph                     Schematic representation of the seasonal dummies                PRINT=ESTIMATES
SConstant                       Seasonal constant estimates                                     NSEASON=
SimpleImpulse                   Impulse response matrices                                       IMPULSE=(SIMPLE), IMPULSE=(ALL)
SimpleImpulsebyVar              Impulse response by variable                                    IMPULSE=(SIMPLE), IMPULSE=(ALL)
SimpleImpulseX                  Impulse response matrices of transfer function                  IMPULSX=(SIMPLE), IMPULSX=(ALL)
SimpleImpulseXbyVar             Impulse response of transfer function by variable               IMPULSX=(SIMPLE), IMPULSX=(ALL)
Summary                         Simple summary statistics                                       default
SWTest                          Common trends test                                              SW=
TraceTest                       Cointegration rank test using the trace                         JOHANSEN=(TYPE=TRACE)
Xi                              ξ coefficient matrix                                            JOHANSEN=(IORDER=2)
XLagCoef                        Dependent coefficients                                          XLAG=
YWEstimates                     Yule-Walker estimates                                           YW

ODS Tables Created by the GARCH Statement
ARCHCoef                        ARCH coefficients                                               Q=
GARCHCoef                       GARCH coefficients                                              P=
GARCHConstant                   GARCH constant estimates                                        PRINT=ESTIMATES
GARCHParameterEstimates         GARCH parameter estimates table                                 default
GARCHParameterGraph             Schematic representation of the GARCH parameters                PRINT=ESTIMATES
GARCHRoots                      Roots of GARCH characteristic polynomial                        ROOTS

ODS Tables Created by the COINTEG Statement or the ECM option
AlphaInECM                      α coefficients when rank=r                                      PRINT=ESTIMATES
AlphaBetaInECM                  Π = αβ′ coefficients when rank=r                                PRINT=ESTIMATES
AlphaOnAlpha                    α coefficients under the restriction of α                       J=
AlphaOnBeta                     α coefficients under the restriction of β                       H=
AlphaTestResults                Hypothesis testing of β                                         J=
BetaInECM                       β coefficients when rank=r                                      PRINT=ESTIMATES
BetaOnBeta                      β coefficients under the restriction of β                       H=
BetaOnAlpha                     β coefficients under the restriction of α                       J=
BetaTestResults                 Hypothesis testing of β                                         H=
GrangerRepresent                Coefficient of Granger representation                           PRINT=ESTIMATES
HMatrix                         Restriction matrix for β                                        H=
JMatrix                         Restriction matrix for α                                        J=
WeakExogeneity                  Testing weak exogeneity of each dependent variable              EXOGENEITY
                                with respect to BETA

ODS Tables Created by the CAUSAL Statement
CausalityTest                   Granger causality test                                          default
GroupVars                       Two groups of variables                                         default

ODS Tables Created by the RESTRICT Statement
Restrict                        Restriction table                                               default

ODS Tables Created by the TEST Statement
Test                            Wald test                                                       default

ODS Tables Created by the OUTPUT Statement
Forecasts                       Forecasts table                                                 without NOPRINT
Note that the ODS table names suffixed by “byVar” can be obtained with the PRINTFORM=UNIVARIATE option.
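The table names can be used in an ODS OUTPUT statement to capture printed tables as data sets. The following sketch reuses the simul1 data set from the OUT= example and captures two tables that are listed as default output, ParameterEstimates and InfoCriteria; the data set names pe and ic are illustrative.

```sas
/* Request that two default tables be saved as data sets */
ods output ParameterEstimates=pe InfoCriteria=ic;

proc varmax data=simul1;
   id date interval=year;
   model y1 y2 / p=1 noint;
run;

proc print data=pe;
run;

proc print data=ic;
run;
```

Tables that are not produced by default are captured the same way, provided that the option shown in the Option column for the table is also specified.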
ODS Graphics

This section describes the use of ODS for creating statistical graphs with the VARMAX procedure. To request these graphs, you must specify the ODS GRAPHICS ON statement. When ODS Graphics is in effect, the VARMAX procedure produces a variety of plots for each dependent variable.

The plots available are as follows:

• The procedure displays the following plots for each dependent variable in the MODEL statement with the PLOT= option in the VARMAX statement:
   – impulse response function
   – impulse response of the transfer function
   – time series and predicted series
   – prediction errors
   – distribution of the prediction errors
   – normal quantile of the prediction errors
   – ACF of the prediction errors
   – PACF of the prediction errors
   – IACF of the prediction errors
   – log scaled white noise test of the prediction errors
• The procedure displays forecast plots for each dependent variable in the OUTPUT statement with the PLOT= option in the VARMAX statement.
ODS Graph Names

The VARMAX procedure assigns a name to each graph it creates by using ODS. You can use these names to reference the graphs when using ODS. The names are listed in Table 30.9.

Table 30.9 ODS Graphics Produced in the VARMAX Procedure

ODS Graph Name           Plot Description                                         Statement
ErrorACFPlot             Autocorrelation function of prediction errors            MODEL
ErrorIACFPlot            Inverse autocorrelation function of prediction errors    MODEL
ErrorPACFPlot            Partial autocorrelation function of prediction errors    MODEL
ErrorDiagnosticsPanel    Diagnostics of prediction errors                         MODEL
ErrorNormalityPanel      Histogram and Q-Q plot of prediction errors              MODEL
ErrorDistribution        Distribution of prediction errors                        MODEL
ErrorQQPlot              Q-Q plot of prediction errors                            MODEL
ErrorWhiteNoisePlot      White noise test of prediction errors                    MODEL
ErrorPlot                Prediction errors                                        MODEL
ModelPlot                Time series and predicted series                         MODEL
AccumulatedIRFPanel      Accumulated impulse response function                    MODEL
AccumulatedIRFXPanel     Accumulated impulse response of transfer function        MODEL
OrthogonalIRFPanel       Orthogonalized impulse response function                 MODEL
SimpleIRFPanel           Simple impulse response function                         MODEL
SimpleIRFXPanel          Simple impulse response of transfer function             MODEL
ModelForecastsPlot       Time series and forecasts                                OUTPUT
ForecastsOnlyPlot        Forecasts                                                OUTPUT
Computational Issues

Computational Method

The VARMAX procedure uses numerous linear algebra routines and frequently uses the sweep operator (Goodnight 1979) and the Cholesky root (Golub and Van Loan 1983). In addition, the VARMAX procedure uses the nonlinear optimization (NLO) subsystem to perform nonlinear optimization tasks for the maximum likelihood estimation. The optimization requires intensive computation.
Convergence Problems

For some data sets, the computation algorithm can fail to converge. Nonconvergence can result from a number of causes, including flat or ridged likelihood surfaces and ill-conditioned data. If you experience convergence problems, the following points might be helpful:

• Data that contain extreme values can affect results in PROC VARMAX. Rescaling the data can improve stability.
• Changing the TECH=, MAXITER=, and MAXFUNC= options in the NLOPTIONS statement can improve the stability of the optimization process.
• Specifying a different model that fits the data more closely might improve convergence.
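The first two suggestions can be sketched as follows. This is only an illustration: the scaling factor, the choice of optimization technique, and the iteration limits are arbitrary assumptions and not recommended settings, and the data set name garch_scaled is hypothetical. The garch data set and the model are taken from the OUTHT= example.

```sas
/* Rescale the series (illustrative factor of 100) */
data garch_scaled;
   set garch;
   y1 = y1 / 100;
   y2 = y2 / 100;
run;

proc varmax data=garch_scaled;
   model y1 y2 / p=1;
   garch q=1;
   /* try a different technique and allow more iterations */
   /* and function calls (values chosen for illustration) */
   nloptions tech=quanew maxiter=500 maxfunc=2000;
run;
```

If the rescaled fit converges, remember that the parameter estimates are on the rescaled units when you compare them with results from the original data.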
Memory

Let T be the length of each series, k be the number of dependent variables, p be the order of autoregressive terms, and q be the order of moving-average terms. The number of parameters to estimate for a VARMA(p, q) model is

\[
k + (p + q)k^2 + k(k + 1)/2
\]

As k increases, the number of parameters to estimate increases very quickly. Furthermore, the memory requirement for VARMA(p, q) increases quadratically as k and T increase. For a VARMAX(p, q, s) model and GARCH-type multivariate conditional heteroscedasticity models, the number of parameters to estimate and the memory requirements are considerable.
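To get a feel for how quickly the parameter count grows, the formula can be tabulated in a DATA step. The orders p=2 and q=0 and the range of k are arbitrary choices for illustration, and the data set and variable names are hypothetical.

```sas
data nparm;
   p = 2;
   q = 0;
   do k = 2 to 8;
      /* k constants + (p+q) k-by-k coefficient matrices       */
      /* + k(k+1)/2 distinct innovation covariance elements    */
      nparm = k + (p + q)*k**2 + k*(k + 1)/2;
      output;
   end;
run;

proc print data=nparm noobs;
run;
```

For example, with k=4, p=2, and q=0 the formula gives 4 + 32 + 10 = 46 parameters.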
Computing Time

PROC VARMAX is computationally intensive, and execution times can be long. Extensive CPU time is often required to compute the maximum likelihood estimates.
Examples: VARMAX Procedure
Example 30.1: Analysis of U.S. Economic Variables

Consider the following four-dimensional system of U.S. economic variables. Quarterly data for the years 1954 to 1987 are used (Lütkepohl 1993, Table E.3.).

   title 'Analysis of U.S. Economic Variables';
   data us_money;
      date=intnx( 'qtr', '01jan54'd, _n_-1 );
      format date yyq. ;
      input y1 y2 y3 y4 @@;
      y1=log(y1);
      y2=log(y2);
      label y1='log(real money stock M1)'
            y2='log(GNP in bil. of 1982 dollars)'
            y3='Discount rate on 91-day T-bills'
            y4='Yield on 20-year Treasury bonds';
   datalines;
   450.9 1406.8 0.010800000 0.026133333
   453.0 1401.2 0.0081333333 0.025233333

   ... more lines ...
The following statements plot the series and proceed with the VARMAX procedure.
   proc timeseries data=us_money vectorplot=series;
      id date interval=qtr;
      var y1 y2;
   run;
Output 30.1.1 shows the plot of the variables y1 and y2. Output 30.1.1 Plot of Data
The following statements plot the variables y3 and y4.

   proc timeseries data=us_money vectorplot=series;
      id date interval=qtr;
      var y3 y4;
   run;
Output 30.1.2 shows the plot of the variables y3 and y4.
Output 30.1.2 Plot of Data

   proc varmax data=us_money;
      id date interval=qtr;
      model y1-y4 / p=2 lagmax=6 dftest
                    print=(iarr(3) estimates diagnose)
                    cointtest=(johansen=(iorder=2))
                    ecm=(rank=1 normalize=y1);
      cointeg rank=1 normalize=y1 exogeneity;
   run;
This example performs the Dickey-Fuller test for stationarity, the Johansen cointegration test of integrated order 2, and the exogeneity test. The VECM(2) is fit to the data. From the outputs shown in Output 30.1.5, you can see that the series have unit roots and are cointegrated with rank 1 and integrated order 1. The fitted VECM(2) is given as

\[
\Delta y_t =
\begin{pmatrix} 0.0408 \\ 0.0860 \\ 0.0052 \\ -0.0144 \end{pmatrix}
+
\begin{pmatrix}
-0.0140 & 0.0065 & -0.2026 & 0.1306 \\
-0.0281 & 0.0131 & -0.4080 & 0.2630 \\
-0.0022 & 0.0010 & -0.0312 & 0.0201 \\
0.0051 & -0.0024 & 0.0741 & -0.0477
\end{pmatrix} y_{t-1}
+
\begin{pmatrix}
0.3460 & 0.0913 & -0.3535 & -0.9690 \\
0.0994 & 0.0379 & 0.2390 & 0.2866 \\
0.1812 & 0.0786 & 0.0223 & 0.4051 \\
0.0322 & 0.0496 & -0.0329 & 0.1857
\end{pmatrix} \Delta y_{t-1}
+ \epsilon_t
\]
The Δ prefixed to a variable name implies differencing. Output 30.1.3 through Output 30.1.14 show the details. Output 30.1.3 shows the descriptive statistics.

Output 30.1.3 Descriptive Statistics

Analysis of U.S. Economic Variables

The VARMAX Procedure

Number of Observations        136
Number of Pairwise Missing      0

Simple Summary Statistics

Variable   Type          N      Mean   Standard Deviation       Min       Max
y1         Dependent   136   6.21295              0.07924   6.10278   6.45331
y2         Dependent   136   7.77890              0.30110   7.24508   8.27461
y3         Dependent   136   0.05608              0.03109   0.00813   0.15087
y4         Dependent   136   0.06458              0.02927   0.02490   0.13600

Simple Summary Statistics

Variable   Label
y1         log(real money stock M1)
y2         log(GNP in bil. of 1982 dollars)
y3         Discount rate on 91-day T-bills
y4         Yield on 20-year Treasury bonds
Output 30.1.4 shows the output for Dickey-Fuller tests for the nonstationarity of each series. The null hypothesis is that the series has a unit root. None of the p-values falls below 0.05, so the null hypothesis is not rejected for any series; all series are treated as having a unit root.

Output 30.1.4 Unit Root Tests

Unit Root Test

Variable   Type              Rho   Pr < Rho     Tau   Pr < Tau
y1         Zero Mean        0.05     0.6934    1.14     0.9343
           Single Mean     -2.97     0.6572   -0.76     0.8260
           Trend           -5.91     0.7454   -1.34     0.8725
y2         Zero Mean        0.13     0.7124    5.14     0.9999
           Single Mean     -0.43     0.9309   -0.79     0.8176
           Trend           -9.21     0.4787   -2.16     0.5063
y3         Zero Mean       -1.28     0.4255   -0.69     0.4182
           Single Mean     -8.86     0.1700   -2.27     0.1842
           Trend          -18.97     0.0742   -2.86     0.1803
y4         Zero Mean        0.40     0.7803    0.45     0.8100
           Single Mean     -2.79     0.6790   -1.29     0.6328
           Trend          -12.12     0.2923   -2.33     0.4170
The Johansen cointegration rank test shows whether the series are integrated of order 1 or 2, as shown in Output 30.1.5. The last two columns in Output 30.1.5 give the cointegration rank test for integrated order 1. The hypothesis of rank 0 is rejected because the trace statistic of 55.9633 exceeds the critical value of 47.21, whereas the hypothesis of rank 1 is not rejected because the test statistic of 20.6542 is smaller than the critical value of 29.38. The results therefore indicate a cointegrating relationship with cointegration rank 1 at the 0.05 significance level. Now, look at the row associated with r = 1. Compare the test statistic and critical value pairs (219.62395, 29.38), (89.21508, 15.34), and (27.32609, 3.84). Each test statistic exceeds its critical value, so there is no evidence that the series are integrated of order 2 at the 0.05 significance level.

Output 30.1.5 Cointegration Rank Test

Cointegration Rank Test for I(2)

r\k-r-s              4           3           2          1   Trace of I(1)   5% CV of I(1)
0            384.60903   214.37904   107.93782   37.02523         55.9633           47.21
1                        219.62395    89.21508   27.32609         20.6542           29.38
2                                     73.61779   22.13279          2.6477           15.34
3                                                38.29435          0.0149            3.84
5% CV I(2)    47.21000    29.38000    15.34000    3.84000
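The sequential reading of the I(1) trace test can be written out as a small DATA step. This sketch simply types in the trace statistics and 5% critical values from Output 30.1.5; the data set and variable names are illustrative.

```sas
data trace_test;
   input r trace cv;
   /* 1 if the hypothesis that the rank is at most r is rejected at 5% */
   reject = (trace > cv);
   datalines;
0 55.9633 47.21
1 20.6542 29.38
2  2.6477 15.34
3  0.0149  3.84
;

proc print data=trace_test noobs;
run;
```

The first row with reject=0 is r=1, which corresponds to the cointegration rank of 1 that is used in the ECM= option.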
Output 30.1.6 shows the estimates of the long-run parameter, β, and the adjustment coefficient, α.

Output 30.1.6 Cointegration Rank Test Continued

Beta

Variable          1          2          3          4
y1          1.00000    1.00000    1.00000    1.00000
y2         -0.46458   -0.63174   -0.69996   -0.16140
y3         14.51619   -1.29864    1.37007   -0.61806
y4         -9.35520    7.53672    2.47901    1.43731

Alpha

Variable          1          2          3          4
y1         -0.01396    0.01396   -0.01119    0.00008
y2         -0.02811   -0.02739   -0.00032    0.00076
y3         -0.00215   -0.04967   -0.00183   -0.00072
y4          0.00510   -0.02514   -0.00220    0.00016
Output 30.1.7 shows the estimates η and ξ.

Output 30.1.7 Cointegration Rank Test Continued

Eta

Variable            1            2           3            4
y1           52.74907     41.74502   -20.80403     55.77415
y2          -49.10609     -9.40081    98.87199     22.56416
y3           68.29674   -144.83173   -27.35953     15.51142
y4          121.25932    271.80496    85.85156   -130.11599

Xi

Variable          1          2          3          4
y1         -0.00842   -0.00052   -0.00208   -0.00250
y2          0.00141    0.00213   -0.00736   -0.00058
y3         -0.00445    0.00541   -0.00150    0.00310
y4         -0.00211   -0.00064   -0.00130    0.00197
Output 30.1.8 shows that the VECM(2) is fit to the data. The ECM=(RANK=1) option produces the estimates of the long-run parameter, $\beta$, and the adjustment coefficient, $\alpha$.

Output 30.1.8 Parameter Estimates

Analysis of U.S. Economic Variables
The VARMAX Procedure

Type of Model         VECM(2)
Estimation Method     Maximum Likelihood Estimation
Cointegrated Rank     1

Beta
Variable          1
y1          1.00000
y2         -0.46458
y3         14.51619
y4         -9.35520

Alpha
Variable          1
y1         -0.01396
y2         -0.02811
y3         -0.00215
y4          0.00510
Output 30.1.9 shows the parameter estimates in terms of the constant, the lag one coefficients ($y_{t-1}$) contained in the $\alpha\beta'$ estimates, and the coefficients associated with the lag one first differences ($\Delta y_{t-1}$).

Output 30.1.9 Parameter Estimates Continued

Constant
Variable   Constant
y1          0.04076
y2          0.08595
y3          0.00518
y4         -0.01438

Parameter Alpha * Beta' Estimates
Variable         y1         y2         y3         y4
y1         -0.01396    0.00648   -0.20263    0.13059
y2         -0.02811    0.01306   -0.40799    0.26294
y3         -0.00215    0.00100   -0.03121    0.02011
y4          0.00510   -0.00237    0.07407   -0.04774

AR Coefficients of Differenced Lag
DIF Lag   Variable         y1         y2         y3         y4
1         y1          0.34603    0.09131   -0.35351   -0.96895
          y2          0.09936    0.03791    0.23900    0.28661
          y3          0.18118    0.07859    0.02234    0.40508
          y4          0.03222    0.04961   -0.03292    0.18568
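As a rough check on how the Alpha * Beta' table relates to the rank-1 estimates, the outer product of the printed $\alpha$ and $\beta$ vectors can be formed directly. The sketch below is plain Python for illustration only; because it starts from values rounded to five decimals, it agrees with the printed table only to about four decimals.

```python
# Rank-1 estimates copied from Output 30.1.8 (rows y1..y4).
alpha = [-0.01396, -0.02811, -0.00215, 0.00510]
beta = [1.00000, -0.46458, 14.51619, -9.35520]

# Outer product: entry (i, j) is alpha[i] * beta[j].
alpha_beta_t = [[a * b for b in beta] for a in alpha]

for name, row in zip(["y1", "y2", "y3", "y4"], alpha_beta_t):
    print(name, " ".join(f"{value:9.5f}" for value in row))
```

Because $\beta$ is normalized so that its first element is 1, the first column of the product reproduces $\alpha$ itself.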
Output 30.1.10 shows the parameter estimates and their significance.
Output 30.1.10 Parameter Estimates Continued

Model Parameter Estimates
Equation  Parameter   Estimate  Standard Error  t Value  Pr > |t|  Variable
D_y1      CONST1       0.04076         0.01418     2.87    0.0048  1
          AR1_1_1     -0.01396         0.00495                     y1(t-1)
          AR1_1_2      0.00648         0.00230                     y2(t-1)
          AR1_1_3     -0.20263         0.07191                     y3(t-1)
          AR1_1_4      0.13059         0.04634                     y4(t-1)
          AR2_1_1      0.34603         0.06414     5.39    0.0001  D_y1(t-1)
          AR2_1_2      0.09131         0.07334     1.25    0.2154  D_y2(t-1)
          AR2_1_3     -0.35351         0.11024    -3.21    0.0017  D_y3(t-1)
          AR2_1_4     -0.96895         0.20737    -4.67    0.0001  D_y4(t-1)
D_y2      CONST2       0.08595         0.01679     5.12    0.0001  1
          AR1_2_1     -0.02811         0.00586                     y1(t-1)
          AR1_2_2      0.01306         0.00272                     y2(t-1)
          AR1_2_3     -0.40799         0.08514                     y3(t-1)
          AR1_2_4      0.26294         0.05487                     y4(t-1)
          AR2_2_1      0.09936         0.07594     1.31    0.1932  D_y1(t-1)
          AR2_2_2      0.03791         0.08683     0.44    0.6632  D_y2(t-1)
          AR2_2_3      0.23900         0.13052     1.83    0.0695  D_y3(t-1)
          AR2_2_4      0.28661         0.24552     1.17    0.2453  D_y4(t-1)
D_y3      CONST3       0.00518         0.01608     0.32    0.7476  1
          AR1_3_1     -0.00215         0.00562                     y1(t-1)
          AR1_3_2      0.00100         0.00261                     y2(t-1)
          AR1_3_3     -0.03121         0.08151                     y3(t-1)
          AR1_3_4      0.02011         0.05253                     y4(t-1)
          AR2_3_1      0.18118         0.07271     2.49    0.0140  D_y1(t-1)
          AR2_3_2      0.07859         0.08313     0.95    0.3463  D_y2(t-1)
          AR2_3_3      0.02234         0.12496     0.18    0.8584  D_y3(t-1)
          AR2_3_4      0.40508         0.23506     1.72    0.0873  D_y4(t-1)
D_y4      CONST4      -0.01438         0.00803    -1.79    0.0758  1
          AR1_4_1      0.00510         0.00281                     y1(t-1)
          AR1_4_2     -0.00237         0.00130                     y2(t-1)
          AR1_4_3      0.07407         0.04072                     y3(t-1)
          AR1_4_4     -0.04774         0.02624                     y4(t-1)
          AR2_4_1      0.03222         0.03632     0.89    0.3768  D_y1(t-1)
          AR2_4_2      0.04961         0.04153     1.19    0.2345  D_y2(t-1)
          AR2_4_3     -0.03292         0.06243    -0.53    0.5990  D_y3(t-1)
          AR2_4_4      0.18568         0.11744     1.58    0.1164  D_y4(t-1)
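The t Value column in Output 30.1.10 is the estimate divided by its standard error. The short Python sketch below recomputes it for the rows of the D_y1 equation that report a t value. It is an illustration on the printed, rounded figures, so small differences in the last digit are expected.

```python
# (parameter, estimate, standard error, reported t value)
# copied from the D_y1 equation in Output 30.1.10.
rows = [
    ("CONST1", 0.04076, 0.01418, 2.87),
    ("AR2_1_1", 0.34603, 0.06414, 5.39),
    ("AR2_1_2", 0.09131, 0.07334, 1.25),
    ("AR2_1_3", -0.35351, 0.11024, -3.21),
    ("AR2_1_4", -0.96895, 0.20737, -4.67),
]

# t value = estimate / standard error
t_values = {name: estimate / std_error for name, estimate, std_error, _ in rows}

for name, _, _, reported in rows:
    print(f"{name:8s} t = {t_values[name]:6.2f} (reported {reported:5.2f})")
```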
Output 30.1.11 shows the innovation covariance matrix estimates, the various information criteria results, and the tests for white noise residuals. The residuals have significant correlations at lags 2 and 3, and the Portmanteau test results are significant. These results suggest that a VECM(3) model might fit the data better than the VECM(2) model.

Output 30.1.11 Diagnostic Checks

Covariances of Innovations
Variable         y1         y2         y3         y4
y1          0.00005    0.00001   -0.00001   -0.00000
y2          0.00001    0.00007    0.00002    0.00001
y3         -0.00001    0.00002    0.00007    0.00002
y4         -0.00000    0.00001    0.00002    0.00002

Information Criteria
AICC    -40.6284
HQC     -40.4343
AIC     -40.6452
SBC     -40.1262
FPEC    2.23E-18
Schematic Representation of Cross Correlations of Residuals
Variable/Lag   0     1     2     3     4     5     6
y1             ++..  ....  ++..  ....  +...  ..--  ....
y2             ++++  ....  ....  ....  ....  ....  ....
y3             .+++  ....  +.-.  ..++  -...  ....  ....
y4             .+++  ....  ....  ..+.  ....  ....  ....
+ is > 2*std error, - is < -2*std error, . is between
Portmanteau Test for Cross Correlations of Residuals
Up To Lag   DF   Chi-Square   Pr > ChiSq
3           16        53.90       <.0001
4           32        74.03       <.0001
5           48       103.08       <.0001
6           64       116.94       <.0001
Output 30.1.12 describes how well each univariate equation fits the data. The residuals for y3 and y4 deviate from normality. Except for the residuals for y3, the residuals show no AR effects. Except for the residuals for y4, the residuals show no ARCH effects.

Output 30.1.12 Diagnostic Checks Continued

Univariate Model ANOVA Diagnostics
Variable   R-Square   Standard Deviation   F Value   Pr > F
y1           0.6754              0.00712     32.51   <.0001
y2           0.3070              0.00843      6.92   <.0001
y3           0.1328              0.00807      2.39   0.0196
y4           0.0831              0.00403      1.42   0.1963

Univariate Model White Noise Diagnostics
                             Normality                 ARCH
Variable   Durbin Watson   Chi-Square   Pr > ChiSq   F Value   Pr > F
y1               2.13418         7.19       0.0275      1.62   0.2053
y2               2.04003         1.20       0.5483      1.23   0.2697
y3               1.86892       253.76       <.0001      1.78   0.1847
y4               1.98440       105.21       <.0001     21.01   <.0001

Univariate Model AR Diagnostics
                 AR1                AR2                AR3                AR4
Variable   F Value   Pr > F   F Value   Pr > F   F Value   Pr > F   F Value   Pr > F
y1            0.68   0.4126      2.98   0.0542      2.01   0.1154      2.48   0.0473
y2            0.05   0.8185      0.12   0.8842      0.41   0.7453      0.30   0.8762
y3            0.56   0.4547      2.86   0.0610      4.83   0.0032      3.71   0.0069
y4            0.01   0.9340      0.16   0.8559      1.21   0.3103      0.95   0.4358
The PRINT=(IARR) option provides the VAR(2) representation in Output 30.1.13.

Output 30.1.13 Infinite Order AR Representation

Infinite Order AR Representation
Lag   Variable         y1         y2         y3         y4
1     y1          1.33208    0.09780   -0.55614   -0.83836
      y2          0.07125    1.05096   -0.16899    0.54955
      y3          0.17903    0.07959    0.99113    0.42520
      y4          0.03732    0.04724    0.04116    1.13795
2     y1         -0.34603   -0.09131    0.35351    0.96895
      y2         -0.09936   -0.03791   -0.23900   -0.28661
      y3         -0.18118   -0.07859   -0.02234   -0.40508
      y4         -0.03222   -0.04961    0.03292    0.18568
3     y1          0.00000    0.00000    0.00000    0.00000
      y2          0.00000    0.00000    0.00000    0.00000
      y3          0.00000    0.00000    0.00000    0.00000
      y4          0.00000    0.00000    0.00000    0.00000
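The lag 1 and lag 2 matrices in Output 30.1.13 can be recovered from the VECM(2) estimates in Output 30.1.9 using the standard identities $\Phi_1 = I + \alpha\beta' + \Phi_1^*$ and $\Phi_2 = -\Phi_1^*$, where $\Phi_1^*$ is the coefficient matrix of the lagged differences. The Python sketch below is an illustrative check on the printed (rounded) values rather than a description of how PROC VARMAX computes the table.

```python
# Alpha * Beta' and the differenced-lag AR coefficients, copied from Output 30.1.9.
alpha_beta_t = [
    [-0.01396, 0.00648, -0.20263, 0.13059],
    [-0.02811, 0.01306, -0.40799, 0.26294],
    [-0.00215, 0.00100, -0.03121, 0.02011],
    [0.00510, -0.00237, 0.07407, -0.04774],
]
phi1_star = [
    [0.34603, 0.09131, -0.35351, -0.96895],
    [0.09936, 0.03791, 0.23900, 0.28661],
    [0.18118, 0.07859, 0.02234, 0.40508],
    [0.03222, 0.04961, -0.03292, 0.18568],
]

k = len(alpha_beta_t)

# Phi_1 = I + alpha*beta' + Phi_1^*   and   Phi_2 = -Phi_1^*
phi1 = [
    [(1.0 if i == j else 0.0) + alpha_beta_t[i][j] + phi1_star[i][j] for j in range(k)]
    for i in range(k)
]
phi2 = [[-phi1_star[i][j] for j in range(k)] for i in range(k)]

for row in phi1:
    print(" ".join(f"{value:9.5f}" for value in row))
```

A VECM(2) has only two levels lags, which is why every lag 3 entry in the output is zero.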
Output 30.1.14 shows whether each variable is weakly exogenous with respect to the other variables. The variable y1 is not weakly exogenous with respect to y2, y3, and y4; the variable y2 is not weakly exogenous with respect to y1, y3, and y4; the variables y3 and y4 are weakly exogenous with respect to the other variables.

Output 30.1.14 Weak Exogeneity Test

Testing Weak Exogeneity of Each Variables
Variable   DF   Chi-Square   Pr > ChiSq
y1          1         6.55       0.0105
y2          1        12.54       0.0004
y3          1         0.09       0.7695
y4          1         1.81       0.1786
Example 30.2: Analysis of German Economic Variables

This example considers a three-dimensional VAR(2) model. The model contains the logarithms of quarterly, seasonally adjusted West German fixed investment, disposable income, and consumption expenditures. The data used are in Lütkepohl (1993, Table E.1).

   title 'Analysis of German Economic Variables';
   data west;
      date = intnx( 'qtr', '01jan60'd, _n_-1 );
      format date yyq. ;
      input y1 y2 y3 @@;
      y1 = log(y1);
      y2 = log(y2);
      y3 = log(y3);
      label y1 = 'logarithm of investment'
            y2 = 'logarithm of income'
            y3 = 'logarithm of consumption';
   datalines;
   180 451 415 179 465 421 185 485 434 192 493 448
   ... more lines ...

   data use;
      set west;
      where date < '01jan79'd;
      keep date y1 y2 y3;
   run;

   proc varmax data=use;
      id date interval=qtr;
      model y1-y3 / p=2 dify=(1)
                    print=(decompose(6) impulse=(stderr) estimates diagnose)
                    printform=both lagmax=3;
      causal group1=(y1) group2=(y2 y3);
      output lead=5;
   run;
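First, the differenced data is modeled as a VAR(2) with the following result:

$$
\Delta y_t =
\begin{pmatrix} -0.01672 \\ 0.01577 \\ 0.01293 \end{pmatrix}
+
\begin{pmatrix}
-0.31963 & 0.14599 & 0.96122 \\
0.04393 & -0.15273 & 0.28850 \\
-0.00242 & 0.22481 & -0.26397
\end{pmatrix} \Delta y_{t-1}
+
\begin{pmatrix}
-0.16055 & 0.11460 & 0.93439 \\
0.05003 & 0.01917 & -0.01020 \\
0.03388 & 0.35491 & -0.02223
\end{pmatrix} \Delta y_{t-2}
+ \epsilon_t
$$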
The parameter estimates AR1_1_2, AR1_1_3, AR2_1_2, and AR2_1_3 are insignificant, and the VARX model is fitted in the next step. The detailed output is shown in Output 30.2.1 through Output 30.2.8. Output 30.2.1 shows the descriptive statistics.

Output 30.2.1 Descriptive Statistics

Analysis of German Economic Variables
The VARMAX Procedure

Number of Observations                        75
Number of Pairwise Missing                     0
Observation(s) eliminated by differencing      1

Simple Summary Statistics
Variable   Type         N      Mean   Standard Deviation        Min       Max
y1         Dependent   75   0.01811              0.04680   -0.14018   0.19358
y2         Dependent   75   0.02071              0.01208   -0.02888   0.05023
y3         Dependent   75   0.01987              0.01040   -0.01300   0.04483

Simple Summary Statistics
Variable   Difference   Label
y1         1            logarithm of investment
y2         1            logarithm of income
y3         1            logarithm of consumption
Output 30.2.2 shows that a VAR(2) model is fit to the data.
Output 30.2.2 Parameter Estimates

Analysis of German Economic Variables
The VARMAX Procedure

Type of Model         VAR(2)
Estimation Method     Least Squares Estimation

Constant
Variable   Constant
y1         -0.01672
y2          0.01577
y3          0.01293

AR
Lag   Variable         y1         y2         y3
1     y1         -0.31963    0.14599    0.96122
      y2          0.04393   -0.15273    0.28850
      y3         -0.00242    0.22481   -0.26397
2     y1         -0.16055    0.11460    0.93439
      y2          0.05003    0.01917   -0.01020
      y3          0.03388    0.35491   -0.02223
Output 30.2.3 shows the parameter estimates and their significance.
Output 30.2.3 Parameter Estimates Continued

Schematic Representation
Variable/Lag   C     AR1   AR2
y1             .     -..   ...
y2             +     ...   ...
y3             +     .+.   .+.
+ is > 2*std error, - is < -2*std error, . is between, * is N/A

Model Parameter Estimates
Equation  Parameter   Estimate  Standard Error  t Value  Pr > |t|  Variable
y1        CONST1      -0.01672         0.01723    -0.97    0.3352  1
          AR1_1_1     -0.31963         0.12546    -2.55    0.0132  y1(t-1)
          AR1_1_2      0.14599         0.54567     0.27    0.7899  y2(t-1)
          AR1_1_3      0.96122         0.66431     1.45    0.1526  y3(t-1)
          AR2_1_1     -0.16055         0.12491    -1.29    0.2032  y1(t-2)
          AR2_1_2      0.11460         0.53457     0.21    0.8309  y2(t-2)
          AR2_1_3      0.93439         0.66510     1.40    0.1647  y3(t-2)
y2        CONST2       0.01577         0.00437     3.60    0.0006  1
          AR1_2_1      0.04393         0.03186     1.38    0.1726  y1(t-1)
          AR1_2_2     -0.15273         0.13857    -1.10    0.2744  y2(t-1)
          AR1_2_3      0.28850         0.16870     1.71    0.0919  y3(t-1)
          AR2_2_1      0.05003         0.03172     1.58    0.1195  y1(t-2)
          AR2_2_2      0.01917         0.13575     0.14    0.8882  y2(t-2)
          AR2_2_3     -0.01020         0.16890    -0.06    0.9520  y3(t-2)
y3        CONST3       0.01293         0.00353     3.67    0.0005  1
          AR1_3_1     -0.00242         0.02568    -0.09    0.9251  y1(t-1)
          AR1_3_2      0.22481         0.11168     2.01    0.0482  y2(t-1)
          AR1_3_3     -0.26397         0.13596    -1.94    0.0565  y3(t-1)
          AR2_3_1      0.03388         0.02556     1.33    0.1896  y1(t-2)
          AR2_3_2      0.35491         0.10941     3.24    0.0019  y2(t-2)
          AR2_3_3     -0.02223         0.13612    -0.16    0.8708  y3(t-2)
Output 30.2.4 shows the innovation covariance matrix estimates, the various information criteria results, and the tests for white noise residuals. The residuals are uncorrelated except at lag 3 for the y2 variable.

Output 30.2.4 Diagnostic Checks

Covariances of Innovations
Variable        y1        y2        y3
y1         0.00213   0.00007   0.00012
y2         0.00007   0.00014   0.00006
y3         0.00012   0.00006   0.00009

Information Criteria
AICC    -24.4884
HQC     -24.2869
AIC     -24.5494
SBC     -23.8905
FPEC    2.18E-11

Cross Correlations of Residuals
Lag   Variable         y1         y2         y3
0     y1          1.00000    0.13242    0.28275
      y2          0.13242    1.00000    0.55526
      y3          0.28275    0.55526    1.00000
1     y1          0.01461   -0.00666   -0.02394
      y2         -0.01125   -0.00167   -0.04515
      y3         -0.00993   -0.06780   -0.09593
2     y1          0.07253   -0.00226   -0.01621
      y2         -0.08096   -0.01066   -0.02047
      y3         -0.02660   -0.01392   -0.02263
3     y1          0.09915    0.04484    0.05243
      y2         -0.00289    0.14059    0.25984
      y3         -0.03364    0.05374    0.05644

Schematic Representation of Cross Correlations of Residuals
Variable/Lag   0     1     2     3
y1             +.+   ...   ...   ...
y2             .++   ...   ...   ..+
y3             +++   ...   ...   ...
+ is > 2*std error, - is < -2*std error, . is between

Portmanteau Test for Cross Correlations of Residuals
Up To Lag   DF   Chi-Square   Pr > ChiSq
3            9         9.69       0.3766
Output 30.2.5 describes how well each univariate equation fits the data. The residuals deviate from normality but have no AR effects. The residuals for the y1 variable have an ARCH effect.

Output 30.2.5 Diagnostic Checks Continued

Univariate Model ANOVA Diagnostics
Variable   R-Square   Standard Deviation   F Value   Pr > F
y1           0.1286              0.04615      1.62   0.1547
y2           0.1142              0.01172      1.42   0.2210
y3           0.2513              0.00944      3.69   0.0032

Univariate Model White Noise Diagnostics
                             Normality                 ARCH
Variable   Durbin Watson   Chi-Square   Pr > ChiSq   F Value   Pr > F
y1               1.96269        10.22       0.0060     12.39   0.0008
y2               1.98145        11.98       0.0025      0.38   0.5386
y3               2.14583        34.25       <.0001      0.10   0.7480

Univariate Model AR Diagnostics
                 AR1                AR2                AR3                AR4
Variable   F Value   Pr > F   F Value   Pr > F   F Value   Pr > F   F Value   Pr > F
y1            0.01   0.9029      0.19   0.8291      0.39   0.7624      1.39   0.2481
y2            0.00   0.9883      0.00   0.9961      0.46   0.7097      0.34   0.8486
y3            0.68   0.4129      0.38   0.6861      0.30   0.8245      0.21   0.9320
Output 30.2.6 is the output in a matrix format associated with the PRINT=(IMPULSE=) option for the impulse response function and standard errors. The y3 variable in the first row is an impulse variable. The y1 variable in the first column is a response variable. The responses of y1 to an impulse in y3 at lags 1 to 3 are 0.96122, 0.41555, and -0.40789, so they decrease over these lags.

Output 30.2.6 Impulse Response Function

Simple Impulse Response by Variable
Variable
Response\Impulse   Lag         y1         y2         y3
y1                 1     -0.31963    0.14599    0.96122
                   STD    0.12546    0.54567    0.66431
                   2     -0.05430    0.26174    0.41555
                   STD    0.12919    0.54728    0.66311
                   3      0.11904    0.35283   -0.40789
                   STD    0.08362    0.38489    0.47867
y2                 1      0.04393   -0.15273    0.28850
                   STD    0.03186    0.13857    0.16870
                   2      0.02858    0.11377   -0.08820
                   STD    0.03184    0.13425    0.16250
                   3     -0.00884    0.07147    0.11977
                   STD    0.01583    0.07914    0.09462
y3                 1     -0.00242    0.22481   -0.26397
                   STD    0.02568    0.11168    0.13596
                   2      0.04517    0.26088    0.10998
                   STD    0.02563    0.10820    0.13101
                   3     -0.00055   -0.09818    0.09096
                   STD    0.01646    0.07823    0.10280
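The simple impulse responses in Output 30.2.6 follow the usual VAR recursion $\Psi_0 = I$, $\Psi_i = \sum_{j=1}^{\min(i,p)} \Phi_j \Psi_{i-j}$, applied to the AR estimates in Output 30.2.2. The Python sketch below recomputes the lag 2 and lag 3 responses from the printed coefficients. The helper names `matmul` and `matadd` are illustrative, and the results match the table only up to rounding of the inputs.

```python
def matmul(a, b):
    """Plain-Python matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matadd(a, b):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# AR lag 1 and lag 2 estimates copied from Output 30.2.2 (rows and columns y1, y2, y3).
phi1 = [
    [-0.31963, 0.14599, 0.96122],
    [0.04393, -0.15273, 0.28850],
    [-0.00242, 0.22481, -0.26397],
]
phi2 = [
    [-0.16055, 0.11460, 0.93439],
    [0.05003, 0.01917, -0.01020],
    [0.03388, 0.35491, -0.02223],
]

psi1 = phi1                                            # lag 1 response
psi2 = matadd(matmul(phi1, psi1), phi2)                # Phi1*Psi1 + Phi2*Psi0
psi3 = matadd(matmul(phi1, psi2), matmul(phi2, psi1))  # Phi1*Psi2 + Phi2*Psi1

# Response of y1 (row 0) to an impulse in y3 (column 2) at lags 1 to 3.
print([round(m[0][2], 5) for m in (psi1, psi2, psi3)])
```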
The proportions of decomposition of the prediction error covariances of the three variables are given in Output 30.2.7. In the rows for the y3 variable, the output shows that about 64.713% of the one-step-ahead prediction error covariance of the variable $y_{3t}$ is accounted for by its own innovations, about 7.995% is accounted for by $y_{1t}$ innovations, and about 27.292% is accounted for by $y_{2t}$ innovations.

Output 30.2.7 Proportions of Prediction Error Covariance Decomposition

Proportions of Prediction Error Covariances by Variable
Variable   Lead        y1        y2        y3
y1         1      1.00000   0.00000   0.00000
           2      0.95996   0.01751   0.02253
           3      0.94565   0.02802   0.02633
           4      0.94079   0.02936   0.02985
           5      0.93846   0.03018   0.03136
           6      0.93831   0.03025   0.03145
y2         1      0.01754   0.98246   0.00000
           2      0.06025   0.90747   0.03228
           3      0.06959   0.89576   0.03465
           4      0.06831   0.89232   0.03937
           5      0.06850   0.89212   0.03938
           6      0.06924   0.89141   0.03935
y3         1      0.07995   0.27292   0.64713
           2      0.07725   0.27385   0.64890
           3      0.12973   0.33364   0.53663
           4      0.12870   0.33499   0.53631
           5      0.12859   0.33924   0.53217
           6      0.12852   0.33963   0.53185
The table in Output 30.2.8 gives forecasts and their prediction error covariances.

Output 30.2.8 Forecasts

Forecasts
Variable   Obs   Time     Forecast   Standard Error   95% Confidence Limits
y1          77   1979:1    6.54027          0.04615     6.44982     6.63072
            78   1979:2    6.55105          0.05825     6.43688     6.66522
            79   1979:3    6.57217          0.06883     6.43725     6.70708
            80   1979:4    6.58452          0.08021     6.42732     6.74173
            81   1980:1    6.60193          0.09117     6.42324     6.78063
y2          77   1979:1    7.68473          0.01172     7.66176     7.70770
            78   1979:2    7.70508          0.01691     7.67193     7.73822
            79   1979:3    7.72206          0.02156     7.67980     7.76431
            80   1979:4    7.74266          0.02615     7.69140     7.79392
            81   1980:1    7.76240          0.03005     7.70350     7.82130
y3          77   1979:1    7.54024          0.00944     7.52172     7.55875
            78   1979:2    7.55489          0.01282     7.52977     7.58001
            79   1979:3    7.57472          0.01808     7.53928     7.61015
            80   1979:4    7.59344          0.02205     7.55022     7.63666
            81   1980:1    7.61232          0.02578     7.56179     7.66286
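The 95% confidence limits in Output 30.2.8 are consistent with the normal-theory interval forecast ± z × standard error, where z is the 0.975 standard normal quantile (about 1.96). The Python sketch below reproduces the one-step-ahead limits from the printed forecasts and standard errors. It assumes the normal quantile is what underlies the table, which the printed values support to about four decimals.

```python
from statistics import NormalDist

# 0.975 quantile of the standard normal distribution (approximately 1.96).
z = NormalDist().inv_cdf(0.975)

# (variable, time, forecast, standard error) for the first forecast
# period of each variable, copied from Output 30.2.8.
forecasts = [
    ("y1", "1979:1", 6.54027, 0.04615),
    ("y2", "1979:1", 7.68473, 0.01172),
    ("y3", "1979:1", 7.54024, 0.00944),
]

# Lower and upper 95% limits keyed by variable name.
limits = {
    name: (forecast - z * std_error, forecast + z * std_error)
    for name, _, forecast, std_error in forecasts
}

for name, (lower, upper) in limits.items():
    print(f"{name}: {lower:.5f} {upper:.5f}")
```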
Output 30.2.9 shows that you cannot reject Granger noncausality from $(y_2, y_3)$ to $y_1$ using the 0.05 significance level.

Output 30.2.9 Granger Causality Tests

Granger-Causality Wald Test
Test   DF   Chi-Square   Pr > ChiSq
1       4         6.37       0.1734

Test 1:   Group 1 Variables:   y1
          Group 2 Variables:   y2 y3
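The p-value in Output 30.2.9 is the upper tail of a chi-square distribution with 4 degrees of freedom. For even degrees of freedom the tail has a closed form, $P(X > x) = e^{-x/2} \sum_{j=0}^{df/2-1} (x/2)^j / j!$, which the Python sketch below evaluates. The function name `chi2_sf_even_df` is hypothetical, and because the printed statistic is rounded to two decimals the result agrees with the table only to about three decimals.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form shown here requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^j / j!, starting at j = 0
    total = 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total


# Granger-causality Wald test from Output 30.2.9: chi-square 6.37 with 4 df.
p_value = chi2_sf_even_df(6.37, 4)
print(f"{p_value:.4f}")

# Decision at the 0.05 level: noncausality is rejected only if p < 0.05.
reject = p_value < 0.05
print(reject)
```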
The following SAS statements treat the variable y1 as the exogenous variable and fit the VARX(2,1) model to the data.

   proc varmax data=use;
      id date interval=qtr;
      model y2 y3 = y1 / p=2 dify=(1) difx=(1) xlag=1 lagmax=3
                         print=(estimates diagnose);
   run;
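The fitted VARX(2,1) model is written as

$$
\begin{pmatrix} \Delta y_{2t} \\ \Delta y_{3t} \end{pmatrix}
=
\begin{pmatrix} 0.01542 \\ 0.01319 \end{pmatrix}
+
\begin{pmatrix} 0.02520 \\ 0.05130 \end{pmatrix} \Delta y_{1t}
+
\begin{pmatrix} 0.03870 \\ 0.00363 \end{pmatrix} \Delta y_{1,t-1}
+
\begin{pmatrix} -0.12258 & 0.25811 \\ 0.24367 & -0.31809 \end{pmatrix}
\begin{pmatrix} \Delta y_{2,t-1} \\ \Delta y_{3,t-1} \end{pmatrix}
+
\begin{pmatrix} 0.01651 & 0.03498 \\ 0.34921 & -0.01664 \end{pmatrix}
\begin{pmatrix} \Delta y_{2,t-2} \\ \Delta y_{3,t-2} \end{pmatrix}
+
\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix}
$$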
The detailed output is shown in Output 30.2.10 through Output 30.2.13. Output 30.2.10 shows the parameter estimates in terms of the constant, the current and the lag one coefficients of the exogenous variable, and the lag two coefficients of the dependent variables.
Output 30.2.10 Parameter Estimates

Analysis of German Economic Variables
The VARMAX Procedure

Type of Model         VARX(2,1)
Estimation Method     Least Squares Estimation

Constant
Variable   Constant
y2          0.01542
y3          0.01319

XLag
Lag   Variable        y1
0     y2         0.02520
      y3         0.05130
1     y2         0.03870
      y3         0.00363

AR
Lag   Variable         y2         y3
1     y2         -0.12258    0.25811
      y3          0.24367   -0.31809
2     y2          0.01651    0.03498
      y3          0.34921   -0.01664
Output 30.2.11 shows the parameter estimates and their significance.
Output 30.2.11 Parameter Estimates Continued

Model Parameter Estimates
Equation  Parameter   Estimate  Standard Error  t Value  Pr > |t|  Variable
y2        CONST1       0.01542         0.00443     3.48    0.0009  1
          XL0_1_1      0.02520         0.03130     0.81    0.4237  y1(t)
          XL1_1_1      0.03870         0.03252     1.19    0.2383  y1(t-1)
          AR1_1_1     -0.12258         0.13903    -0.88    0.3811  y2(t-1)
          AR1_1_2      0.25811         0.17370     1.49    0.1421  y3(t-1)
          AR2_1_1      0.01651         0.13766     0.12    0.9049  y2(t-2)
          AR2_1_2      0.03498         0.16783     0.21    0.8356  y3(t-2)
y3        CONST2       0.01319         0.00346     3.81    0.0003  1
          XL0_2_1      0.05130         0.02441     2.10    0.0394  y1(t)
          XL1_2_1      0.00363         0.02536     0.14    0.8868  y1(t-1)
          AR1_2_1      0.24367         0.10842     2.25    0.0280  y2(t-1)
          AR1_2_2     -0.31809         0.13546    -2.35    0.0219  y3(t-1)
          AR2_2_1      0.34921         0.10736     3.25    0.0018  y2(t-2)
          AR2_2_2     -0.01664         0.13088    -0.13    0.8992  y3(t-2)
Output 30.2.12 shows the innovation covariance matrix estimates, the various information criteria results, and the tests for white noise residuals. The residuals are uncorrelated except at lag 3 for the y2 variable.

Output 30.2.12 Diagnostic Checks

Covariances of Innovations
Variable        y2        y3
y2         0.00014   0.00006
y3         0.00006   0.00009

Information Criteria
AICC    -18.3902
HQC     -18.2558
AIC     -18.4309
SBC     -17.9916
FPEC     9.91E-9

Cross Correlations of Residuals
Lag   Variable         y2         y3
0     y2          1.00000    0.56462
      y3          0.56462    1.00000
1     y2         -0.02312   -0.05927
      y3         -0.07056   -0.09145
2     y2         -0.02849   -0.05262
      y3         -0.05804   -0.08567
3     y2          0.16071    0.29588
      y3          0.10882    0.13002

Schematic Representation of Cross Correlations of Residuals
Variable/Lag   0    1    2    3
y2             ++   ..   ..   .+
y3             ++   ..   ..   ..
+ is > 2*std error, - is < -2*std error, . is between

Portmanteau Test for Cross Correlations of Residuals
Up To Lag   DF   Chi-Square   Pr > ChiSq
3            4         8.38       0.0787
Output 30.2.13 describes how well each univariate equation fits the data. The residuals deviate from normality but have no ARCH or AR effects.

Output 30.2.13 Diagnostic Checks Continued

Univariate Model ANOVA Diagnostics
Variable   R-Square   Standard Deviation   F Value   Pr > F
y2           0.0897              0.01188      1.08   0.3809
y3           0.2796              0.00926      4.27   0.0011

Univariate Model White Noise Diagnostics
                             Normality                 ARCH
Variable   Durbin Watson   Chi-Square   Pr > ChiSq   F Value   Pr > F
y2               2.02413        14.54       0.0007      0.49   0.4842
y3               2.13414        32.27       <.0001      0.08   0.7782

Univariate Model AR Diagnostics
                 AR1                AR2                AR3                AR4
Variable   F Value   Pr > F   F Value   Pr > F   F Value   Pr > F   F Value   Pr > F
y2            0.04   0.8448      0.04   0.9570      0.62   0.6029      0.42   0.7914
y3            0.62   0.4343      0.62   0.5383      0.72   0.5452      0.36   0.8379

Example 30.3: Numerous Examples

The following are examples of syntax for model fitting:

   /* Data 'a' Generated Process */
   proc iml;
      sig = {1.0 0.5, 0.5 1.25};
      phi = {1.2 -0.5, 0.6 0.3};
      call varmasim(y,phi) sigma = sig n = 100 seed = 46859;
      cn = {'y1' 'y2'};
      create a from y[colname=cn];
      append from y;
   quit;

   /* when the series has a linear trend */
   proc varmax data=a;
      model y1 y2 / p=1 trend=linear;
   run;

   /* Fit subset of AR order 1 and 3 */
   proc varmax data=a;
      model y1 y2 / p=(1,3);
   run;
   /* Check if the series is nonstationary */
   proc varmax data=a;
      model y1 y2 / p=1 dftest print=(roots);
   run;

   /* Fit VAR(1) in differencing */
   proc varmax data=a;
      model y1 y2 / p=1 print=(roots) dify=(1);
   run;

   /* Fit VAR(1) in seasonal differencing */
   proc varmax data=a;
      model y1 y2 / p=1 dify=(4) lagmax=5;
   run;

   /* Fit VAR(1) in both regular and seasonal differencing */
   proc varmax data=a;
      model y1 y2 / p=1 dify=(1,4) lagmax=5;
   run;

   /* Fit VAR(1) in different differencing */
   proc varmax data=a;
      model y1 y2 / p=1 dif=(y1(1,4) y2(1)) lagmax=5;
   run;

   /* Options related to prediction */
   proc varmax data=a;
      model y1 y2 / p=1 lagmax=3
                    print=(impulse covpe(5) decompose(5));
   run;

   /* Options related to tentative order selection */
   proc varmax data=a;
      model y1 y2 / p=1 lagmax=5 minic
                    print=(parcoef pcancorr pcorr);
   run;

   /* Automatic selection of the AR order */
   proc varmax data=a;
      model y1 y2 / minic=(type=aic p=5);
   run;

   /* Compare results of LS and Yule-Walker Estimators */
   proc varmax data=a;
      model y1 y2 / p=1 print=(yw);
   run;

   /* BVAR(1) of the nonstationary series y1 and y2 */
   proc varmax data=a;
      model y1 y2 / p=1 prior=(lambda=1 theta=0.2 ivar);
   run;

   /* BVAR(1) of the nonstationary series y1 */
   proc varmax data=a;
      model y1 y2 / p=1 prior=(lambda=0.1 theta=0.15 ivar=(y1));
   run;

   /* Data 'b' Generated Process */
   proc iml;
      sig = { 0.5  0.14 -0.08 -0.03,
              0.14 0.71  0.16  0.1,
             -0.08 0.16  0.65  0.23,
             -0.03 0.1   0.23  0.16};
      sig = sig * 0.0001;
      phi = { 1.2 -0.5  0.   0.1,
              0.6  0.3 -0.2  0.5,
              0.4  0.  -0.2  0.1,
             -1.0  0.2  0.7 -0.2};
      call varmasim(y,phi) sigma = sig n = 100 seed = 32567;
      cn = {'y1' 'y2' 'y3' 'y4'};
      create b from y[colname=cn];
      append from y;
   quit;

   /* Cointegration Rank Test using Trace statistics */
   proc varmax data=b;
      model y1-y4 / p=2 lagmax=4 cointtest;
   run;

   /* Cointegration Rank Test using Max statistics */
   proc varmax data=b;
      model y1-y4 / p=2 lagmax=4 cointtest=(johansen=(type=max));
   run;

   /* Common Trends Test using Filter(Differencing) statistics */
   proc varmax data=b;
      model y1-y4 / p=2 lagmax=4 cointtest=(sw);
   run;

   /* Common Trends Test using Filter(Residual) statistics */
   proc varmax data=b;
      model y1-y4 / p=2 lagmax=4 cointtest=(sw=(type=filtres lag=1));
   run;

   /* Common Trends Test using Kernel statistics */
   proc varmax data=b;
      model y1-y4 / p=2 lagmax=4 cointtest=(sw=(type=kernel lag=1));
   run;

   /* Cointegration Rank Test for I(2) */
   proc varmax data=b;
      model y1-y4 / p=2 lagmax=4 cointtest=(johansen=(iorder=2));
   run;

   /* Fit VECM(2) with rank=3 */
   proc varmax data=b;
      model y1-y4 / p=2 lagmax=4 print=(roots iarr)
                    ecm=(rank=3 normalize=y1);
   run;

   /* Weak Exogenous Testing for each variable */
   proc varmax data=b outstat=bbb;
      model y1-y4 / p=2 lagmax=4 ecm=(rank=3 normalize=y1);
      cointeg rank=3 exogeneity;
   run;

   /* Hypotheses Testing for long-run and adjustment parameter */
   proc varmax data=b outstat=bbb;
      model y1-y4 / p=2 lagmax=4 ecm=(rank=3 normalize=y1);
      cointeg rank=3 normalize=y1
              h=(1 0 0, 0 1 0, -1 0 0, 0 0 1)
              j=(1 0 0, 0 1 0, 0 0 1, 0 0 0);
   run;

   /* Ordinary regression model */
   proc varmax data=grunfeld;
      model y1 y2 = x1-x3;
   run;

   /* Ordinary regression model with subset lagged terms */
   proc varmax data=grunfeld;
      model y1 y2 = x1 / xlag=(1,3);
   run;

   /* VARX(1,1) with no current time Exogenous Variables */
   proc varmax data=grunfeld;
      model y1 y2 = x1 / p=1 xlag=1 nocurrentx;
   run;

   /* VARX(1,1) with different Exogenous Variables */
   proc varmax data=grunfeld;
      model y1 = x3, y2 = x1 x2 / p=1 xlag=1;
   run;

   /* VARX(1,2) in difference with current Exogenous Variables */
   proc varmax data=grunfeld;
      model y1 y2 = x1 / p=1 xlag=2 difx=(1) dify=(1);
   run;
Example 30.4: Illustration of ODS Graphics

This example illustrates the use of ODS Graphics. The graphical displays are requested by specifying the ODS GRAPHICS ON statement. For information about the graphics available in the VARMAX procedure, see the section “ODS Graphics” on page 2000.

The following statements use the SASHELP.WORKERS data set to study the time series of electrical workers and its interaction with the series of masonry workers. The series and predicted series plots, the residual plot, and the forecast plot are created in Output 30.4.1 through Output 30.4.3. These are a selection of the plots created by the VARMAX procedure.

   title "Illustration of ODS Graphics";
   proc varmax data=sashelp.workers plot(unpack)=(residual model forecasts);
      id date interval=month;
      model electric masonry / dify=(1,12) noint p=1;
      output lead=12;
   run;
Output 30.4.1 Series and Predicted Series Plots
Output 30.4.2 Residual Plot
Output 30.4.3 Series and Forecast Plots
References

Anderson, T. W. (1951), “Estimating Linear Restrictions on Regression Coefficients for Multivariate Normal Distributions,” Annals of Mathematical Statistics, 22, 327–351.
Ansley, C. F. and Newbold, P. (1979), “Multivariate Partial Autocorrelations,” ASA Proceedings of the Business and Economic Statistics Section, 349–353.
Bollerslev, T. (1990), “Modeling the Coherence in Short-Run Nominal Exchange Rates: A Multivariate Generalized ARCH Model,” Review of Economics and Statistics, 72, 498–505.
Engle, R. F. and Granger, C. W. J. (1987), “Co-integration and Error Correction: Representation, Estimation and Testing,” Econometrica, 55, 251–276.
Engle, R. F. and Kroner, K. F. (1995), “Multivariate Simultaneous Generalized ARCH,” Econometric Theory, 11, 122–150.
Golub, G. H. and Van Loan, C. F. (1983), Matrix Computations, Baltimore and London: Johns Hopkins University Press.
Goodnight, J. H. (1979), “A Tutorial on the SWEEP Operator,” The American Statistician, 33, 149–158.
Hosking, J. R. M. (1980), “The Multivariate Portmanteau Statistic,” Journal of the American Statistical Association, 75, 602–608.
Johansen, S. (1988), “Statistical Analysis of Cointegration Vectors,” Journal of Economic Dynamics and Control, 12, 231–254.
Johansen, S. (1995a), “A Statistical Analysis of Cointegration for I(2) Variables,” Econometric Theory, 11, 25–59.
Johansen, S. (1995b), Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, New York: Oxford University Press.
Johansen, S. and Juselius, K. (1990), “Maximum Likelihood Estimation and Inference on Cointegration: With Applications to the Demand for Money,” Oxford Bulletin of Economics and Statistics, 52, 169–210.
Koreisha, S. and Pukkila, T. (1989), “Fast Linear Estimation Methods for Vector Autoregressive Moving Average Models,” Journal of Time Series Analysis, 10, 325–339.
Litterman, R. B. (1986), “Forecasting with Bayesian Vector Autoregressions: Five Years of Experience,” Journal of Business & Economic Statistics, 4, 25–38.
Lütkepohl, H. (1993), Introduction to Multiple Time Series Analysis, Berlin: Springer-Verlag.
Osterwald-Lenum, M. (1992), “A Note with Quantiles of the Asymptotic Distribution of the Maximum Likelihood Cointegration Rank Test Statistics,” Oxford Bulletin of Economics and Statistics, 54, 461–472.
Pringle, R. M. and Rayner, D. L. (1971), Generalized Inverse Matrices with Applications to Statistics, Second Edition, New York: McGraw-Hill Inc.
Quinn, B. G. (1980), “Order Determination for a Multivariate Autoregression,” Journal of the Royal Statistical Society, B, 42, 182–185.
Reinsel, G. C. (1997), Elements of Multivariate Time Series Analysis, Second Edition, New York: Springer-Verlag.
Spliid, H. (1983), “A Fast Estimation for the Vector Autoregressive Moving Average Models with Exogenous Variables,” Journal of the American Statistical Association, 78, 843–849.
Stock, J. H. and Watson, M. W. (1988), “Testing for Common Trends,” Journal of the American Statistical Association, 83, 1097–1107.
Chapter 31

The X11 Procedure

Contents
Overview: X11 Procedure
Getting Started: X11 Procedure
   Basic Seasonal Adjustment
   X-11-ARIMA
Syntax: X11 Procedure
   Functional Summary
   PROC X11 Statement
   ARIMA Statement
   BY Statement
   ID Statement
   MACURVES Statement
   MONTHLY Statement
   OUTPUT Statement
   PDWEIGHTS Statement
   QUARTERLY Statement
   SSPAN Statement
   TABLES Statement
   VAR Statement
Details: X11 Procedure
   Historical Development of X-11
   Implementation of the X-11 Seasonal Adjustment Method
   Computational Details for Sliding Spans Analysis
   Data Requirements
   Missing Values
   Prior Daily Weights and Trading-Day Regression
   Adjustment for Prior Factors
   The YRAHEADOUT Option
   Effect of Backcast and Forecast Length
   Details of Model Selection
   OUT= Data Set
   The OUTSPAN= Data Set
   OUTSTB= Data Set
   OUTTDR= Data Set
   Printed Output
   ODS Table Names
Examples: X11 Procedure
   Example 31.1: Component Estimation—Monthly Data
   Example 31.2: Components Estimation—Quarterly Data
   Example 31.3: Outlier Detection and Removal
References
Overview: X11 Procedure

The X11 procedure, an adaptation of the U.S. Bureau of the Census X-11 Seasonal Adjustment program, seasonally adjusts monthly or quarterly time series. The procedure makes additive or multiplicative adjustments and creates an output data set containing the adjusted time series and intermediate calculations.

The X11 procedure also provides the X-11-ARIMA method developed by Statistics Canada. This method fits an ARIMA model to the original series, then uses the model forecast to extend the original series. This extended series is then seasonally adjusted by the standard X-11 seasonal adjustment method. The extension of the series improves the estimation of the seasonal factors and reduces revisions to the seasonally adjusted series as new data become available.

The X11 procedure incorporates sliding spans analysis. This type of analysis provides a diagnostic for determining the suitability of seasonal adjustment for an economic series.

Seasonal adjustment of a series is based on the assumption that seasonal fluctuations can be measured in the original series, $O_t$, $t = 1, \ldots, n$, and separated from trend cycle, trading-day, and irregular fluctuations. The seasonal component of this time series, $S_t$, is defined as the intrayear variation that is repeated constantly or in an evolving fashion from year to year. The trend cycle component, $C_t$, includes variation due to the long-term trend, the business cycle, and other long-term cyclical factors. The trading-day component, $D_t$, is the variation that can be attributed to the composition of the calendar. The irregular component, $I_t$, is the residual variation. Many economic time series are related in a multiplicative fashion ($O_t = S_t C_t D_t I_t$). A seasonally adjusted time series, $C_t I_t$, consists of only the trend cycle and irregular components.
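To make the multiplicative relationship concrete, the Python sketch below builds a short series from component values and then removes the seasonal and trading-day factors. All numbers are hypothetical values chosen for illustration. The sketch shows only the algebra $O_t / (S_t D_t) = C_t I_t$; it is not the X-11 estimation method, which has to estimate the components from $O_t$ alone.

```python
# Hypothetical component values for four periods (not taken from any SAS data set).
trend_cycle = [100.0, 102.0, 104.0, 106.0]  # C_t, same scale as the series
seasonal = [1.10, 0.95, 0.90, 1.05]         # S_t, varies around 1.0
trading_day = [1.01, 0.99, 1.00, 1.00]      # D_t, varies around 1.0
irregular = [1.00, 1.02, 0.98, 1.00]        # I_t, varies around 1.0

# Multiplicative model: O_t = S_t * C_t * D_t * I_t
original = [
    s * c * d * i
    for s, c, d, i in zip(seasonal, trend_cycle, trading_day, irregular)
]

# Seasonally adjusted series: divide out S_t and D_t, leaving C_t * I_t.
adjusted = [o / (s * d) for o, s, d in zip(original, seasonal, trading_day)]

for o, a in zip(original, adjusted):
    print(f"original {o:8.3f}   adjusted {a:8.3f}")
```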
Getting Started: X11 Procedure
The most common use of the X11 procedure is to produce a seasonally adjusted series. Eliminating the seasonal component from an economic series facilitates comparison among consecutive months or quarters. A plot of the seasonally adjusted series is often more informative about trends or location in a business cycle than a plot of the unadjusted series.
The following example shows how to use PROC X11 to produce a seasonally adjusted series, $C_t I_t$, from an original series $O_t = S_t C_t D_t I_t$.
In the multiplicative model, the trend cycle component $C_t$ keeps the same scale as the original series $O_t$, while $S_t$, $D_t$, and $I_t$ vary around 1.0. In all printed tables and in the output data set, these latter components are expressed as percentages and thus vary around 100.0 (in the additive case, they vary around 0.0).
The naming convention used in PROC X11 for the tables follows the original U.S. Bureau of the Census X-11 Seasonal Adjustment program specification (Shiskin, Young, and Musgrave 1967). Also, see the section “Printed Output” on page 2074. This convention is outlined in Figure 31.1.
The tables corresponding to parts A through C are intermediate calculations. The final estimates of the individual components are found in the D tables: table D10 contains the final seasonal factors, table D12 contains the final trend cycle, and table D13 contains the final irregular series. If you are primarily interested in seasonally adjusting a series without consideration of intermediate calculations or diagnostics, you only need to look at table D11, the final seasonally adjusted series.
For further details about the X-11-ARIMA tables, see Ladiray and Quenneville (2001).
Basic Seasonal Adjustment
Suppose you have monthly retail sales data starting in September 1978 in a SAS data set named SALES. At this point you do not suspect that any calendar effects are present, and there are no prior adjustments that need to be made to the data.
In this simplest case, you need only specify the DATE= variable in the MONTHLY statement, which associates a SAS date value to each observation. To see the results of the seasonal adjustment, you must request table D11, the final seasonally adjusted series, in a TABLES statement.
   data sales;
      input sales @@;
      date = intnx( 'month', '01sep1978'd, _n_-1 );
      format date monyy7.;
   datalines;
   ... more lines ...

   /*--- X-11 ARIMA ---*/
   proc x11 data=sales;
      monthly date=date;
      var sales;
      tables d11;
   run;
Figure 31.1 Basic Seasonal Adjustment
   The X11 Procedure

   X-11 Seasonal Adjustment Program
   U. S. Bureau of the Census
   Economic Research and Analysis Division
   November 1, 1968

   The X-11 program is divided into seven major parts.
   Part   Description
   A.     Prior adjustments, if any
   B.     Preliminary estimates of irregular component weights
          and regression trading day factors
   C.     Final estimates of above
   D.     Final estimates of seasonal, trend-cycle and
          irregular components
   E.     Analytical tables
   F.     Summary measures
   G.     Charts

   Series - sales
   Period covered - 9/1978 to 8/1990
   Type of run: multiplicative seasonal adjustment.
   Selected Tables or Charts.
   Sigma limits for graduating extreme values are 1.5 and 2.5
   Irregular values outside of 2.5-sigma limits are excluded
   from trading day regression
Figure 31.2 Basic Seasonal Adjustment
   The X11 Procedure
   Seasonal Adjustment of - sales

   D11 Final Seasonally Adjusted Series
   Year       JAN       FEB       MAR       APR       MAY       JUN
   1978         .         .         .         .         .         .
   1979   124.935   126.533   125.282   125.650   127.754   129.648
   1980   128.734   139.542   143.726   143.854   148.723   144.530
   1981   176.329   166.264   167.433   167.509   173.573   175.541
   1982   186.747   202.467   192.024   202.761   197.548   206.344
   1983   233.109   223.345   218.179   226.389   224.249   227.700
   1984   238.261   239.698   246.958   242.349   244.665   247.005
   1985   275.766   282.316   294.169   285.034   294.034   296.114
   1986   325.471   332.228   330.401   330.282   333.792   331.349
   1987   363.592   373.118   368.670   377.650   380.316   376.297
   1988   370.966   384.743   386.833   405.209   380.840   389.132
   1989   428.276   418.236   429.409   446.467   437.639   440.832
   1990   480.631   474.669   486.137   483.140   481.111   499.169
   ----------------------------------------------------------------
   Avg    277.735   280.263   282.435   286.358   285.354   288.638

   D11 Final Seasonally Adjusted Series
   Year       JUL       AUG       SEP       OCT       NOV       DEC     Total
   1978         .         .   123.507   125.776   124.735   129.870   503.887
   1979   127.880   129.285   126.562   134.905   133.356   136.117   1547.91
   1980   140.120   153.475   159.281   162.128   168.848   165.159   1798.12
   1981   179.301   182.254   187.448   197.431   184.341   184.304   2141.73
   1982   211.690   213.691   214.204   218.060   228.035   240.347   2513.92
   1983   222.045   222.127   222.835   212.227   230.187   232.827   2695.22
   1984   251.247   253.805   264.924   266.004   265.366   277.025   3037.31
   1985   294.196   309.162   311.539   319.518   318.564   323.921   3604.33
   1986   337.095   341.127   346.173   350.183   360.792   362.333   4081.23
   1987   379.668   375.607   374.257   372.672   368.135   364.150   4474.13
   1988   385.479   377.147   397.404   403.156   413.843   416.142   4710.89
   1989   450.103   454.176   460.601   462.029   427.499   485.113   5340.38
   1990   485.370   485.103         .         .         .         .   3875.33
   --------------------------------------------------------------------------
   Avg    288.683   291.413   265.728   268.674   268.642   276.442

   Total: 40324     Mean: 280.03     S.D.: 111.31
You can compare the original series, table B1, and the final seasonally adjusted series, table D11, by plotting them together. These tables are requested and named in the OUTPUT statement.
   title 'Monthly Retail Sales Data (in $1000)';
   proc x11 data=sales noprint;
      monthly date=date;
      var sales;
      output out=out b1=sales d11=adjusted;
   run;

   proc sgplot data=out;
      series x=date y=sales    / markers
                                 markerattrs=(color=red symbol='asterisk')
                                 lineattrs=(color=red)
                                 legendlabel="original";
      series x=date y=adjusted / markers
                                 markerattrs=(color=blue symbol='circle')
                                 lineattrs=(color=blue)
                                 legendlabel="adjusted";
      yaxis label='Original and Seasonally Adjusted Time Series';
   run;
Figure 31.3 Plot of Original and Seasonally Adjusted Data
X-11-ARIMA
An inherent problem with the X-11 method is the revision of the seasonal factor estimates as new data become available. The X-11 method uses a set of centered moving averages to estimate the seasonal components. These moving averages apply symmetric weights to all observations except those at the beginning and end of the series, where asymmetric weights have to be applied. These asymmetric weights can cause poor estimates of the seasonal factors, which then can cause large revisions when new data become available.
While large revisions to seasonally adjusted values are not common, they can happen. When they do happen, they undermine the credibility of the X-11 seasonal adjustment method.
A method to address this problem was developed at Statistics Canada (Dagum 1980, 1982a). This method, known as X-11-ARIMA, applies an ARIMA model to the original data (after adjustments, if any) to forecast the series one or more years. This extended series is then seasonally adjusted, allowing symmetric weights to be applied to the end of the original data. This method was tested against a large number of Canadian economic series and was found to greatly reduce the amount of revisions as new data were added.
The X-11-ARIMA method is available in PROC X11 through the use of the ARIMA statement. The ARIMA statement extends the original series either with a user-specified ARIMA model or by an automatic selection process in which the best model from a set of five predefined ARIMA models is used.
The following example illustrates the use of the ARIMA statement. The ARIMA statement does not contain a user-specified model, so the best model is chosen by the automatic selection process. Forecasts from this best model are then used to extend the original series by one year.
   proc x11 data=sales;
      monthly date=date;
      var sales;
      arima;
   run;
The following partial listing shows parameter estimates and model diagnostics for the ARIMA model chosen by the automatic selection process.
Figure 31.4 X-11-ARIMA Model Selection
   Monthly Retail Sales Data (in $1000)
   The X11 Procedure
   Seasonal Adjustment of - sales

   Conditional Least Squares Estimation
                               Approx.
   Parameter     Estimate    Std Error    t Value    Lag
   MU           0.0001728    0.0009596       0.18      0
   MA1,1        0.3739984    0.0893427       4.19      1
   MA1,2        0.0231478    0.0892154       0.26      2
   MA2,1        0.5727914    0.0790835       7.24     12

   Conditional Least Squares Estimation
   Variance Estimate   =  0.0014313
   Std Error Estimate  =  0.0378326
   AIC                 =  -482.2412 *
   SBC                 =  -470.7404 *
   Number of Residuals =  131
   * Does not include log determinant

   Criteria Summary for Model 2: (0,1,2)(0,1,1)s, Log Transform
   Box-Ljung Chi-square: 22.03 with 21 df Prob= 0.40
      (Criteria prob > 0.05)
   Test for over-differencing: sum of MA parameters = 0.57
      (must be < 0.90)
   MAPE - Last Three Years:    2.84 (Must be < 15.00 %)
        - Last Year:            3.04
        - Next to Last Year:    1.96
        - Third from Last Year: 3.51
Table D11 (final seasonally adjusted series) is now constructed using symmetric weights on observations at the end of the actual data. This should result in better estimates of the seasonal factors and, thus, smaller revisions in Table D11 as more data become available.
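One way to see this effect is to produce table D11 twice, once with the standard X-11 method and once with the ARIMA extension, and compare the two series. The following is a minimal sketch rather than part of the original example. It assumes the SALES data set from the earlier example; the data set names ADJ_X11, ADJ_ARIMA, and COMPARE and the variable names X11_ADJ, ARIMA_ADJ, and DIFF are hypothetical.

```sas
/* Sketch only: all output data set and variable names are hypothetical. */

/* Standard X-11 adjustment, no series extension */
proc x11 data=sales noprint;
   monthly date=date;
   var sales;
   output out=adj_x11 d11=x11_adj;
run;

/* X-11-ARIMA adjustment with automatic model selection */
proc x11 data=sales noprint;
   monthly date=date;
   var sales;
   arima;
   output out=adj_arima d11=arima_adj;
run;

/* Put the two D11 series side by side, matched on the DATE= variable */
data compare;
   merge adj_x11 adj_arima;
   by date;
   diff = arima_adj - x11_adj;
run;
```

Because only forecasts are added by default, any differences would be expected mainly near the end of the series, where the standard method has to fall back on asymmetric weights. If none of the predefined models passes the selection criteria, the two series may well be identical.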
Syntax: X11 Procedure
The X11 procedure uses the following statements:
   PROC X11 options ;
      ARIMA options ;
      BY variables ;
      ID variables ;
      MACURVES option ;
      MONTHLY options ;
      OUTPUT OUT=dataset options ;
      PDWEIGHTS option ;
      QUARTERLY options ;
      SSPAN options ;
      TABLES tablenames ;
      VAR variables ;
Either the MONTHLY or QUARTERLY statement must be specified, depending on the type of time series data you have. The PDWEIGHTS and MACURVES statements can be used only with the MONTHLY statement. The TABLES statement controls the printing of tables, while the OUTPUT statement controls the creation of the OUT= data set.
Functional Summary
The statements and options controlling the X11 procedure are summarized in the following table.

Description                                                       Statement   Option

Data Set Options
specify input data set                                            PROC X11    DATA=
write the trading-day regression results to an output data set    PROC X11    OUTTDR=
write the stable seasonality test results to an output data set   PROC X11    OUTSTB=
write table values to an output data set                          OUTPUT      OUT=
add extrapolated values to the output data set                    PROC X11    OUTEX
add year ahead estimates to the output data set                   PROC X11    YRAHEADOUT
write the sliding spans analysis results to an output data set    PROC X11    OUTSPAN=

Printing Control Options
suppress all printed output                                       PROC X11    NOPRINT
suppress all printed ARIMA output                                 ARIMA       NOPRINT
print all ARIMA output                                            ARIMA       PRINTALL
print selected tables and charts                                  TABLES
print selected groups of tables                                   MONTHLY     PRINTOUT=
                                                                  QUARTERLY   PRINTOUT=
print selected groups of charts                                   MONTHLY     CHARTS=
                                                                  QUARTERLY   CHARTS=
print preliminary tables associated with ARIMA processing         ARIMA       PRINTFP
specify number of decimals for printed tables                     MONTHLY     NDEC=
                                                                  QUARTERLY   NDEC=
suppress all printed SSPAN output                                 SSPAN       NOPRINT
print all SSPAN output                                            SSPAN       PRINTALL

Date Information Options
specify a SAS date variable                                       MONTHLY     DATE=
                                                                  QUARTERLY   DATE=
specify the beginning date                                        MONTHLY     START=
                                                                  QUARTERLY   START=
specify the ending date                                           MONTHLY     END=
                                                                  QUARTERLY   END=
specify beginning year for trading-day regression                 MONTHLY     TDCOMPUTE=

Declaring the Role of Variables
specify BY-group processing                                       BY
specify the variables to be seasonally adjusted                   VAR
specify identifying variables                                     ID
specify the prior monthly factor                                  MONTHLY     PMFACTOR=

Controlling the Table Computations
use additive adjustment                                           MONTHLY     ADDITIVE
                                                                  QUARTERLY   ADDITIVE
specify seasonal factor moving average length                     MACURVES
specify the extreme value limit for trading-day regression        MONTHLY     EXCLUDE=
specify the lower bound for extreme irregulars                    MONTHLY     FULLWEIGHT=
                                                                  QUARTERLY   FULLWEIGHT=
specify the upper bound for extreme irregulars                    MONTHLY     ZEROWEIGHT=
                                                                  QUARTERLY   ZEROWEIGHT=
include the length-of-month in trading-day regression             MONTHLY     LENGTH
specify trading-day regression action                             MONTHLY     TDREGR=
compute summary measure only                                      MONTHLY     SUMMARY
                                                                  QUARTERLY   SUMMARY
modify extreme irregulars prior to trend cycle estimation         MONTHLY     TRENDADJ
                                                                  QUARTERLY   TRENDADJ
specify moving average length in trend cycle estimation           MONTHLY     TRENDMA=
                                                                  QUARTERLY   TRENDMA=
specify weights for prior trading-day factors                     PDWEIGHTS
PROC X11 Statement
   PROC X11 options ;
The following options can appear in the PROC X11 statement:
DATA= SAS-data-set
specifies the input SAS data set used. If it is omitted, the most recently created SAS data set is used.
OUTEXTRAP
adds the extra observations used in ARIMA processing to the output data set. When ARIMA forecasting/backcasting is requested, extra observations are appended to the ends of the series, and the calculations are carried out on this extended series. The appended observations are not normally written to the OUT= data set. However, if OUTEXTRAP is specified, these extra observations are written to the output data set. If a DATE= variable is specified in the MONTHLY/QUARTERLY statement, the date variable is extrapolated to identify forecasts/backcasts. The OUTEXTRAP option can be abbreviated as OUTEX.
NOPRINT
suppresses any printed output. The NOPRINT option overrides any PRINTOUT=, CHARTS=, or TABLES statement and any output associated with the ARIMA statement.
OUTSPAN= SAS-data-set
specifies the output data set to store the sliding spans analysis results. Tables A1, C18, D10, and D11 for each span are written to this data set. See the section “The OUTSPAN= Data Set” on page 2071 for details.
OUTSTB= SAS-data-set
specifies the output data set to store the stable seasonality test results (table D8). All the information in the analysis of variance table associated with the stable seasonality test is contained in the variables written to this data set. See the section “OUTSTB= Data Set” on page 2071 for details.
OUTTDR= SAS-data-set
specifies the output data set to store the trading-day regression results (tables B15 and C15). All the information in the analysis of variance table associated with the trading-day regression is contained in the variables written to this data set. This option is valid only when TDREGR=PRINT, TEST, or ADJUST is specified in the MONTHLY statement. See the section “OUTTDR= Data Set” on page 2072 for details.
YRAHEADOUT
adds one-year-ahead forecast values to the output data set for tables C16, C18, and D10. The original purpose of this option was to avoid recomputation of the seasonal adjustment factors when new data became available. While computing costs were an important factor when the X-11 method was developed, this is no longer the case and this option is obsolete. See the section “The YRAHEADOUT Option” on page 2067 for details.
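As an illustration of how several of these PROC X11 options might be combined, the following sketch requests the stable seasonality and trading-day regression results as data sets and writes the ARIMA-extended observations to the OUT= data set. It assumes the SALES data set from the "Getting Started" example; the data set names STB, TDR, and OUT and the variable name ADJUSTED are hypothetical.

```sas
/* Sketch only: STB, TDR, OUT, and ADJUSTED are hypothetical names.      */
/* OUTTDR= is paired with TDREGR=TEST because the option is valid only   */
/* when TDREGR=PRINT, TEST, or ADJUST is specified.                      */
proc x11 data=sales outstb=stb outtdr=tdr outextrap noprint;
   monthly date=date tdregr=test;
   var sales;
   arima forecast=1;                 /* extend the series by one year    */
   output out=out d11=adjusted;      /* extra observations are included  */
run;
```

NOPRINT is used here because the results are collected in data sets rather than read from the listing.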
ARIMA Statement
   ARIMA options ;
The ARIMA statement applies the X-11-ARIMA method to the series specified in the VAR statement. This method uses an ARIMA model estimated from the original data to extend the series one or more years. The ARIMA statement options control the ARIMA model used and the estimation, forecasting, and printing of this model.
There are two ways of obtaining an ARIMA model to extend the series. A model can be given explicitly with the MODEL= and TRANSFORM= options. Alternatively, the best-fitting model from a set of five predefined models is found automatically whenever the MODEL= option is absent. See the section “Details of Model Selection” on page 2068 for details.
BACKCAST= n
specifies the number of years to backcast the series. The default is BACKCAST= 0. See the section “Effect of Backcast and Forecast Length” on page 2067 for details.
CHICR= value
specifies the criterion for the significance level for the Box-Ljung chi-square test for lack of fit when testing the five predefined models. The default is CHICR= 0.05. The CHICR= option values must be between 0.01 and 0.90. The hypothesis being tested is that of model adequacy. Nonrejection of the hypothesis is evidence for an adequate model. Making the CHICR= value smaller makes it easier to accept the model. See the section “Criteria Details” on page 2069 for further details on the CHICR= option.
CONVERGE= value
specifies the convergence criterion for the estimation of an ARIMA model. The default value is 0.001. The CONVERGE= value must be positive.
FORECAST= n
specifies the number of years to forecast the series. The default is FORECAST= 1. See the section “Effect of Backcast and Forecast Length” on page 2067 for details.
MAPECR= value
specifies the criterion for the mean absolute percent error (MAPE) when testing the five predefined models. A small MAPE value is evidence for an adequate model; a large MAPE value results in the model being rejected. The MAPECR= value is the boundary for acceptance/rejection. Thus a larger MAPECR= value would make it easier for a model to pass the criteria. The default is MAPECR= 15. The MAPECR= option values must be between 1 and 100. See the section “Criteria Details” on page 2069 for further details on the MAPECR= option.
MAXITER= n
specifies the maximum number of iterations in the estimation process. MAXITER must be between 1 and 60; the default value is 15.
METHOD= CLS
METHOD= ULS
METHOD= ML
specifies the estimation method. ML requests maximum likelihood, ULS requests unconditional least squares, and CLS requests conditional least squares. METHOD=CLS is the default. The maximum likelihood estimates are more expensive to compute than the conditional least squares estimates. In some cases, however, they can be preferable. For further information on the estimation methods, see “Estimation Details” on page 248 in Chapter 7, “The ARIMA Procedure.”
MODEL= ( P=n1 Q=n2 SP=n3 SQ=n4 DIF=n5 SDIF=n6 < NOINT > < CENTER >)
specifies the ARIMA model. The AR and MA orders are given by P=n1 and Q=n2, respectively, while the seasonal AR and MA orders are given by SP=n3 and SQ=n4, respectively. The lag corresponding to seasonality is determined by the MONTHLY or QUARTERLY statement. Similarly, differencing and seasonal differencing are given by DIF=n5 and SDIF=n6, respectively. For example, the statement
   arima model=( p=2 q=1 sp=1 dif=1 sdif=1 );
specifies a $(2,1,1)(1,1,0)_s$ model, where $s$, the seasonality, is either 12 (monthly) or 4 (quarterly). More examples of the MODEL= syntax are given in the section “Details of Model Selection” on page 2068.
NOINT
suppresses the fitting of a constant (or intercept) parameter in the model. (That is, the parameter is omitted.)
CENTER
centers each time series by subtracting its sample mean. The analysis is done on the centered data. Later, when forecasts are generated, the mean is added back. Note that centering is done after differencing. The CENTER option is normally used in conjunction with the NOINT option. For example, to fit an AR(1) model on the centered data without an intercept, use the following ARIMA statement:
   arima model=( p=1 center noint );
NOPRINT
suppresses the normal printout generated by the ARIMA statement. Note that the effect of specifying the NOPRINT option in the ARIMA statement is different from the effect of specifying the NOPRINT option in the PROC X11 statement, since the former only affects ARIMA output.
OVDIFCR= value
specifies the criterion for the over-differencing test when testing the five predefined models. When the MA parameters in one of these models sum to a number close to 1.0, this is an indication of over-parameterization and the model is rejected. The OVDIFCR= value is the boundary for this rejection; values greater than this value fail the over-differencing test. A larger OVDIFCR= value would make it easier for a model to pass the criteria. The default is OVDIFCR= 0.90. The OVDIFCR= option values must be between 0.80 and 0.99. See the section “Criteria Details” on page 2069 for further details on the OVDIFCR= option.
PRINTALL
provides the same output as the default printing for all models fit and, in addition, prints an estimation summary and chi-square statistics for each model fit. See “Printed Output” on page 2074 for details.
PRINTFP
prints the results for the initial pass of X11 made to exclude trading-day effects. This option has an effect only when the TDREGR= option specifies ADJUST, TEST, or PRINT. In these cases, an initial pass of the standard X11 method is required to get rid of calendar effects before doing any ARIMA estimation. Usually this first pass is not of interest, and by default no tables are printed. However, specifying PRINTFP in the ARIMA statement causes any tables printed in the final pass to also be printed for this initial pass.
TRANSFORM= (LOG) | LOG
TRANSFORM= ( constant ** power )
The ARIMA statement in PROC X11 allows certain transformations on the series before estimation. The specified transformation is applied only to a user-specified model. If TRANSFORM= is specified and the MODEL= option is not specified, the transformation request is ignored and a warning is printed.
The LOG transformation requests that the natural log of the series be used for estimation. The resulting forecast values are transformed back to the original scale.
A general power transformation of the form $X_t \rightarrow (X_t + a)^b$ is obtained by specifying
   transform= ( a ** b )
If the constant a is not specified, it is assumed to be zero. The specified ARIMA model is then estimated using the transformed series. The resulting forecast values are transformed back to the original scale.
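The two ways of obtaining a model can be sketched as follows, using only options described in this section and assuming the SALES data set from the "Getting Started" example. The first step supplies a user-specified $(0,1,2)(0,1,1)_s$ model on the log scale; the second leaves out MODEL= and tightens the three selection criteria relative to their defaults. The particular option values are illustrative choices, not recommendations.

```sas
/* Sketch only: option values are illustrative.                           */

/* User-specified model: TRANSFORM= takes effect because MODEL= is given  */
proc x11 data=sales;
   monthly date=date;
   var sales;
   arima model=( q=2 sq=1 dif=1 sdif=1 ) transform=(log)
         method=ml maxiter=30 forecast=2;
run;

/* Automatic selection among the five predefined models, with stricter    */
/* criteria than the defaults (MAPECR=15, CHICR=0.05, OVDIFCR=0.90)       */
proc x11 data=sales;
   monthly date=date;
   var sales;
   arima mapecr=10 chicr=0.10 ovdifcr=0.85;
run;
```

With stricter criteria it becomes more likely that no predefined model is accepted, so the listing should be checked to see whether a model was actually selected.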
BY Statement
   BY variables ;
A BY statement can be used with PROC X11 to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input DATA= data set to be sorted in order of the BY variables.
ID Statement
   ID variables ;
If you are creating an output data set, use the ID statement to put values of the ID variables, in addition to the table values, into the output data set. The ID statement has no effect when an output data set is not created. If the DATE= variable is specified in the MONTHLY or QUARTERLY statement, this variable is included automatically in the OUT= data set. If no DATE= variable is specified, the variable _DATE_ is added.
The date variable (or _DATE_) values outside the range of the actual data (from ARIMA forecasting or backcasting, or from YRAHEADOUT) are extrapolated, while all other ID variables are missing.
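The BY and ID statements are typically used together when one data set holds several series. The following sketch assumes a hypothetical data set REGIONAL that contains monthly SALES values for several regions, with a grouping variable REGION, a date variable DATE, and a descriptive variable STORE_TYPE; none of these names come from the original text.

```sas
/* Sketch only: REGIONAL, REGION, STORE_TYPE, ADJ_BY_REGION, and          */
/* ADJUSTED are hypothetical names.                                        */

/* The input must be sorted by the BY variable (and by date within it)    */
proc sort data=regional;
   by region date;
run;

proc x11 data=regional noprint;
   by region;                        /* one seasonal adjustment per region */
   id store_type;                    /* carried into the OUT= data set     */
   monthly date=date;
   var sales;
   output out=adj_by_region d11=adjusted;
run;
```

The DATE= variable does not need to be listed in the ID statement, since it is included in the output data set automatically.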
MACURVES Statement
   MACURVES month=option . . . ;
The MACURVES statement specifies the length of the moving-average curves for estimating the seasonal factors for any month. This statement can be used only with monthly time series data. The month=option specifications consist of the month name (or the first three letters of the month name), an equal sign, and one of the following option values:
   '3'       specifies a three-term moving average for the month
   '3X3'     specifies a three-by-three moving average
   '3X5'     specifies a three-by-five moving average
   '3X9'     specifies a three-by-nine moving average
   STABLE    specifies a stable seasonal factor (average of all values for the month)
For example, the statement
   macurves jan='3' feb='3x3' march='3x5' april='3x9';
uses a three-term moving average to estimate seasonal factors for January, a 3×3 (a three-term moving average of a three-term moving average) for February, a 3×5 (a three-term moving average of a five-term moving average) for March, and a 3×9 (a three-term moving average of a nine-term moving average) for April.
The numeric values used for the weights of the various moving averages and a discussion of the derivation of these weights are given in Shiskin, Young, and Musgrave (1967). A general discussion of moving average weights is given in Dagum (1985).
If the specification for a month is omitted, the X11 procedure uses a three-by-three moving average for the first estimate of each iteration and a three-by-five average for the second estimate.
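To make the "moving average of a moving average" wording concrete, the symmetric weights implied by composing two simple (equal-weight) averages are shown below. This is a worked illustration under the assumption that both component averages are equally weighted and that the full symmetric span is available; the weights actually used, including the asymmetric end weights, are those documented in Shiskin, Young, and Musgrave (1967).

```latex
\begin{align*}
3 \times 3:\quad & \tfrac{1}{9}\,(1,\,2,\,3,\,2,\,1) \\
3 \times 5:\quad & \tfrac{1}{15}\,(1,\,2,\,3,\,3,\,3,\,2,\,1) \\
3 \times 9:\quad & \tfrac{1}{27}\,(1,\,2,\,3,\,3,\,3,\,3,\,3,\,3,\,3,\,2,\,1)
\end{align*}
```

Each set of weights sums to 1, and a longer average spreads the weight over more years of values for the same month, which gives a smoother, more slowly evolving seasonal factor.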
MONTHLY Statement
   MONTHLY options ;
The MONTHLY statement must be used when the input data to PROC X11 are a monthly time series. The MONTHLY statement specifies options that determine the computations performed by PROC X11 and what is included in its output. Either the DATE= or START= option must be used.
The following options can appear in the MONTHLY statement.
ADDITIVE
performs additive adjustments. If the ADDITIVE option is omitted, PROC X11 performs multiplicative adjustments.
CHARTS= STANDARD
CHARTS= FULL
CHARTS= NONE
specifies the charts produced by the procedure. The default is CHARTS=STANDARD, which specifies 12 monthly seasonal charts and a trend cycle chart. If you specify CHARTS=FULL (or CHARTS=ALL), the procedure prints additional charts of irregular and seasonal factors. To print no charts, specify CHARTS=NONE.
The TABLES statement can also be used to specify particular monthly charts to be printed. If no CHARTS= option is given, and a TABLES statement is given, the TABLES statement overrides the default value of CHARTS=STANDARD; that is, no charts (or tables) are printed except those specified in the TABLES statement. However, if both the CHARTS= option and a TABLES statement are given, the charts corresponding to the CHARTS= option and those requested by the TABLES statement are printed.
For example, suppose you wanted only charts G1, the final seasonally adjusted series and trend cycle, and G4, the final irregular and final modified irregular series. You would specify the following statements:
   monthly date=date;
   tables g1 g4;
DATE= variable
specifies a variable that gives the date for each observation. The starting and ending dates are obtained from the first and last values of the DATE= variable, which must contain SAS date values. The procedure checks values of the DATE= variable to ensure that the input observations are sequenced correctly. This variable is automatically added to the OUT= data set if one is requested and extrapolated if necessary. If the DATE= option is not specified, the START= option must be specified.
The DATE= option and the START= and END= options can be used in combination to subset a series for processing. For example, suppose you have 12 years of monthly data (144 observations, no missing values) beginning in January 1970 and ending in December 1981, and you wanted to seasonally adjust only six years beginning in January 1974. Specifying
   monthly date=date start=jan1974 end=dec1979;
would seasonally adjust only this subset of the data. If instead you wanted to adjust the last eight years of data, only the START= option is needed:
   monthly date=date start=jan1974;
END= mmmyyyy
specifies that only the part of the input series ending with the month and year given be adjusted (for example, END=DEC1970). See the DATE=variable option for using the START= and END= options to subset a series for processing.
EXCLUDE= value
excludes from the trading-day regression any irregular values that are more than value standard deviations from the mean. The EXCLUDE=value must be between 0.1 and 9.9, with the default value being 2.5.
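The DATE= description above notes that START= must be given when no DATE= variable is available. A minimal sketch of that case follows. It assumes the SALES data set from the "Getting Started" example, and that its observations are consecutive months beginning in September 1978 with none missing; the output names OUT and ADJUSTED are hypothetical.

```sas
/* Sketch only: OUT and ADJUSTED are hypothetical names.                   */
/* No DATE= variable is used, so START= gives the date of the first        */
/* observation and the procedure adds a variable named _DATE_ to OUT.      */
proc x11 data=sales noprint;
   monthly start=sep1978;
   var sales;
   output out=out d11=adjusted;
run;
```

Without a DATE= variable the procedure has no date values to check against, so this form relies on the input already being complete and in time order.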
FULLWEIGHT= value
assigns weights to irregular values based on their distance from the mean in standard deviation units. The weights are used for estimating seasonal and trend cycle components. Irregular values less than the FULLWEIGHT= value (in standard deviation units) are assigned full weights of 1, values that fall between the ZEROWEIGHT= and FULLWEIGHT= limits are assigned weights linearly graduated between 0 and 1, and values greater than the ZEROWEIGHT= limit are assigned a weight of 0. For example, if ZEROWEIGHT=2 and FULLWEIGHT=1, a value 1.3 standard deviations from the mean would be assigned a graduated weight. The FULLWEIGHT=value must be between 0.1 and 9.9 but must be less than the ZEROWEIGHT=value. The default is FULLWEIGHT=1.5. LENGTH
includes length-of-month allowance in computing trading-day factors. If this option is omitted, length-of-month allowances are included with the seasonal factors. NDEC= n
specifies the number of decimal places shown in the printed tables in the listing. This option has no effect on the precision of the variable values in the output data set. PMFACTOR= variable
specifies a variable containing the prior monthly factors. Use this option if you have previous knowledge of monthly adjustment factors. The PMFACTOR= option can be used to make the following adjustments:
adjust the level of all or part of a series with discontinuities
adjust for the influence of holidays that fall on different dates from year to year, such as the effect of Easter on certain retail sales
adjust for unreasonable weather influence on series, such as housing starts
adjust for changing starting dates of fiscal years (for budget series) or model years (for automobiles)
adjust for temporary dislocating events, such as strikes
See the section “Prior Daily Weights and Trading-Day Regression” on page 2065 for details and examples using the PMFACTOR= option. PRINTOUT= STANDARD | LONG | FULL | NONE
specifies the tables to be printed by the procedure. If the PRINTOUT=STANDARD option is specified, between 17 and 27 tables are printed, depending on the other options that are specified. PRINTOUT=LONG prints between 27 and 39 tables, and PRINTOUT=FULL prints between 44 and 59 tables. Specifying PRINTOUT=NONE results in no tables being printed; however, charts are still printed. The default is PRINTOUT=STANDARD. The TABLES statement can also be used to specify particular monthly tables to be printed. If no PRINTOUT= option is specified, and a TABLES statement is given, the TABLES statement overrides the default value of PRINTOUT=STANDARD; that is, no tables (or charts) are printed except those given in the TABLES statement. However, if both the PRINTOUT=
2050 F Chapter 31: The X11 Procedure
option and a TABLES statement are specified, the tables corresponding to the PRINTOUT= option and those requested by the TABLES statement are printed. START= mmmyyyy
adjusts only the part of the input series starting with the specified month and year. When the DATE= option is not used, the START= option gives the year and month of the first input observation — for example, START=JAN1966. START= must be specified if DATE= is not given. If START= is specified (and no DATE= option is given), and an OUT= data set is requested, a variable named _DATE_ is added to the data set, giving the date value for each observation. See the DATE= variable option for using the START= and END= options to subset a series. SUMMARY
specifies that the data are already seasonally adjusted and the procedure is to produce summary measures. If the SUMMARY option is omitted, the X11 procedure performs seasonal adjustment of the input data before calculating summary measures. TDCOMPUTE= year
uses the part of the input series beginning with January of the specified year to derive tradingday weights. If this option is omitted, the entire series is used. TDREGR= NONE | PRINT | ADJUST | TEST
specifies the treatment of trading-day regression. TDREG=NONE omits the computation of the trading-day regression. TDREG=PRINT computes and prints the trading-day regressions but does not adjust the series. TDREG=ADJUST computes and prints the trading-day regression and adjusts the irregular components to obtain preliminary weights. TDREG=TEST adjusts the final series if the trading-day regression estimates explain significant variation on the basis of an F test (or residual trading-day variation if prior weights are used). The default is TDREGR=NONE. See the section “Prior Daily Weights and Trading-Day Regression” on page 2065 for details and examples using the TDREGR= option. If ARIMA processing is requested, any value of TDREGR other than the default TDREGR=NONE will cause PROC X11 to perform an initial pass (see the “Details: X11 Procedure” on page 2056 section and the PRINTFP option). The significance level reported in Table C15 should be viewed with caution. The dependent variable in the trading-day regression is the irregular component formed by an averaging operation. This induces a correlation in the dependent variable and hence in the residuals from which the F test is computed. Hence the distribution of the trading-day regression F statistics differs from an exact F; see Cleveland and Devlin (1980) for details. TRENDADJ
modifies extreme irregular values prior to computing the trend cycle estimates in the first iteration. If the TRENDADJ option is omitted, the trend cycle is computed without modifications for extremes. TRENDMA= 9 | 13 | 23
specifies the number of terms in the moving average to be used by the procedure in estimating
the variable trend cycle component. The value of the TRENDMA= option must be 9, 13, or 23. If the TRENDMA= option is omitted, the procedure selects an appropriate moving average. For information about the number of terms in the moving average, see Shiskin, Young, and Musgrave (1967). ZEROWEIGHT= value
assigns weights to irregular values based on their distance from the mean in standard deviation units. The weights are used for estimating seasonal and trend cycle components. Irregular values beyond the standard deviation limit specified in the ZEROWEIGHT= option are assigned zero weights. Values that fall between the two limits (ZEROWEIGHT= and FULLWEIGHT=) are assigned weights linearly graduated between 0 and 1. For example, if ZEROWEIGHT=2 and FULLWEIGHT=1, a value 1.3 standard deviations from the mean would be assigned a graduated weight. The ZEROWEIGHT= value must be between 0.1 and 9.9 and must be greater than the FULLWEIGHT= value. The default is ZEROWEIGHT=2.5. The ZEROWEIGHT= option can be used in conjunction with the FULLWEIGHT= option to adjust outliers from a monthly or quarterly series. See Example 31.3 later in this chapter for an illustration of this use.
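The graduated weight in the ZEROWEIGHT=2, FULLWEIGHT=1 example can be worked out by hand. The following is a sketch that assumes "linearly graduated" means straight-line interpolation between the two limits; the symbols $w$, $d$, $\sigma_F$, and $\sigma_Z$ are introduced here only for illustration, and the treatment of values that fall exactly on a limit is an assumption.

```latex
% d        = distance of an irregular value from the mean, in standard deviation units
% \sigma_F = FULLWEIGHT= limit, \sigma_Z = ZEROWEIGHT= limit (illustrative symbols)
w(d) =
\begin{cases}
1, & d \le \sigma_F \\[4pt]
\dfrac{\sigma_Z - d}{\sigma_Z - \sigma_F}, & \sigma_F < d < \sigma_Z \\[8pt]
0, & d \ge \sigma_Z
\end{cases}

% With ZEROWEIGHT=2 and FULLWEIGHT=1, a value 1.3 standard deviations from the mean gets
w(1.3) = \frac{2 - 1.3}{2 - 1} = 0.7
```

Under this reading, moving the two limits closer together makes the transition from full weight to zero weight steeper.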
OUTPUT Statement
OUTPUT OUT= SAS-data-set tablename=var1 var2 ... ;
The OUTPUT statement creates an output data set containing specified tables. The data set is named by the OUT= option. OUT= SAS-data-set
If OUT= is omitted, the SAS System names the new data set by using the DATAn convention. For each table to be included in the output data set, write the X11 table identification keyword, an equal sign, and a list of new variable names:
   tablename = var1 var2 ...
The tablename keywords that can be used in the OUTPUT statement are listed in the section “Printed Output” on page 2074. The following is an example of a VAR statement and an OUTPUT statement:
   var z1 z2 z3;
   output out=out_x11
          b1=s
          d11=w x y;
The variable s contains the table B1 values for the variable z1, while the table D11 values for variables z1, z2, and z3 are contained in variables w, x, and y, respectively. As this example shows, the list of variables following a tablename= keyword can be shorter than the VAR variable list. In addition to the variables named by tablename =var1 var2 . . . , the ID variables, and BY variables, the output data set contains a date identifier variable. If the DATE= option is given
in the MONTHLY or QUARTERLY statement, the DATE= variable is the date identifier. If no DATE= option is given, a variable named _DATE_ is the date identifier.
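As a fuller illustration of the OUTPUT statement, the following sketch builds a small synthetic input data set and stores two tables from it. The data set names WORK.SERIES and WORK.OUT_X11, the variables DATE, Z1, and Z2, the output variable names, and the random seed are all assumptions made for this example.

```sas
/* Hypothetical input: two synthetic monthly series, JAN1990 through DEC1999 */
data work.series;
   do i = 0 to 119;
      date = intnx('month', '01jan1990'd, i);
      /* trend times a 12-month seasonal pattern times a small irregular */
      z1 = (100 + 0.5*i) * (1 + 0.10*sin(2*constant('pi')*month(date)/12))
           * (1 + 0.01*rannor(12345));
      z2 = (250 + 0.2*i) * (1 + 0.05*cos(2*constant('pi')*month(date)/12))
           * (1 + 0.01*rannor(12345));
      output;
   end;
   format date monyy7.;
   drop i;
run;

proc x11 data=work.series;
   monthly date=date;
   var z1 z2;
   /* B1 is stored for z1 only; D11 is stored for both z1 and z2 */
   output out=work.out_x11
          b1=b1_z1
          d11=adj_z1 adj_z2;
run;

/* Because DATE= is given, DATE is the date identifier in the output data set */
proc print data=work.out_x11(obs=12);
run;
```

Because the B1= list names only one variable, only the first VAR variable contributes a B1 column, which mirrors the shorter-list behavior described above.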
PDWEIGHTS Statement
PDWEIGHTS day=w ... ;
The PDWEIGHTS statement can be used to specify one to seven daily weights. The statement can be used only with monthly series that are seasonally adjusted using the multiplicative model. These weights are used to compute prior trading-day factors, which are then used to adjust the original series prior to the seasonal adjustment process. Only relative weights are needed; the X11 procedure adjusts the weights so that they sum to 7.0. The weights can also be corrected by the procedure on the basis of estimates of trading-day variation from the input data. See the section “Prior Daily Weights and Trading-Day Regression” on page 2065 for details and examples using the PDWEIGHTS statement. Each day=w option specifies a weight (w) for the named day. The day can be any day, Sunday through Saturday. The day keyword can be the full spelling of the day or the three-letter abbreviation. For example, SATURDAY=1.0 and SAT=1.0 are both valid. Each weight w must be a numeric value between 0.0 and 10.0. The following is an example of a PDWEIGHTS statement:
   pdweights sun=.2 mon=.9 tue=1 wed=1 thu=1 fri=.8 sat=.3;
Any number of days can be specified with one PDWEIGHTS statement. The default weight value for any day that is not specified is 0. If you do not use a PDWEIGHTS statement, the program computes daily weights if TDREGR=ADJUST is specified. See Shiskin, Young, and Musgrave (1967) for details.
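To see what "sum to 7.0" means for the example statement, the weights can be rescaled by hand. The sketch below assumes a simple proportional rescaling, which the text does not state explicitly; the symbols $w_d$ and $w_d^{*}$ are introduced only for this illustration.

```latex
% Weights from the example PDWEIGHTS statement, Sunday through Saturday
\sum_{d} w_d = 0.2 + 0.9 + 1 + 1 + 1 + 0.8 + 0.3 = 5.2

% Assumed proportional rescaling so that the adjusted weights sum to 7
w_d^{*} = \frac{7}{5.2}\, w_d \approx 1.3462\, w_d

% Rescaled weights (rounded to four decimals)
w^{*}_{\mathrm{Sun}} \approx 0.2692, \quad
w^{*}_{\mathrm{Mon}} \approx 1.2115, \quad
w^{*}_{\mathrm{Tue}} = w^{*}_{\mathrm{Wed}} = w^{*}_{\mathrm{Thu}} \approx 1.3462, \quad
w^{*}_{\mathrm{Fri}} \approx 1.0769, \quad
w^{*}_{\mathrm{Sat}} \approx 0.4038
```

Only the ratios between the days matter under this reading, which is why the statement needs relative weights only.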
QUARTERLY Statement
QUARTERLY options ;
The QUARTERLY statement must be used when the input data are quarterly time series. This statement includes options that determine the computations performed by the procedure and what is in the printed output. The DATE= option or the START= option must be used. The following options can appear in the QUARTERLY statement. ADDITIVE
performs additive adjustments. If this option is omitted, the procedure performs multiplicative adjustments.
CHARTS= STANDARD CHARTS= FULL CHARTS= NONE
specifies the charts to be produced by the procedure. The default value is CHARTS=STANDARD, which specifies four quarterly seasonal charts and a trend cycle chart. If you specify CHARTS=FULL (or CHARTS=ALL), the procedure prints additional charts of irregular and seasonal factors. To print no charts, specify CHARTS=NONE. The TABLES statement can also be used to specify particular charts to be printed. The presence of a TABLES statement overrides the default value of CHARTS=STANDARD; that is, if a TABLES statement is specified and no CHARTS= option is specified, no charts (or tables) are printed except those given in the TABLES statement. However, if both the CHARTS= option and a TABLES statement are given, the charts corresponding to the CHARTS= option and those requested by the TABLES statement are printed. For example, suppose you want only chart G1 (the final seasonally adjusted series and trend cycle) and chart G4 (the final irregular and final modified irregular series). This is accomplished by specifying the following statements:
   quarterly date=date;
   tables g1 g4;
DATE= variable
specifies a variable that gives the date for each observation. The starting and ending dates are obtained from the first and last values of the DATE= variable, which must contain SAS date values. The procedure checks values of the DATE= variable to ensure that the input observations are sequenced correctly. This variable is automatically added to the OUTPUT= data set if one is requested, and extrapolated if necessary. If the DATE= option is not specified, the START= option must be specified. The DATE= option and the START= and END= options can be used in combination to subset a series for processing. For example, suppose you have a series with 10 years of quarterly data (40 observations, no missing values) beginning in ‘1970Q1’ and ending in ‘1979Q4’, and you want to seasonally adjust only four years beginning in ‘1974Q1’ and ending in ‘1977Q4’. Specifying
   quarterly date=variable start='1974q1' end='1977q4';
seasonally adjusts only this subset of the data. If instead you wanted to adjust the last six years of data, only the START= option is needed:
   quarterly date=variable start='1974q1';
END= ‘yyyyQq’
specifies that only the part of the input series ending with the quarter and year given be adjusted (for example, END='1973Q4'). The specification must be enclosed in quotes, and q must be 1, 2, 3, or 4. See the DATE= variable option for using the START= and END= options to subset a series. FULLWEIGHT= value
assigns weights to irregular values based on their distance from the mean in standard deviation
units. The weights are used for estimating seasonal and trend cycle components. Irregular values less than the FULLWEIGHT= value (in standard deviation units) are assigned full weights of 1, values that fall between the ZEROWEIGHT= and FULLWEIGHT= limits are assigned weights linearly graduated between 0 and 1, and values greater than the ZEROWEIGHT= limit are assigned a weight of 0. For example, if ZEROWEIGHT=2 and FULLWEIGHT=1, a value 1.3 standard deviations from the mean would be assigned a graduated weight. The default is FULLWEIGHT=1.5. NDEC= n
specifies the number of decimal places shown on the output tables. This option has no effect on the precision of the variables in the output data set. PRINTOUT= STANDARD PRINTOUT= LONG PRINTOUT= FULL PRINTOUT= NONE
specifies the tables to print. If PRINTOUT=STANDARD is specified, between 17 and 27 tables are printed, depending on the other options that are specified. PRINTOUT=LONG prints between 27 and 39 tables, and PRINTOUT=FULL prints between 44 and 59 tables. Specifying PRINTOUT=NONE results in no tables being printed. The default is PRINTOUT=STANDARD. The TABLES statement can also specify particular quarterly tables to be printed. If no PRINTOUT= is given, and a TABLES statement is given, the TABLES statement overrides the default value of PRINTOUT=STANDARD; that is, no tables (or charts) are printed except those given in the TABLES statement. However, if both the PRINTOUT= option and a TABLES statement are given, the tables corresponding to the PRINTOUT= option and those requested by the TABLES statement are printed. START= ’yyyyQq’
adjusts only the part of the input series starting with the quarter and year given. When the DATE= option is not used, the START= option gives the year and quarter of the first input observation (for example, START='1967Q1'). The specification must be enclosed in quotes, and q must be 1, 2, 3, or 4. START= must be specified if the DATE= option is not given. If START= is specified (and no DATE= is given), and an OUTPUT= data set is requested, a variable named _DATE_ is added to the data set, giving the date value for a given observation. See the DATE= option for using the START= and END= options to subset a series. SUMMARY
specifies that the input is already seasonally adjusted and that the procedure is to produce summary measures. If this option is omitted, the procedure performs seasonal adjustment of the input data before calculating summary measures. TRENDADJ
modifies extreme irregular values prior to computing the trend cycle estimates. If this option is omitted, the trend cycle is computed without modification for extremes.
ZEROWEIGHT= value
assigns weights to irregular values based on their distance from the mean in standard deviation units. The weights are used for estimating seasonal and trend cycle components. Irregular values beyond the standard deviation limit specified in the ZEROWEIGHT= option are assigned zero weights. Values that fall between the two limits (ZEROWEIGHT= and FULLWEIGHT=) are assigned weights linearly graduated between 0 and 1. For example, if ZEROWEIGHT=2 and FULLWEIGHT=1, a value 1.3 standard deviations from the mean would be assigned a graduated weight. The default is ZEROWEIGHT=2.5. The ZEROWEIGHT option can be used in conjunction with the FULLWEIGHT= option to adjust outliers from a monthly or quarterly series. See Example 31.3 later in this chapter for an illustration of this use.
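The QUARTERLY options described above can be combined in a single statement. The following is a minimal sketch, assuming a synthetic data set WORK.QSERIES with variables DATE and Y (all hypothetical names, as is the seed), that adjusts only the 1974Q1 through 1977Q4 subset and changes the outlier weight limits from their defaults.

```sas
/* Hypothetical input: 10 years of synthetic quarterly data, 1970Q1 through 1979Q4 */
data work.qseries;
   do i = 0 to 39;
      date = intnx('qtr', '01jan1970'd, i);
      /* trend, a high fourth quarter, a low first quarter, small irregular */
      y = (200 + 2*i)
          * (1 + 0.08*(qtr(date) = 4) - 0.08*(qtr(date) = 1))
          * (1 + 0.01*rannor(2024));
      output;
   end;
   format date yyq6.;
   drop i;
run;

proc x11 data=work.qseries;
   quarterly date=date
             start='1974q1' end='1977q4'   /* adjust only this four-year subset */
             fullweight=1                  /* full weight within 1 standard deviation */
             zeroweight=2;                 /* zero weight beyond 2 standard deviations */
   var y;
   /* No PRINTOUT= is given, so only the tables listed here are printed */
   tables d10 d11;
run;
```

Dropping END= from the QUARTERLY statement would instead adjust everything from 1974Q1 to the end of the data, as in the DATE= discussion.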
SSPAN Statement
SSPAN options ;
The SSPAN statement applies sliding spans analysis to determine the suitability of seasonal adjustment for an economic series. The following options can appear in the SSPAN statement: NDEC= n
specifies the number of decimal places shown on selected sliding span reports. This option has no effect on the precision of the variable values in the OUTSPAN output data set. CUTOFF= value
gives the percentage value for determining an excessive difference within a span for the seasonal factors, the seasonally adjusted series, and month-to-month and year-to-year differences in the seasonally adjusted series. The default value is 3.0. The use of the CUTOFF=value in determining the maximum percent difference (MPD) is described in the section “Computational Details for Sliding Spans Analysis” on page 2062. Caution should be used in changing the default CUTOFF=value. The empirical threshold ranges found by the U.S. Census Bureau no longer apply when value is changed. TDCUTOFF= value
gives the percentage value for determining an excessive difference within a span for the trading-day factors. The default value is 2.0. The use of the TDCUTOFF=value in determining the maximum percent difference (MPD) is described in the section “Computational Details for Sliding Spans Analysis” on page 2062. Caution should be used in changing the default TDCUTOFF=value. The empirical threshold ranges found by the U.S. Census Bureau no longer apply when the value is changed. NOPRINT
suppresses all sliding span reports. See “Computational Details for Sliding Spans Analysis” on page 2062 for more details on sliding span reports.
PRINT
prints the summary sliding span reports S 0 through S 6.E. PRINTALL
prints the summary sliding spans reports S 0 through S 6.E, along with detail reports S 7.A through S 7.E.
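A sliding spans request might look like the following sketch. It follows the SSPAN options ; syntax shown above and assumes a synthetic monthly data set WORK.SALES with variables DATE and SALES (hypothetical names and seed), made 12 years long so that more than one span is available.

```sas
/* Hypothetical input: 12 years of synthetic monthly data, JAN1985 through DEC1996 */
data work.sales;
   do i = 0 to 143;
      date  = intnx('month', '01jan1985'd, i);
      /* trend times a 12-month seasonal pattern times a small irregular */
      sales = (100 + 0.5*i)
              * (1 + 0.10*sin(2*constant('pi')*month(date)/12))
              * (1 + 0.01*rannor(12345));
      output;
   end;
   format date monyy7.;
   drop i;
run;

proc x11 data=work.sales;
   monthly date=date;
   var sales;
   /* Keep the default CUTOFF=3.0 so the empirical thresholds still apply */
   sspan cutoff=3.0 ndec=2 print;
run;
```

Replacing PRINT with PRINTALL would add the detail reports, and NOPRINT would suppress the sliding span reports altogether.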
TABLES Statement
TABLES tablenames ;
The TABLES statement prints the tables specified in addition to the tables that are printed as a result of the PRINTOUT= option in the MONTHLY or QUARTERLY statement. Table names are listed in Table 31.4 later in this chapter. To print only selected tables, omit the PRINTOUT= option in the MONTHLY or QUARTERLY statement and list the tables to be printed in the TABLES statement. For example, to print only the final seasonal factors and final seasonally adjusted series, use the statement tables d10 d11;
VAR Statement
VAR variables ;
The VAR statement is used to specify the variables in the input data set that are to be analyzed by the procedure. Only numeric variables can be specified. If the VAR statement is omitted, all numeric variables are analyzed except those appearing in a BY or ID statement or the variable named in the DATE= option in the MONTHLY or QUARTERLY statement.
Details: X11 Procedure
Historical Development of X-11 This section briefly describes the historical development of the standard X-11 seasonal adjustment method and the later development of the X-11-ARIMA method. Most of the following discussion is based on a comprehensive article by Bell and Hillmer (1984), which describes the history of X-11 and the justification of using seasonal adjustment methods, such as X-11, given the current
availability of time series software. For further discussions about statistical problems associated with the X-11 method, see Ghysels (1990).
Seasonal adjustment methods began to be developed in the 1920s and 1930s, before there were suitable analytic models available and before electronic computing devices were in existence. The lack of any suitable model led to methods that worked the same for any series; that is, methods that were not model-based and that could be applied to any series. Experience with economic series had shown that a given mathematical form could adequately represent a time series only for a fixed length; as more data were added, the model became inadequate. This suggested an approach that used moving averages. For further analysis of the properties of X-11 moving averages, see Cleveland and Tiao (1976).
The basic method was to break up an economic time series into long-term trend, long-term cyclical movements, seasonal movements, and irregular fluctuations. Early investigators found that it was not possible to uniquely decompose the trend and cycle components. Thus, these two were grouped together; the resulting component is usually referred to as the “trend cycle component.” It was also found that estimating seasonal components in the presence of trend produced biased estimates of the seasonal components, but, at the same time, estimating trend in the presence of seasonality was difficult. This eventually led to the iterative approach used in the X-11 method.
Two other problems were encountered by early investigators. First, some economic series appear to have changing or evolving seasonality. Second, moving averages were very sensitive to extreme values. The estimation method used in the X-11 method allows for evolving seasonal components. For the second problem, the X-11 method uses repeated adjustment of extreme values.
All of these problems encountered in the early investigation of seasonal adjustment methods suggested the use of moving averages in estimating components. Even with the use of moving averages instead of a model-based method, massive amounts of hand calculations were required. Only a small number of series could be adjusted, and little experimentation could be done to evaluate variations on the method. With the advent of electronic computing in the 1950s, work on seasonal adjustment methods proceeded rapidly. These methods still used the framework previously described; variants of these basic methods could now be easily tested against a large number of series. Much of the work was done by Julian Shiskin and others at the U.S. Bureau of the Census beginning in 1954 and culminating after a number of variants into the X-11 Variant of the Census Method II Seasonal Adjustment Program, which PROC X11 implements. References for this work during this period include Shiskin and Eisenpress (1957), Shiskin (1958), and Marris (1961). The authoritative documentation for the X-11 Variant is in Shiskin, Young, and Musgrave (1967). This document is not equivalent to a program specification; however, the FORTRAN code that implements the X-11 Variant is in the public domain. A less detailed description of the X-11 Variant is given in U.S. Bureau of the Census (1969).
Development of the X-11-ARIMA Method The X-11 method uses symmetric moving averages in estimating the various components. At the end of the series, however, these symmetric weights cannot be applied. Either asymmetric weights have to be used, or some method of extending the series must be found. While various methods of extending a series have been proposed, the most important method to date has been the X-11-ARIMA method developed at Statistics Canada. This method uses Box-Jenkins ARIMA models to extend the series. The Time Series Research and Analysis Division of Statistics Canada investigated 174 Canadian economic series and found five ARIMA models out of twelve that fit the majority of series well and reduced revisions for the most recent months. References that give details of various aspects of the X-11-ARIMA methodology include Dagum (1980, 1982a, c, 1983, 1988), Laniel (1985), Lothian and Morry (1978a), and Huot et al. (1986).
Differences between X11ARIMA/88 and PROC X11
The original implementation of the X-11-ARIMA method was by Statistics Canada in 1980 (Dagum 1980), with later changes and enhancements made in 1988 (Dagum 1988). The calculations performed by PROC X11 differ from those in X11ARIMA/88, which will result in differences in the final component estimates provided by these implementations. There are three areas where Statistics Canada made changes to the original X-11 seasonal adjustment method in developing X11ARIMA/80 (Monsell 1984). These are (a) selection of extreme values, (b) replacement of extreme values, and (c) generation of seasonal and trend cycle weights. These changes have not been implemented in the current version of PROC X11. Thus the procedure produces results identical to those from previous versions of PROC X11 in the absence of an ARIMA statement. Additional differences can result from the ARIMA estimation. X11ARIMA/88 uses conditional least squares (CLS), while CLS, unconditional least squares (ULS), and maximum likelihood (ML) are all available in PROC X11 by using the METHOD= option in the ARIMA statement. Generally, parameter estimates will differ for the different methods.
Implementation of the X-11 Seasonal Adjustment Method
The following steps describe the analysis of a monthly time series using multiplicative seasonal adjustment. Additional steps used by the X-11-ARIMA method are also indicated. Equivalent descriptions apply for an additive model if you replace divide with subtract where applicable. In the multiplicative adjustment, the original series $O_t$ is assumed to be of the form
\[ O_t = C_t S_t I_t P_t D_t \]
where $C_t$ is the trend cycle component, $S_t$ is the seasonal component, $I_t$ is the irregular component, $P_t$ is the prior monthly factors component, and $D_t$ is the trading-day component.
The trading-day component can be further factored as
\[ D_t = D_{r,t} D_{tr,t} \]
where $D_{tr,t}$ are the trading-day factors derived from the prior daily weights, and $D_{r,t}$ are the residual trading-day factors estimated from the trading-day regression. For further information about estimating trading-day variation, see Young (1965).
Additional Steps When Using the X-11-ARIMA Method The X-11-ARIMA method consists of extending a given series by an ARIMA model and applying the usual X-11 seasonal adjustment method to this extended series. Thus in the simplest case in which there are no prior factors or calendar effects in the series, the ARIMA model selection, estimation, and forecasting are performed first, and the resulting extended series goes through the standard X-11 steps described in the next section. If prior factor or calendar effects are present, they must be eliminated from the series before the ARIMA estimation is done because these effects are not stochastic. Prior factors, if present, are removed first. Calendar effects represented by prior daily weights are then removed. If there are no further calendar effects, the adjusted series is extended by the ARIMA model, and this extended series goes through the standard X-11 steps without repeating the removal of prior factors and calendar effects from prior daily weights. If further calendar effects are present, a trading-day regression must be performed. In this case it is necessary to go through an initial pass of the X-11 steps to obtain a final trading-day adjustment. In this initial pass, the series, adjusted for prior factors and prior daily weights, goes through the standard X-11 steps. At the conclusion of these steps, a final series adjusted for prior factors and all calendar effects is available. This adjusted series is then extended by the ARIMA model, and this extended series goes through the standard X-11 steps again, without repeating the removal of prior factors and calendar effects from prior daily weights and trading-day regression.
The Standard X-11 Seasonal Adjustment Method
The standard X-11 seasonal adjustment method consists of the following steps. These steps are applied to the original data or the original data extended by an ARIMA model.
1. In step 1, the data are read, ignoring missing values until the first nonmissing value is found. If prior monthly factors are present, the procedure reads prior monthly $P_t$ factors and divides them into the original series to obtain $O_t / P_t = C_t S_t I_t D_{tr,t} D_{r,t}$. Seven daily weights can be specified to develop monthly factors to adjust the series for trading-day variation, $D_{tr,t}$; these factors are then divided into the original or prior adjusted series to obtain $C_t S_t I_t D_{r,t}$.
2. In steps 2, 3, and 4, three iterations are performed, each of which provides estimates of the seasonal $S_t$, trading-day $D_{r,t}$, trend cycle $C_t$, and irregular $I_t$ components. Each iteration refines estimates of the extreme values in the irregular components. After extreme values are identified and modified, final estimates of the seasonal component, seasonally adjusted series, trend cycle, and irregular components are produced. Step 2 consists of three substeps:
   a) During the first iteration, a centered, 12-term moving average is applied to the original series $O_t$ to provide a preliminary estimate $\hat{C}_t$ of the trend cycle curve $C_t$. This moving average combines 13 (a 2-term moving average of a 12-term moving average) consecutive monthly values, removing the $S_t$ and $I_t$. Next, it obtains a preliminary estimate $\widehat{S_t I_t}$ by
      \[ \widehat{S_t I_t} = \frac{O_t}{\hat{C}_t} \]
   b) A moving average is then applied to the $\widehat{S_t I_t}$ to obtain an estimate $\hat{S}_t$ of the seasonal factors. $\widehat{S_t I_t}$ is then divided by this estimate to obtain an estimate $\hat{I}_t$ of the irregular component. Next, a moving standard deviation is calculated from the irregular component and is used in assigning a weight to each monthly value for measuring its degree of extremeness. These weights are used to modify extreme values in $\widehat{S_t I_t}$. New seasonal factors are estimated by applying a moving average to the modified value of $\widehat{S_t I_t}$. A preliminary seasonally adjusted series is obtained by dividing the original series by these new seasonal factors. A second estimate of the trend cycle is obtained by applying a weighted moving average to this seasonally adjusted series.
   c) The same process is used to obtain second estimates of the seasonally adjusted series and improved estimates of the irregular component. This irregular component is again modified for extreme values and then used to provide estimates of trading-day factors and refined weights for the identification of extreme values.
3. Using the same computations, a second iteration is performed on the original series that has been adjusted by the trading-day factors and irregular weights developed in the first iteration. The second iteration produces final estimates of the trading-day factors and irregular weights.
4. A third and final iteration is performed using the original series that has been adjusted for trading-day factors and irregular weights computed during the second iteration. During the third iteration, PROC X11 develops final estimates of seasonal factors, the seasonally adjusted series, the trend cycle, and the irregular components. The procedure computes summary measures of variation and produces a moving average of the final adjusted series.
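The centered 12-term moving average in substep (a) can be written out explicitly. The following is a sketch of the standard "2-term average of two adjacent 12-term averages" construction, using $\hat{C}_t$ for the preliminary trend cycle estimate and $O_t$ for the series being averaged; the summation index $j$ is introduced here only for illustration.

```latex
% Average two 12-term means: one covering t-6..t+5, the other covering t-5..t+6
\hat{C}_t
  = \frac{1}{2}\left(
      \frac{1}{12}\sum_{j=-6}^{5} O_{t+j}
      + \frac{1}{12}\sum_{j=-5}^{6} O_{t+j}
    \right)
  = \frac{1}{24}\left( O_{t-6} + 2\sum_{j=-5}^{5} O_{t+j} + O_{t+6} \right)

% Check: the 13 weights sum to one
\frac{1}{24}\,(1 + 2 \cdot 11 + 1) = 1
```

Each calendar month receives a total weight of 1/12 in this average, which is why a stable 12-month seasonal pattern is removed from $\hat{C}_t$.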
Sliding Spans Analysis
The motivation for sliding spans analysis is to answer the question, When is an economic series unsuitable for seasonal adjustment? There have been a number of past attempts to answer this question: the stable seasonality F test, the moving seasonality F test, Q statistics, and others. Sliding spans analysis attempts to quantify the stability of the seasonal adjustment process, and hence quantify the suitability of seasonal adjustment for a given series. It is based on a very simple idea: for a stable series, deleting a small number of observations should not result in greatly different component estimates compared with the original, full series. Conversely, if deleting a small number of observations results in drastically different estimates, the series is unstable. For example, a drastic difference in the seasonal factors (Table D10) might result
from a dominating irregular component or sudden changes in the seasonal component. When the seasonal component estimates of a series are unstable in this manner, they have little meaning and the series is likely to be unsuitable for seasonal adjustment.
Sliding spans analysis, developed at the Statistical Research Division of the U.S. Census Bureau (Findley et al. 1990; Findley and Monsell 1986), performs a repeated seasonal adjustment on subsets or spans of the full series. In particular, an initial span of the data, typically eight years in length, is seasonally adjusted, and Tables C18 (the trading-day factors, if trading-day regression is performed), D10 (the seasonal factors), and D11 (the seasonally adjusted series) are retained for further processing. Next, one year of data is deleted from the beginning of the initial span and one year of data is added at the end. This new span is seasonally adjusted as before, with the same tables retained. This process continues until the end of the data is reached. The beginning and ending dates of the spans are such that the last observation in the original data is also the last observation in the last span. This is discussed in more detail in the following paragraphs.
The following notation for the components or differences computed in the sliding spans analysis follows Findley et al. (1990). The symbol $X_t(k)$ denotes component X in month (or quarter) $t$, computed from data in the $k$th span. These components are now defined.
   Seasonal Factors (Table D10): $S_t(k)$
   Trading-Day Factors (Table C18): $TD_t(k)$
   Seasonally Adjusted Data (Table D11): $SA_t(k)$
   Month-to-Month Changes in the Seasonally Adjusted Data: $MM_t(k)$
   Year-to-Year Changes in the Seasonally Adjusted Data: $YY_t(k)$
The key measure is the maximum percent difference across spans. For example, consider a series that begins in January 1972, ends in December 1984, and has four spans, each of length 8 years (see Figure 1 in Findley et al. (1990), p. 346).
Consider $S_t(k)$, the seasonal factor (Table D10) for month $t$ for span $k$, and let $N_t$ denote the set of spans containing month $t$; that is,
\[ N_t = \{ k : \text{span } k \text{ contains month } t \} \]
In the middle years of the series there is overlap of all four spans, and $N_t$ will contain four spans. The last year of the series will have only one span, while the beginning can have 1 or 0 spans depending on the original length. Since we are interested in how much the seasonal factors vary for a given month across the spans, a natural quantity to consider is
\[ \max_{k \in N_t} S_t(k) - \min_{k \in N_t} S_t(k) \]
In the case of the multiplicative model, it is useful to compute a percentage difference; define the maximum percentage difference (MPD) at time $t$ as
\[ MPD_t = \frac{\max_{k \in N_t} S_t(k) - \min_{k \in N_t} S_t(k)}{\min_{k \in N_t} S_t(k)} \]
The seasonal factor for month t is then unreliable if MPDt is large. While no exact significance level can be computed for this statistic, empirical levels have been established by considering over 500 economic series (Findley et al. 1990; Findley and Monsell 1986). For these series it was found that for four spans, stable series typically had less than 15% of the MPD values exceeding 3.0%, while in marginally stable series, between 15% and 25% of the MPD values exceeded 3.0%. A series in which 25% or more of the MPD values exceeded 3.0% is almost always unstable. While these empirical values cannot be considered an exact significance level, they provide a useful empirical basis for deciding if a series is suitable for seasonal adjustment. These percentage values are shifted down when fewer than four spans are used.
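A small worked case shows how a single MPD value is compared with the 3.0% cutoff. The four seasonal factors below are made-up numbers for one month that is covered by all four spans; they are not taken from any real series.

```latex
% Hypothetical seasonal factors for month t from spans k = 1, 2, 3, 4
S_t(1) = 0.98, \quad S_t(2) = 1.00, \quad S_t(3) = 1.01, \quad S_t(4) = 0.99

% Maximum percentage difference across the spans containing month t
MPD_t = \frac{\max_k S_t(k) - \min_k S_t(k)}{\min_k S_t(k)}
      = \frac{1.01 - 0.98}{0.98}
      \approx 0.0306 \quad (\text{about } 3.06\%)
```

Because 3.06% exceeds the default 3.0% cutoff, this month would count toward the percentage of flagged months that is compared with the 15% and 25% thresholds.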
Computational Details for Sliding Spans Analysis Length and Number of Spans The algorithm for determining the length and number of spans for a given series was developed at the U.S. Bureau of the Census, Statistical Research Division. A summary of this algorithm is as follows. First, an initial length based on the MACURVE month=option specification is determined, and then the maximum number of spans possible using this length is determined. If this maximum number exceeds four, set the number of spans to four. If this maximum number is one or zero, there are not enough observations to perform the sliding spans analysis. In this case a note is written to the log and the sliding spans analysis is skipped for this variable. If the maximum number of spans is two or three, the actual number of spans used is set equal to this maximum. Finally, the length is adjusted so that the spans begin in January (or the first quarter) of the beginning year of the span. The remainder of this section gives the computation formulas for the maximum percentage difference (MPD) calculations along with the threshold regions.
Seasonal Factors (Table D10)
For the additive model, the MPD is defined as
\[ \max_{k \in N_t} S_t(k) - \min_{k \in N_t} S_t(k) \]
For the multiplicative model, the MPD is
\[ MPD_t = \frac{\max_{k \in N_t} S_t(k) - \min_{k \in N_t} S_t(k)}{\min_{k \in N_t} S_t(k)} \]
A series for which less than 15% of the MPD values of D10 exceed 3.0% is stable; between 15% and 25% is marginally stable; and greater than 25% is unstable. Span reports S 2.A through S 2.C give the various breakdowns for the number of times the MPD exceeded these levels.
Trading Day Factor (Table C18)

For the additive model, the MPD is defined as

$$\mathrm{MPD}_t = \max_{k \in N_t} TD_t(k) - \min_{k \in N_t} TD_t(k)$$

For the multiplicative model, the MPD is

$$\mathrm{MPD}_t = \frac{\max_{k \in N_t} TD_t(k) - \min_{k \in N_t} TD_t(k)}{\min_{k \in N_t} TD_t(k)}$$
The U.S. Census Bureau currently gives no recommendation concerning MPD thresholds for the trading-day factors. Span reports S 3.A through S 3.C give the various breakdowns for MPD thresholds. When TDREGR=NONE is specified, no trading-day computations are done, and this table is skipped.
Seasonally Adjusted Data (Table D11)

For the additive model, the MPD is defined as

$$\mathrm{MPD}_t = \max_{k \in N_t} SA_t(k) - \min_{k \in N_t} SA_t(k)$$

For the multiplicative model, the MPD is

$$\mathrm{MPD}_t = \frac{\max_{k \in N_t} SA_t(k) - \min_{k \in N_t} SA_t(k)}{\min_{k \in N_t} SA_t(k)}$$
A series for which less than 15% of the MPD values of D11 exceed 3.0% is stable; between 15% and 25% is marginally stable; and greater than 25% is unstable. Span reports S 4.A through S 4.C give the various breakdowns for the number of times the MPD exceeded these levels.
Month-to-Month Changes in the Seasonally Adjusted Data

Some additional notation is needed for the month-to-month and year-to-year differences. Define $N1_t$ as

$$N1_t = \{k : \text{span } k \text{ contains month } t \text{ and } t-1\}$$

For the additive model, the month-to-month change for span k is defined as

$$MM_t(k) = SA_t(k) - SA_{t-1}(k)$$

while for the multiplicative model

$$MM_t(k) = \frac{SA_t(k) - SA_{t-1}(k)}{SA_{t-1}(k)}$$

Since this quantity is already in percentage form, the MPD for both the additive and multiplicative model is defined as

$$\mathrm{MPD}_t = \max_{k \in N1_t} MM_t(k) - \min_{k \in N1_t} MM_t(k)$$
The current recommendation of the U.S. Census Bureau is that if 35% or more of the MPD values of the month-to-month differences of D11 exceed 3.0%, then the series is usually not stable; 40% exceeding this level clearly marks an unstable series. Span reports S 5.A.1 through S 5.C give the various breakdowns for the number of times the MPD exceeds these levels.

Year-to-Year Changes in the Seasonally Adjusted Data
First define $N12_t$ as

$$N12_t = \{k : \text{span } k \text{ contains month } t \text{ and } t-12\}$$

(Appropriate changes in notation for a quarterly series are obvious.) For the additive model, the year-to-year change for span k is defined as

$$YY_t(k) = SA_t(k) - SA_{t-12}(k)$$

while for the multiplicative model

$$YY_t(k) = \frac{SA_t(k) - SA_{t-12}(k)}{SA_{t-12}(k)}$$

Since this quantity is already in percentage form, the MPD for both the additive and multiplicative model is defined as

$$\mathrm{MPD}_t = \max_{k \in N12_t} YY_t(k) - \min_{k \in N12_t} YY_t(k)$$
The current recommendation of the U.S. Census Bureau is that if 10% or more of the MPD values of the year-to-year differences of D11 exceed 3.0%, then the series is usually not stable. Span reports S 6.A through S 6.C give the various breakdowns for the number of times the MPD exceeds these levels.
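The month-to-month and year-to-year MPD calculations differ only in the lag (1 or 12 for monthly data), so both can be sketched with one helper. The data layout (a dictionary of spans, each mapping a month index to its seasonally adjusted value) and the function names are assumptions made for illustration. Returning `None` when fewer than two spans contain both months is also an assumption, since a spread needs at least two values.

```python
def period_change(sa, t, lag, multiplicative=True):
    """Change in the seasonally adjusted value between month t - lag and month t."""
    previous = sa[t - lag]
    diff = sa[t] - previous
    return diff / previous if multiplicative else diff


def mpd_of_changes(sa_by_span, t, lag=1, multiplicative=True):
    """Spread of the changes over all spans that contain both month t and t - lag."""
    changes = [
        period_change(sa, t, lag, multiplicative)
        for sa in sa_by_span.values()
        if t in sa and (t - lag) in sa
    ]
    if len(changes) < 2:
        return None             # assumption: no MPD without two usable spans
    return max(changes) - min(changes)
```

Calling `mpd_of_changes(spans, t, lag=12)` gives the year-to-year version for a monthly series.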
Data Requirements

The input data set must contain either quarterly or monthly time series, and the data must be in chronological order. For the standard X-11 method, there must be at least three years of observations (12 for quarterly time series or 36 for monthly) in the input data sets or in each BY group in the input data set if a BY statement is used. For the X-11-ARIMA method, there must be at least five years of observations (20 for quarterly time series or 60 for monthly) in the input data sets or in each BY group in the input data set if a BY statement is used.
Missing Values

Missing values at the beginning of a series to be adjusted are skipped. Processing starts with the first nonmissing value and continues until the end of the series or until another missing value is found.

Missing values are not allowed for the DATE= variable. The procedure terminates if missing values are found for this variable.

Missing values found in the PMFACTOR= variable are replaced by 100 for the multiplicative model (default) and by 0 for the additive model.

Missing values can occur in the output data set. If the time series specified in the OUTPUT statement is not computed by the procedure, the values of the corresponding variable are missing. If the time series specified in the OUTPUT statement is a moving average, the values of the corresponding variable are missing for the first n and last n observations, where n depends on the length of the moving average. Additionally, if the time series specified is an irregular component modified for extremes, only the modified values are given, and the remaining values are missing.
Prior Daily Weights and Trading-Day Regression

Suppose that a detailed examination of retail sales at ZXY Company indicates that certain days of the week have higher amounts of sales. In particular, Thursday, Friday, and Saturday have approximately twice the amount of sales as Monday, Tuesday, and Wednesday, and no sales occur on Sunday. This means that months with five Saturdays would have higher amounts of sales than months with only four Saturdays.

This phenomenon is called a calendar effect; it can be handled in PROC X11 by using the PDWEIGHTS (prior daily weights) statement or the TDREGR= option (trading-day regression). The PDWEIGHTS statement and the TDREGR= option can be used separately or together.

If the relative weights are known (as in the preceding) it is appropriate to use the PDWEIGHTS statement. If further residual calendar variation is present, TDREGR=ADJUST should also be used. If you know that a calendar effect is present, but know nothing about the relative weights, use TDREGR=ADJUST without a PDWEIGHTS statement.

In this example, it is assumed that the calendar variation is due to both prior daily weights and residual variation. Thus both a PDWEIGHTS statement and TDREGR=ADJUST are specified.

Note that only the relative weights are needed; in the actual computations, PROC X11 normalizes the weights to sum to 7.0. If a day of the week is not present in the PDWEIGHTS statement, it is given a value of zero. Thus “sun=0” is not needed.

   proc x11 data=sales;
      monthly date=date tdregr=adjust;
      var sales;
      tables a1 a4 b15 b16 C14 C15 c18 d11;
      pdweights mon=1 tue=1 wed=1 thu=2 fri=2 sat=2;
      output out=x11out a1=a1 a4=a4 b1=b1 c14=c14
                        c16=c16 c18=c18 d11=d11;
   run;
Tables of interest include A1, A4, B15, B16, C14, C15, C18, and D11. Table A4 contains the adjustment factors derived from the prior daily weights; Table C14 contains the extreme irregular values excluded from trading-day regression; Table C15 contains the trading-day-regression results; Table C16 contains the monthly factors derived from the trading-day regression; and Table C18 contains the final trading-day factors derived from the combined daily weights. Finally, Table D11 contains the final seasonally adjusted series.
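The statement that only relative weights matter can be illustrated with a short sketch of the normalization to a sum of 7.0. This is an illustration of the arithmetic only, not PROC X11 code; the function name and the dictionary layout are hypothetical.

```python
DAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")


def normalize_daily_weights(weights):
    """Scale prior daily weights so that they sum to 7.0.

    Days that are not listed get a weight of zero, as described in the text.
    """
    full = {day: float(weights.get(day, 0.0)) for day in DAYS}
    total = sum(full.values())
    return {day: 7.0 * w / total for day, w in full.items()}


# Weights from the PDWEIGHTS statement in the example above.
normalized = normalize_daily_weights(
    {"mon": 1, "tue": 1, "wed": 1, "thu": 2, "fri": 2, "sat": 2}
)
```

Here the raw weights sum to 9, so Monday becomes 7/9 and Thursday 14/9; the two-to-one ratio between the groups of days is preserved.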
Adjustment for Prior Factors

Suppose now that a strike at ZXY Company during July and August of 1988 caused sales to decrease an estimated 50%. Since this is a one-time event with a known cause, it is appropriate to prior adjust the data to reflect the effects of the strike. This is done in PROC X11 through the use of PMFACTOR=varname (prior monthly factor) in the MONTHLY statement.

In the following example, the PMFACTOR variable is named PMF. Since the estimate of the decrease in sales is 50%, PMF has a value of 50.0 for the observations corresponding to July and August 1988, and a value of 100.0 for the remaining observations.

This prior adjustment on SALES is performed by replacing SALES with the calculated value (SALES/PMF) * 100.0. A value of 100.0 for PMF leaves SALES unchanged, while a value of 50.0 for PMF doubles SALES. This value is the estimate of what SALES would have been without the strike. The following example shows how this prior adjustment is accomplished.

   data sales2;
      set sales;
      if '01jul1988'd <= date <= '01aug1988'd then pmf = 50;
      else pmf = 100;
   run;

   proc x11 data=sales2;
      monthly date=date pmfactor=pmf;
      var sales;
      tables a1 a2 a3 d11;
      output out=x11out a1=a1 a2=a2 a3=a3 d11=d11;
   run;
Table A2 contains the prior monthly factors (the values of PMF), and Table A3 contains the prior adjusted series.
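The prior adjustment formula (SALES/PMF) * 100.0 can be checked with a few numbers. The sketch below only reproduces that arithmetic; the function name and the sample values are made up for illustration.

```python
def prior_adjust(sales, pmf):
    """Apply the prior monthly factor: (SALES / PMF) * 100.0 for each observation."""
    return [s / p * 100.0 for s, p in zip(sales, pmf)]


# Hypothetical values: the second observation falls in a strike month (PMF = 50).
adjusted = prior_adjust([200.0, 80.0], [100.0, 50.0])
```

The first value is left unchanged at 200, and the strike-month value of 80 is doubled to 160, matching the description of PMF values of 100.0 and 50.0.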
The YRAHEADOUT Option

For monthly data, the YRAHEADOUT option affects only Tables C16 (regression trading-day adjustment factors), C18 (trading-day factors from combined daily weights), and D10 (seasonal factors). For quarterly data, only Table D10 is affected. Variables for all other tables have missing values for the forecast observations. The forecast values for a table are included only if that table is specified in the OUTPUT statement.

Tables C16 and C18 are calendar effects that are extrapolated by calendar composition. These factors are independent of the data once trading-day weights have been calculated. Table D10 is extrapolated by a linear combination of past values. If N is the total number of nonmissing observations for the analysis variable, this linear combination is given by

$$D10_t = \frac{1}{2}\left(3 \times D10_{t-12} - D10_{t-24}\right), \quad t = N+1, \ldots, N+12$$
If the input data are monthly time series, 12 extra observations are added to the end of the output data set. (If a BY statement is used, 12 extra observations are added to the end of each BY group.) If the input data are a quarterly time series, four extra observations are added to the end of the output data set. (If a BY statement is used, four extra observations are added to each BY group.)

The DATE= variable (or _DATE_) is extrapolated for the extra observations generated by the YRAHEADOUT option, while all other ID variables have missing values.

If ARIMA processing is requested, and if both the OUTEXTRAP and YRAHEADOUT options are specified in the PROC X11 statement, an additional 12 (or 4) observations are added to the end of the output data set for monthly (or quarterly) data after the ARIMA forecasts, using the same linear combination of past values as before.
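The linear combination used to extrapolate Table D10 is simple enough to sketch directly. The function name is hypothetical. The `period` argument is an assumption: the formula in the text is written for monthly data (lags 12 and 24), and using lags 4 and 8 for quarterly data is a guess based on the statement that four observations are added.

```python
def extrapolate_d10(d10, period=12):
    """Extend seasonal factors one year ahead with 0.5 * (3 * D10[t-p] - D10[t-2p])."""
    n = len(d10)
    if n < 2 * period:
        raise ValueError("need at least two full years of seasonal factors")
    return [
        0.5 * (3.0 * d10[n + j - period] - d10[n + j - 2 * period])
        for j in range(period)
    ]


# Two years of made-up factors: 100 in the first year, 102 in the second.
ahead = extrapolate_d10([100.0] * 12 + [102.0] * 12)
```

Each extrapolated factor is 0.5 * (3 * 102 - 100) = 103, so the formula continues the most recent year-over-year movement at half its size.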
Effect of Backcast and Forecast Length

Based on a number of empirical studies (Dagum 1982a, b, c; Dagum and Laniel 1987), one year of forecasts minimizes revisions when new data become available. Two and three years of forecasts show only small gains.

Backcasting improves seasonal adjustment but introduces permanent revisions at the beginning of the series and also at the end for series of length 8, 9, or 10 years. For series shorter than 7 years, the advantages of backcasting outweigh the disadvantages (Dagum 1988).

Other studies (Pierce 1980; Bobbit and Otto 1990; Buszuwski 1987) suggest “full forecasting,” that is, using enough forecasts to allow symmetric weights for the seasonal moving averages for the most current data. For example, if a 3×9 seasonal moving average was specified for one or more months by using the MACURVES statement, five years of forecasts would be required. This is because the seasonal moving averages are performed on calendar months separately, and the 3×9 is an 11-term centered moving average, requiring five observations before and after the current observation. Thus

   macurves dec='3x9';

would require five additional December values to compute the seasonal moving average.
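The count of five forecast years follows from the shape of the 3×9 filter, which is a 3-term simple average applied to a 9-term simple average. The sketch below builds the combined weights by convolving the two averages; the function name is hypothetical and the code is only meant to show where the 11 terms come from.

```python
def compound_ma_weights(m, n):
    """Weights of an m-term simple average applied to an n-term simple average."""
    weights = [0.0] * (m + n - 1)
    for i in range(m):
        for j in range(n):
            weights[i + j] += 1.0 / (m * n)
    return weights


w = compound_ma_weights(3, 9)
half_width = (len(w) - 1) // 2   # observations needed on each side of the center
```

The result has 3 + 9 - 1 = 11 weights, so `half_width` is 5: five values of the same calendar month are needed after the current one for the symmetric filter to apply.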
Details of Model Selection

If an ARIMA statement is present but no MODEL= is given, PROC X11 estimates and forecasts five predefined models and selects the best. This section describes the details of the selection criteria and the selection process.

The five predefined models used by PROC X11 are the same as those used by X11ARIMA/88 from Statistics Canada. These particular models, shown in Table 31.2, were chosen on the basis of testing a large number of economics series (Dagum 1988) and should provide reasonable forecasts for most economic series.

Table 31.2  Five Predefined Models

   Model #   Specification      Multiplicative   Additive
   1         (0,1,1)(0,1,1)s    log transform    no transform
   2         (0,1,2)(0,1,1)s    log transform    no transform
   3         (2,1,0)(0,1,1)s    log transform    no transform
   4         (0,2,2)(0,1,1)s    log transform    no transform
   5         (2,1,2)(0,1,1)s    no transform     no transform
The selection process proceeds as follows. The five models are estimated and one-step-ahead forecasts are produced in the order shown in Table 31.2. As each model is estimated, the following three criteria are checked:

   The mean absolute percent error (MAPE) for the last three years of the series must be less than 15%.
   The significance probability for the Box-Ljung chi-square for up to lag 24 for monthly (8 for quarterly) must be greater than 0.05.
   The over-differencing criterion must not exceed 0.9.

The descriptions of these three criteria are given in the section “Criteria Details” on page 2069. The default values for these criteria are those used by X11ARIMA/88 from Statistics Canada; these defaults can be changed by the MAPECR=, CHICR=, and OVDIFCR= options.

A model that fails any one of these three criteria is excluded from further consideration. In addition, if the ARIMA estimation fails for a given model, a warning is issued, and the model is excluded. The final set of all models considered consists of those that pass all three criteria and are estimated successfully. From this set, the model with the smallest MAPE for the last three years is chosen.

If all five models fail, ARIMA processing is skipped for the variable being processed, and the standard X-11 seasonal adjustment is performed. A note is written to the log with this information.
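The selection rule can be summarized in a few lines. This is a sketch under stated assumptions, not the PROC X11 implementation: the record layout (keys `estimated`, `mape`, `chi_prob`, `ovdif`) and the function name are hypothetical, and `ovdif` stands for the largest MA coefficient sum that the over-differencing test examines.

```python
def select_model(results, mape_cr=15.0, chi_cr=0.05, ovdif_cr=0.9):
    """Pick the passing model with the smallest MAPE, or None if all models fail.

    results: one dict per predefined model, in the order of Table 31.2.
    """
    passing = [
        r for r in results
        if r["estimated"]                 # estimation succeeded
        and r["mape"] < mape_cr           # MAPE over the last three years
        and r["chi_prob"] > chi_cr        # Box-Ljung significance probability
        and r["ovdif"] <= ovdif_cr        # over-differencing criterion
    ]
    if not passing:
        return None                       # fall back to standard X-11
    return min(passing, key=lambda r: r["mape"])


# Made-up results for three of the five models.
candidates = [
    {"name": "(0,1,1)(0,1,1)s", "estimated": True, "mape": 4.2, "chi_prob": 0.02, "ovdif": 0.5},
    {"name": "(0,1,2)(0,1,1)s", "estimated": True, "mape": 5.1, "chi_prob": 0.30, "ovdif": 0.7},
    {"name": "(2,1,0)(0,1,1)s", "estimated": True, "mape": 4.8, "chi_prob": 0.40, "ovdif": 0.6},
]
best = select_model(candidates)
```

In this made-up case the first model has the smallest MAPE but fails the Box-Ljung criterion, so the third model is chosen.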
The chosen model is then used to forecast the series one or more years (determined by the FORECAST= option in the ARIMA statement). These forecasts are appended to the original data (or the prior and calendar-adjusted data).

If a BACKCAST= option is specified, the chosen model form is used, but the parameters are reestimated using the reversed series. Using these parameters, the reversed series is forecast for the number of years specified by the BACKCAST= option. These forecasts are then reversed and appended to the beginning of the original series, or the prior and calendar-adjusted series, to produce the backcasts.

Note that the final selection rule (the smallest MAPE using the last three years) emphasizes the quality of the forecasts at the end of the series. This is consistent with the purpose of the X-11-ARIMA methodology, which is to improve the estimates of seasonal factors and thus minimize revisions to recent past data as new data become available.
Criteria Details

Mean Absolute Percent Error (MAPE)

For the MAPE criterion, only the last three years of the original series (or prior and calendar adjusted series) are used in computing the MAPE. Let $y_t$, t = 1, ..., n, be the last three years of the series, and denote its one-step-ahead forecast by $\hat{y}_t$, where n = 36 for a monthly series and n = 12 for a quarterly series. With this notation, the MAPE criterion is computed as

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{|y_t|}$$
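The MAPE formula translates directly into code. The following sketch assumes the caller has already restricted the inputs to the last three years and that no actual value is zero; the function name is illustrative.

```python
def mape(actual, forecast):
    """Mean absolute percent error of one-step-ahead forecasts, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs(y - f) / abs(y) for y, f in zip(actual, forecast))
```

For actual values 100 and 200 with forecasts 110 and 180, each relative error is 0.10, so the MAPE is 10%, which would pass the default 15% criterion.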
Box-Ljung Chi-Square

The Box-Ljung chi-square is a lack-of-fit test based on the model residuals. This test statistic is computed using the Ljung-Box formula

$$\chi^2_m = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k}$$

where n is the number of residuals that can be computed for the time series, and

$$r_k = \frac{\sum_{t=1}^{n-k} a_t a_{t+k}}{\sum_{t=1}^{n} a_t^2}$$

where the $a_t$'s are the residual sequence. This formula has been suggested by Ljung and Box (1978) as yielding a better fit to the asymptotic chi-square distribution. Some simulation studies of the finite sample properties of this statistic are given by Davies, Triggs, and Newbold (1977) and by Ljung and Box (1978). For monthly series, m = 24, while for quarterly series, m = 8.
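A plain-Python sketch of the statistic may help when checking values by hand. It follows the two formulas above literally (the residuals are not mean-centered) and returns only the chi-square value; turning it into the significance probability requires a chi-square distribution function, which is left out here. The function name is hypothetical.

```python
def ljung_box(residuals, m):
    """Ljung-Box statistic n(n+2) * sum_{k=1..m} r_k^2 / (n - k)."""
    n = len(residuals)
    denom = sum(a * a for a in residuals)
    total = 0.0
    for k in range(1, m + 1):
        # Lag-k residual autocorrelation, using the uncentered form from the text.
        r_k = sum(residuals[t] * residuals[t + k] for t in range(n - k)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total
```

For the four residuals 1, -1, 1, -1 and m = 1, the lag-1 autocorrelation is -0.75, giving 24 * (0.5625 / 3) = 4.5.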
Over-Differencing Test

From Table 31.2 you can see that all models have a single seasonal MA factor and at most two nonseasonal MA factors. Also, all models have seasonal and nonseasonal differencing. Consider model 2 applied to a monthly series $y_t$ with $E(y_t) = \mu$:

$$(1 - B)(1 - B^{12})(y_t - \mu) = (1 - \theta_1 B - \theta_2 B^2)(1 - \theta_3 B^{12}) a_t$$

If $\theta_3 = 1.0$, then the factors $(1 - \theta_3 B^{12})$ and $(1 - B^{12})$ will cancel, resulting in a lower-order model.

Similarly, if $\theta_1 + \theta_2 = 1.0$,

$$(1 - \theta_1 B - \theta_2 B^2) = (1 - B)(1 - \alpha B)$$

for some $\alpha \neq 0.0$. Again, this results in cancellation and a lower-order model.

Since the parameters are not exact, it is not reasonable to require that

$$\theta_3 < 1.0 \quad \text{and} \quad \theta_1 + \theta_2 < 1.0$$

Instead, an approximate test is performed by requiring that

$$\theta_3 \le 0.9 \quad \text{and} \quad \theta_1 + \theta_2 \le 0.9$$

The default value of 0.9 can be changed by the OVDIFCR= option. Similar reasoning applies to the other models.
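The approximate test amounts to comparing sums of MA coefficients with the OVDIFCR= value. The sketch below illustrates this for model 2; the function name and argument layout are hypothetical. The last lines also check the factorization numerically: expanding (1 - B)(1 - αB) gives θ1 = 1 + α and θ2 = -α, whose sum is 1.

```python
def passes_overdifferencing(nonseasonal_ma, seasonal_ma, ovdif_cr=0.9):
    """True when both MA coefficient sums are at or below the criterion."""
    return sum(nonseasonal_ma) <= ovdif_cr and sum(seasonal_ma) <= ovdif_cr


# Model 2 with made-up estimates theta1 = 0.5, theta2 = 0.3, theta3 = 0.6.
ok = passes_overdifferencing([0.5, 0.3], [0.6])

# Coefficients implied by (1 - B)(1 - alpha * B) for an arbitrary alpha.
alpha = 0.4
theta1, theta2 = 1.0 + alpha, -alpha
unit_root_case = passes_overdifferencing([theta1, theta2], [0.6])
```

The first call passes because 0.5 + 0.3 = 0.8 and 0.6 are both below 0.9; the second fails because the nonseasonal coefficients sum to 1, which is the cancellation case described above.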
ARIMA Statement Options for the Five Predefined Models

Table 31.3 lists the five predefined models and gives the equivalent MODEL= parameters in a PROC X11 ARIMA statement. In all models except the fifth, a log transformation is performed before the ARIMA estimation for the multiplicative case; no transformation is performed for the additive case. For the fifth model, no transformation is done for either case.

The multiplicative case is assumed in the following table. The indicated seasonality s in the specification is either 12 (monthly) or 4 (quarterly). The MODEL statement assumes a monthly series.

Table 31.3  ARIMA Statement Options for Predefined Models

   Model              ARIMA Statement Options
   (0,1,1)(0,1,1)s    MODEL=( Q=1 SQ=1 DIF=1 SDIF=1 ) TRANSFORM=LOG
   (0,1,2)(0,1,1)s    MODEL=( Q=2 SQ=1 DIF=1 SDIF=1 ) TRANSFORM=LOG
   (2,1,0)(0,1,1)s    MODEL=( P=2 SQ=1 DIF=1 SDIF=1 ) TRANSFORM=LOG
   (0,2,2)(0,1,1)s    MODEL=( Q=2 SQ=1 DIF=2 SDIF=1 ) TRANSFORM=LOG
   (2,1,2)(0,1,1)s    MODEL=( P=2 Q=2 SQ=1 DIF=1 SDIF=1 )
OUT= Data Set

The OUT= data set specified in the OUTPUT statement contains the BY variables, if any; the ID variables, if any; and the DATE= variable if the DATE= option is given, or _DATE_ if the DATE= option is not specified. In addition, the variables specified by the option

   tablename=var1 var2 ... varn

are placed in the OUT= data set. A list of tables available for monthly and quarterly series is given later, in Table 31.4.
The OUTSPAN= Data Set

The OUTSPAN= option is specified in the PROC statement, and writes the sliding spans results to the specified output data set. The OUTSPAN= data set contains the following variables:

   A1, a numeric variable that is a copy of the original series truncated to the current span. Note that overlapping spans will contain identical values for this variable.
   C18, a numeric variable that contains the trading-day factors for the seasonal adjustment for the current span
   D10, a numeric variable that contains the seasonal factors for the seasonal adjustment for the current span
   D11, a numeric variable that contains the seasonally adjusted series for the current span
   DATE, a numeric variable that contains the date within the current span
   SPAN, a numeric variable that contains the current span. The first span is the earliest span, that is, the one with the earliest starting date.
   VARNAME, a character variable containing the name of each variable in the VAR list. A separate sliding spans analysis is performed on each variable in the VAR list.
OUTSTB= Data Set

The output data set produced by the OUTSTB= option of the PROC X11 statement contains the information in the analysis of variance on table D8 (Final Unmodified S-I Ratios). This analysis of variance, following table D8 in the printed output, tests for stable seasonality (Shiskin, Young, and Musgrave 1967, Appendix A). These data contain the following variables:

   VARNAME, a character variable containing the name of each variable in the VAR list
   TABLE, a character variable specifying the table from which the analysis of variance is performed. When ARIMA processing is requested, and two passes of X11 are required (when TDREGR=PRINT, TEST, or ADJUST), Table D8 and the stable seasonality test are computed twice: once in the initial pass, then again in the final pass. Both of these computations are put in the OUTSTB data set and are identified by D18.1 and D18.2, respectively.
   SOURCE, a character variable corresponding to the “source” column in the analysis of variance table following Table D8
   SS, a numeric variable containing the sum of squares associated with the corresponding source term
   DF, a numeric variable containing the degrees of freedom associated with the corresponding source term
   MS, a numeric variable containing the mean square associated with the corresponding source term. MS is missing for the source term “Total.”
   F, a numeric variable containing the F statistic for the “Between” source term. F is missing for all other source terms.
   PROBF, a numeric variable containing the significance level for the F statistic. PROBF is missing for the source terms “Total” and “Error.”
OUTTDR= Data Set

The trading-day regression results (tables B15 and C15) are written to the OUTTDR= data set, which contains the following variables:

   VARNAME, a character variable containing the name of the VAR variable being processed
   TABLE, a character variable containing the name of the table. It can have only the value B15 (Preliminary Trading-Day Regression) or C15 (Final Trading-Day Regression).
   _TYPE_, a character variable whose value distinguishes the three distinct table format types. These types are (a) the regression, (b) the listing of the standard error associated with length-of-month, and (c) the analysis of variance. The first seven observations in the OUTTDR data set correspond to the regression on days of the week; thus the _TYPE_ variable is given the value “REGRESS” (day-of-week regression coefficient). The next four observations correspond to 31-, 30-, 29-, and 28-day months and are given the value _TYPE_=LOM_STD (length-of-month standard errors). Finally, the last three observations correspond to the analysis of variance table, and _TYPE_=ANOVA.
   PARM, a character variable, further identifying the nature of the observation. PARM is set to blank for the three _TYPE_=ANOVA observations.
   SOURCE, a character variable containing the source in the regression. This variable is missing for all _TYPE_=REGRESS and LOM_STD.
   CWGT, a numeric variable containing the combined trading-day weight (prior weight + weight found from regression). The variable is missing for all _TYPE_=LOM_STD and _TYPE_=ANOVA.
   PRWGT, a numeric variable containing the prior weight. The prior weight is 1.0 if PDWEIGHTS are not specified. This variable is missing for all _TYPE_=LOM_STD and _TYPE_=ANOVA.
   COEFF, a numeric variable containing the calculated regression coefficient for the given day. This variable is missing for all _TYPE_=LOM_STD and _TYPE_=ANOVA.
   STDERR, a numeric variable containing the standard errors. For observations with _TYPE_=REGRESS, this is the standard error corresponding to the regression coefficient. For observations with _TYPE_=LOM_STD, this is the standard error for the corresponding length-of-month. This variable is missing for all _TYPE_=ANOVA.
   T1, a numeric variable containing the t statistic corresponding to the test that the combined weight is different from the prior weight. This variable is missing for all _TYPE_=LOM_STD and _TYPE_=ANOVA.
   T2, a numeric variable containing the t statistic corresponding to the test that the combined weight is different from 1.0. This variable is missing for all _TYPE_=LOM_STD and _TYPE_=ANOVA.
   PROBT1, a numeric variable containing the significance level for t statistic T1. The variable is missing for all _TYPE_=LOM_STD and _TYPE_=ANOVA.
   PROBT2, a numeric variable containing the significance level for t statistic T2. The variable is missing for all _TYPE_=LOM_STD and _TYPE_=ANOVA.
   SS, a numeric variable containing the sum of squares associated with the corresponding source term. This variable is missing for all _TYPE_=REGRESS and LOM_STD.
   DF, a numeric variable containing the degrees of freedom associated with the corresponding source term. This variable is missing for all _TYPE_=REGRESS and LOM_STD.
   MS, a numeric variable containing the mean square associated with the corresponding source term. This variable is missing for the source term ‘Total’ and for all _TYPE_=REGRESS and LOM_STD.
   F, a numeric variable containing the F statistic for the ‘Regression’ source term. The variable is missing for the source terms ‘Total’ and ‘Error’, and for all _TYPE_=REGRESS and LOM_STD.
   PROBF, a numeric variable containing the significance level for the F statistic. This variable is missing for the source terms ‘Total’ and ‘Error’ and for all _TYPE_=REGRESS and LOM_STD.
Printed Output

The output from PROC X11, both printed tables and the series written to the OUT= data set, depends on whether the data are monthly or quarterly. For the printed tables, the output depends further on the value of the PRINTOUT= option and the TABLES statement, along with other options specified.

The printed output is organized into tables identified by a part letter and a sequence number within the part. The seven major parts of the X11 procedure are as follows:

   A   prior adjustments (optional)
   B   preliminary estimates of irregular component weights and regression trading-day factors
   C   final estimates of irregular component weights and regression trading-day factors
   D   final estimates of seasonal, trend cycle, and irregular components
   E   analytical tables
   F   summary measures
   G   charts
Table 31.4 describes the individual tables and charts. Most tables apply both to quarterly and monthly series. Those that apply only to a monthly time series are indicated by an “M” in the notes section, while “P” indicates the table is not a time series, and is only printed, not output to the OUT= data set.

Table 31.4  Table Names and Descriptions

   Table   Notes   Description
   A1      M       original series
   A2      M       prior monthly adjustment factors
   A3      M       original series adjusted for prior monthly factors
   A4      M       prior trading-day adjustments
   A5      M       prior adjusted or original series
   A13             ARIMA forecasts
   A14             ARIMA backcasts
   A15             prior adjusted or original series extended by ARIMA backcasts and forecasts
   B1              prior adjusted or original series
   B2              trend cycle
   B3              unmodified seasonal-irregular (S-I) ratios
   B4              replacement values for extreme S-I ratios
   B5              seasonal factors
   B6              seasonally adjusted series
   B7              trend cycle
   B8              unmodified S-I ratios
   B9              replacement values for extreme S-I ratios
   B10             seasonal factors
   B11             seasonally adjusted series
   B13             irregular series
   B14     M       extreme irregular values excluded from trading-day regression
   B15     M,P     preliminary trading-day regression
   B16     M       trading-day adjustment factors
   B17             preliminary weights for irregular components
   B18     M       trading-day factors derived from combined daily weights
   B19     M       original series adjusted for trading-day and prior variation
   C1              original series modified by preliminary weights and adjusted for trading-day and prior variation
   C2              trend cycle
   C4              modified S-I ratios
   C5              seasonal factors
   C6              seasonally adjusted series
   C7              trend cycle
   C9              modified S-I ratios
   C10             seasonal factors
   C11             seasonally adjusted series
   C13             irregular series
   C14     M       extreme irregular values excluded from trading-day regression
   C15     M,P     final trading-day regression
   C16     M       final trading-day adjustment factors derived from regression coefficients
   C17             final weight for irregular components
   C18     M       final trading-day factors derived from combined daily weights
   C19     M       original series adjusted for trading-day and prior variation
   D1              original series modified for final weights and adjusted for trading-day and prior variation
   D2              trend cycle
   D4              modified S-I ratios
   D5              seasonal factors
   D6              seasonally adjusted series
   D7              trend cycle
   D8              final unmodified S-I ratios
   D9              final replacement values for extreme S-I ratios
   D10             final seasonal factors
   D11             final seasonally adjusted series
   D12             final trend cycle
   D13             final irregular series
   E1              original series with outliers replaced
   E2              modified seasonally adjusted series
   E3              modified irregular series
   E4      P       ratios of annual totals
   E5              percent changes in original series
   E6              percent changes in final seasonally adjusted series
   F1              MCD moving average
   F2      P       summary measures
   G1      P       chart of final seasonally adjusted series and trend cycle
   G2      P       chart of S-I ratios with extremes, S-I ratios without extremes, and final seasonal factors
   G3      P       chart of S-I ratios with extremes, S-I ratios without extremes, and final seasonal factors in calendar order
   G4      P       chart of final irregular and final modified irregular series
The PRINTOUT= Option

The PRINTOUT= option controls printing for groups of tables. See the “TABLES Statement” on page 2056 for details on specifying individual tables. The following list gives the tables printed for each value of the PRINTOUT= option:

   STANDARD (26 tables)   A1–A4, B1, C13–C19, D8–D13, E1–E6, F1, F2
   LONG (40 tables)       A1–A5, A13–A15, B1, B2, B7, B10, B13–B15, C1, C7, C10, C13–C19, D1, D7–D11, D13, E1–E6, F1, F2
   FULL (62 tables)       A1–A5, A13–A15, B1–B11, B13–B19, C1–C11, C13–C19, D1, D2, D4–D12, E1–E6, F1, F2
The actual number of tables printed depends on the options and statements specified. If a table is not computed, it is not printed. For example, if TDREGR=NONE is specified, none of the tables associated with the trading-day are printed.
The CHARTS= Option

Of the four charts listed in Table 31.4, G1 and G2 are printed by default (CHARTS=STANDARD). Charts G3 and G4 are printed when CHARTS=FULL is specified. See the “TABLES Statement” on page 2056 for details on specifying individual charts.
Stable, Moving, and Combined Seasonality Tests on the Final Unmodified SI Ratios (Table D8)

PROC X11 displays four tests used to identify stable seasonality and moving seasonality and to measure identifiable seasonality. These tests are displayed after Table D8. They are “Stable Seasonality Test,” “Moving Seasonality Test,” “Nonparametric Test for the Presence of Seasonality Assuming Stability,” and “Summary of Results and Combined Test for the Presence of Identifiable Seasonality.” The motivation, interpretation, and statistical details of all these tests are now given.
Motivation
The seasonal component of this time series, $S_t$, is defined as the intrayear variation that is repeated constantly (stable) or in an evolving fashion from year to year (moving seasonality). If the increase in the seasonal factors from year to year is too large, then the seasonal factors will introduce distortion into the model. It is important to determine if seasonality is identifiable without distorting the series.

To determine if stable seasonality is present in a series, PROC X11 computes a one-way analysis of variance by using the seasons (months or quarters) as the factor on the Final Unmodified SI Ratios (Table D8). This is the appropriate table to use because the removal of the trend cycle is equivalent to detrending. PROC X11 prints this test, labeled “Stable Seasonality Test,” immediately after Table D8.

The X11 seasonal adjustment method tests for moving seasonality. Moving seasonality can be a source of distortion when seasonal factors are used in the model. PROC X11 computes and prints a test for moving seasonality. The test is a two-way analysis of variance that uses months (or quarters) and years. As in the “Stable Seasonality Test,” this analysis of variance is performed on the Final Unmodified SI Ratios (Table D8). PROC X11 prints this test, labeled “Moving Seasonality Test,” after the “Stable Seasonality Test.”

PROC X11 next computes a nonparametric Kruskal-Wallis chi-squared test for stable seasonality, “Nonparametric Test for the Presence of Seasonality Assuming Stability.” The Kruskal-Wallis test is performed on the ranks of the Final Unmodified SI Ratios (Table D8). For further details about the Kruskal-Wallis test, see Lehmann (1998, pp. 204–210).
The results of the preceding three tests are combined into a joint test to measure identifiable seasonality, “Summary of Results and Combined Test for the Presence of Identifiable Seasonality.” This test combines the two F tests previously described, along with the Kruskal-Wallis chi-squared test for stable seasonality, to determine “identifiable” seasonality. This test is printed after “Nonparametric Test for the Presence of Seasonality Assuming Stability.”
Interpretation and Statistical Details
The “Stable Seasonality Test” is a one-way analysis of variance on the Final Unmodified SI Ratios (Table D8) with seasons (months or quarters) as the factor. Table D8 is the appropriate table to use because the removal of the trend cycle is similar to detrending. A large F statistic and a small significance level are evidence that a significant amount of variation in the SI ratios is due to months or quarters, which in turn is evidence of seasonality; the null hypothesis of no month/quarter effect is rejected. Conversely, a small F statistic and a large significance level (close to 1.0) are evidence that variation due to month or quarter could be due to random error, and the null hypothesis of no month/quarter effect is not rejected. The interpretation and utility of seasonal adjustment are problematic under such conditions.
The F test for moving seasonality is performed by a two-way analysis of variance. The two factors are seasons (months or quarters) and years. The years effect is tested separately; the null hypothesis is no effect due to years after accounting for variation due to months or quarters. For further details about the moving seasonality test, see Lothian (1984a, b, 1978) and Higginson (1975).
The significance levels reported in both the moving and stable seasonality tests are only approximate. Table D8, the Final Unmodified SI Ratios, is constructed from an averaging operation that induces a correlation in the residuals from which the F test is computed. Hence the computed F statistic differs from an exact F statistic; see Cleveland and Devlin (1980) for details.
The test for identifiable seasonality is performed by combining the F tests for stable and moving seasonality, along with a Kruskal-Wallis test for stable seasonality. The following description is based on Lothian and Morry (1978b); other details can be found in Dagum (1988, 1983). Let $F_s$ and $F_m$ denote the F values for the stable and moving seasonality tests, respectively. The combined test is performed as shown in Figure 31.5 and as follows:
1. If the null hypothesis in the stable seasonality test is not rejected at the 0.10% significance level (0.001), this is an indication that the series is not seasonal. PROC X11 displays “Identifiable Seasonality Not Present.”
2. If the null hypothesis in step 1 is rejected, then compute the following quantities:
\[ T_1 = \frac{7}{F_m}, \qquad T_2 = \frac{3 F_m}{F_s} \]
Let $T$ denote the simple average of $T_1$ and $T_2$:
\[ T = \frac{T_1 + T_2}{2} \]
If the moving seasonality null hypothesis is not rejected at the 5.0% significance level (0.05) and if $T \ge 1.0$, the null hypothesis of identifiable seasonality not present is accepted and PROC X11 displays “Identifiable Seasonality Not Present.”
3. If the null hypothesis of identifiable seasonality not present has not been accepted, but $T_1 \ge 1.0$, $T_2 \ge 1.0$, or the Kruskal-Wallis chi-squared test fails at the 0.10% significance level (0.001), then PROC X11 displays “Identifiable Seasonality Probably Not Present.”
4. If the $F_s$ and Kruskal-Wallis chi-squared tests pass, and if none of the combined measures described in steps 2 and 3 fail, then the null hypothesis of identifiable seasonality not present is rejected, and PROC X11 displays “Identifiable Seasonality Present.”
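The four steps above can be sketched as a small decision function. This Python sketch follows the steps exactly as they are worded in this section; the function name, the argument names, and the use of p-values as inputs are assumptions made for illustration. It also assumes that "not rejected at level α" means p ≥ α and that the Kruskal-Wallis test "fails" when its p-value is at least 0.001. It is not PROC X11's code.

```python
def combined_seasonality_test(f_stable, f_moving, p_stable, p_moving, p_kw):
    """Return the message for the combined identifiable-seasonality test (sketch)."""
    # Step 1: stable seasonality F test not rejected at the 0.1% level.
    if p_stable >= 0.001:
        return "Identifiable Seasonality Not Present"

    # Step 2: combined measures from the two F values.
    t1 = 7.0 / f_moving
    t2 = 3.0 * f_moving / f_stable
    t = (t1 + t2) / 2.0
    if p_moving >= 0.05 and t >= 1.0:
        return "Identifiable Seasonality Not Present"

    # Step 3: any individual measure fails, or Kruskal-Wallis fails at 0.1%.
    if t1 >= 1.0 or t2 >= 1.0 or p_kw >= 0.001:
        return "Identifiable Seasonality Probably Not Present"

    # Step 4: everything passed.
    return "Identifiable Seasonality Present"


# Hypothetical test results, for illustration only.
print(combined_seasonality_test(100.0, 10.0, 1e-8, 0.001, 1e-8))
```

Keeping each step as an early return mirrors the numbered list, so the first condition that applies determines the displayed message.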
Figure 31.5 Combined Seasonality Test Flowchart
Tables Written to the OUT= Data Set
All tables that are time series can be written to the OUT= data set. However, depending on the specified options and statements, not all tables are computed. When a table is not computed, but is requested in the OUTPUT statement, the resulting variable has all missing values. For example, if the PMFACTOR= option is not specified, Table A2 is not computed, and requesting this table in the OUTPUT statement results in the corresponding variable having all missing values. The trading-day regression results, Tables B15 and C15, although not written to the OUT= data set, can be written to an output data set; see the OUTTDR= option for details.
Printed Output Generated by Sliding Spans Analysis
Table S 0.A
Table S 0.A gives the variable name, the length and number of spans, and the beginning and ending dates of each span.
Table S 0.B
Table S 0.B gives the summary of the two F tests performed during the standard X11 seasonal adjustments for stable and moving seasonality on Table D8, the final SI ratios. These tests are described in the section “Printed Output” on page 2074.
Table S 1.A
Table S 1.A gives the range analysis of seasonal factors. This includes the means for each month (or quarter) within a span, the maximum percentage difference across spans for each month, and the average. The minimum and maximum within a span are also indicated. For example, for a monthly series and an analysis with four spans, the January row would contain a column for each span, with the value representing the average seasonal factor (Table D10) over all January calendar months occurring within the span. Beside each span column is a character column with either a MIN, MAX, or blank value, indicating which calendar month had the minimum and maximum value over that span. Denote the average over the $j$th calendar month in span $k$, $k = 1, \ldots, 4$, by $\bar{S}_j(k)$; then the maximum percent difference (MPD) for month $j$ is defined by
\[ \mathrm{MPD}_j = \frac{\max_{k=1,\ldots,4} \bar{S}_j(k) - \min_{k=1,\ldots,4} \bar{S}_j(k)}{\min_{k=1,\ldots,4} \bar{S}_j(k)} \]
The last numeric column of Table S 1.A is the average value over all spans for each calendar month, with the minimum and maximum row flagged as in the span columns.
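The MPD definition above can be checked with a few lines of Python. The function name and the four span averages below are hypothetical values chosen only to illustrate the arithmetic.

```python
def max_percent_difference(span_values):
    """MPD for one calendar month: (max - min) / min across the spans."""
    lowest = min(span_values)
    highest = max(span_values)
    return (highest - lowest) / lowest


# Hypothetical average January seasonal factors in four spans.
january_means = [98.0, 99.0, 100.0, 101.0]
mpd = max_percent_difference(january_means)
print(f"{100.0 * mpd:.2f}%")  # expressed as a percentage
```

With these values the difference between the largest and smallest span average is 3.0, so the MPD is 3.0/98.0, slightly above a 3.0% cutoff.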
Table S 1.B
Table S 1.B gives a summary of range measures for each span. The first column, Range Means, is calculated by computing the maximum and minimum over all months or quarters in a span, then taking the difference. The next column is the range ratio means, which is simply the ratio of the previously described maximum and minimum. The next two columns are the minimum and maximum seasonal factors over the entire span, while the range sf column is the difference of these. Finally, the last column is the ratio of the Max SF and Min SF columns.
Breakdown Tables
Table S 2.A.1 begins the breakdown analysis for the various series considered in the sliding spans analysis. The key concept here is the MPD described above in the section “Table S 1.A” on page 2080 and in the section “Computational Details for Sliding Spans Analysis” on page 2062. For a month or quarter that appears in two or more spans, the maximum percentage difference is computed and tested against a cutoff level. If it exceeds this cutoff, it is counted as an instance of exceeding the level. It is of interest to see if such instances fall disproportionately in certain months and years. Tables S 2.A.1 through S 6.A.3 display this breakdown for all series considered.
Table S 2.A.1
Table S 2.A.1 gives the monthly (quarterly) breakdown for the seasonal factors (Table D10). The first column identifies the month or quarter. The next column is the number of times the MPD for D10 exceeded 3.0%, followed by the total count. The last column is the average maximum percentage difference for the corresponding month or quarter.
Table S 2.A.2
Table S 2.A.2 gives the same information as Table S 2.A.1, but on a yearly basis.
Table S 2.A.3
The description of Table S 2.A.3 requires the definition of “Sign Change” and “Turning Point.” First, some motivation. Recall that for a highly stable series, adding or deleting a small number of observations should not affect the estimation of the various components of a seasonal adjustment procedure. Consider Table D10, the seasonal factors in a sliding spans analysis that uses four spans. For a given observation $t$, looking across the four spans, we can easily pick out large differences if they occur. More subtle differences can occur when estimates go from above to below (or vice versa) a base level. In the case of a multiplicative model, the seasonal factors have a base level of 100.0. So it is useful to enumerate those instances where both a large change (an MPD value exceeding 3.0%) and a change of sign (with respect to the base) occur.
Let $B$ denote the base value (which in general depends on the component being considered and the model type, multiplicative or additive). If, for span 1, $S_t(1)$ is below $B$ (that is, $S_t(1) - B$ is negative) and for some subsequent span $k$, $S_t(k)$ is above $B$ (that is, $S_t(k) - B$ is positive), then a positive “Change in Sign” has occurred at observation $t$. Similarly, if, for span 1, $S_t(1)$ is above $B$, and for some subsequent span $k$, $S_t(k)$ is below $B$, then a negative “Change in Sign” has occurred. Both cases, positive or negative, constitute a “Change in Sign”; the actual direction is indicated in Tables S 7.A through S 7.E, which are described below.
Another behavior of interest occurs when component estimates increase then decrease (or vice versa) across spans for a given observation. Using the preceding example, the seasonal factors at observation $t$ could first increase, then decrease across the four spans. This behavior, combined with an MPD exceeding the level, is of interest in questions of stability.
Again, consider Table D10, the seasonal factors in a sliding spans analysis that uses four spans. For a given observation $t$ (contained in at least three spans), note the level of D10 for the first span. Continue across the spans until a difference of 1.0% or greater occurs (or no more spans are left), noting whether the difference is up or down. If the difference is up, continue until a difference of 1.0% or greater occurs downward (or no more spans are left). If such an up-down combination occurs, the observation is counted as an up-down turning point. A down-up turning point is defined similarly. Tables S 7.A through S 7.E, described below, show the occurrence of turning points, indicating whether up-down or down-up. Note that it requires at least three spans to test for a turning point. Hence Tables S 2.A.3 through S 6.A.3 show a reduced number in the “Turning Point” row for the “Total Tested” column, and in Tables S 7.A through S 7.E, the turning point symbols can occur only where three or more spans overlap.
With these descriptions of sign change and turning point, we now describe Table S 2.A.3.
The first column gives the type or category, the second column gives the total number of observations falling into the category, the third column gives the total number tested, and the last column gives the percentage for the number found in the category. The first category (row) of the table is for flagged observations, that is, those observations where the MPD exceeded the appropriate cutoff level (3.0% is the default for the seasonal factors). The second category is for sign changes, while the third category is for turning points. The fourth category is for flagged sign changes, that is, for those observations that are sign changes, how many are also flagged. Note that the total tested column for this category equals the number found for sign change, reflecting the definition of the fourth category. The fifth category is for flagged turning points, that is, for those observations that are turning points, how many are also flagged. The footnote to Table S 2.A.3 gives the U.S. Census Bureau recommendation for thresholds, as described in the section “Computational Details for Sliding Spans Analysis” on page 2062.
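The sign change and turning point definitions used in Table S 2.A.3 can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions rather than statements from this section: a positive change in sign is reported as "U" and a negative one as "D", and each 1.0% difference is measured relative to the most recently noted level (the first span, then the running extreme of the first move). The turning point sketch assumes a multiplicative component with values near 100, so the reference level is never zero.

```python
def sign_change(values, base=100.0):
    """Return 'U', 'D', or '' by comparing span 1 with later spans around the base."""
    first = values[0] - base
    for v in values[1:]:
        later = v - base
        if first < 0 and later > 0:
            return "U"  # below the base in span 1, above it in a later span
        if first > 0 and later < 0:
            return "D"  # above the base in span 1, below it in a later span
    return ""


def turning_point(values, threshold=1.0):
    """Return 'U-D', 'D-U', or '' for one observation's estimates across spans."""
    if len(values) < 3:
        return ""  # at least three spans are needed to test for a turning point
    ref = values[0]
    direction = 0  # 0 = no move noted yet, +1 = up, -1 = down
    for v in values[1:]:
        change = 100.0 * (v - ref) / abs(ref)
        if direction == 0:
            if abs(change) >= threshold:
                direction = 1 if change > 0 else -1
                ref = v
        elif direction * change <= -threshold:
            return "U-D" if direction == 1 else "D-U"
        elif direction * change > 0:
            ref = v  # the move continues; track the new extreme
    return ""


print(sign_change([99.5, 99.8, 100.3, 100.1]))      # hypothetical seasonal factors
print(turning_point([100.0, 102.0, 100.5, 100.4]))  # hypothetical seasonal factors
```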
Table S 2.B
Table S 2.B gives the histogram of flagged observations for the seasonal factors (Table D10), using the appropriate cutoff value (default 3.0%). This table looks at the spread of the MPD values that exceeded the corresponding level. The range is divided into four intervals: 3.0%–4.0%, 4.0%–5.0%, 5.0%–6.0%, and greater than 6.0%. The first column shows the symbol used in Table S 7.A, the second column gives the range in interval notation, and the last column gives the number found in the corresponding interval. Note that the sum of the last column should agree with the “Number Found” column of the “Flagged MPD” row in Table S 2.A.3.
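A minimal Python sketch of this binning follows. The function name is hypothetical, the MPD values are made up for illustration, and the treatment of interval endpoints (a value exactly on a boundary goes to the higher interval) is an assumption, because this section does not say which ends are closed.

```python
def mpd_histogram(mpd_percent, cutoff=3.0, width=1.0, n_bounded=3):
    """Count flagged MPD values in three bounded intervals plus an open-ended one."""
    counts = [0] * (n_bounded + 1)
    for value in mpd_percent:
        if value <= cutoff:
            continue  # not flagged: the MPD did not exceed the cutoff
        index = min(int((value - cutoff) // width), n_bounded)
        counts[index] += 1
    return counts


# Hypothetical MPD values in percent.
mpd_values = [2.0, 3.5, 3.9, 4.2, 5.5, 6.0, 9.1]
histogram = mpd_histogram(mpd_values)
print(histogram, sum(histogram))
```

The sum of the counts is the number of flagged observations, which is the quantity that should agree with the “Flagged MPD” row of Table S 2.A.3.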
Table S 2.C
Table S 2.C gives selected percentiles for the MPD for the seasonal factors (Table D10).
Tables S 3.A.1 through S 3.A.3
These tables relate to the trading-day factors (Table C18) and follow the same format as Tables S 2.A.1 through S 2.A.3. The only difference between these tables and Tables S 2.A.1 through S 2.A.3 is the default cutoff value of 2.0% instead of the 3.0% used for the seasonal factors.
Tables S 3.B, S 3.C
These tables, applied to the trading-day factors (Table C18), have the same format as Tables S 2.B and S 2.C. The default cutoff value is different, with corresponding differences in the intervals in Table S 3.B.
Tables S 4.A.1 through S 4.A.3
These tables relate to the seasonally adjusted series (Table D11) and follow the same format as Tables S 2.A.1 through S 2.A.3. The same default cutoff value of 3.0% is used.
Tables S 4.B, S 4.C
These tables, applied to the seasonally adjusted series (Table D11), have the same format as Tables S 2.B and S 2.C.
Tables S 5.A.1 through S 5.A.3
These tables relate to the month-to-month (or quarter-to-quarter) differences in the seasonally adjusted series and follow the same format as Tables S 2.A.1 through S 2.A.3. The same default cutoff value of 3.0% is used.
Tables S 5.B, S 5.C
These tables, applied to the month-to-month (or quarter-to-quarter) differences in the seasonally adjusted series, have the same format as Tables S 2.B and S 2.C. The same default cutoff value of 3.0% is used.
Tables S 6.A.1 through S 6.A.3
These tables relate to the year-to-year differences in the seasonally adjusted series and follow the same format as Tables S 2.A.1 through S 2.A.3. The same default cutoff value of 3.0% is used.
Tables S 6.B, S 6.C
These tables, applied to the year-to-year differences in the seasonally adjusted series, have the same format as Tables S 2.B and S 2.C. The same default cutoff value of 3.0% is used.
Table S 7.A
Table S 7.A gives the entire listing of the seasonal factors (Table D10) for each span. The first column gives the date for each observation included in the spans. Note that the dates do not cover the entire original data set; only those observations included in one or more spans are listed. The next N columns (where N is the number of spans) are the individual spans, starting at the earliest span. The span columns are labeled by their beginning and ending dates.
Following the last span is the “Sign Change” column. As explained in the description of Table S 2.A.3, a sign change occurs at a given observation when the seasonal factor estimates go from above to below, or below to above, a base level. For the seasonal factors, 100.0 is the base level for the multiplicative model, 0.0 for the additive model. A blank value indicates no sign change, a “U” indicates a movement “upward” from the base level, and a “D” indicates a movement “downward” from the base level.
The next column is the “Turning Point” column. As explained in the description of Table S 2.A.3, a turning point occurs when there is an upward then downward movement, or downward then upward movement, of sufficient magnitude. A blank value indicates no turning point, a “U-D” indicates a movement “upward then downward,” and a “D-U” indicates a movement “downward then upward.”
The next column is the maximum percentage difference (MPD). This quantity, described in the section “Computational Details for Sliding Spans Analysis” on page 2062, is the main computation for sliding spans analysis. A measure of how extreme the MPD value is appears in the last column, the “Level of Excess” column. The symbols used and their meanings are described in Table S 2.B. If a given observation has not exceeded the cutoff, the level of excess column is blank.
Table S 7.B
Table S 7.B gives the entire listing of the trading-day factors (Table C18) for each span. The format of this table is exactly like that of Table S 7.A.
Table S 7.C
Table S 7.C gives the entire listing of the seasonally adjusted data (Table D11) for each span. The format of this table is exactly like that of Table S 7.A except for the “Sign Change” column, which is not printed. The seasonally adjusted data have the same units as the original data; there is no natural base level as in the case of a percentage. Hence the sign change is not appropriate for D11.
Table S 7.D
Table S 7.D gives the entire listing of the month-to-month (or quarter-to-quarter) changes in seasonally adjusted data for each span. The format of this table is exactly like that of Table S 7.A.
Table S 7.E
Table S 7.E gives the entire listing of the year-to-year changes in seasonally adjusted data for each span. The format of this table is exactly like that of Table S 7.A.
Printed Output from the ARIMA Statement
The information printed by default for the ARIMA model includes the parameter estimates, their approximate standard errors, t ratios, and variances, the standard deviation of the error term, and the AIC and SBC statistics for the model. In addition, a criteria summary for the chosen model is given that shows the values for each of the three test criteria and the corresponding critical values. If the PRINTALL option is specified, a summary of the nonlinear estimation optimization and a table of Box-Ljung statistics are also produced. If the automatic model selection is used, this information is printed for each of the five predefined models. Finally, a model selection summary is printed, showing the final model chosen.
ODS Table Names
PROC X11 assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table.
NOTE: For monthly and quarterly tables, use the ODS names MonthlyTables and QuarterlyTables. For brevity, only the MonthlyTables are listed here; the QuarterlyTables are simply duplicates. Printing of individual tables can be specified by using the TABLES table_name, which is not listed here. Printing groups of tables is specified in the MONTHLY and QUARTERLY statements by specifying the option PRINTOUT=NONE|STANDARD|LONG|FULL. The default is PRINTOUT=STANDARD.
Table 31.5 ODS Tables Produced in PROC X11
ODS Table Name | Description | Option
ODS Tables Created by the MONTHLY and QUARTERLY Statements
Preface | X11 Seasonal Adjustment Program information giving credits, dates, and so on | always printed unless NOPRINT
A1 | Table A1: original series |
A2 | Table A2: prior monthly |
A3 | Table A3: original series adjusted for prior monthly factors |
A4 | Table A4: prior trading day adjustment factors with and without length of month adjustment |
A5 | Table A5: original series adjusted for priors |
B1 | Table B1: original series or original series adjusted for priors |
B2 | Table B2: trend cycle - centered nn-term moving average |
B3 | Table B3: unmodified SI ratios |
B4 | Table B4: replacement values for extreme SI ratios |
B5 | Table B5: seasonal factors |
B6 | Table B6: seasonally adjusted series |
B7 | Table B7: trend cycle - Henderson curve |
B8 | Table B8: unmodified SI ratios |
B9 | Table B9: replacement values for extreme SI ratios |
B10 | Table B10: seasonal factors |
B11 | Table B11: seasonally adjusted series |
B13 | Table B13: irregular series |
B15 | Table B15: preliminary trading day regression |
B16 | Table B16: trading day adjustment factors derived from regression |
B17 | Table B17: preliminary weights for irregular component |
B18 | Table B18: trading day adjustment factors from combined weights |
B19 | Table B19: original series adjusted for preliminary combined trading day weights |
C1 | Table C1: original series adjusted for preliminary weights |
C2 | Table C2: trend cycle - centered nn-term moving average |
C4 | Table C4: modified SI ratios |
C5 | Table C5: seasonal factors |
C6 | Table C6: seasonally adjusted series |
C7 | Table C7: trend cycle - Henderson curve |
C9 | Table C9: modified SI ratios |
C10 | Table C10: seasonal factors |
C11 | Table C11: seasonally adjusted series |
C13 | Table C13: irregular series |
C15 | Table C15: final trading day regression |
C16 | Table C16: trading day adjustment factors derived from regression |
C17 | Table C17: final weights for irregular component |
C18 | Table C18: trading day adjustment factors from combined weights |
C19 | Table C19: original series adjusted for final combined trading day weights |
D1 | Table D1: original series adjusted for final weights nn-term moving average |
D4 | Table D4: modified SI ratios |
D5 | Table D5: seasonal factors |
D6 | Table D6: seasonally adjusted series |
D7 | Table D7: trend cycle - Henderson curve |
D8 | Table D8: final unmodified SI ratios |
D10 | Table D10: final seasonal factors |
D11 | Table D11: final seasonally adjusted series |
D12 | Table D12: final trend cycle - Henderson curve |
D13 | Table D13: final irregular series |
E1 | Table E1: original series modified for extremes |
E2 | Table E2: modified seasonally adjusted series |
E3 | Table E3: modified irregular series |
E5 | Table E5: month-to-month changes in original series |
E6 | Table E6: month-to-month changes in final seasonally adjusted series |
F1 | Table F1: MCD moving average |
A13 | Table A13: ARIMA forecasts | ARIMA statement
A14 | Table A14: ARIMA backcasts | ARIMA statement
A15 | Table A15: ARIMA extrapolation | ARIMA statement
B14 | Table B14: irregular values excluded from trading day regression |
C14 | Table C14: irregular values excluded from trading day regression |
D9 | Table D9: final replacement values |
PriorDailyWgts | adjusted prior daily weights |
TDR_0 | final/preliminary trading day regression, part 1 | MONTHLY only, TDREGR=ADJUST, TEST
TDR_1 | final/preliminary trading day regression, part 2 | MONTHLY only, TDREGR=ADJUST, TEST
StandErrors | standard errors of trading day adjustment factors | MONTHLY only, TDREGR=ADJUST, TEST
D9A | year-to-year change in irregular and seasonal components and moving seasonality ratio |
StableSeasTest | stable seasonality test |
StableSeasFTest | moving seasonality test |
KruskalWallisTest | nonparametric test for the presence of seasonality assuming stability |
CombinedSeasonalityTest | summary of results and combined test for the presence of identifiable seasonality |
f2a | F2 summary measures, part 1 |
f2b | F2 summary measures, part 2 |
f2c | F2 summary measures, part 3 |
f2d | I/C ratio for monthly/quarterly span |
f2f | average % change with regard to sign and standard deviation over span |
E4 | differences or ratios of annual totals for original and adjusted series |
ChartG1 | chart G1 |
ChartG2 | chart G2 |
ODS Tables Created by the ARIMA Statement
CriteriaSummary | criteria summary | ARIMA statement
ConvergeSummary | convergence summary |
ArimaEst | ARIMA estimation results, part 1 |
ArimaEst2 | ARIMA estimation results, part 2 |
Model_Summary | model summary |
Ljung_BoxQ | table of Ljung-Box Q statistics |
A13 | Table A13: ARIMA forecasts |
A14 | Table A14: ARIMA backcasts |
A15 | Table A15: ARIMA extrapolation |
ODS Tables Created by the SSPAN Statement
SPR0A_1 | S 0.A sliding spans analysis, number, length of spans | default printing
SpanDates | S 0.A sliding spans analysis: dates of spans |
SPR0B | S 0.B summary of F tests for stable and moving seasonality |
SPR1_1 | S 1.A range analysis of seasonal factors |
SPR1_b | S 1.B summary of range measures |
SPRXA | S X.A.1 breakdown of differences by month or quarter |
SPRXB_2 | S X.B histogram of flagged observations |
SPRXA_2 | S X.A.2 breakdown of differences by year |
MpdStats | S X.C: statistics for maximum percentage differences |
S_X_A_3 | S X.A.3 breakdown summary of flagged observations |
SPR7_X | S 7.X sliding spans analysis | PRINTALL
Examples: X11 Procedure
Example 31.1: Component Estimation - Monthly Data
This example computes and plots the final estimates of the individual components for a monthly series. In the first plot (Output 31.1.1), an overlaid plot of the original and seasonally adjusted data is produced. The trend in the data is more evident in the seasonally adjusted data than in the original data. This trend is even more clear in Output 31.1.3, the plot of Table D12, the trend cycle. Note that both the seasonal factors and the irregular factors vary around 100, while the trend cycle and the seasonally adjusted data are in the scale of the original data. From Output 31.1.2 the seasonal component appears to be slowly increasing, while no apparent pattern exists for the irregular series in Output 31.1.4.
   data sales;
      input sales @@;
      date = intnx( 'month', '01sep1978'd, _n_-1 );
      format date monyy7.;
   datalines;

   ... more lines ...

   proc x11 data=sales noprint;
      monthly date=date;
      var sales;
      tables b1 d11;
      output out=out b1=series d10=d10 d11=d11
                     d12=d12 d13=d13;
   run;

   title 'Monthly Retail Sales Data (in $1000)';
   proc sgplot data=out;
      series x=date y=series / markers
                               markerattrs=(color=red symbol='asterisk')
                               lineattrs=(color=red)
                               legendlabel="original" ;
      series x=date y=d11    / markers
                               markerattrs=(color=blue symbol='circle')
                               lineattrs=(color=blue)
                               legendlabel="adjusted" ;
      yaxis label='Original and Seasonally Adjusted Time Series';
   run;
Output 31.1.1 Plots of Original and Seasonally Adjusted Data
   title 'Monthly Seasonal Factors (in percent)';
   proc sgplot data=out;
      series x=date y=d10 / markers markerattrs=(symbol=CircleFilled) ;
   run;

   title 'Monthly Retail Sales Data (in $1000)';
   proc sgplot data=out;
      series x=date y=d12 / markers markerattrs=(symbol=CircleFilled) ;
   run;

   title 'Monthly Irregular Factors (in percent)';
   proc sgplot data=out;
      series x=date y=d13 / markers markerattrs=(symbol=CircleFilled) ;
   run;
Output 31.1.2 Plot of D10, the Final Seasonal Factors
Output 31.1.3 Plot of D12, the Final Trend Cycle
Output 31.1.4 Plot of D13, the Final Irregular Series
Example 31.2: Components Estimation - Quarterly Data
This example is similar to Example 31.1, except quarterly data are used. Tables B1, the original series, and D11, the final seasonally adjusted series, are printed by the TABLES statement. The OUTPUT statement writes the listed tables to an output data set.
   data quarter;
      input date yyq6. +1 fy35rr 5.2;
      format date yyq6.;
   datalines;

   ... more lines ...

   title 'Monthly Retail Sales Data (in $1000)';
   proc x11 data=quarter;
      var fy35rr;
      quarterly date=date;
      tables b1 d11;
      output out=out b1=b1 d10=d10 d11=d11 d12=d12 d13=d13;
   run;
Output 31.2.1 X11 Procedure Quarterly Example
Monthly Retail Sales Data (in $1000)
The X11 Procedure
X-11 Seasonal Adjustment Program
U. S. Bureau of the Census
Economic Research and Analysis Division
November 1, 1968
The X-11 program is divided into seven major parts.
Part  Description
A.    Prior adjustments, if any
B.    Preliminary estimates of irregular component weights and regression trading day factors
C.    Final estimates of above
D.    Final estimates of seasonal, trend-cycle and irregular components
E.    Analytical tables
F.    Summary measures
G.    Charts
Series - fy35rr
Period covered - 1st Quarter 1971 to 4th Quarter 1976
Monthly Retail Sales Data (in $1000)
The X11 Procedure
Seasonal Adjustment of - fy35rr
B1 Original Series
Year      1st      2nd      3rd      4th     Total
1971    6.590    6.010    6.510    6.180    25.290
1972    5.520    5.590    5.840    6.330    23.280
1973    6.520    7.350    9.240   10.080    33.190
1974    9.910   11.150   12.400   11.640    45.100
1975    9.940    8.160    8.220    8.290    34.610
1976    7.540    7.440    7.800    7.280    30.060
--------------------------------------------------
Avg     7.670    7.617    8.335    8.300
Total: 191.53    Mean: 7.9804    S.D.: 1.9424
Output 31.2.2 X11 Procedure Quarterly Example, Table D11
D11 Final Seasonally Adjusted Series
Year      1st      2nd      3rd      4th     Total
1971    6.877    6.272    6.222    5.956    25.326
1972    5.762    5.836    5.583    6.089    23.271
1973    6.820    7.669    8.840    9.681    33.009
1974   10.370   11.655   11.855   11.160    45.040
1975   10.418    8.534    7.853    7.947    34.752
1976    7.901    7.793    7.444    6.979    30.116
--------------------------------------------------
Avg     8.025    7.960    7.966    7.969
Total: 191.51    Mean: 7.9797    S.D.: 1.9059
Example 31.3: Outlier Detection and Removal
PROC X11 can be used to detect and replace outliers in the irregular component of a monthly or quarterly series. The weighting scheme used in measuring the “extremeness” of the irregulars is developed iteratively; thus the statistical properties of the outlier adjustment method are unknown.
In this example, the data are simulated by generating a trend plus a random error. Two periods in the series were made “extreme” by multiplying one generated value by 2.0 and another by 0.10. The additive model is appropriate based on the way the data were generated. Note that the trend in the generated data was modeled automatically by the trend cycle component estimation.
The detection of outliers is accomplished by considering Table D9, the final replacement values for extreme S-I ratios. This table indicates which observations had irregular component values more than FULLWEIGHT= standard deviation units from 0.0 (1.0 for the multiplicative model). The default value of the FULLWEIGHT= option is 1.5; a larger value would result in fewer observations being declared extreme. In this example, FULLWEIGHT=3.0 is used to isolate the extreme inflated and deflated values generated in the DATA step. The value of ZEROWEIGHT= must be greater than the value of FULLWEIGHT=; it is given a value of 3.5.
A plot of the original and modified series, Output 31.3.2, shows that the deviation from the trend line for the modified series is greatly reduced compared with the original series.
   data a;
      retain seed 99831;
      do kk = 1 to 48;
         x = kk + 100 + rannor( seed );
         date = intnx( 'month', '01jan1970'd, kk-1 );
         if kk = 20 then x = 2 * x;
         else if kk = 30 then x = x / 10;
         output;
      end;
   run;

   proc x11 data=a;
      monthly date=date additive
              fullweight=3.0 zeroweight=3.5;
      var x;
      table d9;
      output out=b b1=original e1=e1;
   run;

   proc sgplot data=b;
      series x=date y=original / markers
                                 markerattrs=(color=red symbol='asterisk')
                                 lineattrs=(color=red)
                                 legendlabel="unmodified" ;
      series x=date y=e1       / markers
                                 markerattrs=(color=blue symbol='circle')
                                 lineattrs=(color=blue)
                                 legendlabel="modified" ;
      yaxis label='Original and Outlier Adjusted Time Series';
   run;
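To make the roles of FULLWEIGHT= and ZEROWEIGHT= in this example more concrete, the following Python sketch shows one way such a weight could be computed for a single irregular value. It assumes the usual X-11 scheme in which weights fall linearly from 1 to 0 between the two limits; the function name, argument names, and sample inputs are hypothetical, and this is not PROC X11's code.

```python
def irregular_weight(irregular, sigma, center=0.0, fullweight=3.0, zeroweight=3.5):
    """Weight for one irregular value, given its standard deviation (sketch)."""
    # Distance from the center (0.0 additive, 1.0 multiplicative) in sigma units.
    z = abs(irregular - center) / sigma
    if z <= fullweight:
        return 1.0  # treated as a regular observation
    if z >= zeroweight:
        return 0.0  # treated as fully extreme
    # Assumed linear graduation between the two limits.
    return (zeroweight - z) / (zeroweight - fullweight)


print(irregular_weight(1.0, 1.0))   # well inside FULLWEIGHT=3.0
print(irregular_weight(3.25, 1.0))  # halfway between the two limits
print(irregular_weight(10.0, 1.0))  # far beyond ZEROWEIGHT=3.5
```

With FULLWEIGHT=3.0 and ZEROWEIGHT=3.5, only values more than three standard deviation units from the center receive less than full weight, which is why so few observations are replaced in Table D9.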
Output 31.3.1 Detection of Extreme Irregulars
Monthly Retail Sales Data (in $1000)
The X11 Procedure
Seasonal Adjustment of - x
D9 Final Replacement Values for Extreme SI Ratios
Year       JAN       FEB       MAR       APR       MAY       JUN
1970         .         .         .         .         .         .
1971         .         .         .         .         .         .
1972         .         .         .         .         .   -10.671
1973         .         .         .         .         .         .
D9 Final Replacement Values for Extreme SI Ratios
Year       JUL       AUG       SEP       OCT       NOV       DEC
1970         .         .         .         .         .         .
1971         .    11.180         .         .         .         .
1972         .         .         .         .         .         .
1973         .         .         .         .         .         .
Output 31.3.2 Plot of Modified and Unmodified Values
References
Bell, W. R. and Hillmer, S. C. (1984), “Issues Involved with the Seasonal Adjustment of Economic Time Series,” Journal of Business and Economic Statistics, 2(4).
Bobbit, L. G. and Otto, M. C. (1990), “Effects of Forecasts on the Revisions of Seasonally Adjusted Data Using the X-11 Adjustment Procedure,” Proceedings of the Business and Economic Statistics Section of the American Statistical Association, 449–453.
Buszuwski, J. A. (1987), “Alternative ARIMA Forecasting Horizons When Seasonally Adjusting Producer Price Data with X-11-ARIMA,” Proceedings of the Business and Economic Statistics Section of the American Statistical Association, 488–493.
Cleveland, W. P. and Tiao, G. C. (1976), “Decomposition of Seasonal Time Series: A Model for Census X-11 Program,” Journal of the American Statistical Association, 71(355).
Cleveland, W. S. and Devlin, S. J. (1980), “Calendar Effects in Monthly Time Series: Detection by Spectrum Analysis and Graphical Methods,” Journal of the American Statistical Association, 75(371), 487–496.
Dagum, E. B. (1980), The X-11-ARIMA Seasonal Adjustment Method, Statistics Canada.

Dagum, E. B. (1982a), “The Effects of Asymmetric Filters on Seasonal Factor Revision,” Journal of the American Statistical Association, 77(380), 732–738.

Dagum, E. B. (1982b), “Revisions of Seasonally Adjusted Data Due to Filter Changes,” Proceedings of the Business and Economic Section, the American Statistical Association, 39–45.

Dagum, E. B. (1982c), “Revisions of Time Varying Seasonal Filters,” Journal of Forecasting, 1(2), 173–187.

Dagum, E. B. (1983), The X-11-ARIMA Seasonal Adjustment Method, Technical Report 12-564E, Statistics Canada.

Dagum, E. B. (1985), “Moving Averages,” in S. Kotz and N. L. Johnson, eds., Encyclopedia of Statistical Sciences, volume 5, New York: John Wiley & Sons.

Dagum, E. B. (1988), The X-11-ARIMA/88 Seasonal Adjustment Method: Foundations and User’s Manual, Ottawa: Statistics Canada.

Dagum, E. B. and Laniel, N. (1987), “Revisions of Trend Cycle Estimators of Moving Average Seasonal Adjustment Method,” Journal of Business and Economic Statistics, 5(2), 177–189.

Davies, N., Triggs, C. M., and Newbold, P. (1977), “Significance Levels of the Box-Pierce Portmanteau Statistic in Finite Samples,” Biometrika, 64, 517–522.

Findley, D. F. and Monsell, B. C. (1986), “New Techniques for Determining If a Time Series Can Be Seasonally Adjusted Reliably, and Their Application to U.S. Foreign Trade Series,” in M. R. Perryman and J. R. Schmidt, eds., Regional Econometric Modeling, 195–228, Amsterdam: Kluwer-Nijhoff.

Findley, D. F., Monsell, B. C., Shulman, H. B., and Pugh, M. G. (1990), “Sliding Spans Diagnostics for Seasonal and Related Adjustments,” Journal of the American Statistical Association, 85(410), 345–355.

Ghysels, E. (1990), “Unit Root Tests and the Statistical Pitfalls of Seasonal Adjustment: The Case of U.S. Post War Real GNP,” Journal of Business and Economic Statistics, 8(2), 145–152.

Higginson, J. (1975), An F Test for the Presence of Moving Seasonality When Using Census Method II-X-11 Variant, StatCan Staff Paper STC2102E, Seasonal Adjustment and Time Series Analysis Staff, Statistics Canada, Ottawa.

Huot, G., Chui, L., Higginson, J., and Gait, N. (1986), “Analysis of Revisions in the Seasonal Adjustment of Data Using X11ARIMA Model-Based Filters,” International Journal of Forecasting, 2, 217–229.

Ladiray, D. and Quenneville, B. (2001), Seasonal Adjustment with the X-11 Method, New York: Springer-Verlag.

Laniel, N. (1985), “Design Criteria for the 13-Term Henderson End-Weights,” Working Paper, Methodology Branch, Ottawa: Statistics Canada.

Lehmann, E. L. (1998), Nonparametrics: Statistical Methods Based on Ranks, San Francisco: Holden-Day.
Ljung, G. M. and Box, G. E. P. (1978), “On a Measure of Lack of Fit in Time Series Models,” Biometrika, 65(2), 297–303.

Lothian, J. (1978), The Identification and Treatment of Moving Seasonality in the X-11 Seasonal Adjustment Method, StatCan Staff Paper STC0803E, Seasonal Adjustment and Time Series Analysis Staff, Statistics Canada, Ottawa.

Lothian, J. (1984a), The Identification and Treatment of Moving Seasonality in the X-11-ARIMA Seasonal Adjustment Method, StatCan Staff Paper, Seasonal Adjustment and Time Series Analysis Staff, Statistics Canada, Ottawa.

Lothian, J. (1984b), “The Identification and Treatment of Moving Seasonality in X-11-ARIMA,” in Proceedings of the Business and Economic Statistics Section of the American Statistical Association, 166–171.

Lothian, J. and Morry, M. (1978a), Selection of Models for the Automated X-11-ARIMA Seasonal Adjustment Program, StatCan Staff Paper STC1789, Seasonal Adjustment and Time Series Analysis Staff, Statistics Canada, Ottawa.

Lothian, J. and Morry, M. (1978b), A Test for the Presence of Identifiable Seasonality When Using the X-11-ARIMA Program, StatCan Staff Paper STC2118, Seasonal Adjustment and Time Series Analysis Staff, Statistics Canada, Ottawa.

Marris, S. (1961), “The Treatment of Moving Seasonality in Census Method II,” in Seasonal Adjustment on Electronic Computers, 257–309, Paris: Organisation for Economic Co-operation and Development.

Monsell, B. C. (1984), The Substantive Changes in the X-11 Procedure of X-11-ARIMA, SRD Research Report Census/SRD/RR-84/10, Bureau of the Census, Statistical Research Division.

Pierce, D. A. (1980), “Data Revisions with Moving Average Seasonal Adjustment Procedures,” Journal of Econometrics, 14, 95–114.

Shiskin, J. (1958), “Decomposition of Economic Time Series,” Science, 128(3338).

Shiskin, J. and Eisenpress, H. (1957), “Seasonal Adjustment by Electronic Computer Methods,” Journal of the American Statistical Association, 52(280).

Shiskin, J., Young, A. H., and Musgrave, J. C. (1967), The X-11 Variant of the Census Method II Seasonal Adjustment Program, Technical Report 15, U.S. Department of Commerce, Bureau of the Census.

U.S. Bureau of the Census (1969), X-11 Information for the User, U.S. Department of Commerce, Washington, DC: Government Printing Office.

Young, A. H. (1965), Estimating Trading Day Variation in Monthly Economic Time Series, Technical Report 12, U.S. Department of Commerce, Bureau of the Census, Washington, DC.
Chapter 32
The X12 Procedure

Contents

Overview: X12 Procedure
Getting Started: X12 Procedure
   Basic Seasonal Adjustment
Syntax: X12 Procedure
   Functional Summary
   PROC X12 Statement
   BY Statement
   ID Statement
   EVENT Statement
   INPUT Statement
   ADJUST Statement
   ARIMA Statement
   ESTIMATE Statement
   FORECAST Statement
   IDENTIFY Statement
   AUTOMDL Statement
   OUTPUT Statement
   OUTLIER Statement
   REGRESSION Statement
   TABLES Statement
   TRANSFORM Statement
   USERDEFINED Statement
   VAR Statement
   X11 Statement
Details: X12 Procedure
   Missing Values
   Combined Test for the Presence of Identifiable Seasonality
   Computations
   Displayed Output/ODS Table Names/OUTPUT Tablename Keywords
   ODS Graphics
   Special Data Sets
Examples: X12 Procedure
   Example 32.1: Model Identification
   Example 32.2: Model Estimation
   Example 32.3: Seasonal Adjustment
   Example 32.4: RegARIMA Automatic Model Selection
   Example 32.5: Automatic Outlier Detection
   Example 32.6: User-Defined Regressors
   Example 32.7: MDLINFOIN= and MDLINFOOUT= Data Sets
   Example 32.8: Setting Regression Parameters
   Example 32.9: Illustration of ODS Graphics
Acknowledgments: X12 Procedure
References
Overview: X12 Procedure

The X12 procedure, an adaptation of the U.S. Bureau of the Census X-12-ARIMA Seasonal Adjustment program (U.S. Bureau of the Census 2001c), seasonally adjusts monthly or quarterly time series. The procedure makes additive or multiplicative adjustments and creates an output data set that contains the adjusted time series and intermediate calculations.

The X-12-ARIMA program combines the capabilities of the X-11 program (Shiskin, Young, and Musgrave 1967) and the X-11-ARIMA/88 program (Dagum 1988) and also introduces some new features (Findley et al. 1998). One of the main enhancements involves the use of a regARIMA model, a regression model with ARIMA (autoregressive integrated moving average) errors. Thus, the X-12-ARIMA program contains methods developed by both the U.S. Census Bureau and Statistics Canada. In addition, the X-12-ARIMA automatic modeling routine is based on the TRAMO (time series regression with ARIMA noise, missing values, and outliers) method (Gomez and Maravall 1997a, b).

The four major components of the X-12-ARIMA program are regARIMA modeling, model diagnostics, seasonal adjustment that uses enhanced X-11 methodology, and postadjustment diagnostics.

Statistics Canada’s X-11-ARIMA method fits an ARIMA model to the original series and then uses the model forecast to extend the original series. This extended series is then seasonally adjusted by the standard X-11 seasonal adjustment method. The extension of the series improves the estimation of the seasonal factors and reduces revisions to the seasonally adjusted series as new data become available.

Seasonal adjustment of a series is based on the assumption that seasonal fluctuations can be measured in the original series, $O_t$, $t = 1, \ldots, n$, and separated from trend cycle, trading day, and irregular fluctuations. The seasonal component of this time series, $S_t$, is defined as the intrayear variation that is repeated consistently or in an evolving fashion from year to year. The trend cycle component, $C_t$, includes variation due to the long-term trend, the business cycle, and other long-term cyclical factors. The trading day component, $D_t$, is the variation that can be attributed to the composition of the calendar. The irregular component, $I_t$, is the residual variation.

Many economic time series are related in a multiplicative fashion ($O_t = S_t C_t D_t I_t$). Other economic series are related in an additive fashion ($O_t = S_t + C_t + D_t + I_t$). A seasonally adjusted time series, $C_t I_t$ or $C_t + I_t$, consists of only the trend cycle and irregular components. For more details about seasonal adjustment with the X-11 method, see Ladiray and Quenneville (2001).
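The two decompositions above can be checked numerically. The following is a minimal sketch in Python rather than SAS, and it is not part of PROC X12: the function names and the component values are hypothetical and chosen only for illustration. It shows that removing the seasonal and trading day components from the original value leaves the trend cycle and irregular components.

```python
# Illustrative only: hypothetical component values for a single period t.

def adjust_multiplicative(o, s, d):
    """Return C_t * I_t given O_t = S_t * C_t * D_t * I_t."""
    return o / (s * d)

def adjust_additive(o, s, d):
    """Return C_t + I_t given O_t = S_t + C_t + D_t + I_t."""
    return o - s - d

# Multiplicative case: the seasonal, trading day, and irregular factors
# are ratios (made-up values close to 1).
c, s, d, i = 200.0, 1.10, 0.98, 1.01
o_mult = s * c * d * i
sa_mult = adjust_multiplicative(o_mult, s, d)

# Additive case: the same components are offsets (made-up values close to 0).
s_a, d_a, i_a = 15.0, -2.0, 1.5
o_add = s_a + c + d_a + i_a
sa_add = adjust_additive(o_add, s_a, d_a)

print(round(sa_mult, 6), round(sa_add, 6))
```

In the sketch the components are known in advance; in practice PROC X12 has to estimate them from the original series, which is what the remainder of this chapter describes.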
Graphics are now available with the X12 procedure. For more information, see the section “ODS Graphics” on page 2141.
Getting Started: X12 Procedure

The most common use of the X12 procedure is to produce a seasonally adjusted series. Eliminating the seasonal component from an economic series facilitates comparison among consecutive months or quarters. A plot of the seasonally adjusted series is often more informative about trends or location in a business cycle than a plot of the unadjusted series.

The following example shows how to use PROC X12 to produce a seasonally adjusted series, $C_t I_t$, from an original series $O_t = S_t C_t D_t I_t$.

In the multiplicative model, the trend cycle component $C_t$ keeps the same scale as the original series $O_t$, while $S_t$, $D_t$, and $I_t$ vary around 1.0. In all displayed tables, these latter components are expressed as percentages and thus vary around 100.0 (in the additive case, they vary around 0.0). However, in the output data set, the data displayed as percentages in the displayed output are expressed as the decimal equivalent and thus vary around 1.0 in the multiplicative case.

The naming convention used in PROC X12 for the tables follows the convention used in the Census Bureau’s X-12-ARIMA program; see X-12-ARIMA Reference Manual (U.S. Bureau of the Census 2001b) and X-12-ARIMA Quick Reference for UNIX (U.S. Bureau of the Census 2001a). Also see the section “Displayed Output/ODS Table Names/OUTPUT Tablename Keywords” on page 2139. The table names are outlined in Table 32.8.

The tables that correspond to parts A through C are intermediate calculations. The final estimates of the individual components are found in the D tables: Table D10 contains the final seasonal factors, Table D12 contains the final trend cycle, and Table D13 contains the final irregular series. If you are primarily interested in seasonally adjusting a series without consideration of intermediate calculations or diagnostics, you need to look only at Table D11, the final seasonally adjusted series.

Tables in part E contain information about extreme values and changes in the original and seasonally adjusted series. The tables in part F are seasonal adjustment quality measures. Spectral analysis is performed in part G. For further information about the tables produced by the X11 statement, see Ladiray and Quenneville (2001).
Basic Seasonal Adjustment

Suppose that you have monthly retail sales data starting in September 1978 in a SAS data set named SALES. At this point, you do not suspect that any calendar effects are present, and there are no prior adjustments that need to be made to the data.

In this simplest case, you need only specify the DATE= variable in the PROC X12 statement and request seasonal adjustment in the X11 statement as shown in the following statements:

data sales;
   set sashelp.air;
   sales = air;
   date = intnx( 'month', '01sep78'd, _n_-1 );
   format date monyy.;
run;

proc x12 data=sales date=date;
   var sales;
   x11;
   ods select d11;
run;
The results of the seasonal adjustment are in Table D11 (the final seasonally adjusted series) in the displayed output as shown in Figure 32.1.

Figure 32.1 Basic Seasonal Adjustment

The X12 Procedure

Table D 11: Final Seasonally Adjusted Data
For variable sales

Year       JAN       FEB       MAR       APR       MAY       JUN
           JUL       AUG       SEP       OCT       NOV       DEC     Total

1978         .         .         .         .         .         .
             .         .   124.560   124.649   124.920   129.002   503.131
1979   125.087   126.759   125.252   126.415   127.012   130.041
       128.056   129.165   127.182   133.847   133.199   135.847   1547.86
1980   128.767   139.839   143.883   144.576   148.048   145.170
       140.021   153.322   159.128   161.614   167.996   165.388   1797.75
1981   175.984   166.805   168.380   167.913   173.429   175.711
       179.012   182.017   186.737   197.367   183.443   184.907   2141.71
1982   186.080   203.099   193.386   201.988   198.322   205.983
       210.898   213.516   213.897   218.902   227.172   240.453   2513.69
1983   231.839   224.165   219.411   225.907   225.015   226.535
       221.680   222.177   222.959   212.531   230.552   232.565   2695.33
1984   237.477   239.870   246.835   242.642   244.982   246.732
       251.023   254.210   264.670   266.120   266.217   276.251   3037.03
1985   275.485   281.826   294.144   286.114   293.192   296.601
       293.861   309.102   311.275   319.239   319.936   323.663   3604.44
1986   326.693   330.341   330.383   330.792   333.037   332.134
       336.444   341.017   346.256   350.609   361.283   362.519   4081.51
1987   364.951   371.274   369.238   377.242   379.413   376.451
       378.930   375.392   374.940   373.612   368.753   364.885   4475.08
1988   371.618   383.842   385.849   404.810   381.270   388.689
       385.661   377.706   397.438   404.247   414.084   416.486   4711.70
1989   426.716   419.491   427.869   446.161   438.317   440.639
       450.193   454.638   460.644   463.209   427.728   485.386   5340.99
1990   477.259   477.753   483.841   483.056   481.902   499.200
       484.893   485.245         .         .         .         .   3873.15
--------------------------------------------------------------------------
Avg    277.330   280.422   282.373   286.468   285.328   288.657
       288.389   291.459   265.807   268.829   268.774   276.446

Total: 40323    Mean: 280.02    S.D.: 111.31
Min: 124.56     Max: 499.2
You can compare the original series (Table A1) and the final seasonally adjusted series (Table D11) by plotting them together as shown in Figure 32.2. These tables are requested in the OUTPUT statement and are written to the OUT= data set. Note that the default variable name used in the output data set is the input variable name followed by an underscore and the corresponding table name.

proc x12 data=sales date=date noprint;
   var sales;
   x11;
   output out=out a1 d11;
run;
proc sgplot data=out;
   series x=date y=sales_A1 / name="A1" markers
                              markerattrs=(color=red symbol='asterisk')
                              lineattrs=(color=red);
   series x=date y=sales_D11 / name="D11" markers
                               markerattrs=(symbol='circle')
                               lineattrs=(color=blue);
   yaxis label='Original and Seasonally Adjusted Time Series';
run;
Figure 32.2 Plot of Original and Seasonally Adjusted Data
Syntax: X12 Procedure

The X12 procedure uses the following statements:

PROC X12 options ;
   VAR variables ;
   BY variables ;
   ID variables ;
   EVENT variables ;
   USERDEFINED variables ;
   TRANSFORM options ;
   ADJUST options ;
   IDENTIFY options ;
   AUTOMDL options ;
   OUTLIER options ;
   REGRESSION options ;
   INPUT variables ;
   ARIMA options ;
   ESTIMATE options ;
   X11 options ;
   FORECAST options ;
   OUTPUT options ;
   TABLES options ;
The PROC X12 statements perform essentially the same functions as the Census Bureau’s X-12-ARIMA specs. Specs (specifications) are used in X-12-ARIMA to control the computations and output. The PROC X12 statement performs some of the same functions as the Series spec in the Census Bureau’s X-12-ARIMA software. The ADJUST statement performs some of the same functions as the Transform spec. The TRANSFORM, IDENTIFY, AUTOMDL, OUTLIER, REGRESSION, ARIMA, ESTIMATE, X11, and FORECAST statements are designed to perform the same functions as the corresponding X-12-ARIMA specs, although full compatibility is not yet available. The Census Bureau documentation X-12-ARIMA Reference Manual (U.S. Bureau of the Census 2001b) can provide added insight into the functionality of these statements.
Functional Summary

Table 32.1 summarizes the statements and options that control the X12 procedure.

Table 32.1 X12 Syntax Summary

Description                                                   Statement      Option

Data Set Options
specify input data set                                        PROC X12       DATA=
specify regression and ARIMA information                      PROC X12       MDLINFOIN=
output regression and ARIMA information                       PROC X12       MDLINFOOUT=
write table values to an output data set                      OUTPUT         OUT=

Display Control Options
suppress all displayed output                                 PROC X12       NOPRINT
request tables that are not displayed by default              TABLES
specify that the summary line not be displayed                TABLES         NOSUM
display iterations history                                    ESTIMATE       ITPRINT
display information on restarted iterations                   ESTIMATE       PRINTERR
display regression model parameter estimates                  IDENTIFY       PRINTREG
display automatic model information                           AUTOMDL        PRINT=

Date Information Options
specify the date variable                                     PROC X12       DATE=
specify the date of the first observation                     PROC X12       START=
specify the beginning and/or ending date of the subset        PROC X12       SPAN=
specify the interval of the time series                       PROC X12       INTERVAL=
specify the number of observations in a seasonal cycle        PROC X12       SEASONS=

Declaring the Role of Variables
specify BY-group processing                                   BY
specify identifying variables                                 ID
specify the variables to be seasonally adjusted               VAR
specify user-defined variables available for regression       USERDEFINED

Controlling the Table Computations
suppress trimming of leading/trailing missing values          PROC X12       NOTRIMMISS
transform or prior-adjust the series                          TRANSFORM      FUNCTION=
transform or prior-adjust the series                          TRANSFORM      POWER=
adjust the series by using a predefined adjustment variable   ADJUST         PREDEFINED=
use differencing to identify the ARIMA part of the model      IDENTIFY
specify automatic outlier detection                           OUTLIER
estimate the regARIMA model specified by the REGRESSION       ESTIMATE
   and ARIMA statements or the MDLINFOIN= option
specify seasonal adjustment                                   X11
specify the number of forecasts to extend the series          FORECAST       LEAD=
   for seasonal adjustment

Specifying the Regression Model
specify Census regression variables                           REGRESSION     PREDEFINED=
specify user-defined regression variables                     REGRESSION     USERVAR=
specify user-defined event definition data set                PROC X12       INEVENT=
specify user-defined event regression variables               EVENT

Specifying the ARIMA Model
use X-12-ARIMA TRAMO-based method to choose a model           AUTOMDL
specify the ARIMA part of the model                           ARIMA          MODEL=
PROC X12 Statement

PROC X12 options ;
The PROC X12 statement provides information about the time series to be processed by PROC X12. Either the START= or the DATE= option must be specified.

The original series is displayed in Table A1. If there are missing values in the original series and a regARIMA model is specified or automatically selected, then Table MV1 is displayed. Table MV1 contains the original series with missing values replaced by the predicted values from the fitted model. Table B1 is displayed when the original data is altered (for example, through an ARIMA model estimation, prior adjustment factors, or regression) or the series is extended with forecasts.

Although the X-12-ARIMA method handles missing values, there are some restrictions. In order for PROC X12 to process the series, no month or quarter can contain missing values for all years. For instance, if the third quarter contained only missing values for all years, then processing is skipped for that series. In addition, if more than half the values for a month or a quarter are missing, then a warning message is displayed in the log file, and other errors might occur later in processing. If a series contains many missing values, other methods of missing value replacement should be considered prior to seasonally adjusting the series.

The following options can appear in the PROC X12 statement.

DATA=SAS-data-set
specifies the input SAS data set used. If this option is omitted, the most recently created SAS data set is used.

DATE=variable
DATEVAR=variable
specifies a variable that gives the date for each observation. Unless specified in the SPAN= option, the starting and ending dates are obtained from the first and last values of the DATE= variable, which must contain SAS date or datetime values. The procedure checks values of the DATE= variable to ensure that the input observations are sequenced correctly in ascending order. If the INTERVAL= option or the SEASONS= option is specified, the values of the date variable must be consistent with the specified seasonality or interval. If neither the INTERVAL= option nor the SEASONS= option is specified, then the procedure tries to determine the type of data from the values of the date variable. This variable is automatically added to the OUT= data set if a data set is requested in an OUTPUT statement, and the date values for the variable are extrapolated if necessary. If the DATE= option is not specified, the START= option must be specified.

START=mmmyy
START='yyQq'
STARTDATE=mmmyy
STARTDATE='yyQq'
specifies the date of the first observation. Unless the SPAN= option is used, the starting and ending dates are the dates of the first and last observations, respectively. Either this option or the DATE= option is required. When using this option, use either the INTERVAL= option or the SEASONS= option to specify monthly or quarterly data. If neither the INTERVAL= option nor the SEASONS= option is present, monthly data are assumed. Note that for a quarterly date, the specification must be enclosed in quotes. A four-digit year can be specified; if a two-digit year is specified, the value specified in the YEARCUTOFF= SAS system option applies. When using the START= option with BY processing, the start date is applied to the first observation in each BY group.

SPAN=(mmmyy,mmmyy)
SPAN=('yyQq','yyQq')
specifies the dates of the first and last observations to define a subset for processing. A single date in parentheses is interpreted to be the starting date of the subset. To specify only the ending date, use SPAN=(,mmmyy). If the starting or ending date is omitted, then the first or last date, respectively, of the input data set is assumed. A four-digit year can be specified; if a two-digit year is specified, the value specified in the YEARCUTOFF= SAS system option applies.

INTERVAL=interval
specifies the frequency of the input time series. If the input data consist of quarterly observations, then INTERVAL=QTR should be used. If the input data consist of monthly observations, then INTERVAL=MONTH should be used. If the INTERVAL= option is not specified and SEASONS=4, then INTERVAL=QTR is assumed; likewise, SEASONS=12 implies INTERVAL=MONTH. If both the INTERVAL= option and the SEASONS= option are specified, the values must not conflict. If neither the INTERVAL= option nor the SEASONS= option is specified and the START= option is specified, then the data are assumed to be monthly. If a date variable is specified using the DATE= option, it is not necessary to specify the INTERVAL= option or the SEASONS= option; however, if either is specified, its value must not conflict with the values of the date variable. See Chapter 4, “Date Intervals, Formats, and Functions,” for more details about intervals.

SEASONS=number
specifies the number of observations in a seasonal cycle. If the SEASONS= option is not specified and INTERVAL=QTR, then SEASONS=4 is assumed. If the SEASONS= option is not specified and INTERVAL=MONTH, then SEASONS=12 is assumed. If the SEASONS= option is specified, its value should not conflict with the values of the INTERVAL= option or the values of the date variable. See the preceding descriptions for the START=, DATE=, and INTERVAL= options for more details.
NOTRIMMISS
suppresses the default behavior, in which leading and trailing missing values are trimmed from the series. If NOTRIMMISS is used, PROC X12 automatically generates missing value regressors for any missing value within the span of the series, including leading and trailing missing values.

INEVENT=SAS-data-set
specifies the input data set that defines any user-defined event variables. This option can be omitted if events are not specified or if only SAS predefined events are specified in an EVENT statement. For more information about the format of this data set, see the section “INEVENT= Data Set” on page 2144.

MDLINFOIN=SAS-data-set
specifies an optional input data set that contains model information that can replace the information contained in the TRANSFORM, REGRESSION, ARIMA, and AUTOMDL statements. The MDLINFOIN= data set can contain BY-group and series names; it is useful for providing specific information about each series to be seasonally adjusted. See the section “MDLINFOIN= and MDLINFOOUT= Data Sets” on page 2142 for details.

MDLINFOOUT=SAS-data-set
specifies the optional output data set that contains the transformation, regression, and ARIMA information related to each seasonally adjusted series. The data set is sorted by the BY-group variables, if any, and by series names. The MDLINFOOUT= data set can be used as input for the MDLINFOIN= option. See the section “MDLINFOIN= and MDLINFOOUT= Data Sets” on page 2142 for details.

NOPRINT
suppresses any printed output.
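
The defaulting rules stated for the INTERVAL= and SEASONS= options can be summarized in a short sketch. The code below is illustrative Python, not SAS, and it is not how PROC X12 is implemented: the helper name resolve_frequency and the use of ValueError are assumptions made for this example. It covers only the case in which START= is used; when a DATE= variable is supplied and neither option is given, the procedure instead tries to determine the type of data from the date values.

```python
def resolve_frequency(interval=None, seasons=None):
    """Return (interval, seasons) following the defaults described in the text.

    Hypothetical helper for illustration only; the error handling is an
    assumption, not documented PROC X12 behavior.
    """
    seasons_for = {"MONTH": 12, "QTR": 4}
    interval_for = {12: "MONTH", 4: "QTR"}

    if interval is not None:
        interval = interval.upper()
        if interval not in seasons_for:
            raise ValueError(f"unsupported interval: {interval}")
    if seasons is not None and seasons not in interval_for:
        raise ValueError(f"unsupported seasons: {seasons}")

    if interval is None and seasons is None:
        # With START= and neither option given, monthly data are assumed.
        return "MONTH", 12
    if interval is None:
        # SEASONS=4 implies INTERVAL=QTR; SEASONS=12 implies INTERVAL=MONTH.
        return interval_for[seasons], seasons
    if seasons is None:
        # INTERVAL=QTR implies SEASONS=4; INTERVAL=MONTH implies SEASONS=12.
        return interval, seasons_for[interval]
    if seasons_for[interval] != seasons:
        raise ValueError("INTERVAL= and SEASONS= conflict")
    return interval, seasons


print(resolve_frequency(seasons=4))
print(resolve_frequency(interval="month"))
print(resolve_frequency())
```

The sketch restricts itself to monthly and quarterly data because those are the only frequencies this chapter says the procedure adjusts.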
BY Statement

BY variables ;
A BY statement can be used with PROC X12 to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input DATA= data set to be sorted in order of the BY variables.
ID Statement

ID variables ;
If you are creating an output data set, use the ID statement to copy values of the ID variables, in addition to the table values, into the output data set. The ID statement also affects which variables are analyzed: if the VAR statement is omitted, all numeric variables that are not identified as BY variables, ID variables, the DATE= variable, or user-defined regressors are processed as time series. The ID statement has no effect when a VAR statement is specified and an output data set is not created.

If the DATE= variable is specified in the PROC X12 statement, this variable is included automatically in the OUTPUT data set. If no DATE= variable is specified, the variable _DATE_ is added. The date variable (or _DATE_) values outside the range of the actual data (from forecasting) are extrapolated, while all other ID variables are missing in the forecast horizon.
EVENT Statement

EVENT variables < / options > ;
The EVENT statement specifies EVENTs to be included in the regression portion of the regARIMA model. Multiple EVENT statements can be specified. If a MDLINFOIN= data set is not specified, then all variables specified in the EVENT statements are applied to all BY-groups and all time series that are processed. If a MDLINFOIN= data set is specified, then the EVENT statements apply only if no regression information for the BY-group and series is available in the MDLINFOIN= data set.

The EVENTs specified in the EVENT statements either must be SAS predefined EVENTs or must be defined in the data set specified in the INEVENT=SAS-data-set option of the PROC X12 statement. For a list of SAS predefined EVENTs, see the section “EVENTKEY Statement” in Chapter 6, “The HPFEVENTS Procedure” (SAS High-Performance Forecasting User’s Guide).

The EVENT statement can also be used to include outlier, level shift, and temporary change regressors that are available as predefined U.S. Census Bureau variables in the X-12-ARIMA program. For example, the following statements specify an additive outlier in January 1970 and a level shift that begins in July 1971:

proc x12 data=ICMETI seasons=12 start=jan1968;
   event AO01JAN1970D CBLS01JUL1971D;
and the following statements specify an additive outlier in the second quarter 1970 and a temporary change that begins in the fourth quarter 1971:

proc x12 data=ICMETI seasons=4 start='1970q1';
   event AO01APR1970D TC01OCT1971D;
The following options can appear in the EVENT statement. B=(value < F > . . . )
specifies initial or fixed values for the EVENT parameters. For details about the B= option, see B=(value
USERTYPE=LOM
USERTYPE=LOMSTOCK
USERTYPE=LOQ
USERTYPE=LPYEAR
USERTYPE=LS
USERTYPE=RP
USERTYPE=SCEASTER
USERTYPE=SEASONAL
USERTYPE=TC
USERTYPE=TD
USERTYPE=TDSTOCK
USERTYPE=THANKS
USERTYPE=USER
For details about the USERTYPE= option, see the USERTYPE= option in the section “REGRESSION Statement” on page 2124.
INPUT Statement

INPUT variables < / options > ;
The INPUT statement specifies variables in the PROC X12 DATA= data set that are to be used as regressors in the regression portion of the regARIMA model. The variables in the data set should contain the values for each observation that define the regressor. Future values of regression variables should also be included in the DATA= data set if the time series listed in the VAR statement is to be extended with regARIMA forecasts.

Multiple INPUT statements can be specified. If a MDLINFOIN= data set is not specified, then all variables listed in the INPUT statements are applied to all BY-groups and all time series that are processed. If a MDLINFOIN= data set is specified, then the INPUT statements apply only if no regression information for the BY-group and series is available in the MDLINFOIN= data set.

The following options can appear in the INPUT statement.

B=(value
specifies initial or fixed values for the INPUT variable parameters. For details about the B= option, see the B=(value
USERTYPE=LOQ USERTYPE=LPYEAR USERTYPE=LS USERTYPE=RP USERTYPE=SCEASTER USERTYPE=SEASONAL USERTYPE=TC USERTYPE=TD USERTYPE=TDSTOCK USERTYPE=THANKS USERTYPE=USER
For details about the USERTYPE= option, see the USERTYPE= option in the section “REGRESSION Statement” on page 2124.
ADJUST Statement

   ADJUST options ;

The ADJUST statement adjusts the series for leap year and length-of-period factors prior to estimating a regARIMA model. The “Prior Adjustment Factors” table is associated with the ADJUST statement. The following option can appear in the ADJUST statement.

PREDEFINED=LOM
PREDEFINED=LOQ
PREDEFINED=LPYEAR
   specifies length-of-month adjustment, length-of-quarter adjustment, or leap year adjustment. PREDEFINED=LOM and PREDEFINED=LOQ are equivalent; the actual adjustment is determined by the interval of the time series. Also, since leap year adjustment is a limited form of length-of-period adjustment, only one type of predefined adjustment can be specified. The PREDEFINED= option should not be used in conjunction with PREDEFINED=TD or PREDEFINED=TD1COEF in the REGRESSION statement or MODE=ADD or MODE=PSEUDOADD in the X11 statement. PREDEFINED=LPYEAR cannot be specified unless the series is log transformed.

   If the series is to be transformed by using a Box-Cox or logistic transformation, the series is first adjusted according to the ADJUST statement, then transformed.

   In the case of a length-of-month adjustment for the series with observations $Y_t$, each observation is first divided by the number of days in that month, $m_t$, and then multiplied by the average length of month (30.4375), resulting in $(30.4375 \times Y_t)/m_t$. Length-of-quarter adjustments are performed in a similar manner, resulting in $(91.3125 \times Y_t)/q_t$, where $q_t$ is the length in days of quarter $t$.
Forecasts of the transformed and adjusted data are transformed and adjusted back to the original scale for output.
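The length-of-period arithmetic described for PREDEFINED=LOM and PREDEFINED=LOQ can be checked outside SAS. The following Python sketch only illustrates the two formulas; the function names are hypothetical, and it makes no claim about how PROC X12 computes the factors internally.

```python
import calendar

AVG_MONTH_DAYS = 30.4375    # average length of month, as stated in the text
AVG_QUARTER_DAYS = 91.3125  # average length of quarter, as stated in the text


def days_in_month(year, month):
    """Number of days m_t in the given calendar month."""
    return calendar.monthrange(year, month)[1]


def days_in_quarter(year, quarter):
    """Number of days q_t in the given calendar quarter (1 to 4)."""
    first = 3 * (quarter - 1) + 1
    return sum(days_in_month(year, m) for m in range(first, first + 3))


def lom_adjust(y, year, month):
    """Length-of-month adjusted value: 30.4375 * Y_t / m_t."""
    return AVG_MONTH_DAYS * y / days_in_month(year, month)


def loq_adjust(y, year, quarter):
    """Length-of-quarter adjusted value: 91.3125 * Y_t / q_t."""
    return AVG_QUARTER_DAYS * y / days_in_quarter(year, quarter)


if __name__ == "__main__":
    # January 1970 has 31 days, February 1970 has 28 days.
    print(lom_adjust(310.0, 1970, 1))
    print(lom_adjust(280.0, 1970, 2))
    # First quarter of 1970 has 31 + 28 + 31 = 90 days.
    print(loq_adjust(900.0, 1970, 1))
```

In this sketch a January value of 310 and a February value of 280 both correspond to 10 units per day, so they map to the same adjusted level, which is the purpose of the adjustment.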
ARIMA Statement

   ARIMA options ;

The ARIMA statement specifies the ARIMA part of the regARIMA model. This statement defines a pure ARIMA model if no REGRESSION statements, INPUT statements, or EVENT statements are specified. The ARIMA part of the model can include multiplicative seasonal factors. The following option can appear in the ARIMA statement.

MODEL=((p d q) (P D Q)s)
   specifies the ARIMA model. The format follows standard Box-Jenkins notation (Box, Jenkins, and Reinsel 1994). The nonseasonal AR and MA orders are given by p and q, respectively, while the seasonal AR and MA orders are given by P and Q. The number of differences and seasonal differences are given by d and D, respectively. The notation (p d q) and (P D Q) can also be specified as (p, d, q) and (P, D, Q). The maximum lag of any AR or MA parameter is 36. The maximum value of a difference order, d or D, is 144. All values for p, d, q, P, D, and Q should be nonnegative integers. The lag that corresponds to seasonality is s; s should be a positive integer. If s is omitted, it is set equal to the value used in the SEASONS= option in the PROC X12 statement.

   For example, the following statements specify an ARIMA $(2,1,1)(1,1,0)_{12}$ model:

      proc x12 data=ICMETI seasons=12 start=jan1968;
         arima model=((2,1,1)(1,1,0));
ESTIMATE Statement

   ESTIMATE options ;

The ESTIMATE statement estimates the regARIMA model. The regARIMA model is specified by the REGRESSION, INPUT, EVENT, and ARIMA statements or by the MDLINFOIN= data set. Estimation output includes point estimates and standard errors for all estimated AR, MA, and regression parameters; the maximum likelihood estimate of the variance $\sigma^2$; t statistics for individual regression parameters; $\chi^2$ statistics for assessing the joint significance of the parameters associated with certain regression effects (if included in the model); and likelihood-based model selection statistics (if the exact likelihood function is used). The regression effects for which $\chi^2$ statistics are produced are fixed seasonal effects.

Tables displayed in the output associated with estimation are “Exact ARMA Likelihood Estimation Iteration Tolerances,” “Average Absolute Percentage Error in within-Sample Forecasts,” “ARMA Iteration History,” “AR/MA Roots,” “Exact ARMA Likelihood Estimation Iteration Summary,”
“Regression Model Parameter Estimates,” “Chi-Squared Tests for Groups of Regressors,” “Exact ARMA Maximum Likelihood Estimation,” and “Estimation Summary.”

The following options can appear in the ESTIMATE statement.

MAXITER=value
   specifies the maximum number of iterations (for estimating the AR and MA parameters) allowed. For models with regression variables, this limit applies to the total number of ARMA iterations over all iterations of the iterative generalized least squares (IGLS) algorithm. For models without regression variables, this is the maximum number of iterations allowed for the set of ARMA iterations. The default is MAXITER=200.

TOL=value
   specifies the convergence tolerance for the nonlinear estimation. Absolute changes in the log-likelihood are compared to the TOL= value to check convergence of the estimation iterations. For models with regression variables, the TOL= value is used to check convergence of the IGLS iterations (where the regression parameters are reestimated for each new set of AR and MA parameters). For models without regression variables, there are no IGLS iterations, and the TOL= value is then used to check convergence of the nonlinear iterations used to estimate the AR and MA parameters. The default value is TOL=0.00001. The minimum tolerance value is a positive value based on the machine precision and the length of the series. If a tolerance less than the minimum supported value is specified, an error message is displayed and the series is not processed.

ITPRINT
   specifies that the “Iterations History” table be displayed. This includes detailed output for estimation iterations (including log-likelihood values and parameters) and counts of function evaluations and iterations. It is useful to examine the “Iterations History” table when errors occur within estimation iterations. By default, only successful iterations are displayed, unless the PRINTERR option is specified. An unsuccessful iteration is an iteration that is restarted due to a problem such as a root inside the unit circle. Successful iterations have a status of 0. If restarted iterations are displayed, a note at the end of the table gives definitions for status codes that indicate a restarted iteration. For restarted iterations, the number of function evaluations and the number of iterations will be -1, which is displayed as missing. If regression parameters are included in the model, then both IGLS and ARMA iterations are included in the table. The number of function evaluations is a cumulative total.

PRINTERR
   causes restarted iterations to be included in the “Iterations History” table (if ITPRINT is specified) or creates the “Restarted Iterations” table (if ITPRINT is not specified). Whether or not PRINTERR is specified, a WARNING message is printed to the log file if any iteration is restarted during estimation.
FORECAST Statement

   FORECAST options ;
The FORECAST statement uses the estimated model to forecast the time series. The output contains point forecasts and forecast statistics for the transformed and original series. The following option can appear in the FORECAST statement.

LEAD=value
   specifies the number of periods ahead to forecast. The default is the number of periods in a year (4 or 12), and the maximum is 60.

Tables that contain forecasts, standard errors, and confidence limits are displayed in association with the FORECAST statement. If the data are transformed, then two tables are displayed: one table for the original data, and one table for the transformed data.
IDENTIFY Statement

   IDENTIFY options ;

The IDENTIFY statement is used to produce plots of the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) for identifying the ARIMA part of a regARIMA model. The sample ACF and PACF are produced for all combinations of the nonseasonal and seasonal differences of the data specified by the DIFF= and SDIFF= options. If the model includes a regression component (specified using the REGRESSION, INPUT, and EVENT statements or the MDLINFOIN= data set), then the ACFs and PACFs are calculated for the specified differences of the regression residuals. If the model does not include a regression component, then the ACFs and PACFs are calculated for the specified differences of the original data.

Tables displayed in association with identification are “Autocorrelation of Model Residuals” and “Partial Autocorrelation of Model Residuals.” If the model includes a regression component (specified using the REGRESSION, INPUT, and EVENT statements or the MDLINFOIN= data set), then the “Regression Model Parameter Estimates” table is also available.

The following options can appear in the IDENTIFY statement.

DIFF=(order, order, order)
   specifies orders of nonseasonal differencing to use in model identification. The value 0 specifies no differencing; the value 1 specifies one nonseasonal difference $(1-B)$; the value 2 specifies two nonseasonal differences $(1-B)^2$; and so forth. The ACFs and PACFs are produced for all orders of nonseasonal differencing specified, in combination with all orders of seasonal differencing specified in the SDIFF= option. The default is DIFF=(0). You can specify up to three values for nonseasonal differences.

SDIFF=(order, order, order)
   specifies orders of seasonal differencing to use in model identification. The value 0 specifies no seasonal differencing; the value 1 specifies one seasonal difference $(1-B^s)$; the value 2 specifies two seasonal differences $(1-B^s)^2$; and so forth. Here the value for $s$ corresponds to the period specified in the SEASONS= option in the PROC X12 statement. The value of the SEASONS= option is supplied explicitly or is implicitly supplied through the INTERVAL= option or the values of the DATE= variable. The ACFs and PACFs are produced for all orders
   of seasonal differencing specified, in combination with all orders of nonseasonal differencing specified in the DIFF= option. The default is SDIFF=(0). You can specify up to three values for seasonal differences.

   For example, the following statement produces ACFs and PACFs for two levels of differencing, $(1-B)$ and $(1-B)(1-B^s)$:

      identify diff=(1) sdiff=(0, 1);

PRINTREG
   causes the “Regression Model Parameter Estimates” table to be printed if the REGRESSION statement is present. By default, the table is not printed.
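To make the DIFF= and SDIFF= combinations concrete, the following Python sketch applies $(1-B)^d(1-B^s)^D$ to a series for every requested pair of orders. The helper names are hypothetical and the sketch covers only the differencing step; the ACF and PACF calculations that PROC X12 performs on the differenced data are not shown.

```python
def difference(x, lag=1, order=1):
    """Apply the operator (1 - B**lag)**order to the sequence x."""
    out = list(x)
    for _ in range(order):
        # Each pass subtracts the value `lag` periods earlier.
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


def identify_differences(x, diff=(0,), sdiff=(0,), s=12):
    """Return {(d, D): differenced series} for all DIFF/SDIFF combinations."""
    result = {}
    for d in diff:
        for big_d in sdiff:
            y = difference(x, lag=1, order=d)      # nonseasonal differences
            y = difference(y, lag=s, order=big_d)  # seasonal differences
            result[(d, big_d)] = y
    return result


if __name__ == "__main__":
    series = [t * t for t in range(10)]  # toy quarterly series
    combos = identify_differences(series, diff=(1,), sdiff=(0, 1), s=4)
    for orders, values in sorted(combos.items()):
        print(orders, values)
```

With diff=(1,) and sdiff=(0, 1) the sketch returns two series, mirroring the two levels of differencing produced by `identify diff=(1) sdiff=(0, 1);`.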
AUTOMDL Statement

   AUTOMDL options ;

The AUTOMDL statement is used to invoke the automatic model selection procedure of the X-12-ARIMA method. This method is based largely on the TRAMO (time series regression with ARIMA noise, missing values, and outliers) method by Gomez and Maravall (1997a, b). If the AUTOMDL statement is used without the OUTLIER statement, then only missing values regressors are included in the regARIMA model. If the AUTOMDL and the OUTLIER statements are used, then both missing values regressors and regressors for automatically identified outliers are included in the regARIMA model.

If both the AUTOMDL statement and the ARIMA statement are present, the ARIMA statement is ignored. The ARIMA statement specifies the model, while the AUTOMDL statement allows the X12 procedure to select the model. If the AUTOMDL statement is specified and a data set is specified in the MDLINFOIN= option of the PROC X12 statement, then the AUTOMDL statement is ignored if the specified data set contains a model specification for the series. If no model for the series is specified in the data set specified in the MDLINFOIN= option, then the AUTOMDL (or ARIMA) statement is used to determine the model. Thus, it is possible to give a specific model for some series and automatically identify the model for other series by using both the MDLINFOIN= option and the AUTOMDL statement.

When AUTOMDL is specified, the X12 procedure compares a model selected using a TRAMO method to a default model. The TRAMO method is implemented first, and involves two parts: identifying the orders of differencing and identifying the ARIMA model. The table “ARIMA Estimates for Unit Root Identification” provides details about the identification of the orders of differencing, while the table “Results of Unit Root Test for Identifying Orders of Differencing” shows the orders of differencing selected by TRAMO.
The table “Models Estimated by Automatic ARIMA Model Selection Procedure” provides details regarding the TRAMO automatic model selection, and the table “Best Five ARIMA Models Chosen by Automatic Modeling” ranks the best five models estimated using the TRAMO method. The next available table, “Comparison of Automatically Selected Model and Default Model,” compares the model selected by the TRAMO method to a default model. At this point in the processing, if the default model is selected over the TRAMO model, then PROC X12 displays a note. No note is displayed if the TRAMO model is selected. PROC X12 then performs checks for unit roots, over-differencing, and insignificant ARMA coefficients. If the model
is changed due to any of these tests, a note is displayed. The last table, “Final Automatic Model Selection,” shows the results of the automatic model selection.

The following options can appear in the AUTOMDL statement:

MAXORDER=(nonseasonal order, seasonal order)
   specifies the maximum orders of nonseasonal and seasonal ARMA polynomials for the automatic ARIMA model identification procedure. The maximum order for the nonseasonal ARMA parameters should be between 1 and 4; the maximum order for the seasonal ARMA should be 1 or 2.

DIFFORDER=(nonseasonal order, seasonal order)
   specifies the fixed orders of differencing to be used in the automatic ARIMA model identification procedure. When the DIFFORDER= option is used, only the AR and MA orders are automatically identified. Acceptable values for the regular differencing orders are 0, 1, and 2; acceptable values for the seasonal differencing orders are 0 and 1. If the MAXDIFF= option is also specified, then the DIFFORDER= option is ignored. There are no default values for DIFFORDER. If neither the DIFFORDER= option nor the MAXDIFF= option is specified, then the default is MAXDIFF=(2,1).

MAXDIFF=(nonseasonal order, seasonal order)
   specifies the maximum orders of regular and seasonal differencing for the automatic identification of differencing orders. When MAXDIFF is specified, the differencing orders are identified first, and then the AR and MA orders are identified. Acceptable values for the regular differencing orders are 1 and 2; the only acceptable value for the seasonal differencing order is 1. If both the MAXDIFF= option and the DIFFORDER= option are specified, then the DIFFORDER= option is ignored. If neither the DIFFORDER= nor the MAXDIFF= option is specified, the default is MAXDIFF=(2,1).

NOINT
   suppresses the fitting of a constant (or intercept) parameter in the model. (That is, the parameter is omitted.)

PRINT=UNITROOTTEST
PRINT=AUTOCHOICE
PRINT=UNITROOTTESTMDL
PRINT=AUTOCHOICEMDL
PRINT=BEST5MODEL
   lists the tables to be displayed in the output.

   PRINT=AUTOCHOICE displays the tables titled “Comparison of Automatically Selected Model and Default Model” and “Final Automatic Model Selection.” The “Comparison of Automatically Selected Model and Default Model” table compares a default model to the model chosen by the TRAMO-based automatic modeling method. The “Final Automatic Model Selection” table indicates which model has been chosen automatically. If the PRINT= option is not specified, then PRINT=AUTOCHOICE is the default.

   PRINT=UNITROOTTEST causes the table titled “Results of Unit Root Test for Identifying Orders of Differencing” to be printed. This table displays the orders that were automatically
   selected by AUTOMDL. Unless the nonseasonal and seasonal differences are specified using the DIFFORDER= option, AUTOMDL automatically identifies the orders of differencing.

   PRINT=UNITROOTTESTMDL displays the table titled “ARIMA Estimates for Unit Root Identification.” This table summarizes the various models that were considered by the TRAMO automatic selection method while identifying the orders of differencing and the statistics associated with those models. The unit root identification method first attempts to obtain the coefficients by using the Hannan-Rissanen method. If Hannan-Rissanen estimation cannot be performed, the algorithm attempts to obtain the coefficients by using conditional likelihood estimation.

   PRINT=AUTOCHOICEMDL displays the table “Models Estimated by Automatic ARIMA Model Selection Procedure.” This table summarizes the various models that were considered by the TRAMO automatic model selection method and their measures of fit.

   PRINT=BEST5MODEL displays the table “Best Five ARIMA Models Chosen by Automatic Modeling.” This table ranks the five best models that were considered by the TRAMO automatic modeling method.

BALANCED
   specifies that the automatic modeling procedure prefer balanced models over unbalanced models. A balanced model is one in which the sum of the AR, differencing, and seasonal differencing orders equals the sum of the MA and seasonal MA orders. Specifying BALANCED gives the same preference as the TRAMO program. If BALANCED is not specified, all models are given equal consideration.

HRINITIAL
   specifies that Hannan-Rissanen estimation be done before exact maximum likelihood estimation to provide initial values. If HRINITIAL is specified, then models for which the Hannan-Rissanen estimation has an unacceptable coefficient are rejected.

ACCEPTDEFAULT
   specifies that the default model be chosen if its Ljung-Box Q is acceptable.

LJUNGBOXLIMIT=value
   specifies the acceptance criterion for the confidence coefficient of the Ljung-Box Q statistic. If the confidence coefficient of the Ljung-Box Q for a final model is greater than this value, the model is rejected, the outlier critical value is reduced, and outlier identification is redone with the reduced value (see the REDUCECV option). The value specified must be greater than 0 and less than 1. The default value is 0.95.

REDUCECV=value
   specifies the percentage by which the outlier critical value is reduced when a final model is found to have an unacceptable confidence coefficient for the Ljung-Box Q statistic. This value should be between 0 and 1. The default value is 0.14286.

ARMACV=value
   specifies the threshold value for the t statistics associated with the highest order ARMA coefficients. As a check of model parsimony, the parameter estimates and t statistics of the highest order ARMA coefficients are examined to determine if the coefficient is insignificant.
   An ARMA coefficient is considered to be insignificant if the absolute value of the parameter estimate is below 0.15 for 150 or fewer observations, or below 0.1 for more than 150 observations, and the t value (displayed in the table “Exact ARMA Maximum Likelihood Estimation”) is below, in absolute value, the value specified in the ARMACV= option. If the highest order ARMA coefficient is found to be insignificant, then the order of the ARMA model is reduced. For example, if AUTOMDL identifies a (3 1 1)(0 0 1) model and the parameter estimate of the seasonal MA lag of order 1 is -0.09 and its t value is -0.55, then the ARIMA model is reduced to at least (3 1 1)(0 0 0). After the model is reestimated, the check for insignificant coefficients is performed again. If ARMACV=0.54 is specified in the preceding example, then the coefficient is not found to be insignificant and the model is not reduced.

   If a constant regressor is allowed in the model and if the t value (displayed in the table “Regression Model Parameter Estimates”) is below the ARMACV= critical value, then the constant regressor is considered to be insignificant and is removed. Note that if a constant regressor is added to or removed from the model and then the ARIMA model changes, then the t statistic for the constant regressor also changes. Thus, changing the ARMACV= value does not necessarily add or remove a constant term from the model.

   The value specified in the ARMACV= option should be greater than zero. The default value is 1.0.
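The insignificance rule for the highest order ARMA coefficient can be written as a small predicate. The following Python sketch is an illustration under stated assumptions, not PROC X12 code: the function name is hypothetical, and it assumes that both the parameter estimate and the t value are compared in absolute value.

```python
def is_insignificant(estimate, t_value, nobs, armacv=1.0):
    """Return True if a highest order ARMA coefficient would be dropped.

    The size limit is 0.15 for 150 or fewer observations and 0.1
    otherwise; the t value is compared with the ARMACV= threshold.
    Both comparisons use absolute values (an assumption of this sketch).
    """
    if armacv <= 0:
        raise ValueError("ARMACV= should be greater than zero")
    size_limit = 0.15 if nobs <= 150 else 0.10
    return abs(estimate) < size_limit and abs(t_value) < armacv


if __name__ == "__main__":
    # Small estimate and small t value: insignificant at the default 1.0.
    print(is_insignificant(0.08, 0.70, nobs=120))
    # Lowering the threshold below |t| keeps the coefficient.
    print(is_insignificant(0.08, 0.70, nobs=120, armacv=0.5))
    # A large estimate is never flagged, whatever its t value.
    print(is_insignificant(0.60, 0.70, nobs=120))
```

Both conditions must hold at once, so lowering ARMACV= makes the procedure less willing to drop a coefficient.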
OUTPUT Statement

   OUTPUT OUT= SAS-data-set tablename1 tablename2 . . . ;

The OUTPUT statement creates an output data set that contains specified tables. The data set is named by the OUT= option.

OUT=SAS-data-set
   names the data set to contain the specified tables. If the OUT= option is omitted, the SAS System names the new data set by using the default DATAn convention.

For each table to be included in the output data set, you must specify the X12 tablename keyword. The keyword corresponds to the title label used by the Census Bureau X-12-ARIMA software. Currently available tables are A1, A2, A6, A7, A8, A8AO, A8LS, A8TC, A9, A10, B1, C17, C20, D1, D7, D8, D9, D10, D10B, D10D, D11, D11A, D11F, D11R, D12, D13, D16, D16B, D18, E5, E6, E6A, E6R, E7, and MV1. If no table is specified in the OUTPUT statement, Table A1 is output to the OUT= data set by default.

The tablename keywords that can be used in the OUTPUT statement are listed in the section “Displayed Output/ODS Table Names/OUTPUT Tablename Keywords” on page 2139. The following is an example of a VAR statement and an OUTPUT statement:

   var sales costs;
   output out=out_x12 b1 d11;

Note that the default variable name used in the output data set is the input variable name followed by an underscore and the corresponding table name. The variable sales_B1 contains
the Table B1 values for the variable sales, the variable costs_B1 contains the Table B1 values for costs, while the Table D11 values for sales are contained in the variable sales_D11, and the variable costs_D11 contains the Table D11 values for costs. If necessary, the variable name is shortened so that the table name can be added. If the DATE= variable is specified in the PROC X12 statement, then that variable is included in the output data set; otherwise, a variable named _DATE_ is written to the OUT= data set as the date identifier.
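The naming convention for the OUT= variables can be sketched as follows. This Python fragment is illustrative only: the function name is hypothetical, and the truncation rule (cutting the input name so that the result fits in the 32-character SAS variable name limit) is an assumption, because the text says only that the name is shortened if necessary.

```python
MAX_NAME_LEN = 32  # maximum length of a SAS variable name


def output_variable_name(var, table):
    """Build '<var>_<TABLE>', shortening var if needed (assumed rule)."""
    suffix = "_" + table.upper()
    # Assumption: the input name is truncated on the right so that the
    # table suffix always fits within the name length limit.
    return var[: MAX_NAME_LEN - len(suffix)] + suffix


if __name__ == "__main__":
    for var in ("sales", "costs"):
        for table in ("b1", "d11"):
            print(output_variable_name(var, table))
```

For the VAR and OUTPUT statements shown above, the sketch yields the four names sales_B1, sales_D11, costs_B1, and costs_D11.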
OUTLIER Statement

   OUTLIER options ;

The OUTLIER statement specifies that the X12 procedure perform automatic detection of additive (point) outliers, temporary change outliers, level shifts, or any combination of the three when using the specified model. After outliers are identified, the appropriate regression variables are incorporated into the model as “Automatically Identified Outliers,” and the model is reestimated. This procedure is repeated until no additional outliers are found.

The OUTLIER statement also identifies potential outliers and lists them in the table “Potential Outliers” in the displayed output. Potential outliers are identified by decreasing the critical value by 0.5.

In the output, the default initial critical values used for outlier detection in a given analysis are displayed in the table “Critical Values to Use in Outlier Detection.” Outliers that are detected and incorporated into the model are displayed in the output in the table “Regression Model Parameter Estimates,” where the regression variable is listed as “Automatically Identified.”

The following options can appear in the OUTLIER statement.

SPAN=(mmmyy, mmmyy)
SPAN=('yyQq', 'yyQq')
   gives the dates of the first and last observations to define a subset for searching for outliers. A single date in parentheses is interpreted to be the starting date of the subset. To specify only the ending date, use SPAN=(,mmmyy) or SPAN=(,'yyQq'). If the starting or ending date is omitted, then the first or last date, respectively, of the input data set is assumed. A four-digit year can be specified; if a two-digit year is specified, the value specified in the YEARCUTOFF= SAS system option applies.

TYPE=NONE
TYPE=(outlier types)
   lists the outlier types to be detected by the automatic outlier identification method. TYPE=NONE turns off outlier detection. The valid outlier types are AO, LS, and TC. The default is TYPE=(AO LS).

CV=value
   specifies an initial critical value to use for detection of all types of outliers. The absolute value of the t statistic associated with an outlier parameter estimate is compared with the critical value to determine the significance of the outlier. If the CV= option is not specified, then the
   default initial critical value is computed using a formula presented by Ljung (1993), which is based on the number of observations or model span used in the analysis. Table 32.2 gives default critical values for various series lengths. Increasing the critical value decreases the sensitivity of the outlier detection routine and can reduce the number of observations treated as outliers. The automatic model identification process might lower the critical value by a certain percentage if it fails to identify an acceptable model.

   Table 32.2 Default Critical Values for Outlier Identification

      Number of Observations    Outlier Critical Value
        1                       1.96
        2                       2.24
        3                       2.44
        4                       2.62
        5                       2.74
        6                       2.84
        7                       2.92
        8                       2.99
        9                       3.04
       10                       3.09
       11                       3.13
       12                       3.16
       24                       3.42
       36                       3.55
       48                       3.63
       72                       3.73
       96                       3.80
      120                       3.85
      144                       3.89
      168                       3.92
      192                       3.95
      216                       3.97
      240                       3.99
      264                       4.01
      288                       4.03
      312                       4.04
      336                       4.05
      360                       4.07

AOCV=value
   specifies a critical value to use for additive (point) outliers. If AOCV is specified, this value overrides any default critical value for AO outliers. See the CV= option for more details.

LSCV=value
   specifies a critical value to use for level shift outliers. If LSCV is specified, this value overrides any default critical value for LS outliers. See the CV= option for more details.
TCCV=value
   specifies a critical value to use for temporary change outliers. If TCCV is specified, this value overrides any default critical value for TC outliers. See the CV= option for more details.
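The way a critical value is used can be illustrated with a short Python sketch. It looks up a default value from Table 32.2, compares the absolute t statistic with it, and flags potential outliers by lowering the critical value by 0.5. The function names are hypothetical. The sketch does not implement the Ljung (1993) formula, so only the tabulated series lengths are available, and the `reduced_cv` helper assumes that the percentage reduction applied by automatic model identification is multiplicative, which the text does not state explicitly.

```python
# Default initial critical values copied from Table 32.2.
DEFAULT_CV = {
    1: 1.96, 2: 2.24, 3: 2.44, 4: 2.62, 5: 2.74, 6: 2.84, 7: 2.92,
    8: 2.99, 9: 3.04, 10: 3.09, 11: 3.13, 12: 3.16, 24: 3.42, 36: 3.55,
    48: 3.63, 72: 3.73, 96: 3.80, 120: 3.85, 144: 3.89, 168: 3.92,
    192: 3.95, 216: 3.97, 240: 3.99, 264: 4.01, 288: 4.03, 312: 4.04,
    336: 4.05, 360: 4.07,
}


def classify_t(t_stat, cv):
    """Classify an outlier t statistic against the critical value cv."""
    size = abs(t_stat)
    if size > cv:
        return "outlier"
    if size > cv - 0.5:  # potential outliers use the value lowered by 0.5
        return "potential outlier"
    return "not flagged"


def reduced_cv(cv, reducecv=0.14286):
    """Critical value after one reduction (assumed to be multiplicative)."""
    return cv * (1.0 - reducecv)


if __name__ == "__main__":
    cv = DEFAULT_CV[144]  # 12 years of monthly data
    for t in (4.2, -3.5, 2.0):
        print(t, classify_t(t, cv))
    print(round(reduced_cv(3.5), 4))
```

Raising the critical value moves observations from the first category toward the last, which matches the statement that a larger CV= value makes detection less sensitive.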
REGRESSION Statement

   REGRESSION PREDEFINED= variables < / options > ;
   REGRESSION USERVAR= variables < / options > ;
The REGRESSION statement includes regression variables in a regARIMA model or specifies regression variables whose effects are to be removed by the IDENTIFY statement to aid in ARIMA model identification. Predefined regression variables are selected with the PREDEFINED= option. User-defined regression variables are specified with the USERVAR= option. The currently available predefined variables are listed below in Table 32.3. Table A6 in the displayed output generated by the X12 procedure provides information related to trading day effects. Table A7 provides information related to holiday effects. Tables A8, A8AO, A8LS, and A8TC provide information related to outlier factors. Ramps and level shifts are combined in the A8LS table. The A8AO, A8LS and A8TC tables are available only when more than one outlier type is present in the model. Table A9 provides information about user-defined regression effects. Table A10 provides information about the user-defined seasonal component. Missing values in the span of an input series automatically create missing value regressors. See the NOTRIMMISS option of the PROC X12 statement and the section “Missing Values” on page 2136 for further details about missing values. Combining your model with additional predefined regression variables can result in a singularity problem. If a singularity occurs, then you might need to alter either the model or the choices of the predefined regressors in order to successfully perform the regression. In order to seasonally adjust a series that uses a regARIMA model, the factors derived from the regression coefficients must be the same type as the factors generated by the seasonal adjustment procedure, so that combined adjustment factors can be derived and adjustment diagnostics can be generated. 
If the regARIMA model is applied to a log-transformed series, the regression factors are expressed in the form of ratios, which match seasonal factors generated by the multiplicative (or log-additive) adjustment modes. Conversely, if the regARIMA model is fit to the original series, the regression factors are measured on the same scale as the original series, which match seasonal factors generated by the additive adjustment mode. Note that the default transformation (no transformation) and the default seasonal adjustment mode (multiplicative) are in conflict. Thus when you specify the X11 statement and any of the REGRESSION, INPUT, or EVENT statements, it is necessary to also specify either a transform option (using the TRANSFORM statement) or a mode (using the MODE= option of the X11 statement) in order to seasonally adjust the data that uses the regARIMA model. According to Ladiray and Quenneville (2001), “X-12-ARIMA is based on the same principle [as the X-11 method] but proposes, in addition, a complete module, called Reg-ARIMA, that allows for the initial series to be corrected for all sorts of undesirable effects. These effects are estimated using regression models with ARIMA errors (Findley et al. [23]).” In order to correct the series for effects in this manner, the REGRESSION statement must be specified. The effects that can be corrected in this manner are listed in the PREDEFINED= option below.
Either the PREDEFINED= option or the USERVAR= option can be specified in a single REGRESSION statement, but not both. Multiple REGRESSION statements can be used. The following options can appear in the REGRESSION statement.

PREDEFINED=CONSTANT < / B= >
PREDEFINED=LOM
PREDEFINED=LOMSTOCK
PREDEFINED=LOQ
PREDEFINED=LPYEAR
PREDEFINED=SEASONAL
PREDEFINED=TD
PREDEFINED=TDNOLPYEAR
PREDEFINED=TD1COEF
PREDEFINED=TD1NOLPYEAR
PREDEFINED=EASTER(value)
PREDEFINED=SCEASTER(value)
PREDEFINED=LABOR(value)
PREDEFINED=THANK(value)
PREDEFINED=TDSTOCK(value)
PREDEFINED=SINCOS(value . . . )
   lists the predefined regression variables to be included in the model. Data values for these variables are calculated by the program, mostly as functions of the calendar. Table 32.3 gives definitions for the available predefined variables. The values LOM and LOQ are actually equivalent: the actual regression is controlled by the PROC X12 SEASONS= option. Multiple predefined regression variables can be used. The syntax for using both a length-of-month and a seasonal regression can be in one of the following forms:

      regression predefined=lom seasonal;
      regression predefined=(lom seasonal);
      regression predefined=lom predefined=seasonal;

   Certain restrictions apply when you use more than one predefined regression variable. Only one of TD, TDNOLPYEAR, TD1COEF, or TD1NOLPYEAR can be specified. LPYEAR cannot be used with TD, TD1COEF, LOM, LOMSTOCK, or LOQ. LOM or LOQ cannot be used with TD or TD1COEF.

   The following restriction also applies to the SINCOS predefined regression variable. If SINCOS is specified, then the INTERVAL= option or the SEASONS= option must also be specified because there are restrictions to this regression variable based on the frequency of the data.

   The predefined regression variables TDSTOCK, SCEASTER, EASTER, LABOR, THANK, and SINCOS require extra parameters. Only one TDSTOCK regressor can be implemented in the regression model. If multiple TDSTOCK variables are specified, PROC X12 uses
the last TDSTOCK variable specified. For SCEASTER, EASTER, LABOR, THANK, and SINCOS, multiple regressors can be implemented in the model by specifying the variables with different parameters. The syntax for specifying two EASTER regressors with widths 7 and 14 would be: regression predefined=easter(7) easter(14);
For SINCOS, specifying a parameter includes both the sine and the cosine regressor except for the highest order allowed (2 for quarterly data and 6 for monthly data), for which only the cosine regressor is included. The most common use of the SINCOS variable for quarterly data would be

   regression predefined=sincos(1,2);

and for monthly data would be

   regression predefined=sincos(1,2,3,4,5,6);
These statements include 3 and 11 regressors in the model, respectively.

Table 32.3 Predefined Regression Variables in X-12-ARIMA

Regression Effect: trend constant
Variable: CONSTANT
Variable Definitions: $(1-B)^{-d}(1-B^{s})^{-D}\, I(t \ge 1)$, where $I(t \ge 1) = 1$ for $t \ge 1$ and $I(t \ge 1) = 0$ for $t < 1$

Regression Effect: length-of-month (monthly flow)
Variable: LOM
Variable Definitions: $m_t - \bar{m}$, where $m_t$ = length of month $t$ (in days) and $\bar{m} = 30.4375$ (average length of month)

Regression Effect: stock length-of-month
Variable: LOMSTOCK
Variable Definitions:
$$SLOM_t = \begin{cases} m_t - \bar{m} - \mu^{(l)} & \text{for } t = 1 \\ SLOM_{t-1} + m_t - \bar{m} & \text{otherwise} \end{cases}$$
where $\bar{m}$ and $m_t$ are defined in LOM and
$$\mu^{(l)} = \begin{cases} 0.375 & \text{when 1st February in series is a leap year} \\ 0.125 & \text{when 2nd February in series is a leap year} \\ -0.125 & \text{when 3rd February in series is a leap year} \\ -0.375 & \text{when 4th February in series is a leap year} \end{cases}$$

Regression Effect: length-of-quarter (quarterly flow)
Variable: LOQ
Variable Definitions: $q_t - \bar{q}$, where $q_t$ = length of quarter $t$ (in days) and $\bar{q} = 91.3125$ (average length of quarter)

Regression Effect: leap year (monthly and quarterly flow)
Variable: LPYEAR
Variable Definitions:
$$LY_t = \begin{cases} 0.75 & \text{in leap year February (first quarter)} \\ -0.25 & \text{in other Februaries (first quarter)} \\ 0 & \text{otherwise} \end{cases}$$

Regression Effect: fixed seasonal
Variable: SEASONAL
Variable Definitions:
$$M_{1,t} = \begin{cases} 1 & \text{in January} \\ -1 & \text{in December} \\ 0 & \text{otherwise} \end{cases}, \; \ldots, \; M_{11,t} = \begin{cases} 1 & \text{in November} \\ -1 & \text{in December} \\ 0 & \text{otherwise} \end{cases}$$

Regression Effect: fixed seasonal
Variable: SINCOS(w)
Variable Definitions: $\sin(\omega_j t)$, $\cos(\omega_j t)$, where $\omega_j = 2\pi j/s$, $1 \le j \le s/2$, and $s$ is the seasonal period (drop $\sin(\omega_j t) \equiv 0$ for $j = s/2$)

Regression Effect: trading day
Variable: TD, TDNOLPYEAR
Variable Definitions: $T_{1,t}$ = (no. of Mondays) − (no. of Sundays), ..., $T_{6,t}$ = (no. of Saturdays) − (no. of Sundays)

Regression Effect: one coefficient trading day
Variable: TD1COEF, TD1NOLPYEAR
Variable Definitions: (no. of weekdays) − $\frac{5}{2}$ (no. of Saturdays and Sundays)

Regression Effect: stock trading day
Variable: TDSTOCK(w)
Variable Definitions:
$$D_{1,t} = \begin{cases} 1 & \tilde{w}\text{th day of month } t \text{ is a Monday} \\ -1 & \tilde{w}\text{th day of month } t \text{ is a Sunday} \\ 0 & \text{otherwise} \end{cases}, \; \ldots, \; D_{6,t} = \begin{cases} 1 & \tilde{w}\text{th day of month } t \text{ is a Saturday} \\ -1 & \tilde{w}\text{th day of month } t \text{ is a Sunday} \\ 0 & \text{otherwise} \end{cases}$$
where $\tilde{w}$ is the smaller of $w$ and the length of month $t$. For end-of-month stock series, set $w$ to 31; that is, specify TDSTOCK(31). Restriction: $1 \le w \le 31$.

Regression Effect: Statistics Canada Easter (monthly or quarterly flow)
Variable: SCEASTER(w)
Variable Definitions: If Easter falls before April $w$, let $n_E$ be the number of the $w$ days on or before Easter that fall in March. Then the variable is defined by cases, the first of which applies in March ...

Regression Effect: Easter holiday
Variable: EASTER(w)
Variable Definitions: $E(w,t) = \frac{1}{w} \times n_t$, where $n_t$ is the number of the $w$ days before Easter that fall in month (or quarter) $t$. (Note: This variable is 0 except in February, March, and April (or first and second quarter). It is nonzero in February only for $w > 22$.) Restriction: $1 \le w \le 25$.

Regression Effect: Labor Day
Variable: LABOR(w)
Variable Definitions: $L(w,t) = \frac{1}{w} \times$ [no. of the $w$ days before Labor Day that fall in month $t$] (Note: This variable is 0 except in August and September.) Restriction: $1 \le w \le 25$.

Regression Effect: Thanksgiving
Variable: THANK(w)
Variable Definitions: $ThC(w,t)$ = proportion of days from $w$ days before Thanksgiving through December 24 that fall in month $t$ (negative values of $w$ indicate days after Thanksgiving). (Note: This variable is 0 except in November and December.) Restriction: $-8 \le w \le 17$.
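Several of the calendar definitions in Table 32.3 are simple enough to compute directly. The following Python sketch illustrates the LOM, LPYEAR (monthly case), TD, and SINCOS definitions. The function names are hypothetical, the code is not what PROC X12 runs internally, and the SINCOS frequency is assumed to be $2\pi j/s$ with $s$ the seasonal period.

```python
import calendar
import math

MEAN_MONTH = 30.4375  # average length of month, as given in Table 32.3

def lom(year, month):
    """Length-of-month regressor: m_t minus the average month length."""
    return calendar.monthrange(year, month)[1] - MEAN_MONTH

def lpyear(year, month):
    """Leap-year regressor for monthly data: 0.75, -0.25, or 0."""
    if month != 2:
        return 0.0
    return 0.75 if calendar.isleap(year) else -0.25

def trading_day(year, month):
    """Six TD contrasts: (no. of Mondays..Saturdays) minus (no. of Sundays)."""
    counts = [0] * 7  # index 0 = Monday ... index 6 = Sunday
    ndays = calendar.monthrange(year, month)[1]
    for day in range(1, ndays + 1):
        counts[calendar.weekday(year, month, day)] += 1
    return [counts[k] - counts[6] for k in range(6)]

def sincos(t, orders, s):
    """Trigonometric regressors at time t; the sine term is dropped for j = s/2."""
    values = []
    for j in orders:
        w = 2 * math.pi * j / s  # assumed frequency 2*pi*j/s
        if 2 * j != s:
            values.append(math.sin(w * t))
        values.append(math.cos(w * t))
    return values

print(lom(2024, 2), lpyear(2024, 2), trading_day(2024, 2))
print(len(sincos(1, [1, 2], 4)), len(sincos(1, range(1, 7), 12)))
```

The last line counts the regressors generated by SINCOS(1,2) for quarterly data and SINCOS(1,2,3,4,5,6) for monthly data, which reproduces the counts of 3 and 11 quoted before the table.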
USERVAR=(variables) < / B=value USERTYPE=option >
specifies variables in the PROC X12 DATA= data set that are to be used as regressors. The variables in the data set should contain the values for each observation that define the regressor; regression variables should also be defined for forecast values if the time series is to be extended with regARIMA forecasts. Missing values are not permitted within the data span, including forecasts, of the user-defined regressors. Example 32.6 shows how you would create an input data set that contains both the series to be seasonally adjusted and a user-defined input variable. Note that all regression variables in the USERVAR= option apply to all time series to be seasonally adjusted unless the MDLINFOIN= data set specifies different regression information. B=(value
specifies initial or fixed values for the regression parameters in the order in which they appear in the PREDEFINED= and USERVAR= options. Each B= list applies to the PREDEFINED= or USERVAR= variable list that immediately precedes the slash. The PREDEFINED= option and the USERVAR= option cannot be specified in the same REGRESSION statement; however, multiple REGRESSION statements can be specified. For example, the following statements set an initial value for the user-defined regressor, x, of 1:

   regression predefined=LOM ;
   regression uservar=x / b=1 2 ;
In this example, the B= option applies only to the USERVAR= option. The value 2 is discarded since there is only one variable in the USERVAR= list. To assign an initial value of 1 to the LOM regressor and 2 to the x regressor, use the following statements:

   regression predefined=LOM / b=1;
   regression uservar=x / b=2 ;
An F immediately following the numerical value indicates that this is not an initial value, but a fixed value. See Example 32.8 for an example that uses fixed parameters. In PROC X12, individual parameters can be fixed while other parameters in the same model are estimated. USERTYPE=CONSTANT USERTYPE=SEASONAL USERTYPE=TD USERTYPE=LOM USERTYPE=LOQ USERTYPE=LPYEAR USERTYPE=TDSTOCK USERTYPE=LOMSTOCK USERTYPE=EASTER USERTYPE=LABOR USERTYPE=THANKS USERTYPE=AO USERTYPE=LS USERTYPE=RP USERTYPE=HOLIDAY USERTYPE=SCEASTER USERTYPE=USER USERTYPE=TC
enables a user-defined variable to be processed in the same manner as a U.S. Census predefined variable. For instance, the U.S. Census Bureau EASTER(w) regression effects are included in the “RegARIMA Holiday Component” table (A7). Specify USERTYPE=EASTER to have a user-defined variable processed exactly as the U.S. Census predefined EASTER(w) variable, including its inclusion in the A7 table. Each USERTYPE= list applies to the USERVAR= variable list that immediately precedes the slash. USERTYPE= does not apply to U.S. Census predefined variables. The same rules for assigning B= values to regression variables apply for USERTYPE= options. See the example in B=(value
TABLES Statement
TABLES tablename1 tablename2 . . . options ;
The TABLES statement enables you to alter the display of the PROC X12 tables. You can specify the display of tables that are not displayed by default by PROC X12, and the NOSUM option enables you to suppress the printing of the period summary line in the time series tables.
tablename1 tablename2 . . .
keywords that correspond to the title label used by the Census Bureau X-12-ARIMA software. For each table to be included in the displayed output, you must specify the X12 tablename keyword. Currently available tables are C20, D1, and D7. Although these tables are not displayed by default, their values are sometimes useful in understanding the X-12-ARIMA method. For further description of the available tables, see the section “Displayed Output/ODS Table Names/OUTPUT Tablename Keywords” on page 2139. NOSUM NOSUMMARY NOSUMMARYLINE
applies to the tables available for output in the OUTPUT Statement. By default, these tables include a summary line that gives the average, total, or standard deviation for the historical data by period. The NOSUM option suppresses the display of the summary line in the listing. Also, if the tables are output with ODS, the summary line is not an observation in the data set. Thus, the output to the data set is only the time series, both the historical data and the forecast data, if available.
TRANSFORM Statement
TRANSFORM options ;
The TRANSFORM statement transforms or adjusts the series prior to estimating a regARIMA model. With this statement, the series can be Box-Cox (power) transformed. The “Prior Adjustment Factors” table is associated with the TRANSFORM statement. Only one of the following options can appear in the TRANSFORM statement. POWER=value
transforms the input series, $Y_t$, by using a Box-Cox power transformation,

$$Y_t \rightarrow y_t = \begin{cases} \log(Y_t) & \lambda = 0 \\ \lambda^2 + (Y_t^{\lambda} - 1)/\lambda & \lambda \ne 0 \end{cases}$$

The power $\lambda$ must be specified (for example, POWER=0.33). The default is no transformation ($\lambda = 1$); that is, POWER=1. The log transformation (POWER=0), square root transformation (POWER=0.5), and the inverse transformation (POWER=-1) are equivalent to the corresponding FUNCTION= option.

Table 32.4 Power Values Related to the Census Bureau Function Argument

FUNCTION= | Transformation | Range for $Y_t$ | Equivalent Power Argument
NONE | $Y_t$ | all values | power = 1
LOG | $\log(Y_t)$ | $Y_t > 0$ for all $t$ | power = 0
SQRT | $2(\sqrt{Y_t} - 0.875)$ | $Y_t \ge 0$ for all $t$ | power = 0.5
INVERSE | $2 - \frac{1}{Y_t}$ | $Y_t \ne 0$ for all $t$ | power = -1
LOGISTIC | $\log\left(\frac{Y_t}{1 - Y_t}\right)$ | $0 < Y_t < 1$ for all $t$ | none equivalent
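The equivalences in Table 32.4 follow directly from the power transformation formula. The following Python sketch checks them numerically; the function name `x12_power_transform` is hypothetical, and the code only restates the formula given for the POWER= option, not the internals of PROC X12.

```python
import math

def x12_power_transform(y, lam):
    """Power transformation as written above:
    log(y) if lam == 0, else lam**2 + (y**lam - 1) / lam."""
    if lam == 0:
        return math.log(y)
    return lam ** 2 + (y ** lam - 1) / lam

y = 4.0
# POWER=1 leaves the series unchanged (FUNCTION=NONE).
print(x12_power_transform(y, 1), y)
# POWER=0.5 matches the SQRT form 2 * (sqrt(y) - 0.875).
print(x12_power_transform(y, 0.5), 2 * (math.sqrt(y) - 0.875))
# POWER=-1 matches the INVERSE form 2 - 1/y.
print(x12_power_transform(y, -1), 2 - 1 / y)
```

The $\lambda^2$ term is what produces the constants 0.875 and 2 in the SQRT and INVERSE rows of the table.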
FUNCTION=NONE FUNCTION=LOG FUNCTION=SQRT FUNCTION=INVERSE FUNCTION=LOGISTIC FUNCTION=AUTO TYPE=NONE | LOG | SQRT | INVERSE | LOGISTIC | AUTO
the transformation to be applied to the series prior to estimating a regARIMA model. The transformation used by FUNCTION=NONE, LOG, SQRT, INVERSE, and LOGISTIC is related to the POWER= option as shown in Table 32.4. FUNCTION=AUTO uses selection based on Akaike’s information criterion (AIC) to decide between a log transformation and no transformation. The default is FUNCTION=NONE. However, the FUNCTION= and POWER= options are not completely equivalent. In some cases, using the FUNCTION= option causes the program to automatically select other options. For instance, FUNCTION=NONE causes the default mode to be MODE=ADD in the X11 statement. Also, the choice of transformation invoked by the FUNCTION=AUTO option can impact the default mode of the X11 statement. Note that there are restrictions on the value used in the POWER and FUNCTION options when preadjustment factors for seasonal adjustment are generated from a regARIMA model. When seasonal adjustment is requested with the X11 statement, any value of the POWER option can be used for the purpose of forecasting the series with a regARIMA model. However, this is not the case when factors generated from the regression coefficients are used to adjust either the original series or the final seasonally adjusted series. In this case, the only accepted transformations are the log transformation, which can be specified as POWER=0 (for multiplicative or log-additive seasonal adjustments) and no transformation, which can be specified as POWER=1 (for additive seasonal adjustments). If no seasonal adjustment is performed, any POWER transformation can be used. The preceding restrictions also apply to FUNCTION=NONE and FUNCTION=LOG.
USERDEFINED Statement
USERDEFINED variables ;
The USERDEFINED statement is used to identify the variables in the input data set that are available for user-defined regression. Only numeric variables can be specified. Note that specifying variables in the USERDEFINED statement does not include the variables as regressors. If a variable is specified in the INPUT statement or REGRESSION USERVAR= option, it is not necessary to include that variable in the USERDEFINED statement. However, if a variable is specified in the MDLINFOIN= data set and is not specified in an INPUT statement or in the REGRESSION USERVAR= option, then the variable should be specified in the USERDEFINED statement in order to make the variable available for regression.
VAR Statement
VAR variables ;
The VAR statement is used to specify the variables in the input data set that are to be analyzed by the procedure. Only numeric variables can be specified. If the VAR statement is omitted, all numeric variables are analyzed except those that appear in a BY statement, ID statement, INPUT statement, USERDEFINED statement, the USERVAR= option of the REGRESSION statement, or the variable named in the DATE= option in the PROC X12 statement.
X11 Statement
X11 options ;
The X11 statement is an optional statement for invoking seasonal adjustment by an enhanced version of the methodology of the Census Bureau X-11 and X-11Q programs. You can control the type of seasonal adjustment decomposition calculated with the MODE= option. The output includes the final tables and diagnostics for the X-11 seasonal adjustment method listed in Table 32.5. Tables C20, D1, and D7 are not displayed by default; you can display these tables by using the TABLES statement.
Table 32.5 Tables Related to X11 Seasonal Adjustment

Table Name | Description
B1 | original series, adjusted for prior effects and forecast extended
C17 | final weights for the irregular component
C20 | final extreme value adjustment factors
D1 | modified original data, D iteration
D7 | preliminary trend cycle, D iteration
D8 | final unmodified SI ratios (differences)
D8A | F tests for stable and moving seasonality, D8
D9 | final replacement values for extreme SI ratios (differences), D iteration
D9A | moving seasonality ratios for each period
D10 | final seasonal factors
D10B | seasonal factors, adjusted for user-defined seasonal
D10D | final seasonal difference
D11 | final seasonally adjusted series
D11A | final seasonally adjusted series with forced yearly totals
D11R | rounded final seasonally adjusted series (with forced yearly totals)
D12 | final trend cycle
D13 | final irregular component
D16 | combined seasonal and trading day factors
D16B | final adjustment differences
D18 | combined calendar adjustment factors
E4 | ratio of yearly totals of original and seasonally adjusted series
E5 | percent changes (differences) in original series
E6 | percent changes (differences) in seasonally adjusted series
E6A | percent changes (differences) in seasonally adjusted series with forced yearly totals (D11.A)
E6R | percent changes (differences) in rounded seasonally adjusted series (D11.R)
E7 | percent changes (differences) in final trend component series
F2A–F2I | X11 diagnostic summary
F3 | monitoring and quality assessment statistics
F4 | day of the week trading day component factors
G | spectral plots
For more details about the X-11 seasonal adjustment diagnostics, see Shiskin, Young, and Musgrave (1967), Lothian and Morry (1978a), and Ladiray and Quenneville (2001). The following options can appear in the X11 statement. MODE=ADD MODE=MULT MODE=LOGADD MODE=PSEUDOADD
determines the mode of the seasonal adjustment decomposition to be performed. There are four choices: multiplicative (MODE=MULT), additive (MODE=ADD), pseudo-additive (MODE=PSEUDOADD), and log-additive (MODE=LOGADD) decomposition. If this option is omitted, the procedure performs multiplicative adjustments. Table 32.6 shows the values of the MODE= option and the corresponding models for the original (O) and the seasonally adjusted (SA) series.

Table 32.6 Modes of Seasonal Adjustment and Their Models

Value of Mode Option | Name | Model for O | Model for SA
MULT | multiplicative | $O = C \times S \times I$ | $SA = C \times I$
ADD | additive | $O = C + S + I$ | $SA = C + I$
PSEUDOADD | pseudo-additive | $O = C \times [S + I - 1]$ | $SA = C \times I$
LOGADD | log-additive | $\mathrm{Log}(O) = C + S + I$ | $SA = \exp(C + I)$
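To make the four models in Table 32.6 concrete, the following Python sketch composes one observation of the original series and of the seasonally adjusted series from given trend cycle (C), seasonal (S), and irregular (I) values. The function name `compose` and the numeric inputs are illustrative assumptions; the sketch shows the algebra of the table only, not how PROC X12 estimates the components.

```python
import math

def compose(mode, c, s, i):
    """Return (O, SA) for one observation under a given MODE= value,
    using the models listed in Table 32.6."""
    if mode == "MULT":
        return c * s * i, c * i
    if mode == "ADD":
        return c + s + i, c + i
    if mode == "PSEUDOADD":
        return c * (s + i - 1), c * i
    if mode == "LOGADD":
        # The model is stated for Log(O), so O is recovered with exp().
        return math.exp(c + s + i), math.exp(c + i)
    raise ValueError(f"unknown mode: {mode}")

print(compose("MULT", 100.0, 1.5, 1.0))
print(compose("ADD", 100.0, 5.0, -1.0))
print(compose("PSEUDOADD", 100.0, 1.5, 1.0))
```

In the multiplicative and pseudo-additive modes, S and I are factors centered near 1; in the additive and log-additive modes they are centered near 0.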
OUTFCST OUTFORECAST
determines whether forecasts are included in certain tables sent to the output data set. If OUTFORECAST is specified, then forecast values are included in the output data set for tables A6, A7, A8, A9, A10, B1, D10, D10B, D10D, D16, D16B, and D18. The default is not to include forecasts. SEASONALMA=S3X1 SEASONALMA=S3X3 SEASONALMA=S3X5 SEASONALMA=S3X9 SEASONALMA=S3X15 SEASONALMA=STABLE SEASONALMA=X11DEFAULT SEASONALMA=MSR
specifies which seasonal moving average (also called seasonal “filter”) be used to estimate the seasonal factors. These seasonal moving averages are n×m moving averages, meaning that an n-term simple average is taken of a sequence of consecutive m-term simple averages. X11DEFAULT is the method used by the U.S. Census Bureau’s X-11-ARIMA program. The default for PROC X12 is SEASONALMA=MSR, which is the methodology of Statistics Canada’s X-11-ARIMA/88 program. Table 32.7 describes the seasonal filter options available for the entire series:
Table 32.7 X-12-ARIMA Seasonal Filter Options and Descriptions

Filter Name | Description of Filter
S3X1 | a 3×1 moving average
S3X3 | a 3×3 moving average
S3X5 | a 3×5 moving average
S3X9 | a 3×9 moving average
S3X15 | a 3×15 moving average
STABLE | stable seasonal filter. A single seasonal factor for each calendar month or quarter is generated by calculating the simple average of all the values for each month or quarter (taken after detrending and outlier adjustment).
X11DEFAULT | a 3×3 moving average is used to calculate the initial seasonal factors in each iteration, and a 3×5 moving average to calculate the final seasonal factors
MSR | filter chosen automatically by using the moving seasonality ratio of X-11-ARIMA/88 (Dagum 1988)
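The weights implied by an n×m moving average follow from its definition as an n-term simple average of consecutive m-term simple averages. The following Python sketch derives them exactly; the function name `composite_ma_weights` is hypothetical, and the sketch covers only the symmetric central weights, not the end-point handling of the actual filters.

```python
from fractions import Fraction

def composite_ma_weights(n, m):
    """Weights of an n x m moving average: an n-term simple average
    applied to consecutive m-term simple averages."""
    weights = [Fraction(0)] * (n + m - 1)
    for a in range(n):        # position of each m-term average
        for b in range(m):    # position within that m-term average
            weights[a + b] += Fraction(1, n * m)
    return weights

print([str(w) for w in composite_ma_weights(3, 3)])
print([str(w) for w in composite_ma_weights(3, 5)])
```

A 3×3 filter therefore spans 5 terms with weights proportional to 1, 2, 3, 2, 1, and a 3×5 filter spans 7 terms with weights proportional to 1, 2, 3, 3, 3, 2, 1.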
TRENDMA=value
specifies which Henderson moving average be used to estimate the final trend cycle. Any odd number greater than one and less than or equal to 101 can be specified. Example: TRENDMA=23. If no selection is made, the program selects a trend moving average based on statistical characteristics of the data. For monthly series, a 9-, 13-, or 23-term Henderson moving average is selected. For quarterly series, the program chooses either a 5- or a 7-term Henderson moving average. FINAL=AO FINAL=LS FINAL=TC FINAL=ALL
lists the types of prior adjustment factors, obtained from the regression and outlier statements, that are to be removed from the final seasonally adjusted series. Additive outliers (FINAL=AO), level change and ramp outliers (FINAL=LS), and temporary change (FINAL=TC) can be removed. If this option is not specified, the final seasonally adjusted series contains these effects. FORCE=TOTALS FORCE=ROUND FORCE=BOTH
specifies that the seasonally adjusted series be modified to (a) force the yearly totals of the seasonally adjusted series and the original series to be the same (FORCE=TOTALS), (b) adjust the seasonally adjusted values for each calendar year so that the sum of the rounded seasonally adjusted series for any year equals the rounded annual total (FORCE=ROUND), or (c) first force the yearly totals, then round the adjusted series (FORCE=BOTH). When FORCE=TOTALS, the difference between the annual totals is distributed over the seasonally adjusted values in a way that approximately preserves the month-to-month (or quarter-to-quarter) movements of the original series. For more details, see Huot (1975) and Cholette (1979). This forcing procedure is not recommended if the seasonal pattern is changing or if trading day adjustment is performed. Forcing the seasonally adjusted totals to be the same as the original series annual totals can degrade the quality of the seasonal adjustment, especially when the seasonal pattern is undergoing change. It is not natural if trading day adjustment is performed because the aggregate trading day effect over a year is variable and moderately different from zero.
Details: X12 Procedure
Missing Values

PROC X12 can process a series with missing values. Missing values in a series are considered to be one of two types:

One type of missing value is a leading or trailing missing value, which occurs before the first nonmissing value or after the last nonmissing value, respectively, in the span of a series. The span of a series can be determined either explicitly by the SPAN= option of the PROC X12 statement or implicitly by the START= or DATE= options. By default, leading and trailing missing values are ignored. The NOTRIMMISS option of the PROC X12 statement causes leading and trailing missing values to also be processed using the X-12-ARIMA missing value method.

The second type of missing value is an embedded missing value. These missing values occur between the first nonmissing value and the last nonmissing value in the span of the series. Embedded missing values are processed using X-12-ARIMA’s missing value method described below.

When the X-12-ARIMA method encounters a missing value, it inserts an additive outlier for that observation into the set of regression variables for the model of the series and then replaces the missing observation with a value large enough to be considered an outlier during model estimation. After the regARIMA model is estimated, the X-12-ARIMA method adjusts the original series by using factors generated from these missing value outlier regressors. The adjusted values are estimates of the missing values, and the adjusted series is displayed in Table MV1.
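The distinction between leading, trailing, and embedded missing values can be illustrated with a short Python sketch. The function name `classify_missing` and the use of `None` as the missing marker are assumptions made for illustration; the sketch classifies positions only and does not reproduce the additive outlier estimation.

```python
def classify_missing(series):
    """Label each missing entry (None) as 'leading', 'trailing', or 'embedded'."""
    present = [k for k, v in enumerate(series) if v is not None]
    labels = {}
    for k, v in enumerate(series):
        if v is not None:
            continue
        # A series with no nonmissing values is treated here as all leading.
        if not present or k < present[0]:
            labels[k] = "leading"
        elif k > present[-1]:
            labels[k] = "trailing"
        else:
            labels[k] = "embedded"
    return labels

print(classify_missing([None, 10.2, None, 11.5, 12.1, None]))
```

Under the default behavior described above, only the positions labeled embedded would be handled by the missing value method; with NOTRIMMISS, the leading and trailing positions would be handled the same way.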
Combined Test for the Presence of Identifiable Seasonality The seasonal component of this time series, St , is defined as the intrayear variation that is repeated constantly (stable) or in an evolving fashion from year to year (moving seasonality). If the increase in the seasonal factors from year to year is too large, then the seasonal factors will introduce distortion into the model. It is important to determine if seasonality is identifiable without distorting the series.
For seasonality to be identifiable, the series should be identified as seasonal by using the “Test for the Presence of Seasonality Assuming Stability” and “Nonparametric Test for the Presence of Seasonality Assuming Stability.” Also, since the presence of moving seasonality can cause distortion, it is important to evaluate the moving seasonality in conjunction with the stable seasonality to determine if the seasonality is identifiable. The test for identifiable seasonality is performed by combining the F tests for stable and moving seasonality, along with a Kruskal-Wallis test for stable seasonality. The description below is based on Lothian and Morry (1978b); other details can be found in Dagum (1988, 1983). Let $F_s$ and $F_m$ denote the F value for the stable and moving seasonality tests, respectively. The combined test is performed as shown in Figure 32.3 and as follows:

1. If the null hypothesis in the stable seasonality test is not rejected at the 0.10% significance level (0.001), then since the series is not seasonal, PROC X12 displays “Identifiable Seasonality Not Present.”

2. If the null hypothesis in step 1 is rejected, then compute the following quantities:

   $$T_1 = \frac{7}{F_s}, \qquad T_2 = \frac{3F_m}{F_s}$$

   Let $T$ denote the simple average of $T_1$ and $T_2$:

   $$T = \frac{T_1 + T_2}{2}$$

   If the moving seasonality null hypothesis is not rejected at the 5.0% significance level (0.05) and if $T \ge 1.0$, the null hypothesis of identifiable seasonality not present is accepted and PROC X12 displays “Identifiable Seasonality Not Present.”

3. If the null hypothesis of identifiable seasonality not present has not been accepted, but $T_1 \ge 1.0$, $T_2 \ge 1.0$, or the Kruskal-Wallis chi-squared test fails at the 0.10% significance level (0.001), then PROC X12 displays “Identifiable Seasonality Probably Not Present.”

4. If the $F_s$ and Kruskal-Wallis chi-squared tests pass, and if none of the combined measures described in steps 2 and 3 fail, then the null hypothesis of identifiable seasonality not present is rejected, and PROC X12 displays “Identifiable Seasonality Present.”

The “Summary of Results for Combined Test for the Presence of Identifiable Seasonality” table displays the $T_1$, $T_2$, and $T$ values and the significance levels for the stable seasonality test, the moving seasonality test, and the Kruskal-Wallis test. The last item in the table is the result of the combined test for identifiable seasonality.
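The four steps of the combined test can be written as a small decision function. The following Python sketch follows the wording of the steps literally and is not the PROC X12 implementation. The function name and argument names are hypothetical; the three p-value arguments are assumed to be the significance levels of the stable seasonality F test, the moving seasonality F test, and the Kruskal-Wallis test; and $T_1$ is computed as $7/F_s$, the form used in the Lothian and Morry combined test.

```python
def combined_seasonality_test(f_s, f_m, p_stable, p_moving, p_kw):
    """Apply steps 1-4 of the combined test for identifiable seasonality."""
    # Step 1: stable seasonality null not rejected at the 0.1% level.
    if p_stable >= 0.001:
        return "Identifiable Seasonality Not Present"
    # Step 2: combined measures (T1 assumed to be 7 / F_s).
    t1 = 7.0 / f_s
    t2 = 3.0 * f_m / f_s
    t = (t1 + t2) / 2.0
    if p_moving >= 0.05 and t >= 1.0:
        return "Identifiable Seasonality Not Present"
    # Step 3: any single measure fails, or Kruskal-Wallis fails at 0.1%.
    if t1 >= 1.0 or t2 >= 1.0 or p_kw >= 0.001:
        return "Identifiable Seasonality Probably Not Present"
    # Step 4: all tests pass.
    return "Identifiable Seasonality Present"

print(combined_seasonality_test(f_s=50.0, f_m=2.0,
                                p_stable=1e-6, p_moving=0.2, p_kw=1e-6))
```

Strong stable seasonality (large $F_s$) drives both $T_1$ and $T_2$ toward zero, which is why it is required for the "Present" outcome.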
Figure 32.3 Combined Seasonality Test Flowchart
Computations For more details about the computations used in PROC X12, see X-12-ARIMA Reference Manual (U.S. Bureau of the Census 2001b). For more details about the X-11 method of decomposition, see Seasonal Adjustment with the X-11 Method (Ladiray and Quenneville 2001).
Displayed Output/ODS Table Names/OUTPUT Tablename Keywords

The options specified in PROC X12 control both the tables produced by the procedure and the tables available for output to the OUT= data set specified in the OUTPUT statement. The displayed output is organized into tables identified by a part letter and a sequence number within the part. The seven major parts of the X12 procedure are as follows:

A   prior adjustments and regARIMA components (optional)
B   preliminary estimates of irregular component weights and trading day regression factors (X11 method)
C   final estimates of irregular component weights and trading day regression factors
D   final estimates of seasonal, trend cycle, and irregular components
E   analytical tables
F   summary measures
G   charts
Table 32.8 describes the individual tables and charts. “P” indicates that the table is displayed only and is not available for output to the OUT= data set. Data from displayed tables can be extracted into data sets by using the Output Delivery System (ODS). For more information about the features of the ODS Graphics system, including the many ways that you can control or customize the plots produced by SAS procedures, see Chapter 21, “Statistical Graphics Using ODS” (SAS/STAT User’s Guide). For more information about the SAS Output Delivery system, see the SAS Output Delivery System: User’s Guide. When tables available through the OUTPUT statement are output using ODS, the summary line is included in the ODS output by default. The summary line gives the average, standard deviation, or total by each period. The value –1 for YEAR indicates that the summary line is a total; the value –2 for YEAR indicates that the summary line is an average; and the value –3 for YEAR indicates that the line is a standard deviation. The value of YEAR for historical and forecast values will be greater than or equal to zero. Thus, a negative value indicates a summary line. You can suppress the summary line altogether by specifying the NOSUM option in the TABLES statement. However, the NOSUM option also suppresses the display of the summary line in the displayed table. “T” indicates that the table is available using the OUTPUT statement, but is not displayed by default; you must request that these tables be displayed by using the TABLES Statement. The actual number of tables displayed depends on the options and statements specified. If a table is not computed, it is not displayed. Table 32.8
Table Names and Descriptions

Table | Description | Notes

IDENTIFY Tables
ACF | autocorrelation factors | P
PACF | partial autocorrelation factors | P

AUTOMDL Tables
UnitRootTestModel | ARIMA estimates for unit root identification | P
UnitRootTest | results of unit root test for identifying orders of differencing | P
AutoChoiceModel | models estimated by automatic ARIMA model selection procedure | P
Best5Model | best five ARIMA models chosen by automatic modeling | P
AutomaticModelChoice | comparison of automatically selected model and default model | P
FinalModelChoice | final automatic model choice | P

Modeling Tables
MissingExtreme | extreme or missing values | P
ARMAIterationTolerances | exact ARMA likelihood estimation iteration tolerances | P
IterHistory | ARMA iteration history | P
OutlierDetection | critical values to use in outlier detection | P
PotentialOutliers | potential outliers | P
ARMAIterationSummary | exact ARMA likelihood estimation iteration summary | P
RegParameterEstimates | regression model parameter estimates | P
RegressorGroupChiSq | chi-squared tests for groups of regressors | P
ARMAParameterEstimates | exact ARMA maximum likelihood estimation | P
AvgFcstErr | average absolute percentage error in within(out)sample fore(back)casts | P
Roots | (non)seasonal (AR)MA roots | P
MLESummary | estimation summary | P
ForecastCL | forecasts, standard errors, and confidence limits | P
MV1 | original series adjusted for missing value regressors |

Sequenced Tables
A1 | original series |
A2 | prior-adjustment factors |
A6 | regARIMA trading day component |
A7 | regARIMA holiday component |
A8 | regARIMA combined outlier component |
A8AO | regARIMA AO outlier component |
A8LS | regARIMA level change outlier component |
A8TC | regARIMA temporary change outlier component |
A9 | regARIMA user-defined regression component |
A10 | regARIMA user-defined seasonal component |
B1 | prior-adjusted or original series |
C17 | final weight for irregular components |
C20 | final extreme value adjustment factors | T
D1 | modified original data, D iteration | T
D7 | preliminary trend cycle, D iteration | T
D8 | final unmodified S-I ratios |
D8A | seasonality tests | P
D9 | final replacement values for extreme S-I ratios |
D9A | moving seasonality ratio | P
D10 | final seasonal factors |
D10B | seasonal factors, adjusted for user-defined seasonal |
D10D | final seasonal difference |
D11 | final seasonally adjusted series |
D11A | final seasonally adjusted series with forced yearly totals |
D11F | factors applied to get adjusted series with forced yearly totals |
D11R | rounded final seasonally adjusted series (with forced yearly totals) |
D12 | final trend cycle |
D13 | final irregular series |
D16 | combined adjustment factors |
D16B | final adjustment differences |
D18 | combined calendar adjustment factors |
E4 | ratios of annual totals | P
E5 | percent changes in original series |
E6 | percent changes in final seasonally adjusted series |
E6A | percent changes (differences) in seasonally adjusted series with forced yearly totals (D11.A) |
E6R | percent changes (differences) in rounded seasonally adjusted series (D11.R) |
E7 | differences in final trend cycle |
F2A-I | summary measures | P
F3 | quality assessment statistics | P
F4 | day of the week trading day component factors | P
G | spectral analysis | P
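The YEAR coding of the summary line, described in the text that introduces Table 32.8, makes it straightforward to separate time series observations from summary rows after a table has been captured with ODS. The following Python sketch shows the idea on rows represented as dictionaries. The function name, the `value` field, and the sample rows are hypothetical; only the YEAR codes (–1 total, –2 average, –3 standard deviation) come from the text.

```python
# YEAR codes for summary lines, as described in the text.
SUMMARY_KIND = {-1: "total", -2: "average", -3: "standard deviation"}

def split_summary(rows):
    """Separate time series rows from summary rows by the sign of YEAR."""
    series = [r for r in rows if r["YEAR"] >= 0]
    summary = [(SUMMARY_KIND[r["YEAR"]], r) for r in rows if r["YEAR"] < 0]
    return series, summary

# Hypothetical rows; the 'value' column name is an assumption.
rows = [
    {"YEAR": 2001, "value": 101.3},
    {"YEAR": 2002, "value": 103.9},
    {"YEAR": -2, "value": 102.6},
]
series, summary = split_summary(rows)
print(len(series), [kind for kind, _ in summary])
```

Specifying the NOSUM option in the TABLES statement makes this filtering unnecessary, at the cost of also removing the summary line from the displayed table.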
ODS Graphics

This section describes the use of ODS for creating graphics with the X12 procedure. To request these graphs, you must specify the ODS GRAPHICS statement. The graphics available through ODS GRAPHICS are ACF plots, PACF plots, and spectral graphs. ACF and PACF plots are not available unless the IDENTIFY statement is used. A spectral plot of the original series is always available; however, additional spectral plots are provided when the X11 statement is used. When the ODS GRAPHICS statement is not used, the plots are integrated into the ACF, PACF, and spectral tables as a column of the table.
ODS Graph Names

PROC X12 assigns a name to each graph it creates by using ODS. You can use these names to reference the graphs when using ODS. The names are listed in Table 32.9.

Table 32.9 ODS Graphics Produced by PROC X12

ODS Graph Name | Plot Description
ACFPlot | autocorrelation of model residuals
PACFPlot | partial autocorrelation of model residuals
SpectralPlot | spectral plot of original or adjusted series
Special Data Sets The X12 procedure can input the MDLINFOIN= and output the MDLINFOOUT= data sets. The structure of both these data sets is the same. The difference is that when the MDLINFOIN= data set is read, only information relative to specifying a model is processed, whereas the MDLINFOOUT= data set contains the results of estimating a model. The X12 procedure can also read data sets that contain EVENT definition data. The structure of these data sets is the same as in the SAS® High Performance Forecasting system.
MDLINFOIN= and MDLINFOOUT= Data Sets The MDLINFOIN= and MDLINFOOUT= data sets can contain the following variables.
BY variables
enable the model information to be specified by BY groups. BY variables can be included in this data set that match the BY variables used to process the series. If no BY variables are included, then the models specified by _NAME_ apply to all BY groups.
_NAME_
should match the variable name of the time series to which the model is to be applied. Omit the _NAME_ variable if you are specifying the same model for all series in the BY group.
_MODELTYPE_
specifies whether the observation contains regression or ARIMA information. For PROC X12, _MODELTYPE_ should either be REG to supply regression information or ARIMA to supply model information. If valid regression information exists in the MDLINFOIN= data set for the BY group and series being processed, then the REGRESSION, INPUT, and EVENT statements are ignored for that BY group and series. Likewise, if valid model information exists in the data set, then the AUTOMDL, ARIMA, and TRANSFORM statements are ignored. Valid values for the other variables in the data set depend on the value of the _MODELTYPE_ variable. While other values of _MODELTYPE_ might be permitted in other SAS procedures, PROC X12 recognizes only REG and ARIMA.
_MODELPART_
further qualifies the regression or ARIMA information in the observation. For _MODELTYPE_= REG, valid values of _MODELPART_ are INPUT, EVENT, and PREDEFINED. A value of INPUT indicates that this observation refers to the user-defined variable whose name is given in _DSVAR_. Likewise, a value of EVENT indicates that the observation refers to the SAS or user-defined EVENT whose name is given in _DSVAR_. PREDEFINED indicates that the name given in _DSVAR_ is a predefined U.S. Census Bureau variable. If only model information is included in the data set (that is, all observations have _MODELTYPE_ = ARIMA) then the _MODELPART_ variable can be omitted. However, valid values for model information are FORECAST, “.”, or blank.
_COMPONENT_
further qualifies the regression or ARIMA information in the observation. For _MODELTYPE_= REG, the only valid value of _COMPONENT_ is SCALE. Other SAS procedures might allow other values. For _MODELTYPE_= ARIMA, the valid values of _COMPONENT_ are TRANSFORM, CONSTANT, NONSEASONAL, and SEASONAL. TRANSFORM indicates that the observation contains the information that would be supplied in the TRANSFORM statement. CONSTANT is specified to control the constant term in the model. NONSEASONAL and SEASONAL refer to the AR, MA, and differencing terms in the ARIMA model.
_PARMTYPE_
further qualifies the regression or ARIMA information in the observation. For regression information, the value of _PARMTYPE_ is the same as the value of the REGRESSION USERTYPE= option. Since the USERTYPE= option applies only to user-defined events and variables, the value of _PARMTYPE_ does not alter processing in observations where _MODELPART_ = PREDEFINED. However, it is consistent to use a value for _PARMTYPE_ that matches the Census predefined variable. For the constant term in model information, _PARMTYPE_ should be SCALE. For transformation information, the value of _PARMTYPE_ should be NONE, LOG, LOGIT, SQRT, or BOXCOX. For ARIMA model information, _PARMTYPE_ should be AR, MA, or DIF.
_DSVAR_
specifies the variable name associated with this observation. For regression information, the value of _DSVAR_ is the name of the user-defined variable, the EVENT, or the Census predefined variable. For model information, _DSVAR_ should match the name of the series being processed. If the model information applies to more than one series, then _DSVAR_ can be blank or “.”.
_VALUE_
contains a numerical value that is used as a parameter for certain types of information. For certain Census predefined variables, _VALUE_ is the associated parameter value. For example, the REGRESSION statement option PREDEFINED=EASTER(6) would be implemented using _DSVAR_=EASTER and _VALUE_=6. For a BOXCOX transformation, _VALUE_ would be the associated parameter value. For _COMPONENT_=SEASONAL, if _VALUE_ is nonmissing, then _VALUE_ is used as the seasonal period. If _VALUE_ is missing for _COMPONENT_=SEASONAL, then the seasonal period is determined by the interval of the series.
2144 F Chapter 32: The X12 Procedure
_FACTOR_
applies only to AR and MA ARIMA model information. The actual value of _FACTOR_ should be the same for all observations related to lags within the same ARIMA factor. So the value of _FACTOR_ identifies lags that belong in the same factor.
_LAG_
identifies the degree for differencing and AR and MA lags. If _COMPONENT_=SEASONAL, then the value in _LAG_ is multiplied by the seasonal period indicated by the value of _VALUE_.
_SHIFT_
contains the shift value for transfer functions. This value is not processed by PROC X12.
_NOEST_
indicates whether a parameter associated with the observation is to be estimated. For example, the NOINT option would be indicated by constant information with _NOEST_=1 and _EST_=0. _NOEST_=1 indicates that the value in _EST_ is a fixed value. _NOEST_ pertains to the constant term, to AR and MA lags, and to regression information.
_EST_
contains an initial or fixed value for a parameter associated with the observation that is to be estimated. _NOEST_=1 indicates the value in _EST_ is a fixed value. _EST_ pertains to the constant term, to AR and MA lags, and to regression information.
_STDERR_
contains output information about estimated parameters.
_TVALUE_
contains output information about estimated parameters.
_PVALUE_
contains output information about estimated parameters.
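The variable descriptions above can be combined into a small MDLINFOIN= data set. The following is a minimal sketch, not a definitive template: it describes a log-transformed (0,1,1)(0,1,1) model for a series named sales. The data set name mdlin, the series name, and the particular _FACTOR_ values (1 for the nonseasonal MA factor, 2 for the seasonal MA factor) are assumptions made for illustration. _MODELPART_ is omitted because every observation has _MODELTYPE_=ARIMA, and _VALUE_ is omitted so that the seasonal period is determined by the interval of the series.

```sas
/* Hypothetical MDLINFOIN= data set: log transform plus a (0,1,1)(0,1,1) model */
data mdlin;
   length _NAME_ $32 _MODELTYPE_ $8 _COMPONENT_ $12 _PARMTYPE_ $8 _DSVAR_ $32;
   input _NAME_ $ _MODELTYPE_ $ _COMPONENT_ $ _PARMTYPE_ $ _DSVAR_ $
         _FACTOR_ _LAG_;
   datalines;
sales ARIMA TRANSFORM   LOG sales . .
sales ARIMA NONSEASONAL DIF sales . 1
sales ARIMA SEASONAL    DIF sales . 1
sales ARIMA NONSEASONAL MA  sales 1 1
sales ARIMA SEASONAL    MA  sales 2 1
;
run;

/* Model statements are omitted here because mdlin supplies the model */
proc x12 data=sales date=date mdlinfoin=mdlin;
   var sales;
   estimate;
   x11;
run;
```

Because valid model information in the MDLINFOIN= data set causes the AUTOMDL, ARIMA, and TRANSFORM statements to be ignored, none of them is specified in this sketch.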
INEVENT= Data Set
The INEVENT= data set can contain the following variables. When a variable is omitted from the data set, that variable is assumed to have the default value for all observations. The default values are given below.
_NAME_
EVENT variable name. _NAME_ is displayed with the case preserved. Since _NAME_ is a SAS variable name, the event can be referenced by using any case. The _NAME_ variable is required; there is no default.
_CLASS_
class of EVENT: SIMPLE, COMBINATION, PREDEFINED. The default for _CLASS_ is SIMPLE.
_KEYNAME_
contains either a date keyword (SIMPLE EVENT), a predefined EVENT variable name (PREDEFINED EVENT), or an EVENT name (COMBINATION event). All _KEYNAME_ values are displayed in upper case. However, if the _KEYNAME_ value refers to an EVENT name, then the actual name can be of mixed case. The default for _KEYNAME_ is no keyname, designated by “.”.
_STARTDATE_
contains either the date timing value or the first date timing value to use in a do-list. The default for _STARTDATE_ is no date, designated by a missing value.
_ENDDATE_
contains the last date timing value to use in a do-list. The default for _ENDDATE_ is no date, designated by a missing value.
_DATEINTRVL_
contains the interval for the date do-list. The default for _DATEINTRVL_ is no interval, designated by “.”.
_STARTDT_
contains either the datetime timing value or the first datetime timing value to use in a do-list. The default for _STARTDT_ is no datetime, designated by a missing value.
_ENDDT_
contains the last datetime timing value to use in a do-list. The default for _ENDDT_ is no datetime, designated by a missing value.
_DTINTRVL_
contains the interval for the datetime do-list. The default for _DTINTRVL_ is no interval, designated by “.”.
_STARTOBS_
contains either the observation number timing value or the first observation number timing value to use in a do-list. The default for _STARTOBS_ is no observation number, designated by a missing value.
_ENDOBS_
contains the last observation number timing value to use in a do-list. The default for _ENDOBS_ is no observation number, designated by a missing value.
_OBSINTRVL_
contains the interval length of the observation number do-list. The default for _OBSINTRVL_ is no interval, designated by “.”.
_TYPE_
type of EVENT. The default for _TYPE_ is POINT.
_VALUE_
value for nonzero observation. The default for _VALUE_ is 1.0.
_PULSE_
INTERVAL that defines the units for the DURATION values. The default for _PULSE_ is no interval, designated by “.”.
_DUR_BEFORE_
number of durations before the timing value. The default for _DUR_BEFORE_ is 0.
_DUR_AFTER_
number of durations after the timing value. The default for _DUR_AFTER_ is 0.
_SLOPE_BEFORE_
determines whether the curve is GROWTH or DECAY before the timing value for _TYPE_=RAMP, _TYPE_=RAMPP, and _TYPE_=TC. The default for _SLOPE_BEFORE_ is GROWTH.
_SLOPE_AFTER_
determines whether the curve is GROWTH or DECAY after the timing value for _TYPE_=RAMP, _TYPE_=RAMPP, and _TYPE_=TC. The default for _SLOPE_AFTER_ is GROWTH unless _TYPE_=TC; then the default is DECAY.
_SHIFT_
number of _PULSE_= intervals to shift the timing value. The shift can be positive (forward in time) or negative (backward in time). If _PULSE_= is not specified, then the shift is in observations. The default for _SHIFT_ is 0.
_TCPARM_
parameter for EVENT of _TYPE_=TC. The default for _TCPARM_ is 0.5.
_RULE_
rule to use when combining events or when timing values of an event overlap. The default for _RULE_ is ADD.
_PERIOD_
frequency interval at which the event should be repeated. If this value is missing, then the event is not periodic. The default for _PERIOD_ is no interval, designated by “.”.
_LABEL_
label or description for the event. If you do not specify a label, then the default label value is displayed as “.”. For events that produce dummy variables, either the user-supplied label or the default label is used. For COMPLEX events, the _LABEL_ value is merely a description of the group of events.
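A minimal INEVENT= data set can be sketched from the variable list above. The event name promo, the data set name events, and the chosen date are hypothetical; every variable that is not created takes the default value described in the list. The sketch assumes that a hand-built data set with these variables is acceptable in place of one produced by the SAS High Performance Forecasting system.

```sas
/* Hypothetical event definition: a point event at a single date */
data events;
   length _NAME_ $32 _CLASS_ $12 _TYPE_ $8;
   _NAME_      = 'promo';        /* hypothetical event name          */
   _CLASS_     = 'SIMPLE';       /* the default class, stated here   */
   _STARTDATE_ = '01NOV1989'd;   /* date timing value (assumed date) */
   _TYPE_      = 'POINT';        /* the default type, stated here    */
   _VALUE_     = 1;              /* value for nonzero observations   */
   format _STARTDATE_ date9.;
run;

/* The EVENT statement names the event defined in the INEVENT= data set */
proc x12 data=sales date=date inevent=events;
   var sales;
   event promo;
   arima model=( (0,1,1)(0,1,1) );
   estimate;
run;
```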
Examples: X12 Procedure
Example 32.1: Model Identification
The following statements are typical of those used with PROC X12 for model identification. This example invokes the X12 procedure and uses the TRANSFORM and IDENTIFY statements. It specifies the time series data, takes the logarithm of the series (TRANSFORM statement), and generates ACFs and PACFs for the specified levels of differencing (IDENTIFY statement). The ACFs and PACFs for Nonseasonal Order=1 and Seasonal Order=1 are shown in Output 32.1.1, Output 32.1.2, Output 32.1.3, and Output 32.1.4. The data set is the same as in the section “Basic Seasonal Adjustment” on page 2104. The graphical displays are requested by specifying the ODS GRAPHICS statement. For more information about the graphics available in the X12 procedure, see the section “ODS Graphics” on page 2141.

   ods graphics on;
   proc x12 data=sales date=date;
      var sales;
      transform power=0;
      identify diff=(0,1) sdiff=(0,1);
   run;
Output 32.1.1 ACFs (Nonseasonal Order=1 Seasonal Order=1)

The X12 Procedure

Autocorrelation of Model Residuals
Differencing: Nonseasonal Order=1 Seasonal Order=1
For variable sales

                    Standard
Lag   Correlation      Error   Chi-Square   DF   Pr > ChiSq
  1      -0.34112    0.08737      15.5957    1       <.0001
  2       0.10505    0.09701      17.0860    2       0.0002
  3      -0.20214    0.09787      22.6478    3       <.0001
  4       0.02136    0.10101      22.7104    4       0.0001
  5       0.05565    0.10104      23.1387    5       0.0003
  6       0.03080    0.10128      23.2709    6       0.0007
  7      -0.05558    0.10135      23.7050    7       0.0013
  8      -0.00076    0.10158      23.7050    8       0.0026
  9       0.17637    0.10158      28.1473    9       0.0009
 10      -0.07636    0.10389      28.9869   10       0.0013
 11       0.06438    0.10432      29.5887   11       0.0018
 12      -0.38661    0.10462      51.4728   12       <.0001
 13       0.15160    0.11501      54.8664   13       <.0001
 14      -0.05761    0.11653      55.3605   14       <.0001
 15       0.14957    0.11674      58.7204   15       <.0001
 16      -0.13894    0.11820      61.6452   16       <.0001
 17       0.07048    0.11944      62.4045   17       <.0001
 18       0.01563    0.11975      62.4421   18       <.0001
 19      -0.01061    0.11977      62.4596   19       <.0001
 20      -0.11673    0.11978      64.5984   20       <.0001
 21       0.03855    0.12064      64.8338   21       <.0001
 22      -0.09136    0.12074      66.1681   22       <.0001
 23       0.22327    0.12126      74.2099   23       <.0001
 24      -0.01842    0.12436      74.2652   24       <.0001
 25      -0.10029    0.12438      75.9183   25       <.0001
 26       0.04857    0.12500      76.3097   26       <.0001
 27      -0.03024    0.12514      76.4629   27       <.0001
 28       0.04713    0.12520      76.8387   28       <.0001
 29      -0.01803    0.12533      76.8943   29       <.0001
 30      -0.05107    0.12535      77.3442   30       <.0001
 31      -0.05377    0.12551      77.8478   31       <.0001
 32       0.19573    0.12569      84.5900   32       <.0001
 33      -0.12242    0.12799      87.2543   33       <.0001
 34       0.07775    0.12888      88.3401   34       <.0001
 35      -0.15245    0.12924      92.5584   35       <.0001
 36      -0.01000    0.13061      92.5767   36       <.0001
NOTE: The P-values approximate the probability of observing a Q-value at least this large when the model fitted is correct. When DF is positive, small values of P, customarily those below 0.05 indicate model inadequacy.
Output 32.1.2 Plot for ACFs (Nonseasonal Order=1 Seasonal Order=1)
Output 32.1.3 PACFs (Nonseasonal Order=1 Seasonal Order=1)

Partial Autocorrelation of Model Residuals
Differencing: Nonseasonal Order=1 Seasonal Order=1
For variable sales

                    Standard
Lag   Correlation      Error
  1      -0.34112    0.08737
  2      -0.01281    0.08737
  3      -0.19266    0.08737
  4      -0.12503    0.08737
  5       0.03309    0.08737
  6       0.03468    0.08737
  7      -0.06019    0.08737
  8      -0.02022    0.08737
  9       0.22558    0.08737
 10       0.04307    0.08737
 11       0.04659    0.08737
 12      -0.33869    0.08737
 13      -0.10918    0.08737
 14      -0.07684    0.08737
 15      -0.02175    0.08737
 16      -0.13955    0.08737
 17       0.02589    0.08737
 18       0.11482    0.08737
 19      -0.01316    0.08737
 20      -0.16743    0.08737
 21       0.13240    0.08737
 22      -0.07204    0.08737
 23       0.14285    0.08737
 24      -0.06733    0.08737
 25      -0.10267    0.08737
 26      -0.01007    0.08737
 27       0.04378    0.08737
 28      -0.08995    0.08737
 29       0.04690    0.08737
 30      -0.00490    0.08737
 31      -0.09638    0.08737
 32      -0.01528    0.08737
 33       0.01150    0.08737
 34      -0.01916    0.08737
 35       0.02303    0.08737
 36      -0.16488    0.08737
Output 32.1.4 Plot for PACFs (Nonseasonal Order=1 Seasonal Order=1)
Example 32.2: Model Estimation
After studying the output from Example 32.1 and identifying the ARIMA part of the model as, for example, (0 1 1)(0 1 1)₁₂, you can replace the IDENTIFY statement with the ARIMA and ESTIMATE statements. The parameter estimates and estimation summary statistics are shown in Output 32.2.1.

   proc x12 data=sales date=date;
      var sales;
      transform power=0;
      arima model=( (0,1,1)(0,1,1) );
      estimate;
   run;
Output 32.2.1 Estimation Data

The X12 Procedure

Exact ARMA Likelihood Estimation Iteration Tolerances
For variable sales

Maximum Total ARMA Iterations       1500
Convergence Tolerance            1.0E-05

Average absolute percentage error in within-sample forecasts:
For variable sales

Last year:            2.81
Last-1 year:          6.38
Last-2 year:          7.69
Last three years:     5.63

Exact ARMA Likelihood Estimation Iteration Summary
For variable sales

Number of ARMA iterations            6
Number of Function Evaluations      19

Exact ARMA Maximum Likelihood Estimation
For variable sales

                                      Standard
Parameter        Lag    Estimate         Error    t Value    Pr > |t|
Nonseasonal MA     1     0.40181       0.07887       5.09      <.0001
Seasonal MA       12     0.55695       0.07626       7.30      <.0001

Estimation Summary
For variable sales

Number of Residuals                    131
Number of Parameters Estimated           3
Variance Estimate                  1.3E-03
Standard Error Estimate            3.7E-02
Log likelihood                    244.6965
Transformation Adjustment        -735.2943
Adjusted Log likelihood          -490.5978
AIC                               987.1956
AICC (F-corrected-AIC)            987.3845
Hannan Quinn                      990.7005
BIC                               995.8211
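The relationships among the entries of the Estimation Summary can be checked with a short DATA step. The formulas below are the usual definitions of these criteria and are an assumption here, since they are not stated in this section; with the displayed inputs they should agree with the table to within the rounding of those inputs.

```sas
/* Recompute the information criteria from the displayed summary values */
data _null_;
   loglik = 244.6965;     /* Log likelihood                 */
   adjust = -735.2943;    /* Transformation Adjustment      */
   n      = 131;          /* Number of Residuals            */
   p      = 3;            /* Number of Parameters Estimated */

   adjloglik = loglik + adjust;                       /* Adjusted Log likelihood */
   aic  = -2*adjloglik + 2*p;                         /* AIC                     */
   aicc = -2*adjloglik + 2*p / (1 - (p + 1)/n);       /* AICC (assumed form)     */
   hq   = -2*adjloglik + 2*p*log(log(n));             /* Hannan Quinn            */
   bic  = -2*adjloglik + p*log(n);                    /* BIC                     */

   put adjloglik= aic= aicc= hq= bic=;
run;
```

Note that the three estimated parameters counted here are the two MA coefficients and the innovation variance.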
Example 32.3: Seasonal Adjustment
Assuming that the model in Example 32.2 is satisfactory, a seasonal adjustment that uses forecast extension can be performed by adding the X11 statement to the procedure. By default, the data is forecast one year ahead at the end of the series. Table D8.A is shown in Output 32.3.1.
   ods output D8A#1=SalesD8A_1;
   ods output D8A#2=SalesD8A_2;
   ods output D8A#3=SalesD8A_3;
   ods output D8A#4=SalesD8A_4;
   proc x12 data=sales date=date;
      var sales;
      transform power=0;
      arima model=( (0,1,1)(0,1,1) );
      estimate;
      x11;
   run;
   title 'Stable Seasonality Test';
   proc print data=SalesD8A_1 LABEL;
   run;
   title 'Nonparametric Stable Seasonality Test';
   proc print data=SalesD8A_2 LABEL;
   run;
   title 'Moving Seasonality Test';
   proc print data=SalesD8A_3 LABEL;
   run;
   title 'Combined Seasonality Test';
   proc print data=SalesD8A_4 LABEL NOOBS;
      var _NAME_ Name1 Label1 cValue1;
   run;
Output 32.3.1 Table D8.A as Displayed

The X12 Procedure

Table D 8.A: F-tests for Seasonality
For variable sales

Test for the Presence of Seasonality Assuming Stability

                     Sum of               Mean
                    Squares     DF      Square     F-Value
Between Months     23571.41     11    2142.855    190.9544   **
Residual            1481.28    132    11.22182
Total              25052.69    143

** Seasonality present at the 0.1 percent level.

Nonparametric Test for the Presence of Seasonality Assuming Stability

Kruskal-Wallis           Probability
     Statistic     DF          Level
      131.9546     11           .00%

Seasonality present at the one percent level.

Moving Seasonality Test

                     Sum of               Mean
                    Squares     DF      Square     F-Value
Between Years      259.2517     10    25.92517    3.370317   **
Error              846.1424    110    7.692204

**Moving seasonality present at the one percent level.

Summary of Results and Combined Test for the Presence of Identifiable Seasonality

Seasonality Tests:                             Probability Level
Stable Seasonality F-test                      0.000
Moving Seasonality F-test                      0.001
Kruskal-Wallis Chi-square Test                 0.000

Combined Measures:                             Value
T1 = 7/F_Stable                                0.04
T2 = 3*F_Moving/F_Stable                       0.05
T = (T1 + T2)/2                                0.04

Combined Test of Identifiable Seasonality:     Present
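The combined measures in Table D8.A follow directly from the two F tests. As a worked check, the following DATA step recomputes the F values from the displayed mean squares and then applies the formulas shown in the table (T1 = 7/F_Stable, T2 = 3*F_Moving/F_Stable, T = (T1 + T2)/2). The variable names are chosen here for illustration only.

```sas
/* Recompute the F values and combined measures from Output 32.3.1 */
data _null_;
   f_stable = 2142.855 / 11.22182;   /* Between Months MS / Residual MS */
   f_moving = 25.92517 / 7.692204;   /* Between Years MS / Error MS     */

   t1 = 7 / f_stable;
   t2 = 3 * f_moving / f_stable;
   t  = (t1 + t2) / 2;

   put f_stable= f_moving= t1= t2= t=;
run;
```

Rounded to two decimal places, the three measures should match the values 0.04, 0.05, and 0.04 shown in the table.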
The four ODS statements in the preceding example direct output from the D8A tables into four data sets: SalesD8A_1, SalesD8A_2, SalesD8A_3, and SalesD8A_4. It is best to direct the output to four different data sets because the four tables associated with table D8A have varying formats. The ODS data sets are shown in Output 32.3.2, Output 32.3.3, Output 32.3.4, and Output 32.3.5.

Output 32.3.2 Table D8.A as Output in a Data Set by Using ODS

Stable Seasonality Test

                                     Sum of             Mean
Obs   _NAME_   FT_SRC               Squares    DF     Square    F-Value   FT_AST
  1   sales    Between Months      23571.41    11   2142.855   190.9544   **
  2   sales    Residual             1481.28   132   11.22182          .
  3   sales    Total               25052.69   143          .          .
Output 32.3.3 Table D8.A as Output in a Data Set by Using ODS

Nonparametric Stable Seasonality Test

               Kruskal-Wallis          Probability
Obs   _NAME_        Statistic    DF          Level
  1   sales          131.9546    11           .00%

Output 32.3.4 Table D8.A as Output in a Data Set by Using ODS

Moving Seasonality Test

                                     Sum of             Mean
Obs   _NAME_   FT_SRC               Squares    DF     Square    F-Value   FT_AST
  1   sales    Between Years       259.2517    10   25.92517   3.370317   **
  2   sales    Error               846.1424   110   7.692204          .

Output 32.3.5 Table D8.A as Output in a Data Set by Using ODS

Combined Seasonality Test

_NAME_   Name1        Label1                                        cValue1
sales                 Seasonality Tests:                            Probability Level
sales
sales    P_STABLE     Stable Seasonality F-test                     0.000
sales    P_MOV        Moving Seasonality F-test                     0.001
sales    P_KW         Kruskal-Wallis Chi-square Test                0.000
sales
sales                 Combined Measures:                            Value
sales
sales    T1           T1 = 7/F_Stable                               0.04
sales    T2           T2 = 3*F_Moving/F_Stable                      0.05
sales    T            T = (T1 + T2)/2                               0.04
sales
sales    IDSeasTest   Combined Test of Identifiable Seasonality:    Present
Example 32.4: RegARIMA Automatic Model Selection
This example demonstrates two of the new features available through the X-12-ARIMA method that are not available by using the previous X-11 and X-11-ARIMA methods: regARIMA modeling and TRAMO-based automatic model selection. Assume that the same data set is used as in the previous examples.

   title 'TRAMO Automatic Model Identification';
   ods select ModelEstimation.AutoModel.UnitRootTestModel
              ModelEstimation.AutoModel.UnitRootTest
              ModelEstimation.AutoModel.AutoChoiceModel
              ModelEstimation.AutoModel.Best5Model
              ModelEstimation.AutoModel.AutomaticModelChoice
              ModelEstimation.AutoModel.FinalModelChoice
              ModelEstimation.AutoModel.AutomdlNote;
   proc x12 data=sales date=date;
      var sales;
      transform function=log;
      regression predefined=td;
      automdl maxorder=(1,1)
              print=unitroottest unitroottestmdl autochoicemdl best5model;
      estimate;
      x11;
      output out=out(obs=23) a1 a2 a6 b1 c17 c20 d1 d7 d8 d9 d10 d11 d12 d13 d16 d18;
   run;
   proc print data=out(obs=23);
      title 'Output Variables Related to Trading Day Regression';
   run;
The automatic model selection output is shown in Output 32.4.1, Output 32.4.2, and Output 32.4.3. The first table, “ARIMA Estimates for Unit Root Identification,” gives details of the method that TRAMO uses to automatically select the orders of differencing. The second table, “Results of Unit Root Test for Identifying Orders of Differencing,” shows that TRAMO has determined a regular difference order of 1 and a seasonal difference order of 1. The third table, “Models estimated by Automatic ARIMA Model Selection procedure,” shows all the models examined by the TRAMO-based method. The fourth table, “Best Five ARIMA Models Chosen by Automatic Modeling,” shows the top five models in order of rank and their BIC2 statistic. The fifth table, “Comparison of Automatically Selected Model and Default Model,” compares the model selected by the TRAMO-based method to the default X-12-ARIMA model. The sixth table, “Final Automatic Model Selection,” shows which model was actually selected.
Output 32.4.1 Output from the AUTOMDL Statement

TRAMO Automatic Model Identification

The X12 Procedure

ARIMA Estimates for Unit Root Identification
For variable sales

 Model   Estimation                             ARMA
Number   Method       Estimated Model           Parameter    Estimate
     1   H-R          ( 2, 0, 0)( 1, 0, 0)      NS_AR_1       0.67540
         H-R          ( 2, 0, 0)( 1, 0, 0)      NS_AR_2       0.28425
         H-R          ( 2, 0, 0)( 1, 0, 0)      S_AR_12       0.91963
     2   H-R          ( 1, 1, 1)( 1, 0, 1)      NS_AR_1       0.13418
         H-R          ( 1, 1, 1)( 1, 0, 1)      S_AR_12       0.98500
         H-R          ( 1, 1, 1)( 1, 0, 1)      NS_MA_1       0.47884
         H-R          ( 1, 1, 1)( 1, 0, 1)      S_MA_12       0.51726
     3   H-R          ( 1, 1, 1)( 1, 1, 1)      NS_AR_1      -0.39269
         H-R          ( 1, 1, 1)( 1, 1, 1)      S_AR_12       0.06223
         H-R          ( 1, 1, 1)( 1, 1, 1)      NS_MA_1      -0.09570
         H-R          ( 1, 1, 1)( 1, 1, 1)      S_MA_12       0.58536

Results of Unit Root Test for Identifying Orders of Differencing
For variable sales

   Regular      Seasonal
difference    difference           Mean
     order         order    Significant
         1             1             no
Output 32.4.2 Output from the AUTOMDL Statement

Models estimated by Automatic ARIMA Model Selection procedure
For variable sales

 Model                             ARMA                     Statistics of Fit
Number   Estimated Model           Parameter    Estimate         BIC       BIC2
     1   ( 3, 1, 0)( 0, 1, 0)                               1024.469   -3.40549
         ( 3, 1, 0)( 0, 1, 0)      NS_AR_1      -0.33524
         ( 3, 1, 0)( 0, 1, 0)      NS_AR_2      -0.05558
         ( 3, 1, 0)( 0, 1, 0)      NS_AR_3      -0.15649
     2   ( 3, 1, 0)( 0, 1, 1)                               993.7880   -3.63970
         ( 3, 1, 0)( 0, 1, 1)      NS_AR_1      -0.33186
         ( 3, 1, 0)( 0, 1, 1)      NS_AR_2      -0.05823
         ( 3, 1, 0)( 0, 1, 1)      NS_AR_3      -0.15200
         ( 3, 1, 0)( 0, 1, 1)      S_MA_12       0.55279
     3   ( 3, 1, 0)( 1, 1, 0)                               1000.224   -3.59057
         ( 3, 1, 0)( 1, 1, 0)      NS_AR_1      -0.38673
         ( 3, 1, 0)( 1, 1, 0)      NS_AR_2      -0.08768
         ( 3, 1, 0)( 1, 1, 0)      NS_AR_3      -0.18143
         ( 3, 1, 0)( 1, 1, 0)      S_AR_12      -0.47336
     4   ( 3, 1, 0)( 1, 1, 1)                               998.0548   -3.60713
         ( 3, 1, 0)( 1, 1, 1)      NS_AR_1      -0.34352
         ( 3, 1, 0)( 1, 1, 1)      NS_AR_2      -0.06504
         ( 3, 1, 0)( 1, 1, 1)      NS_AR_3      -0.15728
         ( 3, 1, 0)( 1, 1, 1)      S_AR_12      -0.12163
         ( 3, 1, 0)( 1, 1, 1)      S_MA_12       0.47073
     5   ( 0, 1, 0)( 0, 1, 1)                               996.8560   -3.61628
         ( 0, 1, 0)( 0, 1, 1)      S_MA_12       0.60446
     6   ( 0, 1, 1)( 0, 1, 1)                               986.6405   -3.69426
         ( 0, 1, 1)( 0, 1, 1)      NS_MA_1       0.36272
         ( 0, 1, 1)( 0, 1, 1)      S_MA_12       0.55599
     7   ( 1, 1, 0)( 0, 1, 1)                               987.1500   -3.69037
         ( 1, 1, 0)( 0, 1, 1)      NS_AR_1      -0.32734
         ( 1, 1, 0)( 0, 1, 1)      S_MA_12       0.55834
     8   ( 1, 1, 1)( 0, 1, 1)                               991.2363   -3.65918
         ( 1, 1, 1)( 0, 1, 1)      NS_AR_1       0.17833
         ( 1, 1, 1)( 0, 1, 1)      NS_MA_1       0.52867
         ( 1, 1, 1)( 0, 1, 1)      S_MA_12       0.56212
     9   ( 0, 1, 1)( 0, 1, 0)                               1017.770   -3.45663
         ( 0, 1, 1)( 0, 1, 0)      NS_MA_1       0.36005
Output 32.4.3 Output from the AUTOMDL Statement

TRAMO Automatic Model Identification

The X12 Procedure

Automatic ARIMA Model Selection
Methodology based on research by Gomez and Maravall (2000).

Best Five ARIMA Models Chosen by Automatic Modeling
For variable sales

Rank   Estimated Model               BIC2
   1   ( 0, 1, 1)( 0, 1, 1)      -3.69426
   2   ( 1, 1, 0)( 0, 1, 1)      -3.69037
   3   ( 1, 1, 1)( 0, 1, 1)      -3.65918
   4   ( 0, 1, 0)( 0, 1, 1)      -3.61628
   5   ( 0, 1, 1)( 0, 1, 0)      -3.45663

Comparison of Automatically Selected Model and Default Model
For variable sales

                                                       Statistics of Fit
Source of Candidate Models   Estimated Model             Plbox       Rvr
Automatic Model Choice       ( 0, 1, 1)( 0, 1, 1)      0.62560   0.03546
Airline Model (Default)      ( 0, 1, 1)( 0, 1, 1)      0.62561   0.03546

Comparison of Automatically Selected Model and Default Model
For variable sales

                                                       Statistics of Fit
                                                                             Number of
Source of Candidate Models   Estimated Model             Plbox       Rvr    Outliers
Automatic Model Choice       ( 0, 1, 1)( 0, 1, 1)      0.62560   0.03546           0
Airline Model (Default)      ( 0, 1, 1)( 0, 1, 1)      0.62561   0.03546           0

Final Automatic Model Selection
For variable sales

Source of Model              Estimated Model
Automatic Model Choice       ( 0, 1, 1)( 0, 1, 1)
Table 32.10 and Output 32.4.4 illustrate the regARIMA modeling method. Table 32.10 shows the relationships among the PROC X12 output variables that result from a regARIMA model. Note that some of these formulas apply only to this example. Output 32.4.4 shows the values of these variables for the first 23 observations in the example.
Table 32.10 regARIMA Output Variables and Descriptions

Table   Title                                              Type     Formula
A1      time series data (for the span analyzed)           data     input
A2      prior-adjustment factors; leap year (from          factor   calculated from regression
        trading day regression) adjustments
A6      regARIMA trading day component; leap year          factor   calculated from regression
        prior adjustments included from Table A2
B1      original series (prior adjusted)                   data     B1 = A1/A6 *
        (adjusted for regARIMA factors)
C17     final weights for irregular component              factor   calculated using moving standard deviation
C20     final extreme value adjustment factors             factor   calculated using C16 and C17
D1      modified original data, D iteration                data     D1 = B1/C20 **
                                                                    D1 = C19/C20
D7      preliminary trend cycle, D iteration               data     calculated using Henderson moving average
D8      final unmodified SI ratios                         factor   D8 = B1/D7 ***
                                                                    D8 = C19/D7
D9      final replacement values for SI ratios             factor   if C17 shows extreme values, D9 = D1/D7;
                                                                    D9 = . otherwise
D10     final seasonal factors                             factor   calculated using moving averages
D11     final seasonally adjusted data                     data     D11 = B1/D10 ****
        (also adjusted for trading day)                             D11 = C19/D10
D12     final trend cycle                                  data     calculated using Henderson moving average
D13     final irregular component                          factor   D13 = D11/D12
D16     combined adjustment factors                        factor   D16 = A1/D11
        (includes seasonal, trading day factors)
D18     combined calendar adjustment factors               factor   D18 = D16/D10
        (includes trading day factors)                              D18 = A6 *****

*     because only TD specified
**    C19 = B1 in this example
***   TD specified in regression
****  B1 = C19 for this example
***** regression TD is the only calendar adjustment factor in this example
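Several of the formulas in Table 32.10 can be confirmed numerically from the OUT data set that the OUTPUT statement of this example creates. The following sketch recomputes five of the series; the names of the check variables and of the data set check are hypothetical, and the series names (sales_A1, sales_A6, and so on) are taken from the column headings of the printed output.

```sas
/* Recompute selected regARIMA output series from their defining formulas */
data check;
   set out;
   b1_check  = sales_A1  / sales_A6;    /* B1  = A1/A6  (only TD specified) */
   d8_check  = sales_B1  / sales_D7;    /* D8  = B1/D7                      */
   d13_check = sales_D11 / sales_D12;   /* D13 = D11/D12                    */
   d16_check = sales_A1  / sales_D11;   /* D16 = A1/D11                     */
   d18_check = sales_D16 / sales_D10;   /* D18 = D16/D10                    */
run;

proc print data=check(obs=5);
   var date sales_B1 b1_check sales_D8 d8_check sales_D13 d13_check
       sales_D16 d16_check sales_D18 d18_check;
run;
```

For the first observation, for instance, 112/1.01328 is approximately 110.532, which is the printed value of sales_B1.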
Output 32.4.4 Output Variables Related to Trading Day Regression

Output Variables Related to Trading Day Regression

                                                        sales_    sales_
Obs   DATE    sales_A1   sales_A2   sales_A6   sales_B1      C17       C20   sales_D1   sales_D7
  1   SEP78        112    1.00000    1.01328    110.532  1.00000   1.00000    110.532    124.138
  2   OCT78        118    1.00000    0.99727    118.323  1.00000   1.00000    118.323    124.905
  3   NOV78        132    1.00000    0.98960    133.388  1.00000   1.00000    133.388    125.646
  4   DEC78        129    1.00000    1.00957    127.777  1.00000   1.00000    127.777    126.231
  5   JAN79        121    1.00000    0.99408    121.721  1.00000   1.00000    121.721    126.557
  6   FEB79        135    0.99115    0.99115    136.205  1.00000   1.00000    136.205    126.678
  7   MAR79        148    1.00000    1.00966    146.584  1.00000   1.00000    146.584    126.825
  8   APR79        148    1.00000    0.99279    149.075  1.00000   1.00000    149.075    127.038
  9   MAY79        136    1.00000    0.99406    136.813  1.00000   1.00000    136.813    127.433
 10   JUN79        119    1.00000    1.01328    117.440  1.00000   1.00000    117.440    127.900
 11   JUL79        104    1.00000    0.99727    104.285  1.00000   1.00000    104.285    128.499
 12   AUG79        118    1.00000    0.99678    118.381  1.00000   1.00000    118.381    129.253
 13   SEP79        115    1.00000    1.00229    114.737  0.98630   0.99964    114.778    130.160
 14   OCT79        126    1.00000    0.99408    126.751  0.88092   1.00320    126.346    131.238
 15   NOV79        141    1.00000    1.00366    140.486  1.00000   1.00000    140.486    132.699
 16   DEC79        135    1.00000    0.99872    135.173  1.00000   1.00000    135.173    134.595
 17   JAN80        125    1.00000    0.99406    125.747  0.00000   0.95084    132.248    136.820
 18   FEB80        149    1.02655    1.03400    144.100  1.00000   1.00000    144.100    139.215
 19   MAR80        170    1.00000    0.99872    170.217  1.00000   1.00000    170.217    141.559
 20   APR80        170    1.00000    0.99763    170.404  1.00000   1.00000    170.404    143.777
 21   MAY80        158    1.00000    1.00966    156.489  1.00000   1.00000    156.489    145.925
 22   JUN80        133    1.00000    0.99279    133.966  1.00000   1.00000    133.966    148.133
 23   JUL80        114    1.00000    0.99406    114.681  0.00000   0.94057    121.927    150.682

                             sales_    sales_    sales_    sales_    sales_    sales_
Obs   sales_D8   sales_D9       D10       D11       D12       D13       D16       D18
  1    0.89040          .   0.90264   122.453   124.448   0.98398   0.91463   1.01328
  2    0.94731          .   0.94328   125.438   125.115   1.00258   0.94070   0.99727
  3    1.06161          .   1.06320   125.459   125.723   0.99790   1.05214   0.98960
  4    1.01225          .   0.99534   128.375   126.205   1.01720   1.00487   1.00957
  5    0.96179          .   0.97312   125.083   126.479   0.98896   0.96735   0.99408
  6    1.07521          .   1.05931   128.579   126.587   1.01574   1.04994   0.99115
  7    1.15580          .   1.17842   124.391   126.723   0.98160   1.18980   1.00966
  8    1.17347          .   1.18283   126.033   126.902   0.99315   1.17430   0.99279
  9    1.07360          .   1.06125   128.916   127.257   1.01303   1.05495   0.99406
 10    0.91822          .   0.91663   128.121   127.747   1.00293   0.92881   1.01328
 11    0.81156          .   0.81329   128.226   128.421   0.99848   0.81107   0.99727
 12    0.91589          .   0.91135   129.897   129.316   1.00449   0.90841   0.99678
 13    0.88151    0.88182   0.90514   126.761   130.347   0.97249   0.90722   1.00229
 14    0.96581    0.96273   0.93820   135.100   131.507   1.02732   0.93264   0.99408
 15    1.05869          .   1.06183   132.306   132.937   0.99525   1.06571   1.00366
 16    1.00429          .   0.99339   136.072   134.720   1.01004   0.99212   0.99872
 17    0.91906    0.96658   0.97481   128.996   136.763   0.94321   0.96902   0.99406
 18    1.03509          .   1.06153   135.748   138.996   0.97663   1.09762   1.03400
 19    1.20245          .   1.17965   144.295   141.221   1.02177   1.17814   0.99872
 20    1.18520          .   1.18499   143.802   143.397   1.00283   1.18218   0.99763
 21    1.07239          .   1.06005   147.624   145.591   1.01397   1.07028   1.00966
 22    0.90436          .   0.91971   145.662   147.968   0.98442   0.91307   0.99279
 23    0.76108    0.80917   0.81275   141.103   150.771   0.93588   0.80792   0.99406
Example 32.5: Automatic Outlier Detection
This example demonstrates the use of the OUTLIER statement to automatically detect and remove outliers from a time series to be seasonally adjusted. The data set is the same as in the section “Basic Seasonal Adjustment” on page 2104 and the previous examples. Adding the OUTLIER statement to Example 32.3 requests that outliers be detected by using the default critical value as described in the section “OUTLIER Statement” on page 2122. The tables associated with outlier detection for this example are shown in Output 32.5.1. The first table shows the critical values; the second table shows that a single potential outlier was identified; the third table shows the estimates for the ARMA parameters. Since no outliers are included in the regression model, the “Regression Model Parameter Estimates” table is not displayed. Because only a potential outlier was identified, and not an actual outlier, in this case the A1 series and the B1 series are identical.

   title 'Automatic Outlier Identification';
   proc x12 data=sales date=date;
      var sales;
      transform function=log;
      arima model=( (0,1,1)(0,1,1) );
      outlier;
      estimate;
      x11;
      output out=nooutlier a1 b1 d10;
   run;
Output 32.5.1 PROC X12 Output When Potential Outliers Are Identified

Automatic Outlier Identification

The X12 Procedure

Critical Values to use in Outlier Detection
For variable sales

Begin                 SEP1978
End                   AUG1990
Observations          144
Method                Add One
AO Critical Value     3.889838
LS Critical Value     3.889838

NOTE: The following time series values might later be identified as outliers when data are added or revised. They were not identified as outliers in this run either because their test t-statistics were slightly below the critical value or because they were eliminated during the backward deletion step of the identification procedure, when a non-robust t-statistic is used.

Potential Outliers
For variable sales

Type of                  t Value    t Value
Outlier    Date           for AO     for LS
AO         NOV1989         -3.48      -1.51

Exact ARMA Maximum Likelihood Estimation
For variable sales

                                      Standard
Parameter        Lag    Estimate         Error    t Value    Pr > |t|
Nonseasonal MA     1     0.40181       0.07887       5.09      <.0001
Seasonal MA       12     0.55695       0.07626       7.30      <.0001
In the next example, reducing the critical value to 3.3 causes the outlier identification routine to more aggressively identify outliers as shown in Output 32.5.2. The first table shows the critical values. The second table shows that three additive outliers and a level shift have been included in the regression model. The third table shows how the inclusion of outliers in the model affects the ARMA parameters.

   proc x12 data=sales date=date;
      var sales;
      transform function=log;
      arima model=((0,1,1) (0,1,1));
      outlier cv=3.3;
      estimate;
      x11;
      output out=outlier(obs=50) a1 a8 a8ao a8ls b1 d10;
   run;
   proc print data=outlier(obs=50);
   run;
Output 32.5.2 PROC X12 Output When Outliers Are Identified

Automatic Outlier Identification

The X12 Procedure

Critical Values to use in Outlier Detection
For variable sales

Begin                 SEP1978
End                   AUG1990
Observations          144
Method                Add One
AO Critical Value     3.3
LS Critical Value     3.3

Regression Model Parameter Estimates
For variable sales

                                                              Standard
Type                       Parameter      NoEst    Estimate      Error    t Value    Pr > |t|
Automatically Identified   AO JAN1981     Est       0.09590    0.02168       4.42      <.0001
                           LS FEB1983     Est      -0.09673    0.02488      -3.89      0.0002
                           AO OCT1983     Est      -0.08032    0.02146      -3.74      0.0003
                           AO NOV1989     Est      -0.10323    0.02480      -4.16      <.0001

Exact ARMA Maximum Likelihood Estimation
For variable sales

                                      Standard
Parameter        Lag    Estimate         Error    t Value    Pr > |t|
Nonseasonal MA     1     0.33205       0.08239       4.03      <.0001
Seasonal MA       12     0.49647       0.07676       6.47      <.0001
The first 50 observations of the A1, A8, A8AO, A8LS, B1, and D10 series are displayed in Output 32.5.3. You can confirm the following relationships from the data:

   A8 = A8AO × A8LS
   B1 = A1 / A8

The seasonal factors are stored in the variable sales_D10.

Output 32.5.3 PROC X12 Output Series Related to Outlier Detection

Automatic Outlier Identification

   Obs   DATE   sales_A1   sales_A8   sales_A8AO   sales_A8LS   sales_B1   sales_D10
     1   SEP78     112      1.10156     1.00000      1.10156     101.674     0.90496
     2   OCT78     118      1.10156     1.00000      1.10156     107.121     0.94487
     3   NOV78     132      1.10156     1.00000      1.10156     119.830     1.04711
     4   DEC78     129      1.10156     1.00000      1.10156     117.107     1.00119
     5   JAN79     121      1.10156     1.00000      1.10156     109.844     0.94833
     6   FEB79     135      1.10156     1.00000      1.10156     122.553     1.06817
     7   MAR79     148      1.10156     1.00000      1.10156     134.355     1.18679
     8   APR79     148      1.10156     1.00000      1.10156     134.355     1.17607
     9   MAY79     136      1.10156     1.00000      1.10156     123.461     1.07565
    10   JUN79     119      1.10156     1.00000      1.10156     108.029     0.91844
    11   JUL79     104      1.10156     1.00000      1.10156      94.412     0.81206
    12   AUG79     118      1.10156     1.00000      1.10156     107.121     0.91602
    13   SEP79     115      1.10156     1.00000      1.10156     104.397     0.90865
    14   OCT79     126      1.10156     1.00000      1.10156     114.383     0.94131
    15   NOV79     141      1.10156     1.00000      1.10156     128.000     1.04496
    16   DEC79     135      1.10156     1.00000      1.10156     122.553     0.99766
    17   JAN80     125      1.10156     1.00000      1.10156     113.475     0.94942
    18   FEB80     149      1.10156     1.00000      1.10156     135.263     1.07172
    19   MAR80     170      1.10156     1.00000      1.10156     154.327     1.18663
    20   APR80     170      1.10156     1.00000      1.10156     154.327     1.18105
    21   MAY80     158      1.10156     1.00000      1.10156     143.433     1.07383
    22   JUN80     133      1.10156     1.00000      1.10156     120.738     0.91930
    23   JUL80     114      1.10156     1.00000      1.10156     103.490     0.81385
    24   AUG80     140      1.10156     1.00000      1.10156     127.093     0.91466
    25   SEP80     145      1.10156     1.00000      1.10156     131.632     0.91302
    26   OCT80     150      1.10156     1.00000      1.10156     136.171     0.93086
    27   NOV80     178      1.10156     1.00000      1.10156     161.589     1.03965
    28   DEC80     163      1.10156     1.00000      1.10156     147.972     0.99440
    29   JAN81     172      1.21243     1.10065      1.10156     141.864     0.95136
    30   FEB81     178      1.10156     1.00000      1.10156     161.589     1.07981
    31   MAR81     199      1.10156     1.00000      1.10156     180.653     1.18661
    32   APR81     199      1.10156     1.00000      1.10156     180.653     1.19097
    33   MAY81     184      1.10156     1.00000      1.10156     167.036     1.06905
    34   JUN81     162      1.10156     1.00000      1.10156     147.064     0.92446
    35   JUL81     146      1.10156     1.00000      1.10156     132.539     0.81517
    36   AUG81     166      1.10156     1.00000      1.10156     150.695     0.91148
    37   SEP81     171      1.10156     1.00000      1.10156     155.234     0.91352
    38   OCT81     180      1.10156     1.00000      1.10156     163.405     0.91632
    39   NOV81     193      1.10156     1.00000      1.10156     175.206     1.03194
    40   DEC81     181      1.10156     1.00000      1.10156     164.312     0.98879
    41   JAN82     183      1.10156     1.00000      1.10156     166.128     0.95699
    42   FEB82     218      1.10156     1.00000      1.10156     197.901     1.09125
    43   MAR82     230      1.10156     1.00000      1.10156     208.795     1.19059
    44   APR82     242      1.10156     1.00000      1.10156     219.688     1.20448
    45   MAY82     209      1.10156     1.00000      1.10156     189.731     1.06355
    46   JUN82     191      1.10156     1.00000      1.10156     173.391     0.92897
    47   JUL82     172      1.10156     1.00000      1.10156     156.142     0.81476
    48   AUG82     194      1.10156     1.00000      1.10156     176.114     0.90667
    49   SEP82     196      1.10156     1.00000      1.10156     177.930     0.91200
    50   OCT82     196      1.10156     1.00000      1.10156     177.930     0.89970
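The relationships between the A1, A8, A8AO, A8LS, and B1 series can also be checked programmatically rather than by inspection. The following is a minimal sketch, assuming the OUTLIER data set and the variable names shown in Output 32.5.3; the data set name CHECK, the variables DIFF_A8 and DIFF_B1, and the 1E-4 tolerance are hypothetical choices for illustration.

```sas
/* Sketch: keep only rows of OUTLIER where the stated identities   */
/* fail by more than an assumed tolerance of 1E-4.                 */
/*    A8 = A8AO * A8LS      B1 = A1 / A8                           */
data check;
   set outlier;
   diff_a8 = abs(sales_A8 - sales_A8AO*sales_A8LS);
   diff_b1 = abs(sales_B1 - sales_A1/sales_A8);
   if diff_a8 > 1e-4 or diff_b1 > 1e-4;
run;

proc print data=check;
run;
```

For example, observation 29 (JAN81) gives 1.10065 × 1.10156 ≈ 1.21243 and 172 / 1.21243 ≈ 141.864, in agreement with the displayed values, so that row would not be expected to appear in CHECK.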
From the two previous examples, you can examine how outlier detection affects the seasonally adjusted series. Output 32.5.4 shows a plot of A1 versus B1 in the series where outliers are detected. B1 has been adjusted for the additive outliers and the level shift.

   proc sgplot data=outlier;
      series x=date y=sales_A1 / name='A1' markers
                                 markerattrs=(color=red symbol='circle')
                                 lineattrs=(color=red);
      series x=date y=sales_B1 / name='B1' markers
                                 markerattrs=(color=black symbol='asterisk')
                                 lineattrs=(color=black);
      yaxis label='Original and Outlier Adjusted Time Series';
   run;
Output 32.5.4 Original Series and Outlier Adjusted Series
Output 32.5.5 compares the seasonal factors (table D10) of the series unadjusted for outliers to the series adjusted for outliers. The seasonal factors are based on the B1 series.

   data both;
      merge nooutlier(rename=(sales_D10=unadj_D10)) outlier;
   run;

   title 'Results of Outlier Identification on Final Seasonal Factors';
   proc sgplot data=both;
      series x=date y=unadj_D10 / name='unadjusted' markers
                                  markerattrs=(color=red symbol='circle')
                                  lineattrs=(color=red)
                                  legendlabel='Unadjusted for Outliers';
      series x=date y=sales_D10 / name='adjusted' markers
                                  markerattrs=(color=blue symbol='asterisk')
                                  lineattrs=(color=blue)
                                  legendlabel='Adjusted for Outliers';
      yaxis label='Final Seasonal Factors';
   run;
Output 32.5.5 Seasonal Factors Based on Original and Outlier Adjusted Series
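The merge of the two output data sets relies on the observations being in the same order. As a hedged variant, assuming both NOOUTLIER and OUTLIER contain the DATE variable and are already sorted by it, the merge can be aligned explicitly with a BY statement:

```sas
/* Sketch: same merge as in the example, but matched on DATE.      */
/* Assumes both data sets are sorted by DATE, as PROC X12 output   */
/* for a single series would normally be.                          */
data both;
   merge nooutlier(rename=(sales_D10=unadj_D10)) outlier;
   by date;
run;
```

Matching on the time ID guards against misalignment if one of the data sets is later subset or re-sorted.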
Example 32.6: User-Defined Regressors

This example demonstrates the use of the USERVAR= option in the REGRESSION statement to include user-defined regressors in the regARIMA model. The user-defined regressors must be defined as nonmissing values for the span of the series being modeled plus any forecast values. Suppose you have the data set SALESDATA with 132 monthly observations beginning in January of 1949.

   title 'Data Set to be Seasonally Adjusted';
   data salesdata;
      set sashelp.air(obs=132);
   run;
Since the regARIMA model forecasts one year ahead, the user-defined regressor must be defined for 144 observations that start in January of 1949. You can construct a simple length-of-month regressor by using the following DATA step.

   title 'User-defined Regressor for Data to be Seasonally Adjusted';
   data regressors(keep=date LengthOfMonth);
      set sashelp.air;
      LengthOfMonth = INTNX('MONTH',date,1) - date;
   run;
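Because SAS date values count days, subtracting the first day of a month from the first day of the following month (which is what INTNX returns here with its default alignment) yields the number of days in the month. A quick sanity check, sketched under the assumption that the REGRESSORS data set was created as above, is to tabulate the regressor:

```sas
/* Sketch: tabulate the constructed regressor. Only the values     */
/* 28, 29, 30, and 31 should appear if DATE holds month starts.    */
proc freq data=regressors;
   tables LengthOfMonth;
run;
```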
The two data sets must be merged in order to use them as input to PROC X12. The BY statement is used to align the regressors with the time series by the time ID variable DATE.

   title 'Data Set Containing Series and Regressors';
   data datain;
      merge regressors salesdata;
      by date;
   run;

   proc print data=datain(firstobs=121);
   run;
The last 24 observations of the input data set are displayed in Output 32.6.1. Note that the regressor variable is defined for one year (12 observations) beyond the span of the time series to be seasonally adjusted.
Output 32.6.1 PROC X12 Input Data Set with User-Defined Regressor

Data Set Containing Series and Regressors

                    Length
   Obs    DATE     OfMonth    AIR
   121    JAN59       31      360
   122    FEB59       28      342
   123    MAR59       31      406
   124    APR59       30      396
   125    MAY59       31      420
   126    JUN59       30      472
   127    JUL59       31      548
   128    AUG59       31      559
   129    SEP59       30      463
   130    OCT59       31      407
   131    NOV59       30      362
   132    DEC59       31      405
   133    JAN60       31        .
   134    FEB60       29        .
   135    MAR60       31        .
   136    APR60       30        .
   137    MAY60       31        .
   138    JUN60       30        .
   139    JUL60       31        .
   140    AUG60       31        .
   141    SEP60       30        .
   142    OCT60       31        .
   143    NOV60       30        .
   144    DEC60       31        .
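The requirement that the regressor be nonmissing over the series span plus the forecast horizon can be confirmed with a count of missing values. This is a minimal sketch, assuming the DATAIN data set built above:

```sas
/* Sketch: count nonmissing (N) and missing (NMISS) values.        */
/* Given the construction above, AIR should have 132 nonmissing    */
/* and 12 missing values; LengthOfMonth should have 144 and 0.     */
proc means data=datain n nmiss;
   var AIR LengthOfMonth;
run;
```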
The DATAIN data set is now ready to be used as input to PROC X12. Note that the DATE= variable and the user-defined regressors are automatically excluded from the variables to be seasonally adjusted.

   title 'regARIMA Model with User-defined Regressor';
   proc x12 data=datain date=DATE interval=MONTH;
      transform function=log;
      regression uservar=LengthOfMonth;
      automdl;
      x11;
   run;

The parameter estimates for the regARIMA model are shown in Output 32.6.2.

Output 32.6.2 PROC X12 Output for User-Defined Regression Parameter

regARIMA Model with User-defined Regressor
The X12 Procedure

Regression Model Parameter Estimates
For variable AIR

                                                     Standard
   Type           Parameter       NoEst   Estimate      Error   t Value   Pr > |t|
   User Defined   LengthOfMonth   Est      0.04683    0.01834      2.55     0.0119

Exact ARMA Maximum Likelihood Estimation
For variable AIR

                                      Standard
   Parameter         Lag   Estimate      Error   t Value   Pr > |t|
   Nonseasonal MA      1    0.33678    0.08506      3.96     0.0001
   Seasonal MA        12    0.54078    0.07726      7.00     <.0001
Example 32.7: MDLINFOIN= and MDLINFOOUT= Data Sets

This example illustrates the use of MDLINFOIN= and MDLINFOOUT= data sets. Using the data set shown, the PROC X12 step identifies the model with outliers as displayed in Output 32.7.1. Output 32.7.2 shows the data set that represents the chosen model.

   data b1;
      input y @@;
      datalines;
   112 118 132 129
   121 135 148 148
   136 119 104 118
   115 126 141 135
   125 149 270 170
   158 133 114 140
   ;

   title 'Model Identification Output to MDLINFOOUT= Data Set';
   proc x12 data=b1 start='1980q1' interval=qtr MdlInfoOut=mdl;
      automdl;
      outlier;
   run;

   proc print data=mdl;
   run;
Output 32.7.1 Displayed Model Identification with Outliers

Model Identification Output to MDLINFOOUT= Data Set
The X12 Procedure

Critical Values to use in Outlier Detection
For variable y

   Begin                1980Q1
   End                  1985Q4
   Observations         24
   Method               Add One
   AO Critical Value    3.419415
   LS Critical Value    3.419415

Final Automatic Model Selection
For variable y

   Source of Model           Estimated Model
   Automatic Model Choice    ( 2, 1, 0)( 0, 0, 0)

Regression Model Parameter Estimates
For variable y

                                                           Standard
   Type                       Parameter    NoEst   Estimate     Error   t Value   Pr > |t|
   Automatically Identified   AO 1984Q3    Est    102.36589   5.96584     17.16     <.0001

Exact ARMA Maximum Likelihood Estimation
For variable y

                                      Standard
   Parameter         Lag   Estimate      Error   t Value   Pr > |t|
   Nonseasonal AR      1    0.40892    0.20213      2.02     0.0554
                       2   -0.53710    0.20975     -2.56     0.0178
Output 32.7.2 PROC X12 MDLINFOOUT= Data Set Model with Outlier Detection

Model Identification Output to MDLINFOOUT= Data Set

   Obs  _NAME_  _MODELTYPE_  _MODELPART_  _COMPONENT_  _PARMTYPE_  _DSVAR_       _VALUE_  _FACTOR_
     1  y       REG          EVENT        SCALE        AO          AO01JUL1984D     .        .
     2  y       ARIMA        FORECAST     NONSEASONAL  DIF         y                .        .
     3  y       ARIMA        FORECAST     NONSEASONAL  AR          y                .        1
     4  y       ARIMA        FORECAST     NONSEASONAL  AR          y                .        1

   Obs  _LAG_  _SHIFT_  _NOEST_    _EST_   _STDERR_  _TVALUE_  _PVALUE_  _STATUS_  _SCORE_  _LABEL_
     1    .       .        0     102.366    5.96584   17.1587  0.000000      .
     2    1       .        .        .        .          .       .            .
     3    1       .        0       0.409    0.20213    2.0231  0.055385      .
     4    2       .        0      -0.537    0.20975   -2.5606  0.017830      .
Suppose that after examining the output from the preceding example, you decide that an Easter regressor should be added to the model. The following statements create a data set with the model identified above and add a U.S. Census Bureau predefined Easter(25) regression. The new model data set to be used as input in the MDLINFOIN= option is shown in Output 32.7.3.

   data pluseaster;
      _NAME_ = 'y';
      _MODELTYPE_ = 'REG';
      _MODELPART_ = 'PREDEFINED';
      _COMPONENT_ = 'SCALE';
      _PARMTYPE_ = 'EASTER';
      _DSVAR_ = 'EASTER';
      _VALUE_ = 25;
   run;

   data mdlpluseaster;
      set mdl;
   run;

   proc append base=mdlpluseaster data=pluseaster force;
   run;

   proc print data=mdlpluseaster;
   run;
Output 32.7.3 MDLINFOIN= Data Set Model with Easter(25) Regression Added

Model Identification Output to MDLINFOOUT= Data Set

   Obs  _NAME_  _MODELTYPE_  _MODELPART_  _COMPONENT_  _PARMTYPE_  _DSVAR_       _VALUE_  _FACTOR_
     1  y       REG          EVENT        SCALE        AO          AO01JUL1984D     .        .
     2  y       ARIMA        FORECAST     NONSEASONAL  DIF         y                .        .
     3  y       ARIMA        FORECAST     NONSEASONAL  AR          y                .        1
     4  y       ARIMA        FORECAST     NONSEASONAL  AR          y                .        1
     5  y       REG          PREDEFINED   SCALE        EASTER      EASTER          25        .

   Obs  _LAG_  _SHIFT_  _NOEST_    _EST_   _STDERR_  _TVALUE_  _PVALUE_  _STATUS_  _SCORE_  _LABEL_
     1    .       .        0     102.366    5.96584   17.1587  0.000000      .
     2    1       .        .        .        .          .       .            .
     3    1       .        0       0.409    0.20213    2.0231  0.055385      .
     4    2       .        0      -0.537    0.20975   -2.5606  0.017830      .
     5    .       .        .        .        .          .       .            .
The following statements estimate the regression and ARIMA parameters by using the model described in the new data set mdlpluseaster. The results of estimating the new model are shown in Output 32.7.4.

   proc x12 data=b1 start='1980q1' interval=qtr
            MdlInfoIn=mdlpluseaster MdlInfoOut=mdl2;
      estimate;
   run;
Output 32.7.4 Estimate Model with Added Easter(25) Regression

Model Identification Output to MDLINFOOUT= Data Set
The X12 Procedure

Regression Model Parameter Estimates
For variable y

                                                   Standard
   Type           Parameter      NoEst   Estimate     Error   t Value   Pr > |t|
   Easter         Easter[25]     Est      6.15738   4.89162      1.26     0.2219
   User Defined   AO01JUL1984D   Est    105.29433   6.15636     17.10     <.0001

Exact ARMA Maximum Likelihood Estimation
For variable y

                                      Standard
   Parameter         Lag   Estimate      Error   t Value   Pr > |t|
   Nonseasonal AR      1    0.44376    0.20739      2.14     0.0443
                       2   -0.54050    0.21656     -2.50     0.0210
The new model estimation results are displayed in the data set mdl2 shown in Output 32.7.5.

   proc print data=mdl2;
   run;

Output 32.7.5 MDLINFOOUT= Data Set, Estimation of Model with Easter(25) Regression Added

Model Identification Output to MDLINFOOUT= Data Set

   Obs  _NAME_  _MODELTYPE_  _MODELPART_  _COMPONENT_  _PARMTYPE_  _DSVAR_       _VALUE_  _FACTOR_
     1  y       REG          PREDEFINED   SCALE        EASTER      EASTER          25        .
     2  y       REG          EVENT        SCALE        AO          AO01JUL1984D     .        .
     3  y       ARIMA        FORECAST     NONSEASONAL  DIF         y                .        .
     4  y       ARIMA        FORECAST     NONSEASONAL  AR          y                .        1
     5  y       ARIMA        FORECAST     NONSEASONAL  AR          y                .        1

   Obs  _LAG_  _SHIFT_  _NOEST_    _EST_   _STDERR_  _TVALUE_  _PVALUE_  _STATUS_  _SCORE_  _LABEL_
     1    .       .        0       6.157    4.89162    1.2588   0.22193      .
     2    .       .        0     105.294    6.15636   17.1033   0.00000      .
     3    1       .        .        .        .          .        .           .
     4    1       .        0       0.444    0.20739    2.1397   0.04428      .
     5    2       .        0      -0.541    0.21656   -2.4959   0.02096      .
Example 32.8: Setting Regression Parameters

This example illustrates the use of fixed regression parameters in PROC X12. Suppose that you have the same data set as in the section “Basic Seasonal Adjustment” on page 2104. You would like to use TRAMO to automatically identify a model with a U.S. Census Bureau Easter(25) regression. The displayed results are shown in Output 32.8.1.

   title 'Estimate Easter(25) Parameter';
   proc x12 data=sales date=date MdlInfoOut=mdlout1;
      var sales;
      regression predefined=easter(25);
      automdl;
   run;
Output 32.8.1 Automatic Model ID with Easter(25) Regression

Estimate Easter(25) Parameter
The X12 Procedure

Regression Model Parameter Estimates
For variable sales

                                            Standard
   Type     Parameter    NoEst   Estimate      Error   t Value   Pr > |t|
   Easter   Easter[25]   Est     -5.09298    3.50786     -1.45     0.1489

Exact ARMA Maximum Likelihood Estimation
For variable sales

                                      Standard
   Parameter         Lag   Estimate      Error   t Value   Pr > |t|
   Nonseasonal AR      1    0.62148    0.09279      6.70     <.0001
                       2    0.23354    0.10385      2.25     0.0262
                       3   -0.07191    0.09055     -0.79     0.4285
   Nonseasonal MA      1    0.97377    0.03771     25.82     <.0001
   Seasonal MA        12    0.10558    0.10205      1.03     0.3028
The MDLINFOOUT= data set, mdlout1, that contains the model and parameter estimates is shown in Output 32.8.2.

   proc print data=mdlout1;
   run;

Output 32.8.2 MDLINFOOUT= Data Set, Estimation of Automatic Model ID with Easter(25) Regression

Estimate Easter(25) Parameter

   Obs  _NAME_  _MODELTYPE_  _MODELPART_  _COMPONENT_  _PARMTYPE_  _DSVAR_  _VALUE_  _FACTOR_
     1  sales   REG          PREDEFINED   SCALE        EASTER      EASTER      25        .
     2  sales   ARIMA        FORECAST     NONSEASONAL  DIF         sales        .        .
     3  sales   ARIMA        FORECAST     SEASONAL     DIF         sales        .        .
     4  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales        .        1
     5  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales        .        1
     6  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales        .        1
     7  sales   ARIMA        FORECAST     NONSEASONAL  MA          sales        .        1
     8  sales   ARIMA        FORECAST     SEASONAL     MA          sales        .        2

   Obs  _LAG_  _SHIFT_  _NOEST_     _EST_   _STDERR_  _TVALUE_  _PVALUE_  _STATUS_  _SCORE_  _LABEL_
     1    .       .        0     -5.09298    3.50786   -1.4519   0.14894      .
     2    1       .        .        .         .          .        .           .
     3    1       .        .        .         .          .        .           .
     4    1       .        0      0.62148    0.09279    6.6980   0.00000      .
     5    2       .        0      0.23354    0.10385    2.2488   0.02621      .
     6    3       .        0     -0.07191    0.09055   -0.7942   0.42851      .
     7    1       .        0      0.97377    0.03771   25.8240   0.00000      .
     8    1       .        0      0.10558    0.10205    1.0346   0.30277      .
To fix the Easter(25) parameter while adding a regressor that is weighted according to the number of Saturdays in a month, either use the REGRESSION and EVENT statements or create a MDLINFOIN= data set. The following statements show the method for using the REGRESSION statement to fix the EASTER parameter and the EVENT statement to add the SATURDAY regressor. The output is shown in Output 32.8.3.

   title 'Use SAS Statements to Alter Model';
   proc x12 data=sales date=date MdlInfoOut=mdlout2grm;
      var sales;
      regression predefined=easter(25) / b=-5.029298 F;
      event Saturday;
      automdl;
   run;
Output 32.8.3 Automatic Model ID with Fixed Easter(25) and Saturday Regression

Use SAS Statements to Alter Model
The X12 Procedure

Regression Model Parameter Estimates
For variable sales

                                                  Standard
   Type           Parameter    NoEst   Estimate      Error   t Value   Pr > |t|
   User Defined   Saturday     Est      3.23225    1.16701      2.77     0.0064
   Easter         Easter[25]   Fixed   -5.02930     .            .        .

Exact ARMA Maximum Likelihood Estimation
For variable sales

                                      Standard
   Parameter         Lag   Estimate      Error   t Value   Pr > |t|
   Nonseasonal AR      1   -0.32506    0.08256     -3.94     0.0001
To fix the EASTER regressor and add the new SATURDAY regressor by using a DATA step, you would create the data set mdlin2 as shown. The data set mdlin2 is displayed in Output 32.8.4.

   title 'Use a SAS DATA Step to Create a MdlInfoIn= Data Set';
   data plusSaturday;
      _NAME_ = 'sales';
      _MODELTYPE_ = 'REG';
      _MODELPART_ = 'EVENT';
      _COMPONENT_ = 'SCALE';
      _PARMTYPE_ = 'USER';
      _DSVAR_ = 'SATURDAY';
   run;

   data mdlin2;
      set mdlout1;
      if ( _DSVAR_ = 'EASTER' ) then do;
         _NOEST_ = 1;
         _EST_ = -5.029298;
      end;
   run;

   proc append base=mdlin2 data=plusSaturday force;
   run;

   proc print data=mdlin2;
   run;
Output 32.8.4 MDLINFOIN= Data Set, Fixed Easter(25) and Added Saturday Regression, Previously Identified Model

Use a SAS DATA Step to Create a MdlInfoIn= Data Set

   Obs  _NAME_  _MODELTYPE_  _MODELPART_  _COMPONENT_  _PARMTYPE_  _DSVAR_   _VALUE_  _FACTOR_
     1  sales   REG          PREDEFINED   SCALE        EASTER      EASTER       25        .
     2  sales   ARIMA        FORECAST     NONSEASONAL  DIF         sales         .        .
     3  sales   ARIMA        FORECAST     SEASONAL     DIF         sales         .        .
     4  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales         .        1
     5  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales         .        1
     6  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales         .        1
     7  sales   ARIMA        FORECAST     NONSEASONAL  MA          sales         .        1
     8  sales   ARIMA        FORECAST     SEASONAL     MA          sales         .        2
     9  sales   REG          EVENT        SCALE        USER        SATURDAY      .        .

   Obs  _LAG_  _SHIFT_  _NOEST_     _EST_   _STDERR_  _TVALUE_  _PVALUE_  _STATUS_  _SCORE_  _LABEL_
     1    .       .        1     -5.02930    3.50786   -1.4519   0.14894      .
     2    1       .        .        .         .          .        .           .
     3    1       .        .        .         .          .        .           .
     4    1       .        0      0.62148    0.09279    6.6980   0.00000      .
     5    2       .        0      0.23354    0.10385    2.2488   0.02621      .
     6    3       .        0     -0.07191    0.09055   -0.7942   0.42851      .
     7    1       .        0      0.97377    0.03771   25.8240   0.00000      .
     8    1       .        0      0.10558    0.10205    1.0346   0.30277      .
     9    .       .        .        .         .          .        .           .
The data set mdlin2 can be used to replace the regression and model information contained in the REGRESSION, EVENT, and AUTOMDL statements. Note that the model specified in the mdlin2 data set is the same model as the automatically identified model. The following example uses the mdlin2 data set as input; the results are displayed in Output 32.8.5.

   title 'Use Updated Data Set to Alter Model';
   proc x12 data=sales date=date MdlInfoIn=mdlin2 MdlInfoOut=mdlout2DS;
      var sales;
      estimate;
   run;
Output 32.8.5 Estimate MDLINFOIN= File for Model with Fixed Easter(25) and Saturday Regression, Previously Identified Model

Use Updated Data Set to Alter Model
The X12 Procedure

Regression Model Parameter Estimates
For variable sales

                                                  Standard
   Type           Parameter    NoEst   Estimate      Error   t Value   Pr > |t|
   User Defined   SATURDAY     Est      3.41762    1.07641      3.18     0.0019
   Easter         Easter[25]   Fixed   -5.02930     .            .        .

Exact ARMA Maximum Likelihood Estimation
For variable sales

                                      Standard
   Parameter         Lag   Estimate      Error   t Value   Pr > |t|
   Nonseasonal AR      1    0.62225    0.09175      6.78     <.0001
                       2    0.30429    0.10109      3.01     0.0031
                       3   -0.14862    0.08859     -1.68     0.0958
   Nonseasonal MA      1    0.97125    0.03798     25.57     <.0001
   Seasonal MA        12    0.11691    0.10000      1.17     0.2445
The following statements specify almost the same information as contained in the data set mdlin2. Note that the ARIMA statement is used to specify the lags of the model. However, the initial AR and MA parameter values are the default. When using the mdlin2 data set as input, the initial values can be specified. The results are displayed in Output 32.8.6.

   title 'Use SAS Statements to Alter Model';
   proc x12 data=sales date=date MdlInfoOut=mdlout3grm;
      var sales;
      regression predefined=easter(25) / b=-5.029298 F;
      event Saturday;
      arima model=((3 1 1)(0 1 1));
      estimate;
   run;

   proc print data=mdlout3grm;
   run;
Output 32.8.6 MDLINFOOUT= Statement, Fixed Easter(25) and Added Saturday Regression, Previously Identified Model

Use SAS Statements to Alter Model

   Obs  _NAME_  _MODELTYPE_  _MODELPART_  _COMPONENT_  _PARMTYPE_  _DSVAR_   _VALUE_  _FACTOR_
     1  sales   REG          EVENT        SCALE        USER        Saturday      .        .
     2  sales   REG          PREDEFINED   SCALE        EASTER      EASTER       25        .
     3  sales   ARIMA        FORECAST     NONSEASONAL  DIF         sales         .        .
     4  sales   ARIMA        FORECAST     SEASONAL     DIF         sales         .        .
     5  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales         .        1
     6  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales         .        1
     7  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales         .        1
     8  sales   ARIMA        FORECAST     NONSEASONAL  MA          sales         .        1
     9  sales   ARIMA        FORECAST     SEASONAL     MA          sales         .        2

   Obs  _LAG_  _SHIFT_  _NOEST_     _EST_   _STDERR_  _TVALUE_  _PVALUE_  _STATUS_  _SCORE_  _LABEL_
     1    .       .        0      3.41760    1.07640    3.1750   0.00187      .
     2    .       .        1     -5.02930     .          .        .           .
     3    1       .        .        .         .          .        .           .
     4    1       .        .        .         .          .        .           .
     5    1       .        0      0.62228    0.09175    6.7825   0.00000      .
     6    2       .        0      0.30431    0.10109    3.0103   0.00314      .
     7    3       .        0     -0.14864    0.08859   -1.6779   0.09579      .
     8    1       .        0      0.97128    0.03796   25.5881   0.00000      .
     9    1       .        0      0.11684    0.10000    1.1684   0.24481      .
The MDLINFOOUT= data set provides a method for comparing the results of the model identification. The data set mdlout3grm that is the result of using the ARIMA MODEL= option can be compared to the data set mdlout2DS that is the result of using the MDLINFOIN= data set with initial values for the AR and MA parameters. The mdlout2DS data set is shown in Output 32.8.7, and the results of the comparison are shown in Output 32.8.8. The slight difference in the estimated parameters can be attributed to the difference in the initial values for the AR and MA parameters.
   proc print data=mdlout2DS;
   run;

Output 32.8.7 MDLINFOOUT= Data Set, Fixed Easter(25) and Added Saturday Regression, Previously Identified Model

Use SAS Statements to Alter Model

   Obs  _NAME_  _MODELTYPE_  _MODELPART_  _COMPONENT_  _PARMTYPE_  _DSVAR_   _VALUE_  _FACTOR_
     1  sales   REG          EVENT        SCALE        USER        SATURDAY      .        .
     2  sales   REG          PREDEFINED   SCALE        EASTER      EASTER       25        .
     3  sales   ARIMA        FORECAST     NONSEASONAL  DIF         sales         .        .
     4  sales   ARIMA        FORECAST     SEASONAL     DIF         sales         .        .
     5  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales         .        1
     6  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales         .        1
     7  sales   ARIMA        FORECAST     NONSEASONAL  AR          sales         .        1
     8  sales   ARIMA        FORECAST     NONSEASONAL  MA          sales         .        1
     9  sales   ARIMA        FORECAST     SEASONAL     MA          sales         .        2

   Obs  _LAG_  _SHIFT_  _NOEST_     _EST_   _STDERR_  _TVALUE_  _PVALUE_  _STATUS_  _SCORE_  _LABEL_
     1    .       .        0      3.41762    1.07641    3.1750   0.00187      .
     2    .       .        1     -5.02930     .          .        .           .
     3    1       .        .        .         .          .        .           .
     4    1       .        .        .         .          .        .           .
     5    1       .        0      0.62225    0.09175    6.7817   0.00000      .
     6    2       .        0      0.30429    0.10109    3.0100   0.00314      .
     7    3       .        0     -0.14862    0.08859   -1.6776   0.09584      .
     8    1       .        0      0.97125    0.03798   25.5712   0.00000      .
     9    1       .        0      0.11691    0.10000    1.1691   0.24451      .

   title 'Compare Results of SAS Statement Input and MdlInfoIn= Input';
   proc compare base=mdlout3grm compare=mdlout2DS;
      var _EST_;
   run;
Output 32.8.8 Compare Parameter Estimates from Different MDLINFOOUT= Data Sets

   Value Comparison Results for Variables
   __________________________________________________________
              ||  Value of Parameter Estimate
              ||       Base    Compare
          Obs ||      _EST_      _EST_      Diff.     % Diff
    ________  ||  _________  _________  _________  _________
              ||
            1 ||     3.4176     3.4176  0.0000225   0.000658
            5 ||     0.6223     0.6222  -0.000033  -0.005237
            6 ||     0.3043     0.3043  -0.000021  -0.006977
            7 ||    -0.1486    -0.1486  0.0000235    -0.0158
            8 ||     0.9713     0.9713  -0.000024  -0.002452
            9 ||     0.1168     0.1169  0.0000759     0.0650
   __________________________________________________________
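The Diff. and % Diff columns in the comparison are consistent with Diff = Compare − Base and % Diff = 100 × Diff / Base (for observation 9, 0.0000759 / 0.11684 × 100 ≈ 0.065). The same quantities can be computed in a DATA step; the following is a minimal sketch that assumes the two MDLINFOOUT= data sets list the parameters in the same order, as Outputs 32.8.6 and 32.8.7 show. The names ESTDIFF, BASE_EST, COMP_EST, DIFF, and PCTDIFF are hypothetical.

```sas
/* Sketch: reproduce the Diff. and % Diff columns by hand.         */
/* One-to-one merge (no BY statement) pairs rows by position, so   */
/* it relies on both data sets having the same row order.          */
data estdiff;
   merge mdlout3grm(keep=_EST_ rename=(_EST_=base_est))
         mdlout2DS (keep=_EST_ rename=(_EST_=comp_est));
   obs = _n_;
   /* keep only rows where both estimates are present */
   if nmiss(base_est, comp_est) = 0;
   diff = comp_est - base_est;
   if base_est ne 0 then pctdiff = 100*diff/base_est;
run;

proc print data=estdiff;
run;
```

Unlike the PROC COMPARE report, this sketch also lists rows whose difference is zero, such as the fixed Easter parameter.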
Example 32.9: Illustration of ODS Graphics

This example illustrates the use of ODS Graphics. Using the same data set as in the section “Basic Seasonal Adjustment” on page 2104 and the previous examples, a spectral plot of the original series is displayed in Output 32.9.1. The graphical displays are requested by specifying the ODS GRAPHICS statement. For specific information about the graphics available in the X12 procedure, see the section “ODS Graphics” on page 2141.

   ods graphics on;
   proc x12 data=sales date=date;
      var sales;
   run;

Output 32.9.1 Spectral Plot for Original Data
Acknowledgments: X12 Procedure

The X-12-ARIMA procedure was developed by the Time Series Staff of the Statistical Research Division, U.S. Census Bureau. Brian Monsell is the primary programmer for the U.S. Census Bureau’s X-12-ARIMA procedure and has been very helpful in the development of PROC X12. The version of PROC X12 documented here was produced by converting the U.S. Census Bureau’s FORTRAN code to the SAS development language and adding typical SAS procedure syntax. This conversion work was performed by SAS and resulted in the X12 procedure. Although several features were added during the conversion, credit for the statistical aspects and general methodology of the X12 procedure belongs to the U.S. Census Bureau. The X-12-ARIMA seasonal adjustment program contains components developed from Statistics Canada’s X-11-ARIMA program. The X-12-ARIMA automatic modeling method is based on the work of Gomez and Maravall (1997a, b).
References

Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (1994), Time Series Analysis: Forecasting and Control, Third Edition, Englewood Cliffs, NJ: Prentice Hall.

Cholette, P. A. (1979), A Comparison and Assessment of Various Adjustment Methods of Sub-annual Series to Yearly Benchmarks, StatCan Staff Paper STC2119, Seasonal Adjustment and Time Series Staff, Statistics Canada, Ottawa.

Dagum, E. B. (1983), The X-11-ARIMA Seasonal Adjustment Method, Technical Report 12-564E, Statistics Canada.

Dagum, E. B. (1988), The X-11-ARIMA/88 Seasonal Adjustment Method: Foundations and User’s Manual, Ottawa: Statistics Canada.

Findley, D. F., Monsell, B. C., Bell, W. R., Otto, M. C., and Chen, B. C. (1998), “New Capabilities and Methods of the X-12-ARIMA Seasonal Adjustment Program,” Journal of Business and Economic Statistics, 16, 127–176.

Gomez, V. and Maravall, A. (1997a), Guide for Using the Programs TRAMO and SEATS, Beta Version, Banco de España.

Gomez, V. and Maravall, A. (1997b), Program TRAMO and SEATS: Instructions for the User, Beta Version, Banco de España.

Huot, G. (1975), Quadratic Minimization Adjustment of Monthly or Quarterly Series to Annual Totals, StatCan Staff Paper STC2104, Statistics Canada, Seasonal Adjustment and Time Series Staff, Ottawa.

Ladiray, D. and Quenneville, B. (2001), Seasonal Adjustment with the X-11 Method, New York: Springer-Verlag.

Ljung, G. M. (1993), “On Outlier Detection in Time Series,” Journal of the Royal Statistical Society, B, 55, 559–567.

Lothian, J. and Morry, M. (1978a), A Set of Quality Control Statistics for the X-11-ARIMA Seasonal Adjustment Method, StatCan Staff Paper STC1788E, Seasonal Adjustment and Time Series Analysis Staff, Statistics Canada, Ottawa.

Lothian, J. and Morry, M. (1978b), A Test for the Presence of Identifiable Seasonality When Using the X-11-ARIMA Program, StatCan Staff Paper STC2118, Seasonal Adjustment and Time Series Analysis Staff, Statistics Canada, Ottawa.

Shiskin, J., Young, A. H., and Musgrave, J. C. (1967), The X-11 Variant of the Census Method II Seasonal Adjustment Program, Technical Report 15, U.S. Department of Commerce, Bureau of the Census.

U.S. Bureau of the Census (2001a), X-12-ARIMA Quick Reference for UNIX, Version 0.2.8, Washington, DC.

U.S. Bureau of the Census (2001b), X-12-ARIMA Reference Manual, Version 0.2.8, Washington, DC.

U.S. Bureau of the Census (2001c), X-12-ARIMA Seasonal Adjustment Program, Version 0.2.8, Washington, DC.
Part III
Data Access Engines
Chapter 33
The SASECRSP Interface Engine

Contents

   Overview: SASECRSP Interface Engine
      Introduction
      Opening a Database
      Using Your Opened Database
   Getting Started: SASECRSP Interface Engine
      Structure of a SAS Data Set That Contains Time Series Data
      Reading CRSP Data Files
      Using the SAS DATA Step
      Using SAS Procedures
      Using CRSP Date Formats, Informats, and Functions
   Syntax: SASECRSP Interface Engine
      The LIBNAME libref SASECRSP Statement
   Details: SASECRSP Interface Engine
      Using the Inset Option
      The SAS Output Data Set
      Understanding CRSP Date Formats, Informats, and Functions
   Examples: SASECRSP Interface Engine
      Example 33.1: Specifying PERMNOs and RANGE on the LIBNAME Statement
      Example 33.2: Using the LIBNAME Statement to Access All Keys
      Example 33.3: Accessing One PERMNO Using No RANGE
      Example 33.4: Specifying Keys Using the INSET= Option
      Example 33.5: Specifying Ranges for Individual Keys with the INSET= Option
      Example 33.6: Converting Dates By Using the CRSP Date Functions
      Example 33.7: Comparing Different Ways of Accessing CCM Data
      Example 33.8: Comparing PERMNO and GVKEY Access of CRSP Stock Data
      Example 33.9: Using Fiscal Date Range Restriction
      Example 33.10: Using Different Types of Range Restrictions in the INSET
      Example 33.11: Using INSET Ranges with the LIBNAME RANGE Option
   Data Elements Reference: SASECRSP Interface Engine
      Available CRSP Stock Data Sets
      Available Compustat Data Sets
      Available CRSP Indices Data Sets
   References
   Acknowledgments
Overview: SASECRSP Interface Engine
Introduction

The SASECRSP interface engine enables SAS users to access and process time series, events, portfolios, and group data that reside in CRSPAccess databases. It also provides a seamless interface between CRSP, COMPUSTAT, and SAS data processing. Currently, SASECRSP supports access of CRSP Stock databases, CRSP Indices databases and CRSP/Compustat Merged databases.
Opening a Database

The SASECRSP engine uses the LIBNAME statement to enable you to specify which CRSPAccess database you would like to access and how you would like to perform selection on that database. To specify the database, you supply the combination of a physical path to indicate the location of the CRSPAccess data files and a set identifier (SETID) to identify the database desired from those available at the physical path. Specify one SETID from Table 33.1. Note that the CRSP environment variable CRSPDB_SASCAL must be defined before the SASECRSP engine can access the CRSPAccess database calendars that provide the time ID variables and enable the LIBNAME to successfully assign.

Table 33.1  CRSPAccess Databases SETIDs

SETID   Data Set
10      CRSP Stock, daily data
20      CRSP Stock, monthly data
200     CRSP/Compustat Merged (CCM) data
400     CRSP Indices data, monthly index groups
420     CRSP Indices data, monthly index series
440     CRSP Indices data, daily index groups
460     CRSP Indices data, daily index series
Usually you do not want to open the entire CRSPAccess database, so for efficiency purposes and ease of use, SASECRSP supports a variety of options for performing data selection on your CRSPAccess database with the LIBNAME statement. These options enable you to open and retrieve data for only the portion of the database you desire. The availability of some of these options depends on the type of database being opened.
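As a minimal sketch of opening a database, the following statements first echo the CRSPDB_SASCAL environment variable to the SAS log and then open the monthly CRSP Stock database. The libref mstk and the use of the CRSP_MSTK environment variable for the physical path are assumptions made for illustration (CRSP_MSTK is borrowed from the inset examples later in this chapter); substitute the location of your own CRSPAccess data files, which must end in a slash on UNIX or a backslash on Windows.

```sas
/* Write the calendar location to the log; an empty value suggests   */
/* that CRSPDB_SASCAL is not defined in this session.                */
%put CRSPDB_SASCAL=%sysget(CRSPDB_SASCAL);

/* Open the monthly CRSP Stock database (SETID=20 in Table 33.1).    */
/* The path comes from the CRSP_MSTK environment variable, which is  */
/* assumed to be defined and to end in the required slash.           */
libname mstk sasecrsp "%sysget(CRSP_MSTK)" setid=20;
```

Checking the log after these statements run is a quick way to confirm that the libref was assigned before you request any members.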
CRSP Stock Databases

When accessing the CRSP Stock databases, you can select which securities you want to access by specifying their PERMNOs with the PERMNO= option. PERMNO™ is CRSP’s unique permanent issue identification number and the primary key for their stock databases. Alternatively, a number of secondary keys can also be used to select stock data. For example, you could use the PERMCO= option to read selected securities based on CRSP’s unique permanent company identification number, PERMCO™. A full list of possible keys for accessing CRSP Stock data is shown in Table 33.2.

Table 33.2  Keys for Accessing CRSP Stock Data

Key      Access By
PERMNO   CRSP’s unique permanent issue identification number. This is the primary key for CRSP Stock databases.
PERMCO   CRSP’s unique permanent company identification number
CUSIP    CUSIP number
HCUSIP   historical CUSIP
SICCD    standard industrial classification (SIC) code
TICKER   TICKER symbol (for active companies only)
CRSP/Compustat Merged Databases

When accessing Compustat data via the CCM database, you can select which companies you want to access by specifying their GVKEYs. GVKEY™ is Compustat’s unique identifier and primary key. You can specify a GVKEY to include with the GVKEY= option. Two secondary keys, PERMNO and PERMCO, are also supported for access via their respective PERMNO= and PERMCO= options. A full list of possible keys for accessing CCM data is shown in Table 33.3.

Table 33.3  Keys for Accessing CCM Data

Key      Access By
GVKEY    Compustat’s unique identifier and primary key for CCM database
PERMNO   CRSP’s unique permanent issue identification number
PERMCO   CRSP’s unique permanent company identification number
CRSP Indices Databases

When accessing CRSP Indices data, you can select which indices you want to access by specifying their INDNOs. INDNO™ is the primary key for CRSP Indices databases. You can specify which INDNO to use with the INDNO= option. No secondary key access is supported for CRSP Indices. A full list of possible keys for accessing CRSP Indices data is shown in Table 33.4.

Table 33.4  Keys for Accessing Indices Data

Key      Access By
INDNO    CRSP’s unique permanent index identifier number. This is the primary key for CRSP Indices databases and enables you to specify which index series or groups you want to select.
Regardless of the database you are accessing, you can always use the INSET= and RANGE= options for subsetting and selection. The RANGE= option subsets the data timewise. The INSET= option enables you to specify which issues or companies you want to select from the CRSP database by using an input SAS data set.
Using Your Opened Database

Once the LIBNAME is assigned, the database is opened. You can retrieve data for any member you want in the opened database. For a complete description of available data sets and their fields, see the section “Data Elements Reference: SASECRSP Interface Engine” on page 2234. You can also use the SAS DATA step to perform further subsetting and to store the resulting time series in a SAS data set. Because CRSP and SAS together use three different date representations, you can make use of the engine-provided CRSP date formats, informats, and functions for your data processing needs. See the section “Understanding CRSP Date Formats, Informats, and Functions” on page 2208, as well as Example 33.6 later in this chapter, for more information about dates with SASECRSP.

SASECRSP for SAS 9.2 supports Linux, Solaris (SUNOS5.9), and Windows.
Getting Started: SASECRSP Interface Engine
Structure of a SAS Data Set That Contains Time Series Data

SAS requires time series data to be in a specific form that is recognizable by the SAS System. This form is a two-dimensional array, called a SAS data set, whose columns correspond to series variables and whose rows correspond to measurements of these variables at certain points in time. The time at which observations are recorded can be included in the data set as a time ID variable. Note that CRSP sets the date to the end of a time period as opposed to the beginning, and SASECRSP follows this convention. For example, the time ID variable for any particular month in a monthly time series occurs on the last trading day of that month.

The SASECRSP engine provides several different time ID variables depending on the data member opened. For most members, a time ID variable called CALDT is provided. CALDT provides a day-based calendar date and is in a CRSP date format. This means dates are stored as an offset in an array of trading days or a trading day calendar. There are five different CRSP trading day calendars, and the one used depends on the frequency of the data member. For example, the CRSP date for a daily time series refers to a daily trading day calendar. The five trading day calendars are: annual, quarterly, monthly, weekly, and daily. For your convenience, the format and informat for this field are set so the CRSP date is automatically converted to an integer date representation when viewed or printed. For data programming, the SASECRSP engine provides 23 different user functions for date conversions between CRSP, SAS, and integer dates.

The CCM database contains members whose dates are based on the fiscal calendar of the corresponding company, so a comprehensive set of time ID variables is provided. CRSPDT, RCALDT, and FISCALDT provide day-based dates, each with its own format.

CRSPDT
This time ID variable provides a date in CRSP date format similar to CALDT. CRSPDT differs only in that its format and informat are not set for automatic conversion to integer dates because this is already provided by FISCALDT and RCALDT. For fiscal members, CRSPDT is based on the fiscal calendar of the company.

FISCALDT
This time ID variable provides the same date CRSPDT does, but in integer format. It is the result of performing a CRSP-to-integer date conversion on CRSPDT. Since the date CRSPDT holds is fiscal, FISCALDT is also fiscal.

RCALDT
This time ID variable is also an integer date, just like FISCALDT, but it has been shifted so the date is on calendar time as opposed to being fiscal.

For example, Microsoft’s fiscal year ends in June, so if you look at its annual period descriptor for the 2002 fiscal year, its time ID variables are 78 for CRSPDT, 20021231 for FISCALDT, and 20020628 for RCALDT.

In summary, a total of three time ID variables are provided for fiscal time series members. One is in CRSP date format, and the other two are in integer format; the only difference between the two integer variables is that one is based on the fiscal calendar of the company while the other is not. For more information about how CALDT, CRSPDT, and date conversions are handled, see the section “Understanding CRSP Date Formats, Informats, and Functions” on page 2208.

The CCM database also contains fiscal array members, which are all the segment data members. They are unlike the fiscal time series in that they are not associated with a calendar and also have their time ID variables embedded in the data as a data field. Generally both fiscal and calendar time ID variables are embedded. However, segment members segsrc, segcur, and segitm have only one fiscal time ID variable embedded. For your convenience, SASECRSP calculates and provides CALYR, the calendar version of the embedded fiscal time ID variable for these three segment members. Note that due to limitations of the data, all segment member time ID variables are year-based.
Reading CRSP Data Files

The SASECRSP engine supports reading time series, events, portfolios and group data from CRSPAccess databases. The SETID you specify determines the database that is read. See Table 33.1 for a list of possible databases. The CRSP environment variable CRSPDB_SASCAL must be defined before the SASECRSP engine can access the CRSPAccess database calendars that provide the time ID variables and enable the LIBNAME to successfully assign.
Using the SAS DATA Step

If desired, you can store the selected series in a SAS data set by using the SAS DATA step. You can also perform other operations on your data inside the DATA step. Once the data is in a SAS data set you can use it as you would any other SAS data set.
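A minimal sketch of this step follows. It assumes a libref named mstk that has already been assigned with the SASECRSP engine (for example, with SETID=20) and copies a few header fields from the stkhead member into a WORK data set. The libref and the output data set name are hypothetical; the member and variable names are the ones used in the inset examples later in this chapter.

```sas
/* Copy selected header fields from the opened CRSP database into    */
/* an ordinary SAS data set. Assumes libref mstk is already assigned */
/* with the SASECRSP engine.                                         */
data work.stkhead_copy;
   set mstk.stkhead(keep=permno permco begdt enddt hcomnam htick);
run;
```

The resulting WORK.STKHEAD_COPY data set can then be sorted, merged, or analyzed like any other SAS data set.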
Using SAS Procedures

You can print the output SAS data set by using the PRINT procedure, and you can report information concerning the contents of your data set by using the CONTENTS procedure. You can also create a view of the CRSPAccess database by using the SQL procedure in conjunction with a SASECRSP libref. See Example 33.11 later in this chapter for an example with PROC SQL. Viewtable enables you to view your data by double-clicking on a SASECRSP libref in the LIBNAME window of the SAS Display Manager.
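The three procedures mentioned above might be combined as in the following sketch. It again assumes a hypothetical libref mstk that is already assigned with the SASECRSP engine, and the view name stkhead_v is illustrative only; the member and variable names are taken from the inset examples later in this chapter.

```sas
/* Report the variables and attributes of the header member.         */
proc contents data=mstk.stkhead;
run;

/* Print a few header fields.                                        */
proc print data=mstk.stkhead(keep=permno permco hcomnam htick);
run;

/* Define a view over the header member; the rows are read through   */
/* the mstk libref each time the view is used.                       */
proc sql;
   create view work.stkhead_v as
      select permno, permco, hcomnam, htick
      from mstk.stkhead;
quit;
```

Because the view stores only the query, the mstk libref must still be assigned whenever the view is referenced.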
Using CRSP Date Formats, Informats, and Functions

Historically, CRSP has used two different methods to represent dates, and SAS has used a third. The SASECRSP engine provides 23 functions, 15 informats, and 10 formats to enable you to easily translate the dates from one internal representation to another. See the section “Understanding CRSP Date Formats, Informats, and Functions” on page 2208 for details.
Syntax: SASECRSP Interface Engine

The SASECRSP engine uses standard engine syntax. Options used by SASECRSP are summarized in Table 33.5.

Table 33.5  Summary of LIBNAME libref

Option          Description
SETID=          specifies which CRSP database subset to open. This option is required. See Table 33.1 for complete list of supported SETIDs.
PERMNO=         specifies a CRSP PERMNO to be selected for access
PERMCO=         specifies a CRSP PERMCO to be selected for access
CUSIP=          specifies a current CUSIP to be selected for access
HCUSIP=         specifies a historic CUSIP to be selected for access
TICKER=         specifies a TICKER to be selected for access (for active companies only)
SICCD=          specifies a SIC Code to be selected for access
GVKEY=          specifies a Compustat GVKEY to be selected for access
INDNO=          specifies a CRSP INDNO to be selected for access
RANGE=          specifies the range of data to keep in format 'YYYYMMDD-YYYYMMDD'
INSET=          uses a SAS data set named setname as input for issues
CRSPLINKPATH=   specifies location of the CRSP link history. This option is required for accessing CRSP data with Compustat’s GVKEY.
The LIBNAME libref SASECRSP Statement

   LIBNAME libref SASECRSP 'physical name' options ;
The physical name required by the LIBNAME statement should point to the directory of CRSPAccess data files where the CRSP database you want to open is located. Note that the physical name must end in a slash for UNIX environments and a backslash for Windows environments.

The CRSP environment variable CRSPDB_SASCAL must be defined before the SASECRSP engine can access the CRSPAccess database calendars, and the SASECRSP LIBNAME cannot be assigned successfully without it. This environment variable should be defined automatically by either the CRSP software installation or, in later versions, the CRSP data installation. Because the variable is occasionally not set properly, always check that CRSPDB_SASCAL is set to the location where your most recent CRSP data resides. Remember to include the required final slash or backslash.

After the LIBNAME is assigned, you can access any of the available data sets/members within the opened database. For a complete description of available data sets and their fields, see the section “Data Elements Reference: SASECRSP Interface Engine” on page 2234.

The following options can be used in the LIBNAME libref SASECRSP statement:

SETID=crsp_setidnumber

Specifies the CRSP database you want to read from. SETID is a required option. Choose one SETID from the seven possible values in Table 33.1. The SETID limits the frequency selection of time series that are included in the SAS data set. As an example, to access monthly CRSP Stock data, you would use the following statement:

   LIBNAME myLib sasecrsp 'physical-name'
      SETID=20;
PERMNO=crsp_permnumber

By default, the SASECRSP engine reads all keys for the CRSPAccess database that you specified in your SASECRSP libref. The PERMNO= option enables you to select data from your CRSP database by the PERMNO(s) (or other keys) you specify. A PERMNO is CRSP’s unique permanent issue identification number. There is no limit to the number of crsp_permnumber options that you can use. From a performance standpoint, the PERMNO= option does efficient random access and reads only the data for the PERMNOs specified. The following LIBNAME statement reads data only for Microsoft Corporation (PERMNO=10107) and International Business Machines Corporation (PERMNO=12490) using the primary PERMNO key, and thus is very efficient.

   LIBNAME myLib sasecrsp 'physical-name'
      SETID=20
      PERMNO=10107
      PERMNO=12490;
The PERMCO=, CUSIP=, HCUSIP=, SICCD=, TICKER=, GVKEY=, and INDNO= options behave similarly and you can use them in conjunction with or in place of the PERMNO= option. For example you could have used the following statement to access monthly data for Microsoft and IBM:

   LIBNAME myLib sasecrsp 'physical-name'
      SETID=20
      TICKER='MSFT'
      CUSIP=59491810;
Details on the use of other key selection options are described separately later. PERMNOs specified by this option can select the companies or issues to keep for CRSP Stock or for CRSP/Compustat Merged databases, but PERMNO is not a supported option for CRSP Indices databases. Use the INDNO= option for the CRSP Indices database and use the PERMNO= option with CRSP US Stock and with CRSP/Compustat Merged databases. Details on the use of key selection options for each type of database follow.

STK Databases
PERMNO is the primary key for CRSP Stock databases. Every valid PERMNO you specify with the PERMNO= option keeps exactly one issue.

CCM Databases
PERMNO can be used as a secondary key for the CCM database through CRSPLink™. Linking between the CRSP and Compustat databases is a complex, many-to-many relationship between PERMNO/PERMCOs and GVKEYs. When accessing CCM data by PERMNO, all GVKEYs that link to the given PERMNO are amalgamated to provide seamless access to all linked data. However, note that accessing CCM data by PERMNO is logically different from accessing it by its linked GVKEY(s).

In particular, when the PERMNO you specify is linked to several different GVKEYs, one link is designated as the primary link. This designation is set by CRSP and its researchers, and serves in a specific role for the link information in the CCM database. Only data for the primary link is retrieved for the header. For other members, including all time series members, all links are examined, but data is extracted only for the active period of the links and only if the data is within any possible user-specified date restriction. If two or more GVKEY-to-PERMNO links overlap in time, data from the later (more recent) GVKEY is used. For more information about CRSP links, see “Link Used Array” in the CRSP/Compustat Merged Database Guide.

For example, PERMNO=10083 is CRSP’s unique issue identifier for Teknowledge Incorporated, and later (due to a name change) Cimflex Teknowledge Corporation. To access CCM data for IBM Corporation, Teknowledge Inc., and Cimflex Teknowledge Corp., you can use the following statement:

   LIBNAME myLib1 sasecrsp 'physical-name'
      SETID=200
      GVKEY=6066      /* IBM */
      PERMNO=10083;   /* Teknowledge and Cimflex */
Teknowledge Inc. and Cimflex Corp. have separate GVKEYs in the CCM database, so the previous statement is actually an example of using one PERMNO to access data for several (linked) GVKEYs. The first link to GVKEY=11947 spans March 5, 1986, to December 31, 1988, and the second link to GVKEY=15495 spans February 2, 1989, to September 9, 1993. An alternate way of accessing the data is by using the linked GVKEYs directly, as seen in this statement:

   LIBNAME myLib2 sasecrsp 'physical-name'
      SETID=200
      GVKEY=6066
      GVKEY=11947
      GVKEY=15495;
These two LIBNAME statements look similar, but do not perform the same operation. myLib1 assumes you are selecting the issue data for PERMNO=10083, so only observations from the CCM database that are within the time period of the used links are accessed. In the previous example for myLib1, only data ranging from March 5, 1986, to December 31, 1988, are extracted for GVKEY=11947 and only data ranging from February 28, 1989, to September 9, 1993, are extracted for GVKEY=15495. Furthermore, while both GVKEYs 11947 and 15495 are linked to the PERMNO, GVKEY 15495 is the primary link, and when accessing the header, only 15495 is used. If the two links overlap, the data from the later (more recent) GVKEY of 15495 is used.
In contrast, myLib2 uses an open range for all three specified keys. If there are data overlapping in time between GVKEY 11947 and 15495, data for both are reported. Similarly, when accessing the header, data for both 11947 and 15495 are retrieved.

IND Databases
INDNO is the primary key for accessing CRSP Indices databases. PERMNO is not available as a key for the IND (CRSP Indices) database; use INDNO for efficient access of the IND database.

GVKEY=crsp_gvkey

The GVKEY= option is similar to the PERMNO= option. It enables you to use Compustat’s Permanent SPC Identifier key (GVKEY) to select the companies or issues to keep. There is no limit to the number of crsp_gvkey options that you can use.

STK Databases
GVKEY can serve as a secondary key for accessing CRSP Stock databases. This requires the additional use of the CRSPLINKPATH= option. Linking between the Compustat and CRSP databases is a complex, many-to-many relationship between GVKEYs and PERMNO/PERMCOs. When accessing CRSP data by GVKEY, all links of the specified GVKEY are followed and processed. No additional logic is applied, and link ranges are ignored. Accessing CRSP data by GVKEY is identical to accessing CRSP data by all of its linked PERMNOs.

For example, Wolverine Exploration Co. and Amerac Energy Corp. have different PERMNOs but the same GVKEY, and there are two identical ways of accessing CRSP Stock data on these two entities.

   LIBNAME myLib1 sasecrsp 'physical-name'
      SETID=10
      PERMNO=13638    /* Wolverine Exploration */
      PERMNO=84641;   /* Amerac Energy */

   LIBNAME myLib2 sasecrsp 'physical-name'
      SETID=10
      CRSPLINKPATH='physical-name'
      GVKEY=1544;
The CRSPLINKPATH= option is required when accessing CRSP Stock databases by GVKEY. See the discussion later in this section on the CRSPLINKPATH= option.

CCM Databases
GVKEY is the primary key for accessing the CCM database. Every valid GVKEY you specify keeps exactly one company.

IND Databases
GVKEY is not available as a key for accessing CRSP Indices databases; use INDNO, the primary key for those databases, instead.
PERMCO=crsp_permcompany

The PERMCO= option is similar to the PERMNO= option. It enables you to use CRSP’s unique permanent company identification key (PERMCO) to select the companies or issues to keep. There is no limit to the number of crsp_permcompany options that you can use.

STK Databases
PERMCO is a secondary key for accessing CRSP Stock databases. One PERMCO can map to multiple PERMNOs. Access by PERMCO is equivalent to access by all mapped PERMNOs.

CCM Databases
PERMCO can also be used as a secondary key for accessing the CCM database. Linking between the CRSP and CCM databases is a complex, many-to-many relationship. When accessing CCM data by PERMCO, all linking GVKEYs are amalgamated and processed. Link active ranges are respected. Only data for the primary link is returned for the header. In cases when the active ranges of various links overlap, the most recent link is used. See the PERMNO= option for more details.

IND Databases
PERMCO is not available as a key for accessing CRSP Indices databases; use INDNO instead.

CUSIP=crsp_cusip
The CUSIP= option is similar to the PERMNO= option. It enables you to use the CUSIP key to select the companies or issues to keep. There is no limit to the number of crsp_cusip options that you can use.

STK Databases
CUSIP is a secondary key for accessing CRSP Stock databases. One CUSIP maps to one PERMNO.

CCM Databases
CUSIP is not available as a key for accessing CCM databases.

IND Databases
CUSIP is not available as a key for accessing CRSP Indices databases; use INDNO instead.

HCUSIP=crsp_hcusip
The HCUSIP= option is similar to the PERMNO= option. It enables you to use the historical CUSIP key, HCUSIP, to select the companies or issues to keep. There is no limit to the number of crsp_hcusip options that you can use.

STK Databases
HCUSIP is a secondary key for accessing CRSP Stock databases. One HCUSIP maps to one PERMNO.

CCM Databases
HCUSIP is not available as a key for accessing CCM databases.

IND Databases
HCUSIP is not available as a key for accessing CRSP Indices databases; use INDNO instead.

TICKER=crsp_ticker
The TICKER= option is similar to the PERMNO= option. It enables you to use the TICKER key to select the companies or issues to keep. There is no limit to the number of crsp_ticker options that you can use.

STK Databases
TICKER is a secondary key for accessing CRSP Stock databases. One TICKER maps to one PERMNO. Note that some PERMNOs are inaccessible by TICKER.

CCM Databases
TICKER is not available as a key for accessing CCM databases.

IND Databases
TICKER is not available as a key for accessing CRSP Indices databases; use INDNO instead.

SICCD=crsp_siccd
The SICCD= option is similar to the PERMNO= option. It enables you to use the Standard Industrial Classification (SIC) Code (SICCD) to select the companies or issues to keep. There is no limit to the number of crsp_siccd options that you can use.

STK Databases
SICCD is a secondary key for accessing CRSP Stock databases. One SICCD can map to multiple PERMNOs. Access by SICCD is equivalent to access by all PERMNOs that have ever been classified under the specified SICCD, and data for all of them is retrieved.

CCM Databases
SICCD is not available as a key for accessing CCM databases.

IND Databases
SICCD is not available as a key for accessing CRSP Indices databases; use INDNO instead.

INDNO=crsp_indno
The INDNO= option is similar to the PERMNO= option. It enables you to use CRSP’s permanent index number INDNO to select the index series or groups to keep. There is no limit to the number of crsp_indno options that you can use.
STK Databases
INDNO is not available as a key for accessing CRSP Stock databases, but it can be used in the combined CRSP Stock and Indices databases.

CCM Databases
INDNO is not available as a key for accessing CCM databases; use GVKEY instead.

IND Databases
INDNO is the primary key for accessing CRSP Indices databases. Every INDNO you specify keeps exactly one index series or group. For example, you can use the following statement to access the CRSP NYSE Value-Weighted and Equal-Weighted daily market indices:

   LIBNAME myLib3 sasecrsp 'physical-name'
      SETID=460
      INDNO=1000000    /* Value-Weighted */
      INDNO=1000001;   /* Equal-Weighted */
CRSPLINKPATH=’crsp_linkpath’
To access CRSP Stock data with GVKEYs, use the CRSPLINKPATH= option. CRSPLINKPATH= specifies the physical location where your CCM database resides.

NOTE: The physical name must end in a slash for UNIX environments and a backslash for Windows environments.

RANGE='crsp_begdt-crsp_enddt'
To limit the time range of data read from your CRSPAccess database, specify the RANGE= option in your SASECRSP libref, where crsp_begdt is the beginning date in YYYYMMDD format and crsp_enddt is the ending date of the range in YYYYMMDD format. As an example, to access monthly stock data for Microsoft Corporation and for International Business Machines Corporation for the first quarter of 1999, you can use the following statement:

   LIBNAME myLib sasecrsp 'physical-name'
      SETID=20
      PERMNO=10107
      PERMNO=12490
      RANGE='19990101-19990331';
The given beginning and ending dates are interpreted as calendar dates by default. If you want these dates to be interpreted as fiscal dates, you must prepend the character 'f' to the range. For example, the following statement extracts data for the 1994 fiscal year of both Microsoft and IBM.

   LIBNAME myLib sasecrsp 'physical-name'
      SETID=20
      PERMNO=10107
      PERMNO=12490
      RANGE='f19940101-19941231';
The result of the previous statement is that data from actual calendar date July 1, 1993, to June 30, 1994, is extracted for Microsoft because its fiscal year end month is June. Data from January 1, 1994, to December 31, 1994, is extracted for IBM because its fiscal year end month is December. See Example 33.10 for a more detailed example.

The RANGE= option can be used on all CRSP Stock, Indices, and CCM members. When this option is applied to segment data members, however, the behavior is slightly different in the following ways.
Dates associated with segment member data records are in years and can resolve only to years. This is unique to segment members. All other CRSP data members have a date resolution to the day. For example, monthly time series, though monthly, resolve to the last trading day of the month. However, segment members have a maximum resolution of years because they are not mapped to a calendar in the CRSP/Compustat database. Hence, when range restrictions are applied to segment members, only the ‘YYYY’ year portion of the range is considered.
Multiple dates are sometimes associated with a particular segment member record. In such cases, the preferred date for use in determining the date range restriction is the data year as opposed to the source year. This multiple date behavior is unique only to segment members. All other CRSP data members are associated with only one date.
INSET='setname[,keyfieldname,keyfieldtype,date1field,date2field,datetype]'

When you specify a SAS data set named setname as input for issues, the SASECRSP engine assumes that a default PERMNO field that contains selected CRSP PERMNOs is present in the data set. The optional parameters are positional: if you use them, you must specify them all, and the only acceptable shorthand is to drop parameters from the very end of the list. Dropped parameters use their defaults. The optional parameters are explained below:

keyfieldname
label of the field that contains the keys to be selected. If unspecified, the default is “PERMNO”.

keyfieldtype
specifies the CRSPAccess key type of the provided keys. Possible key types are: PERMNO, PERMCO, CUSIP, HCUSIP, TICKER, SICCD, GVKEY or INDNO. If unspecified, the default is “PERMNO”.

date1field
beginning date of the specific date range restriction being applied to this key. If either date1field or date2field is omitted, the default is for there to be no date range restriction.

date2field
ending date of the specific date range restriction being applied to this key. If either date1field or date2field is omitted, the default is for there to be no date range restriction.

datetype
indicates whether the provided beginning and ending dates are calendar dates or fiscal dates. A fiscal date type means the dates given are based on the fiscal calendar of the respective company or GVKEY. A calendar date means the dates are based on the standard Julian calendar. The strings 'calendar' and 'fiscal' are used to indicate the respective date types. If unspecified, the default type is calendar.
It is important to note that fiscal dates are applicable only to members with fiscal data. Fiscal members consist of all period descriptors, items, and segment members of the CCM database. If a fiscal date range is applied to nonfiscal members, it is ignored.

Individual date range restrictions specified by the inset can be used in combination with the RANGE= option on the LIBNAME. In such a case, only data from the intersection of the individual date restriction and the global RANGE= option date restriction are read.
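The interaction between per-key date ranges and the RANGE= option can be sketched as follows. The data set name keyranges, the libref mstk2, and the dates are hypothetical and chosen only for illustration; the physical path is taken from the CRSP_MSTK environment variable, as in the inset examples in this chapter. Under the intersection rule, the inset restricts PERMNO 10107 to calendar year 1998 while RANGE= restricts the whole libref to the second half of 1998, so only data from July 1, 1998, through December 31, 1998, should be read for that key.

```sas
/* Hypothetical inset: one calendar date range per PERMNO, given as  */
/* YYYYMMDD integers.                                                */
data keyranges;
   permno = 10107; date1 = 19980101; date2 = 19981231; output;
   permno = 12490; date1 = 19970101; date2 = 19981231; output;
run;

/* The datetype parameter is dropped from the end of the INSET=      */
/* list, so the dates default to a calendar interpretation. RANGE=   */
/* is then intersected with each key's own date range.               */
libname mstk2 sasecrsp "%sysget(CRSP_MSTK)"
   setid=20
   inset='keyranges,PERMNO,PERMNO,DATE1,DATE2'
   range='19980701-19981231';
```

Keeping the broad restriction in RANGE= and the key-specific restrictions in the inset makes it possible to narrow or widen the overall window without editing the inset data set.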
Details: SASECRSP Interface Engine
Using the Inset Option

To better illustrate the use of the INSET= option, some examples follow.

Basic Inset Use: Providing a List of PERMNOs

This example uses the INSET= option to extract monthly data for a portfolio of three companies. No date range restriction is used.

   data testin1;
      permno = 10107; output;
      permno = 12490; output;
      permno = 14322; output;
   run;

   LIBNAME mstk sasecrsp 'physical-name'
      SETID=20
      INSET='testin1';

   proc print data=mstk.stkhead (keep=permno permco begdt enddt hcomnam htick);
   run;
General Use of Inset for Specifying Lists of Keys

The following example illustrates the use of the INSET= option to select a few Index Series from the Indices database, companies from the CCM database, and securities from the Stock database. Libref ind2 is used for accessing the Indices database with the two specified INDNOs. Libref comp2 is used to access the CCM database with the two specified PERMCOs. Libref sec3 is used to access the Stock database with the three specified TICKERs. Note the use of shorthand in specifying the INSET= option. The date1field, date2field, and datetype fields are all omitted, thereby using the default of no range restriction (though the range restriction set by the RANGE= on the LIBNAME statement still applies). For details including sample output, see Example 33.4.

   data indices;
      indno=1000000; output;   /* NYSE Value-Weighted Market Index */
      indno=1000001; output;   /* NYSE Equal-Weighted Market Index */
   run;

   libname ind2 sasecrsp "%sysget(CRSP_MSTK)"
      setid=420
      inset='indices,INDNO,INDNO'
      range='19990101-19990401';

   title2 'Total Returns for NYSE Value and Equal Weighted Market Indices';
   proc print data=ind2.tret label;
   run;

   data companies;
      permco=8045;  output;   /* Oracle */
      permco=20483; output;   /* Citigroup */
   run;

   libname comp2 sasecrsp "%sysget(CRSP_CST)"
      setid=200
      inset='companies,PERMCO,PERMCO'
      range='20040101-20040531';

   title2 'Link Info of Selected PERMCOs';
   proc print data=comp2.link label;
   run;

   title3 'Dividends Per Share for Oracle and Citigroup';
   proc print data=comp2.div label;
   run;

   data securities;
      ticker='BAC'; output;   /* Bank of America */
      ticker='DUK'; output;   /* Duke Energy */
      ticker='GSK'; output;   /* GlaxoSmithKline */
   run;

   libname sec3 sasecrsp "%sysget(CRSP_MSTK)"
      setid=20
      inset='securities,TICKER,TICKER'
      range='19970820-19970920';

   title2 'PERMNOs and General Header Info of Selected TICKERs';
   proc print data=sec3.stkhead (keep=permno htick htsymbol) label;
   run;

   title3 'Average Price for Bank of America, Duke and GlaxoSmithKline';
   proc print data=sec3.prc label;
   run;
Key-Specific Date Range Restriction with Insets

Suppose you not only want to select keys with your inset, but also want to specify a date range restriction for each key individually. The following example shows how to do this. Again, shorthand enables you to omit the datetype field. The provided dates default to a calendar interpretation. For details including the sample output, see Example 33.5.

   title2 'INSET=testin2 uses date ranges along with PERMNOs:';
   title3 '10107, 12490, 14322, 25788';
   title4 'Begin dates and end dates for each permno are used in the INSET';

   data testin2;
      permno = 10107; date1 = 19980731; date2 = 19981231; output;
      permno = 12490; date1 = 19970101; date2 = 19971231; output;
      permno = 14322; date1 = 19950731; date2 = 19960131; output;
      permno = 25778; date1 = 19950101; date2 = 19950331; output;
   run;

   libname mstk2 sasecrsp "%sysget(CRSP_MSTK)"
           setid=20
           inset='testin2,PERMNO,PERMNO,DATE1,DATE2';

   data b;
      set mstk2.prc;
   run;

   proc print data=b;
   run;
Fiscal Date Range Restrictions with Insets

You can use fiscal dates on the date range restrictions inside insets by specifying the date type. The following example shows two identical accesses, except one inset uses the date range restriction in fiscal terms, and the other inset uses the date range restriction in calendar terms. For details including sample output, see Example 33.10.

   data comp_fiscal;
      /* Crude Petroleum & Natural Gas */
      compkey=2416; begdate=19860101; enddate=19861231; datetype='fiscal';
      output;
      /* Commercial Intertech */
      compkey=3248; begdate=19940101; enddate=19941231; datetype='fiscal';
      output;
   run;

   data comp_calendar;
      /* Crude Petroleum & Natural Gas */
      compkey=2416; begdate=19860101; enddate=19861231; datetype='calendar';
      output;
      /* Commercial Intertech */
      compkey=3248; begdate=19940101; enddate=19941231; datetype='calendar';
      output;
   run;

   libname fisclib sasecrsp "%sysget(CRSP_CST)"
           SETID=200
           INSET='comp_fiscal,compkey,gvkey,begdate,enddate,datetype';

   libname callib sasecrsp "%sysget(CRSP_CST)"
           SETID=200
           INSET='comp_calendar,compkey,gvkey,begdate,enddate,datetype';

   title2 'Quarterly Period Descriptors with Fiscal Date Range';
   proc print data=fisclib.qperdes(drop = peftnt1 peftnt2 peftnt3 peftnt4
                                          peftnt5 peftnt6 peftnt7 peftnt8
                                          candxc flowcd spbond spdebt sppaper);
   run;

   title2 'Quarterly Period Descriptors with Calendar Date Range';
   proc print data=callib.qperdes(drop = peftnt1 peftnt2 peftnt3 peftnt4
                                         peftnt5 peftnt6 peftnt7 peftnt8
                                         candxc flowcd spbond spdebt sppaper);
   run;
Inset Ranges in Conjunction with the LIBNAME Range

Suppose you want to specify individual date restrictions but also impose a common range. This example demonstrates two companies, each with its own date range restriction, but both companies are also subject to a common range set in the LIBNAME statement by the RANGE= option. As a result, data from August 1, 1999, to February 1, 2000, is retrieved for IBM, and data from January 1, 2001, to April 21, 2002, is retrieved for Microsoft. For details including sample output, see Example 33.11.

   data two_companies;
      gvkey=6066; date1=19800101; date2=20000201; output;
      gvkey=12141; date1=20010101; date2=20051231; output;
   run;

   libname mylib sasecrsp "%sysget(CRSP_CST)"
           SETID=200
           INSET='two_companies,gvkey,gvkey,date1,date2'
           RANGE='19990801-20020421';

   proc sql;
      select prcc.gvkey, prcc.caldt, prcc, ern
         from mylib.prcc as prcc, mylib.ern as ern
         where prcc.caldt = ern.caldt and
               prcc.gvkey = ern.gvkey;
   quit;
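The intersection rule described above can be sketched outside SAS. The following Python fragment is only an illustration of the date arithmetic, not the engine's implementation; the function name intersect and the dictionary layout are hypothetical. It relies on the fact that integer dates (YYYYMMDD) sort in calendar order, so the effective range for a key is the later of the two start dates through the earlier of the two end dates.

```python
from typing import Dict, Optional, Tuple

DateRange = Tuple[int, int]  # (start, end) as YYYYMMDD integer dates


def intersect(inset_range: DateRange, libname_range: DateRange) -> Optional[DateRange]:
    """Return the overlap of two inclusive date ranges, or None if they do not overlap."""
    start = max(inset_range[0], libname_range[0])
    end = min(inset_range[1], libname_range[1])
    return (start, end) if start <= end else None


# Values taken from the example above: RANGE='19990801-20020421'
libname_range: DateRange = (19990801, 20020421)

# Per-key ranges from the two_companies inset (GVKEY -> date1, date2)
inset: Dict[int, DateRange] = {
    6066: (19800101, 20000201),
    12141: (20010101, 20051231),
}

effective = {gvkey: intersect(rng, libname_range) for gvkey, rng in inset.items()}

for gvkey, rng in effective.items():
    print(gvkey, rng)
```

For GVKEY 6066 the sketch yields 19990801 through 20000201, and for GVKEY 12141 it yields 20010101 through 20020421, which agrees with the ranges stated in the text.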
The SAS Output Data Set

You can use the SAS DATA step to write the selected CRSP or Compustat data to a SAS data set. This enables you to easily analyze the data using SAS. When you specify the name of the output data set in the DATA statement, the engine supervisor creates a SAS data set with the specified name in either the SAS WORK library or, if specified, the USER library. The contents of the SAS data set include the DATE of each observation, the series name of each series read from the CRSPAccess database, event variables, and the label or description of each series/event or array.

You can use PROC PRINT and PROC CONTENTS to print your output data set and its contents. Alternatively, you can view your SAS output observations by opening the desired output data set in the SAS Explorer. You can also use PROC SQL with your SASECRSP libref to create a custom view of your data.

In general, CRSP missing values are represented as '.' in the SAS data set. When accessing the CRSP STOCK data, SASECRSP uses the mapping shown in Table 33.6 for converting CRSP missing values into SAS missing codes.

Table 33.6 Mapping of CRSP Stock Missing Values to SAS Missing Codes

   CRSP Stock   SAS   Condition
   –99          .     No valid price
   –88          .A    Out of range
   –77          .B    Off-exchange
   –66          .C    No valid previous price
   –55          .D    No delisting information
   –44          .E    No valid comparison for an excess return
When accessing the CCM database, CRSP uses certain Compustat missing codes which SASECRSP then converts into SAS missing codes. Table 33.7 shows the mapping of Compustat missing codes for the CCM database.

Table 33.7 Mapping of Compustat and SAS Missing Codes

   Compustat   SAS   Condition
   0.0001      .     No data for data item
   0.0002      .S    Data is only on a semi-annual basis
   0.0003      .A    Data is only on an annual basis
   0.0004      .C    Combined into other item
   0.0007      .N    Data is not meaningful
   0.0008      .I    Reported as insignificant
Missing value codes conform with Compustat’s Strategic Insight and binary conventions for missing values. See Notes on Missing Values in the second chapter of the CRSP/Compustat Merged Database Guide for more information about how CRSP handles Compustat missing codes.
Understanding CRSP Date Formats, Informats, and Functions

CRSP has historically used two different methods to represent dates, while SAS has used a third. The three formats are SAS dates, CRSP dates, and integer dates. The SASECRSP engine provides 23 functions, 15 informats, and 10 formats to enable you to easily translate the dates from one internal representation to another. A SASECRSP LIBNAME assignment must be active to use these date access methods. See Example 33.6, "Converting Dates Using the CRSP Date Functions."

SAS dates are stored internally as the number of days since January 1, 1960. The SAS method is an industry standard and provides a great deal of flexibility, including a wide variety of informats, formats, and functions.

CRSP dates are designed to ease time series storage and access. Internally, the dates are stored as an offset into an array of trading days, or trading day calendar. Note that there are five different CRSP trading day calendars: Annual, Quarterly, Monthly, Weekly, and Daily. In this sense, there are five different types of CRSP dates, one for each frequency of calendar it references. The CRSP method provides fewer missing values and makes trading period calculations very easy. However, there are also many valid calendar dates that are not available in the CRSP trading calendars, and care must be taken when using other dates.

Integer dates are a way to represent dates that are platform independent and maintain the correct sort order. However, the distance between dates is not maintained.

The best way to illustrate these formats is with some sample data. Table 33.8 shows date representations for CRSP daily and monthly data.

Table 33.8 Date Representations for Daily and Monthly Data

   Date               SAS Date   CRSP Date   CRSP Date   Integer Date
                                 (Daily)     (Monthly)
   July 31, 1962      942        21          440         19620731
   August 31, 1962    973        44          441         19620831
   Dec. 30, 1998      14,243     9190        NA*         19981230
   Dec. 31, 1998      14,244     9191        877         19981231

   * Not available if an exact match is requested.
Having an understanding of the internal differences in representing SAS dates, CRSP dates, and CRSP integer dates helps you use the SASECRSP formats, informats, and functions effectively. Always keep in mind the frequency of the CRSP calendar that you are accessing when you specify a CRSP date.
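The relationship between SAS dates and integer dates can be checked with a few lines of arithmetic. The following Python sketch is independent of the SASECRSP engine; the function names integer_to_sas and sas_to_integer are hypothetical stand-ins for illustration and are not the engine's own conversion functions. It assumes only what is stated above: a SAS date is a day count from January 1, 1960, and an integer date is the number YYYYMMDD.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date 0


def integer_to_sas(intdt: int) -> int:
    """Convert a YYYYMMDD integer date to a SAS date (days since 1960-01-01)."""
    d = date(intdt // 10000, intdt // 100 % 100, intdt % 100)
    return (d - SAS_EPOCH).days


def sas_to_integer(sasdt: int) -> int:
    """Convert a SAS date (days since 1960-01-01) to a YYYYMMDD integer date."""
    d = SAS_EPOCH + timedelta(days=sasdt)
    return d.year * 10000 + d.month * 100 + d.day


# Reproduce the SAS Date column of the date representation table
for intdt in (19620731, 19620831, 19981230, 19981231):
    print(intdt, integer_to_sas(intdt))

# Integer dates keep sort order but not distance:
# the two dates below are 31 days apart, yet the integers differ by 100.
print(integer_to_sas(19620831) - integer_to_sas(19620731), 19620831 - 19620731)
```

The last line shows why the distance between integer dates is not meaningful: subtracting the integers gives 100, while the SAS dates differ by the true 31 days.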
The CRSP Date Formats

There are two types of formats for CRSP dates, and five frequencies are available for each of the two types. The two types are exact dates (CRSPDT*) and range dates (CRSPDR*), where the '*' can be A for annual, Q for quarterly, M for monthly, W for weekly, or D for daily. The ten formats are: CRSPDTA, CRSPDTQ, CRSPDTM, CRSPDTW, CRSPDTD, CRSPDRA, CRSPDRQ, CRSPDRM, CRSPDRW, and CRSPDRD.

Table 33.9 shows some samples that use the monthly and daily calendar as examples. The Annual (CRSPDTA and CRSPDRA), Quarterly (CRSPDTQ and CRSPDRQ), and the Weekly (CRSPDTW and CRSPDRW) formats work analogously.

Table 33.9 Sample CRSPDT Formats for Daily and Monthly Data

   Date              CRSP Date        CRSPDTD      CRSPDRD       CRSPDTM        CRSPDRM
                     Daily, Monthly   Daily Date   Daily Range   Monthly Date   Monthly Range
   July 31, 1962     21, 440          19620731     19620731 +    19620731       19620630, 19620731
   August 31, 1962   44, 441          19620831     19620831 +    19620831       19620801, 19620831
   Dec. 30, 1998     9190, NA*        19981230     19981230 +    NA*            NA*
   Dec. 31, 1998     9191, 877        19981231     19981231 +    19981231       19981201, 19981231

   + Daily ranges look similar to Monthly Ranges if they are Mondays or immediately following a trading holiday.
   * When working with exact matches, no CRSP monthly date exists for December 30, 1998.
The @CRSP Date Informats

There are three types of informats for CRSP dates, and five frequencies are available for each of the three types. The three types are exact (@CRSPDT*), range (@CRSPDR*), and backward (@CRSPDB*) dates, where the '*' can be A for annual, Q for quarterly, M for monthly, W for weekly, or D for daily. The fifteen informats are: @CRSPDTA, @CRSPDTQ, @CRSPDTM, @CRSPDTW, @CRSPDTD, @CRSPDRA, @CRSPDRQ, @CRSPDRM, @CRSPDRW, @CRSPDRD, @CRSPDBA, @CRSPDBQ, @CRSPDBM, @CRSPDBW, and @CRSPDBD.

The five CRSPDT* informats find exact matches only. The five CRSPDR* informats look for an exact match, and if an exact match is not found, they go forward, matching the CRSPDR* formats. The five CRSPDB* informats look for an exact match, and if an exact match is not found, they go backward. Table 33.10 shows a sample that uses only the CRSP monthly calendar as an example. The daily, weekly, quarterly, and annual frequencies work analogously.

Table 33.10 Sample @CRSP Date Informats Using Monthly Data

   Input Date       CRSP Date     CRSP Date   CRSP Date   CRSPDTM        CRSPDRM
   (Integer Date)   CRSPDTM       CRSPDRM     CRSPDBM     Monthly Date   Monthly Range
   19620731         440           440         440         19620731       19620630 to 19620731
   19620815         . (missing)   441         440         See below+     See below*
   19620831         441           441         441         19620831       19620801 to 19620831

   + If missing, then missing. If 441, then 19620831. If 440, then 19620731.
   * If missing, then missing. If 441, then 19620801 to 19620831. If 440, then 19620630 to 19620731.
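The three matching rules (exact, forward, and backward) can be illustrated with a sorted calendar and a binary search. The following Python sketch is a simplified model, not the engine's code: the two-entry CALENDAR is a hypothetical excerpt that contains only the monthly offsets 440 and 441 shown above, the function name lookup is made up for this illustration, and returning None for a date outside the excerpt is an assumption about edge behavior.

```python
from bisect import bisect_left
from typing import Optional

# Hypothetical excerpt of the monthly calendar: CRSP date (offset) -> integer date
CALENDAR = {440: 19620731, 441: 19620831}

offsets = sorted(CALENDAR)
dates = [CALENDAR[o] for o in offsets]  # ascending, parallel to offsets


def lookup(intdt: int, mode: str = "exact") -> Optional[int]:
    """Map an integer date to a CRSP date using exact, forward, or backward matching.

    Returns None (a missing value) when no match exists under the chosen rule.
    """
    if mode not in ("exact", "forward", "backward"):
        raise ValueError("mode must be 'exact', 'forward', or 'backward'")
    i = bisect_left(dates, intdt)          # first calendar date >= intdt
    if i < len(dates) and dates[i] == intdt:
        return offsets[i]                  # exact match wins under every rule
    if mode == "forward":
        return offsets[i] if i < len(dates) else None
    if mode == "backward":
        return offsets[i - 1] if i > 0 else None
    return None                            # exact rule, no match


for intdt in (19620731, 19620815, 19620831):
    print(intdt, lookup(intdt, "exact"), lookup(intdt, "forward"), lookup(intdt, "backward"))
```

For the input 19620815, which is not a monthly calendar date, the exact rule gives a missing value, the forward rule gives 441, and the backward rule gives 440, in line with the middle row of the table.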
The CRSP Date Functions

Table 33.11 shows the 23 date functions provided with the SASECRSP engine. These functions are used internally by the engine, but also are available to the end users. There are seven groups of functions. The first four have five functions each, one for each CRSP calendar frequency. The next two are for converting between SAS and Integer date formats. The last function does not convert between formats, but is a shifting function for shifting integer dates based on a fiscal calendar to normal calendar time. In this shift function, the second argument holds the fiscal year-end month of the fiscal calendar used.

Table 33.11 CRSP Date Functions

   Function Group            Function   Argument   Argument   Return
                             Name       One        Two        Value

   CRSP dates to integer dates for December 31, 1998
   Annual                    crspdcia   74         None       19981231
   Quarterly                 crspdciq   293        None       19981231
   Monthly                   crspdcim   877        None       19981231
   Weekly                    crspdciw   1905       None       19981231
   Daily                     crspdcid   9191       None       19981231

   CRSP dates to SAS dates for December 31, 1998
   Annual                    crspdcsa   74         None       14,244
   Quarterly                 crspdcsq   293        None       14,244
   Monthly                   crspdcsm   877        None       14,244
   Weekly                    crspdcsw   1905       None       14,244
   Daily                     crspdcsd   9191       None       14,244

   Integer dates to CRSP dates (exact is illustrated, but can be forward or backward)
   Annual                    crspdica   19981231   0          74
   Quarterly                 crspdicq   19981231   0          293
   Monthly                   crspdicm   19981231   0          877
   Weekly                    crspdicw   19981231   0          1905
   Daily                     crspdicd   19981231   0          9191

   SAS dates to CRSP dates (exact is illustrated, but can be forward or backward)
   Annual                    crspdsca   14,244     0          74
   Quarterly                 crspdscq   14,244     0          293
   Monthly                   crspdscm   14,244     0          877
   Weekly                    crspdscw   14,244     0          1905
   Daily                     crspdscd   14,244     0          9191

   Integer dates to SAS dates for December 31, 1998
   Integer to SAS            crspdi2s   19981231   None       14,244

   SAS dates to integer dates for December 31, 1998
   SAS to Integer            crspds2i   14,244     None       19981231

   Fiscal to calendar shifting of integer dates for December 31, 1998
   Fiscal to Calendar Shift  crspdf2c   20021231   8          20020831
Examples: SASECRSP Interface Engine
Example 33.1: Specifying PERMNOs and RANGE on the LIBNAME Statement

The following statements show how to set up a LIBNAME statement for extracting data for certain selected PERMNOs during a specific time period. The result is shown in Output 33.1.1.

   title2 'Define a range inside the data range';
   title3 'My range is ( 19950101-19960630 )';

   libname _all_ clear;
   libname testit1 sasecrsp "%sysget(CRSP_MSTK)"
           setid=20
           permno=81871    /* Desired PERMNOs are selected */
           permno=82200    /* via the libname PERMNO= option */
           permno=82224
           permno=83435
           permno=83696
           permno=83776
           permno=84788
           range='19950101-19960630';

   proc print data=testit1.ask;
   run;

Output 33.1.1 ASK Monthly Time Series Data with RANGE

   Define a range inside the data range
   My range is ( 19950101-19960630 )

   Obs   PERMNO      CALDT         ASK
     1    81871   19950731    18.25000
     2    81871   19950831    19.25000
     3    81871   19950929    26.00000
     4    81871   19951031    26.00000
     5    81871   19951130    25.50000
     6    81871   19951229    24.25000
     7    81871   19960131    22.00000
     8    81871   19960229    32.50000
     9    81871   19960329    30.25000
    10    81871   19960430    33.75000
    11    81871   19960531    27.50000
    12    81871   19960628    30.50000
    13    82200   19950831    49.50000
    14    82200   19950929    62.75000
    15    82200   19951031    88.00000
    16    82200   19951130   138.50000
    17    82200   19951229   139.25000
    18    82200   19960131   164.25000
    19    82200   19960229    51.00000
    20    82200   19960329    41.62500
    21    82200   19960430    61.25000
    22    82200   19960531    68.25000
    23    82200   19960628    62.50000
    24    82224   19950929    46.50000
    25    82224   19951031    48.50000
    26    82224   19951130    47.75000
    27    82224   19951229    49.75000
    28    82224   19960131    49.00000
    29    82224   19960229    47.00000
    30    82224   19960329    53.00000
    31    82224   19960430    55.50000
    32    82224   19960531    54.25000
    33    82224   19960628    51.00000
    34    83435   19960430    30.25000
    35    83435   19960531    28.00000
    36    83435   19960628    21.00000
    37    83696   19960628    19.12500
Example 33.2: Using the LIBNAME Statement to Access All Keys

To set up the libref to access all keys, no key options such as PERMNO=, TICKER=, or GVKEY= are specified on the LIBNAME statement, and no INSET= option is used. Use of any of these options causes the engine to limit access to only specified keys or specified insets. When no such options are specified, the engine correctly defaults to selecting all keys in the database. Other LIBNAME options such as the RANGE= option can still be used normally to limit the time span of the data, in other words, to define the date range of observations.

In this example, no key-specifying options are used. This forces the engine to default to all PERMNOs in the monthly STK database. The range given on the LIBNAME statement behaves normally, and data is limited to the first two months of 1995.

   title2 'Define a range inside the data range ';
   title3 'My range is ( 19950101-19950228 )';

   libname _all_ clear;
   libname testit2 sasecrsp "%sysget(CRSP_MSTK)"
           setid=20
           range='19950101-19950228';

   data a;
      set testit2.ask(obs=30);
   run;

   proc print data=a;
   run;
The result is shown in Output 33.2.1.
Output 33.2.1 All PERMNOs of ASK Monthly with RANGE

   Define a range inside the data range
   My range is ( 19950101-19950228 )

   Obs   PERMNO      CALDT        ASK
     1    10001   19950131    8.00000
     2    10001   19950228    8.00000
     3    10002   19950131   13.50000
     4    10002   19950228   13.50000
     5    10003   19950131    2.12500
     6    10003   19950228    2.25000
     7    10009   19950131   18.00000
     8    10009   19950228   18.75000
     9    10010   19950131    5.37500
    10    10010   19950228    4.87500
    11    10011   19950131   14.62500
    12    10011   19950228   13.50000
    13    10012   19950131    2.25000
    14    10012   19950228    2.12500
    15    10016   19950131    7.00000
    16    10016   19950228    8.50000
    17    10018   19950131    1.12500
    18    10018   19950228    1.12500
    19    10019   19950131   10.62500
    20    10019   19950228   11.62500
    21    10021   19950131   11.75000
    22    10021   19950228   12.00000
    23    10025   19950131   18.50000
    24    10025   19950228   19.00000
    25    10026   19950131   11.00000
    26    10026   19950228   11.75000
    27    10028   19950131    1.87500
    28    10028   19950228    2.00000
    29    10032   19950131   12.50000
    30    10032   19950228   12.75000
Example 33.3: Accessing One PERMNO Using No RANGE

SASECRSP defaults to providing access to the entire range of available data when no range restriction is specified via the RANGE= option. This example shows access of the entire range of available data for one particular PERMNO extracted from the monthly data set.

   title2 'Select only PERMNO = 81871';
   title3 'Valid trading dates (19890131--19981231)';
   title4 'No range option, leave wide open';

   libname _all_ clear;
   libname testit3 sasecrsp "%sysget(CRSP_MSTK)"
           setid=20
           permno=81871;

   data c;
      set testit3.ask;
   run;

   proc print data=c;
   run;
The result is shown in Output 33.3.1.
Output 33.3.1 PERMNO=81871 of ASK Monthly without RANGE

   Select only PERMNO = 81871
   Valid trading dates (19890131--19981231)
   No range option, leave wide open

   Obs   PERMNO      CALDT        ASK
     1    81871   19950731   18.25000
     2    81871   19950831   19.25000
     3    81871   19950929   26.00000
     4    81871   19951031   26.00000
     5    81871   19951130   25.50000
     6    81871   19951229   24.25000
     7    81871   19960131   22.00000
     8    81871   19960229   32.50000
     9    81871   19960329   30.25000
    10    81871   19960430   33.75000
    11    81871   19960531   27.50000
    12    81871   19960628   30.50000
    13    81871   19960731   26.12500
    14    81871   19960830   19.12500
    15    81871   19960930   19.50000
    16    81871   19961031   14.00000
    17    81871   19961129   18.75000
    18    81871   19961231   24.25000
    19    81871   19970131   29.75000
    20    81871   19970228   24.37500
    21    81871   19970331   15.00000
    22    81871   19970430   18.25000
    23    81871   19970530   25.12500
    24    81871   19970630   31.12500
    25    81871   19970731   35.00000
    26    81871   19970829   33.00000
    27    81871   19970930   26.81250
    28    81871   19971031   18.37500
    29    81871   19971128   16.50000
    30    81871   19971231   16.25000
    31    81871   19980130   22.75000
    32    81871   19980227   21.00000
    33    81871   19980331   22.50000
    34    81871   19980430   16.12500
    35    81871   19980529   11.12500
    36    81871   19980630   13.43750
    37    81871   19980731   22.87500
    38    81871   19980831   17.75000
    39    81871   19980930   24.25000
    40    81871   19981030   26.00000
Example 33.4: Specifying Keys Using the INSET= Option

The INSET= option enables you to select any companies and/or issues you want data for. This example selects two CRSP Index Series from the Indices database, two companies from the CCM database, and three securities from the Stock database for data extraction. Note that because each CRSP database might be in a different location and has to be opened separately, a total of three different librefs are used, one for each database.

   data indices;
      indno=1000000; output;   /* NYSE Value-Weighted Market Index */
      indno=1000001; output;   /* NYSE Equal-Weighted Market Index */
   run;

   libname _all_ clear;
   libname ind2 sasecrsp "%sysget(CRSP_MSTK)"
           setid=420
           inset='indices,INDNO,INDNO'
           range='19990101-19990401';

   title2 'Total Returns for NYSE Value and Equal Weighted Market Indices';
   proc print data=ind2.tret label;
   run;
Output 33.4.1 shows the result of selecting two CRSP Index Series from the Indices database.

Output 33.4.1 IND Data Extracted Using INSET= Option

   Total Returns for NYSE Value and Equal Weighted Market Indices

   Obs     INDNO      CALDT        TRET
     1   1000000   19990129    0.012419
     2   1000000   19990226   -0.024179
     3   1000000   19990331    0.028591
     4   1000001   19990129   -0.007822
     5   1000001   19990226   -0.041127
     6   1000001   19990331    0.015204
This example selects two companies from the CCM database.

   data companies;
      permco=8045; output;     /* Oracle */
      permco=20483; output;    /* Citigroup */
   run;

   libname comp2 sasecrsp "%sysget(CRSP_CST)"
           setid=200
           inset='companies,PERMCO,PERMCO'
           range='20040101-20040531';

   title2 'Using the Link Info of Selected PERMCOs';
   proc print data=comp2.link label;
   run;

   title3 'To Show Dividends Per Share for Oracle and Citigroup';
   proc print data=comp2.div label;
   run;
Output 33.4.2 shows the result of selecting two companies from the CCM database by using the CCM LINK data and the INSET= option.

Output 33.4.2 CCM LINK Data Extracted By Using INSET= Option

   Using the Link Info of Selected PERMCOs
   To Show Dividends Per Share for Oracle and Citigroup

   Obs   GVKEY     LINKDT   LINKENDT   NPERMNO   NPERMCO   LINKTYPE   LINKFLAG
     1   12142   19860312   20991231     10104      8045   LC         BBB
     2    3243   19861029   20991231     70519     20483   LC         BBB
Output 33.4.3 shows the result of selecting two companies from the CCM database by using the CCM DIV data and the INSET= option.

Output 33.4.3 CCM DIV Data Extracted By Using INSET= Option

   Using the Link Info of Selected PERMCOs
   To Show Dividends Per Share for Oracle and Citigroup

   Obs   GVKEY      CALDT      DIV
     1   12142   20040130   0.0000
     2   12142   20040227   0.0000
     3   12142   20040331   0.0000
     4   12142   20040430   0.0000
     5   12142   20040528   0.0000
     6    3243   20040130   0.4000
     7    3243   20040227   0.0000
     8    3243   20040331   0.0000
     9    3243   20040430   0.4000
    10    3243   20040528   0.0000
This example selects three securities from the Stock database by using TICKERs in the INSET= option for data extraction.

   data securities;
      ticker='BAC'; output;    /* Bank of America */
      ticker='DUK'; output;    /* Duke Energy */
      ticker='GSK'; output;    /* GlaxoSmithKline */
   run;

   libname sec3 sasecrsp "%sysget(CRSP_MSTK)"
           setid=20
           inset='securities,TICKER,TICKER'
           range='19970820-19970920';

   title2 'PERMNOs and General Header Info of Selected TICKERs';
   proc print data=sec3.stkhead(keep=permno htick htsymbol) label;
   run;

   title3 'Average Price for Bank of America, Duke and GlaxoSmithKline';
   proc print data=sec3.prc label;
   run;
Output 33.4.4 shows the STK header data for the TICKERs specified by using the INSET= option.

Output 33.4.4 STK Header Data Extracted Using INSET= Option

   PERMNOs and General Header Info of Selected TICKERs
   Average Price for Bank of America, Duke and GlaxoSmithKline

   Obs   PERMNO   PERMCO     COMPNO   ISSUNO   HEXCD   HSHRCD   HSICCD      BEGDT
     1    59408     3151   60003150     4005       1       11     6021   19721229
     2    27959    20608          0        0       1       11     4911   19610731
     3    75064     1973   60001972     2523       1       31     2834   19721229

   Obs      ENDDT   DLSTCD   HCUSIP     HTICK   HCOMNAM
     1   20061229      100   06050510   BAC     BANK OF AMERICA CORP
     2   20061229      100   26441C10   DUK     DUKE ENERGY CORP NEW
     3   20061229      100   37733W10   GSK     GLAXOSMITHKLINE PLC

   Obs   HTSYMBOL   HNAICS   HPRIMEXC   HTRDSTAT   HSECSTAT
     1   BAC        522110   N          A          R
     2   DUK        221122   N          A          R
     3   GSK        325412   N          A          R
Output 33.4.5 shows the STK price data for the TICKERs specified by using the INSET= option.

Output 33.4.5 STK Price Data Extracted Using INSET= Option

   PERMNOs and General Header Info of Selected TICKERs
   Average Price for Bank of America, Duke and GlaxoSmithKline

   Obs   PERMNO      CALDT        PRC
     1    59408   19970829   59.75000
     2    27959   19970829   48.43750
     3    75064   19970829   39.93750
Example 33.5: Specifying Ranges for Individual Keys with the INSET= Option

Insets enable you to define options specific to each individual key. This example uses an inset to select four PERMNOs and specifies a different date restriction for each PERMNO.

   title2 'INSET=testin2 uses date ranges along with PERMNOs:';
   title3 '10107, 12490, 14322, 25788';
   title4 'Begin dates and end dates for each permno are used in the INSET';

   data testin2;
      permno = 10107; date1 = 19980731; date2 = 19981231; output;
      permno = 12490; date1 = 19970101; date2 = 19971231; output;
      permno = 14322; date1 = 19950731; date2 = 19960131; output;
      permno = 25778; date1 = 19950101; date2 = 19950331; output;
   run;

   libname _all_ clear;
   libname mstk2 sasecrsp "%sysget(CRSP_MSTK)"
           setid=20
           inset='testin2,PERMNO,PERMNO,DATE1,DATE2';

   data b;
      set mstk2.prc;
   run;

   proc print data=b;
   run;
Output 33.5.1 shows CRSP Stock price time series data selected by PERMNO in the INSET= option, where each PERMNO has its own time span specified in the INSET= option.
Output 33.5.1 PRC Monthly Time Series Using INSET= Option

   INSET=testin2 uses date ranges along with PERMNOs:
   10107, 12490, 14322, 25788
   Begin dates and end dates for each permno are used in the INSET

   Obs   PERMNO      CALDT         PRC
     1    10107   19980731   109.93750
     2    10107   19980831    95.93750
     3    10107   19980930   110.06250
     4    10107   19981030   105.87500
     5    10107   19981130   122.00000
     6    10107   19981231   138.68750
     7    12490   19970131   156.87500
     8    12490   19970228   143.75000
     9    12490   19970331   137.25000
    10    12490   19970430   160.50000
    11    12490   19970530    86.50000
    12    12490   19970630    90.25000
    13    12490   19970731   105.75000
    14    12490   19970829   101.37500
    15    12490   19970930   106.00000
    16    12490   19971031    98.50000
    17    12490   19971128   109.50000
    18    12490   19971231   104.62500
    19    14322   19950731    32.62500
    20    14322   19950831    32.37500
    21    14322   19950929    36.87500
    22    14322   19951031    34.00000
    23    14322   19951130    39.37500
    24    14322   19951229    39.00000
    25    14322   19960131    41.50000
    26    25778   19950131    49.87500
    27    25778   19950228    57.25000
    28    25778   19950331    59.37500
Example 33.6: Converting Dates By Using the CRSP Date Functions

This example shows how to use the CRSP date functions and formats. The CRSPDTD formats are used for all the crspdt variables, while the YYMMDD format is used for the sasdt variables.

   title2 'OUT= Data Set';
   title3 'CRSP Functions for sasecrsp';

   libname _all_ clear;

   /* Always assign the LIBNAME sasecrsp first */
   libname mstk sasecrsp "%sysget(CRSP_MSTK)" setid=20;

   data a (keep = crspdt crspdt2 crspdt3
                  sasdt sasdt2 sasdt3
                  intdt intdt2 intdt3);
      format crspdt crspdt2 crspdt3 crspdtd8.;
      format sasdt sasdt2 sasdt3 yymmdd6.;
      format intdt intdt2 intdt3 8.;
      format exact 2.;
      crspdt = 1;
      sasdt = '2jul1962'd;
      intdt = 19620702;
      exact = 0;

      /* Call the CRSP date to Integer function */
      intdt2 = crspdcid(crspdt);
      /* Call the SAS date to Integer function */
      intdt3 = crspds2i(sasdt);
      /* Call the Integer to CRSP date function */
      crspdt2 = crspdicd(intdt,exact);
      /* Call the SAS date to CRSP date conversion function */
      crspdt3 = crspdscd(sasdt,exact);
      /* Call the CRSP date to SAS date conversion function */
      sasdt2 = crspdcsd(crspdt);
      /* Call the Integer to SAS date conversion function */
      sasdt3 = crspdi2s(intdt);
   run;

   title3 'PROC PRINT showing data for sasecrsp';
   proc print data=a;
   run;

   title3 'PROC CONTENTS showing formats for sasecrsp';
   proc contents data=a;
   run;
Output 33.6.1 shows the OUT= data set created by the DATA step.

Output 33.6.1 Date Conversions By Using the CRSP Date Functions

   OUT= Data Set
   PROC CONTENTS showing formats for sasecrsp

   Obs     crspdt    crspdt2    crspdt3    sasdt   sasdt2   sasdt3      intdt     intdt2     intdt3
     1   19251231   19620702   19620702   620702   251231   620702   19620702   19251231   19620702
Output 33.6.2 shows the contents of the OUT= data set by alphabetically listing the variables and their attributes.
Output 33.6.2 Contents of Date Conversions By Using the CRSP Date Functions

   Alphabetic List of Variables and Attributes

   #   Variable   Type   Len   Format
   1   crspdt     Num      8   CRSPDTD8.
   2   crspdt2    Num      8   CRSPDTD8.
   3   crspdt3    Num      8   CRSPDTD8.
   7   intdt      Num      8   8.
   8   intdt2     Num      8   8.
   9   intdt3     Num      8   8.
   4   sasdt      Num      8   YYMMDD6.
   5   sasdt2     Num      8   YYMMDD6.
   6   sasdt3     Num      8   YYMMDD6.
Example 33.7: Comparing Different Ways of Accessing CCM Data

You can use three different ways to select CCM data: by the primary key, GVKEY, or by either of the two secondary keys, PERMNO and PERMCO. This section demonstrates the three different ways.

This example retrieves data on Cimflex Teknowledge Corporation, which was previously known as Teknowledge Inc. This company is considered a single entity by the CRSP Stock database and is identified by PERMNO=10083 and PERMCO=8026. The Compustat database, however, considers Teknowledge Inc. and Cimflex Teknowledge Corporation as two separate entities, and each has its own GVKEY. Thus, PERMNO=10083 maps to GVKEYs 11947 and 15495, and PERMCO=8026 has the identical relationship.

Access by PERMNO and access by PERMCO are equivalent in this case, but both differ from access by GVKEY. PERMNO/PERMCO access retrieves data only within the active period of the links, and only the primary linked GVKEY is used for header access. In contrast, GVKEY access provides wide-open, full data for both GVKEYs. See the PERMNO= option for more details.

   title1 'Comparing various access methods for CCM data';

   libname _all_ clear;

   /* assign libnames for the three different access methods */
   libname crsp1a sasecrsp "%sysget(CRSP_CST)"
           setid=200
           permno=10083
           range='19870101-19900101';

   libname crsp1b sasecrsp "%sysget(CRSP_CST)"
           setid=200
           permco=8026
           range='19870101-19900101';

   libname crsp2 sasecrsp "%sysget(CRSP_CST)"
           setid=200
           gvkey=11947
           gvkey=15495
           range='19870101-19900101';

   title2 'PERMNO=10083 access of CCM data';
   title3 'Sales (Net)';

   data permnoaccess;
      set crsp1a.iqitems(keep=gvkey rcaldt fiscaldt iq2);
   run;

   proc print data=permnoaccess;
   run;
Output 33.7.1 shows PERMNO access of CCM quarterly 'Sales (Net)' data.

Output 33.7.1 PERMNO Access of CCM Data

   Comparing various access methods for CCM data
   PERMNO=10083 access of CCM data
   Sales (Net)

   Obs   GVKEY     RCALDT   FISCALDT       IQ2
     1   15495   19870331   19870930    4.5680
     2   15495   19870630   19871231    5.0240
     3   15495   19870930   19880331    4.4380
     4   15495   19871231   19880630    3.8090
     5   15495   19880331   19880930    3.5420
     6   15495   19880630   19881230    2.5940
     7   15495   19890331   19890331    6.4660
     8   15495   19890630   19890630   10.1020
     9   15495   19890929   19890929   12.0650
    10   15495   19891229   19891229   10.8780
   title2 'GVKEY=11947 and GVKEY=15495 access of CCM data';
   title3 'Sales (Net)';

   data gvkeyaccess;
      set crsp2.iqitems(keep=gvkey rcaldt fiscaldt iq2);
   run;

   proc print data=gvkeyaccess;
   run;
Output 33.7.2 shows GVKEY access of CCM quarterly Sales data.
Output 33.7.2 GVKEY Access of CCM Data

   Comparing various access methods for CCM data
   GVKEY=11947 and GVKEY=15495 access of CCM data
   Sales (Net)

   Obs   GVKEY     RCALDT   FISCALDT       IQ2
     1   11947   19870331   19870930    4.5680
     2   11947   19870630   19871231    5.0240
     3   11947   19870930   19880331    4.4380
     4   11947   19871231   19880630    3.8090
     5   11947   19880331   19880930    3.5420
     6   11947   19880630   19881230    2.5940
     7   11947   19880930   19890331    1.6850
     8   11947   19881230   19890630    1.7080
     9   15495   19880331   19880331   14.0660
    10   15495   19880630   19880630   12.2770
    11   15495   19880930   19880930    9.5960
    12   15495   19881230   19881230    9.9800
    13   15495   19890331   19890331    6.4660
    14   15495   19890630   19890630   10.1020
    15   15495   19890929   19890929   12.0650
    16   15495   19891229   19891229   10.8780
title3 'LINK: Link information';
proc print data=crsp2.link;
run;

/* Show how PERMNO and PERMCO access are the same */
title4 'Proc compare of PERMNO vs. PERMCO access of CCM data';
proc compare base=crsp1a.iqitems compare=crsp1b.iqitems brief;
run;
Output 33.7.3 shows CRSP link information and the comparison of PERMNO access to PERMCO access.

Output 33.7.3 Link Information and Comparison

Comparing various access methods for CCM data
GVKEY=11947 and GVKEY=15495 access of CCM data
LINK: Link information
Proc compare of PERMNO vs. PERMCO access of CCM data

Obs    GVKEY      LINKDT    LINKENDT    NPERMNO    NPERMCO    LINKTYPE    LINKFLAG
  1    11947    19860305    19890226      10083       8026    LC          BBB
  2    15495    19880101    19890226          0          0    NR          XXX
  3    15495    19890227    19930909      10083       8026    LC          BBB
Output 33.7.3 continued

Comparing various access methods for CCM data
GVKEY=11947 and GVKEY=15495 access of CCM data
LINK: Link information
Proc compare of PERMNO vs. PERMCO access of CCM data

The COMPARE Procedure
Comparison of CRSP1A.IQITEMS with CRSP1B.IQITEMS
(Method=EXACT)

NOTE: No unequal values were found. All values compared are exactly equal.
Example 33.8: Comparing PERMNO and GVKEY Access of CRSP Stock Data

You can access CRSP data by using GVKEYs. Access in this manner requires the CRSPLINKPATH= option and is identical to access by the corresponding PERMNO(s). Links between PERMNOs and GVKEYs are used without reference to their active period. Link information is used solely for finding corresponding GVKEYs.

This example shows two ways of accessing CRSP Stock data: one by PERMNOs and the other by their corresponding GVKEY. Several members are compared to show that the two access methods are equivalent.

title 'Comparing PERMNO and GVKEY access of CRSP Stock data';
libname _all_ clear;

libname crsp1 sasecrsp "%sysget(CRSP_MSTK)"
   setid=20
   permno=13638
   permno=84641
   range='19900101-19910101';

libname crsp2 sasecrsp "%sysget(CRSP_MSTK)"
   setid=20
   crsplinkpath="%sysget(CRSP_CST)"
   gvkey=1544
   range='19900101-19910101';

title1 'PERMNO=13638 and PERMNO=84641 access of CRSP data';
proc print data=crsp1.stkhead;
run;

%macro compareMember(memb);
   title1 "Proc compare on &memb between PERMNO and GVKEY";
   proc compare base=crsp1.&memb compare=crsp2.&memb brief;
   run;
%mend;

%compareMember(stkhead);
%compareMember(prc);
%compareMember(ret);
%compareMember(askhi);
%compareMember(vol);
Output 33.8.1 compares PERMNO with GVKEY access of CRSP Stock members STKHEAD, PRC, RET, ASKHI, and VOL, showing that they are equal.

Output 33.8.1 Comparing PERMNO and GVKEY Access of CRSP Stock Data

Proc compare on vol between PERMNO and GVKEY

Obs    PERMNO    PERMCO      COMPNO    ISSUNO    HEXCD    HSHRCD    HSICCD       BEGDT
  1     13638       325    60000324       428        3        11      1310    19721229
  2     84641       325    60000324         0        2        11      1382    19970331

Obs       ENDDT    DLSTCD    HCUSIP      HTICK    HCOMNAM
  1    19921231       560    97789210             WOLVERINE EXPLORATION CO
  2    19980130       231    02351730             AMERAC ENERGY CORP

Obs    HTSYMBOL    HNAICS    HPRIMEXC    HTRDSTAT    HSECSTAT
  1    WEXC                  Q           A           R
  2                          A           A           R
Example 33.9: Using Fiscal Date Range Restriction

Fiscal date ranges give you the flexibility of selecting company data by using fiscal year range specifications instead of calendar year range specifications. This example shows how to use this feature to extract data such as the 'Earnings Per Share' time series for several companies for the 1994 fiscal year.

title 'Extract data for fiscal year 1994 for several companies';
libname _all_ clear;

libname crsp1 sasecrsp "%sysget(CRSP_CST)"
   setid=200
   gvkey=6066
   gvkey=12141
   gvkey=10107
   range='f19940101-19941231';

data rnd_eps (keep = gvkey rcaldt fiscaldt iq4 iq9 iq19 iq69);
   set crsp1.iqitems;
run;

proc print data=rnd_eps label;
run;
Output 33.9.1 shows Earnings Per Share for several companies for the 1994 fiscal year.
Output 33.9.1 Earnings Per Share by GVKEY Access for the 1994 Fiscal Year

Extract data for fiscal year 1994 for several companies

Obs    GVKEY      RCALDT    FISCALDT          IQ4       IQ9      IQ19         IQ69
  1     6066    19940331    19940331    1100.0000    0.6300    0.6400     392.0000
  2     6066    19940630    19940630    1092.0000    1.1300    1.1400     688.0000
  3     6066    19940930    19940930    1053.0000    1.1600    1.1800     710.0000
  4     6066    19941230    19941230    1118.0000    2.0300    2.0600    1231.0000
  5    12141    19930930    19940331     134.0000    0.7900    0.7900     239.0000
  6    12141    19931231    19940630     150.0000    0.9500    0.9500     289.0000
  7    12141    19940331    19940930     156.0000    0.8400    0.8400     256.0000
  8    12141    19940630    19941230     170.0000    0.5900    0.5900     362.0000
  9    10107    19940331    19940331            A    0.4600    0.4600      11.3890
 10    10107    19940630    19940630            A    0.7100    0.7100      17.3670
 11    10107    19940930    19940930            A    0.7600    0.7600      18.8070
 12    10107    19941230    19941230     100.9630    0.5400    0.5400      13.4190
Note that two time ID variables are kept. Raw Calendar Trading Date (RCALDT) provides the actual calendar date. Fiscal Trading Date (FISCALDT) provides the date according to the company's fiscal calendar, which depends on the company's fiscal year-end month. For example, observation 8 is Microsoft's fourth fiscal quarter, hence a Fiscal Trading Date of December 30, 1994. Because Microsoft's fiscal year ends in June, its fourth fiscal quarter corresponds to the second calendar quarter of the year, so the Raw Calendar Trading Date shows June 30, 1994. The six-month shift (in this case) required to compute the Raw Calendar Trading Date is applied automatically by the SASECRSP engine.

Keep in mind that fiscal date ranges are applicable only to fiscal members. When fiscal date range restrictions are applied to nonfiscal members, they are ignored. The missing value '.A' seen in the IQ4 column for observations 9 through 11 indicates that the data is reported only on an annual basis.
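The relationship between the two time ID variables can be checked by hand. The following DATA step is only a sketch: the data set name fiscal_shift, the variables fyr and shift, and the shift rule itself are assumptions inferred from the RCALDT and FISCALDT pairs printed in this chapter's examples (fiscal year-end months 3, 6, and 10), not a documented SASECRSP algorithm. It also returns calendar month-ends, whereas the engine reports the last trading day of the month (for example, 19941230 rather than 19941231).

```sas
/* Hypothetical illustration; does not call the SASECRSP engine.        */
/* fyr = fiscal year-end month (the FISCYR field in the period          */
/* descriptor members).                                                  */
data fiscal_shift;
   input gvkey fyr rcaldt :yymmdd8.;

   /* Assumed rule, inferred from the printed outputs:                  */
   /*   year-end in June or later  -> shift forward by 12 - fyr months  */
   /*   year-end before June       -> shift back by fyr months          */
   if fyr >= 6 then shift = 12 - fyr;
   else shift = -fyr;

   /* Month-end of the shifted date */
   fiscal_month_end = intnx('month', rcaldt, shift, 'end');

   format rcaldt fiscal_month_end yymmddn8.;
   datalines;
12141 6 19940630
2416 3 19860630
3248 10 19940131
;
run;

proc print data=fiscal_shift noobs;
run;
```

Under these assumptions the first row shifts June 1994 forward six months to December 1994, which agrees with observation 8 above.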
Example 33.10: Using Different Types of Range Restrictions in the INSET

You can specify both calendar and fiscal date range restrictions with the INSET= option. This example shows how to use both types of date range restrictions. Two INSETs, nearly identical except for the type of their date range restriction, are used for accessing the same database. Despite the many similarities, the different date range restriction types result in dissimilar output. Note that the specification of the datetype in the INSET= option for comp_calendar is not required. The datetype default is the calendar type.

data comp_fiscal;
   /* Crude Petroleum & Natural Gas */
   compkey=2416; begdate=19860101; enddate=19861231;
   datetype='fiscal';
   output;
   /* Commercial Intertech */
   compkey=3248; begdate=19940101; enddate=19941231;
   datetype='fiscal';
   output;
run;

data comp_calendar;
   /* Crude Petroleum & Natural Gas */
   compkey=2416; begdate=19860101; enddate=19861231;
   datetype='calendar';
   output;
   /* Commercial Intertech */
   compkey=3248; begdate=19940101; enddate=19941231;
   datetype='calendar';
   output;
run;

libname _all_ clear;

libname fisclib sasecrsp "%sysget(CRSP_CST)"
   SETID=200
   INSET='comp_fiscal,compkey,gvkey,begdate,enddate,datetype';

libname callib sasecrsp "%sysget(CRSP_CST)"
   SETID=200
   INSET='comp_calendar,compkey,gvkey,begdate,enddate,datetype';
title1 'Quarterly Period Descriptors';
title2 'Using the Fiscal Date Range';

proc print data=fisclib.qperdes(drop = peftnt1 peftnt2 peftnt3 peftnt4
                                       peftnt5 peftnt6 peftnt7 peftnt8
                                       candxc flowcd
                                       spbond spdebt sppaper);
run;
Output 33.10.1 shows quarterly period descriptors for the 1986 and 1994 fiscal years.

Output 33.10.1 Using Inset with Fiscal Date Range

Quarterly Period Descriptors
Using the Fiscal Date Range

Obs  GVKEY  CRSPDT    RCALDT  FISCALDT  DATYR  DATQTR  FISCYR  CALYR  CALQTR  UPCODE  SRCDOC
  1   2416     242  19860630  19860331   1986       1       3   1986       2       3      53
  2   2416     243  19860930  19860630   1986       2       3   1986       3       3      53
  3   2416     244  19861231  19860930   1986       3       3   1986       4       3      53
  4   2416     245  19870331  19861231   1986       4       3   1987       1       3      53
  5   3248     274  19940131  19940331   1994       1      10   1993       4       3      53
  6   3248     275  19940429  19940630   1994       2      10   1994       1       3      53
  7   3248     276  19940729  19940930   1994       3      10   1994       2       3      53
  8   3248     277  19941031  19941230   1994       4      10   1994       3       3      53

Obs  SPBOND  SPDEBT  SPPAPER  SPRANK  MAJIND  INDIND    REPDT  FLOWCD  CANDXC
  1       0       0        0      17       0       0        0       1       0
  2       0       0        0      18       0       0        0       1       0
  3       0       0        0      18       0       0        0       1       0
  4       0       0        0      21       0       0        0       1       0
  5       0       0        0      16       0       0  1994054       7       0
  6       0       0        0      16       0       0  1994146       7       0
  7       0       0        0      16       0       0  1994236       7       0
  8       0       0        0      16       0       0  1994349       7       0

The listing also carries the column headers PEFTNT1 through PEFTNT8, with no values shown for any observation.
The next PRINT procedure uses the calendar datetype in its INSET= option instead of the fiscal datetype, producing different results for the Crude Petroleum and Natural Gas Company when the report is based on calendar dates instead of fiscal dates. The differences shown in observations 1 through 4 are due to Crude Petroleum and Natural Gas Company’s fiscal year ending in March instead of December.
Since Commercial Intertech does not shift its fiscal year, but uses a fiscal year ending in December, the fiscal report and the calendar report match exactly for the company's corresponding observations 5 through 8 in Output 33.10.1 and Output 33.10.2, respectively.

title1 'Quarterly Period Descriptors';
title2 'Using the Calendar Date Range';

proc print data=callib.qperdes(drop = peftnt1 peftnt2 peftnt3 peftnt4
                                      peftnt5 peftnt6 peftnt7 peftnt8
                                      candxc flowcd
                                      spbond spdebt sppaper);
run;
Output 33.10.2 shows quarterly period descriptors for the designated calendar date range.

Output 33.10.2 Using Inset with Calendar Date Range

Quarterly Period Descriptors
Using the Calendar Date Range

Obs  GVKEY  CRSPDT    RCALDT  FISCALDT  DATYR  DATQTR  FISCYR  CALYR  CALQTR  UPCODE  SRCDOC
  1   2416     241  19860331  19851231   1985       4       3   1986       1       3      53
  2   2416     242  19860630  19860331   1986       1       3   1986       2       3      53
  3   2416     243  19860930  19860630   1986       2       3   1986       3       3      53
  4   2416     244  19861231  19860930   1986       3       3   1986       4       3      53
  5   3248     274  19940131  19940331   1994       1      10   1993       4       3      53
  6   3248     275  19940429  19940630   1994       2      10   1994       1       3      53
  7   3248     276  19940729  19940930   1994       3      10   1994       2       3      53
  8   3248     277  19941031  19941230   1994       4      10   1994       3       3      53

Obs  SPBOND  SPDEBT  SPPAPER  SPRANK  MAJIND  INDIND    REPDT  FLOWCD  CANDXC
  1       0       0        0      17       0       0        0       1       0
  2       0       0        0      17       0       0        0       1       0
  3       0       0        0      18       0       0        0       1       0
  4       0       0        0      18       0       0        0       1       0
  5       0       0        0      16       0       0  1994054       7       0
  6       0       0        0      16       0       0  1994146       7       0
  7       0       0        0      16       0       0  1994236       7       0
  8       0       0        0      16       0       0  1994349       7       0

The listing also carries the column headers PEFTNT1 through PEFTNT8, with no values shown for any observation.
Fiscal date range restrictions are valid only for fiscal members and can be used in either the INSET= option or the RANGE= option. Use calendar date ranges for nonfiscal members.

NOTE: Fiscal date ranges are ignored when used with nonfiscal members.
Example 33.11: Using INSET Ranges with the LIBNAME RANGE Option

It is possible to specify both individual range restrictions with an INSET and a global date range restriction via the RANGE= option on the LIBNAME statement. In such cases, only observations that satisfy both date range restrictions are returned. The effective range restriction becomes the intersection of the two specified range restrictions. If this intersection is empty, no observations are returned.

This example extracts data for two companies, IBM and Microsoft. Each company has an individual range restriction specified in the inset. Furthermore, a global range restriction is set by the RANGE= option on the LIBNAME statement. As a result, the effective date range restriction for IBM becomes August 1, 1999, to February 1, 2000, and the effective date range restriction for Microsoft becomes January 1, 2001, to April 21, 2002.

data two_companies;
   gvkey=6066; date1=19800101; date2=20000201; output;
   gvkey=12141; date1=20010101; date2=20051231; output;
run;

libname _all_ clear;

libname mylib sasecrsp "%sysget(CRSP_CST)"
   SETID=200
   INSET='two_companies,gvkey,gvkey,date1,date2'
   RANGE='19990801-20020421';

title1 'Two Companies, Two Range Selections';
title2 'Global RANGE Statement Used With Individual Inset Ranges';
title3 'Results Show Intersection of Both Range Restrictions';

proc sql;
   select prcc.gvkey, prcc.caldt, prcc, ern
      from mylib.prcc as prcc, mylib.ern as ern
      where prcc.caldt = ern.caldt and
            prcc.gvkey = ern.gvkey;
quit;
Output 33.11.1 shows the combined effect of both INSET and RANGE date restrictions on the closing prices and earnings per share for IBM and Microsoft.
Output 33.11.1 Mixing INSET Ranges with the RANGE= Option

Two Companies, Two Range Selections
Global RANGE Statement Used With Individual Inset Ranges
Results Show Intersection of Both Range Restrictions

          Calendar
           Trading     Closing    Earnings
GVKEY         Date       Price   Per Share
------------------------------------------
 6066     19990831    124.5625      4.1950
 6066     19990930    121.0000      4.3650
 6066     19991029     98.2500      4.3650
 6066     19991130    103.0625      4.3650
 6066     19991231    107.8750      4.2500
 6066     20000131    112.2500      4.2500
12141     20010131     30.5313      0.9500
12141     20010228     29.5000      0.9500
12141     20010330     27.3438      0.9500
12141     20010430     33.8750      0.9500
12141     20010531     34.5900      0.9500
12141     20010629     36.5000      0.7250
12141     20010731     33.0950      0.7250
12141     20010831     28.5250      0.7250
12141     20010928     25.5850      0.6000
12141     20011031     29.0750      0.6000
12141     20011130     32.1050      0.6000
12141     20011231     33.1250      0.5650
12141     20020131     31.8550      0.5650
12141     20020228     29.1700      0.5650
12141     20020328     30.1550      0.5900
For more about using the SQL procedure, see the chapter on SQL in Base SAS Procedures Guide.
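The intersection rule used in Example 33.11 can also be worked out without opening the database. The following DATA step is a minimal sketch of that arithmetic only; it does not call the SASECRSP engine, and the data set name effective_ranges and the variables range_beg, range_end, eff_beg, eff_end, and empty are hypothetical names chosen for illustration. It relies on the fact that dates written as yyyymmdd integers sort in calendar order.

```sas
/* Hypothetical illustration of the intersection rule; the values are   */
/* the inset rows and the RANGE= value from Example 33.11.              */
data effective_ranges;
   /* Global range, as in RANGE='19990801-20020421' */
   range_beg = 19990801;
   range_end = 20020421;

   /* Individual ranges, as in the two_companies inset */
   do gvkey = 6066, 12141;
      if gvkey = 6066 then do; date1 = 19800101; date2 = 20000201; end;
      else do; date1 = 20010101; date2 = 20051231; end;

      /* The effective range is the overlap of the two ranges */
      eff_beg = max(date1, range_beg);
      eff_end = min(date2, range_end);

      /* An empty overlap means no observations would be returned */
      empty = (eff_beg > eff_end);
      output;
   end;
run;

proc print data=effective_ranges noobs;
   var gvkey eff_beg eff_end empty;
run;
```

For GVKEY 6066 this gives 19990801 to 20000201, and for GVKEY 12141 it gives 20010101 to 20020421, which matches the effective ranges stated in the example.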
Data Elements Reference: SASECRSP Interface Engine Data sets are made available based on the type of CRSP database opened. Table 33.12, Table 33.13, and Table 33.14 show summary views of the three types of CRSP databases (Stock, CCM, and Indices) and the data sets they make available. Details on the data sets including their specific fields can be found in sections immediately following the summary tables. You can also see the available data sets for an opened database via the SAS Explorer by opening a SASECRSP libref that you have previously assigned.
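As an alternative to browsing in the SAS Explorer, a directory listing can be requested in code. The following is a sketch rather than documented SASECRSP behavior: the libref mylib and the GVKEY selection are illustrative choices borrowed from the earlier examples, the CRSP_CST environment variable is assumed to point at a CCM database, and it is assumed that the engine supports a PROC CONTENTS directory listing in the same way that it supports browsing in the SAS Explorer.

```sas
/* Hypothetical libref; assumes CRSP_CST points at a CCM database */
libname mylib sasecrsp "%sysget(CRSP_CST)"
   setid=200
   gvkey=6066;

/* List the members of the libref; NODS suppresses the per-member detail */
proc contents data=mylib._all_ nods;
run;
```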
Table 33.12 Summary of All Available Data Sets by CRSP Database Type

CRSP Database  Data Set Name  Reference Table Title                    Reference Table
STOCK          STKHEAD        Header Identification and Summary Data   Table 33.15
               NAMES          Name History Array                       Table 33.16
               DISTS          Distribution Event Array                 Table 33.17
               SHARES         Shares Outstanding Observation Array     Table 33.18
               DELIST         Delisting History Array                  Table 33.19
               NASDIN         NASDAQ Information Array                 Table 33.20
               PRC            Price or Bid/Ask Average Time Series     Table 33.21
               RET            Returns Time Series                      Table 33.21
               BIDLO          Bid or Low Price Time Series             Table 33.21
               ASKHI          Ask or High Price Time Series            Table 33.21
               BID            Bid Time Series                          Table 33.21
               ASK            Ask Time Series                          Table 33.21
               RETX           Returns Without Dividends Time Series    Table 33.21
               SPREAD         Spread Between Bid and Ask               Table 33.21
               ALTPRC         Price Alternate Time Series              Table 33.21
               VOL            Volume Time Series                       Table 33.21
               NUMTRD         Number of Trades Time Series             Table 33.21
               ALTPRCDT       Price Alternate Date Time Series         Table 33.21
               PORT1          Portfolio Data for Portfolio Type 1      Table 33.22
               PORT2          Portfolio Data for Portfolio Type 2      Table 33.22
               PORT3          Portfolio Data for Portfolio Type 3      Table 33.22
               PORT4          Portfolio Data for Portfolio Type 4      Table 33.22
               PORT5          Portfolio Data for Portfolio Type 5      Table 33.22
               PORT6          Portfolio Data for Portfolio Type 6      Table 33.22
               PORT7          Portfolio Data for Portfolio Type 7      Table 33.22
               PORT8          Portfolio Data for Portfolio Type 8      Table 33.22
               PORT9          Portfolio Data for Portfolio Type 9      Table 33.22
               GROUP16        Group Data for Group Type 16             Table 33.22
Table 33.13 Summary of All Available Data Sets by CRSP Database Type

CRSP Database  Data Set Name  Reference Table Title                        Reference Table
CCM            CSTHEAD        Compustat Header Data                        Table 33.23
               CSTNAME        Compustat Description History Array          Table 33.24
               LINK           CRSP Compustat Link History                  Table 33.25
               APERDES        Annual Period Descriptors Time Series        Table 33.26
               QPERDES        Quarterly Period Descriptors Time Series     Table 33.27
               IAITEMS        Annual Data Items                            Table 33.28
               IQITEMS        Quarterly Data Items                         Table 33.29
               BAITEMS        Bank Annual Data Items                       Table 33.30
               BQITEMS        Bank Quarterly Data Items                    Table 33.31
               PRCH           High Price Time Series                       Table 33.32
               PRCL           Low Price Time Series                        Table 33.32
               PRCC           Closing Price Time Series                    Table 33.32
               DIV            Dividends Per Share Time Series              Table 33.32
               ERN            Earnings Per Share Time Series               Table 33.32
               SHSTRD         Shares Traded Time Series                    Table 33.32
               DIVRTE         Annualized Dividend Rate Time Series         Table 33.32
               RAWADJ         Raw Adjustment Factor Time Series            Table 33.32
               CUMADJ         Cumulative Adjustment Factor Time Series     Table 33.32
               BKV            Book Value Per Share Time Series             Table 33.32
               CHEQVM         Cash Equivalent Distribution Time Series     Table 33.32
               CSHOQ          Common Shares Outstanding Time Series        Table 33.32
               NAVM           Net Asset Value Time Series                  Table 33.32
               OEPS12         Earnings/Share from Operations               Table 33.32
               GICS           Global Industry Class. Std. code             Table 33.32
               CPSPIN         S&P Index Primary Marker Time Series         Table 33.32
               DIVFT          Dividends per share footnotes                Table 33.32
               RAWADJFT       Raw adjustment factor footnotes              Table 33.32
               COMSTAFT       Comparability status footnotes               Table 33.32
               ISAFT          Issue status alert footnotes                 Table 33.32
               SEGSRC         Operating Segment Source History             Table 33.33
               SEGPROD        Operating Segment Products History           Table 33.34
               SEGCUST        Operating Segment Customer History           Table 33.35
               SEGDTL         Operating Segment Detail History             Table 33.36
               SEGNAICS       Operating Segment NAICS History              Table 33.37
               SEGGEO         Geographic Segment History                   Table 33.38
               SEGCUR         Segment Currency Data                        Table 33.39
               SEGITM         Segment Item Data                            Table 33.40
Table 33.14 Summary of All Available Data Sets by CRSP Database Type

CRSP Database  Data Set Name  Reference Table Title                        Reference Table
IND            INDHEAD        Index Header Data                            Table 33.41
               REBAL          Index Rebalancing History Arrays             Table 33.42
               REBAL          Index Rebalancing History Group Arrays       Table 33.43
               LIST           Index Membership List Arrays                 Table 33.44
               LIST           Index Membership List Groups Arrays          Table 33.45
               USDCNT         Portfolio Used Count Array                   Table 33.46
               TOTCNT         Portfolio Total Count Array                  Table 33.47
               USDCNT         Portfolio Used Count Time Series Groups      Table 33.48
               TOTCNT         Portfolio Total Count Time Series Groups     Table 33.49
               USDVAL         Portfolio Used Value Array                   Table 33.50
               TOTVAL         Portfolio Total Value Array                  Table 33.51
               USDVAL         Portfolio Used Value Time Series Groups      Table 33.52
               TOTVAL         Portfolio Total Value Time Series Groups     Table 33.53
               TRET           Total Returns Time Series                    Table 33.54
               ARET           Appreciation Returns Time Series             Table 33.55
               IRET           Income Returns Time Series                   Table 33.56
               TRET           Total Returns Time Series Groups             Table 33.57
               ARET           Income Returns Time Series Groups            Table 33.58
               IRET           Income Returns Time Series Groups            Table 33.59
               TIND           Total Return Index Levels Time Series        Table 33.60
               AIND           Appreciation Index Levels Time Series        Table 33.61
               IIND           Income Index Levels Time Series              Table 33.62
               TIND           Total Return Index Levels Groups             Table 33.63
               AIND           Appreciation Index Levels Groups             Table 33.64
               IIND           Income Index Levels Time Series Groups       Table 33.65
Available CRSP Stock Data Sets

STKHEAD Data Set: Header Identification & Summary Data

Table 33.15 STKHEAD Data Set: Header Identification & Summary Data

Fields     Label                                            Type
PERMNO     PERMNO                                           Numeric
PERMCO     PERMCO                                           Numeric
COMPNO     NASDAQ Company Number                            Numeric
ISSUNO     NASDAQ Issue Number                              Numeric
HEXCD      Exchange Code Header                             Numeric
HSHRCD     Share Code Header                                Numeric
HSICCD     Standard Industrial Classification Code          Numeric
BEGDT      Begin of Stock Data                              Numeric
ENDDT      End of Stock Data                                Numeric
DLSTCD     Delisting Code Header                            Numeric
HCUSIP     CUSIP Header                                     Character
HTICK      Ticker Symbol Header                             Character
HCOMNAM    Company Name Header                              Character
HTSYMBOL   Trading Symbol Header                            Character
HNAICS     North American Industry Classification Header    Character
HPRIMEXC   Primary Exchange Header                          Character
HTRDSTAT   Trading Status Header                            Character
HSECSTAT   Security Status Header                           Character
NAMES Data Set: Name History Array

Table 33.16 NAMES Data Set: Name History Array

Fields     Label                                            Type
PERMNO     PERMNO                                           Numeric
NAMEDT     Names Date                                       Numeric
NAMEENDT   Names Ending Date                                Numeric
SHRCD      Share Code                                       Numeric
EXCHCD     Exchange Code                                    Numeric
SICCD      Standard Industrial Classification Code          Numeric
NCUSIP     CUSIP                                            Numeric
TICKER     Ticker Symbol                                    Character
COMNAM     Company Name                                     Character
SHRCLS     Share Class                                      Numeric
TSYMBOL    Trading Symbol                                   Character
NAICS      North American Industry Classification System    Character
PRIMEXCH   Primary Exchange                                 Character
TRDSTAT    Trading Status                                   Character
SECSTAT    Security Status                                  Character
DISTS Data Set: Distribution Event Array

Table 33.17 DISTS Data Set: Distribution Event Array

Fields     Label                            Type
PERMNO     PERMNO                           Numeric
DISTCD     Distribution Code                Numeric
DIVAMT     Dividend Cash Amount             Numeric
FACPR      Factor to Adjust Price           Numeric
FACSHR     Factor to Adjust Share           Numeric
DCLRDT     Distribution Declaration Date    Numeric
EXDT       Ex-Distribution Date             Numeric
RCRDDT     Record Date                      Numeric
PAYDT      Payment Date                     Numeric
ACPERM     Acquiring PERMNO                 Numeric
ACCOMP     Acquiring PERMCO                 Numeric
SHARES Data Set: Shares Outstanding Observation Array

Table 33.18 SHARES Data Set: Shares Outstanding Observation Array

Fields     Label                                  Type
PERMNO     PERMNO                                 Numeric
SHROUT     Shares Outstanding                     Numeric
SHRSDT     Shares Observation Date                Numeric
SHRENDDT   Shares Observation End Date            Numeric
SHRFLG     Shares Outstanding Observation Flag    Numeric
DELIST Data Set: Delisting History Array

Table 33.19 DELIST Data Set: Delisting History Array

Fields     Label                                 Type
PERMNO     PERMNO                                Numeric
DLSTDT     Delisting Date                        Numeric
DLSTCD     Delisting Code                        Numeric
NWPERM     New PERMNO                            Numeric
NWCOMP     New PERMCO                            Numeric
NEXTD      Delisting Next Price Date             Numeric
DLAMT      Delisting Amount                      Numeric
DLRETX     Delisting Return Without Dividends    Numeric
DLPRC      Delisting Price                       Numeric
DLPDT      Delisting Amount Date                 Numeric
DLRET      Delisting Return                      Numeric
NASDIN Data Set: NASDAQ Information Array

Table 33.20 NASDIN Data Set: NASDAQ Information Array

Fields     Label                               Type
PERMNO     PERMNO                              Numeric
TRTSCD     NASDAQ Traits Code                  Numeric
TRTSDT     NASDAQ Traits Date                  Numeric
TRTSENDT   NASDAQ Traits End Date              Numeric
NMSIND     NASDAQ National Market Indicator    Numeric
MMCNT      Market Maker Count                  Numeric
NSDINX     Nasd Index Code                     Numeric
STOCK Time Series Data Sets

Table 33.21 STOCK Time Series Data Sets

Data Set Name  Long Name                                 Fields     Label                     Type
PRC            Price or Bid/Ask Average Time Series      PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         PRC        Price or Bid/Ask Aver     Numeric
RET            Returns Time Series                       PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         RET        Returns                   Numeric
ASKHI          Ask or High Price Time Series             PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         ASKHI      Ask or High Price         Numeric
BIDLO          Bid or Low Price Time Series              PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         BIDLO      Bid or Low Price          Numeric
BID            Bid Time Series                           PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         BID        Bid                       Numeric
ASK            Ask Time Series                           PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         ASK        Ask                       Numeric
RETX           Returns without Dividends                 PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         RETX       Returns w/o Dividends     Numeric
SPREAD         Spread Between Bid and Ask Time Series    PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         SPREAD     Spread Between Bid Ask    Numeric
ALTPRC         Price Alternate Time Series               PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         ALTPRC     Price Alternate           Numeric
VOL            Volume Time Series                        PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         VOL        Volume                    Numeric
NUMTRD         Number of Trades Time Series              PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         NUMTRD     Number of Trades          Numeric
ALTPRCDT       Alternate Price Date Time Series          PERMNO     PERMNO                    Numeric
                                                         CALDT      Calendar Trading Date     Numeric
                                                         ALTPRCDT   Alternate Price Date      Numeric
Portfolio and Group Data Sets

Table 33.22 Portfolio and Group Data Sets

Data Set  Long Name                             Fields     Label                                        Type
PORT1     Portfolio data for Portfolio Type 1   PERMNO     PERMNO                                       Numeric
                                                CALDT      Calendar Trading Date                        Numeric
                                                PORT1      Portfolio Assignment for Portfolio Type 1    Numeric
                                                STAT1      Portfolio Statistic for Portfolio Type 1     Numeric
PORT2     Portfolio data for Portfolio Type 2   PERMNO     PERMNO                                       Numeric
                                                CALDT      Calendar Trading Date                        Numeric
                                                PORT2      Portfolio Assignment for Portfolio Type 2    Numeric
                                                STAT2      Portfolio Statistic for Portfolio Type 2     Numeric
PORT3     Portfolio data for Portfolio Type 3   PERMNO     PERMNO                                       Numeric
                                                CALDT      Calendar Trading Date                        Numeric
                                                PORT3      Portfolio Assignment for Portfolio Type 3    Numeric
                                                STAT3      Portfolio Statistic for Portfolio Type 3     Numeric
PORT4     Portfolio data for Portfolio Type 4   PERMNO     PERMNO                                       Numeric
                                                CALDT      Calendar Trading Date                        Numeric
                                                PORT4      Portfolio Assignment for Portfolio Type 4    Numeric
                                                STAT4      Portfolio Statistic for Portfolio Type 4     Numeric
PORT5     Portfolio data for Portfolio Type 5   PERMNO     PERMNO                                       Numeric
                                                CALDT      Calendar Trading Date                        Numeric
                                                PORT5      Portfolio Assignment for Portfolio Type 5    Numeric
                                                STAT5      Portfolio Statistic for Portfolio Type 5     Numeric
PORT6     Portfolio data for Portfolio Type 6   PERMNO     PERMNO                                       Numeric
                                                CALDT      Calendar Trading Date                        Numeric
                                                PORT6      Portfolio Assignment for Portfolio Type 6    Numeric
                                                STAT6      Portfolio Statistic for Portfolio Type 6     Numeric
PORT7     Portfolio data for Portfolio Type 7   PERMNO     PERMNO                                       Numeric
                                                CALDT      Calendar Trading Date                        Numeric
                                                PORT7      Portfolio Assignment for Portfolio Type 7    Numeric
                                                STAT7      Portfolio Statistic for Portfolio Type 7     Numeric
PORT8     Portfolio data for Portfolio Type 8   PERMNO     PERMNO                                       Numeric
                                                CALDT      Calendar Trading Date                        Numeric
                                                PORT8      Portfolio Assignment for Portfolio Type 8    Numeric
                                                STAT8      Portfolio Statistic for Portfolio Type 8     Numeric
PORT9     Portfolio data for Portfolio Type 9   PERMNO     PERMNO                                       Numeric
                                                CALDT      Calendar Trading Date                        Numeric
                                                PORT9      Portfolio Assignment for Portfolio Type 9    Numeric
                                                STAT9      Portfolio Statistic for Portfolio Type 9     Numeric
GROUP16   Group data for Group Type 16          PERMNO     PERMNO                                       Numeric
                                                GRPDT      Group Beginning Date                         Numeric
                                                GRPENDDT   Group Ending Date                            Numeric
                                                GRPFLAG    Group Flag of Associated Index               Numeric
                                                GRPSU      Group Subflag                                Numeric
Available Compustat Data Sets

CSTHEAD Data Set: Compustat Header Data

Table 33.23 CSTHEAD Data Set: Compustat Header Data

Fields     Label                                       Type
GVKEY      GVKEY                                       Numeric
IPERM      Header CRSP issue PERMNO link               Numeric
ICOMP      Header CRSP company PERMCO link             Numeric
BEGYR      Annual date of earliest data (yyyy)         Numeric
ENDYR      Annual date of latest data (yyyy)           Numeric
BEGQTR     Quarterly date of earliest data (yyyy.q)    Numeric
ENDQTR     Quarterly date of latest data (yyyy.q)      Numeric
AVAILFLG   Code of available CST data types            Numeric
DNUM       Industry code                               Numeric
FILE       File identification code                    Numeric
ZLIST      Exchange listing and S&P Index code         Numeric
STATE      State identification code                   Numeric
COUNTY     County identification code                  Numeric
STINC      State incorporation code                    Numeric
FINC       Foreign incorporation code                  Numeric
XREL       S&P Industry Index relative code            Numeric
STK        Stock ownership code                        Numeric
DUP        Duplicate file code                         Numeric
CCNDX      Current Canadian Index Code                 Numeric
GICS       Global Industry Class. Std. code            Numeric
IPODT      IPO date                                    Numeric
BEGDT      First date of Compustat data                Numeric
ENDDT      Last date of Compustat data                 Numeric
FUNDF1     Fundamental File Identification Code 1      Numeric
FUNDF2     Fundamental File Identification Code 2      Numeric
FUNDF3     Fundamental File Identification Code 3      Numeric
NAICS      North American Industry Classification      Character
CPSPIN     Primary S&P index marker                    Character
CSSPIN     Secondary S&P index marker                  Character
CSSPII     Subset S&P index marker                     Character
SUBDBT     Current S&P Subordinated Debt Rating        Character
CPAPER     Current S&P Commercial Paper Rating         Character
SDBT       Current S&P Senior Debt Rating              Character
SDBTIM     Current S&P Senior Debt Rating-Footnote     Character
CNUM       CUSIP issuer code                           Character
CIC        Issuer number                               Character
CONAME     Company name                                Character
INAME      Industry name                               Character
SMBL       Stock ticker symbol                         Character
EIN        Employer Identification Number              Character
INCORP     Incorporation ISO Country Code              Character
RCST3      Reserved 3                                  Character
CSTNAME Data Set: Compustat Description History Array

Table 33.24 CSTNAME Data Set: Compustat Description History Array

Fields     Label                                       Type
GVKEY      GVKEY                                       Numeric
CHGDT      Effective date of this description          Numeric
CHGENDDT   Last effective date of this description     Numeric
DNUM       Industry code                               Numeric
FILE       File identification code                    Numeric
ZLIST      Exchange listing and S&P Index code         Numeric
STATE      State identification code                   Numeric
COUNTY     County identification code                  Numeric
STINC      State incorporation code                    Numeric
FINC       Foreign incorporation code                  Numeric
XREL       S&P Industry Index relative code            Numeric
STK        Stock ownership code                        Numeric
DUP        Duplicate file code                         Numeric
CCNDX      Current Canadian Index Code                 Numeric
GICS       Global Industry Classification Std. code    Numeric
IPODT      IPO date                                    Numeric
RCST1      Reserved 1                                  Numeric
RCST2      Reserved 2                                  Numeric
FUNDF1     Fundamental File Identification Code 1      Numeric
FUNDF2     Fundamental File Identification Code 2      Numeric
FUNDF3     Fundamental File Identification Code 3      Numeric
NAICS      North American Industry Classification      Character
CPSPIN     Primary S&P index marker                    Character
CSSPIN     Secondary S&P index marker                  Character
CSSPII     Subset S&P index marker                     Character
SUBDBT     Current S&P Subordinated Debt Rating        Character
CPAPER     Current S&P Commercial Paper Rating         Character
SDBT       Current S&P Senior Debt Rating              Character
SDBTIM     Current S&P Senior Debt Rating-Footnote     Character
CNUM       CUSIP issuer code                           Character
CIC        Issuer number                               Character
CONAME     Company name                                Character
INAME      Industry name                               Character
SMBL       Stock ticker symbol                         Character
EIN        Employer Identification Number              Character
INCORP     Incorporation ISO Country Code              Character
RCST3      Reserved 3                                  Character
LINK Data Set: CRSP Compustat Link History

Table 33.25 LINK Data Set: CRSP Compustat Link History

Fields     Label                       Type
GVKEY      GVKEY                       Numeric
LINKDT     First date link is valid    Numeric
LINKENDT   Last date link is valid     Numeric
NPERMNO    CRSP PERMNO linked          Numeric
NPERMCO    CRSP PERMCO linked          Numeric
LINKTYPE   Link type code              Character
LINKFLAG   Linking Flag                Character
APERDES Data Set: Annual Period Descriptors Time Series

Table 33.26 APERDES Data Set: Annual Period Descriptors Time Series

Fields     Label                                               Type
GVKEY      GVKEY                                               Numeric
CRSPDT     CRSP Date                                           Numeric
RCALDT     Raw Calendar Trading Date                           Numeric
FISCALDT   Fiscal Trading Date                                 Numeric
DATYR      Data Year                                           Numeric
DATQTR     Data Quarter                                        Numeric
FISCYR     Fiscal year-end month of data                       Numeric
CALYR      Calendar year                                       Numeric
CALQTR     Calendar quarter                                    Numeric
UPCODE     Update code                                         Numeric
SRCDOC     Source document code                                Numeric
SPBOND     S&P Senior Debt Rating                              Numeric
SPDEBT     S&P Subordinated Debt Rating                        Numeric
SPPAPER    S&P Commercial Paper Rating                         Numeric
SPRANK     Common Stock Ranking                                Numeric
MAJIND     Major Index Code                                    Numeric
INDIND     S&P Industry Index Code                             Numeric
REPDT      Report date of quarterly earnings                   Numeric
FLOWCD     Flow of funds statement format code                 Numeric
CANDXC     Canadian index code                                 Numeric
PEFTNT1    Period Descriptor Footnote 1 Source Document Code   Character
PEFTNT2    Period Descriptor Footnote 2 Month of Deletion      Character
PEFTNT3    Period Descriptor Footnote 3 Year of Deletion       Character
PEFTNT4    Period Descriptor Footnote 4 Reason For Deletion    Character
PEFTNT5    Period Descriptor Footnote 5 Unused                 Character
PEFTNT6    Period Descriptor Footnote 6 Unused                 Character
PEFTNT7    Period Descriptor Footnote 7 Unused                 Character
PEFTNT8    Period Descriptor Footnote 8 Unused                 Character
QPERDES Data Set—Quarterly Period Descriptors Time Series

Table 33.27  QPERDES Data Set—Quarterly Period Descriptors Time Series

Fields     Label                                               Type
GVKEY      GVKEY                                               Numeric
CRSPDT     CRSP Date                                           Numeric
RCALDT     Raw Calendar Trading Date                           Numeric
FISCALDT   Fiscal Trading Date                                 Numeric
DATYR      Data Year                                           Numeric
DATQTR     Data Quarter                                        Numeric
FISCYR     Fiscal year-end month of data                       Numeric
CALYR      Calendar year                                       Numeric
CALQTR     Calendar quarter                                    Numeric
UPCODE     Update code                                         Numeric
SRCDOC     Source document code                                Numeric
SPBOND     S&P Senior Debt Rating                              Numeric
SPDEBT     S&P Subordinated Debt Rating                        Numeric
SPPAPER    S&P Commercial Paper Rating                         Numeric
SPRANK     Common Stock Ranking                                Numeric
MAJIND     Major Index Code                                    Numeric
INDIND     S&P Industry Index Code                             Numeric
REPDT      Report date of quarterly earnings                   Numeric
FLOWCD     Flow of funds statement format code                 Numeric
CANDXC     Canadian index code                                 Numeric
PEFTNT1    Period Descriptor Footnote 1 Comparability Status   Character
PEFTNT2    Period Descriptor Footnote 2 Company Status Alert   Character
PEFTNT3    Period Descriptor Footnote 3 S&P Senior Debt Rating Character
PEFTNT4    Period Descriptor Footnote 4 Reason For Deletion    Character
PEFTNT5    Period Descriptor Footnote 5 Unused                 Character
PEFTNT6    Period Descriptor Footnote 6 Unused                 Character
PEFTNT7    Period Descriptor Footnote 7 Unused                 Character
PEFTNT8    Period Descriptor Footnote 8 Unused                 Character
IAITEMS Data Set—Annual Data Items Table 33.28
Fields GVKEY CRSPDT RCALDT FISCALDT IA1 IA2 IA3 IA4 IA5 IA6 IA7 IA8 IA9 IA10 IA11 IA12 IA13 IA14 IA15 IA16 IA17 IA18 IA19 IA20 IA21 IA22
IAITEMS Data Set—Annual Data Items
Label GVKEY CRSP Date Raw Calendar Trading Date Fiscal Trading Date Cash and Short Term Investments Receivables—Total Inventories—Total Current Assets—Total Current Liabilities Total Assets Total Liabilities and Stockholders’ Equity Total Property, Plant, and Equipment Total (Gross) Property, Plant, and Equipment Total (Net) Long-Term Debt—Total Preferred Stock Liquidating Value Common Equity Tangible Sales (Net) Operating Income Before Depreciation Depreciation and Amortization (Income Statement) Interest Expense Income Taxes—Total Special Items Income Before Extraordinary Items Dividends—Preferred Income Before Extraordinary Items Adj for Com Stk Equiv Dollar Savings Dividends—Common Price—High
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IA23 IA24 IA25 IA26 IA27 IA28 IA29 IA30 IA31 IA32 IA33 IA34 IA35 IA36 IA37 IA38 IA39 IA40 IA41 IA42 IA43 IA44 IA45 IA46 IA47 IA48 IA49 IA50 IA51 IA52 IA53 IA54 IA55 IA56 IA57 IA58 IA59 IA60 IA61 IA62 IA63 IA64 IA65 IA66 IA67
Label Price—Low Price—Close Common Shares Outstanding Dividends Per Share by Ex-Date Adjustment Factor (Cumulative) by Ex-Date Common Shares Traded Employees Property, Plant, and Equipment Capital Expenditures (Schedule V) Investments and Advances Equity Method Investments and Advances—Other Intangibles Debt in Current Liabilities Deferred Taxes and Investment Tax Credit (Balance Sheet) Retained Earnings Invested Capital Total Minority Interest Balance Sheet Convertible Debt and Preferred Stock Common Shares Reserved for Conversion—Total Cost of Goods Sold Labor and Related Expense Pension and Retirement Expense Debt—Due in One Year Advertising Expense Research and Development Expense Rental Expense Extraordinary Items and Discontinued Operations Minority Interest (Income Account) Deferred Taxes (Income Account) Investment Tax Credit (Income Account) Net Operating Loss Carry Forward Unused Portion Earnings Per Share (Basic)—Including Extraordinary Items Common Shares Used to Calculate Earnings Per Share (Basic) Equity in Earnings Preferred Stock Redemption Value Earnings Per Share (Diluted)—Excluding Extraordinary Items Earnings Per Share (Basic)—Excluding Extraordinary Items Inventory Valuation Method Common Equity—Total Non-operating Income (Expense) Interest Income Income Taxes—Federal Current Income Taxes—Foreign Current Amortization of Intangibles Discontinued Operations Receivables Estimated Doubtful
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IA68 IA69 IA70 IA71 IA72 IA73 IA74 IA75 IA76 IA77 IA78 IA79 IA80 IA81 IA82 IA83 IA84 IA85 IA86 IA87 IA88 IA89 IA90 IA91 IA92 IA93 IA94 IA95 IA96 IA97 IA98 IA99 IA100 IA101 IA102 IA103 IA104 IA105 IA106 IA107 IA108 IA109 IA110 IA111 IA112
Label Current Assets—Other Assets—Other Accounts Payable Income Taxes Payable Current Liabilities Other Property, Plant, and Equipment—Construction in Progress (Net) Deferred Taxes (Balance Sheet) Liabilities—Other Inventories—Raw Materials Inventories—Work in Progress Inventories Finished Goods Debt—Convertible Debt—Subordinated Debt—Notes Debt—Debentures Long—Term Debt Other Debt—Capitalized Lease Obligations Common Stock Treasury Stock Memo Entry Treasury Stock Number of Common Shares Treasury Stock—Total Dollar Amount Pension Costs—Unfunded Vested Benefits Pension Costs—Unfunded Past or Prior Service Debt—Maturing In Second Year Debt—Maturing In Third Year Debt—Maturing In Fourth Year Debt—Maturing In Fifth Year Rental Commitments Minimum—Five Years Total Rental Commitments Minimum—First Year Retained Earnings Unrestricted Order Backlog Retained Earnings Restatement Common Shareholders Interest Expense on Long-Term Debt Excise Taxes Depreciation Expense (Schedule VI) Short-Term Borrowing Average Short-Term Borrowings Average Interest Rate Equity In Net Loss (Earnings) (Statement of Cash Flows) Sale of Property, Plant, and Equipment (Statement of Cash Flows) Sale of Common and Preferred Stock (Statement of Cash Flows) Sale of Investments (Statement of Cash Flows) Funds from Operations Total (Statement Changes) Long-Term Debt Issuance (Statement of Cash Flows) Sources of Funds Total (Statement of Changes)
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IA113 IA114 IA115 IA116 IA117 IA118 IA119 IA120 IA121 IA122 IA123 IA124 IA125 IA126 IA127 IA128 IA129 IA130 IA131 IA132 IA133 IA134 IA135 IA136 IA137 IA138 IA139 IA140 IA141 IA142 IA143 IA144 IA145 IA146 IA147 IA148 IA149 IA150
Label Increase in Investment (Statement of Cash Flows) Long-Term Debt Reduction (Statement of Cash Flows) Purchase of Common and Preferred Stock (Statement of Cash Flows) Uses of Funds—Total (Statement of Changes) Sales (Restated) Income Before Extraordinary Items (Restated) Earnings Per Share (Basic)—Excluding Extraordinary Items (Restated) Assets—Total (Restated) Working Capital (Restated) Pretax Income (Restated) Income Before Extraordinary Items (Statement of Cash Flows) Extraordinary Items & Discontinued Operations (Statement of Cash Flows) Depreciation and Amortization (Statement of Cash Flows) Deferred Taxes (Statement of Cash Flows) Cash Dividends (Statement of Cash Flows) Capital Expenditures (Statement of Cash Flows) Acquisitions (Statement of Cash Flows) Preferred Stock Carrying Value Cost of Goods Sold (Restated) Selling, General, and Administrative Expense (Restated) Depreciation and Amortization Restated Interest Expenses (Restated) Income Taxes—Total (Restated) Extraordinary Items and Discontinued Operations (Restated) Earnings Per Share (Basic)—Including Extraordinary Items (Restated) Common Shares Used To Calculate Earnings Per Share (Basic) Restated Earnings Per Share (Diluted)—Excluding Extraordinary Items (Restated) Earnings Per Share (Diluted)—Including Extraordinary Items (Restated) Property, Plant, and Equipment—Total Net (Restated) Long-Term Debt—Total (Restated) Retained Earnings (Restated) Stockholders’ Equity (Restated) Capital Expenditures (Restated) Employees (Restated) Interest Capitalized Long-Term Debt Tied to Prime Auditor/Auditor’s Opinion Foreign Currency Adjustment Income Account
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IA151 IA152 IA153 IA154 IA155 IA156 IA157 IA158 IA159 IA160 IA161 IA162 IA163 IA164 IA165 IA166 IA167 IA168 IA169 IA170 IA171 IA172 IA173 IA174 IA175 IA176 IA177 IA178 IA179 IA180 IA181 IA182 IA183 IA184 IA185 IA186 IA187 IA188 IA189 IA190 IA191 IA192 IA193 IA194 IA195
Label Receivables—Trade Deferred Charges Accrued Expense Debt—Subordinated Convertible Property, Plant, and Equipment—Buildings (Net) Property, Plant, and Equipment—Machinery and Equipment (Net) Property, Plant, and Equipment Natural Resources (Net) Property, Plant, and Equipment—Land and Improvements (Net) Property, Plant, and Equipment—Leases (Net) Prepaid Expense Income Tax Refund Cash Rental Income Rental Commitments Minimum—Second Year Rental Commitments Minimum—Third Year Rental Commitments Minimum—Fourth Year Rental Commitments Minimum—Fifth Year Compensating Balance Earnings Per Share (Diluted)—Including Extraordinary Items Pretax Income Common Shares Used to Calculate Earnings Per Share (Diluted) Net Income (Loss) Income Taxes—State Current Depletion Expense (Schedule VI) Preferred Stock Redeemable Blank Net Income (Loss) (Restated) Operating Income After Depreciation Working Capital (Balance Sheet) Working Capital Change Total (Statement of Changes) Liabilities—Total Property, Plant, and Equipment—Beginning Balance (Schedule V) Accounting Changes Cumulative Effect Property, Plant, and Equipment Retirements (Schedule V) Property, Plant, and Equipment—Other Changes (Schedule V) Inventories—Other Property, Plant, and Equipment—Ending Balance (Schedule V) Debt—Senior Convertible Selling, General, and Administrative Expense Non-operating Income (Expense) Common Stock Equivalents—Dollar Savings Extraordinary Items Short-Term Investments Receivables—Current Other Current Assets—Other Excluding Prepaid Expenses
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IA196 IA197 IA198 IA199 IA200 IA201 IA202 IA203 IA204 IA205 IA206 IA207 IA208 IA209 IA210 IA211 IA212 IA213 IA214 IA215 IA216 IA217 IA218 IA219 IA220 IA221 IA222 IA223 IA224 IA225 IA226 IA227 IA228 IA229 IA230 IA231 IA232 IA233 IA234 IA235 IA236 IA237
Label Depreciation, Depletion and Amortization (Accumulated) (Balance Sheet) Price—Fiscal Year High Price—Fiscal Year Low Price—Fiscal Year Close Common Shares Reserved for Conversion Convertible Debt Dividends Per Share by Payable Date Adjustment Factor (Cumulative) by Payable Date Common Shares Reserved for Conversion Preferred Stock Goodwill Assets—Other Excluding Deferred Charges Notes Payable Current Liabilities Other—Excluding Accrued Expenses Investment Tax Credit (Balance Sheet) Preferred Stock Nonredeemable Capital Surplus Income Taxes—Other Blank Sale of Prop, Plnt, & Equip & Sale of Invs Loss(Gain)(Stmt of Csh Flo) Preferred Stock Convertible Common Shares Reserved for Conversion Stock Options Stockholders’ Equity Total Funds from Operations Other (Statement of Cash Flow) Sources of Funds Other (Statement of Changes) Uses of Funds—Other (Statement of Changes) Depreciation (Accumulated) Beginning Balance (Schedule VI) Depreciation (Accumulated) Retirements (Schedule VI) Depreciation (Accumulated)—Other Changes (Schedule VI) Depreciation (Accumulated)—Ending Balance (Schedule VI) Non-operating Income (Expense) (Restated) Minority Interest (Restated) Treasury Stock (Dollar Amount)—Common Treasury Stock (Dollar Amount)—Preferred Currency Translation Rate Common Shares Reserved for Conversion Warrants and Other Retained Earnings Cumulative Translation Adjustment Retained Earnings Other Adjustments Common Stock—Per Share Carrying Value Earnings per Share from Operations ADR Ratio Common Equity Liquidation Value Working Capital Change Other—Increase (Decrease) (Stmnt of Changes) Income Before Extraordinary Items Available for Common
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IA238 IA239 IA240 IA241 IA242 IA243 IA244 IA245 IA246 IA247 IA248 IA249 IA250 IA251 IA252 IA253 IA254 IA255 IA256 IA257 IA258 IA259 IA260 IA261 IA262 IA263 IA264 IA265 IA266 IA267 IA268 IA269 IA270 IA271 IA272 IA273 IA274 IA275 IA276 IA277 IA278 IA279 IA280 IA281
Label Marketable Securities Adjustment (Balance Sheet) Interest Capitalized Net Income Effect Inventories—LIFO Reserve Debt—Mortgages and Other Secured Dividends—Preferred In Arrears Pension Benefits Present Value of Vested Pension Benefits Present Value of Nonvested Pension Benefits Net Assets Pension Discount Rate (Assumed Rate of Return) Pension Benefits Information Date Acquisition—Income Contribution Acquisitions—Sales Contribution Property, Plant, and Equipment—Other (Net) Depreciation (Accumulated)—Land and Improvements Depreciation (Accumulated) Natural Resources Depreciation (Accumulated) Buildings Depreciation (Accumulated) Machinery and Equipment Depreciation (Accumulated)—Leases Depreciation (Accumulated) Construction in Progress Depreciation (Accumulated)—Other Net Income Adjusted for Common Stock Equivalents Retained Earnings Unadjusted Property, Plant, and Equipment—Land and Improvements at Cost Property, Plant, and Equipment—Natural Resources at Cost Blank Property, Plant, and Equipment—Buildings at Cost Property, Plant, and Equipment—Machinery and Equipment at Cost Property, Plant, and Equipment—Leases at Cost Property, Plant, and Equipment Construction in Progress at Cost Property, Plant, and Equipment—Other at Cost Debt Unamortized Debt Discount and Other Deferred Taxes Federal Deferred Taxes Foreign Deferred Taxes—State Pretax Income Domestic Pretax Income Foreign Cash & Cash Equivalent Increase (Decrease) (Statement of Cash Flows) Blank S&P Major Index Code Historical S&P Industry Index Code—Historical Fortune Industry Code Historical Fortune Rank S&P Long-Term Domestic Issuer Credit Rating Historical Blank
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IA282 IA283 IA284 IA285 IA286 IA287 IA288 IA289 IA290 IA291 IA292 IA293 IA294 IA295 IA296 IA297 IA298 IA299 IA300 IA301 IA302 IA303 IA304 IA305 IA306 IA307 IA308 IA309 IA310 IA311 IA312 IA313 IA314 IA315 IA316 IA317 IA318 IA319 IA320 IA321 IA322 IA323
Label S&P Common Stock Ranking S&P Short-Term Domestic Issuer Credit Rating—Historical Pension Vested Benefit Obligation Pension—Accumulated Benefit Obligation Pension—Projected Benefit Obligation Pension Plan Assets Pension—Unrecognized Prior Service Cost Pension—Other Adjustments Pension—Prepaid Accrued Cost Pension Vested Benefit Obligation Underfunded Periodic Postretirement Benefit Cost Net Pension—Accumulated Benefit Obligation (Underfunded) Pension—Projected Benefit Obligation (Underfunded) Periodic Pension Cost (Net) Pension Plan Assets (Underfunded) Pension—Unrecognized Prior Service Cost (Underfunded) Pension—Additional Minimum Liability Pension—Other Adjustments (Underfunded) Pension—Prepaid Accrued Cost (Underfunded) Changes in Current Debt (Statement of Cash Flows) Accounts Receivable Decrease (Increase) (Statement of Cash Flows) Inventory—Decrease (Increase) (Statement of Cash Flows) Accounts Payable & Accrued Liabilities Inc (Decrease) (St Cash Flows) Income Taxes—Accrued Increase (Decrease) Statement of Cash Flow Blank Assets and Liabilities Other (Net Change) Statement of Cash Flow Operating Activities Net Cash Flow Statement of Cash Flow Short-Term Investments Change (Statement of Cash Flows) Investing Activities Other (Statement of Cash Flows) Investing Activities Net Cash Flow (Statement of Cash Flows) Financing Activities Other (Statement of Cash Flows) Financing Activities Net Cash Flow (Statement of Cash Flows) Exchange Rate Effect (Statement of Cash Flows) Interest Paid—Net (Statement of Cash Flows) Blank Income Taxes Paid (Statement of Cash Flows) Format Code (Statement of Cash Flows) Dilution Adjustment S&P Subordinated Debt Rating Interest Income Total (Financial Services) Dilution Available Excluding Numeric Earnings per share from Operations (Diluted)
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IA324 IA325 IA326 IA327 IA328 IA329 IA330 IA331 IA332 IA333 IA334 IA335 IA336 IA337 IA338 IA339 IA340 IA341 IA342 IA343 IA344 IA345 IA346 IA347 IA348 IA349 IA350 IA351 IA352 IA353 IA354 IA355 IA356 IA357 IA358 IA359 IA360 IA361 IA362 IA363 IA364 IA365 IA366 IA367 IA368
Label Historical SIC Code Blank Blank Contingent Liabilities Guarantees Debt—Finance Subsidiary Debt—Consolidated Subsidiary Postretirement Benefit Asset (Liability) (Net) Pension Plans Service Cost Pension Plans Interest Cost Pension Plans Return on Plan Assets (Actual) Pension Plans—Other Periodic Cost Components (Net) Pension Plans—Rate of Compensation Increase Pension Plans Anticipated Long-Term Rate of Return on Plan Assets Risk-Adjusted Capital Ratio—Tier 1 Blank Interest Expense Total (Financial Services) Net Interest Income (Tax Equivalent) Non-performing Assets Total Provision for Loan/Asset Losses Reserve for Loan/Asset Losses Net Interest Margin Blank Blank Blank Risk-Adjusted Capital Ratio—Total Net Charge-Offs Blank Current Assets Discontinued Operations Other Intangibles Long-Term Assets of Discontinued Operations Other Current Assets Excluding Discontinued Other Assets Excluding Discontinued Operations Deferred Revenue Current Accumulated Other Comprehensive Income Deferred Compensation Other Stockholders’ Equity Adjustments Acquisition/Merger Pretax Acquisition/Merger After-Tax Acquisition/Merger Basic EPS Effect Acquisition/Merger Diluted EPS Effect Gain/Loss Pretax Gain/Loss After-Tax Gain/Loss Basic EPS Effect Gain/Loss Diluted EPS Effect Impairments of Goodwill Pretax
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IA369 IA370 IA371 IA372 IA373 IA374 IA375 IA376 IA377 IA378 IA379 IA380 IA381 IA382 IA383 IA384 IA385 IA386 IA387 IA388 IA389 IA390 IA391 IA392 IA393 IA394 IA395 IA396 IA397 IA398 IA399 IA400
Label Impairments of Goodwill After-Tax Impairments of Goodwill Basic EPS Effect Impairments of Goodwill Diluted EPS Effect Settlement (Litigation /Insurance) Pretax Settlement (Litigation /Insurance) Aftertax Settlement (Litigation /Insurance) Basic EPS Settlement (Litigation /Insurance) Diluted EPS Restructuring Costs Pretax Restructuring Costs Aftertax Restructuring Costs Basic EPS Effect Restructuring Costs Diluted EPS Effect Writedowns Pretax Writedowns After-Tax Writedowns Basic EPS Effect Writedowns Diluted EPS Effect Other Special Items Pretax Other Special Items Aftertax Other Special Items Basic EPS Effect Other Special Items Diluted EPS Effect In Process Research & Development Thereafter Rent Commitments Accumulated Depreciation of Real Estate Property Total Real Estate Property Gain/Loss on Sale of Property Depreciation and Amortization of Property Goodwill Amortization See Footnote for 394 Common Shares Issued Deferred Revenue Long-Term Stock Compensation Expense Implied Option Expense Blank
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
IQITEMS Data Set—Quarterly Data Items Table 33.29
Fields GVKEY CRSPDT RCALDT FISCALDT IQ1 IQ2 IQ3
IQITEMS Data Set—Quarterly Data Items
Label GVKEY CRSP Date Raw Calendar Trading Date Fiscal Trading Date Selling, General, and Administrative Expense Sales (Net) Minority Interest (Income Account)
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IQ4 IQ5 IQ6 IQ7 IQ8 IQ9 IQ10 IQ11 IQ12 IQ13 IQ14 IQ15 IQ16 IQ17 IQ18 IQ19 IQ20 IQ21 IQ22 IQ23 IQ24 IQ25 IQ26 IQ27 IQ28 IQ29 IQ30 IQ31 IQ32 IQ33 IQ34 IQ35 IQ36 IQ37 IQ38 IQ39 IQ40 IQ41 IQ42 IQ43 IQ44
Label Research and Development Expense Depreciation and Amortization (Income Statement) Income Taxes—Total Earnings Per Share (Diluted)—Including Extraordinary Items Income Before Extraordinary Items Earnings Per Share (Diluted)—Excluding Extraordinary Items Income Before Extraordinary Items—Adj for Com Stk Equiv Dollar Savings Earnings Per Share (Basic)—Including Extraordinary Items Price—Close 1st Month of Quarter Price—Close 2nd Month of Quarter Price—Close 3rd Month of Quarter Common Shares Used to Calculate Earnings per Share (Basic) Dividends per Share by Ex-Date Adjustment Factor Cumulative by Ex-Date Common Shares Traded Earnings Per Share (Basic)—Excluding Extraordinary Items Dividends—Common Indicated Annual Operating Income Before Depreciation Interest Expense Pretax Income Dividends—Preferred Income Before Extraordinary Items Available for Common Extraordinary Items and Discontinued Operations Earnings Per Share (Basic) Excluding Extraordinary Items 12 Mo Moving Common Shares Used to Calculate Earnings Per Share—12 Month Moving Interest Income Total (Financial Services) Cost of Goods Sold Non-operating Income (Expense) Special Items Discontinued Operations Foreign Currency Adjustment (Income Account) Deferred Taxes (Income Account) Cash and Short Term Investments Receivables—Total Inventories—Total Current Assets—Other Current Assets—Total Depreciation, Depletion and Amortization (Accumulated) (Balance Sheet) Property, Plant, and Equipment—Total (Net) Assets—Other Assets—Total/Liabilities and Stockholders Equity-Total
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IQ45 IQ46 IQ47 IQ48 IQ49 IQ50 IQ51 IQ52 IQ53 IQ54 IQ55 IQ56 IQ57 IQ58 IQ59 IQ60 IQ61 IQ62 IQ63 IQ64 IQ65 IQ66 IQ67 IQ68 IQ69 IQ70 IQ71 IQ72 IQ73 IQ74 IQ75 IQ76 IQ77 IQ78 IQ79 IQ80 IQ81 IQ82 IQ83 IQ84 IQ85 IQ86
Label Debt in Current Liabilities Accounts Payable Income Taxes Payable Current Liabilities Other Current Liabilities Total Liabilities—Other Long-Term Debt—Total Deferred Taxes and Investment Tax Credit (Balance Sheet) Minority Interest (Balance Sheet) Liabilities—Total Preferred Stock Carrying Value Common Stock Capital Surplus Retained Earnings Common Equity—Total Stockholders’ Equity Total Common Shares Outstanding Invested Capital Total Price—High 1st Month of Quarter Price—High 2nd Month of Quarter Price—High 3rd Month of Quarter Price—Low 1st Month of Quarter Price—Low 2nd Month of Quarter Price—Low 3rd Month of Quarter Net Income (Loss) Interest Expense Total (Financial Services) Preferred Stock Redeemable Dividends per Share by Payable Date Working Capital Change Other—Increase (Decrease) (Stmnt of Changes) Cash & Cash Equivalents Increase (Decrease) (Statement of Cash Flows) Changes in Current Debt (Statement of Cash Flows) Income Before Extraordinary Items (Statement of Cash Flows) Depreciation and Amortization (Statement of Cash Flows) Extraordinary Items & Discontinued Operations (Statement of Cash Flows) Deferred Taxes (Statement of Cash Flows) Equity in Net Loss (Earnings) (Statement of Cash Flows) Funds from Operations Other (Statement of Cash Flows) Funds from Operations Total (Statement of Changes) Sale of Property, Plant and Equipment (Statement of Cash Flows) Sale of Common and Preferred Stock (Statement of Cash Flows) Sale of Investments (Statement of Cash Flows) Long-Term Debt Issuance (Statement of Cash Flows)
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IQ87 IQ88 IQ89 IQ90 IQ91 IQ92 IQ93 IQ94 IQ95 IQ96 IQ97 IQ98 IQ99 IQ100 IQ101 IQ102 IQ103 IQ104 IQ105 IQ106 IQ107 IQ108 IQ109 IQ110 IQ111 IQ112 IQ113 IQ114 IQ115 IQ116 IQ117 IQ118 IQ119 IQ120 IQ121 IQ122 IQ123 IQ124 IQ125
Label Sources of Funds Other (Statement of Changes) Sources of Funds Total (Statement of Changes) Cash Dividends (Statement of Cash Flows) Capital Expenditures (Statement of Cash Flows) Increase in Investment (Statement of Cash Flows) Long-Term Debt Reduction (Statement of Cash Flows) Purchase of Common and Preferred Stock (Statement of Cash Flows) Acquisitions (Statement of Cash Flows) Uses of Funds—Other (Statement of Changes) Uses of Funds—Total (Statement of Changes) Net Interest Income (Tax Equivalent) Treasury Stock Total Dollar Amount Non-Performing Assets Total Adjustment Factor (Cumulative) by Payable Date Working Capital Change Total (Statement of Changes) Sale of Prop, Plnt, & Equip & Sale of Invs Loss(Gain) (Stmt of Csh Flo) Accounts Receivable Decrease (Increase) (Statement of Cash Flows) Inventory—Decrease (Increase) (Statement of Cash Flows) Accounts Payable & Accrued Liabilities Inc (Decrease) (St Cash Flows) Income Taxes—Accrued Increase (Decrease) (Statement of Cash Flows) Assets and Liabilities Other (Net Change) (Statement of Cash Flows) Operating Activities Net Cash Flow (Statement of Cash Flows) Short-Term Investments Change (Statement of Cash Flows) Investing Activities Other (Statement of Cash Flows) Investing Activities Net Cash Flow (Statement of Cash Flows) Financing Activities Other (Statement of Cash Flows) Financing Activities Net Cash Flow (Statement of Cash Flows) Exchange Rate Effect (Statement of Cash Flows) Interest Paid—Net (Statement of Cash Flows) Income Taxes Paid (Statement of Cash Flows) Accounting Changes Cumulative Effect Property, Plant and Equipment—Total (Gross) Extraordinary Items Common Stock Equivalents Dollar Savings Currency Translation Rate Accounts Payable Expanded Blank Common Shares for Diluted EPS Dilution Adjustment
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IQ126 IQ127..IQ170 IQ171 IQ172 IQ173 IQ174 IQ175 IQ176 IQ177 IQ178 IQ179 IQ180 IQ181 IQ182..IQ232 IQ233 IQ234 IQ235 IQ236 IQ237 IQ238 IQ239 IQ240 IQ241 IQ242 IQ243 IQ244 IQ245 IQ246 IQ247 IQ248 IQ249 IQ250 IQ251 IQ252 IQ253 IQ254 IQ255 IQ256 IQ257 IQ258 IQ259 IQ260 IQ261 IQ262
Label Dilution Available Excluding Blank Provision for Loan/Asset Losses Reserve for Loan/Asset Losses Net Interest Margin Risk-Adjusted Capital Ratio—Tier 1 Risk-Adjusted Capital Ratio—Total Net Charge-Offs Earnings per Share from Operations Earnings per Share from Operations Earnings per Share (Diluted)—Excluding Extraordinary Items 12 Mo Mov Earnings per Share from Operations (Diluted) 12 Months Moving Earnings per Share from Operations (Diluted) Blank Total Long-Term Investments Goodwill Other Intangibles Other Long-Term Assets Unadjusted Retained Earnings Accumulated Other Comprehensive Income Deferred Compensation Other Stockholders’ Equity Adjustments Acquisition/Merger Pretax Acquisition/Merger After-Tax Acquisition/Merger Basic EPS Effect Acquisition/Merger Diluted EPS Effect Gain/Loss Pretax Gain/Loss After-Tax Gain/Loss Diluted EPS Effect Gain/Loss Basic EPS Effect Impairments of Goodwill Pretax Impairments of Goodwill After-Tax Impairments of Goodwill Basic EPS Effect Impairments of Goodwill Diluted EPS Effect Settlement (Litigation/Insurance) Pretax Settlement (Litigation/Insurance) Aftertax Settlement (Litigation/Insurance) Basic EPS Settlement (Litigation/Insurance) Diluted EPS Restructuring Costs Pretax Restructuring Costs Aftertax Restructuring Costs Basic EPS Effect Restructuring Costs Diluted EPS Effect Writedowns Pretax Writedowns After-Tax
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Fields IQ263 IQ264 IQ265 IQ266 IQ267 IQ268 IQ269 IQ270 IQ271 IQ272 IQ273 IQ274 IQ275 IQ276 IQ277 IQ278 IQ279 IQ280
Label Writedowns Basic EPS Effect Writedowns Diluted EPS Effect Other Special Items Pretax Other Special Items Aftertax Other Special Items Basic EPS Effect Other Special Items Diluted EPS Effect Accumulated Depreciation of Real Estate Property Total Real Estate Property Gain/Loss on Sale of Property Depreciation and Amortization of Property ADR Ratio In Process Research & Development Goodwill Amortization See Footnote for 275 Common Shares Issued Stock Compensation Expense Blank Blank
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
BAITEMS Data Set—Bank Annual Data Items

Table 33.30 BAITEMS Data Set—Bank Annual Data Items

| Fields | Label | Type |
|---|---|---|
| GVKEY | GVKEY | Numeric |
| CRSPDT | CRSP Date | Numeric |
| RCALDT | Raw Calendar Trading Date | Numeric |
| FISCALDT | Fiscal Trading Date | Numeric |
| BA1 | Cash and Due from Bank | Numeric |
| BA2 | U.S. Treasury Securities | Numeric |
| BA3 | Securities of Other U.S. Government Agencies and Corporations | Numeric |
| BA4 | Due from Banks (Memorandum Entry) | Numeric |
| BA5 | Other Securities (Taxable) | Numeric |
| BA6 | Total Taxable Investment Securities | Numeric |
| BA7 | Obligations of States and Political Subdivisions | Numeric |
| BA8 | Total Investment Securities | Numeric |
| BA9 | Geographic Designation Code | Numeric |
| BA10 | Trading Account Securities | Numeric |
| BA11 | Federal Funds Sold and Securities Purchased under Agreements to Resell | Numeric |
| BA12 | Treasury Stock—Dollar Amount—Common | Numeric |
| BA13 | Foreign Loans | Numeric |
| BA14 | Real Estate Loans Total | Numeric |
| BA15 | Real Estate Loans Insured or Guaranteed by U.S. Government | Numeric |
| BA16 | Treasury Stock Dollar Amount Preferred | Numeric |
| BA17 | Loans to Financial Institutions | Numeric |
| BA18 | Loans for Purchasing or Carrying Securities | Numeric |
| BA19 | Interest Income Total (Financial Services) | Numeric |
| BA20 | Commercial or Industrial Loans | Numeric |
| BA21 | Loans to Individuals for Household, Family, Other Consumer Expenditures | Numeric |
| BA22 | Other Loans | Numeric |
| BA23 | Loans (Gross) | Numeric |
| BA24 | Unearned Discount/Income | Numeric |
| BA25 | Interest on Due from Banks (Restated) | Numeric |
| BA26 | Interest Income on Fed Funds Sold & Secs Purchased under Agmnt to Resell | Numeric |
| BA27 | Other Interest Income (Restated) | Numeric |
| BA28 | Bank Premises, Furniture, and Fixture | Numeric |
| BA29 | Real Estate Other than Bank Premises | Numeric |
| BA30 | Investments in Nonconsolidated Subsidiaries | Numeric |
| BA31 | Direct Lease Financing | Numeric |
| BA32 | Customers’ Liability to this Bank on Acceptances Outstanding | Numeric |
| BA33 | Other Assets | Numeric |
| BA34 | Intangible Assets | Numeric |
| BA35 | Aggregate Miscellaneous Assets | Numeric |
| BA36 | Total Assets (Gross) | Numeric |
| BA37 | Trading Account Income (Restated) | Numeric |
| BA38 | Other Current Operating Revenue (Restated) | Numeric |
| BA39 | Interest Expense on Fed Funds Purch’d & Secs Sold under Agmnts to Repur | Numeric |
| BA40 | Assets Held for Sale | Numeric |
| BA41 | Total Demand Deposits | Numeric |
| BA42 | Net Interest Margin | Numeric |
| BA43 | Consumer Type Time Deposit | Numeric |
| BA44 | Total Savings Deposits | Numeric |
| BA45 | Money Market Certificates of Deposit | Numeric |
| BA46 | All Other Time Deposit | Numeric |
| BA47 | Total Time Deposits (Other than Savings) | Numeric |
| BA48 | Risk-Adjusted Capital Ratio - Tier 1 | Numeric |
| BA49 | Interest on Long-Term Debt and Not Classified as Capital (Restated) | Numeric |
| BA50 | Interest on Other Borrowed Money (Restated) | Numeric |
| BA51 | Other Interest Expense (Restated) | Numeric |
| BA52 | Salaries and Related Expenses (Restated) | Numeric |
| BA53 | Total Deposits Worldwide | Numeric |
| BA54 | Total Domestic Deposits | Numeric |
| BA55 | Total Foreign Deposits | Numeric |
| BA56 | Demand Deposits of IPC | Numeric |
| BA57 | Time and Savings Deposits of IPC | Numeric |
| BA58 | Deposits of U.S. Government | Numeric |
| BA59 | Deposits of States and Political Subdivisions | Numeric |
| BA60 | Deposits of Foreign Governments | Numeric |
| BA61 | Deposits of Commercial Banks | Numeric |
| BA62 | Certified and Officers Checks | Numeric |
| BA63 | Other Deposits | Numeric |
| BA64 | Risk-Adjusted Capital Ratio—Total | Numeric |
| BA65 | Federal Funds Purchased & Securities Sold under Agreements to Repurchase | Numeric |
| BA66 | Commercial Paper | Numeric |
| BA67 | Long-Term Debt Not Classified as Capital | Numeric |
| BA68 | Other Liabilities for Borrowed Money | Numeric |
| BA69 | Total Borrowings | Numeric |
| BA70 | Valuation Portion of Reserve for Loan Losses | Numeric |
| BA71 | Mortgage Indebtedness | Numeric |
| BA72 | Acceptances Executed by or for the Account of this Bank and Outstanding | Numeric |
| BA73 | Other Liabilities (Excluding Valuation Reserves) | Numeric |
| BA74 | Deferred Portion of Reserve for Loan Losses | Numeric |
| BA75 | Contingency Portion of Reserve for Loan Losses | Numeric |
| BA76 | Total Liabilities (Excluding Valuation Reserves) | Numeric |
| BA77 | Minority Interest in Consolidated Subsidiaries | Numeric |
| BA78 | Reserve(s) for Bad Debt Losses on Loans | Numeric |
| BA79 | Depreciation and Amortization | Numeric |
| BA80 | Reserves on Securities | Numeric |
| BA81 | Total Reserves on Loan and Securities | Numeric |
| BA82 | Fixed Expense (Occupancy and Equipment - Net)(Restated) | Numeric |
| BA83 | Other Current Operating Expense(Restated) | Numeric |
| BA84 | Capital Notes and Debentures | Numeric |
| BA85 | Minority Interest (Income Account)(Restated) | Numeric |
| BA86 | Preferred Stock Par Value | Numeric |
| BA87 | Number of Shares of Preferred Stock Outstanding | Numeric |
| BA88 | Common Stock Par Value | Numeric |
| BA89 | Number of Shares Authorized | Numeric |
| BA90 | Number of Shares Outstanding | Numeric |
| BA91 | Number of Shares Reserved for Conversion | Numeric |
| BA92 | Treasury Stock—Cost | Numeric |
| BA93 | Number of Treasury Shares Held | Numeric |
| BA94 | Special Items | Numeric |
| BA95 | Surplus | Numeric |
| BA96 | Undivided Profits | Numeric |
| BA97 | Reserves for Contingencies and Other Capital Reserves | Numeric |
| BA98 | Total Extraordinary Items—Net of Taxes (Restated) | Numeric |
| BA99 | Total Book Value | Numeric |
| BA100 | Net Income Per Share Excluding Extraordinary Items (Restated) | Numeric |
| BA101 | Total Liabilities, Reserves and Capital Accounts | Numeric |
| BA102 | Total Capital Accounts and Minority Interest (Invested Capital) | Numeric |
| BA103 | Net Current Op Erngs Per Share—Excluding Extraordinary Items Fully | Numeric |
| BA104 | Foreign Exchange Gains and Losses | Numeric |
| BA105 | Interest and Fees on Loans | Numeric |
| BA106 | Interest Inc on Federal Funds Sold & Secs Purchased und Agmnts to Resell | Numeric |
| BA107 | Blank | Numeric |
| BA108 | Interest and Discount on U.S. Treasury Securities | Numeric |
| BA109 | Interest on Securities of U.S. Government Agencies and Corporations | Numeric |
| BA110 | Interest and Dividends on Other Taxable Securities | Numeric |
| BA111 | Total Taxable Investment Revenue | Numeric |
| BA112 | Interest on Obligation of States & Political Subdivisions | Numeric |
| BA113 | Other Interest Income | Numeric |
| BA114 | Trading Account Interest (Memorandum Entry) | Numeric |
| BA115 | Total Interest and Dividends on Investment | Numeric |
| BA116 | Aggregate Loan and Investment Revenue | Numeric |
| BA117 | Trust Department Income | Numeric |
| BA118 | Service Charges on Deposit Accounts | Numeric |
| BA119 | Other Svce Charges, Collection & Exchange Charges, Comms & Fees | Numeric |
| BA120 | Trading Account Income | Numeric |
| BA121 | Other Current Operating Revenue | Numeric |
| BA122 | Interest on Due from Banks | Numeric |
| BA123 | Aggregate Other Current Operating Revenue | Numeric |
| BA124 | Total Current Operating Revenue | Numeric |
| BA125 | Number of Employees | Numeric |
| BA126 | Salaries and Wages of Officers and Employees | Numeric |
| BA127 | Pension and Employee Benefits | Numeric |
| BA128 | Average Fed Funds Purch’d & Securities Sold under Agmnts to Repurchase | Numeric |
| BA129 | Interest on Deposits | Numeric |
| BA130 | Interest Exp on Fed Fnds Purch’d & Securities Sold under Agmnts to Repur | Numeric |
| BA131 | Interest on Borrowed Money | Numeric |
| BA132 | Interest on Long-Term Debt—Not Classified as Capital | Numeric |
| BA133 | Total Interest on Deposits and Borrowing | Numeric |
| BA134 | Interest on Capital Notes and Debentures | Numeric |
| BA135 | Provision for Loan Losses | Numeric |
| BA136 | Occupancy Expense of Bank Premises—Net | Numeric |
| BA137 | Total Interest Expense | Numeric |
| BA138 | Rental Income | Numeric |
| BA139 | Furniture and Equipment Depreciation, Rental Cost, Servicing, Etc. | Numeric |
| BA140 | Number of Employees(Restated) | Numeric |
| BA141 | Number of Domestic Officers (Restated) | Numeric |
| BA142 | Other Current Operating Expense | Numeric |
| BA143 | Aggregate Other Current Operating Expense | Numeric |
| BA144 | Total Current Operating Expense | Numeric |
| BA145 | Current Operating Earnings before Income Tax | Numeric |
| BA146 | Income Taxes Applicable to Current Operating Earnings | Numeric |
| BA147 | Net Current Operating Earnings | Numeric |
| BA148 | Minority Interest (Income Account) | Numeric |
| BA149 | Net Current Operating Earnings after Minority Interest | Numeric |
| BA150 | Average Cash and Due from Banks (Restated) | Numeric |
| BA151 | Average Loans Domestic (Restated) | Numeric |
| BA152 | Average Loans Foreign (Restated) | Numeric |
| BA153 | Net Pre-Tax Profit or Loss on Securities Sold or Redeemed | Numeric |
| BA154 | Average Fed Fnds Sold & Secs Purchased under Agmnts to Resell (Rest) | Numeric |
| BA155 | Average Trading Account Securities(Restated) | Numeric |
| BA156 | Average Deposits(Restated) | Numeric |
| BA157 | Tax Effect on Profit or Loss on Securities Sold or Redeemed | Numeric |
| BA158 | Net Aft-Tax Profit/Loss on Secs Sld or Redmd Prior to Eff of Min Int | Numeric |
| BA159 | Minority Interest in Aft-Tax Profit/Loss on Securities Sold or Redeemed | Numeric |
| BA160 | Net After-Tax & Aft-Min Int Profit/Loss on Secs Sld or Redeemed | Numeric |
| BA161 | Net Income | Numeric |
| BA162 | Preferred Dividend Deductions | Numeric |
| BA163 | Savings Due to Common Stock Equivalents | Numeric |
| BA164 | Net Income Available for Common | Numeric |
| BA165 | Net Current Operating Earnings Available for Common | Numeric |
| BA166 | Interest and Fees on Loans (Restated) | Numeric |
| BA167 | Taxable Investment Income (Restated) | Numeric |
| BA168 | Non-Taxable Investment Income (Restated) | Numeric |
| BA169 | Total Interest Income (Restated) | Numeric |
| BA170 | Trust Department Income (Restated) | Numeric |
| BA171 | Total Current Operating Revenue (Restated) | Numeric |
| BA172 | Interest on Deposits (Restated) | Numeric |
| BA173 | Total Interest on Deposits and Borrowing (Restated) | Numeric |
| BA174 | Interest on Capital Notes and Debentures (Restated) | Numeric |
| BA175 | Total Interest Expense (Restated) | Numeric |
| BA176 | Provision for Loan Losses (Restated) | Numeric |
| BA177 | Cash Dividends Declared on Common Stock | Numeric |
| BA178 | Cash Dividends Declared on Preferred Stock | Numeric |
| BA179 | Total Current Operating Expense (Restated) | Numeric |
| BA180 | Current Operating Earnings before Income Tax (Restated) | Numeric |
| BA181 | Income Taxes Applicable to Current Operating Earnings (Restated) | Numeric |
| BA182 | Net After-Tax & Aft-Min Int Profit/Loss on Secs Sld/Redeemed (Rest) | Numeric |
| BA183 | Net Income (Restated) | Numeric |
| BA184 | Net Current Operating Earnings Per Share (Restated) | Numeric |
| BA185 | Total Extraordinary Items Net of Taxes | Numeric |
| BA186 | Common Shares Used in Calculating Earnings Per Shares (Restated) | Numeric |
| BA187 | Additions to Reserves for Bad Debts Due to Mergers and Absorptions | Numeric |
| BA188 | Additions to Reserves for Bad Debts Due to Recoveries Credt’d to Rsrvs | Numeric |
| BA189 | Deductions from Reserves for Bad Debts Due to Losses Charged to Reserves | Numeric |
| BA190 | Net Credit/Charge to Reserves for Bad Debts from Loan Recs or Chg-offs | Numeric |
| BA191 | Transfers to Reserves for Bad Debts from Inc and/or to/from Undiv Prfts | Numeric |
| BA192 | Average Preferred Stoc Par Value (Restated) | Numeric |
| BA193 | Net Current Operating Earnings Per Share Excluding Extraordinary Items | Numeric |
| BA194 | Net Income per Share Excluding Extraordinary Items | Numeric |
| BA195 | Net Income per Share Including Extraordinary Items | Numeric |
| BA196 | Common Shares Used in Calculating Earnings per Share | Numeric |
| BA197 | Net Cur Op Earnings per Shares - Exc Extraordinary Items & Fully Diluted | Numeric |
| BA198 | Net Income per Share Excluding Extraordinary Items—Fully Diluted | Numeric |
| BA199 | Net Income per Share Including Extraordinary Items—Fully Diluted | Numeric |
| BA200 | Common Shares Used in Calculating Fully Diluted Earnings per Share | Numeric |
| BA201 | Common Dividends Paid per Share by Ex-Date | Numeric |
| BA202 | Annualized Dividend Rate | Numeric |
| BA203 | Market Price—High | Numeric |
| BA204 | Market Price—Low | Numeric |
| BA205 | Market Price—Close | Numeric |
| BA206 | Common Shares Traded | Numeric |
| BA207 | Average Reserve for Bad Debt Losses on Loans(Restated) | Numeric |
| BA208 | Number of Domestic Offices | Numeric |
| BA209 | Number of Foreign Offices | Numeric |
| BA210 | Average Loans(Restated) | Numeric |
| BA211 | Average Assets (Gross) | Numeric |
| BA212 | Average Loans (Gross) | Numeric |
| BA213 | Average Cash and Due from Banks | Numeric |
| BA214 | Average Taxable Investments | Numeric |
| BA215 | Average Non-Taxable Investments | Numeric |
| BA216 | Average Deposits | Numeric |
| BA217 | Average Deposits Time and Savings | Numeric |
| BA218 | Average Deposits Demand | Numeric |
| BA219 | Average Borrowings | Numeric |
| BA220 | Average Fed Funds Sold & Secs Purchased under Agrmnts to Resell | Numeric |
| BA221 | Average Book Value | Numeric |
| BA222 | Average Fed Funds Purch’d & Secs Sold under Agmnts to Repurchase | Numeric |
| BA223 | Average Taxable Investments (Restated) | Numeric |
| BA224 | Average Nontaxable Investments | Numeric |
| BA225 | Average Assets(Restated) | Numeric |
| BA226 | Average Deposits Time and Savings(Restated) | Numeric |
| BA227 | Average Deposits Demand (Restated) | Numeric |
| BA228 | Average Deposits Foreign (Restated) | Numeric |
| BA229 | Average Borrowings (Restated) | Numeric |
| BA230 | Average Long-Term Debt (Restated) | Numeric |
| BA231 | Average Book Value (Restated) | Numeric |
| BA232 | Adjustment Factor Cumulative by Ex-Date | Numeric |
BQITEMS Data Set—Bank Quarterly Data Items

Table 33.31 BQITEMS Data Set—Bank Quarterly Data Items

| Fields | Label | Type |
|---|---|---|
| GVKEY | GVKEY | Numeric |
| CRSPDT | CRSP Date | Numeric |
| RCALDT | Raw Calendar Trading Date | Numeric |
| FISCALDT | Fiscal Trading Date | Numeric |
| BQ1 | Cash and Due from Banks | Numeric |
| BQ2 | U.S. Treasury Securities | Numeric |
| BQ3 | Securities of Other U.S. Government Agencies and C | Numeric |
| BQ4 | Due from Banks (Memorandum Entry) | Numeric |
| BQ5 | Other Securities (Taxable) | Numeric |
| BQ6 | Total Taxable Investment Securities | Numeric |
| BQ7 | Obligations of States and Political Subdivisions | Numeric |
| BQ8 | Total Investment Securities | Numeric |
| BQ9 | Geographic Designation Code | Numeric |
| BQ10 | Trading Account Securities | Numeric |
| BQ11 | Federal Funds Sold & Secs Purch’d under Agrmnts to | Numeric |
| BQ12 | S&P Senior Debt Rating | Numeric |
| BQ13 | Unearned Discount/Income | Numeric |
| BQ14 | Loans (Gross) | Numeric |
| BQ15 | Treasury Stock Dollar Amount—Common | Numeric |
| BQ16 | Treasury Stock—Dollar Amount—Preferred | Numeric |
| BQ17 | Interest Income Total (Financial Services) | Numeric |
| BQ18 | Assets Held for Sale | Numeric |
| BQ19 | Bank Premises, Furniture, and Fixtures | Numeric |
| BQ20 | Real Estate Other than Bank Premises | Numeric |
| BQ21 | Investments in Nonconsolidated Subsidiaries | Numeric |
| BQ22 | Direct Lease Financing | Numeric |
| BQ23 | Customer’s Liability to this Bank on Acceptances | Numeric |
| BQ24 | Other Assets | Numeric |
| BQ25 | Intangible Assets | Numeric |
| BQ26 | Aggregate Miscellaneous Assets | Numeric |
| BQ27 | Total Assets (Gross) | Numeric |
| BQ28 | Net Interest Margin | Numeric |
| BQ29 | Risk-Adjusted Capital Ratio—Tier 1 | Numeric |
| BQ30 | Total Demand Deposits | Numeric |
| BQ31 | Average Investments | Numeric |
| BQ32 | Average Loans (Gross) | Numeric |
| BQ33 | Total Savings Deposits | Numeric |
| BQ34 | Average Assets (Gross) | Numeric |
| BQ35 | Average Deposits Demand | Numeric |
| BQ36 | Total Time Deposits (Other than Savings) | Numeric |
| BQ37 | Average Deposits Time and Savings | Numeric |
| BQ38 | Average Deposits | Numeric |
| BQ39 | Average Deposits Foreign | Numeric |
| BQ40 | Average Borrowings | Numeric |
| BQ41 | Average Total Stockholders’ Equity | Numeric |
| BQ42 | Total Deposits Worldwide | Numeric |
| BQ43 | Total Domestic Deposit | Numeric |
| BQ44 | Total Foreign Deposits | Numeric |
| BQ45 | Risk-Adjusted Capital Ratio—Total | Numeric |
| BQ46 | Federal Funds Purchased & Secs Sold under Agrmnts | Numeric |
| BQ47 | Commercial Paper | Numeric |
| BQ48 | Long-Term Debt—Not Classified as Capital | Numeric |
| BQ49 | Other Liabilities for Borrowed Money | Numeric |
| BQ50 | Total Borrowings | Numeric |
| BQ51 | Depreciation and Amortization | Numeric |
| BQ52 | Mortgage Indebtedness | Numeric |
| BQ53 | Acceptances Executed by or for Account of this Ban | Numeric |
| BQ54 | Other Liabilities (Excluding Valuation Reserves) | Numeric |
| BQ55 | Special Items | Numeric |
| BQ56 | Blank | Numeric |
| BQ57 | Total Liabilities (Excluding Valuation Reserves) | Numeric |
| BQ58 | Minority Interest in Consolidated Subsidiaries | Numeric |
| BQ59 | Reserve(s) for Bad Debt Losses on Loans | Numeric |
| BQ60 | Valuation Portion of Reserves for Loan Losses | Numeric |
| BQ61 | Deferred Portion of Reserve for Loan Losses | Numeric |
| BQ62 | Contingency Portion of Reserve for Loan Losses | Numeric |
| BQ63 | Blank | Numeric |
| BQ64 | Capital Notes and Debentures | Numeric |
| BQ65 | Blank | Numeric |
| BQ66 | Preferred Stock Par Value | Numeric |
| BQ67 | Common Stock Par Value | Numeric |
| BQ68 | Number of Shares Outstanding | Numeric |
| BQ69 | Surplus | Numeric |
| BQ70 | Undivided Profits | Numeric |
| BQ71 | Reserves for Contingencies & Other Capital Reserve | Numeric |
| BQ72 | Blank | Numeric |
| BQ73 | Total Book Value | Numeric |
| BQ74 | Blank | Numeric |
| BQ75 | Total Liabilities, Reserves and Capital Accounts | Numeric |
| BQ76 | Total Capital Accounts and Minority Interest (Invested Capital) | Numeric |
| BQ77 | Report Date of Quarterly Earnings Per Share | Numeric |
| BQ78 | Interest and Fees on Loans | Numeric |
| BQ79 | Interest Inc on Fed Funds Sld & Secs Purchased under Agrmnts to Resell | Numeric |
| BQ80 | Blank | Numeric |
| BQ81 | Interest and Discount on U.S. Treasury Securities | Numeric |
| BQ82 | Interest on Securities of U.S. Government Agencies | Numeric |
| BQ83 | Interest and Dividends on Other Taxable Securities | Numeric |
| BQ84 | Total Taxable Investment Revenue | Numeric |
| BQ85 | Interest on Obligation of States and Political Subdivisions | Numeric |
| BQ86 | Foreign Exchange Gains and Losses | Numeric |
| BQ87 | Total Interest and Dividends on Investments | Numeric |
| BQ88 | Other Interest Income | Numeric |
| BQ89 | Trust Department Income | Numeric |
| BQ90 | Service Charges on Deposit Accounts | Numeric |
| BQ91 | Other Svce Charges, Collections & Exchange Charges | Numeric |
| BQ92 | Trading Account Income | Numeric |
| BQ93 | Other Current Operating Revenue | Numeric |
| BQ94 | Interest on Due from Banks | Numeric |
| BQ95 | Aggregate Other Current Operating Revenue | Numeric |
| BQ96 | Total Current Operating Revenue | Numeric |
| BQ97 | Salaries and Wages of Officers and Employees | Numeric |
| BQ98 | Pension and Employee Benefits | Numeric |
| BQ99 | Blank | Numeric |
| BQ100 | Interest on Deposits | Numeric |
| BQ101 | Interest Expense on Fed Funds Purchased & Secs Sld | Numeric |
| BQ102 | Interest on Other Borrowed Money | Numeric |
| BQ103 | Interest on Long-Term Debt—Not Classified as Cap | Numeric |
| BQ104 | Total Interest Expense | Numeric |
| BQ105 | Interest on Capital Notes and Debentures | Numeric |
| BQ106 | Provision for Loan Losses | Numeric |
| BQ107 | Occupancy Expense of Bank Premises—Net | Numeric |
| BQ108 | Furniture and Equipment:Depreciation Rental, Costs, Servicing, Etc. | Numeric |
| BQ109 | Blank | Numeric |
| BQ110 | Other Current Operating Expense | Numeric |
| BQ111 | Aggregate Other Current Operating Expense | Numeric |
| BQ112 | Total Current Operating Expense | Numeric |
| BQ113 | Current Operating Earnings before Income Tax | Numeric |
| BQ114 | Income Taxes Applicable to Current Operating Earnings | Numeric |
| BQ115 | Net Current Operating Earnings | Numeric |
| BQ116 | Minority Interest (Income Account) | Numeric |
| BQ117 | Net Current Operating Earnings after Minority Interest | Numeric |
| BQ118 | Net Pre-Tax Profit or Loss on Securities Sold or Redeemed | Numeric |
| BQ119 | Blank | Numeric |
| BQ120 | Blank | Numeric |
| BQ121 | Tax Effect on Profit or Loss on Securities Sold or Redeemed | Numeric |
| BQ122 | Minority Interest in After-Tax Profit or Loss on Sec sold or Redeemed | Numeric |
| BQ123 | Net After-Tax & After-Min Int Profit or Loss on Secs Sld or Redeemed | Numeric |
| BQ124 | Net Income | Numeric |
| BQ125 | Preferred Dividend Deductions | Numeric |
| BQ126 | Savings Due to Common Stock Equivalents | Numeric |
| BQ127 | Net Income Available for Common | Numeric |
| BQ128 | Net Current Operating Earnings Available for Common | Numeric |
| BQ129 | Cash Dividends Declared on Common Stock | Numeric |
| BQ130 | Cash Dividends Declared on Preferred Stock | Numeric |
| BQ131 | Net After-Tax Transfers bet Undivided Profits & Valuation Reserves | Numeric |
| BQ132 | Total Extraordinary Items Net of Taxes | Numeric |
| BQ133 | Net Credit or Charge to Reserves for Bad Debts for Loan Recs or Crg-Offs | Numeric |
| BQ134 | Blank | Numeric |
| BQ135 | Common Dividends Paid per Share by Payable Date | Numeric |
| BQ136 | Adjustment Factor Cumulative by Payable Date | Numeric |
| BQ137 | Net Current Operating Earnings per Share Excluding Extraordinary Items | Numeric |
| BQ138 | Net Income per Share Excluding Extraordinary Items | Numeric |
| BQ139 | Net Income per Share Including Extraordinary Items | Numeric |
| BQ140 | Common Shares Used in Calculating Quarterly Earnings per Share | Numeric |
| BQ141 | Net Cur Op Erns per Share—Excluding Extraordinary Items 12 Mo Moving | Numeric |
| BQ142 | Net Income per Share Excluding Extraordinary Items 12 Mo Moving | Numeric |
| BQ143 | Net Income per Share Including Extraordinary Items 12 Mo Moving | Numeric |
| BQ144 | Common Shares Used in Calculating 12 Mo Moving Earnings per Share | Numeric |
| BQ145 | Net Current Op Earngs per Share—Extraordinary | Numeric |
| BQ146 | Net Income per Share Excluding Extraordinary Items Fully Diluted | Numeric |
| BQ147 | Net Income per Share Including Extraordinary Items Fully Diluted | Numeric |
| BQ148 | Common Shares Used in Calculating Quartly Fully Diluted Earnings per Shr | Numeric |
| BQ149 | Net Cur Op Erngs/Share Ex Extrd Items Fully Diluted 12 Mo Mov | Numeric |
| BQ150 | Net Inc/Share Excldg Extraordinary Items—Fully | Numeric |
| BQ151 | Net Inc per Share Inc Extraordinary Items—Fully Diluted—12 Mo Mov | Numeric |
| BQ152 | Common Shrs Used in Calc 12 Mo Moving Fully Diluted 12 Mo Mov | Numeric |
| BQ153 | Market Price 1st Month of Quarter High | Numeric |
| BQ154 | Market Price 1st Month of Quarter Low | Numeric |
| BQ155 | Market Price 1st Month of Quarter Close | Numeric |
| BQ156 | Market Price 2nd Month of quarter High | Numeric |
| BQ157 | Market Price 2nd Month of Quarter Low | Numeric |
| BQ158 | Market Price 2nd Month of Quarter Close | Numeric |
| BQ159 | Market Price 3rd Month of quarter High | Numeric |
| BQ160 | Market Price 3rd Month of Quarter Low | Numeric |
| BQ161 | Market Price 3rd Month of Quarter Close | Numeric |
| BQ162 | Common Dividends Paid per Share by Ex-Date | Numeric |
| BQ163 | Annualized Dividend Rate | Numeric |
| BQ164 | Common Shares Traded (Quarterly) | Numeric |
| BQ165 | Adjustment Factor Cumulative by Ex-Date | Numeric |
Time Series Data Sets

Table 33.32 Time Series Data Sets

| Data Set | Fields | Label | Type |
|---|---|---|---|
| PRCH High Price Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | PRCH | High Price | Numeric |
| PRCL Low Price Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | PRCL | Low Price | Numeric |
| PRCC Closing Price Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | PRCC | Closing Price | Numeric |
| DIV Dividends Per Share Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | DIV | Dividends Per share | Numeric |
| ERN Earnings Per Share Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | ERN | Earnings Per Share | Numeric |
| SHSTRD Shares Traded Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | SHSTRD | Shares Traded | Numeric |
| DIVRTE Annualized Dividend Rate Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | DIVRTE | Annual’d Dividend Rate | Numeric |
| RAWADJ Adjustment Factor Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | RAWADJ | Raw Adjustment Factor | Numeric |
| CUMADJ Cumulative Adjustment Factor Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | CUMADJ | Cumulative Adjustment Factor | Numeric |
| BKV Book Value Per Share Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | BKV | Book Value Per Share | Numeric |
| CHEQVM Cash Equivalent Distribution | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | CHECQVM | Cash Equivalent Distributions | Numeric |
| CSHOQ Common Share Outstanding | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | CSHOQ | Common Shares Outstanding | Numeric |
| NAVM Net Asset Value Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | NAVM | Net Asset Value | Numeric |
| OEPS12 Earnings/Share From Operations | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | OEPS12 | Earnings/Share from Operations | Numeric |
| GICS Global Industry Class Standard Code | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | GICS | Global Industry Class. Std. code | Numeric |
| CPSPIN S&P Index Primary Marker Time Series | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | CPSPIN | S&P Index Primary Marker | Character |
| DIVFT Dividends per Share Footnotes | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | DIVFT | Dividends per share footnotes | Character |
| RAWADJFT Raw Adjustment Factor Footnotes | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | RAWADJFT | Raw adjustment factor footnotes | Character |
| COMSTAFT Comparability Status Footnotes | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | COMSTAFT | Comparability status footnotes | Character |
| ISAFT Issue Status Alert Footnotes | GVKEY | GVKEY | Numeric |
| | CALDT | Calendar Trading Date | Numeric |
| | ISAFT | Issue status alert footnotes | Character |
SEGSRC Data Set—Operating Segment Source History

Table 33.33 SEGSRC Data Set—Operating Segment Source History

| Fields | Label | Type |
|---|---|---|
| GVKEY | GVKEY | Numeric |
| SRCYR | Segment Source year | Numeric |
| SRCFYR | Segment Source fiscal year end month | Numeric |
| CALYR | Calendar Year | Numeric |
| RCST1 | Reserved 1 | Numeric |
| SSRCE | Source Document code | Character |
| SUCODE | Update code | Character |
| CURCD | ISO currency code | Character |
| SRCCUR | Source ISO currency code | Character |
| HNAICS | Segment Primary historical NAICS | Character |
SEGPROD Data Set—Operating Segment Products History

Table 33.34 SEGPROD Data Set—Operating Segment Products History

| Fields | Label | Type |
|---|---|---|
| GVKEY | GVKEY | Numeric |
| SRCYR | Segment Source year | Numeric |
| SRCFYR | Segment Source fiscal year end month | Numeric |
| CALYR | Calendar Year | Numeric |
| PDID | Product Identifier | Numeric |
| PSID | Segment Link segment identifier | Numeric |
| PSALE | External Revenues | Numeric |
| RCST1 | Reserved 1 | Numeric |
| PNAICS | Product NAICS code | Character |
| PSTYPE | Segment link segment type | Character |
| PNAME | Product Name | Character |
SEGCUST Data Set—Operating Segment Customer History

Table 33.35 SEGCUST Data Set—Operating Segment Customer History

| Fields | Label | Type |
|---|---|---|
| GVKEY | GVKEY | Numeric |
| SRCYR | Segment Source year | Numeric |
| SRCFYR | Segment Source fiscal year end month | Numeric |
| CALYR | Calendar Year | Numeric |
| CDID | Customer Identifier (cio) | Numeric |
| CSID | Segment Link segment identifier | Numeric |
| CSALE | Customer Revenues | Numeric |
| RCST1 | Reserved 1 | Numeric |
| CTYPE | Customer type | Character |
| CGEOCD | Geographic area code | Character |
| CGEOAR | Geographic area type | Character |
| CSTYPE | Segment link - segment type | Character |
| CNAME | Customer Name | Character |
SEGDTL Data Set—Operating Segment Detail History

Table 33.36 SEGDTL Data Set—Operating Segment Detail History

| Fields | Label | Type |
|---|---|---|
| GVKEY | GVKEY | Numeric |
| SRCYR | Segment Source year | Numeric |
| SRCFYR | Segment Source fiscal year end month | Numeric |
| CALYR | Calendar Year | Numeric |
| SID | Segment Identifier | Numeric |
| RCST1 | Reserved 1 | Numeric |
| STYPE | Segment type | Character |
| SOPTP1 | Operating segment type 1 | Character |
| SOPTP2 | Operating segment type 2 | Character |
| SGEOTP | Geographic segment type | Character |
| SNAME | Segment Name | Character |
SEGNAICS Data Set—Operating Segment NAICS History

Table 33.37 SEGNAICS Data Set—Operating Segment NAICS History

| Fields | Label | Type |
|---|---|---|
| GVKEY | GVKEY | Numeric |
| SRCYR | Segment Source year | Numeric |
| SRCFYR | Segment Source fiscal year end month | Numeric |
| CALYR | Calendar Year | Numeric |
| SID | Segment Identifier | Numeric |
| RANK | Ranking | Numeric |
| SIC | Segment SIC Code | Numeric |
| RST1 | Reserved 1 | Numeric |
| SNAICS | Segment NAICS code | Character |
| STYPE | Segment type | Character |
SEGGEO Data Set—Geographic Segment History

Table 33.38 SEGGEO Data Set—Geographic Segment History

| Fields | Label | Type |
|---|---|---|
| GVKEY | GVKEY | Numeric |
| SRCYR | Segment Source year | Numeric |
| SRCFYR | Segment Source fiscal year end month | Numeric |
| CALYR | Calendar Year | Numeric |
| SID | Segment Identifier | Numeric |
| RCST1 | Reserved 1 | Numeric |
| STYPE | Segment type | Character |
| SGEOCD | Geographic area code | Character |
| SGEOTP | Geographic segment type | Character |
SEGCUR Data Set—Segment Currency Data

Table 33.39 SEGCUR Data Set—Segment Currency Data

| Fields | Label | Type |
|---|---|---|
| GVKEY | GVKEY | Numeric |
| DATYR | Segment Data year (year) | Numeric |
| DATFYR | Segment Data fiscal year end month (fyr) | Numeric |
| CALYR | Segment Calendar Year (cyr) | Numeric |
| SRCYRFYR | Segment Source year and source fiscal (fyr) | Numeric |
| XRATE | Period end exchange rate | Numeric |
| XRATE12 | 12-month moving Exchange rate | Numeric |
| SRCCUR | Source currency code | Character |
| CURCD | ISO Currency code (USD) | Character |
SEGITM Data Set—Segment Item Data

Table 33.40 SEGITM Data Set—Segment Item Data

| Fields | Label | Type |
|---|---|---|
| GVKEY | GVKEY | Numeric |
| DATYR | Data year (year) | Numeric |
| FISCYR | Data Fiscal year end month (fyr) | Numeric |
| SRCYR | Source year | Numeric |
| SRCFYR | Source fiscal year end month | Numeric |
| CALYR | Data calendar year (cyr) | Numeric |
| SID | Segment Identifier | Numeric |
| EMP | Employees | Numeric |
| SALE | Net Sales | Numeric |
| OIBD | Operating income before depreciation | Numeric |
| DP | Depreciation and amortization | Numeric |
| OIAD | Operating income after depreciation | Numeric |
| CAPX | Capital expenditures | Numeric |
| IAT | Identifiable/total Assets | Numeric |
| EQEARN | Equity in earnings | Numeric |
| INVEQ | Investments at equity | Numeric |
| RD | Research and development | Numeric |
| OBKLG | Order backlog | Numeric |
| EXPORTS | Export sales | Numeric |
| INTSEG | Intersegment eliminations | Numeric |
| OPINC | Operating profit | Numeric |
| PI | Pretax income | Numeric |
| IB | Income Before Extraordinary Items | Numeric |
| NI | Net Income (loss) | Numeric |
| RCST1 | Reserved 1 | Numeric |
| RCST2 | Reserved 2 | Numeric |
| RCST3 | Reserved 3 | Numeric |
| SALEF | Footnote 1—sales | Character |
| OPINCF | Footnote 2—operating profit | Character |
| CAPXF | Footnote 3—capital expenditures | Character |
| EQEARNF | Footnote 4—equity in earnings | Character |
| EMPF | Footnote 5—employees | Character |
| RDF | Footnote 6—research and development | Character |
| STYPE | Segment type | Character |
Available CRSP Indices Data Sets

INDHEAD Data Set—CRSP Index Header Data

Table 33.41 INDHEAD Data Set—CRSP Index Header Data

| Fields | Label | Type |
|---|---|---|
| INDNO | Permanent index identification number | Numeric |
| INDCO | Permanent index group identification number | Numeric |
| PRIMFLAG | Index primary link | Numeric |
| PORTNUM | Portfolio number if subset series | Numeric |
| INDNAME | Index Name | Character |
| GROUPNAM | Index Group Name | Character |
REBAL Data Set—Index Rebalancing History Arrays

Table 33.42 REBAL Data Set—Index Rebalancing History Arrays

| Fields | Label | Type |
|---|---|---|
| INDNO | INDNO | Numeric |
| RBEGDT | Rebalancing beginning date | Numeric |
| RENDDT | Rebalancing ending date | Numeric |
| USDCNT | Count used as of rebalancing | Numeric |
| MAXCNT | Maximum count during period | Numeric |
| TOTCNT | Available count as of rebalancing | Numeric |
| ENDCNT | Count at end of period | Numeric |
| MINID | Identifier at minimum value | Numeric |
| MAXID | Identifier at maximum value | Numeric |
| MINSTA | Smallest statistic in period | Numeric |
| MAXSTA | Largest statistic in period | Numeric |
| MEDSTA | Median statistic in period | Numeric |
| AVGSTA | Average statistic in period | Numeric |
REBAL Group Data Set—Index Rebalancing History Group Array

Table 33.43 REBAL Group Data Set—Index Rebalancing History Group Array

| Fields | Label | Type |
|---|---|---|
| INDNO | INDNO | Numeric |
| RBEGDT1 | Rebalancing beginning date for port 1 | Numeric |
| RBEGDT2 | Rebalancing beginning date for port 2 | Numeric |
| RBEGDT3 | Rebalancing beginning date for port 3 | Numeric |
| RBEGDT4 | Rebalancing beginning date for port 4 | Numeric |
| RBEGDT5 | Rebalancing beginning date for port 5 | Numeric |
| RBEGDT6 | Rebalancing beginning date for port 6 | Numeric |
| RBEGDT7 | Rebalancing beginning date for port 7 | Numeric |
| RBEGDT8 | Rebalancing beginning date for port 8 | Numeric |
| RBEGDT9 | Rebalancing beginning date for port 9 | Numeric |
| RBEGDT10 | Rebalancing beginning date for port 10 | Numeric |
| RENDDT1 | Rebalancing ending date for port 1 | Numeric |
| RENDDT2 | Rebalancing ending date for port 2 | Numeric |
| RENDDT3 | Rebalancing ending date for port 3 | Numeric |
| RENDDT4 | Rebalancing ending date for port 4 | Numeric |
| RENDDT5 | Rebalancing ending date for port 5 | Numeric |
| RENDDT6 | Rebalancing ending date for port 6 | Numeric |
| RENDDT7 | Rebalancing ending date for port 7 | Numeric |
| RENDDT8 | Rebalancing ending date for port 8 | Numeric |
| RENDDT9 | Rebalancing ending date for port 9 | Numeric |
| RENDDT10 | Rebalancing ending date for port 10 | Numeric |
| USDCNT1 | Count used as of rebalancing for port 1 | Numeric |
| USDCNT2 | Count used as of rebalancing for port 2 | Numeric |
| USDCNT3 | Count used as of rebalancing for port 3 | Numeric |
| USDCNT4 | Count used as of rebalancing for port 4 | Numeric |
| USDCNT5 | Count used as of rebalancing for port 5 | Numeric |
| USDCNT6 | Count used as of rebalancing for port 6 | Numeric |
| USDCNT7 | Count used as of rebalancing for port 7 | Numeric |
| USDCNT8 | Count used as of rebalancing for port 8 | Numeric |
| USDCNT9 | Count used as of rebalancing for port 9 | Numeric |
| USDCNT10 | Count used as of rebalancing for port10 | Numeric |
Available CRSP Indices Data Sets F 2277
Table 33.43
continued
Fields MAXCNT1 MAXCNT2 MAXCNT3 MAXCNT4 MAXCNT5 MAXCNT6 MAXCNT7 MAXCNT8 MAXCNT9 MAXCNT10 TOTCNT1 TOTCNT2 TOTCNT3 TOTCNT4 TOTCNT5 TOTCNT6 TOTCNT7 TOTCNT8 TOTCNT9 TOTCNT10 ENDCNT1 ENDCNT2 ENDCNT3 ENDCNT4 ENDCNT5 ENDCNT6 ENDCNT7 ENDCNT8 ENDCNT9 ENDCNT10 MINID1 MINID2 MINID3 MINID4 MINID5 MINID6 MINID7 MINID8 MINID9 MINID10 MAXID1 MAXID2 MAXID3 MAXID4 MAXID5
Label Maximum count during period for port 1 Maximum count during period for port 2 Maximum count during period for port 3 Maximum count during period for port 4 Maximum count during period for port 5 Maximum count during period for port 6 Maximum count during period for port 7 Maximum count during period for port 8 Maximum count during period for port 9 Maximum count during period for port 10 Available count as of rebalancing for port 1 Available count as of rebalancing for port 2 Available count as of rebalancing for port 3 Available count as of rebalancing for port 4 Available count as of rebalancing for port 5 Available count as of rebalancing for port 6 Available count as of rebalancing for port 7 Available count as of rebalancing for port 8 Available count as of rebalancing for port 9 Available count as of rebalancing for port10 Count at end of period for port 1 Count at end of period for port 2 Count at end of period for port 3 Count at end of period for port 4 Count at end of period for port 5 Count at end of period for port 6 Count at end of period for port 7 Count at end of period for port 8 Count at end of period for port 9 Count at end of period for port 10 Identifier at minimum value for port 1 Identifier at minimum value for port 2 Identifier at minimum value for port 3 Identifier at minimum value for port 4 Identifier at minimum value for port 5 Identifier at minimum value for port 6 Identifier at minimum value for port 7 Identifier at minimum value for port 8 Identifier at minimum value for port 9 Identifier at minimum value for port 10 Identifier at maximum value for port 1 Identifier at maximum value for port 2 Identifier at maximum value for port 3 Identifier at maximum value for port 4 Identifier at maximum value for port 5
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
2278 F Chapter 33: The SASECRSP Interface Engine
Table 33.43
Fields MAXID6 MAXID7 MAXID8 MAXID9 MAXID10 MINSTA1 MINSTA2 MINSTA3 MINSTA4 MINSTA5 MINSTA6 MINSTA7 MINSTA8 MINSTA9 MINSTA10 MAXSTA1 MAXSTA2 MAXSTA3 MAXSTA4 MAXSTA5 MAXSTA6 MAXSTA7 MAXSTA8 MAXSTA9 MAXSTA10 MEDSTA1 MEDSTA2 MEDSTA3 MEDSTA4 MEDSTA5 MEDSTA6 MEDSTA7 MEDSTA8 MEDSTA9 MEDSTA10 AVGSTA1 AVGSTA2 AVGSTA3 AVGSTA4 AVGSTA5 AVGSTA6 AVGSTA7 AVGSTA8 AVGSTA9 AVGSTA10
continued
Label Identifier at maximum value for port 6 Identifier at maximum value for port 7 Identifier at maximum value for port 8 Identifier at maximum value for port 9 Identifier at maximum alue for port 10 Smallest statistic in period for port 1 Smallest statistic in period for port 2 Smallest statistic in period for port 3 Smallest statistic in period for port 4 Smallest statistic in period for port 5 Smallest statistic in period for port 6 Smallest statistic in period for port 7 Smallest statistic in period for port 8 Smallest statistic in period for port 9 Smallest statistic in period for port 10 Largest statistic in period for port 1 Largest statistic in period for port 2 Largest statistic in period for port 3 Largest statistic in period for port 4 Largest statistic in period for port 5 Largest statistic in period for port 6 Largest statistic in period for port 7 Largest statistic in period for port 8 Largest statistic in period for port 9 Largest statistic in period for port 10 Median statistic in period for port 1 Median statistic in period for port 2 Median statistic in period for port 3 Median statistic in period for port 4 Median statistic in period for port 5 Median statistic in period for port 6 Median statistic in period for port 7 Median statistic in period for port 8 Median statistic in period for port 9 Median statistic in period for port 10 Average statistic in period for port 1 Average statistic in period for port 2 Average statistic in period for port 3 Average statistic in period for port 4 Average statistic in period for port 5 Average statistic in period for port 6 Average statistic in period for port 7 Average statistic in period for port 8 Average statistic in period for port 9 Average statistic in period for port 10
Type Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric Numeric
LIST Data Set—Index Membership List Arrays Table 33.44
Fields    Label                         Type
INDNO     INDNO                         Numeric
PERMNO    Issue identifier              Numeric
BEGDT     First date included           Numeric
ENDDT     Last date included            Numeric
SUBIND    Code for subcategory of list  Numeric
WEIGHT    Weight during range           Numeric
LIST Group Data Set—Index Membership List Group Arrays Table 33.45
Fields    Label                         Type
INDNO     INDNO                         Numeric
PERMNO1   Issue identifier              Numeric
BEGDT1    First date included           Numeric
ENDDT1    Last date included            Numeric
SUBIND1   Code for subcategory of list  Numeric
WEIGHT1   Weight during range           Numeric
USDCNT Data Set—Portfolio Used Count Array Table 33.46
Fields    Label                  Type
INDNO     INDNO                  Numeric
CALDT     Calendar Trading Date  Numeric
USDCNT    Portfolio Used Count   Numeric
TOTCNT Data Set—Portfolio Total Count Array Table 33.47
Fields    Label                  Type
INDNO     INDNO                  Numeric
CALDT     Calendar Trading Date  Numeric
TOTCNT    Portfolio Total Count  Numeric
USDCNT Group Data Set—Portfolio Used Time Series Group Table 33.48
Fields    Label                   Type
INDNO     INDNO                   Numeric
CALDT     Calendar Trading Date   Numeric
USDCNT1   Used Count for Port 1   Numeric
USDCNT2   Used Count for Port 2   Numeric
USDCNT3   Used Count for Port 3   Numeric
USDCNT4   Used Count for Port 4   Numeric
USDCNT5   Used Count for Port 5   Numeric
USDCNT6   Used Count for Port 6   Numeric
USDCNT7   Used Count for Port 7   Numeric
USDCNT8   Used Count for Port 8   Numeric
USDCNT9   Used Count for Port 9   Numeric
USDCNT10  Used Count for Port 10  Numeric
USDCNT11  Used Count for Port 11  Numeric
USDCNT12  Used Count for Port 12  Numeric
USDCNT13  Used Count for Port 13  Numeric
USDCNT14  Used Count for Port 14  Numeric
USDCNT15  Used Count for Port 15  Numeric
USDCNT16  Used Count for Port 16  Numeric
USDCNT17  Used Count for Port 17  Numeric
TOTCNT Group Data Set—Portfolio Total Count Time Series Groups Table 33.49
Fields    Label                    Type
INDNO     INDNO                    Numeric
CALDT     Calendar Trading Date    Numeric
TOTCNT1   Total Count for Port 1   Numeric
TOTCNT2   Total Count for Port 2   Numeric
TOTCNT3   Total Count for Port 3   Numeric
TOTCNT4   Total Count for Port 4   Numeric
TOTCNT5   Total Count for Port 5   Numeric
TOTCNT6   Total Count for Port 6   Numeric
TOTCNT7   Total Count for Port 7   Numeric
TOTCNT8   Total Count for Port 8   Numeric
TOTCNT9   Total Count for Port 9   Numeric
TOTCNT10  Total Count for Port 10  Numeric
TOTCNT11  Total Count for Port 11  Numeric
TOTCNT12  Total Count for Port 12  Numeric
TOTCNT13  Total Count for Port 13  Numeric
TOTCNT14  Total Count for Port 14  Numeric
TOTCNT15  Total Count for Port 15  Numeric
TOTCNT16  Total Count for Port 16  Numeric
TOTCNT17  Total Count for Port 17  Numeric
USDVAL Data Set—Portfolio Used Value Array Table 33.50
Fields    Label                  Type
INDNO     INDNO                  Numeric
CALDT     Calendar Trading Date  Numeric
USDVAL    Portfolio Used Value   Numeric
TOTVAL Data Set—Portfolio Total Value Array Table 33.51
Fields    Label                  Type
INDNO     INDNO                  Numeric
CALDT     Calendar Trading Date  Numeric
TOTVAL    Portfolio Total Value  Numeric
USDVAL Group Data Set—Portfolio Used Value Time Series Groups Table 33.52
Fields    Label                   Type
INDNO     INDNO                   Numeric
CALDT     Calendar Trading Date   Numeric
USDVAL1   Used Value for Port 1   Numeric
USDVAL2   Used Value for Port 2   Numeric
USDVAL3   Used Value for Port 3   Numeric
USDVAL4   Used Value for Port 4   Numeric
USDVAL5   Used Value for Port 5   Numeric
USDVAL6   Used Value for Port 6   Numeric
USDVAL7   Used Value for Port 7   Numeric
USDVAL8   Used Value for Port 8   Numeric
USDVAL9   Used Value for Port 9   Numeric
USDVAL10  Used Value for Port 10  Numeric
USDVAL11  Used Value for Port 11  Numeric
USDVAL12  Used Value for Port 12  Numeric
USDVAL13  Used Value for Port 13  Numeric
USDVAL14  Used Value for Port 14  Numeric
USDVAL15  Used Value for Port 15  Numeric
USDVAL16  Used Value for Port 16  Numeric
USDVAL17  Used Value for Port 17  Numeric
TOTVAL Group Data Set—Portfolio Total Value Time Series Groups Table 33.53
Fields    Label                    Type
INDNO     INDNO                    Numeric
CALDT     Calendar Trading Date    Numeric
TOTVAL1   Total Value for Port 1   Numeric
TOTVAL2   Total Value for Port 2   Numeric
TOTVAL3   Total Value for Port 3   Numeric
TOTVAL4   Total Value for Port 4   Numeric
TOTVAL5   Total Value for Port 5   Numeric
TOTVAL6   Total Value for Port 6   Numeric
TOTVAL7   Total Value for Port 7   Numeric
TOTVAL8   Total Value for Port 8   Numeric
TOTVAL9   Total Value for Port 9   Numeric
TOTVAL10  Total Value for Port 10  Numeric
TOTVAL11  Total Value for Port 11  Numeric
TOTVAL12  Total Value for Port 12  Numeric
TOTVAL13  Total Value for Port 13  Numeric
TOTVAL14  Total Value for Port 14  Numeric
TOTVAL15  Total Value for Port 15  Numeric
TOTVAL16  Total Value for Port 16  Numeric
TOTVAL17  Total Value for Port 17  Numeric
TRET Data Set—Total Returns Time Series Table 33.54
Fields    Label                  Type
INDNO     INDNO                  Numeric
CALDT     Calendar Trading Date  Numeric
TRET      Total Returns          Numeric
ARET Data Set—Appreciation Returns Time Series Table 33.55
Fields    Label                             Type
INDNO     INDNO                             Numeric
CALDT     Calendar Trading Date             Numeric
ARET      Appreciation Returns Time Series  Numeric
IRET Data Set—Income Returns Time Series Table 33.56
Fields    Label                  Type
INDNO     INDNO                  Numeric
CALDT     Calendar Trading Date  Numeric
IRET      Income Returns         Numeric
TRET Group Data Set—Total Returns Time Series Groups Table 33.57
Fields    Label                      Type
INDNO     INDNO                      Numeric
CALDT     Calendar Trading Date      Numeric
TRET1     Total Returns for Port 1   Numeric
TRET2     Total Returns for Port 2   Numeric
TRET3     Total Returns for Port 3   Numeric
TRET4     Total Returns for Port 4   Numeric
TRET5     Total Returns for Port 5   Numeric
TRET6     Total Returns for Port 6   Numeric
TRET7     Total Returns for Port 7   Numeric
TRET8     Total Returns for Port 8   Numeric
TRET9     Total Returns for Port 9   Numeric
TRET10    Total Returns for Port 10  Numeric
TRET11    Total Returns for Port 11  Numeric
TRET12    Total Returns for Port 12  Numeric
TRET13    Total Returns for Port 13  Numeric
TRET14    Total Returns for Port 14  Numeric
TRET15    Total Returns for Port 15  Numeric
TRET16    Total Returns for Port 16  Numeric
TRET17    Total Returns for Port 17  Numeric
ARET Group Data Set—Appreciation Returns Time Series Groups Table 33.58
Fields    Label                             Type
INDNO     INDNO                             Numeric
CALDT     Calendar Trading Date             Numeric
ARET1     Appreciation Returns for Port 1   Numeric
ARET2     Appreciation Returns for Port 2   Numeric
ARET3     Appreciation Returns for Port 3   Numeric
ARET4     Appreciation Returns for Port 4   Numeric
ARET5     Appreciation Returns for Port 5   Numeric
ARET6     Appreciation Returns for Port 6   Numeric
ARET7     Appreciation Returns for Port 7   Numeric
ARET8     Appreciation Returns for Port 8   Numeric
ARET9     Appreciation Returns for Port 9   Numeric
ARET10    Appreciation Returns for Port 10  Numeric
ARET11    Appreciation Returns for Port 11  Numeric
ARET12    Appreciation Returns for Port 12  Numeric
ARET13    Appreciation Returns for Port 13  Numeric
ARET14    Appreciation Returns for Port 14  Numeric
ARET15    Appreciation Returns for Port 15  Numeric
ARET16    Appreciation Returns for Port 16  Numeric
ARET17    Appreciation Returns for Port 17  Numeric
IRET Group Data Set—Income Returns Time Series Groups Table 33.59
Fields    Label                       Type
INDNO     INDNO                       Numeric
CALDT     Calendar Trading Date       Numeric
IRET1     Income Returns for Port 1   Numeric
IRET2     Income Returns for Port 2   Numeric
IRET3     Income Returns for Port 3   Numeric
IRET4     Income Returns for Port 4   Numeric
IRET5     Income Returns for Port 5   Numeric
IRET6     Income Returns for Port 6   Numeric
IRET7     Income Returns for Port 7   Numeric
IRET8     Income Returns for Port 8   Numeric
IRET9     Income Returns for Port 9   Numeric
IRET10    Income Returns for Port 10  Numeric
IRET11    Income Returns for Port 11  Numeric
IRET12    Income Returns for Port 12  Numeric
IRET13    Income Returns for Port 13  Numeric
IRET14    Income Returns for Port 14  Numeric
IRET15    Income Returns for Port 15  Numeric
IRET16    Income Returns for Port 16  Numeric
IRET17    Income Returns for Port 17  Numeric
TIND Data Set—Total Return Index Levels Time Series Table 33.60
Fields    Label                      Type
INDNO     INDNO                      Numeric
CALDT     Calendar Trading Date      Numeric
TIND      Total Return Index Levels  Numeric
AIND Data Set—Appreciation Index Levels Time Series Table 33.61
Fields    Label                      Type
INDNO     INDNO                      Numeric
CALDT     Calendar Trading Date      Numeric
AIND      Appreciation Index Levels  Numeric
IIND Data Set—Income Index Levels Time Series Table 33.62
Fields    Label                  Type
INDNO     INDNO                  Numeric
CALDT     Calendar Trading Date  Numeric
IIND      Income Index Levels    Numeric
TIND Group Data Set—Total Return Index Levels Time Series Groups Table 33.63
Fields    Label                                  Type
INDNO     INDNO                                  Numeric
CALDT     Calendar Trading Date                  Numeric
TIND1     Total Return Index Levels for Port 1   Numeric
TIND2     Total Return Index Levels for Port 2   Numeric
TIND3     Total Return Index Levels for Port 3   Numeric
TIND4     Total Return Index Levels for Port 4   Numeric
TIND5     Total Return Index Levels for Port 5   Numeric
TIND6     Total Return Index Levels for Port 6   Numeric
TIND7     Total Return Index Levels for Port 7   Numeric
TIND8     Total Return Index Levels for Port 8   Numeric
TIND9     Total Return Index Levels for Port 9   Numeric
TIND10    Total Return Index Levels for Port 10  Numeric
TIND11    Total Return Index Levels for Port 11  Numeric
TIND12    Total Return Index Levels for Port 12  Numeric
TIND13    Total Return Index Levels for Port 13  Numeric
TIND14    Total Return Index Levels for Port 14  Numeric
TIND15    Total Return Index Levels for Port 15  Numeric
TIND16    Total Return Index Levels for Port 16  Numeric
TIND17    Total Return Index Levels for Port 17  Numeric
AIND Group Data Set—Appreciation Index Levels Groups Table 33.64
Fields    Label                                  Type
INDNO     INDNO                                  Numeric
CALDT     Calendar Trading Date                  Numeric
AIND1     Appreciation Index Levels for Port 1   Numeric
AIND2     Appreciation Index Levels for Port 2   Numeric
AIND3     Appreciation Index Levels for Port 3   Numeric
AIND4     Appreciation Index Levels for Port 4   Numeric
AIND5     Appreciation Index Levels for Port 5   Numeric
AIND6     Appreciation Index Levels for Port 6   Numeric
AIND7     Appreciation Index Levels for Port 7   Numeric
AIND8     Appreciation Index Levels for Port 8   Numeric
AIND9     Appreciation Index Levels for Port 9   Numeric
AIND10    Appreciation Index Levels for Port 10  Numeric
AIND11    Appreciation Index Levels for Port 11  Numeric
AIND12    Appreciation Index Levels for Port 12  Numeric
AIND13    Appreciation Index Levels for Port 13  Numeric
AIND14    Appreciation Index Levels for Port 14  Numeric
AIND15    Appreciation Index Levels for Port 15  Numeric
AIND16    Appreciation Index Levels for Port 16  Numeric
AIND17    Appreciation Index Levels for Port 17  Numeric
IIND Group Data Set—Income Index Levels Time Series Groups Table 33.65
Fields    Label                            Type
INDNO     INDNO                            Numeric
CALDT     Calendar Trading Date            Numeric
IIND1     Income Index Levels for Port 1   Numeric
IIND2     Income Index Levels for Port 2   Numeric
IIND3     Income Index Levels for Port 3   Numeric
IIND4     Income Index Levels for Port 4   Numeric
IIND5     Income Index Levels for Port 5   Numeric
IIND6     Income Index Levels for Port 6   Numeric
IIND7     Income Index Levels for Port 7   Numeric
IIND8     Income Index Levels for Port 8   Numeric
IIND9     Income Index Levels for Port 9   Numeric
IIND10    Income Index Levels for Port 10  Numeric
IIND11    Income Index Levels for Port 11  Numeric
IIND12    Income Index Levels for Port 12  Numeric
IIND13    Income Index Levels for Port 13  Numeric
IIND14    Income Index Levels for Port 14  Numeric
IIND15    Income Index Levels for Port 15  Numeric
IIND16    Income Index Levels for Port 16  Numeric
IIND17    Income Index Levels for Port 17  Numeric
References
Center for Research in Security Prices (2003), CRSP/Compustat Merged Database Guide, Chicago: The University of Chicago Graduate School of Business.
Center for Research in Security Prices (2003), CRSP Data Description Guide, Chicago: The University of Chicago Graduate School of Business, [http://www.crsp.uchicago.edu/support/documentation/index.html].
Center for Research in Security Prices (2002), CRSP Programmer’s Guide, Chicago: The University of Chicago Graduate School of Business, [http://www.crsp.uchicago.edu/support/documentation/index.html].
Center for Research in Security Prices (2003), CRSPAccess Database Format Release Notes, Chicago: The University of Chicago Graduate School of Business, [http://www.crsp.uchicago.edu/support/documentation/index.html].
Center for Research in Security Prices (2003), CRSP Utilities Guide, Chicago: The University of Chicago Graduate School of Business, [http://www.crsp.uchicago.edu/support/documentation/index.html].
Center for Research in Security Prices (2002), CRSP SFA Guide, Chicago: The University of Chicago Graduate School of Business, [http://www.crsp.uchicago.edu/support/documentation/index.html].
Acknowledgments
Many people have been instrumental in the development of the ETS Interface engine. The individuals listed here have been especially helpful.
Janet Eder, Center for Research in Security Prices, University of Chicago Graduate School of Business.
Ken Kraus, Center for Research in Security Prices, University of Chicago Graduate School of Business.
Bob Spatz, Center for Research in Security Prices, University of Chicago Graduate School of Business.
Rick Langston, SAS Institute, Cary, NC.
Kelly Fellingham, SAS Institute, Cary, NC.
Peng Zang, SAS Institute, Atlanta, GA.
The final responsibility for the SAS System lies with SAS Institute alone. We hope that you will always let us know your opinions about the SAS System and its documentation. It is through your participation that SAS software is continuously improved.
Chapter 34
The SASEFAME Interface Engine
Contents
Overview: SASEFAME Interface Engine
Getting Started: SASEFAME Interface Engine
   Structure of a SAS Data Set Containing Time Series Data
   Reading and Converting FAME Database Time Series
   Using the SAS DATA Step
   Using SAS Procedures
   Using the SAS Windowing Environment
   Using Remote FAME Data Access
   Creating Views of Time Series Using SASEFAME LIBNAME Options
Syntax: SASEFAME Interface Engine
   The LIBNAME libref SASEFAME Statement
Details: SASEFAME Interface Engine
   The SAS Output Data Set
   Mapping FAME Frequencies to SAS Time Intervals
Examples: SASEFAME Interface Engine
   Example 34.1: Converting an Entire FAME Database
   Example 34.2: Reading Time Series from the FAME Database
   Example 34.3: Writing Time Series to the SAS Data Set
   Example 34.4: Limiting the Time Range of Data
   Example 34.5: Creating a View Using the SQL Procedure and SASEFAME
   Example 34.6: Reading Other FAME Data Objects with the FAMEOUT= Option
   Example 34.7: Remote FAME Access Using FAME CHLI
   Example 34.8: Selecting Time Series Using CROSSLIST= Option and KEEP Statement
   Example 34.9: Selecting Time Series Using CROSSLIST= Option with a FAME Namelist of Tickers
   Example 34.10: Selecting Time Series Using CROSSLIST= Option with INSET= and WHERE=TICK
   Example 34.11: Selecting Series Using FAMEOUT= Option
References: SASEFAME Interface Engine
Acknowledgments
Overview: SASEFAME Interface Engine
The SASEFAME interface engine enables SAS users to access and process time series, case series, and formulas residing in a FAME database, and provides a seamless interface between FAME and SAS data processing.
FAME (Financial Analytic Modeling Environment) is an integrated, front-to-back market data and historical database solution for storing and managing real-time and high-volume time series data that is used by leading institutions in the financial, energy, and public sectors, as well as third-party content aggregators, software vendors, and individual investors. FAME provides real-time market data feeds, a Web-based desktop solution, application hosting, data delivery components, and tools for performing analytic modeling.
The SASEFAME engine uses the LIBNAME statement to enable you to specify which time series you would like to read from the FAME database, and how you would like to convert the selected time series to the same time scale. The SAS DATA step can then be used to perform further subsetting and to store the resulting time series in a SAS data set. You can perform more analysis if desired either in the same SAS session or in another session at a later time.
SASEFAME for SAS 8.2 supports Windows, Solaris, AIX, and HP-UX hosts. SASEFAME for SAS 9.2 supports Windows, Solaris, AIX, Linux, Linux Opteron, and HP-UX hosts.
Getting Started: SASEFAME Interface Engine
Structure of a SAS Data Set Containing Time Series Data
SAS requires time series data to be in a specific form recognizable by the SAS System. This form is a two-dimensional array, called a SAS data set, whose columns correspond to series variables and whose rows correspond to measurements of these variables at certain time periods. The time periods at which observations are recorded can be included in the data set as a time ID variable. The SASEFAME engine provides a time ID variable named DATE.
Reading and Converting FAME Database Time Series
The SASEFAME engine supports reading and converting time series stored in FAME databases. The SASEFAME engine uses the FAME WORK database to temporarily store the converted time series. All series specified by the FAME wildcard are written to the FAME WORK database. For conversion of very large databases, you may want to define the FAME_TEMP environment variable to point to a location where there is ample space for FAME WORK.
SASEFAME provides seamless access to FAME databases via FAME’s C Host Language Interface (CHLI). The FAME CHLI does not support writing more than 2 gigabytes to the FAME WORK area; if this limit is exceeded, SASEFAME terminates with a system error. The SASEFAME engine finishes the CHLI whenever a fatal error occurs. To restart the engine after a fatal error, terminate the current SAS session and start a new SAS session.
Using the SAS DATA Step
If desired, you can store the converted series in a SAS data set by using the SAS DATA step. You can also perform other operations on your data inside the DATA step. Once your data is stored in a SAS data set you can use it as you would any other SAS data set.
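The following is a minimal sketch of this workflow, not a definitive program. It assumes a FAME database named TRAINING in a hypothetical folder /fame/data/, a series named GDP in that database, and that the database appears as a member of the libref; all of these names are illustrative only.

```sas
/* Hypothetical folder; the trailing slash marks a UNIX path. */
libname famelib sasefame '/fame/data/'
        convert=(freq=annual tech=constant);

/* Copy the converted series into an ordinary SAS data set.   */
/* TRAINING (database) and GDP (series) are assumed names.    */
data work.gdp_annual;
   set famelib.training;
   where date >= '01jan1990'd;   /* DATE is the time ID variable from SASEFAME */
   keep date gdp;
run;
```

After the DATA step finishes, WORK.GDP_ANNUAL no longer depends on the FAME database and can be passed to any SAS procedure.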
Using SAS Procedures
You can print the output SAS data set by using the PRINT procedure and report information about the contents of your data set by using the CONTENTS procedure, as in Example 34.1. You can create a view of the FAME database by using the SQL procedure, naming the SASEFAME engine in a libref that is supplied through the USING clause. See Example 34.5.
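As a hedged illustration of these procedures, the sketch below assumes a SASEFAME libref named FAMELIB that was assigned earlier, a hypothetical FAME database named TRAINING, a hypothetical series named GDP, and a hypothetical folder /fame/data/. None of these names come from this chapter.

```sas
/* FAMELIB is an assumed, previously assigned SASEFAME libref. */
proc contents data=famelib.training;
run;

proc print data=famelib.training;
run;

/* A view that carries its own SASEFAME libref in the USING clause. */
/* LIB1, TRAINING, GDP, and the path are placeholders.             */
proc sql;
   create view work.fameview as
      select date, gdp
      from lib1.training
      using libname lib1 sasefame '/fame/data/'
            convert=(freq=annual tech=constant);
quit;
```

Embedding the libref in the view means that the view can be opened in a later session without first reissuing the LIBNAME statement.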
Using the SAS Windowing Environment
You can see the available data sets in the LIBNAME window of the SAS windowing environment by selecting the SASEFAME libref that you previously defined in your LIBNAME statement. You can view your SAS output observations by double-clicking the desired output data set libref in the LIBNAME window. You can type Viewtable on the SAS command line to view any of your SASEFAME tables, views, or librefs, both for input and output data sets.
Using Remote FAME Data Access
The remote access feature of the SASEFAME interface uses the FAME CHLI to communicate with your remote FAME server, and it is available to licensed CHLI customers who have FAME CHLI on both the remote and client machines.
As shown in Example 34.7, you simply provide the frdb_m port number and node name of your FAME master server in your libref. For more details, refer to “Starting the Master Server” in the Guide to FAME Database Servers.
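A minimal sketch of a remote libref follows, using the '#port_number@hostname physical_path_name' form of the physical name that is described for the LIBNAME statement. The port 5555, the node name famehost, and the folder /fame/data/ are placeholders, not values from this chapter.

```sas
/* 5555 and famehost stand in for your frdb_m port number and node name. */
/* /fame/data/ stands in for the database folder on the remote server.   */
libname remote sasefame '#5555@famehost /fame/data/'
        convert=(freq=monthly tech=constant);
```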
Creating Views of Time Series Using SASEFAME LIBNAME Options
You can perform selection based on names of your time series simply by using FAME wildcard specifications in your SASEFAME WILDCARD= option. You can limit the time span of time series data by specifying a begin and end date range in your SASEFAME RANGE= option.
It is also easy to use the SAS input data set INSET= option to create a specific view of your FAME data. Multiple views can be created by using multiple LIBNAME statements with customized options tailored to the unique view that you want to create. The INSET variables define the BY variables that enable you to view cross sections or slices of your data. When used in conjunction with the WHERE clause and the CROSSLIST= option, SASEFAME can show any or all of your BY groups in the same view or in multiple views.
The INSET= option is invalid without a WHERE clause specifying the BY variables you want to use in your view, and it must be used with the CROSSLIST= option, as shown in Example 34.10. The CROSSLIST= option can be used without using the INSET= option as shown in Example 34.8 and Example 34.9.
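The idea of several views over one database can be sketched with two LIBNAME statements. The folder /fame/data/, the librefs, and the wildcard pattern are assumptions chosen for illustration.

```sas
/* View 1: only series whose names match a wildcard, first quarter of 1999. */
libname q1view sasefame '/fame/data/'
        wildcard="?_DATA"
        range='01jan1999'd - '31mar1999'd;

/* View 2: every series in the same database, converted to annual frequency. */
libname annview sasefame '/fame/data/'
        convert=(freq=annual tech=constant);
```

Each libref carries its own options, so the same FAME database can be read through Q1VIEW and ANNVIEW without one selection affecting the other.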
Syntax: SASEFAME Interface Engine
The SASEFAME engine uses standard engine syntax. Options used by SASEFAME are summarized in Table 34.1.
Table 34.1 Summary of LIBNAME libref SASEFAME Statement
Option       Description
CONVERT=     specifies the FAME frequency and the FAME technique
WILDCARD=    specifies a FAME wildcard to match data object series names within the FAME database, which limits the selection of time series that are included in the SAS data set
RANGE=       specifies the range of data to keep in format 'ddmonyyyy'd - 'ddmonyyyy'd
INSET=       uses a SAS data set named setname and/or WHERE= FAME namelist as selection input for BY variables such as tickers or issues stored in a FAME string case series
CROSSLIST=   specifies a FAME crosslist namelist to perform selection based on the crossproduct of two FAME namelists
FAMEOUT=     specifies the FAME data object class/type you want output to the SAS data set
The LIBNAME libref SASEFAME Statement
   LIBNAME libref SASEFAME 'physical name' options ;
Since physical name specifies the location of the folder where your FAME database resides, it should end in a backslash if you are in a Windows environment, or a forward slash if you are in a UNIX environment. If you are accessing a remote FAME database by using FAME CHLI, you can use the following syntax for physical name:
   '#port_number@hostname physical_path_name'
The following options can be used in the LIBNAME libref SASEFAME statement.
CONVERT=( FREQ=fame_frequency TECH=fame_technique)
   specifies the FAME frequency and the FAME technique just as you would in the FAME CONVERT function. There are four possible values for fame_technique: CONSTANT (default), CUBIC, DISCRETE, or LINEAR. All FAME frequencies except PPY and YPP are supported by the SASEFAME engine. For a more complete discussion of FAME frequencies and SAS time intervals, see the section “Mapping FAME Frequencies to SAS Time Intervals.” For all possible fame_frequency values, refer to “Understanding Frequencies” in the User’s Guide to FAME. For example:
      LIBNAME libref sasefame 'physical-name' CONVERT=(TECH=CONSTANT FREQ=TWICEMONTHLY);
WILDCARD="fame_wildcard"
   By default, the SASEFAME engine reads all time series in the FAME database that you name in your SASEFAME libref. You can limit the time series read from the FAME database by specifying the WILDCARD= option in your LIBNAME statement. The fame_wildcard is a quoted string containing the FAME wildcard you want to use. The wildcard is used for matching against the data object names of series you want to select from the FAME database that resides in the library you are assigning. For more information about wildcarding, see “Specifying Wildcards” in the User’s Guide to FAME. For example, to read all time series in the TEST library being accessed by the SASEFAME engine, you would specify
      LIBNAME test sasefame 'physical name of test database' WILDCARD="?";
   To read series with names such as A_DATA, B_DATA, or C_DATA, you could specify
      LIBNAME test sasefame 'physical name of test database' WILDCARD="^_DATA";
   When you use the WILDCARD= option, you are limiting the number of series that are read and converted to the desired frequency. This option can help you save resources when processing large databases or when processing a large number of observations, such as daily or hourly frequencies. Since the SASEFAME engine uses the FAME WORK database to store the converted time series, using wildcards is recommended to prevent your WORK space from getting too large. When the FAMEOUT= option is also specified, the wildcard is applied to the type of data object series you specify in the FAMEOUT= option.
RANGE='fame_begdt'd-'fame_enddt'd
   To limit the time range of data read from your FAME database, specify the RANGE= option in your SASEFAME libref, where fame_begdt is the beginning date in ddmonyyyy format and fame_enddt is the ending date of the range in ddmonyyyy format. As an example, to read a series with a date range that spans the first quarter of 1999, you could use the following statement:
      LIBNAME test sasefame 'physical name of test database' RANGE='01jan1999'd - '31mar1999'd;
INSET=(setname WHERE=fame_bygroup )
When you specify a SAS data set named setname as input for a BY group such as tickers, the SASEFAME engine uses the fame_bygroup to select time series whose names follow this convention: each BY-group value (for example, a ticker name) is joined by the glue character (such as DOT) to each series name that is specified in the CROSSLIST= option or the fame_namelist. See the following example that uses both the CROSSLIST= option and the INSET= option with the WHERE= clause.
CROSSLIST=( [ fame_namelist1 ] , fame_namelist2 )
There are two methods for performing the crosslist selection function. The first method uses two FAME namelists, and the second method uses one namelist and one BY group specified in the WHERE= clause of the INSET= option. Using the CROSSLIST= option with the optional fame_namelist1 causes the SASEFAME engine to perform a crossproduct of the first namelist’s members with the second namelist’s members, using a glue symbol "." to join the two. For example, if your FAME database has a namelist named TICKER defined by

   Ticker = {AOL, C, CVX, F, GM, HPQ, IBM, INDUA, INTC, SPX, SUNW, XOM}

and your time series are named in fame_namelist2 as

   adjust, close, high, low, open, volume, uclose, uhigh, ulow, uopen, uvolume

when you specify

   LIBNAME test sasefame 'physical name of test database'
      RANGE='01jan1999'd - '31mar1999'd
      CROSSLIST=(nl(ticker),
                 {adjust, close, high, low, open, volume,
                  uclose, uhigh, ulow, uopen, uvolume});
then the 132 variables shown in Table 34.2 are selected by the CROSSLIST= option. Table 34.2
SAS Variables Selected by CROSSLIST= Option
AOL.ADJUST AOL.CLOSE AOL.HIGH AOL.LOW AOL.OPEN AOL.UCLOSE AOL.UHIGH AOL.ULOW AOL.UOPEN AOL.UVOLUME AOL.VOLUME GM.ADJUST GM.CLOSE GM.HIGH GM.LOW GM.OPEN GM.UCLOSE GM.UHIGH GM.ULOW GM.UOPEN GM.UVOLUME GM.VOLUME
C.ADJUST C.CLOSE C.HIGH C.LOW C.OPEN C.UCLOSE C.UHIGH C.ULOW C.UOPEN C.UVOLUME C.VOLUME HPQ.ADJUST HPQ.CLOSE HPQ.HIGH HPQ.LOW HPQ.OPEN HPQ.UCLOSE HPQ.UHIGH HPQ.ULOW HPQ.UOPEN HPQ.UVOLUME HPQ.VOLUME
CVX.ADJUST CVX.CLOSE CVX.HIGH CVX.LOW CVX.OPEN CVX.UCLOSE CVX.UHIGH CVX.ULOW CVX.UOPEN CVX.UVOLUME CVX.VOLUME IBM.ADJUST IBM.CLOSE IBM.HIGH IBM.LOW IBM.OPEN IBM.UCLOSE IBM.UHIGH IBM.ULOW IBM.UOPEN IBM.UVOLUME IBM.VOLUME
F.ADJUST F.CLOSE F.HIGH F.LOW F.OPEN F.UCLOSE F.UHIGH F.ULOW F.UOPEN F.UVOLUME F.VOLUME INDUA.ADJUST INDUA.CLOSE INDUA.HIGH INDUA.LOW INDUA.OPEN INDUA.UCLOSE INDUA.UHIGH INDUA.ULOW INDUA.UOPEN INDUA.UVOLUME INDUA.VOLUME
INTC.ADJUST INTC.CLOSE INTC.HIGH INTC.LOW INTC.OPEN INTC.UCLOSE INTC.UHIGH INTC.ULOW INTC.UOPEN INTC.UVOLUME INTC.VOLUME
SPX.ADJUST SPX.CLOSE SPX.HIGH SPX.LOW SPX.OPEN SPX.UCLOSE SPX.UHIGH SPX.ULOW SPX.UOPEN SPX.UVOLUME SPX.VOLUME
SUNW.ADJUST SUNW.CLOSE SUNW.HIGH SUNW.LOW SUNW.OPEN SUNW.UCLOSE SUNW.UHIGH SUNW.ULOW SUNW.UOPEN SUNW.UVOLUME SUNW.VOLUME
XOM.ADJUST XOM.CLOSE XOM.HIGH XOM.LOW XOM.OPEN XOM.UCLOSE XOM.UHIGH XOM.ULOW XOM.UOPEN XOM.UVOLUME XOM.VOLUME
Instead of using two namelists, you can use the WHERE= clause of your INSET= option to perform the crossproduct of the BY variable values in your INSET= data set with the members named in your namelist. Suppose you have defined a SAS input data set named INSETA and you want to use it as input for your CROSSLIST= option instead of using the FAME namelist:

   DATA INSETA;
      LENGTH tick $5;
      /* AOL, C, CVX, F, GM, HPQ, IBM, INDUA, INTC, SPX, SUNW, XOM */
      tick='AOL';   output;
      tick='C';     output;
      tick='CVX';   output;
      tick='F';     output;
      tick='GM';    output;
      tick='HPQ';   output;
      tick='IBM';   output;
      tick='INDUA'; output;
      tick='INTC';  output;
      tick='SPX';   output;
      tick='SUNW';  output;
      tick='XOM';   output;
   RUN;

   LIBNAME test sasefame 'physical name of test database'
      RANGE='01jan1999'd - '31mar1999'd
      INSET=(inseta, where=tick)
      CROSSLIST=({adjust, close, high, low, open, volume,
                  uclose, uhigh, ulow, uopen, uvolume});
Whether you use a SAS INSET with a WHERE clause or you use a FAME namelist for your CROSSLIST= selection, the two methods are equivalent ways of performing the same selection function. In the preceding example, the FAME ticker namelist corresponds to the SAS INSET’s BY variable named TICK.
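The naming convention behind the crossproduct can be imitated in Base SAS without a FAME installation. The following is only a sketch of how the 12 x 11 = 132 glued names are formed; the data set name CROSSNAMES and its variables are hypothetical, and the step does not call the SASEFAME engine.

```sas
/* Hypothetical illustration: build the glued names ticker.series in a DATA step. */
data crossnames;
   length tick $5 series $8 name $14;
   do tick = 'AOL', 'C', 'CVX', 'F', 'GM', 'HPQ', 'IBM',
             'INDUA', 'INTC', 'SPX', 'SUNW', 'XOM';
      do series = 'ADJUST', 'CLOSE', 'HIGH', 'LOW', 'OPEN', 'VOLUME',
                  'UCLOSE', 'UHIGH', 'ULOW', 'UOPEN', 'UVOLUME';
         name = catx('.', tick, series);   /* join with the glue symbol "." */
         output;                           /* one observation per glued name */
      end;
   end;
run;

proc print data=crossnames(obs=5);
run;
```

The resulting data set has one observation for each ticker and series pair, which matches the count of variables listed in Table 34.2.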
Note that the WHERE= fame_bygroup must match the BY variable name used in your INSET= data set in order for the CROSSLIST= option to perform the desired selection. If one of the time series listed in your fame_namelist2 does not exist, the SASEFAME engine stops processing the remainder of the namelist. For complete results you should make sure that your fame_namelist2 is accurate and does not name unknown variables. The same holds true for fame_namelist1 and the BY variable values named in your INSET= option and used in your WHERE= clause.

FAMEOUT=fame_data_object_class_type
specifies the class and type of the FAME data series objects you want in your SAS output data set. The possible values for fame_data_object_class_type are FORMULA, TIME, BOOLEAN, CASE, DATE, and STRING. If the FAMEOUT= option is not specified, numeric time series are output to the SAS data set. FAMEOUT=CASE defaults to case series of numeric type, so if you want another type of case series in your output, then you must specify it. Scalar data objects are not supported.
Details: SASEFAME Interface Engine
The SAS Output Data Set

You can use the SAS DATA step to write the selected time series from your FAME database to a SAS data set. This enables you to easily analyze the data by using SAS. You can specify the name of the output data set in the DATA statement. This causes the engine supervisor to create a SAS data set by using the specified name in either the SAS WORK library or, if specified, the USER library. For more about naming your SAS data set, see the section “Characteristics of SAS Data Libraries” in SAS Language Reference: Dictionary.

The contents of the SAS data set containing time series include the date of each observation, the name of each series read from the FAME database as specified by the WILDCARD= option, and the label or FAME description of each series. Missing values are represented as ’.’ in the SAS data set. You can see the available data sets in the LIBNAME window of the SAS windowing environment by selecting the SASEFAME libref that you previously assigned in your LIBNAME statement. You can use PROC PRINT and PROC CONTENTS to print your output data set and its contents. You can use PROC SQL along with the SASEFAME engine to create a view of your SAS data set. You can view your SAS output observations by double-clicking the desired output data set libref in the LIBNAME window of the SAS windowing environment.

The DATE variable in the SAS data set contains the date of the observation. For FAME weekly intervals that end on a Friday, FAME reports the date on the Friday that ends the week, whereas SAS reports the date on the Saturday that begins the week.

A more detailed discussion of how to map FAME frequencies to SAS time intervals follows. For other types of data such as string case series, date case series, boolean case series, and formulas, see Example 34.11.
Mapping FAME Frequencies to SAS Time Intervals

Table 34.3 summarizes the mapping of FAME frequencies to SAS time intervals. It is important to note that FAME frequencies often have a sample unit in parentheses following the keyword frequency. This sample unit is an end-of-interval unit. SAS dates are represented using beginning-of-interval notation. For more on SAS time intervals, see Chapter 4, “Date Intervals, Formats, and Functions.” For more on FAME frequencies, see the section “Understanding Frequencies” in the User’s Guide to FAME.

Table 34.3 Mapping FAME Frequencies

FAME Frequency            SAS Time Interval
------------------------  -----------------
WEEKLY (SUNDAY)           WEEK.2
WEEKLY (MONDAY)           WEEK.3
WEEKLY (TUESDAY)          WEEK.4
WEEKLY (WEDNESDAY)        WEEK.5
WEEKLY (THURSDAY)         WEEK.6
WEEKLY (FRIDAY)           WEEK.7
WEEKLY (SATURDAY)         WEEK.1

BIWEEKLY (ASUNDAY)        WEEK2.2
BIWEEKLY (AMONDAY)        WEEK2.3
BIWEEKLY (ATUESDAY)       WEEK2.4
BIWEEKLY (AWEDNESDAY)     WEEK2.5
BIWEEKLY (ATHURSDAY)      WEEK2.6
BIWEEKLY (AFRIDAY)        WEEK2.7
BIWEEKLY (ASATURDAY)      WEEK2.1
BIWEEKLY (BSUNDAY)        WEEK2.9
BIWEEKLY (BMONDAY)        WEEK2.10
BIWEEKLY (BTUESDAY)       WEEK2.11
BIWEEKLY (BWEDNESDAY)     WEEK2.12
BIWEEKLY (BTHURSDAY)      WEEK2.13
BIWEEKLY (BFRIDAY)        WEEK2.14
BIWEEKLY (BSATURDAY)      WEEK2.8

BIMONTHLY (NOVEMBER)      MONTH2.2
BIMONTHLY                 MONTH2.1

QUARTERLY (OCTOBER)       QTR.2
QUARTERLY (NOVEMBER)      QTR.3
QUARTERLY                 QTR.1

ANNUAL (JANUARY)          YEAR.2
ANNUAL (FEBRUARY)         YEAR.3
ANNUAL (MARCH)            YEAR.4
ANNUAL (APRIL)            YEAR.5
ANNUAL (MAY)              YEAR.6
ANNUAL (JUNE)             YEAR.7
ANNUAL (JULY)             YEAR.8
ANNUAL (AUGUST)           YEAR.9
ANNUAL (SEPTEMBER)        YEAR.10
ANNUAL (OCTOBER)          YEAR.11
ANNUAL (NOVEMBER)         YEAR.12
ANNUAL                    YEAR.1

SEMIANNUAL (JULY)         SEMIYEAR.2
SEMIANNUAL (AUGUST)       SEMIYEAR.3
SEMIANNUAL (SEPTEMBER)    SEMIYEAR.4
SEMIANNUAL (OCTOBER)      SEMIYEAR.5
SEMIANNUAL (NOVEMBER)     SEMIYEAR.6
SEMIANNUAL                SEMIYEAR.1

YPP                       not supported
PPY                       not supported

SECONDLY                  SECOND
MINUTELY                  MINUTE
HOURLY                    HOUR

DAILY                     DAY
BUSINESS                  WEEKDAY
TENDAY                    TENDAY
TWICEMONTHLY              SEMIMONTH
MONTHLY                   MONTH
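The difference between FAME end-of-interval dates and SAS beginning-of-interval dates can be checked in Base SAS with the INTNX function. The following sketch assumes a FAME WEEKLY (FRIDAY) observation dated Friday, 08JAN1999 (a date chosen only for illustration) and finds the start of the corresponding WEEK.7 interval; the variable names are hypothetical.

```sas
/* Illustrative only: align a Friday week-ending date to the WEEK.7 interval. */
data _null_;
   fame_date = '08jan1999'd;   /* a Friday that ends a WEEKLY (FRIDAY) period */
   /* WEEK.7 intervals begin on Saturday; an increment of 0 stays in the same week. */
   sas_date = intnx('week.7', fame_date, 0, 'beginning');
   put fame_date= weekdate. sas_date= weekdate.;
run;
```

The second date written to the log should be the Saturday six days before the Friday, which is how the week would be dated in the SAS data set.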
Examples: SASEFAME Interface Engine

In this section, the examples were run on Windows, so the physical names used in the LIBNAME libref SASEFAME statement reflect the syntax necessary for that platform. In general, the Windows environments use backslashes in their pathnames, and the UNIX environments use forward slashes.

Example 34.1: Converting an Entire FAME Database

To enable conversion of all time series, no wildcard is specified, so the default “?” wildcard is used. Always consider both the number of time series and the number of observations generated by the conversion process. The converted series are stored in the FAME WORK database during the SAS DATA step. You can further limit your resulting SAS data set by using KEEP, DROP, or WHERE statements inside your DATA step. The following statements convert a FAME database and print out its contents:

   options pagesize=60 linesize=80 validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   libname famedir sasefame "%sysget(FAME_DATA)"
      convert=(freq=annual technique=constant);

   libname mydir "%sysget(FAME_TEMP)";

   data mydir.a;          /* add data set to mydir */
      set famedir.oecd1;  /* Read in oecd1.db data from the Organization */
                          /* For Economic Cooperation and Development    */
      where date between '01jan88'd and '31dec93'd;
   run;

   proc print data=mydir.a;
   run;
In the preceding example, the FAME database is called oecd1.db and it resides in the famedir directory. The DATA statement names the SAS output data set ’a’, which will reside in mydir. All time series in the FAME oecd1.db database will be converted to an annual frequency and stored in the mydir.a SAS data set. Since the time series variable names contain the special glue symbol ’.’, the SAS OPTIONS statement specifies VALIDVARNAME=ANY. Refer to the SAS Language Reference: Dictionary for more about this option. The FAME environment variable is the location of the FAME installation, and on Windows, the log would look like this:

   1    options validvarname=any;
   2    %let FAME=%sysget(FAME);
   3    %put(&FAME);
   (C:\PROGRA~1\FAME)
   4    %let FAMETEMP=%sysget(FAME_TEMP);
   5    %put(&FAMETEMP);
   (\\ge\U11\saskff\fametemp\)
   6
   7    libname famedir sasefame "&FAME\util"
   8       convert=(freq=annual technique=constant);
   NOTE: Libref FAMEDIR was successfully assigned as follows:
         Engine:        FAMECHLI
         Physical Name: C:\PROGRA~1\FAME\util
   9
   10   libname mydir '\\dntsrc\usrtmp\saskff';
   NOTE: Libref MYDIR was successfully assigned as follows:
         Engine:        V9
         Physical Name: \\dntsrc\usrtmp\saskff
   11
   12   data mydir.a; /* add data set to mydir */
   13   set famedir.oecd1;
   AUS.DIRDES -- SERIES (NUMERIC by ANNUAL)
   AUS.DIRDES copied to work data base as AUS.DIRDES.
For more about the glue DOT character, refer to “Glueing Names Together” in the User’s Guide to FAME. In the preceding log, the variable name AUS.DIRDES uses the glue DOT between AUS and DIRDES. The PROC PRINT statement creates Output 34.1.1 showing all of the observations in the mydir.a SAS data set.
Output 34.1.1 Listing of OUT=MYDIR.A of the OECD1 FAME Data

Obs  DATE  AUS.DIRDES  AUS.HERD  AUT.DIRDES  AUT.HERD  BEL.DIRDES  BEL.HERD  CAN.DIRDES  CAN.HERD
  1  1988         750   1072.90           .         .         374  16572.70     1589.60      2006
  2  1989           .         .           .         .           .  18310.70     1737.00      2214
  3  1990           .         .           .         .           .  18874.20     1859.20      2347
  4  1991           .         .           .         .           .         .     1959.60      2488

Obs  CHE.DIRDES  CHE.HERD  DEU.DIRDES  DEU.HERD  DNK.DIRDES  DNK.HERD  ESP.DIRDES  ESP.HERD
  1     632.100      1532     3538.60   8780.00     258.100      2662     508.200   55365.5
  2           .      1648     3777.20   9226.60     284.800      2951     623.600   69270.5
  3           .         .     2953.30   9700.00           .         .     723.600   78848.0
  4           .         .           .         .           .         .           .   89908.0

Obs  FIN.DIRDES  FIN.HERD  FRA.DIRDES  FRA.HERD  GBR.DIRDES  GBR.HERD  GRC.DIRDES  GRC.HERD
  1     247.700    1602.0     2573.50  19272.00     2627.00   1592.00      60.600   6674.50
  2     259.700    1725.5     2856.50  21347.80     2844.10   1774.20     119.800  14485.20
  3     271.000    1839.0     3005.20  22240.00           .         .           .         .
  4           .         .           .         .           .         .           .         .

Obs  IRL.DIRDES  IRL.HERD  ISL.DIRDES  ISL.HERD  ITA.DIRDES  ITA.HERD  JPN.DIRDES  JPN.HERD
  1     49.6000   37.0730           .         .     1861.50   2699927     9657.20   2014073
  2     50.2000   39.0130     10.3000   786.762     1968.00   2923504    10405.90   2129372
  3     51.7000         .     11.0000   902.498     2075.00   3183071           .   2296992
  4           .         .     11.8000   990.865     2137.80   3374000           .         .

Obs  NLD.DIRDES  NLD.HERD  NOR.DIRDES  NOR.HERD  NZL.DIRDES  NZL.HERD  PRT.DIRDES  PRT.HERD  SWE.DIRDES
  1         883      2105           .         .           .         .       111.5  10158.20           .
  2         945      2202     308.900   2771.40     78.7000   143.800           .         .        1076
  3           .         .           .         .           .         .           .         .           .
  4           .         .     352.000   3100.00           .         .           .         .           .

Obs  SWE.HERD  TUR.DIRDES  TUR.HERD  USA.DIRDES  USA.HERD  YUG.DIRDES  YUG.HERD
  1         .     174.400     74474    20246.20  20246.20     233.000     29.81
  2     11104     212.300    143951    22159.50  22159.50     205.100    375.22
  3         .           .         .    23556.10  23556.10           .   2588.50
  4         .           .         .    24953.80  24953.80           .         .
Example 34.2: Reading Time Series from the FAME Database

Use the FAME wildcard option to limit the number of series converted. For example, suppose you want to read only series starting with “WSPCA”. Output 34.2.1 and Output 34.2.2 show the results of the following sample code.

   options validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   libname lib1 sasefame "%sysget(FAME_DATA)"
      wildcard="wspca?"
      convert=(technique=constant freq=twicemonthly);

   libname lib2 "%sysget(FAME_TEMP)";

   data lib2.twild(label='Annual Series from the FAMEECON.db');
      set lib1.subecon;
      where date between '01jan93'd and '31dec93'd;
      /* keep only */
      keep date wspca;
   run;

   proc contents data=lib2.twild;
   run;

   proc print data=lib2.twild;
   run;
Output 34.2.1 Contents of OUT=LIB2.TWILD of the SUBECON FAME Data

The CONTENTS Procedure

Alphabetic List of Variables and Attributes

#  Variable  Type  Len  Format  Informat  Label
1  DATE      Num     8  DATE9.  9.        Date of Observation
2  WSPCA     Num     8                    STANDARD & POOR’S WEEKLY BOND YIELD: COMPOSITE, A
The WILDCARD=“WSPCA?” option limits reading to only those series whose names begin with WSPCA. The KEEP statement further restricts the SAS data set to include only the series named WSPCA and the DATE variable. The time interval used for the conversion is TWICEMONTHLY.

Output 34.2.2 Listing of OUT=LIB2.TWILD of the SUBECON FAME Data

Obs  DATE         WSPCA
  1  01JAN1993  8.59400
  2  16JAN1993  8.50562
  3  01FEB1993  8.47000
  4  16FEB1993  8.31000
  5  01MAR1993  8.27000
  6  16MAR1993  8.29250
  7  01APR1993  8.32400
  8  16APR1993  8.56333
  9  01MAY1993  8.37867
 10  16MAY1993  8.26312
 11  01JUN1993  8.21333
 12  16JUN1993  8.14400
 13  01JUL1993  8.09067
 14  16JUL1993  8.09937
 15  01AUG1993  7.98533
 16  16AUG1993  7.91600
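The dates in Output 34.2.2 fall on the 1st and 16th of each month because the FAME TWICEMONTHLY frequency maps to the SAS SEMIMONTH interval (see Table 34.3). As a small Base SAS sketch that does not use the SASEFAME engine, the first few interval start dates can be generated with INTNX; the loop variable and the number of dates shown are arbitrary choices.

```sas
/* Illustrative only: list the first four SEMIMONTH interval starts in 1993. */
data _null_;
   do i = 0 to 3;
      d = intnx('semimonth', '01jan1993'd, i);   /* start of the i-th following interval */
      put d= date9.;
   end;
run;
```

The log should show 01JAN1993, 16JAN1993, 01FEB1993, and 16FEB1993, matching the first four DATE values in the listing.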
Example 34.3: Writing Time Series to the SAS Data Set

You can use the KEEP or DROP statement to include or exclude certain series names from the SAS data set as shown in Output 34.3.1.

   options validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   libname famedir sasefame "%sysget(FAME_DATA)"
      convert=(freq=annual technique=constant);

   libname mydir "%sysget(FAME_TEMP)";

   data mydir.a;   /* add data set to mydir */
      set famedir.oecd1;
      drop 'ita.dirdes'n--'jpn.herd'n
           'tur.dirdes'n--'usa.herd'n;
      where date between '01jan88'd and '31dec93'd;
   run;

   title2 "OECD1: TECH=constant, FREQ=annual";
   title3 "drop using n-literals";
   proc print data=mydir.a;
   run;

Output 34.3.1 Listing of OUT=MYDIR.A of the OECD1 FAME Data

OECD1: TECH=constant, FREQ=annual
drop using n-literals

Obs  DATE  AUS.DIRDES  AUS.HERD  AUT.DIRDES  AUT.HERD  BEL.DIRDES  BEL.HERD  CAN.DIRDES  CAN.HERD
  1  1988         750   1072.90           .         .         374  16572.70     1589.60      2006
  2  1989           .         .           .         .           .  18310.70     1737.00      2214
  3  1990           .         .           .         .           .  18874.20     1859.20      2347
  4  1991           .         .           .         .           .         .     1959.60      2488

Obs  CHE.DIRDES  CHE.HERD  DEU.DIRDES  DEU.HERD  DNK.DIRDES  DNK.HERD  ESP.DIRDES  ESP.HERD
  1     632.100      1532     3538.60   8780.00     258.100      2662     508.200   55365.5
  2           .      1648     3777.20   9226.60     284.800      2951     623.600   69270.5
  3           .         .     2953.30   9700.00           .         .     723.600   78848.0
  4           .         .           .         .           .         .           .   89908.0

Obs  FIN.DIRDES  FIN.HERD  FRA.DIRDES  FRA.HERD  GBR.DIRDES  GBR.HERD  GRC.DIRDES  GRC.HERD
  1     247.700    1602.0     2573.50  19272.00     2627.00   1592.00      60.600   6674.50
  2     259.700    1725.5     2856.50  21347.80     2844.10   1774.20     119.800  14485.20
  3     271.000    1839.0     3005.20  22240.00           .         .           .         .
  4           .         .           .         .           .         .           .         .

Obs  IRL.DIRDES  IRL.HERD  ISL.DIRDES  ISL.HERD  NLD.DIRDES  NLD.HERD  NOR.DIRDES  NOR.HERD
  1     49.6000   37.0730           .         .         883      2105           .         .
  2     50.2000   39.0130     10.3000   786.762         945      2202     308.900   2771.40
  3     51.7000         .     11.0000   902.498           .         .           .         .
  4           .         .     11.8000   990.865           .         .     352.000   3100.00

Obs  NZL.DIRDES  NZL.HERD  PRT.DIRDES  PRT.HERD  SWE.DIRDES  SWE.HERD  YUG.DIRDES  YUG.HERD
  1           .         .       111.5  10158.20           .         .     233.000     29.81
  2     78.7000   143.800           .         .        1076     11104     205.100    375.22
  3           .         .           .         .           .         .           .   2588.50
  4           .         .           .         .           .         .           .         .
Note that the SAS option VALIDVARNAME=ANY was used at the beginning of this example because the time series names contain special characters. SAS variable names that contain such special characters are written as name literals (n-literals), a quoted string followed by the letter n, and are referenced in SAS code as shown in this example. You can rename your SAS variables by using the RENAME statement as follows. Output 34.3.2 shows how to use n-literals when selecting variables you want to keep, and how to rename some of your kept variables.

   options validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   libname famedir sasefame "%sysget(FAME_DATA)"
      convert=(freq=annual technique=constant);

   libname mydir "%sysget(FAME_TEMP)";

   data mydir.a;   /* add data set to mydir */
      set famedir.oecd1;
      /* keep and rename */
      keep date 'ita.dirdes'n--'jpn.herd'n
                'tur.dirdes'n--'usa.herd'n;
      rename 'ita.dirdes'n='italy.dirdes'n
             'jpn.dirdes'n='japan.dirdes'n
             'tur.dirdes'n='turkey.dirdes'n
             'usa.dirdes'n='united.states.of.america.dirdes'n;
   run;

   title3 "keep statement using n-literals";
   title4 "rename statement using n-literals";
   proc print data=mydir.a;
   run;
Output 34.3.2 Listing of OUT=MYDIR.A of the OECD1 FAME Data

OECD1: TECH=constant, FREQ=annual
keep statement using n-literals
rename statement using n-literals

Obs  DATE  italy.dirdes  ITA.HERD  japan.dirdes  JPN.HERD
  1  1985       1344.90   1751008       8065.70   1789780
  2  1986       1460.60   2004453       8290.10   1832575
  3  1987       1674.40   2362102       9120.80   1957921
  4  1988       1861.50   2699927       9657.20   2014073
  5  1989       1968.00   2923504      10405.90   2129372
  6  1990       2075.00   3183071             .   2296992
  7  1991       2137.80   3374000             .         .

Obs  turkey.dirdes  TUR.HERD  united.states.of.america.dirdes  USA.HERD
  1        144.800     22196                         14786.00  14786.00
  2        136.400     26957                         16566.90  16566.90
  3        121.900     32309                         18326.10  18326.10
  4        174.400     74474                         20246.20  20246.20
  5        212.300    143951                         22159.50  22159.50
  6              .         .                         23556.10  23556.10
  7              .         .                         24953.80  24953.80
Example 34.4: Limiting the Time Range of Data

You can also limit the time range of the data in the SAS data set by using the RANGE= option in the LIBNAME statement or the WHERE statement in the DATA step to process the time ID variable DATE only when it falls in the range you are interested in. All data for 1988, 1989, and 1990 are included in the SAS data set generated by using the RANGE='01JAN1988'D - '31DEC1990'D option or the WHERE DATE BETWEEN '01JAN88'D AND '31DEC90'D statement. The difference is that the RANGE= option uses less space in your FAME WORK database. If you have a very large database and want to use less space in your FAME WORK database while you are processing the oecd1 database, you should instead use the RANGE= option as shown in Output 34.4.1.

   options validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   libname famedir SASEFAME "%sysget(FAME_DATA)"
      convert=(freq=annual technique=constant)
      range='01jan1988'd - '31dec1990'd;

   libname mydir "%sysget(FAME_TEMP)";

   data mydir.a;   /* add data set to mydir */
      set famedir.oecd1;
      /* range on the libref restricts the dates *
       * read from famedir's oecd1 database      */
   run;

   proc print data=mydir.a;
   run;
Output 34.4.1 Listing of OUT=MYDIR.A of the OECD1 FAME Data Using RANGE= Option

OECD1: TECH=constant, FREQ=annual
keep statement using n-literals
rename statement using n-literals

Obs  DATE  AUS.DIRDES  AUS.HERD  AUT.DIRDES  AUT.HERD  BEL.DIRDES  BEL.HERD  CAN.DIRDES  CAN.HERD
  1  1988         750   1072.90           .         .         374  16572.70     1589.60      2006
  2  1989           .         .           .         .           .  18310.70     1737.00      2214
  3  1990           .         .           .         .           .  18874.20     1859.20      2347

Obs  CHE.DIRDES  CHE.HERD  DEU.DIRDES  DEU.HERD  DNK.DIRDES  DNK.HERD  ESP.DIRDES  ESP.HERD
  1     632.100      1532     3538.60   8780.00     258.100      2662     508.200   55365.5
  2           .      1648     3777.20   9226.60     284.800      2951     623.600   69270.5
  3           .         .     2953.30   9700.00           .         .     723.600   78848.0

Obs  FIN.DIRDES  FIN.HERD  FRA.DIRDES  FRA.HERD  GBR.DIRDES  GBR.HERD  GRC.DIRDES  GRC.HERD
  1     247.700    1602.0     2573.50  19272.00     2627.00   1592.00      60.600   6674.50
  2     259.700    1725.5     2856.50  21347.80     2844.10   1774.20     119.800  14485.20
  3     271.000    1839.0     3005.20  22240.00           .         .           .         .

Obs  IRL.DIRDES  IRL.HERD  ISL.DIRDES  ISL.HERD  ITA.DIRDES  ITA.HERD  JPN.DIRDES  JPN.HERD
  1     49.6000   37.0730           .         .      1861.5   2699927     9657.20   2014073
  2     50.2000   39.0130     10.3000   786.762      1968.0   2923504    10405.90   2129372
  3     51.7000         .     11.0000   902.498      2075.0   3183071           .   2296992

Obs  NLD.DIRDES  NLD.HERD  NOR.DIRDES  NOR.HERD  NZL.DIRDES  NZL.HERD  PRT.DIRDES  PRT.HERD  SWE.DIRDES
  1         883      2105           .         .           .         .       111.5  10158.20           .
  2         945      2202     308.900   2771.40     78.7000   143.800           .         .        1076
  3           .         .           .         .           .         .           .         .           .

Obs  SWE.HERD  TUR.DIRDES  TUR.HERD  USA.DIRDES  USA.HERD  YUG.DIRDES  YUG.HERD
  1         .     174.400     74474    20246.20  20246.20     233.000     29.81
  2     11104     212.300    143951    22159.50  22159.50     205.100    375.22
  3         .           .         .    23556.10  23556.10           .   2588.50
The WHERE statement can be used in the DATA step to process the time ID variable DATE only when it falls in the range you are interested in. In Output 34.4.2, you can see that the result from the WHERE statement is the same as the result in Output 34.4.1 using the RANGE= option.

   options validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   libname famedir SASEFAME "%sysget(FAME_DATA)"
      convert=(freq=annual technique=constant);

   libname mydir "%sysget(FAME_TEMP)";

   data mydir.a;   /* add data set to mydir */
      set famedir.oecd1;
      /* where only */
      where date between '01jan88'd and '31dec90'd;
   run;

   proc print data=mydir.a;
   run;
Output 34.4.2 Listing of OUT=MYDIR.A of the OECD1 FAME Data Using WHERE Statement

OECD1: TECH=constant, FREQ=annual
keep statement using n-literals
rename statement using n-literals

Obs  DATE  AUS.DIRDES  AUS.HERD  AUT.DIRDES  AUT.HERD  BEL.DIRDES  BEL.HERD  CAN.DIRDES  CAN.HERD
  1  1988         750   1072.90           .         .         374  16572.70     1589.60      2006
  2  1989           .         .           .         .           .  18310.70     1737.00      2214
  3  1990           .         .           .         .           .  18874.20     1859.20      2347

Obs  CHE.DIRDES  CHE.HERD  DEU.DIRDES  DEU.HERD  DNK.DIRDES  DNK.HERD  ESP.DIRDES  ESP.HERD
  1     632.100      1532     3538.60   8780.00     258.100      2662     508.200   55365.5
  2           .      1648     3777.20   9226.60     284.800      2951     623.600   69270.5
  3           .         .     2953.30   9700.00           .         .     723.600   78848.0

Obs  FIN.DIRDES  FIN.HERD  FRA.DIRDES  FRA.HERD  GBR.DIRDES  GBR.HERD  GRC.DIRDES  GRC.HERD
  1     247.700    1602.0     2573.50  19272.00     2627.00   1592.00      60.600   6674.50
  2     259.700    1725.5     2856.50  21347.80     2844.10   1774.20     119.800  14485.20
  3     271.000    1839.0     3005.20  22240.00           .         .           .         .

Obs  IRL.DIRDES  IRL.HERD  ISL.DIRDES  ISL.HERD  ITA.DIRDES  ITA.HERD  JPN.DIRDES  JPN.HERD
  1     49.6000   37.0730           .         .      1861.5   2699927     9657.20   2014073
  2     50.2000   39.0130     10.3000   786.762      1968.0   2923504    10405.90   2129372
  3     51.7000         .     11.0000   902.498      2075.0   3183071           .   2296992

Obs  NLD.DIRDES  NLD.HERD  NOR.DIRDES  NOR.HERD  NZL.DIRDES  NZL.HERD  PRT.DIRDES  PRT.HERD  SWE.DIRDES
  1         883      2105           .         .           .         .       111.5  10158.20           .
  2         945      2202     308.900   2771.40     78.7000   143.800           .         .        1076
  3           .         .           .         .           .         .           .         .           .

Obs  SWE.HERD  TUR.DIRDES  TUR.HERD  USA.DIRDES  USA.HERD  YUG.DIRDES  YUG.HERD
  1         .     174.400     74474    20246.20  20246.20     233.000     29.81
  2     11104     212.300    143951    22159.50  22159.50     205.100    375.22
  3         .           .         .    23556.10  23556.10           .   2588.50
Refer to the SAS Language Reference: Concepts for more information on KEEP, DROP, RENAME, and WHERE statements.
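The KEEP, RENAME, and WHERE statements work the same way on any SAS data set whose variable names are n-literals. The following Base SAS sketch needs no FAME database: the data sets DEMO and DEMO2 are hypothetical, and the three observations simply reuse the Italian values printed in the earlier listings.

```sas
options validvarname=any;   /* allow variable names that contain a dot */

/* Hypothetical input data set; values copied from the listings above. */
data demo;
   format date year4.;
   date = '01jan1988'd; 'ita.dirdes'n = 1861.50; 'ita.herd'n = 2699927; output;
   date = '01jan1989'd; 'ita.dirdes'n = 1968.00; 'ita.herd'n = 2923504; output;
   date = '01jan1990'd; 'ita.dirdes'n = 2075.00; 'ita.herd'n = 3183071; output;
run;

data demo2;
   set demo;
   where date between '01jan1988'd and '31dec1989'd;   /* subset by date  */
   keep date 'ita.dirdes'n;                            /* keep two columns */
   rename 'ita.dirdes'n = 'italy.dirdes'n;             /* rename on output */
run;

proc print data=demo2;
run;
```

KEEP refers to the variable by its original name because RENAME takes effect only when the output data set is written.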
Example 34.5: Creating a View Using the SQL Procedure and SASEFAME

The following statements create a view of OECD data by using the SQL procedure’s FROM and USING clauses as shown in Output 34.5.1. Refer to the Base SAS Procedures Guide for details on SQL views.

   title1 'famesql5: PROC SQL Dual Embedded Libraries w/ FAME option';
   options validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   title2 'OECD1: Dual Embedded Library Allocations with FAME Option';
   proc sql;
      create view fameview as
         select date, 'fin.herd'n
         from lib1.oecd1
         using libname lib1 sasefame "%sysget(FAME_DATA)"
                  convert=(tech=constant freq=annual),
               libname temp "%sysget(FAME_TEMP)";
   quit;

   title2 'OECD1: Print of View from Embedded Library with FAME Option';
   proc print data=fameview;
   run;
Output 34.5.1 Printout of the FAME View of OECD Data

famesql5: PROC SQL Dual Embedded Libraries w/ FAME option
OECD1: Print of View from Embedded Library with FAME Option

Obs  DATE  FIN.HERD
  1  1985   1097.00
  2  1986   1234.00
  3  1987   1401.30
  4  1988   1602.00
  5  1989   1725.50
  6  1990   1839.00
  7  1991         .
The following statements create a view of DRI Basic Economic data by using the SQL procedure’s FROM and USING clauses as shown in Output 34.5.2.

   title2 'SUBECON: Dual Embedded Library Allocations with FAME Option';
   options validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   proc sql;
      create view fameview as
         select date, gaa
         from lib1.subecon
         using libname lib1 sasefame "%sysget(FAME_DATA)"
                  convert=(tech=constant freq=annual),
               libname temp "%sysget(FAME_TEMP)";
   quit;

   title2 'SUBECON: Print of View from Embedded Library with FAME Option';
   proc print data=fameview;
   run;
Output 34.5.2 Printout of the FAME View of DRI Basic Economic Data

famesql5: PROC SQL Dual Embedded Libraries w/ FAME option
SUBECON: Print of View from Embedded Library with FAME Option

Obs  DATE        GAA
  1  1946          .
  2  1947          .
  3  1948   23174.00
  4  1949   19003.00
  5  1950   24960.00
  6  1951   21906.00
  7  1952   20246.00
  8  1953   20912.00
  9  1954   21056.00
 10  1955   27168.00
 11  1956   27638.00
 12  1957   26723.00
 13  1958   22929.00
 14  1959   29729.00
 15  1960   28444.00
 16  1961   28226.00
 17  1962   32396.00
 18  1963   34932.00
 19  1964   40024.00
 20  1965   47941.00
 21  1966   51429.00
 22  1967   49164.00
 23  1968   51208.00
 24  1969   49371.00
 25  1970   44034.00
 26  1971   52352.00
 27  1972   62644.00
 28  1973   81645.00
 29  1974   91028.00
 30  1975   89494.00
 31  1976  109492.00
 32  1977  130260.00
 33  1978  154357.00
 34  1979  173428.00
 35  1980  156096.00
 36  1981  147765.00
 37  1982  113216.00
 38  1983  133495.00
 39  1984  146448.00
 40  1985  128521.99
 41  1986  111337.99
 42  1987  160785.00
 43  1988  210532.00
 44  1989  201637.00
 45  1990  218702.00
 46  1991  210666.00
 47  1992          .
 48  1993          .
The following statements create a view of the DB77 database by using the SQL procedure’s FROM and USING clauses, as shown in Output 34.5.3.

   title2 'DB77: Dual Embedded Library Allocations with FAME Option';
   options validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   proc sql;
      create view fameview as
         select date, ann, 'qandom.x'n
         from lib1.db77
         using libname lib1 sasefame "%sysget(FAME_DATA)"
                  convert=(tech=constant freq=annual),
               libname temp "%sysget(FAME_TEMP)";
   quit;

   title2 'DB77: Print of View from Embedded Library with FAME Option';
   proc print data=fameview;
   run;
Output 34.5.3 Printout of the FAME View of DB77 Data

famesql5: PROC SQL Dual Embedded Libraries w/ FAME option
DB77: Print of View from Embedded Library with FAME Option

Obs  DATE  ANN  QANDOM.X
  1  1959    .   0.56147
  2  1960    .   0.51031
  3  1961    .         .
  4  1962    .         .
  5  1963    .         .
  6  1964    .         .
  7  1965    .         .
  8  1966    .         .
  9  1967    .         .
 10  1968    .         .
 11  1969    .         .
 12  1970    .         .
 13  1971    .         .
 14  1972    .         .
 15  1973    .         .
 16  1974    .         .
 17  1975    .         .
 18  1976    .         .
 19  1977    .         .
 20  1978    .         .
 21  1979    .         .
 22  1980  100         .
 23  1981  101         .
 24  1982  102         .
 25  1983  103         .
 26  1984  104         .
 27  1985  105         .
 28  1986  106         .
 29  1987  107         .
 30  1988  109         .
 31  1989  111         .
The following statements create a view of the DRI economic database by using the SQL procedure’s FROM and USING clauses, as shown in Output 34.5.4.

   title2 'DRIECON: Dual Embedded Library Allocations with FAME Option';
   options validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   proc sql;
      create view fameview as
         select date, husts
         from lib1.driecon
         using libname lib1 sasefame "%sysget(FAME_DATA)"
                  convert=(tech=constant freq=annual)
                  range='01jan1980'd - '01jan2006'd,
               libname temp "%sysget(FAME_TEMP)";
   quit;

   title2 'DRIECON: Print of View from Embedded Library with FAME Option';
   proc print data=fameview;
   run;
Note that the SAS option VALIDVARNAME=ANY was used at the beginning of this example due to special characters being present in the time series names. The output from this example shows how each FAME view is the output of the SASEFAME engine’s processing. Note that different engine options could have been used in the USING LIBNAME clause if desired.

Output 34.5.4 Printout of the FAME View of DRI Basic Economic Data

famesql5: PROC SQL Dual Embedded Libraries w/ FAME option
DRIECON: Print of View from Embedded Library with FAME Option

Obs  DATE    HUSTS
  1  1980  1.29990
  2  1981  1.09574
  3  1982  1.05862
  4  1983  1.70580
  5  1984  1.76351
  6  1985  1.74258
  7  1986  1.81205
  8  1987  1.62914
  9  1988  1.48748
 10  1989  1.38218
 11  1990  1.20161
 12  1991  1.00878
 13  1992  1.20159
 14  1993  1.29201
 15  1994  1.44669
 16  1995  1.36158
 17  1996  1.46952
 18  1997  1.47760
 19  1998  1.56250
Example 34.6: Reading Other FAME Data Objects with the FAMEOUT= Option

Suppose you want to see the source for your formulas or you are interested in string case series instead of numeric time series. You can designate the data objects that are output to your SAS data set by using the FAMEOUT= option. In this example, the FAMEOUT=FORMULA option selects the formulas and their source definitions to be output as shown in Output 34.6.1. Note that the RANGE= option is ignored since no time series are selected when FAMEOUT=FORMULA is specified.

   options validvarname=any ls=90;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   libname lib6 sasefame "%sysget(FAME_DATA)"
      fameout=formula
      convert=(frequency=business technique=constant)
      range='02jan1995'd - '25jul1997'd
      wildcard="?YIELD?";

   data crout;
      set lib6.training;
      keep 'S.GM.YIELD.A'n -- 'S.XON.YIELD.A'n;
   run;

   title2 'Contents of OUT=CROUT from the FAMEOUT=FORMULA Option';
   title3 'Using WILDCARD="?YIELD?"';
   proc contents data=crout;
   run;
Output 34.6.1 Contents of OUT=CROUT from the FAMEOUT=FORMULA Option of the TRAINING FAME Data

   famesql5: PROC SQL Dual Embedded Libraries w/ FAME option
   Contents of OUT=CROUT from the FAMEOUT=FORMULA Option
   Using WILDCARD="?YIELD?"

   The CONTENTS Procedure

   Alphabetic List of Variables and Attributes

    #    Variable            Type    Len
    1    S.GM.YIELD.A        Char     82
    2    S.GM__PP.YIELD.A    Char     82
    3    S.HWP.YIELD.A       Char     82
    4    S.IBM.YIELD.A       Char     82
    5    S.INDUT.YIELD.A     Char     82
    6    S.SPAL.YIELD.A      Char     82
    7    S.SPALN.YIELD.A     Char     82
    8    S.SUNW.YIELD.A      Char     82
    9    S.XOM.YIELD.A       Char     82
   10    S.XON.YIELD.A       Char     82
The FAMEOUT=FORMULA option restricts the SAS data set to include only formulas. The WILDCARD="?YIELD?" option further limits the selection to those formulas whose names contain "YIELD". The listing is shown in Output 34.6.2.

   options validvarname=any linesize=79;
   title2 'PROC PRINT of OUT=CROUT from the FAMEOUT=FORMULA Option';
   title3 'Using WILDCARD="?YIELD?"';
   proc print data=crout noobs;
   run;
Output 34.6.2 Listing of OUT=CROUT from the FAMEOUT=FORMULA Option of the TRAINING FAME Data

   famesql5: PROC SQL Dual Embedded Libraries w/ FAME option
   PROC PRINT of OUT=CROUT from the FAMEOUT=FORMULA Option
   Using WILDCARD="?YIELD?"

   S.GM.YIELD.A
   (%SPLC2TF(C37044210X01, IAD_DATE.H, IAD.H)/C37044210X01.CLOSE)*C37044210X01.ADJ

   S.GM__PP.YIELD.A
   (%SPLC2TF(C37044210X01, IAD_DATE.H, IAD.H)/C37044210X01.CLOSE)*C37044210X01.ADJ

   S.HWP.YIELD.A
   (%SPLC2TF(C42823610X01, IAD_DATE.H, IAD.H)/C42823610X01.CLOSE)*C42823610X01.ADJ

   S.IBM.YIELD.A
   (%SPLC2TF(C45920010X01, IAD_DATE.H, IAD.H)/C45920010X01.CLOSE)*C45920010X01.ADJ

   S.INDUT.YIELD.A
   (%SPLC2TF(C00000110X00, IAD_DATE.H, IAD.H)/C00000110X00.CLOSE)*C00000110X00.ADJ

   S.SPAL.YIELD.A
   (%SPLC2TF(C00000117X00, IAD_DATE.H, IAD.H)/C00000117X00.CLOSE)*C00000117X00.ADJ

   S.SPALN.YIELD.A
   (%SPLC2TF(C00000117X00, IAD_DATE.H, IAD.H)/C00000117X00.CLOSE)*C00000117X00.ADJ

   S.SUNW.YIELD.A
   (%SPLC2TF(C86681010X60, IAD_DATE.H, IAD.H)/C86681010X60.CLOSE)*C86681010X60.ADJ

   S.XOM.YIELD.A
   (%SPLC2TF(C30231G10X01, IAD_DATE.H, IAD.H)/C30231G10X01.CLOSE)*C30231G10X01.ADJ

   S.XON.YIELD.A
   (%SPLC2TF(C30231G10X01, IAD_DATE.H, IAD.H)/C30231G10X01.CLOSE)*C30231G10X01.ADJ

Example 34.7: Remote FAME Access Using FAME CHLI

Suppose you are running FAME in a client/server environment and have FAME CHLI capability that allows you access to your FAME server. You can access your remote FAME data by specifying, in your physical path, the port number of the TCP/IP service that is defined for your frdb_m and the node name of your FAME master server. In this example, the FAME server node name is STONES, and the port number is 5555, which was designated in the FAME master command. Refer to "Starting the Master Server" in the Guide to FAME Database Servers for more information about starting your FAME master server. The output is shown in Output 34.7.1.

   options ls=78;
   title1 "DRIECON Database, Using FAME with REMOTE ACCESS VIA CHLI";
   libname test1 sasefame '#5555@stones $FAME/util';

   data a;
      set test1.driecon;
      keep YP ZA ZB;
      where date between '01jan98'd and '31dec03'd;
   run;

   proc means data=a n;
   run;
Output 34.7.1 Summary Statistics for the Remote FAME Data

   DRIECON Database, Using FAME with REMOTE ACCESS VIA CHLI

   The MEANS Procedure

   Variable    Label                                         N
   -----------------------------------------------------------
   YP          PERSONAL INCOME                               5
   ZA          CORPORATE PROFITS AFTER TAX EXCLUDING IVA     4
   ZB          CORPORATE PROFITS BEFORE TAX EXCLUDING IVA    4
   -----------------------------------------------------------

Example 34.8: Selecting Time Series Using CROSSLIST= Option and KEEP Statement

This example shows how to use two FAME namelists to perform selection. Note that fame_namelist1 could be easily generated by using the FAME WILDLIST function. For more about WILDLIST, refer to "The WILDLIST Function" in the FAME Command Reference Volume 2, Functions. In the following statements, four tickers are listed in fame_namelist1 and crossed with the 11 price and volume attributes in the second namelist. When you use the KEEP statement, the resulting data set contains only the series for the desired IBM ticker, as shown in Output 34.8.1 and Output 34.8.2.

   libname lib8 sasefame "%sysget(FAME_DATA)"
           convert=(frequency=business technique=constant)
           crosslist=(
              { IBM,SPALN,SUNW,XOM },
              { adjust, close, high, low, open, volume,
                uclose, uhigh, ulow,uopen,uvolume }
           );

   data trout;
      /* four tickers in the crosslist; keep only the IBM ticker this time */
      set lib8.training;
      where date between '01mar02'd and '20mar02'd;
      keep IBM: ;
   run;

   title2 'Contents of OUT=trout for BYGROUP Tick=IBM.';
   proc contents data=trout;
   run;

   title2 'TRAINING DB, Pricing Timeseries for IBM Ticker in CROSSLIST=';
   title3 'OUT=TROUT from the PRINT Procedure';
   proc print data=trout;
   run;
Output 34.8.1 Contents of the IBM Time Series in the Training FAME Data

   DRIECON Database, Using FAME with REMOTE ACCESS VIA CHLI
   Contents of OUT=trout for BYGROUP Tick=IBM.

   The CONTENTS Procedure

   Alphabetic List of Variables and Attributes

    #    Variable       Type    Len
    1    IBM.ADJUST     Num       8
    2    IBM.CLOSE      Num       8
    3    IBM.HIGH       Num       8
    4    IBM.LOW        Num       8
    5    IBM.OPEN       Num       8
    6    IBM.UCLOSE     Num       8
    7    IBM.UHIGH      Num       8
    8    IBM.ULOW       Num       8
    9    IBM.UOPEN      Num       8
   10    IBM.UVOLUME    Num       8
   11    IBM.VOLUME     Num       8
Output 34.8.2 Listing of Ticker IBM Time Series in the Training FAME Data

   DRIECON Database, Using FAME with REMOTE ACCESS VIA CHLI
   TRAINING DB, Pricing Timeseries for IBM Ticker in CROSSLIST=
   OUT=TROUT from the PRINT Procedure

   Obs  IBM.ADJUST  IBM.CLOSE  IBM.HIGH  IBM.LOW  IBM.OPEN  IBM.UCLOSE
     1       1       103.020   103.100    98.500    98.600    103.020
     2       1       105.900   106.540   103.130   103.350    105.900
     3       1       105.670   106.500   104.160   104.250    105.670
     4       1       106.300   107.090   104.750   105.150    106.300
     5       1       103.710   107.500   103.240   107.300    103.710
     6       1       105.090   107.340   104.820   104.820    105.090
     7       1       105.240   105.970   103.600   104.350    105.240
     8       1       108.500   108.850   105.510   105.520    108.500
     9       1       107.180   108.650   106.700   108.300    107.180
    10       1       106.600   107.950   106.590   107.020    106.600
    11       1       106.790   107.450   105.590   106.550    106.790
    12       1       106.350   108.640   106.230   107.100    106.350
    13       1       107.490   108.050   106.490   106.850    107.490
    14       1       105.500   106.900   105.490   106.900    105.500

   Obs  IBM.UHIGH  IBM.ULOW  IBM.UOPEN  IBM.UVOLUME  IBM.VOLUME
     1   103.100     98.500    98.600      104890       104890
     2   106.540    103.130   103.350      107650       107650
     3   106.500    104.160   104.250       75617        75617
     4   107.090    104.750   105.150       76874        76874
     5   107.500    103.240   107.300      109720       109720
     6   107.340    104.820   104.820      107260       107260
     7   105.970    103.600   104.350       86391        86391
     8   108.850    105.510   105.520      110640       110640
     9   108.650    106.700   108.300       64086        64086
    10   107.950    106.590   107.020       53335        53335
    11   107.450    105.590   106.550      108640       108640
    12   108.640    106.230   107.100       53048        53048
    13   108.050    106.490   106.850       46148        46148
    14   106.900    105.490   106.900       48367        48367

Example 34.9: Selecting Time Series Using CROSSLIST= Option with a FAME Namelist of Tickers

This example demonstrates selection by using the CROSSLIST= option. Only the ticker "IBM" is specified in the KEEP statement from the 11 companies in the FAME ticker namelist. The results are shown in Output 34.9.1 and Output 34.9.2.

   libname lib9 sasefame "%sysget(FAME_DATA)"
           convert=(frequency=business technique=constant)
           range='07jul1997'd - '25jul1997'd
           crosslist=(
              nl(ticker),
              { adjust, close, high, low, open, volume,
                uclose, uhigh, ulow, uopen, uvolume }
           );

   data crout;
      /* eleven companies in the FAME ticker namelist */
      set lib9.training;
      keep IBM: ;
   run;

   title2 'TRAINING DB, Pricing Timeseries for eleven Tickers in CROSSLIST=';
   title3 'OUT=CROUT from the PRINT Procedure';
   proc print data=crout;
   run;

   title2 'Contents of OUT=crout from the FAME Crosslist function';
   title3 'Using TICKER namelist.';
   proc contents data=crout;
   run;
Output 34.9.1 Listing of OUT=CROUT Using CROSSLIST= Option in the Training FAME Data

   DRIECON Database, Using FAME with REMOTE ACCESS VIA CHLI
   Contents of OUT=crout from the FAME Crosslist function
   Using TICKER namelist.

   Obs  IBM.ADJUST  IBM.CLOSE  IBM.HIGH  IBM.LOW  IBM.OPEN  IBM.UCLOSE
     1      0.5      47.2500   47.7500   47.0000   47.5000     94.500
     2      0.5      47.8750   47.8750   47.2500   47.2500     95.750
     3      0.5      48.0938   48.3438   47.6563   48.0000     96.188
     4      0.5      47.8750   48.0938   47.0313   47.3438     95.750
     5      0.5      47.8750   48.6875   47.8125   47.9063     95.750
     6      0.5      47.6250   48.2188   47.0000   47.8125     95.250
     7      0.5      48.0000   48.1250   46.6875   47.4375     96.000
     8      0.5      48.8125   49.0000   47.6875   47.8750     97.625
     9      0.5      49.8125   50.8750   48.5625   48.9063     99.625
    10      0.5      52.2500   52.6250   50.0000   50.0000    104.500
    11      0.5      51.8750   53.1563   51.0938   52.6250    103.750
    12      0.5      51.5000   51.7500   49.6875   50.0313    103.000
    13      0.5      52.5625   53.5000   51.5938   52.1875    105.125
    14      0.5      53.9063   54.2188   52.2500   52.8125    107.813
    15      0.5      53.5000   54.2188   52.8125   53.9688    107.000

   Obs  IBM.UHIGH  IBM.ULOW  IBM.UOPEN  IBM.UVOLUME  IBM.VOLUME
     1    95.500     94.000    95.000      129012        64506
     2    95.750     94.500    94.500      102796        51398
     3    96.688     95.313    96.000      177276        88638
     4    96.188     94.063    94.688      127900        63950
     5    97.375     95.625    95.813      137724        68862
     6    96.438     94.000    95.625      128976        64488
     7    96.250     93.375    94.875      149612        74806
     8    98.000     95.375    95.750      215440       107720
     9   101.750     97.125    97.813      315504       157752
    10   105.250    100.000   100.000      463480       231740
    11   106.313    102.188   105.250      328184       164092
    12   103.500     99.375   100.063      368276       184138
    13   107.000    103.188   104.375      219880       109940
    14   108.438    104.500   105.625      204088       102044
    15   108.438    105.625   107.938      146600        73300
Output 34.9.2 Contents of OUT=CROUT Using CROSSLIST= Option in the Training FAME Data

   Alphabetic List of Variables and Attributes

    #    Variable       Type    Len
    1    IBM.ADJUST     Num       8
    2    IBM.CLOSE      Num       8
    3    IBM.HIGH       Num       8
    4    IBM.LOW        Num       8
    5    IBM.OPEN       Num       8
    6    IBM.UCLOSE     Num       8
    7    IBM.UHIGH      Num       8
    8    IBM.ULOW       Num       8
    9    IBM.UOPEN      Num       8
   10    IBM.UVOLUME    Num       8
   11    IBM.VOLUME     Num       8

Example 34.10: Selecting Time Series Using CROSSLIST= Option with INSET= and WHERE=TICK

Suppose that instead of having a FAME namelist with the tickers for the companies whose data you are interested in, you have an input SAS data set (INSET) that specifies the tickers to select. You can specify your selection by using the WHERE= clause of the INSET= option, as in the following statements. The results are shown in Output 34.10.1 and Output 34.10.2.

   data inseta;
      length tick $5;   /* need $5 so SPALN is not truncated */
      tick='AOL';    output;
      tick='C';      output;
      tick='CPQ';    output;
      tick='CVX';    output;
      tick='F';      output;
      tick='GM';     output;
      tick='HWP';    output;
      tick='IBM';    output;
      tick='SPALN';  output;
      tick='SUNW';   output;
      tick='XOM';    output;
   run;

   libname lib10 sasefame "%sysget(FAME_DATA)"
           convert=(frequency=business technique=constant)
           range='07jul1997'd - '25jul1997'd
           inset=( inseta where=tick )
           crosslist=
              ( {adjust, close, high, low, open, volume,
                 uclose, uhigh, ulow,uopen,uvolume} );

   data trout;
      /* eleven companies with unique TICKs specified in INSETA */
      set lib10.training;
      keep IBM: ;
   run;

   title2 'TRAINING DB, Pricing Timeseries for eleven Tickers in CROSSLIST=';
   title3 'OUT=TROUT from the PRINT Procedure';
   proc print data=trout;
   run;

   title2 'Contents of OUT=trout from the FAME Crosslist function';
   title3 'Using INSET with WHERE=TICK.';
   proc contents data=trout;
   run;
Output 34.10.1 Listing of OUT=TROUT Using CROSSLIST= and INSET= Options in the Training FAME Data

   DRIECON Database, Using FAME with REMOTE ACCESS VIA CHLI
   Contents of OUT=trout from the FAME Crosslist function
   Using INSET with WHERE=TICK.

   Obs  IBM.ADJUST  IBM.CLOSE  IBM.HIGH  IBM.LOW  IBM.OPEN  IBM.UCLOSE
     1      0.5      47.2500   47.7500   47.0000   47.5000     94.500
     2      0.5      47.8750   47.8750   47.2500   47.2500     95.750
     3      0.5      48.0938   48.3438   47.6563   48.0000     96.188
     4      0.5      47.8750   48.0938   47.0313   47.3438     95.750
     5      0.5      47.8750   48.6875   47.8125   47.9063     95.750
     6      0.5      47.6250   48.2188   47.0000   47.8125     95.250
     7      0.5      48.0000   48.1250   46.6875   47.4375     96.000
     8      0.5      48.8125   49.0000   47.6875   47.8750     97.625
     9      0.5      49.8125   50.8750   48.5625   48.9063     99.625
    10      0.5      52.2500   52.6250   50.0000   50.0000    104.500
    11      0.5      51.8750   53.1563   51.0938   52.6250    103.750
    12      0.5      51.5000   51.7500   49.6875   50.0313    103.000
    13      0.5      52.5625   53.5000   51.5938   52.1875    105.125
    14      0.5      53.9063   54.2188   52.2500   52.8125    107.813
    15      0.5      53.5000   54.2188   52.8125   53.9688    107.000

   Obs  IBM.UHIGH  IBM.ULOW  IBM.UOPEN  IBM.UVOLUME  IBM.VOLUME
     1    95.500     94.000    95.000      129012        64506
     2    95.750     94.500    94.500      102796        51398
     3    96.688     95.313    96.000      177276        88638
     4    96.188     94.063    94.688      127900        63950
     5    97.375     95.625    95.813      137724        68862
     6    96.438     94.000    95.625      128976        64488
     7    96.250     93.375    94.875      149612        74806
     8    98.000     95.375    95.750      215440       107720
     9   101.750     97.125    97.813      315504       157752
    10   105.250    100.000   100.000      463480       231740
    11   106.313    102.188   105.250      328184       164092
    12   103.500     99.375   100.063      368276       184138
    13   107.000    103.188   104.375      219880       109940
    14   108.438    104.500   105.625      204088       102044
    15   108.438    105.625   107.938      146600        73300
Output 34.10.2 Contents of OUT=TROUT Using CROSSLIST= and INSET= Options in the Training FAME Data

   Alphabetic List of Variables and Attributes

    #    Variable       Type    Len
    1    IBM.ADJUST     Num       8
    2    IBM.CLOSE      Num       8
    3    IBM.HIGH       Num       8
    4    IBM.LOW        Num       8
    5    IBM.OPEN       Num       8
    6    IBM.UCLOSE     Num       8
    7    IBM.UHIGH      Num       8
    8    IBM.ULOW       Num       8
    9    IBM.UOPEN      Num       8
   10    IBM.UVOLUME    Num       8
   11    IBM.VOLUME     Num       8

Example 34.11: Selecting Series Using FAMEOUT= Option

This example shows how to read case series instead of time series. Case series can be numeric, boolean, string, date, or formulas that resolve to series. SASEFAME resolves all formulas that belong to the type of series data object that you specify in your FAMEOUT= option. If no object type is specified, the FAMEOUT= option defaults to time series.

The first case writes all boolean case series to your SAS data set, the second selects numeric case series, the third writes all date case series, and the fourth selects all string case series. The last case outputs the source for all the formula case series in the ALLTYPES database.

In the first case, suppose you prefer to extract all boolean case series from your FAME database. The following statements write all boolean case series to your SAS data set. The results are shown in Output 34.11.1 and Output 34.11.2.

   title1 '***famallt: FAMEOUT option, Different Type Values***';
   options validvarname=any;
   %let FAME=%sysget(FAME);
   %put(&FAME);
   %let FAMETEMP=%sysget(FAME_TEMP);
   %put(&FAMETEMP);

   libname lib4 sasefame "%sysget(FAME_DATA)"
           fameout=boolcase
           wildcard="?" ;

   data booout;
      set lib4.alltypes;
   run;

   title2 'Contents of OUT=booout';
   title3 'Using FAMEOUT=CASE BOOLEAN option without range';
   proc contents data=booout;
   run;

   title2 'ALLTYPES FAMEOUT=BOOLCASE and open wildcard for BOOLEAN CASE Series';
   title3 'OUT=BOOOUT from the PRINT Procedure';
   proc print data=booout;
   run;
Output 34.11.1 Contents of OUT=BOOOUT Using FAMEOUT=BOOLCASE and Open Wildcard for Boolean Case Series

   ***famallt: FAMEOUT option, Different Type Values***
   Contents of OUT=booout
   Using FAMEOUT=CASE BOOLEAN option without range

   The CONTENTS Procedure

   Alphabetic List of Variables and Attributes

    #    Variable    Type    Len
    1    BOO0        Num       8
    2    BOO1        Num       8
    3    BOO2        Num       8
    4    BOOM        Num       8
    5    BOO_RES     Num       8
Output 34.11.2 Listing of OUT=BOOOUT Using FAMEOUT=BOOLCASE and Open Wildcard for Boolean Case Series

   ***famallt: FAMEOUT option, Different Type Values***
   ALLTYPES FAMEOUT=BOOLCASE and open wildcard for BOOLEAN CASE Series
   OUT=BOOOUT from the PRINT Procedure

   Obs    BOO0    BOO1    BOO2    BOOM    BOO_RES
     1      0       1       0       1        .
     2      0       0       1       0        .
     3      0       0       0     251        .
     4      0       1       1       1        .
     5      0       1       0       1        .
     6      0       0       .       0        .
     7      0       0       .       0        .
     8      0       1       .       1        .
     9      0       .       0       .        .
    10      0       .       .       .        .
    11      1       .       .       .        .
    12      1       .       .       .        .
    13      1       .       1       .        .
    14      1       .       .       .        .
    15      1       .       .       .        .
    16      1       .       .       .        .
    17      1       .       0       .        .
    18      1       .       .       .        .
    19      1       .       .       .        .
    20      1       .       .       .        .

Suppose instead of boolean case series, you prefer to see numeric case series. Case series can be numeric or boolean or string series or date series. In addition to the existing case series in your FAME database, you can have formulas that resolve to numeric case series. SASEFAME will resolve all formulas that belong to the class and type of series data object that you specify in your FAMEOUT= option. The following statements write all numeric case series to your SAS data set. The results are shown in Output 34.11.3 and Output 34.11.4.

   libname lib5 sasefame "%sysget(FAME_DATA)"
           fameout=case
           wildcard="?" ;

   data csout;
      set lib5.alltypes;
   run;

   title2 'Contents of OUT=csout';
   title3 'Using FAMEOUT=CASE option without range';
   proc contents data=csout;
   run;

   title2 'ALLTYPES, FAMEOUT=CASE and open wildcard for Numeric Case Series';
   title3 'OUT=CSOUT from the PRINT Procedure';
   proc print data=csout;
   run;
Output 34.11.3 Contents of OUT=CSOUT Using FAMEOUT=CASE and Open Wildcard for Numeric Case Series

   ***famallt: FAMEOUT option, Different Type Values***
   Contents of OUT=csout
   Using FAMEOUT=CASE option without range

   The CONTENTS Procedure

   Alphabetic List of Variables and Attributes

    #    Variable    Type    Len
    1    FRM1        Num       8
    2    NUM0        Num       8
    3    NUM1        Num       8
    4    NUM2        Num       8
    5    NUMM        Num       8
    6    NUM_RES     Num       8
    7    PRC0        Num       8
    8    PRC1        Num       8
    9    PRC2        Num       8
   10    PRCM        Num       8
   11    PRC_RES     Num       8
Output 34.11.4 Listing of OUT=CSOUT Using FAMEOUT=CASE and Open Wildcard for Numeric Case Series

   ***famallt: FAMEOUT option, Different Type Values***
   ALLTYPES, FAMEOUT=CASE and open wildcard for Numeric Case Series
   OUT=CSOUT from the PRINT Procedure

   Obs     FRM1  NUM0  NUM1      NUM2       NUMM  NUM_RES  PRC0  PRC1      PRC2       PRCM  PRC_RES
     1  0.00000    -9     0   1.33333          0        .   -18     0   1.33333          0        .
     2  1.00000    -8     1   1.00000          1        .   -16     1   1.00000          1        .
     3  0.66667    -7     2   0.66667  1.7014E38        .   -14     2   0.66667  1.7014E38        .
     4  3.00000    -6     3   0.33333          3        .   -12     3   0.33333          3        .
     5  4.00000    -5     4   0.00000          4        .   -10     4   0.00000          4        .
     6        .    -4     5         .          5        .    -8     5         .          5        .
     7        .    -3     6         .          6        .    -6     6         .          6        .
     8  7.00000    -2     7         .          7        .    -4     7         .          7        .
     9        .    -1     .  -1.33333          .        .    -2     .  -1.33333          .        .
    10        .     0     .         .          .        .     0     .         .          .        .
    11        .     1     .         .          .        .     2     .         .          .        .
    12        .     2     .         .          .        .     4     .         .          .        .
    13        .     3     .  -2.66667          .        .     6     .  -2.66667          .        .
    14        .     4     .         .          .        .     8     .         .          .        .
    15        .     5     .         .          .        .    10     .         .          .        .
    16        .     6     .         .          .        .    12     .         .          .        .
    17        .     7     .  -4.00000          .        .    14     .  -4.00000          .        .
    18        .     8     .         .          .        .    16     .         .          .        .
    19        .     9     .         .          .        .    18     .         .          .        .
    20        .    10     .         .          .        .    20     .         .          .        .

Instead of numeric case series, you could decide to extract date case series. Case series can be numeric or boolean or string series or date series. In addition to the existing case series in your FAME database, you can have formulas that resolve to date case series. SASEFAME will resolve all formulas that belong to the class and type of series data object that you specify in your FAMEOUT= option. The following statements write all date case series to your SAS data set. The results are shown in Output 34.11.5 and Output 34.11.6.

   libname lib6 sasefame "%sysget(FAME_DATA)"
           fameout=datecase
           wildcard="?" ;

   data cdout;
      set lib6.alltypes;
   run;

   title2 'Contents of OUT=cdout';
   title3 'Using FAMEOUT=DATECASE option without range';
   proc contents data=cdout;
   run;

   title2 'ALLTYPES, FAMEOUT=DATECASE and open wildcard for Date Case Series';
   title3 'OUT=CDOUT from the PRINT Procedure';
   proc print data=cdout;
   run;
Output 34.11.5 Contents of OUT=CDOUT Using FAMEOUT=DATECASE

   ***famallt: FAMEOUT option, Different Type Values***
   Contents of OUT=cdout
   Using FAMEOUT=DATECASE option without range

   The CONTENTS Procedure

   Alphabetic List of Variables and Attributes

    #    Variable    Type    Len    Format    Informat
    1    DAT0        Num       8    YEAR4.    4.
    2    DAT1        Num       8    YEAR4.    4.
    3    DAT2        Num       8    YEAR4.    4.
    4    DATM        Num       8    YEAR4.    4.
    5    DAT_RES     Num       8    YEAR4.    4.
    6    FRM2        Num       8    YEAR4.    4.
Output 34.11.6 Listing of OUT=CDOUT Using FAMEOUT=DATECASE

   ***famallt: FAMEOUT option, Different Type Values***
   ALLTYPES, FAMEOUT=DATECASE and open wildcard for Date Case Series
   OUT=CDOUT from the PRINT Procedure

   Obs    DAT0    DAT1    DAT2    DATM    DAT_RES    FRM2
     1    1991    1981    1987    1981       .       1987
     2    1992    1982    1986    1982       .       1986
     3    1993    1983    1985    1983       .       1985
     4    1994    1984    1984    1984       .       1984
     5    1995    1985    1983    1985       .       1983
     6    1996    1986       .    1986       .          .
     7    1997    1987       .    1987       .          .
     8    1998    1988       .    1988       .          .
     9    1999       .    1979       .       .       1979
    10    2000       .       .       .       .          .
    11    2001       .       .       .       .          .
    12    2002       .       .       .       .          .
    13    2003       .    1975       .       .          .
    14    2004       .       .       .       .          .
    15    2005       .       .       .       .          .
    16    2006       .       .       .       .          .
    17    2007       .    1971       .       .          .
    18    2008       .       .       .       .          .
    19    2009       .       .       .       .          .
    20    2010       .       .       .       .          .

The next case shows how to extract string case series. Case series can be numeric or boolean or string series or date series. In addition to the existing string case series in your FAME database, you can have formulas that resolve to string case series. SASEFAME will resolve all formulas that belong to the class and type of series data object that you specify in your FAMEOUT= option. The following statements write all string case series to your SAS data set. The results are shown in Output 34.11.7 and Output 34.11.8.

   libname lib7 sasefame "%sysget(FAME_DATA)"
           fameout=stringcase
           wildcard="?" ;

   data cstrout;
      set lib7.alltypes;
   run;

   title2 'Contents of OUT=cstrout';
   title3 'Using FAMEOUT=STRINGCASE option without range';
   proc contents data=cstrout;
   run;

   title2 'ALLTYPES, FAMEOUT=STRINGCASE and open wildcard for STRING CASE Series';
   title3 'OUT=CSTROUT from the PRINT Procedure';
   proc print data=cstrout;
   run;
Output 34.11.7 Contents of OUT=CSTROUT Using FAMEOUT=STRINGCASE and Open Wildcard for String Case Series

   ***famallt: FAMEOUT option, Different Type Values***
   Contents of OUT=cstrout
   Using FAMEOUT=STRINGCASE option without range

   The CONTENTS Procedure

   Alphabetic List of Variables and Attributes

    #    Variable    Type    Len
    1    STR0        Char     16
    2    STR1        Char     16
    3    STR2        Char     16
    4    STRM        Char     16
Output 34.11.8 Listing of OUT=CSTROUT Using FAMEOUT=STRINGCASE and Open Wildcard for String Case Series ***famallt: FAMEOUT option, Different Type Values*** ALLTYPES, FAMEOUT=STRINGCASE and open wildcard for STRING CASE Series OUT=CSTROUT from the PRINT Procedure Obs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
STR0 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10
STR1 0 1 2 3 4 5 6 7
STR2 1.333333 1 0.6666667 0.3333333 0
STRM 0 1 2 3 4 5 7
-1.333333
-2.666667
-4

Suppose you prefer to extract all the source for the formulas in your FAME database. The following statements show the source of all formulas written to your SAS data set. The results are shown in Output 34.11.9 and Output 34.11.10. Another example of the FAMEOUT=FORMULA option is shown in Example 34.6.

   libname lib8 sasefame "%sysget(FAME_DATA)"
           fameout=formula
           wildcard="?" ;

   data cforout;
      set lib8.alltypes;
   run;

   title2 'Contents of OUT=cforout';
   title3 'Using FAMEOUT=FORMULA option without range';
   proc contents data=cforout;
   run;
Output 34.11.9 Contents of OUT=CFOROUT Using FAMEOUT=FORMULA and Open Wildcard

   ***famallt: FAMEOUT option, Different Type Values***
   Contents of OUT=cforout
   Using FAMEOUT=FORMULA option without range

   The CONTENTS Procedure

   Alphabetic List of Variables and Attributes

    #    Variable    Type    Len
    1    S.DFRM      Char     27
    2    S.FRM1      Char     27
    3    S.FRM2      Char     27

   title2 'ALLTYPES, FAMEOUT=FORMULA and open wildcard for FORMULA Series';
   title3 'OUT=CFOROUT from the PRINT Procedure';
   proc print data=cforout noobs;
   run;
Output 34.11.10 Listing of OUT=CFOROUT Using FAMEOUT=FORMULA and Open Wildcard

   ***famallt: FAMEOUT option, Different Type Values***
   ALLTYPES, FAMEOUT=FORMULA and open wildcard for FORMULA Series
   OUT=CFOROUT from the PRINT Procedure

   S.DFRM                         S.FRM1                         S.FRM2
   IF DBOO THEN DPRC ELSE DNUM    IF BOO1 THEN NUM1 ELSE NUM2    IF BOO0 THEN DAT1 ELSE DAT2
If you want all series of every type, you can merge the resulting data sets together. For more about merging SAS data sets, see SAS Language Reference: Concepts.
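
As a minimal sketch of such a merge, the following statements combine the case-series data sets created earlier in this example. The sketch assumes that BOOOUT, CSOUT, CDOUT, and CSTROUT are still available in the WORK library and that their observations line up by case index, which is an assumption because no BY variable is used. The output data set name ALLCASE is hypothetical. CFOROUT is left out because it holds formula source text rather than one observation per case.

```sas
/* Sketch only: ALLCASE is a hypothetical output data set name.        */
/* A MERGE without a BY statement is a one-to-one merge: observation n */
/* of each input data set is combined into observation n of ALLCASE.   */
data allcase;
   merge booout csout cdout cstrout;
run;

/* Confirm that variables of every type are present */
proc contents data=allcase;
run;
```

The variable names in the four input data sets do not overlap in the outputs shown above, so no values are overwritten in this particular merge. If your own databases contain objects with the same name in more than one data set, rename the variables before merging.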

References: SASEFAME Interface Engine

DRI/McGraw-Hill (1997), DataLink, Lexington, MA.

DRI/McGraw-Hill Data Search and Retrieval for Windows (1996), DRIPRO User's Guide, Lexington, MA.

SunGard Data Management Solutions (1998), Guide to FAME Database Servers, 888 Seventh Avenue, 12th Floor, New York, NY 10106 USA [http://www.fame.sungard.com/support.html], [http://www.data.sungard.com/].

SunGard Data Management Solutions (1995), User's Guide to FAME, Ann Arbor, MI [http://www.fame.sungard.com/support.html].

SunGard Data Management Solutions (1995), Reference Guide to Seamless C HLI, Ann Arbor, MI [http://www.fame.sungard.com/support.html].

SunGard Data Management Solutions (1995), Command Reference for Release 7.6, Vols. 1 and 2, Ann Arbor, MI [http://www.fame.sungard.com/support.html].

Organization For Economic Cooperation and Development (1992), Annual National Accounts: Volume I. Main Aggregates Content Documentation for Magnetic Tape Subscription, Paris, France.

Organization For Economic Cooperation and Development (1992), Annual National Accounts: Volume II. Detailed Tables Technical Documentation for Magnetic Tape Subscription, Paris, France.

Organization For Economic Cooperation and Development (1992), Main Economic Indicators Database Note, Paris, France.

Organization For Economic Cooperation and Development (1992), Main Economic Indicators Inventory, Paris, France.

Organization For Economic Cooperation and Development (1992), Main Economic Indicators OECD Statistics on Magnetic Tape Document, Paris, France.

Organization For Economic Cooperation and Development (1992), OECD Statistical Information Research and Inquiry System Magnetic Tape Format Documentation, Paris, France.

Organization For Economic Cooperation and Development (1992), Quarterly National Accounts Inventory of Series Codes, Paris, France.

Organization For Economic Cooperation and Development (1992), Quarterly National Accounts Technical Documentation, Paris, France.

Acknowledgments

Many people have been instrumental in the development of the ETS Interface engine. The individuals listed here have been especially helpful.

Jeff Kaplan, SunGard Data Management Solutions, Ann Arbor, MI.
Rick Langston, SAS Institute, Cary, NC.
Kelly Fellingham, SAS Institute, Cary, NC.

The final responsibility for the SAS System lies with SAS Institute alone. We hope that you will always let us know your opinions about the SAS System and its documentation. It is through your participation that SAS software is continuously improved.

Chapter 35
The SASEHAVR Interface Engine

Contents

   Overview: SASEHAVR Interface Engine
   Getting Started: SASEHAVR Interface Engine
      Structure of a SAS Data Set Containing Time Series Data
      Reading and Converting HAVER DLX Time Series
      Using the SAS DATA Step
      Using the SAS Windowing Environment
   Syntax: SASEHAVR Interface Engine
      The LIBNAME libref SASEHAVR Statement
   Details: SASEHAVR Interface Engine
      The SAS Output Data Set
      Mapping HAVER Frequencies to SAS Time Intervals
      Error Recovery for SASEHAVR
   Examples: SASEHAVR Interface Engine
      Example 35.1: Examining the Contents of a HAVER Database
      Example 35.2: Viewing Quarterly Time Series from a HAVER Database
      Example 35.3: Viewing Monthly Time Series from a HAVER Database
      Example 35.4: Viewing Weekly Time Series from a HAVER Database
      Example 35.5: Viewing Daily Time Series from a HAVER Database
      Example 35.6: Limiting the Range of Time Series from a HAVER Database
      Example 35.7: Using the WHERE Statement to Subset Time Series from a HAVER Database
      Example 35.8: Using the KEEP Option to Subset Time Series from a HAVER Database
      Example 35.9: Using the SOURCE Option to Subset Time Series from a HAVER Database
      Example 35.10: Using the GROUP Option to Subset Time Series from a HAVER Database
   Data Elements Reference: SASEHAVR Interface Engine
      HAVER ANALYTICS DLX Database Profile
   References
   Acknowledgments

Overview: SASEHAVR Interface Engine

The SASEHAVR interface engine is a seamless interface between HAVER and SAS data processing that enables SAS users to read economic and financial time series data residing in a HAVER ANALYTICS DLX (Data Link Express) database. The HAVER ANALYTICS DLX economic and financial database offerings include United States Economic Indicators, Specialized Databases, Financial Indicators, Industry, Industrial Countries, Emerging Markets, International Organizations, Forecasts and As Reported Data, and United States Regional. For more details, see "Data Elements Reference: SASEHAVR Interface Engine" on page 2369.

The SASEHAVR engine uses the LIBNAME statement to enable you to specify how you would like to subset your HAVER data and how you would like to aggregate the selected time series to the same frequency. The SAS DATA step can then be used to perform further subsetting and to store the resulting time series in a SAS data set. You can perform more analysis if desired either in the same SAS session or in another session at a later time.

Getting Started: SASEHAVR Interface Engine

Structure of a SAS Data Set Containing Time Series Data

SAS represents time series data in a two-dimensional array called a SAS data set whose columns correspond to series variables and whose rows correspond to measurements of these variables at certain time periods. The time periods at which observations are recorded can be included in the data set as a time ID variable. The SASEHAVR engine provides a time ID variable called DATE. The DATE variable can be represented in any of the time intervals shown in the section "Mapping HAVER Frequencies to SAS Time Intervals" on page 2346.

Reading and Converting HAVER DLX Time Series

The SASEHAVR engine supports reading and converting time series stored in HAVER DLX databases. The SASEHAVR engine enables you to limit the range of data with the START= and END= LIBNAME options. Specifying start dates and end dates in the LIBNAME statement is recommended because it saves resources when you process large databases or a large number of observations.

The SASEHAVR engine enables you to convert or aggregate all selected time series to a desired frequency. By default, SASEHAVR selects the time series variables that match the frequency of the first selected variable. To select variables of one specific frequency, use the FREQ= option. If no selection criteria are specified, the first selected variable is the first physical DLXRecord read from the HAVER database. To force aggregation of all selected variables to the frequency specified by the FREQ= option, use the FORCE=FREQ option. Aggregation is supported only from a more frequent time interval to a less frequent time interval, such as from weekly to monthly. If you attempt a conversion to a higher frequency, the HAVER DLX API (application programming interface) returns missing values for all observations. See "Aggregating to Quarterly Frequency Using the FORCE=FREQ Option" on page 2350. The FORCE= option is ignored if the FREQ= option is not specified.
Using the SAS DATA Step

If desired, you can store your selected time series in a SAS data set by using the SAS DATA step. You can further subset your data by using the WHERE, KEEP, or DROP statements in your DATA step. For more efficient subsetting of time series by HAVER variables, by HAVER groups, or by HAVER sources, you can use the corresponding KEEP=, GROUP=, and SOURCE= options on the LIBNAME libref SASEHAVR statement.

For your convenience, wildcarding is supported in these options. There are three wildcard symbols: '*', '?', and '#'. The '*' wildcard corresponds to any character string and will include any string pattern that corresponds to that position in the matching variable name. The '?' stands for any single alphanumeric character. Lastly, the '#' wildcard corresponds to a single numeric character.

You can also deselect time series by HAVER variables, by HAVER groups, or by HAVER sources, by using the corresponding DROP=, DROPGROUP=, or DROPSOURCE= option. These options also support wildcarding to facilitate deselection based on variable names, group names, or source names. Once your selected data is stored in a SAS data set, you can use it as you would any other SAS data set.
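The wildcard rules described in this section can be illustrated outside the engine with ordinary pattern matching. The following DATA steps are a minimal sketch, not the engine's implementation: the series names are hypothetical, and the Perl regular expressions are only an assumed translation of '*' (any string), '?' (one alphanumeric character), and '#' (one digit).

```sas
/* Hypothetical series names, used only to illustrate the wildcard rules. */
data names;
   length name $8;
   input name $;
   datalines;
ABC1
ABCDEF
XYZ12
XYZ1
1T2
AT2
;
run;

data matched;
   set names;
   /* 'ABC*'  : ABC followed by any character string */
   match_abc = (prxmatch('/^ABC.*$/', strip(name)) > 0);
   /* 'XYZ??' : XYZ followed by exactly two alphanumeric characters */
   match_xyz = (prxmatch('/^XYZ[A-Za-z0-9]{2}$/', strip(name)) > 0);
   /* '#T#'   : one digit, the letter T, one digit */
   match_t   = (prxmatch('/^[0-9]T[0-9]$/', strip(name)) > 0);
run;

proc print data=matched;
run;
```

Each flag is 1 when the name fits the corresponding pattern and 0 otherwise, which makes it easy to check which names a KEEP= or DROP= pattern would be expected to select under these assumptions.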
Using the SAS Windowing Environment

You can see the available data sets in the LIBNAME window of the SAS windowing environment by selecting the SASEHAVR libref that you previously defined in your LIBNAME statement. You can view your SAS output observations by double-clicking the desired output data set libref in the LIBNAME window. You can type Viewtable on the SAS command line to view your SASEHAVR tables, views, or librefs.

When you use Viewtable to view output data sets, it is recommended that you store your output data sets in a physical folder or library that is separate from your input databases. When this guideline is followed, the default location for output data sets is SASWORK. When this guideline is not followed, errors such as the following are likely to be encountered for some of the input databases being viewed:
2342 F Chapter 35: The SASEHAVR Interface Engine
   ERROR: No variable selected with current options.
   ERROR: No variable selected with current options.
   ERROR: No variable selected with current options.
   ERROR: No variable selected with current options.
   ERROR: No variable selected with current options.
When this happens, you get one error message for each input database that does not match the options specified on the SASEHAVR libref that you double-clicked.
Syntax: SASEHAVR Interface Engine

The SASEHAVR engine uses standard engine syntax. Options used by SASEHAVR are summarized in the following table.
Table 35.1 Summary of LIBNAME libref SASEHAVR Statement Options

   Option         Description
   FREQ=          specifies the HAVER frequency
   FREQUENCY=     specifies the HAVER frequency
   INTERVAL=      specifies the HAVER frequency
   START=         specifies a HAVER start date to limit the selection of time series that begins with the specified date
   STARTDATE=     specifies a HAVER start date to limit the selection of time series that begins with the specified date
   STDATE=        specifies a HAVER start date to limit the selection of time series that begins with the specified date
   BEGIN=         specifies a HAVER start date to limit the selection of time series that begins with the specified date
   END=           specifies a HAVER end date to limit the selection of time series that ends with the specified date
   ENDDATE=       specifies a HAVER end date to limit the selection of time series that ends with the specified date
   ENDATE=        specifies a HAVER end date to limit the selection of time series that ends with the specified date
   KEEP=          specifies a list of comma-delimited HAVER variables to keep in the output SAS data set
   DROP=          specifies a list of comma-delimited HAVER variables to drop in the output SAS data set
   GROUP=         specifies a list of comma-delimited HAVER groups to keep in the output SAS data set
   KEEPGROUP=     specifies a list of comma-delimited HAVER groups to keep in the output SAS data set
   DROPGROUP=     specifies a list of comma-delimited HAVER groups to drop in the output SAS data set
   SOURCE=        specifies a list of comma-delimited HAVER sources to keep in the output SAS data set
   KEEPSOURCE=    specifies a list of comma-delimited HAVER sources to keep in the output SAS data set
   DROPSOURCE=    specifies a list of comma-delimited HAVER sources to drop in the output SAS data set
   FORCE= FREQ    specifies that all selected variables should be aggregated to the frequency specified in the FREQ= option
The LIBNAME libref SASEHAVR Statement

   LIBNAME libref sasehavr 'physical name' options ;
The physical name specifies the location of the folder where your HAVER DLX database resides.
The following options can be used in the LIBNAME libref SASEHAVR statement:

FREQ=haver_frequency
   specifies the HAVER frequency. All HAVER frequencies are supported by the SASEHAVR engine. Accepted frequency values are annual, year, yearly, quarter, quarterly, qtr, monthly, month, mon, week.1, week.2, week.3, week.4, week.5, week.6, week.7, weekly, week, daily, day.

START=start_date
   specifies the start date for the time series in the form YYYYMMDD.

END=end_date
   specifies the end date for the time series in the form YYYYMMDD.

KEEP=haver_variables
   specifies the list of HAVER variables to be included in the output SAS data set. This list is comma-delimited and must be enclosed in double quotes ("").

DROP=haver_variables
   specifies the list of HAVER variables to be excluded from the output SAS data set. This list is comma-delimited and must be enclosed in double quotes ("").

GROUP=haver_groups
   specifies the list of HAVER groups to be included in the output SAS data set. This list is comma-delimited and must be enclosed in double quotes ("").

DROPGROUP=haver_groups
   specifies the list of HAVER groups to be excluded from the output SAS data set. This list is comma-delimited and must be enclosed in double quotes ("").

SOURCE=haver_sources
   specifies the list of HAVER sources to be included in the output SAS data set. This list is comma-delimited and must be enclosed in double quotes ("").

DROPSOURCE=haver_sources
   specifies the list of HAVER sources to be excluded from the output SAS data set. This list is comma-delimited and must be enclosed in double quotes ("").

FORCE= FREQ
   specifies that the selected variables are to be aggregated to the frequency in the FREQ= option. Aggregation is supported only from a more frequent time interval to a less frequent time interval, such as from weekly to monthly. See “Aggregating to Quarterly Frequency Using the FORCE= FREQ Option” on page 2350 for sample output and suggested error recovery when an attempted conversion to a higher frequency yields missing values. This option is ignored if the FREQ= option is not set.

For a more complete discussion of HAVER frequencies and SAS time intervals, see the section “Mapping HAVER Frequencies to SAS Time Intervals” on page 2346. As an example,

   LIBNAME libref sasehavr 'physical-name' FREQ=MONTHLY;
By default, the SASEHAVR engine reads all time series in the HAVER database that you reference when using your SASEHAVR libref. The start date is specified in the form YYYYMMDD and limits the data to observations on or after that date. For example, to read the time series in the TEST library starting on July 4, 1996, you would specify

   LIBNAME test sasehavr 'physical-name' STARTDATE=19960704;
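Because the start and end dates are given in the form YYYYMMDD, it can be convenient to build them from SAS date values instead of typing them. The following is a minimal sketch, not part of the engine syntax: the macro variable names start_ymd and end_ymd are hypothetical, and 'physical-name' is the same placeholder used above.

```sas
/* Format SAS date values as YYYYMMDD text (YYMMDDN8. writes no separators). */
%let start_ymd = %sysfunc(mdy(7, 4, 1996), yymmddn8.);   /* July 4, 1996 */
%let end_ymd   = %sysfunc(today(), yymmddn8.);           /* current date */

/* Hypothetical macro variable names; 'physical-name' is a placeholder. */
libname test sasehavr 'physical-name' start=&start_ymd end=&end_ymd;
```

Generating the values this way avoids transposing digits when a date range is changed often.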
When you use the START= option, you are limiting the range of observations that are read from the time series and that are converted to the desired frequency. Start dates can help you save resources when processing large databases or when processing a large number of observations. It is also possible to select specific variables to be included or excluded from the SAS data set by using the KEEP= or the DROP= option.

   LIBNAME test sasehavr 'physical-name' KEEP="ABC*, XYZ??";
   LIBNAME test sasehavr 'physical-name' DROP="*SC*, #T#";
When the KEEP= or the DROP= option is used, the resulting SAS data set will keep or drop the variables that you select in that option. There are three wildcards currently available: '*', '?', and '#'. The '*' wildcard corresponds to any character string and will include any string pattern that corresponds to that position in the matching variable name. The '?' means that any single alphanumeric character is valid. And finally, the '#' wildcard corresponds to a single numeric character.

You can also select time series in your data by using the GROUP= or the SOURCE= option to select on GROUP name or on SOURCE name. Alternatively, you can deselect time series by using the DROPGROUP= or the DROPSOURCE= option.

   LIBNAME test sasehavr 'physical-name' GROUP="CBA, *ZYX";
   LIBNAME test sasehavr 'physical-name' DROPGROUP="TKN*, XCZ?";
   LIBNAME test sasehavr 'physical-name' SOURCE="FRB";
   LIBNAME test sasehavr 'physical-name' DROPSOURCE="NYSE";
SASEHAVR selects only the variables that are of the frequency specified in the FREQ= option. If this option is not specified, SASEHAVR selects the variables that match the frequency of the first selected variable. If no other selection criteria are specified, by default, the first selected variable is the first physical DLXRecord read from the HAVER database. The FORCE= FREQ option can be specified to force the aggregation of all selected variables to the frequency specified in the FREQ= option. Aggregation is supported only from a more frequent time interval to a less frequent time interval, such as from weekly to monthly. See “Aggregating to Quarterly Frequency Using the FORCE= FREQ Option” on page 2350 for suggested recovery from using a frequency that does not aggregate the data appropriately. The FORCE= option is ignored if the FREQ= option is not specified.
Details: SASEHAVR Interface Engine
The SAS Output Data Set

You can use the SAS DATA step to write the HAVER converted series to a SAS data set. This enables the user to easily analyze the data using SAS. You can specify the name of the output data set on the DATA statement. This causes the engine supervisor to create a SAS data set using the specified name in either the SAS WORK library, or if specified, the USER library. For more about naming your SAS data set see the section “Characteristics of SAS Data Libraries” in SAS Language Reference: Dictionary.

The contents of the SAS data set include the DATE of each observation, the name of each series read from the HAVER database, and the label or HAVER description of each series. Missing values are represented as '.' in the SAS data set. You can use PROC PRINT and PROC CONTENTS to print your output data set and its contents. You can use PROC SQL along with the SASEHAVR engine to create a view of your SAS data set.

The DATE variable in the SAS data set contains the date of the observation. The SASEHAVR engine automatically maps the HAVER intervals to the appropriate corresponding SAS interval. A more detailed discussion of how to map HAVER frequencies to SAS time intervals follows.
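As an illustration of the PROC SQL usage mentioned in this section, the following statements sketch one way to define a view over a SASEHAVR library. This is a hedged example, not a prescribed method: it assumes that the libref LIB1 has already been assigned with the SASEHAVR engine, that the HAVERW sample database contains the series FM1 and FTB3, and the view name rates_v is hypothetical.

```sas
/* Assumes LIB1 was assigned with the SASEHAVR engine and that the */
/* HAVERW sample database contains the series FM1 and FTB3.        */
proc sql;
   create view work.rates_v as      /* hypothetical view name */
      select date, fm1, ftb3
      from lib1.haverw;
quit;

/* Print the first few observations that the view returns. */
proc print data=work.rates_v(obs=5);
run;
```

Because a view stores only the query, the libref must still be assigned whenever the view is used.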
Mapping HAVER Frequencies to SAS Time Intervals

Table 35.2 summarizes the mapping of HAVER frequencies to SAS time intervals. For more information refer to “Date Intervals, Formats, and Functions” in SAS/ETS User’s Guide.

Table 35.2 Mapping HAVER Frequencies to SAS Time Intervals

   HAVER Frequency          SAS Time Interval   FREQ=
   ANNUAL                   YEAR                YEARLY
   QUARTERLY                QTR                 QTRLY
   MONTHLY                  MONTH               MON
   WEEKLY (SUNDAY)          WEEK.1              WEEK.1
   WEEKLY (MONDAY)          WEEK.2              WEEK.2
   WEEKLY (TUESDAY)         WEEK.3              WEEK.3
   WEEKLY (WEDNESDAY)       WEEK.4              WEEK.4
   WEEKLY (THURSDAY)        WEEK.5              WEEK.5
   WEEKLY (FRIDAY)          WEEK.6              WEEK.6
   WEEKLY (SATURDAY)        WEEK.7              WEEK.7
   WEEKLY WEEK.1-WEEK.7     WEEKLY              WEEKLY
   DAILY                    WEEKDAY17W          DAY
Error Recovery for SASEHAVR

Common errors are easy to avoid by noting the valid dates that are specified in the warning messages in your SAS log. Often you can eliminate errors by removing your date restriction (the START= and END= options), by removing your FORCE= FREQ option, or by letting the haver_frequency default to the original frequency rather than attempting a conversion. Here are some common error scenarios and how to handle them.
Using the Optimum Range for Best Output Results

Suppose you see the following warnings in your SAS log:

   libname kgs2 sasehavr "%sysget(HAVER_DATA)"
      start=19550101
      end=19600105
      keep="FCSEED, FCSEEI, FCSEEM, BGSX, BGSM, FXDUSBC"
      group="I01, F56, M02, R30"
      source="JPM,CEN,OMB";
   NOTE: Libref KGS2 was successfully assigned as follows:
         Engine:        HAVERDLX
         Physical Name: C:\haver
   data kgse9;
      set kgs2.haver;
   NOTE: Defaulting to MONTHLY frequency.
   WARNING: Start date (19550101) is not a valid date. Engine is ignoring your start date and using default. Setting the default Haver start date to 7001.
   WARNING: End date (19600105) is not a valid date. Engine is ignoring your end date and using default. Setting the default Haver end date to 10103.
   run;
   NOTE: There were 375 observations read from the data set KGS2.HAVER.
   NOTE: The data set WORK.KGSE9 has 375 observations and 4 variables.
The important diagnostic to note here is the warning message, which tells you that the data start in January 1970 (HAVER date 7001) and end in March 2001 (HAVER date 10103). Because the specified range falls outside the range of the data, no observations are in range, so the engine uses the default range stated in the warning messages. Changing the START= and END= options so that they overlap the valid range yields data from within the span JAN1970 to MAR2001. To view the entire range of selected data, remove the START= and END= options from your LIBNAME statement:

   libname kgs sasehavr "%sysget(HAVER_DATA)"
      keep="FCSEED, FCSEEI, FCSEEM, BGSX, BGSM, FXDUSBC"
      group="I01, F56, M02, R30"
      source="JPM,CEN,OMB";
   NOTE: Libref KGS was successfully assigned as follows:
         Engine:        HAVERDLX
         Physical Name: C:\haver
   data kgse5;
      set kgs.haver;
   NOTE: Defaulting to MONTHLY frequency.
   run;
   NOTE: There were 375 observations read from the data set KGS.HAVER.
   NOTE: The data set WORK.KGSE5 has 375 observations and 4 variables.
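The HAVER dates quoted in the log messages in this section (7001 for January 1970 and 10103 for March 2001) appear to encode the year as an offset from 1900 followed by a two-digit month. The following DATA step is a minimal sketch under that assumption; the encoding is inferred from these two examples only, and the variable names are hypothetical.

```sas
data _null_;
   /* Assumed encoding for monthly dates: (year - 1900) * 100 + month */
   do haver_date = 7001, 10103;
      year     = 1900 + int(haver_date / 100);
      month    = mod(haver_date, 100);
      sas_date = mdy(month, 1, year);   /* first day of that month */
      put haver_date= sas_date= monyy7.;
   end;
run;
```

Converting the values in a warning message this way can help when choosing START= and END= dates that fall inside the valid range.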
Using a Valid Range of Data With START= and END= Options

In this example, an error about an invalid range is issued:

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=Weekly start=20060301 end=20060531;
   NOTE: Libref LIB1 was successfully assigned as follows:
         Engine:        HAVERDLX
         Physical Name: C:\haver
   libname lib2 "\\dntsrc\usrtmp\saskff" ;
   NOTE: Libref LIB2 was successfully assigned as follows:
         Engine:        V9
         Physical Name: \\dntsrc\usrtmp\saskff
   data lib2.wweek;
      set lib1.intwkly;
   ERROR: No observations found inside RANGE. The valid range for HAVER dates is (610104-1050318).
   ERROR: No observations found in specified range.
      keep date m11: ;
   run;
   WARNING: The variable date in the DROP, KEEP, or RENAME list has never been referenced.
   WARNING: The variable m11: in the DROP, KEEP, or RENAME list has never been referenced.
   NOTE: The SAS System stopped processing this step because of errors.
   WARNING: The data set LIB2.WWEEK may be incomplete. When this step was stopped there were 0 observations and 0 variables.
   WARNING: Data set LIB2.WWEEK was not replaced because this step was stopped.
In the preceding example, the important diagnostic message is the first error statement, which tells you that the RANGE of HAVER dates is invalid for the specified frequency. A valid range is one that overlaps the dates (610104-1050318). Removing the range altogether will cause the engine to output the entire range of data.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=Weekly;
   NOTE: Libref LIB1 was successfully assigned as follows:
         Engine:        HAVERDLX
         Physical Name: C:\haver
   libname lib2 "\\dntsrc\usrtmp\saskff" ;
   NOTE: Libref LIB2 was successfully assigned as follows:
         Engine:        V9
         Physical Name: \\dntsrc\usrtmp\saskff
   data lib2.wweek;
      set lib1.intwkly;
      keep date m11: ;
   run;
   NOTE: There were 2307 observations read from the data set LIB1.INTWKLY.
   NOTE: The data set LIB2.WWEEK has 2307 observations and 35 variables.
Because the START= and END= options take day-based dates, it is important when giving a range to use dates that correspond to the FREQ= option, especially with weekly frequencies such as week.1-week.7. Because FREQ=week.4 selects weeks that begin on Wednesday, the start and end dates need to be specified as Wednesday dates.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=Week.4 start=20050302 end=20050309;
   NOTE: Libref LIB1 was successfully assigned as follows:
         Engine:        HAVERDLX
         Physical Name: \\tappan\crsp1\haver
   title2 'Weekly dataset with freq=week.4 range is small';
   libname lib2 "\\dntsrc\usrtmp\saskff" ;
   NOTE: Libref LIB2 was successfully assigned as follows:
         Engine:        V9
         Physical Name: \\dntsrc\usrtmp\saskff
   data lib2.wweek;
      set lib1.intwkly;
      keep date m11: ;
   run;
   NOTE: There were 2 observations read from the data set LIB1.INTWKLY.
   NOTE: The data set LIB2.WWEEK has 2 observations and 25 variables.
Specifying dates that do not match the frequency (for example, Tuesday dates with FREQ=week.4, which expects Wednesday dates) results in the following errors:

   ERROR: Fatal error in GetDate routine. Remove the range statement or change the START= date to be consistent with the freq=option.
   ERROR: No observations found in specified range.
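One way to find dates that are consistent with a weekly frequency is to align a candidate date with the SAS WEEK.4 interval before writing the LIBNAME statement. The following DATA step is a minimal sketch that uses the standard INTNX function; the variable and macro variable names are hypothetical, and it assumes that the SAS WEEK.4 interval boundaries coincide with the dates that the engine expects for FREQ=week.4.

```sas
data _null_;
   candidate = '01MAR2005'd;                                  /* a Tuesday */
   /* First day of the WEEK.4 interval that contains the candidate date */
   wk_start = intnx('week.4', candidate, 0, 'beginning');
   /* First day of the following WEEK.4 interval */
   wk_next  = intnx('week.4', wk_start, 1, 'beginning');
   put candidate= weekdate. wk_start= weekdate. wk_next= weekdate.;
   /* Store YYYYMMDD text for use in the START= and END= options */
   call symputx('start_ymd', put(wk_start, yymmddn8.));
   call symputx('end_ymd',   put(wk_next,  yymmddn8.));
run;

%put NOTE: start=&start_ymd end=&end_ymd;
```

The resulting macro variables could then be supplied as start=&start_ymd and end=&end_ymd on a LIBNAME statement that specifies freq=week.4.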
Aggregating to Quarterly Frequency Using the FORCE= FREQ Option

In the next example, six time series are selected by the KEEP= option, but their frequencies are annual, monthly, and quarterly. When the FREQ=weekly and FORCE=freq options are used, a diagnostic appears in the log stating that the engine is forcing the frequency to QUARTERLY for better date alignment of observations. The first selected variable is BALO, which is a quarterly time series, so the default choice of frequency is quarterly:

   title1 '***HAVKWC.SAS: KEEP= option tests with wildcards***';
   %setup( ets );
   /*----------------*/
   /* Wildcard: *    */
   /*----------------*/
   title2 "keep=B*, G*, I*";
   title3 "6 valid variables are: BALO BGSM BGSX BPBCA G IUM";
   libname lib1 sasehavr 'C:\haver\'
      keep="B*, G*, I*" freq=weekly force=freq;
   NOTE: Libref LIB1 was successfully assigned as follows:
         Engine:        HAVERDLX
         Physical Name: C:\haver\
   data wc;
      set lib1.haver;
   WARNING: Earliest Start Date in DLX Database matches QUARTERLY frequency better than the specified WEEKLY frequency. Engine is forcing the frequency to QUARTERLY for better date alignment of observations.
   run;
   NOTE: There were 221 observations read from the data set LIB1.HAVER.
   NOTE: The data set WORK.WC has 221 observations and 7 variables.
Note that the time series IUM has an annual frequency, so the attempt to convert it to a quarterly frequency produces all missing values in the output range: aggregation produces only missing values when going from a lower frequency to a higher (forced) frequency.
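A quick way to spot a series that came back entirely missing after a forced conversion is to count nonmissing and missing values for every numeric variable. The following step is a minimal sketch that assumes the data set WORK.WC from the preceding log exists; it uses only standard PROC MEANS statistics and is a suggested check, not a feature of the engine.

```sas
/* N = number of nonmissing values, NMISS = number of missing values. */
/* A series with N = 0 was returned as all missing.                   */
proc means data=wc n nmiss;
   var _numeric_;
run;
```

Any variable reported with N equal to 0 is a candidate for rereading at its original frequency.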
Examples: SASEHAVR Interface Engine Before running the following sample code, set your HAVER_DATA environment variable to point to the ETS SASMISC folder containing sample HAVER databases. The provided sample data files are HAVERD.DAT, HAVERD.IDX, HAVERW.IDX, and HAVERW.DAT.
Example 35.1: Examining the Contents of a HAVER Database

To see which time series are in your HAVER database, use PROC CONTENTS with the SASEHAVR LIBNAME statement to read the contents.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=yearly start=19920101 end=20041231 force=freq;

   data hwouty;
      set lib1.haverw;
   run;

   title1 'Haver Analytics Database, HAVERW.DAT';
   title2 'PROC CONTENTS for Time Series converted to yearly frequency';
   proc contents data=hwouty;
   run;
In the preceding example, the HAVER database is called haverw, and it resides in the directory referenced in lib1. The DATA statement names the SAS output data set hwouty, which will reside in the SAS WORK library. All time series in the HAVER haverw database are listed alphabetically in Output 35.1.1.

Output 35.1.1 Examining the Contents of HAVER Analytics Database, haverw.dat

   Haver Analytics Database, HAVERW.DAT
   PROC CONTENTS for Time Series converted to yearly frequency

   The CONTENTS Procedure

   Alphabetic List of Variables and Attributes

   #  Variable  Type  Len  Format  Label
   1  DATE      Num     8  YEAR4.  Date of Observation
   2  FA        Num     8          Total Assets: All Commercial Banks (SA, Bil.$)
   3  FCM1M     Num     8          1-Month Treasury Bill Market Bid Yield at Constant Maturity (%)
   4  FM1       Num     8          Money Stock: M1 (SA, Bil.$)
   5  FTA1MA    Num     8          Treasury 4-Week Bill: Total Amount Accepted (Bil$)
   6  FTB3      Num     8          3-Month Treasury Bills, Auction (% p.a.)
   7  LICN      Num     8          Unemployment Insurance: Initial Claims, State Programs (NSA, Thous)
You could use the following SAS statements to create a SAS data set named hwouty and to print its contents.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=yearly start=19920101 end=20041231 force=freq;

   data hwouty;
      set lib1.haverw;
   run;

   title1 'Haver Analytics Database, Frequency=yearly, infile=haverw.dat';
   title2 'Define a range inside the data range for OUT= dataset,';
   title3 'Using the START=19920101 END=20041231 LIBNAME options.';
   proc print data=hwouty;
   run;
The preceding LIBNAME lib1 statement specifies that all time series in the haverw database be converted to yearly frequency and that only the range of data from January 1, 1992, to December 31, 2004, be selected. The resulting SAS data set hwouty is shown in Output 35.1.2.

Output 35.1.2 Defining a Range Inside the Data Range for Yearly Time Series

   Haver Analytics Database, Frequency=yearly, infile=haverw.dat
   Define a range inside the data range for OUT= dataset,
   Using the START=19920101 END=20041231 LIBNAME options.

   Obs  DATE      FA    FCM1M      FM1  FTA1MA     FTB3     LICN
     1  1992  3466.3        .   965.31       .  3.45415  407.340
     2  1993  3624.6        .  1077.69       .  3.01654  344.934
     3  1994  3875.8        .  1144.85       .  4.28673  340.054
     4  1995  4209.3        .  1142.70       .  5.51058  357.038
     5  1996  4399.1        .  1106.46       .  5.02096  351.358
     6  1997  4820.3        .  1069.23       .  5.06885  321.513
     7  1998  5254.8        .  1079.56       .  4.80726  317.077
     8  1999  5608.1        .  1101.14       .  4.66154  301.581
     9  2000  6115.4        .  1104.07       .  5.84644  301.108
    10  2001  6436.2  2.31368  1136.31  11.753  3.44471  402.583
    11  2002  7024.9  1.63115  1192.03  18.798  1.61548  402.796
    12  2003  7302.9  1.02346  1268.40  16.089  1.01413  399.137
    13  2004  7950.5  1.26642  1337.89  13.019  1.37557  345.109
Example 35.2: Viewing Quarterly Time Series from a HAVER Database

Consider the following statements for quarterly frequency conversion of all time series for the period spanning April 1, 2001, to December 31, 2004.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=quarterly start=20010401 end=20041231 force=freq;

   data hwoutq;
      set lib1.haverw;
   run;

   title1 'HAVER Analytics Database, Frequency=quarterly, infile=haverw.dat';
   title2 ' Define a range inside the data range for OUT= dataset';
   title3 ' Using the START=20010401 END=20041231 LIBNAME options.';
   proc print data=hwoutq;
   run;
The resulting SAS data set hwoutq is shown in Output 35.2.1.

Output 35.2.1 Defining a Range Inside the Data Range for Quarterly Time Series

   HAVER Analytics Database, Frequency=quarterly, infile=haverw.dat
   Define a range inside the data range for OUT= dataset
   Using the START=20010401 END=20041231 LIBNAME options.

   Obs  DATE        FA    FCM1M      FM1  FTA1MA     FTB3     LICN
     1  2001Q2  6225.4        .  1115.75       .  3.68308  356.577
     2  2001Q3  6425.9  2.98167  1157.90  12.077  3.27615  368.408
     3  2001Q4  6436.2  2.00538  1169.62  11.753  1.95308  477.685
     4  2002Q1  6396.3  1.73077  1186.92  22.309  1.72615  456.292
     5  2002Q2  6563.5  1.72769  1183.30  17.126  1.72077  368.592
     6  2002Q3  6780.0  1.69231  1189.89  21.076  1.64769  352.892
     7  2002Q4  7024.9  1.37385  1207.80  18.798  1.36731  433.408
     8  2003Q1  7054.5  1.17846  1231.41  24.299  1.15269  458.746
     9  2003Q2  7319.6  1.08000  1262.24  14.356  1.05654  386.185
    10  2003Q3  7238.6  0.92000  1286.21  16.472  0.92885  361.346
    11  2003Q4  7302.9  0.91538  1293.76  16.089  0.91846  390.269
    12  2004Q1  7637.3  0.90231  1312.43  21.818  0.91308  400.585
    13  2004Q2  7769.8  0.94692  1332.75  12.547  1.06885  310.508
    14  2004Q3  7949.5  1.34923  1343.79  21.549  1.49393  305.862
    15  2004Q4  7950.5  1.82429  1362.60  13.019  2.01731  362.171
Example 35.3: Viewing Monthly Time Series from a HAVER Database

Suppose you want to convert your time series to a monthly frequency like this:

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=monthly start=20040401 end=20041231 force=freq;

   data hwoutm;
      set lib1.haverw;
   run;

   title1 'Haver Analytics Database, Frequency=monthly, infile=haverw.dat';
   title2 ' Define a range inside the data range for OUT= dataset';
   title3 ' Using the START=20040401 END=20041231 LIBNAME options.';
   proc print data=hwoutm;
   run;
The result from using the range of April 1, 2004, to December 31, 2004, is shown in Output 35.3.1.

Output 35.3.1 Defining a Range Inside the Data Range for Monthly Time Series

   Haver Analytics Database, Frequency=monthly, infile=haverw.dat
   Define a range inside the data range for OUT= dataset
   Using the START=20040401 END=20041231 LIBNAME options.

   Obs  DATE         FA   FCM1M      FM1  FTA1MA     FTB3    LICN
     1  APR2004  7703.8  0.9140  1325.73  16.946  0.93900  317.36
     2  MAY2004  7704.7  0.9075  1332.96  25.043  1.03375  297.00
     3  JUN2004  7769.8  1.0275  1339.50  12.547  1.26625  315.45
     4  JUL2004  7859.5  1.1840  1330.13  21.823  1.34900  357.32
     5  AUG2004  7890.0  1.3650  1347.84  25.213  1.48000  276.70
     6  SEP2004  7949.5  1.5400  1352.40  21.549  1.65000  270.70
     7  OCT2004  7967.6  1.6140  1355.28  21.322  1.74750  304.24
     8  NOV2004  8053.4  1.9125  1366.06  21.862  2.05625  335.85
     9  DEC2004  7950.5  1.9640  1365.60  13.019  2.20200  441.16
Example 35.4: Viewing Weekly Time Series from a HAVER Database

An example of weekly data spanning September 1, 2004, to December 31, 2004, is shown in Output 35.4.1.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=weekly start=20040901 end=20041231;

   data hwoutw;
      set lib1.haverw;
   run;

   title1 'HAVER Analytics Database, Frequency=weekly, infile=haverw.dat';
   title2 ' Define a range inside the data range for OUT= dataset';
   title3 ' Using the START=20040901 END=20041231 LIBNAME options.';
   proc print data=hwoutw;
   run;
Output 35.4.1 Defining a Range Inside the Data Range for Weekly Time Series

   HAVER Analytics Database, Frequency=weekly, infile=haverw.dat
   Define a range inside the data range for OUT= dataset
   Using the START=20040901 END=20041231 LIBNAME options.

   Obs  DATE           FA  FCM1M     FM1  FTA1MA   FTB3   LICN
     1  29AUG2004  7890.0   1.39  1360.8  27.342  1.515  275.2
     2  05SEP2004  7906.2   1.46  1353.7  25.213  1.580  273.7
     3  12SEP2004  7962.7   1.57  1338.3  25.255  1.635  250.6
     4  19SEP2004  7982.1   1.57  1345.6  15.292  1.640  275.8
     5  26SEP2004  7987.9   1.56  1359.7  15.068  1.685  282.7
     6  03OCT2004  7949.5   1.54  1366.0  21.549  1.710  279.6
     7  10OCT2004  7932.4   1.56  1362.3  17.183  1.685  338.7
     8  17OCT2004  7956.9   1.59  1350.1  17.438  1.680  279.8
     9  24OCT2004  7957.3   1.63  1346.0  12.133  1.770  317.6
    10  31OCT2004  7967.6   1.75  1362.7  21.322  1.855  305.5
    11  07NOV2004  7954.1   1.84  1350.4  22.028  1.950  354.8
    12  14NOV2004  8009.7   1.89  1354.8  25.495  2.045  311.9
    13  21NOV2004  7938.3   1.93  1364.5  24.000  2.075  356.0
    14  28NOV2004  8053.4   1.99  1381.3  24.424  2.155  320.7
    15  05DEC2004  8010.7   2.05  1379.3  21.862  2.195  472.7
    16  12DEC2004  8054.8   2.08  1355.1  22.178  2.210  370.6
    17  19DEC2004  8019.2   1.98  1358.3  12.066  2.200  374.7
    18  26DEC2004  7995.5   1.89  1366.3  12.787  2.180  446.6
Example 35.5: Viewing Daily Time Series from a HAVER Database

Consider viewing the HAVER Analytics daily database named haverd. The contents of this database can be seen by submitting the following statements.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=daily start=20041201 end=20041231;

   data hwoutd;
      set lib1.haverd;
   run;

   title1 'HAVER Analytics Database, HAVERD.DAT';
   title2 'PROC CONTENTS for Time Series converted to daily frequency';
   proc contents data=hwoutd;
   run;
Output 35.5.1 shows the output of PROC CONTENTS with the time id variable DATE followed by the time series variables FCM10, FCM1M, FFED, FFP1D, FXAUS, and TCC with their corresponding attributes such as type, length, format, and label.
Output 35.5.1 Examining the Contents of a Daily HAVER Analytics Database, haverd.dat

   HAVER Analytics Database, HAVERD.DAT
   PROC CONTENTS for Time Series converted to daily frequency

   The CONTENTS Procedure

   Alphabetic List of Variables and Attributes

   #  Variable  Type  Len  Format  Label
   1  DATE      Num     8  DATE9.  Date of Observation
   2  FCM10     Num     8          10-Year Treasury Note Yield at Constant Maturity (Avg, % p.a.)
   3  FCM1M     Num     8          1-Month Treasury Bill Market Bid Yield at Constant Maturity (%)
   4  FFED      Num     8          Federal Funds [Effective] Rate (% p.a.)
   5  FFP1D     Num     8          1-Day AA Financial Commercial Paper (% per annum)
   6  FXAUS     Num     8          Foreign Exchange Rate: Australia (US$/Australian$)
   7  TCC       Num     8          Treasury: Closing Operating Cash Balance (Today, Mil.$)
Example 35.6: Limiting the Range of Time Series from a HAVER Database

Suppose you limit the range of data to the month of December:

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=daily start=20041201 end=20041231;

   data hwoutd;
      set lib1.haverd;
   run;

   title1 'Haver Analytics Database, Frequency=daily, infile=haverd.dat';
   title2 ' Define a range inside the data range for OUT= dataset';
   title3 ' Using the START=20041201 END=20041231 LIBNAME options.';
   proc print data=hwoutd;
   run;
Note that Output 35.6.1 for daily conversion shows the frequency as the SAS time interval for WEEKDAY.
Output 35.6.1 Defining a Range Inside the Data Range for Daily Time Series

   Haver Analytics Database, Frequency=daily, infile=haverd.dat
   Define a range inside the data range for OUT= dataset
   Using the START=20041201 END=20041231 LIBNAME options.

   Obs  DATE       FCM10  FCM1M  FFED  FFP1D   FXAUS    TCC
     1  01DEC2004   4.38   2.06  2.04   2.01  0.7754   7564
     2  02DEC2004   4.40   2.06  2.00   1.98  0.7769   8502
     3  03DEC2004   4.27   2.06  1.98   1.96  0.7778   7405
     4  06DEC2004   4.24   2.09  2.04   1.98  0.7748   7019
     5  07DEC2004   4.23   2.08  1.99   1.99  0.7754  15520
     6  08DEC2004   4.14   2.08  2.01   1.98  0.7545  12329
     7  09DEC2004   4.19   2.07  2.05   2.03  0.7532   5441
     8  10DEC2004   4.16   2.07  2.09   2.07  0.7495   6368
     9  13DEC2004   4.16   2.04  2.18   2.13  0.7592  11395
    10  14DEC2004   4.14   2.01  2.24   2.22  0.7566  13695
    11  15DEC2004   4.09   1.98  2.31   2.27  0.7652  39765
    12  16DEC2004   4.19   1.93  2.26   2.24  0.7563  33640
    13  17DEC2004   4.21   1.95  2.23   2.20  0.7607  32764
    14  20DEC2004   4.21   1.97  2.26   2.21  0.7644  36216
    15  21DEC2004   4.18   1.92  2.24   2.21  0.7660  35056
    16  22DEC2004   4.21   1.84  2.25   2.22  0.7656  34599
    17  23DEC2004   4.23   1.83  2.34   2.08  0.7654  24467
    18  24DEC2004      .      .  2.27      .  0.7689  26898
    19  27DEC2004   4.30   1.90  2.24   2.26  0.7777  31874
    20  28DEC2004   4.31   1.88  2.24   2.24  0.7787  30513
    21  29DEC2004   4.33   1.76  2.23   2.23  0.7709  34754
    22  30DEC2004   4.27   1.68  2.24   2.18  0.7785  20045
    23  31DEC2004   4.24   1.89  1.97   2.18  0.7805  24690
Example 35.7: Using the WHERE Statement to Subset Time Series from a HAVER Database

Using a WHERE statement in the DATA step can be useful for further subsetting.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
      freq=daily start=20041101 end=20041231;

   data hwoutd;
      set lib1.haverd;
      where date between '01nov2004'd and '01dec2004'd;
   run;

   title1 'Haver Analytics Database, Frequency=daily, infile=haverd.dat';
   title2 ' Define a range inside the data range for OUT= dataset';
   title3 ' Using the START=20041101 END=20041231 LIBNAME options.';
   title4 'Subset further: where date between 01nov2004 and 01dec2004.';
   proc print data=hwoutd;
   run;
2358 F Chapter 35: The SASEHAVR Interface Engine
Output 35.7.1 shows that the time slice of November 1, 2004, to December 31, 2004, is narrowed further by the DATE test on the WHERE statement to stop at December 1, 2004. Output 35.7.1 Defining a Range Using START=20041101 END=20041231 along with the WHERE statement Haver Analytics Database, Frequency=daily, infile=haverd.dat Define a range inside the data range for OUT= dataset Using the START=20041101 END=20041231 LIBNAME options. Subset further: where date between 01nov2004 and 31dec2004. Obs
DATE
FCM10
FCM1M
FFED
FFP1D
FXAUS
TCC
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
01NOV2004 02NOV2004 03NOV2004 04NOV2004 05NOV2004 08NOV2004 09NOV2004 10NOV2004 11NOV2004 12NOV2004 15NOV2004 16NOV2004 17NOV2004 18NOV2004 19NOV2004 22NOV2004 23NOV2004 24NOV2004 25NOV2004 26NOV2004 29NOV2004 30NOV2004 01DEC2004
4.11 4.10 4.09 4.10 4.21 4.22 4.22 4.25 . 4.20 4.20 4.21 4.14 4.12 4.20 4.18 4.19 4.20 . 4.24 4.34 4.36 4.38
1.79 1.86 1.83 1.85 1.86 1.88 1.89 1.88 . 1.91 1.92 1.93 1.90 1.91 1.98 1.98 1.99 1.98 . 2.01 2.02 2.07 2.06
1.83 1.74 1.73 1.77 1.76 1.80 1.79 1.92 1.92 2.02 2.06 1.98 1.99 1.99 1.99 2.01 2.00 2.02 2.02 2.01 2.03 2.02 2.04
1.80 1.74 1.73 1.75 1.75 1.84 1.81 1.85 . 1.96 2.03 1.95 1.93 1.94 1.93 1.96 1.95 1.89 . 1.97 2.00 2.04 2.01
0.7460 0.7447 0.7539 0.7585 0.7620 0.7578 0.7618 0.7592 . 0.7685 0.7719 0.7728 0.7833 0.7786 0.7852 0.7839 0.7860 0.7863 . 0.7903 0.7852 0.7723 0.7754
35111 34091 14862 23304 19872 21095 16390 12872 12872 28926 10480 13417 10506 6293 5100 6045 18135 14109 14109 20588 24322 18033 7564
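
The date restriction in this example can be written in more than one way. The following is a minimal sketch, assuming that the lib1 libref from Example 35.7 is still assigned; the data set name hwoutd2 is hypothetical. Because BETWEEN-AND is inclusive at both ends, the explicit comparison should select the same observations. A WHERE statement can also be used in the PROC PRINT step to narrow what is printed without creating another data set.

```sas
/* Sketch only: assumes the LIB1 libref assigned in Example 35.7. */
/* HWOUTD2 is a hypothetical output data set name.                */
data hwoutd2;
   set lib1.haverd;
   /* Same inclusive range as BETWEEN '01nov2004'd AND '01dec2004'd */
   where date >= '01nov2004'd and date <= '01dec2004'd;
run;

/* Narrow the listing further at print time: first week of November only */
proc print data=hwoutd2;
   where date <= '05nov2004'd;
run;
```

Subsetting in the PROC step leaves hwoutd2 unchanged, which is convenient when several different slices of the same extract are printed.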
Example 35.8: Using the KEEP Option to Subset Time Series from a HAVER Database

To select specific time series, the KEEP= or DROP= option can also be used as follows.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
           freq=daily start=20041101 end=20041231 keep="FCM*";

   data hwoutd;
      set lib1.haverd;
   run;

   title1 'Haver Analytics Database, Frequency=daily, infile=haverd.dat';
   title2 ' Define a range inside the data range for OUT= dataset';
   title3 ' Using the START=20041101 END=20041231 LIBNAME options.';
   title4 ' Subset further: Using keep="FCM*" LIBNAME option ';

   proc print data=hwoutd;
   run;
Output 35.8.1 shows two series that are selected by using KEEP="FCM*" on the LIBNAME statement.
Output 35.8.1  Using the KEEP Option along with Defining a Range Using START=20041101 END=20041231

   Haver Analytics Database, Frequency=daily, infile=haverd.dat
   Define a range inside the data range for OUT= dataset
   Using the START=20041101 END=20041231 LIBNAME options.
   Subset further: Using keep="FCM*" LIBNAME option

Obs  DATE       FCM10  FCM1M
  1  01NOV2004   4.11   1.79
  2  02NOV2004   4.10   1.86
  3  03NOV2004   4.09   1.83
  4  04NOV2004   4.10   1.85
  5  05NOV2004   4.21   1.86
  6  08NOV2004   4.22   1.88
  7  09NOV2004   4.22   1.89
  8  10NOV2004   4.25   1.88
  9  11NOV2004      .      .
 10  12NOV2004   4.20   1.91
 11  15NOV2004   4.20   1.92
 12  16NOV2004   4.21   1.93
 13  17NOV2004   4.14   1.90
 14  18NOV2004   4.12   1.91
 15  19NOV2004   4.20   1.98
 16  22NOV2004   4.18   1.98
 17  23NOV2004   4.19   1.99
 18  24NOV2004   4.20   1.98
 19  25NOV2004      .      .
 20  26NOV2004   4.24   2.01
 21  29NOV2004   4.34   2.02
 22  30NOV2004   4.36   2.07
 23  01DEC2004   4.38   2.06
 24  02DEC2004   4.40   2.06
 25  03DEC2004   4.27   2.06
 26  06DEC2004   4.24   2.09
 27  07DEC2004   4.23   2.08
 28  08DEC2004   4.14   2.08
 29  09DEC2004   4.19   2.07
 30  10DEC2004   4.16   2.07
 31  13DEC2004   4.16   2.04
 32  14DEC2004   4.14   2.01
 33  15DEC2004   4.09   1.98
 34  16DEC2004   4.19   1.93
 35  17DEC2004   4.21   1.95
 36  20DEC2004   4.21   1.97
 37  21DEC2004   4.18   1.92
 38  22DEC2004   4.21   1.84
 39  23DEC2004   4.23   1.83
 40  24DEC2004      .      .
 41  27DEC2004   4.30   1.90
 42  28DEC2004   4.31   1.88
 43  29DEC2004   4.33   1.76
 44  30DEC2004   4.27   1.68
 45  31DEC2004   4.24   1.89
The DROP option can be used to drop specific variables from a HAVER database. To specify this option, use DROP= instead of KEEP=.
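
As a minimal sketch of the DROP= form, the following statements reuse the database and date range of Example 35.8 and only swap KEEP= for DROP=. Judging from the series listed in Output 35.7.1, the remaining variables would presumably be FFED, FFP1D, FXAUS, and TCC, but that outcome is an assumption about the sample haverd database rather than a documented result.

```sas
/* Sketch only: same haverd database and date range as Example 35.8. */
libname lib1 sasehavr "%sysget(HAVER_DATA)"
        freq=daily
        start=20041101
        end=20041231
        drop="FCM*";     /* exclude every series whose name begins with FCM */

data hwoutd;
   set lib1.haverd;
run;

proc print data=hwoutd;
run;
```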
Example 35.9: Using the SOURCE Option to Subset Time Series from a HAVER Database

To select specific variables that belong to a certain source, the SOURCE= or DROPSOURCE= option can be used, similar to the way you use the KEEP= or DROP= option.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
           freq=daily start=20041101 end=20041223 source="FRB";

   data hwoutd;
      set lib1.haverd;
   run;

   title1 'Haver Analytics Database, Frequency=daily, infile=haverd.dat';
   title2 ' Define a range inside the data range for OUT= dataset';
   title3 ' Using the START=20041101 END=20041223 LIBNAME options.';
   title4 ' Subset further: Using source="FRB" LIBNAME option';

   proc print data=hwoutd;
   run;
Output 35.9.1 shows the four series (FCM10, FFED, FFP1D, and FXAUS) that are selected by using SOURCE="FRB" on the LIBNAME statement.
Output 35.9.1  Using the SOURCE Option along with Defining a Range Using START=20041101 END=20041223

   Haver Analytics Database, Frequency=daily, infile=haverd.dat
   Define a range inside the data range for OUT= dataset
   Using the START=20041101 END=20041223 LIBNAME options.
   Subset further: Using source="FRB" LIBNAME option

Obs  DATE       FCM10  FFED  FFP1D   FXAUS
  1  01NOV2004   4.11  1.83   1.80  0.7460
  2  02NOV2004   4.10  1.74   1.74  0.7447
  3  03NOV2004   4.09  1.73   1.73  0.7539
  4  04NOV2004   4.10  1.77   1.75  0.7585
  5  05NOV2004   4.21  1.76   1.75  0.7620
  6  08NOV2004   4.22  1.80   1.84  0.7578
  7  09NOV2004   4.22  1.79   1.81  0.7618
  8  10NOV2004   4.25  1.92   1.85  0.7592
  9  11NOV2004      .  1.92      .       .
 10  12NOV2004   4.20  2.02   1.96  0.7685
 11  15NOV2004   4.20  2.06   2.03  0.7719
 12  16NOV2004   4.21  1.98   1.95  0.7728
 13  17NOV2004   4.14  1.99   1.93  0.7833
 14  18NOV2004   4.12  1.99   1.94  0.7786
 15  19NOV2004   4.20  1.99   1.93  0.7852
 16  22NOV2004   4.18  2.01   1.96  0.7839
 17  23NOV2004   4.19  2.00   1.95  0.7860
 18  24NOV2004   4.20  2.02   1.89  0.7863
 19  25NOV2004      .  2.02      .       .
 20  26NOV2004   4.24  2.01   1.97  0.7903
 21  29NOV2004   4.34  2.03   2.00  0.7852
 22  30NOV2004   4.36  2.02   2.04  0.7723
 23  01DEC2004   4.38  2.04   2.01  0.7754
 24  02DEC2004   4.40  2.00   1.98  0.7769
 25  03DEC2004   4.27  1.98   1.96  0.7778
 26  06DEC2004   4.24  2.04   1.98  0.7748
 27  07DEC2004   4.23  1.99   1.99  0.7754
 28  08DEC2004   4.14  2.01   1.98  0.7545
 29  09DEC2004   4.19  2.05   2.03  0.7532
 30  10DEC2004   4.16  2.09   2.07  0.7495
 31  13DEC2004   4.16  2.18   2.13  0.7592
 32  14DEC2004   4.14  2.24   2.22  0.7566
 33  15DEC2004   4.09  2.31   2.27  0.7652
 34  16DEC2004   4.19  2.26   2.24  0.7563
 35  17DEC2004   4.21  2.23   2.20  0.7607
 36  20DEC2004   4.21  2.26   2.21  0.7644
 37  21DEC2004   4.18  2.24   2.21  0.7660
 38  22DEC2004   4.21  2.25   2.22  0.7656
 39  23DEC2004   4.23  2.34   2.08  0.7654
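
The complementary DROPSOURCE= option can be sketched in the same way. The statements below assume that DROPSOURCE= accepts the same quoted value as SOURCE=, as the comparison with KEEP= and DROP= suggests. Which series remain (presumably those not attributed to the FRB source, such as FCM1M and TCC in the sample database) is an assumption, not a documented result.

```sas
/* Sketch only: assumes DROPSOURCE= mirrors the SOURCE= syntax of Example 35.9. */
libname lib1 sasehavr "%sysget(HAVER_DATA)"
        freq=daily
        start=20041101
        end=20041223
        dropsource="FRB";   /* exclude series whose source is FRB */

data hwoutd;
   set lib1.haverd;
run;

proc print data=hwoutd;
run;
```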
Example 35.10: Using the GROUP Option to Subset Time Series from a HAVER Database

To select specific variables that belong to a certain group, the GROUP= or DROPGROUP= option can also be used, similar to the way you use the KEEP= or DROP= option. Output 35.10.1, Output 35.10.2, and Output 35.10.3 show three different cross sections of the same database, haverw, by specifying three unique GROUP= options: GROUP="F*" on LIBNAME lib1, GROUP="M*" on LIBNAME lib2, and GROUP="E*" on LIBNAME lib3.

   libname lib1 sasehavr "%sysget(HAVER_DATA)"
           freq=week.6 force=freq start=20040102 end=20041001 group="F*";

   data hwoutw;
      set lib1.haverw;
   run;

   title1 'Haver Analytics Database, Frequency=week.6, infile=haverw.dat';
   title2 ' Define a range inside the data range for OUT= dataset';
   title3 ' Using the START=20040102 END=20041001 LIBNAME options.';
   title4 ' Subset further: Using group="F*" LIBNAME option';

   proc print data=hwoutw;
   run;
Output 35.10.1  Using the GROUP=F* Option along with Defining a Range

   Haver Analytics Database, Frequency=week.6, infile=haverw.dat
   Define a range inside the data range for OUT= dataset
   Using the START=20040102 END=20041001 LIBNAME options.
   Subset further: Using group="F*" LIBNAME option

Obs  DATE       FCM1M  FTA1MA   FTB3
  1  02JAN2004   0.86  16.089  0.885
  2  09JAN2004   0.88  12.757  0.920
  3  16JAN2004   0.84  12.141  0.870
  4  23JAN2004   0.79  12.593  0.875
  5  30JAN2004   0.86  17.357  0.890
  6  06FEB2004   0.90  21.759  0.920
  7  13FEB2004   0.90  21.557  0.920
  8  20FEB2004   0.92  21.580  0.915
  9  27FEB2004   0.96  21.390  0.930
 10  05MAR2004   0.97  24.119  0.940
 11  12MAR2004   0.96  24.294  0.930
 12  19MAR2004   0.94  23.334  0.945
 13  26MAR2004   0.95  21.400  0.930
 14  02APR2004   0.95  21.818  0.945
 15  09APR2004   0.94  17.255  0.930
 16  16APR2004   0.92  14.143  0.915
 17  23APR2004   0.89  14.136  0.935
 18  30APR2004   0.87  16.946  0.970
 19  07MAY2004   0.89  22.772  0.985
 20  14MAY2004   0.89  23.113  1.060
 21  21MAY2004   0.91  25.407  1.040
 22  28MAY2004   0.94  25.043  1.050
 23  04JUN2004   0.97  27.847  1.130
 24  11JUN2004   1.01  27.240  1.230
 25  18JUN2004   1.05  17.969  1.390
 26  25JUN2004   1.08  12.159  1.315
 27  02JUL2004   1.11  12.547  1.355
 28  09JUL2004   1.14  21.303  1.320
 29  16JUL2004   1.16  25.024  1.315
 30  23JUL2004   1.21  25.327  1.330
 31  30JUL2004   1.30  21.823  1.425
 32  06AUG2004   1.34  21.631  1.465
 33  13AUG2004   1.37  28.237  1.470
 34  20AUG2004   1.36  26.070  1.470
 35  27AUG2004   1.39  27.342  1.515
 36  03SEP2004   1.46  25.213  1.580
 37  10SEP2004   1.57  25.255  1.635
 38  17SEP2004   1.57  15.292  1.640
 39  24SEP2004   1.56  15.068  1.685
 40  01OCT2004   1.54  21.549  1.710
   libname lib2 sasehavr "%sysget(HAVER_DATA)"
           freq=week.6 force=freq start=20040102 end=20041001 group="M*";

   data hwoutw;
      set lib2.haverw;
   run;

   title1 'Haver Analytics Database, Frequency=week.6, infile=haverw.dat';
   title2 ' Define a range inside the data range for OUT= dataset';
   title3 ' Using the START=20040102 END=20041001 LIBNAME options.';
   title4 ' Subset further: Using group="M*" LIBNAME option';

   proc print data=hwoutw;
   run;
Output 35.10.2  Using the GROUP=M* Option along with Defining a Range

   Haver Analytics Database, Frequency=week.6, infile=haverw.dat
   Define a range inside the data range for OUT= dataset
   Using the START=20040102 END=20041001 LIBNAME options.
   Subset further: Using group="M*" LIBNAME option

Obs  DATE           FA     FM1
  1  02JAN2004  7302.9  1298.2
  2  09JAN2004  7351.2  1294.3
  3  16JAN2004  7378.5  1286.8
  4  23JAN2004  7434.7  1296.7
  5  30JAN2004  7492.4  1305.1
  6  06FEB2004  7510.4  1303.1
  7  13FEB2004  7577.8  1309.1
  8  20FEB2004  7648.7  1317.0
  9  27FEB2004  7530.6  1321.1
 10  05MAR2004  7546.7  1316.2
 11  12MAR2004  7602.0  1312.7
 12  19MAR2004  7603.0  1324.0
 13  26MAR2004  7625.5  1337.6
 14  02APR2004  7637.3  1337.9
 15  09APR2004  7667.4  1327.3
 16  16APR2004  7692.5  1321.8
 17  23APR2004  7698.4  1322.2
 18  30APR2004  7703.8  1331.6
 19  07MAY2004  7686.8  1342.5
 20  14MAY2004  7734.6  1325.5
 21  21MAY2004  7695.8  1330.1
 22  28MAY2004  7704.7  1337.7
 23  04JUN2004  7715.1  1329.0
 24  11JUN2004  7754.0  1324.4
 25  18JUN2004  7753.2  1336.4
 26  25JUN2004  7796.2  1345.8
 27  02JUL2004  7769.8  1351.4
 28  09JUL2004  7852.3  1330.1
 29  16JUL2004  7852.8  1326.3
 30  23JUL2004  7854.7  1323.5
 31  30JUL2004  7859.5  1340.6
 32  06AUG2004  7847.9  1337.3
 33  13AUG2004  7888.7  1340.1
 34  20AUG2004  7851.8  1347.3
 35  27AUG2004  7890.0  1360.8
 36  03SEP2004  7906.2  1353.7
 37  10SEP2004  7962.7  1338.3
 38  17SEP2004  7982.1  1345.6
 39  24SEP2004  7987.9  1359.7
 40  01OCT2004  7949.5  1366.0
   libname lib3 sasehavr "%sysget(HAVER_DATA)"
           freq=week.6 force=freq start=20040102 end=20041001 group="E*";

   data hwoutw;
      set lib3.haverw;
   run;

   title1 'Haver Analytics Database, Frequency=week.6, infile=haverw.dat';
   title2 ' Define a range inside the data range for OUT= dataset';
   title3 ' Using the START=20040102 END=20041001 LIBNAME options.';
   title4 ' Subset further: Using group="E*" LIBNAME option';

   proc print data=hwoutw;
   run;
Output 35.10.3  Using the GROUP=E* Option along with Defining a Range

   Haver Analytics Database, Frequency=week.6, infile=haverw.dat
   Define a range inside the data range for OUT= dataset
   Using the START=20040102 END=20041001 LIBNAME options.
   Subset further: Using group="E*" LIBNAME option

Obs  DATE        LICN
  1  02JAN2004  552.8
  2  09JAN2004  677.9
  3  16JAN2004  490.8
  4  23JAN2004  382.3
  5  30JAN2004  406.3
  6  06FEB2004  433.2
  7  13FEB2004  341.6
  8  20FEB2004  328.2
  9  27FEB2004  342.1
 10  05MAR2004  339.0
 11  12MAR2004  312.1
 12  19MAR2004  304.5
 13  26MAR2004  296.8
 14  02APR2004  304.2
 15  09APR2004  350.7
 16  16APR2004  335.0
 17  23APR2004  313.7
 18  30APR2004  283.2
 19  07MAY2004  292.8
 20  14MAY2004  297.1
 21  21MAY2004  294.0
 22  28MAY2004  304.1
 23  04JUN2004  308.2
 24  11JUN2004  312.4
 25  18JUN2004  322.5
 26  25JUN2004  318.7
 27  02JUL2004  349.9
 28  09JUL2004  444.5
 29  16JUL2004  394.4
 30  23JUL2004  315.7
 31  30JUL2004  282.1
 32  06AUG2004  291.5
 33  13AUG2004  268.0
 34  20AUG2004  272.1
 35  27AUG2004  275.2
 36  03SEP2004  273.7
 37  10SEP2004  250.6
 38  17SEP2004  275.8
 39  24SEP2004  282.7
 40  01OCT2004  279.6
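
The three LIBNAME statements in Example 35.10 differ only in the libref and the GROUP= value, so the repetition can be wrapped in a macro. The following is a minimal sketch under that assumption. The macro name %HAVERGROUP, its parameters, and the output data set names (hwoutw_lib1 and so on) are hypothetical and are not part of SAS/ETS software.

```sas
/* Sketch only: %HAVERGROUP is a hypothetical helper macro, not part of SAS/ETS. */
%macro havergroup(libref=, group=);
   /* Assign one libref per group, using the options from Example 35.10 */
   libname &libref sasehavr "%sysget(HAVER_DATA)"
           freq=week.6 force=freq
           start=20040102 end=20041001
           group="&group";

   /* Copy the selected series into a WORK data set named after the libref */
   data hwoutw_&libref;
      set &libref..haverw;
   run;

   title4 "Subset further: Using group=&group LIBNAME option";
   proc print data=hwoutw_&libref;
   run;
%mend havergroup;

%havergroup(libref=lib1, group=F*)
%havergroup(libref=lib2, group=M*)
%havergroup(libref=lib3, group=E*)
```

Writing each extract to its own data set, rather than reusing hwoutw as the example does, keeps all three cross sections available for later comparison.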
Data Elements Reference: SASEHAVR Interface Engine
HAVER ANALYTICS DLX Database Profile

The HAVER DLX economic and financial database offerings include United States Economic Indicators, Specialized Databases, Financial Indicators, Industry, Industrial Countries, Emerging Markets, International Organizations, Forecasts and As Reported Data, and United States Regional. Table 35.3 lists the available databases and a description of each.

Table 35.3  Available Data Offerings

Database Name Offering Type                   Description
USECON        U.S. Economic Indicators        HAVER Analytics’ primary database of US Economic, Financial data
USNA          U.S. Economic Indicators        Complete US National Income and Product Accounts from the Bureau of Economic Analysis
SURVEYS       U.S. Economic Indicators        Business and Consumer Expectations, surveys
SURVEYW       U.S. Economic Indicators        Business and Consumer Expectations, weekly surveys
CPIDATA       U.S. Economic Indicators        Consumer Price Indexes, monthly in CPI Detailed Report
PPI           U.S. Economic Indicators        Producer Price Indexes by Bureau of Labor Statistics
PPIR          U.S. Economic Indicators        Producer Price Indexes by Bureau of Labor Statistics
LABOR         U.S. Economic Indicators        Employment and Earnings by Bureau of Labor Statistics
EMPL          U.S. Economic Indicators        Household Employment Survey, monthly by Bureau of Labor Statistics
CEW           U.S. Economic Indicators        Covered Employment and Wages, monthly, quarterly
IP            U.S. Economic Indicators        Industrial Production and Capacity Utilization by Federal Reserve Board
FFUNDS        U.S. Economic Indicators        Flow of Funds Data by Federal Reserve Board
CAPSTOCK      U.S. Economic Indicators        Capital Stock by Bureau of Economic Analysis
USINT         U.S. Economic Indicators        U.S. International Trade (TIC) data by Country and Product
CBDB          Specialized Databases           Conference Board Database, monthly
BCI           Specialized Databases           US Business Cycle Indicators
UMSCA         Specialized Databases           Consumer Sentiment Survey from The University of Michigan Surveys of Consumers
FIBER         Specialized Databases           Fiber Business Cycle Indicators from the Foundation of International Business and Economic Research
ECRI          Specialized Databases           Cyclical Indicators from Economic Cycle Research Institute, weekly and monthly
DAILY         Financial Indicators            U.S. Daily Statistics
INTDAILY      Financial Indicators            Country Daily Statistics
WEEKLY        Financial Indicators            U.S. Weekly Statistics
INTWKLY       Financial Indicators            Country Weekly Statistics
SPD           Financial Indicators            Standard and Poor’s Industry Groups, daily
SPW           Financial Indicators            Standard and Poor’s Industry Groups, weekly
SPM           Financial Indicators            Standard and Poor’s Industry Groups, monthly
SPAH          Financial Indicators            Standard and Poor’s Analysts’ Annual Handbook, yearly
MSCID         Financial Indicators            Morgan Stanley Capital International, daily
MSCIW         Financial Indicators            Morgan Stanley Capital International, weekly
MSCIM         Financial Indicators            Morgan Stanley Capital International, monthly
BONDINDX      Financial Indicators            U.S. Bond Indexes
BONDS         Financial Indicators            CITIGROUP Bond Performance Indexes by CITIGROUP Global Markets, formerly Salomon Smith Barney
ICI           Financial Indicators            Mutual Fund Activity from the Investment Company Institute
QFR           Financial Indicators            Quarterly Financial Report
MBAMTG        Financial Indicators            Mortgage Delinquency Rates by Mortgage Bankers Association
DLINQ         Financial Indicators            Consumer Delinquency Rates by American Bankers Association, monthly
FDIC          Financial Indicators            FDIC Banking Statistics
GOVFIN        Financial Indicators            U.S. Government Financial Statistics by U.S. Treasury
INDUSTRY      Industry                        U.S. Industry Statistics
REALTOR       Industry                        Home Sales from National Association of Realtors
OGJ           Industry                        U.S. and International Energy Statistics, from Pennwell Publishing’s Oil and Gas Journal
OGJANN        Industry                        U.S. and International Energy Statistics, from Pennwell Publishing’s Oil and Gas Journal, annual
OILWKLY       Industry                        Weekly Oil Statistics
EEI           Industry                        U.S. Electric Output from the Edison Electric Institute, weekly
ASM           Industry                        Annual Survey of Manufactures
RAILSHAR      Industry                        Railcar Loadings from Association of American Railroads and Atlantic Systems
CHEMWEEK      Industry                        Weekly Chemical Prices from the Chemical Division of Access Intelligence
BALTIC        Industry                        Baltic Freight Indexes, from the Baltic Exchange in London
G10+          Industrial Countries            International Macroeconomic Data
PMI           Industrial Countries            Purchasing Managers Surveys by NTC Research
INTSRVYS      Industrial Countries            Country Surveys
JAPAN         Industrial Countries            Japan from Nomura Research Institute
JAPANW        Industrial Countries            Japan from Nomura Research Institute, weekly
CANSIM        Industrial Countries            Canada from Statistics Canada and the Bank of Canada
CANSIMR       Industrial Countries            Canada from Statistics Canada and the Bank of Canada
UK            Industrial Countries            United Kingdom
UKSRVYS       Industrial Countries            United Kingdom Surveys, by NTC Research
GERMANY       Industrial Countries            Germany from the Deutsche Bundesbank and Statistisches Bundesamt
FRANCE        Industrial Countries            France, Statistics from INSEE, the Bank of France and the Ministry of France
ITALY         Industrial Countries            Italy, from Istituto Nazionale di Statistica and Banca d’Italia
SPAIN         Industrial Countries            Spain, from the Instituto Nacional de Estadistica and the Banco de Espana
IRELAND       Industrial Countries            Ireland, from the Central Statistics Office and Central Bank
NORDIC        Industrial Countries            Norway, Sweden, Denmark, Finland
ALPMED        Industrial Countries            Austria, Switzerland, Greece, Portugal
BENELUX       Industrial Countries            Belgium, Netherlands, Luxembourg, monthly
ANZ           Industrial Countries            Australia and New Zealand
EMERGELA      Emerging Markets                Latin American Macroeconomic Data
EMERGEPR      Emerging Markets                Asia/Pacific Rim Emerging Markets
CHINA         Emerging Markets                CEIC Premium China Database, from CEIC
EMERGECW      Emerging Markets                Central and Eastern Europe and Western Asia
EMERGEMA      Emerging Markets                Middle East and African Emerging Markets
EUROSTAT      International Organizations     European Union Data from EUROSTAT and OECD
OECDMEI       International Organizations     OECD Main Economic Indicators
OECDNAQ       International Organizations     OECD Quarterly National Accounts
OECDNA        International Organizations     OECD Annual National Accounts
OECDLFS       International Organizations     OECD Quarterly Labor Force
OUTLOOK       International Organizations     OECD Economic Outlook
IFS           International Organizations     International Financial Statistics from International Monetary Fund
IFSANN        International Organizations     International Financial Statistics, annual from International Monetary Fund
IMFDOTM       International Organizations     Direction of Trade Statistics, monthly from International Monetary Fund
IMFDOT        International Organizations     Direction of Trade Statistics from International Monetary Fund
WBPRICES      International Organizations     World Commodity Prices from World Development Prospects Group (Pinksheets)
WBDEBT        International Organizations     Global Development Finance from World Bank Debt Tables
UNPOP         International Organizations     United Nations Population Projections
INTCOMP       International Organizations     International Comparisons from Bureau of Labor Statistics
MA4CAST       Forecasts and As Reported Data  Short Term U.S. Economic Forecasts from Macroeconomic Advisors
MA4CSTL       Forecasts and As Reported Data  Long Term U.S. Economic Forecasts from Macroeconomic Advisors
CQM           Forecasts and As Reported Data  Canadian Quarterly Model from Centre for Spatial Economics
CPM           Forecasts and As Reported Data  Canadian Provincial Model from Centre for Spatial Economics
OEFQMACR      Forecasts and As Reported Data  OEF Global Macroeconomic Forecasts from Oxford Economic Forecasting
OEFMAJOR      Forecasts and As Reported Data  OEF Global Macroeconomic Forecasts from Oxford Economic Forecasting
OEFINTER      Forecasts and As Reported Data  OEF Global Macroeconomic Forecasts from Oxford Economic Forecasting
OEFMINOR      Forecasts and As Reported Data  OEF Global Macroeconomic Forecasts from Oxford Economic Forecasting
OEFQIND       Forecasts and As Reported Data  OEF Global Industry from Oxford Economic Forecasting
EIUIAMER      Forecasts and As Reported Data  EIU Market Indicators and Forecasts from the Economist Intelligence Unit
EIUIASIA      Forecasts and As Reported Data  EIU Market Indicators and Forecasts from the Economist Intelligence Unit
EIUIEEUR      Forecasts and As Reported Data  EIU Market Indicators and Forecasts from the Economist Intelligence Unit
EIUIMENA      Forecasts and As Reported Data  EIU Market Indicators and Forecasts from the Economist Intelligence Unit
EIUISUBS      Forecasts and As Reported Data  EIU Market Indicators and Forecasts from the Economist Intelligence Unit
EIUIWEUR      Forecasts and As Reported Data  EIU Market Indicators and Forecasts from the Economist Intelligence Unit
EIUIREGS      Forecasts and As Reported Data  EIU Market Indicators and Forecasts from the Economist Intelligence Unit
EIUDAMER      Forecasts and As Reported Data  EIU Country Data from the Economist Intelligence Unit
EIUDASIA      Forecasts and As Reported Data  EIU Country Data from the Economist Intelligence Unit
EIUDEEUR      Forecasts and As Reported Data  EIU Country Data from the Economist Intelligence Unit
EIUDMENA      Forecasts and As Reported Data  EIU Country Data from the Economist Intelligence Unit
EIUDSUBS      Forecasts and As Reported Data  EIU Country Data from the Economist Intelligence Unit
EIUDWEUR      Forecasts and As Reported Data  EIU Country Data from the Economist Intelligence Unit
EIUDOECD      Forecasts and As Reported Data  EIU Country Data from the Economist Intelligence Unit
EIUDREGS      Forecasts and As Reported Data  EIU Country Data from the Economist Intelligence Unit
AS1REPNA      Forecasts and As Reported Data  Action Economics Forecast Medians and As Reported Data
MMSAMER       Forecasts and As Reported Data  MMS Survey Medians and As First Reported Data from MMS International
MMSEUR        Forecasts and As Reported Data  MMS Survey Medians and As First Reported Data from MMS International
SURVEYS       Forecasts and As Reported Data  Economic Survey Forecasts
AS4CAST       Forecasts and As Reported Data  Historical Economic Forecasts
ASREPGDP      Forecasts and As Reported Data  As Reported U.S. Gross Domestic Product from Bureau of Economic Analysis
LABORR        U.S. Regional                   Monthly Payroll Employment from Bureau of Labor Statistics
EMPLR         U.S. Regional                   Labor Force and Unemployment from Bureau of Labor Statistics
EMPLC         U.S. Regional                   Labor Force and Unemployment from Bureau of Labor Statistics
BEAEMPL       U.S. Regional                   Annual Employment by Industry
BEAEMPM       U.S. Regional                   Annual Employment by Industry
PERMITS       U.S. Regional                   Residential Building Permits
PERMITY       U.S. Regional                   Residential Building Permits
PERMITP       U.S. Regional                   Residential Building Permits
PERMITC       U.S. Regional                   Residential Building Permits
PERMITA       U.S. Regional                   Residential Building Permits
REGIONAL      U.S. Regional                   Selected Regional Indicators
REGIONAW      U.S. Regional                   Selected Regional Indicators
PIQR          U.S. Regional                   Personal Income
PIR           U.S. Regional                   Personal Income
PIRMSA        U.S. Regional                   Personal Income
PICOUNTY      U.S. Regional                   Personal Income
PIRC1 to 9    U.S. Regional                   Personal Income
MBAMTG        U.S. Regional                   Mortgage Delinquency Rates from Mortgage Bankers Association
DLINQR        U.S. Regional                   Consumer Delinquency Rates from American Bankers Association
BANKRUPT      U.S. Regional                   Bankrupts by County and MSA
GSP           U.S. Regional                   Gross State Product from BEA
USPOP         U.S. Regional                   Population by Age and Sex
USPOPC        U.S. Regional                   Population by Age and Sex
PORTS         U.S. Regional                   Trade by Port
EXPRQ1 to 9   U.S. Regional                   Exports by Industry and Country from the World Institute for Strategic Economic Research and the Census Bureau
EXPORTSR      U.S. Regional                   Exports by Industry and Country from the World Institute for Strategic Economic Research and the Census Bureau
EXPORT99      U.S. Regional                   Exports by Industry and Country from the World Institute for Strategic Economic Research and the Census Bureau
GOVFINR       U.S. Regional                   Government Financial Statistics from the Census Bureau and Rockefeller Institute of Government
FDICR         U.S. Regional                   FDIC Banking Statistics
References

HAVER ANALYTICS (2001), DLX API Programmer’s Reference, New York, NY 10165. [http://www.haver.com/]

HAVER ANALYTICS (2005), DLX Database Profile, New York, NY 10165.

HAVER ANALYTICS (2005), DATA LINK EXPRESS, Time Series Data Base Management System, New York, NY 10165. [http://www.haver.com/]
Acknowledgments

Many people have been instrumental in the development of the ETS Interface engine. The individuals listed here have been especially helpful.

Maurine Haver, HAVER ANALYTICS, New York, NY.
Lai Cheng, HAVER ANALYTICS, New York, NY.
Rick Langston, SAS Institute, Cary, NC.

The final responsibility for the SAS System lies with SAS Institute alone. We hope that you will always let us know your opinions about the SAS System and its documentation. It is through your participation that SAS software is continuously improved.
Part IV
Time Series Forecasting System
Chapter 36
Overview of the Time Series Forecasting System

Contents
   Introduction
   Using the Time Series Forecasting System
   SAS Software Products Needed
Introduction

The Time Series Forecasting system forecasts future values of time series variables by extrapolating trends and patterns in the past values of the series or by extrapolating the effect of other variables on the series. The system provides convenient point-and-click windows to control the time series analysis and forecasting tools of SAS/ETS software. You can use the system in a fully automatic mode, or you can use the system’s diagnostic features and time series modeling tools interactively to develop forecasting models customized to best predict your time series. The system provides both graphical and statistical features to help you choose the best forecasting method for each series.

The following is a brief summary of the features of the Time Series Forecasting system. You can use the system in the following ways:

- use a wide variety of forecasting methods, including several kinds of exponential smoothing models, Winters method, and ARIMA (Box-Jenkins) models. You can also produce forecasts by combining the forecasts from several models.
- use predictor variables in forecasting models. Forecasting models can include time trend curves, regressors, intervention effects (dummy variables), adjustments you specify, and dynamic regression (transfer function) models.
- view plots of the data, predicted versus actual values, prediction errors, and forecasts with confidence limits, as well as autocorrelations and results of white noise and stationarity tests. Any of these plots can be zoomed and can represent raw or transformed series.
- use hold-out samples to select the best forecasting method
- compare goodness-of-fit measures for any two forecasting models side by side or list all models sorted by a particular fit statistic
- view the predictions and errors for each model in a spreadsheet or compare the fit of any two models in a spreadsheet
- examine the fitted parameters of each forecasting model and their statistical significance
- control the automatic model selection process: the set of forecasting models considered, the goodness-of-fit measure used to select the best model, and the time period used to fit and evaluate models
- customize the system by adding forecasting models for the automatic model selection process and for point-and-click manual selection
- save your work in a project catalog
- print an audit trail of the forecasting process
- show source statements for PROC ARIMA code
- save and print system output including spreadsheets and graphs
Using the Time Series Forecasting System

Chapter 37, “Getting Started with Time Series Forecasting,” through Chapter 41, “Using Predictor Variables,” contain a series of example sessions that show the major features of the system. Chapter 42, “Command Reference,” through Chapter 44, “Forecasting Process Details,” serve as reference and provide more details about how the system operates. The reference chapters contain a complete list of system features.

To get started using the Time Series Forecasting system, it is a good idea to work through a few of the example sessions. Start with Chapter 37, “Getting Started with Time Series Forecasting,” and use the system to reproduce the steps shown in the examples. Continue with the other chapters when you feel comfortable using the system.

The example sessions make use of time series data sets contained in the SASHELP library: air, citimon, citiqtr, citiyr, citiwk, citiday, gnp, retail, usecon, and workers. You can use these data sets to work through the example sessions or to experiment further with the system.

Once you are familiar with how the system operates, start working with your own data to build your own forecasting models. When you have questions, consult the reference chapters mentioned above for more information about particular features.

The Time Series Forecasting system forecasts time series, that is, variables that consist of ordered observations taken at regular intervals over time. Since the Time Series Forecasting system is a part of the SAS software system, time series values must be stored as variables in a SAS data set or data view, with the observations representing the time periods. The data can also be stored in an external spreadsheet or database if you license SAS/ACCESS software.

The Time Series Forecasting System chapters refer to series and variables. Since time series are stored as variables in SAS data sets or data views, these terms are used interchangeably. However, the term series is preferred when attention is focused on the sequence of data values, and the term variable is preferred when attention is focused on the data set.
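
To make the data layout concrete, the following is a minimal sketch of a data set in the expected shape: one observation per time period, a date-valued ID variable, and one variable per series. The data set name WORK.DEMO, the variable names, and the values are made up for illustration. The second step simply prints the first few observations of SASHELP.AIR, one of the sample data sets named in this section.

```sas
/* Sketch only: WORK.DEMO and its values are made up for illustration. */
data work.demo;
   format date monyy7.;
   do i = 0 to 23;
      date  = intnx('month', '01jan2004'd, i);   /* first day of each month */
      sales = 100 + 2*i;                         /* placeholder series values */
      output;
   end;
   drop i;
run;

/* Inspect one of the SASHELP sample data sets used in the example sessions */
proc print data=sashelp.air(obs=5);
run;
```

Here each observation of WORK.DEMO is one month, date is the time ID variable, and sales is the series to be forecast.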
SAS Software Products Needed

The Time Series Forecasting system is part of SAS/ETS software. To use it, you must have a license for SAS/ETS. To use the graphical display features of the system, you must also license SAS/GRAPH software.
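
One way to check which products are licensed at your site is the SETINIT procedure, shown here as a minimal sketch. It is offered as a general SAS facility rather than as part of the Time Series Forecasting system, and the exact product names that appear in the log can vary by release.

```sas
/* Writes the site's licensed products and their expiration dates to the SAS log. */
/* Look for the SAS/ETS and SAS/GRAPH entries in the listing.                     */
proc setinit;
run;
```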
Chapter 37
Getting Started with Time Series Forecasting

Contents
   The Time Series Forecasting Window
   Outline of the Forecasting Process
      Specify the Input Data Set
      Provide a Valid Time ID Variable
      Select and Fit a Forecasting Model for Each Series
      Produce the Forecasts
      Save Your Work
      Summary
   The Input Data Set
      The Data Set Selection Window
      Time Series Data Sets, ID Variables, and Time Intervals
   Automatic Model Fitting Window
   Produce Forecasts Window
      The Forecast Data Set
   Forecasting Projects
      Saving and Restoring Project Information
      Sharing Projects
   Develop Models Window
      Introduction
      Fitting Models
      Model List and Statistics of Fit
   Model Viewer
      Prediction Error Plots
      Autocorrelation Plots
      White Noise and Stationarity Plots
      Parameter Estimates Table
      Statistics of Fit Table
      Changing to a Different Model
      Forecasts and Confidence Limits Plots
      Data Table
      Closing the Model Viewer
2384 F Chapter 37: Getting Started with Time Series Forecasting
This chapter outlines the forecasting process and introduces the major windows of the system through three example sessions. The first example, beginning with the section “The Time Series Forecasting Window,” shows how to use the system for fully automated forecasting of a set of time series. This example also introduces the system’s features for viewing data and forecasts through tables and interactive graphs. It also shows how to save and restore forecasting work in SAS catalogs. The second example, beginning with the section “Develop Models Window,” introduces the features for developing the best forecasting models for individual time series. The chapter concludes with an example showing how to create dating variables for your data in the form expected by the system.

After working through the examples in this chapter, you should be able to do the following:

- select a data set of time series to work with and specify its periodicity and time ID variable
- use the automatic forecasting model selection feature to create forecasting models for the variables in a data set
- produce and save forecasts of variables in a data set
- examine your data and forecasts as tables of values and through interactive graphs
- save and restore your forecasting models by using project files in a SAS catalog and edit project information
- use some of the model development features to fit and select forecasting models for individual time series variables

This chapter introduces these topics and helps you get started using the system. Later chapters present these topics in greater detail and document more advanced features and options.
The Time Series Forecasting Window

There are several ways to get to the Time Series Forecasting System. If you prefer to use commands, invoke the system by entering forecast on the command line. You can optionally specify additional information on the command line; see Chapter 42, “Command Reference,” for details.

If you are using the SAS windowing environment with pull-down menus, select the Solutions menu from the menu bar, select the Analysis item, and then select Time Series Forecasting System, as shown in Figure 37.1.
Figure 37.1 Time Series Forecasting System Menu Selection
You can invoke the Forecasting System from the SAS Explorer window by opening an existing forecasting project. By default these projects are stored in the FMSPROJ catalog in the SASUSER library. Select SASUSER in the Explorer to display its contents. Then select FMSPROJ. This catalog is created the first time you use the Forecasting System. If you have saved projects, they appear in the Explorer with the forecasting graph icon, as shown in Figure 37.2. Double-click one of the projects, or select it with the right mouse button and then select Open from the pop-up menu, as shown in the figure. This opens the Forecasting System and opens the selected project.
Figure 37.2 Opening a Project from the Explorer
To invoke the Forecasting System in the SAS desktop environment, select the Solutions menu from the menu bar, select Desktop, and then open the Analysis folder. You can run the Time Series Forecasting System or the Time Series Viewer directly, or you can drag and drop. Figure 37.3 illustrates dragging a data set (known as a table in the Desktop environment) and dropping it on the Forecasting icon. In this example, the tables reside in a user-defined folder called Time Series Data.
Figure 37.3 Drag and Drop on the SAS Desktop
If you are using SAS/ASSIST software, select the Planning button and then select Forecasting from the pop-up menu. Any of these methods takes you to the Time Series Forecasting window, as shown in Figure 37.4.
Figure 37.4 Time Series Forecasting Window
At the top of the window is a data selection area for specifying a project file and the input data set containing historical data (the known past values) for the time series variables that you want to forecast. This area also contains buttons for opening viewers to explore your input data either graphically, one series at a time, or as a table, one data set at a time.

The Project and Description fields are used to specify a project file for saving and restoring forecasting models created by the system. Using project files is discussed later, and these fields are ignored for now.

The lower part of the window contains six buttons:

Develop Models
    opens the Develop Models window, which you use to develop and fit forecasting models interactively for individual time series.

Fit Models Automatically
    opens the Automatic Model Fitting window, which you use to search automatically for the best forecasting model for multiple series in the input data set.

Produce Forecasts
    opens the Produce Forecasts window, which you use to compute forecasts for all the variables in the input data set for which forecasting models have been fit.

Manage Projects
    opens the Manage Forecasting Project window, which lists the time series for which you have fit forecasting models. You can drill down on a series to see the models that have been fit. You can delete series or models from the project, re-evaluate or refit models, and explore models and forecasts graphically or in tabular form.

Exit
    exits the Forecasting System.

Help
    displays information about the Forecasting System.
Outline of the Forecasting Process

The examples shown in the following sections illustrate the basic process you use with the Forecasting System.

Specify the Input Data Set

Suppose you have a number of time series, variables recorded over time, for which you want to forecast future values. The past values of these time series are stored as variables in a SAS data set or data view. The observations of this data set correspond to regular time periods, such as days, weeks, or months. The first step in the forecasting process is to tell the system to use this data set by setting the Data Set field.

If your time series are not in a SAS data set, you must provide a way for the SAS System to access the data.

- You can use SAS features to read your data into a SAS data set; refer to SAS Language Reference.
- You can use a SAS/ACCESS product to establish a view of data in a database management system; refer to SAS/ACCESS documentation.
- You can use PROC SQL to create a SAS data view.
- You can use PROC DATASOURCE to read data from files supplied by supported data vendors; refer to Chapter 11, “The DATASOURCE Procedure,” for more details.

Provide a Valid Time ID Variable

To use the Forecasting System, your data set must be dated: the data set must contain a time ID variable that gives the date of each observation. The time ID variable must represent the observation dates with SAS date values or with SAS datetime values (for hourly data or other frequencies less than a day), or you can use a simple time index.
When SAS date values are used, the ID variable contains dates within the time periods corresponding to the observations. For example, for monthly data, the values for the time ID variable can be the date of the first day of the month corresponding to each observation, or the time ID variable can contain the date of the last day in the month. (Any date within the period serves as the time ID for the observation.) If your data set already contains a valid time ID variable with SAS date or datetime values, the next step is to specify this time ID variable in the Time ID field. If the time ID variable is named DATE, the system fills in the Time ID field automatically. If your data set does not contain a time ID, you must add a valid time ID variable before beginning the forecasting process. The Forecasting System provides features that make this easy to do. See Chapter 38, “Creating Time ID Variables,” for details.
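As background for the date values discussed above: a SAS date value is the number of days since 1 January 1960, and a SAS datetime value is the number of seconds since midnight on that date. The following Python sketch is not part of the Forecasting System, and its function names are illustrative only. It shows the two encodings and why the first and the last day of a month are equally valid time IDs for the same monthly observation.

```python
from datetime import date, datetime

SAS_EPOCH = date(1960, 1, 1)  # the day that has SAS date value 0


def to_sas_date(d: date) -> int:
    """Number of days from 1 January 1960 to d."""
    return (d - SAS_EPOCH).days


def to_sas_datetime(dt: datetime) -> float:
    """Number of seconds from midnight, 1 January 1960, to dt."""
    return (dt - datetime(1960, 1, 1)).total_seconds()


def same_month(a: date, b: date) -> bool:
    """True if both dates fall within the same monthly period."""
    return (a.year, a.month) == (b.year, b.month)


first_day = date(1996, 1, 1)
last_day = date(1996, 1, 31)

# The two dates have different SAS date values but identify the same
# monthly period, so either one can serve as the time ID.
print(to_sas_date(first_day), to_sas_date(last_day))
print(same_month(first_day, last_day))
```
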
Select and Fit a Forecasting Model for Each Series

If you are using the automated model selection feature, the system performs this step for you and chooses a forecasting model for each series automatically. All you need to do is select the Fit Models Automatically button and then select the variables to fit models for.

If you want more control over forecasting model selection, you can select the Develop Models button, select the series you want to forecast, and use the Develop Models window to specify a forecasting model. As part of this process, you can use the Time Series Viewer and Model Viewer graphical tools. Once you have selected a model for the first series, you can select a different series to work with and repeat the model development process until you have created forecasting models for all the series you want to forecast.

The system provides many features to help you choose the best forecasting model for each series. The features of the Develop Models window and graphical viewer tools are introduced in later sections.

Produce the Forecasts

Once a forecasting model has been fit for each series, select the Produce Forecasts button and use the Produce Forecasts window to compute forecast values and store them in a SAS data set.

Save Your Work

If you want only a single forecast, your task is now complete. But you might want to produce updated forecasts later, as more data becomes available. In this case, you want to save the forecasting models you have created, so that you do not need to repeat the model selection and fitting process.
To save your work, fill in the Project field with the name of a SAS catalog member in which the system will store the model information when you exit the system. Later, you will select the same catalog member name when you first enter the Forecasting System, and the model information will be reloaded. Note that any number of people can work with the same project file. If you are working on a forecasting project as part of a team, you should take care to avoid conflicting updates to the project file by different team members.
Summary

This is the basic outline of how the Forecasting System works. The system offers many other features and options that you might need to use (for example, the time range of the data used to fit models and how far into the future to forecast). These options will become apparent as you work with the Forecasting System.

As an introductory example, the following sections use the Automatic Model Fitting and Produce Forecasts windows to perform automated forecasting of the series in an example data set.

The Input Data Set

As the first step, you must specify the input data set. The Data Set field in the Time Series Forecasting window gives the name of the input data set containing the time series to forecast. Initially, this field is blank. You can specify the input data set by typing the data set name in this field. Alternatively, you can select the Browse button at the right of the Data Set field to select the data set from a list, as shown in the following section.

The Data Set Selection Window

Select the Browse button to the right of the Data Set field. This opens the Data Set Selection window, as shown in Figure 37.5.
Figure 37.5 Data Set Selection Window
The Libraries list shows the SAS librefs that are currently allocated in your SAS session. Initially, the SASUSER library is selected, and the SAS Data Sets list shows the data sets available in your SASUSER library. In the Libraries list, select the row that starts with SASHELP. The Data Set Selection window now lists the data sets in the SASHELP library, as shown in Figure 37.6.
Figure 37.6 SASHELP Library
Use the vertical scroll bar on the SAS Data Sets list to scroll down the list until the data set CITIQTR appears. Then select the CITIQTR row. This selects the data set SASHELP.CITIQTR as the input data set. Figure 37.7 shows the Data Set Selection window after selection of CITIQTR from the SAS Data Sets list.
Figure 37.7 CITIQTR Data Set Selected
Note that the Time ID field is now set to DATE and the Interval field is set to QTR. These fields are explained in the following section. Now select the OK button to complete selection of the CITIQTR data set. This closes the Data Set Selection window and returns to the Time Series Forecasting window, as shown in Figure 37.8.
Figure 37.8 Time Series Forecasting Window
Time Series Data Sets, ID Variables, and Time Intervals

Before you continue with the example, it is worthwhile to consider how the system determined the values for the Time ID and Interval fields in the Data Set Selection window.

The Forecasting System requires that the input data set contain time series observations, with one observation for each time period. The observations must be sorted in increasing time order, and there must be no gaps in the sequence of observations. The time period of each observation must be identified by an ID variable, which is shown in the Time ID field.

If the data set contains a variable named DATE, TIME, or DATETIME, the system assumes that this variable is the SAS date or datetime valued ID variable, and the Time ID field is filled in automatically. The time ID variable for the SASHELP.CITIQTR data set is named DATE, and therefore the system set the Time ID field to DATE.

If the time ID variable for a data set is not named DATE, TIME, or DATETIME, you must specify the time ID variable name. You can specify the time ID variable either by typing the ID variable name in the Time ID field or by clicking the Select button.
If your data set does not contain a time ID variable with SAS date values, you can add a time ID variable using one of the windows described in Chapter 38, “Creating Time ID Variables.” Once the time ID variable is known, the Forecasting System examines the ID values to determine the time interval between observations. The data set SASHELP.CITIQTR contains quarterly observations. Therefore, the system determined that the data have a quarterly interval, and set the Interval field to QTR. If the system cannot determine the data frequency from the values of the time ID variable, you must specify the time interval between observations. You can specify the time interval by using the Interval combo box. In addition to the interval names provided in the pop-up list, you can type in more complex interval names to specify an interval that is a multiple of other intervals or that has date values in the middle of the interval (such as monthly data with time ID values falling on the 10th day of the month). See Chapter 3, “Working with Time Series Data,” and Chapter 4, “Date Intervals, Formats, and Functions,” for more information about time intervals, SAS date values, and ID variables for time series data sets.
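The requirements above (one observation per period, increasing time order, no gaps) and the idea of inferring the interval from the ID values can be sketched in Python. This is only an illustration under stated assumptions: it is not how the Forecasting System is implemented, it considers only three candidate intervals, and the function names are hypothetical.

```python
from datetime import date

# Length of each candidate interval in months (illustrative subset only).
MONTHS_PER_PERIOD = {"MONTH": 1, "QTR": 3, "YEAR": 12}


def period_index(d, interval):
    """Ordinal number of the period of the given interval that contains d."""
    months = d.year * 12 + (d.month - 1)
    return months // MONTHS_PER_PERIOD[interval]


def is_regular(dates, interval):
    """True if the dates are increasing, one per period, with no gaps."""
    idx = [period_index(d, interval) for d in dates]
    return all(b - a == 1 for a, b in zip(idx, idx[1:]))


def guess_interval(dates):
    """Return the first candidate interval that fits the dates, or None."""
    if len(dates) < 2:
        return None  # a single observation does not determine an interval
    for interval in ("MONTH", "QTR", "YEAR"):
        if is_regular(dates, interval):
            return interval
    return None


quarterly = [date(1990, 1, 1), date(1990, 4, 1), date(1990, 7, 1), date(1990, 10, 1)]
with_gap = [date(1990, 1, 1), date(1990, 4, 1), date(1990, 10, 1)]

print(guess_interval(quarterly))  # a quarterly interval fits these dates
print(guess_interval(with_gap))   # no candidate fits because a quarter is missing
```

Because `period_index` looks only at the period that contains each date, ID values that fall in the middle of a period pass the same check as values on the first day of the period.
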
Automatic Model Fitting Window

Before you can produce forecasts, you must fit forecasting models to the time series. Select the Fit Models Automatically button. This opens the Automatic Model Fitting window, as shown in Figure 37.9.
Figure 37.9 Automatic Model Fitting Window
The first part of the Automatic Model Fitting window confirms the project filename and the input data set name. The Series to Process field shows the number and lists the names of the variables in the input data set to which the Automatic Model Fitting process will be applied. By default, all numeric variables (except the time ID variable) are processed. However, you can specify that models be generated for only a select subset of these variables. Click the Select button to the right of the Series to Process field. This opens the Series to Process window, as shown in Figure 37.10.
Figure 37.10 Series to Process Window
Use the mouse and the CTRL key to select the personal consumption expenditures series (GC), the personal consumption expenditures for durable goods series (GCD), and the disposable personal income series (GYD), as shown in Figure 37.11. (Remember to hold down the CTRL key as you make the selections; otherwise, selecting a second series will deselect the first.)
Figure 37.11 Selecting Series for Automatic Model Fitting
Now select the OK button. This returns you to the Automatic Model Fitting window. The Series to Process field now shows the selected variables. The Selection Criterion field shows the goodness-of-fit measure that the Forecasting System will use to select the best fitting model for each series. By default, the selection criterion is the root mean squared error. To illustrate how you can control the selection criterion, this example uses the mean absolute percent error to select the best fitting models. Click the Select button to the right of the Selection Criterion field. This opens a list of statistics of fit, as shown in Figure 37.12.
Figure 37.12 Choosing the Model Selection Criterion
Select Mean Absolute Percent Error and then select the OK button. The Automatic Model Fitting window now appears as shown in Figure 37.13.
Figure 37.13 Automatic Model Fitting Window
Now that all the options are set appropriately, select the Run button. The Forecasting System now displays a notice, shown in Figure 37.14, confirming that models will be fit for three series using the automatic forecasting model search feature. This prompt is displayed because it is possible to fit models for a large number of series at once, which might take a lot of time. So the system gives you a chance to cancel if you accidentally ask to fit models for more series than you intended. Select the OK button.
Figure 37.14 Automatic Model Fitting Note
The system now fits several forecasting models to each of the three series you selected. While the models are being fit, the Forecasting System displays notices indicating what it is doing so that you can observe its progress, as shown in Figure 37.15.
Figure 37.15 “Working” Notice
For each series, the system saves the model that produces the smallest mean absolute percent error. You can have the system save all the models fit by selecting Automatic Fit from the Options menu. After the Automatic Model Fitting process has completed, the results are displayed in the Automatic Model Fitting Results window, as shown in Figure 37.16.
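The choice of selection criterion matters because different statistics of fit can rank the same candidate models differently. The following Python sketch uses the textbook definitions of root mean squared error and mean absolute percent error; the exact computations in the Forecasting System (for example, the range of observations used) may differ, and the model names and numbers are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percent error (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def select_best(actual, candidates, criterion):
    """Name of the candidate model with the smallest criterion value."""
    return min(candidates, key=lambda name: criterion(actual, candidates[name]))


actual = [10.0, 100.0]
candidates = {
    "model_a": [12.0, 100.0],  # small absolute error on a small value
    "model_b": [10.0, 103.0],  # larger absolute error on a large value
}

print(select_best(actual, candidates, rmse))  # model_a has the smaller RMSE
print(select_best(actual, candidates, mape))  # model_b has the smaller MAPE
```

Here the squared-error criterion favors the model with the smaller absolute miss, while the percent-error criterion favors the model whose miss is small relative to the size of the value being forecast.
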
Figure 37.16 Automatic Model Fitting Results
This resizable window shows the list of series names and descriptive labels for the forecasting models chosen for them, as well as the values of the model selection criterion and other statistics of fit. Select the Close button. This returns you to the Automatic Model Fitting window. You can now fit models for other series in this data set or change to a different data set and fit models for series in the new data set. Select the Close button to return to the Time Series Forecasting window.
Produce Forecasts Window

Now that you have forecasting models for these three series, you are ready to produce forecasts. Select the Produce Forecasts button. This opens the Produce Forecasts window, as shown in Figure 37.17.
Figure 37.17 Produce Forecasts Window
The Produce Forecasts window shows the input data set information and indicates the variables in the input data set for which forecasting models exist. Forecasts will be produced for these series. If you want to produce forecasts for only some of these series, use the Select button at the right of the Series field to select the series to forecast. The Data Set field in the Forecast Output box contains the name of the SAS data set in which the system will store the forecasts. The default output data set is WORK.FORECAST. You can set the forecast horizon by using the controls on the line labeled Horizon. The default horizon is 12 periods. You can change it by specifying the number of periods, number of years, or the date of the last forecast period. Position the cursor in the date field and change the forecast ending date to 1 January 1996 by typing jan1996 and pressing the ENTER key. The window now appears as shown in Figure 37.18.
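The horizon can be stated either as a number of periods or as the date of the last forecast period, and the two are related by simple period arithmetic. The Python sketch below illustrates that relationship for quarterly data. The date of the last historical observation is an assumption made for the example (it is not taken from the data set), and the helper names are hypothetical.

```python
from datetime import date


def add_periods(d, n, months_per_period):
    """First day of the period that lies n periods after the period containing d."""
    months = d.year * 12 + (d.month - 1)
    start = months // months_per_period * months_per_period
    target = start + n * months_per_period
    return date(target // 12, target % 12 + 1, 1)


def periods_between(last_obs, end, months_per_period):
    """Number of forecast periods from the period after last_obs through end."""
    def index(d):
        return (d.year * 12 + (d.month - 1)) // months_per_period
    return index(end) - index(last_obs)


QTR = 3  # months in a quarter

# Assumed date of the last historical observation, for illustration only.
last_obs = date(1991, 10, 1)

# Horizon given as a number of periods (12 is the default mentioned above).
print(add_periods(last_obs, 12, QTR))

# Horizon given as the date of the last forecast period.
print(periods_between(last_obs, date(1996, 1, 1), QTR))
```
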
Figure 37.18 Produce Forecasts Window
Now select the Run button to produce the forecasts. The system indicates that the forecasts have been stored in the output data set. Select OK to dismiss the notice.
The Forecast Data Set

The Forecasting System can save the forecasts to a SAS data set in three different formats. Depending on your needs, you might find one of these output formats more convenient. The output data set format is controlled by the Format combo box. You can select the following output formats. The simple format is the default.

Simple
    The data set contains time ID variables and the forecast variables, and it contains one observation per time period. Observations for earlier time periods contain actual values copied from the input data set; later observations contain the forecasts.

Interleaved
    The data set contains time ID variables, the variable TYPE, and the forecast variables. There are several observations per time period, with the meaning of each observation identified by the TYPE variable.

Concatenated
    The data set contains the variable SERIES, time ID variables, and the variables ACTUAL, PREDICT, ERROR, UPPER, LOWER, and STD. There is one observation per time period per forecast series. The variable SERIES contains the name of the forecast series, and the data set is sorted by SERIES and DATE.
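To make the difference between the three layouts concrete, the following Python sketch builds each of them from the same small set of results. It is a rough illustration of the row shapes described above, not the system's own output routine: the numbers are made up, the order of the TYPE rows within a period is an assumption, and the variable names are used exactly as they appear in the text.

```python
TYPES = ("ACTUAL", "PREDICT", "ERROR", "UPPER", "LOWER", "STD")


def to_simple(dates, results):
    """One row per period: the actual value where known, otherwise the forecast."""
    rows = []
    for i, d in enumerate(dates):
        row = {"DATE": d}
        for series, values in results.items():
            actual = values["ACTUAL"][i]
            row[series] = actual if actual is not None else values["PREDICT"][i]
        rows.append(row)
    return rows


def to_interleaved(dates, results):
    """Several rows per period, each labeled by TYPE, one column per series."""
    rows = []
    for i, d in enumerate(dates):
        for kind in TYPES:
            row = {"DATE": d, "TYPE": kind}
            for series, values in results.items():
                row[series] = values[kind][i]
            rows.append(row)
    return rows


def to_concatenated(dates, results):
    """One row per period per series, sorted by SERIES and then DATE."""
    rows = []
    for series in sorted(results):
        for i, d in enumerate(dates):
            row = {"SERIES": series, "DATE": d}
            row.update({kind: results[series][kind][i] for kind in TYPES})
            rows.append(row)
    return rows


# Made-up values for two periods: one historical, one in the forecast period.
dates = ["1991Q4", "1992Q1"]
results = {
    "GC": {
        "ACTUAL": [100.0, None],
        "PREDICT": [98.0, 103.0],
        "ERROR": [2.0, None],
        "UPPER": [101.0, 108.0],
        "LOWER": [95.0, 98.0],
        "STD": [1.5, 2.5],
    },
}

for layout in (to_simple, to_interleaved, to_concatenated):
    print(layout.__name__, len(layout(dates, results)), "rows")
```
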
Simple Format Forecast Data Set

To see the simple format forecast data set that the system created, select the Output button. This opens a VIEWTABLE window to display the data set, as shown in Figure 37.19.

Figure 37.19 Forecast Data Set—Simple Format
Figure 37.19 shows the default simple format. This form of the forecast data set contains time ID variables and the variables that you forecast. The forecast variables contain actual values or predicted values, depending on whether the date of the observation is within the range of data supplied in the input data set. Select File and Close to close the Viewtable window.
Interleaved Format Forecast Data Set

From the Produce Forecasts window, use the list to select the Interleaved format, as shown in Figure 37.20.

Figure 37.20 Forecast Data Set Options
Now select the Run button again. The system presents a warning notice reminding you that the data set WORK.FORECAST already exists and asking if you want to replace it. Select Replace. The forecasts are stored in the data set WORK.FORECAST again, this time in the Interleaved format. Dismiss the notice that the forecast was stored. Now select the Output button again. This opens a Viewtable window to display the data set, as shown in Figure 37.21.
Figure 37.21 Forecast Data Set—Interleaved Format
In the interleaved format, there are several output observations for each input observation, identified by the TYPE variable. The values of the forecast variables for observations with different TYPE values are as follows.

ACTUAL
    actual values copied from the input data set

ERROR
    the difference between the actual and predicted values

LOWER
    the lower confidence limits

PREDICT
    the predicted values from the forecasting model. These are within-sample, one-step-ahead predictions for observations within the historical period, or multistep predictions for observations within the forecast period.

STD
    the estimated standard deviations of the prediction errors

UPPER
    the upper confidence limits
Select File and Close to close the VIEWTABLE window.
Concatenated Format Forecast Data Set

Use the list to select the Concatenated format. Create the forecast data set again, and then select the Output button. The Viewtable window showing the concatenated format of the forecast data set appears, as shown in Figure 37.22.

Figure 37.22 Forecast Data Set—Concatenated Format
This completes the example of how to use the Produce Forecasts window. Select File and Close to close the Viewtable window. Select the Close button to return to the Time Series Forecasting window.
Forecasting Projects

The system collects all the forecasting models you create, together with the options you set, into a package called a forecasting project. You can save this information in a SAS catalog entry and restore your work in later forecasting sessions. You can store any number of forecasting projects under different catalog entry names.

To see how this works, select the Manage Projects button. This opens the Manage Forecasting Project window, as shown in Figure 37.23.

Figure 37.23 Manage Forecasting Project Window
The table in this window lists the series for which forecasting models have been fit, and it shows for each series the forecasting model used to produce the forecasts. This window provides several features that allow you to manage the information in your forecasting project. You can select a row of the table to drill down to the list of models fit to the series. Select the GYD row of the table, either by double-clicking with the mouse or by clicking once to highlight the table row and then selecting List Models from the toolbar or from the Tools menu. This opens the Model List window for this series, as shown in Figure 37.24.
Figure 37.24 Model List Window
Because the Automatic Model Fitting process kept only the best fitting model, only one model appears in the model list. You can fit and retain any number of models for each series, and all the models fit and kept will appear in the series’ model list. Select Close from the toolbar or from the File menu to return to the Manage Forecasting Project window.
Saving and Restoring Project Information

To illustrate how you can save your work between sessions, in this section you will exit and then re-enter the Forecasting System. From the Manage Forecasting Project window, select File and Save as. This opens the Forecasting Project to Save window. In the Project Name field, type the name WORK.TEST.TESTPROJ. In the Description field, type “Test of forecasting project file.” The window should now appear as shown in Figure 37.25.
Figure 37.25 Project to Save Name and Description
Select the OK button. This returns you to the Manage Forecasting Project window and displays a message indicating that the project was saved. Select Close from the toolbar or from the File menu to return to the Time Series Forecasting window. Now select the Exit button. The system asks if you are sure you want to exit the system; select Yes. The forecasting application now terminates.

Open the forecasting application again. A new project name is displayed by default. Now restore the forecasting project you saved previously. Select the Browse button to the right of the Project field. This opens the Forecasting Project File Selection window, as shown in Figure 37.26.
Figure 37.26 Forecasting Project File Selection Window
Select the WORK library from the Libraries list. The Catalogs list now shows all the SAS catalogs in the WORK library. Select the TEST catalog. The Projects list now shows the list of forecasting projects in the catalog TEST. So far, you have created only one project file, TESTPROJ; so TESTPROJ is the only entry in the Projects list, as shown in Figure 37.27.
Figure 37.27 Forecasting Projects List
Select TESTPROJ from the Projects list and then select the OK button. This returns you to the Time Series Forecasting window. The system loads the project information you saved in TESTPROJ and displays a message indicating this. The Project field is now set to WORK.TEST.TESTPROJ, and the description is the description you previously gave to TESTPROJ, as shown in Figure 37.28.
Figure 37.28 Time Series Forecasting Window after Loading Project
If you now select the Manage Projects button, you will see the list of series and forecasting models you created in the previous forecasting session.
Sharing Projects

If you plan to work with others on a forecasting project, you might need to consider how project information can be shared. The series, models, and results of your project are stored in a forecasting project (FMSPROJ) catalog entry in the location you specify, as illustrated in the previous section. You need only read access to the catalog to work with it, but you must have write access to save the project. Multiple users cannot open a project for update at the same time, but they can do so at different times if they all have write access to the catalog where it is stored.

Project options settings such as the model selection criterion and number of models to keep are stored in an SLIST catalog entry in the SASUSER or TSFSUSER library. Write access to this catalog is required. If you have only read access to the SASUSER library, you can use the -RSASUSER option when starting SAS. You will be prompted for a location for the TSFSUSER library, if it is not already assigned.

If you want to use TSFSUSER routinely, assign it before you start the Time Series Forecasting System. Select New from the SAS Explorer file menu. In the New Library window, type TSFSUSER for the name. Click the Browse button and select the directory or folder you want to use. Turn on the enable at startup option so this library will be assigned automatically in subsequent sessions.

The SASUSER library is typically used for private settings saved by individual users. This is the default location for project options. If a work group shares a single options catalog (SASUSER or TSFSUSER points to the same location for all users), then only one user can use the system at a time.
Develop Models Window In the first forecasting example, you used the Automatic Model Fitting window to fit and select the forecasting model for each series automatically. In addition to this automatic forecasting process, you can also work with time series one at a time to fit forecasting models and apply your own judgment to choose the best forecasting model for each series. Using the Automatic Model Fitting feature, the system acts like a “black box.” This section goes inside the black box to look at the kinds of forecasting methods that the system provides and introduces some of the tools the system offers to help you find the best forecasting model.
Introduction

From the Time Series Forecasting window, select the Browse button to the right of the Data Set field to open the Data Set Selection window. Select the USECON data set from the SASHELP library. This data set contains monthly data on the U.S. economy. Select OK to close the selection window. Now select the Develop Models button. This opens the Series Selection window, as shown in Figure 37.29. You can enlarge this window for easier viewing of lists of data sets and series.
Figure 37.29 Series Selection Window
Select the series CHEMICAL: Sales of Chemicals and Allied Products, and then select the OK button. This opens the Develop Models window, as shown in Figure 37.30.
Figure 37.30 Develop Models Window
The Data Set, Interval, and Series fields in the upper part of the Develop Models window indicate the series with which you are currently working. You can change the settings of these fields by selecting the Browse button. The Data Range, Fit Range, and Evaluation Range fields show the time period over which data are available for the current series, and what parts of that time period are used to fit forecasting models to the series and to evaluate how well the models fit the data. You can change the settings of these fields by selecting the Set Ranges button. The bottom part of the Develop Models window consists of a table of forecasting models fit to the series. Initially, the list is empty, as indicated by the message “No models.” You can fit any number of forecasting models to each series and designate which one you want to use to produce forecasts. Graphical tools are available for exploring time series and fitted models. The two icons below the Browse button access the Time Series Viewer and the Model Viewer. Select the left icon. This opens the Time Series Viewer window, as shown in Figure 37.31.
Figure 37.31 Chemical and Allied Product Series
The Time Series Viewer displays a plot of the CHEMICAL series. The Time Series Viewer offers many useful features, which are explored in later sections. The Time Series Viewer appears in a separate resizable window. You can switch back and forth between the Time Series Viewer window and other windows. For now, return to the Develop Models window. You can close the Time Series Viewer window or leave it open. (To close the Time Series Viewer window, select Close from the toolbar or from the File menu.)
Fitting Models

To open a menu of model fitting choices, select Edit from the menu bar and then select Fit Model, or select Fit Models from List in the toolbar, or simply select a blank line in the table as shown in Figure 37.32.
Figure 37.32 Menu of Model Fitting Choices
The Forecasting System provides several ways to specify forecasting models. The eight choices given by the menu shown in Figure 37.32 are as follows:

Fit Models Automatically
   performs for the current series the same automatic model selection process that the Automatic Model Fitting window applies to a set of series.

Fit Models from List
   presents a list of commonly used forecasting models for convenient point-and-click selection.

Fit Smoothing Model
   displays the Smoothing Model Specification window, which enables you to specify several kinds of exponential smoothing and Winters method forecasting models.

Fit ARIMA Model
   displays the ARIMA Model Specification window, which enables you to specify many kinds of autoregressive integrated moving average (ARIMA) models, including seasonal ARIMA models and ARIMA models with regressors, transfer functions, and other predictors.

Fit Factored ARIMA Model
   displays the Factored ARIMA Model Specification window, which enables you to specify more general ARIMA models, including subset models and models with unusual or multiple seasonal cycles. It also supports regressors, transfer functions, and other predictors.

Fit Custom Model
   displays the Custom Model Specification window, which enables you to construct a forecasting model by specifying separate options for transforming the data, modeling the trend, modeling seasonality, modeling autocorrelation of the errors, and modeling the effect of regressors and other independent predictors.

Combine Forecasts
   displays the Forecast Combination Model Specification window, which enables you to specify models that produce forecasts by combining, or averaging, the forecasts from other models. (This option is not available unless you have fit at least two models.)

Use External Forecasts
   displays the External Forecast Model Specification window, which enables you to use judgmental or externally produced forecasts that have been saved in a separate series in the data set.

All of the forecasting models used by the system are ultimately specified through one of four windows: Smoothing Model Specification, ARIMA Model Specification, Factored ARIMA Model Specification, or Custom Model Specification. You can specify the same models with either the ARIMA Model Specification window or the Custom Model Specification window, but the Custom Model Specification window can provide a more natural way to specify models for those who are less familiar with the Box-Jenkins style of time series model specification.

The Automatic Model feature, the Models to Fit window, and the Forecast Combination Model Specification window all deal with lists of forecasting models previously defined through the Smoothing Model, ARIMA Model, or Custom Model specification windows. These windows are discussed in detail in later sections.

To get started using the Develop Models window, select the Fit Models from List item from the menu shown in Figure 37.32. This opens the Models to Fit window, as shown in Figure 37.33.
Figure 37.33 Models to Fit Window
You can select several models to fit at once by holding down the CTRL key as you make the selections. Select Linear Trend and Double (Brown) Exponential Smoothing, as shown in Figure 37.34, and then select the OK button.
Figure 37.34 Selecting Models to Fit
The system fits the two models you selected. After the models are fit, the labels of the two models and their goodness-of-fit statistic are added to the model table, as shown in Figure 37.35.
Figure 37.35 Fitted Models List
Model List and Statistics of Fit

In the model list, the Model Title column shows the descriptive labels for the two fitted models, in this case Linear Trend and Double Exponential Smoothing. The column labeled Root Mean Square Error (or labeled Mean Absolute Percent Error if you continued from the example in the previous section) shows the goodness-of-fit criterion used to decide which model fits better. By default, the criterion used is the root mean square error, but you can choose a different measure of fit. The linear trend model has a root mean square error of 1203, while the double exponential smoothing model fits better, with an RMSE of only 869.

The left column labeled Forecast Model consists of check boxes that indicate which one of the models in the list has been selected as the model to use to produce the forecasts for the series. When new models are fit and added to the model list, the system sets the Forecast Model flags to designate the one model with the best fit, as measured by the selected goodness-of-fit statistic, as the forecast model. (In the case of ties, the first model with the best fit is selected.) Because the Double Exponential Smoothing model has the smaller RMSE of the two models in the list, its Forecast Model check box is set. If you would rather produce forecasts by using the Linear Trend model, choose it by selecting the corresponding check box in the Forecast Model column.

To use a different goodness-of-fit criterion, select the button with the current criterion name on it (Root Mean Square Error or Mean Absolute Percent Error). This opens the Model Selection Criterion window, as shown in Figure 37.36.

Figure 37.36 Model Selection Criterion Window
The system provides many measures of fit that you can use as the model selection criterion. To avoid confusion, only the most popular of the available fit statistics are shown in this window by default. To display the complete list, you can select the Show all option. You can control the subset of statistics listed in this window through the Statistics of Fit item in the Options menu on the Develop Models window. Initially, Root Mean Square Error is selected. Select R-Square and then select the OK button. This changes the fit statistic displayed in the model list, as shown in Figure 37.37.
Figure 37.37 Model List with R-Square Statistics
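For orientation, the three criteria used in this example are conventionally defined as shown below. This is a sketch in assumed notation, not a statement of the system's exact computations: y_t denotes the actual value, \hat{y}_t the one-step-ahead prediction, \bar{y} the mean of the actual values, and n the number of nonmissing prediction errors.

```latex
\begin{align*}
\text{RMSE} &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2} \\
\text{MAPE} &= \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \\
R^2 &= 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2}
\end{align*}
```

Smaller values of RMSE and MAPE indicate a closer fit, whereas a larger R-square indicates a closer fit, so the direction of "best" depends on the criterion that is selected.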
Now that you have fit some models to the series, you can use the Model Viewer button to take a closer look at the predictions of these models.
Model Viewer

In the Develop Models window, select the row in the table containing the Linear Trend model so that this model is highlighted. The model list should now appear as shown in Figure 37.38.
Figure 37.38 Selecting a Model to View
Note that the Linear Trend model is now highlighted, but the Forecast Model column still shows the Double Exponential Smoothing model as the model chosen to produce the final forecasts for the series. Selecting a model in the list means that this is the model that menu items such as View Model, Delete, Edit, and Refit will act upon. Choosing a model by selecting its check box in the Forecast Model column means that this model will be used by the Produce Forecasts process to generate forecasts. Now open the Model Viewer by selecting the right-hand icon under the Browse button, or by selecting Model Predictions in the toolbar or from the View menu. The Model Viewer displays the Linear Trend model, as shown in Figure 37.39.
Figure 37.39 Model Viewer: Actual and Predicted Values Plot
This graph shows the linear trend line representing the model predicted values together with a plot of the actual data values, which fluctuate about the trend line.
Prediction Error Plots

Select the second icon from the top in the vertical toolbar in the Model Viewer window. This switches the Viewer to display a plot of the model prediction errors (actual data values minus the predicted values), as shown in Figure 37.40.
Figure 37.40 Model Viewer: Prediction Errors Plot
If the model being viewed includes a transformation, prediction errors are defined as the difference between the transformed series actual values and model predictions. You can choose to graph instead the difference between the untransformed series values and untransformed model predictions, which are called model residuals. You can also graph normalized prediction errors or normalized model residuals. Use the Residual Plot Options submenu under the Options menu.
Autocorrelation Plots

Select the third icon from the top in the vertical toolbar. This switches the Viewer to display a plot of autocorrelations of the model prediction errors at different lags, as shown in Figure 37.41. Autocorrelations, partial autocorrelations, and inverse autocorrelations are displayed, with lines overlaid at plus and minus two standard errors. You can switch the graphs so that the bars represent significance probabilities by selecting the Correlation Probabilities item on the toolbar or from the View menu. For more information about the meaning and use of autocorrelation plots, see Chapter 7, “The ARIMA Procedure.”
Figure 37.41 Model Viewer: Autocorrelations Plot
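Similar autocorrelation diagnostics can be produced in code with the IDENTIFY statement of the ARIMA procedure. The following is a rough sketch rather than a reproduction of what the Model Viewer computes: it approximates the Linear Trend model with an ordinary least squares regression on a time index, and it assumes that SASHELP.USECON contains the variables DATE and CHEMICAL. The data set names CHEM and RESID and the variables T and ERR are hypothetical.

```sas
/* Add a time index to the series (T is a hypothetical name) */
data chem;
   set sashelp.usecon(keep=date chemical);
   t = _n_;
run;

/* Fit a straight-line trend by least squares and save the residuals */
proc reg data=chem noprint;
   model chemical = t;
   output out=resid r=err;
run;
quit;

/* Print autocorrelations, inverse autocorrelations, partial      */
/* autocorrelations, and the autocorrelation check for white      */
/* noise for the trend residuals, up to lag 24                    */
proc arima data=resid;
   identify var=err nlag=24;
run;
quit;
```

Because the residuals here come from a plain regression, the numbers need not match the Viewer exactly, but the pattern of large autocorrelations should be comparable.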
White Noise and Stationarity Plots

Select the fourth icon from the top in the vertical toolbar. This switches the Viewer to display a plot of white noise and stationarity tests on the model prediction errors, as shown in Figure 37.42.
Figure 37.42 Model Viewer: White Noise and Stationarity Plot
The white noise test bar chart shows significance probabilities of the Ljung-Box chi square statistic. Each bar shows the probability computed on autocorrelations up to the given lag. Longer bars favor rejection of the null hypothesis that the prediction errors represent white noise. In this example, they are all significant beyond the 0.001 probability level, so that you reject the null hypothesis. In other words, the high level of significance at all lags makes it clear that the linear trend model is inadequate for this series. The second bar chart shows significance probabilities of the augmented Dickey-Fuller test for unit roots. For example, the bar at lag three indicates a probability of 0.0014, so that you reject the null hypothesis that the series is nonstationary. The third bar chart is similar to the second except that it represents the seasonal lags. Since this series has a yearly seasonal cycle, the bars represent yearly intervals. You can select any of the bars to display an interpretation. Select the fourth bar of the middle chart. This displays the Recommendation for Current View, as shown in Figure 37.43. This window gives an interpretation of the test represented by the bar that was selected; it is significant, therefore a stationary series is likely. It also gives a recommendation: You do not need to perform a simple difference to make the series stationary.
Figure 37.43 Model Viewer: Recommendation for Current View
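As background for the white noise chart discussed above, the Ljung-Box statistic is usually written as follows. The notation is assumed for illustration: n is the number of prediction errors, r_k is their sample autocorrelation at lag k, and m is the largest lag included.

```latex
Q_m = n\,(n+2)\sum_{k=1}^{m}\frac{r_k^{2}}{n-k}
```

Under the null hypothesis that the prediction errors are white noise, Q_m is approximately chi-square distributed, with degrees of freedom equal to m less the number of estimated ARMA parameters when the errors come from a fitted ARMA model. Each bar in the white noise chart corresponds to the significance probability of this statistic computed up to the given lag.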
Parameter Estimates Table

Select the fifth icon from the top in the vertical toolbar to the right of the graph. This switches the Viewer to display a table of parameter estimates for the fitted model, as shown in Figure 37.44.
Figure 37.44 Model Viewer: Parameter Estimates Table
For the linear trend model, the parameters are the intercept and slope coefficients. The table shows the values of the fitted coefficients together with standard errors and t tests for the statistical significance of the estimates. The model residual variance is also shown.
Statistics of Fit Table

Select the sixth icon from the top in the vertical toolbar to the right of the table. This switches the Viewer to display a table of statistics of fit computed from the model prediction errors, as shown in Figure 37.45. The list of statistics displayed is controlled by selecting Statistics of Fit from the Options menu.
Figure 37.45 Model Viewer: Statistics of Fit Table
Changing to a Different Model

Select the first icon in the vertical toolbar to the right of the table to return the display to the predicted and actual values plots (Figure 37.39). Now return to the Develop Models window, but do not close the Model Viewer window. You can use the Next Viewer icon in the toolbar or your system’s window manager controls to switch windows. You can resize the windows to make them both visible.

Select the Double Exponential Smoothing model so that this line of the model list is highlighted. The Model Viewer window is now updated to display a plot of the predicted values for the Double Exponential Smoothing model, as shown in Figure 37.46. The Model Viewer is automatically updated to display the currently selected model, unless you specify Unlink (the third icon in the window’s horizontal toolbar).
Figure 37.46 Model Viewer Plot for Exponential Smoothing Model
Forecasts and Confidence Limits Plots

Select the seventh icon from the top in the vertical toolbar to the right of the graph. This switches the Viewer to display a plot of forecast values and confidence limits, together with actual values and one-step-ahead within-sample predictions, as shown in Figure 37.47.
Figure 37.47 Model Viewer: Forecasts and Confidence Limits
Data Table

Select the last icon at the bottom of the vertical toolbar to the right of the graph. This switches the Viewer to display the forecast data set as a table, as shown in Figure 37.48.
Figure 37.48 Model Viewer: Forecast Data Table
To view the full data set, use the vertical and horizontal scroll bars on the data table or enlarge the window.
Closing the Model Viewer

Other features of the Model Viewer and Develop Models window are discussed later in this book. For now, close the Model Viewer window and return to the Time Series Forecasting window. To close the Model Viewer window, select Close from the window’s horizontal toolbar or from the File menu.
Chapter 38
Creating Time ID Variables

Contents
   Creating a Time ID Value from a Starting Date and Frequency
   Using Observation Numbers as the Time ID
   Creating a Time ID from Other Dating Variables
The Forecasting System requires that the input data set contain a time ID variable. If the data you want to forecast are not in this form, you can use features of the Forecasting System to help you add time ID variables to your data set. This chapter shows examples of how to use these features.
Creating a Time ID Value from a Starting Date and Frequency

As a first example of adding a time ID variable, use the SAS data set created by the following statements. (Or use your own data set if you prefer.)

   data no_id;
      input y @@;
   datalines;
   10 15 20 25 30 35 40 45
   50 55 60 65 70 75 80 85
   run;
Submit these SAS statements to create the data set NO_ID. This data set contains the single variable Y. Assume that Y is a quarterly series and starts in the first quarter of 1991. In the Time Series Forecasting window, use the Browse button to the right of the Data set field to bring up the Data Set Selection window. Select the WORK library, and then select the NO_ID data set. You must create a time ID variable for the data set. Click the Create button to the right of the Time ID field. This opens a menu of choices for creating the Time ID variable, as shown in Figure 38.1.
Figure 38.1 Time ID Creation Popup Menu
Select the first choice, Create from starting date and frequency. This opens the Time ID Creation from Starting Date window shown in Figure 38.2.
Figure 38.2 Time ID Creation from Starting Date Window
Enter the starting date, 1991:1, in the Starting Date field. Select the Interval list arrow and select QTR. The Interval value QTR means that the time interval between successive observations is a quarter of a year; that is, the data frequency is quarterly. Now select the OK button. The system prompts you for the name of the new data set. If you want to create a new copy of the input data set with the DATE variable added, enter a name for the new data set. If you want to replace the NO_ID data set with the new copy containing DATE, just select the OK button without changing the name. For this example, change the New name field to WITH_ID and select the OK button. The data set WITH_ID is created containing the series Y from NO_ID and the added ID variable DATE. The system returns to the Data Set Selection window, which now appears as shown in Figure 38.3.
Figure 38.3 Data Set Selection Window after Creating Time ID
Select the Table button to see the new data set WITH_ID. This opens a VIEWTABLE window for the data set WITH_ID, as shown in Figure 38.4. Select File and Close to close the VIEWTABLE window.
Figure 38.4 Viewtable Display of Data Set with Time ID Added
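The same result can be sketched in a DATA step for readers who prefer code to the point-and-click route. This is an illustrative equivalent under stated assumptions, not the code that the Forecasting System itself runs: it assumes that NO_ID is in the WORK library and that the series starts in the first quarter of 1991, as in the example.

```sas
/* Add a quarterly DATE variable that starts at 1991Q1 */
data with_id;
   set no_id;
   /* Advance the starting date by (_N_ - 1) quarters;            */
   /* INTNX returns the first day of the resulting quarter        */
   date = intnx('qtr', '01jan1991'd, _n_ - 1);
   format date yyq6.;   /* displays values such as 1991Q1 */
run;
```

Using the observation counter _N_ relies on the observations already being in time order with no gaps, which is also what the starting date and frequency method presumes.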
Using Observation Numbers as the Time ID

Normally, the time ID variable contains date values. If you do not want to have dates associated with your forecasts, you can also use observation numbers as time ID variables. However, you still must have an ID variable. This can be illustrated by adding an observation index time ID variable to the data set NO_ID. In the Data Set Selection window, select the data set NO_ID again. Select the Create button to the right of the Time ID field. Select the third choice, Create from observation numbers. This opens the Time ID Variable Creation window shown in Figure 38.5.
Figure 38.5 Create Time ID Variable Window
Select the OK button. This opens the New Data Set Name window. Enter “OBS_ID” in the New data set name field. Enter “T” in the New ID variable name field. Now select the OK button. The new data set OBS_ID is created, and the system returns to the Data Set Selection window, which now appears as shown in Figure 38.6.
Figure 38.6 Data Set Selection Window after Creating Time ID
The Interval field for OBS_ID has the value ‘1’. This means that the values of the time ID variable T increment by one between successive observations. Select the Table button to look at the OBS_ID data set, as shown in Figure 38.7.
Figure 38.7 VIEWTABLE of Data Set with Observation Index ID
Select File and Close to close the VIEWTABLE window. Select the OK button from the Data Set Selection window to return to the Time Series Forecasting window.
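An observation index like the one created in this section can also be sketched in a DATA step. The example below is an assumption-labeled equivalent, not the system's own code; it reuses the names OBS_ID and T from the example above.

```sas
/* Add an observation-number time ID to the NO_ID data */
data obs_id;
   set no_id;
   t = _n_;   /* 1, 2, 3, ... in the order the observations are read */
run;
```

As with dates, the index is only meaningful if the observations are stored in time order.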
Creating a Time ID from Other Dating Variables

Your data set might contain ID variables that date the observations in a different way than the SAS date valued ID variable expected by the forecasting system. For example, for monthly data, the data set might contain the ID variables YEAR and MONTH, which together date the observations. In these cases, you can use the Forecasting System’s Create Time ID features to compute a time ID variable with SAS date values from the existing dating variables. As an example of this, use the SAS data set read in by the following SAS statements:

   data id_parts;
      input yr qtr y;
   datalines;
   91 1 10
   91 2 15
   91 3 20
   91 4 25
   92 1 30
   92 2 35
   92 3 40
   92 4 45
   93 1 50
   93 2 55
   93 3 60
   93 4 65
   94 1 70
   94 2 75
   94 3 80
   94 4 85
   run;
Submit these SAS statements to create the data set ID_PARTS. This data set contains the three variables YR, QTR, and Y. YR and QTR are ID variables that together date the observations, but each variable provides only part of the date information. Because the forecasting system requires a single dating variable containing SAS date values, you need to combine YR and QTR to create a single variable DATE. Type “ID_PARTS” in the Data Set field and press the ENTER key. (You could also use the Browse button to open the Data Set Selection window, as in the previous example, and complete this example from there.) Select the Create button at the right of the Time ID field. This opens the menu of Create Time ID choices, as shown in Figure 38.8.
Figure 38.8 Adding a Time ID Variable
Select the second choice, Create from existing variables. This opens the window shown in Figure 38.9.
Figure 38.9 Creating a Time ID Variable from Date Parts
In the Variables list, select YR. In the Date Part list, select YEAR as shown in Figure 38.10.
Figure 38.10 Specifying the ID Variable for Years
Now click the right-pointing arrow button. The variable YR and the part code YEAR are added to the Existing Time IDs list. Next select QTR from the Variables list, select QTR from the Date Part list, and click the arrow button. This adds the variable QTR and the part code QTR to the Existing Time IDs list, as shown in Figure 38.11.
Figure 38.11 Creating a Time ID Variable from Date Parts
Now select the OK button. This opens the New Data Set Name window. Change the New data set name field to NEWDATE, and then select the OK button. The data set NEWDATE is created, and the system returns to the Time Series Forecasting window with NEWDATE as the selected Data Set. The Time ID field is set to DATE, and the Interval field is set to QTR.
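For comparison, the date parts can also be combined in a DATA step with the YYQ function. This is a minimal sketch under one explicit assumption: the two-digit values of YR are taken to be years in the 1900s, so 1900 is added before the date is built. The Forecasting System may resolve two-digit years differently.

```sas
/* Build a SAS date from the year and quarter parts */
data newdate;
   set id_parts;
   /* Assumption: YR holds two-digit years from the 1900s */
   date = yyq(1900 + yr, qtr);   /* first day of the quarter */
   format date yyq6.;            /* displays values such as 1991Q1 */
run;
```

Adding the century explicitly avoids depending on the YEARCUTOFF= system option, which otherwise governs how two-digit years are interpreted.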
Chapter 39
Specifying Forecasting Models

Contents
   Series Diagnostics
   Models to Fit Window
   Automatic Model Selection
   Smoothing Model Specification Window
   ARIMA Model Specification Window
   Factored ARIMA Model Specification Window
   Custom Model Specification Window
   Editing the Model Selection List
   Forecast Combination Model Specification Window
   Incorporating Forecasts from Other Sources
This chapter explores the tools available through the Develop Models window for investigating the properties of time series and for specifying and fitting models. The first section shows you how to diagnose time series properties in order to determine the class of models appropriate for forecasting series with such properties. Later sections show you how to specify and fit different kinds of forecasting models.
Series Diagnostics

The series diagnostics tool helps you determine the kinds of forecasting models that are appropriate for the data series so that you can limit the search for the best forecasting model. The series diagnostics address these three questions:

   Is a log transformation needed to stabilize the variance?
   Is a time trend present in the data?
   Is there a seasonal pattern to the data?

The automatic model fitting process, which you used in the previous chapter through the Automatic Model Fitting window, performs series diagnostics and selects trial models from a list according to the results. You can also look at the diagnostic information and make your own decisions as to the kinds of models appropriate for the series.

The following example illustrates the series diagnostics features. Select “Develop Models” from the Time Series Forecasting window. Select the library SASHELP, the data set CITIMON, and the series RCARD. This series represents domestic retail sales of passenger cars. To look at this series, select “View Series” from the Develop Models window. This opens the Time Series Viewer window, as shown in Figure 39.1.
Figure 39.1 Automobile Sales Series
Select “Diagnose Series” from the Tools menu. You can do this from the Develop Models window or from the Time Series Viewer window. Figure 39.2 shows this from the Develop Models window.
Figure 39.2 Selecting Series Diagnostics
This opens the Series Diagnostics window, as shown in Figure 39.3.
Figure 39.3 Series Diagnostics Window
Each of the three series characteristics—need for log transformation, presence of a trend, and seasonality—has a set of options for Yes, No, and Maybe. Yes indicates that the series has the characteristic and that forecasting models fit to the series should be able to model and predict this behavior. No indicates that you do not need to consider forecasting models designed to predict series with this characteristic. Maybe indicates that models with and without the characteristic should be considered. Initially, all these values are set to Maybe. To have the system diagnose the series characteristics, select the Automatic Series Diagnostics button. This runs the diagnostic routines described in Chapter 44, “Forecasting Process Details,” and sets the options according to the results. In this example, Trend and Seasonality are changed from Maybe to Yes, while Log Transform remains set to Maybe. These diagnostic criteria affect the models displayed when you use the Models to Fit window or the Automatic Model Selection model-fitting options described in the following section. You can set the criteria manually, according to your judgment, by selecting any of the options, whether you have used the Automatic Series Diagnostics button or not. For this exercise, leave them as set by the automatic diagnostics. Select the OK button to close the Series Diagnostics window.
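If you want to form your own judgment before overriding the diagnostic settings, plotting the series on its original and log scales is one informal check. The sketch below is not the diagnostic routine that the system runs; it assumes that SASHELP.CITIMON contains the variables DATE and RCARD, and the names RCARD_LOG and LOG_RCARD are hypothetical.

```sas
/* Plot the series on its original scale: look for a trend and */
/* for a pattern that repeats every twelve months              */
proc sgplot data=sashelp.citimon;
   series x=date y=rcard;
run;

/* Plot the logged series: if the size of the swings is more   */
/* stable over time on this scale, a log transformation might  */
/* be worth considering                                        */
data rcard_log;
   set sashelp.citimon(keep=date rcard);
   log_rcard = log(rcard);
run;

proc sgplot data=rcard_log;
   series x=date y=log_rcard;
run;
```

A visual check of this kind is only a complement to the automatic diagnostics, which remain the basis for the Yes, No, and Maybe settings in the window.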
Models to Fit Window

As you saw in the previous chapter, you can select models from a list. Invoke the Models to Fit window by clicking the middle of the table and selecting “Fit Models from List” from the menu. This can also be selected from the toolbar or the Fit Model submenu of the Edit menu. The Models to Fit window comes up, as shown in Figure 39.4.

Figure 39.4 Models to Fit Window
Since you have performed series diagnostics, the models shown are the subset that fits the diagnostic criteria. Suppose you want to consider models other than those in this subset because you are undecided about including a trend in the model. Select the Show all models option. Now the entire model selection list is shown. Scroll through the list until you find Log Seasonal Exponential Smoothing, as shown in Figure 39.5.
Figure 39.5 Selecting a Model from List
This is a nontrended model, which seems a good candidate. Select this model, and then select the OK button. The model is fit to the series and then appears in the table with the value of the selected fit criterion, as shown in Figure 39.6.
Figure 39.6 Develop Models Window Showing Model Fit
You can edit the model list that appears in the Models to Fit window by selecting “Options” and “Model Selection List” from the menu bar or by selecting the Edit Model List toolbar icon. You can then delete models you are not interested in from the default list and add models using any of the model specification methods described in this chapter. When you save your project, the edited model selection list is saved in the project file. In this way, you can use the Select from List item and the Automatic Model Selection item to select models from a customized search set.
Automatic Model Selection

Automatic model selection is equivalent to choosing Fit Models from List, as you did in the preceding section, fitting all the models in the subset list, and then deleting all except the best-fitting model. If series diagnostics have not yet been done, they are performed automatically to determine the model subset to fit. If you set the series diagnostics for log, trend, or seasonal criteria manually using the radio buttons, these choices are honored by the automatic fitting process.
Using automatic selection, the system does not pause to warn you of model fitting errors, such as failure of the estimates to converge (you can track these using the audit trail feature). By default, only the best fitting model is kept. However, you can control the number of automatically fit models retained in the Develop Models list, and the following example shows how to do this. From the menu bar, choose “Options” and “Automatic Fit.” This opens the Automatic Model Selection Options window. Click the Models to Keep list arrow, and select “All models”, as shown in Figure 39.7. Now select OK.

Figure 39.7 Selecting Number of Automatic Fit Models to Keep
Next, select “Fit Models Automatically” by clicking the middle of the table or using the toolbar or Edit menu. The Automatic Model Selection window appears, showing the diagnostic criteria in effect and the number of models to be fit, as shown in Figure 39.8.
Figure 39.8 Automatic Model Selection Window
Select the OK button. After the models have been fit, all of them appear in the table, in addition to the model that you fit earlier, as shown in Figure 39.9.
Figure 39.9 Automatically Fit Models
Smoothing Model Specification Window
To fit exponential smoothing and Winters models not already provided in the Models to Fit window, select “Fit Smoothing Model” from the pop-up menu or toolbar or select “Smoothing Model” from the Fit Model submenu of the Edit menu. This opens the Smoothing Model Specification window, as shown in Figure 39.10.
Figure 39.10 Smoothing Model Specification Window
The Smoothing Model Specification window consists of several parts. At the top is the series name and a field for the label of the model you are specifying. The model label is filled in with an automatically generated label as you specify options. You can type over the automatic label with your own label for the model. To restore the automatic label, enter a blank label.
The Smoothing Methods box lists the different methods available. Below the Smoothing Methods box is the Transformation field, which is used to apply the smoothing method to transformed series values.
The Smoothing Weights box specifies how the smoothing weights are determined. By default, the smoothing weights are automatically set to optimize the fit of the model to the data. See Chapter 44, “Forecasting Process Details,” for more information about how the smoothing weights are fit.
Under smoothing methods, select “Winters Method – Additive.” Notice the smoothing weights box to the right. The third item, Damping, is grayed out, while the other items, Level, Trend, and Season, show the word Optimize. This tells you that these three smoothing weights are applicable to the smoothing method that you selected and that the system is currently set to optimize these weights for you.
Next, specify a transformation using the Transformation list. A menu of transformation choices pops up, as shown in Figure 39.11.
Figure 39.11 Transformation Options
You can specify a logarithmic, logistic, square root, or Box-Cox transformation. For this example, select “Square Root” from the list. The Transformation field is now set to Square Root. This means that the system will first take the square roots of the series values, apply the additive version of the Winters method to the square root series, and then produce the predictions for the original series by squaring the Winters method predictions (and multiplying by a variance factor if the Mean Prediction option is set in the Forecast Options window). See Chapter 44, “Forecasting Process Details,” for more information about predictions from transformed models. The Smoothing Model Specification window should now appear as shown in Figure 39.12. Select the OK button to fit the model. The model is added to the table of fitted models in the Develop Models window.
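For readers who prefer batch code, a roughly comparable fit can be sketched with PROC ESM. This is only an illustration and is not the code that the Time Series Forecasting System generates. The input series SASHELP.AIR and the output data set name WINTERS_SQRT are assumptions made for the example.

```sas
/* Sketch: additive Winters method applied to the square root of a series.  */
/* SASHELP.AIR and the data set name WINTERS_SQRT are illustrative choices. */
proc esm data=sashelp.air outfor=winters_sqrt lead=12 print=estimates;
   id date interval=month;
   /* TRANSFORM=SQRT smooths the square-root series; the forecasts are */
   /* reported on the original scale of the series.                    */
   forecast air / model=addwinters transform=sqrt;
run;
```

The OUTFOR= data set holds the actual values, predictions, and confidence limits, which can be compared with the Model Viewer output for the same model.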
Figure 39.12 Winters Method Applied to Square Root Series
ARIMA Model Specification Window
To fit ARIMA or Box-Jenkins models not already provided in the Models to Fit window, select the ARIMA model item from the pop-up menu, toolbar, or Edit menu. This opens the ARIMA Model Specification window, as shown in Figure 39.13.
Figure 39.13 ARIMA Model Specification Window
This ARIMA Model Specification window is structured according to the Box and Jenkins approach to time series modeling. You can specify the same time series models with the Custom Model Specification window and the ARIMA Model Specification window, but the windows are structured differently, and you might find one more convenient than the other.
At the top of the ARIMA Model Specification window is the name and label of the series and the label of the model you are specifying. The model label is filled in with an automatically generated label as you specify options. You can type over the automatic label with your own label for the model. To restore the automatic label, enter a blank label.
Using the ARIMA Model Specification window, you can specify autoregressive (p), differencing (d), and moving average (q) orders for both simple and seasonal factors. You can specify transformations with the Transformation list. You can also specify whether an intercept is included in the ARIMA model.
In addition to specifying seasonal and nonseasonal ARIMA processes, you can also specify predictor variables and other terms as inputs to the model. ARIMA models with inputs are sometimes called ARIMAX models or Box-Tiao models. Another term for this kind of model is dynamic regression.
In the lower part of the ARIMA Model Specification window is the list of predictors to the model (initially empty). You can specify predictors by using the Add button. This opens a menu of different kinds of independent effects, as shown in Figure 39.14.
Figure 39.14 Add Predictors Menu
The kinds of predictor effects allowed include time trends, regressors, adjustments, dynamic regression (transfer functions), intervention effects, and seasonal dummy variables. How to use different kinds of predictors is explained in Chapter 41, “Using Predictor Variables.”
As an example, in the ARIMA Options box, set the order of differencing d to 1 and the moving average order q to 2. You can either type in these values or click the arrows and select the values from pop-up lists. These selections specify an ARIMA(0,1,2) or IMA(1,2) model. (See Chapter 7, “The ARIMA Procedure,” for more information about the notation used for ARIMA models.) Notice that the model label at the top is now IMA(1,2) NOINT, meaning that the data are differenced once and a second-order moving-average term is included with no intercept.
In the Seasonal ARIMA Options box, set the seasonal moving-average order Q to 1. This adds a first-order moving-average term at the seasonal (12 month) lag. Finally, select “Log” in the Transformation combo box. The model label is now Log ARIMA(0,1,2)(0,0,1)s NOINT, and the window appears as shown in Figure 39.15.
Figure 39.15 Log ARIMA(0,1,2)(0,0,1)s Specified
Select the OK button to fit the model. The model is fit and added to the Develop Models table.
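As a point of comparison, the Log ARIMA(0,1,2)(0,0,1)s NOINT specification can be written for PROC ARIMA roughly as follows. This is a sketch under stated assumptions: the input series SASHELP.AIR, the data set name AIR_LOG, the variable name LOGAIR, and the METHOD=ML choice are all illustrative, and the statements that the system itself generates may differ.

```sas
/* Sketch: Log ARIMA(0,1,2)(0,0,1)s NOINT for a monthly series.       */
/* AIR_LOG and LOGAIR are hypothetical names used only for this demo. */
data air_log;
   set sashelp.air;
   logair = log(air);          /* log transformation of the series */
run;

proc arima data=air_log;
   identify var=logair(1);     /* d=1: first difference, no seasonal difference   */
   /* q=(1,2) is the MA(2) factor; (12) is the seasonal MA factor at lag 12 */
   estimate q=(1,2)(12) noint method=ml;
run;
quit;
```

Because the model is fit to the logged series, forecasts from this sketch are on the log scale and must be transformed back before they are compared with forecasts for the original series.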
Factored ARIMA Model Specification Window
To fit a factored ARIMA model, select the Factored ARIMA model item from the pop-up menu, toolbar, or Edit menu. This brings up the Factored ARIMA Model Specification window, shown in Figure 39.16.
Figure 39.16 Factored ARIMA Model Specification Window
The Factored ARIMA Model Specification window is similar to the ARIMA Model Specification window and has the same features, but it uses a more general specification of the autoregressive (p), differencing (d), and moving-average (q) terms. To specify these terms, select the corresponding Set button, as shown in Figure 39.16. For example, to specify autoregressive terms, select the first Set button. This opens the AR Polynomial Specification Window, shown in Figure 39.17.
Figure 39.17 AR Polynomial Specification Window
To add AR polynomial terms, select the New button. This opens the Polynomial Specification Window, shown in Figure 39.18. Specify the first lag you want to include by using the Lag spin box, then select the Add button. Repeat this process, adding each lag you want to include in the current list. All lags must be specified. For example, if you add only lag 3, the model contains only lag 3, not 1 through 3. As an example, add lags 1 and 3, then select the OK button. The AR Polynomial Specification Window now shows (1,3) in the list of polynomials. Now select “New” again. Add lags 6 and 12 and select “OK”. Now the AR Polynomial Specification Window shows (1,3) and (6,12) as shown in Figure 39.17. Select “OK” to close this window. The Factored ARIMA Model Specification Window now shows the factored model p=(1,3)(6,12). Use the same technique to specify the q terms, or moving-average part of the model. There is no limit to the number of lags or the number of factors you can include in the model.
Figure 39.18 Polynomial Specification Window
To specify differencing lags, select the middle Set button to open the Differencing Specification window. Specify lags using the spin box and add them to the list with the Add button. When you select “OK” to close the window, the differencing lags appear after d= in the Factored ARIMA Specification Window, within a single pair of parentheses.
You can use the Factored ARIMA Model Specification Window to specify any model that you can specify with the ARIMA Model and Custom Model windows, but the notation is more similar to that of the ARIMA procedure (see Chapter 7, “The ARIMA Procedure”).
Consider as an example the classic Airline model fit to the International Airline Travel series, SASHELP.AIR. This is a factored model with one moving-average term at lag one and one moving-average term at the seasonal lag, with first-order differencing at the simple and seasonal lags. Using the ARIMA Model Specification Window, you specify the value 1 for the q and d terms and also for the Q and D terms, which represent the seasonal lags. For monthly data, the seasonal lags represent lag 12, since a yearly seasonal cycle is assumed.
By contrast, the Factored ARIMA Model Specification Window makes no assumptions about seasonal cycles. The Airline model is written as IMA d=(1,12) q=(1)(12) NOINT. To specify the differencing terms, add the values 1 and 12 in the Differencing Specification Window and select OK. Then select “New” in the MA Polynomial Specification Window, add the value 1, and select OK. To add the factored term, select “New” again, add the value 12, and select OK. Remember to select “No” in the Intercept radio box, since it is not selected by default. Select OK to close the Factored ARIMA Model Specification Window and fit the model.
You can show that the results are the same as they are when you specify the model by using the ARIMA Model Specification Window and when you select Airline Model from the default model list. If you are familiar with the ARIMA Procedure (Chapter 7, “The ARIMA Procedure”), you might want to turn on the Show Source Statements option before fitting the model, then examine the procedure source statements in the log window after fitting the model.
The strength of the Factored ARIMA Specification approach lies in its ability to construct unusual ARIMA models, such as the following:
Subset models: These are models of order n, where fewer than n lags are specified. For example, an AR order 3 model might include lags 1 and 3 but not lag 2.
Unusual seasonal cycles: For example, a monthly series might cycle two or four times per year instead of just once.
Multiple cycles: For example, a daily sales series might peak on a certain day each week and also once a year at the Christmas season. Given sufficient data, you can fit a three-factor model, such as IMA d=(1) q=(1)(7)(365).
Models with high-order lags take longer to fit and often fail to converge. To save time, select the Conditional Least Squares or Unconditional Least Squares estimation method (see Figure 39.16). Once you have narrowed down the list of candidate models, change to the Maximum Likelihood estimation method.
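The factored notation maps closely onto PROC ARIMA syntax. The following sketch shows how the Airline model IMA d=(1,12) q=(1)(12) NOINT might be written by hand, followed by a subset AR model with lags 1 and 3. It illustrates the notation only; the statements that the system writes to the log with Show Source Statements turned on may differ, and the subset model is included purely to show the syntax, not as a recommended model for this series.

```sas
/* Sketch: factored ARIMA models for SASHELP.AIR in PROC ARIMA notation. */
proc arima data=sashelp.air;
   /* d=(1,12): difference at lag 1 and at lag 12 */
   identify var=air(1,12) noprint;

   /* Airline model: two MA factors, (1) and (12), no intercept.      */
   /* Conditional least squares (METHOD=CLS) is quicker for screening. */
   estimate q=(1)(12) noint method=cls;
run;

   /* Illustrative subset model: AR lags 1 and 3 only, lag 2 omitted. */
   estimate p=(1,3) noint method=cls;
run;
quit;
```

Once the candidate list is narrowed down, METHOD=CLS can be changed to METHOD=ML, mirroring the advice about switching estimation methods.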
Custom Model Specification Window
To fit a custom time series model not already provided in the Models to Fit window, select the Custom Model item from the pop-up menu, toolbar, or Edit menu. This opens the Custom Model Specification window, as shown in Figure 39.19.
Figure 39.19 Custom Model Specification Window
You can specify the same time series models with the Custom Model Specification window and the ARIMA Model Specification window, but the windows are structured differently, and you might find one more convenient than the other.
At the top of the Custom Model Specification window is the name and label of the series and the label of the model you are specifying. The model label is filled in with an automatically generated label as you specify options. You can type over the automatic label with your own label for the model. To restore the automatic label, enter a blank label.
The middle part of the Custom Model Specification window consists of four fields: Transformation, Trend Model, Seasonal Model, and Error Model. These fields allow you to specify the model in four parts. Each part specifies how a different aspect of the pattern of the time series is modeled and predicted.
The Predictors list at the bottom of the Custom Model Specification window allows you to include different kinds of predictor variables in the forecasting model. The Predictors feature for the Custom Model Specification window is like the Predictors feature for the ARIMA Model Specification window, except that time trend predictors are provided through the Trend Model field and seasonal dummy variable predictors are provided through the Seasonal Model field.
To illustrate how to use the Custom Model Specification window, the following example specifies the same model you fit by using the ARIMA Model Specification window. First, specify the data transformation to use. Select “Log” using the Transformation combo box. Second, specify how to model the trend in the series. Select First Difference in the Trend Model combo box, as shown in Figure 39.20.
Figure 39.20 Trend Model Options
Next, specify how to model the seasonal pattern in the series. Select “Seasonal ARIMA” in the Seasonal Model combo box, as shown in Figure 39.21.
Figure 39.21 Seasonal Model Options
This opens the Seasonal ARIMA Model Options window, as shown in Figure 39.22.
Figure 39.22 Seasonal ARIMA Model Options
Specify a first-order seasonal moving-average term by typing 1 or by selecting “1” from the Moving Average: Q= combo box pop-up menu, and then select the OK button. Finally, specify how to model the autocorrelation pattern in the model prediction errors. Select the Set button to the right of the Error Model field. This opens the Error Model Options window, as shown in Figure 39.23. This window allows you to specify an ARMA error process. Set the Moving Average order q to 2, and then select the OK button.
Figure 39.23 Error Model Options
The Custom Model Specification window should now appear as shown in Figure 39.24. The model label at the top of the Custom Model Specification window should now read Log ARIMA(0,1,2)(0,0,1)s NOINT, just as it did when you used the ARIMA Model Specification window.
Figure 39.24 Log ARIMA(0,1,2)(0,0,1)s Specified
Now that you have seen how the Custom Model Specification window works, select “Cancel” to exit the window without fitting the model. This should return you to the Develop Models window.
Editing the Model Selection List
Now that you know how to specify new models that are not included in the system default model selection list, you can edit the model selection list to add models that you expect to use in the future or to delete models that you do not expect to use. When you save the forecasting project to a SAS catalog, the edited model selection list is saved with the project file, and the list is restored when you load the project.
There are two reasons why you would add a model to the model selection list. First, by adding the model to the list, you can fit the model to different time series by selecting it through the Fit Models from List action. You do not need to specify the model again every time you use it. Second, once the model is added to the model selection list, it is available to the automatic model selection process. The model is then considered automatically whenever you use the automatic model selection feature for any series.
To edit the model selection list, select “Model Selection List” from the Options menu as shown in Figure 39.25, or select the Edit Model List toolbar icon.
Figure 39.25 Model Selection List Option
This selection brings up the Model Selection List editor window, as shown in Figure 39.26. This window consists of the model selection list and an “Auto Fit” column, which controls for each model whether the model is included in the list of models used by the automatic model selection process.
Figure 39.26 Model Selection List Window
To add a model to the list, select “Add Model” from the Edit menu and then select “Smoothing Model,” “ARIMA Model,” “Factored ARIMA Model,” or “Custom Model” from the submenu. Alternatively, click the corresponding icon on the toolbar. As an example, select “Smoothing Model.” This brings up the Smoothing Model Specification window. Note that the series name is “-Null-.” This means that you are not specifying a model to be fit to a particular series, but are specifying a model to be added to the selection list for later reference. Specify a smoothing model. For example, select “Simple Smoothing” and then select the Square Root transformation. The window appears as shown in Figure 39.27.
Figure 39.27 Adding a Model Specification
Select the OK button to add the model to the end of the model selection list and return to the Model Selection List window, as shown in Figure 39.28. You can now select the Fit Models from List model-fitting option to use the edited selection list.
Figure 39.28 Model Added to Selection List
If you want to delete one or more models from the list, select the model labels to highlight them in the list. Click a second time to clear a selected model. Then select “Delete” from the Edit pull-down menu, or select the corresponding toolbar icon. As an example, delete the Square Root Simple Exponential Smoothing model that you just added.
The Model Selection List editor window gives you a lot of flexibility for managing multiple model lists, as explained in the section “Model Selection List Editor Window” on page 2611. For example, you can create your own model lists from scratch or modify or combine previously saved model lists and those provided with the software, and you can save them and designate one as the default for future projects. Now select “Close” from the File menu (or the Close icon) to close the Model Selection List editor window.
Forecast Combination Model Specification Window
Once you have fit several forecasting models to a series, you face the question of which model to use to produce the final forecasts. One possible answer is to combine or average the forecasts from several models. Combining the predictions from several different forecasting methods is a popular approach to forecasting.
The way that you produce forecast combinations with the Time Series Forecasting System is to use the Forecast Combination Model Specification window to specify a new forecasting model that performs the averaging of forecasts from the models you want to combine. This new model is added to the list of fitted models just like other models. You can then use the Model Viewer window features and Model Fit Comparison window features to examine the fit of the combined model.
To specify a forecast combination model, select “Combine Forecasts” from the pop-up menu or toolbar, or select “Edit” and “Fit Model” from the menu bar. This brings up the Forecast Combination Model Specification window, as shown in Figure 39.29.
Figure 39.29 Forecast Combination Window
At the top of the Forecast Combination window is the name and label of the series and the label of the model you are specifying. The model label is filled in with an automatically generated label as you specify options. You can type over the automatic label with your own label for the model. To restore the automatic label, enter a blank label.
The middle part of the Forecast Combination window consists of the list of models that you have fit to the series. This table shows the label and goodness-of-fit measure for each model and the combining weight assigned to the model.
The Weight column controls how much weight is given to each model in the combined forecasts. A missing weight means that the model is not used. Initially, all the models have missing weight values. You can enter the weight values you want to use in the Weight column. Alternatively, you can select models from the Model Description column, and weight values for the models you select are set automatically. To remove a model from the combination, select it again. This resets its weight value to missing.
At the bottom of the Forecast Combination window are two buttons: Normalize Weights and Fit Regression Weights. The Normalize Weights button adjusts the nonmissing weight values so that they sum to one. The Fit Regression Weights button uses linear regression to compute the weight values that produce the combination of model predictions with the best fit to the series. If no models are selected, the Fit Regression Weights button fits weights for all the models in the list. You can compute regression weights for only some of the models by first selecting the models you want to combine and then selecting Fit Regression Weights. In this case, only the nonmissing Weight values are replaced with regression weights.
As an example of how to combine forecasting models, select all the models in the list. After you have finished selecting the models, all the models in the list should now have equal weight values, which implies a simple average of the forecasts. Now select the Fit Regression Weights button. The system performs a linear regression of the series on the predictions from the models with nonmissing weight values and replaces the weight values with the estimated regression coefficients. These are the combining weights that produce the smallest mean square prediction error within the sample. The Forecast Combination window should now appear as shown in Figure 39.30. (Note that some of the regression weight values are negative.)
Figure 39.30 Combining Models
Select the OK button to fit the combined model. Now the Develop Models window shows this model to be the best fitting according to the root mean square error, as shown in Figure 39.31.
Figure 39.31 Develop Models Window Showing All Models Fit
Notice that the combined model has a smaller root mean square error than any one of the models included in the combination. The confidence limits for forecast combinations are produced by taking a weighted average of the mean square prediction errors for the component forecasts, ignoring the covariance between the prediction errors.
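The idea behind regression weights can be sketched outside the windowing interface by regressing the series on the in-sample predictions of the component models. The example below is a rough illustration under several assumptions: it uses SASHELP.AIR and two smoothing models chosen arbitrarily, the data set names F1, F2, and PREDS and the variable names PRED1 and PRED2 are hypothetical, and the NOINT option (a pure weighted sum with no intercept) is an assumption about the form of the regression rather than a documented detail of the system.

```sas
/* Sketch: regression-based combining weights for two fitted models. */
/* All data set and variable names below are illustrative.           */
proc esm data=sashelp.air outfor=f1 lead=0;
   id date interval=month;
   forecast air / model=addwinters;   /* component model 1 */
run;

proc esm data=sashelp.air outfor=f2 lead=0;
   id date interval=month;
   forecast air / model=linear;       /* component model 2 */
run;

/* Put the actual values and both sets of in-sample predictions side by side */
data preds;
   merge f1(keep=date actual predict rename=(predict=pred1))
         f2(keep=date predict rename=(predict=pred2));
   by date;
run;

/* The estimated coefficients play the role of the combining weights. */
/* NOINT is an assumption made for this sketch.                       */
proc reg data=preds;
   model actual = pred1 pred2 / noint;
run;
quit;
```

Because least squares minimizes the in-sample squared error over all weight choices, the fitted combination cannot have a larger in-sample mean square error than either component alone, which is consistent with the result shown in Figure 39.31.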
Incorporating Forecasts from Other Sources
You might have forecasts from other sources that you want to include in the forecasting process. Examples of other forecasts you might want to use are “best guess” forecasts based on personal judgments, forecasts produced by government agencies or commercial forecasting services, planning scenarios, and reference or “base line” projections. Because such forecasts are produced externally to the Time Series Forecasting System, they are referred to as external forecasts.
You can include external forecasts in combination models to produce compromise forecasts that split the difference between the external forecast and forecasting models that you fit. You can use external forecasts to compare them to the forecasts from models that are fit by the system.
To include external forecasts in the Time Series Forecasting process, you must first supply the external forecast as a variable in the input data set. You then specify a special kind of forecasting “model” whose predictions are identical to the external forecast recorded in the data set.
As an example, suppose you have 12 months of sales data and five months of sales forecasts based on a consensus opinion of the sales staff. The following statements create a SAS data set containing made-up numbers for this situation.
   data widgets;
      input date monyy5. sales staff;
      format date monyy5.;
      label sales = "Widget Sales"
            staff = "Sales Staff Consensus Forecast";
      datalines;
   jun94 142.1 .
   jul94 139.6 .
   aug94 145.0 .
   sep94 150.2 .
   oct94 151.1 .
   nov94 154.3 .
   dec94 158.7 .
   jan95 155.9 .
   feb95 159.2 .
   mar95 160.8 .
   apr95 162.0 .
   may95 163.3 .
   jun95 . 166.
   jul95 . 168.
   aug95 . 170.
   sep95 . 171.
   oct95 . 177.
   run;
Submit the preceding statements in the SAS Program Editor window. From the Time Series Forecasting window, select “Develop Models.” In the Series Selection window, select the data set WORK.WIDGETS and the variable SALES. The Develop Models window should now appear as shown in Figure 39.32.
Figure 39.32 Develop Models Window
Now select “Edit,” “Fit Model,” and “External Forecasts” from the menu bar of the Develop Models window, as shown in Figure 39.33, or the Use External Forecasts toolbar icon.
Figure 39.33 Adding a Model for an External Forecast Series
This selection opens the External Forecast Model Specification window. Select the STAFF variable as shown in Figure 39.34.
Figure 39.34 External Forecast Series Selected
Select the OK button. The external forecast model is now “fit” and added to the Develop Models list, as shown in Figure 39.35.
Figure 39.35 Model for External Forecast
You can now use this model for comparison with the predictions from other forecasting models that you fit, or you can include it in a forecast combination model. Note that no fitting is actually performed for an external forecast model. The predictions of the external forecast model are simply the values of the external forecast series read from the input data set. The goodness-of-fit statistics for such models will depend on the values that the external forecast series contains for observations within the period of fit. In this case, no STAFF values are given for past periods, and therefore the fit statistics for the model are missing.
Chapter 40
Choosing the Best Forecasting Model
Contents
Time Series Viewer Features
Model Viewer Prediction Error Analysis
The Model Selection Criterion
Sorting and Selecting Models
Comparing Models
Controlling the Period of Evaluation and Fit
Refitting and Reevaluating Models
Using Hold-out Samples
The Time Series Forecasting System provides a variety of tools for identifying potential forecasting models and for choosing the best fitting model. It allows you to decide how much control you want to have over the process, from a hands-on approach to one that is completely automated. This chapter begins with an exploration of the tools available through the Series Viewer and Model Viewer. It presents an example of identifying models graphically and exercising your knowledge of model properties. The remainder of the chapter shows you how to compare models by using a variety of statistics and by controlling the fit and evaluation time ranges. It concludes by showing you how to refit existing models and how to compare models using hold-out samples.
Time Series Viewer Features
The Time Series Viewer is a graphical tool for viewing and analyzing time series. It can be used separately from the Time Series Forecasting System by using the TSVIEW command or by selecting Time Series Viewer from the Analysis pull-down menu under Solutions.
In this chapter you will use the Time Series Viewer to examine plots of your series before fitting models. Begin this example by invoking the Forecasting system and selecting the View Series Graphically button, as shown in Figure 40.1, or the View Series toolbar icon.
Figure 40.1 Invoking the Time Series Viewer
From the Series Selection window, select SASHELP as the library, WORKERS as the data set, and MASONRY as the time series, and then click the Graph button. The Time Series Viewer displays a plot of the series, as shown in Figure 40.2.
Figure 40.2 Series Plot
Select the Zoom In icon, the first one on the window’s horizontal toolbar. Notice that the mouse cursor changes shape and that “Note: Click on a corner of the region, then drag to the other corner” appears on the message line. Outline an area, as shown in Figure 40.3, by clicking the mouse at the upper-left corner, holding the button down, dragging to the lower-right corner, and releasing the button.
Figure 40.3 Selecting an Area for Zoom
The zoomed plot should appear as shown in Figure 40.4.
Figure 40.4 Zoomed Plot
You can repeat the process to zoom in still further. To return to the previous view, select the Zoom Out icon, the second icon on the window’s horizontal toolbar.
The third icon on the horizontal toolbar is used to link or unlink the viewer window. By default, the viewer is linked, meaning that it is automatically updated to reflect selection of a different time series. To see this, return to the Series Selection window by clicking on it or using the Window menu or Next Viewer toolbar icon. Select the ELECTRIC series in the Time Series Variables list box. Notice that the Time Series Viewer window is updated to show a plot of the ELECTRIC series. Select the Link/Unlink icon if you prefer to unlink the viewer so that it is not automatically updated in this way. Successive selections toggle between the linked and unlinked state. A note on the message line informs you of the state of the Time Series Viewer window.
When a Time Series Viewer window is linked, selecting View Series again makes the linked Viewer window active. When no Time Series Viewer window is linked, selecting View Series opens an additional Time Series Viewer window. You can bring up as many Time Series Viewer windows as you want.
Having seen the plot in Figure 40.2, you might suspect that the series is nonstationary and seasonal. You can gain further insight into this by examining the sample autocorrelation function (ACF), partial autocorrelation function (PACF), and inverse autocorrelation function (IACF) plots. To switch the display to the autocorrelation plots, select the second icon from the top on the vertical toolbar at the right side of the Time Series Viewer. The plot appears as shown in Figure 40.5.
Figure 40.5 Sample Autocorrelation Plots
Each bar represents the value of the correlation coefficient at the given lag. The overlaid lines represent confidence limits computed at plus and minus two standard errors. You can switch the graphs to show significance probabilities by selecting Correlation Probabilities under the Options pull-down menu, or by selecting the Toggle ACF Probabilities toolbar icon. The slow decline of the ACF suggests that first differencing might be warranted. To see the effect of first differencing, select the simple difference icon, the fifth icon from the left on the window’s horizontal toolbar. The plot now appears as shown in Figure 40.6.
Figure 40.6 ACF Plots with First Difference Applied
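The quantities shown in the autocorrelation plots can be approximated outside the forecasting system. The following Python sketch is illustrative only: it computes sample autocorrelations for a made-up trending series and compares them with limits of plus and minus two standard errors. It assumes the large-sample white-noise standard error 1/sqrt(n); the standard errors used by the Time Series Viewer might be computed differently. The function names are hypothetical.

```python
import math


def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        acf.append(ck / c0)
    return acf


def two_standard_error_limit(n):
    """Assumed limit: two times the white-noise standard error 1/sqrt(n)."""
    return 2.0 / math.sqrt(n)


# Made-up example: a pure linear trend of 100 observations.
trend = [float(t) for t in range(100)]
acf = sample_acf(trend, 5)
limit = two_standard_error_limit(len(trend))

for lag, r in enumerate(acf, start=1):
    print(f"lag {lag}: r = {r:.3f}, outside limits: {abs(r) > limit}")
```

For a trending series such as this one, every autocorrelation lies well above the upper limit and declines slowly with the lag, which is the pattern that suggests differencing.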
Since the ACF still displays slow decline at seasonal lags, seasonal differencing is appropriate (in addition to the first differencing already applied). Select the Seasonal Difference icon, the sixth icon from the left on the horizontal toolbar. The plot now appears as shown in Figure 40.7.
Figure 40.7 ACF Plot with Simple and Seasonal Differencing
Model Viewer Prediction Error Analysis
Leave the Time Series Viewer open for the remainder of this exercise. Drag it out of the way or push it to the background so that you can return to the Time Series Forecasting window. Select Develop Models, then click an empty part of the table to bring up the pop-up menu, and select Fit ARIMA Model. Define the ARIMA(0,1,0)(0,1,0)s model by selecting 1 for Differencing under ARIMA Options, 1 for Differencing under Seasonal ARIMA Options, and No for Intercept, as shown in Figure 40.8.
Figure 40.8 Specifying the ARIMA(0,1,0)(0,1,0)s Model
When you select the OK button, the model is fit and you are returned to the Develop Models window. Click on an empty part of the table and choose Fit Models from List from the popup menu. Select Airline Model from the window. (Airline Model is a common name for the ARIMA(0,1,1)(0,1,1)s model, which is often used for seasonal data with a linear trend.) Select the OK button. Once the model has been fit, the table shows the two models and their root mean square errors. Notice that the Airline Model provides only a slight improvement over the differencing model, ARIMA(0,1,0)(0,1,0)s. Select the first row to highlight the differencing model, as shown in Figure 40.9.
Figure 40.9 Selecting a Model
Now select the View Selected Model Graphically button, below the Browse button at the right side of the Develop Models window. The Model Viewer window appears, showing the actual data and model predictions for the MASONRY series. (Note that predicted values are missing for the first 13 observations due to simple and seasonal differencing.) To examine the ACF plot for the model prediction errors, select the third icon from the top on the vertical toolbar. For this model, the prediction error ACF is the same as the ACF of the original data with first differencing and seasonal differencing applied. This differencing is apparent if you bring the Time Series Viewer back into view for comparison. Return to the Develop Models Window by clicking on it or using the window pull-down menu or the Next Viewer toolbar icon. Select the second row of the table in the Develop Models window to highlight the Airline Model. The Model Viewer is automatically updated to show the prediction error ACF of the newly selected model, as shown in Figure 40.10.
Figure 40.10 Prediction Error ACF Plot for the Airline Model
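The behavior of the differencing model discussed above can be sketched in a few lines of Python. The example assumes that the ARIMA(0,1,0)(0,1,0)s model without an intercept has the form (1 - B)(1 - B^s) y_t = e_t, so that the one-step prediction is y_{t-1} + y_{t-s} - y_{t-s-1}. The series and the function names are made up for illustration.

```python
def difference(series, lag):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def differencing_model_predictions(series, season=12):
    """One-step predictions for the assumed ARIMA(0,1,0)(0,1,0)s form.

    The first season + 1 predictions are missing (None) because they
    would need observations from before the start of the series.
    """
    preds = [None] * len(series)
    for t in range(season + 1, len(series)):
        preds[t] = series[t - 1] + series[t - season] - series[t - season - 1]
    return preds


# Made-up monthly series: trend + seasonal pattern + irregular component.
pattern = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
series = [2 * t + pattern[t % 12] + (t * t) % 7 for t in range(36)]

preds = differencing_model_predictions(series, season=12)
errors = [series[t] - preds[t] for t in range(13, len(series))]
twice_differenced = difference(difference(series, 1), 12)

print(preds[:14])                   # 13 missing values, then the first prediction
print(errors == twice_differenced)  # prediction errors equal the differenced data
```

With monthly data (s = 12), the first 13 predictions are missing, and the prediction errors coincide with the series after simple and seasonal differencing. This is why the prediction error ACF for this model matches the ACF of the differenced data.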
Another helpful tool available within the Model Viewer is the parameter estimates table. Select the fifth icon from the top of the vertical toolbar. The table gives the parameter estimates for the two moving-average terms in the Airline Model, as well as the model residual variance, as shown in Figure 40.11.
Figure 40.11 Parameter Estimates for the Airline Model
You can adjust the column widths in the table by dragging the vertical borders of the column titles with the mouse. Notice that neither of the parameter estimates is significantly different from zero at the 0.05 level of significance, since Prob>|t| is greater than 0.05. This suggests that the Airline Model should be discarded in favor of the more parsimonious differencing model, which has no parameters to estimate.
The Model Selection Criterion
Return to the Develop Models window (Figure 40.9) and notice the Root Mean Square Error button at the right side of the table banner. This is the model selection criterion, the statistic used by the system to select the best fitting model. So far in this example you have fit two models and have left the default criterion, root mean square error (RMSE), in effect. Because the Airline Model has the smaller value of this criterion and because smaller values of the RMSE indicate better fit, the system has chosen this model as the forecasting model, indicated by the check box in the Forecast Model column.
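As a rough illustration of how a criterion such as RMSE ranks models, the following Python sketch computes the root mean square error for two hypothetical sets of predictions and picks the smaller one. It assumes that the RMSE is the square root of the average squared prediction error over periods where a prediction exists; the exact definition used by the system is given in the Statistics of Fit documentation. All names and numbers are invented.

```python
import math


def rmse(actual, predicted):
    """Root mean square error over periods where both values are present."""
    errors = [a - p for a, p in zip(actual, predicted)
              if a is not None and p is not None]
    if not errors:
        raise ValueError("no overlapping nonmissing observations")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def best_model(actual, candidates):
    """Return the name of the candidate with the smallest RMSE."""
    return min(candidates, key=lambda name: rmse(actual, candidates[name]))


# Hypothetical data: None marks a missing prediction.
actual = [10.0, 12.0, 11.0, 13.0]
candidates = {
    "model_a": [None, 10.0, 12.0, 11.0],
    "model_b": [None, 11.0, 11.0, 12.0],
}

for name, preds in candidates.items():
    print(name, round(rmse(actual, preds), 4))
print("selected:", best_model(actual, candidates))
```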
The statistics available as model selection criteria are a subset of the statistics available for informational purposes. To access the entire set, select Options from the menu bar, and then select Statistics of Fit. The Statistics of Fit Selection window appears, as shown in Figure 40.12.
Figure 40.12 Statistics of Fit
By default, five of the more well known statistics are selected. You can select and deselect statistics by clicking the check boxes in the left column. For this exercise, select All, and notice that all the check boxes become checked. Select the OK button to close the window. Now if you choose Statistics of Fit in the Model Viewer window, all of the statistics will be shown for the selected model. To change the model selection criterion, click the Root Mean Square Error button or select Options from the menu bar and then select Model Selection Criterion. Notice that most of the statistics of fit are shown, but those which are not relevant to model selection, such as number of observations, are not shown. Select Schwarz Bayesian Information Criterion and click OK. Since this statistic puts a high penalty on models with larger numbers of parameters, the ARIMA(0,1,0)(0,1,0)s model comes out with the better fit. Notice that changing the selection criterion does not automatically select the model that is best according to that criterion. You can always choose the model you want to use for forecasts by
selecting its check box in the Forecast Model column. Now bring up the Model Selection Criterion window again and select Akaike Information Criterion. This statistic puts a lesser penalty on number of parameters, and the Airline Model comes out as the better fitting model.
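The different penalties can be seen in a small numerical sketch. The Python code below assumes the commonly used forms AIC = n ln(SSE/n) + 2k and SBC = n ln(SSE/n) + k ln(n), where n is the number of observations, k is the number of estimated parameters, and SSE is the sum of squared prediction errors. The SSE values are invented and are not taken from the MASONRY example.

```python
import math


def aic(sse, n, k):
    """Akaike information criterion (assumed form)."""
    return n * math.log(sse / n) + 2 * k


def sbc(sse, n, k):
    """Schwarz Bayesian information criterion (assumed form)."""
    return n * math.log(sse / n) + k * math.log(n)


n = 50
# Hypothetical fits: a model with no parameters and a two-parameter
# model that reduces the error sum of squares only modestly.
no_param = {"sse": 1000.0, "k": 0}
two_param = {"sse": 900.0, "k": 2}

aic_no, aic_two = (aic(m["sse"], n, m["k"]) for m in (no_param, two_param))
sbc_no, sbc_two = (sbc(m["sse"], n, m["k"]) for m in (no_param, two_param))

print(f"AIC: no-parameter {aic_no:.2f}, two-parameter {aic_two:.2f}")
print(f"SBC: no-parameter {sbc_no:.2f}, two-parameter {sbc_two:.2f}")
```

With these numbers the two-parameter model has the smaller AIC, while the model with no parameters has the smaller SBC, because ln(50) is larger than 2 and each parameter therefore costs more under SBC.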
Sorting and Selecting Models
Select Sort Models on the Tools menu or from the toolbar. This sorts the current list of fitted models by the current selection criterion. Although some selection criteria assign larger values to better fitting models (for example, R-square) while others assign smaller values to better fitting models, Sort Models always orders models with the best fitting model (in this case, the Airline Model) at the top of the list.
When you select a model in the table, its name and criterion value become highlighted, and actions that apply to that model become available. If your system supports a right mouse button, you can click it to invoke a pop-up menu, as shown in Figure 40.13.
Figure 40.13 Right Mouse Button Pop-up Menu
Whether or not you have a right mouse button, the same choices are available under Edit and View from the menu bar. If the model viewer has been invoked, it is automatically updated to show the selected model, unless you have unlinked the viewer by using the Link/Unlink toolbar button. Select the highlighted model in the table again. Notice that it is no longer highlighted. When no models are highlighted, the right mouse button pop-up menu changes, and items on the menu bar that apply to a selected model become unavailable. For example, you can choose Edit from the menu bar, but you can’t choose the Edit Model or Delete Model selections unless you have highlighted a model in the table. When you select the check box in the Forecast Model column of the table, the model in that row becomes the forecasting model. This is the model that will be used the next time forecasts are generated by choosing View Forecasts or by using the Produce Forecasts window. Note that this forecasting model flag is automatically set when you use Fit Automatic Model or when you fit an individual model that fits better, using the current selection criterion, than the current forecasting model.
Comparing Models
Select Tools and Compare Models from the menu bar. This displays the Model Fit Comparison table, as shown in Figure 40.14.
Figure 40.14 Model Comparison Window
The two models you have fit are shown as Model 1 and Model 2. When there are more than two models, you can bring any two of them into the table by selecting the up and down arrows. In this way, it is easy to do pairwise comparisons on any number of models, looking at as many statistics of fit as you like. Since you previously chose to display all statistics of fit, all of them are shown in the comparison table. Use the vertical scroll bar to move through the list. After you have examined the model comparison table, select the Close button to return to the Develop Models window.
Controlling the Period of Evaluation and Fit
Notice the three time ranges shown on the Develop Models window (Figure 40.9). The data range shows the beginning and ending dates of the MASONRY time series. The period of fit shows the beginning and ending dates of data used to fit the models. The period of evaluation shows the beginning and ending dates of data used to compute statistics of fit. By default, the fit and evaluate ranges are the same as the data range. To change these ranges, select the Set Ranges
button, or select Options and Time Ranges from the menu bar. This brings up the Time Ranges Specification window, as shown in Figure 40.15.
Figure 40.15 Time Ranges Specification Window
For this example, suppose the early data in the series is unreliable, and you want to use the range June 1978 to the latest available for both model fitting and model evaluation. You can either type JUN1978 in the From column for Period of Fit and Period of Evaluation, or you can advance these dates by clicking the right pointing arrows. The outer arrow advances the date by a large amount (in this case, by a year), and the inner arrow advances it by a single period (in this case, by a month). Once you have changed the Period of Fit and the Period of Evaluation to JUN1978 in the From column, select the OK button to return to the Develop Models window. Notice that these time ranges are updated at the top of the window, but the models already fit have not been affected. Your changes to the time ranges affect subsequently fit models.
Refitting and Reevaluating Models
If you fit the ARIMA(0,1,0)(0,1,0)s and Airline models again in the same way as before, they will be added to the model list, with the same names but with different values of the model selection criterion. Parameter estimates will be different, due to the new fit range, and statistics of fit will be different, due to the new evaluation range.
For this exercise, instead of specifying the models again, refit the existing models by selecting Edit from the menu bar and then selecting Refit Models and All Models. After the models have been refit, you should see the same two models listed in the table but with slightly different values for the selection criterion. The ARIMA(0,1,0)(0,1,0)s and Airline models have now been fit to the MASONRY series by using data from June 1978 to July 1982, since this is the period of fit you specified. The statistics of fit have been computed for the period of evaluation, which was the same as the period of fit. If you had specified a period of evaluation different from the period of fit, the statistics would have been computed accordingly.
In practice, another common reason for refitting models is the availability of new data. For example, when data for a new month become available for a monthly series, you might add them to the input data set, then invoke the forecasting system, open the project containing models fit previously, and refit the models prior to generating new forecasts. Unless you specify the period of fit and period of evaluation in the Time Ranges Specification window, they default to the full data range of the series found in the input data set at the time of refitting.
If you prefer to apply previously fit models to revised data without refitting, use Reevaluate Models instead of Refit Models. This recomputes the statistics of fit by using the current evaluation range, but does not re-estimate the model parameters.
Using Hold-out Samples
One important application of model fitting where the period of fit is different from the period of evaluation is the use of hold-out samples. With this technique of model evaluation, the period of fit ends at a time point before the end of the data series, and the remainder of the data are held out as a nonoverlapping period of evaluation. With respect to the period of fit, the hold-out sample is a period in the future, used to compare the forecasting accuracy of models fit to past data.
For this exercise, use a hold-out sample of 12 months. Bring up the Time Ranges Specification window again by selecting the Set Ranges button. Set Hold-out Sample to 12 using the combo box, as shown in Figure 40.16. You can also type in a value. To specify a hold-out sample period in different units, you can use the Periods combo box. In this case, it allows you to select years as the unit, instead of periods.
Figure 40.16 Specifying the Hold-out Sample Size
Notice that setting the hold-out sample to 12 automatically sets the fit range to JUN1978–JUL1981 and the evaluation range to AUG1981–JUL1982. If you had set the period of fit and period of evaluation to these ranges, the hold-out sample would have been automatically set to 12 periods. Select the OK button to return to the Develop Models window. Now refit the models again. Select Tools and Compare Models to compare the models now that they have been fit to the period June 1978 through July 1981 and evaluated for the hold-out sample period August 1981 through July 1982. Note that the fit statistics for the hold-out sample are based on one-step-ahead forecasts. (See Statistics of Fit in Chapter 44, “Forecasting Process Details.”) As shown in Figure 40.17, the ARIMA (0,1,0)(0,1,0)s model now seems to provide a better fit to the data than does the Airline model. It should be noted that the results can be quite different if you choose a different size hold-out sample.
Figure 40.17 Using 12 Month Hold-out Sample
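The way a hold-out sample divides the data can be illustrated with a short Python sketch. It builds the list of monthly periods from June 1978 through July 1982 and reserves the last 12 periods for evaluation. The function names are hypothetical, and the code only mirrors the range arithmetic described in this section; it does not fit any model.

```python
def month_range(start, end):
    """List (year, month) pairs from start to end, inclusive."""
    year, month = start
    periods = []
    while (year, month) <= end:
        periods.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return periods


def split_holdout(periods, holdout):
    """Split periods into a fit range and a trailing evaluation range."""
    if not 0 < holdout < len(periods):
        raise ValueError("hold-out size must be between 1 and len(periods) - 1")
    return periods[:-holdout], periods[-holdout:]


periods = month_range((1978, 6), (1982, 7))
fit, evaluation = split_holdout(periods, 12)

print("fit:       ", fit[0], "to", fit[-1])
print("evaluation:", evaluation[0], "to", evaluation[-1])
```

The fit range runs from June 1978 to July 1981 and the evaluation range from August 1981 to July 1982, matching the ranges that the Time Ranges Specification window sets for a 12-period hold-out sample.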
Chapter 41
Using Predictor Variables
Contents
Linear Trend . . . 2513
Time Trend Curves . . . 2515
Regressors . . . 2518
Adjustments . . . 2522
Dynamic Regressor . . . 2523
Interventions . . . 2527
    The Intervention Specification Window . . . 2528
    Specifying a Trend Change Intervention . . . 2530
    Specifying a Level Change Intervention . . . 2532
    Modeling Complex Intervention Effects . . . 2533
    Fitting the Intervention Model . . . 2535
    Limitations of Intervention Predictors . . . 2538
Seasonal Dummies . . . 2539
References . . . 2542
Forecasting models predict the future values of a series by using two sources of information: the past values of the series and the values of other time series variables. Other variables used to predict a series are called predictor variables. Predictor variables that are used to predict the dependent series can be variables in the input data set, such as regressors and adjustment variables, or they can be special variables computed by the system as functions of time, such as trend curves, intervention variables, and seasonal dummies. You can specify seven different types of predictors in forecasting models by using the ARIMA Model or Custom Model Specification windows. You cannot specify predictor variables with the Smoothing Model Specification window. Figure 41.1 shows the menu of options for adding predictors to an ARIMA model that is opened by clicking the Add button. The Add menu for the Custom Model Specification menu is similar.
Figure 41.1 Add Predictors Menu
These types of predictors are as follows.
Linear Trend
adds a variable that indexes time as a predictor series. A straight line time trend is fit to the series by regression when you specify a linear trend.
Trend Curve
provides a menu of various functions of time that you can add to the model to fit nonlinear time trends. The Linear Trend option is a special case of the Trend Curve option for which the trend curve is a straight line.
Regressors
allows you to predict the series by regressing it on other variables in the data set.
Adjustments
allows you to specify other variables in the data set that supply adjustments to the forecast.
Dynamic Regressor
allows you to select a predictor variable from the input data set and specify a complex model for the way that the predictor variable affects the dependent series.
Interventions
allows you to model the effect of special events that “intervene” to change the pattern of the dependent series. Examples of intervention effects are strikes, tax increases, and special sales promotions.
Seasonal Dummies
adds seasonal indicator or “dummy” variables as regressors to model seasonal effects.
You can add any number of predictors to a forecasting model, and you can combine predictor variables with other model options. The following sections explain these seven kinds of predictors in greater detail and provide examples of their use.
The examples illustrate these different kinds of predictors by using series in the SASHELP.USECON data set. Select the Develop Models button from the main window. Select the data set SASHELP.USECON and select the series PETROL. Then select the View Series Graphically button from the Develop Models window. The plot of the example series PETROL appears as shown in Figure 41.2.
Figure 41.2 Sales of Petroleum and Coal
Linear Trend
From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Linear Trend from the menu (shown in Figure 41.1).
A linear trend is added to the Predictors list, as shown in Figure 41.3.
Figure 41.3 Linear Trend Predictor Specified
The description for the linear trend item shown in the Predictors list has the following meaning. The first part of the description, Trend Curve, describes the type of predictor. The second part, _LINEAR_, gives the variable name of the predictor series. In this case, the variable is a time index that the system computes. This variable is included in the output forecast data set. The final part, Linear Trend, describes the predictor. Notice that the model you have specified consists only of the time index regressor _LINEAR_ and an intercept. Although this window is normally used to specify ARIMA models, in this case no ARIMA model options are specified, and the model is a simple regression on time. Select the OK button. The Linear Trend model is fit and added to the model list in the Develop Models window. Now open the Model Viewer by using the View Model Graphically icon or the Model Predictions item under the View pull-down menu or toolbar. This displays a plot of the model predictions and actual series values, as shown in Figure 41.4. The predicted values lie along the least squares trend line.
Figure 41.4 Linear Trend Model
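A regression on a time index, as described above, can be sketched in Python as follows. The example assumes that the time index takes the values 1, 2, ..., n; the values that the system assigns to _LINEAR_ might differ. The data and function names are made up.

```python
def fit_linear_trend(series):
    """Least squares fit of y_t = intercept + slope * t for t = 1..n."""
    n = len(series)
    times = list(range(1, n + 1))
    t_mean = sum(times) / n
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, series))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def extend_trend(intercept, slope, n, horizon):
    """Forecasts for t = n + 1 .. n + horizon lie on the fitted line."""
    return [intercept + slope * t for t in range(n + 1, n + horizon + 1)]


# Made-up series that lies exactly on the line 3 + 2t.
series = [3.0 + 2.0 * t for t in range(1, 11)]
intercept, slope = fit_linear_trend(series)
forecasts = extend_trend(intercept, slope, len(series), 2)

print(intercept, slope)
print(forecasts)
```

Because the model contains no ARIMA terms, both the in-sample predictions and the forecasts lie on the fitted trend line.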
Time Trend Curves
From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Trend Curve from the menu (shown in Figure 41.1). A menu of different kinds of trend curves is displayed, as shown in Figure 41.5.
Figure 41.5 Time Trend Curves Menu
These trend curves work in much the same way as the Linear Trend option (which is a special case of a trend curve and one of the choices on the menu), but the Trend Curve menu gives you a choice of various nonlinear time trends. Select Quadratic Trend. This adds a quadratic time trend to the Predictors list, as shown in Figure 41.6.
Figure 41.6 Quadratic Trend Specified
Now select the OK button. The quadratic trend model is fit and added to the list of models in the Develop Models window. The Model Viewer displays a plot of the quadratic trend model, as shown in Figure 41.7.
Figure 41.7 Quadratic Trend Model
This curve does not fit the PETROL series very well, but the View Model plot illustrates how time trend models work. You might want to experiment with different trend models to see what the different trend curves look like. Some of the trend curves require transforming the dependent series. When you specify one of these curves, a notice is displayed reminding you that a transformation is needed, and the Transformation field is automatically filled in. Therefore, you cannot control the Transformation specification when some kinds of trend curves are specified. See the section “Time Trend Curves” on page 2515 in Chapter 44, “Forecasting Process Details,” for more information about the different trend curves.
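For readers who want to see what fitting a quadratic time trend involves, the following Python sketch fits y_t = b0 + b1 t + b2 t^2 by least squares, solving the normal equations with a small Gaussian elimination routine. It assumes a time index of 1, 2, ..., n and uses invented data; it is not the computation performed by the forecasting system.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic_trend(series):
    """Least squares estimates (b0, b1, b2) for t = 1..n."""
    rows = [[1.0, float(t), float(t * t)] for t in range(1, len(series) + 1)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, series)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Made-up series that follows 5 + 0.5 t - 0.02 t^2 exactly.
series = [5.0 + 0.5 * t - 0.02 * t * t for t in range(1, 21)]
b0, b1, b2 = fit_quadratic_trend(series)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```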
Regressors
From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Regressors from the menu (shown in Figure 41.1).
This displays the Regressors Selection window, as shown in Figure 41.8. This window allows you to select any number of other series in the input data set as regressors to predict the dependent series.
Figure 41.8 Regressors Selection Window
For this example, select CHEMICAL, Sales: Chemicals and Allied Products, and VEHICLES, Sales: Motor Vehicles and Parts. (Note: You do not need to use the CTRL key when selecting more than one regressor.) Then select the OK button. The two variables you selected are added to the Predictors list as regressor type predictors, as shown in Figure 41.9.
Figure 41.9 Regressors Selected
You must have forecasts of the future values of the regressor variables in order to use them as predictors. To do this, you can specify a forecasting model for each regressor, have the system automatically select forecasting models for the regressors, or supply predicted future values for the regressors in the input data set. Even if you have supplied future values for a regressor variable, the system requires a forecasting model for the regressor. Future values that you supply in the input data set take precedence over predicted values from the regressor’s forecasting model when the system computes the forecasts for the dependent series. Select the OK button. The system starts to fit the regression model but then stops and displays a warning that the regressors that you selected do not have forecasting models, as shown in Figure 41.10.
Figure 41.10 Regressors Needing Models Warning
If you want the system to create forecasting models automatically for the regressor variables by using the automatic model selection process, select the OK button. If not, you can select the Cancel button to abort fitting the regression model. For this example, select the OK button. The system now performs the automatic model selection process for CHEMICAL and VEHICLES. The selected forecasting models for CHEMICAL and VEHICLES are added to the model lists for those series. If you switch the current time series in the Develop Models window to CHEMICAL or VEHICLES, you will see the model that the system selected for that series. Once forecasting models have been fit for all regressors, the system proceeds to fit the regression model for PETROL. The fitted regression model is added to the model list displayed in the Develop Models window.
Adjustments
An adjustment predictor is a variable in the input data set that is used to adjust the forecast values produced by the forecasting model. Unlike a regressor, an adjustment variable does not have a regression coefficient. No model fitting is performed for adjustments. Nonmissing values of the adjustment series are simply added to the model prediction for the corresponding period. Missing adjustment values are ignored. If you supply adjustment values for observations within the period of fit, the adjustment values are subtracted from the actual values, and the model is fit to these adjusted values.
To add adjustments, select Add and then select Adjustments from the pop-up menu (shown in Figure 41.1). This displays the Adjustments Selection window. The Adjustments Selection window functions the same as the Regressors Selection window (which is shown in Figure 41.8). You can select any number of adjustment variables as predictors. Unlike regressors, adjustments do not require forecasting models for the adjustment variables. If a variable that is used as an adjustment does have a forecasting model fit to it, the adjustment variable’s forecasting model is ignored when the variable is used as an adjustment.
You can use forecast adjustments to account for expected future events that have no precedent in the past and so cannot be modeled by regression. For example, suppose you are trying to forecast the sales of a product, and you know that a special promotional campaign for the product is planned during part of the period you want to forecast. If such sales promotion programs have been frequent in the past, then you can record the past and expected future level of promotional efforts in a variable in the data set and use that variable as a regressor in the forecasting model. However, if this is the first sales promotion of its kind for this product, you have no way to estimate the effect of the promotion from past data.
In this case, the best you can do is to make an educated guess at the effect the promotion will have and add that guess to what your forecasting model would predict in the absence of the special sales campaign.
Adjustments are also useful for making judgmental alterations to forecasts. For example, suppose you have produced forecast sales data for the next 12 months. Your supervisor believes that the forecasts are too optimistic near the end and asks you to prepare a forecast graph in which the numbers that you have forecast are reduced by 1000 in the last three months. You can accomplish this task by editing the input data set so that it contains observations for the actual data range of sales plus 12 additional observations for the forecast period, and a new variable called, for example, ADJUSTMENT. Because adjustment values are added to the model prediction, the variable ADJUSTMENT contains the value -1000 for the last three observations and is missing for all other observations. You fit the same model previously selected for forecasting by using the ARIMA Model Specification or Custom Model Specification window, but with an adjustment added that uses the variable ADJUSTMENT. Now when you graph the forecasts by using the Model Viewer, the last three periods of the forecast are reduced by 1000. The confidence limits are unchanged, which helps draw attention to the fact that the adjustments to the forecast deviate from what would be expected statistically.
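The adjustment rules described in this section can be summarized in a brief Python sketch. Missing values are represented by None, and the forecast numbers are invented. Because adjustment values are added to the prediction, a reduction of 1000 is expressed as an adjustment value of -1000. The function names are hypothetical.

```python
def adjust_actuals(actuals, adjustments):
    """Within the period of fit, subtract nonmissing adjustments from actuals."""
    return [y if a is None else y - a for y, a in zip(actuals, adjustments)]


def apply_adjustments(predictions, adjustments):
    """Add nonmissing adjustment values to predictions; ignore missing ones."""
    return [p if a is None else p + a for p, a in zip(predictions, adjustments)]


# Hypothetical six-period forecast with the last three periods reduced by 1000.
forecast = [5200.0, 5300.0, 5400.0, 5500.0, 5600.0, 5700.0]
adjustment = [None, None, None, -1000.0, -1000.0, -1000.0]

adjusted = apply_adjustments(forecast, adjustment)
print(adjusted)
```

No coefficient is estimated for the adjustment variable, so the adjusted forecast differs from the model forecast by exactly the supplied values.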
Dynamic Regressor
Selecting Dynamic Regressor from the Add Predictors menu (shown in Figure 41.1) allows you to specify a complex time series model of the way that a predictor variable influences the series that you are forecasting. When you specify a predictor variable as a simple regressor, only the current period value of the predictor affects the forecast for the period. By specifying the predictor with the Dynamic Regressor option, you can use past values of the predictor series, and you can model effects that take place gradually.
Dynamic regression models are an advanced feature that you are unlikely to find useful unless you have studied the theory of statistical time series analysis. You might want to skip this section if you are not trained in time series modeling.
The term dynamic regression was introduced by Pankratz (1991) and refers to what Box and Jenkins (1976) named transfer function models. In dynamic regression, you have a time series model, similar to an ARIMA model, that predicts how changes in the predictor series affect the dependent series over time. The dynamic regression model relates the predictor variable to the expected value of the dependent series in the same way that an ARIMA model relates the fluctuations of the dependent series about its conditional mean to the random error term (which is also called the innovation series). Refer to Pankratz (1991) and Box and Jenkins (1976) for more information about dynamic regression or transfer function models. See also Chapter 7, “The ARIMA Procedure.”
From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Linear Trend from the menu (shown in Figure 41.1). Now select Add and select Dynamic Regressor. This displays the Dynamic Regressors Selection window, as shown in Figure 41.11.
Figure 41.11 Dynamic Regressors Selection Window
You can select only one predictor series when specifying a dynamic regression model. For this example, select VEHICLES, Sales: Motor Vehicles and Parts. Then select the OK button. This displays the Dynamic Regression Specification window, as shown in Figure 41.12.
Figure 41.12 Dynamic Regression Specification Window
This window consists of four parts. The Input Transformations fields enable you to transform or lag the predictor variable. For example, you might use the lagged logarithm of the variable as the predictor series. The Order of Differencing fields enable you to specify simple and seasonal differencing of the predictor series. For example, you might use changes in the predictor variable instead of the variable itself as the predictor series. The Numerator Factors and Denominator Factors fields enable you to specify the orders of simple and seasonal numerator and denominator factors of the transfer function. Simple regression is a special case of dynamic regression in which the dynamic regression model consists of only a single regression coefficient for the current value of the predictor series. If you select the OK button without specifying any options in the Dynamic Regression Specification window, a simple regressor will be added to the model. For this example, use the Simple Order combo box for Denominator Factors and set its value to 1. The window now appears as shown in Figure 41.13.
Figure 41.13 Distributed Lag Regression Specified
This model is equivalent to regression on an exponentially weighted infinite distributed lag of VEHICLES (in the same way an MA(1) model is equivalent to single exponential smoothing). Select the OK button to add the dynamic regressor to the model predictors list. In the ARIMA Model Specification window, the Predictors list should now contain two items, a linear trend and a dynamic regressor for VEHICLES, as shown in Figure 41.14.
Figure 41.14 Dynamic Regression Model
This model is a multiple regression of PETROL on a time trend variable and an infinite distributed lag of VEHICLES. Select the OK button to fit the model. As with simple regressors, if VEHICLES does not already have a forecasting model, an automatic model selection process is performed to find a forecasting model for VEHICLES before the dynamic regression model for PETROL is fit.
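The equivalence between a first-order denominator factor and an exponentially weighted infinite distributed lag can be sketched as follows. The symbols are assumptions introduced for illustration and do not appear in the window: \omega_0 is the regression coefficient, \delta_1 is the denominator coefficient, x_t is the predictor (VEHICLES), and B is the backshift operator with B x_t = x_{t-1}.

```latex
\frac{\omega_0}{1 - \delta_1 B}\, x_t
  = \omega_0 \left( 1 + \delta_1 B + \delta_1^2 B^2 + \cdots \right) x_t
  = \omega_0 \sum_{j=0}^{\infty} \delta_1^{\,j}\, x_{t-j},
  \qquad |\delta_1| < 1
```

Under this reading, the weight on the predictor value j periods back is \omega_0 \delta_1^j, so the weights decline geometrically with the lag.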
Interventions
An intervention is a special indicator variable, computed automatically by the system, that identifies time periods affected by unusual events that influence or intervene in the normal path of the time series you are forecasting. When you add an intervention predictor, the indicator variable of the intervention is used as a regressor, and the impact of the intervention event is estimated by regression analysis. To add an intervention to the Predictors list, you must use the Intervention Specification window to specify the time or times that the intervening event took place and to specify the type of intervention.
You can add interventions either through the Interventions item of the Add action or by selecting Tools from the menu bar and then selecting Define Interventions. Intervention specifications are associated with the series. You can specify any number of interventions for each series, and once you define interventions you can select them for inclusion in forecasting models. If you select the Include Interventions option in the Options menu, any interventions that you have previously specified for a series are automatically added as predictors to forecasting models for the series. From the Develop Models window, invoke the series viewer by selecting the View Series Graphically icon or Series under the View menu. This displays the Time Series Viewer, as was shown in Figure 41.2. Note that the trend in the PETROL series shows several clear changes in direction. The upward trend in the first part of the series reverses in 1981. There is a sharp drop in the series towards the end of 1985, after which the trend is again upwardly sloped. Finally, in 1991 the series takes a sharp upward excursion but quickly returns to the trend line. You might have no idea what events caused these changes in the trend of the series, but you can use these patterns to illustrate the use of intervention predictors. To do this, you fit a linear trend model to the series, but modify that trend line by adding intervention effects to model the changes in trend you observe in the series plot.
The Intervention Specification Window
From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Linear Trend from the menu (shown in Figure 41.1). Select Add again and then select Interventions. If you have any interventions already defined for the series, this selection displays the Interventions for Series window. However, since you have not previously defined any interventions, this list is empty. Therefore, the system assumes that you want to add an intervention and displays the Intervention Specification window instead, as shown in Figure 41.15.
Figure 41.15 Interventions Specification Window
The top of the Intervention Specification window shows the current series and the label for the new intervention (initially blank). At the right side of the window is a scrollable table showing the values of the series. This table helps you locate the dates of the events you want to model. At the left of the window is an area titled Intervention Specification that contains the options for defining the intervention predictor. The Date field specifies the time that the intervention occurs. You can type a date value in the Date field, or you can set the Date value by selecting a row from the table of series values at the right side of the window.
The area titled Type of Intervention controls the kind of indicator variable constructed to model the intervention effect. You can specify the following kinds of interventions:
Point
is used to indicate an event that occurs in a single time period. An example of a point event is a strike that shuts down production for part of a time period. The value of the intervention’s indicator variable is zero except for the date specified.
Step
is used to indicate a continuing event that changes the level of the series. An example of a step event is a change in the law, such as a tax rate increase. The value of the intervention’s indicator variable is zero before the date specified and 1 thereafter.
Ramp
is used to indicate a continuing event that changes the trend of the series. The value of the intervention’s indicator variable is zero before the date specified, and it increases linearly with time thereafter.
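The three indicator types described above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the system's own computation. The function name, the 0-based date index, the value 1 at the date of a point event, and the convention that the ramp is 0 at the intervention date and rises by 1 per period afterward are all assumptions made for this sketch.

```python
def intervention_indicator(n_periods, start, kind):
    """Return the indicator series for a point, step, or ramp intervention.

    start is the 0-based index of the intervention date (a convention
    chosen for this sketch).
    """
    if kind not in ("point", "step", "ramp"):
        raise ValueError(f"unknown intervention type: {kind}")
    values = []
    for t in range(n_periods):
        if kind == "point":
            # Nonzero only in the single period of the event.
            values.append(1 if t == start else 0)
        elif kind == "step":
            # Zero before the date, 1 from the date onward.
            values.append(1 if t >= start else 0)
        else:
            # Assumption: 0 at the date itself, then +1 per period.
            values.append(max(t - start, 0))
    return values


for kind in ("point", "step", "ramp"):
    print(kind, intervention_indicator(6, 2, kind))
```

Each series is then used as an ordinary regressor, so the estimated coefficient scales the size of the pulse, the level shift, or the change in slope.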
The areas titled Effect Time Window and Effect Decay Pattern specify how to model the effect that the intervention has on the dependent series. These options are not used for simple interventions; they are discussed later in this chapter.
Specifying a Trend Change Intervention
In the Time Series Viewer window, position the mouse over the highest point in 1981 and select the point. This displays the data value, 19425, and date, February 1981, of that point in the upper-right corner of the Time Series Viewer, as shown in Figure 41.16.
Figure 41.16 Identifying the Turning Point
Now that you know the date that the trend reversal occurred, enter that date in the Date field of the Intervention Specification window. Select Ramp as the type of intervention. The window should now appear as shown in Figure 41.17.
Figure 41.17 Ramp Intervention Specified
Select the OK button. This adds the intervention to the list of interventions for the PETROL series, and returns you to the Interventions for Series window, as shown in Figure 41.18.
Figure 41.18 Interventions for Series Window
This window allows you to select interventions for inclusion in the forecasting model. Since you need to define other interventions, select the Add button. This returns you to the Intervention Specification window (shown in Figure 41.15).
Specifying a Level Change Intervention
Now add an intervention to account for the drop in the series in late 1985. You can locate the date of this event by selecting points in the Time Series Viewer plot or by scrolling through the data values table in the Intervention Specification window. Use the latter method so that you can see how this works. Scrolling through the table, you see that the drop was from 15262 in December 1985, to 13937 in January 1986, to 12002 in February, to 10834 in March. Since the drop took place over several periods, you could use another ramp type intervention. However, this example represents the drop as a sudden event by using a step intervention and uses February 1986 as the approximate time of the drop.
Select the table row for February 1986 to set the Date field. Select Step as the intervention type. The window should now appear as shown in Figure 41.19.
Figure 41.19 Step Intervention Specified
Select the OK button to add this intervention to the list for the series. Since the trend reverses again after the drop, add a ramp intervention for the same date as the step intervention. Select Add from the Interventions for Series window. Enter FEB86 in the Date field, select Ramp, and then select the OK button.
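With the linear trend and the three interventions defined so far, the regression part of the model can be written out as a piecewise linear trend. The notation below is an assumption introduced for illustration: R_t and S_t denote the ramp and step indicators for the dates shown, the \beta and \gamma symbols are the regression coefficients, and each ramp is taken to rise by 1 per period after its date.

```latex
\mu_t = \beta_0 + \beta_1 t
      + \gamma_1 R_t^{(\mathrm{Feb\,1981})}
      + \gamma_2 S_t^{(\mathrm{Feb\,1986})}
      + \gamma_3 R_t^{(\mathrm{Feb\,1986})}

\text{slope of } \mu_t =
\begin{cases}
\beta_1 & \text{before February 1981} \\
\beta_1 + \gamma_1 & \text{February 1981 to February 1986} \\
\beta_1 + \gamma_1 + \gamma_3 & \text{after February 1986}
\end{cases}
```

In this form \gamma_2 is the one-time shift in level at February 1986, and a trend reversal in 1981 corresponds to \gamma_1 being negative and larger in magnitude than \beta_1.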
Modeling Complex Intervention Effects
You have now defined three interventions to model the changes in trend and level. The excursion near the end of the series remains to be dealt with. Select Add from the Interventions for Series window. Scroll through the data values and select the date on which the excursion began, August 1990. Leave the intervention type as Point.
The pattern of the series from August 1990 through January 1991 is more complex than a simple shift in level or trend. For this pattern, you need a complex intervention model for an event that causes a sharp rise followed by a rapid return to the previous trend line. To specify this model, use the Effect Time Window and Effect Decay Pattern options.
The Effect Time Window option controls the number of lags of the intervention’s indicator variable used to model the effect of the intervention on the dependent series. For a simple intervention, the number of lags is zero, which means that the effect of the intervention is modeled by fitting a single regression coefficient to the intervention’s indicator variable. When you set the number of lags greater than zero, regression coefficients are fit to lags of the indicator variable. This allows you to model interventions whose effects take place gradually, or to model rebound effects. For example, severe weather might reduce production during one period but cause an increase in production in the following period as producers struggle to catch up. You could model this by using a point intervention with an effect time window of 1 lag. This would fit two coefficients for the intervention, one for the immediate effect and one for the delayed effect.
The Effect Decay Pattern option controls how the effect of the intervention dissipates over time. None specifies that there is no gradual decay: for point interventions, the effect ends immediately; for step and ramp interventions, the effect continues indefinitely. Exp specifies that the effect declines at an exponential rate. Wave specifies that the effect declines like an exponentially damped sine wave (or as the sum of two exponentials, depending on the fit to the data).
If you are familiar with time series analysis, these options might be clearer if you note that together the Effect Time Window and Effect Decay Pattern options define the numerator and denominator orders of a transfer function or dynamic regression model for the indicator variable of the intervention. See the section “Dynamic Regressor” for more information.
For this example, select 2 lags as the value of the Effect Time Window option, and select Exp as the Effect Decay Pattern option. The window should now appear as shown in Figure 41.20.
Figure 41.20 Complex Intervention Model
Select the OK button to add the intervention to the list.
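The combined action of an effect time window and an exponential decay pattern on a point intervention can be sketched as follows. This is a minimal illustration and not the system's estimation code. The function name, the coefficient values, and the sign convention (all terms added, with effect[t] = decay * effect[t-1] + sum of lag coefficients times lagged indicator values) are assumptions chosen for this sketch.

```python
def intervention_effect(indicator, lag_coefs, decay=0.0):
    """Apply lag coefficients and a first-order exponential decay.

    effect[t] = decay * effect[t-1] + sum_k lag_coefs[k] * indicator[t-k]
    lag_coefs[0] is the immediate effect; its length minus 1 is the
    number of lags in the effect time window.
    """
    effect = []
    for t in range(len(indicator)):
        direct = sum(
            coef * indicator[t - k]
            for k, coef in enumerate(lag_coefs)
            if t - k >= 0
        )
        previous = effect[t - 1] if t > 0 else 0.0
        effect.append(decay * previous + direct)
    return effect


point = [0, 0, 1, 0, 0, 0, 0, 0]  # point intervention at index 2

# Rebound example: a drop, then a catch-up one period later, no decay.
print(intervention_effect(point, [-4.0, 3.0]))

# Two lags plus exponential decay (coefficient values are made up).
print(intervention_effect(point, [8.0, 4.0, 2.0], decay=0.5))
```

With no decay the effect ends as soon as the time window is exhausted. With a decay value between 0 and 1 the effect shrinks by that factor each period after the last lagged coefficient has been applied.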
Fitting the Intervention Model
The Interventions for Series window now contains definitions for four intervention predictors. Select all four interventions, as shown in Figure 41.21.
Figure 41.21 Interventions for Series Window
Select the OK button. This returns you to the ARIMA Model Specification window, which now lists items in the Predictors list, as shown in Figure 41.22.
Figure 41.22 Linear Trend with Interventions Specified
Select the OK button to fit this model. After the model is fit, bring up the Model Viewer. You will see a plot of the model predictions, as shown in Figure 41.23.
Figure 41.23 Linear Trend with Interventions Model
You can use the Zoom In feature to take a closer look at how the complex intervention effect fits the excursion in the series starting in August 1990.
Limitations of Intervention Predictors
Note that the model you have just fit is intended only to illustrate the specification of interventions. It is not intended as an example of good forecasting practice.
The use of continuing (step and ramp type) interventions as predictors has some limitations that you should consider. If you model a change in trend with a simple ramp intervention, then the trend in the data before the date of the intervention has no influence on the forecasts. Likewise, when you use a step intervention, the average level of the series before the intervention has no influence on the forecasts. Only the final trend and level at the end of the series are extrapolated into the forecast period. If a linear trend is the only pattern of interest, then instead of specifying step or ramp interventions, it would be simpler to adjust the period of fit so that the model ignores the data before the final trend or level change.
Step and ramp interventions are valuable when there are other patterns in the data (such as seasonality, autocorrelated errors, and error variance) that are stable across the changes in level or trend. Step and ramp interventions enable you to fit seasonal and error autocorrelation patterns to the whole series while fitting the trend only to the latter part of the series.
Point interventions are a useful tool for dealing with outliers in the data. A point intervention will fit the series value at the specified date exactly, and it has the effect of removing that point from the analysis. When you specify an effect time window, a point intervention will exactly fit as many additional points as the number of lags specified.
Seasonal Dummies
A Seasonal Dummies predictor is a special feature that adds to the model seasonal indicator or “dummy” variables to serve as regressors for seasonal effects. From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Seasonal Dummies from the menu (shown in Figure 41.1). A Seasonal Dummies input is added to the Predictors list, as shown in Figure 41.24.
Figure 41.24 Seasonal Dummies Specified
Select the OK button. A model consisting of an intercept and 11 seasonal dummy variables is fit and added to the model list in the Develop Models window. This is effectively a mean model with a separate mean for each month. Now return to the Model Viewer, which displays a plot of the model predictions and actual series values, as shown in Figure 41.25. This is obviously a poor model for this series, but it serves to illustrate how seasonal dummy variables work.
Figure 41.25 Seasonal Dummies Model
Now select the parameter estimates icon, the fifth from the top on the vertical toolbar. This displays the Parameter Estimates table, as shown in Figure 41.26.
Figure 41.26 Parameter Estimates for Seasonal Dummies Model
Since the data for this example are monthly, the Seasonal Dummies option added 11 seasonal dummy variables. These include a dummy regressor variable that is 1.0 for January and 0 for other months, a regressor that is 1.0 only for February, and so forth through November. Because the model includes an intercept, no dummy variable is added for December. The December effect is measured by the intercept, while the effect of other seasons is measured by the difference between the intercept and the estimated regression coefficient for the season’s dummy variable. The same principle applies for other data frequencies: the “Seasonal Dummy 1” parameter always refers to the first period in the seasonal cycle; and, when an intercept is present in the model, there is no seasonal dummy parameter for the last period in the seasonal cycle.
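The dummy coding described above can be sketched in Python. This is an illustration under stated assumptions and not the system's implementation: the function names are hypothetical, the model contains only an intercept and the dummies, and every season occurs at least once in the data. For such a model the least-squares solution can be computed directly from the per-season means, which is what the sketch does.

```python
from statistics import mean


def seasonal_dummies(period, n_seasons=12):
    """Dummy values for one observation; period is 1-based within the cycle.

    With an intercept in the model, the last period gets no dummy,
    so the result has n_seasons - 1 entries.
    """
    return [1.0 if period == s else 0.0 for s in range(1, n_seasons)]


def fit_seasonal_means(periods, values, n_seasons=12):
    """Least-squares fit of intercept + dummies via per-season means.

    intercept = mean of the last season;
    coefficient s = mean of season s minus the intercept.
    Assumes every season appears at least once.
    """
    by_season = {s: [] for s in range(1, n_seasons + 1)}
    for p, v in zip(periods, values):
        by_season[p].append(v)
    intercept = mean(by_season[n_seasons])
    coefs = [mean(by_season[s]) - intercept for s in range(1, n_seasons)]
    return intercept, coefs


# Two years of made-up quarterly data (n_seasons=4 keeps the output short).
periods = [1, 2, 3, 4, 1, 2, 3, 4]
values = [10, 20, 30, 40, 12, 22, 32, 42]
intercept, coefs = fit_seasonal_means(periods, values, n_seasons=4)
print(intercept, coefs)        # last-season mean and offsets from it
print(intercept + coefs[0])    # fitted mean for the first season
```

Under this coding, each coefficient is the offset of its season from the last season, so the fitted mean for a season is the intercept plus that season's coefficient.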
References
Box, G.E.P. and Jenkins, G.M. (1976), Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.
Pankratz, A. (1991), Forecasting with Dynamic Regression Models, New York: John Wiley & Sons.
Chapter 42
Command Reference
Contents
TSVIEW Command and Macro
   Syntax
   Examples
FORECAST Command and Macro
   Syntax
   Examples
TSVIEW Command and Macro
The TSVIEW command invokes the Time Series Viewer. This is a component of the Time Series Forecasting System that can also be used as a standalone graphical viewer for any time series data set or view. See the section “Time Series Viewer Window” in Chapter 43, “Window Reference,” for more information.
The TSVIEW command must be specified from the command line or an SCL program. If you need to submit from the program editor, use the %TSVIEW macro instead. You can use the macro within a DATA step program, but you must submit it within the SAS windowing environment.
If the TSVIEW command or %TSVIEW macro is issued without arguments, the Series Selection window appears to enable you to select an input data set and series. This is equivalent to selecting “Time Series Viewer” from the Analysis submenu of the Solutions menu. By specifying the DATA= and VAR= arguments, you can bring up the Time Series Viewer window directly. The ID= and INTERVAL= arguments are useful when the system cannot determine them automatically from the data.
Syntax
The TSVIEW command has the following form:
   TSVIEW [options] ;
The %TSVIEW macro has the following form:
   %TSVIEW [(option, ..., option)] ;
The following options can be specified for the command and the macro.
DATA=data set name
specifies the name of the SAS data set containing the input data.
VAR=time series variable name
specifies the series variable name. It must be a numeric variable contained in the data set.
ID=time id variable name
specifies the time ID variable name for the data set. If the ID= option is not specified, the system attempts to locate the variables named DATE, DATETIME, and TIME in the data set specified by the DATA= option.
INTERVAL=interval name
specifies the time ID interval between observations in the data set.
Examples
TSVIEW Command
   tsview data=sashelp.air var=air
   tsview data=dept.prod var=units id=period interval=qtr
%TSVIEW Macro
   %tsview( data=sashelp.air, var=air);
   %tsview( data=dept.prod, var=units, id=period, interval=qtr);
FORECAST Command and Macro
The FORECAST command invokes the Time Series Forecasting System. The command must be specified from the command line or an SCL program. If you need to submit from the program editor, use the %FORECAST macro instead. You can use the macro within a DATA step program, but you must submit it within the SAS windowing environment.
If the FORECAST command or %FORECAST macro is issued without arguments, the Time Series Forecasting window appears. This is equivalent to selecting “Time Series Forecasting System” from the Analysis submenu of the Solutions menu.
Using the arguments, it is possible to do the following:
   Bring up the system with information already filled into some of the fields.
   Bring up the system starting at a different window than the default Time Series Forecasting window.
   Run the system in unattended mode so that a task such as creating a forecast data set is accomplished without any user interaction. By submitting such commands repeatedly from a SAS/AF or SAS/EIS application, it is possible to do “batch” processing for many data sets or by-group processing for many subsets of a data set.
You can create a project in unattended mode and later open it for inspection interactively. You can also create a project interactively in order to set options, fit a model, or edit the list of models, and then use this project later in unattended mode.
The Forecast Command Builder, a point-and-click SAS/AF application, makes it easy to specify, run, save, and rerun forecasting jobs by using the FORECAST command. To use it, enter the following on the command line (not the program editor):
   %FCB
or
   AF C=SASHELP.FORCAST.FORCCMD.FRAME.
Syntax
The FORECAST command has the following form:
   FORECAST [options] ;
The %FORECAST macro has the following form:
   %FORECAST [(option, ..., option)] ;
The following options can be specified for the command and the macro.
PROJECT=project name
specifies the name of the SAS catalog entry in which forecasting models and other results are stored and from which previously stored results are loaded into the forecasting system.
DATA=data set name
specifies the name of the SAS data set containing the input data.
VAR=time series variable name
specifies the series variable name. It must be a numeric variable contained in the data set.
ID=time id variable name
specifies the time ID variable name for the data set. If the ID= option is not specified, the system attempts to locate the variables named DATE, DATETIME, and TIME in the data set specified by the DATA= option. However, it is recommended that you specify the time ID variable whenever you are using the ENTRY= argument.
INTERVAL=interval name
specifies the time ID interval between observations in the data set. Commonly used intervals are year, semiyear, qtr, month, semimonth, week, weekday, day, hour, minute, and second. See Chapter 4, “Date Intervals, Formats, and Functions,” for information about more complex interval specifications. If the INTERVAL= option is not specified, the system attempts to determine the interval based on the time ID variable. However, it is recommended that you specify the interval whenever you are using the ENTRY= argument.
STAT=statistic
specifies the name of the goodness-of-fit statistic to be used as the model selection criterion. The default is RMSE. Valid names are
sse
sum of square error
mse
mean square error
rmse
root mean square error
mae
mean absolute error
mape
mean absolute percent error
aic
Akaike information criterion
sbc
Schwarz Bayesian information criterion
rsquare
R-square
adjrsq
adjusted R-square
rwrsq
random walk R-square
arsq
Amemiya’s adjusted R-square
apc
Amemiya’s prediction criterion
CLIMIT=integer
specifies the level of the confidence limits to be computed for the forecast. This integer represents a percentage; for example, 925 indicates 92.5% confidence limits. The default is 95; that is, 95% confidence limits.
HORIZON=integer
specifies the number of periods into the future for which forecasts are computed. The default is 12 periods. The maximum is 9999.
ENTRY=name
specifies the name of an entry point into the system. Valid names are
main
starts the system at the Time Series Forecasting window (default).
devmod
starts the system at the Develop Models window.
viewmod
starts the system at the Model Viewer window. Specify a project that contains a forecasting model by using the PROJECT= option. If a project containing a model is not specified, the message “No forecasting model to view” appears.
viewser
starts the system at the Time Series Viewer window.
autofit
runs the system in unattended mode, fitting a forecasting model automatically and saving it in a project. If PROJECT= is not specified, the default project name SASUSER.FMSPROJ.PROJ is used.
forecast
runs the system in unattended mode to generate a forecast data set. The name of this data set is specified by the OUT= parameter. If OUT= is not specified, a window appears to prompt for the name and label of the output data set. If PROJECT= is not specified, the default project name SASUSER.FMSPROJ.PROJ is used. If the project does not exist or does not contain a forecasting model for the specified series, automatic model fitting is performed and the forecast is computed by using the automatically selected model. If the project exists and contains a forecasting model for the specified series, the forecast is computed by using this model. If the series covers a different time range than it did when the project was created, use the REFIT or REEVAL keyword to reset the time ranges.
OUT=argument
specifies a one- or two-level name of a SAS data set in which forecasts are saved. Use in conjunction with ENTRY=FORECAST. If omitted, the system prompts for the name of the forecast data set.
KEEP=argument
specifies the number of models to keep in the project when automatic model fitting is performed. This corresponds to Models to Keep in the Automatic Model Selection Options window. A value greater than 9 indicates that all models are kept. The default is 1.
DIAG=YES|NO
specifies which models to search with regard to series diagnostics. DIAG=YES causes the automatic model selection process to search only over those models that are consistent with the series diagnostics. DIAG=NO causes the automatic model selection process to search over all models in the selection list, without regard for the series diagnostics. This corresponds to Models to Fit in the Automatic Model Selection Options window. The default is YES.
REFIT=keyword
(for macro usage) refits a previously saved forecasting model by using the current fit range; that is, it reestimates the model parameters. Refitting also causes the model to be reevaluated (statistics of fit recomputed), and it causes the time ranges to be reset if the data range has changed (for example, if new observations have been added to the series). This keyword has no effect if you do not use the PROJECT= argument to reference an existing project containing a forecasting model. Use the REFIT keyword if you have added new data to the input series and you want to refit the forecasting model and update the forecast by using the new time ranges. Be sure to use the same project, data set, and series names that you used previously.
REEVAL=keyword
(for macro usage) reevaluates a previously saved forecasting model by using the current evaluation range; that is, it recomputes the statistics of fit. Reevaluating also causes the time ranges to be reset if the data range has changed (for example, if new observations have been added to the series). It does not refit the model parameters. This keyword has no effect if you also specify REFIT, or if you do not use the PROJECT= argument to reference an existing project containing a forecasting model. Use the REEVAL keyword if you have added new data to the input series and want to update your forecast by using a previously fit forecasting model and the same project, data set, and series names that you used previously.
Examples
FORECAST Command
The following command opens the Time Series Forecasting window with the data set name and series name filled in. The time ID variable is also filled in since the data set contains the variable DATE. The interval is filled in because the system recognizes that the observations are monthly.
   forecast data=sashelp.air var=air
The following command opens the Time Series Forecasting window with the project, data set name, series, time ID, and interval fields filled in, assuming that the project SAMPROJ was previously saved either interactively or by using unattended mode as depicted below. Previously fit models appear when the Develop Models or Manage Projects window is opened.
   forecast project=samproj
The following command runs the system in unattended mode, fitting a model automatically, storing it in the project SAMPROJ in the default catalog SASUSER.FMSPROJ, and placing the forecasts in the data set WORK.SAMPOUT.
   forecast data=sashelp.workers var=electric id=date interval=month project=samproj entry=forecast out=sampout
The following command assumes that a new month’s data have been added to the data set from the previous example and that an updated forecast is needed that uses the previously fit model. Time ranges are automatically updated to include the new data since the REEVAL keyword is included. Substitute REFIT for REEVAL if you want the system to reestimate the model parameters.
   forecast data=sashelp.workers var=electric id=date interval=month project=samproj entry=forecast out=sampout reeval
The following command opens the model viewer with the project created in the previous example and with 99 percent confidence limits in the forecast graph.
   forecast data=sashelp.workers var=electric id=date interval=month project=samproj entry=viewmod climit=99
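When many series need the same unattended run, the command text follows a regular pattern of option=value pairs followed by optional bare keywords. The Python sketch below only assembles such command strings, for example to paste or generate into a job; it does not run SAS or validate options. The function name and the output data set names ending in _fc are hypothetical, and the option names are the ones listed in the Syntax section.

```python
def build_forecast_command(keywords=(), **options):
    """Assemble a FORECAST command string from option=value pairs.

    Only builds text; it does not validate options or run SAS.
    """
    parts = ["forecast"]
    parts += [f"{name}={value}" for name, value in options.items()]
    parts += list(keywords)  # bare keywords such as "reeval" or "refit"
    return " ".join(parts)


# One unattended forecast command per series (output names are made up).
for var in ["electric", "masonry"]:
    print(build_forecast_command(
        data="sashelp.workers", var=var, id="date", interval="month",
        project="samproj", entry="forecast", out=f"{var}_fc",
    ))
```

Passing keywords=["reeval"] appends the REEVAL keyword at the end of the string, matching the form of the update example above.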
The final example illustrates using unattended mode with an existing project that has been defined interactively. In this example, the goal is to add a model to the model selection list, to specify that all models in that list be fit, and that all models which are fit successfully be retained.
First open the Time Series Forecasting window and specify a new project name, WORKPROJ. Then select Develop Models, choosing SASHELP.WORKERS as the data set and MASONRY as the series. Now select “Model Selection List” from the Options menu. In the Model Selection List window, click Actions, then Add, and then ARIMA Model. Define the model ARIMA(0,1,0)(0,1,0)s NOINT by setting the differencing value to 1 under both ARIMA Options and Seasonal ARIMA Options. Select OK to save the model and OK to close the Model Selection List window.
Now select “Automatic Fit” from the Options menu. In the Automatic Model Selection Options window, select “All autofit models in selection list” in the Models to fit radio box, select “All models” from the Models to keep combo box, and then click OK to close the window. Select “Save Project” from the File menu, and then close the Develop Models window and the Time Series Forecasting window. You now have a project with a new model added to the selection list, options set for automatic model fitting, and one series selected but no models fit.
Now enter the command:
   forecast data=sashelp.workers var=electric id=date interval=month project=workproj entry=forecast out=workforc
The system runs in unattended mode to update the project and create the forecast data set WORKFORC. Check the messages in the Log window to find out if the run was successful and which model was selected for forecasting. To see the forecast data set, issue the command viewtable WORKFORC. To see the contents of the project, open the Time Series Forecasting window, open the project WORKPROJ, and select “Manage Projects.” You will see that the variable ELECTRIC was added to the project and has a forecasting model. Select this row in the table and then select List Models from the Tools menu. You will see that all of the models in the selection list which fit successfully are there, including the new model you added to the selection list.
%FORECAST Macro
This example demonstrates the use of the %FORECAST macro to start the Time Series Forecasting System from a SAS program submitted from the Editor window. The SQL procedure is used to create a view of a subset of a products data set. Then the %FORECAST macro is used to produce forecasts.
   /* Create a view containing only the type A products, ordered by date */
   proc sql;
      create view selprod as
         select * from products
         where type eq 'A'
         order by date;
   quit;
   /* Forecast AMOUNT from the view in unattended mode, refitting the saved model */
   %forecast(data=selprod, var=amount, id=date, interval=day, entry=forecast, out=typea, proj=proda, refit= );
Chapter 43
Window Reference

Contents

Overview
Adjustments Selection Window
AR/MA Polynomial Specification Window
ARIMA Model Specification Window
ARIMA Process Specification Window
Automatic Model Fitting Window
Automatic Model Fitting Results Window
Automatic Model Selection Options Window
Custom Model Specification Window
Data Set Selection Window
Default Time Ranges Window
Develop Models Window
Differencing Specification Window
Dynamic Regression Specification Window
Dynamic Regressors Selection Window
Error Model Options Window
External Forecast Model Specification Window
Factored ARIMA Model Specification Window
Forecast Combination Model Specification Window
Forecasting Project File Selection Window
Forecast Options Window
Intervention Specification Window
Interventions for Series Window
Manage Forecasting Project Window
Model Fit Comparison Window
Model List Window
Model Selection Criterion Window
Model Selection List Editor Window
Model Viewer Window
Models to Fit Window
Polynomial Specification Window
Produce Forecasts Window
Regressors Selection Window
Save Data As
Save Graph As
Seasonal ARIMA Model Options Window
Series Diagnostics Window
Series Selection Window
Series to Process Window
Series Viewer Transformations Window
Smoothing Model Specification Window
Smoothing Weight Optimization Window
Statistics of Fit Selection Window
Time ID Creation – 1,2,3 Window
Time ID Creation from Several Variables Window
Time ID Creation from Starting Date Window
Time ID Creation Using Informat Window
Time ID Variable Specification Window
Time Ranges Specification Window
Time Series Forecasting Window
Time Series Simulation Window
Time Series Viewer Window
Overview

This chapter provides a reference to the various windows of the Time Series Forecasting System. The windows are presented in alphabetical order by name. Each section describes the purpose of the window, how to open it, its controls, fields, and menus.

For windows that have their own menus, there is a description of each menu item under the heading “Menu Bar.” These windows also have a toolbar with icons that duplicate the more commonly used menu items. Each icon has a screen tip: a brief description that appears when you hover the mouse cursor over the icon. If you don’t see the screen tips, open the SAS Preferences window, under the Options submenu of the Tools menu. Select the View tab and make sure the “Screen tips” check box is checked.
Adjustments Selection Window

Use the Adjustments Selection window to select input variables for use as adjustments to the forecasts and add them to the Predictors list. Invoke this window from the pop-up menu that appears when you select the Add button of the ARIMA Model Specification window or Custom Model Specification window. For more information, see the “Adjustments” section in Chapter 41, “Using Predictor Variables.”

Controls and Fields

Dependent
    is the name and variable label of the current series.
Adjustments
    is a table that lists the names and labels in the input data set available for selection as adjustments. The variables you select are highlighted. Selecting a highlighted row again deselects that variable.
OK
    closes the Adjustments Selection window and adds the selected variables as adjustments in the model.
Cancel
    closes the window without adding any adjustments.
Reset
    resets all selections to their initial values upon entry to the window.
AR/MA Polynomial Specification Window

Use these windows to specify the autoregressive and moving-average terms in a factored ARIMA model. Access the AR Polynomial Specification window from the Set button next to the Autoregressive term in the Factored ARIMA Model Specification window. Access the MA Polynomial Specification window from the Set button next to the Moving Average term.

Controls and Fields

List of Polynomials
    Lists the polynomials that have been specified. Each polynomial is represented by a comma-delimited list of lag values enclosed in parentheses.
New
    Opens the Polynomial Specification window to add a new polynomial to the model.
Edit
    Opens the Polynomial Specification window to edit a polynomial that has been selected. If no polynomial is selected, this button is unavailable.
Remove
    Removes a selected polynomial from the list. If no polynomial is selected, this button is unavailable.
Remove All
    Clears the list of polynomials.
Move Up
    Moves a selected polynomial up one position in the list. If no polynomial is selected, or the first one is selected, this button is unavailable.
Move Down
    Moves a selected polynomial down one position in the list. If no polynomial is selected, or the last one is selected, this button is unavailable.
OK
    Closes the window and returns the specified list of polynomials to the Factored ARIMA Model Specification window.
Cancel
    Closes the window and discards any changes made to the list of polynomials.
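
For readers who think in terms of procedure syntax, a list of polynomials resembles the factored form of the P= and Q= options in the ESTIMATE statement of PROC ARIMA, where each parenthesized group of lags is one polynomial. The following is only a sketch under assumptions: the SASHELP.AIR sample data set stands in for your series, WORK.LOGAIR is a hypothetical name, and nothing here is claimed to be the code that the forecasting system itself submits.

```sas
/* Sketch only: SASHELP.AIR (variables DATE and AIR) is used as a  */
/* convenient monthly series; WORK.LOGAIR is a hypothetical name.  */
data work.logair;
   set sashelp.air;
   logair = log(air);
run;

proc arima data=work.logair;
   /* simple and seasonal (lag 12) differencing */
   identify var=logair(1,12) noprint;

   /* Two MA polynomials, written as two parenthesized lag lists:  */
   /*    (1 - theta1*B) * (1 - theta12*B**12)                      */
   estimate q=(1)(12) noconstant method=ml;

   /* By contrast, q=(1,12) would be a single polynomial:          */
   /*    1 - theta1*B - theta12*B**12                              */
run;
quit;
```

The order of the parenthesized groups corresponds to the order of the entries in the list, which is what the Move Up and Move Down buttons rearrange.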
ARIMA Model Specification Window

Use the ARIMA Model Specification window to specify and fit an ARIMA model with or without predictor effects as inputs. Access it from the Develop Models window, where it is invoked from the Fit Model item under Edit in the menu bar, or from the pop-up menu when you click an empty area of the model table.

Controls and Fields

Series
    is the name and variable label of the current series.
Model
    is a descriptive label for the model that you specify. You can type a label in this field or allow the system to provide a label. If you leave the label blank, a label is generated automatically based on the options you specify.
ARIMA Options
    specify the orders of the ARIMA model. You can either type in a value or click the arrow to select from a list.
  Autoregressive
      defines the order of the autoregressive part of the model.
  Differencing
      defines the order of simple differencing (for example, first difference or second difference).
  Moving Average
      defines the order of the moving-average part of the model.
Seasonal ARIMA Options
    specify the orders of the seasonal part of the ARIMA model. You can either type in a value or click the arrow to select from a list.
  Autoregressive
      defines the order of the seasonal autoregressive part of the model.
  Differencing
      defines the order of seasonal differencing (for example, first difference or second difference at the seasonal lags).
  Moving Average
      defines the order of the seasonal moving-average part of the model.
Transformation
    defines the series transformation for the model. When a transformation is specified, the ARIMA model is fit to the transformed series, and forecasts are produced by applying the inverse transformation to the ARIMA model predictions. The available transformations are: Log, Logistic, Square Root, Box-Cox, and None.
Intercept
    specifies whether a mean or intercept parameter is included in the ARIMA model. By default, the Intercept option is set to No when the model includes differencing and Yes when there is no differencing.
Predictors
    lists the predictor effects included as inputs in the model.
OK
    closes the ARIMA Model Specification window and fits the model.
Cancel
    closes the ARIMA Model Specification window without fitting the model. Any options you specified are lost.
Reset
    resets all options to their initial values upon entry to the ARIMA Model Specification window. This might be useful when editing an existing model specification; otherwise, Reset has the same function as Clear.
Clear
    resets all options to their default values.
Add
    opens a menu of types of predictors to add to the Predictors list.
Delete
    deletes the selected (highlighted) entry from the Predictors list.
Edit
    edits the selected (highlighted) entry in the Predictors list.
Mouse Button Actions

You can select or deselect entries in the Predictors list by clicking them. The selected (highlighted) predictor effect is acted on by the Delete and Edit buttons. Double-clicking on a predictor in the list invokes an appropriate edit action for that predictor. If you right-click an entry in the Predictors list, the system displays the following menu of actions that encompass the features of the Add, Delete, and Edit buttons.

Add Linear Trend
    adds a Linear Trend item to the Predictors list.
Add Trend Curve
    opens a menu of different time trend curves and adds the curve you select to the Predictors list. Certain trend curve specifications also set the Transformation field.
Add Regressors
    opens the Regressors Selection window to enable you to select other series in the input data set as regressors to predict the dependent series and add them to the Predictors list.
Add Adjustments
    opens the Adjustments Selection window to enable you to select other series in the input data set for use as adjustments to the forecasts and add them to the Predictors list.
Add Dynamic Regressor
    opens the Dynamic Regressors Selection window to enable you to select a series in the input data set as a predictor of the dependent series and also specify a transfer function model for the effect of the predictor series.
Add Interventions
    opens the Interventions for Series window to enable you to define and select intervention effects and add them to the Predictors list.
Add Seasonal Dummies
    adds a Seasonal Dummies predictor item to the Predictors list.
Edit Predictor
    edits the selected (highlighted) entry in the Predictors list.
Delete Predictors
    deletes the selected (highlighted) entry from the Predictors list.
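
As a rough programmatic counterpart to the ARIMA Model Specification window, the sketch below fits a model with Autoregressive = 1, Differencing = 1, Moving Average = 1, Transformation = Log, and Intercept = No by using PROC ARIMA directly, and then applies the inverse transformation to the forecasts. It is an illustration under assumptions rather than the code the system runs: SASHELP.AIR stands in for the series, and the data set names WORK.LOGAIR, WORK.FCST, and WORK.BACK are hypothetical.

```sas
/* Sketch only: SASHELP.AIR (variables DATE and AIR) stands in for */
/* the series. WORK.LOGAIR, WORK.FCST, WORK.BACK are hypothetical. */
data work.logair;
   set sashelp.air;
   logair = log(air);                 /* Transformation = Log       */
run;

proc arima data=work.logair;
   identify var=logair(1) noprint;    /* Differencing = 1           */
   estimate p=1 q=1 noconstant;       /* AR = 1, MA = 1, no intercept */
   forecast lead=12 id=date interval=month
            out=work.fcst noprint;
run;
quit;

/* Apply the inverse transformation to return to the original scale. */
data work.back;
   set work.fcst;
   median_fcst = exp(forecast);               /* median of a lognormal */
   mean_fcst   = exp(forecast + std**2 / 2);  /* mean of a lognormal   */
run;
```

The two back-transformed columns differ because exponentiating a forecast of the logged series gives a median on the original scale, while the mean requires the variance adjustment.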
ARIMA Process Specification Window

Use the ARIMA Process Specification window to define ARIMA processes for simulation. Invoke this window from the Add Series button in the Time Series Simulation window.

Controls and Fields

Series Name
    is the variable name for the series to be simulated.
Series Label
    is the variable label for the series to be simulated.
Series Mean
    is the mean of the simulated series.
Transformation
    defines the series transformation.
Simple Differencing
    is the order of simple differencing for the series.
Seasonal Differencing
    is the order of seasonal differencing for the series.
AR Parameters
    is a table of autoregressive terms for the simulated ARIMA process. Enter a value for Factor, Lag, and Value for each term of the AR part of the process you want to simulate. For a nonfactored AR model, make the Factor values the same for all terms. For a factored AR model, use different Factor values to group the terms into the factors.
MA Parameters
    is a table of moving-average terms for the simulated ARIMA process. Enter a value for Factor, Lag, and Value for each term of the MA part of the process you want to simulate. For a nonfactored MA model, make the Factor values the same for all terms. For a factored MA model, use different Factor values to group the terms into the factors.
OK
    closes the ARIMA Process Specification window and adds the specified process to the Series to Generate list in the Time Series Simulation window.
Cancel
    closes the window without adding to the Series to Generate list. Any options you specified are lost.
Reset
    resets all the fields to their initial values upon entry to the window.
Clear
    resets all the fields to their default values.
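
To make the AR Parameters and MA Parameters tables more concrete, the following DATA step sketch generates a series that has one AR term and one MA term (both Factor 1, Lag 1) around a fixed mean. This is not the system's simulation code. The parameter values, the seed, the burn-in length, the start date, and the data set name WORK.SIMSERIES are all assumptions chosen for illustration, and the moving-average sign convention (1 - theta*B) is assumed.

```sas
/* Sketch only: simulates y(t) = mean + z(t), where                */
/*    z(t) = phi1*z(t-1) + e(t) - theta1*e(t-1)                    */
/* All values and names below are hypothetical.                    */
data work.simseries;
   call streaminit(12345);        /* arbitrary seed                */
   mean   = 50;                   /* Series Mean                   */
   phi1   = 0.7;                  /* AR term: Factor 1, Lag 1      */
   theta1 = 0.4;                  /* MA term: Factor 1, Lag 1      */
   zlag = 0;
   elag = 0;
   do t = -49 to 100;             /* t <= 0 is discarded burn-in   */
      e = rand('normal');         /* standard normal innovation    */
      z = phi1*zlag + e - theta1*elag;
      zlag = z;
      elag = e;
      if t > 0 then do;
         y    = mean + z;
         date = intnx('month', '01jan2000'd, t - 1);
         output;
      end;
   end;
   format date monyy7.;
   keep date y;
run;
```

A factored specification would multiply two such polynomials together, so terms that share a Factor value belong to the same polynomial.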
Automatic Model Fitting Window

Use the Automatic Model Fitting window to perform automatic model selection on all series or selected series in an input data set. Invoke this window by using the Fit Models Automatically button on the Time Series Forecasting window. Note that you can also perform automatic model fitting, one series at a time, from the Develop Models window.

Controls and Fields

Project
    is the name of the SAS catalog entry in which the results of the model search process are stored.
Input Data Set
    is the name of the current input data set. You can type in a one-level or two-level data set name here.
Browse button
    opens the Data Set Selection window for selecting an input data set.
Time ID
    is the name of the ID variable for the input data set. You can type in the variable name here or use the Select or Create button.
  Time ID Select button
      opens the Time ID Variable Specification window.
  Time ID Create button
      opens a menu of choices of methods for creating a time ID variable for the input data set. Use this feature if the input data set does not already contain a valid time ID variable.
Interval
    is the time interval between observations (data frequency) in the current input data set. You can type in an interval name or select one by using the combo box pop-up menu.
Series to Process
    indicates the number and names of time series variables to which forecasting model selection will be applied.
  Series to Process Select button
      opens the Series to Process window to let you select the series for which you want to fit models.
Selection Criterion
    shows the goodness-of-fit statistic that will be used to determine the best fitting model for each series.
  Selection Criterion Select button
      opens the Model Selection Criterion window to enable you to select the goodness-of-fit statistic that will be used to determine the best fitting model for each series.
Run button
    begins the automatic model fitting process.
Models Fit button
    opens the Automatic Model Fitting Results window to display the models fit during the current invocation of the Automatic Model Fitting window. The results appear automatically when model fitting is complete, but this button enables you to redisplay the results window.
Close button
    closes the Automatic Model Fitting window.
Menu Bar

File
  Import Data
      is available if you license SAS/ACCESS software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System.
  Export Data
      is available if you license SAS/ACCESS software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database.
  Print Setup
      opens the Print Setup window, which allows you to access your operating system print setup.
  Close
      closes the Automatic Model Fitting window.

View
  Input Data Set
      opens a Viewtable window to browse the current input data set.
  Models Fit
      opens the Automatic Model Fitting Results window to show the forecasting models fit during the current invocation of the Automatic Model Fitting window. This is the same as the Models Fit button.

Tools
  Fit Models
      performs the automatic model selection process for the selected series. This is the same as the Run button.

Options
  Default Time Ranges
      opens the Default Time Ranges window to enable you to control how the system sets the time ranges for series.
  Model Selection List
      opens the Model Selection List Editor window. Use this action to control the forecasting models considered by the automatic model selection process and displayed in the Models to Fit window.
  Model Selection Criterion
      opens the Model Selection Criterion window, which presents a list of goodness-of-fit statistics and enables you to select the fit statistic that is displayed in the table and used by the automatic model selection process to determine the best fitting model. This action is the same as the Selection Criterion Select button.
  Statistics of Fit
      opens the Statistics of Fit Selection window, which presents a list of statistics that the system can display. Use this action to customize the list of statistics shown in the Statistics of Fit table and available for selection in the Model Selection Criterion menu.
  Forecast Options
      opens the Forecast Options window, which enables you to control the widths of forecast confidence limits and control the kind of predicted values computed for models that include series transformations.
  Forecast Data Set
      see Produce Forecasts window.
  Alignment of Dates
    Beginning
        aligns dates that the system generates to identify forecast observations in output data sets to the beginning of the time intervals.
    Middle
        aligns dates that the system generates to identify forecast observations in output data sets to the midpoints of the time intervals.
    End
        aligns dates that the system generates to identify forecast observations in output data sets to the end of the time intervals.
  Automatic Fit
      opens the Automatic Model Selection Options window, which enables you to control the number of models retained by the automatic model selection process and whether the models considered for automatic selection are subset according to the series diagnostics.
  Tool Bar Type
    Image Only
        displays the toolbar items as icons without text.
    Label Only
        displays the toolbar items as text without icon images.
    Both
        displays the toolbar items with both text and icon images.
  Include Interventions
      controls whether intervention effects defined for the current series are automatically added as predictors to the models considered by the automatic selection process. A check mark or filled check box next to this item indicates that the option is turned on.
  Print Audit Trail
      prints to the SAS log information about the models fit by the system. A check mark or filled check box next to this item indicates that the audit option is turned on.
  Show Source Statements
      controls whether SAS statements submitted by the forecasting system are printed in the SAS log. When the Show Source Statements option is selected, the system sets the SAS system option SOURCE before submitting SAS statements; otherwise, the system uses the NOSOURCE option. Note that only some of the functions performed by the forecasting system are accomplished by submitting SAS statements. A check mark or filled check box next to this item indicates that the option is turned on.
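
The SOURCE and NOSOURCE system options named above can also be set and inspected from your own programs. The following short sketch shows this with the OPTIONS statement and PROC OPTIONS. It illustrates the system option only and makes no claim about when or how the forecasting system itself changes the setting.

```sas
/* Echo subsequently submitted statements in the SAS log. */
options source;

/* Write the current value of the SOURCE option to the log. */
proc options option=source;
run;

/* Stop echoing submitted statements. */
options nosource;
```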
Automatic Model Fitting Results Window

This resizable window displays the models fit by the most recent invocation of the Automatic Model Fitting window. It appears automatically after Automatic Model Fitting runs, and can be opened repeatedly from that window by using the Models Fit button or by selecting Models Fit from the View menu. Once you exit the Automatic Model Fitting window, the Automatic Model Fitting Results window cannot be opened again until you fit additional models by using Automatic Model Fitting.

Table Contents

The results table displays the series name in the first column and the model label in the second column. If you have chosen to retain more than one model by using the Automatic Model Selection Options window, more than one row appears in the table for each series; that is, there is a row for each model fit. If you have already fit models to the same series before invoking the Automatic Model Fitting window, those models do not appear here, since the Automatic Model Fitting Results window is intended to show the results of the current operation of Automatic Model Fitting. To see all models that have been fit, use the Manage Forecasting Project window.

The third column of the table shows the values of the current model selection criterion statistic. Additional columns show the values of other fit statistics. The set of statistics shown is selectable by using the Statistics of Fit Selection window.

The table can be sorted by any column other than Series Name by clicking on the column heading.
Controls and Fields

Graph
    opens the Model Viewer window on the model currently selected in the table.
Stats
    opens the Statistics of Fit Selection window. This controls the set of goodness-of-fit statistics displayed in the table and in other parts of the Time Series Forecasting System.
Compare
    opens the Model Fit Comparison window for the series currently selected in the table. This button is unavailable if the currently selected row in the table represents a series for which fewer than two models have been fit.
Save
    opens an output data set dialog, enabling you to specify a SAS data set to which the contents of the table are saved. Note that this operation saves what you see in the table. If you want to save the models themselves for use in a future session, use the Manage Forecasting Project window.
Print
    prints the contents of the table.
Close
    closes the window and returns to the Automatic Model Fitting window.
Menu Bar

File
  Save
      opens an output data set dialog, enabling you to specify a SAS data set to which the contents of the table are saved. This is the same as the Save button.
  Print
      prints the contents of the table. This is the same as the Print button.
  Import Data
      is available if you license SAS/ACCESS software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System.
  Export Data
      is available if you license SAS/ACCESS software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database.
  Print Setup
      opens the Print Setup window, which allows you to access your operating system print setup.
  Close
      closes the window and returns to the Automatic Model Fitting window.

View
  Model Predictions
      opens the Model Viewer to display a predicted and actual plot for the currently highlighted model.
  Prediction Errors
      opens the Model Viewer to display the prediction errors for the currently highlighted model.
  Prediction Error Autocorrelations
      opens the Model Viewer to display the prediction error autocorrelations, partial autocorrelations, and inverse autocorrelations for the currently highlighted model.
  Prediction Error Tests
      opens the Model Viewer to display graphs of white noise and stationarity tests on the prediction errors of the currently highlighted model.
  Parameter Estimates
      opens the Model Viewer to display the parameter estimates table for the currently highlighted model.
  Statistics of Fit
      opens the Model Viewer window to display goodness-of-fit statistics for the currently highlighted model.
  Forecast Graph
      opens the Model Viewer to graph the forecasts for the currently highlighted model.
  Forecast Table
      opens the Model Viewer to display forecasts for the currently highlighted model in a table.

Tools
  Compare Models
      opens the Model Fit Comparison window to display fit statistics for selected pairs of forecasting models. This item is unavailable until you select a series in the table for which the automatic model fitting run selected two or more models.

Options
  Statistics of Fit
      opens the Statistics of Fit Selection window. This is the same as the Stats button.
  Column Labels
      selects long or short column labels for the table. Long column labels are used by default.
  ID Columns
      freezes or unfreezes the series and model columns. By default they are frozen so that they remain visible when you scroll the table horizontally to view other columns.
Automatic Model Selection Options Window

Use the Automatic Model Selection Options window to control the automatic selection process. This window is available from the Automatic Fit item of the Options menu in the Develop Models window, Automatic Model Fitting window, and Produce Forecasts window.

Controls and Fields

Models to fit
  Subset by series diagnostics
      when selected, causes the automatic model selection process to search only over those models consistent with the series diagnostics.
  All models in selection list
      when selected, causes the automatic model selection process to search over all models in the selection list, without regard for the series diagnostics.
Models to keep
    specifies how many of the models tried by the automatic model selection process are retained and added to the model list for the series. You can specify the best fitting model only, the best n models, where n can be 1 through 9, or all models tried.
OK
    closes the window and saves the automatic model selection options you specified.
Cancel
    closes the window without changing the automatic model selection options.
Custom Model Specification Window

Use the Custom Model Specification window to specify and fit an ARIMA model with or without predictor effects as inputs. Access it from the Develop Models window, where it is invoked from the Fit Model item under the Edit menu, or from the pop-up menu when you click an empty area of the model table.

Controls and Fields

Series
    is the name and variable label of the current series.
Model
    is a descriptive label for the model that you specify. You can type a label in this field or allow the system to provide a label. If you leave the label blank, a label is generated automatically based on the options you specify.
Transformation
    defines the series transformation for the model. When a transformation is specified, the model is fit to the transformed series, and forecasts are produced by applying the inverse transformation to the resulting forecasts. The following transformations are available:
  Log
      specifies a logarithmic transformation.
  Logistic
      specifies a logistic transformation.
  Square Root
      specifies a square root transformation.
  Box-Cox
      specifies a Box-Cox transform and opens a window to specify the Box-Cox parameter.
  None
      specifies no series transformation.
Trend Model
    controls the model options to model and forecast the series trend. Select from the following:
  Linear Trend
      adds a Linear Trend item to the Predictors list.
  Trend Curve
      opens a menu of different time trend curves and adds the curve you select to the Predictors list.
  First Difference
      specifies differencing the series.
  Second Difference
      specifies second-order differencing of the series.
  None
      specifies no model for the series trend.
Seasonal Model
    controls the model options to model and forecast the series seasonality. Select from the following:
  Seasonal ARIMA
      opens the Seasonal ARIMA Model Options window to enable you to specify an ARIMA model for the seasonal pattern in the series.
  Seasonal Difference
      specifies differencing the series at the seasonal lag.
  Seasonal Dummy Regressors
      adds a Seasonal Dummies predictor item to the Predictors list.
  None
      specifies no seasonal model.
Error Model
    displays the current settings of the autoregressive and moving-average terms, if any, for modeling the prediction error autocorrelation pattern in the series.
Set button
    opens the Error Model Options window to enable you to set the autoregressive and moving-average terms for modeling the prediction error autocorrelation pattern in the series.
Intercept
    specifies whether a mean or intercept parameter is included in the model. By default, the Intercept option is set to No when the model includes differencing and set to Yes when there is no differencing.
Predictors
    is a list of the predictor effects included as inputs in the model.
OK
    closes the Custom Model Specification window and fits the model.
Cancel
    closes the Custom Model Specification window without fitting the model. Any options you specified are lost.
Reset
    resets all options to their initial values upon entry to the Custom Model Specification window. This might be useful when editing an existing model specification; otherwise, Reset has the same function as Clear.
Clear
    resets all options to their default values.
Add
    opens a menu of types of predictors to add to the Predictors list. Select from the following:
  Linear Trend
      adds a Linear Trend item to the Predictors list.
  Trend Curve
      opens a menu of different time trend curves and adds the curve you select to the Predictors list.
  Regressors
      opens the Regressors Selection window to enable you to select other series in the input data set as regressors to predict the dependent series and add them to the Predictors list.
  Adjustments
      opens the Adjustments Selection window to enable you to select other series in the input data set for use as adjustments to the forecasts and add them to the Predictors list.
  Dynamic Regressor
      opens the Dynamic Regressors Selection window to enable you to select a series in the input data set as a predictor of the dependent series and also specify a transfer function model for the effect of the predictor series.
  Interventions
      opens the Interventions for Series window to enable you to define and select intervention effects and add them to the Predictors list.
  Seasonal Dummies
      adds a Seasonal Dummies predictor item to the Predictors list. This item is unavailable if the series interval does not have a seasonal cycle.
Delete
    deletes the selected (highlighted) entry from the Predictors list.
Edit
    edits the selected (highlighted) entry in the Predictors list.
Mouse Button Actions

You can select or deselect entries in the Predictors list by clicking them. The selected (highlighted) predictor effect is acted on by the Delete and Edit buttons. Double-clicking on a predictor in the list invokes an appropriate edit action for that predictor.

If you right-click an entry in the Predictors list, the system displays a menu of actions that encompass the features of the Add, Delete, and Edit buttons.
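
The distinction between a Regressors entry and a Dynamic Regressor entry in the Predictors list can be pictured with PROC ARIMA, where an input series enters either directly or through a transfer function. The sketch below is a hedged illustration, not the code the system generates: the data set WORK.SALES, its variables, the coefficients, and the seed are all made up for the example.

```sas
/* Sketch only: WORK.SALES and its variables are hypothetical,     */
/* generated here so that the example is self-contained.           */
data work.sales;
   call streaminit(1);                      /* arbitrary seed      */
   do t = 1 to 60;
      date  = intnx('month', '01jan2015'd, t - 1);
      price = 10 + rand('normal');
      sales = 200 - 5*price + 3*rand('normal');
      output;
   end;
   format date monyy7.;
   drop t;
run;

proc arima data=work.sales;
   /* make PRICE available as an input series */
   identify var=sales crosscorr=(price) noprint;

   /* ordinary regressor: PRICE enters with a single coefficient */
   estimate p=1 input=(price);

   /* dynamic regressor: transfer function with a first-order    */
   /* denominator polynomial applied to PRICE                    */
   estimate p=1 input=( /(1) price );
run;
quit;
```

In both ESTIMATE statements, P=1 plays the role of the error model, which corresponds to the autoregressive setting shown in the Error Model field.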
Data Set Selection Window

Use this resizable window to select a data set to process by specifying a library and a SAS data set or view. These selections can be made by typing, by selecting from lists, or by a combination of the two. In addition, you can control the time ID variable and time interval, and you can browse the data set.
Access this window by using the Browse button to the right of the Data Set field in the Time Series Forecasting, Automatic Model Fitting, and Produce Forecasts windows. It functions in the same way as the Series Selection window, except that it does not allow you to select or view a time series variable.
Controls and Fields

Library
    is a SAS libname assigned within the current SAS session. If you know the libname associated with the data set of interest, you can type it in this field. If it is a valid choice, it will appear in the libraries list and will be highlighted. The SAS Data Sets list will be populated with data sets associated with that libname. See also Libraries under Selection Lists.
Data Set
    is the name of a SAS data set (data file or data view) that resides under the selected libname. If you know the name, you can type it in and press Return. If it is a valid choice, it will appear in the SAS Data Sets list and will be highlighted.
Time ID
    is the name of the ID variable for the selected input data set. To specify the ID variable, you can type the ID variable name in this field or select the control arrows to the right of the field.
  Time ID Select button
      opens the Time ID Variable Specification window.
  Time ID Create button
      opens a menu of methods for creating a time ID variable for the input data set. Use this feature if the data set does not already contain a valid time ID variable.
Interval
    is the time interval between observations (data frequency) in the selected data set. If the interval is not automatically identified by the system, you can type in the interval name or select it from a list by clicking the combo box arrow. For more information about intervals, see Chapter 4, “Date Intervals, Formats, and Functions,” in this book.
OK
    closes the Data Set Selection window and makes the selected data set the current input data set.
Cancel
    closes the window without applying any selections made.
Table
    opens a Viewtable window for browsing the selected data set.
Reset
    resets the fields to their initial values upon entry to the window.
Refresh
    updates all fields and lists in the window. If you assign a new libname without exiting the Data Set Selection window, use the refresh action to update the Libraries list so that it will include the newly assigned libname.
Selection Lists
Libraries
displays a list of currently assigned libnames. You can select a libname by clicking it with the left mouse button, which is equivalent to typing its name in the Library field. If you cannot locate the library or directory you are interested in, go to the SAS Explorer window, select “New” from the File menu, then select “Library” and “OK.” This opens the New Library window. You can also assign a libname by submitting a LIBNAME statement from the Editor window. Select the Refresh button to make the new libname available in the libraries list.
SAS Data Sets
displays a list of the SAS data sets (data files or data views) contained in the selected library. You can select one of these by clicking with the left mouse button, which is equivalent to typing its name in the Data Set field. You can double-click a data set name to select it and exit the window.
Default Time Ranges Window
Use the Default Time Ranges window to control how the period of fit and evaluation and the forecasting horizon are determined for each series when you do not explicitly set these ranges for a particular series. Invoke this window from the Options menu of the Develop Models, Automatic Model Fitting, Produce Forecasts, and Manage Forecasting Project windows. The settings you make in this window affect subsequently selected series; they do not alter the time ranges of series you have already selected.
Controls and Fields
Forecast Horizon
specifies the forecast horizon as either a number of periods or years from the last nonmissing data value or as a fixed date. You can type a number or date value in this field. Date values must be entered in a form recognized by a SAS date informat. (See SAS Language Reference: Concepts for information about SAS date informats.)
Forecast Horizon Units
indicates whether the value in the forecast horizon field represents periods, years, or a date. Click the arrow and select one from the pop-up list.
Hold-out Sample Size
specifies that a number of observations, number of years, or percent of the data at the end of the data range be used for the period of evaluation, with the remainder of the data used as the period of fit.
Hold-out Sample Size Units
indicates whether the hold-out sample size represents periods, years, or percent of the data range.
Period of Fit
specifies how much of the data range for a series is to be used as the period of fit for models fit to the series. ALL indicates that all the available data is used. You can specify a number of periods, number of years, or a fixed date, depending on the value of the units field to the right. When you specify a date, the start of the period of fit is the specified date or the first nonmissing series value, whichever is more recent. Date values must be entered in a form recognized by a SAS date informat. (See SAS Language Reference: Concepts for information about SAS date informats.) When you specify the number of periods or years, the start of the period of fit is computed as the date that number of periods or years from the end of the data.
Period of Fit Units
indicates whether the period-of-fit value represents periods, years, or a date.
OK
closes the window and stores the specified changes.
Cancel
closes the window without saving changes. Any options you specified are lost.
Defaults
resets all options to their default values.
Reset
resets the options to their initial values upon entry to the window.
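The way a hold-out sample size divides a series into a period of fit and a period of evaluation can be sketched as follows. This is a minimal illustration and not the system's implementation: the function name `split_ranges`, the `periods_per_year` argument, and the truncation used for percentages are all assumptions made for the example.

```python
def split_ranges(n_obs, holdout, units="periods", periods_per_year=12):
    """Split n_obs observations into (fit, evaluation) index ranges.

    The last `holdout` periods, years, or percent of the data form the
    evaluation range; everything before it is the fit range.
    Rounding of percentages (truncation here) is an assumption.
    """
    if units == "periods":
        k = int(holdout)
    elif units == "years":
        k = int(holdout) * periods_per_year
    elif units == "percent":
        k = int(n_obs * holdout / 100)
    else:
        raise ValueError("units must be 'periods', 'years', or 'percent'")
    k = max(0, min(k, n_obs))  # never hold out more than the data available
    return range(0, n_obs - k), range(n_obs - k, n_obs)


# 120 monthly observations with a 12-period hold-out sample
fit, evaluation = split_ranges(120, 12)
print(len(fit), len(evaluation))
```

The sketch returns half-open index ranges so that the two ranges never overlap. How the system treats a hold-out size of zero is not covered here.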
Develop Models Window
This resizable window provides access to all of the Forecasting System’s interactive model fitting and graphical tools. Use it to fit forecasting models to an individual time series and choose the best model to use to produce the final forecasts of the series. Invoke this window by using the Develop Models button on the Time Series Forecasting window.
Controls and Fields
Data Set
is the name of the current input data set.
Interval
is the time interval (data frequency) for the input data set.
Series
is the variable name and label of the current time series.
Browse button
opens the Series Selection window to enable you to change the current input data set or series.
Data Range
is the date of the first and last nonmissing data values available for the current series in the input data set.
Fit Range
is the current period of fit setting. This is the range of data that will be used to fit models to the series.
Evaluation Range
is the current period of evaluation setting. This is the range of data that will be used to calculate the goodness-of-fit statistics for models fit to the series.
Set Ranges button
opens the Time Ranges Specification window to enable you to change the fit range or evaluation range. Note: A new fit range is applied when new models are fit or when existing models are refit. A new evaluation range is applied when new models are fit or when existing models are refit or reevaluated. Changing the ranges does not automatically refit or reevaluate any models in the table: Use the Refit Models or Reevaluate Models items under the Edit menu.
View Series Graphically icon
opens the Time Series Viewer window to display plots of the current series.
View Selected Model Graphically icon
opens the Model Viewer to display graphs and tables for the currently highlighted model.
Forecast Model
is the column of the model table that contains check boxes to select which model is used to produce the final forecasts for the current series.
Model Title
is the column of the model table that contains the descriptive labels of the forecasting models fit to the current series.
Root Mean Square Error (or other statistic name) button
is the button above the right side of the table. It displays the name of the current model selection criterion: a statistic that measures how well each model in the table fits the values of the current series for observations within the evaluation range. Clicking this button opens the Model Selection Criterion window to let you select a different statistic. When you select a statistic, the model table in the Develop Models window is updated to show current values of that statistic.
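To make the role of the model selection criterion concrete, the following sketch computes a root mean square error over an evaluation range for several candidate models and orders them from best to worst. It uses the textbook definition of RMSE; the function names and the toy data are hypothetical, and the system's own statistic may differ in details such as the divisor.

```python
import math


def rmse(actual, predicted, eval_range):
    """Root mean square error of predictions within the evaluation range."""
    errors = [actual[t] - predicted[t] for t in eval_range]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rank_models(actual, predictions, eval_range):
    """Return (model, rmse) pairs sorted so the best-fitting model is first."""
    scores = {name: rmse(actual, pred, eval_range)
              for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical series and two hypothetical sets of model predictions
actual = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model A": [1.0, 2.0, 3.0, 6.0],
    "model B": [0.0, 0.0, 3.0, 5.0],
}
# Only the last two observations form the evaluation range
for name, score in rank_models(actual, predictions, range(2, 4)):
    print(name, round(score, 3))
```

Note that model B ranks first here even though it fits the early observations poorly, because only observations inside the evaluation range enter the statistic.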
Menu Bar
File
New Project
opens a dialog that lets you create a new project, assign it a name and description, and make it the active project.
Open Project
opens a dialog that lets you select and load a previously saved project.
Save Project
saves the current state of the system (including all the models fit to a series) to the current project catalog entry.
Save Project as
saves the current state of the system with a prompt for the name of the catalog entry in which to store the information.
Clear Project
clears the system, deleting all the models for all series.
Save Forecast
writes forecasts from the currently highlighted model to an output data set.
Save Forecast As
prompts for an output data set name and saves the forecasts from the currently highlighted model.
Output Forecast Data Set
opens a dialog for specifying the default data set used when you select “Save Forecast.”
Import Data
is available if you license SAS/ACCESS software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System.
Export Data
is available if you license SAS/ACCESS software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database.
Print Setup
opens the Print Setup window, which enables you to access your operating system print setup.
Close
closes the Develop Models window and returns to the main window.
Edit
Fit Model
Automatic Fit
invokes the automatic model selection process.
Select From List
opens the Models to Fit window.
Smoothing Model
opens the Smoothing Model Specification window.
ARIMA Model
opens the ARIMA Model Specification window.
Custom Model
opens the Custom Model Specification window.
Combine Forecasts
opens the Forecast Combination Model Specification window.
External Forecasts
opens the External Forecast Model Specification window.
Edit Model
enables you to modify the specification of the currently highlighted model in the table and fit the modified model. The new model replaces the current model in the table.
Delete Model
deletes the currently highlighted model from the model table.
Refit Models
All Models
refits all models in the table by using data within the current fit range.
Selected Model
refits the currently highlighted model by using data within the current fit range.
Reevaluate Models
All Models
recomputes statistics of fit for all models in the table by using data within the current evaluation range.
Selected Model
recomputes statistics of fit for the currently highlighted model by using data within the current evaluation range.
View
Project
opens the Manage Forecasting Project window.
Data Set
opens a Viewtable window to display the current input data set.
Series
opens the Time Series Viewer window to display plots of the current series. This is the same as the View Series Graphically icon.
Model Predictions
opens the Model Viewer to display a predicted versus actual plot for the currently highlighted model. This is the same as the View Selected Model Graphically icon.
Prediction Errors
opens the Model Viewer to display the prediction errors for the currently highlighted model.
Prediction Error Autocorrelations
opens the Model Viewer to display the prediction error autocorrelations, partial autocorrelations, and inverse autocorrelations for the currently highlighted model.
Prediction Error Tests
opens the Model Viewer to display graphs of white noise and stationarity tests on the prediction errors of the currently highlighted model.
Parameter Estimates
opens the Model Viewer to display the parameter estimates table for the currently highlighted model.
Statistics of Fit
opens the Model Viewer window to display goodness-of-fit statistics for the currently highlighted model.
Forecast Graph
opens the Model Viewer to graph the forecasts for the currently highlighted model.
Forecast Table
opens the Model Viewer to display forecasts for the currently highlighted model in a table.
Tools
Diagnose Series
opens the Series Diagnostics window to determine the kinds of forecasting models appropriate for the current series.
Define Interventions
opens the Interventions for Series window to enable you to edit or add intervention effects for use in modeling the current series.
Sort Models
sorts the models in the table by the values of the currently displayed fit statistic.
Compare Models
opens the Model Fit Comparison window to display fit statistics for selected pairs of forecasting models. This is unavailable if there are fewer than two models in the table.
Generate Data
opens the Time Series Simulation window. This window enables you to simulate ARIMA time series processes and is useful for educational exercises or testing the system.
Options
Time Ranges
opens the Time Ranges Specification window to enable you to change the fit and evaluation time ranges and the forecast horizon. This action is the same as the Set Ranges button.
Default Time Ranges
opens the Default Time Ranges window to enable you to control how the system sets the time ranges for series when you do not explicitly set time ranges with the Time Ranges Specification window. Settings made by using this window do not affect series you are already working with; they take effect when you select a new series.
Model Selection List
opens the Model Selection List editor window. Use this action to edit the set of forecasting models considered by the automatic model selection process and displayed by the Models to Fit window.
Model Selection Criterion
opens the Model Selection Criterion window, which presents a list of goodness-of-fit statistics and enables you to select the fit statistic that is displayed in the table and used by the automatic model selection process to determine the best fitting model. This action is the same as clicking the button above the table that displays the name of the current model selection criterion.
Statistics of Fit
opens the Statistics of Fit Selection window, which presents a list of statistics that the system can display. Use this action to customize the list of statistics shown in the Model Viewer, Automatic Model Fitting Results, and Model Fit Comparison windows and available for selection in the Model Selection Criterion menu.
Forecast Options
opens the Forecast Options window, which enables you to control the widths of forecast confidence limits and control the kind of predicted values computed for models that include series transformations.
Alignment of Dates
Beginning
aligns dates that the system generates to identify forecast observations in output data sets to the beginning of the time intervals.
Middle
aligns dates that the system generates to identify forecast observations in output data sets to the midpoints of the time intervals.
End
aligns dates that the system generates to identify forecast observations in output data sets to the end of the time intervals.
Automatic Fit
opens the Automatic Model Selection Options window, which enables you to control the number of models retained by the automatic model selection process and whether the models considered for automatic selection are subset according to the series diagnostics.
Include Interventions
controls whether intervention effects defined for the current series are automatically added as predictors to the models considered by the automatic selection process and displayed by the Models to Fit window. When the Include Interventions option is selected, the series interventions are also automatically added to the predictors list when you specify a model in the ARIMA and Custom Models Specification windows. A check mark or filled check box next to this item indicates that the option is turned on.
Print Audit Trail
prints to the SAS log information about the models fit by the system. A check mark or filled check box next to this item indicates that the audit option is turned on.
Show Source Statements
controls whether SAS statements submitted by the forecasting system are printed in the SAS log. When the Show Source Statements option is selected, the system sets the SAS system option SOURCE before submitting SAS statements; otherwise, the system uses the NOSOURCE option. Note that only some of the functions performed by the forecasting system are accomplished by submitting SAS statements. A check mark or filled check box next to this item indicates that the option is turned on.
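The Beginning, Middle, and End choices under Alignment of Dates can be illustrated for a monthly interval with the short sketch below. The function `align_month` is hypothetical, and the midpoint rule it uses (halfway between the first and last day, rounded down) is an assumption for illustration; the system's own midpoint convention may differ.

```python
from datetime import date, timedelta


def align_month(year, month, alignment):
    """Return the date identifying a monthly interval under an alignment."""
    start = date(year, month, 1)
    # First day of the following month, rolling over the year after December
    next_start = date(year + (1 if month == 12 else 0), month % 12 + 1, 1)
    end = next_start - timedelta(days=1)
    if alignment == "beginning":
        return start
    if alignment == "end":
        return end
    if alignment == "middle":
        # Assumed midpoint rule: half the whole days between start and end
        return start + timedelta(days=(end - start).days // 2)
    raise ValueError("alignment must be 'beginning', 'middle', or 'end'")


for choice in ("beginning", "middle", "end"):
    print(choice, align_month(2024, 1, choice))
```

The alignment changes only the date value written to identify each forecast observation; the interval that the observation describes stays the same.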
Left Mouse Button Actions for the Model Table
When the cursor is over the description of a model in the table, the left mouse button selects (highlights) or deselects that model. On some computer systems, you can double-click to open the Model Viewer window for the selected model. When the cursor is over an empty part of the model table, the left mouse button opens a menu of model fitting choices. These choices are the same as those in the Fit Model submenu of the Edit menu.
Right Mouse Button Actions for the Model Table
When a model in the table is selected, the right mouse button opens a menu of actions that apply to the highlighted model. The actions available in this menu are as follows.
View Model
opens the Model Viewer for the selected model. This action is the same as the View Selected Model Graphically icon.
View Parameter Estimates
opens the Model Viewer to display the parameter estimates table for the currently highlighted model. This is the same as the Parameter Estimates item in the View menu.
View Statistics of Fit
opens the Model Viewer to display a table of goodness-of-fit statistics for the currently highlighted model. This is the same as the Statistics of Fit item in the View menu.
Edit Model
enables you to modify the specification of the currently highlighted model in the table and fit the modified model. This is the same as the Edit Model item in the Edit menu.
Refit Model
refits the highlighted model by using data within the current fit range. This is the same as the Selected Model item under the Refit Models submenu of the Edit menu.
Reevaluate Model
reevaluates the highlighted model by using data within the current evaluation range. This is the same as the Selected Model item under the Reevaluate Models submenu of the Edit menu.
Delete Model
deletes the currently highlighted model from the model table. This is the same as the Delete Model item under the Edit menu.
View Forecasts
opens the Model Viewer to display the forecasts for the currently highlighted model. This is the same as the Forecast Graph item under the View menu.
When the model list is empty or when no model is selected, the right mouse button opens the same menu of model fitting actions as the left mouse button.
Differencing Specification Window
Use the Differencing Specification window to specify the list of differencing lags d=(lag, ..., lag) in a factored ARIMA model. To specify a first difference, add the value 1 (d=(1)). To specify a second difference (difference twice at lag 1), add the value 1 again (d=(1,1)). For first differencing at lags 1 and 12, use the values 1 and 12 (d=(1,12)).
Controls and Fields
Lag
specifies a lag value to add to the list. Type in a positive integer or select one by clicking the spin box arrows. Duplicates are allowed.
Add
adds the value in the Lag spin box to the list of differencing lags.
Remove
deletes a selected lag from the list of differencing lags.
OK
closes the window and returns the specified list to the Factored ARIMA Model Specification window.
Cancel
closes the window and discards any lags added to the list.
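The effect of a list of differencing lags such as d=(1), d=(1,1), or d=(1,12) can be sketched by applying one difference per entry in the list, in order. The function name `difference` and the sample data are hypothetical; this is an illustration of the notation rather than the system's code.

```python
def difference(series, lags):
    """Apply one difference at each lag in `lags`, in order.

    Each pass replaces x[t] with x[t] - x[t - lag], so the result is
    shorter than the input by the sum of the lags.
    """
    out = list(series)
    for lag in lags:
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


squares = [0, 1, 4, 9, 16, 25]
print(difference(squares, [1]))      # first difference, d=(1)
print(difference(squares, [1, 1]))   # second difference, d=(1,1)
```

Because duplicates are allowed in the list, d=(1,1) is simply two passes at lag 1, and d=(1,12) is one pass at lag 1 followed by one pass at lag 12.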
Dynamic Regression Specification Window
Use the Dynamic Regression Specification window to specify a dynamic regression or transfer function model for the effect of the predictor variable. It is invoked from the Dynamic Regressors Selection window.
Controls and Fields
Series
is the name and variable label of the current series.
Input Model
is a descriptive label for the dynamic regression model. You can type a label in this field or allow the system to provide the label. If you leave the label blank, a label is generated automatically based on the options you specify. When no options are specified, the label is the name and variable label of the predictor variable.
Input Transformation
displays the transformation specified for the predictor variable. When a transformation is specified, the transfer function model is fit to the transformed input variable.
Lagging periods
is the pure delay in the effect of the predictor, l.
Simple Order of Differencing
is the order of differencing, d. Set this field to 1 to use the changes in the predictor variable.
Seasonal Order of Differencing
is the order of seasonal differencing, D. Set this field to 1 to difference the predictor variable at the seasonal lags (for example, to use the year-over-year or week-over-week changes in the predictor variable).
Simple Order Numerator Factors
is the order of the numerator factor of the transfer function, p.
Seasonal Order Numerator Factors
is the order of the seasonal numerator factor of the transfer function, P.
Simple Order Denominator Factors
is the order of the denominator factor of the transfer function, q.
Seasonal Order Denominator Factors
is the order of the seasonal denominator factor of the transfer function, Q.
OK
closes the window and adds the dynamic regression model specified to the model predictors list.
Cancel
closes the window without adding the dynamic regression model. Any options you specified are lost.
Reset
resets all options to their initial values upon entry to the window. This might be useful when editing a predictor specification; otherwise, Reset has the same function as Clear.
Clear
resets all options to their default values.
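The lagging periods (l), simple differencing (d), and seasonal differencing (D) settings described above can be pictured as preprocessing steps applied to the predictor before any numerator or denominator factors act on it. The sketch below shows only those three steps. The function names, the `season` argument, and the use of None for positions that cannot be computed are assumptions for illustration, and the numerator and denominator factors are deliberately left out.

```python
def difference(values, lag):
    """Difference at `lag`, keeping alignment; undefined positions are None."""
    out = []
    for t, value in enumerate(values):
        previous = values[t - lag] if t >= lag else None
        out.append(None if value is None or previous is None
                   else value - previous)
    return out


def delay(values, periods):
    """Shift values later in time by `periods`, padding the start with None."""
    periods = min(periods, len(values))
    return [None] * periods + values[:len(values) - periods]


def prepare_predictor(x, lag_periods=0, d=0, D=0, season=12):
    """Apply d simple differences, D seasonal differences, then the delay."""
    out = list(x)
    for _ in range(d):
        out = difference(out, 1)
    for _ in range(D):
        out = difference(out, season)
    return delay(out, lag_periods)


# Changes in the predictor (d=1) that take effect two periods later (l=2)
print(prepare_predictor([10, 12, 15, 19, 24], lag_periods=2, d=1))
```

Keeping the output the same length as the input makes it easy to see which observation of the dependent series each transformed predictor value lines up with.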
Dynamic Regressors Selection Window
Use the Dynamic Regressors Selection window to select an input variable as a dynamic regressor. Access this window from the pop-up menu that appears when you select the Add button of the ARIMA Model Specification window or Custom Model Specification window.
Controls and Fields
Dependent
is the name and variable label of the current series.
Dynamic Regressors
is a table listing the variables in the input data set. Select one variable in this list as the predictor series.
OK
opens the Dynamic Regression Specification window for you to specify the form of the dynamic regression for the selected predictor series, and then closes the Dynamic Regressors Selection window and adds the specified dynamic regression to the model predictors list.
Cancel
closes the window without adding the dynamic regression model. Any options you specified are lost.
Reset
resets all options to their initial values upon entry to the window.
Error Model Options Window
Use the Error Model Options window to specify the autoregressive and moving-average orders for the residual autocorrelation part of a model defined by using the Custom Model Specification window. Access it by using the Set button of that window.
Controls and Fields
ARIMA Options
Use these combo boxes to specify the orders of the ARIMA model. You can either type in a value or click the combo box arrow to select from a pop-up list.
Autoregressive
defines the order of the autoregressive part of the model.
Moving Average
defines the order of the moving-average term.
OK
closes the Error Model Options window and returns to the Custom Model Specification window.
Cancel
closes the Error Model Options window and returns to the Custom Model Specification window, discarding any changes made.
Reset
resets all options to their initial values upon entry to the window.
External Forecast Model Specification Window
Use the External Forecast Model Specification window to add forecasts produced outside the Time Series Forecasting System to the current project. To add an external forecast, select a variable from the selection list and choose the OK button. The name of the selected variable will be added to the list of models fit, and the values of this variable will be used as the forecast. For more information, see “Incorporating Forecasts from Other Sources” in the “Specifying Forecasting Models” chapter.
Controls and Fields
OK
closes the window and adds the external forecast to the project.
Cancel
closes the window without adding an external forecast to the project.
Reset
deselects any selection made in the selection list.
Factored ARIMA Model Specification Window
Use the Factored ARIMA Model Specification window to specify an ARIMA model by using the notation:
p = (lag, ..., lag) ... (lag, ..., lag)
d = (lag, ..., lag)
q = (lag, ..., lag) ... (lag, ..., lag)
where p, d, and q represent autoregressive, differencing, and moving-average terms, respectively. Access it from the Develop Models window, where it is invoked from the Fit Model item under Edit in the menu bar, or from the pop-up menu when you click an empty area of the model table.
The Factored ARIMA Model Specification window is identical to the ARIMA Model Specification window, except that the p, d, and q terms are specified in a more general way. Only those controls and fields that differ from the ARIMA Model Specification window are described here.
Controls and Fields
Model
is a descriptive label for the model. You can type a label in this field or allow the system to provide a label. If you leave the label blank, a label is generated automatically based on the p, d, and q terms that you specify. For example, if you specify p=(1,2,3), d=(1), q=(12) and no intercept, the model label is ARIMA p=(1,2,3) d=(1) q=(12) NOINT. For monthly data, this is equivalent to the model ARIMA(3,1,0)(0,0,1)s NOINT as specified in the ARIMA Model Specification window or the Custom Model Specification window.
ARIMA Options
specifies the ARIMA model in terms of the autoregressive lags (p), differencing lags (d), and moving-average lags (q).
Autoregressive
defines the autoregressive part of the model. Select the Set button to open the AR Polynomial Specification window, where you can add any set of autoregressive lags grouped into any number of factors.
Differencing
specifies differencing to be applied to the input data. Select the Set button to open the Differencing Specification window, where you can specify any set of differencing lags.
Moving Average
defines the moving-average part of the model. Select the Set button to open the MA Polynomial Specification window, where you can add any set of moving-average lags grouped into any number of factors.
Estimation Method
specifies the method used to estimate the model parameters. The Conditional Least Squares and Unconditional Least Squares methods generally require fewer computing resources and are more likely to succeed in fitting complex models. The Maximum Likelihood method requires more resources but provides a better fit in some cases. See also Estimation Details in Chapter 7, “The ARIMA Procedure.”
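Grouping lags into factors, as in p = (lag, ..., lag) ... (lag, ..., lag), means that the autoregressive or moving-average polynomial is the product of one polynomial per factor. The sketch below multiplies two such factors to show which lags end up in the combined polynomial. It assumes each factor has the form 1 - c1 B^lag1 - c2 B^lag2 - ... in the backshift operator B; the function names and coefficient values are hypothetical.

```python
def factor_polynomial(lags, coefficients):
    """Build one factor 1 - c1*B^lag1 - c2*B^lag2 - ... as {power: coef}."""
    poly = {0: 1.0}
    for lag, coef in zip(lags, coefficients):
        poly[lag] = -coef
    return poly


def multiply(p, q):
    """Multiply two polynomials in B stored as {power: coefficient}."""
    product = {}
    for i, a in p.items():
        for j, b in q.items():
            product[i + j] = product.get(i + j, 0.0) + a * b
    return product


# p = (1)(12): a lag-1 factor times a lag-12 factor (coefficients made up)
combined = multiply(factor_polynomial([1], [0.5]),
                    factor_polynomial([12], [0.4]))
print(sorted(combined))  # powers of B with a coefficient: 0, 1, 12, 13
```

The product contains a lag-13 cross term even though only lags 1 and 12 were specified, which is the practical difference between p=(1)(12) and the single-factor specification p=(1,12).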
Forecast Combination Model Specification Window
Use the Forecast Combination Model Specification window to produce forecasts by averaging the forecasts of two or more forecasting models. The specified combination of models is added to the model list for the series. Access this window from the Develop Models window whenever two or more models have been fit to the current series. It is invoked by selecting Combine Forecasts from the Fit Model submenu of the Edit menu, or from the pop-up menu that appears when you click an empty part of the model table.
Controls and Fields
Series
is the name and variable label of the current series.
Model
is a descriptive label for the model that you specify. You can type a label in this field or allow the system to provide a label. If you leave the label blank, a label is generated automatically based on the options you specify.
Weight
is a column of the forecasting model table that contains the weight values for each model. The forecasts for the combined model are computed as a weighted average of the predictions from the models in the table, using these weights. Models with missing weight values are not included in the forecast combination. You can type weight values in these fields or you can use other features of the window to set the weights.
Model Description
is a column of the forecasting model table that contains the descriptive labels of the forecasting models fit to the current series that are available for combination.
Root Mean Square Error (or other statistic name) button
is the button above the right side of the table. It displays the name of the current model selection criterion: a statistic that measures how well each model in the table fits the values of the current series for observations within the evaluation range. Clicking this button opens the Model Selection Criterion window to enable you to select a different statistic.
Normalize Weights button
replaces each nonmissing value in the Weight column with the current value divided by the sum of the weights. The resulting weights are proportional to the original weights and sum to 1.
Fit Regression Weights button
computes weight values for the models in the table by regressing the series on the predictions from the models. The values in the Weight column are replaced by the estimated coefficients produced by this linear regression. If some weight values are nonmissing and some are missing, only models with nonmissing weight values are included in the regression. If all weights are missing, all models are used.
OK
closes the Forecast Combination Model Specification window and fits the model.
Cancel
closes the Forecast Combination Model Specification window without fitting the model. Any options you specified are lost.
Reset
resets all options to their initial values upon entry to the Forecast Combination Model Specification window. This might be useful when editing an existing model specification; otherwise, Reset has the same function as Clear.
Clear
resets all options to their default values.
Mouse Button Actions
You can select or deselect models for inclusion in the combination model by positioning the mouse cursor over the model description and pressing the left mouse button. When you select a model in this way, the weights are automatically updated. The newly selected model is given a weight equal to the average weight of the previously selected models, and all the nonmissing weights are normalized to sum to 1. When you use the mouse to remove a model from the combination, the weight of the deselected model is set to missing and the remaining nonmissing weights are normalized to sum to 1.
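The weight handling described for this window (normalizing weights, updating them when a model is selected or deselected, and forming the weighted average) can be sketched as follows. Missing weights are represented by None. All function names are hypothetical, and the weight of 1.0 given to the first model selected into an empty combination is an assumption, since the text does not cover that case.

```python
def normalize(weights):
    """Divide each nonmissing weight by the sum of the nonmissing weights."""
    total = sum(w for w in weights.values() if w is not None)
    return {m: (None if w is None else w / total) for m, w in weights.items()}


def select(weights, model):
    """Give `model` the average of the current weights, then normalize."""
    current = [w for w in weights.values() if w is not None]
    updated = dict(weights)
    # Assumption: the first model selected into an empty combination gets 1.0
    updated[model] = sum(current) / len(current) if current else 1.0
    return normalize(updated)


def deselect(weights, model):
    """Set the weight of `model` to missing and normalize the rest."""
    updated = dict(weights)
    updated[model] = None
    return normalize(updated)


def combine(weights, forecasts):
    """Weighted average of forecasts over models with nonmissing weights."""
    used = [m for m, w in weights.items() if w is not None]
    horizon = len(forecasts[used[0]])
    return [sum(weights[m] * forecasts[m][t] for m in used)
            for t in range(horizon)]


weights = normalize({"A": 2.0, "B": 6.0, "C": None})
print(weights)
print(combine(weights, {"A": [10.0, 20.0], "B": [20.0, 40.0]}))
```

Selecting model C in this example would give it the average of the two existing weights (0.5) before all three weights are rescaled to sum to 1.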
Forecasting Project File Selection Window
Use the Forecasting Project File Selection window to locate and load a previously stored forecasting project. Access it from the project Browse button of the Manage Forecasting Project window or the Time Series Forecasting window or from the Open Project item under the File menu of the Develop Models window.
Selection Lists
Libraries
is a list of currently assigned libraries. When you select a library from this list, the catalogs in that library are shown in the catalog selection list.
Catalogs
is a list of catalogs contained in the currently selected library. When you select a catalog from this list, any forecasting project entries stored in that catalog are shown in the projects selection list.
Projects
is a list of forecasting project entries contained in the currently selected catalog.
Controls and Fields
OK
closes the window and opens the selected project.
Cancel
closes the window without selecting a project.
Delete
deletes the selected project file.
Reset
restores selections to those that were set before the window was opened.
Forecast Options Window
Use the Forecast Options window to set options to control how forecasts and confidence limits are computed. It is available from the Forecast Options item in the Options menu of the Develop Models, Automatic Model Fitting, Produce Forecasts, and Manage Forecasting Project windows.
Controls and Fields Confidence Limits
specifies the size of the confidence limits for the forecast values. For example, a value of 0.95 specifies 95% confidence intervals. You can type in a number or select from the pop-up list. Predictions for transformed models
controls how forecast values are computed for models that employ a series transformation. See the section Predictions for Transformed Models in Chapter 44, “Forecasting Process Details,” for more information. The values are as follows. Mean
specifies that forecast values be predictions of the conditional mean of the series. Median
specifies that forecast values be predictions of the conditional median of the series. OK
closes the window and saves the option settings you specified. Cancel
closes the window without changing the forecast options. Any options you specified are lost.
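As a rough illustration of the Mean and Median settings, consider a model fit to the logarithm of the series. If the forecast on the log scale is normally distributed with mean mu and standard deviation sigma, the conditional median on the original scale is exp(mu) and the conditional mean is exp(mu + sigma^2/2). The Python sketch below covers only that log-normal case. The function name and its arguments are hypothetical, and Chapter 44 remains the reference for how the system handles each transformation.

```python
import math
from statistics import NormalDist


def log_model_forecast(mu, sigma, level=0.95, kind="mean"):
    """Back-transform a forecast made on the log scale.

    mu, sigma : forecast mean and standard error on the log scale
    level     : confidence limit size, e.g. 0.95 for 95% limits
    kind      : "mean" or "median" (the two settings described above)
    """
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Assumption: limits are the back-transformed log-scale limits.
    lower = math.exp(mu - z * sigma)
    upper = math.exp(mu + z * sigma)
    if kind == "median":
        point = math.exp(mu)
    elif kind == "mean":
        point = math.exp(mu + sigma ** 2 / 2.0)
    else:
        raise ValueError("kind must be 'mean' or 'median'")
    return point, lower, upper


median_point, lower, upper = log_model_forecast(2.0, 0.5, 0.95, "median")
mean_point, _, _ = log_model_forecast(2.0, 0.5, 0.95, "mean")
```

For a log transformation the mean forecast is always at least as large as the median forecast, and the gap grows with the forecast standard error.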
Intervention Specification Window Use the Intervention Specification window to specify intervention effects to model the impact on the series of unusual events. Access it from the Intervention for Series window. For more information,
see the section “Interventions” on page 2527.
Controls and Fields
Series
is the name and variable label of the current series.
Label
is a descriptive label for the intervention effect that you specify. You can type a label in this field or allow the system to provide the label. If you leave the label blank, a label is generated automatically based on the options you specify.
Date
is the date that the intervention occurs. You can type a date value in this field, or you can set the date by selecting a row of the data table on the right side of the window.
Type of Intervention
Point
specifies that the intervention variable is zero except for the specified date.
Step
specifies that the intervention variable is zero before the specified date and a constant 1.0 after the date.
Ramp
specifies that the intervention variable is an increasing linear function of time after the date of the intervention and zero before the intervention date.
Number of lags
specifies the numerator order for the transfer function model for the intervention effect. Select a value from the pop-up list.
Effect Decay Pattern
specifies the denominator order for the transfer function model for the intervention effect. The value “Exp” specifies a single lag denominator factor; the value “Wave” specifies a two-lag denominator factor.
OK
closes the window and adds the intervention effect specified to the series interventions list.
Cancel
closes the window without adding the intervention. Any options you specified are lost.
Reset
resets all options to their initial values upon entry to the window. This might be useful when editing an intervention specification; otherwise, Reset has the same function as Clear.
Clear
resets all options to their default values.
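The Point, Step, and Ramp intervention types described above correspond to simple dummy variables. The Python sketch below builds each one for a series indexed by period number. It is an illustration, not the system's own code: the function name is hypothetical, and the values on the intervention date itself (1 for Point and Step, 0 for Ramp) and the unit slope of the ramp are assumptions, because the descriptions above do not state them.

```python
def intervention_variable(kind, n_periods, event_index):
    """Return the intervention variable as a list of floats.

    kind        : "point", "step", or "ramp"
    n_periods   : number of time periods in the series
    event_index : zero-based position of the intervention date
    """
    values = []
    for t in range(n_periods):
        if kind == "point":
            # Nonzero only on the intervention date (assumed value 1).
            value = 1.0 if t == event_index else 0.0
        elif kind == "step":
            # Zero before the date; assumed to be 1 from the date onward.
            value = 1.0 if t >= event_index else 0.0
        elif kind == "ramp":
            # Zero up to the date, then increasing by 1 each period (assumed).
            value = float(max(t - event_index, 0))
        else:
            raise ValueError("kind must be 'point', 'step', or 'ramp'")
        values.append(value)
    return values


point = intervention_variable("point", 5, 2)
step = intervention_variable("step", 5, 2)
ramp = intervention_variable("ramp", 5, 2)
```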
Interventions for Series Window
Use the Interventions for Series window to create and edit a list of intervention effects to model the impact on the series of unusual events and to select intervention effects as predictors for forecasting models. Access it from the Add button pop-up menu of the ARIMA Model Specification or Custom Model Specification window, or by selecting Define Interventions from the Tools menu in the Develop Models window. For more information, see the section “Interventions” on page 2527.
Controls and Fields Series
is the name and variable label of the current series. OK
closes the window. If you access this window from the ARIMA Model Specification window or the Custom Model Specification window, any interventions that are selected (highlighted) in the list are added to the model. If you access this window from the Tools menu, all interventions in the list are saved for the current series. Cancel
closes the window without returning a selection or changing the interventions list. Any options you specified are lost. Reset
resets the list as it was on entry to the window. Clear
deletes all interventions from the list. Add
opens the Intervention Specification window to specify a new intervention effect and add it to the list. Delete
deletes the currently selected (highlighted) entries from the list. Edit
opens the Intervention Specification window to edit the currently selected (highlighted) intervention.
Mouse Button Actions To select or deselect interventions, position the mouse cursor over the intervention’s label in the Interventions list and press the left mouse button. When you position the mouse cursor in the Interventions list and press the right mouse button, a menu containing the actions Add, Delete, and Edit appears. These actions are the same as the Add, Delete, and Edit buttons. Double-clicking on an intervention in the list invokes an Edit action for that intervention specification.
Manage Forecasting Project Window Use this resizable window to work with collections of series, models, and options called projects. The window contains a project name, a description field, and a table of information about all the series for which you have fit forecasting models. Access it by using the Manage Projects button on the Time Series Forecasting window.
Controls and Fields
Project Name
is the name of the SAS catalog entry in which forecasting models and other results will be stored and from which previously stored results are loaded into the forecasting system. You can specify the project by typing a SAS catalog entry name in this field or by selecting the Browse button to the right of this field. If you specify the name of an existing catalog entry, the information in the project file is loaded. If you specify a one-level name, it is assumed to be the name of a project in the “fmsproj” catalog in the “sasuser” library. For example, typing samproj is equivalent to typing sasuser.fmsproj.samproj.
Project Browse button
opens the Forecasting Project File Selection window to enable you to select and load the project from a list of previously stored project files.
Description
is a descriptive label for the forecasting project. The description you type in this field will be stored with the catalog entry shown in the Project field if you save the project.
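The one-level name rule for the Project Name field can be sketched as follows. This Python helper is hypothetical and only mirrors the behavior stated above: a one-level name is expanded with the sasuser library and the fmsproj catalog. Names that already contain more than one level are returned unchanged, because the text above does not describe how they are interpreted.

```python
def resolve_project_name(name, default_library="sasuser",
                         default_catalog="fmsproj"):
    """Expand a one-level project name to library.catalog.entry form."""
    name = name.strip()
    parts = name.split(".")
    if len(parts) == 1 and parts[0]:
        # One-level name: assume the default library and catalog.
        return ".".join([default_library, default_catalog, parts[0]])
    # Multi-level names are passed through untouched in this sketch.
    return name


short_form = resolve_project_name("samproj")
long_form = resolve_project_name("sasuser.fmsproj.samproj")
```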
Series List Table The table of series for which forecasting models have been fit contains the following columns. Series Name
is the name of the time series variable represented in the given row of the table. Series Frequency
is the time interval (data frequency) for the time series. Input Data Set Name
is the input data set that provided the data for the series. Forecasting Model
is the descriptive label for the forecasting model selected for the series. Statistic Name
is the statistic of fit for the forecasting model selected for the series. Number of Models
is the total number of forecasting models fit to the series. If there is more than one model for a series, use the Model List window to see a list of models. Series Label
is the variable label for the series. Time ID Variable Name
is the time ID variable for the input data set for the series. Series Data Range
is the time range of the nonmissing values of the series. Model Fit Range
is the period of fit used for the series.
Model Evaluation Range
is the evaluation period used for the series. Forecast Range
is the forecast period set for the series.
Menu Bar File
New
opens a dialog which lets you create a new project, assign it a name and description, and make it the active project. Open
opens a dialog that lets you select and load a previously saved project. Close
closes the Manage Forecasting Project window and returns to the main window. Save
saves the current state of the system (including all the models fit to a series) to the current project catalog entry. Save As
saves the current state of the system with a prompt for the name of the catalog entry in which to store the information. Save to Data Set
saves the current project file information in a SAS data set. The contents of the data set are the same as the information displayed in the series list table. Delete
deletes the current project file. Import Data
is available if you license SAS/ACCESS software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System. Export Data
is available if you license SAS/ACCESS software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database. Print
prints the current project file information. Print Setup
opens the Print Setup window, which allows you to access your operating system print setup. Edit
Delete Series
deletes all models for the selected (highlighted) row of the table and removes the series from the project. Clear
resets the system, deleting all series and models from the project. Reset
restores the Manage Forecasting Project window to its initial state. View
Data Set
opens a Viewtable window to display the input data set for the selected (highlighted) series. Series
opens the Time Series Viewer window to display plots of the selected (highlighted) series. Model
opens the Model Viewer window to show the current forecasting model for the selected series. Forecast
opens the Model Viewer to display plots of the forecasts produced by the forecasting model for the selected (highlighted) series. Tools
Diagnose Series
opens the Series Diagnostics window to perform the automatic series diagnostic process to determine the kinds of forecasting models appropriate for the selected (highlighted) series. List Models
opens the Model List window for the selected (highlighted) series, which displays a list of all the models that you fit for the series. This action is the same as double-clicking the mouse on the table row. Generate Data
opens the Time Series Simulation window. This window enables you to simulate ARIMA time series processes and is useful for educational exercises or testing the system. Refit Models
All Series
refits all the models for all the series in the project by using data within the current fit range.
Selected Series
refits all the models for the currently highlighted series by using data within the current fit range. Reevaluate Models
All Series
reevaluates all the models for all the series in the project by using data within the current evaluation range. Selected Series
reevaluates all the models for the currently highlighted series by using data within the current evaluation range. Options
Time Ranges
opens the Time Ranges Specification window to enable you to change the fit and evaluation time ranges and the forecast horizon. Default Time Ranges
opens the Default Time Ranges window to enable you to control how the system sets the time ranges for series when you do not explicitly set time ranges with the Time Ranges Specification window. Settings made by using this window do not affect series you are already working with; they take effect when you select a new series. Model Selection List
opens the Model Selection List editor window. Use this to edit the set of forecasting models considered by the automatic model selection process and displayed by the Models to Fit window. Statistics of Fit
opens the Statistics of Fit Selection window, which controls which of the available statistics will be displayed. Forecast Options
opens the Forecast Options window, which enables you to control the widths of forecast confidence limits and control the kind of predicted values computed for models that include series transformations. Column Labels
enables you to set long or short column labels. Long labels are used by default. Include Interventions
controls whether intervention effects defined for the current series are automatically added as predictors to the models considered by the automatic selection process and displayed by the Model Selection List editor window. When the Include Interventions option is selected, the series interventions are also automatically added to the predictors list when you specify a model in the ARIMA and Custom Models Specification windows.
Print Audit Trail
prints to the SAS log information about the models fit by the system. A check mark or filled check box next to this item indicates that the audit option is turned on. Show Source Statements
controls whether SAS statements submitted by the forecasting system are printed in the SAS log. When the Show Source Statements option is selected, the system sets the SAS system option SOURCE before submitting SAS statements; otherwise, the system uses the NOSOURCE option. Note that only some of the functions performed by the forecasting system are accomplished by submitting SAS statements. A check mark or filled check box next to this item indicates that the option is turned on.
Left Mouse Button Actions If you select a series in the table by positioning the cursor over the table row and clicking with the left mouse button once, that row of the table is highlighted. Menu bar actions such as Delete Series will apply to the highlighted row of the table. If you select a series in the table by positioning the cursor over the table row and double-clicking with the left mouse button, the system opens the Model List window for that series, which displays a list of all the models that you fit for the series. This is the same as the List Models action under Tools in the menu bar.
Right Mouse Button Actions Clicking the right mouse button invokes a pop-up menu of actions applicable to the highlighted series. The actions in this menu are as follows. Delete Series
deletes the highlighted series and its models from the project. This is the same as Delete Series in the Edit menu. Refit All Models
refits all models attached to the highlighted series by using data within the current fit range. This is the same as the Selected Series item under Refit Models in the Tools menu. Reevaluate All Models
reevaluates all models attached to the highlighted series by using data within the current evaluation range. This is the same as the Selected Series item under Reevaluate Models in the Tools menu. List Models
invokes the Model List window. This is the same as List Models under the Tools menu. View Series
opens the Time Series Viewer window to display plots of the highlighted series. This is the same as the Series item under the View menu. View Forecasting Model
invokes the Model Viewer window to display the forecasting model for the highlighted series. This is the same as the Model item under the View menu.
View Forecast
opens the Model Viewer window to display the forecasts for the highlighted series. This is the same as the Forecast item under the View menu. Refresh
updates information shown in the Manage Forecasting Project window.
Model Fit Comparison Window Use the Model Fit Comparison window to compare goodness-of-fit statistics for any two models fit to the current series. Access it from the Tools menu of the Develop Models window and the Automatic Model Fitting Results window whenever two or more models have been fit to the series.
Controls and Fields Series
identifies the current time series variable. Range
displays the starting and ending dates of the series data range. Model 1
shows the model currently identified as Model 1. Model 1 upward arrow button
enables you to change the model identified as Model 1 if it is not already the first model in the
list of models associated with the series. Select this button to cycle upward through the list of models. Model 1 downward arrow button
enables you to change the model identified as Model 1 if it is not already the last model in the list of models. Select this button to cycle downward through the list of models. Model 2
shows the model currently identified as Model 2. Model 2 upward arrow button
enables you to change the model identified as Model 2 if it is not already the first model in the list of models associated with the series. Select this button to cycle upward through the list of models. Model 2 downward arrow button
enables you to change the model identified as Model 2 if it is not already the last model in the list of models. Select this button to cycle downward through the list of models. Close
closes the Model Fit Comparison window. Save
opens a dialog for specifying the name and label of a SAS data set to which the statistics will be saved. The data set will contain all available statistics and their values for Model 1 and Model 2, as well as a flag variable that is set to 1 for those statistics that were displayed. Print
prints the contents of the table to the SAS Output window. If you find that the contents do not appear immediately in the Output window, you need to set scrolling options. Select “Preferences” under the Options submenu of the Tools menu. In the Preferences window, select the Advanced tab, then set output scroll lines to a number greater than zero. If you want to route the contents to a printer, go to the Output window and select “Print” from the File menu. Statistics
opens the Statistics of Fit Selection window for controlling which statistics are displayed.
Model List Window This resizable window shows all of the models that have been fit to a particular series in a project. Access it from the Manage Forecasting Project window by selecting a series in the series list table and choosing “List Models” from the Tools menu or by double-clicking the series.
Controls and Fields Data Set
is the name of the current input data set. Interval
is the time interval (data frequency) for the input data set. Series
is the variable name and label of the current time series. Data Range
is the date of the first and last nonmissing data values available for the current series in the input data set. Fit Range
is the current period of fit setting. This is the range of data that will be used to fit models to the series. It might be different from the fit ranges shown in the table, which were in effect when the models were fit. Evaluation Range
is the current period of evaluation setting. This is the range of data that will be used to calculate the goodness-of-fit statistics for models fit to the series. It might be different from the evaluation ranges shown in the table, which were in effect when the models were fit. View Series Graphically icon
opens the Time Series Viewer window to display plots of the current series.
View Model Graphically icon
opens the Model Viewer to display graphs and tables for the currently highlighted model.
Model List Table The table of models fit to the series contains columns that show the model label, the fit range and evaluation range used to fit the models, and all of the currently selected fit statistics. You can change the selection of fit statistics by using the Statistics of Fit Selection window. Click on column headings to sort the table by a particular column. If a model is highlighted, clicking with the right mouse button invokes a pop-up menu that provides actions applicable to the highlighted model. It includes the following items. View Model
opens the Model Viewer on the selected model. This is the same as “Model Predictions” under the View menu. View Parameter Estimates
opens the Model Viewer to display the parameter estimates table for the currently highlighted model. This is the same as “Parameter Estimates” under the View menu. View Statistics of Fit
opens the Model Viewer to display the statistics of fit table for the currently highlighted model. This is the same as “Statistics of Fit” under the View menu. Edit Model
opens the appropriate model specification window for changing the attributes of the highlighted model and fitting the modified model. Refit Model
refits the highlighted model using the current fit range. Reevaluate Model
reevaluates the highlighted model using the current evaluation range. Delete Model
deletes the highlighted model from the project. View Forecasts
opens the Model Viewer to show the forecasts for the highlighted model. This is the same as “Forecast Graph” under the View menu.
Menu Bar File
Save
opens a dialog which lets you save the contents of the table to a specified SAS data set. Import Data
is available if you license SAS/ACCESS software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System. Export Data
is available if you license SAS/ACCESS software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database. Print
sends the contents of the table to a printer as defined through Print Setup. Print Setup
opens the Print Setup window, which allows you to access your operating system print setup. Close
closes the window and returns to the Manage Forecasting Projects window. Edit
Edit Model
enables you to modify the specification of the currently highlighted model in the table and fit the modified model. The new model replaces the current model in the table. Refit Model
refits the currently highlighted model using data within the current fit range. Reevaluate Model
recomputes statistics of fit for the currently highlighted model using data within the current evaluation range. Delete Model
deletes the currently highlighted model from the model table. Reset
restores the contents of the Model List window to the state initially displayed. View
Series
opens the Time Series Viewer window to display plots of the current series. This is the same as the View Series Graphically icon. Model Predictions
opens the Model Viewer to display a predicted and actual plot for the currently highlighted model. This is the same as the View Model Graphically icon. Prediction Errors
opens the Model Viewer to display the prediction errors for the currently highlighted model. Prediction Error Autocorrelations
opens the Model Viewer to display the prediction error autocorrelations, partial autocorrelations, and inverse autocorrelations for the currently highlighted model.
Prediction Error Tests
opens the Model Viewer to display graphs of white noise and stationarity tests on the prediction errors of the currently highlighted model. Parameter Estimates
opens the Model Viewer to display the parameter estimates table for the currently highlighted model. Statistics of Fit
opens the Model Viewer window to display goodness-of-fit statistics for the currently highlighted model. Forecast Graph
opens the Model Viewer to graph the forecasts for the currently highlighted model. Forecast Table
opens the Model Viewer to display forecasts for the currently highlighted model in a table. Options
Statistics of Fit
opens the Statistics of Fit Selection window, which presents a list of statistics that the system can display. Use this action to customize the list of statistics shown in the Model Viewer, Automatic Model Fitting Results, and Model Fit Comparison windows and available for selection in the Model Selection Criterion menu. Column Labels
enables you to set long or short column labels. Long labels are used by default.
Model Selection Criterion Window Use the Model Selection Criterion window to select the model selection criterion statistic used by the automatic selection process to determine the best fitting forecasting model. Model selection criterion statistics are a subset of those shown in the Statistics of Fit Selection window, since some statistics of fit, such as number of observations, are not useful for model selection. This window is available from the Model Selection Criterion item of the Options menu of the Develop Models window, Automatic Model Fitting window, and Produce Forecasts window.
Controls and Fields Show subset
when selected, lists only those model selection criterion statistics that are selected in the Statistics of Fit Selection window. Show all
when selected, lists all available model selection criterion statistics. OK
closes the window and sets the model selection criterion to the statistic you specified. Cancel
closes the window without changing the model selection criterion.
Model Selection List Editor Window
Use the Model Selection List Editor window to edit the model selection list, including adding your own custom models, and to specify which models in the list are to be used in the automatic fitting process. Access it from the Options menu in the Develop Models, Automatic Model Fitting, Produce Forecasts, and Manage Projects windows.
The window initially displays the current model list for your project. You can modify this set of models in several ways:
Open one or more alternate model lists to replace or append to the current model list. These can be either model lists included with the software or model lists previously saved by you or other users.
Turn the autofit option on or off for individual models. Those that are not flagged for autofit will be available by using the Models to Fit window but not by using automatic model fitting.
Delete models from the list that are not needed for your project.
Reorder the models in the list.
Edit models in the list.
Create a new empty list.
Add new models to the list.
Having modified the current model list, you can save it for future use in several ways:
Save it in a catalog so it can be opened later in the Model Selection List Editor.
Save it as the user default to be used automatically when new projects are created.
Select Close to close the Model Selection List Editor and attach the modified model selection list to the current project.
Select Cancel to close the Model Selection List Editor without changing the current project’s model selection list.
Since model selection lists are not bound to specific data sources, care must be taken when including data-specific features such as interventions and regressors. When you add an ARIMA, Factored ARIMA, or Custom model to the list, you can add regressors by selecting from the variables in the current data set. If there is no current data set, you will be prompted to specify a data set so you can select regressors from the series it contains. If you use a model list that has models with a particular regressor name on a data set that does not contain a series of that name, model fitting will fail. However, you can make global changes to the regressor names in the model list by using Set Regressor Names. For example, you might use the list of dynamic regression models found in the sashelp.forcast catalog. It uses the regressor name “price.” If your regressor series is named “x,” you can specify “price” as the current regressor name and “x” as the “change to” name. The change will be applied to all models in the list that contain the specified regressor name.
Interventions cannot be defined for models defined from the Model Selection List Editor. However, you can define interventions by using the Intervention Specification Window and apply them to your models by turning on the Include Interventions option.
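The global regressor rename described above (changing “price” to “x” in every model that uses it) can be pictured with the following Python sketch. The representation of a model as a dictionary with a "regressors" list, the model labels, and the function name are all hypothetical. The sketch only shows the effect of the rename, not how the system stores model lists. Matching is case-insensitive here because SAS variable names are not case sensitive.

```python
def set_regressor_names(model_list, current_name, change_to):
    """Return a copy of the model list with one regressor name replaced
    in every model that contains it."""
    renamed = []
    for model in model_list:
        regressors = [
            change_to if r.lower() == current_name.lower() else r
            for r in model["regressors"]
        ]
        # Copy the model so the original list is left unchanged.
        renamed.append({**model, "regressors": regressors})
    return renamed


# Hypothetical model list; the labels are made up for this example.
models = [
    {"label": "Dynamic regression 1", "regressors": ["price"]},
    {"label": "Dynamic regression 2", "regressors": ["price", "income"]},
    {"label": "Smoothing model", "regressors": []},
]
updated = set_regressor_names(models, "price", "x")
```

Models that do not contain the current regressor name pass through unchanged, which matches the statement that the change applies only to models containing the specified name.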
Auto Fit The auto fit column of check boxes enables you to eliminate some of the models from being used in the automatic fitting process without having to delete them from the list. By default, all models are checked, meaning that they are all used for automatic fitting.
Model This column displays the descriptions of all models in the model selection list. You can select one or more models by clicking them. Selected models are highlighted and become the object of the actions Edit, Move, and Delete.
Menu Bar File
New
creates a new empty model selection list. Open
opens a dialog for selecting one or more existing model selection lists to open. If you select multiple lists, they are all opened at once as a concatenated list. This helps you build large specialized model lists quickly by mixing and matching various existing lists such as the various ARIMA model lists included in SASHELP.FORCAST. By default,
the lists you open replace the current model list. Select the “append” radio button if you want to append them to the current model list. Open System Default
opens the default model list supplied with the product. Cancel
exits the window without applying any changes to the current project’s model selection list. Close
closes the window and applies any changes made to the project’s model selection list. Save
opens a dialog for saving the edited model selection list in a catalog of your choice. Save as User Default
saves your edited model list as a default list for new projects. The location of this saved list is shown on the message line. When you create new projects, the system searches for this model list and uses it if it is found. If it is not found, the system uses the original default model list supplied with the product. Edit
Reset
restores the list to its initial state when the window was invoked. Add Model
enables you to add new models to the selection list. You can use the Smoothing Model Specification window, the ARIMA Model Specification window, the Factored ARIMA Model Specification window, or the Custom Model Specification window. Edit Selected
opens the appropriate model specification window for changing the attributes of the highlighted model and adding the modified model to the selection list. The original model is not deleted. Move Selected
enables you to reorder the models in the list. Select one or more models, then select Move Selected from the menu or toolbar. A note appears on the message line: “Select the row after which the selected models are to be moved.” Then select any unhighlighted row in the table. The selected models will be moved after this row. Delete
deletes any highlighted models from the list. This item is not available if no models are selected. Set Regressor Names
opens a dialog for changing all occurrences of a given regressor name in the models of the current model selection list to a name that you specify. Select All
selects all models in the list. Clear Selections
deselects all models in the list.
Select All for Autofit
checks the autofit check boxes of all models in the list. Clear Autofit Selections
deselects the autofit check boxes of all models in the list.
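The Move Selected action in the Edit menu amounts to a simple list reordering: the highlighted models are taken out of the list and reinserted after an unhighlighted target row. The Python sketch below shows one way to express that. The function name and the assumption that the moved models keep their relative order are illustrative, not taken from the product.

```python
def move_selected(models, selected, target):
    """Move the selected models so that they follow the target row.

    models   : list of model descriptions in their current order
    selected : set of descriptions that are highlighted
    target   : an unhighlighted description to move the selection after
    """
    if target in selected:
        raise ValueError("the target row must not be one of the selected rows")
    moving = [m for m in models if m in selected]
    remaining = [m for m in models if m not in selected]
    position = remaining.index(target) + 1
    # Assumption: moved models keep their original relative order.
    return remaining[:position] + moving + remaining[position:]


model_list = ["A", "B", "C", "D", "E"]
moved_to_end = move_selected(model_list, {"B", "D"}, "E")
moved_to_front = move_selected(model_list, {"B", "D"}, "A")
```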
Mouse Button Actions Clicking any model description in the table selects (highlights) that model. Clicking the same model again deselects it. Multiple selections are allowed. Clicking the auto fit check box in any row toggles the associated model’s eligibility for use in automatic model fitting. Clicking the right mouse button opens a pop-up menu.
Model Viewer Window This resizable window provides plots and tables of actual values, model predictions, forecasts, and related statistics. The various plots and tables available are referred to as views. The section “View Selection Icons” on page 2617 explains how to change the view.
You can access Model Viewer in a number of ways, including the View Model Graphically icon of the Develop Models and Model List windows, the Graph button of the Automatic Model Fitting Results window, and the Model item under the View menu in the Manage Forecasting Project window. In addition, you can go directly to a selected view in the Model Viewer window by selecting Model Predictions, Prediction Errors, Statistics of Fit, Prediction Error Autocorrelations, Prediction Error Tests, Parameter Estimates, Forecast Graph, or Forecast Table from the View menu or corresponding toolbar icon or pop-up menu item in the Develop Models, Model List, or Automatic Model Fitting Results windows. The state of the Model Viewer window is controlled by the current model and the currently selected view. You can resize this window, and you can use other windows without closing the Model Viewer window. By default, the Model Viewer window is automatically updated to display the new model when you switch to working with another model (that is, when you highlight a different model). You can unlink the Model Viewer window from the current model selection by selecting the Link/Unlink icon from the window’s horizontal toolbar. See “Link/Unlink” in the section “Toolbar Icons” on page 2616. For more information, see the section “Model Viewer” on page 2427.
Toolbar Icons
The Model Viewer window contains a horizontal row of icons called the Toolbar. Corresponding menu items appear under various menus. The function of each icon is explained in the following list.

Zoom In
In the Model Predictions, Prediction Errors, and Forecast Graph views, the Zoom In action changes the mouse cursor into cross hairs that you can use with the left mouse button to define a region of the graph to zoom in on. In the Prediction Error Autocorrelations and Prediction Error Tests views, Zoom In reduces the number of lags displayed.

Zoom Out
reverses the previous Zoom In action.

Link/Unlink viewer
disconnects or connects the Model Viewer window to the model table (Develop Models window, Model List window, or Automatic Model Fitting Results window). When the viewer is linked, selecting another model in the model table causes the model viewer to be updated to show the selected model. When the viewer is unlinked, selecting another model does not affect the viewer. This feature is useful for comparing two or more models graphically. You can display a model of interest in the Model Viewer, unlink it, then select another model and open another Model Viewer window for that model. Position the viewer windows side by side for convenient comparisons of models, or use the Next Viewer icon or F12 function key to switch between them.

Save
saves the contents of the Model Viewer window. By default, an HTML page is created. This enables you to display graphs and tables by using the Results Viewer or publish them on the Web or your intranet. See also “Save Graph As” and “Save Data As” under “Menu Bar” below.
Print
prints the contents of the viewer window.

Close
closes the Model Viewer window and returns to the window from which it was invoked.
View Selection Icons
On the right-hand side of the Model Viewer window is a vertical toolbar to select the view, that is, the kind of plot or table that the viewer displays. Corresponding menu items appear under View in the menu bar. The function of each icon is explained in the following list.

Model Predictions
displays a plot of actual series values and model predictions over time. Click individual points in the graph to get a display of the type (actual or predicted), ID value, and data value in the upper right corner of the window.

Prediction Errors
displays a plot of model prediction errors (residuals) over time. Click individual points in the graph to get a display of the prediction error value in the upper right corner of the window.

Prediction Error Autocorrelations
displays horizontal bar charts of the sample autocorrelation, partial autocorrelation, and inverse autocorrelation functions for the model prediction errors. Overlaid line plots represent confidence limits computed at plus and minus two standard errors. Click any of the bars to display its value.

Prediction Error Tests
displays horizontal bar charts that represent results of white noise and stationarity tests on the model prediction errors. The first bar chart shows the significance probability of the Ljung-Box chi-square statistic computed on autocorrelations up to the given lag. Longer bars favor rejection of the null hypothesis that the series is white noise. Click any of the bars to display an interpretation. The second bar chart shows tests of stationarity of the model prediction errors, where longer bars favor the conclusion that the series is stationary. Each bar displays the significance probability of the augmented Dickey-Fuller unit root test to the given autoregressive lag. Long bars represent higher levels of significance against the null hypothesis that the series contains a unit root. For seasonal data, a third bar chart appears for seasonal root tests. Click any of the bars to display an interpretation.

Parameter Estimates
displays a table showing model parameter estimates along with standard errors and t tests for the null hypothesis that the parameter is zero.

Statistics of Fit
displays a table of statistics of fit for the selected model. The set of statistics shown can be changed by using the Statistics of Fit item under Options in the menu bar.

Forecast Graph
displays a plot of actual and predicted values for the series data range, followed by a horizontal reference line and forecasted values with confidence limits. Click individual points in the graph to get a display of the type, date/time, and value of the data point in the upper right corner of the window.

Forecast Table
displays a data table with columns containing the date/time, actual, predicted, error (residual), lower confidence limit, and upper confidence limit values, together with any predictor series.
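The quantities behind the Prediction Error Autocorrelations and Prediction Error Tests views can be illustrated outside the system. The following Python sketch is not the system's implementation. It computes sample autocorrelations of a list of prediction errors, approximate limits at plus and minus two standard errors (assuming the white noise approximation of 1/sqrt(n) for the standard error, which is an assumption made here), and the Ljung-Box Q statistic up to a given lag. The function names and the sample data are hypothetical.

```python
import math


def sample_autocorrelations(errors, max_lag):
    """Return sample autocorrelations of the errors at lags 1..max_lag."""
    n = len(errors)
    mean = sum(errors) / n
    centered = [e - mean for e in errors]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def two_standard_error_limits(n):
    """Approximate +/- 2 standard error limits (assumes se = 1/sqrt(n))."""
    se = 1.0 / math.sqrt(n)
    return -2.0 * se, 2.0 * se


def ljung_box_q(errors, max_lag):
    """Ljung-Box Q statistic computed on autocorrelations up to max_lag."""
    n = len(errors)
    acf = sample_autocorrelations(errors, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


# Made-up prediction errors that alternate in sign (clearly not white noise).
errors = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
acf = sample_autocorrelations(errors, 2)
q = ljung_box_q(errors, 1)
```

A significance probability such as the one shown in the bar chart would be obtained by comparing Q with a chi-square distribution. The degrees of freedom depend on the lag and on the fitted model, and that step is left out of this sketch.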
Menu Bar

File

Save Graph
saves the plot displayed in the viewer window as a SAS/GRAPH grseg catalog entry. When the current view is a table, this menu item is not available. See also “Save” in the section “Toolbar Icons” on page 2616. If a graphics catalog entry name has not already been specified, this action functions like “Save Graph As.”

Save Graph As
saves the current graph as a SAS/GRAPH grseg catalog entry in a SAS catalog that you specify and/or as an Output Delivery System (ODS) object. By default, an HTML page is created, with the graph embedded as a GIF image.

Save Data
saves the data displayed in the viewer window in a SAS data set, where applicable.

Save Data As
saves the data in a SAS data set that you specify and/or as an Output Delivery System (ODS) object. By default, an HTML page is created, with the data displayed as a table.

Import Data
is available if you license SAS/ACCESS software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System.

Export Data
is available if you license SAS/ACCESS software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database.

Print Graph
prints the contents of the viewer window if the current view is a graph. This is the same as the Print toolbar icon. If the current view is a table, this menu item is not available.

Print Data
prints the data displayed in the viewer window, where applicable.

Print Setup
opens the Print Setup window, which allows you to access your operating system print setup.

Print Preview
opens a preview window to show how your plots will appear when printed.
Close
closes the Model Viewer window and returns to the window from which it was invoked.

Edit

Edit Model
enables you to modify the specification of the current model and to fit the modified model, which is then displayed in the viewer.

Refit Model
refits the current model by using data within the current fit range. This action also causes the ranges to be reset if the data range has changed.

Reevaluate Model
reevaluates the current model by using data within the current evaluation range. This action also causes the ranges to be reset if the data range has changed.

View
See “View Selection Icons” on page 2617, which describes each of the items available under “View” except “Zoom Way Out.”

Zoom Way Out
zooms the plot out as far as it will go, undoing all prior zoom in operations.

Tools

Link Viewer
See “Link/Unlink” in the section “Toolbar Icons” on page 2616.

Options

Time Ranges
opens the Time Ranges Specification window to enable you to change the period of fit, period of evaluation, or forecast horizon to be applied to subsequently fit models.

Statistics of Fit
opens the Statistics of Fit Selection window, which presents a list of statistics that the system can display. Use this action to customize the list of statistics shown in the statistics of fit table and available for selection in the Model Selection Criterion menu.

Forecast Options
opens the Forecast Options window, which enables you to control the widths of forecast confidence limits and control the kind of predicted values computed for models that include series transformations.

Residual Plot Options
provides a choice of four methods of computing prediction errors for models that include a data transformation.

Prediction Errors
computes the difference between the transformed series actual values and model predictions.
Normalized Prediction Errors
computes prediction errors in normalized form.

Model Residuals
computes the difference between the untransformed series values and the untransformed model predictions.

Normalized Model Residuals
computes model residuals in normalized form.

Number of Lags
opens a window to enable you to specify the number of lags shown in the Prediction Error Autocorrelations and Prediction Error Tests views. You can also use the Zoom In and Zoom Out actions to control the number of lags displayed.

Correlation Probabilities
controls whether the bar charts in the Prediction Error Autocorrelations view represent significance probabilities or values of the correlation coefficient. A check mark or filled check box next to this item indicates that significance probabilities are displayed. In each case the bar graph horizontal axis label changes accordingly.

Include Interventions
controls whether intervention effects defined for the current series are automatically added as predictors to the models considered by the automatic selection process. A check mark or filled check box next to this item indicates that the option is turned on.

Print Audit Trail
prints to the SAS log information about the models fit by the system. A check mark or filled check box next to this item indicates that the audit option is turned on.

Show Source Statements
controls whether SAS statements submitted by the forecasting system are printed in the SAS log. When the Show Source Statements option is selected, the system sets the SAS system option SOURCE before submitting SAS statements; otherwise, the system uses the NOSOURCE option. Note that only some of the functions performed by the forecasting system are accomplished by submitting SAS statements. A check mark or filled check box next to this item indicates that the option is turned on.
Mouse Button Actions
You can examine the data values of individual points in the Model Predictions, Prediction Errors, and Forecast Graph views of the Model Viewer by clicking the point. The date/time and data values as well as the type (actual, predicted, and so forth) are displayed in a box that appears in the upper right corner of the Viewer window. Click the mouse elsewhere or select any action to dismiss the data box. Similarly, you can display values in the Prediction Error Autocorrelations view by clicking any of the bars. Clicking bars in the Prediction Error Tests view displays a Recommendation for Current View window, which explains the test represented by the bar.
When you select the Zoom In action in the Model Predictions, Prediction Errors, and Forecast Graph views, you can use the mouse to define a region of the graph to zoom. Position the mouse cursor at one corner of the region, press the left mouse button, and move the mouse cursor to the opposite corner of the region while holding the left mouse button down. When you release the mouse button, the plot is redrawn to show an expanded view of the data within the region you selected.
Models to Fit Window
Use the Models to Fit window to fit models by choosing them from the current model selection list. Access it by using “Fit Models from List” under the Fit Model submenu of the Edit menu in the Develop Models window, or the pop-up menu that appears when you click an empty area of the model table in the Develop Models window. If you want to alter the list of models that appears here, use the Model Selection List editor window.
To select a model to be fit, use the left mouse button. To select more than one model to fit, drag with the mouse, or select the first model, then press the shift key while selecting the last model. For noncontiguous selections, press the control key while selecting with the mouse. To begin fitting the models, double-click the last selection or select the OK button. If series diagnostics have been performed, the radio box is available. If the Subset by series diagnostics radio button is selected, only those models in the selection list that fit the diagnostic criteria will be shown for selection. If you want to choose models that do not fit the diagnostic criteria, select the Show all models button.
Controls and Fields

Show all models
when selected, lists all available models, regardless of the setting of the series diagnostics options.

Subset by series diagnostics
when selected, lists only the available models that are consistent with the series diagnostics options.

OK
closes the Models to Fit window and fits the selected models.

Cancel
closes the window without fitting any models. Any selections you made are lost.
Polynomial Specification Window
Use the Polynomial Specification window to add a polynomial to an ARIMA model. The set of lags defined here becomes a polynomial factor, denoted by a list of lags in parentheses, when you select “OK.” If you accessed this window from the AR Polynomial Specification window, the polynomial is added to the autoregressive part of the model. If you accessed it from the MA Polynomial Specification window, the polynomial is added to the moving-average part of the model.
Controls and Fields

Lag
specifies a lag value to add to the list. Type in a positive integer or select one by clicking the spin box arrows.

Add
adds the value in the Lag spin box to the list of polynomial lags. Duplicate values are not allowed.

Remove
deletes a selected lag from the list of polynomial lags.

Polynomial Lags
is a list of unique integers that represent lags to be added to the model.

OK
closes the window and returns the specified polynomial to the AR or MA polynomial specification window.

Cancel
closes the window and discards any polynomial lags added to the list.
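The behavior of the Lag, Add, Remove, and Polynomial Lags controls can be pictured as a small data structure: a list of unique positive integers that is rendered as a parenthesized factor. The following Python sketch is an illustration only. The class name, the sorted ordering, and the comma separator in the rendered factor are assumptions made here, not documented behavior of the window.

```python
class PolynomialLags:
    """Hypothetical model of the Polynomial Lags list."""

    def __init__(self):
        self.lags = []

    def add(self, lag):
        """Add a lag; reject non-positive values and ignore duplicates."""
        if isinstance(lag, bool) or not isinstance(lag, int) or lag < 1:
            raise ValueError("lag must be a positive integer")
        if lag in self.lags:
            return False  # duplicate values are not allowed
        self.lags.append(lag)
        self.lags.sort()  # assumption: keep the list in ascending order
        return True

    def remove(self, lag):
        """Delete a lag from the list if it is present."""
        if lag in self.lags:
            self.lags.remove(lag)
            return True
        return False

    def as_factor(self):
        """Render the lags as a list in parentheses (separator is assumed)."""
        return "(" + ",".join(str(lag) for lag in self.lags) + ")"


factor = PolynomialLags()
factor.add(12)
factor.add(1)
duplicate_added = factor.add(1)  # False: 1 is already in the list
```

With the calls shown, `factor.as_factor()` returns the string `(1,12)`.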
Produce Forecasts Window
Use the Produce Forecasts window to produce forecasts for the series in the current input data set for which you have fit forecasting models. Access it by using the Produce Forecasts button of the Time Series Forecasting window.
Controls and Fields

Input Data Set
is the name of the current input data set. To specify the input data set, you can type a one-level or two-level SAS data set name in this field or select the Browse button to the right of the field.

Input data set Browse button
opens the Data Set Selection window to enable you to select the input data set.

Time ID
is the name of the time ID variable for the input data set. To specify this variable, you can type the ID variable name in this field or use the Select button.

Time ID Select button
opens the Time ID Variable Specification window.

Create button
opens a menu of choices of methods for creating a time ID variable for the input data set. Use this feature if the input data set does not already contain a valid time ID variable.

Interval
is the time interval between observations (data frequency) in the current input data set. If the interval is not automatically filled in by the system, you can type in an interval name here, or select one from the pop-up list.

Series
indicates the number and names of time series variables for which forecasts will be produced.

Series Select button
opens the Series to Process window to let you select the series for which you want to produce forecasts.

Forecast Output Data Set
is the name of the output data set that will contain the forecasts. Type the name of the output data set in this field or click the Browse button.

Forecast Output Browse button
opens a dialog to let you locate an existing data set to which to save the forecasts.

Format
enables you to select one of three formats for the forecast data set:

Simple
specifies the simple format for the output data set. The data set contains the time ID variable and the forecast variables and contains one observation per time period. Observations for earlier time periods contain actual values copied from the input data set; later observations contain the forecasts.

Interleaved
specifies the interleaved format for the output data set. The data set contains the time ID variable, the variable TYPE, and the forecast variables. There are several observations per time period, with the meaning of each observation identified by the TYPE variable.

Concatenated
specifies the concatenated format for the output data set. The data set contains the variable SERIES, the time ID variable, and the variables ACTUAL, PREDICT, ERROR, LOWER, and UPPER. There is one observation per time period per forecast series. The variable SERIES contains the name of the forecast series, and the data set is sorted by SERIES and DATE.

Horizon
is the number of periods or years to forecast beyond the end of the input data range. To specify the forecast horizon, you can type a value in this field or select one from the pop-up list.

Horizon periods
selects the units to apply to the horizon. By default, the horizon value represents number of periods. For example, if the interval is month, the horizon represents the number of months to forecast. Depending on the interval, you can also select weeks or years, so that the horizon is measured in those units.

Horizon date
is the ending date of the forecast horizon. You can type in a date that uses a form recognized by a SAS date informat, or you can increment or decrement the date shown by using the left and right arrows. The outer arrows change the date by a larger amount than the inner arrows. The date field and the horizon field reset each other, so you can use either one to specify the forecasting horizon.

Run button
produces forecasts for the selected series and stores the forecasts in the specified output SAS data set.

Output button
opens a Viewtable window to display the output data set. This button becomes available once the forecasts have been written to the data set.

Close button
closes the Produce Forecasts window and returns to the Time Series Forecasting window.
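The three Format choices described above (Simple, Interleaved, and Concatenated) differ only in how the same forecast results are arranged into rows. The following Python sketch rearranges one made-up set of results into each layout. It is an illustration under stated assumptions, not the system's code: the input record structure, the series name SALES, the TYPE labels, and the definition of ERROR as actual minus predicted are all hypothetical choices made for this example. The column names TYPE, SERIES, ACTUAL, PREDICT, ERROR, LOWER, UPPER, and DATE follow the descriptions above.

```python
def simple_format(records):
    """One row per period: the actual value if present, otherwise the forecast."""
    rows = {}
    for r in records:
        row = rows.setdefault(r["date"], {"DATE": r["date"]})
        row[r["series"]] = r["actual"] if r["actual"] is not None else r["predict"]
    return [rows[d] for d in sorted(rows)]


def interleaved_format(records):
    """Several rows per period; TYPE identifies the meaning of each row."""
    rows = {}
    for r in records:
        for kind in ("actual", "predict", "lower", "upper"):
            if r[kind] is None:
                continue
            key = (r["date"], kind.upper())  # TYPE labels are assumed
            row = rows.setdefault(key, {"DATE": key[0], "TYPE": key[1]})
            row[r["series"]] = r[kind]
    return [rows[k] for k in sorted(rows)]


def concatenated_format(records):
    """One row per period per series, sorted by SERIES and DATE."""
    rows = []
    for r in records:
        error = None
        if r["actual"] is not None and r["predict"] is not None:
            error = r["actual"] - r["predict"]  # assumed sign convention
        rows.append({
            "SERIES": r["series"], "DATE": r["date"],
            "ACTUAL": r["actual"], "PREDICT": r["predict"], "ERROR": error,
            "LOWER": r["lower"], "UPPER": r["upper"],
        })
    return sorted(rows, key=lambda row: (row["SERIES"], row["DATE"]))


# Made-up results: one historical period and one forecast period.
records = [
    {"date": "2024-01", "series": "SALES", "actual": 100.0,
     "predict": 98.0, "lower": 90.0, "upper": 106.0},
    {"date": "2024-02", "series": "SALES", "actual": None,
     "predict": 103.0, "lower": 94.0, "upper": 112.0},
]
simple_rows = simple_format(records)
interleaved_rows = interleaved_format(records)
concatenated_rows = concatenated_format(records)
```

In this sketch the simple layout has two rows, the interleaved layout has seven (four for the historical period and three for the forecast period), and the concatenated layout has two rows with all statistics side by side.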
Menu Bar

File

Import Data
is available if you license SAS/ACCESS software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System.

Export Data
is available if you license SAS/ACCESS software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database.

Print Setup
opens the Print Setup window, which allows you to access your operating system print setup.

Close
closes the Produce Forecasts window and returns to the Time Series Forecasting window.
View

Input Data Set
opens a Viewtable window to browse the current input data set.

Output Data Set
opens a Viewtable window to browse the output data set. This is the same as the Output button.

Tools

Produce Forecasts
produces forecasts for the selected series and stores the forecasts in the specified output SAS data set. This is the same as the Run button.

Options

Default Time Ranges
opens the Default Time Ranges window to enable you to control how the system sets the time ranges when new series are selected.

Model Selection List
opens the Model Selection List editor window. Use this to edit the set of forecasting models considered by the automatic model selection process and displayed by the Models to Fit window.

Model Selection Criterion
opens the Model Selection Criterion window, which presents a list of goodness-of-fit statistics and enables you to select the fit statistic that is displayed in the table and used by the automatic model selection process to determine the best fitting model.

Statistics of Fit
opens the Statistics of Fit Selection window, which presents a list of statistics that the system can display. Use this action to customize the list of statistics shown in the Statistics of Fit table and available for selection in the Model Selection Criterion window.

Forecast Options
opens the Forecast Options window, which enables you to control the widths of forecast confidence limits and control the kind of predicted values computed for models that include series transformations.

Forecast Data Set
enables you to select one of three formats for the forecast data set. See Format, which is described previously in this section.

Alignment of Dates

Beginning
aligns dates that the system generates to identify forecast observations in output data sets to the beginning of the time intervals.
Middle
aligns dates that the system generates to identify forecast observations in output data sets to the midpoints of the time intervals.

End
aligns dates that the system generates to identify forecast observations in output data sets to the end of the time intervals.

Automatic Fit
opens the Automatic Model Selection Options window, which enables you to control the number of models retained by the automatic model selection process and whether the models considered for automatic selection are subset according to the series diagnostics.

Set Toolbar Type

Image Only
displays the toolbar items as icons without text.

Label Only
displays the toolbar items as text without icon images.

Both
displays the toolbar items as both text and icon images.

Include Interventions
controls whether intervention effects defined for the current series are automatically added as predictors to the models considered by the automatic selection process. A check mark or filled check box next to this item indicates that the option is turned on.

Print Audit Trail
prints to the SAS log information about the models fit by the system. A check mark or filled check box next to this item indicates that the audit option is turned on.

Show Source Statements
controls whether SAS statements submitted by the forecasting system are printed in the SAS log. When the Show Source Statements option is selected, the system sets the SAS system option SOURCE before submitting SAS statements; otherwise, the system uses the NOSOURCE option. Note that only some of the functions performed by the forecasting system are accomplished by submitting SAS statements. A check mark or filled check box next to this item indicates that the option is turned on.
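The effect of the Alignment of Dates choices (Beginning, Middle, and End) can be illustrated for a monthly interval. The following Python sketch returns the date that would label a given month under each choice. It is an assumption-laden illustration rather than the system's algorithm: in particular, the midpoint is computed here as the start date plus half the number of days between the first and last day, rounded down, and the system may define the midpoint differently. The function name `align_month` is hypothetical.

```python
import calendar
from datetime import date, timedelta


def align_month(year, month, alignment):
    """Return the date labeling a month under the given alignment choice."""
    start = date(year, month, 1)
    end = date(year, month, calendar.monthrange(year, month)[1])
    if alignment == "beginning":
        return start
    if alignment == "end":
        return end
    if alignment == "middle":
        # Assumed midpoint rule: half the span between first and last day.
        return start + timedelta(days=(end - start).days // 2)
    raise ValueError("alignment must be 'beginning', 'middle', or 'end'")


january = {a: align_month(2024, 1, a) for a in ("beginning", "middle", "end")}
```

For January 2024 this sketch yields January 1, January 16, and January 31 for the three choices.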
Regressors Selection Window
Use the Regressors Selection window to select one or more time series variables in the input data set to include as regressors in the forecasting model to predict the dependent series. Access it from the pop-up menu that appears when you select the Add button of the ARIMA Model Specification window or Custom Model Specification window.
Controls and Fields

Dependent
is the name and variable label of the current series.

Regressors
is a table listing the names and labels of the variables in the input data set available for selection as regressors. The variables that you select are highlighted. Selecting a highlighted row again deselects that variable.

OK
closes the Regressors Selection window and adds the selected variables as regressors in the model.

Cancel
closes the window without adding any regressors. Any selections you made are lost.

Reset
resets all options to their initial values upon entry to the window.
Save Data As
Use Save Data As from the Time Series Viewer Window or the Model Viewer Window to save data displayed in a table to a SAS data set or external file. Use Save Forecast As from the Develop Models Window to save forecasts and related data including the series name, model, and interval. Save Forecast As supports append mode, enabling you to accumulate the forecasts of multiple series in a single data set.
To save your data in a SAS data set, provide a library name or assign one by using the Browse button, then provide a data set name or accept the default. Enter a descriptive label for the data set in the Label field. Click OK to save the data set. If you specify an existing data set, it will be overwritten, except in the case of Save Forecast As. External file output takes advantage of the Output Delivery System (ODS) and is designed primarily for creating HTML tables for Web reporting. You can build a set of Web pages quickly and use the ODS Results window to view and organize them. To use this feature, check Save External File in the External File Output box. To set ODS options, click Results Preferences, then select the Results tab in the Preferences dialog. If you have previously saved data of the current type, the system remembers your previous labels and titles. To reuse them, click the arrow button to the right of each of these window fields. Use the Customize button if you need to specify the name of a custom macro that contains ODS statements. The default macro simply runs the PRINT procedure. A custom macro can be used to add PRINT procedure and/or ODS statements to customize the type and organization of output files produced.
Save Graph As
Use Save Graph As from the Time Series Viewer Window or the Model Viewer Window to save any of the graphs in a catalog or external file.
To save your graph as a grseg catalog entry, enter a two-level name for the catalog or select Browse to open an Open dialog. Use it to select an existing library or assign a new one and then select a catalog to contain the graph. Click the Open button to open the catalog and close the dialog. Then enter a graphics entry name (eight characters or fewer) and a label, or accept the defaults, and click the OK button to save the graph.
External file output takes advantage of the Output Delivery System (ODS) and is designed primarily for creating GIF images and HTML for Web reporting. You can build a set of Web pages that contain graphs and use the Results window to view and organize them. To use this feature, check Save External File in the External File Output box. To set ODS options, click Results Preferences, then select the Results tab in the Preferences dialog. If you have previously saved graphs of the current type, the system remembers your previous labels and titles. To reuse them, click the arrow button to the right of each of these window fields.
Use the Customize button if you need to specify the name of a custom macro that contains ODS statements. The default macro simply runs the GREPLAY procedure. Users familiar with ODS might want to add statements to the macro to customize the type and organization of output files produced.
Seasonal ARIMA Model Options Window
Use the Seasonal ARIMA Model Options window to specify the autoregressive, differencing, and moving-average orders for the seasonal part of a model defined by using the Custom Model Specification window. Access it by selecting “Seasonal ARIMA...” from the Seasonal Model combo box of that window.
Controls and Fields

ARIMA Options
Use these combo boxes to specify the orders of the ARIMA model. You can either type in a value or click the combo box arrow to select from a pop-up list.

Autoregressive
defines the order of the seasonal autoregressive part of the model.

Differencing
defines the order of seasonal differencing.

Moving Average
defines the order of the seasonal moving-average term.

OK
closes the Seasonal ARIMA Model Options window and returns to the Custom Model Specification window.
Cancel
closes the Seasonal ARIMA Model Options window and returns to the Custom Model Specification window, discarding any changes made.

Reset
resets all options to their initial values upon entry to the window.
Series Diagnostics Window
Use the Series Diagnostics window to set options to limit the kinds of forecasting models considered for the series according to series characteristics. Access it by selecting “Diagnose Series” from the Tools menu in the Develop Models, Manage Project, and Time Series Viewer window menu bars. You can let the system diagnose the series characteristics automatically or you can specify series characteristics according to your judgment by using the radio buttons.
For each of the options Log Transform, Trend, and Seasonality, the value “Yes” means that only models appropriate for series with that characteristic should be considered. The value “No” means that only models appropriate for series without that characteristic should be considered. The value “Maybe” means that models should be considered without regard for that characteristic.
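The Yes, No, and Maybe rule described above amounts to a filter over the model selection list. The following Python sketch shows one way to express that rule. It is a minimal illustration, not the system's logic: the model names, the trait flags attached to each model, and the lowercase setting strings are all hypothetical.

```python
def model_allowed(traits, diagnostics):
    """Apply the Yes/No/Maybe rule to one model's traits."""
    for characteristic, setting in diagnostics.items():
        has_it = traits.get(characteristic, False)
        if setting == "yes" and not has_it:
            return False  # only models with the characteristic qualify
        if setting == "no" and has_it:
            return False  # only models without the characteristic qualify
        # "maybe": the characteristic is ignored
    return True


def subset_models(models, diagnostics):
    """Return the names of the models consistent with the diagnostics."""
    return [name for name, traits in models.items()
            if model_allowed(traits, diagnostics)]


# Hypothetical model list with assumed trait flags.
models = {
    "Mean": {"log": False, "trend": False, "seasonal": False},
    "Linear Trend": {"log": False, "trend": True, "seasonal": False},
    "Log Linear Trend": {"log": True, "trend": True, "seasonal": False},
    "Seasonal Dummy": {"log": False, "trend": False, "seasonal": True},
}
diagnostics = {"log": "no", "trend": "yes", "seasonal": "maybe"}
selected = subset_models(models, diagnostics)
```

With these settings only the hypothetical Linear Trend model remains. Setting every option to Maybe keeps all four models.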
Controls and Fields

Series
is the name and variable label of the current series.
Series Characteristics

Log Transform
specifies whether forecasting models with or without a logarithmic series transformation are appropriate for the series.

Trend
specifies whether forecasting models with or without a trend component are appropriate for the series.

Seasonality
specifies whether forecasting models with or without a seasonal component are appropriate for the series.

Automatic Series Diagnostics
performs the automatic series diagnostic process. The options Log Transform, Trend, and Seasonality are set according to the results of statistical tests.

OK
closes the Series Diagnostics window.

Cancel
closes the Series Diagnostics window without changing the series diagnostics options. Any options you specified are lost.

Reset
resets all options to their initial values upon entry to the Series Diagnostics window.

Clear
resets all options to their default values.
Series Selection Window
Use this resizable window to select a time series variable by specifying a library, a SAS data set or view, and a variable. These selections can be made by typing, by selecting from lists, or by a combination of the two. In addition, you can control the time ID variable and time interval, and you can browse the data set or view plots of the series from this window.
This window appears automatically when you select the View Series Graphically or Develop Models buttons in the Time Series Forecasting window and no series has been selected, and when you open the Time Series Viewer as a standalone tool. It is also invoked by using the Browse button in the Develop Models window. The system requires that series names be unique for each frequency (interval) within the forecasting project. If you select a series from the current input data set that already exists in the project with the same interval but a different input data set name, the system warns you and gives you the option to cancel the selection, to refit all models associated with the series by using the data from the current input data set, to delete the models for the series, or to inherit the existing models.
Controls and Fields

Library
is a SAS libname assigned within the current SAS session. If you know the libname associated with the data set of interest, you can type it in this field and press Return. If it is a valid choice, it will appear in the libraries list and will be highlighted. The SAS Data Sets list will be populated with data sets associated with that libname.

Data Set
is the name of a SAS data set (data file or data view) that resides under the selected libname. If you know the name, you can type it in and press Return. If it is a valid choice, it will appear in the SAS Data Sets list and will be highlighted, and the Time Series Variables list will be populated with the numeric variables in the data set.

Variable
is the name of a numeric variable contained in the selected data set. You can type the variable name in this field or you can select the variable with the mouse from the Time Series Variables list.
Time ID
is the name of the ID variable for the input data set. To specify the ID variable, you can type the ID variable name in this field or click the Select button. Select button
opens the Time ID Variable Specification window to let you select an existing variable in the data set as the Time ID. Create button
opens a menu of methods for creating a time ID variable for the input data set. Use this feature if the data set does not already contain a valid time ID variable. Interval
is the time interval between observations (data frequency) in the selected data set. If the interval is not automatically filled in by the system, you can type in an interval name or select one from the pop-up list. For more information about intervals, see Chapter 4, “Date Intervals, Formats, and Functions,” in this book. OK
This button is present when you have selected “Develop Models” from the Time Series Forecasting window. It closes the Series Selection window and makes the selected series the current series. Close
If you have selected the View Series Graphically icon from the Time Series Forecasting window, this button returns you to that window. If you have selected a series, it remains selected as the current series. If you are using the Time Series Viewer as a standalone application, this button closes the application. Cancel
This button is present when you have selected “Develop Models” from the Time Series Forecasting window. It closes the Series Selection window without applying any selections made. Reset
resets the fields to their initial values at entry to the window. Table
opens a Viewtable window for browsing the selected data set. This can assist you in locating the variable containing data you are looking for. Graph
opens the Time Series Viewer window to display the selected time series variable. You can switch to a different series in the Series Selection window without closing the Time Series Viewer window. Position the windows so they are both visible, or use the Next Viewer toolbar icon or F12 function key to switch between windows. Refresh
updates all fields and lists on the window. If you assign a new libname without exiting the Series Selection window, use the refresh action to update the Libraries list so that it will include the newly assigned libname. Also use the Refresh action to update the variables list if the input data set is changed.
Selection Lists Libraries
displays a list of currently assigned libnames. You can select a libname by clicking it, which is equivalent to typing its name in the Library field. If you cannot locate the library or directory you are interested in, go to the SAS Explorer window, select “New” from the File menu, then select “Library” and “OK.” This opens the New Library dialog window. You can also assign a libname by submitting a libname statement from the Editor window. Select the Refresh button to make the new libname available in the libraries list. SAS Data Sets
displays a list of the SAS data sets (data files or data views) located under the selected libname. You can select one of these by clicking it, which is equivalent to typing its name in the Data Set field. Time Series Variables
displays a list of numeric variables contained within the selected data set. You can select one of these by clicking it, which is equivalent to typing its name in the Variable field. You can double-click a series to select it and exit the window.
Series to Process Window Use the Series to Process window to select series for model fitting or forecasting. Access it by using the Select button in the Automatic Model Fitting and Produce Forecasts windows. Hold down the shift key or drag with the left mouse button for multiple selections. Use the control key for noncontiguous multiple selections. Once you make selections and select OK, the number of selected series and their names are listed in the Series to Process field of the calling window (with ellipses if not all the names will fit). When invoked from Automatic Model Fitting, the Series to Process window shows all the numeric variables in the input data set except the time ID variable. These are the series which are currently available for model fitting. When invoked from Produce Forecasts, the Series to Process window shows all the series in the input data set for which models have been fit. These are the series which are currently available for forecasting.
Controls and Fields OK
closes the window and applies the series selection(s) which have been made. At least one series must be selected. Cancel
closes the window, ignoring series selections which have been made, if any. Clear
deselects all series in the selection list. All
selects all series in the selection list.
Series Viewer Transformations Window Use the Series Viewer Transformations window to view plots of transformations of the current series in the Time Series Viewer window. It provides a larger set of transformations than those available from the viewer window’s toolbar. It is invoked by using “Other Transformations” under the Tools menu of the Time Series Viewer window. The options that you specify in this window are applied to the series displayed in the Time Series Viewer window when you select “OK” or “Apply.” Use the Apply button if you want to make repeated transformations to a series without having to close and reopen the Series Viewer Transformations window each time.
Controls and Fields Series
is the variable name for the current time series. Transformation
is the transformation applied to the time series displayed in the Time Series Viewer window. Select Log, Logistic, Square Root, Box-Cox, or none from the pop-up list. Simple Differencing
is the order of differencing applied to the time series displayed in the Time Series Viewer window. Select a number from 0 to 5 from the pop-up list. Seasonal Differencing
is the order of seasonal differencing applied to the time series displayed in the Time Series Viewer window. Select a number from 0 to 3 from the pop-up list. Percent Change
is a check box that if selected displays the series in terms of percent change from the previous period. Additive Decomposition
is a check box that produces a display of a selected series component derived by using additive decomposition. Multiplicative Decomposition
is a check box that produces a display of a selected series component derived using multiplicative decomposition. Component
selects a series component to display when either additive or multiplicative decomposition is turned on. You can display the seasonally adjusted component, the trend-cycle component, the seasonal component, or the irregular component (that is, the residual that remains after removal of the other components). The heading in the viewer window shows which component is currently displayed. OK
applies the transformation options you selected to the series displayed in the Time Series Viewer window and closes the Series Viewer Transformations window. Cancel
closes the Series Viewer Transformations window without changing the series displayed by the Time Series Viewer window. Apply
applies the transformation options you selected to the series displayed in the Time Series Viewer window without closing the Series Viewer Transformations window. Reset
resets the transformation options to their initial values upon entry to the Series Viewer Transformations window. Clear
resets the transformation options to their default values (no transformations).
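The Box-Cox and Percent Change options described above can be illustrated outside the system with a minimal Python sketch. It is not the system's own code: the function names `box_cox` and `percent_change` and the sample values are assumptions made for illustration, and the sketch uses the textbook Box-Cox form, in which a parameter of 0 reduces to the log transformation.

```python
import math

def box_cox(values, lam):
    """Textbook Box-Cox transform: (x**lam - 1) / lam, or log(x) when lam is 0."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def percent_change(values):
    """Percent change from the previous period.

    The first observation has no previous period, so it is returned as None.
    """
    changes = [None]
    for prev, cur in zip(values, values[1:]):
        changes.append(100.0 * (cur - prev) / prev)
    return changes

# Hypothetical series values, used only to exercise the functions.
sales = [100.0, 110.0, 99.0]
print(box_cox(sales, 0))      # same as the log transformation
print(box_cox(sales, 0.5))    # lam = 0.5 behaves like a rescaled square root
print(percent_change(sales))
```

The log and Box-Cox forms require positive series values; the sketch does not check for this.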
Smoothing Model Specification Window Use the Smoothing Model Specification window to specify and fit exponential smoothing and Winters method models. Access it from the Develop Models window by using the Fit Model submenu of the Edit menu or from the pop-up menu when you click an empty area of the model table.
Controls and Fields Series
is the name and variable label of the current series. Model
is a descriptive label for the model that you specify. You can type a label in this field or allow the system to provide a label. If you leave the label blank, a label is generated automatically based on the options you specify. Smoothing Methods
Simple Smoothing
specifies simple (single) exponential smoothing. Double (Brown) Smoothing
specifies double exponential smoothing by using Brown’s one parameter model (single exponential smoothing applied twice). Seasonal Smoothing
specifies seasonal exponential smoothing. (This is like Winters method with the trend term omitted.) Linear (Holt) Smoothing
specifies exponential smoothing of both the series level and trend (Holt’s two parameter model). Damped-Trend Smoothing
specifies exponential smoothing of both the series level and trend with a trend damping weight. Winters Method - Additive
specifies Winters method with additive seasonal factors. Winters Method - Multiplicative
specifies Winters method with multiplicative seasonal factors. Smoothing Weights
displays the values used for the smoothing weights. By default, the Smoothing Weights fields are set to “optimize,” which means that the system will compute the weight values that best fit the data. You can also type smoothing weight values in these fields. Level
is the smoothing weight used for the level of the series. Trend
is the smoothing weight used for the trend of the series. Damping
is the smoothing weight used by the damped-trend method to damp the forecasted trend towards zero as the forecast horizon increases. Season
is the smoothing weight used for the seasonal factors in Winters method and seasonal exponential smoothing.
Transformation
displays the series transformation specified for the model. When a transformation is specified, the model is fit to the transformed series, and forecasts are produced by applying the inverse transformation to the model predictions. Select Log, Logistic, Square Root, Box-Cox, or None from the pop-up list. Bounds
displays the constraints imposed on the fitted smoothing weights. Select one of the following from the pop-up list: Zero-One/Additive
sets the smoothing weight optimization region to the intersection of the region bounded by the intervals from zero (0.001) to one (0.999) and the additive invertible region. This is the default. Zero-One Boundaries
sets the smoothing weight optimization region to the region bounded by the intervals from zero (0.001) to one (0.999). Additive Invertible
sets the smoothing weight optimization region to the additive invertible region. Unrestricted
sets the smoothing weight optimization region to be unbounded. Custom
opens the Smoothing Weights window to enable you to customize the constraints for smoothing weights optimization. OK
closes the Smoothing Model Specification window and fits the model you specified. Cancel
closes the Smoothing Model Specification window without fitting the model. Any options you specified are lost. Reset
resets all options to their initial values upon entry to the window. This might be useful when editing an existing model specification; otherwise, Reset has the same function as Clear. Clear
resets all options to their default values.
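To make the roles of the Level, Trend, and Damping weights more concrete, the following Python sketch shows simple exponential smoothing and a damped-trend forecast. It is an illustration under stated assumptions, not the system's implementation: the function names are hypothetical, the level is initialized to the first observation (the system may initialize differently), and the level and trend passed to `damped_trend_forecast` are taken as already estimated.

```python
def simple_smoothing(series, level_weight):
    """Simple (single) exponential smoothing.

    Returns the one-step-ahead forecasts and the final smoothed level.
    Assumption: the level starts at the first observation.
    """
    level = series[0]
    forecasts = []
    for y in series:
        forecasts.append(level)  # forecast made before y is observed
        level = level_weight * y + (1.0 - level_weight) * level
    return forecasts, level

def damped_trend_forecast(level, trend, damping, horizon):
    """Forecasts for 1..horizon steps ahead from a given level and trend.

    Each additional step adds the trend multiplied by one more power of the
    damping weight, so the forecasted trend shrinks toward zero as the
    horizon increases when damping is less than 1.
    """
    forecasts = []
    cumulative = 0.0
    for step in range(1, horizon + 1):
        cumulative += damping ** step * trend
        forecasts.append(level + cumulative)
    return forecasts

print(simple_smoothing([10.0, 20.0, 30.0], 0.5))
print(damped_trend_forecast(10.0, 2.0, 0.5, 3))
print(damped_trend_forecast(10.0, 2.0, 1.0, 3))  # damping of 1 gives an undamped linear trend
```

With a damping weight of 1, the damped-trend forecast coincides with a linear (Holt) trend extrapolation, which is why the two methods differ only in the Damping field.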
Smoothing Weight Optimization Window Use the Smoothing Weight Optimization window to specify constraints for the automatic fitting of smoothing weights for exponential smoothing and Winters method models. Access it from the Smoothing Model Specification window when you select “Custom” in the “Bounds” combo box.
Controls and Fields No restrictions
when selected, specifies unrestricted smoothing weights. Bounded region
when selected, restricts the fitted smoothing weights to be within the bounds that you specify with the “Smoothing Weight Bounded Region” options. Additive invertible region
when selected, restricts the fitted smoothing weights to be within the additive invertible region of the parameter space of the ARIMA model equivalent to the smoothing model. (See the section “Smoothing Models” on page 2669 for details.) Additive invertible and bounded region
when selected, restricts the fitted smoothing weights to be both within the additive invertible region and within bounds that you specify. Smoothing Weight Bounded Region
is a group of numeric entry fields that enable you to specify lower and upper limits on the fitted value of each smoothing weight. The fields that appear in this part of the window depend on the kind of smoothing model that you specified. OK
closes the window and sets the options that you specified. Cancel
closes the window without changing any options. Any values you specified are lost. Reset
resets all options to their initial values upon entry to the window. Clear
resets all options to their default values.
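The effect of a bounded region on the fitted weight can be sketched in Python as follows. This is a simplified illustration, not the system's optimizer: it fits only the level weight of simple exponential smoothing, uses a plain grid search over the bounded interval, minimizes the sum of squared one-step-ahead errors, and does not model the additive invertible region. The function names and the grid size are assumptions; the default limits of 0.001 and 0.999 are the Zero-One boundaries described for the Bounds option.

```python
def one_step_sse(series, weight):
    """Sum of squared one-step-ahead errors for simple exponential smoothing.

    Assumption: the level starts at the first observation.
    """
    level = series[0]
    sse = 0.0
    for y in series[1:]:
        sse += (y - level) ** 2
        level = weight * y + (1.0 - weight) * level
    return sse

def fit_level_weight(series, lower=0.001, upper=0.999, steps=998):
    """Grid search for the level weight inside the bounded region [lower, upper]."""
    candidates = [lower + (upper - lower) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: one_step_sse(series, w))

trending = [1.0, 2.0, 3.0, 4.0, 5.0]
print(fit_level_weight(trending))             # search over the default bounds
print(fit_level_weight(trending, upper=0.5))  # search with a tighter upper limit
```

For a steadily trending series like this one, the unconstrained optimum lies at or beyond the upper limit, so the fitted weight lands on whichever upper bound you specify. This is the situation in which the choice of bounds changes the fitted model.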
Statistics of Fit Selection Window Use the Statistics of Fit Selection window to specify which of the available goodness-of-fit statistics are reported for models you fit and are available for selection as the model selection criterion used by the automatic selection process. This window is available under the Options menu in the Develop Models, Automatic Model Fitting, Produce Forecasts, and Model List windows, and from the Statistics button of the Model Fit Comparison window and Automatic Model Fitting results windows.
Controls and Fields Select Statistics Table
lists the available statistics. Select a row of the table to select or deselect the statistic shown in that row. OK
closes the window and applies the selections made. Cancel
closes the window without applying any selections. Clear
deselects all statistics. All
selects all statistics.
Time ID Creation – 1,2,3 Window Use the Time ID Creation – 1,2,3 window to add a time ID variable to an input data set with observation numbers as the ID values. The interval for the series will be 1. Use this approach if the data frequency does not match any of the system’s date or date-time intervals, or if other methods of assigning a time ID do not work. To access this window, select “Create from observation numbers” from the Create pop-up list in any window where you can select a Time ID variable. For more information, see Chapter 4, “Date Intervals, Formats, and Functions,” in this book.
Controls and Fields Data set name
is the name of the input data set. New ID variable name
is the name of the time ID variable to be created. You can type any valid SAS variable name in this field. OK
closes the window and proceeds to the next step in the time ID creation process. Cancel
closes the window without creating a Time ID variable. Any options you specified are lost.
Time ID Creation from Several Variables Window Use the Time ID Creation from Several Variables window to add a SAS date valued time ID variable to an input data set when the input data set already contains several dating variables, such as day, month, and year. To access this window, select “Create from existing variables” from the Create pop-up list in any window where you can select a Time ID variable. For more information, see Chapter 38, “Creating Time ID Variables.”
Controls and Fields Variables
is a list of variables in the input data set. Select existing ID variables from this list. Date Part
is a list of date parts that you can specify for the selected ID variable. For each ID variable that you select from the Variables list, select the Date Part value that describes what the values of the ID variable represent. arrow button
moves the selected existing ID variable and date part specification to the “Existing Time IDs” list. Once you have done this, you can select another ID variable from the Variables list. New variable
is the name of the time ID variable to be created. You can type any valid SAS variable name in this field. New interval
is the time interval between observations in the input data set implied by the date part ID variables you have selected. OK
closes the window and proceeds to the next step in the time ID creation process. Cancel
closes the window without creating a time ID. Any options you specified are lost. Reset
resets the options to their initial values upon entry to the window.
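Conceptually, this window pairs each existing dating variable with a date part and combines the parts into a single date value. The Python sketch below illustrates that idea only; it is not how the system builds the variable. The function name `build_time_id`, the variable names `yr` and `mon`, and the part names `YEAR`, `MONTH`, and `DAY` are hypothetical, and defaulting a missing month or day to 1 is an assumption.

```python
from datetime import date

def build_time_id(row, date_parts):
    """Combine separate dating variables into one date value.

    row        -- mapping of variable name to value for one observation
    date_parts -- mapping of variable name to the date part it represents
                  ("YEAR", "MONTH", or "DAY"; hypothetical names)
    Parts that are not supplied default to 1 (assumption).
    """
    parts = {"YEAR": None, "MONTH": 1, "DAY": 1}
    for variable, part in date_parts.items():
        parts[part] = int(row[variable])
    if parts["YEAR"] is None:
        raise ValueError("a YEAR part is required to build a date")
    return date(parts["YEAR"], parts["MONTH"], parts["DAY"])

# Hypothetical observations with year and month stored in separate variables.
rows = [{"yr": 1998, "mon": 2, "sales": 10.0},
        {"yr": 1998, "mon": 3, "sales": 12.5}]
for row in rows:
    print(build_time_id(row, {"yr": "YEAR", "mon": "MONTH"}), row["sales"])
```

Because only year and month parts are supplied in this example, consecutive observations are one month apart, which corresponds to the implied interval shown in the New interval field.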
Time ID Creation from Starting Date Window Use the Time ID Creation from Starting Date window to add a SAS date valued time ID variable to an input data set. This is a convenient way to add a time ID of any interval as long as you know the starting date of the series. To access this window, select “Create from starting date and frequency” from the Create pop-up list in any window where you can select a Time ID variable. For more information, see Chapter 38, “Creating Time ID Variables.”
Controls and Fields Data set name
is the name of the input data set. Starting Date
is the starting date for the time series in the data set. Enter a date value in this field, using a form recognizable by a SAS date informat, for example, 1998:1, feb1997, or 03mar1998. Interval
is the time interval between observations in the data set. Select an interval from the pop-up list. New ID variable name
is the name of the time ID variable to be created. You can type any valid SAS variable name in this field. OK
closes the window and proceeds to the next step in the time ID creation process. Cancel
closes the window without changing the input data set. Any options you specified are lost.
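The idea behind this window, generating one ID value per observation from a starting date and an interval, can be sketched in Python. This is an illustration, not the system's procedure: the function name `make_time_ids` is hypothetical, only four interval names are handled, and aligning month-based intervals to the first day of the period is an assumption made for this sketch.

```python
from datetime import date, timedelta

def make_time_ids(start, interval, n_obs):
    """Generate n_obs date values beginning at start, one per interval.

    interval -- "DAY", "MONTH", "QTR", or "YEAR" (the subset handled here)
    Month-based intervals are aligned to the first day of the period
    that contains start (assumption).
    """
    if interval == "DAY":
        return [start + timedelta(days=i) for i in range(n_obs)]
    steps = {"MONTH": 1, "QTR": 3, "YEAR": 12}
    if interval not in steps:
        raise ValueError("unsupported interval: " + interval)
    step = steps[interval]
    base = start.year * 12 + (start.month - 1)  # months since year 0
    base -= base % step                          # align to the period start
    ids = []
    for i in range(n_obs):
        months = base + i * step
        ids.append(date(months // 12, months % 12 + 1, 1))
    return ids

for value in make_time_ids(date(1998, 11, 15), "MONTH", 3):
    print(value)
```

The month arithmetic works on a running count of months so that the sequence rolls over correctly from December to January of the next year.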
Time ID Creation Using Informat Window Use the Time ID Creation using Informat window to add a SAS date valued time ID variable to an input data set. Use this window if your data set contains a date variable that is stored as a character string. Using the appropriate SAS date informat, the date string is read in and used to create a date or date-time variable. To access this window, select “Create from existing variable/informat” from the Create pop-up list in any window where you can select a Time ID variable.
Controls and Fields Variable Name
is the name of an existing ID variable in the input data set. Click the Select button to select a variable. Select button
opens a list of variables in the input data set for you to select from. Informat
is a SAS date or datetime informat for reading a date or datetime value from the values of the specified existing ID variable. You can type in an informat or select one from the pop-up list. First Obs
is the value of the variable you selected from the first observation in the data set, displayed here for convenience. Date Value
is the SAS date or datetime value read from the first observation value that uses the informat that you specified.
New ID variable name
is the name of the time ID variable to be created. You can type any valid SAS variable name in this field. OK
closes the window and proceeds to the next step in the time ID creation process. Cancel
closes the window without changing the input data set. Any options you specified are lost. Reset
resets the options to their initial values upon entry to the window.
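Reading a character date string with an informat is analogous to parsing the string against a date pattern. The Python sketch below illustrates the idea with `datetime.strptime`. The table that maps informat names to Python patterns is a rough, hypothetical analogue made up for this example: actual SAS informats accept more input forms than these fixed patterns do, and month-name parsing here assumes an English locale.

```python
from datetime import datetime

# Hypothetical, approximate Python analogues of a few SAS date informats.
STRPTIME_PATTERNS = {
    "DATE9.": "%d%b%Y",        # for example, 03mar1998
    "YYMMDD10.": "%Y-%m-%d",   # for example, 1998-03-03
    "MMDDYY10.": "%m/%d/%Y",   # for example, 03/03/1998
}

def read_date(text, informat):
    """Parse a character date string by using the pattern for the named informat."""
    pattern = STRPTIME_PATTERNS[informat]
    return datetime.strptime(text.strip(), pattern).date()

first_obs = "03mar1998"
print(first_obs, "->", read_date(first_obs, "DATE9."))
```

Printing the first observation next to its parsed value mirrors the First Obs and Date Value fields, which let you confirm that the informat reads the string as intended before the new variable is created.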
Time ID Variable Specification Window Use the Time ID Variable Specification window to specify a variable in the input data set that contains the SAS date or datetime value of each observation. You do not need to use this window if your time ID variable is named date, time, or datetime, since these are picked up automatically. Invoke the window from the Select button to the right of the Time ID field in the Data Set Selection, Automatic Model Fitting, Produce Forecasts, Series Selection, and Time Series Forecasting windows.
Controls and Fields Data Set
is the name of the current input data set. Time ID
is the name of the currently selected Time ID variable, if any. Interval
is the time interval between observations (data frequency) in the input data set. Select a Time ID Variable
is a selection list of variables in the input data set. Select one variable to assign it as the Time ID variable. OK
closes the window and retains the selection made, if it is a valid time ID. Cancel
closes the window and ignores any selection made. Reset
restores the time ID variable to the one assigned when the window was initially opened, if any.
Time Ranges Specification Window Use the Time Ranges Specification window to control the period of fit and evaluation and the forecasting horizon. Invoke this window from the Options menu in the Develop Models, Manage Forecasting Project, and Model Viewer windows or the Set Ranges button in the Develop Models window.
Controls and Fields Data Set
is the name of the current input data set. Interval
is the time interval (data frequency) for the input data set. Series
is the variable name and label of the current time series. Data Range
gives the date of the first and last nonmissing data values available for the current series in the input data set. Period of Fit
gives the starting and ending dates of the period of fit. This is the time range used for estimating model parameters. By default, it is the same as the data range. You can type dates in these fields, or you can use the arrow buttons to the left and right of the date fields to decrement or increment the date values shown. Date values must be entered in a form recognized by a SAS date informat. (See SAS Language Reference: Concepts for information about SAS date informats.) The inner arrows increment by periods; the outer arrows increment by larger amounts, depending on the data interval. Period of Evaluation
gives the starting and ending dates of the period of evaluation. This is the time range used for evaluating models in terms of statistics of fit. By default, it is the same as the data range. You can type dates in these fields, or you can use the control arrows to the left and right of the date fields to decrement or increment the date values shown. Date values must be entered in a form recognized by a SAS date informat. (See SAS Language Reference: Concepts for information about SAS date informats.) The inner arrows increment by periods; the outer arrows increment by larger amounts, depending on the data interval. Forecast Horizon
is the forecasting horizon expressed as a number of forecast periods or number of years (or number of weeks for daily data). You can type a number or select one from the pop-up list. The ending date for the forecast period is automatically updated when you change the number of forecast periods. Forecast Horizon - Units
indicates whether the Forecast Horizon value represents periods or years (or weeks for daily data). Forecast Horizon Date Value
is the date of the last forecast observation. You can type a date in this field, or you can use the arrow buttons to the left and right of the date field to decrement or increment the date values shown. Date values must be entered in a form recognized by a SAS date informat. (See SAS Language Reference: Concepts for information about SAS date informats.) The Forecast Horizon is automatically updated when you change the ending date for the forecast period. Hold-out Sample
specifies that a number of observations or years (or weeks) of data at the end of the data range are used for the period of evaluation with the remainder of data used as the period of fit. You can type a number in this field or select one from the pop-up list. When the hold-out sample value is changed, the Period of Fit and Period of Evaluation ranges are changed to reflect the hold-out sample specification. Hold-out Sample - Units
indicates whether the hold-out sample field represents periods or years (or weeks for daily data). OK
closes the window and stores the specified changes. Cancel
closes the window without saving changes. Any options you specified are lost. Reset
resets the options to their initial values upon entry to the window. Clear
resets all options to their default values.
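The relationship between the hold-out sample, the period of fit, and the period of evaluation can be sketched in Python. This is an illustration of the rule described for the Hold-out Sample field, not the system's code: the function name `split_by_holdout` is hypothetical, the hold-out is expressed in periods only, and the period labels are made-up sample values.

```python
def split_by_holdout(dates, holdout):
    """Derive the period of fit and period of evaluation from a hold-out sample.

    dates   -- ordered time ID values spanning the data range
    holdout -- number of periods at the end of the data range reserved
               for evaluation
    With a hold-out of 0, both ranges equal the data range (the default
    described above).
    """
    if not 0 <= holdout < len(dates):
        raise ValueError("hold-out must leave at least one period to fit")
    cut = len(dates) - holdout
    fit = (dates[0], dates[cut - 1])
    evaluation = (dates[cut], dates[-1]) if holdout else (dates[0], dates[-1])
    return fit, evaluation

periods = ["1998:1", "1998:2", "1998:3", "1998:4", "1999:1", "1999:2"]
print(split_by_holdout(periods, 2))
print(split_by_holdout(periods, 0))
```

With a nonzero hold-out, the two ranges do not overlap, so the statistics of fit are computed on data that was not used to estimate the model parameters.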
Time Series Forecasting Window The Time Series Forecasting window is the main application window that appears when you invoke the Time Series Forecasting System. It enables you to specify a project file and an input data set and provides access to the other windows described in this chapter.
Controls and Fields Project
is the name of the SAS catalog entry in which forecasting models and other results will be stored and from which previously stored results are loaded into the forecasting system. You can specify the project by typing a SAS catalog entry name in this field or by selecting the Browse button to the right of this field. If you specify the name of an existing catalog entry, the information in the project file is loaded. If you specify a one-level name, the catalog name is assumed to be fmsproj and the library is assumed to be sasuser. For example, samproj is equivalent to sasuser.fmsproj.samproj. Project Browse button opens the Forecasting Project File Selection window to enable you to select and load the project from a list of previously stored projects. Description
is a descriptive label for the forecasting project. The description you type in this field will be stored with the catalog entry shown in the Project field. Data Set
is the name of the current input data set. To specify the input data set, you can type the data set name in this field or use the Browse button to the right of the field. Data set Browse button opens the Data Set Selection window to enable you to select the input data set. Time ID
is the name of the ID variable for the input data set. To specify the ID variable, you can type the ID variable name in this field or use the Select button. If the time ID variable is named date, time, or datetime, it is automatically picked up by the system. Select button
opens the Time ID Variable Specification window. Create button
opens a menu of choices of methods for creating a time ID variable for the input data set. Use this feature if the input data set does not already contain a valid time ID variable. Interval
is the time interval between observations (data frequency) in the current input data set. If the interval is not automatically filled in, you can type an interval name or select one from the pop-up list. For more information about intervals, see the section “Time Series Data Sets, ID Variables, and Time Intervals” on page 2395. View Series Graphically icon
opens the Time Series Viewer window to display plots of series in the current input data set. View Data as a Table
opens a Viewtable window for browsing the selected input data set. Develop Models
opens the Develop Models window to enable you to fit forecasting models to individual time series and choose the best models to use to produce the final forecasts of each series.
Fit Models Automatically
opens the Automatic Model Fitting window for applying the automatic model selection process to all series or to selected series in an input data set. Produce Forecast
opens the Produce Forecasts window for producing forecasts for the series in the current input data set for which you have fit forecasting models. Manage Projects
opens the Manage Forecasting Project window for viewing or editing information stored in projects. Exit
closes the Time Series Forecasting system. Help
accesses the help system.
Time Series Simulation Window Use the Time Series Simulation window to create a data set of simulated series generated by ARIMA processes. Access this window from the Tools menu in the Develop Models and Manage Forecasting Project windows.
Controls and Fields Output Data Set
is the name of the data set to be created. Type in a one-level or two-level SAS data set name. Interval
is the time interval between observations (data frequency) in the simulated data set. Type in an interval name or select one from the pop-up list. Seed
is the seed for the random number generator used to produce the simulated time series. N Observations
is the number of time periods to simulate. Starting Date
is the starting date for the simulated observations. Type in a date in a form recognizable by a SAS date informat, for example, 1998:1, feb1997, or 03mar1998. Ending Date
is the ending date for the simulated observations. Type in a date in a form recognizable by a SAS date informat. Series to Generate
is the list of variable names and ARIMA processes to simulate. Add Series
opens the ARIMA Process Specification window to enable you to add entries to the Series to Generate list. Delete Series
deletes selected (highlighted) entries from the Series to Generate list. OK
closes the Time Series Simulation window and performs the specified simulations and creates the specified data set. Cancel
closes the window without creating a simulated data set. Any options you specified are lost.
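The role of the Seed and N Observations fields can be illustrated with a minimal Python sketch that simulates a stationary ARMA process. This is not the generator the system uses: the function name `simulate_arma`, the default seed, the Gaussian shocks, the zero pre-sample values, and the sign convention for the moving-average terms are all assumptions made for this example.

```python
import random

def simulate_arma(n_obs, ar=(), ma=(), seed=12345, sigma=1.0):
    """Simulate n_obs values of an ARMA process with Gaussian shocks.

    Convention assumed here:
        y[t] = sum(ar[i] * y[t-1-i]) + e[t] - sum(ma[j] * e[t-1-j])
    Values before the start of the series are treated as zero.
    """
    rng = random.Random(seed)  # the same seed reproduces the same series
    y, shocks = [], []
    for t in range(n_obs):
        shock = rng.gauss(0.0, sigma)
        value = shock
        for i, phi in enumerate(ar, start=1):
            if t - i >= 0:
                value += phi * y[t - i]
        for j, theta in enumerate(ma, start=1):
            if t - j >= 0:
                value -= theta * shocks[t - j]
        y.append(value)
        shocks.append(shock)
    return y

series = simulate_arma(100, ar=[0.5], ma=[0.3], seed=2024)
print(len(series), series[:3])
```

Fixing the seed makes a simulation repeatable, which is useful when you want to compare how different models fit the same simulated series.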
Time Series Viewer Window Use the Time Series Viewer window to explore time series data using plots, transformations, statistical tests, and tables. It is available as a standalone application and as part of the Time Series Forecasting System. To use it as a standalone application, select it from the Analysis submenu of the Solutions menu, or use the tsview command (see Chapter 42, “Command Reference,” in this book). To use it within the Time Series Forecasting System, select the View Series Graphically icon in the Time Series Forecasting, Develop Models, or Model List window, or select “Series” from the View menu of the Develop Models, Manage Project, or Model List window. The various plots and tables available are referred to as views. The section “View Selection Icons” on page 2617 explains how to change the view.
The state of the Time Series Viewer window is controlled by the current series, the current series transformation specification, and the currently selected view. You can resize this window, and you can use other windows without closing the Time Series Viewer window. You can explore a number of series conveniently by keeping the Series Selection window open. Each time you make a selection, the viewer window is updated to show the selected series. Keep both windows visible, or switch between them by using the Next Viewer toolbar icon or the F12 function key. You can open multiple Time Series Viewer windows. This enables you to “freeze” a plot so you can come back to it later, or compare two plots side by side on your screen. To do this, unlink the viewer by using the Link/Unlink icon on the window’s toolbar or the corresponding item in the Tools menu. While the viewer window remains unlinked, it is not updated when other selections are made in the Series Selection window. Instead, when you select a series and click the Graph button, a new Time Series Viewer window is invoked. You can continue this process to open as many viewer windows as you want. The Next Viewer icon and corresponding F12 function key are useful for navigating between windows when they are not simultaneously visible on your screen. A wide range of series transformations is available. Basic transformations are available from the window’s horizontal toolbar, and others are available by selecting “Other Transformations” from the Tools menu.
Horizontal Tool Bar The Time Series Viewer window contains a horizontal toolbar with the following icons:
Zoom in
changes the mouse cursor into cross hairs that you can use with the left mouse button to drag out a region of the time series plot to zoom in on. In the Autocorrelations view and the White Noise and Stationarity Tests view, Zoom In reduces the number of lags displayed. Zoom out
reverses the previous Zoom In action and expands the time range of the plot to show more of the series. In the Autocorrelations view and the White Noise and Stationarity Tests view, Zoom Out increases the number of lags displayed. Link/Unlink viewer
disconnects or connects the Time Series Viewer window to the window in which the series was selected. When the Viewer is linked, it always shows the current series. If you select another series, linked Viewers are updated. Unlinking a Viewer freezes its current state, and changing the current series has no effect on the Viewer’s display. The View Series action creates a new Series Viewer window if there is no linked Viewer. By using the unlink feature, you can open several Time Series Viewer windows and display several different series simultaneously. Log Transform
applies a log transform to the current view. This can be combined with other transformations; the current transformations are shown in the title. Difference
applies a simple difference to the current view. This can be combined with other transformations; the current transformations are shown in the title. Seasonal Difference
applies a seasonal difference to the current view. For example, if the data are monthly, the seasonal cycle is one year. Each value has subtracted from it the value from one year previous. This can be combined with other transformations; the current transformations are shown in the title. Close
closes the Time Series Viewer window and returns to the window from which it was invoked.
Vertical Toolbar View Selection Icons

At the right-hand side of the Time Series Viewer window is a vertical toolbar used to select the kind of plot or table that the Viewer displays.

Series
displays a plot of series values over time.

Autocorrelations
displays plots of the sample autocorrelation, partial autocorrelation, and inverse autocorrelation functions for the series, with lines overlaid at plus and minus two standard errors.

White Noise and Stationarity Tests
displays horizontal bar charts that represent results of white noise and stationarity tests. The first bar chart shows the significance probability of the Ljung-Box chi-square statistic computed on autocorrelations up to the given lag. Longer bars favor rejection of the null hypothesis that the series is white noise. Click any of the bars to display an interpretation.

The second bar chart shows tests of stationarity, where longer bars favor the conclusion that the series is stationary. Each bar displays the significance probability of the augmented Dickey-Fuller unit root test to the given autoregressive lag. Long bars represent higher levels of significance against the null hypothesis that the series contains a unit root. For seasonal data, a third bar chart appears for seasonal root tests. Click any of the bars to display an interpretation.

Data Table
displays a data table containing the values in the input data set.
Menu Bar

File

Save Graph
saves the current plot as a SAS/GRAPH grseg catalog entry in a default or most recently specified catalog. This item is unavailable in the Data Table view.

Save Graph as
saves the current graph as a SAS/GRAPH grseg catalog entry in a SAS catalog that you specify and/or as an Output Delivery System (ODS) object. By default, an HTML page is created, with the graph embedded as a gif image. This item is unavailable in the Data Table view.

Save Data
saves the data displayed in the viewer window to an output SAS data set. This item is unavailable in the Series view.

Save Data as
saves the data in a SAS data set that you specify and/or as an Output Delivery System (ODS) object. By default, an HTML page is created, with the data displayed as a table.

Import Data
is available if you license SAS/Access software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System.

Export Data
is available if you license SAS/Access software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database.

Print Graph
prints the plot displayed in the viewer window. This item is unavailable in the Data Table view.

Print Data
prints the data displayed in the viewer window. This item is unavailable in the Series view.

Print Setup
opens the Print Setup window, which allows you to access your operating system print setup.

Print Preview
opens a preview window to show how your plots will look when printed.

Close
closes the Time Series Viewer window and returns to the window from which it was invoked.

View

Series
displays a plot of series values over time. This is the same as the Series icon in the vertical toolbar.

Autocorrelations
displays plots of the sample autocorrelation, partial autocorrelation, and inverse autocorrelation functions for the series. This is the same as the Autocorrelations icon in the vertical toolbar.

White Noise and Stationarity Tests
displays horizontal bar charts representing results of white noise and stationarity tests. This is the same as the White Noise and Stationarity Tests icon in the vertical toolbar.

Data Table
displays a data table containing the values in the input data set. This is the same as the Data Table icon in the vertical toolbar.

Zoom In
zooms the display. This is the same as the Zoom In icon in the window’s horizontal toolbar.

Zoom Out
undoes the last zoom in action. This is the same as the Zoom Out icon in the window’s horizontal toolbar.

Zoom Way Out
reverses all previous Zoom In actions and expands the time range of the plot to show all of the series, or shows the maximum number of lags in the Autocorrelations view or the White Noise and Stationarity Tests view.

Tools

Log Transform
applies a log transformation. This is the same as the Log Transform icon in the window’s horizontal toolbar.

Difference
applies simple differencing. This is the same as the Difference icon in the window’s horizontal toolbar.

Seasonal Difference
applies seasonal differencing. This is the same as the Seasonal Difference icon in the window’s horizontal toolbar.

Other Transformations
opens the Series Viewer Transformations window to enable you to apply a wide range of transformations.

Diagnose Series
opens the Series Diagnostics window to determine the kinds of forecasting models appropriate for the current series.

Define Interventions
opens the Interventions for Series window to enable you to edit or add intervention effects for use in modeling the current series.

Link Viewer
connects or disconnects the Time Series Viewer window to the window from which series are selected. This is the same as the Link item in the window’s horizontal toolbar.

Options

Number of Lags
opens a window to let you specify the number of lags shown in the Autocorrelations view and the White Noise and Stationarity Tests view. You can also use the Zoom In and Zoom Out actions to control the number of lags displayed.

Correlation Probabilities
controls whether the bar charts in the Autocorrelations view represent significance probabilities or values of the correlation coefficient. A check mark or filled check box next to this item indicates that significance probabilities are displayed. In each case the bar graph horizontal axis label changes accordingly.
Mouse Button Actions You can examine the data value and date of individual points in the Series view by clicking them. The date and value are displayed in a box that appears in the upper right corner of the Viewer window. Click the mouse elsewhere or select any action to dismiss the data box. You can examine the values of the bars and confidence limits at different lags in the Autocorrelations view by clicking individual bars in the vertical bar charts. You can display an interpretation of the tests in the White Noise and Stationarity Tests view by clicking the bars. When you select the Zoom In action, you can use the mouse to define a region of the graph to take a closer look at. Position the mouse cursor at one corner of the region, press the left mouse button, and move the mouse cursor to the opposite corner of the region while holding the left mouse button down. When you release the mouse button, the plot is redrawn to show an expanded view of the data within the region you selected.
Chapter 44
Forecasting Process Details

Contents
Forecasting Process Summary
    Parameter Estimation
    Model Evaluation
    Forecasting
    Forecast Combination Models
    External or User-Supplied Forecasts
    Adjustments
    Series Transformations
Smoothing Models
    Smoothing Model Calculations
    Missing Values
    Predictions and Prediction Errors
    Smoothing Weights
    Equations for the Smoothing Models
ARIMA Models
    Notation for ARIMA Models
Predictor Series
    Time Trend Curves
    Intervention Effects
    Seasonal Dummy Inputs
Series Diagnostic Tests
Statistics of Fit
References
This chapter provides computational details on several aspects of the Time Series Forecasting System.
Forecasting Process Summary This section summarizes the forecasting process.
Parameter Estimation The parameter estimation process for ARIMA and smoothing models is described graphically in Figure 44.1. Figure 44.1 Model Fitting Flow Diagram
The specification of smoothing and ARIMA models is described in Chapter 39, “Specifying Forecasting Models.” Computational details for these kinds of models are provided in the following sections “Smoothing Models” on page 2669 and “ARIMA Models” on page 2680. The results of the parameter estimation process are displayed in the Parameter Estimates table of the Model Viewer windows along with the estimate of the model variance and the final smoothing state.
Model Evaluation The model evaluation process is described graphically in Figure 44.2.
Figure 44.2 Model Evaluation Flow Diagram
Model evaluation is based on the one-step-ahead prediction errors for observations within the period of evaluation. The one-step-ahead predictions are generated from the model specification and
parameter estimates. The predictions are inverse transformed (median or mean) and adjustments are removed. The prediction errors (the difference of the dependent series and the predictions) are used to compute the statistics of fit, which are described in the section “Series Diagnostic Tests” on page 2687. The results generated by the evaluation process are displayed in the Statistics of Fit table of the Model Viewer window.
Forecasting The forecasting generation process is described graphically in Figure 44.3.
Figure 44.3 Forecasting Flow Diagram
The forecasting process is similar to the model evaluation process described in the preceding section, except that k-step-ahead predictions are made from the end of the data through the specified forecast horizon, and prediction standard errors and confidence limits are calculated. The forecasts and confidence limits are displayed in the Forecast plot or table of the Model Viewer window.
Forecast Combination Models

This section discusses the computation of predicted values and confidence limits for forecast combination models. See Chapter 39, “Specifying Forecasting Models,” for information about how to specify forecast combination models and their combining weights.

Given the response time series $\{y_t : 1 \le t \le n\}$ with previously generated forecasts for the $m$ component models, a combined forecast is created from the component forecasts as follows:

Predictions:
\[ \hat{y}_t = \sum_{i=1}^{m} w_i \hat{y}_{i,t} \]
Prediction Errors:
\[ \hat{e}_t = y_t - \hat{y}_t \]

where $\hat{y}_{i,t}$ are the forecasts of the component models and $w_i$ are the combining weights.

The estimate of the root mean square prediction error and forecast confidence limits for the combined forecast are computed by assuming independence of the prediction errors of the component forecasts, as follows:

Standard Errors:
\[ \hat{\sigma}_t = \sqrt{\sum_{i=1}^{m} w_i^2 \hat{\sigma}_{i,t}^2} \]
Confidence Limits:
\[ \pm \hat{\sigma}_t Z_{\alpha/2} \]

where $\hat{\sigma}_{i,t}$ are the estimated root mean square prediction errors for the component models, $\alpha$ is the confidence limit width, $1 - \alpha$ is the confidence level, and $Z_{\alpha/2}$ is the $\alpha/2$ quantile of the standard normal distribution.

Since, in practice, there might be positive correlation between the prediction errors of the component forecasts, these confidence limits may be too narrow.
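The combination formulas can be sketched in a few lines of Python. This is only an illustration under the independence assumption stated above, not the system's implementation; the function name `combine_forecasts` and the sample numbers are hypothetical. The code uses the magnitude of the normal critical value, which gives the same symmetric limits as the $\pm$ form.

```python
import math
from statistics import NormalDist

def combine_forecasts(component_preds, component_rmse, weights, alpha=0.05):
    """Combine component forecasts for one time point.

    component_preds: predictions from the m component models
    component_rmse:  their estimated root mean square prediction errors
    weights:         combining weights w_i
    Returns (prediction, standard_error, lower_limit, upper_limit).
    """
    pred = sum(w * y for w, y in zip(weights, component_preds))
    # Component prediction errors are assumed independent.
    se = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, component_rmse)))
    z = NormalDist().inv_cdf(1 - alpha / 2)  # magnitude of the critical value
    return pred, se, pred - z * se, pred + z * se

# Hypothetical example: two component models with equal weights.
pred, se, lo, hi = combine_forecasts([100.0, 110.0], [4.0, 3.0], [0.5, 0.5])
```

If the component errors are positively correlated, the true standard error is larger than `se`, which is the caveat raised in the text.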
External or User-Supplied Forecasts

This section discusses the computation of predicted values and confidence limits for external forecast models. Given a response time series $y_t$ and external forecast series $\hat{y}_t$, the prediction errors are computed as $\hat{e}_t = y_t - \hat{y}_t$ for those $t$ for which both $y_t$ and $\hat{y}_t$ are nonmissing. The mean squared error (MSE) is computed from the prediction errors. The variance of the $k$-step-ahead prediction errors is set to $k$ times the MSE. From these variances, the standard errors and confidence limits are computed in the usual way. If the supplied predictions contain so many missing values within the time range of the response series that the MSE estimate cannot be computed, the confidence limits, standard errors, and statistics of fit are set to missing.
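A minimal Python sketch of this rule follows. It assumes that missing values are represented by `None` and that the MSE is a simple average of squared errors; the text does not state the divisor, so that choice is an assumption, and the function names are hypothetical.

```python
import math

def external_forecast_errors(y, y_hat):
    """Prediction errors and MSE, using only time points where both values are present."""
    errors = [a - f for a, f in zip(y, y_hat) if a is not None and f is not None]
    if not errors:
        return errors, None  # MSE cannot be computed
    mse = sum(e * e for e in errors) / len(errors)  # assumed divisor: number of errors
    return errors, mse

def k_step_standard_error(mse, k):
    """Standard error of the k-step-ahead prediction error: sqrt(k * MSE)."""
    return None if mse is None else math.sqrt(k * mse)

# Hypothetical data with one missing value in each series.
errors, mse = external_forecast_errors([10.0, 12.0, None, 15.0], [9.0, 14.0, 13.0, None])
se_4 = k_step_standard_error(mse, 4)
```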
Adjustments

Adjustment predictors are subtracted from the response time series prior to model parameter estimation, evaluation, and forecasting. After the predictions of the adjusted response time series are obtained from the forecasting model, the adjustments are added back to produce the forecasts.

If $y_t$ is the response time series and $X_{i,t}$, $1 \le i \le m$, are $m$ adjustment predictor series, then the adjusted response series $w_t$ is
\[ w_t = y_t - \sum_{i=1}^{m} X_{i,t} \]
Parameter estimation for the model is performed by using the adjusted response time series $w_t$. The forecasts $\hat{w}_t$ of $w_t$ are adjusted to obtain the forecasts $\hat{y}_t$ of $y_t$:
\[ \hat{y}_t = \hat{w}_t + \sum_{i=1}^{m} X_{i,t} \]
Missing values in an adjustment series are ignored in these computations.
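The subtraction and add-back steps can be sketched as follows. The sketch reads "ignored" as "a missing adjustment value contributes nothing to the sum", which is an interpretation rather than something the text spells out. The function names and the sample series are hypothetical.

```python
def adjusted_response(y, adjustments):
    """w_t = y_t - sum_i X_{i,t}; a missing adjustment value (None) contributes nothing."""
    return [yt - sum(x[t] for x in adjustments if x[t] is not None)
            for t, yt in enumerate(y)]

def restore_adjustments(w_hat, adjustments):
    """y_hat_t = w_hat_t + sum_i X_{i,t}, with the same treatment of missing values."""
    return [wt + sum(x[t] for x in adjustments if x[t] is not None)
            for t, wt in enumerate(w_hat)]

y = [100.0, 120.0, 90.0]
promo = [0.0, 15.0, None]    # hypothetical adjustment series with a missing value
holiday = [5.0, 0.0, 0.0]    # hypothetical adjustment series
w = adjusted_response(y, [promo, holiday])
```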
Series Transformations

For pure ARIMA models, transforming the response time series can aid in obtaining stationary noise series. For general ARIMA models with inputs, transforming the response time series or one or more of the input time series can provide a better model fit. Similarly, the fit of smoothing models can improve when the response series is transformed.

There are four transformations available, for strictly positive series only. Let $y_t > 0$ be the original time series, and let $w_t$ be the transformed series. The transformations are defined as follows:

Log
is the logarithmic transformation,
\[ w_t = \ln(y_t) \]

Logistic
is the logistic transformation,
\[ w_t = \ln\left(c y_t / (1 - c y_t)\right) \]
where the scaling factor $c$ is
\[ c = (1 - 10^{-6})\, 10^{-\mathrm{ceil}(\log_{10}(\max(y_t)))} \]
and $\mathrm{ceil}(x)$ is the smallest integer greater than or equal to $x$.

Square Root
is the square root transformation,
\[ w_t = \sqrt{y_t} \]

Box Cox
is the Box-Cox transformation,
\[ w_t = \begin{cases} \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \ne 0 \\ \ln(y_t), & \lambda = 0 \end{cases} \]
Parameter estimation is performed by using the transformed series. The transformed model predictions and confidence limits are then obtained from the transformed time series and these parameter estimates. The transformed model predictions wO t are used to obtain either the minimum mean absolute error (MMAE) or minimum mean squared error (MMSE) predictions yOt , depending on the setting of the forecast options. The model is then evaluated based on the residuals of the original time series and these predictions. The transformed model confidence limits are inverse-transformed to obtain the forecast confidence limits.
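The four transformations and their inverses can be written out as a short Python sketch. The function names `forward`, `inverse`, and `logistic_scale` are hypothetical, and the code simply applies the definitions in this section to a strictly positive sample series.

```python
import math

def logistic_scale(series):
    """Scaling factor c = (1 - 1e-6) * 10**(-ceil(log10(max(y))))."""
    return (1 - 1e-6) * 10.0 ** (-math.ceil(math.log10(max(series))))

def forward(y, kind, c=None, lam=None):
    """Apply one of the four transformations to a strictly positive value y."""
    if kind == "log":
        return math.log(y)
    if kind == "logistic":
        return math.log(c * y / (1 - c * y))
    if kind == "sqrt":
        return math.sqrt(y)
    if kind == "boxcox":
        return math.log(y) if lam == 0 else (y ** lam - 1) / lam
    raise ValueError(f"unknown transformation: {kind}")

def inverse(w, kind, c=None, lam=None):
    """Inverse transformation; applied to a prediction it gives the median (MMAE) value."""
    if kind == "log":
        return math.exp(w)
    if kind == "logistic":
        return 1 / (c * (1 + math.exp(-w)))
    if kind == "sqrt":
        return w ** 2
    if kind == "boxcox":
        return math.exp(w) if lam == 0 else (lam * w + 1) ** (1 / lam)
    raise ValueError(f"unknown transformation: {kind}")

series = [12.0, 48.0, 95.0]   # hypothetical strictly positive series
c = logistic_scale(series)    # keeps c * y strictly below 1
```

The factor $(1 - 10^{-6})$ keeps $c\,y_t$ strictly below one even when the series maximum is an exact power of ten, so the logistic transformation stays finite.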
Predictions for Transformed Models

Since the transformations described in the previous section are monotonic, applying the inverse transformation to the transformed model predictions results in the median of the conditional probability density function at each point in time. This is the minimum mean absolute error (MMAE) prediction.

If $w_t = F(y_t)$ is the transform with inverse-transform $y_t = F^{-1}(w_t)$, then
\[ \mathrm{median}(\hat{y}_t) = F^{-1}(E[w_t]) = F^{-1}(\hat{w}_t) \]

The minimum mean squared error (MMSE) predictions are the mean of the conditional probability density function at each point in time. Assuming that the prediction errors are normally distributed with variance $\sigma_t^2$, the MMSE predictions for each of the transformations are as follows:

Log
is the conditional expectation of the inverse-logarithmic transformation,
\[ \hat{y}_t = E\left[e^{w_t}\right] = \exp\left(\hat{w}_t + \sigma_t^2/2\right) \]

Logistic
is the conditional expectation of the inverse-logistic transformation,
\[ \hat{y}_t = E\left[\frac{1}{c\,(1 + \exp(-w_t))}\right] \]
where the scaling factor $c = (1 - e^{-6})\, 10^{-\mathrm{ceil}(\log_{10}(\max(y_t)))}$.

Square Root
is the conditional expectation of the inverse-square root transformation,
\[ \hat{y}_t = E\left[w_t^2\right] = \hat{w}_t^2 + \sigma_t^2 \]

Box Cox
is the conditional expectation of the inverse Box-Cox transformation,
\[ \hat{y}_t = \begin{cases} E\left[(\lambda w_t + 1)^{1/\lambda}\right], & \lambda \ne 0 \\ E\left[e^{w_t}\right] = \exp\left(\hat{w}_t + \tfrac{1}{2}\sigma_t^2\right), & \lambda = 0 \end{cases} \]

The expectations of the inverse logistic and Box-Cox ($\lambda \ne 0$) transformations do not generally have explicit solutions and are computed by using numerical integration.
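The difference between the median (MMAE) and mean (MMSE) predictions, and the role of numerical integration, can be illustrated with a short Python sketch. The trapezoidal rule used here is an illustrative choice, since the text does not say which integration method the system uses. The function names are hypothetical.

```python
import math

def mmse_log(w_hat, sigma2):
    """Closed-form MMSE prediction for the log transformation."""
    return math.exp(w_hat + sigma2 / 2)

def mmse_sqrt(w_hat, sigma2):
    """Closed-form MMSE prediction for the square root transformation."""
    return w_hat ** 2 + sigma2

def mmse_numeric(inverse, w_hat, sigma2, half_width=8.0, steps=2000):
    """Approximate E[inverse(w)] for w ~ N(w_hat, sigma2) with the trapezoidal
    rule over w_hat +/- half_width standard deviations."""
    sigma = math.sqrt(sigma2)
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        w = w_hat - half_width * sigma + i * h
        density = math.exp(-0.5 * ((w - w_hat) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * inverse(w) * density * h
    return total

median_log = math.exp(2.0)                        # MMAE prediction for w_hat = 2
mean_log = mmse_log(2.0, 0.25)                    # MMSE prediction, larger than the median
numeric_log = mmse_numeric(math.exp, 2.0, 0.25)   # numerical check of the closed form
```

The same `mmse_numeric` routine could be passed the inverse logistic or inverse Box-Cox function, provided that function is defined over the whole integration range.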
Smoothing Models This section details the computations performed for the exponential smoothing and Winters method forecasting models.
Smoothing Model Calculations

The descriptions and properties of various smoothing methods can be found in Gardner (1985), Chatfield (1978), and Bowerman and O’Connell (1979). The following section summarizes the smoothing model computations.

Given a time series $\{Y_t : 1 \le t \le n\}$, the underlying model assumed by the smoothing models has the following (additive seasonal) form:
\[ Y_t = \mu_t + \beta_t t + s_p(t) + \epsilon_t \]
where

$\mu_t$ represents the time-varying mean term.
$\beta_t$ represents the time-varying slope.
$s_p(t)$ represents the time-varying seasonal contribution for one of the $p$ seasons.
$\epsilon_t$ are disturbances.

For smoothing models without trend terms, $\beta_t = 0$; and for smoothing models without seasonal terms, $s_p(t) = 0$. Each smoothing model is described in the following sections.

At each time $t$, the smoothing models estimate the time-varying components described above with the smoothing state. After initialization, the smoothing state is updated for each observation using the smoothing equations. The smoothing state at the last nonmissing observation is used for predictions.
Smoothing State and Smoothing Equations

Depending on the smoothing model, the smoothing state at time $t$ consists of the following:

$L_t$ is a smoothed level that estimates $\mu_t$.
$T_t$ is a smoothed trend that estimates $\beta_t$.
$S_{t-j}$, $j = 0, \ldots, p-1$, are seasonal factors that estimate $s_p(t)$.

The smoothing process starts with an initial estimate of the smoothing state, which is subsequently updated for each observation by using the smoothing equations.
The smoothing equations determine how the smoothing state changes as time progresses. Knowledge of the smoothing state at time $t-1$ and that of the time series value at time $t$ uniquely determine the smoothing state at time $t$. The smoothing weights determine the contribution of the previous smoothing state to the current smoothing state. The smoothing equations for each smoothing model are listed in the following sections.
Smoothing State Initialization

Given a time series $\{Y_t : 1 \le t \le n\}$, the smoothing process first computes the smoothing state for time $t = 1$. However, this computation requires an initial estimate of the smoothing state at time $t = 0$, even though no data exist at or before time $t = 0$.

An appropriate choice for the initial smoothing state is made by backcasting from time $t = n$ to $t = 1$ to obtain a prediction at $t = 0$. The initialization for the backcast is obtained by regression with constant and linear terms and seasonal dummies (additive or multiplicative) as appropriate for the smoothing model. For models with linear or seasonal terms, the estimates obtained by the regression are used for initial smoothed trend and seasonal factors; however, the initial smoothed level for backcasting is always set to the last observation, $Y_n$.

The smoothing state at time $t = 0$ obtained from the backcast is used to initialize the smoothing process from time $t = 1$ to $t = n$ (Chatfield and Yar 1988).

For models with seasonal terms, the smoothing state is normalized so that the seasonal factors $S_{t-j}$ for $j = 0, \ldots, p-1$ sum to zero for models that assume additive seasonality and average to one for models (such as Winters method) that assume multiplicative seasonality.
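The normalization targets for the seasonal factors can be illustrated with a small Python sketch. The text states the targets (sum to zero, average to one) but not how they are reached, so subtracting or dividing by the mean is an assumption here, and the function names are hypothetical. The sketch also does not cover any compensating change to the smoothed level.

```python
def normalize_additive(factors):
    """Shift additive seasonal factors so that they sum to zero."""
    mean = sum(factors) / len(factors)
    return [f - mean for f in factors]

def normalize_multiplicative(factors):
    """Rescale multiplicative seasonal factors so that they average to one."""
    mean = sum(factors) / len(factors)
    return [f / mean for f in factors]

additive = normalize_additive([2.0, -1.0, 0.5, 1.5])             # hypothetical factors, p = 4
multiplicative = normalize_multiplicative([1.2, 0.9, 1.1, 1.0])  # hypothetical factors, p = 4
```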
Missing Values

When a missing value is encountered at time $t$, the smoothed values are updated using the error-correction form of the smoothing equations with the one-step-ahead prediction error, $e_t$, set to zero. The missing value is estimated using the one-step-ahead prediction at time $t-1$, that is, $\hat{Y}_{t-1}(1)$ (Aldrin 1989). The error-correction forms of each of the smoothing models are listed in the following sections.
Predictions and Prediction Errors

Predictions are made based on the last known smoothing state. Predictions made at time $t$ for $k$ steps ahead are denoted $\hat{Y}_t(k)$ and the associated prediction errors are denoted $e_t(k) = Y_{t+k} - \hat{Y}_t(k)$. The prediction equation for each smoothing model is listed in the following sections.

The one-step-ahead predictions refer to predictions made at time $t-1$ for one time unit into the future, that is, $\hat{Y}_{t-1}(1)$. The one-step-ahead prediction errors are more simply denoted $e_t = e_{t-1}(1) = Y_t - \hat{Y}_{t-1}(1)$. The one-step-ahead prediction errors are also the model residuals, and the sum of squares of the one-step-ahead prediction errors is the objective function used in smoothing weight optimization.

The variance of the prediction errors is used to calculate the confidence limits (Sweet 1985, McKenzie 1986, Yar and Chatfield 1990, and Chatfield and Yar 1991). The equations for the variance of the prediction errors for each smoothing model are listed in the following sections.

Note: $\mathrm{var}(\epsilon_t)$ is estimated by the mean square of the one-step-ahead prediction errors.
Smoothing Weights

Depending on the smoothing model, the smoothing weights consist of the following:

$\alpha$ is a level smoothing weight.
$\gamma$ is a trend smoothing weight.
$\delta$ is a seasonal smoothing weight.
$\phi$ is a trend damping weight.
Larger smoothing weights (less damping) permit the more recent data to have a greater influence on the predictions. Smaller smoothing weights (more damping) give less weight to recent data.
Specifying the Smoothing Weights Typically the smoothing weights are chosen to be from zero to one. (This is intuitive because the weights associated with the past smoothing state and the value of current observation would normally sum to one.) However, each smoothing model (except Winters Method—Multiplicative Version) has an ARIMA equivalent. Weights chosen to be within the ARIMA additive-invertible region will guarantee stable predictions (Archibald 1990 and Gardner 1985). The ARIMA equivalent and the additive-invertible region for each smoothing model are listed in the following sections.
Optimizing the Smoothing Weights

Smoothing weights are determined so as to minimize the sum of squared, one-step-ahead prediction errors. The optimization is initialized by choosing from a predetermined grid the initial smoothing weights that result in the smallest sum of squared, one-step-ahead prediction errors. The optimization process is highly dependent on this initialization. It is possible that the optimization process will fail due to the inability to obtain stable initial values for the smoothing weights (Greene 1993 and Judge et al. 1980), and it is possible for the optimization to result in a local minimum.

The optimization process can result in weights being chosen outside both the zero-to-one range and the ARIMA additive-invertible region. By restricting weight optimization to the additive-invertible region, you can obtain a local minimum with stable predictions. Likewise, weight optimization can be restricted to the zero-to-one range or other ranges. It is also possible to fix certain weights to a specific value and optimize the remaining weights.
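The grid-based initialization can be sketched for the simplest case, a single level weight. The sketch assumes that the initial level is supplied by the caller rather than obtained by backcasting, and the grid values are arbitrary. The names `sse_simple` and `best_grid_weight` are hypothetical.

```python
def sse_simple(series, alpha, level0):
    """Sum of squared one-step-ahead prediction errors for simple exponential smoothing."""
    level, sse = level0, 0.0
    for y in series:
        e = y - level        # one-step-ahead prediction error
        sse += e * e
        level += alpha * e   # error-correction update of the level
    return sse

def best_grid_weight(series, grid, level0):
    """Pick the grid value with the smallest SSE as the optimizer's starting point."""
    return min(grid, key=lambda a: sse_simple(series, a, level0))

series = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical trending series
grid = [0.1, 0.3, 0.5, 0.7, 0.9]      # hypothetical predetermined grid
start = best_grid_weight(series, grid, level0=1.0)
```

A numerical optimizer would then refine `start`, optionally constrained to the zero-to-one range or to the additive-invertible region.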
Standard Errors

The standard errors associated with the smoothing weights are calculated from the Hessian matrix of the sum of squared, one-step-ahead prediction errors with respect to the smoothing weights used in the optimization process.

Weights Near Zero or One

Sometimes the optimization process results in weights near zero or one.

For simple or double (Brown) exponential smoothing, a level weight near zero implies that simple differencing of the time series might be appropriate.

For linear (Holt) exponential smoothing, a level weight near zero implies that the smoothed trend is constant and that an ARIMA model with deterministic trend might be a more appropriate model.

For damped-trend linear exponential smoothing, a damping weight near one implies that linear (Holt) exponential smoothing might be a more appropriate model.

For Winters method and seasonal exponential smoothing, a seasonal weight near one implies that a nonseasonal model might be more appropriate and a seasonal weight near zero implies that deterministic seasonal factors might be present.
Equations for the Smoothing Models

Simple Exponential Smoothing

The model equation for simple exponential smoothing is
\[ Y_t = \mu_t + \epsilon_t \]
The smoothing equation is
\[ L_t = \alpha Y_t + (1 - \alpha) L_{t-1} \]
The error-correction form of the smoothing equation is
\[ L_t = L_{t-1} + \alpha e_t \]
(Note: For missing values, $e_t = 0$.)

The $k$-step prediction equation is
\[ \hat{Y}_t(k) = L_t \]
The ARIMA model equivalency to simple exponential smoothing is the ARIMA(0,1,1) model
\[ (1 - B) Y_t = (1 - \theta B)\epsilon_t, \qquad \theta = 1 - \alpha \]
The moving-average form of the equation is
\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} \alpha \epsilon_{t-j} \]
For simple exponential smoothing, the additive-invertible region is
\[ \{0 < \alpha < 2\} \]
The variance of the prediction errors is estimated as
\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t)\left[1 + \sum_{j=1}^{k-1} \alpha^2\right] = \mathrm{var}(\epsilon_t)\left(1 + (k-1)\alpha^2\right) \]
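As an illustration of the error-correction form and the treatment of missing values, here is a minimal Python sketch of simple exponential smoothing. The initial level is passed in directly instead of being obtained by backcasting, missing observations are represented by `None`, and the function names are hypothetical.

```python
def simple_exponential_smoothing(series, alpha, level0):
    """Apply L_t = L_{t-1} + alpha * e_t to each observation.

    A missing observation (None) uses e_t = 0, so the level is carried forward.
    Returns (final_level, one_step_errors).
    """
    level, errors = level0, []
    for y in series:
        e = 0.0 if y is None else y - level   # one-step-ahead prediction error
        errors.append(e)
        level += alpha * e
    return level, errors

def simple_prediction_variance(noise_var, alpha, k):
    """var(e_t(k)) = var(eps_t) * (1 + (k - 1) * alpha**2)."""
    return noise_var * (1 + (k - 1) * alpha ** 2)

# Hypothetical series with one missing value.
final_level, errors = simple_exponential_smoothing([10.0, 12.0, None, 11.0], 0.5, 10.0)
var_3 = simple_prediction_variance(4.0, 0.5, 3)
```

Because the $k$-step prediction equals the final level for every $k$, the forecast is flat and only the variance grows with the horizon.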
Double (Brown) Exponential Smoothing

The model equation for double exponential smoothing is
\[ Y_t = \mu_t + \beta_t t + \epsilon_t \]
The smoothing equations are
\[ L_t = \alpha Y_t + (1 - \alpha) L_{t-1} \]
\[ T_t = \alpha (L_t - L_{t-1}) + (1 - \alpha) T_{t-1} \]
This method can be equivalently described in terms of two successive applications of simple exponential smoothing:
\[ S_t^{[1]} = \alpha Y_t + (1 - \alpha) S_{t-1}^{[1]} \]
\[ S_t^{[2]} = \alpha S_t^{[1]} + (1 - \alpha) S_{t-1}^{[2]} \]
where $S_t^{[1]}$ are the smoothed values of $Y_t$, and $S_t^{[2]}$ are the smoothed values of $S_t^{[1]}$. The prediction equation then takes the form:
\[ \hat{Y}_t(k) = \left(2 + \alpha k/(1 - \alpha)\right) S_t^{[1]} - \left(1 + \alpha k/(1 - \alpha)\right) S_t^{[2]} \]
The error-correction forms of the smoothing equations are
\[ L_t = L_{t-1} + T_{t-1} + \alpha e_t \]
\[ T_t = T_{t-1} + \alpha^2 e_t \]
(Note: For missing values, $e_t = 0$.)

The $k$-step prediction equation is
\[ \hat{Y}_t(k) = L_t + \left((k - 1) + 1/\alpha\right) T_t \]
The ARIMA model equivalency to double exponential smoothing is the ARIMA(0,2,2) model,
\[ (1 - B)^2 Y_t = (1 - \theta B)^2 \epsilon_t, \qquad \theta = 1 - \alpha \]
The moving-average form of the equation is
\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} \left(2\alpha + (j - 1)\alpha^2\right) \epsilon_{t-j} \]
For double exponential smoothing, the additive-invertible region is
\[ \{0 < \alpha < 2\} \]
The variance of the prediction errors is estimated as
\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t)\left[1 + \sum_{j=1}^{k-1} \left(2\alpha + (j - 1)\alpha^2\right)^2\right] \]
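The claim that the level/trend description and the two-pass description give the same predictions can be checked numerically. The Python sketch below is only an illustration. It assumes the two forms are started from matching states, namely $S_0^{[1]} = L_0$ and $S_0^{[2]} = L_0 - \frac{1-\alpha}{\alpha} T_0$, a relation worked out for this example from the two prediction equations rather than quoted from the text. The function names are hypothetical.

```python
def brown_level_trend(series, alpha, level0, trend0):
    """Error-correction form: L_t = L_{t-1} + T_{t-1} + a*e_t, T_t = T_{t-1} + a^2*e_t."""
    level, trend = level0, trend0
    for y in series:
        e = y - (level + trend / alpha)   # one-step-ahead error (k = 1)
        level, trend = level + trend + alpha * e, trend + alpha ** 2 * e
    return level, trend

def brown_double_smooth(series, alpha, s1, s2):
    """Two successive applications of simple exponential smoothing."""
    for y in series:
        s1 = alpha * y + (1 - alpha) * s1
        s2 = alpha * s1 + (1 - alpha) * s2
    return s1, s2

def predict_level_trend(level, trend, alpha, k):
    return level + ((k - 1) + 1 / alpha) * trend

def predict_double_smooth(s1, s2, alpha, k):
    r = alpha * k / (1 - alpha)
    return (2 + r) * s1 - (1 + r) * s2

series = [3.0, 5.0, 4.0, 6.0, 8.0]          # hypothetical data
alpha, level0, trend0 = 0.3, 3.0, 0.2       # hypothetical weight and initial state
level, trend = brown_level_trend(series, alpha, level0, trend0)
s1, s2 = brown_double_smooth(series, alpha, level0,
                             level0 - (1 - alpha) / alpha * trend0)
```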
Linear (Holt) Exponential Smoothing

The model equation for linear exponential smoothing is
\[ Y_t = \mu_t + \beta_t t + \epsilon_t \]
The smoothing equations are
\[ L_t = \alpha Y_t + (1 - \alpha)(L_{t-1} + T_{t-1}) \]
\[ T_t = \gamma (L_t - L_{t-1}) + (1 - \gamma) T_{t-1} \]
The error-correction form of the smoothing equations is
\[ L_t = L_{t-1} + T_{t-1} + \alpha e_t \]
\[ T_t = T_{t-1} + \alpha\gamma e_t \]
(Note: For missing values, $e_t = 0$.)

The $k$-step prediction equation is
\[ \hat{Y}_t(k) = L_t + k T_t \]
The ARIMA model equivalency to linear exponential smoothing is the ARIMA(0,2,2) model,
\[ (1 - B)^2 Y_t = (1 - \theta_1 B - \theta_2 B^2)\epsilon_t \]
\[ \theta_1 = 2 - \alpha - \alpha\gamma \]
\[ \theta_2 = \alpha - 1 \]
The moving-average form of the equation is
\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} (\alpha + j\alpha\gamma)\epsilon_{t-j} \]
For linear exponential smoothing, the additive-invertible region is
\[ \{0 < \alpha < 2\} \]
\[ \{0 < \gamma < 4/\alpha - 2\} \]
The variance of the prediction errors is estimated as
\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t)\left[1 + \sum_{j=1}^{k-1} (\alpha + j\alpha\gamma)^2\right] \]
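The Holt recursions can be sketched in Python as follows. Both the smoothing form and the error-correction form are written out so that they can be compared on the same input. The initial level and trend are supplied directly (no backcasting), and the function names are hypothetical.

```python
def holt_update(level, trend, y, alpha, gamma):
    """Smoothing-equation form of one Holt update."""
    new_level = alpha * y + (1 - alpha) * (level + trend)
    new_trend = gamma * (new_level - level) + (1 - gamma) * trend
    return new_level, new_trend

def holt_update_ec(level, trend, y, alpha, gamma):
    """Error-correction form of the same update."""
    e = y - (level + trend)   # one-step-ahead prediction error
    return level + trend + alpha * e, trend + alpha * gamma * e

def holt_forecast(series, alpha, gamma, level0, trend0, horizon):
    """Run the smoother over the series and return the 1..horizon step predictions."""
    level, trend = level0, trend0
    for y in series:
        level, trend = holt_update(level, trend, y, alpha, gamma)
    return [level + k * trend for k in range(1, horizon + 1)]

# A perfectly linear hypothetical series, y_t = 2 + 3t for t = 1..5.
linear_series = [2.0 + 3.0 * t for t in range(1, 6)]
forecasts = holt_forecast(linear_series, 0.4, 0.3, level0=2.0, trend0=3.0, horizon=2)
```

On an exactly linear series that starts from the matching state, every one-step error is zero and the forecasts continue the line.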
Damped-Trend Linear Exponential Smoothing

The model equation for damped-trend linear exponential smoothing is
\[ Y_t = \mu_t + \beta_t t + \epsilon_t \]
The smoothing equations are
\[ L_t = \alpha Y_t + (1 - \alpha)(L_{t-1} + \phi T_{t-1}) \]
\[ T_t = \gamma (L_t - L_{t-1}) + (1 - \gamma)\phi T_{t-1} \]
The error-correction form of the smoothing equations is
\[ L_t = L_{t-1} + \phi T_{t-1} + \alpha e_t \]
\[ T_t = \phi T_{t-1} + \alpha\gamma e_t \]
(Note: For missing values, $e_t = 0$.)

The $k$-step prediction equation is
\[ \hat{Y}_t(k) = L_t + \sum_{i=1}^{k} \phi^i T_t \]
The ARIMA model equivalency to damped-trend linear exponential smoothing is the ARIMA(1,1,2) model,
\[ (1 - \phi B)(1 - B) Y_t = (1 - \theta_1 B - \theta_2 B^2)\epsilon_t \]
\[ \theta_1 = 1 + \phi - \alpha - \alpha\gamma\phi \]
\[ \theta_2 = (\alpha - 1)\phi \]
The moving-average form of the equation (assuming $|\phi| < 1$) is
\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} \left(\alpha + \alpha\gamma\phi(\phi^j - 1)/(\phi - 1)\right)\epsilon_{t-j} \]
For damped-trend linear exponential smoothing, the additive-invertible region is
\[ \{0 < \alpha < 2\} \]
\[ \{0 < \phi\gamma < 4/\alpha - 2\} \]
The variance of the prediction errors is estimated as
\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t)\left[1 + \sum_{j=1}^{k-1} \left(\alpha + \alpha\gamma\phi(\phi^j - 1)/(\phi - 1)\right)^2\right] \]
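The moving-average weights, the prediction variance, and the damped prediction equation can be evaluated with a short Python sketch. The function names are hypothetical. The special case $\phi = 1$ is handled separately and uses the linear (Holt) weights $\alpha + j\alpha\gamma$, which is the limit of the damped expression.

```python
def damped_psi(j, alpha, gamma, phi):
    """Moving-average weight psi_j for damped-trend linear exponential smoothing."""
    if phi == 1:
        return alpha + j * alpha * gamma   # Holt limit
    return alpha + alpha * gamma * phi * (phi ** j - 1) / (phi - 1)

def damped_prediction_variance(noise_var, alpha, gamma, phi, k):
    """var(e_t(k)) = var(eps_t) * (1 + sum_{j=1}^{k-1} psi_j^2)."""
    return noise_var * (1 + sum(damped_psi(j, alpha, gamma, phi) ** 2
                                for j in range(1, k)))

def damped_forecast(level, trend, phi, k):
    """k-step prediction L_t + (phi + phi^2 + ... + phi^k) * T_t."""
    return level + sum(phi ** i for i in range(1, k + 1)) * trend

psi_1 = damped_psi(1, 0.5, 0.2, 0.9)                     # hypothetical weights
var_2 = damped_prediction_variance(1.0, 0.5, 0.2, 0.9, 2)
forecast_2 = damped_forecast(10.0, 2.0, 0.5, 2)
```

With $|\phi| < 1$ the trend contribution levels off as $k$ grows, which is what distinguishes the damped forecast from the straight line produced by the Holt model.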
Seasonal Exponential Smoothing

The model equation for seasonal exponential smoothing is
\[ Y_t = \mu_t + s_p(t) + \epsilon_t \]
The smoothing equations are
\[ L_t = \alpha (Y_t - S_{t-p}) + (1 - \alpha) L_{t-1} \]
\[ S_t = \delta (Y_t - L_t) + (1 - \delta) S_{t-p} \]
The error-correction form of the smoothing equations is
\[ L_t = L_{t-1} + \alpha e_t \]
\[ S_t = S_{t-p} + \delta (1 - \alpha) e_t \]
(Note: For missing values, $e_t = 0$.)

The $k$-step prediction equation is
\[ \hat{Y}_t(k) = L_t + S_{t-p+k} \]
The ARIMA model equivalency to seasonal exponential smoothing is the ARIMA$(0,1,p+1)(0,1,0)_p$ model,
\[ (1 - B)(1 - B^p) Y_t = (1 - \theta_1 B - \theta_2 B^p - \theta_3 B^{p+1})\epsilon_t \]
\[ \theta_1 = 1 - \alpha \]
\[ \theta_2 = 1 - \delta(1 - \alpha) \]
\[ \theta_3 = (1 - \alpha)(\delta - 1) \]
The moving-average form of the equation is
\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} \psi_j \epsilon_{t-j} \]
\[ \psi_j = \begin{cases} \alpha & \text{for } j \bmod p \ne 0 \\ \alpha + \delta(1 - \alpha) & \text{for } j \bmod p = 0 \end{cases} \]
For seasonal exponential smoothing, the additive-invertible region is
\[ \{\max(-p\alpha, 0) < \delta(1 - \alpha) < (2 - \alpha)\} \]
The variance of the prediction errors is estimated as
\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t)\left[1 + \sum_{j=1}^{k-1} \psi_j^2\right] \]
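A minimal Python sketch of the error-correction recursion for seasonal exponential smoothing follows. The initial level and the $p$ initial seasonal factors are supplied directly rather than obtained by backcasting. For horizons beyond one season the sketch reuses the most recent factor for the same season, which is a conventional reading of $S_{t-p+k}$ and an assumption here. The function names are hypothetical.

```python
def seasonal_smoothing(series, alpha, delta, level0, seasonal0):
    """seasonal0 holds the p most recent seasonal factors, oldest first."""
    level, seasonals = level0, list(seasonal0)
    for y in series:
        s_old = seasonals[0]                  # S_{t-p}
        e = y - (level + s_old)               # one-step-ahead prediction error
        level += alpha * e
        seasonals = seasonals[1:] + [s_old + delta * (1 - alpha) * e]
    return level, seasonals

def seasonal_forecast(level, seasonals, k):
    """L_t + S_{t-p+k}, wrapping around the season for k > p (assumed convention)."""
    return level + seasonals[(k - 1) % len(seasonals)]

def seasonal_psi(j, alpha, delta, p):
    """Moving-average weight psi_j for the additive seasonal model."""
    return alpha + delta * (1 - alpha) if j % p == 0 else alpha

# Hypothetical series: level 10 with a two-season pattern of +1 and -1.
level, seasonals = seasonal_smoothing([11.0, 9.0, 11.0, 9.0], 0.2, 0.5, 10.0, [1.0, -1.0])
next_three = [seasonal_forecast(level, seasonals, k) for k in (1, 2, 3)]
```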
Multiplicative Seasonal Smoothing

In order to use the multiplicative version of seasonal smoothing, the time series and all predictions must be strictly positive.

The model equation for the multiplicative version of seasonal smoothing is
\[ Y_t = \mu_t s_p(t) + \epsilon_t \]
The smoothing equations are
\[ L_t = \alpha (Y_t / S_{t-p}) + (1 - \alpha) L_{t-1} \]
\[ S_t = \delta (Y_t / L_t) + (1 - \delta) S_{t-p} \]
The error-correction form of the smoothing equations is
\[ L_t = L_{t-1} + \alpha e_t / S_{t-p} \]
\[ S_t = S_{t-p} + \delta (1 - \alpha) e_t / L_t \]
(Note: For missing values, $e_t = 0$.)

The $k$-step prediction equation is
\[ \hat{Y}_t(k) = L_t S_{t-p+k} \]
The multiplicative version of seasonal smoothing does not have an ARIMA equivalent; however, when the seasonal variation is small, the ARIMA additive-invertible region of the additive version of seasonal smoothing described in the preceding section can approximate the stability region of the multiplicative version.

The variance of the prediction errors is estimated as
\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t)\left[\sum_{i=0}^{\infty} \sum_{j=0}^{p-1} \left(\psi_{j+ip}\, S_{t+k} / S_{t+k-j}\right)^2\right] \]
where $\psi_j$ are as described for the additive version of the seasonal method, and $\psi_j = 0$ for $j \ge k$.
Winters Method—Additive Version

The model equation for the additive version of Winters method is

$$Y_t = \mu_t + \beta_t t + s_p(t) + \epsilon_t$$

The smoothing equations are

$$L_t = \alpha(Y_t - S_{t-p}) + (1-\alpha)(L_{t-1} + T_{t-1})$$
$$T_t = \gamma(L_t - L_{t-1}) + (1-\gamma)T_{t-1}$$
$$S_t = \delta(Y_t - L_t) + (1-\delta)S_{t-p}$$

The error-correction form of the smoothing equations is

$$L_t = L_{t-1} + T_{t-1} + \alpha e_t$$
$$T_t = T_{t-1} + \alpha\gamma e_t$$
$$S_t = S_{t-p} + \delta(1-\alpha)e_t$$

(Note: For missing values, $e_t = 0$.)

The k-step prediction equation is

$$\hat{Y}_t(k) = L_t + kT_t + S_{t-p+k}$$

The ARIMA model equivalency to the additive version of Winters method is the ARIMA(0,1,p+1)(0,1,0)p model,

$$(1-B)(1-B^p)Y_t = \left[1 - \sum_{i=1}^{p+1}\theta_i B^i\right]\epsilon_t$$
$$\theta_j = \begin{cases} 1 - \alpha - \alpha\gamma & j = 1 \\ -\alpha\gamma & 2 \le j \le p-1 \\ 1 - \alpha\gamma - \delta(1-\alpha) & j = p \\ (1-\alpha)(\delta-1) & j = p+1 \end{cases}$$

The moving-average form of the equation is

$$Y_t = \epsilon_t + \sum_{j=1}^{\infty}\psi_j\epsilon_{t-j}$$
$$\psi_j = \begin{cases} \alpha + j\alpha\gamma & \text{for } j \bmod p \ne 0 \\ \alpha + j\alpha\gamma + \delta(1-\alpha) & \text{for } j \bmod p = 0 \end{cases}$$

For the additive version of Winters method (see Archibald 1990), the additive-invertible region is

$$\{\max(-p\alpha, 0) < \delta(1-\alpha) < (2-\alpha)\}$$
$$\{0 < \alpha\gamma < 2 - \alpha - \delta(1-\alpha)(1-\cos(\vartheta))\}$$

where $\vartheta$ is the smallest nonnegative solution to the equations listed in Archibald (1990). The variance of the prediction errors is estimated as

$$\mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t)\left[1 + \sum_{j=1}^{k-1}\psi_j^2\right]$$
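The smoothing form and the error-correction form of the additive Winters equations describe the same update. The following Python sketch, with hypothetical function names and arbitrary illustrative parameter values, performs one update both ways so that the two can be compared numerically. It also evaluates the moving-average weights used in the prediction error variance.

```python
def winters_additive_step(y, level, trend, s_prev, alpha, gamma, delta):
    """One update in smoothing form; s_prev is S_{t-p}."""
    new_level = alpha * (y - s_prev) + (1 - alpha) * (level + trend)
    new_trend = gamma * (new_level - level) + (1 - gamma) * trend
    new_seasonal = delta * (y - new_level) + (1 - delta) * s_prev
    return new_level, new_trend, new_seasonal


def winters_additive_step_ec(y, level, trend, s_prev, alpha, gamma, delta):
    """The same update in error-correction form."""
    e = y - (level + trend + s_prev)  # one-step prediction error
    return (level + trend + alpha * e,
            trend + alpha * gamma * e,
            s_prev + delta * (1 - alpha) * e)


def winters_psi(j, p, alpha, gamma, delta):
    """psi_j of the moving-average form for the additive Winters method."""
    base = alpha + j * alpha * gamma
    return base + delta * (1 - alpha) if j % p == 0 else base
```

With y = 12, level 10, trend 0.5, seasonal factor 1, and weights (0.3, 0.2, 0.4), both functions give a new level of 10.65, a new trend of 0.53, and a new seasonal factor of 1.14, up to floating-point rounding.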
Winters Method—Multiplicative Version

In order to use the multiplicative version of Winters method, the time series and all predictions must be strictly positive. The model equation for the multiplicative version of Winters method is

$$Y_t = (\mu_t + \beta_t t)s_p(t) + \epsilon_t$$

The smoothing equations are

$$L_t = \alpha(Y_t / S_{t-p}) + (1-\alpha)(L_{t-1} + T_{t-1})$$
$$T_t = \gamma(L_t - L_{t-1}) + (1-\gamma)T_{t-1}$$
$$S_t = \delta(Y_t / L_t) + (1-\delta)S_{t-p}$$

The error-correction form of the smoothing equations is

$$L_t = L_{t-1} + T_{t-1} + \alpha e_t / S_{t-p}$$
$$T_t = T_{t-1} + \alpha\gamma e_t / S_{t-p}$$
$$S_t = S_{t-p} + \delta(1-\alpha)e_t / L_t$$

(Note: For missing values, $e_t = 0$.)

The k-step prediction equation is

$$\hat{Y}_t(k) = (L_t + kT_t)S_{t-p+k}$$

The multiplicative version of Winters method does not have an ARIMA equivalent; however, when the seasonal variation is small, the ARIMA additive-invertible region of the additive version of Winters method described in the preceding section can approximate the stability region of the multiplicative version. The variance of the prediction errors is estimated as

$$\mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t)\left[\sum_{i=0}^{\infty}\sum_{j=0}^{p-1}\left(\psi_{j+ip}\,S_{t+k}/S_{t+k-j}\right)^2\right]$$

where $\psi_j$ are as described for the additive version of Winters method and $\psi_j = 0$ for $j \ge k$.
ARIMA Models Autoregressive integrated moving-average (ARIMA) models predict values of a dependent time series with a linear combination of its own past values, past errors (also called shocks or innovations), and current and past values of other time series (predictor time series). The Time Series Forecasting System uses the ARIMA procedure of SAS/ETS software to fit and forecast ARIMA models. The maximum likelihood method is used for parameter estimation. Refer to Chapter 7, “The ARIMA Procedure,” for details of ARIMA model estimation and forecasting. This section summarizes the notation used for ARIMA models.
Notation for ARIMA Models A dependent time series that is modeled as a linear combination of its own past values and past values of an error series is known as a (pure) ARIMA model.
Nonseasonal ARIMA Model Notation

The order of an ARIMA model is usually denoted by the notation ARIMA(p,d,q), where

p    is the order of the autoregressive part.
d    is the order of the differencing (rarely should d > 2 be needed).
q    is the order of the moving-average process.

Given a dependent time series $\{Y_t : 1 \le t \le n\}$, mathematically the ARIMA model is written as

$$(1-B)^d Y_t = \mu + \frac{\theta(B)}{\phi(B)}a_t$$

where

$t$    indexes time.
$\mu$    is the mean term.
$B$    is the backshift operator; that is, $BX_t = X_{t-1}$.
$\phi(B)$    is the autoregressive operator, represented as a polynomial in the backshift operator: $\phi(B) = 1 - \phi_1 B - \ldots - \phi_p B^p$.
$\theta(B)$    is the moving-average operator, represented as a polynomial in the backshift operator: $\theta(B) = 1 - \theta_1 B - \ldots - \theta_q B^q$.
$a_t$    is the independent disturbance, also called the random error.

For example, the mathematical form of the ARIMA(1,1,2) model is

$$(1-B)Y_t = \mu + \frac{(1 - \theta_1 B - \theta_2 B^2)}{(1 - \phi_1 B)}a_t$$
Seasonal ARIMA Model Notation

Seasonal ARIMA models are expressed in factored form by the notation ARIMA(p,d,q)(P,D,Q)s, where

P    is the order of the seasonal autoregressive part.
D    is the order of the seasonal differencing (rarely should D > 1 be needed).
Q    is the order of the seasonal moving-average process.
s    is the length of the seasonal cycle.

Given a dependent time series $\{Y_t : 1 \le t \le n\}$, mathematically the ARIMA seasonal model is written as

$$(1-B)^d(1-B^s)^D Y_t = \mu + \frac{\theta(B)\theta_s(B^s)}{\phi(B)\phi_s(B^s)}a_t$$

where

$\phi_s(B^s)$    is the seasonal autoregressive operator, represented as a polynomial in the backshift operator: $\phi_s(B^s) = 1 - \phi_{s,1}B^s - \ldots - \phi_{s,P}B^{sP}$
$\theta_s(B^s)$    is the seasonal moving-average operator, represented as a polynomial in the backshift operator: $\theta_s(B^s) = 1 - \theta_{s,1}B^s - \ldots - \theta_{s,Q}B^{sQ}$

For example, the mathematical form of the ARIMA(1,0,1)(1,1,2)12 model is

$$(1-B^{12})Y_t = \mu + \frac{(1 - \theta_1 B)(1 - \theta_{s,1}B^{12} - \theta_{s,2}B^{24})}{(1 - \phi_1 B)(1 - \phi_{s,1}B^{12})}a_t$$
Abbreviated Notation for ARIMA Models

If the differencing order, autoregressive order, or moving-average order is zero, the notation is further abbreviated as

I(d)(D)s    integrated model or ARIMA(0,d,0)(0,D,0)s
AR(p)(P)s    autoregressive model or ARIMA(p,0,0)(P,0,0)s
IAR(p,d)(P,D)s    integrated autoregressive model or ARIMA(p,d,0)(P,D,0)s
MA(q)(Q)s    moving average model or ARIMA(0,0,q)(0,0,Q)s
IMA(d,q)(D,Q)s    integrated moving average model or ARIMA(0,d,q)(0,D,Q)s
ARMA(p,q)(P,Q)s    autoregressive moving-average model or ARIMA(p,0,q)(P,0,Q)s
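The operator notation in this section can be made concrete by treating each operator as a list of polynomial coefficients in B. The Python sketch below is illustrative only; the helper names and the numeric parameter values are assumptions, not part of the ARIMA procedure. It expands a factored seasonal moving-average operator and computes the leading weights of the ratio θ(B)/φ(B), which give the response of the differenced series to the disturbance a_t.

```python
def backshift_poly(coefs, s=1):
    """Coefficients of 1 - c_1 B^s - c_2 B^(2s) - ... in increasing powers of B."""
    out = [0.0] * (s * len(coefs) + 1)
    out[0] = 1.0
    for m, c in enumerate(coefs, start=1):
        out[m * s] = -c
    return out


def poly_mul(a, b):
    """Product of two polynomials in B (coefficient lists, increasing powers)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def rational_weights(numer, denom, n):
    """First n power-series coefficients of numer(B) / denom(B)."""
    w = []
    for j in range(n):
        acc = numer[j] if j < len(numer) else 0.0
        for i in range(1, min(j, len(denom) - 1) + 1):
            acc -= denom[i] * w[j - i]
        w.append(acc / denom[0])
    return w


# Moving-average side of ARIMA(1,0,1)(1,1,2)12 with illustrative values
# theta_1 = 0.5, theta_s1 = 0.4, theta_s2 = 0.2.
ma_operator = poly_mul(backshift_poly([0.5]), backshift_poly([0.4, 0.2], s=12))

# Weights of (1 - 0.5 B - 0.2 B^2) / (1 - 0.6 B), the shock term of an
# ARIMA(1,1,2) model with illustrative parameter values.
shock_weights = rational_weights(backshift_poly([0.5, 0.2]), backshift_poly([0.6]), 4)
```

The expanded seasonal operator has nonzero coefficients only at lags 0, 1, 12, 13, 24, and 25, which is why the factored form needs far fewer parameters than an unrestricted MA(25) model.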
Notation for Transfer Functions

A transfer function can be used to filter a predictor time series to form a dynamic regression model. Let $Y_t$ be the dependent series, let $X_t$ be the predictor series, and let $\Psi(B)$ be a linear filter or transfer function for the effect of $X_t$ on $Y_t$. The ARIMA model is then

$$(1-B)^d(1-B^s)^D Y_t = \mu + \Psi(B)(1-B)^d(1-B^s)^D X_t + \frac{\theta(B)\theta_s(B^s)}{\phi(B)\phi_s(B^s)}a_t$$
This model is called a dynamic regression of $Y_t$ on $X_t$.

Nonseasonal Transfer Function Notation

Given the ith predictor time series $\{X_{i,t} : 1 \le t \le n\}$, the transfer function is written as Dif($d_i$)Lag($k_i$)N($q_i$)/D($p_i$), where

$d_i$    is the simple order of the differencing for the ith predictor time series, $(1-B)^{d_i}X_{i,t}$ (rarely should $d_i > 2$ be needed).
$k_i$    is the pure time delay (lag) for the effect of the ith predictor time series, $X_{i,t}B^{k_i} = X_{i,t-k_i}$.
$p_i$    is the simple order of the denominator for the ith predictor time series.
$q_i$    is the simple order of the numerator for the ith predictor time series.

The mathematical notation used to describe a transfer function is

$$\Psi_i(B) = \frac{\omega_i(B)}{\delta_i(B)}(1-B)^{d_i}B^{k_i}$$

where

$B$    is the backshift operator; that is, $BX_t = X_{t-1}$.
$\delta_i(B)$    is the denominator polynomial of the transfer function for the ith predictor time series: $\delta_i(B) = 1 - \delta_{i,1}B - \ldots - \delta_{i,p_i}B^{p_i}$.
$\omega_i(B)$    is the numerator polynomial of the transfer function for the ith predictor time series: $\omega_i(B) = 1 - \omega_{i,1}B - \ldots - \omega_{i,q_i}B^{q_i}$.
The numerator factors for a transfer function for a predictor series are like the MA part of the ARMA model for the noise series. The denominator factors for a transfer function for a predictor series are like the AR part of the ARMA model for the noise series. Denominator factors introduce exponentially weighted, infinite distributed lags into the transfer function.

For example, the transfer function for the ith predictor time series with

$k_i = 3$    time lag is 3
$d_i = 1$    simple order of differencing is one
$p_i = 1$    simple order of the denominator is one
$q_i = 2$    simple order of the numerator is two

would be written as [Dif(1)Lag(3)N(2)/D(1)]. The mathematical notation for the transfer function in this example is

$$\Psi_i(B) = \frac{(1 - \omega_{i,1}B - \omega_{i,2}B^2)}{(1 - \delta_{i,1}B)}(1-B)B^3$$
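To see what a filter such as [Dif(1)Lag(3)N(2)/D(1)] does to a predictor series, the Python sketch below applies differencing, the pure delay, the numerator, and the denominator in turn. It is a minimal illustration with a hypothetical function name. Treating all pre-sample values as zero is an assumption of the sketch, not a statement about how the system initializes the filter.

```python
def apply_transfer(x, omega=(), delta=(), d=0, k=0):
    """Apply omega(B)/delta(B) (1 - B)^d B^k to the series x.

    omega and delta hold the coefficients omega_1.. and delta_1.. of the
    numerator and denominator polynomials. Pre-sample values are taken to
    be zero (an assumption of this sketch).
    """
    z = [float(v) for v in x]
    for _ in range(d):  # simple differencing (1 - B)^d
        z = [z[t] - (z[t - 1] if t > 0 else 0.0) for t in range(len(z))]
    z = [z[t - k] if t >= k else 0.0 for t in range(len(z))]  # pure delay B^k
    # Numerator: u_t = z_t - omega_1 z_{t-1} - omega_2 z_{t-2} - ...
    u = [z[t] - sum(w * z[t - j] for j, w in enumerate(omega, start=1) if t - j >= 0)
         for t in range(len(z))]
    # Denominator: v_t = u_t + delta_1 v_{t-1} + delta_2 v_{t-2} + ...
    out = []
    for t in range(len(u)):
        out.append(u[t] + sum(c * out[t - j]
                              for j, c in enumerate(delta, start=1) if t - j >= 0))
    return out


# A level shift at the third observation, filtered with illustrative values
# omega = (0.5, 0.2) and delta_1 = 0.6.
step = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
response = apply_transfer(step, omega=(0.5, 0.2), delta=(0.6,), d=1, k=3)
```

Differencing turns the level shift into a single pulse, the lag moves the pulse three periods later, and the denominator makes the response decay geometrically after the numerator terms are exhausted.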
Seasonal Transfer Function Notation

The general transfer function notation for the ith predictor time series $X_{i,t}$ with seasonal factors is [Dif($d_i$)($D_i$)s Lag($k_i$) N($q_i$)($Q_i$)s / D($p_i$)($P_i$)s], where

$D_i$    is the seasonal order of the differencing for the ith predictor time series (rarely should $D_i > 1$ be needed).
$P_i$    is the seasonal order of the denominator for the ith predictor time series (rarely should $P_i > 2$ be needed).
$Q_i$    is the seasonal order of the numerator for the ith predictor time series (rarely should $Q_i > 2$ be needed).
$s$    is the length of the seasonal cycle.

The mathematical notation used to describe a seasonal transfer function is

$$\Psi_i(B) = \frac{\omega_i(B)\omega_{s,i}(B^s)}{\delta_i(B)\delta_{s,i}(B^s)}(1-B)^{d_i}(1-B^s)^{D_i}B^{k_i}$$

where

$\delta_{s,i}(B^s)$    is the denominator seasonal polynomial of the transfer function for the ith predictor time series: $\delta_{s,i}(B^s) = 1 - \delta_{s,i,1}B^s - \ldots - \delta_{s,i,P_i}B^{sP_i}$
$\omega_{s,i}(B^s)$    is the numerator seasonal polynomial of the transfer function for the ith predictor time series: $\omega_{s,i}(B^s) = 1 - \omega_{s,i,1}B^s - \ldots - \omega_{s,i,Q_i}B^{sQ_i}$
For example, the transfer function for the ith predictor time series $X_{i,t}$ whose seasonal cycle is $s = 12$ with

$d_i = 2$    simple order of differencing is two
$D_i = 1$    seasonal order of differencing is one
$q_i = 2$    simple order of the numerator is two
$Q_i = 1$    seasonal order of the numerator is one

would be written as [Dif(2)(1)s N(2)(1)s]. The mathematical notation for the transfer function in this example is

$$\Psi_i(B) = (1 - \omega_{i,1}B - \omega_{i,2}B^2)(1 - \omega_{s,i,1}B^{12})(1-B)^2(1-B^{12})$$

Note: In this case, [Dif(2)(1)s N(2)(1)s] = [Dif(2)(1)s Lag(0)N(2)(1)s /D(0)(0)s].
Predictor Series This section discusses time trend curves, seasonal dummies, interventions, and adjustments.
Time Trend Curves When you specify a time trend curve as a predictor in a forecasting model, the system computes a predictor series that is a deterministic function of time. This variable is then included in the model as a regressor, and the trend curve is fit to the dependent series by linear regression, in addition to other predictor series. Some kinds of nonlinear trend curves are fit by transforming the dependent series. For example, the exponential trend curve is actually a linear time trend fit to the logarithm of the series. For these trend curve specifications, the series transformation option is set automatically, and you cannot independently control both the time trend curve and transformation option. The computed time trend variable is included in the output data set in a variable named in accordance with the trend curve type. Let t represent the observation count from the start of the period of fit for the model, and let Xt represent the value of the time trend variable at observation t within the period of fit. The names and definitions of these variables are as follows. (Note: These deterministic variables are reserved variable names.) Linear trend
variable name _LINEAR_, with $X_t = t - c$

Quadratic trend
variable name _QUAD_, with $X_t = (t-c)^2$. Note that a quadratic trend implies a linear trend as a special case and results in two regressors: _QUAD_ and _LINEAR_.

Cubic trend
variable name _CUBE_, with $X_t = (t-c)^3$. Note that a cubic trend implies a quadratic trend as a special case and results in three regressors: _CUBE_, _QUAD_, and _LINEAR_.

Logistic trend
variable name _LOGIT_, with $X_t = t$. The model is a linear time trend applied to the logistic transform of the dependent series. Thus, specifying a logistic trend is equivalent to specifying the logistic series transformation and a linear time trend. A logistic trend predictor can be used only in conjunction with the logistic transformation, which is set automatically when you specify logistic trend.

Logarithmic trend
variable name _LOG_, with $X_t = \ln(t)$

Exponential trend
variable name _EXP_, with $X_t = t$. The model is a linear time trend applied to the logarithms of the dependent series. Thus, specifying an exponential trend is equivalent to specifying the log series transformation and a linear time trend. An exponential trend predictor can be used only in conjunction with the log transformation, which is set automatically when you specify exponential trend.

Hyperbolic trend
variable name _HYP_, with $X_t = 1/t$

Power curve trend
variable name _POW_, with $X_t = \ln(t)$. The model is a logarithmic time trend applied to the logarithms of the dependent series. Thus, specifying a power curve is equivalent to specifying the log series transformation and a logarithmic time trend. A power curve predictor can be used only in conjunction with the log transformation, which is set automatically when you specify a power curve trend.

EXP(A+B/TIME) trend
variable name _ERT_, with $X_t = 1/t$. The model is a hyperbolic time trend applied to the logarithms of the dependent series. Thus, specifying this trend curve is equivalent to specifying the log series transformation and a hyperbolic time trend. This trend curve can be used only in conjunction with the log transformation, which is set automatically when you specify this trend.
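The trend definitions above amount to a small lookup from variable name to a regressor formula plus the series transformation that is set automatically. The Python sketch below is a hypothetical illustration of that lookup. The constant c appears in the quadratic and cubic definitions but its value is not given in this section, so it is a plain parameter here. The linear term is written as t − c, which reduces to t at the default c = 0.

```python
import math

# name -> (regressor formula, series transformation set automatically or None)
TREND_SPECS = {
    "_LINEAR_": (lambda t, c: t - c, None),
    "_QUAD_": (lambda t, c: (t - c) ** 2, None),
    "_CUBE_": (lambda t, c: (t - c) ** 3, None),
    "_LOGIT_": (lambda t, c: t, "logistic"),
    "_LOG_": (lambda t, c: math.log(t), None),
    "_EXP_": (lambda t, c: t, "log"),
    "_HYP_": (lambda t, c: 1.0 / t, None),
    "_POW_": (lambda t, c: math.log(t), "log"),
    "_ERT_": (lambda t, c: 1.0 / t, "log"),
}

# Lower-order regressors that a trend implies as special cases.
IMPLIED = {"_QUAD_": ["_LINEAR_"], "_CUBE_": ["_QUAD_", "_LINEAR_"]}


def trend_regressors(name, n, c=0.0):
    """Return ({variable name: values for t = 1..n}, automatic transformation)."""
    names = [name] + IMPLIED.get(name, [])
    columns = {nm: [TREND_SPECS[nm][0](t, c) for t in range(1, n + 1)]
               for nm in names}
    return columns, TREND_SPECS[name][1]
```

For example, `trend_regressors("_CUBE_", 3, c=2)` returns three columns (_CUBE_, _QUAD_, and _LINEAR_) and no transformation, whereas `trend_regressors("_POW_", 2)` returns a single logarithmic column together with the log transformation.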
Intervention Effects

Interventions are used for modeling events that occur at specific times. That is, they are known changes that affect the dependent series or outliers. The ith intervention series is included in the output data set with variable name _INTVi_, which is a reserved variable name.

Point Interventions

The point intervention is a one-time event. The ith intervention series $X_{i,t}$ has a point intervention at time $t_{int}$ when the series is nonzero only at time $t_{int}$, that is,

$$X_{i,t} = \begin{cases} 1, & t = t_{int} \\ 0, & \text{otherwise} \end{cases}$$

Step Interventions

Step interventions are continuing, and the input time series flags periods after the intervention. For a step intervention, before time $t_{int}$, the ith intervention series $X_{i,t}$ is zero and then steps to a constant level thereafter, that is,

$$X_{i,t} = \begin{cases} 1, & t \ge t_{int} \\ 0, & \text{otherwise} \end{cases}$$
Ramp Interventions

A ramp intervention is a continuing intervention that increases linearly after the intervention time. For a ramp intervention, before time $t_{int}$, the ith intervention series $X_{i,t}$ is zero and increases linearly thereafter, that is, proportional to time.

$$X_{i,t} = \begin{cases} t - t_{int}, & t \ge t_{int} \\ 0, & \text{otherwise} \end{cases}$$

Intervention Effect

Given the ith intervention series $X_{i,t}$, you can define how the intervention takes effect by filters (transfer functions) of the form

$$\Psi_i(B) = \frac{1 - \omega_{i,1}B - \ldots - \omega_{i,q_i}B^{q_i}}{1 - \delta_{i,1}B - \ldots - \delta_{i,p_i}B^{p_i}}$$

where $B$ is the backshift operator $By_t = y_{t-1}$.
The denominator of the transfer function determines the decay pattern of the intervention effect, whereas the numerator terms determine the size of the intervention effect time window. For example, the following intervention effects are associated with the respective transfer functions.

Immediately    $\Psi_i(B) = 1$
Gradually    $\Psi_i(B) = 1/(1 - \delta_{i,1}B)$
1 lag window    $\Psi_i(B) = 1 - \omega_{i,1}B$
3 lag window    $\Psi_i(B) = 1 - \omega_{i,1}B - \omega_{i,2}B^2 - \omega_{i,3}B^3$
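The three intervention types and the "gradually" effect can be sketched as follows. The function names are hypothetical, time is indexed from 1 to n, and the filter starts from a zero pre-sample value; these are assumptions made for the illustration.

```python
def intervention_series(kind, n, t_int):
    """Point, step, or ramp intervention series X_{i,t} for t = 1..n."""
    series = []
    for t in range(1, n + 1):
        if kind == "point":
            series.append(1.0 if t == t_int else 0.0)
        elif kind == "step":
            series.append(1.0 if t >= t_int else 0.0)
        elif kind == "ramp":
            series.append(float(t - t_int) if t >= t_int else 0.0)
        else:
            raise ValueError("kind must be 'point', 'step', or 'ramp'")
    return series


def gradual_effect(x, delta1):
    """Apply the filter 1 / (1 - delta1 * B): v_t = x_t + delta1 * v_{t-1}."""
    out, prev = [], 0.0
    for value in x:
        prev = value + delta1 * prev
        out.append(prev)
    return out
```

Filtering a point intervention with `delta1 = 0.5` gives an effect of 1 at the intervention time that halves in each later period, which is the exponential decay pattern.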
Intervention Notation
The notation used to describe intervention effects has the form type:$t_{int}$($q_i$)/($p_i$), where type is point, step, or ramp; $t_{int}$ is the time of the intervention (for example, OCT87); $q_i$ is the transfer function numerator order; and $p_i$ is the transfer function denominator order. If $q_i = 0$, the part "($q_i$)" is omitted; if $p_i = 0$, the part "/($p_i$)" is omitted.

In the Intervention Specification window, the Number of Lags option specifies the transfer function numerator order $q_i$, and the Effect Decay Pattern option specifies the transfer function denominator order $p_i$. In the Effect Decay Pattern options, values and resulting $p_i$ are: None, $p_i = 0$; Exp, $p_i = 1$; Wave, $p_i = 2$.

For example, a step intervention with date 08MAR90 and effect pattern Exp is denoted "Step:08MAR90/(1)" and has a transfer function filter $\Psi_i(B) = 1/(1 - \delta_1 B)$. A ramp intervention immediately applied on 08MAR90 is denoted "Ramp:08MAR90" and has a transfer function filter $\Psi_i(B) = 1$.
Seasonal Dummy Inputs

For a seasonal cycle of length s, the seasonal dummy regressors include

$$\{X_{i,t} : 1 \le i \le (s-1),\ 1 \le t \le n\}$$

for models that include an intercept term and

$$\{X_{i,t} : 1 \le i \le s,\ 1 \le t \le n\}$$

for models that exclude an intercept term. Each element of a seasonal dummy regressor is either zero or one, based on the following rule:

$$X_{i,t} = \begin{cases} 1, & \text{when } i = t \bmod s \\ 0, & \text{otherwise} \end{cases}$$

Note that if the model includes an intercept term, the number of seasonal dummy regressors is one less than s to ensure that the linear system is full rank. The seasonal dummy variables are included in the output data set with variable names prefixed with "SDUMMYi" and sequentially numbered. They are reserved variable names.
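The dummy rule can be sketched as a small Python function (a hypothetical helper, not part of the system). One point needs an assumption: t mod s is never equal to s, so for models without an intercept the sketch assigns observations with t mod s = 0 to dummy s. With an intercept, those observations form the reference season and all s − 1 dummies are zero.

```python
def seasonal_dummies(n, s, intercept=True):
    """Rows of seasonal dummy values X_{i,t} for t = 1..n.

    Each row holds s - 1 dummies when the model has an intercept and
    s dummies otherwise.
    """
    n_dummies = s - 1 if intercept else s
    rows = []
    for t in range(1, n + 1):
        season = t % s
        if season == 0 and not intercept:
            season = s  # assumption: residue 0 is mapped to dummy s
        rows.append([1 if i == season else 0 for i in range(1, n_dummies + 1)])
    return rows
```

For quarterly data (s = 4) with an intercept, the fourth observation of each cycle has all three dummies equal to zero, so the intercept absorbs that season's level.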
Series Diagnostic Tests This section describes the diagnostic tests that are used to determine the kinds of forecasting models appropriate for a series. The series diagnostics are a set of heuristics that provide recommendations on whether or not the forecasting model should contain a log transform, trend terms, and seasonal terms. These recommendations are used by the automatic model selection process to restrict the model search to a subset of the model selection list. (You can disable this behavior by using the Automatic Model Selection Options window.) The tests that are used by the series diagnostics do not always produce the correct classification of the series. They are intended to accelerate the process of searching for a good forecasting model for the series, but you should not rely on them if finding the very best model is important to you. If you have information about the appropriate kinds of forecasting models (perhaps from studying the plots and autocorrelations shown in the Series Viewer window), you can set the series diagnostic flags in the Series Diagnostics window. Select the YES, NO, or MAYBE values for the Log Transform, Trend, and Seasonality options in the Series Diagnostics window as you think appropriate. The series diagnostics tests are intended as a heuristic tool only, and no statistical validity is claimed for them. These tests might be modified and enhanced in future releases of the Time Series Forecasting System. The testing strategy is as follows:
1. Log transform test. The log test fits a high-order autoregressive model to the series and to the log of the series and compares goodness-of-fit measures for the prediction errors of the two models. If this test finds that log transforming the series is suitable, the Log Transform option is set to YES, and the subsequent diagnostic tests are performed on the log transformed series.

2. Trend test. The resultant series is tested for presence of a trend by using an augmented Dickey-Fuller test and a random walk with drift test. If either test finds that the series appears to have a trend, the Trend option is set to YES, and the subsequent diagnostic tests are performed on the differenced series.

3. Seasonality test. The resultant series is tested for seasonality. A seasonal dummy model with AR(1) errors is fit and the joint significance of the seasonal dummy estimates is tested. If the seasonal dummies are significant, the AIC statistic for this model is compared to the AIC for an AR(1) model without seasonal dummies. If the AIC for the seasonal model is lower than that of the nonseasonal model, the Seasonal option is set to YES.
Statistics of Fit

This section explains the goodness-of-fit statistics reported to measure how well different models fit the data. The statistics of fit for the various forecasting models can be viewed or stored in a data set by using the Model Viewer window.

Statistics of fit are computed by using the actual and forecasted values for observations in the period of evaluation. One-step forecasted values are used whenever possible, including the case when a hold-out sample contains no missing values. If a one-step forecast for an observation cannot be computed due to missing values for previous series observations, a multi-step forecast is computed, using the minimum number of steps that the previous nonmissing values in the data range permit.

The various statistics of fit reported are as follows. In these formulas, n is the number of nonmissing observations and k is the number of fitted parameters in the model.

Number of Nonmissing Observations. The number of nonmissing observations used to fit the model.

Number of Observations. The total number of observations used to fit the model, including both missing and nonmissing observations.

Number of Missing Actuals. The number of missing actual values.

Number of Missing Predicted Values. The number of missing predicted values.

Number of Model Parameters. The number of parameters fit to the data. For a combined forecast, this is the number of forecast components.

Total Sum of Squares (Uncorrected). The total sum of squares for the series, SST, uncorrected for the mean: $\sum_{t=1}^{n} y_t^2$.

Total Sum of Squares (Corrected). The total sum of squares for the series, SST, corrected for the mean: $\sum_{t=1}^{n}(y_t - \bar{y})^2$, where $\bar{y}$ is the series mean.

Sum of Square Errors. The sum of the squared prediction errors, SSE. $\mathrm{SSE} = \sum_{t=1}^{n}(y_t - \hat{y}_t)^2$, where $\hat{y}_t$ is the one-step predicted value.

Mean Squared Error. The mean squared prediction error, MSE, calculated from the one-step-ahead forecasts. $\mathrm{MSE} = \frac{1}{n}\mathrm{SSE}$. This formula enables you to evaluate small hold-out samples.

Root Mean Squared Error. The root mean square error (RMSE), $\sqrt{\mathrm{MSE}}$.

Mean Absolute Percent Error. The mean absolute percent prediction error (MAPE), $\frac{100}{n}\sum_{t=1}^{n}|(y_t - \hat{y}_t)/y_t|$. The summation ignores observations where $y_t = 0$.

Mean Absolute Error. The mean absolute prediction error, $\frac{1}{n}\sum_{t=1}^{n}|y_t - \hat{y}_t|$.

R-Square. The $R^2$ statistic, $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$. If the model fits the series badly, the model error sum of squares, SSE, can be larger than SST and the $R^2$ statistic will be negative.

Adjusted R-Square. The adjusted $R^2$ statistic, $1 - \left(\frac{n-1}{n-k}\right)(1 - R^2)$.

Amemiya's Adjusted R-Square. Amemiya's adjusted $R^2$, $1 - \left(\frac{n+k}{n-k}\right)(1 - R^2)$.

Random Walk R-Square. The random walk $R^2$ statistic (Harvey's $R^2$ statistic by using the random walk model for comparison), $1 - \left(\frac{n-1}{n}\right)\mathrm{SSE}/\mathrm{RWSSE}$, where $\mathrm{RWSSE} = \sum_{t=2}^{n}(y_t - y_{t-1} - \mu)^2$, and $\mu = \frac{1}{n-1}\sum_{t=2}^{n}(y_t - y_{t-1})$.

Akaike's Information Criterion. Akaike's information criterion (AIC), $n\ln(\mathrm{MSE}) + 2k$.

Schwarz Bayesian Information Criterion. Schwarz Bayesian information criterion (SBC or BIC), $n\ln(\mathrm{MSE}) + k\ln(n)$.

Amemiya's Prediction Criterion. Amemiya's prediction criterion, $\frac{1}{n}\mathrm{SST}\left(\frac{n+k}{n-k}\right)(1 - R^2) = \left(\frac{n+k}{n-k}\right)\frac{1}{n}\mathrm{SSE}$.

Maximum Error. The largest prediction error.

Minimum Error. The smallest prediction error.

Maximum Percent Error. The largest percent prediction error, $100\max((y_t - \hat{y}_t)/y_t)$. The summation ignores observations where $y_t = 0$.

Minimum Percent Error. The smallest percent prediction error, $100\min((y_t - \hat{y}_t)/y_t)$. The summation ignores observations where $y_t = 0$.

Mean Error. The mean prediction error, $\frac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)$.

Mean Percent Error. The mean percent prediction error, $\frac{100}{n}\sum_{t=1}^{n}\frac{(y_t - \hat{y}_t)}{y_t}$. The summation ignores observations where $y_t = 0$.
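Several of these statistics can be computed directly from paired actual and predicted values. The Python sketch below is illustrative only: the function name and dictionary keys are hypothetical, missing values are represented by `None`, and the corrected total sum of squares is assumed for the R-square statistics.

```python
import math


def fit_statistics(actual, predicted, k):
    """A subset of the statistics of fit for k fitted parameters."""
    pairs = [(a, f) for a, f in zip(actual, predicted)
             if a is not None and f is not None]
    n = len(pairs)  # number of nonmissing observations
    errors = [a - f for a, f in pairs]
    mean_y = sum(a for a, _ in pairs) / n
    sse = sum(e * e for e in errors)
    sst = sum((a - mean_y) ** 2 for a, _ in pairs)  # corrected SST (assumed for R-square)
    mse = sse / n
    r2 = 1 - sse / sst
    pct = [(a - f) / a for a, f in pairs if a != 0]  # ignore observations where y_t = 0
    return {
        "SSE": sse,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 / n * sum(abs(q) for q in pct),
        "ME": sum(errors) / n,
        "RSQUARE": r2,
        "ADJRSQ": 1 - (n - 1) / (n - k) * (1 - r2),
        "AIC": n * math.log(mse) + 2 * k,
        "SBC": n * math.log(mse) + k * math.log(n),
    }
```

Note that AIC and SBC as defined here depend on the data only through MSE, so they are comparable only across models evaluated on the same observations.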
References Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Transaction on Automatic Control, AC-19, 716–723. Aldrin, M. and Damsleth, E. (1989), “Forecasting Nonseasonal Time Series with Missing Observations,” Journal of Forecasting, 8, 97–116. Anderson, T.W. (1971), The Statistical Analysis of Time Series, New York: John Wiley & Sons. Ansley, C. (1979), “An Algorithm for the Exact Likelihood of a Mixed Autoregressive MovingAverage Process,” Biometrika, 66, 59. Ansley, C. and Newbold, P. (1980), “Finite Sample Properties of Estimators for Autoregressive Moving-Average Models,” Journal of Econometrics, 13, 159. Archibald, B.C. (1990), “Parameter Space of the Holt-Winters Model,” International Journal of Forecasting, 6, 199–209. Bartolomei, S.M. and Sweet, A.L. (1989), “A Note on the Comparison of Exponential Smoothing Methods for Forecasting Seasonal Series,” International Journal of Forecasting, 5, 111–116. Bhansali, R.J. (1980), “Autoregressive and Window Estimates of the Inverse Correlation Function,” Biometrika, 67, 551–566. Bowerman, B.L. and O’Connell, R.T. (1979), Time Series and Forecasting: An Applied Approach, North Scituate, Massachusetts: Duxbury Press. Box, G.E.P. and Cox D.R. (1964), “An Analysis of Transformations,” Journal of Royal Statistical Society B, No. 26, 211–243. Box, G.E.P. and Jenkins, G.M. (1976), Time Series Analysis: Forecasting and Control, Revised Edition, San Francisco: Holden-Day. Box, G.E.P. and Tiao, G.C. (1975), “Intervention Analysis with Applications to Economic and Environmental Problems,” JASA, 70, 70–79. Brocklebank, J.C. and Dickey, D.A. (1986), SAS System for Forecasting Time Series, 1986 Edition,
Cary, North Carolina: SAS Institute Inc. Brown, R.G. (1962), Smoothing, Forecasting, and Prediction of Discrete Time Series, New York: Prentice-Hall. Brown, R.G. and Meyer, R.F. (1961), “The Fundamental Theorem of Exponential Smoothing,” Operations Research, 9, 673–685. Chatfield, C. (1978), “The Holt-Winters Forecasting Procedure,” Applied Statistics, 27, 264–279. Chatfield, C., and Prothero, D.L. (1973), “Box-Jenkins Seasonal Forecasting: Problems in a Case Study,” Journal of the Royal Statistical Society, Series A, 136, 295–315. Chatfield, C. and Yar, M. (1988), “Holt-Winters Forecasting: Some Practical Issues,” The Statistician, 37, 129–140. Chatfield, C. and Yar, M. (1991), “Prediction Intervals for Multiplicative Holt-Winters,” International Journal of Forecasting, 7, 31–37. Cogger, K.O. (1974), “The Optimality of General-Order Exponential Smoothing,” Operations Research, 22, 858. Cox, D. R. (1961), “Prediction by Exponentially Weighted Moving Averages and Related Methods,” Journal of the Royal Statistical Society, Series B, 23, 414–422. Davidson, J. (1981), “Problems with the Estimation of Moving-Average Models,” Journal of Econometrics, 16, 295. Dickey, D. A., and Fuller, W.A. (1979), “Distribution of the Estimators for Autoregressive Time Series with a Unit Root,” Journal of the American Statistical Association, 74(366), 427–431. Dickey, D. A., Hasza, D. P., and Fuller, W.A. (1984), “Testing for Unit Roots in Seasonal Time Series,” Journal of the American Statistical Association, 79(386), 355–367. Fair, R.C. (1986), “Evaluating the Predictive Accuracy of Models,” in Handbook of Econometrics, Vol. 3., Griliches, Z. and Intriligator, M.D., eds., New York: North Holland. Fildes, R. (1979), “Quantitative Forecasting—the State of the Art: Extrapolative Models,” Journal of Operational Research Society, 30, 691–710. Fuller, W.A. (1976), Introduction to Statistical Time Series, New York: John Wiley & Sons. Gardner, E.S., Jr. 
(1984), “The Strange Case of the Lagging Forecasts,” Interfaces, 14, 47–50. Gardner, E.S., Jr. (1985), “Exponential Smoothing: the State of the Art,” Journal of Forecasting, 4, 1–38. Granger, C.W.J. and Newbold, P. (1977), Forecasting Economic Time Series, New York: Academic Press, Inc. Greene, W.H. (1993), Econometric Analysis, Second Edition, New York: Macmillan Publishing Company.
Hamilton, J. D. (1994), Time Series Analysis, Princeton: Princeton University Press. Harvey, A.C. (1981), Time Series Models, New York: John Wiley & Sons. Harvey, A.C. (1984), “A Unified View of Statistical Forecasting Procedures,” Journal of Forecasting, 3, 245–275. Hopewood, W.S., McKeown, J.C., and Newbold, P. (1984), “Time Series Forecasting Models Involving Power Transformations,” Journal of Forecasting, Vol 3, No. 1, 57–61. Jones, Richard H. (1980), “Maximum Likelihood Fitting of ARMA Models to Time Series with Missing Observations,” Technometrics, 22, 389–396. Judge, G.G., Griffiths, W.E., Hill, R.C., and Lee, T.C. (1980), The Theory and Practice of Econometrics, New York: John Wiley & Sons. Ledolter, J. and Abraham, B. (1984), “Some Comments on the Initialization of Exponential Smoothing,” Journal of Forecasting, 3, 79–84. Ljung, G.M. and Box, G.E.P. (1978), “On a Measure of Lack of Fit in Time Series Models,” Biometrika, 65, 297–303. Makridakis, S., Wheelwright, S.C., and McGee, V.E. (1983), Forecasting: Methods and Applications, Second Edition, New York: John Wiley & Sons. McKenzie, Ed (1984), “General Exponential Smoothing and the Equivalent ARMA Process,” Journal of Forecasting, 3, 333–344. McKenzie, Ed (1986), “Error Analysis for Winters’ Additive Seasonal Forecasting System,” International Journal of Forecasting, 2, 373–382. Montgomery, D.C. and Johnson, L.A. (1976), Forecasting and Time Series Analysis, New York: McGraw-Hill. Morf, M., Sidhu, G.S., and Kailath, T. (1974), “Some New Algorithms for Recursive Estimation on Constant Linear Discrete Time Systems,” I.E.E.E. Transactions on Automatic Control, AC-19, 315–323. Nelson, C.R. (1973), Applied Time Series for Managerial Forecasting, San Francisco: Holden-Day. Newbold, P. (1981), “Some Recent Developments in Time Series Analysis,” International Statistical Review, 49, 53–66. Newton, H. 
Joseph and Pagano, Marcello (1983), “The Finite Memory Prediction of Covariance Stationary Time Series,” SIAM Journal of Scientific and Statistical Computing, 4, 330–339. Pankratz, Alan (1983), Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, New York: John Wiley & Sons. Pankratz, Alan (1991), Forecasting with Dynamic Regression Models, New York: John Wiley & Sons.
Pankratz, A. and Dudley, U. (1987), “Forecast of Power-Transformed Series,” Journal of Forecasting, Vol 6, No. 4, 239–248. Pearlman, J.G. (1980), “An Algorithm for the Exact Likelihood of a High-Order Autoregressive Moving-Average Process,” Biometrika, 67, 232–233. Priestly, M.B. (1981), Spectral Analysis and Time Series, Volume 1: Univariate Series, New York: Academic Press, Inc. Roberts, S.A. (1982), “A General Class of Holt-Winters Type Forecasting Models,” Management Science, 28, 808–820. Schwarz, G. (1978), “Estimating the Dimension of a Model,” Annals of Statistics, 6, 461–464. Sweet, A.L. (1985), “Computing the Variance of the Forecast Error for the Holt-Winters Seasonal Models,” Journal of Forecasting, 4, 235–243. Winters, P.R. (1960), “Forecasting Sales by Exponentially Weighted Moving Averages,” Management Science, 6, 324–342. Yar, M. and Chatfield, C. (1990), “Prediction Intervals for the Holt-Winters Forecasting Procedure,” International Journal of Forecasting, 6, 127–137. Woodfield, T.J. (1987), “Time Series Intervention Analysis Using SAS Software,” Proceedings of the Twelfth Annual SAS Users Group International Conference, 331–339. Cary, NC: SAS Institute Inc.
Part V
Investment Analysis
Chapter 45
Overview

Contents
About Investment Analysis
Starting Investment Analysis
Getting Help
Using Help
Software Requirements
About Investment Analysis

The Investment Analysis system is an interactive environment for the time-value of money of a variety of investments: loans, savings, depreciations, bonds, and generic cashflows.

Various analyses are provided to help analyze the value of investment alternatives: time value, periodic equivalent, internal rate of return, benefit-cost ratio, and breakeven analysis. These analyses can help answer a number of questions you may have about your investments:

Which option is more profitable or less costly?
Is it better to buy or rent?
Are the extra fees for refinancing at a lower interest rate justified?
What is the balance of this account after saving this amount periodically for so many years?
How much is legally tax-deductible?
2698 F Chapter 45: Overview
Is this a reasonable price? Investment Analysis can be beneficial to users in many industries for a variety of decisions: manufacturing: cost justification of automation or any capital investment, replacement analysis of major equipment, or economic comparison of alternative designs government: setting funds for services finance: investment analysis and portfolio management for fixed-income securities
Starting Investment Analysis

There are two ways to invoke Investment Analysis from the main SAS window. One way is to select Solutions → Analysis → Investment Analysis from the main SAS menu, as displayed in Figure 45.1.

Figure 45.1 Initializing Investment Analysis with the Menu Bar

The other way is to type INVEST into the toolbar’s command prompt, as displayed in Figure 45.2.

Figure 45.2 Initializing Investment Analysis with the Toolbar
Getting Help

You can get help in Investment Analysis in three ways. One way is to use the Help menu, as displayed in Figure 45.3. This is the right-most menu item on the menu bar.

Figure 45.3 The Help Menu

Help buttons, as in Figure 45.4, provide another way to access help. Most dialog boxes provide help buttons in their lower-right corners.

Figure 45.4 A Help Button

Also, the toolbar has a button (see Figure 45.5) that invokes the help system. This is the right-most icon on the toolbar.

Figure 45.5 The Help Icon
Each of these methods invokes a browser that gives specific help for the active window.
Using Help

The chapters pertaining to Investment Analysis in this document typically have a section that introduces you to a menu and summarizes the options available through the menu. Such chapters then have sections titled Tasks and Dialog Box Guide. The Tasks section describes how to perform many useful tasks. The Dialog Box Guide lists all dialog boxes pertinent to those tasks and gives a brief description of each element of each dialog box.
Software Requirements

Investment Analysis uses the following SAS software:

  Base SAS
  SAS/ETS
  SAS/GRAPH (optional, to view bond pricing and breakeven graphs)
Chapter 46
Portfolios

Contents
  The File Menu
  Tasks
    Creating a New Portfolio
    Saving a Portfolio
    Opening an Existing Portfolio
    Saving a Portfolio to a Different Name
    Selecting Investments within a Portfolio
  Dialog and Utility Guide
    Investment Analysis
    Menu Bar Options
    Right-Clicking within the Portfolio Area
The File Menu

Investment Analysis stores portfolios as catalog entries. Portfolios contain a collection of investments, providing a structure to collect investments with a common purpose or goal (like a retirement or building fund portfolio). It can also be advantageous to collect competing investments into a common portfolio when you want to perform a comparative analysis on them. Within this structure you can perform computations and analyses on a collection of investments in a portfolio, just as you would perform them on a single investment. Investment Analysis provides many tools to aid in your manipulation of portfolios through the File menu, shown in Figure 46.1.
Figure 46.1 File Menu
The File menu offers the following items:

New Portfolio creates an empty portfolio with a new name.
Open Portfolio opens the standard SAS Open dialog box where you select a portfolio to open.
Save Portfolio saves the current portfolio to its current name.
Save Portfolio As opens the standard SAS Save As dialog box where you supply a new portfolio name for the current portfolio.
Close closes Investment Analysis.
Exit closes SAS (Windows only).
Tasks
Creating a New Portfolio

From the Investment Analysis dialog box, select File → New Portfolio.
Figure 46.2 Creating a New Portfolio
The Portfolio Name is WORK.INVEST.INVEST1 as displayed in Figure 46.2, unless you have saved a portfolio to that name in the past. In that case, some other unused portfolio name is given to the new portfolio.
Saving a Portfolio

From the Investment Analysis dialog box, select File → Save Portfolio. The portfolio is saved to a catalog entry with the name in the Portfolio Name box.

Opening an Existing Portfolio

From the Investment Analysis dialog box, select File → Open Portfolio. This opens the standard SAS Open dialog box. You enter the name of a SAS portfolio to open in the Entry Name box. For example, enter SASHELP.INVSAMP.NVST as displayed in Figure 46.3.
Figure 46.3 Opening an Existing Portfolio
Click Open to load the portfolio. The portfolio should look like Figure 46.4.

Figure 46.4 The Opened Portfolio
Saving a Portfolio to a Different Name

From the Investment Analysis dialog box, select File → Save Portfolio As. This opens the standard SAS Save As dialog box. You can enter the name of a SAS portfolio into the Entry Name box. For example, enter SASUSER.MY_PORTS.PORT1, as in Figure 46.5.
Figure 46.5 Saving a Portfolio to a Different Name
Click Save to save the portfolio.
Selecting Investments within a Portfolio

To select a single investment in an opened portfolio, click the investment in the Portfolio area within the Investment Analysis dialog box. To select a list of adjacent investments, click the first investment, hold down SHIFT, and click the final investment. After the list of investments is selected, you can release the SHIFT key. The selected investments appear highlighted as in Figure 46.6.

Figure 46.6 Selecting Investments within a Portfolio
Dialog and Utility Guide
Investment Analysis

Figure 46.7 Investment Analysis Dialog Box

Investment Portfolio Name holds the name of the portfolio. The name is of the form library.catalog_entry.portfolio. The default portfolio name is work.invest.invest1, as in Figure 46.7.
Portfolio Description provides a more descriptive explanation of the portfolio’s contents. You can edit this description any time this dialog box is active.

The Portfolio area contains the list of investments that make up the portfolio. Each investment in the Portfolio area displays the following attributes:

Name is the name of the investment. It must be a valid SAS name. It is used to distinguish investments when performing analyses and computations.
Label is a place where you can provide a more descriptive explanation of the investment.
Type is the type of investment, which is fixed when you create the investment. It is one of the following: LOAN, SAVINGS, DEPRECIATION, BOND, or OTHER.

Additional tools to aid in the management of your portfolio are available by selecting from the menu bar or by right-clicking within the Portfolio area.
Menu Bar Options

Figure 46.8 The Menu Bar

The menu bar (shown in Figure 46.8) provides many tools to aid in the management of portfolios and the investments that comprise them. The following menu items provide functionality particular to Investment Analysis:

File opens and saves portfolios.
Investment creates new investments within the portfolio.
Compute performs constant dollar, after tax, and currency conversion computations on generic cashflows.
Analyze analyzes investments to aid in decision-making.
Tools sets default values of inflation and income tax rates.
Right-Clicking within the Portfolio Area

Figure 46.9 Right-Clicking

After selecting an investment, right-clicking in the Portfolio area pops up a menu (see Figure 46.9) that offers the following options:

Edit opens the selected investment within the portfolio.
Duplicate creates a duplicate of the selected investment within the portfolio.
Delete removes the selected investment from the portfolio.

If you wish to perform one of these actions on a collection of investments, you must select a collection of investments (as described in the section “Selecting Investments within a Portfolio”) before right-clicking.
Chapter 47
Investments

Contents
  The Investment Menu
  Tasks
    Loan Tasks
    Specifying Savings Terms to Create an Account Summary
    Depreciation Tasks
    Bond Tasks
    Generic Cashflow Tasks
  Dialog Box Guide
    Loan
    Loan Initialization Options
    Loan Prepayments
    Balloon Payments
    Rate Adjustment Terms
    Rounding Off
    Savings
    Depreciation
    Depreciation Table
    Bond
    Bond Analysis
    Bond Price
    Generic Cashflow
    Right-Clicking within Generic Cashflow’s Cashflow Specification Area
    Flow Specification
    Forecast Specification
The Investment Menu

Because there are many types of investments, a tool that manages and analyzes collections of investments must be robust and flexible. Providing specifications for four specific investment types and one generic type, Investment Analysis can model almost any real-world investment.
Figure 47.1 Investment Menu
The Investment menu, shown in Figure 47.1, offers the following items:

New → Loan opens the Loan dialog box. Loans are useful for acquiring capital to pursue various interests. Available terms include rate adjustments for variable rate loans, initialization costs, prepayments, and balloon payments.

New → Savings opens the Savings dialog box. Savings are necessary when planning for the future, whether for business or personal purposes. Account summary calculations available per deposit include starting balance, deposits, interest earned, and ending balance.

New → Depreciation opens the Depreciation dialog box. Depreciations are relevant in tax calculation. The available depreciation methods are Straight Line, Sum-of-years Digits, Depreciation Table, and Declining Balance. Depreciation Tables are necessary when depreciation calculations must conform to set yearly percentages. Declining Balance with conversion to Straight Line is also provided.

New → Bond opens the Bond dialog box. Bonds have widely varying terms depending on the issuer. Because bond issuers frequently auction their bonds, the ability to price a bond between the issue date and maturity date is desirable. Fixed-coupon bonds may be analyzed for the following: price versus yield-to-maturity, duration, and convexity. These are available at different times in the bond’s life.

New → Generic Cashflow opens the Generic Cashflow dialog box. Generic cashflows are the most flexible investments. Only a sequence of date-amount pairs is necessary for specification. You can enter date-amount pairs and load values from SAS data sets to specify any type of investment. You can generate uniform, arithmetic, and geometric cashflows with ease. SAS’s forecasting ability is available to forecast future cashflows as well. The new graphical display aids in visualization of the cashflow and enables the user to change the frequency of the cashflow view to aggregate and disaggregate the view.

Edit opens the specification dialog box for an investment selected within the portfolio.

Duplicate creates a duplicate of an investment selected within the portfolio.

Delete removes an investment selected from the portfolio.

If you want to edit, duplicate, or delete a collection of investments, you must select a collection of investments as described in the section “Selecting Investments within a Portfolio” before performing the menu option.
Tasks
Loan Tasks

Suppose you want to buy a home that costs $100,000. You can make a down payment of $20,000. Hence, you need a loan of $80,000. You are able to acquire a 30-year loan at 7% interest starting January 1, 2000. Let’s use Investment Analysis to specify and analyze this loan.

In the Investment Analysis dialog box, select Investment → New → Loan from the menu bar to open the Loan dialog box.

Specifying Loan Terms to Create an Amortization Schedule

You must specify the loan before generating the amortization table. To specify the loan, follow these steps:
1. Enter MORTGAGE for the Name.
2. Enter 80000 for the Loan Amount.
3. Enter 7 for the Initial Rate.
4. Enter 360 for the Number of Payments.
5. Enter 01JAN2000 for the Start Date.

After you have specified the loan, click Create Amortization Schedule to generate the amortization schedule displayed in Figure 47.2.
Figure 47.2 Creating an Amortization Schedule
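The arithmetic behind such a schedule can be sketched outside of Investment Analysis. The following Python fragment is only an illustration, assuming monthly payments, monthly compounding, and no rounding of intermediate values; the function names are hypothetical and are not part of SAS software. Under those assumptions the level payment for the $80,000 loan at 7% over 360 payments works out to about $532.24.

```python
def periodic_payment(principal, annual_rate_pct, n_payments, periods_per_year=12):
    """Level payment that amortizes `principal` over `n_payments` periods."""
    r = annual_rate_pct / 100 / periods_per_year  # periodic interest rate
    if r == 0:
        return principal / n_payments
    return principal * r / (1 - (1 + r) ** -n_payments)


def amortization_schedule(principal, annual_rate_pct, n_payments, periods_per_year=12):
    """Rows of (payment number, beginning principal, payment, interest,
    principal repayment, ending principal). No rounding is applied."""
    r = annual_rate_pct / 100 / periods_per_year
    payment = periodic_payment(principal, annual_rate_pct, n_payments, periods_per_year)
    rows = []
    balance = principal
    for k in range(1, n_payments + 1):
        interest = balance * r          # interest accrued since the previous date
        repaid = payment - interest     # part of the payment that reduces principal
        ending = balance - repaid
        rows.append((k, balance, payment, interest, repaid, ending))
        balance = ending
    return rows


payment = periodic_payment(80000, 7, 360)
schedule = amortization_schedule(80000, 7, 360)
print(f"monthly payment: {payment:.2f}")
for row in schedule[:3]:
    print("{:3d} {:10.2f} {:8.2f} {:8.2f} {:8.2f} {:10.2f}".format(*row))
```

The schedule produced by Investment Analysis may differ in the last cents because of its own rounding options.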
Storing Other Loan Terms

Let’s include information concerning the purchase price and down payment. These terms are not necessary to specify the loan, but it may be advantageous to store such information with the loan. Consider the loan described in the section “Loan Tasks.” In the Loan dialog box (Figure 47.2) click Initialization to open the Loan Initialization Options dialog box, where you can specify the down payment, initialization costs, and discount points. To specify the down payment, enter 100000 for the Purchase Price, as shown in Figure 47.3.
Figure 47.3 Including the Purchase Price
Click OK to return to the Loan dialog box.
Adding Prepayments

Now let’s observe the effect of prepayments on the loan. Consider the loan described in the section “Loan Tasks.” You must pay a minimum of $532.24 each month to keep up with payments. However, let’s say you dislike entering this amount in your checkbook. You would rather pay $550.00 to keep the arithmetic simpler. This would constitute a uniform prepayment of $17.76 each month. In the Loan dialog box, click Prepayments, which opens the Loan Prepayments dialog box shown in Figure 47.4.
Figure 47.4 Specifying the Loan Prepayments
You can specify an arbitrary sequence of prepayments in the Prepayments area. If you want a uniform prepayment, clear the Prepayments area and enter the uniform payment amount in the Uniform Prepayment text box. That amount will be added to each payment until the loan is paid off. To specify this uniform prepayment, follow these steps:
1. Enter 17.76 for the Uniform Prepayment.
2. Click OK to return to the Loan dialog box.
3. Click Create Amortization Schedule, and the amortization schedule is updated, as displayed in Figure 47.5.
Figure 47.5 The Amortization Schedule with Loan Prepayments
The last payment falls in January 2030 without prepayments and in February 2027 with them; you would pay the loan off almost three years earlier with the $17.76 prepayments.

To continue this example you must remove the prepayments from the loan specification, following these steps:
1. Reopen the Loan Prepayments dialog box from the Loan dialog box by clicking Prepayments.
2. Enter 0 for Uniform Prepayment.
3. Click OK to return to the Loan dialog box.
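The payoff dates quoted for this example can be checked with a short simulation. This is a minimal sketch, assuming monthly compounding, a first payment one month after the 01JAN2000 start date, and no rounding; the helper names are hypothetical. Paying $550.00 instead of $532.24 retires the loan in 325 payments rather than 360.

```python
def payments_to_payoff(principal, annual_rate_pct, payment, periods_per_year=12):
    """Count the payments needed to bring the balance to zero or below."""
    r = annual_rate_pct / 100 / periods_per_year
    if payment <= principal * r:
        raise ValueError("payment does not cover the interest; the loan never pays off")
    balance = principal
    count = 0
    while balance > 0:
        balance = balance * (1 + r) - payment  # accrue interest, then pay
        count += 1
    return count


def add_months(year, month, n):
    """Return the (year, month) that lies n months after the given month."""
    index = year * 12 + (month - 1) + n
    return index // 12, index % 12 + 1


n_with_prepayment = payments_to_payoff(80000, 7, 550.00)
last_with = add_months(2000, 1, n_with_prepayment)   # start date is January 2000
last_without = add_months(2000, 1, 360)              # scheduled 360 payments

print(n_with_prepayment, last_with, last_without)
```

The difference is 35 months, which is the "almost three years" mentioned above.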
Adding Balloon Payments

Consider the loan described in the section “Loan Tasks.” Suppose you cannot afford the payments of $532.24 each month. To lessen your monthly payment, you could pay balloon payments of $6,000 at the end of 2007 and 2023. You wonder how this would affect your monthly payment. (Note that Investment Analysis does not allow both balloon payments and rate adjustments to be specified for a loan.) In the Loan dialog box, click Balloon Payments, which opens the Balloon Payments dialog box shown in Figure 47.6.
Figure 47.6 Defining Loan Balloon Payments
You can specify an arbitrary sequence of balloon payments by adding date-amount pairs to the Balloon Payments area. To specify these balloon payments, follow these steps:
1. Right-click within the Balloon Payments area (which pops up a menu) and release on New.
2. Set the pair’s Date to 01JAN2007.
3. Set Amount to 6000.
4. Right-click within the Balloon Payments area (which pops up a menu) and release on New.
5. Set the new pair’s Date to 01JAN2023.
6. Set its Amount to 6000.

Click OK to return to the Loan dialog box. Click Create Amortization Schedule, and the amortization schedule is updated. Your monthly payment is now $500.30, a difference of approximately $32 each month.

To continue this example you must remove the balloon payments from the loan specification, following these steps:
1. Reopen the Balloon Payments dialog box.
2. Right-click within the Balloon Payments area (which pops up a menu) and release on Clear.
3. Click OK to return to the Loan dialog box.
Handling Rate Adjustments

Consider the loan described in the section “Loan Tasks.” Another option for lowering your payments is to get a variable rate loan. You can acquire a three-year adjustable rate mortgage (ARM) at 6% with a periodic cap of 1% and a maximum rate of 9%. (Note that Investment Analysis does not allow both rate adjustments and balloon payments to be specified for a loan.) In the Loan dialog box, click Rate Adjustments to open the Rate Adjustment Terms dialog box shown in Figure 47.7.
Figure 47.7 Setting the Rate Adjustments
To specify these loan adjustment terms, follow these steps:
1. Enter 3 for the Life Cap. The Life Cap is the maximum deviation from the Initial Rate.
2. Enter 1 for the Periodic Cap.
3. Enter 36 for the Adjustment Frequency.
4. Confirm that Worst Case is selected from the Rate Adjustment Assumption options.
5. Click OK to return to the Loan dialog box.
6. Enter 6 for the Initial Rate.
7. Click Create Amortization Schedule, and the amortization schedule is updated.

Your monthly payment drops to $479.64. However, if the worst-case scenario plays out, the payments will increase to $636.84 in nine years. Figure 47.8 displays amortization table information for the final few months under this scenario.
Figure 47.8 The Amortization Schedule with Rate Adjustments
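The worst-case rate path implied by these terms can be sketched as follows. This fragment is an illustration under stated assumptions, not a description of how Investment Analysis computes its schedule: it assumes the rate rises by the full Periodic Cap at every adjustment until it reaches the Initial Rate plus the Life Cap, and it checks only the initial payment at 6%. The function names are hypothetical, and the re-amortized payments after each adjustment are not recomputed here because they depend on the conventions of the tool.

```python
def periodic_payment(principal, annual_rate_pct, n_payments, periods_per_year=12):
    """Level payment that amortizes `principal` over `n_payments` periods."""
    r = annual_rate_pct / 100 / periods_per_year
    return principal * r / (1 - (1 + r) ** -n_payments)


def worst_case_rates(initial_rate, periodic_cap, life_cap, adjustment_frequency, n_payments):
    """Rate applied to each payment if every adjustment rises by the full
    periodic cap, limited to initial_rate + life_cap (an assumed reading
    of the Worst Case option)."""
    ceiling = initial_rate + life_cap
    rates = []
    for k in range(n_payments):
        adjustments = k // adjustment_frequency  # adjustments made before payment k + 1
        rates.append(min(initial_rate + adjustments * periodic_cap, ceiling))
    return rates


initial_payment = periodic_payment(80000, 6, 360)
rates = worst_case_rates(6, 1, 3, 36, 360)

print(f"initial payment: {initial_payment:.2f}")
print("rate for payments 1, 37, 73, 109:", rates[0], rates[36], rates[72], rates[108])
```

Under this reading the rate reaches the 9% maximum after 108 payments, that is, nine years into the loan.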
Click OK to return to the Investment Analysis dialog box.
Specifying Savings Terms to Create an Account Summary

Suppose you put $500 each month into an account that earns 6% interest for 20 years. What is the balance of the account after those 20 years?

In the Investment Analysis dialog box, select Investment → New → Savings from the menu bar to open the Savings dialog box. To specify the savings, follow these steps:
1. Enter RETIREMENT for the Name.
2. Enter 500 for the Periodic Deposit.
3. Enter 240 for the Number of Deposits.
4. Enter 6 for the Initial Rate.

You must specify the savings before generating the account summary. After you have specified the savings, click Create Account Summary to compute the ending date and balance and to generate the account summary displayed in Figure 47.9.

Figure 47.9 Creating an Account Summary
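As a rough cross-check of the account summary, the balance can be accumulated deposit by deposit. The sketch below assumes monthly compounding at 6%/12 and that each deposit is made at the end of its period, so it earns no interest in the period in which it is deposited; Investment Analysis may use a different timing convention, in which case its ending balance differs slightly. The function name is hypothetical.

```python
def account_summary(deposit, annual_rate_pct, n_deposits, periods_per_year=12):
    """Rows of (deposit number, starting balance, deposit, interest earned,
    ending balance), assuming end-of-period deposits."""
    r = annual_rate_pct / 100 / periods_per_year
    rows = []
    balance = 0.0
    for k in range(1, n_deposits + 1):
        starting = balance
        interest = starting * r              # interest on the balance carried in
        ending = starting + interest + deposit
        rows.append((k, starting, deposit, interest, ending))
        balance = ending
    return rows


summary = account_summary(500, 6, 240)
ending_balance = summary[-1][4]
print(f"balance after 240 deposits: {ending_balance:,.2f}")
```

Under these assumptions the balance after 20 years is roughly $231,000, of which $120,000 is deposits and the rest is interest.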
Click OK to return to the Investment Analysis dialog box.
Depreciation Tasks

Commercial assets are considered to lose value as time passes. For tax purposes, you want to quantify this loss. This investment structure helps calculate appropriate values.

Suppose you spend $50,000 for a commercial fishing boat that is considered to have a ten-year useful life. How would you depreciate it?

In the Investment Analysis dialog box, select Investment → New → Depreciation from the menu bar to open the Depreciation dialog box.

Specifying Depreciation Terms to Create a Depreciation Table

To specify the depreciation, follow these steps:
1. Enter FISHING_BOAT for the Name.
2. Enter 50000 for the Cost.
3. Enter 2000 for the Year of Purchase.
4. Enter 10 for the Useful Life.
5. Enter 0 for the Salvage Value.

You must specify the depreciation before generating the depreciation schedule. After you have specified the depreciation, click Create Depreciation Schedule to generate a depreciation schedule like the one displayed in Figure 47.10.

Figure 47.10 Creating a Depreciation Schedule
The default depreciation method is Declining Balance (with Conversion to Straight Line). Try the following methods to see how they each affect the schedule:

  Straight Line
  Sum-of-years Digits
  Declining Balance (without conversion to Straight Line)

It might be useful to compare the value of the boat at 5 years for each method. A description of these methods is available in the section “Depreciation Methods.”
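The comparison at five years can also be sketched numerically. The following Python fragment uses the textbook definitions of the three methods and is not taken from SAS software; in particular, the declining balance factor of 2 (double declining balance) is an assumption, since the factor used by Investment Analysis is not stated in this section. The function names are hypothetical.

```python
def straight_line(cost, salvage, life):
    """Equal depreciation in every year of the useful life."""
    return [(cost - salvage) / life] * life


def sum_of_years_digits(cost, salvage, life):
    """Depreciation weighted by remaining life: life/S, (life-1)/S, ..., 1/S."""
    total = life * (life + 1) // 2
    return [(cost - salvage) * (life - k) / total for k in range(life)]


def declining_balance(cost, salvage, life, factor=2.0, convert_to_straight_line=False):
    """Depreciate book value by factor/life each year (factor=2 is assumed).
    With conversion, switch to straight line once it gives a larger amount."""
    book = cost
    amounts = []
    for year in range(life):
        amount = book * factor / life
        if convert_to_straight_line:
            amount = max(amount, (book - salvage) / (life - year))
        amount = min(amount, book - salvage)  # never depreciate below salvage
        amounts.append(amount)
        book -= amount
    return amounts


def book_value(cost, amounts, years):
    """Remaining value after the given number of years."""
    return cost - sum(amounts[:years])


cost, salvage, life = 50000, 0, 10
sl_5 = book_value(cost, straight_line(cost, salvage, life), 5)
syd_5 = book_value(cost, sum_of_years_digits(cost, salvage, life), 5)
db_5 = book_value(cost, declining_balance(cost, salvage, life), 5)
db_sl = declining_balance(cost, salvage, life, convert_to_straight_line=True)

print(f"straight line: {sl_5:.2f}  sum-of-years digits: {syd_5:.2f}  declining balance: {db_5:.2f}")
```

With these assumptions the boat is worth $25,000 after five years under Straight Line, about $13,636 under Sum-of-years Digits, and $16,384 under Declining Balance.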
Using the Depreciation Table

Sometimes you want to force the depreciation rates to be certain percentages each year. This option is particularly useful for calculating modified accelerated cost recovery system (MACRS) depreciations. The United States’ Tax Reform Act of 1986 set depreciation rates for an asset based on an assumed lifetime for that asset. Since these lists of rates are important to many people, Investment Analysis provides SAS data sets for situations with yearly rates (using the “half-year convention”). Find them at SASHELP.MACRS* where * refers to the class of the property. For example, use SASHELP.MACRS15 for a fifteen-year property. (When using the MACRS with the Tax Reform Act tables, you must set the Salvage Value to zero.)

Suppose you want to compute the depreciation schedule for the commercial fishing boat described in the section “Depreciation Tasks.” The boat is a ten-year property according to the Tax Reform Act of 1986. To employ the MACRS depreciation from the Depreciation dialog box, follow these steps:
1. Select the Depreciation Table option within the Depreciation Method area and click OK. This opens the Depreciation Table dialog box.
2. Right-click within the Depreciation area (which pops up a menu) and select Load.
3. Enter SASHELP.MACRS10 for the Dataset Name.

The dialog box should look like Figure 47.11.
Figure 47.11 MACRS Percentages for a Ten-Year Property
Click OK to return to the Depreciation dialog box. Click Create Depreciation Schedule and the depreciation schedule fills (see Figure 47.12).
Note that there are eleven entries in this depreciation schedule. The half-year convention lets you deduct only half a year of depreciation in the first year, which leaves half a year to deduct after the useful life is over.

Figure 47.12 Depreciation Table with MACRS10
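A table-driven schedule of this kind can be sketched in a few lines. The percentages below are the half-year-convention rates for ten-year property as published in the IRS MACRS tables; that SASHELP.MACRS10 contains exactly these values is an assumption, and the function name is hypothetical.

```python
# Half-year-convention MACRS percentages for ten-year property (IRS tables).
# Assumed, not verified, to match the contents of SASHELP.MACRS10.
MACRS_10_YEAR = [10.00, 18.00, 14.40, 11.52, 9.22, 7.37, 6.55, 6.55, 6.56, 6.55, 3.28]


def table_depreciation(cost, percentages, first_year):
    """Rows of (year, percentage, deduction, ending book value)."""
    rows = []
    book = cost
    for offset, pct in enumerate(percentages):
        deduction = cost * pct / 100   # each rate applies to the original cost
        book -= deduction
        rows.append((first_year + offset, pct, deduction, book))
    return rows


macrs_schedule = table_depreciation(50000, MACRS_10_YEAR, 2000)
for year, pct, deduction, book in macrs_schedule:
    print(f"{year}  {pct:5.2f}%  {deduction:9.2f}  {book:9.2f}")
```

The eleven rows run from 2000 through 2010, and the deductions add up to the full $50,000 cost, consistent with a Salvage Value of zero.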
Click OK to return to the Investment Analysis dialog box.
Bond Tasks

Suppose someone offers to sell you a 20-year utility bond that was issued six years ago. It has a $1,000 face value and pays semiyear coupons at 2%. You can purchase it for $780. Would you be satisfied with this bond if you expect an 8% minimum attractive rate of return (MARR)?

In the Investment Analysis dialog box, select Investment → New → Bond from the menu bar to open the Bond dialog box.

Specifying Bond Terms

To specify the bond, follow these steps:
1. Enter UTILITY_BOND for the Name.
2. Enter 1000 for the Face Value.
3. Enter 2 for the Coupon Rate. The Coupon Payment updates to 20.
4. Select SEMIYEAR for Coupon Interval.
5. Enter 28 for the Number of Coupons. Because 14 years remain before the bond matures, the bond still has 28 semiyear coupons to pay. The Maturity Date updates.
Computing the Price from Yield

Enter 8 for Yield within the Valuation area. You see the bond’s value would be $666.72 as in Figure 47.13.

Figure 47.13 Bond Value
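The valuation can be approximated with the standard present-value formula for a fixed-coupon bond. This is a sketch under stated assumptions rather than the method used by Investment Analysis: it treats the 8% yield as a nominal annual rate compounded semiannually and prices the bond exactly on a coupon date with 28 coupons of $20 remaining. The function name is hypothetical.

```python
def bond_price(face, coupon_payment, n_coupons, annual_yield_pct, coupons_per_year=2):
    """Present value of the remaining coupons plus the face value at maturity."""
    y = annual_yield_pct / 100 / coupons_per_year   # yield per coupon period
    discount = (1 + y) ** -n_coupons                # discount factor for maturity
    annuity = n_coupons if y == 0 else (1 - discount) / y
    return coupon_payment * annuity + face * discount


price = bond_price(1000, 20, 28, 8)
print(f"price at an 8% yield: {price:.2f}")
```

Under these assumptions the price comes to about $666.74, within a few cents of the value quoted above; the small gap presumably reflects date or rounding conventions in the tool.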
Computing the Yield from Price

Now enter 780 for Value within the Valuation area. You see the yield is only 6.5%, as in Figure 47.14. This is not acceptable if you desire an 8% MARR.
Figure 47.14 Bond Yield
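Going from a price back to a yield has no closed form, so a numerical search is needed. The sketch below uses simple bisection on the same present-value formula, again assuming a nominal annual yield compounded semiannually and valuation on a coupon date; it is not necessarily the algorithm that Investment Analysis uses, and the function names are hypothetical.

```python
def bond_price(face, coupon_payment, n_coupons, annual_yield_pct, coupons_per_year=2):
    """Present value of the remaining coupons plus the face value at maturity."""
    y = annual_yield_pct / 100 / coupons_per_year
    discount = (1 + y) ** -n_coupons
    annuity = n_coupons if y == 0 else (1 - discount) / y
    return coupon_payment * annuity + face * discount


def yield_from_price(face, coupon_payment, n_coupons, price,
                     coupons_per_year=2, low=0.0, high=100.0, tol=1e-10):
    """Bisection search for the annual yield (in percent) that matches `price`.
    Price falls as yield rises, so a price above target means the yield is too low."""
    for _ in range(200):
        mid = (low + high) / 2
        if bond_price(face, coupon_payment, n_coupons, mid, coupons_per_year) > price:
            low = mid
        else:
            high = mid
        if high - low < tol:
            break
    return (low + high) / 2


ytm = yield_from_price(1000, 20, 28, 780)
print(f"yield at a price of 780: {ytm:.2f}%")
```

With these assumptions the yield comes out at roughly 6.4%, close to the figure quoted above and well short of the 8% MARR.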
Performing Bond Analysis

To perform bond-pricing analysis, follow these steps:
1. Click Analyze to open the Bond Analysis dialog box.
2. Enter 8.0 as the Yield to Maturity.
3. Enter 4.0 as the +/-.
4. Enter 0.5 as the Increment by.
5. Enter 780 as the Reference Price.
6. Click Create Bond Valuation Summary.

The Bond Valuation Summary area fills and shows you the different values for various yields as in Figure 47.15.
Figure 47.15 Bond Price Analysis
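A sweep of this kind can be imitated by pricing the bond over a grid of yields. The fragment below is a sketch only: it assumes the grid runs from Yield to Maturity minus +/- to Yield to Maturity plus +/- in steps of Increment by, and it reports the price and its difference from the Reference Price. The actual columns of the Bond Valuation Summary are not listed in this section, so the row layout here is an assumption, as are the function names.

```python
def bond_price(face, coupon_payment, n_coupons, annual_yield_pct, coupons_per_year=2):
    """Present value of the remaining coupons plus the face value at maturity."""
    y = annual_yield_pct / 100 / coupons_per_year
    discount = (1 + y) ** -n_coupons
    annuity = n_coupons if y == 0 else (1 - discount) / y
    return coupon_payment * annuity + face * discount


def valuation_summary(face, coupon_payment, n_coupons, center_yield,
                      plus_minus, increment, reference_price):
    """Rows of (yield in percent, price, price minus reference price)."""
    steps = round(2 * plus_minus / increment)
    rows = []
    for i in range(steps + 1):
        y = center_yield - plus_minus + i * increment
        price = bond_price(face, coupon_payment, n_coupons, y)
        rows.append((y, price, price - reference_price))
    return rows


summary_rows = valuation_summary(1000, 20, 28, 8.0, 4.0, 0.5, 780)
for y, price, diff in summary_rows:
    print(f"{y:5.1f}%  {price:9.2f}  {diff:+9.2f}")
```

The sign of the last column changes between 6.0% and 6.5%, which brackets the yield at which the bond is worth its $780 asking price.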
Creating a Price versus Yield-to-Maturity Graph

Click Graphics to open the Bond Price dialog box. This contains the price versus yield-to-maturity graph shown in Figure 47.16.
Figure 47.16 Bond Price Graph
Click Return to return to the Bond Analysis dialog box. In the Bond Analysis dialog box, click OK to return to the Bond dialog box. In the Bond dialog box, click OK to return to the Investment Analysis dialog box.
Generic Cashflow Tasks

To specify a generic cashflow, you merely define any sequence of date-amount pairs. The flexibility of generic cashflows enables the user to represent economic alternatives or investments that do not fit into loan, savings, depreciation, or bond specifications.

In the Investment Analysis dialog box, select Investment → New → Generic Cashflow from the menu bar to open the Generic Cashflow dialog box. Enter RETAIL for the Name as in Figure 47.17.
Figure 47.17 Introducing the Generic Cashflow
Right-Clicking within the Cashflow Specification Area

Right-clicking within Generic Cashflow’s Cashflow Specification area reveals the pop-up menu displayed in Figure 47.18. The menu provides many useful tools to assist you in creating these date-amount pairs.

Figure 47.18 Right-Clicking within the Cashflow Specification Area
The following sections describe how to use most of these right-click options. The Specify and Forecast menu items are described in the sections “Including a Generated Cashflow” and “Including a Forecasted Cashflow.”
Adding a New Date-Amount Pair
To add a new date-amount pair manually, follow these steps:
1. Right-click in the Cashflow Specification area as shown in Figure 47.18, and release on Add.
2. Enter 01JAN01 for the date.
3. Enter 100 for the amount.

Copying a Date-Amount Pair

To copy a selected date-amount pair, follow these steps:
1. Select the pair you just created.
2. Right-click in the Cashflow Specification area as shown in Figure 47.18, but this time release on Copy.

Sorting All of the Date-Amount Pairs

Change the second date to 01JAN00. Now the dates are unsorted. Right-click in the Cashflow Specification area as shown in Figure 47.18, and release on Sort.
Deleting a Date-Amount Pair
To delete a selected date-amount pair, follow these steps:
1. Select a date-amount pair.
2. Right-click in the Cashflow Specification area as shown in Figure 47.18, and release on Delete.
Clearing All of the Date-Amount Pairs
To clear all date-amount pairs, right-click in the Cashflow Specification area as shown in Figure 47.18, and release on Clear.
Loading Date-Amount Pairs from a Data Set
To load date-amount pairs from a SAS data set into the Cashflow Specification area, follow these steps:
1. Right-click in the Cashflow Specification area, and release on Load. This opens the Load Dataset dialog box.
2. Enter SASHELP.RETAIL for Dataset Name.
3. Click OK to return to the Generic Cashflow dialog box.

If there is a variable named Date in the SAS data set, Investment Analysis loads it into the list. If there is no such variable, it loads the first available date- or datetime-formatted variable. Investment Analysis then searches the SAS data set for an Amount variable to use. If none exists, it takes the first numeric variable that is not already used as the Date variable.
Saving Date-Amount Pairs to a Data Set
To save date-amount pairs from the Cashflow Specification area to a SAS data set, follow these steps:
1. Right-click in the Cashflow Specification area and release on Save. This opens the Save Dataset dialog box.
2. Enter the name of the SAS data set for Dataset Name.
3. Click OK to return to the Generic Cashflow dialog box.
Including a Generated Cashflow

To generate date-amount pairs for the Cashflow Specification area, follow these steps:
1. Right-click in the Cashflow Specification area and release on Specify. This opens the Flow Specification dialog box.
2. Select YEAR for the Time Interval.
3. Enter today’s date for the Starting Date.
4. Enter 10 for the Number of Periods. The Ending Date updates.
5. Enter 100 for the Level. You can visualize the specification in the Cashflow Chart area (see Figure 47.19).
6. Click Add to add the specified cashflow to the list in the Generic Cashflow dialog box. Clicking Add also returns you to the Generic Cashflow dialog box.
Figure 47.19 Uniform Cashflow Specification
Clicking Subtract subtracts the current cashflow from the Generic Cashflow dialog box, and it returns you to the Generic Cashflow dialog box.

You can generate arithmetic and geometric specifications by clicking them within the Series Flow Type area. However, you must enter a value for the Gradient. In both cases the Level value is the value of the list at the Starting Date. With an arithmetic flow type, entries increment by the value Gradient for each Time Interval. With a geometric flow type, entries increase by the factor Gradient for each Time Interval. Figure 47.20 displays an arithmetic cashflow with a Level of 100 and a Gradient of 12.
Figure 47.20 Arithmetic Cashflow Specification
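The three series flow types can be sketched as a small generator of date-amount pairs. The fragment is an illustration only: it supports a yearly Time Interval, assumes that Number of Periods is the number of pairs generated, and reads the geometric Gradient as a multiplicative factor (for example, 1.1 for 10% growth), which is one interpretation of the description above. The function name and the example start date are hypothetical.

```python
from datetime import date


def generate_cashflow(start, n_periods, level, flow_type="uniform", gradient=0.0):
    """Yearly date-amount pairs; `level` is the amount at the starting date."""
    pairs = []
    for k in range(n_periods):
        when = date(start.year + k, start.month, start.day)
        if flow_type == "uniform":
            amount = level
        elif flow_type == "arithmetic":
            amount = level + k * gradient      # grows by a fixed amount per interval
        elif flow_type == "geometric":
            amount = level * gradient ** k     # grows by a fixed factor per interval (assumed)
        else:
            raise ValueError(f"unknown flow type: {flow_type}")
        pairs.append((when, amount))
    return pairs


uniform = generate_cashflow(date(2000, 1, 1), 10, 100)
arithmetic = generate_cashflow(date(2000, 1, 1), 10, 100, "arithmetic", 12)
geometric = generate_cashflow(date(2000, 1, 1), 10, 100, "geometric", 1.1)

print([amount for _, amount in arithmetic])
```

For the arithmetic case with a Level of 100 and a Gradient of 12, the ten amounts run from 100 to 208 in steps of 12.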
Including a Forecasted Cashflow

To generate date-amount pairs for the Cashflow Specification area, follow these steps:
1. Right-click in the Cashflow Specification area and release on Forecast to open the Forecast Specification dialog box.
2. Enter sashelp.retail as the Data Set.
3. Select SALES for the Analysis Variable.
4. Click Compute Forecast to generate the forecast. You can visualize the forecast in the Cashflow Chart area (see Figure 47.21).
5. Click Add to add the forecast to the list in the Generic Cashflow dialog box. Clicking Add also returns you to the Generic Cashflow dialog box.
Figure 47.21 Cashflow Forecast
Clicking Subtract subtracts the current forecast from the Generic Cashflow dialog box, and it returns you to the Generic Cashflow dialog box. To review the values from the SAS data set you forecast, click View Table or View Graph. You can adjust the following values for the SAS data set you forecast: Time ID Variable, Time Interval, and Analysis Variable. You can adjust the following values for the forecast: the Horizon, the Confidence, and choice of predicted value, lower confidence limit, and upper confidence limit.
Using the Cashflow Chart
Three dialog boxes contain the Cashflow Chart to aid in your visualization of cashflows: Generic Cashflow, Flow Specification, and Forecast Specification. Within this chart, you have the following tools:
You can click a bar in the plot to view its Cashflow Date and Cashflow Amount.
You can change the aggregation period of the view with the box in the lower-left corner of the Cashflow Chart. For example, you can take the quarterly sales figures from the previous example, select YEAR as the value for this box, and view the annual sales figures.
You can change the number in the box to the right of the horizontal scroll bar to alter the number of entries you want to view. The number in that box must be no greater than the number of entries in the cashflow list. Decreasing this number has the effect of zooming in on a portion of the cashflow. When the number is less than the number of entries in the cashflow list, you can use the scroll bar at the bottom of the chart to scroll through the chart.
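Changing the aggregation period amounts to summing the date-amount pairs that fall into each period. The following Python sketch shows the idea for quarterly entries viewed by year. The function name aggregate_by_year and the quarterly amounts are invented for illustration; they are not values from sashelp.retail, and the chart's own aggregation logic may differ.

```python
from collections import defaultdict
from datetime import date


def aggregate_by_year(pairs):
    """Sum (date, amount) pairs into one total per calendar year."""
    totals = defaultdict(float)
    for cashflow_date, amount in pairs:
        totals[cashflow_date.year] += amount
    return sorted(totals.items())


# Made-up quarterly amounts, used only to show the roll-up.
quarterly = [
    (date(2023, 1, 1), 100.0), (date(2023, 4, 1), 120.0),
    (date(2023, 7, 1), 110.0), (date(2023, 10, 1), 130.0),
    (date(2024, 1, 1), 105.0), (date(2024, 4, 1), 125.0),
]
annual = aggregate_by_year(quarterly)
print(annual)
```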
Dialog Box Guide
Loan
Selecting Investment → New → Loan from the Investment Analysis dialog box’s menu bar opens the Loan dialog box displayed in Figure 47.22.
Figure 47.22 Loan Dialog Box
The following items are displayed:
Name holds the name you assign to the loan. You can set the name here or within the Portfolio area of the Investment Analysis dialog box. This must be a valid SAS name.
The Loan Specification area gives access to the values that define the loan.
  Loan Amount holds the borrowed amount.
  Periodic Payment holds the value of the periodic payments.
  Number of Payments holds the number of payments in loan terms.
  Payment Interval holds the frequency of the Periodic Payment.
  Compounding Interval holds the compounding frequency.
  Initial Rate holds the interest rate (a nominal percentage between 0 and 120) you pay on the loan.
  Start Date holds the SAS date when the loan is initialized. The first payment is due one Payment Interval after this time.
Initialization opens the Loan Initialization Options dialog box where you can define initialization costs and down-payments relevant to the loan.
Prepayments opens the Loan Prepayments dialog box where you can specify the SAS dates and amounts of any prepayments.
Balloon Payments opens the Balloon Payments dialog box where you can specify the SAS dates and amounts of any balloon payments.
Rate Adjustments opens the Rate Adjustment Terms dialog box where you can specify terms for a variable-rate loan.
Rounding Off opens the Rounding Off dialog box where you can select the number of decimal places for calculations.
Create Amortization Schedule becomes available when you adequately define the loan within the Loan Specification area. Clicking it generates the amortization schedule.
Amortization Schedule fills when you click Create Amortization Schedule. The schedule contains a row for the loan’s start-date and each payment-date with information about the following:
  Date is a SAS date, either the loan’s start-date or a payment-date.
  Beginning Principal Amount is the balance at that date.
  Periodic Payment Amount is the expected payment at that date.
  Interest Payment is zero for the loan’s start-date; otherwise it holds the interest since the previous date.
  Principal Repayment is the amount of the payment that went toward the principal.
  Ending Principal is the balance at the end of the payment interval.
Print becomes available when you generate the amortization schedule. Clicking it sends the contents of the amortization schedule to the SAS session print device.
Save Data As becomes available when you generate the amortization schedule. Clicking it opens the Save Output Dataset dialog box where you can save the amortization table (or portions thereof) as a SAS Dataset.
OK returns you to the Investment Analysis dialog box. If this is a new loan specification, clicking OK appends the current loan specification to the portfolio. If this is an existing loan specification, clicking OK returns the altered loan specification to the portfolio.
Cancel returns you to the Investment Analysis dialog box. If this is a new loan specification, clicking Cancel discards the current loan specification. If this is an existing loan specification, clicking Cancel discards the current changes.
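To make the columns of the amortization schedule concrete, here is a minimal Python sketch of a fixed-rate schedule. It uses the standard level-payment annuity formula and assumes that the compounding interval equals the payment interval, with no prepayments, balloon payments, rate adjustments, or rounding. The function name amortization_schedule and the dictionary keys are hypothetical, and the sketch omits the start-date row (whose interest is zero).

```python
def amortization_schedule(amount, annual_rate_pct, n_payments, payments_per_year=12):
    """Return (payment, rows) for a level-payment loan (illustrative only)."""
    rate = annual_rate_pct / 100 / payments_per_year   # periodic interest rate
    if rate == 0:
        payment = amount / n_payments
    else:
        payment = amount * rate / (1 - (1 + rate) ** -n_payments)
    rows = []
    balance = amount
    for period in range(1, n_payments + 1):
        interest = balance * rate          # interest since the previous date
        principal = payment - interest     # part of the payment that reduces the balance
        ending = balance - principal
        rows.append({
            "period": period,
            "beginning": balance,
            "payment": payment,
            "interest": interest,
            "principal": principal,
            "ending": ending,
        })
        balance = ending
    return payment, rows


# A 10,000 loan at a nominal 12% with 12 monthly payments.
payment, rows = amortization_schedule(10000, 12, 12)
print(round(payment, 2), round(rows[-1]["ending"], 6))
```

Each row's ending principal becomes the next row's beginning principal, so the final ending principal is zero apart from floating-point error.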
Loan Initialization Options
Clicking Initialization in the Loan dialog box opens the Loan Initialization Options dialog box displayed in Figure 47.23.
Figure 47.23 Loan Initialization Options Dialog Box
The following items are displayed:
The Price, Loan Amount and Downpayment area contains the following information:
  Purchase Price holds the actual price of the asset. This value equals the loan amount plus the downpayment.
  Loan Amount holds the loan amount.
  % of Price (to the right of Loan Amount) updates when you enter the Purchase Price and either the Loan Amount or Downpayment. This holds the percentage of the Purchase Price that comprises the Loan Amount. Setting the percentage manually causes the Loan Amount and Downpayment to update.
  Downpayment holds any downpayment paid for the asset.
  % of Price (to the right of Downpayment) updates when you enter the Purchase Price and either the Loan Amount or Downpayment. This holds the percentage of the Purchase Price that comprises the Downpayment. Setting the percentage manually causes the Loan Amount and Downpayment to update.
The Initialization Costs and Discount Points area contains the following information:
  Loan Amount holds a copy of the Loan Amount above.
  Initialization Costs holds the value of any initialization costs.
  % of Amount (to the right of Initialization Costs) updates when you enter the Purchase Price and either the Initialization Costs or Discount Points. This holds the percentage of the Loan Amount that comprises the Initialization Costs. Setting the percentage manually causes the Initialization Costs to update.
  Discount Points holds the value of any discount points.
  % of Amount (to the right of Discount Points) updates when you enter the Purchase Price and either the Initialization Costs or Discount Points. This holds the percentage of the Loan Amount that comprises the Discount Points. Setting the percentage manually causes the Discount Points to update.
OK returns you to the Loan dialog box, saving the information that is entered.
Cancel returns you to the Loan dialog box, discarding any changes made since you opened the dialog box.
Loan Prepayments
Clicking Prepayments in the Loan dialog box opens the Loan Prepayments dialog box displayed in Figure 47.24.
Figure 47.24 Loan Prepayments Dialog Box
The following items are displayed:
Uniform Prepayment holds the value of a regular prepayment concurrent to the usual periodic payment.
Prepayments holds a list of date-amount pairs to accommodate any prepayments. Right-clicking within the Prepayments area reveals many helpful tools for managing date-amount pairs.
OK returns you to the Loan dialog box, storing the information entered on the prepayments.
Cancel returns you to the Loan dialog box, discarding any prepayments entered since you opened the dialog box.
Balloon Payments
Clicking Balloon Payments in the Loan dialog box opens the Balloon Payments dialog box displayed in Figure 47.25.
Figure 47.25 Balloon Payments Dialog Box
The following items are displayed:
Balloon Payments holds a list of date-amount pairs to accommodate any balloon payments. Right-clicking within the Balloon Payments area reveals many helpful tools for managing date-amount pairs.
OK returns you to the Loan dialog box, storing the information entered on the balloon payments.
Cancel returns you to the Loan dialog box, discarding any balloon payments entered since you opened the dialog box.
Rate Adjustment Terms
Clicking Rate Adjustments in the Loan dialog box opens the Rate Adjustment Terms dialog box displayed in Figure 47.26.
Figure 47.26 Rate Adjustment Terms Dialog Box
The following items are displayed:
The Rate Adjustment Terms area
  Life Cap holds the maximum deviation from the Initial Rate allowed over the life of the loan.
  Periodic Cap holds the maximum adjustment allowed per adjustment.
  Adjustment Frequency holds how often (in months) the lender can adjust the interest rate.
The Rate Adjustment Assumption determines the scenario the adjustments will take.
  Worst Case uses information from the Rate Adjustment Terms area to forecast a worst-case scenario.
  Best Case uses information from the Rate Adjustment Terms area to forecast a best-case scenario.
  Fixed Case specifies a fixed-rate loan.
  Estimated Case uses information from the Rate Adjustment Terms area and the Estimated Rates area to forecast an estimated scenario.
Estimated Rates holds a list of date-rate pairs, where each date is a SAS date and the rate is a nominal percentage between 0 and 120. The Estimated Case assumption uses these rates for its calculations. Right-clicking within the Estimated Rates area reveals many helpful tools for managing date-rate pairs.
OK returns you to the Loan dialog box, taking rate adjustment information into account.
Cancel returns you to the Loan dialog box, discarding any rate adjustment information provided since opening the dialog box.
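The interaction between the Periodic Cap and the Life Cap can be illustrated with a short Python sketch. The text above does not spell out how the Worst Case and Best Case scenarios are built, so the following is only one plausible reading: the rate moves by the full Periodic Cap at every adjustment until it reaches the Initial Rate plus (or minus) the Life Cap, and it never drops below zero. The function name capped_rate_path and its parameters are hypothetical.

```python
def capped_rate_path(initial_rate, periodic_cap, life_cap, n_adjustments, direction=1):
    """Rates (in percent) after each adjustment under the stated assumption.

    direction=+1 sketches a worst case (rising rates) and
    direction=-1 sketches a best case (falling rates).
    """
    upper = initial_rate + life_cap
    lower = max(initial_rate - life_cap, 0.0)   # assumed floor at zero
    rates = [initial_rate]
    for _ in range(n_adjustments):
        candidate = rates[-1] + direction * periodic_cap
        rates.append(min(max(candidate, lower), upper))
    return rates


# Initial Rate 6%, Periodic Cap 2 points, Life Cap 5 points, four adjustments.
worst = capped_rate_path(6.0, 2.0, 5.0, 4, direction=1)
best = capped_rate_path(6.0, 2.0, 5.0, 4, direction=-1)
print(worst, best)
```

The Adjustment Frequency would determine the dates on which these rates take effect; the sketch leaves dates out to keep the cap logic visible.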
Rounding Off
Clicking Rounding Off in the Loan dialog box opens the Rounding Off dialog box displayed in Figure 47.27.
Figure 47.27 Rounding Off Dialog Box
The following items are displayed:
Decimal Places fixes the number of decimal places your results will display.
OK returns you to the Loan dialog box. Numeric values will then be represented with the number of decimals specified in Decimal Places.
Cancel returns you to the Loan dialog box. Numeric values will be represented with the number of decimals specified prior to opening this dialog box.
Savings
Selecting Investment → New → Savings from the Investment Analysis dialog box’s menu bar opens the Savings dialog box displayed in Figure 47.28.
Figure 47.28 Savings Dialog Box
The following items are displayed:
Name holds the name you assign to the savings. You can set the name here or within the Portfolio area of the Investment Analysis dialog box. This must be a valid SAS name.
The Savings Specification area
  Periodic Deposit holds the value of your regular deposits.
  Number of Deposits holds the number of deposits into the account.
  Initial Rate holds the interest rate (a nominal percentage between 0 and 120) the savings account earns.
  Start Date holds the SAS date when deposits begin.
  Deposit Interval holds the frequency of your Periodic Deposit.
  Compounding Interval holds how often the interest compounds.
Create Account Summary becomes available when you adequately define the savings within the Savings Specification area. Clicking it generates the account summary.
Account Summary fills when you click Create Account Summary. The summary contains a row for each deposit-date with information about the following:
  Date is the SAS date of a deposit.
  Starting Balance is the balance at that date.
  Deposits is the deposit at that date.
  Interest Earned is the interest earned since the previous date.
  Ending Balance is the balance after the deposit.
Print becomes available when you generate an account summary. Clicking it sends the contents of the account summary to the SAS session print device.
Save Data As becomes available when you generate an account summary. Clicking it opens the Save Output Dataset dialog box where you can save the account summary (or portions thereof) as a SAS Dataset.
OK returns you to the Investment Analysis dialog box. If this is a new savings, clicking OK appends the current savings specification to the portfolio. If this is an existing savings specification, clicking OK returns the altered savings to the portfolio.
Cancel returns you to the Investment Analysis dialog box. If this is a new savings, clicking Cancel discards the current savings specification. If this is an existing savings, clicking Cancel discards the current changes.
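The account summary columns can be reproduced with a few lines of Python. This sketch assumes that the compounding interval equals the deposit interval, that interest accrues on the balance carried over from the previous deposit date, and that the ending balance is the starting balance plus interest plus the new deposit. These timing conventions, the function name account_summary, and the dictionary keys are assumptions for the illustration and may not match the dialog box exactly.

```python
def account_summary(deposit, n_deposits, annual_rate_pct, deposits_per_year=12):
    """Return one row per deposit date (illustrative conventions only)."""
    rate = annual_rate_pct / 100 / deposits_per_year   # periodic interest rate
    rows = []
    balance = 0.0
    for number in range(1, n_deposits + 1):
        starting = balance
        interest = starting * rate            # earned since the previous date
        ending = starting + interest + deposit
        rows.append({
            "deposit_number": number,
            "starting": starting,
            "deposit": deposit,
            "interest": interest,
            "ending": ending,
        })
        balance = ending
    return rows


# Three monthly deposits of 100 at a nominal 12% rate.
summary = account_summary(100.0, 3, 12)
for row in summary:
    print(row)
```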
Depreciation
Selecting Investment → New → Depreciation from the Investment Analysis dialog box’s menu bar opens the Depreciation dialog box displayed in Figure 47.29.
Figure 47.29 Depreciation Dialog Box
The following items are displayed:
Name holds the name you assign to the depreciation. You can set the name here or within the Portfolio area of the Investment Analysis dialog box. This must be a valid SAS name.
Depreciable Asset Specification
  Cost holds the asset’s original cost.
  Year of Purchase holds the asset’s year of purchase.
  Useful Life holds the asset’s useful life (in years).
  Salvage Value holds the asset’s value at the end of its Useful Life.
The Depreciation Method area holds the depreciation methods available:
  Straight Line
  Sum-of-years Digits
  Depreciation Table
  Declining Balance
    – DB Factor: choice of 2, 1.5, or 1
    – Conversion to SL: choice of Yes or No
Create Depreciation Schedule becomes available when you adequately define the depreciation within the Depreciable Asset Specification area. Clicking the Create Depreciation Schedule button then fills the Depreciation Schedule area.
Depreciation Schedule fills when you click Create Depreciation Schedule. The schedule contains a row for each year. Each row holds the following:
  Year is a year.
  Start Book Value is the starting book value for that year.
  Depreciation is the depreciation value for that year.
  End Book Value is the ending book value for that year.
Print becomes available when you generate the depreciation schedule. Clicking it sends the contents of the depreciation schedule to the SAS session print device.
Save Data As becomes available when you generate the depreciation schedule. Clicking it opens the Save Output Dataset dialog box where you can save the depreciation table (or portions thereof) as a SAS Dataset.
OK returns you to the Investment Analysis dialog box. If this is a new depreciation specification, clicking OK appends the current depreciation specification to the portfolio. If this is an existing depreciation specification, clicking OK returns the altered depreciation specification to the portfolio.
Cancel returns you to the Investment Analysis dialog box. If this is a new depreciation specification, clicking Cancel discards the current depreciation specification. If this is an existing depreciation specification, clicking Cancel discards the current changes.
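The depreciation methods listed above follow textbook formulas, sketched here in Python. The function names are hypothetical, and the code uses the common definitions rather than anything confirmed about the dialog box's internals. The declining-balance sketch never depreciates below the salvage value and leaves out the Conversion to SL option.

```python
def straight_line(cost, salvage, life):
    """Equal depreciation in each year of the useful life."""
    return [(cost - salvage) / life for _ in range(life)]


def sum_of_years_digits(cost, salvage, life):
    """Year k gets the share (life - k + 1) / (1 + 2 + ... + life)."""
    digits = life * (life + 1) // 2
    base = cost - salvage
    return [base * (life - year + 1) / digits for year in range(1, life + 1)]


def declining_balance(cost, salvage, life, db_factor=2.0):
    """Depreciate the book value by db_factor / life each year.

    Book value is not allowed to fall below salvage; the switch to
    straight line (Conversion to SL) is deliberately omitted.
    """
    rate = db_factor / life
    book = cost
    amounts = []
    for _ in range(life):
        amount = min(book * rate, book - salvage)
        amounts.append(amount)
        book -= amount
    return amounts


print(straight_line(1000, 100, 3))
print(sum_of_years_digits(1000, 100, 3))
print(declining_balance(1000, 100, 5, db_factor=2))
```

For each method, the Start Book Value of a year is the cost minus the depreciation accumulated so far, and the End Book Value subtracts that year's depreciation as well.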
Depreciation Table
Clicking Depreciation Table in the Depreciation Method area of the Depreciation dialog box opens the Depreciation Table dialog box displayed in Figure 47.30.
Figure 47.30 Depreciation Table Dialog Box
The following items are displayed:
The Depreciation area holds a list of year-rate pairs where the rate is an annual depreciation rate (a percentage between 0% and 100%). Right-clicking within the Depreciation area reveals many helpful tools for managing year-rate pairs.
OK returns you to the Depreciation dialog box with the current list of depreciation rates from the Depreciation area.
Cancel returns you to the Depreciation dialog box, discarding any edits to the Depreciation area since you opened the dialog box.
Bond
Selecting Investment → New → Bond from the Investment Analysis dialog box’s menu bar opens the Bond dialog box displayed in Figure 47.31.
Figure 47.31 Bond
The following items are displayed:
Name holds the name you assign to the bond. You can set the name here or within the Portfolio area of the Investment Analysis dialog box. This must be a valid SAS name.
Bond Specification
  Face Value holds the bond’s value at maturity.
  Coupon Payment holds the amount of money you receive periodically as the bond matures.
  Coupon Rate holds the rate (a nominal percentage between 0% and 120%) of the Face Value that defines the Coupon Payment.
  Coupon Interval holds how often the bond pays its coupons.
  Number of Coupons holds the number of coupons before maturity.
  Maturity Date holds the SAS date when you can redeem the bond for its Face Value.
The Valuation area becomes available when you adequately define the bond within the Bond Specification area. Entering either the Value or the Yield causes the calculation of the other. If you respecify the bond after performing a calculation here, you must reenter the Value or Yield value to update the calculation.
  Value holds the bond’s value if expecting the specified Yield.
  Yield holds the bond’s yield if the bond is valued at the amount of Value.
You must specify the bond before analyzing it. After you have specified the bond, clicking Analyze opens the Bond Analysis dialog box where you can compare various values and yields.
OK returns you to the Investment Analysis dialog box. If this is a new bond specification, clicking OK appends the current bond specification to the portfolio. If this is an existing bond specification, clicking OK returns the altered bond specification to the portfolio.
Cancel returns you to the Investment Analysis dialog box. If this is a new bond specification, clicking Cancel discards the current bond specification. If this is an existing bond specification, clicking Cancel discards the current changes.
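The two-way relationship between Value and Yield in the Valuation area can be sketched in Python. The sketch prices the bond on a coupon date with a whole number of coupons remaining, discounts at the nominal yield divided by the number of coupons per year, and recovers the yield from a value by bisection. These conventions and the names bond_value and bond_yield are assumptions for the example; the dialog box may handle dates and accrued interest differently.

```python
def bond_value(face, coupon_rate_pct, coupons_per_year, n_coupons, yield_pct):
    """Present value of the remaining coupons plus the face value."""
    coupon = face * coupon_rate_pct / 100 / coupons_per_year
    y = yield_pct / 100 / coupons_per_year          # periodic yield
    pv_coupons = sum(coupon / (1 + y) ** k for k in range(1, n_coupons + 1))
    return pv_coupons + face / (1 + y) ** n_coupons


def bond_yield(face, coupon_rate_pct, coupons_per_year, n_coupons, value,
               low=0.0, high=120.0, iterations=100):
    """Find the nominal yield (in percent) that reproduces `value` by bisection."""
    for _ in range(iterations):
        mid = (low + high) / 2
        if bond_value(face, coupon_rate_pct, coupons_per_year, n_coupons, mid) > value:
            low = mid    # price too high, so the trial yield is too low
        else:
            high = mid
    return (low + high) / 2


# A 1,000 face-value bond with an 8% coupon rate, paid twice a year, 10 coupons left.
par_value = bond_value(1000, 8, 2, 10, 8)
implied = bond_yield(1000, 8, 2, 10, par_value)
print(round(par_value, 2), round(implied, 4))
```

Bisection works here because the value falls as the yield rises, so any value between the prices at 0% and 120% has exactly one matching yield.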
Bond Analysis
Clicking Analyze from the Bond dialog box opens the Bond Analysis dialog box displayed in Figure 47.32.
Figure 47.32 Bond Analysis
The following items are displayed:
Analysis Specifications
  Yield-to-maturity holds the percentage yield upon which to center the analysis.
  +/- holds the maximum deviation percentage to consider from the Yield-to-maturity.
  Increment by holds the percentage increment by which the analysis is calculated.
  Reference Price holds the reference price.
  Analysis Dates holds a list of SAS dates for which you perform the bond analysis.
You must specify the analysis before valuing the bond for the various yields. After you adequately specify the analysis, click Create Bond Valuation Summary to generate the bond valuation summary.
Bond Valuation Summary fills when you click Create Bond Valuation Summary. The schedule contains a row for each rate with information concerning the following:
  Date is the SAS date when the Value gives the particular Yield.
  Yield is the percent yield that corresponds to the Value at the given Date.
  Value is the value of the bond at Date for the given Yield.
  Percent Change is the percent change if the Reference Price is specified.
  Duration is the duration.
  Convexity is the convexity.
Graphics opens the Bond Price graph that represents the price versus yield-to-maturity.
Print becomes available when you generate the Bond Valuation Summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you fill the Bond Valuation Summary area. Clicking it opens the Save Output Dataset dialog box where you can save the valuation summary (or portions thereof) as a SAS Dataset.
Return takes you back to the Bond dialog box.
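A valuation summary of this kind can be approximated with the Python sketch below, which walks a grid of yields centered on the Yield-to-maturity and reports value, percent change, duration, and convexity for a single analysis date. The definitions are assumptions, since the text does not give them: duration is Macaulay duration in years, convexity is the sum of t(t+1) times each present value divided by the value and by (1+y) squared, converted to years, and percent change is measured against the Reference Price. The names bond_metrics and valuation_summary are hypothetical.

```python
def bond_metrics(face, coupon_rate_pct, coupons_per_year, n_coupons, yield_pct):
    """Return (value, duration, convexity) under the conventions stated above."""
    coupon = face * coupon_rate_pct / 100 / coupons_per_year
    y = yield_pct / 100 / coupons_per_year
    flows = [coupon] * n_coupons
    flows[-1] += face                                  # redemption with the last coupon
    pvs = [cf / (1 + y) ** t for t, cf in enumerate(flows, start=1)]
    value = sum(pvs)
    duration = sum(t * pv for t, pv in enumerate(pvs, start=1)) / value / coupons_per_year
    convexity = (sum(t * (t + 1) * pv for t, pv in enumerate(pvs, start=1))
                 / (value * (1 + y) ** 2 * coupons_per_year ** 2))
    return value, duration, convexity


def valuation_summary(face, coupon_rate_pct, coupons_per_year, n_coupons,
                      center_pct, deviation_pct, increment_pct, reference_price=None):
    """One row per yield from center - deviation to center + deviation."""
    steps = round(deviation_pct / increment_pct)
    rows = []
    for i in range(-steps, steps + 1):
        yield_pct = center_pct + i * increment_pct
        value, duration, convexity = bond_metrics(
            face, coupon_rate_pct, coupons_per_year, n_coupons, yield_pct)
        change = None
        if reference_price is not None:
            change = 100 * (value - reference_price) / reference_price
        rows.append((yield_pct, value, change, duration, convexity))
    return rows


for row in valuation_summary(1000, 8, 2, 10, 8, 1, 0.5, reference_price=1000):
    print(row)
```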
Bond Price
Clicking Graphics from the Bond Analysis dialog box opens the Bond Price dialog box displayed in Figure 47.33.
Figure 47.33 Bond Price Graph
It possesses the following item:
Return takes you back to the Bond Analysis dialog box.
Generic Cashflow
Selecting Investment → New → Generic Cashflow from the Investment Analysis dialog box’s menu bar opens the Generic Cashflow dialog box displayed in Figure 47.34.
Figure 47.34 Generic Cashflow
The following items are displayed:
Name holds the name you assign to the generic cashflow. You can set the name here or within the Portfolio area of the Investment Analysis dialog box. This must be a valid SAS name.
Cashflow Specification holds date-amount pairs that correspond to deposits and withdrawals (or benefits and costs) for the cashflow. Each date is a SAS date. Right-clicking within the Cashflow Specification area reveals many helpful tools for managing date-amount pairs.
The Cashflow Chart fills with a graph representing the cashflow when the Cashflow Specification area is nonempty. The box to the right of the scroll bar controls the number of entries with which to fill the graph. If the number in this box is less than the total number of entries, you can use the scroll bar to view different segments of the cashflow. The left box below the scroll bar holds the frequency for drilling purposes.
OK returns you to the Investment Analysis dialog box. If this is a new generic cashflow specification, clicking OK appends the current cashflow specification to the portfolio. If this is an existing cashflow specification, clicking OK returns the altered cashflow specification to the portfolio.
Cancel returns you to the Investment Analysis dialog box. If this is a new cashflow specification, clicking Cancel discards the current cashflow specification. If this is an existing cashflow specification, clicking Cancel discards the current changes.
Right-Clicking within Generic Cashflow’s Cashflow Specification Area
Right-clicking within the Cashflow Specification area of the Generic Cashflow dialog box opens the menu displayed in Figure 47.35.
Figure 47.35 Right-Clicking
Add creates a blank pair.
Delete removes the currently highlighted pair.
Copy duplicates the currently selected pair.
Sort arranges the entered pairs in chronological order.
Clear empties the area of all pairs.
Save opens the Save Dataset dialog box where you can save the entered pairs as a SAS Dataset for later use.
Load opens the Load Dataset dialog box where you select a SAS Dataset to populate the area.
Specify opens the Flow Specification dialog box where you can generate date-amount pairs to include in your cashflow.
Forecast opens the Forecast Specification dialog box where you can generate the forecast of a SAS data set to include in your cashflow.
If you want to perform one of these actions on a collection of pairs, you must select the collection before right-clicking. To select an adjacent list of pairs, do the following: click the first pair, hold down the SHIFT key, and click the final pair. After the list of pairs is selected, you can release the SHIFT key.
Flow Specification
Figure 47.36 Flow Specification
The following items are displayed:
Flow Time Specification
  Time Interval holds the uniform frequency of the entries.
  Starting Date holds the SAS date on which the entries start. You can set it when you set the Time Interval.
  Ending Date holds the SAS date on which the entries end. You can set it when you set the Time Interval.
  Number of Periods holds the number of entries.
Flow Value Specification
  Series Flow Type describes the movement the entries can assume:
    Uniform assumes all entries are equal.
    Arithmetic assumes the entries begin at Level and increase by the value of Gradient per entry.
    Geometric assumes the entries begin at Level and increase by a factor of Gradient per entry.
  Level holds the starting amount for all flow types.
  Gradient holds the arithmetic or geometric gradient for the Arithmetic or Geometric flow type, respectively. You can set it when you select either of those series flow types.
When the cashflow entries are adequately defined, the Cashflow Chart fills with a graph displaying the dates and values of the entries. The box to the right of the scroll bar controls the number of entries with which to fill the graph. If the number in this box is less than the total number of entries, you can use the scroll bar to view different segments of the cashflow. The left box below the scroll bar holds the frequency.
Subtract becomes available when the collection of entries is adequately specified. Clicking Subtract then returns you to the Generic Cashflow dialog box and subtracts the entries from the current cashflow.
Add becomes available when the collection of entries is adequately specified. Clicking Add then returns you to the Generic Cashflow dialog box and adds the entries to the current cashflow.
Cancel returns you to the Generic Cashflow dialog box without changing the cashflow.
Forecast Specification
Figure 47.37 Forecast Specification
The following items are displayed:
Historical Data Specification
  Data Set holds the name of the SAS data set to forecast.
  Browse opens the standard SAS Open dialog box to help select a SAS data set to forecast.
  Time ID Variable holds the time ID variable to forecast over.
  Time Interval fixes the time interval for the Time ID Variable.
  Analysis Variable holds the data variable upon which to forecast.
  View Table opens a table that displays the contents of the specified SAS data set.
  View Graph opens the Time Series Viewer that graphically displays the contents of the specified SAS data set.
Forecast Specification
  Horizon holds the number of periods into the future you want to forecast.
  Confidence holds the confidence limit for applicable forecasts.
  Compute Forecast fills the Cashflow Chart with the forecast.
  The box below Forecast Specification holds the type of forecast you want to generate:
    Predicted Value
    Lower Confidence Limit
    Upper Confidence Limit
The Cashflow Chart fills when you click Compute Forecast. The box to the right of the scroll bar controls the number of entries with which to fill the graph. If the number in this box is less than the total number of entries, you can use the scroll bar to view different segments of the cashflow. The left box below the scroll bar holds the frequency.
Subtract becomes available when the collection of entries is adequately specified. Clicking Subtract then returns you to the Generic Cashflow dialog box and subtracts the forecast from the current cashflow.
Add becomes available when the collection of entries is adequately specified. Clicking Add then returns you to the Generic Cashflow dialog box and adds the forecast to the current cashflow.
Cancel returns you to the Generic Cashflow dialog box without changing the cashflow.
Chapter 48
Computations

Contents
The Compute Menu
Tasks
  Taxing a Cashflow
  Converting Currency
  Deflating Cashflows
Dialog Box Guide
  After Tax Cashflow Calculation
  Currency Conversion
  Constant Dollar Calculation
The Compute Menu
Figure 48.1 shows the Compute menu.
Figure 48.1 The Compute Menu
The Compute menu offers the following options that apply to generic cashflows.
After Tax Cashflow opens the After Tax Cashflow Calculation dialog box. Computing an after-tax cashflow is useful when taxes affect investment alternatives differently. Comparing after-tax cashflows provides a more accurate determination of the cashflows’ profitability. You can set default values for income tax rates by selecting Tools → Define Rate → Income Tax Rate from the Investment Analysis dialog box. This opens the Income Tax Specification dialog box where you can enter the tax rates.
Currency Conversion opens the Currency Conversion dialog box. Currency conversion is necessary when investments are in different currencies. For data concerning currency conversion rates, see http://dsbb.imf.org/, the International Monetary Fund’s Dissemination Standards Bulletin Board.
Constant Dollars opens the Constant Dollar Calculation dialog box. A constant dollar (inflation-adjusted monetary value) calculation takes cashflow and inflation information and discounts the cashflow to a level where the buying power of the monetary unit is constant over time. Groups quantify inflation (in the form of price indices and inflation rates) for countries and industries by averaging the growth of prices for various products and sectors of the economy. For data concerning price indices, see the United States Department of Labor at http://www.dol.gov/ and the International Monetary Fund’s Dissemination Standards Bulletin Board at http://dsbb.imf.org/. You can set default values for inflation rates by clicking Tools → Define Rate → Inflation from the Investment Analysis dialog box. This opens the Inflation Specification dialog box where you can enter the inflation rates.
Tasks
The next few sections show how to perform computations for the following situation. Suppose you buy a $10,000 certificate of deposit that pays 12% interest a year for five years. Your earnings are taxed at a rate of 30% federally and 7% locally. Also, you want to transfer all the money to an account in England. American dollars convert to British pounds at an exchange rate of $1.00 to £0.60. The inflation rate in England is 3%.
The instructions in this example assume familiarity with the following:
The right-clicking options of the Cashflow Specification area in the Generic Cashflow dialog box (described in the section “Right-Clicking within Generic Cashflow’s Cashflow Specification Area” on page 2748).
The Save Data As button located in many dialog boxes (described in the section “Saving Output to SAS Data Sets” on page 2781).
Taxing a Cashflow
Consider the example described in the section “Tasks” on page 2754. To create the earnings, follow these steps:
1. Select Investment → New → Generic Cashflow to create a generic cashflow.
2. Enter CD_INTEREST for the Name.
3. Enter 1200 for each of the five years starting one year from today as displayed in Figure 48.2.
4. Click OK to return to the Investment Analysis dialog box.
Figure 48.2 Computing the Interest on the CD
To compute the tax on the earnings, follow these steps:
1. Select CD_INTEREST from the Portfolio area.
2. Select Compute → After Tax Cashflow from the pull-down menu.
3. Enter 30 for Federal Tax.
4. Enter 7 for Local Tax. Note that Combined Tax updates.
5. Click Create After Tax Cashflow. The After Tax Cashflow area fills, as displayed in Figure 48.3.
Figure 48.3 Computing the Interest After Taxes
Save the taxed earnings to a SAS data set named WORK.CD_AFTERTAX. Click Return to return to the Investment Analysis dialog box.
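The after-tax step can be checked by hand with a short Python sketch. The text does not state how Combined Tax is derived from the two rates, so the formula below is an assumption: it treats the local tax as deductible from federally taxable income, which gives combined = federal + local − federal × local (with rates as fractions). The function names combined_tax_pct and after_tax, and the use of year numbers in place of SAS dates, are also choices made only for this example.

```python
def combined_tax_pct(federal_pct, local_pct):
    """Effective rate in percent, assuming local tax is federally deductible."""
    return federal_pct + local_pct - federal_pct * local_pct / 100


def after_tax(pairs, federal_pct, local_pct):
    """Scale each (when, amount) pair by the share retained after taxes."""
    keep = 1 - combined_tax_pct(federal_pct, local_pct) / 100
    return [(when, amount * keep) for when, amount in pairs]


# CD_INTEREST: 1,200 in each of years 1 through 5, taxed at 30% federal and 7% local.
cd_interest = [(year, 1200.0) for year in range(1, 6)]
cd_aftertax = after_tax(cd_interest, 30, 7)
print(round(combined_tax_pct(30, 7), 2), [round(a, 2) for _, a in cd_aftertax])
```

If the dialog box combines the rates differently (for example, by simple addition), the retained amounts change accordingly, so compare the result against the Combined Tax field before relying on it.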
Converting Currency
Consider the example described in the section “Tasks” on page 2754. To create the cashflow to convert, follow these steps:
1. Select Investment → New → Generic Cashflow to open a new generic cashflow.
2. Enter CD_DOLLARS for the Name.
3. Load WORK.CD_AFTERTAX into its Cashflow Specification.
4. Add –10,000 for today and +10,000 for five years from today to the cashflow as displayed in Figure 48.4.
5. Sort the transactions by date to aid your reading.
6. Click OK to return to the Investment Analysis dialog box.
Figure 48.4 The CD in Dollars
To convert from American dollars to British pounds, follow these steps:
1. Select CD_DOLLARS from the portfolio.
2. Select Compute → Currency Conversion from the pull-down menu. This opens the Currency Conversion dialog box.
3. Select USD for the From Currency.
4. Select GBP for the To Currency.
5. Enter 0.60 for the Exchange Rate.
6. Click Apply Currency Conversion to fill the Currency Conversion area as displayed in Figure 48.5.
Figure 48.5 Converting the CD to Pounds
Save the converted values to a SAS data set named WORK.CD_POUNDS. Click Return to return to the Investment Analysis dialog box.
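With a single constant exchange rate, the conversion is a multiplication of every amount by the rate. The Python sketch below applies the $1.00 to £0.60 rate to the two principal transactions of the certificate of deposit. The function name convert and the use of year offsets in place of SAS dates are assumptions for the illustration.

```python
def convert(pairs, exchange_rate):
    """Multiply each (when, amount) pair by a constant exchange rate."""
    return [(when, amount * exchange_rate) for when, amount in pairs]


# Principal flows in dollars: -10,000 today (year 0) and +10,000 in year 5.
cd_principal_usd = [(0, -10000.0), (5, 10000.0)]
cd_principal_gbp = convert(cd_principal_usd, 0.60)
print(cd_principal_gbp)
```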
Deflating Cashflows Consider the example described in the section “The Compute Menu” on page 2753. To create the cashflow to deflate, follow these steps: 1. Select Investment ! New ! Generic Cashflow to open a new generic cashflow. 2. Enter CD_DEFLATED for Name. 3. Load WORK.CD_POUNDS into its Cashflow Specification (see Figure 48.6). 4. Click OK to return to the Investment Analysis dialog box.
Figure 48.6 The CD before Deflation
To deflate the values, follow these steps:
1. Select CD_DEFLATED from the portfolio.
2. Select Compute → Constant Dollars from the menu. This opens the Constant Dollar Calculation dialog box.
3. Clear the Variable Inflation List area.
4. Enter 3 for the Constant Inflation Rate.
5. Click Create Constant Dollar Equivalent to generate a constant dollar equivalent summary (see Figure 48.7).
Figure 48.7 CD Values after Deflation
You can save the deflated cashflow to a SAS data set for use in an internal rate of return analysis or breakeven analysis. Click Return to return to the Investment Analysis dialog box.
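As a rough sketch of what deflating at a constant rate means, the following Python divides each amount by (1 + rate) raised to the number of years from the base date. The annual compounding, the whole-year offsets, and the name `constant_dollars` are assumptions made for illustration; the exact date arithmetic used by Investment Analysis is not described here.

```python
def constant_dollars(cashflow, inflation_pct):
    """Deflate (years_from_base, amount) pairs at a constant annual inflation rate.

    Assumption: equivalent = amount / (1 + rate) ** years_from_base.
    """
    rate = inflation_pct / 100.0
    return [(years, amount / (1.0 + rate) ** years) for years, amount in cashflow]


# Hypothetical cashflow: an outflow at the base date, an inflow five years later.
flows = [(0, -6_000.0), (5, 6_000.0)]
deflated = constant_dollars(flows, 3)
for years, amount in deflated:
    print(years, round(amount, 2))
```

An amount at the base date is unchanged, while later amounts shrink as inflation erodes their purchasing power.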
Dialog Box Guide
After Tax Cashflow Calculation
Having selected a generic cashflow from the Investment Analysis dialog box, to perform an after-tax calculation, select Compute → After Tax from the Investment Analysis dialog box’s menu bar. This opens the After Tax Cashflow Calculation dialog box displayed in Figure 48.8.
Figure 48.8 After Tax Cashflow Calculation Dialog Box
The following items are displayed:
Name holds the name of the investment for which you are computing the after-tax cashflow.
Federal Tax holds the federal tax rate (a percentage between 0% and 100%).
Local Tax holds the local tax rate (a percentage between 0% and 100%).
Combined Tax holds the effective tax rate from federal and local income taxes.
Create After Tax Cashflow becomes available when Combined Tax is not empty. Clicking Create After Tax Cashflow then fills the After Tax Cashflow area.
After Tax Cashflow fills when you click Create After Tax Cashflow. It holds a list of date-amount pairs where the amount is the amount retained after taxes for that date.
Print becomes available when you fill the after-tax cashflow. Clicking it sends the contents of the after-tax cashflow to the SAS session print device.
Save Data As becomes available when you fill the after-tax cashflow. Clicking it opens the Save Output Dataset dialog box, where you can save the resulting cashflow (or portions thereof) as a SAS data set.
Return returns you to the Investment Analysis dialog box.
Currency Conversion
Having selected a generic cashflow from the Investment Analysis dialog box, to perform a currency conversion, select Compute → Currency Conversion from the Investment Analysis dialog box’s menu bar. This opens the Currency Conversion dialog box displayed in Figure 48.9.
Figure 48.9 Currency Conversion Dialog Box
The following items are displayed:
Name holds the name of the investment to which you are applying the currency conversion.
From Currency holds the name of the currency the cashflow currently represents.
To Currency holds the name of the currency to which you wish to convert.
Exchange Rate holds the rate of exchange between the From Currency and the To Currency.
Apply Currency Conversion becomes available when you fill Exchange Rate. Clicking Apply Currency Conversion fills the Currency Conversion area.
Currency Conversion fills when you click Apply Currency Conversion. The schedule contains a row for each cashflow item with the following information:
  Date is a SAS date within the cashflow.
  The From Currency value is the amount in the original currency at that date.
  The To Currency value is the amount in the new currency at that date.
Print becomes available when you fill the Currency Conversion area. Clicking it sends the contents of the conversion table to the SAS session print device.
Save Data As becomes available when you fill the Currency Conversion area. Clicking it opens the Save Output Dataset dialog box, where you can save the conversion table (or portions thereof) as a SAS data set.
Return returns you to the Investment Analysis dialog box.
Constant Dollar Calculation
Having selected a generic cashflow from the Investment Analysis dialog box, to perform a constant dollar calculation, select Compute → Constant Dollars from the Investment Analysis dialog box’s menu bar. This opens the Constant Dollar Calculation dialog box displayed in Figure 48.10.
Figure 48.10 Constant Dollar Calculation Dialog Box
The following items are displayed:
Name holds the name of the investment for which you are computing the constant dollars value.
Constant Inflation Rate holds the constant inflation rate (a percentage between 0% and 120%). This value is used if the Variable Inflation List area is empty.
Variable Inflation List holds date-rate pairs that describe how inflation varies over time. Each date is a SAS date, and the rate is a percentage between 0% and 120%. Each date refers to when that inflation rate begins. Right-clicking within the Variable Inflation List area reveals many helpful tools for managing date-rate pairs. If you assume a fixed inflation rate, just enter that rate in Constant Inflation Rate.
Dates holds the SAS date(s) at which you wish to compute the constant dollar equivalent. Right-clicking within the Dates area reveals many helpful tools for managing date lists.
Create Constant Dollar Equivalent becomes available when you enter inflation rate information. Clicking it fills the constant dollar equivalent summary with the computed constant dollar values.
Constant Dollar Equivalent Summary fills with a summary when you click Create Constant Dollar Equivalent. The first column lists the dates of the generic cashflow. The second column contains the constant dollar equivalent of the original generic cashflow item of that date.
Print becomes available when you fill the constant dollar equivalent summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you fill the constant dollar equivalent summary. Clicking it opens the Save Output Dataset dialog box, where you can save the summary (or portions thereof) as a SAS data set.
Return returns you to the Investment Analysis dialog box.
Chapter 49
Analyses
Contents
The Analyze Menu
Tasks
    Performing Time Value Analysis
    Computing an Internal Rate of Return
    Performing a Benefit-Cost Ratio Analysis
    Computing a Uniform Periodic Equivalent
    Performing a Breakeven Analysis
Dialog Box Guide
    Time Value Analysis
    Uniform Periodic Equivalent
    Internal Rate of Return
    Benefit-Cost Ratio Analysis
    Breakeven Analysis
    Breakeven Graph
The Analyze Menu
Figure 49.1 shows the Analyze menu.
Figure 49.1 Analyze Menu
The Analyze menu offers the following options for use on applicable investments.
Time Value opens the Time Value Analysis dialog box. Time value analysis involves moving money through time across a defined minimum attractive rate of return (MARR) so that you can compare value at a consistent date. The MARR can be constant or variable over time.
Periodic Equivalent opens the Uniform Periodic Equivalent dialog box. Uniform periodic equivalent analysis determines the payment needed to convert a cashflow to uniform amounts over time, given a periodicity, a number of periods, and a MARR. This option helps when making comparisons where one alternative is uniform (such as renting) and another is not (such as buying).
Internal Rate of Return opens the Internal Rate of Return dialog box. The internal rate of return of a cashflow is the interest rate that makes the time value equal to 0. This calculation assumes uniform periodicity of the cashflow. It is particularly applicable where the choice of MARR would be difficult.
Benefit-Cost Ratio opens the Benefit-Cost Ratio Analysis dialog box. The benefit-cost ratio divides the time value of the benefits by the time value of the costs. For example, governments often use this analysis when deciding whether to commit to a public works project.
Breakeven Analysis opens the Breakeven Analysis dialog box. Breakeven analysis computes time values at various MARRs so that you can compare them, which can be advantageous when it is difficult to determine a MARR. This analysis can help you determine how the cashflow’s profitability varies with your choice of MARR. A graph displaying the relationships between time value and MARR is also available.
Tasks
Performing Time Value Analysis
Suppose a rock quarry needs equipment to use for the next five years. It has two alternatives:
- a box loader and conveyer system that has a one-time cost of $264,000
- a two-shovel loader, which costs $84,000 but has a yearly operating cost of $36,000. This loader has a service life of three years, which necessitates the purchase of a new loader for the final two years of the rock quarry project. Assume the second loader also costs $84,000 and its salvage value after its two-year service is $10,000. A SAS data set that describes this is available at SASHELP.ROCKPIT.
You expect a 13% MARR. Which is the better alternative?
To create the cashflows, follow these steps:
1. Create a cashflow with the single amount –264,000. Date the amount 01JAN1998 to be consistent with the SAS data set you load.
2. Load SASHELP.ROCKPIT into a second cashflow, as displayed in Figure 49.2.
Figure 49.2 The contents of SASHELP.ROCKPIT
To compute the time values of these investments, follow these steps:
1. Select both cashflows.
2. Select Analyze → Time Value. This opens the Time Value Analysis dialog box.
3. Enter the date 01JAN1998 into the Dates area.
4. Enter 13 for the Constant MARR.
5. Click Create Time Value Summary.
Figure 49.3 Performing the Time Value Analysis
As shown in Figure 49.3, option 1 naturally has a time value of –$264,000.00 on 01JAN1998, because its single payment falls on that date. Option 2 has a time value of –$263,408.94, which makes it slightly less expensive.
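The time value comparison can be approximated with a short Python sketch. The cashflow layout below is an assumption based on the problem statement (operating costs at the end of each of the five years, the replacement loader bought at the end of year 3, salvage received at the end of year 5); it is not read from SASHELP.ROCKPIT, and `time_value` is a hypothetical helper that discounts whole-year offsets at a constant MARR.

```python
def time_value(cashflow, marr_pct):
    """Time value at year 0 of (year, amount) pairs discounted at a constant MARR."""
    rate = marr_pct / 100.0
    return sum(amount / (1.0 + rate) ** year for year, amount in cashflow)


# Option 1: box loader and conveyer system, one-time cost.
option1 = [(0, -264_000.0)]

# Option 2 (assumed layout): first loader now, yearly operating costs,
# replacement loader at the end of year 3, salvage at the end of year 5.
option2 = (
    [(0, -84_000.0)]
    + [(year, -36_000.0) for year in range(1, 6)]
    + [(3, -84_000.0), (5, 10_000.0)]
)

for name, flow in (("option 1", option1), ("option 2", option2)):
    print(name, round(time_value(flow, 13), 2))
```

Under these assumptions the sketch gives a time value of about –263,408.94 for option 2, in line with the figure quoted above.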
Computing an Internal Rate of Return
You are choosing between five investments. A portfolio containing these investments is available at SASHELP.INVSAMP.NVST. Which investments are acceptable if you expect a MARR of 9%?
Open the portfolio SASHELP.INVSAMP.NVST and compare the investments. Note that Internal Rate of Return computations assume regular periodicity of the cashflow.
To compute the internal rates of return, follow these steps:
1. Select all five investments.
2. Select Analyze → Internal Rate of Return.
Figure 49.4 Computing an Internal Rate of Return
The results displayed in Figure 49.4 indicate that the internal rates of return for investments 2, 4, and 5 are greater than 9%. Hence, each of these is acceptable.
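To make the acceptance rule concrete, here is a minimal Python sketch that finds an internal rate of return by bisection and compares it with a 9% MARR. It is not the method used by Investment Analysis, whose algorithm is not described here; the function names, the search bracket, and the sample cashflow are all hypothetical, and the cashflow is assumed to have one amount per regular period.

```python
def npv(cashflow, rate):
    """Time value at period 0 of amounts spaced one period apart."""
    return sum(amount / (1.0 + rate) ** t for t, amount in enumerate(cashflow))


def irr(cashflow, lo=-0.99, hi=10.0, tol=1e-10):
    """Rate in [lo, hi] at which npv is zero, found by bisection."""
    f_lo = npv(cashflow, lo)
    if f_lo * npv(cashflow, hi) > 0:
        raise ValueError("npv does not change sign on the search bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(cashflow, mid)
        if f_lo * f_mid <= 0:
            hi = mid              # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2.0


# Hypothetical investment: pay 1,000 now, receive 300 at the end of each of 4 periods.
sample = [-1_000.0, 300.0, 300.0, 300.0, 300.0]
marr = 0.09
rate = irr(sample)
print(f"IRR = {rate:.4%}, acceptable at 9% MARR: {rate > marr}")
```

An investment is acceptable under this rule when its internal rate of return exceeds the MARR.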
Performing a Benefit-Cost Ratio Analysis
Suppose a municipality has excess funds to invest. It is choosing between the same investments described in the previous example. Government agencies often compute benefit-cost ratios to decide which investment to pursue. Which is best in this case?
Open the portfolio SASHELP.INVSAMP.NVST and compare the investments.
To compute the benefit-cost ratios, follow these steps:
1. Select all five investments.
2. Select Analyze → Benefit-Cost Ratio.
3. Enter 01JAN1996 for the Date.
4. Enter 9 for Constant MARR.
5. Click Create Benefit-Cost Ratio Summary to fill the Benefit-Cost Ratio Summary area.
The results displayed in Figure 49.5 indicate that investments 2, 4, and 5 have ratios greater than 1. Therefore, each is profitable with a MARR of 9%.
Figure 49.5 Performing a Benefit-Cost Ratio Analysis
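The benefit-cost ratio described above can be sketched as the time value of the positive amounts divided by the time value of the negative amounts. The helper name and the sample cashflow below are hypothetical, and whole-period offsets with a constant MARR are assumed.

```python
def benefit_cost_ratio(cashflow, marr_pct):
    """Time value of benefits divided by time value of costs, both at period 0."""
    rate = marr_pct / 100.0
    benefits = sum(a / (1.0 + rate) ** t for t, a in cashflow if a > 0)
    costs = sum(-a / (1.0 + rate) ** t for t, a in cashflow if a < 0)
    if costs == 0:
        raise ValueError("cashflow has no costs, so the ratio is undefined")
    return benefits / costs


# Hypothetical investment: cost of 1,000 now, benefits of 400 for three periods.
sample = [(0, -1_000.0), (1, 400.0), (2, 400.0), (3, 400.0)]
ratio = benefit_cost_ratio(sample, 9)
print(round(ratio, 4), "profitable" if ratio > 1 else "not profitable")
```

A ratio greater than 1 means the discounted benefits exceed the discounted costs, which is the same condition as a positive time value at that MARR.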
Computing a Uniform Periodic Equivalent
Suppose you need a warehouse for ten years. You have two options:
- pay rent for ten years at $23,000 per year
- build a two-stage facility that you will maintain and which you intend to sell at the end of those ten years
Data sets describing these scenarios are available in the portfolio SASHELP.INVSAMP.BUYRENT. Which option is more financially sound if you desire a 12% MARR?
To compute the uniform periodic equivalents, follow these steps:
1. Load the portfolio SASHELP.INVSAMP.BUYRENT.
2. Select both cashflows.
3. Select Analyze → Periodic Equivalent. This opens the Uniform Periodic Equivalent dialog box.
4. Enter 01JAN1996 for the Start Date.
5. Enter 10 for the Number of Periods.
6. Select YEAR for the Interval.
7. Enter 12 for the Constant MARR.
8. Click Create Periodic Equivalent Summary.
Figure 49.6 Computing a Uniform Periodic Equivalent
Figure 49.6 indicates that renting costs about $1,300 less each year. Hence, renting is more financially sound. Notice the periodic equivalent for renting is not $23,000. This is because the $23,000 per year does not account for the MARR.
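One common way to compute a uniform periodic equivalent is to take the time value of the cashflow and spread it over the periods with the capital recovery factor. The sketch below assumes that approach, a positive constant MARR, and equivalents paid at the end of each period; the helper name and the sample cashflows are hypothetical and are not the contents of SASHELP.INVSAMP.BUYRENT.

```python
def uniform_periodic_equivalent(cashflow, marr_pct, n_periods):
    """Level end-of-period amount with the same time value as the cashflow."""
    rate = marr_pct / 100.0
    present = sum(a / (1.0 + rate) ** t for t, a in cashflow)
    growth = (1.0 + rate) ** n_periods
    capital_recovery = rate * growth / (growth - 1.0)
    return present * capital_recovery


# A level end-of-year payment is its own periodic equivalent.
level = [(year, -100.0) for year in range(1, 11)]
print(round(uniform_periodic_equivalent(level, 12, 10), 2))

# A lump sum now becomes a stream of equal yearly amounts.
lump = [(0, -1_000.0)]
print(round(uniform_periodic_equivalent(lump, 12, 10), 2))
```

Amounts that do not fall at the end of a period are shifted by the MARR when they are converted, which is one way a periodic equivalent can differ from the nominal payment.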
Performing a Breakeven Analysis
In the previous example you computed the uniform periodic equivalent for a rent-buy scenario. Now let’s perform a breakeven analysis to see how the MARR affects the time values.
To perform the breakeven analysis, follow these steps:
1. Select both options.
2. Select Analyze → Breakeven Analysis.
3. Enter 01JAN1996 for the Date.
4. Enter 12.0 for Value.
5. Enter 4.0 for (+/-).
6. Enter 0.5 for Increment by.
7. Click Create Breakeven Analysis Summary to fill the Breakeven Analysis Summary area, as displayed in Figure 49.7.
Figure 49.7 Performing a Breakeven Analysis
Click Graphics to view a plot displaying the relationship between time value and MARR. Figure 49.8 Viewing a Breakeven Graph
As shown in Figure 49.8, renting is better if you want a MARR of 12%. However, if your MARR should drop to 10.5%, buying would be better. With a single investment, knowing where the graph has a time value of 0 tells you the MARR at which a venture switches from being profitable to being a loss. With multiple investments, knowing where the graphs for the various investments cross each other tells you at what MARR a particular investment becomes more profitable than another.
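The breakeven summary can be thought of as a table of time values over a grid of MARRs. The Python sketch below builds the grid from a center value, a maximum deviation, and an increment, then evaluates two cashflows at each rate. The cashflows are invented for illustration and are not the contents of SASHELP.INVSAMP.BUYRENT; the helper names are hypothetical.

```python
def time_value(cashflow, marr_pct):
    """Time value at period 0 of (period, amount) pairs at a constant MARR."""
    rate = marr_pct / 100.0
    return sum(amount / (1.0 + rate) ** t for t, amount in cashflow)


def marr_grid(value, deviation, increment):
    """MARRs from value - deviation to value + deviation in steps of increment."""
    steps = int(round(2 * deviation / increment))
    return [value - deviation + k * increment for k in range(steps + 1)]


# Hypothetical rent and buy cashflows over ten years.
rent = [(t, -23_000.0) for t in range(1, 11)]
buy = [(0, -150_000.0)] + [(t, -5_000.0) for t in range(1, 11)] + [(10, 120_000.0)]

rates = marr_grid(12.0, 4.0, 0.5)  # 8.0, 8.5, ..., 16.0
summary = [(m, time_value(rent, m), time_value(buy, m)) for m in rates]
for m, tv_rent, tv_buy in summary:
    better = "rent" if tv_rent > tv_buy else "buy"
    print(f"{m:5.1f}  {tv_rent:12.2f}  {tv_buy:12.2f}  {better}")
```

The MARR at which the preferred option changes from one row to the next corresponds to the point where the two curves cross in the breakeven graph.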
Dialog Box Guide
Time Value Analysis
Having selected a generic cashflow from the Investment Analysis dialog box, to perform a time value analysis, select Analyze → Time Value from the Investment Analysis dialog box’s menu bar. This opens the Time Value Analysis dialog box displayed in Figure 49.9.
Figure 49.9 Time Value Analysis Dialog Box
The following items are displayed:
Analysis Specifications
  Dates holds the list of dates as of which to perform the time value analysis. Right-clicking within the Dates area reveals many helpful tools for managing date lists.
  Constant MARR holds the desired MARR for the time value analysis. This value is used if the MARR List area is empty.
  MARR List holds date-rate pairs that express your desired MARR as it changes over time. Each date refers to when that expected MARR begins. Right-clicking within the MARR List area reveals many helpful tools for managing date-rate pairs.
Create Time Value Summary becomes available when you adequately specify the analysis within the Analysis Specifications area. Clicking Create Time Value Summary then fills the Time Value Summary area.
Time Value Summary fills when you click Create Time Value Summary. The table contains a row for each date in the Dates area. The remainder of each row holds the time values at that date, one value for each investment selected.
Print becomes available when you fill the time value summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you fill the time value summary. Clicking it opens the Save Output Dataset dialog box, where you can save the summary (or portions thereof) as a SAS data set.
Return takes you back to the Investment Analysis dialog box.
Uniform Periodic Equivalent
Having selected a generic cashflow from the Investment Analysis dialog box, to compute a uniform periodic equivalent, select Analyze → Periodic Equivalent from the Investment Analysis dialog box’s menu bar. This opens the Uniform Periodic Equivalent dialog box displayed in Figure 49.10.
Figure 49.10 Uniform Periodic Equivalent Dialog Box
The following items are displayed:
Analysis Specifications
  Start Date holds the date the uniform periodic equivalents begin.
  Number of Periods holds the number of uniform periodic equivalents.
  Interval holds how often the uniform periodic equivalents occur.
  Constant MARR holds the minimum attractive rate of return.
Create Periodic Equivalent Summary becomes available when you adequately fill the Analysis Specifications area. Clicking Create Periodic Equivalent Summary then fills the periodic equivalent summary.
Periodic Equivalent Summary fills with two columns when you click Create Periodic Equivalent Summary. The first column lists the investments selected. The second column lists the computed periodic equivalent amount.
Print becomes available when you fill the periodic equivalent summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you generate the periodic equivalent summary. Clicking it opens the Save Output Dataset dialog box, where you can save the summary (or portions thereof) as a SAS data set.
Return takes you back to the Investment Analysis dialog box.
Internal Rate of Return
Having selected a generic cashflow from the Investment Analysis dialog box, to perform an internal rate of return calculation, select Analyze → Internal Rate of Return from the Investment Analysis dialog box’s menu bar. This opens the Internal Rate of Return dialog box displayed in Figure 49.11.
Figure 49.11 Internal Rate of Return Dialog Box
The following items are displayed:
IRR Summary contains a row for each investment. Each row holds:
  Name holds the name of the investment.
  IRR holds the internal rate of return for that investment.
  Interval holds the interest rate interval for that IRR.
Print becomes available when you fill the IRR summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As opens the Save Output Dataset dialog box, where you can save the IRR summary (or portions thereof) as a SAS data set.
Return takes you back to the Investment Analysis dialog box.
Benefit-Cost Ratio Analysis
Having selected a generic cashflow from the Investment Analysis dialog box, to compute a benefit-cost ratio, select Analyze → Benefit-Cost Ratio from the Investment Analysis dialog box’s menu bar. This opens the Benefit-Cost Ratio Analysis dialog box displayed in Figure 49.12.
Figure 49.12 Benefit-Cost Ratio Analysis Dialog Box
The following items are displayed:
Analysis Specifications
  Dates holds the dates as of which to compute the benefit-cost ratios.
  Constant MARR holds the desired MARR. This value is used if the MARR List area is empty.
  MARR List holds date-rate pairs that express your desired MARR as it changes over time. Each date refers to when that expected MARR begins. Right-clicking within the MARR List area reveals many helpful tools for managing date-rate pairs.
Create Benefit-Cost Ratio Summary becomes available when you adequately specify the analysis. Clicking Create Benefit-Cost Ratio Summary fills the benefit-cost ratio summary.
Benefit-Cost Ratio Summary fills when you click Create Benefit-Cost Ratio Summary. The area contains a row for each date in the Dates area. The remainder of each row holds the benefit-cost ratios at that date, one value for each investment selected.
Print becomes available when you fill the benefit-cost ratio summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you generate the benefit-cost ratio summary. Clicking it opens the Save Output Dataset dialog box, where you can save the summary (or portions thereof) as a SAS data set.
Return takes you back to the Investment Analysis dialog box.
Breakeven Analysis
Having selected a generic cashflow from the Investment Analysis dialog box, to perform a breakeven analysis, select Analyze → Breakeven Analysis from the Investment Analysis dialog box’s menu bar. This opens the Breakeven Analysis dialog box displayed in Figure 49.13.
Figure 49.13 Breakeven Analysis Dialog Box
The following items are displayed:
Analysis Specification
  Analysis holds the analysis type. Only Time Value is currently available.
  Date holds the date for which you perform this analysis.
  Variable holds the variable upon which the breakeven analysis will vary. Only MARR is currently available.
  Value holds the desired rate upon which to center the analysis.
  +/- holds the maximum deviation from the Value to consider.
  Increment by holds the increment by which the analysis is calculated.
Create Breakeven Analysis Summary becomes available when you adequately specify the analysis. Clicking Create Breakeven Analysis Summary then fills the Breakeven Analysis Summary area.
Breakeven Analysis Summary fills when you click Create Breakeven Analysis Summary. The schedule contains a row for each MARR and date.
Graphics becomes available when you fill the Breakeven Analysis Summary area. Clicking it opens the Breakeven Graph dialog box, which plots the time value versus MARR.
Print becomes available when you fill the breakeven analysis summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you generate the breakeven analysis summary. Clicking it opens the Save Output Dataset dialog box, where you can save the summary (or portions thereof) as a SAS data set.
Return takes you back to the Investment Analysis dialog box.
Breakeven Graph
Suppose you perform a breakeven analysis in the Breakeven Analysis dialog box. Once you create the breakeven analysis summary, you can click the Graphics button to open the Breakeven Graph dialog box displayed in Figure 49.14.
Figure 49.14 Breakeven Graph Dialog Box
The following item is displayed: Return takes you back to the Breakeven Analysis dialog box.
Chapter 50
Details
Contents
Investments and Data Sets
    Saving Output to SAS Data Sets
    Loading a SAS Data Set into a List
    Saving Data from a List to a SAS Data Set
Right Mouse Button Options
Depreciation Methods
    Straight Line (SL)
    Sum-of-Years Digits
    Declining Balance (DB)
Rate Information
    The Tools Menu
    Dialog Box Guide
Minimum Attractive Rate of Return (MARR)
Income Tax Specification
Inflation Specification
Reference
Investments and Data Sets
Investment Analysis provides tools to assist you in moving data between SAS data sets and lists you can use within Investment Analysis.
Saving Output to SAS Data Sets
Many investment specifications have a button that reads Save Data As. Clicking that button opens the Save Output Dataset dialog box (see Figure 50.1). This dialog box enables you to save all or part of the area generated by the specification.
Figure 50.1 Saving to a Dataset
The following items are displayed:
Dataset Name holds the SAS data set name to which you want to save.
Browse opens the standard SAS Open dialog box, which enables you to select an existing SAS data set to overwrite.
Dataset Label holds the SAS data set’s label.
Dataset Variables organizes variables. The variables listed in the Selected area will be included in the SAS data set.
  You can select variables one at a time by clicking the single right-arrow after each selection to move it to the Selected area.
  If the desired SAS data set has many variables you want to save, it may be simpler to follow these steps:
  1. Click the double right-arrow to select all available variables.
  2. Remove any unwanted variable by selecting it from the Selected area and clicking the single left-arrow.
  The double left-arrow removes all selected variables from the proposed SAS data set.
  The up and down arrows below the Available and Selected boxes enable you to scroll up and down the list of variables in their respective boxes.
Save Dataset attempts to save the SAS data set. If the SAS data set name exists, you are asked if you want to replace the existing SAS data set, append to the existing SAS data set, or cancel the current save attempt. You then return to this dialog box ready to create another SAS data set to save.
Return takes you back to the specification dialog box.
Loading a SAS Data Set into a List
Right-click in the area into which you want to load the list, and release on Load. This opens the Load Dataset dialog box (see Figure 50.2).
Figure 50.2 Load Dataset Dialog Box
The following items are displayed:
Dataset Name holds the name of the SAS data set that you want to load.
Browse opens the standard SAS Open dialog box, which aids in finding a SAS data set to load.
  If there is a Date variable in the SAS data set, Investment Analysis loads it into the list. If there is no Date variable, it loads the first available time-formatted variable.
  If an amount or rate variable is needed, Investment Analysis searches the SAS data set for an Amount or Rate variable to use. Otherwise it takes the first numeric variable that is not used by the Date variable.
Dataset Label holds a SAS data set label.
OK attempts to load the SAS data set specified in Dataset Name. If the specified SAS data set exists, clicking OK returns you to the calling dialog box with the selected SAS data set filling the list. If the specified SAS data set does not exist and you click OK, you receive an error message and no SAS data set is loaded.
Cancel returns you to the calling dialog box without loading a SAS data set.
Saving Data from a List to a SAS Data Set
Right-click in the area that holds the list you want to save, and release on Save. This opens the Save Dataset dialog box.
Figure 50.3 Save Dataset Dialog Box
The following items are displayed:
Dataset Name holds the SAS data set name to which you want to save.
Browse opens the standard SAS Save As dialog box, which enables you to find an existing SAS data set to overwrite.
Dataset Label holds a user-defined description to be saved as the label of the SAS data set.
OK saves the current data to the SAS data set specified in Dataset Name. If the specified SAS data set does not already exist, clicking OK saves the SAS data set and returns you to the calling dialog box. If the specified SAS data set does already exist, clicking OK warns you and enables you to replace the old SAS data set with the new SAS data set or cancel the save attempt.
Cancel aborts the save process. Clicking Cancel returns you to the calling dialog box without attempting to save.
Right Mouse Button Options
A pop-up menu often appears when you right-click within table editors. The menus offer tools to aid in the management of the table’s entries. Most table editors provide the following options.
Figure 50.4 Right-Clicking Options
Add creates a blank row.
Delete removes any currently selected row.
Copy duplicates the currently selected row.
Sort arranges the rows in chronological order according to the date variable.
Clear empties the table of all rows.
Save opens the Save Dataset dialog box, where you can save all rows to a SAS data set for later use.
Load opens the Load Dataset dialog box, where you select a SAS data set to fill the rows.
If you want to perform one of these actions on a collection of rows, you must select the collection of rows before right-clicking. To select an adjacent list of rows, do the following: click the first row, hold down SHIFT, and click the final row. After the list of rows is selected, you may release the SHIFT key.
Depreciation Methods
Suppose an asset’s price is $20,000 and it has a salvage value of $5,000 in five years. The following sections describe various methods to quantify the depreciation.
Straight Line (SL)
This method assumes a constant depreciation value per year. Assuming that the price of a depreciating asset is \(P\) and its salvage value after \(N\) years is \(S\), the annual depreciation is:
\[ \frac{P - S}{N} \]
For our example, the annual depreciation would be
\[ \frac{\$20{,}000 - \$5{,}000}{5} = \$3{,}000 \]
Sum-of-Years Digits
An asset often loses more of its value early in its lifetime. A method that exhibits this dynamic is desirable. Assume an asset depreciates from price \(P\) to salvage value \(S\) in \(N\) years. First compute the sum-of-years as \(T = 1 + 2 + \cdots + N\). The depreciation for the years after the asset’s purchase is shown in Table 50.1.

Table 50.1 Sum-of-Years General Example
Year Number    Annual Depreciation
first          \(\frac{N}{T}(P - S)\)
second         \(\frac{N - 1}{T}(P - S)\)
third          \(\frac{N - 2}{T}(P - S)\)
...            ...
final          \(\frac{1}{T}(P - S)\)

For the \(i\)th year of the asset’s use, the annual depreciation is:
\[ \frac{N + 1 - i}{T}\,(P - S) \]
For our example, \(N = 5\) and the sum of years is \(T = 1 + 2 + 3 + 4 + 5 = 15\). The depreciation during the first year is
\[ (\$20{,}000 - \$5{,}000)\,\frac{5}{15} = \$5{,}000 \]
Table 50.2 describes how Sum-of-Years Digits would depreciate the asset.

Table 50.2 Sum-of-Years Example
Year   Depreciation                             Year-End Value
1      ($20,000 – $5,000) × 5/15 = $5,000       $15,000.00
2      ($20,000 – $5,000) × 4/15 = $4,000       $11,000.00
3      ($20,000 – $5,000) × 3/15 = $3,000       $8,000.00
4      ($20,000 – $5,000) × 2/15 = $2,000       $6,000.00
5      ($20,000 – $5,000) × 1/15 = $1,000       $5,000.00
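As expected, the value after \(N\) years is \(S\). For our example, the value after five years is
\[
\begin{aligned}
P - (\text{5 years' depreciation})
  &= P - \left( \tfrac{5}{15}(P - S) + \tfrac{4}{15}(P - S) + \tfrac{3}{15}(P - S) + \tfrac{2}{15}(P - S) + \tfrac{1}{15}(P - S) \right) \\
  &= P - (P - S) \\
  &= S
\end{aligned}
\]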
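The sum-of-years schedule can be reproduced with a few lines of Python. This is a sketch of the formula given above, not code from Investment Analysis; the function name `sum_of_years_digits` is hypothetical.

```python
def sum_of_years_digits(price, salvage, n_years):
    """Return (year, depreciation, year_end_value) rows for sum-of-years digits."""
    total = n_years * (n_years + 1) // 2  # T = 1 + 2 + ... + N
    value = price
    rows = []
    for i in range(1, n_years + 1):
        # Depreciation in year i is (N + 1 - i) / T of the depreciable amount.
        depreciation = (price - salvage) * (n_years + 1 - i) / total
        value -= depreciation
        rows.append((i, depreciation, value))
    return rows


rows = sum_of_years_digits(20_000.0, 5_000.0, 5)
for year, depreciation, value in rows:
    print(f"{year}  {depreciation:10.2f}  {value:10.2f}")
```

Because the yearly fractions sum to 1, the final year-end value equals the salvage value.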
Declining Balance (DB)
Recall that the straight line method assumes a constant depreciation value. Conversely, the declining balance method assumes a constant depreciation rate per year. And like the sum-of-years method, more depreciation tends to occur earlier in the asset’s life. Assume the price of a depreciating asset is \(P\) and its salvage value after \(N\) years is \(S\). You could assume the asset depreciates by a factor of \(\frac{1}{N}\) (or a rate of \(\frac{100}{N}\%\)). This method is known as single declining balance. The annual depreciation is:
\[ \frac{1}{N}\,(\text{previous year's value}) \]
So for our example, the depreciation during the first year is \(\frac{\$20{,}000}{5} = \$4{,}000\). Table 50.3 describes how declining balance would depreciate the asset.

Table 50.3 Declining Balance Example
Year   Depreciation                   Year-End Value
1      $20,000.00 / 5 = $4,000.00     $16,000.00
2      $16,000.00 / 5 = $3,200.00     $12,800.00
3      $12,800.00 / 5 = $2,560.00     $10,240.00
4      $10,240.00 / 5 = $2,048.00     $8,192.00
5      $8,192.00 / 5 = $1,638.40      $6,553.60
DB Factor

You could also accelerate the depreciation by increasing the factor (and hence the rate) at which depreciation occurs. Other commonly accepted depreciation rates are \(\frac{200}{N}\)% (called double declining balance as the depreciation factor becomes \(\frac{2}{N}\)) and \(\frac{150}{N}\)%. Investment Analysis enables you to choose between these three types for declining balance: 2 (with \(\frac{200}{N}\)% depreciation), 1.5 (with \(\frac{150}{N}\)%), and 1 (with \(\frac{100}{N}\)%).
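As a rough sketch, the declining balance rule with a selectable factor might look like the following Python function. The name `declining_balance_schedule` and the `factor` argument are assumptions made for this illustration; they simply mirror the factors 1, 1.5, and 2 described above.

```python
def declining_balance_schedule(price, years, factor=1.0):
    """Return (year, depreciation, year_end_value) rows for declining balance.

    Hypothetical helper: factor=1 is single declining balance,
    factor=1.5 is the 150% variant, factor=2 is double declining balance.
    """
    rate = factor / years  # fraction of the previous year's value lost each year
    value = price
    rows = []
    for year in range(1, years + 1):
        depreciation = rate * value
        value -= depreciation
        rows.append((year, depreciation, value))
    return rows


single = declining_balance_schedule(20000, 5, factor=1)
double = declining_balance_schedule(20000, 5, factor=2)
for year, depreciation, value in single:
    print(f"{year}  {depreciation:10.2f}  {value:10.2f}")
```

With factor 1 and the example asset, the final year-end value is $6,553.60, as in Table 50.3. With factor 2 the value after five years is about $1,555.20. In neither case does the salvage value enter the calculation.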
Declining Balance and the Salvage Value

The declining balance method assumes that depreciation is faster earlier in an asset's life; this is what you wanted. But notice the final value is greater than the salvage value. Even if the salvage value were greater than $6,553.60, the final year-end value would not change. The salvage value never enters the calculation, so there is no way for the salvage value to force the depreciation to assume its value. Newnan and Lavelle (1998) describe two ways to adapt the declining balance method to assume the salvage value at the final time. One way is as follows:

Suppose you call the depreciated value after \(i\) years \(V(i)\). This sets \(V(0) = P\) and \(V(N) = S\).

If \(V(N) > S\) according to the usual calculation for \(V(N)\), redefine \(V(N)\) to equal \(S\).

If \(V(i) < S\) according to the usual calculation for \(V(i)\) for some \(i\) (and hence for all subsequent \(V(i)\) values), you can redefine all such \(V(i)\) to equal \(S\).

This alteration to declining balance forces the depreciated value of the asset after \(N\) years to be \(S\) and keeps \(V(i)\) no less than \(S\).
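The first adaptation can be sketched as follows, assuming the "usual calculation" is the plain declining balance recursion. The function name `declining_balance_with_floor` is hypothetical and is used only for illustration.

```python
def declining_balance_with_floor(price, salvage, years, factor=1.0):
    """Return [V(0), ..., V(N)] for declining balance forced to end at the salvage value.

    Hypothetical helper for illustration only.
    """
    rate = factor / years
    values = [price]  # V(0) = P
    for _ in range(years):
        values.append(values[-1] * (1 - rate))  # usual declining balance step
    # Any V(i) that falls below S is redefined to equal S
    values = [max(v, salvage) for v in values]
    # If V(N) is still above S, redefine V(N) to equal S
    values[-1] = salvage
    return values


print(declining_balance_with_floor(20000, 5000, 5, factor=1))
print(declining_balance_with_floor(20000, 5000, 5, factor=2))
```

For the example asset with factor 1, only the final value changes, dropping from $6,553.60 to $5,000 in the last year. With factor 2, the value is held at $5,000 from the third year onward.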
Conversion to SL

The second (and preferred) way to force declining balance to assume the salvage value is by conversion to straight line. If \(V(N) > S\), the first way redefines \(V(N)\) to equal \(S\); you can think of this as converting to the straight line method for the last timestep. If the \(V(N)\) value supplied by DB is appreciably larger than \(S\), then the depreciation in the final year would be unrealistically large. An alternate way is to compute the DB and SL step at each timestep and take whichever step gives a larger depreciation (unless DB drops below the salvage value). After SL assumes a larger depreciation, it continues to be larger over the life of the asset. SL forces the value at the final time to equal the salvage value. As an algorithm, this looks like the following statements:

   V(0) = P;
   for i = 1 to N
      if DB step > SL step from (i, V(i))
         take a DB step to make V(i);
      else
         break;
   for j = i to N
      take a SL step to make V(j);
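A minimal runnable sketch of this conversion is shown below. It makes two assumptions that the text does not spell out: the SL step in year \(i\) is taken as the remaining depreciable amount divided by the remaining years, and a DB step that would push the value below the salvage value also triggers the switch to SL. The function name `db_with_sl_conversion` is hypothetical.

```python
def db_with_sl_conversion(price, salvage, years, factor=2.0):
    """Return [V(0), ..., V(N)] for declining balance with conversion to straight line.

    Hypothetical helper; the switching rule is an assumption (see lead-in).
    """
    rate = factor / years
    values = [price]  # V(0) = P
    use_sl = False
    for i in range(1, years + 1):
        current = values[-1]
        remaining_years = years - i + 1
        sl_step = (current - salvage) / remaining_years  # straight line to S
        db_step = rate * current                         # declining balance
        # Switch once SL depreciates at least as much as DB, or when a DB
        # step would drop below the salvage value; never switch back.
        if not use_sl and (db_step <= sl_step or current - db_step < salvage):
            use_sl = True
        values.append(current - (sl_step if use_sl else db_step))
    return values


print(db_with_sl_conversion(20000, 5000, 5, factor=1))
print(db_with_sl_conversion(20000, 5000, 5, factor=2))
```

For the example asset with factor 1, the sketch takes DB steps in years 1 and 2 and then switches to SL steps of $2,600, ending at exactly the $5,000 salvage value.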
The MACRS, which is discussed in the section that describes the Depreciation Table window, is actually a variation on the declining balance method with conversion to the straight line method.
Rate Information
The Tools Menu

Figure 50.5 shows the Tools menu.

Figure 50.5 The Tools Menu

The Tools → Define Rates menu offers the following options.

MARR opens the Minimum Attractive Rate of Return (MARR) dialog box.
Income Tax Rate opens the Income Tax Specification dialog box.
Inflation opens the Inflation Specification dialog box.
Dialog Box Guide
Minimum Attractive Rate of Return (MARR)

Selecting Tools → Define Rates → MARR from the Investment Analysis dialog box menu bar opens the MARR dialog box that is displayed in Figure 50.6.

Figure 50.6 MARR Dialog Box

Name holds the name that you assign to the MARR specification. This must be a valid SAS name.
Constant MARR holds the numeric value that you choose to be the constant MARR. This value is used if the MARR List table editor is empty.
MARR List holds date-MARR pairs where the date refers to when the particular MARR value begins. Each date is a SAS date.
OK returns you to the Investment Analysis dialog box. Clicking it causes the preceding MARR specification to be assumed when you do not specify MARR rates in a dialog box that needs MARR rates.
Cancel returns you to the Investment Analysis dialog box, discarding any work that was done in the MARR dialog box.
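One way to picture how a constant rate and a list of date-rate pairs fit together is the following Python sketch. It is not how Investment Analysis is implemented; the function name `marr_on` is hypothetical, and falling back to the constant MARR for dates before the first list entry is an assumption made only for this illustration.

```python
from bisect import bisect_right
from datetime import date


def marr_on(when, constant_marr, marr_list):
    """Return the MARR in effect on a given date.

    marr_list holds (start_date, rate) pairs; each rate applies from its
    start date until the next pair begins. Hypothetical helper.
    """
    if not marr_list:
        return constant_marr  # the constant is used when the list is empty
    pairs = sorted(marr_list)
    starts = [start for start, _ in pairs]
    index = bisect_right(starts, when) - 1  # last pair starting on or before `when`
    if index < 0:
        return constant_marr  # assumption: no list entry has begun yet
    return pairs[index][1]


rates = [(date(2020, 1, 1), 0.10), (date(2022, 1, 1), 0.12)]
print(marr_on(date(2021, 6, 1), 0.08, rates))
print(marr_on(date(2021, 6, 1), 0.08, []))
```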
Income Tax Specification

Selecting Tools → Define Rates → Income Tax Rate from the Investment Analysis dialog box menu bar opens the Income Tax Specification dialog box displayed in Figure 50.7.

Figure 50.7 Income Tax Specification Dialog Box

Name holds the name you assign to the Income Tax specification. This must be a valid SAS name.
Federal Tax holds the numeric value that you want to be the constant Federal Tax.
Local Tax holds the numeric value that you want to be the constant Local Tax.
Taxrate List holds date-Income Tax triples where the date refers to when the particular Income Tax value begins. Each date is a SAS date, and the value is a percentage between 0% and 100%.
OK returns you to the Investment Analysis dialog box. Clicking it causes the preceding income tax specification to be the default income tax rates when using the After Tax Cashflow Calculation dialog box.
Cancel returns you to the Investment Analysis dialog box, discarding any changes that were made since this dialog box was opened.
Inflation Specification

Selecting Tools → Define Rates → Inflation from the Investment Analysis dialog box menu bar opens the Inflation Specification dialog box displayed in Figure 50.8.

Figure 50.8 Inflation Specification Dialog Box

Name holds the name that you assign to the Inflation specification. This must be a valid SAS name.
Constant Rate holds the numeric value that you want to be the constant inflation rate. This value is used if the Inflation Rate List table editor is empty.
Inflation Rate List holds date-rate pairs where the date refers to when the particular inflation rate begins. Each date is a SAS date and the rate is a percentage between 0% and 120%.
OK returns you to the Investment Analysis dialog box. Clicking it causes the preceding inflation specification to be assumed when you use the Constant Dollar Calculation dialog box and do not specify inflation rates.
Cancel returns you to the Investment Analysis dialog box, discarding any changes that were made since this dialog box was opened.
Reference

Newnan, Donald G. and Lavelle, Jerome P. (1998), Engineering Economic Analysis, Austin, Texas: Engineering Press.
Subject Index @CRSPDB Date Informats SASECRSP engine, 2209 @CRSPDR Date Informats SASECRSP engine, 2209 @CRSPDT Date Informats SASECRSP engine, 2209 2SLS estimation method, see two-stage least squares 3SLS estimation method, see three-stage least squares add factors, see adjustments additive model ARIMA model, 212 additive Winters method seasonal forecasting, 803 additive-invertible region smoothing weights, 2671 ADDWINTERS method FORECAST procedure, 803 adjacency graph MODEL procedure, 1178 adjust operators, 754 adjustable rate mortgage, see LOAN procedure LOAN procedure, 828 adjusted R squared MODEL procedure, 1027 adjusted R-square statistics of fit, 1819, 2689 adjustments, 2554, 2667 add factors, 2522 forecasting models, 2522 specifying, 2522 After Tax Cashflow Calculation, 2754 AGGREGATE method EXPAND procedure, 740 aggregation of time series data, 721, 724 aggregation of time series EXPAND procedure, 721, 724 AIC, see Akaike information criterion, see Akaike’s information criterion Akaike Information Criterion VARMAX procedure, 1957 Akaike information criterion AIC, 251 ARIMA procedure, 251 AUTOREG procedure, 370
used to select state space models, 1591 Akaike information criterion corrected AUTOREG procedure, 370 Akaike’s information criterion AIC, 2689 statistics of fit, 2689 alignment of dates, 2582 time intervals, 132 alignment of dates, 142, 2582 Almon lag polynomials, see polynomial distributed lags MODEL procedure, 1102 alternatives to DIF function, 108 LAG function, 108 Amemiya’s prediction criterion statistics of fit, 2689 Amemiya’s R-square statistics of fit, 1820, 2689 AMO, 59 amortization schedule LOAN procedure, 856 analyzing models MODEL procedure, 1174 and goal seeking ordinary differential equations (ODEs), 1074 and state space models stationarity, 1571 and tests for autocorrelation lagged dependent variables, 326 and the OUTPUT statement output data sets, 83 Annuity, see Uniform Periodic Equivalent AR initial conditions conditional least squares, 1091 Hildreth-Lu, 1091 maximum likelihood, 1091 unconditional least squares, 1091 Yule-Walker, 1091 ARCH model AUTOREG procedure, 314 autoregressive conditional heteroscedasticity, 314 ARIMA model additive model, 212 ARIMA procedure, 190
2794 F Subject Index
autoregressive integrated moving-average model, 190, 2680 Box-Jenkins model, 190 factored model, 212 multiplicative model, 212 notation for, 207 seasonal model, 212 simulating, 2560, 2653 subset model, 212 ARIMA model specification, 2557, 2589 ARIMA models forecasting models, 2465 specifying, 2465 ARIMA procedure Akaike information criterion, 251 ARIMA model, 190 ARIMAX model, 190, 213 ARMA model, 190 autocorrelations, 193 autoregressive parameters, 256 BY groups, 227 conditional forecasts, 257 confidence limits, 257 correlation plots, 193 cross-correlation function, 240 data requirements, 221 differencing, 210, 247, 254 factored model, 212 finite memory forecasts, 258 forecasting, 257, 259 Gauss-Marquardt method, 250 ID variables, 259 infinite memory forecasts, 257 input series, 213 interaction effects, 217 intervention model, 213, 216, 219, 298 inverse autocorrelation function, 239 invertibility, 255 Iterative Outlier Detection, 307 log transformations, 258 Marquardt method, 250 Model Identification, 300 moving-average parameters, 256 naming model parameters, 256 ODS graph names, 276 ODS Graphics, 224 Outlier Detection, 305 output data sets, 262–264, 267, 268 output table names, 272 predicted values, 257 prewhitening, 246, 247 printed output, 269 rational transfer functions, 219
regression model with ARMA errors, 213, 214 residuals, 257 Schwarz Bayesian criterion, 251 seasonal model, 212 stationarity, 194 subset model, 212 syntax, 221 time intervals, 259 transfer function model, 213, 218, 251 unconditional forecasts, 258 ARIMA process specification, 2560 ARIMAX model ARIMA procedure, 190, 213 ARIMAX models and design matrix, 217 ARMA model ARIMA procedure, 190 autoregressive moving-average model, 190 MODEL procedure, 1088 notation for, 207 as time ID observation numbers, 2443 asymptotic distribution of impulse response functions VARMAX procedure, 1943, 1951 asymptotic distribution of the parameter estimation VARMAX procedure, 1951 at annual rates percent change calculations, 110 attributes DATASOURCE procedure, 520 attributes of variables DATASOURCE procedure, 545 audit trail, 2582 augmented Dickey-Fuller tests, 231, 246 autocorrelation, 1058 autocorrelation tests, 1058 Durbin-Watson test, 343 Godfrey Lagrange test, 1058 Godfrey’s test, 344 autocorrelations ARIMA procedure, 193 multivariate, 1573 plotting, 193 prediction errors, 2430 series, 2495 automatic forecasting FORECAST procedure, 774 STATESPACE procedure, 1568 automatic generation forecasting models, 2396 automatic inclusion of
Subject Index F 2795
interventions, 2582 automatic model selection criterion, 2610 options, 2568 automatic selection forecasting models, 2459 AUTOREG procedure Akaike information criterion, 370 Akaike information criterion corrected, 370 ARCH model, 314 autoregressive error correction, 316 BY groups, 340 Cholesky root, 359 Cochrane-Orcutt method, 361 conditional variance, 383 confidence limits, 355 dual quasi-Newton method, 367 Durbin h test, 326 Durbin t test, 326 Durbin-Watson test, 324 EGARCH model, 335 EGLS method, 361 estimation methods, 357 factored model, 329 GARCH model, 314 GARCH-M model, 335 Gauss-Marquardt method, 360 generalized Durbin-Watson tests, 324 heteroscedasticity, 329 Hildreth-Lu method, 362 IGARCH model, 335 Kalman filter, 360 lagged dependent variables, 326 maximum likelihood method, 362 nonlinear least-squares, 362 ODS graph names, 388 output data sets, 384 output table names, 386 Prais-Winsten estimates, 361 predicted values, 355, 381, 382 printed output, 385 quasi-Newton method, 342 random walk model, 397 residuals, 356 Schwarz Bayesian criterion, 370 serial correlation correction, 316 stepwise autoregression, 327 structural predictions, 381 subset model, 329 Toeplitz matrix, 358 trust region method, 342 two-step full transform method, 361 Yule-Walker equations, 358 Yule-Walker estimates, 357
autoregressive conditional heteroscedasticity, see ARCH model autoregressive error correction AUTOREG procedure, 316 autoregressive integrated moving-average model, see ARIMA model, see ARIMA model autoregressive models FORECAST procedure, 797 MODEL procedure, 1088 autoregressive moving-average model, see ARMA model autoregressive parameters ARIMA procedure, 256 auxiliary data sets DATASOURCE procedure, 520 auxiliary equations, 1074 MODEL procedure, 1074 balance of payment statistics data files, see DATASOURCE procedure balloon payment mortgage, see LOAN procedure LOAN procedure, 828 bandwidth functions, 1013 Base SAS software, 48 Basmann test SYSLIN procedure, 1639, 1654 batch mode, see unattended mode Bayesian vector autoregressive models VARMAX procedure, 1904, 1947 BEA data files, see DATASOURCE procedure BEA national income and product accounts PC Format DATASOURCE procedure, 590 BEA S-pages, see DATASOURCE procedure Benefit-Cost Ratio Analysis, 2769 between between estimators, 1291 Between Estimators PANEL procedure, 1291 between estimators, 1291 between, 1291 between levels and rates interpolation, 125 between stocks and flows interpolation, 125 BIC, see Schwarz Bayesian information criterion block structure MODEL procedure, 1178 BLS consumer price index surveys DATASOURCE procedure, 591 BLS data files, see DATASOURCE procedure BLS national employment, hours, and earnings survey DATASOURCE procedure, 592
2796 F Subject Index
BLS producer price index survey DATASOURCE procedure, 591 BLS state and area employment, hours, and earnings survey DATASOURCE procedure, 593 Bond, 2722 BOPS data file DATASOURCE procedure, 609 boundaries smoothing weights, 2671 bounds on parameter estimates, 644, 975, 1386 BOUNDS statement, 644, 975, 1386 Box Cox transformations, 2667 Box Cox transformation, see transformations Box-Cox transformation BOXCOXAR macro, 150 Box-Jenkins model, see ARIMA model BOXCOXAR macro Box-Cox transformation, 150 output data sets, 151 SAS macros, 150 break even analysis LOAN procedure, 852 Breakeven Analysis, 2771 Breusch-Pagan test, 1050 heteroscedasticity tests, 1050 Brown smoothing model, see double exponential smoothing Bureau of Economic Analysis data files, see DATASOURCE procedure Bureau of Labor Statistics data files, see DATASOURCE procedure buydown rate loans, see LOAN procedure LOAN procedure, 828 BY groups ARIMA procedure, 227 AUTOREG procedure, 340 COUNTREG procedure, 491 cross-sectional dimensions and, 79 ESM procedure, 689 EXPAND procedure, 731 FORECAST procedure, 795 MDC procedure, 891 PANEL procedure, 1271 PDLREG procedure, 1355 SIMILARITY procedure, 1450 SIMLIN procedure, 1516 SPECTRA procedure, 1546 STATESPACE procedure, 1586 SYSLIN procedure, 1637 TIMESERIES procedure, 1688 TSCSREG procedure, 1735 UCM procedure, 1760
X11 procedure, 2046 X12 procedure, 2111 BY groups and time series cross-sectional form, 79 calculation of leads, 112 calculations smoothing models, 2669 calendar calculations functions for, 95, 143 interval functions and, 104 time intervals and, 104 calendar calculations and INTCK function, 104 INTNX function, 104 time intervals, 104 calendar functions and date values, 96 calendar variables, 95 computing dates from, 96 computing from dates, 96 computing from datetime values, 98 canonical correlation analysis for selection of state space models, 1570, 1593 STATESPACE procedure, 1570, 1593 Cashflow, see Generic Cashflow CATALOG procedure, 49 SAS catalogs, 49 Cauchy distribution estimation example, 1219 examples, 1219 CDT (COMPUTAB data table) COMPUTAB procedure, 454 ceiling of time intervals, 102 Censored Regression Models QLIM procedure, 1399 Census X-11 method, see X11 procedure Census X-11 methodology X11 procedure, 2059 Census X-12 method, see X12 procedure Center for Research in Security Prices data files, see DATASOURCE procedure centered moving time window operators, 745, 746 change vector, 1027 changes in trend forecasting models, 2530 changing by interpolation frequency, 125, 721, 733 periodicity, 125, 721 sampling frequency, 125
Subject Index F 2797
changing periodicity EXPAND procedure, 125 time series data, 125, 721 character functions, 50 character variables MODEL procedure, 1155 CHART procedure, 49 histograms, 49 checking data periodicity INTNX function, 103 time intervals, 103 Chirp-Z algorithm SPECTRA procedure, 1549 choice of instrumental variables, 1084 Cholesky root AUTOREG procedure, 359 Chow test, 343, 344, 375 Chow test for structural change, 343 Chow tests, 1081 MODEL procedure, 1081 CITIBASE format DATASOURCE procedure, 523 CITIBASE old format DATASOURCE procedure, 594 CITIBASE PC format DATASOURCE procedure, 595 classical decomposition operators, 749 Cochrane-Orcutt method AUTOREG procedure, 361 coherency cross-spectral analysis, 1553 coherency of cross-spectrum SPECTRA procedure, 1553 cointegration VARMAX procedure, 1959 cointegration test, 345, 377 cointegration testing VARMAX procedure, 1902, 1963 collinearity diagnostics MODEL procedure, 1032, 1041 column blocks COMPUTAB procedure, 455 column selection COMPUTAB procedure, 452, 453 COLxxxxx: label COMPUTAB procedure, 446 combination models forecasting models, 2482 specifying, 2482 combined seasonality test, 2076, 2136 combined with cross-sectional dimension interleaved time series, 82
combined with interleaved time series cross-sectional dimensions, 82 combining forecasts, 2591, 2666 combining time series data sets, 119 Command Reference, 2545 common trends VARMAX procedure, 1959 common trends testing VARMAX procedure, 1903, 1961 COMPARE procedure, 49 comparing SAS data sets, 49 comparing forecasting models, 2505, 2605 comparing forecasting models, 2505, 2605 comparing loans LOAN procedure, 835, 852, 856 comparing SAS data sets, see COMPARE procedure compiler listing MODEL procedure, 1171 COMPUSTAT data files, see DATASOURCE procedure DATASOURCE procedure, 595 COMPUSTAT IBM 360/370 general format 48 quarter files DATASOURCE procedure, 597 COMPUSTAT IBM 360/370 general format annual files DATASOURCE procedure, 596 COMPUSTAT universal character format 48 quarter files DATASOURCE procedure, 599 COMPUSTAT universal character format annual files DATASOURCE procedure, 598 COMPUTAB procedure CDT (COMPUTAB data table), 454 column blocks, 455 column selection, 452, 453 COLxxxxx: label, 446 consolidation tables, 446 controlling row and column block execution, 453 input block, 455 missing values, 457 order of calculations, 451 output data sets, 457 program flow, 448 programming statements, 445 reserved words, 456 row blocks, 456 ROWxxxxx: label, 446 table cells, direct access to, 456 computational details
2798 F Subject Index
VARMAX procedure, 2001 computing calendar variables from datetime values, 98 computing ceiling of intervals INTNX function, 102 computing dates from calendar variables, 96 computing datetime values from date values, 97 computing ending date of intervals INTNX function, 101 computing from calendar variables datetime values, 97 computing from dates calendar variables, 96 computing from datetime values calendar variables, 98 date values, 97 time variables, 98 computing from time variables datetime values, 97 computing lags RETAIN statement, 108 computing midpoint date of intervals INTNX function, 101 computing time variables from datetime values, 98 computing widths of intervals INTNX function, 101 concatenated data set, 2410 concentrated likelihood Hessian, 1021 conditional forecasts ARIMA procedure, 257 conditional least squares AR initial conditions, 1091 MA Initial Conditions, 1092 conditional logit model MDC procedure, 870, 871, 904 conditional t distribution GARCH model, 367 conditional variance AUTOREG procedure, 383 predicted values, 383 predicting, 383 confidence limits, 2436 ARIMA procedure, 257 AUTOREG procedure, 355 FORECAST procedure, 807 forecasts, 2436 PDLREG procedure, 1359 STATESPACE procedure, 1601 VARMAX procedure, 1988 consolidation tables
COMPUTAB procedure, 446 Constant Dollar Calculation, 2758 constrained estimation heteroscedasticity models, 351 Consumer Price Index Surveys, see DATASOURCE procedure contemporaneous correlation of errors across equations, 1649 contents of SAS data sets, 49 CONTENTS procedure, 49 SASECRSP engine, 2194 SASEFAME engine, 2291 continuous compounding LOAN procedure, 850 contrasted with flow variables stocks, 724 contrasted with flows or rates levels, 724 contrasted with missing values omitted observations, 78 contrasted with omitted observations missing observations, 78 missing values, 78 contrasted with stock variables flows, 724 contrasted with stocks or levels rates, 724 control charts, 56 control key for multiple selections, 2398 control variables MODEL procedure, 1153 controlling row and column block execution COMPUTAB procedure, 453 controlling starting values MODEL procedure, 1034 convergence criteria MODEL procedure, 1028 convergence problems VARMAX procedure, 2001 conversion methods EXPAND procedure, 739 convert option SASEFAME engine, 2291 Converting Dates Using the CRSP Date Functions SASECRSP engine, 2208 converting frequency of time series data, 721 COPY procedure, 49 copying SAS data sets, 49 CORR procedure, 49
Subject Index F 2799
corrected sum of squares statistics of fit, 2689 correlation plots ARIMA procedure, 193 cospectrum estimate cross-spectral analysis, 1553 SPECTRA procedure, 1553 counting time intervals, 99, 102 counting time intervals INTCK function, 102 COUNTREG procedure bounds on parameter estimates, 490 BY groups, 491 output table names, 508 restrictions on parameter estimates, 493 syntax, 487 covariance estimates GARCH model, 343 Covariance of GMM estimators, 1015 covariance of the parameter estimates, 1008 covariance stationarity VARMAX procedure, 1983 covariates heteroscedasticity models, 350, 1390 CPORT procedure, 49 CPU requirements VARMAX procedure, 2002 creating time ID variable, 2439 creating a FAME view, see SASEFAME engine creating a Haver view, see SASEHAVR engine creating from Model Viewer HTML, 2618 creating from Time Series Viewer HTML, 2657 criterion automatic model selection, 2610 cross sectional dimensions represented by different series, 79 cross sections DATASOURCE procedure, 527, 529, 532, 543 cross-correlation function ARIMA procedure, 240 cross-equation covariance matrix MODEL procedure, 1026 seemingly unrelated regression, 1011 cross-periodogram cross-spectral analysis, 1542, 1553 SPECTRA procedure, 1553 cross-reference MODEL procedure, 1170 cross-sectional dimensions, 79
combined with interleaved time series, 82 ID variables for, 79 represented with BY groups, 79 transposing time series, 121 cross-sectional dimensions and BY groups, 79 cross-spectral analysis coherency, 1553 cospectrum estimate, 1553 cross-periodogram, 1542, 1553 cross-spectrum, 1553 quadrature spectrum, 1553 SPECTRA procedure, 1542, 1553 cross-spectrum cross-spectral analysis, 1553 SPECTRA procedure, 1553 crossproducts estimator of the covariance matrix, 1021 crossproducts matrix, 1043 crosstabulations, see FREQ procedure CRSP and SAS Dates SASECRSP engine, 2208 CRSP annual data DATASOURCE procedure, 605 CRSP calendar/indices files DATASOURCE procedure, 601 CRSP daily binary files DATASOURCE procedure, 600 CRSP daily character files DATASOURCE procedure, 600 CRSP daily IBM binary files DATASOURCE procedure, 600 CRSP daily security files DATASOURCE procedure, 602 CRSP data files, see DATASOURCE procedure CRSP Date Formats SASECRSP engine, 2208 CRSP Date Functions SASECRSP engine, 2208 CRSP Date Informats SASECRSP engine, 2209 CRSP Integer Date Format SASECRSP engine, 2208 CRSP monthly binary files DATASOURCE procedure, 600 CRSP monthly character files DATASOURCE procedure, 600 CRSP monthly IBM binary files DATASOURCE procedure, 600 CRSP monthly security files DATASOURCE procedure, 603 CRSP stock files DATASOURCE procedure, 600 CRSPAccess Database
2800 F Subject Index
DATASOURCE procedure, 600 CRSPDB_SASCAL environment variable SASECRSP engine, 2194 CRSPDCI Date Functions SASECRSP engine, 2210 CRSPDCS Date Functions SASECRSP engine, 2210 CRSPDI2S Date Function SASECRSP engine, 2210 CRSPDIC Date Functions SASECRSP engine, 2210 CRSPDS2I Date Function SASECRSP engine, 2210 CRSPDSC Date Functions SASECRSP engine, 2210 CRSPDT Date Formats SASECRSP engine, 2208 cubic trend curves, 2684 cubic trend, 2684 cumulative statistics operators, 747 Currency Conversion, 2756 custom model specification, 2569 custom models forecasting models, 2472 specifying, 2472 CUSUM statistics, 355, 368 Da Silva method PANEL procedure, 1302 damped-trend exponential smoothing, 2675 smoothing models, 2675 data frequency, see time intervals data periodicity FORECAST procedure, 796 data requirements ARIMA procedure, 221 FORECAST procedure, 806 X11 procedure, 2064 data set, 2405 concatenated, 2410 forecast data set, 2406 forms of, 2406 interleaved, 2408 simple, 2407 data set selection, 2391, 2573 DATA step, 49 SAS data sets, 49 DATASETS procedure, 49 DATASOURCE procedure attributes, 520 attributes of variables, 545 auxiliary data sets, 520 balance of payment statistics data files, 520
BEA data files, 520 BEA national income and product accounts PC Format, 590 BEA S-pages, 520 BLS consumer price index surveys, 591 BLS data files, 520 BLS national employment, hours, and earnings survey, 592 BLS producer price index survey, 591 BLS state and area employment, hours, and earnings survey, 593 BOPS data file, 609 Bureau of Economic Analysis data files, 520 Bureau of Labor Statistics data files, 520 Center for Research in Security Prices data files, 520 CITIBASE format, 523 CITIBASE old format, 594 CITIBASE PC format, 595 COMPUSTAT data files, 520, 595 COMPUSTAT IBM 360/370 general format 48 quarter files, 597 COMPUSTAT IBM 360/370 general format annual files, 596 COMPUSTAT universal character format 48 quarter files, 599 COMPUSTAT universal character format annual files, 598 Consumer Price Index Surveys, 520 cross sections, 527, 529, 532, 543 CRSP annual data, 605 CRSP calendar/indices files, 601 CRSP daily binary files, 600 CRSP daily character files, 600 CRSP daily IBM binary files, 600 CRSP daily security files, 602 CRSP data files, 520 CRSP monthly binary files, 600 CRSP monthly character files, 600 CRSP monthly IBM binary files, 600 CRSP monthly security files, 603 CRSP stock files, 600 CRSPAccess Database, 600 direction of trade statistics data files, 520 DOTS data file, 609 DRI Data Delivery Service data files, 520 DRI data files, 520, 593 DRI/McGraw-Hill data files, 520, 593 DRIBASIC data files, 594 DRIBASIC economics format, 523 DRIDDS data files, 594 employment, hours, and earnings survey, 520 event variables, 542, 543, 549
Subject Index F 2801
FAME data files, 520 FAME Information Services Databases, 520, 605 formatting variables, 545 frequency of data, 524 frequency of input data, 539 generic variables, 550 GFS data files, 610 Global Insight data files, 520, 593, 594 Global Insight DRI data files, 593 government finance statistics data files, 520 Haver Analytics data files, 607 ID variable, 549 IMF balance of payment statistics, 609 IMF data files, 520 IMF direction of trade statistics, 609 IMF Economic Information System data files, 608 IMF government finance statistics, 610 IMF International Financial Statistics, 527 IMF international financial statistics, 608 indexing the OUT= data set, 538, 581 input file, 538, 539 international financial statistics data files, 520 International Monetary Fund data files, 520, 608 labeling variables, 546 lengths of variables, 534, 546 main economic indicators (OECD) data files, 520 national accounts data files (OECD), 520 national income and product accounts, 520, 590 NIPA Tables, 590 obtaining descriptive information, 525, 529–531, 550–553 OECD ANA data files, 610 OECD annual national accounts, 610 OECD data files, 520 OECD main economic indicators, 612 OECD MEI data files, 612 OECD QNA data files, 611 OECD quarterly national accounts, 611 Organization for Economic Cooperation and Development data files, 520, 610 OUTALL= data set, 530 OUTBY= data set, 529 OUTCONT= data set, 525, 531 output data sets, 523, 548, 550–553 Producer Price Index Survey, 520 reading data files, 523 renaming variables, 532, 547 SAS YEARCUTOFF= option, 544
state and area employment, hours, and earnings survey, 520 stock data files, 520 subsetting data files, 523, 536 time range, 544 time range of data, 526 time series variables, 524, 549 type of input data file, 538 U.S. Bureau of Economic Analysis data files, 590 U.S. Bureau of Labor Statistics data files, 591 variable list, 547 DATE ID variables, 72 date values, 2389 calendar functions and, 96 computing datetime values from, 97 computing from datetime values, 97 difference between dates, 101 formats, 70, 138 formats for, 70 functions, 143 incrementing by intervals, 99 informats, 69, 136 informats for, 69 INTNX function and, 99 normalizing to intervals, 101 SAS representation for, 68 syntax for, 68 time intervals, 129 time intervals and, 100 DATE variable, 72 dates alignment of, 2582 DATETIME ID variables, 72 datetime values computing calendar variables from, 98 computing from calendar variables, 97 computing from time variables, 97 computing time variables from, 98 formats, 70, 142 formats for, 70 functions, 143 informats, 69, 136 informats for, 69 SAS representation for, 69 syntax for, 69 time intervals, 129 DATETIME variable, 72 dating variables, 2446 decomposition of prediction error covariance VARMAX procedure, 1897, 1933
2802 F Subject Index
default time ranges, 2575 defined INTCK function, 99 interleaved time series, 80 INTNX function, 98 omitted observations, 78 time values, 69 definition S matrix, 1008 time series, 2380 degrees of freedom correction, 1026 denominator factors transfer function model, 218 dependency list MODEL procedure, 1174 Depreciation, 2719 derivatives MODEL procedure, 1157 DERT. variable, 1068 descriptive statistics, see UNIVARIATE procedure design matrix ARIMAX models and, 217 details generalized method of moments, 1011 developing forecasting models, 2417, 2576 developing forecasting models, 2417, 2576 DFPVALUE macro Dickey-Fuller test, 153 SAS macros, 153 DFTEST macro Dickey-Fuller test, 154 output data sets, 155 SAS macros, 154 seasonality, testing for, 154 stationarity, testing for, 154 diagnostic tests, 2453, 2687 time series, 2453 diagnostics and debugging MODEL procedure, 1168 Dickey-Fuller test, 2688 DFPVALUE macro, 153 DFTEST macro, 154 PROBDF Function, 158 significance probabilities, 158 significance probabilities for, 153 unit root, 158 VARMAX procedure, 1901 Dickey-Fuller tests, 231 DIF function alternatives to, 108 explained, 106 higher order differences, 109, 110
introduced, 105 MODEL procedure version, 109 multiperiod lags and, 109 percent change calculations and, 110–112 pitfalls of, 107 second difference, 109 DIF function and differencing, 105–107 difference between dates date values, 101 differences with X11ARIMA/88 X11 procedure, 2058 Differencing, 2584 differencing ARIMA procedure, 210, 247, 254 DIF function and, 105–107 higher order, 109, 110 MODEL procedure and, 109 multiperiod differences, 109 percent change calculations and, 110–112 RETAIN statement and, 108 second difference, 109 STATESPACE procedure, 1587 testing order of, 154 time series data, 105–112 VARMAX procedure, 1894 different forms of output data sets, 82 differential algebraic equations ordinary differential equations (ODEs), 1148 differential equations See ordinary differential equations, 1070 direction of trade statistics data files, see DATASOURCE procedure discussed EXPAND procedure, 124 distributed lag regression models PDLREG procedure, 1349 distribution of time series, 724 distribution of time series data, 724 distribution of time series EXPAND procedure, 724 DOT as a GLUE character SASEFAME engine, 2294 DOTS data file DATASOURCE procedure, 609 double exponential smoothing, see exponential smoothing, 2673 Brown smoothing model, 2673 smoothing models, 2673
DRI Data Delivery Service data files, see DATASOURCE procedure DRI data files, see DATASOURCE procedure DATASOURCE procedure, 593 DRI data files in FAME.db, see SASEFAME engine DRI/McGraw-Hill data files, see DATASOURCE procedure DATASOURCE procedure, 593 DRI/McGraw-Hill data files in FAME.db, see SASEFAME engine DRIBASIC data files DATASOURCE procedure, 594 DRIBASIC economics format DATASOURCE procedure, 523 DRIDDS data files DATASOURCE procedure, 594 DROP in the DATA step SASEFAME engine, 2304 dual quasi-Newton method AUTOREG procedure, 367 Durbin h test AUTOREG procedure, 326 Durbin t test AUTOREG procedure, 326 Durbin-Watson MODEL procedure, 1025 Durbin-Watson test autocorrelation tests, 343 AUTOREG procedure, 324 for first-order autocorrelation, 324 for higher-order autocorrelation, 324 p-values for, 324 Durbin-Watson tests, 343 linearized form, 349 dynamic models SIMLIN procedure, 1512, 1513, 1519, 1534 dynamic multipliers SIMLIN procedure, 1519, 1520 dynamic regression, 190, 213, 2585, 2586 specifying, 2523 dynamic regressors forecasting models, 2523 dynamic simulation, 1068 MODEL procedure, 1068, 1117 SIMLIN procedure, 1513 dynamic simultaneous equation models VARMAX procedure, 1916 econometrics features in SAS/ETS software, 23 editing selection list forecasting models, 2478 EGARCH model
AUTOREG procedure, 335 EGLS method AUTOREG procedure, 361 embedded in time series missing values, 78 embedded missing values, 78 embedded missing values in time series data, 78 Empirical Distribution Estimation MODEL procedure, 1023 employment, hours, and earnings survey, see DATASOURCE procedure ending dates of time intervals, 101 endogenous variables SYSLIN procedure, 1616 endpoint restrictions for polynomial distributed lags, 1350, 1356 Enterprise Guide, 58 Enterprise Miner—Time Series nodes, 59 ENTROPY procedure input data sets, 663 missing values, 662 ODS graph names, 666 output data sets, 664 output table names, 665 Environment variable, CRSPDB_SASCAL SASECRSP engine, 2194 EQ. variables, 1059, 1155 equality restriction linear models, 648 nonlinear models, 999, 1076 equation translations MODEL procedure, 1155 equation variables MODEL procedure, 1152 Error model options, 2587 error sum of squares statistics of fit, 2689 ERROR. variables, 1155 errors across equations contemporaneous correlation of, 1649 ESACF (Extended Sample Autocorrelation Function method), 241 ESM procedure BY groups, 689 ODS graph names, 705 EST= data set SIMLIN procedure, 1521 ESTIMATE statement, 981 estimation convergence problems MODEL procedure, 1038 estimation methods AUTOREG procedure, 357
MODEL procedure, 1007 estimation of ordinary differential equations, 1070 MODEL procedure, 1070 evaluation range, 2649 event variables DATASOURCE procedure, 542, 543, 549 example Cauchy distribution estimation, 1219 generalized method of moments, 1054, 1105, 1108, 1109, 1111 Goldfeld Quandt Switching Regression Model, 1221 Mixture of Distributions, 1225 Multivariate Mixture of Distributions, 1225 ordinary differential equations (ODEs), 1215 The D-method, 1221 example of Bayesian VAR modeling VARMAX procedure, 1866 example of Bayesian VECM modeling VARMAX procedure, 1873 example of causality testing VARMAX procedure, 1881 example of cointegration testing VARMAX procedure, 1869 example of multivariate GARCH modeling VARMAX procedure, 1983 example of restricted parameter estimation and testing VARMAX procedure, 1878 example of VAR modeling VARMAX procedure, 1859 example of VARMA modeling VARMAX procedure, 1952 example of vector autoregressive modeling with exogenous variables VARMAX procedure, 1874 example of vector error correction modeling VARMAX procedure, 1868 example, COUNTREG, 509 examples Cauchy distribution estimation, 1219 Monte Carlo simulation, 1218 Simulating from a Mixture of Distributions, 1225 Switching Regression example, 1221 systems of differential equations, 1215 examples of time intervals, 135 exogenous variables SYSLIN procedure, 1616 EXPAND procedure AGGREGATE method, 740
aggregation of time series, 721, 724 BY groups, 731 changing periodicity, 125 conversion methods, 739 discussed, 124 distribution of time series, 724 extrapolation, 736 frequency, 721 ID variables, 733, 735 interpolation methods, 739 interpolation of missing values, 124 JOIN method, 740 ODS graph names, 758 output data sets, 756 range of output observations, 736 SPLINE method, 739 STEP method, 740 time intervals, 735 transformation of time series, 726, 742 transformation operations, 742 EXPAND procedure and interpolation, 124 time intervals, 124 experimental design, 56 explained DIF function, 106 LAG function, 106 explosive differential equations, 1147 ordinary differential equations (ODEs), 1147 exponential trend curves, 2685 exponential smoothing, see smoothing models double exponential smoothing, 798 FORECAST procedure, 774, 798 single exponential smoothing, 798 triple exponential smoothing, 798 exponential trend, 2685 Extended Sample Autocorrelation Function (ESACF) method, 241 external forecasts, 2666 external forecasts, 2666 external sources forecasting models, 2485, 2588 extrapolation EXPAND procedure, 736 Factored ARIMA, 2555, 2584, 2622 Factored ARIMA model specification, 2589 Factored ARIMA models forecasting models, 2468 specifying, 2468 factored model
ARIMA model, 212 ARIMA procedure, 212 AUTOREG procedure, 329 FAME data files, see DATASOURCE procedure, see SASEFAME engine FAME glue symbol named DOT SASEFAME engine, 2299 FAME Information Services Databases, see DATASOURCE procedure, see SASEFAME engine DATASOURCE procedure, 605 fast Fourier transform SPECTRA procedure, 1549 fatal error when reading from a FAME data base SASEFAME engine, 2290 FCMP procedure, 49 SAS functions, 49 features in SAS/ETS software econometrics, 23 FIML estimation method, see full information maximum likelihood Financial Functions PROBDF Function, 158 financial functions, 51 finishing the FAME CHLI SASEFAME engine, 2290 finite Fourier transform SPECTRA procedure, 1542 finite memory forecasts ARIMA procedure, 258 first-stage R squares, 1087 fitting forecasting models, 2420 fitting forecasting models, 2420 fixed effects model one-way, 1283 two-way, 1285 fixed rate mortgage, see LOAN procedure LOAN procedure, 828 flows contrasted with stock variables, 724 for first-order autocorrelation Durbin-Watson test, 324 for higher-order autocorrelation Durbin-Watson test, 324 for interleaved time series ID variables, 80 for multiple selections control key, 2398 for nonlinear models instrumental variables, 1084 for selection of state space models canonical correlation analysis, 1570, 1593 for time series data
ID variables, 67 forecast combination, 2591, 2666 FORECAST command, 2546 forecast data set, see output data set forecast horizon, 2575, 2649 forecast options, 2595 FORECAST procedure ADDWINTERS method, 803 automatic forecasting, 774 autoregressive models, 797 BY groups, 795 confidence limits, 807 data periodicity, 796 data requirements, 806 exponential smoothing, 774, 798 forecasting, 774 Holt two-parameter exponential smoothing, 774, 803 ID variables, 795 missing values, 795 output data sets, 806, 808 predicted values, 807 residuals, 807 seasonal forecasting, 799, 803 seasonality, 805 smoothing weights, 803 STEPAR method, 797 stepwise autoregression, 774, 797 time intervals, 796 time series methods, 786 time trend models, 784 Winters method, 774, 799 FORECAST procedure and interleaved time series, 80, 81 Forecast Studio, 51 forecasting, 2664 ARIMA procedure, 257, 259 FORECAST procedure, 774 MODEL procedure, 1120 STATESPACE procedure, 1568, 1597 VARMAX procedure, 1930 Forecasting menusystem, 45 forecasting models adjustments, 2522 ARIMA models, 2465 automatic generation, 2396 automatic selection, 2459 changes in trend, 2530 combination models, 2482 comparing, 2505, 2605 custom models, 2472 developing, 2417, 2576 dynamic regressors, 2523 editing selection list, 2478
external sources, 2485, 2588 Factored ARIMA models, 2468 fitting, 2420 interventions, 2527 level shifts, 2532 linear trend, 2513 predictor variables, 2511 reference, 2508 regressors, 2518 seasonal dummy variables, 2539 selecting from a list, 2457 smoothing models, 2462, 2669 sorting, 2504, 2581 specifying, 2453 transfer functions, 2682 trend curves, 2515 forecasting of Bayesian vector autoregressive models VARMAX procedure, 1948 forecasting process, 2389 forecasting project, 2410 managing, 2599 Project Management window, 2411 saving and restoring, 2412 sharing, 2416 forecasts, 2437 confidence limits, 2436 external, 2666 plotting, 2436 producing, 2404, 2623 form of state space models, 1568 formats date values, 70, 138 datetime values, 70, 142 recommended for time series ID, 71 time values, 142 formats for date values, 70 datetime values, 70 formatting variables DATASOURCE procedure, 545 forms of data set, 2406 Fourier coefficients SPECTRA procedure, 1553 Fourier transform SPECTRA procedure, 1542 fractional operators, 751 FREQ procedure, 49 crosstabulations, 49 frequency changing by interpolation, 125, 721, 733 EXPAND procedure, 721
of time series observations, 84, 125 SPECTRA procedure, 1552 time intervals and, 84, 125 frequency of data, see time intervals DATASOURCE procedure, 524 frequency of input data DATASOURCE procedure, 539 frequency option SASEHAVR engine, 2340 from interleaved form transposing time series, 119 from standard form transposing time series, 122 full information maximum likelihood FIML estimation method, 1614 MODEL procedure, 1019 SYSLIN procedure, 1624, 1649 Fuller Battese variance components, 1292 Fuller’s modification to LIML SYSLIN procedure, 1654 functions, 50 date values, 143 datetime values, 143 lag functions, 1160 mathematical functions, 1159 random-number functions, 1159 time intervals, 143 time values, 143 functions across time MODEL procedure, 1160 functions for calendar calculations, 95, 143 time intervals, 98, 143 functions of parameters nonlinear models, 981 G4 inverse, 985 GARCH in mean model, see GARCH-M model GARCH model AUTOREG procedure, 314 conditional t distribution, 367 covariance estimates, 343 generalized autoregressive conditional heteroscedasticity, 314 heteroscedasticity models, 350 initial values, 348 starting values, 342 t distribution, 367 GARCH-M model, 366 AUTOREG procedure, 335 GARCH in mean model, 366 Gauss-Marquardt method ARIMA procedure, 250
AUTOREG procedure, 360 Gauss-Newton method, 1027 Gaussian distribution MODEL procedure, 980 General Form Equations Jacobi method, 1142 Seidel method, 1142 generalized autoregressive conditional heteroscedasticity, see GARCH model generalized Durbin-Watson tests AUTOREG procedure, 324 generalized least squares PANEL procedure, 1300 generalized least squares estimator of the covariance matrix, 1021 generalized least-squares Yule-Walker method as, 361 Generalized Method of Moments V matrix, 1012, 1017 generalized method of moments details, 1011 example, 1054, 1105, 1108, 1109, 1111 generating models, 2561 Generic Cashflow, 2726 generic variables DATASOURCE procedure, 550 GFS data files DATASOURCE procedure, 610 giving dates to time series data, 67 Global Insight data files DATASOURCE procedure, 593, 594 Global Insight DRI data files, see DATASOURCE procedure DATASOURCE procedure, 593 global statements, 50 GLUE symbol SASEFAME engine, 2294 GMM simulated method of moments, 1016 SMM, 1016 GMM in Panel: Arellano and Bond’s Estimator Panel GMM, 1304 goal seeking MODEL procedure, 1138 goal seeking problems, 1074 Godfrey Lagrange test autocorrelation tests, 1058 Godfrey’s test, 344 autocorrelation tests, 344 Goldfeld Quandt Switching Regression Model example, 1221 goodness of fit, see statistics of fit
goodness-of-fit statistics, see statistics of fit, 2643, see statistics of fit government finance statistics data files, see DATASOURCE procedure gradient of the objective function, 1042, 1043 Granger causality test VARMAX procedure, 1944 graphics SAS/GRAPH software, 52 graphs, see Model Viewer, see Time Series Viewer grid search MODEL procedure, 1036 Hausman specification test, 1079 MODEL procedure, 1079 Haver Analytics data files DATASOURCE procedure, 607 Haver data files, see SASEHAVR engine Haver Information Services Databases, see SASEHAVR engine HCCME 2SLS, 1057 HCCME 3SLS, 1057 HCCME = hccme=0, 1313 PANEL procedure, 1313 HCCME OLS, 1055 HCCME SUR, 1057 hccme=0 HCCME =, 1313 help system, 23 Henze-Zirkler test, 1048 normality tests, 1048 heteroscedastic errors, 1011 heteroscedastic extreme value model MDC procedure, 881, 906 heteroscedasticity, 947, 1050 AUTOREG procedure, 329 Lagrange multiplier test, 351 testing for, 329 Heteroscedasticity Corrected Covariance Matrices, 1313 heteroscedasticity models, see GARCH model constrained estimation, 351 covariates, 350, 1390 link function, 350 heteroscedasticity tests Breusch-Pagan test, 1050 Lagrange multiplier test, 351 White’s test, 1050 Heteroscedasticity-Consistent Covariance Matrix Estimation , 1055 higher order differencing, 109, 110
higher order differences DIF function, 109, 110 higher order sums summation, 114 Hildreth-Lu AR initial conditions, 1091 Hildreth-Lu method AUTOREG procedure, 362 histograms, see CHART procedure hold-out sample, 2575 hold-out samples, 2508 Holt smoothing model, see linear exponential smoothing Holt two-parameter exponential smoothing FORECAST procedure, 774, 803 Holt-Winters Method, see Winters Method Holt-Winters method, see Winters method homoscedastic errors, 1050 HTML creating from Model Viewer, 2618 creating from Time Series Viewer, 2657 hyperbolic trend curves, 2685 hyperbolic trend, 2685 ID groups MDC procedure, 891 ID values for time intervals, 100 ID variable, see time ID variable DATASOURCE procedure, 549 ID variable for time series data, 67 ID variables, 2395 ARIMA procedure, 259 DATE, 72 DATETIME, 72 EXPAND procedure, 733, 735 for interleaved time series, 80 for time series data, 67 FORECAST procedure, 795 PANEL procedure, 1273 SIMLIN procedure, 1517 sorting by, 72 STATESPACE procedure, 1586 TSCSREG procedure, 1735 X11 procedure, 2046, 2048 X12 procedure, 2111 ID variables for cross-sectional dimensions, 79 interleaved time series, 80 time series cross-sectional form, 79 IGARCH model AUTOREG procedure, 335
IMF balance of payment statistics DATASOURCE procedure, 609 IMF data files, see DATASOURCE procedure IMF direction of trade statistics DATASOURCE procedure, 609 IMF Economic Information System data files DATASOURCE procedure, 608 IMF government finance statistics DATASOURCE procedure, 610 IMF International Financial Statistics DATASOURCE procedure, 527 IMF international financial statistics DATASOURCE procedure, 608 IML, see SAS/IML software impact multipliers SIMLIN procedure, 1519, 1524 impulse function intervention model and, 216 impulse response function VARMAX procedure, 1898, 1919 impulse response matrix of a state space model, 1600 in SAS data sets time series, 2380 in standard form output data sets, 83 incrementing by intervals date values, 99 incrementing dates INTNX function, 99 incrementing dates by time intervals, 98, 99 independent variables, see predictor variables indexing OUT= data set, 549 indexing the OUT= data set DATASOURCE procedure, 538, 581 inequality restriction linear models, 648 nonlinear models, 975, 999, 1076 infinite memory forecasts ARIMA procedure, 257 infinite order AR representation VARMAX procedure, 1898 infinite order MA representation VARMAX procedure, 1898, 1919 informats date values, 69, 136 datetime values, 69, 136 time values, 136 informats for date values, 69 datetime values, 69 initial values, 348, 896
GARCH model, 348 initializations smoothing models, 2670 initializing lags MODEL procedure, 1163 SIMLIN procedure, 1522 innovation vector of a state space model, 1569 input block COMPUTAB procedure, 455 input data set, 2391, 2573 input data sets ENTROPY procedure, 663 MODEL procedure, 1104 input file DATASOURCE procedure, 538, 539 input matrix of a state space model, 1569 input series ARIMA procedure, 213 INPUT variables X12 procedure, 2113 inputs, see predictor variables installment loans, see LOAN procedure instrumental regression, 1010 instrumental variables, 1010 choice of, 1084 for nonlinear models, 1084 number to use, 1085 SYSLIN procedure, 1616 instruments, 1008 INTCK function calendar calculations and, 104 counting time intervals, 102 defined, 99 INTCK function and time intervals, 99, 102 interaction effects ARIMA procedure, 217 interest rates LOAN procedure, 851 interim multipliers SIMLIN procedure, 1515, 1520, 1523, 1524 interleaved data set, 2408 interleaved form output data sets, 82 interleaved form of time series data set, 80 interleaved time series and _TYPE_ variable, 80, 81 combined with cross-sectional dimension, 82 defined, 80
FORECAST procedure and, 80, 81 ID variables for, 80 plots of, 90 Internal Rate of Return, 2768 internal rate of return LOAN procedure, 852 internal variables MODEL procedure, 1153 international financial statistics data files, see DATASOURCE procedure International Monetary Fund data files, see DATASOURCE procedure DATASOURCE procedure, 608 interpolation between levels and rates, 125 between stocks and flows, 125 EXPAND procedure and, 124 of missing values, 124, 723 time series data, 125 to higher frequency, 125 to lower frequency, 125 interpolation methods EXPAND procedure, 739 interpolation of missing values, 124 time series data, 124, 125, 723 interpolation of missing values EXPAND procedure, 124 interpolation of time series step function, 740 interrupted time series analysis, see intervention model interrupted time series model, see intervention model interval functions, see time intervals, functions interval functions and calendar calculations, 104 INTERVAL= option and time intervals, 84 intervals, see time intervals, 2395 intervention analysis, see intervention model intervention model ARIMA procedure, 213, 216, 219, 298 interrupted time series analysis, 216 interrupted time series model, 213 intervention analysis, 216 intervention model and impulse function, 216 step function, 217 intervention notation, 2686 intervention specification, 2595, 2597 interventions, 2685 automatic inclusion of, 2582 forecasting models, 2527
point, 2685 predictor variables, 2685 ramp, 2686 specifying, 2527 step, 2685 INTNX function calendar calculations and, 104 checking data periodicity, 103 computing ceiling of intervals, 102 computing ending date of intervals, 101 computing midpoint date of intervals, 101 computing widths of intervals, 101 defined, 98 incrementing dates, 99 normalizing dates in intervals, 101 INTNX function and date values, 99 time intervals, 98 introduced DIF function, 105 LAG function, 105 percent change calculations, 110 time variables, 95 inverse autocorrelation function ARIMA procedure, 239 invertibility ARIMA procedure, 255 VARMAX procedure, 1949 Investment Analysis System, 46 Investment Portfolio, 2706 invoking the system, 2384 IRoR, 2768 irregular component X11 procedure, 2034, 2040 iterated generalized method of moments, 1015 iterated seemingly unrelated regression SYSLIN procedure, 1649 iterated three-stage least squares SYSLIN procedure, 1649 Iterative Outlier Detection ARIMA procedure, 307 Jacobi method MODEL procedure, 1141 Jacobi method with General Form Equations MODEL procedure, 1142 Jacobian, 1009, 1027 Jarque-Bera test, 344 normality tests, 344 JMP, 57 JOIN method EXPAND procedure, 740 joint generalized least squares, see seemingly unrelated regression
jointly dependent variables SYSLIN procedure, 1616 K-class estimation SYSLIN procedure, 1648 Kalman filter AUTOREG procedure, 360 STATESPACE procedure, 1570 used for state space modeling, 1570 KEEP in the DATA step SASEFAME engine, 2304 kernels, 1013, 1549 SPECTRA procedure, 1549 Kolmogorov-Smirnov test, 1048 normality tests, 1048 KPSS (Kwiatkowski, Phillips, Schmidt, Shin) test, 346 KPSS test, 346, 378 unit roots, 378 Kruskal-Wallis test, 2076 labeling variables DATASOURCE procedure, 546 LAG function alternatives to, 108 explained, 106 introduced, 105 MODEL procedure version, 109 multiperiod lags and, 109 percent change calculations and, 110–112 pitfalls of, 107 LAG function and Lags, 106 lags, 105, 107 lag functions functions, 1160 MODEL procedure, 1160 lag lengths MODEL procedure, 1162 lag logic MODEL procedure, 1161 lagged dependent variables and tests for autocorrelation, 326 AUTOREG procedure, 326 lagged endogenous variables SYSLIN procedure, 1616 lagging time series data, 105–112 Lagrange multiplier test heteroscedasticity, 351 heteroscedasticity tests, 351 linear hypotheses, 650 nonlinear hypotheses, 1006, 1078, 1410 Lags
LAG function and, 106 lags LAG function and, 105, 107 MODEL procedure and, 109 multiperiod lagging, 109 percent change calculations and, 110–112 RETAIN statement and, 108 SIMLIN procedure, 1522 lambda, 1028 language differences MODEL procedure, 1164 large problems MODEL procedure, 1045 leads calculation of, 112 multiperiod, 112 time series data, 112 left-hand side expressions nonlinear models, 1152 lengths of variables DATASOURCE procedure, 534, 546 level shifts forecasting models, 2532 specifying, 2532 levels contrasted with flows or rates, 724 LIBNAME libref SASEHAVR ’physical name’ on UNIX SASEFAME engine, 2299 LIBNAME libref SASEHAVR ’physical name’ on Windows SASEFAME engine, 2299 LIBNAME interface engine for FAME database, see SASEFAME engine LIBNAME interface engine for Haver database, see SASEHAVR engine LIBNAME statement SASECRSP engine, 2190 SASEFAME engine, 2290 SASEHAVR engine, 2340 likelihood confidence intervals, 1082 MODEL procedure, 1082 likelihood ratio test linear hypotheses, 650 nonlinear hypotheses, 1006, 1078 limitations on ordinary differential equations (ODEs), 1147 limitations on ordinary differential equations MODEL procedure, 1147 Limited Dependent Variable Models QLIM procedure, 1399 limited information maximum likelihood LIML estimation method, 1614
SYSLIN procedure, 1648 LIML estimation method, see limited information maximum likelihood linear trend curves, 2684 linear dependencies MODEL procedure, 1041 linear exponential smoothing, 2674 Holt smoothing model, 2674 smoothing models, 2674 linear hypotheses Lagrange multiplier test, 650 likelihood ratio test, 650 Wald test, 650 linear hypothesis testing, 1312 PANEL procedure, 1312 linear models equality restriction, 648 inequality restriction, 648 restricted estimation, 648 linear structural equations SIMLIN procedure, 1519 linear trend, 2513, 2684 forecasting models, 2513 linearized form Durbin-Watson tests, 349 link function heteroscedasticity models, 350 Loan, 2711 LOAN procedure adjustable rate mortgage, 827, 828 amortization schedule, 856 balloon payment mortgage, 827, 828 break even analysis, 852 buydown rate loans, 827, 828 comparing loans, 835, 852, 856 continuous compounding, 850 fixed rate mortgage, 827, 828 installment loans, 827 interest rates, 851 internal rate of return, 852 loan repayment schedule, 856 loan summary table, 856 loans analysis, 827 minimum attractive rate of return, 852 mortgage loans, 827 output data sets, 853, 854 output table names, 856 present worth of cost, 852 rate adjustment cases, 846 taxes, 852 true interest rate, 852 types of loans, 828 loan repayment schedule
LOAN procedure, 856 loan summary table LOAN procedure, 856 loans analysis, see LOAN procedure log transformations, 2667 log likelihood value, 344 log test, 2688 log transformation, see transformations log transformations ARIMA procedure, 258 LOGTEST macro, 156 logarithmic trend curves, 2685 logarithmic trend, 2685 logistic transformations, 2667 trend curves, 2684 logistic trend, 2684 logit QLIM Procedure, 1376 LOGTEST macro log transformations, 156 output data sets, 157 SAS macros, 156 long-run relations testing VARMAX procedure, 1972 %MA and %AR macros combined, 1099 MA Initial Conditions conditional least squares, 1092 maximum likelihood, 1092 unconditional least squares, 1092 macros, see SAS macros MAE AUTOREG procedure, 368 main economic indicators (OECD) data files, see DATASOURCE procedure main economic indicators (OECD) data files in FAME.db, see SASEFAME engine managing forecasting project, 2599 managing forecasting projects, 2599 MAPE AUTOREG procedure, 368 Mardia’s test, 1048 normality tests, 1048 Marquardt method ARIMA procedure, 250 Marquardt-Levenberg method, 1028 MARR, see minimum attractive rate of return, 2789 mathematical functions, 51 functions, 1159
matrix language SAS/IML software, 54 maximizing likelihood functions, 56 maximum likelihood AR initial conditions, 1091 MA Initial Conditions, 1092 maximum likelihood method AUTOREG procedure, 362 MDC procedure binary data modeling example, 918 binary logit example, 918, 921 binary probit example, 918 bounds on parameter estimates, 890 BY groups, 891 conditional logit example, 921 conditional logit model, 870, 871, 904 goodness-of-fit measures, 915 Hausman’s specification and likelihood ratio tests for nested logit, 918 heteroscedastic extreme value model, 881, 906 ID groups, 891 introductory examples, 871 mixed logit model, 886, 907 multinomial discrete choice, 903 multinomial probit example, 924 multinomial probit model, 880, 909 nested logit example, 931 nested logit model, 876, 910 output table names, 917 restrictions on parameter estimates, 901 syntax, 887 mean absolute error statistics of fit, 2689 mean absolute percent error statistics of fit, 1819, 2689 mean percent error statistics of fit, 2690 mean prediction error statistics of fit, 2690 mean square error statistics of fit, 1819 mean squared error statistics of fit, 2689 MEANS procedure, 49 measurement equation observation equation, 1569 of a state space model, 1569 MELO estimation method, see minimum expected loss estimator memory requirements MODEL procedure, 1046 VARMAX procedure, 2002 menu interfaces
to SAS/ETS software, 45, 46 merging series time series data, 119 merging time series data sets, 119 Michaelis-Menten Equations, 1074 midpoint dates of time intervals, 101 MINIC (Minimum Information Criterion) method, 243 minimization methods MODEL procedure, 1027 minimization summary MODEL procedure, 1030 minimum attractive rate of return LOAN procedure, 852 MARR, 852 minimum expected loss estimator MELO estimation method, 1648 SYSLIN procedure, 1648 minimum information criteria method VARMAX procedure, 1940 Minimum Information Criterion (MINIC) method, 243 missing observations contrasted with omitted observations, 78 missing values, 748, 1106 COMPUTAB procedure, 457 contrasted with omitted observations, 78 embedded in time series, 78 ENTROPY procedure, 662 FORECAST procedure, 795 interpolation of, 124 MODEL procedure, 1025, 1143 smoothing models, 2670 time series data, 723 time series data and, 77 VARMAX procedure, 1912 missing values and time series data, 77, 78 MISSONLY operator, 748 mixed logit model MDC procedure, 886, 907 Mixture of Distributions example, 1225 MMAE, 2668 MMSE, 2668 model evaluation, 2662 Model Identification ARIMA procedure, 300 model list, 2425, 2606 MODEL procedure adjacency graph, 1178 adjusted R squared, 1027 Almon lag polynomials, 1102
analyzing models, 1174 ARMA model, 1088 autoregressive models, 1088 auxiliary equations, 1074 block structure, 1178 character variables, 1155 Chow tests, 1081 collinearity diagnostics, 1032, 1041 compiler listing, 1171 control variables, 1153 controlling starting values, 1034 convergence criteria, 1028 cross-equation covariance matrix, 1026 cross-reference, 1170 dependency list, 1174 derivatives, 1157 diagnostics and debugging, 1168 Durbin-Watson, 1025 dynamic simulation, 1068, 1117 Empirical Distribution Estimation, 1023 equation translations, 1155 equation variables, 1152 estimation convergence problems, 1038 estimation methods, 1007 estimation of ordinary differential equations, 1070 forecasting, 1120 full information maximum likelihood, 1019 functions across time, 1160 Gaussian distribution, 980 goal seeking, 1138 grid search, 1036 Hausman specification test, 1079 initializing lags, 1163 input data sets, 1104 internal variables, 1153 Jacobi method, 1141 Jacobi method with General Form Equations, 1142 lag functions, 1160 lag lengths, 1162 lag logic, 1161 language differences, 1164 large problems, 1045 likelihood confidence intervals, 1082 limitations on ordinary differential equations, 1147 linear dependencies, 1041 memory requirements, 1046 minimization methods, 1027 minimization summary, 1030 missing values, 1025, 1143 model variables, 1152 Monte Carlo simulation, 1218
Moore-Penrose generalized inverse, 985 moving average models, 1088 Multivariate t-Distribution Estimation, 1022 n-period-ahead forecasting, 1117 nested iterations, 1027 Newton’s Method, 1141 nonadditive errors, 1059 normal distribution, 980 ODS graph names, 1116 ordinary differential equations and goal seeking, 1074 output data sets, 1110 output table names, 1113 parameters, 1152 polynomial distributed lag models, 1102 program listing, 1169 program variables, 1155 properties of the estimates, 1025 quasi-random number generators, 1130 R squared, 1027, 1034 random-number generating functions, 1159 restrictions on parameters, 1098 S matrix, 1026 S-iterated methods, 1027 Seidel method, 1142 Seidel method with General Form Equations, 1142 SIMNLIN procedure, 945 simulated nonlinear least squares, 1019 simulation, 1120 solution mode output, 1132 solution modes, 1117, 1140 SOLVE Data Sets, 1148 starting values, 1031, 1038 static simulation, 1068 static simulations, 1117 stochastic simulation, 1120 storing programs, 1167 summary statistics, 1135 SYSNLIN procedure, 945 systems of ordinary differential equations, 1215 tests on parameters, 1078 time variable, 1074 troubleshooting estimation convergence problems, 1030 troubleshooting simulation problems, 1143 using models to forecast, 1120 using solution modes, 1117 variables in model program, 1151 _WEIGHT_ variable, 1052 MODEL procedure and differencing, 109 lags, 109
MODEL procedure version DIF function, 109 LAG function, 109 model selection, 2621 model selection criterion, 2502, 2610 model selection for X-11-ARIMA method X11 procedure, 2068 model selection list, 2611 model variables MODEL procedure, 1152 Model Viewer, 2427, 2615 graphs, 2419 plots, 2419 saving graphs and tables, 2628, 2630 Monte Carlo simulation, 1120, 1218 examples, 1218 MODEL procedure, 1218 Moore-Penrose generalized inverse, 985 mortgage loans, see LOAN procedure moving average function, 1160 moving average models, 1089 MODEL procedure, 1088 moving averages percent change calculations, 111, 112 moving between computer systems SAS data sets, 49 moving product operators, 752 moving rank operator, 751 moving seasonality test, 2076 moving t-value operators, 755 moving time window operators, 745 moving-average parameters ARIMA procedure, 256 multinomial discrete choice independence from irrelevant alternatives, 905 MDC procedure, 903 multinomial probit model MDC procedure, 880, 909 multiperiod leads, 112 multiperiod differences differencing, 109 multiperiod lagging lags, 109 multiperiod lags and DIF function, 109 LAG function, 109 summation, 114 multiple selections, 2398 multiplicative model ARIMA model, 212 multiplicative seasonal smoothing, 2677 smoothing models, 2677
multipliers SIMLIN procedure, 1515, 1516, 1519, 1520, 1523, 1524 multipliers for higher order lags SIMLIN procedure, 1520, 1534 multivariate autocorrelations, 1573 normality tests, 1048 partial autocorrelations, 1592 multivariate forecasting STATESPACE procedure, 1568 multivariate GARCH Modeling VARMAX procedure, 1907 Multivariate Mixture of Distributions example, 1225 multivariate model diagnostic checks VARMAX procedure, 1957 Multivariate t-Distribution Estimation MODEL procedure, 1022 multivariate time series STATESPACE procedure, 1568 n-period-ahead forecasting MODEL procedure, 1117 naming time intervals, 84, 130 naming model parameters ARIMA procedure, 256 national accounts data files (OECD), see DATASOURCE procedure national accounts data files (OECD) in FAME.db, see SASEFAME engine national income and product accounts, see DATASOURCE procedure DATASOURCE procedure, 590 negative log likelihood function, 1020 negative log-likelihood function, 1022 Nerlove variance components, 1294 nested iterations MODEL procedure, 1027 nested logit model MDC procedure, 876, 910 Newton’s Method MODEL procedure, 1141 Newton-Raphson optimization methods, 490, 896 Newton-Raphson method, 490, 896 NIPA Tables DATASOURCE procedure, 590 NLO Overview NLO system, 165 NLO system NLO Overview, 165
Options, 165 output table names, 183 remote monitoring, 181 NOMISS operator, 748 nonadditive errors MODEL procedure, 1059 nonlinear hypotheses Lagrange multiplier test, 1006, 1078, 1410 likelihood ratio test, 1006, 1078 Wald test, 1006, 1078, 1410 nonlinear least-squares AUTOREG procedure, 362 nonlinear models equality restriction, 999, 1076 functions of parameters, 981 inequality restriction, 975, 999, 1076 left-hand side expressions, 1152 restricted estimation, 975, 999, 1076 test of hypotheses, 1005 nonmissing observations statistics of fit, 2688 nonseasonal ARIMA model notation, 2680 nonseasonal transfer function notation, 2682 nonstationarity, see stationarity normal distribution MODEL procedure, 980 normality tests, 1048 Henze-Zirkler test, 1048 Jarque-Bera test, 344 Kolmogorov-Smirnov test, 1048 Mardia’s test, 1048 multivariate, 1048 Shapiro-Wilk test, 1048 normalizing dates in intervals INTNX function, 101 normalizing to intervals date values, 101 notation nonseasonal ARIMA model, 2680 nonseasonal transfer function, 2682 seasonal ARIMA model, 2681 seasonal transfer function, 2683 notation for ARIMA model, 207 ARMA model, 207 number of observations statistics of fit, 2688 number to use instrumental variables, 1085 numerator factors transfer function model, 218
OBJECT convergence measure, 1028 objective function, 1008 observation equation, see measurement equation observation numbers, 2644 as time ID, 2443 time ID variable, 2443 obtaining descriptive information DATASOURCE procedure, 525, 529–531, 550–553 ODS graph names ARIMA procedure, 276 AUTOREG procedure, 388 ENTROPY procedure, 666 ESM procedure, 705 EXPAND procedure, 758 MODEL procedure, 1116 SIMILARITY procedure, 1483 SYSLIN procedure, 1660 TIMESERIES procedure, 1714 UCM procedure, 1813 VARMAX procedure, 2000 ODS Graphics ARIMA procedure, 224 UCM procedure, 1754 OECD ANA data files DATASOURCE procedure, 610 OECD annual national accounts DATASOURCE procedure, 610 OECD data files, see DATASOURCE procedure OECD data files in FAME.db, see SASEFAME engine OECD main economic indicators DATASOURCE procedure, 612 OECD MEI data files DATASOURCE procedure, 612 OECD QNA data files DATASOURCE procedure, 611 OECD quarterly national accounts DATASOURCE procedure, 611 of a state space model impulse response matrix, 1600 innovation vector, 1569 input matrix, 1569 measurement equation, 1569 state transition equation, 1569 state vector, 1568 transition equation, 1569 transition matrix, 1569 of a time series unit root, 154 of interleaved time series overlay plots, 90 of missing values interpolation, 124, 723
of time series distribution, 724 overlay plots, 88 sampling frequency, 71, 84, 125 simulation, 2560, 2653 stationarity, 210 summation, 113 time ranges, 77 of time series data set standard form, 76 time series cross-sectional form, 79 of time series observations frequency, 84, 125 periodicity, 71, 84, 125 omitted observations contrasted with missing values, 78 defined, 78 replacing with missing values, 104 omitted observations in time series data, 78 one-way fixed effects model, 1283 random effects model, 1291 one-way fixed effects model PANEL procedure, 1283 one-way fixed-effects model, 1283 one-way random effects model PANEL procedure, 1291 one-way random-effects model, 1291 operations research SAS/OR software, 55 optimization methods Newton-Raphson, 490, 896 quasi-Newton, 350, 490, 896 trust region, 350, 490, 896 optimizations smoothing weights, 2671 Options NLO system, 165 options automatic model selection, 2568 order of calculations COMPUTAB procedure, 451 order statistics, see RANK procedure Ordinal Discrete Choice Modeling QLIM procedure, 1396 ordinary differential equations (ODEs) and goal seeking, 1074 differential algebraic equations, 1148 example, 1215 explosive differential equations, 1147 limitations on, 1147 systems of, 1215 ordinary differential equations and goal seeking
MODEL procedure, 1074 Organization for Economic Cooperation and Development data files, see DATASOURCE procedure DATASOURCE procedure, 610 Organization for Economic Cooperation and Development data files in FAME.db, see SASEFAME engine orthogonal polynomials PDLREG procedure, 1350 OUT= data set indexing, 549 OUTALL= data set DATASOURCE procedure, 530 OUTBY= data set DATASOURCE procedure, 529 OUTCONT= data set DATASOURCE procedure, 525, 531 Outlier Detection ARIMA procedure, 305 Output Data Sets VARMAX procedure, 1987 output data sets and the OUTPUT statement, 83 ARIMA procedure, 262–264, 267, 268 AUTOREG procedure, 384 BOXCOXAR macro, 151 COMPUTAB procedure, 457 DATASOURCE procedure, 523, 548, 550–553 DFTEST macro, 155 different forms of, 82 ENTROPY procedure, 664 EXPAND procedure, 756 FORECAST procedure, 806, 808 in standard form, 83 interleaved form, 82 LOAN procedure, 853, 854 LOGTEST macro, 157 MODEL procedure, 1110 PANEL procedure, 1320–1322 PDLREG procedure, 1363 produced by SAS/ETS procedures, 82 SIMLIN procedure, 1522, 1523 SPECTRA procedure, 1552 STATESPACE procedure, 1601, 1602 SYSLIN procedure, 1655, 1656 X11 procedure, 2071, 2072 Output Delivery System (ODS), 2618, 2657 OUTPUT statement SAS/ETS procedures using, 83 output table names ARIMA procedure, 272 AUTOREG procedure, 386
COUNTREG procedure, 508 ENTROPY procedure, 665 LOAN procedure, 856 MDC procedure, 917 MODEL procedure, 1113 NLO system, 183 PANEL procedure, 1323 PDLREG procedure, 1364 QLIM procedure, 1415 SIMLIN procedure, 1525 SPECTRA procedure, 1554 STATESPACE procedure, 1605 SYSLIN procedure, 1659 TSCSREG procedure, 1738 X11 procedure, 2085 over identification restrictions SYSLIN procedure, 1654 overlay plot of time series data, 88 overlay plots of interleaved time series, 90 of time series, 88 _TYPE_ variable and, 90 p-values for Durbin-Watson test, 324 panel data TSCSREG procedure, 1727 Panel GMM, 1304 GMM in Panel: Arellano and Bond’s Estimator, 1304 PANEL procedure Between Estimators, 1291 BY groups, 1271 Da Silva method, 1302 generalized least squares, 1300 HCCME =, 1313 ID variables, 1273 linear hypothesis testing, 1312 one-way fixed effects model, 1283 one-way random effects model, 1291 output data sets, 1320–1322 output table names, 1323 Parks method, 1300 Pooled Estimator, 1291 predicted values, 1279 printed output, 1323 R-square measure, 1316 residuals, 1280 specification tests, 1317 two-way fixed effects model, 1285 two-way random effects model, 1294 Zellner’s two-stage method, 1301 parameter change vector, 1042
parameter estimates, 2433 parameter estimation, 2662 parameters MODEL procedure, 1152 UCM procedure, 1757–1768, 1770–1780 Pareto charts, 56 Parks method PANEL procedure, 1300 partial autocorrelations multivariate, 1592 partial autoregression coefficient VARMAX procedure, 1899, 1936 partial canonical correlation VARMAX procedure, 1899, 1939 partial correlation VARMAX procedure, 1938 PDL, see polynomial distributed lags PDLREG procedure BY groups, 1355 confidence limits, 1359 distributed lag regression models, 1349 orthogonal polynomials, 1350 output data sets, 1363 output table names, 1364 polynomial distributed lags, 1350 predicted values, 1359 residuals, 1359 restricted estimation, 1360 percent change calculations at annual rates, 110 introduced, 110 moving averages, 111, 112 period-to-period, 110 time series data, 110–112 year-over-year, 110 yearly averages, 111 percent change calculations and DIF function, 110–112 differencing, 110–112 LAG function, 110–112 lags, 110–112 percent operators, 755 period of evaluation, 2506 period of fit, 2506, 2575, 2649 period-to-period percent change calculations, 110 Periodic Equivalent, see Uniform Periodic Equivalent periodicity changing by interpolation, 125, 721 of time series observations, 71, 84, 125 periodicity of time series data, 84, 125 periodicity of time series
time intervals, 84, 125 periodogram SPECTRA procedure, 1542, 1553 Phillips-Ouliaris test, 345, 377 Phillips-Perron test, 345, 376 unit roots, 345, 346, 376 Phillips-Perron tests, 231 Physical Names on Supported hosts SASEFAME engine, 2299 Physical path name syntax for variety of environments SASEFAME engine, 2299 pitfalls of DIF function, 107 LAG function, 107 plot axis and time intervals, 88 plot axis for time series SGPLOT procedure, 88 PLOT procedure, 49 plotting time series, 92 time series data, 92 plot reference lines and time intervals, 88 plots, see Model Viewer, see Time Series Viewer plots of interleaved time series, 90 plotting autocorrelations, 193 forecasts, 2436 prediction errors, 2429 residual, 91 time series data, 86 plotting time series PLOT procedure, 92 SGPLOT procedure, 87 Time Series Viewer procedure, 86 point interventions, 2685 point interventions, 2685 point-in-time values, 721, 724 polynomial distributed lag models MODEL procedure, 1102 polynomial distributed lags Almon lag polynomials, 1349 endpoint restrictions for, 1350, 1356 PDL, 1349 PDLREG procedure, 1350 Polynomial specification, 2555, 2584, 2622 pooled pooled estimator, 1291 Pooled Estimator PANEL procedure, 1291 pooled estimator, 1291
pooled, 1291 Portfolio, see Investment Portfolio power curve trend curves, 2685 power curve trend, 2685 PPC convergence measure, 1028 Prais-Winsten estimates AUTOREG procedure, 361 PRED. variables, 1155 predetermined variables SYSLIN procedure, 1616 predicted values ARIMA procedure, 257 AUTOREG procedure, 355, 381, 382 conditional variance, 383 FORECAST procedure, 807 PANEL procedure, 1279 PDLREG procedure, 1359 SIMLIN procedure, 1513, 1518 STATESPACE procedure, 1597, 1601 structural, 356, 381, 1359 SYSLIN procedure, 1640 transformed models, 1061 predicting conditional variance, 383 prediction error covariance VARMAX procedure, 1897, 1930, 1932 prediction errors autocorrelations, 2430 plotting, 2429 residuals, 2498 stationarity, 2431 predictions smoothing models, 2670 predictive Chow test, 344, 376 predictive Chow tests, 1081 predictor variables forecasting models, 2511 independent variables, 2511 inputs, 2511 interventions, 2685 seasonal dummies, 2687 specifying, 2511 trend curves, 2684 Present Value Analysis, see Time Value Analysis present worth of cost LOAN procedure, 852 prewhitening ARIMA procedure, 246, 247 principal component, 1041 PRINT procedure, 49 printing SAS data sets, 49 printed output ARIMA procedure, 269
AUTOREG procedure, 385 PANEL procedure, 1323 SIMLIN procedure, 1523 STATESPACE procedure, 1603 SYSLIN procedure, 1657 X11 procedure, 2074 printing SAS data sets, 49 printing SAS data sets, see PRINT procedure probability functions, 51 PROBDF Function Dickey-Fuller test, 158 Financial Functions, 158 significance probabilities, 158 significance probabilities for Dickey-Fuller tests, 158 PROBDF function defined, 158 probit QLIM Procedure, 1376 produced by SAS/ETS procedures output data sets, 82 Producer Price Index Survey, see DATASOURCE procedure producing forecasts, 2404, 2623 producing forecasts, 2623 program flow COMPUTAB procedure, 448 program listing MODEL procedure, 1169 program variables MODEL procedure, 1155 programming statements COMPUTAB procedure, 445 Project Management window forecasting project, 2411 properties of the estimates MODEL procedure, 1025 properties of time series, 2453 PROTO procedure, 49 printing SAS data sets, 49 QLIM Procedure, 1376 logit, 1376 probit, 1376 selection, 1376 tobit, 1376 QLIM procedure Bivariate Limited Dependent Variable Modeling, 1406 Box-Cox Modeling, 1406 BY groups, 1387 Censored Regression Models, 1399
Frontier, 1403 Heteroscedasticity, 1405 Limited Dependent Variable Models, 1399 Multivariate Limited Dependent Models, 1409 Ordinal Discrete Choice Modeling, 1396 Output, 1411 output table names, 1415 Selection Models, 1408 syntax, 1382 Tests on Parameters, 1410 Truncated Regression Models, 1402 Types of Tobit Model, 1400 quadratic trend curves, 2684 quadratic trend, 2684 quadrature spectrum cross-spectral analysis, 1553 SPECTRA procedure, 1553 quasi-Newton optimization methods, 350, 490, 896 quasi-Newton method, 350, 490, 896 AUTOREG procedure, 342 quasi-random number generators MODEL procedure, 1130 R convergence measure, 1028 R square statistic statistics of fit, 1819 R squared MODEL procedure, 1027, 1034 R-square measure PANEL procedure, 1316 R-square statistic statistics of fit, 2689 SYSLIN procedure, 1651 R-squared measure, 1316 ramp interventions, 2686 ramp function, see ramp interventions ramp interventions, 2686 ramp function, 2685 Ramsey’s test, see RESET test random effects model one-way, 1291 two-way, 1294 random number functions, 51 random walk model AUTOREG procedure, 397 random walk R-square statistics of fit, 1820, 2689 random-number functions functions, 1159 random-number generating functions
MODEL procedure, 1159 random-walk with drift tests, 231 range of output observations EXPAND procedure, 736 RANGE= option in the LIBNAME statement SASEFAME engine, 2307 RANK procedure, 49 order statistics, 49 rate adjustment cases LOAN procedure, 846 rates contrasted with stocks or levels, 724 ratio operators, 755 rational transfer functions ARIMA procedure, 219 reading time series data, 66, 127 reading data files DATASOURCE procedure, 523 reading from a FAME data base SASEFAME engine, 2290 reading from a HAVER DLX database SASEHAVR engine, 2340 reading from CRSP data files SASECRSP engine, 2194 reading, with DATA step time series data, 125, 126 recommended for time series ID formats, 71 recursive residuals, 356, 368 reduced form coefficients SIMLIN procedure, 1519, 1524, 1528 SYSLIN procedure, 1653 reference forecasting models, 2508 SGPLOT procedure, 88 regression model with ARMA errors ARIMA procedure, 213, 214 regressor selection, 2627 regressors forecasting models, 2518 specifying, 2518 relation to ARMA models state space models, 1599 Remote FAME Access, Using FAME CHLI SASEFAME engine, 2292 remote monitoring NLO system, 181 RENAME in the DATA step SASEFAME engine, 2304 renaming SAS data sets, 49 renaming variables DATASOURCE procedure, 532, 547
replacing with missing values omitted observations, 104 represented by different series cross sectional dimensions, 79 represented with BY groups cross-sectional dimensions, 79 reserved words COMPUTAB procedure, 456 RESET test, 344 Ramsey’s test, 344 RESID. variables, 1054, 1059, 1155 residual plotting, 91 residual analysis, 2498 residuals, see prediction errors ARIMA procedure, 257 AUTOREG procedure, 356 FORECAST procedure, 807 PANEL procedure, 1280 PDLREG procedure, 1359 SIMLIN procedure, 1518 STATESPACE procedure, 1601 structural, 356, 1359 SYSLIN procedure, 1640 restarting the SASEFAME engine SASEFAME engine, 2290 RESTRICT statement, 352, 648, 999 restricted estimates STATESPACE procedure, 1587 restricted estimation, 352 linear models, 648 nonlinear models, 975, 999, 1076 PDLREG procedure, 1360 SYSLIN procedure, 1641, 1642 restricted vector autoregression, 1098 restrictions on parameters MODEL procedure, 1098 RETAIN statement computing lags, 108 RETAIN statement and differencing, 108 lags, 108 root mean square error statistics of fit, 1819, 2689 row blocks COMPUTAB procedure, 456 ROWxxxxx: label COMPUTAB procedure, 446 RPC convergence measure, 1028 S convergence measure, 1028 S matrix definition, 1008 MODEL procedure, 1026
S matrix used in estimation, 1026 S-iterated methods MODEL procedure, 1027 sample cross covariances VARMAX procedure, 1897, 1935 sample cross-correlations VARMAX procedure, 1896, 1935 sample data sets, 2380, 2393 sampling frequency changing by interpolation, 125 of time series, 71, 84, 125 time intervals and, 84 sampling frequency of time series data, 84, 125 sampling frequency of time series time intervals, 84, 125 SAS and CRSP Dates SASECRSP engine, 2208 SAS catalogs, see CATALOG procedure SAS data sets contents of, 49 copying, 49 DATA step, 49 moving between computer systems, 49 printing, 49 renaming, 49 sorting, 49 structured query language, 50 summarizing, 49, 50 transposing, 50 SAS data sets and time series data, 65 SAS DATA step SASECRSP engine, 2194 SASEFAME engine, 2291 SASEHAVR engine, 2341 SAS Date Format SASECRSP engine, 2208 SAS language features for time series data, 64 SAS macros BOXCOXAR macro, 150 DFPVALUE macro, 153 DFTEST macro, 154 LOGTEST macro, 156 macros, 149 SAS options statement, using VALIDVARNAME=ANY SASEFAME engine, 2299, 2304 SAS output data set SASECRSP engine, 2206 SASEFAME engine, 2297 SASEHAVR engine, 2346 SAS representation for
date values, 68 datetime values, 69 SAS Risk Products, 60 SAS source statements, 2582 SAS YEARCUTOFF= option DATASOURCE procedure, 544 SAS/ETS procedures using OUTPUT statement, 83 SAS/GRAPH software, 52 graphics, 52 SAS/HPF, 51 SAS/IML software, 54 IML, 54 matrix language, 54 SAS/OR software, 55 operations research, 55 SAS/QC software, 56 statistical quality control, 56 SAS/STAT software, 53 SASECRSP engine @CRSPDB Date Informats, 2209 @CRSPDR Date Informats, 2209 @CRSPDT Date Informats, 2209 CONTENTS procedure, 2194 Converting Dates Using the CRSP Date Functions, 2208 CRSP and SAS Dates, 2208 CRSP Date Formats, 2208 CRSP Date Functions, 2208 CRSP Date Informats, 2209 CRSP Integer Date Format, 2208 CRSPDB_SASCAL environment variable, 2194 CRSPDCI Date Functions, 2210 CRSPDCS Date Functions, 2210 CRSPDI2S Date Function, 2210 CRSPDIC Date Functions, 2210 CRSPDS2I Date Function, 2210 CRSPDSC Date Functions, 2210 CRSPDT Date Formats, 2208 Environment variable, CRSPDB_SASCAL, 2194 LIBNAME statement, 2190 reading from CRSP data files, 2194 SAS and CRSP Dates, 2208 SAS DATA step, 2194 SAS Date Format, 2208 SAS output data set, 2206 SETID option, 2194 SQL procedure, creating a view, 2194 SASEFAME engine CONTENTS procedure, 2291 convert option, 2291 creating a FAME view, 2289
DOT as a GLUE character, 2294 DRI data files in FAME.db , 2289 DRI/McGraw-Hill data files in FAME.db, 2289 DROP in the DATA step, 2304 FAME data files, 2289 FAME glue symbol named DOT, 2299 FAME Information Services Databases, 2289 fatal error when reading from a FAME data base, 2290 finishing the FAME CHLI, 2290 GLUE symbol, 2294 KEEP in the DATA step, 2304 LIBNAME libref SASEHAVR ’physical name’ on UNIX, 2299 LIBNAME libref SASEHAVR ’physical name’ on Windows, 2299 LIBNAME interface engine for FAME databases, 2289 LIBNAME statement, 2290 main economic indicators (OECD) data files in FAME.db, 2289 national accounts data files (OECD) in FAME.db, 2289 OECD data files in FAME.db, 2289 Organization for Economic Cooperation and Development data files in FAME.db, 2289 Physical Names on Supported hosts, 2299 Physical path name syntax for variety of environments, 2299 RANGE= option in the LIBNAME statement, 2307 reading from a FAME data base, 2290 Remote FAME Access, Using FAME CHLI, 2292 RENAME in the DATA step, 2304 restarting the SASEFAME engine, 2290 SAS DATA step, 2291 SAS options statement, using VALIDVARNAME=ANY, 2299, 2304 SAS output data set, 2297 Special characters in SAS Variable names, the glue symbol DOT, 2299 SQL procedure, using clause, 2291 SQL procedure,creating a view, 2291 Supported hosts, 2290 Using CROSSLIST= option to create a view, 2292 Using INSET= option with the CROSSLIST= option to create a view, 7, 2292
Using RANGE= option to create a view, 2292 Using WHERE clause with INSET= option to create a view, 2292 Using WILDCARD= option to create a view, 2292 VALIDVARNAME=ANY, SAS option statement, 2299, 2304 viewing a FAME database, 2289 WHERE in the DATA step, 2307 SASEHAVR engine creating a Haver view, 2339 frequency option, 2340 Haver data files, 2339 Haver Information Services Databases, 2339 LIBNAME interface engine for Haver databases, 2339 LIBNAME statement, 2340 reading from a HAVER DLX database, 2340 SAS DATA step, 2341 SAS output data set, 2346 viewing a Haver database, 2339 SASHELP library, 2393 saving and restoring forecasting project, 2412 Savings, 2718 SBC, see Schwarz Bayesian criterion, see Schwarz Bayesian information criterion scale operators, 754 SCAN (Smallest Canonical) correlation method, 244 Schwarz Bayesian criterion ARIMA procedure, 251 AUTOREG procedure, 370 SBC, 251 Schwarz Bayesian information criterion BIC, 2689 SBC, 2689 statistics of fit, 2689 seasonal adjustment time series data, 2034, 2103 X11 procedure, 2034, 2040 X12 procedure, 2103 seasonal ARIMA model notation, 2681 Seasonal ARIMA model options, 2631 seasonal component X11 procedure, 2034 X12 procedure, 2103 seasonal dummies, 2687 predictor variables, 2687 seasonal dummy variables forecasting models, 2539 specifying, 2539
seasonal exponential smoothing, 2676 smoothing models, 2676 seasonal forecasting additive Winters method, 803 FORECAST procedure, 799, 803 WINTERS method, 799 seasonal model ARIMA model, 212 ARIMA procedure, 212 seasonal transfer function notation, 2683 seasonal unit root test, 246 seasonality FORECAST procedure, 805 testing for, 154 seasonality test, 2688 seasonality tests, 2076 seasonality, testing for DFTEST macro, 154 second difference DIF function, 109 differencing, 109 See ordinary differential equations differential equations, 1070 seemingly unrelated regression, 1010 cross-equation covariance matrix, 1011 joint generalized least squares, 1614 SUR estimation method, 1614 SYSLIN procedure, 1622, 1649 Zellner estimation, 1614 Seidel method MODEL procedure, 1142 Seidel method with General Form Equations MODEL procedure, 1142 selecting from a list forecasting models, 2457 selection QLIM Procedure, 1376 selection criterion, 2610 sequence operators, 752 serial correlation correction AUTOREG procedure, 316 series autocorrelations, 2495 series adjustments, 2667 series diagnostics, 2453, 2632, 2687 series selection, 2633 series transformations, 2496 set operators, 753 SETID option SASECRSP engine, 2194 SETMISS operator, 749 SGMM simulated generalized method of moments, 1016
SGPLOT procedure plot axis for time series, 88 plotting time series, 87 reference, 88 time series data, 87 Shapiro-Wilk test, 1048 normality tests, 1048 sharing forecasting project, 2416 Shewhart control charts, 56 shifted time intervals, 131 shifted intervals, see time intervals, shifted significance probabilities Dickey-Fuller test, 158 PROBDF Function, 158 unit root, 158 significance probabilities for Dickey-Fuller test, 153 significance probabilities for Dickey-Fuller tests PROBDF Function, 158 SIMILARITY procedure BY groups, 1450 ODS graph names, 1483 SIMLIN procedure BY groups, 1516 dynamic models, 1512, 1513, 1519, 1534 dynamic multipliers, 1519, 1520 dynamic simulation, 1513 EST= data set, 1521 ID variables, 1517 impact multipliers, 1519, 1524 initializing lags, 1522 interim multipliers, 1515, 1520, 1523, 1524 lags, 1522 linear structural equations, 1519 multipliers, 1515, 1516, 1519, 1520, 1523, 1524 multipliers for higher order lags, 1520, 1534 output data sets, 1522, 1523 output table names, 1525 predicted values, 1513, 1518 printed output, 1523 reduced form coefficients, 1519, 1524, 1528 residuals, 1518 simulation, 1513 statistics of fit, 1524 structural equations, 1519 structural form, 1519 total multipliers, 1516, 1520, 1523, 1524 TYPE=EST data set, 1519 SIMNLIN procedure, see MODEL procedure simple data set, 2407
simple exponential smoothing, 2672 smoothing models, 2672 simulated method of moments GMM, 1016 simulated nonlinear least squares MODEL procedure, 1019 simulating ARIMA model, 2560, 2653 Simulating from a Mixture of Distributions examples, 1225 simulation MODEL procedure, 1120 of time series, 2560, 2653 SIMLIN procedure, 1513 time series, 2560, 2653 simultaneous equation bias, 1009 SYSLIN procedure, 1615 single equation estimators SYSLIN procedure, 1648 single exponential smoothing, see exponential smoothing sliding spans analysis, 2060 Smallest Canonical (SCAN) correlation method, 244 SMM, 1016 GMM, 1016 SMM simulated method of moments, 1016 smoothing equations, 2669 smoothing models, 2669 smoothing model specification, 2639, 2641 smoothing models calculations, 2669 damped-trend exponential smoothing, 2675 double exponential smoothing, 2673 exponential smoothing, 2669 forecasting models, 2462, 2669 initializations, 2670 linear exponential smoothing, 2674 missing values, 2670 multiplicative seasonal smoothing, 2677 predictions, 2670 seasonal exponential smoothing, 2676 simple exponential smoothing, 2672 smoothing equations, 2669 smoothing state, 2669 smoothing weights, 2671 specifying, 2462 standard errors, 2672 underlying model, 2669 Winters Method, 2678, 2679 smoothing state, 2669 smoothing models, 2669 smoothing weights, 2641, 2671 additive-invertible region, 2671
boundaries, 2671 FORECAST procedure, 803 optimizations, 2671 smoothing models, 2671 specifications, 2671 weights, 2671 solution mode output MODEL procedure, 1132 solution modes MODEL procedure, 1117, 1140 SOLVE Data Sets MODEL procedure, 1148 SORT procedure, 49 sorting, 49 sorting, see SORT procedure forecasting models, 2504, 2581 SAS data sets, 49 time series data, 72 sorting by ID variables, 72 Special characters in SAS Variable names, the glue symbol DOT SASEFAME engine, 2299 specification tests PANEL procedure, 1317 specifications smoothing weights, 2671 specifying adjustments, 2522 ARIMA models, 2465 combination models, 2482 custom models, 2472 dynamic regression, 2523 Factored ARIMA models, 2468 forecasting models, 2453 interventions, 2527 level shifts, 2532 predictor variables, 2511 regressors, 2518 seasonal dummy variables, 2539 smoothing models, 2462 state space models, 1578 time ID variable, 2648 trend changes, 2530 trend curves, 2515 SPECTRA procedure BY groups, 1546 Chirp-Z algorithm, 1549 coherency of cross-spectrum, 1553 cospectrum estimate, 1553 cross-periodogram, 1553 cross-spectral analysis, 1542, 1553 cross-spectrum, 1553 fast Fourier transform, 1549
finite Fourier transform, 1542 Fourier coefficients, 1553 Fourier transform, 1542 frequency, 1552 kernels, 1549 output data sets, 1552 output table names, 1554 periodogram, 1542, 1553 quadrature spectrum, 1553 spectral analysis, 1542 spectral density estimate, 1542, 1553 spectral window, 1547 white noise test, 1551, 1554 spectral analysis SPECTRA procedure, 1542 spectral density estimate SPECTRA procedure, 1542, 1553 spectral window SPECTRA procedure, 1547 SPLINE method EXPAND procedure, 739 splitting series time series data, 118 splitting time series data sets, 118 SQL procedure, 50 structured query language, 50 SQL procedure, creating a view SASECRSP engine, 2194 SQL procedure, using clause SASEFAME engine, 2291 SQL procedure,creating a view SASEFAME engine, 2291 square root transformations, 2667 square root transformation, see transformations stable seasonality test, 2076 standard errors smoothing models, 2672 standard form of time series data set, 76 standard form of time series data, 76 STANDARD procedure, 50 standardized values, 50 standardized values, see STANDARD procedure starting dates of time intervals, 100, 101 starting values GARCH model, 342 MODEL procedure, 1031, 1038 Stat Studio software, 54 state and area employment, hours, and earnings survey, see DATASOURCE procedure state space model
UCM procedure, 1787 state space models form of, 1568 relation to ARMA models, 1599 specifying, 1578 state vector of, 1568 STATESPACE procedure, 1568 state transition equation of a state space model, 1569 state vector of a state space model, 1568 state vector of state space models, 1568 state-space representation VARMAX procedure, 1913 STATESPACE procedure automatic forecasting, 1568 BY groups, 1586 canonical correlation analysis, 1570, 1593 confidence limits, 1601 differencing, 1587 forecasting, 1568, 1597 ID variables, 1586 Kalman filter, 1570 multivariate forecasting, 1568 multivariate time series, 1568 output data sets, 1601, 1602 output table names, 1605 predicted values, 1597, 1601 printed output, 1603 residuals, 1601 restricted estimates, 1587 state space models, 1568 time intervals, 1585 Yule-Walker equations, 1590 static simulation, 1068 MODEL procedure, 1068 static simulations MODEL procedure, 1117 stationarity and state space models, 1571 ARIMA procedure, 194 nonstationarity, 194 of time series, 210 prediction errors, 2431 testing for, 154 VARMAX procedure, 1941, 1949 stationarity tests, 231, 246, 345 stationarity, testing for DFTEST macro, 154 statistical quality control SAS/QC software, 56 statistics of fit, 1819, 2425, 2434, 2643, 2688 adjusted R-square, 1819, 2689
Akaike’s information criterion, 2689 Amemiya’s prediction criterion, 2689 Amemiya’s R-square, 1820, 2689 corrected sum of squares, 2689 error sum of squares, 2689 goodness of fit, 2434 goodness-of-fit statistics, 1819, 2688 mean absolute error, 2689 mean absolute percent error, 1819, 2689 mean percent error, 2690 mean prediction error, 2690 mean square error, 1819 mean squared error, 2689 nonmissing observations, 2688 number of observations, 2688 R square statistic, 1819 R-square statistic, 2689 random walk R-square, 1820, 2689 root mean square error, 1819, 2689 Schwarz Bayesian information criterion, 2689 SIMLIN procedure, 1524 uncorrected sum of squares, 2689 step interventions, 2685 step function, see step interventions interpolation of time series, 740 intervention model and, 217 step interventions, 2685 step function, 2685 STEP method EXPAND procedure, 740 STEPAR method FORECAST procedure, 797 stepwise autoregression AUTOREG procedure, 327 FORECAST procedure, 774, 797 stochastic simulation MODEL procedure, 1120 stock data files, see DATASOURCE procedure stocks contrasted with flow variables, 724 stored in SAS data sets time series data, 75 storing programs MODEL procedure, 1167 structural predicted values, 356, 381, 1359 residuals, 356, 1359 structural change Chow test for, 343 structural equations SIMLIN procedure, 1519 structural form
SIMLIN procedure, 1519 structural predictions AUTOREG procedure, 381 structured query language, see SQL procedure SAS data sets, 50 subset model ARIMA model, 212 ARIMA procedure, 212 AUTOREG procedure, 329 subsetting data, see WHERE statement subsetting data files DATASOURCE procedure, 523, 536 summarizing SAS data sets, 49, 50 summary of time intervals, 132 summary statistics MODEL procedure, 1135 summation higher order sums, 114 multiperiod lags and, 114 of time series, 113 summation of time series data, 113, 114 Supported hosts SASEFAME engine, 2290 SUR estimation method, see seemingly unrelated regression Switching Regression example examples, 1221 syntax for date values, 68 datetime values, 69 time intervals, 84 time values, 69 SYSLIN procedure Basmann test, 1639, 1654 BY groups, 1637 endogenous variables, 1616 exogenous variables, 1616 full information maximum likelihood, 1624, 1649 Fuller’s modification to LIML, 1654 instrumental variables, 1616 iterated seemingly unrelated regression, 1649 iterated three-stage least squares, 1649 jointly dependent variables, 1616 K-class estimation, 1648 lagged endogenous variables, 1616 limited information maximum likelihood, 1648 minimum expected loss estimator, 1648 ODS graph names, 1660
output data sets, 1655, 1656 output table names, 1659 over identification restrictions, 1654 predetermined variables, 1616 predicted values, 1640 printed output, 1657 R-square statistic, 1651 reduced form coefficients, 1653 residuals, 1640 restricted estimation, 1641, 1642 seemingly unrelated regression, 1622, 1649 simultaneous equation bias, 1615 single equation estimators, 1648 system weighted MSE, 1651 system weighted R-square, 1651, 1657 tests of hypothesis, 1643, 1644 three-stage least squares, 1622, 1649 two-stage least squares, 1619, 1648 SYSNLIN procedure, see MODEL procedure system weighted MSE SYSLIN procedure, 1651 system weighted R-square SYSLIN procedure, 1651, 1657 systems of ordinary differential equations (ODEs), 1215 systems of differential equations examples, 1215 systems of ordinary differential equations MODEL procedure, 1215 t distribution GARCH model, 367 table cells, direct access to COMPUTAB procedure, 456 table names UCM procedure, 1810 TABULATE procedure, 50 tabulating data, 50 tabulating data, see TABULATE procedure taxes LOAN procedure, 852 tentative order selection VARMAX procedure, 1935 test of hypotheses nonlinear models, 1005 TEST statement, 353 testing for heteroscedasticity, 329 seasonality, 154 stationarity, 154 unit root, 154 testing order of differencing, 154
testing over-identifying restrictions, 1015 tests of hypothesis SYSLIN procedure, 1643, 1644 tests of parameters, 353, 649, 1005 tests on parameters MODEL procedure, 1078 The D-method example, 1221 three-stage least squares, 1011 3SLS estimation method, 1614 SYSLIN procedure, 1622, 1649 time functions, 95 time ID creation, 2644, 2646, 2647 time ID variable, 2389 creating, 2439 ID variable, 2389 observation numbers, 2443 specifying, 2648 time intervals, 2395 alignment of, 132 ARIMA procedure, 259 calendar calculations and, 104 ceiling of, 102 checking data periodicity, 103 counting, 99, 102 data frequency, 2384 date values, 129 datetime values, 129 ending dates of, 101 examples of, 135 EXPAND procedure, 735 EXPAND procedure and, 124 FORECAST procedure, 796 frequency of data, 2384 functions, 143 functions for, 98, 143 ID values for, 100 incrementing dates by, 98, 99 INTCK function and, 99, 102 INTERVAL= option and, 84 intervals, 84 INTNX function and, 98 midpoint dates of, 101 naming, 84, 130 periodicity of time series, 84, 125 plot axis and, 88 plot reference lines and, 88 sampling frequency of time series, 84, 125 shifted, 131 starting dates of, 100, 101 STATESPACE procedure, 1585 summary of, 132 syntax for, 84 UCM procedure, 1767
use with SAS/ETS procedures, 85 VARMAX procedure, 1892 widths of, 101, 735 time intervals and calendar calculations, 104 date values, 100 frequency, 84, 125 sampling frequency, 84 time intervals, functions interval functions, 98 time intervals, shifted shifted intervals, 131 time range DATASOURCE procedure, 544 time range of data DATASOURCE procedure, 526 time ranges, 2506, 2575, 2649 of time series, 77 time ranges of time series data, 77 time series definition, 2380 diagnostic tests, 2453 in SAS data sets, 2380 simulation, 2560, 2653 time series cross sectional form TSCSREG procedure and, 80 time series cross-sectional form BY groups and, 79 ID variables for, 79 of time series data set, 79 TSCSREG procedure and, 1727 time series cross-sectional form of time series data set, 79 time series data aggregation of, 721, 724 changing periodicity, 125, 721 converting frequency of, 721 differencing, 105–112 distribution of, 724 embedded missing values in, 78 giving dates to, 67 ID variable for, 67 interpolation, 125 interpolation of, 124, 125, 723 lagging, 105–112 leads, 112 merging series, 119 missing values, 723 missing values and, 77, 78 omitted observations in, 78 overlay plot of, 88 percent change calculations, 110–112 periodicity of, 84, 125
PLOT procedure, 92 plotting, 86 reading, 66, 127 reading, with DATA step, 125, 126 sampling frequency of, 84, 125 SAS data sets and, 65 SAS language features for, 64 seasonal adjustment, 2034, 2103 SGPLOT procedure, 87 sorting, 72 splitting series, 118 standard form of, 76 stored in SAS data sets, 75 summation of, 113, 114 time ranges of, 77 Time Series Viewer, 86 transformation of, 726, 742 transposing, 119, 121 time series data and missing values, 77 time series data set interleaved form of, 80 time series cross-sectional form of, 79 time series forecasting, 2651 Time Series Forecasting System invoking, 2546 invoking from SAS/AF and SAS/EIS applications, 2546 running in unattended mode, 2546 time series methods FORECAST procedure, 786 time series variables DATASOURCE procedure, 524, 549 Time Series Viewer, 2419, 2491, 2654 graphs, 2419 invoking, 2545 plots, 2419 saving graphs and tables, 2628, 2630 time series data, 86 Time Series Viewer procedure plotting time series, 86 time trend models FORECAST procedure, 784 Time Value Analysis, 2766 time values defined, 69 formats, 142 functions, 143 informats, 136 syntax for, 69 time variable, 1074 MODEL procedure, 1074 time variables computing from datetime values, 98
introduced, 95 TIMEPLOT procedure, 50, 93 TIMESERIES procedure BY groups, 1688 ODS graph names, 1714 to higher frequency interpolation, 125 to lower frequency interpolation, 125 to SAS/ETS software menu interfaces, 45, 46 to standard form transposing time series, 119, 121 tobit QLIM Procedure, 1376 Toeplitz matrix AUTOREG procedure, 358 total multipliers SIMLIN procedure, 1516, 1520, 1523, 1524 trading-day component X11 procedure, 2034, 2040 transfer function model ARIMA procedure, 213, 218, 251 denominator factors, 218 numerator factors, 218 transfer functions, 2682 forecasting models, 2682 transformation of time series data, 726, 742 transformation of time series EXPAND procedure, 726, 742 transformations, 2637 Box Cox, 2667 Box Cox transformation, 2667 log, 2667 log transformation, 2667 logistic, 2667 square root, 2667 square root transformation, 2667 transformed models predicted values, 1061 transition equation of a state space model, 1569 transition matrix of a state space model, 1569 TRANSPOSE procedure, 50, 119, 121, 122, 126 transposing SAS data sets, 50 TRANSPOSE procedure and transposing time series, 119 transposing SAS data sets, 50 time series data, 119, 121 transposing SAS data sets, see TRANSPOSE procedure
transposing time series cross-sectional dimensions, 121 from interleaved form, 119 from standard form, 122 to standard form, 119, 121 TRANSPOSE procedure and, 119 trend changes specifying, 2530 trend curves, 2684 cubic, 2684 exponential, 2685 forecasting models, 2515 hyperbolic, 2685 linear, 2684 logarithmic, 2685 logistic, 2684 power curve, 2685 predictor variables, 2684 quadratic, 2684 specifying, 2515 trend cycle component X11 procedure, 2034, 2040 trend test, 2688 TRIM operator, 748 TRIMLEFT operator, 748 TRIMRIGHT operator, 748 triple exponential smoothing, see exponential smoothing troubleshooting estimation convergence problems MODEL procedure, 1030 troubleshooting simulation problems MODEL procedure, 1143 true interest rate LOAN procedure, 852 Truncated Regression Models QLIM procedure, 1402 trust region optimization methods, 350, 490, 896 trust region method, 350, 490, 896 AUTOREG procedure, 342 TSCSREG procedure BY groups, 1735 estimation techniques, 1730 ID variables, 1735 output table names, 1738 panel data, 1727 TSCSREG procedure and time series cross sectional form, 80 time series cross-sectional form, 1727 TSVIEW command, 2545 two-stage least squares, 1009 2SLS estimation method, 1614 SYSLIN procedure, 1619, 1648 two-step full transform method
AUTOREG procedure, 361 two-way fixed effects model, 1285 random effects model, 1294 two-way fixed effects model PANEL procedure, 1285 two-way fixed-effects model, 1285 two-way random effects model PANEL procedure, 1294 two-way random-effects model, 1294 type of input data file DATASOURCE procedure, 538 _TYPE_ variable and interleaved time series, 80, 81 overlay plots, 90 TYPE=EST data set SIMLIN procedure, 1519 types of loans LOAN procedure, 828 Types of Tobit Model QLIM procedure, 1400 U.S. Bureau of Economic Analysis data files DATASOURCE procedure, 590 U.S. Bureau of Labor Statistics data files DATASOURCE procedure, 591 UCM procedure BY groups, 1760 ODS graph names, 1813 ODS Graphics, 1754 ODS table names, 1810 parameters, 1757–1768, 1770–1780 state space model, 1787 Statistical Graphics, 1800 syntax, 1750 table names, 1810 time intervals, 1767 unattended mode, 2546 unconditional forecasts ARIMA procedure, 258 unconditional least squares AR initial conditions, 1091 MA Initial Conditions, 1092 uncorrected sum of squares statistics of fit, 2689 underlying model smoothing models, 2669 Uniform Periodic Equivalent, 2770 unit root Dickey-Fuller test, 158 of a time series, 154 significance probabilities, 158 testing for, 154 unit roots
KPSS test, 378 Phillips-Perron test, 345, 346, 376 univariate autoregression, 1093 univariate model diagnostic checks VARMAX procedure, 1958 univariate moving average models, 1099 UNIVARIATE procedure, 50, 1219 descriptive statistics, 50 unlinking viewer windows, 2495 unrestricted vector autoregression, 1095 use with SAS/ETS procedures time intervals, 85 used for state space modeling Kalman filter, 1570 used to select state space models Akaike information criterion, 1591 vector autoregressive models, 1590 Yule-Walker estimates, 1590 Using CROSSLIST= option to create a view SASEFAME engine, 2292 Using INSET= option with the CROSSLIST= option to create a view SASEFAME engine, 7, 2292 using models to forecast MODEL procedure, 1120 Using RANGE= option to create a view SASEFAME engine, 2292 using solution modes MODEL procedure, 1117 Using WHERE clause with INSET= option to create a view SASEFAME engine, 2292 Using WILDCARD= option to create a view SASEFAME engine, 2292 V matrix Generalized Method of Moments, 1012, 1017 VALIDVARNAME=ANY, SAS option statement SASEFAME engine, 2299, 2304 variable list DATASOURCE procedure, 547 variables in model program MODEL procedure, 1151 variance components Fuller Battese, 1292 Nerlove, 1294 Wallace Hussain, 1293 Wansbeek Kapteyn’s, 1292 VARMAX procedure Akaike Information Criterion, 1957 asymptotic distribution of impulse response functions, 1943, 1951
asymptotic distribution of the parameter estimation, 1951 Bayesian vector autoregressive models, 1904, 1947 cointegration, 1959 cointegration testing, 1902, 1963 common trends, 1959 common trends testing, 1903, 1961 computational details, 2001 confidence limits, 1988 convergence problems, 2001 covariance stationarity, 1983 CPU requirements, 2002 decomposition of prediction error covariance, 1897, 1933 Dickey-Fuller test, 1901 differencing, 1894 dynamic simultaneous equation models, 1916 example of Bayesian VAR modeling, 1866 example of Bayesian VECM modeling, 1873 example of causality testing, 1881 example of cointegration testing, 1869 example of multivariate GARCH modeling, 1983 example of restricted parameter estimation and testing, 1878 example of VAR modeling, 1859 example of VARMA modeling, 1952 example of vector autoregressive modeling with exogenous variables, 1874 example of vector error correction modeling, 1868 forecasting, 1930 forecasting of Bayesian vector autoregressive models, 1948 Granger causality test, 1944 impulse response function, 1898, 1919 infinite order AR representation, 1898 infinite order MA representation, 1898, 1919 invertibility, 1949 long-run relations testing, 1972 memory requirements, 2002 minimum information criteria method, 1940 missing values, 1912 multivariate GARCH Modeling, 1907 multivariate model diagnostic checks, 1957 ODS graph names, 2000 Output Data Sets, 1987 partial autoregression coefficient, 1899, 1936 partial canonical correlation, 1899, 1939 partial correlation, 1938
prediction error covariance, 1897, 1930, 1932 sample cross covariances, 1897, 1935 sample cross-correlations, 1896, 1935 state-space representation, 1913 stationarity, 1941, 1949 tentative order selection, 1935 time intervals, 1892 univariate model diagnostic checks, 1958 vector autoregressive models, 1941 vector autoregressive models with exogenous variables, 1944 vector autoregressive moving-average models, 1912, 1949 vector error correction models, 1906, 1962 weak exogeneity testing, 1974 Yule-Walker estimates, 1899 vector autoregressive models, 1098 used to select state space models, 1590 VARMAX procedure, 1941 vector autoregressive models with exogenous variables VARMAX procedure, 1944 vector autoregressive moving-average models VARMAX procedure, 1912, 1949 vector error correction models VARMAX procedure, 1906, 1962 vector moving average models, 1101 viewing a FAME database, see SASEFAME engine viewing a Haver database, see SASEHAVR engine viewing time series, 2419 Wald test linear hypotheses, 650 nonlinear hypotheses, 1006, 1078, 1410 Wallace Hussain variance components, 1293 Wansbeek Kapteyn’s variance components, 1292 weak exogeneity testing VARMAX procedure, 1974 _WEIGHT_ variable MODEL procedure, 1052 weights, see smoothing weights WHERE in the DATA step SASEFAME engine, 2307 WHERE statement subsetting data, 50 white noise test SPECTRA procedure, 1551, 1554 white noise test of the residuals, 233 white noise test of the series, 232
White’s test, 1050 heteroscedasticity tests, 1050 widths of time intervals, 101, 735 WINTERS method seasonal forecasting, 799 Winters Method, 2678, 2679 Holt-Winters Method, 2678 smoothing models, 2678, 2679 Winters method FORECAST procedure, 774, 799 Holt-Winters method, 803 X-11 ARIMA methodology X11 procedure, 2058 X-11 seasonal adjustment method, see X11 procedure X-11-ARIMA seasonal adjustment method, see X11 procedure X-12 seasonal adjustment method, see X12 procedure X-12-ARIMA seasonal adjustment method, see X12 procedure X11 procedure BY groups, 2046 Census X-11 method, 2034 Census X-11 methodology, 2059 data requirements, 2064 differences with X11ARIMA/88, 2058 ID variables, 2046, 2048 irregular component, 2034, 2040 model selection for X-11-ARIMA method, 2068 output data sets, 2071, 2072 output table names, 2085 printed output, 2074 seasonal adjustment, 2034, 2040 seasonal component, 2034 trading-day component, 2034, 2040 trend cycle component, 2034, 2040 X-11 ARIMA methodology, 2058 X-11 seasonal adjustment method, 2034 X-11-ARIMA seasonal adjustment method, 2034 X12 procedure BY groups, 2111 Census X-12 method, 2102 ID variables, 2111 INPUT variables, 2113 seasonal adjustment, 2103 seasonal component, 2103 X-12 seasonal adjustment method, 2102 X-12-ARIMA seasonal adjustment method, 2102
year-over-year percent change calculations, 110 yearly averages percent change calculations, 111 Yule-Walker AR initial conditions, 1091 Yule-Walker equations AUTOREG procedure, 358 STATESPACE procedure, 1590 Yule-Walker estimates AUTOREG procedure, 357 used to select state space models, 1590 VARMAX procedure, 1899 Yule-Walker method as generalized least-squares, 361 Zellner estimation, see seemingly unrelated regression Zellner’s two-stage method PANEL procedure, 1301 zooming graphs, 2493
Syntax Index
2SLS option FIT statement (MODEL), 986, 1009 PROC SYSLIN statement, 1635 3SLS option FIT statement (MODEL), 986, 1011, 1111 PROC SYSLIN statement, 1635 A option PROC SPECTRA statement, 1545 A= option FIXED statement (LOAN), 841 ABORT, 1166 ABS function, 1159 ACCEPTDEFAULT option AUTOMDL statement (X12), 2120 ACCUMULATE= option FORECAST statement (ESM), 689 ID statement (ESM), 692 ID statement (SIMILARITY), 1451 ID statement (TIMESERIES), 1694 INPUT statement (SIMILARITY), 1454 TARGET statement (SIMILARITY), 1456 VAR statement (TIMESERIES), 1698 ADDITIVE option MONTHLY statement (X11), 2047 QUARTERLY statement (X11), 2052 ADDMAXIT= option MODEL statement (MDC), 894 ADDRANDOM option MODEL statement (MDC), 894 ADDVALUE option MODEL statement (MDC), 895 ADF= option ARM statement (LOAN), 846 ADJMEAN option PROC SPECTRA statement, 1545 ADJSMMV option FIT statement (MODEL), 985 ADJUST statement X12 procedure, 2114 ADJUSTFREQ= option ARM statement (LOAN), 846 ALIGN= option FORECAST statement (ARIMA), 143, 237 ID statement (ENG), 143 ID statement (ESM), 143, 693 ID statement (HPF), 143 ID statement (HPFDIAGNOSE), 143
ID statement (HPFEVENTS), 143 ID statement (SIMILARITY), 143, 1452 ID statement (TIMESERIES), 143, 1694 ID statement (UCM), 143, 1767 ID statement (VARMAX), 143, 1892 PROC DATASOURCE statement, 143, 538 PROC EXPAND statement, 143, 729, 734 PROC FORECAST statement, 143, 790 ALL option COMPARE statement (LOAN), 848 MODEL statement (AUTOREG), 342 MODEL statement (MDC), 896 MODEL statement (PDLREG), 1356 MODEL statement (SYSLIN), 1639 PROC SYSLIN statement, 1636 TEST statement (ENTROPY), 650 TEST statement (MODEL), 1006 TEST statement (PANEL), 1281 TEST statement (QLIM), 1395 ALPHA= option FORECAST statement (ARIMA), 237 FORECAST statement (ESM), 689 FORECAST statement (UCM), 1766 IDENTIFY statement (ARIMA), 228 MODEL statement (SYSLIN), 1639 OUTLIER statement (ARIMA), 236 OUTPUT statement (VARMAX), 1909 PROC FORECAST statement, 790 PROC SYSLIN statement, 1635 ALPHA= option OUTLIER statement (UCM), 1773 ALPHACLI= option OUTPUT statement (AUTOREG), 354 OUTPUT statement (PDLREG), 1358 ALPHACLM= option OUTPUT statement (AUTOREG), 354 OUTPUT statement (PDLREG), 1358 ALPHACSM= option OUTPUT statement (AUTOREG), 354 ALTPARM option ESTIMATE statement (ARIMA), 232, 253 ALTW option PROC SPECTRA statement, 1545 AMOUNT= option FIXED statement (LOAN), 841 AMOUNTPCT= option FIXED statement (LOAN), 842 AOCV= option
OUTLIER statement (X12), 2123 APCT= option FIXED statement (LOAN), 842 %AR macro, 1097, 1098 AR option IRREGULAR statement (UCM), 1770 AR= option BOXCOXAR macro, 151 DFTEST macro, 155 ESTIMATE statement (ARIMA), 234 LOGTEST macro, 156 PROC FORECAST statement, 790 ARCHTEST option MODEL statement (AUTOREG), 342 ARCOS function, 1159 ARIMA procedure, 221 syntax, 221 ARIMA procedure, PROC ARIMA statement PLOT option, 224 ARIMA statement X11 procedure, 2043 X12 procedure, 2115 ARM statement LOAN procedure, 845 ARMACV= option AUTOMDL statement (X12), 2120 ARMAX= option PROC STATESPACE statement, 1583 ARSIN function, 1159 ARTEST= option MODEL statement (PANEL), 1276 ASCII option PROC DATASOURCE statement, 538 ASTART= option PROC FORECAST statement, 790 AT= option COMPARE statement (LOAN), 849 ATAN function, 1159 ATOL= option MODEL statement (PANEL), 1276 ATTRIBUTE statement DATASOURCE procedure, 545 AUTOMDL statement X12 procedure, 2118 AUTOREG procedure, 335 syntax, 335 AUTOREG statement UCM procedure, 1757 B option ARM statement (LOAN), 847 BACK= option ESTIMATE statement (UCM), 1763 FORECAST statement (ARIMA), 238
FORECAST statement (UCM), 1766 OUTPUT statement (VARMAX), 1909 PROC ESM statement, 686 PROC STATESPACE statement, 1585 BACKCAST= option ARIMA statement (X11), 2043 BACKLIM= option ESTIMATE statement (ARIMA), 235 BACKSTEP option MODEL statement (AUTOREG), 348 BALANCED option AUTOMDL statement (X12), 2120 BALLOON statement LOAN procedure, 845 BALLOONPAYMENT= option BALLOON statement (LOAN), 845 BANDOPT= option MODEL statement (PANEL), 1276 BASE= option PROC PANEL statement, 1272 BCX option MODEL statement (QLIM), 1392 BESTCASE option ARM statement (LOAN), 847 BI option COMPARE statement (LOAN), 849 BLOCK option PROC MODEL statement, 972, 1178 BLOCKSEASON statement UCM procedure, 1758 BLOCKSIZE= option BLOCKSEASON statement (UCM), 1759 BLUS= option OUTPUT statement (AUTOREG), 354 BOUNDS statement COUNTREG procedure, 490 ENTROPY procedure, 644 MDC procedure, 890 MODEL procedure, 975 QLIM procedure, 1386 BOXCOXAR macro, 151 macro variable, 152 BP option COMPARE statement (LOAN), 849 MODEL statement (PANEL), 1276 BREAKINTEREST option COMPARE statement (LOAN), 849 BREAKPAYMENT option COMPARE statement (LOAN), 849 BREUSCH= option FIT statement (MODEL), 989 BSTART= option PROC FORECAST statement, 790
BTOL= option MODEL statement (PANEL), 1276 BTWNG option MODEL statement (PANEL), 1276, 1277 BUYDOWN statement LOAN procedure, 848 BUYDOWNRATES= option BUYDOWN statement (LOAN), 848 BY statement ARIMA procedure, 227 AUTOREG procedure, 340 COMPUTAB procedure, 446 COUNTREG procedure, 491 ENTROPY procedure, 646 ESM procedure, 689 EXPAND procedure, 731 FORECAST procedure, 795 MDC procedure, 891 MODEL procedure, 976 PANEL procedure, 1271 PDLREG procedure, 1355 QLIM procedure, 1387 SIMILARITY procedure, 1450 SIMLIN procedure, 1516 SPECTRA procedure, 1546 STATESPACE procedure, 1586 SYSLIN procedure, 1637 TIMESERIES procedure, 1688 TSCSREG procedure, 1735 UCM procedure, 1760 VARMAX procedure, 1888 X11 procedure, 2046 X12 procedure, 2111 CANCORR option PROC STATESPACE statement, 1583 CAPS= option ARM statement (LOAN), 846 CAUCHY option ERRORMODEL statement (MODEL), 980 CAUSAL statement VARMAX procedure, 1889 CDEC= option PROC COMPUTAB statement, 439 CDF= option ERRORMODEL statement (MODEL), 981 CELL statement COMPUTAB procedure, 443 CENSORED option ENDOGENOUS statement (QLIM), 1388 MODEL statement (ENTROPY), 647 CENTER option ARIMA statement (X11), 2045 IDENTIFY statement (ARIMA), 228
MODEL statement (AUTOREG), 340 MODEL statement (VARMAX), 1893 PROC SPECTRA statement, 1545 CEV= option OUTPUT statement (AUTOREG), 354 CHAR option COLUMNS statement (COMPUTAB), 440 ROWS statement (COMPUTAB), 442 CHARTS= option MONTHLY statement (X11), 2048 QUARTERLY statement (X11), 2053 CHECKBREAK option LEVEL statement (UCM), 1772 CHICR= option ARIMA statement (X11), 2044 CHISQUARED option ERRORMODEL statement (MODEL), 980 CHOICE= option MODEL statement (MDC), 892 CHOW= option FIT statement (MODEL), 990, 1081 MODEL statement (AUTOREG), 343 CLAG option LAG statement (PANEL), 1275 CLAG statement LAG statement (PANEL), 1275 CLASS statement PANEL procedure, 1272 CLEAR option IDENTIFY statement (ARIMA), 228 CLIMIT= option FORECAST command (TSFS), 2548 CMPMODEL options, 972 COEF option MODEL statement (AUTOREG), 343 PROC SPECTRA statement, 1546 COEF= option HETERO statement (AUTOREG), 351 COINTEG statement VARMAX procedure, 1890, 1973 COINTTEST= option MODEL statement (VARMAX), 1902 COINTTEST=(JOHANSEN) option MODEL statement (VARMAX), 1902 COINTTEST=(JOHANSEN=(IORDER=)) option MODEL statement (VARMAX), 1902, 1980 COINTTEST=(JOHANSEN=(NORMALIZE=)) option MODEL statement (VARMAX), 1903, 1966 COINTTEST=(JOHANSEN=(TYPE=)) option MODEL statement (VARMAX), 1903 COINTTEST=(SIGLEVEL=) option MODEL statement (VARMAX), 1903
COINTTEST=(SW) option MODEL statement (VARMAX), 1903, 1961 COINTTEST=(SW=(LAG=)) option MODEL statement (VARMAX), 1903 COINTTEST=(SW=(TYPE=)) option MODEL statement (VARMAX), 1904 COLLIN option ENTROPY procedure, 641 FIT statement (MODEL), 990 ‘column headings’ option COLUMNS statement (COMPUTAB), 440 COLUMNS statement COMPUTAB procedure, 440 COMPARE statement LOAN procedure, 848 COMPOUND= option FIXED statement (LOAN), 842 COMPRESS= option TARGET statement (SIMILARITY), 1457 COMPUTAB procedure, 436 syntax, 436 CONDITIONAL OUTPUT statement (QLIM), 1393 CONST= option BOXCOXAR macro, 151 LOGTEST macro, 157 CONSTANT= option OUTPUT statement (AUTOREG), 355 OUTPUT statement (PDLREG), 1359 CONTROL, 1206 CONTROL statement MODEL procedure, 979, 1151 CONVERGE= option ARIMA statement (X11), 2044 ENTROPY procedure, 643 ESTIMATE statement (ARIMA), 235 FIT statement (MODEL), 992, 1028, 1036, 1038 MODEL statement (AUTOREG), 348 MODEL statement (MDC), 892 MODEL statement (PDLREG), 1357 PROC SYSLIN statement, 1635 SOLVE statement (MODEL), 1004 CONVERT statement EXPAND procedure, 732 CONVERT= option LIBNAME statement (SASEFAME), 2293 COPULA= option SOLVE statement (MODEL), 1003 CORR option FIT statement (MODEL), 990 MODEL statement (PANEL), 1277 MODEL statement (TSCSREG), 1736 CORR statement
TIMESERIES procedure, 1689 CORRB option ESTIMATE statement (MODEL), 982 FIT statement (MODEL), 990 MODEL statement, 492 MODEL statement (AUTOREG), 343 MODEL statement (MDC), 896 MODEL statement (PANEL), 1277 MODEL statement (PDLREG), 1356 MODEL statement (SYSLIN), 1639 MODEL statement (TSCSREG), 1736 QLIM procedure, 1385 CORROUT option PROC PANEL statement, 1270 PROC QLIM statement, 1385 PROC TSCSREG statement, 1734 CORRS option FIT statement (MODEL), 990 COS function, 1159 COSH function, 1159 COST option ENDOGENOUS statement (QLIM), 1389 COUNTREG procedure, 487 syntax, 487 COV option FIT statement (MODEL), 990 COV3OUT option PROC SYSLIN statement, 1635 COVB option ESTIMATE statement (MODEL), 982 FIT statement (MODEL), 990 MODEL statement, 492 MODEL statement (AUTOREG), 343 MODEL statement (MDC), 896 MODEL statement (PANEL), 1277 MODEL statement (PDLREG), 1356 MODEL statement (SYSLIN), 1639 MODEL statement (TSCSREG), 1736 PROC STATESPACE statement, 1584 QLIM procedure, 1385 COVBEST= option ENTROPY procedure, 641 FIT statement (MODEL), 985, 1021 COVEST= option MODEL statement (AUTOREG), 343 MODEL statement (MDC), 895 PROC COUNTREG statement, 490 QLIM procedure, 1385 COVOUT option ENTROPY procedure, 642 FIT statement (MODEL), 988 PROC AUTOREG statement, 339 PROC COUNTREG statement, 489 PROC MDC statement, 890
PROC PANEL statement, 1270 PROC QLIM statement, 1385 PROC SYSLIN statement, 1635 PROC TSCSREG statement, 1734 COVS option FIT statement (MODEL), 990, 1026 CPEV= option OUTPUT statement (AUTOREG), 354 CROSS option PROC SPECTRA statement, 1546 CROSSCORR statement TIMESERIES procedure, 1690 CROSSCORR= option IDENTIFY statement (ARIMA), 228 CROSSLIST= option LIBNAME statement (SASEFAME), 2295 CROSSPLOTS= option PROC TIMESERIES statement, 1686 CROSSVAR statement TIMESERIES procedure, 1698 CRSPLINKPATH= option LIBNAME statement (SASECRSP), 2201 CSPACE= option PROC COMPUTAB statement, 439 CSTART= option PROC FORECAST statement, 791 CUSIP= option LIBNAME statement (SASECRSP), 2199 CUSUM= option OUTPUT statement (AUTOREG), 355 CUSUMLB= option OUTPUT statement (AUTOREG), 355 CUSUMSQ= option OUTPUT statement (AUTOREG), 355 CUSUMSQLB= option OUTPUT statement (AUTOREG), 355 CUSUMSQUB= option OUTPUT statement (AUTOREG), 355 CUSUMUB= option OUTPUT statement (AUTOREG), 355 CUTOFF= option SSPAN statement (X11), 2055 CV= option OUTLIER statement (X12), 2122 CWIDTH= option PROC COMPUTAB statement, 439 CYCLE statement UCM procedure, 1760 DASILVA option MODEL statement (PANEL), 1277 MODEL statement (TSCSREG), 1737 DATA Step IF Statement, 73
WHERE Statement, 73 DATA step DROP statement, 74 KEEP statement, 74 DATA= option ENTROPY procedure, 641 FIT statement (MODEL), 987, 1104 FORECAST command (TSFS), 2546, 2547 IDENTIFY statement (ARIMA), 229 PROC ARIMA statement, 224 PROC AUTOREG statement, 339 PROC COMPUTAB statement, 438 PROC COUNTREG statement, 489 PROC ESM statement, 686 PROC EXPAND statement, 729 PROC FORECAST statement, 791 PROC MDC statement, 889 PROC MODEL statement, 970 PROC PANEL statement, 1270 PROC PDLREG statement, 1355 PROC QLIM statement, 1384 PROC SIMILARITY statement, 1448 PROC SIMLIN statement, 1515, 1522 PROC SPECTRA statement, 1546 PROC STATESPACE statement, 1582 PROC SYSLIN statement, 1635 PROC TIMESERIES statement, 1686 PROC TSCSREG statement, 1734 PROC UCM statement, 1754 PROC VARMAX statement, 1886 PROC X11 statement, 2042 PROC X12 statement, 2109 SOLVE statement (MODEL), 1000, 1151 TSVIEW command (TSFS), 2546, 2547 DATASOURCE procedure, 536 syntax, 536 DATE function, 144 DATE= option MONTHLY statement (X11), 2048 PROC X12 statement, 2109 QUARTERLY statement (X11), 2053 DATEJUL function, 96 DATEJUL function, 144 DATEPART function, 97, 144 DATETIME function, 144 DAY function, 96, 144 DBNAME= option PROC DATASOURCE statement, 538 DBTYPE= option PROC DATASOURCE statement, 538
DECOMP statement TIMESERIES procedure, 1691 DEGREE= option SPLINEREG statement (UCM), 1778 SPLINESEASON statement (UCM), 1779 DELTA= option ESTIMATE statement (ARIMA), 235 DEPLAG statement UCM procedure, 1762 DETAILS option FIT statement (MODEL), 1046 PROC MODEL statement, 973 DETTOL= option PROC STATESPACE statement, 1584 DFPVALUE macro, 153 macro variable, 154, 155 DFTEST macro, 154 DFTEST option MODEL statement (VARMAX), 1901, 2002 DFTEST=(DLAG=) option MODEL statement (VARMAX), 1901 DHMS function, 97 DHMS function, 144 DIAG= option FORECAST command (TSFS), 2549 DIF function, 105 DIF function MODEL procedure, 109 DIF= option BOXCOXAR macro, 151 DFTEST macro, 155 INPUT statement (SIMILARITY), 1454 LOGTEST macro, 157 MODEL statement (VARMAX), 1893 TARGET statement (SIMILARITY), 1458 VAR statement (TIMESERIES), 1698 DIFF= option IDENTIFY statement (X12), 2117 DIFFORDER= option AUTOMDL statement (X12), 2119 DIFX= option MODEL statement (VARMAX), 1894 DIFY= option MODEL statement (VARMAX), 1894, 2013 DIMMAX= option PROC STATESPACE statement, 1584 DISCRETE option ENDOGENOUS statement (QLIM), 1388 DIST= option COUNTREG statement (COUNTREG), 492
MODEL statement (AUTOREG), 341 MODEL statement (COUNTREG), 492 DISTRIBUTION= option ENDOGENOUS statement (QLIM), 1388 DLAG= option DFPVALUE macro, 153 DFTEST macro, 155 DO, 1164 DOL option ROWS statement (COMPUTAB), 442 DOWNPAYMENT= option FIXED statement (LOAN), 842 DOWNPAYPCT= option FIXED statement (LOAN), 842 DP= option FIXED statement (LOAN), 842 DPCT= option FIXED statement (LOAN), 842 DROP statement DATASOURCE procedure, 541 DROP= option FIT statement (MODEL), 984 LIBNAME statement (SASEHAVR), 2344 DROPEVENT statement DATASOURCE procedure, 543 DROPGROUP= option LIBNAME statement (SASEHAVR), 2344 DROPH= option SEASON statement (UCM), 1775 DROPSOURCE= option LIBNAME statement (SASEHAVR), 2344 DUAL option ENTROPY procedure, 643 DUL option ROWS statement (COMPUTAB), 442 DW option FIT statement (MODEL), 990 MODEL statement (SYSLIN), 1639 DW= option MODEL statement (AUTOREG), 343 MODEL statement (PDLREG), 1357 DWPROB option FIT statement (MODEL), 990 MODEL statement (AUTOREG), 343 MODEL statement (PDLREG), 1357 DYNAMIC option FIT statement (MODEL), 985, 1070, 1072 SOLVE statement (MODEL), 1002, 1068, 1117 EBCDIC option PROC DATASOURCE statement, 538 ECM= option MODEL statement (VARMAX), 1906
ECM=(ECTREND) option MODEL statement (VARMAX), 1906, 1970 ECM=(NORMALIZE=) option MODEL statement (VARMAX), 1871, 1906 ECM=(RANK=) option MODEL statement (VARMAX), 1871, 1906 Empirical Distribution Estimation ERRORMODEL statement (MODEL), 1023 EMPIRICAL= option ERRORMODEL statement (MODEL), 981 END= option ID statement (ESM), 693 ID statement (SIMILARITY), 1452 ID statement (TIMESERIES), 1695 LIBNAME statement (SASEHAVR), 2344 MONTHLY statement (X11), 2048 QUARTERLY statement (X11), 2053 ENDOGENOUS statement MODEL procedure, 979, 1151 SIMLIN procedure, 1516 SYSLIN procedure, 1637 ENTROPY procedure, 638 syntax, 638 ENTRY= option FORECAST command (TSFS), 2548 EPSILON= option FIT statement (MODEL), 992 ERRORMODEL statement MODEL procedure, 980 ERRSTD OUTPUT statement (QLIM), 1393 ESACF option IDENTIFY statement (ARIMA), 229 ESM, 681 ESM procedure, 684 syntax, 684 EST= option PROC SIMLIN statement, 1515, 1521 ESTDATA= option FIT statement (MODEL), 988, 1105 SOLVE statement (MODEL), 1000, 1121, 1149 ESTIMATE statement ARIMA procedure, 232 MODEL procedure, 981 UCM procedure, 1763 X12 procedure, 2115 ESTIMATEDCASE= option ARM statement (LOAN), 847 ESTPRINT option PROC SIMLIN statement, 1515 ESUPPORTS= option MODEL statement (ENTROPY), 647 EVENT statement
X12 procedure, 2112 EXCLUDE= option FIT statement (MODEL), 1086 INSTRUMENTS statement (MODEL), 995 MONTHLY statement (X11), 2048 EXOGENEITY option COINTEG statement (VARMAX), 1890, 1976 EXOGENOUS statement MODEL procedure, 983, 1151 SIMLIN procedure, 1517 EXP function, 1159 EXPAND procedure, 727 CONVERT statement, 742 syntax, 727 EXPAND= option TARGET statement (SIMILARITY), 1458 EXPECTED OUTPUT statement (QLIM), 1393 EXTRADIFFUSE= option ESTIMATE statement (UCM), 1763 FORECAST statement (UCM), 1766 EXTRAPOLATE option PROC EXPAND statement, 730 F option ERRORMODEL statement (MODEL), 980 FACTOR= option PROC EXPAND statement, 729, 733, 734 FAMEOUT= option LIBNAME statement (SASEFAME), 2297 FAMEPRINT option PROC DATASOURCE statement, 538 FCMPOPT statement SIMILARITY procedure, 1451 FILETYPE= option PROC DATASOURCE statement, 538 FIML option FIT statement (MODEL), 985, 1019, 1105, 1211 PROC SYSLIN statement, 1635 FINAL= option X11 statement (X12), 2135 FIRST option PROC SYSLIN statement, 1636 FIT statement MODEL procedure, 984 FIT statement, MODEL procedure GINV= option, 985 FIXED statement LOAN procedure, 841 FIXEDCASE option ARM statement (LOAN), 847 FIXONE option
MODEL statement (PANEL), 1277 MODEL statement (TSCSREG), 1736 FIXONETIME option MODEL statement (PANEL), 1277 FIXTWO option MODEL statement (PANEL), 1277 MODEL statement (TSCSREG), 1737 FLATDATA statement PANEL procedure, 1272 FLOW option PROC MODEL statement, 973 FORCE=FREQ option LIBNAME statement (SASEHAVR), 2344 FORCE= option X11 statement (X12), 2135 FORECAST macro, 2547 FORECAST option SOLVE statement (MODEL), 1002, 1117, 1120 FORECAST procedure, 788 syntax, 788 FORECAST statement ARIMA procedure, 237 ESM procedure, 689 UCM procedure, 1765 X12 procedure, 2116 FORECAST= option ARIMA statement (X11), 2044 FORM statement STATESPACE procedure, 1586 FORM= option GARCH statement, 1907 FORMAT statement DATASOURCE procedure, 545 FORMAT= option ATTRIBUTE statement (DATASOURCE), 545 COLUMNS statement (COMPUTAB), 441 ID statement (ESM), 693 ID statement (SIMILARITY), 1453 ID statement (TIMESERIES), 1695 ROWS statement (COMPUTAB), 443 FREQ= option LIBNAME statement (SASEHAVR), 2344 FREQUENCY= option PROC DATASOURCE statement, 539 FROM= option PROC EXPAND statement, 729, 733, 734 FRONTIER option ENDOGENOUS statement (QLIM), 1389 FSRSQ option FIT statement (MODEL), 991, 1010, 1087 FULLER option
MODEL statement (PANEL), 1277 MODEL statement (TSCSREG), 1737 FULLWEIGHT= option MONTHLY statement (X11), 2049 QUARTERLY statement (X11), 2053 FUNCTION= option TRANSFORM statement (X12), 2131 FUZZ= option PROC COMPUTAB statement, 438 GARCH statement VARMAX procedure, 1907 GARCH= option MODEL statement (AUTOREG), 341 GCE option ENTROPY procedure, 641 GCENM option ENTROPY procedure, 641 GCONV= option ENTROPY procedure, 643 GENERAL= option ERRORMODEL statement (MODEL), 980 GENGMMV option FIT statement (MODEL), 985 GETDER function, 1158 GINV option MODEL statement (AUTOREG), 343 GINV= option FIT statement (MODEL), 985 MODEL statement (PANEL), 1277 GME option ENTROPY procedure, 641 GMED option ENTROPY procedure, 641 GMENM option ENTROPY procedure, 641 GMM option FIT statement (MODEL), 985, 1011, 1054, 1105, 1108, 1109, 1111 MODEL statement (PANEL), 1277 GODFREY option FIT statement (MODEL), 991 MODEL statement (AUTOREG), 344 GRAPH option PROC MODEL statement, 972, 1178 GRID option ESTIMATE statement (ARIMA), 235 GRIDVAL= option ESTIMATE statement (ARIMA), 235 GROUP1 option CAUSAL statement (VARMAX), 1889 GROUP2 option CAUSAL statement (VARMAX), 1889 GROUP= option
LIBNAME statement (SASEHAVR), 2344 GVKEY= option LIBNAME statement (SASECRSP), 2198 H= option COINTEG statement (VARMAX), 1890, 1973 HALTONSTART= option MODEL statement (MDC), 892 HAUSMAN option FIT statement (MODEL), 991, 1080 HCCME= option FIT statement (MODEL), 986 MODEL statement (PANEL), 1277 HCUSIP= option LIBNAME statement (SASECRSP), 2199 HESSIAN= option FIT statement (MODEL), 992, 1021 HETERO statement AUTOREG procedure, 350 HEV option MODEL statement (MDC), 892 HMS function, 97 HMS function, 144 HOLIDAY function, 144 HORIZON= option FORECAST command (TSFS), 2548 HOUR function, 144 HRINITIAL option AUTOMDL statement (X12), 2120 HT= option OUTPUT statement (AUTOREG), 354 I option FIT statement (MODEL), 991, 1043 MODEL statement (PDLREG), 1357 MODEL statement (SYSLIN), 1639 ID statement ENTROPY procedure, 646 ESM procedure, 691 EXPAND procedure, 733 FORECAST procedure, 795 MDC procedure, 891 MODEL procedure, 993 PANEL procedure, 1273 SIMILARITY procedure, 1451 SIMLIN procedure, 1517 STATESPACE procedure, 1586 TIMESERIES procedure, 1693 TSCSREG procedure, 1735 UCM procedure, 1767 VARMAX procedure, 1892
X11 procedure, 2046 X12 procedure, 2111 ID= option FORECAST command (TSFS), 2546, 2547 FORECAST statement (ARIMA), 238 OUTLIER statement (ARIMA), 237 TSVIEW command (TSFS), 2546, 2547 IDENTIFY statement ARIMA procedure, 228, 236 X12 procedure, 2117 IDENTITY statement SYSLIN procedure, 1638 IF, 1164 INCLUDE, 1168 INCLUDE statement MODEL procedure, 993 INDEX option PROC DATASOURCE statement, 538 INDID = option PROC PANEL statement, 1272 INDNO= option LIBNAME statement (SASECRSP), 2200 INEVENT= option PROC X12 statement, 2111 INFILE= option PROC DATASOURCE statement, 538 INIT statement COMPUTAB procedure, 444 COUNTREG procedure, 491 QLIM procedure, 1391 INIT= option FIXED statement (LOAN), 843 INITIAL statement STATESPACE procedure, 1587 INITIAL= option FIT statement (MODEL), 984, 1072 FIXED statement (LOAN), 843 MODEL statement (AUTOREG), 348 MODEL statement (MDC), 896 INITIALPCT= option FIXED statement (LOAN), 843 INITMISS option PROC COMPUTAB statement, 438 INITPCT= option FIXED statement (LOAN), 843 INITVAL= option ESTIMATE statement (ARIMA), 234 INPUT statement SIMILARITY procedure, 1454 X12 procedure, 2113 INPUT= option ESTIMATE statement (ARIMA), 232, 253 INSET= option LIBNAME statement (SASECRSP), 2202
LIBNAME statement (SASEFAME), 2294 INSTRUMENTS statement MODEL procedure, 994, 1084 SYSLIN procedure, 1638 INTCINDEX function, 145 INTCK function, 99 INTCK function, 145 INTCYCLE function, 145 INTEGRATE= option MODEL statement (MDC), 892 INTERIM= option PROC SIMLIN statement, 1515 INTERVAL= option FORECAST command (TSFS), 2546, 2548 FORECAST statement (ARIMA), 238, 259 ID statement (ESM), 693 ID statement (SIMILARITY), 1453 ID statement (TIMESERIES), 1695 ID statement (UCM), 1767 ID statement (VARMAX), 1892 PROC DATASOURCE statement, 539 PROC FORECAST statement, 791 PROC STATESPACE statement, 1585 PROC X12 statement, 2110 TSVIEW command (TSFS), 2546, 2548 INTERVAL=option FIXED statement (LOAN), 843 INTFIT function, 145 INTFMT function, 145 INTGET function, 145 INTGPRINT option SOLVE statement (MODEL), 1004 INTINDEX function, 146 INTNX function, 98 INTNX function, 146 INTONLY option INSTRUMENTS statement (MODEL), 995 INTORDER= option MODEL statement (MDC), 892 INTPER= option PROC FORECAST statement, 791 PROC STATESPACE statement, 1585 INTSEAS function, 146 INTSHIFT function, 147 INTTEST function, 147 IRREGULAR statement UCM procedure, 1768 IT2SLS option FIT statement (MODEL), 986 IT3SLS option FIT statement (MODEL), 986 PROC SYSLIN statement, 1636
ITALL option FIT statement (MODEL), 991, 1044 ITDETAILS option FIT statement (MODEL), 991, 1042 ITGMM option FIT statement (MODEL), 986, 1015 MODEL statement (PANEL), 1277 ITOLS option FIT statement (MODEL), 986 ITPRINT option ENTROPY procedure, 642 ESTIMATE statement (X12), 2116 FIT statement (MODEL), 992, 1036, 1042 MODEL statement, 492 MODEL statement (AUTOREG), 344 MODEL statement (MDC), 896 MODEL statement (PANEL), 1278 MODEL statement (PDLREG), 1357 PROC STATESPACE statement, 1584 PROC SYSLIN statement, 1637 QLIM procedure, 1385 SOLVE statement (MODEL), 1005, 1145 ITSUR option FIT statement (MODEL), 986, 1011 PROC SYSLIN statement, 1636 J= option COINTEG statement (VARMAX), 1891 JACOBI option SOLVE statement (MODEL), 1003 JULDATE function, 96, 147 K option PROC SPECTRA statement, 1546 K= option MODEL statement (SYSLIN), 1639 PROC SYSLIN statement, 1636 KEEP = option PROC PANEL statement, 1272 KEEP statement DATASOURCE procedure, 541 KEEP= option FORECAST command (TSFS), 2549 LIBNAME statement (SASEHAVR), 2344 KEEPEVENT statement DATASOURCE procedure, 542 KEEPH= option SEASON statement (UCM), 1776 KERNEL option FIT statement (MODEL), 1108 KERNEL= option FIT statement (MODEL), 986, 1013 KLAG= option PROC STATESPACE statement, 1584
KNOTS= option SPLINEREG statement (UCM), 1778 SPLINESEASON statement (UCM), 1779 L= option FIXED statement (LOAN), 841 _LABEL_ option COLUMNS statement (COMPUTAB), 440 ROWS statement (COMPUTAB), 442 LABEL statement DATASOURCE procedure, 546 MODEL procedure, 995 LABEL= option ATTRIBUTE statement (DATASOURCE), 545 FIXED statement (LOAN), 843 LAG function, 105 LAG function MODEL procedure, 109 LAG statement PANEL procedure, 1274 LAGDEP option MODEL statement (AUTOREG), 344 MODEL statement (PDLREG), 1357 LAGDEP= option MODEL statement (AUTOREG), 344 MODEL statement (PDLREG), 1357 LAGDV option MODEL statement (AUTOREG), 344 MODEL statement (PDLREG), 1357 LAGDV= option MODEL statement (AUTOREG), 344 MODEL statement (PDLREG), 1357 LAGGED statement SIMLIN procedure, 1517 LAGMAX= option MODEL statement (VARMAX), 1896 PROC STATESPACE statement, 1582 LAGRANGE option TEST statement (ENTROPY), 650 TEST statement (MODEL), 1006 LAGS= option CORR statement (TIMESERIES), 1690 CROSSCORR statement (TIMESERIES), 1691 DEPLAG statement (UCM), 1762 LAMBDA= option DECOMP statement (TIMESERIES), 1692 LAMBDAHI= option BOXCOXAR macro, 151 LAMBDALO= option BOXCOXAR macro, 151 LCL= option
OUTPUT statement (AUTOREG), 355 OUTPUT statement (PDLREG), 1359 LCLM= option OUTPUT statement (AUTOREG), 355 OUTPUT statement (PDLREG), 1359 LDW option MODEL statement (AUTOREG), 349 LEAD= option FORECAST statement (ARIMA), 238 FORECAST statement (UCM), 1766 FORECAST statement (X12), 2117 OUTPUT statement (VARMAX), 1909 PROC ESM statement, 686 PROC FORECAST statement, 791 PROC STATESPACE statement, 1585 LENGTH option MONTHLY statement (X11), 2049 LENGTH statement DATASOURCE procedure, 546 LENGTH= option ATTRIBUTE statement (DATASOURCE), 545 SEASON statement (UCM), 1776 SPLINESEASON statement (UCM), 1780 LEVEL statement UCM procedure, 1771 LIBNAME libref SASECRSP statement, 2195 LIFE= option FIXED statement (LOAN), 841 LIKE option TEST statement (ENTROPY), 650 TEST statement (MODEL), 1006 LIMIT1= option MODEL statement (QLIM), 1392 LIML option PROC SYSLIN statement, 1636 LINK= option HETERO statement (AUTOREG), 351 LIST option FIT statement (MODEL), 1197 PROC MODEL statement, 972, 1169 LISTALL option PROC MODEL statement, 972 LISTCODE option PROC MODEL statement, 972, 1171 LISTDEP option PROC MODEL statement, 972, 1174 LISTDER option PROC MODEL statement, 973 LJC option COLUMNS statement (COMPUTAB), 441 ROWS statement (COMPUTAB), 443 LJUNGBOXLIMIT= option AUTOMDL statement (X12), 2120
LM option TEST statement (ENTROPY), 650 TEST statement (MODEL), 1006 TEST statement (PANEL), 1281 TEST statement (QLIM), 1395 LOAN procedure, 838 syntax, 838 LOG function, 1159 LOG10 function, 1159 LOG2 function, 1159 LOGLIKL option MODEL statement (AUTOREG), 344 LOGNORMALPARM= option MODEL statement (MDC), 893 LOGTEST macro, 156 macro variable, 157 LOWERBOUND= option ENDOGENOUS statement (QLIM), 1388, 1389 LR option TEST statement (ENTROPY), 650 TEST statement (MODEL), 1006 TEST statement (PANEL), 1281 TEST statement (QLIM), 1395 LRECL= option PROC DATASOURCE statement, 539 LSCV= option OUTLIER statement (X12), 2123 LTEBOUND= option FIT statement (MODEL), 992, 1147 MODEL statement (MODEL), 1147 SOLVE statement (MODEL), 1147 M= option MODEL statement (PANEL), 1278 MODEL statement (TSCSREG), 1737 %MA macro, 1100, 1101 MA= option ESTIMATE statement (ARIMA), 234 MACURVES statement X11 procedure, 2046 MAPECR= option ARIMA statement (X11), 2044 MARGINAL OUTPUT statement (QLIM), 1393 MARGINALS option MODEL statement (ENTROPY), 647 MARKOV option ENTROPY procedure, 641 MARR= option COMPARE statement (LOAN), 849 MAXAD= option ARM statement (LOAN), 846
MAXADJUST= option ARM statement (LOAN), 846 MAXBAND= option MODEL statement (PANEL), 1278 MAXDIFF= option AUTOMDL statement (X12), 2119 MAXERROR= option PROC ESM statement, 686 PROC TIMESERIES statement, 1686 MAXERRORS= option PROC FORECAST statement, 791 PROC MODEL statement, 973 MAXIT= PROC SYSLIN statement, 1636 MAXIT= option ESTIMATE statement (ARIMA), 235 PROC STATESPACE statement, 1584 MAXITER= option ARIMA statement (X11), 2044 ENTROPY procedure, 643 ESTIMATE statement (ARIMA), 235 ESTIMATE statement (X12), 2116 FIT statement (MODEL), 992 MODEL statement (AUTOREG), 349 MODEL statement (MDC), 896 MODEL statement (PANEL), 1278 MODEL statement (PDLREG), 1357 PROC SYSLIN statement, 1636 SOLVE statement (MODEL), 1004 MAXNUM= option OUTLIER statement (ARIMA), 237 OUTLIER statement (UCM), 1773 MAXORDER= option AUTOMDL statement (X12), 2119 MAXPCT= option OUTLIER statement (ARIMA), 237 OUTLIER statement (UCM), 1773 MAXR= option ARM statement (LOAN), 846 MAXRATE= option ARM statement (LOAN), 846 MAXSUBITER= option ENTROPY procedure, 643 FIT statement (MODEL), 992, 1028 SOLVE statement (MODEL), 1004 MDC procedure, 887 syntax, 887 MDC procedure, MODEL statement ADDMAXIT= option, 894 ADDRANDOM option, 894 ADDVALUE option, 895 ALL option, 896 CHOICE= option, 892 CONVERGE= option, 892
CORRB option, 896 COVB option, 896 COVEST= option, 895 HALTONSTART= option, 892 HEV= option, 892 INITIAL= option, 896 ITPRINT option, 896 LOGNORMALPARM= option, 893 MAXITER= option, 896 MIXED= option, 893 NCHOICE option, 893 NOPRINT option, 896 NORMALEC= option, 893 NORMALPARM= option, 893 NSIMUL option, 893 OPTMETHOD= option, 896 RANDINIT option, 894 RANDNUM= option, 893 RANK option, 894 RESTART= option, 894 SAMESCALE option, 895 SEED= option, 895 SPSCALE option, 895 TYPE= option, 895 UNIFORMEC= option, 893 UNIFORMPARM= option, 893 UNITVARIANCE= option, 895 MDC procedure, OUTPUT statement OUT= option, 901 P= option, 901 XBETA= option, 901 MDC procedure, PROC MDC statement COVOUT option, 890 DATA= option, 889 OUTEST= option, 890 MDCDATA statement, 890 MDLINFOIN= option PROC X12 statement, 2111 MDLINFOOUT= option PROC X12 statement, 2111 MDY function, 96 MDY function, 147 MEAN= option MODEL statement (AUTOREG), 342 MEASURE= option TARGET statement (SIMILARITY), 1459 MEDIAN option FORECAST statement (ESM), 690 MELO option PROC SYSLIN statement, 1636 MEMORYUSE option PROC MODEL statement, 974 METHOD= option
ARIMA statement (X11), 2044 CONVERT statement (EXPAND), 732, 739 ENTROPY procedure, 644 ESTIMATE statement (ARIMA), 232 FIT statement (MODEL), 992, 1027 MODEL statement (AUTOREG), 349 MODEL statement (PDLREG), 1357 MODEL statement (VARMAX), 1894 PROC COUNTREG statement, 490 PROC EXPAND statement, 730, 739 PROC FORECAST statement, 791 QLIM procedure, 1386 MILLS OUTPUT statement (QLIM), 1393 MINIC option IDENTIFY statement (ARIMA), 229 PROC STATESPACE statement, 1583 MINIC= option MODEL statement (VARMAX), 1900 MINIC=(P=) option MODEL statement (VARMAX), 1901, 1940 MINIC=(PERROR=) option MODEL statement (VARMAX), 1901 MINIC=(Q=) option MODEL statement (VARMAX), 1901, 1940 MINIC=(TYPE=) option MODEL statement (VARMAX), 1901 MINR= option ARM statement (LOAN), 846 MINRATE= option ARM statement (LOAN), 846 MINTIMESTEP= option FIT statement (MODEL), 993, 1147 MODEL statement (MODEL), 1147 SOLVE statement (MODEL), 1147 MINUTE function, 147 MISSING=option FIT statement (MODEL), 988 MIXED option MODEL statement (MDC), 893 MODE= option DECOMP statement (TIMESERIES), 1692 X11 statement (X12), 2133 MODEL procedure, 962 syntax, 962 MODEL statement AUTOREG procedure, 340 COUNTREG procedure, 491 ENTROPY procedure, 646 MDC procedure, 892 PANEL procedure, 1275, 1276 PDLREG procedure, 1356 QLIM procedure, 1391
SYSLIN procedure, 1638 TSCSREG procedure, 1736 UCM procedure, 1772 VARMAX procedure, 1892 MODEL= option ARIMA statement (X11), 2044 ARIMA statement (X12), 2115 FORECAST statement (ESM), 690 PROC MODEL statement, 971, 1167 MOMENT statement MODEL procedure, 996 MONTH function, 96, 147 MONTHLY statement X11 procedure, 2047 MTITLE= option COLUMNS statement (COMPUTAB), 440 MU= option ESTIMATE statement (ARIMA), 235 +n option COLUMNS statement (COMPUTAB), 441 ROWS statement (COMPUTAB), 442 N2SLS option FIT statement (MODEL), 986 N3SLS option FIT statement (MODEL), 986 NAHEAD= option SOLVE statement (MODEL), 1002, 1117, 1118 _NAME_ option COLUMNS statement (COMPUTAB), 441 ROWS statement (COMPUTAB), 442 NBACKCAST= option FORECAST statement (ESM), 690 NBLOCKS= option BLOCKSEASON statement (UCM), 1759 NCHOICE option MODEL statement (MDC), 893 NDEC= option MONTHLY statement (X11), 2049 PROC MODEL statement, 973 QUARTERLY statement (X11), 2054 SSPAN statement (X11), 2055 NDRAW option FIT statement (MODEL), 986 NDRAW= option QLIM procedure, 1386 NEST statement MDC procedure, 897 NESTIT option FIT statement (MODEL), 993, 1027 NEWTON option SOLVE statement (MODEL), 1003
NKNOTS= option SPLINEREG statement (UCM), 1778 NLAG= option CORR statement (TIMESERIES), 1690 CROSSCORR statement (TIMESERIES), 1691 IDENTIFY statement (ARIMA), 229 MODEL statement (AUTOREG), 341 MODEL statement (PDLREG), 1358 NLAGS= option PROC FORECAST statement, 790 NLAMBDA= option BOXCOXAR macro, 151 NLOPTIONS statement MDC procedure, 900 QLIM procedure, 1392 UCM procedure, 1772 VARMAX procedure, 1908, 1953 NO2SLS option FIT statement (MODEL), 987 NO3SLS option FIT statement (MODEL), 987 NOCENTER option PROC STATESPACE statement, 1583 NOCOMPRINT option COMPARE statement (LOAN), 850 NOCONST option HETERO statement (AUTOREG), 352 NOCONSTANT option ESTIMATE statement (ARIMA), 233 NOCURRENTX option MODEL statement (VARMAX), 1894 NODF option ESTIMATE statement (ARIMA), 233 NODIFFS option MODEL statement (PANEL), 1278 NOEST option AUTOREG statement (UCM), 1757 BLOCKSEASON statement (UCM), 1759 CYCLE statement (UCM), 1761 DEPLAG statement (UCM), 1762 ESTIMATE statement (ARIMA), 235 IRREGULAR statement (UCM), 1768, 1770 LEVEL statement (UCM), 1772 PROC STATESPACE statement, 1584 RANDOMREG statement (UCM), 1774 SEASON statement (UCM), 1776 SLOPE statement (UCM), 1777 SPLINEREG statement (UCM), 1779 SPLINESEASON statement (UCM), 1780 NOESTIM option MODEL statement (PANEL), 1278 NOGENGMMV option
FIT statement (MODEL), 987 NOINCLUDE option PROC SYSLIN statement, 1636 NOINT option ARIMA statement (X11), 2045 AUTOMODL statement (X12), 2119 ESTIMATE statement (ARIMA), 233 INSTRUMENTS statement (MODEL), 995 MODEL statement (AUTOREG), 340, 342 MODEL statement (COUNTREG), 492 MODEL statement (ENTROPY), 647 MODEL statement (PANEL), 1278 MODEL statement (PDLREG), 1358 MODEL statement (QLIM), 1392 MODEL statement (SYSLIN), 1639 MODEL statement (TSCSREG), 1737 MODEL statement (VARMAX), 1895 NOINTERCEPT option INSTRUMENTS statement (MODEL), 995 NOLEVELS option MODEL statement (PANEL), 1278 NOLS option ESTIMATE statement (ARIMA), 236 NOMEAN option MODEL statement (TSCSREG), 1737 NOMISS option IDENTIFY statement (ARIMA), 230 MODEL statement (AUTOREG), 350 NOOLS option FIT statement (MODEL), 987 NOOUTALL option FORECAST statement (ARIMA), 238 PROC ESM statement, 686 NOP option FIXED statement (LOAN), 844 NOPRINT option ARIMA statement (X11), 2045 COLUMNS statement (COMPUTAB), 441 ESTIMATE statement (ARIMA), 233 FIXED statement (LOAN), 844 FORECAST statement (ARIMA), 238 IDENTIFY statement (ARIMA), 230 MODEL statement (AUTOREG), 344 MODEL statement (MDC), 896 MODEL statement (PANEL), 1278 MODEL statement (PDLREG), 1358 MODEL statement (SYSLIN), 1639 MODEL statement (TSCSREG), 1737 MODEL statement (VARMAX), 1896 OUTPUT statement (VARMAX), 1909 PROC COMPUTAB statement, 439 PROC COUNTREG statement, 490 PROC ENTROPY statement, 642 PROC MODEL statement, 973
PROC QLIM statement, 1385 PROC SIMLIN statement, 1515 PROC STATESPACE statement, 1582 PROC SYSLIN statement, 1637 PROC UCM statement, 1754 PROC VARMAX statement (VARMAX), 1988 PROC X11 statement, 2043 ROWS statement (COMPUTAB), 443 SSPAN statement (X11), 2055 NOPROFILE ESTIMATE statement (UCM), 1763 NORED option PROC SIMLIN statement, 1515 NORMAL option ERRORMODEL statement (MODEL), 980 FIT statement (MODEL), 991 MODEL statement (AUTOREG), 344 NORMALEC= option MODEL statement (MDC), 893 NORMALIZE= option COINTEG statement (VARMAX), 1891, 2002 INPUT statement (SIMILARITY), 1454 TARGET statement (SIMILARITY), 1460 NORMALPARM= option MODEL statement (MDC), 893 NORTR option PROC COMPUTAB statement, 439 NOSTABLE option ESTIMATE statement (ARIMA), 236 NOSTORE option PROC MODEL statement, 971 NOSUM TABLES statement (X12), 2130 NOSUMMARYPRINT option FIXED statement (LOAN), 844 NOSUMPR option FIXED statement (LOAN), 844 NOTFSTABLE option ESTIMATE statement (ARIMA), 236 NOTRANS option PROC COMPUTAB statement, 439 NOTRANSPOSE option PROC COMPUTAB statement, 438 NOTSORTED option ID statement (ESM), 693 ID statement (SIMILARITY), 1453 ID statement (TIMESERIES), 1695 NOZERO option COLUMNS statement (COMPUTAB), 441 ROWS statement (COMPUTAB), 443 NPARMS= option CORR statement (TIMESERIES), 1690
NPERIODS= option DECOMP statement (TIMESERIES), 1693 TREND statement (TIMESERIES), 1697 NPREOBS option FIT statement (MODEL), 987 NSEASON= option MODEL statement (VARMAX), 1895 NSIMUL option MODEL statement (MDC), 893 NSSTART= MAX option PROC FORECAST statement, 792 NSSTART= option PROC FORECAST statement, 792 NSTART= MAX option PROC FORECAST statement, 792 NSTART= option PROC FORECAST statement, 792 NVDRAW option FIT statement (MODEL), 987 NWKDOM function, 147 OBSERVED= option CONVERT statement (EXPAND), 732, 737 PROC EXPAND statement, 730 OFFSET= option BLOCKSEASON statement (UCM), 1759 MODEL statement (COUNTREG), 492 SPLINESEASON statement (UCM), 1780 OL option ROWS statement (COMPUTAB), 443 OLS option FIT statement (MODEL), 987, 1180 PROC SYSLIN statement, 1636 ONEPASS option SOLVE statement (MODEL), 1003 OPTIONS option PROC COMPUTAB statement, 439 OPTMETHOD= option MODEL statement (AUTOREG), 350 MODEL statement (MDC), 896 ORDER= option ENDOGENOUS statement (QLIM), 1388 PROC SIMILARITY statement, 1448 OTHERWISE, 1166 OUT = option FlatData statement (PANEL), 1273 OUT1STEP option PROC FORECAST statement, 793 OUT= option BOXCOXAR macro, 151 DFTEST macro, 155 ENTROPY procedure, 641 FIT statement (MODEL), 988, 1110, 1182
FIXED statement (LOAN), 844, 853 FORECAST command (TSFS), 2549 FORECAST statement (ARIMA), 238, 262 LOGTEST macro, 157 OUTPUT statement (AUTOREG), 354 OUTPUT statement (COUNTREG), 493 OUTPUT statement (MDC), 901 OUTPUT statement (PANEL), 1279 OUTPUT statement (PDLREG), 1358 OUTPUT statement (QLIM), 1393 OUTPUT statement (SIMLIN), 1518 OUTPUT statement (SYSLIN), 1655 OUTPUT statement (VARMAX), 1909, 1987 OUTPUT statement (X11), 2051, 2071 OUTPUT statement (X12), 2121 PROC ARIMA statement, 227 PROC COMPUTAB statement, 439 PROC DATASOURCE statement, 539, 548 PROC ESM statement, 686 PROC EXPAND statement, 729, 756 PROC FORECAST statement, 792, 806 PROC SIMILARITY statement, 1448 PROC SIMLIN statement, 1523 PROC SPECTRA statement, 1546, 1552 PROC STATESPACE statement, 1585, 1601 PROC SYSLIN statement, 1635 PROC TIMESERIES statement, 1686 SOLVE statement (MODEL), 1001, 1121, 1150 TEST statement (ENTROPY), 650 TEST statement (MODEL), 1006 OUTACTUAL option FIT statement (MODEL), 988 PROC FORECAST statement, 792 SOLVE statement (MODEL), 1001 OUTALL option FIT statement (MODEL), 988 PROC FORECAST statement, 793 SOLVE statement (MODEL), 1001 OUTALL= option PROC DATASOURCE statement, 540, 552 OUTAR= option PROC STATESPACE statement, 1583, 1601 OUTBY= option PROC DATASOURCE statement, 540, 551 OUTCOMP= option COMPARE statement (LOAN), 850, 854 OUTCONT= option PROC DATASOURCE statement, 540, 550 OUTCORR option ESTIMATE statement (ARIMA), 234 PROC PANEL statement, 1270 PROC TSCSREG statement, 1734
OUTCORR= option PROC TIMESERIES statement, 1686 OUTCOV option ENTROPY procedure, 642 ESTIMATE statement (ARIMA), 234 ESTIMATE statement (MODEL), 982 FIT statement (MODEL), 988 PROC PANEL statement, 1270 PROC SYSLIN statement, 1635 PROC TSCSREG statement, 1734 PROC VARMAX statement, 1886, 1989 OUTCOV3 option PROC SYSLIN statement, 1635 OUTCOV= option IDENTIFY statement (ARIMA), 230, 263 OUTCROSSCORR= option PROC TIMESERIES statement, 1686 OUTDECOMP= option PROC TIMESERIES statement, 1686 OUTERRORS option SOLVE statement (MODEL), 1001 OUTEST= option ENTROPY procedure, 642 ENTROPY statement, 664 ESTIMATE statement (ARIMA), 234, 264 ESTIMATE statement (MODEL), 982 ESTIMATE statement (UCM), 1764 FIT statement (MODEL), 988, 1112 PROC AUTOREG statement, 339 PROC COUNTREG statement, 489 PROC ESM statement, 687 PROC EXPAND statement, 729, 756 PROC FORECAST statement, 793, 808 PROC MDC statement, 890 PROC PANEL statement, 1270, 1321 PROC QLIM statement, 1385 PROC SIMLIN statement, 1515, 1522 PROC SYSLIN statement, 1635, 1655 PROC TSCSREG statement, 1734 PROC VARMAX statement, 1886, 1989 OUTESTALL option PROC FORECAST statement, 793 OUTESTTHEIL option PROC FORECAST statement, 793 OUTEVENT= option PROC DATASOURCE statement, 540, 553 OUTEXTRAP option PROC X11 statement, 2042 OUTFITSTATS option PROC FORECAST statement, 793 OUTFOR= option FORECAST statement (UCM), 1766 PROC ESM statement, 687 OUTFORECAST option
X11 statement (X12), 2134 OUTFULL option PROC FORECAST statement, 793 OUTHT= option GARCH statement, 1907 PROC VARMAX statement, 1991 OUTL= option ENTROPY procedure, 642 ENTROPY statement, 665 OUTLAGS option FIT statement (MODEL), 988 SOLVE statement (MODEL), 1001 OUTLIER statement UCM procedure, 1773 X12 procedure, 2122 OUTLIMIT option PROC FORECAST statement, 793 OUTMEASURE= option PROC SIMILARITY statement, 1448 OUTMODEL= option ESTIMATE statement (ARIMA), 234, 267 PROC MODEL statement, 971, 1167 PROC STATESPACE statement, 1584, 1602 OUTP= option ENTROPY procedure, 642 ENTROPY statement, 664 OUTPARMS= option FIT statement (MODEL), 1112 PROC MODEL statement, 970, 1107 OUTPATH= option PROC SIMILARITY statement, 1448 OUTPREDICT option FIT statement (MODEL), 988 SOLVE statement (MODEL), 1001 OUTPROCINFO= option PROC ESM statement, 687 OUTPUT OUT=, 384 OUTPUT statement AUTOREG procedure, 354 COUNTREG procedure, 493 PANEL procedure, 1279 PDLREG procedure, 1358 PROC PANEL statement, 1320 QLIM procedure, 1392 SIMLIN procedure, 1517 SYSLIN procedure, 1640 VARMAX procedure, 1908 X11 procedure, 2051 X12 procedure, 2121 OUTRESID option FIT statement (MODEL), 988, 1182 PROC FORECAST statement, 793 SOLVE statement (MODEL), 1001
OUTS= option ENTROPY procedure, 642 FIT statement (MODEL), 989, 1026, 1112 OUTSEASON= option PROC TIMESERIES statement, 1686 OUTSELECT= option PROC DATASOURCE statement, 540 OUTSEQUENCE= option PROC SIMILARITY statement, 1448 OUTSN= option FIT statement (MODEL), 989 OUTSPAN= option PROC X11 statement, 2043, 2071 VAR statement (X11), 2071 OUTSSCP= option PROC SYSLIN statement, 1635, 1656 OUTSTAT= option DFTEST macro, 155 ESTIMATE statement (ARIMA), 234, 268 PROC ESM statement, 687 PROC VARMAX statement, 1886, 1992 OUTSTB= option PROC X11 statement, 2043, 2071 OUTSTD option PROC FORECAST statement, 793 OUTSUM= option FIXED statement (LOAN), 844 PROC ESM statement, 687 PROC LOAN statement, 841, 854 PROC SIMILARITY statement, 1448 PROC TIMESERIES statement, 1687 OUTSUSED= option ENTROPY procedure, 642 FIT statement (MODEL), 989, 1026, 1113 OUTTDR= option PROC X11 statement, 2043, 2072 OUTTRANS= option PROC PANEL statement, 1322 OUTTRANS=option PROC PANEL statement, 1270 OUTTREND= option PROC TIMESERIES statement, 1687 OUTUNWGTRESID option FIT statement (MODEL), 989 OUTV= option FIT statement (MODEL), 989, 1109, 1113 OUTVARS statement MODEL procedure, 997 OVDIFCR= option ARIMA statement (X11), 2045 OVERID option MODEL statement (SYSLIN), 1639 OVERPRINT option ROWS statement (COMPUTAB), 443
P option IRREGULAR statement (UCM), 1770 PROC SPECTRA statement, 1546 P= option ESTIMATE statement (ARIMA), 233 FIXED statement (LOAN), 842 GARCH statement, 1907 IDENTIFY statement (ARIMA), 230 MODEL statement (AUTOREG), 341 MODEL statement (VARMAX), 1900 OUTPUT statement (AUTOREG), 355 OUTPUT statement (MDC), 901 OUTPUT statement (PANEL), 1279 OUTPUT statement (PDLREG), 1359 OUTPUT statement (SIMLIN), 1518 _PAGE_ option COLUMNS statement (COMPUTAB), 441 ROWS statement (COMPUTAB), 443 PANEL procedure, 1268 syntax, 1268 PARAMETERS statement MODEL procedure, 997, 1151 PARKS option MODEL statement (PANEL), 1278 MODEL statement (TSCSREG), 1737 PARMS= option FIT statement (MODEL), 984 PARMSDATA= option PROC MODEL statement, 970, 1107 SOLVE statement (MODEL), 1001 PARMTOL= option PROC STATESPACE statement, 1585 PARTIAL option MODEL statement (AUTOREG), 344 MODEL statement (PDLREG), 1358 PASTMIN= option PROC STATESPACE statement, 1584 PATH= option TARGET statement (SIMILARITY), 1460 PAYMENT= option FIXED statement (LOAN), 842 PCHOW= option FIT statement (MODEL), 991, 1081 MODEL statement (AUTOREG), 344 PDATA= option ENTROPY procedure, 641 ENTROPY statement, 663 %PDL macro, 1103 PDLREG procedure, 1354 syntax, 1354 PDWEIGHTS statement X11 procedure, 2052 PERIOD= option CYCLE statement (UCM), 1761
PERMCO= option LIBNAME statement (SASECRSP), 2199 PERMNO= option LIBNAME statement (SASECRSP), 2196 PERROR= option IDENTIFY statement (ARIMA), 230 PH option PROC SPECTRA statement, 1546 PHI option MODEL statement (PANEL), 1278 MODEL statement (TSCSREG), 1737 PHI= option DEPLAG statement (UCM), 1762 PLOT HAXIS=, 88 PLOT option AUTOREG statement (UCM), 1758 BLOCKSEASON statement (UCM), 1759 CYCLE statement (UCM), 1761 ESTIMATE statement (ARIMA), 233 ESTIMATE statement (UCM), 1764 FORECAST statement (UCM), 1766 IRREGULAR statement (UCM), 1768 MODEL statement (SYSLIN), 1639 PROC ARIMA statement, 224 PROC UCM statement, 1754 RANDOMREG statement (UCM), 1774 SEASON statement (UCM), 1776 SLOPE statement (UCM), 1777 SPLINEREG statement (UCM), 1779 SPLINESEASON statement (UCM), 1780 PLOT= option PROC ESM statement, 687 PLOTS option PROC ARIMA statement, 224 PROC AUTOREG statement, 339 PROC ENTROPY statement, 642 PROC MODEL statement, 970 PROC PANEL statement, 1271 PROC UCM statement, 1754 PLOTS= option PROC EXPAND statement, 730 PROC SIMILARITY statement, 1449 PROC TIMESERIES statement, 1687 PM= option OUTPUT statement (AUTOREG), 356 OUTPUT statement (PDLREG), 1359 PMFACTOR= option MONTHLY statement (X11), 2049 PNT= option FIXED statement (LOAN), 843 PNTPCT= option FIXED statement (LOAN), 843 POINTPCT= option
FIXED statement (LOAN), 843 POINTS= option FIXED statement (LOAN), 843 POISSON option ERRORMODEL statement (MODEL), 980 POOLED option MODEL statement (PANEL), 1278 POWER= option TRANSFORM statement (X12), 2130 PRC= option FIXED statement (LOAN), 844 pred OUTPUT statement (COUNTREG), 493 PREDEFINED= option ADJUST statement (X12), 2114 PREDEFINED statement (X12), 2125 PREDICTED OUTPUT statement (QLIM), 1393 PREDICTED= option OUTPUT statement (AUTOREG), 355 OUTPUT statement (PANEL), 1279 OUTPUT statement (PDLREG), 1359 OUTPUT statement (SIMLIN), 1518 OUTPUT statement (SYSLIN), 1640 PREDICTEDM= option OUTPUT statement (AUTOREG), 356 OUTPUT statement (PDLREG), 1359 PREPAYMENTS= option FIXED statement (LOAN), 843 PRICE= option FIXED statement (LOAN), 844 PRIMAL option ENTROPY procedure, 643 PRINT option AUTOREG statement (UCM), 1758 BLOCKSEASON statement (UCM), 1760 CYCLE statement (UCM), 1761 ESTIMATE statement (UCM), 1765 FORECAST statement (UCM), 1767 IRREGULAR statement (UCM), 1768 LEVEL statement (UCM), 1772 OUTLIER statement (UCM), 1773 PROC STATESPACE statement, 1585 RANDOMREG statement (UCM), 1774 SEASON statement (UCM), 1776 SLOPE statement (UCM), 1777 SPLINESEASON statement (UCM), 1780 SSPAN statement (X11), 2056 STEST statement (SYSLIN), 1644 TEST statement (SYSLIN), 1646 PRINT= option AUTOMDL statement (X12), 2119 BOXCOXAR macro, 152 LOGTEST macro, 157
MODEL statement (VARMAX), 1896 PROC SIMILARITY statement, 1449 PROC TIMESERIES statement, 1688 PRINT=(CORRB) option MODEL statement (VARMAX), 1896 PRINT=(CORRX) option MODEL statement (VARMAX), 1896 PRINT=(CORRY) option MODEL statement (VARMAX), 1897, 1936 PRINT=(COVB) option MODEL statement (VARMAX), 1897 PRINT=(COVPE) option MODEL statement (VARMAX), 1897, 1931 PRINT=(COVX) option MODEL statement (VARMAX), 1897 PRINT=(COVY) option MODEL statement (VARMAX), 1897 PRINT=(DECOMPOSE) option MODEL statement (VARMAX), 1897, 1933 PRINT=(DIAGNOSE) option MODEL statement (VARMAX), 1897 PRINT=(DYNAMIC) option MODEL statement (VARMAX), 1897, 1917 PRINT=(ESTIMATES) option MODEL statement (VARMAX), 1897 PRINT=(IARR) option MODEL statement (VARMAX), 1871, 1898 PRINT=(IMPULSE) option MODEL statement (VARMAX), 1924 PRINT=(IMPULSE=) option MODEL statement (VARMAX), 1898 PRINT=(IMPULSX) option MODEL statement (VARMAX), 1920 PRINT=(IMPULSX=) option MODEL statement (VARMAX), 1898 PRINT=(PARCOEF) option MODEL statement (VARMAX), 1899, 1937 PRINT=(PCANCORR) option MODEL statement (VARMAX), 1899, 1939 PRINT=(PCORR) option MODEL statement (VARMAX), 1899, 1938 PRINT=(ROOTS) option MODEL statement (VARMAX), 1899, 1942 PRINT=(YW) option MODEL statement (VARMAX), 1899 PRINT=option PROC ESM statement, 688 PRINTALL option ARIMA statement (X11), 2045 ESTIMATE statement (ARIMA), 236 FIT statement (MODEL), 991 FORECAST statement (ARIMA), 239 MODEL statement, 492 MODEL statement (VARMAX), 1896
PROC MODEL statement, 974 PROC QLIM statement, 1385 PROC UCM statement, 1757 SOLVE statement (MODEL), 1005 SSPAN statement (X11), 2056 PRINTDETAILS option PROC ESM statement, 688 PROC SIMILARITY statement, 1450 PROC TIMESERIES statement, 1688 PRINTERR option ESTIMATE statement (X12), 2116 PRINTFORM= option MODEL statement (VARMAX), 1896, 1920 PRINTFP option ARIMA statement (X11), 2045 PRINTOUT= option MONTHLY statement (X11), 2049 PROC STATESPACE statement, 1583 QUARTERLY statement (X11), 2054 PRINTREG option IDENTIFY statement (X12), 2118 PRIOR option MODEL statement (VARMAX), 1904 PRIOR=(IVAR) option MODEL statement (VARMAX), 1904 PRIOR=(LAMBDA=) option MODEL statement (VARMAX), 1904 PRIOR=(MEAN=) option MODEL statement (VARMAX), 1904 PRIOR=(NREP=) option MODEL statement (VARMAX), 1905 PRIOR=(THETA=) option MODEL statement (VARMAX), 1905 PRIORS statement ENTROPY procedure, 648 PRL= option FIT statement (MODEL), 984, 1083 PROB OUTPUT statement (COUNTREG), 493 OUTPUT statement (QLIM), 1393 PROBALL OUTPUT statement (QLIM), 1393 PROBDF function, 158 macro, 158 PROBZERO OUTPUT statement (COUNTREG), 493 PROC ARIMA statement, 224 PROC AUTOREG OUTEST=, 384 PROC AUTOREG statement, 339 PROC COMPUTAB NOTRANS, 447 OUT=, 457
PROC COMPUTAB statement, 438 PROC DATASOURCE statement, 537 PROC ENTROPY statement, 640 PROC ESM statement, 686 PROC EXPAND statement, 728 PROC FORECAST statement, 790 PROC LOAN statement, 840 PROC MDC statement, 889 PROC MODEL statement, 970 PROC PANEL statement, 1270 PROC PDLREG statement, 1355 PROC SIMILARITY statement, 1447 PROC SIMLIN statement, 1515 PROC SPECTRA statement, 1545 PROC STATESPACE statement, 1582 PROC SYSLIN statement, 1634 PROC TIMESERIES statement, 1686 PROC TSCSREG statement, 1734 PROC UCM statement, 1753, see UCM procedure PROC VARMAX statement, 1885 PROC X11 statement, 2042 PROC X12 statement, 2109 PRODUCTION option ENDOGENOUS statement (QLIM), 1389 PROFILE ESTIMATE statement (UCM), 1765 PROJECT= option FORECAST command (TSFS), 2547 PSEUDO= option SOLVE statement (MODEL), 1003 PURE option ENTROPY procedure, 641 PURGE option RESET statement (MODEL), 999 PUT, 1165 PWC option COMPARE statement (LOAN), 849 PWOFCOST option COMPARE statement (LOAN), 849 Q option IRREGULAR statement (UCM), 1770 Q= option ESTIMATE statement (ARIMA), 233 GARCH statement, 1907 IDENTIFY statement (ARIMA), 230 MODEL statement (AUTOREG), 341 MODEL statement (VARMAX), 1900, 1953 QLIM procedure, 1382 syntax, 1382 QLIM procedure, CLASS statement, 1387 QLIM procedure, TEST statement, 1394 QLIM procedure, WEIGHT statement, 1395
QTR function, 147 QUARTERLY statement X11 procedure, 2052 QUASI= option SOLVE statement (MODEL), 1003 QUIET= option FCMPOPT statement (SIMILARITY), 1451 R= option FIXED statement (LOAN), 842 OUTPUT statement (AUTOREG), 356 OUTPUT statement (PANEL), 1280 OUTPUT statement (PDLREG), 1359 OUTPUT statement (SIMLIN), 1518 RANDINIT option MODEL statement (MDC), 894 RANDNUM= option MODEL statement (MDC), 893 RANDOM= option SOLVE statement (MODEL), 1003, 1121, 1136 RANDOMREG UCM procedure, 1773 RANGE, 1153 RANGE option MODEL statement (ENTROPY), 648 RANGE statement DATASOURCE procedure, 544 MODEL procedure, 998 RANGE= option LIBNAME statement (SASECRSP), 2201 LIBNAME statement (SASEFAME), 2294 RANK option MODEL statement (MDC), 894 RANK= option COINTEG statement (VARMAX), 1892, 1973 RANONE option MODEL statement (PANEL), 1279 MODEL statement (TSCSREG), 1737 RANTWO option MODEL statement (PANEL), 1279 MODEL statement (TSCSREG), 1737 RAO option TEST statement (ENTROPY), 650 TEST statement (MODEL), 1006 RATE= option FIXED statement (LOAN), 842 RECFM= option PROC DATASOURCE statement, 539 RECPEV= option OUTPUT statement (AUTOREG), 356 RECRES= option
OUTPUT statement (AUTOREG), 356 REDUCECV= option AUTOMDL statement (X12), 2120 REDUCED option PROC SYSLIN statement, 1637 REEVAL option FORECAST command (TSFS), 2549 REFIT option FORECAST command (TSFS), 2549 REGRESSION statement X12 procedure, 2124 Remote FAME data access physical name using #port number, 2293 RENAME statement DATASOURCE procedure, 547 REPLACEBACK option FORECAST statement (ESM), 690 REPLACEMISSING option FORECAST statement (ESM), 690 RESET option MODEL statement (AUTOREG), 344 RESET statement MODEL procedure, 999 RESIDDATA= option SOLVE statement (MODEL), 1001 RESIDEST option PROC STATESPACE statement, 1585 RESIDUAL OUTPUT statement (QLIM), 1393 RESIDUAL= option OUTPUT statement (AUTOREG), 356 OUTPUT statement (PANEL), 1280 OUTPUT statement (PDLREG), 1359 OUTPUT statement (SIMLIN), 1518 OUTPUT statement (SYSLIN), 1640 RESIDUALM= option OUTPUT statement (AUTOREG), 356 OUTPUT statement (PDLREG), 1359 RESTART option MODEL statement (MDC), 894 RESTRICT statement AUTOREG procedure, 352 COUNTREG procedure, 493 ENTROPY procedure, 648 MDC procedure, 901 MODEL procedure, 999 PDLREG procedure, 1360 QLIM procedure, 1393 STATESPACE procedure, 1587 SYSLIN procedure, 1641 VARMAX procedure, 1909 RETAIN statement MODEL procedure, 1167 RHO option
MODEL statement (PANEL), 1279 MODEL statement (TSCSREG), 1737 RHO= option AUTOREG statement (UCM), 1758 CYCLE statement (UCM), 1761 RKNOTS option SPLINESEASON statement (UCM), 1780 RM= option OUTPUT statement (AUTOREG), 356 OUTPUT statement (PDLREG), 1359 ROBUST option MODEL statement (PANEL), 1279 ROUND= NONE option FIXED statement (LOAN), 844 ROUND= option FIXED statement (LOAN), 844 ‘row titles’ option ROWS statement (COMPUTAB), 442 ROWS statement COMPUTAB procedure, 441 RTS= option PROC COMPUTAB statement, 439 S option IRREGULAR statement (UCM), 1770 PROC SPECTRA statement, 1546 SAMESCALE option MODEL statement (MDC), 895 SAR option IRREGULAR statement (UCM), 1770 SATISFY= option SOLVE statement (MODEL), 1000 SCALE= option INPUT statement (SIMILARITY), 1455 SCAN option IDENTIFY statement (ARIMA), 230 SCENTER option MODEL statement (VARMAX), 1895 SCHEDULE option FIXED statement (LOAN), 845 SCHEDULE= option FIXED statement (LOAN), 845 SCHEDULE= YEARLY option FIXED statement (LOAN), 845 SDATA= option ENTROPY procedure, 642, 663 FIT statement (MODEL), 989, 1107, 1187 SOLVE statement (MODEL), 1001, 1121, 1148 SDIAG option PROC SYSLIN statement, 1636 SDIF= option INPUT statement (SIMILARITY), 1455 TARGET statement (SIMILARITY), 1460
VAR statement (TIMESERIES), 1698 SDIFF= option IDENTIFY statement (X12), 2117 SEASON statement TIMESERIES procedure, 1696 UCM procedure, 1774 SEASONALITY= option PROC ESM statement, 688 PROC SIMILARITY statement, 1450 PROC TIMESERIES statement, 1688 SEASONALMA= option X11 statement (X12), 2134 SEASONS= option PROC FORECAST statement, 793 PROC X12 statement, 2110 SECOND function, 147 SEED= option MODEL statement (MDC), 895 QLIM procedure, 1386 SOLVE statement (MODEL), 1004, 1121 SEIDEL option SOLVE statement (MODEL), 1003 SELECT, 1166 SELECT option ENDOGENOUS statement (QLIM), 1390 SETID= option LIBNAME statement (SASECRSP), 2195 SETMISSING= option FORECAST statement (ESM), 690 ID statement (ESM), 693 ID statement (SIMILARITY), 1453 ID statement (TIMESERIES), 1695 INPUT statement (SIMILARITY), 1455 TARGET statement (SIMILARITY), 1461 VAR statement (TIMESERIES), 1699 SICCD= option LIBNAME statement (SASECRSP), 2200 SIGCORR= option PROC STATESPACE statement, 1584 SIGMA= option OUTLIER statement (ARIMA), 236 SIGSQ= option FORECAST statement (ARIMA), 239 SIMILARITY procedure, 1446 syntax, 1446 SIMLIN procedure, 1514 syntax, 1514 SIMPLE option PROC SYSLIN statement, 1637 SIMULATE option SOLVE statement (MODEL), 1002 SIN function, 1159 SINGLE option
SOLVE statement (MODEL), 1003, 1141 SINGULAR= option ESTIMATE statement (ARIMA), 236 FIT statement (MODEL), 993 MODEL statement (TSCSREG), 1737 PROC FORECAST statement, 794 PROC STATESPACE statement, 1585 PROC SYSLIN statement, 1636 SINGULAR=option MODEL statement (PANEL), 1279 SINH function, 1159 SINTPER= option PROC FORECAST statement, 794 SKIP option ROWS statement (COMPUTAB), 443 SKIPFIRST= option ESTIMATE statement (UCM), 1765 FORECAST statement (UCM), 1767 SKIPLAST= option ESTIMATE statement (UCM), 1763 SLAG option LAG statement (PANEL), 1275 SLAG statement LAG statement (PANEL), 1275 SLENTRY= option PROC FORECAST statement, 794 SLIDE= option TARGET statement (SIMILARITY), 1461 SLOPE statement UCM procedure, 1777 SLSTAY= option MODEL statement (AUTOREG), 348 PROC FORECAST statement, 794 SMA option IRREGULAR statement (UCM), 1771 SOLVE statement MODEL procedure, 1000 SOLVEPRINT option SOLVE statement (MODEL), 1005 SORTNAMES option PROC ESM statement, 688 PROC SIMILARITY statement, 1450 PROC TIMESERIES statement, 1688 SOURCE= option LIBNAME statement (SASEHAVR), 2344 SP option IRREGULAR statement (UCM), 1771 SPAN= option OUTLIER statement (X12), 2122 PROC X12 statement, 2110 SPECTRA procedure, 1544 syntax, 1544 SPLINEREG UCM procedure, 1778
SPLINESEASON UCM procedure, 1779 SPSCALE option MODEL statement (MDC), 895 SQ option IRREGULAR statement (UCM), 1771 SQRT function, 1159 SRESTRICT statement SYSLIN procedure, 1642 SSPAN statement X11 procedure, 2055 START= option FIT statement (MODEL), 984, 1034, 1180 FIXED statement (LOAN), 844 ID statement (ESM), 694 ID statement (SIMILARITY), 1453 ID statement (TIMESERIES), 1696 LIBNAME statement (SASEHAVR), 2344 MODEL statement (AUTOREG), 348 MONTHLY statement (X11), 2050 PROC FORECAST statement, 794 PROC SIMLIN statement, 1516 PROC X12 statement, 2110 QUARTERLY statement (X11), 2054 SOLVE statement (MODEL), 1002 STARTITER option FIT statement (MODEL), 1035 STARTITER= option FIT statement (MODEL), 993 STARTSUM= option PROC ESM statement, 688 STARTUP= option MODEL statement (AUTOREG), 342 STAT= option FORECAST command (TSFS), 2548 STATESPACE procedure, 1580 syntax, 1580 STATIC option FIT statement (MODEL), 1070 SOLVE statement (MODEL), 1002, 1068, 1117 STATIONARITY= option IDENTIFY statement (ARIMA), 231 MODEL statement (AUTOREG), 345 STATS option SOLVE statement (MODEL), 1005, 1135 STB option MODEL statement (PDLREG), 1358 MODEL statement (SYSLIN), 1639 STD= option HETERO statement (AUTOREG), 351 STEST statement SYSLIN procedure, 1643 SUMBY statement
COMPUTAB procedure, 446 SUMMARY option MONTHLY statement (X11), 2050 QUARTERLY statement (X11), 2054 SUMONLY option PROC COMPUTAB statement, 439 SUR option ENTROPY procedure, 641 FIT statement (MODEL), 987, 1010, 1183 PROC SYSLIN statement, 1636 SYSLIN procedure, 1632 syntax, 1632 T option ERRORMODEL statement (MODEL), 981, 1022 TABLES statement X11 procedure, 2056 X12 procedure, 2129 TABLES table names TABLES statement (X12), 2130 TAN function, 1159 TANH function, 1159 TARGET statement SIMILARITY procedure, 1456 TAX= option COMPARE statement (LOAN), 849 TAXRATE= option COMPARE statement (LOAN), 849 TCCV= option OUTLIER statement (X12), 2124 TDCOMPUTE= option MONTHLY statement (X11), 2050 TDCUTOFF= option SSPAN statement (X11), 2055 TDREGR= option MONTHLY statement (X11), 2050 TECH= option ENTROPY procedure, 644 TECHNIQUE= option ENTROPY procedure, 644 TEST statement AUTOREG procedure, 353 ENTROPY procedure, 649 MODEL procedure, 1005 SYSLIN procedure, 1644 VARMAX procedure, 1880, 1911 TEST= option HETERO statement (AUTOREG), 351 THEIL option SOLVE statement (MODEL), 1005, 1135 TI option COMPARE statement (LOAN), 849 TICKER= option
LIBNAME statement (SASECRSP), 2200 TIME function, 148 TIME option MODEL statement (PANEL), 1279 TIME= option FIT statement (MODEL), 989, 1074 SOLVE statement (MODEL), 1074 TIMEPART function, 97, 148 TIMESERIES procedure, 1683 syntax, 1683 TIN=, 733 _TITLES_ option COLUMNS statement (COMPUTAB), 441 TO= option PROC EXPAND statement, 729, 733, 734 TODAY function, 148 TOL= option ESTIMATE statement (X12), 2116 TOTAL option PROC SIMLIN statement, 1516 TOUT=, 733 TR option MODEL statement (AUTOREG), 342 TRACE option PROC MODEL statement, 974 TRACE= option FCMPOPT statement (SIMILARITY), 1451 TRANSFORM statement X12 procedure, 2130 TRANSFORM=, 733 TRANSFORM= option ARIMA statement (X11), 2046 FORECAST statement (ESM), 691 INPUT statement (SIMILARITY), 1455 OUTPUT statement (AUTOREG), 356 OUTPUT statement (PDLREG), 1359 TARGET statement (SIMILARITY), 1461 VAR statement (TIMESERIES), 1699 TRANSFORMIN= option CONVERT statement (EXPAND), 733, 742 TRANSFORMOUT= option CONVERT statement (EXPAND), 733, 742 TRANSIN=, 733 TRANSOUT=, 733 TRANSPOSE procedure, 119 TRANSPOSE= option CORR statement (TIMESERIES), 1690 CROSSCORR statement (TIMESERIES), 1691 DECOMP statement (TIMESERIES), 1693 SEASON statement (TIMESERIES), 1696 TREND statement (TIMESERIES), 1698 TREND statement
TIMESERIES procedure, 1697 TREND= option DFPVALUE macro, 153 DFTEST macro, 155 MODEL statement (VARMAX), 1895 PROC FORECAST statement, 794 TREND=LINEAR option MODEL statement (VARMAX), 1970 TRENDADJ option MONTHLY statement (X11), 2050 QUARTERLY statement (X11), 2054 TRENDMA= option MONTHLY statement (X11), 2050 X11 statement (X12), 2135 TRIMMISS= option INPUT statement (SIMILARITY), 1456 TRIMMISSING= option INPUT statement (SIMILARITY), 1461 TRUEINTEREST option COMPARE statement (LOAN), 849 TRUNCATED option ENDOGENOUS statement (QLIM), 1389 TSCSREG procedure, 1733 syntax, 1733 TSFS, 2379 TSFS procedure, 2379 TSNAME = option PROC PANEL statement, 1273 TSVIEW macro, 2546 TWOSTEP option MODEL statement (PANEL), 1279 TYPE= option FIT statement (MODEL), 989 MODEL statement (AUTOREG), 341 MODEL statement (MDC), 895 OUTLIER statement (ARIMA), 236 OUTLIER statement (X12), 2122 PROC DATASOURCE statement, 539 PROC SIMLIN statement, 1516 SOLVE statement (MODEL), 1002 TEST statement (AUTOREG), 353 TYPE=option BLOCKSEASON statement (UCM), 1760 SEASON statement (UCM), 1776 U option ERRORMODEL statement (MODEL), 981 UCL= option OUTPUT statement (AUTOREG), 356 OUTPUT statement (PDLREG), 1359 UCLM= option OUTPUT statement (AUTOREG), 356 OUTPUT statement (PDLREG), 1359
UCM procedure, 1750 syntax, 1750 UCM procedure, PROC UCM statement PLOT option, 1754 UL option ROWS statement (COMPUTAB), 443 UNIFORMEC= option MODEL statement (MDC), 893 UNIFORMPARM= option MODEL statement (MDC), 893 UNITSCALE= option MODEL statement (MDC), 892 UNITVARIANCE= option MODEL statement (MDC), 895 UNREST option MODEL statement (SYSLIN), 1640 UPPERBOUND= option ENDOGENOUS statement (QLIM), 1389 URSQ option MODEL statement (AUTOREG), 348 USE= option FORECAST statement (ESM), 691 USERDEFINED statement X12 procedure, 2131 USSCP option PROC SYSLIN statement, 1637 USSCP2 option PROC SYSLIN statement, 1637 UTILITY statement MDC procedure, 902 VAR option MODEL statement (PANEL), 1277 MODEL statement (TSCSREG), 1736 VAR statement FORECAST procedure, 795 MODEL procedure, 1007, 1151 SPECTRA procedure, 1547 STATESPACE procedure, 1587 SYSLIN procedure, 1646 TIMESERIES procedure, 1698 X11 procedure, 2056 X12 procedure, 2132 VAR= option FORECAST command (TSFS), 2546, 2547 IDENTIFY statement (ARIMA), 231 TSVIEW command (TSFS), 2546, 2547 VARDEF= option FIT statement (MODEL), 987, 1013, 1026 MODEL statement (VARMAX), 1895 Proc ENTROPY, 641 PROC SYSLIN statement, 1637 VARIANCE= option AUTOREG statement (UCM), 1758
CYCLE statement (UCM), 1761 IRREGULAR statement (UCM), 1768 LEVEL statement (UCM), 1772 RANDOMREG statement (UCM), 1774 SLOPE statement (UCM), 1778 SPLINEREG statement (UCM), 1779 SPLINESEASON statement (UCM), 1780 VARIANCE=option BLOCKSEASON statement (UCM), 1760 SEASON statement (UCM), 1777 VARLIST statement, 890 VARMAX procedure, 1882 syntax, 1882 VCOMP= option MODEL statement (PANEL), 1279 VDATA= option FIT statement (MODEL), 989, 1108 W option ARM statement (LOAN), 847 WALD option TEST statement (ENTROPY), 650 TEST statement (MODEL), 1006 TEST statement (PANEL), 1281 TEST statement (QLIM), 1395 WEEK function, 148 WEEKDAY function, 96 WEEKDAY function, 148 WEIGHT statement, 1052 ENTROPY procedure, 650 MODEL procedure, 1007 SYSLIN procedure, 1646 WEIGHT= option PROC FORECAST statement, 794 WEIGHTS statement SPECTRA procedure, 1547 WHEN, 1166 WHERE statement DATASOURCE procedure, 543 WHITE option FIT statement (MODEL), 991 WHITENOISE= option ESTIMATE statement (ARIMA), 233 IDENTIFY statement (ARIMA), 232 WHITETEST option PROC SPECTRA statement, 1546 WILDCARD= option LIBNAME statement (SASEFAME), 2294 WISHART= option SOLVE statement (MODEL), 1004 WORSTCASE option ARM statement (LOAN), 847 X11 procedure, 2040
syntax, 2040 X11 statement X12 procedure, 2132 X12 procedure, 2106 syntax, 2106 XBETA OUTPUT statement (COUNTREG), 493 OUTPUT statement (QLIM), 1393 XBETA= option OUTPUT statement (MDC), 901 XLAG option LAG statement (PANEL), 1275 XLAG statement LAG statement (PANEL), 1275 XLAG= option MODEL statement (VARMAX), 1900 XPX option FIT statement (MODEL), 992, 1043 MODEL statement (PDLREG), 1358 MODEL statement (SYSLIN), 1640 XREF option PROC MODEL statement, 973, 1170 YEAR function, 96, 148 YRAHEADOUT option PROC X11 statement, 2043 YYQ function, 96, 148 ZERO= option COLUMNS statement (COMPUTAB), 441 ROWS statement (COMPUTAB), 443 ZEROMISS option PROC FORECAST statement, 794 ZEROMISS= option FORECAST statement (ESM), 691 INPUT statement (SIMILARITY), 1456 TARGET statement (SIMILARITY), 1462 ZEROMISSING= option ID statement (PROC ESM), 694 ID statement (SIMILARITY), 1454 ZEROMODEL statement COUNTREG procedure, 494 ZEROWEIGHT= option MONTHLY statement (X11), 2051 QUARTERLY statement (X11), 2055 ZGAMMA OUTPUT statement (COUNTREG), 493 ZLAG option LAG statement (PANEL), 1275 ZLAG statement LAG statement (PANEL), 1275
Your Turn
We welcome your feedback.
• If you have comments about this book, please send them to [email protected]. Include the full title and page numbers (if applicable).
• If you have comments about the software, please send them to [email protected].
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. © 2009 SAS Institute Inc. All rights reserved. 518177_1US.0109