Biostatistics (2001), 2, 3, pp. 261–276 Printed in Great Britain
Semiparametric models and inference for biomedical time series with extra-variation

ROBERTO IANNACCONE
Via Domenico Panaroli, S6, 00172 Roma, Italia
[email protected]

STUART COLES∗
Department of Mathematics, University of Bristol, Bristol BS8 1TW, UK
[email protected]

SUMMARY

Biomedical trials often give rise to data having the form of time series of a common process on separate individuals. One model which has been proposed to explain variations in such series across individuals is a random effects model based on sample periodograms. The use of spectral coefficients enables models for individual series to be constructed on the basis of standard asymptotic theory, whilst variations between individuals are handled by permitting a random effect perturbation of model coefficients. This paper extends such methodology in two ways: first, by enabling a nonparametric specification of underlying spectral behaviour; second, by addressing some of the tricky computational issues which are encountered when working with this class of random effect models. This leads to a model in which a population spectrum is specified nonparametrically through a dynamic system, and the processes measured on individuals within the population are assumed to have a spectrum which has a random effect perturbation from the population norm. Simulation studies show that standard MCMC algorithms give effective inferences for this model, and applications to biomedical data suggest that the model itself is capable of revealing scientifically important structure in temporal characteristics both within and between individual processes.

Keywords: Biomedical trials; Dynamic systems; Random effects; Spectral analysis; Time series.
∗ To whom correspondence should be addressed.
© Oxford University Press (2001)

1. INTRODUCTION

A widely used application of time series methodology is in the study of measurements taken as part of biomedical trials (for example, Diggle (1990)). Such trials often lead to time series of a common process measured on different subjects, giving rise to a range of modelling options. On the one hand it can be argued that each individual’s process has its own stochastic history, and that this should be analysed in isolation from those of other individuals. The opposite extreme is to argue that since the time series are measurements of the same process on individuals from a common population then, perhaps after allowance for covariate effects, the time series should be treated as replicates of a single process. There is also a compromise view: empirical studies often reveal a certain similarity in the temporal characteristics of data recorded on different subjects, but not to such an extent that variations can be explained by measured covariates. This has led a number of authors to propose random effect models which enable
a balance that exploits commonality of observed features between individuals, but also permit a variation exceeding that which is explainable by replication of a single process, and also which is not related to observed covariates. Brillinger (1973) applied a model of this type directly to the measurement process. We base our developments on an alternative approach due to Diggle and al Wasel (1997), who apply the random effect model structure to the spectral characterization of the stochastic properties of individual histories. In broad terms the Diggle and al Wasel model assumes that for a particular process there is what might be called a baseline spectral function determining the stochastic properties of the process. Then, each individual in the population has their own spectral function which is a modification of the baseline spectrum, due to both explainable covariate effects and nonexplainable random effects. The observations on each individual then give rise to a series whose empirical periodogram may be regarded as a realization from that same individual’s theoretical spectrum. There are a number of advantages of modelling spectral coefficients, including an ease of interpretation and a convenient asymptotic sampling property. This latter point is made explicit in the error structure of the Diggle and al Wasel model. In their paper, they show such a model to be feasible in terms of inference, and useful in terms of its applicability to a range of biomedical datasets. There are limitations, however, which make Diggle and al Wasel’s model more restrictive than is desirable. First, the computational technique proposed is cumbersome. As well as making estimation difficult, this renders virtually impossible the evaluation of estimates of precision. Second, it is necessary to assume a specific parametric form for the baseline spectral function of the process—the effects of a misspecification of this form are not discussed. 
In this paper we present an approach which seeks to overcome both of these limitations. Our starting point is to observe that the model has a natural hierarchical structure, implying that a Bayesian formulation, with inference carried out by MCMC, is straightforward. Furthermore, this approach also enables the replacement of the parametric population spectrum by a nonparametric specification, for which Carter and Kohn (1997) have developed efficient MCMC algorithms. Effectively, the present paper can be thought of as the incorporation of Carter and Kohn’s Monte Carlo methodology for spectral analysis into the random effect model of Diggle and al Wasel for describing within-population variations in time series behaviour.

In Section 2 the model of Diggle and al Wasel is described in some detail; this requires a basic summary of the theory of spectral analysis. Section 3 describes Carter and Kohn’s methodology for nonparametric spectral analysis. Section 4 explains our amalgamation of these models and gives some simulation examples. Section 5 gives an application of the model to genuine biomedical time series data. Details of the MCMC algorithm are given in an appendix.

2. A RANDOM EFFECT MODEL FOR SAMPLE PERIODOGRAMS

It is useful first to recount some basic facts about spectral analysis; see Priestley (1981), for example, for a substantially more detailed account. Given a stationary ergodic process $\{X_t : t = 1, 2, \ldots\}$, its spectral density is defined as
$$f(\omega) = \sum_{u=-\infty}^{\infty} \gamma_u \exp(-iu\omega); \qquad 0 \le \omega \le \pi,$$
where $\gamma_u = \mathrm{cov}(X_t, X_{t-u})$. The spectral density function gives a complete description of the second-order properties of the process and therefore its inference is equivalent to inferring the full covariance structure. Moreover, the sample version of the spectrum has neat asymptotic properties that form a natural modelling basis. The sample periodogram of $\{X_t\}$ is defined by the sequence
$$I(\omega_j) = n^{-1} \left| \sum_{t=1}^{n} x_t \exp(-it\omega_j) \right|^2; \qquad j = 1, \ldots, m,$$
where $m = [(n-1)/2]$ and $\omega_j = 2j\pi/n$ is the $j$th ‘Fourier frequency’. The periodogram definition can also be extended to the frequencies $\omega = 0$ and $\omega = \pi$, but special considerations apply at the boundaries as discussed in Section 6. Asymptotically, as $n \to \infty$,
$$2I(\omega_j)/f(\omega_j) \sim \chi^2_2, \qquad (1)$$
and $I(\omega_j)$ and $I(\omega_k)$ are (asymptotically) independent whenever $j \ne k$. These asymptotic sampling properties represent an important argument in favour of a spectral approach to modelling time series, though the situation is somewhat more complicated than it first appears since $I(\omega_j)$ is not consistent for $f(\omega_j)$ because the number of Fourier frequencies grows with the sample size. Hence, estimation of the spectral density usually requires some sort of smoothing of the sample periodogram.

The most extreme version of periodogram smoothing is to assume a parametric model. This is the approach adopted by Diggle and al Wasel as one aspect of their random effects model for within-population variation. In detail, their model assumes raw data of the form $\{X_{i,t} : i = 1, \ldots, r;\ t = 1, \ldots, n\}$ where $X_{i,t}$ corresponds to the measurement of a process on individual $i$ at time point $t$, for which the time points are equally spaced. Potentially, individual $i$ also has a sequence of covariate information. Then, denoting periodogram ordinates for individual $i$ at frequency $\omega_j$ as $I_i(\omega_j)$, their model has the form
$$I_i(\omega_j) = f_i(\omega_j) Z_i(\omega_j) U_{i,j}, \qquad (2)$$
where $f_i$ is a parametric model for the baseline spectrum, possibly modified due to observable covariates on subject $i$, the $Z_i(\cdot)$ are independent realizations of a stochastic process such that $E[Z_i(\omega)] = 1$ for all $\omega$, and the $U_{i,j}$ are mutually independent standard exponential random variables. Ignoring the $Z_i$ term, this model corresponds exactly to the asymptotic law (1) for periodogram ordinates. The inclusion of the stochastic $Z_i$ term is to incorporate extra-variation between individuals, above that which can be explained by observed covariates that are modelled through $f_i$. Imposing the unit-mean constraint on $Z_i$ means that $f_i(\cdot)Z_i(\cdot)$ is the spectral density for the process $\{X_{i,t} : t = 1, 2, \ldots\}$ measured on subject $i$ after conditioning on the value of $Z_i$, while $f_i$ is the population spectral density function in the sense that it is the expected spectral density of an individual in the population whose covariates correspond to those of individual $i$. As such, the $Z_i$ play the role of random effect terms, which modify each individual’s ‘true’ spectral density away from the population norm as a result of unobservable effects or intrinsic variation.

Any parametric family of non-negative functions provides a valid model for $f_i$, though it is natural to construct families that include particular models, such as white noise or moving averages, as special cases. Constructing models for the $Z_i$ is rather more difficult because of the mean constraint. Diggle and al Wasel propose the model
$$Z_i(\omega) = \exp \sum_{s=0}^{q} \left[ \phi_s(\omega) B_{i,s} + \tfrac{1}{2} \sigma_s^2 \phi_s(\omega)\{1 - \phi_s(\omega)\} \right] \qquad (3)$$
with $\phi_0(\omega) = 1$, $\phi_{2l-1}(\omega) = \cos(l\omega)$, $\phi_{2l}(\omega) = \sin(l\omega)$, each for $l = 1, 2, \ldots$, and each $B_{i,s} \sim N(-\tfrac{1}{2}\sigma_s^2, \sigma_s^2)$ independently for $s = 0, \ldots, q$. It is easily checked that this formulation satisfies the imposed unit-mean constraint: since $E[\exp\{\phi_s(\omega) B_{i,s}\}] = \exp[-\tfrac{1}{2}\sigma_s^2 \phi_s(\omega)\{1 - \phi_s(\omega)\}]$, each factor in the product has expectation one.
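As a purely illustrative numerical sketch of the quantities in this section (not the authors' code), the Python below computes the periodogram at the Fourier frequencies and draws the random effect $Z_i(\omega)$ of (3), so that the unit-mean constraint can be checked by Monte Carlo. The function names, the use of NumPy, and the particular frequencies, σ values and seed are assumptions made for the example only.

```python
import numpy as np


def periodogram(x):
    """Periodogram I(w_j) = |sum_t x_t exp(-i t w_j)|^2 / n at w_j = 2*pi*j/n, j = 1..m."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = (n - 1) // 2
    # The FFT indexes time from 0 rather than 1; this only shifts the phase,
    # so the squared modulus is unchanged.
    dft = np.fft.fft(x)
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n
    return freqs, np.abs(dft[1:m + 1]) ** 2 / n


def basis(omega, q):
    """Rows phi_0 = 1, phi_{2l-1} = cos(l w), phi_{2l} = sin(l w), for s = 0..q."""
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    rows = [np.ones_like(omega)]
    for s in range(1, q + 1):
        l = (s + 1) // 2
        rows.append(np.cos(l * omega) if s % 2 == 1 else np.sin(l * omega))
    return np.stack(rows)  # shape (q + 1, len(omega))


def draw_random_effects(omega, sigma, n_draws, rng):
    """Draw Z(omega) from (3) with B_s ~ N(-sigma_s^2 / 2, sigma_s^2), independently over s."""
    sigma = np.asarray(sigma, dtype=float)
    phi = basis(omega, sigma.size - 1)
    b = rng.normal(-0.5 * sigma**2, sigma, size=(n_draws, sigma.size))
    correction = (0.5 * sigma**2) @ (phi * (1.0 - phi))  # deterministic term in (3)
    return np.exp(b @ phi + correction)  # shape (n_draws, len(omega))


# Hypothetical settings: three frequencies and illustrative sigma values.
rng = np.random.default_rng(0)
omega = np.array([0.5, 1.5, 2.5])
z = draw_random_effects(omega, sigma=[0.5, 0.3, 0.2], n_draws=200_000, rng=rng)
print(z.mean(axis=0))  # each entry should be close to 1
```

The Monte Carlo means only approximate one; larger values of σ make $Z_i$ heavier-tailed, so more draws would be needed for the same accuracy.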
It is rather more natural to express model (2) in additive form by taking logarithms:
$$Y_{i,j} = \log f_i(\omega_j) + \log Z_i(\omega_j) + \varepsilon_{i,j}, \qquad (4)$$
where the $Y_{i,j}$ are the log-periodogram ordinates and the $\varepsilon_{i,j}$ are doubly exponential, or so-called extreme value, random variables. Diggle and al Wasel propose maximum likelihood to infer the parameters of $f_i$, but this is only feasible using Monte Carlo techniques. In particular, their proposal uses a rather complicated Monte Carlo procedure, whose complexity limits the range of models that can be fitted for $f_i$ and renders impractical the calculation of, for example, standard errors. One feature of our work is the extension of the implicit hierarchical structure of model (4) into a fully Bayesian framework. As suggested by Müller (1997), this then admits treatment through Markov chain Monte Carlo, enabling considerably more complete inferences to be drawn. Details of this approach for the parametric model (4) are given in Iannaccone (1999). In this paper we restrict attention to inferences on the semiparametric model, overcoming the necessity of specifying a parametric form for $f_i$ in (4).

3. NONPARAMETRIC SPECTRAL DENSITY ESTIMATION

This section follows closely Carter and Kohn (1997). Consider model (4) without the random effect term for the periodogram ordinates of a single individual:
$$Y_j = \log f(\omega_j) + \varepsilon_j, \qquad (5)$$
where an obvious collapse of the index notation has been adopted. To avoid the imposition of a parametric form on $f$, Carter and Kohn (1997) suggest spline smoothing in accordance with the error structure implied by (5). In the context of nonparametric smoothing, splines have a number of desirable properties, including optimality and numerical efficiency (see, for example, Green and Silverman (1994)). Based on spline regression results from Wahba (1978), Wecker and Ansley (1983) show that model (5) can be formulated as a dynamic linear system. This results from the discretization of a stochastic differential equation representation for splines, which in turn lends itself to resolution by standard MCMC methods. Specifically, the spline formulation leads to the linear state-space model for
$$\alpha_j = \left( \log f(\omega_j),\ \left. \frac{d \log f(\omega)}{d\omega} \right|_{\omega = \omega_j} \right)'$$
given by
$$Y_j = F'\alpha_j + \varepsilon_j, \qquad \alpha_j = G\alpha_{j-1} + \tau u_j, \qquad (6)$$
where $F = (1, 0)'$,
$$G = \begin{pmatrix} 1 & \delta \\ 0 & 1 \end{pmatrix}$$
and $u_j$ is bivariate normal with zero mean and variance matrix
$$V = \begin{pmatrix} \delta^3/3 & \delta^2/2 \\ \delta^2/2 & \delta \end{pmatrix},$$
where $\delta = 2\pi/n$. The parameter $\tau$ is a smoothing parameter which can either be pre-specified or estimated as part of the overall inference, subject to a prior specification. In particular, it can be seen from equation (6) how the dynamic system formulation imposes smoothness on the population spectrum $f$: small values of $\tau$ will lead to greater similarity in successive values of the $\log f(\omega_j)$ compared with larger values of $\tau$, for which the sequence of $\log f(\omega_j)$ values can be highly variable. A more direct interpretation of $\tau$ can also be made as the ‘equivalent degrees of freedom’, facilitating a comparison between parametric and nonparametric models in terms of smoothness.

Inference for dynamic models using MCMC methodology is now a well-established technique (for example, Shephard (1994); de Jong and Shephard (1995)). The procedure would be particularly straightforward if the $\varepsilon_j$ were normally distributed instead of having an extreme value distribution. Carter and Kohn (1997) resolve this difficulty by approximating the extreme value distribution with a mixture of five normals and introducing an allocation variable for each of the $\varepsilon_j$. Conditional on the complete set of allocation variables the dynamic system reduces to a linear Gaussian system of the kind handled by the Kalman filter (for example, Harvey (1989)). This induces a neat conjugacy which means posterior conditionals are available in closed form, leading to a simple Gibbs sampler scheme. The only additional requirement is an update of the allocation variables. Details of the algorithm in the more general setting of a semiparametric random effect model to be introduced in the next section are postponed until the Appendix.

4. SEMIPARAMETRIC RANDOM EFFECT MODEL

4.1 Model description
For ease of exposition we assume that $f_i = f$ for each individual $i$ in model (4), though in Section 5 we give an example where covariate adjustments are also made. As explained above, the novelty of this paper is the replacement of a parametric form for the baseline population spectral density $f$ in equation (4) with a nonparametric form as in the Carter and Kohn (1997) formulation. The formulation is equivalent to model (6) but with an additive random term to explain extra-variation between individuals. Equivalently, conditional on the random effects $B_{i,s}$ in formulation (3), and defining $Y_j = (Y_{1,j}, \ldots, Y_{r,j})'$ and $S_j = (S_{1,j}, \ldots, S_{r,j})'$ with
$$S_{i,j} = \log(Z_i(\omega_j)) = \sum_{s=0}^{q} \left[ \phi_s(\omega_j) B_{i,s} + \tfrac{1}{2} \sigma_s^2 \phi_s(\omega_j)\{1 - \phi_s(\omega_j)\} \right], \qquad (7)$$
the model reduces to the dynamic system
$$Y_j = F'\alpha_j + S_j + \varepsilon_j, \qquad \alpha_j = G\alpha_{j-1} + \tau u_j, \qquad (8)$$
where $\varepsilon_j$ is a vector of independent doubly exponential variables, independent for each $j$,
$$F = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \end{pmatrix},$$
and $\alpha_j$, $G$, $u_j$ and $\tau$ are as in equation (6). Conditional on the random $\{B_{i,s}\}$, a Gibbs sampler scheme similar to that in Carter and Kohn (1997), but with allowance for independent replicate observations, can be used to update the $\alpha_j$ and $\tau$. Conditional on these values an update of the $\{B_{i,s}\}$ and associated hyper-parameters can be carried out in a similar way to that developed for the parametric model described in Section 2. Details of the algorithm are given in the Appendix.
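To make the structure of the dynamic system concrete, the following Python sketch builds the matrices $F$, $G$ and $V$ and simulates log-periodogram ordinates forward from the state equation. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the initial state `alpha0`, the default of zero random effects and the seed are all hypothetical, and the errors are generated as logarithms of standard exponential variables, which is what taking logs in model (2) implies.

```python
import numpy as np


def state_matrices(n, r):
    """Return F (2 x r), G and V for series length n and r individuals."""
    delta = 2.0 * np.pi / n
    F = np.vstack([np.ones(r), np.zeros(r)])
    G = np.array([[1.0, delta], [0.0, 1.0]])
    V = np.array([[delta**3 / 3.0, delta**2 / 2.0],
                  [delta**2 / 2.0, delta]])
    return F, G, V


def simulate_log_periodograms(n, r, tau, alpha0, S=None, seed=0):
    """Simulate states alpha_j and observations Y_j = F' alpha_j + S_j + eps_j.

    alpha0 is an assumed starting state (log f and its derivative);
    S holds the random effect terms S_{i,j}, taken as zero if omitted.
    """
    rng = np.random.default_rng(seed)
    m = (n - 1) // 2
    F, G, V = state_matrices(n, r)
    chol = np.linalg.cholesky(V)  # u_j = chol @ z has variance matrix V
    S = np.zeros((m, r)) if S is None else np.asarray(S, dtype=float)
    alpha = np.asarray(alpha0, dtype=float)
    states = np.empty((m, 2))
    Y = np.empty((m, r))
    for j in range(m):
        alpha = G @ alpha + tau * (chol @ rng.standard_normal(2))
        eps = np.log(rng.exponential(size=r))  # log of standard exponential errors
        states[j] = alpha
        Y[j] = F.T @ alpha + S[j] + eps
    return states, Y


# Hypothetical example: 60 time points, 8 individuals, no random effects.
states, Y = simulate_log_periodograms(n=60, r=8, tau=1.0, alpha0=[0.0, 0.0])
print(states.shape, Y.shape)
```

Setting `tau=0.0` removes the state noise, so the second state component stays fixed and the simulated log-spectrum is exactly linear in frequency, which gives a quick check of the role of $\tau$ as a smoothing parameter.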
4.2 Simulated examples
To verify the efficacy of the proposed MCMC scheme, a number of simulated examples were tried. In particular, two families of functions were adopted for the population spectrum $f$, together with a range of parameter settings for the random effect components based on a population of eight individuals. For $f$ we tried both the parametric model of Diggle and al Wasel (1997):
$$f(\omega_j) = \beta_1 \exp\{\beta_2 \cos(\omega_j) + \beta_3 \sin(\omega_j)\}; \qquad (9)$$
and the model proposed by Wahba (1983) and adopted by Carter and Kohn (1997):
$$\log f(\omega_j) = \tfrac{1}{3}\{B_{10,5}(\omega_j) + B_{\kappa,\kappa}(\omega_j) + B_{5,10}(\omega_j)\}, \qquad (10)$$
for specified $\kappa$, where the $B_{r,s}$ are standard beta density functions with parameters $r$ and $s$, standardized to have domain $[0, \pi]$. In Figure 1 we show results for a subset of the simulations, including the true population spectrum, sample periodograms for each of the individuals and posterior mean estimates of the population spectrum values, assuming $q = 2$ in model (7), together with 95% credible intervals. For each example the algorithm appears to be working well in inferring the population structure. More detailed analyses, not shown, also reveal that the random effect components for each individual, together with the hyper-parameters of the random effect distribution, are also accurately estimated.

Both formal and informal checks of convergence were applied to all parameters in the MCMC. The only parameters with a Metropolis step are the random effect variance parameters, $\sigma_s^2$, though reasonable mixing was obtained across a range of random walk settings for the chain. For all other parameters convergence was also observed to be fast with the chain exhibiting good mixing properties. In fact, substantially fewer problems of convergence were encountered compared with an MCMC implementation of the parametric version of the model described in Section 2, where difficulties of confounding arose between the parametric specification of $f$ using the model proposed by Diggle and al Wasel (1997) and the random effect components. In that case a hierarchical re-centering parametrization of the random effect structure was necessary to speed up convergence: see Iannaccone (1999) for details. The examples shown here are typical of many others also studied in Iannaccone (1999).

At the suggestion of a referee we have also tried fitting our model to sample periodograms of data generated from random effect structures falling outside of class (2). In particular, we have simulated data from AR processes of the type
$$X_{i,t} = \phi_i X_{i,t-1} + \varepsilon_{i,t} \qquad (11)$$
where the $\phi_i$ are autoregressive parameters generated from a distribution with $|\phi_i| < 1$, which ensures stationarity. Our general findings in such cases are that although some bias is induced in the estimation of individual spectra due to model mis-specification, the estimates are still reasonable in terms of their accuracy, the population ‘norm’ is estimated extremely well and the model retains excellent discriminatory power for distinguishing population and individual effects.

5. BIOMEDICAL EXAMPLES

Diggle and al Wasel (1997) give a range of biomedical examples to illustrate the utility of the random effect model (2) for handling variation in observed time series across individuals in longitudinal medical trials. The first example concerns a study of the variation in the level of the LH hormone in blood. It is thought that the pattern of variation in such hormonal concentration plays a primary role in affecting
Fig. 1. Inference for semiparametric model on each of four simulated datasets. In all cases, there is a sample of eight individuals whose simulated log-periodograms are shown (dotted curves). True population spectra are also shown (long dashes), together with posterior mean estimate (continuous curve) and 95% credibility intervals (short dashes). The top two panels correspond to the Diggle and al Wasel model (9) with (a) β = (0.66, 2.25, −3.10), σ = (1.1, 0.7, 0.4); (b) β = (0.66, 2.25, −3.10), σ = (0.8, 0.5, 1). The bottom two panels correspond to the Wahba model (10) with κ = 10 and (c) σ = (1.3, 0.8, 0.4); (d) σ = (0.8, 0.6, 1.2).
an individual’s physiology, rather more than the mean level. For this reason understanding the temporal behaviour of such processes within individuals, and, moreover, the variations of such patterns across individuals, is of considerable interest. For the reasons outlined in Section 1, modelling variations across individuals by means of random variations in sample periodograms is a natural way to model such phenomena; we now consider the efficacy of the semiparametric version of model (2). The measurements were made on each of eight healthy adults at intervals of one minute for a period of one hour. The data were then subjected to a simple moving average filter in order to remove empirically a low-frequency variation effect in the data which was of no physiological importance. In the Diggle and al Wasel (1997) model, a systematic factor was included in the model for f in order to account for this smoothing; in our nonparametric analysis such a correction seems superfluous and is therefore not included. Sample log-periodogram transforms of the data for each individual are shown in Figure 2. It is reasonably clear from this figure that there is a tendency for all individuals to have a peak in their periodogram somewhere between ω = 1 and ω = 1.5, that most individuals have low spectra at low frequencies as a consequence of the filtering and that there is a tendency for the spectra to increase at high frequencies. Thus, the individuals have common features to their spectral coordinates, but equally, there is
Fig. 2. Sample log-periodograms of LH hormone process for sample of eight individuals.
considerable variation across the individuals, which renders implausible the modelling of the time series as replicates of a common process.

As for the simulated examples, the MCMC routine was found to behave well and converge quickly. Figure 3 shows the posterior mean estimate and 95% credible interval of the population log-spectrum using the model described in Section 4, assuming $q = 2$ in equation (3). Also shown in Figure 3 are empirical estimates of the population log-spectrum based on averaging the sample log-periodograms and correction for non-zero means, together with corresponding 95% confidence intervals. At least by eye, the estimate seems to capture the typical pattern of the sample individuals. The posterior mean estimates of the random effect variances, with posterior standard deviations in parentheses, are obtained as $\hat{\sigma}_0 = 1.48$ (0.33), $\hat{\sigma}_1 = 0.48$ (0.28) and $\hat{\sigma}_2 = 0.62$ (0.38), indicating reasonably substantial variation in both the scale and phase of time variation across individuals.

For comparison, Figure 4 shows the estimated nonparametric population spectrum with that obtained using the parametric model (9) of Diggle and al Wasel (1997), each superimposed on a graph showing the sample mean across the individual periodograms. The estimate of the parametric model is slightly different from the estimate reported in Diggle and al Wasel (1997), presumably due to the alternative method of inference, but the qualitative features are the same in any case. The posterior mean and standard deviation of $\tau$ are 1.17 and 0.19 respectively, so that taking into account the interpretation of this parameter as an ‘equivalent number of degrees of freedom’, the inferred nonparametric estimate is, in a certain sense, smoother than the corresponding parametric estimate.
Furthermore, it is perhaps the case that the nonparametric estimate gives a more faithful representation of the typical behaviour of sample values, and in particular the peak of the population spectrum now occurs at a frequency that seems to have greater consistency with the individual series. This suggests that the parametric family (9) may be overly restrictive in its structure. As might be expected for its extra flexibility, there is an apparent cost to the nonparametric analysis in terms of sampling variation. As a second example we report briefly on a second dataset considered by Diggle and al Wasel (1997). These data are also time series of the LH hormone in blood, recorded every minute for one hour, but this time on seven post-menopausal women with a history of non-functional ovary behaviour. As in the
Fig. 3. Estimated population log-spectrum for LH hormone process based on sample of eight individuals. The full curve corresponds to posterior mean estimate of population log-spectrum, log f, with 95% credible intervals shown as dashed curves. Points correspond to mean-corrected empirical estimates with approximate 95% confidence intervals shown as vertical bars.
previous example, the data were filtered to remove low-frequency variation effects. In this study, however, the women were subjected to a hormone replacement therapy and the purpose of the trial was to examine whether the therapy has any apparent effect on the spectral characteristics of LH secretion. The experiment was repeated on each woman, once before and once after therapy, with the aim of assessing the evidence for a treatment effect on temporal characteristics, subject to allowance for a between-individual effect. With this in mind, Diggle and al Wasel proposed the sample periodogram model
$$I_{i,h}(\omega_j) = f_h(\omega_j) Z_i(\omega_j) U_{i,j,h} \qquad (12)$$
for $i = 1, \ldots, 7$; $j = 1, \ldots, 26$; $h = 1, 2$, which is an obvious extension of model (2), but with an index $h$ to distinguish between measurements before and after therapy. Hence, they assumed that the random effect perturbation $Z_i$ is constant for each individual before and after therapy, but that the baseline population spectra $f_1$ and $f_2$ are potentially different due to therapy effects. They also assumed each of the $f_h$ to have parametric form (9); our alternative is to assume a nonparametric form, represented by the dynamic systems of Section 4 for each of the two regimes. This requires straightforward modification to the MCMC algorithm: Iannaccone (1999) gives complete details.

We summarize the analysis by showing a plot of the posterior mean of $\log f_1(\omega) - \log f_2(\omega)$, together with corresponding 95% credible intervals, as a function of $\omega$ in Figure 5. Also shown are naive empirical estimates based on sample log-periodograms averaged across individuals. Clearly, there is absolutely no evidence for a treatment effect on the basis of our analysis. This differs somewhat from Diggle and al Wasel (1997) who report an apparent difference at low frequencies. Our experience of comparing the nonparametric and parametric methodologies leads us to believe that this conclusion is more likely to be a result of possible mis-specification in the $f_h$ than in any genuine systematic effect.
Fig. 4. Comparison of estimates of population spectrum for LH hormone data based, respectively, on parametric (dashed curve) and nonparametric (solid curve) models. In each case the central curve corresponds to the posterior mean estimate and the outer curves to 95% credibility interval limits. Also shown (dotted) is the empirical mean of sample periodograms.
6. DISCUSSION

We have shown in this paper that the random effect model for sample periodograms proposed by Diggle and al Wasel can be generalized to dispense with the necessity of a parametric form for the population spectrum. This has required the reformulation of the model in Bayesian hierarchical terms, and the recasting of the population spectrum as a dynamic system. These changes then permit inference through standard MCMC methodology. There are, in any case, advantages to this alternative way of making inferences as measures of precision are now easily obtained. Simulation studies have shown the inference technique to work well and application to biomedical datasets suggests that the model may be more realistic than its parametric counterpart in representing genuine data behaviour.

A more complex example has shown that simple covariate effects can also be handled as part of the inference, though this also highlights an area for further research. Whilst in model (12) it is assumed that the treatment affects the baseline population spectra, there is no assumed effect on the between-individual variation. Though, in principle, it is straightforward to extend the methodology to enable differences in these effects across the treatment regimes, it remains to be seen whether biomedical datasets, perhaps of a larger scale than that considered here, display sufficiently rich structure that they can be usefully modelled in this way.

A remaining issue concerns interpretation of the fitted model. For a continuous-time stochastic process, the spectrum $f(\omega)$ is a non-negative valued function which distributes power over the positive half-line. When a process of this type is sampled at discrete time intervals, say $\delta$, power at frequencies higher than the Nyquist frequency, $\omega_0 = \pi$ radians, corresponding to one cycle every $2\delta$ time units, is redistributed amongst lower frequencies, a phenomenon known as aliasing (Priestley, 1981).
Thus, the spectrum of the discrete-time process differs from that of the underlying continuous process unless the latter has negligible power beyond the Nyquist frequency. Furthermore, any discrete-time spectrum
Fig. 5. Posterior mean (solid curve) and 95% credible intervals (dashed curves) for log f2(ω) − log f1(ω) corresponding to the difference in population spectra of LH hormone due to a treatment effect on post-menopausal women. Also shown is an empirical estimate (dotted curve) based on sample individual periodograms.
which has been derived in this way from an underlying continuous-time spectrum must obey certain constraints, in particular $d \log f(\omega)/d\omega = 0$ at $\omega = \pi$. For standard spectral estimation it is not difficult using MCMC techniques to impose such a constraint, as suggested by Carter and Kohn (1997). However, there are a number of difficulties in the extension to the random effect model. First, in the examples of Diggle and al Wasel that motivated our work, the sample spectral values were not recorded at 0 or $\pi$. This is inconvenient, but not especially problematic for inference within an MCMC framework. A more fundamental difficulty is that whilst it is straightforward to impose boundary constraints on the population spectrum, it is not so easy on the individual spectra. To this extent, our model suffers the same weakness as the original model of Diggle and al Wasel, namely that the modelled discrete-time spectrum does not have a direct interpretation as the spectrum of a continuous-time process. Our main objective in this study, however, like that of Diggle and al Wasel, has been to allow for both within- and between-individual effects so as to allow comparative assessment of spectral properties, rather than to determine the actual spectrum for the underlying continuous-time process. The best way to allow for between-subject variation whilst retaining a strict continuous-time interpretation remains an open question.
ACKNOWLEDGEMENTS

This work has been supported by EU TMR network ERB-FMRX-CT96-0095 on ‘Computational and statistical methods for the analysis of spatial data’. We are grateful to two referees for constructive criticism and to Peter Diggle for many helpful discussions.
APPENDIX: MCMC DETAILS

A.1 Updating population spectrum values and parameters
As discussed, conditional on the values of the random effect variables, model (8) collapses to the dynamic model of Carter and Kohn (1997), but with repeat observations (one per individual for each frequency). That is, the observation equation of (8) is modified to become

$$Y^*_j = F\alpha_j + \epsilon_j, \tag{13}$$

where $Y^*_j = Y_j - S_j$. Temporarily, we also assume that the observation error vectors $\epsilon_j$ are normally distributed with mean 0 and diagonal variance matrix $W_j$. This means that, with a simple modification to account for the repeat observations within the $Y^*_j$ vector, standard algorithms can be adopted to update these parameters within an overall MCMC scheme. Specifically, a forward pass of the Kalman filter leads to the distribution of $\alpha_m \mid \ldots$; then, iterating back through the sequence using $\alpha_j \mid \alpha_{j+1}$ leads to a complete algorithm for simulating from the joint distribution of the $\{\alpha_j\}$ conditional on the complete observed sequence $\{Y^*_j\}$.

Using standard notation $\alpha_j \mid D_j$ to denote the distribution of $\alpha_j$ conditional on $D_j = \{Y^*_1, \ldots, Y^*_j\}$, with conditioning on other model parameters assumed implicitly, suppose that $\alpha_j \mid D_j \sim N(p_j, C_j)$. The Kalman filter is obtained by recursion of the updating equation

$$\alpha_{j+1} \mid D_{j+1} \sim N(p_{j+1}, C_{j+1}). \tag{14}$$

This follows since, first, $\alpha_{j+1} \mid D_j \sim N(a_{j+1}, R_{j+1})$, where

$$a_{j+1} = G p_j, \qquad R_{j+1} = G C_j G' + V.$$

Next, it follows that $Y^*_{j+1} \mid D_j \sim N(f_{j+1}, Q_{j+1})$, where

$$f_{j+1} = F a_{j+1}, \qquad Q_{j+1} = F R_{j+1} F' + W_{j+1}.$$

Finally, defining the innovation error $e_{j+1} = Y^*_{j+1} - f_{j+1}$, equation (14) follows with

$$p_{j+1} = a_{j+1} + R_{j+1} F' Q_{j+1}^{-1} e_{j+1} \quad \text{and} \quad C_{j+1} = R_{j+1} - R_{j+1} F' Q_{j+1}^{-1} F R_{j+1}.$$
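The forward recursion above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `kalman_step` and all numerical values of G, V, F and W below are hypothetical placeholders, since model (8) is not restated in this appendix. F is assumed to have one row (1, 0) per individual, so that every individual observes the first component of the state.

```python
import numpy as np

def kalman_step(p, C, y_star, G, V, F, W):
    """One forward step (p_j, C_j) -> (p_{j+1}, C_{j+1}) given Y*_{j+1}."""
    a = G @ p                        # prior mean a_{j+1}
    R = G @ C @ G.T + V              # prior variance R_{j+1}
    f = F @ a                        # forecast mean f_{j+1}
    Q = F @ R @ F.T + W              # forecast variance Q_{j+1}
    e = y_star - f                   # innovation e_{j+1}
    gain = R @ F.T @ np.linalg.inv(Q)
    p_new = a + gain @ e             # posterior mean p_{j+1}
    C_new = R - gain @ F @ R         # posterior variance C_{j+1}
    return p_new, C_new, a, R

# Hypothetical set-up: two-dimensional state, r = 3 individuals.
r = 3
G = np.array([[1.0, 1.0], [0.0, 1.0]])               # assumed transition matrix
V = 0.01 * np.array([[1 / 3, 1 / 2], [1 / 2, 1.0]])  # assumed state variance
F = np.tile([1.0, 0.0], (r, 1))                      # r x 2, rows (1, 0)
W = np.diag([0.60, 0.69, 1.01])                      # diagonal observation variances
p0, C0 = np.zeros(2), 100.0 * np.eye(2)              # vague initial state
y = np.array([0.2, -0.1, 0.4])                       # made-up Y*_1 vector

p1, C1, a1, R1 = kalman_step(p0, C0, y, G, V, F, W)
```

Because the repeat observations enter only through the r rows of F and the r diagonal entries of W, the same function serves for any number of individuals.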
Table 1. Means, variances and probability weights of each component in the mixture of five normals approximation to the extreme value distribution

Mixture component (k)   Probability ($p_k$)   Mean ($\mu_k$)   Variance ($\xi_k^2$)
1                       0.19                  −2.20            1.93
2                       0.11                  −0.80            1.01
3                       0.27                  −0.55            0.69
4                       0.25                  −0.035           0.60
5                       0.18                  0.48             0.29
Details of the Bayesian computations leading to these results may be found in West and Harrison (1997), for example. Thus, with an initial specification

$$\alpha_0 \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} c_0 & 0 \\ 0 & c_0 \end{pmatrix} \right),$$

where the value of $c_0$ can be chosen large to reflect vague prior knowledge about the population spectrum, the above equations can be applied recursively to obtain the conditional distribution of the sequence $\alpha_j \mid D_j$, $j = 1, \ldots, m$. In the context of a Gibbs sampler algorithm, this can then be used as the basis for the updating of $\alpha_m$ at each iteration. The complete update of the $\alpha_j$ sequence is then made by successive iteration back through the sequence, using successively for $j = m - 1, m - 2, \ldots, 1$:

$$\alpha_j \mid (\alpha_{j+1}, D_{j+1}) \sim N(h_j, H_j),$$

where

$$h_j = p_j + B_j(\alpha_{j+1} - a_{j+1}), \qquad H_j = C_j - B_j R_{j+1} B_j',$$

with $B_j = C_j G' R_{j+1}^{-1}$.

As explained above, because of the asymptotic assumptions being made on the log-periodogram ordinates, the observation errors $\epsilon_j$ in (8) are not normally distributed but have the extreme value distribution. Again, we follow Carter and Kohn (1997) in approximating this distribution by a mixture of five normals, with weights, means and variances as given in Table 1. For each component $\epsilon_{i,j}$ we then introduce an allocation variable $K_{i,j}$ indicating which of the five mixture components $\epsilon_{i,j}$ is allocated to. More precisely,

$$P(\epsilon_{i,j} = x, K_{i,j} = k) = P(K_{i,j} = k) \times P(N_k = x),$$

where $N_k$ is a random variable having the normal distribution $N(\mu_k, \xi_k^2)$ corresponding to allocation $k$. Thus, $\epsilon_{i,j} \mid (K_{i,j} = k_{i,j})$ has distribution $N(\mu_{k_{i,j}}, \xi^2_{k_{i,j}})$ and so, conditional on the complete set of $\{K_{i,j}\}$, the $\alpha_j$ may be simulated as described above, on replacement of $Y^*_j$ by $Y^*_j - \mu_j$, where $\mu_j = (\mu_{k_{1,j}}, \ldots, \mu_{k_{r,j}})'$, and setting $W_j = \mathrm{diag}(\xi^2_{k_{1,j}}, \ldots, \xi^2_{k_{r,j}})$.
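The backward-sampling draw $\alpha_j \mid (\alpha_{j+1}, D_{j+1}) \sim N(h_j, H_j)$ used in the complete update of the state sequence can be sketched as follows. The function name `backward_draw` and the numerical values of G, V, p and C are hypothetical placeholders chosen only so that the example runs; they are not taken from the paper.

```python
import numpy as np

def backward_draw(p, C, alpha_next, G, V, rng):
    """Draw alpha_j given alpha_{j+1} and the filtered moments (p_j, C_j)."""
    a_next = G @ p                        # a_{j+1}
    R_next = G @ C @ G.T + V              # R_{j+1}
    B = C @ G.T @ np.linalg.inv(R_next)   # B_j = C_j G' R_{j+1}^{-1}
    h = p + B @ (alpha_next - a_next)     # conditional mean h_j
    H = C - B @ R_next @ B.T              # conditional variance H_j
    H = (H + H.T) / 2.0                   # symmetrise against rounding error
    return rng.multivariate_normal(h, H), h, H

# Hypothetical filtered moments and system matrices.
G = np.array([[1.0, 1.0], [0.0, 1.0]])
V = 0.1 * np.array([[1 / 3, 1 / 2], [1 / 2, 1.0]])
p = np.array([0.3, -0.1])
C = np.array([[0.5, 0.1], [0.1, 0.3]])
alpha_next = np.array([0.4, 0.0])          # value already drawn for alpha_{j+1}

rng = np.random.default_rng(0)
draw, h, H = backward_draw(p, C, alpha_next, G, V, rng)
```

In a full sweep this function would be called for j = m − 1, ..., 1, each time passing the value just drawn for the next state together with the filtered moments stored during the forward pass.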
The remaining step corresponds to the update of the allocation variables. But this is straightforward using the relation

$$p(K_{i,j} = k_{i,j} \mid \ldots) = \frac{p(Y_{i,j} \mid K_{i,j} = k_{i,j}, \ldots)\, p(K_{i,j} = k_{i,j})}{\sum_{s=1}^{5} p(Y_{i,j} \mid K_{i,j} = s, \ldots)\, p(K_{i,j} = s)},$$

applied successively in a Gibbs sampler of each allocation variable in turn. In this expression, $p(Y_{i,j} \mid K_{i,j} = k_{i,j}, \ldots)$ has a normal density according to the observation equation (8), while $p(K_{i,j} = k_{i,j})$ is given by the appropriate mixing probability $p_k$ in Table 1.

Finally, unless it is to be set at a fixed constant value, the smoothing parameter $\tau$ should be updated. Carlin et al. show that, subject to the improper prior specification $p(\tau^2) \propto 1$,

$$p(\tau^2 \mid \ldots) \propto (\tau^2)^{-m+1} \exp\left\{ -\frac{\sum_{j=2}^{m} d_j' V^{-1} d_j}{2\tau^2} \right\},$$

where $d_j = \alpha_j - G\alpha_{j-1}$, so that $\tau^2$ can be updated by simulation from an inverse Gamma distribution IG(a, b) with $a = m - 2$ and $b = \sum_{j=2}^{m} d_j' V^{-1} d_j / 2$.
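The two updates described above, for the allocation variables and for $\tau^2$, can be sketched as follows. The function names are hypothetical, and the residual passed to `allocation_probs` is assumed to be $Y_{i,j} - \alpha_{1,j} - \log Z_i(\omega_j)$, that is, the current value of $\epsilon_{i,j}$. The constants are the Table 1 values; the inverse Gamma draw uses the fact that b/X is IG(a, b) when X is Gamma(a, 1).

```python
import numpy as np

# Mixture weights, means and variances from Table 1.
P_K = np.array([0.19, 0.11, 0.27, 0.25, 0.18])
MU_K = np.array([-2.20, -0.80, -0.55, -0.035, 0.48])
XI2_K = np.array([1.93, 1.01, 0.69, 0.60, 0.29])

def allocation_probs(resid):
    """Conditional probabilities of the five components given one residual."""
    dens = np.exp(-0.5 * (resid - MU_K) ** 2 / XI2_K) / np.sqrt(2 * np.pi * XI2_K)
    w = P_K * dens
    return w / w.sum()

def draw_allocation(resid, rng):
    """Gibbs draw of K_{i,j}, returned as a label in 1..5."""
    return int(rng.choice(5, p=allocation_probs(resid))) + 1

def draw_tau2(alpha, G, V, rng):
    """Draw tau^2 from IG(m - 2, b); alpha is an (m, 2) array of states, m >= 3."""
    m = alpha.shape[0]
    d = alpha[1:] - alpha[:-1] @ G.T          # rows d_j = alpha_j - G alpha_{j-1}
    b = 0.5 * np.einsum("ji,ik,jk->", d, np.linalg.inv(V), d)
    return b / rng.gamma(shape=m - 2)         # b / Gamma(a, 1) is IG(a, b)

rng = np.random.default_rng(0)
k_new = draw_allocation(0.3, rng)             # made-up residual value
```

Since each allocation variable depends on the data only through its own residual, the draws for different (i, j) can be made independently within a sweep.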
A.2 Updating random effect parameters
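Conditioning on the $\alpha$, the $f(\cdot)$ terms are completely specified in equation (4). Hence, now setting $Y^*_{i,j} = Y_{i,j} - \alpha_{1,j}$ we obtain $Y^*_{i,j} = \log Z_i(\omega_j) + \epsilon_{i,j}$, with

$$\log Z_i(\omega) = \sum_{s=0}^{q} \left[ \phi_s(\omega) B_{i,s} + \tfrac{1}{2} \sigma_s^2 \phi_s(\omega) \{1 - \phi_s(\omega)\} \right]. \tag{15}$$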
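As a brief check on the role of the second term in (15), the following calculation assumes that the $B_{i,s}$ are independent with $B_{i,s} \sim N(-\sigma_s^2/2, \sigma_s^2)$. This distribution is not restated in this appendix; it is an assumption inferred from the fact that (15) and (16) agree exactly when $B_{i,s} = \sigma_s B^*_{i,s} - \sigma_s^2/2$. Under that assumption, for fixed $\omega$,

$$
\begin{aligned}
E\left[ e^{\phi_s(\omega) B_{i,s}} \right] &= \exp\left\{ -\tfrac{1}{2}\sigma_s^2 \phi_s(\omega) + \tfrac{1}{2}\sigma_s^2 \phi_s^2(\omega) \right\} = \exp\left[ -\tfrac{1}{2}\sigma_s^2 \phi_s(\omega)\{1 - \phi_s(\omega)\} \right], \\
E\left[ Z_i(\omega) \right] &= \prod_{s=0}^{q} E\left[ e^{\phi_s(\omega) B_{i,s}} \right] \exp\left[ \tfrac{1}{2}\sigma_s^2 \phi_s(\omega)\{1 - \phi_s(\omega)\} \right] = 1 .
\end{aligned}
$$

On this reading, the term $\tfrac{1}{2}\sigma_s^2 \phi_s(\omega)\{1 - \phi_s(\omega)\}$ acts as a correction that keeps the multiplicative perturbation $Z_i(\omega)$ centred at one.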
As before, the $\epsilon_{i,j}$ are independent extreme value innovations which can be approximated as normals conditional on the allocation variables: this permits standard conjugacy arguments in the updates of the random effect parameters. Specifically, we first re-express model (15) as

$$\log Z_i(\omega) = \sum_{s=0}^{q} \sigma_s \phi_s(\omega) B^*_{i,s} - \frac{1}{2} \sum_{s=0}^{q} \sigma_s^2 \phi_s^2(\omega), \tag{16}$$

where now $B^*_{i,s} \sim N(0, 1)$. Then, it is straightforward to show that

$$B^*_{i,s} \mid \cdots \sim N\left( \frac{\sigma_s \nu_{i,s}}{\sigma_s^2 \sum_{j=1}^{m} \phi_s^2(\omega_j)/\xi^2_{k_{i,j}} + 1},\ \frac{1}{\sigma_s^2 \sum_{j=1}^{m} \phi_s^2(\omega_j)/\xi^2_{k_{i,j}} + 1} \right)$$

with

$$\nu_{i,s} = \sum_{j=1}^{m} \frac{\{\tilde{Y}_{i,j} + \sigma_s \phi_s(\omega_j) B^*_{i,s}\}\, \phi_s(\omega_j)}{\xi^2_{k_{i,j}}}$$

and

$$\tilde{Y}_{i,j} = Y^*_{i,j} - \mu_{k_{i,j}} - \sum_{l=0}^{q} \sigma_l \phi_l(\omega_j) B^*_{i,l} + \frac{1}{2} \sum_{l=0}^{q} \sigma_l^2 \phi_l^2(\omega_j).$$

Hence, the $B^*_{i,s}$ are easily updated by simulation from independent normals.

The remaining step is the update of the $\sigma_s$ parameters in equation (16). There is no simple conjugacy available for these parameters so we adopt vague gamma priors, Ga(a, b), with a = b = 0.001 and use a random walk Metropolis step to update each $\sigma_s$ parameter in turn. This requires an expression, up to proportionality, for the conditional distribution of $\sigma_s$, which follows from
$$p(Y_1, \ldots, Y_m \mid \sigma_s, \ldots) \propto \exp\left\{ -\frac{1}{2} \sum_{i=1}^{r} \sum_{j=1}^{m} \frac{\tilde{Y}_{i,j}^2}{\xi^2_{k_{i,j}}} \right\}$$

and simple use of Bayes' theorem.

REFERENCES

BRILLINGER, D. R. (1973). The analysis of time series collected in an experimental design. In Krishnaiah, P. R. (ed.), Multivariate analysis III, New York: Academic, pp. 241–256.
CARLIN, B. P., POLSON, N. G. AND STOFFER, D. S. (1992). A Monte Carlo approach to non-normal and nonlinear state space modeling. Journal of the American Statistical Association 87, 493–500.
CARTER, C. K. AND KOHN, R. (1994). On Gibbs sampling for state space models. Biometrika 81, 541–553.
CARTER, C. K. AND KOHN, R. (1997). Semiparametric Bayesian inference for time series with mixed spectra. Journal of the Royal Statistical Society, Series B 59, 255–268.
DE JONG, P. AND SHEPHARD, N. (1995). The simulation smoother for time series models. Biometrika 82, 339–350.
DIGGLE, P. J. (1990). Time Series: A Biostatistical Introduction. Oxford: Oxford University Press.
DIGGLE, P. J. AND AL WASEL, I. (1997). Spectral analysis of replicated biomedical time series. Applied Statistics 46, 31–71.
GREEN, P. J. AND SILVERMAN, B. W. (1994). Nonparametric Regression and Generalized Linear Models. London: Chapman and Hall.
HARVEY, A. C. (1989). Forecasting Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
IANNACCONE, R. (1999). Analisi spettrale di serie storiche biomediche, Ph.D. Thesis, Padova University, unpublished.
MÜLLER, P. (1997). Discussion of Diggle, P. J. and al Wasel, I. (1997) Spectral analysis of replicated biomedical time series. Applied Statistics 46, 31–71.
PRIESTLEY, M. B. (1981). Spectral Analysis and Time Series. London: Academic.
SHEPHARD, N. (1994). Partial non-Gaussian state space models. Biometrika 81, 115–132.
WAHBA, G. (1978). Improper priors, spline smoothing and the problem of guarding against model errors in regression. Journal of the Royal Statistical Society, Series B 40, 364–372.
WAHBA, G. (1983). Bayesian confidence intervals for the cross-validated smoothing spline. Journal of the Royal Statistical Society, Series B 45, 133–150.
WECKER, W. E. AND ANSLEY, C. F. (1983). The signal extraction approach to nonlinear regression and spline smoothing. Journal of the American Statistical Association 78, 81–89.
WEST, M. AND HARRISON, J. (1997). Bayesian Forecasting and Dynamic Models. Berlin: Springer.

[Received February 23, 2000; revised June 13, 2000; accepted for publication June 27, 2000]