∫₀^∞ ∫₀^∞ φ(τ₁ − τ₂) h₂(τ₁, τ₂) dτ₁ dτ₂ ≥ (P/2) ∫₀^∞ h₂(λ, λ) dλ

(2.81)
Therefore, the improvement of the Wiener model prediction depends on the autocorrelation properties of the input ensemble and its relation to the second-order kernel (i.e., the nonlinear characteristics of the system). It is conceivable that for some inputs the Wiener model prediction may be worse than the Volterra model prediction of the same order. However, as the input ensemble tends to GWN [i.e., φ(τ₁ − τ₂) = Pδ(τ₁ − τ₂)], the improvement of the Wiener model prediction becomes guaranteed, since the left-hand side of (2.81) becomes twice the right-hand side. It must be emphasized that these statements
apply only to truncated models; the complete models (Volterra or Wiener) give the same output prediction. For an alternate GWN test input of power level P′, the MSE difference between the predictions of the two models is given by
ΔQ(P′) = (2P′ − P) P [∫₀^∞ h₂(λ, λ) dλ]²

(2.82)
which indicates that the reduction in the MSE of the Wiener model prediction increases as the alternate GWN input power level P′ increases (relative to the power level P of the GWN input for which the Wiener series was orthogonalized), but it may become negative if P′ < P/2. For P′ = P, a reduction in prediction MSE occurs (ΔQ > 0), as expected. Note that the rate of this reduction is proportional to the square of the integral of the second-order kernel diagonal (i.e., it depends on the system nonlinear characteristics). This illustrative result is strictly limited to second-order systems and must not be generalized to higher-order systems and models.

2.2.2 Wiener Approach to Kernel Estimation
The estimation of the unknown system kernels is the key task in nonparametric system modeling and identification. As discussed earlier, the primary motivation for the introduction of the Wiener series has been the facilitation of the kernel estimation task. In the general case of an infinite Volterra series, the accurate (unbiased) estimation of the Volterra kernels from input-output data using the methods of Section 2.1.5 is not possible, in principle, due to the unavoidable truncation of the Volterra series that results in input-correlated residuals. The severity of this kernel estimation bias depends on the size of the input-correlated residuals relative to the prediction of the truncated Volterra model. If more high-order terms are included in the truncated model in order to reduce the size of the residuals, then the estimation task becomes more cumbersome and often impractical. Although a practicable solution for this important problem has been recently proposed in the form of trainable network models of high-order systems (see Section 2.3.3), this serious limitation gave initial impetus to the Wiener approach and its variants that were developed to overcome specific practical shortcomings of the initial Wiener methodology. The Wiener approach addresses the problem of biased kernel estimation in the general context of an infinite series expansion by decoupling the various Wiener kernels through orthogonalization of the Wiener functionals. Nonetheless, the resulting Wiener kernels are distinct from the Volterra kernels of the system and can be viewed as having a "structured bias" related to the employed GWN input, as discussed previously. In this section, we present Wiener's original approach to kernel estimation in order to provide historical perspective and allow the reader to appreciate the many subtleties of this modeling problem, although Wiener's approach is not deemed the best choice at present, in light of recent developments that have made available powerful methodologies for the estimation of Volterra (not Wiener) models with superior performance in a practical context. Following presentation of the rudiments of the Wiener approach, we will elaborate on its most popular implementation (the cross-correlation technique) in Section 2.2.3 and discuss its practically useful variants using quasiwhite test inputs in Section 2.2.4. We note that the recommended and most promising estimation methods at present (using
kernel expansions, iterative estimation techniques, and equivalent network models) are presented in Sections 2.3 and 4.3, and they yield Volterra (not Wiener) models.

The orthogonality of the Wiener series allows decoupling of the Wiener functionals through covariance computations and estimation of the Wiener kernels from the GWN input and the corresponding output data. Since the orthogonality of the Wiener functionals is independent of the specific kernel functions involved, a known "instrumental" Wiener functional can be used to isolate each term in the Wiener series (by computing its covariance with the system output) and subsequently obtain the corresponding kernel. For instance, if an mth-order instrumental functional Q_m[q_m; x(t′), t′ ≤ t], constructed with a known kernel q_m(τ₁, ..., τ_m), is used to compute the covariance with the output signal y(t), then
E[y(t)Q_m(t)] = Σ_{n=0}^∞ E[G_n(t)Q_m(t)] = E[G_m(t)Q_m(t)] = m! P^m ∫₀^∞ ··· ∫₀^∞ h_m(τ₁, ..., τ_m) q_m(τ₁, ..., τ_m) dτ₁ ··· dτ_m
(2.83)
since Q_m is orthogonal (i.e., it has zero covariance) with all G_n functionals for m ≠ n. Note that the instrumental functional Q_m(t) has the form of the mth-order Wiener functional given by Equation (2.56) with the kernel q_m replacing the kernel h_m; hence, it can be computed for a given input signal x(t). The "instrumental" kernel q_m(τ₁, ..., τ_m) is judiciously chosen in order to facilitate the evaluation of the unknown kernel h_m(τ₁, ..., τ_m), after the left-hand side of Equation (2.83) is evaluated from input-output data. Wiener suggested the use of a multidimensional orthonormal basis for defining the instrumental kernels. So, if {b_j(τ)} is a complete orthonormal (CON) basis over the range of system memory, τ ∈ [0, μ], then instrumental kernels of the form

q_m(τ₁, ..., τ_m) = b_{j1}(τ₁) ··· b_{jm}(τ_m)
(2.84)
can be used to obtain the expansion coefficients {a_{j1, ..., jm}} of the unknown kernel over the specified CON basis as
a_{j1, ..., jm} = (1/(m! P^m)) E[y(t)Q_m(t)]
(2.85)
where

h_m(τ₁, ..., τ_m) = Σ_{j1} ··· Σ_{jm} a_{j1, ..., jm} b_{j1}(τ₁) ··· b_{jm}(τ_m)

(2.86)
Note that in this case

Q_m(t) = Σ_{l=0}^{[m/2]} ((−1)^l m! P^l / ((m − 2l)! l! 2^l)) v_{j1}(t) ··· v_{j(m−2l)}(t) δ_{j(m−2l+1), j(m−2l+2)} ··· δ_{j(m−1), j(m)}

(2.87)
Figure 2.9 The block-structured Wiener model that is equivalent to the Wiener series for a GWN input when L and M tend to infinity. The input x(t) drives the Laguerre filter bank, whose outputs feed the Hermite CON basis (generating the signals {H_j(t)}) and then the expansion coefficients {γ_{j1}, ..., γ_{jR}} that form the output y(t). The filter bank {b_j} suggested by Wiener was the complete orthonormal (CON) Laguerre basis of functions. Since the Laguerre and Hermite bases are known (selected), the problem reduces to determining the expansion coefficients from input-output data when the input is GWN.
where

v_j(t) = ∫₀^μ b_j(τ) x(t − τ) dτ
(2.88)
and μ and δ_{i,j} denote the system memory and the Kronecker delta, respectively. (The Kronecker delta δ_{i,j} is the discrete-time equivalent of the Dirac delta function (impulse), defined as 1 for i = j and as zero elsewhere.) Since the input x(t) is GWN and the basis functions {b_j} are orthonormal, the signals v_j(t) are independent Gaussian (nonwhite) random processes with zero mean and variance P [Marmarelis, 1979b]. Therefore, the instrumental functionals {Q_m} can be seen as orthogonal multinomials in the variables (v₁, ..., v_m), with a structure akin to multivariate Hermite polynomials. Motivated by this observation, Wiener proposed a general model (for the Wiener class of systems) comprised of a known set of parallel linear filters with impulse response functions {b_j(τ)} (i.e., a filter bank comprising an orthonormal complete basis of functions, such as the Laguerre set) receiving the GWN input signal and feeding the filter-bank outputs into a multiinput static nonlinearity, y = f(v₁, v₂, ...), which he decomposed into the cascade of a known "Hermite-polynomial" component and an unknown "coefficients" component to be determined from the data, as shown in Figure 2.9. The latter decomposition was viewed as an efficient way of implementing the Hermite-like structure of the functionals {Q_m}, although one may question this assertion of efficiency in light of the complexity introduced by the Hermite expansion. The system identification/modeling task then reduces to evaluating the "coefficients" component from input-output data for a GWN input, since all other components are known and fixed. In the original Wiener formulation, the output values of the filter bank {v_i} are viewed as variables completely describing the input signal from −∞ up to the present time [x(τ); τ ≤ t]. Wiener chose the Laguerre set of functions to expand the past (and present) of the
input signal because these functions form a complete orthonormal (CON) basis in the semi-infinite interval [0, ∞) and have certain desirable mathematical properties (which will be described in Section 2.3.2). In addition, the outputs of the Laguerre "filter bank" can be easily generated in analog form by linear ladder R-C circuits (an important issue at that time). When we employ L filters in the filter bank, we use L variables {v_j(t)} to describe the past (and present) of the input signal at each time t. Thus, the system output can be considered a function of L variables and it can be expanded in terms of the CON basis of Hermite polynomials {H_j} as
y(t) = lim_{M,R→∞} Σ_{j1=0}^{M} ··· Σ_{jR=0}^{M} γ_{j1···jR} H_{j1}(v₁) ··· H_{jR}(v_R) + h₀

(2.89)
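As a computational aside, the filter-bank signals v_j(t) of Equation (2.88) are easy to generate in discrete time. The sketch below uses the standard discrete-time Laguerre recursion (a digital analog of the R-C ladder realization mentioned above); the function name, the bank size L, and the Laguerre pole a are illustrative assumptions, not part of Wiener's formulation.

```python
import numpy as np

def laguerre_filterbank(x, L=5, a=0.8):
    """Filter-bank outputs v_j(n), j = 0..L-1, for input x (cf. Eq. 2.88).

    The j-th discrete Laguerre filter is the cascade
    B_j(z) = sqrt(1 - a^2)/(1 - a z^-1) * ((z^-1 - a)/(1 - a z^-1))^j,
    an orthonormal basis on [0, inf) whose memory extent grows with a.
    """
    N = len(x)
    v = np.zeros((L, N))
    for n in range(N):
        vprev = v[:, n - 1] if n > 0 else np.zeros(L)
        v[0, n] = a * vprev[0] + np.sqrt(1 - a**2) * x[n]
        for j in range(1, L):
            # all-pass cascade: v_j(n) = a v_j(n-1) + v_{j-1}(n-1) - a v_{j-1}(n)
            v[j, n] = a * vprev[j] + vprev[j - 1] - a * v[j - 1, n]
    return v

# for a GWN input of power level P, each v_j is Gaussian with variance P
x = np.random.randn(10000)                 # discrete GWN with P = 1
v = laguerre_filterbank(x)
print(np.var(v, axis=1))                   # all close to 1 (orthonormal basis)
```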
Clearly, in practice, both M and R must be finite and determine the maximum order of nonlinearity approximated by the finite-order model, where M is the maximum order of Hermite polynomials used in the expansion. Note that the nonlinear order r is defined by the sum of the indices: (j₁ + ··· + j_R). The Hermite polynomial of jth order is given by
H_j(v) = e^{2βv²} (d^j/dv^j)[e^{−2βv²}]
(2.90)
where the parameter β determines the Gaussian weighting function exp[−2βv²] that defines the orthogonality of the Hermite polynomials as
∫_{−∞}^{+∞} e^{−2βv²} H_{j1}(v) H_{j2}(v) dv = δ_{j1,j2}

(2.91)
The proper selection of the parameter β is important in practice, because it determines the convergence of the Hermite expansion of the static nonlinearity f(v₁, ..., v_L) in conjunction with the GWN input power level P, since the variance of each v_j(t) process is P. For any GWN input, the terms of the multidimensional Hermite expansion in Equation (2.89) are statistically orthogonal (i.e., have zero covariance) and are normalized to unity Euclidean norm, because the joint PDF of the {v_i} processes has the same form as the Hermite weighting function (i.e., multivariate Gaussian). Therefore, the expansion coefficients in Equation (2.89) can be evaluated through the ensemble average:

γ_{i1···iR} = E{[y(t) − h₀] H_{i1}[v₁(t)] ··· H_{iR}[v_R(t)]}
(2.92)
where all terms of the expansion of Equation (2.89) with indices (j₁, ..., j_R) distinct from the indices (i₁, ..., i_R) vanish. The indices i₁ through i_R take values from 0 to M and add up to the order of the estimated Wiener functional component. Note that the multiindex subscript (i₁ ··· i_R) is ordered (i.e., it is nonpermutable) and all possible combinations of the L functions {v_i} taken by R must be considered in the expansion of Equation (2.89). The mean h₀ of the output y(t) must be subtracted in Equation (2.92) because it is separated out in Equation (2.89).

According to the Wiener approach, the coefficients γ_{i1···iR} characterize the system completely and the identification problem reduces to the problem of determining these coefficients through the averaging operation indicated in Equation (2.92). The ensemble
average of Equation (2.92) can be implemented by time averaging in practice, due to the ergodicity and stationarity of the input-output processes. Once these coefficients have been determined, they can be used to synthesize (predict) the output of the nonlinear model for any given input, according to Equation (2.89). Of course, the output prediction for any given input will be accurate only if the model is (nearly) complete, as discussed earlier [Bose, 1956; Wiener, 1958].

This approach, which was deciphered for Wiener's MIT colleagues in the 1950s by Amar Bose (see Historical Note #2), is difficult to apply to physiological systems in a practical context for the following reasons:

1. The form of the output expression is alienating to many physiologists, because it is difficult to assign some physiological meaning to the characterizing coefficients that would reveal some functional features of the system under study.

2. The experimentation and computing time required for the evaluation of the characterizing coefficients is long, because long data records are required in general for reducing the variance of the estimates down to acceptable levels.

For these reasons, this original Wiener approach has been viewed by most investigators as impractical, and has not found any applications to physiology in the originally proposed form. However, variants of this approach have found many applications, primarily by means of the cross-correlation technique discussed in Section 2.2.3 and multinomial expansions of the multiinput static nonlinearity in terms of the orthogonal polynomials {Q_m} given by Equation (2.87) to yield estimates of the expansion coefficients {a_{j1,j2,...}} of the Wiener kernels, as described by Equation (2.85) [Marmarelis, 1987b].

To avoid the complexity introduced by the Hermite expansion, Bose proposed the use of an orthogonal class of functions that he called "gate" functions, which are simply square unit pulses that are used to partition the output function space into nonoverlapping cells (hence the orthogonality of the gate functions) [Bose, 1956]. This formulation is conceptually simple and appears suitable for systems that have strong saturating elements. Nonetheless, it has found very limited applications, due to the still cumbersome model form and the demanding requirement for long input-output data records.

It must be emphasized that the main contribution of Wiener's formulation is in suggesting the decomposition of the general nonlinear model into a linear filter bank (using a complete set of filters that span the system functional space) and a multiinput static nonlinearity receiving the outputs of the filter bank and producing the system output (see Figure 2.9). This is a powerful statement in terms of nonlinear dynamic system modeling, because it separates the dynamics (the filter-bank stage) from the nonlinearities and reduces the latter to a static form that is much easier to represent/estimate, for any given application. In the many possible variants of the Wiener approach, different orthogonal or nonorthogonal bases of functions can be used both for the linear filter bank and the static nonlinearity. We have found that the Laguerre basis (in discrete time) is a good choice for the filter bank in general, as discussed in Section 2.3.2. We have also found that polynomial nonlinearities are good choices in the general case, although combinations with specialized forms (e.g., sigmoidal) may also be suitable in certain cases.
These important issues are discussed further in Section 2.3 and constitute the present state of the art, in connection with iterative (gradient-based) estimation methods and equivalent network structures (see Sections 4.2-4.4).
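To make the estimation procedure of Equation (2.92) concrete, the following sketch estimates the expansion coefficients by time-averaging, reusing the Laguerre filter bank from the previous sketch. It uses normalized probabilists' Hermite polynomials, which are orthonormal under the Gaussian density of each v_j (equivalent to the book's H_j up to the β scaling); the function names and the toy nonlinearity are hypothetical choices for illustration.

```python
import numpy as np
from math import factorial
from itertools import product
from numpy.polynomial.hermite_e import hermeval

def hermite_orthonormal(j, v, P=1.0):
    """He_j(v/sqrt(P))/sqrt(j!): orthonormal under the N(0, P) density of v_j."""
    c = np.zeros(j + 1); c[j] = 1.0
    return hermeval(v / np.sqrt(P), c) / np.sqrt(factorial(j))

def estimate_gammas(y, v, M=2, P=1.0):
    """Coefficients via the time-average of Eq. (2.92), up to total order M."""
    R, _ = v.shape
    h0 = y.mean()
    H = [[hermite_orthonormal(j, v[r], P) for j in range(M + 1)] for r in range(R)]
    gamma = {}
    for idx in product(range(M + 1), repeat=R):
        if 0 < sum(idx) <= M:                       # limit total nonlinear order
            phi = np.prod([H[r][idx[r]] for r in range(R)], axis=0)
            gamma[idx] = np.mean((y - h0) * phi)    # Eq. (2.92)
    return h0, gamma

def predict(v, h0, gamma, P=1.0):
    """Synthesize the model output from the coefficients (Eq. 2.89)."""
    yhat = np.full(v.shape[1], h0)
    for idx, g in gamma.items():
        yhat += g * np.prod([hermite_orthonormal(j, v[r], P)
                             for r, j in enumerate(idx)], axis=0)
    return yhat

# toy static nonlinearity on the filter-bank outputs: y = v0^2 + 0.5 v1
x = np.random.randn(50000)
v = laguerre_filterbank(x, L=3, a=0.7)              # from the previous sketch
y = v[0]**2 + 0.5 * v[1]
h0, gamma = estimate_gammas(y, v, M=2)
print(np.var(y - predict(v, h0, gamma)) / np.var(y))   # NMSE near zero
```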
It should be noted that a general, yet rudimentary, approach in discretized input-output space (having common characteristics with the Bose approach) may utilize a grid of the discrete input values that cover the memory of the system and the dynamic range of amplitudes of the input. At any discrete time t, the present and past values of the input are described by a vector of real numbers (x₀, x₁, ..., x_M) that can be put in correspondence with the value, y₀, of the system output at this time, thus forming the "mapping" input-output vector (x₀, x₁, ..., x_M, y₀). As the system is being tested with an ergodic input (e.g., white noise), input-output vectors are formed until the system is densely tested by combinations of values of the input vectors. All these input-output vectors define an input-output mapping that represents a "digital model" of the system in a most rudimentary form.

In the next two sections, we complete the traditional approaches to Wiener kernel estimation that have found many applications to date but have seen their utility eclipsed by more recent methodologies presented in Sections 2.3 and 4.3.
2.2.3 The Cross-Correlation Technique for Wiener Kernel Estimation
Lee and Schetzen (1965) proposed a different implementation of Wiener's original idea for kernel estimation that has been widely used because of its relative simplicity. The Lee and Schetzen method, termed the "cross-correlation technique," is based on the observation that the product of m time-shifted versions of the GWN input can be written in the form of the leading term of the mth-order Wiener functional using delta functions:
x(t − τ₁) ··· x(t − τ_m) = ∫₀^∞ ··· ∫₀^∞ δ(τ₁ − λ₁) ··· δ(τ_m − λ_m) x(t − λ₁) ··· x(t − λ_m) dλ₁ ··· dλ_m
(2.93)
The expression of Equation (2.93) has the form of a homogeneous (Volterra) functional of mth order and, therefore, it is orthogonal to all Wiener functionals of higher order. Based on this observation, they were able to show that, using this product as the leading term of an "instrumental functional" in connection with Equation (2.83), Wiener kernel estimation is possible through input-output cross-correlations of the respective order. The resulting expression for the estimation of the mth-order Wiener kernel is

h_m(τ₁, ..., τ_m) = (1/(m! P^m)) E[y_m(t) x(t − τ₁) ··· x(t − τ_m)]
(2.94)
where y_m(t) is the mth-order output residual defined by the expression

y_m(t) = y(t) − Σ_{n=0}^{m−1} G_n(t)

(2.95)
The use of the output residual in the cross-correlation formula of Equation (2.94) is necessitated by the fact that the expression of Equation (2.93), having the form of a homogeneous (Volterra) functional of mth order, is orthogonal to all higher-order Wiener functionals but not to the lower-order ones, whose contributions must be subtracted. It is seen later that this subtraction is required in principle only for the lower-order terms of the same parity (odd/even). Failure to use the output residual in Equation (2.94) leads to
severe misestimation of the Wiener kernels at the diagonal values, giving rise to impulse-like errors along the kernel diagonals. In practice, this output residual is computed on the basis of the previously estimated lower-order Wiener kernels and functionals. Thus, the use of the output residual in Equation (2.94) implies the application of the cross-correlation technique in ascending order of Wiener kernel estimation.

The statistical ensemble average denoted by the "expected value" operator E[·] in Equation (2.94) can be replaced in practice by time averaging over the length of the data record, assuming stationarity of the system. Since the data record is finite, these time averages carry a certain statistical variance (i.e., they are not precise). This variance depends on the record length and the GWN input power level, in addition to the ambient noise and the system characteristics, as detailed in Section 2.4.2. At the risk of becoming somewhat pedantic, we detail below the successive steps in the actual implementation of the cross-correlation technique for Wiener kernel estimation [Marmarelis & Marmarelis, 1978; Schetzen 1980].
Estimation of h₀. The expected value of each Wiener functional G_n[h_n; x(t)] (for n ≥ 1) is zero if x(t) is GWN, since the Wiener functionals for n ≥ 1 are constructed orthogonal to the constant zeroth-order Wiener functional h₀. Therefore, taking the expected value of the output signal y(t) yields
E[y(t)] = h₀
(2.96)
which indicates that h₀ is the ensemble average (mean) or the time average of the output signal for a GWN input.
Estimation of h₁(τ). The shifted input x(t − σ) is a first-order homogeneous (Volterra) functional according to Equation (2.93), where it is written as a convolution of the GWN input with a delta function. Since x(t − σ) can also be viewed as a Wiener functional of first order (no other terms are required in the first-order case), its covariance with a Wiener functional of any other order will be zero. Thus,
E[y(t)x(t − σ)] = E[x(t − σ) ∫₀^∞ h₁(τ) x(t − τ) dτ]
= ∫₀^∞ h₁(τ) E[x(t − σ) x(t − τ)] dτ
= ∫₀^∞ h₁(τ) P δ(τ − σ) dτ = P h₁(σ)
(2.97)
since the second-order autocorrelation function of GWN is a delta function with strength P. Therefore, the first-order Wiener kernel is given by the cross-correlation between the GWN input and the respective output, normalized by the input power level P:

h₁(σ) = (1/P) E[y(t) x(t − σ)]
(2.98)
Note that the cross-correlation formula (2.98) for the estimation of h₁ does not require in principle the use of the first-order output residual prescribed by Equation (2.94). The reason for this is the parity (odd/even) separation of the homogeneous functionals (and, by
extension, of the Wiener functionals) due to the fact that the odd-order autocorrelation functions of GWN are uniformly zero (see Appendix II). Nonetheless, since the orthogonality between Wiener functionals is only approximate in practice due to the finite data records, it is advisable to always use the output residual as prescribed by Equation (2.94). This implies subtraction of the previously estimated h₀ from y(t) prior to cross-correlation with x(t − σ) for the practical estimation of h₁(σ).
Estimation of h₂(τ₁, τ₂). Since x(t − σ₁)x(t − σ₂) is a second-order homogeneous (Volterra) functional of x(t) [see Equation (2.93)], it is orthogonal to all Wiener functionals of higher order, i.e., E[G_i(t) x(t − σ₁) x(t − σ₂)] = 0 for i > 2. Thus, the cross-correlation between [x(t − σ₁)x(t − σ₂)] and the output y(t) eliminates the contributions of all Wiener functionals except G₀, G₁, and G₂. Furthermore,

E[G₀ x(t − σ₁) x(t − σ₂)] = h₀ P δ(σ₁ − σ₂)
(2.99)
indicating that the estimate of h₀ ought to be subtracted from y(t) prior to cross-correlation, and

E[G₁(t) x(t − σ₁) x(t − σ₂)] = ∫₀^∞ h₁(τ) E[x(t − σ₁) x(t − σ₂) x(t − τ)] dτ = 0
(2.100)
since all the odd-order autocorrelation functions of GWN are zero (see Appendix II). For the second-order Wiener functional, we have

E[G₂(t) x(t − σ₁) x(t − σ₂)] = P² ∫₀^∞ ∫₀^∞ h₂(τ₁, τ₂)[δ(τ₁ − τ₂)δ(σ₁ − σ₂) + δ(τ₁ − σ₁)δ(τ₂ − σ₂) + δ(τ₂ − σ₁)δ(τ₁ − σ₂)] dτ₁ dτ₂ − P² δ(σ₁ − σ₂) ∫₀^∞ h₂(τ₁, τ₁) dτ₁ = 2P² h₂(σ₁, σ₂)
(2.101)
using the Gaussian decomposition property (see Appendix II) and the symmetry of the second-order kernel. Thus, the cross-correlation between the output y(t) and [x(t − σ₁)x(t − σ₂)] yields

E[y(t) x(t − σ₁) x(t − σ₂)] = P h₀ δ(σ₁ − σ₂) + 2P² h₂(σ₁, σ₂)
(2.102)
which demonstrates the previously stated fact of impulse-like estimation errors along the kernel diagonals when the output residual is not used in the cross-correlation formula of Equation (2.94). This example also demonstrates the fact that there is an odd/even separation of the Wiener functionals (i.e., h₁ is not present in Equation (2.102) but h₀ is). In order to eliminate the contribution of h₀ to this second-order cross-correlation, it suffices in practice to subtract from the output signal y(t) the previously estimated value of h₀ (which is the average value of the output signal). However, it is generally advisable to subtract all lower-order functional contributions from the output signal, as prescribed by Equation (2.95), because the theoretical orthogonality between [x(t − σ₁)x(t − σ₂)] and G₁[h₁; x(t)] is only approximate for finite data records in practice. Therefore, in order to minimize the
estimation errors due to the finite data records, we cross-correlate the second-order output residual (the "hats" denote estimates of the Wiener kernels):

y₂(t) = y(t) − ĥ₀ − ∫₀^∞ ĥ₁(τ) x(t − τ) dτ
(2.103)
with [x(t − σ₁)x(t − σ₂)] to obtain the second-order Wiener kernel estimate as

ĥ₂(σ₁, σ₂) = (1/(2P²)) E[y₂(t) x(t − σ₁) x(t − σ₂)]
(2.104)
It should be noted that a possible error Δh₀ in the estimate of h₀ used for the computation of the output residual causes some error Δh₂ at the diagonal points of the second-order Wiener kernel estimate that takes the form of a delta function along the diagonal:

Δh₂(σ₁, σ₂) = (1/(2P)) Δh₀ δ(σ₁ − σ₂)
(2.105)
Estimation of h₃(τ₁, τ₂, τ₃). Following the same reasoning as for h₂, we first compute the third-order output residual y₃(t):

y₃(t) = y(t) − G₂(t) − G₁(t) − G₀
(2.106)
using the previously estimated h₀, h₁, and h₂ (even though the subtraction of G₂ and G₀ is not theoretically required due to the odd/even separation), and then we estimate the third-order Wiener kernel by computing the third-order cross-correlation:

ĥ₃(σ₁, σ₂, σ₃) = (1/(6P³)) E[y₃(t) x(t − σ₁) x(t − σ₂) x(t − σ₃)]
(2.107)
It must be noted that any imprecision Δh₁ in the estimation of h₁, which is used for the computation of the output residual, will cause some error Δh₃ along the diagonals of the ĥ₃ estimate that will have the impulsive form

Δh₃(σ₁, σ₂, σ₃) = (1/(6P)) [Δh₁(σ₁) δ(σ₂ − σ₃) + Δh₁(σ₂) δ(σ₃ − σ₁) + Δh₁(σ₃) δ(σ₁ − σ₂)]

(2.108)

because

E[G₁(t) x(t − σ₁) x(t − σ₂) x(t − σ₃)] = P² [h₁(σ₁) δ(σ₂ − σ₃) + h₁(σ₂) δ(σ₃ − σ₁) + h₁(σ₃) δ(σ₁ − σ₂)]
(2.109)
The successive steps of Wiener kernel estimation using the cross-correlation technique are illustrated in Figure 2.10. The aforementioned errors due to the finite data records [see Equations (2.105) and (2.108)] occur along the diagonals, but additional estimation errors occur throughout the kernel space for a variety of reasons detailed in Section 2.4.2 and become more severe as the order of kernel estimation increases. A complete analysis of estimation errors associated with the cross-correlation technique and their dependence on the input and system characteristics is given in Section 2.4.2.
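The steps above translate directly into a few lines of code. The sketch below is a minimal discrete-time implementation of Equations (2.96), (2.98), (2.103), and (2.104), with Δt = 1 so that P equals the input variance; the function name and the toy second-order system are assumptions for illustration. The residual errors printed at the end shrink with record length, illustrating the estimation variance discussed above.

```python
import numpy as np

def wiener_kernels_xcorr(x, y, M, P):
    """h0, h1, h2 via the Lee-Schetzen cross-correlation technique (memory M lags)."""
    N = len(x)
    h0 = y.mean()                                   # Eq. (2.96)
    y1 = y - h0                                     # first-order residual
    # Eq. (2.98): h1(tau) = E[y1(t) x(t - tau)] / P
    h1 = np.array([np.mean(y1[M:] * x[M - t:N - t]) for t in range(M)]) / P
    # Eq. (2.103): subtract the first-order functional to form the residual
    y2 = y1 - np.convolve(x, h1)[:N]
    # Eq. (2.104): h2 from the second-order cross-correlation (symmetric kernel)
    h2 = np.zeros((M, M))
    for t1 in range(M):
        for t2 in range(t1 + 1):
            c = np.mean(y2[M:] * x[M - t1:N - t1] * x[M - t2:N - t2])
            h2[t1, t2] = h2[t2, t1] = c / (2 * P**2)
    return h0, h1, h2

# toy system: y = u + 0.5 u^2 with u = g * x, so h1 = g and h2 = 0.5 g g^T
g = np.exp(-np.arange(20) / 5.0); g /= np.sqrt(np.sum(g**2))
x = np.random.randn(200000)                         # discrete GWN, P = 1
u = np.convolve(x, g)[:len(x)]
y = u + 0.5 * u**2
h0, h1, h2 = wiener_kernels_xcorr(x, y, M=20, P=1.0)
print(np.max(np.abs(h1 - g)), np.max(np.abs(h2 - 0.5 * np.outer(g, g))))
```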
Figure 2.10 Illustration of the successive steps for the estimation of zero-, first-, and second-order Wiener kernels via the cross-correlation technique [Marmarelis & Marmarelis, 1978].
The computation of the output residuals in ascending order also provides in practice a measure of model adequacy at each order of Wiener kernel estimation. This measure of model adequacy is typically the normalized mean-square error (NMSE) Q_r of the rth-order output prediction, defined as the ratio of the variance of the rth-order output residual y_r(t) to the output variance (note that the first-order output residual is simply the demeaned output):

Q_r = E[y_r²(t)] / E[y₁²(t)]
(2.110)
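In code, this adequacy measure is a one-liner; a minimal sketch (the function name is an assumption):

```python
import numpy as np

def nmse(y, y_pred):
    # Q_r of Eq. (2.110): residual power over the power of the demeaned output
    return np.mean((y - y_pred)**2) / np.var(y)
```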
The NMSE measure Q_r lies between 0 and 1 (for r ≥ 1) and quantifies the portion of the output signal power that is not predicted by the model. If this quantity Q_r drops below a selected critical value (e.g., a value Q_r = 0.1 would correspond to 10% NMSE of the rth-order model prediction, indicating that 90% of the output signal power is explained/predicted by the model), then the kernel estimation procedure can be terminated and a truncated Wiener model of rth order is obtained. Obviously, the selection of this critical NMSE value depends on the specific requirements of each application and on the prevailing signal-to-noise ratio (which provides a lower bound for this value). In applications to date, this value has ranged from 0.02 to 0.20.

As a practical matter, the actual estimation of Wiener kernels has been limited to second order to date (with rare occasions of third-order Wiener kernel estimation but no further) due to the multidimensional structure of high-order kernels. This is dictated by practical limitations of data-record length, computational burden, and estimation accuracy that become rather severe for multidimensional high-order kernels.

Some Practical Considerations. Wiener kernel estimation through the cross-correlation technique has several advantages over the original Wiener formulation because: (1) it directly estimates the Wiener kernels, which can reveal interpretable features of the system under study; (2) it is much simpler computationally, since it does not involve the concatenated Laguerre and Hermite expansions. Because of its apparent simplicity, the cross-correlation technique has found many applications in physiological system modeling and fomented the initial interest in the Wiener approach. Nonetheless, the application of the original cross-correlation technique exposed a host of practical shortcomings that were addressed over time by various investigators. These shortcomings primarily concerned issues of estimation errors regarding the dependence of estimation variance on the input length and bandwidth, as well as the generation and application of physically realizable approximate GWN inputs. This prompted the introduction of a broad class of quasiwhite test input signals that address some of these applicability issues, as discussed in Section 2.2.4.

Among the important practical issues that had to be explored in actual applications of the cross-correlation technique are the generation of appropriate quasiwhite test signals (since ideal GWN is not physically realizable), the choice of input bandwidth, the accuracy of the obtained kernel estimates as a function of input bandwidth and record length, and the effect of extraneous noise and experimental transducer errors. A comprehensive and detailed study of these practical issues was first presented in Marmarelis & Marmarelis (1978). A summary of the types of estimation errors associated with the cross-correlation technique is provided in Section 2.4.2, since the cross-correlation technique is a "legacy methodology" but not the best choice at the present time.

It is evident that in actual applications the ideal GWN input has to be approximated in terms of practical limitations on bandwidth and amplitude. Since the latter are both infinite in the ideal GWN case, we have to accept in practice a band-limited and amplitude-truncated GWN input.
In light of these inevitable approximations, non-Gaussian, quasiwhite input signals were introduced that exhibited whiteness within a given finite bandwidth and remained within a specified finite amplitude range. In addition, these quasiwhite test signals were easy to generate and offered computational advantages in certain cases (e.g., binary or ternary signals). A broad family of such quasiwhite test input signals was introduced in 1975 by the author during his Ph.D. studies. These signals can be used in connection with
the cross-correlation technique for kernel estimation and give rise to their own orthogonal functional series, as discussed in Section 2.2.4.

We must note that a fair amount of variance is introduced into the Wiener kernel estimates (obtained via cross-correlation) because of the stochastic nature of the (quasi-) white noise input. This has prompted the use of pseudorandom m-sequences which, however, present some problems in their high-order autocorrelation functions (see Section 2.2.4). The use of deterministic inputs (such as multiple impulses or sums of sinusoids of incommensurate frequencies) alleviates this problem of stochastic inputs but places us in the context of Volterra kernels, as discussed in Section 2.1.5. An interesting estimation method that reduces the estimation variance for arbitrary stochastic inputs is based on exact orthogonalization of the expansion terms for the given input data record [Korenberg, 1988]. However, this method (and any other method based on least-squares fitting for arbitrary input signals) yields a hybrid between the Wiener and the Volterra kernels of the system, which depends on the spectral characteristics of the specific input signal. This hybrid can be viewed as a biased estimate of the Wiener and/or Volterra kernels if significant higher-order terms exist beyond the order of the estimated truncated model.

Another obstacle in the broader use of the cross-correlation technique has been the heavy computational burden associated with the estimation of high-order kernels. The amount of required computations increases geometrically with the order of the estimated kernel, since the number of estimated kernel values is roughly proportional to M^Q, where Q is the kernel order and M is the kernel memory-bandwidth product. This prevents the practical estimation of kernels above a certain order, depending on the kernel memory-bandwidth product (i.e., the number of sample values per kernel dimension). An additional practical limitation is imposed by the fact that kernels with more than three dimensions are difficult to represent, inspect, or interpret meaningfully. As a result, successful application of the cross-correlation technique has been limited to weakly nonlinear systems (typically second-order or rarely third-order) to date.
Illustrative Example. To illustrate the application of the cross-correlation technique on a real system, we present below one of its first successful applications to a physiological system that transforms light-intensity variations impinging upon the catfish retina into a variable ganglion cell discharge rate [Marmarelis & Naka, 1973b]. The latter is measured by superimposing multiple spike-train responses (sequences of action potentials generated by the ganglion cell) to the same (repeated) light-intensity stimulus signal. The physical stimulus is monochromatic light intensity modulated in band-limited GWN fashion around a reference level of mean light intensity (light-adapted preparation). The response is the extracellularly recorded ganglion cell response (spike trains superimposed and binned to yield a measure of instantaneous firing frequency). The stimulus-response data are processed using the cross-correlation technique to estimate the first-order and second-order Wiener kernels of the light-to-ganglion cell system shown in Figure 2.11.

Frequency-Domain Estimation of Wiener Kernels. Wiener kernel estimation is also possible in the frequency domain through the evaluation of high-order cross-spectra [Brillinger, 1970; French & Butz, 1973; Barker & Davy, 1975; French, 1976], which represent the frequency-domain counterparts of the high-order cross-correlations. The problem of estimation variance due to the stochastic nature of the GWN input persists in this approach as well.
Figure 2.11 The Wiener kernel estimates of first order (left) and second order (right) obtained via the cross-correlation technique for the light-to-ganglion cell system in the catfish retina [Marmarelis & Naka, 1973b].
The efficient implementation of the frequency-domain approach calls for the use of fast Fourier transforms (FFTs) with a data-length size equal to the smallest power of two that is greater than or equal to twice the memory-bandwidth product of the system. Thus, if the FFT size is 2M (a power of two by algorithmic necessity), then the input-output data record is segmented into K contiguous segments of 2M data points each. An estimate of the rth-order cross-spectrum,

Ŝ_{r,i}(ω₁, ..., ω_r) = Y_{r,i}(ω₁ + ··· + ω_r) X̄_i(ω₁) ··· X̄_i(ω_r)
(2.111)
can be obtained from the ith segment, where X̄_i(ω) denotes the FFT conjugate of the ith input segment and Y_{r,i}(ω) denotes the FFT of the corresponding ith segment of the rth output residual. Then, the rth-order Wiener kernel can be estimated in the frequency domain by averaging the various segment estimates of Equation (2.111):

Ĥ_r(ω₁, ..., ω_r) = (1/(r! P^r K)) Σ_{i=1}^{K} Ŝ_{r,i}(ω₁, ..., ω_r)
(2.112)
where P is the input power level and 2MK is the total number of input-output data points. The rth-order Wiener kernel estimate can be converted into the time domain via an r-dimensional inverse FFT. This approach offers some computational advantages over direct cross-correlation (especially for systems with large memory-bandwidth product) but exhibits the same vulnerabilities in terms of kernel estimation errors.
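A minimal sketch of the second-order case (r = 2) of Equations (2.111)-(2.112) is given below, assuming Δt = 1 and numpy FFT conventions; the function name and the segment bookkeeping are illustrative, and the scaling factor absorbs the FFT normalization.

```python
import numpy as np

def wiener_h2_fft(x, y2, M, P):
    """2nd-order Wiener kernel via averaged cross-spectra (Eqs. 2.111-2.112).

    x  : GWN input;  y2 : second-order output residual (Eq. 2.103);
    M  : kernel memory in lags (FFT size 2M);  P : input power level.
    """
    seg = 2 * M
    K = len(x) // seg
    w = np.arange(seg)
    S = np.zeros((seg, seg), dtype=complex)
    for i in range(K):
        X = np.fft.fft(x[i * seg:(i + 1) * seg])
        Y = np.fft.fft(y2[i * seg:(i + 1) * seg])
        # S_{2,i}(w1, w2) = Y_i(w1 + w2) X_i*(w1) X_i*(w2), frequencies mod seg
        S += Y[(w[:, None] + w[None, :]) % seg] * np.conj(X)[:, None] * np.conj(X)[None, :]
    H2 = S / (2 * P**2 * K * seg)               # r! P^r K plus FFT scaling
    return np.real(np.fft.ifft2(H2))[:M, :M]    # back to time domain, causal part
```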
2.2.4 Quasiwhite Test Inputs
The aforementioned quasiwhite random signals, which can be used as test inputs in the context of Wiener modeling, have been termed "constant-switching-pace symmetric random signals" (CSRS) and they are defined by the following generation procedure [Marmarelis, 1975, 1976, 1977]:

1. Select the bandwidth of interest, B_x (in Hz), and the desirable finite-range symmetric amplitude distribution (with zero mean), p(x), for the discrete-time CSRS input signal x(n).

2. Draw an independent sample x(n) at every time step Δt = 1/(2B_x) (in sec) from a random number generator according to the probability distribution p(x).

It has been shown [Marmarelis, 1975, 1976, 1977] that a quasiwhite signal thus generated possesses the appropriate autocorrelation properties of all orders that allow its use for Wiener-type modeling of nonlinear systems with bandwidths B ≤ 1/(2Δt) via the cross-correlation technique. In analog form, the CSRS remains constant at the selected value, x(n), for the duration of each interval Δt [i.e., for nΔt ≤ t < (n + 1)Δt] and switches at t = (n + 1)Δt to the statistically independent value x(n + 1), where it stays constant until the next switching time t = (n + 2)Δt. The fundamental time interval Δt is called the "step" of the CSRS, and it determines the bandwidth of the signal within which whiteness is secured: B_x = 1/(2Δt). It has been shown that the statistical independence of any two steps of a CSRS is sufficient to guarantee its whiteness within the bandwidth determined by the specified step size Δt.

The symmetric probability density function (PDF) p(x) with which a CSRS is generated has zero mean and, consequently, all its odd-order moments are zero. The even-order moments of p(x) are nonzero and finite, because the CSRS has finite amplitude range. The power level of the CSRS is (M₂ · Δt), where M₂ is the second moment of p(x) (also its variance, since p(x) has zero mean). This random quasiwhite signal approaches the ideal white-noise process as Δt tends to zero. Note that in order to maintain constant CSRS power level as Δt decreases, the signal amplitude must be divided by √Δt. This implies that the distribution of the asymptotic white-noise process as Δt → 0 is √Δt p(x√Δt) and tends to infinite variance. For instance, if the standardized normal/Gaussian distribution is used to generate the CSRS (zero mean and unit variance), then the asymptotic PDF is
lim_{Δt→0} √(Δt/(2π)) exp[−x²Δt/2]
having variance equal to 1/Δt. We note that every member of the CSRS family is by construction a stationary and ergodic process. For a given step size Δt and a proper amplitude PDF p(x), an ensemble of random processes is defined within the CSRS family. Because of the ergodicity and stationarity of the CSRS, the ensemble statistics and the time statistics of the signal are invariant over the ensemble and over time, respectively. This allows us to define its autocorrelation functions in both the temporal and the statistical (ensemble) sense. In practice, we have finite-length signals and we are practically restricted to obtaining only estimates of the autocorrelation functions, usually by time averaging over the record length R, as
φ̂_n(τ₁, ..., τ_{n−1}) = (1/(R − T_m)) ∫_{T_m}^{R} x(t) x(t − τ₁) ··· x(t − τ_{n−1}) dt

(2.113)
where T_m = max{τ₁, ..., τ_{n−1}} and stationarity has suppressed one tau argument (e.g., no shift for the first term of the product). For discrete-time (sampled) data, the integral of Equation (2.113) becomes a summation. The estimate φ̂_n(τ₁, ..., τ_{n−1}) is a random variable itself and its statistical properties must be studied in order to achieve an understanding of the statistical properties of the kernel estimates obtained by the cross-correlation technique. It was found that the expected value of φ̂_n(τ₁, ..., τ_{n−1}) is φ_n(τ₁, ..., τ_{n−1}), which makes it an unbiased estimate, and the variance of φ̂_n(τ₁, ..., τ_{n−1}) at all points tends to zero asymptotically with increasing record length R, which makes it a consistent estimate [Marmarelis, 1977].

It has been shown [Marmarelis, 1975, 1977] that the autocorrelation functions of a CSRS are those of a quasiwhite signal; that is, the odd-order autocorrelation functions are uniformly zero, whereas the even-order ones are zero everywhere except at the diagonal strips. Note that the diagonal strips are areas within ±Δt (the step size of the CSRS) around every full diagonal of the argument space (i.e., where all the arguments form pairs of identical values). If we visualize the autocorrelation function estimate as a surface in n-dimensional space, then φ̂_n(τ₁, ..., τ_{n−1}) appears to have apexes corresponding to the nodal points of the n-dimensional space (i.e., the points with coordinates that are multiples of Δt). These apexes are connected with n-dimensional surface segments that are of first degree (planar) with respect to each argument τ_i. The direct implication of this morphology is that the "extrema" of the surface φ̂_n(τ₁, ..., τ_{n−1}) must be sought among its apexes (i.e., among the nodal points of the argument space). This simplifies the analysis of this multidimensional surface that determines the statistical characteristics of the CSRS kernel estimates. To illustrate this, we consider the second-order autocorrelation function of a CSRS:
φ₂(τ₁) = M₂ (1 − |τ₁|/Δt)   for |τ₁| ≤ Δt
φ₂(τ₁) = 0                  for |τ₁| > Δt
(2.114)
shown in Figure 2.12 along with its power spectrum (i.e., the Fourier transform of the autocorrelation function). The quasiwhite autocorrelation properties of the CSRS family are manifested by the impulse-like structure of their even-order autocorrelation functions and justify their use in kernel estimation via the cross-correlation technique. However, the kernel estimates that are obtained through the use of CSRS test inputs correspond to an orthogonal functional series that is slightly different in structure from the original Wiener series. This structural difference is due to the statistical properties of the CSRS, as expressed by the moments of its amplitude PDF, which are different in general from the moments of the Gaussian distribution (although the latter is included in the CSRS family). The decomposition property of even products of Gaussian random variables (see Appendix II) results in great simplification of the expressions describing the orthogonal Wiener functionals. In the general case of a non-Gaussian CSRS, however, the decomposition property does not hold and complete description of the amplitude PDF may require several of its moments, which results in greater complexity of the form of the CSRS orthogonal functionals.
Figure 2.12 Portion of a CSRS quasiwhite signal (top left), its second-order autocorrelation function (top right), and its power spectrum Φ(f) = P [sin(πfΔt)/(πfΔt)]² in linear and logarithmic scales (bottom) [Marmarelis & Marmarelis, 1978].
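Generating a CSRS and verifying its quasiwhite properties takes only a few lines; the sketch below (the function name and parameter values are illustrative assumptions) draws an independent level every Δt seconds and checks the triangular autocorrelation of Equation (2.114).

```python
import numpy as np

def generate_csrs(n_steps, dt, levels, fs):
    """CSRS: an independent draw from the symmetric amplitude set 'levels'
    every dt seconds, held constant in between (sampling rate fs)."""
    hold = int(round(dt * fs))                      # samples per switching step
    return np.repeat(np.random.choice(levels, size=n_steps), hold)

fs, dt = 1000.0, 0.005                              # 1 kHz sampling, 5 ms step
x = generate_csrs(20000, dt, levels=[-1.0, 0.0, 1.0], fs=fs)   # ternary CSRS
M2 = np.var(x)                                      # second moment of p(x)
print("power level P = M2*dt =", M2 * dt)

# second-order autocorrelation: triangular within +/- dt, ~zero beyond (Eq. 2.114)
lags = np.arange(3 * int(dt * fs))
phi2 = np.array([np.mean(x[l:] * x[:len(x) - l]) for l in lags])
print(phi2[:int(dt * fs) + 2])                      # decays linearly from M2 to ~0
```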
Nevertheless, the construction of the CSRS functionals can be made routinely on the basis of an orthogonalization (Gram-Schmidt) procedure similar to the one that was used in the construction of the Wiener series, under the assumption that the CSRS bandwidth is broader than the system bandwidth. In the special case where a Gaussian amplitude PDF is chosen for the CSRS, the CSRS functional series takes the form of the Wiener series, where the power level of the quasiwhite input is equal to the product of the second moment and the step size of the CSRS (P = M₂Δt). Thus, it becomes clear that the CSRS functional series is a more general orthogonal functional expansion than the Wiener series, extending the basic idea of orthogonal expansions using Volterra-type functionals throughout the space of symmetric probability distributions. The advantages of such a generalization are those that accrue in any optimization problem where the parameter space is augmented. This generality is achieved at the expense of more complexity in the expressions for the orthogonal functionals if a non-Gaussian PDF is chosen.

Under the assumption that the CSRS bandwidth is broader than the system bandwidth, the first four orthogonal functionals {G_n*} that correspond to a CSRS quasiwhite input take the form [Marmarelis, 1977]
G₀*[g₀; x(t′), t′ ≤ t] = g₀
(2.115)
G₁*[g₁(τ₁); x(t′), t′ ≤ t] = ∫₀^∞ g₁(τ₁) x(t − τ₁) dτ₁
(2.116)
G₂*[g₂(τ₁, τ₂); x(t′), t′ ≤ t] = ∫₀^∞ ∫₀^∞ g₂(τ₁, τ₂) x(t − τ₁) x(t − τ₂) dτ₁ dτ₂ − (M₂Δt) ∫₀^∞ g₂(τ₁, τ₁) dτ₁
(2.117)
G₃*[g₃(τ₁, τ₂, τ₃); x(t′), t′ ≤ t] = ∫₀^∞ ∫₀^∞ ∫₀^∞ g₃(τ₁, τ₂, τ₃) x(t − τ₁) x(t − τ₂) x(t − τ₃) dτ₁ dτ₂ dτ₃ − 3(M₂Δt) ∫₀^∞ ∫₀^∞ g₃(τ₁, τ₂, τ₂) x(t − τ₁) dτ₁ dτ₂ − [(M₄/M₂ − 3M₂)Δt²] ∫₀^∞ g₃(τ₁, τ₁, τ₁) x(t − τ₁) dτ₁

(2.118)
where x(t) is a CSRS, Δt is its step size, {g_i} are its associated CSRS kernels, and M₂, M₄, and so on are the second, fourth, and so on moments of its amplitude PDF p(x). It is evident that the deviation of the CSRS functionals (and associated kernels) from their Wiener counterparts diminishes as the CSRS amplitude distribution approaches the Gaussian profile, since then M₄ = 3M₂² and G₃* attains exactly the form of a third-order Wiener functional with power level P = M₂Δt. The expressions for higher-order functionals become quite complicated since they involve all the higher even moments of the CSRS input, but their derivation can be made routinely on the basis of a Gram-Schmidt orthogonalization procedure. Notice that the basic even/odd separation in the structural form of the CSRS functionals is the same as in the Wiener functionals; i.e., the odd- (even-) order functionals consist solely of all odd- (even-) order homogeneous functionals of equal and lower order.

The CSRS functionals should be viewed as slightly modified Wiener functionals. The integral terms (i.e., the homogeneous functionals) of the CSRS functionals that contain higher even moments (>2) also contain a higher power of Δt as a factor. This makes them significantly smaller than the terms containing only the second moment, since Δt attains small values. Therefore, the CSRS functionals become (for all practical purposes) the same as the Wiener functionals for very small Δt. This implies in turn that whenever Δt is very small, the CSRS kernels are approximately the same as the Wiener kernels, except possibly at the diagonal points where the finite number of CSRS amplitude levels (e.g., binary, ternary, etc.) limits estimability, as discussed below.

The power spectrum of a CSRS with step size Δt and second moment M₂ is shown in Fig. 2.12 and is independent of the amplitude PDF. The bandwidth of the signal is inversely proportional to Δt, and it approaches the ideal white noise (infinite bandwidth) as Δt approaches zero (provided that the power level P = M₂Δt remains finite). The orthogonality of the CSRS functionals is satisfactory when the bandwidth of the CSRS exceeds the bandwidth of the system under study.

In order to estimate the rth-order CSRS kernel, we cross-correlate the rth-order output residual y_r(t) with r time-shifted versions of the CSRS input (as in the GWN input case) and scale the outcome:
g_r(σ₁, ..., σ_r) = C_r E[y_r(t) x(t − σ₁) ··· x(t − σ_r)]
(2.119)
where

y_r(t) = y(t) − Σ_{i=0}^{r−1} G_i*[g_i(τ₁, ..., τ_i); x(t′), t′ ≤ t]

(2.120)
and C_r is the proper scaling factor that depends on the even moments and the step size of the CSRS, as well as the location of the estimated point in the kernel (e.g., diagonal vs. nondiagonal). It was shown earlier that in the case of GWN inputs, the scaling factor C_r is (r!P^r)^{−1}. However, in the case of CSRS inputs, the scaling factors differ for the diagonal and nondiagonal points of the kernels. For the nondiagonal points (i.e., when all σ_i are distinct) the scaling factor is the same as in the GWN case, where P = M₂Δt. However, the determination of the appropriate scaling factors for the diagonal points involves higher even moments. For example, in the second-order case the scaling factor for the diagonal points (σ₁ = σ₂) is found to be

C₂ = 1 / [(M₄ − M₂²)Δt²]
(2.121)
CSRS and Volterra Kernels. It is instructive to examine the relation between the Volterra and the CSRS kernels of a system (at the nondiagonal points, for simplicity of expression) in order to demonstrate the dependence of the CSRS kernels upon the even moments and the step size of the specific CSRS that is used to estimate the kernels. Recall that the Volterra kernels of a system are independent of input characteristics, unlike the Wiener or CSRS kernels that depend on the GWN or CSRS input characteristics (i.e., the power level, or the step size and even moments, respectively) [Marmarelis & Marmarelis, 1978]. The resulting expressions for the CSRS kernels in terms of the Volterra kernels {k_i} of the system are more complicated than their Wiener counterparts given by Equation (2.57). The general expression for the nondiagonal points of the even-order CSRS kernels is
g_{2n}(σ₁, ..., σ_{2n}) = Σ_{m=n}^∞ { C_{m,n}(P₁) ∫₀^∞ ··· ∫₀^∞ k_{2m}(τ₁, ..., τ_{2m−2n}, σ₁, ..., σ_{2n}) dτ₁ ··· dτ_{2m−2n}
+ Σ_{l≥1} D_{l,m,n}(P₁, ..., P_{l+1}) ∫₀^∞ ··· ∫₀^∞ k_{2m}(τ₁, ..., τ_{2m−2n−2l}, σ₁, ..., σ_{2n}) dτ₁ ··· dτ_{2m−2n−2l} }
(2.122)

where P_l is a "generalized power level of lth order" for CSRS inputs, defined as
P_l = M_{2l} Δt^{2l−1}
(2.123)
The function C_{m,n} depends only on the conventional power level P = P₁, but the function D_{l,m,n} is a rational expression of the generalized power levels. Note, however, that the terms of the second summation in the expression (2.122) are negligible in comparison with the terms of the first summation, since Δt is very small. Furthermore, the function C_{m,n}(P₁) tends to the coefficient found in the relation between the Wiener and the Volterra kernels given by Equation (2.57) as Δt tends to zero. Hence, the CSRS kernels at the nondiagonal points are approximately the same as the Wiener kernels (as long as they have the same power level P = M₂Δt). The only significant difference between the CSRS and the Wiener kernels of a system is found at some diagonal points whenever the CSRS attains a finite number of amplitude levels, as discussed below.
The Diagonal Estimability Problem. Every CSRS waveform attains (by construction) a finite number L of amplitude levels. This limits our ability to estimate via cross-correlation the kernel values at diagonal points that have dimension L or above. For instance, a binary CSRS input x(t) attains the values ±A. Therefore, x²(t) is a constant A² for this binary CSRS input and the cross-correlation formula of Equation (2.119) yields at the diagonal points (σ₁ = σ₂ = σ) of the second-order binary kernel

g₂ᵇ(σ, σ) = C₂ A² E[y₂(t)] = 0
(2.124)
since the second-order output residual has zero mean by construction. The same zero value is obtained for all diagonal points of higher-order binary kernels with even parity (i.e., when an even number of arguments are the same) because x^{2n}(t) = A^{2n} and the high-order cross-correlations reduce to lower-order ones that are orthogonal to the output residual. By the same token, the estimated values of binary kernels at diagonal points with odd parity are zero, because all the odd-order cross-correlations reduce to a first-order one that is zero for all residuals of order higher than first. For instance, the third-order binary kernel estimate at the diagonal of parity two (i.e., σ₁ = σ′, σ₂ = σ₃ = σ) is

g₃ᵇ(σ′, σ, σ) = C′ E[y₃(t) x(t − σ′)] = 0
(2.125)
since the third-order output residual is orthogonal to x(t − σ′). The estimates at the diagonal points of parity three (σ₁ = σ₂ = σ₃) also yield a zero result, because the third-order cross-correlation reduces to the first-order one indicated by Equation (2.125).

It can be stated, in general, that the binary CSRS input "folds" all diagonal kernel estimates of dimension two or above (obtained via cross-correlation) into lower-order projections and yields a zero estimated value (inability to estimate these diagonal kernel values) because of the orthogonality between the output residual and the reduced order of the instrumental homogeneous functional (i.e., the number of cross-correlated input product terms is smaller than the order of the output residual). The same can be stated for a ternary CSRS input with regard to the diagonal of the third-order ternary kernel estimate, or all diagonal values of dimension three and above in higher-order ternary kernels. In general, a CSRS input with L amplitude levels cannot yield kernel estimates (via cross-correlation) at diagonal points of dimension L or above. The contributions of these diagonal points to the system output are folded into lower-order functional terms and the resulting cross-correlation estimates of the CSRS kernels for these diagonal points are zero. An analytical example is given below that elucidates these points.

Since kernel estimation via cross-correlation has been limited in practice to second order (and rarely extending to third order), the practical implications of the aforementioned fact of "diagonal estimability" attain importance only for binary (and occasionally ternary) quasiwhite inputs. The binary test inputs have been rather popular among investigators, in either the CSRS form or in the pseudorandom m-sequence form (discussed below). Therefore, the issue of "diagonal estimability" attains practical importance, since many investigators have been perplexed in the past by the so-called "diagonal problem" when binary test inputs are used to estimate second-order kernels. This discussion elucidates the problem and provides useful guidance in actual applications.
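The binary "diagonal nulling" is easy to reproduce numerically. In the sketch below (a hypothetical second-order system, with the CSRS step equal to one sample), the diagonal cross-correlation vanishes identically because x²(t) = A², while an off-diagonal point does not:

```python
import numpy as np

A, N = 1.0, 100000
x = A * np.random.choice([-1.0, 1.0], size=N)     # binary CSRS (step = 1 sample)
u = np.convolve(x, np.exp(-np.arange(10) / 3.0))[:N]
y = u**2                                          # second-order system (h1 = 0)
y2 = y - y.mean()                                 # second-order output residual
diag = np.mean(y2 * x * x)                        # x*x = A^2, so this is A^2 E[y2] = 0
off = np.mean(y2[1:] * x[1:] * x[:-1])            # tau1 = 0, tau2 = 1: nonzero
print(diag, off)                                  # ~0 versus a finite value
```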
Figure 2.13 The first- and second-order CSRS kernel estimates of the fly photoreceptor obtained with the use of binary (right) and ternary (left) test inputs. The differences along the diagonal of the second-order kernel are evident [Marmarelis & McCann, 1977].
An illustrative example from a real physiological system (the fly photoreceptor) is given in Figure 2.13, where the second-order kernel estimates obtained via cross-correlation using binary and ternary CSRS test inputs are shown. The differences along the diagonal points are evident and point to the potential pitfalls of the common (but questionable) practice of interpolating the diagonal values of the binary kernel estimate to overcome the "diagonal problem." It is evident from Figure 2.13 that diagonal interpolation of the binary kernel estimate would not reproduce the correct kernel values (represented by the ternary estimate), especially along two segments of the diagonal (the part between the two positive humps and for latencies shorter than 8 msec). There may be certain cases in which interpolation may yield reasonable approximations along the diagonal (depending on the true kernel morphology) but this cannot be asserted in general. However, even in those cases where interpolation is justifiable, adjustments must be made to the lower-order kernel estimates of the same parity (e.g., the zero-order kernel estimate when the second-order kernel is interpolated along the diagonal) in order to balance properly the contributions of these kernel values to the system output.

An Analytical Example. As an analytical example of the relation between CSRS and Volterra kernels, consider a third-order Volterra system [i.e., a system for which k_n(τ₁, τ₂, ..., τ_n) = 0 for n > 3]. If k₀ = 0, the system response to a CSRS x(t) is
y(t) = \int_0^\infty k_1(\tau_1) x(t-\tau_1)\, d\tau_1 + \int_0^\infty\!\!\int_0^\infty k_2(\tau_1, \tau_2) x(t-\tau_1) x(t-\tau_2)\, d\tau_1 d\tau_2 + \int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty k_3(\tau_1, \tau_2, \tau_3) x(t-\tau_1) x(t-\tau_2) x(t-\tau_3)\, d\tau_1 d\tau_2 d\tau_3    (2.126)
If we consider a CSRS input with more than three amplitude levels (to avoid the aforementioned problem of "diagonal estimability"), then the CSRS kernels of the system (for the nondiagonal points) are
g_0 = (M_2 \Delta t) \int_0^\infty k_2(\tau_1, \tau_1)\, d\tau_1    (2.127)

g_1(\sigma_1) = k_1(\sigma_1) + 3(M_2 \Delta t) \int_0^\infty k_3(\sigma_1, \tau_1, \tau_1)\, d\tau_1 + \left[\left(\frac{M_4}{M_2} - 3M_2\right) \Delta t^2\right] k_3(\sigma_1, \sigma_1, \sigma_1)    (2.128)

g_2(\sigma_1, \sigma_2) = k_2(\sigma_1, \sigma_2)    (2.129)

g_3(\sigma_1, \sigma_2, \sigma_3) = k_3(\sigma_1, \sigma_2, \sigma_3)    (2.130)
The second-order and third-order CSRS kernels are identical to the Volterra kernels because of the absence of nonlinearities of order higher than third and the multilevel structure of the CSRS input (more than three levels). The zeroth-order CSRS kernel depends upon the second-order Volterra kernel, and the first-order CSRS kernel depends upon the first-order and third-order Volterra kernels, in a manner similar to the dependence on the Wiener kernels as Δt tends to zero. If a binary CSRS input is used, then the diagonal values of the estimated second-order and third-order binary kernels are zero, as indicated in Equations (2.124)-(2.125). The "nulling" of the diagonal values of g_2^b and g_3^b simplifies the form of the CSRS functionals given by Equations (2.115)-(2.118), because only the leading term of each functional remains nonzero for a binary CSRS input:
G_2^*[g_2^b; x(t'), t' \le t] = \int_0^\infty\!\!\int_0^\infty g_2^b(\tau_1, \tau_2) x(t-\tau_1) x(t-\tau_2)\, d\tau_1 d\tau_2    (2.131)

G_3^*[g_3^b; x(t'), t' \le t] = \int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty g_3^b(\tau_1, \tau_2, \tau_3) x(t-\tau_1) x(t-\tau_2) x(t-\tau_3)\, d\tau_1 d\tau_2 d\tau_3    (2.132)
where the superscript "b" denotes "binary" functionals or kernels. This simplification applies to all higher-order functionals that may be present in a system. This modified form of the CSRS functional series for binary inputs will be revisited in the study of neuronal systems with spike-train inputs (see Chapter 8). If a ternary CSRS input is used, then only the main diagonal values of g_3 become zero and only the third-order CSRS functional is simplified, by elimination of its last term (for this example of a third-order system). For a high-order system, the kernel values at all diagonal points of parity three or above attain zero values, with concomitant simplifications of the respective ternary functionals. Since most applications have been limited to second-order models thus far, the ternary CSRS inputs appear to be a better choice than their more popular binary counterparts in terms of avoiding the "diagonal problem" in the second-order kernel estimate. It is instructive to explore also the form of the Wiener kernels of this third-order system as an illustrative example. The second-order and third-order Wiener kernels are identical to their Volterra (and nondiagonal CSRS) counterparts for this system. The zeroth-order and first-order Wiener kernels are given by
h_0 = P \int_0^\infty k_2(\tau_1, \tau_1)\, d\tau_1    (2.133)

h_1(\sigma_1) = k_1(\sigma_1) + 3P \int_0^\infty k_3(\sigma_1, \tau_1, \tau_1)\, d\tau_1    (2.134)
where P is the power level of the GWN input. In other words, the Wiener kernels can be viewed as a special case of the CSRS kernels, when M_4 = 3M_2^2 (i.e., when the amplitude distribution is Gaussian), or when Δt is very small [i.e., when the third term of g_1 in Equation (2.128) becomes negligible relative to the second term].
Comparison of Model Prediction Errors. Of importance in system modeling is the accuracy (in the mean-square error sense) of the model-predicted response to a given stimulus. This issue was discussed previously with regard to Wiener models and is examined here with regard to the predictive ability of a CSRS model relative to a Volterra model of the same order. To simplify the mathematical expressions (without loss of conceptual generality), we consider the zeroth-order model of the system in the previous analytical example. The mean-square error (MSE) of the zeroth-order Volterra model prediction is

Q_V = E[y^2(t)]    (2.135)
because we considered k_0 = 0, while the MSE of the zeroth-order CSRS model prediction is
Q_C = E[(y(t) - g_0)^2] = E[y^2(t)] + g_0^2 - 2 g_0 E[y(t)]    (2.136)
Therefore, the improvement in accuracy of the zeroth-order model prediction, using a CSRS model instead of a Volterra model for an arbitrary input, is

\Delta_0 = Q_V - Q_C = 2 g_0 E[y(t)] - g_0^2    (2.137)
If the input signal is the CSRS with which the model was estimated, then

\Delta_0 = g_0^2 = \left[ P_1 \int_0^\infty k_2(\tau_1, \tau_1)\, d\tau_1 \right]^2 \ge 0    (2.138)

where P_1 = M_2 \Delta t denotes the power level of the CSRS used for estimation [cf. Equation (2.127)].
As expected, we always have improvement in predicting the system output for a CSRS input of the same power level with which kernel estimation was performed. If other CSRS inputs of different power level P' are used to evaluate the zeroth-order model, then

\Delta_0 = P_1 (2P' - P_1) \left[ \int_0^\infty k_2(\tau_1, \tau_1)\, d\tau_1 \right]^2    (2.139)
which is similar to the result for the Wiener model [see Equation (2.82)]. Thus, we can have improvement or deterioration in the accuracy of the zeroth-order model prediction, depending on the relative size of the power levels.
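To make the sign behavior of Equation (2.139) concrete, the following minimal Python sketch evaluates Δ₀ for several test power levels P'; the kernel integral value and power levels are illustrative assumptions, not values taken from the text.

```python
# A minimal numeric check of Equation (2.139); k2_int, P1, and the
# tested P' values are hypothetical choices for illustration only.
k2_int = 0.8                      # assumed value of the k2 diagonal integral
P1 = 1.0                          # CSRS power level used for estimation

for Pp in (0.25, 0.5, 1.0, 2.0):  # alternate test power levels P'
    delta0 = P1 * (2.0 * Pp - P1) * k2_int ** 2
    print(f"P' = {Pp:4.2f}  ->  improvement = {delta0:+.3f}")
# The improvement is negative for P' < P1/2 and positive otherwise,
# mirroring the behavior noted for Equation (2.82).
```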
If the input x(t) is an arbitrary signal, then

\Delta_0 = P_1 \int_0^\infty k_2(\tau_1, \tau_1)\, d\tau_1 \left\{ 2\mu \int_0^\infty k_1(\tau_1)\, d\tau_1 + \int_0^\infty\!\!\int_0^\infty k_2(\tau_1, \tau_2) [2\phi_2(\tau_1, \tau_2) - P_1 \delta(\tau_1 - \tau_2)]\, d\tau_1 d\tau_2 + 2 \int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty k_3(\tau_1, \tau_2, \tau_3) \phi_3(\tau_1, \tau_2, \tau_3)\, d\tau_1 d\tau_2 d\tau_3 \right\}    (2.140)
where φ₂ and φ₃ are the second- and third-order autocorrelation functions of the input signal and μ is the input mean. Equation (2.140) clearly demonstrates the fact that the improvement (or deterioration) in the case of an arbitrary input signal depends upon the autocorrelation functions of this signal. This establishes the important fact that the performance of models obtained with quasiwhite test inputs (including band-limited GWN) depends crucially on the relation of the autocorrelation functions of the specific input with the kernels and with those of the quasiwhite signal used to obtain the model.
Discrete-Time Representation of the CSRS Functional Series. Since sampled data are used in practice to perform the modeling task, it is useful to examine the form of the CSRS functional series in discrete time. This form is simplified when the sampling interval T is equal to the CSRS step size Δt. Because of aliasing considerations, T cannot be greater than Δt, and if T < Δt the integrals of the continuous-time representation of the CSRS functionals attain complicated discretized forms. Therefore, we adopt the convention that T = Δt for actual applications, which allows the conversion of the integrals of the CSRS functionals into summations to yield the discrete-time representation (for Δt = T):
\tilde{G}_1(n) = T \sum_m \tilde{g}_1(m)\, \tilde{x}(n-m)    (2.141)

\tilde{G}_2(n) = T^2 \sum_{m_1} \sum_{m_2} \tilde{g}_2(m_1, m_2)\, \tilde{x}(n-m_1) \tilde{x}(n-m_2) - M_2 T^2 \sum_m \tilde{g}_2(m, m)    (2.142)

\tilde{G}_3(n) = T^3 \sum_{m_1} \sum_{m_2} \sum_{m_3} \tilde{g}_3(m_1, m_2, m_3)\, \tilde{x}(n-m_1) \tilde{x}(n-m_2) \tilde{x}(n-m_3) - 3 M_2 T^3 \sum_m \sum_{m'} \tilde{g}_3(m, m', m')\, \tilde{x}(n-m) - \left(\frac{M_4}{M_2} - 3M_2\right) T^3 \sum_m \tilde{g}_3(m, m, m)\, \tilde{x}(n-m)    (2.143)
where the tilde denotes the discrete-time (sampled) counterparts of the continuous-time variables. Aside from possible scaling factors involving T, the discretized values of the kernels and input variables are simply the corresponding sampled values. These discretized forms should be used in practice to interpret the CSRS kernels and functionals when the cross-correlation technique is used for CSRS kernel estimation.
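As an illustration of the discrete-time forms (2.141)-(2.142), the following sketch computes the first two CSRS functional outputs for T = Δt = 1; the kernels g1, g2 and the binary input are hypothetical stand-ins, not system-specific values.

```python
import numpy as np

# A sketch of the discrete-time CSRS functionals (2.141)-(2.142) with
# T = dt = 1; the kernels and the input below are assumed for illustration.
rng = np.random.default_rng(0)
M, N = 8, 256
g1 = np.exp(-np.arange(M) / 2.0)                  # assumed first-order kernel
g2 = np.outer(g1, g1) * 0.1                       # assumed second-order kernel
x = rng.choice([-1.0, 1.0], size=N)               # binary CSRS-like input
M2 = np.mean(x ** 2)                              # second moment of amplitude PDF

def G1(n):
    m = np.arange(min(M, n + 1))                  # truncate lags near the start
    return np.sum(g1[m] * x[n - m])

def G2(n):
    m = np.arange(min(M, n + 1))
    xx = x[n - m]
    return xx @ g2[np.ix_(m, m)] @ xx - M2 * np.trace(g2)   # Equation (2.142)

y = np.array([G1(n) + G2(n) for n in range(N)])   # second-order model output
print(y[:5])
```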
Pseudorandom Signals Based on m-Sequences. In order to reduce the natural redundancy of the random quasiwhite signals, while still preserving quasiwhite autocorrelation properties, one may employ specially crafted pseudorandom signals (PRSs) based on m-sequences, which are deterministic periodic signals with quasiwhite autocorrelation properties within a period of the PRS. These PRS signals are generated with linear autorecursive relations designed to yield sequences with maximum period (see below) [Zierler, 1959; Gyftopoulos & Hooper, 1964; Barker, 1967; Golomb, 1967; Davies, 1970; Moller, 1973; Sutter, 1975]. An important advantage of the PRS is the fact that their second-order autocorrelation function is zero outside the neighborhood of the origin (zero lag) and, of course, within the period of the signal (since the autocorrelation functions are also periodic). This is an advantage over random quasiwhite signals (such as the CSRS), which exhibit small nonzero values in this region of their second-order autocorrelation function for finite data records. The latter cause some statistical error in CSRS kernel estimation (see Section 2.4.2). However, the PRS exhibit significant imperfections in their higher even-order autocorrelation functions, which offset their superiority in the second-order autocorrelation properties and may cause significant errors in the estimation of high-order kernels [Barker & Pradisthayon, 1970]. For this reason, the PRS are most advantageous in identification of linear systems, whereas the presence of nonlinearities in the system makes the choice between random and pseudorandom quasiwhite test signals dependent upon the specific characteristics of the system at hand. The PRS exhibit the same stair-like form as the CSRS; i.e., they remain constant within small time intervals defined by the step size Δt and switch abruptly at multiples of Δt. Their values at each step are determined by a linear recurrence formula of the form
x_i = a_1 \otimes x_{i-1} \oplus a_2 \otimes x_{i-2} \oplus \cdots \oplus a_m \otimes x_{i-m}    (2.144)
where the coefficients a_j and the signal values x_i correspond to the elements of a finite Galois field (i.e., a finite set of integer values whose number equals a power of a prime number). The operations {⊗, ⊕} are defined to be internal operations of multiplication and addition for the specified Galois field. For example, in the case of a binary pseudorandom signal, the Galois field has two elements (0 and 1) and the operations ⊕ and ⊗ are defined modulo 2 (i.e., corresponding to the "XOR" and "AND" Boolean logic operations, respectively), so that the outcome of the recurrence formula (2.144) is also an element of the same Galois field (i.e., binary). It is evident that a sequence {x_i} constructed on the basis of the linear recurrence formula (2.144) is periodic, because the root string of m consecutive values of x_i will repeat after a finite number of steps. The length of this period depends on the specific values of the coefficients a_j and the order m of the recurrence formula (for a given Galois field). Among all the sequences {x_i} constructed from the L members of a certain Galois field (L a power of a prime number) and with linear recurrence formulae of order m, there are some that have the maximum period. Since L^m is the number of all possible distinct arrangements with repetitions of L elements in strings of m, this maximum period is (L^m − 1), where the null string is excluded. These maximum-period sequences are called "maximum-length" or "m-sequences," and they correspond to a special choice of the coefficients {a_1, …, a_m} that coincide with the coefficients of a primitive (or irreducible) polynomial of degree (m − 1) in the respective Galois field [cf. Zierler, 1959]. Thus, we can always select the number of elements L and the order of the recurrence formula m in such a way that we get an m-sequence with a desirable period (within the limitations imposed by the integer nature of m and prime-power L). The generation of PRS in the laboratory is a relatively simple task. Suppose we have decided upon the prime number L of amplitude values that the signal will attain and the required maximum period (L^m − 1) [i.e., the order m of the linear recurrence formula (2.144)]. Now, we only need the coefficients of a primitive polynomial of degree (m − 1) in the respective L-element Galois field. If such a primitive polynomial is available (such polynomials have been tabulated, cf. Church, 1935), then we choose an initial string of
values and construct the corresponding m-sequence. Any initial string (except the null one) will give the same m-sequence (for a given set of coefficients a_j), merely shifted.
Specialized hardware can be used for real-time generation. For example, a binary m-sequence can be generated through a digital shift register, composed of a cascade of flip-flops (0 or 1) and an "exclusive OR" feedback connection, as shown in Figure 2.14. Upon receipt of a shift (or clock) pulse, the content of each flip-flop is transferred to its neighbor, and the input to the first stage is received from the output of the "exclusive OR" gate (which generates a 0 when its two inputs are the same, or a 1 otherwise). We note that a 15-bit m-sequence can be generated by this four-stage shift register, corresponding to the maximum number (2⁴ − 1) of possible four-bit binary numbers, except for the null one (since, if 0000 ever occurred, the output thereafter would be zero). Such pseudorandom sequences, produced by shift registers with feedback, have been studied extensively (cf. Davies, 1970; Golomb, 1967; Barker, 1967) in connection with many engineering applications, especially in communication systems (spread-spectrum communications and CDMA protocols of wireless telephony). Table 2.1 gives the possible stage numbers in the shift register from which the output, along with the output from the last stage, could be fed into the "exclusive OR" gate and fed back into the first stage, in order to obtain a maximum-length binary sequence [Davies, 1970]; a code sketch of this generator is given after the table.
The quasiwhiteness of a pseudorandom signal (and, consequently, its use for Wiener/CSRS kernel estimation in connection with the cross-correlation technique) is due to the shift-and-add property of the m-sequences (cf. Ream, 1970). According to this property, the product (of the proper modulo) of any number of sequence elements is another sequence element:
x_{k-j_1} \otimes x_{k-j_2} \otimes \cdots \otimes x_{k-j_m} = x_{k-j_0}    (2.145)
where j_0 depends on j_1, j_2, …, j_m but not on k. A slight modification must be made in the m-sequences with even numbers of levels in order to possess the antisymmetric property, which entails the inversion of every other bit of the m-sequence (doubling the period of the sequence) and makes the odd-order autocorrelation functions uniformly zero. As a result of the shift-and-add property and the basic structural characteristics of the m-sequences (i.e., maximum period and antisymmetry), the odd-order autocorrelation functions are uniformly zero everywhere (within a period of the signal) and the even-order ones approximate quasiwhiteness. Note that the even-order autocorrelation functions of order higher than second exhibit some serious imperfections (termed "anomalies" in the literature), which constitute a serious impediment in the use of PRS test inputs for nonlinear system identification using the cross-correlation technique [Barker et al., 1972].
Figure 2.14 Shift register with "exclusive OR" feedback for the generation of pseudorandom sequences [Marmarelis & Marmarelis, 1978].
Table 2.1 Stages that Can Be Combined with the Last Stage of Various Shift Registers to Generate Maximum-Length Binary Sequences*

Number of stages     Stage number         Sequence length
in shift register    giving feedback      in bits
 5                   2                    31
 6                   1                    63
 7                   1 or 3               127
 9                   4                    511
10                   3                    1,023
11                   2                    2,047
15                   1, 4, or 7           32,767
18                   7                    262,143
20                   3                    1,048,575
21                   2                    2,097,151
22                   1                    4,194,303
23                   5 or 9               8,388,607
25                   3 or 7               33,554,431
28                   3, 9, or 13          268,435,455
31                   3, 6, 7, or 13       2,147,483,647
33                   13                   8,589,934,591

*From Davies (1970).
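A minimal sketch of the binary m-sequence generator of Figure 2.14 follows, assuming the four-stage register with "exclusive OR" feedback from stages 3 and 4 described above; the 15-bit period (2⁴ − 1) can be verified directly.

```python
# A sketch of the four-stage shift register of Figure 2.14: the
# "exclusive OR" of stages 3 and 4 is fed back into stage 1, which
# yields a binary m-sequence of maximum period 2**4 - 1 = 15.
def lfsr(taps, nstages, nbits, seed=1):
    state = [(seed >> i) & 1 for i in range(nstages)]  # any nonzero start
    out = []
    for _ in range(nbits):
        out.append(state[-1])                  # output taken from the last stage
        fb = 0
        for t in taps:                         # XOR the tapped stages
            fb ^= state[t - 1]
        state = [fb] + state[:-1]              # shift toward the output
    return out

seq = lfsr(taps=(3, 4), nstages=4, nbits=30)
print(seq)                     # the 15-bit pattern repeats
assert seq[:15] == seq[15:30]  # maximum-length period confirmed
```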
These anomalies, first observed by Gyftopoulos and Hooper (1964, 1967), have been studied extensively by Barker and Pradisthayon (1970). Barker et al. (1972) studied several PRS (binary, ternary, and quinary) and compared their relative performance, showing that these anomalies are due to existing linear relationships among the elements of the m-sequence and that their exact position and magnitude can be determined from the generating recurrence equation through a laborious algorithm related to polynomial division. Since these anomalies are proven to be inherent characteristics of the m-sequences related to their mathematical structure, their effect can be anticipated and potentially mitigated through an elaborate iterative scheme. The kernels estimated by the use of a PRS and the corresponding functional series are akin to the CSRS kernels of the associated multilevel amplitude distribution corresponding to the specific PRS [Marmarelis & Marmarelis, 1978].
Comparative Use of GWN, PRS, and CSRS. The quasiwhite test input signals that have been used so far to estimate the Wiener/CSRS kernels of nonlinear systems through cross-correlation are band-limited GWN, PRS, and CSRS. Each one of these classes of quasiwhite input signals exhibits its own characteristic set of advantages and disadvantages, summarized below.
For the GWN. The main advantage of band-limited GWN derives from its Gaussian nature, which secures the simplest expressions for the orthogonal functional series (the Wiener series) because of the decomposition property of Gaussian random variables (which allows all the even-order autocorrelation functions to be expressed in terms of the second-order one). Additionally, the use of a GWN test input avoids the estimation problems at diagonal kernel points associated with the use of binary or ternary PRS/CSRS.
The main disadvantages of GWN are the actual generation and application in the laboratory (including the unavoidable truncation of the Gaussian amplitude distribution), as well as the imperfections in the autocorrelation functions due to its stochastic nature and the finite data records.
For the PRS. There are two main advantages of binary or ternary PRS:
1. Easy generation and application.
2. Short records required to form the desirable autocorrelation functions.

The main disadvantages of PRS are:

1. The anomalies in their higher (>2) even-order autocorrelation functions, which may induce considerable kernel estimation errors if the system contains significant nonlinearities.
2. The inability to estimate the diagonal kernel values using binary PRS.

For the CSRS. The main advantages of the CSRS are:
1. Easy generation and application.
2. Their autocorrelation functions do not exhibit any "anomalies," as in the case of PRS.
3. Error analysis is facilitated by the simple structure of CSRS, allowing the design of an optimum test input.
4. The user is given flexibility in choosing the signal with the number of levels and amplitude PDF that best fits the specific case at hand.

The main disadvantages of the CSRS are:

1. They require fairly long records in order to reduce the statistical error in the kernel estimates (as in the case of GWN).
2. The analytical expressions concerning the corresponding functional series and kernels are fairly complicated (e.g., relation of CSRS kernels to Volterra kernels, normalizing factors of the cross-correlation estimates, etc.).
3. The inability to estimate the diagonal kernel values using binary CSRS. Note that for a ternary test input (CSRS or PRS), this inability concerns the estimation of the third-order kernel main diagonal (or higher-order diagonals).

Besides these basic advantages and disadvantages of GWN, PRS, and CSRS, there may be other factors that become important in a specific application because of particular experimental or computational considerations [Marmarelis, 1978a,b].
2.2.5 Apparent Transfer Function and Coherence Measurements
One of the questionable habits forced upon investigators by the lack of effective methodologies for the study of nonlinear systems is the tendency to "linearize" physiological systems with intrinsic nonlinearities by uncritically applying linear modeling methods in the frequency domain. An "apparent transfer function" measurement is typically sought in
those cases, often accompanied by "coherence function" measurements in order to test the validity of the linear assumption or establish the extent of the "linearized" approximation. Specifically, the "coherence function" is computed over the entire frequency range of interest using Equation (2.149), and the proximity of its values to unity is examined (coherence values are by definition between 0 and 1). If the coherence values are found to be close to unity over the frequency range of interest, then the inference is made that the linear time-invariant assumption is valid and the noise content of the experimental data is low. In the opposite case, the reduced coherence is thought to indicate the presence of either system nonlinearities (and/or nonstationarities) and/or high noise content in the data. Distinguishing among these possible culprits for the reduction in coherence values requires specialized testing and analysis (e.g., repetition of identical experiments and averaging to reduce possible noise effects, or application of nonlinear/nonstationary analysis to assess the respective effects). As a rule of thumb, coherence values above 0.8 are thought to validate the linearized approximation. In this section, we examine the apparent transfer function and coherence function measurements in the general framework of nonlinear systems with GWN inputs following the Wiener approach [Marmarelis, 1988a], thus offering a rigorous guide for the proper interpretation of these two popular experimental measurements. In the Wiener series representation of nonlinear systems/models, the Wiener functionals are constructed to be mutually orthogonal (i.e., to have zero covariance) for a GWN input with power level P. Thus, using this statistical orthogonality (zero covariance) and the basic properties of high-order autocorrelation functions of GWN summarized in Appendix II, we can find the output spectrum to be
S_y(f) = h_0^2 \delta(f) + P |H_1(f)|^2 + \sum_{r=2}^{\infty} r! P^r \int_{-\infty}^{\infty} \cdots \int |H_r(u_1, \ldots, u_{r-1}, f - u_1 - \cdots - u_{r-1})|^2\, du_1 \cdots du_{r-1}    (2.146)
where H_r is the r-dimensional Fourier transform of the rth-order Wiener kernel h_r of the system, and f denotes frequency in Hz. Since the input-output cross-correlation is the first-order Wiener kernel scaled by P [see Equation (2.98)], its Fourier transform yields the cross-spectrum when the input is GWN:

S_{yw}(f) = P H_1(f)    (2.147)
Therefore, the commonly evaluated "apparent transfer function" (ATF) is the first-order Wiener kernel in the frequency domain:

\hat{H}_{app}(f) = \frac{S_{yw}(f)}{S_w(f)} = H_1(f)    (2.148)

since S_w(f) = P for a GWN input. The coherence function becomes
\gamma^2(f) = \frac{|S_{yw}(f)|^2}{S_w(f) S_y(f)} = \frac{|H_1(f)|^2}{|H_1(f)|^2 + \sum_{r=2}^{\infty} r! P^{r-1} \int \cdots \int |H_r(u_1, \ldots, u_{r-1}, f - u_1 - \cdots - u_{r-1})|^2\, du_1 \cdots du_{r-1}}    (2.149)
for all frequencies other than f = 0. Since the summation in the denominator of Equation (2.149) includes only nonnegative terms, it is clear that the coherence function will be less than unity to the extent determined by the GWN input power level and the indicated integrals of the high-order kernels of the system. Equation (2.149) shows that, for a linear system or very small P, the coherence function is close to unity. Note that the coherence function is further reduced in the presence of noise. For instance, in the presence of output-additive noise n(t), the output signal is

\tilde{y}(t) = y(t) + n(t)    (2.150)
which leaves the input-output cross-spectrum unchanged (S_{\tilde{y}w} \equiv S_{yw}) if the noise has zero mean and is statistically independent of the output signal, but the output spectrum becomes

S_{\tilde{y}}(f) = S_y(f) + S_n(f)    (2.151)
where S_n(f) is the noise spectrum. Thus, the coherence function for the noise-contaminated output data is reduced according to the relation

\tilde{\gamma}^2(f) = \gamma^2(f) \frac{S_y(f)}{S_y(f) + S_n(f)}    (2.152)
Since the ATF equals the Fourier transform of the first-order Wiener kernel, we can express it in terms of the Volterra kernels of the system:
H_{app}(f) = \sum_{m=0}^{\infty} \frac{(2m+1)!}{m!\, 2^m} P^m \int_{-\infty}^{\infty} \cdots \int K_{2m+1}(f, u_1, -u_1, \ldots, u_m, -u_m)\, du_1 \cdots du_m    (2.153)
which indicates that the ATF of a nonlinear system depends on all odd-order Volterra kernels of the system and on the GWN input power level. Therefore, measurements of H_app(f) with GWN inputs differ from the linear transfer function of the system [represented by the first-order Volterra kernel K_1(f)] and vary with the power level of the GWN input as a power series, according to Equation (2.153). In many studies of physiological systems, experimental considerations dictate that GWN test inputs of various nonzero mean levels be used (e.g., in the visual system, the input light intensity can physically assume only positive values). In those cases, the computation of the Wiener kernels requires de-meaning of the input, and the resulting kernel estimates depend on the input mean level. Thus, the ATF in this case becomes
H^{\mu}_{app}(f) = \sum_{m=0}^{\infty} \sum_{\ell=0}^{\infty} \frac{(2m+\ell+1)!}{m!\, \ell!} \left(\frac{P}{2}\right)^m (\mu - \mu_0)^{\ell} \int_{-\infty}^{\infty} \cdots \int K^0_{2m+\ell+1}(f, u_1, -u_1, \ldots, u_m, -u_m, 0, \ldots, 0)\, du_1 \cdots du_m    (2.154)
where μ is the nonzero mean of the GWN test input and μ₀ is the reference mean level (possibly, but not necessarily, zero) for the nominal Volterra kernels {K_r^0}. The coherence function measurement is also affected, because the Wiener kernel estimates depend on the GWN input mean μ used in the experiment.
Example 2.5. L-N Cascade System. As an illustrative example, we consider the case of a simple L-N cascade of a linear filter, with impulse response function g(τ), followed by a static nonlinearity that is represented by a Taylor series expansion with coefficients {a_r}. The system output is
y(t) = \sum_r a_r v^r(t) = \sum_r a_r \left[ \int_0^\infty g(\tau) x(t-\tau)\, d\tau \right]^r    (2.155)
where v(t) is the output of the linear filter. The Volterra kernels of this system in the frequency domain are (see Section 4.1.2)

K_r(f_1, \ldots, f_r) = a_r G(f_1) \cdots G(f_r)    (2.156)
and the ATF can be found using Equation (2.153) to be

H_{app}(f) = \left\{ \sum_{m=0}^{\infty} \frac{(2m+1)!}{m!\, 2^m} P^m a_{2m+1} \left[ \int_{-\infty}^{\infty} |G(u)|^2\, du \right]^m \right\} G(f) = c \cdot G(f)    (2.157)
i.e., it is a scaled version of G(f), which is the transfer function of the linear component of the cascade. The scaling factor c depends on the static nonlinearity, the input power level, and the Euclidean norm of G(f). The coherence function in this example is found from Equations (2.157) and (2.149) to be (for f ≠ 0)
\gamma^2(f) = \frac{c^2 |G(f)|^2}{c^2 |G(f)|^2 + \sum_{r=2}^{\infty} r! P^{r-1} A_r^2 \int \cdots \int |G(u_1) \cdots G(u_{r-1}) G(f - u_1 - \cdots - u_{r-1})|^2\, du_1 \cdots du_{r-1}}    (2.158)

where
A_r = \sum_{m=0}^{\infty} \frac{(r+2m)!}{r!\, m!\, 2^m} P^m a_{r+2m} \left[ \int |G(u)|^2\, du \right]^m    (2.159)
This indicates that, even in this relatively simple case, the coherence function is a rather complicated expression, dependent on the system nonlinearities and the input power level in the manner described by the denominator of Equation (2.158). To reduce the complexity, let us consider a quadratic nonlinearity (i.e., a_r = 0 for r > 2). Then

H_{app}(f) = a_1 G(f)    (2.160)
and

\gamma^2(f) = \frac{|G(f)|^2}{|G(f)|^2 + 2P \left(\frac{a_2}{a_1}\right)^2 \int |G(u) G(f-u)|^2\, du}    (2.161)
for f ≠ 0. Equation (2.161) indicates clearly that the quadratic nonlinearity reduces the coherence values more as P and/or (a_2/a_1)² increase. An illustration of this is given in Figure 2.15, where part (a) shows the ATF measurements of a quadratic cascade system for three different values of the GWN input power level (P = 0.25, 1, and 4), obtained from input-output records of 4,096 data points each. We observe slight variations in the ATF measurements due to the finite data record, which leads to imperfectly forming averages. Part (b) of Figure 2.15 shows the coherence function measurements obtained from the same input-output data records and values of P. The observed effect of P on the coherence function measurement is in agreement with Equation (2.161) [Marmarelis, 1988a].
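The following sketch illustrates the effect predicted by Equation (2.161), simulating a hypothetical quadratic L-N cascade probed with GWN at the three power levels of Figure 2.15 and estimating the coherence by segment-averaged periodograms; the filter g and coefficients a1, a2 are illustrative assumptions.

```python
import numpy as np
from numpy.fft import rfft

# A sketch of a quadratic cascade y = a1*v + a2*v**2 with v = g * x,
# probed with GWN of power level P; all parameter values are hypothetical.
rng = np.random.default_rng(1)
N, nseg, a1, a2 = 4096, 16, 1.0, 0.5
g = np.exp(-np.arange(32) / 4.0)
g /= np.sqrt(np.sum(g ** 2))                      # unit-norm linear filter

for P in (0.25, 1.0, 4.0):
    x = rng.normal(0.0, np.sqrt(P), N)            # GWN input of power level P
    v = np.convolve(x, g)[:N]
    y = a1 * v + a2 * v ** 2
    X = rfft(x.reshape(nseg, -1), axis=1)         # segment-averaged spectra
    Y = rfft(y.reshape(nseg, -1), axis=1)
    Sxx = np.mean(np.abs(X) ** 2, axis=0)
    Syy = np.mean(np.abs(Y) ** 2, axis=0)
    Syx = np.mean(Y * np.conj(X), axis=0)
    coh = np.abs(Syx) ** 2 / (Sxx * Syy)
    print(f"P = {P:4.2f}   median coherence = {np.median(coh[1:]):.3f}")
# Higher P lowers the coherence, as Equation (2.161) predicts.
```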
Example 2.6. Quadratic Volterra System. Since quadratic nonlinear systems are quite common in physiology (e.g., in the visual system), we use them in a second example that is more general than the first, in that the system is described by two general kernels (K_1, K_2) and is not limited to a cascade arrangement. Note that these kernels are the same for the Volterra and the Wiener expansions, a fact that holds for all second-order systems (K_r ≡ 0 for r > 2). For this general second-order system, we have (for f ≠ 0)
H_{app}(f) = K_1(f)    (2.162)

\gamma^2(f) = \frac{|K_1(f)|^2}{|K_1(f)|^2 + 2P \int |K_2(u, f-u)|^2\, du}    (2.163)
Equation (2.162) shows that the ATF in this case is identical to the linear portion (first-order Volterra kernel) of the system. However, if a nonzero-mean GWN test input is used (as in the visual system), then we can find from Equation (2.154) that

H^{\mu}_{app}(f) = K_1^0(f) + 2(\mu - \mu_0) K_2^0(f, 0)    (2.164)
i.e., the ATF measurement is affected by the second-order kernel and the nonzero mean of the GWN test input signal. Equation (2.163) shows that the coherence function values are reduced as the relative nonlinear contribution increases. The latter is different for different GWN input mean levels μ, since the corresponding K_1^{\mu}(f) or H_1^{\mu}(f) will vary with μ as

K_1^{\mu}(f) = K_1^0(f) + 2(\mu - \mu_0) K_2^0(f, 0)    (2.165)

while K_2 (or H_2) remains the same for all values of μ.
Figure 2.15 (a) The gain of the apparent transfer function of a quadratic cascade system for three different power levels of GWN input (P = 0.25, 1, and 4); (b) the coherence function measurements for the three GWN input power levels [Marmarelis, 1988a].
Example 2.7. Nonwhite Gaussian Inputs. We can use the case of quadratic nonlinear systems to demonstrate also the effect of nonlinearities on coherence and apparent transfer function measurements when the input x(t) is zero-mean Gaussian but nonwhite. In this case, the cross-spectrum is

S_{yx}(f) = K_1(f) S_x(f)    (2.166)
and the output spectrum is (for f ≠ 0)

S_y(f) = |K_1(f)|^2 S_x(f) + 2 \int_{-\infty}^{\infty} |K_2(u, f-u)|^2 S_x(u) S_x(f-u)\, du    (2.167)
where S_x is the input spectrum. Consequently, the ATF for quadratic systems is exactly the linear component of the system, K_1(f), regardless of the whiteness of the input. On the other hand, the coherence function is (for f ≠ 0)

\gamma^2(f) = \frac{1}{1 + 2 \int_{-\infty}^{\infty} \left| \frac{K_2(u, f-u)}{K_1(f)} \right|^2 \frac{S_x(u) S_x(f-u)}{S_x(f)}\, du}    (2.168)

which is clearly less than unity to the extent determined by the degree of nonlinearity and the input spectrum, as indicated by the second term of the denominator.
Example 2.8. Duffing System. As a final example, let us consider a system described by a relatively simple nonlinear differential equation:

L y + a y^3 = x    (2.169)
where L is a linear differential operator of qth order with constant coefficients [i.e., L(D) = c_q D^q + \cdots + c_1 D + c_0, where D denotes the differential operator d(·)/dt]. When L is of second order, Equation (2.169) is the Duffing equation, popular in nonlinear mechanics because it describes a mass-spring system with cubic elastic characteristics. The Volterra kernels of this nonlinear differential system are [Marmarelis et al., 1979]
K_1(f) = 1/L(f)    (2.170)

K_2(f_1, f_2) = 0    (2.171)

K_3(f_1, f_2, f_3) = -a K_1(f_1) K_1(f_2) K_1(f_3) K_1(f_1 + f_2 + f_3)    (2.172)
The general expression for the odd-order Volterra kernels of this system is

K_{2r+1}(f_1, \ldots, f_{2r+1}) = \frac{3(-a)^r}{(4r^2 - 1)} K_1(f_1) \cdots K_1(f_{2r+1}) K_1(f_1 + \cdots + f_{2r+1}) \sum_{j_1, j_2, j_3} K_1(f_{j_1} + f_{j_2} + f_{j_3})    (2.173)

where the summation is taken over all combinations of three indices from the integers 1 through (2r + 1). All the even-order Volterra kernels of this system are zero. Since the (2r+1)th-order Volterra kernel is proportional to a^r, we can simplify the equivalent Volterra model when a ≪ 1 by neglecting the odd-order Volterra kernels of order higher than third. Then
H_{app}(f) \cong K_1(f) - A K_1^2(f)    (2.174)

and
\gamma^2(f) = \frac{|1 - A K_1(f)|^2}{|1 - A K_1(f)|^2 + 6 a^2 P^2 \int\!\!\int |K_1(u_1) K_1(u_2) K_1(f - u_1 - u_2)|^2\, du_1 du_2}    (2.175)
where

A = 3 P a \int |K_1(u)|^2\, du    (2.176)
Note that Equation (2.174) provides a practical tool for exploring a class of nonlinear feedback systems, as discussed in Section 4.1.5.
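As a numeric illustration of Equations (2.174) and (2.176), assume the first-order operator L(D) = D + 1, for which K_1(f) = 1/(j2πf + 1) and the integral of |K_1(u)|² over all u equals 1/2; the values of P and a in the sketch below are hypothetical.

```python
import numpy as np

# A numeric check of Equations (2.174) and (2.176) for the assumed
# operator L(D) = D + 1, i.e., K1(f) = 1/(j*2*pi*f + 1).
P, a = 1.0, 0.1                       # hypothetical power level and cubic gain
u = np.linspace(-50.0, 50.0, 200001)  # frequency grid (Hz)
K1 = 1.0 / (1j * 2 * np.pi * u + 1.0)
A = 3 * P * a * np.trapz(np.abs(K1) ** 2, u)   # Equation (2.176)
print(A, 3 * P * a * 0.5)             # numeric vs analytic value of A

f = 1.0                               # evaluate the ATF at 1 Hz
K1f = 1.0 / (1j * 2 * np.pi * f + 1.0)
H_app = K1f - A * K1f ** 2            # Equation (2.174)
print(abs(K1f), abs(H_app))           # the cubic term perturbs the gain
```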
Concluding Remarks. In conclusion, it has been shown that for the case of GWN inputs: 1. The coherence function measurements reflect the presence of high-order (nonlinear) kernels and depend on the GWN input power level. Specifically, the coherence values are reduced from unity as the input power level P and/or the degree of nonlinearity increase.
2. The apparent transfer function is identical to the Fourier transform of the first-order Wiener kernel of the system, which is not, in general, the same as the linear component of the system (formally represented by the first-order Volterra kernel). The apparent transfer function depends on the system's odd-order nonlinearities and on the power level of the GWN input.
3. In those physiological studies where the physical test input is a GWN perturbation around a nonzero mean μ (e.g., in vision), the apparent transfer function and the coherence function measurements depend on μ. Therefore, they will vary as μ varies from experiment to experiment.
4. The same observations hold true for nonwhite broadband inputs, where, additionally, the specific form of the input spectrum affects the apparent transfer function (unless it is a second-order system) and the coherence measurements.

It is hoped that the presented analysis will assist investigators of physiological systems in interpreting their experimental results, obtained with these traditional frequency-domain measurements of the apparent transfer function and the coherence function, in the presence of intrinsic system nonlinearities.
2.3
EFFICIENT VOLTERRA KERNEL ESTIMATION
In Section 2.2, we described methods for the estimation of the Wiener kernels of a nonlinear system that receives a GWN (or quasiwhite) input. For such inputs, the most widely used method to date has been the cross-correlation technique, in spite of its numerous limitations and inefficiencies. Part of the reason is that, although the Wiener functional terms are decoupled (i.e., they are orthogonal for GWN inputs), their orthogonality (zero covariance) is approximate in practice, since their covariance is not exactly zero for finite data records, which causes estimation errors in the Wiener kernel estimates. Additional estimation errors associated with the cross-correlation technique are caused by the finite bandwidth of the input and the presence of noise/interference. These errors depend on the system characteristics and decrease with increasing data-record length, as discussed in Section 2.4.2. The most important limitations of the cross-correlation technique are: (a) the stringent requirement of a band-limited white-noise input; (b) the input dependence of the estimated Wiener (instead of Volterra) kernels; (c) the experimental and computational burden of long data records; and (d) the considerable estimation variance of the obtained kernel estimates (especially of order higher than first). The latter two limitations are an inevitable consequence of the stochastic nature of the employed GWN (or quasiwhite) input and the fact that the cross-correlation estimates are computed from input-output data records of finite length. These estimates converge to the true values at a rate proportional to the square root of the record length. Note that this limitation is also incumbent on the initially proposed Wiener implementation using time-averaging computation of the covariance between any two Wiener functionals over finite data records [Marmarelis & Marmarelis, 1978]. Thus, long data records are required to obtain cross-correlation or covariance estimates of satisfactory accuracy, resulting in heavy experimental and computational burden. In addition to the various estimation errors, the Wiener kernels are not input-independent
and, therefore, the Volterra kernels are deemed more desirable in actual applications from the viewpoint of physiological interpretation. Methods for Volterra kernel estimation were discussed in Section 2.1.5, but they were also shown to exhibit practical limitations and inefficiencies. More efficient methods for Volterra kernel estimation are discussed in this section. These methods can be applied to systems with arbitrary (broadband) inputs typical of spontaneous/natural operation, thus removing a serious practical limitation of the Wiener approach (i.e., the requirement for white or quasiwhite inputs) or the need for specialized inputs (e.g., impulses or sinusoids). It is evident that, in practice, we can obtain unbiased estimates of the Volterra kernels only for systems of finite order. These Volterra kernel estimates do not depend on input characteristics when a complete (nontruncated) model is obtained for the system at hand. However, the Volterra kernel estimates for a truncated model generally have biases resulting from the correlated residuals, which are dependent on the specific input used for kernel estimation. Thus, the attraction of obtaining directly the Volterra kernels of a system has been tempered by the fact that model truncation is necessitated in the practical modeling of high-order systems, which in turn introduces input-dependent estimation biases, because the residuals are correlated and input-dependent since the functional terms of the Volterra models are coupled. Naturally, the biases and other estimation errors of the Volterra kernels depend on the specific characteristics of each application (e.g., system order, data-record length, input characteristics, etc.) and should be minimized in each particular case. To accomplish this error minimization, we must have a thorough understanding of the methodological issues regarding kernel estimation in a general, yet realistic, context. The emphasis here will be placed on the most promising methods of Volterra kernel estimation that are applicable to nearly arbitrary inputs and high-order models. These methods employ Volterra kernel expansions and direct-inversion or iterative estimation methods in connection with equivalent network model structures.
2.3.1
Volterra Kernel Expansions
The introduction of the kernel expansion approach was prompted by the fact that the use of the cross-correlation technique for kernel estimation revealed serious practical problems with the estimation of multidimensional (high-order) kernels. These problems are rooted in the unwieldy multidimensional representation of high-order kernels and the statistical variation of covariance estimates. The Volterra kernel expansion approach presented in this section mitigates some of the problems of kernel estimation by compacting the kernel representation and avoiding the computation of cross-correlation (or covariance) averages, performing instead least-squares fitting of the actual data to estimate the expansion coefficients of the Volterra kernels. This alternative strategy also removes the whiteness requirement on the experimental input, although broadband inputs remain desirable. The difference between this approach and the direct least-squares estimation of Volterra kernels discussed in Section 2.1.5 is that the representation of the Volterra kernels in expansion form is more compact than the discrete-time formulation of Section 2.1.5, especially for systems with large memory-bandwidth products, as discussed below. The more compact kernel representations result in higher estimation accuracy and reduced computational burden, especially in high-order kernel estimation. The basic kernel expansion methodology employs a properly selected basis of L causal
functions {b_j(τ)} defined over the kernel memory, which can be viewed as the impulse response functions of a linear filterbank receiving the input signal x(t). This basis is assumed to span completely and efficiently (i.e., with fast convergence) the kernel function space over [0, μ]. The outputs {v_j(t)} of this filterbank (j = 1, …, L) are given by

v_j(t) = \int_0^{\mu} b_j(\tau) x(t - \tau)\, d\tau    (2.177)
and can be used to express the system output y(t), according to the Volterra model of Equation (2.5), through a multinomial expression:

y(t) = k_0 + \sum_{r=1}^{Q} \sum_{j_1=1}^{L} \cdots \sum_{j_r=1}^{L} a_r(j_1, \ldots, j_r) v_{j_1}(t) \cdots v_{j_r}(t)    (2.178)
which results from substituting the kernels in Equation (2.5) with their expansions:

k_r(\tau_1, \ldots, \tau_r) = \sum_{j_1=1}^{L} \cdots \sum_{j_r=1}^{L} a_r(j_1, \ldots, j_r) b_{j_1}(\tau_1) \cdots b_{j_r}(\tau_r)    (2.179)
where {a_r} are the coefficients of the rth-order kernel expansion. Note that this modified expression of the Volterra model (using the kernel expansions) is isomorphic to the block-structured model of Figure 2.16, which is equivalent to the Volterra class of systems if and only if the kernel expansions of Equation (2.179) hold for all r. In other words, the selected basis {b_j(τ)} must be complete for the expansion of the particular kernels of the system under study, or at least provide adequate approximations for the requirements of the study. The kernel estimation problem thus reduces to the estimation of the unknown expansion coefficients using the expression (2.178), which is linear in terms of the unknown coefficients. Note that the signals v_j(t) are known as convolutions of the
Figure 2.16 The modular form of the Volterra model, akin to the modular Wiener model of Figure 2.9. This modular model is isomorphic to the modified Volterra model of Equation (2.178), where the multi-input static nonlinearity f[·] is represented/approximated by a multinomial expansion.
input signal with the selected filterbank [see Equation (2.177)], but are available only in sampled (discretized) form in practice. For this reason, the advocated methodology is cast in a discrete-time context by letting n replace t, m replace τ, and summation replace integration. This leads to the modified discrete Volterra (MDV) model:
y(n) = c_0 + \sum_{r=1}^{Q} \sum_{j_1=1}^{L} \sum_{j_2=1}^{j_1} \cdots \sum_{j_r=1}^{j_{r-1}} c_r(j_1, \ldots, j_r) v_{j_1}(n) \cdots v_{j_r}(n) + \varepsilon(n)    (2.180)
where the expansion coefficients {c_r} take into account the symmetries of the Volterra kernels, that is,

c_r(j_1, \ldots, j_r) = \lambda_r a_r(j_1, \ldots, j_r)    (2.181)
where λ_r depends on the multiplicity of the specific indices (j_1, …, j_r). For instance, if all indices are distinct, then λ_r = 1; but if the indices form p groups with multiplicities m_i (i = 1, …, p, and m_1 + ⋯ + m_p = r), then λ_r = m_1! ⋯ m_p!. The error term ε(n) incorporates possible model-truncation errors and noise/interference in the data. The filterbank outputs are

v_j(n) = T \sum_{m=0}^{M-1} b_j(m) x(n - m)    (2.182)
8
(2.183)
where the matrix V is constructed with the outputs of the filter bank according to Equation (2.180). Note that the symmetries of the kernels have been taken into account by requiring that j, $ }i-l in Equation (2.180). For instance, for a second-order system, the nth row of the matrix V is {I, vl(n), ... , vL(n), vr(n), v2(n)vl(n), ... , vL(n)vl(n), v~(n), v3(n)v2(n), ... , vL(n)vL-l(n), vz(n)}
The number of columns in matrix V for a Qth-order MDV model is P = (L + Q)!/(L!Q!)
(2.184)
and the number of rows is equal to the number of output samples N. Solution of the es-
104
NONPARAMETRIC MODELING
timation problem formulated in Equation (2.183) can be achieved by direct inversion of the square Gram matrix G = [V'V] if it is nonsingular, or through pseudoinversion of the reetangular matrix V if G is singular (or ill-conditioned), as discussed in Section 2.1.5. Iterative schemes based on gradient descent can also be used instead of matrix inversion, especially if the residuals are non-Gaussian, as discussed in Section 2.1.5. The number of columns of the matrix V determines the computational burden in the directinversion approach and depends on the parameters Land Q, as indicated in Equation (2.184). Therefore, it is evident that practical solution of this estimation problem is subject to the "curse of dimensionality" for high-order systems, since the number of columns P in matrix V increases geometrically with Q or L. For this reason, our efforts should aim at minimizing L by judicious choice of the expansion basis, since Q is an invariant system characteristic. If the matrix V is full-rank, then the coefficient vector can be estimated by means of ordinary least-squares as c = [V'V]-lV'y
(2.185)
This unbiased and consistent estimate is also efficient (i.e., it has minimum variance among all linear estimators) if the residuals are white (i.e., statistically independent with zero mean) and Gaussian. Ifthe residuals are not white, then the generalized least-squares estimate ofEquation (2.46) can be used, as discussed in Section 2.1.5. In that same section, an iterative procedure was described for obtaining efficient estimates when the residuals are not Gaussian, by minimizing a cost function defined by the log-likelihood function (not repeated here in the interest of space). A practical problem arises when the matrix V is not full-rank or, equivalently, when the Gram matrix G is singular. In this case, a generalized inverse (or pseudoinverse) V+ can be used to obtain the coefficient estimates as [Fan & Kalaba, 2003]: c=V+y
(2.186)
Another problematic situation arises when the Gram matrix G is ill-conditioned (a frequent occurrence in practice). In this case, a generalized inverse can be used or a reducedrank inverse can be found by means of singular-value decomposition to improve numerical stability. The resulting solution in the latter case depends on the selection of the threshold used for determining the "significant" singular values.
Model Order Determination. Of particular practical importance is the determination ofthe structural parameters Land Q ofthe MDV model, which detennine the size ofmatrix V. A statistical criterion for MDV model order selection has been developed by the author that proceeds sequentially in ascending order, as described below for the case of Gaussian white output-additive noise (i.e., the residual vector for the true model order). If the residuals are not white, then the prewhitening indicated by Equation (2.47) must be applied. First, we consider a sequence ofMDV models ofincreasing order "r," where r is a sequential index sweeping through the double loop of increasing Land Q values (L being the fast loop). Thus, starting with (Q = 1, L = 1) for r = 1, we go to (Q = 1, L = 2) for r = 2, and so on. After (Q = 1, L = L max ) for r = L max , we have (Q = 2, L = 1) for r = L m ax + 1 and so on until we reach (Q = Qmax, L = L max ) for r = Qmax . L m ax . For the true model or-
2.3
105
EFFICIENT VOLTERRA KERNEL ESTIMATION
der r = R (corresponding to the true Q and L values), we have (2.187)
Y= VRCR + SR where the residual vector BR is white and Gaussian, and the coefficient vector structs the true Volterra kemels of the system according to
I ... I
.
11=
recon-
Jq-I
L
kq(mb . . . , mq) =
eR
I
. I /«:
CqUb ... ,}q)bJI(mI) ... b, (m q)
(2.188)
q
However, for an incomplete order r, we have the truncated model
Y= V,c, + Sr
(2.189)
where the residual vector contains portion of the input-output relationship and is given by Sr
= [I - Vr[V;Vr]-lV;]y ~HrY
(2.190)
where the "projection" matrix H, is idempotent (i.e., H; = H r) and ofrank (N - Pr), with Pr denoting the number of free parameters given by Equation (2.184) for the respective L and Q. Note that
e; = H~;-IBr-l ~ SrBr-I
(2.191)
which relates the residuals at successive model orders through the matrix Sr that depends on the input data. Since Bo is the same as y (by definition), the rth-order residual vector can be expressed as
e; = SrSr-l ... SIY (2.192)
=Hry
because the concatenation ofthe linear operators [SrSr-l ... Sd simply reduces to H': The residuals of an incomplete model rare composed of an input-dependent term Ur (the unexplained part of the system output) and a stochastic term W r that represents the outputadditive noise after being transformed by the matrix Hr- Therefore,
e, = Ur + W r
(2.193)
where Ur is input-dependent and Hrwo
(2.194)
Ur = Hruo
(2.195)
Wr =
Note that
Wo
denotes the output-additive noise in the data and
Uo
represents the noise-free
106
NONPARAMETRIC MODELING
output data. For the true order R, we must have UR = 0 since all the input-dependent information in the output data has been captured and explained by the MDV model of order R. The expected value ofthe sum ofthe squared residuals (SSR) at order r is given by E[Or] = E[s;sr}
= u.u, + UÖTr{H;Hr }
(2.196)
where UÖ is the variance of the initial white residuals, and Tr{·} denotes the trace of the subject matrix. The latter is found to be
Tr{H;H r} = N - Pr
= N - iL, + Qr)!/{Lr!Qr!)
(2.197)
where L, and Q; are the structural parameter values that correspond to model order r. To test whether r is the true order ofthe system, we postulate the null hypothesis that it is the true order, which implies that Ur = 0, and an estimate of the initial noise variance can be obtained from the computed SSR as
uö = O/{N -
Pr)
(2.198)
Then, the expected value ofthe reduction in the computed SSR for the next order (r + 1) is E[Or- Or+l] =
uö· (Pr+l -Pr)
(2.199)
under this null hypothesis. Therefore, using the estimate of UÖ given by Equation (2.198), we see that this reduction in computed SSR ought to be greater than the critical value: (Jr =
Anr (Fr+l
- Fr) (N -Pr)
(2.200)
to reject this null hypothesis and continue the search to higher orders. We can select A = 1 + 2V2, because the SSR follows approximately a chi-square distribution. Ifthe reduction in computed SSR is smaller than the critical value given by Equation (2.200), then the order r is accepted as the true order ofthe MDV model for this system; otherwise we proceed with the next order (r + 1) and repeat the test. As an alternative approach, we can employ a constrained configuration of the MDV model in the form of equivalent feedforward networks, which reduce significantly the number of free parameters for high-order systems regardless of the value of L, as discussed in Section 2.3.3. However, the estimation problem ceases to be linear in terms of the unknown parameters and estimation ofthe network parameters is achieved with iterative schemes based on gradient descent, as discussed in Section 4.2.2. The use of gradient-based iterative estimation of system kemels was first attempted with "stochastic approximation" methods [Goussard et al., 1991], but it became more attractive and more widely used with the introduction of the equivalent network models discussed in Section 2.3.3. It should be noted that these methods offer computational advantages when the size of the Gram matrix G becomes very large, but they are exposed to problems of conver-
2.3
EFFICIENT VOLTERRA KERNEL ESTIMATION
107
gence (speed and local minima). They also allow robust estimation in the presence of noise outliers (i.e., when the model residuals are not Gaussian and exhibit occasional outliers) by utilizing nonquadratic cost functions compatible with the log-likelihood 2nction of the residuals, since least-squares estimation methods are prone to serious errors in those cases (as discussed in Sections 2.1.5 and 2.4.4). The key to the efficacy of the kerne I expansion approach is in finding the proper basis {bj } that reduces the number L to a minimum for a given application (since Q is fixed for a given system, the number of free parameters depends only on L). The search for the putative "minimum set" of such basis functions may commence with the use of a general complete orthonormal basis (such as the Laguerre basis discussed in Section 2.3.2 below) and advance with the notion of "principal dynamic modes" (discussed in Section 4.1.1), in order to extract the significant dynamic components of the specific system and eliminate spurious or insignificant terms.
2.3.2
The Laguerre Expansion Technique
To date, the best implementation ofthe kernel expansion approach (discussed in the previous section) has been made with the use of discrete Laguerre functions. The Laguerre basis (in continuous time) was Wiener' s original suggestion for expansion ofthe Wiener kernels, because the Laguerre functions are orthogonal over a domain from zero to infinity (consistent with the kernel domain) and have a built-in exponential (consistent with the relaxation dynamic characteristics ofphysical systems). Furthermore, the Laguerre expansions can be easily generated in continuous time with analog means (ladder R-C network), which enhanced their popularity at that time when analog processing was in vogue. With the advancing digital computer technology, the data processing focus shifted to discretized signals and, consequently, to sampled versions of the continuous-time Laguerre functions. The first applications of Laguerre kernel expansions were made in the early 1970s on the eye pupil reflex [Watanabe & Stark, 1975] and on hydrological rainfall-runoffprocesses [Amorocho & Branstetter, 1971] using sampled versions ofthe continuous-time Laguerre functions. Note that these discretized Laguerre functions are distinct from the "discrete Laguerre functions" (DLFs) advocated herein, which are constructed to be orthogonal in discrete time [Ogura, 1985] (the discretized versions are not generally orthogonal in discrete time but tend to orthogonality as the sampling interval tends to zero). Wide application of the Laguerre kernel expansion approach commenced in the 1990s with the successful introduction ofthe DLFs discussed below [Marmarelis, 1993]. The Laguerre expansion technique (LET) for Volterra kernel estimation is cast in discrete time by the use ofthe orthonormal set of discrete Laguerre 2nctions (DLFs) given by [Ogura, 1985] hirn) = a(m- j)/2(l - a)1I2
~(-1 )k(: )({ )ai-k(l -
a)k
(2.201)
where b_j(m) denotes the jth-order orthonormal DLF, the integer m ranges from 0 to M − 1 (the memory-bandwidth product of the system), and the real positive number a (0 < a < 1) is the critical DLF parameter that determines the rate of exponential (asymptotic) decline of these functions. We consider a DLF filterbank (for j = 0, 1, …, L − 1) receiving the input signal x(n) and generating at the output of each filter the key variables {v_j(n)} as
the discrete-time convolutions given by Equation (2.182) between the input and the respective DLF. Since the sampled versions of the traditional continuous-time Laguerre functions, which were originally proposed by Wiener and used by Watanabe and Stark in the first known application of Laguerre kernel expansions to physiological systems [Watanabe & Stark, 1975], are not necessarily orthogonal in discrete time (depending on the sampling interval), Ogura constructed the DLFs to be orthogonal in discrete time and introduced their use in connection with Wiener-type kernel estimation that involves the computation of covariances by time-averaging over the available data records (with all the shortcomings of the covariance approach discussed earlier) [Ogura, 1985]. The advocated LET combines Ogura's DLFs (which are easily computed using an autorecursive relation) with least-squares fitting (instead of covariance computation), as was initially done by Stark and Watanabe. An important enhancement of the LET was introduced by the author in terms of adaptive estimation of the critical Laguerre parameter a from the input-output data, as elaborated below. This is a critical issue in actual applications of the Laguerre expansion approach, because it determines the rate of convergence of the DLF expansion and the efficiency of the method. It should be noted that the LET was introduced in connection with Volterra kernel estimation, but it could also be used for Wiener kernel estimation if the Wiener series is employed as the model structure. However, there is no apparent reason to use the Wiener series formulation, especially when a complete (nontruncated) model is employed. Of course, the Volterra formulation must be used when the input signal is nonwhite. When the discretized Volterra kernels of the system are expanded on the DLF basis as
k_r(m_1, \ldots, m_r) = \sum_{j_1=0}^{L-1} \sum_{j_2=0}^{j_1} \cdots \sum_{j_r=0}^{j_{r-1}} c_r(j_1, \ldots, j_r) b_{j_1}(m_1) \cdots b_{j_r}(m_r)    (2.202)
then the resulting MDV model of Qth order is given by Equation (2.180), where a finite number L of DLFs is used to represent the kernels as in Equation (2.202). The task of Volterra system modeling now reduces to estimating the unknown kernel expansion coefficients {c_r(j_1, …, j_r)} from Equation (2.180), where the output data {y(n)} and the transformed input data {v_j(n)} (which are the outputs of the DLF filterbank) are known. This task can be performed either through direct inversion of the matrix V in the formulation of Equation (2.183) or using iterative (gradient-based) techniques that can be more robust when the errors are non-Gaussian (with possible outliers) and/or the size of the V matrix is very large. The total number of free parameters P that must be estimated [which is equal to the number of columns of matrix V given by Equation (2.184)] remains the critical consideration with regard to estimation accuracy and computational burden. Estimation accuracy generally improves as the ratio P/N decreases, where N is the total number of input-output data points. The computational burden increases with P or N, but is more sensitive to increases in P. Thus, minimizing L is practically important in order to minimize P, since the model order Q is dictated by the nonlinear characteristics of the system (beyond our control). A complete model (i.e., when Q is the highest order of significant nonlinearity in the system) is required in order to avoid estimation biases in the obtained Volterra kernel estimates. This also alleviates the strict requirement of input whiteness, and thus naturally occurring data can be used for Volterra kernel estimation.
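A direct implementation of the DLF formula (2.201) is sketched below, together with a check of discrete-time orthonormality; the values of a, L, and M are illustrative assumptions.

```python
import numpy as np
from math import comb

# A sketch of the DLF formula (2.201) with an orthonormality check;
# a, L, and M below are illustrative, not prescribed, values.
def dlf(j, M, a):
    b = np.zeros(M)
    for m in range(M):
        s = sum((-1) ** k * comb(m, k) * comb(j, k)
                * a ** (j - k) * (1.0 - a) ** k for k in range(j + 1))
        b[m] = a ** ((m - j) / 2.0) * np.sqrt(1.0 - a) * s
    return b

a, L, M = 0.2, 5, 50
B = np.stack([dlf(j, M, a) for j in range(L)])   # L x M matrix of DLFs
print(np.round(B @ B.T, 3))                      # approximately the identity
```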
The computations required for given L and Q can be reduced by computing the key variables {v_j(n)} using the autorecursive relation [Ogura, 1985]:
v_j(n) = \sqrt{\alpha}\, v_j(n-1) + \sqrt{\alpha}\, v_{j-1}(n) - v_{j-1}(n-1)
(2.203)
which is due to the particular form of the DLFs. Computation of this autorecursive relation must be initialized by the following first-order autorecursive equation that yields v_0(n) for a given stimulus x(n):
v_0(n) = \sqrt{\alpha}\, v_0(n-1) + T\sqrt{1-\alpha}\, x(n)
(2.204)
where T is the sampling interval. These computations can be performed rather fast for n = 1, \ldots, N and j = 0, 1, \ldots, L − 1, where L is the total number of DLFs used in the kernel expansion. The DLFs themselves can also be generated by these autorecursive relations. The choice of the Laguerre parameter α is critical in achieving efficient kernel expansions (i.e., minimizing L) and, consequently, fast and accurate kernel estimation. Its judicious selection was initially made on the basis of the parameter values L and M [Marmarelis, 1993] or by successive trials. However, the author recently proposed its estimation through an iterative adaptive scheme based on the autorecursive relation of Equation (2.203). This estimation task is embedded in the broader estimation procedure for the kernel expansion coefficients using iterative gradient-based methods. Specifically, we seek the minimization of a selected nonnegative cost function F [usually the square of the residual term in Equation (2.180)] by means of the iterative gradient-based expression
c_r^{(i+1)}(j_1, \ldots, j_r) = c_r^{(i)}(j_1, \ldots, j_r) - \gamma \left. \frac{\partial F(\varepsilon)}{\partial c_r(j_1, \ldots, j_r)} \right|_{\varepsilon = \varepsilon^{(i)}}
(2.205)
where γ denotes the adaptation step (learning constant), i is the iteration index, and the gradient ∂F/∂c_r is evaluated for the ith-iteration error [i.e., the model residual or prediction error is computed from Equation (2.180) for the ith-iteration parameter estimates]. In the case of the Laguerre expansion, the iterative estimation of the expansion coefficients based on the expression (2.205) can be combined with the iterative estimation of the square root of the DLF parameter, β = α^{1/2}:
\beta^{(i+1)} = \beta^{(i)} - \gamma_\beta \left. \frac{\partial F(\varepsilon)}{\partial \beta} \right|_{\varepsilon = \varepsilon^{(i)}}
(2.206)
Clearly, changes in β (or α = β²) alter the values of the key variables {v_j(n)} according to the autorecursive relation (2.203), and thus
\frac{\partial v_j(n)}{\partial \beta} = v_j(n-1) + v_{j-1}(n)
(2.207)
This simple gradient relation (with respect to β = α^{1/2}) allows iterative estimation of α along with the expansion coefficients in a practical context. For instance, in the case of a quadratic cost function F = ε² (corresponding to least-squares fitting), the gradient components are
\frac{\partial F(\varepsilon)}{\partial c_r(j_1, \ldots, j_r)} = -2\varepsilon(n)\, v_{j_1}(n) \cdots v_{j_r}(n)
(2.208)
and
\frac{\partial F}{\partial \beta} = -2\varepsilon(n) \sum_{r=1}^{Q} \sum_{j_1=0}^{L-1} \cdots \sum_{j_r=0}^{L-1} c_r(j_1, \ldots, j_r) \sum_{i=1}^{r} v_{j_1}(n) \cdots \left[ v_{j_i}(n-1) + v_{j_i - 1}(n) \right] \cdots v_{j_r}(n)
(2.209)
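For concreteness, a minimal sketch of the filter-bank computation of Equations (2.203)-(2.204) follows, assuming zero initial conditions; the gradient approximation of Equation (2.207) is noted in a comment:

```python
import numpy as np

def dlf_filterbank(x, L, alpha, T=1.0):
    """Outputs v_j(n) of the DLF filter bank driven by input x, computed
    with the autorecursive relations (2.203)-(2.204)."""
    beta = np.sqrt(alpha)                  # beta = alpha**(1/2)
    v = np.zeros((len(x), L))
    vprev = np.zeros(L)                    # v_j(n - 1); zero initial conditions
    for n, xn in enumerate(x):
        vcur = np.empty(L)
        vcur[0] = beta * vprev[0] + T * np.sqrt(1.0 - alpha) * xn          # (2.204)
        for j in range(1, L):
            vcur[j] = beta * vprev[j] + beta * vcur[j - 1] - vprev[j - 1]  # (2.203)
        # per (2.207): dv_j(n)/dbeta is approximated by vprev[j] + vcur[j - 1]
        v[n] = vcur
        vprev = vcur
    return v

# The DLFs themselves are obtained by driving the bank with a unit impulse:
imp = np.zeros(50); imp[0] = 1.0
dlfs = dlf_filterbank(imp, L=5, alpha=0.2)   # column j holds b_j(m), m = 0..49
```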
Although this procedure appears to be laborious for high-order models, it becomes drastically accelerated when an equivalent network structure is used to represent the high-order model, as discussed in Section 2.3.3.

It is instructive to illustrate certain basic properties of the DLFs. The first five DLFs for α = 0.2 are shown in Figure 2.17 (T = 1). We note that the number of zero crossings (roots) of each DLF equals its order. Furthermore, the higher the order, the longer the significant values of a DLF spread over time, and the larger the time separation between zero crossings. This is further illustrated in Figure 2.18, where the DLFs of order 4, 8, 12, and 16 are shown for α = 0.2. An interesting illustration of the first 50 DLFs for α = 0.2 is given in Figure 2.19, plotted as a square matrix over 50 time lags in three-dimensional perspective (top display) and contour plot (bottom display). We note the symmetry of this matrix and the fact that higher-order DLFs are increasingly "delayed" in their significant values.
Figure 2.17 Discrete-time Laguerre functions (DLFs) of order 0 (solid), 1 (dotted), 2 (dashed), 3 (dot-dash), and 4 (dot-dot-dot-dash) for α = 0.2, plotted over the first 25 lags [Marmarelis, 1993].
Figure 2.18 Discrete-time Laguerre functions (DLFs) of order 4 (solid), 8 (dotted), 12 (dashed), and 16 (dot-dash) for α = 0.2, plotted over the first 25 lags [Marmarelis, 1993].
Note that Volterra kernels with a pure delay may require use of "associated DLFs" of appropriate order for efficient kernel representation [Ogura, 1985]. In the frequency domain, the FFT magnitude of all DLFs for a given α is identical, as illustrated in Figure 2.20 for the five DLFs of Figure 2.17. However, the FFT phase of these DLFs is different, as illustrated also in Figure 2.20. We note that the nth-order DLF exhibits a maximum phase shift of nπ, with the odd-order ones commencing at π radians and the even-order ones commencing at 0 radians (at zero frequency). Thus, the "minimum-phase" DLF is always the one of order zero. In order to examine the effect of α on the form of the DLFs, we show in Figure 2.21 the fourth-order DLF for α = 0.1, 0.2, and 0.4. We observe that increasing α results in a longer spread of significant values and zero crossings. Thus, kernels with longer memory may require a larger α for efficient representation. In the frequency domain, the FFT magnitudes and phases of the DLFs of Figure 2.21 are shown in Figure 2.22. We observe that for larger α, the lower frequencies are emphasized more and the phase lags more rapidly, although the total phase shift is the same (4π for all fourth-order DLFs). A more complete illustration of the effect of α is given in Figure 2.23, where the matrix of the first 50 DLFs for α = 0.1 is plotted in the same fashion as in Figure 2.19 (for α = 0.2). It is clear from these figures that increasing α increases the separation between the zero crossings (ripple in the three-dimensional perspective plots) and broadens the "fan" formation evident in the contour plots. These observations provide an initial guide to selecting an approximate value of α for a given kernel memory-bandwidth product (M) and number of DLFs (L). For instance, we may select the value of α for which the significant values of the DLFs extend from 0 to M while, at the same time, the DLF values are diminished for m ≫ M (in order to secure their orthogonality).
Figure 2.19 The first 50 DLFs for α = 0.2, plotted from 0 to 49 lags in a three-dimensional perspective plot (left panel) and contour plot (right panel) [Marmarelis, 1993].
In other words, for given values M and L, α must be chosen so that the point (M, L) in the contour plane is near the edge of the "fan" formation but outside this "fan."
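This "fan" rule can be screened numerically. The helper below is a hypothetical illustration (it reuses dlf_filterbank from the earlier sketch, and the tolerance is an assumed choice): it rejects values of α for which the highest-order DLF has not decayed to negligible values beyond lag M; among the admissible values, the largest spreads the DLFs closest to M:

```python
import numpy as np

def alpha_is_adequate(L, alpha, M, tol=0.01):
    """Heuristic screen: the (L-1)th-order DLF should be negligible
    (below tol times its peak magnitude) for lags beyond M."""
    imp = np.zeros(4 * M); imp[0] = 1.0
    b = dlf_filterbank(imp, L, alpha)     # from the earlier sketch
    last = np.abs(b[:, L - 1])            # highest-order DLF
    return last[M:].max() < tol * last.max()

M, L = 50, 8
admissible = [round(a, 2) for a in np.arange(0.05, 0.95, 0.05)
              if alpha_is_adequate(L, a, M)]
print("candidate alpha values:", admissible)
```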
Illustrative Examples. We now demonstrate the use of DLF expansions for kernel estimation. Consider first a second-order nonlinear system with the first- and second-order Volterra kernels shown in Figure 2.24, corresponding to a simple L-N cascade.
Figure 2.20 (a) FFT magnitude of the first five (orders 0 to 4) DLFs (shown in Figure 2.17) for α = 0.2, plotted up to the normalized Nyquist frequency of 0.5. (b) FFT phase of the first five (orders 0 to 4) DLFs (shown in Figure 2.17) for α = 0.2, plotted up to the normalized Nyquist frequency of 0.5 [Marmarelis, 1993].
Figure 2.21 The fourth-order DLFs for α = 0.1 (solid line), 0.2 (dotted line), and 0.4 (dashed line), plotted from 0 to 49 lags [Marmarelis, 1993].
This system is simulated for a band-limited GWN input of 512 data points, and the kernels are estimated using both the cross-correlation technique (CCT) and the advocated Laguerre expansion technique (LET). The first-order kernel estimates are shown in Figure 2.25, where the LET estimate is plotted with a solid line (almost exactly identical to the true kernel) and the CCT estimate is plotted with a dashed line. The second-order kernel estimates are shown in Figure 2.26, demonstrating the superiority of the LET estimates. Note that for a second-order system, the Volterra and Wiener kernels are identical (except for the zeroth-order one).
Figure 2.22 (a) FFT magnitude of the fourth-order DLFs shown in Figure 2.21, for α = 0.1 (solid line), α = 0.2 (dotted line), and α = 0.4 (dashed line). (b) FFT phase of the fourth-order DLFs shown in Figure 2.21, for α = 0.1 (solid line), α = 0.2 (dotted line), and α = 0.4 (dashed line) [Marmarelis, 1993].
Figure 2.23 The first 50 DLFs for α = 0.1, plotted from 0 to 49 lags in a three-dimensional perspective plot (left panel) and contour plot (right panel) [Marmarelis, 1993].
Thus, the Volterra kernel estimates obtained via LET can be directly compared with Wiener kernel estimates obtained via CCT. These LET estimates were obtained for L = 10 and α = 0.1, and the estimates of the Laguerre coefficients in this case are shown in Figure 2.27 for the first-order and second-order kernels. Note that the second-order coefficients are plotted as a symmetric matrix, even though only the entries of one triangular region are estimated. It is evident from Figure 2.27 that an adequate estimation of these kernels can also be accomplished with L = 8 (due to the very small values of the ninth and tenth coefficients), which demonstrates the significant compactness in kernel representation accomplished by the Laguerre expansion (for M = 50 and L = 8, the savings factor is about 35 for a second-order model and grows rapidly for higher-order models). Recall that the number L of required DLFs for accurate representation of the kernels of a given system critically affects the computational burden associated with this method.
Figure 2.24 (a) Exact first-order kernel of the simulated system. (b) Exact second-order kernel of the simulated system [Marmarelis, 1993].
Figure 2.25 The estimated first-order kernel via LET (solid line) and CCT (dashed line) [Marmarelis, 1993].
In general, the total number of resulting expansion coefficients (free parameters of the model) for a system of order Q is (L + Q)!/(L!Q!). Note that this number includes all kernels up to order Q and takes into account the symmetries in high-order kernels. The required L in a given application can be determined by the statistical procedure described in the previous section [see Equation (2.200)] or can be empirically selected by estimating the first-order kernel for a large L and then selecting, by inspection of the obtained coefficient estimates, the minimum number that corresponds to significant values.
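Since the count quoted above equals the binomial coefficient C(L + Q, Q), it can be tabulated directly; the small sketch below uses only the standard library:

```python
from math import comb

def mdv_free_parameters(L, Q):
    """Total number of expansion coefficients for all kernels up to order Q,
    accounting for kernel symmetries: (L + Q)!/(L! Q!) = C(L + Q, Q)."""
    return comb(L + Q, Q)

print(mdv_free_parameters(10, 2))   # L = 10, Q = 2 -> 66 coefficients
print(mdv_free_parameters(8, 2))    # L = 8,  Q = 2 -> 45 coefficients
print(mdv_free_parameters(50, 2))   # M = 50 lags without expansion -> 1326
```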
Figure 2.26 (a) Second-order kernel estimated via LET. (b) Second-order kernel estimated via CCT [Marmarelis, 1993].
The same reasoning can be applied to high-order kernels (e.g., the second-order coefficients shown in Figure 2.27), although the pattern of convergence of the Laguerre expansion depends on α, a fact that prompted the introduction of the iterative scheme (discussed above) for estimating α (along with the expansion coefficients) on the basis of the data. Selection of the required maximum kernel order Q can also be made using the statistical procedure of Equation (2.200) or can be empirically based on preliminary tests or on the adequacy of the output prediction accuracy of a given model order (as in all previous applications of the Volterra-Wiener approach). We strongly recommend the selection of the key model parameters L and Q using the statistical criterion of Equation (2.200) and successive trials in ascending order, as discussed above. The use of the popular criteria (e.g., Akaike information criterion or minimum description length), although rather common in established practice, is discouraged as potentially misleading.

We now examine the effect of noise on the obtained kernel estimates by adding independent GWN to the output signal for a signal-to-noise ratio (SNR) of 10 dB. The first-order kernel estimates obtained by LET and CCT are shown in Figure 2.28 as a solid line for LET and as a dotted line for CCT. The corresponding second-order kernel estimates are shown in Figure 2.29 and demonstrate the superiority of the LET approach in terms of robustness under noisy conditions. Note that the LET estimates in this demonstration were computed with L = 8 and required very short computing time. The fact that kernel estimates of this quality can be obtained from such short data records (512 input-output data points), even in cases with significant noise (SNR = 10 dB), can have important implications in actual applications of the advocated modeling approach to physiological systems.

Another important issue in actual applications is the effect of higher-order nonlinearities on the obtained lower-order kernel estimates when a truncated (incomplete) Volterra model is used. Since most applications limit themselves to the first two kernels, the presence of higher-order (>2) Volterra functionals acts as a source of "correlated noise" that is dependent on the input. To illustrate this effect, we add third-order and fourth-order nonlinearities to the previous system and recompute the kernel estimates.
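For readers who wish to reproduce these test conditions, a minimal simulation sketch follows; the impulse response of the linear stage is an assumed, illustrative choice, whereas the input length (512 points), the static nonlinearities, and the 10 dB output SNR follow the text:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 512
x = rng.standard_normal(N)                       # GWN input surrogate
m = np.arange(25)
h = np.exp(-0.3 * m) * np.sin(0.5 * m)           # illustrative filter (assumed shape)

z = np.convolve(x, h)[:N]                        # linear stage output z(n)
y2 = z + z**2                                    # second-order L-N cascade
y4 = z + z**2 + z**3 / 3 + z**4 / 4              # fourth-order L-N cascade

def add_output_noise(y, snr_db):
    """Add independent zero-mean GWN so that the output SNR equals snr_db."""
    p = np.mean((y - y.mean())**2)
    return y + rng.normal(0.0, np.sqrt(p / 10**(snr_db / 10.0)), y.shape)

y2_noisy = add_output_noise(y2, snr_db=10.0)     # the 10 dB test condition
```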
Figure 2.27 (a) Estimates of the first 10 (order 0 to 9) Laguerre expansion coefficients for the first-order kernel. (b) Estimates of the Laguerre expansion coefficients (order 0 to 9 in each dimension) for the second-order kernel, plotted as a symmetric two-dimensional array (total number of distinct coefficients is 55) [Marmarelis, 1993].
Figure 2.28 First-order kernel estimates obtained via LET (solid line) and CCT (dotted line) for noisy data (SNR = 10 dB) [Marmarelis, 1993].
Note that the simulated system is a simple L-N cascade of a linear filter followed by a static nonlinearity of the form y = z + z² (for the second-order system) and y = z + z² + z³/3 + z⁴/4 (for the fourth-order system), where z(t) is the output of the linear filter. The obtained first-order kernel estimates are shown in Figure 2.30, where the exact Wiener kernel is plotted with a solid line, the LET estimate with a dotted line, and the CCT estimate with a dashed line. The LET estimate is much better than its CCT counterpart, but it exhibits certain minor deviations from the exact kernel due to the presence of the third-order and fourth-order terms. The exact second-order Volterra kernel of the fourth-order system has the same form as in Figure 2.24, but its second-order Wiener kernel is scaled by a factor of 2.23 because of the presence of the fourth-order nonlinearity [see Equation (2.57) for P = 1]. The second-order kernel estimates obtained via LET and CCT are shown in Figure 2.31 and demonstrate that the LET estimate is better than its CCT counterpart.
Figure 2.29 (a) Second-order kernel estimate obtained via LET for noisy data (SNR = 10 dB). (b) Second-order kernel estimate obtained via CCT for noisy data [Marmarelis, 1993].
Figure 2.30 First-order kernel estimates for the fourth-order simulated system obtained via LET (dotted line) and CCT (dashed line). The exact kernel is plotted with the solid line and is almost identical to the LET estimate [Marmarelis, 1993].
Note that the LET estimates closely resemble the exact Wiener or Volterra kernels in form (since both have the same shape in this simulated system), but have the size of the Wiener kernels (for this cascade system) because the input is GWN and the estimated model is truncated to the second order.

An important advantage of the advocated technique (LET) is its ability to yield accurate kernel estimates even when the input signal deviates from white noise (as long as the model order is correct). This is critically important in experimental studies where a white-noise (or quasiwhite) input cannot be easily secured. As an illustrative example, consider the previous second-order system being simulated for a nonwhite (broadband) input with two resonant peaks [Marmarelis, 1993].
Figure 2.31 (a) Second-order kernel estimated for the fourth-order simulated system obtained via LET. (b) Second-order kernel estimated for the fourth-order simulated system obtained via CCT [Marmarelis, 1993].
Figure 2.32 First-order kernel estimates for nonwhite stimulus obtained via LET (dotted line) and CCT (dashed line). The exact kernel (solid line) is nearly superimposed on the LET estimate [Marmarelis, 1993].
The first-order kernel estimates obtained via LET and CCT are shown in Figure 2.32, where the LET estimate (dotted line) is almost identical to the exact kernel (solid line), while the CCT estimate (dashed line) shows the effects of the nonwhite stimulus in terms of estimation bias (in addition to the anticipated estimation variance). The second-order kernel estimates obtained via LET and CCT are shown in Figure 2.33 and clearly demonstrate that the LET estimate is far better than the CCT estimate and is not affected by the nonwhite spectral characteristics of the input. Note, however, that the LET estimates will be affected by the spectral characteristics of a nonwhite input in the presence of higher-order nonlinearities. This is illustrated by simulating the previous fourth-order system with the nonwhite input. The obtained first-order kernel estimates via LET and CCT are shown in Figure 2.34, along with the exact kernel. The LET estimate is affected by showing some additional estimation bias relative to the previous
Figure 2.33 (a) Second-order kernel estimated for nonwhite stimulus via LET. (b) Second-order kernel estimated for nonwhite stimulus via CCT [Marmarelis, 1993].
Figure 2.34 First-order kernel estimates for nonwhite stimulus and fourth-order system obtained via LET (dotted line) and CCT (dashed line). The exact kernel is plotted with a solid line [Marmarelis, 1993].
case of white input for the fourth-order system (see Figure 2.30), but still remains much better than its CCT counterpart. The same is true for the second-order kernel estimates shown in Figure 2.35. It is interesting to note that the overall form of the CCT second-order kernel estimates for the nonwhite input (see Figures 2.33 and 2.35) is not affected much by the presence of higher-order terms, even though the estimates are rather poor in both cases.

These results demonstrate the fact that the advocated LET approach yields accurate kernel estimates from short experimental data records, even for nonwhite (but broadband) inputs, when there are no significant nonlinearities higher than the estimated ones (nontruncated models). However, these kernel estimates may be seriously affected when the
Figure 2.35 (a) The second-order kernel estimate for the nonwhite stimulus and fourth-order system obtained via LET. (b) The second-order kernel estimate for the nonwhite stimulus and fourth-order system obtained via CCT [Marmarelis, 1993].
experimental input is nonwhite and significant higher-order nonlinearities exist beyond the estimated model order (truncated model). This is due to the fact that the model residuals are correlated and input-dependent, resulting in certain estimation bias owing to lower-order "projections" from the omitted higher-order terms (which are distinct from the projections associated with the structure of the Wiener series). Of course, this problem is alleviated when all significant nonlinearities (kernels) are included in the estimated model. This is illustrated below with the accurate estimation of third-order kernels from short data records (made possible by the ability of the LET approach to reduce the number of required parameters for kernel representation). Although this compactness of kernel representation cannot be guaranteed in every case, the structure of the DLFs (i.e., exponentially weighted polynomials) makes it likely for most physiological systems, since their kernels usually exhibit asymptotically exponential structure.

Let us now consider the previously simulated cascade system with the third-order nonlinearity y = z + z² + z³, receiving a band-limited GWN input of 1024 data points. The resulting third-order kernel estimate via LET is shown in Figure 2.36 as a two-dimensional "slice" for m_3 = 4 (note that visualization of third-order kernels requires taking such "slices" for specific values of m_3). Comparison with the exact third-order kernel "slice" at m_3 = 4, also shown in Figure 2.36, indicates the efficacy of the LET technique for third-order kernel estimation. Results of similar quality were obtained for all other values of m_3. It has been shown that the third-order kernel estimate obtained via the traditional CCT in this case is rather poor [Marmarelis, 1993]. The ability of the advocated LET approach to make the estimation of third-order kernels practical and accurate from short data records creates a host of exciting possibilities for the effective nonlinear analysis of physiological systems with significant third-order nonlinearities. This can give good approximations of nonlinearities with an inflection point (such as sigmoid-type nonlinearities that have gained increasing prominence in recent years) that are expected to be rather common in physiology because of the requirement for bounded operation for very large positive or negative input values. The efficacy of the advocated kernel estimation technique (LET) is further demonstrated in Chapter 6 with results obtained from actual experimental data in various physiological systems.
Figure 2.36 (a) Third-order kernel estimate (two-dimensional "slice" at m_3 = 4) obtained via LET. (b) Exact third-order kernel "slice" at m_3 = 4 [Marmarelis, 1993].
2.3.3
High-Order Volterra Modeling with Equivalent Networks
The use of a basis of functions for kernel expansion (discussed in the previous two sections) is equivalent to using a linear filter bank to preprocess the input in order to arrive at a static nonlinear relation between the system output and the multiple outputs of the filter bank, in accordance with Wiener's original suggestion (see Figure 2.9). As discussed in Section 2.2.2, Wiener further suggested the expansion of the multiinput static nonlinearity into a set of Hermite multinomials, giving rise to a network of adjustable coefficients of these multinomials that are characteristic of each system and must be estimated from the input-output data (i.e., a parametric description of the system). This model structure (often referred to as the Wiener-Bose model and shown in Figure 2.9) is an early network configuration that served as an equivalent model for the Wiener class of nonlinear dynamic systems. For reasons discussed in Section 2.2.2, this particular model form was never adopted by users. However, Wiener's idea has been adapted successfully in the Volterra context, where the multiinput static nonlinear relation can be expressed in multinomial form, as shown in Equation (2.178) for the general Volterra model representation (see Figure 2.16). This modular structure of the general Volterra model can also be cast as an equivalent feedforward network configuration that may offer certain practical/methodological advantages presented below.

Various architectures can be utilized to define equivalent networks for Volterra models of physiological systems. Different types of input preprocessing can be used (i.e., different filter banks) depending on the dynamic characteristics of the system, and different numbers or types of hidden units (e.g., activation functions) can be used depending on the nonlinear characteristics of the system. The selected architecture affects the performance of the model in terms of parsimony, robustness, and prediction accuracy, as elaborated in Chapter 4, where the connectionist modeling approach is discussed further. In this section, we will limit ourselves to discussing the network architectures that follow most directly from the Volterra model forms discussed in Section 2.3.1. Specifically, we focus on the use of filter banks for input preprocessing and polynomial-type decompositions of the static nonlinearity (i.e., polynomial activation functions of the hidden units) transforming the multiple outputs of the filter bank into the system output. One fundamental feature of these basic architectures is that there are no recurrent connections in these equivalent networks and, therefore, they attain forms akin to the feedforward "artificial neural networks" that have received considerable attention in recent years. Nonetheless, recurrent connections may offer significant advantages in certain cases and are discussed in Chapter 10. Due to the successful track record of the Laguerre expansion technique (presented in Section 2.3.2), the particular feedforward network architecture that utilizes the discrete-time Laguerre filter bank for input preprocessing, termed the "Laguerre-Volterra network," has found many successful applications in recent years and is elaborated in Section 4.3.

The modular representation of the general Volterra model shown in Figure 2.16 is equivalent to a feedforward network architecture that can be used to obtain accurate and robust high-order Volterra models using relatively short data records.
One particular implementation that has been used successfully in several applications to physiological system modeling is the aforementioned Laguerre-Volterra network (discussed in Section 4.3). In this section, we discuss the general features of the feedforward "Volterra-Wiener network" (VWN), which implements directly the block-structured model of Figure 2.16 with a single hidden layer having polynomial activation functions.
The VWN employs a discrete-time filter bank with trainable parameter(s) for input preprocessing and is equivalent to a "separable Volterra network," discussed below. The applicability of the VWN is premised on the separability of the multiinput static nonlinearity f into a sum of polynomial transformations of linear combinations of the L outputs {v_j} of the filter bank as
y(n) = f(v_1, \ldots, v_L) = c_0 + \sum_{h=1}^{H} \sum_{q=1}^{Q} c_{h,q} \left[ \sum_{j=1}^{L} w_{h,j}\, v_j(n) \right]^q
(2.210)
where {w_{h,j}} are the connection weights from the L outputs of the filter bank to the H hidden units and {c_{h,q}} are the coefficients of the polynomial activation functions of the hidden units. Although this separability cannot be guaranteed in the general case for finite H, it has been found empirically that such separability is possible (with satisfactory approximation accuracy) for all cases of physiological system modeling attempted to date. In fact, it has been found that reasonable approximation accuracy is achieved even for small H, leading to the notion of "principal dynamic modes" discussed in Section 4.1.1. It can be shown that exact representation of any Volterra system can be achieved by letting L and H approach infinity [Marmarelis & Zhao, 1994, 1997]. However, this mathematical result is of no practical utility, as we must restrict L and H to small values by practical necessity.

Estimation of the unknown parameters of the VWN (i.e., the w's and c's) is achieved by iterative schemes based on gradient descent and the chain rule of differentiation for error back-propagation, as discussed in detail in Section 4.2. Here we limit ourselves to discussing the main advantage of the VWN model (i.e., the reduction in the number of unknown parameters) relative to conventional Volterra or MDV models, as well as the major considerations governing their applicability to physiological systems.

In terms of model compactness, the VWN has [H(L + Q) + 1] free parameters (excluding any trainable parameters of the filter bank), whereas the MDV model of Equation (2.180) based on kernel expansions has [(L + Q)!/(L!Q!)] free parameters (in addition to any filter bank parameters, such as the DLF parameter α). The conventional discrete Volterra model of Equation (2.32) with memory-bandwidth product M has [(M + Q)!/(M!Q!)] free parameters. Since typically M ≫ L, the conventional discrete Volterra model is much less compact than the MDV model, and we will focus on comparing the MDV with the VWN. It is evident that the VWN is more compact when H < (L + Q − 1) ⋯ (L + 1)/Q!. Clearly, the potential benefits of using the VWN increase for larger Q, given that H is expected to retain small values in practice (less than five). Since L is typically less than ten, the VWN is generally expected to be more compact than the MDV model (provided the separability property applies to our system). As an example, for H = 3, L = 10, and Q = 3 we have a compactness ratio of 7 in favor of the VWN, which grows to about 67 for Q = 5. This implies significant savings in data-record length requirements when the VWN is used, especially for high-order models. Note that the savings ratio is much higher relative to conventional discrete Volterra models (about 4,000 for Q = 3).
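A minimal sketch of Equation (2.210), together with the parameter-count comparison just discussed, is given below; all names are illustrative:

```python
import numpy as np
from math import comb

def vwn_output(v, w, c, c0=0.0):
    """Evaluate Equation (2.210): y(n) = c0 + sum_h sum_q c[h, q-1] * u_h(n)**q,
    with hidden-unit inputs u_h(n) = sum_j w[h, j] * v_j(n).

    v: (N, L) filter-bank outputs; w: (H, L) weights; c: (H, Q) coefficients.
    """
    u = v @ w.T                                              # (N, H)
    Q = c.shape[1]
    powers = np.stack([u**q for q in range(1, Q + 1)], axis=-1)  # (N, H, Q)
    return c0 + np.einsum('nhq,hq->n', powers, c)

H, L, Q = 3, 10, 3
print("VWN free parameters:", H * (L + Q) + 1)               # 40
print("MDV free parameters:", comb(L + Q, Q))                # 286 (ratio ~ 7)
```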
Another network model architecture emerges when the separability concept used in the VWN is applied to the conventional discrete Volterra model, without a filter bank for preprocessing the input. This leads to the "separable Volterra network" (SVN), which exhibits performance characteristics between the VWN and the MDV model, since its number of free parameters is [H(M + Q) + 1]. For instance, the VWN is about eight times more compact than the SVN when M = 100, L = 10, and Q = 3, regardless of the number of hidden units. The SVN is also useful in demonstrating the equivalence between discrete Volterra models and three-layer perceptrons or feedforward artificial neural networks [Marmarelis & Zhao, 1997], as discussed in Section 4.2.1.

It is evident that a variety of filter banks can be used in given applications to achieve the greatest model compactness. Naturally, the performance improvement in the modeling task depends on how well the chosen filter bank matches the dynamics of the particular system under study. This has given rise to the concept of the "principal dynamic modes," which represent the optimal (minimal) set of filters in terms of MDV model compactness for a given system, as discussed in Section 4.1.1. It is also worth noting that different types of activation functions (other than polynomial) can be used in these network architectures, as long as they retain the equivalence with the Volterra class of models. Specifically, any complete set of analytic functions will retain this equivalence (e.g., exponential, trigonometric, polynomial, and combinations thereof). Of course, the key practical criterion in assessing suitability for a specific system is the resulting model compactness for a given level of prediction accuracy. The latter is determined by the combination of the selected filter bank and activation functions (in form and number). A promising approach to near-optimal selection of a network model for physiological systems is presented in Section 4.4.

The model compactness determines the required length of experimental data and the robustness of the obtained parameter estimates (i.e., estimation accuracy) for a given signal-to-noise ratio (SNR), both important practical considerations. In general, more free parameters and/or lower SNR in the data imply longer data-record requirements, a fact that critically impacts the experiment design and our fundamental ability to achieve our scientific goal of understanding the system function under the prevailing experimental/operational conditions. For instance, issues of nonstationarity of the experimental preparation may impose strict limits on the length of the experimental data that can be collected in a stationary context, with obvious implications for model parameter estimation. In addition, lower SNR in the data raises the specter of unintentional overfitting when the model is not properly constrained (i.e., having the minimum necessary number of free parameters). This risk is mitigated by the use of statistical criteria for model order selection, like the one presented in Section 2.3.1 for MDV models. Finally, it must be noted that the model compactness may affect our ability to properly interpret the results in a physiologically meaningful context.

It is conceivable that the introduction of more "hidden layers" in the network architecture (i.e., more layers of units with nonlinear activation functions) may lead to greater overall model compactness by allowing reduction in the number of hidden units in each hidden layer (so that the total number of parameters is reduced).
The introduction of additional hidden layers also relaxes the aforementioned requirement of separability expressed by Equation (2.210). The estimation/training procedure is minimally impacted by the presence of multiple hidden layers. The case of multiple hidden layers will be discussed in Sections 4.2 and 4.4 and may attain critical importance in connection with multiinput and spatiotemporal models (see Chapter 7).
2.4
ANALYSIS OF ESTIMATION ERRORS

In this section, we elaborate on the specific estimation errors associated with the use of each of the presented kernel estimation methods: (1) the cross-correlation technique (Section 2.4.2), (2) the direct inversion methods (Section 2.4.3), and (3) the iterative cost-minimization methods (Section 2.4.4). We begin with an overview of the various sources of estimation errors in the following section.
2.4.1
Sources of Estimation Errors
The main sources of estimation errors in nonparametric modeling can be classified into three categories:

1. Model specification errors, due to mismatch between the system characteristics and the specified model structure
2. Estimation method errors, due to imperfections of the selected estimation method for the given input and model structure
3. Noise/interference errors, due to the presence of ambient noise (including measurement errors) and systemic interference

In the advocated modeling approach that employs kernel expansions or equivalent network structures, the model specification errors arise from incorrect selection of the key structural parameters of the model. For instance, in the Laguerre expansion technique (LET), the key structural parameters are the number L of discrete-time Laguerre functions (DLFs), the order of nonlinearity Q, and the DLF parameter α. In the case of a Volterra-Wiener network (VWN), the key structural parameters are the number L of filters in the filter bank, the number H of hidden units, and the order of nonlinearity Q of the polynomial activation functions. Misspecification of these structural parameters will cause modeling errors whose severity will depend on the specific input signals and, of course, on the degree of misspecification. This type of error is very important, as it has broad ramifications both for the accuracy and for the interpretation of the obtained models. It can be avoided (or at least minimized) by the use of a statistical search procedure utilizing successive trials, where the parameter values are gradually increased until a certain statistical criterion on the incremental improvement of the output prediction (typically quantified by the sum of the squared residuals) is met. This procedure is described in detail in Section 2.3.1.

The errors associated with the selected estimation method also depend on the specific input signals as they relate to the selected model structure. In the case of LET, the direct inversion method is typically used, since the estimation problem is linear in the unknown parameters. The required matrix inversion (or pseudoinversion if the matrix is ill-conditioned) is subject to all the subtleties and pitfalls of this well-studied subject. In short, the quality of the result depends on the input data and the structural parameters of the model that determine the Gram matrix to be inverted. Generally, if the input is a broadband random signal and the model structure is adequate for the system at hand, then the results are expected to be very good, provided that the effects of noise/interference are not excessive (see the third type of error below). However, if the model structure is inadequate or the input signal is a simple waveform with limited information content (e.g., pulse, impulse, sinusoid, or narrowband random), then the resulting Gram matrix is likely to be ill-conditioned
and the results are expected to be poor unless a pseudoinverse of reduced rank is used (a minimal sketch is given after this paragraph). In the case of the VWN, the iterative cost-minimization method is typically used, since the estimation problem is nonlinear in some of the unknown parameters (i.e., the weights of the filter bank outputs). A variety of algorithms are available for this purpose, most of them based on gradient descent. Generally, the key issues with these algorithms are the rate of convergence and the search for the global minimum (avoidance of local minima), both of which depend on the model structure and the input characteristics. The proper definition of the minimized cost function depends on the prediction error statistics, which include the noise/interference contaminating the data. Although a quadratic cost function is commonly used, the log-likelihood function of the model residuals can define the cost function in order to achieve estimates with minimum variance. Many important details concerning the application of these algorithms are discussed in Sections 4.2 and 4.3, or can be found in the extensive literature on this subject, which has received a lot of attention in recent years in the application context of artificial neural networks (see, for instance, Haykin, 1994 and Hassoun, 1995).
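The reduced-rank pseudoinverse mentioned above can be realized with a truncated singular-value decomposition; the relative tolerance below is an assumed, illustrative choice:

```python
import numpy as np

def reduced_rank_lstsq(V, y, rel_tol=1e-6):
    """Solve V @ theta ~ y with a truncated-SVD (reduced-rank) pseudoinverse,
    discarding singular values below rel_tol times the largest one."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    keep = s > rel_tol * s[0]            # s is sorted in decreasing order
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv * (U.T @ y))

# np.linalg.pinv(V, rcond=rel_tol) @ y is an equivalent library shortcut.
```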
Note that, unlike man-made systems in which high-frequency noise is often dominant, physiological systems are often burdened with low-frequency noise/interference that has traditionally been given less attention. The modeling errors that result from possible random fluctuations in system characteristics are viewed as being part of the systemic noise, since the scope of the advocated methodology is limited to deterministic models. Such stochastic system variations can be studied statistically on the basis of the resulting model residuals.
The effects of noise and interference are intrinsic to the experimental preparation and do not depend on the analysis procedure. However, the specification and estimation errors depend on the selected modeling and estimation methodology. The issue of model specification errors runs through the book as a central issue best addressed in the nonparametric context (minimum prior assumptions about the model structure). A major thrust of this book is to emphasize this point and develop the synergistic aspects with other modeling approaches (parametric, modular, and connectionist). The model specification and estimation errors are discussed in the following sections for the various nonparametric methods presented in this book: the cross-correlation technique, the direct-inversion methods, and the iterative cost-minimization methods.
2.4.2
Estimation Errors Associated with the Cross-Correlation Technique
In this section, we concentrate on the estimation errors associated with the use of the cross-correlation technique for the estimation of the Wiener or CSRS kernels of a nonlinear system (see Sections 2.2.3 and 2.2.4). Although the introduction of the cross-correlation technique is theoretically connected to GWN inputs, it has been extended (by practical necessity) to the case of quasiwhite (CSRS or PRS) inputs used in actual applications, as discussed in Section 2.2.4. Since we are interested in studying the estimation errors during actual applications of the cross-correlation technique, we focus here on the broad class of CSRS quasiwhite inputs (which includes band-limited GWN as a special case) and the discrete-time formulation of the problem, as relevant to practice. Detailed analysis was first provided in Marmarelis and Marmarelis (1978). The estimation of the rth-order CSRS kernel g_r requires the computation of the rth-order cross-correlation between sampled input-output data of finite record length N and its subsequent scaling as
"
c, ~
gr(mJ, ... ,m r) = - L..,y(n)x(n - ml) ... x(n - m r) N n=l
(2.211)
where C_r is the scaling constant, x(n) is the discretized CSRS input, and y(n) is the discretized output, used here instead of the rth-order output residual (as properly defined in Section 2.2.4) to facilitate the analytical derivations; i.e., the analysis applies to the nondiagonal points of the kernel argument space. It is evident that the estimation error is more severe at the diagonal points, because of the presence of low-order terms in the CSRS orthogonal functionals; however, the number of diagonal points in a kernel is much smaller than the number of nondiagonal points. Thus, the overall kernel estimation error will be determined primarily by the cumulative errors at the nondiagonal points. We note that the study of the errors at diagonal points of the kernels makes use of the higher moments of the input CSRS, thus complicating considerably the analytical derivations. The important issues related to CSRS kernel estimation at the diagonal points are discussed in Section 2.2.4.

The estimation errors for the rth-order CSRS kernel estimate \hat{g}_r given by Equation (2.211) depend on the step size Δt of the particular quasiwhite CSRS input x(n) and on the record length N. Note that Δt may not be equal to the sampling interval T (T ≤ Δt), and that the initial portion of the actual experimental data record (equal to the extent of the computed kernel memory) is not included in the summation interval of Equation (2.211) in order to prevent estimation bias resulting from null input values.
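A direct, if computationally naive, transcription of Equation (2.211) is sketched below, excluding the initial portion of the record as just described; the scaling constant C_r is passed in as an argument [see Equation (2.221)]:

```python
import numpy as np
from itertools import product

def csrs_kernel_estimate(x, y, r, memory, C_r):
    """rth-order CSRS kernel estimate by scaled cross-correlation, per
    Equation (2.211), over all lag combinations (m_1, ..., m_r) < memory.
    The first `memory` samples are excluded from the time average to avoid
    bias from the null initial input values."""
    N = len(x)
    g = np.zeros((memory,) * r)
    for lags in product(range(memory), repeat=r):
        terms = y[memory:N].copy()
        for m_i in lags:
            terms *= x[memory - m_i:N - m_i]   # x(n - m_i) aligned with y(n)
        g[lags] = C_r * terms.mean()
    return g
```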
The use of Equation (2.211) for the estimation of the CSRS kernels results in two main types of errors, which are due to the finite record length and the finite input bandwidth. These two limitations of finite record and bandwidth are present in any actual application of the method and, consequently, the study of their effect on the accuracy of the obtained kernel estimates becomes of paramount importance. In addition to the aforementioned sources of estimation error, there are errors due to the finite rise-time of input transducers, which along with the discretization and digitization errors can be made negligible in practice [Marmarelis & Marmarelis, 1978]. Of particular importance is the sampling rate, which must be higher than the Nyquist frequency of the input-output signals in order to alleviate the aliasing problem (T ≤ Δt) and must exceed twice the system bandwidth B_s (T ≤ (2B_s)^{-1}). Also, a digitization length of at least 12 bits is needed to provide numerical accuracy sufficient for most applications.

We concentrate on the two main types of errors caused by the finite record length and bandwidth of the CSRS family of quasiwhite test signals [Marmarelis, 1975, 1977, 1979]. The relatively simple statistical structure of the CSRS facilitates the study of their autocorrelation properties (as discussed in Section 2.2.4), which are essential in the analysis of the kernel estimation errors. Substituting y(n) in terms of its Volterra expansion in Equation (2.211), we obtain the following expression for the nondiagonal points of the rth-order CSRS kernel estimate:
\hat{g}_r(m_1, \ldots, m_r) = C_r \sum_{i=0}^{\infty} \sum_{\sigma_1} \cdots \sum_{\sigma_i} k_i(\sigma_1, \ldots, \sigma_i)\, \hat{\phi}_{r+i}(\sigma_1, \ldots, \sigma_i, m_1, \ldots, m_r)
(2.212)
where the factor P has been incorporated in the discretized Volterra kernel k_i, and
\hat{\phi}_{r+i}(\sigma_1, \ldots, \sigma_i, m_1, \ldots, m_r) = \frac{1}{N} \sum_{n=1}^{N} x(n - \sigma_1) \cdots x(n - \sigma_i)\, x(n - m_1) \cdots x(n - m_r)
(2.213)
is the estimate of the (r + i)th-order autocorrelation function of the CSRS input x(n). Clearly, the error analysis for the obtained CSRS kernel estimate \hat{g}_r necessitates the study of the statistical properties of the autocorrelation function estimate \hat{\phi}_{r+i}, as they affect the outcome of the summation in Equation (2.212). The estimation error in \hat{g}_r consists of two parts: the bias and the variance.
Estimation Bias. The bias in the CSRS kernel estimate is due to the finite bandwidth of the quasiwhite input (i.e., the finite step size Δt). The possible deviation of the CSRS input amplitude distribution from Gaussian may also cause a bias (with respect to the Wiener kernels), but it is not actually an error, since it represents the legitimate difference between the Wiener kernels and the CSRS kernels of the system. Therefore, this type of bias will not be included in the error analysis; it was studied separately in Section 2.2.4. Note that the difference between Wiener and CSRS kernels at the nondiagonal points vanishes as Δt tends to zero [see, for instance, Equations (2.128) and (2.134)]. The estimation bias due to the finite input bandwidth amounts to an attenuation of high frequencies in the kernel estimate. The extent of this attenuation can be studied in the time domain by considering the 2rth-order autocorrelation function φ_{2r} of a CSRS at the nondiagonal points (m_1, \ldots, m_r), which was found to be [Marmarelis, 1977]
\phi_{2r}(\sigma_1, \ldots, \sigma_r, m_1, \ldots, m_r) =
\begin{cases}
M_2^r \prod_{i=1}^{r} \big( 1 - |m_i - \sigma_i| / \Delta t \big) & \text{for } |m_i - \sigma_i| \le \Delta t,\ i = 1, \ldots, r \\
0 & \text{elsewhere}
\end{cases}
(2.214)
Assuming that the rth-order Volterra kernel k_r is analytic in the neighborhood of the nondiagonal point (m_1, \ldots, m_r), we can expand it in a multivariate Taylor series about that point. With the use of Equation (2.214), we can obtain
E[\hat{g}_r(m_1, \ldots, m_r)] = k_r(m_1, \ldots, m_r) + \sum_{l=1}^{\infty} \Delta t^{2l} \sum_{j_1, \ldots, j_l = 1}^{r} D_l(j_1, \ldots, j_l)\, k_r^{(2l)}(m_1, \ldots, m_r)
(2.215)
where k_r^{(2l)}(m_1, \ldots, m_r) denotes the 2lth-order partial derivative of k_r (taken twice with respect to each of the arguments indexed by j_1, \ldots, j_l), evaluated at the discrete point (m_1, \ldots, m_r), and D_l(j_1, \ldots, j_l) depends on the multiplicity of the sets of indices (j_1, \ldots, j_l). More specifically, if a certain combination (j_1, \ldots, j_l) consists of I distinct groups of identical indices, and p_i is the population number (multiplicity) of the ith group, then
D_l(j_1, \ldots, j_l) = \prod_{i=1}^{I} \frac{1}{(2p_i)!\, (p_i + 1)(2p_i + 1)}
(2.216)
In all practical situations, Δt is much smaller than unity, which allows us to obtain a simpler approximate expression for the expected value of the kernel estimate, given by
E[\hat{g}_r(m_1, \ldots, m_r)] \cong k_r(m_1, \ldots, m_r) + \frac{\Delta t^2}{12} \sum_{j=1}^{r} \left. \frac{\partial^2 k_r(\tau_1, \ldots, \tau_r)}{\partial \tau_j^2} \right|_{\tau_j = m_j}
(2.217)
Therefore, if higher-order Volterra functionals are present in the system (note that the lower-order ones will vanish for nondiagonal points), then we can show that, in first approximation, they give rise to terms of the form [Marmarelis, 1979]
E[\hat{g}_r(m_1, \ldots, m_r)] \cong \sum_{l=0}^{\infty} \frac{(r+2l)!}{r!\, l!\, 2^l} (M_2 \Delta t)^l\, \Delta t^l \sum_{m_{r+1}} \cdots \sum_{m_{r+l}} \Big\{ k_{r+2l}(m_1, \ldots, m_r, m_{r+1}, m_{r+1}, \ldots, m_{r+l}, m_{r+l}) + \frac{\Delta t^2}{12} \sum_{j=1}^{r} \left. \frac{\partial^2 k_{r+2l}(\tau_1, \ldots, \tau_r, \tau_{r+1}, \tau_{r+1}, \ldots, \tau_{r+l}, \tau_{r+l})}{\partial \tau_j^2} \right|_{\tau_j = m_j} \Big\}
(2.218)
where (r + 2l) represents the order of each existing higher-order Volterra kernel. We conclude that, for small Δt, the bias of the rth-order CSRS kernel estimate at the nondiagonal points can be approximately expressed as
E[\hat{g}_r(m_1, \ldots, m_r)] - g_r(m_1, \ldots, m_r) \cong A_r(m_1, \ldots, m_r)\, \Delta t^2
(2.219)
The function A_r(m_1, \ldots, m_r) depends on the second partial derivatives of the Volterra kernels of the same or higher order (of the same parity), as indicated by Equations (2.217) and (2.218).
We must note that the validity of the derived expressions is limited to the analytic regions of the kernel (a point of only theoretical interest).
Estimation Variance. The variance of the CSRS kernel estimate \hat{g}_r is due to the statistical variation of the estimates of the input autocorrelation functions for finite data records. This variance diminishes as N increases, but it remains significant for values of N used in practice. Thus, if we consider the random deviation γ_i of the ith-order autocorrelation function estimate, then the variance of the rth-order kernel estimate at the nondiagonal points is equal to the second moment of the random quantity
\varepsilon_r(m_1, \ldots, m_r) = C_r \sum_{i=0}^{\infty} \int_0^{\infty} \cdots \int_0^{\infty} k_i(\sigma_1, \ldots, \sigma_i)\, \gamma_{r+i}(\sigma_1, \ldots, \sigma_i, m_1, \ldots, m_r)\, d\sigma_1 \cdots d\sigma_i = \sum_{i=0}^{\infty} U_{r,i}(m_1, \ldots, m_r)
(2.220)
where the scaling factor C_r for nondiagonal points is [Marmarelis, 1977]
C_r = \frac{1}{r!\, P^r}
(2.221)
Each of the random functions U_{r,i} can be evaluated by utilizing the basic properties of the random deviation functions γ_{r+i}, namely, that γ_{r+i} is of first degree (i.e., piecewise linear) with respect to each of its arguments and that its values at the nodal points (i.e., the points where each argument is a multiple of Δt) determine uniquely its values over the whole argument space through multilinear interpolation [Marmarelis, 1975, 1977; Marmarelis & Marmarelis, 1978]. Furthermore, the probability density function of the γ_{r+i} values at the nodal points will tend to the zero-mean Gaussian due to the central limit theorem and the statistical properties of the CSRS input. If we expand the kernel k_i into a multivariate Taylor series about the nodal points and evaluate the integral over each elementary segment of the space within which γ_{r+i} remains analytic (and piecewise multilinear), then the variance of each random quantity U_{r,i} (for small Δt) is approximately [cf. Marmarelis, 1979]
\mathrm{var}[U_{r,i}(m_1, \ldots, m_r)] \cong \frac{P^i}{N\, \Delta t^r}\, u_{r,i}(m_1, \ldots, m_r)
(2.222)
where P = M_2 Δt is the power level of the input CSRS, and u_{r,i} is a square-integral quantity dependent on the ith-order Volterra kernel of the system. Combining Equations (2.222) and (2.220), we find that the variance of the rth-order CSRS kernel estimate at the nondiagonal points is approximately
\mathrm{var}[\hat{g}_r(m_1, \ldots, m_r)] \cong \frac{B_r^2(P;\, m_1, \ldots, m_r)}{N\, \Delta t^r}
(2.223)
where
B_r^2(P;\, m_1, \ldots, m_r) = \sum_{i=0}^{\infty} P^i\, u_{r,i}(m_1, \ldots, m_r)
(2.224)
depends on the Volterra kernels of the system and the input power level (for details, see Marmarelis, 1979).
Optimization of Input Parameters. The combined error, due to the estimation variance and bias, can be minimized by judicious choice of the CSRS input parameters Δt and N. This optimization procedure is based on Equations (2.219) and (2.223); therefore, it is valid for very small values of Δt, a condition that is always satisfied in actual applications. We seek to minimize the total mean-square error Q_r of the rth-order CSRS kernel estimate, which can be found by summation of A_r^2 and B_r^2 over the effective memory extent of the kernel:
Q_r = \alpha_r\, \Delta t^4 + \frac{\beta_r(P)}{N\, \Delta t^r}
(2.225)
where a, and ßr are the summations of A;' and B;', respectively, over (mb . . . , m r) that cover the entire kernel memory. Note that Qr decreases monotonically with increasing N but, in practice, limitations are imposed on N by experimental or computational considerations. Therefore, the optimization task consists of selecting Ilt as to minimize Qr for a given value of N (set at the maximum possible for a given application). The function Qr has always a single minimum with respect to Ilt, because a; and ßr are positive constants. The position of this minimum defines the optimum li.t for each order r and is given by (!itopt)r = [~ ßr(P) ]lICr+4) 4N a,
(2.226)
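For illustration, a minimal Python sketch of this selection rule is given below; the values of α_r, β_r(P), and N are invented placeholders, since in practice these constants must be estimated through preliminary tests, as described next:

```python
def optimal_step(alpha_r, beta_r, N, r):
    """Minimizer of Q_r(dt) = alpha_r*dt**4 + beta_r/(N*dt**r); cf. Eq. (2.226)."""
    return (r * beta_r / (4.0 * N * alpha_r)) ** (1.0 / (r + 4))

N = 2000  # record length in CSRS steps (assumed experimental limit)
for r, alpha_r, beta_r in [(1, 1.0, 0.5), (2, 3.0, 8.0)]:   # hypothetical constants
    dt_opt = optimal_step(alpha_r, beta_r, N, r)
    Q_min = alpha_r * dt_opt**4 + beta_r / (N * dt_opt**r)
    print(f"order {r}: dt_opt = {dt_opt:.3f} sec, Q_r(dt_opt) = {Q_min:.2e}")
```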
Consideration must be given to the fact that the optimum Δt depends implicitly, through β_r, on the power level P, which is proportional to Δt. Therefore, the optimization of Δt for a fixed P necessitates the adjustment of the second moment of the CSRS input signal as Δt changes (since P = M₂Δt) by dividing its amplitude by √Δt. Note also that Equation (2.226) gives a different optimum Δt for each order of kernel. The determination of each optimum Δt requires knowledge of the corresponding constants α_r and β_r, which depend on the (unknown) Volterra kernels of the system. Therefore, these constants must be estimated through properly designed preliminary tests, on the basis of the analysis given above. For instance, if we obtain several rth-order kernel estimates for various values of Δt while keeping the input power level constant and varying the record length so as to keep the second term of Equation (2.225) constant, then α_r can be estimated through a least-squares regression procedure. A regression procedure can also be used to estimate β_r by varying the input record length while keeping Δt and P constant. These procedures have been illustrated with computer-simulated examples [Marmarelis, 1977] and elaborated extensively in the first monograph by the Marmarelis brothers [Marmarelis & Marmarelis, 1978]. We illustrate below the findings of the presented analysis through computer-simulated examples and demonstrate the proposed optimization method in an actual application. The dependence of the estimation bias upon the step size Δt of the CSRS input [cf. Equation (2.219)] is illustrated in Figure 2.37, where the kernel estimates obtained for three different step sizes are shown. The bias of those estimates at each point is approximately
Figure 2.37 Dependence of first-order CSRS kernel estimation bias on CSRS step size (top) for the simulated example defined by the first-order Volterra kernel shown as curve D in the bottom panel, along with the CSRS kernel estimates (bottom) for Δt equal to 0.1 (A), 0.2 (B), and 0.4 (C) [Marmarelis & Marmarelis, 1978].
proportional to the second derivative of the kernel and the square of the respective step size, confirming Equation (2.219). The dependence of the estimation variance upon the record length [cf. Equation (2.223)] is illustrated in Figure 2.38, where the first- and second-order CSRS kernel estimates, obtained for three different record lengths, are shown. The step size Δt of the CSRS input in all three cases is 0.1 sec. The normalized mean-square errors of those estimates (as percentages of the respective summed squared kernels) are given in Table 2.2. These numerical values confirm, within some statistical deviation, the theoretical relation given by Equation (2.223). The dependence of the total estimation error (bias and variance) upon the step size Δt of the CSRS input [see Equation (2.225)] is illustrated in Figure 2.39, where the second-order CSRS kernel estimates obtained for three different step sizes are shown. The data-record length in all three cases is 500 CSRS steps (i.e., N = 500). The statistical error (estimation variance) for multiple kernel estimates of first and second order is plotted in Figure 2.40 for various step sizes. Note that the same data-record length is used for these estimates and the input power level is kept constant while varying Δt by adjusting the variance of the CSRS input signal. The total mean-square errors of several kernel estimates, obtained for various CSRS step sizes and record lengths, are shown in Figure 2.41, along with the curves representing the theoretical relation described by Equation (2.225).
Table 2.2 Normalized Mean-Square Errors of First- and Second-Order CSRS Kernel Estimates for Various Record Lengths

    Record length        First order    Second order
    500 CSRS steps       0.834          23.74
    1000 CSRS steps      0.488          12.22
    2000 CSRS steps      0.211           6.82
Figure 2.38 First- and second-order CSRS kernel estimates for various data-record lengths: 500 (A), 1000 (B), and 2000 (C) data points. The exact kernels are shown in (D) [Marmarelis & Marmarelis, 1978].
Figure 2.39 Second-order CSRS kernel estimates for various step sizes of the CSRS input (Δt = 0.15, 0.45, and 0.60 sec). The exact kernel is shown in Figure 2.38 (D) [Marmarelis & Marmarelis, 1978].
Noise Effects. The effect of noise in the data is examined in the context of the cross-correlation technique by considering first the most common case of output-additive noise:

$$\tilde{y}(n) = y(n) + \varepsilon(n) \qquad (2.227)$$
then the rth-order CSRS kernel estimate obtained via cross-correlation becomes

$$\tilde{g}_r(m_1, \ldots, m_r) = \hat{g}_r(m_1, \ldots, m_r) + \frac{c_r}{N} \sum_{n} \varepsilon(n)\, x(n - m_1) \cdots x(n - m_r) \qquad (2.228)$$
The last term of Equation (2.228) represents the effect of output-additive noise on the CSRS kernel estimate and will diminish as the record length N increases, because the noise ε(n) is assumed to have zero mean and finite variance.
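This attenuation can be checked numerically; the following minimal sketch evaluates the last term of Equation (2.228) for r = 1 at a single nondiagonal lag, with an invented binary quasiwhite input and Gaussian output noise, and shows it shrinking (roughly as 1/√N) as the record lengthens:

```python
import numpy as np

# Numerical check of the noise term of Eq. (2.228): (1/N) * sum_n e(n) x(n - m)
# for zero-mean output-additive noise e(n); input and noise level are invented.
rng = np.random.default_rng(0)
m = 5                                    # an arbitrary nondiagonal lag
for N in (500, 2000, 8000, 32000):
    x = rng.choice([-1.0, 1.0], size=N)  # binary quasiwhite test input
    e = rng.normal(0.0, 1.0, size=N)     # zero-mean output-additive noise
    term = np.mean(e[m:] * x[:-m])       # average of e(n) x(n - m)
    print(f"N = {N:6d}: noise term = {term:+.5f}")
```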
Figure 2.40 The statistical error (estimation variance) of first- and second-order CSRS kernel estimates for various CSRS step sizes (mean and standard deviation bars) [Marmarelis & Marmarelis, 1978].
In the case of input-additive noise, the effect is more complicated, since substitution of x̃ = x + δ into the cross-correlation formula (2.211) for CSRS kernel estimation gives rise to many multiplicative terms involving the noise term and the input signal. Furthermore, some of the input-additive noise (if not simply measurement error) may propagate through the system and give rise to noise-generated deviations in the output signal. This further complicates matters and causes multiplicative terms that may not have zero mean (e.g., products of correlated noise values), whose effect will not diminish as fast with increasing N. Fortunately, in practice, input-additive noise is not significant in most cases, and care must be taken to ensure that it remains minimized by paying proper attention to the way the input is applied to the system (e.g., careful design of D-to-A devices and transducers). The effect of systemic noise or interference (i.e., random variations of the system characteristics due to interfering factors) is potentially the most detrimental and the most difficult to alleviate in practice, since its effect on the output signal is generally dependent on the input and may result in significant estimation errors when the cross-correlation technique is used for kernel estimation. Nonetheless, if the systemic noise/interference has zero mean, then sufficiently long data records may alleviate its effect considerably. Careful attention must be given to the nature of the systemic noise/interference, and experimental care must be taken to minimize its effects.
Figure 2.41 The total mean-square error of CSRS kernel estimates for various CSRS step sizes Δt and record lengths T: Q₀ (zeroth order), Q₁ (first order), Q₂ (second order), Q_tot (all orders together) [Marmarelis & Marmarelis, 1978].
Erroneous Scaling of Kernel Estimates. In the cross-correlation method, we obtain the rth-order CSRS kernel estimate by scaling the rth-order cross-correlation function estimate with the appropriate factors that depend on the even moments of the CSRS, its step size Δt, and the location of the estimated kernel point (i.e., the multiplicity of indices for diagonal points). The problem encountered in practice with regard to the accurate scaling of the CSRS kernel estimates derives from the fact that the actual input signal that stimulates the system under study deviates somewhat from the exact theoretical CSRS waveform intended to be delivered as the test input to the system. This deviation is usually caused by the finite response time of the experimental transducers, which convert the string of numbers generated by the digital computer into an analog physical signal that stimulates the system under study. This does not affect the quasiwhite autocorrelation properties of the test input signal, but it causes some changes in the values of the even moments (which define the scaling factors) from the ones that are theoretically anticipated. If the measurement of the actual even moments of the applied input signal is not possible, then a final correctional procedure can be used that is based on a least-squares fit between the actual system output and the estimated contributions of the estimated CSRS orthogonal functionals, in order to determine the accurate scaling of the CSRS kernel estimates [Marmarelis & Marmarelis, 1978].
2.4.3 Estimation Errors Associated with Direct Inversion Methods
The direct inversion methods for Volterra kernel estimation employ the modified discrete Volterra (MDV) model of Equation (2.180) in its matrix formulation of Equation (2.183). Although pseudoinversion may be required when the matrix V is not full rank or simply ill-conditioned (as discussed in Section 2.3.1), the error analysis below is performed for the case when the matrix V is full rank and, equivalently, the Gram matrix G = [V'V] is nonsingular. In this case, we can define the "projection" matrix

$$H_r = I - V_r [V_r' V_r]^{-1} V_r' \qquad (2.229)$$
for an MDV model of order r that corresponds to structural parameter values of L basis functions for the expansion of the Volterra kernels of highest order Q. The matrix H_r is idempotent and has rank (N − P_r), where N is the number of output data points and P_r = (Q + L)!/(Q!L!) is the total number of free parameters in the MDV model (i.e., the total number of kernel expansion coefficients that need be estimated). In Section 2.3.1, we showed that the residual vector for model order r is given by

$$\varepsilon_r = H_r y = u_r + w_r \qquad (2.230)$$
where u_r is input-dependent (the unexplained part of the system output that vanishes for the correct model order), and w_r is the transformation of the input-independent noise w_0 (which is added to the output data and assumed to be white and Gaussian) by the "projection" matrix:

$$w_r = H_r w_0 \qquad (2.231)$$
Note that the matrix H_r depends on the input data and on the MDV model structure of order r. Therefore, even if w_0 is white, the residuals for r ≠ 0 are not generally white. Furthermore, we note that the residuals for an incomplete model order r contain a term

$$u_r = H_r u_0 \qquad (2.232)$$

that represents the model "truncation error" of order r (because it is the part of the noise-free output that is not captured by the model of order r), where u_0 represents the noise-free output data. Thus, for an MDV model of order r, the "model specification" error is represented by u_r and vanishes for the true system order R, i.e., u_R = 0. The size of this error in the estimated expansion coefficients for a truncated model (r < R) is given by
$$\theta_r \triangleq E[\hat{c}_r] - c_r = A_r A_R^{+} c_R - c_r \qquad (2.233)$$
where c_r and c_R denote the kernel expansion coefficients for the model orders r and R, respectively (R is the true system order), A_R⁺ denotes the generalized inverse of the [P_R × N] matrix A_R, and A_r is the input-dependent matrix that yields the coefficient vector estimate for model order r:
$$\hat{c}_r = A_r y \qquad (2.234)$$

where

$$A_r = [V_r' V_r]^{-1} V_r' \qquad (2.235)$$
Note that the coefficient vector estimate is composed of two parts:

$$\hat{c}_r = A_r u_0 + A_r w_0 \qquad (2.236)$$
where the first term represents the deterministic bias of the estimate ĉ_r (due to the model truncation) and the second term represents the random variation in the estimate due to the output-additive noise w_0 (statistical error), since

$$E[\hat{c}_r] = A_r u_0 = A_r A_R^{+} c_R \qquad (2.237)$$
which yields the result of Equation (2.233), quantifying the estimation bias due to the truncation of the model order. It is evident from Equation (2.233) that the model specification error depends on the model structure and the input data. The statistical estimation error A_r w_0 has zero mean and covariance matrix

$$\mathrm{cov}[\hat{c}_r] \triangleq E[A_r w_0 w_0' A_r'] = \sigma_0^2\, A_r A_r' = \sigma_0^2\, [V_r' V_r]^{-1} \qquad (2.238)$$
when the output-additive noise is white with variance σ₀² (note that the SNR is u_0'u_0/(Nσ₀²)). Thus, the coefficient estimates are correlated with each other, and their covariance depends on the input data and the structure of the model. The estimation variance will be minimum for Gaussian noise (for given input data). The statistics of the estimation error are multivariate Gaussian if w_0 is Gaussian, or will tend to multivariate Gaussian even if w_0 is not Gaussian (but reasonably concentrated) because of the Central Limit Theorem. When the output-additive noise w_0 is not white but has some covariance matrix S, then we can perform prewhitening of the output data in a manner similar to the one described earlier by Equations (2.47)-(2.48). This results in alteration of the fundamental matrix A_r as

$$A_r = [V_r' S^{-1} V_r]^{-1} V_r' S^{-1} \qquad (2.239)$$
which determines the estimation bias in Equation (2.233) and the estimation variance in Equation (2.238). Thus, the estimation bias and variance of the kernel expansions are affected by the covariance matrix of the output-additive noise when the latter is not white. The case of non-Gaussian output-additive noise is discussed in the following section in connection with iterative cost-minimization methods, since the estimation problem becomes nonlinear in the unknown expansion coefficients when nonquadratic cost functions are used.
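For concreteness, the following minimal sketch (with a made-up full-rank regression matrix standing in for V_r) evaluates the quantities defined above: the estimator matrix A_r of Equation (2.235), the projection matrix H_r of Equation (2.229), and the coefficient covariance of Equation (2.238):

```python
import numpy as np

# Sketch of the direct-inversion quantities; V, sigma0, and all sizes are invented.
rng = np.random.default_rng(1)
N, P_r = 200, 6                          # data points and free parameters (illustrative)
V = rng.normal(size=(N, P_r))            # stands in for the MDV regression matrix V_r
G_inv = np.linalg.inv(V.T @ V)           # inverse Gram matrix [V'V]^{-1}
A = G_inv @ V.T                          # A_r = [V'V]^{-1} V', Eq. (2.235)
H = np.eye(N) - V @ A                    # H_r = I - V[V'V]^{-1}V', Eq. (2.229)
sigma0 = 0.1                             # assumed std of the white output noise w0
cov_c = sigma0**2 * G_inv                # coefficient covariance, Eq. (2.238)
print(np.allclose(H @ H, H), round(np.trace(H)))  # idempotent; trace = rank = N - P_r
```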
2.4.4 Estimation Errors Associated with Iterative Cost-Minimization Methods

These iterative estimation methods are typically used in connection with the network models introduced in Section 2.3.3 and elaborated further in Sections 4.2-4.4, because the estimation problem becomes nonlinear with respect to the unknown network parameters in those cases. The estimation problem also becomes nonlinear when nonquadratic cost functions are used (see Section 2.1.5), even if the model remains linear with respect to the unknown parameters. It is also plausible that these methods can be used for practical advantage in certain cases with a quadratic cost function in which the model is linear with respect to the unknown parameters (e.g., for a very large and/or ill-conditioned Gram matrix), resulting in the "iterative least-squares" method [Goodwin & Sin, 1984; Ljung, 1987; Haykin, 1994]. The model specification error for these methods relates to the selection of the structural parameters (e.g., the number L of basis filters, the number H of hidden units, the nonlinear order Q). This error can be minimized by use of a statistical selection criterion such as the one described by Equation (2.200) in Section 2.3.1 in connection with the modified discrete Volterra (MDV) model. However, the size of the resulting specification error depends on the particular iterative method used. The same is true for the parameter estimation error, which depends on the ability of the selected iterative method to converge to the global minimum of the cost function (for the specified model). Since there is a great variety of such iterative methods (from gradient-based to random-search or genetic algorithms), a comprehensive error analysis is not possible within the bounds of this section. The reader can find ample information on this subject in the extensive literature on the training of "artificial neural networks" that have recently come into vogue, or on general minimization methods dating all the way back to the days of Newton [Eykhoff, 1974; Haykin, 1994; Hassoun, 1995]. For the modest objectives of this section, we limit our discussion to simple gradient-based methods in connection with the MDV model [see Equations (2.180) and (2.205)]. Generally speaking, the efficacy of all these iterative cost-minimization methods relies on the ability of the employed algorithm to find the global minimum (thus avoiding entrapment in possible local minima) within a reasonable number of iterations (fast convergence). Clearly, this task depends on the "morphology of the surface defined by the cost function" in the parameter space (to use a geometric notion). If this surface is convex with a single minimum, the task is trivial. However, this is seldom the case in practice, where the task is further complicated by the fact that the "surface morphology" changes at each iteration. In addition, the presence of noise introduces additional "wrinkles" onto this surface. The presence of multiple local minima (related to noise or not) makes the choice of the initialization point rather critical. Therefore, this task is formidable and remains an open challenge in its general formulation (although considerable progress has been made in many cases with the introduction of clever algorithms for specific classes of problems). As an example, we consider the case of the MDV model of Equation (2.180) and the gradient-descent method of Equation (2.205) with a quadratic cost function F(ε) yielding Equation (2.208).
Then, the "update rule" of the iterative procedure for the expansion coefficient c_r(j₁, ..., j_r) is

$$\hat{c}_r^{(i+1)}(j_1, \ldots, j_r) = \hat{c}_r^{(i)}(j_1, \ldots, j_r) + 2\nu\, e^{(i)}(n)\, v_{j_1}^{(i)}(n) \cdots v_{j_r}^{(i)}(n) \qquad (2.240)$$
where i is the iteration index and v_j^{(i)}(n) denotes the ith-iteration estimate of v_j(n). It is evident from Equation (2.240) that the method utilizes an estimate of the gradient of the cost-function surface at each iteration. This estimate can be rather poor and further degraded by the presence of output-additive noise [i.e., a term could be added at the end of Equation (2.240) representing the gradient of the input-dependent residual w_r in Equation (2.231)]. Therefore, the reader should be disabused of any simplistic notions of reliable gradient evaluation of the putative cost-function surface or of the certitude of smooth convergence in general. It is evident that, even in this relatively simple case where the model is linear with respect to the unknown parameters and the cost function is quadratic, the convergence of the gradient-descent algorithm raises a host of potential pitfalls. Nonetheless, experience has shown that gradient-descent methods perform (on average) at least as well as any existing alternative search methods, which have their own set of potential pitfalls. It should also be noted that the three types of error defined in the introduction of this section (model specification, model estimation, and noise-related) are intertwined in their effects in the iterative cost-minimization approach. For instance, the use of the proper model order does not guarantee elimination of the model specification error (since entrapment in a local minimum is still possible), nor does the use of very long data records guarantee diminution of the model estimation error (no claim of "consistent estimation" can be made). Likewise, it is not possible to quantify separately the effects of noise based on a measure of signal-to-noise ratio in the data, as these effects depend on the unpredictable distortion of the cost-function surface in the neighborhood of the global minimum. For all these reasons, reliable quantitative analysis of estimation errors is not generally possible for the iterative cost-minimization methods.
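To make the update rule concrete, the following sketch implements Equation (2.240) for first-order (r = 1) coefficients, i.e., an LMS-type iteration on synthetic data; the "basis-filter" outputs v_j(n), the true coefficients, and all numerical settings are stand-ins rather than the MDV model of the text:

```python
import numpy as np

# LMS-type instance of the update rule (2.240): c <- c + 2*nu*e(n)*v_j(n).
rng = np.random.default_rng(2)
N, L = 1000, 4
v = rng.normal(size=(N, L))                  # stand-in outputs v_j(n), j = 1..L
c_true = np.array([0.5, -1.0, 0.3, 0.8])     # invented true coefficients
y = v @ c_true + 0.05 * rng.normal(size=N)   # noisy output record
c = np.zeros(L)                              # initial coefficient estimates
nu = 0.01                                    # learning step (small for stability)
for n in range(N):                           # one pass through the data record
    e_n = y[n] - v[n] @ c                    # prediction error e(n)
    c += 2.0 * nu * e_n * v[n]               # Eq. (2.240), all j updated at once
print(np.round(c, 2))                        # approaches c_true
```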
HISTORICAL NOTE #2: VITO VOLTERRA AND NORBERT WIENER

The field of mathematical modeling of nonlinear dynamic systems was founded on the pioneering ideas of the Italian mathematician Vito Volterra and his American counterpart Norbert Wiener, the "father of cybernetics." Vito Volterra was born at Ancona, Italy in 1860 and was raised in Florence. His strong interest in mathematics and physics convinced his reluctant middle-class family to let him attend university, and he received a Doctorate in Physics from the University of Pisa in 1882. He rose quickly in mathematical prominence to become the youngest Professor of the University of Pisa in 1883, at the age of 23. In 1890, he was invited to assume the Chair of Mathematical Physics at the University of Rome, reaching the highest professional status in Italian academe. In his many contributions to mathematical physics and nonlinear mechanics, Volterra emphasized a method of analysis that he described as "passing from the finite to the infinite." This allowed him to extend the mathematical body of knowledge on vector spaces to the study of functional spaces. In doing so, he created a general theory of functionals (a function of a function) as early as 1883 and applied it to a class of problems that he termed "the inversion of definite integrals." This allowed the solution of long-standing problems in nonlinear mechanics (e.g., hereditary elasticity and hysteresis) and other fields of mathematical physics in terms of integral and integrodifferential equations. In his seminal monograph "Theory of Functionals and of Integral and Integro-differential Equations," Volterra discusses the conceptual and mathematical extension of the Taylor series
expansion to the theory of analytic functionals, resulting in what we now call the Volterra series expansion, which is at the core of our subject matter. The introduction of the Volterra series as a general explicit representation of a functional (output) in terms of a causally related arbitrary function (input) occupies only one page in Volterra's seminal monograph (p. 21 in the Dover edition of 1930), and the definition of the homogeneous Volterra functionals occupies only two pages (pp. 19-20). Nonetheless, the impact of these seminal ideas on the development of methods for nonlinear system modeling from input-output data has been immense, starting with the pivotal influence on Wiener's ideas on this subject (as discussed below). The breadth of Volterra's intellect and sociopolitical awareness led him to become a Senator of the Kingdom of Italy in 1905 and to take active part in World War I (1914-1918). He was instrumental in bringing Italy to the side of the Allies (the Entente) and, although in his mid-50s, he joined the Air Force and developed its capabilities, flying himself with youthful enthusiasm in the Italian skies. He assiduously promoted scientific and technical collaboration with the French and English allies. At the end of World War I in 1918, Volterra returned to university teaching and commenced research on mathematical biology regarding the population dynamics of competitive species in a shared environment. This work was stimulated by extensive interactions with Umberto D'Ancona, a Professor of Biology in the University of Siena, and remained his primary research interest until the end of his life in 1940. Volterra's later years were shadowed by the oppressive grip of fascism rising over Italy in the 1920s. He was one of the few Senators who had the courage to oppose vocally the rise of fascism at great personal risk, offering to all of us a life model of scientific excellence coupled with moral convictions and liberal thinking. When democracy was completely abolished by the Fascists in 1930, he was forced to leave the University of Rome and resign from all positions of influence. His life was saved by the international renown of his scientific stature. From 1930 until his death in 1940, he lived mostly in Paris and other European cities, returning only occasionally to his country house in Ariccia, where he took refuge from the turmoil of a tormented Europe. Volterra lived a productive and noble life. He remains an inspiration not only for his pioneering scientific contributions but also for his highly ethical and principled conduct in life. In the words of Griffith Evans from his Preface to Volterra's monograph (Dover edition), "His career gives us confidence that the Renaissance ideal of a free and widely ranging knowledge will not vanish, however great the presence of specialization." Let me add that Volterra's noble life offers a model of the struggle of enlightened intellect against the dark impulses of human nature, which remains the only means for sustaining the advancement of civilization against reactionary forces. It is intriguing that with Vito Volterra's passing in 1940, and as the world was getting embroiled in the savagery of World War II, Norbert Wiener was joining the war effort against the Axis on the other side of the Atlantic and was assuming the scientific mantle of advancing Volterra's pivotal ideas (regrettably without explicit acknowledgement) with regard to the problem of nonlinear system modeling from input-output data.
Norbert Wiener was born in 1894 and demonstrated an early aptitude in mathematics (he was a child prodigy), receiving his Ph.D. at Harvard at the age of 19 and becoming a Professor of mathematics at MIT at the age of 25, where he remained until the end of his life in 1964. Among his many contributions, the ones with high scientific impact have been the solution of the "optimal linear filtering" problem (the Wiener-Hopf equation)
and its extension to nonlinear filtering and system identification (the Wiener functional series) that is germane to our subject. However, the most influential and best known of his intellectual contributions has been the introduction of "cybernetics" in 1948 as the science of "communication and control in the animal and the machine." Wiener authored several books and monographs on the mathematical analysis of data, including The Fourier Integral and Certain of its Applications (1933) and Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications (1949). He also authored a two-volume autobiography, Ex-Prodigy: My Childhood and Youth (1953) and I Am a Mathematician (1956), as well as a novel, The Tempter (1959). In the last year of his life, he published God and Golem, Inc. (1964), which posthumously received the National Book Award in 1965. Wiener was an original thinker and a grand intellectual force throughout his life. His ideas had broad impact that extended beyond science, as they shaped the forward perspective of our society with regard to cybernetics and ushered in the "information age." Germane to our subject matter is Wiener's fundamental view "... that the physical functioning of the living individual and the operation of some of the newer communication machines are precisely parallel in their analogous attempts to control entropy through feedback" [from The Human Use of Human Beings: Cybernetics and Society (1950), p. 26; emphasis added]. This view forms the foundation for seeking mathematical or robotic emulators of living systems and asserts the fundamental importance of feedback in physiological system function (i.e., homeostatic mechanisms constraining disorganization, which is analogous to "controlling entropy"). This view is the conceptual continuation of the Hippocratic views on "the unity of organism" and "the recuperative mechanisms against the disease process" with regard to the feasibility of prognosis. Furthermore, Wiener's specific mathematical ideas for modeling nonlinear dynamic systems in a stochastic context, presented in his seminal monograph Nonlinear Problems in Random Theory (1958), have provided great impetus to our collective efforts for modeling physiological systems. Wiener's seminal ideas on systems and cybernetics have shaped the entire field of systems science and have inspired its tremendous growth in the last 50 years. This development was greatly assisted by his interactions with the MIT "statistical communication theory group" under the stewardship of Y. W. Lee (Wiener's former doctoral student). Through these interactions, Wiener's ideas were adapted to the practical context of electrical engineering and were elaborated for the benefit of the broader research community by Lee's graduate students (primarily Amar Bose, who first expounded on Wiener's theory of nonlinear systems in a 1956 RLE Technical Report). Lee himself worked with Wiener (initially for his doctoral thesis) on the use of Laguerre filters for the synthesis of networks/systems (a pivotal Wiener idea that has proven very useful in recent applications) and the use of high-order cross-correlations for the estimation of Wiener kernels (in collaboration with Lee's doctoral student, Martin Schetzen). Wiener's ideas on nonlinear system modeling were presented in a series of lectures to Lee's group in 1956 and were transcribed by this group into Wiener's seminal monograph Nonlinear Problems in Random Theory.
Wiener's ideas on nonlinear system identification were first recorded in a brief 1942 paper in which the Volterra series expansion was applied to a nonlinear circuit with Gaussian white noise excitation. Following on the orthogonalization ideas of Cameron and Martin (1947), and combining them with his previously published ideas on the "homogeneous chaos" (1938), he developed the orthogonal Wiener series expansion for
Gaussian white noise inputs that is the main thrust of the aforementioned seminal monograph and his main contribution to our subject matter (see Section 2.2). Wiener's ideas were further developed, elaborated, and promulgated by Lee's group in the 1950s [Bose, 1956; Brilliant, 1958; George, 1959] and in the 1960s, with the Lee-Schetzen cross-correlation technique (1965) having the greatest impact on future developments. However, the overall impetus of the MIT group began to wane in the 1970s, while at the same time the vigor of Wiener's ideas sprouted considerable research activity at Caltech on the subject of visual system modeling, spearheaded by my brother Panos in collaboration with Ken Naka and Gilbert McCann (see Prologue). As these ideas were put to the test of actual physiological system modeling, the weaknesses of the Wiener approach (as well as its strengths) became evident and stimulated an intensive effort for the development of variants that offered more effective methodologies in a practical context. This effort spread nationally and bore fruit in the form of more effective methodologies by the late 1970s, when the Caltech group was dissolved through administrative fiat in one of the least commendable moments of the school's administration. The state of the art at that time was imprinted on the monograph Analysis of Physiological Systems: The White-Noise Approach, authored by the Marmarelis brothers. The torch was passed on to a nationwide community of researchers who undertook to apply this methodology to a variety of physiological systems and further explore its efficacy. It became evident by this time that only some core elements of the original Wiener ideas remained essential. For instance, it is essential to have broadband (preferably stochastic) inputs and kernel expansions for improved efficiency of estimation, but there is no need to orthogonalize the Volterra series or necessarily use Gaussian white noise inputs. This process culminated in the early 1990s with the development of efficient methodologies that are far more powerful than the original methods in a practical application context (see Section 2.3). On this research front, a leading role was played by the Biomedical Simulations Resource Center at the University of Southern California (a research center funded by the National Institutes of Health since 1985) through the efforts of the author and his many collaborators worldwide. As we pause at the beginning of the new century to reflect on this scientific journey that started with Volterra and Wiener, we can assert with a measure of satisfaction that the roots planted by Hippocrates and Galen grew through numerous efforts all the way to the point where we can confidently avail the present generation of scientists with the powerful methodological tools to realize the "quantum leap" in systems physiology through reliable models that capture the essential complexity of living systems.
3 Parametric Modeling
Parametric modeling typically takes the form of algebraic or differential/difference equations, depending on whether the system is static or dynamic, respectively. The term "parametric" derives from the free-parameter status of the coefficients of the model equation that must be estimated from data. Both continuous-time and discrete-time parametric models can be used, although the discrete-time form (difference equations) is the one obtained directly from the sampled input-output data. Equivalent continuous-time models (in the form of nonlinear differential equations) can be derived from their discrete-time counterparts (nonlinear difference-equation models) using a specialized technique discussed in Section 3.5. Historically, differential-equation models have been developed (actually postulated) for various physiological systems based on first physical/chemical principles, following the reductionist approach of the physical sciences. These parametric models are extremely useful if accurate, because they are physiologically interpretable and often compact. However, the deductive process by which they are developed is often fraught with dangers of misrepresentation and oversimplification. The result can be an inaccurate model with pretensions of scientific pedigree that does not capture the true dynamics of the physiological system under study. Therefore, the validity of postulated parametric models must be tested rigorously with a broad ensemble of input signals and under various operational conditions, representative of the actual operating environment of the system. The difficulty of developing accurate parametric models of physiological systems from first principles (due to the complexity of such systems) has motivated a more practical procedure, whereby we postulate a class of plausible parametric models and select the most likely member of this class on the basis of the available input-output data. This process incorporates all existing knowledge (and intuition) about the characteristic properties and the functional organization of the system. The free parameters of the selected model are estimated from the data as part of the modeling process.
Another approach that leads to parametric models in a "data-true" fashion commences with inductive nonparametric modeling using the input-output data and then transitions to an equivalent parametric model, as discussed in Section 3.4. The reverse route is also possible (i.e., going from a parametric model to an equivalent nonparametric model), as discussed in Section 3.2 for continuous-time models and in Section 3.3 for discrete-time models. The important case of nonlinear feedback and its relation to parametric models of nonlinear differential equations is discussed in Section 4.1.5 in the modular modeling context. Nonlinear feedback is an essential functional component of many physiological systems, used to achieve homeostasis through autoregulation or improved performance (e.g., enhancement of sensory/neural information processing). Note that the connectionist models discussed in Chapter 4 may also be viewed as parametric models, since the modeling task reduces to estimating the free parameters of the respective connectionist model/network. However, the distinction is made herein between the two types of model structures, networks (connectionist) versus equations (parametric), because the latter maintain the claim of physical lineage and physiological interpretability (as well as parsimonious representation).
3.1 BASIC PARAMETRIC MODEL FORMS AND ESTIMATION PROCEDURES

Consider the discrete-time input and output data, x(n) and y(n), respectively. If the system is static and linear, then we can employ the simple model of linear regression,

$$y(n) = a\,x(n) + b + e(n) \qquad (3.1)$$
to represent the input-output relation for every discrete-time index n, where e(n) denotes the error term (or model residual) at each n. The unknown parameters (a, b) can be estimated through least-squares linear regression methods, as discussed in Section 2.1.5. This model can be extended to multiple inputs {x₁, x₂, ..., x_k} and outputs {y_i} as

$$y_i(n) = a_{i,1} x_1(n) + a_{i,2} x_2(n) + \cdots + a_{i,k} x_k(n) + b_i + e_i(n) \qquad (3.2)$$
and various linear regression techniques can be used for the estimation of the unknown parameters (a_{i,1}, ..., a_{i,k}, b_i) for each output y_i [Eykhoff, 1974; Soderstrom & Stoica, 1989; Ljung & Soderstrom, 1983; Ljung, 1987; Goodwin & Payne, 1977]. If the system is static and nonlinear, then a nonlinear input-output relation

$$y(n) = \sum_{j=1}^{J} c_j F_j[x(n)] + e(n) \qquad (3.3)$$
can be used as a parametric model for the single-input/single-output case, where the functions {F_j} represent a set of selected nonlinear functions (e.g., polynomials, sinusoids, sigmoids, exponentials, or any other suitable set of functions over the range of values of x), and {c_j} are the unknown parameters that can be estimated through linear regression, provided that the {F_j} functions do not contain other unknown parameters in a nonlinear fashion. In the latter case, nonlinear regression methods must be used (e.g., gradient-based methods), as discussed in Chapter 4. Naturally, the choice of the functions {F_j} is critical in terms of modeling efficiency and depends on the characteristics of the system or the type of available data.
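As a simple illustration of the model (3.3), the following minimal sketch fits polynomial basis functions F_j[x] = x^j by ordinary least squares; the cubic test system and its noise level are invented for illustration:

```python
import numpy as np

# Static nonlinear model (3.3) with polynomial basis, fitted by least squares.
rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=500)
y = 2.0 * x - 0.5 * x**3 + 0.05 * rng.normal(size=x.size)  # "unknown" static system
J = 3
F = np.column_stack([x**j for j in range(1, J + 1)])       # regressors F_j[x(n)]
c_hat, *_ = np.linalg.lstsq(F, y, rcond=None)              # linear in the c_j
print(np.round(c_hat, 2))                                  # approx [2.0, 0.0, -0.5]
```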
The relatively simple case of static systems (discussed above) has been studied extensively to date but has only limited applicability to actual physiological systems, since the latter are typically dynamic, i.e., the output value at time n depends also on input and/or output values at previous times (lags). Note that the possible dependence of the present output value on previous output values (autoregressive terms) can also be expressed as a dependence on previous input values. Thus, we now turn to the practically important case of dynamic systems. We begin with the case of linear (stationary) dynamic systems, where the discrete-time parametric model takes the form of an autoregressive moving-average with exogenous variable (ARMAX) model described by the difference equation:
$$y(n) = \alpha_1 y(n-1) + \cdots + \alpha_k y(n-k) + \beta_0 x(n) + \beta_1 x(n-1) + \cdots + \beta_m x(n-m) + w(n) + \gamma_1 w(n-1) + \cdots + \gamma_q w(n-q) \qquad (3.4)$$
where w(n) represents a white-noise sequence. This ARMAX model is a difference equation that expresses the present value of the output, y(n), as a linear combination of k previous values of the output (autoregressive part), previous (and the present) values of the input (exogenous part), and q previous (and the present) values of the white-noise disturbance sequence (moving-average part). The latter is assumed independent of the input-output data and composes the model residual (error term). When γ_i = 0 for all i ≠ 0, the residuals form a white sequence, and the coefficients (α₁, ..., α_k, β₀, β₁, ..., β_m) can be estimated through the ordinary least-squares estimator of Equation (2.42). However, if any γ_i is nonzero, then minimum-variance estimation of the coefficients requires the generalized least-squares estimator of Equation (2.46), which is also required for the multiple regression model of Equation (3.2) when e(n) is a nonwhite error sequence (see Section 2.1.5). In the case of noise present at the input data, as well as at the output data, "total least-squares" procedures must be used, which introduce additional complexity in the estimation of the model parameters. Although the estimation of ARMAX model parameters can be straightforward through linear regression, the model-order determination remains a challenge, i.e., determining the maximum lag values (k, m, q) in the difference equation (3.4) from given input-output data, x(n) and y(n). A number of statistical procedures have been devised for this purpose (e.g., weighted residual variance, the Akaike information criterion, the minimum description length criterion, or our model-order selection criterion presented in Section 2.3.1), all of them based on minimizing a quantity that involves the prediction error for a given model order (k, m, q), compensated properly by the total number of model parameters. Typically, the sum of the squared residuals is used to quantify the prediction error in the statistical model-order criterion. It is critical that the prediction error be evaluated on a set of input-output data distinct from the one used for the estimation of the model parameters ("out-of-sample" prediction). Application of these statistical criteria is repeated successively for ascending values of the model order (k, m, q), and the order selection process is completed when the minimum criterion value is achieved [Soderstrom & Stoica, 1989]. Another point of frequent confusion in practice is whether the prediction errors (residuals) are computed using the autoregressively estimated model output or the observed output values in the autoregressive terms of the model. These are often referred to as the "open-loop" or "closed-loop" conditions for error estimation. We use the observed output
values (open-loop conditions) to avoid problems from the accumulation of output estimation errors in a closed-loop recursive scheme. It should be noted that the term more commonly used in the present practice of linear parametric modeling is ARMA (without the letter X denoting the exogenous variable), where the "moving-average" part refers to the exogenous variable instead of the residual term. Although this is not consistent with the original definition of the ARMAX model, it makes practical sense because actual investigations seldom delve into the detailed modeling of the residuals, limiting themselves instead to what should be properly termed ARX modeling. Therefore, the reader should be alerted to the possible use of the term "ARMA model" in the literature instead of what should be properly called an "ARX model" according to the original definition. This potential confusion is bound to increase as more investigations extend to multiple exogenous variables. In the latter case, the autoregressive (AR) part involving lagged values of the "output variable" remains structurally the same, but more exogenous variables and their appropriate lagged values are included on the right-hand side of the difference equation to represent the multiple "input variables," as shown below for the two-input (x₁, x₂) case:
+ ßO,2X2(n) +
+ aky(n -
k)
+ ßO,IXI(n) + ... + ßmJ,IXI(n- ml)
+ ßm2,2X2(n - m2) + e(n)
(3.5)
where the structure of the residual term remains the same as before:

$$e(n) = w(n) + \gamma_1 w(n-1) + \cdots + \gamma_q w(n-q) \qquad (3.6)$$
In the original definition, this model would be called an "ARMAX₁X₂" model, but it is more likely nowadays to be called a model with "two moving-average components." This recently adopted convention of ARMA terminology to describe the input-output relation attains greater importance for our purposes, as the discrete-time Volterra model of Equation (2.32) is now referred to (on occasion) as a "nonlinear moving-average model." The multiple linear regression problem defined by any of the Equations (3.2)-(3.5) can be written in vector form:

$$y(n) = r'(n)\,\theta + e(n) \qquad (3.7)$$
where r(n) represents the vector of all M regression variables in each case, the prime denotes "transpose," and θ denotes the unknown parameter vector. For a set of data points (n = 1, ..., N), Equation (3.7) yields the matrix formulation

$$y = R\theta + \varepsilon \qquad (3.8)$$
where y is the output vector, R is the (N × M) rectangular regression matrix, and ε is the vector of the residuals. The general solution to this regression problem, which minimizes the sum of the squared residuals, is given by means of the "generalized inverse" R⁺ of the rectangular matrix R [Fan & Kalaba, 2003]:

$$\hat{\theta} = R^{+} y \qquad (3.9)$$
This generalized inverse (or pseudoinverse) yields the following output prediction:

$$\hat{y} = R R^{+} y \qquad (3.10)$$

resulting in the residuals

$$\hat{\varepsilon} = [I - R R^{+}] y \qquad (3.11)$$
where the (N × N) matrix [I − RR⁺] is idempotent and of rank less than or equal to (N − M). The sum of squared residuals (SSR) for this general solution is

$$\Psi = y' [I - R R^{+}] y \qquad (3.12)$$
E[Ö] = R'y,
(3.13)
E[ÖÖ'] = R+C(R+)'
(3.14)
and its covariance matrix is
where C is the covariance matrix of the noise vector w. The mean and variance of the SSR are

$$E[\Psi] = y_0' [I - R R^{+}] y_0 \qquad (3.15)$$

$$\mathrm{var}[\Psi] = y_0' [I - R R^{+}]'\, C\, [I - R R^{+}] y_0 \qquad (3.16)$$
When the residuals are uncorrelated and stationary (C = σ₀²I), then

$$\mathrm{var}[\Psi] = \sigma_0^2 \cdot E[\Psi] \qquad (3.17)$$

and Ψ/σ₀² follows a chi-square distribution with (N − M) degrees of freedom if the matrix R is full rank and the residuals are Gaussian. If the latter is not true but N is very large, then the Central Limit Theorem stipulates an approximately Gaussian distribution for Ψ (provided the residuals do not have many outliers). When the (N × M) matrix R has full rank M (M < N), then the generalized inverse R⁺ takes the form that yields the ordinary least-squares (OLS) estimate of θ (see Section 2.1.5):

$$\hat{\theta}_{\mathrm{OLS}} = [R'R]^{-1} R' y \qquad (3.18)$$
which yields unbiased and consistent estimates if the residuals e(n) are uncorrelated and independent of the regression variables. These estimates are also efficient (i.e., they have the minimum variance among all linear estimators) if the uncorrelated residuals are Gaussian. As discussed in Section 2.1.5, the only practical complications here may arise from the numerical inversion of the Gram matrix [R'R] if the latter is ill-conditioned or singular. However, the use of the pseudoinverse R⁺ obviates this concern. If the residuals
are not Gaussian, then the efficient estimates result from minimization of a cost function that is defined by the minus log-likelihood function of the non-Gaussian residuals (see Section 2.1.5). If the residuals are correlated and Gaussian, then the generalized least-squares (GLS) estimator

$$\hat{\theta}_{\mathrm{GLS}} = [R' C^{-1} R]^{-1} R' C^{-1} y \qquad (3.19)$$
ought to be used to obtain unbiased and consistent estimates with minimum variance, where C denotes the covariance matrix of the residuals. Practical complications arise from the fact that C is not a priori known and, therefore, must be either postulated or estimated from the data. The latter case, which is more realistic in actual applications, leads to an iterative procedure that may not be convergent or may not yield satisfactory results (i.e., start with an OLS estimate of θ, obtain a first estimate of the residuals and of C, evaluate a first GLS estimate of θ, obtain a second estimate of the residuals and of C, and iterate until the process converges to a final GLS estimate of θ). As an alternative to this iterative procedure, we can obtain an estimate of the moving-average model of the residual term given by Equation (3.6), using the residual values resulting from initial OLS estimation of the model parameters, and then evaluate the covariance matrix C from this moving-average model for subsequent use in GLS estimation. Although this alternative approach avoids iterations and problems of convergence, it does not necessarily yield satisfactory results, because it is very sensitive to possible errors in the residuals model of Equation (3.6). Note that the residual term of the model of Equation (3.5) contains all the noise and errors associated with all input-output variables and their lagged values involved in the model. Equivalent to this latter procedure is the residual-whitening method, which amounts to prefiltering the data with the inverse of the transfer function corresponding to Equation (3.6) (prewhitening filter) prior to OLS estimation. The quality of the results depends on the accuracy and time-invariance of the noise model. In this case, of course, the input-output variables get linearly transformed by the prewhitening filter. As another alternative, the parameter vector can be augmented to include the coefficients {γ_i} of the moving-average model of the residuals in Equation (3.6), leading to a pseudolinear regression problem [since the estimates of w(n) depend on θ] that is implemented as an iterative extended least-squares (ELS) procedure [Billings & Voon, 1984, 1986; Ljung, 1987; Ljung & Soderstrom, 1983; Soderstrom & Stoica, 1989]. This procedure can also be used for the estimation of the parameters of a NARMAX model (see below), and it is subject to a host of potential problems arising from the combination of the noise (intrinsic to the data) with the nonlinear terms of the NARMAX model structure.
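The following minimal sketch contrasts the OLS estimator (3.18), in its pseudoinverse form (3.9), with the GLS estimator (3.19) for a known residual covariance matrix C; here C is simply postulated (an AR(1)-type correlation structure), whereas in practice it must be estimated, as discussed above:

```python
import numpy as np

# OLS vs. GLS on correlated Gaussian residuals; all data settings are invented.
rng = np.random.default_rng(5)
N, M = 400, 3
R = rng.normal(size=(N, M))
theta_true = np.array([1.0, -2.0, 0.5])
rho = 0.8                                          # assumed residual correlation
C = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
e = np.linalg.cholesky(C) @ rng.normal(size=N)     # correlated Gaussian residuals
y = R @ theta_true + e
theta_ols = np.linalg.pinv(R) @ y                  # pseudoinverse form, Eqs. (3.9)/(3.18)
Ci = np.linalg.inv(C)
theta_gls = np.linalg.solve(R.T @ Ci @ R, R.T @ Ci @ y)   # Eq. (3.19)
print(np.round(theta_ols, 2), np.round(theta_gls, 2))
```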
3.1.1 The Nonlinear Case
In the more general case of nonlinear (stationary) systems, the ARMAX model can be extended to the NARMAX (nonlinear ARMAX) model that includes nonlinear multinomial expressions of the variables on the right-hand side of Equation (3.4) [Billings & Voon, 1984; Leontaritis & Billings, 1985a,b]. This may include difference equations that exhibit chaotic behavior [Barahona & Poon, 1996]. For instance, a second-degree multinomial NARMAX model of order (k = 2, m = 1, q = 0) takes the form
$$y(n) = \alpha_1 y(n-1) + \alpha_2 y(n-2) + \alpha_{1,1} y^2(n-1) + \alpha_{1,2} y(n-1)y(n-2) + \alpha_{2,2} y^2(n-2) + \beta_0 x(n) + \beta_1 x(n-1) + \beta_{0,0} x^2(n) + \beta_{0,1} x(n)x(n-1) + \beta_{1,1} x^2(n-1) + \gamma_{1,0}\, y(n-1)x(n) + \gamma_{2,0}\, y(n-2)x(n) + \gamma_{1,1}\, y(n-1)x(n-1) + \gamma_{2,1}\, y(n-2)x(n-1) + w(n) \qquad (3.20)$$
It is evident from Equation (3.20) that the form of a NARMAX model may become rather unwieldy, and the model specification task is very challenging (i.e., selecting the appropriate form and degree of nonlinear terms, as well as the number of input, output, and noise lags involved in the model). Several approaches have been proposed for this purpose [Billings & Voon, 1984, 1986; Haber, 1989; Haber & Unbehauen, 1990; Korenberg, 1983, 1988; Leontaritis & Billings, 1985a,b; Zhao & Marmarelis, 1997, 1998], and they are all rather sensitive to the presence of data-contaminating noise. Computationally, when the structure of the NARMAX model is established, the parameter estimation task can be straightforward using multiple linear regression (as discussed above), since the unknown parameters enter linearly in the NARMAX model. However, the results are reliable only for cases of additive and/or limited noise. Otherwise, the iterative "extended least-squares" procedure ought to be employed, which is subject to a host of potential problems, including the convergence of the iterative procedure and the detrimental effect of multiplicative noise terms (emerging from the nonlinear output terms of the NARMAX model, even in the relatively simple case of output-additive noise). In order to appreciate the potential complications arising from the presence of noise in NARMAX models, the reader should be reminded that noise (or errors, in general) additive to the input and/or output data results in multiplicative terms (with the input and/or output) in the context of NARMAX modeling, although it results only in additive residual terms in the context of the ARMAX model. For instance, output data contaminated by additive noise, ỹ = y + ε, will generate the square term ỹ² = y² + 2εy + ε², which gives rise to the output-multiplicative noise term (εy). In addition, the possibly Gaussian characteristics of the noise are transformed by the ε² term into non-Gaussian (chi-square) characteristics for the model residuals. This example can be extended to all nonlinear terms of the NARMAX model (products of input/output lagged values) to depict the resulting complexity of the noise effects on parameter estimation. The problem of multiplicative noise/error terms is confounding in practice, because existing estimation methods assume that the noise/error (residual) terms are additive to the input and/or output. In fact, even the case of simultaneous input and output additive noise remains practically challenging in the context of "total least-squares" methods, which necessitate assumptions about the noise statistics and the relative input/output noise power. The complication arising from non-Gaussian residuals can be resolved with iterative cost-minimization procedures wherein the cost function is determined by the statistics of the residuals, as discussed in Section 2.1.5 for Volterra kernel estimation and in Section 4.2.2 in connection with gradient-based training of the Volterra-equivalent network (connectionist) models that are intrinsically nonlinear in the parameters. A key practical issue in the use of NARMAX models is the determination of the model structure (i.e., the number and type of nonlinear combinations of input and output lagged terms).
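As a small illustration of how quickly the candidate terms multiply, the following hedged sketch assembles all first- and second-degree NARMAX regressors for given lag orders (the function name and stand-in data are invented); the subsequent "significant"-term selection step, e.g., the PESR method described next, is not implemented here:

```python
import numpy as np
from itertools import combinations_with_replacement

# Assemble candidate regressors of a second-degree NARMAX model, cf. Eq. (3.20).
def narmax_candidates(x, y, k=2, m=1):
    n = np.arange(max(k, m), len(y))
    base = [y[n - i] for i in range(1, k + 1)]       # y(n-1), ..., y(n-k)
    base += [x[n - j] for j in range(0, m + 1)]      # x(n), ..., x(n-m)
    cols = list(base)                                # first-degree terms
    for a, b in combinations_with_replacement(range(len(base)), 2):
        cols.append(base[a] * base[b])               # second-degree products
    return np.column_stack(cols), y[n]

rng = np.random.default_rng(6)
x, yv = rng.normal(size=300), rng.normal(size=300)   # stand-in data records
Phi, target = narmax_candidates(x, yv)
print(Phi.shape)   # (298, 14): 4 linear terms plus 10 quadratic products
```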
For this purpose, the prediction-error stepwise-regression (PESR) method was proposed by Billings and Voon (1984), according to which the model structure is determined by computing the partial correlations between the output and all possible candidate terms involving input and output lagged values in multinomial combinations. By applying the F-ratio test to the computed partial correlations, we can select the
"significant" tenns of the NARMAX structure. The PESR method works well under lownoise conditions, but it may encounter convergence problems and severe inaccuracies under high-noise conditions. Moreover, when the PESR method is used to detennine the structure ofthe NARMAX model, the partial correlations ofthe output with all the candidate tenns may be rather numerous and may result in heavy computational burden. Another promising method for detennining the NARMAX model structure and estimating its parameters has been proposed by Korenberg (1983, 1988) and appears to offer some advantages in certain cases. An alternative identification method ofNARMAX models can be based on Volterra kernel estimates (ofup to third order), under the condition that the NARMAX model can be satisfactorily approximated by a third-order Volterra model. For the third-order NARMAX model to be described adequately by the first three orders of Volterra kemels, the coefficients of the nonlinear autoregressive tenns must retain small magnitudes relative to the linear terms, which implies that the Volterra kernels of order higher than third can be neglected [Zhao & Mannarelis, 1994a, 1997]. Since computationally efficient techniques currently exist by which Volterra kernels can be estimated accurately (up to third order) even with rather noisy data [Marmarelis, 1993], it is plausible that more accurate NARMAX models can be obtained indirectly from these kerne I estimates, rather than direct1y from the input-output data [Zhao & Marmare1is, 1998]. This method is presented in Section 3.4 and separates the terms of different order in the NARMAX model based on the mathematical relations between NARMAX and Volterra models discussed in Section 3.3. 3.1.2
The Nonstationary ease
If the system exhibits nonstationarities, then the regression coefficient vector θ of model parameters in Equation (3.7) will vary through time and can be estimated either in a piecewise-stationary fashion over a sliding time window (batch processing) or in a recursive fashion using an adaptive estimation formula (recursive processing). The latter approach has been favored in recent years for parametric modeling [Goodwin & Sin, 1984; Ljung, 1987; Ljung & Soderstrom, 1983; Ljung & Glad, 1994] and is briefly outlined below. An interesting alternative is to introduce a specific parameterized time-varying structure for the parameters, thereby augmenting the unknown parameter vector to include the additional parameters of the time-varying structure (e.g., a pth-degree polynomial structure of time-varying model parameters will introduce p additional parameters for each time-varying term in the vector θ that need be estimated from the data via batch processing). Implementation of the batch-processing approach folds back to the previously discussed least-squares estimation methods. However, the recursive approach requires a new methodological framework that seeks to update the parameter estimates continuously on the basis of new time-series data. This adaptive or recursive approach has gained considerable popularity in recent years, although certain important practical issues remain open regarding the speed of algorithmic convergence and the effect of correlated noise [Goodwin & Sin, 1984; Ljung, 1987; Ljung & Soderstrom, 1983; Ljung & Glad, 1994]. The basic formulae of the recursive least-squares (RLS) algorithm are
(3.21)
sen) = r(n)P(n - l)r(n)
(3.22)
3.2
P{n)
153
VOLTERRA KERNELS OF NONLINEAR DIFFERENTIAL EQUATIONS
=
P{n - 1) - ')I(n)P{n - l)r(n)r'{n)P(n - 1)
(3.23)
')I(n) = [r'{n)P{n - l)r{n) + ~(n)]-l
(3.24)
where the matrix P{n), the vector sen), and the scalar ')I(n) are updating instruments ofthe algorithm computed at each step n, and {a(n)} denotes a selected sequence of scalar weights (often taken to be unity) for the squared prediction errors that define the cost function. The initialization of this algorithm is made by setting P{O) = coI, where Co is a large positive constant (to suppress the effect of initial conditions) and I is the identity matrix, as weIl as giving very small random values to the initial parameter vector 9(0). This recursive algorithm can be used for on-line identification of slowly varying nonstationary systems to track changes in (near) real time. A critical issue for the efficacy of this approach is the speed of algorithmic convergence relative to the speed of time variation of the system/model parameters. It is important to note that when output autoregressive terms exist in the model of Equation (3. 7), the regression vector r{n) is correlated with the residual e( n), and, thus, none of the aforementioned least-squares estimates of the parameter vector will converge (in principle) to the actual parameter values. This undesirable effect (estimation bias) is somewhat mitigated when the predicted output values are used at each step for the autoregressive lagged terms (closed-loop condition or global predictive model) instead of the observed output values (open-loop condition or one-step predictive model). To remedy this problem, the instrumental variable (IV) method has been introduced, which selects an IV uncorrelated with the residuals but strongly correlated with the regression vector ren) in order to evaluate the least-squares estimate [Soderstrom & Stoica, 1989]. The IV estimates can be obtained in recursive fashion, as weIl as in batch processing mode.
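To make the recursions concrete, the following minimal Python sketch implements Equations (3.21)-(3.24), assuming unit weights a(n) = 1 and the initialization described above; the simulated first-order ARX system and all numerical values are illustrative assumptions, not taken from the text.

```python
import numpy as np

def rls(r, y, c0=1e6):
    """Recursive least squares per Eqs. (3.21)-(3.24).
    r: (N, p) array of regression vectors r(n); y: (N,) observations.
    Assumes unit weights a(n) = 1 and P(0) = c0*I."""
    N, p = r.shape
    P = c0 * np.eye(p)                    # P(0) = c0*I suppresses initial conditions
    theta = 1e-6 * np.random.randn(p)     # small random initial parameter vector
    for n in range(N):
        rn = r[n]
        gamma = 1.0 / (rn @ P @ rn + 1.0)          # Eq. (3.24) with a(n) = 1
        s = gamma * (P @ rn)                       # Eq. (3.22)
        theta = theta + s * (y[n] - rn @ theta)    # Eq. (3.21)
        P = P - gamma * np.outer(P @ rn, rn @ P)   # Eq. (3.23)
    return theta

# Illustrative use: y(n) = 0.8*y(n-1) + 0.5*x(n-1) + noise
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.zeros(2000)
for n in range(1, 2000):
    y[n] = 0.8 * y[n - 1] + 0.5 * x[n - 1] + 0.05 * rng.standard_normal()
R = np.column_stack([np.r_[0.0, y[:-1]], np.r_[0.0, x[:-1]]])  # r(n) = [y(n-1), x(n-1)]
print(rls(R, y))   # should approach [0.8, 0.5]
```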
3.2 VOLTERRA KERNELS OF NONLINEAR DIFFERENTIAL EQUATIONS
This section discusses the fundamental issue of the mathematical relation (and equivalence) between nonlinear parametric and nonparametric models for stable nonlinear stationary systems. The issue is addressed by deriving the equivalent nonparametric (Volterra) models for the broad class of stable nonlinear dynamic systems described by the nonlinear differential equation model of Equation (3.25). Note that if nonlinear terms involving the input are present on the right-hand side of Equation (3.25), they will give rise to singularities (delta functions) in the high-order kernels of the equivalent Volterra model. Since such singularities have not been observed thus far in second-order kernel estimates from a broad variety of physiological systems, we have not included such nonlinear terms on the right-hand side of Equation (3.25). Additional classes of nonlinear differential equations can be studied with the same approach if they have stable solutions. One such important class involves bilinear terms (between input and output or internal state variables) that may be appropriate for describing modulatory effects of the nervous, metabolic, or endocrine systems. Its nonparametric analysis is discussed below with an illustrative example drawn from the so-called "minimal" glucose-insulin model. We consider the broad class of nonlinear dynamic systems described by the ordinary differential equation
L(D)y + Σ_{i=0}^{I} Σ_{j=0}^{J} c_{i,j} y^i (Dy)^j = M(D)x,  with i + j ≥ 2   (3.25)
where L(D) and M(D) are polynomials of the differential operator D = d(·)/dt, and x(t) and y(t) are the system input and output, respectively. The polynomial L(D) is assumed to be of degree higher than first (i.e., it involves at least the second derivative of y), and the polynomial M(D) is of lower degree than L(D) in order to avoid the emergence of singular (delta) functions in the equivalent Volterra kernels. The nonlinear terms are of degree two and higher, and may be viewed as terms of a Taylor expansion of an analytic function of y and its derivative. Furthermore, we assume that the coefficients {c_{i,j}} of the nonlinear terms are of small magnitude (i.e., |c_{i,j}| ≤ ε ≪ 1 for all i and j) for stable operation of the system, a fact that also simplifies the approximate analytical expressions for the equivalent Volterra kernels. For the region of stable solutions of Equation (3.25), there exists a Volterra functional expansion [see Equation (2.5)] that represents the system output in terms of a series of multiple convolution integrals of the input. The kernel functions {k_n} characterize the dynamics of the nonlinear system and can be derived analytically for the system of Equation (3.25) using the method of "generalized harmonic balance" [Marmarelis, 1982, 1989a,f, 1991]. According to this method, the mth-order Volterra kernel can be evaluated in the m-dimensional Laplace domain by considering the generalized harmonic inputs:

x_m(t) = e^{s_1 t} + ⋯ + e^{s_m t}   (3.26)
where the s_i are distinct complex (Laplace) variables. Substituting the input x_m(t) into the Volterra series expansion of Equation (2.5), we obtain the output expression (for k_0 = 0)

y_m(t) = Σ_{n=1}^{∞} Σ_{j_1=1}^{m} ⋯ Σ_{j_n=1}^{m} e^{(s_{j_1}+⋯+s_{j_n})t} ∫_0^∞ ⋯ ∫_0^∞ k_n(τ_1, …, τ_n) e^{−s_{j_1}τ_1 − ⋯ − s_{j_n}τ_n} dτ_1 ⋯ dτ_n
        = Σ_{n=1}^{∞} Σ_{j_1=1}^{m} ⋯ Σ_{j_n=1}^{m} e^{(s_{j_1}+⋯+s_{j_n})t} K_n(s_{j_1}, …, s_{j_n})   (3.27)

where K_n is the n-dimensional Laplace transform of the Volterra kernel k_n. The rth derivative of the output is given by the expression

D^r y_m(t) = Σ_{n=1}^{∞} Σ_{j_1=1}^{m} ⋯ Σ_{j_n=1}^{m} (s_{j_1}+⋯+s_{j_n})^r e^{(s_{j_1}+⋯+s_{j_n})t} K_n(s_{j_1}, …, s_{j_n})   (3.28)
Note that the output terms that contain the complex exponential e^{(s_1+⋯+s_m)t} with all distinct complex frequencies (Laplace variables) of the input are terms associated with the mth-order kernel. Therefore, substitution of the output expression and its derivatives [given by Equations (3.27) and (3.28)] into Equation (3.25) yields an expression that can be used for the evaluation of the mth-order Volterra kernel K_m(s_1, …, s_m) on the basis of harmonic balance (i.e., by selecting only those terms in the equation that contain the complex exponential e^{(s_1+⋯+s_m)t}). The contributions of these terms in the equation must balance out, according to the harmonic balance approach, and yield a set of equations that can be solved to achieve our stated goals.
Let us follow this approach for increasing values of m. For m = 1 we have

y_1(t) = Σ_{n=1}^{∞} e^{n s_1 t} K_n(s_1, …, s_1)   (3.29)

and

D^r y_1(t) = Σ_{n=1}^{∞} (n s_1)^r e^{n s_1 t} K_n(s_1, …, s_1)   (3.30)

The nonlinear terms y_1^i (Dy_1)^j yield

y_1^i(t)[Dy_1(t)]^j = Σ_{n_1=1}^{∞} ⋯ Σ_{n_i=1}^{∞} Σ_{n′_1=1}^{∞} ⋯ Σ_{n′_j=1}^{∞} (n′_1 s_1) ⋯ (n′_j s_1) e^{(n_1+⋯+n_i+n′_1+⋯+n′_j)s_1 t} · K_{n_1}(s_1, …, s_1) ⋯ K_{n_i}(s_1, …, s_1) K_{n′_1}(s_1, …, s_1) ⋯ K_{n′_j}(s_1, …, s_1)   (3.31)

When Equations (3.29)-(3.31) are substituted in Equation (3.25), then, by balancing terms containing simply the factor e^{s_1 t}, we obtain the equation

L(s_1) K_1(s_1) e^{s_1 t} = M(s_1) e^{s_1 t}   (3.32)

which can be solved to find the first-order Volterra kernel, in terms of the polynomials L(s_1) and M(s_1) defined in Equation (3.25). Therefore, the equivalent first-order Volterra kernel in the Laplace domain is

K_1(s_1) = M(s_1)/L(s_1)   (3.33)
The Volterra kernel expression in the time domain can be found by inverse Laplace transform. Note that the nonlinear terms in Equation (3.25) do not contribute to K_1; i.e., the first-order Volterra kernel of the system represents strictly the linear portion of the nonlinear differential equation. The equivalent second-order Volterra kernel of this system can be found following the same procedure for m = 2. Then,

y_2(t) = Σ_{n=1}^{∞} Σ_{j_1=1}^{2} ⋯ Σ_{j_n=1}^{2} e^{(s_{j_1}+⋯+s_{j_n})t} K_n(s_{j_1}, …, s_{j_n})   (3.34)

D^r y_2(t) = Σ_{n=1}^{∞} Σ_{j_1=1}^{2} ⋯ Σ_{j_n=1}^{2} (s_{j_1}+⋯+s_{j_n})^r e^{(s_{j_1}+⋯+s_{j_n})t} K_n(s_{j_1}, …, s_{j_n})   (3.35)
Since the expression for y_2^i(t)[Dy_2(t)]^j is rather unwieldy, we focus only on the contributions of those terms that contain the exponential factor e^{(s_1+s_2)t}. Since no such terms exist for i + j > 2, the only such contributions will come from the expressions for the nonlinear terms y_2(t)Dy_2(t), y_2^2(t), and [Dy_2(t)]^2, in addition, of course, to such contributions from the linear terms y_2(t) and D^r y_2(t). Noting that the kernels are symmetric functions of their arguments, we obtain the following equation from second-order harmonic balance:
2L(s_1+s_2) K_2(s_1, s_2) + 2c_{2,0} K_1(s_1)K_1(s_2) + c_{1,1}(s_1+s_2) K_1(s_1)K_1(s_2) + 2c_{0,2} s_1 s_2 K_1(s_1)K_1(s_2) = 0   (3.36)

Solving the algebraic Equation (3.36) for K_2, we obtain

K_2(s_1, s_2) = −[c_{2,0} + c_{1,1}(s_1+s_2)/2 + c_{0,2} s_1 s_2] K_1(s_1)K_1(s_2) / L(s_1+s_2)   (3.37)
To derive the equivalent third-order kernel, we repeat this procedure for m = 3. We see that terms containing the factor e^{(s_1+s_2+s_3)t} are contributed by y_3(t), D^r y_3(t), and y_3^i(t)[Dy_3(t)]^j for (i + j) = 2, 3. Note, however, that the contributions from expressions for (i + j) = 2 are of order ε². Since we have assumed that |c_{i,j}| ≤ ε ≪ 1, we can neglect these higher-order terms, and the resulting approximate third-order harmonic balance equation is

3!L(s_1+s_2+s_3) K_3(s_1, s_2, s_3) + [3!c_{3,0} + 2!c_{2,1}(s_1+s_2+s_3) + 2!c_{1,2}(s_1s_2+s_2s_3+s_3s_1) + 3!c_{0,3} s_1 s_2 s_3] K_1(s_1)K_1(s_2)K_1(s_3) = 0   (3.38)
where terms of order ε² or higher have been omitted in first approximation. Solving Equation (3.38) for K_3, we obtain the equivalent third-order Volterra kernel:

K_3(s_1, s_2, s_3) = −[c_{3,0} + c_{2,1}(s_1+s_2+s_3)/3 + c_{1,2}(s_1s_2+s_2s_3+s_3s_1)/3 + c_{0,3} s_1 s_2 s_3] K_1(s_1)K_1(s_2)K_1(s_3) / L(s_1+s_2+s_3)   (3.39)
in first approximation (i.e., omitting terms of order ε² or higher). We observe that, for the mth-order harmonic balance of terms containing the factor e^{(s_1+⋯+s_m)t}, the only nonnegligible terms (i.e., of order ε) will be contributed by the expressions for y_m(t), D^r y_m(t), and y_m^i(t)[Dy_m(t)]^j for (i + j) = m. Therefore, in first approximation, we have the general expression (for m > 1):

K_m(s_1, …, s_m) = −{Σ_{n=0}^{m} [(m−n)! n!/m!] c_{m−n,n} R_{m,n}(s_1, …, s_m)} K_1(s_1) ⋯ K_1(s_m) / L(s_1+⋯+s_m)   (3.40)
where R_{m,n}(s_1, …, s_m) denotes the sum of all distinct products (s_{j_1} s_{j_2} ⋯ s_{j_n}) that can be formed with combinations of the indices (j_1, j_2, …, j_n) from the set (1, 2, …, m). Note that R_{m,0} = 1 by definition. Equation (3.40) yields the approximate general expression for the equivalent high-order Volterra kernels of this class of nonlinear differential systems, under the stated assumption of small-magnitude coefficients for the nonlinear terms of Equation (3.25). The derived analytical expressions for the equivalent Volterra kernels allow the nonparametric study of this broad class of parametric nonlinear systems, which may also describe important cases of nonlinear feedback, as discussed in Example 3.1 below and in Section 4.1.5.
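Equation (3.40) also lends itself to symbolic evaluation, since R_{m,n} is simply the sum of all distinct n-fold products of the Laplace variables. The sketch below (assuming SymPy; the specific choice of L, M, and nonlinear coefficients is illustrative) evaluates the equivalent kernels; for m = 2 it reproduces Equation (3.37).

```python
import sympy as sp
from itertools import combinations
from math import factorial

def R(n, svars):
    """R_{m,n}: sum of all distinct n-fold products of the Laplace variables."""
    if n == 0:
        return sp.Integer(1)          # R_{m,0} = 1 by definition
    return sum(sp.prod(c) for c in combinations(svars, n))

def K(m, L, M, c):
    """Equivalent m-th order Volterra kernel per Eq. (3.40), for m > 1.
    c[(i, j)] holds the small nonlinear coefficients c_{i,j} of Eq. (3.25)."""
    s = sp.symbols(f's1:{m + 1}')
    K1 = lambda si: M(si) / L(si)
    coef = sum(sp.Rational(factorial(m - n) * factorial(n), factorial(m))
               * c.get((m - n, n), 0) * R(n, s) for n in range(m + 1))
    return -coef * sp.prod([K1(si) for si in s]) / L(sum(s))

# Illustrative system: L(D) = D^2 + 3D + 2, M(D) = 1, nonlinearity c20*y^2 + c11*y(Dy)
c20, c11 = sp.symbols('c20 c11')
L = lambda sv: sv**2 + 3*sv + 2
M = lambda sv: sp.Integer(1)
print(sp.simplify(K(2, L, M, {(2, 0): c20, (1, 1): c11})))   # matches Eq. (3.37)
```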
Another application of the derived analytical expressions is the study of linearized models with GWN inputs using the first-order Wiener kernel (apparent transfer function), discussed in Section 3.2.1. The important case of intermodulation mechanisms described by bilinear terms in differential systems (e.g., the "minimal" model of insulin-glucose interactions discussed in Sections 1.4 and 6.4) is discussed in Section 3.2.2. This type of model may have broad applications in physiological autoregulation and neuronal dynamics (see Section 8.1).

Example 3.1 The Riccati Equation

As a first analytical example, consider the well-studied Riccati equation:

Dy + ay + by² = cx   (3.41)
that represents the parametric model of a system exhibiting a nonlinearity in the squared output term by². As discussed in Section 1.4, this parametric model can be also cast in a modular (block-structured) form of a nonlinear feedback model with a linear feedforward component and a static nonlinear (square) negative feedback component (see Figure 1.8). The equivalent Volterra kernels of this nonlinear parametric model can be obtained with the generalized harmonic balance method presented above. In order to simplify the resulting analytical expressions and limit the equivalent Volterra model to the second order, let us assume that |b| ≪ 1 (which is also required to secure stability of the system for a broad variety of inputs). Then the first-order Volterra kernel is found to be in the Laplace domain [cf. Equation (3.33)]:
K_1(s) = c/(s + a)   (3.42)

or in the time domain:

k_1(τ) = c e^{−aτ} u(τ)   (3.43)
where u(τ) denotes the step function (0 for τ < 0, and 1 for τ ≥ 0). The second-order Volterra kernel is found to be in the two-dimensional Laplace domain [cf. Equation (3.37)]:
K_2(s_1, s_2) = −bc² / [(s_1 + a)(s_2 + a)(s_1 + s_2 + a)]   (3.44)
By two-dimensional inverse Laplace transform, we find the expression for the second-order Volterra kernel in the time domain:
k_2(τ_1, τ_2) = −(bc²/a) e^{−a(τ_1+τ_2)} [e^{a·min(τ_1,τ_2)} − 1] u(τ_1) u(τ_2)   (3.45)
It is evident that the equivalent nonparametric model (even the second-order approximation for |b| ≪ 1) is much less compact than its parametric counterpart given by the Riccati equation (3.41). Nonetheless, it should be also noted that the simplification of the model specification task is the primary motivation for using the nonparametric approach. Therefore, in the absence of sufficient prior knowledge about the system functional characteristics, the nonparametric approach will yield an inductive model "true to the
data." On the other hand, the parametric approach requires firm prior knowledge to allow the specification of the parametric model form. Clearly, whenever this is possible, the parametric approach is more advantageous. Unfortunately, this is rarely possible in a realistic physiological context, although this fact has not prevented many questionable attempts in the past to postulate parametric models on very thin (or even nonexistent) supporting evidence. One of the primary thrusts of this book is to advocate that this questionable practice be discontinued, as it engenders serious risks for the proper development of scientific understanding of physiological function. (First impressions, even when erroneous, are hard to change.) Parametric models ought to be a desirable objective (because of their specific advantages discussed herein) but their derivation must be based on firm evidence. The advocated approach involves the transition from initial nonparametric models to equivalent parametric models, as discussed in Section 3.4.

The mathematical relation between the Riccati equation and the Volterra functional expansion can be also explored through the integral equation that results from integration of the Riccati equation after multiplication (throughout) with the exponential function exp(at). Applying integration-by-parts on the derivative term, we obtain the integral equation (for t ≥ 0)

y(t) = y(0) e^{−at} − b ∫_0^t y²(λ) e^{−a(t−λ)} dλ + c ∫_0^t x(λ) e^{−a(t−λ)} dλ   (3.46)
which also includes the effect of the initial condition y(0). For noncausal inputs (extending to −∞), the dependence on the initial value can be eliminated and the upper integration limits can be extended to infinity. Then, the integral equation (3.46) yields [by iterative substitution of the approximate linear solution y_1(t) into the square integral term of the right-hand side] the following second-order Volterra-like approximation of the output:
y(t) ≅ c ∫_0^∞ e^{−aτ} x(t−τ) dτ − bc² ∫_0^∞ e^{−aλ} dλ ∫_0^∞ ∫_0^∞ e^{−a(λ_1+λ_2)} x(t−λ−λ_1) x(t−λ−λ_2) dλ_1 dλ_2   (3.47)
if terms of order b² and above can be ignored for very small values of |b|. Changing to the integration variables τ_1 = λ + λ_1 and τ_2 = λ + λ_2, we obtain the expressions for the first- and second-order Volterra kernels given by Equations (3.43) and (3.45). Note that a nonlinear feedback model (such as the one shown in Figure 1.8) is an equivalent modular (block-structured) model for the Riccati equation and can be viewed also as a feedforward model with many parallel branches that represent the iterative "looping" of the signal, akin to the iterative substitution method described above. This is depicted in Figure 3.1 and can be useful in understanding the effect of nonlinear feedback in certain cases (provided the system remains stable).

Figure 3.1 On the left is the block-structured nonlinear feedback model of the Riccati equation, where L denotes the first-order (linear filter) part of Equation (3.41) and N denotes a negative square (static) nonlinearity −b(·)². On the right is the equivalent feedforward "signal-looping" model with an infinite number of parallel cascade branches that is stable for small values of b (allowing convergence of the series composed of the output contributions {y_i} of the parallel branches).
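The Riccati kernels can also be checked numerically. The sketch below (with illustrative values of a, b, c and a small |b|, as the truncation requires) integrates Equation (3.41) directly and compares the output with the second-order Volterra prediction built from the kernels of Equations (3.43) and (3.45).

```python
import numpy as np

# Illustrative parameters; |b| is kept small for the second-order truncation
a, b, c = 1.0, 0.05, 1.0
T, N, M = 0.02, 3000, 300            # time step, record length, kernel memory (lags)
rng = np.random.default_rng(1)
x = rng.standard_normal(N)

# Direct Euler integration of the Riccati equation Dy + a*y + b*y^2 = c*x
y = np.zeros(N)
for n in range(N - 1):
    y[n + 1] = y[n] + T * (-a * y[n] - b * y[n]**2 + c * x[n])

# Kernels from Eqs. (3.43) and (3.45)
tau = np.arange(M) * T
k1 = c * np.exp(-a * tau)
t1, t2 = np.meshgrid(tau, tau, indexing='ij')
k2 = -(b * c**2 / a) * np.exp(-a * (t1 + t2)) * (np.exp(a * np.minimum(t1, t2)) - 1)

# Second-order Volterra prediction (discretized convolution integrals)
yhat = np.zeros(N)
for n in range(M, N):
    xm = x[n - M + 1:n + 1][::-1]    # x(n - m) for m = 0, ..., M-1
    yhat[n] = T * (k1 @ xm) + T**2 * (xm @ k2 @ xm)
nmse = np.mean((y[M:] - yhat[M:])**2) / np.var(y[M:])
print(f"normalized MSE of the 2nd-order Volterra prediction: {nmse:.4f}")
```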
3.2.1 Apparent Transfer Functions of Linearized Models
An interesting situation arises in practice when an estimate of the "impulse response function" of a "linearized model" of the nonlinear system is obtained for a GWN input through cross-correlation or cross-spectral analysis. This is formally the first-order Wiener kernel of the system. In the frequency domain, this is commonly referred to as the "apparent transfer function" (ATF) of the system, and it yields the best linear model (in the output mean-square error sense) for GWN inputs (see Section 2.2.5). For the class of systems described by Equation (3.25), this ATF is given by [Marmarelis, 1989b]

H_1(jω) = K_1(jω) − [K_1(jω)/L(jω)] Σ_{m=1}^{∞} Σ_{n=0}^{2m+1} [(2m+1−n)! n!/m!] (P/4π)^m c_{2m+1−n,n} · ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} R_{2m+1,n}(jω, ju_1, −ju_1, …, ju_m, −ju_m) |K_1(u_1) ⋯ K_1(u_m)|² du_1 ⋯ du_m   (3.48)
where the previously derived analytical expressions (3.40) for the equivalent Volterra kernels of this class of systems are combined with Equation (2.153) under the assumption of small nonlinear coefficients. It is evident from Equation (3.48) that the ATF H_1(jω) is a power series in P and depends on the coefficients of the nonlinear terms of Equation (3.25). Note that H_1(jω) coincides with K_1(jω) (which represents the linear portion of the differential equation) for P = 0, as expected. Inspection of the function R_{2m+1,n}(jω, ju_1, −ju_1, …, ju_m, −ju_m), as defined following Equation (3.40), indicates that its values for n even do not depend on ω, whereas its values for n odd depend linearly on jω. This leads to the alternative expression for the ATF:
H_1(jω) = K_1(jω) − [K_1(jω)/L(jω)] Σ_{m=1}^{∞} [(P/2)^m/m!] Σ_{l=0}^{m} [(2m−2l+1)!(2l)! c_{2m−2l+1,2l} + (jω)(2m−2l)!(2l+1)! c_{2m−2l,2l+1}] Q_{m,l}   (3.49)

where

Q_{m,l} = (1/2π)^m ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} R_{m,l}(u_1², …, u_m²) |K_1(u_1) ⋯ K_1(u_m)|² du_1 ⋯ du_m   (3.50)
Considering the definition of R_{m,l}, we see that the constants Q_{m,l} depend on the Euclidean norms of |K_1(u)| and |uK_1(u)|. For these quantities to be finite, the degree of the polynomial L(D) in Equation (3.25) must be at least two degrees higher than the degree of the polynomial M(D). Thus, Equation (3.49) can be also written as

H_1(jω) = K_1(jω) − [K_1(jω)/L(jω)][A(P) + jωB(P)] = K_1(jω){1 − [A(P) + jωB(P)]/L(jω)}   (3.51)
where A(P) and B(P) are power series in P with coefficients dependent on {Q_{m,l}} and {c_{i,j}} for (i + j) odd [i.e., i odd and j even for the A(P) coefficients, and i even and j odd for the B(P) coefficients]. Equation (3.51) indicates that the ATF H_1(jω) is affected only by the nonlinear terms of Equation (3.25) for which (i + j) is odd, and depends on the power level P of the GWN input. Therefore, the linearized model obtained through cross-correlation or cross-spectral analysis may deviate considerably from the linear portion of the system (represented by K_1) if odd-degree nonlinear terms are present in Equation (3.25). The extent of this deviation depends on the values A and B, which in turn depend on the input power level. An illustrative example is given below [Marmarelis, 1989b].

Example 3.2 Illustrative Example

Consider as an example a system of this class described by the differential equation

(a_2D² + a_1D + a_0)y + c_{3,0}y³ + c_{2,1}y²(Dy) + c_{3,2}y³(Dy)² + c_{0,5}(Dy)⁵ = b_0 x   (3.52)

where |c_{i,j}| ≪ 1. Then, the only nonnegligible quantities {Q_{m,l}} are

Q_{1,0} = (1/2π) ∫_{−∞}^{∞} |K_1(u_1)|² du_1 = κ   (3.53)

Q_{2,1} = 2κΛ   (3.54)

Q_{2,2} = Λ²   (3.55)

using Eq. (3.50), where

Λ = (1/2π) ∫_{−∞}^{∞} |u K_1(u)|² du   (3.56)

and

|K_1(u)|² = b_0² / [(a_0 − a_2u²)² + a_1²u²]   (3.57)

Therefore, the ATF of the linearized model of this system is

H_1(jω) = b_0 [jω(a_1 − B) + (a_0 − A − a_2ω²)] / [jωa_1 + (a_0 − a_2ω²)]²   (3.58)
where

A(P) = 3c_{3,0}Q_{1,0}P + (3/2)c_{3,2}Q_{2,1}P²   (3.59)

B(P) = c_{2,1}Q_{1,0}P + 15c_{0,5}Q_{2,2}P²   (3.60)
For very small values of P, which make A and B negligible relative to a_0 and a_1, respectively, the measured ATF H_1(jω) has the poles and zeros of K_1(jω). However, for values of P for which A and B become significant relative to a_0 and a_1, respectively, two new zeros emerge for H_1(jω) and its poles double, as indicated by Eq. (3.58). These effects become, in general, more pronounced as the value of P increases (i.e., the nonlinear effects increase in importance as the GWN input power increases). To illustrate these effects, we use computer simulations for the parameter values a_0 = 2, a_1 = 3, a_2 = 1, b_0 = 1, c_{3,0} = 1/12, c_{2,1} = 1/12, c_{3,2} = −1/12, and c_{0,5} = 1/10, which yield Q_{1,0} = 1/12, Q_{2,1} = 1/36, Q_{2,2} = 1/36, and

A(P) = (P/48)(1 − P/6)   (3.61)

B(P) = (P/24)(1/6 + P)   (3.62)
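For readers who wish to reproduce the behavior shown in Figure 3.2, the following sketch evaluates |H_1(jω)|² from Equations (3.58), (3.61), and (3.62) over the 0-10 Hz range of the figure; the particular P values sampled are an illustrative choice.

```python
import numpy as np

a0, a1, a2, b0 = 2.0, 3.0, 1.0, 1.0
A = lambda P: (P / 48) * (1 - P / 6)      # Eq. (3.61)
B = lambda P: (P / 24) * (1 / 6 + P)      # Eq. (3.62)

def H1(w, P):
    """Apparent transfer function of Eq. (3.58); note the squared denominator
    L(jw)^2, which doubles the poles of the linear part."""
    L = 1j * w * a1 + (a0 - a2 * w**2)
    return b0 * (1j * w * (a1 - B(P)) + (a0 - A(P) - a2 * w**2)) / L**2

w = np.linspace(0.0, 2 * np.pi * 10, 500)  # 0-10 Hz, as in Figure 3.2
for P in (0, 10, 30, 50):
    g = np.abs(H1(w, P))**2
    print(f"P = {P:2d}: peak |H1|^2 = {g.max():.3f} "
          f"at {w[g.argmax()] / (2 * np.pi):.2f} Hz")
```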
The magnitude of the ATF, |H_1(jω)|², is shown in Figure 3.2, plotted for P values ranging from 0 to 50. Clearly, as P increases, the system shifts into an unstable mode.

Figure 3.2 Changes in the shape of the magnitude of the apparent transfer function |H_1(jω)|² for the example in the text as the value of the input power level P changes from 0 to 50. The plotted frequency range is from 0 to 10 Hz [Marmarelis, 1989].

3.2.2 Nonlinear Parametric Models with Intermodulation

As mentioned earlier, another class of nonlinear parametric models that can have wide applicability to physiological systems is defined by a system of ordinary differential equations that contain bilinear terms representing intermodulatory effects. Such effects abound in physiology and, therefore, the relation of this model form with nonparametric Volterra models may find wide and useful application. An example of such a model is the so-called "minimal model" of insulin-glucose interactions that has received considerable attention in the diabetes literature [Bergman et al., 1981; Carson et al., 1983; Bergman & Lovejoy, 1997]. This model is comprised of two differential equations of first order, which are presented in Section 1.4 [Equations (1.18) and (1.19)]. Its equivalent Volterra kernels can be derived by use of the generalized harmonic balance method and are given by Equations (1.20)-(1.22).

In this section, we seek to generalize the modeling approach to systems with intermodulatory mechanisms represented by bilinear terms between the output variable (y) and an internal "regulatory" variable (z) that exerts the modulatory action. For instance, in the aforementioned "minimal model," the internal regulatory variable is termed "insulin action" and its dynamics are described by Equation (1.19), whereas the output is plasma glucose, whose dynamics are described by Equation (1.18) and are subject to the modulatory action depicted by the bilinear term of Equation (1.18). Additional "regulatory" variables may represent the effects of glucagon, epinephrine, free fatty acids, cortisol, and so on. This model form is also proposed in Chapter 8 as a more compact and useful representation of voltage-dependent and ligand-dependent dynamics of neuronal function (replacing the cumbersome Hodgkin-Huxley model and its many variants), which extends also to synaptic junctions.

Let us assume that the internal regulatory variables {z_i} have first-order kinetics described by the linear differential equation

dz_i/dt + a_i z_i = b_i x   (3.63)

where i denotes any number of such internal regulatory variables that are all driven by the same input x. Let us also assume that the output dynamics are described by the first-order differential equation

dy/dt + cy = y_0 + Σ_i z_i y   (3.64)

that includes as many bilinear terms as regulatory variables. Note the presence of the output basal value y_0 that is indispensable for these systems to maintain sustained operation.
The output equation (3.64) is nonlinear and gives rise to an infinite number of Volterra kernels (all orders of Volterra functionals). This model form includes the aforementioned "minimal model" as a special case for i = 1. We will now derive the equivalent Volterra kernels of this system. To this purpose, we can solve Equation (3.63) for an arbitrary (noncausal) input x(t):

z_i(t) = b_i ∫_0^∞ e^{−a_i λ} x(t−λ) dλ   (3.65)
Substitution of the integral expression (3.65) of the regulatory variables {z_i(t)} into the output Equation (3.64) yields the integrodifferential equation

dy(t)/dt + c y(t) = y_0 + Σ_i b_i y(t) ∫_0^∞ e^{−a_i λ} x(t−λ) dλ   (3.66)
that can be solved by the generalized harmonic balance method, or by substitution of a Volterra expression of the output

y(t) = k_0 + ∫_0^∞ k_1(τ) x(t−τ) dτ + ∫_0^∞ ∫_0^∞ k_2(τ_1, τ_2) x(t−τ_1) x(t−τ_2) dτ_1 dτ_2 + ⋯   (3.67)
into the integral equation
y(t) = y_0/c + Σ_i b_i ∫_0^∞ e^{−cu} y(t−u) ∫_0^∞ e^{−a_i λ} x(t−u−λ) dλ du   (3.68)

that results from convolving Equation (3.66) with its homogeneous solution exp(−ct). The latter approach yields the following Volterra kernels:

k_0 = y_0/c   (3.69)
because k_0 is obtained for a null input x(t) ≡ 0. The first-order kernel is obtained by equating first-order terms [i.e., terms containing one input signal x(t)]:

k_1(τ) = k_0 Σ_i b_i ∫_0^τ e^{−cu − a_i(τ−u)} du
       = (y_0/c) Σ_i [b_i/(a_i − c)] [e^{−cτ} − e^{−a_iτ}]   (3.70)
It is evident from Equation (3.70) that the kinetic constants {a_i} of all regulatory mechanisms are imprinted on the first-order Volterra kernel, defining distinct time constants in the first-order dynamics with relative importance proportional to b_i/(a_i − c). The second-order Volterra kernel is obtained by equating second-order terms (i.e., terms containing two input factors) after substitution of y(t) from Equation (3.67) into Equation (3.68):
k_2(τ_1, τ_2) = (1/2){ Σ_i b_i ∫_0^{τ_m} e^{−cλ − a_i(τ_1−λ)} k_1(τ_2−λ) dλ + (τ_1 ↔ τ_2) }
= (y_0/2c) Σ_i Σ_j [b_i b_j/(a_j − c)] { [e^{−cτ_1−a_iτ_2} + e^{−cτ_2−a_iτ_1}] (e^{a_iτ_m} − 1)/a_i − [e^{−a_iτ_1−a_jτ_2} + e^{−a_iτ_2−a_jτ_1}] [1 − e^{−(c−a_i−a_j)τ_m}]/(c − a_i − a_j) }   (3.71)
where i and j are the indices of all regulatory variables, and τ_m = min(τ_1, τ_2). This expression for the second-order kernel demonstrates the complexity that arises in the nonlinear dynamics of the system because of the bilinear terms. Note that, if |b_i| ≪ 1 for all i, the higher-order kernels (higher than second) become negligible and the system can be approximated well by a second-order Volterra model. To get a better feeling for the shape of the second-order kernel, we may simplify the expression (3.71) by looking at the diagonal (τ_1 = τ_2 = τ_m = τ):

k_2(τ, τ) = (y_0/c) Σ_i Σ_j [b_i b_j/(a_j − c)] { e^{−cτ} (1 − e^{−a_iτ})/a_i − e^{−(a_i+a_j)τ} [1 − e^{−(c−a_i−a_j)τ}]/(c − a_i − a_j) }   (3.72)
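The closed-form kernels of Equations (3.70) and (3.72) are straightforward to evaluate numerically. The sketch below assumes two regulatory variables with illustrative kinetic constants (chosen so that no denominator in the expressions vanishes).

```python
import numpy as np

# Illustrative constants: two regulatory variables (a_i, b_i), output rate c, basal y0
a = np.array([0.4, 2.0])
b = np.array([0.1, -0.05])
c, y0 = 1.0, 1.0
tau = np.linspace(0.0, 10.0, 201)

# First-order kernel, Eq. (3.70)
k1 = (y0 / c) * sum(bi / (ai - c) * (np.exp(-c * tau) - np.exp(-ai * tau))
                    for ai, bi in zip(a, b))

# Diagonal of the second-order kernel, Eq. (3.72)
k2_diag = np.zeros_like(tau)
for ai, bi in zip(a, b):
    for aj, bj in zip(a, b):
        k2_diag += (y0 / c) * (bi * bj / (aj - c)) * (
            np.exp(-c * tau) * (1 - np.exp(-ai * tau)) / ai
            - np.exp(-(ai + aj) * tau)
              * (1 - np.exp(-(c - ai - aj) * tau)) / (c - ai - aj))

print("k1 peak:", k1.max(), " |k2| diagonal peak:", np.abs(k2_diag).max())
```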
The purpose of this example is to demonstrate the fact that the presence of a bilinear term gives rise to rather complicated nonlinear dynamics, as attested by the expression for the second-order Volterra kernel. Nonetheless, these dynamics can be measured directly by means of kernel estimates obtained from input-output data collected in a natural or experimental context (see Chapter 2). This gives us the powerful capability to validate parametric models that are widely accepted (e.g., the aforementioned "minimal model") or modify them properly if their validity cannot be affirmed. This approach can also be used to delineate the various regulatory mechanisms from the structure of the estimated Volterra kernels, although this task requires considerable sophistication of analysis, an example of which was presented above. It should be emphasized that, despite the complexity of the task, we finally have the realistic prospect of deriving nonlinear parametric models from physiological data in a methodical manner that removes the heuristic and desultory character of previous efforts. As elaborated in Section 3.4, we can commence with kernel measurements (a nonparametric model) obtained from the data and transition to an equivalent parametric model using the rigorous mathematical analysis presented in this section.

3.3 DISCRETE-TIME VOLTERRA KERNELS OF NARMAX MODELS
The fundamental relations between discrete-time Volterra kernels and the parameters of equivalent NARMAX models (nonlinear difference equations) provide the methodological key for utilizing the equivalence between parametric and nonparametric models in the context of actual applications (sampled data). This equivalence can be used to facilitate the determination of the structure of the NARMAX model that is appropriate for a given system and to achieve accurate estimation of the NARMAX model parameters, using input-output sampled data. These relations also make possible the transition from nonparametric to parametric models in discrete time [Zhao & Marmarelis, 1998]. The discrete-time finite-memory Volterra model is given by (for k̃_0 = 0)
y(n) = Σ_{m_1=0}^{M−1} k̃_1(m_1) x(n−m_1) + Σ_{m_1=0}^{M−1} Σ_{m_2=0}^{M−1} k̃_2(m_1, m_2) x(n−m_1) x(n−m_2) + ⋯
     + Σ_{m_1=0}^{M−1} Σ_{m_2=0}^{M−1} ⋯ Σ_{m_r=0}^{M−1} k̃_r(m_1, m_2, …, m_r) x(n−m_1) x(n−m_2) ⋯ x(n−m_r) + ⋯   (3.73)
where the tilde distinguishes the discrete from the continuous Volterra kernels in this chapter, the discrete values of the rth-order kernel are assumed multiplied by T^r for simplicity of notation (T is the sampling interval), and M is the length of the discretized system memory or the memory-bandwidth product of the system (i.e., the maximum lag corresponding to significant kernel values is M − 1). We can estimate the kernels k̃_1, k̃_2, and k̃_3 from input-output data using the methods presented in Chapter 2 for a finite-order model [max(r) = q < ∞]. Note that there are (M + q)!/(M!q!) discrete kernel values to be estimated in the qth-order model of Equation (3.73). This is a large number of values to be estimated when M is large (50-100 for a typical physiological system) and q ≥ 2, a fact that can lead to reduced accuracy when the available input-output data are limited and corrupted with noise. In order to decrease the data-record length requirement and increase the accuracy of kernel estimation, the kernel expansion method was introduced in Section 2.3.1, which generally leads to considerable computational savings and increased estimation accuracy. The current implementation of choice employs L discrete-time Laguerre functions to expand the Volterra kernels of the system, as discussed in Section 2.3.2. Because L can be much smaller than M for most physiological systems (owing to the asymptotically exponential form of their kernel functions), the savings in kernel representation/estimation are significant, being approximately proportional to (M/L)^q for a qth-order model. In addition, this method is rather robust in the presence of noise, a fact that provides additional motivation for the indirect NARMAX model estimation from Volterra kernel measurements presented below [Zhao & Marmarelis, 1998]. The general NARMAX model is a multinomial nonlinear difference equation of the form

A(Δ)y(n) − f[x(n), …, Δ^Q x(n); Δy(n), …, Δ^R y(n)] = B(Δ)x(n)   (3.74)
where Δ denotes the delay/shift operator, i.e., Δ^m x(n) = x(n − m); f(·) is a multinomial function without linear terms; and A(·) and B(·) are polynomials in Δ representing linear difference operators. In order to determine the discrete-time first-order kernel of the above NARMAX model, we follow the discrete-time version of the "generalized harmonic balance" method presented in Section 3.2 for continuous-time models. This method yields discrete-time kernel expressions in the multidimensional z-domain in terms of the coefficients of the equivalent nonlinear difference-equation model of Equation (3.74). To obtain the first-order discrete-time Volterra kernel, we consider the input x(n) = z^n, where z = exp(sT) is the complex variable of the z-domain (the discrete counterpart of the Laplace domain). If we substitute this input into the Volterra model (3.73) and subsequently insert the resulting expressions for y(n) and x(n) into the NARMAX model (3.74), then an expression of K̃_1(z) in terms of the parameters of the NARMAX model can be obtained by balancing all the terms with factor z^n in Equation (3.74). The resulting first-order discrete Volterra kernel is expressed in terms of A(z^{−1}) and B(z^{−1}), which constitute the linear part of the NARMAX model, as
K̃_1(z) = B(z^{−1}) / A(z^{−1})   (3.75)
Note the similarity of this result with its continuous counterpart in Equation (3.33). This approach can be extended to the second-order discrete Volterra kernel by considering the input x(n) = z_1^n + z_2^n and repeating the process described above, balancing the terms in Equation (3.74) that contain the factor (z_1^n z_2^n) in order to obtain an expression for K̃_2(z_1, z_2). In general, for the rth-order discrete Volterra kernel evaluation, the input x(n) = z_1^n + ⋯ + z_r^n is considered and all the terms with the factor (z_1^n ⋯ z_r^n) are balanced to obtain an expression for K̃_r(z_1, …, z_r). It has been shown [Zhao & Marmarelis, 1998] that the resulting high-order discrete Volterra kernels can be expressed in terms of the first-order kernel and the coefficients of the multinomial function f(·). Specifically, if we decompose the function f(·) into nonlinear terms g_r(·) that represent distinct degrees of nonlinearity as

f(·) = g_2(·) + g_3(·) + ⋯   (3.76)

where

g_2(x, …, Δ^Q x; Δy, …, Δ^R y) = Σ_{j_1=0}^{Q} Σ_{j_2=0}^{Q} θ^{2,1}_{j_1,j_2} Δ^{j_1}x Δ^{j_2}x + Σ_{j=0}^{Q} Σ_{m=1}^{R} θ^{2,2}_{j,m} Δ^j x Δ^m y + Σ_{m_1=1}^{R} Σ_{m_2=1}^{R} θ^{2,3}_{m_1,m_2} Δ^{m_1}y Δ^{m_2}y   (3.77)

g_3(x, …, Δ^Q x; Δy, …, Δ^R y) = Σ_{j_1=0}^{Q} Σ_{j_2=0}^{Q} Σ_{j_3=0}^{Q} θ^{3,1}_{j_1,j_2,j_3} Δ^{j_1}x Δ^{j_2}x Δ^{j_3}x + Σ_{j=0}^{Q} Σ_{m_1=1}^{R} Σ_{m_2=1}^{R} θ^{3,2}_{j,m_1,m_2} Δ^j x Δ^{m_1}y Δ^{m_2}y + Σ_{j_1=0}^{Q} Σ_{j_2=0}^{Q} Σ_{m=1}^{R} θ^{3,3}_{j_1,j_2,m} Δ^{j_1}x Δ^{j_2}x Δ^m y + Σ_{m_1=1}^{R} Σ_{m_2=1}^{R} Σ_{m_3=1}^{R} θ^{3,4}_{m_1,m_2,m_3} Δ^{m_1}y Δ^{m_2}y Δ^{m_3}y   (3.78)
Note that the nonlinear term g_2(·) determines the second-order kernel, g_2(·) and g_3(·) together determine the third-order kernel, and so on. The expressions for the second-order and third-order discrete Volterra kernels are obtained after considerable analytical derivations in the z-domain (for details see [Zhao & Marmarelis, 1998]):

K̃_2(z_1, z_2) = [1/(2A(z_1^{−1}z_2^{−1}))] { Σ_{j_1=0}^{Q} Σ_{j_2=0}^{Q} θ^{2,1}_{j_1,j_2} (z_1^{−j_1}z_2^{−j_2} + z_1^{−j_2}z_2^{−j_1}) + Σ_{j=0}^{Q} Σ_{m=1}^{R} θ^{2,2}_{j,m} [z_1^{−m}z_2^{−j} K̃_1(z_1) + z_1^{−j}z_2^{−m} K̃_1(z_2)] + Σ_{m_1=1}^{R} Σ_{m_2=1}^{R} θ^{2,3}_{m_1,m_2} (z_1^{−m_1}z_2^{−m_2} + z_1^{−m_2}z_2^{−m_1}) K̃_1(z_1) K̃_1(z_2) }   (3.79)

K̃_3(z_1, z_2, z_3) = [1/(3! A(z_1^{−1}z_2^{−1}z_3^{−1}))] { 2 Σ_{j=0}^{Q} Σ_{m=1}^{R} θ^{2,2}_{j,m} Σ_{(i_1,i_2,i_3)} z_{i_1}^{−j} (z_{i_2}z_{i_3})^{−m} K̃_2(z_{i_2}, z_{i_3})
+ 2 Σ_{m_1=1}^{R} Σ_{m_2=1}^{R} θ^{2,3}_{m_1,m_2} Σ_{(i_1,i_2,i_3)} z_{i_1}^{−m_1} (z_{i_2}z_{i_3})^{−m_2} K̃_1(z_{i_1}) K̃_2(z_{i_2}, z_{i_3})
+ Σ_{j_1=0}^{Q} Σ_{j_2=0}^{Q} Σ_{j_3=0}^{Q} θ^{3,1}_{j_1,j_2,j_3} Σ_{(i_1,i_2,i_3)} z_{i_1}^{−j_1} z_{i_2}^{−j_2} z_{i_3}^{−j_3}
+ Σ_{j=0}^{Q} Σ_{m_1=1}^{R} Σ_{m_2=1}^{R} θ^{3,2}_{j,m_1,m_2} Σ_{(i_1,i_2,i_3)} z_{i_1}^{−j} z_{i_2}^{−m_1} z_{i_3}^{−m_2} K̃_1(z_{i_2}) K̃_1(z_{i_3})
+ Σ_{j_1=0}^{Q} Σ_{j_2=0}^{Q} Σ_{m=1}^{R} θ^{3,3}_{j_1,j_2,m} Σ_{(i_1,i_2,i_3)} z_{i_1}^{−j_1} z_{i_2}^{−j_2} z_{i_3}^{−m} K̃_1(z_{i_3})
+ Σ_{m_1=1}^{R} Σ_{m_2=1}^{R} Σ_{m_3=1}^{R} θ^{3,4}_{m_1,m_2,m_3} Σ_{(i_1,i_2,i_3)} z_{i_1}^{−m_1} z_{i_2}^{−m_2} z_{i_3}^{−m_3} K̃_1(z_1) K̃_1(z_2) K̃_1(z_3) }   (3.80)
where Σ_{(i_1,i_2,i_3)} denotes the sum over all distinct triplets of the indices (i_1, i_2, i_3) taking values from 1 to 3. In practice, the estimated discrete Volterra kernels can be used to identify the equivalent NARMAX model based on Equations (3.75), (3.79), and (3.80), as outlined in the following section.
3.4 FROM VOLTERRA KERNEL MEASUREMENTS TO PARAMETRIC MODELS

Comparative studies of parametric models in the form of nonlinear differential or difference equations and nonparametric models in the form of Volterra functional expansions have been motivated by the potential benefits that may accrue from the synergistic use of these two modeling approaches [Barrett, 1963, 1965; Korenberg, 1973b,c; Billings & Leontaritis, 1982; Billings & Voon, 1984; Marmarelis, 1982, 1989a,f, 1991; Zhao & Marmarelis, 1994a,b, 1997, 1998]. Among their main advantages, the parametric models are usually more compact and subject to easier physiological interpretation, whereas the nonparametric models do not require prior knowledge of the model structure and yield models that are true to the data. The main advantages of each approach are also the main disadvantages of its counterpart. Thus, it is sensible to combine the two approaches in a synergistic manner in order to secure the maximum set of possible advantages. Specifically, if insufficient knowledge prevents the postulation of parametric model structures (a condition often encountered in complex physiological systems), then the nonparametric approach can be used first to obtain the Volterra kernels of the system from input-output data in the "model-free" context of a "minimally constrained" nonparametric model and, subsequently, a method like the one proposed below can be used to obtain an equivalent parametric model. This parametric model is first obtained in discrete time using the method outlined below, and then can be transitioned to continuous time using the "kernel invariance" method outlined in Section 3.5. In the previous section, we presented the general methodology by which we can derive the mathematical relations between discrete-time Volterra kernels (nonparametric model)
and the equivalent NARMAX model (parametric model). Note that the discrete-time Volterra model can be viewed as a nonlinear moving-average (NMA) model. In this section, we will present two different approaches for transitioning from nonparametric to parametric models. The first uses the equivalence relations derived in the previous section, and the second uses a modular form of the Volterra model that can be easily converted into continuous-time differential equations (thus facilitating physiological interpretation). Note that NARMAX models exhibit two inherent shortcomings: (1) the presence of nonlinear autoregressive terms leads to considerable prediction errors even for small estimation errors in the autoregressive coefficients or in the presence of considerable noise; and (2) physiological interpretation of the NARMAX model is inhibited by the difficulty of transitioning to continuous time (i.e., deriving equivalent nonlinear differential equations), as discussed in Section 3.5.

In the first approach, we use the inverse z-transform of Equation (3.75) to obtain a linear difference equation involving the first-order discrete Volterra kernel in the time domain:

Σ_{i=0}^{I} a_i k̃_1(n−i) = Σ_{j=0}^{J} b_j δ(n−j)   (3.81)
where δ(·) is the Kronecker delta, a_0 ≡ 1, and the coefficients a_i and b_j describe the linear portion of the NARMAX model [Equation (3.74)]:

A(z^{−1}) = Σ_{i=0}^{I} a_i z^{−i}   (3.82)

B(z^{−1}) = Σ_{j=0}^{J} b_j z^{−j}   (3.83)
If k̃_1(n) has been previously estimated from input-output data, then we can use existing ARMAX model-order determination and parameter-estimation methods to determine the appropriate values of I and J in Equation (3.81) and to estimate the coefficients {a_i, b_j} of Eq. (3.81), which is the linear part of the NARMAX model of Equation (3.74). Similarly, the mathematical relation between the second-order discrete Volterra kernel and the second-order parameters of the NARMAX model given in Equation (3.79) in the two-dimensional z-domain can be converted to the time domain in the form of the two-dimensional autoregressive relation

Σ_{i=0}^{I} a_i k̃_2(n_1−i, n_2−i) = θ^{2,1}_{n_1,n_2} + (1/2) Σ_{m=1}^{R} [θ^{2,2}_{n_1,m} k̃_1(n_2−m) + θ^{2,2}_{n_2,m} k̃_1(n_1−m)] + (1/2) Σ_{m_1=1}^{R} Σ_{m_2=1}^{R} θ^{2,3}_{m_1,m_2} [k̃_1(n_1−m_1) k̃_1(n_2−m_2) + k̃_1(n_1−m_2) k̃_1(n_2−m_1)]   (3.84)
Note that Equation (3.84) is linear in terms of the unknown parameters {θ^{2,1}, θ^{2,2}, θ^{2,3}} and, if k̃_1(n) and k̃_2(n_1, n_2) have been previously estimated from input-output data, we can use linear regression methods (like the ones discussed in Section 3.1) to determine the order and estimate the coefficients of the second-order terms of the NARMAX model [Zhao & Marmarelis, 1998].
Although laborious, this procedure can be extended to the third-order Volterra kernel expression of Equation (3.80), in order to obtain estimates of the third-order coefficients of the NARMAX model nonlinearity in Equation (3.78). This procedure can be continued until the identification task is completed by including all the terms of the NARMAX model. It is evident that this procedure involves increasingly complex kernel expressions as the order of nonlinearity increases, and its practical efficacy is contingent upon our ability to obtain accurate kernel estimates of the required order from input-output data. We term this method "kernel-based realization" (KBR) and we illustrate its use with an example below. Note that the term "realization" indicates the process by which a parametric model is developed from data or from an equivalent nonparametric model. The reader may be wondering why we appear to identify the system twice (first deriving a nonparametric model in the form of Volterra kernels, and then developing an equivalent parametric model in NARMAX form). The reason is that we ultimately seek to obtain a parametric NARMAX model, and the nonparametric model (Volterra kernels) is estimated as an intermediate step, because we believe that this "indirect" procedure offers significant practical advantages in terms of robustness to noise and accuracy of estimation (as illustrated below).
Example 3.3 Illustrative Example

A computer-simulated example of a NARMAX model (which has been previously published in the literature) is used to demonstrate the efficacy of the proposed indirect (KBR) approach and to compare it with the PESR method previously proposed by Billings and Voon (1986). In the interest of simplicity, the example involves only a simple nonlinearity, two output lags, and output-additive noise:

y(n) = 0.5y(n−1) − 0.3y(n−2) + 0.2x(n)y(n−1) + 0.4x(n)
z(n) = y(n) + w(n)   (3.85)
where the input x(n) is a uniformly distributed white-noise signal with |x(n)| ≤ √3 (i.e., unit variance), and the output-additive noise w(n) is an independent GWN sequence with variance of 0.2, resulting in a signal-to-noise ratio (SNR) of 0 dB in the output data. The key Equation (3.84) becomes in this case

k̃_2(n_1, n_2) = 0.5 k̃_2(n_1−1, n_2−1) − 0.3 k̃_2(n_1−2, n_2−2) + 0.1[k̃_1(n_1−1) + k̃_1(n_2−1)]   (3.86)
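The regression step of the KBR method can be demonstrated directly on this example. The sketch below computes noise-free kernels of the model of Equation (3.85), with k̃_1 obtained from the linear part and k̃_2 from the recursion of Equation (3.86), and then recovers the coefficients by the linear regression implied by Equation (3.84); in practice the kernels would, of course, be estimated from input-output data, and the memory length M is an illustrative choice.

```python
import numpy as np

M = 25   # kernel memory (lags), an illustrative choice
# First-order kernel of Eq. (3.85): k1(n) = 0.5 k1(n-1) - 0.3 k1(n-2) + 0.4 delta(n)
k1 = np.zeros(M)
for n in range(M):
    k1[n] = (0.5 * (k1[n - 1] if n >= 1 else 0.0)
             - 0.3 * (k1[n - 2] if n >= 2 else 0.0)
             + (0.4 if n == 0 else 0.0))

# Second-order kernel from the recursion of Eq. (3.86)
k2 = np.zeros((M, M))
for n1 in range(M):
    for n2 in range(M):
        k2[n1, n2] = (0.5 * (k2[n1 - 1, n2 - 1] if min(n1, n2) >= 1 else 0.0)
                      - 0.3 * (k2[n1 - 2, n2 - 2] if min(n1, n2) >= 2 else 0.0)
                      + 0.1 * ((k1[n1 - 1] if n1 >= 1 else 0.0)
                               + (k1[n2 - 1] if n2 >= 1 else 0.0)))

# KBR regression step: recover the coefficients of Eq. (3.86) from kernel values
rows, target = [], []
for n1 in range(2, M):
    for n2 in range(2, M):
        rows.append([k2[n1 - 1, n2 - 1], k2[n1 - 2, n2 - 2],
                     k1[n1 - 1] + k1[n2 - 1]])
        target.append(k2[n1, n2])
theta, *_ = np.linalg.lstsq(np.array(rows), np.array(target), rcond=None)
print(theta)   # ~ [0.5, -0.3, 0.1]
```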
The results obtained from applying the KBR method are compared with the results from the PESR method in Table 3.1, for a data record of 500 input-output data points.
Table 3.1 Estimation results using the "prediction-error stepwise-regression" (PESR) method and the "kernel-based realization" (KBR) method, for the NARMAX model of Equation (3.85) and a data record of 500 input-output data points (SNR = 0 dB for GWN output-additive noise)

Equation terms    Coefficient values    PESR estimates    KBR estimates
y(n − 1)                0.5                  0.46              0.48
y(n − 2)               −0.3                 −0.25             −0.30
x(n)                    0.4                  0.42              0.42
x(n)y(n − 1)            0.2                  0.16              0.16
It is evident that the kernel-based (KBR) method yields more accurate parameter estimates for the linear terms of this example, while it also facilitates the model-order determination task. The requisite kernel estimates for the KBR method were obtained via the Laguerre expansion technique discussed in Section 2.3.2. Note that the estimation error for the coefficient of the nonlinear term is the same in both methods for this example.

The presented results for the KBR method demonstrate that NARMAX model specification and estimation can be accomplished indirectly using Volterra kernel estimates obtained from input-output data as an intermediate step. The KBR method relies on accurate kernel estimates (obtained, for instance, by use of the Laguerre expansion technique discussed in Section 2.3.2) and exhibits the following main advantages over previous methods: (1) recursive or iterative algorithms are not required and, therefore, there are no problems of convergence as in previous algorithms (for instance, PESR); (2) the terms of different orders of nonlinearity are treated separately, so it is easier to establish the structure of the model (order determination); and (3) the KBR approach may yield more accurate parameter estimates in the presence of noise, because the effective SNR of the kernel estimates is usually higher than the SNR of the output data. The main disadvantage is that application of the KBR method is limited to third-order models, because of the impracticality of estimating kernels of order higher than third and the complexity of the analytical expressions relating higher-order kernels to NARMAX model coefficients. The KBR method can be used in a practical context (up to third-order models) to make the transition from nonparametric (kernel-based) to parametric (NARMAX) models, thus securing some of the combined advantages of the two modeling approaches. However, there is a second approach to transitioning from nonparametric to parametric models (discussed below) that also allows easier transition to continuous-time parametric models (which are more amenable to physiological interpretation), unlike the cumbersome transition of NARMAX models to continuous-time nonlinear differential equations discussed in Section 3.5.

The second approach seeks the parsimonious representation of the estimated discrete-time Volterra kernels in terms of a set of linear filters that may be viewed as a basis of functions (akin to the Laguerre expansion technique discussed in Section 2.3.2). This leads to the modular model form of Figure 2.16, where each of the discrete-time filters h_j(n) generates an output u_j(n) that is fed into a multiinput static nonlinearity. Each of these filters can be easily converted into an equivalent continuous-time differential equation following the "impulse invariance method" described in Section 3.5. This offers an alternative "realization" approach whereby the nonparametric model (obtained in the form of discrete Volterra kernels) can be converted to a parameterized modular model. In order to accomplish this, we fit an ARMAX model (linear difference equation) to each of the estimated filter data h_j(n):

h_j(n) = α_{j,1} h_j(n−1) + ⋯ + α_{j,p_j} h_j(n−p_j) + β_{j,0} δ(n) + ⋯ + β_{j,m_j} δ(n−m_j) + ε_j(n)   (3.87)
following the linear estimation procedures outlined in Section 3.1. Having estimated the coefficients {α_{j,1}, …, α_{j,p_j}, β_{j,0}, …, β_{j,m_j}} for the selected model order (p_j, m_j), we proceed with the estimation of the multiinput static nonlinearity:

y(n) = f[u_1(n), …, u_j(n), …, u_H(n)]   (3.88)
where

u_j(n) = Σ_{m=0}^{M−1} h_j(m) x(n−m)   (3.89)
This realization method is completed when the static nonlinearity is parameterized. For instance, a multinomial representation,

f(u_1, …, u_H) = Σ_{j_1,…,j_H} c_{j_1,…,j_H} u_1^{j_1} ⋯ u_H^{j_H}   (3.90)

was discussed in Section 2.3.2, in connection with the Laguerre expansion technique.
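A minimal sketch of this second realization approach is given below: two assumed filter estimates h_j(n) generate the outputs u_j(n) of Equation (3.89), and the multinomial coefficients of Equation (3.90) are then recovered by linear least squares. The filter shapes and the second-degree nonlinearity are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 2000, 40
x = rng.standard_normal(N)

# Assumed filter-bank estimates h_j(n): two decaying discrete-time filters
n = np.arange(M)
H = np.column_stack([0.5**n, n * 0.6**n])

# Filter outputs u_j(n), Eq. (3.89)
U = np.column_stack([np.convolve(x, H[:, j])[:N] for j in range(2)])

# Synthetic output from an assumed 2nd-degree static nonlinearity, Eq. (3.88)
y = 1.0 * U[:, 0] - 0.5 * U[:, 1] + 0.3 * U[:, 0]**2 + 0.2 * U[:, 0] * U[:, 1]

# Recover the multinomial coefficients of Eq. (3.90) by linear least squares
X = np.column_stack([U[:, 0], U[:, 1], U[:, 0]**2, U[:, 0] * U[:, 1], U[:, 1]**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # ~ [1.0, -0.5, 0.3, 0.2, 0.0]
```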
3.5 EQUIVALENCE BETWEEN CONTINUOUS AND DISCRETE PARAMETRIC MODELS Although the analysis of discrete-time (sampled) data yields discrete parametric models, the actual physiological processes described by these models occur in continuous time. The physical or physiological interpretation of the estimated model parameters may change considerably between discrete-time (difference equation) and continuous-time (differential equation) representations, especially for nonlinear models. Thus, methods for defining the equivalence between discrete and continuous parametric models are needed to assist in model development and interpretation. The "impulse invariance" method has been introduced for this purpose in the linear case, as discussed below. We have extended this method to the nonlinear case by introducing the "kernel invariance" method that offers the means for defming the equivalence between continuous and discrete nonlinear models [Zhao & Marmarelis, 1997]. This method uses the general Volterra model form as a canonical representation of nonlinear systems and requires that the continuous-time kernels (corresponding to the differential equation) be identical to the discrete-time kemels (corresponding to the equivalent difference equation) at the sampling points. The actual implementation of this method may become unwieldy in the general case, but it appears to be tractable in certain cases of low-order nonlinear systems, as described below. We begin with the presentation ofthe "impulse invariance" method for linear time-invariant systems that is based on the principle of maintaining the direct correspondence between the poles in the continuous s-domain with the poles in the discrete z-domain for the respective transfer functions, so that the values of the respective impulse response functions are the same at the sampling points. According to this method, the partial fraction expansions of the two transfer functions H(s) andH(z) (using their P poles) are equated, so that the discrete impulse response function h(m) is the sampled version of its continuous counterpart h( 'T), where 'T = mT (the sampling interval T is assumed to be sufficiently short in order to avoid aliasing). Thus, for P distinct poles, we have P
H(s) = ~
1=1
Ai S-Pi
(3.91)
172
PARAMETRIC MODELING
H(z)=TI~ '"
p
(3.92)
i=1
where the p_i are the continuous poles. Equations (3.91) and (3.92) can be readily translated into equivalent linear differential and difference equation models, respectively. For example, suppose that a discrete-time impulse response function h̃(n) is described by the difference equation

h̃(n) = 0.3 h̃(n−1) − 0.02 h̃(n−2) + δ(n)   (3.93)
that has two poles determined by the z-domain transfer function

H̃(z) = 1/(1 − 0.3z^{−1} + 0.02z^{−2}) = 2/(1 − 0.2z^{−1}) − 1/(1 − 0.1z^{−1})   (3.94)
The two discrete poles 0.2 and 0.1 lead to the two continuous poles p_1 = ln(0.2)/T and p_2 = ln(0.1)/T, according to Equation (3.92) that governs the impulse-invariance method. Therefore, the s-domain transfer function of this linear system in continuous time is (for T = 1):

H(s) = (s + ln 20) / [s² − ln(0.02) s + ln(0.2) ln(0.1)]   (3.95)

leading to the equivalent differential-equation model for the system:

d²y/dt² − ln(0.02) dy/dt + ln(0.2) ln(0.1) y = dx/dt + ln(20) x   (3.96)

where y(t) and x(t) are the continuous-time output and input of this system.
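The numerical example of Equations (3.93)-(3.96) can be reproduced in a few lines; the sketch below maps the discrete poles to continuous poles via p_i = ln(z_i)/T and verifies that the sampled continuous impulse response matches the discrete one.

```python
import numpy as np

T = 1.0
zpoles = np.array([0.2, 0.1])     # discrete poles of Eq. (3.94)
A = np.array([2.0, -1.0])         # residues of the partial fraction expansion (3.94)
spoles = np.log(zpoles) / T       # continuous poles p_i = ln(z_i)/T, per Eq. (3.92)

# Denominator of H(s): s^2 - ln(0.02) s + ln(0.2) ln(0.1), as in Eq. (3.95)
den = np.poly(spoles)
print("continuous poles:", spoles)
print("denominator coefficients:", den)

# Check: the sampled continuous impulse response matches h(n) of Eq. (3.93)
t = np.arange(6) * T
h_cont = (A[:, None] * np.exp(np.outer(spoles, t))).sum(0)   # h(t) = sum A_i e^{p_i t}
h_disc = np.zeros(6)
h_disc[0] = 1.0
for k in range(1, 6):
    h_disc[k] = 0.3 * h_disc[k - 1] - 0.02 * (h_disc[k - 2] if k > 1 else 0.0)
print(h_cont)
print(h_disc)   # identical sequences: 1.0, 0.3, 0.07, ...
```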
The developed "kernel invariance" method is the conceptual and mathematical extension to the nonlinear case of the presented "impulse invariance" method, and it offers a practical approach for defining the equivalence between continuous and discrete nonlinear parametric models in the time-invariant case [Zhao & Marmarelis, 1997]. According to the kernel invariance method, the equivalence between nonlinear differential and difference equation models is defined on the basis of the principle that the discrete Volterra kernels (corresponding to nonlinear difference equations) be the sampled versions of their continuous counterparts (corresponding to the equivalent nonlinear differential equations). Implementation of this method requires derivation of the formal mathematical relations between nonlinear differential/difference equations and continuous/discrete Volterra models (kernels) that are summarized below. In the following derivations, the tilde denotes the discrete variables, functions, and operators. Consider the nonlinear differential equation model of the form

L(D)y − f(x, y, Dx, Dy, …, D^R x, D^Q y) = M(D)x   (3.97)
where D denotes the differential operator d(·)/dt, f(·) is a multivariate continuous function without linear terms, and L(D), M(D) are polynomials in D, with L of higher degree than max(R, Q) and M of lower degree than L. We seek an equivalent nonlinear difference equation model of the form

L̃(Δ)y − f̃(x, y, Δx, Δy, …, Δ^R x, Δ^Q y) = M̃(Δ)x   (3.98)
where Δ denotes the delay/shift operator, Δx(n) = x(n − 1). The key requirement that defines this equivalence according to the kernel invariance method is

k̃_i(m_1, …, m_i) = T^i k_i(m_1 T, …, m_i T)   (3.99)
where the scaling constant T^i is introduced to remove the dependence of the measured discrete kernel values on the sampling interval T. Although the concept of "kernel invariance" is rather clear and straightforward, its actual methodological implementation may be rather complex. This is demonstrated below in the relatively simple case of quadratic (i.e., second-order) models. We consider the class of nonlinear systems described by Equation (3.97), for nonlinearities of second degree:

f(x, Dx, …, D^R x; y, Dy, …, D^Q y) = Σ_{l_1=0}^{R} Σ_{l_2=0}^{R} a_{l_1,l_2} D^{l_1}x D^{l_2}x + Σ_{m_1=0}^{Q} Σ_{m_2=0}^{Q} b_{m_1,m_2} D^{m_1}y D^{m_2}y + Σ_{l=0}^{R} Σ_{m=0}^{Q} c_{l,m} D^l x D^m y   (3.100)
This class of nonlinear systems can be represented by a Volterra series expansion (if stable solutions exist) with kernels that can be analytically evaluated using the "generalized harmonic balance" method described in Section 3.2. The first two continuous Volterra kernels (in the Laplace domain) are given by the expressions

K_1(s) = M(s)/L(s)   (3.101)

K_2(s_1, s_2) = (1/2) Σ_{l_1=0}^{R} Σ_{l_2=0}^{R} a_{l_1,l_2} (s_1^{l_1} s_2^{l_2} + s_1^{l_2} s_2^{l_1}) / L(s_1+s_2)
+ (1/2) Σ_{m_1=0}^{Q} Σ_{m_2=0}^{Q} b_{m_1,m_2} (s_1^{m_1} s_2^{m_2} + s_1^{m_2} s_2^{m_1}) K_1(s_1) K_1(s_2) / L(s_1+s_2)
+ (1/2) Σ_{l=0}^{R} Σ_{m=0}^{Q} c_{l,m} [s_1^{m} s_2^{l} K_1(s_1) + s_1^{l} s_2^{m} K_1(s_2)] / L(s_1+s_2)   (3.102)
The equivalent discrete first-order kernel K̃_1(z) in the z-domain can be obtained from K_1(s) using the impulse invariance method for linear systems, discussed above. Consequently, the linear part of the equivalent difference equation can be constructed from K̃_1(z). To determine the nonlinear (quadratic) part of the equivalent difference equation, we can apply the kernel invariance method of second order. If we assume, for simplicity of
derivations, that K_1(s) and K_2(s_1, s_2) have distinct poles, then K_2(s_1, s_2) can be represented in terms of the partial fraction expansion in the (s_1, s_2) domain [Crum & Heinen, 1974]:

K_2(s_1, s_2) = Σ_{k=1}^{N} (b_{0,0} + b_{1,0} s_1 + b_{0,1} s_2) / [(s_1 − α_k)(s_2 − β_k)(s_1 + s_2 − γ_k)]   (3.103)

where (α_k, β_k, γ_k) are N distinct triplets formed by the P distinct poles of K_1(s) with repetitions (i.e., N = P³). In order to avoid singular functions in the inverse Laplace transform of the second-order kernel, we have to assume that the order of s_1 and s_2 in the denominator of K_2(s_1, s_2) is higher than the order in the numerator. If the nonlinearity of the equivalent difference equation is
where (ab ßb 'Yk) are N distinct triplets formed by the P distinct poles of Kl(s) with repetitions (i.e., N = P3). In order to avoid singular functions in the inverse Laplace transform ofthe second-order kernel, we have to assume that the order of SI and S2 in the denominator of K 2(Sb S2) is higher than the order in the numerator. Ifthe nonlinearity ofthe equivalent difference equation is A'" J~('" ~ X, uX,
R
'" A'" AQ~ ... , uAR"'. x, y, uy, ... ,u YJ-
Q
R
L L 11=0/2=0 "'""
"'"" al '"
Q
R
Q
A/ Am '" + L "'"" L"'"" b'"ml,m2 uAmI yu "' 2Y '" + L "'"" L "'"" '" Xu 2 X CI,m uA/"'Am'" Xu Y l'/2uAll "' ml=l m2=1 1=0 m=l
(3.104)
then the corresponding second-order discrete Volterra kernel in the (z_1, z_2) domain can be found by generalized harmonic balance to be

K̃_2(z_1, z_2) = (1/2) Σ_{l_1=0}^{R} Σ_{l_2=0}^{R} ã_{l_1,l_2} (z_1^{−l_1} z_2^{−l_2} + z_1^{−l_2} z_2^{−l_1}) / L̃(z_1^{−1}z_2^{−1})
+ (1/2) Σ_{m_1=1}^{Q} Σ_{m_2=1}^{Q} b̃_{m_1,m_2} (z_1^{−m_1} z_2^{−m_2} + z_1^{−m_2} z_2^{−m_1}) K̃_1(z_1) K̃_1(z_2) / L̃(z_1^{−1}z_2^{−1})
+ (1/2) Σ_{l=0}^{R} Σ_{m=1}^{Q} c̃_{l,m} [z_1^{−m} z_2^{−l} K̃_1(z_1) + z_1^{−l} z_2^{−m} K̃_1(z_2)] / L̃(z_1^{−1}z_2^{−1})   (3.105)
According to the second-order kernel-invariance method, the equivalent K̃_2(z_1, z_2) ought to conform with the partial fraction expansion of K_2(s_1, s_2) given by Equation (3.103) and, therefore, by equating respective fractions we obtain
K̃_2(z_1, z_2) = T² Σ_{k=1}^{N} { (b_{0,0} + b_{1,0} + b_{0,1}) e^{(α_k+β_k+γ_k)T} z_1^{−1}z_2^{−1} + (γ_k − α_k − β_k)[b_{1,0}(1 − e^{α_k T}z_1^{−1}) + b_{0,1}(1 − e^{β_k T}z_2^{−1})] } / { (γ_k − α_k − β_k)(1 − e^{α_k T}z_1^{−1})(1 − e^{β_k T}z_2^{−1})(1 − e^{γ_k T}z_1^{−1}z_2^{−1}) }   (3.106)
where the discrete model parameters are expressed in terms of the continuous model parameters. Comparison of Equations (3.105) and (3.106) yields the "discrete" coefficients (ã_{l_1,l_2}, b̃_{m_1,m_2}, c̃_{l,m}) in terms of the pole triplets (α_k, β_k, γ_k) and the constants (b_{0,0}, b_{1,0}, b_{0,1}) of the partial fraction expansion of K_2 shown in Equation (3.103) or, equivalently, in terms of the "continuous" coefficients (a_{l_1,l_2}, b_{m_1,m_2}, c_{l,m}) in Eq. (3.102). The relations between the two sets of coefficients define the equivalence between the quadratic parts of the nonlinear differential and difference equation models.
For more procedural details, see Zhao and Marmarelis (1997). It is evident that this procedure can become rather unwieldy, but it represents a general and powerful approach that has not been utilized heretofore. To assist the reader in understanding the procedural steps, we provide an illustrative example below.
Example 3.4 Illustrative Example

Consider the differential equation model that has a single square nonlinearity:

D²y + 2a_1 Dy + a_0 y − εy² = bx   (3.107)
This model has the following equivalent first-order and second-order Volterra kerneis in the s-domain:
b a)(s - ß)
(3.108)
b K I(SI)K1(S2)K1(s} + S2)
(3.109)
K,(s)
K 2(Sl' S2) =
= (s -
e
where a = -al + (aT - ao)I/2, and ß = -al - (aT - ao)I/2, are the two poles of the model. Note that this model has a full Volterra series expansion (i.e., an infinite number of kernels) but here we limit ourselves to a second-order approximation (truncated model) on the assumption that lei ~ 1. The partial fraction expansions for these two kerneis are
b[ (s -1 a) -
K)(s) = d
_1
(s - ß)
J
(3.110)
1
1
--------+-------(SI -
+
eil-
a)(s2 - a)(SI + S2 - a)
(SI -
a)(S2 - ß)(SI + S2 - ß)
1
1 +-------(s} - ß)(S2 - a)(s} + S2 - ß) (SI - ß)(S2 - ß)(SI + S2 - a)
K 2(sJ, S2) = -;p
1 (SI -
ß)(S2 - ß)(SI + S2 - ß)
(SI -
a)(S2 - ß)(SI + S2 - a)
1 (SI -
(3.111)
1 1
ß)(S2 - a)(sl + S2 - a)
(SI -
a)(S2 - a)(SI + S2 - ß)
where d = (a - ß). Therefore, based on first-order kernel invariance, the first-order kernel ofthe equivalent discrete model in the z-domain is M(z-I) K1(z) = l(Z-I)
b'Ite'" - eßl)z-I
(3.112)
d(1 - e aTz- 1)( 1 - eßTz- 1)
and the linear part of the equivalent nonlinear difference equation is expressed as yen) - (eaT + eß1)y(n -1) + e(a+ß)Ty(n -2)
bT(e"! - eßl) =
1
x(n -1)
(3.113)
176
PARAMETRIC MODELING
Based on second-order kernel invariance, the partial fraction expansion of the equivalent discrete second-order discrete Volterra kernel in the (ZI' Z2) domain is obtained using the general Equation (3.106). After performing the calculations to merge the various partial fractions that correspond to Equation (3.111), we obtain the following expression for the second-order discrete Volterra kernel:
,. . ; K 2(z j,
Z2)
=
eb 2 T 2
---;p1
Zl zi
A.. 1 [ 'f'1
X
+ wi A.. ( -1 + -1) + A.. Z -1 -1 + A.. Z -1 -1( -1 + -1) + A.. -2 -2] Z1 Z2 'f'3 1 Z2 'f'4 1 Z2 Z 1 Z2 'f'SZ 1 Z2 ,. . ; 1""'; 1""'; 1 1 L(zl )L(zi )L(zl zi )
(3.114)
where l(z-I) = (1 - eaTz- 1)(1 - eßTz- 1)
(3.115)
and the values of the parameters (cP1' cP2' cP3' 4>4' cPs) in terms of a, ß and T are given in Zhao and Marmarelis (1997). Thus, the nonlinear difference equation model that has the same first-order and second-order Volterra kerneis as the differential equation model ofEquation (3.107) (in accordance with the kernel invariance method) is yen) - (e aT + eß1)y(n - 1) + e(a+ß)Ty(n - 2)
2eb
- d2(ea T _ eßT)2 [82x(n - l)y(n - 1) + 83x(n - l)y(n - 2)]
e d(ea T _ eßT) [84y2(n - 1) + 28sY'(n - l)y(n - 2) + 86jJ2(n - 2)] b(e aT - eß1) =
b2() x(n -1) + ~cX2(n - 1)
(3.116)
where the parameters (()l' ()2' ()3' ()4' ()s, ()6) are expressed in terms of a, ß, and T, as detailed in Zhao and Marmarelis (1997). These expressions are rather unwieldy and are omitted here in the interest of space. We observe that the nonlinear second-degree term in the differential equation (3.107) has given to rise to several nonlinear second-degree terms in the equivalent difference equation (3.116); namely, all the square and cross-product terms between the lagged input-output values x(n - 1), yen - 1) and yen - 2) that participate in the equivalent linear (first-order) portion ofthe difference equation. Thus, although the continuous differentialequation model of Equation (3.107) has only one nonlinear term limited to the current value ofy, the equivalent difference-equation model ofEquation (3.116) has six nonlinear terms that involve the lagged values of the discrete input and output. These lagged values are also the ones present in the linear portion of the difference equation. This result has important practical implications because it shows that, if one had obtained experimentally the discrete (NARMAX) model of the nonlinear difference equation (3.116) from sampled data, then the presence of nonlinear terms involving lagged input-output values should not be necessarily interpreted as nonlinearities in the deriva-
3.5
EQUIVALENCE BETWEEN CONTINUOUS AND DISCRETE PARAMETRIC MODELS
177
tive(s) of continuous input-output variables of the actual physical process governed by the differential equation (3.107). This example demonstrates that such terms may be simply the result of a nonlinearity in y undergoing discretization in the dynamic context of the differential equation. Therefore, proper interpretation of empirically obtained nonlinear difference equation (NARMAX) models of continuous nonlinear dynamic systems requires the in-depth and rather complicated analysis presented in this section and elaborated in Zhao and Marmarelis (1997). It is the considerable complexity of this type of analysis that has confounded the fruitful application of the NARMAX approach to date and has inhibited practical parametrie modeling in a nonlinear context. In order to test the validity of the presented analytical results, the first-order and second-order Volterra kerneis for the continuous and discrete models of this example can be evaluated for specific parameters of the differential and difference equations, and compared with the kernel estimates obtained from data resulting from simulation ofthe difference equation with a Gaussian white-noise input. Specifically, in Zhao & Marmarelis (1997), we evaluate analytically the kerneis ofthe continuous model ofEquation (3.107) for parameter values ao = 12, al = 3.5, c = 2 and b = 5 using the Laplace-domain expressions of Equations (3.110) and (3.111). Their discrete counterparts of the equivalent difference equation model of Equation (3.116) are evaluated using the corresponding analytical expressions (3.112) and (3.114). The results are shown to be identical for all practical purposes, in terms of the first-order and the second-order Volterra kernels. These results confirm the validity of the proposed "kerne I invariance method" and corroborate the analytical derivations presented above. 3.5.1
Modular Representation
The material presented above makes it clear that derivation of continuous nonlinear parametric models from input-output data is a rather complicated task, unless the modular model approach that was presented in the previous section is used. According to this modular approach, the discrete Volterra kernel estimates are used to find a set of H linear filters that form a complete basis for the system at hand. Each of these filters is converted into an equivalent linear difference equation [see Equation (3.87)]. Then, each difference equation is converted into an equivalent differential equation using the impulse-invariance method. The outputs uj(t) ofthese continuous filters (which can be viewed as "internal or state variables") are fed into the multiinput static nonlinearity f{.) of the modular model to produce the continuous system output y(t). Therefore, the equivalent continuous parametric model takes the form Aj(D)uj = Bj(D)x
y
=
(for j = 1, ... , H)
f{u}, ... , UR)
(3.117) (3.118)
where Aj(D) and Bj(D) are the obtained differential operators for thejth filter [i.e., polynomials in D = d(·)/dt] using the impulse invariance method on Equation (3.87). It is evident that this continuous modular model separates the dynamics from the nonlinearities and can be viewed also as a parametrie model when the static nonlinearity f{.) of Qth degree is parameterized. The dynamics of this model are contained only in the H "state equations" (3.117), and the nonlinearities are contained only in the static nonlinearity of the "output
178
PARAMETRIC MODELING
equation" (3.118). This modular/parametric model form facilitates the study of stability and control for the subject system. Note that the jth "state equation" is of order Pi and can be written as a system of Pi first-order differential equations to confonn with the conventional fonnulation ofstate space in control theory, ifthis is deemed necessary or appropriate. In spite of the many advantages of this modular/parametric model form, the potential complexity ofthe output nonlinearity (due to high dimensionality H and/or nonlinear order Q) remains a practical problem. The key requirement for obtaining this modular/parametric model in a practical context is the parsimonious representation of the kemels in terms of an efficient expansion and its estimation from input-output data. This relates to the fundamental issue of "principal dynamic modes" discussed in Section 4.1.1 as the most efficient representation of the system dynamics.
4 Modularand Connectionist Modeling
In this chapter, we review the other two modeling approaches (modular and connectionist) that supplement the traditional parametric and nonparametric approaches discussed in the previous two chapters. The potential utility of these approaches depends on the characteristics of each specific application (as discussed in Section 1.4). Generally, the connectionist approach offers methodological advantages in a synergistic context, especially in conjunction with the nonparametric modeling approach as discussed in Sections 4.2--4.4. The primary motivation for connectionist (network-structured) modeling is the desire for greater efficiency in extracting the causal relationships between input and output data without the benefit ofprior knowledge (as discussed in Sections 4.2 and 4.3). On the other hand, the primary motivation for modular (block-structured) modeling is the desire to compact the estimated nonparametric models and facilitate their physiological interpretation (as discussed in Sections 4.1 and 4.4). The equivalence among these model forms is established through detailed mathematical analysis in order to allow their synergistic use, since each approach exhibits its own blend of advantages and disadvantages vis-ä-vis the specific requirements of a given application. Some modular forms that have been proven useful in practice and derive from nonparametric models are discussed in Section 4.1. The connectionist modeling approach and its relation to nonparametric (Volterra) models is discussed in Section 4.2, and Section 4.3 presents its most successful implementation to date (the Laguerre-Volterra network). The chapter concludes with a general model form that is advocated as the ultimate tool for open-loop physiological system modeling from input-output data (the VWM model).
4.1
MODULAR FORM OF NONPARAMETRIC MODELS
In this section, we present a general modular form of nonparametric (Volterra) models employing the concept of "principal dynamic modes," as well as specific modular forms Nonlinear Dynamic Mode/ing 0/ Physiological Systems. By Vasilis Z. Mannarelis ISBN 0-471-46960-2 © 2004 by the Institute ofElectrical and Electronics Engineers.
179
180
MODULAR AND CONNECTIONIST MODELING
(cascades and feedback) derived from nonparametric models that have been found useful in actual applications as they assist in physiological interpretation of the obtained models. Webegin with the general modular form that is equivalent to a Volterra model and employs the concept of "principal dynamic modes" in order to facilitate model interpretation. 4.1.1
Principal Dynamic Modes
The general modular form that is equivalent to a Volterra model derives from the blockstructured model ofFigure 2.16, which results from the modified discrete Volterra model of Equation (2.180). The discrete impulse response functions {bj(m)} of the filter bank constitute a complete basis for the functional space of the system kemels and are selected apriori as a general "coordinate system" for kernel representation. However, this is not generally the most efficient representation ofthe system in terms ofparsimony. The pursuit of parsimony raises the important practical issue of finding the "minimum set" of linear filters {plm)} that yield an adequate approximation of the system output and can be used in connection with the modular form of Figure 2.16 as an efficient/parsimonious model of the system. This "minimum set" is termed the "principal dynamic modes" (PDMs) ofthe system and defines a parsimonious modular model (in conjunction with its respective static nonlinearity) that is equivalent to the discrete Volterra model and shown schematically in Figure 4.1 [Marmarelis, 1997]. It is evident that the reduction in dimensionality ofthe functional space defined by the filter bank requires a criterion regarding the adequacy of the PDM model prediction for the given ensemble of input-output data. This criterion must also take into account the possible presence ofnoise in the data (see Section 2.3.1). The general approach to this question can be based on the matrix formulation of the modified discrete Volterra (MDV) model of Equation (2.180) that is given in Equation (2.183). For the correct model order R, the MDV model is (4.1)
y = VRCR + HRWo
where CR is the true vector of Volterra kernel expansion coefficients for the general basis employed in the filter-bank, VR is the input-dependent matrix of model order R, H R is the "projection" matrix defined in Equation (2.190), and Wo is the output-additive noise vector.
x
f(-)
:Y
Figure 4.1 Structure of the PDM model composed of a filter bank of H filters (PDMs) that span the dynamics of the system and whose outputs {Uj} feed into an H-input static nonlinearity f(u 1, •.• , UH) to generate the output of the model.
4.1
MODULAR FORM OF NONPARAMETRIC MODELS
181
We seek a parsimonious representation in terms of the size PR of the coefficient vector CR, so that the sum ofthe squared residuals ofthe model output prediction remains below an acceptable level 00. Let us use the following notation for the "parsimonious" model: y = '1'/)'s +
's
(4.2)
where the input-dependent matrix '1's is composed of the outputs of the H PDMs in the MDV model formulation (H < L), Ys is the vector of the respective expansion coefficients that can reconstruct the Volterra kernel estimates using the PDMs {plm)} (i = 1, ... ,H), and (s is the resulting residual vector whose Euclidean norm cannot exceed 00. The task is to find a [PR x Ps] matrix that transfonns V R into '1's:
r
vRr = '1's
(4.3)
where PR = (Q + L)!/(Q!L!) and P, = (Q + H)!/(Q!H!), so that the resulting prediction error remains below the acceptable level 00. Note that the residual vector (s contains generally a deterministic input-dependent component that results from the reduction in dimensionality of the filter bank and a stochastic component that is due to the output-additive noise but is different (in general) from HrwO• The transformation of Equation (4.3) gives rise to a different "projection" matrix for the PDM model:
Gs
=
I-
=I -
'1's['1'~ '1's]-l'1'~
VRr[r'V~VRr]-lr'Vk
(4.4)
which defines the deterministic error in the output prediction as G s V RCR, and the stochastic component of (s as: Gswo. Therefore, the prediction error for the PDM model is (s = GsVRCR + Gswo
(4.5)
which implies that the mean of the Euclidean norm of this prediction error is
E[(;(s] = c~V~G~GsVRCR + (TB . Tr{G~Gs}
(4.6)
where (TB is the variance of the output-additive noise (assumed white for simplification) and Tr{·} denotes the trace of the subject matrix. Thus, the task becomes one of finding a matrix r [which determines the matrix G s from Equation (4.4)] so that the right-hand side ofEquation (4.6) is no more than 0 0 for a given system (CR), input data (V R' H R ) , and output-additive noise variance (TB. It is evident from Equations (4.4) and (4.6) that the general solution to this problem is rather complicated, however an approximate solution can be obtained with equivalent model networks (see Section 4.2.2), and a practical solution has been proposed for second-order systems [Marmarelis & Orme, 1993; Marmarelis, 1994a, 1997], which is discussed below. For a second-order MDV model, the output signal can be expressed as a quadratic form: y(n) = v'(n)Cv(n)
(4.7)
182
MODULAR AND CONNECTJONIST MODELING
where v(n) is the augmented vector ofthe filter-bank outputs {vj(n)} at each discrete time n:
v'(n) = [1 vI(n)
v2(n)
...
(4.8)
vL(n)]
and C is the symmetrie coefficient matrix:
-
1
1
Co
2"cI(I)
2"cI(2)
...
2"cI(L)
2"c I(I)
c2(1, 1)
c2(1,2)
...
c2(1, L)
c2(2, 1)
c2(2,2)
...
c2(2, L)
c2(L, 1)
c2(L,2)
...
c2(L, L)
1
C= I 1
2"c I(2)
1 2"c I(L)
1
I
(4.9)
Because ofthe symmetry ofthe square matrix C, there always exists an orthonormal (unitary) matrix R (composed ofthe eigenvectors ofC) such that C=R'AR
(4.10)
where Ais the diagonal matrix ofthe eigenvalues (Le., a diagonal matrix with the (L + 1) distinct eigenvalues {Ai} ofmatrix C as diagonal elements). Therefore, y(n) = v'(n)R'ARv(n)
= u'(n)Au(n) L
Ii=O AiUr(n)
(4.11)
u(n) = Rv(n)
(4.12)
= where
Therefore, each component uln) is the inner product between the ith eigenvector ~i of matrix C (i.e., the ith column of matrix R) and the vector v(n) which is defined by the outputs of the filter bank at each discrete time n. The first element, ILi,O' of each eigenvector ~i corresponds to the constant that is the first element ofthe vector v(n). The other eigenvector elements define the transformed filter-bank outputs: L
uln)
= ILi,O + I
ILi,jVj(n)
(4.13)
j=l
Because the eigenvectors have unity Euclidean norm, the eigenvalues {Ai} in Equation (4.11) quantify the relative contribution of the components ur(n) to the model output.
4.1
MODULAR FORM OF NONPARAMETRIC MODELS
183
Thus, inspection of the relative magnitude of the ordered eigenvalues (by absolute value) allows us to detennine which components uren) in Equation (4.11) make significant contributions to the output y(n). Naturally, the selection ofthe "significant" eigenvalues calls for a criterion, such as IA;I being greater than a set percentage (e.g., 5%) ofthe maximum absolute eigenvalue IAol. Having made the selection of H "significant" eigenvalues, the model output signal is approximated as H-I
yen) = LA;uf(n) ;=0
= ~Ai JLi,O + ];/i(m)x(n - m) 2 H-I
[M-I
H-I
{
]
M-I
= ~Ai JL7.o + ~O 2JLi,oPi(m)x(n - m) ~l~l
}
+m~om~ti(ml)p;(m2)x(n - ml)x(n - m2)
(4.14)
where L
p;(m) = LJL;,jbj(m)
(4.15)
j=l
is the ith ''principal dynamic mode" (PDM) of the system. It is evident that the selection of H PDMs compacts the MDV model representation by reducing the number of filterbank outputs and, consequently, the dimensionality of the static nonlinearity that generates the model output. The resulting reduction in the number of free parameters for the second-order MDV model is (L - H)(L + H + 3)/2. This reduction in the number of free parameters has multiple computational and methodological benefits that become even more dramatic for higher-order models (provided that the PDMs can be detennined for higher-order models). As a matter ofpractice, the PDMs thus detennined by the secondorder model can be used for higher-order models as well, on the premise that the PDMs of a system are reflected on all kernels. However, this may not be always true and the issue of detennining the PDMs of a higher-order model is addressed more properly in the context of equivalent network models, as discussed in Section 4.2. The resulting Volterra kernels ofthe PDM model are H-I
"""" A; 1L;,o 2 k--o = L
(4.16)
;=0 H-I
"l(m) = L 2A;JL;,op;(m) ;=0 H-I
L
= L L 2A;JL;,oJL;,jbj(m) ;=0 j=l
(4.17)
184
MODULAR AND CONNECTIONIST MODELING H-l
k2(m J, m2) = I
i=O H-l
=
A. iPi(ml)p;(m2) L
L
Ii=O jl=1 I j2=1 I A.iJLi,jlJLi,j2bjl(ml)bj2(m2)
(4.18)
which indicates the effect of the elimination of the "insignificant" eigenvalues by the PDM analysis on the original expansion coefficients ofthe Volterra kemels. Specifically, the kernel expansion coefficients after the PbM analysis are H-l
Cl(j) =
Ii=O 2A.iJLi,OJLi,j
(4.19)
H-l
C2(jJ,}2) =
Ii=O A.iJLi,jlJLi,j2
(4.20)
Note that another criterion for selecting the PDMs (i.e., the "significant" eigenvalues) can be based on the mean-square value of the model output for a GWN input, after the zero-order Volterra kernel is subtracted (this represents a measure of output signal power for the broadest possible ensemble of input epochs). Implementation of this criterion requires an analytical expression of this mean-square value (J in tenns of the expansion coefficients for the orthonormal basis {bj(m)} for any H from 1 to L. This is found to be
L
8(H, P) ~ E[ [Y(n) - k O]2] = P ~ cy(j) + 2p2 )=1
}2 ?L ?L C~(jJ,j2) + {P L ~ C2(j,})
)1=1)2=1
(4.21)
)=1
where P is the power level ofthe GWN input, and Cl and C2 depend on H as indicated by Equations (4.19) and (4.20), respectively. The quantity 8(H, P) is evaluated for H= 1, ... , L, and the ratio O(H, P)/8(L, P) (that is between 0 and 1) is compared to a critical threshold value that is slightly less than unity (e.g., 0.95), representing the percentage of output signal power captured by the H PDMs for the input power level of interest. The minimum H for which the ratio exceeds the threshold value determines the number ofPDMs. The presented PDM analysis can be performed directly in the discrete-time representation of the kernels without employing any expansion on a discrete-time basis [Marmarelis, 1997]. However, the use ofa general orthonormal filter bank (Le., a complete basis of expansion) usually improves the computational efficiency ofthis task. Note that a similar type of analysis can be performed through eigen-decomposition of the second-order kernel alone and selection of its "significant" eigenvalues that determine the PDMs in the form ofthe respective eigenvectors [Westwick & Kearney, 1992, 1994]. The first-order kernel is then included as another (separate) PDM in this approach, unless it can be represented as a linear combination ofthe PDMs selected from the eigen-decomposition of the second-order kernel. This approach can be extended to higher-order kernels, whereby the eigen-decomposition is replaced by singular-value decomposition of reetangular matrices, properly constructed to represent the contribution of the respective discrete Volterra functional to the output ofthe model. The column vectors that correspond to the "significant" singular values are the selected PDMs for each order ofkemel.
4.1
185
MODULAR FORM OF NONPARAMETRIC MODELS
The fundamental tradeoff in the PDM modeling approach regards the compactness versus the accuracy of the model as the number of PDMs changes. This issue can be addressed in a rigorous mathematical framework by considering the following matrix-vector representation of the input-output relation: yen) == Yo + xI(n)k l + x{(n)K2 xI(n) + x{(n)K3 x2(n) + ... + x{(n)KRxR_I(n) (4.22)
where xr(n) == [x(n) x(n - 1) ... x(n - M+ 1)] is the vector of input epoch values affecting the output at discrete time n, k: == [kl(O) kl(I) ... k}(M - 1)] is the vector of first-order kerneI values, K 2 is a symmetric matrix defined by the values of the second-order Volterra kernel, and, generally, the vector Xr-l(n) and the matrix Kr are constructed so that they represent the contribution of the rth-order Volterra functional term. For instance, ifx:(n) == [x(n) x(n - 1)], then
x~(n)K3x2(n)
°
°
x2(n)
]
k3(0, 0, 0) 3k3(0, 0, 1) x(n)x(n _ 1) (4.23) = [x(n) x(n - 1)][ 3kiO, 1, 1) k3(l , 1, 1) [ x 2(n _ 1)
J
Following this fonnulation, we may pose the mathematical question of how to reduce the rank of the matrices K 2 , K 3 , .•. ,KR for a certain threshold of significant "singular values" when a canonical input (such as GWN) is used. Then, singular-value decomposition (SVD) ofthese matrices: K R == U~SRWR
(4.24)
where SR is the diagonal matrix of singular values, can yield the singular column vectors of matrix WR corresponding to the "significant" singular values that are greater than the specified threshold (rank reduction). All these "singular vectors" thus selected from all matrices K 2 through KR (and the vector k.) can form a new reetangular matrix that can be subjected again to rank reduction through SVD in order to arrive at the final PDMs of the system. To account for the relative importance of different nonlinear orders, we can weigh each singular vector by the square-root of the product of the respective singular value with the input power level. This process represents a rigorous way of determining the PDMs of a system (or the structural parameter H in Volterraequivalent network models) but requires knowledge of the VoIterra kernels of the system-which is impractical. It is presented here only as a general mathematical framework to assist the reader in understanding the meaning and the role of the PDMs in Volterra-type modeling. A more practical approach to the problem of selecting the PDMs is to use the estimated coefficients from the direct inversion of the MDV model estimation to form a rectangular matrix C that comprises all the column vectors of all estimated kerneI expansions [i.e., the first-order kernel expansion has one column vector, the second-order kernel expansion has L column vectors, the third-order kernel expansion has L(L + 1)/2 column vectors, etc.]. Then application ofSVD on the matrix C yields the PDMs ofthe system as the singular vectors corresponding to the most significant singular values. To account for the different effect of input power on the various orders of nonlinearity, we may weigh the column vectors of the rth-order kernel by the rth power of the root-mean-square value ofthe de-meaned input signal.
186
MODULAR AND CONNECTIONIST MODELING
Illustrative Examples. Two simulated examples are given below to i1lustrate the PDM analysis using a second-order and a fourth-order Volterra system. Examples of PDM analysis using real data from physiological systems are given in Chapter 6. In the first example, a second-order system with two PDMs is described by the output equation: yen) = 1 + vI(n) + v2(n) + vI(n)v2(n)
(4.25)
where yen) is the system output and (Vh V2) are the convolutions of the system input x(n)--a I,024-point GWN signal in this simulation-with the two impulse response functions, gl and g2, respectively, shown in Figure 4.2. The first-order and second-order kernels ofthis system are shown in Figure 4.3 and can be precisely estimated from these data via the Laguerre expansion technique (LET) (see Section 2.3.2). As anticipated by theory, these kemels can be expressed as (note that ko = 1) kI(m) = gI(m) + g2(m)
k2(mh m2) =
(4.26)
1
"2 [gI(mI)g2(m2) + gl (m2)g2(m I)]
(4.27)
Equation (4.25) can be viewed as the static nonlinearity of a system with two PDMs gl and g2 (and their corresponding outputs VI and V2) having abilinear cross-term VI • V2' This nonlinearity can also be expressed without cross-terms using the "decoupled"
0.7 0.6 0.5 0.4
\ \
\ \
0.3
\ \ \
0.2
\ \ \ \
0.1
\
v \
._._._._._._._._._ .
.. :
\
0
..\~_._._._._._._._._._.-
_._._._._._._._._._._~-~-:.-.:.:.:-:.:'::::
\ \
-0.1
\ \ \
-0.2
\
, ,----,; " ;"
"
,,"
,," "
"
.•.
~*
-0.3
0
5
10 15 TIME LAG (TAU)
20
25
Figure 4.2 Impulse response functions of the two filters {g1' g2} used in the simulation example [Marmarelis, 1997].
4.1
MODULAR FORM OF NONPARAMETRIC MODELS
187
0.875 0.75 0.625 0.5 0.375 0.25 0.125 0 -0.125 -0.25 0
5
10
15
20
25
TIME LAG (TAU)
(a)
X-MIN - 0.0 X-MAX- 25
Y-MIN- 0.0 Y-MAX- 25
Z-MIN- -0.04543 Z-MAX- 0.2165
(b) Figure 4.3 First-order (a) and second-order (b) kemels of the simulated system described by Equation (4.25) [Marmarelis. 1997J.
188
MODULAR AND CONNECTIONIST MODELING
PDMs: (gI + g2) and (gI - g2), and their corresponding outputs Ul = (VI + V2) and U2 = (VI - V2), with offsets J-Ll,O = 2 and J-L2,O = 0, respectively [see Equation (4.13)] as 1
1
1
1
2 2- -u 2 Y = -Cu 4 1 +2)2- -u 4 2 = 1 +u 1 + -u 41 42
(4.28)
Note that if we apply the convention of normalizing the PDMs to unity Euclidean norm, the resulting normalized PDMs are PI = 0.60(gl + g2) and P2 = 0.92(gl - g2) in this case, and have an associated nonlinearity:
y = 1 + 1.67ul + 0.69uY -
0.30u~
(4.29)
Application ofthe eigen-decomposition (ED) approach based on LET kernel estimates (outlined above) yields the two PDMs shown in Figure 4.4. Note that the first PDM (solid line) has the same form as the first-order kerne I in this case. As indicated previously, these two PDMs are the nonnalized sum and difference of gl and g2 ofFigure 4.2. The estimated nonlinear function (through least-squares regression) for these PDM estimates is precisely the one given by Equation (4.29) and has no cross-terms (separable). The presence of contaminating noise may introduce estimation errors (random fluctuations) in the obtained PDMs. The point is illustrated by adding independent GWN to the output signal for an SNR of6 dB and, subsequently, estimating the PDMs using LET-ED that are shown in Figure 4.5. We observe excellent noise resistance of the LET-ED ap-
0.7 0.6
0.5 0.4
....
--' ....... ...
0.3 I I
0.2
I I
I
1
0.1
,
I
... ...
...
... ,
... ...
"
' .... .......
.......
I
._
o
I _._._;_._._._._._._
\
...
""",,-----
I
,
I
\
-0.1
I \
I \
I
\
; \;
-0.2 -0.3
o
5
10 15 TIME LAG (TAU)
20
25
Figure 4.4 Two PDMs obtained via eigen-decomposition (ED) using the first two LET kernel estimates. They correspond to the normalized functions P1 = O.60(g1 + 92) and P2 = O.92(g1 - 92) associated with the nonlinearity of Equation (4.29) [Marmarelis, 1997].
4.1
189
MODULAR FORM OF NONPARAMETRIC MODELS
0.7 0.6 0.5 0.4
0.3
......................
......
"' ...... ...
/
0.2
/
I I
I
0.1
, I
I
l
\ \ \ \
I
.....................
-
.
'
t I
I \
-0.2
I
\
I
J
v
-0.3
o Figure 4.5
..........
._._._._.-:-._._._._._._._._._,_._.- _._._._._._._._._._._.
o -0.1
...
'''',
I
5
10 15 TIME LAG (TAU)
20
25
Estimated PDMs from noisy data (SNR = 6 dB) using LET-ED [Marmarelis, 1997].
proach, especially for the first PDM (solid line) that corresponds to the highest eigenvalue. The obtained output nonlinearity associated with the normalized PDMs estimated via LET-ED is y = 1.02 + 1.66uI + O.70UT - O.32u~, demonstrating the robustness of this method by comparing with the precise output nonlinearity of Equation (4.29). Although these estimates are somewhat affected by the noise in the data, the estimation accuracy of the resulting nonlinear model in the presence of considerable noise (SNR = 6dB) demonstrates the robustness of the proposed approach. Next, we examine the efficacy ofthis approach for high-order systems by considering the fourth-order system described by the output nonlinearity: _
y-VI +V2 +VI V2 -
1 3 -VI
3
+
1 4 -V2
4
(4.30)
where (VI' V2) are as defined previously. This example serves the purpose of demonstrating the performance of the proposed method in a high-order case where it is not feasible to estimate all of the system kemels in a conventional manner, whereas the proposed method may yield a complete model (i.e., one containing all nonlinear terms present in the system). Note that the quadratic part of Equation (4.30) is identical to the output nonlinearity of the previous example given by Equation (4.25). The addition of the third-order and fourth-order terms will introduce some bias into the first-order and second-order kernel estimates obtained for the truncated second-order model. This bias is likely to prevent precise estimation of the previous PDMs (gI + g2) and (gI - g2) via LET-ED based on the first two kerne 1 estimates. Furthermore, more than two PDMs will be required for a model of this system without cross-terms in the output nonlinearity. For instance, the
190
MODULAR AND CONNECTIONIST MODELING
use of three PDMs corresponding to gl and g2 and a linear combination (gl + ag2) should be adequate, because gl and g2 give rise to VI and V2, respectively, and the crossterm (VIV2) can be expressed in terms of these three PDMs as: [(VI + av2)2 - vt a2v~]/2a. Thus, three separate polynomial functions corresponding to these three PDMs can be found to fully represent the system output in a separable form. The LET-ED analysis based on the first two kernel estimates obtained from the simulated data still yields two PDMs corresponding to two significant eigenvalues Al = 2.38 and A2 = -0.65 (with the subsequent eigenvalues being A3 = 0.14, A4 = -0.12, etc.). Note that the sign of the eigenvalues signifies how the respective PDM output contributes to the system output (excitatory or inhibitory). The obtained PDMs via LET-ED for Al and A2 are shown in Figure 4.6, and resemble the PDMs of the previous example, although considerable distortion (estimation bias) is evident due to the aforementioned influence of the high-order nonlinearities. These two PDMs correspond to a bivariate output nonlinearity that only yields an approximation of the system output. This distortion can be removed by increasing the order of estimated nonlinearities. This can be done by means of equivalent network models discussed in Section 4.2. In this simulated example, the use of a fourth-order model yields the three PDMs shown in Figure 4.7 that correspond precisely to gl, g2 and a linear combination of gl and g2, as discussed above [Marmarelis, 1997]. These results are extendable to systems of arbitrary order of nonlinearity with multiple PDMs, endowing this approach with unprecedented power of modeling applications of highly nonlinear systems. An illustrative example of infinite-order Volterra system is given for a simulated system described by the output nonlinearity:
0.6 0.5 0.4 0.3
... ...... 0.2
-_...
.,- ........ ,
' ... " "'"
0.1
...... ..........
0
-.._----
-0.1
-0.2 -0.3 -0.4 0
5
10 15 TIME LAG (TAU)
20
25
Figure 4.6 The two estimated PDMs for the fourth-order system described by Equation (4.30) using the LET-ED method, corresponding to two significant eigenvalues: A1 =2.38 and, A2 =-0.65. Considerable distortion relative to the previous two PDMs is evident, due to the influence of the higher-order terms (third and fourth order) [Marmarelis, 1997].
4.1
MODULAR FORM OF NONPARAMETRIC MODELS
191
0.7 0.6 0.5
0.4
\'
\ \,
0.3
" "
\\
\
\\
0.2
\
\\
" \
,
0.1
"".
-.",
\
\ \
............................\~ ,
o
'.~:>:.~:.~:.~,".".". __._._._._._.".::.:.:.=.::.::i!t!!!_u
\
-0.1
\ \
,
'..., ... _,,'" ,"
-0.2
,.' ,"
,,"
,,' .,'
J
,-
-0.3
o
5
10 15 TIME LAG (TAU)
20
25
Figure 4.7 The three estimated PDMs for the fourth-order system described by Equation (4.30) using a fourth-order model. The obtained PDMs correspond to 91 and 92' and a linear combination of 91 and 92' as anticipated by theory [Marmarelis, 1997].
y = expjv.jsinj'(v, + v2)/2]
(4.31)
where VI and V2 are as defined previously. This Volterra system has kemeIs of all orders with declining magnitudes as the order increases. This becomes evident when the Taylor series expansions of the exponential and trigonometrie functions are used in Equation (4.31). Application of the LET-ED method yields only two PDMs (i.e., only two significant eigenvalues "-I = 1.86 and "-2 = 1.09, with the remaining eigenvalues being at least one order of magnitude smaller). The prediction of the fifth-order PDM model (based on two PDMs) is shown in Figure 4.8 along with the exact output ofthe infinite-order system described by Equation (4.31). Note that the corresponding normalized mean-square error (NMSE) ofthe model prediction is only 6.8% for the fifth-order PDM model, demonstrating the potential ofthe PDM modeling approach for high-order Volterra systems. In closing this section, we must emphasize that the use ofthe PDM modeling approach is not only motivated by our desire to achieve an accurate model (of possibly high-order Volterra systems) but also by the need to provide meaningful physiological interpretation for the obtained nonlinear models, as demonstrated in Chapter 6. 4.1.2
Volterra Models 01 System Cascades
The cascade oftwo Volterra systems A and B (shown in Figure 4.9) with Volterra kemeIs {ao, ab a-; ... } and {b o, bb b 2 , ••• }, respectively, has Volterra kerneIs that can be expressed in terms of the Volterra kemels of the cascade components. The analytical expressions can be derived by the following procedure where the short-hand notation k; @ x' is employed to denote the r-tuple convolution ofthe r-th order kernel k; with the signal
192
MODULAR AND CONNECTIONIST MODELING
o
200
100
400
300
500
TIME
Figure 4.8 Actual output of infinite-order Volterra system (trace 1) and output prediction by a fifthorder PDM model using two estimated PDMs (trace 2). The resulting NMSE of output prediction is only 6.8% [Marmarelis, 1997].
x(t). It is evident that the output of the A-B cascade system can be expressed by the
Volterra series expansion: y = bo + b} ® z
+ b2 ® Z2 + ...
(4.32)
where z(t) is the output ofthe first cascade component A that can be expressed in terms of the input x( t) as z
= ao + a} ® x + a2 ® x 2 + ...
(4.33)
By substitution of z from Equation (4.33) into Equation (4.32), we have the input-output expression for the cascade system:
r-----------------------.
X
·1
A
r1
B
I
y,
----------------------Figure 4.9
The cascade configuration of two Volterra systems, A and B.
4.1
193
MODULAR FORM OF NONPARAMETRIC MODELS
y = b o + b l Q9 [aO + al Q9 x + a2 Q9 X 2 + ...] + b2 Q9 [aO + al Q9 x + a2 Q9 X 2 + ... ]2 = [bo + b l Q9 ao + b 2 Q9 ao2 + ...] + + [bI Q9 ao + b2 Q9 (ao Q9 a} + a} Q9 ao) + ...] ® X + + [bI Q9 a2 + b2 Q9 (ar + ao Q9 a2 + a2 Q9 ao) + ...] Q9 x 2 + . . . which directly determines the Volterra kernels {"o, k., k2 , equating functional terms of a given order. Thus,
f
!co = bo + ao +. . . k)(T) =
o
f
b)(A)dA + ai}
ff 0
kz(Tb Tz) =
0
ff
0
(4.35)
a)(T- A))U(T- A))bz(A), Az)dA)dAZ +
ff a)(T- A))U(T- A))br(A), · . · ,Ar)dA) . · . ds; +. . .
rI'" o
of the cascade system by
f.··f b,.{A b ... , Ar)dA) . · · o;
biA b Az)dA)dAZ + ... + ao
a)(T- A)U(T- A)b)(A)dA + 2ao
... + rat;'
••• }
(4.34)
(4.36)
aZ(T) - Ab Tz - AZ)U(T) - A))U(TZ - Az)b)(A))8(A) - Az)dA)dA Z
+ f'J"a)(T) - A))a)(TZ - AZ)U(T) - A1)u(TZ - Az)bz(A), Az)dA)dAZ o
0
+ aO L"" ["'[az(T) - Ab Tz - AZ) + az(Tl - AZ' TZ - A))] o 0 U(TI - A})U(T2 - A2)b2(Aj, A2)dA}dA2 +. . .
(4.37)
where U( T - A) denotes the step function (1 for T ~ A, 0 otherwise). These expressions can also be given in terms of the multidimensional Fourier (or Laplace) transforms of these causal Volterra kemels (denoted with capital letters): ko = b o + aoB}(O) + a5B2(0, 0) + ... + aüBr(O, ... , 0) + ...
(4.38)
K}(w) =A}(w)B}(w) + 2aoA}(w)B2(w, 0) + ... + raü-}A}(w)Br(w, 0, ... ,0) + ... (4.39) K 2(wj, (2) = A}(W})A}(W2)B2(wj, (2) + A 2(w j, (2)B}(w} + W2)
+ 2aoA2(wj, (2)B 2(wj, W2) + ...
(4.40)
It is evident that these expressions are simplified considerabIy when ao = 0 (i.e., when the first subsystem has no output basal value), especially the expressions for higher order kernels. In this case, the third-order Volterra kerne I of the cascade is given in the frequency domain by K 3(wj, W2' (3) =A}(WI)AI(W2)AI(W3)B3(wj, W2, (3) 2
+ -[A}(w})A 2(W2' (3)B 2(wj, W2 + (3) + A}(W2)A2(W3' w})B 2(W2' W3 + w}) 3
+ A}(W3)A2(Wj, w2)B2(W3' W} + (2)] + A 3(wJ, W2' (3)B 1(w} + W2 + (3)
(4.41)
194
MODULAR AND CONNECTIONIST MODELING
The frequency-domain expressions ofthe Volterra kemels indicate how the frequency-response characteristics of the two cascade components combine at multiple frequencies in a nonlinear context. For instance, the interaction between input power at two frequencies w} and W2 is reflected on the cascade output in a manner detennined by the first-order response characteristics of A at these two frequencies combined in product with the secondorder response characteristics of B (represented by the second-order kernel) at the bifrequency point (wJ, W2) [i.e., the tennA}(w})A(W2)B2(wJ, W2)], and so on. For cascade systems with more than two components, the associative property can be applied, whereby the kernels ofthe first two components are evaluated first and then the result is combined with the third component, and so. on. The reverse route can be followed in decomposing a cascade into its constituent components [Brilliant, 1958; George, 1959]. For finite-order cascade components, the order ofthe cascade is the product ofthe orders ofthe cascade components. For instance, in the A-B cascade, if QA and QB are the finite orders of A and B respectively, then the order of the overall cascade is QA . QB, as can be easily derived from Equations (4.32) and (4.33).
The L-N-M, L-N, and N-M Cascades. The most widely studied cascade systems to date are the L-N-M, L-N, and N-M cascades, where Land M represent linear filters and N is a static nonlinearity. These three types of cascades have been used extensively to model physiological systems in a manner that lends itself to interpretation and control (see Chapter 6). Note that the L-N-M cascade (also called "the sandwich model") was initially used for the study of the visual system in the context of "describing functions" [Spekreijse, 1969, 1982] and several interesting results were obtained outside the context of Volterra-Wiener modeling. In the Volterra-Wiener context, the pioneering work is that ofKorenberg (1973a,d), elaborated in Korenberg & Hunter (1986). The expressions for the V olterra kernels of the L-N-M cascade can be easily adapted to the other two types of cascades by setting one filter to unity transfer function (allpass). Thus, we will start by deriving the kerne I expressions for the L-N-M cascade. first, and then the expressions for the other two cascades, L-N and N-M, can immediately follow. The L-N-M cascade is composed of two linear filters Land M with impulse response functions g( T) and h( T), respectively, separated by a static nonlinearity N defined by the function.f(·) that can be represented as a polynomial or apower series. The latter can be viewed as the Taylor series expansion for any analytic nonlinearity. Ifthe nonlinearity is not analytic, a polynomial approximation of arbitrary accuracy can be obtained over the domain of values defined by the output of the first filter L using any polynomial basis in the context of function expansions detailed in Appendix I. The constitutive equations for the L-N-M cascade shown in Figure 4.10 are v(t) = {"g(T)x(t- T)dT o
(4.42)
Q
I
lXrVr(t)
(4.43)
y(t) = {"h(A)z(t- A)dA
(4.44)
z(t) =f[v(t)] =
r=O
o
4.1
MODULAR FORM OF NONPARAMETRIC MODELS
195
~LHNHM~ z=/{v)
v=g®x
y=h®z
Figure 4.10 The L-N-M configuration composed of two linear filters Land M separated by a static nonlinearity N (see text).
where Q may tend to infinity for a general analytic nonlinearity. Combining these three equations to eliminate v(t) and z(t), we can obtain the input-output relation in the form of a Volterra model of order Q as folIows. Substitution of z(t) from Equation (4.43) into Equation (4.44) yields y(t) =
L a; fooh(A)v(t - A)dA Q
(4.45)
0
r=0
and substitution of v(t) from Equation (4.42) into Equation (4.45) yields the input-output relation y(t) =
L a; L Q
r=0
m in ( Tl , ... , T r )
h(A)dA
0
foo foo ...
0
0
g( Tl - A) ... g( Tr - A)x(t - T}) ... x(t - Tr)dTl ... dr,
(4.46) Therefore, the rth-order Volterra kerneI ofthe L-N-M cascade is: ~NM( Tb . . . ,
m in ( Tl , ... , T r )
Tr) = a;
f
h(A)g( Tl - A) ... g( Tr - A)u(T} - A) ... U(Tr - A)dA
o
(4.47) This expression yields the Volterra kernels ofthe L-N and N-M cascades by letting h(A) = 8(A) in the former case and g(A) = 8(A) in the latter case. Thus, the rth-order Volterra kernel for the L-N cascade (sometimes called the "Wiener model"-somewhat confusingly, since Wiener's modeling concept is far broader than this extremely simple model) has the rth-order Volterra kernel k~N( Tb
... ,
Tr) = argeT}) ... g( Tr)U( Tl) ... U(Tr)
(4.48)
and the N-M cascade (also called the "Hammerstein model") has the rth-order Volterra kernel
k~M(TI' ... , Tr) = a r{ ~ L r
Cil, ...
,ir)
h(Th)8(7j2 -7jI) ... 5(7jr -7jI)}U(TI) ... U(Tr)
(4.49)
where the summation over Vb ... ,ir) takes place fori} = 1, ... ,r and V2' ... ,ir) being all other (r- 1) indices excepti.. This rotational summation is necessary to make the Volterra kernel invariant to any permutation ofits arguments (i.e., symmetrie by definition).
196
MODULAR AND CONNECTIONIST MODELING
The relative simplicity of these kerne I expressions has made the use of these cascade models rather popular. Some illustrative examples were given in Sections 1.4 and 2.1. Additional applications of these cascade models are given in Chapter 6 and more can be found in the literature [Hunter & Korenberg, 1986; Kearney & Hunter, 1990; Naka et al., 1988; Sakai & Naka, 1987a,b, 1988a,b; Sakuranaga & Naka, 1985a,b,c]. It is evident from the Volterra kerne I expressions (4.47), (4.48), and (4.49) that the different cascade structures can be distinguished by simple comparisons ofthe first- and second- (or higher) order Volterra kernels. For instance, any "slice" ofthe second-order kernel at a fixed T2 value in the L-N model is proportional to the first-order kerneI-an observation used in the fly photoreceptor model discussed in Section 2.1.1 (Fig. 2.6). The same is true for slices of higher-order kernels of the L-N model. Furthermore, the first-order Volterra kernel is proportional to the impulse response function g( T) of the linear filter L. Likewise, the N-M model is distinguished by the fact that the second-order kernel is zero everywhere except at the main diagonal ('Tl = 'T2), where the second-order kernel values are proportional to the first-order kernel, which is in turn proportional to the impulse response function h( T) of the linear filter M. Similar facts hold for the higher-order kernels ofthe N-M cascade (i.e., zero everywhere except at the main diagonal, TI = 'T2 = ... = Tn where the kernel values are proportional to the first-order kernel). Note that kernels of order higher than second have rarely been estimated in practice and, therefore, these useful observations have been used so far primarily in comparisons between first- and second-order kemels, For the L-N-M cascade, the relation between the first-order and the second-order Volterra kernels is a bit more complicated in the time domain but becomes simplified in the frequency domain where KrNM(W) = aIG(w)H(w) K~NM(Wh W2)
(4.50)
= a2G(wI)G(W2)H(WI + W2)
(4.51)
and, consequently,
K~NM(W, 0) = a2 G(O) . KrNM(w)
(4.52)
al
This relation can be used to distinguish the L-N-M structure, provided that G(O) =1= O. The frequency-domain expression for the rth-order Volterra kernel of the L-N-M cascade is Kr(Wh ... , wr) = arG(wI)· .. G(Wr)H(WI + ... + wr)
(4.53)
A time-domain relation can also be found for the L-N-M model, when we observe that the values of the zth-order kernel along any axis (Le., when any Ti is zero) become proportional to the values of the (r + 1)th-order kernel along any two axes, due to the causality of g( T) [Chen et al., 1985, 1986]: k~M(Th ... , 'Tr-h 0, 0)
= ar+ lh(O)g2(O) . g(TI) ... g(Tr-l) ar+l ar
= --g(O) . k~NM( 'Th
.•. ,
Tr-h 0)
(4.54)
4.1
MODULAR FORM OF NONPARAMETRIC MODELS
197
This relation, however, requires at least third-order kerne1 estimates for meaningful implementation, provided also that g(O) =f= 0 and h(O) =1= o. These requirements have limited its practica1 use to date. Nonethe1ess, the kernel values along any axis offer an easy way of estimating the prior filter g( T) (within a scalar), provided that g(O)h(O) =f= 0, since k~NM(Tl'
0) = a2h(0)g(0) . g(Tl)
(4.55)
Subsequently, the posterior filter h( T) can be estimated (within a scalar) through deconvolution of the estimated g( T) from the first-order Vo1terra kerne I [see Equation (4.50)]. It is evident that the prior filter L andlor the posterior filter M can be estimated (within a sca1ar) from the first-order and second-order Vo1terra kerne1s of any of these cascades (L-N-M, L-N, N-M), following the steps described above. Subsequently, the static nonlinearity can be estimated by plotting (or regressing) the reconstructed interna1 signal z(t) on the reconstructed interna1 signal v(t) in the case of the L-N-M cascade, or the corresponding signals in the other two cases [e.g., plotting the output y(t) versus the reconstructed internal signal v(t) in the case of the L-N cascade]. "Reconstructed" v(t) signal imp1ies convolution of the input signal with the estimated prior filter g( T), and "reconstructed" z(t) signal imp1ies deconvo1ution of the estimated posterior filter h( T) from the output signal. C1early, there is a scaling ambiguity between the filters and the static nonlinearity [i.e., the filters can be multip1ied by an arbitrary nonzero sca1ar and the input-output relation of the cascade remains intact by adjusting properly the sca1e(s) of the static non1inearity]. For instance, let us assume that the prior and the posterior filters of the L-N-M cascade are multip1ied by the sca1ars CL and CM, respectively. Then the static non1inearity z = j{v) is adjusted to the non1inearity __ 1 f(v) = -f(CLV) CM
(4.56)
in order to maintain the same input-output relation in the overall cascade model. This·fact imp1ies that the estimated filters can be norma1ized without any loss of genera1ity. The foregoing discussion e1ucidates the way of how to achieve comp1ete identification of these three types of cascade systems, provided that their first-order and second-order Volterra kerneis can be estimated accurate1y (within a sca1ar). This raises the issue ofpossib1e estimation bias due to higher-order terms when a truncated second-order Vo1terra model is used for kerne1 estimation. For such bias to be avoided, either a comp1ete Volterra model (with all the significant nonlinear terms) ought to be used, or a Wiener model of second order that offers an unbiased solution in this particu1ar case. It is this latter case that deserves proper attention for these cascade systems because it offers an attractive practica1 solution to the problem of high-order model estimation. The reason for this is found in the fact that the Wiener kerne1s of first order and second order are proportional to their Volterra counterparts for these cascades systems [Marmarelis & Marmare1is, 1978]. Specifically, the Wiener kernel expressions for the L-N-M cascade are found using Equation (2.57) to be hl(T) = kl(T)
·~o 00
(2m + I)! m!2 m
[ a2m+1
Lg2(A)dA]m oo
p
0
(4.57)
198
MODULAR AND CONNECTIONIST MODELING
h 2('Tl'
'T2) = ki'Tl> 'T2)'
~l 00
(2m)!
[
1
(m -1)!2m a2m P
00
0
t
g2(A)dA
(4.58)
The proportionality factor between the Volterra and the Wiener kernels depends on the coefficients of the polynomial (or Taylor series) static nonlinearity of the same parity (i.e., odd for h 1 and even for h2) . The proportionality factor also depends on the variance ofthe prior filter L output for a GWN input with power level P, which is given by Var[v(t)]
=P('g2(A)dA
(4.59)
Thus, estimation of the normalized first-order and second-order Volterra kemels can be achieved in practice for an L-N-M cascade system 0/ any order by means of estimation of their Wiener counterparts when a GWN input is available. Subsequently, the estimation of each cascade component separately can be achieved by the aforementioned methods for any order of nonlinearity. The feasibility of estimating such cascade models of arbitrary order of nonlinearity has contributed to their popularity. Some illustrative examples are given in Chapter 6. Because of its relative simplicity and the fact that cascade operations appear natural for information processing in the nervous system, the L-N-M cascade (or "sandwich model") received early attention in the study of sensory systems by Spekreijese and his colleagues, who used a variant of the "describing function" approach employing a combination of sinusoidal and noise stimuli [Spekreijse, 1969]. A few years later, Korenberg analyzed the sandwich model in the Volterra-Wiener context [Korenberg, 1973a]. This pioneering work was largely ignored until it was properly highlighted by Marmarelis and Marmarelis (1978), leading to a number of subsequent applications to physiological systems. We conclude this section by pointing out that the aforementioned three types of cascade systems cannot be distinguished by means of the first-order kerne1 alone (Volterra or Wiener), but that the second-order kernel is necessary ifthe static nonlinearity has an even component, or the third-order kernel is required ifthe static nonlinearity is odd. Longer cascades can also be studied using the general results on cascaded Volterra systems presented earlier; however, the attractive simplifications ofthe L-N-M cascade (and its L-N or N-M offsprings) are lost when more nonlinearities are appended to the cascade.
4.1.3
Volterra Models 01 Systems with Lateral Branches
The possible presence of lateral feedforward branches in a system may take the form of additive parallel branches (the simplest case) or modulatory feedforward branches that either multiply the output of another branch or affect the characteristics (parameters or kernels) of another system component (see Figure 4.11). In the simple case of additive parallel branches (see Figure 4.11a), the Volterra kemels ofthe overall system are simply the SUfi ofthe component kemels ofthe respective order:
kr ( Th
. . . , Tr )
= a r ( Th . . .
, Tr )
+ br ( Tl'
. . . , Tr )
where {a.} and {b r } are the rth-order Volterra kernels of A and B, respectively.
(4.60)
4.1
MODULAR FORM OF NONPARAMETRIC MODELS
x X
-.
199 y=A[x,z]
y=ZA+ZB
(c)
(b)
(a)
Figure 4.11 Configurations of modular models with lateral branches. (a) Two parallel branches converging at an adder. (b) Two parallel branches converging at a multiplier. (c) A lateral branch B modulating component A.
In the case of a multiplicative branch (see Figure 4.11b), the system output is given by y = [ao + al ® x + a2 ® x 2 + ... ][b o + b l 0
X
+ b 2 ® x 2 + ... ]
(4.61)
Thus, the Volterra kernels of the overall system are
k2(Tb T2)
= aOb2(Tb
ko = aobo
(4.62)
kl(T) = aObl(T) + bOal(T)
(4.63)
T2) + bOa2(Tb T2)
1
+i
[al(TI)b l(T2)
+ al(T2)b l(TI)]
(4.64)
r
k r ( Tb ... , Tr )
=
I a j=O
j(
Tb ... , Tj)br- j ( Tj+b ... , Tr )
(4.65)
for Tl ~ T2 ~ ... ~ Tn so that the general expression for the rth-order kernel need not be symmetrized with respect to the arguments (Tb . . . , Tr ) . In the case of the "regulatory" branch B of Figure 4.11 c, the Volterra kerne1 expressions for the overall system will depend on the specific manner in which the output Z of component B influences the internal characteristics of the output-generating component A. For instance, if z(t) multiplies (modulates) the first-order kernel of component A, then y = ao + [bo + b, ® x + b2 ® x 2 + .. .]al 0 x + a2 ® x 2 + ...
(4.66)
and the Volterra kernels of this system are k r( Tb ... , Tr ) = a r( Tb ... , Tr ) + al(TI)b r- l( T2' ... , Tr )
(4.67)
for Tl ~ T2 ~ ... ~ r., to avoid symmetrizing the last term ofEquation (4.67). This latter category of "regulatory" branches may attain numerous diverse forms that will define different kernel relations. However, the method by which the kernel expressions are derived in all cases remains the same and relies on expressing the output in terms of the input using Volterra representations. Note that the component subsystems may also be expressed in terms of parametric models (e.g., differential equations). Then the equivalent nonparametric model of each
200
MODULAR AND CONNECTIONIST MODELING
component must be used to derive the Volterra kemels of the overall system in tenns of the component kernels. An example of this was given in Section 1.4 for the "minimal model" of insulin-glucose interactions.
4.1.4
Volterra Models of Systems with Feedback Branches
Systems with feedback branches constitute a very important class of physiological systems because of the critical role of feedback mechanisms in maintaining stable operation under normal or perturbed conditions (horneostasis and autoregulation). Feedback mechanisms may attain diverse forms, including the closed-loop and nested-loop configurations discussed in Chapter 10. In this section, we derive the Volterra kernels for certain basic feedback configurations depicted in Figure 4.12. The simplest case (Figure 4.12a) exhibits additive feedback that can be expressed as the integral input-output equation: y = al ® [x + b, ® Y + b2 ®
r + ...] + a2 ® [x + b, ® Y + b2 ® y 2 + ... ]2 + ...
(4.68)
where we have assumed that ao = 0 and bo = 0 to simplify matters (which implies that ko = 0). This integral equation contains Volterra functionals of the input and output, suggesting the rudiments of the general theory presented in Chapter 10. The explicit solution of this integral equation (i.e., expressing the output signal as a Volterra series of the input signal) is rather complicated but it may be achieved by balancing Volterra tenns of the same order. Thus, balancing terms of first order yields kI 0 x = a I 0 x + aI ®b I ®k I 0x
(4.69)
which can be solved in the frequency domain to yield A}(w) K)(w) = 1 -A)(w)B)(w)
(4.70)
Equation (4.70) is well known from linear-feedback system theory. Balancing tenns of second order, we obtain
o
k2 0x2 =al 0 [bI 0 k2 ®x2+ b2 0 (k} 0X)2] +a2 0 [x2 + (bI 0 k I 0X)2 + 2x(b1 0 k} x)] (4.71)
y
x z
y
y
x
z
(a)
(b)
(c)
Figure 4.12 Configurations of modular models with feedback branches: (a) additive feedback branch B; (b) multiplicative feedback branch B; (c) modulatory feedback branch B; all acting on the forward component A.
which can be solved in the frequency domain to yield the second-order Volterra kernel of the feedback system:

K_2(\omega_1, \omega_2) = \{A_1(\omega_1 + \omega_2) B_2(\omega_1, \omega_2) K_1(\omega_1) K_1(\omega_2) + A_2(\omega_1, \omega_2)[1 + B_1(\omega_1) B_1(\omega_2) K_1(\omega_1) K_1(\omega_2) + B_1(\omega_1) K_1(\omega_1) + B_1(\omega_2) K_1(\omega_2)]\}[1 - A_1(\omega_1 + \omega_2) B_1(\omega_1 + \omega_2)]^{-1}   (4.72)

This approach can be extended to any order, resulting in kernel expressions of increasing complexity. Obviously, these expressions are simplified when either A or B is linear. For instance, if the forward component A is linear, then the second-order kernel becomes

K_2(\omega_1, \omega_2) = A_1(\omega_1 + \omega_2) B_2(\omega_1, \omega_2) K_1(\omega_1) K_1(\omega_2)[1 - A_1(\omega_1 + \omega_2) B_1(\omega_1 + \omega_2)]^{-1}   (4.73)
and the third-order kernel is given by

K_3(\omega_1, \omega_2, \omega_3) = A_1(\omega_1 + \omega_2 + \omega_3)\{\tfrac{2}{3}[B_2(\omega_1 + \omega_2, \omega_3) K_2(\omega_1, \omega_2) K_1(\omega_3) + B_2(\omega_2 + \omega_3, \omega_1) K_2(\omega_2, \omega_3) K_1(\omega_1) + B_2(\omega_3 + \omega_1, \omega_2) K_2(\omega_3, \omega_1) K_1(\omega_2)] + B_3(\omega_1, \omega_2, \omega_3) K_1(\omega_1) K_1(\omega_2) K_1(\omega_3)\}[1 - A_1(\omega_1 + \omega_2 + \omega_3) B_1(\omega_1 + \omega_2 + \omega_3)]^{-1}   (4.74)
This case of linear forward and nonlinear feedback is discussed again in the following section in connection with nonlinear differential equation models. We now examine the multiplicative feedback of Figure 4.12b. The input-output integral equation (assuming a_0 = k_0 = 0 but b_0 \neq 0) is

y = a_1 \otimes [x(b_0 + b_1 \otimes y + b_2 \otimes y^2 + \cdots)] + a_2 \otimes [x(b_0 + b_1 \otimes y + b_2 \otimes y^2 + \cdots)]^2 + \cdots   (4.75)

which yields the first-order balance equation

k_1 \otimes x = a_1 \otimes (b_0 x)   (4.76)

from which the first-order Volterra kernel of the multiplicative feedback system is derived to be

K_1(\omega) = b_0 A_1(\omega)   (4.77)
The second-order balance equation is

k_2 \otimes x^2 = a_1 \otimes [x(b_1 \otimes k_1 \otimes x)] + a_2 \otimes (b_0 x)^2   (4.78)

which yields the second-order Volterra kernel

K_2(\omega_1, \omega_2) = b_0^2 A_2(\omega_1, \omega_2) + \frac{b_0}{2} A_1(\omega_1 + \omega_2)[B_1(\omega_1) A_1(\omega_1) + B_1(\omega_2) A_1(\omega_2)]   (4.79)
Note that the kernel expressions for multiplicative nonlinear feedback are simpler than their counterparts for additive nonlinear feedback. The case of "regulatory" feedback of Figure 4.12c depends on the specific manner in which the feedback signal z(t) influences the characteristics of the forward component A and will not be discussed further in the interest of saving space.
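A corresponding sketch for the multiplicative feedback case evaluates Equations (4.77) and (4.79) on a frequency grid; the one-pole components and the separable A_2 below are hypothetical choices made only for illustration.

```python
import numpy as np

# Frequency grid and hypothetical component transfer functions.
w = np.linspace(-np.pi, np.pi, 201)
A1 = lambda w: 1.0 / (1.0 + 1j * 5.0 * w)      # forward first-order kernel
B1 = lambda w: 0.3 / (1.0 + 1j * 2.0 * w)      # feedback first-order kernel
A2 = lambda w1, w2: 0.1 * A1(w1) * A1(w2)      # forward second-order kernel (separable)
b0 = 0.8

# Equation (4.77): first-order kernel of the multiplicative feedback system.
K1 = b0 * A1(w)

# Equation (4.79): second-order kernel on a 2-D frequency grid.
W1, W2 = np.meshgrid(w, w)
K2 = (b0**2 * A2(W1, W2)
      + 0.5 * b0 * A1(W1 + W2) * (B1(W1) * A1(W1) + B1(W2) * A1(W2)))
print(K1.shape, K2.shape)
```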
4.1.5 Nonlinear Feedback Described by Differential Equations
This case was first discussed in Section 3.2 and is revisited here in order to elaborate on the relationship between parametric models described by nonlinear differential equations and modular feedback models. It is evident that any of the component subsystems (A and/or B), discussed above in connection with modular feedback systems/models, can be described equivalently by a parametric or nonparametric model and converted into the other type using the methods presented in Sections 3.2-3.5. In this section, we will elaborate further on the case of a system with a linear forward and weak nonlinear feedback, shown in Figure 4.13, that is described by the differential equation

L(D)y + \varepsilon f(y) = M(D)x   (4.80)
where |\varepsilon| \ll 1, and L(D), M(D) are polynomials in the differential operator D \equiv d(\cdot)/dt. If the function f(\cdot) is analytic or can be approximated to an arbitrary degree of accuracy by a power series (note that the linear term is excluded since it can be absorbed into L) as [Marmarelis, 1991]

f(y) = \sum_{n=2}^{\infty} a_n y^n   (4.81)
then the resulting Volterra kernels are

K_1(s) = \frac{M(s)}{L(s)}   (4.82)

K_n(s_1, \ldots, s_n) = -\varepsilon a_n K_1(s_1) \cdots K_1(s_n)/L(s_1 + \cdots + s_n)   (4.83)
where terms of order \varepsilon^2 or higher have been considered negligible.
Figure 4.13 The modular form of the nonlinear feedback system described by Equation (4.80). L and M are linear dynamic operators and f(\cdot) is an analytic function (or a function approximated by a polynomial). The factor -\varepsilon in the feedback component denotes weak negative feedback.
The first-order Wiener kernel in this case is

H_1(j\omega) = K_1(j\omega)\left\{1 - \frac{\varepsilon}{L(j\omega)} \sum_{m=1}^{\infty} \frac{(2m+1)!}{m! \, 2^m} a_{2m+1} (P\kappa)^m\right\} = K_1(j\omega)\left[1 - \frac{\varepsilon}{L(j\omega)} C_1(P)\right]   (4.84)

where \kappa is the integral of the square of k_1. The second-order Wiener kernel is

H_2(j\omega_1, j\omega_2) = -\frac{\varepsilon K_1(j\omega_1) K_1(j\omega_2)}{L(j\omega_1 + j\omega_2)} \sum_{m=0}^{\infty} \frac{(2m+2)!}{m! \, 2^{m+1}} a_{2m+2} (P\kappa)^m = -\varepsilon \frac{K_1(j\omega_1) K_1(j\omega_2)}{L(j\omega_1 + j\omega_2)} C_2(P)   (4.85)
We observe that as the GWN input power level P varies, the waveform of the first-order Wiener kernel changes, but the second-order Wiener kernel remains unchanged in shape and changes only in scale. Note that the functions C_1(P) and C_2(P) are power series (or polynomials) in (P\kappa) and characteristic of the system nonlinearities. The Wiener kernels approach their Volterra counterparts as P diminishes (as expected). These results indicate that, for a system with linear forward and weak nonlinear feedback (i.e., |\varepsilon a_i| \ll 1), the first-order Wiener kernel in the time domain will be
h_1(\tau) = k_1(\tau) - \varepsilon C_1(P) \int_0^{\infty} k_1(\tau - \lambda) g(\lambda) \, d\lambda   (4.86)
and the second-order Wiener kernel will be

h_2(\tau_1, \tau_2) = -\varepsilon C_2(P) \int_0^{\min(\tau_1, \tau_2)} k_1(\tau_1 - \lambda) k_1(\tau_2 - \lambda) g(\lambda) \, d\lambda   (4.87)

where g(\lambda) is the inverse Fourier transform of 1/L(j\omega).
A companion issue to that of changing input power level is the effect of changing the mean level of the experimental input (with white noise or other perturbations superimposed on it) in order to explore different ranges of the system function. The resulting kernels for each different mean level \mu of the input will vary if the input data are defined as the deviations from the mean level each time. To reconcile these different measurements, we can use a reference mean level \mu_0 in order to refer the kernels \{k_n^{\mu}\} obtained from different mean levels \mu to the reference kernels \{k_n^0\} according to the relation
k_n^{\mu}(\tau_1, \ldots, \tau_n) = \sum_{i=0}^{\infty} \frac{(n+i)!}{n! \, i!} (\mu - \mu_0)^i \int_0^{\infty} \cdots \int_0^{\infty} k_{n+i}^0(\tau_1, \ldots, \tau_n, \sigma_1, \ldots, \sigma_i) \, d\sigma_1 \cdots d\sigma_i   (4.88)
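A minimal sketch of Equation (4.88) for the first-order kernel (n = 1) of a hypothetical odd (third-order) reference system is given below; the reference kernels and the mean level \mu are illustrative placeholders.

```python
import numpy as np

# Equation (4.88) for n = 1, assuming k2_0 = 0 and a separable k3_0 (hypothetical).
dt = 0.1
tau = np.arange(0.0, 30.0, dt)
k1_0 = np.exp(-tau / 5.0)                                  # reference first-order kernel
k3_0 = -0.01 * np.einsum('i,j,k->ijk', k1_0, k1_0, k1_0)   # reference third-order kernel
mu = 0.5                                                   # new mean level (mu_0 = 0)

# Weights (n+i)!/(n! i!) for n = 1: the i = 1 term (weight 2) vanishes since
# k2_0 = 0; the i = 2 term has weight 3 and integrates k3_0 over two lags.
k1_mu = k1_0 + 3.0 * mu**2 * np.sum(k3_0, axis=(1, 2)) * dt**2
print(k1_mu[:3])
```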
The first-order Wiener kernel for this class of systems with static nonlinear feedback is given in terms of the reference Volterra kernels (when \mu_0 = 0) by the expression

h_1^{\mu}(\tau) = k_1^0(\tau) - \varepsilon \Lambda \int_0^{\infty} g(\lambda) k_1^0(\tau - \lambda) \, d\lambda   (4.89)
where

\Lambda = \sum_{\substack{m=0,\, i=0 \\ m+i \geq 1}}^{\infty} \frac{(2m+i+1)!}{m! \, i! \, 2^m} (P\kappa)^m (\mu\gamma)^i a_{2m+i+1}   (4.90)
and \gamma is the integral of k_1^0. Note that the first-order Wiener kernel for \mu \neq 0 is also affected by the even-order terms of the nonlinearity in this case, unlike the case of \mu = 0, where it is affected only by the odd-order terms of the nonlinearity. Below, we use computer simulations of systems with cubic and sigmoidal feedback to demonstrate the effect of changing GWN input power level and/or mean level on the waveform of the first-order Wiener kernel. This will help us explain the changes observed in the first-order Wiener kernels of some sensory systems when the GWN input power level and/or mean level is varied experimentally.

Example 4.1. Cubic Feedback Systems

First, we consider a system with a low-pass forward linear subsystem (L^{-1}) and a cubic negative feedback -\varepsilon y^3, as shown in Figure 4.13 (for M \equiv 1). For |\varepsilon| \ll 1, the first-order Wiener kernel is given by Equation (4.86) as

h_1(\tau) = g(\tau) - 3\varepsilon P\kappa \int_0^{\infty} g(\lambda) g(\tau - \lambda) \, d\lambda   (4.91)
where the first-order Volterra kernel k_1(\tau) is identical to the impulse response function g(\tau) of the low-pass linear forward subsystem in this case. For a zero-mean GWN input with power levels of P = 1, 2, 4 and cubic feedback coefficient \varepsilon = 0.001, the first-order Wiener kernel estimates are shown in Figure 4.14 along with the estimate for \varepsilon = 0 (i.e., no cubic feedback) or P \to 0, which corresponds to k_1(\tau) \equiv g(\tau). We observe a gradual decrease of damping (i.e., emergence of an increasing "undershoot") in the kernel estimates as P increases, consistent with Equation (4.91). This corresponds to a gradual increase of their bandwidth as P increases, as shown in Figure 4.15, where the FFT magnitudes of these kernel estimates are shown up to normalized frequency 0.1 Hz (the Nyquist frequency is 0.5 Hz). We observe the gradual transition from an overdamped to an underdamped mode and a companion decrease of zero-frequency gain as P increases, similar to what has been observed in certain low-pass sensory systems such as retinal horizontal cells. Note that this system becomes unstable when P increases beyond a certain value. Next we explore the effect of varying the GWN input mean level \mu while keeping \varepsilon and P constant (\varepsilon = 0.001 and P = 1) using input mean levels of \mu = 0, 1, 2, and 3, successively. The obtained first-order Wiener kernel estimates are shown in Figure 4.16 and exhibit changes in their waveform as \mu increases that are qualitatively similar to the ones induced by increasing P (i.e., increasing bandwidth and decreasing damping). According to the general expression of Equation (4.89), we have for this system

h_1^{\mu}(\tau) = g(\tau) - 3\varepsilon [P\kappa + (\mu\gamma)^2] \int_0^{\infty} g(\lambda) g(\tau - \lambda) \, d\lambda   (4.92)
We see that the effect of increasing P is similar to the effect of increasing \mu^2, and the differential effects are proportional to \kappa and \gamma^2, respectively.
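These predictions can be checked by direct simulation. The sketch below integrates a hypothetical first-order low-pass loop L(D)y + \varepsilon y^3 = x driven by discretized GWN and estimates h_1 by input-output cross-correlation (the Lee-Schetzen formula h_1(\tau) = E[y(n)x(n - \tau)]/P); the integration scheme and all parameter values are illustrative.

```python
import numpy as np

# Simulate T*dy/dt + y + eps*y^3 = x with GWN input of power level P.
rng = np.random.default_rng(0)
dt, T, eps, P = 0.05, 5.0, 0.001, 2.0
N = 100_000
x = rng.normal(0.0, np.sqrt(P / dt), N)       # discretized GWN (variance P/dt)
y = np.zeros(N)
for n in range(N - 1):                        # Euler integration of the feedback loop
    y[n + 1] = y[n] + dt * (x[n] - y[n] - eps * y[n]**3) / T

# First-order Wiener kernel estimate by cross-correlation:
# h1(tau) = E[y(n) x(n - tau)] / P
lags = np.arange(0, int(30 / dt))
h1 = np.array([np.mean(y[m:] * x[:N - m]) for m in lags]) / P
print(h1[:5])
```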
Figure 4.14 First-order Wiener kernel estimates of the system with negative cubic feedback (\varepsilon = 0.001) and a low-pass forward subsystem g(\tau), obtained for P = 1, 2, and 4, along with the first-order Volterra kernel of the system (P \to 0), which is identical to g(\tau) in this case. Observe the increasing undershoot in the kernel waveform as P increases [Marmarelis, 1991].
Figure 4.15 FFT magnitudes of the first-order kernels in Figure 4.14, plotted up to a normalized frequency of 0.1 Hz. Observe the gradual transition from overdamped to underdamped mode and the increase of bandwidth as P increases [Marmarelis, 1991].
Figure 4.16 First-order Wiener kernel estimates of the system with negative cubic feedback and a low-pass forward subsystem, obtained for \mu = 0, 1, 2, and 3 (P = 1, \varepsilon = 0.001 in all cases). The changes in kernel waveform follow the same qualitative pattern as in Figure 4.14 [Marmarelis, 1991].
Another point of practical interest is the difference between the first-order kernel (Volterra or Wiener) and the system response to an impulse. This point is often a source of confusion due to misconceptions ingrained by linear system analysis. For a third-order system, such as in this example for small \varepsilon, the response to an impulse input x(t) = A\delta(t) is

r_{\delta}(t) = A g(t) - \varepsilon A^3 \int_0^{\infty} g(\lambda) g^3(t - \lambda) \, d\lambda   (4.93)
which is clearly different from the first-order Volterra kernel k_1(t) \equiv g(t), or its Wiener counterpart given by Equation (4.91). Another point of practical interest is the response of this nonlinear feedback system to a step/pulse input x(t) = A u(t), since pulse inputs have been used extensively in physiological studies. The system response to the pulse input is

r_u(t) = A \int_0^t g(\tau) \, d\tau - \varepsilon A^3 \int_0^t g(\tau) \left\{\int_0^{t-\tau} g(\lambda) \, d\lambda\right\}^3 d\tau   (4.94)
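Equation (4.94) can be evaluated numerically as a single convolution, as in the sketch below; g(t) is again taken to be a hypothetical exponential and the parameter values are illustrative.

```python
import numpy as np

# Evaluate the pulse response of Equation (4.94), assuming g(t) = exp(-t/T).
dt, T, eps = 0.05, 5.0, 0.001
t = np.arange(0.0, 125.0, dt)
g = np.exp(-t / T)
G = np.cumsum(g) * dt                      # G(t) = integral_0^t g(lambda) dlambda

for A in (1.0, 2.0, 4.0):
    corr = np.convolve(g, G**3)[:len(t)] * dt   # second term of Equation (4.94)
    r_u = A * G - eps * A**3 * corr
    print(A, r_u[-1])                      # steady-state values grow sublinearly with A
```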
The changes in the response waveforms as the pulse amplitude increases are demonstrated in Figure 4.17, where the responses of this system are shown for pulse amplitudes of 1, 2, and 4. The observed changes are qualitatively consistent with the previous discussion (i.e., the responses are less damped for stronger pulse inputs). However, the reader must note that we cannot obtain the first-order kernel (Volterra or Wiener) or the response
Figure 4.17 Responses of the negative cubic feedback system (\varepsilon = 0.001) to pulse inputs of different amplitudes (1, 2, and 4). Observe the gradual decrease of damping and latency of the onset response as the pulse amplitude increases, as well as the difference between onset and offset transient responses [Marmarelis, 1991].
to an impulse by differentiating the pulse response over time, as in the linear case. Observe also the sharp difference between onset and offset transient response, characteristic of nonlinear systems and so often seen in physiological systems. The steady-state value of the step response for various values of A is given by

L(0) y + \varepsilon y^3 = A   (4.95)
(in the region of stability of this system), where L(0) \equiv 1/K_1(0) for this system. The steady-state values of the pulse response as a function of pulse amplitude are shown in Figure 4.18. Note that these values are different, in general, from the mean response values when the GWN input has nonzero mean. Having examined the behavior of this nonlinear feedback system with a low-pass (overdamped) forward subsystem, we now examine the case of a band-pass (underdamped) forward subsystem with negative cubic feedback of \varepsilon = 0.008. The resulting first-order Wiener kernel estimates for increasing GWN input power level (viz., P = 1, 2, and 4) are shown in Figure 4.19, along with the first-order Volterra kernel of the system (which is the same as the impulse-response function of the linear forward subsystem) that corresponds to the case of P = 0. We observe a gradual deepening of the undershoot portion of the band-pass kernel accompanied by a gradual shortening of its duration as P increases (i.e., we see a gradual broadening of the system bandwidth and upward shift of the resonance frequency as P increases). This is demonstrated in Figure 4.20 in the frequency domain, where the FFT magnitudes of the kernels of Figure 4.19 are shown. The changes
Figure 4.18 The steady-state values of the pulse responses as a function of input pulse amplitude for the negative cubic feedback system (\varepsilon = 0.001) [Marmarelis, 1991].
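For reference, the steady-state values plotted in Figure 4.18 solve the cubic Equation (4.95), which can be done directly; L(0) = 1 is an assumed (illustrative) value in the sketch below.

```python
import numpy as np

# Solve Equation (4.95), L(0)*y + eps*y^3 = A, for the steady-state response.
eps, L0 = 0.001, 1.0                        # L(0) = 1 chosen for illustration
for A in (1.0, 4.0, 16.0):
    roots = np.roots([eps, 0.0, L0, -A])    # eps*y^3 + L0*y - A = 0
    y_ss = roots[np.abs(roots.imag) < 1e-9].real[0]   # the single real root (eps > 0)
    print(A, y_ss)
```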
Figure 4.19 First-order Wiener kernels of the negative cubic feedback system (\varepsilon = 0.008) with the band-pass (underdamped) linear forward subsystem (corresponding to P = 0), for P = 0, 1, 2, and 4. Observe the increasing undershoot as P increases [Marmarelis, 1991].
Figure 4.20 FFT magnitudes of the first-order Wiener kernels shown in Figure 4.19. Observe the gradual increase of bandwidth and upward shift of resonant frequency as P increases [Marmarelis, 1991].
in the waveform of these kernels with increasing P are consistent with our theoretical analysis and mimic changes observed experimentally in some band-pass sensory systems (e.g., primary auditory fibers). Note that the effect of increasing GWN input mean level on the first-order Wiener kernels is not significant, due to the fact that \gamma (i.e., the integral of k_1) is extremely small in this case; cf. Equation (4.92). Finally, the system responses to input pulses of increasing amplitude (A = 1, 2, and 4) are shown in Figure 4.21 and demonstrate increasing resonance frequency and decreasing damping in the pulse response as A increases. Note also that the steady-state values of the pulse responses are extremely small, and the onset/offset response waveforms are similar (with reverse polarity), due to the very small value of \gamma = K_1(0) [cf. Equation (4.95)].

Example 4.2. Sigmoid Feedback Systems

The next example deals with a sigmoid feedback nonlinearity which, unlike the cubic one, is bounded for any output signal amplitude. The arctangent function

f(y) = \frac{2}{\pi} \arctan(\alpha y)   (4.96)

was used in the simulations (\alpha = 0.25) with the previous low-pass forward subsystem, and the resulting first-order Wiener kernels for P = 1 and \varepsilon = 0, 0.125, and 0.5 are shown in Figure 4.22. The qualitative changes in waveform are similar to the cubic feedback case for increasing input power level P or feedback strength \varepsilon. However, for fixed sigmoid
Figure 4.21 Responses of the negative feedback system (\varepsilon = 0.008) with underdamped forward subsystem to pulse inputs of different amplitudes A = 1, 2, and 4. Observe the increasingly underdamped response as A increases, and the negligible steady-state responses [Marmarelis, 1991].
Figure 4.22 First-order Wiener kernel estimates of the negative sigmoid feedback system with the previous low-pass forward subsystem for \varepsilon = 0, 0.125, and 0.5 (P = 1, \alpha = 0.25 in all cases). Observe the similarity in changes of kernel waveform with the ones shown in Figure 4.14 [Marmarelis, 1991].
feedback strength (\varepsilon), the kernels resulting from increasing GWN input power level P follow the reverse transition in waveform, as demonstrated in Figure 4.23, where the kernels obtained for P = 1, 4, 16, and 64 are shown for \varepsilon = 0.25 in all cases. Bear in mind that the first-order Volterra kernel of this sigmoid feedback system is not the same as the impulse response function of the forward subsystem; rather, it is the impulse response function of the overall linear feedback system when the linear term of the sigmoid nonlinearity (i.e., its slope at zero) is incorporated in the (negative) feedback loop. Thus, the kernel waveform follows the previously described gradual changes from the impulse response function of the linear feedback system to that of the linear forward subsystem as P increases (i.e., the kernel waveform changes gradually from underdamped to overdamped as P increases and the gain of the equivalent linearized feedback decreases). Because of the bounded nature of the (negative) sigmoid nonlinearity, large values of \varepsilon and/or P do not lead to system instabilities as in the case of cubic feedback. Increasing values of \varepsilon result in decreasing damping, eventually leading to oscillatory behavior. This is demonstrated in Figure 4.24, where the kernels for \varepsilon = 0.5, 1, 2, and 4 are shown (P = 1). The oscillatory behavior of this system, for large values of \varepsilon, is more dramatically demonstrated in Figure 4.25, where the actual system responses y(t) for \varepsilon = 100 and 1000 are shown (P = 1). The system goes into perfect oscillation regardless of the GWN input, due to the overwhelming action of the negative sigmoid feedback that is bounded and symmetric about the origin. The amplitude of this oscillation is proportional to \varepsilon, but is independent of the input power level. In fact, the oscillatory response remains the same in amplitude and frequency for any input signal (regardless of its amplitude and waveform)
Figure 4.23 First-order Wiener kernel estimates of the negative sigmoid feedback system with the previous low-pass forward subsystem for P = 1, 4, 16, and 64 (\varepsilon = 0.25, \alpha = 0.25 in all cases). Observe the reverse pattern of kernel waveform changes from the ones in Figures 4.22 or 4.14 [Marmarelis, 1991].
Figure 4.24 First-order Wiener kernels of the negative sigmoid feedback system with low-pass (overdamped) forward subsystem, for \varepsilon = 0.5, 1, 2, and 4. Observe the transition to oscillatory behavior as \varepsilon increases [Marmarelis, 1991].
Figure 4.25 Oscillatory response of the negative sigmoid feedback system for very large feedback gain \varepsilon = 100 and 1000, and GWN input (P = 1) [Marmarelis, 1991].
as long as the value of \varepsilon is much larger than the maximum value of the input. The initial transient and the phase of the oscillation, however, may vary according to the input power and waveform. The frequency of oscillation depends on the dynamics (time constants) of the linear forward subsystem. For instance, a low-pass subsystem with shorter memory (i.e., shorter time constants) leads to a higher frequency of oscillation, and so does an underdamped system with the same memory extent. Although the case of oscillatory behavior is not covered formally by the Volterra-Wiener analysis because it violates the finite-memory requirement, it is of great interest in physiology because of the numerous and functionally important physiological oscillators. Therefore, it is a subject worthy of further exploration in the context of large negative compressive (e.g., sigmoid) feedback, because the foregoing observations are rather intriguing. For instance, can an oscillation of fixed frequency be initiated by a broad ensemble of stimuli that share only minimal attributes irrespective of waveform (e.g., having bandwidth and dynamic range within certain bounds) as long as the feedback gain is large? The effect of varying the slope of the sigmoid nonlinearity was also studied, and a gradually decreasing damping with increasing slope was observed [Marmarelis, 1991]. This transition reaches asymptotically a limit in both directions of changing \alpha values, as expected. For \alpha \to \infty, the sigmoid nonlinearity becomes the signum function and leads to perfect oscillations; and for \alpha \to 0, the gain of the feedback loop diminishes, leading to a kernel identical to the impulse response function of the forward linear subsystem. The effect of nonzero GWN input mean \mu is similar to the effect of increasing P; that is, the first-order Wiener kernels become more damped as \mu increases, which indicates decreasing gain of the equivalent linearized negative feedback. In the case of the underdamped forward subsystem and negative sigmoid feedback, the changes in the kernel waveform undergo a gradual transition from the linearized feedback system to the forward linear subsystem as the GWN input power level P increases. The two limit waveforms (for P \to 0 and P \to \infty) of the first-order Wiener kernel are shown in panel (a) of Figure 4.26 for \varepsilon = 1, \alpha = 0.25. The effect of the negative sigmoid feedback is less dramatic in this case, since the kernel retains its underdamped mode for all values of P. There is, however, a downward shift of resonance frequency and an increase of damping when P increases, as indicated by the FFT magnitudes of the "limit" kernel waveforms shown in panel (b) of Figure 4.26.
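The oscillatory regime described above is easy to reproduce by direct simulation of the loop with a large feedback gain, as in the sketch below; the first-order low-pass forward stage and all parameter values are hypothetical.

```python
import numpy as np

# Simulate the sigmoid-feedback loop for a very large feedback gain eps,
# illustrating the input-independent oscillation discussed in the text.
rng = np.random.default_rng(1)
dt, T, alpha, eps, P = 0.05, 5.0, 0.25, 100.0, 1.0
N = 10_000
x = rng.normal(0.0, np.sqrt(P / dt), N)
y = np.zeros(N)
for n in range(N - 1):
    fb = -eps * (2.0 / np.pi) * np.arctan(alpha * y[n])   # bounded negative feedback
    y[n + 1] = y[n] + dt * (x[n] + fb - y[n]) / T
print(y.max(), y.min())   # amplitude scales with eps, not with the input power
```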
Example 4.3. Positive Nonlinear Feedback

The reverse transition in the first-order Wiener kernel waveform is observed when the polarity of the weak nonlinear feedback is changed, as dictated by Equation (4.91). Positive decompressive (e.g., cubic) feedback leads to a decrease in resonance frequency and higher gain values in the resonant region. Also, the reverse transition in kernel waveform occurs (i.e., upward shift of resonance frequency and decrease of damping with increasing P values) when the compressive (e.g., sigmoid) feedback becomes positive. The great advantage of sigmoid versus cubic feedback is that stability of the system behavior is retained over a broader range of the input power level. For this reason, sigmoid (or other bounded) feedback is an appealing candidate for plausible models of physiological feedback systems. For those systems that exhibit transitions to broader bandwidth and decreased damping as P increases, candidate models may include either negative decompressive (e.g., cubic) or positive compressive (e.g., sigmoid) feedback. For those systems that exhibit the reverse transition pattern (i.e., to narrower bandwidth
Figure 4.26 The two limit waveforms of the first-order Wiener kernel for the negative sigmoid feedback system (\varepsilon = 1, \alpha = 0.25) with underdamped forward subsystem, obtained for P \to 0 and P \to \infty (a), and their FFT magnitudes (b). Observe the lower resonance frequency and increased damping for P \to \infty [Marmarelis, 1991].
and increased damping as P increases), candidate models may include either positive decompressive or negative compressive feedback.

Example 4.4. Second-Order Kernels of Nonlinear Feedback Systems

Our examples so far have employed nonlinear feedback with odd symmetry (cubic and sigmoid), and our attention has focused on first-order Wiener kernels because these systems do not have even-order kernels. However, if the feedback nonlinearity is not odd-symmetric, then even-order kernels exist. An example of this is given for negative quadratic feedback of the form \varepsilon y^2 (for \varepsilon = 0.08, P = 1), where the previous band-pass (underdamped) forward subsystem is used. The resulting second-order Wiener kernel is shown in Figure 4.27 and has the form and size predicted by the analytical expression of Equation (4.87). The first-order Wiener kernel is not affected significantly by the quadratic feedback for small values of \varepsilon. It is important to note that Wiener analysis with nonzero GWN input mean yields even-order Wiener kernels (dependent on the nonzero input mean \mu), even for cubic or sigmoid feedback systems, because a nonzero input mean defines an "operating point" that breaks the odd symmetry of the cubic or sigmoid nonlinearity. For instance, a negative cubic feedback system, where only K_1 and K_3 are assumed to be significant for small values of \varepsilon, has the second-order Wiener kernel

H_2^{\mu}(\omega_1, \omega_2) = 3\mu K_3(\omega_1, \omega_2, 0) = -3\varepsilon \mu \gamma K_1(\omega_1) K_1(\omega_2) K_1(\omega_1 + \omega_2)   (4.97)
Equation (4.97) implies that the second-order Wiener kernel will retain its shape but increase linearly in size with increasing \mu (provided, of course, that \varepsilon is small).
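Equation (4.97) can be evaluated on a two-dimensional frequency grid, as sketched below with a hypothetical closed-loop K_1(\omega).

```python
import numpy as np

# Equation (4.97): second-order Wiener kernel induced by a nonzero input mean
# in a negative cubic feedback system (illustrative K1).
w = np.linspace(-2.0, 2.0, 101)
K1 = lambda w: 1.0 / (1.0 + 1j * 5.0 * w)    # hypothetical closed-loop K1(omega)
eps, mu = 0.001, 1.0
gamma = K1(0.0).real                         # gamma = K1(0), the integral of k1

W1, W2 = np.meshgrid(w, w)
H2 = -3.0 * eps * mu * gamma * K1(W1) * K1(W2) * K1(W1 + W2)
print(np.abs(H2).max())   # the kernel scales linearly with mu, as noted above
```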
Figure 4.27 Second-order Wiener kernel of the negative quadratic feedback system with a band-pass (underdamped) forward subsystem (\varepsilon = 0.08, P = 1) [Marmarelis, 1991].
Nonlinear Feedback in Sensory Systems. The presented analysis of nonlinear feedback systems is useful in interpreting the Wiener kernel measurements obtained for certain visual and auditory systems under various conditions of GWN stimulation, as discussed below. The Wiener approach has been applied extensively to the study of retinal cells using band-limited GWN stimuli [e.g., Marmarelis & Naka, 1972, 1973a, b, c, d, 1974a, b]. In these studies, the experimental stimulus consists of band-limited GWN modulation of light intensity about a constant level of illumination, and the response is the intracellular or extracellular potential of a certain retinal cell (receptor, horizontal, bipolar, amacrine, ganglion). Wiener kernels (typically of first and second order) are subsequently computed from the stimulus-response data. The experiments are typically repeated for different levels of mean illumination (input mean) and various GWN input power levels in order to cover the entire physiological range of interest. It has been observed that the waveform of the resulting kernels generally varies with different input mean and/or power level. We propose that these changes in waveform may be explained by the presence of a nonlinear feedback mechanism, in accordance with the previous analysis. Note that these changes cannot be explained by the simple cascade models discussed in Section 4.1.2. The first such observation was made in the early 1970s [Marmarelis & Naka, 1973b] on the changing waveform of first-order Wiener kernel estimates of horizontal cells in the catfish retina, obtained for two different levels of stimulation (low and high mean illumination levels with proportional GWN modulation). The kernel corresponding to the high level of stimulation was less damped and had shorter latency (shorter peak-response time), as shown in Figure 6.1. This observation was repeated later (e.g., [Naka et al., 1988; Sakai & Naka, 1985, 1987a, b]) for graduated values of increasing P and \mu. The observed changes are qualitatively similar to the ones observed in our simulations of negative decompressive (cubic) feedback systems with an overdamped forward subsystem. However, the changes in latency time and kernel size are much more pronounced in the experimental kernels than in our simulations of negative cubic feedback systems presented earlier. To account for the greater reduction in kernel size observed experimentally, we may introduce a compressive (static) nonlinearity in cascade with the overall feedback system that leads to an additional reduction of the gain of the overall cascade system as P and/or \mu increase. On the other hand, a greater reduction in the peak-response (latency) time may require the introduction of another dynamic component in cascade with the feedback system. Led by these observations, we propose the modular (block-structured) model, shown in Figure 4.28, for the light-to-horizontal-cell system [Marmarelis, 1987d, 1991]. This model is comprised of a cascade of three negative decompressive (cubic) feedback subsystems with different overdamped forward components (shown in Figure 4.29) and a compressive (sigmoidal) nonlinearity between the outer and the inner segments of the photoreceptor model component. The first part of this cascade model, comprised of the PL/PN feedback loop and the compressive nonlinearity CN, corresponds to the transformations taking place in the outer segment of the photoreceptor and represents the nonlinear dynamics of the phototransduction process. The second part, comprised of the RL/RN feedback loop, represents the nonlinear dynamic transformations taking place in the inner segment of the photoreceptor (including the receptor terminals). The third part, comprised of the HL/HN feedback loop, represents the nonlinear dynamic transformations taking place in the horizontal cell and its synaptic junction with the receptor. Note that this model does not differentiate between cone and rod receptors and does not take into account spatial interactions or the triadic synapse with bipolar cells (see below).
Figure 4.28 Schematic of the modular (block-structured) model of the light-to-horizontal-cell system. Input x(t) represents the light stimulus and output y(t) the horizontal cell response. Each of the three cascaded segments of the model contains negative decompressive feedback and an overdamped forward component (shown in Figure 4.29). The static nonlinearity CN between the outer and inner segments of the receptor model component is compressive (sigmoidal) [Marmarelis, 1991].
The first-order Wiener kernels of this model are shown in Figure 4.30 for GWN input power levels P = 0.5, 1, 2, and 4, for the parameter values indicated in the caption of Figure 4.29. We observe kernel waveform changes that resemble closely the experimentally observed changes (note that hyperpolarization is plotted as a positive deflection) that are discussed in Section 6.1.1. Since experimentally obtained horizontal-cell kernels are usually plotted in the contrast sensitivity scale (i.e., scaled by the respective GWN input power level), we show in the same figure the kernels plotted in the contrast sensitivity scale. The purpose of this demonstration is to show that the experimentally observed kernel waveform changes can be reproduced fairly well by a model of this form employing nonlinear feedback. The selected model components and parameters are dependent on the specific species, and the precise parameter values can be determined by repeated experiments (for different values of P and \mu) and kernel analysis in the presented context for each particular physiological preparation. We can extend the light-to-horizontal-cell model to explain the experimentally observed changes in the waveform of first-order Wiener kernels of the light-to-bipolar-cell system (for increasing GWN input power level) [Marmarelis, 1991]. As shown in Figure 4.31, the response of the horizontal cell is subtracted from the response of the receptor (inner segment), and the resulting signal is passed through a nonlinear feedback component representing the triadic synapses (from the receptor terminals to the horizontal processes and bipolar dendrites) as well as the transformation of the postsynaptic potential through the bipolar dendrites. The resulting first-order Wiener kernels are similar to the experimentally observed ones (i.e., shorter latency, increased bandwidth, and increased sensitivity with increasing P) that are presented in Section 6.1.1. Beyond these mechanistic explanations, an important scientific question can be posed about the teleological reasons for the existence of decompressive feedback in retinal cells, in tandem with compressive nonlinearities. The presented analysis suggests that this is an effective functional design that secures sensory transduction over a very broad range of stimulus intensities while, at the same time, providing adequate (practically undiminishing) dynamic range of operation about a dynamically changing operating point (attuned to changing stimulus conditions). Furthermore, the gradual transition of the system functional characteristics toward faster response when the stimulus intensity and temporal changes are greater would be a suitable attribute for a sensory system that has evolved
Figure 4.29 Impulse response functions of the linear overdamped forward components PL (top), RL (middle), and HL (bottom) used in the model of Figure 4.28. Note that the feedback nonlinearities PN, RN, and HN used in the model are decompressive (cubic) with coefficients \varepsilon = 0.05, 0.10, and 0.01, respectively. The static nonlinearity CN is compressive (sigmoidal) of the form described by Equation (4.96) with \alpha = 0.2 (\varepsilon = 2) [Marmarelis, 1991].
Figure 4.30 (a) First-order Wiener kernels of the light-to-horizontal-cell model shown in Figure 4.28, for P = 0.5, 1, 2, and 4. Observe the gradual transition in kernel waveform akin to the experimentally observed one. (b) The same kernels plotted in contrast sensitivity scale (i.e., each kernel scaled by its corresponding power level) [Marmarelis, 1991].
Figure 4.31 Schematic of the modular (block-structured) model of the light-to-bipolar cell system described in the text [Marmarelis, 1991].
under the requirements of rapid detection of changes in the visual field (threat detection) for survival purposes. Another interesting example of a sensory system is the auditory nerve fibers, whose band-pass first-order Wiener kernels undergo a transition to lower resonance frequencies as the input power level increases [Marmarelis, 1991]. This has been observed experimentally in primary auditory nerve fibers that have center (resonance) frequencies between 1.5 and 6 kHz [Moller, 1983; Lewis et al., 2002a, b], as discussed in Section 6.1.3. To explore whether nonlinear feedback may constitute a plausible model in this case, we consider a band-pass linear forward subsystem and negative sigmoid feedback, like the one discussed earlier. The obtained first-order Wiener kernel estimates for GWN input power levels P = 1, 16, 256, and 4096 are shown in Figure 4.32 (with appropriate plotting offsets to allow easier visual inspection) and replicate the gradual shift in resonance frequency and contraction of the "envelope" of the kernel with increasing P, which were also observed experimentally [Moller, 1975, 1977, 1978]. Since these changes are more easily seen in the frequency domain (the preferred domain in auditory studies), we also show the FFT magnitudes of these kernels in Figure 4.32. Note that the latter are approximate "inverted tuning curves" exhibiting decreasing resonance frequency and broadening fractional bandwidth as P increases, similar to the experimental observations in auditory nerve fibers. This nonlinear feedback model appears to capture the essential functional characteristics of primary auditory nerve fibers that have been observed experimentally. The negative compressive feedback can be thought of as intensity-reduced stiffness, which has been observed in studies of the transduction properties of cochlear hair cells. Accurate quantitative measures of the functional components and parameters of this feedback system (e.g., the precise form of the feedback nonlinearity) can be obtained on the basis of the analysis presented earlier and will require a series of properly designed experiments for various values of P. Furthermore, the presence of a negative compressive feedback in the auditory fiber response characteristics may provide a plausible explanation for the onset of pathological conditions such as tinnitus, a situation in which the strength of the negative compressive feedback increases beyond normal values and leads to undiminishing oscillatory behavior irrespective of the specific auditory input waveform, as demonstrated earlier (see Figure 4.25).
Concluding Remarks on Nonlinear Feedback. Nonlinear feedback has long been thought to exist in many important physiological systems and to be of critical importance for maintaining proper physiological function.
Figure 4.32 (a) First-order Wiener kernels of the negative sigmoid feedback system (\varepsilon = 1, \alpha = 0.25) with a band-pass forward component, for P = 1 (trace 1), 16 (trace 2), 256 (trace 3), and 4096 (trace 4), emulating primary auditory fibers. Observe the contracting envelope and decreasing resonance frequency as P increases (arbitrary offsets for easier inspection). (b) FFT magnitudes of the kernels shown in (a). We observe decreasing resonance frequency and gain as P increases, as well as broadening of the tuning curve in reverse relation to the envelope of the kernel. When these curves are plotted in contrast sensitivity scale (i.e., each scaled by its corresponding P value), then the resonance-frequency gain will appear increasing with increasing P [Marmarelis, 1991].
However, its systematic and rigorous study has been hindered by the complexity of the subject matter and the inadequacy of practical methods of analysis. The study of Volterra-Wiener expansions of nonlinear differential equations has led to some analytical results that begin to shed light on the analysis of nonlinear feedback systems in a manner that advances our understanding of the system under study. The results obtained for a class of nonlinear feedback systems relate Volterra or Wiener kernel measurements to the effects of nonlinear feedback under various experimental conditions. Explicit mathematical expressions were derived that relate Wiener kernel measurements to the characteristics of the feedback system and the stimulus parameters. The theoretical results were tested with simulations, and their validity was demonstrated in a variety of cases (cubic and sigmoid feedback with overdamped or underdamped forward subsystem). These test cases were chosen so as to suggest possible interpretations of experimental results, including results that have been published in recent years for two types of sensory systems: retinal horizontal and bipolar cells, and primary auditory nerve fibers. It was shown that relatively simple nonlinear feedback models can reproduce the qualitative changes in kernel waveforms observed experimentally in these sensory systems. Precise quantitative determination of the parameters of the feedback models requires analysis (in the presented context) of data from a series of properly designed experiments. Specifically, it was shown that negative decompressive feedback (e.g., cubic) or positive compressive feedback (e.g., sigmoid) results in gradually decreasing damping (increasing bandwidth) of the first-order Wiener kernel as the GWN input power level and/or mean level increase. Conversely, positive decompressive or negative compressive feedback results in the reverse pattern of changes. The extent of these effects depends, of course, on the exact type of feedback nonlinearity and/or the dynamics of the linear forward subsystem. It was demonstrated through analysis and computer simulations that the experimentally observed changes in the waveform of the first-order Wiener kernel measurements for retinal horizontal and bipolar cells can be explained with the use of negative decompressive (cubic) feedback and low-pass forward subsystems (viz., the gradual transition from an overdamped to an underdamped mode as the GWN stimulus power and/or mean level increase). In the case of auditory nerve fibers, it was shown that the use of negative compressive (sigmoid) feedback and a band-pass forward subsystem can reproduce the effects observed experimentally on their "tuning curves" for increasing stimulus intensity (viz., a gradual downward shift of the resonance frequency and broadening of the bandwidth of the tuning curve with increasing stimulus power level) [Marmarelis, 1991]. It is hoped that this work will inseminate an interest among systems physiologists to explore the possibility of nonlinear feedback models in order to explain changes in response characteristics when the experimental stimulus conditions vary. This is critical when such changes cannot be explained by simple cascade models of linear and static nonlinear components (like the ones discussed earlier), which are currently popular in efforts to construct equivalent block-structured models from kernel measurements.
For instance, in the case of the auditory nerve fibers, the suggested model of negative sigmoid feedback may offer a plausible explanation for pathological states of the auditory system, such as tinnitus. Likewise, in the case of retinal cells, negative decompressive feedback in tandem with compressive nonlinearities may explain the ability of the "front end" of the visual system to accommodate a very broad range of visual stimulus intensities while preserving adequate dynamic range for effective information processing, as well as retaining the ability to respond rapidly to changes in stimulus intensity.
4.2 CONNECTIONIST MODELS
The idea behind connectionist models is that the relationships among variables of interest can be represented in the form of connected graphs with generic architectures, so that claims of universal applicability can be supported for certain broad classes of problems. The most celebrated example of this approach has been the class of "artificial neural networks" (ANN) with forward and/or recurrent (feedback) interconnections. The latter types with recurrent interconnections have found certain specialized applications (e.g., the Hopfield-net solution to the notorious "traveling salesman problem") and have made (largely unsubstantiated) claims of affinity to biological neural networks. However, it is fair to say that they have not lived up to their promise (until now) and their current use in modeling applications is rather limited. The former types of ANN with forward interconnections have found (and continue to find) numerous applications and have demonstrated considerable utility in various fields. These types of ANN can be used for modeling purposes by representing arbitrary input-output mappings, and they derive their scientific pedigree from Hilbert's "13th Problem" and Kolmogorov's "representation theorem" of the early part of the 20th century [Kolmogorov, 1957; Sprecher, 1972]. The fundamental mathematical problem concerns the mapping of a multivariate function onto a univariate function by means of a reference set of "activation functions" and interconnection weights. Kolmogorov's constructive theorem provided theoretical impetus to this effort, but the practical solution of this problem came through the methodological evolution of the concept of a "perceptron," which was proposed by Rosenblatt (1962). The field was further advanced through the pioneering work of Widrow, Grossberg, and the contributions of numerous others, leading to the burgeoning field of feedforward ANN (for review see [Rumelhart & McClelland, 1986; Grossberg, 1988; Widrow & Lehr, 1990; Haykin, 1994; Hassoun, 1995]). The adjective "neural" is used for historical reasons, since some of the pioneering work alluded to similarities with information processing in the central nervous system, a point that remains conjectural and largely wishful rather than corroborated by real data in a convincing manner (yet). Nonetheless, the mere allusion to analogies with natural "thinking processes" seems to magnetize people's attention and to lower the threshold of initial acceptance of related ideas. Although this practice was proven to offer promotional advantages, the tenuous connection with reality and a certain distaste for promotional hype has led us to dispense with this adjective in the sequel and refer to this type of connectionist model as a "Volterra-equivalent network" (VEN). For our purposes, the problem of nonlinear system modeling from input-output data relates intimately to the problem of mapping multivariate functions onto a univariate function, when the input-output data are discretized (sampled). Since the latter is always the case in practice, we have explored the use of Volterra-equivalent network architectures as an alternative approach to achieve nonlinear system modeling in a practical context. We have found that certain architectures offer practical advantages in some cases, as discussed below.
4.2.1 Equivalence between Connectionist and Volterra Models
We start by exploring the conditions for equivalence between feedforward connectionist models and discrete Volterra models [Marmarelis, 1994c; Marmarelis & Zhao, 1994, 1997]. The latter generally represent a mapping of the input epoch vector x(n) = [x(n),
x(n - 1), \ldots, x(n - M + 1)]^T onto the present scalar value of the output, y(n), where M is the memory-bandwidth product of the system. This mapping of the [M \times 1] input epoch vector onto the present scalar value of the output can be expressed in terms of the discrete Volterra series expansion of Equation (2.32). On the other hand, this mapping can be implemented by means of a feedforward network architecture that receives as input the input epoch vector and generates as output the scalar value y(n). The general architecture of a "Volterra-equivalent network" (VEN) is shown in Figure 4.33 and employs an input layer of M units (introducing the input epoch values into the network) and two hidden layers of units (that apply nonlinear transformations on weighted sums of the input values). The VEN output is formed by the sum of the outputs of the hidden units of the second layer and an offset value. A more restricted but practically useful VEN class follows the general architecture of a "three-layer perceptron" (TLP) with a "tapped-delay" input and a single hidden layer (shown in Figure 4.34) that utilizes polynomial activation functions instead of the conventional sigmoidal activation functions employed by TLP and other ANN. The output may have its own nonlinearity (e.g., a hard threshold in the initial perceptron architecture that generated binary outputs, consistent with the all-or-none data modality of action potentials in the nervous system).
Figure 4.33 The general architecture of a "Volterra-equivalent network" (VEN) receiving the input epoch and generating the corresponding system output after processing through two hidden layers. Each hidden unit performs a polynomial transformation of the weighted sum of its inputs. Arbitrary activation functions can be used and be approximated by polynomial expressions within the range of interest. The output unit performs a simple summation of the outputs of the second hidden layer (also called "interaction layer") and an output offset.
The TLP architecture shown in Figure 4.34 corresponds to the class of "separable Volterra networks" (SVNs), whose basic operations are described below [Marmarelis & Zhao, 1997]. The weights \{w_{j,m}\} are used to form the input u_j(n) into the nonlinear "activation function" f_j of the jth hidden unit as

u_j(n) = \sum_{m=0}^{M-1} w_{j,m} x(n - m)   (4.98)
leading to the output of the jth hidden unit:

z_j(n) = f_j[u_j(n)]   (4.99)
where the activation function f_j is a static nonlinear function. For SVN or VEN models, the activation functions are chosen to be polynomials:

f_j(u_j) = \sum_{q=1}^{Q} c_{j,q} u_j^q   (4.100)
although any analytic function or nonanalytic function approximated by polynomials (or power series) can be used.
Figure 4.34 The single-layer architecture of the special class of VEN that corresponds to the "separable Volterra network" (SVN). The output unit is a simple adder. This network configuration is similar to the traditional "three-layer perceptron" (TLP), albeit with polynomial activation functions \{f_j\} in the hidden units, instead of the conventional sigmoidal activation functions used in TLP.
The SVN/VEN output unit is a simple adder (i.e., no output weights are necessary) that sums the outputs \{z_j\} of the hidden units and an offset y_0 as

y(n) = y_0 + \sum_{j=1}^{H} z_j(n)   (4.101)
Combining Equations (4.98), (4.99), (4.100), and (4.101), we obtain the SVN/VEN input-output relation:

y(n) = y_0 + \sum_{q=1}^{Q} \sum_{j=1}^{H} c_{j,q} \sum_{m_1=0}^{M-1} \cdots \sum_{m_q=0}^{M-1} w_{j,m_1} \cdots w_{j,m_q} x(n - m_1) \cdots x(n - m_q)   (4.102)
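A minimal sketch of the SVN forward pass of Equations (4.98)-(4.102) is given below, with random weights and illustrative sizes; it also forms the equivalent second-order Volterra kernel, anticipating Equation (4.103) below.

```python
import numpy as np

# Minimal SVN forward pass per Equations (4.98)-(4.102); H hidden units,
# memory M, polynomial degree Q (all parameters random for illustration).
rng = np.random.default_rng(2)
M, H, Q = 16, 3, 2
W = rng.normal(size=(H, M))                # input weights w_{j,m}
C = rng.normal(size=(H, Q)) * 0.1          # polynomial coefficients c_{j,q}
y0 = 0.2                                   # output offset

def svn_output(x):
    # x: 1-D input signal; returns y(n) for n >= M-1.
    X = np.lib.stride_tricks.sliding_window_view(x, M)[:, ::-1]  # epochs x(n-m)
    U = X @ W.T                                                  # u_j(n), Eq. (4.98)
    Z = sum(C[:, q - 1] * U**q for q in range(1, Q + 1))         # z_j(n), Eq. (4.100)
    return y0 + Z.sum(axis=1)                                    # Eq. (4.101)

# Anticipating Equation (4.103): the equivalent second-order Volterra kernel.
k2 = np.einsum('j,jm,jn->mn', C[:, 1], W, W)

y = svn_output(rng.normal(size=1000))
print(y.shape, k2.shape)
```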
Equation (4.102) is isomorphic to the discrete Volterra model (DVM) of order Q as H tends to infinity, and it demonstrates the equivalence between SVN/VEN and DVM, which is expected to hold in practice (with satisfactory approximation) even for finite H. It has been found empirically that satisfactory DVM approximations can be obtained with small H for many physiological systems. It is evident that the discrete Volterra kernels can be evaluated by means of the SVN/VEN parameters as

k_q(m_1, \ldots, m_q) = \sum_{j=1}^{H} c_{j,q} w_{j,m_1} \cdots w_{j,m_q}   (4.103)
offering an alternative for Volterra kernel estimation through SVN/VEN training that has proven to have certain practical advantages [Marmarelis & Zhao, 1997]. Naturally, the same equivalence holds for the broader class of VEN models shown in Figure 4.33 as H and I tend to infinity. However, the Volterra kernel expressions become more complicated for the general form of the VEN model in Figure 4.33. The use of polynomial activation functions in connection with SVN/VEN models directly maintains the mathematical affinity with the Volterra models [Marmarelis, 1994c, 1997], and typically reduces the required number H of hidden units (relative to perceptron-type models). This is demonstrated below by comparing the two classes of models. The activation functions used for conventional TLP architectures are selected to have the sigmoidal shape of the "logistic" function:
S_j(u_j) = \frac{1}{1 + \exp[-\lambda(u_j - \theta_j)]}   (4.104)

or the "hyperbolic tangent" function:

S_j(u_j) = \frac{1 - \exp[-\lambda(u_j - \theta_j)]}{1 + \exp[-\lambda(u_j - \theta_j)]}   (4.105)
depending on whether we want a unipolar (between 0 and 1) or a bipolar (between -1 and +1) output, as u_j tends to \pm\infty. The parameter \lambda defines the slope of this sigmoidal curve at the inflection point u_j = \theta_j and is typically specified by the user in the conventional TLP architectures (i.e., it is not estimated from the data). However, the offset parameter \theta_j is
estimated from the data for each hidden unit separately, during the "training" process of the TLP. Since the specific value of \lambda can affect the stability and the convergence rate of the training algorithm (e.g., through the error back-propagation discussed in Section 4.2.2), we recommend that \lambda be trained along with the other network parameters (contrary to established practice). As \lambda increases, the sigmoidal function approaches a "hard threshold" at \theta_j and can be used in connection with binary output variables. In the latter case, it is necessary to include output weights and a hard threshold \theta_0 in the output unit as
y(n) = T_0\left[\sum_{j=1}^{H} r_j z_j(n)\right]   (4.106)
where \{r_j\} are the output weights and T_0 denotes a hard-threshold operator (i.e., equal to 1 when its argument is greater than a threshold value \theta_0, and 0 otherwise). In order to simplify the comparison and the study of equivalence between the SVN/VEN and the TLP class of models, we consider a TLP output unit without threshold:

y(n) = \sum_{j=1}^{H} r_j z_j(n)   (4.107)
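For comparison with the SVN sketch above, a corresponding TLP forward pass (Equations (4.98), (4.104), and (4.107)) is sketched below with logistic hidden units; all parameter values are random placeholders.

```python
import numpy as np

# Minimal TLP forward pass: sigmoidal hidden units with offsets theta_j and
# output weights r_j (illustrative sizes and random parameters).
rng = np.random.default_rng(3)
M, H, lam = 16, 8, 1.0
W = rng.normal(size=(H, M))          # input weights w_{j,m}
theta = rng.normal(size=H)           # offsets theta_j
r = rng.normal(size=H)               # output weights r_j

def tlp_output(x):
    X = np.lib.stride_tricks.sliding_window_view(x, M)[:, ::-1]
    U = X @ W.T                                    # u_j(n), Eq. (4.98)
    S = 1.0 / (1.0 + np.exp(-lam * (U - theta)))   # logistic units, Eq. (4.104)
    return S @ r                                   # Eq. (4.107), no output threshold

print(tlp_output(rng.normal(size=500)).shape)
```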
Then, combining Equations (4.98), (4.99), (4.104), and (4.107), we obtain the input-output relation of the TLP model:

y(n) = \sum_{j=1}^{H} r_j S_j\left[\sum_{m=0}^{M-1} w_{j,m} x(n - m)\right]   (4.108)

which can be put in a DVM form by representing each sigmoidal activation function S_j with its respective Taylor expansion:

S_j(u_j) = \sum_{i=0}^{\infty} a_{i,j}(\theta_j) u_j^i(n)   (4.109)

where the Taylor expansion coefficients depend on the offset parameter \theta_j of the sigmoidal activation function (and on the slope parameter \lambda, if it is allowed to be trained). If the selected activation function is not analytic (e.g., a hard-threshold function that does not have a proper Taylor expansion), then a polynomial approximation of arbitrary accuracy can be obtained according to the Weierstrass theorem, following the method presented in Appendix I. Combining Equations (4.108) and (4.109), we obtain the equivalent DVM:
y(n) = \sum_{i=0}^{\infty} \sum_{j=1}^{H} r_j a_{i,j}(\theta_j) \sum_{m_1=0}^{M-1} \cdots \sum_{m_i=0}^{M-1} w_{j,m_1} \cdots w_{j,m_i} x(n - m_1) \cdots x(n - m_i)   (4.110)
for this class of TLP models, where the ith-order discrete Volterra kernel is given by

k_i(m_1, \ldots, m_i) = \sum_{j=1}^{H} r_j a_{i,j}(\theta_j) w_{j,m_1} \cdots w_{j,m_i}   (4.111)
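Equation (4.111) can be evaluated once the Taylor coefficients a_{i,j}(\theta_j) are available. In the sketch below they are approximated by a local least-squares polynomial fit of each logistic unit (a stand-in for the exact Taylor series); the network parameters are random placeholders.

```python
import numpy as np

# Approximate the Taylor coefficients a_{i,j}(theta_j) of each logistic unit
# around u = 0, then form the equivalent first-order kernel per Equation (4.111).
rng = np.random.default_rng(4)
M, H, lam, I = 16, 8, 1.0, 5
W = rng.normal(size=(H, M))
theta = rng.normal(size=H)
r = rng.normal(size=H)

u = np.linspace(-0.5, 0.5, 101)                        # local expansion interval
k1 = np.zeros(M)
for j in range(H):
    s = 1.0 / (1.0 + np.exp(-lam * (u - theta[j])))    # S_j(u)
    a_j = np.polynomial.polynomial.polyfit(u, s, I)    # a_{0,j}, ..., a_{I,j}
    k1 += r[j] * a_j[1] * W[j]                         # i = 1 term of Eq. (4.111)
print(k1)
```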
Therefore, an SVN/VEN or a TLP model has an equivalent DVM whose kernels are defined by Equation (4.103) or (4.111), respectively. The possible presence of an activation function at the output unit (e.g., a hard threshold) does not alter this fundamental fact, but it
makes the analytical expressions for the equivalent Volterra kernels more complicated. The fundamental question remains as to the relative efficiency of an equivalent SVN/VEN or TLP representation of a given DVM. This question can be answered by considering the total number of free parameters for each model type that yields the same approximation of the kernel values for a Qth-order DVM. It has been found that the TLP model generally requires a much larger number of hidden units and therefore more free parameters. This important point can be elucidated geometrically by introducing a hard threshold at the output of both models and letting \lambda \to \infty in the sigmoidal activation functions of the TLP, so that each S_j becomes a hard-threshold operator T_j(u_j - \theta_j). Furthermore, to facilitate the demonstration, consider the simple case of Q = 2 and M = 2, where the input-output relation before the application of the output threshold is simply given by

y(n) = x^2(n) + x^2(n - 1)   (4.112)
Application of a hard threshold \theta_0 = 1 at the output of this DVM yields a circular binary boundary defining the input-output relation, as shown in Figure 4.35. To approximate this input-output relation with a TLP architecture, we need to use a very large number H of hidden units, since each hidden unit yields a rectilinear segment after network training
Figure 4.35 Illustrative example of a circular output "trigger boundary" (solid line) being approximated by a three-layer perceptron (TLP) with three hidden units defining the piecewise rectilinear (triangular) approximation of the "trigger boundary" marked by the dotted lines. The training set is generated by 500 data points of uniform white-noise input that lies within the square domain demarcated by dashed lines. The piecewise rectilinear approximation improves with an increasing number of hidden units of the TLP, assuming polygonal form and approaching asymptotically more precise representations of the circular boundary. Nonetheless, a VEN with two hidden units having quadratic activation functions yields a precise and parsimonious representation of the circular boundary.
4.2
CONNECTIONIST MODELS
229
für the best mean-square approximation of the output, according to the binary output approximation: H
Irj1j(uj
- (}j)
== (Ja
(4.113)
j=l
where 1j(uj - (}j) = 1 when wj,oX(n) + wj,}x(n equation, Wj,oX(n)
1)~ (}j'
otherwise T, is zero. Thus, the linear
+ wj,}x(n - 1) = (Jj
(4.114)
defines each rectilinear segment of the output approximation due to the jth hidden unit, and the resulting TLP polygonal approximation is defined by the output Equation (4.113). For instance, if His only 3, then a TLP triangular approximation is shown in Figure 4.35 with dotted line. If the training ofthe TLP is perfeet, then the nonoverlapping area is minimized in a mean-square sense and the resulting polygonal approximation is canonical (i.e., symmetrie for asymmetrie boundary). This, of course, is seldom the case in practice because the TLP training is imperfect due to noise in the data or incomplete training convergence, and an approximation similar to the one depicted in Figure 4.35 typically emerges. Naturally, as H increases, the polygonal TLP approximation of the circle improves, reaching asymptotically aperfect representation when H ~ 00 and if the training data set is noise-free and fully representative of the actual input ensemble of the system. Note however that a perfect SVNIVEN representation of this input-output relation (for noise-free data) requires only two hidden units with quadratic activation functions! This simple illustrative example punctuates a very important point that fonns the foundation of understanding the relative efficacy of SVNNEN and TLP models in representing nonlinear dynamic input-output relationships/mappings. The key point is that the use of sigmoidal activation functions unduly constrains the ability of the network to represent a broad variety of input-output mappings with a small number of hidden units. This is true even when multiple hidden layers are used, because the aforementioned rectilinear constraint remains in force. On the other hand, if polynomial activation functions are used, then a much broader variety of input-output mappings can be represented by a small number ofhidden units. This potential parsimony in the complexity of the required network architecture (in terms of the number of hidden units) is of critical importance in practice, because the complexity of the modeling task is direct1y related to the number of hidden units (both in terms of estimation and interpretation). This holds true for single or multiple hidden layers. However, the use ofpolynomial activation functions may give rise to some additional problems in the network training process by introducing more local minima during minimization of the cost function. It is also contrary to Kolmogorov's constructionist approach in his representation theorem (requiring monotonie activation functions) that retains a degree of reverence within the peer community. With all due respect to Kolmogorov's seminal contributions, it is claimed herein that nonmonotonic activation functions (such as polynomials) offer, on balance, a more efficient approach to the problem of modeling arbitrary input-output mappings with feedforward network architectures. This proposition gives rise to a new class of feedforward Volterra-equivalent network architectures that employ polynomial (or generally nonmonotomic) activation functions
230
MODULAR AND CONNECTIONIST MODELING
and can be efficient models of nonlinear input-output mappings. Note that this is consistent with Gabor's proposition of a "universal" input-output model, akin in form to a discrete Volterra model [Eykhoff, 1974]. Relation with PDM Modeling. If a feedforward network architecture with polynomial activation functions is shown to be appropriate for a certain system, then the number of hidden units in the first hidden layer defines the number of PDMs of this system. This is easily proven for the single hidden-layer SVNNEN models shown in Figure 4.34 by considering as the jth PDM output the "internal variable" uj(n) of the jth hidden unit given by the convolution ofEquation (4.98), where thejth PDMpj(m) is defined by the respective weights {wj,m} that form a discrete "impulse response function." This internal variable uj(n) is subsequently transformed by the polynomial activation function !j(Uj) to generate the output of the jth hidden unit zj(n) according to Equation (4.100), where the polynomial coefficients {Cj,q} are estimated from the data during SVNNEN training. It is evident that the equivalent Volterra kernels of the SVNNEN model are given by H
kq(mh ... , mq) =
I
Cj,qpj(ml) ... Pj(m q)
(4.115)
j=1
Therefore, this network model corresponds to the case of a "separable PDM model" where the static nonlinearity associated with the PDMs can be "separated" into individual polynomial nonlinearities corresponding to each PDM and defined by the respective activation functions, as shown in Figure 4.34. This separable PDM model can be viewed as a special case of the general PDM model of Figure 4.1 and corresponds to the "separable Volterra network" (SVN) architecture. The PDMs capture the system dynamics in a most efficient manner (although other filterbanks can be also used) and the nonlinearities may or may not be separable. In the latter case, the general YEN model ofFigure 4.33 must be used to represent the general non-separable PDM model ofFigure 4.1. The SVN architecture is obviously very convenient in practice but cannot be proposed as having general applicability. Even though it has been found to be appropriate for many actual applications to date (see Chapter 6), the general YEN model would require multiple hidden layers that can represent static nonlinearities of arbitrary complexity in the PDM model. The additional hidden layers may incorporate other analytic functions (such as sigmoidal, Gaussian, etc.), although the polynomial functions would yield directly the familiar multinomial form of the modified Volterra model with crossterms. The relation of PDM models with the VEN architectures also points to the relation with the modified discrete Volterra (MDV) models that employ kernel expansions on selected bases. In this case, the input can be viewed as being preprocessed through the respective filterbank prior to weighting and processing by the hidden layer, resulting in the architecture ofFigure 4.36. The only difference between this and the previous case ofthe YEN model shown in Figure 4.33 is that the internal variables ofthe first hidden layer are now weighted sums of the filterbank outputs {vln)}: L
u}{n) =
I
/=1
wj,/vln)
(4.116)
4.2
CONNECTIONIST MODELS
x(n)
I ~
i
231
INPUT
1 ~
... fb;l ... ~ ~) ~)
FILTERBANK
HIDDEN LAYER
INTERACTION LAYER
y(n) Figure 4.36
OUTPUT
The VEN model architecture for input preprocessing by the filter bank {b,}.
instead ofthe weighted sum ofthe input lags shown in Equation (4.98). The eritieal point is that when L ~ M, YEN model parsimony results. By definition, the use ofthe PDMs in the filterbank yields the most parsimonious VEN model (minimum L). In order to establish a clear terminology, we will use the term SVN for the YEN model of Figure 4.34 with a single hidden layer and polynomial activation funetions, and the term TLP when the aetivation functions are sigmoidal. Note that the VEN model ean generally have multiple hidden layers to represent arbitrary nonlinearities (not neeessarily separable), as shown in Figure 4.33 and Figure 4.36. The important praetieal issue of how we determine the appropriate memory-bandwidth produet M and degree of polynomial nonlinearity Q in the aetivation funetions of the SVN model is addressed by preliminary experiments diseussed in Seetion 5.2. For instanee, the degree of polynomial nonlinearity ean be established by preliminary testing of the system with sinusoidal inputs and subsequent determination of the highest harmonie in the output via diserete Fourier transform, or by varying the power level of a white-noise input and fitting the resulting output varianee to a polynomial expression. One other eritieal issue, the number of hidden units ean be determined by sueeessive trials in ascending order and applieation of the statistieal eriterion of Seetion 2.3.1 on the resulting reduetion of residual varianee. Note that the input weights in the SVN model are normalized to the unity Euelidean norm for eaeh hidden unit so that the polynomial eoeffieients give a direet measure of the relative importanee of eaeh hidden unit.
232
MODULAR AND CONNECTJONIST MODELING
Illustrative examples are given below for a second-order and an infinite-order simulated system with two PDMs [Marmarelis, 1997; Marmarelis & Zhao, 1997].
Illustrative Examples. First we consider a second-order Volterra system with memory-bandwidth product M = 25, having the first-order kerne1 shown in Figure 4.37 with a solid line and the second-order kerne1 similar to the one shown in the top panel of Figure 4.38 . This system is simulated using a uniform white-noise input of 500 data points. We estimate the first-order and second-order Volterra kernels of this system using TLP and SVN models, as well as LET, which was introduced in Section 2.3 .2 to improve Volterra kerne1 estimation by use of Laguerre expansions of the kemels and least-squares estimation ofthe expansion coefficients [Marmarelis, 1993]. In the noise-free case , the LET and SVN approaches yie1d precise first-order and second-order Volterra kernel estimates , although at considerab1y different computationa1 cost (LET is about 20 times faster than SVN in this case) . Note that the LET approach requires five discrete Laguerre functions (DLFs) in this example (i.e., 21 free parameters need be estimated) while the SVN approach needs only one hidden unit with a second-degree activation function (resulting in 28 free parameters). As expected, the TLP model requires more free parameters in this example, (i.e., more hidden units) and its predictive accuracy is rather inferior, although it incrementally improves with increasing number H ofhidden units . This incremental improvement gradually diminishes, because of the finite data record. Since the computationa1 burden for net-
1sr-ORDER KERNELS FOR NOISY CASE (SNR=O dB)
0.42 0.358 0.296 0.234 0 .172 0.1 1
/
"
~\
0
f . >: -0.014 -0.076
.\ / .'
v
- 0.138 -0.2 0
5
10
15
20
25
TIME lAG
Figure 4.37 The exact first-order Volterra kernel (solid line) and the three estimates obtained in the noisy case (SNR = 0 dB) via LET (dashed line), SVN (dot-dashed line), and TLP (dotted line). The LET estimate is the best in this example, followed closely by the SVN estimate in terms of accuracy. The TLP estimate (obtained with four hidden units) is the worst in accuracy and computationally most demand ing [Marmarelis & Zhao , 1997].
4.2
CONNECTIONIST MOD ELS
233
(a)
(b)
(c)
Figura 4.38 The second-order Volterra kemel estimates obta ined in the noisy case (SNR = 0 dB) via (a) LET, (b) SVN, and (c) TLP. The relative performance of the three methods is the same as described in the case of first-order kemeis (see capt ion of Figure 4.37) [Marmarelis & Zhao , 1997].
234
MODULAR AND CONNECTIONIST MODELING
work training increases with increasing H, we are faced with an important trade-off: incremental improvement in accuracy versus additional computational burden. By varying H, we determine a reasonable compromise für a TLP model with four hidden units, where the number of free parameters is 112 and the required training time is about 20 times longer than SVN (or 400 times longer than LET). The resulting TLP kerne I estimates are not as accurate as their SVN or LET counterparts, as illustrated in Figures 4.37 and 4.38 for the first-order and second-order kernels, respectively, for a signal-to-noise ratio of 0 dB in the output data (i.e., the output-additive independent GWN variance is equal to the noise-free, de-meaned output mean-square value). Note that the SVN training required 200 iterations in this example versus 2000 iterations required for TLP training. Thus, SVN appears to be preferable to TLP in terms of accuracy and computational effort in this example of a second-order Volterra system. The obtained Volterra kerneI estimates via the three methods (LET, SVN, TLP) demonstrate that the LET estimates are the most accurate and quiekest to obtain, followed by the SVN estimates in terms of accuracy and computation, although SVN requires longer computing time (by a factor of 20). The TLP estimates are clearly inferior to either LET or SVN estimates in this example and require longer computing time (about 20 times longer than SVN for H = 4). These results demonstrate the considerable benefits of using SVN configurations instead of TLP for Volterra system modeling purposes, although there may be some cases in which the TLP configuration has a natural advantage (e.g., systems with sigmoidal output nonlinearities). Although LET appears to yield the best kernel estimates, its application is practically limited to low-order kernels (up to third) and, therefore, it is the preferred method only for systems with low-order nonlinearites. On the other hand, SVN offers not only an attractive alternative for low-order kernel estimation and modeling, but also a unique practical solution when the system nonlinearities are 0/ high order. The latter constitutes the primary motivation for introducing the SVN configuration for nonlinear system modeling. To demonstrate the efficacy of SVN modeling for high-order systems, we consider an infinite-order nonlinear system described by the output equation y = (v} + 0.8~ - 0.6vIv2)sin[(v} + v2)/5]
(4.117)
where the sine function can be expressed as a Taylor series expansion, and the "internal" variables (v} , V2 ) are given by the difference equations: v}(n) = 1.2v}(n - 1) - 0.6v}(n - 2) + 0.5x(n - 1)
(4.118)
v2(n) = 1.8v2(n - 1) - 1.lv2(n - 2) + 0.2x(n - 3) + O.lx(n - 1) + O.lx(n - 2)
(4.119)
The discrete-time input signal x(n) is chosen in this simulation to be a 1024-point segment ofGWN with unit variance. Use ofLET with six DLFs to estimate the truncated secondorder and third-order Volterra models yields output predictions with normalized meansquare errors (NMSEs) of 47.2% and 34.7%, respectively. Note that the obtained kernel estimates are seriously biased because of the presence of higher-order terms in the output equation that are treated by LET as correlated residuals in least-squares estimation. Use of the SVN approach (employing five hidden units of seventh-degree) yields a model of improved prediction accuracy (NMSE = 6.1%) and mitigates the problem ofkemel estima-
4.2
CONNECTIONIST MODELS
235
tion bias by allowing estimation of nonlinear terms up to seventh order. Note that, although the selected system is of infinite order, the higher-order Volterra kerneis are of gradually diminishing size, consistent with the Taylor series expansion of the sine function. Training of a TLP model with these data yields less prediction accuracy than the SVN model for comparable numbers ofhidden units. For instance, a TLP model with H = 8 yields an output prediction NMSE of 10.3% (an error that can be gradually, but slowly, reduced by increasing the number of hidden units) corresponding to 216 free parameters, which compares with 156 free parameters for the aforementioned SVN model with H = 5 and Q = 7 that yields an output prediction NMSE of 6.1%.
4.2.2 Volterra-Equivalent Network Architectures for Nonlinear System Modeling This section discusses the basic principles and methods that govern the use of Volterraequivalent network (VEN) architectures for nonlinear system modeling. The previously established Volterra modeling framework will remain the mathematical foundation for evaluating the performance of alternative network architectures. The key principles that will be followed are 1. The network architecture must retain equivalence to the Volterra class of models. 2. Generality and parsimony will be sought, so that the model is compact but not unduly constrained. The study will be limited here to feedforward network architectures of single-input/single-output models for which broadband time-series data are available. The case of multiple inputs and outputs, as weIl as the case of autoregressive models with recurrent network connections, will be discussed in Chapters 7 and 10, respectively. The underlying physiological system is assumed to be stationary and belong to the Volterra class. The nonstationary case will be discussed in Chapter 9. The network architectures considered herein may have multiple hidden layers and arbitrary activation function (as long as the latter can be expressed or approximated by polynomials or Taylor series expansions in order to maintain equivalence with the Volterra class of models). For simplicity of representation, all considered networks will have a filterbank for input preprocessing, which can be replaced by the basis of sampling functions if no preprocessing is desired. In general, the selection of the filterbank will be assumed "judicious" in order to yield compact representations of the system kerneIs (see Section 2.3.1). Although the filterbank may incorporate trainable parameters (see next section on the Laguerre-Volterra network), this will not be a consideration here. With these stipulations in mind, we consider the general VEN architecture shown in Figure 4.36 that has two hidden layers, Vi} and {gi}, and a filterbank {b,} for input convolutional preprocessing according to M-l
vln) = Lblm)x(n-m)
(4.120)
m=O
The first hidden layer has H units with activation functions variables
Vi} transforming the internal
L
uj(n) = L wj,lvln) '=1
(4.121)
236
MODULAR AND CONNECTIONIST MODELING
into the hidden unit output Zj(n) = jj[uj(n)] Q
(4.122)
= ICj,quJ(n) q=1
The outputs {Zj(n)} of the first hidden layer are the inputs to the second hidden layer (also termed the "interaction layer") that has I units with activation functions {gi} transforming the ith internal variable H
c/Jln)
=
Ipi,~j(n)
(4.123)
j=1
into the ith interaction unit output l/Jln) = gi[ c/Jln)] R
=
I
(4.124)
'Yi,rc/Ji(n)
r=1
Note that R and/or Q may tend to infinity if the activation function is expressed as a Taylor series expansion. Therefore, activation functions other than polynomials (e.g., sigmoidal, exponential, sinusoidal) are admissible under this network architecture (note that monotonicity is not a requirement in contrast to the conventional approach). For instance, a sensible choice might involve polynomial activation functions in the first hidden layer (for the reason expounded in the previous section), but cascaded with sigmoidal activation functions to secure stability of the model output, as discussed in Section 4.4. It is evident that the presence of a second hidden layer distinguishes this architecture from the separable Volterra networks (SVN) discussed in the previous section and endows it with broader applicability. However, as previously for the SVN model, the "principal dynamic modes" (PDMs) corresponding to this VEN model architecture remain the equivalent filters generating the internal variables uj(n) of the first hidden layer, i.e., the jthPDM is L
Pj(m) =
I
wj,lblm)
(4.125)
pj(m)x(n - m)
(4.126)
/=1
and the PDM outputs M-l
uj(n) =
I
m=O
are fed into a multiinput static nonlinearity that maps the H PDM outputs {zl(n), ... , zIfn)} onto the VEN output yen) after transformation through the interaction layer. Thus, the nonseparable nonlinearity ofthe equivalent PDM model is represented by the cascaded operations of the hidden and interaction layers, yielding the input-output relation y(n)
I {H
[L
M-I
= Yo + ~ s, ~ Pi,Jj ~Wj,f ~o bf(m)x(n -m)
]}
(4.127)
4.2
CONNECTIONIST MODELS
237
which provides guidance for the application of the chain rule of differentiation for network training through the error back-propagation method (see below). When the quadratie eost function 1
(4.128)
J(n) = "2E2(n)
is sought to be minimized for all n in the training set of data, where E(n) = y(n) - y(n)
(4.129)
is the output prediction error (Y denotes the output measurements), we need to evaluate the gradient of the cost function with respect to all network parameters over the network parameter space, which is composed ofthe weights {Wj,l} and {Pi,j}' as weIl as the parameters ofthe activation functions {fj} and {gi}. For instance, the gradient component with respect to the weight Wk,s is aJ(n) aE(n) --=E(n)-l1l.vk,s awk,s
(4.130)
by application of the chain rule of differentiation we have aE(n)
ay(n)
~
l1l.vk,s
l1l.vk,s
i=l
af/>ln)
- - = - - = Lg:{cPln)} -
l1l.vk,s
I, H, auj(n) = Lgi{ cPln)} LPi,j!j[Uj(n)]-i=l j=l l1l.vk,s I
=
Lg:{ cPln)}Pi,J;[Uk(n)]vs(n) i=l
(4.131)
where f' and g' denote the derivatives oif and g, respectively. These gradient components are evaluated, of course, for the current parameter values that are continuously updated through the training proeedure. For instanee, the value ofthe weight wk,s is updated at the ith iteration as W
(i+ 1) _ (i) - W k.s k.s
'V
IW
raE
(n) ] (i) E(l)(n) .
- .:h.,
uvvk,s
(4.132)
where the gradient component is given by Equation (4.131), Jlw denotes the "training step" or "leaming constant" for the weights {Wj,l}, and the superscript (i) denotes quantities evaluated for the ith-iteration parameter values. The update schemes that are based on Ioeal gradient information usuaBy employ a "momentum" term that reduces the random variability from iteration to iteration by performing first-order low-pass filtering (exponentiaBy weighted smoothing). Analogous expressions can be developed for the other network parameters using the ehain rule of differentiation. A specific example is given in the following section for the Laguerre-Volterra network that has been used extensively in actual applications to date. In this section, we will concentrate on three key issues:
238
MODULAR AND CONNECTIONIST MODELING
1. Equivalence with Volterra kernels/models 2. Selection ofthe structural parameters ofthe network model 3. Convergence and accuracy ofthe training procedure Note that the training of the network is based on the training dataset (using either single error/residual points or summing many squared residuals in batch form); however, the cost function computation is based on the testing dataset (different residuals than the training dataset) that has been randomly selected according to the method described later in order to reduce possible correlations among the residuals. Note also that, if the actual prediction errors are not Gaussian, then a nonquadratic cost function can be used to attain efficient estimates of the network (i.e., minimum estimation variance). The appropriate cost function in this case is determined by the minus log-likelihood function ofthe actual prediction errors, as described in Section 2.1.5.
Equiva/ence with Vo/terra Kerne/sIMode/s. The input-output relation ofthe VEN model shown in Figure 4.36 is given by Equation (4.127). The equivalent Volterra model, when the activation functions are expressed either as polynomials or as Taylor series expansions, yields the Volterra kerne I expressions (4.133)
ko =Yo I
k 1( m ) =
I
l'i,1
i=1
I
k2( m
J, m2)
=
H
I
l'i,1
i=1
IPi'J~j,2 I
I
i=1
H
Yi,2
L
j=1
1=1
wj,lblm)
(4.134)
L
I
Wj,/IWj,/2bll(ml)bI2(m2) 11=1 12=1
j=1
I
+
L
H
IPi'J~j,1 I
H
I I
jl =1 j2=1
L
Pi,jl P i,j2 CjJ,I Cj2,1
L
I I
Wjl,/IWj2,/2bll(ml)bI2(m2) 11=1 12=1
(4.135)
The expressions for the higher-order kernels grow more complicated but are not needed in practice, since the interpretation ofhigh-order nonlinearities will rely on the PDM model form and not on individual kernels. It is evident that the complete representation of the general Volterra system will require an infinite number ofhidden units and filterbank basis functions. However, we posit that, for most physiological systems, finite numbers of L, H, and I will provide satisfactory model approximations. The same is posited for the order of nonlinearity, which is detennined by the product (QR) in the network ofFigure 4.36.
Se/ection of the Structura/ Parameters of the VEN Mode/. The selection of the structural parameters (L, H,.Q, I, R) that define the architecture ofthe VEN model in Figure 4.36 is a very crucial matter because it determines the ability ofthe network structure to approximate the function of the actual physiological system (with regard to the input-output mapping) for properly selected parameter values and for a broad ensemble of inputs. It should be clearly understood that the ability of a given network model to achieve a satisfactory approximation ofthe input-output mapping (with properly selected parameter values) is critically constrained by the selected network structure. In the case of the VEN models, this selection task is as formidable as it is crucial, because some of the model parameters enter nonlinearly in the estimation process, unlike
4.2
CONNECTIONIST MODELS
239
the case of the discrete Volterra model, where the parameters enter linearly and a model order selection criterion can be rigorously applied (see Section 2.3.1). Therefore, even if one assurnes that proper convergence can be achieved in the iterative cost-minimization procedure (discussed below), the issue of rigorously assessing the significance of residual reduction with increasing model complexity remains a formidable challenge. To address this issue, we establish the following guiding principles: 1. The assessment of significance of residual reduction must be statistical, since the data-contaminating noise/interference is expected to be stochastic (at least in part). 2. The simplest measure of residual reduction is the change in the sum of the squared residuals (SSR) as the model complexity increases (i.e., for increasing values of L, H, Q,I,R). 3. Proceeding in ascending model complexity, a statistical-hypothesis test is performed at each step, based on the "null hypothesis" that the current model structure is the right one and examining the residual reduction in the next step (i.e., the next model order) using a statistical criterion constructed under the null hypothesis. 4. To maximize the statistical independence of the model residuals used for the SSR computation (a fact that simplifies the construction of the statistical criterion by assuming whiteness ofthese residuals), we evaluate the SSR from randomly selected data points of the output (the "testing dataset") while using the remaining output data-points for network training (the "training dataset"). 5. The statistics of the residuals used for the SSR computation are assumed to be approximately Gaussian in order to simplify the statistical derivations and justify the use of a quadratic cost function. Based on these principles, we examine a sequence ofnetwork model structures {Sk} in ascending order of complexity, starting with L = H = Q = I = R = 1 and incrementing each structural parameter sequentially in the presented rightward order (i.e., first we increment L all the way to L m ax and then increment H, etc.). At the kth step, the network structure Sk is trained with the "training dataset" and the resulting SSR J k is computed from the "testing dataset." Because the residuals are assumed Gaussian and white (see principles 4 and 5 above), J k follows a chi-square distribution with degrees of freedom equal to the size of the "testing dataset" minus the number of free parameters in Sk. Subsequently, an F statistic can be used to test the ratio J;Jk+ 1 against a statistical threshold for a specified level of confidence. Ifthe threshold is not exceeded, then the null hypothesis is accepted and the network structure Sk is anointed the "right one" for this system; otherwise, the statistical testing procedure continues with the next network structure of higher complexity. This selection procedure appears straightforward but is subject to various pitfalls, rooted primarily in the stated assumptions regarding the nature of the actual residuals and their interrelationship with the specific input data used in this procedure. It is evident that the pitfalls are minimized when the input data are elose to band-limited white noise (covering the entire bandwidth and dynamic range of the system) and the actual residuals are truly white and Gaussian, as well as statistically independent from both the input and the output. 
The application ofthis procedure is demonstrated is Section 4.3 in connection with the Laguerre-Volterra network, which is the most widely used Volterra-equivalent network model to date.
240
MODULAR AND CONNECTIONIST MODELING
Convergence and Accuracy of the Training Procedure. Having selected the structural parameters (L, H, I, Q, R) of the YEN model, we must "train" it using the "training set" of input-output data. The verb "train" is used to indicate the iterative estimation of the VEN model parameters through minimization of a cost function defined by the "testing set" of the input-output data. As indicated above, the available input-output data are divided into a "training set" (typically about 80% of the total) and a complementary "testing set" using random sampling to maximize the statistical independence of the model prediction errors/residuals at the points of the testing set. This random sampling is also useful in mitigating the effects ofpossible nonstationarities in the system, as discussed in Chapter 9. Note that the input data is comprised of the vector of preprocessed data of the filter-bank. outputs v(n) = [Vt(n), ... , vL(n)]', which are contemporaneous with the corresponding output value y(n). Thus, the random sampling selects about 20% ofthe time indices for the testing set prior to commencing the training procedure on the basis of the remaining 80% inputoutput samples comprising the training set. For each data point in the training set, the output prediction residual is computed for the current values ofthe network parameters. This residual error is used to update the values of the VEN parameter estimates, based on a gradient-descent procedure, such as the one shown in Equation (4.132) or a variant of it, as discussed below. Search procedures, either deterministic or stochastic (e.g., genetic algorithms), are also possible candidates for this purpose but are typically more time-consuming. The reader is urged to explore the multitude of interesting approaches and algorithms that are currently available in the extensive literature on artificial neural networks [Haykin, 1994; Hassoun, 1995] and on the classic problem of nonlinear cost minimization that has been around since the days of Newton and still defies a "definitive solution." In this section, we will touch on some ofthe key issues germane to the training offeedforward Volterra-equivalent network models ofthe type depicted in Figure 4.36. These issues are 1. 2. 2. 4.
Selection of the training and testing data sets Network parameter initialization Enhanced convergence for fixed-step algorithms Variable-step algorithrns
The selection 0/ the training and testing data sets entails, in addition to the aforementioned random sampling, the sorting of the input data vectors {v(n)} so that if their Euclidean distance in the L-dimensional space is shorter than a specified "minimal proximity" value, then the data can be consolidated by using the vector averages within each "proximity cell." The rationale for this consolidation is that proximal input vectors v(n) are expected to have small differential effects on the output (in which case their "training value" is smalI) or, ifthe respective observed outputs are considerably different, then this difference is likely due to noise/interference and will be potentially misleading in the training context. This "data consolidation" is beneficial in the testing context as well, because it improves the signal-to-noise ratio and makes the measurements of the quadratic cost function more robust. The "proximity cells" can be defined either through a Cartesian grid in the L-dimensional space or through a clustering procedure. In both cases, a "minimal proximity" value must be specified that quantifies our assessment of the input signal-to-noise ratio
4.2
CONNECTIONIST MODELS
241
(whieh detennines the input veetor "jitter") and the differential sensitivity of the input-output mapping for the system at hand. This value dv defines the spaeing of the grid or the eluster size. Note that this "minimal proximity" value dv may vary depending on an estimate of the gradient of the output at eaeh speeifie loeation in the L-dimensional spaee. Sinee this gradient is not known apriori, a eonservative estimate ean be used or the "data consolidation" proeedure ean be applied iteratively. The downside risk of this proeedure is the possibility of exeessive smoothing of the surfaee that defines the mapping of the input veetor v(n) onto the output y(n). Finally, the random sampling of the eonsolidated data for the seleetion ofthe testing dataset is subjeet to a minimum time-separation between seleeted data points in order to minimize the probability of eorrelated residuals. The remaining datapoints form the training dataset. The networkparameter initialization eoneems the critieal issue of possible entrapment in loeal minima during the training proeedures. This is one of the fundamental pitfalls of gradient-based iterative proeedures, sinee unfortunate initialization may lead to a "stable" loeal minimum (i.e., a loeally "deep" trough of the eost funetion surfaee that remains mueh higher than the global minimum). This risk is often mitigated by seleeting multiple initialization points (that "sample" the parameter spaee with suffieient density either randomly or with a deterministie grid) and eomparing the resulting minima in order to select the global minimum. This proeedure is sound but ean be very time-eonsuming when the parameter spaee is multidimensional. This problem also gave impetus to search algorithms (ineluding genetie algorithms that make random "mutation" jumps) whieh, however, remain rather time-eonsuming. In general, there are no definitive solutions to this problem. However, for Volterratype models of physiologieal systems, one may surmise that the higher-order Volterra terms/funetionals will be of gradually deelining importanee relative to the first two orders in most eases. Consequently, one may obtain initial seeond-order Volterra approximations (using direet inversion or iterative methods) and use these approximations to set many of the initial network parameter values in the "neighborhood" of the global minimum. Subsequently, the eorreet model order ean be selected without seeond-order limitation and the training proeedure ean be properly perfonned to yield the final parameter estimates with redueed risk of "loeal minimum" entrapment. It is worth noting that the aforementioned "data consolidation" proeedure is expeeted to alleviate some ofthe loeal minima that are due to noise in the data (input and output). However, the morphology of the minimized eost funetion depends on the eurrent estimates of the network parameters and, therefore, changes continuously throughout the testing proeess. The general morphology also ehanges for eaeh different data point in the training set, and the loeation of the global minimum may shift depending on the aetual residual/noise at eaeh datapoint of the testing set. It is expeeted that the global minimum for the entire testing set will be very elose to the loeation defined by the true parameter values of the network model. 
The basie notion of the changing surfaee morphology of the cost function during the training process is not widely understood or appreciated, although its implieations for the training proeess ean be very important (e.g., appearanee and disappearanee of loeal minima during the iterative proeess). A static notion of the eost funetion ean be seriously misleading as it supports an unjustifiable faith in the ehancy estimates of the gradient, whieh is eonstantly changing. When the eost function is formed by the summation of all squared residuals in the testing set, then this morphology remains invariant, at least with respeet to the individual datapoints, but still changes with respeet to the eontinuously updated parameter values. Note that these updates are based
242
MODULAR AND CONNECTIONIST MODELING
on gradient estimates of the ever-changing cost function surfaces for the various datapoints in the training set. Although one may be tempted to combine many training data points in batch form in order to make the cost function surface less variable in this regard, this has been shown empirically to retard the convergence of the training algorithm. Somewhat counterintuitively, the individual training data points seem to facilitate the convergence speed of the gradient-descent algorithm. Enhanced convergence algorithmsfor fixed training steps have been extensively studied, starting with the Newton-Raphson method that employs "curvature information" by means of the second partial derivatives forming the Hessian matrix [Eykhoff, 1974; Haykin, 1994]. This approach has also led to the so-called "natural gradient" method that takes into account the coupling between the updates of the various parameters during the training procedure using eigendecomposition of the Hessian matrix in order to follow a "most efficient" path to cost minimization. Generally, the gradient-based update of the parameter P« at the (i + 1) iteration step is given by IIp(i) k
~p(i+l) _ k
(i) _ ajCi)(n) Pk --/' apk
(4.136)
However, this update of parameter Pk changes the cost-function surface that is used for the update ofthe next parameter Pk+l byapproximately
sr: (n) == k,l+l
i
ajCi)(n) A (i) = uPk - apk
ajCi)(n) apk
}2
(4.137)
Thus, the update ofthe Pk+l parameter should be based on the gradient ofthe "new" cost function ]Ci)(n): a]Ci)
ajCi)
J2 jCi)
apk+l
apk+l
apk+lapk
-- = -- +
.
Ilp~)
ajCi)
a
{ajCi)}2
apk+l
apk+l
apk
== - - - / ' - - - -
(4.138)
It is evident that the "correction" ofthe cost-function gradient depends on the second partial derivative (curvature) ofthe ith update ofthe cost function (i.e., depends on the Hessian matrix ofthe cost-function update, ifthe entire parameter vector is considered), leading to the second-order ith update OfPk+l: .
ajCi)
k+l
apk+l
Ilp Cl ) = - / ' - - +
a {ajCi)}2 y--apk+l
apk
(4.139)
which reduces back to the first-order ith update ofthe type indicated in Equation (4.136) when /' is very small. Because of the aforementioned fundamental observation regarding the changeability ofthe cost-function surface during the training process, it appears imprudent to place high confidence in these gradient estimates (or their Hessian-based corrections). Nonetheless, the gradient-based approaches have found many useful applications and their refinement remains an active area of research. Since these requirements are elaborate and deserve more space than we can dedicate here, we refer the reader to numerous excellent sources in the extensive bibliography on this subject. We note the current popularity ofthe Levenberg-Marquardt algorithm (in part because it is available in MATLAB) and the useful no-
4.2
CONNECTJONIST MODELS
243
tion, embedded in the "nonnalized least mean-squares" method, that the fixed-step size may be chosen inversely proportional to the mean-square value of the input. The use of a momentum tenn in the update fonnula has also been found useful, whereby the update indicated by Equation (4.136) is not direct1y applied but is subject to first-order autorecursive filtering. The choice of the fixed-step value remains a key practical issue in this iterative gradient-based approach. The variable step algorithms for enhanced convergence deserve abrief overview because they were found to perform well in eertain eases where eonvergence with fixed-step algorithms proved to be problematic [Haykin, 1994]. Ofthese algorithms, some use alternate trials (e.g., the "beta rule") and others use previous updates of the parameters to adjust the step size (e.g., the "delta-bar-delta" rule). Note that the idea of reducing the step size as a monotonie function of the iteration index, originating in "stochastic approximation" methods, was found to be ofvery limited utility. The "beta rule" provides that the step size for the training of a specifie parameter is either multiplied or divided by a fixed scalar ß, depending on which of the alternate trials yields a smaller cost function. Thus, both options are evaluated at each step and the one that leads to greater reduction of the eost function is selected. The proper value of the fixed sealar ß has been determined empirically to be about 1.7 in the ease of artificial neural networks. The "delta-bar-delta" rule is rooted on two heuristic observations by Jaeobs, who suggested that if the value of the gradient retains its algebraic sign for several consecutive iterations, then the corresponding step size should be increased. Conversely, ifthe algebraic sign of the gradient alternates over several successive iterations, then the corresponding step size should be decreased. These ideas were first implemented in the "delta-delta rule," which ehanges the step size aecording to the product ofthe last two gradient values. However, observed deficiencies in the application of the "delta-delta rule" led to the variant of the "delta-bar-delta" rule, which inereases the step size by a small fixed quantity K ifthe gradient has the same sign with a low-pass filtered (smoothed) measure ofprevious gradient values; otherwise, it deereases the step size by a quantity proportional to its current value (so that the step size remains always positive but may diminish asymptotically) [Haykin, 1994]. Breaking altogether with the conventional thinking of incremental updates, we propose a method that uses successive parabolic fits (based on local estimates of first and seeond derivatives) to define variable "leaps" in seareh of the global minimum. According to this method, the morphology ofthe cost-function surface with respect to a specific parameter Pk may be one of the three types shown in Figure 4.39. The parameter change (leap) is defined by the iterative relation
p~+I) = p~) _
J'(Pk)
(4.140)
where J' and J" denote the first and second partial derivatives of the cost function evaluated at p~), and e is a very small positive "floor value" used to avoid numerical instabilities when J" approaehes zero. Alternatively, the parameter is not changed when J" is very close to zero, since the cost-funetion surface is eontinuously altered by the updates ofthe other parameters and, consequently, J" is likely to attain nonzero values at the next iteration(s) (moving away from an inflection point on the surface). Clearly, this method is a slight modification of the classic Newton-Raphson method that extends to concave
244
MODULAR AND CONNECTION/ST MODELING
J(p) 1
Case I
Case 11
Case 111
J">O
J"
J"-O
........................ ',--~.---','
Convex Concave
r inflection point
p Figure 4.39 Illustration of the three main cases encountered in the use of the "parabolic leap" method of cost minimization. The simplified two-dimensional drawings show the cost function J(p) with solid line and the parabolie loeal fits with dashed line. The abscissa is the value of the trained parameter P, whose starting value Pr at the rth iteration is marked with a circle and the "Ianding" value (after the "Ieap") is marked with an asterisk. Gase I of a convex morphology (Ieft) iIIustrates the efficiency and reliable eonvergenee of defining the "Ieap" using the minimum of the loeal parabelle fit (the classic Newton-Raphson method). Gase 11 of a concave morphology (middle) iIIustrates the use of the symmetrie point relative to the maximum of the local parabolic fit. Gase 111 of convexleoncave morphology with an inflection point (right) iIIustrates the low Iikelihood of a potential pitfall at the infleetion point. The local parabelle fit has the same first and second derivative values as the cost function at the pivot point P" and eonsequently has the analytical form f(P)
1
= 2 J " (Pr)(P -
Pr)2 + J'(Pr)(P- Pr) + J(Pr)
*"
which defines a "Ieap-Ianding" point p: = Pr - J' (Pr)/IJ"(Pr) I as long as J"(Pr) 0 (Le., avoiding inflection points). This is a slightly modified form of the Newton-Raphson method that covers concave morphologies.
morphologies and inflection points. The rationale of this approach is depicted in Figure 4.39.
The Pseudomode-Peeling Method. One practical way of addressing the problem of local minima in the training of separable VEN (SVN) models is the use of the "pseudomode-peeling" method, whereby a single-hidden-unit SVN model is initially fitted to the input-output data and, subsequently, another single-hidden-unit SVN model is fitted to the input-residual data, and so on until the residual is minimized (i.e., meets our criterion for model-order selection given in Section 2.3.1). Bach of the single-hidden-unit SVN "submodels" thus obtained corresponds to a "pseudo-PDM" of the system (termed the "pseudomode") with its associated nonlinearity. All the obtained SVN "submodels" are combined by summation of their outputs to form the output of the overall SVN model of the system. Although this method protects the user from getting "trapped" in a local minimum (by virtue of its ability to repeat the training task on the residual error and "peel" new pseudomodes as needed), it is generally expected to yield different pseudomodes for different initializations of the successive "peeling" steps. Also, there is no guarantee that this procedure will always converge to the correct overall model, because it is still subject to the
4.2
CONNECTIONIST MODELS
245
uncertainties of iterative minimization methods. Nonetheless, it is expected to reach the neighborhood of the correct overall model in most cases. Another practical advantage of this method is that it simplifies the model-order selection task by limiting it to the structural parameters Land Q at each "peeling" step. Note that the overall model (comprised of all the "peeled" pseudomodes) can be used to construct the unique Volterra kemels of the system, which can be used subsequently to determine the PDMs of the system using the singular-value decomposition method presented in Section 4.1.1. In this manner, the proposed approach is expected to arrive at similar overall PDM models regardless of the particular initializations used during the pseudomode-peeling procedure. An alternative utilization of the overall model resulting from the pseudomode-peeling method is to view it as a "denoising" tool, whereby the output noise is equated with the final residuals of the overall model. The "denoised" data can be subsequently used to perform any other modeling or analysis procedure at a higher output-SNR. This can offer significant practical advantages in low-SNR cases. Another variant of the pseudomode-peeling method that may be useful in certain cases involves the use of a "posterior" filter in each mode branch (akin to an L-N-M cascade discussed in Section 4.1.2). This method can be implemented by the network model architeeture shown in Figure 4.40, whereby two filter banks are employed to represent the two filtering operations: one preprocesses the input signal (prior filter) and the other processes the output ofthe hidden unit for each "peeling pseudomode" (posterior filter). It is evident that this network model architecture gives rise to an overall model of parallel L-N-M cascades that deviates from the basic Volterra-Wiener modular form or from the equivalent PDM model form of Figure 4.1. This model form can be more efficient (i.e., less modes are required) in certain cases [Palm, 1979].
x(t)
I
W
----y--HU J z(t)
WL
.
ßI
rM
'i
Yo
I y(t)
The L-N-M equivalent network model for a "peeling mode," employing two filter banks = 1, ... , Land m =1, ... , M. The second filter bank processes the output z(t) of the hidden unit (HU) and captures the "posterior" filter operation through the output weights {r m}' Figure 4.40
{b,} and {ßm}, where I
246
MODULAR AND CONNECTIONIST MODELING
Nonlinear Autoregressive Modeling (Open-Loop). The Volterra-equivalent network models discussed above can be also deployed in an autoregressive context without recurrent connections (i.e., open-loop configurations), whereby the "input" is defined as the past epoch of the signal from discrete time (n - 1) to (n - M) and the output is defined as the value of the signal at discrete time n. Thus, we can obtain the nonlinear mapping of the past epoch of a signal [Yen - 1), ... , yen - M)] onto its present value yen) in the form of a feedforward network model that constitutes an "open-loop," nonlinear autoregressive (NAR) model of the signal yen). Clearly, this NAR model attains the form of an autoregressive discrete Volterra model (with well-defined autoregressive kernels ) that is equivalent to a nonlinear difference equation and describes the "internal dynamics" of a single process/signal. The residual of this NAR model can be viewed as the "innovations process" of conventional autoregressive terminology (i.e., the unknown signal that drives the generation of the observed signal through the NAR model). It is evident that this class of models can have broad utility in physiology, since it pertains to the characterization of single processes (e.g., heart rate variability, neuronal rhythms, endocrine cycles, etc.). The estimation approach for this NAR model form is determined by the criterion imposed on the properties of the residuals. For instance, if we seek to minimize the variance of these residuals, then the estimation approach is similar to the aforementioned ones (i.e., least-squares estimation). However, residual variance minimization may not be a sensible requirement in the autoregressive case, because the residuals represent an "innovation process" and do not quantify a "prediction error" as before. Thus, we may require that the residuals be white (i.e., statistically independent innovations that yield the "maximum entropy" solution), or that they have minimum correlations with certain specified variables (maximizing the "mutual information" criterion), or that they exhibit specific statistical and/or spectral properties (as prescribed by previous knowledge about the process). This important issue is revisited in Chapter 10 in connection with the modeling of multiple interconnected physiological variables operating in closed-loop or nested-loop configurations that represent the ultimate modeling challenge in systems physiology.
4.3
THE LAGUERRE-VOLTERRA NETWORK
The most widely used Volterra-equivalent network model to date has been the so-called "Laguerre-Volterra network" (LVN) which employs a filter bank of discrete-time Laguerre functions (DLFs) to preprocess the input signal (as discussed in Section 2.3.2 in connection with the Laguerre expansion technique) and a single hidden layer with polynomial activation functions, as shown in Figure 4.41 [Marmarelis, 1997; Alataris et al., 2000; Mitsis & Marmarelis, 2002]. The LVN can be viewed as a network-based implementation of the Laguerre expansion technique (LET) for Volterra system modeling that employs iterative cost-minimization algorithms (instead of direct least-squares inversion employed by LET) to estimate the unknown kernel expansion coefficients (through the estimates of the LVN parameters) and the DLF parameter a. The latter has proven to be extremely important in actual applications, because it determines the efficiency ofthe Laguerre expansion of the kemels and constitutes a critical contribution by the author to the long evolution of Laguerre expansion methods. The LVN approach has yielded accurate and reliable models of various physiological systems since its recent introduction, using relatively short input-output data records (see Chapter 6).
4.3
THE LA GUERRE-VOLTERRA NETWORK
247
x(n)
h
th
DISCRETE LAGUERRE FILTERBANK
Hidden Unit
"o(n)
1JL - i
(n)
'--------~----------;Ä------------------------------,
:
Wo h
WL - i
l'
h
», (n) = L
I
HIDDEN LAYER
(n)
Wj.h"j ! j=O:
:
H'~
1 L-i
I
1
: : : : : :
:
,
: : :
: : :
7 ---------------------------------------------'
t.:
h
Q
Zh
(n)
=
L
Cq.hU~ (n)
q=i
Figure 4.41 The Laguerre-Volterra Network (LVN) architecture employing a discrete Laguerre filter bank for input preprocessing and a single hidden layer with polynomial activation functions distinct for each hidden unit HU h (see text).
The basic architecture of the LVN is shown in Figure 4.41 and follows the standard architecture of a single-layer, fully connected feedforward artificial neural network, with three distinctive features that place it in the SVNNEN class ofmodels: (1) the DLF filterbank that preprocesses the input with trainable parameter o; (2) the polynomial (instead ofthe conventional sigmoidal) activation functions in the hidden units; (3) the nonweighted summative output unit (no output weights are necessary because the coefficients ofthe polynomial activation functions in the hidden units are trainable). Note that the polynomial activation functions do not have constant terms, but the output unit has a trainable offset value. The LVN has been shown to be equivalent to the Volterra class of finite-order models [Marmarelis, 1997] and yields interpretable PDM models (see Chapter 6). As discussed in Section 2.3.2, the output vj(n) of the jth DLF can be computed by means ofthe recursive relation (2.203) that improves the computational efficiency ofthe DLF expansion, provided the parameter a can be properly selected. A major advantage of the LVN approach over the Laguerre expansion technique is the iterative estimation of a (along with the other LVN parameters). The input ofthe hth hidden unit is the weighted sum ofthe DLF filterbank outputs: L-I
uh(n) =
I
Wj,hVj(n)
(4.141)
j=O
where h = 1, 2, ... , Hand the DLF indexj ranges from 0 to L - 1, instead ofthe conventional range from 1 to L used in Section 2.3.1 (since the zero-order DLF is defined as the first term in the filter bank). The output of the hth hidden unit is given by the polynomial activation function
248
MODULAR AND CONNECTIONIST MODELING
Q
Zh(n) = LCq,hU%(n) q=l
(4.142)
where Q is the nonlinear order of the equivalent Volterra model. The LVN output is given by the nonweighted summation of the hidden-unit outputs, including a trainable offset value y-: H
y(n) = LZh(n) + yo
(4.143)
h=l
The parameter a is critical for the efficacy of this modeling procedure because it defines the form of the DLFs and it detennines the convergence of the DLF expansion, which in turn determines the computational efficiency ofthis approach. In the original introduction of the Laguerre expansion technique, the key parameter a was specified beforehand and remained constant throughout the model estimation process. This presented a serious practical impediment in the application of the method, because it required tedious and time-consuming trials to select the proper a value that yielded an efficient Laguerre expansion (i.e., with rapid convergence and capable of accurate kerneI representation). As discussed in Section 2.3.2, the recursive relation (2.203) allows the computationally efficient evaluation of the prediction error gradient with respect to a, so that the iterative estimation of the parameter a from the data is feasible, thus removing a serious practicallimitation of the original Laguerre expansion approach. It is evident from Equations (2.206) and (2.207) that the parameter ß = va can be estimated using the iterative expression H
L-I
ß{r+l) = ß{r) - 'YßeCr)(n) L f~{r)(Uh) L Wh,j[vj(n - 1) + Vj-l(n)] h=l j=O
(4.144)
where eCr )( n) is the output prediction error at the rth iteration, l'ß is the fixed learning constant (update stepsize), andfh{r)(Uh) is the derivative ofthe polynomial activation function of the hth hidden unit at the rth iteration: Q
fh{r)[u~)(n)] = L qc~~[u~)(n)]q-l
(4.145)
q=l
where the superscript (r) denotes the value of the subject variable/parameter at the rth iteration. The iterative relations for the estimation of the other trainable parameters of the LVN are W~l) = w~3 C{r+l) q.h
- 'YweCr)(n)f~{r)[u~)(n)]vr)(n)
= c{r) q,h -
y~l)
'V
tc
tfr)(n)[u{r)(n)]q h
= y~) _ 'Yytfr)(n)
(4.146) (4.147) (4.148)
wherej = 0, 1, ... ,L - 1; h = 1,2, ... ,H; q = 1, 2 ... , Q; and 'Yw, l'c and l'y denote the respective learning constants for the weights, the polynomial coefficients, and the output offset.
4.3
THE LAGUERRE-VOLTERRA NETWORK
249
In order to assist the reader in making the formal connection between the LVN and the Volterra models, we note that successive substitution of the variables {z_h} in terms of {u_h} and, in turn, {v_j}, yields an expression of the output in terms of the input that is equivalent to the discrete Volterra model. Then it can be seen that the Volterra kernels can be expressed in terms of the LVN parameters as

$$k_0 = y_0 \qquad (4.149)$$

$$k_1(m_1) = \sum_{h=1}^{H} c_{1,h} \sum_{j=0}^{L-1} w_{j,h}\, b_j(m_1) \qquad (4.150)$$

$$k_2(m_1, m_2) = \sum_{h=1}^{H} c_{2,h} \sum_{j_1=0}^{L-1} \sum_{j_2=0}^{L-1} w_{j_1,h}\, w_{j_2,h}\, b_{j_1}(m_1)\, b_{j_2}(m_2) \qquad (4.151)$$

$$\vdots$$

$$k_Q(m_1, \ldots, m_Q) = \sum_{h=1}^{H} c_{Q,h} \sum_{j_1=0}^{L-1} \cdots \sum_{j_Q=0}^{L-1} w_{j_1,h} \cdots w_{j_Q,h}\, b_{j_1}(m_1) \cdots b_{j_Q}(m_Q) \qquad (4.152)$$
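As a concrete illustration, a trained LVN's first- and second-order kernels can be assembled directly from Equations (4.150) and (4.151); in this sketch, B holds the DLF impulse responses b_j(m) as rows, and the remaining array names are hypothetical.

```python
import numpy as np

def lvn_kernels(W, C, B):
    """Volterra kernels k1, k2 from trained LVN parameters (sketch).
    W: weights w[j, h], shape (L, H); C: coefficients c[q, h], shape (Q, H);
    B: DLF impulse responses b_j(m), shape (L, M)."""
    S = W.T @ B                                # s_h(m) = sum_j w[j,h] b_j(m), shape (H, M)
    k1 = C[0] @ S                              # Eq. (4.150)
    k2 = np.einsum('h,hm,hn->mn', C[1], S, S)  # Eq. (4.151)
    return k1, k2
```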
Once the training of the LVN is performed, the Volterra kernels can thus be evaluated from Equations (4.149)-(4.152), if so desired. However, it is recommended that the physiological interpretation of the obtained model be based on the equivalent PDM model and, thus, the Volterra kernels serve only as a general framework of reference and a means of validation/evaluation.

A critical practical issue is the selection of the LVN structural parameters, namely the number L of DLFs, the number H of hidden units, and the degree Q of the polynomial activation functions, so that the LVN model is neither underspecified nor overspecified. This is done by successive trials in ascending order (i.e., moving from lower to higher parameter values, starting with L = 1, H = 1, and Q = 1 and incrementing from left to right) using a statistical criterion to test the reduction in the computed mean-square error of the output prediction achieved by the model, properly balanced against the total number of free parameters. To this purpose, we have developed the Model Order Selection (MOS) criterion of Eq. (2.200) that is described in Section 2.3.1. Concerning the number L of DLFs, it was found that it can affect the convergence of the iterative estimation of the α parameter in certain cases where more DLFs in the LVN make α converge to a smaller value. The reason is that increasing DLF order results in a longer spread of significant values and increased distance between zero crossings in each DLF, which is also what happens when α is increased. Note that Q is independent of the L selection, since it represents the intrinsic nonlinearity of the system. Likewise, the selection of L does not affect (in principle) the selection of H, which determines the number of required PDMs in the system under study. We must emphasize that the selection of these structural parameters of the LVN model is not unique in general, but the equivalent Volterra model ought to be unique.

Illustrative Example of LVN Modeling. To illustrate the efficacy of the LVN modeling approach, we simulate a fifth-order system described by the following difference and algebraic equations:

$$v(n) + A_1 v(n-1) + A_2 v(n-2) = B_0 x(n) + B_1 x(n-1) \qquad (4.153)$$
$$y(n) = v(n) + \frac{1}{2}v^2(n) + \frac{1}{3}v^3(n) + \frac{1}{4}v^4(n) + \frac{1}{5}v^5(n) \qquad (4.154)$$
where the coefficient values are arbitrarily chosen to be A_1 = -1.77, A_2 = 0.78, B_0 = 0.25, and B_1 = -0.27. This system is equivalent to the cascade of a linear filter with two poles [defined by Equation (4.153)] followed by a fifth-degree polynomial static nonlinearity [defined by Equation (4.154)]. In this example, we know neither the appropriate value of α nor the appropriate number L of DLFs, but we do know the appropriate values of the structural parameters H = 1 and Q = 5, knowledge that can be used as "ground truth" for testing and validation purposes. The first-order Volterra kernel can be found analytically by solving the difference equation (4.153), which yields

$$k_1(m) = \frac{1}{4}\left(2\,p_1^m - p_2^m\right) \qquad (4.155)$$
where p_1 = exp(-0.2) and p_2 = exp(-0.05). The high-order Volterra kernels of this system are expressed in terms of the first-order kernel as

$$k_r(m_1, \ldots, m_r) = \frac{1}{r}\, k_1(m_1) \cdots k_1(m_r) \qquad (4.156)$$
for r = 2, 3, 4, 5 (of course, k_r is zero for r > 5). The simulation of this system is performed with a unit-variance, 1024-point Gaussian CSRS input, and the selected LVN model has structural parameters L = 4, H = 1, and Q = 5. The obtained Volterra kernel estimates are identical to the true Volterra kernels of the system given by Equations (4.155) and (4.156). In order to examine the performance of the LVN approach in the presence of noise, the simulation is repeated for noisy output data, whereby an independent GWN signal is added to the output for a signal-to-noise ratio (SNR) equal to 0 dB (i.e., the output signal power is equal to the noise variance). The α learning curves for both the noise-free and noisy cases are shown in Figure 4.42. We can see that α converges to nearby values: 0.814 in the noise-free case and 0.836 in the noisy case. Its convergence is not affected significantly by the presence of noise. The large values of α in this example reflect the fact that this system has slow dynamics (the spread of significant values of the first-order kernel is about 100 lags, which corresponds to the memory-bandwidth product of the system). The estimated first-order and second-order kernels in the noisy case are shown along with the true ones (which are identical to the noise-free estimates) in Figure 4.43. The noisy estimates exhibit excellent resemblance to the noise-free estimates despite the low-SNR data and the relatively short data record of 1024 samples. However, it is evident that the second-order kernel estimate is affected more than its first-order counterpart by the presence of noise. The NMSE values of the LVN model prediction for a different GWN input and output data (out-of-sample prediction) are 0.2% and 49.41% for the noise-free and noisy cases, respectively. Note that a perfect output prediction in the noisy case for SNR = 0 dB corresponds to a 50% NMSE value. These results demonstrate the efficacy and the robustness of the LVN modeling approach, even for high-order systems (fifth-order in this example), low SNR (0 dB), and short data records (1024 samples).
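For reference, the simulated system of Equations (4.153) and (4.154) can be reproduced in a few lines; this sketch uses a unit-variance Gaussian sequence as a stand-in for the CSRS input used in the text.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)                        # quasiwhite input (stand-in)
v = lfilter([0.25, -0.27], [1.0, -1.77, 0.78], x)    # Eq. (4.153)
y = v + v**2 / 2 + v**3 / 3 + v**4 / 4 + v**5 / 5    # Eq. (4.154)
```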
Figure 4.42 The learning curves of α for the simulated fifth-order system for noise-free and noisy outputs. Note that the learning constant for the noisy case is ten times smaller [Mitsis & Marmarelis, 2002].
In actual applications to physiological systems, the SNR rarely drops below 4 dB and almost never below 0 dB. Likewise, it is very rare to require a model order higher than fifth, and it is not unusual to have data records of size comparable to 1024 (in fact, in the early days of the cross-correlation technique, the data records typically had tens of thousands of samples). Therefore, this illustrative example offers a realistic glimpse at the quality of the modeling results achievable by the LVN approach. The application of this modeling approach to actual physiological systems is illustrated in Chapter 6. It is accurate to say that the LVN approach has yielded the best Volterra modeling results to date with real physiological data and, therefore, points to a promising direction for future modeling efforts, without precluding further refinements or enhanced variants of this approach in the future.
Modeling Systems with Fast and Slow Dynamics (LVN-2). Of the many possible extensions of the LVN approach, the most immediate concerns the use of multiple Laguerre filter banks (with distinct parameters α) in order to capture multiple time scales of dynamics intrinsic to a system. This is practically important because many physiological systems exhibit vastly different scales of fast and slow dynamics that may also be interdependent, a fact that makes their simultaneous estimation a serious challenge in a practical context. Note that fast dynamics require high sampling rates and slow dynamics necessitate long experiments, resulting in extremely long data records (with all the burdensome experimental and computational ramifications). A practical solution to this problem can be achieved by a variant of the LVN approach with two filter banks (one for fast and one for slow dynamics) discussed below [Mitsis & Marmarelis, 2002].
Figure 4.43 (a) The true and estimated first-order Volterra kernel of the simulated fifth-order system using LVN (L = 4, H = 1, Q = 5) for the noisy case of SNR = 0 dB. (b) The true (left) and estimated (right) second-order Volterra kernel of the simulated fifth-order system using LVN (L = 4, H = 1, Q = 5) for the noisy case of SNR = 0 dB [Mitsis & Marmarelis, 2002].
The proposed architecture of the LVN variant with two filter banks (LVN-2) is shown in Figure 4.44. The two filter banks preprocess the input separately and are characterized by different Laguerre parameters (α₁ and α₂), corresponding generally to different numbers of DLFs (L₁ and L₂). A small value of α₁ for the first filter bank and a large value of α₂ for the second filter bank allows the simultaneous modeling of the fast and the slow components of a system, as well as their interaction. As was discussed in Section 2.3.2, the asymptotically exponential structure of the DLFs makes them a good choice for modeling physiological systems, since the latter often exhibit asymptotically exponential structure in their Volterra kernels. However, one cannot rule out the possibility of system kernels that do not decay smoothly, a situation that will require either a large number of DLFs or an alternate (more suitable) filter bank. The reader must be reminded that the parameter α defines the exponential relaxation rate of the DLFs and determines the convergence of the Laguerre expansion for a given kernel function. Larger α values result in a longer spread of significant values (slow dynamics). Therefore, the choice of the DLF parameters (α₁ and α₂) for the two filter banks of the LVN-2 model must not be arbitrary and is critical in achieving an efficient model representation of a system with fast and slow dynamics. This choice is made automatically by an iterative estimation procedure using the actual experimental data,
Figure 4.44 The LVN-2 model architecture with two Laguerre filter banks {b_j^{(1)}} and {b_j^{(2)}} that preprocess the input x(n). The hidden units in the hidden layer have polynomial activation functions {f_h} and receive input from the outputs of both filter banks. The output y(n) is formed by summation of the outputs of the hidden units {z_h} and the output offset y₀ [Mitsis & Marmarelis, 2002].
as discussed earlier for the LVN model. For the LVN-2 model, the iterative estimation formula is

$$\beta_i^{(r+1)} = \beta_i^{(r)} - \gamma_i\, e^{(r)}(n) \sum_{h=1}^{H} \sum_{q=1}^{Q} \sum_{j=0}^{L_i-1} q\left[c_{q,h}\, w_{h,j}^{(i)}\, u_h^{q-1}(n)\left(v_j^{(i)}(n-1) + v_{j-1}^{(i)}(n)\right)\right]_r \qquad (4.157)$$
where i = 1, 2 is the filter bank index, $e^{(r)}(n)$ is the output prediction error at the rth iteration, $\beta_i = \sqrt{\alpha_i}$, and {γ_i} are fixed positive learning constants. The notation $[\cdot]_r$ means that the quantity in the brackets is evaluated for the parameter estimates at the rth iteration. The remaining variables and parameters are defined in a manner similar to the LVN case, as indicated in Figure 4.44. Note, however, that the filter bank index i appears as a subscript of L (since the two filter banks may have different numbers of DLFs in general) and as a superscript of the weights {w_{h,j}^{(i)}} and of the DLF outputs {v_j^{(i)}(n)}, indicating their dependence on the respective filter bank. The inputs to the polynomial activation functions of the hidden units are
$$u_h(n) = \sum_{i=1}^{2} \sum_{j=0}^{L_i-1} w_{h,j}^{(i)}\, v_j^{(i)}(n) \qquad (4.158)$$
and the LVN-2 output is given by
$$y(n) = y_0 + \sum_{h=1}^{H} f_h[u_h(n)] \qquad (4.159)$$
where each polynomial activation function performs the polynomial transformation
$$f_h[u_h] = \sum_{q=1}^{Q} c_{q,h}\, u_h^q \qquad (4.160)$$
that scrambles the contributions of the two filter banks and generates output components that mix (or solely retain, if appropriate) the characteristics of the two filter banks. For instance, the equivalent first-order Volterra kernel of the LVN-2 model is composed of a fast component (denoted by subscript "f" and corresponding to i = 1):
$$k_f(m) = \sum_{h=1}^{H} c_{1,h} \sum_{j=0}^{L_1-1} w_{h,j}^{(1)}\, b_j^{(1)}(m) \qquad (4.161)$$
and a slow component (denoted by subscript "s" and corresponding to i = 2):
$$k_s(m) = \sum_{h=1}^{H} c_{1,h} \sum_{j=0}^{L_2-1} w_{h,j}^{(2)}\, b_j^{(2)}(m) \qquad (4.162)$$
where

$$k_1(m) = k_f(m) + k_s(m) \qquad (4.163)$$
The equivalent higher-order Volterra kernels of the LVN-2 model also contain components that mix the fast with the slow dynamics (cross-terms). For instance, the second-order kernel is composed of three components: a fast k_ff, a slow k_ss, and a fast-slow cross-term k_fs, which are given by the expressions

$$k_{ff}(m_1, m_2) = \sum_{h=1}^{H} c_{2,h} \sum_{j_1=0}^{L_1-1} \sum_{j_2=0}^{L_1-1} w_{h,j_1}^{(1)} w_{h,j_2}^{(1)}\, b_{j_1}^{(1)}(m_1)\, b_{j_2}^{(1)}(m_2) \qquad (4.164)$$

$$k_{ss}(m_1, m_2) = \sum_{h=1}^{H} c_{2,h} \sum_{j_1=0}^{L_2-1} \sum_{j_2=0}^{L_2-1} w_{h,j_1}^{(2)} w_{h,j_2}^{(2)}\, b_{j_1}^{(2)}(m_1)\, b_{j_2}^{(2)}(m_2) \qquad (4.165)$$

$$k_{fs}(m_1, m_2) = \sum_{h=1}^{H} c_{2,h} \sum_{j_1=0}^{L_1-1} \sum_{j_2=0}^{L_2-1} \left[w_{h,j_1}^{(1)} w_{h,j_2}^{(2)}\, b_{j_1}^{(1)}(m_1)\, b_{j_2}^{(2)}(m_2) + w_{h,j_1}^{(1)} w_{h,j_2}^{(2)}\, b_{j_1}^{(1)}(m_2)\, b_{j_2}^{(2)}(m_1)\right] \qquad (4.166)$$
The second-order kernel is the summation of these three components. In general, the equivalent qth-order Volterra kernel can be reconstructed from the LVN-2 parameters as

$$k_q(m_1, \ldots, m_q) = \sum_{h=1}^{H} c_{q,h} \sum_{i_1=1}^{2} \cdots \sum_{i_q=1}^{2}\; \sum_{j_1=0}^{L_{i_1}-1} \cdots \sum_{j_q=0}^{L_{i_q}-1} w_{h,j_1}^{(i_1)} \cdots w_{h,j_q}^{(i_q)}\, b_{j_1}^{(i_1)}(m_1) \cdots b_{j_q}^{(i_q)}(m_q) \qquad (4.167)$$
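A sketch of how the three second-order components of Equations (4.164)-(4.166) can be assembled from trained LVN-2 parameters (all array names are hypothetical) is given below.

```python
import numpy as np

def lvn2_k2_components(W1, W2, c2, B1, B2):
    """Second-order kernel components per Eqs. (4.164)-(4.166); a sketch.
    W1, W2: weights w^(i)[h, j], shapes (H, L1) and (H, L2);
    c2: second-degree coefficients c_{2,h}, shape (H,);
    B1, B2: DLF impulse responses b_j^(i)(m) as rows, shapes (L1, M), (L2, M)."""
    S1 = W1 @ B1                                  # per-unit fast filters, shape (H, M)
    S2 = W2 @ B2                                  # per-unit slow filters, shape (H, M)
    kff = np.einsum('h,hm,hn->mn', c2, S1, S1)    # Eq. (4.164)
    kss = np.einsum('h,hm,hn->mn', c2, S2, S2)    # Eq. (4.165)
    kfs = np.einsum('h,hm,hn->mn', c2, S1, S2)    # Eq. (4.166): first ordering
    kfs = kfs + kfs.T                             # add the symmetric ordering
    return kff, kss, kfs                          # k2 = kff + kss + kfs
```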
It is evident that there is a wealth of information in these kernel components that cannot be retrieved with any other existing method. This approach can be extended to any number of filter banks. A critical practical issue for the successful application of the LVN-2 model is the proper selection of its structural parameters, that is, the sizes L₁ and L₂ of the DLF filter banks, the number H of hidden units, and the degree Q of the polynomial activation functions. As in the LVN case discussed previously, this selection can be made by successive trials in ascending order (i.e., moving from lower to higher values of the parameters) until a proper criterion is met (e.g., the statistical MOS criterion presented in Section 2.3.1) that defines statistically the minimum reduction in the normalized mean-square error (NMSE) of the output prediction achieved by the model for an increment in the structural parameters. Specifically, we commence the LVN-2 training with structural parameter values L₁ = L₂ = 1, H = 1, and Q = 1 and increment the structural parameters sequentially (starting with L₁ and L₂, and continuing with H and Q) until the MOS criterion is met, as sketched below. The training of the LVN-2 is performed in a manner similar to the training of the LVN (based on gradient descent) and has not presented any additional problems.
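The ascending-order search can be organized as a simple loop; in this sketch both callables are hypothetical: fit_nmse trains an LVN-2 of the given structure and returns its prediction NMSE, and mos_accepts applies the MOS criterion of Eq. (2.200).

```python
def select_lvn2_structure(fit_nmse, mos_accepts):
    """Ascending-order structural search for the LVN-2 (sketch)."""
    params = {'L1': 1, 'L2': 1, 'H': 1, 'Q': 1}
    nmse = fit_nmse(**params)
    for name in ('L1', 'L2', 'H', 'Q'):        # increment sequentially
        while True:
            trial = dict(params, **{name: params[name] + 1})
            trial_nmse = fit_nmse(**trial)
            if not mos_accepts(nmse, trial_nmse, trial):
                break                          # MOS criterion met: stop growing
            params, nmse = trial, trial_nmse
    return params
```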
Illustrative Examples of LVN-2 Modeling. To illustrate the performance of the LVN-2 modeling approach, we use three simulated nonlinear systems: the first system is isomorphic to the LVN-2 model shown in Figure 4.44 (used as a "ground-truth" validation test), the second is a high-order modular system, and the third is a nonlinear parametric model defined by the differential equations (4.176)-(4.178) that are frequently used to describe biophysical and biochemical processes [Mitsis & Marmarelis, 2002]. The first simulated system has Volterra kernels that are composed of linear combinations of the first three DLFs with two distinct Laguerre parameters: α₁ = 0.2 and α₂ = 0.8. It can also be represented by the modular model of Figure 4.45, with two branches of linear filters (h₁ and h₂, representing the fast and slow dynamics, respectively) feeding into the output static nonlinearity N given by the second-order expression

$$y(n) = u_1(n) + u_2(n) + u_1^2(n) - u_2^2(n) + u_1(n)\,u_2(n) \qquad (4.168)$$
Figure 4.45 The modular model of the first simulated example of a second-order system [Mitsis & Marmarelis, 2002].
where u₁ and u₂ are the outputs of h₁ and h₂, respectively, given by
$$h_1(m) = b_0^{(1)}(m) + 2\,b_1^{(1)}(m) + b_2^{(1)}(m) \qquad (4.169)$$

$$h_2(m) = b_0^{(2)}(m) - b_1^{(2)}(m) + 2\,b_2^{(2)}(m) \qquad (4.170)$$
where $b_j^{(i)}(m)$ denotes the jth-order DLF with parameter α_i. By substituting the convolutions of the input with h₁ and h₂ for u₁ and u₂, respectively, in Equation (4.168), the first-order and second-order Volterra kernels of this system are found to be
$$k_1(m) = h_1(m) + h_2(m) \qquad (4.171)$$

$$k_2(m_1, m_2) = h_1(m_1)\,h_1(m_2) - h_2(m_1)\,h_2(m_2) + \frac{1}{2}\left[h_1(m_1)\,h_2(m_2) + h_1(m_2)\,h_2(m_1)\right] \qquad (4.172)$$
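These analytical kernels are straightforward to assemble numerically; a minimal sketch (with h1 and h2 as NumPy arrays over the lags m) is:

```python
import numpy as np

def example_kernels(h1, h2):
    """Kernels of Eqs. (4.171)-(4.172) from the branch impulse responses
    h1, h2 (NumPy arrays over the lags m; sketch)."""
    k1 = h1 + h2                                           # Eq. (4.171)
    k2 = (np.outer(h1, h1) - np.outer(h2, h2)
          + 0.5 * (np.outer(h1, h2) + np.outer(h2, h1)))   # Eq. (4.172)
    return k1, k2
```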
These kernels contain fast and slow components corresponding to the two distinct Laguerre parameters, as indicated in Equations (4.171) and (4.172). The system is simulated with a Gaussian CSRS input of unit variance and length of 1024 data points (Δt = T = 1). Following the ascending-order MOS procedure described earlier, we determine that three DLFs in each filter bank (L₁ = L₂ = 3) and three hidden units (H = 3) with distinct second-degree polynomial activation functions (Q = 2) are adequate to model the system, as theoretically anticipated. The number of free parameters of this LVN-2 model is (L₁ + L₂ + Q)H + 3 = 27. Note that the obtained structural parameter values in the selected LVN-2 model are not unique in general, although in this example they match the values anticipated by the construction of the simulated system. Different structural parameters (i.e., LVN-2 model configurations) may be selected for different data from the same system, and the corresponding LVN-2 parameter values (e.g., weights and polynomial coefficients) will generally be different as well. However, what remains constant is the equivalent Volterra representation of the system (i.e., the Volterra kernels of the system), regardless of the specific LVN-2 configuration selected or the corresponding parameter values.

In the noise-free case, the estimated LVN-2 kernels of first and second order are identical to their true counterparts, given by Equations (4.171) and (4.172), respectively, and the normalized mean-square error (NMSE) of the output prediction achieved by the LVN-2 model is on the order of 10⁻³, demonstrating the excellent performance of this modeling procedure. The first-order kernel is shown in Figure 4.46, along with its slow and fast components estimated by LVN-2. The estimated second-order kernel and its three components are shown in Figure 4.47, demonstrating the ability of the LVN-2 to capture slow and fast dynamics. The utility of employing two filter banks in modeling systems with fast and slow dynamics can be demonstrated by comparing the performance of the LVN-2 model with an LVN model (one filter bank) having the same total number of free parameters (L = 6, H = 3, Q = 2). In order to achieve comparable performance with a single filter bank, it was found that we have to almost double the total number of free parameters, to 44 from 27 [Mitsis & Marmarelis, 2002].

In order to examine the effect of noise on the performance of the LVN-2 model, we add independent GWN to the system output for a signal-to-noise ratio (SNR) equal to 0 dB (i.e., the noise variance equals the mean-square value of the de-meaned noise-free output). Convergence occurs in about 600 iterations (peculiarly, faster than in the noise-free case), and the final estimates of α₁ and α₂ are not affected much by the presence of severe noise (α̂₁ = 0.165 and α̂₂ = 0.811). The resulting NMSE value for the LVN-2 model prediction is 53.78% in the noisy case, which is deemed satisfactory, since the ideal NMSE level is 50% for SNR = 0 dB. The NMSEs of the estimated first-order and second-order Volterra kernels in the noisy case are given in Table 4.1, along with the estimates of α₁ and α₂, and they corroborate the previous favorable conclusion regarding the efficacy of the LVN approach, especially when compared to the estimates obtained via the conventional cross-correlation technique (also given in Table 4.1).

The second simulated system also has the modular architecture shown in Figure 4.45 but with a fourth-order nonlinearity given by

$$y(n) = u_1(n) + 2u_2(n) + 4u_1^2(n) - 4u_2^2(n) + 4u_1(n)\,u_2(n) + \frac{1}{2}u_1^3(n) + \frac{1}{3}u_2^3(n) + \frac{1}{4}u_1^4(n) + \frac{1}{5}u_2^4(n) \qquad (4.173)$$
Figure 4.46 Estimated first-order Volterra kernel for the first simulated system and its slow/fast components [Mitsis & Marmarelis, 2002].
Figure 4.47 Estimated second-order Volterra kernel for the first simulated system (a), and its three components: (b) fast component, (c) slow component, and (d) fast-slow cross-component [Mitsis & Marmarelis, 2002].
and linear filter impulse responses that are not linear combinations of DLFs but are given by

$$h_1(m) = \exp\left(-\frac{m}{5}\right)\sin\left(\frac{m}{7}\right) \qquad (4.174)$$

$$h_2(m) = \exp\left(-\frac{m}{15}\right) - \exp\left(-\frac{m}{5}\right) \qquad (4.175)$$
Employing the ascending-order MOS procedure for the selection of the structural parameters of the LVN-2 model, we select L₁ = L₂ = 7, H = 4, and Q = 4 (a total of 75 LVN-2 model parameters). The results for a Gaussian CSRS input of 4096 data points are excellent in the noise-free case, as before, and demonstrate the efficacy of the method for this high-order system [Mitsis & Marmarelis, 2002]. The effect of output-additive noise on the performance of the LVN-2 model is examined for this system by adding 20 different independent GWN signals to the output for an SNR of 0 dB. The resulting NMSE values (computed over the 20 independent trials) for
Table 4.1 Normalized mean-square errors (NMSEs) for model prediction and kernel estimates using LVN-2 and conventional cross-correlation for SNR = 0 dB

                        α₁      α₂      Prediction    k₁(m)        k₂(m₁, m₂)
                                        NMSE (%)      NMSE (%)     NMSE (%)
  LVN-2                 0.165   0.811   53.78         7.69         4.10
  Cross-correlation     -       -       86.38         421          1919
the output prediction and for the estimated kernels (mean value ± standard deviation) are 48.42 ± 2.64% for the output prediction, 3.72 ± 2.37% for the k₁ estimate, and 6.22 ± 3.82% for the k₂ estimate. The robustness of the method is evident, since the prediction NMSE is close to 50% (the ideal NMSE for SNR = 0 dB) and the kernel NMSEs are low compared to the variance of the output-additive noise. In fact, the NMSE of the kernel estimates can be used to define an SNR measure for the kernel estimates, computed as -10 log(NMSE), which yields about 14 dB for the k₁ estimate and about 12 dB for the k₂ estimate.

Finally, a third system of different structure is simulated, which is described by the following differential equations (the input-output data are properly discretized after continuous-time simulation of these differential equations):

$$\frac{dy(t)}{dt} + b_0\,y(t) = \left[c_1 z_1(t) - c_2 z_2(t)\right] y(t) + y_0 \qquad (4.176)$$

$$\frac{dz_1(t)}{dt} + b_1\,z_1(t) = x(t) \qquad (4.177)$$

$$\frac{dz_2(t)}{dt} + b_2\,z_2(t) = x(t) \qquad (4.178)$$
where y₀ is the output baseline (basal) value, and z₁(t) and z₂(t) are internal state variables whose products with the output y(t) in the bilinear terms of Equation (4.176) constitute the nonlinearity of this system, which gives rise to an equivalent Volterra model of infinite order. The nonlinearities of this system [i.e., the bilinear terms of Equation (4.176)] may represent modulatory effects that are often encountered in physiological regulation (neural, endocrine, cardiovascular, metabolic, and immune systems) or in intermodulatory interactions of cellular/molecular mechanisms (including voltage-dependent conductances of ion channels in neuronal membranes or ligand-dependent conductances in synapses; see Chapter 8). The contribution of the qth-order Volterra kernel to the output of this system is proportional to the qth powers of Rc₁ and Rc₂, where R is the root-mean-square value of the input. When the magnitudes of c₁ and c₂ are smaller than one, a truncated Volterra model can be used to approximate the system. For the parameter values b₀ = 0.5, b₁ = 0.2, b₂ = 0.02, c₁ = 0.3, and c₂ = 0.1, it was found that a fourth-order LVN-2 model was sufficient to represent the system output for a Gaussian CSRS input with unity power level and length of 2048 data points. Following the proposed MOS search procedure for the selection of the structural parameters of the model, an LVN-2 with L₁ = L₂ = 5, H = 3, and Q = 4 was selected (a total of 45 free parameters). The obtained results for the noise-free and noisy (SNR = 0 dB) conditions are given in Table 4.2, demonstrating the excellent performance of the LVN-2
Table 4.2 LVN-2 model performance for the system described by Equations (4.176)-(4.178)

                                α₁      α₂      Output prediction   First-order kernel
                                                NMSE (%)            NMSE (%)
  Noise-free output             0.505   0.903   0.28                0.10
  Noisy output (SNR = 0 dB)     0.366   0.853   47.49               4.13
model for this system as well. It should be noted that the estimated zeroth-order kernel was equal to 1.996, very close to its true value of 2. The equivalent Volterra kernels for this system can be analytically derived by using the generalized harmonic-balance method described in Section 3.2. The resulting analytical expressions for the zeroth-order and first-order kernels are

$$k_0 = \frac{y_0}{b_0} \qquad (4.179)$$
$$k_1(m) = \frac{y_0}{b_0}\left\{\frac{c_1}{b_1 - b_0}\left[\exp(-b_0 m) - \exp(-b_1 m)\right] - \frac{c_2}{b_2 - b_0}\left[\exp(-b_0 m) - \exp(-b_2 m)\right]\right\} \qquad (4.180)$$

The analytical forms of the higher-order kernels are rather complex and are not given here in the interest of space, but the general expression for the second-order Volterra kernel is given by Equation (3.71). The fast component of the first-order kernel corresponds to the first exponential difference in Equation (4.180), whereas the slow component corresponds to the second exponential difference (recall that b₁ is ten times bigger than b₂ in this simulated example).

We close this section with the conclusion that the use of the LVN offers powerful means for efficient modeling of nonlinear physiological systems from short input-output data records. The problem of efficient modeling of nonlinear systems with fast and slow dynamics can also be addressed by employing two filter banks characterized by distinct Laguerre parameters (the LVN-2 model). The efficiency and robustness of this approach were demonstrated in the presence of severe output-additive noise.
4.4 THE VWM MODEL
The basic rationale for the use of the Volterra-equivalent network models is twofold: (1) the separation of the dynamics from the nonlinearities occurring at the first hidden layer, and (2) the compact representation of the dynamics with a "judiciously chosen" filter bank and of the nonlinearities with a "properly chosen" structure of activation functions in the hidden layer(s). This basic rationale is encapsulated in the proposed Volterra-Wiener-Marmarelis (VWM) model in a manner considered both general and efficient that obviates the need for a preselected basis in the filter bank and provides additional stability of operation with the use of a trainable compressive transformation cascaded with the polynomial activation functions. The overall structure of the VWM model exhibits close affinity with the Volterra-equivalent network models shown in Figure 4.36 and discussed in Section 4.2.2. Specifically, the VWM model is composed of cascaded and lateral operators (both linear and nonlinear) forming a layered architecture, as shown in Figure 4.48 and described below. The previously employed filter bank for input preprocessing is replaced in the VWM model with a set of linear difference equations of the form

$$v_j(n) = \alpha_{j,1}\,v_j(n-1) + \cdots + \alpha_{j,K_j}\,v_j(n-K_j) + x(n) + \beta_{j,1}\,x(n-1) + \cdots + \beta_{j,M_j}\,x(n-M_j) \qquad (4.181)$$
Figure 4.48 Schematic of the VWM model. Each input-preprocessing mode #j (j = 1, ..., L) is an autoregressive with exogenous variable (ARX) difference equation of order (K_j, M_j); the activation functions {f_h} and {g_i} are polynomials of degree Q_h and R_i, respectively, as shown in Equations (4.182) and (4.184). {S_h} are sigmoidal functions given by Equation (4.183), and the weights {w_{j,h}} and {ζ_{h,i}} are applied according to Equations (4.182) and (4.184), respectively.
which define the "modes" of the system, where x(n) is the input and v_j(n) is the output of the jth mode. Note that β_{j,0} is set equal to 1, because subsequent weighting of the mode outputs in Equation (4.182) makes these coefficients redundant. The modes defined in Equation (4.181) for j = 1, ..., L serve as the input-preprocessing filter bank but take the form of trainable ARX models instead of the preselected filter-bank basis in the VEN architecture. The mode outputs are fed into the units of the first hidden layer (after proper weighting) and are transformed first polynomially and subsequently sigmoidally to generate the output of the hth hidden unit as
$$z_h(n) = S_h\left\{\sum_{q=1}^{Q_h} c_{h,q}\left[\sum_{j=1}^{L} w_{j,h}\, v_j(n)\right]^q\right\} \qquad (4.182)$$
where the sigmoidal transformation (denoted by S_h) is distinct for each hidden unit (with trainable slope λ_h and offset θ_h) and is given by the bipolar sigmoidal expression of the hyperbolic tangent:

$$S_h\{u\} = \frac{1 - \exp[-\lambda_h(u - \theta_h)]}{1 + \exp[-\lambda_h(u - \theta_h)]} \qquad (4.183)$$
The outputs of these hidden units are combined linearly with weights {ζ_{h,i}} and entered into the units of a second hidden layer, which will be termed the "interaction layer" for clarity of communication. Each "interaction unit" generates an output
$$\psi_i(n) = \sum_{r=1}^{R_i} \gamma_{i,r}\left[\sum_{h=1}^{H} \zeta_{h,i}\, z_h(n)\right]^r \qquad (4.184)$$
by means of a polynomial transformation of degree R_i. The VWM model output is formed simply by summing the outputs of the interaction units and an output offset y₀:
$$y(n) = y_0 + \sum_{i=1}^{I} \psi_i(n) \qquad (4.185)$$
No output weights are needed because the scaling of the contributions of the interaction units is absorbed by the coefficients {γ_{i,r}}. The equivalence of the VWM model with the discrete Volterra model is evident from the discussion of Section 4.2.1. Note that the cascaded polynomial-sigmoidal nonlinear transformation in the hidden layer endows the VWM model with the potential capabilities of both polynomial and sigmoidal activation functions in a data-adaptive manner, so that the combined advantages can be secured. We expounded the advantages of polynomial activation functions previously. The additional sigmoidal operation is meant to secure the stability of bounded outputs in the hidden units (a potential drawback of polynomial transformations) and to facilitate the convergence of the training process. However, it may not be needed, in which case the trained values of the slopes {λ_h} become very small (reducing the sigmoid to a linear scaling transformation) and the respective {ζ_{h,i}} values become commensurably large to compensate for the scaling of the small slope values. The key practical questions are: (1) how to select appropriately the structural parameters of this model, and (2) how to estimate the model parameters from input-output data (via cost-function minimization, since many parameters enter nonlinearly in this estimation problem). The selection of the structural parameters follows the guidelines of the search procedure presented in Section 4.2.2. A simplification occurs when we accept a uniform structure for the same type of units in each layer (i.e., K_j = K, M_j = M, Q_h = Q, R_i = R). Then the structural parameters of the VWM model are L, K, M, H, Q, I, and R. Certain additional simplifications are advisable in practice. For instance, K and M can be fixed to 4 or 6, since most physiological systems to date have exhibited modes with no more than two or three resonances (from the PDM analysis of actual physiological systems). Although this is not asserted as a general rule and there are no guarantees of general applicability, one has to maintain a practicable methodological sense rooted in the accumulated experience. Naturally, if this rule appears inappropriate in a certain case, different values of K and M should be explored. Following the same reasoning, R can be limited to 2, since the purpose of the interaction units is to introduce the "cross-terms" missing in the "separable Volterra network" architecture and achieve a better approximation of the multiinput static nonlinearity of the system. This is largely achieved by setting R = 2. Finally, Q can be set to 3 on the basis of the accumulated experience to date regarding the observable order of actual physiological systems in a practical context. Recall that in the presence of the cascaded polynomial-sigmoidal transformations, the nonlinear order of the VWM model becomes infinite. However, lower-order models are possible when the trained slopes of the sigmoidal transformations become very small (effectively, a linear transformation within the dynamic range of the input). Note that the inclusion of the sigmoidal transformation (of trainable slope) at the outputs of the hidden units is motivated by the desire to avoid possible numerical instabilities (due to the polynomial transformations) by bounding the outputs of the hidden units. Because of the trainable slope of the sigmoidal functions, no such "bounding" is applied if it is not needed (corresponding to a very small value of λ_h, compensated by the trained values of ζ_{h,i}).
With these practical simplifications in the structure of the VWM model, the remaining structural parameters are L, H, and I. The model-order selection procedure presented in Section 4.2.2 (utilizing the statistical criterion presented in Section 2.3.1) can be applied to these three structural parameters in the prescribed rightward ascending order (i.e., starting with L = H = I = 1, continuing with increasing L up to L_max, and then increasing H and I in successively slower "looping speeds").
Having determined the structure of the VWM model, the parameter estimates are obtained via the training procedures discussed in the previous section by application of the chain rule of differentiation in error back-propagation. It should be emphasized that the training of the mode equations obviates the need for "judicious selection" of the input filter-bank basis (a major practical advantage) and is expected to yield the minimum set of filters for input preprocessing based on guidance from the input-output data. One cannot ask for more, provided the training procedure converges properly. The equivalent Volterra kernels for the VWM model can be obtained by expressing the sigmoidal functions in terms of Taylor series expansions or finite polynomial approximations within the range of their abscissa values. For instance, the analytical expression for the first-order Volterra kernel of the VWM model is

$$k_1(m) = \sum_{i=1}^{I} \gamma_{i,1} \sum_{h=1}^{H} \zeta_{h,i}\, a_{h,1}(\theta_h, \lambda_h)\, c_{h,1}\, p_h(m) \qquad (4.186)$$
where p_h(m) denotes the hth PDM defined by Equation (4.187), and a_{h,1} is the first-order Taylor coefficient of S_h, which depends on the respective slope λ_h and offset θ_h. The resulting analytical expressions for the high-order kernels are rather complex in the general case and are omitted in the interest of space. Furthermore, their practical utility is marginal, because the physiological interpretation of the VWM model ought to be based on the equivalent PDM model shown in Figure 4.1. Note that the hth PDM is given by

$$p_h(m) = \sum_{j=1}^{L} w_{j,h}\, g_j(m) \qquad (4.187)$$
where g_j(m) denotes the impulse response function of the ARX (mode) Equation (4.181). The number of free parameters in the VWM model is

$$P = L(K + M) + H(L + Q + 2) + I(H + R) + 1 \qquad (4.188)$$
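A two-line sketch verifies the practical worst-case counts quoted below.

```python
def vwm_free_params(L, K, M, H, Q, I, R):
    """Free-parameter count of the VWM model, Eq. (4.188)."""
    return L * (K + M) + H * (L + Q + 2) + I * (H + R) + 1

assert vwm_free_params(L=4, K=4, M=4, H=3, Q=3, I=2, R=2) == 70
assert vwm_free_params(L=4, K=6, M=6, H=3, Q=3, I=2, R=2) == 86
```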
Equation (4.188) indicates that the VWM model complexity is linear with respect to each structural parameter separately, although it is bilinear with respect to pairs of parameters. This fact is of fundamental practical importance, especially with regard to the nonlinear orders Q and R, and constitutes the primary advantage (model compactness) of the use of the VWM model and, generally, of Volterra-equivalent networks for nonlinear dynamic system modeling. In most actual applications, we expect K = M = 4, Q = 3, R = 2, and L ≤ 4, H ≤ 3, I ≤ 2. Therefore, the maximum number of free parameters in a practical context is expected to be about 70 (for L = 4, H = 3, I = 2). In the few cases where K = M = 6, the maximum number of free parameters becomes 86. In most cases, we expect P to be between 30 and 50. This implies that effective modeling is possible with a few hundred input-output data points. The structural parameter H is the most critical, because it defines the number of "principal dynamic modes" (PDMs) in the Volterra-equivalent PDM model that is used for physiological interpretation (i.e., each hidden unit corresponds to a distinct PDM). A smaller number of PDMs facilitates the physiological interpretation of the VWM model and affects significantly the computational complexity of the VWM model, because ∂P/∂H = L + Q + I + 2. Likewise, the computational complexity is affected significantly by the necessary number of modes L, because ∂P/∂L = K + M + H. Each of the PDMs
corresponds to the linear combination of the VWM modes given by Equation (4.187), which defines the input entering into each hidden unit of the VWM model. In closing, we note that the mode equations of the VWM model can also be viewed as "state equations," since the mode outputs {v_j(n)} describe the dynamic state of the system at time n. This dynamic state is subject to a static nonlinear transformation to generate the system output. The nonlinear transformation is implemented (approximated) by means of the activation functions of the hidden and interaction layers in the combination prescribed by the VWM model structure of Figure 4.48. Finally, we note that the VWM approach can be extended to the case of multiple inputs and multiple outputs with immense scalability advantages through the use of the interaction layer, as discussed in Chapter 7.
5 A Practitioner's Guide

"... explanations of natural phenomena must be as simple as possible ... but no simpler ..."
-Attributed to Albert Einstein
In this chapter, we summarize the key practical issues confronting the actual application of the advocated modeling approach, and we present proper procedures for addressing these issues in a realistic context. The first two sections deal with initial groundwork, and the last two sections deal with the core issues of model specification, estimation, validation, and interpretation. This is meant to be a "how to" chapter, useful for the practitioner of this methodology. User-friendly software packages implementing this methodology are distributed (free of charge) by the Biomedical Simulations Resource at USC (http://bmsr.usc.edu).

Before delving into the specifics of the advocated methodology, the reader should be alerted to the "golden rules" that ought to govern the three main facets of the modeling task in the case of physiological systems, regarding the proper (1) operating conditions and input ensemble, (2) model specification and estimation, and (3) model interpretation and utilization.

Rule #1: We must try to secure natural operating conditions and a broad, representative input ensemble.

Rule #2: We must minimize a priori constraints on the selected model structure and use robust estimation methods associated with proper means of validation and accuracy assessment.

Rule #3: We must interpret physiologically the obtained model in the realistic nonlinear dynamic context (not simplistic cases of contrived experimental paradigms) and with regard to the intended utilization (diagnosis, control, assessment, scientific knowledge).
5.1 PRACTICAL CONSIDERATIONS AND EXPERIMENTAL REQUIREMENTS

In this section, we discuss the main practical considerations that arise in actual applications of nonlinear modeling of physiological systems. We are concerned with modeling studies of physiological systems where the input-output signals are recorded (either spontaneously or in controlled experiments) as evenly sampled time-series datasets and processed to obtain a mathematical model of the dynamic input-output relationship. Multiple inputs and outputs are possible, including cases of spatiotemporal patterns (e.g., a visual stimulus comprised of time-varying spatial patterns) or spectrotemporal patterns (e.g., an auditory stimulus comprised of time-varying spectral patterns). The input signal may represent the natural spontaneous activity of the system or be controlled experimentally so that its key attributes (bandwidth and dynamic range) may be selected to cover the operational system bandwidth and dynamic range. The sampling interval is assumed to be the same for the input and output signals, and it is selected so that the Nyquist frequency (i.e., the inverse of twice the sampling interval) exceeds the system bandwidth. Analog low-pass filtering is applied to the input and output signals prior to discretization (sampling) to avoid aliasing due to extraneous noise. Of particular importance are the input signal characteristics, which must be selected in accordance with the system characteristics and the objectives of the modeling study. For instance, the highest frequency of interest in the system response determines the system bandwidth and, consequently, the minimum input bandwidth that should be used. Likewise, the lowest frequency of interest in the system response corresponds to the system memory and, consequently, determines the minimum experimental data record required (typically a small multiple of the system memory). These practical considerations are organized in three categories regarding: (1) system characteristics (Section 5.1.1), (2) input characteristics (Section 5.1.2), and (3) experimental considerations (Section 5.1.3).

5.1.1 System Characteristics
We must first examine some basic system characteristics that influence the proper design of the modeling study. These characteristics can be clustered in two groups: one comprising key system parameters (i.e., bandwidth, memory, and dynamic range) and the other pertaining to key system functional properties (linearity, stationarity, and ergodicity). The actual determination of these characteristics is subject to experimental variations and noise/interference and, therefore, calls for appropriate selection criteria that take into account this uncertainty in the context of the specific data (spontaneous activity or controlled experiments). Generally, the ability to perform preliminary experimental tests for this purpose is desirable, as discussed in Section 5.2.

System Bandwidth. The system bandwidth is defined as the range of input frequencies over which the system generates a significant response. In an experimental setting, it can be established through preliminary testing with narrowband (sinusoidal) inputs of sweeping frequencies or with broadband white-noise inputs that cover the frequency range of interest, as discussed in Section 5.2.1. The use of white-noise (or, generally, broadband) inputs is appropriate for testing nonlinear systems, since it can generate nonlinear interactions among simultaneously applied multiple frequencies, whereas narrowband inputs do not generate such interactions and, therefore, do not test the system exhaustively.
The definition of "significant response" is heuristic and calls for an empirical or statistical criterion. The convention usually followed in the engineering literature is based on a 3 dB reduction of the frequency-response gain (i.e., half of the maximum gain), but it appears inappropriate for the purpose of determining the bandwidth of physiological systems, in which frequency-response gains as low as 1% of the maximum may be of interest. It is recommended that the selection criterion be applied to the spectrum of the elicited response signal for a broadband or white-noise stimulus and be set as low as -40 dB (which corresponds to a -20 dB threshold for the frequency-response gain). If the presence of significant ambient noise prevents the application of such a low threshold by placing a higher "noise floor" on the spectrum of the measured response, then either the input-signal power should be increased to improve the signal-to-noise ratio at the output over a wider frequency range, or the "noise floor" should be accepted as the threshold. If controlled experiments are not possible and only natural data are available, then the ratio of the magnitudes of the input-output Fourier transforms can be used as a measure of frequency-response gain to determine the system bandwidth.
System Memory. The system memory (or settling time) is defined as the time elapsed for the diminution of the causal effect of an impulsive input stimulus. Again, the "diminution of the causal effect" in the system response calls for a threshold criterion, which is subject to the same measurement-noise ambiguities discussed above for the system bandwidth. It is recommended that this determination be made with a generous disposition, using either an impulsive or a step input. If neither of those deterministic inputs is experimentally feasible or desirable, but broadband input-output data are available (either natural or experimental), then the input-output cross-correlation can be used to establish the system memory (see Section 5.2.2). Another issue related to the system memory is the low end of the frequency range of interest in the system response. The latter can be used to determine the minimum length of the system kernels, which is equivalent to the system memory (all kernels of a system are assumed to have approximately the same effective memory extent). Note that a very important parameter in nonparametric model estimation is the memory-bandwidth product of the system, which determines the minimum number of discrete-time samples along each kernel dimension required for adequate discrete-time representation of the kernels (equivalent to the ratio of the highest to the lowest significant frequency in the frequency-response characteristics of the system).

System Dynamic Range. The system dynamic range is defined as the input amplitude range over which the natural operation of the system occurs and the system response properties are modeled. It is usually determined by the maximum and minimum amplitude values of the input signal that the physiological system is expected to experience in the course of its natural operation (also called the physiological range), since it is desirable to use experimental stimuli that resemble as much as possible the natural input ensemble. In engineering, the dynamic range is often given in dB and defined as 20 log[max/min] (if min > 0), a convention that is not compelling to adopt in the context of physiological systems, for which it is more appropriate to simply specify the min-max range. Although the sought model ought to be valid over the entire physiological range, many studies to date have resorted to segmentation of this range experimentally in order to simplify the modeling task when the system exhibits significant nonlinearities. This range segmentation often seeks to approximate the local behavior of the nonlinear system in the
neighborhood of an operating point by means of a linearized model. The characteristics of this linearized model change, of course, for each different operating point in order to fit the global nonlinear characteristics of the system. This linearization approach may be useful in certain cases, but it generally results in unwieldy model representations and potentially serious misunderstandings due to the multiple "localized" models. Our goal is to seek "global" models that are valid over the entire system dynamic range and avoid range segmentation.
System Linearity. The issue of system linearity is critical for modeling purposes, since nonlinearities require more elaborate methodologies and present themselves with a far greater diversity of model forms. The humorous phrase "to divide all systems into linear and nonlinear is like dividing the world into bananas and nonbananas" punctuates the point of the preponderance of possible system nonlinearities over the very particular case of linearity. Linearity, of course, is attractive because of its relative simplicity and was defined in Section 1.3.2 by means of the superposition principle. Although most physiological systems do not strictly obey the superposition principle, many can be approximated fairly well by means of linear models under certain (possibly restrictive) operating conditions. For a given dynamic range of interest, the linearity of a system can be examined experimentally using one of many possible tests based on the superposition principle or other fundamental properties of a linear system (e.g., a fundamental property of linear time-invariant systems is the generation of a sinusoidal output in response to a sinusoidal input at the same frequency), as discussed in Section 5.2.4. White-noise inputs can also be used to test the linearity of a system and possess the advantage of simultaneously testing the system over all frequencies, a far more vigorous test than with deterministic inputs such as sinusoids or impulses/pulses (see Section 5.2.4). Another important aim of the linearity test can be to provide insight into the type/order of nonlinearity, if linearity is rejected. These aspects of the linearity test and the required mathematical derivations for rigorous analysis are elaborated in Section 5.2.4 due to their considerable importance for actual applications.

System Stationarity. The issue of system stationarity (or time invariance) is likewise critical for modeling purposes, since possible system nonstationarities are likely to present a daunting modeling challenge when combined with system nonlinearities. The test for stationarity is briefly described in Section 5.2.3, and its application is further discussed in Chapter 9 in connection with nonstationary system modeling. The key issue that serves as a methodological branching point is the relation between the rapidity of nonstationary change and the extent of the system memory (or the time constants of the system dynamics). When the system nonstationarity is slow relative to the low-frequency end of the system dynamics, it is possible to obtain approximate "piecewise stationary" models over successive (and possibly overlapping) time segments of the data or to use some tracking method based on recursive relations (e.g., recursive least squares). However, when the nonstationarity is fast relative to the system dynamics, only certain cases can be treated, with the specialized methods discussed in Chapter 9.

System Ergodicity. The issue of system ergodicity is one that is seldom acknowledged in practice despite its fundamental importance. It refers to the invariance of the functional properties for various experimental preparations of a given physiological system. It is widely accepted that such invariance is not possible in the real world, because each
experimental preparation exhibits some systemic variation due to a myriad of uncontrollable (intrinsic and extrinsic) factors, regardless of our best efforts to have identical experimental preparations. The same is true for intersubject variability in clinical data. Thus, ergodicity does not usually hold, and the best we can hope for is that these random systemic variations across different preparations exhibit fixed statistical characteristics. In this case, appropriate statistical analysis of the results obtained from different preparations is possible, and the formation of proper averages remains meaningful. In spite of this fact, the ergodic property is usually postulated in practice for physiological systems, since it is often impractical or even impossible to actually perform appropriate statistical analysis across a broad ensemble of results from real data. An ergodicity test would have to involve a large number of different preparations, which are not usually available. Nonetheless, the issue should be acknowledged and addressed in the proper manner in each practical context, given that it may call into question the results obtained through customary practice (averaging results from different preparations). It is our belief that the lack of ergodicity across physiological preparations or clinical subjects need not be confounding in actual modeling studies and can be properly addressed with the current methodological means.

5.1.2 Input Characteristics
A key input requirement for nonlinear system modeling is that the ensemble of input waveforms contain most, if not all, frequencies and amplitudes of interest (i.e., cover the entire bandwidth and dynamic range of the system under study). This can be achieved with a natural ensemble of inputs (if such is available) or, in an experimental setting, the system must be tested exhaustively over its entire bandwidth and dynamic range in order to observe all possible nonlinear interactions. This clearly implies that the system cannot be tested separately for different frequency bands and amplitude ranges, since the superposition principle does not hold for nonlinear systems, and nonlinear interactions must be observed lest they cannot be modeled. It is for this reason that we favor random (or pseudorandom) broadband input signals that cover the entire system bandwidth and dynamic range of interest, or a natural ensemble of inputs (if available), which tend to be broadband and cover the physiological range of system operation. Although deterministic input signals can be constructed that satisfy the broadband requirements (e.g., chirp waveforms), most attention has been directed toward random or pseudorandom input signals that approximate white noise over the bandwidth of interest (i.e., nearly constant power spectral density over the system bandwidth). This tendency is due in part to Norbert Wiener's pioneering proposition of more than 50 years ago that the "effective test input" for nonlinear systems is Gaussian white noise (GWN). However, the use of white-noise or quasiwhite random inputs is also supported by the fact that natural stimuli (constituting the ideal input ensemble) are typically broadband and appear stochastic in form. Another important consideration is the availability of appropriate methodologies for the analysis of such data. This motivated the early use of specialized test inputs (e.g., multiple impulses or sinusoids, GWN or quasiwhite signals, etc.) that facilitated the modeling task. It is only recently that the requisite methodologies became available for effective modeling under natural inputs and operating conditions. In addition to the input signal waveform, we must select the dynamic range of input amplitudes over which we intend to study the system. We advocate the use of the full physiological range of input amplitudes (i.e., the normal operating range of the system), but this may be constrained by experimental considerations. It should be remembered that
the input amplitude range and frequency bandwidth define the region of validity of the obtained model in terms of a compact function space (i.e., inputs with smaller dynamic range and/or bandwidth will be compatible with the estimated model, but not necessarily inputs with larger dynamic range and/or bandwidth). The full bandwidth requirement can be established with preliminary testing, as discussed in Section 5.2.1, but the full dynamic range must be determined from physiological considerations of the "normal operating range." This issue is moot when natural (spontaneous) activity data are used for our modeling purposes. Wiener's pivotal contributions to nonlinear system identification are detailed in Section 2.2, where the reasons for recommending GWN inputs are also discussed. Here we simply note that band-limited and range-limited GWN signals (the physical realizations of ideal GWN) satisfy the aforementioned input requirements. However, other properly constructed random/pseudorandom and deterministic signals may satisfy these input requirements as well. Among such random signals, a large family of quasiwhite (i.e., approximately white over the bandwidth of interest) signals with symmetric amplitude probability density functions (termed the CSRS) offer an attractive alternative in practice, due to the afforded flexibility in selecting the appropriate amplitude distribution and other advantages discussed in Section 2.2.4. Note that all random test input signals are selected as stationary and ergodic processes. Among the candidate pseudorandom signals, binary and ternary m-sequences have been the primary choice to date, exhibiting a mix of potential advantages and disadvantages discussed in Section 2.2.4. Among the deterministic signals, the choice has been limited so far to sums of sinusoids (of nearly incommensurate frequencies) and properly constructed sequences of impulses discussed in Sections 2.1.5 and 8.3 for neuronal systems, although chirp waveforms (linear or hyperbolic FM) may also have some potential that is heretofore largely unexplored. Illustrative examples of various "quasiwhite" input signals are given in Section 2.2.4 in connection with nonparametric modeling. It should be noted that input whiteness (or quasiwhiteness) is not a strict requirement for recently developed methodologies (discussed in Section 2.3), although it was a strict requirement for the initial kernel estimation technique (i.e., the cross-correlation technique discussed in Section 2.2.3) for nonparametric modeling of nonlinear systems. It must be emphasized that, whenever possible, a natural ensemble of inputs should be used (which is typically stochastic and broadband). If such is not available, then a quasiwhite test signal of appropriate bandwidth and amplitude distribution for each given application is advisable.
5.1.3 Experimental Considerations
Having selected the type of input signal (i.e., waveform, bandwidth, and dynamic range), we must ascertain the capability of digital-to-analog signal transducers to deliver the selected waveform at the required bandwidth and dynamic range of the experimental stimulus. Depending on the particular characteristics of each experimental preparation, this may or may not be a challenge. Typically, when the bandwidth and dynamic range increase, the experimental implementation of the stimulus becomes more challenging, since the input signal transducers may introduce significant distortions. For this reason, the experimental stimulus that is actually delivered to the system must be recorded and used for the modeling task during data analysis (and not the originally designed input to the transducer). Alternatively, the transfer characteristics of the input transducer can be modeled
separately and accounted for as a precascaded filter in the overall model obtained from the originally designed input. Similar transduction issues exist on the output side. The actual experimental output is measured with an output transducer and discretized with an analog-to-digital converter (ADC). The transfer characteristics of the output transducer can be modeled separately and accounted for as a postcascaded filter in the resulting overall model. The effects of the ADC are usually deemed negligible by securing a sufficient number of bits for quantization levels (typically, 12 or more bits alleviate the quantization error). The sampling interval T is chosen to be the same for the input and output signals, and is selected on the basis of the system bandwidth, B_s, and the input bandwidth, B_x (in Hz), following the Nyquist theorem as T ≤ 1/(2B_x) ≤ 1/(2B_s) (in seconds). In order to avoid aliasing from possible high-frequency noise or interference, the output signal is low-pass filtered at B_x prior to sampling and digitization. It is recommended that B_x be selected equal to B_s, and T = 1/(2B_x), to maximize the efficiency of the data collection and analysis process (i.e., minimize the number of required samples). Preliminary measurements of the ambient noise and systemic interference are strongly recommended in order to guide possible preprocessing of the recorded output signal by means of appropriate filtering. For instance, many physiological systems exhibit significant low-frequency noise and/or interference (e.g., the cardiovascular system exhibits interference from the endocrine and metabolic systems below 0.01 Hz) that can be removed by appropriate high-pass filtering. Likewise, high-frequency noise may be introduced by the instrumentation or by systemic sources, and must be low-pass filtered to avoid aliasing. It is also possible to tailor the spectrum of the input signal in order to maintain a fairly constant signal-to-noise ratio (SNR) across all frequencies for an anticipated noise/interference spectrum. The length of the experimental data record is determined by the key trade-off between stationarity/stability of the experimental preparation and data analysis burden on one hand, arguing for shorter records, and estimation accuracy requirements favoring longer data records on the other hand. Obviously, low SNR conditions require longer data records or repetition of identical experiments and averaging of the output records (under an ergodic assumption) in order to improve the SNR in the output data. For nonlinear systems, another important consideration is that the input signal must have adequate opportunity to stimulate the system with a sufficient variety of input waveforms (as the only indisputable means for model validation). This consideration favors random broadband or quasiwhite inputs, since they offer a broader repertoire of stimulation epochs to the system. Since these conditions cannot be precisely quantified, this judgment remains qualitative. Typically, the longest possible data record is collected under the prevailing experimental constraints, and the other issues are addressed and assessed in the analysis process. It is useful to note that recently developed estimation methodologies have reduced dramatically the data-length requirements and yield accurate estimates for rather short data records, thus relieving the investigator of considerable experimental burden.
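The Nyquist-based choice of T and the anti-aliasing step can be expressed concretely as follows (a hedged sketch; the filter order and the dense acquisition rate fs_acq are illustrative assumptions):

```python
import numpy as np
from scipy import signal

def sampling_interval(B_x):
    """Nyquist-based sampling interval T = 1/(2*B_x), with the input
    bandwidth B_x (Hz) chosen equal to the system bandwidth B_s for
    maximal efficiency of data collection."""
    return 1.0 / (2.0 * B_x)

def antialias_decimate(y, fs_acq, B_x):
    """Low-pass the densely acquired output record at B_x before
    reducing it to the analysis rate 2*B_x, guarding against aliasing
    of high-frequency noise or interference."""
    sos = signal.butter(8, B_x, btype="low", fs=fs_acq, output="sos")
    y_lp = signal.sosfiltfilt(sos, y)
    factor = int(round(fs_acq / (2.0 * B_x)))  # decimation factor
    return y_lp[::factor]
```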
The effect of noise/interference is a critical practical consideration in the modeling of physiological systems and cannot be easily separated from possible system nonstationarities or stochastic systemic variations that are widely viewed as part of the natural milieu of physiological systems. Therefore, special attention must be given to the robustness and noise-suppressing properties of the employed modeling methodologies (especially the estimation methods). The resulting residuals (i.e., the model prediction errors) must be carefully examined for clues that enhance our understanding of the statistics of the ambient
noise/interference environment of a physiological system and for confirmation of the validity of the obtained model. This critical issue is further discussed in Sections 5.3 and 5.4.
5.2 PRELIMINARY TESTS AND DATA PREPARATION
In this section, we discuss the preliminary testing required in order to establish some key attributes of the system that affect the modeling process; namely, the system bandwidth, memory, linearity, stationarity, and ergodicity. As a starting point, it is assumed that a stable experimental preparation is available for testing with the selected input waveforms (or for monitoring the spontaneous activity of the naturally operating system). The means of applying the experimental test input signal (D-to-A transducers) and recording the elicited output signal (sensors and A-to-D transducers) are assumed to be available for reliable data collection, including provisions for minimizing the noise in the recorded/digitized data. Finally, the dynamic range is selected based on the natural operating range of interest.
5.2.1 Test for System Bandwidth
Having selected the input dynamic range, we proceed with the determination of the system bandwidth in an experimental setting by performing tests with successively broader input bandwidths until inspection of the resulting output spectra reveals the highest frequency f_max at which the system ceases to respond appreciably. The latter judgment is complicated by the presence of noise that places a "floor" in the spectral values at high frequencies. Typically, we consider the frequency at which the output spectral density drops below a specified threshold (e.g., -20 dB from peak value) to be f_max, provided that the noise "floor" is lower; otherwise the threshold is set at the noise "floor." Averaging of responses to repetitive identical input signals will generally reduce this noise "floor" and bring it below -20 dB (from peak value) for most applications. Band-limited GWN (or other quasiwhite) test inputs are ideal for this purpose, but other broadband inputs can be used as well (e.g., multiple sinusoids), although the use of a finite number of sinusoids may not test all possible interactions of interest among multiple frequencies. The system bandwidth B_s can be set equal to f_max, and the input signal bandwidth B_x ought to be greater than or equal to f_max in our random broadband experiments (otherwise the system is not fully tested). Since the required sampling interval (to avoid aliasing) is T ≤ (2B_x)^{-1}, it is most efficient to select B_x = f_max and T = (2B_x)^{-1}.
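This thresholding procedure lends itself to a simple spectral computation (a minimal sketch; the Welch segment length, the -20 dB default, and the function name are illustrative assumptions):

```python
import numpy as np
from scipy import signal

def estimate_fmax(y, fs, threshold_db=-20.0, noise_floor_db=None):
    """Estimate the system bandwidth f_max as the highest frequency at
    which the output spectral density stays above a threshold relative
    to its peak (default -20 dB), or above the noise floor if that is
    higher than the threshold."""
    f, pxx = signal.welch(y, fs=fs, nperseg=min(len(y), 4096))
    pxx_db = 10.0 * np.log10(pxx / pxx.max())
    thr = threshold_db
    if noise_floor_db is not None:
        thr = max(threshold_db, noise_floor_db)  # cannot see below the floor
    above = np.where(pxx_db >= thr)[0]
    return f[above[-1]] if above.size else 0.0
```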
5.2.2 Test for System Memory
The next task is to determine the system memory (or settling time), which represents the time required for the effects of the input on the output to diminish below a specified threshold value. A simple preliminary test that can be used for this purpose is the application of an impulsive input with amplitude determined by the dynamic range of the system input that we wish to study. Inspection of the elicited response can determine the time beyond which the output amplitude drops below a specified threshold (e.g., -20 dB from peak value). The value of this threshold depends on the signal-to-noise ratio (SNR) in the output signal because, as in the case of bandwidth determination above, the noise places a "floor" in the observable output values. Averaging of multiple responses to repetitive impulsive inputs (sufficiently separated in time to exceed any conservative estimate of the
system memory) may be required in practice to raise the output SNR above 20 dB, at a minimum. Obviously, step inputs or rectangular pulses can be used for the same purpose as impulses (if they are more convenient experimentally); however, the use of such specialized inputs engenders a certain risk of limiting our view of the system behavior to specialized testing conditions (impulses, pulses, sinusoids). To avoid this risk, a GWN (or other quasiwhite) test input can be used and the system memory can be determined by the extent of significant values in the kernel estimates. This is most practical for the first-order kernel estimate. In the case of broadband random inputs, the system memory can be determined by the application of a statistical threshold on the cross-correlation values between input and output for positive lags (selecting the maximum lag above the threshold as a conservative estimate of the memory extent). The threshold can be evaluated from the computed cross-correlation values for negative lags (assuming causality and open-loop operating conditions that allow us to use the latter values to construct the null hypothesis). The same goal can be achieved with the use of moving-average models (see Section 3.1), instead of cross-correlations, to determine the maximum lag of input impact on the output. In the context of the modified Volterra models, resulting from kernel expansions on appropriate bases (see Section 2.3.1), the determination of the system memory is subsumed into the issue of selecting the proper parameters for the expansion bases. For instance, in the case of the Laguerre basis, the Laguerre parameter α indirectly determines the effective system memory (and vice versa). When iterative procedures are used (as discussed in Section 4.3 in the context of the Laguerre-Volterra network), the Laguerre parameter α is estimated adaptively from the data along with the other model parameters, thus obviating the need for a priori determination of the system memory. In the general context of the discrete Volterra model (see Section 2.1.4), the problem of memory determination is also subsumed into the broader issue of model-order selection, since M constitutes one of the key structural parameters of the model (the memory-bandwidth product of the system). The advocated model-order-selection algorithm (see Section 2.3.1) can be used for determining M along with the nonlinear order Q on the basis of statistical hypothesis testing. The estimated system memory influences the selection of the required experimental data record length. The latter ought to be at least a low multiple of the system memory (no less than three times, and preferably more than 10 times). Obviously, the longer the data record, the better, provided the system remains stationary. This is a critical experimental consideration, since it relates to issues of stability of the preparation (including possible nonstationarities) and feasibility of the experiment. The record length also determines the experimental and computational burden associated with the collection and processing of the data. The advocated "efficient kernel estimation" methods (see Section 2.3) offer considerable advantages over other existing methods in this regard by reducing significantly the requirements of data-record length. Experience with various applications to date has confirmed potential savings by factors of 10 to 100, relative to traditional methods.
The importance of these savings can hardly be overstated, since they affect the experimental feasibility, the quality of the results, the possibility of piecewise nonstationary analysis (if needed), and the total required effort (experimental and computational).
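The cross-correlation memory test described above can be sketched as follows (illustrative code; the 3-sigma threshold and the function name are assumptions rather than prescriptions of the text):

```python
import numpy as np

def estimate_memory(x, y, fs, max_lag_s, n_sigma=3.0):
    """Estimate system memory as the largest positive lag at which the
    input-output cross-correlation exceeds a statistical threshold.
    The threshold is built from negative-lag values, which should be
    pure noise under causality and open-loop operating conditions."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    L = int(round(max_lag_s * fs))

    def r(k):  # r_xy(k) = mean of x(n) * y(n + k)
        return np.mean(x[:len(x)-k] * y[k:]) if k >= 0 \
               else np.mean(x[-k:] * y[:len(y)+k])

    neg = np.array([r(k) for k in range(-L, 0)])
    thr = n_sigma * neg.std()                  # null-hypothesis threshold
    pos = np.array([r(k) for k in range(0, L + 1)])
    sig = np.where(np.abs(pos) > thr)[0]
    return (sig[-1] / fs) if sig.size else 0.0  # memory estimate in seconds
```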
5.2.3 Test for System Stationarity and Ergodicity
With regard to the issue of system stationarity, preliminary testing involves the repetition of identical experiments with the same preparation at different times and the assessment
of the observed differences in the recorded outputs and/or in the obtained models. Since some of the output differences will be due to stochastic factors (ambient noise and systemic interference), this assessment is far from trivial and generally less reliable when based on the observed output data. However, it is more robust when based on observed changes in the estimated kernels (which reduce the effect of noise/interference). In both cases, we must employ statistical analysis in order to make this assessment. This issue is discussed further in Chapter 9, where sophisticated methods for nonstationary modeling are presented. Here, we limit ourselves to the fundamental observation that if the nonstationarities are slow relative to the system dynamics (i.e., when observable changes in the system functional characteristics take longer than about ten times the system memory), the effective tracking of the changes in the system model can be accomplished with piecewise stationary analysis of the data over a sliding (and possibly overlapping) time window or with recursive methods (e.g., recursive least squares). However, if the nonstationarities are fast relative to the system dynamics, then the problem is far more difficult and one of the advanced methods presented in Chapter 9 may be applicable. With respect to system ergodicity, the assessment must also be statistical by virtue of the stochastic nature of this attribute and requires repetition of identical experiments with different preparations (of the same experimental setup) in order to compare the resulting models. The consistency of the obtained models for different preparations ought to be quantified with statistical measures (e.g., mean, standard deviation, skewness, covariance, distributions, etc.) that can be used to assess the degree of ergodicity. Naturally, the key practical question regarding ergodicity is the amount of variability expected from preparation to preparation in terms of the obtained models for given experimental and computational parameters. When this assessment can be made in a quantitative manner, the degree of ergodicity of the system can be established. Although this is rarely done in current practice, we wish to emphasize its importance and encourage future investigations to assess the degree of ergodicity of the system under study.
5.2.4 Test for System Linearity
The fundamental issue of system linearity has to be carefully examined with preliminary testing in order to establish the need for elaborate nonlinear modeling and determine the required order of the nonlinear model. This testing may utilize the superposition principle or some basic property of linear or nonlinear systems. For instance, under the assumption of stationarity (time invariance), linear systems produce sinusoidal outputs in response to sinusoidal inputs (of the same frequency). This basic test can be extended also to nonlinear systems, as depicted in the analysis of Section 2.1.2, where the Volterra kernel of nth order was shown to generate the n, (n - 2), (n - 4), and so on harmonics of a sinusoidal input frequency. Therefore, a simple preliminary test may employ a sinusoidal input and Fourier analysis of the resulting output, which ought to reveal the highest order of Volterra kernel present in the system (nonlinear order) based on the detectable presence of harmonics of the stimulating frequency in the output. It is evident that linearity is asserted when only the first harmonic (i.e., the same frequency as in the input) is detected. This task, of course, is practically subject to output SNR considerations that may obscure this judgment. High output SNR can be achieved by averaging the results of identical repetitive experiments. It is important to note that the sinusoidal input frequency must be varied and sweep through the system bandwidth in successive trials in order to test the system thoroughly. The amplitude of the sinusoidal input must also cover the amplitude range of natural operation. Even though this is a simple and practicable test, it is not a complete
test in the sense that it does not probe the nonlinear interactions among various input frequencies (since the input is a single sinusoid at each trial). The sum-of-sinusoids input, described in Section 2.1.5, can be used in similar fashion to alleviate this deficiency to some extent, since it probes the system for nonlinear interactions among all frequencies present in the input signal. Impulsive inputs can also be used to test the superposition principle by comparing the resulting outputs against the expression of Equation (2.36) and determining the best polynomial fit for various values of the impulse strength A. The degree of the best polynomial fit determines the nonlinear order of the required Volterra model, provided the diagonal kernel values are not zero. Comparison of the elicited response for a linear combination of a pair of impulses to the same linear combination of the individual responses to each impulse separately (see Figure 2.1) is particularly suitable for neuronal systems with spikes (action potentials) at the input. Of course, any combination of linearly independent waveforms can be used for the same purpose (i.e., to test the system linearity by means of the superposition principle). These waveforms must cover the system bandwidth and dynamic range of natural operation, if the test is intended to be complete. However, any specialized waveform allows for the possibility of partial testing of the system linearity that engenders some risk of erroneous assessment. The only input waveform that can assure complete testing and definitive assessment of linearity is the white-noise input signal, covering the entire bandwidth and dynamic range of the system (e.g., the uniform CSRS quasiwhite test inputs discussed in Section 2.2.4). This definitive assessment can be based on the existence of high-order kernels (higher than first-order) and the linear/nonlinear dependence of the output variance on the input power level, which is given by Equation (2.69) in the case of a GWN input. It is evident from Equation (2.69) that the nonlinear order of the required Volterra-Wiener model is the degree of the best polynomial fit of the output variance for different values of the input power level P. The coefficients of this polynomial fit depend on the Euclidean norm of the respective Wiener kernels, which is a reliable measure of kernel significance. Another criterion for testing system linearity is the invariance (or not) of the estimated first-order Wiener or CSRS kernel for different power levels of the quasiwhite test input, in accordance with the analysis presented in Section 2.2.4. A related and commonly used test involves coherence function measurements to assess system linearity (and stationarity), as elaborated in Section 2.2.5. The test of linearity is practically important because it determines the employed modeling methodology (linear or nonlinear). It is most useful when it also yields the required order of nonlinearity, which determines the order of the utilized Volterra model.
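The variance-versus-power criterion can be automated along the following lines (a hedged sketch; the 1% stopping rule and the maximum degree are arbitrary illustrative choices):

```python
import numpy as np

def nonlinear_order_from_variance(power_levels, output_variances, max_deg=5):
    """Fit polynomials of increasing degree to output variance vs. GWN
    input power level P; the lowest degree beyond which the fit stops
    improving meaningfully suggests the nonlinear order (a linear
    system gives variance proportional to P)."""
    best_deg, prev_err = 1, np.inf
    for deg in range(1, max_deg + 1):
        coeffs = np.polyfit(power_levels, output_variances, deg)
        resid = output_variances - np.polyval(coeffs, power_levels)
        err = np.mean(resid**2)
        if prev_err - err < 0.01 * prev_err:   # <1% improvement: stop
            break
        best_deg, prev_err = deg, err
    return best_deg
```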
5.2.5 Data Preparation
The key step in data preparation concerns the possible enhancement of the signal-to-noise ratio (SNR) by proper filtering of some of the noise/interference contaminating the input-output data, and the removal of possible artifacts in the output measurements (typically in the form of baseline/calibration drifts or impulsive outliers). In terms of noise/interference reduction, the most common approach involves appropriate low-pass or high-pass filtering in the frequency bands where the SNR is low. In the presence of high-frequency noise, low-pass filtering may also allow the lowering of the Nyquist frequency by down-sampling (if such is acceptable in terms of the required temporal resolution). In the event of low-frequency noise/interference (which is a frequent occurrence in many physiological systems subject to numerous autoregulatory
mechanisms and unknown systemic or extraneous influences), the necessary high-pass filtering must be viewed in the context of the data record length to avoid numerical artifacts (i.e., the cut-off frequency should be considerably higher than the inverse of the record length). Low-frequency baseline drifts (whether physiological in origin or experimental artifacts) can also be removed by fitting and subtracting linear or polynomial baseline fits. The effect of possible impulsive outliers associated with measurement artifacts can be mitigated by appropriate clipping (hard or soft) of the signal amplitude values. Proper clipping requires prior estimation of the natural amplitude distribution of the recorded signal and selection of a hard or soft clipping threshold based on the dispersion properties of this distribution. Of particular importance in terms of data preparation is the "data consolidation" method advocated in Section 4.2.2. According to this method, we must first define the space of input vectors (using the input epoch in the simplest case or the filter-bank outputs in the case of the modified Volterra model) and then determine the "minimal proximity" value that represents an estimate of the random variability of the input vectors due to extraneous stochastic factors (i.e., a measure of uncertainty in the input vector measurement). Then, the input vectors that are within this "minimal proximity" value can be averaged and the resulting input-vector average can be paired with the average value of the respective outputs. This "data consolidation" procedure is based on the rationale that input vectors within the vicinity of measurement uncertainty cannot elicit significantly different outputs. The resulting "smoothing" may introduce some estimation bias in the model but is expected to have a far more important (and beneficial) effect on reducing input and output noise in the data used for model estimation. Since the latter tends to be a critical factor in practice, especially for iterative estimation procedures (i.e., giving rise to local minima), it is expected that the overall effect will be highly beneficial. Naturally, the trade-off between estimation bias and variance must be carefully examined in practice vis-à-vis the specified "minimal proximity" value that controls the degree of this "smoothing" operation.
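A greedy version of this consolidation can be sketched as follows (illustrative only; the text does not prescribe this particular grouping order, and the Euclidean metric is an assumption):

```python
import numpy as np

def consolidate(X, y, min_proximity):
    """Greedy "data consolidation": average together input vectors that
    lie within the minimal-proximity radius (Euclidean distance) and
    pair each averaged vector with the mean of the corresponding
    outputs.  X: (N, L) matrix of input vectors; y: (N,) outputs."""
    remaining = np.arange(len(X))
    Xc, yc = [], []
    while remaining.size:
        i = remaining[0]
        d = np.linalg.norm(X[remaining] - X[i], axis=1)
        group = remaining[d <= min_proximity]   # vectors within proximity
        Xc.append(X[group].mean(axis=0))
        yc.append(y[group].mean())
        remaining = remaining[d > min_proximity]
    return np.array(Xc), np.array(yc)
```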
5.3 MODEL SPECIFICATION AND ESTIMATION
Having completed the preliminary testing to determine the basic attributes of the system model and having prepared the input-output time-series data, we are now ready to tackle the core tasks of model specification and estimation. These tasks will be described for the two types of nonlinear dynamic models that are deemed most promising at the present time: the modified discrete Volterra (MDV) models and the Volterra-equivalent network (VEN) models, exemplified by the VWM model in Section 4.4. The former class of models (MDV) is linear in the unknown parameters, and both the specification and estimation tasks are tackled in the context of linear algebra (matrix formulation), as discussed in Section 2.3.1. The latter class of models (VEN/VWM) is nonlinear in some unknown parameters because of the imposed equivalent network structure, and both the specification and estimation tasks require iterative search and cost-minimization procedures. The two classes of models will be addressed separately and cover the two primary approaches for cases of natural, as well as controlled, input ensembles. The fact that they do not require specialized test inputs (e.g., white noise) but are applicable for nearly arbitrary natural inputs is of tremendous practical importance, as it relaxes the experimental requirements and broadens the scope of potential applications. It should be reemphasized
that these general methodologies can incorporate specialized knowledge (if available) for practical advantage and, thus, they cannot be criticized for inefficiencies resulting from their generality. They are both general and efficient. They can also be extended to multiple inputs and multiple outputs, as discussed in Chapter 7.
5.3.1 The MDV Modeling Methodology
The MDV model of order R can be put in the matrix form

y = V_R c_R + ε_R (5.1)
as elaborated in Section 2.3.1. A statistical method for the selection of the model order was presented in Section 2.3.1 that can be used to determine the true order R of the MDV model (incorporating the filter-bank size L and the nonlinear order Q). The number of free parameters in the Rth-order model is

P_R = (L + Q)!/(L! Q!) (5.2)
and these are contained in the parameter vector c_R (i.e., the model is linear in terms of the unknown parameters). As discussed in Section 2.3.1, the residual vector ε_R for the true model order R is a linear transformation of the output-additive noise/interference and allows the application of a statistical criterion for determining the true model order in an ascending-order search procedure. The rectangular matrix V_R depends on the input data and the selected filter bank. If it is of reduced rank (because the input ensemble is not "rich" enough), then pseudoinversion can be used to estimate the parameter vector c_R. The parameter estimates can be used to reconstruct the estimates of the Volterra kernels of the system, utilizing the respective expansion basis (defining the filter bank). The statistical properties of the parameter (or kernel) estimates depend on the statistical properties of the residuals, as discussed in Section 2.3.1. Generally, zero-mean, input-independent, uncorrelated residuals secure unbiased and consistent estimates. If the residuals are also Gaussian, then the estimates are efficient (i.e., have minimum variance) when least-squares estimation methods are used. For non-Gaussian uncorrelated residuals, minimum estimation variance can be achieved through minimization of a cost function proportional to the minus log-likelihood function (i.e., maximum likelihood estimation), which is not quadratic in the non-Gaussian case. In the Gaussian case, minimum variance is not achieved when the residuals are correlated. Prewhitening of the data is required in order to achieve minimum variance in this case, which leads to the "generalized least-squares" estimator of Equation (2.46), employing the residual covariance matrix. In general, for nearly Gaussian residuals, the parameter vector estimate with minimum variance is given by
ĉ_R = [G_R V_R]^+ G_R y (5.3)
where G_R denotes the prewhitening matrix of the residuals and the superscript + denotes the generalized inverse (or pseudoinverse). An alternative to prewhitening is the random selection of batches of input-output data points (i.e., input vectors composed of samples at randomly selected times and the corresponding output values) to form the matrix V_R in order to reduce the correlation among residuals and facilitate the estimation procedure by making
G_R the identity matrix. The random selection procedure can be repeated and the obtained estimates from each batch can be averaged to yield the final estimates. The size of each batch must be greater than P_R and equal to a fraction of the data record N (e.g., a batch size of N/8 > P_R yields up to eight distinct batches through random selection). As in the case of network training, where the input-output data points are divided into a training set and a testing set, one may perform the aforementioned estimation task using most of these data batches and use the remaining batches to test the predictive ability of the estimated model ("out of sample" prediction). This procedure is recommended for model validation (as discussed in Section 5.4) because of possible low-frequency noise/interference and/or nonstationarities in physiological systems. It is evident that the estimation of the MDV model is greatly simplified by the fact that the unknown parameters enter linearly into the estimation problem, which allows the use of well-developed matrix inversion methods. However, the size of the pseudoinverted rectangular matrix often becomes very large (especially for high-order systems), which raises issues of practical applicability. This has prompted the introduction of the Volterra-equivalent network (VEN) models, such as the VWM model discussed in the following section.
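In matrix terms, Equation (5.3) and the batch alternative reduce to a few lines of linear algebra (a minimal sketch; the batch count of eight mirrors the N/8 example above, and the function names are illustrative):

```python
import numpy as np

def estimate_mdv(V, y, G=None):
    """Least-squares estimate of the MDV parameter vector c_R.
    With a prewhitening matrix G (built from the residual covariance)
    this is the generalized least-squares form of Eq. (5.3); with
    G = None it reduces to ordinary least squares via the pseudoinverse."""
    if G is None:
        return np.linalg.pinv(V) @ y
    return np.linalg.pinv(G @ V) @ (G @ y)

def batched_estimate(V, y, n_batches=8, seed=0):
    """Average estimates over randomly selected batches of rows -- an
    alternative to explicit prewhitening that reduces residual
    correlation.  Each batch should contain more rows than parameters."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    parts = np.array_split(idx, n_batches)
    return np.mean([estimate_mdv(V[p], y[p]) for p in parts], axis=0)
```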
5.3.2 The VEN/VWM Modeling Methodology
The VWM model was described in Section 4.4 and is used to exemplify the VEN modeling approach. It is generally comprised of a set of "state equations" (linear difference equations transforming the input into a set of internal/state variables) and the cascaded operation of a "hidden layer" followed by an "interaction layer" that jointly apply a static nonlinear transformation on the vector of the internal/state variables to generate the system output. The hidden layer employs cascaded polynomial and sigmoidal activation functions in each hidden unit (acting on the weighted sum of the state variables) and the interaction layer employs polynomial activation functions. These nonlinear operations make the estimation problem nonlinear in terms of most of the unknown parameters. This complication (of nonlinear, instead of linear, parameter estimation) is viewed as the "price we have to pay" in order to compact the model representation (i.e., reduce the number of free parameters in the model). This is especially important for high-order systems, as indicated in Equation (4.188), where the total number of free parameters for a VWM model is shown to be linear with respect to the order of nonlinearity Q and/or R. The key trade-off between model compactness and ease of estimation must be examined in the context of each application, as it depends on the specific characteristics of each system. Our cumulative experience to date shows that model compactness is usually far more important and justifies the additional methodological burden of nonlinear (as opposed to linear) estimation of the unknown parameters. The latter has to be achieved with iterative cost-minimization methods that are subject to various potential pitfalls, as discussed in Section 4.2.2. Nonetheless, many mature methodologies exist for this purpose, albeit requiring careful handling and attention to many procedural details. At the core of the VEN/VWM model specification and estimation tasks is the iterative algorithm by which the values of the model parameters are updated using the output-prediction error. This algorithm was discussed in Section 4.2.2 and may take various forms, depending on how the information about the local first and second derivatives of the cost function is utilized. Generally, methods that utilize this information to adjust the (variable) step size are more efficient than fixed-step methods in terms of convergence. The
potential entrapment in local minima remains the most serious problem of these iterative estimation methods, imposing the burden of multiple randomized or grid-based trials/initializations or "disentrapment" techniques (e.g., simulated annealing or genetic algorithms). We favor the use of multiple initializations within a subspace of the parameter space defined by a preliminary second-order approximation of the system model that places us in the "neighborhood" of the global minimum. In terms of specific parameter update algorithms, we favor either variable-step algorithms using previous update information (i.e., the beta rule and the delta-bar-delta rule) or the "parabolic leap" algorithm described in Section 4.2.2. The use of these algorithms is expected to provide rapid convergence for different initializations. The global minimum is identified among the individual results from multiple initializations in the aforementioned subspace defined by the preliminary second-order approximation. The parameter values that correspond to the global minimum cannot be proven to be unique; however, the equivalent Volterra kernels (constructed from these parameters) are unique. This is the source of the great appeal and power of the Volterra modeling framework. The global minimum of the cost function is found for each selected model order for the training dataset, and the corresponding sum of squared residuals of the testing dataset is used to determine the correct model for the system, based on the statistical criterion described in Section 2.3.1, where the number of free parameters is given by Equation (4.188). The selection of the correct VEN/VWM model order yields a Volterra-equivalent model that can be interpreted either through its equivalent Volterra kernels or through the equivalent PDM formulation, as discussed in the following section.
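The multiple-initialization strategy can be sketched generically as follows (a hedged example; BFGS here merely stands in for the variable-step update rules named above, and all names are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def fit_with_restarts(cost, theta0_center, spread, n_starts=10, seed=0):
    """Minimize a (non-convex) network cost from multiple random
    initializations drawn around a preliminary estimate, keeping the
    best result as the presumed global minimum.
    cost          : function theta -> scalar prediction error
    theta0_center : array; center of the initialization subspace (e.g.,
                    from a preliminary second-order model approximation)
    spread        : per-parameter std. deviation of the initializations"""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        theta0 = theta0_center + spread * rng.standard_normal(len(theta0_center))
        res = minimize(cost, theta0, method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return best
```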
5.4 MODEL VALIDATION AND INTERPRETATION
Having obtained a model for the system under study, we must now validate it to assure its acceptance by the peer community, and we seek to interpret it in a way that is physiologically meaningful and advances the scientific objectives of the study. Our efforts remain focused on the MDV and VWM classes of models.
5.4.1 Model Validation
The validation of the obtained MDV and VEN/VWM models (following the procedures described in the previous section) is performed initially through the normalized mean-square error (NMSE) of the output prediction for the testing dataset. The NMSE is the sum of the squared residuals for all data points in the testing datasets divided by the sum of the squares of the respective de-meaned output values. This NMSE number (scalar) is between 0 and 1 and represents the portion of the output signal power that is not "explained" by the model (it can be viewed as a percentage when multiplied by 100). Of course, this NMSE incorporates all the effects of noise and interference that are detrimental to the prediction task, either because they affect the estimated model or because they are simply present in the output signal. Herein lies the key point: the mere size of the prediction NMSE is not a necessary means of validation (although it is clearly a sufficient means of validation), because a low SNR at the output will result in a large NMSE of the output prediction even for a perfect model of the system. Therefore, an additional means of
validation (a necessary condition for stationary systems) is the consistency of the obtained models for different experiments, regardless of the resulting NMSE of output prediction. Note that the "consistency" of the models can be quantified by means of the mean-square difference of the obtained Volterra kernels for different experiments (both quantities are sketched in code at the end of this subsection). To elaborate on these validation means, we consider the four possible outcomes of two-by-two combinations of low/high prediction NMSE and consistent/inconsistent models resulting from various experiments on the same preparation. If different preparations are used, then ergodic considerations should enter into this assessment. Clearly, if the prediction NMSE is low and the consistency is high, then the validation task has been completed. The same is true when the consistency is high but the NMSE is not low (as long as it is not so extremely high as to call into question the basic validity of the experiment). In this case, we conclude that the model is valid because it is consistent from experiment to experiment and has some predictive ability (albeit possibly limited by the low SNR in the output data). The case of low prediction NMSE (high predictive ability) but low consistency of the models obtained for different experiments points to the likelihood of a nonstationary system for which a predictive quasistationary model (i.e., nearly stationary over the time interval of each experiment) can be obtained for each experiment on the same preparation at different times. However, the nonstationarity of the system leads to low consistency of modeling results from experiment to experiment. The nonstationary characteristics of the system can be studied in this case by tracking the changes of the obtained models over time using successive (or overlapping) time windows of data. A time-varying pattern of these observed changes in model characteristics ought to be established in order to validate the nonstationary modeling results; otherwise, the level of acceptance by the peer community may be low, as no compelling evidence is offered to distinguish the obtained results from apparent random behavior. This is not an unlikely occasion in physiology, where slow changes are known to occur in the characteristics of almost all systems due to various slow modulatory factors (e.g., effects of the endocrine or metabolic systems) or due to degradation processes such as aging, wear, and fatigue. These occasions can make full use of the power of our approach to capture and quantify nonstationary effects and demonstrate the unparalleled efficacy of the advocated methodology. Two illustrative examples (from the cardiovascular and the endocrine/metabolic systems) are presented in Chapter 6. Finally, the case of low predictive ability (high prediction NMSE) and low consistency of models from experiment to experiment (on the same preparation) clearly represents the worst of the four possible outcomes. There are three main implications of this outcome: (1) the quality of the data is low (low SNR); (2) the system possibly exhibits characteristics that are not compatible with practicable Volterra modeling (i.e., the system is of very high order, or it contains nonlinear oscillators or chaotic components that endow it with infinite memory); and (3) the system is nonstationary (either slow or fast nonstationarity relative to the system dynamics). It is advisable in this case to improve the SNR as the foremost priority, since only then can the other two possibilities be explored.
This can be achieved either by increasing the input power (if controllable), or by repetitive experiments and proper data preparation (e.g., filtering or consolidation). The task of improving the SNR in the data remains formidable in the presence of nonstationarities and/or nonlinear oscillators (or chaotic dynamics). Any improvement in SNR can be detected as a reduction in prediction NMSE and allows the exploration of possible nonstationarities with the methods presented in Chapter 9. The exploration of quantitative models for nonlinear oscillators or chaotic dynamics requires substantial improvements in SNR and may
employ a host of methods cited in the literature [Bassingthwaighte et al., 1994; Barahona & Poon, 1996] or some specialized methods briefly described in Chapter 10. An additional means of validation is our ability to interpret the obtained models in a manner that is compatible with existing knowledge about the system functions (if such reliable knowledge is available) or appears plausible to our best scientific judgment (if specific reliable knowledge does not preexist). This brings us to the important issue of model interpretation.
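Before turning to interpretation, the two validation quantities discussed above (prediction NMSE and model consistency) can be computed as follows (a minimal sketch; the normalization of the consistency measure is an illustrative choice):

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean-square error of the output prediction: residual
    power divided by the power of the de-meaned output (0 = perfect,
    1 = no better than the output mean)."""
    resid = y_true - y_pred
    return np.sum(resid**2) / np.sum((y_true - y_true.mean())**2)

def kernel_consistency(k_a, k_b):
    """Normalized mean-square difference between two kernel estimates
    from different experiments (smaller = more consistent)."""
    k_a, k_b = np.asarray(k_a), np.asarray(k_b)
    denom = 0.5 * (np.sum(k_a**2) + np.sum(k_b**2))
    return np.sum((k_a - k_b)**2) / denom
```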
5.4.2 Model Interpretation
Clearly, the whole enterprise of physiological system modeling would be without a clear scientific purpose if we were unable to interpret the obtained models in a manner that is physiologically meaningful and advances scientific knowledge. Even though an uninterpretable predictive model can still be useful for certain purposes (e.g., control or feature-based diagnosis), the full utility of the modeling enterprise is realized only when we can interpret the obtained model in a manner that advances the scientific understanding of the function of the system in the evolving context of systems physiology. This goal is a critical part of the integrative systems viewpoint in physiological research (discussed in Chapter 1) and provides the ultimate justification for the advocated inductive approach to modeling, since physiological relevance is the cornerstone of the alternative deductive approach. Demonstrating that physiological relevance can be ultimately achieved by the inductive approach leaves the competing viewpoint of deductive modeling with no competitive advantage. Although we generally favor a synergistic approach that utilizes the relative strengths of both inductive and deductive modeling methodologies, it is clear that the methodological impetus of our work has been built on the inductive approach. There are two ways in which the obtained models can be interpreted: one is focused on the obtained Volterra kernels and the other is based on the PDM formulation of the Volterra model. The former is appropriate only for low-order models (up to second order), since it is impractical to interpret multidimensional functions. The PDM formulation, however, offers itself for interpretation of high-order models, especially in the case of a separable nonlinearity, resulting in the relatively simple configuration of a small number of parallel L-N cascades. Interpretation of Volterra Kernels. The interpretation of the first-order and second-order Volterra kernels can be made either in the time domain or in the frequency domain, depending on the salient features of the system and the culture of its home discipline. For instance, the time domain is preferred in discussing models of the effects of injected insulin on the concentration of blood glucose, but the frequency domain is preferred in discussing models of the response of primary auditory fibers to acoustic stimuli (tones or multitones). Specific examples will be given in Chapter 6. The first-order kernel in the time domain represents the weighting pattern of input past values in generating (upon integration) the linear component of the system output, as discussed in the introduction to the Volterra series in Section 2.1. Likewise, the second-order kernel in the time domain represents the weighting pattern of products of input past values in generating (upon double integration) the quadratic component of the system output. Naturally, peaks or troughs in these kernels attract attention as indicating the lag values for which the input past epoch influences maximally the output (in a positive manner for a
peak and in a negative manner for a trough). The relative strength of these effects is quantified by the respective kernel values. In the frequency domain, the morphology of these two kernels depicts the presence of possible resonances (peaks of the magnitude) or dissonances (troughs of the magnitude). Although this interpretation is straightforward for the first-order kernel, it is less intuitive for the second-order kernel, where one must invoke the notion of bifrequency. A bifrequency location attains the meaning of a pair of frequencies whose stimulating combination at the input has the effect on the output that is depicted in the second-order kernel value at this bifrequency point. For instance, if the bifrequency point (ω1, ω2) corresponds to a magnitude peak of the Fourier transform (or FFT, in practice) of the second-order kernel, then the combination of input power at frequencies ω1 and ω2 results in a large effect on the output (positive or negative depending on the respective phase). This effect is very small when the respective bifrequency point corresponds to a magnitude trough of the second-order kernel FFT. The bifrequency locations of magnitude peaks and troughs indicate the presence of nonlinearities in individual physiological mechanisms, if the latter can be associated with specific frequency bands. For instance, resonances or dissonances appearing at the frequency bands associated with the dominant breathing rate or heart rate can be interpreted in terms of interactions with the respiratory or cardiac activity. Similar interpretations can be developed for frequency bands associated with myogenic, sympathetic, parasympathetic, endothelial, metabolic, and endocrine activity, as discussed in Chapter 6 for the specific cases of renal and cerebral autoregulation. In general, it should be noted that magnitude peaks (or troughs) on the diagonal bifrequencies (ω1, ω1) of the second-order kernel reflect amplitude nonlinearities of the individual mechanisms residing in the respective frequency band ω1, and off-diagonal peaks (or troughs) at bifrequency locations (ω1, ω2) reflect the presence of nonlinear interactions between the (possible) individual mechanisms residing at the frequency bands ω1 and ω2. Of course, if an individual mechanism resides in two or more frequency bands (i.e., it has two or more resonances or dissonances), then the amplitude nonlinearities of this mechanism will give rise to multiple on-diagonal and off-diagonal bifrequency combinations, as mathematically predicted by the frequency-domain analysis of the Volterra models presented in Section 2.1.3. The morphology of the magnitude of the first-order and second-order kernels in the frequency domain also depicts possible low-pass or high-pass response characteristics of the system (in addition to the resonances and dissonances). An example of this is given in Chapter 6 for a mechanoreceptor.
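Computing the bifrequency magnitude map from an estimated second-order kernel is straightforward (a minimal sketch; the kernel is assumed to be given as a square matrix of lag samples at interval dt):

```python
import numpy as np

def kernel_bifrequency_magnitude(k2, dt):
    """2-D FFT magnitude of a second-order kernel k2[m1, m2]; peaks at
    (w1, w2) flag frequency pairs whose joint stimulation strongly
    affects the output, with on-diagonal peaks reflecting amplitude
    nonlinearities of a single mechanism."""
    K2 = np.fft.fft2(k2)
    freqs = np.fft.fftfreq(k2.shape[0], d=dt)   # Hz, for both axes
    return freqs, np.abs(K2)   # inspect the positive-frequency quadrant
```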
Interpretation of the PDM Model. In the "principal dynamic mode" (PDM) formulation of the Volterra model, the estimated PDMs constitute the minimum set of linear filters that can represent the system dynamics. They result either from eigendecomposition of the Volterra kernels or from the in-bound weights of each hidden unit in the first hidden layer of the Volterra-equivalent network model (in combination, of course, with the respective filter bank), as discussed in Section 4.1.1. Therefore, the PDMs directly depict any dynamic characteristics of the system (e.g., low-pass, high-pass, or band-pass). The outputs of the PDMs are transformed nonlinearly by the multi-input nonlinearity that generates the system/model output and is represented in the model by the combination of the activation functions in the hidden and interaction layers, or by the individual static nonlinearities in the separable case. The interpretation of the static nonlinearities is
based on visual inspection (e.g., saturation, threshold, decompressive, or other characteristics). This visual inspection is easier, of course, in the separable case. The physiological interpretation of the PDM model hinges primarily on our ability to associate features of the individual PDMs with distinct physiological mechanisms. This is not a simple task and it cannot be supported by general theoretical considerations (i.e., there is no mathematical proof or any guarantee that individual PDMs will correspond to distinct physiological mechanisms). This task is now exposed to the evolutionary influences of the accumulated evidence in current and future studies. The aforementioned conjecture regarding correspondence of PDM features to physiological mechanisms is expected to be tested by this accumulating experience and will be either affirmed or evolved. Illustrative examples of PDM model interpretation are given in Chapter 6 for neural, cardiovascular, and endocrine/metabolic systems. These examples constitute only the initial step in this ambitious undertaking of physiological interpretation, with crucial importance for the ultimate impact of nonlinear modeling on systems physiology. It is plausible that additional insights and useful interpretation clues may result from developing equivalent parametric models or modular models (other than the PDM formulation), following the theory and methods presented in Chapters 3 and 4, respectively.
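A simplified extraction of PDMs from a second-order kernel estimate is sketched below (the full procedure of Section 4.1.1 also incorporates the first-order kernel; using the second-order kernel alone is an illustrative simplification):

```python
import numpy as np

def pdms_from_k2(k2, n_modes=2):
    """Candidate PDMs via eigendecomposition of the second-order kernel
    matrix; the eigenvectors with the largest absolute eigenvalues are
    retained as the dominant dynamic filters."""
    K = 0.5 * (k2 + k2.T)                    # enforce symmetry
    vals, vecs = np.linalg.eigh(K)
    order = np.argsort(np.abs(vals))[::-1]   # dominant modes first
    return vecs[:, order[:n_modes]], vals[order[:n_modes]]
```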
5.5 OUTLINE OF STEP-BY-STEP PROCEDURE
In this section, we provide a brief outline of the advocated step-by-step procedure for physiological system modeling in order to assist the reader in developing a comprehensive and useful understanding of this approach in a practical context.

Step 1: Determine the system characteristics of bandwidth, memory, and dynamic range.
Step 2: Examine the system linearity, stationarity, and ergodicity. Establish the degree of nonlinearity (if nonlinear) and the suitability of stationary or quasistationary modeling.
Step 3: Select the input characteristics of waveform class, bandwidth, amplitude distribution, and dynamic range, if the input is experimental and controllable. A natural input ensemble is recommended, if possible.
Step 4: Determine the data-record length requirements and prepare the input-output data according to the system characteristics and the noise/interference conditions (i.e., data filtering or consolidation).
Step 5: Perform the core tasks of model specification and estimation, following either direct inversion methods (for discrete Volterra models) or iterative cost-minimization methods (for Volterra-equivalent network models).
Step 6: Repeat Steps 1-5 for as many data segments (or experiments) of the same and different preparations as possible, in order to explore the consistency of the results and possible nonstationary or nonergodic characteristics.
Step 7: Perform the model validation task based on output prediction accuracy (for the testing datasets) and also based on the degree of consistency of the obtained models from experiment to experiment.
Step 8: Perform the interpretation task for the validated model, either in the PDM formulation (recommended as the most promising option) or by inspecting
the low-order Volterra kernels. The interpretation results can be enriched by incorporating prior knowledge about the system and the results of supplemental parametric and alternative modular analysis.
Step 9: Based on the final interpretation results and analysis of the key physiological issues, design the next set of experiments or natural data collections that can elucidate possible ambiguities or explore additional facets of interest.
Step 10: Disseminate the results in the prescribed rigorous context and utilize the obtained model for the specific purpose of the study (e.g., advancement of scientific knowledge, diagnosis, control, and outcome assessment). (For this, do not forget to read the following chapters as well.)
5.5.1 Elaboration of the Key Step #5
The recommended procedure for the core tasks of model specification and estimation performed by Step 5 is:

(a) Select the "training" and "testing" datasets through four-to-one random sampling of the prepared input-output data.
(b) Compute the second-order or third-order MDV model through direct inversion [see Equation (2.185)], using the model-order-selection (MOS) criterion of Equation (2.200) to determine the appropriate structural parameters α and L through an ascending-order search procedure.
(c) Find the PDMs of the obtained second-order or third-order MDV model using the decomposition approach outlined in Section 4.1.1.
(d) Compute the PDM outputs {u_j(n)} (j = 1, . . . , M < L) through convolution of each PDM with the input data (see the sketch after this list).
(e) Determine the static mapping nonlinearity of the "state vector" u = [u_1(n) . . . u_M(n)]' onto the output data y(n). This can be done either computationally (by defining proper bins of a grid in the state space and computing the average output value for each bin) or through least-squares fitting of a postulated analytical expression for the multivariate static nonlinearity (e.g., multinomial).
An alternative route after step 5(c) is the following:
(d') Use the PDMs to determine the equivalent difference equations of input preprocessing for the VWM model [see Equation (4.181)].
(e') Use the obtained difference equations to initialize the coefficient values of the VWM model, and the kernel decompositions (eigenvalues or singular values) to initialize the polynomial coefficients of the hidden units of the VWM model.
(f') Initialize the sigmoidal functions of the VWM hidden units in the "quasilinear" region (i.e., small curvature within the input range) and set the number of interaction units equal to [H/2] with quadratic activation functions having a very small initial coefficient of second degree (for unity coefficient of first degree).
(g') Train the VWM model for higher orders in the hidden units (Q ≥ 3) using the MOS criterion of Eq. (2.200) on the testing dataset. Note that the resulting VWM model is equivalent to the PDM model, the only difference being that the static nonlinearity is constrained in the VWM model for greater parsimony.
6 Selected Applications
In this chapter, we present examples of the application of the advocated methodology to actual physiological systems. These systems represent various physiological domains (neural, cardiovascular, renal, endocrine/metabolic) for which actual Volterra-Wiener type of modeling has been undertaken over the course of the last 30 years. Naturally, the specific variant of the advocated modeling approach taken in each case depends on the evolutionary stage of the subject methodology at the time of each application and, of course, on the particular characteristics of the system under study. The illustrative applications presented herein should not be viewed as a complete or exhaustive review of the relevant bibliography, but are primarily drawn from work with which the author has been directly involved (with a few exceptions). This affords the requisite familiarity with the procedural and methodological details of each application, as well as an intimate knowledge of the physiological objectives. The applications presented in this chapter are for single-input/single-output cases, since the multiple-input/multiple-output cases are discussed in Chapter 7. Examples from the particular case of neural systems with spike-train inputs are discussed separately in Chapter 8, since this case requires specialized variants of the general methodology that are compatible with the binary signal modality of spike trains. All the examples presented in this chapter have discretized (sampled) continuous inputs, but they can have either discretized continuous or point-process (binary spike-train) outputs (as in the case of neurosensory systems). Section 6.1 deals with neurosensory systems (visual, auditory, and somatosensory) that receive continuous inputs and generate either continuous (graded) or point-process (spike-train) outputs. Neural systems with spike-train inputs are addressed separately in Chapter 8. Examples of cardiovascular and renal systems are drawn from blood-flow autoregulation studies and presented in Sections 6.2 and 6.3, respectively. The dynamic effects of injected insulin on blood glucose concentration are used as an example of a metabolic/endocrine system in Section 6.4.
Many additional applications of the Volterra-Wiener approach can be found in the literature that cannot be reviewed here in the interest of space. Among them, it is worth noting the first known application of this approach to a physiological system (the pupil reflex) by L. Stark and his associates [Sandberg & Stark, 1968; Stark, 1968, 1969], the nonlinear modeling of Aplysia neurons by J. Segundo and his associates [Bryant & Segundo, 1976; Brillinger et al., 1976; Brillinger & Segundo, 1979], of the muscle spindle by G. Moore [Moore et al., 1975], of the ankle reflex by R. Kearney and I. Hunter (1990), of growth dynamics in Phycomyces by E. Lipson (1975), of ganglion cells in the cat retina by J. Victor and R. Shapley (1979a, b, 1980), and of auditory neurons by A. Moller (1973, 1975, 1976, 1977, 1983, 1987). Of particular note are the additional applications (not presented herein) by K. Naka et al., A. French et al., and T. Lewis et al., as well as the groundbreaking work on neuronal systems with point-process inputs by T. Berger and R. Sclabassi and their associates that is reviewed in Chapter 8. Note that the important cases of spatiotemporal and spectrotemporal receptive fields (whose analysis is enabled by this approach) are discussed in Section 7.4.
6.1 NEUROSENSORY SYSTEMS
As indicated above, most neural systems receive as input presynaptic action potentials (spike trains) that represent a particular data modality requiring specialized treatment (discussed separately in Chapter 8). In this section, we will present examples from neurosensory systems that have discretized continuous inputs (e.g., light intensity for the visual system, sound pressure for the auditory system, mechanical strain for the somatosensory system). Some of these neurosensory systems have continuous output (e.g., graded intracellular potential in retinal receptor, horizontal, and bipolar cells) and others generate sequences of action potentials (e.g., retinal ganglion cells, primary auditory nerve fibers, or cuticular mechanoreceptors). In the latter case, a threshold-trigger operator is appended to the Volterra model output in order to generate the binary spike-train output corresponding to the recorded sequence of action potentials (see also Section 8.2). An alternative to thresholding that has been used widely in the past is to define the output of the model as the "likelihood of firing an action potential" instead of the binary numerical value denoting the presence (1) or absence (0) of an action potential at each discrete time bin n. The "likelihood of firing" at each time bin n (outside the absolute refractory period) is quantified by the continuous output of the Volterra model and can be converted into "probability of firing" through proper normalization into the range [0, 1]. Of course, the application of a hard threshold on the graded output signal representing the likelihood or probability of firing can yield a binary (spike-train) representation of the output prediction as a sequence of action potentials. For historical reasons, we begin with illustrative examples from the extensive modeling work that Ken Naka and his associates performed on the vertebrate retina over the last 30 years. This includes the pioneering applications to catfish retinal cells that the author's brother, Panos, performed in collaboration with Ken Naka that gave the initial impetus to this field in the early 1970s. These pioneering applications, along with the earlier contributions of Larry Stark and his associates that represent the first known application of the Volterra-Wiener approach to physiological systems (the pupil reflex), should be credited with establishing the importance of this modeling approach to systems physiology and have been seminal in the formation of this entire field of study.
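The threshold-trigger operation mentioned above can be sketched as follows (a minimal illustration; the refractory handling and the function name are assumptions, not the book's specific operator):

```python
import numpy as np

def threshold_trigger(v, threshold, refractory_bins=1):
    """Convert the continuous Volterra model output v(n) (likelihood of
    firing) into a binary spike train via a hard threshold with an
    absolute refractory period of `refractory_bins` time bins."""
    spikes = np.zeros_like(v, dtype=int)
    last = -refractory_bins - 1
    for n, vn in enumerate(v):
        if vn >= threshold and (n - last) > refractory_bins:
            spikes[n] = 1
            last = n
    return spikes
```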
Note that a major contribution of Panos Marmarelis to this field has been his extension of the Volterra-Wiener modeling methodology to systems with two (or more) inputs, which is discussed in Chapter 7, where illustrative examples are also given from his groundbreaking applications to motion-sensitive cells in the fly composite eye (with Gilbert McCann) and to the bipolar cells of the catfish retina (with Ken Naka). We conclude the illustrative examples of neurosensory systems with a nonlinear model of the fly photoreceptor retinula cells 1-6, which was the author's first application of Volterra-Wiener modeling to a physiological system (with Gilbert McCann) during his Ph.D. studies at Caltech in the mid-1970s, along with some additional applications to photoreceptors from other research groups. Following these examples from the visual system, we present a nonlinear model of primary auditory nerve fibers due to Ted Lewis and his associates as an illustrative example from the auditory system. We close this section with a nonlinear model of a cuticular mechanoreceptor as a simple example of a somatosensory system, which was obtained by the author in collaboration with Andrew French. The latter example includes intracellular and extracellular output recordings. These illustrative examples from neurosensory systems are a fitting tribute to the historical importance of these applications for the establishment of this field and demonstrate the important fact of common functional attributes of neurosensory systems (e.g., the presence of two principal dynamic modes encoding, respectively, the intensity and the rate of change of input information).
6.1.1 Vertebrate Retina
As a first example, we select for historical reasons the Wiener models of catfish retinal cells obtained by Panos Marmarelis and Ken Naka in the pioneering applications of this approach [Marmarelis & Naka, 1972, 1973a, b, c, d, 1974a, b]. A schematic of the retinal organization is shown in Figure 1.3 and a simplified block diagram of neuronal signal flow is shown in Figure 1.4 (note that more elaborate modular models of the outer retinal layers are shown in Figures 4.28 and 4.31 in connection with nonlinear feedback analysis). The first set of Wiener kernel estimates (of first and second order) obtained in catfish retinal ganglion cells is shown in Figure 1.5, where the analyzed output data represent the "probability of firing" measured by superimposition of recorded spike trains from repetitive identical trials (peristimulus histogram) using a segment of band-limited GWN current stimulus injected into the horizontal cell layer. As indicated in Section 1.4, the "biphasic" waveform of the first-order Wiener kernel exhibits the two main functional characteristics of sensory systems: (1) the initial positive lobe (peaking in this case at about 30 ms) indicates that the system responds to the input intensity after weighted integration (smoothing) over a 50-60 ms time window (using the weighting pattern defined by the shape of the positive lobe); and (2) the subsequent undershoot (the negative lobe extending from about 60 ms to about 200 ms) indicates that the system also responds to changes in the smoothed input intensity, thus providing a robust running measure of the rate of change of input intensity through time. The biphasic form of the first-order Wiener kernel in retinal ganglion cells is more evident for higher mean and/or power levels of stimulation (a fact that carries over to other retinal cells as well). This fact suggests the presence of explicit (i.e., through physical neuronal processes) or implicit (i.e., through nonlinearities of the underlying biophysical mechanisms) nonlinear feedback, as discussed in Section 4.1.5.
This biphasic form often diminishes for low levels of stimulation (e.g., dark-adapted preparations exposed to low-intensity light stimulation), taking a monophasic (low-pass) form. It is important to note that the specific kernel waveform can be related to the dynamics of distinct ionic channel mechanisms, in accordance with the detailed analysis presented in Section 8.1.

The shape of the second-order Wiener kernel exhibits a primary positive mount (peaking at the diagonal point with lags τ1 = τ2 = 30 ms) and a secondary positive mount at lags τ1 = 70 ms and τ2 = 100 ms (with the symmetric mount about the diagonal). Negative "valleys" run along the diagonal direction, originating at τ1 = 60 ms for τ2 lags beyond about 75 ms (with the symmetric "valley" originating at τ2 = 60 ms for τ1 > 75 ms). The fact that the primary positive mount peaks at the same lag as the main positive lobe of the first-order kernel suggests a rectifying nonlinearity (consistent with the notion of a threshold mechanism generating the action potentials at the axon hillock of the ganglion cell). However, the subtler features of the second-order kernel require a more nuanced interpretation that was not possible at that early time. At present, the use of a PDM model can offer this kind of refined and elaborate interpretation, extending also to higher-order nonlinearities. Unfortunately, these data have not been analyzed in the advanced context of PDM modeling (a fact applying to all retinal cell models to date, with the exception of the related L-N and L-N-M cascade models of some retinal cells discussed below). The presented Wiener model of the horizontal-to-ganglion cell pathway was validated by means of its ability to reduce the mean-square error of the model prediction for the experimental input-output data, as illustrated in Figure 1.6.

Recently, the author and his associates attempted some preliminary PDM analysis of retinal data provided by Ken Naka for this purpose. The initial results indicate the presence of only two PDMs, associated with rectifying- and saturating-type nonlinearities. If confirmed with additional data analysis, this result offers the attractive prospect of interpreting the PDM models of retinal cells in the context of biophysical mechanisms, thereby identifying specific ionic channels associated with the electrochemical activity of retinal cells under various operating conditions (following the procedures presented in Sections 8.1 and 8.2). The time is ripe to examine these issues thoroughly and establish the basic facts of retinal cell function in light of our current improved estimation and rigorous understanding of kernels.

We should note that these Wiener kernel estimates depend on the mean level and the power level of the GWN stimulus with which they have been estimated. Therefore, the aforementioned kernel morphology may change if different mean and/or power levels are used for the GWN test input (an important fact largely overlooked in the past). Of course, these changes in kernel morphology will be minimal for the second-order Wiener kernel if the system is well approximated by a second-order model (although this is not true for the first-order Wiener kernel estimate, which can be affected significantly if the mean level of the GWN input is altered, even for a second-order system).
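For concreteness, the discrete-time second-order Volterra prediction that underlies such mean-square-error comparisons can be sketched as follows (a generic illustration; the kernel arrays k0, k1, k2 and the memory length M are placeholders, not estimates from the retinal data):

```python
import numpy as np

def volterra2_predict(x, k0, k1, k2):
    """Second-order (discrete) Volterra model prediction:
    y[n] = k0 + sum_m k1[m] x[n-m]
              + sum_{m1,m2} k2[m1,m2] x[n-m1] x[n-m2]."""
    M = len(k1)
    N = len(x)
    y = np.full(N, k0, dtype=float)
    for n in range(N):
        # Input history x[n], x[n-1], ..., x[n-M+1], zero-padded at the start
        past = x[max(0, n - M + 1):n + 1][::-1]
        v = np.zeros(M)
        v[:len(past)] = past
        y[n] += k1 @ v + v @ k2 @ v
    return y
```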
An illustration of this is given in Figure 6.1, where the first- and second-order Wiener kernels of the light-to-horizontal cell system in the catfish retina are shown for two different mean and power levels of the GWN input [Marmarelis & Naka, 1973a]. The first-order Wiener kernel for the higher level of stimulation exhibits an undershoot after a lag of approximately 90 ms (unlike its lower-level counterpart, which exhibits no undershoot) and has a shorter peak-response time (about 40 ms versus about 70 ms for its lower-level counterpart). This is indicative of nonlinear feedback in this system (see Section 4.1.5). Furthermore, the second-order Wiener kernel estimates (although of relatively poor quality) demonstrate that the higher level of stimulation gives rise to stronger nonlinearities (as depicted by the larger values of the second-order kernel for the higher level of stimulation), and the negative sign of the primary trough suggests a compressive nonlinearity (i.e., the second derivative of the nonlinearity is negative). The values of the second-order Wiener kernel for the lower level of stimulation are small, suggesting an operating point in a quasilinear region or close to an inflection point of the putative nonlinearity.

Figure 6.1 First-order and second-order Wiener kernels of the light-to-horizontal cell system for low (B) and high (A) mean light intensity levels [Marmarelis & Naka, 1973a].

It is interesting to note that the first-order Wiener kernel of the light-to-ganglion cell system shown in Figure 6.2 exhibits more differentiating characteristics (i.e., more sensitivity to input change and broader bandwidth) than its counterpart for the light-to-horizontal cell system shown in Figure 6.1, an effect that can be attributed to neural processing performed by the mediating bipolar and amacrine cells. In addition, the second-order Wiener kernel of the light-to-ganglion cell system indicates stronger nonlinearities (than the light-to-horizontal cell system) of decompressive character, attested to by the positive sign of the principal mount.

Figure 6.2 First-order and second-order Wiener kernels of the light-to-ganglion-cell system [Naka, 1973b]. The scale bar in the left panel corresponds to 64 ms.

The historical value of these examples notwithstanding, it would be remiss not to remind the reader that the approach advocated in this book favors the estimation of the Volterra kernels of the system with more accurate and reliable methods than cross-correlation (e.g., LET, LVN, or VWM), presented in Sections 2.3, 4.3, and 4.4. Therefore, equipped with our current knowledge and advanced methodologies, we should revisit the subject of nonlinear dynamic modeling of retinal cells, because it offers a stable experimental preparation of great neurophysiological importance. Ken Naka and his associates (most notably Hiroko Sakai) performed extensive studies of catfish retinal cells with various types of GWN stimuli over many years, starting with the pioneering studies at Caltech in the early 1970s (in collaboration with Panos Marmarelis) cited above. These extensive and meticulous studies have elucidated many aspects of retinal function in the context of Wiener models and have yielded a valuable database for additional analysis. The experimental data included responses of horizontal, bipolar, amacrine, and ganglion cells to GWN light stimuli, as well as to current stimuli injected into horizontal, bipolar, and amacrine cells. The studies included multiinput as well as multicolor testing and modeling [Sakai & Naka, 1985, 1987a, b, 1988a, b]. Some illustrative examples are given below for amacrine and ganglion cells.

Figure 6.3 shows a schematic of retinal organization depicting the main signal-flow pathways and the neuronal classification proposed by Naka. Typical first-order and second-order Wiener kernel estimates for various types of bipolar, amacrine, and ganglion cells are shown in Figure 6.4 (on-center and off-center). An interesting demonstration of how the difference in the second-order kernels of type-C and type-NB amacrine cells can be explained by means of a postcascaded linear filter is presented in Figure 6.5, whereby the form of the second-order kernel of the type-C amacrine cell corresponds to an L-N cascade, but that of the type-NB amacrine cell corresponds to an L-N-M cascade model. Finally, we show in Figure 6.6 the first-order Wiener kernel estimates obtained for the neuronal pathways between different types of amacrine and ganglion cells when a GWN current stimulus is injected into the amacrine cell and the elicited response in the ganglion cell is recorded as output.

Figure 6.3 Schematic diagram of the neural organization of the catfish retina proposed by Naka and his associates. Arrows indicate signal pathways with sign-noninverting (+) or sign-inverting (-) transmission. Note that all monosynaptic inputs to ganglion cells are sign-noninverting [Sakai & Naka, 1988b].

Figure 6.4a Typical first-order Wiener kernels of each type of retinal neuron. The stimulus was a full-field white-noise light stimulus whose mean luminance was 30 μW/cm². Column A shows first-order kernels from depolarizing (on-) cells; Column B shows the first-order kernels from hyperpolarizing (off-) cells. Four kernels from the same cell type were superimposed by normalizing the kernel amplitude. Kernels on the lowest traces were computed from extracellular spike discharges, whereas the others were from intracellular slow potentials. The kernel ordinate unit is mV · cm² · μW⁻¹ for intracellular responses and spikes · cm² · μW⁻¹ for extracellular spike discharges [Naka & Bhanot, 2002].

Figure 6.4b Typical second-order Wiener kernels obtained from amacrine cells (A), from intracellularly recorded ganglion cells (B), and from extracellular ganglion spike discharges (C). Solid lines in the kernel indicate a mount or a depolarization, and dashed lines indicate a valley or a hyperpolarization. Although there are some variations, the second-order kernels computed from amacrine cells are essentially of three kinds (type-C, type-NB, type-NA). The second-order kernels computed from ganglion cells, either from intracellular slow potentials (B) or from extracellular spike discharges (C), correspond to the same three kinds obtained from amacrine cells. The kernel ordinate unit is mV · cm⁴ · μW⁻² for intracellular responses and spikes · cm⁴ · μW⁻² for extracellular spike discharges [Naka & Bhanot, 2002].

Figure 6.5 Validation of an L-N cascade model for a type-C amacrine cell (left) and of an L-N-M cascade model for a type-NB amacrine cell (right) based on their second-order Wiener kernels. The impulse responses of the prior filter L and posterior filter M are shown. The output of the first filter L is squared [Naka & Bhanot, 2002].
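The diagnostic role of the postcascaded filter can be illustrated with synthetic kernels: for an L-N cascade with a squaring nonlinearity, the second-order kernel is an outer product of the prior filter with itself (rank one), whereas postcascading a filter M (L-N-M) smears this outer product along the diagonal and raises its rank. A sketch under these assumptions (the filter waveforms are hypothetical, not fitted to the amacrine-cell data):

```python
import numpy as np

M = 64                               # kernel memory (samples)
t = np.arange(M)
g = (t / 8.0) * np.exp(-t / 8.0)     # hypothetical prior filter L
m = np.exp(-t / 4.0)                 # hypothetical posterior filter M

# L-N cascade: k2 is a rank-1 outer product of the prior filter
k2_LN = np.outer(g, g)

# L-N-M cascade: k2(t1, t2) = sum_s m[s] g[t1-s] g[t2-s]
k2_LNM = np.zeros((M, M))
for s in range(M):
    gs = np.concatenate([np.zeros(s), g[:M - s]])   # g delayed by s samples
    k2_LNM += m[s] * np.outer(gs, gs)

# The rank-1 structure distinguishes the two cascades:
print(np.linalg.matrix_rank(k2_LN))    # 1
print(np.linalg.matrix_rank(k2_LNM))   # > 1
```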
It is intriguing that homotypical cell pathways (e.g., NA → GA, NB → GB) exhibit the typical profile of neuronal transduction in neurosensory systems (intensity and rate sensitivity), whereas heterotypical cell pathways (e.g., NA → GB, NB → GA) exhibit a dynamic characteristic of double differentiation (implying acceleration or edge sensitivity). The latter should be expected in tasks specializing in the detection and coding of on-off transient response characteristics that relate to the appearance and disappearance of sudden events (possible "threats"), as well as edge information related to motion detection and coding.

The physiological interpretation of these modeling results elucidates certain aspects of visual information processing in the vertebrate retina. In the outer plexiform layer (i.e., the receptor-horizontal-bipolar complex), we take into consideration the putative nonlinear feedback model presented in Section 4.1.5 that explains the experimentally observed first-order Wiener kernels and their changes with changing GWN input power. We posit that the receptor-horizontal complex (like most neurosensory systems) processes intensity and rate-of-change information of the input signal in a manner that retains adequate sensitivity (gain) over a broad range of stimulation by dynamically adjusting its response characteristics to the input conditions (mean and power level) using nonlinear feedback in tandem with a compressive nonlinearity (see Section 4.1.5). Furthermore, this arrangement secures faster response and broader bandwidth when the input intensity and rate of change are greater. The latter attribute meets the critical behavioral requirement of rapid detection of threats for survival purposes. The spatial arrangement of the receptor-horizontal cell complex also provides processing capabilities for contrast enhancement and edge detection (center-surround organization of the receptive field), as discussed in Section 7.4 in connection with spatiotemporal modeling.

The addition of the bipolar cell (the main "bridge" between the outer and the inner plexiform layers of the retina) to the receptor-horizontal complex further enhances the differentiating characteristics of visual (pre)processing in time and space that emphasize contrast and rate-of-change information. The triadic synapses (bringing together horizontal processes and bipolar dendrites at the receptor terminals) assist with this differentiating task in time and space, and implement the continuous adjustment of processing characteristics to the changing conditions of visual stimulation (the morphology of the triadic invagination changes with changing stimulus intensity). The receptive field of the bipolar cell performs double differentiation in space (for edge enhancement) and single differentiation in time (for rate-of-change enhancement), both band-limited operations, as discussed further in Sections 7.2.2 and 7.4.1.
In the inner plexiform layer (i.e., the amacrine-ganglion complex), the obtained modeling results point to the existence of three channels of information processing (corresponding to the traditional classification into on-sustained, off-sustained, and on/off-transient response characteristics), denoted by Naka et al. as types A, B, and C, respectively. These three channels of information processing also interact through amacrine interneuronal connections in order to encode the rate of change and intensity of signals related to motion (translational and rotational) over spatial patches that retain localization of motion information. For instance, the differentiation-rectification-integration cascade of the type-NB amacrine cell shown in Figure 6.5 provides a running measure of speed, and a measure of acceleration is given by the double-differentiation characteristic seen in the first-order kernels of the heterotypical amacrine-ganglion pathways (e.g., NA → GB or NB → GA) shown in Figure 6.6.

Figure 6.6 A1 shows the first-order Wiener kernels derived for the neuronal pathway from type-NA amacrine to GA ganglion cell (solid line) and from type-NB amacrine to GB ganglion cell (dashed line). The two kernel waveforms are almost identical. B1 shows the first-order Wiener kernels derived for the neuronal pathway from type-NB amacrine to GA ganglion cell (solid line) and from type-NA amacrine to GB ganglion cell (dashed line). Fourier transforms of the kernels yield the gain characteristics in A2 and B2, respectively [Sakai & Naka, 1988a].

An interesting example of a cascade model is also given by Emerson and his associates for visual cortical cells in the cat [Emerson et al., 1989, 1992]. This work is revisited in Section 7.4 in the context of spatiotemporal modeling. Here, we limit ourselves to single cortical cells (both "simple" and "complex") for which the cascade models of Figures 6.7 and 6.8 were obtained [Emerson et al., 1989]. For a simple cortical cell, the cascade model of Figure 6.7 suggests that the aggregate effect of two long chains of concatenated operations from the retina to the LGN (depicted within the component L1) can be approximated by a linear operator with the typical dynamic characteristics of visual sensory neurons (e.g., the first-order kernel of retinal ganglion cells shown earlier). This first quasilinear operation is followed by a half-wave rectifying static nonlinearity (ramp threshold) and another linear filter, with the characteristics shown in Figure 6.7.
Figure 6.7 An expanded version of the L1-N-L2 model of a simple cortical cell, showing the components within L1 that could account for the linear representation of stimulus luminance in an off area of the cortical receptive field. The first three sub-boxes in L1 account for linear transduction of light to an internally transmitted signal, linear filtering of the signal, and conversion into a spike train, all occurring between the receptors and the output of the ganglion cells in the retina. The knee of the ganglion-cell threshold (N1.1) is shown far to the left to signify that the threshold is so low that a nonlinearity would appear only for the most extreme negative modulations of the steady-state signal at mean luminance (indicated by the dashed vertical line). The last three sub-boxes account for transmission of the spike train to the LGN, conversion through synaptic input to a higher-threshold LGN neuron, nonlinear recoding into another spike train through principal cells of the LGN, and transmission to the cortex. The last sub-boxes, operations L1.3 and L1.4, include inversion of the on-pathway signal polarity, which combines with the off-pathway signal to recreate a linear intracellular voltage in the cortical neuron. This "push-pull" approach has the effect of cancelling the even-ordered nonlinearities distributed along each of the complementary pathways [Emerson et al., 1989].

For a complex cortical cell, the resulting cascade model has two parallel branches corresponding to the on and off pathways, each associated with a ramp-threshold static nonlinearity. The combination of these two pathways gives the total behavior of full-wave rectification, as shown in Figure 6.8, consistent with the approximate "squaring" operation required by the "energy model" of Adelson and Bergen (1985).

These initial interpretations of the modeling results will be further elucidated in the discussion of spatiotemporal modeling of visual systems, given in Section 7.4 in the context of multiinput modeling. They can also benefit significantly from the application of the advocated PDM modeling methodology, which has not yet been applied to the wealth of data available from numerous experimental studies of the visual system. Additional results on retinal cell modeling and interpretation will be presented in Chapter 7 in connection with two-input modeling, whereby valuable insight into the spatiotemporal organization of the retina was obtained through the pioneering studies performed by P. Marmarelis and Naka using spot/annulus stimuli, which opened the path for future multiinput modeling studies. In particular, the basic characteristics of the spatiotemporal receptive fields of bipolar, amacrine, and ganglion cells were revealed for the first time through these studies.
Figure 6.8 Model for generating the squaring function needed by complex cortical cells for motion perception (energy model). A separate static nonlinear operation is provided for the linear on and off pathways. Combination of the static nonlinearities produces a close approximation of a full squarer at the output (see bottom panels) [Emerson et al., 1989].

These efforts culminated with the measurements of complete spatiotemporal receptive fields of retinal ganglion cells by Citron, Kroeker, and McCann [Citron et al., 1981], which was an extraordinary achievement and was extended to cortical cells by Emerson and Citron [Citron & Emerson, 1983; Citron et al., 1981]. As will be seen in Section 7.4, these spatiotemporal receptive field measurements are cast in a nonlinear context and contain valuable information about retinal processing unavailable in any other measurements attained to date. Given the additional methodological power that becomes available with the techniques advocated in this book (e.g., PDM modeling), it is impossible to overstate the potential for significant advancements in spatiotemporal neural processing in the immediate future.
6.1.2 Invertebrate Retina
An example from invertebrate vision is the modular model of the retinula cells 1-6 of the fly (Calliphora erythrocephala) photoreceptor in a single ommatidium of the composite eye, which takes the form of the L-N cascade shown in Figure 1.7. This modular model was developed by analyzing the CSRS kernels shown in Figure 6.9, which were obtained with an eight-level CSRS test input during the author's Ph.D. studies with Gilbert McCann [Marmarelis & McCann, 1977].

Figure 6.9 First- and second-order CSRS kernel estimates of the photoreceptor of Calliphora erythrocephala (with white eyes) as obtained through the use of an eight-level CSRS stimulus [Marmarelis & McCann, 1977].
These kernel estimates suggest an equivalent L-N cascade model based on the analysis of Section 4.1.2 regarding the relationships between kernels of different order in an L-N cascade model (viz., a "cut" along one axis of the second-order kernel for any given value of the other lag is a scaled version of the first-order kernel). However, close inspection of the second-order kernel of Figure 6.9 indicates that the aforementioned simple scaling relation holds true only for certain "cuts", specifically for τ2 "cut" values from 10 ms to 15 ms and from 20 ms to 25 ms. Outside these ranges of τ2 "cut" values, the second-order "cuts" deviate from a scaled version of the first-order kernel, suggesting a more complex model. Furthermore, the observed changes in the first-order CSRS kernel estimates for different quasiwhite input power levels are consistent with the presence of nonlinear negative decompressive feedback. This becomes more evident in dark-adapted cases, where the second-order model becomes inadequate to predict the system output (i.e., higher-order nonlinearities are present). The inadequacies of the second-order model are also evidenced by its inability to account for certain higher frequencies in the system output, as illustrated in Figure 6.10 for a ternary input.

It is worth noting that the retinula cells 7-8 of the fly ommatidium have kernels distinct from those of the retinula cells 1-6 shown above [McCann, 1974; McCann et al., 1977; Eckert & Bishop, 1975], and they project directly to the medulla along with the large monopolar cells (LMC) that are postsynaptic to the retinula cells 1-6 at the laminar layer. The illumination spectral response characteristics are also different, with the retinula cells 7-8 exhibiting a much stronger UV response (than the cells 1-6), which is possibly used by the fly for navigational purposes [Fargason & McCann, 1978].
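The "cut" test described above is straightforward to apply numerically: for an ideal L-N cascade, every cut of the second-order kernel should be collinear with the first-order kernel. A minimal sketch, assuming kernel estimates k1 (length M) and k2 (M × M) are available as arrays:

```python
import numpy as np

def cut_similarity(k1, k2):
    """For each fixed lag t2, compare the cut k2[:, t2] against k1.
    |cos| close to 1 indicates the cut is a scaled version of k1,
    as required by an L-N cascade."""
    sims = np.zeros(k2.shape[1])
    for t2 in range(k2.shape[1]):
        cut = k2[:, t2]
        denom = np.linalg.norm(cut) * np.linalg.norm(k1)
        sims[t2] = np.abs(cut @ k1) / denom if denom > 0 else 0.0
    return sims

# Cuts whose |cos| falls well below 1 flag departures from the simple
# L-N scaling relation (as observed outside the 10-15 ms and 20-25 ms
# cut ranges for the fly photoreceptor kernels discussed above).
```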
Figure 6.10 Portion of a three-level CSRS stimulus (trace a), the corresponding system response (trace b), and the second-order model-predicted response (trace c) [Marmarelis & Marmarelis, 1978].
These observations were generally confirmed by the extensive studies of Gilbert McCann and his associates in the fly eye. For example, Figure 6.11 shows the first-order and second-order Wiener kernels of retinula cells 1-6 for two different sizes of spot stimuli (corresponding to two different power levels of GWN stimulation). The increasing undershoot in the first-order kernel and the rising magnitude of nonlinearity (peak of the second-order kernel) are evident for increasing stimulus size [McCann, 1974].

Figure 6.11 Wiener kernels for retinula cell responses to single-spot stimuli. A: first-order kernels. B, C: second-order kernels. Curve 1 of A and B is for a 0.25° diameter spot at the central field axis; dynamic range 80 cd/m², average 40 cd/m². Curves 2 and 3 of A and C are for a 0.5° diameter spot 1.3° from the central field axis (half-sensitivity point); dynamic range 40 cd/m². Ordinate units for h1 are (μV)/[(intensity unit) · s]; for h2, (μV)/[(intensity unit) · s]². [Adapted from McCann, 1974.]

Another example of first-order and second-order Wiener kernels, of the LMC response to center/surround stimulation in the fly Eristalis, is given by Andrew James [James, 1992] and shown in Figure 6.12. The first-order Wiener kernels of the lamina in the fly eye change with stimulation power level in a manner that suggests the presence of nonlinear decompressive feedback (i.e., increased undershoot or decreased damping), in accordance with the analysis presented in Section 4.1.5.

Figure 6.12 First-order and second-order Wiener kernels for an Eristalis LMC. The upper panel shows the first-order kernels for center (solid) and surround (dashed) stimuli in mV/(C·ms). The lower panels show the second-order kernels in units of mV/(C·ms)² for the center (haa) and the surround (hbb) stimuli. [Adapted from James, 1992.]

Another interesting application of this approach to an invertebrate photoreceptor was made by Andrew French and his associates to the locust Locusta migratoria using green GWN stimuli of 50 Hz bandwidth. Several levels of stimulation were applied and a nonlinear feedback model was developed (using a ratio nonlinearity) for mean stimulus levels above 2000 photons per second [French et al., 1989]. However, this feedback model was not able to explain the behavior of the locust photoreceptor at lower levels of stimulation, although nonlinearities have been detected at such low levels. The results confirmed the previous qualitative observations of faster response (in terms of shorter latency and broader bandwidth associated with the emergence of an undershoot) and reduced gain sensitivity with increasing stimulus intensity. The possibility of feedback from a later stage to an earlier stage of a cascade was first suggested by Borsellino and Fuortes (1968), following on the classic model of Fuortes and Hodgkin (1964), and was later elaborated in Limulus photoreceptors by Grzywacz and Hillman (1988). The cumulative experience indicates the presence of forward and feedback nonlinearities in the process of phototransduction, intertwined with dynamics that attain more complicated forms as the range of stimulation increases, and especially as it extends into the lower intensity levels (i.e., when the model seeks to explain the dynamic behavior from near the dark-adapted state to a high light-adapted state). We postulate a photoreceptor model with negative decompressive feedback f(y) (see Section 4.1.5) that is described by the nonlinear differential equation
$$Ly + f(y) = x \qquad (6.1)$$
where $L$ is a linear differential operator, $L(D) = a_p D^p + \cdots + a_1 D + a_0$. In the steady state, for a fixed input $x_0$, we have a fixed output $y_0$ that is given by the sigmoidal function $y_0 = F(x_0)$, where $F$ is the inverse of the decompressive function $f(y) + a_0 y$ [because all other terms in the linear operator $L(D)$ vanish at the steady state]. For instance, if $f(y_0) = \tan(A y_0)$ and $a_0 = 0$, then $F(x_0) = \arctan(x_0)/A$. This is consistent with the experimentally observed decreasing gain sensitivity of the photoreceptor with increasing mean level of stimulation, because the gain sensitivity relates to the derivative of $F$. A perturbation $x(t)$ about a fixed stimulus level $x_0$ yields

$$Ly + a_0 y_0 + f(y_0 + y) = x_0 + x \qquad (6.2)$$
where $y(t)$ is the response to the perturbation around the fixed response level $y_0$ corresponding to $x_0$. If we use a Taylor expansion of $f$ about $y_0$,
$$f(y_0 + y) = f(y_0) + \frac{\partial f}{\partial y}(y_0)\,y + \frac{\partial^2 f}{\partial y^2}(y_0)\,\frac{y^2}{2} + \cdots \qquad (6.3)$$
then Equation (6.2) yields the perturbation equation

$$Ly + \sum_{n=1}^{\infty} c_n y^n = x \qquad (6.4)$$
because $a_0 y_0 + f(y_0) = x_0$, where

$$c_n = \frac{1}{n!}\,\frac{\partial^n f}{\partial y^n}(y_0) \qquad (6.5)$$
Therefore, the dynamic response to an input perturbation depends on $y_0$ through the coefficients $\{c_n\}$. This is consistent with the experimental finding that the estimated Volterra kernels change for different mean levels of stimulation (and response). Specifically, the first-order Volterra kernel for mean levels $x_0$ and $y_0$ is the solution of the linear differential equation

$$Ly + c_1(y_0)\,y = x \qquad (6.6)$$
Since $c_1$ is the derivative of $f$ at $y_0$ (i.e., the slope of the decompressive nonlinearity at $y_0$), the anticipated changes in the waveform of the first-order Volterra kernel are greater for the values of $y_0$ where the derivative changes faster. This relatively simple nonlinear model seems to explain the experimental observations to date. The second-order Volterra kernel for mean levels $x_0$ and $y_0$ can be found (see Section 4.1.5) in the bifrequency domain to be

$$K_2(\omega_1, \omega_2) = -c_2(y_0)\,K_1(\omega_1)\,K_1(\omega_2)\,K_1(\omega_1 + \omega_2) \qquad (6.7)$$
where

$$K_1(\omega) = \frac{1}{L(\omega) + c_1(y_0)} \qquad (6.8)$$
When the morphology of the second-order kernel in certain cases changes with mean stimulation level differently from the manner stipulated by Equation (6.7), we must augment the model of Equation (6.1) to include the derivative term $\dot{y}$ in the feedback nonlinearity as $f(y, \dot{y})$. Of course, the precise form of this nonlinearity with respect to $\dot{y}$ depends on the observed changes of the kernels and requires elaborate analysis using the methodology presented in Section 4.1.5. Specifically, if we denote

$$c_{k,l} = \frac{1}{k!\,l!}\,\frac{\partial^{k+l} f}{\partial y^k\,\partial \dot{y}^l}(y_0, 0) \qquad (6.9)$$
then

$$K_1(s) = \frac{1}{L(s) + c_{0,1}\,s + c_{1,0}} \qquad (6.10)$$
and

$$K_2(s_1, s_2) = -\left[c_{2,0} + c_{0,2}\,s_1 s_2 + c_{1,1}\,\frac{s_1 + s_2}{2}\right] K_1(s_1)\,K_1(s_2)\,K_1(s_1 + s_2) \qquad (6.11)$$
or, in the time domain, with $\tau_m = \min(\tau_1, \tau_2)$:

$$
\begin{aligned}
k_2(\tau_1, \tau_2) = {} & -c_{2,0} \int_0^{\tau_m} k_1(\lambda)\,k_1(\tau_1 - \lambda)\,k_1(\tau_2 - \lambda)\,d\lambda \\
& - c_{0,2} \int_0^{\tau_m} k_1(\lambda)\,k_1'(\tau_1 - \lambda)\,k_1'(\tau_2 - \lambda)\,d\lambda \\
& - \frac{c_{1,1}}{2} \int_0^{\tau_m} k_1(\lambda)\left[k_1'(\tau_1 - \lambda)\,k_1(\tau_2 - \lambda) + k_1(\tau_1 - \lambda)\,k_1'(\tau_2 - \lambda)\right] d\lambda
\end{aligned}
\qquad (6.12)
$$
where $k_1'$ denotes the derivative of $k_1$. It is evident from Equations (6.10) and (6.11) that the augmented model affords greater flexibility in explaining the experimentally observed changes in the first-order and second-order Volterra kernels. This model structure can be validated by checking the analytical relation between the first-order and second-order Volterra kernels, provided that accurate estimates can be obtained from the data. Note that the key parameters $\{c_{k,l}\}$ depend on the mean level of stimulation and on the morphology of the function $f(y, \dot{y})$ at $y = y_0$ and $\dot{y} = 0$.

Modeling of the fly eye has been popular because of the relative simplicity and stability of the experimental preparation. It will be revisited in Chapter 7 in connection with two-input modeling, because it represents the seminal application of two-input modeling following the Volterra-Wiener approach that was pioneered by Panos Marmarelis in connection with Gilbert McCann's studies of directional selectivity in the fly eye [Marmarelis & McCann, 1973].
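The dependence of the kernels on the operating point y0 in Equations (6.6)-(6.8) can be explored numerically. The sketch below uses a first-order operator L(D) = a1 D + a0 and the tangent nonlinearity f(y) = tan(Ay) mentioned earlier; the coefficient values are illustrative and not fitted to any photoreceptor data.

```python
import numpy as np

a1, a0, A = 0.01, 0.0, 2.0                    # illustrative operator/nonlinearity
w = 2 * np.pi * np.linspace(0.1, 400, 512)    # frequency grid (rad/s)

def K1(w, y0):
    # Eq. (6.8): K1(w) = 1 / (L(w) + c1(y0)), with L(w) = a1*jw + a0
    c1 = A / np.cos(A * y0) ** 2              # c1 = f'(y0) for f(y) = tan(A y)
    return 1.0 / (a1 * 1j * w + a0 + c1)

def K2(w1, w2, y0):
    # Eq. (6.7): K2 = -c2(y0) K1(w1) K1(w2) K1(w1 + w2)
    c2 = (A ** 2) * np.sin(A * y0) / np.cos(A * y0) ** 3   # c2 = f''(y0)/2
    return -c2 * K1(w1, y0) * K1(w2, y0) * K1(w1 + w2, y0)

# Larger y0 increases c1, lowering the gain |K1| and broadening the
# bandwidth -- the compressive behavior noted in the text.
for y0 in (0.1, 0.4):
    print(y0, np.abs(K1(w, y0)).max(), np.abs(K2(w[10], w[20], y0)))
```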
6.1.3 Auditory Nerve Fibers
As an initial example from the auditory system, we consider the second-order Wiener models obtained by Ted Lewis and his associates from high-frequency axons of the basilar papilla in auditory neurons of the American bullfrog (Rana catesbeiana) [Yamada et al., 1997; Yamada & Lewis, 1999; Lewis et al., 2002a, b]. These auditory units do not phase-lock to stimuli within an octave of their characteristic frequencies (i.e., their best excitatory frequencies), which are above 1000 Hz. When presented with a short segment (100 ms) of band-limited noise that covers one octave above and below their characteristic frequency (CF), the first-order Wiener kernel estimates obtained through cross-correlation are weak and unreliable. However, the obtained second-order Wiener kernel estimates are strong and reveal a consistent structure, illustrated in Figure 6.13 for two experimental preparations. Singular-value decomposition of this second-order Wiener kernel shows that two singular values are dominant and their respective singular vectors exhibit a quadrature relation (i.e., they have the same waveform but shifted by 90°), as shown in Figure 6.14 for the two experimental preparations, along with the singular vectors for the subsequent six singular values. It is evident from Figure 6.14 that no discernible information resides in the singular vectors below the fourth one. However, the third and fourth singular vectors contain some interesting features, albeit weak, that are also evident in the frequency domain shown in Figure 6.15 (i.e., low gain at CF, suggestive of nonlinear spectral lateral interaction).

Figure 6.13 Second-order Wiener kernels from high-frequency axons of the basilar papilla in the bullfrog. A: The stimulus noise was flat from approximately 100 Hz to 5.0 kHz, with an amplitude of 90 dB SPL measured over a 95 Hz bandwidth. The kernels were estimated through reverse correlation over the 15 ms immediately preceding each spike (producing a 300 × 300 matrix of values). The grey scale shows a 200 × 200 region of the original kernel, focusing on the significant area. B: The sixteen highest-ranking singular vectors in the singular-value decomposition of the original (300 × 300) Wiener kernel of panel A. Open circles denote singular vectors that make negative (inhibitory or suppressive) contributions to the predicted spike rate; crosses denote singular vectors that make positive (excitatory) contributions. C: Second-order Wiener kernel from another unit at a lower intensity of 76 dB SPL. D: Reduced kernel of panel C, reconstructed from only the two highest-ranking singular vectors (the highest-ranking quadrature pair). E: Reduced kernel of panel C, reconstructed from the six highest-ranking singular vectors. F: The sixteen highest-ranking singular vectors in the singular-value decomposition of the Wiener kernel of panel C [Yamada & Lewis, 1999].
We observe that the phasic relation of the first pair of singular vectors relative to the second pair (the four vectors can be viewed as the four PDMs of this system) is such that the second pair provides a form of spectral "lateral inhibition" (akin to the spatial lateral inhibition long studied in the visual system), suitable for enhancing the ability of the system to perceive and distinguish single tones in the presence of ambient noise or interference (a form of "spectral contrast enhancement").

Figure 6.14 Singular vectors for the two Wiener kernels in Figure 6.13 (A and C). The singular vectors were computed for the entire 300 × 300 matrix of each kernel, and each was scaled by its associated weighting factor. The 10 ms segments shown here correspond to the time frames shown in Figure 6.13; each segment is a 200-element vector. Panels A through D are the segments for the Wiener kernel in Figure 6.13A; panels E through H are those for the Wiener kernel in Figure 6.13C. A, E: First-ranking singular vector (dark line) and second-ranking singular vector (grey line). B, F: Third-ranking singular vector (dark line) and fourth-ranking singular vector (grey line). C, G: Fifth-ranking singular vector (dark line) and sixth-ranking singular vector (grey line). D, H: Seventh-ranking singular vector (dark line) and eighth-ranking singular vector (grey line) [Yamada & Lewis, 1999].
The quadrature relation between the two highest-ranking singular vectors suggests an envelope-detection mechanism following a band-pass filtering operation centered at CF. The Lewis model is consistent with the van Dijk sandwich (L-N-M cascade) model of the same system [van Dijk et al., 1994, 1997a, b], which conforms with the prevailing biophysical view of the function of an auditory unit with a single hair cell; i.e., the first (band-pass) filter represents the mechanical tuning structure peripheral to the hair cell, the static (rectifying) nonlinearity represents the hair-cell transduction process, and the second (low-pass) filter represents signal transformations through the afferent synapse and the primary afferent axon to the spike trigger.

Figure 6.15 Discrete Fourier transforms of the complete (300-element) versions of the corresponding singular vectors in Figure 6.14 [Yamada & Lewis, 1999].
Note that the singular-value decomposition of an amphibian papilla unit yielded only one significant singular value, whose singular vector is a scaled version of the first-order Wiener kernel [Lewis et al., 2002a], as shown in Figure 6.16. This implies that the L-N cascade model applies in this case. Finally, Lewis' group reported two significant singular values for the second-order Wiener kernel of a midfrequency cochlear unit of a Mongolian gerbil, corresponding to a quadrature pair of singular vectors, where the strongest singular vector scales to the obtained first-order Wiener kernel of this system [Lewis et al., 2002b]. The latter can be viewed as the "excitatory pathway" and the other singular vector (lagging by 90°) can be viewed as the "inhibitory pathway" [Lewis et al., 2002b].
306
SELECTED APPLICATIONS
(8) Rank 1 Approx.
(A) K2 25 50
20 15
15 0 10
10
r:;J
50
•
0
5
5
•
-50 10
10
20
-50
20
(0) FFT(k1) end FFT(u1)
(C) WK1 (0) .vs. u1(-) 0.4
-0.4'
o
, 5
10
15
20
2
10
Figure 6.16 Second-order kernel of amphibian Papilla (top left) and its only significant singular vector (bottorn left) which is identical to its first-order kernei, shown also in the frequency domain (bottom right). The top right panel shows the second-order kernel reconstruction based on the singular vector [Lewis et al., 2002a] .
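The singular-value analysis used throughout this section can be reproduced in a few lines: decompose the second-order kernel matrix and test whether two leading singular vectors form a quadrature pair by comparing one with the 90°-shifted (Hilbert-transformed) version of the other. A sketch, assuming a symmetric kernel estimate k2 is given as an array (the scoring convention here is ours, not from the cited papers):

```python
import numpy as np
from scipy.signal import hilbert

def kernel_svd(k2, n_keep=4):
    """SVD of a (symmetric) second-order Wiener kernel matrix.
    Returns the leading singular values and singular vectors."""
    U, s, Vt = np.linalg.svd(k2)
    return s[:n_keep], U[:, :n_keep]

def quadrature_score(u1, u2):
    """Two vectors form a quadrature pair if u2 resembles the Hilbert
    transform (90-degree shift) of u1; the score is the |cosine|
    between u2 and that shifted version (close to 1 for a pair)."""
    u1_shifted = np.imag(hilbert(u1))
    return np.abs(u1_shifted @ u2) / (np.linalg.norm(u1_shifted) * np.linalg.norm(u2))

# Example usage with a kernel estimate:
# s, U = kernel_svd(k2_estimate)
# print(quadrature_score(U[:, 0], U[:, 1]))
```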
The observed quadrature relation between the singular vectors of nonphase-locking units is viewed as a confirmation of the anticipated fact that envelope detection is the critical operation for these units with high CF, since this is the only practical way to encode phase information of the stimulus for these high-frequency units. For units with lower CF, the phase-locking characteristics are also encoded by the first-order kernel. We must point out that these observations do not incorporate the effects of nonlinear feedback in the cochlea, as expressed in the changes observed in first-order Wiener kernel estimates for varying stimulus power, reported by Moller (1978) and, among others, by de Boer and Nuttall (1997), as well as by Recio et al. (1997), for basilar-membrane responses to noise and broadband stimuli, consistent with our analysis in Section 4.1.5 and shown in Figures 6.17 and 6.18. It is important to note that the two dominant singular vectors observed in auditory units can be viewed as the "principal dynamic modes" of this system, and their quadrature relation depicts the critical role of the second-order nonlinearities in maintaining the phase sensitivity of high-frequency sensors. Note, however, that the conventional "tuning curve" cannot be equated with the minus log of the FFT magnitude of the first-order kernel, since higher-order kernels also contribute to the tuning curve.

Figure 6.17 Top panels show the input-output cross-correlation function (ccf) at a low level of stimulation (attenuation 80 dB). Left panel (a): waveform of the ccf, shown on an arbitrary amplitude scale. Right panel (b): spectrum of the ccf, shown on a 50 dB scale and normalized with its peak at 0 dB. Bottom panels show the input-output cross-correlation function at a 60 dB higher level of stimulation (attenuation 20 dB), measured in the same animal [de Boer & Nuttall, 1997].
We should also note the pioneering efforts of Aage Moller, who obtained first-order Wiener kernel estimates for various auditory neurons with different characteristic frequencies [Moller, 1973, 1975, 1976, 1977, 1978, 1983] and demonstrated the effects of increasing power of stimulation (down-shift in resonant frequency and broadening of the bandwidth), consistent with Figure 6.17 and our analysis of negative compressive feedback (see Section 4.1.5), as well as the extensive work of Peter Johannesma and his associates Aertsen and Eggermont on spectrotemporal analysis (see Section 7.4) of the auditory system [Aertsen & Johannesma, 1981; Eggermont et al., 1983]. As we move to the modeling study of a spider mechanoreceptor in the following section, we should be reminded that the hair cells in the cochlea are also mechanoreceptors, allowing a comparison between vertebrate and invertebrate mechanoreceptors.

Figure 6.18 First-order Wiener kernels (left column) obtained for different noise levels and their magnitude and phase spectra (right column). Observe the broadening of the resonant peak and the down-shift of CF as the stimulation increases [Recio et al., 1997].
6.1.4 Spider Mechanoreceptor
Transduction and detection of mechanical stimuli take place in living systems for both somatosensory and regulatory purposes. This task is performed by mechanoreceptors that operate with sufficient gain over a broad range of stimuli in order to detect small stimuli with adequate sensitivity and also accommodate naturally occurring stimuli of high intensity. In addition to adapting their gains to the prevailing stimulus conditions, mechanoreceptors must have sufficiently rapid response times for the intended task. The detailed dynamic behavior of mechanoreceptors, and the specific molecular mechanisms involved, are not yet fully understood, mainly because of the small physical size of most mechanoreceptors and their intrinsic dynamic nonlinearities.
As an illustrative example, we consider a spider mechanoreceptor (the cuticular receptor in the slit-sense organ of the spider Cupiennius salei) tested with pseudorandom quasiwhite mechanical strain stimuli caused by forced displacement of the slit, recording three distinct outputs: (1) the induced intracellular current (under voltage-clamped conditions), (2) the transmembrane potential (under current-clamped and spike-suppressing conditions), and (3) the extracellular action potentials. In all cases, the experimental preparation proved to be very stable and the Volterra kernel estimates obtained using the Laguerre expansion technique (LET) were consistent from preparation to preparation [Marmarelis et al., 1999a]. Details of the experimental preparation are given in Marmarelis et al. (1999a). Here, we simply point out that the applied Gaussian quasiwhite mechanical strain stimulus has a bandwidth of approximately 400 Hz and that the input-output data were sampled at 1 ms. The responses to 10 repetitions of the same quasiwhite stimulus segment were averaged to improve the SNR of the output data (in the first two cases of continuous output data, but not in the case of output action potentials). Sufficient time was allowed between recordings to avoid transient effects, since a nonzero mean level of the stimulus was used.
Experiments were performed for two different mean levels (with the same bandwidth and power level of pseudorandom stimulation in both cases). Although long datasets can be obtained due to the stability of the preparation, only 4,000 data points were used in this analysis for each kernel estimation via LET. The first-order and second-order Volterra kernel estimates for the intracellular current response are shown in Figures 6.19 and 6.20 for the two mean stimulus levels. Indicatively, we note that the normalized mean-square errors (NMSE) of the model prediction are about 34% for the first-order and about 8% for the second-order model, when evaluated for a segment of data not used for the estimation of the kernels (out-of-sample prediction). The results demonstrate the inadequacy of the first-order (linear) model and the good prediction obtained by the second-order model.

Figure 6.19 The obtained first-order kernel estimates of the mechanoreceptor using LET on the intracellular current response data (under voltage-clamped conditions) in the time (top) and frequency (bottom) domains, for two different mean displacement stimulus levels (solid: 29.02 μm; dashed: 31.90 μm). The peak kernel value for the higher mean displacement level is about 40% larger, but the waveforms of the two kernels are similar, exhibiting mildly low-pass characteristics. The ordinate axis units of the first-order kernel are nA/(μm · ms) [Marmarelis et al., 1999a].

Figure 6.20 The obtained second-order kernel estimates of the mechanoreceptor in the time domain for the two data sets discussed in Figure 6.19. The peak-to-peak kernel value for the higher mean displacement (bottom panel) is about 25% larger, but the waveforms of the two kernels are similar. The ordinate axis units of the second-order kernel are nA/(μm · ms)² [Marmarelis et al., 1999a].
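The quoted out-of-sample NMSE figures follow the usual normalization of residual power by output power; one common form is sketched below (y is the recorded output and y_hat the model prediction on a segment not used for estimation; whether the output is demeaned before normalization is a convention that varies):

```python
import numpy as np

def nmse(y, y_hat):
    """Normalized mean-square error of a model prediction:
    residual power divided by the power of the demeaned output."""
    resid = y - y_hat
    return np.sum(resid ** 2) / np.sum((y - np.mean(y)) ** 2)

# Example (hypothetical arrays):
# nmse(y_test, y_pred_1st)  -> about 0.34 for the first-order model
# nmse(y_test, y_pred_2nd)  -> about 0.08 for the second-order model
```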
Next, we compute the PDMs (principal dynamic modes) of this system using the presented first-order and second-order kernel estimates. Only two significant PDMs are found in this system; they are shown in Figure 6.21 for the aforementioned two recordings at low and high mean displacement levels of stimulation.

Figure 6.21 (a) The two principal dynamic modes (PDMs) of the mechanoreceptor in the time domain, computed from the intracellular current kernels of Figures 6.19 and 6.20. The PDMs for the two mean displacement levels (low: top; high: bottom) are rather similar. The waveform of the first PDM (solid) is similar to the first-order kernel (with reverse polarity), exhibiting mild low-pass characteristics, whereas the second PDM (dashed) exhibits strong high-pass characteristics, as illustrated in (b), where the FFT magnitudes of the two PDMs are shown. The high-pass characteristic of the second PDM is evident, but it resides entirely in the second-order kernel. The corresponding eigenvalues are both negative and indicate that the contribution of the first PDM to the current response is much larger than the contribution of the second PDM [Marmarelis et al., 1999a].
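One common construction for extracting PDMs from first- and second-order kernel estimates is the eigendecomposition of a composite kernel matrix; the sketch below follows that general recipe, but the exact matrix assembly and mode-selection criteria vary, so it should be read as an assumption-laden illustration rather than the exact procedure of [Marmarelis et al., 1999a].

```python
import numpy as np

def compute_pdms(k0, k1, k2, n_pdms=2):
    """Eigendecomposition-based PDM extraction: assemble the
    (M+1)x(M+1) symmetric matrix Q = [[k0, k1'/2], [k1/2, k2]]
    and keep the eigenvectors of the largest-|eigenvalue| modes
    (one common construction; conventions vary). Set k0 = 0.0
    if no zeroth-order kernel estimate is available."""
    M = len(k1)
    Q = np.zeros((M + 1, M + 1))
    Q[0, 0] = k0
    Q[0, 1:] = k1 / 2.0
    Q[1:, 0] = k1 / 2.0
    Q[1:, 1:] = k2
    evals, evecs = np.linalg.eigh(Q)          # Q is symmetric
    order = np.argsort(-np.abs(evals))        # rank modes by |eigenvalue|
    pdms = evecs[1:, order[:n_pdms]]          # drop the constant (k0) row
    return evals[order[:n_pdms]], pdms
```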
The consistency in the form of these PDMs across different recordings is remarkable. The corresponding eigenvalues for the first (λ1) and second (λ2) PDMs are λ1 = -0.0317 and λ2 = -0.0046, and λ1 = -0.0632 and λ2 = -0.0078 for the two recordings, quantifying the relative contributions of the two PDMs to the system output power. The FFT magnitudes in Figure 6.21(b) demonstrate the high-pass characteristics of the second PDM (evident in the second-order kernel) and the mildly low-pass characteristics of the first PDM, which resembles the first-order kernel. This result also demonstrates that the high-pass characteristics of the mechanoreceptor reside in the second-order kernel and are strictly nonlinear. We must note, however, that the observed high-pass characteristic is due in part to the artificial numerical representation of an action potential by an impulse function (instead of its actual waveform). When the actual waveform of the action potential is taken into account, the high-pass PDM is attenuated to some extent at the highest frequencies.

Figure 6.22 The static nonlinearity associated with the two PDMs shown in Figure 6.21 for the low mean displacement level. The ramp-threshold characteristic with respect to u1 is altered when u2 becomes negative [Marmarelis et al., 1999a].
al waveform). When the actual waveform ofthe action potential is taken into account, the high-pass PDM is attenuated to some extent at the highest frequencies. The nonlinearity associated with these two PDMs is shown in Figure 6.22 for the low-mean displacement level. It is evident from the plot that the nonlinear dependence of the system output (intracellular current) upon the first PDM output, Ub follows a ramp-threshold characteristic whose critical threshold value remains fairly close to zero for positive values ofthe second PDM output, U2' but rapidly decreases for negative values of U2' This nonlinear surface is concave (because of the negative eigenvalues) reflecting the fact that the differential change of intracellular current in response to an increase of displacement forcing is negative. Note also that the slope of the ramp-threshold curve with respect to UI decreases with decreasing U2, especially for negative values of U2' The form of this nonlinearity indicates directionally selective behavior of this mechanoreceptor, since the response characteristics with respect to UI (magnitude of displacement) are distinct1y different for negative and positive values of U2 (velocity of displacement). Similar observations apply to the nonlinearity for the highmean displacement level [Marmarelis et al., 1999a]. This interesting nonlinearity and the associated two PDMs constitute a complete nonlinear model for the system defined by the mechanical displacement input and the intracellular current output (under voltage-clamped conditions). The results for the transmembrane potential data collected under current-clamped conditions (and spike suppression with TTX) are also very consistent in form across recordings. Two representative first-order kemels obtained for the same stimuli used in the two previously presented current recordings are shown in Figure 6.23 in the time and frequen-
domains. These first-order kernels look like low-pass filtered versions of their intracellular current counterparts of Figure 6.19 (due to the transmembrane capacitance), but their size relation is more exaggerated in favor of the higher mean displacement level case (i.e., the peak value of the corresponding kernel is about double, compared to about 40% larger for the current output case). The corresponding second-order kernels are shown in the time domain in Figure 6.24. The consistency in the form of these kernels is again evident, and the size relation is similar to their first-order kernel counterparts of Figure 6.23.
The first-order and second-order model predictions are shown in Figure 6.25 along with the actual output for a segment of data not used for the estimation of the kernels (out-of-sample prediction). The significant contribution of the second-order kernel is again evident (NMSE for second-order prediction is 25.9% versus 60.2% for the first-order prediction), demonstrating the inadequacy of the linear (first-order) model and the presence of significant second-order nonlinearities.
Computation of the PDMs from these kernel estimates again yields only two significant PDMs for the case of output potential, shown in Figure 6.26 for the same two recordings. We observe that the first PDMs are now more low-pass than their current-data counterparts of Figure 6.21, resembling the first-order kernel waveforms, whereas the second PDMs remain distinctly high-pass and notably similar in waveform (although of reverse polarity) to the second PDMs of the previously analyzed current data. The corresponding eigenvalues for the first (λ1) and second (λ2) PDMs are λ1 = 2.877 and λ2 = 0.145, and λ1 = 2.649 and λ2 = 0.247 for the two mean levels of stimulation, indicating the relative contributions of the two PDMs to the system output power (the first is about one order of magnitude larger). These PDMs are shown in the frequency domain (FFT magnitude) in Figure 6.26(b). Clearly, the first PDM dominates for low frequencies up to about 160 Hz, and the second PDM is dominant above that frequency. Thus, for the transmembrane potential data, the two PDMs seem to divide the operational bandwidth at about 160 Hz. It should be noted again that part of the high-pass characteristic is due to the artificial representation of an action potential with an impulse function.
The nonlinearities associated with these two PDMs for the two mean displacement levels are shown in Figure 6.27 and exhibit convex morphology, reflecting the fact that the differential change of transmembrane potential in response to an increase of displacement forcing is positive (this is also the reason for the positive eigenvalues). The form of the nonlinear surface changes slightly with mean displacement level (in addition to the obvious and anticipated change in elevation), and exhibits less asymmetry with respect to the output of the second PDM than in the previous case of current output. These results demonstrate the efficacy and the utility of PDM modeling and analysis (based on Volterra models) in enhancing our understanding of the nonlinear dynamic behavior of the mechanoreceptor system.
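For readers who wish to experiment with this step, the following is a minimal sketch of how candidate PDMs can be extracted from first- and second-order kernel estimates by eigendecomposition of an augmented kernel matrix. The border scaling and the handling of the constant term are conventions that vary across implementations, so this illustrates the idea rather than reproducing the exact algorithm used for the results above:

    import numpy as np

    def principal_dynamic_modes(k1, k2, n_modes=2):
        """Extract candidate PDMs from kernel estimates k1 (length M) and
        k2 (M-by-M, symmetric) via eigendecomposition of an augmented matrix."""
        M = len(k1)
        Q = np.zeros((M + 1, M + 1))
        Q[0, 1:] = Q[1:, 0] = 0.5 * k1   # first-order kernel on the border
        Q[1:, 1:] = k2                   # second-order kernel in the main block
        eigvals, eigvecs = np.linalg.eigh(Q)
        order = np.argsort(-np.abs(eigvals))      # rank modes by |eigenvalue|
        pdms = eigvecs[1:, order[:n_modes]]       # drop the constant-term entry
        return eigvals[order[:n_modes]], pdms

The signs and relative magnitudes of the retained eigenvalues quantify each mode's contribution to the output power, which is how the eigenvalues quoted above should be read.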
The form of the nonlinearity in the current data resembles threshold behavior akin to half-wave rectification (with a negative slope) for positive values of the second PDM (u2 > 0); i.e., a positive displacement u1 (relative to the mean) elicits negative current for positive displacement velocity u2, but for negative displacement velocity the aforementioned displacement threshold is reduced and the slope of the suprathreshold response is also reduced (i.e., negative current will continue flowing for negative velocity, even when the displacement becomes negative, but the sensitivity will be lower). This response behavior is both position sensitive and velocity sensitive. For the potential data, the nonlinearity again exhibits rectifying behavior, having
[Figure 6.23 panels: first-order kernel estimates for potential data (time lag 0-50 ms, top) and FFT magnitude of first-order kernels for potential data (0-0.5 kHz, bottom).]
Figure 6.23 The obtained first-order kernel estimates using LET on the intracellular potential response data (under current-clamped conditions) in the time (top) and frequency (bottom) domains, for the same stimuli as in Figure 6.19. These first-order kernels look like low-pass versions of their intracellular current counterparts in Figure 6.19 (due to the transmembrane capacitance), although the size relation is more exaggerated in favor of the higher mean-displacement-level case. The ordinate axis units for the first-order kernel are mV/(μm ms) [Marmarelis et al., 1999a].
supralinear response characteristics for u1 > 0 (i.e., changing faster than linear) and a flattened response for u1 < 0. The effect of u2 is slightly asymmetric and causes increased response potential for larger displacement speed, also in supralinear fashion. Thus, this response behavior is also position sensitive and velocity sensitive, but not very directionally selective. Note that these experiments were made under clamped conditions and with the fast sodium channel suppressed by TTX, which may account for some of the differences in
[Figure 6.24 surface plots of the second-order kernels; τ1 and τ2 axes from 0 to 50 ms; vertical ranges -0.02433 to 0.1202 (top) and -0.05652 to 0.2034 (bottom).]
Figure 6.24 The obtained second-order kernel estimates in the time domain for the two data sets discussed in Figure 6.23. The waveforms are similar for the two mean displacement levels, but the size for the higher mean displacement level (bottom panel) is about double, in rough correspondence to their first-order kernel counterparts. The ordinate axis units for the second-order kernel are mV/(μm ms)². The τ1 and τ2 axes are in ms units (10 ms bar shown) [Marmarelis et al., 1999a].
the nonlinear behavior evident in the current and potential response data. Finally, we should note that, although the form of the nonlinearity for u1 > 0 appears initially to be supralinear, it gradually becomes linear (possibly an inflection point) and tends to become sublinear as u1 increases further and reaches the end of the dynamic range (i.e., sigmoidal overall shape).
The presented PDM model is more compact than its Volterra counterpart (e.g., for second-order models the numbers of free parameters are 108 and 1378, respectively). However, the Volterra model includes the dynamics represented by the less significant eigenvalues/eigenvectors that are omitted from the PDM model.
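These parameter counts can be checked with a short calculation. The sketch below counts the distinct samples of symmetric Volterra kernels; the 51-lag memory is inferred from the quoted total of 1378, and the PDM count of 108 is read as two 51-sample modes plus the coefficients of the static nonlinearity (the exact parameterization in the original study may differ slightly):

    from math import comb

    def volterra_param_count(memory, order=2):
        """Free parameters of a truncated Volterra model: the constant k0 plus
        the distinct samples of each symmetric kernel up to the given order."""
        return 1 + sum(comb(memory + q - 1, q) for q in range(1, order + 1))

    print(volterra_param_count(51))    # 1 + 51 + 51*52/2 = 1378, as quoted
    # Two PDMs of 51 samples each give 102 waveform parameters; the remaining
    # coefficients of the static nonlinearity bring the total near 108.
    print(volterra_param_count(100))   # 5151, the cross-correlation count
                                       # quoted for the model in Section 6.2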
Figure 6.25 A segment of transmembrane potential test data under current-clamped conditions (trace 1) and the Volterra model predictions of first order (trace 2) and second order (trace 3). The significant contribution of the second-order kernel to the response potential is evident. The normalized mean-square errors are 60.2% for the first-order, and 25.9% for the second-order model prediction. Note that action potentials are suppressed by use of TTX [Marmarelis et al., 1999a].
In this application, the difference in prediction mean-square error was marginal, lending support to the notion of a "minimal model" based on PDM analysis. Furthermore, the PDM model can be extended to nonlinear orders higher than second (even though limited to the selected PDMs in terms of dynamics), whereas the Volterra models cannot be practically extended into higher-order nonlinearities because of the computational burden associated with the rapid increase in the number of free parameters.
The nonlinear dynamic behavior observed in this analysis agrees well with experiments using step displacements, where positive steps (indenting the slits) caused significant inward currents, while negative steps caused much smaller reductions in inward current. This asymmetric nonlinear behavior was more pronounced in the initial dynamic responses to steps than in the late responses near the end of the step stimulus, as reflected in the obtained model by the high-pass properties of the second PDM that is primarily responsible for the nonlinear transient behavior.
The physiological system responsible for the receptor current consists of the slit cuticle between the stimulator and the dendritic sheath surrounding the neuron tip (a small, presumably fluid-filled region between the dendritic sheath and the neuronal membrane) and the mechanically activated ion channels in the neuronal membrane. The two important questions in interpreting the obtained nonlinear dynamic model are: (1) what biophysical mechanisms could correspond to the two PDMs, and (2) what is the biological basis of the nonlinearity? Although neither question can be answered with certainty at present, it appears that the two distinct PDMs, by exhibiting low-pass and high-pass characteristics, respectively, may correspond to two types of mechanically activated ion channels in the neuronal membrane that have fast (sodium) and slow (potassium) dynamics.
Figure 6.26 (a) The two PDMs in the time domain, using the transmembrane potential kernels of Figures 6.23 and 6.24. Again, the waveforms are similar for the two mean displacement levels (top: high level; bottom: low level), and the first PDMs (solid) resemble in waveform the first-order kernels. The second PDMs (dashed) are similar in waveform to their counterparts for the intracellular current data (with reverse polarity). The corresponding eigenvalues are both positive and indicate that the relative power contribution of the first PDM is about one order of magnitude larger. (b) (See next page.) The two PDMs for the transmembrane potential data in the frequency domain (i.e., FFT magnitude of the PDMs). As previously, the high-pass characteristic of the second PDM is evident. The two PDMs appear to divide the frequency response bandwidth, whereby the first PDM is dominant below 160 Hz and the second PDM is dominant above that frequency [Marmarelis et al., 1999a].
Figure 6.26 (continued)
The latter includes the possibility of a calcium-activated potassium channel. Experiments that eliminate selectively the permeant ions can be elucidating in this regard. Another factor possibly inducing nonlinear behavior is the fluid between the dendritic sheath and the neuronal membrane, which could conceivably cause nonlinear dashpot action. It is also likely that the nonlinear dynamics measured here reflect the connection of the deformation of the neuronal membrane to the molecular structures of the mechanically activated ion channels and their linkages to the cytoskeleton. Detailed models of mechanically activated channels are starting to emerge and can benefit from the quantitative nonlinear dynamic descriptions of mechanotransduction provided herein [French, 1984a, b; Sachs, 1992].
[Figure 6.27 surface plots; u1 axis from -6 to 6, u2 axis from -3 to 3; vertical ranges -0.125 to 10.568 (top) and -0.8301 to 11.475 (bottom).]
Figure 6.27 The static nonlinearities associated with the two PDMs of Figure 6.26 for high (bottom) and low (top) mean displacement levels. The axes (u1, u2) represent the two PDM outputs, and the vertical axis is the transmembrane potential response under current-clamped conditions and suppression of action potentials with TTX. The axes ranges are given at the bottom of each plot (1 mV bar shown). The convex nonlinear characteristic is evident as well as the slight asymmetry with respect to u2 [Marmarelis et al., 1999a].
6.2 CARDIOVASCULAR SYSTEM
As an illustrative example from the cardiovascular system, we select the modeling study of the nonlinear dynamics of cerebral blood flow autoregulation. The traditional concept of cerebral autoregulation refers to the ability of the cerebrovascular bed to maintain a relatively constant cerebral blood flow despite changes in cerebral perfusion pressure. Because of the high aerobic metabolic rate of cerebral tissue, the maintenance of adequate cerebral blood flow through cerebral autoregulation is critical for survival. Under normal conditions, it has been observed that a sudden drop in the arterial blood pressure level causes an initial drop in the level of cerebral blood flow that gradually returns to its previous
value within a couple of minutes, due to multiple homeostatic regulatory mechanisms that control cerebrovascular impedance over several time scales (from a few seconds to a couple of minutes) [Edvinsson & Krause, 2002; Panerai et al., 1999, 2000; Poulin et al., 1996, 1998; Zhang et al., 1998, 2000]. With the development of transcranial Doppler (TCD) ultrasonography for the noninvasive measurement of cerebral blood flow velocity in the middle cerebral artery with high temporal resolution, it has been shown that blood flow velocity can vary in response to variations of systemic arterial blood pressure over various time scales.
We consider data representing the mean arterial blood pressure (MABP) and mean cerebral blood flow velocity (MCBFV), computed as averages over each heartbeat interval (marked by the R-R peaks in the ECG), and resampled evenly every second after proper low-pass filtering to avoid aliasing [Mitsis et al., 2002; Zhang et al., 1998]. Spontaneous fluctuations in beat-to-beat MABP and MCBFV data possess broadband spectral properties that offer the opportunity to study dynamic cerebral autoregulation in humans, using the advocated nonlinear modeling methods. Impulse-response and transfer-function analysis were initially utilized to show that cerebral autoregulation is more effective in the low-frequency range (below 0.1 Hz), where most of the ABP spectral power resides. These studies have also indicated the presence of significant nonlinearities in this low-frequency range, as attested to by low coherence function measurements.
The nonlinear dynamic relationship between beat-to-beat changes in MABP and MCBFV reflects the combined effects of multiple mechanisms serving cerebral autoregulation. Since the vasculature is influenced by metabolic, endocrine, myogenic, endothelial, respiratory, and neural mechanisms, the dynamics of cerebral autoregulation are active over widely different frequency bands. Specifically, metabolic or endocrine mechanisms are active at very low frequencies and respiratory or myogenic mechanisms are active at high frequencies, whereas endothelial and autonomic neural mechanisms are found in the intermediate frequency bands. For this reason, it is incumbent on the employed modeling methodology to be able to capture reliably both fast and slow dynamics in a single processing task. To this purpose, we employ the nonlinear modeling method presented in Section 4.3 that utilizes the Laguerre-Volterra network with two filter banks (LVN-2) to model nonlinear systems with fast and slow dynamics effectively (from 0.005 to 0.5 Hz in this case).
Arterial blood pressure was measured in the finger by photoplethysmography (Finapres, Ohmeda). Cerebral blood flow velocity was measured in the middle cerebral artery using TCD (2 MHz Doppler probe, DWL Elektronische Systeme) placed over the temporal window and fixed at a constant angle and position with adjustable head gear to obtain optimal signals. End-tidal CO2 was also monitored continuously with a nasal cannula using a mass spectrometer (MGA 1100, Marquette Electronics). The analog signals of blood pressure and flow velocity were sampled simultaneously at 100 Hz and were digitized at 12 bits (Multi-Dop X2, DWL). Beat-to-beat mean values of MABP and MCBFV were calculated by integrating the waveform of the sampled signals within each cardiac cycle (R-R interval) and dividing by this interval.
The beat-to-beat values were then linearly interpolated and resampled at 1 Hz (after antialiasing low-pass filtering) to obtain equally spaced time series of MABP (input) and MCBFV (output) data for the subsequent analysis (see Figure 6.28). After high-pass filtering at 0.005 Hz to remove very slow trends in the data, 6 min input-output data segments (which correspond to 360 data points) are employed in the LVN-2 training procedure (see Section 4.3).
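A minimal sketch of this beat-to-beat averaging and 1 Hz resampling is given below. The Butterworth order and cutoff, and the choice to filter after interpolation, are illustrative assumptions (the text specifies only linear interpolation and antialiasing low-pass filtering); r_peaks stands for the R-peak times detected from the ECG:

    import numpy as np
    from scipy.signal import butter, filtfilt
    from scipy.interpolate import interp1d

    def beat_to_beat_means(t, x, r_peaks):
        """Average the sampled signal x(t) over each R-R interval, i.e.,
        integrate over the cardiac cycle and divide by the interval length."""
        centers, means = [], []
        for t0, t1 in zip(r_peaks[:-1], r_peaks[1:]):
            in_beat = (t >= t0) & (t < t1)
            centers.append(0.5 * (t0 + t1))
            means.append(x[in_beat].mean())
        return np.array(centers), np.array(means)

    def resample_1hz(centers, values, duration):
        """Linearly interpolate beat-to-beat values onto a 1 s grid, then
        low-pass filter (4th-order Butterworth, 0.2 Hz cutoff, assumed)."""
        grid = np.arange(duration)
        y = interp1d(centers, values, bounds_error=False,
                     fill_value=(values[0], values[-1]))(grid)
        b, a = butter(4, 0.2 / 0.5)  # cutoff as a fraction of the 0.5 Hz Nyquist
        return filtfilt(b, a, y)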
The structural parameters of the LVN-2 model are selected by the model-order selection criterion presented in Section 2.3.1, which ensures that we obtain an accurate model representation of the system and avoid overfitting the model to the specific data segment. Following this procedure, an LVN-2 model with L1 = L2 = 8, H = 3, and Q = 2 is selected for these data. Note that the total number of unknown parameters in this model is 57, which is extremely low compared to the conventional cross-correlation technique, which would require the estimation of 5151 values for the first-order and second-order kernels with the necessary memory of 100 lags. The achieved model parsimony is further accompanied by a significant improvement in the prediction NMSE relative to the conventional cross-correlation technique. In order to terminate properly the training procedure and avoid overtraining the network, the prediction NMSE is minimized for a 2 min forward segment of testing data (adjacent to the 6 min training data segment) [Mitsis et al., 2002].
The averages of the MABP and MCBFV data over the 2 hr recordings from each of the five subjects are 82.3 ± 10.7 mmHg and 61.7 ± 9.0 cm/s, respectively. Typical 6 min segments of MABP and MCBFV data are shown in Figure 6.28 along with their corresponding spectra. Most of the signal power lies below 0.1 Hz. The average achieved NMSEs of output prediction using first-order and second-order models are 49.1% ± 13.4% and 27.6% ± 9.5%, respectively. The reduction of the prediction NMSE from the first-order (linear) model to the second-order (nonlinear) model is significant (over 20%), confirming the fact that the dynamics of cerebral autoregulation are nonlinear. The performance of the LVN-2 model is illustrated in Figure 6.29, where we show the actual MCBFV output (top trace) along with the obtained LVN-2 model prediction (second trace), as well as its first-order and second-order components (third and fourth traces, respectively). For this specific data segment, the second-order prediction NMSE is 13%, and the first-order prediction NMSE is 34% (i.e., the NMSE reduction due to the second-order kernel is 21%).
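For reference, the prediction NMSE quoted throughout is the mean-square prediction error normalized by the power of the de-meaned output; a minimal sketch:

    import numpy as np

    def nmse_percent(y_actual, y_predicted):
        """Normalized mean-square error of a model prediction, in percent:
        mean-square residual divided by the variance of the actual output."""
        residual = y_actual - y_predicted
        return 100.0 * np.mean(residual ** 2) / np.var(y_actual)

A value of 100% thus corresponds to a prediction no better than the output mean.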
Figure 6.28 Typical MABP and MCBFV data used for LVN-2 model estimation. Top panels: time series; bottom panels: spectra after high-pass filtering at 0.005 Hz [Mitsis et al., 2002].
Figure 6.29 Typical LVN-2 model prediction of MCBFV output (see text) [Mitsis et al., 2002].
We must note that the contribution of the second-order kernel (nonlinear term) to the output prediction NMSE demonstrated considerable variability among data segments (as small as 8% and as large as 62%). This variability was also reflected in the form of the second-order kernel estimates obtained for different segments and/or subjects. This finding suggests either nonstationary behavior in the nonlinearity of the system or the presence of intermodulatory (nonlinear) influences of other exogenous variables (e.g., changes in arterial CO2 tension or hormonal fluctuations).
The relative contributions of the linear and nonlinear terms of the model are also illustrated in Figure 6.30 for the same set of data in the frequency domain, where the output spectrum and the spectra of the first-order and second-order residuals (output prediction errors) are shown. The shaded area corresponds to the difference between the first-order and second-order residuals in the frequency domain, indicating that the nonlinearities are found below 0.1 Hz and are more pronounced below 0.04 Hz. This observation is consistent with previous findings based on the estimated coherence function.
The fast Fourier transform (FFT) magnitudes of the first-order kernel and its two components (fast and slow) are shown in Figure 6.31 in log-log scale. The fast component has a high-pass (differentiating) characteristic with a peak around 0.2 Hz and a "shoulder" around 0.075 Hz, whereas the slow component exhibits a peak around 0.03 Hz and a trough around 0.01 Hz. The total first-order frequency response indicates that cerebral autoregulation attenuates the effects of MABP changes on MCBFV within the frequency range where most of the ABP signal power resides, as expected.
The second-order kernel (describing the nonlinear dynamics of the system) is shown in Figure 6.32 for the same data segment, along with its corresponding frequency-domain representation (defined as the magnitude of the two-dimensional FFT of the second-order kernel). The frequency-domain peak of the latter is located on the diagonal and is related to the corresponding first-order frequency response peak (for this specific segment) at 0.03 Hz. Note that the off-diagonal peak at the bifrequency location (0.03, 0.01 Hz) implies nonlinear intermodulatory interactions between the mechanisms residing at the respective frequencies, whereas the diagonal peak at the bifrequency location (0.03, 0.03 Hz) implies nonlinearity of the single mechanism residing around 0.03 Hz.
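The bifrequency representation referred to here is straightforward to compute from a sampled kernel estimate; a minimal sketch, assuming a kernel sampled at 1 Hz and an illustrative FFT size:

    import numpy as np

    def bifrequency_magnitude(k2, fs=1.0, nfft=256):
        """Magnitude of the two-dimensional FFT of a second-order kernel
        k2[m1, m2]; only the non-negative frequency quadrant is returned."""
        K2 = np.fft.fft2(k2, s=(nfft, nfft))
        freqs = np.fft.fftfreq(nfft, d=1.0 / fs)
        half = nfft // 2
        return freqs[:half], np.abs(K2[:half, :half])

Peaks on the diagonal (f1 = f2) then indicate nonlinearity of a single mechanism, while off-diagonal peaks indicate intermodulation between mechanisms residing at the two frequencies, as discussed above.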
Figure 6.30 Spectra of the output (MCBFV) and of the first-order and second-order model residuals. The shaded area shows the effect of the nonlinear term in the frequency domain [Mitsis et al., 2002].
Secondary peaks are also discernible at lower off-diagonal bifrequency points. Note that the locations of the various bifrequency peaks vary over time but stay within certain bounded regions from segment to segment (e.g., the 0.03 Hz peak stays within the 0.02-0.04 Hz region). The nonstationarity of the system dynamics (i.e., the varying
Figure 6.31 FFT magnitude of the first-order kernel of cerebral autoregulation, and its fast and slow components in log-log scale. Solid line: total; dotted line: fast component; dashed line: slow component. A trough around 0.01 Hz and peaks around 0.03 Hz and 0.20 Hz are evident [Mitsis et al., 2002].
Figure 6.32 The second-order kernel of cerebral autoregulation for the data of Figure 6.28. Upper panel: time domain; lower panel: frequency domain [Mitsis et al., 2002].
locations of the spectral peaks and their respective strengths) can be tracked for successive overlapping data segments [Mitsis et al., 2002]. The observed variations over time may be the result of nonstationary modulation of the vascular impedance by the autonomic nervous system (sympathetic and parasympathetic activity) and other factors, such as endothelial, endocrine, or metabolic mechanisms that affect the vascular impedance (typically over longer time scales). Nonetheless, the time variations of the first-order kernel estimates are limited, as demonstrated in Figure 6.33, where the averaged first-order kernels over 20 successive 6 min segments of data for five different subjects are shown in log-linear scales along with standard deviation bounds.
In order to facilitate the interpretation of the obtained nonlinear model, we resort to PDM analysis. The key structural parameter is the number of hidden units (H), which is also the number of "principal dynamic modes" (PDMs) and was found to be equal to 3 in all cases. The LVN-2 model with structural parameters (L = 8, H = 3, Q = 2) was used to analyze data from human subjects (provided by our collaborators Dr. B. Levine and Dr. R. Zhang from the University of Texas in Dallas).
Figure 6.33 Average first-order kernels over 20 successive 6 min segments for five different subjects (solid lines) and corresponding standard deviation bounds (dashed lines) [Mitsis et al., 2002].
A typical PDM model of a normal subject is shown in Figure 6.34, exhibiting resonant peaks that can be related to specific physiological mechanisms. In order to determine this correspondence, we resort to pharmacological manipulations whereby data are analyzed before and after the subjects are treated with various medications (e.g., trimethaphan, phenylephrine, L-NMMA, etc.) that selectively affect different physiological mechanisms. For instance, MABP and MCBFV data before and after the administration of trimethaphan (which blocks the ganglia of both sympathetic and parasympathetic nerves) were used to explore quantitatively the effect of the autonomic nervous system on cerebral autoregulation in the aforementioned nonlinear dynamic context. The average value and dynamic range of MABP and MCBFV typically drop after the injection of trimethaphan, and the effect of respiration is visually evident, as illustrated in Figure 6.35 for a normal subject. It is also evident that there are many occasions of flow fluctuations (typically deep depressions) with no apparent causal link to preceding changes in pressure, possibly caused by other factors (e.g., the effect of blood gases is examined in Section 7.2.4 for CO2 tension in the context of two-input modeling).
The resulting third-order PDM models reveal intriguing information about the underlying mechanisms, as illustrated in Figure 6.36, which shows the results for a normal human subject after injection of trimethaphan. It is evident by comparing Figure 6.34 with Figure 6.36 that the first and third PDMs are affected drastically by the trimethaphan as a result of the ganglionic blockade. The second PDM is not affected significantly and is surmised to represent
Figure 6.34 PDM model of cerebral autoregulation for a normal subject, where the PDMs are plotted in the frequency domain (magnitude only). Three PDMs are consistently identified that exhibit admittance peaks within certain frequency bands. Based on existing knowledge regarding the action of the sympathetic and parasympathetic nerves, we posit that the first PDM with peaks marked L1s, M1s, H1s corresponds primarily to sympathetic effects, and the third PDM with peaks marked L1p, M1p, H1p corresponds primarily to parasympathetic effects. The second PDM contains the effects of local endothelial mechanisms (NO and CO2) and respiratory coupling (RSA).
Figure 6.35 MABP and MCBFV data from a normal subject injected with trimethaphan.
Figure 6.36 Third-order PDM model of cerebral autoregulation for a normal human subject (PDMs plotted in the frequency domain) after the injection of trimethaphan. Observe the changes relative to Fig. 6.34, especially in the first (sympathetic) and third (parasympathetic) PDMs (see text).
mechanisms that are not related to the autonomic nervous system (e.g., endothelial, myogenic, metabolic, and hormonal factors, as well as effects related to the respiratory cycle).
Specifically, we interpret the effects of trimethaphan on the first PDM of Figure 6.36 to indicate the connection of the sympathetic nerve with the PDM peaks at the mid-frequency and high-frequency ranges (marked as M1s and H1s, respectively, in Fig. 6.34) that disappear after the injection of trimethaphan. The disappearance of these two peaks can be explained by the removal of the vasoconstrictive effect of sympathetic activity and can be viewed as a decrease of impedance at the respective frequency bands (because of the negative trend of the nonlinearity). However, the PDM peak at the low-frequency range (marked as L1s in Fig. 6.34) is enhanced, corresponding to an increase in vascular impedance because of the negativity of the associated nonlinearity.
One possible explanation of the broadband change in PDM values is that the reduction in average blood pressure resulting from the removal of the sympathetic cardiac excitation (i.e., reduction of cardiac output) is followed by a more than proportional reduction in average blood flow, because of the viscoelastic properties of the vascular wall (i.e., the vessel is not a firm, passive tube but exhibits viscoelastic and active characteristics). The increase of apparent vascular impedance that is evident in the very low frequencies of the first PDM (peak around 0.01 Hz, marked as L1s) may be due to the removal of the modulatory effect of the sympathetic activity on the humoral mechanism that is likely responsible for this resonant peak (a cyclical phenomenon with approximately 2 min period). For instance, a change in the hormonal or endothelial modulation of the way the sympathetic innervation of the vascular wall acts to control its viscoelastic properties may cause such an effect. Note that this peak is also evident in the other two PDMs but contributes differently to the flow output because of the different nonlinearities associated with the different PDMs. The reason we posit that the first PDM is related largely to the sympathetic activity is the negative contribution of this PDM to the flow output (i.e., the contribution of
the first PDM amounts to a decrease in MCBFV for an increase in MABP), consistent with the anticipated vasoconstrictive action of the sympathetic nerve. The magnitude of the negative contribution decreases after injection of trimethaphan.
The third PDM also exhibits peaks at the same frequency bands as the first PDM (marked as H1p, M1p, and L1p in Fig. 6.34), and they are ascribed to the activity of the parasympathetic nerve due to their positive contribution to the flow output (positivity of the associated nonlinearity) in the normal subject prior to the injection of trimethaphan. The positive sign of the output contribution of the third PDM implies that the peaks now correspond to admittance peaks (in contrast to the impedance peaks of the first PDM, owing to the negative sign of the respective nonlinearity). Therefore, the disappearance of the peaks M1p and H1p after the injection of trimethaphan means that the apparent impedance is increased at the respective frequency bands. The impedance is also increased at low frequencies because the L1p peak is decreased and the values of the associated nonlinearity are decreased after the trimethaphan injection. The latter change after trimethaphan signifies a considerable increase of impedance at the very low frequencies. The removal of the parasympathetic cardiac stimulation (after ganglionic blockade by trimethaphan) should counterbalance to some extent the respective removal of sympathetic cardiac stimulation. Therefore, we are led to a working hypothesis of a facilitative modulatory effect of the parasympathetic nerve on the humoral or endothelial mechanism likely responsible for the peak L1p ("facilitative" implies reduction of vascular impedance). This can be explored in a two-input context (see Section 7.2.4).
Finally, the main features of the second PDM (peaks marked as L1, L2, M2, H2 in Fig. 6.34) are only slightly altered after trimethaphan injection, with the corresponding output contribution retaining its basic trend (although reduced somewhat in size). This leads us to surmise that this PDM is not affected by the ganglionic blockade of the sympathetic and parasympathetic nerves. Therefore, the observed peaks must be due to other "local" mechanisms that reside in the endothelium (e.g., the effect of nitric oxide or CO2 tension in decreasing local vascular impedance that may relate to the L2 and/or M2 peaks) and hormonal effects that are distributed throughout the vasculature (L1 peak). The H2 peak is likely related to the effect of the respiratory cycle (marked as RSA) on the pulmonary vasculature that is not affected by the ganglionic blockade.
A third-order model of this system satisfies the MOS criterion of Section 2.3.1 part of the time (e.g., after trimethaphan injection), but not always. Nonetheless, we may consider a third-order model (Q = 3) as the structurally correct choice for cerebral autoregulation, because of the long-held view that the "steady-state" relation between pressure values and flow has an inflection point (i.e., the "steady-state" flow values exhibit a plateau for middle values of "steady-state" pressure, but the flow values are rapidly rising for very large values or declining for very small values of pressure). The existence of an inflection point necessitates the use of (at least) a third-order model. Another key structural parameter is the size of the DLF filter bank (L), which is set at L = 8 since it was found to range between 6 and 8 for various subjects and data segments.
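To make the inflection-point argument concrete, consider the static (steady-state) polynomial relation implied by a model of order Q = 3:

    y(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3,    y''(x) = 2 c_2 + 6 c_3 x

For a second-order model (c_3 = 0), the curvature y'' = 2 c_2 has a fixed sign, so the steady-state curve is uniformly convex or concave and cannot exhibit a plateau flanked by a rising and a declining branch. Only when c_3 is nonzero can y'' change sign, at the inflection point x* = -c_2/(3 c_3), which is why a third-order term is the minimum requirement.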
The particular subject matter of cerebral autoregulation is further elucidated in Chapter 7 (Section 7.2.4) by an additional example of PDM analysis with two inputs (end-tidal CO2 in addition to arterial blood pressure). This analysis sheds additional light on the aforementioned postulated hypotheses by delineating the effects of CO2 on cerebral flow autoregulation (confirming some of the interpretations presented above).
Additional insight is gained by the PDM modeling of data from another experiment
whereby various levels of orthostatic stress are applied on normal subjects by means of "lower-body negative pressure" (LBNP). Some initial results are presented below to corroborate further the efficacy of PDM analysis. The application of LBNP emulates changes in orthostatic stress and allows the controlled study of cerebral autoregulation under such conditions (due to its clinical importance for elderly subjects). The obtained first-order Volterra kernels from 6 min data records are shown in Figure 6.37 for different levels of LBNP. It is evident that an increase of orthostatic stress raises the admittance gain at lower frequencies initially (i.e., at -30 mmHg) and, for greater LBNP (i.e., at -50 mmHg), across all frequencies. The overall reduction of impedance is about five-fold.
In order to achieve a more refined level of understanding of these effects, we employ again the equivalent PDM models for the baseline data and the two LBNP levels, shown in Figure 6.38. We observe decreased gain (cerebrovascular admittance) at the higher frequencies (above 0.04 Hz) for LBNP = -30 mmHg and increased gain at the low-frequency peak around 0.015 Hz for the first PDM. The latter remains true for LBNP = -50 mmHg; however, the other two PDMs exhibit increased gain at higher frequencies (above 0.10 Hz) as well as a new resonant peak at 0.04 Hz in the second PDM. The associated nonlinearities remain about the same for LBNP = -30 mmHg except for the first one, which shows a ten-fold increase of its negative branch. For LBNP = -50 mmHg, the other two PDM branches exhibit some significant changes, most notably the convexity of the second nonlinearity that makes the second PDM contribution to the output mostly positive. This model suggests that autonomic failure under severe orthostatic stress may be caused by the increased admittance around 0.015 Hz (possibly due to cerebellar activity) when combined with the drop in average MCBFV.
The complexity of these observations and the potential insight that they may offer to our understanding of cerebral autoregulation illustrate the potential utility of the PDM model and the high promise of the advocated approach. However, this task will require considerable effort and the careful examination of the results for extensive amounts of data, especially data collected from subjects under various pharmacological treatments that selectively target relevant mechanisms (e.g., propranolol or atropine for selective sympathetic or parasympathetic blocking, respectively, L-NMMA for blocking NO receptors, modulators of blood gases, etc.). This type of data can yield useful results in the near future within the presented methodological framework in a manner that systematizes rigorously the individual effects of the various protagonists. In addition, the interpretation of the PDM model and our understanding of cerebral autoregulation can be greatly enhanced by multivariate analysis of data collected on additional relevant variables (e.g., respiratory and heart rate variability, blood gases, nitric oxide, catecholamines, etc.) when they can be measured with sufficient temporal resolution. It is understood that the multivariate type of data places a considerable (but not insurmountable) burden of measurement and will require multifactorial methods of analysis, such as the ones discussed in Chapters 7 and 10. No doubt, it represents the "cutting edge" in systems physiology and a most exciting prospect within the potential capabilities of the peer community.
This wealth of information can become available for the first time through the application of the advocated methodology and, if confirmed with sufficient amounts of data, it surely represents a quantum leap in the state of the art in this field. It is recognized, however, that the "quantum leap" nature of this advancement raises the threshold of acceptance by the peer community. Thus, many more experiments must be conducted and this
Figure 6.37 The first-order Volterra kernels of a normal subject for different levels of lower-body negative pressure (LBNP) simulating orthostatic stress. First row: baseline; second row: LBNP = -30 mmHg; third row: LBNP = -50 mmHg. Left column: time domain; right column: frequency-domain representation.
type of analysis must be successfully repeated (confirming these observations) before these postulated hypotheses can be accepted by the peer community. Far from being a burden, this is an exciting task that invites the coordinated effort of the peer community, because it promises to deliver (when successfully completed) the requisite breadth and depth of understanding of the mechanisms subserving cerebral autoregulation that is necessary for drastically improving clinical practice in this century.
Figure 6.38a PDM model of cerebral autoregulation in a normal subject for baseline data.
Figure 6.38b PDM model of cerebral autoregulation in a normal subject for LBNP = -30 mmHg.
Figure 6.38c PDM model of cerebral autoregulation in a normal subject for LBNP = -50 mmHg.
6.3 RENAL SYSTEM
Renal blood flow autoregulation (i.e., the process by which vascular hemodynamic impedance is adjusted to minimize fluctuations in renal blood flow caused by fluctuations in arterial blood pressure) is critical for maintaining fairly constant filtration rates by the kidneys [Chon et al., 1993, 1994a, b, 1998b; Marsh et al., 1990]. The two most important mechanisms subserving renal autoregulation are the myogenic mechanism and tubuloglomerular feedback (TGF).
The myogenic mechanism is vascular in nature and causes changes in blood vessel diameter and mechanical characteristics (e.g., stiffness) in response to changes in local vascular hydrostatic pressure. An increase in blood pressure causes constriction of the vascular smooth muscle that increases the vascular wall tension and limits the change in blood flow.
TGF is an intrarenal, local feedback mechanism governed by flow-rate-dependent changes in the concentration of NaCl at the macula densa. Changes in arterial pressure lead to changes in NaCl reabsorption in the renal proximal tubule that affect the flow rate of tubular fluid in subsequent segments of the nephron. Reabsorption of NaCl in the ascending limb of the loop of Henle is flow-rate dependent, whereby increases in flow rate cause the concentration of NaCl to increase, and vice versa. The NaCl concentration changes are sensed at the macula densa, which is a collection of specialized cells at the most distal point of the loop of Henle. Macula densa cells trigger a signaling chain that causes the arterioles to adjust the smooth muscle tone in order to limit variations in flow within the tubules.
Both TGF and the myogenic mechanism are triggered by changes in arterial blood pressure and act on a common vascular segment (the afferent arteriole), so that interactions between the two mechanisms are to be expected. Therefore, nonlinear
analysis (like the one advocated herein) is especially well suited for the study of such interactions. Both the myogenic mechanism and TGF generate autonomous oscillations in arteriolar diameter, which suggests that nonlinearities are functionally significant features of both mechanisms. Previous work using various linear and nonlinear methods of transfer function and coherence analysis gives ample confirmation of the importance of nonlinearities in the two components of renal autoregulation [Chon et al., 1993, 1994a, b, 1998a, b; Holstein-Rathlou et al., 1995; Marmarelis et al., 1993, 1999b], in a manner qualitatively similar to cerebral autoregulation discussed in the previous section.
The dynamic characteristics of the two mechanisms have been studied in rats using broadband arterial pressure forcing to separate the nonlinear dynamic properties of the two renal autoregulatory mechanisms [Chon et al., 1993; Marmarelis et al., 1993]. The general conclusion is that the myogenic mechanism causes decreased impedance over the frequency range from 0.1-0.2 Hz, whereas the TGF mechanism is active in the frequency range below 0.08 Hz, where increased impedance is observed. The combined action of the two mechanisms attenuates the effect of blood pressure fluctuations on renal blood flow at frequencies less than 0.1 Hz, where most of the power of the spontaneous pressure signal resides, while the opposite effect occurs in the 0.1-0.2 Hz frequency range (where very little power of spontaneous pressure fluctuations exists).
Experiments were performed on male Sprague-Dawley rats, and details of the setup are given in Marmarelis et al. (1993). Measurements of renal blood flow and arterial blood pressure were made while broadband fluctuations were induced in the arterial blood pressure by pseudorandom forcing. These were generated by a bellows pump through a cannula inserted into the distal aorta at the bifurcation. The input was derived from a constant-switching-pace symmetric random signal (CSRS), which exhibits the spectral properties of band-limited white noise up to 2 Hz (see Section 2.2.4). A unique seed was used for the random-number generator in each experiment to secure independent experimental runs. The arterial blood pressure was measured in the superior mesenteric artery with a catheter, and the renal blood flow was measured in the left renal artery with an electromagnetic flow probe. Three different power levels of forcing were used in each experiment on five different preparations, yielding a total of 15 arterial blood pressure and 15 blood flow data records. The mean and root-mean-square (RMS) values (after de-meaning) of the resulting pressure and flow fluctuations are given in Table 6.1 for the three levels of forcing: low (3.86-5.66 mmHg), medium (6.33-7.96 mmHg), and high (10.40-12.41 mmHg) in terms of the RMS level of the de-meaned arterial pressure forcing.
The input-output records comprised 512 samples over 256 seconds with a sampling rate of 2 samples per second (Nyquist frequency of 1 Hz), after digital low-pass filtering (using a 20th-order Hamming window) to avoid aliasing. Each data record was subjected to second-order polynomial trend removal and was de-meaned (by subtracting out the mean value) as well as normalized by dividing with its RMS value prior to processing. Thus, regardless of the different power levels of arterial pressure forcing, all analyzed data sets had zero mean and unit variance, because we wish to study the form of the system kernels irrespective of scaling.
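A minimal sketch of this per-record preprocessing (second-order polynomial detrending, de-meaning, and unit-variance normalization), with the record assumed to be a one-dimensional array of 512 samples:

    import numpy as np

    def preprocess(record):
        """Remove a second-order polynomial trend, subtract the mean, and
        normalize to unit variance, as applied to each pressure/flow record."""
        n = np.arange(len(record), dtype=float)
        trend = np.polyval(np.polyfit(n, record, deg=2), n)  # 2nd-order trend
        x = record - trend      # trend removal (also removes most of the mean)
        x = x - x.mean()        # explicit de-meaning, per the stated procedure
        return x / x.std()      # unit-variance normalization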
The mean and RMS values of the resulting pressure and flow signals are plotted in Figure 6.39 in order to demonstrate that conventional "steady-state" analysis reveals nothing beyond a crude trend and misses entirely the critical dynamic characteristics of renal autoregulation. Typical input-output (pressure-flow) data are shown in Figure 6.40 for three
power levels of CSRS forcing in a normotensive rat. The results of linear transfer function and coherence analysis are shown in Figure 6.41 and demonstrate reduced coherence (i.e., likely presence of nonlinearities) in the lower frequency range (below 0.1 Hz), consistent with our results in cerebral autoregulation. The obtained coherence values also demonstrate decreasing autoregulatory effects at higher power levels of forcing, indicating that strong forcing overwhelms the autoregulatory mechanisms.
The obtained first-order Volterra kernel estimates for the three power levels of forcing are shown in Figure 6.42. The observed changes in waveform as the input power level increases are suggestive of negative compressive (sigmoidal) feedback (see Section 4.1.5). The efficacy of third-order Volterra modeling of this system using LET (L = 8) is corroborated by the model prediction errors given in Table 6.2 and illustrated in Figure 6.43, where the contributions of the various orders to the model prediction are shown. Note that the third-order model seems to capture some "flow depressions" that are not captured by the second-order model. This suggests that a third-order nonlinearity can account for the main nonlinear dynamic characteristics of renal autoregulation that are found in the frequency range where the TGF mechanism is active (below 0.08 Hz). Note also that the aforementioned "flow depressions" in the first-order residuals are more pronounced for medium and low levels of forcing, and the degree of third-order nonlinearity (as measured by the percentage contribution of the third-order kernel to the model output prediction) follows the same pattern, as shown in Table 6.2. This suggests that the presence of nonlinear autoregulation is least evident at the high level of forcing, leading us to posit that the TGF may become "overwhelmed" at high levels of pressure forcing.
The demonstrated efficacy of the advocated methodology in normotensive rats led us to extend the study to spontaneously hypertensive rats (SHR) and examine the effects of hypertension on the obtained nonlinear models. Four SHR preparations were compared with four normotensive Sprague-Dawley rats (SDR) for two different levels of forcing.
Table 6.1 Mean and standard deviation (SD) values of arterial pressure (AP) and renal blood flow (BF) for three different power levels of forcing and five experimental preparations (rats)

Data set #     Power     AP mean,   AP SD,   BF mean,   BF SD,
(exp. prep.)   level     mmHg       mmHg     ml/min     ml/min
1              High       94.19     10.40     5.72      0.83
               Medium    109.13      6.33     6.24      0.50
               Low       105.69      3.99     6.42      0.33
2              High       88.96     11.58     3.97      1.06
               Medium     99.21      6.92     4.12      0.65
               Low       102.38      3.86     4.54      0.39
3              High       92.26     12.41     4.15      0.88
               Medium     96.41      7.69     4.81      0.56
               Low       105.32      5.28     5.71      0.37
4              High      112.29     11.64    10.62      1.26
               Medium    115.08      7.96    10.11      0.82
               Low       118.54      5.66     9.24      0.71
5              High      108.54     11.41     6.73      1.17
               Medium    113.08      7.72     7.12      0.89
               Low       115.42      5.03     7.33      0.66
Figure 6.39 Comparison of mean blood flow versus mean arterial pressure (top display) and standard deviation of blood flow versus standard deviation of arterial pressure (bottom display) at high, medium, and low power level of arterial pressure broadband forcing in the rat kidney. Observe the formation of three clusters with respect to arterial pressure in the standard deviation plot: low (3.5-6 mmHg), medium (6-8 mmHg), and high (10-12.5 mmHg) [Chon et al., 1993].
The resulting mean and RMS values are given in Table 6.3. Note that the RMS value is defined on each de-meaned signal and is not related to averaging over different preparations. Typical pressure-flow data are shown in Figure 6.44 for the low power level of forcing. The observed differences between SDR and SHR in the first-order Volterra kernel estimates are subtle and insufficient to differentiate between the two cases. However, some discriminating differences are observed in the second-order and third-order Volterra kernel estimates obtained via LET (L = 8), as illustrated in Figure 6.45 and Figure 6.46, respectively. Distinct differences exist for both low and high levels of forcing that are evident as distinct patterns of magnitude peaks in the bifrequency domain used for display in Figures 6.45 and 6.46 (magnitudes of the 2-D FFT of the second-order kernel and of a third-order kernel slice). Generally, hypertension removes the nonlinear interaction between the TGF and the myogenic mechanism seen in Figure 6.45(a) at the bifrequency point:
Figure 6.40 Typical input-output data (arterial blood pressure: left column; renal blood flow: right column) from the renal autoregulation experiments in rats for three power levels of CSRS (quasiwhite broadband) pressure forcing, marked as high (first row), medium (second row), and low (third row) [Chon et al., 1993].
f1 = 0.13 Hz, f2 = 0.03 Hz (and is symmetric about the diagonal). The peak at the diagonal point (f1 = f2 = 0.03 Hz) is indicative of the presence of nonlinearity in the TGF mechanism that is thought to reside around the 0.03 Hz frequency band.
Because the second-order and third-order kernel estimates are quite variable from preparation to preparation (and also over time) and their inspection is a rather demanding task, we derive the equivalent PDM models and attempt the same comparison between SDR and SHR, as well as between high and low levels of forcing. Two PDMs are obtained in all cases that seem to largely correspond (fortuitously) to the TGF and myogenic
Figure 6.41a Transfer function magnitude (gain of admittance function) obtained from flow/pressure data at high (top panels) and low (bottom panels) arterial pressure forcings for normotensive rats (left panels) and hypertensive rats (right panels). [Solid lines are means; dotted and dashed lines are standard error (SE) bounds.] Transfer function magnitude units are arbitrary since the flow-pressure data were normalized to unit variance [Chon et al., 1998].
mechanisms. The results are shown in Figures 6.47 and 6.48 and demonstrate the potential of PDM analysis for improved physiological interpretation of nonlinear dynamic models of renal autoregulation. For the low level of forcing, it is observed that the first PDM (which corresponds to the myogenic mechanism) remains essentially the same for SDR and SHR, but the second PDM (which corresponds to the TGF mechanism) is significantly altered by hypertension. Specifically, it is seen in Figures 6.47 and 6.48 that the second PDM exhibits two magnitude peaks in the frequency domain that have a distinctly different size relation (i.e., for SDR, the peak at ~0.02 Hz is dominant over the peak at ~0.07 Hz, but for SHR the two peaks are roughly equal in size). It is very interesting to see in Figure 6.49 that the second PDM for SDR exhibits the same parity between the two peaks for the high level of forcing, suggesting that this parity results from an elevated operating point of average arterial pressure. However, this parity is destroyed by high-level forcing in SHR, as attested to by the appearance of a dominant resonance peak in the second PDM at ~0.04 Hz shown in Figure 6.50 (as if the two previous peaks are merged into one under the combined effect of hypertension and high level of forcing). It is important to note also that the contribution of the second PDM is diminished
Figure 6.41b The coherence function measurements for the four cases of renal autoregulation experiments with the transfer functions shown in Figure 6.41a [Chon et al., 1998].
This does not imply that the role of the TGF in the latter case has diminished, but rather that it has likely changed into a chaotic mode, giving rise to a previously observed spontaneous chaotic oscillation between 0.03 and 0.04 Hz in hypertensive rats [Marsh et al., 1990]. This also explains the fact that the model prediction error increases in this case, as a result of this input-independent chaotic oscillation, which is not captured by the Volterra model. The first PDM is essentially the same in all cases, suggesting that the dynamic characteristics of the myogenic mechanism are not affected by the level of arterial blood pressure or hypertension. We also note that the contribution of the first PDM to the model output power is 5 to 10 times larger than the contribution of the second PDM, as quantified by the respective eigenvalues. The estimated nonlinearities associated with each PDM show that the second PDM exhibits stronger nonlinear characteristics (relative to the respective linear characteristics), consistent with the observed lower coherence at frequencies below 0.08 Hz. This is visually evident in the PDM model with a three-dimensional static nonlinearity associated with the two PDMs in the normotensive case of renal autoregulation shown in Figure 6.51. Note that in this PDM model, the spike at the zero lag of the first PDM has been removed (as representing an all-pass characteristic) in order to enhance our ability to examine the autoregulation dynamics separately from the all-pass characteristics of passive hydraulics.
Figure 6.42 Typical first-order Volterra kernel estimates at the high (dashed line), medium (dotted line), and low (solid line) power levels of arterial pressure forcing, obtained from the same preparation (rat). Note the increased damping of the waveforms as the level of arterial pressure forcing increases. The amplitude units of these kernels are arbitrary, since the pressure-flow data were normalized to unit variance. In general, the first-order kernel units are (output units)/(input units)/(time units). Actual units can be obtained by use of the corresponding RMS or SD values of pressure/flow (input/output) given in Table 6.1 [Marmarelis et al., 1993].
Table 6.2 NMSEs of model prediction for three power levels of broadband input forcing and various model orders for the renal autoregulation data. Note that the NMSE decreases with increasing power level and model order.

Power level    1st-order model    2nd-order model    3rd-order model
High           6.86%              6.67%              1.51%
Medium         15.04%             14.70%             4.17%
Low            18.70%             18.63%             7.20%
Table 6.3 Averaged mean and de-meaned RMS (SD) of renal arterial pressure (AP) and blood flow (BF) for two different levels of forcing (computed over four SHR and four SDR preparations).

                                        Mean ± RMS AP (mmHg)    Mean ± RMS BF (ml/min)
Sprague-Dawley rats (SDR)
  High forcing                          110.50 ± 7.88           11.75 ± 1.18
  Low forcing                           117.00 ± 4.72           12.68 ± 1.34
Spontaneously hypertensive rats (SHR)
  High forcing                          145.01 ± 6.76           12.76 ± 1.06
  Low forcing                           147.64 ± 4.32           10.27 ± 1.21
Figure 6.43 Typical arterial blood pressure signal at the medium power level (trace 1), the associated blood flow signal (trace 2), the first-order model residuals (trace 3), the second-order model residuals (trace 4), and the third-order model residuals (trace 5) based on Volterra model predictions. Note the broadband nature of these experimental pressure-flow signals, and the absence in the third-order residuals of the "flow depressions" seen in the first-order and second-order residuals. The amplitude units for the pressure-flow signals are normalized to unit variance (see text) [Marmarelis et al., 1993].
Figure 6.44 Typical input-output data (arterial blood pressure: left column; renal blood flow: right column) from the renal autoregulation experiments for low power level of CSRS (quasiwhite broadband) pressure forcing in normotensive (top row) and hypertensive (bottom row) rat preparations [Chon et al., 1998].
Figure 6.45a Contour plots (left) and surface plots (right) of magnitude of two-dimensional fast Fourier transforms (2D-FFTs) of averaged second-order kernels for high (A) and low (B) power levels of forcing in normotensive (SDR) rats. The two axes of the plots represent the two frequency variables f1 and f2. Note the strong nonlinear interaction peak present at low power level at bifrequency location f1 = 130 mHz and f2 = 30 mHz (and its symmetric counterpart about the diagonal) [Chon et al., 1994b].
Thus, by removing the zero-lag spike from the first PDM, we allow the remaining PDM waveform to represent strictly the myogenic mechanism. The normalized mean-square error of the PDM model prediction for SDR is about 12% for low-level forcing and below 5% for high-level forcing (suggestive of the improved SNR in the latter case). This error increases for SHR model predictions and is considerably higher for high-level forcing (as mentioned above in connection with the emergence of a chaotic oscillation in this case). Finally, we should note that possible direct effects of the autonomic nervous system on renal autoregulation are not examined in this study because the rat kidney is denervated in the experimental preparation, raising the exciting prospect of naturally innervated kidneys in future studies, as well as PDM modeling under conditions of spontaneous activity (similar to the presented studies of cerebral autoregulation in the previous section).
6.4 METABOLIC-ENDOCRINE SYSTEM
As an illustrative example of the application of the advocated methodology to a metabolic-endocrine system, we model the dynamics of the insulin-glucose interactions during metabolic activity.
Figure 6.45b Contour plots (left) and surface plots (right) of magnitudes of 2D-FFTs of averaged second-order kernels for high (A) and low (B) power-level forcings in hypertensive rats (SHR). The two axes of the plots represent the two frequency variables f1 and f2. Note the absence of the nonlinear interaction peak at f1 = 130 mHz and f2 = 30 mHz observed in normotensive rats for low power level of forcing [Chon et al., 1994b].
This subject has received extensive attention for a long time, primarily through the use of compartmental (parametric) modeling [Bergman et al., 1981; Carson et al., 1983; Cobelli & Mari, 1983; Cobelli & Pacini, 1988; Vicini et al., 1999], because it is of great importance for the diagnosis and treatment of diabetes. Unfortunately, despite the considerable effort and resources dedicated to this task to date, the modeling results have been modest, in that no reliable diagnostic technique has emerged and the dream of an "artificial pancreas" regulating blood glucose levels through automated insulin infusions remains elusive. It is for this reason that the author and his associates have selected this system as a high-priority application of the advocated PDM modeling approach. The analyzed data are of two types: (1) continuous (every 5 min) glucose concentration measurements and doses of insulin infusions through an implanted micropump in Type I diabetics (provided by Medtronic-Minimed, Inc.); and (2) continuous (every 3 min) plasma glucose and insulin concentration measurements of spontaneous activity in anesthetized dogs (provided by Richard Bergman and Katrin Huecking of the USC Medical School). In both cases, the insulin is viewed as the input and the glucose as the output of a Volterra model, since the underlying physiological mechanisms are nonlinear and dynamic. These various mechanisms can be grouped into two categories:
Figure 6.46a Contour plots (left) and surface plots (right) of magnitude of 2D-FFTs of averaged third-order kernel slice for high (A) and low (B) power-level forcings in normotensive (SDR) rats. Note the strong nonlinear interaction peak for low power level of forcing at the bifrequency location f1 = 130 mHz and f2 = 30 mHz, as in the second-order kernel [Chon et al., 1994b].
(1) glucoleptic (pertaining to the insulin-assisted uptake of glucose by various tissues and organs); and (2) glucogenic (pertaining to the generation of new glucose from various tissues and organs in response to insulin elevation). The basic physiological mechanisms of glucose metabolism and insulin secretion have been studied extensively through compartmental modeling, but no reliable quantitative model has yet been advanced that explains well the existing data. Multiple regulatory closed-loop mechanisms subserve the insulin-glucose interactions (e.g., insulin-dependent glucose utilization in muscles and adipose tissue, glucose-dependent pancreatic insulin secretion, liver glucose production and uptake, renal secretion of glucose, etc.). In addition, there is insulin-independent glucose utilization in the central nervous system and in the red blood cells, as well as additional hormonal controllers (such as glucagon, which affects the liver glucose production and whose secretion by pancreatic α-cells is inhibited by elevated glucose and insulin). It is also qualitatively known that the concentrations of free fatty acids and epinephrine influence the nonlinear dynamic relation between insulin and glucose. A preliminary quantitative model of the effect of free fatty acids is presented in Section 7.2.3 as an example of a two-input PDM model. Confronted with this rather complicated nested-loop system, early investigators resorted to modeling simplifications of the full physiological complexity in hope of partial but meaningful results.
Figure 6.46b Contour plots (left) and surface plots (right) of magnitudes of 2D-FFTs of averaged third-order kernel slice for high (A) and low (B) power level of forcing in hypertensive rats (SHR). Note the absence of the nonlinear interaction peak at f1 = 130 mHz and f2 = 30 mHz observed in normotensive rats for low power level of forcing [Chon et al., 1994b].
The most successful of these efforts is the so-called "minimal model" that was discussed briefly in Section 1.4. This "minimal model" offered some useful results in the context of specialized clinical "glucose-tolerance" tests [Bergman et al., 1981; Bergman & Lovejoy, 1997] but remains inadequate to represent the true complexity of this system under natural operating conditions. Over the last 30 years, more than 50 parametric (compartmental) models have been published in the literature, but none can claim satisfactory performance (for various reasons) under natural operating conditions. We take a different approach to this problem by dispensing with the unnecessary (and potentially misleading) constraints of compartmental models and by adopting a true-to-the-data inductive approach, as advocated throughout this book. It is recognized that the obtained PDM model (based on the available insulin-glucose data) is nonstationary and subject-specific. Therefore, this PDM model is adapted through time (through tracking algorithms) and is customized to each specific subject. In this framework, we have been able to obtain PDM models of excellent predictive performance and remarkable interpretability in terms of a glucoleptic and a glucogenic PDM branch, as discussed below. These adaptive PDM models offer a realistic prospect for achieving the long-held dream of an effective "artificial pancreas," because they can be used to control properly the continuous infusions of an implanted insulin micropump in the context of a model-reference control strategy based on an accurate nonlinear dynamic model of the underlying insulin-glucose relation.
Figure 6.47 The two obtained PDMs of renal autoregulation (averaged over three data records) for the low-forcing normotensive data in the time domain (top panels) and in the frequency domain (bottom panels). The first PDM (left panels) corresponds to the myogenic mechanism with a resonance peak at around 0.12 Hz, and the second PDM (right panels) corresponds to the TGF mechanism with a dominant resonance peak around 0.02 Hz. The standard error bounds in dashed lines correspond to ± one standard deviation [Marmarelis et al., 1999b].
The importance of this prospect for reliable regulation of blood glucose concentration cannot be overstated. First, we consider the data of injected insulin (input) and glucose concentration (output) from human subjects (8-hour data records sampled every 5 min). Two PDMs have been consistently obtained for all subjects and segments, whereby the first PDM represents the insulin-assisted glucose uptake (glucolepsis) and the second represents the insulin-induced glucose production (glucogenesis). An illustrative example of these adaptive PDM models is shown in Figure 6.52 for two 8-hr records from a human subject on successive days (i.e., separated by 16 hr). The glucoleptic PDM of both models depicts the insulin-assisted reduction of glucose concentration, with minor differences in dynamics and magnitude for the two segments. Specifically, the dynamics of the second segment are a bit faster (peaking at about 30 min instead of 50 min) and show a slight overshoot after ~2 hr (in both cases the effect of insulin diminishes after ~5 hr). The associated glucoleptic nonlinearity is decompressive in both cases, indicating a supralinear insulin-assisted reduction of glucose concentration.
Figure 6.48 The two obtained PDMs (averaged over three data records) for the low-forcing hypertensive data in the time domain (top panels) and in the frequency domain (bottom panels). The first PDM (left panels) corresponds to the myogenic mechanism and the second PDM (right panels) corresponds to the TGF mechanism of renal autoregulation. The first PDM is similar to the low-forcing normotensive case (Figure 6.47) but the second PDM is different and resembles the high-forcing normotensive case (Figure 6.49). The standard error bounds in dashed lines correspond to ± one standard deviation [Marmarelis et al., 1999b].
Similar observations apply to the glucogenic PDM and its associated nonlinearity, although the contribution of the glucogenic component is decreased by about 30% in the second segment (see the ordinate of the associated nonlinearities). However, the peak response time of the glucogenic PDM is about the same in both segments (~90 min) and the effects of insulin-induced glucogenesis last a bit longer (about 7 hr, instead of about 5 hr for glucolepsis) in both segments. Similar results were obtained for three more human subjects, although some intersubject variability was observed. The consistency of these results and their direct interpretability (based on what is expected qualitatively from all previous knowledge) are remarkable and hold the promise of a reliable clinical tool for improved diagnosis and (ultimately) effective glucose regulation with an implanted insulin micropump equipped with adaptive model-reference control capability (the "artificial pancreas"). For instance, the "insulin sensitivity" of a human subject can be quantified clinically by the peak value (or the area) of the glucoleptic PDM in combination with the values and curvature (second derivative) of the glucoleptic nonlinearity.
Figure 6.49 The two obtained PDMs (averaged over three data records) for the high-forcing normotensive data in the time domain (top panels) and in the frequency domain (bottom panels). The first PDM (left panels) corresponds to the myogenic mechanism and the second PDM (right panels) corresponds to the TGF mechanism of renal autoregulation. Compared to the low-forcing normotensive case of Figure 6.47, the first PDM is not altered significantly but the second PDM exhibits marked changes (two resonant peaks at 0.02 and 0.07 Hz), resembling the low-forcing hypertensive case of Figure 6.48 [Marmarelis et al., 1999b].
The respective values of the glucogenic branch can provide a clinical measure of the subject's ability to produce new glucose in response to elevated insulin (indirectly assessing possible malfunction of the liver and/or kidneys in this regard). We now turn to the experimental data of spontaneous plasma glucose and insulin concentrations from the anesthetized dog (10-hour data record sampled every 3 min). The concentration of free fatty acids (FFA) was also measured at the same times and subsequently analyzed. Single-input results are presented below, but the most intriguing result is when FFA was analyzed as a second input in addition to insulin (see Section 7.2.3). The experimental data records are shown in Figure 6.53. Two PDMs were again obtained for the insulin-to-glucose system from these data and are shown in the PDM model of Figure 6.54. Although the first PDM exhibits again the main glucoleptic characteristics of an insulin-assisted reduction of glucose concentration (albeit with different dynamics than before), the second PDM presents a more complicated biphasic behavior (both glucoleptic and glucogenic phases). This may be attributed to homeostatic mechanisms of the autoregulated state of the spontaneous metabolic-endocrine activity of the normal dog (giving rise to the so-called "counterregulation" phases that have been observed qualitatively and seem to contradict notions of unidirectional autoregulatory mechanisms).
Figure 6.50 The two obtained PDMs (averaged over three data records) for the high-forcing hypertensive data in the time domain (top panels) and in the frequency domain (bottom panels). The first PDM (left panels) corresponds to the myogenic mechanism and the second PDM (right panels) corresponds to the TGF mechanism of renal autoregulation. The first PDM is similar to the normotensive case and the second PDM resembles the low-forcing normotensive case, although the dominant resonance peak is shifted upward to about 0.04 Hz [Marmarelis et al., 1999b].

Figure 6.51 PDM model for renal autoregulation (without the all-pass component).
Figure 6.52a PDM model of injected insulin-glucose data in a diabetic subject (first 8-hr segment).
Figure 6.52b PDM model of injected insulin-glucose data in the same diabetic subject as in part (a), for a second 8-hr segment starting 16 hr after the end of the first segment.
Figure 6.53 Spontaneous plasma insulin, glucose, and free fatty acid (FFA) data from an anesthetized dog, sampled every 3 min.
Figure 6.54 PDM model for spontaneous insulin-to-glucose data in dog.
Note that this biphasic dynamical behavior was also observed in our studies of cerebral autoregulation under natural conditions of spontaneous activity (see Section 6.2). We also note that the timing of the two peaks in the glucoleptic PDM coincides with the timing of the peak and trough of the glucogenic PDM. Furthermore, the time scale of these dynamics of spontaneous activity (in dogs) is longer than in the previous injected insulin-glucose model in humans (almost double). The spontaneous insulin-to-glucose PDM model (if confirmed with additional data) offers great potential as a diagnostic tool, because the glucoleptic PDM quantifies the aggregate effect of insulin on glucose uptake (a reliable measure of insulin sensitivity that can diagnose Type II diabetes in rigorously defined grades), and the glucogenic PDM quantifies the aggregate effect of insulin on new glucose production (a reliable diagnostic tool for possible malfunction of this process, related primarily to the liver and the kidney). Since the spontaneous activity data truly derive from a closed-loop system, we may reverse the input-output roles of insulin and glucose in order to derive a PDM model of glucose-induced insulin changes. The resulting PDM model of the glucose-to-insulin system is shown in Figure 6.55 and exhibits biphasic dynamics in both PDMs, although the individual contributions of the two PDM branches (termed "insulinoleptic" and "insulinogenic") balance each other in a homeostatic fashion by virtue of the opposite signs of the associated nonlinearities. We can utilize the PDM model of the glucose-to-insulin system in a diagnostic context and use the measurement of the "insulinogenic" PDM (i.e., the PDM that represents the glucose-induced production of insulin) to diagnose possible pancreatic deficiency/inadequacy and to quantify reliably various grades of Type I diabetes. A suitable diagnostic role can also be assigned to the obtained "insulinoleptic" PDM, as a measure of the subject's aggregate ability to utilize insulin for glucose catabolism.
Figure 6.55 PDM model for spontaneous glucose-to-insulin data in dog.
In this rigorous and quantitative context, we can assess the effect of obesity, exercise, and other factors (germane to metabolic processes) on the aggregate effect of insulin on blood glucose concentration. This addresses a host of important health-related issues that have absorbed immense resources over many years with modest (and often equivocal) results to date. Of course, the presented PDM results are preliminary.

It is interesting to note that the PDM models obtained consistently from human subjects cannot be reproduced by the widely accepted "minimal model," pointing to its limited utility for prediction and control, unless it can be augmented to incorporate the other important mechanisms (e.g., the glucogenic process and the effects of glucagon). To demonstrate this point, we perform PDM analysis on the "minimal model" for parameter values that are considered physiologically meaningful for normal humans in the literature. The resulting PDM model exhibits characteristics different from those observed in the PDM models based on the actual data. This is to be expected, since the "minimal model" accounts only for insulin-induced glucoleptic effects. The comparison can also be made on the basis of the respective Volterra kernels and leads to the same conclusion. It can be argued that the use of models that are not inductive (i.e., not based on the actual data) but derive from abstract considerations (which are liable to subjective biases and possibly restrictive preconceived notions) may be misleading in certain cases and should be avoided when lives are at risk.

Another issue of practical interest is the asymptotic behavior of the PDM model for step inputs of variable strength (often referred to as the "steady-state" behavior). This allows us to gain some insight into the functional importance of the PDM-associated nonlinearities. The actual PDM dynamics are reflected in the transient portion of the step response (e.g., the presence or absence of undershoot, and the extent of settling time). However, the asymptotic value of the step response (i.e., after the settling time has elapsed) reflects the form of the associated nonlinearities when viewed as a function of the step amplitude. Thus, if a step input of amplitude A is applied, then the jth PDM output reaches asymptotically (i.e., after the settling time) the value u_j = A p_j, where p_j denotes the integral of the jth PDM. The contribution of this PDM to the system output is y_j = f_j(u_j), where f_j denotes the associated nonlinearity of the jth PDM. The resulting asymptotic value of the system output is

y(A) = y_0 + \sum_j y_j = y_0 + \sum_j f_j(A p_j)     (6.13)
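As a concrete illustration of Equation (6.13), the following minimal Python sketch evaluates the asymptotic step response of a PDM model as a function of the step amplitude A. The two PDM waveforms, the polynomial nonlinearities, and all numerical values are hypothetical placeholders, not the estimated metabolic models of this section.

```python
import numpy as np

dt = 1.0                                   # sampling interval [min]
lags = np.arange(0.0, 300.0, dt)           # PDM memory
pdm1 = -np.exp(-lags / 50.0) / 50.0        # glucoleptic-like waveform (assumed)
pdm2 = np.exp(-lags / 90.0) / 90.0         # glucogenic-like waveform (assumed)
f1 = lambda u: 2.0 * u + 0.5 * u**2        # assumed associated nonlinearities
f2 = lambda u: 1.0 * u + 0.2 * u**2
y0 = 90.0                                  # output offset (e.g., basal glucose)

p1 = pdm1.sum() * dt                       # PDM integrals p_j in Eq. (6.13)
p2 = pdm2.sum() * dt

def y_asymptotic(A):
    """Asymptotic output for a step input of amplitude A, per Eq. (6.13)."""
    return y0 + f1(A * p1) + f2(A * p2)

for A in (0.5, 1.0, 2.0):
    print(f"A = {A:3.1f} -> y(A) = {y_asymptotic(A):8.3f}")
```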
The "steady-state" behavior of the "minimal model" given by Equations (1.18) and (1.19) is given for a step insulin input (relative to its basal value) whereby the asymptotic value of insulin action is X = Py4/P2' and the asymptotic value of the glucose concentration is
G = Gb(1 + ~A]-l
(6.14)
PtP2
It is evident from Equation (6.14) that the "minimal model" cannot reproduce the actual
steady-state behavior of the system as represented by the inductive (data-true) PDM model given by Equation (6.13). To achieve better agreement with the data-true inductive model, we propose a "new minimal model" that incorporates in the "glucose balance" equation two more "state variables" in bilinear terms (representing internal glucose and insulin production) and assumes the form

dG/dt = -\beta (G - G_b) - \gamma_1 (z_0 + z_1) G + \gamma_2 z_2     (glucose balance)     (6.15)

dx_1/dt = -a_1 x_1 + I,   z_1 = f_1(x_1) > 0     (insulin action)     (6.16)

dx_2/dt = -a_2 x_2 + I,   z_2 = f_2(x_2) > 0     (glucose production)     (6.17)

dz_0/dt = -a_0 z_0 + \gamma_0 G     (insulin production)     (6.18)
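A minimal simulation sketch of this "new minimal model" is given below; all parameter values and the positive nonlinearities f1, f2 are assumed for illustration only. The asymptotic glucose level it prints can be checked against the positive root of the steady-state relation in Equation (6.19) that follows.

```python
import numpy as np

beta, Gb = 0.05, 90.0                      # glucose-balance parameters (assumed)
gamma0, gamma1, gamma2 = 0.001, 0.01, 0.5
a0, a1, a2 = 0.05, 0.1, 0.02
f1 = lambda x: np.log(1.0 + np.exp(x))          # assumed positive nonlinearity z1
f2 = lambda x: np.log(1.0 + np.exp(0.5 * x))    # assumed positive nonlinearity z2

dt, T = 0.5, 2000.0                        # Euler step and duration [min]
G, x1, x2, z0 = Gb, 0.0, 0.0, 0.0
A = 1.0                                    # step insulin input I(t) = A for t >= 0
for _ in range(int(T / dt)):
    z1, z2 = f1(x1), f2(x2)
    dG = -beta * (G - Gb) - gamma1 * (z0 + z1) * G + gamma2 * z2   # Eq. (6.15)
    dx1 = -a1 * x1 + A                                             # Eq. (6.16)
    dx2 = -a2 * x2 + A                                             # Eq. (6.17)
    dz0 = -a0 * z0 + gamma0 * G                                    # Eq. (6.18)
    G, x1, x2, z0 = G + dt * dG, x1 + dt * dx1, x2 + dt * dx2, z0 + dt * dz0

print(f"asymptotic glucose for insulin step A = {A}: G = {G:.2f}")
```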
Note that two of the "state variables" (insulin action and glucose production) are subjected to static nonlinear transformations f_1 and f_2 prior to entering into the bilinear terms of the "glucose balance" equation. This model yields the "steady-state" expression

\frac{\gamma_0 \gamma_1}{a_0 \beta} G^2 + \left[ 1 + \frac{\gamma_1}{\beta} f_1(A/a_1) \right] G = G_b + \frac{\gamma_2}{\beta} f_2(A/a_2)     (6.19)
that has a single positive solution G for each A, because the discriminant of this quadratic equation in G is positive and the product of the two real roots is negative (the negative root is not admissible because G must be positive). Proper selection of the functions f_1 and f_2 can reproduce the steady-state behavior observed in the actual data and endow the "new minimal model" with greater versatility of dynamics, capable of explaining more complicated behavior.

At this juncture, it is instructive to note that a compartmental or parametric model, such as the "minimal model" or any other, can be validated by means of Volterra analysis. Specifically, the Volterra kernels of the parametric model can be compared with the Volterra kernels estimated from the input-output data. Agreement between all pairs of kernels of the same order validates the parametric model, because of the uniqueness of the Volterra representation for practically any system (provided that the Volterra kernel estimates are unbiased, being derived for a complete model). Disagreement between any pair of Volterra kernels invalidates the parametric model, at least for the respective order of nonlinearity and to the extent indicated by the observed discrepancy. For instance, it can be shown that for p_3 \ll 1, the "minimal model" can be approximated well by a second-order Volterra model with kernels [Marmarelis & Mitsis, 2000]:

k_0 = G_b     (6.20)

k_1(\tau) = -G_b \frac{p_3}{p_2 - p_1} [\exp(-p_1 \tau) - \exp(-p_2 \tau)]     (6.21)
k_2(\tau_1, \tau_2) = \frac{1}{2 G_b} \left\{ k_1(\tau_1) k_1(\tau_2) + p_1 \int_0^{\tau_m} \exp(-p_1 \lambda) k_1(\tau_1 - \lambda) k_1(\tau_2 - \lambda) d\lambda \right\},   \tau_m = \min(\tau_1, \tau_2)     (6.22)
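The following minimal sketch evaluates the kernels of Equations (6.21) and (6.22) on a discrete lag grid by direct quadrature; the parameter values are assumed for illustration only.

```python
import numpy as np

Gb, p1, p2, p3 = 90.0, 0.03, 0.02, 1e-4    # assumed illustrative parameter values
dt = 2.0                                   # lag resolution [min]
tau = np.arange(0.0, 600.0, dt)
M = tau.size

# First-order kernel, Eq. (6.21)
k1 = -Gb * p3 / (p2 - p1) * (np.exp(-p1 * tau) - np.exp(-p2 * tau))

# Second-order kernel, Eq. (6.22), by direct quadrature on the lag grid
k2 = np.zeros((M, M))
for i in range(M):
    for j in range(M):
        lam = np.arange(min(i, j) + 1)     # lambda in [0, min(tau1, tau2)]
        integral = dt * np.sum(np.exp(-p1 * lam * dt) * k1[i - lam] * k1[j - lam])
        k2[i, j] = (k1[i] * k1[j] + p1 * integral) / (2.0 * Gb)

print("peak |k1| =", np.abs(k1).max(), "  peak |k2| =", np.abs(k2).max())
```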
If the estimated first-order Volterra kernel cannot be fitted by the two-pole expression of Equation (6.21), then the more general expression of Equation (3.70) can be applied, which was derived for systems with multiple "state variables" (i.e., bilinear terms in the glucose-balance equation). The second-order Volterra kernel can be used for finer validation and to guide possible adjustments of the parametric model if discrepancies are observed. Clearly, fitting the second-order kernel is more complicated than fitting the first-order kernel, but it also provides unique insight into the system nonlinearities (i.e., many systems or models may look alike at the first order but be distinctly different at the second order and exhibit drastically different input-output behavior). Another interesting observation on the expression of the second-order kernel in Equation (6.22) is that the first term within the brackets corresponds to a squarer applied to the output of a filter k_1, and the second term corresponds to a cascaded operation of the square of the k_1 filter output with another filter h(\lambda) = p_1 \exp(-p_1 \lambda). This equivalent modular model is an approximation of the precise modular model shown in Figure 1.10 for p_3 \ll 1, and takes the form of an "S_m model" initially proposed by Baumgartner & Rugh (1975) and studied by Chen et al. (1986). The question naturally arises as to what is a PDM approximation of this modular model. Clearly, one PDM can be considered to be equal to k_1, associated with the quadratic nonlinearity. However, the other branch of the modular model may be approximated by an arbitrary number of PDMs due to the action of the posterior filter. If only two PDMs are desired, then the second PDM g_2(\tau) should yield the best least-squares approximation by minimizing the mean-square error:
J = \iint \left\{ g_2(\tau_1) g_2(\tau_2) - p_1 \int_0^{\tau_m} \exp(-p_1 \lambda) k_1(\tau_1 - \lambda) k_1(\tau_2 - \lambda) d\lambda \right\}^2 d\tau_1 d\tau_2     (6.23)
The solution of this minimization problem can be facilitated if we consider an expansion of the second PDM g_2(\tau) on an appropriate basis of functions {b_j(\tau)} (e.g., Laguerre functions) and then minimize J by differentiating with respect to the expansion coefficients. A more practicable alternative is to minimize J in the frequency domain, since we know that
J = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| G_2(\omega_1) G_2(\omega_2) - K_1(\omega_1) K_1(\omega_2) \frac{p_1}{p_1 + j(\omega_1 + \omega_2)} \right|^2 d\omega_1 d\omega_2     (6.24)
where G_2 and K_1 denote the Fourier transforms of g_2 and k_1, respectively. It is evident from Equation (6.24) that if the "minimal model" can be approximated well by means of two PDMs, then the second PDM has the transfer function

G_2(\omega) = [2 G_b]^{-1/2} Q(\omega) K_1(\omega)     (6.25)

where the filter Q is such that

Q(\omega_1) Q(\omega_2) \equiv \frac{p_1}{p_1 + j(\omega_1 + \omega_2)}     (6.26)
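Equations (6.25)-(6.26) can be realized numerically as in the minimal sketch below, which assumes the principal-branch square root when factoring Q out of Equation (6.26) (setting \omega_1 = \omega_2 = \omega), and reuses the k1 array, p1, Gb, and dt from the previous kernel sketch.

```python
import numpy as np

# Principal branch of Eq. (6.26) at w1 = w2 = w: Q(w) = [p1 / (p1 + 2jw)]^(1/2).
w = 2.0 * np.pi * np.fft.rfftfreq(k1.size, d=dt)   # angular frequency grid
K1 = np.fft.rfft(k1) * dt                          # Fourier transform of k1
Q = np.sqrt(p1 / (p1 + 2j * w))
G2 = (2.0 * Gb) ** (-0.5) * Q * K1                 # Eq. (6.25)
g2 = np.fft.irfft(G2, n=k1.size) / dt              # second PDM in the time domain
```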
Figure 6.56 PDM model for spontaneous FFA-to-glucose data in dog.
The static nonlinearity associated with the second PDM is a simple squarer in this approximation. It is worth noting that the first (glucoleptic) PDM of the human subject in Figure 6.52a resembles the first-order Volterra kernel of the "minimal model" (plotted in Figure 1.9), but not for the second data segment shown in Figure 6.52b. The agreement ceases also for the overall PDM model and for the second-order kernels. Turning to the FFA data, we obtain a PDM model of the FFA impact (by considering it as an input) on the glucose concentration (output), which is expected to be antagonistic to the effect of insulin. This model has only one PDM, which is shown in Figure 6.56, depicting the anticipated antagonistic effect on glucose (lasting over ~2 hr) by comparison with the glucoleptic PDM of Figure 6.54, although a weak "counterregulation" phase follows after the 2-hr lag and lasts up to ~5 hr. Since another point of interest is how changes in plasma insulin concentration affect the FFA concentration, we obtain a PDM model with insulin as input and FFA as output. The resulting model has one PDM and is shown in Figure 6.57, depicting an agonistic effect of insulin on FFA (observe the positivity of the associated nonlinearity, which turns both positive and negative phases of the biphasic PDM into a positive effect on the FFA output). This result clearly shows that any change in plasma insulin causes an elevation in FFA concentration.
Figure 6.57 PDM model for spontaneous insulin-to-FFA data in dog.
An intriguing result is also presented in Section 7.2.3 in connection with the nonlinear dynamic impact on glucose concentration of the combined application of insulin and FFA inputs (obtained from the spontaneous data in the dog). This result appears to elucidate in a quantitative manner the effect of obesity on the insulin sensitivity of glucose utilization (confirming the anticipated reduction of insulin sensitivity for elevated FFA concentration) [Rebrin et al., 1995; Marmarelis et al., 2002]. These results illustrate the wealth of knowledge that can be extracted from natural (spontaneous) physiological data by the proper application of the advocated methodology in the field of metabolic-endocrine systems. The author is aware that this field has been dominated by compartmental modeling up to now, in part because of the sparsity of the available time-series data. Nonetheless, as the measurement technologies improve (e.g., continuous glucose sensors), the use of the advocated methods becomes feasible and the potential benefits for the advancement of scientific knowledge and clinical practice appear to be exceptional.
7 Modeling of Multiinput/Multioutput Systems
This chapter is dedicated to the important problem of modeling physiological systems with multiple inputs and multiple outputs, which is attracting growing attention as it becomes increasingly evident that reliable understanding of the function of physiological systems will be possible only when the full cast of protagonists can be taken into account in a single model. This realization arises from the fact that physiological mechanisms are often highly interconnected and subject to multiple influences. This complex interconnectivity may take the form of cascaded and parallel operations (e.g., fan-in and fan-out forward architectures of neurosensory and neuromotor systems) or closed-loop operations (e.g., homeostatic mechanisms of cardiovascular-respiratory or metabolic-endocrine autoregulation). The latter case is far more challenging, as it often exhibits complicated interconnectivity in nested loops that may require a different methodological approach, such as the one presented in Chapter 10. In this chapter, we focus on the modeling problem of multiple causal pathways between selected input and output variables. For each output variable, an extended multiinput Volterra series expression is sought that describes the causal relationship between the output variable and all the selected input variables. Note that a specific variable can be viewed both as an output and as an input (for another output variable, or even in auto-regressive relation with itself, as discussed in Chapter 10). In the following section, we start with the two-input case, the simplest example of multiinput modeling, which has found several interesting applications to date (illustrated in Section 7.2). The full multiinput case is discussed in Section 7.3 and culminates in the important spatiotemporal case in Section 7.4. The latter represents the best example so far of multiinput modeling in the visual system. It must be emphasized at the outset that the nonparametric modeling approach (and its derivative network-based methodologies) offers a tremendous methodological advantage over the parametric modeling approach in the case of multiple inputs. This was first recognized by my brother Panos in the realm of the visual system 30 years ago and remains the most promising direction in our efforts to disentangle the complexity of multivariate physiological function in a manner that remains "true to the data."
7.1 THE TWO-INPUT CASE
In the case of two inputs, the output can be expressed in terms of the extended Volterra series that includes a set of kernels for each input (termed "self-kernels") and a set of "cross-kernels" that describe the dynamic nonlinear interactions between the two inputs as they affect the output [Marmarelis & McCann, 1973; Marmarelis & Naka, 1973c]:
y(t) = k_{0,0} + \int_0^\infty k_{1,0}(\tau) x_1(t - \tau) d\tau + \int_0^\infty k_{0,1}(\tau) x_2(t - \tau) d\tau
     + \int_0^\infty \int_0^\infty k_{2,0}(\tau_1, \tau_2) x_1(t - \tau_1) x_1(t - \tau_2) d\tau_1 d\tau_2
     + \int_0^\infty \int_0^\infty k_{1,1}(\tau_1, \tau_2) x_1(t - \tau_1) x_2(t - \tau_2) d\tau_1 d\tau_2
     + \int_0^\infty \int_0^\infty k_{0,2}(\tau_1, \tau_2) x_2(t - \tau_1) x_2(t - \tau_2) d\tau_1 d\tau_2 + \cdots
     + \int_0^\infty \cdots \int_0^\infty k_{m,n}(\tau_1, \ldots, \tau_{m+n}) x_1(t - \tau_1) \cdots x_1(t - \tau_m) x_2(t - \tau_{m+1}) \cdots x_2(t - \tau_{m+n}) d\tau_1 \cdots d\tau_{m+n} + \cdots     (7.1)
where k_{m,n}(\tau_1, \ldots, \tau_{m+n}) denotes the kernel that convolves m times with the first input x_1(t) and n times with the second input x_2(t). Clearly, when mn \neq 0, k_{m,n} is a cross-kernel, and when m = 0, k_{0,n} is the nth-order self-kernel for input x_2, and vice versa. The cross-kernels describe the nonlinear interactions between the two inputs as they affect the output and (unlike the self-kernels) are not symmetric about their diagonals (i.e., the temporal arrangement between the two inputs matters in terms of the elicited response). The zeroth-order kernel k_{0,0} is common to both inputs. The self-kernels for either input in the two-input Volterra model are the same as the kernels of the same input in a single-input Volterra model, if and only if the models are complete (i.e., not truncated) and the other input is not active. If the other input is active, then the single-input Volterra model will attain a nonstationary form (see Chapter 9) because of the functional terms in Equation (7.1) that contain the other input. The two-input Volterra series can be orthogonalized for two independent GWN inputs in a manner similar to the single-input Wiener series orthogonalization, yielding the two-input Wiener series [Marmarelis & McCann, 1973]:
y(t) = h_{0,0} + \int_0^\infty h_{1,0}(\tau) x_1(t - \tau) d\tau + \int_0^\infty h_{0,1}(\tau) x_2(t - \tau) d\tau
     + \left[ \int_0^\infty \int_0^\infty h_{2,0}(\tau_1, \tau_2) x_1(t - \tau_1) x_1(t - \tau_2) d\tau_1 d\tau_2 - P_1 \int_0^\infty h_{2,0}(\lambda, \lambda) d\lambda \right]
     + \left[ \int_0^\infty \int_0^\infty h_{0,2}(\tau_1, \tau_2) x_2(t - \tau_1) x_2(t - \tau_2) d\tau_1 d\tau_2 - P_2 \int_0^\infty h_{0,2}(\lambda, \lambda) d\lambda \right]
     + \int_0^\infty \int_0^\infty h_{1,1}(\tau_1, \tau_2) x_1(t - \tau_1) x_2(t - \tau_2) d\tau_1 d\tau_2 + \cdots     (7.2)
where the Wiener kernels {h_{i,j}} are generally distinct from their Volterra counterparts and depend on both power levels P_1 and P_2 of the two independent GWN inputs x_1 and x_2. For instance, the zeroth-order Wiener kernel is expressed in terms of the input-independent Volterra kernels as

h_{0,0} = \sum_{m=0}^{\infty} \sum_{n=0}^{m} \frac{(2m-2n)! (2n)!}{(m-n)! n! 2^m} P_1^{m-n} P_2^{n} \int_0^{\infty} \cdots \int k_{2m-2n,2n}(\lambda_1, \lambda_1, \ldots, \lambda_m, \lambda_m) d\lambda_1 \cdots d\lambda_m     (7.3)
which implies that h_{0,0} contains components generally dependent on all even-order Volterra kernels and on both power levels of the two GWN inputs. This is an important point to keep in mind when the Wiener kernel estimates of the two-input model are interpreted. To elaborate this point for the practically important first-order Wiener kernel, we note that

h_{1,0}(\tau) = \sum_{m=0}^{\infty} \sum_{n=0}^{m} \frac{(2m+1-2n)! (2n)!}{(m-n)! n! 2^m} P_1^{m-n} P_2^{n} \int_{-\infty}^{\infty} \cdots \int k_{2m+1-2n,2n}(\tau, \lambda_1, \lambda_1, \ldots, \lambda_m, \lambda_m) d\lambda_1 \cdots d\lambda_m     (7.4)
which implies that the first-order Wiener "self-kernel" actually depends on the higher odd-order Volterra cross-kernels, if such exist. Therefore, in the Wiener formulation of the two-input problem, the interpretation of the "self-kernels" must be made with great caution and proper attention to the possible effects of higher-order cross-kernels representing interaction nonlinearities. Similar statements apply to the second-order Wiener kernels and so on, observing the separation between odd and even terms. The first application of the two-input modeling approach to physiological systems was made by Panos Marmarelis and Gilbert McCann on directionally selective cells in the fly eye [Marmarelis & McCann, 1973], followed shortly by an application to horizontal, bipolar, and ganglion cells in the catfish retina by Panos Marmarelis and Ken Naka [Marmarelis & Naka, 1973c]. The first application used two spot stimuli to emulate directional motion, and the second application used spot and annulus stimuli to test the center-surround organization of retinal receptive fields. These applications are viewed as pivotal in establishing the feasibility of the multiinput modeling approach in systems physiology, culminating in the spatiotemporal modeling of the vertebrate retina by the same pioneers (see Section 7.4). These pioneering applications, as well as two recent applications to the metabolic and cardiovascular systems, are presented in Section 7.2 as illustrative examples of two-input modeling. In this section, we present the three main methodologies for estimating the self-kernels and cross-kernels in the two-input case. For historical reasons, we present first, in Section 7.1.1, the cross-correlation technique that was used in the aforementioned pioneering applications for the estimation of Wiener self- and cross-kernels of a second-order model. We follow in Section 7.1.2 with the adaptation of the kernel-expansion technique to the two-input case, which yields the Volterra self- and cross-kernels of (possibly higher order) models. The kernel-expansion approach naturally leads to the PDM variant for two or multiple inputs, and this in turn leads to the Volterra-equivalent network models that are discussed in Section 7.1.3. The kernel-expansion approach and its PDM or network-based variants yield accurate Volterra-type high-order models from short data records and, therefore, are recommended as the methods of choice for two-input modeling. Their efficiency becomes critical in the case of multiinput modeling, as will be discussed in Sections 7.3 and 7.4.
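To make the structure of Equation (7.1) concrete, the following minimal Python sketch computes the output of a second-order truncation of the two-input Volterra model in discrete time. The kernel arrays are arbitrary illustrative choices (not those of any system in this book), and the generated record also serves as simulated data for the estimation sketches that follow.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 1000                            # kernel memory and record length
x1 = rng.standard_normal(N)                # independent GWN inputs (P1 = P2 = 1)
x2 = rng.standard_normal(N)

k00 = 0.5                                  # illustrative kernels (assumed shapes)
k10 = np.exp(-np.arange(M) / 5.0)
k01 = -np.exp(-np.arange(M) / 8.0)
k20 = 0.10 * np.outer(k10, k10)            # second-order self-kernels (symmetric)
k02 = 0.05 * np.outer(k01, k01)
k11 = 0.20 * np.outer(k10, k01)            # cross-kernel (not symmetric)

y = np.full(N, k00)
for n in range(M - 1, N):
    u1 = x1[n - M + 1 : n + 1][::-1]       # x1(n - m), m = 0, ..., M-1
    u2 = x2[n - M + 1 : n + 1][::-1]
    y[n] += k10 @ u1 + k01 @ u2            # first-order (self) terms
    y[n] += u1 @ k20 @ u1 + u2 @ k02 @ u2  # second-order self terms
    y[n] += u1 @ k11 @ u2                  # second-order cross term
```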
7.1.1 The Two-Input Cross-Correlation Technique
As indicated above, this is the technique originally used for two-input modeling by Panos Marmarelis and his colleagues. It employs two independent GWN (or other quasiwhite) input signals and utilizes the orthogonal Wiener series formalism of Equation (7.2) to obtain estimates of the Wiener kernels as

h_{m,n}(\tau_1, \ldots, \tau_{m+n}) = \frac{1}{m! n! P_1^m P_2^n} E[y_{m,n}(t) x_1(t - \tau_1) \cdots x_1(t - \tau_m) x_2(t - \tau_{m+1}) \cdots x_2(t - \tau_{m+n})]     (7.5)
where y_{m,n}(t) denotes the output residual at the (m, n) estimation step (i.e., after subtracting from the output signal the contributions of all previously estimated Wiener kernels of lower order). The reason for using the output residual (instead of the output signal itself) was elaborated in the single-input case and pertains to the accuracy of the estimation of the diagonal values of the Wiener kernels. The ensemble average indicated in Equation (7.5) is replaced in practice by time averaging (over finite-length records) on the assumption of stationarity and ergodicity of the input-output signals. Note that, unlike the self-kernels, the cross-kernels are not symmetric about some of their diagonals that separate the effects of the two inputs. The limitations of the cross-correlation technique and the associated estimation errors were discussed extensively in Chapter 2, and the same analysis applies to the multiinput case as well. The application of the two-input cross-correlation technique to actual physiological systems is illustrated in Section 7.2.
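A minimal sketch of the estimator in Equation (7.5), with the ensemble average replaced by time averaging, is given below for the first-order self-kernel and the (1,1) cross-kernel; it assumes the simulated records x1, x2, y and memory M from the previous sketch (with P1 = P2 = 1). For independent zero-mean GWN inputs, the lower-order terms are orthogonal in expectation to the correlating products used here, so the de-meaned output stands in for the exact residual for brevity.

```python
import numpy as np

def xcorr_h10(y, x1, M, P1=1.0):
    """First-order Wiener self-kernel estimate h_{1,0}(m), per Eq. (7.5)."""
    yr = y - y.mean()                      # subtract the zeroth-order contribution
    n = np.arange(M - 1, y.size)
    return np.array([np.mean(yr[n] * x1[n - m]) for m in range(M)]) / P1

def xcorr_h11(y, x1, x2, M, P1=1.0, P2=1.0):
    """Second-order Wiener cross-kernel estimate h_{1,1}(m1, m2), per Eq. (7.5)."""
    yr = y - y.mean()
    h = np.zeros((M, M))
    for m1 in range(M):
        for m2 in range(M):
            n = np.arange(max(m1, m2), y.size)
            h[m1, m2] = np.mean(yr[n] * x1[n - m1] * x2[n - m2]) / (P1 * P2)
    return h

h10 = xcorr_h10(y, x1, M)                  # compare with k10 of the simulation
h11 = xcorr_h11(y, x1, x2, M)              # compare with the cross-kernel k11
```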
7.1.2 The Two-Input Kernel-Expansion Technique
As in the single-input case, the Volterra kernels of the system can be expanded on a judiciously chosen basis of functions (e.g., the Laguerre basis) in order to reduce the number of free parameters that need to be estimated from input-output data. Note that, due to the asymmetry of the cross-kernels, the number of expansion coefficients for cross-kernels is larger than for self-kernels of the same order. The expansion for the Volterra kernel of order (m, n) is

k_{m,n}(\tau_1, \ldots, \tau_{m+n}) = \sum_{j_1=1}^{L} \cdots \sum_{j_{m+n}=1}^{L} c_{m,n}(j_1, \ldots, j_{m+n}) b_{j_1}(\tau_1) \cdots b_{j_m}(\tau_m) b_{j_{m+1}}(\tau_{m+1}) \cdots b_{j_{m+n}}(\tau_{m+n})     (7.6)

where b_j(\tau) denotes the jth basis function and {c_{m,n}} are the expansion coefficients of k_{m,n} (to be estimated from input-output data). Note that the indices j_1 to j_m correspond to the input x_1 (m product terms) and the indices j_{m+1} to j_{m+n} correspond to the input x_2 (n product terms). It is evident that the kernel value is the same for any permutation of the first set of indices (as in the case of a self-kernel) or for any permutation of the second set of indices. However, permutations across the two sets of indices correspond to distinct kernel values (i.e., asymmetry of the cross-kernels about some diagonals). Note that the two sets of indices remain segregated by convention (i.e., the first m correspond to input x_1 and the last n correspond to input x_2). It follows from these observations that the kernel of order (m, n) has
L^{m+n}/(m! n!) distinct expansion coefficients, where L is the number of employed basis functions. The kernel expansion of Equation (7.6) leads to the following "two-input modified Volterra model":
y(t) = k_{0,0} + \sum_{j_1=1}^{L} c_{1,0}(j_1) v_{j_1}^{(1)}(t) + \sum_{j_1=1}^{L} c_{0,1}(j_1) v_{j_1}^{(2)}(t) + \cdots
     + \sum_{j_1=1}^{L} \cdots \sum_{j_{m+n}=1}^{L} c_{m,n}(j_1, \ldots, j_{m+n}) v_{j_1}^{(1)}(t) \cdots v_{j_m}^{(1)}(t) v_{j_{m+1}}^{(2)}(t) \cdots v_{j_{m+n}}^{(2)}(t) + \cdots     (7.7)
where v_j^{(i)} denotes the convolution of b_j with x_i. Note that each input may also employ its own set of basis functions (if such practice is deemed useful in a given application), as shown later for multiinput Volterra-equivalent network models. The expansion coefficients in the model of Equation (7.7) can be estimated through linear least-squares procedures because they enter linearly into the model (note that the signals v_j^{(i)}(t) are computed as convolutions between the basis functions and the respective input signal). Although the model of Equation (7.7) is written in continuous time, its discrete equivalent (resulting through appropriate sampling) is the one used in practice for estimation and other computational purposes (e.g., prediction). Thus, the estimation problem for the two-input Volterra model using the kernel-expansion technique follows the procedural steps elaborated previously for the single-input case. Likewise, in a manner similar to the single-input case, we can derive the equivalent PDM model for the two-input case, as depicted in Figure 7.1. The PDM model can be practically estimated for high-order systems and represents the most parsimonious form of a Volterra-type model.
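The following minimal sketch illustrates the least-squares estimation of the expansion coefficients in Equation (7.7), truncated at second order. A small bank of decaying exponentials stands in for the discrete Laguerre basis usually employed, and the records x1, x2, y are those of the earlier simulation sketch.

```python
import numpy as np

M, L = 20, 4                               # kernel memory and basis size
m = np.arange(M)
basis = np.stack([np.exp(-m / tc) for tc in (2.0, 4.0, 8.0, 16.0)])   # (L, M)
basis /= np.linalg.norm(basis, axis=1, keepdims=True)

def filter_bank(x):
    """v_j(n): convolution of each basis function with the input."""
    return np.stack([np.convolve(x, b)[: x.size] for b in basis])     # (L, N)

v1, v2 = filter_bank(x1), filter_bank(x2)

# Regression matrix: constant, first-order terms, and all distinct second-order
# products of the filter-bank outputs (self terms and cross terms).
cols = [np.ones(x1.size)]
cols += list(v1) + list(v2)
cols += [v1[i] * v1[j] for i in range(L) for j in range(i, L)]        # input-1 self
cols += [v2[i] * v2[j] for i in range(L) for j in range(i, L)]        # input-2 self
cols += [v1[i] * v2[j] for i in range(L) for j in range(L)]           # cross terms
V = np.stack(cols, axis=1)                                            # (N, P)

coef, *_ = np.linalg.lstsq(V, y, rcond=None)                          # linear LS
print("number of estimated expansion coefficients:", coef.size)
```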
Figure 7.1 The PDM model for the two-input case. The PDM output u_j^{(i)} is the convolution of the PDM (i, j) with the input x_i (i = 1, 2). The multiinput static nonlinearity f receives as inputs all the PDM outputs and generates the model output. The PDMs and f are determined by procedures similar to the ones described in Section 4.1.1 for the single-input case, using broadband input-output data.
The two-input PDM model is recommended as the key instrument for interpreting the observed dynamics and understanding the functional characteristics of the subject system. The PDMs and the associated multiinput static nonlinearity can be determined by means of procedures similar to the ones outlined in Section 4.1.1 for the single-input case, using the available input-output data. The latter ought to be broadband, and the two input signals ought to be weakly correlated, in order to achieve high-quality estimates of the PDMs and f. The reader should be reminded that the PDMs are determined either from Volterra kernel estimates or through Volterra-equivalent network models (discussed in the following section).
7.1.3 Volterra-Equivalent Network Models with Two Inputs
A most efficient approach to obtaining the two-input PDM model is the use of a "separable Volterra network" (SVN), whereby the multiinput static nonlinearity of Figure 7.1 is "sliced" into individual dual-input static nonlinearities associated with each pair of PDMs (one from each input). This leads to the two-input SVN model with a single hidden layer shown in Figure 7.2, whereby the activation functions of the hidden units are the aforementioned individual dual-input static nonlinearities. The PDMs are determined by the inbound weights of each hidden unit (each hidden unit corresponds to a distinct pair of PDMs) in conjunction with the respective basis functions of the filter bank employed for preprocessing the corresponding input. Note that the two filter banks may be distinct and the mathematical formulation is now in discrete time (by computational necessity), whereby n is the discrete-time index. As indicated in Figure 7.2, the internal variable u_h(n) of the hth hidden unit is composed as a weighted summation of all filter-bank outputs:

u_h(n) = \sum_{j=1}^{L_1} w_{h,j}^{(1)} v_j^{(1)}(n) + \sum_{j=1}^{L_2} w_{h,j}^{(2)} v_j^{(2)}(n)     (7.8)
Figure 7.2 The two-input separable Volterra network, where each input {x_1(n), x_2(n)} is preprocessed through the respective filter bank {b_j^{(i)}} (i = 1, 2; j = 1, ..., L_i), and the respective filter-bank outputs are fed into the hidden units of the hidden layer with polynomial activation functions {f_h}. The output y(n) is formed by summation of the outputs of the hidden units {z_h(n)} and an offset y_0.
where w_{h,j}^{(i)} denotes the weight of the output v_j^{(i)}(n) of the jth filter in the ith filter bank, given by the convolution

v_j^{(i)}(n) = \sum_{m=0}^{M-1} b_j^{(i)}(m) x_i(n - m)     (7.9)

where b_j^{(i)} denotes the basis function that is the impulse response of the jth filter in the ith filter bank. The PDM (i, h) in the two-input case is given by

p_{i,h}(m) = \sum_{j=1}^{L_i} w_{h,j}^{(i)} b_j^{(i)}(m)     (7.10)

where m denotes the discrete-time lag of the PDM. The internal variable u_h(n) of each hidden unit is transformed by the polynomial static nonlinearity f_h to produce the output of the hth hidden unit:

z_h(n) = f_h[u_h(n)] = \sum_{q=1}^{Q} c_{h,q} u_h^q(n)     (7.11)

The output of the network model is formed by simple summation of the outputs of the hidden units and an offset y_0:

y(n) = y_0 + \sum_{h=1}^{H} z_h(n)     (7.12)
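A minimal sketch of the forward pass of this network, per Equations (7.8), (7.11), and (7.12), is given below. The weights are random placeholders (in practice they are trained iteratively on the input-output data), and the filter-bank outputs v1, v2 are those computed in the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
H, Q = 2, 2                                # hidden units, polynomial degree
L1, L2 = v1.shape[0], v2.shape[0]          # filter-bank sizes (previous sketch)
W1 = 0.1 * rng.standard_normal((H, L1))    # weights w_{h,j}^{(1)} (placeholders)
W2 = 0.1 * rng.standard_normal((H, L2))    # weights w_{h,j}^{(2)} (placeholders)
C = 0.1 * rng.standard_normal((H, Q))      # polynomial coefficients c_{h,q}
y0 = 0.0                                   # output offset

def svn_output(v1, v2):
    """Forward pass of the two-input SVN, per Eqs. (7.8), (7.11), (7.12)."""
    u = W1 @ v1 + W2 @ v2                                      # (H, N), Eq. (7.8)
    z = sum(C[:, q - 1 : q] * u**q for q in range(1, Q + 1))   # Eq. (7.11)
    return y0 + z.sum(axis=0)                                  # Eq. (7.12)

y_pred = svn_output(v1, v2)                # training would adjust W1, W2, C, y0
```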
It is evident from Equations (7.8) and (7.10) that the PDM outputs for the two inputs combine in summative pairs prior to the nonlinear transformation of Equation (7.11), which results in "nonlinear interaction" terms between the two inputs as they impact the output of the model. Thus, even though the activation function f_h is univariate, it contains nonlinear interaction terms between the two inputs. A critical advantage of this modeling approach over the previous two (cross-correlation and kernel-expansion) is that the number of free parameters grows linearly with the nonlinear order Q (with a rate of increase equal to H), making it very efficient for high-order systems. This issue is further discussed in Section 7.3 in connection with multiinput modeling. Note that the number of free parameters for the network model of Figure 7.2 is (L_1 + L_2 + Q)H + 3, where one free parameter is allowed for each filter bank. The correspondence between the Volterra kernels of the two-input system and the parameters of the Volterra-equivalent network of Figure 7.2 is found in a manner similar to the single-input case. By proper substitution of the analytical expressions (7.8)-(7.12), we obtain

k_{0,0} = y_0     (7.13)

k_{1,0}(m) = \sum_{h=1}^{H} c_{h,1} p_{1,h}(m)     (7.14)

k_{0,1}(m) = \sum_{h=1}^{H} c_{h,1} p_{2,h}(m)     (7.15)

k_{2,0}(m_1, m_2) = \sum_{h=1}^{H} c_{h,2} p_{1,h}(m_1) p_{1,h}(m_2)     (7.16)

k_{0,2}(m_1, m_2) = \sum_{h=1}^{H} c_{h,2} p_{2,h}(m_1) p_{2,h}(m_2)     (7.17)

k_{1,1}(m_1, m_2) = \sum_{h=1}^{H} c_{h,2} [p_{1,h}(m_1) p_{2,h}(m_2) + p_{1,h}(m_2) p_{2,h}(m_1)]     (7.18)
(7.18)
etc. Upon training of the network model of Figure 7.2 with the input-output data, the PDMs and/or the Volterra kemels ofthe system can be constructed by means ofthe expressions given above. This methodology has been shown to perform exceptionally weIl, even for short data records and in the presence of considerable noise. An illustrative example of a simulated system is given below and examples of applications to actual physiological systems (neurosensory, cardiovascular, and metabolic) are given in Section 7.2. In closing this section, we should note that an alternative network configuration may prove beneficial in certain cases (and especially in the multiinput case), whereby a second hidden layer (termed the "interaction layer") is used to capture the interactions among various inputs. Because this architecture is deemed to be primarily useful for scalability purposes in the multiinput case, it is discussed in detail in Section 7.3.3.
Illustrative Example. To illustrate the performance of the network-based approach, we use two Laguerre filter banks (with distinct parameters alpha) to model a simulated system with two inputs (for which we have "ground truth") that is described by the differential equations dy(t) = [-ao + CjZj(t)- czzz(t)}Y(t) + Yo dt
(7.19)
dzj(t) = -ajZj(t) + Xj(t) dt
(7.20)
dzz(t) = -azzz(t) + xz(t) dt
(7.21)
where x_1(t) and x_2(t) are the two inputs of the system, y(t) is the output, and z_1(t), z_2(t) are state variables. The values of the parameters used for the simulation are a_0 = 0.5, a_1 = 0.25, a_2 = 0.05, c_1 = 0.3, and c_2 = 0.15. Models of this form have been used in physiology (e.g., in the metabolic and endocrine system [Bergman et al., 1981; Carson et al., 1983], where the bilinear terms represent modulatory action). In the context of glucose regulation, the first equation describes the dependence of blood glucose concentration y(t) on glucagon action z_1(t) and insulin action z_2(t), which are assumed to have first-order kinetics. A model of similar structure has been proposed by Robert Pinter to describe lateral inhibition in the visual system [Pinter, 1984, 1985, 1987; Pinter & Nabet, 1992]. The nonlinearity of this system lies in the two bilinear terms in Equation (7.19), which give rise to an infinite-order equivalent Volterra model. The relative contribution of the two nonlinear terms to the qth-order Volterra functional is proportional to the qth powers of P_1^{1/2} c_1 and P_2^{1/2} c_2, where P_1 and P_2 are the power levels of the two inputs x_1 and x_2, respectively.
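For readers who wish to reproduce such input-output data, a forward-Euler discretization of Equations (7.19)-(7.21) suffices. This is a sketch under stated assumptions: the integration step T is an assumed value (the original sampling step is not stated here), and the constant y_0 = 1 is inferred from the quoted zero-input steady state y = y_0/a_0 = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 1024, 0.5                 # record length; T is an assumed integration step
a0, a1, a2 = 0.5, 0.25, 0.05     # parameter values given in the text
c1, c2, y0 = 0.3, 0.15, 1.0      # y0 = 1 inferred from the steady state y0/a0 = 2

x1 = rng.standard_normal(N)      # two independent unit-variance GWN inputs
x2 = rng.standard_normal(N)
y, z1, z2 = np.zeros(N), np.zeros(N), np.zeros(N)    # zero initial conditions
for n in range(N - 1):           # forward-Euler integration of Eqs. (7.19)-(7.21)
    z1[n + 1] = z1[n] + T * (-a1 * z1[n] + x1[n])
    z2[n + 1] = z2[n] + T * (-a2 * z2[n] + x2[n])
    y[n + 1] = y[n] + T * ((-a0 + c1 * z1[n] - c2 * z2[n]) * y[n] + y0)
```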
Table 7.1 The NMSEs for the network model prediction and for the kernel estimates of the two-input system described by Equations (7.19)-(7.21) for two values of output SNR (10 and 0 dB). Note that the 10 dB results are for a single run of 1024 input-output data points, but the 0 dB results are the average and standard deviation values over 50 independent runs.

Output SNR   Prediction NMSE (%)   k1(m)          k2(m)          k11(m1,m2)      k22(m1,m2)     α2              α1
10 dB        10.14                 0.856          0.818          6.15            4.50           0.241           0.696
0 dB         49.70 ± 2.24          2.41 ± 1.52    3.47 ± 3.55    26.80 ± 16.20   13.50 ± 7.14   0.173 ± 0.060   0.648 ± 0.104

(Kernel NMSEs are in %; α1 and α2 are the estimated Laguerre parameters of the two filter banks.)
Since the magnitudes of c_1 and c_2 are both smaller than one in this example, a truncated Volterra model can be used to approximate the system. For the above values of c_1 and c_2, it was found that a second-order Laguerre-Volterra network model was sufficient to model the system. The Volterra kernels of this system can be analytically derived by the generalized harmonic balance method presented in Section 3.2. This system is simulated for two independent unit-variance GWN inputs with a length of 1024 data points (for zero initial conditions). Independent GWN signals are added to the output signal for an SNR of 10 dB, in order to examine the effect of output-additive noise and the robustness of this approach. A two-input Laguerre-Volterra network with five DLFs in each filter bank (L_1 = L_2 = 5) and two hidden units (H = 2) with second-degree polynomial activation functions (Q = 2) is selected using the model-order selection criterion of Section 2.3.1. The NMSEs of the resulting model prediction and of the kernel estimates are given in Table 7.1. Note that the ideal NMSE of the model prediction for output SNR = 10 dB is 10%. The estimated first-order and second-order Volterra kernels are shown in Figures 7.3 and 7.4, respectively. The estimated zeroth-order kernel was equal to 1.996, very close to its true value of 2. The results demonstrate the excellent performance of this approach, especially relative to the conventional cross-correlation method, for which the kernel estimates are unacceptable for such short data records, as shown in Figures 7.5 and 7.6.
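The NMSE figures quoted here (and throughout the applications below) are ratios of error power to reference power. A minimal sketch of the usual convention, normalizing by the power of the demeaned reference signal:

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean-square error in percent (normalization by the
    power of the demeaned reference; one common convention)."""
    y_true = np.asarray(y_true, dtype=float)
    resid = y_true - np.asarray(y_pred, dtype=float)
    return 100.0 * np.sum(resid**2) / np.sum((y_true - y_true.mean())**2)
```

With any such convention, additive output noise that no model of the noise-free system can predict sets a floor on the achievable prediction NMSE; this floor is the "ideal" NMSE quoted in the text for each SNR.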
Figure 7.3 Estimated first-order Volterra kernels of the simulated two-input system for SNR = 10 dB and N = 1024 data points (solid lines): left panel for x1 and right panel for x2. The exact kernels are shown with dotted lines.
Figure 7.4 Estimated second-order Volterra kernels of the simulated two-input system for SNR = 10 dB and N = 1024 data points. The cross-kernel (right panel) shows negative values, although the self-kernel of the "negative" modulator z2 (middle panel) shows positive values.
The robustness of this approach is further demonstrated by decreasing the output SNR to 0 dB and repeating the estimation procedure for 50 independent runs. The resulting NMSEs of model prediction and kernel estimation are given in Table 7.1 as well (average and standard deviation values over the 50 independent runs) and corroborate the robustness of this approach (note that the ideal NMSE of the model prediction for SNR = 0 dB is 50%).
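The noise scaling behind these SNR settings can be made explicit. A small sketch (with a stand-in output signal) of how independent noisy realizations for the 50-run ensemble would be generated:

```python
import numpy as np

def add_output_noise(y, snr_db, rng):
    """Return y plus GWN scaled so that var(y)/var(noise) = 10**(snr_db/10)."""
    noise = rng.standard_normal(len(y))
    noise *= np.sqrt(np.var(y) / (np.var(noise) * 10.0**(snr_db / 10.0)))
    return y + noise

rng = np.random.default_rng(1)
y = np.sin(0.05 * np.arange(1024))     # stand-in for the simulated noise-free output
runs_0db = [add_output_noise(y, 0.0, rng) for _ in range(50)]   # 50 independent runs
```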
Figure 7.5 Estimated first-order kernels of the simulated two-input system for SNR = 10 dB and N = 1024 data points, using the cross-correlation technique (dashed lines) or the Laguerre-Volterra network-based approach (solid lines).
Figure 7.6 Cross-correlation estimates of the second-order kernels of the simulated two-input system for SNR = 10 dB and N = 1024 data points. Comparison with the respective Laguerre-Volterra network-based estimates of Figure 7.4 demonstrates the superior performance of the advocated approach over the conventional cross-correlation technique.
We conclude this example with a note on the steady-state behavior of this system/model. For step inputs x_1(t) = A_1 u(t) and x_2(t) = A_2 u(t), the state equations (7.20) and (7.21) settle at z_1 = A_1/a_1 and z_2 = A_2/a_2, so the steady-state (asymptotic) value of the output is

    y = \frac{y_0}{a_0 - (c_1/a_1) A_1 + (c_2/a_2) A_2}

which explains the polarity of the first- and second-order kernels for the two inputs (corresponding to the signs of the first and second partial derivatives of this nonlinearity).

7.2 APPLICATIONS OF TWO-INPUT MODELING TO PHYSIOLOGICAL SYSTEMS

To honor the pioneering contributions of my brother Panos and his associates to the problem of two-input Volterra-Wiener modeling and to provide proper historical perspective, we begin with the presentation of the first two applications to actual physiological systems: one on directionally selective cells in the fly eye [Marmarelis & McCann, 1973] and the other on the center-surround organization of receptive fields in the catfish retina [Marmarelis & Naka, 1973c]. Both of these initial applications employed the two-input variant of the cross-correlation technique. Subsequently, we present two recent applications of two-input modeling using the Laguerre-Volterra network-based approach (which is shown to be far more efficacious) to analyze natural data of spontaneous activity, which offer valuable insight into the metabolic autoregulation in dogs (Section 7.2.3) and the cerebral autoregulation in humans (Section 7.2.4).

7.2.1 Motion Detection in the Invertebrate Retina
The compound eye of insects consists of a matrix of ommatidia, each containing a small number of retinula cells (eight in the fly ommatidium) and having a distinct optical axis.
Motion-sensitive neurons, located in the optic lobe of the insect brain, respond maximally to motions along two axes (approximately horizontal and vertical). For each axis, there is a pair of neurons tuned to respond maximally to each direction (i.e., for the horizontal axis, there is one fiber responding maximally to motions from left to right and one responding to motions from right to left). The functional properties (dynamics) of each fiber type can be examined with a two-input experiment in which the stimulus consists of two spots of light placed along one axis of motion and whose intensity is modulated by two independent GWN signals (with bandwidth of 80 Hz in this fly experiment). The output is the spike-train response of the motion-sensitive cell measured by an extracellular electrode. The output spike record is converted into continuous form as a peristimulus histogram by repeating the GWN stimulus (60 sec duration) 12 times and histogramming the spike frequency in each time bin [Marmarelis & McCann, 1973]. Figure 7.7 (top-left panel) shows the two first-order Wiener kernels h_{1a} and h_{1b}, corresponding to the two spots a and b, which are quite different. If an impulsive light stimulus is given at input b, a positive response (increase from the mean response level) will be evoked, while an impulse given at input a will elicit a very small negative response (slight decrease from the mean response level). Figure 7.7 also shows the obtained second-order Wiener kernels (a cross-kernel h_{2ab} and two self-kernels h_{2aa} and h_{2bb}) presented as arrays of numerical values (the self-kernels are symmetric). We observe that the cross-kernel exhibits the asymmetry expected in directionally selective cells, and that the self-kernel h_{2aa} is very small (nearly null), whereas the self-kernel h_{2bb} has significant values (almost as large as the cross-kernel values). The contribution of these kernels to the model response can be seen in Figure 7.8. It should be noted that while the Wiener self-kernels describe the contribution of each input separately, their effect is generally dependent upon the presence of the other input (see the earlier analysis of Wiener and Volterra kernels in the two-input case). The cross-kernel h_{2ab}(τ1, τ2) exhibits directional selectivity in motion detection because it has a large positive mount in the region τ1 > τ2 (forward motion), while in the region τ2 > τ1 it has a large negative valley (reverse motion). This cross-kernel describes quantitatively the contribution to the system response that is due to the nonlinear dynamic interaction between the two input signals. From this kernel we see that a pulse at a followed by a pulse at b (i.e., τ1 > τ2) will elicit a large positive response, while a pulse at b followed by a pulse at a (i.e., τ2 > τ1) will produce a negative response. The temporal extent of such "cross-talk" between the two inputs (about 60 msec in duration) and the precise time course of the elicited response are fully described by h_{2ab}. The experimental and kernel-predicted responses shown in Figure 7.8 demonstrate that the system is strongly nonlinear and the cross-kernel contribution is significant.
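The kernels in Figure 7.7 were obtained with the two-input variant of the cross-correlation technique. In discrete time, the corresponding Lee-Schetzen estimates reduce to a few lines; the sketch below computes the two first-order kernels and the cross-kernel (the self-kernels follow the same pattern, using a single input twice and a normalization of 1/(2P^2)):

```python
import numpy as np

def xcorr_kernels_2in(xa, xb, y, M):
    """First-order Wiener kernels and second-order cross-kernel of a
    two-input system via cross-correlation (discrete time; Pa, Pb are
    the input power levels, here taken as the sample variances)."""
    N = len(y)
    Pa, Pb = xa.var(), xb.var()
    yc = y - y.mean()                 # remove the zeroth-order (mean) response
    h1a = np.array([np.mean(yc[m:] * xa[:N - m]) for m in range(M)]) / Pa
    h1b = np.array([np.mean(yc[m:] * xb[:N - m]) for m in range(M)]) / Pb
    h2ab = np.empty((M, M))
    for m1 in range(M):
        for m2 in range(M):
            m = max(m1, m2)
            h2ab[m1, m2] = np.mean(yc[m:] * xa[m - m1:N - m1] * xb[m - m2:N - m2])
    return h1a, h1b, h2ab / (Pa * Pb)
```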
7.2.2 Receptive Field Organization in the Vertebrate Retina
The typical "receptive field" (RF) organization for a neuron in the vertebrate retina consists of a central spot (center) and a concentric annulus (surround) that exhibit distinct response characteristics. Thus, a natural way to study the RFs of retinal cells is to consider them as two-input systems and employ two photic stimuli: a small spot of light covering the center (in this case, 0.3 mm in diameter) and a concentric annulus oflight covering the surround (in this case, with inner diameter of 0.3 mm and an outer diameter of 5 mm), each modulated by statistically independent GWN signals (light intensities modulated in GWN fashion with bandwidth 55 Hz and dynamic range of 1.6 log units). The response
Figure 7.7 First- and second-order Wiener kernels of the two-input experiment in the fly eye with two spot stimuli a and b. The first-order Wiener kernels are shown in the top-left panel, indicating a much stronger response for spot b. The second-order Wiener cross-kernel h2ab is shown in the bottom-right panel (as an array of numerical values) and exhibits the asymmetry expected in a directionally selective cell (positive mount above the diagonal and negative valley below the diagonal). The second-order Wiener self-kernel h2bb (bottom-left panel) is much stronger than the other self-kernel h2aa (top-right panel) [Marmarelis & McCann, 1973].
The response arising from the simultaneous stimulation of the two inputs can be separated into the components evoked by each stimulus separately, as well as the component due to the interaction of the two inputs, using the two-input modeling methodology to estimate the self-kernels for each input and the cross-kernel for a second-order model [Marmarelis & Naka, 1973c].
Figure 7.8 Experimental and model-predicted responses of the two-input system of a directionally selective cell in the fly eye. Note the significant contribution of the second-order cross-kernel (marked as "nonlinear interaction") and the minute contribution of the first-order kernels (marked as "linear model"). The significant contribution of the h2bb self-kernel is shown in the trace marked "nonlinear model (only self-terms)" [Marmarelis & McCann, 1973].
Figure 7.9 shows the first-order Wiener kernels for the light-to-horizontal cell system in the catfish retina obtained via cross-correlation in three cases: (1) single-input GWN spot stimulus, (2) single-input GWN annular stimulus, and (3) dual input composed of a spot and an annulus stimulus independently modulated by GWN signals. Hyperpolarization of the cell has been plotted upward. We note that the annular first-order kernel h_{1a|s} in the presence of the spot stimulus is very similar to h_{1a} (in the absence of the spot stimulus). However, the spot kernel in the presence of the annular stimulus, h_{1s|a}, is larger and faster than h_{1s}, which implies that the mechanism responsible for the generation of the horizontal cell response to a light spot stimulus becomes faster and its gain increases in the presence of a positive annular input. On the other hand, the presence of the spot stimulus does not affect the annular response to any appreciable extent (unlike the bipolar cell response, which is also shown). Figure 7.9 also shows portions of the two GWN input signals (one for the spot and the other for the annulus) and the resulting horizontal cell response obtained experimentally, together with the corresponding model response to these same inputs.
Figure 7.9 First-order Wiener kernels and experimental and model responses of the two-input system (spot/annulus) for the horizontal and the bipolar cell in the catfish retina (single-input kernels: a, annulus; s, spot; two-input kernels: a|s, annular component; s|a, spot component) (see text for details) [Marmarelis & Naka, 1973c].
The model response was computed from all Wiener kernel estimates up to second order. Agreement between the system response and the model prediction is very good (NMSE is about 5%). The first-order contribution alone brings the error down to 12%, suggesting that the system is fairly linear. Similar results are shown in Figure 7.9 for the light-to-bipolar cell system in the catfish retina [Marmarelis & Naka, 1973c]. It is evident from Figure 7.9 that the response time of the horizontal cell (latency of the peak of the first-order kernel) is longer for the spot input, but decreases in the presence of the other input. For both bipolar and horizontal cells, the presence of the other input amplifies the entire waveform of the respective kernel, except for the horizontal annular kernel. It is also evident that the bipolar cell has a biphasic RF (positive spot and negative annular kernels for this on-center bipolar cell), but the horizontal cell has a monophasic RF (positive spot and annular kernels) for this level of stimulation. It has been found that for higher levels of stimulation, the annular first-order Wiener kernel of the horizontal cell exhibits a negative undershoot after approximately 100 ms lag (consistent with our nonlinear feedback analysis of Sec. 4.1.5).

The results of the two-input experiment on the horizontal cell suggest an enhancement of the spot response in the presence of an annulus stimulus, but not vice versa. In the catfish retina, the interaction of the spot and annular stimuli in the external horizontal cells is facilitatory (the sum of the spot and annular responses is smaller than the response obtained by simultaneous presentation of the two stimuli), whereas in the internal horizontal cells it is neutral (insignificant). Marmarelis and Naka have shown, through an analytical solution of the spatial distribution of potential in the catfish horizontal cell layers, that these results can be explained by an increase in the space constant due to the annular stimulus in the case of the external horizontal cells. Thus, the catfish horizontal cells may perform a dual function: produce the integrated receptive-field response (i.e., setting the operating point at the average intensity level) and improve the frequency response of the initial retinal processing stages at high intensity levels by means of negative feedback to the receptors (see also Sec. 4.1.5).

We now turn to the study of the RF of the retinal ganglion cell, which is responsible for encoding incoming visual information into spike trains transmitted through the optic nerve to the LGN and the visual cortex. We begin by showing in Figure 7.10 the first-order and second-order Wiener kernels of the single-input, light-to-ganglion cell system for three different GWN stimuli: (1) annulus, (2) spot, and (3) uniform field (i.e., covering both annulus and spot area). It is evident that the annulus pathway input is dominant, since the annular kernels are much closer to the field kernels than the spot kernels. We proceed with the two-input study, in which the spot and annulus stimuli are independent GWN signals presented simultaneously to the retina. The obtained first-order Wiener kernels are shown in Figure 7.11 along with their single-input counterparts and demonstrate the fact that the presence of the other stimulus alters the dynamics of both pathways (suggesting strong lateral interconnections between the center and surround areas of the RF). This effect of lateral interaction is much more pronounced on the spot pathway (as a result of the presence of the independent annular stimulus), as demonstrated also by the power spectra of the various response components shown in Figure 7.12. The second-order Wiener kernels for the two-input case are shown in Figure 7.13 for Type A and Type B ganglion cells. These second-order kernels demonstrate the different nonlinear dynamics of the two RF pathways (center and surround) and the differences in the nonlinear dynamics of the two types (A and B) of ganglion cells. Most notably, it is observed that only the Type A ganglion cell exhibits directional selectivity (asymmetric cross-kernel).
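As noted above, the model responses plotted in Figures 7.8 and 7.9 are formed directly from the estimated kernels up to second order. In discrete time, such a second-order two-input prediction can be computed from generic kernel arrays (a sketch; names are illustrative):

```python
import numpy as np

def predict_2nd_order(xa, xb, k0, h1a, h1b, h2aa, h2bb, h2ab):
    """Second-order two-input model prediction (discrete time).
    h1a, h1b have length M; h2aa, h2bb, h2ab are (M, M) arrays."""
    M, N = len(h1a), len(xa)
    # Delay-embedding matrices: Xa[n, m] = xa(n - m), zero-padded at the start
    Xa = np.column_stack([np.concatenate((np.zeros(m), xa[:N - m])) for m in range(M)])
    Xb = np.column_stack([np.concatenate((np.zeros(m), xb[:N - m])) for m in range(M)])
    y = k0 + Xa @ h1a + Xb @ h1b
    y += np.einsum('ni,ij,nj->n', Xa, h2aa, Xa)    # self-term of input a
    y += np.einsum('ni,ij,nj->n', Xb, h2bb, Xb)    # self-term of input b
    y += np.einsum('ni,ij,nj->n', Xa, h2ab, Xb)    # cross-term (nonlinear interaction)
    # For Wiener kernels driven by GWN, the constants Pa*trace(h2aa) and
    # Pb*trace(h2bb) would also be subtracted (second-order Wiener functionals).
    return y
```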
Figure 7.10a First-order Wiener kernels for the light-to-ganglion (Type A) cell system in the catfish retina obtained for three different GWN light stimuli: spot, annulus, and field. The field stimulus is the combination of spot and annulus. Ordinate units are in (spikes/sec)/(μW/mm²) and 20 units correspond to a change of 50 spikes/sec from the mean firing rate caused by a brief flash at the mean intensity level [Marmarelis & Naka, 1973c].
Figure 7.10b Contour plots of the second-order Wiener kernels of the light-to-ganglion cell system in the catfish retina for three different GWN stimuli: annulus (left), spot (middle), and field (right). The units of h2 are (spikes/sec)/(μW/mm²)². The elevation contours are drawn at equal increments [Marmarelis & Naka, 1973c].
Figure 7.11 First-order Wiener kernels for the light-to-ganglion cell (Type A) system in the catfish retina obtained from one-input (spot or annulus) and two-input (spot and annulus) experiments. Ordinate scales are similar to those in Figure 7.10a [Marmarelis & Naka, 1973c].
We attribute these differences to the distinct connectivity of presynaptic amacrine cells. We must also note that the obtained first-order Wiener kernels for the two types (A and B) of ganglion cells in the two-input case have similar waveforms and reverse polarity (positive for spot and negative for annular kernels of the Type A ganglion cell, and reverse polarities for the Type B ganglion cell) [Marmarelis & Naka, 1973c].
Figure 7.12 Power spectra of the inputs and outputs of the light-to-ganglion cell (Type A) system. One-input response power spectra are compared with two-input power spectra with similar stimulus configuration. The notation "annulus (spot)" indicates the annular component of the two-input model prediction in the presence of the spot stimulus. Curves marked "model" and "experiment" are the spectra of the two-input model and the experimental responses. Spectra marked "spot" and "annulus" are computed from the responses of the one-input experiments [Marmarelis & Naka, 1973c].
Figure 7.13a The second-order Wiener kernels for the two-input light-to-ganglion (Type A) cell system. Sh2(τ1, τ2) denotes the spot self-kernel, Ah2(τ1, τ2) the annular self-kernel, and SAh2(τ1, τ2) the spot-annulus cross-kernel. Kernel units are in volts/(μW/mm²)² [Marmarelis & Naka, 1973c].
Figure 7.13b The second-order Wiener kernels for the two-input light-to-ganglion (Type B) cell system. Note the lack of significant asymmetry in the cross-kernel [Marmarelis & Naka, 1973c].
Representative first-order Wiener kernels of Type N amacrine cells in the catfish retina (both NA and NB, connecting to Type A and Type B ganglion cells, respectively) for single-input and dual-input experiments are shown in Figure 7.14. We note the suppression of the spot kernel in the presence of the annular input for the Type NA amacrine cell (but not for the Type NB amacrine cell), indicative of strong lateral interactions that are consistent with the previously observed directional selectivity of the Type A ganglion cell (which is postsynaptic to the Type NA amacrine cell). The observed differences in RF center-surround organization (in terms of response characteristics) and in relative degree of nonlinearity (measured by the relation between first-order and second-order responses) led Panos Marmarelis and Ken Naka to a clever classification scheme, portrayed in Figure 7.15, whereby various types of horizontal, bipolar, amacrine, and ganglion cells in the catfish retina are shown to form distinct clusters.

These results suggest that, throughout the catfish retina, the central RF mechanism is a slower process when it is excited alone, but it becomes faster (both latency-wise and bandwidth-wise) in the presence of an annular stimulus.
Figure 7.14 (A) First-order Wiener kernels from Type NA amacrine cells and (B) from Type NB amacrine cells in the catfish retina. Two sets of kernels are shown, one set from two-input spot/annulus experiments and the other from single-input experiments, under the same conditions. In A, traces 1 and 3 are h1s|a and h1s, and traces 2 and 4 are h1a|s and h1a. Note the complete suppression of the spot kernel in the presence of the annular input. In B, traces 1 and 3 are h1s|a and h1s, and traces 2 and 4 are h1a|s and h1a. No other significant first-order effects are observed. Upward deflection is for hyperpolarization of the membrane potential [Marmarelis & Naka, 1973c].
This "speed-up" of the central RF mechanism most likely takes place at the level of the outer plexiform layer, since it is first observed in the horizontal cell responses and is replicated at the ganglion cell level. This study also shows that, as the mean intensity level is increased, the retinal cell kernels resulting from field or annular inputs become less damped, whereas the kernels resulting from spot inputs remain overdamped. This can be explained by the stipulation of a nonlinear feedback mechanism that is inactive at low intensity levels but becomes active at higher intensity levels. The existence of a negative feedback mechanism results in an improvement of the frequency-response characteristics of the system, as the bandwidth of the system is extended, consistent with the analysis presented in Section 4.1.5.
7.2.3 Metabolic Autoregulation in Dogs
The effect of insulin on the concentration of blood glucose has been studied extensively in the context of diabetes mellitus and the treatment of Type I diabetic patients with insulin injections. The recent development of continuous glucose monitors and insulin micropumps has stimulated again the long-held interest in an "artificial pancreas," which requires a thorough understanding of the dynamic insulin-glucose relationship and reliable means to control the release of insulin in order to maintain fairly steady glucose levels in the blood. Achieving these important objectives requires reliable nonlinear-dynamic models that can also be used for improved diagnostic purposes in a clinical context. Examples of such models were given in Section 6.4.

In this section, we consider a two-input modeling task that seeks to reveal and quantify the causal links between two simultaneous inputs (plasma insulin and free fatty acids) and the concentration of plasma glucose (output).
Figure 7.15 Clusters of functional classification of catfish retinal neurons. The lower-case designation "a" or "b" denotes depolarizing or hyperpolarizing cells, respectively (earlier denoted as Type A and Type B). The Y cells are ganglion cells with strong nonlinearity [Marmarelis & Naka, 1973c].
The experimental data are from an anesthetized dog under conditions of spontaneous activity (i.e., normal closed-loop operation without external infusion of insulin, free fatty acids, or glucose). The samples were collected every 3 min over a 10 hr period (200 data points for each of the three variables) [Marmarelis et al., 2002]. Application of the method presented in Section 7.1.3 (for L = 5, H = 2, Q = 3) yielded the PDM model shown in Figure 7.16, which has a single pair of PDMs (one for each of the two inputs). The PDM corresponding to the insulin input exhibits glucoleptic characteristics (i.e., it causes a reduction in glucose concentration as a result of an increase in insulin concentration). The dynamics of this glucoleptic effect show a maximum response after ~3 hr and an earlier, smaller peak after ~1 hr, whereas the effect is diminished after 7-8 hr. The other PDM, corresponding to the free fatty acids (FFA) input, exhibits a positive effect on glucose concentration (as expected), with the maximum response occurring almost immediately (zero-lag point in the PDM curve) and a secondary peak after ~1 hr. The FFA effect is diminished after ~3 hr. The combined effect of these two PDMs is determined by the associated static nonlinearity shown on the right of the PDM model, which transforms the sum of the outputs of the two PDMs. It is evident from the form of this concave nonlinearity that an increase of the FFA input will reduce the sensitivity of glucose uptake to insulin (glucoleptic reduction), because it will move the "operating point" of the insulin-glucose relation higher, where the slope of the nonlinear curve (sensitivity gain) is smaller.
Figure 7.16 The PDM model of the two-input system defined by the spontaneous variations in the concentrations of plasma insulin and free fatty acids (FFA) as inputs and glucose as output in dogs. The two PDMs quantify the effect of elevated FFA on insulin sensitivity.
This lucid result (if confirmed by additional data and analysis) can explain the effect of obesity on insulin sensitivity and potentially elucidate, in a quantitative manner, one major factor for the onset of Type II (or adult-onset) diabetes. The prediction of this model is shown in Figure 7.17 for this set of experimental data and demonstrates good performance in terms of predicting the slow changes in glucose concentration, but it does not capture the high-frequency variations, which are viewed as systemic noise/interference. These high-frequency fluctuations ought to be examined separately, since they may not be part of the causal relations among the three considered variables.
7.2.4 Cerebral Autoregulation in Humans
Multiple homeostatic mechanisms regulate cerebral blood flow in humans to maintain a relatively constant level despite changes in cerebral perfusion pressure [Edvinsson & Krause, 2002; Panerai et al., 2000; Poulin et al., 1998; Zhang et al., 2000]. A modeling example of dynamic cerebral autoregulation was given in Section 6.2, whereby spontaneous fluctuations of beat-to-beat mean arterial blood pressure (MABP) were viewed as the input and the corresponding mean cerebral blood flow velocity (MCBFV) was viewed as the output [Mitsis et al., 2002, 2003a, c; Zhang et al., 1998, 2000]. A PDM model was obtained that revealed new information about the nonlinear dynamic properties of cerebral autoregulation and the quantitative manner in which rapid changes in arterial pressure induce rapid changes in cerebral flow.
Figure 7.17 The output prediction of the two-input model shown in Figure 7.16, demonstrating the prediction of glucose concentration based on insulin and FFA concentrations under spontaneous operating conditions in dogs.
The observed data cast doubt on the validity of the notion of "steady-state" analysis, since no such "steady state" is ever observed in the natural operation of the cerebral circulation, and advance the notion of dynamic (i.e., frequency-dependent) autoregulation. Furthermore, the obtained models have demonstrated that cerebral autoregulation is more effective in the low-frequency range (below 0.1 Hz), where most of the MABP spectral power resides and where the model exhibits significant nonlinearities. Therefore, most spontaneous MABP changes do not cause large MCBFV variations, because of cerebral autoregulation mechanisms that are more prominent in the low-frequency range (up to 0.1 Hz) and exhibit dynamic (i.e., frequency-dependent) nonlinearities.

It is also well established in the literature that changes in arterial CO2 tension cause vascular responses in cerebral vessels [Poulin et al., 1996, 1998]. The putative physiological mechanism is described by the pH hypothesis, which postulates that systemic CO2 crosses the blood-brain barrier and modulates the extracellular and perivascular [H+], thus changing the smooth muscle properties. Specifically, it is qualitatively known that hypercapnia induces vasodilation and hypocapnia causes vasoconstriction, but precise quantitative (dynamic) relations of this causal link were still lacking until the recent two-input study described below.
characterized by a time constant, a gain term, and a pure delay. A second compartment with a larger time constant (on the order of 7 min) had to be included for the hypocapnic response, since a secondary, slow adaptation process of MCBFV increase in response to the hypocapnic stimulus was reported. An asymmetry in the on-transient and off-transient responses to hypocapnia was also reported, with the on-transient being significantly faster and with a smaller gain than the off-transient, whereas a pure time delay equal to 3.9 sec was estimated [Poulin et al., 1998]. Nonlinear mathematical models were developed by Ursino et al. (1998, 2000) in order to describe the interactions between cardiovascular impedance, arterial CO 2 tension, and intracranial blood pressure, whereby the interaction between arterial CO 2 tension and cerebrovascular effects on autoregulation was modeled with a sigmoidal relationship. Since the easiest way to observe changes in arterial CO 2 tension is to monitor the breathto-breath end-tidal CO 2 (PETC02) variations, the latter measurements can be used as a surrogate to study the dynamic effects of CO 2 tension on vascular impedance by introducing the PETC02 signal as a second input (in addition to spontaneous MABP variations being the first input) and obtain a nonlinear dynamic model of MCBFV variations (output). This has been attempted in a modeling study by Panerai et al. (2000), that employed causal FIR filters and spontaneous breath-to-breath PETC02 variations to assess the effect of arterial CO 2 on MCBFV in a linear context. It was found that, when used along with beat-to-beat MABP variations, PETC02 variations improve the prediction performance of the model considerably. The dynamic characteristics ofthe MABP-to-MCBFV and PETC02-to-MCBFV relations were obtained in the form of impulse response functions and no significant interactions between the two input variables were reported [Panerai et al., 2000]. Here, we employ the two-input formulation of the LVN modeling approach, which is suitable for nonlinear systems with two inputs, in order to assess the nonlinear dynamic effects ofMABP an~fE'l'C02 on MCBFV as well as their-nenlinearinteractionsjlvlitsis et al., 2002, 2003a, cl. Six-minute data segments (360 data points) from ten normotensive subjects were used to train a two-input LVN with structural parameters LI = L 2 = 8, H = 3, Q = 3, resulting in 60 free parameters (a very low number ofparameters for a nonlinear model with memory extent of about 2 min or 160 lags). The achieved model parsimony results in significant performance improvement relative to conventional methods, such as the cross-correlation technique. To terminate the training procedure and avoid overtraining the two-input LVN model, the prediction NMSE is minimized for a two-minute forward segment of testing data (adjacent to the six-minute training data segment). The model is estimated for a sliding six-minute data window (with five-minute overlap) over 40-min recordings in order to track any nonstationary changes in the system. The mean values ofthe MABP, PETC02, and MCBFV data, averaged over the 40-min recordings, are 77.2 ± 9.1 (mm Hg) for MABP, 40.0 ± 1.9 (mm Hg) for P E TC02, and 59.1 ± 12.3 (ern/sec) for MCBFV. Typical six-minute data segments are shown in Figure 7.18, along with the spectra ofthe corresponding high-passed (at 0.005 Hz) data sets. The high-pass filtering is performed in order to eliminate very slow trends. 
Most of the signal power resides below 0.1 Hz, although the MCBFV signal exhibits some power up to 0.3 Hz. The average NMSEs of in-sample model prediction is given in Table 7.2 for LVN models with one input (MABP or P E TC02) and two inputs (MABP and PETC02) ofvarious orders (up to third). The complexity of the one-input and two-input models, in terms of the total number of free parameters, is the same. Although MABP variations explain most
Figure 7.18 Typical data segments (MABP, PETCO2, and MCBFV) used for two-input LVN (TI-LVN) model estimation. Top panels: time series; bottom panels: spectra of high-passed (at 0.005 Hz) signals [Mitsis et al., 2003a].
Although MABP variations explain most of the MCBFV variations, the incorporation of PETCO2 variations as an additional input in the model reduces the third-order model prediction NMSE by about 6%. The prediction NMSE achieved by the third-order model in the two-input case (TI-LVN) satisfies our model-order selection criterion, and a third-order model is selected. The prediction performance achieved by the TI-LVN model for a typical data segment is shown in Figure 7.19 (left panel), along with the contributions of the linear and nonlinear self-kernels and cross-kernels. The overall model prediction is very good, as attested to by the low value of the average NMSE (20%), and the contribution of the nonlinear terms is significant (about 20% average NMSE reduction). The right panel of Figure 7.19, where the spectrum of the MCBFV output is compared with the spectra of the third-order and first-order model residuals, shows that the nonlinearities reside mainly in the low frequencies below 0.08 Hz (the shaded area corresponds to the improvement achieved by the nonlinear terms).
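The frequency-domain comparison in the right panel of Figure 7.19 amounts to computing the spectra of the output and of the residuals of the first-order and third-order models; where the two residual spectra separate is where the nonlinear terms contribute. A sketch using Welch periodograms (SciPy assumed available):

```python
import numpy as np
from scipy.signal import welch

def residual_spectra(y, y_pred_lin, y_pred_full, fs=1.0):
    """Spectra of the demeaned output and of the first-order and
    full-model residuals, for plots like Figure 7.19 (right panel)."""
    f, s_out = welch(y - np.mean(y), fs=fs, nperseg=128)
    _, s_lin = welch(y - y_pred_lin, fs=fs, nperseg=128)
    _, s_full = welch(y - y_pred_full, fs=fs, nperseg=128)
    return f, s_out, s_lin, s_full
```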
Table 7.2 Average in-sample prediction NMSEs (± standard deviation) for one-input and two-input LVN models of cerebral autoregulation in 10 normal human subjects

Model order   MABP           PETCO2         MABP & PETCO2
1             42.2 ± 7.2%    93.2 ± 2.7%    38.2 ± 6.5%
2             25.7 ± 8.3%    78.2 ± 6.4%    22.0 ± 6.0%
3             26.8 ± 7.6%    71.7 ± 4.8%    20.2 ± 5.4%
Figure 7.19 Left panel: Actual MCBFV output and model predictions (total, linear, and nonlinear terms) for a typical data segment. Right panel: Spectra of actual output and model residuals (first-order and third-order total) [Mitsis et al., 2003a].
The contribution of each of the two model inputs, as well as their nonlinear interaction, can be seen in the left panel of Figure 7.20 for the same data segment. The top trace corresponds to the total (third-order) model prediction, the second trace corresponds to the contribution of the MABP input (linear and nonlinear terms), the third trace corresponds to the contribution of the PETCO2 input, and the bottom trace corresponds to the nonlinear interaction between the two inputs (second-order and third-order cross-kernels). The MABP component accounts for about 60% of the total model prediction power, the PETCO2 component accounts for an additional 17%, and the interaction component accounts for the remaining 23% in this example.
Figure 7.20 Left panel: Total (third-order) TI-LVN model prediction and contributions of MABP, PETCO2, and nonlinear interaction terms for the data segment of Figure 7.19. Right panel: Spectra of actual output, first-order and total (third-order) residuals [Mitsis et al., 2003a].
The spectra of the MABP and of the total model residuals are shown in the right panel of Figure 7.20, along with the MCBFV output spectrum. It is observed that most of the contribution of the MABP input lies above 0.04 Hz, whereas the contribution of the PETCO2 input and the two-input interaction lies primarily below 0.08 Hz, being most prominent below 0.04 Hz (as illustrated by the shaded area). The relative linear-nonlinear contribution of each input (MABP and PETCO2) to the total model prediction is illustrated in Figure 7.21, where each input contribution is decomposed into its linear (first-order Volterra) and nonlinear (second-order and third-order self) components. For this specific data segment, the power of the linear MABP component corresponds to about 80% of the total MABP power contribution, whereas the power of the linear PETCO2 component is approximately equal to that of its nonlinear counterpart. The aforementioned observations are consistent among different segments and/or subjects. However, considerable variability over time is observed in the form of the nonlinear self- and cross-Volterra kernels (see Section 9.4).

The first-order MABP Volterra kernel for one subject, averaged over the 40-min recording (6-min sliding data segments with a 5-min overlap), is shown in Figure 7.22, both in the time and frequency domains (log-linear plots, whereby the time lag values are incremented by one). The form of the kernel is consistent among different segments, as demonstrated by the tight standard deviation bounds. The high-pass characteristic of the first-order frequency response implies that slow MABP changes are attenuated more effectively (i.e., autoregulation of pressure variations is more effective in the low-frequency range below 0.04 Hz, where most of the power of spontaneous pressure variations resides). Two resonant peaks are evident around 0.06 Hz and 0.2 Hz, with a secondary one appearing around 0.025 Hz.
Figure 7.21 Decomposition of the contributions of the MABP (left) and PETCO2 (right) inputs in terms of linear and nonlinear components for the data segment of Figure 7.19 [Mitsis et al., 2003a].
Figure 7.22 Average first-order MABP kernel (solid line) and corresponding standard deviation bounds (dotted lines) for one subject over a 40-min data record (see text). Left panel: time domain. Right panel: FFT magnitude [Mitsis et al., 2003a].
Compared to the first-order frequency response obtained when only MABP is used as an input [Mitsis et al., 2002], the MABP-to-MCBFV first-order frequency response in the two-input case exhibits reduced gain in the low-frequency range, reflecting the fact that most of the low-frequency MCBFV variations are explained by PETCO2 fluctuations. In the higher frequency ranges, the MABP first-order kernels are not affected by the inclusion of PETCO2 as an additional input. The average first-order PETCO2 kernel for the same subject is shown in Figure 7.23. It should be noted that the results shown here are obtained without shifting the PETCO2 data. Since it is known that a pure delay of 3-4 sec is present in the PETCO2 dynamics, significant variance results in the initial time lags of the first-order and second-order kernel estimates of the PETCO2 input. The first-order PETCO2 frequency response (right panel of Figure 7.23) has most of its power below 0.04 Hz and especially below 0.02 Hz (notice the shoulder at 0.03 Hz) and exhibits a secondary peak around 0.15 Hz, giving the PETCO2 first-order frequency response a low-pass characteristic, rather than the high-pass characteristic of its MABP counterpart.

Typical second-order MABP and PETCO2 self-kernels and the corresponding cross-kernel are shown in Figure 7.24. Most of the power of the second-order kernels lies below 0.04 Hz, with some additional peaks around 0.06 Hz and 0.16 Hz. There are two diagonal peaks in the MABP second-order frequency response: a main peak at [0.02, 0.02 Hz] and a secondary peak at [0.16, 0.16 Hz], as well as off-diagonal peaks at the bifrequency points [0.02, 0.06 Hz] and [0.02, 0.16 Hz]. The PETCO2 second-order frequency response has a main diagonal peak also at [0.02, 0.02 Hz] and a secondary peak at [0.02, 0.06 Hz]. The main cross-kernel peak occurs at [0.02, 0.02 Hz], with two secondary peaks at [0.16, 0.02 Hz] and [0.06, 0.02 Hz] (note the asymmetry of the cross-kernel), which are related to the MABP and PETCO2 self-kernel main peaks and imply nonlinear interactions between the primary mechanisms of the two inputs acting at these specific frequency bands.
Figure 7.23 Average first-order PETCO2 kernel (solid line) and corresponding standard deviation bounds (dotted lines) for one subject over a 40-min data record (see text). Left panel: time domain. Right panel: FFT magnitude [Mitsis et al., 2003a].
Although the second-order kernels are considerably variable among different data segments, the main diagonal peak of the second-order MABP and PETCO2 frequency responses stays in the neighborhood of 0.02 Hz and consistently defines one coordinate value for the secondary off-diagonal peaks in the bifrequency domain, while the other coordinate value lies in the mid-frequency range (0.05-0.10 Hz) and the high-frequency range (0.10-0.25 Hz). The cross-kernel peaks are related in general to the self-kernel peaks.

In order to illustrate the performance of the two-input model, we simulate it for hypercapnic and hypocapnic pulses with a magnitude of 1 mm Hg (onset at 20 sec and offset at 420 sec), followed by a shorter MABP pulse with a magnitude of 8 mm Hg applied between 150 and 300 sec. The onset/offset times are selected to allow sufficient settling time, based on the estimated kernel memories. The corresponding MCBFV model responses for a typical subject/model are illustrated in Figure 7.25 and demonstrate that: (1) hypercapnia increases MCBFV and hypocapnia reduces it (as expected); (2) the on-transient and off-transient responses to the MABP step are distinct in waveform (i.e., not symmetric), and the magnitude of the off-transient peak deflection is slightly larger than the corresponding on-transient peak deflection; (3) the settling time of the hypercapnic on-response to the MABP step is larger (around 50 sec) than that of the normocapnic or hypocapnic on-response (around 25 sec); (4) the settling time for the off-response transient to the MABP step is about the same (~40 sec) in all cases; and (5) the on- and off-responses to the PETCO2 step are roughly symmetrical in waveform, although the size of the hypocapnic steady-state response is slightly larger (~20%). The form of the responses to MABP and PETCO2 step changes demonstrates the autoregulatory characteristics of the obtained models in a quantitative manner, but exhibits some differences compared to previously reported results, where the MCBFV responses to step increases in PETCO2 were found to be much slower than the responses to step decreases in PETCO2. This may be due to the fact that our model was estimated based on small fluctuations of PETCO2 around its mean value, or that the model does not account properly for closed-loop processes active in this system (see Chapter 10).
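The simulation protocol behind Figure 7.25 can be restated compactly: construct the two test inputs on a 1-sec grid (an assumed sampling interval) and drive the estimated model with them, for example through a routine like predict_2nd_order from the earlier sketch or the trained network itself (the model.predict calls below are placeholders):

```python
import numpy as np

t = np.arange(0.0, 500.0, 1.0)                         # 1-sec grid (assumed)
petco2 = np.where((t >= 20) & (t < 420), 1.0, 0.0)     # 1 mm Hg CO2 pulse, 20-420 sec
mabp = np.where((t >= 150) & (t < 300), 8.0, 0.0)      # 8 mm Hg MABP pulse, 150-300 sec

# y_norm  = model.predict(mabp, 0.0 * petco2)   # normocapnia (placeholder call)
# y_hyper = model.predict(mabp, +petco2)        # hypercapnic CO2 pulse
# y_hypo  = model.predict(mabp, -petco2)        # hypocapnic CO2 pulse
```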
Figure 7.24 Typical second-order self-kernels and cross-kernels in the time (left panels) and frequency (right panels) domains for cerebral autoregulation in normal human subjects. Top: MABP self-kernel; middle: PETCO2 self-kernel; bottom: cross-kernel [Mitsis et al., 2003a].
The presented results demonstrate the efficacy of the advocated methodology for two-input modeling of nonlinear physiological systems, quantifying the linear and nonlinear effects of each input and their nonlinear interaction. This powerful approach can be extended to include additional physiological variables affecting autoregulation (which may lead to a better understanding of the underlying physiological mechanisms under normal and pathophysiological conditions), using the methods outlined in the following section.
Figure 7.25 Model response to MABP single-input pulse (solid), to dual-input MABP pulse and hypercapnia (dotted), and to dual-input MABP pulse and hypocapnia (dashed-dotted) (see text for details) [Mitsis et al., 2003a].
7.3 THE MULTIINPUT CASE
Having examined the two-input case in some detail, we now extend this approach to the case of multiple inputs. Note that there is no methodological difference between a single output and multiple outputs, because the Volterra-type model is derived separately for each output of interest. In this connection, we should note also that nonlinear autoregressive (NAR) terms can be incorporated in this methodological framework, whereby the past epoch of the output signal can be viewed as another "input." This attains particular importance in closed-loop systems and is examined in more detail in Chapter 10. With regard to multiple inputs, the mathematical formalism of the two-input Volterra series is readily extendable to M inputs:

$$
y(t) = k_{0,\ldots,0} + \int_0^\infty k_{1,0,\ldots,0}(\tau)\,x_1(t-\tau)\,d\tau + \cdots + \int_0^\infty k_{0,\ldots,0,1}(\tau)\,x_M(t-\tau)\,d\tau + \cdots
$$
$$
+ \int_0^\infty \cdots \int_0^\infty k_{n_1,\ldots,n_M}(\tau_1,\ldots,\tau_{n_1+\cdots+n_M})\,x_1(t-\tau_1)\cdots x_M(t-\tau_{n_1+\cdots+n_M})\,d\tau_1\cdots d\tau_{n_1+\cdots+n_M} + \cdots \tag{7.22}
$$
where n_1, ..., n_M denote the multiplicity of product terms in the general Volterra functional of order (n_1 + ... + n_M) that correspond to each input x_1, ..., x_M, respectively. The complexity of this mathematical formalism renders the approach unwieldy and impractical as the number of inputs increases and/or the nonlinear order of the model increases. Nonetheless, the multiinput Volterra model retains a mathematical elegance and general applicability (no arbitrary constraints are imposed a priori) that is without peer in a nonlinear context.

For the special case of second-order systems/models, a large number of inputs can be accommodated in practice. The best example of this type of model/system is in the area of spatiotemporal modeling of visual systems that is discussed in Section 7.4. No applications of multiinput modeling (more than two inputs) for higher-order systems are known to the author to date. This is because of the aforementioned "curse of dimensionality" for higher-order systems, compounded by the numerous cross-kernels arising from the presence of multiple inputs. We present in Section 7.3.3 a network-based methodology that offers for the first time a realistic solution to the vexing problem of multiinput modeling of high-order systems. This solution is viewed as a possible path to understanding the full complexity of multivariate physiological systems. For instance, in the case of cerebral autoregulation discussed earlier, we may be able to incorporate additional inputs of interest (e.g., heart rate variability, respiratory sinus arrhythmia, pH, O2 tension, nitric oxide, catecholamines, etc.), or, in the case of glucose metabolism, we may include (in addition to insulin and free fatty acids) the concentrations of glucagon, epinephrine, norepinephrine, cortisol, etc. Clearly, the methodological ability to extend our modeling to multiple inputs is critical but also relies on the availability of appropriate time-series data. The latter constitutes another major challenge in advancing the state of the art, for which emerging micro- and nanotechnologies can be of great assistance.

Before we proceed with the network-based method that is advocated in this book, we discuss in Section 7.3.1 the cross-correlation-based method because it has been used in connection with spatiotemporal modeling in the visual system (illustrative examples are given in Section 7.4). We also discuss the kernel-based method in Section 7.3.2, because it constitutes the methodological foundation for the recommended network-based method (discussed in Section 7.3.3).

7.3.1 Cross-Correlation-Based Method for Multiinput Modeling
The cross-correlation-based method for multiinput modeling is, of course, based on a Wiener-type orthogonalized functional series when the inputs are independent white or quasiwhite signals. The orthogonalized multiinput Wiener series for second-order systems/models takes the extended form of Equation (7.2) for M inputs:

$$
y(t) = h_{0,\ldots,0} + \int_0^\infty h_{1,0,\ldots,0}(\tau)\,x_1(t-\tau)\,d\tau + \cdots + \int_0^\infty h_{0,0,\ldots,1}(\tau)\,x_M(t-\tau)\,d\tau
$$
$$
+ \iint_0^\infty h_{2,0,\ldots,0}(\tau_1,\tau_2)\,x_1(t-\tau_1)\,x_1(t-\tau_2)\,d\tau_1\,d\tau_2 - P_1\int_0^\infty h_{2,0,\ldots,0}(\lambda,\lambda)\,d\lambda + \cdots
$$
$$
+ \iint_0^\infty h_{0,0,\ldots,2}(\tau_1,\tau_2)\,x_M(t-\tau_1)\,x_M(t-\tau_2)\,d\tau_1\,d\tau_2 - P_M\int_0^\infty h_{0,0,\ldots,2}(\lambda,\lambda)\,d\lambda + \cdots
$$
$$
+ \sum_{\substack{n_1,\ldots,n_M \in \{0,1\} \\ n_1+\cdots+n_M=2}} \iint_0^\infty h_{n_1,\ldots,n_M}(\tau_{i_1},\tau_{i_2})\,x_{i_1}(t-\tau_{i_1})\,x_{i_2}(t-\tau_{i_2})\,d\tau_{i_1}\,d\tau_{i_2} \tag{7.23}
$$
where the indices i_1 and i_2 in the last term correspond to the only two values of (n_1, ..., n_M) that are nonzero in the multiple summation (note the constraint n_1 + ... + n_M = 2 in the last summation) [Marmarelis & Naka, 1974a]. The notation for the second-order cross-kernel terms of Equation (7.23) (the last term on the right-hand side) can be extended to higher-order kernels, but this has no practical utility since the cross-correlation-based method cannot be applied to high-order multiinput systems in a realistic context.

As mentioned earlier, the only actual applications of the multiinput cross-correlation-based method to date have been in spatiotemporal modeling of visual systems. For this reason, we adapt the formalism of Equation (7.23) to the spatiotemporal case where the input s(x; y; t) is a function of two space variables (x, y) and time t, and the output is a spatiotemporal signal r(x; y; t) in the general case, although in the spatiotemporal applications to date it is viewed only as a function of time r(t) (i.e., the output is recorded at a single location, typically a neuron in the retina or in the visual cortex). The second-order spatiotemporal model takes the orthogonalized Wiener-type form [Citron et al., 1981; Marmarelis & Marmarelis, 1978; Yasui & Fender, 1975; Yasui et al., 1979]:

$$
r(x_0; y_0; t) = h_0 + \int_D d\tau \int_{D_x} dx \int_{D_y} dy \; h_1(x; y; \tau)\, s(x_0 - x;\, y_0 - y;\, t - \tau)
$$
$$
+ \iint_D d\tau_1\,d\tau_2 \iint_{D_x} dx_1\,dx_2 \iint_{D_y} dy_1\,dy_2 \; h_2(x_1, x_2; y_1, y_2; \tau_1, \tau_2)\, s(x_0 - x_1; y_0 - y_1; t - \tau_1)\, s(x_0 - x_2; y_0 - y_2; t - \tau_2)
$$
$$
- P \int_D d\tau \int_{D_x} dx \int_{D_y} dy \; h_2(x, x; y, y; \tau, \tau) \tag{7.24}
$$
where (x_0, y_0) is the output reference point in space, P is the power level of the spatiotemporal white (or quasiwhite) input, and the domains of integration are D for the time lag τ, D_x for the space lag x, and D_y for the space lag y. The formalism of the spatiotemporal model of Equation (7.24) can be extended to higher order but with limited practical utility due to its complexity (as discussed earlier). The integration domain for the time lag extends from 0 to the memory μ of the system, and for the space lags (x and y) it extends over the area of the "receptive field" (RF) of the output neuron. Thus, Equation (7.24) gives a rigorous definition and clear understanding of a complete nonlinear dynamic RF for a retinal or cortical neuron (applicable also to any other sensory system that receives input information not only in time but also in space and/or wavelength). In the spectrotemporal case, space is replaced by wavelength (or, equivalently, by frequency). For instance, the second-order response of an auditory unit may be modeled as

$$
r(\lambda_0; t) = h_0 + \int_D d\tau \int_{D_\lambda} d\lambda \; h_1(\lambda; \tau)\, s(\lambda_0 - \lambda;\, t - \tau)
$$
$$
+ \iint_D d\tau_1\,d\tau_2 \iint_{D_\lambda} d\lambda_1\,d\lambda_2 \; h_2(\lambda_1, \lambda_2; \tau_1, \tau_2)\, s(\lambda_0 - \lambda_1; t - \tau_1)\, s(\lambda_0 - \lambda_2; t - \tau_2)
$$
$$
- P \int_D d\tau \int_{D_\lambda} d\lambda \; h_2(\lambda, \lambda; \tau, \tau) \tag{7.25}
$$
where λ denotes the wavelength of the acoustic stimulus and D_λ is its respective domain that defines the support of the "spectrotemporal receptive field" (STRF). The practical challenge in measuring the STRF is the generation and application of an acoustic stimulus that is spectrotemporally white (or quasiwhite), which implies statistical independence of multitone signals over a broad bandwidth (covering the bandwidth of the neuron of interest). This type of experiment and modeling can reveal the complete STRF of an auditory system in a nonlinear dynamic context [Aertsen & Johanesma, 1981;
Eggermont et al., 1983; Lewis & van Dijk, 2003]. Likewise, the same formalism and modeling approach can be applied to spectroscopic measurements (e.g., time-resolved fluorescence spectroscopy data) to yield a complete characterization of the spectrotemporal characteristics of the source (e.g., the spectrotemporal characteristics of the fluorophores in the fluorescent tissue in the case of time-resolved fluorescence spectroscopy) [Papazoglou et al., 1990; Stavridi et al., 1995a, b].

The orthogonalized Wiener-type models in the spatiotemporal or spectrotemporal case lend themselves to kernel estimation via cross-correlation, using the white (or quasiwhite) statistical properties of the inputs (as in the single-input case). For instance, the Wiener kernels in the spatiotemporal GWN input case are obtained as

$$h_0 = E[r(x_0; y_0; t)] \tag{7.26}$$

$$h_1(x; y; \tau) = \frac{1}{P}\, E[r_1(x_0; y_0; t)\, s(x_0 - x;\, y_0 - y;\, t - \tau)] \tag{7.27}$$

$$h_2(x_1, x_2; y_1, y_2; \tau_1, \tau_2) = \frac{1}{2P^2}\, E[r_2(x_0; y_0; t)\, s(x_0 - x_1; y_0 - y_1; t - \tau_1)\, s(x_0 - x_2; y_0 - y_2; t - \tau_2)] \tag{7.28}$$

where r_1 and r_2 denote the output residuals of first order and second order, respectively (as in the single-input case), and the indicated ensemble average is replaced in practice by a time average and space average over the finite data record. These kernel estimation formulae are based on the autocorrelation properties of the spatiotemporal GWN input. Specifically, all the odd-order autocorrelation functions are zero, and the second-order one is

$$E[s(x; y; t)\, s(x'; y'; t')] = P\,\delta(x - x')\,\delta(y - y')\,\delta(t - t') \tag{7.29}$$

where δ denotes the Dirac impulse function and P is the power level of the spatiotemporal GWN input signal. The fourth-order autocorrelation function is also relevant and given by

$$
E[s(x_1; y_1; t_1)\,s(x_2; y_2; t_2)\,s(x_3; y_3; t_3)\,s(x_4; y_4; t_4)] = P^2\,\delta(x_1 - x_2)\,\delta(y_1 - y_2)\,\delta(t_1 - t_2)\,\delta(x_3 - x_4)\,\delta(y_3 - y_4)\,\delta(t_3 - t_4)
$$
$$
+ P^2\,\delta(x_2 - x_3)\,\delta(y_2 - y_3)\,\delta(t_2 - t_3)\,\delta(x_4 - x_1)\,\delta(y_4 - y_1)\,\delta(t_4 - t_1)
+ P^2\,\delta(x_1 - x_3)\,\delta(y_1 - y_3)\,\delta(t_1 - t_3)\,\delta(x_2 - x_4)\,\delta(y_2 - y_4)\,\delta(t_2 - t_4) \tag{7.30}
$$
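To make the discrete-time use of these formulae concrete, the following is a minimal sketch (not taken from the text; all function and variable names are hypothetical) of the first-order estimator of Equation (7.27), with the ensemble average replaced by a time average over the record, as indicated above:

```python
import numpy as np

def estimate_h1(s, r1, M, P):
    """Estimate h1[tau, x, y] per Equation (7.27).
    s:  white (or quasiwhite) stimulus, shape (T, Nx, Ny)
    r1: first-order output residual (output minus its mean), shape (T,)
    M:  kernel memory in lags;  P: input power level."""
    T = s.shape[0]
    h1 = np.zeros((M,) + s.shape[1:])
    for tau in range(M):
        # time average of r1(t) * s(x, y, t - tau) over t = tau ... T-1
        h1[tau] = np.tensordot(r1[tau:], s[:T - tau], axes=(0, 0)) / ((T - tau) * P)
    return h1

# Synthetic check on a purely first-order (linear) toy system:
rng = np.random.default_rng(0)
T, Nx, Ny, M = 20000, 4, 4, 8
s = rng.choice([-1.0, 1.0], size=(T, Nx, Ny))   # binary quasiwhite input, P = 1
g = np.exp(-np.arange(M) / 2.0)                 # hypothetical temporal course
w = rng.standard_normal((Nx, Ny))               # hypothetical spatial profile
y = np.zeros(T)
for tau in range(M):                            # y(t) = sum over lags and space
    y[tau:] += g[tau] * (s[:T - tau] * w).sum(axis=(1, 2))
h1_hat = estimate_h1(s, y - y.mean(), M, P=1.0)  # recovers roughly g[tau] * w[x, y]
```

For the second-order estimate of Equation (7.28), the same time-averaging applies to the lagged product of two stimulus samples, at a cost that grows with the square of the number of pixels and lags.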
For quasiwhite inputs (e.g., binary or ternary signals that have been used in spatiotemporal applications), the fourth-order autocorrelation function and the second-order kernel estimation formula must be adjusted to the specific second and fourth moments of the quasiwhite signals (as in the single-input case with CSRS quasiwhite input signals). Obviously, the complexity of these expansions rises rapidly for higher-order kernels, preventing the practical use of cross-correlation-based methods. Illustrative examples of second-order spatiotemporal applications are given in Section 7.4.

As in the two-input case, the multiinput Wiener kernels are generally distinct from their Volterra counterparts (except for the highest two orders of kernels in a finite-order system that are the same for Volterra and Wiener) and depend on the power levels of the white (or quasiwhite) test inputs with which they have been estimated. Thus, estimating the input-independent Volterra kernels remains an attractive prospect that is discussed in the following section.
7.3.2 The Kernel-Expansion Method for Multiinput Modeling
The Volterra kernels of a multiinput system can be estimated with the kernel-expansion method. As in the two-input case, distinct bases of functions can be used to expand the kernels associated with each input, resulting in the multiinput modified Volterra model:

$$
y(t) = c_0 + \sum_{j_1=1}^{L_1} c_{1,0,\ldots,0}(j_1)\, v_{j_1}^{(1)}(t) + \cdots + \sum_{j_M=1}^{L_M} c_{0,\ldots,0,1}(j_M)\, v_{j_M}^{(M)}(t) + \cdots
$$
$$
+ \sum_{j_1=1}^{L_1} \cdots \sum_{j_{n_1+\cdots+n_M}=1}^{L_M} c_{n_1,\ldots,n_M}(j_1, \ldots, j_{n_1+\cdots+n_M})\, v_{j_1}^{(1)}(t) \cdots v_{j_{n_1+\cdots+n_M}}^{(M)}(t) + \cdots \tag{7.31}
$$

where

$$v_j^{(i)}(t) = \int_0^\infty b_j^{(i)}(\tau)\, x_i(t - \tau)\, d\tau \tag{7.32}$$
with b_j^{(i)} denoting the jth basis function for the expansion of the kernels corresponding to input x_i. The multivariate model of Equation (7.31) is linear with respect to the expansion coefficients, which can be estimated by means of least-squares regression using input-output data. This model can also be parsimonious for judicious selection of expansion bases or by determining the appropriate PDMs of the multiinput system, following the procedures discussed in the two-input case. Note that the model of Equation (7.31) is valid for all inputs and does not require white (or quasiwhite) inputs, so long as the input signals are fairly broadband (e.g., spontaneous or natural activity). As before, the concept of PDM modeling leads us to Volterra-equivalent network models that are discussed in the following section.
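As a concrete illustration of Equations (7.31)-(7.32), here is a minimal sketch (assumed discretization and hypothetical data; the Laguerre recursion is one common choice of basis, not the only one) that builds the regressors for two inputs and estimates the expansion coefficients of a second-order model by ordinary least squares:

```python
import numpy as np
from itertools import combinations_with_replacement

def laguerre_basis(L, M, alpha):
    """Discrete Laguerre functions b_j(m), j = 0..L-1, m = 0..M-1."""
    b = np.zeros((L, M))
    b[0] = np.sqrt(1 - alpha) * np.sqrt(alpha) ** np.arange(M)
    for j in range(1, L):
        b[j, 0] = -np.sqrt(alpha) * b[j - 1, 0]
        for m in range(1, M):
            b[j, m] = (np.sqrt(alpha) * b[j, m - 1]
                       + b[j - 1, m - 1] - np.sqrt(alpha) * b[j - 1, m])
    return b

def regressors(x, b):
    """v_j(t) = sum_m b_j(m) x(t - m), the discrete form of Equation (7.32)."""
    return np.stack([np.convolve(x, bj)[:len(x)] for bj in b])

def design_matrix(V):
    """Columns: constant, all v's, and all distinct second-order products
    (self- and cross-terms), for the regressor blocks in the list V."""
    allv = np.vstack(V)
    cols = [np.ones(allv.shape[1])] + list(allv)
    cols += [allv[i] * allv[j]
             for i, j in combinations_with_replacement(range(len(allv)), 2)]
    return np.column_stack(cols)

# Hypothetical two-input record (broadband but not necessarily white):
rng = np.random.default_rng(1)
x1, x2 = rng.standard_normal(4000), rng.standard_normal(4000)
y = x1 + 0.5 * np.roll(x1, 1) * np.roll(x2, 2)   # toy system for the demo
b = laguerre_basis(L=5, M=40, alpha=0.2)
X = design_matrix([regressors(x1, b), regressors(x2, b)])
c, *_ = np.linalg.lstsq(X, y, rcond=None)        # expansion coefficients
```

The kernels are then reconstructed from the estimated coefficients and the basis functions, exactly as in the single-input and two-input cases.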
7.3.3 Network-Based Multiinput Modeling
This is the recommended approach to the multiinput modeling problem because of its scalability advantage (i.e., its complexity, as measured by the number of free parameters, is linear with respect to the number M of inputs and the order Q of nonlinearity). To achieve this scalability advantage, we introduce the network architecture of Figure 7.26, which is equivalent to K Volterra models with M inputs and K outputs. This Volterra-equivalent network employs distinct filter banks {b_j^{(m)}} (j = 1, ..., L_m; m = 1, ..., M) that preprocess the respective inputs and generate outputs {v_j^{(m)}(t)} that are fed into separate hidden layers for each input (each having H_m hidden units with polynomial activation functions {f_h^{(m)}} (h = 1, ..., H_m) of degree Q_m). The outputs of the hidden units are fed into a common "interaction layer" with I interaction units that operate in a manner similar to the
Figure 7.26 The Volterra-equivalent network model in the multiinput, multioutput case (see text).
hidden units, having polynomial activation functions {g_i} (i = 1, ..., I) of degree R_i. It is evident that the interaction layer generates the equivalent Volterra cross-kernels (in addition to the self-kernels that pass through). The outputs of the interaction units are summed with appropriate weights to form each output y_k (k = 1, ..., K) with offset value y_{k,0}. There are (L_m H_m) in-bound weights of the hidden units for input x_m, and (H_m I) in-bound weights of the interaction units from the hidden layer of input x_m. Therefore, the total number of free parameters for this network model is

$$P = \sum_{m=1}^{M} H_m(L_m + Q_m + I) + (I + 1)K + \sum_{i=1}^{I} R_i \tag{7.33}$$
without counting possible parameters in the employed filter banks (e.g., the single parameter for each Laguerre filter bank). In order to demonstrate more easily the scalability advantage of this network, let us assume that all filter banks have the same size (L_m = L), all hidden layers have the same number of hidden units (H_m = H), all hidden units have the same degree (Q_m = Q), and all interaction units have the same degree (R_i = R). Then the total number of free parameters becomes

$$P = MH(L + Q + I) + I(K + R) + K + M \tag{7.34}$$
where we have also included one free parameter for each filter bank (as in the case of Laguerre filter banks). The expression (7.34) is instructive with regard to the complexity of this network model, since the total number of free parameters is linear with respect to Q and R, as well as L and M. Note that the multiinput modified Volterra model of Equation (7.31) has a much larger number of free parameters that grows exponentially with the order of nonlinearity and the number of inputs. For example, in the two-input/two-output case (M = K = 2) and for typical values of the structural parameters in applications to date (L = 7, H = 2, I = 2, Q = R = 2), we have P = 56, whereas the modified Volterra model of Equation (7.31) has 4,232 free parameters, a "compactness" ratio of about 80. Note that the nonlinear order of the multiinput network model is equal to the product (QR).

This immense scalability advantage comes at the price of a more challenging convergence of the iterative training algorithm and is contingent on the assumption that the structural constraints imposed by the network architecture are compatible with the system at hand. The validity of the latter assumption is checked by the quality of the model performance (in terms of predictive ability and consistency of results from experiment to experiment) provided that the input ensemble is rich in information content and representative of the natural operation of the system. This multiinput network architecture can be applied to the spatiotemporal or spectrotemporal modeling problem as well, with immense scalability advantages (especially for higher-order systems).
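The parameter count of Equation (7.34) is easy to tabulate; the following short sketch reproduces the two-input/two-output example quoted above:

```python
def network_params(M, K, L, H, I, Q, R):
    """Free parameters of the multiinput network model, Equation (7.34):
    M inputs, K outputs, L filters per bank, H hidden units per input,
    I interaction units, polynomial degrees Q (hidden) and R (interaction)."""
    return M * H * (L + Q + I) + I * (K + R) + K + M

print(network_params(M=2, K=2, L=7, H=2, I=2, Q=2, R=2))   # prints 56
```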
7.4 SPATIOTEMPORAL AND SPECTROTEMPORAL MODELING
The multiinput modeling approach presented earlier in this chapter can be used for the study of spatiotemporal and spectrotemporal dynamics, whereby the different space locations or spectral bands of stimulation and response represent the multiple inputs and outputs. The mathematical formalism of a Wiener-type model (orthogonalized functional series for independent GWN inputs) is given by Equation (7.24) for the spatiotemporal case and by Equation (7.25) for the spectrotemporal case. The cross-correlation-based formulae for the estimation of the spatiotemporal Wiener kernels are given by Equations (7.26)-(7.28) for a spatiotemporal output signal. For an output that is only a function of time (i.e., recorded at a single location x_0, y_0), the cross-correlation formula for the estimation of the qth-order spatiotemporal Wiener kernel using a finite data record t ∈ [0, R] is

$$
\hat{h}_q(x_1, \ldots, x_q; y_1, \ldots, y_q; \tau_1, \ldots, \tau_q) = \frac{1}{q!\,P^q R} \int_0^R r_q(x_0; y_0; t)\, s(x_0 - x_1; y_0 - y_1; t - \tau_1) \cdots s(x_0 - x_q; y_0 - y_q; t - \tau_q)\, dt \tag{7.35}
$$
where r_q denotes the qth-order output residual, s denotes the GWN spatiotemporal input of power level P, and the integration over time is replaced by summation for the discretized data. This estimation formula has been used successfully in the study of the visual system up to the second order of nonlinearities, as discussed in the following two sections. However, the quality of the obtained kernel estimates is modest even for very long data records, and this approach becomes impractical for a large number of inputs (i.e., many spatial locations) or for higher-order nonlinearities.

These practical limitations motivated the introduction of more efficient Volterra-type modeling formulations that have the additional advantage of being applicable to cases where the multiple inputs are not independent white (or quasiwhite) noise. This is extremely important in practice since it corresponds to the case of natural spatiotemporal stimuli (i.e., natural visual images changing in space and time). It is a key tenet of this book that our modeling efforts should be directed (to the extent practically possible) toward the natural paradigm of system operation (i.e., using the "natural" ensemble of input-output signals). The efficient Volterra-type formulation of the spatiotemporal or spectrotemporal modeling problem follows the path of the single-input case, whereby Volterra kernel expansions are used to reduce the estimation problem to multivariate linear regression with the minimum number of free parameters [see Equation (7.31)]. Further compaction of this formulation (in terms of reducing the number of free parameters) leads to the network-based formulation presented in Section 7.3.3 that represents the most efficient approach to the multiinput modeling problem to date (including the spatiotemporal and spectrotemporal cases). The spatiotemporal formulation of the modified Volterra model using kernel expansions is given by
$$
r(x_0; y_0; t) = c_0 + \sum_{j=1}^{L} c_1(j)\, v_j(x_0; y_0; t) + \cdots + \sum_{j_1=1}^{L} \cdots \sum_{j_q=1}^{L} c_q(j_1, \ldots, j_q)\, v_{j_1}(x_0; y_0; t) \cdots v_{j_q}(x_0; y_0; t) + \cdots \tag{7.36}
$$
where v_j(x_0; y_0; t) denotes the output of a spatiotemporal basis filter b_j(x; y; τ):

$$
v_j(x_0; y_0; t) = \int_D d\tau \int_{D_x} dx \int_{D_y} dy \; b_j(x; y; \tau)\, s(x_0 - x;\, y_0 - y;\, t - \tau) \tag{7.37}
$$
The filter bank {b_j} forms a complete basis of the spatiotemporal dynamics of the system and is selected judiciously to minimize the filter-bank size L required for a satisfactory approximation (in direct extension of the kernel-expansion concept used in the single-input case). The spatiotemporal basis functions can be represented in terms of separate temporal and spatial basis functions {b_j^T, b_j^X, b_j^Y} as

$$b_j(x; y; \tau) = b_j^X(x)\, b_j^Y(y)\, b_j^T(\tau) \tag{7.38}$$
where {b_j^T} can be a Laguerre basis and {b_j^X}, {b_j^Y} can be a Hermite or Gabor basis (although there is ample choice for temporal or spatial bases) defined over the domains D, D_x, and D_y, respectively. The spatiotemporal basis {b_j} need not be separable, as in Equation (7.38), but separability simplifies the computational algorithm. The formulation of Equation (7.36) reduces the estimation problem to multivariate linear regression, since the unknown parameters (i.e., the kernel expansion coefficients) enter linearly in the modified Volterra model, as in the single-input case. The total number of free parameters in the model of Equation (7.36) for a Qth-order system (i.e., all kernels up to Qth order) is (L + Q)!/(L!Q!). However, if distinct bases are used for the temporal and spatial dynamics (with sizes L_T, L_X, L_Y) without regard to the simplifying constraint of Equation (7.38), then the total number of free parameters increases immensely, because the qth-order spatiotemporal Volterra kernel expansion becomes

$$
k_q(x_1, \ldots, x_q; y_1, \ldots, y_q; \tau_1, \ldots, \tau_q) = \sum_{i_1^X=1}^{L_X} \cdots \sum_{i_q^X=1}^{L_X} \sum_{i_1^Y=1}^{L_Y} \cdots \sum_{i_q^Y=1}^{L_Y} \sum_{i_1^T=1}^{L_T} \cdots \sum_{i_q^T=1}^{L_T} a_q(i_1^X, \ldots, i_q^X; i_1^Y, \ldots, i_q^Y; i_1^T, \ldots, i_q^T)
$$
$$
\times\; b_{i_1^X}^X(x_1) \cdots b_{i_q^X}^X(x_q)\, b_{i_1^Y}^Y(y_1) \cdots b_{i_q^Y}^Y(y_q)\, b_{i_1^T}^T(\tau_1) \cdots b_{i_q^T}^T(\tau_q) \tag{7.39}
$$
This unwieldy kernel expansion can be contrasted to the relative simplicity of the qth-order kernel expansion for the spatiotemporal basis of Equation (7.38):

$$
k_q(x_1, \ldots, x_q; y_1, \ldots, y_q; \tau_1, \ldots, \tau_q) = \sum_{j_1=1}^{L} \cdots \sum_{j_q=1}^{L} c_q(j_1, \ldots, j_q)\, b_{j_1}(x_1; y_1; \tau_1) \cdots b_{j_q}(x_q; y_q; \tau_q) \tag{7.40}
$$
Nonetheless, it should be noted that the apparent simplicity of Equation (7.40) disguises the fact that the employed spatiotemporal basis is far more constrained than the general case depicted in Equation (7.39). In a practical setting, the choice of the proper basis depends on the specific characteristics of the system. However, unwieldy models generally emerge for high-order systems (higher than second order), and the network-based approach becomes far more attractive. The recommended Volterra-equivalent network model is shown in Figure 7.27 and employs a spatiotemporal filter bank for input preprocessing, as well as a hidden layer and an interaction layer that seek to approximate the static nonlinear mapping

$$r(x_0; y_0; t) = F[v_1(x_0; y_0; t), \ldots, v_L(x_0; y_0; t)] \tag{7.41}$$

that is valid for all (x_0; y_0; t) data points. The presence of the interaction layer is the key to achieving parsimony in a general spatiotemporal modeling context.

The presented spatiotemporal models can be reduced to the spatial dimensions only by integration over the temporal dimension. This may be appropriate in certain applications, such as the study of visual receptive fields in response to static images or for the analysis of imaging apertures. In the space-only case, the spatial Volterra kernels {k_q^s} are related to their spatiotemporal counterparts {k_q} as

$$
k_q^s(x_1, \ldots, x_q; y_1, \ldots, y_q) = \int_D \cdots \int_D k_q(x_1, \ldots, x_q; y_1, \ldots, y_q; \tau_1, \ldots, \tau_q)\, d\tau_1 \cdots d\tau_q \tag{7.42}
$$
The mathematical formalism for spectrotemporal models follows the same line of thinking, where the different spatial locations are replaced by different wavelengths or frequency bands of spectral interaction. Although applications of the efficient spatiotemporal modeling approach using Volterra kernel expansions or Volterra-equivalent networks have not been published yet, we present in Section 7.4.2 an illustrative example of a second-order cross-kernel estimate from a cortical neuron obtained by use of Laguerre kernel expansions and compare it with its cross-correlation counterpart. In closing this section, we must note the pioneering contribution to spatiotemporal
398
MODELING OF MULTIINPUTIMULTIOUTPUT SYSTEMS
s(xo,Yo, t) SPATIOTEMPORAL INPlJT
SPATIOTEMPORAL FILTER BANK
HIDDEN LAYER
INTERACTION LAYER
SPATIOTEMPORAL OUTPUT
r(xo,Yo, t) Figure 7.27 Spatiotemporal Volterra-equivalent network model with input preprocessing by spatiotemporal filter bank.
modeling of Syozo Yasui, who first proposed the spatiotemporal extension of the Volterra-Wiener approach in the pivotal 1975 symposium at Caltech [Yasui & Fender, 1975] organized by Panos Marmarelis and Gilbert McCann, and Sutter's pioneering contribution in the same symposium [Sutter, 1975]. We must also acknowledge McCann's foresight in pursuing the onerous implementation of this approach to retinal ganglion cells in collaboration with Citron and Kroeker [Citron et al., 1981b], Citron's work with Emerson [Citron & Emerson, 1983; Citron et al., 1981a], and Naka's pioneering efforts with Yasui, Davis, and Krausz [Davis & Naka, 1980; Krausz & Naka, 1980; Yasui et al., 1979].
7.4.1 Spatiotemporal Modeling of Retinal Cells
The study of visual receptive fields has been pursued conventionally through the tedious use of many specialized stimuli of various shapes, sizes, orientations, and contrasts. This laborious and time-consuming process has yielded useful knowledge but at great expense of time and resources. The advocated spatiotemporal modeling framework offers a far more efficient path to this goal by deriving quantitative descriptions of the visual receptive fields (RFs) that are valid for all possible stimuli and have predictive ability in a nonlinear dynamic context. Furthermore, the obtained spatiotemporal models can be used to study motion-related responses and transient behavior in response to arbitrary combinations of stimuli. Therefore, there seldom exists a paradigm of scientific research that can benefit so much from an emerging methodology as the case of spatiotemporal Volterra modeling of the visual system. Sadly, this fact has been largely overlooked by the peer community thus far. This is about to change.

In this section, we present as an example the first application of the spatiotemporal Volterra-Wiener approach to the study of the RF of retinal ganglion cells in the frog (class 3 cells) performed by Citron, Kroeker, and McCann more than 20 years ago [Citron et al., 1981a]. The class 3 ganglion cells in the frog retina exhibit an excitatory RF center (6-10° in diameter), surrounded by an inhibitory RF annulus (from about 12° to 20°), the sole illumination of which does not elicit a response. The experiment involves a spatiotemporal white-noise (STWN) stimulus composed of a 16 x 16 grid of square pixels, each subtending 1.2° on the retina (or a 75 μm square) and modulated by independent binary CSRS signals (on or off for maximum contrast). The CSRS signals switch synchronously (and, of course, randomly) every 42.4 ms for a temporal bandwidth of approximately 24 Hz (or 24 spatial frames per second). This STWN input tests the ganglion cell for a broad variety of stimuli over the length of the experiment (720 sec) and is suitable for estimating the spatiotemporal Wiener-like (binary CSRS) kernels of the system via cross-correlation, except at the diagonal points of the second-order kernels (see Section 2.2.4). The binary waveform was used to simplify the procedure but is unduly restrictive and was replaced by a 16-level CSRS in later experiments. The obtained spatiotemporal model can predict the response of the ganglion cell to any spatiotemporal stimulus, including moving objects and variable scenes.

The obtained spatiotemporal first-order Wiener (CSRS) kernel h1 is shown in Figure 7.28B as a sequence of eight two-dimensional frames every 42.4 ms, extending 9.6° in either dimension (i.e., 8 x 8 pixels). The maximum (negative) value is observed at τ = 42.4 ms (Fig. 7.28C), and the half-max contour is approximately elliptical with main axes of about 7° and 5°. The time course of h1 (over τ) is shown in Figure 7.28A for the central pixel of the RF; it becomes slightly positive for τ ≥ 85 ms, returning to the zero baseline around τ = 300 ms. It is evident from the spatial dimensions of this measurement that the antagonistic surround of the RF is not included in the h1 estimate (only the off-center excitatory portion of the RF is included). Since the total area of the stimulus grid is about 20°, we surmise that the annular (surround) portion of the RF was not captured properly in this study because of poor estimation accuracy due to the employed cross-correlation technique. This provides additional motivation for utilizing the improved estimation methods advocated in this book, especially in low-SNR cases. Note that in the spatiotemporal case, the multiple "channels" of information that flow from the various inputs to the output serve as "interference" (i.e., when the kernels corresponding to the mth input are estimated, the contributions of all other inputs to the output signal act as interference and degrade the output SNR). This is particularly detrimental in the case of the cross-correlation technique, as evidenced by this result. An illustrative example of how much the kernel estimates improve with the use of the Laguerre expansion technique is given in the next section regarding cortical cells.
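The STWN stimulus described above is straightforward to synthesize; the following is a brief sketch under the stated parameters (the random generator and the 0/1 value coding are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.0424                    # frame interval [s] (42.4 ms)
n_frames = round(720 / dt)     # 720-s experiment, roughly 17,000 frames
# 16 x 16 grid of independent binary CSRS pixels (on/off, maximum contrast),
# switching synchronously and randomly every frame (~24 frames per second)
stimulus = rng.choice([0.0, 1.0], size=(n_frames, 16, 16))
```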
It must be appreciated that the number of possible cross-kernels in the spatiotemporal model (even for second-order nonlinearities only) increases rapidly with the size of the RF measured in number of pixels, because all possible pair combinations must be considered in principle. For instance, the 8 x 8 RF considered in the example above (even though it represents only the excitatory center) has 2016 second-order temporal kernels, of which only 64 are self-kernels. This huge number of cross-kernels can be reduced by imposing symmetry and localization constraints, but remains daunting in general. For this reason, we advocate the equivalent PDM and network-based models (shown in Figure 7.27) that avoid this problem and allow practical extension of spatiotemporal modeling into higher-order nonlinearities (if needed).

Figure 7.28 (A) First-order spatiotemporal Wiener kernel of the class 3 ganglion cell in the frog retina at a single pixel near the center of the receptive field. A flash stimulus at this pixel causes an initial decrease from the mean firing rate of background activity, followed by a smaller but longer increase in firing rate. Error bars represent the standard deviation obtained from three independent trials. (B) First-order spatiotemporal Wiener kernel for all spatial positions in the off-center region of the receptive field over eight successive time frames (every 42.4 ms). Stippled areas represent negative kernel values. Note that each point in the receptive field appears to have the time course shown in A and differs only in magnitude. (C) Expanded view of the peak response frame of the first-order kernel. Abscissa and ordinate give the spatial coordinates of the 16 by 16 array of stimulus pixels. The size of an individual pixel is indicated by the small box in the lower right of the contour map [Citron et al., 1981].

In this example, the temporal second-order kernels were found to be similar in form but of different scale, depending on the distance separating the two input points. A cut through a second-order cross-kernel at the point of maximum value is shown indicatively in Figure 7.29 and exhibits a similar waveform but reverse polarity from the first-order kernel. When the maximum (positive) value of the second-order kernel is plotted along one spatial dimension, the expanse of second-order interactions appears to be comparable to the first-order kernel expanse, as shown in Figure 7.30. However, for a fixed reference point, the peak values of the cross-kernels between neighboring points and the reference point appear to decline faster, as shown in Figure 7.30 with dashed lines in two cases. This intriguing localization of second-order interactions is consistent with the hypothesis of nonlinear subunits within the RF of ganglion cells, as previously posited for the ganglion cells in the cat retina [Victor & Shapley, 1979a, b, 1980].
Figure 7.29 Comparison of the time course of first- and second-order spatiotemporal Wiener kernels of the class 3 ganglion cell in the frog retina. The second-order kernel was cut at Δt = 0.0424 sec (bottom trace) for comparison with the first-order kernel (top trace) [Citron et al., 1981b].
Figure 7.30 Comparison of the spatial profiles of first- and second-order spatiotemporal Wiener kernels of the same class 3 ganglion cell shown in Figure 7.29. For clarity, only values at τ_max (time of peak response) are plotted. The abscissa represents location along the diameter of the receptive field. The lower solid curve represents the relative magnitude of the first-order kernel, which is a cut through the surface shown in Fig. 7.28C. The dotted and dashed lines represent two examples of two-point spatial interaction about the two reference points indicated by arrows. These spatial interactions were determined around each pixel in the receptive field, and the upper solid line represents the envelope of the peak values of each of these interaction curves [Citron et al., 1981b].
This possible organization of retinal ganglion cells makes them suitable for the network-based spatiotemporal modeling approach advocated in this book, which promises immense efficiency gains in this formidable problem. The time has come to tackle the full complexity of this important modeling problem.

7.4.2 Spatiotemporal Modeling of Cortical Cells
The spatiotemporal modeling approach presented in the previous section was also applied by Citron, Emerson, and their associates to cortical cells in the cat striate cortex [Citron et al., 1981a; Citron & Emerson, 1983; Emerson et al., 1987, 1992]. The experimental stimulus for directionally selective cells was changed to randomly moving bars in the preferred direction (one space dimension instead of two), whose location and contrast were determined by two independent CSRS (quasiwhite) processes of 16 possible values for location and three possible values for contrast. The middle contrast value, corresponding to a mean luminance of 222 cd/m², was defined numerically as "zero level"; the low luminance value corresponding to dark was defined as -1, whereas the high luminance value corresponding to 444 cd/m² was defined as +1. The dimension of each bar was 0.5° by 11°, covering the whole RF in the long direction. The luminance and location values switched every 16 ms, and this "random grating" stimulus was applied for 7 min continuously. The peristimulus histogram of the cortical cell (over five repetitions) was used as the output signal [Emerson et al., 1992].

A typical cross-kernel for a strongly directionally selective complex cell is shown in Figure 7.31, corresponding to two adjacent positions (9 and 10). The (τ1, τ2) representation is typical of directionally selective cells, as seen earlier in the fly (Section 7.2.1). The (τ, Δτ) representation allows the suppression of the τ variable through integration (summation) because the positive "mount" and negative "valley" are separated at Δτ = 0. Upon suppression of the τ dimension, we display in Figure 7.32 the cross-kernel values over Δτ (from -8 to +8) and space (from 1 to 16), representing the position of the moving bar (the reference bar position is at 9). The alternating "mounts" and "valleys" of the plotted surface reflect the directional selectivity of the complex cortical cell. The direction of preferred motion is marked on Figure 7.32(c) and corresponds to the main axis of the elongated main lobe. The null direction of motion is also marked on Figure 7.32(c) and corresponds to the line that connects the centers of the elliptical contours (so that integration along this line yields very small values). The slope of the main axis (preferred direction) determines the preferred velocity for motion detection by this complex cell. As this slope increases, the preferred velocity becomes higher (other directions result in smaller output values, monotonically decreasing toward the minimum at the null direction). The width of the "mounts" and "valleys" is comparable and determines the "optimal" width of the bar that elicits the maximum response from this complex cell when moving in the preferred direction. Another implication of this result is the critical value of the width of a bar above which the directional selectivity of the cell will diminish (because of attenuation caused by integration along the space dimension).

This example illustrates the wealth of information that can be extracted from cortical complex (or other) cells by means of a fairly short experiment and rigorous analysis. Since the presented results were obtained by means of the conventional cross-correlation technique (with all its well-documented limitations), the advanced estimation methods advocated in this book can enhance further the quality of the information extracted from these data.
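The (τ1, τ2) to (τ, Δτ) re-plotting and the subsequent suppression of τ used in Figures 7.31 and 7.32 amount to a simple re-indexing and summation; a minimal sketch (assumed discretization, hypothetical names) follows:

```python
import numpy as np

def to_tau_dtau(k2):
    """Re-index a second-order kernel from (tau1, tau2) to (tau, dtau), where
    tau = min(tau1, tau2) is the time after the later (second) stimulus and
    dtau = tau1 - tau2 is the stimulus temporal separation."""
    M = k2.shape[0]
    out = np.full((M, 2 * M - 1), np.nan)       # rows: tau; columns: dtau
    for t1 in range(M):
        for t2 in range(M):
            out[min(t1, t2), t1 - t2 + M - 1] = k2[t1, t2]
    return out

def suppress_tau(k2, n_lags=5):
    """Sum the first n_lags values of tau for each dtau (cf. Figure 7.32,
    where the first 80 ms, i.e., 5 lags of 16 ms, were summed)."""
    return np.nansum(to_tau_dtau(k2)[:n_lags], axis=0)
```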
Figure 7.31 (A) Second-order interaction cross-kernel values plotted as a two-dimensional function of the two time lags from two interacting stimuli. (B) Same interaction values as in A, but plotted as a function of stimulus temporal separation (Δτ) versus time after the second stimulus (τ). Square of A maps to triangle of B, and heavy square of B to heavy arrow-shaped region of A. Note that interaction values in Δτ versus τ coordinates of B are nearly separable, which permits suppression of the τ variable (see text). Contour at zero is not shown in any contour plots because it is dominated by noise [Emerson et al., 1992].
B 11"'.") 11
o
J: 12 iiII_
.c. ••
..
o.
'
"u 0"
...z~ -
e
C,., .... Cl -,
-4
0
.4
.,
6T
SCH!MATIC INT!RACT10N
c
..on...' .,. 11 11
o
5 '21'" .. :: .c.
..
0-
0"
•• a-. I
, IIOVI"'~ OOW ..
-,
-4
0 .4 ., , . . nuTIW TO CIMn" Ult f6TJ
Flgure 7.32 Spatiotemporal dependence of nonlinear interactions around position 9 in the same directionally selective complex cell as in Figure 7.31. (A) second-order interacti on values, such as in Figure 7.31 , have been projected onto the "neighbor position" versus AT by summ lng over the first 80 msec (5 lags) following the second stimulus, and plotted In perspective vlew as a function of the full range of 16 possible 0.5° nelghbor positions, and 15 temporal separations (A.ry covering ±112 msec. (8) Contour plot shows that the ridge-valley structure of A is obliquely orlented In space-time (dashed contours are negative). (C) Schematic contour pattem shows locus of interactions elicted by a bar moving downward (broken arrow) at 15.5°/sec, whlch corresponds to the velocity that intersects the minlmallnteraction values. This cell was dlrectlonally selectlve because upward movement at th is speed elicited strong positive interact ions (open ellipse), while downward movement at the same speed elicited the strongest suppression in the RF. "Fast" arrow indicates the trajectory assoclated wlth a faster upward moving bar [Emerson et al., 1992].
Figure 7.33 Second-order cross-kernels between locations 4 and 5 in a random bar stimulus of a complex cortical cell obtained via Laguerre expansion (A) and cross-correlation (B), using a short input-output data record. The Laguerre-expansion results were consistent for various data segments. The cross-correlation estimate is evidently unreliable.
An illustration of this is given in Figure 7.33, where the cross-kernel between two adjacent input locations (4 and 5) of the random bar stimulus is shown for a simple cortical cell as estimated by the Laguerre expansion technique in the left panel (L = 6, α = 0.2) and by the cross-correlation technique in the right panel (unpublished results; data provided to the author by Bob Emerson). The comparison attests to the potential benefits accrued by the use of the advocated kernel estimation methods.
8 Modeling of Neuronal Systems
Because of the particular signal modality employed by the nervous system (sequences of action potentials) and the rising interest among neuroscientists in quantitative methods of analysis, we dedicate a separate chapter to the modeling of neuronal systems. The mathematical modeling of neuronal function has been attracting increased attention as the quantitative means for understanding information processing and coding in the nervous system. Modeling efforts have been made from the subcellular and cellular levels to the integrated levels of neuronal networks and behavioral neuroscience. Techniques for neural modeling include parametric and nonparametric methods (i.e., the use of differential or integral equations, respectively) and remain rather diverse. The great diversity of modeling approaches is a natural consequence of the immense variety of neural systems and the diverse requirements of different applications. The approach presented herein must be viewed as a modeling tool suitable for a broad class of problems that involve the nonlinear dynamic relation between discernible and measurable stimuli and the corresponding observable response (output).

The nervous system is composed of a multitude of functional components interconnected in complex configurations that process information in order to perform specific vital tasks in its interactions with the environment and in preserving physiological homeostasis. Developments in systems science allow us to formulate the study of information flow within the nervous system as a "systems problem," whereby signals, representing information, travel between neuronal components and are dynamically transformed by them. The use of the systems approach is predicated on appropriate conceptual and mathematical formalism in describing the "transfer characteristics" of the neural system components, their interconnections, and the transformed neural signals. In the case of the nervous system, the level of system integration/decomposition (i.e., the chosen level for examining the natural hierarchy of functional organization) may range from the subcellular to the behavioral. In Sections 8.1-8.3, we focus on the level of individual neurons (viewed as the basic operational unit), which receive certain input signals and generate, in
a causal manner, certain output signals. Accurate understanding of the functional properties of the "neuronal unit" facilitates the study of neuronal ensembles and networks, leading to "integrated" neural system representations in Section 8.4. Two types of neuronal systems will be studied from the modeling viewpoint: those pertaining to the process of generation of postsynaptic and action potentials by individual neurons, and those concerning the transformation of sequences of action potentials (and related events, such as population spikes) between neurons in neuronal ensembles. The former type of system performs encoding of input information into a sequence of binary events (action potentials) by individual neurons, and the latter type of system performs processing of this encoded information by neuronal ensembles. Both types of systems exhibit significant nonlinearities that are essential for neurophysiological function. Therefore, the advocated methodology appears to be suitable for the task, after appropriate adjustments for the particular signal modality of neuronal spike trains (sequences of action potentials represented as binary variables).

The chapter begins with a general model form (proposed as an alternative to the Hodgkin-Huxley (H-H) model of excitable membranes) that is suitable for practical data-based modeling of action and postsynaptic potentials. This model form is simpler than the classic H-H model and can be applied to both voltage-dependent and ligand-dependent processes at the synapse or the axon hillock. This general model form is used in Section 8.2 to integrate the functional dynamics of a single neuron. In Section 8.3, the methodological subtleties attendant to the use of point-process (spike) inputs are discussed and examples are given from actual applications. In Section 8.4, the modeling problem of signal transformation and processing in neuronal ensembles is addressed in the Volterra-Wiener context using the advocated PDM modeling approach. It is shown that this methodological framework is unique in its power to offer a practicable solution to the difficult problem of multiunit neuronal processing in the true nonlinear dynamic context.
8.1 A GENERAL MODEL OF MEMBRANE AND SYNAPTIC DYNAMICS
One of the most popular modeling problems in neurophysiology concerns the ion-channel mechanisms that give rise to transmembrane currents and potentials, starting with the seminal work of Hodgkin and Huxley on the generation of action potentials at the squid axon membrane [Hodgkin & Huxley, 1952]. The H-H model is highly celebrated (and was honored properly with a Nobel prize in 1963) because it delineated quantitatively the specific mechanisms of distinct biophysical processes in excitable membranes (leading to the key notion of ion channels) in addition to describing accurately the experimental data. Thus, it influenced profoundly the scientific way of thinking about these biophysical processes and illuminated a productive path for numerous subsequent studies, representing an outstanding example of the utility of mathematical modeling in biology. Nonetheless, the H-H model also has had several critics and detractors who questioned its validity under different sets of conditions than the ones considered by the famous duo. Numerous attempts have been made to simplify the original form of the H-H model (in terms of its postulated nonlinearities and the number of free parameters) in order to make it more useful in the practical context of neuronal ensembles and more accessible mathematically to the broadest possible community of investigators. These attempts range from the simplistic "integrate-and-fire" model to elaborate phase-space analysis, and this subject matter remains an active area of research 50 years after the seminal publication of the
H-H model. The H-H model has been examined also under conditions of white-noise stimulation in the Volterra-Wiener modeling context [Guttman et al., 1974, 1975, 1977; Courellis & Marmarelis, 1989].

In this section, a general parametric model form is presented as a simpler alternative to the H-H model that can be also extended to synaptic dynamics. The motivation for introducing this novel model form is to reduce somewhat the complexity of the H-H model and extend its applicability to synaptic dynamics, while maintaining the essential functional characteristics observed experimentally for the dynamics of postsynaptic and action potentials. Another motivation is the need for practical estimation of the dynamic characteristics of each ion channel from natural (broadband) input-output data without resorting to voltage-clamp experiments. This general model form aspires to facilitate the functional integration at the single-neuron level and eventually at the aggregate level of neuronal ensembles. Validation of this model relies on inductive, data-based modeling using broadband experimental data and, therefore, it is also examined in the nonparametric context of the Volterra-Wiener approach and its PDM modeling variant advocated in this book.

The key observation is that currently used voltage-clamping techniques do not allow the accurate measurement of the voltage-dependent conductances of the various ion channels (as claimed extensively over the last 50 years) even under pharmacological ion-specific blocking of channels. This is evident from the H-H equation, in which the required current for clamping at an imposed reference transmembrane voltage V depends not only on the various conductances at V but also on the partial derivatives of these conductances with respect to voltage (evaluated at V). For instance, after blocking of the sodium channel with TTX, we expect to observe a clamping potassium current ΔI:

$$\Delta I = \left[ g_K(V) + \frac{\partial g_K(V)}{\partial V}(V - V_K) \right] \Delta V \tag{8.1}$$
where ΔV = V - V_m, V_m is the membrane resting potential, V_K is the potassium equilibrium potential, and g_K is the voltage-dependent conductance of the potassium channel. Note that Equation (8.1) is also simplified in that it neglects the effect of other possible nonsodium channels (that are not blocked by TTX) and the so-called gating current. Although numerical techniques can be construed to obtain an estimate of g_K(V) from voltage-clamp measurements governed by Equation (8.1), this is not a trivial matter in practice and is subject to many possible sources of error, in addition to requiring long and laborious experimentation. This fact provides the rationale for the proposed alternative model that allows the practical estimation of the dynamics of individual ion channels directly from the data under conditions of natural operation. The general form of the proposed model is
$$\dot{y} + ay = F(v_1, \ldots, v_M)\,y + b_0 - z \tag{8.2}$$

where dot denotes derivative and the negative feedback term z(t) is given by

$$\dot{z} + \gamma z = R(y) \tag{8.3}$$
where y denotes the "output" transmembrane potential, b o is a "basal" value of transmembrane current related to intrinsic ion pumps, z denotes the "re setting feedback" (responsible for the refractory behavior of the neuron and related to channel inactivation mecha-
410
MODELING OF NEURONAL SYSTEMS
nisms), R denotes a sigmoidal function with high slope (a threshold operation), and F denotes a bounded nonlinear function ofthe individual ion-channel variables: V/t) = gi 0 x(t)
(8.4)
where ⊗ denotes convolution between the input current x(t) and an impulse response function g_i(τ) characteristic of the ith ion channel. The latter describes the dynamics of each of the M ion channels. Critical in this formulation is the selection of the function F, since it is responsible for defining the nonlinearities of the voltage-dependent ion-channel conductances that lead to generation of an action potential. For synaptic dynamics, the terms of this model attain different meaning and interpretation.

As an illustrative example, consider two channels (e.g., sodium and potassium) with dynamics described by the simple biexponential functions

$$g_1(\tau) = a_1\left(e^{-b_1\tau} - e^{-c_1\tau}\right) \tag{8.5}$$

$$g_2(\tau) = a_2\left(e^{-b_2\tau} - e^{-c_2\tau}\right) \tag{8.6}$$
where c_1 > b_1 > c_2 > b_2 (b_1 and c_1 correspond to the faster sodium channel). We consider also F to be a sigmoidal function acting on the difference (v_1 - v_2) as

$$F(v_1, v_2) = \frac{\beta}{1 + \exp[-(v_1 - v_2 - \theta)]} \tag{8.7}$$

Finally, we consider also a sigmoidal function for the "resetting" feedback function R:

$$R(y) = \frac{\rho}{1 + \exp[-\lambda(y - \zeta)]} \tag{8.8}$$

where λ attains a high value (sharp slope of the thresholding operation). This model can reproduce quite accurately the time course of the action potential generated at the axonal membrane of the squid or elsewhere (with proper adjustment of the parameters). The main functional features of the model are: (1) the activation phase of spiking suprathreshold behavior (that generates the action potential) due to properly timed negative conductance (a - F), and (2) the inactivation phase of refractory behavior due to the negative suprathreshold feedback z. For subthreshold behavior, the feedback component z is not active and the voltage-dependent "conductance" [a - F(v_1, v_2)] remains positive, so that the resulting output

$$y(t) = y(0) \cdot \exp\left\{ -\int_0^t \left[ a - F(v_1, v_2) \right] dt' \right\} \tag{8.9}$$
does not give rise to an action potential. However, when the integral in the exponent of Equation (8.9) becomes positive (negative "conductance"), then an action potential results that triggers the negative feedback loop and causes subsequent hyperpolarization and refractory behavior. The sigmoidal functions F and R secure stable operation and bounded output. Their sigmoidal form can be attributed to the self-limiting process, intrinsic in biophysical processes, where the substrate for the subject species gets depleted (Bernoulli equation). The negative feedback mechanism is attributed to the inactivation of the sodium channel. The channel dynamics (or kinetics) described by the impulse response functions {g_i} may attain any form consistent with the data, but the dynamics for y and z are set to first order (capacitive effects) to simplify the formalism without sacrificing much in generality. For a step input, the variables v_1 and v_2 will reach steady asymptotic values that will trigger repetitive action potentials, if above threshold. The frequency of these action potentials will increase with increasing step input values until a limit is reached due to the saturation of the sigmoidal function F.

The claim that this model can be used for estimation of individual channel dynamics is due to its mathematical relation with the Volterra kernels of this system (or their equivalent PDMs) that can be directly measured from the input-output data in the subthreshold region (as shown below). Furthermore, the claim of applicability to various transmembrane processes and synaptic dynamics is based on the fact that any number of channels of arbitrary dynamics can be used with various nonlinear relationships in this formulation. Note that the feedback term can be dropped for synaptic dynamics (if no fast inactivation mechanisms are present), or can be relaxed to a much "softer" feedback mechanism. The generality of this model form is appealing if we can show that the individual channel characteristics can be related to kernel or PDM measurements, as intended.
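The following is a minimal simulation sketch of Equations (8.2)-(8.8) by forward-Euler integration; all parameter values are illustrative assumptions (not taken from the text) and would need tuning to reproduce realistic action-potential waveforms:

```python
import numpy as np

def simulate(x, dt=0.01, a=1.0, b0=-0.7, gamma=0.5,
             a1=2.0, b1=1.0, c1=4.0, a2=1.0, b2=0.2, c2=0.8,
             beta=3.0, theta=1.0, rho=50.0, lam=20.0, zeta=2.0):
    tau = np.arange(0, 5, dt)
    g1 = a1 * (np.exp(-b1 * tau) - np.exp(-c1 * tau))   # fast channel, Eq. (8.5)
    g2 = a2 * (np.exp(-b2 * tau) - np.exp(-c2 * tau))   # slow channel, Eq. (8.6)
    v1 = np.convolve(x, g1)[:len(x)] * dt               # v_i = g_i (*) x, Eq. (8.4)
    v2 = np.convolve(x, g2)[:len(x)] * dt
    F = beta / (1 + np.exp(-(v1 - v2 - theta)))         # Eq. (8.7)
    y = np.zeros(len(x)); z = np.zeros(len(x))
    for n in range(len(x) - 1):
        R = rho / (1 + np.exp(-lam * (y[n] - zeta)))            # Eq. (8.8)
        y[n + 1] = y[n] + dt * ((F[n] - a) * y[n] + b0 - z[n])  # Eq. (8.2)
        z[n + 1] = z[n] + dt * (R - gamma * z[n])               # Eq. (8.3)
    return y

y = simulate(np.full(2000, 1.5))   # step input; spiking depends on the tuning
```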
Let us now examine this model in the context of Volterra analysis. For the two-channel example, consider the Taylor expansions of the sigmoidal functions:

$$F(v_1 - v_2) = \phi_0 + \phi_1 (v_1 - v_2) + \phi_2 (v_1 - v_2)^2 + \cdots \tag{8.10}$$

$$R(y) = r_0 + r_1 y + r_2 y^2 + \cdots \tag{8.11}$$
The analysis is simplified considerably in the subthreshold region, where the zero-order and first-order Volterra kernels of this system are given by
$$k_0 = \frac{b_0}{a - \phi_0} \tag{8.12}$$

$$k_1(\tau) = k_0 \phi_1 \int_0^\tau e^{-p\lambda}\left[ g_1(\tau - \lambda) - g_2(\tau - \lambda) \right] d\lambda \tag{8.13}$$
where p = a - φ_0. The first-order Volterra kernel attains a simpler form in the Laplace domain:

$$K_1(s) = \frac{k_0 \phi_1 \left[ G_1(s) - G_2(s) \right]}{s + p} \tag{8.14}$$

where G_1(s) and G_2(s) are the Laplace transforms of g_1 and g_2, respectively. Note that the zero-order Volterra kernel k_0 is the resting potential and known in practice. Also, the basal value b_0 can be measured or estimated separately. Therefore, Equation (8.12) allows estimation of the "rate constant" (a - φ_0) = b_0/k_0. This rate constant is important because it appears as a pole of the first-order Volterra kernel, as indicated in Equation (8.14). Thus, the key function G(s) = [G_1(s) - G_2(s)] can be estimated from
Equation (8.14) within a scalar φ_1, using the first-order Volterra kernel estimate that is obtained from the subthreshold input-output data (which ought to be broadband from "natural" operation of the system). The waveform of the key function g(t), which is the inverse Laplace transform of G(s), contains the critical information regarding the dynamics of the individual ion channels partaking in the generation of the action potential (or postsynaptic potentials in the case of synapse models). The scaling by the parameter φ_1 does not affect this waveform. If pharmacological isolation of individual ion channels is possible, then each individual channel response function g_i(t) can be determined through this relatively simple procedure. For instance, in the aforementioned example of the two channels (sodium and potassium), blocking of the sodium channel with TTX allows estimation of the potassium-channel dynamics g_2(t), and blocking of the potassium channel with TEA allows estimation of the sodium-channel dynamics g_1(t), provided, of course, that no other ion channels are present. These measurements are based on our ability to obtain reliable estimates of the first-order Volterra kernel under subthreshold conditions.
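In discrete time, inverting Equation (8.14) for g(t) requires only a numerical derivative of the estimated kernel; a minimal sketch (hypothetical names) follows:

```python
import numpy as np

def key_function(k1, dt, p, k0_phi1=1.0):
    """Recover g(t) = g1(t) - g2(t), within the scalar phi_1, from a sampled
    first-order kernel estimate k1 by inverting Equation (8.14):
    G(s) = (s + p) K1(s) / (k0 phi_1), i.e., g = (k1' + p k1) / (k0 phi_1),
    with the rate constant p = a - phi_0 = b0/k0 obtained from Equation (8.12)."""
    return (np.gradient(k1, dt) + p * k1) / k0_phi1
```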
~\ [K\(Sj)G(S2) + K\(S2)G(Sj)] }/(St + S2 +p)
(8.15)
or in the time domain: k2(Tb T2) = ko4>2 4>1 +-
2
Tm
fo
e-PAg( Tl
-
A)g(T2 - A)dA
fTm e-PA[kl(TI 0
A)g(T2 - A) + k l(T2 - A)g(TI - A)]dA
(8.16)
where Tm = miru Tl' T2). The second-order Volterra kernel measurement under subthreshold broadband conditions can be used to validate the structure of our model (since its dependence on the previously estimated kl and g is specific to the structure of the model). This validation process entails the least-squares fitting ofthe measured k2(Tb T2) to the two components on the right-hand side ofEquation (8.16) that can be constructed from the known p, g, and kl . This least-squares fitting (if it validates the postulated model structure) yields estimates ofthe scalar quantities (k 04>2) and 4>1. Since k o is already estimated, estimates of the parameters 4>1 and 4>2 become available, which can be used to estimate the two unknown parameters (ß and fJ) ofthe sigmoidal nonlinearity F given by Equation (8.7). This relatively simple procedure of estimating the model components in the subthreshold region (using Volterra kernel measurements up to second order, obtained from the broadband input-output data) can be extended now to the suprathreshold region in order to estimate the characteristic parameters '}', p, A, and ( of the feedback term z. There are two routes to this goal: (1) the use of Volterra kernel measurements from data in the suprathreshold region, and (2) the direct least-squares fitting of Equation (8.3) using estimates of z(t) from suprathreshold data; since we have from Eq. (8.2)
z = F(Vb V2)Y + bo - ay -
y
(8.17)
8.1
413
A GENERAL MODEL OF MEMBRANE AND SYNAPTIC DYNAMICS
Recall that the right-hand side of Equation (8.17) is known because all its parameters have been already estimated from subthreshold data. Thus, fitting of the integral equation
z(t) = fe-rrRfy(t - T)]dT
(8.18)
o
where z(t) and y(t) are known, can yield estimates of l' and the parameters of the sigmoidal nonlinearity R. This fitting procedure also validates (or does not) the feedback structure of the postulated model. Significant discrepancies will necessitate, of course, modification ofthe postulated feedback structure ofthe model. The use ofVolterra kerne1measurements in the suprathreshold region requires the following analytical expressions that result from Volterra analysis of the model with the feedback term active (suprathreshold behavior): kg= bo-ro 'YP
(8.19)
kf(T) = k04>I(h(A)g(T- A)dA
(8.20)
where the superscript "s" denotes "suprathreshold," and h(A) is the inverse Laplace transform of (s + ,,) H(s)
(8.21)
= (s + p)(s + Y) + Cl
with the constant Cl given by c I = L""' nkn-Ir 0 n
(8.22)
n=l
For the second-order Volterra kernel in the suprathreshold case, the analytical expression in the Laplace domain is Ki(Sh sz) = {
~I
(SI
+ Sz + Y>[Kf(sl)G(sz) + Kf(sz)G(SI) +
- czKf(sl)Kf(sz)
2~:z G(SI)G(SZ) J
}/[(Sl + Sz + P)(SI + Sz + y) + cd
(8.23)
where: 00
""'
,
2 n. C2 = ~ ( _ 2)'2 rnkÖn-2 n .
(8.24)
The complexity of the second-order Volterra kernel expression makes it unlikely that it will be put into practical use. There fore, the fitting procedure of Equations (8.17) and (8.18) appears more likely to be used in the suprathreshold region.
414
MODELING OF NEURONAL SYSTEMS
With regard to synaptic dynamics, the general model retains the same form but without the sharp feedback term of the inactivation process (feedback is possible but of milder functional form). The individual processes of presynaptic release and postsynaptic binding of various neurotransmitters are represented by the functions {gi} that describe individual "synaptic-channel" dynamics. The activation nonlinearity F describes both the individual self-limiting nonlinearities for each neurotransmitter (typically of the compressive sigmoidal type) as well as the possible nonlinear interactions among different neurotransmitter processes (including "synaptic channels" dependent on second-messenger processes). The output y(t) represents the postsynaptic potential induced by a presynaptic voltage or current input x(t). The analysis of this model replicates the procedure outlined previously for the subthreshold region of the axonal membrane. In the following section, we employ the general model form obtained in this section to examine the functional integration of a single neuron and to derive a general, yet realistic and practicable, model representation of this integration process that is fundamental to the function of the nervous system.
8.2
FUNCTIONAL INTEGRATION IN THE SINGLE NEURON
We focus on the neuron as the basic operational neural unit from the viewpoint of signal transformation and coding. The term "transformation" is used to indicate the dynamic process of combining multiple channels of input information to produce a composite intracellular potential or current at the site of the axon hillock where the generation of action potentials takes place-the latter representing the "coding" of neural information. Note that the representation ofinformation flow in the nervous system (neural signals) is either in the form of "graded potentials" (GP) or "action potentials" (AP). The latter are often referred to as "spikes" (i.e., impulse-like waveforms) and it is accepted that their information content is in the interspike time intervals and not in their magnitude or shape. The graded potentials are continuous-time signals, whose values (like those of the APs) are measured relative to the "resting potential" of the neuron. The generation and transformation of these potentials is accomplished through complex biophysical mechanisms of excitable tissues (ionic currents arising from changing membrane permeabilities), release and binding of neurotransmitters in synaptic junctions, electrical coupling in "tight" junctions, and passive "electrotonic" spread of potentials. These biophysical mechanisms have been studied extensively and are the subject of a vast literature. In this section, we limit ourselves to the fundamentals of neuronal structure and function that are common (with variations in size, geometry, and specific functional attributes) in most neurons. The "prototypical" neuron is comprised of a cell body (soma), a tree-like structure of receiving fibers (dendrites), and a long transmitting fiber (axon) with occasional branches (collaterals). The axon is attached to the soma at the "axon hillock" and, along with its collaterals, ends at synaptic terminals (boutons) that are used to relay information onto other neurons through "synaptic junctions." The soma contains the nucleus and is attached to the trunk ofthe dendritic tree from which it receives incoming information. The dendrites are conductors ofinput information to the soma (input ports) and usually exhibit a high degree of tree-like arborization (up to several thousand branches). Input information arrives at various synaptic sites in the dendritic tree and the soma through axonal terminals of other neurons or electrical (tight) junctions. The axonal terminals release chemical neurotransmitters into the synaptic cleft upon stimulation by arriving puls-
8.2
FUNCTIONAL INTEGRATION IN THE SINGLE NEURON
415
es (the branched-out APs of presynaptic neurons), which diffuse across the synaptic gap (typically 200 A wide) of the synaptic junction and induce the generation of a "postsynaptic potential" (PSP) at the postsynaptic site ofthe dendrite or soma by binding onto neurotransmitter-specific receptors. The neurotransmitter can be excitatory or inhibitory, which determines the relative polarity between presynaptic and postsynaptic potentials. The PSP propagates away from the synaptic junction in an electrotonic manner (i.e., in a way analogous to current flow in passive electric cables), leading to attenuation of amplitude and spreading ofthe waveform. The resulting dendritic potentials (DPs) merge at the numerous nodes of the dendritic tree and eventually arrive at the soma where, combined with potentials generated directly at the soma by possible somatic junctions, they produce the composite intracellular potential or current at the site of the axon hillock. Note that the merging ofDPs is, most likely, a nonlinear operation because these potentials are due primarily to migration of sodium ions, which may lead to sigmoidal saturation. Although passive dendritic fibers are more common, active (or semiactive) ones have been found that have voltage-dependent permeabilities and allow at least partial regeneration of the propagating signal, leading to less attenuation of amplitude. This partial regeneration property of dendritic membrane should be contrasted with the full regeneration property ofaxonal membrane that maintains the amplitude of the propagating AP along the axon. There are two types of neurons: those that generate APs at the axon hillock through a threshold-trigger mechanism, and those which do not. The latter transmit through their axon a graded potential only over very short distances «0.3 mm), owing to rapid attenuation of GPs with distance. Therefore, the use of GPs for interneuronal communication is limited to neurons with very short axons (e.g., photoreceptors and horizontal and bipolar cells in the retina). On the other hand, APs can propagate practically unaltered over long distances (especially in the myelinated axons) because they are regenerated by the excitable membrane, thus offering an effective mode of communication in the nervous system. Having established in the previous section the general model form for the two key transformational processes in the neuron (i.e., the generation oftransmembrane and postsynaptic potentials in response to transmembrane currents and presynaptic potentials, respectively), we now proceed with the ambitious goal of modeling the entire process of functional integration in the single neuron. This entails the quantitative description of the manner in which a multitude of incoming synaptic inputs at the dendrites are integrated within the neuron to generate an action potential output at the axon hillock. Accomplishment of this task requires not only quantitative understanding of the two key transformational processes mentioned above, but also quantitative understanding of the process of propagation ofpostsynaptic potentials through the dendritic branches and their integration at the soma prior to arrival at the axon hillock. In keeping with the basic tenet of this book for balancing fidelity and completeness of description with practical considerations, the suitability of simplistic notions of linear "cable theory" is questioned for the purpose of accurate modeling of dendritic propagation and integration. 
We hold the view that this process is rather complicated, not only because it is govemed locally at each dendritic patch by nonlinear dynamic equations of the general form previously discussed (some dendrites may also have excitable membranes, and neuromodulators may also exert influences away from the synaptic clefts), but also because ofthe extremely complicated geometry ofthe dendritic arborization. On the other hand, we see no compelling practical motivation to examine in detail the enormous complexity of the dendritic geometry, and we advocate instead the notion offunctional
416
MODELING OF NEURONAL SYSTEMS
integration, whereby the modeling problem is cast as the "mapping" of all synaptic inputs
onto the transmembrane current entering the axon hillock. This process is nonlinear and dynamic and, therefore, amenable to our methodology, which offers both practical advantages and a realistic description of somato-dendritic integration. Consequently, the advocated modeling approach appears to be suitable if our objective is the quantitative description of how a single neuron acts as integrator of incoming synaptic information to produce (or not) an action potential at any given point in time. Clearly, ifthe objective is the understanding of the detailed biophysical mechanisms subserving this integrative process, then detailed modeling analysis must be applied to each dendritic or somatic 10cation over the time course of the process. The objective of functional integration at the single-neuron level is pursued by first defining the synaptic inputs. There are numerous postsynaptic potentials (both excitatory and inhibitory) that are generated at the synaptic junctions in response to incoming axonal impulses from many terminals of various presynaptic neurons (ignoring for now the possible inputs from electric gap junctions). Let us consider first as an example the case of a single presynaptic neuron with axonal spike-train activity denoted by x(t) = LAip(t - D i)
(8.25)
i
where Ai is the amplitude of the ith presynaptic pulse p(t - D i) (of fixed form) arriving at the ith axon terminal, and D, is the respective time delay due to propagation through different lengths ofaxon terminals. This presynaptic potential induces a postsynaptic potential z;{t) upon transformation by a nonlinear dynamic operator of the form given by Equations (8.2)-(8.4) that describe the specific synaptic transmission. Subsequently, the ith postsynaptic potential Zi(t) propagates along the dendritic branches, merging with other propagating postsynaptic potentials to arrive eventually at the soma in the form of a single aggregate potential. As mentioned earlier, the propagation and merging process from the dendrites and soma to the axon hillock is very complicated and can be lumped for practical reasons into a single nonlinear dynamic operator S that acts on all N synaptic inputs [A}x(t - D}), ... , ANt(t - D N ) ] to produce the single signal s(t) that represents the input signal at the axon hillock and generates an action potential (or does not) according to the nonlinear dynamic relation defined by Equations (8.2)-(8.4). The operator S acts on the N synaptic inputs in a manner determined by the geometry and biophysics of the full dendritic arborization, the soma, and the morphology of the synaptic clefts. In the interest of simplifying the modeling task, we may incorporate the process by which the various presynaptic potentials are produced at the axon terminals of a presynaptic neuron and define an operator Q that represents the cascaded operations of the axon terminal distribution and dendritic integration as a single operation on the activity {x(t)} at the axon hillock of the presynaptic neuron: s(t) = Q[x(t - T); 0
:5
T :5 p,]
(8.26)
The operator Q can be modeled as a Volterra system receiving input x(t) from the presynaptic neuron to produce the intracellular current s(t) that represents the input to the axon hillock. The practical advantage of this formulation is that it allows estimation of the operator Q (as a Volterra-type model) from actual data. When cascaded with the model of Equations (8.2)-(8.4) for the generation of action potentials, this formulation provides
8.2
FUNCTIONAL INTEGRATION IN THE SINGLE NEURON
417
complete and efficient representation of realistic functional integration by a single neuron. Clearly, the existence of more presynaptic neurons does not alter any of these methodological facts but simply requires use of the multiinput modeling methods of Chapter 7 to describe the multiinput case: s(t) = QMl:Xl(t - Tl), ...
,x~t -
TM); Tb ... , TM E [0, J.L]]
(8.27)
The application ofthe PDM modeling concept to the concatenation ofthe processes of somatodendritic integration and spike (action potential) generation has given rise to the notion of "neuronal modes" for information processing in the nervous system, as discussed in the following section along with the related issue of"trigger regions." Since the inputs provided by the presynaptic neurons are in the form of spike trains (sequences of action potentials), the modeling task can be adjusted (for simplification ofprocessing) to the case of systems with point-process inputs discussed in Section 8.3.
8.2.1
Neuronal Modes and Trigger Regions
We conceptualize each neuron as a "black box" that receives certain input signals and produces certain output signals on the basis of a nonlinear dynamic "rule" described by an explicit mathematical expression (model). The use ofthe black-box concept for the representation of a single neuron allows us to bypass the complex biophysical mechanisms that are active at the subcellular level, and simply concentrate on the input-output transformation. This, in turn, allows the development of reduced-complexity models for aggregates of neurons and the study of their functional properties. In developing this mathematical model, we seek a compromise between mathematical complexity and biological relevance in order to obtain tractable mathematical formulations of the problem while preserving the essential functional features that have been observed experimentally. Our goal is to search for unifying operational principles that explain the largest possible amount of current experimental evidence and conceptually organize our understanding of the system function. There is no doubt that nonlinearities are omnipresent in neural systems and their role is essential for many aspects of their function. Compressive and decompressive nonlinearities observed in sensory receptors, certain types of facilitatory or suppressive/depressive interaction, synaptic gap transmission, and the generation mechanism of action potentials are some of the most widely recognized examples. The challenge is the actual identification ofthese dynamic nonlinearities from data, and their modeling in a manner that is practical and meaningful from the point of view of advancing our understanding of neural function. Finally, one should note the unavoidable presence of extraneous and intrinsic noise that places the modeling problem in a stochastic framework. The modeling concept of "neuronal modes" (NMs) is introduced in order to provide concise and general mathematical representations of the nonlinear dynamics involved in signal transformation and coding by neuronal systems [Marmarelis & Orme, 1993]. This modeling concept is based on the judicious selection of the principal dynamic modes of a neuronal system to achieve a reasonable balance between mathematical complexity and neurphysiological relevance. The proposed modeling approach decomposes into two operational steps the nonlinear dynamics involved in the transformation of the input signal into the output signal. The first operational step involves the aforementioned NMs that perform all linear dynamic (filtering) operations on the input signal. The second opera-
418
MODELING OF NEURONAL SYSTEMS
tional step employs a multiinput static nonlinearity (MSN) to produce a continuous output, or a multiinput threshold-trigger (MTT) operator (i.e., a binary operator with multiple real-valued operands that are the outputs of the NMs) with a refractory mechanisrn that produces the sequence of output spikes (action potentials). The two operational steps involving NMs and MTT are depicted in the block-structured model ofFigure 8.1, which is the modular form of the PDM model adapted to the case of spike-output systems [Marmarelis, 1989c; Marmarelis & Orme, 1993]. The NMs are the principal dynamic modes (PDMs) ofa neuronal unit or system, when the PDM analysis procedure presented in Section 4.1.1 is applied to the neuronal input-output data. Thus, when the input is the axonal activity of a presynaptic neuron and the output is the generated sequence of action potentials at the axon hillock of the postsynaptic neuron, then the NMs provide a succinct representation of the functional characteristics of neuronal information processing. The same concept applies when the input is provided by the graded potential of a sensory unit (where the output is the encoded sensory information), or in the case ofneurosensory transduction, where the input is the continuous sensory stimulus. The NM model shown in Figure 8.1 follows on the PDM model, where the output now can be either continuous (a graded potential or probability of firing) or a point process (a sequence of action potentials or "spike train"). The latter output modality requires appending a threshold-trigger operator at the output ofthe NM model, as discussed above. It is critical to note that the employed NMs are properly defined filters that capture the important dynamic characteristics of the system and reflect the integrated effect of all axodendritic and axosomatic synaptic inputs (including propagation effects on the formation of the transmembrane potential at the axon hillock preceding the generation of an action
x(n)
MODE 1
m,
MODE Z
mZ
MODE K mk
uk(n)
MULTI-INPUT THRESHOLD T[u1,u2,···,uk]
Yen) Figure 8.1 Block-structured model of a neuronal system with K neuronal modes (akin to the MPDFs of Sec. 4.1.1) that form a linear filter bank representing the important dynamic characteristics of the system. The multiinput threshold operator incorporates all system nonlinearities and generates the output spikes (multiinput threshold-trigger) [Marmarelis & Orme, 1993].
8.2
FUNCTIONAL INTEGRATION IN THE SINGLE NEURON
419
potential (or describe the transduction dynamics and conduction effects for a sensory system). The employed MSN or MTT operator represents all nonlinear characteristics of the system, which incorporate any nonlinearities involved in the creation of the aforementioned composite potential at the axon hillock, as weIl as the possible generation of the action potential. We will now illustrate the use and the meaning ofNMs through examples. Consider as a first example a simulated neuronal system that exhibits two distinct types of dynamics: a (leaky) integrating NM and a (slowly) differentiating NM, shown in Figure 8.2 as impulse response functions gl and g2, respectively, given by the expressions
Tl
12 [exp(-T/6) -
exp(-T/2)]
(8.28)
g2(T) = -(T- 10)exp[-(T- 10)2/16]
(8.29)
gl( T) =
where the time unit is 1 ms. Consider also a two-input static nonlinearity that has a sigmoidal characteristic in terms ofa linear combination ofthe two NM outputs VI and V2: (8.30)
y = 1 + a exp(-ßIVI - ß2V2)
where y is the system (continuous) output. For a = 0.4, ßI = 0.2, and ß2 = 0.4 the resulting (hyper) sigmoid surface is shown in Figure 8.3. The first-order and second-order Volterra kemels of this nonlinear system are shown in Figure 8.4, and they reflect the combined dynamics of the two NMs gl and g2 under the (hyper) sigmoid nonlinear transformation of Equation (8.30). Kemels of this approximate form have been measured in actual neu-
IMPULSE RESPONSE F'UNcriONS or TWO DYNAMIC Moors
]
0.0
15.0
45.0
30.0
60.0
75.0
TIME
Figure 8.2
1989c].
The impulse response functions of two NMs, 91 (trace 1) and 92 (trace 2) [Marmarelis,
420
MODELING OF NEURONAL SYSTEMS (HYPtR)SlCMotD STATte NONLINFAalTY
(Z HN)
Figure 8.3 The form of the (hyper) sigmoidal static nonlinearity defined by Equation (8.30) [Marmarelis, 1989c].
ronal systems and can be used to detennine the NMs in each particular case. If the NMs are detennined correctly, then the form ofthe associated static nonlinearity can be easily estimated from the experimental data . Wegenerally expect a great variety ofNMs and static nonlinearities in the nervous system, depending on individual neuron characteristics. For instance, a neurophysiological phenomenon known as "shunt inhibition" may result in amplitude modulation of one NM byanother, leading to bilinear tenns in the exponent of a (hyper) sigmoid nonlinearity as y = ----:----:--------;1 + Cl' exp(-ßIV] - ßzvz + 'YV,vz)
(8.31)
shown in Figure 8.5 (for Cl' = 0.5, ß, = 0.5, ßz = 0.5, y = 0.1) to exhibit regions ofmutual facilitation and suppression. Another example could be the case of full-wave rectification ofboth NMs, giving rise to quadratic tenns in the exponent ofthe (hyper) sigmoid nonlinearity and shown in Figure 8.6. Full-wave rectification in the presented phenomenological context may arise from the combined synaptic inputs by two other neurons that have previously applied half-wave rectification to their respective input . In its simplest form, the AP-generation mechanism can be viewed as a threshold-trigger device that produces aspike whenever its input exceeds a certain threshold value. There must be a "reset" mechanism in this threshold trigger that returns the output to the resting level after each spike for a short period of time following each firing (refractory period). When this threshold trigger is combined with the muItiinput static nonlinearity MSN , the MTT operator results, which can be described by an M-dimensional binary function 1
Y=
1
"2 + "2 sgn {j{vj, vz, .. . , VM) -
8}
(8.32)
8.2
FUNCTlONAL INTEGRATION IN THE SINGLE NEURON
421
rlR\ 1 OftOU 1l( RN[l
' 00
oeoo oeoo 0 '00 0200 00 -0200 -0'" - 0 '"
-oeoo ., 00 00
"0
.. 0
.100
100
,,"0
r...
Figura 8.4 The first-order and second-order kemels of the nonlinear system defined by Eqs. (8.28)-(8.30) that has the NMs of Figure 8.2 [Marmarelis, 1989c].
wherejiv., V2, . . . , V ,\.f) is the MSN with M inputs, () is a fixed threshold , and "sgn" is the signum function . In other words , MTT will produce aspike whenever the combination of NM output values ( VI> V 2' . .. , VM) is such that./{vl> V2, . . . , V M) ? (). These trigger values of ( V I> V 2, .. . , V M) define "trigger regions" that are demarcated by the solutions of the equation ./{VI> V2"
' "
VM) - () =
0
(8.33)
The solution s ofEquation (8.33) are the "trigger lines" whose form determines the required (minimum) order of the model (see Section 8.2.2). In actual applications, these "trigger regions " (TRs) can be determined experimentally by computing the values of NM output s ( V I> V 2, • • • , VM) for which aspike is observed in the output ofthe neuron . The locus of these values will form an estimate of the TR of the system. This is, of course , practically possible only if a relatively small number of NMs can span effectively the dynamics of the system under study. This general formulation of the modeling problem for spike-output systems has important implications in the study of neuronal systems. A spike-generating unit (neuron) is seen as a dynamic element that codes input information into a sequence of spikes where the exact timing of the spikes contains the transmitted inform ation . This coding operation may be defined by a small number ofNMs and by the TR ofthe unit. This representation leads to a general and, at the same time, parsimonious description of the nonlinear dynamics of a neuronal unit. These units can be interconnect ed to form neuronal aggregates with specifiable functional characteristics. These aggregates may be composed of classes of interconnected units , with each class characterized by a specific type of representation and connecti vity. Let us now see how these ideas apply to the example discussed above. When the threshold-trigger operator (with threshold () = 0.8) is appended to the output of the (hyper) sigmoid shown in Figure 8.3, then the MTT characteristic shown in Figure 8.7 results (i.e., the trigger line is a straight line). If the same is done for the MSN function shown in Figure 8.6, then the MTT characteristic shown in Figure 8.8 results. Note that the same MTT characteristic will result for all MSN surfaces that have the same intersection line(s) with the threshold plane, i.e., they yield the same solution for Equation
422
MODELING OF NEURONAL SYSTEMS STATIC NONLINEARITY SHOWINC fACILITATION 6 OCCLUSION
ta-
1 ' 1 + oexp(-~.v. -/f.,tJ2 + 1 VI V., )
Figure 8.5 The form of the static nonlinearity defined by Equation (8.31), exhibiting regions of mutual facilitation and suppression [Marmarelis, 1989c].
STATIC
----...............
.O.L.N~A.ITY
SHOW.MC TVO-MODE aECTlrlCATION
----.. ~~
Figure 8.6 The form of the static nonlinearity containing quadratic terms of the NM outputs v, and in the exponent of the (hyper) sigmoidal expression (full-wave rectification characteristics) [Marmarelis, 1989c].
V2
8.2 THaESHOLDED
Figu re 8.7
FUNCTlONAL INTEGRATION IN THE SINGLE NEURON
(HYP ER)S ICMO ID STAT IC NONL INEA aITY
(NTT )
The MTT characteristic for the MSN of Figure 8.3 [Marmarelis, 1989c).
TKSESHOLOEO STAT1C HORL IREAalTY SKOKIRC TKO'HOOE aEcT lflCA TIOR
,.,..-~
-~-'--"""'"
-
J
'l. .... ""- ...... -
.
r-«:
--------...-----",
.0 . . . . . . . . . _ _
,~
Figu re 8.8 The MTT characteristic for the MSN snown in Figure 8.6 [Marmarelis, 1989C).
423
424
MODELING OF NEURONAL SYSTEMS
(8.33). This clearly demonstrates that the detailed morphology of the MSN surface in the subthreshold or suprathreshold region has no bearing on the pattern of the generated spikes. Note that the NM outputs for this system, VI (t) and V2(t), provide finite-bandwidth information through time about the intensity and rate characteristics of the input signal, respectively. The MTT then indicates which combinations of intensity and rate values of the input signal will lead to the generation of aspike by the neuron. These combinations define the TR of the specific neuron. Furthermore, subregions of the TR can be monitored by "downstream" postsynaptic neurons using the temporal pattern of the generated spike train and their own nonlinear dynamic characteristics (NMs and MTT). This "higher-level" coding can provide the means for refined clustering of spike events that reflect specific "features" of the input signal, leading to specialized detection and classification of input information features and, eventually, to cognitive actions using these specialized features as "primitives." The "fanning-out" of information to postsynaptic units for the purpose of combinatorial higher-level processing is consistent with the known architecture of the cortex. In full awareness that this hypothesis of neural information processing is not yet tested and is still in a seminal stage, we nevertheless propose it as a plausible theory that exhibits some attractive characteristics; it incorporates the apparently complex nonlinear dynamics and the signal modalities found in neuronal systems in a manner that is consistent with current experimental evidence. We conclude with a simple example of signal encoding based on the presented ideas. Consider a pulse input, shown in Figure 8.9 along with the resulting NM outputs VI(t) and V2(t). Application of a threshold trigger on VI(t), V2(t), -V2(t), and V2 2(t) separately yields the spike trains shown in Figure 8.10 (traces, 1, 2, 3, and 4, respectively). These four
INPUT PULSE (11) AND INTERNAL VARIABLES Vl (12) AND V2 ('3)
··~----2
_--------J J
0.0
• • • • » »»»»• »• •
30.0
i
•
•
60.0
•
•
»
i
i
i
i
90.0
i
i
i
•
i
.T.. T·T,-,-,..,...,.....,..rTl
120.
150
TIME
Figure 8.9 A pulse input (trace 1) and the resulting internal NM outputs V1(t) (trace 2) and V2(t) (trace 3), corresponding to the NMs shown in Figure 8.2 [Marmarelis, 1989c].
8.2
FUNCTlONAL INTEGRATION IN THE SINGLE NEURON
425
(1 )(;~-SIJS"A!NEO.{2)ON- TRANSIENT,(3)Off - TRANSIENT,(4)ON/OFF" - TRANSS
._ _lNL
0.0
30.0
. .__
90.0
60.0
120.
J
150
TIME
Figure 8.10 Spike-train response to the pulse input for four common types of neuronal responses: (1) on-sustained, (2) on-transient, (3) off-transient, (4) on/off-transient [Marmarelis, 1989c].
cases emulate the "on-sustained," "on-transient," "off-transient," and "onloff-transient" responses of neurons, respectively, that are often observed experimentally. They code an event of significant magnitude and its duration, the onset of an event, the offset of an event, and the onset and offset of an event, respectively. Application of an MTT of the type shown in Figure 8.7 (with appropriate threshold) on a linear combination ofthe NM outputs [VI (t) + 2V2(t)] yields an "on-mixed" response shown in Figure 8.11 along with the input pulse and the combined continuous wavefonn ofthe NM outputs. The spike output encodes the onset ofthe stimulus (event) and its duration in the same time record. The higher-Ievel postsynaptic neurons must "know" that this is an "on-mixed" cell, otherwise they will mistakenly interpret the cell output as coding two distinct events. This can be accomplished by cross-examination of the outputs of several different types of neurons receiving and encoding the same input. To i1lustrate the idea of higher-Ievel decoding by monitoring different subregions of the MTT trigger regions for spike events (i.e., clustering of VI, V2 values resulting in a spike), we consider the presented "on-mixed" spike train of Figure 8.11 and plot the values (Vb V2) corresponding to an output spike on the (Vb V2) plane. The result is shown in Figure 8.12, where the abscissa is VI values and the ordinate is V2 values. The combinations of (Vb V2) values that lead to spike generation cluster in two groups. The upper-left cluster corresponds to high V2 values (encoding significant positive rate of change in the input signal) and the lower-right cluster corresponds to high VI values (encoding significant positive magnitude of the input signal). These two clusters could be delineated by higher-level neurons with appropriate connectivity and dynamic characteristics, leading to extraction of specific input features. This example i1lustrates the possibilities for higher-level decoding of complex input information afforded by this approach. The immense variety of individual neuron characteristics (in tenns of synaptic, histological, and biophysical characteristics) leads to a similar variety of NMs and associated
426
MODELING OF NEURONAL SYSTEMS
-------~
3
1
0.0
60.0
30.0
90.0
ieo,
120.
TiME
Figure 8.11 The pulse input (trace 1), combined internal variable (V1 + 2V2) (trace 2) and the "onmixed" spike output (trace 3) (details in the text) [Marmarelis, 1989c].
10.0 8.50
AE ~
7.00 N
>
5.50
"1
..
W
..J
m
4.00
~
2.50
)I(
0:=
..J
"
W I-
~
1.00 .;,.~
•
-0.500 -2.00 -3.50 -5.00
T
0.0
2.00
4.00
6.00
8.00
10.0
INTERNAL VARIABLE V1
Figure 8.12 Combinations of (v1 , v2 ) values leading to spike generation for the example of Figure 8.11. Two clusters of (V1' v2 ) trigger values appear [Marmarelis, 1989c].
8.2
FUNCTIONAL INTEGRATION IN THE SINGLE NEURON
427
static nonlinearities in different cases. Nonetheless, the presented modeling approach offers a common framework that allows a unified approach to the problem of signal transformation and coding by different types of neurons, and incorporates nonlinear dynamics and spike generation in a fairly general yet parsimonious manner. The actual identification of the NMs of a neuronal system from data remains a formidable task in practice. The complexity of this task is compounded by the presence of datacontaminating noise and experimentallimitations in applying and measuring the appropriate input-output signals. Nonetheless, the proposed modeling approach has been applied successfully and has been shown to provide accurate understanding of the functional properties of individual neurons that will facilitate the study of neuronal aggregates and, ultimately, the understanding of the functional organization of "integrated" neural systems of greater complexity. Note the coincidence between NMs and PDMs of a neuronal system. An example from a spider mechanoreceptor is provided below. Illustrative Examples. An example of an integrated PDM model with trigger regions of a single neuron with action-potential output is presented here, taken from the modeling studies of the cuticular mechanoreceptor in the spider leg that was discussed in Section 6.1.4 for intracellular current and potential outputs. The obtained first-order and second-order kemels of the cuticular mechanoreceptor are shown in Figure 8.13, when the output is the sequence of action potentials generated by the quasiwhite displacement (mechanical stain) stimulus [Mitsis et al., 2003c]. Three significant PDMs are found in this case and shown in Figure 8.14. The most significant PDM (corresponding to the highest eigenvalue of 2.28) exhibits high-pass characteristics, and the other two PDMs (corresponding to eigenvalues of 0.31 and 0.22) exhibit bandpass characteristics with resonant frequencies at approximately 110Hz and 180 Hz, with about 80% fractional bandwidths (secondary resonant peaks are also evident at about 40 Hz and 70 Hz in the second and first PDM, respectively). Dur working hypothesis is that the resonant peaks of the PDMs correspond to different ion channels with distinctive dynamics. We posit that the high-pass PDM corresponds to the fast sodium channel and reflects (in part) the artificial representation of the action potential with an impulse function. The main peaks of the second PDM (--180 Hz) and of the third PDM (--110Hz) may be corresponded to the sodium and potassium channels involved in the generation and resetting ofthe action potential. The secondary peak ofthe second PDM (--40 Hz) may be due to the calcium-activated potassium channel based on the time scale of the anticipated dynamics. This leaves the secondary peak ofthe first PDM (--70 Hz) as an open question, regarding its origin and correspondence to a biological mechanism. The PDM outputs {uI(n), u2(n), u3(n)} can be mapped in three-dimensional state space for the combinations that lead to the generation of an action potential (AP) [Marmarelis, 1989c; Mitsis et al., 2003c]. The subspace thus defined is called the "trigger region" of the system and is illustrated in Figure 8.15 for UI versus U2, U2 versus U3' and UI versus U3 separately (since visual display ofthe "trigger region" for all three PDM outputs is difficult) for a 10% "probability of firing" (PoF). 
The latter is computed on the basis of the data as the ratio of the histogram of PDM output combinations generating an AP to the histogram of all PDM output combinations in the data. Note that an absolute refractory period of 5 ms is determined from the data and used for histogramming (binning) purposes. The PoF surfaces are also shown in Figure 8.15 for the three pair combinations of PDM outputs. It is evident that the selection of a hard threshold to produce the "trigger region" (TR) from a PoF can be determined by the mean firing rate ofthe neuron (the 10%
428
MODELING OF NEURONAL SYSTEMS
0.5
(a) ,
I
(b)
,
0.55
I
1
0.4
0.5
0.3
-
0.2
~
0.1
oS .-
0
.Q.1
~.21 V
0.1
0
,
,
,
I
5
10 m[rnsec)
15
20
0.05
0
100
200
(a)
1.5...,
""
,,"
~O.5 .E
N .Mt
0
-G.5 ·1 0
--., ,, I ,, ",,~, , 1 I', , '.1 1 I" "
«lO
I
g300
I
u
" , " II ,A"
~
'~1 , I
tt~ V --
m1 [maee]
.-..
'..., 1
:::J
~
...
::..
~
e- 200
-,I , 'I
LL
20
"
100 1
20 0
500
~
cG)
_yl
..,.
10
,
~ ""'"
~
"" "-
400
'
~
I• ,,',I
500
(b)
,," I', , ~
I
400
500
,~
"..-.. "
300
FrIQl*'CY [Hz)
m2(msec]
0 0
<::::)
t 100
200
300
Frequency [Hz]
Figure 8.13 First-order and second-order Volterra kerneis of the mechanoreceptor for action potential output in the time domain (Iett column) and in the frequency domain (right column). Note the high-pass characteristics in both kerneis [Mitsis et al., 2003c].
level is chosen only for illustrative purposes in Figure 8.15). Accordingly, the hard threshold is selected if the actual mean firing rate can be established, so that the resulting mean firing rate from the model equals the actual. Another way to address this issue of hard-threshold selection is by computing the sensitivity-specificity curves (SSC), which are akin to the "receiver operating characteristic" (ROC) curves used traditionally in detection theory and applications. The SSC can be de-
8.2
,
0.8. 0.41 0.2H
I'
(a)
,
,
~
I ••
,
1.8.
~
1.4~
1
1.2
~
," ,I ,," , I ,, ,,, I ,
t
0.8~
(b)
,", \
\ \
---~ \
,, ''
"\
I
0.8~
0.4~ •
,
•
20
0 0
..
~ Mode2 --Mode3
100
-
\
I
I
15
...
\
I
, 10 m[maec)
"
,~
0.2~1
5
429
FUNCTIONAL INTEGRATION IN THE SINGLE NEURON
200 300 Frequency (Hz)
400
\
500
Figure 8.14 Estimated PDMs of the mechanoreceptor for action potential output in the time domain (Iett) and in the frequency domain (right) [Mitsis et al., 2003c].
tennined by computing the "sensitivity" of the model as the percentage of true positive predictions (Le., correct prediction of an output AP) for every threshold (J over all time bins in the data (defined by the absolute refractory period) and plot it versus the "specificity" ofthe model, computed as one minus the percentage offalse positives (i.e., predictions of output AP when none exists) over all time bins for each threshold (J. Thus, the SSC is computed for all (J values, and the performance of the model is deemed better when the area under the SSC increases (ideally approaching unity). In our example, the SSC is shown in Figure 8.16 and its area for 3 PDMs is found to be 0.976 for a bin size of 0.025. Note that the bin size of the nonlinearity in the PDM model affects the predictive performance of the model. Maximization of the SSC area can be used as the criterion for selecting the "optimal" bin size or other model parameters, including the number of PDMs. Although the threshold (J detennines the trade-off between sensitivity and specificity of the model, the natural firing rate of the neuron must be considered in selecting the "natural threshold," because the notion of "optimality" is extemally imposed (by our tendency to seek optimal performance) and may not be applicable in a natural context. Another way of evaluating the PDM model for AP output is to compute the PoF for each time bin (using the PoF surface) and compare it to a binary output value (1 if an AP exists in the time bin and 0 otherwise). The computation of a normalized mean-square error of this model prediction can be used as an alternative metric for model evaluation. It is important to observe that the most significant high-pass PDM for AP output is very similar to the second PDM for the previous intracellular current and potential data (see Figs. 6.21 and 6.26) and both PDMs for the transmembrane current data also bear certain common features with the band-pass PDMs of the AP output (e.g., the resonant peak around 40 Hz and the subtler peak around 70-80 Hz), as well as the resonant peak around 100 Hz. We must emphasize that the time course ofthe PDMs ofthe mechanoreceptor or any other neuronal system can be analyzed further to infer the dynamics of the distinct ion
430
MODELING OF NEURONAL SYSTEMS
~
~~ - ~ r·-r • - r - ~ r _ ~ ~ ~" ~. ~
.~' ~.
" ",," "1111\-, -•
.
,0.8
"
''' .. ~ .. _~ .. _~ -~"","- .. ~--~ ..
l5 0.8
1
•
•
I
f
i. .. _: :
,
.... -:- .... ,~ .. -~,-.
j::
~~'--
0.2
. t
0.8
u1
~ ::~.~::::[:~ .. - :..... ---: ->
•••
-;' I,,"
1°·8 ~ 0.8
'
.... ...:.. ...- :..,
f
0.4 0.2
-.
__ '
'"
'. ~.' ' ~
-
"
Trigger region . . . function cf 11..
-~ ~. ~ ~.:. . ~ --~ -:. -~ .. ~ ~:
,,"
1
•
,0.8
t
I
.
"2.
,
r--_.:.
.- - • t .
'.'
A'
u1
'-~,.
';
,
-- •
,,",
~.
'15
,,"
.:. •
,
u2
Probability of firing an action potential as a function of the ouputs of modes 1 and 2.
I
,
0.2
u2
,,"
,
::I::t::L
,0.8
r--r"
,1 ~
0.4
,
-.- .. - po - - " - •
---:.., --t, -- ~, _.
I
: . : : ':' ~ ~ .
F:
,-
..
0.2
,
--.:;"...
.;y o.s
0.5
... , ' 0.4 0.4
u1
u3
u1
Trigger region . . . f\anction oe 11"
u3
Probability of firing an action potential as a function of the outputs of modes 1 and 3.
11).
--
, ,",~~- -~--••:- .> :: --~~- ~'..:- ••, "-,.r, ..,"' ..r">~ " -'.', - . .,. ' 't'_ .~ -.A' """.....
",,';"
.
.,,'
.,••••••:
:
... "~,,
:.:k..:.:'.:'.:!r.J , .... "
1
I
1
"
-
.> H~-
I
-
__ t
.'
_r~_
"".,r 4' - . ~.5
~.20
0.2
0.2
~--.....---...~--........
.0.2
~.2
0.5 ~.4
u2
u3
Probability offlring an action potential as a function oflbe outputs of modes 2 and 3.
0.4
~.4
u2
u3
Trigger region . . . function
oe 112, 11).
Figure 8.15 Probability of firing an action potential as a function of the PDM outputs taken by pairs (Ieft column) and the corresponding trigger region for 8 = 0.1 (right column) [Mitsis et al., 2003c).
channels in the neuronal membrane. The particular methodology for this purpose is presented in Section 8.1 in connection with a class of nonlinear parametrie models with bilinear terms that represent the modulatory effects of different channels (voltage-activated or ligand-activated conductances). This method can be extremely useful in connecting datatrue VolterraIPDM models with biophysically interpretable parametrie models (nonlinear differential equations). It is important to note that the Wiener analysis of the Hodgkin-Huxley (H-H) model yields kemels (of first and second order) with high-pass characteristics up to 500 Hz for
8.2
FUNCTlONAL INTEGRATION IN THE SINGLE NEURON
0 0 Co 0
0
3 PDMs ___
o co O) ~~
",°0
0 .9
~
@
~~ \
0 .8 0 .7
~
/ ~
0 .6
~,
PDM pairs
~
;2 0.5 ::!
'"
If)
431
. (;1
0.4
0 .3 0 .2 0 .1
o
o
0 .1
0 .2
0 .3
0.4
0.5
0.6
0.7
0 .8
0 .9
Specifi city
Figure 8.16 The SSC of the mechanoreceptor for three POMs (best performance corresponding to an area of 97.6%) and the pair combinations (Iower circles corresponding to an area of 92.4%).
the squid axon membrane [Courellis & Marmarelis, 1989] . From the estimated first-order Wiener kernei, we see that the frequency response declines after 500 Hz-a fact that attributes band-pass characteristics to the squid axon dynamics over a broader frequency range (the half-max passband is from about 200 Hz to about 1200 Hz). However, its highpass characteristics up to about 500 Hz are similar to the first PDM of the cuticular mechanoreceptor, and, therefore, the first PDM dynamics of the mechanoreceptor can be attributed to the sodium-potassium channels that are included in the H-H model. It was also shown in the Wiener analysis ofthe H-H model that kerneis of order higher than second are required in order to predict the generated action potentials in response to broadband (quasiwhite) stimulation. This finding is consistent with the results obtained in the PDM model ofthe cuticular mechanoreceptor.
8.2.2
Minimum-Order Modeling of Spike-Output Systems
Ever since the Volterra-Wiener approach was applied to the study of spike-output systems, it has been assumed that a large number of kerneis (of high order) would be necessary to produce a satisfactory model prediction of the timing of the output spikes. This view was based on the rationale that the presence of a spike-generating mechanism constituted a "hard nonlinearity," necessitating the use of high-order nonlinear terms. AIthough this rationale is correct if we seek to reproduce the numerical binary values of the system output using a Volterra-Wiener model, we have come to the important realization that the inclusion of a threshold-trigger in our model reduces considerably the number of kernels necessary for complete modeling in terms of predicting the timing of the output spikes.
432
MODELING OF NEURONAL SYSTEMS
This realization gives rise to the notion of a "minimum-order" Volterra-Wiener model, which captures the necessary dynamic characteristics of the system when the effect of the spike-generating threshold-trigger is separated out, as discussed in the previous section. Thus, a wide class of spike-output systems can be effectively modeled with low-order Volterra-Wiener models followed by a threshold-trigger operator. We came to this realization by looking closer at the results of the "reverse correlation" technique for the estimation of Wiener kemels and by studying the meaning of trigger regions of spike-output systems containing a threshold-trigger mechanism for the generation of spikes [Marmarelis et al., 1986].
The Reverse-Corre/ation Technique. For a spike-output system with a Gaussian white-noise (GWN) input, it has been shown that the Wiener kernels can be estimated through the "reverse-correlation" technique [de Boer and Kuyper, 1968]. This technique utilizes the fact that the output can be written as a sum of Kronecker deltas in discretetime notation: I
yen) =
LSCn - n;)
(8.34)
;=1
where {n;} are the locations of the output spikes. We can then obtain the Wiener kernel estimates through cross-correlation as [Marmarelis et al. 1986]
"
h Im«, ... , m r )
=
1 {I Lx(n; - ml) ... xin, - m
~
rur;
-
N
I
;=1
r)
r-l 1 N } - ~ N ~ GJhj ; x(n'), n' ~ n]x(n - ml) ... x(n - mr )
(8.35)
where Gj are the estimates ofthe lower-order discrete Wiener functionals (j = 0, 1, ... , r - 1), N is the total number ofpoints in the input-output data record and I is the total number of spikes in the output. For the off-diagonal points of the kemel, the second term on the right-hand side ofEquation (8.35) vanishes as Ntends to infinity. To get a feeling about Equation (8.35), let us consider the estimation of the first three Wiener kernels: I
" 1~( ) __ ho = - LY n - n, - N I
N ;=1
"
(8.36)
l{l
I " IN } h 1(m)=-2 NLx(n;-m)-ho-Lx(n-m) a; ;=1 N n=1
1 "'" Nu 2 x
"
h2(ml ' m2) =
I
~x(ni - m)
1{IN ~x(ni I 24 - ml)x(n i o;
z=1
(for large N)
z=1
}"
1
N
m2) -ho- Lx(n - ml)x(n - m2) N n=1
(8.37)
8.2
FUNCTlONAL INTEGRATION IN THE SINGLE NEURON
433
1 N M-I - N ~ ~O h\(m)x(n - m)x(n - ml)x(n - mz) =:::::
1 21\'
K
I
4
1VU X
Ix(ni - ml)x(n i - m2) - - 2 2 5(m l ;= I NuX
-
m2)
(for large N)
(8.38)
For large Nthe first-order model response is M-I
YI(n) =h o +
I
hl(m)x(n - m)
m=O
I 1 = N + Na}
I
M
=N +
N
M-I
I
~~x(ni-m)x(n-m)
I
~r(n-ni)
(8.39)
where 1 r(n - n i) =
--2
Mo;
M-I
I
xin, - m)x(n - m)
(8.40)
m=1
The critical observation is that r(n - nj) is an estimate (over M samples) of the normalized autocorrelation function of the input x(n), centered at the point n = n.. Since x(n) is a white process, this estimate will tend to a Kronecker delta as M (i.e., the memory bandwidth product ofthe kernel) increases. For given M, r(n - ni) will have its peak value at n = n, and side values (i.e., for n =1= n i) randomly distributed with zero mean and variance 11M. We can easily derive that E[r(n - nj)] = 5(n - n;)
(8.41)
1 var[r(n - ni)] = -[1 + o(n - n i)] M
(8.42)
1 cov[r(n - nj), r(n - nj ) ] = M [o(nj - nj ) + 0(2n - n, - nj ) ]
(8.43)
The covariance ofr(n - nj) in Eq. (8.43) depends on the autocorrelation ofthe system output and the specific time instant n. This makes the variance of the first-order model prediction dependent on the overall system characteristics. If we call the contribution of the first-order Wiener functional: Yl(n) = Yl(n) -ho, then
M [ I I-I ] var[Yl(n)] = }f2 1+ 2~ ~8(2n-ni-n)
(8.44)
An upper bound for this variance can be found regardless of the system characteristics as
IM(
n)
var[YI(n)] ~ N2 1 + N
(8.45)
434
MODELING OF NEURONAL SYSTEMS
Equation (8.45) indicates that the variance of the first-order model prediction may increase in time for given N, although it tends to zero as N tends to infinity. Note however that MI E[Yl(n)] = 8(n - ni ) N i=l
I
(8.46)
which implies that the size of the expected value of the predicted spikes also decreases as N increases. We can study similarly the prediction given by the second-order Wiener functional Y2(n) =Y2(n) -Yl(n) -ho M-l
=
M-l
I I
M
h2(m j, m2)x(n -
ml)x(n - m2) - a-;
ml=O m2=O
I
h2(m, m)
(8.47)
m=O
For large N, substitution of h2 from Equation (8.38) into Equation (8.47) yields
M2 I IM IM "Y2(n) = 2N ~,-2(n - n;) - 2N Y(O) + 2Na-; 4J=(O) I - ---=t
2Nu,-r
M-l
M-l
I I
"-
cPxx(ml - m2)x(n - ml)x(n - m2)
(8.48)
ml=O m2=O
where 1 N 4>=(m) = N ~x(n)x(n - m)
(8.49)
Note that 4>xx is different from r because of the different sample size (N versus M), and E[Y2(n)] =
M(M+ 1)
2N
I
I8(n - nJ
(8.50)
i=l
which indicates that the prediction of the second-order Wiener functional will also reproduce (in the mean) the output spike train. This is the fundamental observation that has led to the concept of minimum-order Wiener modeling for spike-output systems [Marmarelis et al., 1986].
Minimum-Order Wiener Models. The analysis of the previous section revealed the interesting fact that the contribution of each nonzero Wiener functional to the model prediction of a spike output is an estimate 0/ a scaled replica of that output. This fact becomes apparent when the reverse-correlation technique is used for the estimation ofthe system Wiener kemels. The model prediction of each individual Wiener functional places an estimate of the input autocorrelation function (obtained over M points where M is the memory-bandwidth product of the respective kerneI) at all locations where an input-induced spike is found in the output record. Since the input is white noise, the expected value of this input autocorrelation function is a Kronecker delta. This would appear to imply that if these input autocorrelation estimates are sufficiently accurate, then a single
8.2
435
FUNCTIONAL INTEGRATION IN THE SINGLE NEURON
nonzero Wiener functional would suffice in predicting the system output. However, the variance of these input autocorrelation estimates can be so large as to prevent such a desirable simplification of the modeling and prediction task. This variance depends, in general, on the overall system characteristics and the output autocorrelation. The important suggestion that emerges from these realizations is that if the aforementioned variance is sufficiently small, then the lowest-order nonzero Wiener functional will be adequate for the purpose of predicting the system output. Otherwise, more Wiener terms must be included in the model until the resulting prediction attains the required clarity. No general mies have been formulated as to the minimum number of required Wiener terms in a given application, since it depends on the system characteristics. However the notion of a minimum-order Wiener (MOW) model has become clear. We explore below the application of this concept to several classes of continuous-input/spike-output (CISO) systems, and demonstrate the considerable simplification that can be achieved in many applications by employing the concept of a MOW model. First we consider the class of CISO systems described by the cascade of a stable linear time-invariant filter LTI followed by a threshold-trigger TT as shown in Figure 8.17. The following relations hold: M-I
v(n)
=
I
h(m)x(n - m)
(8.51 )
m=O
ifv ~ (J ifv< (J
1 y(v) = 0
(8.52)
where h(m) is the discrete impulse response function of the filter (M being its memorybandwidth product) and (J is the threshold value of TT. If x(n) is a GWN process with variance then v(n) is a nonwhite Gaussian process with zero mean and variance:
u;,
M-I
I
u~ = u;
h2(m)
(8.53)
m=O
Ifwe now consider the Hermite expansion ofy(v) over the entire realline, we have: y(v) = IakHk(v/V2uv)
(8.54)
k=O
where {Hk} is the orthogonal set ofHermite polynomials and {ak} are the Hermite expansion coefficients given by
a = k
Figure 8.17
1 foo 2 kk!V2O"v Y2u
8
v
er 2l2av2H k(v/V2O"v)dv
(8.55)
Cascade of a linear time-invariant filter (LTI) followed by a threshold trigger.
436
MODELING OF NEURONAL SYSTEMS
where the kth-order Hermite polynomial is given by Hk(z) =
(-1 Y2 k- 21kI
[k/2]
I
__
(8.56)
zk-21
1=0
Then the Wiener kernels of the system shown in Figure 8.17 are hk(mb' .. , mk) = c,j1,(ml) ... h(mk)
(8.57)
where 2k/ 2
(8.58)
ci> - k - a k Uv
For instance, the zero-order Wiener kerne I is given by 1 ho = v!2uv
foo
-v 212u3 V2u lJ e v
dv
(8.59)
representing the average firing rate of the system for a GWN input, that clearly depends on the values of o; and (J. The first-order Wiener kernel is given by hl(ml)
= clh(m)
(8.60)
where Cl
= - -12 foo 2uv
V2uv 8
e:
v2
2/2u v2 ( - 2 Uv
1) dv
(8.61)
Note that the first-order Wiener kernel is a scaled replica ofthe impulse response function of the filter LTI. Consequently, all the information needed to characterize the system in Figure 8.17 can be found in the first-order Wiener kernel (within a scalar), in addition, of course, to the threshold value o. By evaluating higher-order Wiener kernels we do not gain any additional information about this system, that is, the MOW model for this class ofsystems is first-order, although higher-order kernels exist. Although this is not true for all systems, we can generally assert that higher-order kernels are not needed above a certain finite order for the representation of the CIsa class of systems. An important observation regarding the unknown scalar Cl and threshold value (J in the example above is that specific knowledge of these two parameters is not necessary from the prediction point of view. Since a single threshold estimate fJ is used in converting the Wiener model output into a spike train (which is our model prediction for the spike output), both ofthose unknown parameters (Cl and 0) can be combined in determining 8. The actual estimation of the threshold 8 that ought to be used in connection with the Wiener model in order to predict the system spike output is a critical task in actual applications that is performed by comparing the Wiener model output with the actual system output. In general, the model prediction for a given threshold fJ will fail to match some output spikes (false-negative aspikes) and predict some spikes that are not present in the system output (false-positive ß spikes). A cost function that combines the number of a and ß spikes for a given threshold 8can be used in practice to yield an optimum threshold
We should note that, in actual experimental data, a number of spikes that are not input-related will generally be present (point-process noise or spontaneous activity). These should be found among the α spikes of our model prediction, and could be effectively "filtered out" by use of the presented method. Another method for estimating θ̂ and evaluating the predictive ability of this model was presented earlier in connection with the spider mechanoreceptor model (see Fig. 8.16) and involves computation of the sensitivity-specificity curves [akin to receiver operating characteristic (ROC) curves used in detection systems]. Let us now consider the class of CISO systems described by the cascade of a linear time-invariant filter LTI followed by a zero-memory nonlinearity ZMN and a threshold trigger TT, as shown in Figure 8.18. The cascade of ZMN and TT can be viewed as a single zero-memory, nonlinear subsystem NTT that produces a spike whenever its input v(n) attains values within specific ranges determined by the combination of ZMN and TT. For instance, if the characteristic of ZMN and the threshold value of TT are as shown in Figure 8.19(a), then the resulting characteristic of the composite zero-memory subsystem NTT is as shown in Figure 8.19(b). Consequently, the Wiener series representation of the overall system of Figure 8.18 will have kernels of the form described by Equation (8.57), where the coefficients {c_k} will be given by the expression in Equation (8.58), with the Hermite expansion coefficients {a_k} corresponding to the nonlinear function shown in Figure 8.19(b). Clearly, if the characteristic function f(v) of ZMN is monotonic, then the NTT characteristic will be a threshold-trigger characteristic with a single threshold value, and the analysis presented for the class of systems in Figure 8.17 will apply. If the ZMN characteristic is not monotonic, then the analysis will be complicated, in that higher-order Wiener terms may be necessary to achieve a certain accuracy if and only if the function f(v) − θ has multiple roots. The critical observation is that the minimum order of the required Wiener model is equal to the number of "trigger points" in the NTT characteristic, that is, the number of real solutions of the equation f(v) − θ = 0. For instance, a characteristic of the form shown in Figure 8.19(b) would require a third-order MOW model. This observation is based on the fact that the MOW model must reproduce, in addition to the dynamics related to the LTI subsystem in this example, a zero-memory nonlinear characteristic p(v) that will intersect the model threshold line at the actual trigger points of the system. These trigger points represent, of course, the real roots of the polynomial p(v) − θ̂, where p(v) is the minimum-degree polynomial that satisfies this condition. These trigger points define the one-dimensional "trigger regions" discussed in the previous section. For a ZMN characteristic with even symmetry (e.g., a full-wave rectifier or a squarer), the odd-order Hermite coefficients of its expansion will be zero and, consequently, a first-order Wiener model will be unable to predict any of the output spikes (since the first-order Wiener kernel will be zero). A second-order Wiener model, however, will be an excellent predictor of output spikes when combined with the appropriate threshold, on the basis of arguments similar to the ones presented above.
Figure 8.18  Cascade of a linear time-invariant filter (LTI) followed by a zero-memory nonlinearity (ZMN) and a threshold trigger (TT). The composite box NTT is a zero-memory nonlinearity exhibiting the trigger regions shown in Fig. 8.19 [Marmarelis et al., 1986].
Figure 8.19  (a) The characteristic u = f(v) of ZMN in Figure 8.18 and its intersection points with the threshold line u = θ. (b) The characteristic of NTT exhibiting trigger regions that are defined by the intersection points v_1, v_2, and v_3 [Marmarelis et al., 1986].
Generally, all ZMN characteristics that have only two intersection points with the line representing the threshold will yield a two-sided threshold characteristic for the composite subsystem NTT and, consequently, admit a second-order MOW model as a complete representation of the overall system. It is important to observe that, no matter how complex the ZMN characteristic is in the subthreshold or suprathreshold region, it does not affect the order of the MOW model and its predictive ability. The only thing that matters is how many times the ZMN characteristic intersects the threshold line (a sketch of this root-counting rule is given at the end of this subsection). A question naturally arises in connection with CISO systems that do not belong to the simple classes indicated in Figures 8.17 and 8.18. The general class of CISO systems can be described by the PDM (or NM) model followed by a TT, and the combined NTT will generally exhibit multidimensional "trigger regions" of dimensionality equal to the number of PDMs or NMs of the subject system. An illustrative example of MOW modeling of an actual physiological system is given below for the case of spatiotemporal dynamics of retinal ganglion cells discussed in Section 7.4. An example of a three-dimensional trigger region is presented in the case of a spider mechanoreceptor in Section 8.2.1 (Fig. 8.15).

Illustrative Example.  As an example of MOW modeling of an actual physiological system, we present here some results obtained by a spatiotemporal minimum-order Wiener model of retinal ganglion cells in the frog [Citron & Marmarelis, 1987]. The response of a class 3 ganglion cell was predicted using a MOW spatiotemporal model of first and second order. Typical results are shown in Figure 8.20 and demonstrate the utility of the second-order Wiener functional in this case, since the latter predicts five of the eight output spikes in the data segment shown for the appropriate threshold (with no false positives), whereas the first-order Wiener functional predicts only two of the eight spikes (one output spike is predicted by both functionals). This result indicates that the minimum order for Wiener modeling of ganglion cells in the frog retina is second order; the model predicts about 75% of the output spikes with no false positives, for the appropriate threshold.
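The sketch below applies the root-counting rule for a polynomial ZMN characteristic: it counts the distinct real roots of f(v) − θ = 0, which equals the required MOW order. The cubic used in the example is a hypothetical characteristic in the spirit of Figure 8.19(b).

```python
import numpy as np

def mow_order(poly_coeffs, theta):
    """poly_coeffs: f(v) as polynomial coefficients, highest degree first."""
    shifted = np.array(poly_coeffs, dtype=float)
    shifted[-1] -= theta                        # form f(v) - theta
    roots = np.roots(shifted)
    real = roots[np.abs(roots.imag) < 1e-9].real
    return len(np.unique(np.round(real, 9)))    # distinct trigger points

# A cubic characteristic crossing the threshold three times requires a
# third-order MOW model, as for the NTT characteristic of Figure 8.19(b).
print(mow_order([1.0, 0.0, -3.0, 0.0], theta=1.0))  # -> 3
```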
Figure 8.20  First- and second-order predictions of the spike response of a class 3 frog retinal ganglion cell to a spatiotemporal white-noise stimulus (panels: prediction using the first-order Wiener term; the second-order term; and the first plus second-order terms; time axes in seconds). Measured spikes are marked by x's; predicted spikes are indicated by arrows. The threshold θ̂ is drawn to maximize the number of predicted spikes without false positives [Citron & Marmarelis, 1987].

8.3  NEURONAL SYSTEMS WITH POINT-PROCESS INPUTS

The nonparametric modeling of neuronal systems stimulated by temporal sequences of action potentials (spike trains) can be simplified (from the processing point of view) when
"
PREDICTION U8INQ Fl'8T-0ADER WBER TEAM
!
x
• xx
x x
)0(
I
I
e
I ...
2.AO
UIO
I
I
I
I
I
o.eo
0
4
3.20
Th1e lsecl
B
SECOM)-OAD, ~ x
x
x
x
TERM XX
XX
I
e
...
I
I
I
o.eo
0
2.AO
UIO
Tme
C
I
I
4
3.20
(sec)
Fl'ST + SECOND-OADER TERMS 1+2
• •~ 1
X
X
22
2
X
xx
X
xx
I
e
5"
I
o.eo
I
1.80
I
2.AO
, 3.20
~
Tme lsecl Figure 8.20 First - and second -orde r predictions of the spike response of a Class 3 frog ret inal ganglion cell to a spat iotemporal white-noise stimulus. Measured spikes are marked by x 's. Pred icted spikes are indicated by arrows. Threshold value, 0, is drawn to maximize the number of predicted spikes without false positives [Citron & Marmarelis, 1987].
the action potentials are idealized as impulses of fixed intensity (Dirac delta functions in continuous time or Kronecker deltas in discrete time). Thus, these inputs can be represented mathematically in a stochastic context by "point processes," a class of random processes that are formed by sequences of identical impulses (spikes) representing random, instantaneous, and identical events. Proper discretization of a continuous-time point process requires that the sampling interval (bin width) be equal to the absolute refractory period, in order to allow the representation of each continuous-time action potential by one and only one Kronecker delta in the respective bin (the intensity of the recorded spike ought to be the integrated area under the action potential divided by the bin width T). In practice, a sequence of action potentials (spike train) can be represented by a discrete-time point process:

x(n) = A \sum_{i=1}^{I} \delta(n - n_i)    (8.62)
where n = 1, ..., N denotes the discrete-time index (t = nT), A is the intensity of the Kronecker delta (spike), and n_i is the time index of the ith spike event (i.e., the discretized timing of the ith action potential). Note that this point process has I spike events over the available data record of N bins; that is, the mean rate of this point process is (I/N). We seek to address the problem of neural system modeling from input-output data, where the system input x(n) is a point process as described by Equation (8.62), but the system output y(n) may be either a discretized continuous signal or a point process. This fundamental problem was addressed for the first time in the general input-output nonlinear modeling context of the Volterra-Wiener approach by Krausz (1975) and Ogura (1972) in the 1970s. Shortly thereafter, significant contributions were made by Kroeker (1977, 1979) and by Berger and Sclabassi and their associates [Berger et al., 1987, 1988a, b, 1989, 1991, 1993, 1994; Sclabassi et al., 1987, 1988a, b, 1989, 1994]. Although this methodology yielded promising initial results, it has found only limited use to date, partly due to its perceived complexity and the burdensome requirements of its initial applications (viz., the length of the required experimental data and the restrictive use of Poisson stimuli required by the cross-correlation technique that is used for model estimation). Note that in this context the Poisson process is the counterpart of GWN inputs, in that it represents statistically independent events (no correlation or spectral structure). In this section, we clarify some important methodological issues regarding the nature of the obtained kernels and introduce a more efficient model/kernel estimation method that reduces significantly the data-length requirements while increasing accuracy. Initial applications of this methodology to experimental data (using Laguerre expansions) have demonstrated its efficacy in a practical context [Alataris et al., 2000; Courellis et al., 2000; Gholmieh et al., 2001, 2002, 2003a, b; Song et al., 2002, 2003; Dimoka et al., 2003] and have corroborated the mathematical results presented herein. This approach to nonlinear modeling of neural systems with point-process inputs is the only general methodology currently available. This methodology is also extendable to systems with multiple inputs and multiple outputs in a nonlinear dynamic context (see Section 8.4), offering the realistic prospect of modeling neuronal ensembles. In the general formulation of the modeling problem, we seek an explicit mathematical description of the causal functional F that maps the input past (and present) upon the present value of the output in terms of the discrete-time Volterra series, which reduces, for the point-process input of Equation (8.62), to the output expression
y(n) = k_0 + A \sum_{i=1}^{I} k_1(n - n_i) + A^2 \sum_{i=1}^{I} \sum_{j=1}^{I} k_2(n - n_i, n - n_j) + ...    (8.63)
where the high-order terms (second-order and above) are nonzero only for |n − n_i| ≤ M (M is the finite memory of the system) and i or j cover all event indices. In practice, rapid convergence of this functional power series is desirable, as it allows truncation of the Volterra series to a few terms for satisfactory model accuracy (i.e., output prediction) and yields relatively concise models. However, this convergence is often slow for point-process inputs. Following Wiener's approach in the case of continuous-input systems, the search for a kernel-estimation method that minimizes the model prediction error (as measured by the output prediction mean-square error) leads to the construction of an orthogonal hierarchy (series) of functionals using a variant of the Gram-Schmidt orthogonalization procedure. This approach seeks to decouple the kernels of various orders and secure maximum reduction of the prediction error at each successive model order. Orthogonalization of the functionals also facilitates the estimation of the kernels through cross-correlation. Critical for this orthogonalization procedure is the selection of the proper input that ought to test the system as thoroughly as possible over the space of all possible point-process inputs. Thus, for stochastic inputs, ergodicity is required, as well as appropriate autocorrelation properties of all orders up to twice the highest-order functional (kernel) of a given system. For systems with point-process inputs, the proper test input is the Poisson process, defined in the discrete-time context as a sequence of independent events (spikes) with fixed probability λ of occurrence at each time bin T. For a discrete-time Poisson point-process (PPP) input x(n), we have the following statistical moments:

E[x^r(n)] = λ A^r    (8.64)

where E[·] denotes the expected value operator or ensemble average. The qth-order autocorrelation function of a PPP is

E[x(n_1) x(n_2) ... x(n_q)] = λ^J A^q    (8.65)
where J denotes the number of distinct time indices among the indices (n_1, ..., n_q). This expression results from the statistical independence of the values of the Poisson process in each time bin. The parameter λ (Poisson parameter) defines the mean rate of event occurrence and plays a role analogous to the power level of GWN in the case of continuous-time input signals. The statistical properties defined by Equation (8.65) are critical for the development of the orthogonal functional series with PPP input, which we term the Poisson-Wiener (P-W) series. The development of the P-W series is greatly simplified if we use the de-meaned input:

z(n) = x(n) - λA    (8.66)
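The following sketch generates a discrete-time PPP per Equation (8.62), checks the moment of Equation (8.64) and the central moments μ_2 and μ_3 by sample averages, and forms the de-meaned input of Equation (8.66); all numeric settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, lam, A = 1_000_000, 0.02, 1.5     # bins, Poisson parameter, spike amplitude

x = A * (rng.random(N) < lam)        # independent spike per bin with prob. lam
z = x - lam * A                      # de-meaned input, Eq. (8.66)

# E[x^r] = lam * A^r (Eq. 8.64), checked here for r = 2:
print(np.mean(x**2), lam * A**2)
# central moments of the de-meaned process (used by the P-W functionals):
print(np.mean(z**2), lam * (1 - lam) * A**2)                 # mu_2
print(np.mean(z**3), lam * (1 - lam) * (1 - 2*lam) * A**3)   # mu_3
```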
The P-W orthogonal functionals {Q_j} are constructed by use of a Gram-Schmidt orthogonalization procedure and involve a set of characteristic P-W kernels {p_j}. They satisfy the orthogonality condition of zero covariance, E[Q_i Q_j] = 0 for i ≠ j, and take the form

Q_0 = p_0    (8.67)

Q_1[z(n); p_1] = \sum_{m=0}^{M} p_1(m) z(n - m)    (8.68)

Q_2[z(n); p_2] = \sum_{m_1=0}^{M} \sum_{m_2=0}^{M} p_2(m_1, m_2) z(n - m_1) z(n - m_2) - \frac{\mu_3}{\mu_2} \sum_{m=0}^{M} p_2(m, m) z(n - m) - \mu_2 \sum_{m=0}^{M} p_2(m, m)    (8.69)
and so on. The model output is composed of the sum of these orthogonal functionals up to the required order, which is finite by practical necessity, although the system output may correspond in general to an infinite series:

y(n) = \sum_{j=0}^{\infty} Q_j[z(n); p_j]    (8.70)
Note that these orthogonal functionals depend on the statistical central moments of the de-meaned PPP input: μ_1 ≜ E[z(n)] = 0, μ_2 ≜ E[z^2(n)] = λ(1 − λ)A^2, μ_3 ≜ E[z^3(n)] = λ(1 − λ)(1 − 2λ)A^3, and so on. The normalized rth moment μ_r/A^r is an rth-degree polynomial of λ. The following key relation is noted:

\mu_4 \triangleq E[z^4(n)] = \mu_2^2 + \frac{\mu_3^2}{\mu_2}    (8.71)
as it attains critical importance in proving the fundamental limitation in the estimability of the kernel diagonal values that was first observed empirically by Krausz [Krausz, 1975]. Emulating the cross-correlation technique for Wiener kernel estimation in the continuous case utilizing GWN inputs, we may estimate the unknown kernels {p_j} by evaluating the covariances between the output y(n) and known orthogonal "instrumental" functionals of the input z(n), that is, evaluating the "orthogonal projections" of the output signal upon each of these "instrumental" orthogonal functionals that form an orthogonal "coordinate system" in the functional space. If these "instrumental" functionals are chosen to be simple shift operators, then this approach results in the cross-correlation technique [Lee & Schetzen, 1965; Berger et al., 1987], which was first adapted to Poisson-input systems by Krausz in 1975, building on Ogura's earlier contributions [Ogura, 1972]. Unfortunately, the specific estimation formulae derived by Krausz contained some scaling errors that are corrected below. The P-W kernel estimation formulae following the cross-correlation technique with the appropriate instrumental functionals are given below. The zero-order P-W kernel p_0 represents the average output value:

p_0 = E[y(n)]    (8.72)

The first-order P-W kernel is given by
p_1(m) = \frac{1}{\mu_2} E[y(n) z(n - m)]    (8.73)
Note that the first-order P-W kernel estimation formula derived originally by Krausz has a different normalization constant, namely λ instead of μ_2 = λ(1 − λ)A^2. For the evaluation of the second-order P-W kernel, the key (and surprising) realization is that, because of the identity of Equation (8.71) relating μ_2, μ_3, and μ_4 for any Poisson process, the diagonal values of the kernel (i.e., for m_1 = m_2) cannot be evaluated through cross-correlation and must be defined as zero! In other words, the "orthogonal projection" of the output signal y(n) upon the signal z^2(n − m) − (1 − 2λ)A·z(n − m) − λ(1 − λ)A^2 is always zero for all m (i.e., they are orthogonal). Thus, the second-order P-W kernel estimate is given by
p_2(m_1, m_2) = \begin{cases} \frac{1}{2\mu_2^2} E[y(n) z(n - m_1) z(n - m_2)], & \text{for } m_1 \neq m_2 \\ 0, & \text{for } m_1 = m_2 \end{cases}    (8.74)
In general, the rth-order P-W kernel is given by

p_r(m_1, ..., m_r) = \begin{cases} \frac{1}{r!\,\mu_2^r} E[y(n) z(n - m_1) ... z(n - m_r)], & \text{for distinct } (m_1, ..., m_r) \\ 0, & \text{otherwise} \end{cases}    (8.75)
For ergodic and stationary processes, the ensemble averaging can be replaced by time averaging over infinite data records. Since, in practice, we only have the benefit of finite data records, the aforementioned time averaging is limited to a finite domain of time and results in estimates of the system kernels that represent approximations of the exact kernels (see Section 2.4.2). The P-W kernel estimation formulae can be adapted to the specific PPP input form of Equation (8.62) by utilizing the properties of the Kronecker delta, which reduce the multiplication operations of cross-correlation to additions:
p_0 = \frac{1}{N} \sum_{n=1}^{N} y(n)    (8.76)

p_1(m) = \frac{A}{N \mu_2} \sum_{i=1}^{I} y(n_i + m) - \frac{\lambda A}{\mu_2}\, p_0    (8.77)

p_2(m_1, m_2) = \begin{cases} \frac{A^2}{2 N \mu_2^2} \Big\{ \sum_{i_1=1}^{I} \sum_{i_2=1}^{I} \frac{1}{2} [y(n_{i_1} + m_1) + y(n_{i_2} + m_2)]\, \delta[(n_{i_1} - n_{i_2}) - (m_2 - m_1)] \\ \qquad\qquad - \lambda A \sum_{i=1}^{I} [y(n_i + m_1) + y(n_i + m_2)] + \lambda^2 A^2 N p_0 \Big\}, & \text{for } m_1 \neq m_2 \\ 0, & \text{for } m_1 = m_2 \end{cases}    (8.78)
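A minimal sketch of the event-based estimators of Equations (8.76)-(8.77) is given below for a simulated test system; the system (a squared filter output) and all numeric settings are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, lam, A, M = 200_000, 0.05, 1.0, 40

x = A * (rng.random(N) < lam)                  # PPP input, Eq. (8.62)
events = np.flatnonzero(x)                     # spike times n_i
g = np.exp(-np.arange(M) / 8.0)                # hypothetical internal filter
y = np.convolve(x, g)[:N] ** 2                 # simple nonlinear test system

mu2 = lam * (1 - lam) * A**2
p0 = y.mean()                                  # Eq. (8.76)

p1 = np.zeros(M)
for m in range(M):                             # Eq. (8.77): sums over events
    idx = events + m
    p1[m] = (A / (N * mu2)) * y[idx[idx < N]].sum() - (lam * A / mu2) * p0
```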
The key definition that the diagonal values of the kernels be zero leads to considerable
simplification of the form of the functionals by eliminating all the terms other than the leading term, that is, the last two terms of Q_2 in Equation (8.69), since p_2(m, m) ≡ 0. Thus, the P-W functional series takes the simple form

y(n) = p_0 + \sum_{m} p_1(m) z(n - m) + \sum_{m_1} \sum_{m_2} p_2(m_1, m_2) z(n - m_1) z(n - m_2) + \sum_{m_1} \sum_{m_2} \sum_{m_3} p_3(m_1, m_2, m_3) z(n - m_1) z(n - m_2) z(n - m_3) + ...    (8.79)
which is identical in form to the Volterra series for the de-meaned PPP input, but with the important distinction that the diagonal values of the kernels are zero by definition. The P-W series of Equation (8.79) can be expressed in terms of the original PPP input x(n) by use of Equation (8.66). The resulting expression for the system output in terms of the P-W kernels and the original PPP input can be used to derive the Poisson-Volterra (P-V) kernels:

k_0 = \sum_{r=0}^{\infty} (-λA)^r \sum_{m_1} ... \sum_{m_r} p_r(m_1, ..., m_r)    (8.80)

k_1(m) = \sum_{r=1}^{\infty} r (-λA)^{r-1} \sum_{l_1} ... \sum_{l_{r-1}} p_r(m, l_1, l_2, ..., l_{r-1})    (8.81)

k_2(m_1, m_2) = \sum_{r=2}^{\infty} \frac{r!}{2!\,(r-2)!} (-λA)^{r-2} \sum_{l_1} ... \sum_{l_{r-2}} p_r(m_1, m_2, l_1, ..., l_{r-2})    (8.82)
and so on; these are identical to the Volterra kernels of the system at the nondiagonal points. Note that, for a finite-order system, the highest-order P-V and P-W kernels are identical. The diagonal values of the P-V kernels are zero, since the P-W kernels are zero at the diagonals, reflecting the fact that a point-process input (having fixed spike-magnitude values) is intrinsically unable to probe (and therefore estimate) the kernel diagonal values. Note that the latter would be possible if the spike-magnitude values were allowed to vary. The resulting "Poisson-Volterra" (P-V) series attains the following meaning: the zero-order term is the average value of the output; the first-order term accounts for the responses to individual input impulses; the second-order term accounts for interactions between pairs of input impulses within the memory epoch (defined by M); the third-order term accounts for interactions among triplets of input impulses within the memory epoch; and so on. One could not hope for a more orderly and elegant mathematical model of the hierarchy of nonlinear interactions of point-process inputs.
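For a system of order two, Equations (8.80)-(8.82) truncate to a simple conversion, sketched below; the kernel arrays are assumed given (e.g., from the cross-correlation estimates above), and the function name is illustrative.

```python
import numpy as np

def pw_to_pv_second_order(p0, p1, p2, lam, A):
    """Eqs. (8.80)-(8.82) truncated at r = 2; p1: (M,), p2: (M, M) arrays."""
    c = -lam * A
    k0 = p0 + c * p1.sum() + c**2 * p2.sum()
    k1 = p1 + 2 * c * p2.sum(axis=1)
    k2 = p2.copy()        # the highest-order P-V and P-W kernels coincide
    return k0, k1, k2
```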
8.3.1  Lag-Delta Representation of P-V or P-W Kernels

Several investigators [Berger et al., 1987, 1988a, b, 1989; Sclabassi et al., 1987, 1988a, b, 1989] have found it more convenient to represent the P-V or P-W kernels in a different coordinate system that takes the interspike interval as an independent variable; that is, instead of p_2(m_1, m_2) they use p̄_2(m, Δ_1), where m = m_1 and Δ_1 = m_2 − m_1. Clearly, p̄_2(m, Δ_1) = 0 for Δ_1 = 0. The expressions for the P-V kernels in this lag-delta coordinate system become
k̄_0 = \sum_{r=0}^{\infty} (-λA)^r\, r! \sum_{m} \sum_{Δ_1} ... \sum_{Δ_{r-1}} p̄_r(m, Δ_1, ..., Δ_{r-1})    (8.83)

k̄_1(m) = \sum_{r=1}^{\infty} r!\, (-λA)^{r-1} \sum_{Δ_1} ... \sum_{Δ_{r-1}} p̄_r(m, Δ_1, ..., Δ_{r-1})    (8.84)

k̄_2(m, m + Δ_1) = \sum_{r=2}^{\infty} \frac{r!\,(r-1)!}{2!\,(r-2)!} (-λA)^{r-2} \sum_{Δ_2} ... \sum_{Δ_{r-1}} p̄_r(m, Δ_1, Δ_2, ..., Δ_{r-1})    (8.85)
where Δ_i = m_{i+1} − m_1. The kernel values are zero whenever any of the arguments Δ_i is zero (kernel diagonals in the conventional notation). The two sets of kernels, {k_j} and {k̄_j}, are equivalent in the sense that either set represents fully the input-output relationship. The difference is in the way the kernels are defined within the P-V functionals. The lag-delta representation indicates that the interspike intervals of the input point process are the critical factors in determining the nonlinear effects at the output of neuronal systems. We must emphasize that, in actual applications, the obtained model should ultimately be put in the P-V form, since the P-V kernels are independent of the specific parameters (λ, A) of the PPP input, unlike the estimated P-W kernels, which depend on the PPP input parameter (λA). The P-V kernels (unlike the P-W kernels) provide a system model that does not depend on the specific point-process input and can be used to predict the system response to any given point-process input (not only Poisson). Equations (8.80)-(8.85) can be used to reconstruct the P-V kernels of a system from a complete set of P-W kernel estimates obtained via the cross-correlation technique; a sketch of the required re-indexing is given below. The mathematical relationships between P-W and P-V kernels also suggest practical means for estimating the nonlinear order of the required model by varying the input-specific parameter (λA) and observing the resulting effects on the obtained P-W kernels. Since the latter are polynomial expressions of (λA), with coefficients dependent on the input-independent P-V kernels, an indication of the order of the system (and of the required model order) can be obtained from the degree of the observed polynomial relation. An important observation is in order regarding the diagonal values of the kernels. It was shown that, for point-process inputs of fixed spike magnitude A, the diagonal values of the P-W kernels cannot be estimated via cross-correlation. However, since two input events cannot occur at the same time, the diagonal values of the kernels are never used in model prediction (whether zero in value or otherwise) and may assume any values without affecting the model prediction, as long as the contributions of these diagonal values are balanced properly by the lower-order kernels. If the input spikes are allowed to have different magnitude values, then the diagonal kernel values become relevant and can be estimated. Of course, in order to estimate these diagonal kernel values from input-output data, we must either test the system with point-process inputs of variable spike magnitudes or interpolate the diagonal kernel values using the neighboring off-diagonal values (under the assumption of kernel smoothness at the diagonal points). The ultimate validation of these kernels has to be based on the predictive performance of the resulting model.
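The re-indexing from the conventional to the lag-delta coordinates is a mechanical step; a minimal sketch for a second-order kernel follows (the array layout is an assumption).

```python
import numpy as np

def to_lag_delta(p2):
    """Map p2(m1, m2) to p2_ld(m, Delta), with m = m1, Delta = m2 - m1 >= 0."""
    M = p2.shape[0]
    p2_ld = np.zeros_like(p2)
    for m in range(M):
        for d in range(M - m):
            p2_ld[m, d] = p2[m, m + d]   # the d = 0 column is the diagonal
    return p2_ld
```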
8.3.2  The Reduced P-V or P-W Kernels

In another simplification used frequently in practice [Berger et al., 1987, 1988a, b, 1989; Sclabassi et al., 1987, 1988a, b, 1989], the lag dimension (m) of the kernels in the lag-
delta representation can be suppressed and the dimensionality of the kernels can be reduced by one, when the system dynamics in the lag dimension do not spread beyond one time bin (typically 5 to 10 ms in size). The latter situation also arises in those cases where the output is an event synchronized with each input spike (e.g., the measurement of a population spike in a recorded field potential). The lag dimension can be suppressed if we can assume fixed response latency in first approximation. This reduces significantly the complexity of the resulting model by suppressing one dimension in each kernel and has been proven useful in many applications to date. It is evident that a modified Volterra model emerges in this case, termed "the reduced Volterra model," whereby the magnitude of the synchronized output event is expressed in terms of the "reduced P-V kernels" {k*_r} as

y(n_i) = A k*_1 + A^2 \sum_{n_j} k*_2(n_i - n_j) + A^3 \sum_{n_{j_1}} \sum_{n_{j_2}} k*_3(n_i - n_{j_1}, n_i - n_{j_2}) + ...    (8.86)
where n_i denotes the common time index of input-output events, and the summations take place over all time indices n_j of events preceding the present/reference index n_i that lie within the memory epoch of the system kernels (i.e., |n_i − n_j| ≤ M). Note that the reduced P-V kernels are expressed in the lag-delta representation. The use of the reduced Volterra model facilitates the practical modeling of neural systems for which the output events exhibit fixed latency relative to the corresponding input event. If this latency d(n_i) is variable, then another reduced Volterra model equation can be used to describe the dependence of the variable latency on the timing of the input events as

d(n_i) = A l*_1 + A^2 \sum_{n_j} l*_2(n_i - n_j) + A^3 \sum_{n_{j_1}} \sum_{n_{j_2}} l*_3(n_i - n_{j_1}, n_i - n_{j_2}) + ...    (8.87)
where the reduced P-V kernels {l*_r} characterize fully the dependence of the output event latency upon the timing of the preceding input events. The introduction of a separate model equation for the output event latency suggests that it is possible to have a vector representation of the output containing any number of output attributes (e.g., the initial slope or peak value of an EPSP of a field potential). Each of these output attributes begets a reduced Volterra model equation and the associated set of kernels. Each equation (single-attribute data) is analyzed separately to estimate the respective set of kernels. Interpretation of the kernels can be made either for single attributes or conjointly for multiple attributes. The resulting reduced Volterra model for the vector output case (i.e., when the dependence of multiple output attributes on the input spike sequence is examined) takes the vector form

a(n_i) = A g*_1 + A^2 \sum_{n_j} g*_2(n_i - n_j) + A^3 \sum_{n_{j_1}} \sum_{n_{j_2}} g*_3(n_i - n_{j_1}, n_i - n_{j_2}) + ...    (8.88)

where the underscore denotes a vector; that is, a(n_i) = [a_1(n_i) a_2(n_i) ... a_Q(n_i)]′, with a_q(n_i) denoting the qth attribute of the output at event time n_i (q = 1, 2, ..., Q), and g*_r denotes the vector of the corresponding rth-order reduced P-V kernels for the multiple output attributes considered.
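A minimal sketch of the reduced second-order prediction of Equation (8.86) follows; the kernel array layout (indexed by inter-spike interval) and the function name are illustrative assumptions, with the spike amplitude A kept explicit.

```python
import numpy as np

def reduced_pv_predict(event_times, k1, k2, A, M):
    """Eq. (8.86) to second order; k2 indexed by inter-spike interval 1..M."""
    event_times = np.asarray(event_times)
    y = np.full(event_times.shape, A * k1, dtype=float)
    for i, ni in enumerate(event_times):
        # all preceding events within the memory epoch M
        past = event_times[(event_times < ni) & (ni - event_times <= M)]
        y[i] += A**2 * k2[ni - past].sum()
    return y
```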
Having established the proper definitions, mathematical relations, and meaning for the Volterra, P-V, and P-W kernels of a neuronal system, we now turn to the key issue in actual applications: the accurate and efficient estimation of these kernels from input-output data. The cross-correlation technique (CCT) was the first to be used for P-W kernel estimation in actual applications [Krausz, 1975], but it has been shown to provide limited estimation accuracy and to require long data records in order to achieve satisfactory P-W kernel estimates, a serious burden in actual experimental studies where the preparation can be kept stable only for a limited time. A better estimation method can be based on Laguerre expansions of the kernels and least-squares fitting procedures, as in the continuous-input case [Marmarelis, 1993]. This method was recently adapted to systems with point-process inputs and was applied successfully to modeling studies of hippocampal neurons [Gholmieh et al., 2001, 2002, 2003a, b; Dimoka et al., 2003; Song et al., 2002, 2003]. The method employs the orthonormal basis of discrete-time Laguerre functions (DLFs) to expand the system kernels, and then uses least-squares fitting to estimate the requisite expansion coefficients. In the context of point-process inputs, we may consider the Laguerre expansions of the Volterra kernels in the lag-delta representation, using L DLFs {b_j}:

k_1(m) = \sum_{j=1}^{L} c_1(j) b_j(m)    (8.89)

k_2(m, Δ) = \sum_{j_1=1}^{L} \sum_{j_2=1}^{L} c_2(j_1, j_2) b_{j_1}(m) b_{j_2}(Δ)    (8.90)

The general input-output relation after the Laguerre expansion of the kernels is
y(n) = c_0 + A \sum_{i=1}^{I} \sum_{j=1}^{L} c_1(j) b_j(n - n_i) + 2 A^2 \sum_{i_1=1}^{I} \sum_{i_2=1}^{I} \sum_{j_1=1}^{L} \sum_{j_2=1}^{L} c_2(j_1, j_2) b_{j_1}(n - n_{i_1}) b_{j_2}(n - n_{i_2}) + ...    (8.91)
where c_0 = k_0 (for uniformity of notation) and the summation over the indices (i_1, i_2, ...) extends over the intervals |n − n_i| that do not exceed the system memory (i.e., the extent of the nonzero values of the kernels). It is evident that the diagonal values of the Volterra kernels (or the zero-argument values in the equivalent lag-delta representation) are interpolated from neighboring estimated values using the Laguerre basis functions. The expansion coefficients of Equation (8.91) can be estimated by linear least-squares fitting of the input-output data. Having estimated the expansion coefficients in the model equation (8.91), we can now reconstruct the estimates of the Volterra kernels of the system using Equations (8.89) and (8.90). These Volterra kernel estimates have nonzero values at zero arguments (in general), owing to the interpolation/extrapolation means provided by the Laguerre expansion. As an example, for a single input spike of magnitude A at time n_i, the model output is
y(n) = [k_0 + A k_1(n - n_i) + A^2 k_2(n - n_i, 0) + ...] u(n - n_i)    (8.92)
where u denotes the discrete step function (0 for negative argument and 1 elsewhere), and the Volterra kernels {k_0, k_1, k_2, ...} are not defined as zero for Δ_j = 0 in the lag-delta representation, unlike their P-V counterparts, which express the model output as

y(n) = [k_0 + A k_1(n - n_i)] u(n - n_i)    (8.93)
because k_2(n − n_i, 0) = 0, and so on. This simple example illustrates the use of the interpolated Volterra kernels instead of the P-V kernels (causing no harm). It should be emphasized that the proposed kernel estimation method yields the Volterra kernels of the system, whereas the cross-correlation method can only yield estimates of the P-W kernels that are zero on the diagonal points. It is also evident that Volterra kernel estimation using kernel expansions does not require Poisson inputs (although the latter are efficient test inputs) and can be used in connection with spontaneous neural activity data (arbitrary point-process inputs). This broadens immensely the scope of the advocated approach and elevates it to a general methodology with no peers at the present time (an estimation sketch is given at the end of this section). A note should be made regarding the relation of the advocated approach to the currently popular approach of paired-pulse testing, whereby pairs of impulses of variable interpulse interval are presented at the input of the system and the deviation of the system output from linear superposition is computed as a quantitative measure of paired-pulse facilitation or depression. This quantitative measure is also given by the second-order Volterra kernel in a more efficient manner, when the system is of second order and only a pair of impulses is allowed within the memory epoch. Greater efficiency results from the reduction in the required experimentation and processing time using the advocated approach, because the long sequence of paired-pulse tests is avoided. However, if the system is of higher order, even if no more than two impulses occur within the memory epoch, the results will deviate for the two cases. The presented Volterra formulation is valid for any number of impulses within the memory epoch and is more rigorous because it is based on solid mathematical foundations that allow modeling of the nonlinear interactions (facilitatory or depressive) of any order among any number of impulses within the memory epoch (not just pairs), bringing the experimental paradigm and the utility of the Volterra model much closer to the actual natural operation of the system. This is extremely important for advancing scientific knowledge in neuronal dynamics in a natural and unbiased context that is not constrained by specialized inputs, since results obtained from paired-pulse testing can be severely biased by higher-order interactions. The results obtained with the advocated methodology are more reliable than the results of the paired-pulse method because they separate explicitly the contributions of the various orders of kernels (i.e., the paired-pulse response contains the contributions from all kernels of order second and higher without distinction) and they avoid errors due to cross-term interactions or the effects of systemic/ambient noise (because of the employed least-squares fitting). Some illustrative examples of the application of this approach to the hippocampal formation are given in the following section.
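The sketch below illustrates the two computational steps of this method for a first-order model: generation of the orthonormal DLF basis by its standard recursion, and linear least-squares estimation of the expansion coefficients of Equation (8.91). The recursion form and the default L, M, α values are standard assumptions rather than quoted settings.

```python
import numpy as np

def dlf_basis(L, M, alpha):
    """Orthonormal discrete-time Laguerre functions b_j(m), j = 0..L-1."""
    b = np.zeros((L, M))
    b[0] = np.sqrt(1 - alpha) * alpha ** (np.arange(M) / 2.0)
    for j in range(1, L):
        b[j, 0] = np.sqrt(alpha) * b[j - 1, 0]
        for m in range(1, M):
            b[j, m] = (np.sqrt(alpha) * b[j, m - 1]
                       + np.sqrt(alpha) * b[j - 1, m] - b[j - 1, m - 1])
    return b

def estimate_first_order(x, y, L=4, M=100, alpha=0.99):
    """Least-squares fit of c0 and c1(j) in Eq. (8.91), first order only."""
    b = dlf_basis(L, M, alpha)
    # v_j(n) = sum_m b_j(m) x(n - m): convolve the input with each DLF
    V = np.stack([np.convolve(x, bj)[:len(x)] for bj in b], axis=1)
    X = np.column_stack([np.ones(len(x)), V])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    k1 = b.T @ coeffs[1:]          # reconstruct k1(m) via Eq. (8.89)
    return coeffs[0], k1
```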
8.3.3  Examples from the Hippocampal Formation

As a first example of the power of the Poisson-Wiener modeling approach in elucidating the functional complexity of parts of the central nervous system, we choose one of
the pioneering applications to the hippocampal formation by the collaborating research groups of Berger and Sclabassi [Berger et al., 1987, 1988a, b, 1989, 1991, 1993, 1994; Sclabassi et al., 1987, 1988a, b, 1989, 1994]. We also present more recent modeling results obtained by the collaborating research groups of Berger and Marmarelis from medial and lateral perforant-path stimulation in a hippocampal slice (in vitro) that make use of the improved kernel estimation methods presented in the previous section (the Laguerre expansion technique to obtain directly the "reduced" Volterra kernels). The recent results include single-input and dual-input Volterra models [Gholmieh et al., 2001, 2002, 2003a, b; Dimoka et al., 2003]. Finally, we present an example of reduced Volterra modeling of synaptic dynamics in the hippocampus and compare it to a widely accepted parametric model [Song et al., 2002, 2003].

Single-Input Stimulation in Vivo and Cross-Correlation Technique.  The neuroanatomical structure of many brain regions is organized in terms of highly interconnected subpopulations of neurons, like the well-studied hippocampal formation, which consists of the entorhinal, hippocampal, and subicular cortices [Berger et al., 1987]. The major projection neurons of these three cortical regions form a multisynaptic circuit (see Figure 8.21) in which excitation of the entorhinal cortex results in the sequential excitation, through the perforant path, of hippocampal and subicular neurons. In addition, many feedforward and feedback pathways exist within this "master loop" that modulate the activity of the projection neurons. Berger and Sclabassi and their associates have used the aforementioned modeling approach to study the functional characteristics of the hippocampal formation [Berger et al., 1987, 1988a, b, 1989; Sclabassi et al., 1987, 1988a, b, 1989]. Their experimental strategy has involved the application of Poisson random impulse trains to axons of the perforant path, which arise from entorhinal cortical neurons, and recording as outputs the evoked population responses from postsynaptic granule cells of the dentate gyrus or pyramidal cells in the CA3 or CA1 regions. The reduced Volterra model presented in Section 8.3.2 is used for representing the transformational characteristics of the hippocampal network of neurons activated by the perforant path. The activity of dentate granule cells can serve as an effective measure of total network activity because the other subpopulations of neurons within the hippocampal formation contribute to the excitability of granule cells through several feedback pathways. The high degree of lamination of neuronal elements within the hippocampus results in distinct advantages for applying this modeling approach. Subthreshold stimulation applied to the perforant path results in the generation of excitatory postsynaptic potentials (EPSPs) within the dendrites of dentate granule neurons. The intracellular current is reflected extracellularly as a positive-going potential (i.e., current source), or "population EPSP," within the cell body layer of the dentate [Figure 8.22 (A)]. If the stimulus is suprathreshold, the action potential discharges of a population of granule cells are reflected as a negative-going "population spike" (i.e., current sink) that is superimposed on the positive-going "population EPSP" [Figure 8.22 (B)]. Thus, the two components of the extracellular field potentials can be separated and analyzed in isolation [Figure 8.22 (C)].
This greatly simplifies the computation of "reduced" Poisson-Wiener kernels for the population spike in the lag-delta (or tau-delta) representation (see Section 8.3.2). Methodological and experimental procedures are detailed in Berger et al. (1987, 1988a, b) and Sclabassi et al. (1988a, b). All experiments were conducted using male New Zealand white rabbits, and data were collected either acutely or after animals had recovered from chronic implantation of stimulation and recording electrodes. During each
Figure 8.21a  Schematic representation of a cross-section of the hippocampus showing the intrinsic trisynaptic pathway involving excitatory input from the medial and lateral perforant path (pp) to granule cells in the dentate gyrus, to pyramidal cells of the CA3 regio inferior, and to pyramidal cells of the CA1 regio superior [adapted from Berger et al., 1987].
Figure 8.21b  Schematic representation of the major interconnectivity in the hippocampal formation. Abbreviations: comm. = commissural projections; bc = basket cells; pp = perforant path.
experiment, a Poisson train of 4064 impulses (biphasic, 0.10 ms duration) was delivered to the perforant path with a mean rate of 2.0 impulses/sec (i.e., a mean interimpulse interval of 500 ms) and a range of random (Poisson) interimpulse intervals of 1-5000 ms. The majority of population spikes (typically 90% for each preparation) evoked during random-train stimulation occurred with latencies (τ) of 3-9 ms in both anesthetized (N = 8) and unanesthetized (N = 8) animals [Figure 8.23 (A) and (B), respectively]. Because of this range of spike latencies, first-order kernels were computed initially with a temporal resolution of 10 ms, so as to include all evoked spikes in a single tau bin, making the first-order kernel a simple scalar [reduced Poisson-Wiener (P-W) kernel]. The value of the first-order reduced P-W kernel is the average population spike amplitude evoked during Poisson-train stimulation, which was 1.7 mV (±0.1) for anesthetized animals and 2.2 mV (±0.3) for unanesthetized animals.
Figure 8.22  Examples of population EPSP (A) and spike (B) components of field potentials evoked in the cell body layer of the dentate gyrus in response to stimulation of the perforant path. The result of a subthreshold stimulation is shown in A, and of a suprathreshold stimulation in B. Shown in C is the manner in which the amplitude (A) of the population spike is determined [Berger et al., 1987].
Second-order P-W kernels were also computed in reduced form (single tau bin of 10 ms). Representative reduced second-order P-W kernels for three anesthetized animals are shown in Figure 8.24 (A-C) and reveal significant nonlinearities as a function of the interimpulse interval Δ. For Δ < 50 ms they exhibit inhibitory effects on the population spike, but between 50 ms and 300 ms they exhibit marked facilitation. The maximum facilitation for all anesthetized preparations ranged from 93-158% of the first-order kernel value and occurred between 70-100 ms. The facilitation was bimodal in some preparations. For example, Figure 8.24 (C) shows a second-order P-W kernel exhibiting one peak of facilitation at 70 ms and a second peak of facilitation at 210 ms. The presence of this bimodality was attributed to inadvertent stimulation of two different sites or to the high estimation variance associated with the employed cross-correlation technique. For Δ values in the range of 300-700 ms, an inhibition in spike amplitude was observed in six of the eight anesthetized preparations but was small and variable from animal to animal. At Δ values longer than 700 ms, the kernel values appear statistically insignificant. Second-order P-W kernels computed from unanesthetized, chronically prepared animals revealed several significant differences from those obtained in anesthetized animals [Figure 8.24 (D-F)]. First, data from unanesthetized animals revealed less facilitation in response to interimpulse intervals in the range of 70-100 ms (mean peak facilitation = 74 ± 8%). Second, inhibition did not occur in response to intervals longer than 300 ms. Third, data from 5 of the 8 animals showed a slight facilitation that extended to approximately 1000 ms [see Figure 8.24 (F)]. Fourth, the bimodality observed in anesthetized animals is removed. Analyzing the amplitude of the evoked population spike in isolation from other components also greatly simplifies the presentation and interpretation of third-order reduced P-W kernels, because the data can be presented in three dimensions instead of the four dimensions that would be necessary if the entire field potential had been analyzed. Reduced third-order P-W kernels are shown in Figure 8.25, one for data collected from an unanesthetized animal and one for data collected from an anesthetized animal (both kernels were computed with a tau bin of 35 ms). The second-order P-W kernels for these same preparations are shown in Figure 8.24 (A) and (D), respectively.
Figure 8.23  Distributions of latencies for population spikes elicited during random-train stimulation for anesthetized (A) and unanesthetized (B) preparations. First-order Poisson-Wiener kernels (values are expressed in millivolts) computed with temporal resolutions of 10 ms (C and D) and 1 ms (E and F) for data collected from the same two preparations [Berger et al., 1987].
The third-order kernels reveal that significant third-order nonlinearities exist in the system and are altered by anesthesia. In general, third-order P-W kernels from anesthetized animals have significant positive components, whereas their counterparts from unanesthetized animals contain almost all negative components. The observed bimodality in the facilitatory region of some kernels may be due to the stimulation of two subpopulations of perforant-path fibers, since a medial and a lateral perforant path can be distinguished on the basis of anatomical, physiological, and pharmacological criteria (see the example of dual stimulation below). One of those criteria is a shorter latency to peak response of population EPSPs in dentate granule cells produced by stimulation of the medial perforant path. Thus, the first peak may reflect predominant activation of the medial perforant path, whereas the second peak may reflect predominant activation of the lateral perforant path.
Figure 8.24  Second-order reduced P-W kernels for population spikes of all latencies (tau bin = 10 ms) recorded from three different anesthetized (A-C) and two different unanesthetized (D and E) preparations. For panels A-E, second-order kernel values are expressed as a function of interstimulus intervals (Δ) extending to 1000 ms. The data shown in F are from the same preparation as in E, except that interstimulus intervals extend to 2000 ms. Both unnormalized (mV) and normalized (relative to the peak of the first-order kernel) values are shown.
This is corroborated by the results of the two-input study (medial/lateral) presented below. Using this approach, Berger and Sclabassi determined how manipulation of an afferent projection that is diffusely distributed throughout the hippocampus (i.e., the noradrenergic input from the locus coeruleus) influences functional properties at an integrated level, and investigated how epileptiform activity at one point in the system influences the transformational properties of the entire hippocampal network. These results demonstrate the utility of this approach for a comprehensive understanding of the function of the central nervous system. It is evident that the presented modeling methodology can be used to assess quantitatively the effect of pharmacological intervention or to diagnose neuropathologies in the central nervous system, provided that the appropriate input-output measurements can be made reliably and safely.
Figure 8.25  Third-order reduced P-W kernels from an unanesthetized (top) and an anesthetized animal (bottom). Kernel values are normalized relative to the respective first-order kernel for each set of data, and plotted as a function of the least recent (Δ_1) and the most recent (Δ_2) of the two prior interimpulse intervals. Only half of the plot is shown because only interval pairs in which Δ_2 > Δ_1 were included in the calculations [Berger et al., 1987].
An illustrative example of quantifying pharmacological effects in a hippocampal slice (in vitro) is presented in the following section.
Single-Input Stimulation in Vitro and Laguerre Expansion Technique.  The major excitatory projection from the entorhinal cortex to the granule cells of the dentate gyrus is the perforant pathway, which consists of two anatomically and functionally distinct subdivisions: the medial and the lateral perforant path. The lateral perforant path (LPP) arises in the ventrolateral entorhinal area and synapses in the outer one-third of the molecular layer of the dentate. The medial perforant path (MPP) arises in the dorsomedial entorhinal cortex and synapses onto the granule cell dendrites in the middle one-third of the molecular layer (see Figure 8.21a). The difference in the synaptic locations on the granule cell dendrites of the terminals from the LPP and MPP results in different electrophysiological characteristics of the granule cell responses to the independent activation of the two pathways. The medial and the lateral fibers of the perforant path were stimulated in a hippocampal slice preparation in vitro, and the population spike data recorded as output in the dentate gyrus were analyzed using the Laguerre expansion technique adapted for point-process inputs (see Section 8.3.2). An array of 60 microelectrodes (3 × 20 electrode arrangement, with electrode size of 28 μm and center-to-center spacing of 50 μm) was used for recording (MEA60, Multi Channel Systems, Germany) from a hippocampal slice of adult male rats. Poisson sequences of 400 impulses were used to stimulate each pathway independently, and the induced responses of the dentate population spikes were analyzed. The resulting reduced second- and third-order Volterra kernels are shown in Figure 8.26 for the medial pathway (k_1 = 144 μV) and in Figure 8.27 for the lateral pathway (k_1 = 197 μV). The distinct dynamic characteristics of the two pathways are evident. The medial pathway exhibits biphasic dynamics, but the lateral pathway is strictly inhibitory in the second-order kernel. The third-order kernels are biphasic for both pathways, clearly with distinct dynamics that appear to counteract the effects of their second-order counterparts. The predictive ability of the obtained models is excellent, and the obtained kernel estimates are consistent from experiment to experiment. These results corroborate the improvement in modeling performance due to the new estimation method (based on kernel expansions). This also allows the extension of the modeling approach to the dual-stimulation paradigm, discussed in the following subsection [Dimoka et al., 2003]. Another interesting demonstration of the efficacy of the new modeling methodology is presented in Gholmieh et al. (2001), where the reduced second-order Volterra kernel of the neuronal pathway from the Schaffer collateral (stimulation) to the CA1 region (population spike response) is shown to remain rather consistent from experiment to experiment in acute and cultured hippocampal slices. The pharmacological effects of picrotoxin (a GABA_A receptor antagonist) are quantified by means of the respective reduced second-order Volterra kernels, as demonstrated in Figure 8.28. The effect of picrotoxin on the neuronal dynamics is more pronounced for short lags (Δ < 100 ms), consistent with the existing qualitative knowledge regarding GABA_A receptors (observe the higher peak after picrotoxin).

Dual-Input Stimulation in the Hippocampal Slice.
Two independent Poisson trains of impulses stimulate simultaneously the medial and lateral perforant pathways (MPP and LPP) of the hippocampal slice preparation, and the induced population spike (PS) in the dentate gyrus is viewed as the system output. The latency of the PS is within 10 ms of the respective stimulating impulse and, thus, the reduced form of the Poisson-Volterra model is employed by suppressing the tau dimension in the tau-delta kernel representation (see Section 8.3.2). The synchrony of the input-output events, the independence of the timing of the impulses in the two input Poisson sequences, and the extraction of the PS output data from the field recording are illustrated in Figure 8.29. The two-input modeling methodology has to be slightly modified in the reduced Poisson-Volterra case in order to account for two distinct interaction components (cross-kernels) in the second-order model. This modification is presented below and assigns a cross-kernel component to each input-synchronized output PS.
Figure 8.26  Second-order and third-order reduced Volterra kernels for the medial pathway (first-order kernel k_1: mean 144 μV, SD 7.65).
Figure 8.27  Second-order and third-order reduced Volterra kernels for the lateral pathway (single-input stimulation, third-order model; first-order kernel k_1: mean 197 μV, SD 3.41).
If we denote by x_1 and x_2 the Poisson input sequences for the MPP and LPP, respectively, then

x_1(n) = \sum_{n_{i_1}} A_1 \delta(n - n_{i_1})    (8.94)

x_2(n) = \sum_{n_{i_2}} A_2 \delta(n - n_{i_2})    (8.95)
where n denotes the discrete-time index (bin size of 10 ms), A_1 and A_2 denote the fixed impulse amplitudes for the MPP and LPP, respectively, and {n_{i_1}}, {n_{i_2}} are the times of occurrences of the impulse events in the two inputs [Courellis et al., 2000; Dimoka et al., 2003].
Figure 8.28  Effects of picrotoxin (100 μM) on the reduced second-order kernels. (A) Second-order kernel before drug addition (lower curve) and 15 min after perfusion with 100 μM picrotoxin (higher curve). (B) Effect of picrotoxin on reduced first-order kernels (open bars: control; hatched bars: picrotoxin, 100 μM; means ± SD of five experiments). (C) Effects of picrotoxin on the nine Laguerre expansion coefficients (open bars: control; hatched bars: picrotoxin) [Gholmieh et al., 2001].
For the second-order model, the output PS amplitude can be expressed in terms of two components, y_1 and y_2, representing the MPP and LPP contributions, respectively:

y(n) = y_1(n_{i_1}) \delta(n - n_{i_1}) + y_2(n_{i_2}) \delta(n - n_{i_2})    (8.96)
Figure 8.29  Two-input experiment with medial and lateral perforant-path stimulation.
y_1(n_{i_1}) = k_{1,0} + \sum_{n_{i_1}-M \le n_{j_1} < n_{i_1}} k_{1,1}(n_{i_1} - n_{j_1}) + \sum_{n_{i_1}-M \le n_{j_2} < n_{i_1}} k_{1,2}(n_{i_1} - n_{j_2})    (8.97)

y_2(n_{i_2}) = k_{2,0} + \sum_{n_{i_2}-M \le n_{j_2} < n_{i_2}} k_{2,2}(n_{i_2} - n_{j_2}) + \sum_{n_{i_2}-M \le n_{j_1} < n_{i_2}} k_{2,1}(n_{i_2} - n_{j_1})    (8.98)
where k_{1,0} and k_{2,0} now denote the reduced P-V first-order kernels, k_{1,1} and k_{2,2} denote the reduced P-V second-order self-kernels, and k_{1,2} and k_{2,1} are the two components of the reduced P-V second-order cross-kernel in the two-input case (note that the summation index of each cross-kernel term accounts only for "other input" effects). The reduced first-order kernel is a constant value and represents the average PS amplitude for each pathway under dual stimulation. The reduced second-order self-kernel represents the nonlinear interaction between the present stimulus impulse and every past stimulus impulse in each pathway within the memory epoch. Each component of the reduced second-order cross-kernel shows the effect of one input pathway on the response generated by the other input pathway; a prediction sketch is given below. The reduced P-V kernels were estimated using the Laguerre expansion technique for L = 4, α = 0.994. The values of the first-order kernel were 106 μV for the MPP and 217 μV for the LPP in this example. The obtained second-order kernels are shown in Figure 8.30. The second-order self-kernel for the LPP exhibits only a slow inhibitory phase up to about 600 ms. The second-order self-kernel for the MPP is characterized by a fast facilitatory phase up to 80 ms, followed by a smaller but longer inhibitory phase lasting up to about 800 ms. The second-order cross-kernel for the LPP (representing the effects of the MPP on the LPP-synchronized output) is characterized by a fast inhibitory phase (0-100 ms), followed by a small and relatively short (100-400 ms) facilitatory phase. The second-order cross-kernel for the MPP is characterized by a fast facilitatory phase up to 150 ms, followed by a small and short inhibitory phase up to 250 ms, and then a small but longer facilitatory phase up to about 800 ms. The NMSE of the second-order model prediction with only self-kernels is 16.05%, and the inclusion of the cross-kernels drops it to 7.03%, as illustrated in Figure 8.31 [Courellis et al., 2000; Dimoka et al., 2003].
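A minimal sketch of the MPP-synchronized prediction of Equations (8.96)-(8.98) follows; the kernel arrays are assumed given and indexed by lag, with the input amplitudes absorbed into the kernel values for brevity, and the function name is illustrative.

```python
import numpy as np

def predict_mpp(ev1, ev2, k10, k11, k12, M):
    """y1 at MPP events; k11/k12 indexed by lag 1..M (length M + 1 arrays)."""
    ev1, ev2 = np.asarray(ev1), np.asarray(ev2)
    y1 = np.full(ev1.shape, k10, dtype=float)
    for i, n in enumerate(ev1):
        own = ev1[(ev1 < n) & (n - ev1 <= M)]      # past MPP events
        other = ev2[(ev2 < n) & (n - ev2 <= M)]    # past LPP events
        y1[i] += k11[n - own].sum() + k12[n - other].sum()  # Eq. (8.97)
    return y1
```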
Nonlinear Modeling of Synapfic Dynamies. As an example of nonlinear modeling of synaptic dynamics, we consider the case of short-tenn plasticity (STP) in the hippocampus, whereby presynaptic action potentials in a Schaffer collateral elicit excitatory postsynaptic currents (EPSCs) at CAI pyramidal cell synapses in a manner that depends on recent activity. A Poisson sequence of impulses (mean rate of 2 impulses/sec) was applied to the Schaffer collateral to stimulate orthodromically the synapses at voltageclamped CAI pyramidal cells. Due to the voltage clamp, the deconvolved EPSCs record-
Figure 8.30 Second-order self-kernels (top) and cross-kernels (bottom) when stimulating at the MPP (right) and the LPP (left) simultaneously (dual-site stimulation).
A third-order reduced Poisson-Volterra (P-V) model was estimated from the data, using 1000 input-output events and the Laguerre expansion technique. The resulting P-V kernel estimates for this nonparametric model are shown in Figure 8.32 and yield excellent prediction accuracy (NMSE = 2.4%). The reduced second-order P-V kernel exhibits early facilitation (up to ~100 ms) and late depression (between ~100 ms and ~1500 ms). The reduced third-order P-V kernel exhibits early depression (up to ~100 ms) and slight late facilitation (between ~100 ms and ~500 ms). These reduced P-V kernel estimates are compared in the same figure with their counterparts obtained from a widely accepted parametric model of STP for this synapse [Song et al., 2002], which is described briefly below. The simulated parametric model of STP was proposed in Dittman et al. (2000) and is described by the following equations [Song et al., 2002]:
Figure 8.31 (A) The actual response (amplitude of the population spikes) of the two-input system. (B) The model-predicted response based on the estimated self-kernels. (C) The model-predicted response upon inclusion of the estimated cross-kernels [Courellis et al., 2000].
Figure 8.32 The reduced P-V kernel estimates for third-order models estimated from Poisson random train data (EK) and optimized parametric model (PK) [Song et al., 2002].
$$\mathrm{EPSC}(t) = \alpha \cdot N_T \cdot F(t) \cdot D(t) \tag{8.99}$$

$$\frac{d\,\mathrm{CaX}_F}{dt} = -\mathrm{CaX}_F(t)/\tau_F + \delta(t - t_0) \tag{8.100}$$

$$\frac{d\,\mathrm{CaX}_D}{dt} = -\mathrm{CaX}_D(t)/\tau_D + \delta(t - t_0) \tag{8.101}$$

$$F(t) = F_1 + \frac{1 - F_1}{1 + K_F/\mathrm{CaX}_F(t)} \tag{8.102}$$

$$K_F = \left[\frac{F_1}{1 - F_1}\,(\rho - 1)\right]^{-1} - 1 \tag{8.103}$$

$$\frac{dD}{dt} = \big(1 - D(t)\big)\,k_{\mathrm{recov}}(\mathrm{CaX}_D) - D(t_0)\,F(t_0)\,\delta(t - t_0) \tag{8.104}$$

$$k_{\mathrm{recov}}(\mathrm{CaX}_D) = \frac{k_{\max} - k_0}{1 + K_D/\mathrm{CaX}_D(t)} + k_0 \tag{8.105}$$
In this parametric model, the EPSC peaks are modeled as the product of the unitary EPSC (α), the total number of release sites (N_T), the facilitation factor (F), and the depression factor (D). F and D are calculated from the concentrations of two calcium-bound molecules, CaX_F and CaX_D, which are both driven by residual calcium. Model parameters are:
the initial release probability F_1; the maximum paired-pulse facilitation ratio ρ; the decay time-constants τ_F and τ_D for CaX_F and CaX_D, respectively; the minimum and maximum recovery rates k_0 and k_max; and the affinities of CaX_F and CaX_D for the release sites, K_F and K_D, respectively. Note that K_F and ρ are equivalent. The model parameters (ρ, F_1, k_0, k_max, and K_D) were estimated from experimental data using the quasi-Newton optimization method, yielding the model parameter values ρ = 1.1, F_1 = 0.32, k_0 = 1.4 sec⁻¹, k_max = 15.2 sec⁻¹, K_D = 1.9, τ_F = 100 msec, τ_D = 50 msec, and (α · N_T) = 0.54 nA. The predictive performance of the parametric model is a little worse (NMSE = 3.5%) and its equivalent P-V kernels are slightly different from their counterparts of the nonparametric model (see Fig. 8.32). The main difference is in the early facilitation of the reduced second-order P-V kernel (representing two-pulse interactions), where the parametric model exhibits almost double the actual values measured by the nonparametric model (which is inductive and true to the data). At this juncture, it is instructive to clarify the relation between the P-V kernels and the widely used "paired-pulse response" curves (obtained by successive experiments with two impulses of variable time separation). If y(τ_1, τ_2) denotes the output predicted by the third-order P-V model with two input impulses preceding the present output by τ_1 and τ_2 sec, respectively, then

$$y(\tau_1, \tau_2) = k_1 + r_2(\tau_1) + r_2(\tau_2) + 2k_3(\tau_1, \tau_2) \tag{8.106}$$

where

$$r_2(\tau) = k_2(\tau) + k_3(\tau, \tau) \tag{8.107}$$

is the paired-pulse response for a third-order system. Note that k_1 is equal to F_1, the initial release probability defined in the parametric model, and the maximum facilitation ratio ρ is the summation of the peak value of r_2 and 1.
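A minimal simulation sketch of Equations (8.99)-(8.105) follows, using the parameter values quoted above. The integration scheme is an assumption of this sketch (simple Euler stepping with unit increments of CaX_F and CaX_D at the impulse times), and the value of K_F follows the reconstructed Equation (8.103); it is not the implementation of [Song et al., 2002].

```python
import numpy as np

# Parameter values quoted above, with rates converted from 1/sec to 1/ms
rho, F1 = 1.1, 0.32
k0, kmax, KD = 1.4e-3, 15.2e-3, 1.9
tauF, tauD = 100.0, 50.0          # ms
alpha_NT = 0.54                   # nA, the product (alpha * N_T)
KF = 1.0 / ((F1 / (1.0 - F1)) * (rho - 1.0)) - 1.0   # reconstructed Eq. (8.103)

def simulate_epsc(pulse_times_ms, dt=1.0, t_end=3000.0):
    """Euler integration of Eqs. (8.100)-(8.105); returns the EPSC peak
    elicited by each impulse, per Eq. (8.99)."""
    caF = caD = 0.0               # CaX_F(t), CaX_D(t)
    D = 1.0                       # depression factor
    peaks, pulses, ip = [], sorted(pulse_times_ms), 0
    t = 0.0
    while t < t_end:
        if ip < len(pulses) and t >= pulses[ip]:
            F = F1 + (1 - F1) / (1 + KF / caF) if caF > 0 else F1  # Eq. (8.102)
            peaks.append(alpha_NT * F * D)                         # Eq. (8.99)
            D -= D * F            # impulsive term of Eq. (8.104)
            caF += 1.0            # unit increments at impulse times (assumed)
            caD += 1.0
            ip += 1
        krec = (kmax - k0) / (1 + KD / caD) + k0 if caD > 0 else k0  # Eq. (8.105)
        D += (1 - D) * krec * dt  # recovery term of Eq. (8.104)
        caF -= caF / tauF * dt    # Eq. (8.100)
        caD -= caD / tauD * dt    # Eq. (8.101)
        t += dt
    return np.array(peaks)

print(simulate_epsc([100, 150, 200, 700]))  # facilitation, then depression
```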
8.4 MODELING OF NEURONAL ENSEMBLES
The modeling methodology presented in the previous section can be applied to any level of complexity (system integration), from the single neuron to any number of interconnected neurons (ensemble). In the latter case, a critical issue in developing an "external" model (or input-output nonparametric mapping) is the selection of the input and output points in the histological configuration of the ensemble. The input points can be either locations of external stimulation or the natural activity of a neuron in the ensemble. The output points are selected locations where the induced activity (response to stimulation) is monitored for the purpose of determining its causal link with the stimulating input point(s). The input and output points can be intracellular or extracellular. The latter case includes field recordings that reflect the collective contributions of a population of neurons in the neighborhood of the electrode (a useful measurement in some cases, discussed in Section 8.3.3). Adoption of the input-output approach facilitates the discovery of causal links between points of interest in the neuronal ensemble and yields quantitative descriptions (models) of these causal relationships. These models can be used for prediction purposes or to advance our understanding of the functional characteristics of the neuronal ensemble.
We can also manipulate pharmacologically the experimental preparation in order to separate specific neurophysiological mechanisms, or we can redefine our input-output points to explore different functional aspects of the neuronal ensemble. However, these input-output models do not probe specific aspects of the internal workings of the ensemble and do not reveal directly detailed information on the individual components. To achieve this more specific information, we need to develop "structural" or modular models of the neuronal ensemble based on histological knowledge of the neuronal structural interconnectivity, or we can use functional modules defined by the "integrated submodels" of single neurons. In both cases, it behooves us to consider the single neuron as the basic functional unit that maps presynaptic inputs onto the sequence of action potentials at the axon hillock (see Sec. 8.2). It is evident that the definition of this "basic functional unit" calls for an integrated, external model of the single neuron such as the Volterra-type (or PDM) models advocated in this book. It is also clear that by defining the output as the sequence of action potentials at the axon hillock, we also imply that the "presynaptic inputs" should be represented by the outputs of presynaptic neurons (i.e., their sequences of action potentials at their axon hillocks) and not by the numerous presynaptic events/pulses arriving at the various axonal terminals of each presynaptic neuron. Generally, for an ensemble of M possibly interconnected neurons, we may consider a multiinput Volterra model with a threshold trigger to represent the activity of each neuron in terms of the activity of its cohorts. Thus, for the activity of the mth neuron:

$$x_m(t) = T\Bigg\{k_0^{(m)} + \sum_{\substack{i=1\\ i\neq m}}^{M} \int_0^\infty k_i^{(m)}(\tau)\,x_i(t-\tau)\,d\tau + \sum_{\substack{i_1=1\\ i_1\neq m}}^{M} \sum_{\substack{i_2=1\\ i_2\neq m}}^{M} \iint_0^\infty k_{i_1,i_2}^{(m)}(\tau_1,\tau_2)\,x_{i_1}(t-\tau_1)\,x_{i_2}(t-\tau_2)\,d\tau_1\,d\tau_2 + \cdots \Bigg\} \tag{8.108}$$
where T denotes a threshold-trigger operator with refractoriness, and k^{(m)}_{i_1, ..., i_n} denotes the nth-order cross-kernel for the mth neuron describing the interactions among the neuronal activities (inputs) x_{i_1}, ..., x_{i_n} as they affect the activity x_m of the mth neuron. The Volterra model, which defines the operand of T, can be viewed as generating an "intracellular potential" at the hillock of the mth neuron in response to the activities of all other neurons, which is subsequently converted to an action potential if it exceeds the threshold. The latter must have a refractory period and may have some stochastic characteristics (as in the Brillinger formulation of the problem [Brillinger, 1988]). Obviously, this Volterra model may take the modified form of Eq. (2.180) after kernel expansion, or an equivalent modular PDM form with a "trigger region" (see Fig. 8.1). The latter can also be represented by an equivalent network form to achieve greater efficiencies in model compactness, with benefits in estimation and prediction. This model can be viewed as a generalization of the simple "integrate-and-fire" model that takes into consideration the combined effects of multiple (possibly interconnected) neuronal units in a nonlinear dynamic context, where intermodulatory effects are accounted for (i.e., the nonlinearity is more than the operation of a threshold-trigger generating the action potential). We can also define the basic functional unit of a single neuron by means of a PDM model with a "trigger region," whose PDMs are termed the "neuronal modes," as discussed in the previous section.
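As a concrete illustration of Equation (8.108), the following sketch (truncated to first-order kernels for clarity; all kernels, thresholds, and spike trains are hypothetical) generates the activity of one neuronal unit from the activities of its cohorts through a threshold trigger with refractoriness.

```python
import numpy as np

def neuron_output(X, m, k0, k1, theta, refractory):
    """First-order sketch of Eq. (8.108): the m-th neuron's 'intracellular
    potential' is a sum of convolutions of the other neurons' spike trains
    with first-order cross-kernels, and a spike is triggered when it exceeds
    a threshold, subject to a refractory period.
    X: (M, N) spike trains; k1: dict {i: kernel array} for i != m."""
    M, N = X.shape
    u = np.full(N, float(k0))
    for i, kern in k1.items():
        if i == m:
            continue
        u += np.convolve(X[i], kern)[:N]      # first-order terms of Eq. (8.108)
    out = np.zeros(N, dtype=int)
    last = -np.inf
    for n in range(N):
        if u[n] > theta and n - last > refractory:   # threshold trigger T{.}
            out[n] = 1
            last = n
    return out

# Hypothetical ensemble of 3 neurons driving neuron 0 (all values assumed)
rng = np.random.default_rng(0)
X = (rng.random((3, 500)) < 0.05).astype(float)
kern = 0.8 * np.exp(-np.arange(30) / 8.0)
spikes = neuron_output(X, m=0, k0=0.0,
                       k1={1: kern, 2: -0.5 * kern}, theta=0.6, refractory=5)
print(spikes.sum(), "spikes triggered")
```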
The neuronal modes (NMs) incorporate all the dynamics of voltage-dependent and ligand-dependent channels implicated in the generation and processing of information through the neuronal unit (in the form of transmembrane voltage and currents), including propagation effects throughout the neuronal structure. It is critical to emphasize that the inputs of this "neuronal unit" are defined as the sequences of action potentials at the axon hillocks of the presynaptic neurons, and the output is defined as the elicited sequence of action potentials at the axon hillock of the subject neuron. The latter may branch out when it becomes the presynaptic input to other neuronal units (fan-out of axonal terminals), defining the possibly divergent "downstream" connectivity of the subject neuron; however, the integrated dynamics of this process of axonal divergence and postsynaptic connectivity are incorporated in the NMs of the postsynaptic neuronal units. This is illustrated in Figure 8.33, where the NM models of several neurons (neuronal units as previously defined) are shown to form a neuronal ensemble of intricate interconnectivity. The emerging properties of such neuronal ensembles depend both on the characteristics of the individual NMs and TRs and on the particular interconnectivity among neuronal units. This subject matter deserves more elaboration than allowed by the space limitations and overall economy of this book. Inevitably, it will have to await more elaborate treatment in future publications. For now, it is simply presented as a plausible general framework for the detailed functional study of "internalized models" of neuronal ensembles. A more practicable alternative is offered by "external" or "black-box" models of neuronal ensembles defined by the selected input and output points. This approach is consistent with the general nonparametric methodology that constitutes the foundation of the Volterra-Wiener approach (with all its strengths and weaknesses discussed in Chapters 1 and 2). Although this input-output modeling approach is the origin of the advocated methodology, the latter has also made significant strides in "internalizing" the model (i.e., opening the black box), primarily through the use of equivalent modular models (such as the PDM or NM models) or equivalent parametric models (in the form of differential or difference equations) whenever appropriate. This remains our guiding principle in the gradual process of converting inductive models (i.e., nonparametric models true to the input-output data) into interpretable model forms that reflect more closely the internal functional organization of the subject system and elucidate detailed aspects of scientific or clinical interest. This gradual process can be assisted by the use of PDM analysis or its counterpart for neuronal systems (NM analysis). An interesting method utilizing maximum-likelihood estimation was used by Brillinger to analyze multiunit data from Aplysia californica provided by Segundo and Bryant [Brillinger et al., 1976]. The spontaneous activity of three abdominal ganglia was monitored and the spike-train data were analyzed to examine possible causal links among them. The mathematical formalism follows the Volterra approach (up to second order) to define an "intracellular" potential at each neuron based on the activities of the other two neurons, and an action potential is generated if the intracellular potential exceeds a certain soft threshold that is defined by a normal cumulative distribution [Brillinger, 1988].
The inflection point of the soft threshold is allowed to vary parabolically with the time elapsed from the last neuron firing (in an attempt to capture the refractoriness of the neuron) and is also allowed to exhibit some limited Gaussian variation (stochastic threshold). The likelihood function is defined by a Bernoulli-type binomial distribution (on the assumption of statistical independence of the successive values of the probability of firing) and is maximized with respect to the unknown parameters (which are akin to kernel functions or related to the thresholding operation).
Figure 8.33 Interconnectivity in a neuronal ensemble of "neuronal units" (see text).
The results are interesting but not compelling, because of various assumptions in the model structure that appear somewhat arbitrary (including the postulated statistical independence of successive values of the probability of firing or the assumed parabolic dependence of refractoriness). We must also note the pioneering work of Moore, Segundo, and Perkel on cross-correlation analysis of neuronal firing patterns to determine causality in neuronal ensembles [Moore et al., 1966; Segundo et al., 1968]. This work retains some descriptive relevance to our scientific objectives until the present time, but it does not yield predictive models or deal with the intrinsic nonlinearities of the system. Last, but not least, we must point out that the presence of closed-loop connectivity raises a host of critical modeling issues that are addressed in Chapter 10.
9 Modeling of Nonstationary Systems
The importance of nonstationarities in physiological function is widely recognized (e.g., time-varying characteristics of the cardiovascular or metabolic-endocrine systems), although no general methodology has been accepted as broadly effective under the constraints imposed by physiological data. Some of the confounding factors are the diversity in the form of nonstationarities encountered in physiology, the intertwining of nonstationarities with nonlinearities, and the ubiquitous presence of noise that obscures observation and complicates the estimation task. The first task in a practical context is to examine whether the subject system exhibits significant nonstationarities under the operating conditions of interest. We must stress the importance of examining the system behavior under meaningful operating conditions in order to avoid serious misconceptions arising from contrived experimental conditions (which often tend to assume a level of credibility in the literature beyond what is warranted by the artificially imposed experimental constraints). The practical assessment of nonstationarity can be made either by means of preliminary testing or by comparing the relative performance of stationary and nonstationary models on the same input-output data (in terms of prediction capability). In the former case, the same experiment is repeated at different times and the obtained stationary models are compared for consistency. Obviously, significant differences between stationary models obtained at different times imply either the presence of nonstationarities or poor estimation accuracy (an issue that must be resolved by improving the estimation accuracy and reevaluating the results). This approach can be experimentally laborious but can be simplified somewhat by using specialized inputs. The comparison of the predictive performance of stationary and nonstationary models on the same input-output data requires the estimation of nonstationary models and, therefore, is subject to all the burdens and potential pitfalls of this task. However, it is less laborious experimentally and becomes the more attractive (or the only feasible) option in cases where the experimental constraints on the duration of data collection are severe.
If a controlled input can be used, then an efficient and rigorous test of nonstationarity can be performed as follows. A stationary broadband (preferably quasiwhite) random input is applied to the system and the recorded output is analyzed to determine whether it is a stationary random process or not. Barring the presence of significant output-contaminating noise that happens to be nonstationary, the stationarity (or nonstationarity) of the output signal provides an unambiguous answer to the question of system stationarity, because the latter reflects inevitably on the former. A practical and rigorous method for establishing the stationarity (or nonstationarity) of the output signal has been proposed by the author [Marmarelis, 1981b, 1983] and is outlined in Section 9.2 in connection with the kernel expansion methodology of nonstationary modeling of Volterra systems. In this chapter, we review existing methodologies for the purpose of nonstationary modeling from input-output data and emphasize the ones that we consider most promising in terms of efficacy and practicability. We begin in Section 9.1 with the most commonly used approach, which entails the tracking of nonstationary changes through time by applying either recursive algorithms or piecewise stationary methods over a sliding window of data (quasistationary methods). In Section 9.2, we discuss the nonstationary modeling methodology based on kernel expansions that was pioneered by the author in order to provide a general methodology for modeling nonstationary Volterra-Wiener systems with broadband inputs. In Section 9.3, the nonstationary modeling of nonlinear dynamic systems is cast in the context of Volterra-equivalent time-varying networks. Finally, in Section 9.4, we present some illustrative examples from applications to actual physiological systems. The presented methods offer effective means for nonstationary modeling of certain classes of systems. Specifically, the classes covered are systems with slow nonstationarities (relative to the system dynamics) of nearly arbitrary form, or systems with fast nonstationarities of restricted form (i.e., cyclical or transient). The strengths and limitations of each case are discussed below.
9.1 QUASISTATIONARY AND RECURSIVE TRACKING METHODS
The simplest and most commonly used approach to the modeling of nonstationary systems is the use of piecewise stationary modeling methods within a limited time window of the input-output data and the successive repetition of this task as the time window slides through time. Typically, substantial overlap is allowed between successive time windows, so that the evolution of changes in the obtained sequence of piecewise stationary models can be tracked reliably through time. This quasistationary approach requires the proper selection of the length of the sliding time window, so that sufficient data exist for obtaining reliable stationary input-output models at each position of the time window, without exceeding in length the time interval over which significant nonstationary changes do not occur in the system. Therefore, a critical trade-off must be exercised between the speed of nonstationary change and the amount of required input-output data for reliable (stationary) model estimation. For this reason, the quasistationary approach is effective when the speed of nonstationary change is low relative to the system dynamics (i.e., its bandwidth), so that a time window of adequate length can be secured for (stationary) model estimation at each time position of the sliding window. The amount of overlap between successive windows is another parameter that can be used to finesse the strain of these competing requirements.
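A minimal sketch of this quasistationary tracking scheme follows; the window lengths, the first-order (FIR) stationary estimator, and the simulated data are illustrative assumptions, not prescriptions.

```python
import numpy as np

def sliding_models(x, y, fit, win, step):
    """Quasistationary tracking: fit a stationary model on each sliding
    window of input-output data and return the sequence of models."""
    models = []
    for start in range(0, len(x) - win + 1, step):
        sl = slice(start, start + win)
        models.append((start, fit(x[sl], y[sl])))
    return models

def fit_fir(x, y, M=32):
    """Least-squares FIR (first-order kernel) estimate within one window."""
    X = np.column_stack([np.roll(x, k) for k in range(M)])
    X[:M, :] = 0.0                      # discard wrapped-around samples
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Example: 6-min windows sliding by 2 min at 1 Hz sampling (cf. Section 9.4)
x = np.random.default_rng(1).standard_normal(7200)
y = np.convolve(x, np.exp(-np.arange(32) / 5.0))[:7200]
models = sliding_models(x, y, fit_fir, win=360, step=120)
print(len(models), "windowed models estimated")
```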
On the positive side, the quasistationary approach can be applied to nonstationarities of nearly arbitrary form (i.e., they need not assume specific patterns, like cyclical, transient, etc.), including cases of chaotic or stochastic variations of the system characteristics. In addition, this approach does not require any additional methodological burden or esoteric mathematical formulations that tend to be discouraging to many biomedical investigators. An illustrative example of this approach is given in Section 9.4 for the case of cerebral autoregulation. Therefore, the quasistationary approach is very attractive as a first option (provided that the nonstationarity is slow relative to the system dynamics) because many physiological systems exhibit nonstationarities with no particular temporal structure (e.g., the intermodulatory effects of endocrine secretions or intermittent activity of the autonomic nervous system on the cardiovascular system, subject to stochastic external stimulation). One drawback of this approach is that interpretation of the observed nonstationary behavior requires separate analysis, because the nonstationarity is not modeled directly (unlike the cases of kernel-expansion and network-based approaches that model directly the form of the nonstationarity). Ultimately, the validity and the utility of this approach must be judged on the basis of its performance in each specific application (like all other approaches). To improve the estimation accuracy of this quasistationary approach, parametric recursive methods have been introduced to track the changes of the system in a continuous fashion. One such method is outlined in Section 3.1 in connection with adaptive estimation of parametric models through the recursive least-squares (RLS) algorithm [Goodwin & Sin, 1984]. This algorithm can be applied to the parameterized form of the general Volterra model of Eq. (2.180) (termed the "modified Volterra model") after preprocessing of the input through a selected filter bank, which is equivalent to the kernel expansion on the filter-bank basis of functions. The filter banks remain time-invariant and simply the coefficients of the modified Volterra model (which can be viewed as the kernel expansion coefficients) are updated through time via the RLS algorithm (recall that the coefficients enter linearly in the model, although the latter is nonlinear in terms of the input-output relation). The constraint of slow nonstationarity (relative to the system dynamics) still applies in this case, but the estimation accuracy of the successive stationary models generally improves with RLS estimation. An example of this approach is given by Michael Khoo and his associates in the modeling study of respiratory sinus arrhythmia [Blasi et al., 2003].
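A minimal sketch of one RLS update for such linearly entering coefficients is given below; the forgetting factor, the regressor construction, and the simulated data are illustrative assumptions, not taken from the cited studies.

```python
import numpy as np

def rls_update(theta, Pmat, phi, y, lam=0.99):
    """One recursive least-squares step with forgetting factor lam, updating
    the (linearly entering) coefficients theta of a modified Volterra model
    from the current regressor vector phi and output sample y."""
    Pphi = Pmat @ phi
    gain = Pphi / (lam + phi @ Pphi)
    theta = theta + gain * (y - phi @ theta)
    Pmat = (Pmat - np.outer(gain, Pphi)) / lam
    return theta, Pmat

# Hypothetical second-order regressor from two filter-bank outputs v1, v2
def regressor(v1, v2):
    return np.array([1.0, v1, v2, v1 * v1, v1 * v2, v2 * v2])

theta = np.zeros(6)
Pmat = 1e3 * np.eye(6)
rng = np.random.default_rng(2)
for _ in range(1000):
    v1, v2 = rng.standard_normal(2)
    y = 0.5 * v1 - 0.3 * v1 * v2 + 0.01 * rng.standard_normal()
    theta, Pmat = rls_update(theta, Pmat, regressor(v1, v2), y)
print(np.round(theta, 3))   # coefficients tracked through time
```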
9.2 KERNEL EXPANSION METHOD
This is the general approach to the problem of modeling, instead of simply tracking (as in the previous section), the nonstationarities of Volterra systems that are represented by explicit time variation of the Volterra kernels [Marmarelis, 1979b, c, 1980a, c, 1981a]. The approach, termed the "temporal expansion method" (TEM), employs a judiciously selected basis of temporal functions {β_m(t)}, defined over the input-output data record [0, R], to represent the rth-order time-varying Volterra kernel as an expansion:

$$k_r(t; \tau_1, \ldots, \tau_r) = \sum_{m=1}^{M} a_m^{(r)}(\tau_1, \ldots, \tau_r)\,\beta_m(t) \tag{9.1}$$
where {a_m^{(r)}} can be viewed as time-invariant kernel components associated with the M functions of the temporal basis for each order of kernel. This expansion is valid for all kernels that are square integrable over time, i.e., satisfy the condition

$$\lim_{R \to \infty} \frac{1}{R} \int_0^R k_r^2(t; \tau_1, \ldots, \tau_r)\,dt < \infty \tag{9.2}$$
for all (τ_1, ..., τ_r). Note that, even if condition (9.2) is not satisfied, an approximation of the expansion (9.1) can be obtained in practice. This type of kernel expansion can be applied also to the time-varying Wiener kernels of the system:

$$h_r(t; \tau_1, \ldots, \tau_r) = \sum_{m=1}^{M} \gamma_m^{(r)}(\tau_1, \ldots, \tau_r)\,\beta_m(t) \tag{9.3}$$
provided that the Wiener kernels also satisfy the square-integrability condition (9.2). Recall that the Volterra or Wiener kernels must also satisfy integrability conditions with respect to (τ_1, ..., τ_r) in order to represent stable systems. Thus, for stable physiological systems, these integrability conditions imply bounded values of the kernels. The reader must also be alerted to the fact that the manner by which the time-varying kernels are represented in Equations (9.1) and (9.3) is different from the mathematical formalism employed in control system theory. In the latter, following the pioneering work of Zadeh, the time-varying representation of the impulse response function h(t, τ') (in a linear system context) is defined by a convolution input-output relation where the variable τ' is different from our lag-variable τ in that it integrates the input values in reverse order from −∞ to t. This distinction in mathematical formulation appears trivial but has profound implications in terms of the separability condition required by the kernel expansion. Therefore, it must be clarified that our formulation follows the established Volterra series formulation (with regard to the definition of the lag variables) and simply allows the kernels to change with time t. This is consistent with the pioneering work of Flake on time-varying Volterra models [Flake, 1963b]. The key result that was obtained by the author in 1979 (but has been largely overlooked by the peer community and periodically "reinvented" in various specialized forms, e.g., cyclostationary analysis) is that the expansion kernel components in Equations (9.1) and (9.3) can be estimated in practice by use of a single input-output data record, provided the appropriate expansion basis {β_m(t)} is selected. The judicious selection of the temporal expansion basis {β_m(t)} is critical for the efficacy of this methodology in a practical context because it determines the number of required expansion components. Obviously, the approach is more efficacious when a small number of expansion components is sufficient for satisfactory representation of the nonstationary dynamics of the system. Therefore, the selection of the appropriate temporal basis depends on the nonstationary characteristics of the specific system under study. For instance, a system with periodic (or cyclical) nonstationarities may be modeled efficiently by use of a few components of the Fourier basis, or a system with transient nonstationarities may be modeled efficiently by use of polynomial (e.g., Legendre) or polynomial-exponential (e.g., Laguerre or Gamma) basis functions. Having selected a temporal expansion basis, the estimation of the kernel expansion components can be accomplished either via cross-correlation (if the input is white or quasiwhite) or via least-squares fitting (if the input is arbitrary). The two cases are elaborated below.
For a GWN (or quasiwhite) input, the expansion components of the Wiener kernels can be estimated by "weighted" cross-correlation of the input-output data for a single data record as

$$\hat{\gamma}_m^{(r)}(\tau_1, \ldots, \tau_r) = \frac{1}{r!\,P^r R} \int_0^R \beta_m(t)\,y_r(t)\,x(t - \tau_1) \cdots x(t - \tau_r)\,dt \tag{9.4}$$
where the temporal expansion basis β_m(t) constitutes the "weighting function" that can be viewed as "modulating" the output data prior to cross-correlation with the GWN or quasiwhite input x(t), P is the power level of the GWN or quasiwhite input, and y_r denotes the rth-order output residual (defined as in the stationary case by subtracting all previously estimated terms from the output signal). It has been shown that the estimator of Equation (9.4) is unbiased and consistent (i.e., its variance tends to zero as R tends to infinity) for systems with "sustained" nonstationarities [Marmarelis, 1979b, c]. The latter are defined by the following condition:

$$0 < \lim_{R \to \infty} \frac{1}{2R} \int_{-R}^{R} h_r^2(t; \tau_1, \ldots, \tau_r)\,dt < \infty \tag{9.5}$$
which pertains also to the square-integrability condition discussed earlier for the existence of the temporal expansion (the right side of the inequality). The left side of the inequality simply states that the kernel must not vanish after a finite point in time (this must hold for at least one kernel of the system, so that the system operation does not "die out"). This "sustainability" condition is required for the statistical consistency of the estimator (9.4) but, in practice, it suffices to have a data record of adequate length to achieve satisfactory estimation variance. Therefore, all these mathematical conditions for R → ∞ need only be satisfied in practice for finite data records of sufficient length. The important result regarding the estimator (9.4) is that its variance decreases with increasing R and, therefore, the kernel expansion components in Equation (9.3) can be estimated in practice using a single input-output data record, provided the input is GWN or quasiwhite. This result creates a host of exciting prospects in actual applications to physiological systems exhibiting nonstationarities with discernible structure that can be captured by the appropriate temporal expansion basis.
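In discrete time, the estimator (9.4) for the first-order components amounts to a basis-weighted cross-correlation. A minimal sketch follows; the choice of basis, the memory length, and the simulated data are illustrative assumptions.

```python
import numpy as np

def gamma_hat_1(x, y1, basis_t, mu, P):
    """Discrete-time sketch of Eq. (9.4) for r = 1: basis-weighted
    cross-correlation of the GWN input x with the first-order output
    residual y1. basis_t: (M, N) array of beta_m(n); mu: memory length."""
    M, N = basis_t.shape
    gam = np.zeros((M, mu), dtype=complex)
    for tau in range(mu):
        prod = np.zeros(N)
        prod[tau:] = y1[tau:] * x[:N - tau]       # y1(n) x(n - tau)
        gam[:, tau] = (basis_t @ prod) / (P * N)  # weighted time average
    return gam

# Illustrative use with a short Fourier temporal basis
N, mu, P = 8192, 32, 1.0
n = np.arange(N)
basis_t = np.array([np.exp(-2j * np.pi * m * n / N) for m in range(4)])
x = np.random.default_rng(3).standard_normal(N) * np.sqrt(P)
h = np.exp(-np.arange(mu) / 5.0)                  # assumed stationary dynamics
y1 = (1 + 0.5 * np.cos(2 * np.pi * n / N)) * np.convolve(x, h)[:N]
gam = gamma_hat_1(x, y1, basis_t, mu, P)
print(np.round(np.abs(gam[:, 0]), 3))   # m = 0 and m = 1 components dominate
```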
However, in order to use the cross-correlation-based estimator (9.4), the input must be GWN (or quasiwhite) and, thus, for arbitrary input waveforms, we must resort to the Volterra kernel-expansion formulation of the problem. By substituting the kernel expansion of Equation (9.1) into the time-varying Volterra model, we have the nonstationary input-output relation

$$y(t) = k_0(t) + \int_0^\infty k_1(t;\tau)\,x(t-\tau)\,d\tau + \iint_0^\infty k_2(t;\tau_1,\tau_2)\,x(t-\tau_1)\,x(t-\tau_2)\,d\tau_1\,d\tau_2 + \cdots$$
$$= \sum_{m=1}^{M} \beta_m(t)\left\{a_m^{(0)} + \int a_m^{(1)}(\tau)\,x(t-\tau)\,d\tau + \iint a_m^{(2)}(\tau_1,\tau_2)\,x(t-\tau_1)\,x(t-\tau_2)\,d\tau_1\,d\tau_2 + \cdots\right\} \tag{9.6}$$
It is evident that each temporal expansion basis function β_m(t) is associated with a full stationary Volterra series comprising all of the mth expansion components of the time-varying Volterra kernels. In order to estimate practically all these expansion components of the Volterra kernels for arbitrary input waveforms, we must resort again to yet another kernel expansion over the lag variables (as in the stationary case of the modified Volterra model). To this purpose, we use an expansion basis {b_j(τ)} over the lag variables that is tailored to the dynamic characteristics of the system (e.g., the Laguerre basis) to further expand the time-invariant kernel components as

$$a_m^{(r)}(\tau_1, \ldots, \tau_r) = \sum_{j_1=1}^{L} \cdots \sum_{j_r=1}^{L} c_m^{(r)}(j_1, \ldots, j_r)\,b_{j_1}(\tau_1) \cdots b_{j_r}(\tau_r) \tag{9.7}$$
Subsequently, the nonstationary Volterra model of Equation (9.6) takes the modified form

$$y(t) = \sum_{m=1}^{M} \beta_m(t)\Bigg\{c_m^{(0)} + \sum_{j_1=1}^{L} c_m^{(1)}(j_1)\,v_{j_1}(t) + \sum_{j_1=1}^{L}\sum_{j_2=1}^{L} c_m^{(2)}(j_1, j_2)\,v_{j_1}(t)\,v_{j_2}(t) + \cdots + \sum_{j_1=1}^{L} \cdots \sum_{j_r=1}^{L} c_m^{(r)}(j_1, \ldots, j_r)\,v_{j_1}(t) \cdots v_{j_r}(t) + \cdots \Bigg\} \tag{9.8}$$
This modified Volterra model is linear in terms of the unknown expansion coefficients {c_m^{(r)}}, which can be estimated by least-squares regression from the discretized data {x(n), y(n)} (n = t/T), based on the output expression for the Qth-order system:

$$y(n) = \sum_{m=1}^{M} \sum_{r=0}^{Q} \sum_{j_1=1}^{L} \cdots \sum_{j_r=1}^{L} c_m^{(r)}(j_1, \ldots, j_r)\,\beta_m(n)\,v_{j_1}(n) \cdots v_{j_r}(n) \tag{9.9}$$
where n = 1, ..., N is the discrete-time index of the data, and c_m^{(0)} = a_m^{(0)} for uniformity of notation. Note that v_j(n) is computed as the discrete-time convolution between b_j and x. The number of unknown parameters in the model of Equation (9.9) is P = M(Q + L)!/(Q!L!), and N must be sufficiently larger than P in order to achieve satisfactory estimation variance. This key requirement places the onus for the successful implementation of this approach on being able to find appropriate kernel expansion bases (over t and τ) so that P ≪ N. In order to meet this key requirement, it is usually necessary to perform preliminary analysis that can guide the selection of the temporal expansion basis to minimize M. It is evident that the use of a general (complete) basis, like the Fourier basis, cannot meet this requirement because M = N in that case. If the minimization of M proves to be difficult, then the use of the modified Volterra model of Equation (9.9) becomes problematic. In that case, the cross-correlation-based approach remains a viable option (if quasiwhite inputs are available), because it does not place any formal constraints on M relative to N, although a much larger N is still desirable in order to achieve satisfactory estimation variance for the kernel expansion components of Equation (9.4). Nonetheless, it must be emphasized that the use of the estimator of Equation (9.4) can employ a general (complete) temporal expansion basis and does not rely on a priori minimization of M. In light of these observations, it is evident that a practical approach to the nonstationary modeling problem may involve two steps: first, use the cross-correlation-based method (if quasiwhite inputs are available) to explore the structure of the system nonstationarities with a complete temporal expansion basis (e.g., Fourier); and second, minimize M with appropriate selection of the system-specific functions {β_m(t)} that decompose efficiently the system nonstationary dynamics in the form of Equation (9.8), and estimate the expansion coefficients of the time-varying Volterra kernels via least-squares regression of Equation (9.9).
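The regression of Equation (9.9) can be set up directly. The sketch below builds the design matrix for a second-order (Q = 2) modified Volterra model and solves for the coefficients by least squares; the temporal basis, the exponential lag filters standing in for a Laguerre-type basis, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def build_design_matrix(x, basis_t, basis_lag):
    """Regression matrix for Eq. (9.9) truncated at second order (Q = 2):
    columns are products beta_m(n) * v_j1(n) * v_j2(n), where v_j(n) is the
    convolution of lag-basis function b_j with the input."""
    M, N = basis_t.shape
    V = np.array([np.convolve(x, b)[:len(x)] for b in basis_lag])  # v_j(n)
    L = V.shape[0]
    cols = []
    for m in range(M):
        cols.append(basis_t[m])                                # r = 0 term
        for j1 in range(L):
            cols.append(basis_t[m] * V[j1])                    # r = 1 terms
            for j2 in range(j1, L):
                cols.append(basis_t[m] * V[j1] * V[j2])        # r = 2 terms
    return np.column_stack(cols)

# Assumed bases: 3 temporal functions over the record, 4 exponential lag filters
N = 2048
n, lags = np.arange(N), np.arange(32)
basis_t = np.vstack([np.ones(N), np.cos(2*np.pi*n/N), np.sin(2*np.pi*n/N)])
basis_lag = np.array([np.exp(-lags / s) for s in (2.0, 5.0, 10.0, 20.0)])
x = np.random.default_rng(4).standard_normal(N)
y = (1 + 0.5*np.cos(2*np.pi*n/N)) * np.convolve(x, basis_lag[1])[:N]
Phi = build_design_matrix(x, basis_t, basis_lag)
c_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(Phi.shape, "-> coefficient vector of length", c_hat.size)
```

Note that the number of columns, M(1 + L + L(L + 1)/2), must remain much smaller than N, which is the requirement P ≪ N discussed above.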
A special note must be made for the case of using the complete Fourier basis {e^{jmω_0 t}} (m = 0, 1, ..., M = N/2; ω_0 = 2π/R) for the temporal expansion of the time-varying kernels, because this represents an easily implementable and general method for exploring the nonstationary characteristics of the system using the FFT. In fact, the resulting set of first-order Wiener kernel expansion components {γ_m^{(1)}} in this case is termed "the half-spectrum" and has broad utility in connection with the nonstationary problem [Marmarelis, 1981a, b, 1983, 1987b, c; Sams & Marmarelis, 1988]. It is evident that the half-spectrum components can be estimated via discrete Fourier transform (or FFT) over time of the discretized product [y(t)x(t − τ)] for every τ within the system memory, properly scaled by the input power level. In the estimation of the half-spectrum, special attention must be given to the problem of "leakage" associated with the use of the discrete Fourier transform (DFT) or FFT, for which proper corrections can be applied [Sams & Marmarelis, 1988]. Note that extrapolation of the estimated nonstationarity outside the interval of observation [0, R] is not generally possible, but it may be possible in the case of periodic nonstationarities. The selection of the "significant" components of the half-spectrum must be based on statistical hypothesis testing that utilizes the following expression for the estimation variance of each component when the input is GWN (or quasiwhite):

$$\operatorname{Var}[\hat{\gamma}_m^{(1)}(\tau)] = \frac{1}{R^2} \int_0^R dt \int_0^\mu d\lambda\; h_1^2(t; \lambda) + \frac{1}{R} \int_{-R}^{R} d\lambda \left(1 - \frac{|\lambda|}{R}\right)\left[\frac{1}{R}\int_0^R dt\; h_1(t, \tau - \lambda)\,h_1(t + \lambda, \tau + \lambda)\right] e^{-j2\pi m\lambda/R} \tag{9.10}$$
Since the variance of the half-spectrum component estimates depends on the first-order time-varying Wiener kernel h_1(t, λ), we must proceed in practice sequentially by testing first the largest (in peak absolute value) component estimate against a null hypothesis of no significant components at all. If the null hypothesis is rejected for any τ value, then this component is accepted as "significant" and we proceed with the next-largest component estimate under a null hypothesis that incorporates the previously accepted "significant" components for the evaluation of the confidence bounds [determined by the variance of Equation (9.10)]. This procedure terminates when the null hypothesis is accepted, and allows selection of the significant components of the half-spectrum estimate that, in turn, allows estimation of the first-order time-varying Wiener kernel of the nonstationary system (see example in Section 9.2.1 below). Note that the half-spectrum is complex in general. It is evident that, for a stationary system, only the zero-order component, γ_0^{(1)}(τ), will be selected. This fact suggests a rigorous test of stationarity for a given system, when a GWN (or quasiwhite) input is available. When the input is arbitrary, a test for (non)stationarity is described in Section 9.2.2. This statistical hypothesis testing procedure can be applied also to determine the nonlinear order of a (possibly) nonstationary system with GWN (or quasiwhite) input, utilizing the respective expression for the variance of the higher-order kernel components.
The nonlinear order is determined when the null hypothesis is accepted for the largest component estimate of the Qth-order kernel, using confidence bounds determined by the respective variance:
$$\operatorname{Var}[\hat{\gamma}_m^{(Q)}(\tau_1, \ldots, \tau_Q)] = \frac{1}{R^2 P^{2Q}} \iint_0^R dt\,dt'\; \beta_m(t)\,\beta_m(t')\,\phi_\varepsilon(t - t')\,\phi_{2Q}(\tau_1, \ldots, \tau_Q) \tag{9.11}$$

where φ_ε denotes the autocorrelation function of the residuals, and φ_{2Q} denotes the 2Qth-order autocorrelation function of the stationary input.

9.2.1 Illustrative Example
Computer-simulated examples (for which "ground truth" is available) are used to demonstrate and validate the presented nonstationary modeling methodology. To simplify matters in the cross-correlation-based approach, we choose initially a first-order (linear) system that is described by the linear, periodically time-varying differential equation

$$\frac{dy}{dt} + (1 + a\cos\beta t)\,y = x \tag{9.12}$$

and simulate it for a GWN input signal with unit power level [Sams & Marmarelis, 1988]. The exact time-varying kernel of this system is

$$h(t, \tau) = e^{-\tau - (2a/\beta)\sin(\beta\tau/2)\cos(\beta t - \beta\tau/2)}\,u(\tau) \tag{9.13}$$
This kernel satisfies the integrability conditions for consistency of the kernel component estimates. Note that this linear system has only this first-order kernel, which is not termed an "impulse response function" because it is not the response to an impulse input for time-varying systems [the latter being h(t, t − t_0) for an impulse applied at time t_0]. The periodic time-varying character of this system makes the Fourier temporal expansion basis suitable. The Fourier expansion terms of this kernel (over time) are nonzero at all multiples of the frequency β of the system nonstationarity. However, only a few of these terms are expected to be distinguished from the background "noise" level in the half-spectrum estimate, which is determined by the estimation variance of Equation (9.10). The time increment (sampling interval) used is 0.2 sec, and the parameters of the model of Equation (9.12) are a = 2, β = 1. The effective memory of this system is approximately 6 sec (31 lags) and the period of the nonstationarity is 2π ≈ 6.28 sec, making the two comparable in size. The simulations used 8,192 input-output data points. The lower portion (up to a frequency of 0.5 Hz) of the obtained half-spectrum magnitude is shown in Figure 9.1. This portion of the half-spectrum contains all the significant kernel component estimates, as determined by the previously outlined statistical procedure over the entire half-spectrum up to the Nyquist frequency of 2.5 Hz. As indicated in Figure 9.1, four significant components {c_0, c_1, c_2, c_3} of the kernel expansion are identified, corresponding to zero frequency and the first, second, and third harmonics of the frequency of the periodic nonstationarity (1/2π Hz). Of course, kernel expansion components exist at higher harmonics but cannot be detected due to their small size relative to the background variance that is caused by the random nature of the test input and the finite length of the data record. Corrections for the "leakage effect" were made to the half-spectrum estimates at the aforementioned three harmonics,
Figure 9.1 Estimated half-spectrum magnitude for the linear time-varying system in Equation (9.12). The four significant expansion components are indicated [Sams & Marmarelis, 1988].
and were critical in obtaining an accurate kernel estimate [Sams & Marmarelis, 1988]. The resulting time-varying kernel estimate, composed of these four components (including the respective phases), is shown in Figure 9.2 along with the exact kernel given by Equation (9.13) over the same time interval of 20 sec near the middle portion of the record. If stationary analysis were applied to this set of input-output data, the time-invariant kernel (impulse response function) shown in Figure 9.3 would be obtained. This is identical to the zero-frequency (m = 0) component of the half-spectrum. Segments of the actual system output and the model predictions obtained from stationary and nonstationary analysis for an independent GWN input are shown in Figure 9.4. This figure demonstrates the importance of employing the nonstationary (time-varying) identification method. The normalized mean-square error of the time-varying model prediction is 9.8%, compared to 62.5% if a time-invariant model is used. The accuracy of this prediction (and of the associated kernel estimate) can be improved by increasing the length of the input-output data record, owing to the consistency of the estimator (9.4). An example of the use of the modified Volterra model (where the additional expansion over the lag variable τ is applied) is given in [Marmarelis, 1981a] for a second-order system.
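This example can be reproduced in a few lines. The following sketch simulates Equation (9.12) for a GWN input and forms the half-spectrum as the FFT over t of the product y(t)x(t − τ), scaled by the input power level; the Euler integration scheme and the scaling convention are assumptions of this sketch, not taken from [Sams & Marmarelis, 1988].

```python
import numpy as np

# Simulate dy/dt + (1 + a*cos(beta*t)) y = x, Eq. (9.12), by Euler stepping
dt, N, a, beta, P = 0.2, 8192, 2.0, 1.0, 1.0
rng = np.random.default_rng(5)
x = rng.standard_normal(N) * np.sqrt(P / dt)   # sampled GWN of power level P
y = np.zeros(N)
for n in range(1, N):
    y[n] = y[n-1] + dt * (x[n-1] - (1 + a * np.cos(beta * (n-1) * dt)) * y[n-1])

# Half-spectrum: FFT over t of y(t) x(t - tau) for each lag tau,
# scaled by the input power level (scaling convention assumed)
mu = 31                                        # ~6 sec memory at dt = 0.2 sec
half = np.zeros((N // 2 + 1, mu), dtype=complex)
for tau in range(mu):
    prod = np.zeros(N)
    prod[tau:] = y[tau:] * x[:N - tau]
    half[:, tau] = np.fft.rfft(prod) / (P * N)

# Significant components are expected at 0 Hz and at harmonics of the
# nonstationarity frequency beta/(2*pi), i.e., near bins m*beta*N*dt/(2*pi)
mag = np.abs(half).max(axis=1)
print([round(m * beta * N * dt / (2 * np.pi)) for m in range(1, 4)])
print(np.sort(np.argsort(mag)[-8:]))           # dominant frequency bins
```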
9.2.2 A Test of Nonstationarity

The procedure described above for the selection of the statistically significant components of the time-varying kernel expansion, when the input is GWN (or quasiwhite), can also be used for testing the stationarity of the system. Clearly, stationarity is ascertained in the case where only the zero-order expansion component (m = 0), corresponding to a constant basis function, is selected as statistically significant for all kernels present in the system. When the input waveform is arbitrary (not GWN or quasiwhite), the aforementioned procedure is not valid, and an alternative test for nonstationarity can be based on examining the stationarity of the output signal (provided that the input signal is a stationary process). The test relies on the fact that the output signal will be a nonstationary random process if and only if the system is nonstationary (assuming stationarity of the input signal and of possible output-additive noise).
Figure 9.2 (a) Segment of the estimated time-varying kernel of the system of Equation (9.12), constructed on the basis of the four significant half-spectrum components shown in Figure 9.1. (b) The same segment of the exact time-varying kernel given by Equation (9.13) [Sams & Marmarelis, 1988].
Figure 9.3 The time-invariant first-order kernel estimate for the time-varying system of Equation (9.12), representing the zero-order component of the half-spectrum [Sams & Marmarelis, 1988].
Figure 9.4 Comparison of the actual response of the time-varying system of Equation (9.12) (trace b) with the predicted response using the estimated time-varying model (trace a) and the time-invariant model of Figure 9.3 (trace c) [Sams & Marmarelis, 1988].
Note the distinction between a system and a signal (a random process) with regard to defining stationarity. In order to test the stationarity of the output signal, we must first estimate its autocorrelation function as follows [Marmarelis, 1981b]. Consider the autocorrelation function of a nonstationary random process y(t):

$$\phi(t, \tau) = E[y(t)\,y(t - \tau)] \tag{9.14}$$

and its temporal expansion on the basis {β_m(t)} in the range [−R, +R]:

$$\phi(t, \tau) = \sum_{m=0}^{M} \psi_m(\tau)\,\beta_m(t) \tag{9.15}$$
We have shown [Marmarelis, 1981b] that the single-record estimator

$$\hat{\psi}_m(\tau) = \frac{1}{2R} \int_{-R}^{R} y(t)\,y(t - \tau)\,\beta_m(t)\,dt \tag{9.16}$$

yields unbiased and consistent estimates of the expansion components {ψ_m(τ)} under the consistency condition described below for the variance of the estimator:

$$\operatorname{Var}[\hat{\psi}_m(\tau)] = \frac{1}{4R^2} \iint_{-R}^{R} \big\{E[y(t)\,y(t - \tau)\,y(t')\,y(t' - \tau)] - \phi(t, \tau)\,\phi(t', \tau)\big\}\,\beta_m(t)\,\beta_m(t')\,dt\,dt' \tag{9.17}$$

which goes to zero as R tends to infinity. Consider the uniform bound B of the basis functions {β_m(t)} and a "ceiling" function α(λ) which satisfies the inequality
$$E[y(t)\,y(t - \tau)\,y(t')\,y(t' - \tau)] - \phi(t, \tau)\,\phi(t', \tau) \le \alpha^2(t - t') \tag{9.18}$$

for all t, t', and τ. Then, by using inequality (9.18) in Equation (9.17), we find that

$$\operatorname{Var}[\hat{\psi}_m(\tau)] \le \frac{B^2}{2R} \int_{-2R}^{2R} \left(1 - \frac{|\lambda|}{2R}\right) \alpha^2(\lambda)\,d\lambda \le \frac{B^2}{2R} \int_{-2R}^{2R} \alpha^2(\lambda)\,d\lambda \tag{9.19}$$

Therefore, the estimator is consistent if

$$\lim_{R \to \infty} \frac{1}{2R} \int_{-2R}^{2R} \alpha^2(\lambda)\,d\lambda = 0 \tag{9.20}$$

Note that α²(λ) can be seen as a ceiling function for the autocorrelation of y(t):

$$E[y^2(t)\,y^2(t - \lambda)] - M_2^2 \le \alpha^2(\lambda) \tag{9.21}$$
where M_2 is an upper bound of the second moment of y(t). Therefore, the existence of a function α²(λ) that satisfies the integrability condition (9.20) is tied to the autocorrelation pattern of y(t). We expect that in most cases of practical interest, the correlation between samples of y(t) will decline with increasing intersample distance at a sufficiently fast pace as to satisfy the integrability condition (9.20). These expressions can be further elaborated for Gaussian processes/signals [Marmarelis, 1983]. This rationale and the resulting estimator can be extended to single-record estimation of high-order auto- and cross-correlation functions of nonstationary processes. However, it should be noted that the obtained expansion of a nonstationary autocorrelation function is valid only within the time range [−R, +R], and extrapolation outside this range must be judged on the characteristics of each particular case at hand. Having estimated the autocorrelation function of the output signal, we can now apply a statistical test on the estimated expansion components {ψ_m(τ)} in sequential order according to their peak absolute value, as in the case of the kernel expansion discussed previously, except for the fact that the largest component is now accepted untested (since this is an autocorrelation function and at least one component must be nonzero). For the second-largest component, the confidence bounds of the null hypothesis are based on the variance expression of Equation (9.17), where the autocorrelation function φ(t, τ) is constructed from the accepted expansion components up to this point. The procedure continues until the null hypothesis is accepted for some component (ranked according to peak absolute value). Clearly, if any component for m ≠ 0 is selected, then the output signal is nonstationary and the system is also deemed nonstationary under the previously stated assumptions of input and noise stationarity. This test of nonstationarity is rigorous and encompasses all the possible kernels in the system. It can be applied for arbitrary (natural) inputs under the stated weak assumptions. If the latter assumptions are not valid, then quasistationary estimation of Volterra models (over successive time segments) can be used to examine the significance of possible changes in the estimated kernels over time.
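A minimal discrete-time sketch of the estimator (9.16) follows; the Fourier temporal basis, the maximum lag, and the variance-modulated test signal are illustrative assumptions.

```python
import numpy as np

def psi_hat(y, basis_t, mu):
    """Single-record sketch of Eq. (9.16): expansion components of the
    (possibly) nonstationary output autocorrelation on the temporal basis.
    basis_t: (M, N) array of beta_m(t) over the record; mu: maximum lag."""
    M, N = basis_t.shape
    psi = np.zeros((M, mu), dtype=complex)
    for tau in range(mu):
        prod = np.zeros(N)
        prod[tau:] = y[tau:] * y[:N - tau]          # y(t) y(t - tau)
        psi[:, tau] = (basis_t @ prod) / N          # time average over record
    return psi

# Sketch of the test: a signal with cyclically modulated variance should
# show a significant nonzero-order component at the modulation frequency
N, mu = 8192, 20
n = np.arange(N)
basis_t = np.array([np.exp(-2j * np.pi * m * n / N) for m in range(5)])
rng = np.random.default_rng(6)
y = (1 + 0.6 * np.cos(2 * np.pi * 4 * n / N)) * rng.standard_normal(N)
psi = psi_hat(y, basis_t, mu)
print(np.round(np.abs(psi[:, 0]), 3))   # m = 4 stands out; m = 0 is the mean term
```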
9.2.3 Linear Time-Varying Systems with Arbitrary Inputs
The case of linear time-varying systems deserves separate elaboration because it offers a relatively simple tool for nonstationary analysis of "linearized" systems with arbitrary input waveforms and allows us to outline the analytical approach of parametric realization of time-varying models in the form of difference equations with time-varying coefficients. The latter can be useful in applications such as speech processing or analysis of physiological nonstationary signals (e.g., cardiopulmonary data, EEG, ECG, etc.). The general input-output relation in the linear case is given by
$$y(t) = \int_0^\infty h(t, \tau)\,x(t - \tau)\,d\tau \tag{9.22}$$
where the time-varying kernel (impulse response function) can be expanded on the temporal basis {β_m(t)} as

$$h(t, \tau) = \sum_m a_m(\tau)\,\beta_m(t) \tag{9.23}$$
This time-varying kernel can be estimated from arbitrary (but broadband) input-output data as the two-dimensional inverse Fourier transform (2D-IFT):

$$h(t, \tau) = \text{2D-IFT}\{\cdots\} \tag{9.24}$$

For a parametric realization of this time-varying model in the form of a difference equation with time-varying coefficients,

$$c_i(n)\,y(n - i) + \cdots + c_1(n)\,y(n - 1) + c_0(n)\,y(n) = x(n) \tag{9.25}$$
we can substitute the output expression

$$y(n) = \sum_m \sum_l a_m(l)\,\beta_m(n)\,x(n - l) \tag{9.26}$$
and, using the discrete exponential input x(n) = z^n, we obtain after summation over l:

$$\sum_m A_m(z)\left[c_i(n)\,\beta_m(n - i)\,z^{-i} + \cdots + c_1(n)\,\beta_m(n - 1)\,z^{-1} + c_0(n)\,\beta_m(n)\right] = 1 \tag{9.27}$$

where A_m(z) denotes the z-transform of the expansion component a_m(l).
If the discrete Fourier set, β_m(n) = exp[−jmnω_0], is used for the temporal expansion basis, then Equation (9.27) becomes, after summation over n,

$$\sum_m A_m(z)\,\Gamma_m(z) = 1 \tag{9.28}$$

where

$$\Gamma_m(z) = \gamma_{i,m}\,e^{jmi\omega_0}\,z^{-i} + \cdots + \gamma_{1,m}\,e^{jm\omega_0}\,z^{-1} + \gamma_{0,m} \tag{9.29}$$
and γ_{j,m} denotes the mth Fourier coefficient of c_j(n). Since A_m(z) can be estimated from input-output data (as described above), Equation (9.28) implies that the functions Γ_m(z) can be determined within a scalar as

$$\Gamma_m(z) = \text{const}/A_m(z) \tag{9.30}$$
Knowledge of the functions Γ_m(z) allows estimation of the coefficients {γ_{j,m}}, from which the time-varying coefficients {c_j(n)} (j = 0, ..., i) of Equation (9.25) can be reconstructed, yielding the desired difference-equation model. Implementation of this procedure is clearly a nontrivial task, but its far-reaching importance and unique capability justify the effort in applications where parametric time-varying models are useful.
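Estimation of the expansion components a_m(l) themselves is a linear problem. The sketch below (an illustration under assumed bases and simulated data, offered as an alternative to the 2D-IFT route of Equation (9.24)) regresses the output on the products β_m(n)x(n − l) of Equation (9.26).

```python
import numpy as np

def estimate_tv_kernel(x, y, basis_t, mu):
    """Least-squares estimate of the expansion components a_m(l) in
    Eq. (9.26), y(n) = sum_m sum_l a_m(l) beta_m(n) x(n - l), by regressing
    y on the products beta_m(n) x(n - l); returns an (M, mu) array."""
    M, N = basis_t.shape
    Xlag = np.zeros((N, mu))
    for l in range(mu):
        Xlag[l:, l] = x[:N - l]                         # x(n - l)
    Phi = np.hstack([basis_t[m][:, None] * Xlag for m in range(M)])
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return a.reshape(M, mu)

# Hypothetical data from a sinusoidally modulated FIR kernel (assumed form)
N, mu = 4096, 16
n = np.arange(N)
basis_t = np.vstack([np.ones(N), np.cos(2 * np.pi * 3 * n / N)])
h0 = np.exp(-np.arange(mu) / 4.0)
x = np.random.default_rng(7).standard_normal(N)
y = np.convolve(x, h0)[:N] * (1 + 0.5 * np.cos(2 * np.pi * 3 * n / N))
A = estimate_tv_kernel(x, y, basis_t, mu)
print(np.round(A[:, :4], 2))   # rows ~ h0 and 0.5*h0, the two components
```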
9.3 NETWORK-BASED METHODS
It is evident from Equation (9.6) that the general nonstationary nonlinear system (satisfying the weak requirements of the Volterra class) can be modeled as a parallel set of stationary Volterra modules {Vm } whose outputs are multiplied (modulated) by selected basis functions {ßm(t)} before summing them to form the system output, as depicted in Figure 9.5.
Figure 9.5 Time-varying Volterra-equivalent network model.
Each of these modules can be represented in the modified Volterra model form using an additional kernel expansion over the lag variables, as shown in Equation (9.8), or it can be represented by a Volterra-equivalent network in order to achieve greater parsimony. The compactness benefits (i.e., fewer free parameters in the model) are significant in actual applications, especially when the input waveform is arbitrary (but sufficiently broadband). This is the rationale for the introduction of the network-based method, which has been shown to be effective under stringent operating conditions where other alternative methods fail [Iatrou et al., 1999a]. Since Volterra-equivalent networks (VEN) were discussed extensively in Chapters 4 and 6, we will not elaborate on them here any further. Any of the VEN forms discussed earlier can be used for representation of the modules {V_m}, depending on the characteristics of each case. However, we must emphasize that the selection of the "modulating" functions at the outputs of the V_m modules is critical for the successful application of this approach. Although these modulating functions (MFs) correspond mathematically to the basis functions {β_m(t)} of the temporal expansion, their number has to be constrained in practice in order to achieve satisfactory convergence of the training of the network-based model within the customary constraints of applications. This selection is guided, of course, by our preexisting knowledge of the nature of the system nonstationarity, which generally falls within three categories: (1) periodic or cyclical; (2) asymptotic transient; (3) transient trend. A finite number of MFs with a small number of unknown parameters must be selected for each category. For the first category (periodic or cyclical), we typically select a small number of sinusoidal MFs of unknown frequency and phase (to be determined by the data through the iterative training procedure). For the second category (asymptotic transient), we typically select a small number of exponential functions of unknown time constants, or sigmoidal functions with unknown slope and offset parameters (to be determined by the training procedure), possibly combined with polynomials. For the third category (transient trend), we may select polynomial functions or any other function deemed appropriate with a small number of free parameters (to be determined by the training procedure). It was found empirically that the inclusion of MFs in the VEN model architecture often makes the convergence of the training algorithm more challenging, necessitating the use of specialized algorithms for faster convergence (e.g., the delta-bar-delta rule) [Haykin, 1994; Iatrou et al., 1999a, b]. Illustrative examples from simulated systems are given below, and from actual physiological data in Section 9.4.

9.3.1 Illustrative Examples
To validate and demonstrate the efficacy of the network-based method, we use computer-simulated examples of time-varying network (TVN) models where the Volterra modules {V_m} in Figure 9.5 (also called "subnets") are simply L_m-N_m cascades, and the MFs are sinusoidal, exponential, sigmoidal, and polynomial functions. The filters L_m in these cascades can be viewed as "principal dynamic modes" in the context of nonstationary modeling. In all examples, the zero-order module/subnet corresponds to β_0 = 1 (viewed as the stationary component of the system). Broadband random (both white and nonwhite) and deterministic (chirp) input signals of 1024 data points, for systems with memory-bandwidth product of 16, were used successfully, and convergence was achieved generally after fewer than 1000 iterations of training.
The results were excellent in all cases, especially for exponential (transient) and sinusoidal (periodic) nonstationarities, where the convergence was fast even for relatively short data records. For polynomial nonstationarities, convergence of the training algorithm required longer data records, especially for high-degree coefficients. Sums of sinusoids and combinations of exponential, sinusoidal, and polynomial functions were also tested successfully. This relaxes the specification requirements for model nonstationarities (i.e., the form of the MFs) in practice. For instance, if the system nonstationarity is periodic, sin(t), and the model nonstationarity is specified in the more general form exp(−bt)sin(ωt), then the training algorithm will yield the estimates b = 0, ω = 1. The method was shown to be robust in the presence of output-additive noise. Illustrative examples are given in Figures 9.6-9.8 for transient and periodic nonstationarities. The robustness of this approach is illustrated in Figure 9.9, where the model prediction is shown to be practically impervious to output-additive noise for SNR = 0 dB. More elaboration on these examples can be found in Iatrou et al. (1999a).
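The forward computation of such a TVN model is straightforward. The following sketch (with assumed filters, polynomial nonlinearities, and MFs; the training loop itself is omitted) evaluates the architecture of Figure 9.5 with L-N cascade subnets.

```python
import numpy as np

def tvn_output(x, subnets, mfs):
    """Forward pass of the TVN model of Figure 9.5: each stationary subnet
    is an L-N cascade (linear filter followed by a static polynomial), and
    its output is multiplied by a modulating function of time before the
    branch outputs are summed."""
    N = len(x)
    y = np.zeros(N)
    for (h, poly), mf in zip(subnets, mfs):
        u = np.convolve(x, h)[:N]            # linear filter L_m
        z = np.polyval(poly, u)              # static nonlinearity N_m
        y += mf * z                          # modulation by beta_m(n)
    return y

# Hypothetical two-subnet model: a stationary branch (MF = 1) and a branch
# with exponential MF exp(-0.005 n), as in the example of Figure 9.6
N = 1024
n = np.arange(N)
h0 = np.exp(-np.arange(16) / 3.0)
h1 = np.exp(-np.arange(16) / 6.0) * np.cos(np.arange(16) / 2.0)
subnets = [(h0, [0.2, 1.0, 0.0]), (h1, [0.0, 1.0, 0.0])]  # poly coeffs, highest first
mfs = [np.ones(N), np.exp(-0.005 * n)]
x = np.random.default_rng(8).standard_normal(N)
y = tvn_output(x, subnets, mfs)
print(y[:5])
```

In training, the MF parameters (e.g., the exponent of an exponential MF) would be estimated jointly with the cascade parameters by iterative gradient descent, for example with the delta-bar-delta rule mentioned above.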
Figure 9.6 Results for a simulated TVN model (see Fig. 9.5) with one stationary and one nonstationary subnet with exponential MF: β1(n) = exp(-0.005n); panels (a) and (b) are the actual and estimated (+) impulse responses for filters L0 and L1, respectively; (c) convergence pattern of the MF exponent; (d) convergence pattern of the output root-mean-square error [Iatrou et al., 1999a].
Figure 9.7 Results for a simulated TVN model with one stationary and one nonstationary subnet with sigmoidal MF: β1(n) = 1/{1 + exp[-0.015(n - 580)]}; panels (a) and (b) are the actual and estimated (+) impulse responses for filters L0 and L1, respectively; (c) convergence pattern of the MF slope; (d) convergence pattern of the MF offset [Iatrou et al., 1999a].
Figure 9.8 Results for a simulated TVN model with one stationary and one nonstationary subnet with sinusoidal MF: β1(n) = sin(0.64n); panels (a) and (b) are the actual and estimated (+) impulse responses for filters L0 and L1, respectively; (c) convergence pattern for the frequency; (d) convergence pattern for the phase [Iatrou et al., 1999a].
Figure 9.9 TVN model prediction and system output for noise-free and noisy (SNR = 0 dB) cases. The estimated output (model prediction) is obtained from the TVN trained with the noisy output data and appears to be a very good approximation of the noise-free output [Iatrou et al., 1999a].
9.4 APPLICATIONS TO NONSTATIONARY PHYSIOLOGICAL SYSTEMS
The three classes of methodologies presented in the previous sections have found some initial applications in the study of nonstationary physiological systems, but surely not as many as the importance of the problem warrants. We expect, however, that increasing attention will be given to the nonstationary aspects of physiological function in the near future, as investigators realize more and more their omnipresence and significance. The presented methodologies can be important tools in pursuing this purpose, and their performance is illustrated below with initial applications. The most commonly used approach is the quasistationary method, for which we show an example taken from a study of cerebral autoregulation (elaborated in Section 6.2 for the stationary analysis). The Volterra model of the nonlinear dynamic relation between mean arterial blood pressure (input) and mean cerebral blood flow velocity (output) is estimated from a 6-min sliding window of data (having a 4-min overlap with the previous window) using fixed structural parameters for a Laguerre-Volterra network (LVN) model in each data window. The FFT magnitude of the estimated first-order Volterra kernel (i.e., the first-order frequency response function) is plotted in Figure 9.10 for successive 6-min segments over a two-hour period for two different subjects [Mitsis et al., 2002]. The nonstationarity of this kernel is evident, and it is more pronounced in the frequency range below 0.08 Hz (where autoregulation is deemed to be more active). This nonstationarity is even more pronounced for the second-order kernel estimates (not shown here in the interest of space). No apparent temporal pattern is evident in the observed nonstationarities, although the matter deserves more study [Mitsis et al., 2002].

Figure 9.10 The first-order frequency response functions tracked over 2 hours of data (6-min sliding data segments with 4-min overlap) for two subjects. The nonstationarity is evident and has a random appearance [Mitsis et al., 2002].
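In outline, the quasistationary procedure re-estimates a stationary model on each sliding window; the sketch below illustrates the windowing logic (Python; the 1-Hz sampling assumption and the simple least-squares FIR fit standing in for the LVN estimator are ours, not from the cited study):

```python
import numpy as np

FS = 1                 # assumed sampling rate [Hz] of the mean pressure/flow series
WIN = 6 * 60 * FS      # 6-min window length in samples
STEP = 2 * 60 * FS     # 2-min hop, i.e., 4-min overlap with the previous window

def estimate_first_order_kernel(x_seg, y_seg, M=32):
    """Stand-in for fitting the LVN model with fixed structural parameters;
    a plain least-squares FIR fit is used here for illustration only."""
    X = np.column_stack([np.roll(x_seg, m) for m in range(M)])
    return np.linalg.lstsq(X, y_seg, rcond=None)[0]

def track_first_order_frf(x, y, nfft=1024):
    """FFT magnitude of the first-order kernel (the first-order frequency
    response function) tracked over successive overlapping windows."""
    frfs = []
    for start in range(0, len(x) - WIN + 1, STEP):
        k1 = estimate_first_order_kernel(x[start:start + WIN],
                                         y[start:start + WIN])
        frfs.append(np.abs(np.fft.rfft(k1, nfft)))
    return np.array(frfs)   # rows: windows; columns: frequency bins
```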
An example of the temporal expansion method is taken from a study of respiratory mechanics [Marmarelis, 1987c]. The obtained half-spectrum and the reconstructed time-varying first-order kernel are shown in Figure 9.11 (the input is forced broadband tracheal pressure and the output is the resulting tracheal airflow). The half-spectrum exhibits three significant nonstationary kernel components that are clustered around the mean breathing frequency, as expected, since the mechanical properties of the lungs (resistance and compliance) change as a function of the phase of the breathing cycle (e.g., minimum compliance at the end of the inspiratory phase, and maximum at the end of the expiratory phase). Thus, the half-spectrum offers a quantitative measure of these phasic changes and can be used for improved diagnosis of obstructive pulmonary disease or to optimize artificial respiration (among other possible clinical uses). The significance of these cyclical nonstationary components belies the scientific or clinical relevance of the simplified notion of stationary analysis commonly used.

Figure 9.11 Estimated half-spectrum magnitude (top) and time-varying kernel (bottom) of respiratory mechanics. The cluster of three significant nonstationary components is marked by "c" in the half-spectrum and lies in the neighborhood of the breathing frequency [Marmarelis, 1987c].

The network-based method is illustrated with an application taken from a study of potentiation in the hippocampus, whereby the perforant path is stimulated with a Poisson sequence of impulses (mean rate of five impulses per second) and the population field potential in the dendritic layer of the dentate gyrus is recorded as the output (see also Section 8.3.3). The relatively high mean rate of Poisson stimulation induces potentiation, which is modeled as an asymptotic transient nonstationarity using sigmoidal modulating functions (MFs) in a second-order TVN with three modules/subnets (one stationary and two sigmoidal nonstationary) [Iatrou et al., 1999b]. All subnets had a single hidden unit with second-degree polynomial activation functions; thus, each has one "principal dynamic mode" (PDM). The changes in responses to impulsive stimulation are illustrated in Figure
9.12 for five segments of data (each segment is about 3.5 sec) spanning the course of the potentiation process. The obtained TVN model gave excellent prediction accuracy (4.8% normalized root-mean-square error) and its two MFs and three PDMs (corresponding to the three subnets) are shown in Figures 9.13 and 9.14. The first MF "switches on" at the beginning of the third segment and lasts for almost 5000 sample points (or 1.5 sec), and the second MF "switches off" at the end of the second segment and lasts for almost 2500 sample points (or 750 msec) entering the third segment.
Figure 9.12 The model-based response to an impulse for the first (a), second (b), third (c), fourth (d), and fifth (e) segments from the hippocampal data during potentiation. The time-dependent changes due to potentiation are evident [Iatrou et al., 1999b].
Figure 9.13 The three estimated PDMs of the trained TVN model of hippocampal potentiation with one stationary subnet (a) and two nonstationary subnets with modulating functions β1(n) = 1/{1 + exp[-b1(n - q1)]} and β2(n) = 1/{1 + exp[-b2(n - q2)]}, shown in panels (b) and (c), respectively [Iatrou et al., 1999b].
Figure 9.14 The estimated sigmoidal MFs for the hippocampal potentiation model corresponding to the PDMs of panels (b) and (c) of Figure 9.13 [Iatrou et al., 1999b].
Thus, the first nonstationary path is "off" during the first two segments and "on" during the last three segments (the transitional area extends over the first half of the third segment). The second nonstationary path is "on" during the first two segments and "off" during the last three segments (the transitional area extends briefly over the end of the second segment and the beginning of the third segment). The model-predicted output is almost identical to the actual output for the last three segments and a very good match over the first two segments. This result demonstrates the ability of the network-based methodology to model an important class of nonstationary systems with "on" and "off" switches that capture different states that the system may assume over time. Each switch is represented by a different subnet with a sigmoidal MF introduced in the structure of the TVN model. The parameters of the sigmoidal MFs (slope and offset) reflect the dynamics of the molecular and cellular processes underlying the transition from the low-excitability state to the high-excitability state (e.g., ligand-dependent conductances). Effects dependent on voltage-gated or ligand-gated channels will be reflected in the estimated PDMs and their corresponding MFs modeling state transitions. This offers the exciting prospect of rigorous hypothesis testing regarding these "switching" nonstationary phenomena. Another interesting application of nonstationary modeling of neural systems, using a method based on ensemble averaging, has been presented by Krieger et al. (1992).
10 Modeling of Closed-Loop Systems
The study of closed-loop systems is fundamental in physiology because of the importance of homeostatic and autoregulatory mechanisms that maintain the "normal" operation of living systems. Normal operating conditions must be viewed in a dynamic context, since everything is in a state of perpetual change within living systems, as well as in the surrounding environment that exerts multiple and ever-changing influences/stimulations on each living system. Furthermore, the intertwined operation of multiple closed-loop mechanisms often presents the daunting challenge of a "nested-loop" system. The importance of understanding closed-loop or nested-loop operation in physiology is rivaled by its complexity, especially if we seek a quantitative understanding that is offered by mathematical modeling. One of the key issues in the study of closed-loop systems is the "circular causality" implicit in these systems, which prevents the ordinary assumptions of causality underlying existing modeling methodologies. Various ideas have been proposed and tested to overcome this problem, typically attempting to "open the loop" experimentally or methodologically. In the approach advocated herein, the closed-loop or nested-loop system will be modeled as such, without any attempt to "open the loop." The resulting model will capture the nonlinear dynamic interrelationships among all variables of interest (for which time-series data measurements are available) regardless of the complexity of the interconnections. The successful application of this approach can have enormous potential implications for the proper study of highly interconnected physiological systems that perform homeostatic tasks under natural operating conditions (spontaneous activity). It is critical to note that the advocated approach is cast in the framework of natural operation and is not subject to experimental or methodological constraints that may distort the obtained understanding of physiological function. It is obvious that the advocated closed-loop modeling approach represents a "paradigm shift" and can have immense impact on the development of a "new systems physiology" with a subsequent quantum advance in clinical practice. It is also obvious that the
complexity of this subject matter deserves a lengthy treatment that is not practically possible within the confines of this monograph. Therefore, this chapter will only attempt to introduce the basic concepts and broad methodological approach, deferring the detailed elaboration and the necessary demonstrations (unavailable at present) to a future monograph dedicated solely to this subject matter.
10.1 AUTOREGRESSIVE FORM OF CLOSED-LOOP MODEL
The simplest formulation of the closed-loop model takes the form of a nonlinear autoregressive (NAR) model, incorporating all measured in-the-loop variables in a generalized NARMAX model form. The mathematical formulation of this problem for two "in-the-loop" variables in the discrete-time Volterra context is

$$y(n) = g_0 + \sum_{r=1}^{R} g_1(r)\, y(n-r) + \sum_{r_1=1}^{R} \sum_{r_2=1}^{R} g_2(r_1, r_2)\, y(n-r_1)\, y(n-r_2) + \cdots$$
$$+\; k_0 + \sum_{m=0}^{M} k_1(m)\, x(n-m) + \sum_{m_1=0}^{M} \sum_{m_2=0}^{M} k_2(m_1, m_2)\, x(n-m_1)\, x(n-m_2) + \cdots + e(n) \quad (10.1)$$
where y(n) and x(n) are the two in-the-loop variables, e(n) is the noise/interference term, and {g_i} and {k_i} are the respective sets of Volterra kernels for this autoregressive formulation. This formulation can also be viewed as a nonlinear feedback model:

$$y(n) = G[y(n-1), \ldots, y(n-R)] + K[x(n), \ldots, x(n-M)] + e(n) \quad (10.2)$$
where G denotes the "feedback" (autoregressive) Volterra operator and K denotes the "forward" Volterra operator. A more general formulation of the closed-loop model is obtained when y(n) is moved to the right-hand side of Equation (10.2) and is combined with G into a single Volterra operator:

$$F[y(n), \ldots, y(n-R)] = G[y(n-1), \ldots, y(n-R)] - y(n) \quad (10.3)$$
so that the closed-loop model takes the form

$$F[y(n), \ldots, y(n-R)] + K[x(n), \ldots, x(n-M)] + e(n) = 0 \quad (10.4)$$
which can be considerably more general than Equation (10.2) [when y(n) enters into F in more complicated ways than as a subtractive term] and refers to the modular model form shown in Figure 10.1 that has the direct conceptual interpretation of a nonlinear dynamic closed-loop system with external perturbations/disturbances. Note that in Figure 10.1, an additional "disturbance term" ξ(n) has been included for greater generality. The operational importance of the "disturbance terms" e(n) and ξ(n) is critical because they are the "driving" processes, akin to the "innovation" processes used in autoregressive modeling. They represent the interaction of the closed-loop system with its environment and obviously attain critical importance in the context of human physiology. Elaboration on the role and interpretation of these "disturbance terms" is deferred to a future monograph, because of the aforementioned constraints of space.
Figure 10.1 The general closed-loop modular model form, for which we must estimate the Volterra operators K and F so that the disturbances e(n) and ξ(n) satisfy certain conditions.
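To make Equation (10.1) concrete, the following sketch evaluates a second-order instance of the autoregressive closed-loop predictor (Python; the kernel values are arbitrary placeholders and the disturbance term e(n) is omitted):

```python
import numpy as np

def nar_volterra_predict(x, g0, g1, g2, k0, k1, k2):
    """Evaluate the second-order NAR(MAX) Volterra model of Eq. (10.1):
    autoregressive kernels {g} act on past y, forward kernels {k} act on x."""
    R, M = len(g1), len(k1) - 1
    y = np.zeros(len(x))
    for n in range(len(x)):
        yp = np.array([y[n - r] if n - r >= 0 else 0.0 for r in range(1, R + 1)])
        xp = np.array([x[n - m] if n - m >= 0 else 0.0 for m in range(M + 1)])
        y[n] = (g0 + g1 @ yp + yp @ g2 @ yp          # feedback (autoregressive) part
                + k0 + k1 @ xp + xp @ k2 @ xp)       # forward part
    return y

# Illustrative small kernels (R = 2 feedback lags, M = 2 input lags):
g1 = np.array([0.3, -0.1]); g2 = 0.01 * np.eye(2)
k1 = np.array([1.0, 0.5, 0.2]); k2 = 0.02 * np.eye(3)
y = nar_volterra_predict(np.random.randn(512), 0.0, g1, g2, 0.0, k1, k2)
```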
10.2 NETWORK MODEL FORM OF CLOSED-LOOP SYSTEMS
The nonlinear autoregressive model of Equation (10.1) can also be cast in the form of the Volterra-equivalent network shown in Figure 10.2 (without the recursive connection). Activation of the recursive connection, shown with a dashed line in Figure 10.2, leads to the bivariate closed-loop network model of Figure 10.3, where the "interaction layer" is now termed the "fusion layer" because it generates outputs {ψ_i(n)} that sum with an offset ψ_0 to form the residual term e(n).
Figure 10.2 Volterra-equivalent network with an autoregressive component (right module). When the recursive connection (shown with dashed line) is activated, we transition to the "equilibrating" network architecture of Figure 10.3.
Figure 10.3 The "equilibrating" network architecture involving two variables x(n) and y(n) in a closed loop (physiologically, not schematically). The closed-loop interrelationship is represented by the generation of the "balancing states" {ψ_i(n)} by the "fusion layer." The equilibration residual (anisorropy) e(n) must satisfy certain conditions vis-à-vis the data x(n) and y(n) (see text).
The latter can be viewed as an equilibration variable (also termed "anisorropy," which means in Greek "lack of equilibrium") and must satisfy some key conditions with regard to the in-the-loop variables x(n) and y(n). The particular form of these conditions between e(n) and {x(n), y(n)} is the key to the successful application of this approach, since satisfaction of these conditions will be used for training this network model. It must be emphasized that mean-square minimization is not necessarily a meaningful criterion for e(n) in this case. We may require instead that e(n) have maximum "cross-entropy" with x(n) and y(n) (an "information-theoretic" criterion), or have minimum "projection" on them according to some norm (other than the Euclidean norm). This subject deserves and shall receive the proper attention in the near future. In the multivariate case, the closed-loop network model is shown in Figure 10.4 and exhibits the virtues of scalability that were discussed in connection with the multi-input models in Chapter 7. It is evident that this model, also termed the "multivariate homeodynamic model," can be potentially useful in modeling the full complexity of physiological systems, viewed as practically intractable heretofore. As an example of a training criterion for the multivariate case, we may consider the minimization of the pth-order "dependence measure"
$$D_{i,p} = \sum_{m=0}^{M} \left\{ \frac{1}{N} \sum_{n=1}^{N} x_i^p(n-m)\, e^p(n) - \left[ \frac{1}{N} \sum_{n=1}^{N} x_i^p(n-m) \right] \left[ \frac{1}{N} \sum_{n=1}^{N} e^p(n) \right] \right\}^{2/p} \quad (10.5)$$
Figure 10.4 Multivariate nonlinear dynamic closed-loop network model of M interconnected variables, also termed the multivariate "homeodynamic" model.
for each variable x_i, and for one or more integer values of p. Also, "classic" Euclidean projections can be used in minimizing the aggregate quantity

$$\sum_{i} \frac{\langle x_i, e \rangle^2}{\langle x_i, x_i \rangle} \quad (10.6)$$
where ⟨·, ·⟩ denotes the inner product (see Appendix I). Obviously, many different possibilities exist for training the multivariate "homeodynamic" network, which must be evaluated in the future in the presented methodological context. It is critical to note that minimization of the mean-square value of the anisorropy e(n) is not a sensible criterion in this formulation of the modeling problem; rather, we must seek to minimize metrics that quantify the "mutual information" or "mutual interdependence" between the anisorropy and the variables of interest.
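As a concrete illustration of these candidate training criteria, the sketch below computes the pth-order dependence measure of Equation (10.5) and the aggregate projection quantity of Equation (10.6) for given records (it follows the reconstructed equation forms above; the circular-shift handling of lags and the absolute value guarding the fractional power are implementation choices of ours):

```python
import numpy as np

def dependence_measure(x_i, e, M, p=2):
    """pth-order dependence measure of Eq. (10.5) between the lagged
    variable x_i and the equilibration residual e."""
    ep = e ** p
    total = 0.0
    for m in range(M + 1):
        xp = np.roll(x_i, m) ** p    # x_i(n - m); circular shift for simplicity
        cov = np.mean(xp * ep) - np.mean(xp) * np.mean(ep)
        total += np.abs(cov) ** (2.0 / p)
    return total

def projection_criterion(X, e):
    """Aggregate normalized Euclidean projections of Eq. (10.6);
    X holds one variable record per row."""
    return sum((x @ e) ** 2 / (x @ x) for x in X)
```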
APPENDIX I: Function Expansions
Consider a set of M square-integrable functions {b_m(t)} (m = 1, ..., M), defined in the time interval [A, B], that have nonzero Euclidean norm:

$$\|b_m(t)\|^2 \triangleq \int_A^B b_m^2(t)\, dt > 0 \quad (A1.1)$$
These functions form a basis if and only if they are linearly independent; that is, none of them can be expressed as a linear combination of the others. A necessary and sufficient condition for linear independence is that the Gram matrix defined by the inner products of these functions {⟨b_i, b_j⟩} be nonsingular (i, j = 1, ..., M). The definition of the inner product is

$$\langle b_i, b_j \rangle \triangleq \int_A^B b_i(t)\, b_j(t)\, dt \quad (A1.2)$$
from which we see that ⟨b_i, b_i⟩ = ‖b_i‖². The basis {b_m} defines a "Hilbert space" over t ∈ [A, B] and can be viewed as a coordinate system of functions, in the sense that any function of this Hilbert space can be expressed as a linear combination of the basis functions. The key operational concept of function expansions is that any square-integrable function f(t), t ∈ [A, B] (not necessarily from the Hilbert space defined by the basis {b_m}) can be approximated by a linear combination of the basis functions as

$$\hat{f}(t) = \sum_{m=1}^{M} a_m b_m(t) \quad (A1.3)$$
in the sense that the Euclidean norm of the difference [f(t) − f̂(t)] is minimized. The latter represents the "mean-square approximation error" that underlies most methods of
function/signal approximation (or estimation in a Gaussian statistical context using maximum-likelihood methods). The expansion coefficients {a_m} in Equation (A1.3) can be determined by minimization of the mean-square error:

$$\|f(t) - \hat{f}(t)\|^2 \triangleq \int_A^B \left[ f(t) - \sum_{m=1}^{M} a_m b_m(t) \right]^2 dt \quad (A1.4)$$

yielding the canonical equation

$$\mathbf{G}\,\mathbf{a} = \mathbf{c} \quad (A1.5)$$
where G is the M × M Gram matrix {⟨b_i, b_j⟩} (i, j = 1, ..., M) defined by all inner products of the basis functions, a is the vector of the unknown expansion coefficients {a_m}, and c is the vector of the inner products {⟨f, b_m⟩} (m = 1, ..., M). The canonical equation (A1.5) yields the expansion coefficients upon inversion of the Gram matrix:

$$\mathbf{a} = \mathbf{G}^{-1}\mathbf{c} \quad (A1.6)$$
because the Gram matrix is nonsingular by definition (linear independence of the basis functions). The solution of Equation (A1.6) is facilitated computationally when the Gram matrix is diagonal, which corresponds to the case of an "orthogonal basis": ⟨b_i, b_j⟩ = 0 for i ≠ j. This motivates the search for orthogonal bases in a practical context. An orthogonal basis {β_m(t)} can always be constructed from a nonorthogonal basis {b_m(t)} (i.e., spanning the same Hilbert space) using the Gram-Schmidt orthogonalization procedure. For an orthogonal basis {β_m(t)}, the expansion coefficients are given by

$$a_m = \frac{\langle f, \beta_m \rangle}{\langle \beta_m, \beta_m \rangle} \quad (A1.7)$$
Furthermore, the orthogonal basis can be normalized to unity Euclidean norm, ‖β_m‖ = 1, for every m = 1, ..., M. This results in an "orthonormal basis" {β_m(t)} for which the expansion coefficients are given by

$$a_m = \langle f, \beta_m \rangle \quad (A1.8)$$
The inner product operation of Equation (A1.8) for an orthonormal basis can also be viewed as a correlation of f(t) with orthonormal regression variables {β_m(t)}. The mean-square approximation error ε_M for an orthonormal basis can be expressed as

$$\varepsilon_M \triangleq \|f - \hat{f}\|^2 = \|f\|^2 - \sum_{m=1}^{M} a_m^2 \quad (A1.9)$$
When this error tends to zero, the basis is called "complete." Completeness applies, of course, to nonorthogonal bases as well. When a basis is complete for a given Hilbert space of functions, then any function of this space can be represented precisely as a linear combination of the basis functions.
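Numerically, Equations (A1.5)-(A1.6) amount to assembling the Gram matrix and the projection vector by quadrature and solving the linear system; a minimal sketch follows (Python; the Riemann-sum discretization of the inner products is our simplification):

```python
import numpy as np

def expansion_coefficients(f, basis, A, B, num=10000):
    """Solve G a = c (Eqs. A1.5-A1.6) for the mean-square expansion
    coefficients of f over a (possibly nonorthogonal) basis on [A, B]."""
    t = np.linspace(A, B, num)
    dt = t[1] - t[0]
    Bmat = np.array([b(t) for b in basis])   # basis samples, one row per function
    G = Bmat @ Bmat.T * dt                   # Gram matrix of inner products
    c = Bmat @ f(t) * dt                     # projections <f, b_m>
    return np.linalg.solve(G, c)
```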
It is critical to note that the mean-square approximation (and the respective expansion coefficients) resulting from the solution of the canonical equation (A1.6) depends on the interval of expansion [A, B] over which the basis is defined. Thus, when the latter is changed (as in the example below), the expansion coefficients and the approximation error also change in general. This is akin to the dependence of Wiener or CSRS kernels on the input power level in the case of functional expansions (see Section 2.2). Another critical distinction is between these mean-square expansions and analytical expansions (like the Taylor series). The latter are not based on error minimization but on the derivative values at the reference point of differentiable (analytic) functions only. This distinction corresponds to the distinction between Volterra and Wiener kernels in the case of functional expansions discussed in Section 2.2.1. Complete bases are often enumerably infinite (e.g., the Fourier basis) and, therefore, are truncated in practice. This truncation naturally results in an approximation error that depends on the convergence pattern of the expansion and decreases monotonically with increasing number of basis functions (i.e., the dimensionality of the approximating subspace). Thus, in practice, the relevant issue is not the completeness but the convergence of the approximation error resulting from a truncated expansion for the specific case at hand. Well-established complete orthonormal (CON) bases that have been used so far include the Fourier (sinusoidal), Legendre, and Chebyshev (polynomial) sets for finite expansion intervals [A, B]. For semiinfinite intervals (B → ∞), a well-established CON basis is the Laguerre set (polynomials multiplied by an exponential) that has found many useful applications in the expansion of kernels (see Section 2.3.2). For infinite intervals (A → −∞, B → ∞), a well-established CON basis is the Hermite set (polynomials multiplied by a Gaussian) that is currently finding application to image processing and studies of receptive fields in the visual system.

Example

We consider, as an illustrative example, a function that is frequently used to represent compressive static nonlinearities in biological systems: the Michaelis-Menten function

$$f(x) = \frac{x}{x + c} \quad (A1.10)$$

defined for x ∈ [0, ∞), which has the following analytical (Taylor) expansion about x = 0 (for x ≥ 0):

$$f(x) = \sum_{n=1}^{\infty} (-1)^{n+1} \left( \frac{x}{c} \right)^n \quad (A1.11)$$
where c represents the half-max value (note that the max value of 1 is attained asymptotically as x → ∞). If we seek a linear mean-square approximation of this function over the finite interval [0, x_0], then we must consider the subspace defined by the basis {1, x} for x ∈ [0, x_0], and construct the Gram matrix

$$\mathbf{G} = \begin{bmatrix} \langle 1, 1 \rangle & \langle 1, x \rangle \\ \langle x, 1 \rangle & \langle x, x \rangle \end{bmatrix} = \begin{bmatrix} x_0 & x_0^2/2 \\ x_0^2/2 & x_0^3/3 \end{bmatrix} \quad (A1.12)$$
whose inverse

$$\mathbf{G}^{-1} = \begin{bmatrix} 4/x_0 & -6/x_0^2 \\ -6/x_0^2 & 12/x_0^3 \end{bmatrix} \quad (A1.13)$$

yields the two expansion coefficients a_0 and a_1 after multiplication with the vector

$$\begin{bmatrix} \langle f, 1 \rangle \\ \langle f, x \rangle \end{bmatrix} = \begin{bmatrix} x_0 + c \ln\!\left( \dfrac{c}{x_0 + c} \right) \\[2mm] \dfrac{x_0^2}{2} - c\, x_0 - c^2 \ln\!\left( \dfrac{c}{x_0 + c} \right) \end{bmatrix} \quad (A1.14)$$
This yields

$$a_0 = \frac{4}{x_0} \langle f, 1 \rangle - \frac{6}{x_0^2} \langle f, x \rangle \quad (A1.15)$$

$$a_1 = -\frac{6}{x_0^2} \langle f, 1 \rangle + \frac{12}{x_0^3} \langle f, x \rangle \quad (A1.16)$$

which indicates that the slope of the linear mean-square approximation depends on x_0 and is distinct from the coefficient (1/c) of the linear term of the Taylor expansion in Equation (A1.11). This illustrates the analogy between Volterra (analytical) and Wiener (orthogonal) expansions, as well as the dependence of the Wiener kernels (analogous to the orthogonal expansion coefficients) on the GWN input power level (analogous to the interval of expansion).
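The closed-form results of Equations (A1.14)-(A1.16) are easy to check numerically; the following hypothetical example (arbitrary values of c and x_0) also shows the fitted slope a_1 departing from the Taylor slope 1/c:

```python
import numpy as np

c, x0 = 1.0, 5.0

# Closed-form inner products of Eq. (A1.14):
f1 = x0 + c * np.log(c / (x0 + c))                    # <f, 1>
fx = x0**2 / 2 - c * x0 - c**2 * np.log(c / (x0 + c)) # <f, x>

# Coefficients of Eqs. (A1.15)-(A1.16):
a0 = (4 / x0) * f1 - (6 / x0**2) * fx
a1 = -(6 / x0**2) * f1 + (12 / x0**3) * fx

print(a0, a1, 1 / c)   # a1 differs from the Taylor slope 1/c, as noted above
```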
APPENDIX II: Gaussian White Noise
Gaussian white noise (GWN) is a stationary and ergodic random process with zero mean that is defined by the following fundamental property: any two values of GWN are statistically independent, no matter how close they are in time. The direct implication of this property is that the autocorrelation function of a GWN process w(t) is zero for nonzero arguments/shifts/lags:

$$\phi_w(\tau) \triangleq E[w(t)\, w(t - \tau)] = E[w(t)]\, E[w(t - \tau)] = 0 \quad (A2.1)$$

for all τ ≠ 0, where φ_w denotes the autocorrelation function and E[·] is the expected-value statistical operator. The value of φ_w at τ = 0 is the variance of the zero-mean process w(t), and has to be defined mathematically to be infinite for GWN because, otherwise, the GWN signal would have zero power. Recall that the power spectrum of a random process (signal) is the Fourier transform of its autocorrelation function, and, therefore, the latter has to be defined as a delta function for GWN in order to retain nonzero power spectral density. That is, if the value of φ_w at τ = 0 were finite, then the power spectrum would be a null function. Thus, the autocorrelation function of GWN is defined by means of the Dirac delta function as

$$\phi_w(\tau) = P \delta(\tau) \quad (A2.2)$$

where P is a positive scalar termed the "power level" of GWN. The GWN power spectrum (also called the power spectral density function) is found as the Fourier transform of its autocorrelation function to be

$$S_w(\omega) = P \quad (A2.3)$$
Evidently, the power spectrum of GWN is constant over all frequencies, hence the name "white noise," in analogy to white light that contains all (visible) wavelengths with the same power. The additional adjective "Gaussian" in GWN indicates that the amplitude distribution of the white-noise signal is Gaussian, like the independent steps in Brownian motion. However, any zero-mean amplitude distribution can define a non-Gaussian white-noise process (signal) as long as the values of the signal satisfy the aforementioned condition of statistical independence (see Section 2.2.4 for examples of non-Gaussian white processes with symmetric amplitude distributions). Although the mathematical properties of GWN have been studied extensively and utilized in many fields, the ideal GWN signal is not physically realizable because it has infinite variance by definition (recall that the variance of a stationary zero-mean random signal is equal to the value of its autocorrelation function at zero lag). This situation is akin to the mathematical use of the Dirac delta function, which has found immense utility in many fields of science and engineering but is, likewise, not physically realizable. Of course, in practice, finite-bandwidth approximations of delta functions or GWN signals can be used that are adequate for the requirements of each given application. Specifically, the critical parameter is the bandwidth of the system under study, which has to be covered by the bandwidth of the selected signal (i.e., the band-limited GWN input signal must cover the bandwidth of interest). The most common approximation of GWN is the band-limited GWN signal (with a bandwidth equal to or exceeding the requirements of a given application) that has the sinc function as its autocorrelation function. The use of this and other GWN approximations (termed quasiwhite signals) is discussed in Section 2.2.4 in the context of nonlinear system identification. It should be noted that the key property of GWN with regard to system identification is the "whiteness" and not the "Gaussianity." Thus, non-Gaussian white-noise signals (e.g., the CSRS family of quasiwhite signals discussed in Section 2.2.4) have symmetric amplitude probability density functions and may exhibit practical advantages in certain applications over band-limited GWN. For instance, multilevel CSRS quasiwhite signals (e.g., binary, ternary, etc.) may be easier to generate and apply through experimental transducers in certain applications than band-limited GWN waveforms. In the context of the Wiener approach to nonlinear system identification, it is critical to understand the high-order statistical properties of GWN signals. We note that the high-order autocorrelation functions of GWN signals have a specific structure that is suitable for nonlinear system identification following the Wiener approach. Specifically, all the odd-order autocorrelation functions are uniformly zero and the even-order ones can be expressed in terms of sums of products of delta functions. This statistical "decomposition" property is critical for the development of the Wiener series and its application to nonlinear system identification, as elaborated in Appendix III and Section 2.2.3. Let us illustrate the structure of the even-order autocorrelation functions using the fourth-order case:

$$\phi_4(\tau_1, \tau_2, \tau_3) \triangleq E[w(t)\, w(t - \tau_1)\, w(t - \tau_2)\, w(t - \tau_3)] \quad (A2.4)$$

Using the decomposition property of zero-mean Gaussian random variables (x_1, x_2, x_3, x_4), which states that

$$E[x_1 x_2 x_3 x_4] = E[x_1 x_2]\, E[x_3 x_4] + E[x_1 x_3]\, E[x_2 x_4] + E[x_1 x_4]\, E[x_2 x_3] \quad (A2.5)$$

we obtain

$$\phi_4(\tau_1, \tau_2, \tau_3) = E[w(t) w(t - \tau_1)] \cdot E[w(t - \tau_2) w(t - \tau_3)] + E[w(t) w(t - \tau_2)] \cdot E[w(t - \tau_1) w(t - \tau_3)] + E[w(t) w(t - \tau_3)] \cdot E[w(t - \tau_1) w(t - \tau_2)] \quad (A2.6)$$

which reduces to

$$\phi_4(\tau_1, \tau_2, \tau_3) = P^2 \{ \delta(\tau_1) \delta(\tau_2 - \tau_3) + \delta(\tau_2) \delta(\tau_3 - \tau_1) + \delta(\tau_3) \delta(\tau_1 - \tau_2) \} \quad (A2.7)$$

for GWN, because of Equation (A2.2). The decomposition property applies to any even number of zero-mean Gaussian variables and states that the expected value of the product of 2m Gaussian variables is equal to the sum of the products of the expected values of all possible pairs:

$$E[x_1 x_2 \cdots x_{2m}] = \sum_{\text{pairings}} \prod_{(i,j)} E[x_i x_j] \quad (A2.8)$$

where the sum extends over all distinct partitions of the indices (1, ..., 2m) into m pairs and the product runs over the pairs (i, j) of each partition. Since there are (2m)!/(m! 2^m) possible decompositions of 2m variables into m pairs, the 2mth-order autocorrelation function of GWN can be expressed as the sum of (2m)!/(m! 2^m) products of m delta functions (with the proper arguments resulting from all possible pair combinations of the time-shifted versions of w). This decomposition property is a reflection of the fact that the Gaussian process is a "second-order process" (i.e., its high-order statistics can be expressed entirely in terms of its second-order statistics). With regard to the odd-order statistics, note that the expected value of any odd number of zero-mean Gaussian variables is zero. Therefore, the odd-order autocorrelation functions of zero-mean Gaussian processes (white or nonwhite) are uniformly zero. The same is true for all quasiwhite non-Gaussian processes with symmetric amplitude distributions (probability density functions). These properties find application in the kernel estimation method via cross-correlation presented in Sections 2.2.3 and 2.2.4.
APPENDIX III: Construction of the Wiener Series
Wiener proposed the orthogonalization of the Volterra series for Gaussian white noise (GWN) inputs in a manner akin to a Gram-Schmidt orthogonalization procedure. Thus, the zero-order Wiener functional is a constant (like its Volterra counterpart, although of different value):

$$G_0 = h_0 \quad (A3.1)$$

Then the first-order Wiener functional has a leading term similar to its Volterra counterpart (although involving a different kernel) plus a multiple of the zero-order term:

$$G_1(t) = \int_0^\infty h_1(\tau)\, x(t - \tau)\, d\tau + c_{1,0} h_0 \quad (A3.2)$$

where the scalar c_{1,0} will be determined so that the covariance between G_1(t) and G_0 is zero (orthogonality) for a GWN input. This orthogonality condition yields c_{1,0} = 0, because a GWN signal has zero mean:

$$E[G_0 G_1(t)] = h_0 \int_0^\infty h_1(\tau)\, E[x(t - \tau)]\, d\tau + c_{1,0} h_0^2 = 0 \;\Rightarrow\; c_{1,0} h_0^2 = 0 \quad (A3.3)$$

Following the same procedure for the second-order Wiener functional, we have

$$G_2(t) = \int_0^\infty \int_0^\infty h_2(\tau_1, \tau_2)\, x(t - \tau_1)\, x(t - \tau_2)\, d\tau_1 d\tau_2 + c_{2,1} \int_0^\infty h_1(\tau)\, x(t - \tau)\, d\tau + c_{2,0} h_0 \quad (A3.4)$$

where c_{2,1} and c_{2,0} must be determined so that the following two orthogonality conditions are satisfied:
$$E[G_2(t) G_1(t)] = 0 \quad (A3.5)$$

$$E[G_2(t) G_0] = 0 \quad (A3.6)$$
From condition (A3.6) we have

$$h_0 \int_0^\infty \int_0^\infty h_2(\tau_1, \tau_2)\, E[x(t - \tau_1) x(t - \tau_2)]\, d\tau_1 d\tau_2 + c_{2,0} h_0^2 = 0 \;\Rightarrow\; c_{2,0} = -\frac{P}{h_0} \int_0^\infty h_2(\tau_1, \tau_1)\, d\tau_1 \quad (A3.7)$$

because E[x(t − τ_1) x(t − τ_2)] = Pδ(τ_1 − τ_2) and E[x(t − τ)] = 0 for a GWN input with power level P. From condition (A3.5) we get c_{2,1} = 0; otherwise, orthogonality between G_2 and G_1 cannot be secured. Therefore,

$$G_2(t) = \int_0^\infty \int_0^\infty h_2(\tau_1, \tau_2)\, x(t - \tau_1)\, x(t - \tau_2)\, d\tau_1 d\tau_2 - P \int_0^\infty h_2(\tau_1, \tau_1)\, d\tau_1 \quad (A3.8)$$
This procedure can be continued for higher-order Wiener functionals and leads to separation of odd and even functional terms, because of the statistical properties of GWN discussed in Appendix II. For instance, the third-order Wiener functional is

$$G_3(t) = \int_0^\infty \int_0^\infty \int_0^\infty h_3(\tau_1, \tau_2, \tau_3)\, x(t - \tau_1)\, x(t - \tau_2)\, x(t - \tau_3)\, d\tau_1 d\tau_2 d\tau_3 - 3P \int_0^\infty \int_0^\infty h_3(\tau_1, \lambda, \lambda)\, x(t - \tau_1)\, d\tau_1 d\lambda \quad (A3.9)$$
For additional details, see Marmarelis & Marmarelis (1978) or Schetzen (1980). Using the orthogonality of the Wiener functionals, we can derive a general expression for the autocorrelation function of the output of a nonlinear system:

$$E[y(t)\, y(t - u)] = \sum_{i=0}^{\infty} E[G_i(t)\, G_i(t - u)] \quad (A3.10)$$
which indicates that the autocorrelation of the system output is composed of the partial autocorrelations of the Wiener functionals. A corollary of the general expression (A3.10) is the expression for the output variance in terms of the Wiener kernels, since

$$\mathrm{Var}[y(t)] = E[y^2(t)] - h_0^2 \quad (A3.11)$$

and

$$E[y^2(t)] = \sum_{r=0}^{\infty} E[G_r^2(t)] = \sum_{r=0}^{\infty} r!\, P^r \int_0^\infty \cdots \int_0^\infty h_r^2(\tau_1, \ldots, \tau_r)\, d\tau_1 \cdots d\tau_r \quad (A3.12)$$

utilizing the autocorrelation properties of the GWN process (see Appendix II).
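In discrete time, the orthogonality of the Wiener functionals is easy to verify by simulation; the sketch below builds discrete analogues of G_1 and G_2 (Equations (A3.2) and (A3.8)) for placeholder kernels and checks that their means and cross-moment vanish:

```python
import numpy as np

rng = np.random.default_rng(1)
P = 1.0                                      # GWN power level
x = rng.normal(0.0, np.sqrt(P), 200_000)     # discrete white Gaussian input

M = 8
h1 = np.exp(-np.arange(M) / 3.0)             # placeholder first-order kernel
h2 = np.outer(h1, h1) * 0.1                  # placeholder second-order kernel

X = np.array([np.roll(x, m) for m in range(M)])     # lagged copies of x

# Discrete analogues of G1 (Eq. A3.2 with c_{1,0} = 0) and G2 (Eq. A3.8);
# the trace term is the discrete counterpart of P * integral of h2(t, t):
G1 = h1 @ X
G2 = np.einsum('ij,in,jn->n', h2, X, X) - P * np.trace(h2)

print(np.mean(G1), np.mean(G2), np.mean(G1 * G2))   # all near zero (orthogonality)
```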
APPENDIX IV: Stationarity, Ergodicity, and Autocorrelation Functions of Random Processes
The time course of continuous observations of variables associated with a natural phenomenon or experiment can be described by a "random process," because of the stochastic element intrinsic to these observations, which necessitates "probabilistic" (as opposed to "deterministic") descriptions of these variables. Thus a random process (RP) can be viewed as a function of time (or signal) whose value at each point in time is a random variable and can be described only probabilistically (i.e., we can assign a certain probability of occurrence to each possible value of this signal at any given time). The RP, X(t), is often denoted with a capital letter as the "ensemble" of all possible realizations {x_i(t)}. Each realization is termed a "sample function" and is denoted by lowercase letters. The amplitude probability density function (APDF) p(x, t) of the RP is defined as

$$\mathrm{Prob}\{x_0 \le X(t) < x_0 + dx\} = p(x_0, t)\, dx \quad (A4.1)$$

and may generally depend on time t. Likewise, the kth joint APDF can be defined as

$$\mathrm{Prob}\{x_1 \le X(t_1) < x_1 + dx_1, \ldots, x_k \le X(t_k) < x_k + dx_k\} = p_k(x_1, \ldots, x_k;\, t_1, \ldots, t_k)\, dx_1 \cdots dx_k \quad (A4.2)$$

When all joint APDFs are time-invariant, the RP is called "stationary" and the expressions are simplified by omitting the explicit reference to time. This, in fact, is the main class of RPs considered in practice. In order to describe the statistical relations among the different samples/values of the RP at different times, we introduce the autocorrelation functions
$$\phi_k(t_1, \ldots, t_k) = E[X(t_1) \cdots X(t_k)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1 \cdots x_k\, p_k(x_1, \ldots, x_k;\, t_1, \ldots, t_k)\, dx_1 \cdots dx_k \quad (A4.3)$$

which are somewhat simplified for stationary RPs by considering only the time differences (t_i − t_1) for i = 2, ..., k, as affecting the value of the kth-order autocorrelation function. In addition to stationarity, the other key attribute of RPs is ergodicity. An ergodic RP retains the same statistical characteristics throughout the entire ensemble, the same way a stationary process retains the same statistical characteristics over time. In practice, we typically assume that the observed RP is ergodic, although this assumption ought to be tested and examined. When the RP is ergodic and stationary, ensemble averaging can be replaced by time averaging. Therefore, the kth-order autocorrelation function of a stationary and ergodic RP is given by

$$\phi_k(\tau_1, \ldots, \tau_{k-1}) = \lim_{R \to \infty} \frac{1}{2R} \int_{-R}^{R} X(t)\, X(t - \tau_1) \cdots X(t - \tau_{k-1})\, dt \quad (A4.4)$$

It is evident that in practice (where only finite data records are available) estimates of the autocorrelation functions are obtained for finite R. Likewise, only estimates of the APDF can be obtained via amplitude histograms over the finite data records. In a practical context, one or more realizations of the RP can be recorded over a finite time interval and estimates of the APDF and the autocorrelation functions can be obtained. The respective multidimensional Fourier transform of the autocorrelation function defines the "polyspectrum" or the "high-order spectrum" of the RP for the respective order. Typically, only the second-order autocorrelation function is estimated, yielding an estimate of the RP spectrum via the finite Fourier transform. Occasionally, the bispectrum and the trispectrum are estimated from the third- and fourth-order autocorrelation functions via the respective Fourier transforms. This is meaningful only for non-Gaussian processes, since Gaussian processes are fully described by the second-order autocorrelation function (see Appendix II for the "decomposition" property of Gaussian variables). Thus, the two aspects of an ergodic and stationary RP that are typically examined are its spectrum (or the corresponding second-order autocorrelation function) and its APDF. The former allows the classification into white or nonwhite RPs, and the latter determines the amplitude characteristics (Gaussian or not, multilevel, etc.).
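For a sampled record, the time average in Equation (A4.4) becomes a finite sum; the following sketch estimates the second- and third-order autocorrelation functions from a single realization (the biased 1/N normalization is our choice):

```python
import numpy as np

def autocorr2(x, max_lag):
    """Estimate phi_2(tau) by time averaging over one finite realization."""
    N = len(x)
    return np.array([np.dot(x[tau:], x[:N - tau]) / N
                     for tau in range(max_lag + 1)])

def autocorr3(x, max_lag):
    """Estimate phi_3(tau1, tau2); uniformly near zero for a zero-mean
    Gaussian record, per the odd-order property of Appendix II."""
    N = len(x)
    phi = np.zeros((max_lag + 1, max_lag + 1))
    for t1 in range(max_lag + 1):
        for t2 in range(max_lag + 1):
            m = max(t1, t2)
            phi[t1, t2] = np.sum(x[m:] * x[m - t1:N - t1] * x[m - t2:N - t2]) / N
    return phi

phi2 = autocorr2(np.random.randn(100_000), 10)   # ~ [1, 0, 0, ...] for white noise
```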
References
Aaslid, R.K., W. Lindengaard, W. Sorterberg, and H. Nornes. (1989). Cerebral autoregulation dynamics in humans. Stroke 20:45-52.
Abdel-Malek, A. and V.Z. Marmarelis. (1988). Modeling of task-dependent characteristics of human operator dynamics during pursuit manual tracking. IEEE Transactions on Systems, Man, and Cybernetics 18:163-172.
Abdel-Malek, A., C.H. Markham, P.Z. Marmarelis, and V.Z. Marmarelis. (1988). Quantifying deficiencies associated with Parkinson's disease by use of time-series analysis. Journal of Electroencephalography & Clinical Neurophysiology 69:24-33.
Adelson, E.H. and J.R. Bergen. (1985). Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2:284-299.
Aertsen, A.M. and P.I. Johannesma. (1981). The spectro-temporal receptive field: a functional characteristic of auditory neurons. Biological Cybernetics 69:407-414.
Alataris, K., T.W. Berger, and V.Z. Marmarelis. (2000). A novel network for nonlinear modeling of neural systems with arbitrary point-process inputs. Neural Networks 13:255-266.
Amorocho, J. and A. Brandstetter. (1971). Determination of nonlinear functional response functions in rainfall-runoff processes. Water Resources Research 7:1087-1101.
Aracil, J. (1970). Measurements of Wiener kernels with binary random signals. IEEE Transactions on Automatic Control 15:123-125.
Arbib, M.A., P.L. Falb, and R.E. Kalman. (1969). Topics in Mathematical System Theory. McGraw-Hill, New York.
Astrom, K.J. and P. Eykhoff. (1971). System identification: a survey. Automatica 7:123-162.
Barahona, M. and C.S. Poon. (1996). Detection of nonlinear dynamics in short, noisy time series. Nature 381:215-217.
Bardakjian, B.L., W.N. Wright, T.A. Valiante, and P.L. Carlen. (1994). Nonlinear system identification of hippocampal neurons. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 179-194.
Barker, H.A. (1967). Choice of pseudorandom binary signals for system identification. Electronics Letters 3:524-526.
Barker, H.A. (1968). Elimination of quadratic drift errors in system identification by pseudorandom binary signals. Electronics Letters 4:255-256.
Barker, H.A. and R.W. Davy. (1975). System identification using pseudorandom signals and the discrete Fourier transform. Proceedings IEE 122:305-311.
Barker, H.A. and S.N. Obidegwu. (1973). Effects of nonlinearities on the measurement of weighting functions by crosscorrelation using pseudorandom signals. Proceedings IEE 120:1293-1300.
Barker, H.A. and T. Pradisthayon. (1970). High-order autocorrelation functions of pseudorandom signals based on M-sequences. Proceedings IEE 117:1857-1863.
Barker, H.A., S.N. Obidegwu, and T. Pradisthayon. (1972). Performance of antisymmetric pseudorandom signals in the measurement of second-order Volterra kernels by crosscorrelation. Proceedings IEE 119:353-362.
Barrett, J.F. (1963). The use of functionals in the analysis of nonlinear physical systems. J. Electron. Control 15:567-615.
Barrett, J.F. (1965). The use of Volterra series to find region of stability of a nonlinear differential equation. International Journal of Control 1:209-216.
Barrett, T.R. (1975). On linearizing nonlinear systems. Journal of Sound Vibration 39:265-268.
Bassingthwaighte, J.B., L.S. Liebovitch, and B.J. West. (1994). Fractal Physiology. Oxford University Press, Oxford.
Baumgartner, S.L. and W.J. Rugh. (1975). Complete identification of a class of nonlinear systems from steady state frequency response. IEEE Transactions on Circuits and Systems 22:753-758.
Bedrosian, E. and S.O. Rice. (1971). The output properties of Volterra systems (nonlinear systems with memory) driven by harmonic and Gaussian inputs. Proceedings IEEE 59:1688-1707.
Bekey, G.A. (1973). Parameter estimation in biological systems: a survey. In: Proceedings of the Third IFAC Symposium on Identification and System Parameter Estimation, North-Holland, Amsterdam, pp. 1123-1130.
Bellman, R. and K.J. Astrom. (1969). On structural identifiability. Math. Biosci. 1:329-339.
Belozeroff, V., R.B. Berry, and M.C.K. Khoo. (2002). Model-based assessment of autonomic control in obstructive sleep apnea syndrome. Sleep 26(1):65-73.
Benardete, E.A. and J.D. Victor. (1994). An extension of the m-sequence technique for the analysis of multi-input nonlinear systems. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 87-110.
Bendat, J.S. (1976). System identification from multiple input/output data. Journal of Sound Vibration 49:293-308.
Bendat, J.S. (1998). Nonlinear Systems Techniques and Applications. Wiley, New York.
Bendat, J.S. and A.G. Piersol. (1986). Random Data: Analysis and Measurement Procedures, 2nd Edition. Wiley, New York.
Berger, T.W. and R.J. Sclabassi. (1988). Long-term potentiation and its relation to changes in hippocampal pyramidal cell activity and behavioral learning during classical conditioning. In: Long-Term Potentiation: From Biophysics to Behavior, Alan R. Liss, New York, pp. 467-497.
Berger, T.W., G. Chauvet, and R.J. Sclabassi. (1994). A biologically-based model of the functional properties of the hippocampus. Neural Networks 7:1031-1064.
Berger, T.W., G.B. Robinson, R.L. Port, and R.J. Sclabassi. (1987). Nonlinear systems analysis of the functional properties of hippocampal formation. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 73-103.
Berger, T.W., J.L. Eriksson, D.A. Ciarolla, and R.J. Sclabassi. (1988a). Nonlinear systems analysis of the hippocampal perforant path-dentate projection. II. Effect of random pulse stimulation. Journal of Neurophysiology 60:1077-1094.
Berger, T.W., J.L. Eriksson, D.A. Ciarolla, and R.J. Sclabassi. (1988b). Nonlinear systems analysis of the hippocampal perforant path-dentate projection. III. Comparison of random train and paired impulse stimulation. Journal of Neurophysiology 60:1095-1109.
Berger, T.W., T.P. Harty, G. Barrionuevo, and R.J. Sclabassi. (1989). Modeling of neuronal networks through experimental decomposition. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 113-128.
Berger, T.W., G. Barrionuevo, S.P. Levitan, D.N. Krieger, and R.J. Sclabassi. (1991). Nonlinear systems analysis of network properties of the hippocampal formation. In: Neurocomputation and Learning: Foundations of Adaptive Networks, J.W. Moore and M. Gabriel (Eds.), MIT Press, Cambridge, MA, pp. 283-352.
Berger, T.W., G. Barrionuevo, G. Chauvet, D.N. Krieger, S.P. Levitan, and R.J. Sclabassi. (1993). A theoretical and experimental strategy for realizing a biologically based model of the hippocampus. In: Synaptic Plasticity: Molecular, Cellular and Functional Aspects, R.F. Thompson and J.L. Davis (Eds.), MIT Press, Cambridge, MA, pp. 169-207.
Berger, T.W., T.P. Harty, C. Choi, X. Xie, G. Barrionuevo, and R.J. Sclabassi. (1994). Experimental basis for an input/output model of the hippocampal formation. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 29-54.
Berger, T.W., M. Baudry, R.D. Brinton, J.-S. Liaw, V.Z. Marmarelis, A.Y. Park, B.J. Sheu, and A.R. Tanguay, Jr. (2001). Brain-implantable biomimetic electronics as the next era in neural prosthetics. Proceedings IEEE 89:993-1012.
Bergman, R.N. and J.C. Lovejoy (Eds.). (1997). The Minimal Model Approach and Determinants of Glucose Tolerance. Pennington Center Nutrition Series, Vol. 7, Louisiana State Univ. Press, Baton Rouge, LA and London.
Bergman, R.N., C.R. Bowden, and C. Cobelli. (1981). The minimal model approach to quantification of factors controlling glucose disposal in man. In: Carbohydrate Metabolism, Wiley, New York, pp. 269-296.
Billings, S.A. (1980). Identification of nonlinear systems: a survey. Proceedings IEE 127:272-285.
Billings, S.A. and S.Y. Fakhouri. (1978). Identification of a class of nonlinear systems using correlation analysis. Proceedings IEE 125:691-695.
Billings, S.A. and S.Y. Fakhouri. (1979). Identification of nonlinear unity feedback systems. International Journal of System Science 10:1401-1408.
Billings, S.A. and S.Y. Fakhouri. (1981). Identification of nonlinear systems using correlation analysis and pseudorandom inputs. International Journal of System Science 11:261-279.
Billings, S.A. and S.Y. Fakhouri. (1982). Identification of systems containing linear dynamic and static nonlinear elements. Automatica 18:15-26.
Billings, S.A. and I.J. Leontaritis. (1982). Parameter estimation techniques for nonlinear systems. In: IFAC Symposium on Identification and System Parameter Estimation, Arlington, VA, pp. 427-432.
Billings, S.A. and W.S.F. Voon. (1984). Least-squares parameter estimation algorithms for nonlinear systems. International Journal of System Science 15:610-615.
Billings, S.A. and W.S.F. Voon. (1986). A prediction-error and stepwise-regression estimation algorithm for nonlinear systems. International Journal of Control 44:803-822.
Blasi, A., J. Jo, E. Valladares, B.J. Morgan, J.B. Skatrud, and M.C. Khoo. (2003). Cardiovascular variability after arousal from sleep: time-varying spectral analysis. Journal of Applied Physiology 95(4):1394-1404.
Blum, E.K. and L.K. Li. (1991). Approximation theory and feedforward networks. Neural Networks 4:511-515.
Boden, G., X. Chen, J. Ruiz, J.V. White, and L. Rossetti. (1994). Mechanism of fatty-acid induced inhibition of glucose uptake. Journal of Clinical Investigation 93:2438-2446.
Borsellino, A. and M.G. Fuortes. (1968). Responses to single photons in visual cells of Limulus. Journal of Physiology 196:507-539.
Bose, A.G. (1956). A theory of nonlinear systems. Technical Report No. 309, Research Laboratory of Electronics, M.I.T., Cambridge, MA.
Boyd, S. and L.O. Chua. (1985). Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems 32:1150-1160.
Boyd, S., L.O. Chua, and C.A. Desoer. (1984). Analytical foundations of Volterra series. J. Math. Contr. Info. 1:243-282.
Boyd, S., Y.S. Tang, and L.O. Chua. (1983). Measuring Volterra kernels. IEEE Transactions on Circuits and Systems 30:571-577.
Briggs, P.A. and K.R. Godfrey. (1966). Pseudorandom signals for the dynamic analysis of multivariable systems. Proceedings IEE 113:1259-1267.
Brilliant, M.B. (1958). Theory of the analysis of nonlinear systems. Technical Report No. 345, Research Laboratory of Electronics, M.I.T., Cambridge, MA.
Brillinger, D.R. (1970). The identification of polynomial systems by means of higher order spectra. Journal of Sound Vibration 12:301-313.
Brillinger, D.R. (1975a). The identification of point process systems. Annals of Probability 3:909-929.
Brillinger, D.R. (1975b). Time Series: Data Analysis and Theory. Holt, Rinehart & Winston, New York.
Brillinger, D.R. (1976). Measuring the association of point processes: a case history. The American Mathematical Monthly 83:16-22.
Brillinger, D.R. (1987). Analyzing interacting nerve cell spike trains to assess causal connections. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 29-40.
Brillinger, D.R. (1988). The maximum likelihood approach to the identification of neuronal firing systems. Annals of Biomedical Engineering 16:3-16.
Brillinger, D.R. (1989). Parameter estimation for nongaussian processes via second and third order spectra with an application to some endocrine data. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 53-62.
Brillinger, D.R., H. Bryant, and J.P. Segundo. (1976). Identification of synaptic interactions. Biological Cybernetics 22:213-228.
Brillinger, D.R. and J.P. Segundo. (1979). Empirical examination of the threshold model of neuron firing. Biological Cybernetics 35:213-220.
Brillinger, D.R. and A.E.P. Villa. (1994). Examples of the investigation of neural information processing by point process analysis. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 111-127.
Brockett, R.W. (1976). Volterra series and geometric control theory. Automatica 12:167-176.
Bryant, H.L., A.R. Marcos, and J.P. Segundo. (1973). Correlations of neuronal spike discharges produced by monosynaptic connections and by common inputs. Journal of Neurophysiology 36:205-225.
Bryant, H.L. and J.P. Segundo. (1976). Spike initiation by transmembrane current: a white-noise analysis. Journal of Physiology 260:279-314.
Bussgang, J.J. (1952). Crosscorrelation functions of amplitude distorted Gaussian signals. Technical Report No. 216, M.I.T. Research Laboratory of Electronics, Cambridge, MA.
Bussgang, J.J., L. Ehrman, and J.W. Graham. (1974). Analysis of nonlinear systems with multiple inputs. Proceedings IEEE 62:1088-1119.
Cambanis, S. and B. Liu. (1971). On the expansion of a bivariate distribution and its relationship to the output of a nonlinearity. IEEE Transactions on Information Theory 17:17-25.
Cameron, R.H. and W.T. Martin. (1947). The orthogonal development of nonlinear functionals in series of Fourier-Hermite functionals. Annals of Mathematics 48:385-392.
Carson, E.R., C. Cobelli, and L. Finkelstein. (1983). The Mathematical Modeling of Metabolic and Endocrine Systems. Wiley, New York.
Caumo, A., P. Vicini, J.J. Zachwieja, A. Avogaro, K. Yarasheski, D.M. Bier, and C. Cobelli. (1996). Undermodeling affects minimal model indexes: insights from a two compartmental model. American Journal of Physiology 276:E1171-E1193.
Chan, R.Y. and K.-I. Naka. (1980). Spatial organization of catfish retinal neurons. II. Circular stimulus. Journal of Neurophysiology 43:832.
Chang, F.H.I. and R. Luus. (1971). A non-iterative method for identification using Hammerstein model. IEEE Transactions on Automatic Control 16:464-468.
Chappell, R.L., K.-I. Naka, and M. Sakuranaga. (1984). Turtle and catfish horizontal cells show different dynamics. Vision Research 24:117-128.
Chappell, R.L., K.-I. Naka, and M. Sakuranaga. (1985). Dynamics of turtle horizontal cells. Journal of General Physiology 86:423-453.
Chen, H.-W., N. Ishii, and N. Suzumura. (1986). Structural classification of non-linear systems by input and output measurements. International Journal of System Science 17:741-774.
Chen, H.-W., N. Ishii, M. Sakuranaga, and K.-I. Naka. (1985). A new method for the complete identification of some classes of nonlinear systems. In: 15th NIBB Conference on Information Processing in Neuron Network, K.-I. Naka and Y.I. Ando (Eds.), Okazaki, Japan.
Chen, H.-W., D. Jacobson, and J.P. Gaska. (1990). Structural classification of multi-input nonlinear systems. Biological Cybernetics 63:341-357.
Chen, H.-W., D. Jacobson, J.P. Gaska, and D.A. Pollen. (1993). Cross-correlation analyses of nonlinear systems with spatiotemporal inputs. IEEE Transactions on Biomedical Engineering 40:1102-1113.
Chian, M.T., V.Z. Marmarelis, and T.W. Berger. (1998). Characterization of unobservable neural circuitry in the hippocampus with nonlinear system analysis. In: Computational Neuroscience, J.M. Bower (Ed.), Plenum Press, New York, pp. 43-50.
Chian, M.T., V.Z. Marmarelis, and T.W. Berger. (1999). Decomposition of neural systems with nonlinear feedback using stimulus-response data. Neurocomputing 26-27:641-654.
Chon, K.H., N.H. Holstein-Rathlou, and V.Z. Marmarelis. (1998a). Comparative nonlinear modeling of renal autoregulation in rats: Volterra approach vs. artificial neural networks. IEEE Transactions on Neural Networks 9:430-435.
Chon, K.H., T.J. Mullen, and R.J. Cohen. (1996). A dual-input nonlinear system analysis of autonomic modulation of heart rate. IEEE Transactions on Biomedical Engineering 43:530-544.
Chon, K.H., N.-H. Holstein-Rathlou, D.J. Marsh, and V.Z. Marmarelis. (1994a). Parametric and nonparametric nonlinear modeling of renal autoregulation dynamics. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 195-210.
Chon, K.H., Y.M. Chen, N.H. Holstein-Rathlou, D.J. Marsh, and V.Z. Marmarelis. (1998b). Nonlinear system analysis of renal autoregulation in normotensive and hypertensive rats. IEEE Transactions on Biomedical Engineering 45:342-353.
Chon, K.H., Y.M. Chen, N.H. Holstein-Rathlou, D.J. Marsh, and V.Z. Marmarelis. (1993). On the efficacy of linear system analysis of renal autoregulation in rats. IEEE Transactions on Biomedical Engineering 40:8-20.
Chon, K.H., Y.M. Chen, V.Z. Marmarelis, D.J. Marsh, and N.H. Holstein-Rathlou. (1994b). Detection of interactions between myogenic and TGF mechanisms using nonlinear analysis. American Journal of Physiology 265:F160-F173.
Chua, L. and Y. Liao. (1989). Measuring Volterra kernels (II). International Journal of Circuit Theory and Applications 17:151-190.
Chua, L. and Y. Liao. (1991). Measuring Volterra kernels (III). International Journal of Circuit Theory and Applications 19:189-209.
Chua, L. and C. Ng. (1979). Frequency domain analysis of nonlinear systems: general theory, formulation of transfer functions. IEEE Transactions on Circuits and Systems 3:165-185, 257-269.
Church, R. (1935). Tables of irreducible polynomials for the first four prime moduli. Annals of Mathematics 36:198.
Citron, M.C. (1987). Spatiotemporal white noise analysis of retinal receptive fields. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 161-171.
Citron, M.C. and R.C. Emerson. (1983). White noise analysis of cortical directional selectivity in cat. Brain Research 279:271-277.
Citron, M.C. and V.Z. Marmarelis. (1987). Application of minimum-order Wiener modeling to retinal ganglion cell spatio-temporal dynamics. Biological Cybernetics 57:241-247.
Citron, M.C., R.C. Emerson, and L.A. Ide. (1981a). Spatial and temporal receptive-field analysis of the cat's geniculocortical pathway. Vision Research 21:385-397.
Citron, M.C., J.P. Kroeker, and G.D. McCann. (1981b). Non-linear interactions in ganglion cell receptive fields. Journal of Neurophysiology 46:1161-1176.
Citron, M.C., R.C. Emerson, and W.R. Levick. (1988). Nonlinear measurement and classification of receptive fields in cat retinal ganglion cells. Annals of Biomedical Engineering 16:65-77.
Clever, W.C. and W.C. Meecham. (1972). Time-dependent Wiener-Hermite base for turbulence. Physics of Fluids 15:244-255.
Cobelli, C. and A. Mari. (1983). Validation of mathematical models of complex endocrine-metabolic systems. A case study of a model of glucose regulation. Medical & Biological Engineering & Computing 21:390-399.
Cobelli, C. and G. Pacini. (1988). Insulin secretion and hepatic extraction in humans by minimal modeling of C-peptide and insulin kinetics. Diabetes 37:223-231.
Courellis, S.H. and V.Z. Marmarelis. (1989). Wiener analysis of the Hodgkin-Huxley equations. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 273-289.
Courellis, S.H. and V.Z. Marmarelis. (1990). An artificial neural network for motion detection and speed estimation. In: International Joint Conference on Neural Networks, Volume I, San Diego, CA, pp. 407-422.
Courellis, S.H. and V.Z. Marmarelis. (1991a). Sensitivity enhancement of elementary velocity estimators with self and lateral facilitation. In: Proceedings of the IEEE International Joint Conference on Neural Networks, Volume I, Seattle, WA, pp. 749-758.
Courellis, S.H. and V.Z. Marmarelis. (1991b). Speed ranges accommodated by network architectures of elementary velocity estimators. In: Proceedings of Visual Communication and Image Processing, Volume 1606, Boston, MA, pp. 336-349.
Courellis, S.H. and V.Z. Marmarelis. (1992a). Nonlinear functional representations for motion detection and speed estimation schemes. In: Nonlinear Vision, R. Pinter and B. Nabet (Eds.), CRC Press, Boca Raton, FL, pp. 91-108.
Courellis, S.H. and V.Z. Marmarelis. (1992b). Velocity estimators of visual motion in two spatial dimensions. In: International Joint Conference on Neural Networks, Volume III, Baltimore, MD, pp. 72-83.
Courellis, S.H., V.Z. Marmarelis, and T.W. Berger. (2000). Modeling event-driven nonlinear dynamics in biological neural networks. In: Proceedings of the 7th Symposium on Neural Computation, Los Angeles, CA, Volume 10, pp. 28-35.
Cronin, J. (1987). Mathematical Aspects of Hodgkin-Huxley Neural Theory. Cambridge University Press, Cambridge.
Crum, L.A. and J.A. Heinen. (1974). Simultaneous reduction and expansion of multidimensional Laplace transform kernels. SIAM Journal of Applied Mathematics 26:753-771.
Curlander, J.C. and V.Z. Marmarelis. (1983). Processing of visual information in the distal neurons of the vertebrate retina. IEEE Transactions on Systems, Man and Cybernetics 13:934-943.
Curlander, J.C. and V.Z. Marmarelis. (1987). A linear spatio-temporal model of the light-to-bipolar cell system and its response characteristics to moving bars. Biological Cybernetics 57:357-363.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals & Systems 2:303-314.
D'Argenio, D.Z. (Ed.). (1991). Advanced Methods of Pharmacokinetic and Pharmacodynamic Systems Analysis, Volume I. Plenum, New York.
D'Argenio, D.Z. (Ed.). (1995). Advanced Methods of Pharmacokinetic and Pharmacodynamic Systems Analysis, Volume II. Plenum, New York.
D'Argenio, D.Z. and V.Z. Marmarelis. (1987). Experiment design for biomedical system modeling. In: Systems & Control Encyclopedia: Theory, Technology, Applications, M.G. Singh (Ed.), Pergamon Press, Oxford, pp. 486-490.
Dalal, S.S., V.Z. Marmarelis, and T.W. Berger. (1997). A nonlinear positive feedback model of glutamatergic synaptic transmission in dentate gyrus. In: Proceedings of the 4th Joint Symposium on Neural Computation, Los Angeles, CA, 7:68-75.
Dalal, S.S., V.Z. Marmarelis, and T.W. Berger. (1998). A nonlinear systems approach of characterizing AMPA and NMDA receptor dynamics. In: Computational Neuroscience, J.M. Bower (Ed.), Plenum Press, New York, pp. 155-160.
Davies, W.D.T. (1970). System Identification for Self-Adaptive Control. Wiley, New York.
Davis, G.W. and K.I. Naka. (1980). Spatial organizations of catfish retinal neurons. Single- and random-bar stimulation. Journal of Neurophysiology 43:807-831.
de Boer, E. and H.R. de Jongh. (1978). On cochlear encoding: potentialities and limitations of the reverse-correlation technique. Journal of the Acoustical Society of America 63:115-135.
de Boer, E. and P. Kuyper. (1968). Triggered correlation. IEEE Transactions on Biomedical Engineering 15:169-179.
de Boer, E. and A.L. Nuttall. (1997). On cochlear cross-correlation functions: connecting nonlinearity and activity. In: Diversity in Auditory Mechanics, E.R. Lewis, G.R. Long, R.F. Lyon, P.M. Narins, C.R. Steele, and E. Hecht-Poinar (Eds.), World Scientific Press, Singapore, pp. 291-304.
de Figueiredo, R.J.P. (1980). Implications and applications of Kolmogorov's superposition theorem. IEEE Trans. Autom. Contr. 25:1227-1231.
Deutsch, R. (1955). On a method of Wiener for noise through nonlinear devices. IRE Convention Record, Part 4, pp. 186-192.
Deutsch, S. and E. Micheli-Tzanakou. (1987). Neuroelectric Systems. New York University Press, New York.
Dimoka, A., S.H. Courellis, D. Song, V.Z. Marmarelis, and T.W. Berger. (2003). Identification of the lateral and medial perforant path of the hippocampus using single and dual random impulse train stimulation. In: Proceedings of the IEEE EMBS Conference, Cancun, Mexico, pp. 1933-1936.
Dittman, J.S., A.C. Kreitzer, and W.G. Regehr. (2000). Interplay between facilitation, depression and residual calcium at three presynaptic terminals. J. Neuroscience 20:1374-1385.
Dowling, J.E. and B.B. Boycott. (1966). Organization of the primate retina: electron microscopy. Proc. Roy. Soc. (London) Ser. B 166:80-111.
Eckert, H. and L.G. Bishop. (1975). Nonlinear dynamic transfer characteristics of cells in the peripheral visual pathway of flies. Part I: The retinula cells. Biological Cybernetics 17:1-6.
Edvinsson, L. and D.N. Krause. (2002). Cerebral Blood Flow and Metabolism. Lippincott Williams and Wilkins, Philadelphia.
Eggermont, J.J. (1993). Wiener and Volterra analyses applied to the auditory system. Hearing Research 66:177-201.
Eggermont, J.J., P.I. Johannesma, and A.M. Aertsen. (1983). Quantitative characterization procedure for auditory neurons based on the spectro-temporal receptive field. Hearing Research 10:167-190.
Eggermont, J.J., P.I.M. Johannesma, and A.M. Aertsen. (1983). Reverse-correlation methods in auditory research. Quarterly Reviews of Biophysics 16:341.
Emerson, R.C., J.R. Bergen, and E.H. Adelson. (1992). Directionally selective complex cells and the computation of motion energy in cat visual cortex. Vision Research 32:203-218.
Emerson, R.C. and M.C. Citron. (1988). How linear and nonlinear mechanisms contribute to directional selectivity in simple cells of cat striate cortex. Invest. Ophthalmol. & Vis. Sci., Suppl. 29:23.
Emerson, R.C., M.J. Korenberg, and M.C. Citron. (1989). Identification of intensive nonlinearities in cascade models of visual cortex and its relation to cell classification. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 97-112.
Emerson, R.C., M.J. Korenberg, and M.C. Citron. (1992). Identification of complex-cell intensive nonlinearities in a cascade model of cat visual cortex. Biological Cybernetics 66:291-300.
Emerson, R.C., M.C. Citron, W.J. Vaughn, and S.A. Klein. (1987). Nonlinear directionally selective subunits in complex cells of cat striate cortex. J. Neurophysiol. 58:33-65.
Emerson, R.C. and G.L. Gerstein. (1977). Simple striate neurons in the cat. I. Comparison of responses to moving and stationary stimuli. J. Neurophysiol. 40:119-135.
Eykhoff, P. (1963). Some fundamental aspects of process-parameter estimation. IEEE Transactions on Automatic Control 8:347-357.
Eykhoff, P. (1974). System Identification: Parameter and State Estimation. Wiley, New York.
Fakhouri, S.Y. (1980). Identification of the Volterra kernels of nonlinear systems. Proceedings IEE, Part D 127:246-304.
Fan, Y. and R. Kalaba. (2003). Dynamic programming and pseudo-inverses. Applied Mathematics and Computation 139:323-342.
Fargason, R.D. and G.D. McCann. (1978). Response properties of peripheral retinula cells within Drosophila visual mutants to monochromatic Gaussian white-noise stimuli. Vision Research 18:809-813.
Flake, R.H. (1963a). Volterra series representation of nonlinear systems. AIEE Trans. 81:330-335.
Flake, R.H. (1963b). Volterra series representation of time-varying nonlinear systems. In: Proceedings of 2nd IFAC Congress, Basel, Switzerland, 2:91-99.
FitzHugh, R. (1969). Thresholds and plateaus in the Hodgkin-Huxley nerve equations. J. Gen. Physiol. 43:867-896.
Frechet, M.R. (1928). Les Espaces Abstraits. Academie Francaise, Paris.
Freckmann, G., B. Kalatz, B. Pfeiffer, U. Hoss, and C. Haug. (2001). Recent advances in continuous glucose monitoring. Exp. Clin. Endocrinol. Diabetes Suppl. 2:S347-S357.
French, A.S. (1976). Practical nonlinear system analysis by Wiener kernel estimation in the frequency domain. Biological Cybernetics 24:111-119.
French, A.S. (1984a). The dynamic properties of the action potential encoder in an insect mechanosensory neuron. Biophys. J. 46:285-290.
French, A.S. (1984b). The receptor potential and adaptation in the cockroach tactile spine. J. Neuroscience 4:2063-2068.
French, A.S. (1989). Two components of rapid sensory adaptation in a cockroach mechanoreceptor neuron. J. Neurophysiol. 62:768-777.
French, A.S. (1992). Mechanotransduction. Ann. Rev. Physiol. 54:135-152.
French, A.S. and E.G. Butz. (1973). Measuring the Wiener kernels of a nonlinear system using the fast Fourier transform algorithm. International Journal of Control 17:529-539.
French, A.S. and E.G. Butz. (1974). The use of Walsh functions in the Wiener analysis of nonlinear systems. IEEE Transactions on Computers 23:225-232.
French, A.S. and A.V. Holden. (1971). Frequency domain analysis of neurophysiological data. Computer Programs in Biomedicine 1:219-234.
French, A.S. and M. Jarvilehto. (1978). The dynamic behavior of photoreceptor cells in the fly in response to random (white noise) stimulation at a range of temperatures. Journal of Physiology 274:311-322.
French, A.S. and M.J. Korenberg. (1989). A nonlinear cascade model of action potential encoding in an insect sensory neuron. Biophys. J. 55:655-661.
French, A.S. and M.J. Korenberg. (1991). Dissection of a nonlinear cascade model for sensory encoding. Ann. Biomed. Eng. 19:473-484.
French, A.S. and J.E. Kuster. (1987). Linear and nonlinear behavior of photoreceptors during the transduction of small numbers of photons. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 41-48.
French, A.S. and J.E. Kuster. (1981). Sensory transduction in an insect mechanoreceptor: extended bandwidth measurements and sensitivity to stimulus strength. Biological Cybernetics 42:87-94.
French, A.S. and V.Z. Marmarelis. (1995). Nonlinear neuronal mode analysis of action potential encoding in the cockroach tactile spine neuron. Biological Cybernetics 73:425-430.
French, A.S. and V.Z. Marmarelis. (1999). Nonlinear analysis of neuronal systems. In: Modern Techniques in Neuroscience Research, U. Windhorst & H. Johansson (Eds.), Springer-Verlag, New York.
French, A.S. and S.K. Patrick. (1994). Testing a nonlinear model of sensory adaptation with a range of step input functions. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 129-138.
French, A.S. and R.K.S. Wong. (1977). Nonlinear analysis of sensory transduction in an insect mechanoreceptor. Biological Cybernetics 26:231-240.
French, A.S., A.E.C. Pece, and M.J. Korenberg. (1989). Nonlinear models of transduction and adaptation in locust photoreceptors. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 81-96.
Funahashi, K.-I. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks 2:183-192.
Fuortes, M.G. and A.L. Hodgkin. (1964). Changes in time scale and sensitivity in the ommatidia of Limulus. Journal of Physiology 172:239-263.
Gallman, P.G. (1975). An iterative method for identification of nonlinear systems using a Uryson model. IEEE Transactions on Automatic Control 20:771-775.
Garde, S., M.G. Regalado, V.L. Schechtman, and M.C.K. Khoo. (2001). Nonlinear dynamics of heart rate variability in cocaine-exposed neonates during sleep. American Journal of Physiology-Heart and Circulatory Physiology 280:H2920-H2928.
Gemperlein, R. and G.D. McCann. (1975). A study of the response properties of retinula cells of flies using nonlinear identification theory. Biological Cybernetics 19:147-158.
George, D.A. (1959). Continuous nonlinear systems. Technical Report No. 355, Research Laboratory of Electronics, M.I.T., Cambridge, MA.
Ghazanshahi, S.D. and M.C.K. Khoo. (1997). Estimation of chemoreflex loop gain using pseudorandom binary CO2 stimulation. IEEE Trans. on Biomedical Engineering 44:357-366.
Ghazanshahi, S.D., V.Z. Marmarelis, and S.M. Yamashiro. (1986). Analysis of the gas exchange system dynamics during high-frequency ventilation. Annals of Biomedical Engineering 14:525-542.
Ghazanshahi, S.D., S.M. Yamashiro, and V.Z. Marmarelis. (1987). Use of random forcing for high frequency ventilation. Journal of Applied Physiology 62:1201-1205.
Gholmieh, G., S.H. Courellis, S. Fakheri, E. Cheung, V.Z. Marmarelis, M. Baudry, and T.W. Berger. (2003). Detection and classification of neurotoxins using a novel short-term plasticity quantification method. Biosensors & Bioelectronics 18:1467-1478.
Gholmieh, G., S.H. Courellis, V.Z. Marmarelis, and T.W. Berger. (2002). An efficient method for studying short-term plasticity with random impulse train stimuli. Journal of Neuroscience Methods 21:111-127.
Gholmieh, G., S.H. Courellis, D. Song, Z. Wang, V.Z. Marmarelis, and T.W. Berger. (2003). Characterization of short-term plasticity of the dentate gyrus-CA3 system using nonlinear systems analysis. In: Proceedings of the IEEE EMBS Conference, Cancun, Mexico, pp. 1929-1932.
Gholmieh, G., W. Soussou, S.H. Courellis, V.Z. Marmarelis, T.W. Berger, and M. Baudry. (2001). A biosensor for detecting changes in cognitive processing based on nonlinear systems analysis. Biosensors and Bioelectronics 16:491-501.
Gilbert, E.G. (1977). Functional expansions for the response of nonlinear differential systems. IEEE Transactions on Automatic Control 22:909-921.
Godfrey, K.R. and P.A.N. Briggs. (1972). Identification of processes with direction-dependent dynamic responses. Proceedings IEE 119:1733-1739.
Godfrey, K.R. and W. Murgatroyd. (1965). Input-transducer errors in binary cross-correlation experiments. Proceedings IEE 112:565-573.
Golomb, S.W. (1967). Shift Register Sequences. Holden-Day, San Francisco.
Goodwin, G.C. and R.L. Payne. (1977). Dynamic System Identification: Experiment Design and Data Analysis. Academic Press, New York.
Goodwin, G.C. and K.S. Sin. (1984). Adaptive Filtering, Prediction and Control. Prentice-Hall, Englewood Cliffs, NJ.
Goussard, Y. (1987). Wiener kernel estimation: A comparison of cross-correlation and stochastic approximation methods. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 289-302.
Goussard, Y., W.C. Krenz, L. Stark, and G. Demoment. (1991). Practical identification of functional expansions of nonlinear systems submitted to non-Gaussian inputs. Annals of Biomedical Engineering 19:401-427.
Grossberg, S. (1988). Nonlinear neural networks: principles, mechanisms, and architectures. Neural Networks 1:17-61.
Grzywacz, N.M. and P. Hillman. (1985). Statistical test of linearity of photoreceptor transduction process: Limulus passes, others fail. Proc. Natl. Acad. Sci. USA 82:232-235.
Grzywacz, N.M. and P. Hillman. (1988). Biophysical evidence that light adaptation in Limulus photoreceptors is due to a negative feedback. Biophysical Journal 53:337-348.
Guttman, R. and L. Feldman. (1975). White noise measurement of squid axon membrane impedance. Biochemical and Biophysical Research Communications 67:427-432.
Guttman, R., L. Feldman, and H. Lecar. (1974). Squid axon membrane response to white noise stimulation. Biophysical Journal 14:941-955.
Guttman, R., R. Grisell, and L. Feldman. (1977). Strength-frequency relationship for white-noise stimulation of squid axons. Math. Biosci. 33:335-343.
Gyftopoulos, E.P. and R.J. Hooper. (1964). Signals for transfer function measurement in nonlinear systems. In: Noise Analysis in Nuclear Systems, USAEC Symposium Series 4, TID-7679, Boston.
Haber, R. (1989). Structural identification of quadratic block-oriented models based on estimated Volterra kernels. Int. J. Syst. Science 20:1355-1380.
Haber, R. and L. Keviczky. (1976). Identification of nonlinear dynamic systems. In: IFAC Symposium on Identification & System Parameter Estimation, Tbilisi, Georgia, pp. 62-112.
Haber, R. and H. Unbehauen. (1990). Structural identification of nonlinear dynamic systems: A survey of input/output approaches. Automatica 26:651-677.
Haist, N.D., F.H.I. Chang, and R. Luus. (1973). Nonlinear identification in the presence of correlated noise using a Hammerstein model. IEEE Transactions on Automatic Control 18:552-555.
Hassoun, M.H. (1995). Fundamentals of Artificial Neural Networks. MIT Press, Cambridge, MA.
Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan, New York.
Hida, E., K.-I. Naka, and K. Yokoyama. (1983). A new photographic method for mapping spatiotemporal receptive field using television snow stimulation. Journal of Neuroscience Methods 8:225.
Hodgkin, A.L. and A.F. Huxley. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology 117:500-544.
Holstein-Rathlou, N.-H., K.H. Chon, D.J. Marsh, and V.Z. Marmarelis. (1995). Models of renal blood flow autoregulation. In: Modeling the Dynamics of Biological Systems, E. Mosekilde and O.G. Mouritsen (Eds.), Springer-Verlag, Berlin, pp. 167-185.
Hooper, R.J. and E.P. Gyftopoulos. (1967). On the measurement of characteristic kernels of a class of nonlinear systems. In: Neutron Noise, Waves and Pulse Propagation, USAEC Conference Report No. 660206, Boston.
Hornik, K., M. Stinchcombe, and H. White. (1989). Multilayer feedforward networks are universal approximators. Neural Networks 2:359-366.
Hsieh, H.C. (1964). The least squares estimation of linear and nonlinear system weighting function of matrices. Information and Control 7:84-115.
Huang, S.T. and S. Cambanis. (1979). On the representation of nonlinear systems with Gaussian input. Stochastic Process. 2:173.
Hung, G., D.R. Brillinger, and L. Stark. (1979). Interpretation of kernels II. Mathematical Biosciences 47:159-187.
Hung, G., L. Stark, and P. Eykhoff. (1977). On the interpretation of kernels: I. Computer simulation of responses to impulse-pairs. Annals of Biomedical Engineering 5:130-143.
Hung, G. and L. Stark. (1977). The kernel identification method (1910-1977): Review of theory, calculation, application and interpretation. Mathematical Biosciences 37:135-190.
Hung, G. and L. Stark. (1991). The interpretation of kernels: An overview. Mathematical Biosciences 19:509-519.
Hunter, I.W. and M.J. Korenberg. (1986). The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55:135-144.
Hunter, I.W. and R.E. Kearney. (1987). Quasi-linear, time-varying, and nonlinear approaches to the identification of muscle and joint mechanics. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 128-147.
Iatrou, M., T.W. Berger, and V.Z. Marmarelis. (1999a). Modeling of nonlinear nonstationary dynamic systems with a novel class of artificial neural networks. IEEE Transactions on Neural Networks 10:327-339.
Iatrou, M., T.W. Berger, and V.Z. Marmarelis. (1999b). Application of a novel modeling method to the nonstationary properties of potentiation in the rabbit hippocampus. Annals of Biomedical Engineering 27:581-591.
Jacobson, L.D., J.P. Gaska, H.-W. Chen, and D.A. Pollen. (1993). Structural testing of multi-input linear-nonlinear cascade models for cells in macaque striate cortex. Vision Research 33:609-626.
James, A.C. (1992). Nonlinear operator network models of processing in fly lamina. In: Nonlinear Vision, R.B. Pinter and B. Nabet (Eds.), CRC Press, Boca Raton, FL, Chapter 2, pp. 40-73.
Juusola, M. and A.S. French. (1995). Transduction and adaptation in spider slit sense organ mechanoreceptors. Journal of Neurophysiology 74:2513-2523.
Kalaba, R.E. and K. Springarn. (1982). Identification, Control, and Input Optimization. Plenum Press, New York.
Kalaba, R.E. and L. Tesfatsion. (1990). Flexible least squares for approximately linear systems. IEEE Trans. Syst. Man & Cyber. 20:978-989.
Kearney, R.E. and I.W. Hunter. (1986). Evaluation of a technique for the identification of time-varying systems using experimental and simulated data. Digest 12th CMBEC 12:75-76.
Kearney, R.E. and I.W. Hunter. (1990). System identification of human joint dynamics. CRC Critical Reviews of Biomedical Engineering 18:55-87.
Khoo, M.C.K. (Ed.). (1989). Modeling and Parameter Estimation in Respiratory Control. Plenum, New York.
Khoo, M.C.K. (Ed.). (1996). Bioengineering Approaches to Pulmonary Physiology and Medicine. Plenum, New York.
Khoo, M.C.K. (2000). Physiological Control Systems: Analysis, Simulation, and Estimation. IEEE Press, New York.
Khoo, M.C.K. and V.Z. Marmarelis. (1989). Estimation of chemoreflex gain from spontaneous sigh responses. Annals of Biomedical Engineering 17:557-570.
Klein, S. and S. Yasui. (1979). Nonlinear systems analysis with non-Gaussian white stimuli: General basis functionals and kernels. IEEE Trans. Info. Theo. 25:495-500.
Klein, S.A. (1987). Relationships between kernels measured with different stimuli. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 278-288.
Kolmogorov, A.N. (1957). On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR 114:953-956; AMS Transl. 2:55-59 (1963).
Korenberg, M.J. (1973a). Identification of biological cascades of linear and static nonlinear systems. Proceedings 16th Midwest Symposium Circuit Theory 18.2:1-9.
Korenberg, M.J. (1973b). Identification of nonlinear differential systems. In: Proceedings Joint Automatic Control Conference, San Francisco, pp. 597-603.
Korenberg, M.J. (1973c). New methods in the frequency analysis of linear time-varying differential equations. In: Proc. of IEEE International Symposium on Circuit Theory, pp. 185-188.
Korenberg, M.J. (1973d). Crosscorrelation analysis of neural cascades. In: Proc. 10th Annual Rocky Mountain Bioengineering Symposium, Denver, pp. 47-52.
Korenberg, M.J. (1982). Statistical identification of parallel cascades of linear and nonlinear systems. In: IFAC Symposium on Identification and System Parameter Estimation, Arlington, VA, pp. 580-585.
Korenberg, M.J. (1983). Statistical identification of difference equation representations for nonlinear systems. Electronics Letters 19:175-176.
Korenberg, M.J. (1984). Statistical identification of Volterra kernels of high order systems. In: ICAS'84, pp. 570-575.
Korenberg, M.J. (1987). Functional expansions, parallel cascades and nonlinear difference equations. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 221-240.
Korenberg, M.J. (1988). Identifying nonlinear difference equation and functional expansion representations: The fast orthogonal algorithm. Annals of Biomedical Engineering 16:123-142.
Korenberg, M.J. (1989a). A robust orthogonal algorithm for system identification and time series analysis. Biol. Cybern. 60:267-276.
Korenberg, M.J. (1989b). Fast orthogonal algorithms for nonlinear system identification and time-series analysis. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 165-178.
Korenberg, M.J. (1991). Parallel cascade identification and kernel estimation for nonlinear systems. Annals of Biomedical Engineering 19:429-455.
Korenberg, M.J. and I.W. Hunter. (1986). The identification of nonlinear biological systems: LNL cascade models. Biological Cybernetics 55:125-134.
Korenberg, M.J. and I.W. Hunter. (1990). The identification of nonlinear systems: Wiener kernel approaches. Ann. Biomed. Eng. 18:629-654.
Korenberg, M.J., S.B. Bruder, and P.J. McIlroy. (1988). Exact orthogonal kernel estimation from finite data records: Extending Wiener's identification of nonlinear systems. Annals of Biomedical Engineering 16:201-214.
Krausz, H.I. (1975). Identification of nonlinear systems using random impulse train inputs. Biological Cybernetics 19:217-230.
Krausz, H.I. and W.G. Friesen. (1977). The analysis of nonlinear synaptic transmission. Journal of General Physiology 70:243.
Krausz, H.I. and K.-I. Naka. (1980). Spatio-temporal testing and modeling of catfish retinal neurons. Biophysical Journal 29:13-36.
Krenz, W. and L. Stark. (1987). Interpretation of kernels of functional expansions. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 241-257.
Krenz, W. and L. Stark. (1991). Interpretation of functional series expansions. Ann. Biomed. Eng. 19:485-509.
Krieger, D., T.W. Berger, and R.J. Sclabassi. (1992). Instantaneous characterization of time-varying nonlinear systems. IEEE Trans. Biomedical Engineering 39:420-424.
Kroeker, J.P. (1977). Wiener analysis of nonlinear systems using Poisson-Charlier cross-correlation. Biological Cybernetics 27:221-227.
Kroeker, J.P. (1979). Synaptic facilitation in Aplysia explored by random presynaptic stimulation. Journal of General Physiology 73:747.
Ku, Y.H. and A.A. Wolf. (1966). Volterra-Wiener functionals for the analysis of nonlinear systems. Journal of the Franklin Institute 3:9-26.
Landau, M. and C.T. Leondes. (1975). Volterra series synthesis of nonlinear stochastic tracking systems. IEEE Trans. Aerospace and Electronic Systems 10:245-265.
Lasater, E.M. (1982a). A white-noise analysis of responses and receptive fields of catfish cones. Journal of Neurophysiology 47:1057.
Lasater, E.M. (1982b). Spatial receptive fields of catfish retinal ganglion cells. Journal of Neurophysiology 48:823.
Lee, Y.W. (1964). Contributions of Norbert Wiener to linear theory and nonlinear theory in engineering. In: Selected Papers of Norbert Wiener, SIAM, MIT Press, Cambridge, MA, pp. 17-33.
Lee, Y.W. and M. Schetzen. (1965). Measurement of the Wiener kernels of a nonlinear system by cross-correlation. International Journal of Control 2:237-254.
Leontaritis, I.J. and S.A. Billings. (1985). Input-output parametric models for nonlinear systems. Part I: Deterministic nonlinear systems, Int. J. Control 41:303-327; Part II, pp. 328-344.
Lewis, E.R. and K.R. Henry. (1995). Nonlinear effects of noise on phase-locked cochlear-nerve responses to sinusoidal stimuli. Hearing Research 92:1-16.
Lewis, E.R., K.R. Henry, and W.M. Yamada. (2000). Essential roles of noise in neural coding and in studies of neural coding. Biosystems 58:109-115.
Lewis, E.R., K.R. Henry, and W.M. Yamada. (2002a). Tuning and timing of excitation and inhibition in primary auditory nerve fibers. Hearing Research 171:13-31.
Lewis, E.R., K.R. Henry, and W.M. Yamada. (2002b). Tuning and timing in the gerbil ear: Wiener kernel analysis. Hearing Research 174:206-221.
Lewis, E.R. and P. van Dijk. (2003). New variations on the derivation of spectro-temporal receptive fields for primary auditory afferent axons. Hearing Research 186:30-46.
Lipson, E.D. (1975a). White noise analysis of Phycomyces light growth response system. I. Normal intensity range. Biophysical Journal 15:989-1012.
Lipson, E.D. (1975b). White noise analysis of Phycomyces light growth response system. II. Extended intensity ranges. Biophysical Journal 15:1013-1031.
Lipson, E.D. (1975c). White noise analysis of Phycomyces light growth response system. III. Photomutants. Biophysical Journal 15:1033-1045.
Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, NJ.
Ljung, L. and T. Glad. (1994). Modeling of Dynamic Systems. Prentice-Hall, Englewood Cliffs, NJ.
Ljung, L. and T. Soderstrom. (1983). Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA.
Marmarelis, P.Z. (1972). Nonlinear identification of bioneuronal systems through white-noise stimulation. In: Thirteenth Joint Automatic Control Conference, Stanford University, Stanford, CA, pp. 117-126.
Marmarelis, P.Z. (1975). The noise about white-noise: Pros and cons. In: Proceedings 1st Symposium on Testing and Identification of Nonlinear Systems, California Institute of Technology, Pasadena, CA, pp. 56-75.
Marmarelis, P.Z. and V.Z. Marmarelis. (1978). Analysis of Physiological Systems: The White-Noise Approach. Plenum, New York. (Russian translation: Mir Press, Moscow, 1981. Chinese translation: Academy of Sciences Press, Beijing, 1990.)
Marmarelis, P.Z. and G.D. McCann. (1973). Development and application of white-noise modeling techniques for studies of insect visual nervous systems. Kybernetik 12:74-89.
Marmarelis, P.Z. and G.D. McCann. (1975). Errors involved in the practical estimation of nonlinear system kernels. In: Proceedings 1st Symposium on Testing and Identification of Nonlinear Systems, California Institute of Technology, Pasadena, CA, pp. 147-173.
Marmarelis, P.Z. and K.-I. Naka. (1972). White noise analysis of a neuron chain: An application of the Wiener theory. Science 175:1276-1278.
Marmarelis, P.Z. and K.-I. Naka. (1973a). Nonlinear analysis and synthesis of receptive-field responses in the catfish retina. I. Horizontal cell to ganglion cell chain. Journal of Neurophysiology 36:605-618.
Marmarelis, P.Z. and K.-I. Naka. (1973b). Nonlinear analysis and synthesis of receptive-field responses in the catfish retina. II. One-input white-noise analysis. Journal of Neurophysiology 36:619-633.
Marmarelis, P.Z. and K.-I. Naka. (1973c). Nonlinear analysis and synthesis of receptive-field responses in the catfish retina. III. Two-input white-noise analysis. Journal of Neurophysiology 36:634-648.
Marmarelis, P.Z. and K.-I. Naka. (1973d). Mimetic model of retinal network in catfish. In: Conference Proceedings on Regulation and Control in Physiological Systems, A.S. Iberall and A.C. Guyton (Eds.), Rochester, NY, pp. 159-162.
Marmarelis, P.Z. and K.-I. Naka. (1974a). Identification of multi-input biological systems. IEEE Trans. Biomedical Engineering 21:88-101.
Marmarelis, P.Z. and K.-I. Naka. (1974b). Experimental analysis of a neural system: Two modeling approaches. Kybernetik 15:11-26.
Marmarelis, P.Z. and F.E. Udwadia. (1976). The identification of building structural systems. Part II: The nonlinear case. Bulletin of the Seismological Society of America 66:153-171.
Marmarelis, V.Z. (1975). Identification of nonlinear systems through multi-level random signals. In: Proceedings 1st Symposium on Testing and Identification of Nonlinear Systems, Pasadena, CA, pp. 106-124.
Marmarelis, V.Z. (1976). Identification of Nonlinear Systems through Quasi-White Test Signals. Ph.D. Thesis, California Institute of Technology, Pasadena, CA.
Marmarelis, V.Z. (1977). A family of quasi-white random signals and its optimal use in biological system identification. Part I: Theory. Biological Cybernetics 27:49-56.
Marmarelis, V.Z. (1978a). Random vs. pseudorandom test signals in nonlinear system identification. IEEE Proceedings 125:425-428.
Marmarelis, V.Z. (1978b). The optimal use of random quasi-white signals in nonlinear system identification. Multidisciplinary Research 6:112-141.
Marmarelis, V.Z. (1979a). Error analysis and optimal estimation procedures in identification of nonlinear Volterra systems. Automatica 15:161-174.
Marmarelis, V.Z. (1979b). Methodology for nonstationary nonlinear analysis of the visual system. In: Proceedings U.S.-Japan Joint Symposium on Advanced Analytical Techniques Applied to the Visual System, Tokyo, Japan, pp. 235-244.
Marmarelis, V.Z. (1979c). Practical identification of the general time-variant nonlinear dynamic system. In: Proceedings International Conference on Cybernetics and Society, Denver, CO, pp. 727-733.
Marmarelis, V.Z. (1980a). Identification methodology for nonstationary nonlinear biological systems. In: Proceedings International Symposium on Circuits and Systems, Houston, TX, pp. 448-452.
Marmarelis, V.Z. (1980b). Identification of nonlinear systems by use of nonstationary white-noise inputs. Applied Mathematical Modeling 4:117-124.
Marmarelis, V.Z. (1980c). Identification of nonstationary nonlinear systems. In: 14th Asilomar Conference on Circuits, Systems and Computers, Pacific Grove, CA, pp. 402-406.
Marmarelis, V.Z. (1981a). Practicable identification of nonstationary nonlinear systems. IEE Proceedings, Part D 128:211-214.
Marmarelis, V.Z. (1981b). A single-record estimator for correlation functions of nonstationary random processes. Proceedings of the IEEE 69:841-842.
Marmarelis, V.Z. (1982). Non-parametric validation of parametric models. Mathematical Modelling 3:305-309.
Marmarelis, V.Z. (1983). Practical estimation of correlation functions of nonstationary Gaussian processes. IEEE Transactions on Information Theory 29:937-938.
Marmarelis, V.Z. (Ed.). (1987a). Advanced Methods of Physiological System Modeling, Volume I. Biomedical Simulations Resource, Los Angeles, California.
Marmarelis, V.Z. (1987b). Nonlinear and nonstationary modeling of physiological systems. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, California, pp. 1-24.
Marmarelis, V.Z. (1987c). Recent advances in nonlinear and nonstationary analysis. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, California, pp. 323-336.
Marmarelis, V.Z. (1987d). Visual system nonlinear modeling. In: Systems and Control Encyclopedia: Theory, Technology, Applications, M.G. Singh (Ed.), Pergamon Press, Oxford, pp. 5065-5070.
Marmarelis, V.Z. (1988a). Coherence and apparent transfer function measurements for nonlinear physiological systems. Annals of Biomedical Engineering 16:143-157.
Marmarelis, V.Z. (1988b). The role of nonlinear models in neurophysiological system analysis. In: 1st IFAC Symposium on Modeling and Control in Biomedical Systems, Venice, Italy, pp. 25-35.
Marmarelis, V.Z. (1989a). Identification and modeling of a class of nonlinear systems. Mathematical Computer Modelling 12:991-995.
Marmarelis, V.Z. (1989b). Linearized models of a class of nonlinear dynamic systems. Applied Mathematical Modelling 13:21-26.
Marmarelis, V.Z. (1989c). Signal transformation and coding in neural systems. IEEE Transactions on Biomedical Engineering 36:15-24.
Marmarelis, V.Z. (1989d). The role of nonlinear models in neurophysiological system analysis. In: Modelling and Control in Biomedical Systems, C. Cobelli and L. Mariani (Eds.), Pergamon Press, Oxford, pp. 39-50.
Marmarelis, V.Z. (Ed.). (1989e). Advanced Methods of Physiological System Modeling, Volume II. Plenum, New York.
Marmarelis, V.Z. (1989f). Volterra-Wiener analysis of a class of nonlinear feedback systems and application to sensory biosystems. In: Advanced Methods of Physiological System Modeling, Volume II, Plenum, New York, pp. 1-52.
Marmarelis, V.Z. (1991). Wiener analysis of nonlinear feedback in sensory systems. Annals of Biomedical Engineering 19:345-382.
Marmarelis, V.Z. (1993). Identification of nonlinear biological systems using Laguerre expansions of kernels. Annals of Biomedical Engineering 21:573-589.
Marmarelis, V.Z. (1994a). Nonlinear modeling of physiological systems using principal dynamic modes. In: Advanced Methods of Physiological System Modeling, Volume III, Plenum, New York, pp. 1-28.
Marmarelis, V.Z. (1994b). On kernel estimation using non-Gaussian and/or non-white input data. In: Advanced Methods of Physiological System Modeling, Volume III, Plenum, New York, pp. 229-242.
Marmarelis, V.Z. (1994c). Three conjectures on neural network implementation of Volterra models (mappings). In: Advanced Methods of Physiological System Modeling, Volume III, Plenum, New York, pp. 261-268.
Marmarelis, V.Z. (Ed.). (1994d). Advanced Methods of Physiological System Modeling, Volume III. Plenum, New York.
Marmarelis, V.Z. (1995). Methods and tools for identification of physiological systems. In: Handbook of Biomedical Engineering, J.D. Bronzino (Ed.), CRC Press, Boca Raton, FL, pp. 2422-2436.
Marmarelis, V.Z. (1997). Modeling methodology for nonlinear physiological systems. Annals of Biomed. Eng. 25:239-251.
Marmarelis, V.Z. (2000). Methods and tools for identification of physiological systems. In: The Biomedical Engineering Handbook, 2nd Ed., Volume 2, J.D. Bronzino (Ed.), Chapter 163, CRC Press, Boca Raton, FL, pp. 163.1-163.15.
Marmarelis, V.Z. and N. Herman. (1988). LYSIS: An interactive software system for nonlinear modeling and simulation. In: 1988 SCS Multiconference: Modeling and Simulation on Microcomputers, San Diego, CA, pp. 6-10.
Marmarelis, V.Z. and G.D. McCann. (1975). Optimization of test parameters for identification of spike-train responses of biological systems through random test signals. In: Proceedings 1st Symposium on Testing and Identification of Nonlinear Systems, Pasadena, CA, pp. 325-338.
Marmarelis, V.Z. and G.D. McCann. (1977). A family of quasi-white random signals and its optimal use in biological system identification. Part II: Application to the photoreceptor of Calliphora erythrocephala. Biological Cybernetics 27:57-62.
Marmarelis, V.Z. and G.D. Mitsis. (2000). Nonparametric modeling of the glucose-insulin system. In: Annual Conference Biomedical Engineering Society, Seattle, WA.
Marmarelis, V.Z. and M.E. Orme. (1993). Modeling of neural systems by use of neuronal modes. IEEE Transactions on Biomedical Engineering 40:1149-1158.
Marmarelis, V.Z. and A.D. Sams. (1982). Evaluation of Volterra kernels from Wiener kernel measurements. In: 15th Annual Hawaii International Conference on System Sciences, Honolulu, HI, pp. 322-326.
Marmarelis, V.Z. and S.M. Yamashiro. (1982). Nonparametric modeling of respiratory mechanics and gas exchange. In: Proceedings 6th IFAC Symposium on Identification and System Parameter Estimation, Arlington, VA, pp. 586-591.
Marmarelis, V.Z. and X. Zhao. (1994). On the relation between Volterra models and feedforward artificial neural networks. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 243-260.
Marmarelis, V.Z. and X. Zhao. (1997). Volterra models and three-layer perceptrons. IEEE Transactions on Neural Networks 8:1421-1433.
Marmarelis, V.Z., M.C. Citron, and C.P. Vivo. (1986). Minimum-order Wiener modeling of spike output system. Biological Cybernetics 54:115-123.
Marmarelis, V.Z., M. Juusola, and A.S. French. (1999a). Principal dynamic mode analysis of nonlinear transduction in a spider mechanoreceptor. Annals of Biomedical Engineering 27:391-402.
Marmarelis, V.Z., K.H. Chon, N.H. Holstein-Rathlou, and D.J. Marsh. (1999b). Nonlinear analysis of renal autoregulation in rats using principal dynamic modes. Annals of Biomedical Engineering 27:23-31.
Marmarelis, V.Z., G.D. Mitsis, K. Huecking, and R.N. Bergman. (2002). Nonparametric modeling of the insulin-glucose dynamic relationships in dogs. In: Proceedings of 2nd Joint IEEE/EMBS Conference, Houston, TX, pp. 224-225.
Marmarelis, V.Z., K.H. Chon, Y.M. Chen, D.J. Marsh, and N.H. Holstein-Rathlou. (1993). Nonlinear analysis of renal autoregulation under broadband forcing conditions. Annals of Biomedical Engineering 21:591-603.
Marmarelis, V.Z., S.F. Masri, F.E. Udwadia, T.K. Caughey, and G.D. Jeong. (1979). Analytical and experimental studies of the modeling of a class of nonlinear systems. Nuclear Engineering and Design 55:59-68.
Marsh, D.J., J.L. Osborn, and W.J. Cowley. (1990). 1/f fluctuations in arterial pressure and regulation of renal blood flow in dogs. American Journal of Physiology 258:F1394-F1400.
McCann, G.D. (1974). Nonlinear identification theory models for successive stages of visual nervous systems in flies. Journal of Neurophysiology 37:869-895.
McCann, G.D. and J.C. Dill. (1969). Fundamental properties of intensity, form and motion perception in the visual nervous systems of Calliphora phaenicia and Musca domestica. Journal of General Physiology 53:385-413.
McCann, G.D. and P.Z. Marmarelis (Eds.). (1975). Proceedings of the First Symposium on Testing and Identification of Nonlinear Systems, California Institute of Technology, Pasadena, CA.
McCann, G.D., R.D. Fargason, and V.T. Shantz. (1977). The response properties of retinula cells in the fly Calliphora erythrocephala as a function of the wave-length and polarization properties of visible and ultraviolet light. Biological Cybernetics 26:93-107.
McCulloch, W.S. and W. Pitts. (1943). A logical calculus of ideas immanent in nervous activity. Bull. Math. Biophys. 5:115-133.
Mitsis, G.D. and V.Z. Marmarelis. (2002). Modeling of nonlinear physiological systems with fast and slow dynamics. I. Methodology. Annals of Biomedical Engineering 30:272-281.
Mitsis, G.D., R. Zhang, B.D. Levine, and V.Z. Marmarelis. (2002). Modeling of nonlinear systems with fast and slow dynamics. II. Application to cerebral autoregulation in humans. Annals of Biomedical Engineering 30:555-565.
Mitsis, G.D., P.N. Ainslie, M.J. Poulin, P.A. Robbins, and V.Z. Marmarelis. (2003a). Nonlinear modeling of the dynamic effects of arterial pressure and blood gas variations on cerebral blood flow in healthy humans. In: The IXth Oxford Conference on Modeling and Control of Breathing, Paris, France.
Mitsis, G.D., S. Courellis, A.S. French, and V.Z. Marmarelis. (2003b). Principal dynamic mode analysis of spider mechanoreceptor action potentials. In: Proceedings 25th Anniversary Conference of the IEEE EMBS, Cancun, Mexico, pp. 2051-2054.
Mitsis, G.D., A. Mahalingam, Z. Zhang, B.D. Levine, and V.Z. Marmarelis. (2003c). Nonlinear analysis of dynamic cerebral autoregulation in humans under orthostatic stress. In: Proceedings 25th Anniversary Conference of the IEEE EMBS, Cancun, Mexico, pp. 398-401.
Moller, A.R. (1973). Statistical evaluation of the dynamic properties of cochlear nucleus units using stimuli modulated with pseudorandom noise. Brain Research 57:443-456.
Moller, A.R. (1975). Dynamic properties of excitation and inhibition in the cochlear nucleus. Acta Physiologica Scandinavica 93:442-454.
Moller, A.R. (1976). Dynamic properties of the responses of single neurones in the cochlear nucleus of the rat. Journal of Physiology 259:63-82.
Moller, A.R. (1977). Frequency selectivity of single auditory-nerve fibers in response to broadband noise stimuli. Journal of the Acoustical Society of America 62:135-142.
Moller, A.R. (1978). Responses of auditory nerve fibers to noise stimuli show cochlear nonlinearities. Acta Oto-Laryngol. (Stockholm) 86:1-8.
Moller, A.R. (1983). Frequency selectivity of phase-locking of complex sounds in the auditory nerve of the rat. Hearing Research 11:267-284.
Moller, A.R. (1987). Analysis of the auditory system using pseudorandom noise. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 60-62.
Moller, A.R. (1989). Volterra-Wiener analysis from the whole-nerve responses of the exposed auditory nerve in man to pseudorandom noise. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 63-80.
Moller, A.R. and A. Rees. (1986). Dynamic properties of the responses of single neurons in the inferior colliculus of the rat. Hearing Research 24:203-215.
Moore, G.P. and R.A. Auriemma. (1985). Testing a gamma-activated multiple spike-generator hypothesis for the Ia afferent. In: The Muscle Spindle, I.A. Boyd & M.R. Gladden (Eds.), Stockton Press, New York, pp. 391-395.
Moore, G.P. and R.A. Auriemma. (1985). Production of muscle stretch receptor behavior using Wiener kernels. Brain Research 331:185-189.
Moore, G.P., D.R. Perkel, and J.P. Segundo. (1966). Statistical analysis and functional interpretation of neuronal spike data. Annual Review of Physiology 28:493-522.
Moore, G.P., D.G. Stuart, E.K. Stauffer, and R. Reinking. (1975). White-noise analysis of mammalian muscle receptors. In: Proceedings 1st Symposium on Testing and Identification of Nonlinear Systems, G.D. McCann and P.Z. Marmarelis (Eds.), California Institute of Technology, Pasadena, CA, pp. 316-324.
Naka, K.-I. (1971). Receptive field mechanism in the vertebrate retina. Science 171:691-693.
Naka, K.-I. (1976). Neuronal circuitry in the catfish retina. Invest. Ophthalmol. 15:926.
Naka, K.-I. (1977). Functional organization of the catfish retina. Journal of Neurophysiology 40:26.
Naka, K.-I. (1982). The cells horizontal cells talk to. Vision Res. 22:653.
Naka, K.-I. and V. Bhanot. (2002). White-noise analysis in retinal physiology. New York Series, Part III.
Naka, K.-I. and T. Ohtsuka. (1975). Morphological and functional identifications of catfish retinal neurons. II. Morphological identification. Journal of Neurophysiology 38:72-91.
Naka, K.I., M. Itoh, and N. Ishii. (1987). White-noise analysis in visual physiology. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 49-59.
Naka, K.-I., G.W. Davis, and R.Y. Chan. (1979). Receptive-field organization in catfish retina. Sensory Proc. 2:366.
Naka, K.-I., H.M. Sakai, and N. Ishii. (1988). Generation and transformation of second-order nonlinearity in catfish retina. Annals of Biomedical Engineering 16:53-64.
Naka, K.-I., P.Z. Marmarelis, and R.Y. Chan. (1975). Morphological and functional identifications of catfish retinal neurons. III. Functional identifications. Journal of Neurophysiology 38:92-131.
Naka, K.-I., R.L. Chappell, and M. Sakuranaga. (1982). Wiener analysis of turtle horizontal cells. Biomed. Res. 3(Suppl.):131.
Naka, K.-I., R.Y. Chan, and S. Yasui. (1979). Adaptation in catfish retina. Journal of Neurophysiology 42:441-454.
Narayanan, S. (1967). Transistor distortion analysis using Volterra series representation. The Bell System Technical Journal 46:991-1024.
Narayanan, S. (1970). Application of Volterra series to intermodulation distortion analysis of a transistor feedback amplifier. IEEE Transactions on Circuit Theory 17:518-527.
Narendra, K.S. and P.G. Gallman. (1966). An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Transactions on Automatic Control 11:546-550.
Neis, V.S. and J.L. Sackman. (1967). An experimental study of a non-linear material with memory. Trans. Soc. Rheology 11:307-333.
Ni, T.-C., M. Ader, and R.N. Bergman. (1997). Reassessment of glucose effectiveness and insulin sensitivity from minimal model analysis: A theoretical evaluation of the single-compartment glucose distribution assumption. Diabetes 46:1813-1821.
Nikias, C.L. and A.P. Petropulu. (1993). Higher-Order Spectra Analysis. Prentice-Hall, Englewood Cliffs, NJ.
Ogura, H. (1972). Orthogonal functionals of the Poisson process. IEEE Transactions on Information Theory 18:473-481.
Ogura, H. (1985). Estimation of Wiener kernels of a nonlinear system and a fast algorithm using digital Laguerre filters. In: Proceedings 15th NIBB Conference on Information Processing in Neuron Network, Okazaki, Japan, pp. 14-62.
O'Leary, D.P. and V. Honrubia. (1975). On-line identification of sensory systems using pseudorandom binary noise perturbations. Biophysical Journal 15:505-532.
O'Leary, D.P., R. Dunn, and V. Honrubia. (1974). Functional and anatomical correlation of afferent responses from the isolated semicircular canal. Nature 251:225-227.
O'Leary, D.P., R.F. Dunn, and V. Honrubia. (1976). Analysis of afferent responses from isolated semicircular canal of the guitarfish using rotational acceleration white-noise inputs. Part I: Correlation of response dynamics with receptor innervation. Journal of Neurophysiology 39:631-647.
Palm, G. (1979). On representation and approximation of nonlinear systems. Part II: Discrete time. Biological Cybernetics 34:49-52.
Palm, G. and T. Poggio. (1977a). Wiener-like system identification in physiology. J. Math. Biol. 4:375-381.
Palm, G. and T. Poggio. (1977b). The Volterra representation and the Wiener expansion: validity and pitfalls. SIAM Journal of Applied Mathematics 33:195-216.
Palm, G. and T. Poggio. (1978). Stochastic identification methods for nonlinear systems: an extension of the Wiener theory. SIAM Journal of Applied Mathematics 34:524-534.
Palmer, L.A., A. Gottschalk, and J.P. Jones. (1987). Constraints on the estimation of spatial receptive field profiles of simple cells in visual cortex. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 205-220.
Panerai, R.B., S.L. Dawson, and J.F. Potter. (1999). Linear and nonlinear analysis of human dynamic cerebral autoregulation. American Journal of Physiology 277:H1089-H1099.
Panerai, R.B., D.M. Simpson, S.T. Deverson, P. Mahony, P. Hayes, and D.H. Evans. (2000). Multivariate dynamic analysis of cerebral blood flow regulation in humans. IEEE Transactions on Biomedical Engineering 47:419-421.
Papazoglou, T.G., T. Papaioannou, K. Arakawa, M. Fishbein, V.Z. Marmarelis, and W.S. Grundfest. (1990). Control of excimer laser aided tissue ablation via laser-induced fluorescence monitoring. Applied Optics 29:4950-4955.
Patwardhan, A.R., S. Vallurupali, J.M. Evans, E.N. Bruce, and C.F. Knap. (1995). Override of spontaneous respiratory pattern generator reduces cardiovascular parasympathetic influence. Journal of Applied Physiology 79:1048-1054.
Pinter, R.B. (1983). The electrophysiological bases for linear and for non-linear product-term lateral inhibition and the consequences of wide-field textured stimuli. J. Theor. Biol. 105:233-243.
Pinter, R.B. (1984). Adaptation of receptive field spatial organization via multiplicative lateral inhibition. J. Theor. Biol. 110:424-444.
Pinter, R.B. (1985). Adaptation of spatial modulation transfer functions via nonlinear lateral inhibition. Biol. Cybern. 51:285-291.
Pinter, R.B. (1987). Kernel synthesis from nonlinear multiplicative lateral inhibition. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 258-277.
Pinter, R.B. and B. Nabet (Eds.). (1992). Nonlinear Vision: Determination of Neural Receptive Fields, Function, and Networks. CRC Press, Boca Raton, FL.
Poggio, T. and V. Torre. (1977). Volterra representation for some neuron models. Biological Cybernetics 27:1113-1124.
Poggio, T. and W. Reichardt. (1973). Considerations on models of movement detection. Kybernetik 13:223-227.
Poggio, T. and W. Reichardt. (1976). Visual control of orientation behaviour in the fly. Part II: Towards the underlying neural interactions. Quarterly Reviews of Biophysics 9:377-448. [For Part I see Reichardt and Poggio (1976).]
Poulin, M.J., P.-J. Liang, and P.A. Robbins. (1996). Dynamics of the cerebral blood flow response to step changes in end-tidal PCO2 and PO2 in humans. Journal of Applied Physiology 81:1084-1095.
Poulin, M.J., P.-J. Liang, and P.A. Robbins. (1998). Fast and slow components of cerebral blood flow response to step decreases in end-tidal CO2 in humans. Journal of Applied Physiology 85:388-397.
Powers, R.L. and D.W. Arnett. (1981). Spatio-temporal cross-correlation analysis of catfish retinal neurons. Biological Cybernetics 41:179.
Price, R.A. (1958). A useful theorem for nonlinear devices having Gaussian inputs. IRE Trans. Inform. Theory 4:69-72.
Ratliff, F., B.W. Knight, and N. Graham. (1969). On tuning and amplification by lateral inhibition. Proc. U.S. Nat. Acad. Sci. 62:733-740.
Ream, N. (1970). Nonlinear identification using inverse-repeat m-sequences. Proceedings IEEE 117:213-218.
Rebrin, K., G.M. Steil, W.P. van Antwerp, and J.J. Mastrototaro. (1991). Subcutaneous glucose predicts plasma glucose independent of insulin: implications for continuous monitoring. American Journal of Physiology 277:E561-E571.
Rebrin, K., G.M. Steil, L. Getty, and R.N. Bergman. (1995). Free fatty-acid as a link in the regulation of hepatic glucose output by peripheral insulin. Diabetes 44:1038-1045.
Recio, A., S.S. Narayan, and M.A. Ruggero. (1997). Wiener-kernel analysis of basilar-membrane responses to white noise. In: Diversity in Auditory Mechanics, E.R. Lewis, G.R. Long, R.F. Lyon, P.M. Narins, C.R. Steele, and E. Hecht-Poinar (Eds.), World Scientific Press, Singapore, pp. 325-331.
Reichardt, W. and T. Poggio. (1976). Visual control of orientation behaviour in the fly. Part I: A quantitative analysis. Quarterly Reviews of Biophysics 9:311-375. [For Part II see Poggio and Reichardt (1976).]
Rissanen, J. (1978). Modeling by shortest data description. Automatica 14:465-471.
Rissanen, J. (1996). Information theory and neural nets. In: Mathematical Perspectives on Neural Networks, P. Smolensky, M.C. Mozer, and D.E. Rumelhart (Eds.), Lawrence Erlbaum Associates, Mahwah, NJ, pp. 567-602.
Robinson, G.B., R.J. Sclabassi, and T.W. Berger. (1991). Kindling-induced potentiation of excitatory and inhibitory inputs to hippocampal dentate granule cells. I. Effects on linear and nonlinear response characteristics. Brain Research 562:17-25.
Robinson, G.B., S.J. Fluharty, M.J. Zigmond, R.J. Sclabassi, and T.W. Berger. (1993). Recovery of hippocampal dentate granule cell responsiveness to entorhinal cortical input following norepinephrine depletion. Brain Research 614:21-28.
Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, D.C.
Rosenblueth, A. and N. Wiener. (1945). The role of models in science. Philos. Sci. 12:316-321.
Rugh, W.J. (1981). Nonlinear System Theory: The Volterra/Wiener Approach. Johns Hopkins University Press, Baltimore.
Rumelhart, D.E. and J.L. McClelland (Eds.). (1986). Parallel Distributed Processing, Volumes I and II. MIT Press, Cambridge, MA.
Sachs, F. (1992). Stretch-sensitive ion channels: An update. Sensory Transduction 15:241-260.
Sakai, H.M. and K.-I. Naka. (1985). Novel pathway connecting the outer and inner vertebrate retina. Nature (London) 315:570.
Sakai, H.M. and K.-I. Naka. (1987a). Signal transmission in the catfish retina. IV. Transmission to ganglion cells. Journal of Neurophysiology 58:1307-1328.
Sakai, H.M. and K.-I. Naka. (1987b). Signal transmission in the catfish retina. V. Sensitivity and circuit. Journal of Neurophysiology 58:1329-1350.
Sakai, H.M. and K.-I. Naka. (1988a). Dissection of neuron network in the catfish inner retina. I. Transmission to ganglion cells. Journal of Neurophysiology 60:1549-1567.
Sakai, H.M. and K.-I. Naka. (1988b). Dissection of neuron network in the catfish inner retina. II. Interactions between ganglion cells. Journal of Neurophysiology 60:1568-1583.
Sakai, H.M. and K.-I. Naka. (1990). Dissection of neuron network in the catfish inner retina. V. Interactions between NA and NB amacrine cells. Journal of Neurophysiology 63:120-130.
Sakuranaga, M. and K.-I. Naka. (1985a). Signal transmission in catfish retina. I. Transmission in the outer retina. Journal of Neurophysiology 53:373-388.
Sakuranaga, M. and K.-I. Naka. (1985b). Signal transmission in catfish retina. II. Transmission to type-N cell. Journal of Neurophysiology 53:390-406.
Sakuranaga, M. and K.-I. Naka. (1985c). Signal transmission in catfish retina. III. Transmission to type-C cell. Journal of Neurophysiology 53:411-428.
Sakuranaga, M. and Y.-I. Ando. (1985). Visual sensitivity and Wiener kernels. Vision Research 25:509.
Saltzberg, B. and W.D. Burton Jr. (1989). Nonlinear filters for tracking chaos in neurobiological time series. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 201-214.
Sams, A.D. and V.Z. Marmarelis. (1988). Identification of linear periodically time-varying systems using white-noise test inputs. Automatica 24:563-567.
Sandberg, A. and L. Stark. (1968). Wiener G-function analysis as an approach to nonlinear characteristics of human pupil reflex. Brain Research 11:194-211.
Sandberg, I.W. (1982). Expansions for nonlinear systems. Bell Syst. Tech. J. 61:159-200.
Sandberg, I.W. (1983). The mathematical foundations of associated expansions for mildly nonlinear systems. IEEE Transactions on Circuits and Systems 30:441-445.
Saridis, G.N. (1974). Stochastic approximation methods for identification and control: A survey. IEEE Trans. Automatic Control 19:798-809.
Saul, J.P., R.D. Berger, P. Albrecht, S.P. Stein, M.H. Chen, and R.J. Cohen. (1991). Transfer function analysis of the circulation: unique insights into cardiovascular regulation. American Journal of Physiology 261:H1231-H1245.
Schetzen, M. (1965a). Measurement of the kernels of a nonlinear system of finite order. International Journal of Control 2:251-263.
Schetzen, M. (1965b). Synthesis of a class of nonlinear systems. International Journal of Control 1:401-414.
Schetzen, M. (1974). A theory of nonlinear system identification. International Journal of Control 20:557-592.
Schetzen, M. (1980). The Volterra and Wiener Theories of Nonlinear Systems. Wiley, New York.
Schetzen, M. (1981). Nonlinear system modeling based on the Wiener theory. Proc. IEEE 69:1557.
Schetzen, M. (1986). Differential models and modeling. Int. J. Contr. 44:157-179.
Sclabassi, R.J., C.L. Hinman, J.S. Kroin, and H. Risch. (1985). A nonlinear analysis of afferent modulatory activity in the cat somatosensory system. Electroenceph. Clin. Neurophysiol. 60:444-454.
Sclabassi, R.J., J.S. Kroin, C.L. Hinman, and H. Risch. (1986). The effect of cortical ablation on modulatory activity in the cat somatosensory system. Electroenceph. Clin. Neurophysiol. 64:31-40.
Sclabassi, R.J., D.N. Krieger, and T.W. Berger. (1987). Nonlinear systems analysis of the somatosensory system. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 104-127.
Sclabassi, R.J., D.N. Krieger, and T.W. Berger. (1988a). A systems theoretic approach to the study of CNS function. Annals of Biomedical Engineering 16:17-34.
Sclabassi, R.J., D.N. Krieger, I. Solomon, J. Samosky, S. Levitan, and T.W. Berger. (1989). Theoretical decomposition of neuronal networks. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 129-146.
Sclabassi, R.J., J.L. Eriksson, R.L. Port, G.B. Robinson, and T.W. Berger. (1988b). Nonlinear systems analysis of the hippocampal perforant path-dentate projection: I. Theoretical and interpretational considerations. J. Neurophysiol. 60:1066-1076.
Sclabassi, R.J., B.R. Kosanovic, G. Barrionuevo, and T.W. Berger. (1994). Computational methods of neuronal network decomposition. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 55-86.
Segall, A. and T. Kailath. (1976). Orthogonal functionals of independent-increment processes. IEEE Transactions on Information Theory IT-22:287-298.
Segundo, J.P. and A.F. Kohn. (1981). A model of excitatory synaptic interactions between pacemakers. Its reality, its generality and the principles involved. Biol. Cybern. 40:113-126.
Segundo, J.P., O. Diez Martinez, and H. Quijano. (1987). Testing a model of excitatory interactions between oscillators. Biol. Cybern. 55:355-365.
Segundo, J.P. and O. Diez Martinez. (1985b). Dynamic and static hysteresis in crayfish mechanoreceptors. Biol. Cybern. 52:291-296.
Segundo, J.P., D.H. Perkel, H. Wyman, H. Hegstad, and G.P. Moore. (1968). Input-output relations in computer-simulated nerve cells. Influence of the statistical properties, strength, number and interdependence of excitatory presynaptic terminals. Kybernetik 4:157-171.
Seyfarth, E.A. and A.S. French. (1994). Intracellular characterization of identified sensory cells in a new mechanoreceptor preparation. Journal of Neurophysiology 71:1422-1427.
Shapley, R.M. and J.D. Victor. (1978). The effect of contrast on the transfer properties of cat retinal ganglion cells. J. Physiol. (London) 285:275.
Shapley, R.M. and J.D. Victor. (1979). Nonlinear spatial summation and the contrast gain control of the cat retina. Journal of Physiology 290:141.
Shi, J. and H.H. Sun. (1990). Nonlinear system identification for cascade block model: an application to electrode polarization impedance. IEEE Trans. on Biomedical Eng. 37:574-587.
Shi, J. and H.H. Sun. (1994). Identification of nonlinear system with feedback structure. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 139-162.
Shi, Y. and K.E. Hecox. (1991). Nonlinear system identification by m-pulse sequences: application to brainstem auditory evoked responses. IEEE Trans. on Biomedical Eng. 38:834-845.
Shingai, R., E. Hida, and K.-I. Naka. (1983). A comparison of spatio-temporal receptive fields of ganglion cells in the retinas of the tadpole and adult frog. Vision Research 23:943.
Soderstrom, T. and P. Stoica. (1989). System Identification. Prentice-Hall International, London.
Sohrab, S. and S.Y. Yamashiro. (1980). Pseudorandom testing of ventilatory response to inspired carbon dioxide in man. Journal of Applied Physiology 49:1000-1009.
Song, D., V.Z. Marmarelis, and T.W. Berger. (2002). Parametric and non-parametric models of short-term plasticity. In: 2nd Joint IEEE EMBS and BMES Conference, Houston, TX.
Song, D., Z. Wang, V.Z. Marmarelis, and T.W. Berger. (2003). Non-parametric interpretation and validation of parametric short-term plasticity models. In: Proceedings of the IEEE EMBS Conference, Cancun, Mexico, pp. 1901-1904.
Spekreijse, H. (1969). Rectification in the goldfish retina: Analysis by sinusoidal and auxiliary stimulation. Vision Research 9:1461-1472.
Spekreijse, H. (1982). Sequential analysis of the visual evoked potential system in man: Nonlinear analysis of a sandwich system. Annals of the New York Academy of Sciences 388:72-97.
Sprecher, D.A. (1972). An improvement in the superposition theorem of Kolmogorov. J. Math. Anal. Appl. 38:208-213.
Stark, L. (1968). Neurological Control Systems: Studies in Bioengineering. Plenum Press, New York.
Stauffer, E.K., R.A. Auriemma, and G.P. Moore. (1986). Responses of Golgi tendon organs to concurrently active motor units. Brain Research 375:157-162.
Stark, L. (1969). The pupillary control system: Its nonlinear adaptive and stochastic engineering design characteristics. Automatica 5:655-676.
Stavridi, M., V.Z. Marmarelis, and W.S. Grundfest. (1995a). Simultaneous monitoring of spectral and temporal Xe-Cl excimer laser-induced fluorescence. Meas. Sci. Technol. 7:87-95.
Stavridi, M., V.Z. Marmarelis, and W.S. Grundfest. (1995b). Spectro-temporal studies of Xe-Cl excimer laser-induced arterial wall fluorescence. Medical Engineering & Physics 17:595-601.
Stein, R.B., A.S. French, and A.V. Holden. (1972). The frequency response, coherence, and information capacity of two neuronal models. Biophysical Journal 12:295-322.
Stoica, P. (1981). On the convergence of an iterative algorithm used for Hammerstein system identification. IEEE Transactions on Automatic Control 26:967-969.
Suki, B., Q. Zhang, and K. Lutchen. (1995). Relationship between frequency and amplitude dependence in the lung: a nonlinear block-structured modeling approach. Journal of Applied Physiology 79:660-671.
Sun, H.H. and J.H. Shi. (1989). New algorithm for Korenberg-Billings model of nonlinear system identification. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 179-200.
Sun, H.H., B. Onaral, and X. Wang. (1987). Bioelectrode polarization phenomena: a fractal approach. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 63-72.
Sutter, E.E. (1975). A revised conception of visual receptive fields based upon pseudorandom spatio-temporal pattern stimuli. In: Proceedings 1st Symposium on Testing and Identification of Nonlinear Systems, G.D. McCann and P.Z. Marmarelis (Eds.), California Institute of Technology, Pasadena, CA, pp. 353-365.
Sutter, E.E. (1987). A practical nonstochastic approach to nonlinear time-domain analysis. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 303-315.
Sutter, E.E. (1992). A deterministic approach to nonlinear system analysis. In: Nonlinear Vision, R.B. Pinter and B. Nabet (Eds.), CRC Press, Boca Raton, FL, pp. 171-220.
Swanson, G.D. and J.W. Belville. (1975). Step changes in end-tidal CO2: methods and implications. Journal of Applied Physiology 39:377-385.
Taylor, M.G. (1966). Use of random excitation and spectral analysis in the study of frequency-dependent parameters of the cardiovascular system. Circ. Res. 18:585-595.
Theunissen, F.E., K. Sen, and A.J. Doupe. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neuroscience 20:2315-2331.
Thomas, E.J. (1971). Some considerations on the application of the Volterra representation of nonlinear networks to adaptive echo cancellers. The Bell System Technical Journal 50:2797-2805.
Tiecks, F.P., A.M. Lam, R. Aaslid, and D.W. Newell. (1995). Comparison of static and dynamic cerebral autoregulation measurements. Stroke 26:1014-1019.
Toffolo, G., R.N. Bergman, D.T. Finegood, C.R. Bowden, and C. Cobelli. (1980). Quantitative estimation of beta cell sensitivity to glucose in the intact organism. Diabetes 29:979-990.
Tranchina, D., J. Gordon, R. Shapley, and J.-I. Toyoda. (1984). Retinal light adaptation: evidence for a feedback mechanism. Nature (London) 310:314.
Tresp, V., T. Briegel, and J. Moody. (1999). Neural-network models for the blood glucose metabolism of a diabetic. IEEE Transactions on Neural Networks 10:1204-1213.
Trimble, J. and G. Phillips. (1978). Nonlinear analysis of human visual evoked response. Biological Cybernetics 30:55-61.
Udwadia, F.E. and R.E. Kalaba. (1996). Analytical Dynamics: A New Approach. Cambridge University Press, Cambridge, U.K.
Udwadia, F.E. and P.Z. Marmarelis. (1976). The identification of building structural systems. I. The linear case; II. The nonlinear case. Bulletin of the Seismological Society of America 66:125-171.
Ursino, M. and C.A. Lodi. (1998). Interaction among autoregulation, CO2 reactivity and intracranial pressure: A mathematical model. American Journal of Physiology 274:H1715-H1728.
Ursino, M., A. Ter Minassian, C.A. Lodi, and L. Beydon. (2000). Cerebral hemodynamics during arterial CO2 pressure changes: in vivo prediction by a mathematical model. American Journal of Physiology 279:H2439-H2455.
van Dijk, P., H.P. Wit, J.M. Segenhout, and A. Tubis. (1994). Wiener kernel analysis of inner ear function in the American bullfrog. The Journal of the Acoustical Society of America 95:904-919.
van Dijk, P., H.P. Wit, and J.M. Segenhout. (1997a). Dissecting the frog inner ear with Gaussian noise. I. Application of high-order Wiener kernel analysis. Hearing Research 114:229-242.
van Dijk, P., H.P. Wit, and J.M. Segenhout. (1997b). Dissecting the frog inner ear with Gaussian noise. II. Temperature dependence of inner-ear function. Hearing Research 114:243-251.
van Trees, H.L. (1964). Functional techniques for the analysis of the nonlinear behavior of phase-locked loops. Proc. IEEE 32:891-911.
Vassilopoulos, L.A. (1967). The application of statistical theory of nonlinear systems to ship performance in random seas. Int. Shipbuild. Prog. 14:54-65.
Vicini, P., A. Caumo, and C. Cobelli. (1999). Glucose effectiveness and insulin sensitivity from the minimal model: consequences of undermodeling assessed by Monte Carlo simulation. IEEE Transactions on Biomedical Engineering 46:130-137.
Victor, J.D. (1979). Nonlinear system analysis: comparison of white noise and sum of sinusoids in a biological system. Proc. Natl. Acad. Sci. U.S.A. 76:996-998.
Victor, J.D. (1987). Dynamics of cat X and Y retinal ganglion cells, and some related issues in nonlinear systems analysis. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 148-160.
Victor, J.D. (1989). The geometry of system identification: fractal dimension and integration formulae. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 147-164.
Victor, J.D. (1991). Asymptotic approach of generalized orthogonal functional expansions to Wiener kernels. Annals of Biomedical Engineering 19:383-399.
Victor, J.D. and B.W. Knight. (1979). Nonlinear analysis with an arbitrary stimulus ensemble. Q. Appl. Math. 37:113-136.
Victor, J.D. and R.M. Shapley. (1979a). Receptive field mechanisms of cat X and Y retinal ganglion cells. Journal of General Physiology 74:275.
Victor, J.D. and R.M. Shapley. (1979b). The nonlinear pathway of Y ganglion cells in the cat retina. Journal of General Physiology 74:671-689.
Victor, J.D. and R.M. Shapley. (1980). A method of nonlinear analysis in the frequency domain. Biophysical Journal 29:459-484.
Victor, J.D., R.M. Shapley, and B.W. Knight. (1977). Nonlinear analysis of cat retinal ganglion cells in the frequency domain. Proc. Natl. Acad. Sci. U.S.A. 74:3068.
Volterra, V. (1930). Theory of Functionals and of Integral and Integro-Differential Equations. Dover Publications, New York.
Waddington, J. and F. Fallside. (1966). Analysis of nonlinear differential equations by the Volterra series. International Journal of Control 3:1-15.
Watanabe, A. and L. Stark. (1975). Kernel method for nonlinear analysis: Identification of a biological control system. Mathematical Biosciences 27:99-108.
Webster, J.G. (1971). Pupillary light reflex: The development of teaching models. IEEE Transactions on Biomedical Engineering 18:187-194.
Weiss, P.L., I.W. Hunter, and R.E. Kearney. (1988). Human ankle joint stiffness over the full range of muscle activation levels. Journal of Biomechanics 21:539-544.
Westwick, D.T. and R.E. Kearney. (1992). A new algorithm for the identification of multiple input Wiener systems. Biol. Cybern. 68:75-85.
Westwick, D.T. and R.E. Kearney. (1994). Identification of multiple-input nonlinear systems using non-white test signals. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 163-178.
Wickesberg, R.E. and C.D. Geisler. (1984). Artifacts in Wiener kernels estimated using Gaussian white noise. IEEE Transactions on Biomedical Engineering 31:454-461.
Wickesberg, R.E., J.W. Dickson, M.M. Gibson, and C.D. Geisler. (1984). Wiener kernel analysis of responses from anteroventral cochlear nucleus neurons. Hearing Research 14:155-174.
Widrow, B. and M.A. Lehr. (1990). 30 years of adaptive neural networks: Perceptron, madaline and back propagation. Proc. IEEE 78:1415-1442.
Wiener, N. (1938). The homogeneous chaos. Am. J. Math. 60:897.
Wiener, N. (1942). Response of a nonlinear device to noise. Report No. 129, Radiation Laboratory, M.I.T., Cambridge, MA.
Wiener, N. (1958). Nonlinear Problems in Random Theory. MIT Press, Cambridge, MA.
Wray, J. and G.G.R. Green. (1994). Calculation of the Volterra kernels of nonlinear dynamic systems using an artificial neural network. Biol. Cybern. 71:187-195.
Wysocki, E.M. and W.J. Rugh. (1976). Further results on the identification problem for the class of nonlinear systems SM. IEEE Trans. Circuits & Systems 23:664-670.
Yamada, W.M. and E.R. Lewis. (1999). Predicting the temporal responses of non-phase-locking bullfrog auditory units to complex acoustic waveforms. Hearing Research 130:155-170.
Yamada, W.M. and E.R. Lewis. (2000). Demonstrating the Wiener kernel description of tuning and suppression in an auditory afferent fiber: Predicting the AC and DC response to a complex novel stimulus. In: Recent Developments in Auditory Mechanics, H. Wada, T. Takasaka, K. Ikeda, K. Ohyama, and T. Koike (Eds.), World Scientific, Singapore, pp. 506-512.
Yamada, W.M., K.R. Henry, and E.R. Lewis. (2000). Tuning, suppression and adaptation in auditory afferents, as seen with second-order Wiener kernels. In: Recent Developments in Auditory Mechanics, H. Wada, T. Takasaka, K. Ikeda, K. Ohyama, and T. Koike (Eds.), World Scientific, Singapore, pp. 419-425.
Yamada, W.M., G. Wolodkin, E.R. Lewis, and K.R. Henry. (1997). Wiener kernel analysis and the singular value decomposition. In: Diversity in Auditory Mechanics, E.R. Lewis, G.R. Long, R.F. Lyon, P.M. Narins, C.R. Steele, and E. Hecht-Poinar (Eds.), World Scientific Press, Singapore, pp. 111-118.
Yasui, S. (1979). Stochastic functional Fourier series, Volterra series, and nonlinear systems analysis. IEEE Trans. Autom. Contr. 24:230-242.
Yasui, S. (1982). Wiener-like Fourier kernels for nonlinear system identification and synthesis (nonanalytic cascade, bilinear and feedback cases). IEEE Trans. Automatic Control 27:667.
Yasui, S. and D.H. Fender. (1975). Methodology for measurement of spatiotemporal Volterra and Wiener kernels for visual systems. In: Proceedings 1st Symposium on Testing and Identification of Nonlinear Systems, California Institute of Technology, Pasadena, CA, pp. 366-383.
Yasui, S., W. Davis, and K.-I. Naka. (1979). Spatio-temporal receptive field measurements of retinal neurons by random pattern stimulation and cross-correlation. IEEE Transactions on Biomedical Engineering 26:5-11.
Yates, F.E. (1973). Systems biology as a concept. In: Engineering Principles in Physiology, Volume 1, H.V. Brown and D.S. Gann (Eds.), Academic Press, New York.
Yeshurun, Y., Z. Wollberg, and N. Dyn. (1987). Identification of MGB cells by Volterra kernels. III: A glance into the black box. Biological Cybernetics 56:261-268.
Zadeh, L.A. (1956). On the identification problem. IRE Trans. Circuit Theory 3:277-281.
Zadeh, L.A. (1957). On the representation of nonlinear operators. In: IRE Wescon Conv. Rec., Part 2, pp. 105-113.
Zames, G.D. (1963). Functional analysis applied to nonlinear feedback systems. IEEE Transactions on Circuit Theory 10:392-404.
Zhang, R., J.H. Zuckerman, C.A. Giller, and B.D. Levine. (1998). Transfer function analysis of dynamic cerebral autoregulation in humans. American Journal of Physiology 274:H233-H241.
Zhang, R., J.H. Zuckerman, and B.D. Levine. (2000). Spontaneous fluctuations in cerebral blood flow velocity: insights from extended duration recordings in humans. American Journal of Physiology 278:H1848-H1855.
Zhao, X. and V.Z. Marmarelis. (1994a). Identification of parametric (NARMAX) models from estimated Volterra kernels. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 211-218.
Zhao, X. and V.Z. Marmarelis. (1994b). Equivalence between nonlinear differential and difference equation models using kernel invariance methods. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 219-228.
Zhao, X. and V.Z. Marmarelis. (1997). On the relation between continuous and discrete nonlinear parametric models. Automatica 33:81-84.
Zhao, X. and V.Z. Marmarelis. (1998). Nonlinear parametric models from Volterra kernels measurements. Mathl. Comput. Modeling 27:37-43.
Zierler, N. (1959). Linear recurring sequences. Journal of the Society for Industrial and Applied Mathematics 7:31-49.
Index
Action potentials, 414
Additive parallel branches, 198
Aliasing, 13
Amplitude nonlinearity, 43
Analysis of estimation errors, 125
Anatomists, 3, 25
Anisotropy, 492
ANN, see Artificial neural network
Apparent transfer function (ATF), 93, 158
  illustrative example, 160
  of linearized models, 158
Applications of two-input modeling to physiological systems, 369
Arbitrary inputs, 52
ARMA model, 148
ARMAX model, see Autoregressive moving average with exogenous variable model
ARMAX tX2 model, 148
Artificial neural network (ANN), 223
Artificial pancreas, 345, 347
Ascending-order MOS procedure, 258
Asclepiades, 26, 27
ATF, see Apparent transfer function
Auditory nerve fibers, 302
Autocorrelation functions of random processes, 505
Autoregressive moving average with exogenous variable (ARMAX) model, 147, 168
  parameters, 147
Axon hillock, 414
Axons, 414
Band-limited GWN, 92
  advantages, 92
  disadvantages, 93
  input, 60
Bandwidth, 270
Beta rule, 243, 279
Bose, Amar, 71, 142
Broadband stochastic input/output signals, 5
Brownian motion, 57
Caltech, 143
Cardiovascular system, 320
Causal systems, 10
Cerebral autoregulation, 22
  in humans, 380
Closed-loop condition, 153
Closed-loop model, 490
  autoregressive form, 490
Closed-loop systems, 489
  network model form, 491
Coherence function, 94
Coherence measurements, 93
Comparative use of GWN, PRS, and CSRS, 92
Comparison of Volterra/Wiener model predictions, 64
Connectionist models, 223
  relation with PDM modeling, 230
Constant zeroth-order Wiener functional h0, 73
Constant-switching-pace symmetric random signal (CSRS), 80
  advantages, 93
  and Volterra kernels, 84
  disadvantages, 93
Cross talk, 43
Cross-correlation technique, 17, 78, 100, 449
Cross-correlation technique (CCT), 113
  for Wiener kernel estimation, 72
  of nonparametric modeling, 16
Cross-correlation-based method for multiinput modeling, 390
CSRS, see Constant-switching-pace symmetric random signal
CCT, see Cross-correlation technique
Cubic feedback, 222
  systems, 204
Cybernetics, 32
Data consolidation method, 276
Data preparation, 275
Data record length, 273
Deductive modeling, 24
Delta-bar-delta rule, 243, 279
Democritus, 26
Dendrites, 414
Dendritic potentials (DPs), 415
Diagonal estimability problem, 85
Differential-equation models, 145
Discrete-time representation of the CSRS functional series, 89
Discrete-time Volterra kernels of NARMAX models, 164
Discretized output, 17
Disease process, 3
DLF expansions for kernel estimation, 112
DP, see Dendritic potentials
Dual-input stimulation in the hippocampal slice, 455
Duffing system, 98
Dynamic nonlinearity, 43
Dynamic range, 270
Dynamic system physiology, 3
Dynamic systems, 30
ED, see Eigen decomposition
Efficient Volterra kernel estimation, 100
Eigen-decomposition (ED) approach, 188
ELS, see Extended least squares
Empiricists, 3, 26, 27
Enhanced convergence algorithms for fixed training steps, 242
Equivalence between connectionist and Volterra models, 223
Equivalence between continuous and discrete parametric models, 171
  illustrative example, 175
Erasistratus, 25
Ergodicity, 273, 505
Erroneous scaling of kernel estimates, 136
Error term, 9
Estimation bias, 128
Estimation error, 132
  analysis of, 125
Estimation errors associated with direct inversion methods, 137
Estimation errors associated with iterative cost-minimization methods, 139
Estimation errors associated with the cross-correlation technique, 127
Estimation of h0, 73
Estimation of h2(τ1, τ2), 74
Estimation of h3(τ1, τ2, τ3), 75
Estimation variance, 130, 132
Extended least-squares (ELS) procedure, 150
Fast exact orthogonalization, 55
Feedback branches, 200
Feedback mechanisms, 200
Feedforward Volterra-equivalent network architectures, 229
Filter banks, 251
First-order Volterra functional, 35
First-order Volterra kernel, 34
Fly photoreceptor, 85
F-ratio test, 151
Frequency-domain estimation of Wiener kernels, 78
Function expansions, 495-498
Functional integration in the single neuron, 414
Galen of Pergamos (Galenos), 3, 27, 28, 143
Gaussian white noise (GWN), 499
  input, 30
  test input, 16
General model of membrane and synaptic dynamics, 408
Generalized harmonic balance method, 173
Global predictive model, 153
Glucose balance, 354
  equation, 354
Glucose metabolism, 344
Glucose production, 354
Glucose-insulin minimal model, 21; see also Minimal model
Graded potentials, 414
Gram matrix, 53, 54, 104
GWN, see Gaussian white noise
Harvey, William, 3, 27
H-H model, see Hodgkin-Huxley model
Hidden layer, 278
Higher-order nonlinearities, 37, 116
High-order kernels, 78
High-order Volterra modeling with equivalent networks, 122
High-order Volterra models, 122
Hippocampal formation, 448
Hippocrates, 1, 3, 25, 26, 27, 28, 143
Hodgkin-Huxley (H-H) model, 162, 408
Homogeneous chaos, 57
Hypothesis-driven research, 4
Impulse invariance method, 171
Impulse sequences, 51
Impulsive input, 42
Inductive modeling, 24
Inductively derived models, 2
Input characteristics, 269
Input-additive noise, 135
Input-output data, 13, 30
Input-output signal transformation, 8
Instrumental variable (IV) method, 153
Insulin action, 354
Insulin production, 354
Insulin secretion, 344
Insulin sensitivity, 347
Insulin-glucose interactions, 342
Insulinogenic PDM, 352
Insulinoleptic PDM, 353
Integrated PDM model with trigger regions, 427
Integrative and dynamic view of physiology, 3
Interaction layer, 278
Interference, 8, 135, 271
Interpretation of the PDM model, 282
Interpretation of Volterra kernels, 281
Invertebrate photoreceptor, 18
Invertebrate retina, 296
Iterative cost-minimization methods for non-Gaussian residuals, 55
Kernel expansion approach, 55
Kernel expansion method, 469
Kernel expansion methodology, 101
Kernel invariance method, 171, 172
Kernel-based (KBR) method, 169, 170
Kernel-expansion method for multiinput modeling, 393
Kronecker delta, 69
Lag-delta representation of P-V or P-W kernels, 444
Laguerre expansion technique (LET), 31, 107, 113, 455
Laguerre functions, 107
Laguerre-Volterra network (LVN), 246
  illustrative example, 249
Lateral feedforward branches, 198
Lee, Y.W., 142
Leucippus, 26
Light-to-horizontal cell model, 217
Likelihood of firing, 286
Linear time-varying systems with arbitrary inputs, 479
Linearized models, apparent transfer functions of, 158
L-N cascade, 194
  system, 38, 96
L-N model, 196
L-N-M cascade, 194, 198
L-N-M model, 196
L-N-M "sandwich" system, 39
LVN variant with two filter banks (LVN-2), 253
LVN-2 modeling, 255
Marmarelis, Panos, 143, 286, 287, 295, 361, 369
Marmarelis, Vasilis, 143, 288
Mathematical models, 1, 7
  "ideal," 7
  "less than ideal," 7
  bandwidth, 11
  compact, 7
  datasets, 8
  dynamic range, 11
  efficiency, 11
  global, 7
  interpretable, 7
  operational range, 11
  practical considerations and experimental requirements of, 266
  robustness, 7
  scientific interpretability, 11
  signals, 8
  trade-off between model parsimony and its global validity, 11
McCann, Gilbert, 143, 287, 361
MDV model, see Modified discrete Volterra (MDV) model
MDV modeling methodology, 277
Measurement noise, 5, 8
Metabolic autoregulation in dogs, 378
Metabolic-Endocrine system, 342
Method of generalized harmonic balance, 154
Methodists, 27
Minimal model of insulin-glucose interaction, 161, 353, 354, 356
Minimum-order modeling of spike-output systems, 431
Minimum-order Wiener models, 434
  illustrative example, 438
MIT, 142
Model estimation, 10, 276, 284
Model interpretation, 281
Model order determination, 104
Model specification, 10, 276, 284
  inductive versus deductive model development, 11
Model validation, 279
Modeling errors, 8, 126
Modeling of closed-loop systems, 489-493
Modeling of multiinput/output systems, 359-406
  multiinput case, 389
  two-input case, 360
Modeling of neuronal ensembles, 462
Modeling of neuronal systems, 407-465
Modeling of nonstationary systems, 467-488
  illustrative example, 474
Modeling physiological systems with multiple inputs and multiple outputs, 359
Modified discrete Volterra (MDV) model, 103
Modular and connectionist modeling, 179-264
Modular form of nonparametric models, 179
Modular representation, 177
Modulatory feedforward branches, 198
Motion detection in the invertebrate retina, 369
Multiinput/multioutput systems, 10
Multiple interconnections, 5
Multiple variables of interest, 5
Multiplicative feedback, 201
Myogenic mechanism, 333
Naka, Ken, 143, 286, 287, 288, 290, 295, 361
NARMAX model, 151, 152, 164, 165, 168, 169, 170
Negative decompressive feedback, 222
Nervous system, 407
Network-based methods, 480
  applications to nonstationary physiological systems, 484
  illustrative examples, 481
Network-based multiinput modeling, 393
Neuronal modes (NMs), 417
Neuronal systems with point-process inputs, 438
Neuronal unit, 408
Neurosensory systems, 286
N-M cascades, 194
N-M model, 196
Noise, 271
  effects, 134
Non-Gaussian white noise, 500
Non-Gaussian, quasiwhite input signals, 77
Nonlinear (stationary) systems, 150
Nonlinear autoregressive modeling (open-loop), 246
Nonlinear behavior, 6
Nonlinear dynamic analysis, 3
Nonlinear dynamics, 5
Nonlinear feedback, 220
  described by differential equations, 202
  in sensory systems, 216
Nonlinear modeling of physiological systems, 29
  conceptual/mathematical framework, 30
  objective, 30
  strengths, 29
  weaknesses, 29
Nonlinear modeling of synaptic dynamics, 459
Nonlinear models of physiological systems, 13
  connectionist, 14
  modular, 14
  nonparametric, 14
  parametric, 14
Nonlinear parametric models with intermodulation, 161
Nonlinearity, 12
  amplitude, 43
Nonparametric modeling, 29-143
Nonstationarity, 12, 13, 152, 467
  modeling of, 467-488
  modeling problem, 472
  system, 30
  in system dynamics, 5
  system/model, 34
Nonwhite Gaussian inputs, 98
One-step predictive model, 153
Open-loop condition, 153
Optimization of input parameters, 131
Ordinary least-squares (OLS) estimate, 53
Orthogonal Wiener series, 30
Parabolic leap algorithm, 279
Parallel-cascade method, 55
Parametric model, 167, 168
Parametric modeling, 145-178
  basic parametric model forms and estimation procedures, 146
PDM (principal dynamic mode) model
  interpretation, 282
  insulinogenic, 352
  insulinoleptic, 353
  integrated, with trigger regions, 427
Physiological system modeling, 3, 6, 7
  complexity, 11
  data driven, 11
  data-driven, 25
  deductive, 24
  inductive, 11, 24
  linearity, 12
  nonlinearities in, 12
  nonstationarities in, 12
  superposition principle, 12
  synergistic, 24, 25
Physiological system modeling problem, 13
Physiological variables, 8
  inputs, 8
  outputs, 8
Physiology, problems of modeling in, 6
Piecewise stationary modeling methods, 468
Positive compressive feedback, 222
Positive nonlinear feedback, 213
Posterior filter, 245
Practical considerations and experimental requirements of mathematical modeling, 266
Preliminary testing, 272
  test for system bandwidth, 272
  test for system linearity, 274
  test for system memory, 272
  test for system stationarity and ergodicity, 273
Prewhitening, 56
  filter, 150
Principal dynamic mode (PDM), 179, 180; see also PDM
Problems of modeling in physiology, 6
Pseudolinear regression problem, 150
Pseudomode-peeling method, 245
Pseudorandom sequences, 91
Pseudorandom signals (PRSs), 89
  advantages, 93
  based on m-sequences, 89
  disadvantages, 93
Quadratic Volterra system, 97
Quasistationary approach, 468, 469
Quasiwhite test input, 30, 80
Quasiwhiteness, 270
Random broadband input signals, 6
Real physiological systems, 5
Receptive field organization in the vertebrate retina, 370
Recuperative faculties, 3
Recursive tracking methods, 468
Reduced P-V or P-W kernels, 445
Regulatory branches, 199
Regulatory feedback, 202
Relation between Volterra and Wiener models, 60
  analytical example, 86
  comparison of model prediction errors, 88
Renal system, 333
Residual, 9
Residual whitening method, 150
Reverse-correlation technique, 432
Riccati equation, 19, 157, 158
Riccati system, 40
Sandwich model, 39
Second-order kernels of nonlinear feedback systems, 215
Second-order Volterra functional, 36
Second-order Volterra kernel, 44
Second-order Volterra system, 35, 36
Second-order Wiener model, 16
Separable Volterra network (SVN), 225
Settling time, 267
Sigmoid feedback, 222
  systems, 209
Signal characteristics, 266
Significant response, 267
Single-input stimulation in vitro, 455
Single-input stimulation in vivo, 449
Sinusoidal input, 43
Socrates, 5
Sources of estimation errors, 125
  estimation method errors, 125
  model specification errors, 125
  noise/interference errors, 125
Spatiotemporal modeling, 395, 397
  of cortical cells, 402
  of retinal cells, 398
Spectrotemporal model, 395, 397
Spider mechanoreceptor, 307
Spontaneous insulin-to-glucose PDM model, 352
Stark, Larry, 286
Static nonlinear system, 37
Stationarity, 12, 273, 505
Stationary system, 30
Step-by-step procedure for physiological system modeling, 283
Stochastic error term, 8, 9
Sum of sinusoids of incommensurate frequencies, 52
SVN, see Separable Volterra network
Synaptic junction, 414
System bandwidth, 266, 272
System characteristics, 266
System dynamic range, 267
System ergodicity, 268
System linearity, 268, 274
System memory, 267, 272
System modeling, 8, 11
System nonlinear dynamics, 4
System stationarity, 268
Systemic interference, 5
Systemic noise, 135
Taylor multivariate series expansion of an analytic function, 32
Taylor series coefficients, 37
Taylor series expansion, 33
Test of nonstationarity, 475
Test systems
  for system bandwidth, 272
  for system linearity, 274
  for system memory, 272
  for system stationarity and ergodicity, 273
TGF, see Tubuloglomerular feedback
Themison, 27
Three-layer perceptron (TLP), 224
Time invariance, 268
TLP, see Three-layer perceptron
Transformation, 414
Trigger lines, 421
Trigger regions, 417, 421
Tubuloglomerular feedback (TGF), 333
Two-dimensional Fourier transform, 36, 44
Two-input cross-correlation technique, 362
Two-input kernel-expansion technique, 362
Variable step algorithms, 243
Vector notation, 10
VEN, see Volterra-equivalent network
VEN/VWM modeling methodology, 278
Vertebrate retina, 15, 287
Vesalius, 3, 27
Volterra, Vito, 31, 32, 33, 140, 141
  Theory of Functionals and of Integral and Integro-Differential Equations, 32
Volterra analysis of Riccati equation, 19
Volterra-equivalent network (VEN), 223, 224
  architectures, 223, 235
  for nonlinear system modeling, 235
  convergence and accuracy of the training procedure, 240
  equivalence with Volterra kernels/models, 238
  network parameter initialization, 241
  selection of the structural parameters, 238
  selection of the training and testing data sets, 240
  with two inputs, 364
  illustrative example, 366
Volterra functional, 33, 34, 36, 42, 43, 44
  expansion, 30, 37
  series expansion, 31
Volterra kernel, 18, 20, 22, 30, 33, 34, 37, 42, 167
  discrete-time, 20
  expansion, 101
  estimation of, 49, 101
  first order, 20
  meaning of, 45
  of nonlinear differential equations, 153
  operational meaning, 41
  second order, 20
Volterra modeling framework, 35
Volterra models, 31, 37, 223
  discrete-time, 47
  frequency-domain analysis, 48
  frequency-domain representation, 45
  of system cascades, 191
  of systems with feedback branches, 200
  of systems with lateral branches, 198
Volterra series, 30, 31, 32
  expansion, 32, 33, 37
Volterra-Wiener approach, 4, 15
Volterra/Wiener model predictions, comparison of, 64
Volterra-Wiener network (VWN), 122
Volterra-Wiener-Marmarelis (VWM) model, 260
VWN, see Volterra-Wiener network
Weierstrass theorem, 33
Whiteness, 270
Wiener, Norbert, 6, 32, 140, 141, 142, 143
  approach to kernel estimation, 67
Wiener class of systems, 62
Wiener functionals, 58, 59
Wiener kernel, 16, 30, 58, 59, 77
  estimation, 77
Wiener model, 30, 57, 195
  examples of, 63
Wiener series, 57, 58, 503
  construction of, 503
Wiener-Bose model, 122