$$T = T - 0.01 \quad \text{if } A_t < (1 - \mathit{crit})\,k \tag{5a}$$
else, if $A_t$ remains within the bounds defined by crit,
$$T = T + 0.003333 \quad \text{if } A_t > k$$
$$T = T - 0.003333 \quad \text{if } A_t < k \tag{5b}$$
where crit is the criterion for deciding whether $A_t$ is much larger or smaller (crit = 0.20 works well for the simulations reported here). One disadvantage of this method is that $T$ may change too quickly, so that the module starts to oscillate violently. To prevent this, $A_t$ is dampened by making it a moving average of the current total activation and the activation of previous iterations. When $A_T$ is the current level of activation, the value used to compute both the level of inhibition $A_t T$ and the change in the parameter $T$ is:
$$A_t = 0.5A_{t-1} + 0.5A_T \tag{6}$$
This precedes calculation of the new threshold $T$ (Eq. 5a, b). Finally, if $T < T^{\min}$, $T$ is set to $T^{\min}$, and if $T > T^{\max}$ it is set to $T^{\max}$. In the simulations the upper bound $T^{\max}$ was set to a very high number and the lower bound $T^{\min}$ was set to 0. Both bounds were in fact never reached in practice.
Learning rule
The learning rule is a simple Hebbian rule that also allows decreases in weight, as follows:
$$\Delta w_{ij} = \mu^{+} a_i a_j - \mu^{-} a_i (1 - a_j), \quad \text{where } \mu^{+}, \mu^{-} > 0 \tag{8}$$
and
$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij} \tag{9}$$
where $\Delta w_{ij}$ represents the weight change and $\mu^{-}$ and $\mu^{+}$ represent the learning rates. Learning rate can vary rapidly due to certain central states (e.g., motivational) or very slowly (e.g., due to aging).
Network sizes and patterns
In a typical simulation, the following sizes were used. The trace system consisted of 150 neurons and the link system contained 32 neurons. In practice, we often used full connectivity, even though on theoretical grounds (very) sparse connectivity is to be preferred. For these simulations, however, we have not found any systematic effects of full versus sparse connectivity (see Murre and Raffone, in preparation, for a thorough investigation of this issue), except that it reduces the variance in the results. With a network of this size, fifteen patterns were typically used. The patterns were generated externally (i.e., formation of the representations is currently not part of the model), such that there was no overlap of patterns at the trace level and some overlap (16.7%) at the link level. The latter causes an exponential forgetting process in the link system, which is partially compensated by the consolidation process at the trace level. In Murre and Meeter (submitted) we present a mathematical treatment of this process and show that it causes the model to exhibit many psychologically plausible characteristics with respect to the shape of learning and forgetting. Larger networks and more patterns will be necessary to assess the stability of the consolidation process with larger-scale simulations. As described in the text, the patterns were taught sequentially, with intermediate consolidation periods in which the to-be-consolidated patterns were cued by an autonomous noise process. No pattern cues were administered during the consolidation process.
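The threshold adaptation (Eqs. 5a, 5b and 6) and the learning rule (Eqs. 8 and 9) described above can be sketched in Python as follows. This is only an illustrative reading, not the authors' simulation code. Several things are assumptions: the upper branch of Eq. 5a (raising $T$ by 0.01 when activation exceeds $(1 + \mathit{crit})k$) is taken to mirror the lower branch that appears in the text, $k$ is treated as a target activation level, and the function names and the learning-rate values are hypothetical.

```python
import numpy as np

def smooth_activation(prev_avg, current):
    """Eq. 6: moving average of the total activation."""
    return 0.5 * prev_avg + 0.5 * current

def update_threshold(T, A, k, crit=0.20, t_min=0.0, t_max=1e6):
    """Eqs. 5a and 5b, followed by clipping T to [t_min, t_max]."""
    if A > (1 + crit) * k:        # assumed mirror image of the lower branch of Eq. 5a
        T += 0.01
    elif A < (1 - crit) * k:      # lower branch of Eq. 5a
        T -= 0.01
    elif A > k:                   # Eq. 5b: small corrections inside the band
        T += 0.003333
    elif A < k:
        T -= 0.003333
    return min(max(T, t_min), t_max)

def hebbian_update(w, a, mu_plus=0.1, mu_minus=0.05):
    """Eqs. 8 and 9 for an activation vector a; w[i, j] holds w_ij.

    mu_plus and mu_minus are illustrative learning rates (both > 0).
    """
    a = np.asarray(a, dtype=float)
    dw = mu_plus * np.outer(a, a) - mu_minus * np.outer(a, 1.0 - a)
    return w + dw

# Example: activation well below target lowers T by the large step.
print(update_threshold(1.0, A=5.0, k=10.0))
print(hebbian_update(np.zeros((2, 2)), [1, 0]))
```

In this sketch the weight between two co-active units grows by `mu_plus`, while the weight from an active unit to an inactive one shrinks by `mu_minus`, which is the sense in which the rule "also allows decreases in weight".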
Acknowledgements
We would like to thank all the patients with semantic dementia (and their spouses) who have given so generously of their time. We would also like to thank our colleagues who have contributed to this work through many helpful discussions over the last few years. This research was supported by the Medical Research Council, the British Council, the Netherlands Royal Academy of Arts and Sciences (J.M.) and the Wellcome Trust (J.H. and K.G.).
References
Abeles, M. (1991). Corticonics: neural circuits of the cerebral cortex. Cambridge: Cambridge University Press.
Ackley, D.H., Hinton, G.E. and Sejnowski, T.J. (1983). A learning algorithm for Boltzmann machines. Cogn. Sci., 9: 147-169.
Alvarez, P. and Squire, L.R. (1994). Memory consolidation and the medial temporal lobe: a simple network model. Proc. Natl. Acad. Sci., USA, 91: 7041-7045.
Breedin, S.D., Saffran, E.M. and Coslett, H.B. (1994). Reversal of the concreteness effect in a patient with semantic dementia. Cogn. Neuropsychol., 11: 617-660.
Calabrese, P., Markowitsch, H.J., Durwen, H.F., Widlitzek, H., Haupts, M., Holinka, B. and Gehlen, W. (1996). Right temporofrontal cortex as critical locus for the ecphory of old episodic memories. J. Neurol., Neurosurg. Psychiatr., 61: 304-310.
Caselli, R.J., Jack, C.R., Petersen, R.C., Wahner, H.W. and Yanagihara, T. (1992). Asymmetric cortical degenerative syndromes: Clinical and radiologic correlations. Neurology, 42: 1462-1468.
Croot, K., Patterson, K. and Hodges, J.R. (1998). Single word production in nonfluent progressive aphasia. Brain Lang., 61: 226-273.
Crovitz, H.F. and Schiffman, H. (1974). Frequency of episodic memories as a function of their age. Bull. Psychonomic Soc., 4: 517-518.
Diesfeldt, H.F.A. (1992). Impaired and preserved semantic memory functions in dementia. In: L. Backman (Ed.), Memory Functioning in Dementia. Elsevier Science Publishers, pp. 227-263.
Felleman, D.J. and Van Essen, D.C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex, 1: 1-47.
Graham, K.S. (1999). Semantic dementia: A challenge to the Multiple Trace Theory of memory consolidation? Trends in Cognitive Sciences, 3: 85-87.
Graham, K.S. and Hodges, J.R. (1997). Differentiating the roles of the hippocampal complex and the neocortex in long-term memory storage: Evidence from the study of semantic dementia and Alzheimer’s disease. Neuropsychology, 11: 77-89.
Graham, K.S., Becker, J.T., Patterson, K. and Hodges, J.R. (1997a). Lost for words: A case of primary progressive aphasia? In: A.J. Parkin (Ed.), Case Studies in the Neuropsychology of Memory. Psychology Press, Hove, East Sussex, pp. 83-110.
Graham, K.S., Becker, J.T. and Hodges, J.R. (1997b). On the relationship between knowledge and memory for pictures: Evidence from the study of patients with semantic dementia and Alzheimer’s disease. J. Int. Neuropsychol. Soc., 3: 534-544.
Graham, K.S., Patterson, K., Pratt, K.H. and Hodges, J.R. Relearning and subsequent forgetting of semantic category exemplars in a case of semantic dementia. Neuropsychology (in press, b).
Graham, K.S., Pratt, K.H. and Hodges, J.R. (1998). A reverse temporal gradient for public events in a single-case of semantic dementia. Neurocase, 4: 461-470.
Graham, K.S., Simons, J.S., Pratt, K.H., Patterson, K. and Hodges, J.R. Insights from semantic dementia on the relationship between episodic and semantic memory. Neuropsychologia (in press, a).
Greene, J.D.W. and Hodges, J.R. (1996). Identification of famous faces and famous names in early Alzheimer’s disease. Relationship to anterograde episodic and general semantic memory. Brain, 119: 111-128.
Greene, J.D.W., Hodges, J.R. and Baddeley, A.D. (1995). Autobiographical memory and executive function in early dementia of Alzheimer type. Neuropsychologia, 12: 1647-1670.
Harasty, J.A., Halliday, G.M., Code, C. and Brooks, W.S. (1996). Quantification of cortical atrophy in a case of progressive fluent aphasia. Brain, 119: 181-190.
Hasselmo, M.E. (1995). Neuromodulation and cortical function: Modeling the physiological basis of behavior. Behav. Brain Res., 67: 1-27.
Hodges, J.R., Garrard, P. and Patterson, K. (1998). Semantic dementia. In: A. Kertesz and D. Munoz (Eds.), Pick’s Disease and Pick Complex. New York: Wiley, pp. 83-104.
Hodges, J.R., Graham, N.E. and Patterson, K. (1995). Charting the progression in semantic dementia: implications for the organisation of semantic memory. Memory, 463-495.
Hodges, J.R. and Patterson, K. (1995). Is semantic memory consistently impaired early in the course of Alzheimer’s disease? Neuroanatomical and diagnostic implications. Neuropsychologia, 33: 441-459.
Hodges, J.R. and Patterson, K. (1996). Nonfluent progressive aphasia and semantic dementia: a comparative neuropsychological study. J. Int. Neuropsychol. Soc., 2: 511-524.
Hodges, J.R., Patterson, K., Oxbury, S. and Funnell, E. (1992). Semantic dementia: progressive fluent aphasia with temporal lobe atrophy. Brain, 115: 1783-1806.
Hodges, J.R., Patterson, K. and Tyler, L.K. (1994). Loss of semantic memory: implications for the modularity of mind. Cogn. Neuropsychol., 11: 505-542.
Hodges, J.R. and Graham, K.S. (1998). A reversal of the temporal gradient for famous person knowledge in semantic dementia: Implications for the neural organisation of long-term memory. Neuropsychologia, 36: 803-825.
Howard, D. and Patterson, K. (1992). Pyramids and Palm Trees: A Test of Semantic Access From Pictures and Words. Bury St Edmunds, Suffolk: Thames Valley Test Company.
Kapur, N. (1993). Focal retrograde amnesia in neurological disease: a critical review. Cortex, 29: 217-234.
Kapur, N.D., Ellison, M.P., Smith, D.L., McLellan and Burrows, E.H. (1992). Focal retrograde amnesia following bilateral temporal lobe atrophy. Brain, 115: 73-85.
Kim, J.J. and Fanselow, M.S. (1992). Modality-specific retrograde amnesia of fear. Science, 256: 675-677.
Knott, R., Patterson, K. and Hodges, J.R. (1997). Lexical and semantic binding effects in short-term memory: evidence from semantic dementia. Cogn. Neuropsychol., 8: 1165-1216.
Kopelman, M.D. and Murre, J.M.J. The localization of memory. In preparation.
Kopelman, M.D., Wilson, B.A. and Baddeley, A.D. (1990). The Autobiographical Memory Interview. Bury St. Edmunds, Suffolk: Thames Valley Test Company.
McClelland, J.L., McNaughton, B.L. and O’Reilly, R.C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev., 102: 419-457.
Meiran, N. and Jelicic, M. (1995). Implicit memory in Alzheimer’s disease: a meta-analysis. Neuropsychology, 9: 291-303.
Mesulam, M.M. (1982). Slowly progressive aphasia without generalized dementia. Ann. Neurol., 11: 592-598.
Mesulam, M.M. and Weintraub, S. (1992). Primary progressive aphasia. In: F. Boller (Ed.), Heterogen. Alzheimer’s Dis., Springer-Verlag, Berlin, pp. 43-66.
Mingazzini, G. (1914). On aphasia due to atrophy of the cerebral convolutions. Brain, 36: 493-524.
Moscovitch, M. and Nadel, L. (1998). Consolidation and the hippocampal complex revisited: in defence of the multiple-trace model. Curr. Opin. Neurobiol., 8: 297-300.
Moscovitch, M. and Nadel, L. (1999). Multiple-trace theory and semantic dementia: Response to K.S. Graham (1999). Trends in Cogn. Sci., 3: 87-89.
Mummery, C.J., Patterson, K., Wise, R.J.S., Price, C.J. and Hodges, J.R. (1999). Disrupted temporal lobe connections in semantic dementia. Brain, 122: 61-73.
Murre, J.M.J. (1996). TraceLink: A model of amnesia and consolidation of memory. Hippocampus, 6: 675-684.
Murre, J.M.J. (1997). Implicit memory in amnesia: some explanations and predictions by the TraceLink model. Memory, 5: 213-232.
Murre, J.M.J., Graham, K.S. and Hodges, J.R. Semantic dementia: New constraints on connectionist models of long-term memory (submitted).
Murre, J.M.J. and Meeter, M. A model of amnesia and long-term learning and forgetting. Submitted to Psychological Review (submitted).
Murre, J.M.J. and Raffone, A. The self-organization of long-term memory in the cerebral cortex: A model based on emerging synfire chains (in preparation).
Murre, J.M.J. and Sturdy, D.P.F. (1995). The connectivity of the brain: multi-level quantitative analysis. Biol. Cybernet., 73: 529-545.
Nadel, L. and Moscovitch, M. (1997). Memory consolidation, retrograde amnesia and the hippocampal complex. Curr. Opin. Neurobiol., 7: 217-227.
Nadel, L. and Moscovitch, M. (1998). Hippocampal contributions to cortical plasticity. Neuropharmacology, 37: 431-439.
Osterrieth, P.A. (1944). Le test de copie d’une figure complexe: Contribution à l’étude de la perception et de la mémoire. Archives de Psychologie, Geneva, 30: 205-220.
Pick, A. (1892). Über die Beziehungen der senilen Hirnatrophie zur Aphasie. Prager Medizinische Wochenschrift, 17: 165-167.
Poeck, K. and Luzzatti, C. (1988). Slowly progressive aphasia in three patients: The problem of accompanying neuropsychological deficit. Brain, 111: 151-168.
Press, G.A., Amaral, D.G. and Squire, L.R. (1989). Hippocampal abnormalities in amnesic patients revealed by high-resolution magnetic resonance imaging. Nature, 341: 54-57.
Raven, J.C. (1962). Coloured Progressive Matrices Sets A, AB, B. London: H.K. Lewis.
Ruppin, E. and Reggia, J. (1995). A neural model of memory impairment in diffuse cerebral atrophy. Br. J. Psychiat., 166: 19-28.
Scheltens, P., Ravid, R. and Kamphurst, W. (1994). Pathologic findings in a case of primary progressive aphasia. Neurology, 44: 279-282.
Scoville, W.B. and Milner, B. (1957). Loss of recent memory after bilateral hippocampal lesions. J. Neurol., Neurosurgery and Psychiatry, 20: 11-21.
Simons, J.S., Graham, K.S., Watson, P. and Hodges, J.R. To what extent is episodic memory dependent upon semantic knowledge? Insights from the study of semantic dementia. Neuropsychology (submitted).
Snowden, J.S., Goulding, P.J. and Neary, D. (1989). Semantic dementia: a form of circumscribed cerebral atrophy. Behav. Neurol., 2: 167-182.
Snowden, J.S., Griffiths, H.L. and Neary, D. (1996a). Semantic-episodic memory interactions in semantic dementia: Implications for retrograde memory function. Cogn. Neuropsychol., 13: 1101-1137.
Snowden, J.S., Neary, D. and Mann, D.A. (1996b). Fronto-temporal lobar degeneration: fronto-temporal dementia, progressive aphasia, semantic dementia. New York: Churchill Livingstone.
Squire, L.R. (1992). Memory and the hippocampus: a synthesis from findings with rats, monkeys, and humans. Psychol. Rev., 99: 195-231.
Tulving, E. (1983). Elements of Episodic Memory. Oxford University Press: Oxford.
Tulving, E. and Markowitsch, H.J. (1998). Episodic and declarative memory: The role of the hippocampus. Hippocampus, 8: 198-204.
Tyler, L.K. and Moss, H.E. (1998). Going, going, gone ...? Implicit and explicit tests of conceptual knowledge in a longitudinal study of semantic dementia. Neuropsychologia, 36: 1313-1323.
Warrington, E.K. (1975). The selective impairment of semantic memory. Quarterly J. Exp. Psychol., 27: 635-657.
Winocur, G. (1990). Anterograde and retrograde amnesia in rats with dorsal hippocampal or dorsomedial thalamic lesions. Behav. Brain Res., 38: 145-154.
Zola-Morgan, S., Squire, L.R. and Amaral, D.G. (1986). Human amnesia and the medial temporal region: Enduring memory impairment following a bilateral lesion limited to field CA1 of the hippocampus. J. Neurosci., 6: 2950-2967.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 4
Multimodular networks and semantic memory impairments
David Horn¹, Nir Levy¹ and Eytan Ruppin²
¹School of Physics and Astronomy, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
²Departments of Physiology & Computer Science, Sackler School of Medicine & Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
*Corresponding author. e-mail: [email protected]
Introduction
Imaging studies (Martin et al., 1995, 1996) support the notion that the knowledge of concrete objects is stored in the brain in a distributed network of cortical areas. Attributes that define an object seem to be represented close to the cortical regions that mediate perception of these attributes. Hence, if we wish to represent the memory of some object using an attractor neural network, its architecture should reflect this general structure. This we try to do in our model, in which the network is constructed of different modules, each module being assumed to represent a different cortical area. There is evidence for cortical modules in the somatosensory (Mountcastle, 1957), visual (Hubel and Wiesel, 1977) and association (Hevner, 1993) cortices. Cortical modules have also been shown to function as memory units (see Amit, 1995 for a review).
Given a multi-modular structure, one may ask how easy it is to store in it memory patterns that are coded in a variable number of modules. This problem was first introduced by Lauro-Grotto et al. (1994). They found that they had to define separately the category to which each memory belongs, thus determining its level of modular coding, and then to modulate synapses and neuronal thresholds during retrieval. This process led Lauro-Grotto et al. (1994) to conclude that storage and retrieval of memories that are coded in a variable number of modules must involve consciousness.
In a recent work (Levy et al., 1999), we introduced a low-level mechanism that solves this problem. It is based on a functional distinction between intra-modular and inter-modular synaptic couplings. The post-synaptic currents induced by inter-modular synapses are supposed to undergo additional dendritic non-linear processing before reaching the soma. This new squashing function on the inputs coming from other modules eliminates the difference between contributions to the postsynaptic potential generated by memories with different levels of activity. The biological motivation for intra/inter-modular synaptic segregation hinges on the observation that neurons from distant modules synapse onto the distal part of the dendritic tree (Hetherington and Shapiro, 1993; Yuste et al., 1994; Markram et al., 1997).
Given this structure we can readily investigate a well-known psychological problem, namely category-specific impairments in the presence of focal damage, such as in deep acquired dyslexia (Warrington and Shallice, 1984). It is quite easy to understand in our model why memories with higher activity are less susceptible to damage. Following Hinton and Shallice (1991), one may see this as the reason for the higher resilience of memories of natural objects to focal damage. A recent investigation reports a reverse situation observed for patients with acute Alzheimer’s disease, for whom artifact memories are more robust (Gonnerman et al., 1997). We will show how this fact can also be accounted for in our model.
The multi-modular model
In this section we present the mathematical description of our model. It contains $M$ memory patterns in $L$ modules of $N$ binary neurons each. Each memory $\eta^{\mu}$ is defined on a subset of size $\Omega^{\mu}$ of the $L$ modules. We refer to $\Omega^{\mu}$ as its modular coding. The sparse coding level inside a module is $p \ll 1$. The synaptic efficacy $J_{ij}^{lk}$ between the $j$th (presynaptic) neuron from the $k$th module and the $i$th (postsynaptic) neuron from the $l$th module is chosen in a Hebbian manner [displayed equation, a sum over the $M$ memories, lost in extraction].
This synaptic matrix is a natural extension of the Tsodyks (1989) formulation to the case of a multi-modular network. The updating rule for the activity state $V_i^l$ of the $i$th binary neuron in the $l$th module is given by
$$V_i^l(t+1) = \Theta\left[h_i^l(t) - \theta_s\right], \tag{2}$$
where $\Theta[x]$ is the step function and $\theta_s$ is the neuronal threshold. The neuron’s local field, or membrane potential, has two components
$$h_i^l(t) = h_i^{l,\,\mathrm{internal}}(t) + h_i^{l,\,\mathrm{external}}(t). \tag{3}$$
The internal field is [displayed equation lost in extraction] with inhibition proportional to the total activity inside the module [displayed equation lost in extraction].
The external field is [displayed equation lost in extraction] where $\mathcal{G}[x]$ represents dendritic processing of postsynaptic currents from neurons situated in other cortical modules.
The retrieval quality at each recall trial is measured by the overlap function that describes the similarity between the final state $V$ the network converges to and the memory pattern $\eta^{\mu}$ that is cued in each trial. It is defined by [displayed equation, a normalized sum over the $L$ modules and the $N$ neurons of each module, lost in extraction] where $\Omega^{\mu}$ is the modular coding of $\eta^{\mu}$.
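To make the structure of the model concrete, here is a minimal Python sketch of one update step and of an overlap measure. Only Eq. 2, Eq. 3 and the choice $\mathcal{G}[x] = \lambda\Theta[x]$ are taken from the text. Everything else is an assumption made for illustration: the Hebbian couplings are taken as $(1/Np)\sum_{\mu}\eta_i^{\mu}\eta_j^{\mu}$, the inhibition as $\gamma$ times the module activity divided by $Np$, the dendritic threshold $\theta_d$ is subtracted inside $\mathcal{G}$, and the overlap is the fraction of the memory's active neurons that fire. The function names and the toy sizes are hypothetical, and self-couplings are not removed.

```python
import numpy as np

def step(x):
    """Heaviside step: 1 where x > 0, else 0."""
    return (np.asarray(x) > 0).astype(float)

def update(V, eta, p, theta_s, theta_d, gamma, lam):
    """One synchronous update of all modules (assumed functional forms).

    V   : (L, N) binary state, V[l, i] is neuron i of module l
    eta : (M, L, N) stored patterns, all zero in modules a memory does not use
    """
    eta = np.asarray(eta, dtype=float)
    V = np.asarray(V, dtype=float)
    norm = eta.shape[2] * p                      # assumed normalisation N * p
    # o[m, k]: number of active neurons of memory m found in module k
    o = np.einsum("mkj,kj->mk", eta, V)
    # Assumed Hebbian couplings, applied without building J explicitly
    internal = np.einsum("mli,ml->li", eta, o) / norm
    internal -= gamma * V.sum(axis=1, keepdims=True) / norm   # assumed inhibition
    o_other = o.sum(axis=1, keepdims=True) - o   # same sum restricted to k != l
    drive = np.einsum("mli,ml->li", eta, o_other) / norm
    external = lam * step(drive - theta_d)       # G[x] = lambda * Theta[x]
    return step(internal + external - theta_s)   # Eq. 2 applied to the field of Eq. 3

def overlap(V, eta_mu, p):
    """Assumed overlap: fraction of the memory's active neurons that fire."""
    eta_mu = np.asarray(eta_mu, dtype=float)
    omega = int((eta_mu.sum(axis=1) > 0).sum())  # modular coding of this memory
    return float((eta_mu * V).sum() / (p * eta_mu.shape[1] * omega))

# Toy network: L = 3 modules, N = 4 neurons, p = 0.5 (hypothetical sizes).
eta = np.zeros((2, 3, 4))
eta[0, 0] = eta[0, 1] = [1, 1, 0, 0]   # memory 0 is coded in modules 0 and 1
eta[1, 2] = [0, 0, 1, 1]               # memory 1 is coded in module 2 only
cue = eta[0].copy()
cue[1] = 0                             # module 1 receives no input cue
params = dict(p=0.5, theta_s=0.6, theta_d=0.5, gamma=0.3, lam=0.7)
V = cue
for _ in range(3):
    V = update(V, eta, **params)
print(overlap(V, eta[0], 0.5))
```

The fields are computed from module-wise pattern overlaps rather than from an explicit $(LN) \times (LN)$ weight matrix, which keeps the sketch small. The parameter values in the toy example were picked for the tiny network and are not the values used in the simulations reported here.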
The importance of non-linear dendritic processing
Most conventional neural network models of associative memory are limited to storing memories with fixed coding levels. Storing memories with variable coding levels in the same network is a considerable computational challenge. In our network, where we use a fixed coding level $p$ inside the module, this question turns into one of storing memories with different levels of modular coding (i.e. different values of $\Omega^{\mu}$, the number of modules in which memory $\mu$ is encoded). It is here that the non-linear character of the dendritic transfer function plays a crucial role. Representing it by a Heaviside threshold function has the important advantage that the neuronal input that is due to activity in other modules will always have a standard strength. Thus a neuron activated during memory retrieval feels the input of all other neurons in its own module as well as a standard strong input reflecting the fact that several other modules of the same memory pattern are active, but it is not sensitive to the exact number of other active modules.
An example is presented in Fig. 1, where we compare the case of $\mathcal{G}[x] = x$, corresponding to no dendritic processing, with the non-linear $\mathcal{G}[x] = \lambda\Theta[x]$, where $\Theta$ is the Heaviside step function. It shows the performance of the two networks when the stored memories have different levels of modular coding. We measure the network’s performance by the mean overlaps of the retrieved memory patterns. We choose the modular coding to be homogeneously distributed with equal proportions for all memories, e.g. we store 5 memories with $\Omega^{\mu} = 1$, another 5 memories with $\Omega^{\mu} = 2$, and so on. As can be seen, for the given choice of threshold parameters, the conventional network can only sustain memories with high modular coding. In order to retrieve patterns with low $\Omega^{\mu}$, non-linear inter-connections are needed.
In Levy et al. (1998) we have studied the critical capacities of three types of models: a single module with two levels of coding sparseness, a multi-modular network without dendritic processing, and one with non-linear dendritic processing. Only the last one was found to have the desired property of a uniform level of capacity for memories with different coding levels. An example of these results is shown in Fig. 2. As is evident, the non-linear modular network has a lower critical capacity for the small memories but is significantly better for the large memories.
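The difference between the two transfer functions can be illustrated with a few lines of Python. The sketch below assumes, purely for illustration, that each other active module of the retrieved memory contributes one unit of dendritic drive and that a dendritic threshold `theta_d` is subtracted before the transfer function is applied. The function name and default values are hypothetical.

```python
def external_input(n_other_active, theta_d=2.0, lam=0.7, nonlinear=True):
    """External field seen by a neuron, given the number of other active modules.

    Assumes one unit of drive per other active module (an illustrative choice).
    """
    x = float(n_other_active) - theta_d
    if nonlinear:
        return lam if x > 0 else 0.0   # G[x] = lambda * Theta[x]: standard strength
    return x                           # G[x] = x: grows with the modular coding

for n in (3, 6, 9):
    print(n, external_input(n, nonlinear=False), external_input(n))
```

With the linear choice the input keeps growing with the number of active modules, so a single threshold cannot suit both small and large memories. With the step function the input has the same strength whenever the dendritic threshold is exceeded.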
Fault tolerance to afferent focal damage
The modular organization of our network allows us to study the interesting question of fault tolerance to focal afferent damage. Assuming spatially localized damage, and setting the input cues to some of the modules to zero, we can easily obtain the pattern completion quality of our model, as demonstrated in Fig. 3.
Fig. 1. Quality of retrieval vs. memory modular coding (horizontal axis: modular coding $\Omega$; vertical axis: mean overlap). The dark shading represents the mean overlap of retrieved memories with different modular codings achieved by a network with linear synaptic couplings. The light shading represents the mean overlap of a network with non-linear processing of the inter-modular connections. The latter achieves perfect recall of all memory patterns. The simulation parameters are: $L = 10$, $N = 500$, $M = 50$, $p = 0.05$, $\lambda = 0.7$, $\theta_d = 2$ and $\theta_s = 0.6$. The encoded memories are distributed homogeneously over all possible modular codings.
The basic message from these results is that memories encoded in more modules are much more resilient to focal afferent damage than memories that are encoded in few modules. It is then interesting to make a connection with the observations of Warrington and Shallice (1984), who have shown that objects that have more semantic features are better preserved. To be specific, the comparison was made between memories of natural objects and those of artificial objects. Psychological studies have demonstrated that the former are associated with a larger number of semantic features than the latter. Based on this realization, Hinton and Shallice (1991) have constructed a neural model with various feedforward and feedback features in which they could demonstrate that memories with higher activity are less susceptible to damage. In our model this result is immediate. It follows naturally from the squashed representation of inputs from other modules.
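A small Python sketch may help to see why higher modular coding protects against afferent damage. It silences the cue in selected modules and then applies a deliberately simplified first-step criterion: a silent neuron in a damaged module can be switched on only by the dendritic signal, so the signal strength must exceed the neuronal threshold, and enough intact modules of the same memory must remain to exceed the dendritic threshold. The one-unit-of-drive-per-module normalisation, the function names and the default values are assumptions for illustration, not part of the model description above.

```python
import numpy as np

def lesion_cue(cue, damaged_modules):
    """Return a copy of the (L, N) cue with the listed modules set to zero."""
    lesioned = np.array(cue, dtype=float)
    lesioned[list(damaged_modules), :] = 0.0
    return lesioned

def completes_in_one_step(omega, n_hit, lam=0.7, theta_s=0.6, theta_d=2.0):
    """Simplified criterion for restoring the silenced modules of one memory.

    omega : modular coding of the memory
    n_hit : number of the memory's own modules that lost their input
    Assumes each intact coding module contributes one unit of dendritic drive.
    """
    intact = omega - n_hit
    return lam > theta_s and intact > theta_d

cue = np.ones((10, 4))                       # hypothetical cue: 10 modules, 4 neurons
print(lesion_cue(cue, [0, 1]).sum(axis=1))   # modules 0 and 1 are silent
# Modular codings that survive when two of the memory's modules are hit:
print([w for w in range(1, 11) if completes_in_one_step(w, n_hit=min(w, 2))])
```

Under these assumptions only memories with enough remaining modules can rebuild the missing parts, which is the qualitative point made in the text.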
Fig. 2. Critical capacities for conventional (single module), linear and non-linear modular networks storing two populations of memories (horizontal axis: number of firing neurons in a memory pattern; vertical axis: critical capacity). All three networks have 5000 neurons, and the activities of the two memory populations are 50 and 250 neurons, respectively (that is, diffused activities of 0.01 and 0.05 in the single-module network and two populations of $\Omega = 1, 5$ and $p = 0.05$ in the modular networks). Other parameters are $L = 5$, $\theta = \theta_s = 0.7$, $\theta_d = 2$ and $\lambda = 0.5$.
Fig. 3. Performance of networks with different modular codings for different cases of focal afferent damage. The curves show the average overlaps obtained when different numbers of modular inputs were left intact.
Fault tolerance to dendritic weakening
The analysis in the previous section was carried out under conditions in which afferents were damaged but all lateral connections were left intact. Obviously there are many kinds of damage that can be inflicted on neuronal systems, and there exist natural ways for the system to defend itself against most of them (see Horn et al., 1998 for a discussion of neuronal regulation, a mechanism that has obtained recent experimental support (Turrigiano et al., 1998)). In this section we would like to point out one specific problem that may be easily parametrized and studied in our model: the weakening of the dendritic transport function. In our model this is simply represented by the parameter $\lambda$ that multiplies the Heaviside step function describing non-linear dendritic processing. Obviously, the resilience of memories with high modular coding, described in the previous section, crucially depends on $\lambda$ being large, leading to a strong signal from other activated modules. If, however, $\lambda$ is weakened, the situation will be reversed. Highly coded memories will lose their resilience to afferent focal damage. In fact, low-coded memories then have a higher chance of survival. An example of this state of affairs is shown in Fig. 4. Starting out with high $\lambda$ values, high-coded memories show their robustness in a situation in which two (of ten) modules lose their inputs. However, when $\lambda$ decreases below some threshold (0.6 in this example), and memory reconstruction is demanded above some overlap (0.7 here), low-coded memories will outperform high-coded ones on average.
This may explain an interesting observation in patients with Alzheimer’s disease (Gonnerman et al., 1997). It seems that for low-damage patients memories of natural items are better retained than artifacts, while for high-damage patients artifact memories are more robust. The transition between the two groups of patients looks like the effect described here.
Fig. 4. For $\lambda = 1$ low-coded memories ($\Omega = 1, 2, 3$) perform worse than high-coded memories ($\Omega = 8, 9, 10$) when two modular afferents are absent. This situation is reversed when the strength of the signal transmitted from the apical dendrites becomes too weak.
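The reversal described in this section can be reproduced qualitatively with a toy probability calculation in Python. This is not the simulation behind Fig. 4. It assumes that two of ten modules are damaged at random, that a memory untouched by the damage is always retrieved, and that a memory that is hit is retrieved only if the dendritic signal $\lambda$ exceeds the neuronal threshold and more intact coding modules remain than the dendritic threshold. The function name, the one-unit-per-module normalisation and the threshold values are illustrative assumptions.

```python
from math import comb

def retrieval_probability(omega, lam, L=10, n_damaged=2, theta_s=0.6, theta_d=2.0):
    """Toy probability that a memory of modular coding omega is retrieved.

    The n_damaged silenced modules are drawn at random from the L modules.
    """
    total = comb(L, n_damaged)
    prob = 0.0
    for hit in range(0, min(omega, n_damaged) + 1):
        # number of damage patterns that hit exactly `hit` of the memory's modules
        ways = comb(omega, hit) * comb(L - omega, n_damaged - hit)
        intact = omega - hit
        completed = lam > theta_s and intact > theta_d
        if hit == 0 or completed:
            prob += ways / total
    return prob

for lam in (1.0, 0.5):
    low = sum(retrieval_probability(w, lam) for w in (1, 2, 3)) / 3
    high = sum(retrieval_probability(w, lam) for w in (8, 9, 10)) / 3
    print(lam, round(low, 3), round(high, 3))
```

In this toy calculation low-coded memories are retrieved mainly because the damage often misses them, which does not depend on $\lambda$, whereas high-coded memories are almost always hit and rely entirely on the dendritic signal. Weakening $\lambda$ below the neuronal threshold therefore reverses the ranking of the two groups.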
References
Amit, D.J. (1995) The Hebbian paradigm reintegrated: local reverberations as internal representations, Behav. Brain Sci., 18: 617.
Gonnerman, L.M., Andersen, E.S., Devlin, J.T., Klemper, D. and Seidenberg, M.S. (1997) Double dissociations of semantic categories in Alzheimer’s disease, Brain and Language, 57(2): 254-279.
Hetherington, P.A. and Shapiro, L.M. (1993) Simulating Hebb cell assemblies: the necessity for partitioned dendritic trees and a post-not-pre LTD rule, Network, 4: 135-153.
Hevner, R.F. (1993) More modules, TINS, 16(5): 178.
Hinton, G.E. and Shallice, T. (1991) Lesioning an attractor network: investigations of acquired dyslexia, Psycholog. Rev., 98(1): 74-95.
Horn, D., Levy, N. and Ruppin, E. (1998) Memory maintenance via neuronal regulation, Neural Comput., 10: 1-18.
Hubel, D.H. and Wiesel, T.N. (1977) Functional architecture of macaque visual cortex, Proc. R. Soc. Lond., B, 198: 1-59.
Lauro-Grotto, R., Reich, S. and Virasoro, M.A. (1994) The computational role of conscious processing in a model of semantic memory. In: Proceedings of the IIAS Symposium on Cognition Computation and Consciousness.
Levy, N., Horn, D. and Ruppin, E. (1998) Associative memory in a multimodular network, Neural Comp., to be published.
Markram, H., Lubke, J., Frotscher, M., Roth, A. and Sakmann, B. (1997) Physiology and anatomy of synaptic connections between thick tufted pyramidal neurones in the developing rat neocortex, J. Physiol., 500(2): 409-440.
Martin, A., Haxby, J.V., Lalonde, F.M., Wiggs, C.L. and Ungerleider, L.G. (1995) Discrete cortical regions associated with knowledge of color and knowledge of action, Science, 270: 102-105.
Martin, A., Wiggs, C.L., Ungerleider, L.G. and Haxby, J.V. (1996) Neuronal correlates of category-specific knowledge, Nature, 379: 649-652.
Mountcastle, V.B. (1957) Modality and topographic properties of single neurons of cat’s somatic sensory cortex, J. Neurophysiol., 20: 408-434.
Tsodyks, M.V. (1989) Associative memory in neural networks with the hebbian learning rule, Modern Phys. Lett. B, 3(7): 555-560.
Turrigiano, G.G., Leslie, K.R., Desai, N.S., Rutherford, L.C. and Nelson, S.B. (1998) Activity-dependent scaling of quantal amplitude in neocortical neurons, Nature, 391: 892-895.
Warrington, E.K. and Shallice, T. (1984) Category specific semantic impairments, Brain, 107: 829-854.
Yuste, R., Gutnick, M.J., Saar, D., Delaney, K.R. and Tank, D. (1994) Ca2+ accumulation in dendrites of neocortical pyramidal neurons: an apical band and evidence for two functional compartments, Neuron, 13: 23-43.
SECTION II
Neuropsychology
CHAPTER 5
Understanding failures of learning: Hebbian learning, competition for representational space, and some preliminary experimental data
James L. McClelland¹,²,*, Adam G. Thomas¹,³, Bruce D. McCandliss and Julie A. Fiez
¹Center for the Neural Basis of Cognition
²Department of Psychology, Carnegie Mellon University, Philadelphia, PA, USA
³Department of Neuroscience, University of Pittsburgh, Philadelphia, PA, USA
⁴Learning Research and Development Center and Department of Psychology, University of Pittsburgh, Philadelphia, PA, USA
Introduction The availability of powerful learning algorithms such as back propagation has created a situation in which we now know how to teach neural networks many complex things. Models that use back propagation have been used to account for development and learning in a wide range of task situations. For example, there are successful models of the acquisition of word reading skill (Sejnowslu and Rosenberg, 1987; Seidenberg and McClelland, 1989; Plaut et al., 1996), of physical knowledge such as object permanence (Munakata et al., 1997), and of conceptual knowledge (Hinton, 1989; Rumelhart, 1990) such as kinship relations and the natural kind of hierarchy. These models raise the question, why is it that we sometimes fail to learn from experience? Two cases of failure of learning motivated the present analysis: 1. Many models, including the model of McClelland et al. (1995), assume that after the onset of amnesia, gradual learning in spared *Corresponding author. Tel.: + 1 (412)-268-3157; Fax: + 1 (412)-268-5060; e-mail: [email protected]
neocortical areas is possible. This assumption is based on the fact that amnesic subjects show considerable spared learning ability, particularly when given repeated exposure to items, and plays a prominent role in most models y of temporally graded retrograde amnesia (Milner, 1989; McClelland et al., 1983; Alvarez and Squire, 1994). Yet they are virtually completely incapable of learning a set of arbitrary paired associates using standard paired-associate learning methods, even with massive repetition. This is mentioned in passing in several case reports, but detailed documentation is lacking. However, a similar failure is reported by Gabrieli et al. (1988). They tried to teach HM the meanings of eight new words, one of which he already knew in advance. Through hundreds of trials over many sessions with several task variants, he failed to learn the meanings of any of the seven words he did not already know. 2. Even in normal adults, there can be cases of failure of learning. For example, when Japanese adults come to the United States, they often have great difficulty discriminating between Y and 1; while there is some evidence of slight improvement over time, it is very gradual, and difficulties can persist indefinitely. Yet adults are capable of
learning many new skills, and indeed it seems likely that the cortical mechanisms thought to underlie spared learning in amnesics are available for skill learning in normal subjects. Why then are perceptual discriminations between sounds not contrasted in one’s own native language so difficult to acquire?

One idea that could account for these difficulties comes from a consideration of the Hebbian learning rule. According to Hebb, if one neuron takes part in firing another, the strength of the connection between them will be increased. What this means is that if an input elicits a pattern of neural activity, Hebbian learning will tend to strengthen the tendency to elicit the same pattern of activity on subsequent occasions. That is, if learning in the brain is Hebbian, then learning will tend to strengthen whatever response the brain makes to its inputs. If the response is useful and constructive, the brain will learn to reinforce it. If the response is inappropriate or undesirable, Hebbian learning will still tend to reinforce it. This leads to the suggestion that many failures of learning in adulthood may reflect a paradoxical tendency of the mechanisms of learning to reinforce inappropriate or undesirable responses.

We can now examine why paired-associate learning may be difficult in amnesics. The subject receives a list of, say, 12 word pairs (including, for example, LOCOMOTIVE-DISHTOWEL and TABLE-BANANA, among others). After a slight delay, the experimenter presents the first word in one of the pairs, and asks the subject to recall the word that was previously paired with it in the experiment. Due to the subject’s amnesia, he may not remember even that there was a list of word pairs. Nevertheless, as is standard in paired-associate learning, the subject is encouraged to guess a response. Given the arbitrary pairing of the words, BANANA is unlikely to come to mind in the context of TABLE as a cue, and so the stimulus is likely to elicit some other response.
If learning is Hebbian, it is this response that will be strengthened, thereby leading to interference. There is experimental support for the idea that forcing amnesics to make their own responses to items leads to interference. Baddeley and Wilson (1994) contrasted two conditions for teaching
subjects to recall a particular whole word (e.g. QUOTE) from a part-word cue (e.g. QU). In the errorless condition, the experimenter said: “I’m thinking of a word beginning with QU. The word is QUOTE. Please write it down.” In the errorful condition, the experimenter said: “I am thinking of a word beginning with QU. Can you guess what it is?” Subjects generally made several incorrect guesses, and in fact the experimenter could switch the ‘correct’ answer to ensure that no correct guesses were made on the first occasion. After the guessing, the experimenter said: “The word I was thinking of is QUOTE, please write it down.” Thus in both conditions, subjects wrote down the experimenter’s word at the end of the presentation. This procedure was repeated three times with several different words in each condition. Subsequently, subjects were tested for their ability to remember the experimenter’s words. The amnesic subjects scored about 30% correct in the errorful condition, and about 70% correct in the errorless condition, indicating that encouraging them to guess their own responses produced massive interference. Control subjects did far better in both conditions and showed much less interference from their own guesses. Another experiment making similar points was reported by Hayman et al. (1993).
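The Hebbian reading of this result can be illustrated with a toy simulation. The sketch below is not the authors' model: it assumes a single cue with one connection weight per candidate response, a fixed learning rate, and a hypothetical competing guess (QUEEN); every numerical value is illustrative only.

```python
def train(weights, target, n_trials, rate, errorful):
    """Hebbian strengthening of cue -> response weights (cue activation fixed at 1)."""
    w = dict(weights)
    for _ in range(n_trials):
        if errorful:
            # The subject first guesses: the response the cue currently elicits
            # is active, so Hebbian learning strengthens it.
            guess = max(w, key=w.get)
            w[guess] += rate
        # In both conditions the experimenter's word is then written down,
        # so it is active together with the cue and is strengthened as well.
        w[target] += rate
    return w


def recall(w):
    """The recalled word is the response with the strongest weight from the cue."""
    return max(w, key=w.get)


# Hypothetical starting state: a pre-existing associate of "QU" is stronger than the target.
initial = {"QUOTE": 0.1, "QUEEN": 0.5}

w_errorless = train(initial, "QUOTE", n_trials=3, rate=0.2, errorful=False)
w_errorful = train(initial, "QUOTE", n_trials=3, rate=0.2, errorful=True)

print("errorless recall:", recall(w_errorless))
print("errorful recall:", recall(w_errorful))
```

Under these assumed values the target overtakes the prior associate only when the subject's own guess is never produced; when guessing is allowed, the guess is strengthened on every trial and keeps its lead.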
Modeling studies

The focus of the research we report here has been the failure of Japanese adults to learn the discrimination between r and l. We have taken a two-pronged attack, combining computational modeling with experimental studies on Japanese adults who show considerable difficulty discriminating r and l. The modeling work (Thomas and McClelland, 1997) began with an effort to demonstrate how a Hebbian-like learning mechanism could lead to failure to learn to discriminate two similar inputs (abstract proxies for r and l), once that ability had been lost by ‘rearing’ the network in an environment providing only a single input in that region of perceptual space. For this purpose we used a variant of the Kohonen network architecture (Fig. 1).

Fig. 1. A diagram of the network model used in our simulations. The model is a variant of the self-organizing map model of Kohonen (1982, 1990).

There were two layers, each with 49 units, arranged in a 7 × 7 array. These layers were called the ‘input’ and the ‘representation’ layer, respectively. Initial random feed-forward projections were loosely topographic. On presentation of an input, the representation unit receiving the strongest net input was chosen as the winning representation unit, and it and its neighbors were assigned activation values equal to a Gaussian function of distance from the winner. Weights coming into representation units were then adjusted according to a variant of the competitive learning rule:
$$\Delta w_{rs} = \epsilon \, a_r (a_s - w_{rs}) \qquad (1)$$

Here $a_r$ is the activation of the receiving or representation unit, $a_s$ is the activation of the sending or input unit, $w_{rs}$ is the weight to the former from the latter, and $\epsilon$ is a learning-rate constant. This rule has a Hebbian component (the product $a_r a_s$) together with a tendency for weights to decay in proportion to the product of the activation of the receiving unit and the current value of the weight (the product $a_r w_{rs}$). Inputs consisted of Gaussian blobs of activity. Two training conditions were used. In both, there were four corner inputs (Fig. 2A). These occupy the four corners of the input space, and correspond to background phonemes. In one training condition
(the ‘English-like’ training condition) there were two additional overlapping inputs (Fig. 2B), proxies for r and l respectively. Given the parameters used, more than 90% of networks trained in this condition successfully learned to assign distinct representations to the two overlapping stimuli, just as most children in English-speaking countries naturally learn to discriminate stimuli in their native language. In another training condition (the ‘initially Japanese-like’ training condition), there were initially just the four corner inputs and just one other, centered input (Fig. 2C) between the two overlapping inputs used in the English-like condition. After 300 epochs, the networks were switched from the Japanese-like environment to the English-like environment. In this case, all networks learned initially to assign a single representation to the Japanese-like, centered input. Crucially, none of these networks subsequently learned to assign different representations to the two overlapping English-like inputs. They retained their tendency to treat these inputs as the same, even though the mechanisms of plasticity operated without any changes throughout the simulation.
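The update cycle described above (choose a winner, assign Gaussian activations around it, then apply the learning rule of Eq. 1) can be sketched as follows. This is a minimal illustration rather than the authors' simulation code: the learning rate, the blob and neighborhood widths, the noise level, and the way the loosely topographic initial weights are built are all assumptions.

```python
import numpy as np

SIDE = 7  # both layers are 7 x 7 arrays of 49 units
coords = np.array([(i, j) for i in range(SIDE) for j in range(SIDE)], dtype=float)


def gaussian_blob(center, width=1.0):
    """Activation over the 7 x 7 grid: a Gaussian of distance from `center`."""
    d2 = ((coords - np.asarray(center, dtype=float)) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))


def init_weights(rng, noise=0.1, width=2.0):
    """Assumed initialisation: each representation unit prefers nearby inputs, plus noise."""
    w = np.stack([gaussian_blob(c, width) for c in coords])  # w[r, s]
    return w + noise * rng.random(w.shape)


def train_step(w, a_s, rate=0.1, neighborhood=1.0):
    """One presentation: pick the winner, set activations, apply Eq. 1 in place."""
    winner = int(np.argmax(w @ a_s))                    # strongest net input wins
    a_r = gaussian_blob(coords[winner], neighborhood)   # winner and its neighbors
    w += rate * a_r[:, None] * (a_s[None, :] - w)       # delta w_rs = rate * a_r * (a_s - w_rs)
    return winner


rng = np.random.default_rng(0)
w = init_weights(rng)
central_input = gaussian_blob((3, 3))  # stand-in for the single centered input
for _ in range(20):
    winner = train_step(w, central_input)
print("winning representation unit:", coords[winner])
```

Because the winner has activation 1, each step moves its incoming weight vector a fraction `rate` of the way toward the current input, which is the sense in which the rule reinforces whatever response the network already makes.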
Fig. 2. Examples of the input patterns used in training and testing the network. (A) The four corner inputs. (B) The two overlapping inputs. (C) The single central input. (D) Exaggerated versions of the overlapping inputs used in ‘remediation’ of the network.
Thus far, our work supports the idea that discriminations that can be learned if the distinction is present in a network’s initial environment may not be learned when the distinction is not introduced until after the network’s response tendencies become established. This provides one way of accounting for the loss of plasticity seen in Japanese adults. Obviously, other factors could be at work, including possibly a general reduction in plasticity with age. If the mechanisms considered here are even part of the story, they predict that we may be able to induce plasticity in Japanese adults. First, we consider the use of exaggerated inputs as a method for inducing plasticity. To illustrate this in the simulation, we added two additional inputs to the English-like environment. These were exaggerated versions of the r- and l-like stimuli. Networks that failed to learn to discriminate the r- and l-like stimuli in the initially Japanese-like condition learned to discriminate these stimuli in only a few epochs after the exaggerated stimuli were included in the training set (Fig. 2D).
The idea that the use of exaggerated stimuli could induce plasticity is consistent with the findings reported by Merzenich, Tallal, and their colleagues (Merzenich et al., 1996; Tallal et al., 1996). They showed that they could remediate children with language impairments when they used a training regime that exaggerated contrasts between plosive stops and other sounds differing by rapid transitions (see also Alexander and Frost, 1982). We wanted to show that plasticity was still present in adults, and to test the role of exaggeration. For this purpose, McCandliss et al. (1998) developed a set of two speech continua, one spanning from ‘rock’ to ‘lock’ and one spanning from ‘road’ to ‘load’. Starting with natural speech tokens generated by a native English speaker, eighty-item continua were constructed for each contrast, ranging from highly exaggerated tokens of ‘lock’ or ‘load’ to highly exaggerated tokens of ‘rock’ or ‘road’. University of Pittsburgh Undergraduates showed very clean categorization of the stimuli on each continuum; in each case, only 10 steps on the continuum lay in a gray zone,
separating those items the native English speakers reliably heard as l from those they reliably heard as r.
Experimental investigations

Eight subjects whose initial discrimination of r and l stimuli was quite poor were tested in the adaptive condition of this experiment. Each subject was trained on one of the two continua. Highly exaggerated tokens of r and l stimuli were used initially, near the extreme ends of the continuum. The two selected stimuli were presented in random order, and the subject was simply required to press one button if the stimulus began with r and another if it began with l. Whenever the subject made an error, the task was made easier, by replacing the stimulus with the next more exaggerated one, until the extremes of the continuum were reached. Whenever the subject performed correctly on eight trials in a row, the task was made harder, by replacing the r or the l with the next less exaggerated item. Half the subjects received feedback after each trial, and half received no feedback. All of the subjects showed substantial improvement within three twenty-minute sessions, and all showed marked improvement compared to pre-test performance in a subsequent post-test.

Eight other subjects selected according to the same criteria participated in the set training condition of the experiment. For this condition, the r and l stimuli just at the edge of the native English speakers’ gray zone were used throughout the experiment. Otherwise the experiment was identical to the adaptive condition, with half of the subjects receiving feedback and half receiving no feedback. We had initially expected that the subjects would generally fail to learn the discrimination, but this expectation was only partially confirmed. The two out of the four subjects in the no-feedback condition whose performance was initially the worst did fail to learn. If anything, these subjects became less able to distinguish the stimuli over the course of the experiment, in accordance with the Hebbian hypothesis.
However, the other two subjects in the no-feedback condition, whose initial ability to distinguish the r-l stimuli was somewhat better, showed rapid learning, with strong gains on the post-test. These findings suggest that as long as the stimuli are even only partially discriminable, the mechanisms of learning will successfully pull them apart.

Initially, we were puzzled by the fact that these subjects could learn so rapidly using very difficult stimuli without feedback. If they could learn this rapidly in our experiment, why had they not mastered the discrimination from exposure to natural speech? A possible explanation comes from an aspect of the model that we now feel may be at least as important as the use of Hebbian learning. This is the fact that, in a Kohonen network, patterns compete for space. The outcome of the competition depends on similarity, frequency of presentation, and existing conditions. Under natural conditions, we suggest that r and l must compete for space with many other stimuli. This happens in our simulations, where corner inputs compete for space with the overlapping inputs. Under these circumstances, if existing conditions are such that the overlapping stimuli are treated the same, the competition from the corner stimuli helps to maintain this. However, if training is focused only on the overlapping stimuli, the model will learn to separate them. Any initial difference in the response to the stimuli will be capitalized on very rapidly, leading to a rapid separation of the response to these two inputs. This is consistent with the conditions of our experiment, in which subjects were allowed to focus only on the distinction between r and l in a single contrast. Those subjects who had some initial ability to discriminate the stimuli rapidly learned to distinguish them.
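The adaptive procedure used in the first condition of this experiment amounts to a simple staircase over exaggeration levels. The sketch below is only an illustration of that rule: the integer coding of exaggeration (0 for the least exaggerated item used, `max_level` for the extreme of the continuum), the function name, and the choice to reset the run of correct responses after every change are assumptions, not details given in the text.

```python
def update_level(level, streak, correct, max_level, run_length=8):
    """Return (new_level, new_streak) after one trial of the adaptive procedure.

    level  -- current exaggeration level (higher = more exaggerated = easier)
    streak -- number of consecutive correct responses so far
    """
    if not correct:
        # Error: move to the next more exaggerated item, up to the extreme.
        return min(level + 1, max_level), 0
    streak += 1
    if streak == run_length:
        # Eight correct in a row: move to the next less exaggerated item.
        return max(level - 1, 0), 0
    return level, streak


# Hypothetical run: eight correct responses, one error, then eight more correct.
level, streak = 5, 0
responses = [True] * 8 + [False] + [True] * 8
for correct in responses:
    level, streak = update_level(level, streak, correct, max_level=10)
print("final exaggeration level:", level)
```

With this rule the difficulty tracks the subject's current ability, so the stimuli stay near the boundary of what the subject can discriminate.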
Conclusions

The research reviewed above indicates that our initial suggestion that learning may rely on a Hebbian process may be only part of the story. It appears that competition for space in representations may also be relevant to understanding cases in which learning fails. To test this, we currently plan additional experiments in which we will vary the number of different stimuli the subjects must distinguish. According to the model, if several other stimuli ending in, say, ‘-ock’ are included at the same time in addition to ‘rock’ and ‘lock’, the
competition from the others for representational space should greatly retard learning of the r-l discrimination.

Thus far we have considered only the results for the subjects who received no feedback in the set training condition of the experiment. Four other subjects in this condition did receive feedback, and all four of them showed rapid improvement. This indicates that in fact, outcomes can make a difference to learning, and that the Hebbian account of learning is at best incomplete. Current work in our group is exploring ways in which the existing model might be elaborated to take account of these aspects of the findings of this experiment.

While feedback does appear to play a role, it is our view that such feedback is very rarely available in natural learning situations. True, a Japanese adult may be able to use context to determine whether an English speaker intends to refer to a rock or a lock. However, such context cannot unambiguously tell us that the words for rocks and locks are different. There are clearly words within both English and Japanese that sound the same but refer to completely different things in different contexts (consider the English words ‘bat’ and ‘ball’, each of which has at least two apparently completely unrelated meanings). Thus our future efforts will consider how feedback might modulate and enhance learning that might otherwise depend crucially on the competitive and Hebbian characteristics found in the Kohonen net.
References

Alexander, D.W. and Frost, B.P. (1982) Decelerated synthesized speech as a means of shaping speed of auditory processing of children with delayed language. Percept. Mot. Skills, 55: 783-792.
Alvarez, P. and Squire, L.R. (1994) Memory consolidation and the medial temporal lobe: A simple network model. Proc. Nat. Acad. Sci., USA, 91: 7041-7045.
Baddeley, A. and Wilson, B.A. (1994) When implicit learning fails: Amnesia and the problem of error elimination. Neuropsychologia, 32: 53-68.
Gabrieli, J.D.E., Cohen, N.J. and Corkin, S. (1988) The impaired learning of semantic knowledge following bilateral medial temporal-lobe resection. Brain and Cognition, 7: 157-177.
Hayman, C.A.G., Macdonald, C.A. and Tulving, E. (1993) The role of repetition and associative interference in new semantic learning in amnesia: A case experiment. J. Cogn. Neurosci., 5: 375-389.
Hinton, G.E. (1989) Learning distributed representations of concepts. In: R.G.M. Morris (Ed.), Parallel Distributed Processing: Implications for Psychology and Neurobiology (pp. 46-61). Oxford, England: Clarendon Press.
Kohonen, T. (1982) Self-organized formation of topologically correct feature maps. Biolog. Cybernet., 43: 59-69.
Kohonen, T. (1990) The self-organizing map. Proc. IEEE, 78: 1464-1480.
McCandliss, B.D., Fiez, J.A., Conway, M., Protopapas, A. and McClelland, J.L. (1998) Eliciting adult plasticity: both adaptive and non-adaptive training improves Japanese adults identification of English r and l. Soc. Neurosci. Abstr., 24: 1898.
McClelland, J.L., McNaughton, B.L. and O’Reilly, R.C. (1983) Why do we have a special learning system in the hippocampus? (Abstract 580). Bull. Psychonomic Soc., 31: 404.
McClelland, J.L., McNaughton, B.L. and O’Reilly, R.C. (1995) Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psycholog. Rev., 102: 419-457.
Merzenich, M.M., Jenkins, W.M., Johnson, P., Schreiner, C., Miller, S.L. and Tallal, P. (1996) Temporal processing deficits of language-learning impaired children ameliorated by training. Science, 271: 77-81.
Milner, P. (1989) A cell assembly theory of hippocampal amnesia. Neuropsychologia, 27: 23-30.
Munakata, Y., McClelland, J.L., Johnson, M.H. and Siegler, R. (1997) Rethinking infant knowledge: Toward an adaptive process account of successes and failures in object permanence tasks. Psycholog. Rev., 104: 686-713.
Plaut, D.C., McClelland, J.L., Seidenberg, M.S. and Patterson, K.E. (1996) Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psycholog. Rev., 103: 56-115.
Rumelhart, D.E. (1990) Brain style computation: Learning and generalization. In: S.F. Zornetzer, J.L. Davis and C. Lau (Eds.), An Introduction to Neural and Electronic Networks (San Diego, CA: Academic Press, pp. 405-420).
Seidenberg, M.S. and McClelland, J.L. (1989) A distributed, developmental model of word recognition and naming. Psycholog. Rev., 96: 523-568.
Sejnowski, T.J. and Rosenberg, C.R. (1987) Parallel networks that learn to pronounce English text. Complex Systems, 1: 145-168.
Tallal, P., Miller, S.L., Bedi, G., Byma, G., Wang, X., Nagaraja, S.S., Schreiner, C., Jenkins, W.M. and Merzenich, M.M. (1996) Language comprehension in language-learning impaired children improved with acoustically modified speech. Science, 271: 81-84.
Thomas, A. and McClelland, J.L. (1997) How plasticity can prevent adaptation: Induction and remediation of perceptual consequences of early experience (abstract 97.2). Soc. Neurosci. Abstr., 23: 234.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121
© 1999 Elsevier Science BV. All rights reserved
CHAPTER 6
Frames of reference in hemineglect: a computational approach
A. Pouget¹,*, S. Deneve¹ and T.J. Sejnowski²
¹Brain and Cognitive Science Department, University of Rochester, Rochester, NY 14627, USA
²Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA and Department of Biology, University of California, San Diego, La Jolla, CA 92093, USA
Unilateral lesions of the parietal lobe in humans often lead to a syndrome known as hemineglect (Heilman et al., 1985; Pouget and Driver, 1999). These patients experience difficulty processing or reacting to stimuli located in the hemispace contralateral to their lesion. For instance, after a right parietal lesion, the case we consider here, patients fail to eat food located on the left side of their
plates and forget to dress on the left side of their bodies. When asked to cross out line segments displayed on a page, a clinical test known as line cancellation, they fail to cross out the lines located on the left side of the page (Fig. 1). Although it is clear that hemineglect patients have problems with the left side of space, the frame of reference of this left hemispace has remained difficult to delineate. In principle, these patients could be neglecting only objects falling on their right retinal hemifield, in which case hemineglect
*Corresponding author. e-mail: alex@bcs.rochester.edu
Fig. 1. A. Typical result for a line cancellation test in a left neglect patient. B. Percentage of correct identification in the Karnath et al. experiment (1993). In condition 1 (C1), subjects were seated with eyes, head, and trunk lined up whereas in condition 2 (C2) the trunk was rotated by 15° to the left. The overall pattern of performance is not consistent with pure retinal or pure head-centered neglect and suggests a deficit affecting a mixture of these two frames of reference. C. The two displays used in the Driver et al. (1994) experiment. Patients must detect a gap in the upper part of the central triangle. In the top (resp. bottom) display, the object made out of the triangles is perceived as rotated 60 degrees clockwise (resp. counterclockwise). Left neglect patients detect the gap more reliably in the bottom display, i.e., when the gap is associated with the right side of the object.
would be purely retinocentric. Alternatively, they could be neglecting objects located on the left of their trunk, which would correspond to trunk-centered neglect. Other frames of reference such as head-centered or environment-centered (allocentric) are also possible.

These various possibilities can be disentangled by fixing the position of the stimulus in one frame of reference while testing the subject in multiple postures. An experiment by Karnath et al. (1991) provides a good example of this strategy. Subjects were asked to identify a stimulus that could appear on either side of the fixation point. In order to test whether the position of the stimuli with respect to the body affects performance, two conditions were tested: a control condition with head straight ahead (C1) and a second condition with head rotated 15° to the right (where right is defined with respect to the trunk) or, equivalently, with the trunk rotated 15° to the left (where left is defined with respect to the head) (see Fig. 1B, C2). In C2, both stimuli occurred further to the right of the trunk than in C1, though at the same location with respect to the head and retina. Moreover, the trunk-centered position of the left stimulus in C2 was the same as the trunk-centered position of the right stimulus in C1. As expected, subjects with right parietal lesions performed better on the right stimulus in the control condition (C1), a result consistent with both retinotopic and trunk-centered neglect. However, to distinguish between the two frames of reference, performance should be compared across conditions. If the deficit was purely retinocentric, the results should be identical in both conditions since the retinotopic locations of the stimuli do not vary. On the other hand, if the deficit is purely trunk-centered, the performance on the left stimulus should improve when the head is turned right since the stimulus now appears further toward the right of the trunk-centered hemispace.
Furthermore, performance on the right stimulus in the control condition should be the same as performance on the left stimulus in the rotated condition since they share the same trunk-centered position in both cases. Neither of these hypotheses is fully consistent with the data. As expected from retinotopic neglect, subjects always performed better on the right
stimulus in both conditions. However, performance on the left stimulus improved when the head was turned right (C2), though not sufficiently to match the level of performance on the right stimulus in the control condition (C1, Fig. 1B). Therefore, these results suggest a retinotopically based form of neglect modulated by trunk-centered factors. In addition, Karnath et al. (1991) tested patients on a similar experiment in which subjects were asked to generate a saccade toward the target. The analysis of reaction time revealed the same type of results as the one found in the identification task, thereby demonstrating that the spatial deficit is, to a first approximation, independent of the task. Several other experiments have found that neglect affects a mixture of frames of reference in a variety of tasks (Bisiach et al., 1985; Calvanio et al., 1987; Ladavas, 1987; Ladavas et al., 1989; Farah et al., 1990; Behrmann and Moscovitch, 1994).

At first glance, these results would appear to be consistent with current views regarding spatial representations in the parietal cortex. Several authors argue that the parietal cortex contains a mosaic of cortical areas each encoding the location of an object in one particular frame of reference and each involved in the control of one particular behavior (Goldberg, 1990b; Stein, 1992; Colby, 1998). Large lesions, which are unfortunately quite common, would be expected to encompass many of these areas and the resulting neglect would affect multiple frames of reference. However, neglect would also be expected to be behavior specific. For instance, it has been suggested that cells in LIP use retinotopic coordinates for the control of saccadic eye movements (Goldberg et al., 1990) whereas cells in VIP use head-centered receptive fields for the control of reaching toward the face (Graziano et al., 1994; Duhamel et al., 1997).
A lesion of these modules would affect both behaviors but the deficit would be retinotopic for eye movements and head-centered for reaching. More generally, for any particular behavior, the deficit should be confined to one particular set of coordinates. The fact that Karnath et al. (1993) found that a mixture of frames of reference was affected for different tasks, verbal report vs. saccades, is incompatible with this prediction.
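The logic of the two-posture comparison can be made concrete by writing out the stimulus coordinates in each frame. The sketch below is only a worked illustration: the eccentricity of 7.5° is an assumption (it is the value for which a 15° rotation makes the left stimulus in C2 coincide with the right stimulus in C1 in trunk-centered coordinates, given stimuli symmetric about fixation), positive angles denote rightward positions, and the eyes are assumed to be centered in the head so that retinal and head-centered positions coincide.

```python
ECCENTRICITY = 7.5                        # assumed stimulus eccentricity (deg)
HEAD_ON_TRUNK = {"C1": 0.0, "C2": 15.0}   # head rotation relative to the trunk (deg)


def stimulus_positions(condition):
    """Return (retinal, trunk-centered) positions of the left and right stimuli."""
    retinal = {"left": -ECCENTRICITY, "right": +ECCENTRICITY}
    # Trunk-centered position = head-centered position + head-on-trunk angle.
    trunk = {side: pos + HEAD_ON_TRUNK[condition] for side, pos in retinal.items()}
    return retinal, trunk


for condition in ("C1", "C2"):
    retinal, trunk = stimulus_positions(condition)
    print(condition, "retinal:", retinal, "trunk-centered:", trunk)
```

A purely retinocentric deficit predicts equal performance wherever the retinal positions are equal (left in C1 and left in C2), whereas a purely trunk-centered deficit predicts equal performance wherever the trunk-centered positions are equal (left in C2 and right in C1). The reported pattern falls between these two predictions.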
In this chapter, we suggest a different explanation for the mixture of frames of reference found in hemineglect subjects. Our approach is based on interpreting the responses of parietal neurons as basis functions of their sensory inputs. We show that basis function decomposition is a computationally efficient way to represent the position of objects in space for the purpose of sensorimotor transformation. Basis function representations also have the property that each cell can encode the location of objects in multiple frames of reference simultaneously, allowing them to be involved in the control of multiple behaviors. Thus, we concur with previous reports that the parietal cortex appears to contain multiple modules, but we argue that, in all modules, each neuron encodes multiple frames of reference. As a result, a lesion leads to a syndrome in which each behavior is affected in multiple frames of reference in all patients.

In the final section, we also address the issue of object-centered neglect, which refers to the possibility that patients neglect the left side of objects regardless of the position and orientation of objects in space. Object-centered neglect is clearly illustrated in an experiment by Driver et al. (1994) in which patients were asked to detect a gap in the upper part of a triangle embedded within a larger object (Fig. 1C). They reported that patients detected the gap more reliably when it was associated with the right side of the object than when it belonged to the left side, even when this gap appeared at the same retinal location across conditions (Fig. 1C). We show that the basis function framework can also account for such results without invoking the existence of explicit object-centered representations.
We also discuss other claims about object-centered neglect and argue that, in many cases, the deficit is not necessarily in object-centered coordinates but rather in a relative viewer-centered coordinate frame (the term viewer-centered here means any frame of reference centered on a part of the body, such as the trunk or the eyes). We call this relative neglect and we demonstrate that basis functions combined with a temporal selection process are sufficient to account for this behavior.
This chapter is organized as follows. It starts with a brief summary of the basis function approach, followed by a description of the architecture used for the simulations of hemineglect. Next, the behavior of a lesioned basis function network is examined in a line cancellation task. Then the issue of frames of reference is addressed by simulating the Karnath experiment. Finally, object-centered neglect is examined in the light of the basis function approach.
Basis function representation

The basis function model of parietal cortex is motivated by the hypothesis that spatial representations correspond to a recoding of the sensory inputs that facilitates the computation of motor commands. This perspective is consistent with Goodale and Milner’s (1990) suggestion that the dorsal pathway of the visual cortex mediates object manipulation (the ‘How’ pathway), as opposed to simply localizing objects as Mishkin et al. (1983) previously suggested (the ‘Where’ pathway).

In general, the choice of a representation strongly constrains whether a particular computation is easy to perform or difficult. For example, addition of numbers is easy in decimal notation but difficult with Roman numerals. The same is true for spatial representations. With some representations the motor commands for grasping may be simple to perform and stable to small input errors, but in others the computation could be long and sensitive to input errors.

A set of basis functions has the property that any nonlinear function can be approximated by a linear combination of the basis functions (Poggio, 1990; Poggio and Girosi, 1990). Therefore, basis functions reduce the computation of nonlinear mappings to linear transformations - a simpler computation. Most sensorimotor transformations are nonlinear mappings of the sensory and posture signals into motor coordinates; hence, given a set of basis functions, the motor command can be obtained by a linear combination of these functions. In other words, if parietal neurons compute basis functions of their inputs, they would recode the information in a format that simplifies the computation of subsequent motor commands.
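The approximation property invoked above can be checked numerically in one dimension. The following sketch is an illustration under stated assumptions, not part of the chapter's model: the target mapping, the number, spacing, and width of the Gaussian basis functions, and the use of ordinary least squares to find the linear weights are all choices made here for the example.

```python
import numpy as np

# An arbitrary nonlinear mapping to be computed (illustrative choice).
x = np.linspace(-1.0, 1.0, 200)
target = np.sin(3.0 * x) + x ** 2

# Assumed Gaussian basis functions tiling (and slightly overlapping) the input range.
centers = np.linspace(-1.5, 1.5, 21)
width = 0.2
basis = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * width ** 2))

# With basis functions, the nonlinear mapping reduces to a linear read-out.
weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
err_basis = np.max(np.abs(basis @ weights - target))

# For comparison: the best linear read-out of the raw input (plus a constant).
raw = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(raw, target, rcond=None)
err_linear = np.max(np.abs(raw @ coef - target))

print("max error, linear read-out of basis functions:", err_basis)
print("max error, linear read-out of raw input:", err_linear)
```

The same linear read-out that fails on the raw input succeeds once the input has been recoded by the basis functions, which is the sense in which the recoding simplifies later computation.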
As illustrated in Fig. 2B, the response of parietal neurons can be described as the product of a Gaussian function of retinal location multiplied by a sigmoid function of eye position. Sets of both Gaussians and sigmoids are basis functions, and the set of all products of these two basis functions also forms basis functions over the joint space (Pouget and Sejnowski, 1995, 1997b). These data are therefore consistent with the assumption that parietal neurons compute basis functions of their inputs and thus provide a representation of the sensory inputs from which motor commands can be computed by simple linear combinations (Pouget & Sejnowski, 1995, 1997b). It is important to emphasize that not all models of parietal cells have the properties of simplifying
Fig. 2. A. Idealization of a retinotopic visual receptive field of a typical parietal neuron for three different gaze angles ($e_x$). Note that eye position modulates the amplitude of the response but does not affect the retinotopic position of the receptive field (adapted from Andersen et al., 1985). B. 3D plot showing the response function of an idealized parietal neuron for all possible eye and retinotopic positions, $e_x$ and $r_x$. The plot in A was obtained by mapping the visual receptive field of this idealized parietal neuron for three different eye positions as indicated by the bold lines.
the computation of motor commands. For example, Goodman and Andersen (1990) as well as Mazzoni and Andersen (1995) have proposed that parietal cells simply add the retinal and eye position signals. The output of this linear model does not reduce the computation of motor commands to linear combinations because linear units cannot provide a basis set. By contrast, the hidden units of the Zipser and Andersen model (1988), or the multiplicative units used by Salinas and Abbott (1995, 1996), have response properties closer to the basis function units, and the basis function hypothesis can be seen as a formalization of these models (for a detailed discussion see Pouget and Sejnowski, 1997b). One interesting property of basis functions, particularly in the context of hemineglect, is that they represent the positions of objects in multiple frames of reference simultaneously. Thus, one can recover simultaneously the position of an object in retinocentric and head-centered coordinates from the response of a group of basis function units similar to the one shown in Fig. 2B (Pouget and Sejnowski, 1995, 1997). As shown in the next section, this property allows the same set of units to perform multiple spatial transformations in parallel. This approach can be extended to other sensory and posture signals and to other parts of the brain where similar gain modulations have been reported (Trotter et al., 1992; Boussaoud et al., 1993; Bremmer and Hoffmann 1993; Field and Olson 1994; Brotchie et al., 1995). When generalized to other posture signals, such as neck muscle proprioception or vestibular inputs, the resulting representation encodes simultaneously the retinal, head-centered, body-centered, and world-centered coordinates of objects. We recently explored the effects of a unilateral lesion of a basis function network (Pouget and Sejnowski, 1997a). The next section describes the structure of this model.
Model organization

The model contains two distinct parts: a network for performing sensorimotor transformations and a selection mechanism. The selection mechanism is used when there is more than one object present in the visual field at the same time.

Network architecture
The network has three layers with basis function units in the intermediate layer to perform a transformation from a visual retinotopic map input to two motor maps in head-centered and oculocentric coordinates respectively (Fig. 3). The visual inputs correspond to the cells found in the early stages of visual processing and the set of units encoding eye position have properties similar to the neurons found in the intralaminar nucleus of the thalamus (Schlag-Rey and Schlag, 1984). These input units project to a set of intermediate units that contribute to both output transformations. Each intermediate unit computes a Gaussian of the retinal location of the object, $r_x$, multiplied by a sigmoid of eye position, $e_x$:
Fig. 3. Network architecture. Each unit in the intermediate layers is a basis function unit with a Gaussian retinal receptive field modulated by a sigmoid function of eye position. This type of modulation is characteristic of the response of parietal neurons.
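A minimal sketch of one such intermediate unit, assuming the Gaussian-times-logistic form implied by the description above. The function name `basis_unit`, the receptive-field width `sigma`, and the sampled positions are illustrative assumptions; only the magnitude of p (8°) is taken from the text.

```python
import math

def basis_unit(r_x, e_x, r_peak, e_mid, p=8.0, sigma=15.0):
    """Response of one basis function unit (all angles in degrees).

    r_peak is the peak of the retinal receptive field and e_mid the midpoint
    of the eye-position sigmoid. The sign of p decides whether the response
    grows or shrinks as the eyes turn right. sigma is an assumed width.
    """
    gaussian = math.exp(-((r_x - r_peak) ** 2) / (2.0 * sigma ** 2))
    sigmoid = 1.0 / (1.0 + math.exp(-(e_x - e_mid) / p))
    return gaussian * sigmoid

# Map the retinal receptive field at three gaze angles, in the spirit of Fig. 2A.
retinal_positions = list(range(-40, 41, 10))
for gaze in (-20.0, 0.0, 20.0):
    profile = [basis_unit(r, gaze, r_peak=0.0, e_mid=0.0) for r in retinal_positions]
    peak_position = retinal_positions[profile.index(max(profile))]
    print(gaze, peak_position, round(max(profile), 3))
```

In this sketch the gaze angle rescales the response profile without moving its peak, which is the gain-modulation property the model relies on.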
We consider horizontal positions only because the vertical axis is irrelevant for hemineglect. These units are organized in two two-dimensional maps covering all possible combinations of retinal and eye position selectivities. The only difference between the two maps is the sign of the parameter p, which controls whether the units increase or decrease activity with eye position.¹ p was set to 8° for one map and -8° for the other map. The indices (i, j) refer to the position of the units on the maps. Each location is characterized by a position for the peak of the retinal receptive field, $r_{xi}$, and the midpoint of the sigmoid of eye position, $e_{xj}$. These quantities are systematically varied along the two dimensions of the maps in such a way that in the upper right corner $r_{xi}$ and $e_{xj}$ correspond to right retinal and right eye positions whereas in the lower left they correspond to left retinal and left eye positions. This type of basis function is consistent with the responses of single parietal neurons found in area 7a. A population of such units forms a basis function map encoding the locations of objects in head-centered and retinotopic coordinates simultaneously. The activities of the units in the output maps are computed by a simple linear combination of the basis function unit activities. Appropriate values of the weights were found by using linear regression to achieve the least mean square error. This architecture mimics the pattern of projections of the parietal area 7a, which innervates both the superior colliculus and the premotor cortex (via the ventral intraparietal area, VIP; Andersen et al., 1990a; Colby and Duhamel, 1993), where neurons have retinotopic and head-centered visual receptive fields respectively (Sparks, 1991; Graziano and Gross, 1994).

¹ We use the term retinal position to refer to the position of the object in space with respect to a frame of reference centered on the eye. This is not to be confused with the position of the image of the object on the retina. Indeed, due to the optics of the eye, the image of an object in the right retinal hemifield will be projected onto the left hemiretina.
Hemispheric biases and lesion model
Although the parietal cortices in both hemispheres contain neurons with all possible combinations of retinal and eye position selectivities, most cells tend to have their retinal receptive fields on the contralateral side (Andersen et al., 1990a,b). Whether a similar contralateral bias exists for the eye position in the parietal cortex remains to be determined although several authors have reported such bias for eye position selectivities in other parts of the brain (Schlag-Rey and Schlag, 1984; Galletti and Battaglini, 1989; Van Opstal et al., 1995). In the model, we divide the two basis function maps into two sets of two maps, one set for each hemisphere (again the two maps in each hemisphere correspond to two possible values for the parameter p). Units are distributed across each hemisphere to create neuronal gradients. These neuronal gradients induce more activity overall in the left maps than in the right maps when an object appears in the right retinal hemifield or when the eyes are turned to the right, with the opposite being true in the right maps. Several types of neuronal gradients can lead to such activity gradients. The gradients we used for the simulations presented here affected only the maps with positive p; that is, maps with units whose activity increased when the eyes turn right. In both the right and left maps, the number of units for a given pair of ($r_{xi}$, $e_{xj}$) increased linearly with contralateral values of eye and retinal location, as indicated in Fig. 4. The slope of the gradient was chosen to be steeper for retinal location than for eye position. This choice was motivated by electrophysiological recordings. As mentioned above, the retinal gradient is clearly present in experimental data whereas the eye position gradient is not as clear. This would therefore suggest that if the eye position gradient exists - and our model predicts that it does - it is weaker than the retinal gradient.
A right parietal lesion was modeled by removing the right parietal maps and studying the network behavior produced by the left maps alone. The effect of the lesion was therefore to induce a neuronal gradient such that there was more activity in the network for right visual field and right eye positions.
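The gradient and lesion described above can be sketched as follows. This is a simplified illustration, not the simulation code: it models only the positive-p map of each hemisphere, and the slopes `k_r` and `k_e`, the width `sigma`, and the grid of preferred positions are assumed values chosen so that every unit count stays positive.

```python
import math

PEAKS = list(range(-40, 41, 10))      # preferred retinal positions (deg)
MIDPOINTS = list(range(-40, 41, 10))  # eye-position sigmoid midpoints (deg)

def unit_count(r_peak, e_mid, hemisphere, k_r=0.015, k_e=0.005):
    """Relative number of units at one map location.

    Counts grow linearly toward contralateral values; k_r > k_e makes the
    retinal gradient steeper than the eye-position gradient (assumed slopes).
    """
    sign = 1.0 if hemisphere == "left" else -1.0  # left hemisphere favors rightward values
    return 1.0 + sign * (k_r * r_peak + k_e * e_mid)

def unit_activity(r_x, e_x, r_peak, e_mid, p=8.0, sigma=15.0):
    """Gaussian retinal tuning times a sigmoid of eye position (sigma assumed)."""
    gauss = math.exp(-((r_x - r_peak) ** 2) / (2.0 * sigma ** 2))
    return gauss / (1.0 + math.exp(-(e_x - e_mid) / p))

def summed_activity(r_x, e_x, hemispheres):
    """Count-weighted activity summed over the maps that are still present."""
    return sum(unit_count(rp, em, h) * unit_activity(r_x, e_x, rp, em)
               for h in hemispheres for rp in PEAKS for em in MIDPOINTS)

intact = ("left", "right")
lesioned = ("left",)  # right parietal lesion: only the left maps remain
for r_x in (-20.0, 0.0, 20.0):
    print(r_x,
          round(summed_activity(r_x, 0.0, intact), 3),
          round(summed_activity(r_x, 0.0, lesioned), 3))
```

With both hemispheres present the two opposing gradients cancel, so the summed activity is symmetric about the fovea; removing the right maps leaves an activity profile that rises toward the right retinal hemifield.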
Fig. 4. Same as in Fig. 3 but with the intermediate layer split into right and left hemisphere maps. The arrows indicate the direction of the neuronal gradient. Each map overrepresents contralateral retinal and eye positions. A right parietal lesion is simulated by removing the right map.
The exact profile of the neuronal gradient across the basis function maps did not matter as long as it induced a monotonically increasing activity gradient as objects were moved further to the right retinal position and the eyes fixated further to the right. The results presented in this chapter were obtained with linear neuronal gradients.

Selection model
The selection mechanism in the model was adapted from Burgess (1995) and was inspired by the visual search theory of Treisman and Gelade (1980) and the saliency map mechanism proposed by Koch and Ullman (1985). Unfortunately, most of these existing models are not based on distributed representations of the kind used in the present model. Models of stimulus selection, for instance, typically use local representations in which a
stimulus is characterized by one number, usually the activity of a single unit (Burgess, 1995). In contrast, distributed patterns of activity occur in the basis function maps of the present model to represent one or several stimuli (see Fig. 7B). Therefore, we first had to reduce the dimensionality of our representation before an existing model of target selection could be used. A saliency value was assigned to each stimulus present on the display. The saliency was defined as the sum of the activity of all the basis function units whose receptive fields were centered exactly on the retinal position of the stimulus, $r_x$. This method is mathematically equivalent to defining the saliency of the stimulus as the peak in the profile of activity in the superior colliculus output map. Once the saliencies of the stimuli were computed, the selection mechanism was initiated. It involved a repetition over time of three steps: winner-take-all, inhibition of return and recovery. On the first time step, the stimulus with the highest saliency was selected by winner-take-all, and its corresponding saliency was set to zero to implement inhibition of return. At the next time step, the stimulus with the second highest saliency was selected and inhibited, while the previously selected item was allowed to recover slowly. These operations were repeated for the duration of the trial. This procedure ensured that the most salient items were not selected twice in a row. However, due to the recovery process, the stimuli with the highest saliencies might be selected again if displayed for a sufficiently long time. In this model of selection, the probability of selecting an item is proportional to two factors: the absolute saliency associated with the item and its saliency relative to that of competing items. It is possible to implement a selection mechanism similar to the one described here by using lateral connections within the basis function map (Cohen et al., 1994).
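The three-step loop (winner-take-all, inhibition of return, recovery) can be sketched as below. The linear recovery rule, in which an inhibited item regains a fixed fraction of its original saliency per time step, and the name `run_selection` are assumptions made for illustration; the text only states that recovery is slow.

```python
def run_selection(saliencies, n_steps, recovery=0.1):
    """Return the index of the item selected at each time step.

    saliencies: original saliency of each stimulus.
    recovery:   assumed fraction of the original saliency regained per step.
    """
    current = list(saliencies)
    selected = []
    for _ in range(n_steps):
        # Winner-take-all: pick the item with the highest current saliency.
        winner = max(range(len(current)), key=lambda i: current[i])
        selected.append(winner)
        # Recovery: inhibited items drift back toward their original saliency.
        for i, original in enumerate(saliencies):
            current[i] = min(original, current[i] + recovery * original)
        # Inhibition of return: the item just selected is set to zero.
        current[winner] = 0.0
    return selected

print(run_selection([0.2, 0.5, 1.0], n_steps=6))
```

Because the winner is reset to zero after every step, no item can be chosen twice in a row, while the most salient item is revisited once it has recovered enough to beat its competitors.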
An implementation based on lateral connections does not make an artificial distinction between the representation and the selection mechanism, as made here, and is more biologically plausible. We favor the idea that there is no distinction between the cells responsible for encoding the location of objects and the ones responsible for selective attention. However, the model of Cohen et al. (1994) required complex dynamics and computation-intensive simulations and would have produced the same results as the present model. Thus, the selection mechanism used here was motivated by practical considerations.

Evaluating network performance

We used this model to simulate several experiments in which patient performance was evaluated according to reaction time or percentage of correct responses. In reaction time experiments, we assumed that processing involves two sequential steps: target selection and target processing. Target selection time was assumed to be proportional to the number of iterations, n, required by the selection network to select the stimulus using the mechanism described above. Each iteration was arbitrarily chosen to be 50 ms long. This duration matters only when more than one stimulus is present, so that distractors could delay the detection of the target by winning the competition. The time for target processing (that is to say, target recognition, target naming, etc.) was assumed to be inversely proportional to stimulus saliency, $s_i$. Thus, the total reaction time (RT) was given by:
$$RT = 100 + 50n + \frac{500}{1000\,s_i}$$

The percentage of correct responses to a stimulus was determined by a sigmoid function of the stimulus saliency:
where $s_0$ and t are constants. This model for evaluating the output is based on signal detection theory when signal and noise follow Gaussian distributions of equal variance (Green and Swets, 1966). This is equivalent to assuming that the rate of correct detection (hit rate) is the integral of the probability distribution of the signal from the decision threshold to infinity. In line bisection experiments, subjects were asked to judge the midpoint of a line segment. In the network model, the midpoint, m, was estimated
by computing the center of mass of the activity induced by the line in the basis function map:

$$m = \frac{\sum_{i \in \text{units}} a_i\, r_{xi}}{\sum_{i \in \text{units}} a_i} \qquad (4)$$
where $r_{xi}$ is the retinal position of the peak of the visual receptive field of unit i and $a_i$ is its activity. All the results given here were obtained from the lesioned model in which the right basis function maps had been removed. For control tasks on the normal network see Pouget and Sejnowski (1997a).
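The three performance measures can be sketched together as below. This assumes the reaction-time expression reads 100 + 50n + 500/(1000 s), and that the sigmoid for the percentage of correct responses is a logistic function with midpoint $s_0$ and slope constant t; the logistic form and the default values of `s0` and `t` are illustrative assumptions, as are the function names.

```python
import math

def reaction_time(n_iterations, saliency):
    """Reaction time in ms: fixed offset + selection time + processing time."""
    return 100.0 + 50.0 * n_iterations + 500.0 / (1000.0 * saliency)

def percent_correct(saliency, s0=0.5, t=0.1):
    """Assumed logistic function of saliency; s0 and t are placeholder constants."""
    return 100.0 / (1.0 + math.exp(-(saliency - s0) / t))

def estimated_midpoint(activities, peaks):
    """Center of mass of unit activities over their preferred retinal positions."""
    weighted = sum(a * r for a, r in zip(activities, peaks))
    return weighted / sum(activities)

peaks = [-10.0, 0.0, 10.0]
print(reaction_time(1, 0.5))
print(percent_correct(0.5))
print(estimated_midpoint([1.0, 1.0, 1.0], peaks))  # uniform activity
print(estimated_midpoint([1.0, 2.0, 3.0], peaks))  # activity rising to the right
```

When activity rises toward the right, as it does after a right lesion, the center of mass moves to the right of the true midpoint.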
Line cancellation

We first measured the saliency of a stimulus as a function of its retinal location ([-30°, 30°]) and eye position ([-30°, 30°]) after a simulated right parietal lesion in the basis function network. Saliency was defined as the summed activity in the basis function layer in response to the stimulus. As shown in Fig. 5A, the saliency increased as the stimulus was moved to the right hemifield or when the eyes fixated toward the right. This saliency gradient is a direct reflection of the underlying neuronal gradient introduced by the lesion. The fact that the saliency is affected by both
the retinal location of stimuli and the position of the eyes can account for the fact that neglect affects multiple frames of reference in the model, as elaborated in the next sections. Next, we tested the network on the line cancellation test, in which patients are asked to cross out short line segments uniformly spread over a page. To simulate this test, we presented the display shown in Fig. 5B and ran the selection mechanism to determine which lines get selected by the network. As illustrated in Fig. 5B, the network only crossed out the lines located in the right half of the display, mimicking the behavior of left neglect patients in the same task (Heilman et al., 1985). The rightward gradient introduced by the lesion makes the right lines more salient than the left lines. As a result, the rightmost lines always won the competition, preventing the network from selecting the left lines. The probability that the line was crossed out as a function of its position in the display is shown in Fig. 5C, where position is defined with respect to the frame of the display. There was a sharp jump in the probability function such that lines on the right of this break had a probability near 1 of being selected whereas lines on the left of the break had a probability near zero (Fig. 5C). The sharp jump in the probability of selection stands in contrast to the smooth and monotonic
Fig. 5. A. Saliency gradient in a network with a right lesion. Saliency increases when an object is moved toward the right hemiretina or when the eyes move right. B. Simulation of a line cancellation task. The network failed to cross out the line segments on the left side of the page, as in right parietal patients. C. Probability of crossing a line as a function of its horizontal position in the display. The lesioned network fails to cross out the lines in the left half of the display as if the neuronal gradient introduced by the lesion was a step function. The gradient, however, is smooth, and the sudden change in behavior in the center of the display is the result of the dynamics of the selection mechanism.
profile of the neuronal gradient. Whereas the sharp boundary in the pattern of line crossing may suggest that the model ‘sees’ only one half of the display, the linear profile of the neuronal gradient shows that this is not the case. The sharp jump is mainly a consequence of the dynamics of the selection process: because right bars are associated with higher saliencies, they consistently win the competition to the detriment of left bars. Consequently, the network starts by selecting the bar furthest on the right and due to inhibition of return moves its way toward the left. Eventually, however, previously inhibited items recover and again win the competition, preventing the network from selecting the leftmost bars. The point at which the network stops selecting bars toward the left depends on the exact recovery rate and the total number of items displayed. The pattern of line crossing by the network is not the result of a deficit in the selection mechanism, but rather is the result of a selection mechanism operating on a lesioned spatial representation. The network had difficulty detecting stimuli on the left side of space not because it was unable to orient toward that side of space - it would orient to the left if only one stimulus were presented in the left
hemifield - but because the bias in the representation favored the rightmost bars in the competition.
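The way a smooth gradient produces a sharp boundary can be illustrated with a toy display of six lines whose saliency rises linearly from left to right. The saliency values, the recovery rate of 0.25 per step, the trial length of 12 steps, and the name `cancel_lines` are all assumptions chosen for illustration; as noted above, where the boundary falls depends on the recovery rate and the number of items.

```python
def cancel_lines(saliencies, n_steps, recovery=0.25):
    """Return the set of line indices crossed out during one trial."""
    current = list(saliencies)
    crossed = set()
    for _ in range(n_steps):
        # Winner-take-all over the current saliencies.
        winner = max(range(len(current)), key=lambda i: current[i])
        crossed.add(winner)
        # Previously inhibited lines recover a fixed fraction per step.
        for i, original in enumerate(saliencies):
            current[i] = min(original, current[i] + recovery * original)
        # Inhibition of return for the line just selected.
        current[winner] = 0.0
    return crossed

# Index 0 is the leftmost line, index 5 the rightmost; the gradient is linear.
line_saliency = [float(k) for k in range(1, 7)]
crossed = cancel_lines(line_saliency, n_steps=12)
print(sorted(crossed))
```

In this run the selection sweeps leftward from the rightmost line, but recovered right-side lines recapture the competition before the two leftmost lines are reached, so the crossing pattern has an abrupt edge even though the underlying gradient does not.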
Frames of reference

The frame of reference of neglect in the model was examined next. Since Karnath et al. (1993) manipulated head position, their experiment was simulated by using a basis function map integrating visual inputs with head position rather than eye position. As in patients, the performance of the network was intermediate between retinocentric and trunk-centered neglect (as shown in Fig. 1B). This result is the direct consequence of the profile of the saliency gradient induced by the lesion (Fig. 5A). In Fig. 6, the left and right stimuli are shown in both conditions superposed on the saliency gradient. In both conditions, the right stimulus is more salient than the left stimulus and, consequently, the performance on the right stimuli is better. In addition, the saliency of the left stimulus increases from condition 1 to condition 2, which accounts for the better performance on the left when the head is turned right with respect to the trunk. Note, however, that the saliency of the left stimulus in the second condition does not match the saliency of the right stimulus in the first condition
Fig. 6. Right (triangle) and left (circle) stimuli in conditions 1 and 2 of the Karnath et al. experiment (Fig. 1B) superposed on top of the saliency gradient. The lesioned model performed better on the right stimulus than on the left stimulus because the right stimulus was the most salient in both conditions. Additionally, performance for the left stimulus increased from condition 1 to condition 2 due to an increase in saliency. However, the saliency of the left stimulus in condition 2 did not match the saliency of the right stimulus in condition 1 even though they share the same trunk-centered location. This is because the slope of the gradient was steeper along the retinal axis than along the head position axis. As a result, the model did not perform as well on the left stimulus in condition 2 as on the right stimulus in condition 1.
even though they share the same trunk-centered location. This is because the gradients along the retinal axis and the head position axis were not equal. This particular choice of gradient was motivated by physiological constraints as explained in the section on network architecture. Our simulations were performed with a complete lesion of the right parietal map but, in reality, lesions are incomplete. In some patients, only part of the right hemisphere might be lesioned. The saliency gradient will then reflect the combination of the representation in the left hemisphere and whatever is left in the right hemisphere. It is possible that, in some cases, this gradient would end up being identical along both dimensions, in which case neglect would be purely trunk-centered. We would predict such cases, however, to be very rare. Therefore, as in humans, neglect in the model was neither retinocentric nor trunk-centered alone but both at the same time. Similar principles can be used to account for the behavior of patients in many other experiments that involve frames of reference (Bisiach et al., 1985; Calvanio et al., 1987; Ladavas, 1987; Ladavas et al., 1989; Farah et al., 1990; Behrmann and Moscovitch, 1994).
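The argument can be checked with a small worked example, assuming a saliency that is linear in retinal and head position with a steeper retinal slope. The slopes, the baseline, and the geometry (stimuli 10° left and right of fixation, head straight in condition 1 and turned 20° right in condition 2) are hypothetical values picked so that the left stimulus in condition 2 shares a trunk-centered location with the right stimulus in condition 1.

```python
def saliency(retinal_deg, head_deg, k_retinal=0.02, k_head=0.01, baseline=1.0):
    """Assumed linear saliency gradient after a right lesion (k_retinal > k_head)."""
    return baseline + k_retinal * retinal_deg + k_head * head_deg

def trunk_centered(retinal_deg, head_deg):
    """Trunk-centered position when the eyes are centered in the head."""
    return retinal_deg + head_deg

left_c1, right_c1 = saliency(-10.0, 0.0), saliency(10.0, 0.0)    # condition 1
left_c2, right_c2 = saliency(-10.0, 20.0), saliency(10.0, 20.0)  # condition 2

print(trunk_centered(-10.0, 20.0), trunk_centered(10.0, 0.0))
print(left_c1, right_c1, left_c2, right_c2)
```

The left stimulus gains saliency when the head turns right, yet it stays below the right stimulus of condition 1 at the same trunk-centered location. The two would match only if the two slopes were equal, which is the purely trunk-centered case discussed above.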
Object-centered neglect

One of the first studies reporting evidence for object-centered neglect came from the work of Caramazza and Hillis (1990) who reported a patient with a right word-centered neglect. This patient made spelling mistakes, primarily on the right side of words, whether the words were presented horizontally, vertically or mirror reversed (in the latter case, the right side of the word appeared in the left hemispace). Since then, several other studies have observed similar behavior not just for words but for objects as well. These studies are reviewed in the first part of this section. We argue that in most cases, their results can be explained with what we call relative neglect, a form of neglect that does not involve a lesion of an object-centered representation. There are, however, a few studies that cannot be explained by relative neglect such as the one by Driver et al. (1994) described in the introduction. These cases are addressed in the second part of this section.
Recent single cell data suggest a natural extension of the basis function framework to object-centered representations and we explain how this extension of the basis function hypothesis can account for the results of Driver et al.

Object-centered neglect or relative neglect?

The paper by Arguin and Bub (1993) provides a good illustration of the kind of data typically used to support object-centered neglect. As shown in Fig. 7A, they found that reaction times were faster when a target (the 'x' in Fig. 7B) appeared on the right of a set of distractors (C2) instead of on the left side (C1), even though the target was at the same retinotopic location in both conditions. Although this result is certainly consistent with object-centered neglect, it is just as consistent with the idea that patients tend to neglect the parts of the object the furthest to the left, where left is defined with respect to the viewer, not the object. In other words, what matters may be the relative position of stimuli, or subparts of an object, along the left-right axis defined with respect to the viewer. This is what we call relative neglect. The results of several other studies can be accounted for with relative rather than object-centered neglect. The only way to distinguish between these alternatives is to rotate the object such that the left-right axis of the object is no longer lined up with the left-right axis of the viewer. This comment applies as well to standard clinical tests such as line cancellation and line bisection. Line bisection is a test in which patients are asked to judge the midpoint of a line. Left neglect patients typically estimate the midpoint too far to the right. One might be tempted to conclude that this rightward overshoot is due to neglect of the left side of the line - a line-centered interpretation. If this were the case then rotating the line by 45° with respect to the viewer should have no effect on the performance of the subject.
The left side of the line should still be neglected by the same amount. On the other hand, if this is an example of relative neglect, then the overshoot is due to the fact that the subject ignores the part of the line furthest to the left with respect to the subject. This would predict that when the line is vertical, the overshoot should disappear since no part of the line is further to the left than any other
Fig. 7. A. Response times for the Arguin and Bub (1993) experiment for the three experimental conditions illustrated below the graph (FP, fixation point). The decrease from condition 1 (C1) to condition 2 (C2) is consistent with object-centered neglect, i.e., subjects are faster when the target is on the right of the distractors than when it is on the left, even though the retinal position of the target is the same. The further decrease in reaction time in condition 3 (C3) shows that the deficit was also retinotopic. B. Reaction time between conditions 1 and 2 decreased due to the change in the relative saliency of the target with respect to the distractors, even though the absolute saliency of the target was the same in these two conditions (a1 = a2). FP, fixation point; C3, condition 3.
part. Experimental evidence supports the latter hypothesis. The amount of overshoot is proportional to the cosine of the angle between the line and the viewer with a minimum when the line is near vertical (Halligan and Marshall, 1989; Burnett-Stuart et al., 1991). Interestingly, relative neglect emerges naturally in our basis function model of neglect. The network reaction times in simulations of the Arguin and Bub (1993) experiments followed the same trends reported in human patients (Fig. 7A). This result can easily be understood if one examines the pattern of activity in the retinotopic output layer of the network for the three conditions in those experiments. Although the absolute levels of activity associated with the target (solid lines) in conditions 1 and 2 were the same, the activity of the distractors (dotted lines) differed in the two conditions. In condition 1, they had relatively higher activity and thereby strongly delayed the detection of the target by the selection mechanism. In condition 2, the distractors were less active than the target and did not delay target processing as much as they did in condition 1. The network also reproduced the further decrease in reaction time when the whole display, target and distractors, was moved further right on
the retina (condition 3). This indicates that both the relative and absolute retinal position play a role in neglect. This is another example of multiple frames of reference being affected concomitantly in the same patient (and model). The network also showed relative neglect in a line bisection task. In particular, the overshoot went away for vertical lines (not shown here, but see Pouget and Sejnowski (1997) for more details). To convincingly establish object-centered neglect, patients should be tested with rotated objects. Object-centered neglect predicts that the neglected part will be the left side of the object whereas relative neglect predicts that it will be the part of the object the furthest on the left with respect to the viewer. Farah et al. (1990) have carried out such an experiment and reported results consistent with relative neglect. Behrmann and Moscovitch (1994) obtained similar results except for asymmetric letters for which the results seemed to indicate object-centered neglect. However, Drain and Reuter-Lorenz (1997) have obtained similar results on asymmetric letters in normals, casting doubt on the object-centered neglect interpretation. Tipper and Behrmann (1996) have also reported data consistent with object-centered neglect. They
used stimuli made of two circles, one on the left and one on the right. The circles could be linked by a bar, forming a barbell-like object, or not. They explored whether priming can be defined in object-centered coordinates and found that when the right circle was primed, followed by a 180° rotation which brought the right circle to the location of the left circle and vice versa, the priming stayed with the right circle (now on the left), but only if the two circles were linked by the bar. These results are consistent with the hypothesis that attention can be allocated in object-centered coordinates. A simpler interpretation, however, is that attention is allocated in retinal coordinates and that the attentional spotlight can move with an object. Moreover, the fact that attention followed the left circle only when linked to the other circle would imply that the dynamical aspects of the attentional spotlight are influenced by segmentation factors. Our network model does not include this mechanism and, consequently, cannot account for this experimental result on priming. However, simulations by Mozer show that Tipper and Behrmann's result can indeed be explained by using an attentional spotlight that works in retinal coordinates (see Mozer's chapter (7) in this book). What makes this demonstration particularly interesting is the fact that Mozer's model was originally designed to perform word recognition. It is therefore possible that Tipper and Behrmann's results are not related to the existence of explicit object-centered representations (a term we define more precisely in the next section) but could be a byproduct of the way the visual system segments and recognizes words and objects. Ultimately, there appears to be only one result that could not be explained without invoking object-centered representation, namely, the experiment by Driver et al. (1994) described in the introduction and illustrated in Fig. 1C.
We show next how this result can be related to the response of single cells using the basis function framework.
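The relative-neglect account of the Arguin and Bub (1993) result discussed earlier in this section can be sketched by combining a rightward saliency gradient with the selection loop and a reaction-time rule of the form 100 + 50n + 500/(1000 s). The gradient parameters, the stimulus positions, the recovery rate, and the function names are hypothetical; the target is always the first item in each display.

```python
def saliency(x_deg, base=0.005, slope=0.02):
    """Assumed linear rightward saliency gradient left by the lesion."""
    return base * (1.0 + slope * x_deg)

def iterations_to_select(saliencies, target, recovery=0.1, max_steps=50):
    """Number of selection steps until the target wins the competition."""
    current = list(saliencies)
    for step in range(1, max_steps + 1):
        winner = max(range(len(current)), key=lambda i: current[i])
        if winner == target:
            return step
        for i, original in enumerate(saliencies):
            current[i] = min(original, current[i] + recovery * original)
        current[winner] = 0.0  # inhibition of return
    return max_steps

def reaction_time(positions, target=0):
    s = [saliency(x) for x in positions]
    n = iterations_to_select(s, target)
    return 100.0 + 50.0 * n + 500.0 / (1000.0 * s[target])

rt_c1 = reaction_time([0, 5, 10, 15])     # target left of the distractors
rt_c2 = reaction_time([0, -15, -10, -5])  # same target position, distractors on its left
rt_c3 = reaction_time([15, 0, 5, 10])     # whole display shifted right
print(rt_c1, rt_c2, rt_c3)
```

The target has the same absolute saliency in the first two displays, so the difference between them comes entirely from how many distractors win the competition first; the third display is faster still because the target itself is more salient.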
Basis function applied to object-centered representations

The existence of object-centered representations at the neuronal level appears to be supported by the
recent work of Olson and Gettner (1995). They trained monkeys to perform saccades to a particular side of an object (right or left, depending on a visual cue) regardless of its position in space. Ideally, it would have been important to test the monkey for multiple orientations of the object but these conditions were omitted in the first study. Once the monkey had acquired the task, they recorded the activity of cells of the supplementary eye field during the execution of the task. They found that some cells respond selectively prior to eye movements directed to a particular side of an object. For example, a cell might give a strong response before an upward saccade directed to the left side of the object but no response at all to the same upward saccade directed to the right side of the object. This behavior suggests that some cells have motor fields defined in object-centered coordinates, which would constitute what we call an explicit object-centered representation. However, all the cells recorded by Olson and Gettner can be interpreted as having an oculocentric motor field (a bell-shaped tuning to the direction of the next saccadic eye movement, where direction is defined with respect to the fixation point) which is gain modulated by the side of the object and the command (Olson, personal communication). The gain modulation assumption appears to be supported by recent data from Olson and Gettner (1998) as well as Breznen et al. (1998). Moreover, this hypothesis is computationally efficient. Performing saccades to the side of an object specified by an instruction can be formalized as a nonlinear mapping from the inputs, the image of a bar and an instruction, to the output, the motor command for the saccade (Deneve and Pouget, 1998). An efficient way to proceed is to use basis functions of the input variables in the intermediate stage of processing.
These basis functions could be retinotopic receptive fields gain modulated by the side of the object and the command. Figure 8A shows a basis function neural network that can generate saccades to a particular side of an object according to an instruction and independently of the position and orientation of the object. This is a slightly more general situation than the one considered by Olson and Gettner (1995) since the model can handle arbitrary orientation.
Fig. 8. A. Neural network model for performing saccades to a particular side of an object in response to the image of the object and an instruction. The input contains a V1-like representation of the image of the object, a set of cells tuned to the orientation of the object (similar to cells found in the inferotemporal cortex) and a set of cells encoding the current command. The supplementary eye field (SEF) layer contains gain modulated cells like the ones found by Olson and Gettner, which compute basis functions of the inputs. The motor command in the superior colliculus layer is generated by combining the output of the basis function units. B. Summed activity generated in the SEF in response to the display used by Driver et al. in a lesioned network. The dotted line indicates the general orientation of the object. The activity associated with the upper edge of the triangle is the weakest when the object is tilted clockwise (bottom). This is consistent with Driver's finding that left neglect patients perform worse in this condition.
Deneve and Pouget (1998) have explored the effect of a unilateral lesion in this basis function model. They assumed that the left hemisphere overrepresents right retinal location and counterclockwise object rotation (and vice versa for the right hemisphere). The retinal gradient is identical to the one used by Pouget and Sejnowski (1997) (see the preceding sections). The preference for counterclockwise object rotation is consistent with the general principle used for setting the gradients, namely, the left hemisphere favors any posture toward the right, such as moving the eye to the right, rotating the head to the right, or tilting the head to the right. Indeed, when the head is tilted to the right, the retinal image rotates counterclockwise. The hemisphere preferring head tilt to the right should therefore also favor counterclockwise image rotation. Figure 8B shows the amount of activity in the basis function layer of a lesioned network, in response to the presentation of the triangle display of Driver et al. The important part to focus on is the
upper edge of the central triangle, the edge for which the patients are asked to detect the presence of a gap. As one might expect given the hemispheric gradient, the highest activity is obtained when the object is tilted counterclockwise. This is the case for which subjects perceive the edge to be on the right of the main axis. This network was not designed to detect the presence of a gap, but a simple signal-to-noise argument would predict that the presence of the gap should be more readily detected for counterclockwise than clockwise rotation, as reported by Driver et al. in patients. The interesting point to note is that the object-centered neglect is obtained here even though the network does not contain an explicit object-centered representation: it does not use cells with motor fields or receptive fields in object-centered coordinates. Instead, the representation uses a more implicit format, involving basis functions of the inputs, which is computationally efficient and consistent with single cell data.
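The effect of the two gradients on summed activity can be illustrated with a minimal sketch. It is not the network of Deneve and Pouget: the unit shape (a Gaussian of retinal position multiplied by a sigmoid of object rotation), the linear retinal weighting, and every parameter value are assumptions chosen for illustration, and positive angles are taken to mean counterclockwise rotation.

```python
import math

def unit_response(x, theta, x_pref, theta_half, sigma=10.0, slope=15.0):
    """Assumed basis-function unit: Gaussian retinal tuning times a sigmoid of rotation."""
    retinal = math.exp(-((x - x_pref) ** 2) / (2.0 * sigma ** 2))
    # The sigmoid grows with theta, i.e. the unit prefers counterclockwise rotation.
    rotation = 1.0 / (1.0 + math.exp(-(theta - theta_half) / slope))
    return retinal * rotation

def summed_activity(x, theta):
    """Summed response of the units assumed to survive a right-hemisphere lesion."""
    total = 0.0
    for x_pref in range(-40, 41, 5):
        # Assumed retinal gradient: rightward positions are overrepresented.
        weight = (x_pref + 40) / 80.0
        for theta_half in range(-60, 61, 20):
            total += weight * unit_response(x, theta, x_pref, theta_half)
    return total

ccw = summed_activity(x=0.0, theta=+30.0)   # object tilted counterclockwise
cw = summed_activity(x=0.0, theta=-30.0)    # object tilted clockwise
print(f"counterclockwise: {ccw:.2f}, clockwise: {cw:.2f}")
```

Under these assumptions the same retinal location evokes more summed activity when the object is tilted counterclockwise, without any unit being tuned in object-centered coordinates.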
Discussion

The model of the parietal cortex presented here was originally developed by considering the response properties of parietal neurons and the computational constraints inherent in sensorimotor transformations. It was not designed to model neglect, so its ability to account for a wide range of deficits is additional evidence in favor of the basis function hypothesis. The basis function model captures three essential aspects of the neglect syndrome: (1) it reproduces the pattern of line crossing of parietal patients in line cancellation; (2) the deficit coexists in multiple frames of reference simultaneously; and (3) the model accounts for relative and object-centered neglect. These results rely in part on the existence of monotonic gradients along the retinal and eye position axes (or more generally, the posture axes) of the basis function map. The retinal gradient is supported by recordings from single neurons in the parietal cortex (Andersen et al., 1990), but gradients for the postural signals remain to be demonstrated. The retinal gradient hypothesis is also at the heart of Kinsbourne’s theory of hemineglect (Kinsbourne, 1987), and some models of neglect dyslexia and line bisection are based on a similar idea (Mozer and Behrmann, 1990; Anderson, 1996; Mozer et al., 1997). Other behaviors of hemineglect patients can also be captured by this model, such as the patterns of line bisection or the recovery after vestibular caloric stimulation (see Pouget and Sejnowski, 1997, 1999). The model presented here cannot account for the fact that left neglect is much more common than right neglect (Heilman, 1985). The reason for this asymmetry is unclear, but the common explanation depends on an asymmetry in the hemispheric representations. Whereas the left hemisphere may represent only the right hemifield, the right parietal cortex appears to represent both hemifields in some patients (Kinsbourne, 1987).
This would suggest that contrary to what we have assumed in the model, the gradients in the right and left representations are not simply mirror images. Instead, the right hemisphere may have a shallower contralateral gradient and the left hemisphere may have a
steeper gradient. An asymmetric gradient would lead to a preference for the left side of space in a normal network. There is evidence for such a leftward preference in normal subjects and it is therefore possible that the asymmetry observed with right and left neglect is indeed due to a difference in the gradients in the left and right spatial representations (Kinsbourne, 1987; Ladavas, 1990). One interesting aspect of our approach is that there is no need to represent explicitly all frames of reference to account for the behavior of patients. Instead we have assumed that the position of objects is represented by basis functions, a representation that spans multiple Cartesian frames of reference simultaneously. Consequently, any attempt to determine the Cartesian space in which hemineglect operates is bound to lead to inconclusive results in which Cartesian frames of reference appear to be mixed. This perspective on frames of reference has interesting implications for rehabilitation of neglect patients. It is often assumed that left neglect patients would improve if one could teach, or force, them to orient toward the left side of space. It has been shown, for example, that neglect improves after caloric stimulation of the vestibular system (consisting of an injection of cold water in the left ear; see Rubens, 1985), or after the presentation of a leftward motion flow field (Pizzamiglio et al., 1990). In both cases, the stimulation induces a series of left eye movements, or nystagmus, forcing the subjects to look toward the left. However, this explanation fails to account for the fact that the recovery persists for several minutes after the left nystagmus stops. The basis function hypothesis suggests a different interpretation. The neuronal gradient in the lesioned network is such that any change of body posture toward the right improves the saliency of visual stimuli appearing in the left retinal visual hemifield (Fig. 5A). 
Hence, in the experiment by Karnath et al. (1993; Fig. 1B), the detection and recognition of the left visual stimulus improves when the head is turned toward the right. The same mechanism could explain the effect of the caloric and flow field stimulation. Cold water in the left ear induces an illusion of a head rotation toward the
right. Likewise, a leftward flow field is consistent with a head rotation toward the right. If the brain uses a leaky integrator in both of these situations to compute the amount of rotation, then one would expect the recovery to persist beyond the period of stimulation before slowly disappearing. It might therefore be worthwhile to pursue rehabilitation methods that take advantage of these general principles; namely, that patients improve when they adjust their posture toward the ‘good’, or ipsilesional, side of space.
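The leaky-integrator argument can be made concrete with a short simulation. This is a minimal sketch, not part of the model: the time constant, the step size, and the stimulus duration are hypothetical values, and the integrator is a plain first-order Euler update.

```python
def leaky_integrator(inputs, tau=20.0, dt=1.0):
    """Integrate a rotation-velocity signal with leak (time constant tau, assumed)."""
    estimate = 0.0
    trace = []
    for velocity in inputs:
        # Euler step: accumulate the input, lose a fraction of the estimate.
        estimate += dt * (velocity - estimate / tau)
        trace.append(estimate)
    return trace

# Hypothetical protocol: 60 steps of illusory rightward rotation, then 60 steps of rest.
stimulus = [1.0] * 60 + [0.0] * 60
trace = leaky_integrator(stimulus)

print(f"end of stimulation: {trace[59]:.2f}")
print(f"10 steps later:     {trace[69]:.2f}")
print(f"60 steps later:     {trace[119]:.2f}")
```

The estimated rotation does not drop to zero when the stimulation stops. It decays gradually, which is the behavior the text invokes to explain why recovery outlasts the nystagmus.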
Acknowledgments

This research was supported in part by a fellowship from the McDonnell-Pew Center for Cognitive Neuroscience and an R29 NIH grant to A.P. and grants from the Office of Naval Research and the Howard Hughes Medical Institute to T.J.S. We thank Daphne Bavelier for her comments and suggestions.
References

Andersen, R.A., Essick, G.K. and Siegel, R.M. (1985) Encoding of spatial location by posterior parietal neurons. Science, 230: 456-458.
Andersen, R.A., Asanuma, C., Essick, G. and Siegel, R.M. (1990a) Corticocortical connections of anatomically and physiologically defined subdivisions within the inferior parietal lobule. J. Comp. Neurol., 296(1): 65-113.
Andersen, R.A., Bracewell, R.M., Barash, S., Gnadt, J.W. and Fogassi, L. (1990b) Eye position effect on visual memory and saccade-related activity in areas LIP and 7a of macaque. J. Neurosci., 10: 1176-1196.
Anderson, B. (1996) A mathematical model of line bisection behaviour in neglect. Brain, 119: 841-850.
Arguin, M. and Bub, D.N. (1993) Evidence for an independent stimulus-centered reference frame from a case of visual hemineglect. Cortex, 29: 349-357.
Behrmann, M. and Moscovitch, M. (1994) Object-centered neglect in patients with unilateral neglect: effects of left-right coordinates of objects. J. Cogn. Neurosci., 6(2): 151-155.
Bisiach, E., Capitani, E. and Porta, E. (1985) Two basic properties of space representation in the brain: evidence from unilateral neglect. J. Neurol., Neurosurg. Psychiatry, 48: 141-144.
Boussaoud, D., Barth, T.M. and Wise, S.P. (1993) Effects of gaze on apparent visual responses of frontal cortex neurons. Exp. Brain Res., 93(3): 423-34.
Bremmer, F. and Hoffmann, K.P. (1993) Pursuit related activity in macaque visual cortical areas MST and LIP is modulated by eye position. Soc. Neurosci. Abstr., p. 1283.
Brotchie, P.R., Andersen, R.A., Snyder, L.H. and Goodman, S.J. (1995) Head position signals used by parietal neurons to encode locations of visual stimuli. Nature, 375(6528): 232-5.
Burgess, N. (1995) A solvable connectionist model of immediate recall of ordered lists. In: Tesauro, G., Touretzky, D.S. and Leen, T.K. (Eds.), Advances Neural Inform. Process. Syst., vol. 7. Cambridge, MA: MIT Press.
Burnett-Stuart, G., Halligan, P.W. and Marshall, J.C. (1991) A Newtonian model of perceptual distortion in visuo-spatial neglect. NeuroReport, 2: 255-257.
Calvanio, R., Petrone, P.N. and Levine, D.N. (1987) Left visual spatial neglect is both environment-centered and body-centered. Neurology, 37: 1179-1181.
Caramazza, A. and Hillis, A.E. (1990) Spatial representation of words in the brain implied by studies of a unilateral neglect patient. Nature, 346: 261-269.
Cohen, J.D., Farah, M.J., Romero, R.D. and Servan-Schreiber, D. (1994) Mechanisms of spatial attention: the relation of macrostructure to microstructure in parietal neglect. J. Cogn. Neurosci., 6(4): 377-387.
Colby, C.L. (1998) Action oriented spatial reference frames in cortex. Neuron, 20: 15-24.
Colby, C.L. and Duhamel, J.R. (1993) Ventral intraparietal area of the macaque: anatomic location and visual response properties. J. Neurophysiol., 69(3): 902-914.
Deneve, S. and Pouget, A. (1998) Neural basis of object-centered representations. In: Jordan, M.I., Kearns, M.J. and Solla, S. (Eds.), Advances in Neural Information Processing Systems, vol. 10. Cambridge, MA: MIT Press.
Drain, M. and Reuter-Lorenz, P.A. (1997) Object-centered neglect for letters: do informational asymmetries play a role? Neuropsychologia, 35(4): 445-456.
Driver, J., Baylis, G.C., Goodrich, S.J. and Rafal, R.D. (1994) Axis-based neglect of visual shapes. Neuropsychologia, 32(11): 1353-1365.
Duhamel, J.R., Bremmer, F., BenHamed, S. and Graf, W. (1997) Spatial invariance of visual receptive fields in parietal cortex neurons. Nature, 389(6653): 845-848.
Farah, M.J., Brunn, J.L., Wong, A.B., Wallace, M.A. and Carpenter, P.A. (1990) Frames of reference for allocating attention to space: evidence from the neglect syndrome. Neuropsychologia, 28(4): 335-347.
Field, P.R. and Olson, C.R. (1994) Spatial analysis of somatosensory and visual stimuli by single neurons in macaque area 7B. Soc. Neurosci. Abstr., vol. 20(1), p. 317.12.
Galletti, C. and Battaglini, P.P. (1989) Gaze-dependent visual neurons in area V3a of monkey prestriate cortex. J. Neuroscience, 9: 1112-1125.
Goldberg, M.E., Colby, C.L. and Duhamel, J.R. (1990) Representation of visuomotor space in the parietal lobe of the monkey. Cold Spring Harbor Symposia on Quantitative Biology, 55: 729-739.
Goodale, M.A. and Milner, A.D. (1990) Separate visual pathways for perception and action. Trends in Neuroscience, 15: 20-25.
Goodman, S.J. and Andersen, R.A. (1990) Algorithm programmed by a neural model for coordinate transformation. In: International Joint Conference on Neural Networks.
Graziano, M.S., Yap, G.S. and Gross, C.G. (1994) Coding of visual space by premotor neurons. Science, 266: 1054-1057.
Green, D.M. and Swets, J.A. (1966) Signal detection theory and psychophysics. John Wiley & Sons.
Halligan, P.W. and Marshall, J.C. (1989) Line bisection in visuo-spatial neglect: disproof of a conjecture. Cortex, 25: 517-521.
Heilman, K.M., Watson, R.T. and Valenstein, E. (1985) Neglect and related disorders. In: Heilman, K.M. and Valenstein, E. (Eds.), Clinical Neuropsychology. New York: Oxford University Press, pp. 243-294.
Karnath, H.O., Schenkel, P. and Fischer, B. (1991) Trunk orientation as the determining factor of the ‘contralateral’ deficit in the neglect syndrome and as the physical anchor of the internal representation of body orientation in space. Brain, 114: 1997-2014.
Karnath, H.O., Christ, K. and Hartje, W. (1993) Decrease of contralateral neglect by neck muscle vibration and spatial orientation of trunk midline. Brain, 116: 383-396.
Kinsbourne, M. (1987) Mechanisms of unilateral neglect. Pages 69-86 of Jeannerod, M. (Ed.), Neurophysiological and Neuropsychological Aspects of Spatial Neglect. North-Holland.
Koch, C. and Ullman, S. (1985) Shifts in selective visual attention: towards the underlying neural circuitry. Hum. Neurobiology, 4(4): 219-27.
Ladavas, E. (1987) Is the hemispatial deficit produced by right parietal lobe damage associated with retinal or gravitational coordinates? Brain, 110: 167-180.
Ladavas, E., Pesce, M.D. and Provinciali, L. (1989) Unilateral attention deficits and hemispheric asymmetries in the control of visual attention. Neuropsychologia, 27(3): 353-366.
Ladavas, E., Petronio, A. and Umilta, C. (1990) The deployment of visual attention in the intact field of hemineglect patients. Cortex, 26: 307-317.
Mazzoni, P. and Andersen, R.A. (1995) Gaze coding in the posterior parietal cortex. In: Arbib, M.A. (Ed.), The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press, pp. 423-426.
Mishkin, M., Ungerleider, L.G. and Macko, K.A. (1983) Object vision and spatial vision: two cortical pathways. Trends in Neurosci., Oct: 414-417.
Mozer, M.C. and Behrmann, M. (1990) On the interaction of selective attention and lexical knowledge: a connectionist account of neglect dyslexia. J. Cogn. Neurosci., 2(2): 96-123.
Mozer, M.C., Halligan, P.W. and Marshall, J.C. (1997) The end of the line for a brain-damaged model of hemispatial neglect. J. Cogn. Neurosci., 9(2): 171-190.
Olson, C. and Gettner, S. (1998) Impairment of object-centered vision following lesion of macaque posterior parietal cortex. Soc. Neurosci. Abstr., p. 449.10.
Olson, C.R. and Gettner, S.N. (1995) Object-centered direction selectivity in the macaque supplementary eye field. Science, 269: 985-988.
Pizzamiglio, L., Frasca, R., Guariglia, C., Incoccia, C. and Antonucci, G. (1990) Effect of optokinetic stimulation in patients with visual neglect. Cortex, 26: 535-540.
Poggio, T. (1990) A theory of how the brain might work. Cold Spring Harbor Symposium on Quantitative Biology, 55: 899-910.
Poggio, T. and Girosi, F. (1990) Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247: 978-982.
Pouget, A. and Driver, J. (1999) Visual neglect. In: Wilson, R. and Keil, F. (Eds.), MIT Encyclopedia of Cognitive Sciences. Cambridge: MIT Press.
Pouget, A. and Sejnowski, T.J. (1995) Spatial representations in the parietal cortex may use basis functions. In: Tesauro, G., Touretzky, D.S. and Leen, T.K. (Eds.), Advances in Neural Information Processing Systems, vol. 7. Cambridge, MA: MIT Press.
Pouget, A. and Sejnowski, T.J. (1997a) A new view of hemineglect based on the response properties of parietal neurons. Philosoph. Transac. Roy. Soc. Series B, 352: 1449-1459.
Pouget, A. and Sejnowski, T.J. (1997b) Spatial transformations in the parietal cortex using basis functions. J. Cogn. Neurosci., 9(2): 222-237.
Pouget, A. and Sejnowski, T.J. (1999) Lesioning a basis function model of spatial representations in the parietal cortex: comparison with hemineglect. Psychol. Rev., submitted.
Rubens, A.B. (1985) Caloric stimulation and unilateral visual neglect. Neurology, 35: 1019-1024.
Salinas, E. and Abbott, L.F. (1995) Transfer of coded information from sensory to motor networks. J. Neurosci., 15(10): 6461-74.
Salinas, E. and Abbott, L.F. (1996) A model of multiplicative neural responses in parietal cortex. Proc. Nat. Acad. Sci., USA, 93: 11956-11961.
Schlag-Rey, M. and Schlag, J. (1984) Visuomotor functions of central thalamus in monkey. I. Unit activity related to spontaneous eye movements. J. Neurophysiol., 51(6): 1149-74.
Sparks, D.L. (1991) Sensori-motor integration in the primate superior colliculus. Seminars in the Neurosciences, 3: 39-50.
Stein, J.F. (1992) The representation of egocentric space in the posterior parietal cortex. Behav. Brain Sci., 15(4): 691-700.
Tipper, S.P. and Behrmann, M. (1996) Object-centered not scene-based visual neglect. J. Experimental Psychology: Human Perception and Performance, 22(5): 1261-1278.
Treisman, A. and Gelade, G. (1980) A feature integration theory of attention. Cogn. Psychol., 12: 97-136.
Trotter, Y., Celebrini, S., Stricanne, B., Thorpe, S. and Imbert, M. (1992) Modulation of neural stereoscopic processing in primate area V1 by the viewing distance. Science, 257: 1279-81.
Van Opstal, A.J., Hepp, K., Suzuki, Y. and Henn, V. (1995) Influence of eye position on activity in monkey superior colliculus. J. Neurophysiol., 74(4): 1593-1610.
Zipser, D. and Andersen, R.A. (1988) A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature, 331: 679-684.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 7
Explaining object-based deficits in unilateral neglect without object-based frames of reference
Michael C. Mozer*
Department of Computer Science and Institute of Cognitive Science, University of Colorado, Boulder, CO 80309-0430, USA
Introduction

A key question motivating research in perception and attention is how the brain represents visual information. One aspect of this representation is the reference frame with respect to which visual features are encoded. The reference frame specifies the center location, the up-down, left-right, and front-back directions, and the relative scale of each axis. Reference frames can be prescribed by the viewer’s gaze, intrinsic characteristics of an object, or the environment. To determine the frames of reference involved in human vision and attention, neurological patients with unilateral neglect have been extensively studied. Neglect patients often fail to orient toward, explore, and respond to stimuli on the left. The interesting question is: with respect to what frame of reference is neglect of the left manifested? When a neglect patient shows a deficit in attentional allocation that depends not merely on the location of an object with respect to the viewer but on the extent, shape, or movement of the object itself, the inference is often made that attentional allocation must be operating in an object-based frame of reference. Via simulations of an existing connectionist model of spatial attention which operates entirely in a viewer-based frame of reference, I argue that this inference is not logically necessary: object-based attentional effects can be obtained without object-based frames of reference. By explaining object-based effects without object-based reference frames, our simulations call into question the common assumption that object-based representations are required in visual attention and perception.

*Corresponding author.
Reference frames in visual perception

Figure 1 shows two different frames. Using the reference frame centered on the telephone, the phone handle would be described as being up relative to the phone base, but using the reference frame centered on the viewer, the phone handle would be described as being relatively in back. Reference frames can be prescribed by the viewer, objects, or the environment. Viewer-based frames are determined by the gaze, head orientation, and/or torso position of the viewer. Object-based frames are determined by intrinsic characteristics of an object, such as axes of symmetry or elongation, or knowledge of the object’s standard orientation. Environment-based frames are based on landmarks in the environment, such as walls in a room, or other absolutes, such as gravity or compass directions.¹

¹ Alternative terminology for these three reference frames abounds in the literature. Retinotopic, head-centered, and body-centered are specific instances of viewer based; stimulus based is equivalent to object based; and gravitational and scene-based are instances of environment based. Further, egocentric is often used as a synonym for viewer based, and allocentric as a synonym for environment based.
Fig. 1. Two reference frames that can describe the telephone, one of which is intrinsic to the object and the other is based on the viewer’s gaze. The reference frame prescribes the center location, the up-down, left-right, and front-back directions, and the scale of each axis (indicated by the mark along each axis).
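A reference frame in this sense (a center, axis directions, and a scale) can be written down as a small data structure. The sketch below is illustrative only: it reduces the frame to two dimensions, and the class name `Frame2D`, its fields, and the example coordinates are hypothetical rather than taken from the chapter.

```python
import math
from dataclasses import dataclass

@dataclass
class Frame2D:
    cx: float          # center of the frame, in viewer coordinates
    cy: float
    angle_deg: float   # counterclockwise rotation of the frame's axes relative to the viewer's
    scale: float       # length of one frame unit, in viewer units

    def from_viewer(self, x, y):
        """Re-express a viewer-based location in this frame's coordinates."""
        dx, dy = x - self.cx, y - self.cy
        a = math.radians(self.angle_deg)
        # Project the offset onto the frame's right and up axes, then rescale.
        u = (dx * math.cos(a) + dy * math.sin(a)) / self.scale
        v = (-dx * math.sin(a) + dy * math.cos(a)) / self.scale
        return u, v

# Hypothetical object: centered at (4, 2), axes rotated by 90 degrees, one unit = 2 viewer units.
obj = Frame2D(cx=4.0, cy=2.0, angle_deg=90.0, scale=2.0)
u, v = obj.from_viewer(4.0, 4.0)   # a feature directly above the center for the viewer
print(f"object-based coordinates: ({u:.2f}, {v:.2f})")
```

The feature that lies above the object's center in viewer coordinates lies one unit along the object's own right axis, which mirrors the way the phone handle receives different descriptions in the two frames of Fig. 1.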
Determining which reference frame or frames are used by the brain to encode visual features is a key step to understanding the mechanisms of visual cognition, most importantly object recognition. Historically, two contrasting models of object recognition have been proposed, one of which relies on object-based frames and the other does not. By determining the psychological and neurobiological reality of neocortical object-based representations, we can determine which model is more likely to be correct. The two models, caricatured in Fig. 2, are focused on solving a key problem in object recognition - viewpoint-invariant recognition, i.e. identifying the object as being the same regardless of its position, orientation, and distance with respect to the viewer. The traditional model involves constructing an internal description of the
structure of the object from the observed visual features (Fig. 2a; Hinton, 1981; Marr, 1982; Pinker, 1984; Zemel et al., 1988; Caramazza and Hillis, 1990a). In our terminology, this boils down to transforming the viewer-based input representation to an object-based representation. This transformation solves the problem of viewpoint-invariant recognition, because every view of an object maps to the same object-based representation (modulo occluded features). Given the object-based representation, recognition of a rigid object can be performed by matching the representation to stored templates of familiar objects. One might call this a two-stage model because of the separation of the transformation and recognition processes. In contrast, a model one might call one-stage involves the gradual construction of a transformation-invariant
Fig. 2. Two models of viewpoint-invariant object recognition. (a) In the two-stage model, visual features corresponding to an object (the shaded circle) are transformed from a viewer-based representation to an object-based representation, and then, for rigid objects, recognition consists of a template matching process. (b) In the one-stage model, many transformations of the features in the visual input are considered in parallel, and recognition is achieved via a multistage hierarchical process that constructs increasingly complex featural representations with increasing viewpoint-invariance.
representation in the course of performing recognition (Fig. 2b; Hubel and Wiesel, 1979; Fukushima and Miyake, 1982; Le Cun et al., 1989; Mozer, 1991). The one-stage model involves a hierarchy of detectors, depicted in Fig. 2b by the pyramid structure, that construct increasingly complex featural representations that are increasingly transformation invariant. Rather than being forced to choose a single reference frame of representation, parallelism of hardware in this model allows it to consider multiple transformations of features at each level in the hierarchy. To contrast these two models, consider the task of recognizing an object independent of its location in the visual field. The two-stage model first selects the region of interest in the visual field, and then processes only the visual features at that location, whereas the one-stage model processes visual features from across the visual field, but as processing proceeds, features in the region of interest come to dominate the representation.² Perhaps through the powerful influence of Marr's work, the two-stage model is often taken as canonical, and has provided a pervasive but often implicit framework for theoretical interpretation of experimental results. However, skepticism about the two-stage model is warranted, on three grounds. First, in primate single-cell recording studies, visual object representations do not achieve complete viewpoint invariance (Tanaka, 1993); these data are more consistent with the one-stage model, which predicts limited invariance, than the two-stage model, which requires the neocortical reality of object-based representations and hence a viewpoint-invariant representation. Second, in human behavioral studies, some support is found for view-specific object representations (Bar, 1998; Vetter et al., 1995; Tarr, 1995). Third, implementing the two-stage model has proven tricky, in part because an object-based frame cannot always be established in a purely bottom-up fashion, and the alternative approach, an interactive model in which the object-based frame and the object identity are determined in parallel through constraint satisfaction, involves a massive combinatorial search that has many local optima (Hinton and Lang, 1985; O'Reilly et al., in press, Chapter 6). Although some empirical studies fail to find support for object-based representations, and the mechanisms capable of constructing object-based representations are as yet unknown, one cannot definitively rule out the possibility of object-based representations in visual cortex. A compelling demonstration of the existence of object-based representations would provide strong support for the two-stage model, and would cast doubt upon the one-stage model on grounds of parsimony. For this reason, there has been intense interest in neurological patients with unilateral neglect, who provide a rich source of data diagnostic of the reference frames involved in human perception.

² Attention is required to direct processing in both frameworks. In the two-stage model, attention might specify the viewer-based to object-based transformation, and in the one-stage model, attention might highlight the representation of visual features in the region of interest.
Unilateral neglect

Damage to parietal cortex can cause patients to fail to orient toward, explore, and respond to stimuli on the contralesional side of space (Farah, 1990; Heilman et al., 1993). This disorder, known as unilateral neglect, can compromise visual, auditory, tactile, and olfactory modalities and may involve personal, extrapersonal, and imaginal space (Halligan and Marshall, 1993). Unilateral neglect is more frequent, long lasting, and severe following lesions to the right hemisphere than to the left. Consequently, all descriptions in this paper will refer to right-hemisphere damage and neglect of stimuli on the left. The interesting question surrounding unilateral visual neglect is: with respect to what reference frame is left neglect manifested? Clever behavioral experiments have been designed to dissociate various reference frames and determine the contribution of each to neglect. In several experiments, patients show a deficit in attentional allocation that depends not merely on the location of an object with respect to the viewer, but on the extent, shape, or movement of the object itself. From this finding of object-based attentional effects, the inference is often made that attentional allocation must be operating in an object-based
frame of reference, and consequently, object-based representations are key to visual information processing. The point of this paper is to show that this inference is not logically necessary: object-based attentional effects can be obtained without object-based reference frames. Consequently, the neglect data that has been mustered as strong support for the two-stage model of visual recognition is equally consistent with the one-stage model. We argue this point via a computational model that utilizes only viewer-based frames, yet can account for data from a variety of experimental studies that were interpreted as supporting object-based frames. Through simulations of the computational model, it becomes evident that the data is trickier to interpret than one might at first imagine. In the next section, we present the model and explain the key principles that will allow us to account for the data. Then, we show simulation results for several different studies. Finally, we conclude with a discussion of other data in the literature that has been used as evidence for and against the neurobiological reality of object-based frames.
MORSEL

MORSEL (Mozer, 1991; Mozer and Sitton, 1998) is a connectionist model of visual perception and attention. The model has previously been used to explain a large corpus of experimental data, including perceptual errors that arise when several shapes appear simultaneously in the visual field, facilitatory effects of context and redundant information, visual search performance, attentional cueing effects, reading deficits in neglect dyslexia (Mozer and Behrmann, 1992), and line bisection performance in neglect (Mozer et al., 1997). MORSEL (Fig. 3) includes a recognition network that can identify multiple shapes in parallel and in arbitrary locations of the visual field, but has capacity limitations. Thus, it is a single-stage recognition model, as depicted in Fig. 2b. MORSEL also includes an attentional mechanism that determines where in the visual field to focus processing resources. Visual input presented to MORSEL is encoded by a set of feature detectors arrayed on a topographic map. The detectors are of five primitive feature types: oriented line segments at 0°, 45°, 90°, and 135°, and line-segment terminators (ends of line segments). Figure 4 shows a sample input to MORSEL, four letters centered on the corners of a square, where the representation of each letter occupies a 3 × 3 region of the topographic map. The upper panel presents the superimposed features, and the bottom panels separate the topographic map by feature type. In these separate maps, a dark symbol indicates activity of the detector for the given feature at the particular location, and a light symbol indicates inactivity. Activity from the
Fig. 3. Key components of MORSEL (Mozer, 1991). MORSEL includes a recognition network, the first stages of which are depicted against a grey background, and an attentional mechanism.
topographic map propagates through both the recognition network and the attentional mechanism. In our earlier modeling work, we stipulated that the topographic map is in a viewer-based reference frame, meaning that the input representation changes as the viewer moves through the world. However, our earlier work did not require us to commit as to the precise nature of the viewer-based frame, whether it be retinotopic, head centered, or body centered. Because the experimental paradigms that we simulate in this work confound eye, head, and body position, the various viewer-based frames are equivalent, and no specific commitment is required now either. MORSEL is primarily a model of psychological structures, not neurobiological structures. One might treat MORSEL's primitive visual features as corresponding to primary visual cortex, and the AM as corresponding to parietal cortex. Beyond this loose fit to neurobiology, we do not commit to a neurobiological instantiation at present. We instead treat MORSEL as a psychological-level theory which describes functional processing in neocortex. Consequently, we characterize processing units in the model in terms of their functional properties, not neurobiology. For example, the left visual field is represented by units on the left side of the primitive feature maps, even though those units would correspond to V1 neurons in the right cerebral hemisphere.
MORSEL is not intended as a model of human development. The recognition network is trained to reproduce adult competence, but MORSEL makes no claims as to the nature of developmental processes that give rise to adult competence in visual perception. The connectivity of the attentional mechanism is determined by principles we describe in the next section. The connectivity is fixed in all simulations; connectionist learning procedures are not involved. MORSEL is a comprehensive model consisting not only of the recognition network and attentional mechanism depicted in Fig. 3, but several other elements that we sidestep here because of their irrelevance to the present work. Simulating the entire model can be a problem, because it is difficult to identify which components and properties of the model are responsible for producing a certain behavior. Consequently, our strategy has been to simulate only the critical components of the model, and to make simple assumptions concerning the operation of other components. We adopt this strategy in the present work, and use only the attentional mechanism to account for data from unilateral neglect, much as we did in Mozer et al. (1997).

The attentional mechanism
The attentional mechanism, or AM for short, is a set of processing units in one-to-one correspondence with the locations in the topographic map.
Fig. 4. Top panel: Sample input to MORSEL, consisting of the letters A, C, D, and X, encoded in terms of five primitive features: line segments at four orientations and segment terminators (circles). Bottom panels: Feature map activity corresponding to the sample input. A dark symbol indicates the activity of the detector for a particular feature in a particular location; a light symbol indicates inactivity.
Activity in an AM unit indicates the salience of the corresponding location, and serves to gate the flow of activity from feature detectors at that location in the topographic map into the recognition network (indicated in Fig. 3 by the connections from the AM into the recognition network); the more active an AM unit is, the more likely that features in the corresponding location of the topographic map will be detected and analyzed by the recognition network. However, the AM serves only to bias processing: features from unattended locations are not absolutely inhibited, but have a lower probability of being detected by the recognition network. Each unit in the AM receives bottom-up or exogenous input from the detectors in the
corresponding location of the topographic map (indicated in Fig. 3 by the connections from the primitive features to the AM). Each unit in the AM can also receive top-down or endogenous input from higher centers in the model, but this aspect of the model is not utilized in the present research. Given the exogenous and endogenous input, cooperative and competitive dynamics within the AM cause a subset of locations to be activated. Figure 5 shows an example of the AM in operation. Each panel contains a 15 × 15 topographic map depicting the state of the AM after various numbers of processing time steps or iterations. The area of a black square is proportional to the exogenous input at that location. The
Fig. 5. Example of the operation of the AM. Each panel contains a 15 × 15 topographic map depicting the state of the AM at a particular processing iteration. The area of a black square is proportional to the exogenous input at that location. The area of a white square is proportional to the AM activity. The white squares are superimposed on top of the black; consequently, the exogenous input is not visible at locations with AM activity. The exogenous input pattern indicates three objects; the largest one, which produces the strongest input, is in the upper left portion of the field. By iteration 20, the AM has reached equilibrium and has selected the region surrounding the largest object.
area of a white square is proportional to the AM activity. The white squares are superimposed on top of the black; consequently, the exogenous input is not visible at locations with AM activity. Initially, at iteration 0, the AM is reset and has no activity. Three distinct blobs of feature activity are evident on the input, but as processing proceeds, the AM selects the largest blob. Note that the input blobs do not indicate the type or precise arrangement of features, just the total activity in a region. Although the model appears to have formed a spotlight of attention, the dynamics of the model do not mandate the selection of a contiguous or convex region. Typically, however, a single region is selected, and the selected region conforms to the shape of objects in the visual input, tapering off at object boundaries. The operation of the AM is based on three principles concerning the allocation of spatial attention, which most would view as noncontroversial: Attention is directed to locations in the visual field where objects appear, as well as to other task-relevant locations. Attention is directed to contiguous regions of the visual field. Attention has a selective function; it should choose some regions of the visual field over others. These abstract principles concerning the direction of attention can be incorporated into a computational model like the AM by translating them into rules of activation, such as the following: (1) Locations containing visual features should be
activated. This rule provides a bias on unit activity (i.e., all else being equal, the principle indicates whether a unit should be on or off). One can see this rule at work in Fig. 5, where the initial activity of the AM (upper-middle frame) is based on the exogenous input (upper-left frame). (2) Locations adjacent to activated locations should also be activated. This rule results in cooperation between neighboring units, and is manifested in Fig. 5 by the increase in activity
over time for the blob in the upper left portion of the field. (3) Locations whose activity grows the slowest should be suppressed. This rule results in competition between units, and is manifested in Fig. 5 by the decrease in activity for the two lower blobs once the upper-left blob begins to dominate in activity. This rule allows a large region to become activated, if the activity of all units in the region rises at more-or-less the same rate. These three rules qualitatively describe the operation of the model. The model can be characterized quantitatively through an update equation, which expresses the activity of a processing unit in the AM as a function of the input to the AM and the activities of other AM units. If we denote the activity of an AM unit at location $(x, y)$ in the topographic map at a particular time $t$ by $a_{xy}(t)$, then its new activity at the following time step is expressed as

$$a_{xy}(t+1) = f\Big( a_{xy}(t) + \mathrm{exo}_{xy} + \mu \sum_{(i,j) \in \mathrm{NEIGH}_{xy}} \big[ a_{ij}(t) - a_{xy}(t) \big] - \theta \big[ \bar{a}(t) - a_{xy}(t) \big] \Big) \qquad (1)$$

where $\mathrm{exo}_{xy}$ is the exogenous input to the AM from features in the topographic map at location $(x, y)$, $f$ is a linear threshold function that caps activity at 0 and 1,

$$f(z) = \begin{cases} 0 & \text{if } z < 0 \\ z & \text{if } 0 \le z \le 1 \\ 1 & \text{if } z > 1 \end{cases}$$

and $\mathrm{NEIGH}_{xy}$ is the set of eight locations adjacent to $(x, y)$. The first term on the right side of Eqn 1, $a_{xy}(t)$, causes a unit to sustain its activity over time. The second term, $\mathrm{exo}_{xy}$, implements the bias rule. The third term implements the cooperation rule by causing an increase in activity when a unit is less
active than its neighbors. Because it also causes a decrease in activity when a unit is more active than its neighbors, the third term can be viewed as encouraging a unit to take on the average value of its neighbors. Finally, the fourth term in Eqn 1 implements the competition rule by causing a decrease in activity when a unit is less active than $\bar{a}(t)$, a measure of the average activity of the AM, defined below. The parameters $\mu$ and $\theta$ are positive and weight the contribution to the activation dynamics of the cooperation and competition rules, respectively. The average activity, $\bar{a}$, requires some additional explanation. The fourth term in Eqn 1 causes a unit’s activity to be inhibited in proportion to $\bar{a}$. If $\bar{a}$ were simply the mean activity level of all AM units, i.e.,

$$\bar{a}(t) = \frac{1}{n} \sum_{x,y} a_{xy}(t),$$

where $n$ is the number of units in the AM, the level of inhibition would rise or fall as the total activity rises or falls, causing the total activity to remain roughly constant; consequently, the AM would tend to select a fixed-size region. The AM should be capable of attending to small or large regions, depending on the stimulus and task environment. To achieve this property, the inhibition between each pair of units is modulated by the number of active units, instead of what amounts to fixed inhibition between units. That is, $\bar{a}$ is defined as the mean activation considering only the active units, computed by replacing $n$ with $n_{ACT}$, the number of active units:

$$n_{ACT} = \lim_{\epsilon \to 0} \sum_{x,y} \frac{a_{xy}}{\epsilon + a_{xy}}.$$

As $\epsilon$ approaches zero, $n_{ACT}$ becomes simply the number of units with positive activity levels. In the original model, it turned out that to control the behavior of the AM, an additional depreciation factor, $\gamma$, was needed in the definition of $\bar{a}$:

$$\bar{a}(t) = \frac{\gamma}{n_{ACT}} \sum_{x,y} a_{xy}(t),$$

where $0 < \gamma \le 1$. If $\gamma = 1$, a unit must have an activity level above the mean to remain on, but if $\gamma < 1$, the mean is depreciated and units whose activity is slightly below the mean will not be suppressed. To explain the activation function intuitively, consider the time course of activation as depicted in Fig. 5. Initially, the activity of all AM units is reset to zero. When a stimulus display is presented, features are activated in the topographic map, which provides exogenous input to the AM (second term in Eqn 1). Units with active neighbors will grow the fastest because of neighborhood support (third term). As the flow of activation progresses, high-support neighborhoods will have activity above the mean; they will therefore be pushed even higher, while low-support neighborhoods will experience the opposite tendency (fourth term).
Lesioning the AM to produce neglect
To model data from neglect dyslexia (Mozer and Behrmann, 1992) and line bisection (Mozer et al., 1997), we proposed a particular form of lesion to the model: damaging the connections from the primitive feature maps to the AM. The damage is graded monotonically, most severe at the left extreme of the topographic map and least severe at the right (assuming a right hemisphere lesion, as we do throughout this article). Figure 6 depicts the damaged connections into the AM. The graded damage is important, because it results in a relative preference for the right; complete destruction of the connections in the left field and fully intact connections in the right field would yield a qualitatively different sort of behavior. The graded damage we propose is inspired by Kinsbourne’s (1987, 1993) orientational bias account of neglect.
Fig. 6. A sketch of the AM and some of its inputs from the primitive feature maps. Each feature detector connects to the homologous unit in the AM. In neglect, graded damage to these connections is hypothesized, causing feature detectors to be less effective in activating the AM. The damage is depicted by the fainter connections toward the left side of the field.
3 The damage we propose is described in functional terms, i.e., how the damage affects the operation of the model. The model is neutral with regard to the neurobiological basis of this damage, i.e., how a unilateral brain lesion results in damage of this functional form. Additional assumptions will be required to specify the model at a neurobiological level.
This proposal for lesioning the model can be contrasted with two alternatives. First, one might damage the visual recognition network itself. However, this would lead to blindness, and is inconsistent with the view of neglect as an attentional phenomenon and with the neuroanatomical lesion sites that give rise to neglect. Second, one might lesion the AM directly, changing either the activation dynamics or the connectivity of the units such that damaged units integrate activity more slowly or have a weakened influence on the activity of other units. We conjecture that these types of lesions would yield a behavioral effect similar to the proposed lesion for the simulation studies reported in this article. The damage depicted in Fig. 6 affects the probability that primitive visual features are detected by the AM. To the extent that features in a given location fail to trigger attention, the AM will fail to focus attention at that location. Thus, the
deficit is not ‘perceptual’, in the sense that if somehow attention can be mustered, features will be analyzed normally by the recognition network. The nature of the attentional deficit is specified via a function relating the horizontal position of a feature on the topographic map to the probability that the feature will be transmitted to the corresponding location of the AM (Fig. 7). The function is piecewise linear with a flat segment, followed by a segment with positive slope, followed by another flat segment. The left and right extremes of the curve represent the left and right edges of the topographic map, respectively. The probability that the AM will register a feature is low in the left field, and is monotonically nondecreasing further to the right.
[Fig. 7 plot: transmission probability (vertical axis) against position in viewer-centered frame (horizontal axis, left to center to right), with the anchor probability, anchor position, and saturation level marked.]
Fig. 7. The transmission probability curve representing the damage to the model’s attentional system. This function relates the position of a feature in the viewer-centered frame to the probability that the feature will be detected by the corresponding unit of the AM. The function is for a left neglect patient; the probability that the AM will register a feature is low in the left field, and is monotonically nondecreasing further to the right.
The function is characterized by four parameters: (1) the minimum transmission probability (anchor
probability); (2) the horizontal position in the topographic map at which the probability begins to rise (anchor position); (3) the slope of the rising segment (gradient); and (4) the probability of feature transmission on the right extreme of the topographic map (saturation probability). This parameterization allows a variety of transmission functions, including forms corresponding to normals (e.g. a minimum probability close to 1 and a gradient of 0), a homogeneous slope across the entire field (e.g. a shallow gradient and a saturation position at the far right edge), and a sharp discontinuity at the hemifield crossing (a very steep gradient and a saturation position just to the right of center). Presumably the exact nature of the function varies from patient to patient. Regardless of the specific form of damage, we emphasize that the damage is to a viewer-centered representation of space.
General simulation methodology
The AM as described is identical to the model used in our earlier simulation studies of neglect. However, the nature of the model’s input was changed in one minor respect. In the earlier simulation studies, when a stimulus display was presented to the model, the exogenous input to the AM was determined probabilistically based on the stimulus display and the transmission-probability function (Fig. 7). The exogenous input then remained constant as the AM settled. However, when the display itself is not static, as is the case in one simulation reported here, the exogenous input cannot be static. Consequently, in these simulations, we recomputed the exogenous input at each time step of the simulation. This recomputation had no systematic effect for static displays, but allowed us to simulate the AM for dynamic displays. The AM has three parameters: $\mu$, $\theta$, and $\gamma$. In our earlier simulations utilizing the AM, $\mu$ was fixed at 1/8 and $\theta$ at 1/2. These values were used in the present research as well.
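As a rough illustration of how the activation rules combine, the following Python sketch performs one synchronous update of the AM. It assumes the update has the form described in the text (a sustain term, the exogenous input, a cooperation term over the eight neighbors weighted by 1/8, and a competition term weighted by 1/2 against a depreciated mean taken over active units only). The function names, the list-of-lists grid, and the treatment of map borders (neighbors outside the map are ignored) are assumptions made for the example, not details given in the chapter.

```python
MU = 1.0 / 8.0     # cooperation weight (value quoted in the text)
THETA = 1.0 / 2.0  # competition weight (value quoted in the text)


def clamp01(z):
    """Linear threshold function: cap activity at 0 and 1."""
    return max(0.0, min(1.0, z))


def depreciated_mean(act, gamma):
    """gamma times the mean activity over units with positive activity."""
    active = [a for row in act for a in row if a > 0.0]
    if not active:
        return 0.0
    return gamma * sum(active) / len(active)


def am_step(act, exo, gamma=1.0, mu=MU, theta=THETA):
    """One synchronous update of every AM unit (hypothetical helper)."""
    rows, cols = len(act), len(act[0])
    a_bar = depreciated_mean(act, gamma)
    new = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            a = act[y][x]
            coop = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    # Assumption: no wraparound; off-map neighbors are skipped.
                    if 0 <= ny < rows and 0 <= nx < cols:
                        coop += act[ny][nx] - a
            z = a + exo[y][x] + mu * coop - theta * (a_bar - a)
            new[y][x] = clamp01(z)
    return new
```

Starting from an all-zero state, the first call simply copies the exogenous input into the AM, which matches the description of the bias rule; repeated calls then let cooperation and competition reshape the pattern.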
The third parameter, $\gamma$, is dependent on the amount of activity in the stimulus display. In our earlier simulations, we devised a formula for setting $\gamma$ based on the total exogenous input to the AM, $\sum_{x,y} \mathrm{exo}_{xy}$, and a metaparameter $\gamma'$ that modulates the fraction of the locations that provide exogenous input to the AM that should be selected by the AM:
i e3
y =min 0.75, max 1.0,
.
$\gamma'$ was originally conceived as task- and stimulus-independent, and earlier simulations of the AM used a constant $\gamma'$. However, we discovered in the present work, which covers a much wider variety of stimulus displays than the previous simulations, that it was necessary to set $\gamma'$ for each experimental paradigm. The adjustment was performed to obtain sensible behavior from the AM, not to fit simulation data to human data. The model’s behavior was qualitatively robust to the choice of $\gamma'$. However, if $\gamma'$ was too large, the AM would fail to be selective, and if $\gamma'$ was too small, all activity in the AM would die out. We set $\gamma'$ to 240 for simulations of Behrmann and Tipper (BT) and 110 for simulations of Arguin and Bub (AB). In Mozer et al. (1997), we simulated a range of lesions by varying the four parameters in the transmission-probability curve. For the present work, however, we chose a single lesion profile that had produced typical results in the earlier work. This profile had an anchor probability of 0.3 and a saturation probability of 0.9. The anchor position was at the left edge of the topographic map, and the gradient was chosen such that saturation was reached 5/6 of the way to the right edge of the topographic map. For the unlesioned model, the anchor and saturation probabilities were both 0.9. The BT simulations used a topographic map of dimensions 36 × 36; the AB simulation required a larger 10 × 61 topographic map to allow for variation in the horizontal position of the stimuli. To simulate an experimental task, the experimental stimuli must be mapped to a pattern of exogenous input to the AM. As we have done in earlier work, the mapping was accomplished by laying a silhouette of the stimulus over the topographic map, and setting the exogenous input at all locations covered by the silhouette to 0.10 except along the stimulus contour, where the exogenous input is raised to 0.20 to reflect the contrast along the border. Further, as in the past, we
assume a slight amount of blurring of the exogenous input: each stimulus location provided input not only to the corresponding location of the AM but also to the immediately adjacent locations, with a relative strength of 2%. This small amount of spread is unlikely to affect processing, but we have preserved it to maintain consistency with the original simulations. The experimental tasks to be simulated have as their dependent variable the response time to detect or identify a target. Rather than running the full MORSEL model and using the object-recognition network to determine detection or identification responses, we make a simple read-out assumption that allows us to perform a simulation using only the AM. The assumption is that the reaction time to detect or identify a target is inversely proportional to the attentional activation in locations that correspond to the target. This assumption is justified by earlier simulations of MORSEL (Mozer, 1991), in which output activity of the recognition network was found to be monotonically related to the allocation of attention to locations of a target. Because the propagation of activity in MORSEL is temporally extended, we use not the instantaneous activation of the AM, but rather the mean activity of the AM over the twenty iterations following target onset. The results described in the following sections are not sensitive to the specific read-out assumptions; results are qualitatively similar if the mean activity is computed over ten or forty iterations instead of twenty, or if the mean activity is mapped to response time by any monotonic transformation. Because trials will vary due to random effects of the transmission probability curve, we average activation across multiple trials in each experimental condition.
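The mapping from a stimulus silhouette to one trial's exogenous input can be sketched as follows. This is a minimal illustration under several assumptions that the chapter does not spell out: positions are expressed as fractions of the map width, each covered location is transmitted or dropped independently, and the 2% blur is spread to all eight neighbors. The function names and the 0/1 grid encoding of the silhouette and contour are hypothetical.

```python
import random


def transmission_prob(x, width, anchor_prob=0.3, sat_prob=0.9,
                      anchor_pos=0.0, sat_pos=5.0 / 6.0):
    """Piecewise-linear transmission probability at column x of a map.

    Positions are fractions of the map width (0 = left edge, 1 = right edge).
    The defaults follow the lesion profile quoted in the text.
    """
    frac = x / (width - 1) if width > 1 else 0.0
    if frac <= anchor_pos:
        return anchor_prob
    if frac >= sat_pos:
        return sat_prob
    slope = (sat_prob - anchor_prob) / (sat_pos - anchor_pos)
    return anchor_prob + slope * (frac - anchor_pos)


def exogenous_input(silhouette, contour, rng, **lesion):
    """Sample one trial's exogenous input from a stimulus silhouette.

    silhouette and contour are same-sized grids of 0/1 flags (assumed
    encoding): covered cells contribute 0.10, contour cells 0.20.
    """
    rows, cols = len(silhouette), len(silhouette[0])
    exo = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if not silhouette[y][x]:
                continue
            if rng.random() >= transmission_prob(x, cols, **lesion):
                continue  # feature failed to reach the AM on this trial
            value = 0.20 if contour[y][x] else 0.10
            exo[y][x] += value
            # Slight blur: 2% of the value spreads to adjacent cells
            # (eight-neighbor spread is an assumption).
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < rows and 0 <= nx < cols:
                        exo[ny][nx] += 0.02 * value
    return exo
```

Passing `anchor_prob=0.9, sat_prob=0.9` gives the flat curve described for the unlesioned model. For dynamic displays, the function would be called again at every time step, in line with the recomputation described above.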
Simulations
Behrmann and Tipper
When an experimental stimulus is presented upright and centered on the fixation point, viewer-centered and object-centered reference frames are confounded. To dissociate the two frames, Behrmann and Tipper (1994) rotated a display containing a barbell - two disks, one colored red and the other blue, connected by a solid bar. The barbell first appeared with, say, the red disk on the left and the blue disk on the right. It remained stationary for one second, allowing subjects to establish an object-based frame of reference. In the moving condition, the barbell then rotated 180° (Fig. 8a) such that the blue disk ended up on the left and the red disk on the right - the two disks had exchanged places (Fig. 8b). Following the rotation, the red disk appears on the left with respect to the object-based frame, but on the right with respect to the viewer-based frame. The subjects’ task was to detect a target appearing on either the red or the blue disk. A static condition, in which the barbell did not rotate, was used as a baseline. Left-neglect subjects showed facilitation for targets appearing on the red disk in the moving condition relative to the static condition, and showed inhibition for targets appearing on the blue disk. Basically, the laterality of neglect reversed with reversal of the
Fig. 8. Barbell stimulus used in the Behrmann and Tipper experiment. The disk labeled ‘R’ is colored red, the disk labeled ‘B’ blue. In the moving condition, the initial display (panel a) was rotated 180°, resulting in the left and right disks exchanging places (panel b). In the static condition, no rotation occurred (panel b).
barbell. Results were therefore consistent with object-based, not viewer-based, neglect. Tipper and Behrmann (1996) ruled out an explanation for this phenomenon in terms of overt tracking by eye movements. They also showed that the phenomenon appeared to depend on the disks being encoded as one object: in contrast to the condition depicted in Fig. 8 in which the two disks are connected, when the bar between the disks is removed - the disconnected condition - the reversal of neglect no longer occurred when the disks rotated. This finding is what one would expect if neglect occurred in an object-based frame, because rotation of the display no longer corresponds to rotation of an object-based frame. To simulate the moving condition in the AM, a horizontal barbell was presented for 50 iterations, and then rotated 180° over the next 400 iterations. To simulate the static condition, the horizontal barbell was presented for 200 iterations. Encoding the rotating stimulus in a discrete array of cells is complicated due to quantization effects. Rather than attempting to hand-design exogenous input patterns for the barbell at every angle $\theta$, the exogenous input was automatically generated from the exogenous input for the horizontal barbell stimulus as follows. For each location (x, y), a new coordinate (x', y') was computed by a rotation of $\theta$ degrees. Because x' and y' are in general non-integer, the exogenous input at (x, y) could not be copied to (x', y') directly. Rather than rounding x' and y' to the nearest integers, the exogenous input at (x', y') was split up according to the distance of (x', y') to the four integer grid locations surrounding it. This procedure minimized quantization effects that arose from the coarse representation of the topographic map. As we explained earlier, we assume that the attentional activity in a region of space is related to the speed and accuracy of information processing in that region.
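The rotation procedure can be sketched in Python as below. The chapter says only that each value is split among the four surrounding grid locations according to distance; the bilinear weights used here are one common way to do that and are an assumption, as are the function name, the choice of the map center as the pivot, and the discarding of any share that falls outside the map.

```python
import math


def rotate_exo(exo, angle_deg):
    """Rotate an exogenous-input grid about its center by angle_deg.

    Each source value is split among the four integer grid cells that
    surround its rotated (generally non-integer) position, using bilinear
    weights (assumed form of the distance-based split).
    """
    rows, cols = len(exo), len(exo[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    cos_t = math.cos(math.radians(angle_deg))
    sin_t = math.sin(math.radians(angle_deg))
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            v = exo[y][x]
            if v == 0.0:
                continue
            dx, dy = x - cx, y - cy
            xr = cx + dx * cos_t - dy * sin_t
            yr = cy + dx * sin_t + dy * cos_t
            x0, y0 = math.floor(xr), math.floor(yr)
            fx, fy = xr - x0, yr - y0
            shares = ((x0, y0, (1 - fx) * (1 - fy)),
                      (x0 + 1, y0, fx * (1 - fy)),
                      (x0, y0 + 1, (1 - fx) * fy),
                      (x0 + 1, y0 + 1, fx * fy))
            for ix, iy, w in shares:
                # Shares that land outside the map are dropped.
                if w > 0.0 and 0 <= iy < rows and 0 <= ix < cols:
                    out[iy][ix] += v * w
    return out
```

With the timing quoted above (180° over 400 iterations), the pattern at iteration k of the rotation would be `rotate_exo(horizontal_barbell, 0.45 * k)`, always computed from the original horizontal pattern so that interpolation error does not accumulate.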
In the Behrmann and Tipper experiment, the critical regions are those of the two disks. Read-out from the model was performed by calculating the mean attentional activity directed toward each disk, averaged over all locations containing features of the disk, over the twenty iterations following the trial, and over 200 trials; this quantity will be referred to as the read-out activity.
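A minimal sketch of this read-out computation is given below. The data layout (each trial stored as a list of AM states, one grid per iteration) and the names `read_out`, `disk_cells`, and `onset` are assumptions made for the example.

```python
def read_out(trials, disk_cells, onset, window=20):
    """Mean AM activity over disk cells, a time window, and trials.

    trials: list of trials; each trial is a list of AM states (one grid
    per iteration). disk_cells: list of (row, col) cells holding features
    of the disk. onset: index of the first iteration in the window.
    """
    total, count = 0.0, 0
    for states in trials:
        for state in states[onset:onset + window]:
            for r, c in disk_cells:
                total += state[r][c]
                count += 1
    return total / count if count else 0.0
```

Under the read-out assumption stated earlier, a larger returned value corresponds to a shorter simulated response time.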
Greater read-out activity for a disk indicates a shorter response time to the target appearing in that disk. Figure 9 shows a trial of the unlesioned AM in the moving barbell condition. The unlesioned model has a uniform transmission probability of 0.9 across the field, producing occasional missing features in the exogenous input, as can be seen in the upper-left panel. As the figure shows, attention is rapidly deployed to the entire barbell, and remains with the barbell as it rotates. This result is a not altogether trivial feat for the model, as the model has never been tested on dynamic stimuli. The read-out activity was 0.99 for both the left and right disks. (With regard to the disks and targets, ‘left’ and ‘right’ will refer to viewer-centered locations.) The attentional state before rotation begins, at iteration 50, gives a good indication of the read-out activity in the static condition, which was also 0.99 for left and right disks. Thus, the unlesioned AM shows no difference in target detection time among conditions - moving versus static, left versus right target, and connected versus disconnected disks. The lesioned AM shows quite different behavior (Fig. 10). A relative degradation to the exogenous input on the left side of the barbell can be observed due to the transmission probability curve, causing the right half of the barbell to be selected initially. As the barbell begins to rotate, the focus of attention narrows further to just the disk. As rotation continues, attentional activity lags slightly behind the exogenous input, but catches up when the rotation is completed. Given the final distribution of attention in the moving conditions, the model will be faster to respond to a target on the left than on the right. This reversal does not occur in the static condition, as suggested by the AM state at iteration 50. The trial depicted in Fig. 
10 is representative; it is consistent with the more quantitative measure of read-out activity (Table 1, connected condition), which indicates greater activity for the left disk in the moving versus the static condition, and less activity for the right disk. When the disks are disconnected, attention jumps from the disk that started off on the left to the disk that ends up on the left (Fig. 11). After the disks cross the midline, the disk rotating into the
Fig. 9. One trial of the unlesioned model on the Behrmann and Tipper rotating-barbell stimulus. Attentional activation (white squares) follows the exogenous input (black squares) as the barbell rotates.
Fig. 10. One trial of the lesioned model on the Behrmann and Tipper rotating-barbell stimulus.
TABLE 1. Read-out activity from the lesioned AM for the left and right disks in the experimental conditions of Tipper and Behrmann (1996)
condition               left disk   right disk
connected, moving       0.22        0.04
connected, static       0.00        0.99
disconnected, moving    0.00        0.93
disconnected, static    0.00        0.99
right field begins to receive more support from the exogenous input than the disk rotating into the left field. Eventually this exogenous support is sufficient to activate the right disk, and competition kicks in to suppress the left disk. This pattern is observed reliably, as indicated by the measure of read-out activity (Table 1, disconnected condition). The read-out activity shows nearly full activity to the right disk and none to the left disk, and no difference between moving and static conditions.
To summarize, the AM simulation replicates the primary findings of Behrmann and Tipper (1994) and Tipper and Behrmann (1996): (1) For normals, no reliable differences are obtained across conditions. (2) For patients shown connected disks, left-sided facilitation and right-sided inhibition are obtained in the moving condition relative to the static. (3) For patients shown disconnected disks, left-sided facilitation and right-sided inhibition are not observed. (4) For patients, there is a main effect of target side: left is slower than right.
The model’s ability to replicate the pattern of data was not obvious without running a simulation, and in fact, its behavior for disconnected disks was unexpected. Nonetheless, the results emerged reliably from the simulation. In a situation such as this, our only recourse is to experiment with the model and determine what factors influence its behavior,
Fig. 11. One trial of the lesioned model on the Tipper and Behrmann rotating disconnected stimuli.
with the goal of eventually extracting an intuitive explanation for its success. Many factors did not affect the model’s qualitative performance, suggesting that the result is robust. The specific design of the stimuli was unimportant; qualitative performance was uninfluenced by altering the size of the disks or the bar, or the pattern of exogenous input corresponding to either of these components. Other factors also had little effect, including alternative parameters for the transmission probability curve, the rate of rotation of the stimulus, and the read-out formula. In fact, the reversal effect reported for rotating connected disks could be made even larger by increasing the size of the disks, reinforcing the exogenous input to the borders of the stimulus, increasing the rotation time, and/or reading out the asymptotic activity of the AM. To understand the simulation results, consider first the moving connected-disk trials. The model appears to track the right disk into the left field. Because attentional activity in the model corresponds to covert attention, this tracking is not necessarily overt and is therefore consistent with the finding of Tipper and Behrmann (1996) that eye movements are not critical to the phenomenon. Tracking occurs because the attentional state has hysteresis: the state at some iteration is a function of both the exogenous input and the state of the previous iteration. Attention would not ordinarily be drawn to a disk on the left given a competing disk on the right because the exogenous input to the left disk is weaker. Nonetheless, if attention is already focused on the disk on the left, even a weak exogenous input may be sufficient to maintain attention on the disk. Returning to the rules of activation of the model described earlier, the disk that has moved into the left field has support via the bias and cooperation rules, whereas the disk that has moved into the right field has support only via the bias rule.
However, the winner is not determined simply by the number of activation rules that support it. Key to the model’s behavior is the total quantitative support provided to each of the disks. If the total support is greater for the right disk, then attention will flip to the right. This flipping occurs on the disconnected-disk trials. Based on an exploration of
alternative stimuli, it appears that the flipping occurs for the disconnected but not connected trials due to the presence of the neck of the barbell on connected trials - the region where the disk makes contact with the bar. The neck provides a region of exogenous input adjacent to the disk, and by the cooperation rule, therefore provides an environment that supports attentional activity. Figure 10 clearly shows that activation is centered on the neck as the disk rotates into the left field. Without the neck to ‘hook’ activity in place, activity drops to the point that the left disk cannot fend off attack from the right disk. Although this account is not entirely satisfactory, in that we have not explained the phenomena in linguistically simple, qualitative terms, it is sometimes the best one can hope for in characterizing the behavior of a complex, dynamical system such as the AM. The Behrmann and Tipper data seem strongly consistent with the hypothesis that neglect operates in object-based coordinates. The AM, however, provides an alternative explanation, because it has no object-based frame of reference, yet it can account for the data. The AM’s account involves covert attentional tracking. Without simulations, the covert-tracking account is not compelling, because it would not appear to explain the lack of neglect reversal for disconnected displays. However, despite the absence of object-based representations, the AM does show a distinction between connected (single object) and disconnected (multiple object) displays, and hence increases the plausibility of the covert-tracking account. The AM also makes a variety of predictions, which we are currently testing in patients (McGoldrick et al., in preparation).
Arguin and Bub
Several studies have tried to disentangle the contributions of various frames of reference to neglect by manipulating the location of a target in one reference frame while keeping it fixed in another (e.g., Calvanio et al., 1987; Farah et al., 1990; Behrmann and Moscovitch, 1994). Arguin and Bub (1993) performed such a study in which the two frames were viewer based and object
based.4 Subjects were asked to name a target letter presented in a horizontal array containing four elements. The other three elements were filled circles. The target could appear in one of eight positions on the screen, called the viewer-relative position. The target could also appear in one of four positions relative to the circles, called the object-relative position. Viewer-relative and object-relative positions were varied independently, producing 32 different display configurations (Fig. 12). Response time to name the target was
[Fig. 12 display labels: fixation; vary viewer-relative position; vary object-relative position]
Fig. 12. Each row, consisting of a letter and three filled circles, is a possible stimulus display in the Arguin and Bub (1993) study. The fixation point is indicated by the ‘+’. In the first three rows, the position of the letter is varied with respect to the viewer-based frame while the position with respect to the object-based frame is fixed. In the last three rows, the position of the letter is varied with respect to the object-based frame, while the position with respect to the viewer-based frame is fixed.
Footnote 4: Arguin and Bub distinguish object-based frames from stimulus-based frames. An object-based frame ‘depicts the spatial relations between the parts of a single object’, whereas a stimulus-based frame ‘represents the relative locations of spatially distinct stimuli’ (Arguin and Bub, 1993, p. 350; italics in original). We see no clear-cut distinction between these two perspectives. Many objects can be drawn in a way that their parts are not physically connected, e.g. the word DOG. If one accepts a hierarchical organization of objects and their parts, there is no in-principle distinction between ‘objects’ and ‘stimuli’ because a ‘stimulus’ at one level of the hierarchy (made up of multiple objects) is an object at the next level up the hierarchy. Until some compelling evidence is presented for a dissociation of object-based and stimulus-based frames of reference, we will use the two terms as equivalent. Even if the two frames are dissociated, the simulation to be reported is still of value: Arguin and Bub argue for the psychological reality of stimulus-based frames based on their data, but the present simulation replicates the pattern of data without either a stimulus-based or object-based level of representation.
measured. Response time was presumed to be a ‘direct measure of allocation of attention across space’: the more attention allocated to a position, the faster the response. This paradigm allows for the comparison of performance across object-relative positions when the viewer-relative position is held constant. Whereas normal subjects showed no effect of object-relative position, neglect patient B.A. showed increasing response time with leftward target displacement in the array. Because an effect of object-relative position was obtained when unconfounded with viewer-relative position, the data were interpreted as supporting the hypothesis that neglect can occur with respect to an object-based frame of reference. In our simulation of this experiment, each display element was mapped to a 4 x 3 pattern of exogenous input to the AM, with a one-column gap between display elements. Because the target could appear in eight different viewer-centered locations, and it was necessary to allow for three additional display elements to the left and right of the target, the topographic map was designed to accommodate fourteen distinct locations. Figure 13 shows an example of the lesioned model’s performance. As the figure illustrates, the exogenous input tends to be weaker for the elements further to the left. However, because this display is on the left side of the viewer-centered frame, the exogenous input is degraded even for the rightmost element. Although all four elements capture attention on this trial, attention builds most rapidly for the rightmost elements, suggesting that the read-out activity (the mean attentional activity over the twenty iterations following stimulus onset) should be larger for the rightmost elements, and hence response time should be faster. This observation is confirmed by running fourteen presentations of the complete experimental design (32 trial types). The simulation data are summarized in Fig. 14.
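The layout arithmetic above (eight viewer-relative positions, four object-relative positions, fourteen map locations) can be checked with a minimal sketch. The slot indexing and the names `display_slots`, `N_VIEWER`, `N_OBJECT` and `N_SLOTS` are assumptions made for illustration; they are not taken from the AM implementation, and the 4 x 3 input pattern of each element is not modeled here.

```python
# Hypothetical indexing of the fourteen-location map; not the AM's own code.
N_VIEWER = 8   # viewer-relative target positions
N_OBJECT = 4   # object-relative target positions (array of four elements)
N_SLOTS = N_VIEWER + 2 * (N_OBJECT - 1)   # three spare locations on each side


def display_slots(viewer_pos, object_pos):
    """Return (slots occupied by the four elements, slot of the target).

    viewer_pos: 0..7, counted from the left of the screen.
    object_pos: 0..3, counted from the left of the array.
    """
    target_slot = viewer_pos + (N_OBJECT - 1)   # leave room for three elements to the left
    start = target_slot - object_pos            # leftmost element of the array
    return list(range(start, start + N_OBJECT)), target_slot


# All display configurations: every viewer position crossed with every object position.
configs = [(v, o) for v in range(N_VIEWER) for o in range(N_OBJECT)]

if __name__ == "__main__":
    print(N_SLOTS, len(configs))
    print(display_slots(0, 3))   # target at far left of the screen, rightmost in the array
    print(display_slots(7, 0))   # target at far right of the screen, leftmost in the array
```

Under this indexing every one of the 32 configurations fits inside the fourteen locations, and the two extreme configurations touch the leftmost and rightmost location respectively.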
For the normal model, no effect is found for either object- or viewer-relative position (object: F(3,416) < 1; viewer: F(7,416) = 1.1, p > .3). For the lesioned model, however, main effects are obtained for both object- and viewer-relative position (object: F(3,416) = 80.8, p < .001; viewer: F(7,416) = 109.9, p < .001), and there is no interaction (F(21,416) = 1.14, p > .3). These results replicate the main findings of Arguin and Bub. The only significant discrepancy between the human and simulation data is that Arguin and Bub observed effects of retinal eccentricity that influenced performance as a function of viewer-relative position. The model obviously does not address retinal acuity effects, because its visual field is homogeneous. However, such effects could readily be incorporated by, for example, assuming read-out time that increases with distance from fixation.
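The degrees of freedom in the F statistics follow from the design described above, as the short check below shows. It assumes a fully crossed two-way design with one simulated observation per cell per presentation; the function name `anova_dfs` is introduced here for illustration only.

```python
def anova_dfs(n_object, n_viewer, n_presentations):
    """Degrees of freedom for a fully crossed two-way design (assumed layout)."""
    n_cells = n_object * n_viewer                 # trial types
    n_observations = n_cells * n_presentations    # simulated trials in total
    df_object = n_object - 1
    df_viewer = n_viewer - 1
    df_interaction = df_object * df_viewer
    df_error = n_observations - n_cells
    return df_object, df_viewer, df_interaction, df_error


if __name__ == "__main__":
    # 4 object-relative positions, 8 viewer-relative positions, 14 presentations
    print(anova_dfs(4, 8, 14))
```

With 4 x 8 = 32 trial types and fourteen presentations there are 448 observations, giving 3, 7 and 21 degrees of freedom for the two main effects and the interaction, and 448 - 32 = 416 for the error term, in agreement with the reported statistics.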
Fig. 13. Performance of the lesioned model when presented with a sample display from the Arguin and Bub (1993) study.
[Fig. 14 plot: two panels, one plotted against Object-Relative Position and one against Viewer-Relative Position, each with curves for the normal and the lesioned model]
Fig. 14. Simulation performance of the normal and lesioned AM on the Arguin and Bub (1993) task.
How can it be that the model has only a viewer-based representation yet performance is affected by the object-relative position of a target? The intuitive answer is that the attentional gradient causes information on the relative left of a stimulus to be weaker than information on the relative right, regardless of the absolute position of the stimulus. The rate of growth of attentional activation depends on the strength of the exogenous input. Hence, attentional activation will rise more slowly for display elements further to the left. Also factoring into the explanation is the competition rule in the AM activation function: if activity of the rightmost elements rises faster than activity of the leftmost elements, the rightmost elements will tend to suppress the leftmost, further enhancing the effect of the attentional gradient. This suppression was evident in the Behrmann and Tipper simulation (Fig. 10), where only the right half of the barbell was attended, even prior to the onset of rotation. The suppression does not occur in the normal model because the exogenous input to the left and right elements is balanced; consequently, their activity rises at roughly the same rate, and competition does not kick in. Similar object-relative effects in the lesioned AM were observed by Mozer and Behrmann (1992) in a reading task. Surprisingly, viewer-relative effects are somewhat dependent on the task and the specific read-out assumptions. For example, when Mozer et al. (1997) simulated line bisection using the lesioned AM, they found very minor viewer-relative effects when response formulation was assumed to depend on the asymptotic activity of the AM.
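The qualitative argument can be illustrated with a deliberately simplified toy, which is not the AM: the update rule, the linear gradient and every parameter value below are assumptions chosen for illustration. Each display element accumulates activity at a rate set by its exogenous input, is suppressed by more active elements, and is read out as its mean activity over twenty iterations.

```python
N_SLOTS = 14  # map locations, as in the simulation described in the text


def exogenous_input(slots, lesioned):
    """Input strength per element (hypothetical linear gradient for the lesioned case)."""
    if not lesioned:
        return [1.0 for _ in slots]
    # weakest at the far left of the viewer-based map, full strength at the far right
    return [0.2 + 0.8 * s / (N_SLOTS - 1) for s in slots]


def read_out(slots, lesioned, n_iter=20, rate=0.05, comp=0.02):
    """Mean activity of each element over n_iter iterations of the toy dynamics."""
    inp = exogenous_input(slots, lesioned)
    act = [0.0] * len(slots)
    total = [0.0] * len(slots)
    for _ in range(n_iter):
        new_act = []
        for i, a in enumerate(act):
            # competition: only elements that are currently more active suppress this one
            suppression = sum(max(0.0, other - a) for other in act)
            value = a + rate * inp[i] - comp * suppression
            new_act.append(min(1.0, max(0.0, value)))
        act = new_act
        total = [t + a for t, a in zip(total, act)]
    return [t / n_iter for t in total]


if __name__ == "__main__":
    print("lesioned, array at left :", read_out([0, 1, 2, 3], lesioned=True))
    print("lesioned, array at right:", read_out([10, 11, 12, 13], lesioned=True))
    print("normal                  :", read_out([0, 1, 2, 3], lesioned=False))
```

In this toy the read-out rises from the leftmost to the rightmost element of the array wherever the array sits on the map, although nothing in the code represents the array as an object; with uniform input the four elements are read out identically. The sketch only shows that a viewer-based gradient plus competition is sufficient for an object-relative ordering, not that the AM computes it this way.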
Discussion
The neuropsychological studies we addressed are concerned with issues such as: what internal representations are constructed in the ordinary course of visual information processing? And, can attention be directed in coordinates defined by the object itself? In one study, Behrmann and Tipper (1994; Tipper and Behrmann, 1996) observed that neglect remained with the left side of an object as the object rotated 180°. In the second study, Arguin and Bub (1993) found that neglect was
greater for a stimulus located on the left side of an array than on the right side, even when retinal position of the stimulus was controlled for. These results were interpreted by the authors of the studies to support the psychological reality of a frame of reference other than the viewer-based frame:
“The findings suggest that attention operates on object-centered as well as location-based representations, and thus accesses multiple reference frames.” (Tipper and Behrmann, 1996)
“. . . A stimulus-centered spatial reference frame . . . may be affected in the visual hemineglect syndrome. . . . Thus, we suggest that the concept of stimulus-centered reference frame corresponds to a level of spatial representation that is generally used in human vision . . .” (Arguin and Bub, 1993)
Although the authors are careful to state their conclusions tentatively, the accumulation of such studies has caused many researchers to believe in the existence of object-based reference frames and object-based representations in the brain. Simulations of the AM reported here have struck a blow against this interpretation, as the AM can explain the data yet operates only in a viewer-based frame. In addition, the AM has the potential of explaining a variety of other data from neglect patients that have been used in support of, or that have presupposed the existence of, object-based frames, including those of Driver and Halligan (1991), Grabowecky et al. (1993), Driver et al. (1994), Pavlovskaya et al. (1997), and Behrmann and Tipper (1998). Simulation studies are currently underway to model key results from these studies. If the AM can also explain these data, it will severely weaken the case for object-based frames in visual perception. The existence of viewer-based frames is indisputable: early visual information is encoded with respect to retinal location and gaze direction. However, object-based frames are not required for visual perception.
Although one class of models of visual perception does utilize object-based frames (Fig. 2a), another class requires only viewer-based frames (Fig. 2b). In the absence of strong empirical support for object-based frames, the model class utilizing only viewer-based frames
would seem the more parsimonious, especially considering the computational obstacles - outlined earlier - to constructing object-based reference frames.
Evaluating the model
Computational modeling is a valuable exercise for many reasons. A computational model provides a concrete embodiment of a theory. It forces one to be explicit about one’s claims. It allows one to examine interactions among assumptions. However, a computational model makes its greatest contribution when it offers a new or radically different conceptualization of data. The AM appears to have succeeded in this regard. The neglect data seemed to demand an explanation involving object-based frames. No qualitative model could convincingly argue otherwise; only a simulation model could resurrect a class of explanations that might otherwise be ruled out. Even if the internal dynamics of the AM were incomprehensible to human observers, it would still be valuable as an existence proof - a detailed model having only viewer-based frames that can nonetheless explain the data. The inner workings of the AM are indeed difficult to comprehend. We are sometimes successful in explaining its behavior in qualitative language that can be communicated to others, but not always. The model’s behavior is an emergent property of the interaction of cooperative and competitive forces. One should not expect that such complex dynamics can be reduced to a simple explanation that sidesteps the dynamics. To the degree we have succeeded in characterizing the model’s performance, the characterizations come via post hoc analysis of the simulation results. Indeed, the AM has sufficiently complex dynamics that its creator has difficulty in predicting the outcome of a simulation. Many results we have modeled using the AM were unexpected and surprising. In the simulations reported here, the disconnected disk condition of Tipper and Behrmann (1996) was a case in point.
The model is far more interesting and subtle than we first realized. However, its success in explaining a wide variety of data is undeniable. Each time that the model, with
only trivial extensions, can explain a corpus of data it was not designed around, one must increase one’s confidence in the model. A relatively simple model like the AM could not continue to provide accounts of data were it not in some fundamental sense correct.
The status of object-based frames of reference
Several researchers have urged caution in invoking object-based frames of reference to explain data and behavior. Buxbaum (1995) argues - consistent with the present work - that apparent object-based deficits in neglect might arise from attentional gradients in viewer-based space. Driver et al. (1994) acknowledge that the object-based effects of Driver and Halligan (1991) might be attributable to a relative deficit of attention in viewer-based space. Farah (1990; see also Vecera and Farah, 1994) argues for location-based encoding of object properties, and attributes object-based effects to the fact that parietal attentional processes are part of an interactive system that includes other parts of the brain that recognize objects. Some studies have found no, or only limited, support for object-based frames. Farah et al. (1990) failed to obtain neglect in an object-based frame. Tarr and Pinker (1990; see also McMullen and Farah, 1991) observed the use of object-based frames only in special cases of recognition, but suggest that the ordinary visual reference frame is tied to egocentric coordinates. Buxbaum et al. (1995) eliminate object-based neglect by manipulating task instructions, and suggest that mental rotation may underlie at least some cases of object-based neglect. Two neglect-related phenomena have been reported that are not trivially explained by the AM and its viewer-based representation of space. Caramazza and Hillis (1990b; Hillis and Caramazza, 1995) have studied patients who show neglect for the right side of a word, in both perception and production, across perceptual modalities, and - most pertinent to the issue of frames of reference in visual perception - irrespective of the topographic arrangement of letters (neglect was observed for words presented vertically and mirror reversed). Humphreys and Riddoch (1994, 1995) have observed patients who manifest
left neglect in single words but right neglect in multiple-stimulus displays. These two phenomena, though undoubtedly real, do not necessarily conflict with the perspective we have presented. One might argue that the phenomena arise from a complex interaction of perceptual and motor processing that is both strategic and task dependent, in contrast to the seemingly more pure-perception tasks simulated in this paper. (This sort of explanation was proposed by Buxbaum et al., 1995.) For example, reading a mirror-reversed word may involve piecing together the letters one at a time in a verbal or visual short-term store; and right-sided motor neglect could explain difficulty in processing multi-term displays. One might also accommodate these phenomena by positing specialized visual representations for words versus other visual stimuli. The conjecture supported by our computational model is that object-based frames of reference play little or no role in the course of ordinary perception. Consequently, a two-stage model (Fig. 1a) that constructs an object-based representation is a less likely candidate for describing how visual perception is implemented in the brain than a one-stage model (Fig. 1b). One should not, however, interpret our line of reasoning as supporting the stronger conjecture that object-based frames are nonexistent in the brain. Surely if demanded by the task, people can mentally construct visual object-based representations. However, the AM suggests that such a complex cognitive ability is built on top of a more basic perceptual apparatus that operates using viewer-based frames of reference.
Acknowledgements
The author thanks James Reggia and Eytan Ruppin for organizing a stimulating meeting from which this work arose. This work greatly benefited from the comments and insights of Paul Smolensky and Brenda Rapp, and from James Reggia’s astute critique of the manuscript. This research was supported by grant 97-18 from the McDonnell-Pew Programme in Cognitive Neuroscience, and by NSF award IBN-9873492.
References
Arguin, M. and Bub, D.N. (1993). Evidence for an independent stimulus-centered spatial reference frame from a case of visual hemineglect. Cortex, 29, 349-357.
Bar, M. (1998). Characteristics and cortical localization of subliminal visual priming. Unpublished doctoral dissertation. Los Angeles, CA: Department of Psychology, University of Southern California.
Behrmann, M. and Moscovitch, M. (1994). Object-centered neglect in patients with unilateral neglect: Effects of left-right coordinates of objects. J. Cogn. Neurosci., 6, 1-16.
Behrmann, M. and Tipper, S.P. (1994). Object-based attentional mechanisms: Evidence from patients with unilateral neglect. In: C. Umilta and M. Moscovitch (Eds.), Attention and Performance XV: Conscious and Nonconscious Processing and Cognitive Functioning. Cambridge, MA: MIT Press, pp. 351-375.
Buxbaum, L.J. (1995). Visual attention to objects and space: Object-based neglect or location focus? Paper presented at the International Neuropsychological Society Conference, Seattle, WA, February 1995.
Buxbaum, L.J., Coslett, H.B., Montgomery, M.W. and Farah, M.J. (1996). Mental rotation may underlie apparent object-based neglect. Neuropsychologia, 34, 113-126.
Calvanio, R., Petrone, P.N. and Levine, D.N. (1987). Left visual spatial neglect is both environment-centered and body-centered. Neurology, 37, 1179-1183.
Caramazza, A. and Hillis, A.E. (1990a). Levels of representation, coordinate frames, and unilateral neglect. Cogn. Neuropsychol., 7, 319-445.
Caramazza, A. and Hillis, A.E. (1990b). Spatial representation of words in the brain implied by studies of a unilateral neglect patient. Nature, 346, 267-269.
Driver, J., Baylis, G.C., Goodrich, S.J. and Rafal, R.D. (1994). Axis-based neglect of visual shapes. Neuropsychologia, 32, 1353-1365.
Driver, J. and Halligan, P.W. (1991). Can visual neglect operate in object-centered coordinates? An affirmative single-case study. Cogn. Neuropsychol., 8, 475-496.
Farah, M.J. (1990). Visual agnosia. Cambridge, MA: MIT Press/Bradford Books.
Farah, M.J., Brunn, J.L., Wong, A.B., Wallace, M.A. and Carpenter, P.A. (1990). Frames of reference for allocating attention to space: Evidence from the neglect syndrome. Neuropsychologia, 28, 335-347.
Fukushima, K. and Miyake, S. (1982). Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Patt. Recog., 15, 455-469.
Grabowecky, M., Robertson, L.C. and Treisman, A. (1993). Preattentive processes guide visual search: Evidence from patients with unilateral visual neglect. J. Cogn. Neurosci., 5, 288-302.
Halligan, P.W. and Marshall, J.C. (1993). The history and clinical presentation of visual neglect. In: I.H. Robertson and J.C. Marshall (Eds.), Unilateral Neglect: Clinical and Experimental Studies. London: Erlbaum Associates.
Heilman, K.M., Watson, R.T. and Valenstein, E. (1993). Neglect and related disorders. Clinical Neuropsychology (2nd edition). New York: Oxford University Press, pp. 279-336.
Hillis, A.E. and Caramazza, A. (1995). Spatially specific deficits in processing graphemic representations in reading and writing. Brain Lang., 48, 263-308.
Hinton, G.E. (1981). A parallel computation that assigns canonical object-based frames of reference. Proceedings of the Seventh International Joint Conference on Artificial Intelligence. Los Altos, CA: Morgan Kaufmann, pp. 683-685.
Hinton, G.E. and Lang, K. (1985). Shape recognition and illusory conjunctions. Proceedings of the Ninth International Joint Conference on Artificial Intelligence. Los Altos, CA: Morgan Kaufmann, pp. 252-259.
Hubel, D.H. and Wiesel, T.N. (1979). Brain mechanisms of vision. Scientific American, 241, 150-162.
Humphreys, G.W. and Riddoch, M.J. (1994). Attention to within-object and between-object spatial representations: Multiple sites for visual selection. Cogn. Neuropsychol., 11, 207-241.
Humphreys, G.W. and Riddoch, M.J. (1995). Separate coding of space within and between perceptual objects: Evidence from unilateral visual neglect. Cogn. Neuropsychol., 12, 283-311.
Kinsbourne, M. (1987). Mechanisms of unilateral neglect. In: M. Jeannerod, Neurophysiological and neuropsychological aspects of spatial neglect. Amsterdam: North Holland, pp. 69-86.
Kinsbourne, M. (1993). Orientational bias model of unilateral neglect: Evidence from attentional gradients within hemispace. In: I.H. Robertson and J.C. Marshall (Eds.), Unilateral Neglect: Clinical and Experimental Studies. London: Lawrence Erlbaum.
Le Cun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W. and Jackel, L.D. (1989). Backpropagation applied to handwritten zip code recognition. Neur. Comput., 1, 541-551.
Marr, D. (1982). Vision. San Francisco: Freeman.
McGoldrick, J., Mozer, M.C., Munakata, Y. and Reed, C. (In preparation). Object-based representations and neglect: Tests of a computational model.
McMullen, P.A. and Farah, M.J. (1991). Viewer-centered and object-centered representations in the recognition of naturalistic line drawings. Psychol. Sci., 2, 275-277.
Mozer, M.C. (1991). The perception of multiple objects: A connectionist approach. Cambridge, MA: MIT Press/Bradford Books.
Mozer, M.C. and Behrmann, M. (1992). Reading with attentional impairments: A brain-damaged model of neglect and attentional dyslexias. In: R.G. Reilly and N.E. Sharkey (Eds.), Connectionist Approaches to Natural Language Processing. Hillsdale, NJ: Erlbaum Associates, pp. 409-460.
Mozer, M.C., Halligan, P.W. and Marshall, J.C. (1997). The end of the line for a brain-damaged model of unilateral neglect. J. Cogn. Neurosci., 9, 171-190.
Mozer, M.C. and Sitton, M. (1998). Computational modeling of spatial attention. In: H. Pashler (Ed.), Attention. London: UCL Press, pp. 341-393.
Olson, C.R. and Gettner, S.N. (1996). Brain representations of object-centered space. Curr. Opin. Neurobiol., 6, 165-170.
O’Reilly, R.C., Munakata, Y. and McClelland, J.L. Explorations in computational cognitive neuroscience: Understanding the Mind by Simulating the Brain. Cambridge, MA: MIT Press (in press).
Pavlovskaya, M., Glass, I., Soroker, N., Blum, B. and Groswasser, Z. (1997). Coordinate frame for pattern recognition in unilateral neglect. J. Cogn. Neurosci., 9, 824-834.
Pinker, S. (1984). Visual cognition: An introduction. Cognition, 18, 1-63.
Tanaka, K. (1993). Neuronal mechanisms of object recognition. Science, 262, 658-688.
Tarr, M.J. and Pinker, S. (1990). When does human object recognition use a viewer-centered reference frame? Psychol. Sci., 1, 253-256.
Tarr, M.J. (1995). Rotating objects to recognize them: A case study on the role of viewpoint dependency in the recognition of three-dimensional objects. Psychonomic Bull. Rev., 2, 55-82.
Tipper, S.P. and Behrmann, M. (1996). Object-centered not scene-based visual neglect. J. Exp. Psychol.: Hum. Percept. Perform., 22, 1261-1278.
Vecera, S. and Farah, M.J. (1994). Does visual attention select objects or locations? J. Exp. Psychol.: General, 123, 146-160.
Vetter, T., Hurlbert, A. and Poggio, T. (1995). View-based models of 3D object recognition: Invariance to imaging transformations. Cereb. Cortex, 5, 261-269.
Zemel, R.S., Mozer, M.C. and Hinton, G.E. (1988). TRAFFIC: Object recognition using hierarchical reference frame transformations. In: D. Touretzky (Ed.), Advances in Neural Information Processing Systems II. San Mateo, CA: Morgan Kaufmann, pp. 266-273.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 8
Inter-hemispheric competition of sub-cortical structures is a crucial mechanism in paradoxical lesion effects and spatial neglect
Claus-C. Hilgetag1,*, Rolf Kötter2 and Malcolm P. Young1
1 Neural Systems Group, University of Newcastle, Department of Psychology, Ridley Building, Newcastle upon Tyne NE1 7RU, UK
2 Center for Anatomy and Brain Research, Heinrich-Heine University Düsseldorf, Moorenstraße 5, 40225 Düsseldorf, Germany
*Corresponding author. Fax: (+44-191) 222 5622; e-mail: [email protected]
Introduction
The ability to focus attention on any point in the visual field is normally taken for granted. The assignment of attention proceeds so inconspicuously and effortlessly that the existence of special attentional mechanisms often becomes apparent only when they fail. Some of the most instructive examples of impaired attention mechanisms arise from spatial neglect, extinction, and the paradoxical effects of certain combinations of multiple lesions. Human patients with unilateral lesions in regions of parietal and premotor cortex frequently show (hemi)neglect effects, that is, reduced spatial attention to sensory stimulation in the contralesional hemifield (e.g. Mesulam, 1981; Vallar, 1998). These effects may manifest themselves for unilateral stimulus presentation, although they are more likely to become apparent for bilateral stimulation (Driver and Mattingley, 1998), in which case they are more correctly denoted (hemi)extinction (Vallar, 1998). Neglect produces a wide and colorful spectrum of effects (Halligan and Marshall, 1992). One frequently observed phenomenon is the shift of spontaneous spatial attention away from the midline, where it is centered in normal subjects,
to the ipsilesional side (e.g. Karnath, 1997). This might lead to an avoidance of the contralesional side (Vallar, 1998), and to increased exploration of the ipsilesional space. It is noteworthy that attention is again reduced at the far periphery of the intact, ipsilesional hemifield (Karnath, 1997). Earlier studies suggested a co-operation of cortical and sub-cortical, in particular tectal, mechanisms in producing intact spatial attention and vision (Poppel and Richards, 1974; Singer et al., 1977). This view, however, appears to have lost ground in recent years. Since post-mortem studies and modern non-invasive brain imaging techniques have found damage of cortical tissue in most tested neglect patients, many researchers now assume that the mechanisms of spatial attention and neglect are cortical in origin (e.g. Mesulam, 1981; Nobre et al., 1997; Vallar, 1998). On the other hand, some singular cases of neglect have also been reported for human patients without cortical damage, but with impairment of, for instance, the thalamus (Rafal and Posner, 1987), or the striatum and the deep white matter (Healton et al., 1982). However, such cases have been interpreted in terms of changed processing in the cortical stations that are connected with the damaged sub-cortical ones (Vallar, 1998). Effects of damaged attentional systems similar to those in humans have been observed in other mammalian species, such as the cat. Cats with unilateral lesions or inactivation of the posterior middle suprasylvian cortex fail to orient to visual stimuli presented to them in the contralesional hemifield (Sprague, 1966; Payne et al., 1996a; Payne et al., 1996b; Rosenquist et al., 1996). The same failure arises following unilateral lesion or inactivation of the superior colliculus (Lomber and Payne, 1996; Rosenquist et al., 1996). Such results implicate both cortical and midbrain structures in mediating spatial attention in the cat. Thus, the human cases and the cat lesion model are similar in the attentional defects, although these deficits have been tested differently in each framework. Moreover, there are parallels in cats and humans in the neural systems presumably contributing to the dysfunction. There is, however, one set of puzzling effects that has been extensively studied in the cat, but for which hardly any parallels exist in human studies (Kapur, 1996). In a classical experiment, Sprague observed that the visual hemineglect induced in the cat by removing large portions of one occipital-temporal cortex can be paradoxically reversed, and orienting performance restored, by subsequently removing further brain tissue in addition to the primary lesion. Secondary damage that leads to this surprising recovery of spatial attention includes lesioning the superior colliculus [SC] on the contralesional side, or sectioning the commissure of the superior colliculus (Sprague, 1966). More recent studies confirmed and extended these results. Payne and Lomber employed reversible cooling to inactivate middle suprasylvian and collicular locations (e.g. Lomber and Payne, 1996; Payne et al., 1996a; Payne et al., 1996b). Unilateral cooling inactivation of either cortical or collicular sites produced contralateral visual hemineglect very similar to that created by ablating tissue (Payne et al., 1996b).
The experiments also produced paradoxical restorations of function after bilateral inactivation of the cortical sites alone and bilateral inactivation of the colliculi alone. These experiments demonstrated that performance can be restored by subsequent inactivation at the same level as the primary lesion (Lomber and Payne, 1996). Similar paradoxical restorations of function were, moreover, produced by secondary lesions to
the contralesional substantia nigra pars reticulata [SNr] (Wallace et al., 1990), or by destruction of non-tectal fibres in the commissure of the superior colliculus (Wallace et al., 1989). Despite an extensive body of experimental work, the precise mechanisms of spatial attention, its disruption by unilateral lesions, and its paradoxical restoration remain poorly understood (e.g. Lomber and Payne, 1996; Ciaramitaro et al., 1997). The experimental studies demonstrate a complex, perplexing and partly counter-intuitive set of effects, which are nonetheless robust. The assessment of tissue damage in human patients and the experiments in the cat suggest that intact, lesioned and paradoxically restored orienting behavior may arise from interactions between a potentially large set of brain structures. Some of the anatomical connectivity between the structures implicated in the results is known, at least in the cat, and is known to be complex (e.g. Edwards et al., 1979; Wallace et al., 1990; Harting et al., 1992). We were interested in whether the various and divergent experimental results could be comprehensively explained in the framework of a simple mathematical model based on aspects of established brain connectivity.
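The paradoxical restoration can be made concrete with a deliberately minimal toy, which is not the model developed in this chapter. It assumes two collicular units that inhibit each other, each driven by its ipsilateral cortex plus a weak direct retinal input; the function name `settle` and all weights and input strengths are hypothetical values chosen for illustration.

```python
def settle(cortex_left=1.0, cortex_right=1.0,
           sc_left_intact=True, sc_right_intact=True,
           retinal=0.3, w_inhib=0.5, n_iter=200):
    """Iterate two mutually inhibiting SC units to a steady state (toy dynamics).

    cortex_*  : cortical drive onto the ipsilateral SC (0.0 models a cortical lesion)
    retinal   : weak direct retinal drive onto each SC (hypothetical value)
    w_inhib   : strength of the inhibitory coupling between the two SC units
    """
    sc_left = sc_right = 0.0
    for _ in range(n_iter):
        new_left = (max(0.0, cortex_left + retinal - w_inhib * sc_right)
                    if sc_left_intact else 0.0)
        new_right = (max(0.0, cortex_right + retinal - w_inhib * sc_left)
                     if sc_right_intact else 0.0)
        sc_left, sc_right = new_left, new_right
    return sc_left, sc_right


if __name__ == "__main__":
    print("intact                      :", settle())
    print("left cortex lesioned        :", settle(cortex_left=0.0))
    print("... plus right SC lesioned  :", settle(cortex_left=0.0, sc_right_intact=False))
```

In this toy, removing the left cortical drive lets the right unit silence the left one, so the hemifield served by the left unit would be neglected; additionally removing the right unit releases the left unit from inhibition and its activity returns, at a reduced level. This mirrors only the qualitative logic of the lesion results summarized above.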
Model description and results
General approach
Little is known about the anatomical connectivity of the human brain (Crick and Jones, 1993); consequently the structure of our model relies on experimentally established connectivity of the cat brain. Such a ‘comparative approach’ demands caution because of the divergent designs and functional specializations of the nervous systems of different mammalian species. We did, however, feel the approach justified in this instance, for a number of reasons: (i) the neglect effects in cat and humans appear similar and some of the neural stations implicated are homologous; (ii) our model considers only gross anatomical pathways that might exist in many mammalian species; (iii) the model mainly describes the behavior of sub-cortical structures, whose design and connectivity may not have changed as dramatically during evolution as that of cortical structures.
Experimental findings suggest that several cortical and midbrain stations are involved in orienting behavior. The midbrain stations have been presumed to include tectal or non-tectal structures and basal ganglia, such as the SNr (Sprague, 1966; Wallace et al., 1989; Wallace et al., 1990; Lomber and Payne, 1996). The main cortical structure of the cat suggested to be involved in spatial attention is the posterior middle suprasylvian cortex [pMS] (e.g. Lomber and Payne, 1996). Figure 1 gives an impression of some of the anatomical pathways known to connect these stations. As Fig. 1 demonstrates, visual information can reach the SC via several different pathways. Although there are direct retinotectal connections
Fig. 1. Some of the anatomical pathways thought to be involved in spatial orienting behavior. The diagram represents the two retinae, the optic chiasm, and cortical stations, C_L and C_R, at the top. In the cat, the most important of these stations for spatial attention are the middle suprasylvian cortices (Lomber and Payne, 1996). This representation also includes cell populations in the substantia nigra pars reticulata, SNr_L and SNr_R, as well as the two superior colliculi, SC_L and SC_R. Plus and minus signs denote excitatory and inhibitory pathways. The pathways shown here are described in detail in the main text.
(Harting and Guillery, 1976), the strongest visual input into the SC presumably arrives through indirect pathways that represent the contralateral visual field. Such paths travel mainly through the ipsilateral visual cortex (Harting et al., 1992) where they combine input from the ipsilateral temporal hemiretina and crossed input from the contralateral nasal hemiretina. The SC moreover possesses a representation of the ipsilateral visual field. This is in part mediated by retinotectal fibers coming from the contralateral temporal hemiretinae and crossing over at the optic chiasm (Harting and Guillery, 1976). Since there is still a response to ipsilateral field stimuli in animals with a split optic chiasm, it can be inferred that other crossed pathways also contribute to this representation (Antonini et al., 1978). Such routes have been found to come from the contralateral visual, in particular suprasylvian, cortex (Berman and Payne, 1982; Baleydier et al., 1983; Harting et al., 1992). The SNr in the basal ganglia also receives input about visual stimuli through the cortex, either directly or via other basal ganglia structures (Joseph and Boussaoud, 1985). While a representation of the contralateral visual field might be provided by direct ipsilateral corticonigral connections (Nieoullon et al., 1978), an ipsilateral field representation also existing (Joseph and Boussaoud, 1985), could be mediated by an indirect connection from the contralateral visual cortex through the caudate nucleus (Nieoullon et al., 1978). The existence of an ipsilateral nigrotectal connection is well established (Harting et al., 1988). Moreover, it has recently been found, that distinct cell populations in the SNr contribute selectively to the inhibitory ipsi- and contralateral nigrotectal projections (Jiang et al., 1997). 
Those innervated by excitatory corticonigral connections project to the contralateral; those innervated by inhibitory connections (via indirect pathways from the visual cortex) project to the ipsilateral SNr. While excitatory callosal fibres link areas in the visual cortex (Buhl and Singer, 1989; Payne, 1994), the left and right SC are linked by inhibitory commissural fibres (Appell and Behan, 1990). Although the pathways which have been outlined here, and which are represented schematically in Fig. 1, already present a complex picture, it can be expected that even further structures contribute to orienting behavior. Analysis of human lesions leading to neglect, for instance, also implicates stations of the medial frontal or premotor cortex (Vallar, 1998). However, rather than attempting to capture all of the structures potentially involved, with their connectivity and physiology, we concentrated on modeling the structures involved in the most robust effects in the cat, namely the suprasylvian cortex and the colliculi. In this respect, the present model is only the first step towards a more extensive, formalized description of the neural basis of orienting behavior. Our model demonstrates one-stage bilateral competition at the level of the midbrain. Behavioral effects reported after inactivation of basal ganglia (e.g. Boussaoud and Joseph, 1985; Wallace et al., 1989; Wallace et al., 1990) suggest that there is at least one other stage of inter-hemispheric interaction involved (see Fig. 1), which may substantially overlap motor systems. The process by which we constructed our model was to begin with the simplest set of structures and connections that were consistent with experimental results from the behavioral effects of lesions and from neuroanatomy, and to add further structures and neuroanatomical detail only as required to account for the lesion effects. Moreover, the model describes neural structure at the coarse resolution of large-scale neural systems and their general characteristics (Douglas and Martin, 1991; Payne, 1993). Consequently, our model is an abstract, rather than detailed, representation of biological reality.
A basic model for spatial attention
Our model represents the gross neuronal activity in two competing ‘midbrain structures’, ML and MR, one in either half of the brain. The description of these structures is essentially based on anatomical data for the SC; however, some of the mechanisms outlined here might also apply to other midbrain structures. ML and MR each receive anatomically strong ipsilateral and weak contralateral input, and mutually inhibit one another via reciprocal commissural connections. Such long-range inhibitory connections are absent between regions of the mammalian cortex (Crick and Asanuma, 1986), yet occur frequently between sub-cortical structures (e.g. Alexander, 1995). Long-range inhibitory connections have been assumed (Sprague, 1966) and demonstrated (Mascetti and Arriagada, 1981; Appell and Behan, 1990) to link the two superior colliculi. The structure of the basic model is shown in Fig. 2. Because of the partial decussation in the optic chiasm, neural activity in either midbrain structure mainly represents stimuli in the contralateral hemifield. Each structure, however, also receives weak inputs from the ipsilateral hemifield (see Figs. 1, 2 and below). The neural activity in ML and MR can be interpreted as an output signal to the subsequent motor systems (Kotter and Wickens, 1998), initiating an orienting response towards the contralateral side. To describe the computation of neural activity in ML and MR in our model, we used a mathematical formalism similar to the one frequently employed in describing reaction rates in metabolic networks (e.g. Schuster and Hilgetag, 1994).

Fig. 2. A minimal model for effects of intact, lesioned and paradoxically restored spatial attention. This model contains only the midbrain activities ML and MR as dynamical variables. Inputs related to stimuli in the left and right visual hemifields are relayed into the right and left midbrain structures, MR and ML, respectively, via suprasylvian cortex and other pathways, represented by IL and IR (OC = optic chiasm). A weaker, crossed connection to the contralateral side accounts for the representation of ipsilateral stimuli in the midbrain structures. The term A represents the average neural activity in the system. Plus and minus signs indicate excitatory and inhibitory interactions. See main text for default values of the coefficients and for their settings during the simulations.

The behavior of the midbrain activities is given by the system, S, of linear ordinary differential equations:

\[ \frac{dM_L}{dt} = k_{IL}(I_L - M_L) + k_{IRL}(I_R - M_L) + k_{c} M_R + k_{A}(A - M_L), \tag{1a} \]

\[ \frac{dM_R}{dt} = k_{IR}(I_R - M_R) + k_{ILR}(I_L - M_R) + k_{-c} M_L + k_{A}(A - M_R), \tag{1b} \]

with kIL, kIR, kILR, kIRL, kA ≥ 0; kc, k-c ≤ 0. The dynamical variables ML and MR represent the gross neural activities in the left and right midbrain structures, respectively. IL and IR stand for the inputs into these structures; these input terms represent both stimulus-related neural activity in the cortical stations, which is relayed to the midbrain, as well as input relayed to ML and MR directly from the retina. Additionally, moderately strong input, underlying the ipsilateral visual field representation, comes from the contralateral hemisphere (see Figs. 1 and 2). Strictly speaking, the contralateral input is not essential for many aspects of the basic model’s functional behavior, which will be described in detail in the following sections. The fact that midbrain structures possess a neural representation of the ipsilateral visual field (Harting and Guillery, 1976; Joseph and Boussaoud, 1985), however, is intriguing and important for detailed studies of spatial orienting. We therefore included contralaterally projecting connections in the basic version of our model as well as in the more detailed version outlined in later sections. The model also contains excitatory input relayed from the rest of the system. This neural ‘background’ activity was simulated through the term A, which is defined as:

\[ A = \frac{I_L + I_R + M_L + M_R}{4}. \tag{2} \]

The coefficients kIL and kIR describe the functional strengths of the ipsilateral input pathways; kILR and kIRL describe the strengths of the contralateral input pathways; kc and k-c the strengths of the mutually inhibitory commissural pathways; and kA the background pathways’ strength. Coefficients kc and k-c were always non-positive, the other coefficients non-negative. Defaults for the coefficients and constants, assuming an intact symmetrical system at rest, were kIL = kIR = 5, kILR = kIRL = 1, kA = 1, kc = k-c = -1.25, IL = IR = 0.4. The pathway coefficients reflect our assumption that ipsilateral input through the cortex would be dominant. Visual inputs were simulated by setting IL = 1 (for stimuli in the right visual hemifield) or IR = 1 (for stimuli in the left visual hemifield).

We specified two different types of input being integrated in the midbrain, ‘excitatory’ and ‘inhibitory’. We assumed that excitatory inputs would lead to an increase of activity in a midbrain structure if they carried activity greater than the current level of activity in the midbrain structure. Excitatory inputs smaller than that current level of activity, however, lead to a down-regulation of activity in the midbrain. Thus, excitatory input pathways mediated the net increase or decrease of neural activity coming into the midbrain. On the other hand, inputs modeled as inhibitory were integrated in direct proportion to the activity of the inhibiting structure. The neural activity in ML or MR resulting from the integration of the inputs was determined as a linear superposition of the different inputs. While these model mechanisms are simplistic, we are not aware of experimental evidence that clearly supports a particular type of more complex integration in this system.

Experimental data describing the effects of lesions or neural inactivation on spatial orienting so far do not provide information on the short-time dynamics of the behavioral responses. For this reason, we were mainly interested in the steady state behavior of the model; that is, the time-invariant states of the system, where dML/dt = dMR/dt = 0. The system S has a singular steady state. For the case of a completely symmetrical model (that is, kIL = kIR and kILR = kIRL, kc = k-c, I := IL = IR), with kI := kIL + kILR, the steady state of the system is given by

\[ M_L = M_R = \frac{(2k_{I} + k_{A})\, I}{2k_{I} - 2k_{c} + k_{A}}. \tag{3} \]
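As a quick numerical check on the resting state, the steady state of S can be found by setting both derivatives to zero and solving the resulting 2x2 linear system. The sketch below is illustrative rather than the authors' procedure. It assumes the system has the form given in the docstring, with the background term A taken as the mean of the two inputs and the two activities. The function names and the argument name `k_mc` (standing for k-c) are hypothetical, and the restriction of activities to non-negative values is ignored.

```python
def steady_state(k_IL=5.0, k_IR=5.0, k_ILR=1.0, k_IRL=1.0,
                 k_c=-1.25, k_mc=-1.25, k_A=1.0, I_L=0.4, I_R=0.4):
    """Steady state (M_L, M_R) of the assumed linear system

        dM_L/dt = k_IL(I_L - M_L) + k_IRL(I_R - M_L) + k_c  M_R + k_A(A - M_L)
        dM_R/dt = k_IR(I_R - M_R) + k_ILR(I_L - M_R) + k_mc M_L + k_A(A - M_R)

    with A = (I_L + I_R + M_L + M_R) / 4 (assumed form of the background term).
    """
    # Collect terms into a11*M_L + a12*M_R = b1 and a21*M_L + a22*M_R = b2.
    a11 = -(k_IL + k_IRL + 0.75 * k_A)
    a12 = k_c + 0.25 * k_A
    a21 = k_mc + 0.25 * k_A
    a22 = -(k_IR + k_ILR + 0.75 * k_A)
    b1 = -(k_IL * I_L + k_IRL * I_R + 0.25 * k_A * (I_L + I_R))
    b2 = -(k_IR * I_R + k_ILR * I_L + 0.25 * k_A * (I_L + I_R))
    det = a11 * a22 - a12 * a21
    # Cramer's rule for the 2x2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


def symmetric_steady_state(k_I=6.0, k_c=-1.25, k_A=1.0, I=0.4):
    """Closed form for the fully symmetrical case, with k_I = k_IL + k_ILR."""
    return I * (2.0 * k_I + k_A) / (2.0 * k_I - 2.0 * k_c + k_A)


if __name__ == "__main__":
    print("rest:", steady_state())
    print("rest (closed form):", symmetric_steady_state())
    print("right-field stimulus (I_L = 1):", steady_state(I_L=1.0))
```

With the default coefficients, both routes give a resting activity of about 0.336 in each structure, which is the value quoted for the resting steady state in the text.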
Fig. 3. Stability of the basic model, depending on the size of excitatory and inhibitory coefficients. The principal steps of a linear stability analysis of the model are presented in the Appendix. The surfaces in the main picture are functions of inhibitory coefficient k, and collective excitatory coefficient k, as described in equations (A6) and (A7). The surface values, which correspond to the determinant of the model's Jacobian matrix, illustrate the transition from a stable region with det > 0 to the unstable region with det < 0 (see also the figure inset). The top surface drawn in lines corresponds to a setting of the coefficient d = 5 (see Appendix for explanation of this coefficient), the two other surfaces in dot-print result from setting d = 2.5 and d = 1, respectively (top to bottom). The curves in the (k,, k,)-plane are projections of the zero-crossings of the surfaces, showing the dependence of the sign inversion of the function in (A6) on k, and k,. The line closest to the k,-axis corresponds to the setting d = 5. The plot demonstrates that the model is stable as long as the inhibitory coefficients are not much larger than the excitatory ones.
The steady state is stable, that is, inert to small perturbations of the variables, as long as the inhibitory coefficient ki is not much larger than the excitatory coefficients k, and k,. This corresponds to the biologically plausible situation. A linear stability analysis of this model is presented in the Appendix, and the stability of the steady state depending on the choice of excitatory and inhibitory coefficients is illustrated in Fig. 3.
To investigate the behavior of the model with intact and lesioned pathways, the equations (1) were integrated numerically using the XPP/WinPP software developed by Dr. B. Ermentrout (XPP/WinPP: http://mrb.niddk.nih.gov/xpp/).* In order to prevent dynamical activities from becoming smaller than zero, all dynamical variables were restricted to be at least zero, using Heaviside functions.
*The integration used a Runge-Kutta algorithm with fixed time steps of 0.05.

Behavior of the intact and lesioned model
We start by describing the behavior of the model with reference just to orienting to left and right visual hemifields. This demonstrates the main principles of the model’s operation. In the following three sections, we will relate the model’s behavior to intact, unilaterally impaired, and paradoxically restored orienting in the cat. Impaired spatial orienting in humans has often been explored in experimental paradigms different from the tests undertaken in cats, and we will show in later sections of the present paper how aspects of human orienting behavior can be studied with the help of an extended version of our model. The behavior and steady states of the activities in ML and MR depending on the different test conditions, with parameter settings for the intact, stimulated and lesioned system, are shown in a phase plane diagram in Fig. 4. The main results are also tabulated quantitatively in Table 1. To assess the consequences of neural activity in ML and MR, we interpreted a clear activity difference between the two midbrain structures as directing attention to the hemifield contralateral to the higher activity.

Simulating intact visual orienting
The point ‘0’ in Fig. 4 represents the resting steady state of the system. With parameters set to the default values given above, the singular steady state has the coordinates ML(rest) = MR(rest) = 0.336. We now provided input in the right hemifield (IL = 1), which mainly raises the input into ML. The activity in ML increases greatly, while MR’s activity increases only very slightly, driving the system to point ‘1’. The slight increase in MR is due to the weak contralateral input into MR, which balances and slightly overcomes the increased inhibition from ML.
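The move from point ‘0’ to point ‘1’ can be illustrated with a small fixed-step integrator. This is a minimal sketch, not the XPP/WinPP setup used for the reported simulations. It assumes the same form of system S as in the comments below, with A as the mean of inputs and activities. It uses a classical fourth-order Runge-Kutta step of 0.05, and it stands in for the Heaviside restriction with a simple clamp at zero after each step. All function and dictionary key names are hypothetical.

```python
DEFAULTS = {"IL": 5.0, "IR": 5.0, "ILR": 1.0, "IRL": 1.0,
            "c": -1.25, "mc": -1.25, "A": 1.0}  # "mc" stands for k-c


def derivatives(M, I, k):
    """Right-hand side of the assumed system S; M = (M_L, M_R), I = (I_L, I_R)."""
    M_L, M_R = M
    I_L, I_R = I
    A = (I_L + I_R + M_L + M_R) / 4.0  # assumed background term
    dM_L = (k["IL"] * (I_L - M_L) + k["IRL"] * (I_R - M_L)
            + k["c"] * M_R + k["A"] * (A - M_L))
    dM_R = (k["IR"] * (I_R - M_R) + k["ILR"] * (I_L - M_R)
            + k["mc"] * M_L + k["A"] * (A - M_R))
    return (dM_L, dM_R)


def rk4_step(M, I, k, dt):
    """One classical Runge-Kutta step, followed by a clamp at zero."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    s1 = derivatives(M, I, k)
    s2 = derivatives(shifted(M, s1, dt / 2.0), I, k)
    s3 = derivatives(shifted(M, s2, dt / 2.0), I, k)
    s4 = derivatives(shifted(M, s3, dt), I, k)
    new = tuple(m + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                for m, a, b, c, d in zip(M, s1, s2, s3, s4))
    # Stand-in for the Heaviside restriction: activities never drop below zero.
    return tuple(max(0.0, x) for x in new)


def simulate(M0, I, k=DEFAULTS, dt=0.05, steps=400):
    """Integrate from M0 for steps*dt time units and return the final state."""
    M = tuple(M0)
    for _ in range(steps):
        M = rk4_step(M, I, k, dt)
    return M


if __name__ == "__main__":
    rest = simulate((0.0, 0.0), (0.4, 0.4))   # point '0'
    point1 = simulate(rest, (1.0, 0.4))       # right-field stimulus, I_L = 1
    print("rest:", rest)
    print("point 1:", point1)
```

Under these assumptions the integration settles at roughly (0.80, 0.38) for (ML, MR) after the right-field stimulus, in line with the values listed in Table 1.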
Hence, visual stimulation in the right hemifield gives rise to increased activity mainly in the left midbrain structure, and the system orients toward the right, appropriately. This test condition resembles intact orienting to the right. Since the intact system is completely symmetrical, mirrored conditions with left stimulation give rise to appropriate intact orienting to the left (see Fig. 4). Resetting IL to the unstimulated value restores the original resting steady state (point ‘0’).

Simulating visual orienting with a unilateral lesion
We now examine the consequences of lesions on the behavior of the system. Point ‘2’ represents the state of the system after pathways originating in the left hemisphere have been lesioned, in analogy to a lesion of the left pMS. Since it is likely that some sub-cortical or cortical midbrain inputs remain intact after a selective cortical lesion, we assumed that the input via these pathways would not be eliminated completely, but would be reduced to 10% of the original value (that is, new kIL = 0.5 and new kILR = 0.1). The activity of MR in this situation is slightly raised, due to the reduced contralateral inhibition, while that of ML is clearly reduced. This picture is consistent with experimental observations that neuronal activity in the deep (Hardy and Stein, 1988) and middle (Ogasawara et al., 1984) layers of the ipsilateral SC was strongly reduced after suprasylvian cortical lesions. Given the differentially higher activity in the right midbrain, a unilateral lesion of this kind can produce a tendency to circle toward the ipsilesional hemifield while there is no selective stimulation in the visual field (e.g. Sprague, 1966). Again stimulating the external right hemifield, and thus providing input into the left structure, moves the system to point ‘1 + 2’. Here ML’s activity is somewhat higher than at resting state, and activity in MR is also slightly raised. The state of the system is, however, close to the region of the resting state (ML’s activity is only 14% of the input-related activity it possesses in the intact system),* and the activity differences between ML and MR are very small.
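The 14% figure can be reproduced by computing the lesioned steady state and expressing it in a coordinate system whose origin is the resting state and whose axes are the intact responses to right-field and left-field stimulation. The sketch below is a hedged illustration under two assumptions: the assumed form of system S noted in the docstring (with A as the mean of inputs and activities and symmetric commissural inhibition), and a plain change of basis for the coordinate transformation. The function names are hypothetical.

```python
def steady_state(k_IL=5.0, k_IR=5.0, k_ILR=1.0, k_IRL=1.0,
                 k_c=-1.25, k_A=1.0, I_L=0.4, I_R=0.4):
    """Steady state (M_L, M_R) of the assumed system
        dM_L/dt = k_IL(I_L - M_L) + k_IRL(I_R - M_L) + k_c M_R + k_A(A - M_L)
        dM_R/dt = k_IR(I_R - M_R) + k_ILR(I_L - M_R) + k_c M_L + k_A(A - M_R)
    with A = (I_L + I_R + M_L + M_R) / 4 and no clamp at zero."""
    a11 = -(k_IL + k_IRL + 0.75 * k_A)
    a22 = -(k_IR + k_ILR + 0.75 * k_A)
    a12 = a21 = k_c + 0.25 * k_A
    b1 = -(k_IL * I_L + k_IRL * I_R + 0.25 * k_A * (I_L + I_R))
    b2 = -(k_IR * I_R + k_ILR * I_L + 0.25 * k_A * (I_L + I_R))
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


def to_intact_coordinates(state, rest, right_stim, left_stim):
    """Express `state` in the basis spanned by the intact stimulus responses.

    All activities are first taken relative to the resting state. The first
    returned coordinate runs along the intact right-field response (1,0),
    the second along the intact left-field response (0,1).
    """
    mL, mR = state[0] - rest[0], state[1] - rest[1]
    aL, aR = right_stim[0] - rest[0], right_stim[1] - rest[1]
    bL, bR = left_stim[0] - rest[0], left_stim[1] - rest[1]
    det = aL * bR - bL * aR
    return ((mL * bR - mR * bL) / det, (mR * aL - mL * aR) / det)


if __name__ == "__main__":
    rest = steady_state()
    right_stim = steady_state(I_L=1.0)   # intact, stimulus in right hemifield
    left_stim = steady_state(I_R=1.0)    # intact, stimulus in left hemifield
    # Left-hemispheric pathways reduced to 10%, stimulus in right hemifield.
    lesioned = steady_state(k_IL=0.5, k_ILR=0.1, I_L=1.0)
    print("lesioned state:", lesioned)
    print("intact-based coordinates:",
          to_intact_coordinates(lesioned, rest, right_stim, left_stim))
```

Under these assumptions the lesioned state comes out near (0.40, 0.35), which corresponds to about 0.14 and 0.02 in the intact-based coordinates, matching the first lesion row of Table 1.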
Such a state corresponds to the experimental or pathological situation after a cortical lesion, in which the animal or a hemineglect patient does not respond to stimuli in the contralesional hemifield. Again, because the model is symmetrical, stimulation in the left hemifield and lesion in the right input pathways will give rise to analogous mirrored results.
*See Table 1.

Fig. 4. Phase plane plot ML versus MR demonstrating the behavior of the minimal model. The different parameter settings simulate the conditions for intact uni- and bilateral stimulation, as well as for primary and secondary lesions. Steady states are shown for combinations of these test conditions: 0 - Steady state at rest; 1 - Stimulus presentation in right visual hemifield; 2 - Lesion of left-hemispheric input stations, leaving 10% intact; 3 - Cut of commissural pathways, removing mutual inhibition; 4 - Lesion of right-hemispheric input stations, leaving 10% intact; 5 - Complete lesion of right midbrain structure. The dashed lines display the coordinate system within which the intact system responds in the absence of external visual stimulation, for unilaterally presented stimuli in the left and right hemifields respectively, and for bilateral stimulation (stimulus conditions given by pictograms). Table 1 lists the numerical values that correspond to the steady states displayed in this figure. See text for detailed explanations.

Simulating paradoxical restorations of function
We now examine conditions in which further secondary lesions are made to the system. All the following simulations take the system’s state at point ‘1 + 2’ as the precondition, that is, they will assume a left parietal lesion and visual stimulation in the right visual hemifield. As demonstrated in the previous section, such an initial lesion leads to neglect of stimuli presented in the right visual hemifield, and we wanted to investigate how further lesions would affect this visual neglect. We first examined the effect of removing the mutually inhibitory connections between ML and MR (i.e., setting kc = k-c = 0). This condition simulates the commissurotomy undertaken by Sprague, and by Wallace and colleagues (Sprague, 1966; Wallace et al., 1989; Wallace et al., 1990), which experimentally leads to a paradoxical restoration of function (the Sprague commissural paradox). This combination of stimulation and lesions brings the system to the state of activities represented in point ‘1 + 2 + 3’. The activity in ML in this state is higher than that in MR (the left structure has 57% of ML’s input-related activity in an intact system, whereas MR’s activity measures 9% in the same coordinate system, see Table 1), so that the system will orient right, as is observed experimentally. However, neither the difference between the activities of the two structures, nor the absolute activation of ML, is as great as in the intact system. We expect, therefore, that the mechanisms of orienting under a unilateral lesion and commissurotomy should not be as efficient as in the intact system, or as in the two multiple-lesion conditions considered in the following paragraphs (resulting in the points ‘1 + 2 + 4’ and ‘1 + 2 + 5’ in the phase plane diagram, Fig. 4). This is consistent with experimental observations: Sprague found that a recovery from an occipital cortical lesion was weak and slow following a subsequent tectal commissurotomy (partial recovery beginning six weeks post-surgery), whereas paradoxical recovery mediated through different secondary lesions was strong and almost instantaneous (Sprague, 1966).

TABLE 1
Activity in left and right midbrain structures under intact and lesioned conditions. Columns 2 and 3 correspond to the absolute activities of ML and MR shown in the phase plot diagram, Fig. 4. These activities were transformed into coordinates in the non-Cartesian coordinate system, which is spanned by vectors that correspond to the intact activity states for ML and MR, given unilateral right- or left-hemifield stimulations, respectively. In this coordinate system, which takes the resting state of the system as its origin, the coordinate point (0,1), for instance, corresponds to a Cartesian point (0.38, 0.80), and both coordinates describe the activity in ML and MR for the presentation of a left-field input in an intact system. The transformation of a coordinate point in the Cartesian coordinate system of the phase plane into a coordinate point in the intact-condition-based coordinate system can be given by

\[ M_L(\mathrm{rel}) = \frac{\bar{M}_L\,\bar{M}_R^{(0,1)} - \bar{M}_R\,\bar{M}_L^{(0,1)}}{\bar{M}_L^{(1,0)}\,\bar{M}_R^{(0,1)} - \bar{M}_L^{(0,1)}\,\bar{M}_R^{(1,0)}} \]

and

\[ M_R(\mathrm{rel}) = \frac{\bar{M}_R\,\bar{M}_L^{(1,0)} - \bar{M}_L\,\bar{M}_R^{(1,0)}}{\bar{M}_L^{(1,0)}\,\bar{M}_R^{(0,1)} - \bar{M}_L^{(0,1)}\,\bar{M}_R^{(1,0)}}. \]

All activities marked with an overscore in the formulas are taken relative to the respective activities at the intact resting state, and the indices (1,0) and (0,1) stand for the intact condition with input-related activity in the left and right hemisphere, respectively.

                                              Activity in
Condition                                 ML(abs)   MR(abs)   ML(rel)   MR(rel)
Intact:
  Rest                                     0.34      0.34      0.00      0.00
  Left-field stimulus                      0.38      0.80      0.00      1.00
  Right-field stimulus                     0.80      0.38      1.00      0.00
  Left- and right-field stimulus           0.84      0.84      1.00      1.00
IL lesion (10% intact):
  Right-field stimulus                     0.40      0.35      0.14      0.02
  Right-field stimulus/Commissurotomy      0.60      0.40      0.57      0.09
  Right-field stimulus/IR lesion           0.66      0.00      0.78     -0.80
  Right-field stimulus/MR lesion           0.63      0.00      0.72     -0.80

In analogy to the paradoxical restoration of function observed by Lomber and Payne when they inactivated the parietal cortex bilaterally (Lomber and Payne, 1996), we examined the case in which
the right parietal cortex is lesioned in addition to a lesion of the left parietal cortex. This combination of right hemifield visual stimulation and the two cortical lesions moves the state of the system to point ‘1 + 2 + 4’. In this condition MR’s input is reduced to a small fraction of the resting state. The right midbrain structure is moreover inhibited across the midline by greater activity in ML. Consequently, MR’s activity drops dramatically. Correspondingly, the activity of ML is higher, as a consequence of receiving a small stimulus-related input, and because of disinhibition from MR. This condition thus generates an appropriate orienting response toward the right hemifield, and so is consistent with the experimental demonstration that simultaneous bilateral lesions of pMS allow the animal to orient appropriately, as in the Lomber-Payne paradox. In analogy to the classical Sprague paradox, in which a subsequent collicular lesion in animals with existing contralateral cortical lesions restores orienting behavior, we examined the case in which MR is inactivated in a system with visual stimulation from the right hemifield and damage to the left pMS. This combination of stimulation and lesions drives the state of the system to the condition represented by point ‘1 + 2 + 5’. Destruction of MR reduces its activity to zero, disinhibiting ML, which retains a small stimulus-induced input. The combination of disinhibition and stimulation causes the activity of ML to be high. The system would thus be expected to orient appropriately toward the right, so reproducing the classical paradoxical result that animals with contralateral midbrain and cortical lesions can, despite their lesions, orient near normally.

Adding spatial detail to the model
The degree of paradoxical recovery varies for different horizontal eccentricities in the visual field, restoration of orienting behavior being weaker for stimuli in the periphery (Sprague, 1966; Lomber and Payne, 1996).
Further studies show that effects of spatial attention can vary for the center and the periphery of vision (Singer et al., 1977; Karnath, 1997). In order to model such spatially dependent effects, we incorporated additional detail into our model while preserving the basic structure outlined
above. The additional parameters were based on features of the deeper collicular layers (stratum griseum intermediale and below). There is experimental evidence that midbrain structures such as the SC possess a topographic representation of the visual field (e.g. Graybiel, 1975; Bruce, 1993; Berson and Stein, 1995). Each SC contains a representation of the complete visual field (Harting and Guillery, 1976; Antonini et al., 1978), which, however, appears to lack an input from the extreme ipsilateral periphery (Antonini et al., 1978). The field representation is binocular, and the representation of the ipsilateral field is probably mainly based on crossed input from contralateral visual cortex (Antonini et al., 1978; Berman and Payne, 1982; Baleydier et al., 1983; Harting et al., 1992), or from ipsilateral cortical input to the colliculus transferred cortico-cortically via the corpus callosum (Antonini et al., 1979). To incorporate these aspects of connectivity, each midbrain structure was subdivided into 15 regions, which were interlinked by both excitatory and inhibitory lateral connections, and each of which represented one particular sector of the visual field (see Fig. 5). Dynamical variables represented the orientations from -90° to 90° (a minus sign denoting eccentricities in the left hemifield), spaced at 15° angles, in each structure. This resolution corresponds to the typical perimetry resolution in cat behavioral tests (e.g. Payne et al., 1996a), and the approximate receptive field size (10°-20°) of neurons in the deeper layers of the SC (Wallace and Stein, 1996). Two further variables per structure represented -100° and 100° angles. These latter variables were not tested specifically during the following simulations, but were included to eliminate boundary artifacts.
Following experimental evidence that tectal commissural projections preserve topographic register for horizontal eccentricities (Behan and Kime, 1996), long-range inhibitory connections linked patches that represented opposite sectors of the visual field (i.e., 45° left and 45° right). The inhibitory coefficients for these pathways between antagonistic patches were, as before, kc = k-c = -1.25. The impact of input at a particular eccentricity was determined (i) by the structure of the input; and (ii) by the size of the region of the midbrain structure responding to that input. Pouget and Sejnowski (1997) have proposed that parietal neurons combine retinotopic location of a stimulus and head and body orientation in a way that can be described mathematically by basis functions. Most input into the modeled midbrain structures was from modeled parietal cortical stations, and we assumed that basis functions would also be appropriate representations for inputs in our model. However, all the experimental paradigms employed to generate data in this area have begun each trial with the animal attending centrally, and we are not in any case aware of data on the influence of different head or body orientations for the cat. Therefore we assumed that gaze would be directed toward the center of the visual field in all test conditions. In this case, the parietal cortical integration, and hence the input into the midbrain structures, is simply provided by a Gaussian function (Pouget and Sejnowski, 1997), see Fig. 6.
Fig. 5. Schematic representation of topographic integration of inputs in the model. Here, each midbrain structure ML and MR consists of 15 dynamical variables, each simulating a specific sector in the mapped representation of the visual field. The orientations are spaced from 0° to 90° relative to the vertical meridian, at 15° intervals, for each hemifield. Additionally, two more variables, representing -100° and 100° orientations (not shown here), were included for either structure in order to avoid artifacts at the borders of the spatial representation. The inputs I(-90°) to I(90°) relay activity coming into the midbrain mainly from both the ipsi- and contralateral visual cortices. Sectors in the left and right midbrain structures that contain mirrored representations of the visual field (e.g., -15° and 15°) are competing with each other through mutual inhibition. The inhibition is mediated by the topographically organized, reciprocal projections with the strengths kc and k-c. The strength coefficients of the input pathways, kI(α), are determined by the function depicted in Fig. 6. Neighboring patches in either structure are linked by lateral connections carrying both excitatory and inhibitory influences. Each variable is also influenced by input representing the average level of dynamical neural activity in the system (not represented in this diagram).

A Gaussian function also appears to be a reasonable fit for the representation of different eccentricities of the visual field within the SC: collicular magnification, determined by topographically organized projections, causes a large part of the structure to be devoted to the representation of central locations (Berson and Stein, 1995). We also took account of the fact that the representation of the ipsilateral field in the SC has a smaller extent (Antonini et al., 1978) and is produced by weaker input (Graybiel, 1975) than that of the contralateral field (Harting et al., 1992), although it is still thought to be substantial (Harting and Guillery, 1976). The half of the axis representing the ipsilateral hemifield in our model structures is compressed by a factor of 2 compared with the contralateral axis, which skews the resulting curve. Moreover, ipsilateral input does not extend beyond 60 degrees from the vertical meridian. This, together with the weaker ipsilateral field input, has the consequence that the ipsilateral representation is largely restricted to orientations up to 45 degrees from the vertical meridian, which agrees with experimental results (Meredith and Stein, 1990). The summed input strength for the central ±15° accounts for 52% of the total strength of all inputs, and input strength at the central focus is up to five times stronger than that at contralateral peripheral locations. The whole input distribution curve was scaled in such a way that the input pathways were again the dominant pathways in the model, and the average input strength for the contralateral inputs was equal to the input coefficient kI = 5. The spatial parameters we have chosen are consistent with known parameters for midbrain structures (Rosa and Schmid, 1995), and were as reasonable as possible on the basis of available data, but they may be further modified by more direct information from experiments.
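The quoted proportions can be checked against the Gaussian input function given in the caption of Fig. 6. The sketch below is a hedged reading rather than the authors' code. It assumes the function has the form kI(α) = 10 exp(-(zα)²/(2σ²)) with σ = 42, z = 1 for contralateral and z = 2 for ipsilateral eccentricities, that ipsilateral input stops at 60°, and that inputs are sampled at the 15° sectors plus the 100° boundary sector. The names `input_strength` and `summarize`, and the sector lists, are hypothetical.

```python
import math

SIGMA = 42.0  # assumed width of the Gaussian, as read from the Fig. 6 caption

# Assumed sampling of one structure's inputs (degrees from the vertical meridian).
CONTRALATERAL_SECTORS = [0, 15, 30, 45, 60, 75, 90, 100]  # 0 counted once, here
IPSILATERAL_SECTORS = [15, 30, 45, 60]                    # no input beyond 60


def input_strength(alpha_deg, ipsilateral=False):
    """Pathway strength k_I(alpha) under the assumed Gaussian form."""
    if ipsilateral and abs(alpha_deg) > 60:
        return 0.0  # ipsilateral input does not extend beyond 60 degrees
    z = 2.0 if ipsilateral else 1.0  # ipsilateral axis compressed by z
    return 10.0 * math.exp(-((z * alpha_deg) ** 2) / (2.0 * SIGMA ** 2))


def summarize():
    """Mean contralateral strength and share of total strength in the central 15 degrees."""
    contra = [input_strength(a) for a in CONTRALATERAL_SECTORS]
    ipsi = [input_strength(a, ipsilateral=True) for a in IPSILATERAL_SECTORS]
    total = sum(contra) + sum(ipsi)
    central = (input_strength(0) + input_strength(15)
               + input_strength(15, ipsilateral=True))
    return {"mean_contralateral": sum(contra) / len(contra),
            "central_fraction": central / total}


if __name__ == "__main__":
    for name, value in summarize().items():
        print(name, round(value, 3))
```

With this sampling, the mean contralateral strength comes out very close to 5 and the central ±15° carries a little over half of the total input strength, in line with the figures in the text. Other sampling choices (for example, leaving out the 100° sector) shift these numbers slightly.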
Including these spatial details required that the term for background activity in the system be refined. This excitatory coefficient was split into two parts; the larger component, k, = 0.9, accounted for global inputs from the rest of the model system. That part of the input was, again, simulated as the average activity of all the inputs and dynamical variables, as in equation (2). A smaller term, kLatE= 0. I,balanced the intrinsic coupling between immediate neighbors in each unilateral structure, which was included to provide weak lateral inhibition, following Kadunce et al. (1997). These inhibitory connections between immediate neighbors integrated activity according to the same mechanism as the long-range inhibitory fibers, and had a strength kLatI= - 0.1. The representation of attention to the regions of the visual field was assessed by averaging the steady state values of the variables that correspond to matching sectors from
Fig. 6. Distribution of pathway strengths for left and right input paths into the bilateral midbrain structures, plotted against visual field eccentricity α (degrees). The curves for the left and right input paths are marked by dashed lines with circles and by solid lines with squares, respectively. The overlap of the curves reflects the fact that the central ±60° of vision are represented in both midbrain structures. Each structure M receives inputs for the complete contralateral hemifield (0° to 90°), as well as weaker inputs for a more central part of the ipsilateral hemifield (0° to 60°). The Gaussian shape of the curves represents the integration of incoming neural activity in the midbrain, see main text. The axis for inputs from the ipsilateral side of the visual field has been scaled by a factor (z = 2) to account for the reduced and weaker ipsilateral representation assumed in midbrain structures like the SC (Antonini et al., 1978; Harting et al., 1992). The width of the curve was chosen so as to agree with data available for magnification factors in midbrain structures (Rosa and Schmid, 1995). These experimental results suggest that the representation of the central focus in the SC is about 4 to 6 times greater than that of the periphery. The resulting input function is a Gaussian of the scaled eccentricity zα with a peak value of 10 and a width parameter of 42, with z = 1 for contralateral eccentricities and z = 2 for ipsilateral eccentricities.
both midbrain structures (e.g., the representation of -45° in the left structure with that of -45° in the right). Such a read-out would provide a very simple directional signal mechanism for downstream motor systems or for sensory systems.
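The input profile and the read-out described above can be illustrated with a minimal sketch. It rests on several assumptions that are not confirmed by the text: the exponent of the Gaussian is taken as -(zα)²/(2·width²), sectors are spaced 15° apart as on the figure axes, and negative eccentricities denote the left hemifield. The function names are hypothetical.

```python
import math

# Sector centres in degrees (assumption: 15-degree spacing, as on the figure axes).
ECCENTRICITIES = list(range(-90, 91, 15))


def pathway_strength(alpha, z=1.0, peak=10.0, width=42.0):
    """Gaussian pathway strength at eccentricity alpha (degrees).

    ASSUMPTION: the exponent is -(z*alpha)^2 / (2*width^2); the exact
    normalisation of the exponent is not recoverable from the caption.
    """
    return peak * math.exp(-((z * alpha) ** 2) / (2.0 * width ** 2))


def inputs_to_right_structure(eccentricities=ECCENTRICITIES):
    """Input strengths into the right midbrain structure (hypothetical helper).

    ASSUMPTION: negative eccentricities are the left (contralateral) hemifield.
    """
    strengths = []
    for alpha in eccentricities:
        if alpha <= 0:
            # Complete contralateral hemifield, 0 to 90 degrees, z = 1.
            strengths.append(pathway_strength(alpha, z=1.0))
        elif alpha <= 60:
            # Weaker ipsilateral representation, 0 to 60 degrees, z = 2.
            strengths.append(pathway_strength(alpha, z=2.0))
        else:
            # No ipsilateral representation beyond 60 degrees.
            strengths.append(0.0)
    return strengths


def read_out(left_structure, right_structure):
    """Average the steady-state values of matching sectors of both structures."""
    return [0.5 * (l + r) for l, r in zip(left_structure, right_structure)]
```

With z = 2, an ipsilateral input at 30° receives the same strength as a contralateral input at 60°, which is one way to read the scaling of the ipsilateral axis in Fig. 6.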
Simulating intact orienting to different field eccentricities

We now turn to the results of stimulating the model with visual inputs at different eccentricities, while the pathways and the model structures were again intact or lesioned. Figure 7 shows the input-related activity averaged across the relevant sectors of both midbrain structures for the intact system. As the system is symmetrical, only left-sided inputs are presented here. Identical, mirrored results can be produced with stimuli in the right hemifield. The highest absolute activities were produced by those sectors that corresponded to the locations at which input had been presented, thus computing the correct locations. There are also sharp local gradients between these and neighboring sectors, supported by lateral inhibition between neighbors. In addition, the strength of the response increases as stimulation moves from the periphery to more central locations. However, the response to a central stimulus does not generate the highest absolute activity because of the mutual inhibition between the two competing sectors that represent central vision in each structure. It can also be seen how more central inputs increase the baseline activity for the hemifield in which the stimulus has been presented, and how stimulus-related activity is complemented by a depression of activity at the antagonistic sector, in the representation of the opposite hemifield.
Spatial aspects of neglect resulting from unilateral lesions

Figure 8 presents the resting state activity averaged across the two midbrain structures for the intact system, and after lesioning Z, which corresponds to a lesion of the right cortical station. The lesion is, once again, simulated by reducing the strengths of the (right-hemispheric) pathways to 10% of the original value, taking account of other residual undamaged inputs into the midbrain. While the intact activity distribution is determined by the Gaussian inputs, the unilaterally lesioned system shows a dramatic decrease in the activity representing the contralesional hemifield. Coupled to this is a shift of the peak in activity away from central
Fig. 7. Mapping of different stimulus eccentricities in the intact bilateral midbrain system (horizontal axis: mapped field eccentricity in degrees). The curves represent the spatially distributed activity in the midbrain (averaged for the left and right midbrain structure) following visual stimulation at given eccentricities in the left hemifield (stimulus eccentricities shown in inset). The highest peak of each midbrain activity distribution corresponds to the correct eccentricity of the given input. Analogous, mirrored results follow for presentation of stimuli in the right hemifield (not shown here).

Fig. 8. Off-midline shift of averaged midbrain activity after lesioning of inputs relayed from input stations in the right hemisphere, leaving 10% of the pathways intact (horizontal axis: mapped field eccentricity in degrees). Neural activity at rest, corresponding to a representation of the complete visual field, is centered in the intact system, but shifted to the ipsilesional side by 15°-30° in the unilaterally lesioned system. In both the intact and lesioned state, the representation has a main peak and then declines toward the periphery.
vision toward 15°-30° in the ipsilesional field. Due to the imbalanced inhibition between $M_L$ and $M_R$, this peak is higher than for the resting state activity in the intact system (see also point '2' in Fig. 4). Such a distribution of activity may help to explain the tendency of lesioned animals, as well as of human patients suffering from neglect, to preferentially explore the ipsilesional side and to ignore the contralateral space. The shift of the activity peak in the representation of the visual field can also help to explain the behavior of human patients suffering from hemineglect in standard neurological tests like the line bisection task. In this task, patients with left hemineglect consistently place the subjective midpoint of a line to the right of the actual center. The distorted representation of the visual field, as in Fig. 8, which under-represents the left of the field, over-represents the right, and shifts the peak of the representation towards the right, might well underlie such behavior. The reduction of neural activity in sectors representing the far ipsilesional periphery would, moreover, correspond to experimental observations in human neglect patients that show a reduction of visual exploration for very peripheral eccentricities (Karnath, 1997).

Simulating hemi-extinction arising from bilateral stimulus presentation

Hemi-extinction in human neglect patients often arises only for bilateral presentation of visual stimuli; for unilateral stimulus presentation these patients behave like normal subjects (e.g. Driver and Mattingley, 1998). The phenomenon has rarely been studied in animals (Foreman et al., 1992), although it clearly deserves more attention as an elucidating example of stimulus competition. Since our earlier simulations showed that stimuli presented in the neglected hemifield would not lead to a significant activity in the midbrain representation, apart from presentation in a small region close to the midline, we concluded that hemi-extinction would only occur for lesions involving lesser brain damage. We consequently simulated a lesion that would leave 30% of the right-hemispheric pathways intact, that is, the lesion would reduce the strength of such paths to 30%. In this case, stimuli in the left 'neglect' hemifield are effective (i.e. lead to the highest absolute activity in the midbrain representation) if they are presented unilaterally at eccentricities between 45° and 0° inclusive. This neural activity would form the basis of an appropriate orienting response towards the presented stimulus, and the animal or human subject would appear normal with respect to orienting behavior. However, simultaneous stimulus presentation in the opposite hemifield, at an antagonistic eccentricity, extinguishes the previously strong neural response and creates a new, even stronger activity locus in the antagonistic midbrain sector, leading to an orienting response towards the stimulus in the 'intact' hemifield. Both situations, for uni- and bilateral stimulus presentation, are depicted in Fig. 9. This simulation thus shows that hemi-extinction can occur for bilateral stimulus presentation even though there is no neglect apparent for unilaterally presented stimuli. The phenomenon would occur in cases where the underlying lesion produced lesser brain damage than that leading to hemineglect.
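The lesioning procedure and the criterion for an 'effective' stimulus used in these simulations can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the reported results: the function names are hypothetical, a lesion is modeled as a plain multiplicative scaling of pathway strengths, and a stimulus counts as effective when its sector carries the global maximum of the averaged midbrain activity.

```python
def lesion(pathway_strengths, intact_fraction):
    """Scale pathway strengths to the fraction left intact (e.g. 0.1 or 0.3)."""
    if not 0.0 <= intact_fraction <= 1.0:
        raise ValueError("intact_fraction must lie between 0 and 1")
    return [intact_fraction * s for s in pathway_strengths]


def orienting_target(eccentricities, activity):
    """Eccentricity of the sector with the highest activity.

    If several sectors share the maximum, the first one is returned.
    """
    best = max(range(len(activity)), key=lambda i: activity[i])
    return eccentricities[best]


def is_extinguished(eccentricities, activity, stimulus_eccentricity):
    """True if the stimulated sector does not carry the global activity maximum."""
    return orienting_target(eccentricities, activity) != stimulus_eccentricity
```

Applied to the activity profile of a unilateral and of a bilateral presentation in turn, `is_extinguished` distinguishes the two situations described above: the same stimulus can be effective when presented alone and extinguished when a competing stimulus produces a higher peak elsewhere.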
Fig. 9. Suppression of the stimulus-related activity in the lesion-impaired representation of the midbrain by simultaneous stimulation in an antagonistic, contralateral eccentricity of the visual field (horizontal axis: mapped field eccentricity in degrees). The effect is apparent for lesser lesion damage than that underlying Fig. 8. Here, 30% of the right-hemispheric input pathways are still intact. The curves in solid and dashed lines are obtained for unilateral stimulus presentation at -30°, and bilateral stimulus presentation at -30° as well as 30°, respectively. The two curves are very similar apart from the stimulus-related activity peaks. Unilateral stimulus presentation in the impaired hemifield produces an unambiguous peak of neural activity corresponding to the correct location. For bilateral stimulus presentation in antagonistic eccentricities, however, only the neural activity representing stimulation in the intact hemifield rises clearly above baseline activity. This condition reproduces the neurological picture of hemi-extinction, where patients show effects of hemineglect only for bilateral, but not unilateral stimulus presentation (Driver and Mattingley, 1998).

Recovery of spatial orienting for bilateral lesions - without recovery of the periphery

Sprague, as well as Lomber and Payne, observed that doubly lesioned systems had a poorer recovery at the far periphery than for more central locations (Sprague, 1966; Lomber and Payne, 1996). In Lomber and Payne's experiment, bilateral cooling of the pMS cortex produced lower responding at very peripheral eccentricities (±90°) (Lomber and Payne, 1996). In analogy to the bilateral inactivation undertaken by Lomber and Payne (1996), Fig. 10 shows the representation of stimuli at given locations in a system in which all inputs to the orienting system via the cortex have been lesioned, that is, set to 10% of their original strengths. Comparing Fig. 10, the doubly lesioned case, with Fig. 7, the intact case, aided by the curve from the intact case at -90° which is superimposed upon Fig. 10, shows that both baseline and stimulus-related activity are reduced, and local gradients are less steep. Generally, however, the highest absolute activities still correspond to the appropriate stimulus locations. This means that for bilaterally lesioned inputs the system again recovers correct orienting to given visual eccentricities. A notable exception to this paradoxical recovery is the behavior of the system for stimulus presentation at the far periphery, for example -90°. At these eccentricities, the activity representing a visual stimulus is in fact smaller than the baseline activity at the representation of central vision. Hence, we would not expect such low stimulus-related activity to generate an unambiguous signal to attend to the far periphery, and we consequently expect that the doubly lesioned system would not reliably orient to stimuli on the fringes of vision, as is consistent with experiment (Lomber and Payne, 1996).

Fig. 10. Behaviour of the system for bilaterally lesioned inputs. The meaning of the curves is as in Fig. 7. The system, in which both left- and right-hemispheric pathways are reduced to 10% of their original strengths, shows recovery of the original behavior. Importantly, however, the mapping of stimulus-related activity across the midbrain structures is ambiguous for eccentricities in the far periphery (±90°), as these stimulus angles produce only a local, but not a global, maximum of activity in the midbrain representation.

Discussion

Our results show that the frequently perplexing and counter-intuitive patterns of intact, lesioned and paradoxically restored orienting behavior can be explained quite comprehensively by a simple mathematical model based on aspects of real connectivity. The connectivity structure employed in the model, and the dynamic behavior the model exhibits, make it clear that competition between brain structures is an important mechanism for spatial attention in the cat and potentially also in humans. Our model also demonstrates that paradoxical lesion effects, which present a surprising set of phenomena, need by no means be mysterious. Paradoxical restoration of spatial orienting can be easily understood from the mechanism of competition across the midline between bilateral brain structures. Normal function arises from balanced competition, and restoration of attentional function after primary lesions is achieved by redressing the level of bilateral competition, so that the system can be efficiently influenced by external visual input.

Many models of spatial attention are restricted to interactions within and between cortical stations (e.g. Mesulam, 1981; Pouget and Sejnowski, 1997). Cortical interactions can implement competition, as when outputs from two or more structures compete for 'selection' by a downstream processor (e.g. Pouget and Sejnowski, 1997), or when lateral inhibition shapes the saliency of visual representations (Taylor and Stein, 1998). However, in the cat, the experimental results from the behavioral effects of brain lesions, the connectional neuroanatomy, and the behavior of our model suggest that the principal mechanism for competition is a sub-cortical one, particularly involving the superior colliculi and the basal ganglia. These sub-cortical mechanisms are informed and influenced by cortical processes and representations (as in the spatially dependent aspects of the model in Figs. 7-10), but the principal mechanism appears nonetheless to be direct mutual inhibition across the midline involving sub-cortical structures.

It is evident that many of the effects observed experimentally, and simulated in our model, have parallels in human spatial attention and its disorders. We are uncertain as to whether the human system is as strongly dependent on sub-cortical interactions as is the cat, but we note that there are apparent cases of sub-cortical neglect (e.g. Healton et al., 1982; Vallar and Perani, 1986; Rafal and Posner, 1987), which suggest that the mechanisms may be similar. Recent observations that auditory alerting might ameliorate spatial visual neglect in humans (Robertson et al., 1998) also hint at the involvement of midbrain structures like the colliculi, which are well known to integrate multi-modal sensory information (e.g. Wallace and Stein, 1996). Similarly, paradoxical restorations of function in human spatial attention have been reported (e.g. Poppel and Richards, 1974; Vuilleumier et al., 1996), and a challenge for the future is whether models that do not represent sub-cortical interactions will be able to account for these paradoxical effects.
An important difference between the organization of spatial attentional mechanisms in the cat and in humans lies in the issue of lateralization. The experimental data for the cat, which underlie our model, show no significant lateralization of intact, lesion-impaired or restored orienting. Patterns of spatial neglect in humans, however, are asymmetrical with respect to the two halves of the brain. Left hemineglect, which is caused by damage to the right half of the brain, occurs more frequently in human patients and is more severe than right hemineglect (e.g. Vallar, 1998). Human spatial attention is so conspicuously lateralized that it has been suggested that the right hemisphere contains neural networks representing both external hemifields, whereas the left hemisphere is mainly or exclusively concerned with attending to the contralateral, right hemifield (Weintraub and Mesulam, 1987). On the other hand, reversible inactivation of cortical locations in the right and left parietal lobes by means of transcranial magnetic stimulation leads to largely symmetrical effects of hemi-extinction in human subjects (Pascual-Leone et al., 1994). Our model makes it possible to explore the effects of asymmetrical competition across the midline, mediated by reciprocal but asymmetrical tonic inhibition, and we will investigate aspects of hemispheric lateralization in future studies.

Kapur has argued that models that can account for paradoxical effects are more constrained by complex and non-intuitive data than those that cannot (Kapur, 1996). Our model was additionally constrained by aspects of real connectivity. The success of the model in accounting for the complex lesion effects in a moderately constrained framework suggests that the mechanisms and parameters we have supposed may not be too far from reality. Two avenues for further effort suggest themselves in this context.
First, it might be fruitful to employ the parameters we have suggested as a starting point for further experimental refinement, so that our assumptions can be replaced by values derived from experiment. Second, we hope that the model will facilitate the development of a framework in which interactions within large-scale neural systems can be more adequately formalized. Such a framework is presently absent, and is required to rigorously integrate data from neurophysiology, neuroanatomy, neuroimaging, and brain lesion studies. These data are frequently used to address related questions posed at the systems level, but to date have rarely been treated in a systematic and unified way.

We acknowledge limitations of the model. Any of the pathway coefficients could be debated, and most or all will probably be subject to refinement in the future. Similarly, there are many more elegant treatments of the mechanisms of integration of inputs within a model structure than the ones we have assumed here. A challenge will be to develop more realistic integrative mechanisms that will correspond to neurophysiological and neuroanatomical data at the systems level, while still giving rise to a system that reproduces the observed behavioral effects. In addition, we are aware that there are very diverse inputs into the colliculi, and that these structures plainly have many more functions than the visual spatial attention mechanisms we have modeled. A further challenge for us, which has presumably already confronted evolution, is to map these different functions into the same neurological structures in a way that disrupts each function minimally.

Reviews of spatial neglect (e.g. in Pouget and Sejnowski, 1997) demonstrate that neglect has components that can be related to retinotopic as well as object-centered coordinates. As we are not aware of experimental data for the cat coming from object-centered testing paradigms, we based our model on data readily available for the retinotopic frame of reference. It would, however, be simple to extend the present model to operate in further coordinate systems, following the promising modeling approach developed by Pouget and Sejnowski (1997) to combine different coordinate systems in basis functions.

The model excludes any plasticity. It should be mentioned that attentional systems can recover spontaneously after primary lesions, both in the cat (e.g. Payne et al., 1996b) and in humans (e.g. Vallar et al., 1988). The exact biological mechanisms of long-term recovery are far from clear. It is, however, widely assumed that some or all restoration of function after brain lesions is implemented by other structures in the network changing plastically, and thereby taking up the functions previously mediated by the damaged structure. These plastic processes are very likely to be occurring in the experimental animals whose behavior we have tried to model (Payne et al., 1996b; Rosenquist et al., 1996). However, it is also possible that slowly ongoing and increasing damage may, paradoxically, contribute to some gradual restorations of function. For example, we can envisage that a wounded brain area might broadcast noise loudly across the network, and that this source of disinformation might diminish as its cells continue to die, so allowing the network to rebound gradually toward greater efficiency. Another example might be the case of one structure competing with another structure by direct mutual innervation, as in the relations between the modeled colliculi. Should unilateral damage occur, then cell loss at the recipient structure following degeneration of the damaged structure's efferents might restore more balanced competition over time, with a time course similar to that for gradual recovery of function.

We also did not consider dynamic mechanisms that lead to a habituation or adaptation of orienting behavior. Inspired by such phenomena, Singer et al. (1977) suggested an involvement of tectal stations in human orienting behavior. We will be interested to extend our model for spatial attention to encapsulate such effects in the future.

Competition as a general principle of neural processing has been advanced by several authors in the context of human and non-human primates (e.g. Kastner et al., 1998; Humphreys et al., 1994; Kapur, 1996; Duncan et al., 1997), and our model is consistent with these ideas. It seems that competitive principles of interaction between brain areas can account for intact, lesioned and paradoxically restored behavioral function in our model, and have been suggested to explain aspects of human function. We have presented an initial step toward a mathematical framework based on brain connectivity that can predict the deleterious and beneficial effects of patterns of lesions. Should competitive principles hold in the more general case and in the human brain, then our results suggest that, alongside better information about human brain connectivity (Crick and Jones, 1993), it may be possible to derive a principled framework which could guide further intervention on the damaged brain in order to restore function in patients with neurological abnormalities.
Appendix

Stability of the basic model

In order to show that the basic model performs in a stable way in all practically relevant parameter regions, we investigated the stability of the system S, equations (1), to small perturbations close to its steady state. We used the standard techniques of linear stability analysis (e.g. Jordan and Smith, 1987; Murray, 1989) for this investigation, of which we only present the main steps here. The Jacobian matrix of the system S in equations (1) is given by - k, - 3kB/4
k,+kB/4
ki+kB/4 - k, - 3 k ~ / 4
From this follows the characteristic equation, which, with $\lambda$ as the eigenvalues of J, can be written in a general form as

$\lambda^2 - \mathrm{tr}\,\lambda + \det = 0$ (A2)
For the matrix J in (Al), the characteristic equation becomes
so that
tr= - (2k,+$) and
are the trace and the determinant of J, respectively. The coefficient ki:= k, = k-, denotes, as in the main text, the inhibitory path coefficients. The stability
of the system depends on the combination of values for tr and det (see inset in Fig. 3 ) . For det
parts of the eigenvalues that solve equation (A2) would become positive, and hence the steady state unstable. However, equation (A4) makes it clear that tr never becomes positive, as both k, and k, are always non-negative. The character of the stable steady state of S can be investigated by looking at the curve tr’/4=det (see Fig. 3 inset). For positive tr, points above this curve correspond to a stable node, below the curve to a stable spiral. For S with tr and det as given in (A4) and (A5), the relation tr2/4> det leads to
which can be transformed into the obviously nonnegative form
with
I
f=
(ke/4+k,)’> 0.
2d2+3d+ 1 ’
The two-dimensional function in (A6) depends on the summarized excitatory ( k , ) and inhibitory (k,) parameters in the system S. The function is plotted in Fig. 3 , for the settings of d = 1 (lowest surface), d = 2.5, and d = 5 (top surface). The zero-crossings of the function surfaces are projected onto the floor of the plot, with the bottom-most trajectory corresponding to the highest surface. These trajectories can be described by the non-negative solution for k,. Inequality (A6) being solved as a quadratic equality yields that solution. It is given by k, =
ki(l -
mf) 4f
With the reasoning outlined above, the system S will be stable as long as the left-hand side of equation (A8) is equal or larger than the right-hand side. This relation describes the stable region in Fig. 3 above the trajectories in the floor plane. In the region below these curves, the steady state would behave as an unstable saddle point. This figure and the relation (A8) demonstrate that the stability of the system is guaranteed as long as the inhibitory path coefficients are not much larger than the combined excitatory ones. As the inset in Fig. 3 indicates, the system may also become unstable for parameter values with det > 0, if tr > 0. In this case, at least one of the real-
(A10)
This corresponds to $\mathrm{tr}^2/4 > \det$ and means that the stable steady state of S is a stable node. In summary, steady states of S become unstable only for disproportionately large inhibitory path coefficients. In these cases, the steady state is an unstable saddle point. For all other parameter settings, however, the steady state corresponds to a stable node.
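The classification used in this appendix follows from the general characteristic equation (A2) and can be checked numerically. The sketch below is a generic textbook illustration, not the specific Jacobian of S: it solves (A2) for given trace and determinant and names the type of steady state. The function names are hypothetical, and the helper for a matrix with equal diagonal entries a and equal off-diagonal entries b is included only as an assumed example of a symmetric two-variable system.

```python
import cmath


def eigenvalues(tr, det):
    """Roots of lambda^2 - tr*lambda + det = 0, i.e. equation (A2)."""
    root = cmath.sqrt(tr * tr / 4.0 - det)
    return tr / 2.0 + root, tr / 2.0 - root


def classify_steady_state(tr, det):
    """Name the steady state of a two-variable linearised system."""
    if det < 0:
        # Real eigenvalues of opposite sign.
        return "saddle point"
    if det == 0 or tr == 0:
        # Borderline cases that linear analysis alone does not settle.
        return "degenerate or marginal case"
    kind = "node" if tr * tr / 4.0 >= det else "spiral"
    return ("stable " if tr < 0 else "unstable ") + kind


def trace_and_det_symmetric(a, b):
    """Trace and determinant of the matrix [[a, b], [b, a]] (illustrative helper)."""
    return 2.0 * a, a * a - b * b
```

For a matrix of the form [[a, b], [b, a]], the quantity tr²/4 - det equals b², which is never negative, so such a system can have nodes or saddle points but no spirals.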
Acknowledgements

This work has been supported by the Wellcome Trust and the sponsors of the Crete Course in Computational Neuroscience. We wish to thank J. Rinzel for mathematical advice, B. Payne and S. Lomber for helpful discussions, and V. Braitenberg and his elegant Vehicles for stimulation.
References

Alexander, G.E. (1995) Basal Ganglia. In: M.A. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks. MIT Press, Cambridge, pp. 139-144.
Antonini, A., Berlucchi, G., Marzi, C.A. and Sprague, J.M. (1979) Importance of corpus callosum for visual receptive fields of single neurons in cat superior colliculus. J. Neurophysiol., 42: 137-152.
Antonini, A., Berlucchi, G. and Sprague, J.M. (1978) Indirect, across-the-midline retinotectal projections and representation of ipsilateral visual field in superior colliculus of the cat. J. Neurophysiol., 41: 285-304.
Appell, P.P. and Behan, M. (1990) Sources of subcortical GABAergic projections to the superior colliculus in the cat. J. Comp. Neurol., 302: 143-158.
Baleydier, C., Kahungu, M. and Mauguiere, F. (1983) A crossed corticotectal projection from the lateral suprasylvian area in the cat. J. Comp. Neurol., 214: 344-351.
Behan, M. and Kime, N.M. (1996) Spatial distribution of tectotectal connections in the cat. Prog. Brain Res., 112: 131-142.
Berman, N. and Payne, B.R. (1982) Contralateral corticofugal projections from the lateral, suprasylvian and ectosylvian gyri in the cat. Exp. Brain Res., 47: 234-238.
Berson, D.M. and Stein, J.J. (1995) Retinotopic organization of the superior colliculus in relation to the retinal distribution of afferent ganglion cells. Vis. Neurosci., 12: 671-686.
Boussaoud, D. and Joseph, J.P. (1985) Role of the cat substantia nigra pars reticulata in eye and head movements. II. Effects of local pharmacological injections. Exp. Brain Res., 57: 297-304.
Bruce, L.L. (1993) Postnatal development and specification of the cat's visual corticotectal projection: efferents from the posteromedial lateral suprasylvian area. Brain Res. Dev. Brain Res., 73: 47-61.
Buhl, E.H. and Singer, W. (1989) The callosal projection in cat visual cortex as revealed by a combination of retrograde tracing and intracellular injection. Exp. Brain Res., 75: 470-476.
Ciaramitaro, V.M., Wallace, S.F. and Rosenquist, A.C. (1997) Ibotenic acid lesions of the substantia nigra pars reticulata ipsilateral to a visual cortical lesion fail to restore visual orienting responses in the cat. J. Comp. Neurol., 377: 596-610.
Crick, F. (1994) The Astonishing Hypothesis. Simon & Schuster, London.
Crick, F. and Asanuma, C. (1986) Certain aspects of the anatomy and physiology of the cerebral cortex. In: D.E. Rumelhart, J.L. McClelland and the PDP Research Group (Eds.), Parallel Distributed Processing, Vol. 2. MIT Press, Cambridge, MA, pp. 333-371.
Crick, F. and Jones, E. (1993) Backwardness of human neuroanatomy. Nature, 361: 109-110.
Douglas, R.J. and Martin, K.A.C. (1991) Opening the gray box. Trends Neurosci., 14: 286-293.
Driver, J. and Mattingley, J.B. (1998) Parietal neglect and visual awareness. Nature Neurosci., 1: 17-22.
Duncan, J., Humphreys, G. and Ward, R. (1997) Competitive brain activity in visual attention. Curr. Opin. Neurobiol., 7: 255-261.
Edwards, S.B., Ginsburgh, C.L., Henkel, C.K. and Stein, B.E. (1979) Sources of subcortical projections to the superior colliculus in the cat. J. Comp. Neurol., 184: 309-329.
Foreman, N., Save, E., Thinus-Blanc, C. and Buhot, M.C. (1992) Visually guided locomotion, distractibility, and the missing-stimulus effect in hooded rats with unilateral or bilateral lesions of parietal cortex. Behav. Neurosci., 106: 529-538.
Graybiel, A.M. (1975) Anatomical organization of retinotectal afferents in the cat: an autoradiographic study. Brain Res., 96: 1-23.
Halligan, P.W. and Marshall, J.C. (1992) Left visuo-spatial neglect: a meaningless entity? Cortex, 28: 525-535.
Hardy, S.C. and Stein, B.E. (1988) Small lateral suprasylvian cortex lesions produce visual neglect and decreased visual activity in the superior colliculus. J. Comp. Neurol., 273: 527-542.
Harting, J.K. and Guillery, R.W. (1976) Organization of retinocollicular pathways in the cat. J. Comp. Neurol., 166: 133-144.
Harting, J.K., Huerta, M.F., Hashikawa, T., Weber, J.T. and Van Lieshout, D.P. (1988) Neuroanatomical studies of the nigrotectal projection in the cat. J. Comp. Neurol., 278: 615-631.
Harting, J.K., Updyke, B.V. and Van Lieshout, D.P. (1992) Corticotectal projections in the cat: anterograde transport studies of twenty-five cortical areas. J. Comp. Neurol., 324: 379-414.
Healton, E.B., Navarro, C., Bressman, S. and Brust, J.C. (1982) Subcortical neglect. Neurology, 32: 776-778.
Humphreys, G.W., Romani, C., Olson, A., Riddoch, M.J. and Duncan, J. (1994) Non-spatial extinction following lesions of the parietal lobe in humans. Nature, 372: 357-359.
Jiang, H., Stein, B. and McHaffie, J. (1997) Electrical activation of corticotectal regions of extraprimary sensory cortex modulates the activity of nigrotectal neurons. Soc. Neurosci. Abstr., 23: 602.8.
Jordan, D.W. and Smith, P. (1987) Nonlinear Ordinary Differential Equations. Clarendon Press, Oxford.
Joseph, J.P. and Boussaoud, D. (1985) Role of the cat substantia nigra pars reticulata in eye and head movements. I. Neural activity. Exp. Brain Res., 57: 286-296.
Kadunce, D.C., Vaughan, J.W., Wallace, M.T., Benedek, G. and Stein, B.E. (1997) Mechanisms of within- and cross-modality suppression in the superior colliculus. J. Neurophysiol., 78: 2834-2847.
Kapur, N. (1996) Paradoxical functional facilitation in brain-behaviour research. A critical review. Brain, 119: 1775-1790.
Karnath, H.O. (1997) Spatial orientation and the representation of space with parietal lobe lesions. Philos. Trans. R. Soc. Lond. B Biol. Sci., 352: 1411-1419.
Kastner, S., de Weerd, P., Desimone, R. and Ungerleider, L.G. (1998) Mechanisms of directed attention in the human extrastriate cortex as revealed by functional MRI. Science, 282: 708-711.
Kotter, R. and Wickens, J.R. (1998) Striatal modeling in Parkinson's disease: New insights from computer modeling. Artif. Intell. Med., 13: 37-55.
Lomber, S.G. and Payne, B.R. (1996) Removal of 2 halves restores the whole - reversal of visual hemineglect during bilateral cortical or collicular inactivation in the cat. Vis. Neurosci., 13: 1143-1156.
Mascetti, G.G. and Arriagada, J.R. (1981) Tectotectal interactions through the commissure of the superior colliculi: an electrophysiological study. Exp. Neurol., 71: 122-133.
Meredith, M.A. and Stein, B.E. (1990) The visuotopic component of the multisensory map in the deep laminae of the cat superior colliculus. J. Neurosci., 10: 3727-3742.
Mesulam, M.M. (1981) A cortical network for directed attention and unilateral neglect. Ann. Neurol., 10: 309-325.
Murray, J.D. (1989) Mathematical Biology. Springer, Berlin, p. 697.
Nieoullon, A., Cheramy, A. and Glowinski, J. (1978) Release of dopamine evoked by electrical stimulation of the motor and visual areas of the cerebral cortex in both caudate nuclei and in the substantia nigra in the cat. Brain Res., 145: 69-83.
Nobre, A.C., Sebestyen, G.N., Gitelman, D.R., Mesulam, M.M., Frackowiak, R.S. and Frith, C.D. (1997) Functional localization of the system for visuospatial attention using positron emission tomography. Brain, 120: 515-533.
Ogasawara, K., McHaffie, J.G. and Stein, B.E. (1984) Two visual corticotectal systems in cat. J. Neurophysiol., 52: 1226-1245.
Pascual-Leone, A., Gomez-Tortosa, E., Grafman, J., Alway, D., Nichelli, P. and Hallett, M. (1994) Induction of visual extinction by rapid-rate transcranial magnetic stimulation of parietal lobe. Neurology, 44: 494-498.
Payne, B.R. (1993) Evidence for visual cortical area homologs in cat and macaque monkey. Cerebral Cortex, 3: 1-25.
Payne, B.R. (1994) Neuronal interactions in cat visual-cortex mediated by the corpus callosum. Behav. Brain Res., 64: 55-64.
Payne, B.R., Lomber, S.G., Geeraerts, S., Vandergucht, E. and Vandenbussche, E. (1996a) Reversible visual hemineglect. Proc. Natl. Acad. Sci. USA, 93: 290-294.
Payne, B.R., Lomber, S.G., Villa, A.E. and Bullier, J. (1996b) Reversible deactivation of cerebral network components. Trends in Neurosci., 19: 535-542.
Poppel, E. and Richards, W. (1974) Light sensitivity in cortical scotomata contralateral to small islands of blindness. Exp. Brain Res., 21: 125-130.
Pouget, A. and Sejnowski, T.J. (1997) A new view of hemineglect based on the response properties of parietal neurones. Philos. Trans. R. Soc. Lond. B Biol. Sci., 352: 1449-1459.
Rafal, R.D. and Posner, M.I. (1987) Deficits in human visual spatial attention following thalamic lesions. Proc. Natl. Acad. Sci. USA, 84: 7349-7353.
Robertson, I.H., Mattingley, J.B., Rorden, C. and Driver, J. (1998) Phasic alerting of neglect patients overcomes their spatial deficit in visual awareness. Nature, 395: 169-172.
Rosa, M.G. and Schmid, L.M. (1995) Magnification factors, receptive field images and point-image size in the superior colliculus of flying foxes: comparison with the primary visual cortex. Exp. Brain Res., 102: 551-556.
Rosenquist, A.C., Ciaramitaro, V.M., Dunner, J.S., Wallace, S.F. and Todd, W.E. (1996) Ibotenic acid lesions of the superior colliculus produce longer lasting deficits in visual orienting behavior than aspiration lesions in the cat. Prog. Brain Res., 112: 117-130.
Schuster, S. and Hilgetag, C. (1994) On elementary flux modes in biochemical reaction systems at steady state. J. Biol. Syst., 2: 165-182.
Singer, W., Zihl, J. and Poppel, E. (1977) Subcortical control of visual thresholds in humans: evidence for modality specific and retinotopically organized mechanisms of selective attention. Exp. Brain Res., 29: 173-190.
Sprague, J.M. (1966) Interaction of cortex and superior colliculus in mediation of visually guided behavior in the cat. Science, 153: 1544-1547.
Taylor, K. and Stein, J. (1998) Attention, intention and saliency in the posterior parietal cortex. In: J. Bower (Ed.), Comp. Neurosci. 1998 Abstr., Santa Barbara, p. 165.
Vallar, G. (1998) Spatial hemineglect in humans. Trends in Cognitive Sciences, 2: 87-97.
Vallar, G. and Perani, D. (1986) The anatomy of unilateral neglect after right-hemisphere stroke lesions. A clinical/CT-scan correlation study in man. Neuropsychologia, 24: 609-622.
Vallar, G., Perani, D., Cappa, S.F., Messa, C., Lenzi, G.L. and Fazio, F. (1988) Recovery from aphasia and neglect after subcortical stroke: neuropsychological and cerebral perfusion study. J. Neurol. Neurosurg. Psychiatry, 51: 1269-1276.
Vuilleumier, P., Hester, D., Assal, G. and Regli, F. (1996) Unilateral spatial neglect recovery after sequential strokes. Neurology, 46: 184-189.
Wallace, M.T. and Stein, B.E. (1996) Sensory organization of the superior colliculus in cat and monkey. Prog. Brain Res., 112: 301-311.
Wallace, S.F., Rosenquist, A.C. and Sprague, J.M.
(1989) Recovery from cortical blindness mediated by destruction of nontectotectal fibers in the commissure of the superior colliculus in the cat. J. Comp. Neurol., 284: 429450. Wallace, S.F., Rosenquist, A.C. and Sprague, J.M. (1990) Ibotenic acid lesions of the lateral substantia nigra restore visual orientation behavior in the hemianopic cat. J. Comp. Neurol., 296: 222-252. Weintraub, S. and Mesulam, M.M. (1987) Right cerebral dominance in spatial attention. Further evidence based on ipsilateral neglect. Arch. Neurol., 44: 62 1-625.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.)
Progress in Brain Research, Vol. 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 9
A new model of letter string encoding: simulating right neglect dyslexia
Carol Whitney and Rita Sloan Berndt*
Department of Neurology, University of Maryland School of Medicine, 22 South Greene Street, Baltimore, MD 21201, USA
*Corresponding author. rberndt@umaryland.edu
Introduction
Cognitive models of how oral reading is normally accomplished are typically developed and tested on the basis of experimental data gathered from skilled readers. Increasingly, however, information from other sources is playing an important role in the elaboration of such models. Detailed analyses of the effects of focal brain lesions on skills such as oral reading can place limits on the possible types of relationships that might exist among hypothesized processing components (see, e.g., Shallice, 1988). Of special interest are analyses of the relationship between the target words patients are attempting to read and the types of errors they produce, which may implicate breakdown of semantic, orthographic and/or phonologic processing.
A second methodology that has begun to play an important role in the elaboration of cognitive theories of reading involves the development and testing of computational models. Implemented computer simulations of oral reading have demonstrated the feasibility of such hypothesized processing details as the distinction between serial assembly of sub-lexical units for unfamiliar words and parallel access to stored orthographic units for familiar words (Coltheart et al., 1993). Frequently, neuropsychological data are used to test computational models, which should degrade under simulated ‘lesions’ in ways that reproduce clinical patterns (Coltheart et al., 1993; Plaut and Shallice, 1993).
This paper describes a theoretical model of some aspects of oral reading that was developed to be consistent with experimental data from normal subjects and with existing neurobiological theory. Computer simulations based on this model can be ‘lesioned’ to reproduce an error pattern obtained from a group of adult patients with left hemisphere lesions. The development of this model forced explicit consideration of a number of issues concerning normal word processing, especially the important issue of how letter order is encoded in words. We will describe the genesis of the model, including the motivation for modifications made during its development, to illustrate the value of computational modeling for investigating cognitive processes and their impairments.
Neglect dyslexia and positional bias in visual errors
The error pattern at issue here is one in which patients substitute orthographically related words (words with substantial overlap of the targets’ letters) for the words they are trying to read. The pattern of orthographic substitution is not random; rather, letters on the side of the target word opposite the patients’ hemispheric lesions are much more likely to be lost from the response than are letters from positions on the same side of the word as the lesion. A pattern of word substitution error in
which responses overlap targets only for the left or right parts of words has been termed ‘neglect dyslexia’. As the name implies, this reading pattern has been interpreted to be a manifestation of hemispatial neglect, affecting all objects appearing in the side of space contralateral to the patient’s lesion. Neglect dyslexia was first described in a group of six patients with right hemisphere lesions who showed generalized left-sided neglect but no language impairment other than errors substituting letters on the left sides of words (e.g. level → novel; Kinsbourne and Warrington, 1962). Many similar cases of left neglect dyslexia have been reported for patients with right hemisphere lesions, left spatial neglect, and no language impairments (see Riddoch, 1990, for review). The pattern of right neglect dyslexia following left hemisphere lesion has also been reported, though less frequently. Several early descriptions are available of patients who substituted words sharing only the left-most positions with the targets, and these patients did not demonstrate neglect in drawing and other clinical tasks (Warrington and Zangwill, 1957; Casey and Ettlinger, 1960). More recently, a pattern consistent with right neglect dyslexia has been described for patients with left hemisphere lesions, but the clearest cases have involved individuals who were naturally left-handed (Caramazza and Hillis, 1990a; Warrington, 1991). The most thoroughly studied of these cases showed frank right neglect and no aphasia (Caramazza and Hillis, 1990a). This patient substituted words overlapping targets only in early letter positions (e.g. journal → journey), and she did this regardless of whether words were presented horizontally, vertically, in mirror-reversed format or were spelled aloud to her. That is, the ‘neglect’ of the ends of words was maintained even when those end letters did not fall in the right side of space.
Based on these and other findings, Caramazza and Hillis argued that the patient suffered from an attentional impairment that operated over an internally generated spatial representation of the word. Further, they explicitly interpreted this finding as indicating that a spatial representation of written words is computed during the process of normal reading (Caramazza and Hillis, 1990b).
The hypothesis that a spatial representation of written input is an element of normal word processing is a theoretically important claim. In normal readers, a spatial representation of letter input could function to encode letter order transiently, i.e., in the very short period (<40 ms, Perfetti and Bell, 1991) before it begins to be supplanted by a phonetic code. Right neglect dyslexia is then interpreted as a form of ‘neglect’ of this spatial code, and is thus necessarily linked to other manifestations of spatial neglect (see Hillis and Caramazza, 1995, for discussion). One complication for this point of view is the occurrence of neglect-like reading errors among patients with left-hemisphere lesions and no clinical signs of spatial neglect (e.g. Warrington, 1991). Word substitutions that favor retention of early letter positions have long been described among the ‘visual’ errors of left-hemisphere damaged patients who also show a wide range of other types of language impairments, but no reported spatial neglect (Morton and Patterson, 1975; Shallice and Warrington, 1980). In fact, retention of early letter positions appears to be a relatively common finding among the errors of patients with Deep and Phonological Dyslexia (see also Buxbaum and Coslett, 1996; Greenwald and Berndt, 1999). To investigate this issue further, and to provide some actual error data that could be simulated, we carried out a retrospective analysis of an error corpus gathered from patients with focal left hemisphere lesions and consequent reading impairment.
An analysis of word substitution errors in patients with left-hemisphere lesions
Patients were selected to participate in an investigation of the underlying causes of sub-lexical reading impairment. Thus, all patients demonstrated difficulty reading nonsense words that was disproportionate to their ability to read real words. All patients were premorbidly right-handed adults who were skilled readers prior to suffering a left hemisphere cerebrovascular accident not less than three months prior to initiation of the study. The initial cohort of eleven patients is described in detail in Berndt et al. (1996). This group showed a range of aphasic impairments from severe non-fluency with agrammatism to a mild anomia. The ability to read real words (N = 349) ranged from 0.18 to 0.97 correct, while non-word reading (N = 20) ranged from 0.00 to 0.60 correct. This group of patients produced a wide range of errors when reading words, including substitution of words semantically related to targets, perseverations of previous responses, and phonetic distortions. Of primary interest for present purposes, most of the patients with difficulty reading real words produced at least some errors in which orthographically related words were substituted for targets. These so-called ‘visual’ errors have been defined as word substitutions that share at least half the letters of the target (Coltheart, 1980). Five patients of the group produced visual errors as a substantial proportion (> 0.25) of their errors, and these data were selected for further analysis (Berndt and Haendiges, 1997).
A letter position analysis was carried out on these visual errors (N = 201) by aligning the target and response words from left to right and calculating the proportion of letters from each position that appeared in both words. For each patient, the proportion of letters retained in the response at each position was calculated across all the visual errors, and was expressed as proportion of target letters retained per position. Several additional analyses were conducted to assure that this absolute scoring of position did not introduce scoring artifacts (Berndt and Haendiges, 1997). A strong positional effect favoring early letter positions was found for all patients (mean proportion of letters retained across patients from first to last position: 0.80, 0.72, 0.55, 0.32, 0.27, 0.20). Similar positional biases were found for patients’ lexicalizations of non-word targets, and a somewhat attenuated positional effect was found when all word substitution errors were scored for position (see Berndt and Haendiges, 1997).
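The left-aligned position scoring described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `retention_by_position`, the cap of six positions, and the four toy target/response pairs are hypothetical choices made here, not the scoring program or the patient corpus used in the study, and only strict absolute-position matching is shown.

```python
def retention_by_position(pairs, max_pos=6):
    """Proportion of target letters retained at each absolute position.

    pairs: iterable of (target, response) strings, aligned from the left.
    Returns a list of length max_pos; an entry is None when no target
    in the sample is long enough to have a letter at that position.
    """
    retained = [0] * max_pos
    total = [0] * max_pos
    for target, response in pairs:
        for i, letter in enumerate(target[:max_pos]):
            total[i] += 1
            # A letter counts as retained only if the response has the
            # same letter at the same left-aligned position.
            if i < len(response) and response[i] == letter:
                retained[i] += 1
    return [r / t if t else None for r, t in zip(retained, total)]


# Toy input (hypothetical sample, for illustration only).
toy_pairs = [("fact", "fat"), ("plane", "planet"),
             ("corn", "cord"), ("frog", "frost")]
proportions = retention_by_position(toy_pairs)
```

With these toy pairs the first two positions are always retained and retention drops at the third and fourth positions, which is the kind of left-to-right bias the analysis is designed to expose. A real analysis would also need the control scorings mentioned in the text, which are not reproduced here.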
Other aspects of the errors produced by these patients were consistent with previous descriptions of ‘neglect dyslexia’. First, there was a strong tendency for patients’ responses to be approximately the same length as the targets: there were no substitutions of ‘cat’ for ‘caterpillar’, for example, or vice versa. Most errors were within one letter of target length. Second, the absolute
position at which letters tended not to be retained changed with increasing word length. That is, more letters were retained in the early positions of longer words than of shorter words. Another feature of these errors, which has not previously been discussed, was that only 5% involved letter transpositions. Most errors reflected deletion of letters (e.g. fact → fat) or insertions (e.g. plane → planet) or a combination of both (e.g. corn → cord, frog → frost), rather than errors that permuted the letters of the target (e.g. note → tone). These characteristics of the patient data, in addition to the positional effects, were considered in evaluating the simulations to be described below. Although the patients who produced these errors were not tested extensively for the presence of right neglect, none of them demonstrated obvious neglect on clinical screening. In contrast, all patients showed some type of language impairment, which in several cases was quite severe. Thus, in considering possible sources for the positional effects found among this group, there was no compelling reason to assume that a degraded spatial representation of target words was a necessary component in the generation of this error pattern. Rather, it seemed appropriate to consider other ways in which letter order could be encoded during normal written word processing that could produce this pattern consequent to impairment within the language system.
Modeling and letter order encoding
Our goal was thus to develop a model of written word processing that could accommodate the finding that patients with focal left hemisphere lesions tend to substitute words that overlap the first few letters of the target. More specifically, the model’s structure, when degraded, should fail in a manner indicating that lexical access is based on a partial representation that includes information about the identity and order of words’ initial letters and about target word length.
The assumption is that the disorder responsible for this pattern (similar to ‘right neglect dyslexia’) is substantively distinct from the disorder responsible for left neglect dyslexia, which seems much more clearly linked to an impairment involving all information
processing in left extrapersonal space (see also Ellis et al., 1993). Rather than demonstrating generalized problems with information presented in right space, patients with the pattern of right neglect dyslexia tend to show problems only with language. Specifically, the patients whose data are presented here showed only one common symptom across the entire group: impaired ability to ‘sound out’ unfamiliar letter strings. It is unclear at this time whether or not this symptom is functionally related to the word substitution pattern favoring early letter retention. This possibility is discussed further below. In light of the clear language impairments demonstrated by our patient group, and the lack of clear attentional/spatial impairments, the architecture of our model was designed to be consistent with findings in the cognitive literature about normal written word processing. As reviewed below, the available data are consistent with the idea that the relative order of letters in a string is an important factor in normal subjects’ processing of written letter strings. These studies do not support the hypothesis that the representation of letter identity and order is spatial in nature, but it is less clear precisely what sort of representation they do support. The complex pattern of results related to normal letter string encoding becomes especially evident when attempting to develop an explicit computational account of the processes involved. Previously published computational models of word recognition and pronunciation have adopted one of two mechanisms for the encoding of letter order, neither of which is entirely satisfactory. Several connectionist models of word recognition (McClelland and Rumelhart, 1981) and pronunciation (Coltheart et al., 1993; Whitney et al., 1996) have employed a ‘channel-specific’ coding scheme that requires the representation of every letter in each position.
Presentation of a letter string then activates only constituent letters in the correct position. This straightforward approach to letter order ensures that later processing will be based on the correct order of letters, but it requires a high degree of item redundancy. Moreover, channel-specific coding cannot account for findings of relative position priming, where facilitation of word targets occurred when the relative order, not
the absolute position, of letters was preserved in the prime (Humphreys et al., 1990; Peressotti and Grainger, in press). Nor can it account for position-independent priming, where facilitation of alphabetic identification of trigrams occurred when both absolute position and order were violated in the primes (Peressotti and Grainger, 1995). Another approach to encoding information about order in computational models of reading is to postulate the existence of multiletter clusters, usually trigrams, as a functional unit (Mozer, 1987; Seidenberg and McClelland, 1989; Mozer and Behrmann, 1992). The use of cluster representations produces a model that is sensitive to the local context in which letters appear rather than to their specific position in the string. Consequently, cluster coding of letter input can better account for relative position priming results than can models with channel-specific representations. However, the postulation of letter clusters begs the question of how these sublexical, supraletter clusters are generated during orthographic processing. It is difficult to see how trigrams could be activated directly from orthography without any recognition of the constituent letters, and such a process is inconsistent with evidence for facilitatory priming effects on individual letters (Grainger and Jacobs, 1991). However, if contextual units are activated from representations of the constituent letters, then the question of how letter order is initially represented remains unaddressed. Based on these considerations, our goal in modeling the patient data was to construct a theoretical framework for written word recognition that was consistent with what is known about normal processing, and to implement that framework in a model that could be lesioned to simulate the errors produced by the patient group.
At the same time, we attempted to address some of these outstanding issues concerning the representation of letter order in a manner that accommodates current neurobiological thinking. The remainder of the paper is divided into four sections. First, we present our initial formulation of order encoding and its underlying rationale, including a discussion of its limitations. Next, we describe an elaborated version of the model which addresses those limitations. Then we present the results of three
simulations based on this model. Finally, we propose a potential source of the hemispheric differences that produce the symptoms of right and left neglect dyslexia.
Letter order as an activation gradient
The central tenet of our conception of the representation of letter strings is that individual letters within the string (from first to last) are not all activated to the same degree when a letter string is presented. Rather, we conceive of an activation gradient across letter nodes, with highest activation of the first letter and gradual fall-off to the end position. For example, the string ‘CAT’ would be represented by clamping the activation of C to a maximal value, A to a value less than C, and T to a value less than A. Under this scheme, there would be a single set of letter nodes, and the position of a letter within a string would be represented by its level of activation.
This general framework has some support from studies of normal string processing, especially for the idea that initial letters in a string are the most highly activated. For tachistoscopically presented strings, probability of letter recall and identification falls off from left to right (Lefton et al., 1978; Hammond and Green, 1982; Mason, 1982). Masked form priming studies have shown the initial letter to be the most facilitatory in target identification. For non-word primes having one letter in common with word targets, target naming was facilitated only when the common letter was in the initial position (Humphreys et al., 1990). For a task involving word primes and identification of target letters, identification was facilitated only when the target letter occurred in the initial position of the prime (Grainger and Jacobs, 1991). Other studies have indicated that strings differing in their initial letters are more dissimilar than strings differing in medial positions. For orthographically legal non-words constructed by transposing two adjacent letters of a word, lexical decision latencies were slowed when the transposed letters were in the middle of the word, but not when they were at the beginning (Chambers, 1979).
Interference in a word identification task occurred when a prime differed from the target in
medial positions, but not in initial positions (Perea, 1998). In string comparison studies, subjects took longer to judge that two strings were different when the mismatch occurred at the end of the string than when the mismatch was at the beginning (Ratcliff, 1981; Proctor and Healy, 1985, 1987). Taken together, these results indicate that skilled readers show preferential encoding, or enhanced processing, of letters in the initial position. As noted below, the literature on normal letter string processing contains considerable evidence that positional encoding of letters is more complex across the entire string than these findings would suggest. However, our initial attempts to simulate these data focused on the feasibility of capturing a simple left-to-right processing advantage without using either channel-specific or contextual letter units.
Activation gradient simulations (Z)
In our pilot simulations, we investigated the feasibility of representing letter positions via an activation gradient. Here we briefly summarize the structure and results of our initial simulations. The network consisted of two layers, a set of letter nodes (the input layer) and a set of word nodes (the output layer). The letter level comprised a node for each single letter (A-Z) and a node for each double letter (AA-ZZ). The word level consisted of a node for each monosyllabic word from the NETTalk corpus (Sejnowski and Rosenberg, 1987), yielding approximately 3500 word nodes. To represent an input string, the activation of the corresponding letter node was set to $0.7^{pos-1}$, where $pos$ denotes position within the string. For example, the input string ‘TREE’ was represented by setting T to 1.0, R to 0.7, and EE to 0.49. If a letter was repeated, but not as a double letter, the activation of that letter node was set to the sum of the activations for each position. The weights on the letter-to-word connections were set, rather than learned. The weight vector for each word was set to the letter activation vector corresponding to that word, where the length of that vector was normalized to 1.0. The activation of a word node was the dot product of its weight vector and the letter activation vector. The word node with
the highest activation constituted the output of the network. To lesion the network, noise was added to the activation of each word node. As discussed below, the selection of noise at the word level as a ‘lesion’ mechanism was intended to represent degraded input from impaired sublexical translation mechanisms. The pilot simulations were encouraging. Each word in the database was correctly recognized. When lesioned, the simulation produced errors that were positionally biased from left to right, and preserved target length. Thus, we were able to implement a novel string representation without a channel-specific representation of letter order, and without introducing context units. Further, a ‘word-centered’ spatial representation was not required to reproduce salient aspects of the patients’ errors.
Limitations of the pilot simulations
Although this pilot simulation provided general support for our approach, several aspects of the data suggested the need for modification. First, the simulation of the patient data failed to reproduce some elements that we interpreted to be important characteristics of patient performance. For example, the patients’ errors consisted of substitutions, omissions, and/or additions of letters, with very few transpositions. However, a majority of the error responses from the pilot simulations involved transpositions. In addition, the shape of the simulated positional effect was different from the patients’ with respect to the initial positions. The patients showed similar retention levels across the first and second positions. However, the retention level at the second position was significantly lower than at the first position in the simulations. A more general problem was that the graded activation representation was not very robust. Activation levels of anagrams (e.g. CLAPS and CLASP) were very similar to each other, and even a small amount of noise caused errors.
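The pilot network and the anagram problem can be illustrated with a short Python sketch. It follows the description given for the pilot simulations (gradient of 0.7 per position, normalized weight vectors, dot-product word activation, noise added at the word level), but several details are assumptions made here for illustration: a double letter is taken to occupy a single position, the noise is Gaussian, the lexicon is a three-word toy list rather than the NETTalk-derived set, and all function names are hypothetical.

```python
import math
import random

DECAY = 0.7  # per-position fall-off stated in the text


def letter_vector(word):
    """Return {letter node: activation}, with activation 0.7**(pos-1).

    Assumption: a double letter (e.g. EE) occupies a single position.
    """
    word = word.upper()
    vec, pos, i = {}, 0, 0
    while i < len(word):
        if i + 1 < len(word) and word[i] == word[i + 1]:
            node, i = word[i] * 2, i + 2
        else:
            node, i = word[i], i + 1
        # A letter repeated elsewhere in the string sums its activations.
        vec[node] = vec.get(node, 0.0) + DECAY ** pos
        pos += 1
    return vec


def normalize(vec):
    """Scale a vector to unit length (the fixed weight vector of a word)."""
    norm = math.sqrt(sum(a * a for a in vec.values()))
    return {k: a / norm for k, a in vec.items()}


def dot(weights, letters):
    return sum(w * letters.get(k, 0.0) for k, w in weights.items())


def recognize(string, lexicon, noise_sd=0.0, rng=None):
    """Return the most active word node; noise_sd > 0 'lesions' the net."""
    rng = rng or random.Random(0)
    letters = letter_vector(string)
    scores = {w: dot(normalize(letter_vector(w)), letters)
              + rng.gauss(0.0, noise_sd)  # assumed Gaussian word-level noise
              for w in lexicon}
    return max(scores, key=scores.get)


def similarity(a, b):
    """Cosine between the gradient codes of two strings."""
    return dot(normalize(letter_vector(a)), normalize(letter_vector(b)))
```

Under these assumptions, `similarity("CLAPS", "CLASP")` comes out above 0.99, which makes the robustness problem concrete: the two anagrams differ only in their two weakest letter activations, so a small perturbation of word-node activations is enough to swap the winner.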
In addition to these problems with the patient simulations, we were also concerned that the pilot model was unable to accommodate some complex results from the normal literature concerning the status of terminal letters. Because the final letter was the least likely to be retained in the patient
data, we assumed that the lowest level of activation occurred in the final position. A low level of activation at the terminal position is also consistent with results of string comparison reaction times reported by Proctor and Healy (1987). In one set of experiments (the ‘order’ task) normal subjects were to compare two strings of four consonants and report ‘same’ if both strings had the same letters in the same order, and ’different’ if any change was detected. Latencies to make correct ‘different’ responses showed significant effects of the position of the difference, regardless of whether the difference involved a replacement of one of the consonants or a shift of position. Changes were easier to detect at the beginnings compared to the ends of strings. However, in another experiment (the ‘item’ task), subjects were to respond ‘same’ if both strings contained the same letters, regardless of their order. Under these conditions in which subjects were explicitly NOT to attend to order, subjects were faster to detect a changed item at the end of the string than in the second position. Thus, processing of the terminal letter in the string was not always disadvantaged relative to earlier positions, but changed in these experiments as a function of task. There was also conflicting evidence reported about the activation of the final letter in the masked form priming studies of Humphreys and coworkers (1990). For non-word primes having two letters in common with word targets, target naming was most facilitated when the common letters were in the initial and final positions. Yet an analysis of errors made under the neutral prime condition (no letters in common with the target) showed that the target’s final letter was the least likely to be correctly identified. In other studies, the final letter was also the least likely to be correctly identified in briefly presented trigrams (Hellige et al., 1995) and words (Montant et al., 1998). 
In an alphabetical identification task using trigrams, false positives were the most likely to occur when the foil appeared in the third position, indicating that the final position had the lowest level of activation (Peressotti and Grainger, 1995). In contrast to these findings of low levels of activation for final letters, several other studies have indicated that the final letter has an increased
activation relative to the internal letters. For tachistoscopically presented strings, efficacy of recall and target identification increased in the final position (Lefton et al., 1978; Hammond and Green, 1982; Mason, 1982). Interference in a word identification task occurred when a prime differed from the target in medial positions, but not when it differed in the final position (Perea, 1998). It is difficult to reconcile these conflicting results. In all studies, the initial letter appeared to be the most active, with activation declining from left to right across the string. Yet evidence for the activation of the final letter was contradictory: some studies indicated that it was the least active of all the letters; others indicated that it was more highly activated than the internal letters. Obviously, our initial activation gradient model could not accommodate these findings.
A theoretical framework for letter string encoding
Although the simulations using an activation gradient across letter units reproduced some of the positional effects in the patient data, it appeared that an additional constraint on relative letter order would be required to decrease the number of transposition errors and to provide a more robust representation. Such a constraint could be achieved by the interpolation of letter cluster units, specified for order, between letter and word representations. As noted above, however, the use of cluster units in computational models of letter string processing has been relatively unmotivated. We wanted to develop a framework within which all of the hypothetical levels we proposed were based on evidence that they play a role in normal processing. Further, we thought it necessary to propose some mechanism by which such contextual units become activated. Accordingly, in this section we set out the theoretical framework for a model of letter string encoding that is motivated by findings in cognitive psychology and neurobiology. This theoretical model serves as a basis for computer simulations, which are described in the following section. In trying to understand the nature of letter activations and their possible relationship to contextual units, we conceived of a novel model of letter position coding. We postulate that a letter
string is encoded by a temporal firing pattern across letter nodes: position is represented by the precise timing of firing relative to the other letters. The induction of this firing pattern results in an activation gradient. Thus, the activation gradient does not in itself represent position, but rather comes about as a by-product of the representational process responsible for letter node activation. Such a model provides the flexibility to account for the contradictory findings, noted above, with regard to final-letter activation. Contextual units are activated by the sequential dynamics of the letter nodes, and provide a mechanism for decoding this temporal representation. We chose bigrams as the contextual unit since they are the smallest possible such unit. In addition, there is evidence from studies of normal letter string processing that bigrams indeed comprise a functional processing unit (Whitely and Walker, 1994, 1997), and arguments have even been made concerning the source of these units. For example, Whitely and Walker (1994, p. 469) suggest that bigrams may emerge as units by virtue of their frequent correspondence to higher level representations, including syllable elements and familiar morphemes. Our approach to the genesis of bigram representations differs from that of Whitely and Walker in that we postulate that bigrams are activated from processes originating at the letter level. In this theoretical model, we will use the word ‘node’ to refer to the basic computational unit, which may be thought of as corresponding to a neuronal assembly. Functionally, a node recognizes the occurrence of a symbol. Following Hopfield (1996), we assume that a pool of nodes recognizes each symbol, and that a subset of that pool responds to each occurrence of a symbol. If the same symbol appears more than once within an input string, a different subset becomes active for each occurrence.
The model consists of three levels of nodes (letters, bigrams and words), each of which has unique activation dynamics. At the letter level, nodes are specialized for creating a sequential firing pattern from graded inputs. At the bigram level, nodes are specialized to recognize temporally ordered pairs of letters. Bigram nodes convert
the temporal encoding from the letter level to a non-temporal representation. At the word level, nodes are specialized to recognize combinations of bigrams. For example, the string CART would be represented in the following way: at the letter level, C would fire, then A, then R, then T; at the bigram level, CA, AR, RT, CR, AT, and CT would become activated; at the word level, CART would be the most active of the word nodes. This structure is illustrated in Fig. 1. The following sections address each level in greater detail.

Letter level

The idea that letter position within a string is represented by the precise timing of neural spikes is
consistent with current neurobiological models of information encoding. Hopfield (1995) has proposed that quantities are represented by the explicit timing of action potentials, rather than by their ‘firing rate’. In this ‘phase-advance’ model, encoding neurons undergo internal, sub-threshold oscillations of membrane potential. The magnitude of an input to such neurons determines when threshold is exceeded. For a small input, threshold is not exceeded until late in the cycle when the cell’s oscillation brings its potential near threshold. For a larger input, threshold is exceeded earlier in the cycle. Thus, the size of an input is represented by spike timing relative to the oscillatory cycle. This scheme implies that individual

Fig. 1. Architecture of the model. At the letter level, simultaneous graded inputs and lateral inhibition create a temporal firing pattern, as indicated by the timing of firing displayed under the letter nodes. Excitatory connections link the letter nodes and the bigram nodes, which recognize ordered pairs of letters. The activations of the bigrams (shown above the nodes) are proportional to the activations of the constituent letters. Excitatory connections link the bigram nodes and the word nodes. Activation of word nodes is based on the conventional dot-product model.

spikes are much more important than has traditionally been assumed. Indeed, recent studies have shown that single spikes encode significant amounts of information (Rieke et al., 1997), and that spike timing is precise and reproducible at a millisecond time scale (Victor and Purpura, 1996; Berry et al., 1997; de Ruyter van Steveninck, 1997). Our intention is not to incorporate specific neurobiological details into our formulation of letter string encoding, but to provide an account that could ultimately be linked to such details. The framework offered here attempts to provide an interface between cognitive data concerning positional encoding and current thinking about underlying neural activity. Hopfield (1995) has proposed the phase-advance model as a general method of information encoding used by systems performing non-serial processing (e.g. in the olfactory system), as well as serial processing. Given that all other modalities of language input and output (hearing, speaking and writing) involve serial processes, it is not unreasonable to assume that a spatial letter string is converted into a temporal representation at some point. This temporal representation does not arise from a serial input process. Rather, parallel input is converted to a temporal representation at the letter level of our model. This is accomplished by graded inputs in conjunction with internal sub-threshold oscillations and lateral inhibition. The graded inputs are such that the first letter receives the most input, the second letter receives the next highest amount, and so on. The letter receiving the highest level of input fires first because it reaches threshold before the others. The letter receiving the second highest level of input fires next, etc. Suitable input levels and lateral inhibition assure that only one letter fires at a time. Our model assumes the existence of graded inputs; possibilities regarding their source are discussed below.
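
The way graded inputs translate into firing order can be illustrated with a minimal sketch. It is not the authors' simulation: the sinusoidal form of the oscillation, the threshold of 18, the amplitude of 10 and the input values are all assumptions chosen for illustration, and lateral inhibition and repeated firing are left out. The function name `first_crossing_time` is hypothetical.

```python
import math

def first_crossing_time(excitation, threshold=18.0, amplitude=10.0,
                        period_ms=200.0, steps=2000):
    """Return the first time in one cycle at which input plus oscillation
    exceeds the threshold, or None if the threshold is never exceeded."""
    for k in range(steps):
        t = k * period_ms / steps
        # Assumed oscillation: minimum at t = 0, maximum at t = period_ms / 2.
        oscillation = -amplitude * math.cos(2.0 * math.pi * t / period_ms)
        if excitation + oscillation > threshold:
            return t
    return None

# Illustrative graded inputs: the first letter receives the most input.
inputs = [15.0, 13.5, 12.0, 10.5, 9.0]
onsets = [first_crossing_time(e) for e in inputs]
for position, onset in enumerate(onsets, start=1):
    print(f"letter {position}: first reaches threshold at t = {onset:.1f} ms")
```

Under these assumptions a larger input reaches threshold earlier in the cycle, so the onsets come out in letter order, and an input that is too weak never reaches threshold at all.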
More precisely, we assume that all letter nodes undergo the same oscillation of internal state, $C(t)$, with period $g$, and that all letter nodes have the same firing threshold, $T$. We will consider activity during a single oscillatory cycle $0 < t < g$, where $t = 0$ is a time at which $C$ is at its minimum. Let $L_i$ denote a letter node, and $V_i$ denote its internal state. Input to $L_i$ is measured in terms of its effect on $V_i$. Let $E_i$ denote excitatory input to $L_i$. For simplicity, we take $E_i$ to be independent of $t$. Prior to firing, a node’s internal state is given by

$$V_i(t) = E_i + C(t) - Z_i(t) + v,$$

where $Z_i(t)$ denotes inhibitory input and $v$ is a constant. When $V_i(t) > T$, $L_i$ fires and sends inhibition to all other letter nodes. Immediately after firing, $V_i$ falls to $v$ due to internal post-firing inhibition. The rate of rebound of $V_i$ depends on $E_i$. We assume that lateral inhibition and post-firing inhibition interact in a multiplicative manner. Lateral inhibition has a small effect if the receiving node has not fired recently, and a strong effect if it has (sufficient to inhibit the node for the rest of the cycle). Thus a node can fire repeatedly until it receives lateral inhibition.

Given this framework, a string of letters, $l_1 l_2 \ldots l_n$, can be represented by the sequential firing of letter units. Here we assume that $n$ is small enough so that all letters can be represented within a single cycle. Based on experimental evidence to be discussed below, we assume that $g = 200$ ms, and that a maximum of about eight letters can be encoded within a cycle. Let $s_i$ denote the first time that $L_i$ fires, $e_i$ denote the last time, and $k_i$ denote the total number of times that $L_i$ fires (its activation). Denoting the node that recognizes $l_i$ as $L_i$, it should be the case that $s_1 < s_2 < \ldots < s_n$ and $e_i < s_{i+1}$. This constraint can be satisfied by having $E_i > E_{i+1}$ and $V_i(s_{i-1}) < T$. This ensures that the letters fire one at a time, in order. We assume that internal letters ($i > 2$ and $i < n$) attain the same minimal level of activation. In accordance with the experimental evidence for the importance of initial letters, we assume that the large inputs to $L_1$ and $L_2$ cause them to become more active than this minimal level. So $k_1 > k_2 > k_i$, for $i > 2$ and $i < n$. [...]
encodes the order of the letters within the string. As a side effect of this process, the initial letters and the final letter have a higher activation level than the internal letters. Thus, the input gradient to the letter nodes creates an activation gradient across the letter nodes. However, the activation gradient does not necessarily have the same shape as the input gradient because the activation gradient results from the interaction of the input gradient with other factors (namely, the internal states of the letter nodes, and lateral inhibition between letter nodes). The input gradient is strictly decreasing from initial to final position, while that is not necessarily the case for the activation gradient. This principle can account for the conflicting evidence regarding the apparent activation of the final letter in a string. Simulations to illustrate this effect are presented below.

Bigram and word levels

The postulation of a temporal representation of letter order poses the problem of how such a representation might be decoded by the word nodes. We see two possibilities: either word nodes directly decode the temporal patterns, or intermediate units recognize temporal sub-patterns, and word units decode combinations of these non-temporal units. Corresponding to the first option, Hopfield (1995) proposes that recognition of a temporal pattern involves learning time delays (physical connection lengths) so that all inputs arrive at the correct decoding unit simultaneously; the target unit then performs coincidence detection. Alternatively, short sequences could be recognized directly. That is, specialized nodes fire when two inputs occur in the proper order. The bigram level is based on this assumption. A bigram node becomes active if it receives suitable inputs within the time span of an oscillatory cycle. We hypothesize that neuronal assemblies exist which fire only if input A is followed by input B. Such an assembly would not fire if only A were received, or if B were received prior to A. A bigram node becomes activated even if its corresponding letters were not immediately contiguous in the string. Let $B_{ij}$ denote a bigram node that becomes activated by input from $L_i$ followed by input from $L_j$. We assume that the activation level of $B_{ij}$ is proportional to the product of $k_i$ and $k_j$, and decreases with the time interval between $e_i$ and $s_j$. The slope of this decrease is assumed to be steep initially, and then to flatten out. This means that, other things being equal, bigrams representing contiguous letters are more active than bigrams representing non-contiguous letters, while the activations of bigrams representing letters separated by one or more letters are comparable. This proposed scheme is much more robust than a time-delay architecture. Redundant information is encoded across the bigram units, whereas the time-delay scheme depends on learning precise connection lengths. At the bigram/word interface, our model follows the classical firing-rate model. That is, the activation of a word node is a function of the dot-product of its weight vector and input vector. The input vector is comprised of the activations of all bigram nodes subsequent to representing the input string across the letter nodes. The weight vector is set to the vector comprised of the bigram activations corresponding to the word; i.e. the connection weight between a bigram node and word node is set to the bigram node’s activation subsequent to representing the word across the letter nodes.
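
The bigram and word levels described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the positional letter activations, the separation factors and the five-word lexicon are hypothetical values, and letter separation stands in for the time interval between $e_i$ and $s_j$. All function names are hypothetical.

```python
from itertools import combinations

def letter_activations(n):
    """Hypothetical activations by position: strong initial letters,
    a boosted final letter, and a common minimal level for internal letters."""
    acts = [0.5] * n
    acts[0] = 1.0
    if n > 1:
        acts[1] = 0.8
    if n > 2:
        acts[-1] = 0.7
    return acts

def separation_factor(gap):
    """Assumed decay with separation: steep at first, then nearly flat."""
    return {1: 1.0, 2: 0.5}.get(gap, 0.4)

def bigram_vector(word):
    """Activation of every ordered letter pair (contiguous or not) in the word."""
    acts = letter_activations(len(word))
    vector = {}
    for i, j in combinations(range(len(word)), 2):
        bigram = word[i] + word[j]
        value = acts[i] * acts[j] * separation_factor(j - i)
        vector[bigram] = vector.get(bigram, 0.0) + value
    return vector

def word_activation(weights, inputs):
    """Dot product of a word node's weight vector and the bigram input vector."""
    return sum(w * inputs.get(bigram, 0.0) for bigram, w in weights.items())

# Weights are set, not learned: each word's weights equal its own bigram vector.
lexicon = ["cart", "cat", "car", "art", "trace"]
weights = {word: bigram_vector(word) for word in lexicon}

def recognize(string):
    """Return the word node with the highest activation for the input string."""
    inputs = bigram_vector(string)
    return max(lexicon, key=lambda word: word_activation(weights[word], inputs))

print(recognize("cart"))
```

With these illustrative values each word in the toy lexicon activates its own word node most strongly. A raw dot product does not guarantee this for an arbitrary lexicon, so the result here depends on the assumed activation values.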
The model and experimental data

Specific elements of the theoretical model proposed above are supported by data from a variety of sources. In this section we review some additional data from the literature that constrain our formulation of particular elements of how letter string information is encoded. The actual dynamics of these elements are further supported by the results of three computer simulations. Simulations at the letter level illustrate the induction of a temporal firing pattern from graded inputs, and show how the model can account for the conflicting data on the activation of the final letter. String comparison simulations show how the proposed representation at the letter level can account for the complex positional results from Proctor and Healy’s (1987) order task. Finally, returning to the data that launched this investigation, we present simulations of the patients’ errors.
Letter level

It has been proposed that oscillatory activity in the brain near 40 Hz (gamma frequencies) is related to cognitive processing (Tiitinen et al., 1993). There is evidence that individual 40 Hz waves are related to individual auditory stimuli (Joliot et al., 1994). It has been suggested that short-term memories are encoded on 40 Hz sub-cycles of a low-frequency (5 to 12 Hz) oscillation (Lisman and Idiart, 1995). This proposal is consistent with the observation of nested oscillations recorded in the human cortex in response to auditory stimuli (Llinas and Ribary, 1993). We suggest that such oscillations also underlie visual language processing. In our model, each letter position corresponds to a successive 40 Hz sub-cycle within the oscillatory period g = 200 ms. This proposal is consistent with some curious results from a study involving sequential letter presentation. In this study, the letters of 8-letter words were presented one at a time across a horizontal row (Mewhort and Beal, 1977). The interval between successive letters (ISI) was varied, and performance was measured as probability of correctly identifying the word. For ISIs of 0 ms, 50 ms, and 125 ms, performance was 98%, 70%, and 50%, respectively. However, for an ISI of 250 ms, performance rebounded to 65%, rather than continuing to fall off. Our interpretation is that the sequential presentation interfered with the normal phasic coding of letter position. Letter presentations were maximally out of phase with respect to normal at an ISI of 125 ms. Performance levels for 50 ms and 250 ms were similar, consistent with a cycle length of 200 ms. A variation of this task, where the unit of presentation was a syllable rather than a letter, provides further evidence of periodicity. Error rates increased as ISI increased from 0 to 250 ms, decreased back to the 0 ms level with further increases of ISI from 250 to 500 ms, and increased again from 500 to 625 ms.
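
The arithmetic behind these numbers can be spelled out in a short sketch. Only the 40 Hz and 200 ms figures come from the text; expressing an ISI as a position within the cycle by taking it modulo the cycle length is an assumption made here for illustration, and `phase_offset_ms` is a hypothetical helper.

```python
CYCLE_MS = 200.0     # oscillatory period g, from the text
SUBCYCLE_HZ = 40.0   # gamma sub-cycle frequency, from the text

subcycle_ms = 1000.0 / SUBCYCLE_HZ          # duration of one 40 Hz sub-cycle
letters_per_cycle = CYCLE_MS / subcycle_ms  # sub-cycles available per period

def phase_offset_ms(isi_ms, cycle_ms=CYCLE_MS):
    """Assumed reading: position of an ISI within the oscillatory cycle."""
    return isi_ms % cycle_ms

print(subcycle_ms, letters_per_cycle)
for isi in (0, 50, 125, 250):
    print(isi, phase_offset_ms(isi))
```

One 40 Hz sub-cycle lasts 25 ms, so a 200 ms period holds eight of them, which matches the assumed maximum of about eight letters per cycle. On this reading, ISIs of 50 ms and 250 ms fall at the same position in the cycle, in line with their similar performance levels.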
Temporal encoding simulations

We have performed simulations to illustrate the proposed firing pattern at the letter level. The
simulations were implemented by choosing specific values for the functions described above in the specification of the letter level. The details are given in Appendix A. The activation pattern of the nodes relative to each other can be manipulated by varying the overall level of input (the absolute size of the $E_i$). For a high input level, $k_n$ (the activation of the node representing the final letter) may be greater than $k_{n-1}$ (the activation of the node representing the penultimate letter), as described above. For a low input level, the final node may not fire at all or may fire only a minimal amount, as $s_n$ (the onset of firing for the node representing the final letter) is pushed near to $g/2$ (the time at which the internal oscillation is at its maximum). This effect is illustrated in Fig. 2. We propose that variation in overall input level is one component contributing to the contradictory evidence regarding the importance of the final letter in the normal literature, as discussed above. To review, although some studies showed increased activation of the final letter relative to string-internal positions, there were five experimental results in which the final letter seemed to be the least activated of all letter units. These findings were generated by a number of distinct experimental paradigms, including the neutral-condition masked form priming studies of Humphreys et al. (1990), central fixation trigram identification results from Hellige et al. (1995), word identification data from Montant et al. (1998), false positive responses in Peressotti and Grainger (1995), and the ‘order’ condition of Proctor and Healy (1987). The activation dynamics postulated in our model can accommodate low levels of activation for final letters under these experimental conditions. We assume that the final letter is normally more active than internal letters due to lack of lateral inhibition subsequent to firing.
However, when overall input level is reduced, the final letter is the most vulnerable. When stimulus presentation is very short, input level is reduced; the final letter becomes the least activated. Indeed, it is the case that in Humphreys et al. (1990), Hellige et al. (1995) and Montant et al. (1998), target presentation durations were very short, approximately 40 ms. In contrast, all experiments that showed a

Fig. 2. Simulated firing at the letter level (horizontal axis: time, in both graphs). Each point indicates a time at which the corresponding node spiked. Node 1 received the highest amount of input, while node 5 received the lowest. The upper graph shows the results for a high overall level of input, E = [15 13.5 12.0 10.5 9.0]. The first node fires early in the cycle (t = 0), and the last node fires more than the internal nodes (5 spikes versus 2 or 3 spikes). For a lower level of input, E = [13.5 11.5 9.5 7.5 5.5], displayed in the lower graph, the pattern is different. The first node fires later in the cycle (t = 15), and the last node fires only once, having the lowest level of activation.

privileged status for the final letter used target durations of 75 ms or greater. Thus, the data are consistent with the idea that the final letter fires until it can no longer reach threshold due to the downward phase of the oscillatory cycle. At very short presentation durations and low levels of input, the final letter fires very late in the cycle, and becomes the least active of all the letters. For longer presentations and higher levels of input, the final letter starts to fire earlier in the cycle, and fires more than the internal letters. However, there were contradictory data from within the Humphreys et al. (1990) study; the
authors found the initial/final combination to be the most facilitatory for priming of all possible two-letter combinations. How could this combination be the most facilitatory when the final letter is the least active of the letters? The key is that priming measures the after-effects of firing, not the level of firing itself. We have proposed that letters become inhibited if they receive lateral inhibition after firing. Interpreted this way, the initial/final combination is the most facilitatory because it does not involve the internal letters, which have become inhibited as a result of prime presentation. Although the initial letter has also received inhibition, its internal state is still above normal due to its high level of input. This idea is consistent with a study showing that non-word primes have inhibitory effects on letter identification within a string for medially located targets, but not for initially or terminally located targets (Grainger and Jacobs, 1991). However, in the Humphreys et al. (1990) study, the two-letter combination consisting of the internal letters was found to be facilitatory compared to the neutral condition. If, as proposed, the internal letters are inhibited after prime presentation, how does this facilitation occur? We propose that it occurs at the next level of processing, the bigram level, because activation of the internal bigram of the target is facilitated. In the other studies which implied low levels of activation for the final letter (Proctor and Healy, 1987; Peressotti and Grainger, 1995), targets were displayed until a response was given by the subject. Thus, short presentation durations cannot account for these results. Despite the extended presentation duration, the Peressotti and Grainger (1995) result can be understood as a reflection of the low level of input to the final position. In that study, 3-character strings were presented, and subjects were to determine whether or not the string consisted entirely of alphabetic characters. Foils consisted of two letters and one non-letter (e.g. A%B). When the non-letter appeared in the final position, false positives were the most likely to occur, indicating that the final character was the least well represented. We propose that a non-letter final character does not display the usual advantage that results from lack of lateral inhibition because its activation dynamics are different from those of letters. Letters are interconnected in a network that processes strings, while non-letters are not. Thus, the low level of input to the final position is shown directly when that character is not a letter.
This proposal is consistent with studies showing that strings of digits and letters are processed differently from strings of other characters. Digits, like letters, have shorter detection latencies at the ends of a string. However, strings of other types of symbols do not show this pattern; latencies are increased at the ends (Hammond and Green, 1982; Mason, 1982). Our account of the Proctor and Healy (1987) result involves a higher level of processing than the
letter level, namely that of string comparison. The fact that our account depends on the comparison process is consistent with the finding that the role of the final letter varied with the type of comparison task (‘order’ versus ‘item’) in those experiments. In the ‘order’ task subjects were to determine if two strings were comprised of the same letters in the same order. Two types of differences between strings were analyzed. For ‘replacement’ pairs, the two strings differed by the substitution of a letter at a single position, e.g., ABCD and ABXD. Response time increased monotonically with replacement position, i.e., XBCD was the fastest to be rejected, and ABCX was the slowest. Thus, no privileged status for the final letter was observed. For ‘permutation’ pairs, the two strings were comprised of the same letters in different orders, e.g., ABCD and DBCA. Response time was a complex function of total displacement of constituent letters, and of positional matches. We propose that the final letter was indeed more highly activated than the internal letters, but that this was not evident in the pattern of response times for replacement pairs due to the nature of the comparison operation between the two strings. We have implemented a simulation that illustrates this hypothesis. Moreover, the simulation also captures the pattern of response times observed for the permutation pairs.

String comparison simulations

The simulation is based on the premise that letter position is represented by the timing of firing relative to an underlying oscillatory cycle. Thus, representation of position can be considered to be a waveform over time. We model comparisons between strings as an interaction between the constituent waveforms over the entire oscillatory period. Using a trial and error process, we arrived at parameters that yielded a good fit to the experimental data. The underlying period is represented by time slots 1-120.
Firing is represented as occurring over a block of time, rather than as individual spikes. A node’s activation corresponds to duration of firing. The first letter in a string fires during the first 50 time slots, the second letter fires
during the next 25 slots, the third letter during the next 13 slots, and the fourth and final letter fires for the next 25 slots. Specifications of the waveforms and the comparison function are given in Appendix B. Figure 3 shows the results of our simulations compared with the experimental data. Note that the response times for the replacement pairs are monotonically increasing despite the fact that the fourth letter is active longer than the third letter (25 slots vs. 13 slots). This result occurs because the comparison function depends on the entire waveform, not just the activation period. Note also that the simulations reproduce the pattern of response times from the permutation pairs. The two simulations were run with the same parameters. Thus, simulations based on the premise that string comparisons can be modeled as the interaction of temporal waveforms yielded a good fit to these complex data.

Bigram and word levels

The simulations presented above lend support to our proposed mechanisms for encoding letter order in a manner that produces an activation gradient across letter units, which is then propagated to the bigram units. Next we turned our attention to the dynamics of bigram-to-word activation in an attempt to generate the correct words from letter string input and to simulate the patient data.

Activation gradient simulations (II)

These simulations were similar in structure to the pilot activation gradient simulations, except that the input layer consisted of bigram nodes, rather than letter nodes. The network thus consisted of two layers of nodes, the bigram (input) layer, and the word (output) layer. The bigram layer consisted of all possible combinations of two letters, and the word layer was the same as in the pilot simulations (approximately 3500 monosyllabic words). Activations of bigram nodes were set in accordance with the assumptions outlined above, namely that their activation is proportional to the activations of their constituent letters, and that activation is reduced with the separation of their constituent
letters. Letter activations were not modeled directly, but rather were assumed to vary with position (as simulated above). This letter activation gradient was incorporated in the specification of the activation of bigram nodes. Thus, the activation of a bigram node depended on the positions of its constituent letters within the word being represented across the input layer. We assumed that despite the separation between the initial letter and the final letter, their high levels of activation would combine to make bigram $B_{1n}$ highly activated. The details of the bigram activations are given in Appendix C. As in the pilot simulations, weights between the two levels were set, not learned. The weight vector for each word node was set to the bigram activation vector corresponding to that word. The activation of a word node was calculated as the dot product of its weight vector and the input vector. The word node with the highest activation was selected as the output of the simulation. In a test of intact performance, the network correctly identified each word in the database.

Lesion simulations

The performance of the intact model suggests that the structure we have proposed is generally feasible as a model of written word activation. However, the primary goal of the undertaking was to produce a model that could be degraded to reproduce several aspects of our patient data. Ideally, the lesion simulations could provide us with a means of testing our ideas about the processing limitations that led to the reported symptoms. As noted above, all the patients in our sample showed some difficulty reading aloud following their stroke. Although they varied widely in their ability to read words, all of them showed difficulty sounding out non-words that was disproportionate to their ability to read words.
Although our model does not incorporate a means for allowing sublexically generated phonetic input to affect activation of the word nodes, there is evidence that such input is available to normal readers very early in the course of written word processing (Perfetti and Bell, 1991; Ferrand and Grainger, 1993). For our patients, however, any information generated by sublexical

Fig. 3. Experimental and simulated results for the order task of the string comparison study. The upper graph displays the replacement pairs by position of replacement (horizontal axis: replacement position). The lower graph displays the permutation pairs. Permutations are ordered by degree of positional displacement of the constituent letters from the base string, ABCD.

print-to-sound translation would be expected to be abnormal. Based on this assumption, we ‘lesioned’ the model by adding normally-distributed noise (representing input from impaired sublexical translation) to the word nodes in the patient simulations. A second factor we considered in lesioning the model concerned the apparently low level of activation found for the final letter in the positional analysis of the patient data. As discussed above, we hypothesized that the last letter in a string is normally more highly activated than internal letters (because it does not receive lateral inhibition after firing), except in conditions of degraded input. We assumed that some degree of difficulty with the activation of letters might reproduce the effect of degraded input conditions and lead to low levels of activation for final letters. Although we have no independent evidence that these patients actually showed letter activation impairments (because we did not look for them at the time the patients were tested), it is possible that they did. Arguments have been made that subtle deficits in letter activation can be uncovered through the use of specially designed tasks, even in patients who do not show clinical signs of such impairments (e.g., Reuter-Lorenz and Brunn, 1990). Based on this idea, the activation of $B_{1n}$ was not adjusted to a high level in the lesion simulations. (Note, however, that the connection weights remained unchanged; the weights on such bigram-to-word connections retained their original high values.) This reasoning suggests that it should be possible to further test the relationship between letter activation impairments and retention of the final letter in word substitution errors through careful study of individual patients. If the assumptions underlying this lesion locus in our simulations are correct, the two symptoms should be correlated.
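
The first of these ‘lesions’ can be sketched as follows. This is a hedged illustration only: the word activations are hypothetical numbers rather than model output, the noise level of 0.5 is an arbitrary choice, and the second lesion (leaving the initial/final bigram unboosted) is not modeled. The function name `lesioned_choice` is hypothetical.

```python
import random

def lesioned_choice(activations, noise_sd, rng):
    """Add zero-mean Gaussian noise to each word node, then pick the most active."""
    noisy = {word: act + rng.gauss(0.0, noise_sd)
             for word, act in activations.items()}
    return max(noisy, key=noisy.get)

# Hypothetical word-node activations for the target string "cart".
activations = {"cart": 1.14, "cat": 0.89, "car": 0.95}

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 1000
errors = sum(lesioned_choice(activations, 0.5, rng) != "cart"
             for _ in range(trials))
print(f"{errors} substitution errors in {trials} trials")
```

With no noise the target always wins; as the noise grows, closely competing word nodes are selected more often, which is how word substitution errors arise in this kind of scheme.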
Various parameters for the distribution of noise and the positional activation values were tried in order to find those that gave the best fit to the patients’ errors. The following results are based on simulations run with noise having a mean of 0, and a standard deviation of 0.6. The resulting errors were very similar in form to those made by the patients. Errors consisted of insertions and deletions, with very few transpositions. Figure 4

Fig. 4. Retention level by position for experimental and simulated results (horizontal axis: letter position).

shows the results of the positional analysis of the experimental and simulated results. This analysis was performed in the same manner as described above for scoring the patient data, where the proportion of errors retaining the letter in each position was tallied. Note that the positional curve for the simulated results captures the overall shape of the experimental curve. Retention levels fall off slowly between positions 1 and 2, and then more sharply thereafter. As discussed above, the degree of retention by position varied with target length. Figure 5 displays retention level at position 3 for each target length.

Fig. 5. Retention level for position three by target length for experimental and simulated results (horizontal axis: target length).

Fig. 6. Average error length by target length for experimental and simulated results (horizontal axis: target length).

It is evident that the third letter is less likely to be retained in three-letter words than in longer words. The simulations reproduce this effect. It was also the case that errors tended to preserve the length of the target. These data are displayed in Fig. 6. For each target length, the average length of all the error responses was calculated. This average was close to the target length. The simulated results also show increasing response length with increasing target length, although not as strongly. This effect was very robust and held over all parameters that were tried. We think the effect is not as strong as was observed in the patients’ results due to the fact that the simulation’s word database consisted of monosyllabic words only. Thus the simulation did not have access to longer words, resulting in shorter responses for targets of length 5 and 6 than the patients gave. The simulations appear to have captured the primary characteristics of the patient data through the introduction of two ‘lesions’ in the model’s activation dynamics. Although we attempted to motivate our degradation of the model’s functioning in a principled manner, further research is needed before we can assume that these simulated ‘lesions’ represent functional cognitive deficits. Specifically, we need to investigate the ability of the model to reproduce deviations from the modal pattern of patient performance.
Discussion

We have presented a theoretical framework in which the order of letters within letter strings is encoded through the activation of bigram units from letter units. Unlike previous models employing multi-letter contextual units to encode letter order, we have attempted to account for the activation of bigram units as a consequence of the dynamics of letter activation. Letter units are activated as a temporal firing pattern across positions; this pattern is initiated by graded stimulus inputs and is modulated by lateral inhibition from subsequent letters’ activations. Elements of this scheme have been shown to accommodate a number of complex results regarding normal readers’ encoding of letter positions across a variety of tasks. The representations and processing dynamics postulated here were shown to result in successful activation of correct word nodes for monosyllabic words, and most importantly, could be degraded in a principled fashion to simulate a number of characteristics of the reading errors of patients with left hemisphere cortical lesions. Our simulation of the ‘right neglect dyslexia’ pattern of errors was accomplished without the postulation of a spatial representation of the letter strings, and without assuming that the patient data reflect ‘neglect’ of this spatial representation. Rather, the positional results (and the other aspects of the data that were simulated) are interpreted as a reflection of breakdown within processes that are specific to language, rather than to a form of attentional impairment.

One critical issue remains to be discussed. In our model, the induction of a temporal firing pattern across letter nodes relies on the postulation of a gradient across the letter input: initial letter positions provide the strongest input, second positions somewhat less input, etc. What is the source of these graded inputs?
There is evidence that graded inputs may be related to the way information is processed, or attention is allocated, by each of the cerebral hemispheres. Many different characterizations have been given to the different ‘modes of processing’ that describe the manner in which each of the two hemispheres deals with information (Heilman, 1995; Hellige,
1993). One such characterization from studies of normal letter string identification is that the left hemisphere processes information in parallel, with attention deployed across all letter positions relatively equally, while the right hemisphere uses a less efficient serial mechanism to process letters (Reuter-Lorenz and Baynes, 1992; Hellige et al., 1995). For example, normal subjects identified CVC letter strings better when they were presented tachistoscopically to the right visual field (RVF)/left hemisphere (LH) than when they were presented to the left visual field (LVF)/right hemisphere (RH) (Eng and Hellige, 1994; Hellige et al., 1995). This result supports the general superiority of the left hemisphere for the processing of linguistic stimuli. However, an important (and counterintuitive) result involved the error patterns produced for each letter position as a function of the visual field to which strings were presented. With LVF/RH presentation, subjects made many more errors that involved deletion of the last letter, relative to the first letter, of the string. With RVF/LH presentation, this finding was greatly attenuated: there were more errors on the first letter, and fewer errors on the last letter, resulting in a more even distribution of errors across the string. Interestingly, when stimuli were presented to both hemispheres in a central fixation (Hellige et al., 1995), or were presented simultaneously to both visual fields (Eng and Hellige, 1994), the error pattern that emerged was the same as the right hemisphere pattern. This finding suggests that, even though the LH was the most effective at performing the task, the RH’s mode of processing dominated when the stimuli were presented to both hemispheres simultaneously. The effect produced is that with central fixation, stimuli are processed in a manner that results in superior processing of the first letter, declining across positions.
Although this advantage for early letter positions appears to apply to normal (central fixation) reading, at least for languages that are read from left to right (Chokron and Imbert, 1993; Chokron and De Agostini, 1995), its precise source is not entirely clear. In addition to the interpretation offered by Hellige et al. (1995), the results could reflect the manner in which the right hemisphere allocates attention across letter positions. Whatever
its source, in our model the effect of these graded inputs is carried forward (for bigram and word units) such that even as the processing levels become increasingly abstract (i.e. farther removed from the visual input), the advantage for early letter positions is maintained. As shown in our initial simulations using an activation gradient across letter nodes, the entire constellation of results cannot be successfully reproduced through the simple postulation of an activation gradient. The model provides details about the effects such graded inputs could have on subsequent levels of processing. Our model, and the ‘lesion’ simulations that were carried out, offer an alternative to the idea that the pattern of errors found in ‘right neglect dyslexia’ indicates an attentional deficit across a spatial representation of letter strings. The situation that emerges when the right hemisphere is damaged, and the initial positions in letter strings are subject to disruption when reading (left neglect dyslexia), clearly requires a qualitatively different explanation. In the case of right hemisphere damage, the left-to-right mode of processing that emanates from the right hemisphere is presumably disrupted, along with other consequences that may affect all spatial processing to the left of midline. When the right hemisphere is damaged, the reading errors that preferentially degrade left-most letter positions might reasonably be regarded as a form of left neglect. In contrast, the conclusion we are compelled to reach is that the pattern of error at issue here - word substitutions with degraded right-most letter positions - should not be regarded as a type of neglect but as a form of language impairment.
Appendix A

Temporal encoding simulations

Entity      Definition
$L_i$       ith letter node
$t$         time
$C(t)$      internal oscillation
$g$         oscillatory period
$T$         firing threshold
$V_i(t)$    internal state
$E_i$       excitatory input
$I_i(t)$    inhibitory input
$v$         constant
The functions and variables defined in the model are summarized above. To review, prior to firing, $V_i(t) = E_i + C(t) + I_i(t) + v$, where $t$ ranges from 0 to $g$. When $V_i(t) \geq T$, $L_i$ fires and sends lateral inhibition to all other letter nodes. After firing, $V_i$ falls to $v$ and then rebounds, where the rate of rebound depends on $E_i$. The simulations were performed using the following values: $T = -50$, $g = 200$, and $v = -65$. The values $E_i$ constituted the inputs to the simulations. For simplicity, $C$ was taken to be piecewise linear, with $C(t) = 0.1\,t$ for $t \leq g/2$, and $C(t) = C(g/2) - C(t - g/2)$ for $t > g/2$. If a node has fired, $V_i(t) = V_i(t-1) + T/(T - E_i + 1)$ up to a maximal value of $v + C(t) + E_i$. For simplicity, the effect of lateral inhibition on a node was considered to be negligible, unless the node had already fired, in which case the node was inhibited for the rest of the cycle.
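The pre-firing dynamics described in this appendix can be sketched in a few lines of Python. This is only an illustrative reading of the text, not the authors' code: the function names and the example input values are hypothetical, the inhibitory term $I_i(t)$ is taken as zero before firing (following the statement that lateral inhibition was treated as negligible), and the post-firing rebound is left out.

```python
# Sketch of the pre-firing dynamics of Appendix A (assumed reading, not the
# authors' code). Parameter values T, g and v are those given in the text.
T = -50.0        # firing threshold
G = 200          # oscillatory period g
V_CONST = -65.0  # constant v


def oscillation(t):
    """Piecewise-linear internal oscillation C(t) as defined in the text."""
    half = G / 2
    if t <= half:
        return 0.1 * t
    return oscillation(half) - oscillation(t - half)


def firing_times(inputs):
    """Return the first time step at which each letter node reaches threshold.

    `inputs` holds the excitatory inputs E_i (hypothetical example values).
    I_i(t) is assumed to be 0 before firing. None means the node never fires.
    """
    times = [None] * len(inputs)
    for t in range(G + 1):
        for i, e in enumerate(inputs):
            if times[i] is None and e + oscillation(t) + V_CONST >= T:
                times[i] = t
    return times


if __name__ == "__main__":
    # A graded input: stronger input for earlier letter positions.
    print(firing_times([14, 12, 10, 8, 4]))
```

With a graded input such as the one in the example, nodes with stronger input cross threshold earlier in the cycle, and a node whose input is too weak to reach threshold at the peak of the oscillation never fires.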
Appendix B

String comparison simulations

Using a trial and error process, we arrived at parameters that yielded a good fit to the experimental data. A waveform encodes position within a string, and is comprised of time slots 1 to 120. The representation of a string consists of four letters and their positional waveforms. A value is associated with each time slot in a waveform. For simplicity, values are abstract, and limited to three values: 10 while firing, $-1$ prior to firing, and 4 after firing. The first letter in a string fires during the first 50 slots, the second letter fires during the next 25 slots, the third letter during the next 13 slots, and the fourth and final letter fires during the next 25 slots. We denote the waveform representing position $i$ as $W_i$, and we denote the value of the $k$th time slot of $W_i$ as $P(W_i, k)$. For example, for the waveform representing the second position, $P(W_2, 1)$ to $P(W_2, 50) = -1$, $P(W_2, 51)$ to $P(W_2, 75) = 10$, and $P(W_2, 76)$ to $P(W_2, 120) = 4$. If a letter does not fire, its positional waveform, $W_0$, has value $-1$ for all time slots.

We assume that a mechanism exists for comparing firing patterns (waveforms). Briefly, the similarity of two strings is evaluated by comparing the waveforms for each letter, calculating a total raw score, and comparing the raw score to the maximal possible score. More specifically, we define a comparison function, $C(W_i, W_j)$, which takes two positional waveforms and yields a comparison waveform, $W_{ij}$. $C$ is defined as follows: for each time slot $t$, if $P(W_i, t) \cdot P(W_j, t) > 0$, $P(W_{ij}, t) = (P(W_i, t) + P(W_j, t))/2$; otherwise, $P(W_{ij}, t) = P(W_i, t) - 6$. We define $L(S_1, S_2, i)$ as the position in $S_2$ of the $i$th letter of $S_1$; if the letter does not appear in $S_2$, $L = 0$.

To compare two strings, the comparison waveforms for each letter are calculated. That is, given strings $S_1$ and $S_2$, for each position $i$, we calculate $C(W_i, W_{L(S_1, S_2, i)})$. For example, to compare the strings $S_1$ = ABCD and $S_2$ = AXBC, the following positional waveforms are compared: $W_1$ and $W_1$ (for A), $W_2$ and $W_3$ (for B), $W_3$ and $W_4$ (for C), and $W_4$ and $W_0$ (for D). The raw score, $R(S_1, S_2)$, is calculated by summing over all time slots over all four comparison waveforms. The maximal raw score, $M$, is set to $R(S_1, S_1)$. The simulated response time, $T(S_1, S_2)$, is calculated by comparing $R(S_1, S_2)$ to $M$ and scaling it: $T(S_1, S_2) = 800\ \text{ms} \times (R(S_1, S_2)/M)^{[\ldots]} + 720\ \text{ms}$ [exponent illegible in the source].
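The comparison procedure above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, slots are 0-indexed internally, the "otherwise" branch is read as subtracting 6 from the first waveform's value, strings are assumed to contain no repeated letters, and the response-time scaling is omitted because its exponent cannot be recovered from the text.

```python
# Sketch of the string comparison of Appendix B (assumed reading of the text).
N_SLOTS = 120
DURATIONS = [50, 25, 13, 25]  # firing durations for positions 1..4, as stated


def waveform(pos):
    """Positional waveform W_pos; pos = 0 means the letter never fires."""
    if pos == 0:
        return [-1] * N_SLOTS
    start = sum(DURATIONS[:pos - 1])
    end = start + DURATIONS[pos - 1]
    # -1 before firing, 10 while firing, 4 after firing
    return [-1 if k < start else 10 if k < end else 4 for k in range(N_SLOTS)]


def compare(wi, wj):
    """Comparison waveform W_ij: average where signs agree, else P(W_i) - 6."""
    return [(a + b) / 2 if a * b > 0 else a - 6 for a, b in zip(wi, wj)]


def position_in(s1, s2, i):
    """L(S1, S2, i): 1-based position in s2 of the ith letter of s1, or 0."""
    return s2.find(s1[i - 1]) + 1


def raw_score(s1, s2):
    """R(S1, S2): sum over all slots of all comparison waveforms."""
    return sum(
        sum(compare(waveform(i), waveform(position_in(s1, s2, i))))
        for i in range(1, len(s1) + 1)
    )


if __name__ == "__main__":
    m = raw_score("ABCD", "ABCD")          # maximal raw score M
    print(raw_score("ABCD", "AXBC") / m)   # similarity relative to M
```

The ratio printed at the end is the quantity $R(S_1, S_2)/M$ that the text feeds into the response-time formula.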
Appendix C

Activation gradient simulations

To represent the input string, activations of bigrams comprised of contiguous letters in the input string were set to $0.6^{pos-1}$, where $pos$ denotes the position of the initial letter of the bigram. For positions greater than 3, $pos$ was set to 3. Thus, these activation levels were either 1.0, 0.6 or 0.36. The activations of non-contiguous bigrams were set to $0.6^{pos}$. If a bigram occurred more than once, its activation was set to the sum of the activations for each occurrence. We assumed that despite the separation between the initial letter and the final letter, their high levels of activation would combine to make bigram $B_{1n}$ highly activated. We also assumed that the activation of the final letter depends on the length of the input string, because for longer input strings, the final letter starts to fire later in the cycle and thus achieves a lower level of activation. To approximate these assumptions, the activation of $B_{1n}$ was adjusted to $1.0 - 0.01n$, with $n$ the length of the input string. For example, to represent the input string FLANK, FL = 1.0, LA = 0.6, AN = 0.36, NK = 0.36, FA = 0.6, LN = 0.36, AK = 0.22, FN = 0.6, LK = 0.36, FK = 0.95, and all other bigrams were set to 0. Two-letter and one-letter words were handled as special cases.
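The activation rules above can be checked against the FLANK example with a short Python sketch. The function name is hypothetical, and two assumptions are made that the text does not spell out: the first-last adjustment is taken to be $1.0 - 0.01 \times$ (string length), which matches FK = 0.95, and it is applied only to strings of three or more letters, since shorter words are said to be special cases.

```python
# Sketch of the bigram activation gradient of Appendix C (assumed reading).
from collections import defaultdict


def bigram_activations(word):
    """Map each ordered letter pair of `word` to its input activation."""
    n = len(word)
    acts = defaultdict(float)
    for i in range(n):
        for j in range(i + 1, n):
            pos = min(i + 1, 3)              # positions above 3 are capped at 3
            if n > 2 and i == 0 and j == n - 1:
                a = 1.0 - 0.01 * n           # first-last bigram (assumed form)
            elif j == i + 1:
                a = 0.6 ** (pos - 1)         # contiguous letters
            else:
                a = 0.6 ** pos               # non-contiguous letters
            acts[word[i] + word[j]] += a     # repeated bigrams are summed
    return dict(acts)


if __name__ == "__main__":
    for bigram, a in bigram_activations("FLANK").items():
        print(bigram, round(a, 2))
```

Rounded to two decimals, the values for FLANK agree with those listed in the text; all bigrams not returned by the function are implicitly 0.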
Acknowledgements

The studies reported here were supported by grant number R01-DC00699 from the National Institute on Deafness and Other Communication Disorders to the University of Maryland School of Medicine. The authors are grateful to James A. Reggia for helpful comments, and to Anne N. Haendiges for assistance with the patient data.
References

Berndt, R.S., Haendiges, A.N., Mitchum, C.C. and Wayland, S.C. (1996) An investigation of nonlexical reading impairments. Cogn. Neuropsychol., 13: 763-801.
Berndt, R.S. and Haendiges, A.N. (1997) Positional effects in dyslexic ‘visual errors’: constraints on the interpretation of word substitutions. Brain Lang., 60: 112-115.
Berry, M.J., Warland, D.K. and Meister, M. (1997) The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci., USA, 94: 5411-5416.
Buxbaum, L.J. and Coslett, H.B. (1996) Deep dyslexic phenomena in a letter-by-letter reader. Brain Lang., 54: 136-167.
Caramazza, A. and Hillis, A.E. (1990a) Levels of representation, co-ordinate frames, and unilateral neglect. Cogn. Neuropsychol., 7: 391-445.
Caramazza, A. and Hillis, A.E. (1990b) Spatial representation of words in the brain implied by the studies of a unilateral neglect patient. Nature, 346: 267-269.
Casey, T. and Ettlinger, G. (1960) The occasional ‘independence’ of dyslexia and dysgraphia from dysphasia. J. Neurol. Neurosurg. Psychiat., 23: 228-236.
Chambers, S.M. (1979) Letter and order information in lexical access. J. Verb. Learn. Verb. Behav., 18: 225-241.
Chokron, S. and De Agostini, M. (1995) Influence of reading habits on line bisection: a developmental approach. Cogn. Brain Res., 3: 51-58.
Chokron, S. and Imbert, M. (1993) Reading habits and line bisection. Cogn. Brain Res., 1: 219-222.
Coltheart, M. (1980) Deep dyslexia: a review of the syndrome. In: M. Coltheart, K. Patterson and J.C. Marshall (Eds.), Deep Dyslexia. Routledge and Kegan Paul, London, pp. 22-47.
Coltheart, M., Curtis, B., Atkins, P. and Haller, M. (1993) Models of reading aloud: Dual-route and parallel-distributed-processing approaches. Psychol. Rev., 100: 589-608.
De Ruyter van Steveninck, R.R., Lewen, G.D., Strong, S.P., Koberle, R. and Bialek, W. (1997) Reproducibility and variability in neural spike trains. Science, 275: 1805-1808.
Ellis, A.W., Young, A.W. and Flude, B.M. (1993) Neglect and visual language. In: I.H. Robertson and J.C. Marshall (Eds.), Unilateral Neglect: Clinical and Experimental Studies. Lawrence Erlbaum, Hillsdale, NJ, pp. 233-256.
Eng, T.L. and Hellige, J.B. (1994) Hemispheric asymmetry for processing unpronounceable and pronounceable letter trigrams. Brain Lang., 46: 517-535.
Ferrand, L. and Grainger, J. (1993) The time course of orthographic and phonological code activation in the early phases of visual word recognition. Bull. Psychon. Soc., 31: 119-122.
Grainger, J. and Jacobs, A.M. (1991) Masked constituent letter priming in an alphabetic decision task. Eur. J. Cogn. Psychol., 3: 413-434.
Greenwald, M.L. and Berndt, R.S. (1998) Impaired encoding of abstract letter order: severe alexia in a mildly aphasic patient. Cogn. Neuropsychol., in press.
Hammond, E.J. and Green, D.W. (1982) Detecting targets in letter and non-letter arrays. Can. J. Psychol., 36: 67-82.
Heilman, K.M. (1995) Attentional asymmetries. In: R.J. Davidson and K. Hugdahl (Eds.), Brain Asymmetry. MIT Press, Cambridge, MA, pp. 217-234.
Hellige, J.B. (1993) Hemispheric Asymmetry: What’s Right and What’s Left. Harvard University Press, Cambridge, MA.
Hellige, J.B., Cowin, E.L. and Eng, T.L. (1995) Recognition of CVC syllables from LVF, RVF, and central locations: hemispheric differences and interhemispheric interaction. J. Cogn. Neurol., 7: 258-266.
Hillis, A.E. and Caramazza, A. (1995) A framework for interpreting distinct patterns of hemispatial neglect. Neurocase, 1: 189-207.
Hopfield, J.J. (1995) Pattern recognition computation using action potential timing for stimulus representation. Nature, 376: 33-36.
Hopfield, J.J. (1996) Transforming neural computations and representing time. Proc. Natl. Acad. Sci., USA, 93: 15440-15444.
Humphreys, G.W., Evett, L.J. and Quinlan, P.T. (1990) Orthographic processing in visual word identification. Cogn. Psychol., 22: 517-560.
Joliot, M., Ribary, U. and Llinas, R. (1994) Human oscillatory brain activity near 40 Hz coexists with cognitive temporal binding. Proc. Natl. Acad. Sci., USA, 91: 11748-11751.
Kinsbourne, M. and Warrington, E.K. (1962) A variety of reading disability associated with right hemisphere lesions. J. Neurol. Neurosurg. Psychiat., 25: 339-344.
Lefton, L.A., Fisher, D.F. and Kuhn, D.M. (1978) Left-to-right processing of alphabetic material is independent of retinal location. Bull. Psychon. Soc., 112: 171-174.
Lisman, J.E. and Idiart, M.A.P. (1995) Storage of 7 ± 2 short-term memories in oscillatory subcycles. Science, 267: 1512-1515.
Llinas, R. and Ribary, U. (1993) Coherent 40-Hz oscillation characterizes dream state in humans. Proc. Natl. Acad. Sci., USA, 90: 2078-2081.
Mason, M. (1982) Recognition time for letters and nonletters: effects of serial position, array size, and processing order. J. Exp. Psychol., 8: 724-738.
McClelland, J.L. and Rumelhart, D.E. (1981) An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychol. Rev., 88: 375-407.
Mewhort, D.J.K. and Beal, A.L. (1977) Mechanisms of word identification. J. Exp. Psychol., 3: 629-640.
Montant, M., Nazir, T.A. and Poncet, M. (1998) Pure alexia and the viewing position effect in printed words. Cogn. Neuropsychol., 15: 93-140.
Morton, J. and Patterson, K. (1980) A new attempt at an interpretation, or, an attempt at a new interpretation. In: M. Coltheart, K. Patterson and J.C. Marshall (Eds.), Deep Dyslexia. Routledge and Kegan Paul, London, pp. 91-118.
Mozer, M.C. (1987) Early parallel processing in reading: a connectionist approach. In: M. Coltheart (Ed.), Attention and Performance XII: The Psychology of Reading. Lawrence Erlbaum, London, pp. 83-104.
Mozer, M.C. and Behrmann, M. (1992) On the interaction of selective attention and lexical knowledge: a connectionist account of neglect dyslexia. J. Cog. Neurosci., 2: 96-123.
Perea, M. (1998) Orthographic neighbors are not all equal: evidence using an identification technique. Language and Cognitive Processes, 13: 77-90.
Peressotti, F. and Grainger, J. (1995) Letter-position coding in random consonant arrays. Perception and Psychophysics, 57: 875-890.
Peressotti, F. and Grainger, J. The role of letter identity and letter position in orthographic priming. Perception and Psychophysics, in press.
Perfetti, C.A. and Bell, L. (1991) Phonemic activation during the first 40 ms of word identification: evidence from backward masking and priming. J. Mem. Lang., 30: 473-485.
Plaut, D.C. and Shallice, T. (1993) Deep dyslexia: a case study of connectionist neuropsychology. Cogn. Neuropsychol., 10: 377-500.
Proctor, R.W. and Healy, A.F. (1985) Order-relevant and order-irrelevant decision rules in multiletter matching. J. Exp. Psychol.: LMC, 11: 519-537.
Proctor, R.W. and Healy, A.F. (1987) Task-specific serial position effects in comparisons of multiletter strings. Perception and Psychophysics, 42: 180-194.
Ratcliff, R. (1981) A theory of order relations in perceptual matching. Psychol. Rev., 88: 552-572.
Reuter-Lorenz, P.A. and Baynes, K. (1992) Modes of lexical access in the callosotomized brain. J. Cogn. Neur., 4: 155-164.
Reuter-Lorenz, P.A. and Brunn, J.L. (1990) A prelexical basis for letter-by-letter reading: a case study. Cogn. Neuropsychol., 7: 1-20.
Riddoch, J. (1990) Neglect and the peripheral dyslexias. Cogn. Neuropsychol., 7: 369-389.
Rieke, F., Warland, D., De Ruyter van Steveninck, R. and Bialek, W. (1997) Spikes: Exploring the Neural Code. MIT Press, Cambridge.
Seidenberg, M.S. and McClelland, J.L. (1989) A distributed developmental model of word recognition and naming. Psychol. Rev., 96: 523-568.
Sejnowski, T.J. and Rosenberg, C.R. (1987) Parallel networks that learn to pronounce English text. Complex Systems, 1: 145-168.
Shallice, T. and Warrington, E.K. (1975) Word recognition in a phonemic dyslexic patient. Quarterly J. Exp. Psychol., 27: 187-199.
Tiitinen, H., Sinkkonen, J., Rainikainen, K., Alho, K., Lavikainen, J. and Naatanen, R. (1993) Selective attention enhances the 40-Hz response in humans. Nature, 364: 59-60.
Victor, J.D. and Purpura, K.P. (1996) Nature and precision of temporal coding in visual cortex: a metric-space analysis. J. Neurophysiol., 76: 1310-1326.
Warrington, E.K. and Zangwill, O.L. (1957) A study of dyslexia. J. Neurol. Neurosurg. Psychiat., 20: 208-215.
Warrington, E.K. (1991) Right neglect dyslexia: a single case study. Cogn. Neuropsychol., 8: 193-212.
Whitely, H.E. and Walker, P. (1994) The activation of multiletter units in visual word recognition. Vis. Cogn., 1: 433-473.
Whitely, H.E. and Walker, P. (1997) Multiletter units in visual word recognition: direct activation by supraletter features. Vis. Cogn., 4: 69-110.
Whitney, C.S., Berndt, R.S. and Reggia, J.A. (1996) Simulation of neurogenic reading disorders with a dual-route connectionist model. In: J.A. Reggia, E. Ruppin and R.S. Berndt (Eds.), Neural Modeling of Brain and Cognitive Disorders. World Scientific, Singapore, pp. 201-228.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.)
Progress in Brain Research, Vol 121
© 1999 Elsevier Science BV. All rights reserved.
CHAPTER 10

Prosopagnosia in modular neural network models

Matthew N. Dailey* and Garrison W. Cottrell
Department of Computer Science and Engineering, U.C. San Diego, La Jolla, CA 92093-0114, USA
*Corresponding author. e-mail: {mdailey,gary}@cs.ucsd.edu

Introduction

There is strong evidence that face processing in the brain is localized. The double dissociation between prosopagnosia, a face recognition deficit occurring after brain damage, and visual object agnosia, difficulty recognizing other kinds of complex objects, indicates that face and non-face object recognition may be served by partially independent neural mechanisms. In this chapter, we use computational models to show how the face processing specialization apparently underlying prosopagnosia and visual object agnosia could be attributed to: (1) a relatively simple competitive selection mechanism that, during development, devotes neural resources to the tasks they are best at performing; (2) the developing infant’s need to perform subordinate classification (identification) of faces early on; and (3) the infant’s low visual acuity at birth.

Prosopagnosia and visual object agnosia together form a classic neuropsychological double dissociation. This might be taken as evidence for a domain-specific face processing mechanism in the brain that is distinct from the mechanisms serving general object recognition. However, two issues have led to a long-running debate on this view: (1) it is not entirely clear how specific or independent prosopagnosia and visual object agnosia are, and (2) double dissociations do not necessarily implicate separate, domain-specific mechanisms. In this section, we first briefly review the data on prosopagnosia and visual object agnosia; these data support the view that the mechanisms underlying facial identity recognition are at least somewhat different from those underlying most other object recognition tasks. We then review the theories attempting to explain the seemingly remarkable dissociation and motivate the current computational modeling studies.

Is prosopagnosia really specific to faces?

Prosopagnosia is almost always accompanied by other visual impairments, so it is difficult to determine the extent to which a prosopagnosic’s deficit is limited to face processing. We limit discussion here to so-called ‘associative’ prosopagnosics, who perform normally on face matching and face detection tasks but cannot recognize the identity of familiar faces (De Renzi, 1986). This condition is usually associated with either unilateral right hemisphere or bilateral lesions in the fusiform gyrus area. For reviews that include lesion locations, see Farah (1990) and De Renzi et al. (1994). Although many prosopagnosics have trouble with difficult subordinate (within-class) classification tasks involving objects other than faces, in some cases the condition can be remarkably face-specific. De Renzi’s (1986) ‘case 4’ was profoundly prosopagnosic but claimed to have no trouble with day-to-day within-class discrimination of common objects such as keys and automobiles. However, there have been objections that perhaps this patient
was not tested extensively enough to determine whether his deficit was truly face specific. McNeil and Warrington (1993) report that W.J., a patient with severe prosopagnosia but apparently normal recognition of famous buildings, dog breeds, car makes, and flower species, had acquired a flock of sheep and learned to recognize the individuals from their markings. In a test with unfamiliar sheep of a breed unfamiliar to W.J., a control group performed significantly better on recognition of human faces than of the sheep faces, indicating the advantages humans normally have in identifying human faces. But W.J. performed significantly better on the sheep face task than on the human face task. The unfamiliar sheep face recognition task was in many ways as difficult in terms of complexity and confusability as face recognition, yet W.J. performed well. Martha Farah and her colleagues have performed two important experiments providing further evidence that face processing can be impaired with little impact on within-category discrimination of objects. In the first, they constructed a within-class discrimination task involving faces and visually similar eyeglasses (Farah et al., 1995a). Normal subjects were significantly better at discriminating the faces than the eyeglasses, but the prosopagnosic patient L.H. did not show this effect. His face discrimination performance was significantly lower than that of the control group, but his eyeglass discrimination performance was comparable to that of the controls. In the other experiment, the researchers compared L.H.’s performance in recognizing inverted faces to that of normals (Farah et al., 1995b). The surprising result was that whereas normal subjects were significantly better at recognizing upright faces than inverted ones, L.H. performed normally on the inverted faces but was actually worse at recognizing the upright faces than the inverted ones. 
These studies have shown that prosopagnosia can be quite specific to normal, upright faces. On the other hand, studies of several patients have shown that visual object recognition can be severely impaired while face recognition is spared. Associative visual agnosia sparing face recognition is normally associated with left occipital or bilateral occipital lesions and usually coincides with
alexia, in which patients have difficulty reading because they cannot rapidly piece letters into words (Farah, 1990). It seems to reflect an impairment in decomposing complex objects into parts (Feinberg et al., 1994). Although it is difficult to assess exactly what is impaired and what is preserved (researchers obviously cannot test patients on all objects), Farah’s (1990) review cites many such cases. Perhaps the most dramatically impaired and well-known visual agnosic without prosopagnosia is C.K. (Behrmann et al., 1994). This patient has a striking deficit in part integration; he can identify the component parts of objects but cannot put them together to recognize the whole. His face processing abilities, however, are largely spared, to the point that he can see faces in ‘composite’ images where a face is composed of or hidden amongst other objects, but cannot see the objects themselves. Moscovitch et al. (1997) show in a series of experiments that C.K.’s ability to recognize (upright) famous faces, family resemblances, caricatures, and cartoons is completely normal, as is his ability to match unfamiliar faces. On the other hand, he is impaired at tasks involving inverted faces, which presumably activate his damaged ‘object processing’ mechanisms. These complementary patterns of brain damage constitute a double dissociation between face and object recognition and provide evidence that the visual system contains elements specialized for (or merely very useful for) face processing. However, double dissociations certainly do not imply that two tasks are served by entirely separate and distinct ‘modules’. As Plaut (1995) points out, two seemingly independent tasks might not be independent at all, but simply rely more heavily or less heavily on particular mechanisms. In the worst case, the apparent distinction between face and object processing could simply reflect the expected outliers in a random distribution of patterns of brain damage (Juola and Plunkett, 1998). 
However, there are independent reasons, other than the patterns of brain damage, to believe that prosopagnosia reflects damage to a system that is specialized for face processing (and possibly certain other types of stimuli): we next review the behavioral distinctions between face processing and general object processing.
How does face recognition differ from general object recognition?

Since the neuropsychological data indicate that there is something special about faces, perhaps the most debated issue in the field is whether there is an innate, mandatory, domain-specific module (Fodor, 1983) for face processing. Moscovitch et al. (1997), for instance, give a convincing argument for modularity based on their experiments with C.K., the object agnosic. At the same time, many other researchers have attempted to find a more parsimonious explanation for the face/object double dissociation that places face recognition at some extreme end of a continuum of mechanisms. There are many ways in which prosopagnosia could reflect damage to a general-purpose object recognition system yet appear to be face specific. One early explanation was that face recognition is simply more difficult than other types of recognition, so mild damage to a general-purpose recognition system could affect face recognition more than non-face object recognition (Damasio et al., 1982; Humphreys and Riddoch, 1987). But this hypothesis is ruled out by the fact that many visual object agnosic patients have impaired recognition of common objects but spared face recognition (see the previous section). Currently, there are at least two related classes of plausible theories attempting to characterize the differences between face and object processing. The first posits that faces are perceived and represented in memory ‘holistically’ with little or no decomposition into parts, whereas most other object classes require part-based representations. Farah et al. (1998) review the literature on this topic and provide new evidence for holistic perceptual representations. Biederman and Kalocsai (1997) propose a computational basis for such representations.
They show that the outputs of an array of overlapping local spatial filters similar to some of the receptive fields in visual cortex, as used in Wiskott et al.’s (1997) face recognition system, can account for human performance in experiments using face stimuli but cannot account for human performance in other object recognition tasks. Clearly, such simple representations would be holistic at least in the sense that there is no explicit
encoding of the parts of the face independent of the whole face. Theories in the second, related class state that the main reason for the special status of face recognition is that it involves expert-level subordinate classification within a relatively homogeneous object class. In this view, faces are only special in that they are very similar to each other, and we must acquire a great deal of sensitivity to configural differences between them. Tanaka and Sengco (1997) have shown that subtle configuration information, such as the distance between the eyes in a face, plays a crucial role in face processing but not in processing other object types. But it appears that the acquisition of expertise in subordinate classification of a novel synthetic object class, ‘greebles’, leads to a similar sensitivity to configuration information (Gauthier and Tarr, 1997). Gauthier et al. (1998) have also observed in fMRI studies that expert-level greeble classification activates an area in fusiform gyrus thought by some to be specialized for faces (Sergent et al., 1992; McCarthy et al., 1997). Thus the main observable differences between face processing and general object processing involve holistic representations and our level of expertise with subordinate-level classification of faces. In this chapter, we propose a theoretical model that explains how such specialized representations and mechanisms might develop, and describe preliminary computational modeling experiments that support the theory. The next section outlines some of the important data on the development of face recognition in infants, which we will use to inform the construction of our computational models.

Developmental data and a possible low spatial frequency bias

In the previous sections, we have outlined evidence from neuropsychology and adult behavior that faces (and possibly other similar classes of stimuli) are processed by specialized mechanisms.
Experiments exploring the development of face recognition abilities in human infants have also provided important clues to the organization of the putative face processing system and how that organization arises.
Experiments have shown that at birth, an infant’s visual attention is spontaneously directed toward face-like stimuli (Johnson et al., 1991). An infant can visually discriminate between his or her mother’s face and a stranger’s face, but only external features such as hairline and head contours are salient to the task at an early age (Pascalis et al., 1995). Later, around the age of six to eight weeks, infants begin to use the face’s internal features to discriminate their mothers from strangers (de Schonen et al., 1998). A possibly related developmental factor is that newborn infants’ acuity and contrast sensitivity are such that they can only detect large, high-contrast stimuli; at one month of age, infants are typically insensitive to spatial frequencies greater than two cycles/degree (Teller et al., 1986). It is not clear whether the infant’s shift to the use of internal features for distinguishing his or her mother from strangers represents the use of an entirely new system or a gradual refinement of the old system (Johnson, 1997). But the experimental data on the development of face recognition capabilities make it seem likely that the infant visual system begins training a cortical ‘face processor’ utilizing external facial features very early on. At the same time, these capabilities must develop on the basis of extremely low resolution stimuli. de Schonen and Mancini (1998) propose a scenario accounting for some of the known data. The scenario holds that several factors, including different rates of maturation in different areas of cortex, the infant’s tendency to track faces, and the infant’s initially low acuity, all conspire to force an early specialization for face recognition in the right hemisphere. This specialized mechanism would necessarily be based on a ‘configurational’ as opposed to a ‘componential’ approach, due to the low resolution involved, so it could well provide the basis for an adult-level holistic face processing system.
de Schonen and Mancini’s scenario resonates with some of the recent experimental data showing a low spatial frequency bias in adult face processing. Costen et al. (1996) showed that although both high-pass and low-pass image filtering decrease face recognition accuracy, high-pass filtering degrades identification accuracy more quickly than
low-pass filtering. Also, Schyns and Oliva (1999) have shown that when asked to recognize the identity of the ‘face’ in a briefly-presented hybrid image containing a low-pass filtered image of one individual’s face and a high-pass filtered image of another individual’s face, subjects consistently use the low-frequency component of the image for the task, whereas they are inconsistent in other tasks such as gender classification or emotion classification. This work indicates that low spatial frequency information may be relatively more important for face identification than high spatial frequency information.
Chapter outline
In a series of computational modeling studies, we have begun to provide a computational account of the face specialization data. We propose that a neural mechanism allocating resources according to their ability to perform a given task could begin to explain the apparent specialization for face recognition evidenced by prosopagnosia. We have found that a model based on the mixture of experts architecture, in which a gating network implements competitive selection between two simple homogeneous modules, can develop a specialization such that damage to one module disproportionately impairs face recognition compared to non-face object recognition. We then consider how the availability of spatial frequency information and the task to be performed affect face recognition specialization given this hypothesis of neural resource allocation by competitive selection. We find that when high and low spatial frequency information is ‘split’ between two modules in our system, and the task is to identify the faces while simply classifying the objects, the low-frequency module consistently specializes for face recognition. After describing the models in more detail, we present our experimental results and discuss their implications.
The modeling paradigm
We have performed two computational modeling experiments designed to explore the ways in which a general-purpose learning mechanism might specialize for face recognition vs. object recognition, such that localized random ‘damage’ to the model results in decreased face recognition performance or decreased object recognition performance. Both of the models are feed-forward neural networks with special competitive ‘modular’ architectures¹ that allow us to conveniently study the conditions under which specialization arises. In this section, we describe the computational models and then describe how we acquired and preprocessed the object/face image data used in both experiments.
¹ Competition in this context means competition between computational modules performing a supervised learning task; this should not be confused with techniques for unsupervised competitive learning in neural networks.
The theoretical model
Our basic theoretical model for face and object recognition is displayed in Fig. 1. We generally assume that prosopagnosia (sparing object recognition) and visual object agnosia (sparing face recognition) are symptoms of damage to subsystems that are more or less specialized for face recognition or object recognition. We imagine an array of general-purpose ‘processing units’ that compete to perform tasks and a ‘mediator’ that selects processing units for tasks. This mediator could be intrinsic to the processing unit architecture itself, as in the self-organizing map (Kohonen, 1995), or a more external, explicit mechanism, as in the mixture of experts (ME) (Jordan and Jacobs, 1995). We instantiate this theoretical model with modular neural networks by presenting modular networks with various face/object classification tasks and then studying the conditions under which, through competition, one expert or module specializes for faces to the extent that ‘damaging’ that module by removing connections results in a ‘prosopagnosic’ network. By allowing the networks to learn and discover potentially domain-specific representations on their own, we can gain some insight into the processes that might lead to such specializations in the brain. Although we make no claims that our models are biologically plausible in any significant way, the experts or modules in a given network could be interpreted as representing, for instance, analogous regions in the left and right hemispheres, or two nearby relatively independent processing units in the same region of the brain.
Fig. 1. Theoretical visual recognition model. We assume that recognition of face-like stimuli and non-face-like stimuli is accomplished by interdependent but possibly specialized processing units. Some mechanism, on the basis of a representation of a stimulus, mediates a competition between the units to generate a final decision on the identity or class of the stimulus. We explore the conditions under which one system specializes for face processing.
The network architectures
The first model’s network architecture is the well-known ‘mixture of experts’ (ME) network (Jacobs et al., 1991). The ME network contains a population of simple linear classifiers (the ‘experts’) whose outputs are mixed by a ‘gating’ network. During learning, the experts compete to classify each input training pattern, and the gating network directs more error information (feedback) to the expert that performs best. Eventually, the gating network learns to partition the input space such that expert 1 ‘specializes’ in one area of the space, expert 2 specializes in another area of the space, and so on.
The second network architecture we use is inspired by the ME network but is slightly more complicated. The main difference between it and ME is that it contains a modular hidden layer and a gating network that essentially learns to decide which hidden layer representation to ‘trust’ in classifying a given input stimulus. Figure 2 summarizes the differences between the two architectures; Appendices A and B describe their operation and learning rules in detail.
Modular networks like the mixture of experts can be useful in a variety of engineering applications, but as Jacobs (1997) argues, they have also been very useful tools for exploring hypotheses about brain function. Jacobs and Kosslyn (1994), for instance, showed that if one expert in a two-expert network was endowed with large ‘receptive fields’ and the other was given smaller receptive fields, one expert specialized for a ‘what’ task whereas the other specialized for a ‘where’ task. As another example, Erickson and Kruschke (1998) have successfully used the mixture of experts paradigm to model human categorization of visual stimuli. Thus the mixture of experts approach is a potentially powerful computational tool for studying functional specialization in the brain.
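To make the mixing step concrete, the following is a minimal sketch of the forward pass of a two-expert mixture, not the implementation used in our experiments. It assumes a softmax gate driven by the same input as the experts; the function names and all weights are hypothetical, and the learning rules (which are given in the appendices) are omitted.

```python
import math

def softmax(scores):
    """Convert raw gate scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Apply a single-layer linear map: one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def mixture_output(expert_weights, gate_weights, x):
    """Return (gate values, mixed output vector) for one input pattern x."""
    g = softmax(linear(gate_weights, x))
    expert_outputs = [linear(w, x) for w in expert_weights]
    n_out = len(expert_outputs[0])
    # The network output is the gate-weighted sum of the expert outputs.
    mixed = [sum(g[e] * expert_outputs[e][k] for e in range(len(g)))
             for k in range(n_out)]
    return g, mixed

# Toy example: 3 inputs, 2 outputs, 2 experts (all numbers are made up).
experts = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],   # expert 1
    [[0.0, 0.0, 1.0], [0.5, 0.5, 0.0]],   # expert 2
]
gate = [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]  # one row of gate weights per expert
g, y = mixture_output(experts, gate, [1.0, 0.0, 0.0])
```

In this toy case the gate favors expert 1 for the given pattern, so the mixed output is dominated by that expert's output.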
Measuring specialization and effects of local ‘brain damage’
Since these modular networks naturally decompose given problems in a data-driven way, we can explore hypotheses about the modularity of face and object recognition by training the models to perform combined face/object classification tasks. In both the network models we have described, the gating network assigns a weight to each expert or module given an input pattern; this weight is the gate network’s estimate of the probability that the given pattern was drawn from the expert’s area of expertise. To determine whether expert or module n is a ‘face specialist’, we can present the face patterns in the test set to the network, record gate unit n’s activation for each of the patterns, and average them. If that average is high, we can say that expert or module n is indeed a face specialist. We can model localized brain damage by randomly eliminating some or all of the connections in one of the experts or modules. If one expert or module is specialized for a task, such as book classification, but not other tasks, eliminating its connections will degrade the overall model’s performance on that task, with less impact on performance of other tasks.
Fig. 2. Modular network model architectures. (a) The standard mixture of experts (ME). See Appendix A for details. (b) The modular network for Model II, described in Appendix A. In this network, the gate mixes hidden layer representations rather than expert network output layers. In the ME network, the experts are self-contained linear networks with their own output layers. The entire network’s output vector is a linear combination of the expert output vectors. In contrast, the second network’s modules are not self-contained: each module’s ‘output’ is a standard network’s hidden layer. Each of the overall network’s output units is a function of all of the nonlinear hidden units in both of the modules, modulated by the gate network’s outputs.
Face/object stimuli
Our studies utilized static images of 12 individuals’ faces, 12 different cups, 12 different books, and 12 different soda cans. See Fig. 3 for examples from each class. For the faces, we collected five images of each of 12 individuals from the Cottrell and Metcalfe database (1991). In these images, the subjects attempt to display various emotions, while the lighting and camera viewpoint are held constant. For the 36 objects, we captured five images of each with a CCD camera and video frame grabber. We performed minor, pseudorandom perturbations of each object’s position and orientation while lighting and camera viewpoint remained constant. After capturing the 640 x 480 gray-scale images, we cropped and scaled them to 64 x 64, the same size as the face images.
Fig. 3. Example face, book, cup, and can images.
Preprocessing with Gabor wavelet filters
In order to transform raw 64 x 64 8-bit gray-scale images into a representation more appropriate for a neural network classifier, we preprocessed the images with a Gabor wavelet-based feature detector. von der Malsburg and colleagues have been using grids of these wavelet filters to extract good representations for face recognition for several years. The Gabor wavelet resembles a sinusoid restricted by a Gaussian function, may be tuned to a particular orientation and spatial frequency, and is similar to the observed receptive fields of simple cells in mammalian primary visual cortex (Jones and Palmer, 1987). A ‘jet’ is formed by concatenating the response of several filters with different orientation and spatial frequency tunings. As an image feature detector, the jet exhibits some invariance to background, translation, distortion, and size (Buhmann et al., 1990). Early versions of their face recognition system (Lades et al., 1993) stored square meshes of these jets at training time and used them as deformable templates at recognition (test) time to match a test face. More recent versions (Wiskott et al., 1997) place the jets over particular facial features (fiducial points) for greater accuracy. Biederman and Kalocsai (1997) show how Wiskott et al.’s representation can account for psychological phenomena in face recognition, and the system was recently a top performer in the U.S. Army’s FERET Phase III face recognition competition (Okada et al., 1998). Thus the Gabor wavelet jet is a good representation for face recognition. We use a simple version of the square mesh (Buhmann et al., 1990) as described below. Since we use prealigned images and phase-invariant filter responses, the more complicated fiducial point techniques are unnecessary. The basic wavelet is:
$$G(\vec{k}, \vec{x}) = \exp(i\,\vec{k}\cdot\vec{x})\,\exp\!\left(-\frac{k^{2}\,\vec{x}\cdot\vec{x}}{2\sigma^{2}}\right)$$
where
$$\vec{k} = [k\cos\phi,\; k\sin\phi]^{T}$$
and $k = |\vec{k}|$ controls the spatial frequency (scale) of the filter function $G$, $\vec{x}$ is a point in the plane relative to the wavelet’s origin, $\phi$ is the angular orientation of the filter, and $\sigma$ is a constant. As in Buhmann et al. (1990), we let $\sigma = \pi$, let $\phi$ range over $\{0, \pi/8, \pi/4, 3\pi/8, \pi/2, 5\pi/8, 3\pi/4, 7\pi/8\}$, and we let
$$k_i = \frac{2\pi}{N}\,2^{i}$$
where $N$ is the image width and $i$ is an integer. In the first series of experiments, we used 6 scales ($i \in \{1, \ldots, 6\}$), and in the second series we used 5 scales ($i \in \{1, \ldots, 5\}$). See Fig. 4 for examples of the filters at various orientations and scales. Again as in Buhmann et al. (1990), for each of the orientation/spatial frequency pairs, we convolve $G(\vec{k}, \vec{x})$ with the input image $I(\vec{x})$:
$$(WI)(\vec{k}, \vec{x}_0) = \int G(\vec{k}, \vec{x}_0 - \vec{x})\, I(\vec{x})\, d^{2}x$$
then normalize the response values across orientations.
Fig. 4. Real components of Gabor wavelet filters at three different orientations and scales.
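As a rough illustration of the filter family, the sketch below evaluates a complex Gabor wavelet at a single point and counts the number of values kept per image. It assumes the standard form of the wavelet (a complex sinusoid under a Gaussian envelope whose width scales with the wave number), sigma = pi, eight orientations spaced pi/8 apart, and the scale rule k_i = (2*pi/N) * 2**i; the function names are hypothetical, and convolution with an image is not shown.

```python
import cmath
import math

SIGMA = math.pi                                      # sigma = pi
ORIENTATIONS = [j * math.pi / 8 for j in range(8)]   # 0, pi/8, ..., 7*pi/8

def wave_number(i, width):
    """Assumed scale rule k_i = (2*pi / N) * 2**i for image width N."""
    return (2 * math.pi / width) * 2 ** i

def gabor(k, phi, x, y):
    """Complex wavelet value at point (x, y) relative to the wavelet origin."""
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    carrier = cmath.exp(1j * (kx * x + ky * y))       # complex sinusoid
    envelope = math.exp(-(k * k) * (x * x + y * y) / (2 * SIGMA ** 2))
    return carrier * envelope

def jet_length(n_orientations, n_scales, grid_points):
    """Number of filter magnitudes kept when jets are sampled on a grid."""
    return n_orientations * n_scales * grid_points

k1 = wave_number(1, 64)                               # coarsest scale, N = 64
center = gabor(k1, ORIENTATIONS[0], 0.0, 0.0)
off_center = gabor(k1, ORIENTATIONS[2], 5.0, 5.0)
```

With eight orientations and an 8 x 8 sampling grid (64 grid points), `jet_length(8, 6, 64)` and `jet_length(8, 5, 64)` reproduce the 3072-element and 2560-element vector sizes used for the two models.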
With eight orientations and six scale factors, this process results in a vector of 48 complex values at each point of an image (see Fig. 5 for example filter responses). We subsampled an 8 x 8 grid of these vectors and computed the magnitude of the complex values, resulting in a large vector (3072 elements for the 6-scale representation in Model I or 2560 for the 5-scale representation in Model II) representing the image.
Fig. 5. Original image and Gabor jets at five scales. Each pixel’s intensity in the processed images represents the log of the sum of the magnitudes of the filter responses in each of the eight directions.
Model I: mixture of experts network
Our first model, reported in Dailey et al. (1997), was designed to explore the extent to which specialization could arise in a simple competitive modular system in which the expert networks’ inputs were not biased in any way. The network model was a mixture of experts (Jacobs et al., 1991). Figure 2(a) shows our two-expert network schematically, and Appendix A describes the network and its learning rules in detail. In short, the ‘experts’ are simple single-layer linear networks, and the gating network learns an input space partition and ‘trusts’ one expert in each of these partitions. The gate network’s learning rules attempt to maximize the likelihood of the training set assuming a Gaussian mixture model in which each expert is responsible for one component of the mixture.
We trained the ME model with a simple face/object classification task, observed the extent to which each expert specialized in face, book, cup, and can classification, and finally observed how random damage localized in one expert affected the model’s generalization performance. As described in the Face/Object Stimuli section, we preprocessed each image to generate a 3072-element vector representing the image. The rest of this section describes the training procedure and specialization/damage results.
Dimensionality reduction with principal components analysis (PCA)
The feature extraction method described above produced 240 input patterns of 3072 elements. Since neural networks tend to generalize better when they have a small number of independent inputs, it is desirable to reduce the input pattern dimensionality. To accomplish this, we first divided the patterns into a training set composed of four examples for each individual face or object (192 patterns total) and a test set composed of one example of each individual (48 patterns total). Using the efficient technique for PCA described by Turk and Pentland (1991), we projected each pattern onto the basis formed by the 192 eigenvectors of the training set’s covariance matrix, resulting in 192 coefficients for each pattern. As a final step, we normalized each pattern by dividing each of its coefficients by its maximum coefficient magnitude so all coefficients fell in the range [-1, 1]. With the resulting representation, our networks exhibited good training set accuracy and adequate generalization, so we did not further reduce the pattern dimensionality or normalize the variance of the coefficients. Note that with 192 patterns and 192 dimensions, the training set is almost certainly linearly separable.
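The idea behind the efficient PCA technique, as we understand it, is that with M patterns of dimension D and M much smaller than D, the principal axes can be obtained from the M x M matrix of inner products between centered patterns instead of the D x D covariance matrix. The sketch below is only an illustration under that assumption: it recovers just the first principal axis by power iteration on made-up data, works with the unscaled scatter matrix (which has the same eigenvectors as the covariance matrix), and also shows the final max-magnitude normalization. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(patterns):
    """Subtract the mean pattern from every pattern."""
    n, dim = len(patterns), len(patterns[0])
    mean = [sum(p[d] for p in patterns) / n for d in range(dim)]
    return [[p[d] - mean[d] for d in range(dim)] for p in patterns]

def top_eigvec(matrix, iters=200):
    """Leading eigenvector of a symmetric PSD matrix by power iteration."""
    v = [1.0 + 0.1 * i for i in range(len(matrix))]
    for _ in range(iters):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        v = [wi / norm for wi in w]
    return v

def first_component(patterns):
    """First principal axis computed from the small M x M matrix."""
    a = center(patterns)                          # M centered patterns, size D
    gram = [[dot(p, q) for q in a] for p in a]    # M x M instead of D x D
    v = top_eigvec(gram)
    # Map the small eigenvector back to pattern space (u = A^T v), then normalize.
    u = [sum(v[m] * a[m][d] for m in range(len(a))) for d in range(len(a[0]))]
    norm = math.sqrt(dot(u, u))
    return [ui / norm for ui in u], a

def normalize_by_max(coeffs):
    """Scale one pattern's coefficients into [-1, 1] by its largest magnitude."""
    peak = max(abs(c) for c in coeffs)
    return [c / peak for c in coeffs] if peak > 0 else list(coeffs)

# Three made-up 5-dimensional patterns (M = 3 is smaller than D = 5).
data = [[2.0, 0.0, 1.0, 0.0, 3.0],
        [0.0, 1.0, 0.0, 2.0, 1.0],
        [4.0, 1.0, 2.0, 0.0, 5.0]]
axis, centered = first_component(data)
projections = [dot(p, axis) for p in centered]    # first PCA coefficient
scaled = normalize_by_max([3.0, -6.0, 1.5])
```

A full version would extract all eigenvectors of the small matrix (for instance with a linear algebra library) rather than only the leading one.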
Network training
In these experiments, the network’s task was to recognize the faces as individuals and the objects as members of their class. Thus the network had 15 outputs, corresponding to cup, book, can, face 1, face 2, etc. For example, the desired output vector for the ‘cup’ patterns was $[1,0,0,0,0,0,0,0,0,0,0,0,0,0,0]^T$, and the pattern for ‘face 5’ was $[0,0,0,0,0,0,0,1,0,0,0,0,0,0,0]^T$. After removing one example of each face and object (48 patterns) from the training set for use as a validation set to stop training, we used the following training procedure:
1. Initialize network weights to small random values.
2. Train each expert network on 10 randomly chosen patterns from the (reduced) training set. Without this step, both networks would perform equally well on every pattern and the gating network would not learn to differentiate between their abilities, because the gate weight update rule is insensitive to small differences between the experts’ performance.
3. Repeat 10 times:
   (a) Randomize the training set’s presentation order.
   (b) Train the network for one epoch.
4. Test the network’s performance on the validation set.
5. If mean squared error over the validation set has not increased two consecutive times, go to 3.
6. Test the network’s performance on the test set.
The training regimen was sufficient to achieve near-perfect performance on the test set (see Fig. 7 results for 0% damage), but we found that the a priori estimates ($g_1$ and $g_2$) learned by the gate network were extremely sensitive to the learning rate parameters ($\eta_g$ and $\eta_e$ in Appendix A) and momentum parameters ($\alpha_g$ and $\alpha_e$ in Appendix A). If the gate network learns too slowly relative to the experts, they generally receive the same amount of error feedback and the $g_i$ never deviate far from 0.5. If the gate network learns too quickly relative to the experts, it tends to assign all of the input patterns to one of the experts. To address this problem, we performed a search for parameter settings that partition the training set effectively. For 270 points in the four-dimensional parameter space, we computed the variance of one of the gate network outputs over the training set, averaged over ten runs. This variance measure was maximal when $\eta_e = 0.05$, $\eta_g = 0.15$, $\alpha_e = 0.4$, and $\alpha_g = 0.6$.
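The selection criterion used in the parameter search can be sketched as follows. This is an illustration only: the setting labels and gate outputs are made up, the helper names are hypothetical, and the use of the population variance is an assumption.

```python
def gate_variance(gate_values):
    """Population variance of one gate output over the training set."""
    n = len(gate_values)
    mean = sum(gate_values) / n
    return sum((g - mean) ** 2 for g in gate_values) / n

def pick_setting(candidates):
    """Return the parameter setting whose mean variance over runs is largest.

    candidates maps a setting label to a list of runs; each run is the list
    of outputs of one gate unit over the training patterns.
    """
    def mean_variance(runs):
        return sum(gate_variance(r) for r in runs) / len(runs)
    return max(candidates, key=lambda s: mean_variance(candidates[s]))

# Made-up gate outputs for three hypothetical settings, two runs each.
candidates = {
    "gate too slow": [[0.5, 0.5, 0.5, 0.5], [0.52, 0.48, 0.5, 0.5]],
    "gate too fast": [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]],
    "balanced":      [[0.9, 0.1, 0.95, 0.05], [0.1, 0.9, 0.2, 0.8]],
}
best = pick_setting(candidates)
```

Both failure modes described above give a variance near zero: a gate stuck near 0.5 and a gate that sends every pattern to one expert are equally uninformative under this measure, whereas a sharp partition of the patterns gives a large variance.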
Maximizing the gate output variance is a reasonable strategy for selecting the model’s learning parameters. It encourages a fairly sharp partition between the experts’ areas of specialization without favoring one partition over another. On the other hand, it may have been preferable to include a term penalizing low gate value variance in the network’s objective function, since this would eliminate the need for a parameter search; we experimented with this technique and found that the results (as reported in the next section) were robust to this change in the training procedure.
Model I results
Figure 6 summarizes the division of labor performed by the gate network over 10 runs with $\eta_e = 0.05$, $\eta_g = 0.15$, $\alpha_e = 0.4$, and $\alpha_g = 0.6$. The bars denote the weights the gate network assigned to whichever expert emerged as face-dominant, broken down by stimulus class, and the error bars denote standard error. It should not be surprising that the ‘face expert’ becomes responsible for a majority of the face patterns in the training set: we choose as the face expert whichever expert is most responsible for the face patterns, post hoc. However, it is interesting that the non-face expert nearly always shows a strong specialization for the can patterns. This means that the gate network tends to find a natural division between the face and can patterns quite easily. The high variability on the book patterns indicates that many of the book patterns can easily be grouped with either the faces or the cups on different runs. Finally, the low variability on the cup patterns reflects that one set of cups tends to group with the faces and another set of cups tends to group with the cans, fairly consistently from run to run.
Fig. 6. Weights assigned to the face-dominant expert network for each stimulus class. Error bars denote standard error.
Figure 7 illustrates the performance effects of damaging one expert by randomly removing connections between its input and output units. Damaging the face-specializing network resulted in a dramatic decrease in performance on the face patterns. When the network not specializing in faces was damaged, however, the opposite effect was present but less severe. Clearly, the face specialist learned enough about the object classes during early stages of training (when the gating network estimates all prior probabilities at about 0.5) to correctly classify some of the object patterns.
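The damage manipulation can be sketched as follows. This is a minimal illustration, assuming that a removed connection is equivalent to a weight fixed at zero and that the connections to remove are drawn uniformly at random; the function name and the toy weight matrix are hypothetical.

```python
import random

def damage(weights, fraction, rng):
    """Return a copy of a weight matrix with a fraction of connections removed.

    A removed connection is modeled as a weight fixed at zero.
    """
    rows, cols = len(weights), len(weights[0])
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    n_cut = round(fraction * len(positions))
    cut = set(rng.sample(positions, n_cut))   # connections chosen for removal
    return [[0.0 if (r, c) in cut else weights[r][c] for c in range(cols)]
            for r in range(rows)]

rng = random.Random(0)                      # fixed seed for repeatability
expert = [[1.0] * 8 for _ in range(4)]      # toy 4 x 8 expert, all weights 1
half = damage(expert, 0.5, rng)             # 50% damage
```

Evaluating the network on the test set after calling `damage` on one expert at increasing fractions (for example 0.25, 0.5, 0.75, 1.0) would trace out a damage curve for that expert.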
Fig. 7. (a) Face identification classification errors increase as damage to the face-dominating expert module increases, with less impact on object classification. (b) Object categorization classification errors increase as damage to the non-face-dominating expert module increases, with less impact (on average) on face identification.
Discussion of model I results
The results show that localized damage in a trained ME network can model prosopagnosia: as damage to the ‘face’ module increases, the network’s ability to recognize faces decreases dramatically. From this we conclude that it is plausible for competition between unbiased functional units to give rise to a specialized face processor. Since faces form a fairly homogeneous class, it is reasonable to expect that a system good at identifying one face will also be good at identifying others. However, since the degree of separation between face and non-face patterns in the model is not clean and is sensitive to training parameters, additional constraints would be necessary to achieve a face/non-face division reliably. Indeed, as discussed earlier, such constraints, such as the prevalence of face stimuli in the newborn’s environment, different maturation rates in different areas of the brain, and a possibly innate preference for tracking faces, may well be at work during infant development (Johnson and Morton, 1991). Despite the lack of a strong face/non-face separation in the network, damaging the ‘face expert’ affects face recognition accuracy disproportionately, compared with how damage to the non-face expert affects object recognition accuracy. This is most likely due to the fact that the network is required to perform subordinate classification between members of a homogeneous class (the faces) but gross superordinate classification of the members of the other classes. This experiment shows how a functional specialization for face processing could arise in a system composed of unbiased ‘expert’ modules. The next modeling experiment shows that adding simple biologically-motivated biases to a similar competitive modular system can make the effect even more reliable.
Model II: modular hidden layer network
In the mixture of experts model just described, the experts were very simple linear classifiers and the system was not biased in any way to produce a face expert, although the specialization was sensitive to parameter settings and was not always strong. Our second model, reported in Dailey and Cottrell (1998), was designed to explore the extent to which the learning task and structural differences between modules might strengthen the specializations we observed in the earlier model. In order to allow the expert networks to develop more sophisticated representations of the input stimuli than a simple linear decision boundary, we added hidden layers to the model. In order to make the gating network more sensitive to the task at hand (and less sensitive to the a priori structure of the input space), we trained it by backpropagation of error instead of the ME’s Gaussian mixture model. The connections to the modular network’s output units come from two separate input/hidden layer pairs; these connections are gated multiplicatively by a simple linear network with softmax outputs. Figure 2(b) illustrates the model’s architecture, and Appendix A describes its operation and learning rules in detail. The model is very similar to the ME in that it implements a form of competitive selection in which the gating network learns which module is better able to process a given pattern and rewards the ‘winner’ with more error feedback.
The purpose of the experiments described in this section was to explore how two biases might affect specialization: (1) the discrimination level (subordinate vs. superordinate) of the task being learned; and (2) the range of spatial frequency information available in the input. We used the same stimuli as in the mixture of experts experiments and trained the model with several different face/object classification tasks while varying the range of spatial frequencies available to the modules. In each case, we observed the extent to which each module specialized in face, book, cup, and can classification. We found that when the system’s task was subordinate classification of faces and superordinate classification of books, cups, and cans, the module receiving only low spatial frequency information developed a strong, reliable specialization for face processing. We repeated Model I’s damage experiments with these specialized networks. The remainder of this section describes the preprocessing, training procedure, and results in more detail.
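The gated architecture can be pictured with the following forward-pass sketch. It is not the model’s actual implementation: the logistic hidden units, the omission of bias terms and of the output nonlinearity, and all names and weights are assumptions made for illustration, and training by backpropagation is not shown.

```python
import math

def softmax(scores):
    """Convert raw gate scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def layer(weights, x):
    """Weighted sums for one layer: one weight row per receiving unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def modular_forward(hidden_w, output_w, gate_w, module_inputs, gate_input):
    """Return (gate values, net input to each output unit)."""
    g = softmax(layer(gate_w, gate_input))
    net = [0.0] * len(output_w[0])
    for m, x in enumerate(module_inputs):
        hidden = [logistic(a) for a in layer(hidden_w[m], x)]
        contribution = layer(output_w[m], hidden)
        for k, value in enumerate(contribution):
            net[k] += g[m] * value      # multiplicative gating of module m
    return g, net

# Toy network: 1 input, 2 hidden units per module, 1 output (made-up weights).
hidden_w = [[[0.0], [0.0]], [[0.0], [0.0]]]   # zero weights give hidden = 0.5
output_w = [[[1.0, 1.0]], [[2.0, 2.0]]]
gate_w = [[0.0], [0.0]]                        # equal gate scores
g, net = modular_forward(hidden_w, output_w, gate_w, [[1.0], [1.0]], [1.0])
```

Passing each module its own input vector, separately from the gate input, is what allows the modules to be given different ranges of spatial frequency while the gate sees the full pattern.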
Preprocessing with principal components analysis
The Gabor wavelet filtering procedure we used produced a 2560-element vector for each stimulus. As in the mixture of experts model, it is desirable to reduce the input’s dimensionality. In this experiment, however, we wanted to maintain a segregation of the responses from each Gabor wavelet filter scale, so we performed a separate principal components analysis on each spatial frequency component of the pattern vectors. For each of the five filter scales in the jet, we extracted the subvectors corresponding to that scale from each pattern in the training set, computed the eigenvectors of their covariance matrix, projected the subvectors from each of the patterns onto these eigenvectors, and retained the eight most significant coefficients. Reassembling the pattern set resulted in 240 40-dimensional vectors.
Network training
Of the 240 40-dimensional vectors, we used four examples of each face and object to form a 192-pattern training set, and one example of each face and object to form a 48-pattern test set. We held out one example of each individual in the training set for use in determining when to stop network training. We set the learning rate for all network weights to 0.1 and their momentum to 0.5. Both of the hidden layers contained 15 units in all experiments. We used the network’s performance on the hold out set to determine appropriate criteria for stopping training. For the identification tasks, we determined that a mean squared error (MSE) threshold of 0.02 provided adequate classification performance on the hold out set without overtraining and allowed the gate network to settle to stable values. For the four-way classification task, we found that an MSE threshold of 0.002 was necessary to give the gate network time to stabilize and did not result in overtraining. For all runs reported in the results section, we simply trained the network until it reached the relevant MSE threshold.
We trained networks to perform three tasks:
1. Four-way superordinate classification (4 outputs).
2. Subordinate book classification; superordinate face, cup, and can classification (15 outputs).
3. Subordinate face classification; superordinate book, cup, and can classification (15 outputs).
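In the split-frequency version of these tasks, each module sees a different part of the 40-dimensional pattern: one module keeps the two smallest filter scales, the other keeps the two largest, and both receive an attenuated middle scale. The sketch below shows one way such inputs could be built. It assumes the pattern is stored scale by scale with the smallest filter scale (highest spatial frequency) first, and it interprets the attenuation of the middle scale as multiplication by 0.5; both are assumptions, and the names are hypothetical.

```python
N_SCALES = 5          # Gabor filter scales kept in Model II
PER_SCALE = 8         # principal components retained per scale

def split_by_frequency(pattern, middle_gain=0.5):
    """Return (high_freq_input, low_freq_input) for a 40-element pattern.

    Assumes the pattern is laid out scale by scale, smallest filter scale
    (highest spatial frequency) first.
    """
    assert len(pattern) == N_SCALES * PER_SCALE
    high_gains = [1.0, 1.0, middle_gain, 0.0, 0.0]   # module 1
    low_gains = [0.0, 0.0, middle_gain, 1.0, 1.0]    # module 2

    def apply(gains):
        # Each element is scaled by the gain of the filter scale it belongs to.
        return [value * gains[i // PER_SCALE] for i, value in enumerate(pattern)]

    return apply(high_gains), apply(low_gains)

pattern = [1.0] * 40                     # made-up pattern with every entry 1
high, low = split_by_frequency(pattern)
```

In the control version of each task, the unmodified pattern would simply be passed to both modules and to the gate.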
For each of these tasks, we trained networks under two conditions. In the first, as a control, both modules and the gating network were trained and tested with the full 40-dimensional pattern vector. In the second, the gating network received the full 40-dimensional vector, but module 1 received a vector in which the elements corresponding to the largest two Gabor filter scales were set to 0, and the elements corresponding to the middle filter scale were reduced by 0.5. Module 2, on the other hand, received a vector in which the elements corresponding to the smallest two filter scales were set to 0 and the elements corresponding to the middle filter were reduced by 0.5. Thus module 1 received mostly high-frequency information, whereas module 2 received mostly low-frequency information, with de-emphasized overlap in the middle range, as shown in Fig. 8.
Fig. 8. Splitting input patterns into high spatial frequency and low spatial frequency components.
For each of the 3 x 2 experimental conditions, we trained networks using 20 different initial random weight sets and recorded the softmax
outputs learned by the gating network on each training pattern. As in the ME model, this indicates the extent to which a module is functionally specialized for a class of stimuli. To test performance under localized random damage conditions, we randomly removed connections from a module’s hidden layer to the output layer. Model II results Figure 9 displays the resulting degree of specialization of each module on each stimulus class. Each chart plots the average weight the gating network assigns to each module for the training patterns from each stimulus class, averaged over 20 training runs with different initial random weights. The error bars denote standard error. For each of the three reported tasks (four-way classification, book identification, and face identification), one chart shows division of labor between the two modules in the control situation, in which both modules receive the same patterns, and the other chart shows division of labor between the two modules when one module receives low-frequency information and the other receives high-frequency information. When required to identify faces on the basis of high- or low-frequency information, compared with the four-way-classification and same-pattern controls, the low-frequency module wins the competition for face patterns extremely consistently (lower right graph). Book identification specialization, however, shows considerably less sensitivity to spatial frequency. We have performed the equivalent experiments with a cup discrimination and a can discrimination task. Both of these tasks show a low-frequency sensitivity lower than that for face identification but higher than that for book identification. As shown in Fig. 
10, damaging the specialized face identification networks provides a good model of prosopagnosia and visual object agnosia: when the face-specialized module’s output is ‘damaged’ by removing connections from its hidden layer to the output layer, the overall network’s generalization performance on face identification drops dramatically, while its generalization performance on object recognition drops much more slowly.
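The lesioning procedure described above (randomly removing connections from a module’s hidden layer to the output layer) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors’ code: the function names, the 0/1 connection mask, and the toy layer sizes are assumptions made here, and the gated output follows the architecture described in Appendix B.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax; the results are positive and sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def lesion_mask(n_hidden, n_out, fraction, rng):
    """Return a 0/1 mask that removes `fraction` of hidden-to-output connections."""
    cells = [(j, p) for j in range(n_hidden) for p in range(n_out)]
    removed = set(rng.sample(cells, round(fraction * len(cells))))
    return [[0.0 if (j, p) in removed else 1.0 for p in range(n_out)]
            for j in range(n_hidden)]


def gated_output(hidden, weights, masks, gate_sums):
    """Mix the modules: o_p = sum_i g_i * sum_j mask_ijp * w_ijp * z_ij.

    hidden[i][j] is the activation of hidden unit j in module i,
    weights[i][j][p] the weight from that unit to output p, and
    gate_sums[i] the gate network's weighted input sum for module i.
    """
    g = softmax(gate_sums)
    n_out = len(weights[0][0])
    out = [0.0] * n_out
    for i, z in enumerate(hidden):
        for j, z_ij in enumerate(z):
            for p in range(n_out):
                out[p] += g[i] * masks[i][j][p] * weights[i][j][p] * z_ij
    return out


rng = random.Random(0)                    # fixed seed, for repeatability only
half_damaged = lesion_mask(4, 3, 0.5, rng)  # 50% damage to one module's outputs
```

Sweeping `fraction` from 0 to 1 for one module at a time, and re-measuring generalization performance at each level, would give damage curves of the kind summarized in Fig. 10.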
When the non-face-specialized (high frequency) module’s outputs are damaged, the opposite effect occurs: the overall network’s performance on each of the object recognition tasks drops, whereas its performance on face identification remains high.

Discussion of Model II results

The results in Fig. 9 show a strong preference for low-frequency information in the face identification task, empirically demonstrating that, given a choice, a competition mechanism will choose a module receiving low-frequency, large receptive field information for this task. It appears that the large-scale Gabor filters carry the most information relevant to face identification given this particular set of stimuli. We have essentially found a ‘sweet spot’ of spatial frequency ranges where the low spatial frequency components are very useful for face identification but less useful for other types of object classification. One problem is that we have only trained our networks on faces and objects at one distance from the camera, so the concept of ‘low spatial frequency information’ is relative to the face or object, not to the viewer. Nevertheless, the resulting specialization in the network is remarkably strong. It demonstrates dramatically how effective a small bias in the relative usefulness of a particular range of spatial frequencies can be. The result concurs with the psychological evidence for configural face representations based upon low spatial frequency information, and suggests how the developing brain could be biased toward a specialization for face recognition by the infant’s initially low visual acuity.

Inspired by this result, we predict that human subjects performing face and object identification tasks will show more degradation of performance in high-pass filtered images of faces than in high-pass filtered images of other objects. To our knowledge, this has not been empirically tested, although Costen et al. (1996) have investigated the effect of high-pass and low-pass filtering on face images in isolation, and Parker, Lishman, and Hughes (1996) have investigated the effect of high-pass and low-pass filtering of face and object images used as 100 ms cues for a same/different task. Their results indicate that relevant high-pass filtered images cue object processing better than low-pass filtered images, but the two types of filtering cue face processing equally well. Similarly, Schyns and Oliva’s (1999) experiment, described earlier, suggests that the human face identification system preferentially responds to low spatial frequency inputs.
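The split-frequency input condition used for Model II can be sketched as follows. This is a hedged illustration only: it assumes five filter scales stored as contiguous, equal-sized blocks ordered from smallest to largest scale, and it reads “reduced by 0.5” as multiplication by 0.5. Neither the layout nor that reading is confirmed by the text, and the function name is hypothetical.

```python
def split_by_scale(pattern, n_scales=5, mid_gain=0.5):
    """Return (module 1 input, module 2 input) for one pattern vector.

    ASSUMPTION: `pattern` holds n_scales contiguous blocks of equal length,
    ordered from the smallest filter scale (highest spatial frequency) to
    the largest scale (lowest spatial frequency).
    ASSUMPTION: the middle scale is attenuated multiplicatively by mid_gain.
    """
    if len(pattern) % n_scales != 0:
        raise ValueError("pattern length must be a multiple of n_scales")
    block = len(pattern) // n_scales
    middle = n_scales // 2
    high_freq, low_freq = [], []
    for k, value in enumerate(pattern):
        scale = k // block
        if scale < middle:            # small scales go to module 1 only
            high_freq.append(value)
            low_freq.append(0.0)
        elif scale == middle:         # shared, de-emphasized middle scale
            high_freq.append(mid_gain * value)
            low_freq.append(mid_gain * value)
        else:                         # large scales go to module 2 only
            high_freq.append(0.0)
            low_freq.append(value)
    return high_freq, low_freq


# A 40-dimensional pattern, as in the text; the gating network would still
# receive the unmodified vector.
module1_input, module2_input = split_by_scale([1.0] * 40)
```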
Fig. 9. Average weight assigned to each module broken down by stimulus class. For each task, in the control experiment, each module receives the same pattern; the split-frequency charts summarize the specialization resulting when module 1 receives high-frequency Gabor filter information.

Fig. 10. Effect of damaging the specialized face identification networks from Fig. 9. (a) High spatial frequency module damage effects. (b) Low spatial frequency module damage effects. Each panel shows curves for faces, books, cups, and cans as a function of % damage. Training on face specialization and splitting the spatial frequency information between the two modules leads to a strong specialization for faces in the low spatial frequency module.

General discussion

Both of the experiments we have described show that localized damage in modular learning systems can model brain damage resulting in face or object agnosia. In the mixture of experts model, one expert tends to specialize for face recognition because the face patterns are generally near each other, and the gate module’s Gaussian mixture model assumptions encourage an a priori division of the input space. In Model II, the low spatial frequency module presumably specializes very strongly for face recognition because the low spatial frequency components of the input patterns are particularly useful for subordinate classification of faces. The models empirically demonstrate that prosopagnosia could simply reflect random localized damage in a system trained by competition. This competition could easily be biased by structural and environmental constraints such as:

- Infants appear to have an innate tendency to track faces at birth.
- Faces are the only class of visual stimuli for which subordinate classification is important at birth.
- Learning under these conditions would necessarily be based on gross features and low spatial frequency information due to the infant’s low acuity and contrast sensitivity.
- Low spatial frequency information is a suitable basis for the holistic representations apparently at work in adult face processing.
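The competition between experts referred to in this discussion follows the update rules of Appendix A. A minimal sketch of one online update is given below, assuming linear experts, a linear-softmax gate, and Gaussian expert densities with identity covariance as stated there. The function names and the toy values are hypothetical, and the momentum term is omitted for brevity.

```python
import math


def softmax(values):
    """Numerically stable softmax; the results are positive and sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def squared_error(y, o):
    return sum((t - a) ** 2 for t, a in zip(y, o))


def forward(experts, gate, x):
    """Return the expert outputs o_i and the gating weights g_i for input x."""
    outputs = [matvec(theta, x) for theta in experts]
    g = softmax(matvec(gate, x))
    return outputs, g


def posteriors(outputs, g, y):
    """h_i: the prior g_i reweighted by each expert's Gaussian likelihood."""
    scores = [g_i * math.exp(-0.5 * squared_error(y, o_i))
              for g_i, o_i in zip(g, outputs)]
    total = sum(scores)
    return [s / total for s in scores]


def log_likelihood(experts, gate, x, y):
    """Log likelihood of y given x, up to an additive constant."""
    outputs, g = forward(experts, gate, x)
    return math.log(sum(g_i * math.exp(-0.5 * squared_error(y, o_i))
                        for g_i, o_i in zip(g, outputs)))


def train_step(experts, gate, x, y, eta_e=0.1, eta_g=0.1):
    """One online gradient-ascent update, applied in place (no momentum)."""
    outputs, g = forward(experts, gate, x)
    h = posteriors(outputs, g, y)
    for i, theta in enumerate(experts):
        for p, row in enumerate(theta):
            for k, x_k in enumerate(x):
                # expert update: eta_e * h_i * (y - o_i) x^T
                row[k] += eta_e * h[i] * (y[p] - outputs[i][p]) * x_k
        for k, x_k in enumerate(x):
            # gate update: eta_g * (h_i - g_i) x^T
            gate[i][k] += eta_g * (h[i] - g[i]) * x_k
    return g, h
```

Because the expert whose output is closest to the target receives the largest posterior, it also receives the largest weight change, and the gate shifts toward it on similar inputs. That is the sense in which the experts “compete” for each pattern.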
In contrast to prosopagnosia, however, localized damage in our networks does not model visual object agnosia (sparing face recognition) especially well. A lesioned network with object processing as badly impaired as C.K.’s, with intact face processing, would be an extremely rare occurrence. Of course, Juola and Plunkett (1998) might argue that C.K.’s brain damage is nothing more than an outlier. But as Moscovitch et al. (1997) point out, C.K. can perceive and recognize the component parts of complex objects but cannot put them together into a whole. The psychological evidence seems to indicate that faces are different from most other objects in that they are perceived and recognized holistically, but our networks do not have much opportunity to form part-based representations of the objects. Although the hidden units in networks like those of Model II could presumably discover parts for their intermediate representations of objects, that would probably require (at least) a much larger training set and more difficult classification tasks. Thus it seems that our models do not possess the part-based representations presumably destroyed in severe object agnosics without prosopagnosia.

There are several other valid criticisms of our models. First and perhaps foremost, the models and stimuli are largely static. The developing infant’s environment is clearly dynamic, and structural changes such as increased acuity are constantly but gradually occurring. We plan to explore how some of these dynamics might influence functional specialization in our models. Another potential criticism is that children do not appear to recognize faces holistically (as operationalized by Farah and Tanaka’s part-whole paradigm) until at least the age of six (Tanaka et al., 1998). Since we claim that the low spatial frequency module in Model II is analogous to the putative adult holistic face processor, to account for the data in children, we have to claim that infants at some point switch from holistic processing of faces to part-based representations as their acuity increases to a reasonable level, then back to holistic representations later. This is another topic for further research.
Conclusion

We have shown in two computational modeling studies that simple data-driven competition combined with constraints and biases known or thought to exist during visual system development can account for some of the effects observed in normal and brain-damaged humans. Our studies lend support to the claim that there is no need for an innately-specified face processing module: face recognition is only “special” insofar as faces form a remarkably homogeneous category of stimuli for which within-category discrimination is ecologically beneficial early in life. Using competitive computational models to study functional specialization in face processing seems to be a promising avenue for future research. In future work, we plan to explore mechanisms that lead to functional specialization and localization in unsupervised computational models that are more biologically plausible. As another route to increasing our models’ plausibility and predictiveness, we will make efforts to realistically incorporate the time course of infant development. We also plan to study other neuropsychological double dissociations, such as that between facial expression and facial identity recognition, with similar techniques.
Acknowledgements

We thank Chris Vogt and Gary’s Unbelievable Research Unit (GURU) for discussion and comments on previous drafts of this chapter. The research was supported in part by NIMH grant MH57075 to GWC.
Appendix A

Mixture of experts learning rules

In this model, the output layers of an array of linear classifiers are combined by a gating network, as shown in Fig. 2(a). We trained this network with the maximum likelihood gradient ascent learning rules described by Jordan and Jacobs (1995).

Feed-forward phase

In the feed-forward stage, each expert network $i$ is a single-layer linear network that computes an output vector $o_i$ as a function of the input vector $x$ and a set of parameters $\theta_i$. We assume that each expert specializes in a different area of the input space. The gating network assigns a weight $g_i$ to each of the experts’ outputs $o_i$. The gating network determines the $g_i$ as a function of the input vector $x$ and a set of parameters $w$. The $g_i$ can be interpreted as estimates of the prior probability that expert $i$ can generate the desired output $y$, or $P(i \mid x, w)$. The gating network is a single-layer linear network with a softmax nonlinearity at its output. That is, the linear network computes

$$\xi_i = \sum_j w_{ij} x_j,$$

then applies the softmax function to get

$$g_i = \frac{\exp(\xi_i)}{\sum_j \exp(\xi_j)}.$$

Thus the $g_i$ are non-negative and sum to 1. The final, mixed output of the entire network is

$$o = \sum_i g_i o_i.$$

Adaptation by maximum likelihood gradient ascent

We adapted the network’s estimates of the parameters $w$ and $\theta_i$ using Jordan and Jacobs’ (1995) gradient ascent algorithm for maximizing the log likelihood of the training data given the parameters. Assuming the probability density associated with each expert is Gaussian with identity covariance matrix, they obtain the online learning algorithms

$$\Delta\theta_i = \eta_e h_i (y - o_i) x^T$$

and

$$\Delta w_i = \eta_g (h_i - g_i) x^T,$$

where $\eta_e$ and $\eta_g$ are learning rates for the expert networks and the gating network, respectively, and $h_i$ is an estimate of the posterior probability that expert $i$ can generate the desired output $y$:

$$h_i = \frac{g_i \exp\left(-\tfrac{1}{2}(y - o_i)^T (y - o_i)\right)}{\sum_j g_j \exp\left(-\tfrac{1}{2}(y - o_j)^T (y - o_j)\right)}.$$

This is essentially a softmax function computed on the inverse of the sum squared error of each expert’s output, smoothed by the gating network’s current estimate of the prior probability that the input pattern was drawn from expert $i$’s area of specialization. As the network learns, the expert networks ‘compete’ for each input pattern, while the gate network rewards the winner of each competition with stronger error feedback signals. Thus, over time, the gate partitions the input space in response to the experts’ performance.

We found that adding momentum terms to the update rules enabled the network to learn more quickly and the gate network to partition the input space more reliably. With this change, if $c$ is a weight change computed as above, the update rule for an individual weight becomes $\Delta w_i(t) = c + \alpha \Delta w_i(t - 1)$. We found that setting the learning parameters $\eta_g$, $\eta_e$, $\alpha_g$, and $\alpha_e$ was not a simple task, as described in the text.

Appendix B

Mixed hidden layer network learning rules

This model is a simple modular feed-forward network. The connections to the output units come from two separate input/hidden layer pairs; these connections are mixed multiplicatively by a gating network similar to that of the mixture of experts. The architecture is shown in Fig. 2(b). We used standard backpropagation of error to adjust the network’s weights, but since the multiplicative gating connections add some complexity, we give the detailed learning rules here.

Feed-forward phase

In the feed-forward stage, the hidden layer units $u_{ij}$ ($i$ is the module number and $j$ is the unit number in the layer) compute the weighted sum of their inputs, $Z_{ij}$, then apply the sigmoid function $s$ to the sum:

$$z_{ij} = s(Z_{ij}).$$

Softmax unit $i$ in the gate network computes the weighted sum of its inputs, $\xi_i$, then applies the softmax function to that weighted sum:

$$g_i = \frac{\exp(\xi_i)}{\sum_k \exp(\xi_k)}.$$

The $g_i$ are positive and sum to 1. The final output layer then computes the weighted sum of the hidden layers of the modules, weighted by the gating values $g_i$:

$$o_p = \sum_i g_i \sum_j w_{ijp} z_{ij}.$$

Adaptation by backpropagation (generalized delta rule)

The network is trained by on-line backpropagation of error with the generalized delta rule. Each of the network’s weights $w_{ij}$ for a connection leaving unit $i$ and feeding unit $j$ is updated in proportion to $\delta_j$, the error due to unit $j$, and $x_i$, the activation of unit $i$, with the addition of a momentum term. For output unit $i$,

$$\delta_i = -2(y_i - o_i),$$

where $y_i$ is the $i$th component of the desired output and $o_i$ is unit $i$’s actual output. For hidden node $u_{ij}$, the $j$th unit in module $i$’s hidden layer,

$$\delta_{ij} = s'(Z_{ij})\, g_i \sum_p \delta_p w_{ijp},$$

where $s'$ is the derivative of the sigmoid function, $Z_{ij}$ is the weighted sum of $u_{ij}$’s inputs, $g_i$ is the $i$th softmax output unit of the gating module, and $w_{ijp}$ is the weight on the connection from $u_{ij}$ to output unit $o_p$.

Finally, the error due to the softmax unit that gates module $i$ is

[equation not recoverable from the source]

where $z_{ij}$ is the output activation of hidden node $u_{ij}$ and $w_{ijp}$ is the weight from $u_{ij}$ to output node $o_p$. Thus the gating units both mix the outputs of each module’s hidden layer and give each module feedback during learning in proportion to its gating value (via $\delta_{ij}$). Thus the architecture implements a simple form of competition in which the gate units settle on a division of labor between the modules that minimizes the entire network’s output error.
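As a check on the Appendix B learning rules, the error terms can be compared against numerical derivatives of the sum squared output error. The sketch below is an illustration under stated assumptions, not the authors’ implementation: it takes the error to be $E = \sum_p (y_p - o_p)^2$, treats each error term as the derivative of $E$ with respect to a unit’s net input, and derives the gate term by pushing $\partial E / \partial g_i$ through the softmax Jacobian. The gate expression is therefore this sketch’s own derivation and may differ in form from the one printed in the chapter. All names and toy values are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def softmax(values):
    """Numerically stable softmax; the results are positive and sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(net_in, gate_in, w):
    """net_in[i][j]: hidden net inputs; gate_in[i]: gate net inputs;
    w[i][j][p]: weight from hidden unit j of module i to output p."""
    z = [[sigmoid(a) for a in row] for row in net_in]
    g = softmax(gate_in)
    n_out = len(w[0][0])
    o = [sum(g[i] * w[i][j][p] * z[i][j]
             for i in range(len(z)) for j in range(len(z[i])))
         for p in range(n_out)]
    return z, g, o


def sum_squared_error(o, y):
    return sum((t - a) ** 2 for t, a in zip(y, o))


def deltas(net_in, gate_in, w, y):
    """Return (output, hidden, gate) error terms as dE/d(net input)."""
    z, g, o = forward(net_in, gate_in, w)
    n_out = len(o)
    d_out = [-2.0 * (t - a) for t, a in zip(y, o)]
    # hidden unit: s'(net) * g_i * sum_p delta_p * w_ijp, with s' = z(1 - z)
    d_hid = [[z[i][j] * (1.0 - z[i][j]) * g[i]
              * sum(d_out[p] * w[i][j][p] for p in range(n_out))
              for j in range(len(z[i]))] for i in range(len(z))]
    # dE/dg_i, then the softmax Jacobian: g_i * (dE/dg_i - sum_k g_k dE/dg_k)
    dE_dg = [sum(d_out[p] * w[i][j][p] * z[i][j]
                 for j in range(len(z[i])) for p in range(n_out))
             for i in range(len(z))]
    mean = sum(g_k * a_k for g_k, a_k in zip(g, dE_dg))
    d_gate = [g[i] * (dE_dg[i] - mean) for i in range(len(z))]
    return d_out, d_hid, d_gate
```

Under these assumptions a gradient-descent weight change is the negative of the corresponding error term times the presynaptic activation. The gate terms sum to zero across modules, which reflects the zero-sum nature of the softmax competition.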
References

Behrmann, M., Moscovitch, M. and Winocur, G. (1994). Intact visual imagery and impaired visual perception in a patient with visual agnosia. J. Exp. Psychol.: Hum. Percept. Perform., 20(5): 1068-1087.
Biederman, I. and Kalocsai, P. (1997). Neurocomputational bases of object and face recognition. Philosoph. Trans. Roy. Soc. Lond.: Biolog. Sci., 352: 1203-1219.
Buhmann, J., Lades, M. and von der Malsburg, C. (1990). Size and distortion invariant object recognition by hierarchical graph matching. In: Proceedings of the IJCNN International Joint Conference on Neural Networks, volume II, pages 411-416.
Costen, N.P., Parker, D.M. and Craw, I. (1996). Effects of high-pass and low-pass spatial filtering on face identification. Perception and Psychophysics, 38(4): 602-612.
Cottrell, G.W. and Metcalfe, J. (1991). Empath: Face, gender and emotion recognition using holons. In: Lippman, R.P., Moody, J. and Touretzky, D.S. (Eds.), Advances in Neural Information Processing Systems 3, San Mateo: Morgan Kaufmann, pp. 564-571.
Dailey, M.N., Cottrell, G.W. and Padgett, C. (1997). A mixture of experts model exhibiting prosopagnosia. In: M.G. Shafto and P. Langley (Eds.), Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society, Erlbaum: Hillsdale, NJ, pp. 155-160.
Dailey, M.N. and Cottrell, G.W. (1998). Task and spatial frequency effects on face specialization. In: M.I. Jordan, M.J. Kearns and S.A. Solla (Eds.), Advances in Neural Information Processing Systems 10. Cambridge, MA: MIT Press, pp. 17-23.
Damasio, A.R., Damasio, H. and Van Hoesen, G.W. (1982). Prosopagnosia: Anatomic basis and behavioral mechanisms. Neurology, 32: 331-341.
De Renzi, E. (1986). Current issues on prosopagnosia. In: Ellis, H., Jeeves, M., Newcombe, F. and Young, A. (Eds.), Aspects of Face Processing. Martinus Nijhoff Publishers, Dordrecht, pp. 243-252.
De Renzi, E., Perani, D., Carlesimo, G., Silveri, M. and Fazio, F. (1994). Prosopagnosia can be associated with damage confined to the right hemisphere - an MRI and PET study and a review of the literature. Neuropsychologia, 32(8): 893-902.
de Schonen, S., Mancini, J. and Liegeois, F. (1998). About functional cortical specialization: The development of face recognition. In: F. Simion and G. Butterworth (Eds.), The Development of Sensory, Motor, and Cognitive Capacities in Early Infancy, pp. 103-116. Psychology Press, Hove, UK.
Erickson, M.A. and Kruschke, J.K. (1988). Rules and exemplars in category learning. J. Exp. Psychol.: General, 127: 107-140.
Farah, M.J. (1990). Visual Agnosia: Disorders of Object Recognition and What They Tell Us about Normal Vision. MIT Press: Cambridge, MA.
Farah, M.J., Levinson, K.L. and Klein, K.L. (1995a). Face perception and within-category discrimination in prosopagnosia. Neuropsychologia, 33(6): 661-674.
Farah, M.J., Wilson, K.D., Drain, H.M. and Tanaka, J.R. (1995b). The inverted face inversion effect in prosopagnosia: Evidence for mandatory, face-specific perceptual mechanisms. Vis. Res., 35(14): 2089-2093.
Farah, M.J., Wilson, K.D., Drain, M. and Tanaka, J. (1998). What is “special” about face perception? Psychol. Rev., 105(3): 482-498.
Feinberg, T.E., Schlinder, R.J., Ochoa, E., Kwan, P.C. and Farah, M.J. (1994). Associative visual agnosia and alexia without prosopagnosia. Cortex, 30(3): 395-411.
Fodor, J. (1983). Modularity of Mind. MIT Press: Cambridge, MA.
Gauthier, I. and Tarr, M. (1997). Becoming a “greeble” expert: Exploring mechanisms for face recognition. Vis. Res., 37(12): 1673-1682.
Gauthier, I., Tarr, M.J., Anderson, A.W., Skudlarski, P. and Gore, J.C. (1988). Expertise training with novel objects can recruit the fusiform face area. Soc. Neurosci. Abstr., 23: 868-5.
Humphreys, G.W. and Riddoch, M.J. (1987). To See but Not to See. Erlbaum: Hillsdale, NJ.
Jacobs, R.A., Jordan, M.I., Nowlan, S.J. and Hinton, G.E. (1991). Adaptive mixtures of local experts. Neural Computation, 3: 79-87.
Jacobs, R.A. and Kosslyn, S.M. (1994). Encoding shape and spatial relations - The role of receptive field size in coordinating complementary representations. Cogn. Sci., 18(3): 361-386.
Jacobs, R.A. (1997). Nature, nurture and the development of functional specializations: A computational approach. Psychonom. Bull. Rev., 4(3): 299-309.
Johnson, M., Dziurawiec, S., Ellis, H. and Morton, J. (1991). Newborns’ preferential tracking of face-like stimuli and its subsequent decline. Cognition, 40: 1-19.
Johnson, M.H. (1997). Developmental Cognitive Neuroscience. Blackwell: Cambridge, MA.
Johnson, M.H. and Morton, J. (1991). Biology and Cognitive Development: The Case of Face Recognition. Blackwell: Oxford, UK.
Jones, J. and Palmer, L. (1987). An evaluation of the two-dimensional Gabor filter model of receptive fields in cat striate cortex. J. Neurophysiol., 58(6): 1233-1258.
Jordan, M. and Jacobs, R. (1995). Modular and hierarchical learning systems. In: Arbib, M. (Ed.), The Handbook of Brain Theory and Neural Networks. MIT Press, Cambridge, MA.
Juola, P. and Plunkett, K. (1998). Why double dissociations don’t mean much. In: Proceedings of the 20th Annual Conference of the Cognitive Science Society, Hillsdale, NJ: Erlbaum, pp. 561-566.
Kohonen, T. (1995). Self-organizing Maps. Springer Verlag: Berlin.
Lades, M., Vorbrüggen, J.C., Buhmann, J., Lange, J., von der Malsburg, C., Würtz, R.P. and Konen, W. (1993). Distortion invariant object recognition in the dynamic link architecture. IEEE Transac. Comp., 42(3): 300-311.
McCarthy, G., Puce, A., Gore, J.C. and Allison, T. (1997). Face-specific processing in the human fusiform gyrus. J. Cogn. Neurosci., 9(5): 605-610.
McNeil, J.F. and Warrington, E.K. (1993). Prosopagnosia: A face-specific disorder. Quarterly J. Exp. Psychol., 46A: 1-10.
Moscovitch, M., Winocur, G. and Behrmann, M. (1997). What is special about face recognition? Nineteen experiments on a person with visual object agnosia and dyslexia but normal face recognition. J. Cogn. Neurosci., 9(5): 555-604.
Okada, K., Steffens, J., Maurer, T., Hong, H., Elagin, E., Neven, H. and von der Malsburg, C. (1998). The Bochum/USC face recognition system and how it fared in the FERET phase III test. In: H. Wechsler, P.J. Phillips, V. Bruce, F. Fogleman Soulie and T. Huang (Eds.), Face Recognition: From Theory to Applications. NATO ASI Series F. Springer-Verlag.
Parker, D.M., Lishman, J.R. and Hughes, J. (1996). Role of coarse and fine spatial information in face and object processing. J. Exp. Psychol.: Hum. Percept. Perform., 22(6): 1445-1466.
Pascalis, O., de Schonen, S., Morton, J., Deruelle, C. and Fabre-Grenet, M. (1995). Mother’s face recognition by neonates: A replication and an extension. Inf. Behav. Dev., 18: 79-85.
Plaut, D.C. (1995). Double dissociation without modularity: Evidence from connectionist neuropsychology. J. Clin. Exp. Neuropsychol., 17(2): 294-321.
Schyns, P.G. and Oliva, A. (1999). Dr. Angry and Mr. Smile: When categorization flexibly modifies the perception of faces in rapid visual presentations. Cognition, 69(3): 243-265.
Sergent, J., Ohta, S. and MacDonald, B. (1992). Functional neuroanatomy of face and object processing. A positron emission tomography study. Brain, 115: 15-36.
Tanaka, J. and Sengco, J. (1997). Features and their configuration in face recognition. Memory and Cognition, 25(5): 583-592.
Tanaka, J.W., Kay, J.B., Grinnell, E., Stansfield, B. and Szechter, L. (1998). Face recognition in young children: When the whole is greater than the sum of its parts. Vis. Cogn., 5(4): 479-496.
Teller, D., McDonald, M., Preston, K., Sebris, S. and Dobson, V. (1986). Assessment of visual acuity in infants and children: The acuity card procedure. Dev. Med. Child Neurol., 28(6): 779-789.
Turk, M. and Pentland, A. (1991). Eigenfaces for recognition. J. Cogn. Neurosci., 3: 71-86.
Wiskott, L., Fellous, J.-M., Krüger, N. and von der Malsburg, C. (1997). Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7): 775-779.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.

CHAPTER 11

Functional brain imaging and modeling of brain disorders

M.-A. Tagamets1,* and Barry Horwitz2
1Georgetown Institute for Cognitive and Computational Science, Georgetown University School of Medicine, Washington, DC 20007, USA
2Laboratory of Neurosciences, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
Introduction

Brain disorders generally can be divided into two types that represent the ends of a somewhat continuous spectrum. There are those in which specific brain regions have readily definable lesions, such as may come from strokes or tumors, and some neurodegenerative disorders such as Parkinson’s disease. The second variety of brain disorder has no obviously apparent lesion, although microscopic abnormalities can sometimes be detected. Included in this category are many developmental disorders such as autism and dyslexia, and numerous psychiatric syndromes such as schizophrenia and depression. Many investigators have come to believe that symptoms observed in the second type of disorder are to be understood in terms of abnormal functional connectivity between brain components (e.g. perhaps due to abnormal neurotransmission or abnormal synaptic connections) (Horwitz et al., 1991; Dolan and Friston, 1997). However, because every brain region receives connections from, and sends connections to, other brain regions, the symptoms observed even in the first type of disorder can be thought of as indicative of abnormal connectivity. Thus, if region A has a lesion, not only does this mean that region A can no longer correctly perform its neural computations, but the areas that are the targets of region A will perform incorrect computations. Because of these complex interregional interactions, it has been difficult to understand the symptomatology of brain disorders in neurobiological terms.

The advent of human brain imaging methods has provided the opportunity for studying the living brain, in both normal subjects and those with brain disorders. One of the unique features of techniques such as positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) is that data are obtained from essentially the entire brain simultaneously. These potentially rich data sets can be explored with many different methods of computational analysis. To date, the most common method that has been used for analyzing these data has been the subtraction paradigm (Posner et al., 1988). This involves directly comparing the activations produced by two or more different cognitive tasks, and taking the regions where differences in activations occur as an indicator of functional specialization for the cognitive component(s) of interest. However, functional neuroimaging data can also be used to examine how different brain regions interact with one another during the performance of specific cognitive tasks, but it is only relatively recently that methods for assessing interregional interactions using functional brain imaging data have become routinely available (Horwitz et al., 1984; Moeller et al., 1987; Horwitz et al., 1992a,b; Friston et al., 1993; Horwitz, 1994; McIntosh and Gonzalez-Lima, 1994; McIntosh et al., 1994; Buechel and Friston, 1997). Some of these methods can be viewed as representing a kind of systems-level neural modeling. A way to relate these systems-level results to their neurobiological substrates involves the use of large-scale neural models, a method that has also been developed only recently (Arbib et al., 1995; Tagamets and Horwitz, 1998).

In this paper, we will first review some of the methods that have been used for analysis of functional brain imaging data. This will be followed by a description of neural modeling that can be used to provide the basis for understanding the functional networks mediating specific cognitive tasks in normal subjects and in patients with brain disorders.

*Corresponding author. Tel.: 202-687-2724; e-mail: [email protected]
Functional neuroimaging data

Types of functional brain imaging

PET and fMRI are hemodynamic/metabolic methods that are based on the notion that neural activity leads to an increase in both cerebral blood flow and brain oxidative metabolism (Roy and Sherrington, 1890; Siesjo, 1978; Roland, 1993; Frackowiak et al., 1997). Oxidative metabolism is thought to be needed to restore ionic concentrations following neural activity, with the greatest effects apparently occurring in the vicinity of synapses (Mata et al., 1980). The increase in blood flow is conjectured to supply glucose and oxygen to the activated tissue. A commonly used method for measuring regional cerebral blood flow (rCBF) with PET involves the bolus injection of H₂¹⁵O. Because ¹⁵O has a half-life of 123 s, multiple scans (e.g. 6-12), each representing a different cognitive condition, can be performed in the same scanning session. The time interval over which a single scan is performed is about 1 minute, and the spatial resolution of a PET image is about 5-6 mm.

For fMRI, the most commonly employed method for assessing brain functional activity is called BOLD (blood oxygenation level-dependent contrast), and it measures the changes in blood oxygenation and blood volume resulting from changes in neural activity. Deoxygenated hemoglobin acts as an endogenous paramagnetic contrast agent. Increased blood flow reduces the local concentration of deoxygenated hemoglobin, causing an increase in the MR signal on a T2*-weighted image (Ogawa et al., 1993; see Turner, 1995 and Posse et al., 1996 for reviews). The temporal resolution of fMRI is much faster than for PET, on the order of a few seconds. Unlike PET, the temporal resolution of fMRI seems to be limited by the slowness of the hemodynamic response to a change in brain state (about 5-8 s), rather than by the time needed to acquire the MR image (about 1 s for a whole-brain scan). fMRI images also have better spatial resolution than do those obtained from PET, about 2 mm within-plane. In the time domain, fMRI differs from PET in that many data points can be acquired from a single subject during a relatively short period of time, yielding a time series of activity. Thus, unlike PET where group analyses are the most common, fMRI permits single subject studies.

Two paradigms for analyzing functional brain imaging data

Two fundamental assumptions, each of which leads to a different data analysis strategy, determine how functional neuroimaging data are used to make inferences about the brain processes involved in cognition and sensorimotor function. The first is functional specialization, which posits that different brain regions are engaged for different functions (i.e., computations) (Zeki, 1990). This view has led to an analysis method that is referred to as the subtraction paradigm (Posner et al., 1988; Horwitz, 1994). In functional neuroimaging studies using PET/fMRI, this assumption is implemented by comparing the functional signals between two (in its most simple formulation) scans, each representing a different experimental condition. The locations of statistically significant differences in signal between the two presumably delineate the brain regions differentially involved in the two conditions. For example, if the two conditions differ by the presence of a single cognitive operation (e.g. working memory) in one compared to the other, then the brain areas ‘activated’ would, it is assumed, be associated with that cognitive operation.
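In its simplest form, the subtraction paradigm amounts to a region-by-region comparison of two conditions across subjects. The sketch below illustrates the idea with a paired t statistic; the region names, the signal values one would pass in, and the threshold are hypothetical, and a real analysis would also correct for multiple comparisons, which is not shown here.

```python
import math


def paired_t(task, control):
    """Paired t statistic for one region across subjects.

    Assumes at least two subjects and non-identical differences
    (a zero variance would cause a division by zero).
    """
    diffs = [a - b for a, b in zip(task, control)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)


def subtraction_map(task_scans, control_scans, threshold):
    """Return {region: t} for regions whose |t| reaches the threshold.

    task_scans and control_scans map a region name to a list of signal
    values, one per subject, in the same subject order.
    """
    result = {}
    for region, task in task_scans.items():
        t = paired_t(task, control_scans[region])
        if abs(t) >= threshold:
            result[region] = t
    return result
```

Regions returned by `subtraction_map` would be the candidates for being ‘activated’ by whatever cognitive operation distinguishes the two conditions.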
The second premise takes a more distributed view of cortical function and leads to what has been termed the covariance paradigm (Horwitz, 1994). Here, it is assumed that the task represented by an experimental condition is mediated by a network of interacting brain regions, and that different tasks are mediated by different, but potentially overlapping, functional networks (Mesulam, 1990; Damasio, 1989; Horwitz et al., 1992a,b). Thus, by examining the covariances in brain activity between different brain areas, one can infer something about which areas are important nodes in the networks under study as well as about how these nodes are functionally connected. Functional neuroimaging methods obtain data simultaneously from multiple brain regions, and therefore are ideal for use with the covariance paradigm (Horwitz et al., 1992b; Friston et al., 1993). These two paradigms, subtraction and covariance, complement one another, and both are necessary to get a clear picture of how the brain works.

Functional connectivity analysis

There are a number of different approaches that have been utilized for assessing the functional connectivity of the human brain from neuroimaging data, including simple correlation analysis (Horwitz et al., 1984; Horwitz et al., 1992a), principal component analysis and eigenimage analysis (Moeller et al., 1987; Friston et al., 1993; Friston, 1994; Lagreze et al., 1993). In its most general form, the covariance paradigm encompasses all of these, since the interregional correlation between two brain regions is the critical quantity that underlies these methods. Determining this correlation requires obtaining the functional activities simultaneously from the two regions. With PET, multiple subjects are scanned, and a subject-to-subject variation in response to the task forms the basis for the interregional covariances. That is, if one subject utilizes a specific region more than a second subject, and if that region is a critical node in the network used to perform the task, then other critical nodes in the network will also show greater activity in the first compared to the second subject. The net result is a strong correlation between activities of the different brain regions that comprise the network when compared pairwise across subjects. Unlike the subtraction paradigm, this type of analysis does not compare tasks prior to data analysis; instead, it analyzes what the brain is doing during each task of interest, and then looks for differences. The problem of knowing that two tasks differ by only one cognitive operation is eliminated, although this method has the disadvantage that significant interregional relationships may be present due to components of task performance that are not of interest (e.g. attending to the task, making a finger movement to indicate a response). Hence, both the subtraction and covariance analysis methods are needed to extract the important features of functional neuroimaging data.

A recent investigation of the functional connectivity of the angular gyrus in the left hemisphere in normal male readers and dyslexic men during single-word reading tasks provides an example (Horwitz et al., 1998b). Using rCBF data obtained with PET, strong interregional correlations between rCBF in the left angular gyrus and rCBF in left extrastriate occipital and temporal cortex, Wernicke’s area, and prefrontal cortex just anterior to Broca’s area were found in normal subjects during the reading of pseudowords (e.g. theck) and exception words (e.g. yacht), as would have been expected based on studies of patients with acquired alexia (Dejerine, 1892; Henderson, 1986). However, these strong functional connections were absent in the dyslexic men, suggesting that there is a functional disconnection in developmental dyslexia that mirrors the anatomical disconnection of the angular gyrus found in some acquired reading disorders.
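The across-subject correlation analysis described above can be sketched as a seed-region computation: one value per subject per region, with the Pearson correlation taken between a reference region and every other region. The function names and the dictionary layout are assumptions made for illustration, and no significance testing is shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def seed_correlations(rcbf, seed):
    """Correlate the seed region with every other region across subjects.

    rcbf maps a region name to a list of rCBF values, one per subject,
    with subjects in the same order for every region.
    """
    return {region: pearson(rcbf[seed], values)
            for region, values in rcbf.items() if region != seed}
```

Running `seed_correlations` separately for each subject group (for example, with a left angular gyrus seed) and comparing the resulting correlation profiles corresponds to the kind of group contrast reported in the reading study above.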
Rather than just looking at covariances or correlations between specific regions, more complex techniques such as principal components analysis (PCA) and factor analysis can be used to partition the covariances between all region pairs into relatively independent systems that may represent networks mediating different components of the tasks under study. These types of analyses have been applied both to PET (Moeller et al., 1987; Friston et al., 1993; Lagreze et al., 1993) and to fMRI (Bullmore et al., 1996) data.
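As a rough illustration of how PCA partitions interregional covariances, the sketch below builds synthetic rCBF-like values for four hypothetical regions across 200 simulated subjects. Regions 0 and 1 share one latent source and regions 2 and 3 share another; both sources, the noise levels, and all function names are invented for illustration and are not taken from the cited studies. The leading component is extracted by power iteration, a generic method used here only to keep the example self-contained.

```python
import random

def covariance_matrix(data):
    """Sample covariance between regions; data is a list of per-subject rows."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
             for b in range(p)]
            for a in range(p)]

def leading_component(cov, iterations=200):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    p = len(cov)
    v = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    return eigenvalue, v

random.seed(0)
subjects = []
for _ in range(200):
    source_a = random.gauss(0.0, 3.0)  # hypothetical network shared by regions 0 and 1
    source_b = random.gauss(0.0, 1.0)  # hypothetical network shared by regions 2 and 3
    subjects.append([
        source_a + random.gauss(0.0, 0.3),
        source_a + random.gauss(0.0, 0.3),
        source_b + random.gauss(0.0, 0.3),
        source_b + random.gauss(0.0, 0.3),
    ])

cov = covariance_matrix(subjects)
eigenvalue, loadings = leading_component(cov)
variance_share = eigenvalue / sum(cov[i][i] for i in range(4))
print("loadings on first component:", [round(x, 2) for x in loadings])
print("share of total variance:", round(variance_share, 2))
```

The first component loads almost entirely on the two regions that share the stronger source, which is the sense in which such components "may represent networks". A real analysis would extract several components and would have to judge how many are interpretable.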
Structural equation modeling
Both subtraction and covariance methods analyze functional neuroimaging data descriptively, in the sense that they identify potentially important regions and interactions by examining all the data. In contrast, modeling explicitly chooses from the data the subset considered to be important, and tries to understand this reduced data set in a detailed and quantitative way. Two types of neural modeling have been applied to functional brain imaging data: (1) systems-level modeling (Horwitz, 1990, 1994; McIntosh and Gonzalez-Lima, 1991, 1994; Friston, 1994; McIntosh et al., 1994); and (2) large-scale neural modeling (Arbib et al., 1995; Tagamets and Horwitz, 1998). In the rest of this section, we review the use of structural equation modeling. The next section provides more details about how large-scale neural modeling can be used to enhance interpretation of functional imaging data.
Structural equation modeling is, in a sense, the natural extension of the covariance paradigm’s focus on the functional connectivity between brain regions. Large covariances in interregional activity can come about by both direct and indirect effects. That is, two regions may have a large correlation in activity if they are anatomically linked, and if that link is functional in a specific task. However, they could also have a large correlation, for example, if they are not directly connected but rather receive inputs from a common third region. Both direct and indirect effects may also operate at the same time. This presents a major problem for the covariance paradigm: how can network behavior be inferred from the covariances alone if direct and indirect effects cannot be separated? The solution to this problem employs a systems-level computational modeling technique that has been referred to as structural equation modeling or path analysis (McIntosh and Gonzalez-Lima, 1992; Horwitz and Sporns, 1994; McIntosh et al., 1994).
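The ambiguity between direct and indirect effects can be made concrete with a small synthetic example. In the sketch below, two hypothetical regions A and B have no direct link but both receive input from a third region C; the coupling strength (0.8) and the noise level are arbitrary assumptions. A first-order partial correlation is used as a simple stand-in for the idea of separating the common-input effect; it is not structural equation modeling itself.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5

random.seed(1)
n_subjects = 500
region_c = [random.gauss(0.0, 1.0) for _ in range(n_subjects)]
# A and B are each driven by C plus independent noise; there is no A-B link.
region_a = [0.8 * c + random.gauss(0.0, 0.5) for c in region_c]
region_b = [0.8 * c + random.gauss(0.0, 0.5) for c in region_c]

r_ab = pearson(region_a, region_b)
r_ab_given_c = partial_corr(region_a, region_b, region_c)
print("correlation A-B:", round(r_ab, 2))
print("partial correlation A-B given C:", round(r_ab_given_c, 2))
```

The raw A-B correlation is large even though the regions are unconnected, and it largely vanishes once C is taken into account. With more regions and known anatomy, this bookkeeping is what path analysis formalizes.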
Explicit data about the anatomical connections for a pre-specified set of brain regions are combined with their mutual interregional rCBF covariances. Structural equation modeling (Joreskog and Sorbom, 1979; Hayduk, 1987) makes use of these
constraints to determine the functional strengths of the anatomical links between regions that provide the closest match between the experimentally determined interregional covariances and those based on the computed functional strengths (these are sometimes called the effective connections (Friston, 1994)). The set of functional strengths defines the functional network corresponding to each task. A study of face discrimination in patients with mild Alzheimer’s disease provides an illustration of this type of modeling (Horwitz et al., 1995). In the first application of structural equation modeling to human functional brain imaging data, McIntosh et al. (1994) examined the effective connectivity along the ventral and dorsal visual processing pathways in young, healthy subjects in two tasks using rCBF data obtained with PET: (1) a face matching (object vision) task that emphasized use of the ventral visual processing stream (Ungerleider and Mishkin, 1982; Haxby et al., 1991); and (2) a location matching task that emphasized use of the dorsal visual processing stream (Ungerleider and Mishkin, 1982; Haxby et al., 1991). It was found that the functional network for the right cerebral hemisphere showed strong effective connections between occipital, ventral occipitotemporal, anterior temporal and frontal areas during the object vision task, whereas the strong effective connections in the spatial vision task included the links between dorsal occipital, parietal and frontal areas. A similar face matching task was used in a study comparing healthy elderly subjects with mildly demented Alzheimer patients (Horwitz et al., 1995). It was found that the old healthy controls had strong functional linkages in the same ventral network discussed above for young subjects, while the functional linkages in the Alzheimer group involving the ventral frontal area with posterior extrastriate regions were markedly reduced (see Fig. 1).
However, frontal areas showed more extensive positive correlations with other frontal areas in the Alzheimer patients than in controls. These results suggest that mildly demented Alzheimer disease patients, who were able to perform the face matching task with the same accuracy as the controls, were utilizing different neural circuits than were controls, thus
[Figure 1: path diagrams for Healthy Controls and DAT Patients. Legend: path coefficients, positive and negative, in three strength ranges (0.7 to 1.0, 0.4 to 0.6, 0.1 to 0.3).]
Fig. 1. A comparison of the functional connectivity of the right hemispheres in normal aged adults (Healthy Controls) and patients with mild Alzheimer’s disease (DAT patients) from Horwitz et al., 1995. Existence of connections between regions is based on known anatomical data, as shown in the figure; structural equation modeling allows relative strengths among these regions to be computed. Black arrows indicate positive coefficients, while gray arrows show negative coefficients. In addition to the differences shown in the figure, it was also found that correlations among frontal regions were higher in the Alzheimer’s patients than in normal controls. Numbers in boxes denote Brodmann areas.
emphasizing the critical role played by functional plasticity in early Alzheimer’s disease. Recently, structural equation modeling has also been applied to functional brain imaging data obtained by fMRI (Buechel and Friston, 1997; Bullmore et al., 1997). Because one gets many data points from the time courses of the fMRI activity, systems-level networks for individual subjects can be determined by using correlations between the time courses of fMRI activity in different spatial locations (e.g. Buechel and Friston, 1997).
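To show how anatomical constraints and covariances combine into path coefficients, the sketch below simulates time courses for three hypothetical regions A, B and C in a single subject, with assumed anatomical links A to B, B to C and A to C. The true direct A-to-C influence is set to zero. Each path coefficient is estimated by ordinary least squares of a region on its anatomical parents, which is a simplified stand-in for the simultaneous fitting performed by structural equation modeling software; all names and numerical values are assumptions made for illustration.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def path_coefficients(target, parents):
    """Least-squares coefficients of target on one or two parent time courses."""
    if len(parents) == 1:
        p = parents[0]
        return [cov(p, target) / cov(p, p)]
    p, q = parents
    spp, sqq, spq = cov(p, p), cov(q, q), cov(p, q)
    spt, sqt = cov(p, target), cov(q, target)
    det = spp * sqq - spq ** 2
    # Solve the 2 x 2 normal equations built from the covariances.
    return [(sqq * spt - spq * sqt) / det, (spp * sqt - spq * spt) / det]

random.seed(2)
n_scans = 400
a = [random.gauss(0.0, 1.0) for _ in range(n_scans)]
b = [0.7 * x + random.gauss(0.0, 0.5) for x in a]   # A -> B, assumed strength 0.7
c = [0.6 * x + random.gauss(0.0, 0.5) for x in b]   # B -> C, assumed strength 0.6

(a_to_b,) = path_coefficients(b, [a])
a_to_c, b_to_c = path_coefficients(c, [a, b])
corr_ac = cov(a, c) / (cov(a, a) * cov(c, c)) ** 0.5
print("A->B:", round(a_to_b, 2), " B->C:", round(b_to_c, 2), " A->C:", round(a_to_c, 2))
print("raw correlation A-C:", round(corr_ac, 2))
```

A and C are clearly correlated, yet the estimated direct A-to-C coefficient is close to zero because the covariance is carried by the route through B. This per-equation shortcut only suits models without feedback loops; networks with reciprocal connections need the full covariance-fitting approach described in the text.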
Large-scale neural modeling and functional brain imaging
Problems with relating functional brain imaging data to neural activity
A number of problems prevent functional imaging data from being interpreted directly in terms of the underlying neuronal activity (Horwitz, 1994; Horwitz and Sporns, 1994). First, the spatial resolution of human brain imaging devices is coarse compared with the size of neurons or cortical columns, which means that multiple and diverse neuronal populations are lumped together in any resolvable PET or fMRI region of interest (even a single voxel). Second, whereas the time resolution of neuronal events is on the order of milliseconds, the temporal resolution of the hemodynamic methods, even fMRI, is on the order of a few seconds. This implies that important transient components of activity are invisible to PET or fMRI. Third, activity measured in non-human animal studies by electrical recordings generally represents the firing of neurons (i.e., action potentials), whereas the hemodynamic measurements most likely reflect synaptic activity to a larger extent than neuronal activity (Mata et al.,
1980; Jueptner and Weiller, 1995). Because of this, both excitatory and inhibitory synaptic activity probably result in increased PET or fMRI activity, even if they cause a decrease in overall local neuronal firing rates (Ackermann et al., 1984; Horwitz and Sporns, 1994; Jueptner and Weiller, 1995). These problems can be addressed explicitly in a large-scale neural model by imposing constraints derived from other sources, such as non-human primate electrophysiological and anatomical data. This approach was used by Arbib et al. (1995) to examine a saccade generation task. We (Tagamets and Horwitz, 1998) developed a large-scale neural model that can perform a delayed match-to-sample (i.e., working memory) visual task in a PET setting.
Constructing a large-scale model of PET and fMRI
While neural networks traditionally have been used to model the qualitative behavior of biological systems, functional neuroimaging provides a wealth of quantitative data that can also be taken into account in a large-scale neural model. Accounting for quantitative data in a large-scale network requires identifying key factors that dominate changes in total synaptic activity, which determines blood flow, in addition to local firing rates, which act as a measure of the qualitative behaviors observed in the model. The main factors that we focus on are: (1) the balance of excitatory and inhibitory connections in local circuits; (2) the total balance of local vs. afferent synaptic activity; (3) the relative timing of the different components of the tasks; and (4) the nature of interregional connections, e.g. total strength, sparseness, and fanout. Since changes in each of these factors can to some degree be responsible for brain dysfunction, the relationship between these and resulting blood flow can help to understand and predict differences seen between patient and normal populations.
A bottom-up method is used in designing the network. The exact structure of a basic element of the network is chosen first, with local connectivity proportions determined from anatomical data in cat and monkey studies (Douglas et al., 1995). Then parameters are chosen to achieve target dynamic behaviors within a single basic element. Connecting groups of these elements into local regions and finally connecting the regions to one another creates the full network. Human imaging studies can then be simulated by presenting series of stimuli to the network and simultaneously following both the simulated electrical activity (e.g. neuronal firing rate) and the temporally and spatially summed absolute values of synaptic activity that represent the simulated PET/fMRI activity.
The spatial scale of a basic element of the model is chosen to be at the level of a cortical column, representing a local assembly whose elements tend to have similar responses (e.g. Hubel and Wiesel, 1977; Tanaka, 1993; Wilson et al., 1994). In visual areas, the size of these columns has been estimated to range from about 450 μm to 750 μm in diameter (Ts’o et al., 1986; Tanaka, 1993), which corresponds to a scale of about an order of magnitude below the resolution of PET data. A Wilson-Cowan unit (Wilson and Cowan, 1973) is used as the basic local element, with one unit representing the local excitatory population of neurons and another the inhibitory cell population within the column. Connection strengths within the element are based on anatomical data and reflect the relatively large contribution made by local excitatory-to-excitatory connections in the cortex. Local synaptic efficacies are fixed to 0.6, 0.15 and -0.15 for all excitatory-to-excitatory, excitatory-to-inhibitory and inhibitory-to-excitatory connections, respectively, within the basic element. Total afferents from other areas are limited to between 10% and 20% of all connections, a proportion that is in line with anatomical observations (Douglas and Martin, 1991; Douglas et al., 1995).
The resulting basic element is shown in Fig. 2(A). The circle labeled E represents the excitatory population and the circle labeled I indicates the inhibitory population. Activation rule: A sigmoid activation rule (Eqns 1 and 2) is used for updating the units within a basic element:
[Figure 2: panel A, the basic unit; panel B, a cortical area built from basic units.]
Fig. 2. (A) The basic unit of the model. The unit labeled E represents the excitatory population and I the inhibitory population in a local assembly such as a cortical column. Local synaptic activity is dominated by the local excitation and inhibition, while afferents account for the smallest proportion, as indicated by the synaptic weights shown. (B) A cortical area is modeled by one or more 9 × 9 sets of basic units. The excitatory population is shown in bold lines above the inhibitory group, shown in lighter lines. Individual units in the excitatory and inhibitory populations within a group are connected as shown in A of this figure.
$$\frac{dE_i(t)}{dt} = \Delta\left(\frac{1}{1 + e^{-K_E\,[\mathrm{in}_{E_i}(t) - \tau_E]}}\right) + N_E(t) - \delta E_i(t) \qquad (1)$$
$$\frac{dI_i(t)}{dt} = \Delta\left(\frac{1}{1 + e^{-K_I\,[\mathrm{in}_{I_i}(t) - \tau_I]}}\right) + N_I(t) - \delta I_i(t) \qquad (2)$$
$E_i(t)$ and $I_i(t)$ represent the electrical activations of the ith excitatory and inhibitory elements at time t, respectively, and $\mathrm{in}_{E_i}(t)$ and $\mathrm{in}_{I_i}(t)$ are the total inputs to those elements. $K_E$ and $K_I$ are the gains, or steepnesses, of the sigmoid functions for excitatory and inhibitory units, respectively, $\tau_E$ and $\tau_I$ are the input thresholds of excitatory and inhibitory units, $\Delta$ is the rate of change, $\delta$ is the decay rate, and $N_E(t)$ and $N_I(t)$ are noise terms. The main parameters are the thresholds ($\tau_E$ and $\tau_I$) and gains ($K_E$ and $K_I$). Parameter values are chosen by both analytical and empirical means in order to achieve desired dynamics within a single basic circuit (see Tagamets and Horwitz, 1998, for details). These are different for the excitatory and inhibitory units, with $\tau_E > \tau_I$ and $K_I \approx 2K_E$. Table 1 summarizes the values used for all the parameters and their effects on computed rCBF (see below) within a single basic unit.
Computing PET activations
The simulated blood flow $\mathrm{rCBF}_i(t)$ is the total synaptic activity within element i at time t. During simulations the total blood flow is computed by summing the absolute values of all incoming synaptic activities and integrating $\mathrm{rCBF}_i(t)$ over time:
$$\int \mathrm{rCBF}_i(t)\,dt = \int \sum_j \left| w_{ij}\,u_j(t) \right| dt \qquad (3)$$
where $w_{ij}$ is the connection weight from unit j to unit i and $u_j(t)$ is the current firing level activity in element j at time t. This is similar to the method used in Arbib et al. (1995) for computing simulated blood flow. A local area A in our model is created by a group of 81 basic elements arranged as shown in Fig. 2(B). These elements are arranged spatially in a 9 × 9 configuration, which represents a patch of cortex about 0.5 to 1 cm in diameter. In order to simulate an imaging (e.g. PET) study, total rCBF is computed from all elements in region A by summing equation 3 over all units i.
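The activation rule and the blood-flow computation can be sketched together for a single basic element. The code below integrates Eqns 1 and 2 with a forward Euler step and accumulates the absolute synaptic activity of Eqn 3. The gains, thresholds, rate constants and local weights are the values quoted in the text and Table 1, with the assignment of 9.0/0.3 to excitatory and 20.0/0.1 to inhibitory units inferred from the stated relations between them. The time step, the uniform noise model, the composition of the input terms, the choice to count synapses onto both populations, and the two afferent levels are assumptions made for this sketch and are not taken from Tagamets and Horwitz (1998).

```python
import math
import random

# Parameter values quoted in the text and Table 1 (E/I assignment inferred).
K_E, K_I = 9.0, 20.0
TAU_E, TAU_I = 0.3, 0.1
RATE, DECAY = 0.5, 0.5
W_EE, W_EI, W_IE = 0.6, 0.15, -0.15   # E-to-E, E-to-I, I-to-E efficacies
DT = 0.1                               # assumed Euler time step

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_element(afferent, steps=500, noise_amp=0.0, seed=0):
    """Integrate one E/I element; return final E, final I and integrated rCBF.

    `afferent` is the weighted external input to the excitatory unit.
    Noise is drawn uniformly from [-noise_amp, +noise_amp] (an assumption).
    """
    rng = random.Random(seed)
    e = i = 0.0
    rcbf = 0.0
    for _ in range(steps):
        in_e = W_EE * e + W_IE * i + afferent   # assumed total input to E
        in_i = W_EI * e                          # assumed total input to I
        # Eqn 3: absolute values of all incoming synaptic activity, summed.
        synaptic = abs(W_EE * e) + abs(W_IE * i) + abs(afferent) + abs(W_EI * e)
        rcbf += synaptic * DT
        de = RATE * sigmoid(K_E * (in_e - TAU_E)) + rng.uniform(-noise_amp, noise_amp) - DECAY * e
        di = RATE * sigmoid(K_I * (in_i - TAU_I)) + rng.uniform(-noise_amp, noise_amp) - DECAY * i
        e += DT * de
        i += DT * di
    return e, i, rcbf

e_rest, i_rest, rcbf_rest = simulate_element(afferent=0.05)   # hypothetical weak input
e_task, i_task, rcbf_task = simulate_element(afferent=0.40)   # hypothetical strong input
print("rest: E=%.2f I=%.2f rCBF=%.1f" % (e_rest, i_rest, rcbf_rest))
print("task: E=%.2f I=%.2f rCBF=%.1f" % (e_task, i_task, rcbf_task))
```

With noise switched off, the stronger afferent drives the excitatory unit to a high-activity state and yields a larger integrated rCBF than the weak input. Summing the same quantity over the 81 elements of an area would give the regional value described above.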
TABLE 1 Values of parameters that are used in the activation rule of the model, and effects on simulated rCBF that come from changing these values. Numbers in columns 1 and 3 (Value) give the parameter values that were used in all areas of the model, in excitatory elements and inhibitory elements, respectively. The Effect columns show the changes in computed rCBF that come from changing these values. These changes are computed from modeled rCBF in a single area that receives constant input to the excitatory element. Key: ↑ means that raising the parameter value raises computed rCBF in the area. ↓ means that raising the parameter value reduces computed rCBF in the area. NC indicates that the value of the parameter has no effect on computed rCBF. (From Tagamets and Horwitz, 1998.)

Parameter             Excitatory elements (Value)   Inhibitory elements (Value)
K (gain)              9.0                           20.0
Threshold             0.3                           0.1
Noise                 ±0.1                          ±0.1
Rate of change (Δ)    0.5                           0.5
Decay (δ)             0.5                           0.5

[The Effect columns ("Raising value causes rCBF to:") are not legible in this copy. The text states that raising K in the excitatory elements reduces computed rCBF.]

Example of effects of local connectivity and afferents
The basic element takes into account the relative importance that local circuits are thought to have in shaping a neuronal response within a local cortical region. This is particularly significant for the interpretation of imaging data, since, as discussed above, the hemodynamic measures from imaging data are thought to reflect mainly synaptic activity, and not neuronal firing rates per se. Thus, the most likely scenario for the observed hemodynamic response (Eqn 3) is that it is the result of an interaction between afferent activity and local response. Furthermore, in a circuit such as this, there is a non-trivial relationship between mean firing rate of the excitatory units in an area (i.e. pyramidal cells, those usually recorded in nonhuman electrophysiological studies) and total synaptic activity in the population of all cells. As an example (Tagamets and Horwitz, 1997), we used a single model area A for examining how the amount of local recurrent excitation can affect computed rCBF under different conditions, such as during a task and a passive control. In the absence of any local connectivity, all synaptic activity is from an external source. In this case, any afferent inhibition would raise overall synaptic activity. In a local circuit such as shown in Fig. 2A, however, it is not obvious what the net effect of inhibition
would be, since there is a great deal of recurrent connectivity within the element. We examined how inhibition would affect rCBF in the area A under a range of local excitatory-to-excitatory connection strengths, and at two different levels of excitation that would be similar to a task condition and a control condition. Simulations were performed by presenting a pattern to a fixed array of excitatory inputs that project to the excitatory population of the area and computing rCBF for the duration of the stimulus. Two types of patterns were used: (1) a subset of units were set to a high activity level to depict active input, such as that which may engage a region during an active task; and (2) all inputs were set to a uniformly low value, to simulate a low level of afferent input to the region. In order to examine how inhibition would affect blood flow during each of these conditions, each task was run once with added inhibition and once without. Inhibition was implemented as a low level of afferent excitation to all of the local inhibitory units in area A. Figure 3 shows the results of the simulations for each of the conditions. The x-axis represents excitatory-to-excitatory connection strength within the basic elements of the area, i.e. the value of the E-to-E connection strength that is shown as 0.6 in Figure 2A. Curves that correspond to the task condition are shown in solid lines, while the low activity condition curves are dashed. First, it is seen
from the rise in all curves that increasing the level of local recurrent excitation increases the local rCBF response, as would be expected. Note that all inputs are the same for all points along each curve, with the only difference being the amount of local recurrent excitation. Second, with low levels of local feedback excitation, below about 0.5 (50%) along the x-axis, both the high and low activity cases show a higher rCBF in the presence of inhibition (diamonds) than without inhibition (plain). This effect, i.e. increased rCBF caused by inhibition, has been suggested in both theoretical and experimental studies (Horwitz and Sporns, 1994; Jueptner and Weiller, 1995), and was also demonstrated in the model of Arbib et al. (1995). However, with higher levels of recurrent local excitation, this effect changes, so that excitation without inhibition (plain curves) results in higher computed rCBF than with inhibition (diamonds). This can be explained by the high degree of nonlinearity introduced by the recurrent connections. With low recurrent connections, the total
synaptic activity from inhibitory synapses exceeds the reduction caused by the lower activity of the E units. With high excitatory-to-excitatory feedback, inhibition reduces excitation in the E units by a much greater amount, lowering the total synaptic activity contributed by these recurrent connections by more than the extra synaptic activity added by the inhibition. These results suggest that inhibition may either lower or raise the hemodynamic response in a region, depending on the amount of local excitatory feedback in the region, and on whether the site is being activated from other sources. Such effects need to be explored in more detail before definitive interpretation of the neuronal dynamics underlying PET and fMRI data is possible. Furthermore, these considerations are especially relevant to the understanding of imaging results from patient populations, since the effects of focal lesions, in which whole populations of columns are removed, are likely to be quite different from those that result from more diffuse degenerative processes, where local connectivity is
[Figure 3: "Effect of Recurrent Excitation Strength on Modeled PET". X-axis: Recurrent Excitatory Connection Strength (0.00 to 0.75). Curves: Rest, Task, Rest with Inhibition, Task with Inhibition.]
Fig. 3. The effects of increasing local recurrent connectivity on simulated rCBF response to inhibition. Unmarked curves show the modeled rCBF activity without any added inhibition. Curves marked by diamonds show computed rCBF with extra inhibition. Solid lines show an active task condition, in which there is a moderately high extrinsic signal arriving into the area, such as during sensory input. The dashed curves show a resting condition, in which most activity is intrinsic, such as during a baseline resting condition. The X-axis indicates the strength of the excitatory-to-excitatory connection as percent of total connections. This determines the degree of local excitatory recurrence. The Y-axis shows computed modeled PET activity in a single model region. Note that between X = 0.5 and 0.7, inhibition has the opposite effects in rest and active conditions.
reduced, or from disconnection between areas, in which local circuitry may remain unchanged but total afferents are reduced.
Simulating a PET study
The difference between the temporal scale of underlying neuronal events and that of hemodynamic measures is a major confound for the interpretation of PET and fMRI. A human brain imaging study typically involves a number of separate cognitive processes, often mediated by different networks in the brain. Many of these also occur at different times during the study, and, especially in PET, are combined during a single scan. As an example, during a delayed match-to-sample task there are several qualitatively different phases during a single trial: (1) The cue period, when the initial stimulus is presented. This is usually on the order of one to a few seconds during which there is visual input that the subject must attend to and encode for later recall. (2) A delay period, during which there is no visual input and the subject presumably keeps the cue item in memory. This typically lasts from one to tens of seconds. (3) A test period, when a second stimulus is presented. At this time there is visual input that the subject must compare to the previously seen cue. (4) A decision event, when the subject makes a choice. (5) The inter-trial period, before the next trial begins. These phases are generally all too brief to be resolved by PET, and some are too brief even for fMRI. Single-cell recordings in non-human primates have made some progress in identifying different populations that mediate at least some of these phases in monkeys (e.g. Fuster, 1973, 1990; Haenny et al., 1988; Gallant et al., 1993; Wilson et al., 1993; Roe and Ts’o, 1995). Human brain imaging studies have identified areas of the human brain that activate presumably homologous regions (e.g. Corbetta et al., 1991; Sergent et al., 1992; Haxby et al., 1995; Courtney et al., 1996).
Monkey electrophysiology studies have also shown that there is a significant interaction between
perceptual response and cognitive states, such as attention (e.g. Fuster, 1973; Desimone and Schein, 1987; Haenny et al., 1988; Funahashi et al., 1990; Miller et al., 1996). However, the pathways that mediate such effects are not well known. Large-scale neural modeling can help to elucidate complex interactions such as those that result from recurrent feedforward-feedback loops that are thought to be operative in the brain.
Example: A network that simulates a PET match-to-sample study
In order to examine the effects of timing and the interaction of feedforward and feedback influences, we created a large-scale model that includes multiple brain regions along the ventral object vision pathway extending into the frontal lobe (Tagamets and Horwitz, 1998). In this case, the goal was to simulate a PET subtraction study of visual working memory, and to examine areas of the cortex that are specifically involved in short-term memory processing in the visual pathway. The network includes areas V1/V2, V4, TEO/IT and a prefrontal area. An additional 9 × 9 array is used to simulate input from the lateral geniculate nucleus (LGN). Stimuli are simple shapes constructed from line segments, and each region along the path from V1/V2 → V4 → TEO/IT responds to successively more complex features of the stimuli. A match-to-sample PET study is simulated by presenting a sequence of stimuli with delays between stimuli. During the simulations, the elements in the different regions of the model respond with electrical activity that is similar to that seen in electrophysiological studies. At the end of a series of repetitions of the task, i.e. after a simulated PET session, the total summed blood flow activity is similar to that seen in human PET studies of the delayed match-to-sample task. The prefrontal region of the model is composed of extended circuits that act as a working memory that mediates the task (see Fig.
4). Two subpopulations of elements mediate the matching task by holding a representation of the last seen visual stimulus in memory during a delay period, after which there is a response from another subpopulation of units only if the second stimulus matches the first.
[Figure 4: circuit diagram of the prefrontal working memory module, with feedback to earlier areas. Legend: C, cue-selective unit; D2, cue + delay activity; D1, delay-period activity only; R, response unit.]
Fig. 4. A working memory is added to the prefrontal areas of the model as a local circuit composed of different types of units, as identified in Funahashi et al. (1990). Each element of the circuit is a basic unit such as shown in Figure 2(A). Inhibitory connections are implemented by excitatory connections onto inhibitory units. An attentional modulation is modeled as a diffuse, weak activity directed into the D2 units. D1 and D2 units are also the source of feedback into earlier areas. Details of the connection weights are given in Tagamets and Horwitz, 1998.
One simulated scanning session is performed with a high level of modulatory input into the D2 units, thus representing a task in which matching is required. A second set of similar trials serves as a control task, in which the stimuli are similar to the first task, but there is no requirement for matching, and the stimuli are not held in memory. This is implemented by lowering the modulatory input into the D2 units. The number of activated units in the LGN area is the same in both tasks. In the match-to-sample task, the effect of interest is the mechanism of matching ability and the modulation of the delay period activity that activates the memory circuits. For more details on the construction of the network, see Tagamets and Horwitz (1998). Figures 5A and 5B show the time courses of the simulated electrical activities at low and high attention levels, respectively. The activities in the D1, D2 and Response units are similar to those found in Funahashi et al. (1990). A comparison of
the simulated rCBF results with experimental data is shown in Table 2. We found it necessary to add a connection from the prefrontal area to V4 in order to successfully simulate the PET activations that are generally seen in V4 during visual tasks requiring attention or memory. The need for such a connection was also found in the structural equation modeling study by McIntosh et al. (1994). Although a direct connection from PF to V4 is not known to exist, it is possible that this is an indirect path, for example through subcortical areas. A key conclusion from these simulation studies is that the observed results are due to an interaction of bottom-up and top-down effects. A neural model such as the one presented here can be used to examine these types of interactions in more detail, and help guide design of experiments in the future. This model has also been applied to simulation of fMRI and has helped to better understand how factors such as the hemodynamic delay affect the results (Horwitz et al., 1998a).
[Figure 5: panels A and B, each showing activity traces labeled Response, D2, D1, Cue, IT and V4.]
Fig. 5. Time course of electrical activities of sample units from different areas of the model during five trials of a task. Stimuli are indicated by the filled blocks in the timeline at the bottom of the figure. Each block corresponds to 1 second of simulated time. Trials are separated by dashed lines. (A) Low attention level. Electrical activity of a sample unit from each area, including each type of frontal unit in the memory circuit. (B) High attention level. Electrical activity of the same sample units that are shown in part A of this figure. Note the difference in activity patterns between the two types of delay units. Delay activity only occurs when the attention level is high, and the corresponding response unit shows a brief burst of activity only when the memory has been retained during the delay. During simulations, random noise is added at each iteration, yielding the variability in electrical responses seen in this figure.
TABLE 2 Comparison of simulated blood flow and experimental PET data. Data are normalized to the value of the high-attention V1 activations in each of the two cases (simulated and experimental). The experimental data are from those used in the paper by Haxby et al., 1995. (Adapted from Tagamets and Horwitz, 1998)

                    V1      V4      TEO/IT   Prefrontal
Simulated
  High attention    1.00    0.90    0.83     0.70
  Low attention     0.97    0.85    0.81     0.67
  Percent change    3.1     5.2     2.5      3.5
Experimental
  High attention    1.00    0.89    0.81     0.71
  Low attention     0.97    0.83    0.77     0.68
  Percent change    2.7     8.1     4.2      4.1
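One simple way to quantify the agreement in Table 2 is to compare the regional profiles of percent change. The sketch below takes the percent-change rows as they appear in the table, assuming the first block of numbers is the simulated data and the second the experimental data, and computes their Pearson correlation across the four regions. This summary statistic is offered only as an illustration and is not an analysis reported by the authors.

```python
REGIONS = ["V1", "V4", "TEO/IT", "Prefrontal"]
# Percent-change rows of Table 2 (high vs. low attention).
SIMULATED = [3.1, 5.2, 2.5, 3.5]
EXPERIMENTAL = [2.7, 8.1, 4.2, 4.1]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

profile_correlation = pearson(SIMULATED, EXPERIMENTAL)
peak_simulated = REGIONS[SIMULATED.index(max(SIMULATED))]
peak_experimental = REGIONS[EXPERIMENTAL.index(max(EXPERIMENTAL))]

print("correlation of percent-change profiles:", round(profile_correlation, 2))
print("largest simulated change in:", peak_simulated)
print("largest experimental change in:", peak_experimental)
```

Both profiles peak in V4 and are positively correlated. With only four regions, however, the correlation is a coarse descriptive summary and not a statistical test of the model.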
Discussion
Maximizing our understanding of human brain imaging data requires the application of diverse and complementary techniques. The subtraction paradigm is useful for identifying specific regions that may be involved in a cognitive process of interest. As a further step along the spectrum of methods, correlation techniques have been applied in order to identify assemblies of areas that may serve as a network mediating each task under investigation. Structural equation modeling extends the use of correlations in a setting that is constrained by knowledge of anatomical connection patterns, resulting in a systems-level model functional network for each cognitive task under study. As suggested by the Alzheimer example given above, structural equation modeling has the potential to demonstrate that a patient group is not using the same functional network as are healthy subjects to perform a cognitive task, even if task performance is the same in both groups. Extending this method to fMRI single subject analyses may have diagnostic usefulness. In particular, for neurodegenerative diseases, the brain’s compensatory mechanisms may result in no obvious clinical abnormalities even if brain pathology has started to occur. Examining the functional networks mediating appropriate cognitive tasks may indicate that brain pathology has become sufficiently extensive that compensatory cognitive mechanisms are being utilized in at-risk subjects.
Large-scale neural modeling can make explicit use of both anatomical and single-cell data to clarify how the results of the foregoing methods might be interpreted in terms of underlying neuronal events. For example, inhibition has been hypothesized to raise rCBF and BOLD, making it unclear how to interpret either local increases or the polarity of correlations. The results of the first simulation study in the previous section suggest that this can be a complex matter that depends on an interaction between local connectivity and task. For example, if the local connectivity is reduced in an area, as it may be in some degenerative disorders, our modeling results indicate that tonic inhibition can produce increases in blood flow under a range of activity levels of the area; with normal local connectivity, however, inhibition may cause decreased rCBF if this area is engaged by a task, and increased rCBF if the task results in little afferent input to the region. With a large-scale model, explicit hypotheses can be examined in different contexts by varying the effects of parameters that correspond to biological substrates such as synaptic density or receptor efficacy. Manipulating these parameters in a way that changes simulated firing rates to correspond to observed single-cell recordings, then observing the effect on simulated blood flow, may yield insights and help guide experimental design of human studies. The gain parameter K, for example, has the effect of increasing signal-to-noise ratio of the simulated electrical activity,
which is similar to the proposed effect of dopamine. At the same time, increasing K in the E units of the model reduces the simulated rCBF in the region. Such manipulations can be used to examine both normal function and intervention studies, such as pharmacological intervention, with applicability to a variety of cognitive disorders.
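The effect of a gain parameter on signal-to-noise ratio can be illustrated with a short sketch. The model equations are not reproduced in this section, so everything below is an assumption made for illustration: a logistic activation function, a threshold of 0.5, and the two input levels. The function names `sigmoid` and `output_contrast` are hypothetical, and simulated rCBF is not modeled.

```python
import math

def sigmoid(x, gain, threshold=0.5):
    """Logistic activation; `gain` stands in for K (assumed functional form)."""
    return 1.0 / (1.0 + math.exp(-gain * (x - threshold)))

def output_contrast(gain, signal=0.7, background=0.3):
    """Difference between the responses to a strong and a weak input.

    A larger difference means the strong input stands out more clearly
    from the weak one, a crude proxy for signal-to-noise ratio.
    """
    return sigmoid(signal, gain) - sigmoid(background, gain)

# Illustrative gain values only; they are not taken from the chapter.
for k in (2.0, 5.0, 10.0):
    print(f"K = {k:4.1f}  contrast = {output_contrast(k):.3f}")
```

In this sketch, raising the gain steepens the activation function around the threshold, so inputs above threshold are pushed toward saturation and inputs below it toward zero.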
SECTION III
Neurology
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 12
Unmasking unmasked: neural dynamics following stroke
William W. Lytton*, Samuel T. Williams and Samuel J. Sober
Department of Neurology, Neuroscience Program, University of Wisconsin, Wm. S. Middleton VA Hospital, 1300 University Avenue, MSC 1715, Madison, WI 53706-1532, USA
*Corresponding author. Tel.: (608) 265 3524; Fax: (608) 262 2327; e-mail: bilY/@neurosim.wisc.edu
Anatomy is not destiny
Recovery from disease involves the interaction of the biological substrate with behavior. Nowhere is this more clearly seen than in the setting of a neurological disease such as stroke, where mind and behavior are directly, though sometimes subtly, affected. During recovery from stroke, a large web of causal chains leads to an improved condition or to continued disability. Recovery starts with the mental and physical process of rehabilitation therapy, which alters the physical state of the brain as it teaches new strategies and enhances muscle tone. Subsequently, the altered brain patterns in part determine what further rehabilitative actions the patient is capable of, leading to a vicious or virtuous cycle. Although many aspects of these healing cycles are beyond our control and will likely remain so, there are points at which we can intervene. In order to achieve the greatest rehabilitative effect, we can choose the timing and type of rehabilitation strategy and pharmacotherapeutic measures that will produce desirable brain changes. The difficulty in connecting brain and behavior is a prominent part of the more general problem of causality in neuroscience. Assigning cause is difficult because of the enormous complexity of this organ. Like the better understood liver or lung, the brain is studied using a variety of techniques
that necessarily focus on only one aspect of that organ, whether molecular, anatomical, biophysical or physiological. In the case of liver or lung, however, it is not unusual to be able to explain some global function singly, with a molecular, cellular, or anatomical explanation. In the brain, by contrast, anatomy is not destiny, nor is molecular biology destiny. Complex interactions among different levels of organization are responsible for the remarkable functionality of the brain. In the dynamics of a cortical map, there are two major levels of operation. Cellular interactions depend on both anatomical connectivity and on the dynamics of physiology. While neural activation happens very quickly, on the order of milliseconds, activity-dependent synaptic reorganization occurs over hours or days. Though occurring on vastly different timescales, these two mechanisms interact. Synaptic change will occur on the basis of coincident neural activation, and neural activation will be determined by the underlying anatomical connections. In order to determine how the adult brain compensates when damage occurs, we study the interactions among structure, physiology, and behavior. Recovery involves plasticity, hence anatomical change; the anatomical factors cannot be understood without reference to physiology and behavior. Nor can they be understood without reference to the underlying molecular receptor level, since these receptors will be the target of pharmacotherapeutic intervention in the recovery process. The goal of computer modeling in our
studies is to understand the interaction among levels of organization, as cells determine networks and networks behavior (Sejnowski et al., 1988). These interactions are not unidirectional; anatomy determines physiology, which determines behavior, but behavior also determines physiology, which determines anatomy.
Plasticity in the adult brain
The goal of our research is to discover and take advantage of causal links that help or hinder recovery from stroke. To this end, we must devise a model that can account for what is seen in the physiology of animals and patients, and the changes they undergo before and after stroke. Models give us a quantifiable way to evaluate qualitative hypotheses concerning the mechanisms underlying physiological changes observed experimentally. They also suggest new hypotheses through the emergence of unexpected and unintended consequences. In the end, such models will lead to a deeper understanding of the processes of adult plasticity, suggesting better rehabilitation techniques for those suffering from impaired function. Cortical damage is often the result of cerebrovascular accident, which leads to the death of neural tissue. The vascular system has an organization and structure that is largely independent of the underlying brain regions. Since blood vessels are superimposed onto the nervous system, damage to the vessels affects parts of several functional neural areas rather than a single functional area in its entirety. Because of this, pure cases of single-region damage are very rare, and the majority of strokes lead to deficits in many tasks. To recover function, the remaining neural tissue must demonstrate plasticity. Plasticity in the youthful brain has been well demonstrated, but our understanding of adult mechanisms is still lacking. The studies of Hubel and Wiesel firmly established the concept of a critical period in early brain development (Hubel, 1996; Hubel and Wiesel, 1998). Studies have shown that ocular dominance, the ability to detect edges, and general cortical mappings can be established only during a critical
developmental stage (Miller, 1995). Such research suggested that primary cortical organization is, for the most part, fixed from an early age. Of course, adult cortex is remodeling all the time, forming new memories and acquiring new abilities. Recent studies of somatosensory and visual areas have also demonstrated adult plasticity at the cellular level. However, fundamental change is probably seldom called for outside of the domain of ablative diseases such as stroke (Nudo et al., 1996). The existence of plasticity in the adult brain has been primarily demonstrated in three ways: peripheral manipulation, intracortical microstimulation, and animal models of stroke. Most of these studies rely on the alteration of stimulus characteristics; for example, the removal of a finger or the presence of constant stimulation (Merzenich and Kaas, 1982; Wang et al., 1995; Buonomano and Merzenich, 1998). In general, cortical reorganization will occur so that heavily stimulated areas will see an increase in the size of their cortical maps, and weakly stimulated areas will see a reduction in representation (Clark et al., 1988; Robertson and Irvine, 1989; Kaas et al., 1990; Allard et al., 1991; Gilbert and Wiesel, 1992; Weinberger et al., 1993). Intracortical microstimulation shows another form of plasticity, in which stimulation of a cortical site will cause neurons in the vicinity to have receptive fields similar to the stimulated neurons (Dinse et al., 1990; Nudo et al., 1990; Smits et al., 1991; Recanzone et al., 1992). In animal models of stroke, the receptive fields of surviving cells expand, contract, and shift so as to occupy the space previously covered by the dead cells (Yamasaki and Wurtz, 1991; Alamancos and Borrel, 1995; Sober et al., 1997; Lytton et al., 1999).
The basic model
Computational studies have assumed that recovery from cortical damage, recovery from peripheral damage, or response to a change in stimulus pattern of any kind would occur in two phases that correspond to the vastly different time constants of neural response and synaptic plasticity. This assumption has been partially confirmed by studies of long-term potentiation, occurring over a timeframe several orders of magnitude greater than that
of neural activation. Animal studies suggest that behavioral recovery may take place in two or more phases, consistent with this assumption (Nudo et al., 1996; Nudo and Milliken, 1996). According to the basic computational model, these two phases can be grossly characterized as: (1) dynamics; and (2) plasticity (Armentrout et al., 1994; Goodall et al., 1997). In the dynamic phase, changes in cell activity are due to the alteration in inputs to the remaining cells following an ablation. In the plastic phase, these alterations in cell activity start to change synaptic strengths, presumably through a Hebbian process relating strength increases to paired presynaptic and postsynaptic activity. The middle temporal visual area (area MT) has neurons that respond to visual motion. Thus, these cells do not respond to stationary stimuli but do respond when something moves in the visual field. Like cells in other visual areas, these cells have retinotopic tuning, that is, an individual cell will respond to a stimulus in one area of the visual field but will not respond to stimuli in other areas. Unlike cells in other visual areas, which may also be tuned to stimulus orientation or color, these cells are tuned to the attributes of visual motion: speed and direction. Small ablations of area MT lead to changes in the receptive fields of neighboring, nonablated cells and to changes in the ability of an animal to perceive motion (Wurtz et al., 1990; Yamasaki and Wurtz, 1991). In order to construct simple computer models of cortical dynamics, it is useful to generalize the concepts of receptive field and projective field (Fig. 1). Experimentally, a receptive field is the set of stimuli that will activate a particular neuron. In visual area MT, this receptive field includes both the retinotopic receptive field and the receptive field for speed and direction. In a sensory system, the projective field does not always have a clear operational definition. 
Ideally, the projective field of an elemental stimulus would correspond to the set of neurons activated. However, it is not generally possible to define an elemental stimulus nor to measure from multiple cells in order to determine which ones have been activated. Our basic neural network model for exploring receptive fields consists of input and processing layers (Fig. 1, see Appendix for details) (Miller et
al., 1989; Miller, 1994). The many layers of brain that lie between photoreceptor and high-order cortical area are collapsed into these two layers. Because of this foreshortening, the input layer is alternately conceived of as a primary receptor array for the purpose of mapping receptive fields, or as a thalamic or lower-level cortical area for the purpose of assessing cellular interactions. In our simulations we consider the input layer to be V1 (primary visual area) and the main processing layer to be area MT. The V1 layer is simply a set of input values and not a set of processing units. Each cell in the MT layer receives stimulation from many cells in the V1 layer. This convergence can be regarded as an 'anatomical receptive field'. Correspondingly, the divergence from a single V1 layer unit to a set of MT units can be regarded as an 'anatomical projective field'. The physiological responses of individual neurons in the network are not fully determined by these feedforward projections from V1 to MT, however. Within the MT layer, there are lateral connections which provide proximal excitation and more distant inhibition in a center-surround or 'Mexican hat' pattern (Fig. 1(B)). Activity mediated through these connections will interact with feedforward activity to produce excitation of the MT units. Because of the complexity of these interactions, it is not possible to determine the activity patterns by reference to the connectivity alone. Instead we must run simulations, analogs of physiological experiments, in order to measure receptive fields by mapping all stimulus locations that will produce suprathreshold responses in a specified MT cell.
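The two connectivity profiles described above can be written down in a few lines. This is a minimal sketch and not the authors' implementation (their details are in the Appendix): it assumes connection strength depends only on the distance between units, and the spread `sigma`, the cut-off `radius`, the weights `w_exc` and `w_inh`, and the inhibitory range `inh_radius` are illustrative values chosen for the example. Excitation is restricted to nearest neighbors, as the chapter later states for its model.

```python
import math

def feedforward_weight(distance, sigma=2.0, radius=4):
    """Gaussian fall-off of V1 -> MT weights.

    Weights are zero beyond `radius`, which bounds the anatomical
    receptive field (and, seen from V1, the anatomical projective field).
    """
    if abs(distance) > radius:
        return 0.0
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))

def lateral_weight(distance, w_exc=0.05, w_inh=0.08, inh_radius=6):
    """'Mexican hat' profile within MT.

    Nearest neighbours excite, a wider surround inhibits, and units
    beyond the surround (or the unit itself) have no lateral effect.
    """
    d = abs(distance)
    if d == 1:
        return w_exc
    if 2 <= d <= inh_radius:
        return -w_inh
    return 0.0

# Lateral profile seen by one MT unit, from 7 units left to 7 units right.
profile = [lateral_weight(d) for d in range(-7, 8)]
print(profile)
```

Keeping the summed magnitude of the lateral weights below 1 is a design choice of this sketch: it guarantees that a simple iterative relaxation of unit activities settles to a single steady state rather than oscillating.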
Receptive field changes following ablation
Physiologically, receptive field expansion and receptive field contraction are both observed in the surrounding cells that remain alive following an ablation (Fig. 2(A,C)). Technically, it is not possible to sample the same cell both pre- and postablation. Therefore, it is not possible to say whether a receptive field of a particular cell has become larger or smaller. One can, however, sample a large number of cells in the same area both before and after the ablation. Postablation receptive fields were in many cases substantially larger than any of
Fig. 1. (A) Computer model schematic: input layer (V1) and output layer (MT) are both two-dimensional hexagonally tessellated arrays of nodes. Projective fields (left) are defined by the spread of activity to nodes in the output layer following stimulation of a single node in the input layer. Receptive fields (right) are defined by units in the input layer that can stimulate a given output-layer unit to a suprathreshold response. (B) The strength of connections from input to output layer (left) falls off according to a Gaussian function of lateral distance. Lateral connectivity in the output layer is also non-uniform (right). The region surrounding the excited unit is laterally excited by the central node, but nodes further away are inhibited. The result is a 'Mexican hat' (sombrero-shaped) pattern of connection weights.
Fig. 2. (A) Receptive field (RF) expansion following ablation in visual area MT of macaque. The pre-lesion receptive area in V1 (dashed line) expands after the ablation (grey) in the MT layer. RFs are represented in degrees of visual space. Adapted from Sober et al., 1997, Fig. 2. (B) Postlesion expansion in the computer model. Before the lesion, the model shows a uniform RF in V1 around the MT unit (hash marks). After the lesion, the altered MT activation dynamics close to the lesion yield an increase in RF size (hexagons). (C) RF contraction following ablation in visual area MT of macaque. The pre-lesion receptive area in V1 (dashed line) contracts after the ablation (grey) in the MT layer. Adapted from Sober et al., 1997, Fig. 3. (D) The computer model also shows postlesion contraction. Again, the prelesion RF (hexagons) is uniform. After the lesion, the RF contracts (hash marks).
those seen before the lesion, suggesting that there had been receptive field expansion in many cells. Often this expansion was asymmetrical with a tendency to expand more in the direction of the lesion. Other postablation receptive fields were noted to be smaller than any seen preablation,
suggesting that others had contracted. These expansions and contractions were the basic data we sought to explain using the computer model. The basic model could easily produce expansion of receptive fields (Fig. 2(B)). This is the phenomenon of unmasking, the revealing of connections
whose functional consequences were previously not apparent due to coincident inhibition. Expansion occurred because even weak divergent input from the input layer could now excite units that were previously inhibited disynaptically by the cells within the lesion area. Unmasking made the physiological receptive field correspond more closely to the anatomical receptive field. Therefore, the maximum extent of unmasking was determined by the width of convergence from the input layer. In general, greater convergence allowed more receptive field expansion, hence more unmasking. In this initial model, expansions were relatively modest, far less than what the divergence from the input layer could permit. The model also produced receptive field contractions; these were also not very substantial (Fig. 2(D)).
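Unmasking can be reproduced in a toy simulation. The sketch below is not the authors' model: it assumes a one-dimensional line of 41 units rather than two-dimensional hexagonal arrays, a single-point stimulus, activities clipped to the range 0 to 1, and illustrative values for every weight, the threshold, the lesion position and the recorded unit. All names are hypothetical.

```python
import math

N = 41                                    # units per layer (1-D simplification)
SIGMA, FF_RADIUS = 2.0, 4                 # feedforward spread, anatomical field radius
W_EXC, W_INH, INH_RADIUS = 0.05, 0.08, 6  # lateral 'Mexican hat' parameters
THRESHOLD = 0.2                           # activity counted as suprathreshold

def lateral(d):
    """Nearest-neighbour excitation, wider surround inhibition."""
    if d == 1:
        return W_EXC
    if 2 <= d <= INH_RADIUS:
        return -W_INH
    return 0.0

# For each MT unit, the lateral inputs it receives as (source, weight) pairs.
NEIGHBOURS = {
    j: [(k, lateral(abs(j - k)))
        for k in range(max(0, j - INH_RADIUS), min(N, j + INH_RADIUS + 1))
        if k != j]
    for j in range(N)
}

def settle(stimulus, ablated=frozenset(), steps=100):
    """Relax MT activity to a steady state for a point stimulus in V1."""
    drive = [math.exp(-(j - stimulus) ** 2 / (2 * SIGMA ** 2))
             if abs(j - stimulus) <= FF_RADIUS else 0.0
             for j in range(N)]
    act = [0.0] * N
    for _ in range(steps):
        new = [0.0] * N                   # ablated units stay silent
        for j in range(N):
            if j in ablated:
                continue
            net = drive[j] + sum(w * act[k] for k, w in NEIGHBOURS[j])
            new[j] = min(1.0, max(0.0, net))
        act = new
    return act

def receptive_field(unit, ablated=frozenset()):
    """All stimulus positions that drive `unit` to threshold or above."""
    return [s for s in range(N) if settle(s, ablated)[unit] >= THRESHOLD]

lesion = frozenset(range(18, 23))         # hypothetical ablation of units 18-22
intact_rf = receptive_field(23)
lesioned_rf = receptive_field(23, lesion)
print("intact:  ", intact_rf)
print("lesioned:", lesioned_rf)
```

With these assumed parameters the unit next to the lesion gains one stimulus position on the side facing the lesion: a weak feedforward input that was previously cancelled by inhibition from the ablated units now reaches threshold. The expansion stays well inside the anatomical receptive field, which spans four positions on each side.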
The computer model explains these receptive field changes
In the premorbid state (before the lesion), the cells that would later be ablated provided near excitatory inputs and far inhibitory inputs. After the ablation these influences were lost and this produced the sequence
of effects that resulted in the expansion and contraction observed in the model (Fig. 3). The primary effect of lesion in our model was a loss of local excitatory and distal inhibitory connections originating from the ablated region. The excitatory influence was only nearest neighbor, so the majority of excitatory connections within the lesion targeted other ablated cells and their loss had little effect on the surrounding regions. The loss of the inhibitory connections, with their much greater divergence, produced disinhibition in the ablation surround, resulting in increased excitability of cells outside the lesion, with the effect decreasing with distance. This increased excitability led to increases in receptive field size, for inputs previously unable to yield a suprathreshold response now had less inhibition to counter. Expansion and contraction in the basic model represented an expanding wave of consequences from the loss of projections that had emerged from the lost neurons. The divergent excitatory stimulus from the input layer remained unchanged but the lateral forces were diminished, resulting in an imbalance that was rectified through a shift in responses until a new steady state was reached. (These models always reached a steady
Fig. 3. Schematic explanation of postlesion expansion (left) and contraction (right). A loss of inhibition from ablated regions in the cortex (black hexagons) leads to an increase in activity of MT units close to the ablation (light grey hexagon) over the premorbid state. Inputs from V1 which were not able to yield a suprathreshold response (left light grey lines) are now part of the node's receptive field (left light grey oval). The highly activated MT unit (light grey hexagon) causes greater inhibition on units farther away from the ablation (dark grey hexagon) as a secondary effect. The receptive field for these units will decrease in size, for only strong connections from V1 (thick lines) will yield a suprathreshold response.
state in our simulations; in general a dynamic system could settle into oscillations or chaos instead.) The hunt for the new steady state could be viewed as a negotiation, as activated units influence other units and are influenced in turn. The degree of disinhibition was not great enough to allow the weakest convergent feedforward projections to excite the MT unit. Although the degree of convergence (the size of the anatomical receptive field) is not an absolute limit for receptive field expansion, it is a practical limit. If inhibition is weak enough, then activity can spread laterally through the slice, allowing activation of MT units that do not receive any direct projections from V1. However, excessive lateral spread will lead to a hyperexcitable 'epileptic' network in which any stimulus will cause continuous activation of all cells. The increased activity levels of units with expanded receptive fields had a secondary effect, increasing the inhibitory influence on other units within their ring of inhibitory projection (Fig. 1(B)). This generally led to decreased activity levels in these cells and made it more difficult for feedforward inputs from the input layer to excite them to threshold. Contraction in the basic model was thus a secondary consequence of the increased excitability of the units which showed receptive field expansion. Because sets of cells near and far from the ablation are mutually inhibitory, the reduction in activity of the far cells reduced their inhibitory effect on the near cells, allowing these to become still more active and produce greater inhibition in return. This effect of mutual inhibition can lead to a winner-take-all situation, in which one cell completely overwhelms the other. This does not occur here due to the continuing activation from the input layer. Although the model did produce the basic effects of expansion and contraction, the contractions seen were minimal, leading us to miss them on our first set of simulations.
We therefore explored other explanations for receptive field contraction, reconsidering our assumptions. Such reconsideration is a major aspect of computer modeling, a field which enjoys the distinction of making a virtue of failure. In this case, the failure to replicate substantial receptive field contractions led us back to the lesion
histology, where further examination showed a reduction in binding proteins associated with inhibitory neurons (Sober et al., 1997). This suggested that there might exist a functional disinhibitory halo, which would be expected to augment both expansion and contraction of receptive fields. This was confirmed in the model. The disinhibitory halo model showed both very large expansions (Fig. 4(A)), as well as more pronounced contractions (Fig. 4(B)). Expansion augmentation is particularly pronounced within the disinhibitory halo itself since these cells are almost entirely without inhibitory influence and are therefore excitable by the relatively weak stimulation coming from distantly converging inputs. Subsequently, the increased hyperactivity of these units produced increased inhibition in the inhibitory ring, allowing for the more pronounced contractions. Although the explanation for expansion and contraction is the same as seen in the basic model, the effect is substantially augmented by the further disinhibition (Fig. 5). As expected, overall activation tended to be greater with the elimination of inhibition. However, activation changes were surprisingly minor (Fig. 5(B)). The spread of activity change associated with ablation need not stop with second order effects. Given sufficient divergence from the input layer, one would anticipate that the decreased excitability of the cells in the receptive field contraction zone might produce another area of receptive field expansion still further out. We have not yet explored this prediction in detail, but it suggests a way in which reversible cortical lesions with cooling or lidocaine might be used to allow rapid assessment of effective divergence by demonstration of a standing wave of alternating peaks and troughs of both activity and receptive field size.
Behavioral implications
Beyond the receptive field level of MT neural responses lies a level of processing that will take the information, determine what is being seen and act on the basis of that information (Oram et al., 1998). In essence this involves a decoding of information present in MT. We have not made an effort to model the details of neural processing
Fig. 4. (A) Receptive field (RF) expansion is more dramatic in the disinhibitory halo model, particularly for those excitatory neurons that lie in the disinhibitory halo itself. (B) RF contraction is also more pronounced in the disinhibitory halo model. The mechanism is the same as was seen in the simpler model but the increased activation of excitatory neurons in the disinhibitory halo causes a greater secondary inhibition more distally, producing greater contraction.
involved in this decoding. Nor do we wish to speculate as to the nature of conscious experience, but only to explore simple statistical methods that could be used to interpret the ‘code’ generated by the firing of cortical neurons that correspond to the external object. Another way of looking at this situation is in terms of Bayes’ theorem, establishing the probability that the stimulus is moving in a certain direction based on the conditional probability of a particular set of cells firing in such a case and the overall probability distribution of cell firing and stimulus likelihood. As noted above, dynamic alterations are the first of two major phases of response that depend on the
[Figure 5: two panels, (A) and (B), each comparing the Simple Model with the Disinhibitory Halo Model; axis labels: Distance From Ablation Center, Size of RF.]
Fig. 5. (A) Both receptive field (RF) expansion and contraction are greater in the disinhibitory halo model. Cells within the ablation radius have receptive fields of size zero. (B) Despite the different MT connection weights in the basic and disinhibitory models, the level of activation relative to RF size was very similar. In both models, maximum activity of the cells was highly correlated with RF size. The main difference between models was that the disinhibitory model had cells achieving activation levels beyond what occurred in the basic model.
vastly differing time constants of neural dynamics and synaptic plasticity. Although we have not yet explored these true plastic changes, they have been explored by other authors (Grajski and Merzenich, 1990, 1990a; Xing and Gerstein, 1994; Sutton and Reggia, 1994; Goodall et al., 1997). In general, these authors have assumed that synaptic plasticity will follow some form of Hebbian algorithm, with increases in connection strength between simultaneously active cells and perhaps decreases between asynchronously active cells. There remains uncertainty as to whether plasticity is primarily confined to the feedforward projections from the input layer, to the intrinsic connections, or is present in both locations. As a first approximation, the Hebbian algorithm simply serves to imprint firing relationships that are already present and would therefore not be expected to produce major alterations in receptive field morphology (Hebb, 1949). However, a variety of plastic processes occur in intermediate time frames (Fisher et al., 1997), suggesting that the dichotomy between dynamics and plasticity may not be clean. Furthermore, as noted above, the relation between dynamics and plasticity is not one way: dynamic change begets plasticity, but plasticity is structural change that begets dynamic change. This inner loop of dynamic/plastic interactions is embedded in an outer loop of behavioral/molecular interactions. A physical neurorehabilitative therapy acts at the behavioral level, presenting particular stimuli in a damaged sensory system or suggesting movement sequences to treat a disabled motor system. Simultaneously, molecular alterations are affected by pharmacological therapies. In order to begin to understand these interactions, we must consider how neural activity produces sensorimotor coordination, as well as how behavior alters neural activity. Uncertainty regarding the locus of plasticity also arises when considering the behavioral level.
We have shown that a stroke will involve alterations in receptive fields. Extending the statistical or coding viewpoint mentioned above, these changes represent an alteration in the encoding of the external stimulus. If the decoding strategy leading to behavior were to remain unchanged, function
would suffer. For that reason, full behavioral recovery will require changes in decoding strategy that match the alterations in encoding. We have hypothesized that the immediate receptive field changes, manifestations of the disorder, are also the first steps towards recovery. Therefore, we expect that there will be an alteration in decoding strategy as well. The original model, presented above, simply provided the location of the perceived object without identifying any of its other attributes (Fig. 1). Of course, visual areas also encode other stimulus attributes, such as color, shape or motion. In the case of visual area MT, the neurons are specialized to encode information about speed and direction of motion. In order to model the animal’s calculation of speed and direction, it was necessary to model the response of individual neurons to these attributes. We organized the inputs in order to get representations of location, speed and direction of an input stimulus (Fig. 6). To keep things as simple as possible, input was represented as a stimulus with attributes of location, direction of motion, and speed. Input was cosine-filtered in both the direction and speed domains. Similarly, each MT-layer unit in the original model was replaced with a set of units, each of which was assigned a preferred direction and speed. As in the original model, MT-layer units were connected to each other through a Mexican hat pattern of excitatory and inhibitory weights.
Behavior as a synthesis of neural activation
Following Georgopoulos and colleagues (1982, 1986), we chose to model the decoding of information by assuming that each MT neuron contributes to the representation of the stimulus by ‘voting’ for a vector, which corresponds to that neuron’s preferred direction and speed. That is, firing in a particular MT cell is interpreted as a vote for the hypothesis that the stimulus is the neuron’s preferred stimulus. We tested three basic algorithms for how the collected ‘vector votes’ of many MT units could be synthesized into a single representation of the stimulus: winner-take-all, vector average and vector sum. The winner-take-all algorithm is commonly used in artificial neural networks
Fig. 6. (A) Estimation of the behavioral consequences of receptive field changes required a multi-level model which used an algorithm for determining the interpretation of neural activity. At the lowest level, the stimuli themselves have a speed and direction in addition to location. The next layer is a filter array with individual input units selective for location, speed, and direction. This is the ‘V1 layer’ or input layer. The third layer is the MT layer whose units also have velocity selectivity based on their connections from the selective units in the input layer. The final layer is the perceptual output which produces a stimulus velocity and location estimate based on the activity of individual MT units that are activated according to one of three algorithms. The three algorithms assessed are shown to the right: in vector average the preferred directions of the active vectors are weighted according to their activity and the resulting vector is then normalized to give the percept; in vector sum the same process is done, omitting the normalization; in winner-take-all the direction of the single most active vector is taken as the percept. (B) The visual system is a complex array of centers. The computer model reduces this complexity to layered two-dimensional arrays. The shaded portion of the schematic represents the motion pathway leading to area MT.
because of its simplicity; the unit with the greatest activity wins, in the sense that its direction and speed preference is taken as the direction and speed of the stimulus. This is related to the idea of local
encoding, or the ‘grandmother neuron’, because a single neuron is able to make an identification. The two other algorithms are simple distributed representations where the information is represented
across an ensemble of neurons (Georgopoulos et al., 1982; Georgopoulos et al., 1986). In the vector sum algorithm, the magnitude of the final representation is obtained by simply adding up all of the contributions of the neurons that are firing, after weighting them in proportion to their firing rates. Hence, the representation is proportional to the number of units that contribute, with consequences for the case of stroke when this number is significantly reduced. Finally, the vector average algorithm uses a normalization step following the summation of the vectors, making the final representation independent of the number of units that contribute.
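The three decoding schemes described above can be sketched in a few lines. This is only an illustrative sketch, not the code used for the simulations: the function name `decode`, the toy population of 16 units, the rectified-cosine firing rates, and the choice to normalize the vector average by the summed firing rate are all assumptions made for the example.

```python
import numpy as np

METHODS = ("winner_take_all", "vector_sum", "vector_average")

def decode(preferred, rates, method):
    """Combine 'vector votes' into one velocity estimate.

    preferred: (n, 2) array, each row a unit's preferred velocity vector.
    rates:     (n,) array of firing rates used as vote weights.
    """
    preferred = np.asarray(preferred, dtype=float)
    rates = np.asarray(rates, dtype=float)
    if method == "winner_take_all":
        # The single most active unit determines the percept.
        return preferred[np.argmax(rates)]
    weighted = (rates[:, None] * preferred).sum(axis=0)
    if method == "vector_sum":
        return weighted
    if method == "vector_average":
        # Assumed normalization: divide by the total firing rate.
        total = rates.sum()
        return weighted / total if total > 0 else np.zeros(2)
    raise ValueError(f"unknown method: {method}")

# Hypothetical population: 8 preferred directions at speed 10, two units each.
angles = np.deg2rad(np.arange(0, 360, 45))
one_copy = 10.0 * np.column_stack([np.cos(angles), np.sin(angles)])
preferred = np.vstack([one_copy, one_copy])
# Assumed rectified-cosine response to a stimulus moving at 0 degrees.
rates = np.clip(np.cos(np.tile(angles, 2)), 0.0, None)

intact = {m: decode(preferred, rates, m) for m in METHODS}

# Toy 'lesion': silence the second copy of every unit.
lesioned_rates = rates.copy()
lesioned_rates[8:] = 0.0
lesioned = {m: decode(preferred, lesioned_rates, m) for m in METHODS}

for m in METHODS:
    print(m, intact[m], lesioned[m])
```

In this toy case, silencing half of the units halves the vector sum but leaves the vector average and the winner-take-all estimate unchanged, which is the dependence on unit count described in the text.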
RF changes predict behavioral dysfunction
In animal studies, small ablations in visual area MT cause a behavioral deficit. This can be detected by having the monkey do a visual tracking task, in which he is required to saccade to a location within the motion scotoma and then pursue the target as it moves out of the scotoma (Fig. 7A). Although a saccade to a stationary target would be accurate, the initial saccade in this task is incorrect, because the target starts moving as soon as it appears. If the target is moving away from the fovea, the saccade is hypometric, and if the target is moving towards the fovea, it is hypermetric, since the monkey is underestimating the velocity of the target. Furthermore, when the eyes start to pursue the object, they fall progressively behind. The location of these two symptoms of the motion scotoma, the error in initial saccade and too-slow pursuit speed, corresponds to the visual field subserved by the combined receptive fields of the ablated cells. With small ablations, this is only an area of motion obscuration, not the complete motion blindness that has been reported clinically in human stroke. Therefore, the eye pursues the stimulus in the right direction. The only evidence of deficit is that the eye does not keep up with the stimulus: the speed is underestimated and the eye falls behind. Once the target is no longer in the scotoma, the eye quickly saccades to catch up and then pursues at the proper rate thereafter. We were able to produce the underestimation of object speed characteristic of the behavioral scotoma using either the vector average or vector
summation algorithm (Fig. 7B). Using vector summation, speed estimation was reduced in the area of the scotoma due to the absence of neurons voting for the proper direction and speed. The underestimation was particularly marked in the case of vector sum because the overall strength of the sum was reduced simply by virtue of fewer vectors being added in. In the original model without the disinhibitory halo, this would produce speed underestimation even outside the area of the scotoma, due to the lack of response from neurons that were previously activated via divergence. This halo of underestimation could, however, be largely compensated by the addition of the disinhibitory halo, which tended to increase the overall vector sum, restoring the sum to that normally seen. By contrast, vector averaging was not susceptible to this surround effect since the overall signal strength was restored by the normalization process. The vector averaging algorithm produced speed underestimation within the scotoma, since disinhibited units on the lesion periphery still contributed their ‘velocity votes’ - many of which were in different or even opposite directions compared to the true stimulus - to a sum that was then normalized. The winner-take-all algorithm, on the other hand, did not generally produce any scotoma since the neuron that was most strongly activated still tended to be a neuron with the correct speed and direction tuning. Although inconclusive with respect to behavioral data, the three algorithms assessed do make different predictions regarding the expected deficit. Winner-take-all suggests that the deficit would be minimal or absent. Vector sum makes the opposite prediction; the field of behavioral deficit should be larger than the field of physiological deficit. Vector average implies that the behavioral and physiological deficit zone should match. It should be noted that these algorithms are all highly simplified and are by no means mutually exclusive. 
Therefore, the observed speed underestimation within the scotoma, being relatively mild, could be evidence of a change in the emphasis accorded various complementary decoding algorithms. Given the ability of winner-take-all to preserve perception in this situation, more attention might be paid to a single highly active neuron under these circumstances.
Fig. 7. (A) The pursuit task starts with the eye pointed at a central fixation point. The target comes on somewhere off to the side and immediately starts moving. The animal is expected to saccade to the target and then follow it with pursuit eye movement. The saccade itself is generally unimpaired by the MT lesion since other visual areas, both cortical and subcortical, signal the location of the target as it appears. However, the MT lesion results in some impairment in determining the velocity of the target, causing the pursuit eye movement to lag behind. This behavior suggests a perceptual deficit; the animal is underestimating velocity. (B) This velocity underestimation can be demonstrated by the model which shows different degrees of underestimation depending on the location of the target in the retinotopic zone of the lesioned area. The underestimation also depends on the representational algorithm used (see text). This case illustrates results using the vector average algorithm.
This would represent an alteration in decoding strategy, as the brain learns to read the output of the damaged area differently than it did previously. Other alterations in decoding strategy might be less drastic. For example, if a vector sum algorithm is being used, then a new decoding strategy would
involve reweighting inputs from the damaged area. Compensation would require an increase in all surviving weights, in order to reestablish magnitude to compensate for the lost input. Alternatively, taking a Bayesian viewpoint, any new decoding would involve reassigning of probability distributions and prior probabilities. As a simple example,
the ablated cells would be assigned a zero probability of firing, and conditional probabilities associated with their not firing would be eliminated. As noted above, the receptive fields of cells surrounding the lesion expand dramatically following cortical ablation; this expansion is often asymmetric with greater expansion towards the lesion. Our model suggests that this expansion plays an important role in the continued perception of speed and direction in the center of an ablation. The enlarged receptive fields effectively saved part of the input space that would otherwise entirely lose its representation in cortex. In essence, these enlarged receptive fields provided a band-aid over the area of input space that had been processed by the neurons killed by the lesion. In the absence of these expansions, a portion of visual space would lose its cortical representation entirely because it would not fall in the receptive fields of any live neurons. The behavioral deficits that follow cortical lesion certainly suggest that such damage disrupts the brain’s ability to process stimuli. Nevertheless, it is quite significant that for moderate-sized lesions this inability is reflected as an inappropriate response to stimuli rather than a complete lack of response (Yamasaki and Wurtz, 1991). Perhaps the receptive field band-aid makes a critical difference in that it allows postlesion stimuli to be misprocessed (as evidenced by inappropriate postlesion behavior) rather than not processed at all. This erroneous processing can then be corrected by Hebbian plasticity (Hebb, 1949). In general, neural network learning algorithms are good at rewiring networks that respond inappropriately, using the error to move the network towards greater accuracy. In the absence of any response, however, it is impossible to generate the correct error signal needed to move the network in the proper direction. 
This suggests that large lesions that destroy large tracts of cortex exceeding the typical divergence of projecting cells will be qualitatively different in their ability to recover functionally.
Changes in directional receptive fields
The behavioral model’s estimation of direction following an ablation was also strongly affected by
the amount of interaction between directionally tuned units in the MT layer. In the absence of substantial surround inhibition influencing direction cells, direction tuning was substantially the same both before and after the lesion. In this case, direction cell response was determined mainly by the properties of feedforward stimulation, and was not sculpted by lateral interactions within the MT layer. Using the vector summation algorithm, stimulus speed estimation is dependent on the number of summed vectors, but using the vector average algorithm, it is not. In the absence of change in direction tuning of the individual direction cells, the smaller number of cells postlesion gave a reduction in speed estimation when using the vector sum algorithm but no alteration in speed estimation with the vector-average algorithm. By contrast, in the presence of substantial surround inhibition influencing direction tuning, direction tuning loosened after the lesion, meaning that a particular unit would respond to a wider range of directions. This was caused by the same mechanisms that caused spatial tuning loosening (spatial receptive field expansion) in the basic model. In the behavioral model, directional-selectivity inputs drove each MT unit for a certain range of directional input, just as spatial-selectivity units drove each MT unit for a certain spatial range. As in the spatial case, the actual directional tuning of an MT unit was narrower than that specified by this function because of lateral inhibition from other MT units. Lesions to the MT layer loosened MT unit tuning (that is, made tuning more similar to the driving function) by eliminating lateral inhibition. Also analogously, the inhibition was reduced both by the removal of cells with inhibitory connections and by the loss of inhibitory interneurons in the disinhibitory halo. 
With lateral inhibition in place, the ablation resulted in expansion of directional tuning curves as well as a reduction in the number of units. Now, both the summation and averaging algorithms predicted speed underestimation. In the case of vector summation, this happened for the same reason as in the no-lateral-inhibition case: there are simply fewer vectors being summed, and the result is a shorter vector. In the case of vector averaging, the expansion of the directional tuning
curve introduced omnidirectional noise into the set of vectors to be averaged. Once averaged in with the signal components, these noise components, which in some cases pointed in the wrong direction, reduced the magnitude of the final output vector.
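The effect of broadened directional tuning on the vector average can be illustrated with a small numerical sketch. It is not the model used in the chapter: the Gaussian tuning curve, the 36 evenly spaced preferred directions, the common preferred speed of 10, and the function name `vector_average_speed` are all assumptions chosen for illustration.

```python
import numpy as np

def vector_average_speed(tuning_width_deg, n_units=36, speed=10.0, stimulus_deg=0.0):
    """Magnitude of the vector-average estimate for a given tuning width.

    Each hypothetical unit votes for its preferred direction at a common
    preferred speed, weighted by an assumed Gaussian directional tuning curve.
    """
    pref = np.deg2rad(np.arange(n_units) * 360.0 / n_units)
    # Signed angular difference to the stimulus, wrapped to (-pi, pi].
    diff = np.angle(np.exp(1j * (pref - np.deg2rad(stimulus_deg))))
    sigma = np.deg2rad(tuning_width_deg)
    rates = np.exp(-0.5 * (diff / sigma) ** 2)
    vectors = speed * np.column_stack([np.cos(pref), np.sin(pref)])
    average = (rates[:, None] * vectors).sum(axis=0) / rates.sum()
    return float(np.linalg.norm(average))

narrow = vector_average_speed(20.0)  # tight tuning, as before the lesion
broad = vector_average_speed(60.0)   # loosened tuning, as after the lesion
print(f"narrow tuning: {narrow:.2f}, broad tuning: {broad:.2f}")
```

With broader tuning, more votes point away from the true direction, so their components partly cancel and the averaged vector is shorter, which corresponds to a lower speed estimate even though the number of contributing units is unchanged.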
Therapeutic implications
In addition to physical rehabilitation strategies, new possibilities are emerging for pharmacological and surgical therapies. Computer modeling of the processes underlying recovery will be particularly useful in making the connection between the cellular and molecular level of these interventions and the behavioral level. Overall, the interventions that have been proposed can be classified as follows: (1) alteration of cellular dynamics; (2) alteration of activity-dependent plasticity; and (3) replacement of neurons or augmentation of existing neurons (i.e. nerve growth factors). Gross network dynamics is addressed in this paper, but the complexities of single neuron dynamics are not explored. Activity-dependent plasticity has been considered in a number of recent papers (Pearson et al., 1987; Grajski and Merzenich, 1990; Reggia et al., 1992; Cho and Reggia, 1994; Sutton et al., 1994; Xing and Gerstein, 1996). The role of sprouting or cellular replacement remains more difficult to predict at present but will also be amenable to modeling as more facts emerge. In this paper, we have discussed how dynamics provides an initial band-aid effect that may provide immediate coverage which minimizes the disability in the short-term. As noted, this will also set up the patterned activity needed for activity-dependent plasticity changes. This increased receptive field size is likely to correlate with increased activity levels, putting this putative benefit squarely at odds with the need to reduce activity levels after stroke in order to prevent excitotoxicity due to calcium entry. One method proposed to counteract excitotoxicity is NMDA blockade, which will of course have a direct effect on activity-dependent plasticity. Optimization of desirable activity-dependent plastic changes will require not only evaluation of the amount of activity during stroke recovery, but
also consideration of the best timing for plasticity. To take an obvious example, it may be preferable to enhance plasticity during periods of active physical rehabilitation and perhaps suppress plasticity during the immediate post-stroke period when patients are inactive. There remains controversy regarding the nature of the plasticity rules governing stroke recovery, with some evidence suggesting that these rules might differ somewhat from those of development. Neuron transplants have recently been pioneered in humans who have suffered strokes (Bonn, 1998). It is unknown whether these neurons may eventually intercalate directly into processing circuits or simply provide growth factors that produce new connections in remaining indigenous neurons. In the latter case, this procedure may have the same effect as direct administration of nerve growth factors. It is unclear whether sprouting due to these factors will be activity dependent or not. If it is activity-dependent, it is not known whether it will be governed by rules appropriate for the limited neural plasticity of adult learning or by the rules utilized in childhood for the much more substantial alterations of early development. If the latter, perhaps the process has a critical period comparable to that of early brain development.
Conclusions
Computer modeling nicely complements the ecumenical spirit of pathophysiological investigation common to neurology and psychiatry. In both of these areas, as to a lesser extent in internal medicine, the complexities of disease causation and expression require that the physician simultaneously assume various perspectives, including cellular, chemical, neural and behavioral, when assessing a patient. He or she must then estimate how the intersection of these influences is leading to worsening disease or to recovery. Similarly, computer modeling, although not having to venture into such medical arcana as HMO reimbursement policies, explores the intersections of biological and behavioral processes at vastly different scales. In the current studies, we have looked at interactions of cellular and network dynamics and considered their effects on behavior as well as the
effects that behavior will have on dynamics in turn. We have shown how the limitations of the original model suggested further investigations which revealed an unexpected histological finding, the disinhibitory halo. We have also provided some explanations, although not yet testable predictions, at the behavioral level. Our model suggests a value to neural overactivation in the immediate poststroke period: associated expanding receptive fields may provide a band-aid effect and encourage plasticity leading to continuing recovery. This would suggest that pharmacological reduction of excitation to prevent excitotoxicity might best be restricted to a limited period immediately after a stroke.
Appendix
MT and V1 were represented by two 20 × 20 grids of nodes with edge wrap-around. Activity ($a_k$) of MT unit $k$ was defined by Eqs. (1a) and (1b) [equations not recoverable from the source; the surviving fragments show two sums running to 400 in Eq. (1a) and a factor $(A - a_k)$ in the definition of $\phi(a_k)$ in Eq. (1b)], with $\tau = 0.2$ and $A = 5$. The V weight matrix was the projection from V1 to MT, $b_j$ were inputs from V1, and the M weight matrix expressed lateral connectivity within MT. Function $\phi(a_k)$ was used to eliminate drive when unit activity approached 0 or $A$, thereby keeping $a_k$ within these bounds. Connections from V1 to MT were topographic with wrap-around to prevent edge effects. The elements of matrix V were excitatory (positive) and calculated from radial coordinates $r$ of the hexagonally tessellated units as distance from target unit. Divergence from a given V1 location to the MT units had Gaussian spatial fall-off

$$V(r) = k\,e^{-\frac{1}{2}(r/s)^{2}} \qquad (2)$$

where $k = 1.0$ is the normalizing constant and $s = 3.0$ is the divergence parameter. Similarly, M was calculated from excitatory $E(r)$ and inhibitory $I(r)$ functions with wrap-around in the MT layer. Lateral MT excitatory connections were highly divergent and declined in strength exponentially:

$$E(r) = c\,e^{-r/\lambda} \qquad (3)$$

with $c = 0.02$, $\lambda = 0.8$, $r \geq 1$ (no self connection). Lateral inhibition in the model was modeled as a reduction in negative connection strength onto local MT units:

$$I(r) = c\,e^{-(r-1)/\lambda} \qquad (4)$$

with $c = 0.0157$, $\lambda = 1.5$, $r \geq 2$. These combine by $E(r) - I(r)$ to form the ‘Mexican Hat’ shape. Receptive fields were calculated by serially activating each of the 400 inputs and recording whether a given MT unit responded above threshold ($\theta = 0.5$). For simple ablations, units were eliminated from the model by locking their states at zero. For disinhibitory halo ablations, this lesion was surrounded by a zone in which inhibitory connections were reduced by 50%.
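The connectivity described in this appendix can be sketched numerically. The sketch below is a reading of the text under stated assumptions, not the original simulation code: it assumes the weight functions have the forms V(r) = k·exp(−½(r/s)²), E(r) = c·exp(−r/λ) for r ≥ 1 and I(r) = c·exp(−(r−1)/λ) for r ≥ 2, and it uses Euclidean distance on a square 20 × 20 grid with wrap-around instead of the hexagonal tessellation mentioned in the text. All function names are hypothetical.

```python
import numpy as np

N = 20  # grid side; 20 x 20 = 400 units per layer

def wrapped_distance(n=N):
    """Pairwise distances between all n*n units on a grid with wrap-around.

    Assumption: square grid with Euclidean distance (the text describes a
    hexagonal tessellation, which is not reproduced here).
    """
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)               # 1-D distance with edge wrap
    dx = d[:, None, :, None]               # row distance between (i, j) and (k, l)
    dy = d[None, :, None, :]               # column distance between (i, j) and (k, l)
    return np.sqrt(dx ** 2 + dy ** 2).reshape(n * n, n * n)

def feedforward_weights(r, k=1.0, s=3.0):
    """Gaussian divergence from V1 to MT (assumed form of Eq. 2)."""
    return k * np.exp(-0.5 * (r / s) ** 2)

def lateral_weights(r, c_e=0.02, lam_e=0.8, c_i=0.0157, lam_i=1.5):
    """Mexican-hat lateral weights E(r) - I(r) (assumed forms of Eqs. 3 and 4)."""
    excit = np.where(r >= 1, c_e * np.exp(-r / lam_e), 0.0)        # no self connection
    inhib = np.where(r >= 2, c_i * np.exp(-(r - 1) / lam_i), 0.0)
    return excit - inhib

r = wrapped_distance()
V = feedforward_weights(r)
M = lateral_weights(r)
print(V.shape, M.shape)
print("nearest-neighbor lateral weight:", M[0, 1])
print("lateral weight at distance 2:   ", M[0, 2])
```

Under these assumptions nearest neighbors receive net excitation and units two steps away receive net inhibition, giving the center-surround profile. A disinhibitory halo could be approximated by scaling the inhibitory term by 0.5 for units in a ring around the ablated units, in line with the 50% reduction stated in the text.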
References
Alamancos, M. and Borrel, J. (1995). Functional recovery of forelimb response capacity after forelimb primary motor cortex damage in the rat is due to the reorganization of adjacent areas of cortex. Neuroscience, 68: 793-805.
Allard, T., Clark, S., Jenkins, W. and Merzenich, M. (1991). Reorganization of somatosensory area 3b representations in adult owl monkeys after digital syndactyly. J. Neurophysiol., 68: 1048-1058.
Armentrout, S., Reggia, J. and Weinrich, M. (1994). A neural model of cortical map reorganization following a focal lesion. Art. Intell. Medicine, 6: 383-400.
Bonn, D. (1998). First cell transplant aimed to reverse stroke damage. Lancet, 352(9122): 119.
Buonomano, D. and Merzenich, M. (1998). Cortical plasticity: from synapses to maps. Annu. Rev. Neurosci., 21: 149-186.
Cho, S. and Reggia, J. (1994). Map formation in proprioceptive cortex. Int. J. Neural Systems, 5: 87-101.
Clark, S., Allard, T., Jenkins, W. and Merzenich, M. (1988). Receptive fields in the body-surface map in adult cortex defined by temporally correlated inputs. Nature, 332: 444-445.
Dinse, H., Recanzone, G. and Merzenich, M. (1990). Direct observation of neural assemblies during neocortical representation reorganization. In: R. Eckmiller, G. Hartmann and G. Huske (Eds.), Parallel Processing in Neural Systems and Computers, Elsevier, New York, pp. 1-21.
Fisher, S., Fisher, T. and Carew, T. (1997). Multiple overlapping processes underlying short-term synaptic enhancement. Trends Neurosci., 20(4): 170-177.
Georgopoulos, A., Kalaska, J., Caminiti, R. and Massey, J. (1982). On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. J. Neurosci., 2: 1527-1537.
Georgopoulos, A., Schwartz, A. and Kettner, R. (1986). Neuronal population coding of movement direction. Science, 233: 1416-1419.
Gilbert, C. and Wiesel, T. (1992). Receptive field dynamics in adult primary visual cortex. Nature, 356: 150-152.
Goodall, S., Reggia, J., Chen, Y., Ruppin, E. and Whitney, C. (1997). A computational model of acute focal cortical lesions. Stroke, 28: 101-109.
Grajski, K. and Merzenich, M. (1990). Hebb-type dynamics is sufficient to account for the inverse magnification rule in cortical somatotopy. Neural Comp., 2: 71-84.
Grajski, K. and Merzenich, M. (1990a). Neural network simulation of somatosensory representational plasticity. In: D.S. Touretzky (Ed.), Adv. Neur. Inform. Process. 2. Morgan Kaufmann: San Mateo, CA.
Hebb, D. (1949). Organization of Behavior. John Wiley & Sons: New York.
Hubel, D. (1996). A big step along the visual pathway [news; comment]. Nature, 380: 197-198.
Hubel, D. and Wiesel, T. (1998). Early exploration of the visual cortex. Neuron, 20: 401-412.
Kaas, J., Krubitzer, L., Chino, Y., Langston, A., Polley, E. and Blair, N. (1990). Reorganization of retinotopic cortical maps in adult mammals after lesions of the retina. Science, 248: 229-231.
Lytton, W., Stark, J., Yamasaki, D. and Sober, S. (1999). Computer models of stroke recovery: Implications for neurorehabilitation. The Neuroscientist, 5: 100-111.
Merzenich, M. and Kaas, J. (1982). Reorganization of mammalian somatosensory cortex following peripheral nerve injury. Trends Neurosci., 5: 434-436.
Miller, K. (1994). Models of activity-dependent neural development. Prog. Brain Res., 102: 303-318.
Miller, K. (1995). Receptive fields and maps in the visual cortex: Models of ocular dominance and orientation columns. In: van Hemmen, J. and K.S. (Eds.), Models of Neural Networks III. Springer Verlag: NY, Ch. 1, pp. 55-78.
Miller, K., Keller, J. and Stryker, M. (1989). Ocular dominance column development: analysis and simulation. Science, 245: 605-615.
Nudo, R., Jenkins, W. and Merzenich, M. (1990). Repetitive microstimulation alters the cortical representation of movements in adult rats. Somatosens. Mot. Res., 7: 463-483.
Nudo, R. and Milliken, G. (1996). Reorganization of movement representations in primary motor cortex following focal ischemic infarcts in adult squirrel monkeys. J. Neurophysiol., 75: 2144-2149.
Nudo, R., Wise, B., Fuentes, F. and Milliken, G. (1996). Neural substrates for the effects of rehabilitative training on motor recovery after ischemic infarct. Science, 272: 1791-1794.
Oram, M., Foldiak, P., Perrett, D. and Sengpiel, F. (1998). The ‘ideal homunculus’: decoding neural population signals. Trends Neurosci., 21: 259-265.
Pearson, J., Finkel, L. and Edelman, G. (1987). Plasticity in the organization of adult cerebral cortical maps: a computer simulation based on neuronal group selection. J. Neurosci., 7: 4209-4223.
Recanzone, G., Jenkins, W., Hradek, G. and Merzenich, M. (1992). Progressive improvement in discriminative abilities in adult owl monkeys performing a tactile frequency discrimination task. J. Neurophysiol., 67: 1015-1030.
Reggia, J., D’Autrechy, C., Sutton, G. and Weinrich, M. (1992). A competitive distribution theory of neocortical dynamics. Neural Comp., 4: 287-317.
Robertson, D. and Irvine, D. (1989). Plasticity of frequency organization in auditory cortex of guinea pigs with partial unilateral deafness. J. Comp. Neurol., 282: 456-471.
Sejnowski, T., Koch, C. and Churchland, P. (1988). Computational neuroscience. Science, 241: 1299-1306.
Smits, E., Gordon, D., Witte, S., Rasmusson, D. and Zarzecki, P. (1991). Synaptic potentials evoked by convergent somatosensory and corticocortical inputs in raccoon somatosensory cortex: substrates for plasticity. J. Neurophysiol., 66: 688-695.
Sober, S., Stark, J., Yamasaki, D. and Lytton, W. (1997). Receptive field changes following stroke-like cortical ablation: a role for activation dynamics. J. Neurophysiol., 78: 3438-3443.
Sutton, G., Reggia, J., Armentrout, S. and D’Autrechy, C. (1994). Cortical map reorganization as a competitive process. Neural Comp., 6: 1-13.
Wang, X., Merzenich, M., Sameshima, K. and Jenkins, W. (1995). Remodelling of hand representation in adult cortex determined by timing of tactile stimulation. Nature, 378: 71-75.
Weinberger, N., Javid, R. and Lepan, B. (1993). Long-term retention of learning-induced receptive-field plasticity in the auditory cortex. Proc. Nat. Acad. Sci. USA, 90: 2396-2398.
Wurtz, R., Yamasaki, D., Duffy, C. and Roy, J. (1990). Functional specialization for visual motion processing in primate cerebral cortex. Cold Spring Harbor Symposia on Quantitative Biology, 55: 717-727.
Xing, J. and Gerstein, G. (1994). Simulation of dynamic receptive fields in primary visual cortex. Vision Res., 34: 1901-1911.
Xing, J. and Gerstein, G. (1996). Networks with lateral connectivity I. J. Neurophysiol., 75: 184-199.
Yamasaki, D. and Wurtz, R. (1991). Recovery of function after lesions in the superior temporal sulcus in the monkey. J. Neurophysiol., 66: 651-673.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 13
Effects of callosal lesions in a computational model of single-word reading
Jasmeet Chhabra¹, Mark Glezer¹, Yuri Shkuro¹, Shaun D. Gittens¹ and James A. Reggia²,*
¹Department of Computer Science, A. V. Williams Building, University of Maryland, College Park, MD 20742, USA
²Departments of Computer Science and Neurology, Institute of Advanced Computer Studies, A. V. Williams Building, University of Maryland, College Park, MD 20742, USA
*Corresponding author. e-mail: {jasmeet,glezer,merlin,sgittens,reggia}@cs.umd.edu

Introduction

Although a large body of knowledge exists today concerning the cerebral hemispheres and their connecting pathways such as the corpus callosum, our current understanding of the basic mechanisms of hemispheric interactions remains limited. For example, it is unclear which of the many known anatomic and physiologic hemispheric asymmetries might be causally contributing to language lateralization, and what role the corpus callosum may play in this process (Kinsbourne, 1978; Geschwind and Galaburda, 1987; Hellige, 1993; Springer and Deutsch, 1993). Further, there is conflicting evidence about what changes in cerebral lateralization occur following brain damage (e.g. Lee et al., 1984; Heiss et al., 1997; Silvestrini et al., 1998). For example, when sudden focal damage occurs to a cerebral hemisphere, the extent to which a unilateral lesion causes bilateral functional impairment is controversial (Bowler et al., 1993), and it is also uncertain to what extent recovery reflects local changes in the lesioned hemisphere as opposed to compensatory changes in the intact contralateral hemisphere (Weiller et al., 1995; Heiss et al., 1997). These issues are of fundamental importance to anyone interested in the nature of hemispheric interactions, cerebral specialization, and recovery from acute brain damage.

In this chapter, we are specifically concerned with the effects of corpus callosum lesions on hemispheric functions. Most past work on the effects of callosal lesions has focused on the extent to which information transfer between the two hemispheres is disrupted by callosal sectioning. Specially-designed experiments over many years have shown that, although chronically split-brain individuals superficially appear to function normally, communication between their two hemispheres is actually largely compromised (Sperry, 1982; Seymour et al., 1994; Gazzaniga, 1995). Callosal disconnection syndromes also occur frequently following callosal infarction (Giroud and Dumas, 1995). Our interest here is primarily with another aspect of the effects of callosal lesions and its implications for normal callosal functions. Specifically, it is not clear today whether each hemisphere exerts primarily an overall excitatory or inhibitory influence on the opposite hemisphere via the corpus callosum. Most neurons sending axons through the corpus callosum are pyramidal cells, and these synapse mainly on contralateral spiny cells (Hartenstein and Innocenti, 1981; Innocenti, 1986). Such excitatory synaptic connections, as well as
transcallosal diaschisis¹ and split brain experiments, suggest that the resultant transcallosal influences are mainly excitatory in nature (Berlucchi, 1983). However, this hypothesis is quite controversial (Denenberg, 1983). Transcallosal monosynaptic postsynaptic potentials are subthreshold and of low amplitude, and are followed by stronger, more prolonged inhibition (Toyama et al., 1969), suggesting to some that transcallosal inhibitory influences are much more important (Kinsbourne, 1978; Cook, 1986). Recent transcranial magnetic stimulation studies have also indicated that activation of one motor cortex region inhibits the contralateral one (Ferbert et al., 1992; Meyer et al., 1995), although it is difficult to know what this response to such a non-physiological stimulus implies for normal physiological hemispheric interactions.

While future experimental studies can be expected to clarify these issues, theoretical models may prove useful in complementing and even guiding such future empirical work. Neural models have been widely used during the last decade to examine many other issues in neuropsychology and neurology. These include testing of hypotheses concerning both normal and dysfunctional memory, perception, motor control, attention and language (Reggia et al., 1996). Some models have even made predictions that have subsequently been verified experimentally (e.g. Sober et al., 1997). However, very little of this modeling work has examined how the two cerebral hemispheres interact and influence each other, how hemispheric specialization might arise, and whether the contralateral hemisphere contributes to recovery following brain damage.

Recently, we created a computational model of single-word reading and used it to examine various hypotheses about how underlying cerebral asymmetries could lead to function lateralization (Reggia et al., 1998; Reggia et al., 1999; Shkuro et al., 1999), and to examine the effects of unilateral hemispheric lesions.
This neural network model consists of left and right hemispheric regions interacting via callosal connections. During training of the model, lateralization occurred readily to one hemispheric region in the context of several underlying asymmetries. It was most pronounced when callosal connections were assumed to play predominantly an inhibitory role. To our knowledge this is the first neural model to study systematically the ‘spontaneous’ lateralization of function. By demonstrating that emergence of lateralization and the effects of cortical lesions can be modeled computationally, this work raises the possibility that neural models may be useful for better understanding the mechanisms of hemispheric interactions.

In this paper we extend the previous studies of our model by examining the effects of simulated callosal lesions on the model’s performance and lateralization under a wide variety of assumptions about hemispheric asymmetries, divergence of callosal connections, and strength of callosal influences. Of particular interest is the extent to which post-lesion recovery arises from adaptation of the hemispheric regions, and how assumptions about the corpus callosum influence the recovery process.

We begin by briefly summarizing the intact model. The methods used to simulate callosal lesions in different versions of the model are then described, and the resultant effects of callosal lesions of systematically-varied size are presented. We also describe the effects of partially/completely removing afferent information to a single hemisphere. This allows us to establish that transient impaired performance of the model following callosal lesions is not due solely to imbalances in interhemispheric excitation/inhibition, but also to loss of transmission of specific information between the left and right hemispheric regions via the corpus callosum. We conclude by relating both these results and our previous results involving hemispheric rather than callosal lesions to recent experimental studies.

¹ The term transcallosal diaschisis refers to remote effects of brain damage (diaschisis) mediated via the corpus callosum. In other words, this refers to effects of a lesion in one hemisphere on the other hemisphere.
Methods

The intact neural model is described in detail elsewhere (Reggia et al., 1998; Shkuro et al., 1999) and only a brief summary is given here to make the lesioning results understandable. The intact model is trained to take three-letter words (CAD, MOP,
SIT, etc.) as input and to produce the correct temporal sequence of phonemes for the pronunciation of each given word as output. For example, given the single fixed input pattern MAT, the trained model’s output goes through a sequence of three states representing the phonemes for M, A and T, one at a time. The words used to train the model are listed in the Appendix.

The model’s structure and functionality

The architecture of the model is summarized schematically in Fig. 1. While this model captures the direct transformation of input to output suggested by functional imaging studies of single-word reading (Peterson et al., 1988), the intent is not to create a veridical model of underlying neocortical structures or of all aspects of the reading-aloud process. Rather, our goal is simply to represent the functionality of two interacting, recurrently-connected pathways (left and right) as sequential output is generated in a fashion described below. The model’s input elements (I) are divided into three groups, each group corresponding to the possible input characters at one of the
three input character positions. Input elements are fully connected to two sets of neural elements representing corresponding regions of the left (LH) and right (RH) hemisphere. These regions are connected to each other via a simulated corpus callosum (CC), and they are also fully connected to a set of output neural elements (O) representing individual phonemes. A set of state elements (S) receives one-to-one connections from the output elements, and serves solely to provide delayed feedback to the hemispheric regions via recurrent connections. These recurrent feedback connections are motivated in part by the recurrent or ‘backwards’ neuroanatomic connections between cortical regions (Felleman and Van Essen, 1991). The two hemispheric regions are taken to represent roughly mirror-image left and right cortical regions, consistent with the fact that such regions are generally specialized for the same or similar function (Heilman and Valenstein, 1979; Kupfermann, 1991). The callosal lesioning experiments described for the first time in this chapter are done using two different versions of the basic model in Fig. 1,
Fig. 1. Model architecture. Given an input word (bottom left), a temporal sequence of phonemes (bottom right) is generated representing its pronunciation. Individual neural elements are indicated by small hexagons, while sets of related elements are indicated by boxes. Labels: I = inputs, O = outputs, LH/RH = left/right hemispheric region, CC = corpus callosum, S = state elements, e/i = excitatory/inhibitory intra-hemispheric connections from element x, c = homotopic corpus callosum connections from element x.
differentiated by the nature of their callosal connections. The diffuse model (Reggia et al., 1998) has fully connected left and right callosal connections, where each hemispheric element sends callosal connections to all contralateral hemispheric elements. This is not as unrealistic as it first sounds in that the two hemispheric regions in the model represent relatively small, circumscribed cortical areas. For simplicity, no intracortical connections are present in the diffuse model. The second version of the model, the topographic model (Shkuro et al., 1999), has highly localized homotopic callosal connections, where each hemispheric element sends callosal connections to its corresponding mirror image element and that element’s immediate neighbors in the contralateral hemisphere (e.g. between cortical elements labeled x and c in Fig. 1). Intracortical connections are also present (Fig. 1). The diffuse and topographic versions of the model can be viewed as near the ends of a spectrum of diffuse vs. localized callosal connections. For each version of the model, simulations are done both where it is assumed that callosal influences are excitatory, and where it is assumed that they are inhibitory. Overall then, four types of models are lesioned: diffuse excitatory, diffuse inhibitory, topographic excitatory, and topographic inhibitory. These four types of models correspond to the four types of previous hypotheses that have been made concerning the nature of callosally-mediated hemispheric interactions (reviewed in Cook, 1986).

During simulations, activation patterns representing written words are clamped on the input elements and held fixed while the network generates a sequence of outputs, as follows. With the onset of input, activation propagates forward to the two hemispheric regions, and then to the output elements, where ideally only the single element representing the correct first phoneme of the input word is activated (after training).
This output pattern activates the set of state elements (S), and then the network recomputes its output, generating the second phoneme in the word’s pronunciation sequence. In doing this, the activation levels of the elements in the hemispheric layers are now determined not only by the input activation pattern and
callosally-mediated activation from the opposite hemisphere, but also by the feedback activation pattern of the state elements. The state elements provide one time unit of delay between output activity and its feedback via recurrent connections to the hemispheric regions. This process repeats and a third output phoneme is produced. The activation level $a_i$ of each neural element $i$ is set to zero at the beginning of presenting each input. The activation level of hemispheric element $i$ is governed by

$$\frac{da_i}{dt} = -a_i + M\sigma(h_i) \qquad (1)$$

where $M$ is a constant maximum activation level, $h_i$ represents the linear weighted sum of input activation to element $i$ from inputs I, state elements S, and the connected elements from both hemispheric regions, and $\sigma$ is the logistic function $\sigma(x) = (1 + e^{-x})^{-1}$. At equilibrium $a_i = M\sigma(h_i)$, so each hemispheric element effectively computes a sigmoid function of its input. The reason for iteratively computing $a_i$ (25 iterations, 0.1 time step, Euler method), rather than just assigning it the value $M\sigma(h_i)$, is to allow the hemispheric regions time to influence each other during a simulation via the corpus callosum. Output element activation levels are simply computed directly as $a_i = \sigma(h_i)$. State element $i$ is assigned the value of the corresponding output element plus a fraction of its previous value, i.e., $a_i^s = v\,a_i^s + a_i^o$, where superscript $s$ designates a state element, $o$ designates an output element, and $v$ is a constant.

The supervised learning rule used to train the model is a variant of recurrent error back propagation specifically designed for networks transforming a single fixed input pattern into a sequence of outputs (Jordan, 1986). The equations governing learning are briefly stated in the Appendix. Shared error signals propagate back from output elements to hemispheric elements. Learning occurs on all connections pictured in Fig. 1 with wide gray arrows (all except those forming the corpus callosum CC), and also on connections from a bias unit to hemispheric and output elements (not shown in Fig. 1). Learning is incremental, with weight updates occurring after each individual output. Learned weights were initialized with uniformly random real numbers. Callosal weights are uniform
and fixed during any given simulation, be they excitatory or inhibitory. Further details about the intact model can be found in (Reggia et al., 1998; Shkuro et al., 1999).
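The activation dynamics of Eq. 1 can be illustrated with a short Python sketch. This is not the authors' implementation: the function names, the two-element network, the weight and input values, and the state constant v = 0.5 are all assumptions chosen only for illustration. The sketch integrates da_i/dt = -a_i + Mσ(h_i) with the Euler method for 25 iterations of step 0.1, starting from zero activation, and then applies the state-element update described above.

```python
import math


def logistic(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def settle(external, weights, max_act=1.0, steps=25, dt=0.1):
    """Euler-integrate da_i/dt = -a_i + M * sigma(h_i), starting from a_i = 0.

    external[i] is the fixed input reaching element i (from inputs and state
    elements); weights[i][j] is the connection from element j to element i.
    Both layouts are illustrative assumptions, not the authors' data format.
    """
    n = len(external)
    a = [0.0] * n
    for _ in range(steps):
        # Net input: fixed external drive plus weighted activation of the
        # connected hemispheric elements (here, the callosal partner).
        h = [external[i] + sum(weights[i][j] * a[j] for j in range(n))
             for i in range(n)]
        a = [a[i] + dt * (-a[i] + max_act * logistic(h[i])) for i in range(n)]
    return a


def update_state(state, output, v):
    """State element update: new state = v * previous state + current output."""
    return [v * s + o for s, o in zip(state, output)]


if __name__ == "__main__":
    # One element per hemisphere, coupled by an inhibitory callosal weight.
    c = -2.0  # hypothetical uniform callosal strength
    print(settle(external=[1.0, 0.5], weights=[[0.0, c], [c, 0.0]]))
    print(update_state([1.0, 0.0], [0.0, 1.0], v=0.5))
```

Because 25 steps of size 0.1 cover only 2.5 time units, an element with constant net input reaches roughly 93% of its equilibrium value Mσ(h) in this sketch (1 - 0.9^25 ≈ 0.928), so the iteration count bounds how far callosal influences can propagate within one output step.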
General experimental methods

Using the model described above, we previously studied the factors influencing lateralization (cerebral specialization) in variations of both the diffuse and topographic intact model (Reggia et al., 1998; Shkuro et al., 1999). The assumptions about hemispheric asymmetries and the excitatory/inhibitory effects of the corpus callosum were systematically altered with each simulation. Three hemispheric asymmetries were examined: relative size, maximum activation level and learning rate parameter. These three different factors were examined because of experimental evidence for hemispheric asymmetries in region size, excitability, and neurotransmitter levels (e.g. Tucker and Williamson, 1984; Geschwind and Galaburda, 1987; Macdonnell et al., 1991; Hellige, 1993). Asymmetries were examined one at a time in isolation, and were always introduced so as to favor left hemispheric region dominance. In addition, a version with symmetric hemispheric regions was used as a control model. For each hemispheric asymmetry examined and the symmetric control model, the uniform value c of callosal strength was varied over several values between -3.0 and +3.0. Depending on the specific asymmetries and callosal influences used, varying amounts of cerebral specialization/lateralization emerged. Representative versions of the intact diffuse and topographic models were selected for callosal lesioning as explained later.

In all lesioning simulations described in this current chapter, each hemispheric region was usually sufficiently large so that it could independently learn the input-output mapping if given enough time. This was motivated by experimental evidence that either human hemisphere alone can acquire language (Dennis and Whitaker, 1976; Vargha-Khadem et al., 1997), and by the fact that function lateralization would trivially occur if one hemispheric region were too small to learn the mapping.
The 50 three-letter words with their associated pronunciation sequences that were used as training data for both pre-lesion and post-lesion versions of the models are listed in the Appendix. These words were designed so that any single output phoneme alone could never unambiguously predict the subsequent phoneme (e.g. an L in the first position could be followed by an A, I or O). Thus the feedback from state elements to hemispheric regions alone could not predict uniquely the subsequent correct output state. The baseline parameters, used in all simulations described below unless explicitly noted otherwise, are a maximum activation of 1.0 and a learning rate of 0.05. The diffuse model typically had 10 elements per hemispheric region, the topographic model typically 100. All software is implemented in C, and simulations were run on SUN Sparcstations.

In any simulation, model performance was measured as root mean square error $E$ of output elements measured over all 50 words. Training consisted of repeated passes through the training data until either $E$ was reduced to 0.05 or 10,000 passes through the data (epochs) occurred. Error was measured prior to training and under three conditions after training: with both hemispheric regions connected to outputs ($E$), and with each of the left and right hemisphere regions alone connected to outputs ($E_L$ and $E_R$, respectively). When only one hemispheric region was connected to the outputs, the other still had an indirect influence on output via the corpus callosum. Lateralization was measured using an asymmetry coefficient $\rho$ (Lezak, 1995), specifically:

$$\rho = \frac{E_L - E_R}{1 - \frac{1}{2}(E_L + E_R)} \qquad (3)$$

Negative values of $\rho$ indicate left lateralization of function, positive values indicate right lateralization, and for the specific simulations done here, $\rho = \pm 0.60$ roughly corresponds to maximal or complete lateralization.
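As a small worked illustration of the asymmetry coefficient, the following Python sketch computes it from the two single-hemisphere errors, taking Eq. 3 to be the error difference E_L - E_R divided by 1 - (E_L + E_R)/2. The function name is hypothetical. The example values are the pre-lesion errors listed in Table 1 for the size 14 vs. 10 model with callosal strength -2 (E_L = 0.050, E_R = 0.470), for which the sketch gives a value of about -0.57, in line with the lateralization reported there.

```python
def lateralization(e_left, e_right):
    """Asymmetry coefficient: (E_L - E_R) / (1 - (E_L + E_R) / 2).

    Negative values mean the left region alone performs better (lower error),
    i.e. left lateralization; positive values mean right lateralization.
    """
    return (e_left - e_right) / (1.0 - 0.5 * (e_left + e_right))


if __name__ == "__main__":
    # Pre-lesion errors from Table 1 (size 14 vs. 10, callosal strength -2).
    print(lateralization(0.050, 0.470))
    # Equal single-hemisphere errors give no lateralization.
    print(lateralization(0.290, 0.290))
```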
Lesioning results with the diffuse model

For the callosal lesioning experiments with the diffuse model, 10 variations of the basic model were used, as summarized in Table 1. These variations represent five different situations concerning which single underlying hemispheric asymmetry is present: symmetric (no hemispheric asymmetries except initial random weights), size 14/10 (left hemisphere has 14 elements, right 10), size 20/5, maxact 1.0/0.3 (maximum activation of left hemispheric elements 1.0, but 0.3 on the right), and learning rate 0.05/0.02 (higher left learning rate). For each of these five asymmetry conditions, two separate versions were studied, one with excitatory callosal strengths and one with inhibitory callosal strengths, for the reasons given above. As a result of these variations, pre-lesion lateralization varied from almost none ($\rho = 0$) to almost complete ($\rho = -0.5$) in the ten diffuse model variations, as listed in Table 1. Strongly lateralized versions of the model are viewed as corresponding best to the lateralization occurring with human language processing, while weakly or non-lateralized versions are viewed as ‘controls’. When present, hemispheric asymmetries were always arranged so as to favor left hemisphere dominance for simplicity, with the understanding that right hemisphere language dominance also occurs biologically. Table 1 also shows the pre-lesion root mean square performance errors $E$, $E_L$ and $E_R$, and the pre-lesion mean activation values in the two hemispheric regions.

TABLE 1
Variations of intact diffuse model used for lesions

| Asymmetry | Callosal strength | Error (Full) | Error (Left) | Error (Right) | Lateralization | Mean activation (Left) | Mean activation (Right) |
|---|---|---|---|---|---|---|---|
| Symmetric model | -2 | 0.050 | 0.247 | 0.290 | -0.059 | 0.318 | 0.268 |
| | +2 | 0.049 | 0.292 | 0.288 | +0.005 | 0.448 | 0.462 |
| Size 14 vs. 10 | -2 | 0.050 | 0.050 | 0.470 | -0.567 | 0.510 | 0.005 |
| | +2 | 0.049 | 0.204 | 0.292 | -0.117 | 0.482 | 0.512 |
| Size 20 vs. 5* | -0.5 | 0.050 | 0.266 | 0.418 | -0.231 | 0.486 | 0.555 |
| | +0.5 | 0.050 | 0.225 | 0.440 | -0.322 | 0.494 | 0.543 |
| MaxAct 1.0 vs. 0.3 | -2 | 0.050 | 0.050 | 0.398 | -0.448 | 0.529 | 0.001 |
| | +2 | 0.050 | 0.139 | 0.341 | -0.265 | 0.454 | 0.167 |
| LearnRate 0.05 vs. 0.02 | -2 | 0.050 | 0.068 | 0.457 | -0.527 | 0.477 | 0.022 |
| | +2 | 0.050 | 0.155 | 0.360 | -0.276 | 0.450 | 0.472 |

*Pretraining hemispheric mean activation levels were balanced (Reggia et al., 1998).

Each lesion was introduced into an intact version of the model by setting the weights of a randomly
selected subset of the callosal connections to zero. Callosal lesion sizes of 25%, 50%, 75% and 100% were done in each case. After lesioning, synaptic changes were allowed to continue until the full lesioned model’s error returned to our criterion of $E = 0.05$ for normal performance. Various measures (error, mean activation, lateralization, etc.) were taken at least three times: immediately before the lesion, immediately following the lesion but before any new training (acute error measures), and after training (post-recovery error measures). Lesions of different sizes were done independently, not progressively.

Symmetric hemispheric regions

Typical graphs of model error vs. time following callosal lesions for symmetric models are shown in Fig. 2. Illustrated is the post-lesion retraining period for the symmetric model with a 75% corpus callosum lesion, both for inhibitory (Fig. 2a) and excitatory (Fig. 2b) callosal strengths. Following a callosal lesion at time zero, performance of the full model and each hemispheric region alone deteriorated acutely. Performance of the full model always returned to pre-lesion levels for both excitatory and inhibitory callosal influences for all lesion sizes. Recovery was at first rapid and then gradually
Fig. 2. Time course of root mean square error (RMSE) for symmetric model recovery following a 75% corpus callosum lesion for (a) inhibitory and (b) excitatory callosal connections. Note the different horizontal scales. [Panels: a. Symmetric model, c = -2; b. Symmetric model, c = +2. Axes: RMSE vs. time.]
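The lesioning step itself (zeroing a randomly selected subset of callosal weights, with each lesion size applied independently to the intact model) can be sketched in Python as follows. This is an illustrative sketch only: the function name, the flat-list weight layout, the count of 200 callosal connections (10 x 10 in each direction for a diffuse model with 10 elements per region) and the fixed random seed are assumptions, not details taken from the authors' software.

```python
import random


def lesion_callosum(callosal_weights, fraction, rng):
    """Return a copy of the weights with a random fraction set to zero.

    callosal_weights is a flat list of callosal connection weights; this
    flat layout is an assumption made for the sketch. The input list is
    left unchanged so that every lesion can start from the intact model.
    """
    n = len(callosal_weights)
    n_cut = round(fraction * n)
    cut = set(rng.sample(range(n), n_cut))
    return [0.0 if i in cut else w for i, w in enumerate(callosal_weights)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    # Hypothetical diffuse model: 200 callosal connections, all with the
    # uniform inhibitory strength c = -2.
    intact = [-2.0] * 200
    for size in (0.25, 0.50, 0.75, 1.00):
        # Independent lesions: each one is applied to the intact weights.
        lesioned = lesion_callosum(intact, size, rng)
        print(size, lesioned.count(0.0))
```

In a full simulation the lesioned weights would stay fixed at zero during retraining, since callosal weights are not modified by learning in the model.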
slower, a pattern analogous to that often seen clinically after acute brain damage (stroke, trauma, etc.). In the model with excitatory callosal influences, after an initial period in which each individual hemispheric region improved, surprisingly the separate hemispheres assessed in isolation for some lesion sizes actually got slightly worse over time while the full model got better. For example, this effect is evident in Fig. 2b for a 75% lesion. This happens because the intact hemispheres complement each other: for a given input, while one of the hemispheres provides largely excitatory activation of outputs, the other hemisphere provides largely inhibitory activation of output elements, for some or all of the output phoneme sequence. During recovery, for each word the excitatory effects of one hemisphere grew, as did the inhibitory effects of the other hemisphere, so while their joint action remained in balance and improved, their individual performance could deteriorate.

Figure 3 shows how error varied as a function of corpus callosum lesion size for excitatory and
Fig. 3. Root mean square error (RMSE) vs. lesion size for the symmetric model with inhibitory (a, b) and excitatory (c, d) callosal influences. The lesion size shown is a fraction of total corpus callosum size, e.g., 0.25 means a 25% corpus callosum lesion. Each graph plots error $E$ of the full model (dot-dashed line) and errors $E_L$ (dashed line) and $E_R$ (solid line) of the left and right hemispheric regions measured in isolation. Both acute errors immediately following lesions (a, c) and errors after a recovery period (b, d) are shown. [Panels: a. Symmetric model, c = -2, Acute; b. Symmetric model, c = -2, Chronic; c. Symmetric model, c = +2, Acute; d. Symmetric model, c = +2, Chronic. Horizontal axis: lesion size.]
inhibitory callosal influences immediately after a lesion, and chronically (i.e. after retraining). The acute impairment of the full model in each case was found to be roughly proportional to lesion size, i.e. performance impairment is worse with larger lesions. The symmetry of the hemispheres resulted in similar impairment of both hemispheres regardless of lesion size. Following each lesion, training continued until the full lesioned model's error returned to the normal performance level ($E = 0.05$) or the maximum number of epochs occurred. In each case, full recovery of the model's performance occurred, with both hemispheres playing a roughly equal role in recovery (horizontal line plotted in Fig. 3b, d). While the full model performed normally after retraining, the separate hemispheres did not always return to pre-lesion performance levels.

Figure 4 shows the mean activation levels of left and right hemispheric regions vs. lesion size for both inhibitory (Fig. 4a, b) and excitatory (Fig. 4c, d) callosal influences. With inhibitory callosal influences, mean activation increased bilaterally following lesions, both acutely and chronically, especially for larger lesions. This happened because as the callosal connections were sectioned, the hemispheres were disinhibited, thereby increasing mean activations in the acute case. Mean activations in the chronic case generally decreased relative to the acute case, but usually did not return to pre-lesion levels. On the other hand, in the excitatory case mean activation levels decreased acutely bilaterally (Fig. 4c) and this decrease also persisted but improved after continued training (Fig. 4d). The decrease in mean activation levels reflected loss of mutual excitation between the two hemispheric regions.

Lateralization did not change significantly in either the acute or chronic cases, with either inhibitory or excitatory callosal influences, and the models remained essentially symmetric. Neither hemispheric region had a significant advantage over the other during the post-lesion retraining in the symmetric case.

Asymmetric hemispheric regions

Asymmetries in the eight intact asymmetric models (Table 1) were always arranged so that the left hemispheric region was the dominant one. The
Fig. 4. Mean activation vs. lesion size for each hemispheric region of symmetric model with inhibitory (a, b) and excitatory (c, d) callosal connections, both acutely (a, c) and following recovery (b, d). [Panels: a. Symmetric model, c = -2, Acute; b. Symmetric model, c = -2, Chronic; c. Symmetric model, c = +2, Acute; d. Symmetric model, c = +2, Chronic. Horizontal axis: lesion size.]
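The direction of the acute activation changes (an increase after sectioning inhibitory callosal connections, a decrease after sectioning excitatory ones) can be reproduced in a deliberately minimal Python toy. It is not the model of this chapter: it assumes one element per hemisphere with identical afferent input, no learned weights and no retraining, and the function name and parameter values are hypothetical. It uses the same Euler scheme as Eq. 1 (25 iterations, step 0.1) and treats a complete callosal lesion as c = 0.

```python
import math


def settled_activation(afferent, c, max_act=1.0, steps=25, dt=0.1):
    """Activation of one element in a symmetric pair of mirror elements.

    Each element receives the same afferent input plus c times the
    activation of its mirror element. By symmetry both elements carry the
    same activation, so a single value is tracked. Setting c = 0 models a
    complete callosal lesion.
    """
    a = 0.0
    for _ in range(steps):
        h = afferent + c * a
        a += dt * (-a + max_act / (1.0 + math.exp(-h)))
    return a


if __name__ == "__main__":
    lesioned = settled_activation(afferent=0.0, c=0.0)
    for c in (-2.0, 2.0):
        intact = settled_activation(afferent=0.0, c=c)
        # Inhibitory coupling: intact < lesioned (disinhibition after lesion).
        # Excitatory coupling: intact > lesioned (loss of mutual excitation).
        print(c, intact, lesioned)
```

The toy captures only the acute effect; the partial return toward pre-lesion activation levels after retraining depends on weight changes that this sketch does not include.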
results obtained were similar to those obtained with symmetric hemispheric regions in some respects but were modulated by the asymmetries in the hemispheres and the corpus callosum strength. Figures 5a and 5c show plots of error immediately after lesions vs. lesion size for the asymmetric learning rate model versions having inhibitoiy and excitatory callosal influences. As shown here, the severity of the performance deficit was generally higher for larger lesions in asymmetric models. In general, however, for the models with the same corpus callosum strength and sign, the effect on the full model of partial lesions decreased somewhat with increasing pre-lesion lateralization relative to the symmetric model. Callosal sectioning did not have much effect on model versions with weak corpus callosum strength. In general, the dominant left hemisphere was impaired acutely much more than the nondominant hemisphere by callosal lesions. After the lesioned models were allowed to retrain, the full models almost always recovered to the pre-lesion level of performance (Fig. 5c, d). In general the dominant hemispheric region played a much greater role in recovery. This is evident in
Fig. 5 where the dominant hemisphere decreases its error by a much greater margin than the nondominant hemisphere after post-lesion training. This contrasts with the symmetric models where both hemispheres contributed some to recovery. Acutely, lateralization was generally observed to decrease or remain constant, and Fig. 6 shows typical examples where lateralization decreased immediately after lesioning, especially for larger lesions. Specifically, the acute post-lesion lateralization decreased substantially in roughly half the models and in the others it remained almost constant. With only one model (asymmetric size hemispheric regions (left = 14, right = 10), corpus callosum strength = -2) did a slight acute increase in lateralization occur, and only with 75% and 100% size lesions. The decrease in lateralization often observed acutely following lesions occurred because the dominant hemisphere was impaired more than the non-dominant hemisphere, bringing its performance closer to that of the non-dominant hemisphere. On the other hand, in the models in which lateralization decreased acutely, it typically increased chronically relative to acute post-lesion lateralization (Fig. 6), but not necessarily relative to
[Fig. 5 plots not reproduced. Panels: (a) c = -2, acute; (b) c = -2, chronic; (c) c = +2, acute; (d) c = +2, chronic. Horizontal axis: lesion size (0 to 1). Vertical axis: RMSE.]
Fig. 5. Root mean square error (RMSE) vs. lesion size for asymmetric model with learning rate asymmetry 0.05 vs. 0.02 acutely and chronically for inhibitory (a, b) and excitatory (c, d) callosal influences.
[Fig. 6 plots not reproduced. Panels (a) and (b) each show acute and chronic curves. Horizontal axis: lesion size (0 to 1). Vertical axis: lateralization (approximately -0.6 to 0).]
Fig. 6. Lateralization vs. lesion size for asymmetric model with learning rate asymmetry 0.05 vs. 0.02 acutely and chronically for (a) inhibitory and (b) excitatory callosal connections.
pre-lesion lateralization as occurred in this example. This happens because the dominant hemisphere improves its error much more than the non-dominant hemisphere during post-lesion training, increasing the relative performance difference. In the models in which the acute post-lesion lateralization was comparatively unchanged, the chronic lateralization also remained relatively constant. Post-lesion mean activation changes were modulated by corpus callosum strength and the excitatory or inhibitory nature of callosal influences. In the inhibitory callosal models (Fig. 7a, b),
the mean activation for the right non-dominant hemisphere was initially low prior to lesioning and was almost always found to increase after lesions, both acutely and chronically. On the other hand, the dominant left hemisphere activation levels remained relatively constant. The non-dominant right hemisphere’s increase in mean activation following callosal lesions was not obtained if the corpus callosum was weak (size 20/5 having corpus callosum strength = -0.5). In this case the mean activation levels of both dominant and non-dominant hemispheric regions remained constant after lesioning. In contrast, with excitatory callosal
[Fig. 7 plots not reproduced. Panels: (a) c = -2, acute; (b) c = -2, chronic; (c) c = +2, acute; (d) c = +2, chronic. Horizontal axis: lesion size (0 to 1). Vertical axis: mean activation.]
Fig. 7. Mean activation vs. lesion size for asymmetric model with learning rate asymmetry acutely and chronically for inhibitory (a, b) and excitatory (c, d) callosal connections.
models (Fig. 7c, d) there was a large bilateral drop in mean activation levels. This reflected loss of mutually reinforcing excitation of each hemisphere by the other. After retraining, the activation levels only partially increased back toward their pre-lesion values, more so in the dominant left hemisphere.

Acallosal model

Agenesis of the corpus callosum refers to congenital absence of the corpus callosum. To simulate this situation, in which the callosal ‘lesion’ occurs before any training, we trained each model version without a corpus callosum. All of the diffuse model versions that were used for corpus callosum lesions as described above had their untrained networks trained again from scratch without callosal connections. This enabled us to compare the results of the acallosal models with the corresponding models trained with a corpus callosum in place. Each of the points in Fig. 8 represents a model version’s lateralization obtained without corpus callosum vs. with corpus callosum. These results show that substantial lateralization can be obtained even without a corpus callosum. This happens due to the underlying asymmetry present in the model’s
[Fig. 8 plot not reproduced. Horizontal axis: lateralization with corpus callosum (approximately -0.6 to 0.1).]
Fig. 8. Lateralization without callosal connections (vertical axis) vs. lateralization with callosal connections. Symbols: x (o) = models with excitatory (inhibitory) callosal influences.
hemispheric regions, and is consistent with experimental studies that have found language lateralization in acallosal individuals (Lassonde et al., 1990; Sauerwein et al., 1993). The dashed line in Fig. 8 represents the set of points for which the same lateralization is obtained both with and without a corpus callosum. Any points above this line have more lateralization when the corpus callosum is present than when it is absent. As can be seen, this only occurred to a substantial degree with two of the four asymmetric models having inhibitory callosal connections. These results suggest that the extent of post-lesion lateralization depends on two factors jointly (underlying cortical asymmetries and the nature of callosal influences) rather than on either alone.
Lesioning results with the topographic model

Our second set of callosal lesions was carried out using versions of the topographic model (Shkuro et al., 1999). In this model, a topological structure is imposed on the two hemispheric regions, which are best viewed as two-dimensional grids of hexagonally-tessellated elements. Each neural element roughly corresponds to a cortical column. To avoid edge effects, both cortical grids are assumed to be tori, i.e. each element on one edge has immediate neighbors on the opposite edge (periodic boundary conditions). Each cortical element in this model (such as x in Fig. 1) has two sets of outgoing cortical connections. Intrahemispheric interactions are represented by a set of excitatory connections (strength +1.0) to six immediate neighbors (e in Fig. 1), and a set of inhibitory connections (strength -2.0) to the next twelve neighbors (i in Fig. 1), providing a Mexican Hat pattern of lateral interactions. Interhemispheric interactions are represented by a set of corpus callosum connections, whose strength varied in different simulations, to each element’s counterpart in the contralateral hemispheric region and its six immediate neighbors (x and c in Fig. 1). This interhemispheric connectivity is motivated by the observation that biological callosal fibers are mostly homotopic, i.e. each hemisphere projects to the other in a topographic fashion so that roughly mirror-symmetric points are connected to each other (Innocenti, 1986). Callosal fibers synapse mainly on contralateral neurons over a fairly small cortical region (Hartenstein and Innocenti, 1981; Innocenti, 1986). Callosal connectivity in the topographic model is always bidirectional, i.e., if a cortical element on the left side has a callosal connection to an element on the right side, then the reverse connection is also always present, even if the hemispheric regions have different sizes. For direct comparability with the diffuse model, we randomly selected the callosal connections that were severed in each simulated callosal lesion, being sure that the corresponding left-to-right callosal connection was lost for each right-to-left connection lost, and vice versa.

The effects of acute callosal lesions were simulated using eight variations of the basic topographic model, as summarized in Table 2. These variations represent four different situations concerning underlying hemispheric asymmetry similar to those used with the diffuse model: symmetric (no hemispheric asymmetries except initial random weights), asymmetric size (more hemispheric elements on the left side), asymmetric excitability (higher maximum activation on the left side) and asymmetric synaptic plasticity (higher learning rate on the left side). These different representative versions were selected so that pre-lesion lateralization varied from almost none (p = 0) to almost complete (p = -0.535), and activation patterns in the two hemispheric regions varied from fairly anti-symmetric to completely symmetric. Table 2 also shows the pre-lesion performance errors E, EL and ER, and the mean activation levels in the two hemispheric regions. The basic size of each hemispheric region was 10 x 10 elements, larger than in the diffuse model lesioning studies described above. For each simulated lesion, we again measured model performance three times: when the model was intact (fully trained network just before any lesion occurred), acutely (immediately after the lesion), and chronically (after the network was retrained so that its performance returned to our predefined criterion of ‘normality’ (E ≤ 0.05) or the pre-specified time limit of 10,000 training epochs was exceeded). In all three cases the errors E, EL and ER and lateralization p were measured.

Symmetric hemispheric regions

Simulated lesions were performed on the two symmetric versions of the topographic model (Table 2, top two rows), one with inhibitory and one with excitatory callosal influences. In general, symmetric versions of the topographic model exhibited similar but less pronounced post-lesion changes than the diffuse model described above. Acute error varied as a function of lesion size as seen in Fig. 9a, c: post-lesion performance of the full model generally worsened as lesion size increased, especially with inhibitory callosal influences. This performance impairment was much less
TABLE 2. Variations of intact topographic model used for lesions

Asymmetry                 Callosal    Error                      Lateralization    Mean activation
                          strength    Full     Left     Right                      Left     Right
Symmetric model           -2          0.046    0.291    0.281     0.014            0.143    0.135
                          +2          0.036    0.127    0.127     0.000            0.196    0.196
Size 12 x 12 vs. 8 x 8    -2          0.043    0.229    0.408    -0.261            0.120    0.116
                          +2          0.033    0.159    0.290    -0.173            0.163    0.160
MaxAct 1.0 vs. 0.5        -2          0.040    0.046    0.449    -0.535            0.194    0.038
                          +2          0.042    0.086    0.255    -0.204            0.203    0.141
LearnRate 0.05 vs. 0.02   -2          0.045    0.061    0.357    -0.374            0.183    0.076
                          +2          0.038    0.059    0.322    -0.325            0.218    0.149
[Fig. 9 plots not reproduced. Panels: (a) c = -2.0, acute; (b) c = -2.0, chronic; (c) c = +2.0, acute; (d) c = +2.0, chronic. Horizontal axis: lesion size, % elements (0 to 100). Vertical axis: RMSE.]
Fig. 9. Root mean square error (RMSE) vs. lesion size for (a, b) inhibitory and (c, d) excitatory callosal influences, both acutely and following further training, in symmetric versions of the model. Plots are of error E for full model (dot-dash line with open triangles), EL (dashed line with open circles), and ER (solid line with closed squares). In (c) and (d), EL is hidden by ER.
pronounced than in the case of the diffuse model (Fig. 3a, c). The symmetric topographic model appeared to be more balanced prior to lesioning than the corresponding diffuse model, having very little initial lateralization of function. With excitatory callosal influences, individual hemispheric performance slightly worsened as lesion size increased, while with inhibitory callosal influences it remained virtually at the pre-lesion level. Following each lesion, adaptations via synaptic changes were allowed to continue. This post-lesion training of the model improved its performance, always leading to complete recovery where error E was reduced to or below 0.05, the criterion for intact performance (bottom line in Fig. 9b, d). Both hemispheric regions often participated in recovery, although changes were minimal. Mean activation levels in both the acute and chronic cases varied qualitatively in a similar manner as with the corresponding diffuse model, as seen in Fig. 10. The changes in mean activation levels of the two hemispheric regions during recovery were very small for both types of callosal influences, and much smaller than those seen with the diffuse model.
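The lesion protocol used in these simulations (measure the intact model, sever a random set of callosal connections in matched left-to-right and right-to-left pairs, measure acutely, then retrain until E is at or below 0.05 or 10,000 epochs have elapsed) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the `measure`, `apply_lesion` and `train_epoch` callables, and the tuple representation of connections are all assumptions made for the example.

```python
import random


def select_callosal_lesion(pairs, fraction, rng):
    """Choose a random fraction of callosal pairs to sever.

    `pairs` is a list of (left_element, right_element) tuples (a hypothetical
    representation). Each chosen pair loses both directed connections, so a
    lost right-to-left connection always has its left-to-right partner lost.
    """
    n_cut = round(fraction * len(pairs))
    severed = set()
    for left, right in rng.sample(pairs, n_cut):
        severed.add(("L->R", left, right))
        severed.add(("R->L", right, left))
    return severed


def lesion_experiment(measure, apply_lesion, train_epoch,
                      criterion=0.05, max_epochs=10_000):
    """Return (intact, acute, chronic, epochs) errors for one lesion.

    measure()      -> current error E of the full model (assumed callable)
    apply_lesion() -> severs the selected connections in place
    train_epoch()  -> runs one further epoch of training
    """
    intact = measure()           # fully trained model, just before the lesion
    apply_lesion()
    acute = measure()            # immediately after the lesion
    epochs = 0
    while measure() > criterion and epochs < max_epochs:
        train_epoch()            # synaptic changes continue after the lesion
        epochs += 1
    chronic = measure()          # after recovery or after the epoch limit
    return intact, acute, chronic, epochs
```

The same three measurements would be repeated for EL, ER and the lateralization by passing a `measure` callable that returns the quantity of interest.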
[Fig. 10 plots not reproduced. Panels: (a) c = -2.0, acute; (b) c = -2.0, chronic; (c) c = +2.0, acute; (d) c = +2.0, chronic. Horizontal axis: lesion size, % elements (0 to 100). Vertical axis: mean activation.]
Fig. 10. Mean activation vs. lesion size for (a, b) inhibitory and (c, d) excitatory callosal influences, both acutely and following further training in symmetric versions of the model. In (c) and (d), the curves for the left and right hemispheric regions are virtually identical.

Asymmetric hemispheric regions

Simulated lesions to versions of the topographic model with asymmetric hemispheric regions and function lateralization generally produced results qualitatively similar to those seen with the symmetric topographic model as described immediately above. These results were again modulated by asymmetric functionality of the hemispheric regions and by corpus callosum strength. Qualitatively similar behavior and changes as with the diffuse asymmetric models were observed, so these are not shown graphically here for brevity. Acutely, performance of the full topographic model was at most slightly impaired. This limited performance impairment usually increased as lesion size increased, and was more pronounced in cases with inhibitory callosal influences, indicating a corresponding performance impairment of the dominant left hemispheric region. With lesions in versions having an excitatory corpus callosum, very little error was introduced into acute performance. Performance of the full model was more closely approximated by that of the dominant hemisphere in model versions with higher levels of pre-lesion lateralization. Callosal lesioning had virtually no effect on performance of the non-dominant right hemispheric region. Following each lesion, adaptations via synaptic changes were allowed to continue, and as with the symmetric model this also led to complete recovery. Quick recovery from the minimal deficit that occurred was usually due to improved performance of the dominant left hemisphere. Lateralization of function towards the dominant left hemisphere, which was found to slightly decrease acutely as the dominant hemisphere received less excitation from the right, was generally restored to its pre-lesion level during retraining. Mean activation levels changed similarly to those of the diffuse model, but these changes were again less pronounced.
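The direction of the mean activation changes reported for callosal lesions (a rise when callosal influences are inhibitory, a fall when they are excitatory) can be illustrated with a toy calculation. The sketch below is not either of the chapter's models: it assumes just two units, one per hemispheric region, with a logistic activation function, a constant external drive of -1.0, and a single callosal gain c. All of these choices are assumptions made for illustration only.

```python
import math


def steady_activation(drive, callosal_gain, steps=200):
    """Iterate two mutually coupled logistic units to a steady state.

    Each unit receives a constant `drive` plus `callosal_gain` times the
    activation of the unit in the other hemisphere. A gain of 0.0 stands
    for a complete callosal section in this toy setting.
    """
    left = right = 0.0
    for _ in range(steps):
        left, right = (
            1.0 / (1.0 + math.exp(-(drive + callosal_gain * right))),
            1.0 / (1.0 + math.exp(-(drive + callosal_gain * left))),
        )
    return left, right


# Compare intact coupling (c = -2 or c = +2) with the sectioned case (c = 0).
for c in (-2.0, 2.0):
    intact = steady_activation(-1.0, c)[0]
    sectioned = steady_activation(-1.0, 0.0)[0]
    change = "rises" if sectioned > intact else "falls"
    print(f"c = {c:+.0f}: activation {change} after sectioning")
```

With inhibitory coupling each unit is released from inhibition when the connection is removed, so its activation rises; with excitatory coupling each unit loses support from the other, so its activation falls.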
Unilateral loss of afferent connections

While the effects of callosal lesions described above show that full model performance is generally impaired by callosal sectioning, they do not clearly indicate the extent to which this impairment is due to loss of useful information transfer between the hemispheric regions. Post-lesion impairment might, for example, be solely due to non-specific imbalances in hemispheric activation. To explore further the issue of whether specific information was being transferred between hemispheres via the corpus callosum (and by implication was therefore being lost by callosal sectioning), we performed a series of simulations using the topographic model where connections between input and hemispheric regions were cut. The lesioned afferent connections were either all incoming connections to a selected subset of hemispheric elements, or all outgoing connections from a selected subset of input elements. We call these two different methods for sectioning input-to-cortex connections grouping by cortical elements and grouping by input elements, respectively. Connection sectioning with grouping by cortical elements roughly corresponds to a focal lesion of subcortical white matter in the vicinity of a cortical region. Grouping by input elements does not have an obvious biological interpretation, but the results are interesting from the computational point of view and give some insight into the issue of whether specific information is being exchanged between the hemispheric regions.

Six versions of the trained, intact topographic model were used to study the effects of sectioning input connections: two symmetric versions (the top two rows of Table 2), two versions with maximum activation asymmetry (rows five and six in the table), and two versions with learning rate asymmetry (the last two rows in the table). Each input lesion was introduced by sectioning connections from the input layer to the left hemispheric region, which was dominant in the asymmetric models. In experiments with grouping by cortical elements, lesion size varied from 0 to 100% of cortical elements: a single hemispheric element was selected as the focus of the lesion, then it and its neighbors in concentric 'circles' of increasing radii, up to the desired lesion size, were added to the set of elements whose connections from the input layer were all sectioned. In experiments with grouping by input elements, only those input elements that could be active for the specific 50 input words used for training were considered for lesions. There were 13 such elements (seven in the first position in a word, three in the second, and three in the third). Lesion size varied from 0 to 13 input elements, in either left-to-right or right-to-left order of corresponding position in a word. For each lesion we measured acute and chronic performance of the full model and left and right hemispheric regions individually, as done in the previous sections.
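The two ways of grouping sectioned input connections can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the hexagonal grid is indexed with axial coordinates (q, r) wrapped modulo the grid size, distance is the usual hexagonal grid distance, ties within a ring are broken by coordinate order, and the function names are hypothetical. On a 10 x 10 torus this coordinate scheme gives each element six neighbors at distance 1 and twelve at distance 2, matching the neighborhood sizes described for the topographic model.

```python
from itertools import product


def hex_torus_distance(a, b, width, height):
    """Hexagonal grid distance between axial coordinates a and b on a torus.

    Assumes wrap offsets of at most one period suffice, which holds for the
    square grids considered here.
    """
    dq = (b[0] - a[0]) % width
    dr = (b[1] - a[1]) % height
    best = None
    for i, j in product((-1, 0, 1), repeat=2):
        q, r = dq + i * width, dr + j * height
        d = max(abs(q), abs(r), abs(q + r))
        if best is None or d < best:
            best = d
    return best


def cortical_lesion(focus, fraction, width, height):
    """Grouping by cortical elements: the focus, then rings of growing radius.

    Returns the elements whose incoming input connections are all sectioned.
    """
    elements = [(q, r) for q in range(width) for r in range(height)]
    elements.sort(key=lambda e: (hex_torus_distance(focus, e, width, height), e))
    return elements[:round(fraction * len(elements))]


def input_lesion(input_elements, n_cut, left_to_right=True):
    """Grouping by input elements: cut n_cut elements from one end of the word.

    `input_elements` is assumed to be ordered by position in the word.
    """
    ordered = list(input_elements)
    if left_to_right:
        return ordered[:n_cut]
    return ordered[len(ordered) - n_cut:]
```

For example, `cortical_lesion((0, 0), 0.07, 10, 10)` selects the focus and its six immediate neighbors, and `input_lesion(range(13), 13)` corresponds to a complete loss of the 13 usable input elements.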
Symmetric hemispheric regions

In experiments with the two versions of the symmetric topographic model, a substantial drop in the performance of the full model was observed, and this drop increased as the lesion size grew larger, as illustrated in Fig. 11. The impact on the left hemispheric region performance was more dramatic than that on the right, as would be expected, and acutely the model became slightly lateralized to the right. Full recovery always occurred with re-training. During re-training, as the performance of the full model returned to the pre-lesion level, the right hemispheric region remained dominant (post-training lateralization was as high as p = 0.2 for the largest lesions). These changes occurred regardless of whether callosal influences were excitatory or inhibitory.

Asymmetric hemispheric regions

In the four asymmetric versions of the model, the left hemispheric region was always dominant prior
[Fig. 11 plots not reproduced. Panels: (a) c = -2.0, acute; (b) c = -2.0, chronic; (c) c = +2.0, acute; (d) c = +2.0, chronic. Horizontal axis: lesion size, # input nodes (out of 13). Vertical axis: RMSE.]
Fig. 11. Root mean square error (RMSE) vs. lesion size for input connection lesions with grouping by input elements, both acutely and following further training, in the symmetric model with inhibitory (a, b) and excitatory (c, d) corpus callosum. Lesions in the left hemisphere's connections from input layer.
to lesioning (the lateralization varied from p = -0.2 to p = -0.5). First consider sectioning of input connections to the left, dominant hemispheric region, where grouping by cortical elements is used. Except for the model with the weakest pre-lesion lateralization (maximum activation asymmetry, c = +2, p = -0.204), the left hemispheric region remained dominant both acutely and chronically for almost all lesion sizes (Fig. 12 provides an example with asymmetric learning rates). Only in one case was the right hemispheric region able to catch up with the left region during retraining with a lesion size of 100% of connections to cortical elements. This result demonstrates the effective information transmittal to the left hemispheric region from the right hemispheric region via the corpus callosum. For larger lesions, say 95%, only a few elements in the left hemispheric region receive information directly from the input layer. Therefore the initial left hemisphere activation pattern that encodes the input word for subsequent generation of output phonemes must be formed based on the signals received from the right hemispheric region via the corpus callosum. And yet, the left hemispheric region remains dominant
when tested in isolation, especially with excitatory callosal connections (e.g., even with 100% loss of afferents, as in Fig. 12d). Another less obvious possibility, that feedback from the state elements to the left hemispheric region (see Fig. 1) might provide the information for the left hemispheric region to perform so well, seems unlikely. There are three reasons: the right hemispheric region is non-dominant and performs very poorly alone, such feedback does not exist for the first phoneme generated for any word, and such feedback cannot unambiguously predict subsequent phonemes. Finally, consider another example (asymmetric excitability) with similar sectioning of input connections to the dominant left hemispheric region, but where grouping by input elements is used. Figure 13 gives representative results, which resemble those obtained with grouping by cortical elements. Even when there is a complete loss of input connections to the left hemispheric region, and input information reaches it only via the right hemispheric region and callosal connections, the left hemispheric region still retains its dominance (Fig. 13b, d).
[Fig. 12 plots not reproduced. Recoverable panel titles: (b) c = -2.0, chronic; (d) c = +2.0, chronic. Horizontal axis: lesion size, % input connections (0 to 100). Vertical axis: RMSE.]
Fig. 12. Root mean square error (RMSE) vs. lesion size for input connection lesions with grouping by cortical elements, both acutely and following further training in the model with learning rate asymmetry 0.05 vs 0.02 for inhibitory (a, b) and excitatory (c, d) corpus callosum. Lesions in the left hemisphere's connections from input layer.
[Fig. 13 plots not reproduced. Recoverable panel titles: (b) c = -2.0, chronic; (d) c = +2.0, chronic. Horizontal axis: lesion size, # input nodes (out of 13). Vertical axis: RMSE.]
Fig. 13. Root mean square error (RMSE) vs. lesion size for input connection lesions with grouping by input elements, both acutely and following further training in the model with maximum activation asymmetry 1.0 vs. 0.5 for inhibitory (a, b) and excitatory (c, d) corpus callosum. Lesions in the left hemisphere’s connections from input layer.
Discussion

There have been very few previous neural models of cerebral cortex that examined aspects of hemispheric interactions (Anninos et al., 1984; Anninos and Cook, 1984; Cook and Beech, 1990; Ringo et al., 1994). These models did not examine lateralization of functionality through synaptic weight changes, nor the effects of lesions, as was done here. Conversely, contemporary computer models of focal brain damage or ‘simulated stroke’ have generally involved only unilateral cortical regions, and thus could not examine the issue of dynamic shifts in lateralization with callosal lesions (Sutton et al., 1994; Goodall et al., 1997; Sober et al., 1997; Revett et al., 1998). To our knowledge, neither these nor any previous neural modeling studies systematically examined the factors that might cause lateralization, or how lateralization might shift during recovery from callosal damage.

Post-lesion changes

In the study reported here, we examined the effects of corpus callosum lesions on performance of a
simple model of word naming. Different versions of the model were lesioned where callosal influences were varied between diffuse vs. topographic, and excitatory vs. inhibitory, representing the four classes of hypotheses that have been made previously concerning transcallosal hemispheric interactions (Cook, 1986). In addition, for each of these four versions of the model, we varied the extent of underlying hemispheric asymmetry that was present, and thus the amount of pre-lesion lateralization. Callosal lesions of varying severity were then systematically introduced into each of the model versions. Summarizing the results of the present study, we found that with symmetric versions of the model, the larger the callosal lesion, the greater the acute performance impairment of the full model, as would be expected. Performance impairment was most evident with diffuse callosal connections, and reflected impaired performance of each hemispheric region when measured in isolation as well as of the full model. Similar results were obtained acutely with the asymmetric models, but were modulated by the degree of pre-lesion lateralization. Specifically, the more lateralized a model was
initially, the more effect that a callosal lesion had on performance impairment of the dominant left hemispheric region relative to the non-dominant right hemispheric region. In addition, post-lesion recovery was characterized by a period of rapid improvement followed by progressively slower improvement, with complete recovery to baseline performance levels in all cases. The occurrence of increased performance deficits with larger lesion size and the rapid-then-slowing time course of recovery are consistent with both clinical experience and animal lesioning results involving neurological impairments in general. Such results are encouraging in demonstrating that, however simplified the model is compared to reality, it does capture some fundamental aspects of post-lesion behavioral observations. In addition, the results of input connection lesions showed that, in our model, important specific information is transmitted between the two hemispheric regions via the corpus callosum. In some cases of 100% lesions, the dominant left hemispheric region was completely cut off from the input layer and still remained dominant after retraining, independent of the assumed inhibitory or excitatory role of the corpus callosum. In such situations the specific identity of the input word must have been transmitted via callosal connections from the right to left hemispheric region. This information transmittal via the corpus callosum is consistent with numerous experimental studies (Sperry, 1982; Seymour et al., 1994; Gazzaniga, 1995). Our results suggest that in some cases, one role of the intact non-dominant hemisphere in recovery may be to help the dominant hemisphere to ‘recall’ the activation patterns that correspond to the internal encoding of input words by sending the appropriate signals via the corpus callosum. Another consistent finding immediately following a lesion was a change in the mean activation levels of the individual hemispheric regions. 
The nature of this shift was primarily determined by whether the callosal influences were inhibitory or excitatory, and was similar in both symmetric and asymmetric versions of the model, and in diffuse and topographic models. With inhibitory callosal influences, mean activation levels generally rose bilaterally in the hemispheric regions. In contrast,
with excitatory callosal influences, mean activation levels fell bilaterally. These patterns of mean activation shift were present in both diffuse and topographic model versions, but were more pronounced with diffuse callosal connections. Experimentally, regional cerebral blood flow and glucose metabolism have been found to decrease in both cerebral hemispheres following a unilateral stroke, presumably due to loss of transcallosal excitation (Meyer et al., 1987; Dobkin et al., 1989; Cappa et al., 1997). Less is known about the effects on hemisphere activity following corpus callosum ischemic infarcts and/or callosal sectioning. Bilateral depression of cortical metabolism as measured by PET has been found in two baboons following anterior corpus callosum sectioning (Yamaguchi et al., 1990), and mild bilateral decrease of cerebral blood flow has also been found in eleven swine following callosal sectioning (Andrews et al., 1993). Sectioning of the model’s callosal connections in the current study yields the testable prediction that inhibitory callosal influences should produce increased bilateral post-lesion activation levels, whereas excitatory influences should produce decreased activation levels. Thus, the limited experimental data that exist are most consistent with an excitatory role for the corpus callosum. Of course, these predictions hold only to the extent that coupling exists between neuronal activity and blood flow/oxidative metabolism. We also examined the changes that occurred during a recovery period as synaptic changes were allowed to continue. Recovery occurred in the model after a lesion because the increase in output errors following a lesion reactivated the learning process. Both hemispheric regions perceived the same output errors as the model continued to process words; no changes were made to the model other than the acute, static callosal lesion. All versions of the model recovered fully.
This full recovery was expected because, for the reasons noted earlier, each hemispheric region alone generally had the potential to learn the input-output mapping involved. More interestingly, both hemispheric regions sometimes participated in and contributed to postlesion recovery. In the symmetric versions of the
model, this was most evident for larger callosal lesions, and was true regardless of whether callosal effects were excitatory or inhibitory. In contrast, with asymmetric models, recovery from callosal lesions was largely due to improved performance by the dominant hemispheric region. Often only the dominant hemisphere improved significantly following callosal sectioning.

Previous results and conclusions
The neural models used in this investigation are like many other contemporary neural models in being both small and markedly simplified when compared to biological reality. The sudden focal lesions that have been used, while similar to those used in recent computer models of stroke, are also quite simple. However, in spite of these limitations, the model’s behavior is quite interesting in capturing a number of phenomena observed experimentally. As noted earlier, both the diffuse model (Reggia et al., 1998) and the topographic model (Shkuro et al., 1999) have been used previously to study how hemispheric asymmetries and callosal influences can lead to lateralization of word naming. Both models have also previously been used to examine the effects of unilateral intrahemispheric lesions (Reggia et al., 1999; Shkuro et al., 1999). Jointly with the new callosal lesioning results presented in this current chapter, these simulations relate to and have implications for a number of questions concerning hemispheric interactions, lateralization of language and other cognitive/behavioral functions, and recovery from brain damage, as follows.

First, which underlying hemispheric asymmetries are responsible for behavioral lateralization? Many hemispheric asymmetries have been identified that are candidates for explaining why behavioral lateralization occurs. Perhaps the most widely accepted theory is that hemispheric anatomical asymmetries are a critical factor (Galaburda and Habib, 1987; Geschwind and Galaburda, 1987; Hellige, 1993). Anatomic and cytoarchitectonic asymmetries include, for example, a larger left temporal plane in 65% of subjects, including newborns (Geschwind and Levitsky, 1968; Witelson and Pallie, 1973; Galaburda et al., 1978),
although it is difficult to see how this could account for left language dominance in over 90% of the population, and the nature and significance of such findings have been questioned based on careful magnetic resonance imaging studies (Loftus et al., 1993). Other significant hemispheric asymmetries exist, including greater higher-order dendritic branching in speech areas of the left hemisphere (Scheibel, 1985), more gray matter relative to white matter in the left hemisphere (Gur et al., 1980), asymmetric distributions of neurotransmitters such as dopamine and norepinephrine (Tucker and Williamson, 1984), and a lower threshold for motor-evoked potentials on the left (Macdonell et al., 1991). At present it is not clear which, if any, of these underlying asymmetries leads to behavioral lateralization. Motivated by these empirical findings, variations of the intact diffuse and topographic models used for lesioning in this study were examined previously to determine conditions under which lateralization would emerge ‘spontaneously’ during learning (Reggia et al., 1998; Shkuro et al., 1999). Our simulations showed clearly that, within the limitations of the intact models we studied, it is easy to produce lateralization. Asymmetries in such different factors as initial random weights, hemisphere size, synaptic plasticity, activation level, sensitivity to input, and feedback intensity were all found to lead to lateralization of varying degrees. This finding is consistent with past arguments that a single underlying hemispheric asymmetry is unlikely to account for language and other behavioral lateralizations (Hellige, 1993). We have suggested previously that asymmetric synaptic plasticity can be viewed as a common causative mechanism for the various factors leading to lateralization in the model, and that the process of lateralization is interpretable as the outcome of a ‘race to learn’ between the model’s two hemispheric regions (Reggia et al., 1998).
However, the generality of this result remains to be established.

Second, does a hemisphere exert primarily an excitatory or inhibitory effect on the opposite hemisphere via the corpus callosum? This is a central issue in this current chapter. Hemisphere interactions via pathways such as the corpus callosum provide another potential factor in behavioral lateralization (Zaidel, 1983). As noted in the Introduction, the neural elements connected by callosal axons, as well as transcallosal diaschisis and split-brain experiments, suggest that transcallosal interactions are mainly excitatory in nature (Berlucchi, 1983). However, this hypothesis is quite controversial (Denenberg, 1983), with a number of researchers suggesting that, regardless of the specific nature of callosal synaptic connections, transcallosal inhibitory interactions are much more important (Kinsbourne, 1978; Cook, 1986). Recent transcranial magnetic stimulation studies have also indicated that activation of one motor cortex region inhibits that of the contralateral (Ferbert et al., 1992; Meyer et al., 1995). The overall results with lesions to our model provide some support for the hypothesis that callosal effects are predominantly excitatory. This is somewhat difficult to reconcile with findings from the intact model, where it was found that lateralization with any hemispheric asymmetry tended to occur most readily and markedly when callosal connections were inhibitory (Reggia et al., 1998; Shkuro et al., 1999). However, when hemispheric asymmetries were sufficiently pronounced, or when they directly affected synaptic plasticity, marked lateralization could also occur with excitatory callosal connections in the intact model. In our earlier simulated lesioning studies with excitatory callosal connections, unilateral hemispheric lesions generally resulted in an acute fall in hemisphere activation bilaterally, consistent with experimental data (Meyer et al., 1993; Bowler et al., 1995). This was unlike the increased contralateral hemisphere activation seen acutely with versions of the model having inhibitory callosal connections. An increasing ‘mass effect’ with progressively larger lesions and the occurrence of diaschisis were most clearly evident with excitatory callosal connections.
In other words, on balance the hemispheric lesioning results with model versions having excitatory callosal connections were most consistent with experimental data, and support the hypothesis that callosal influences are primarily excitatory. The callosal lesioning results in the current study when excitatory callosal influences existed also predict that bilateral fall of hemispheric activation should occur following
callosal infarcts or callosotomy. As noted above, this is consistent with limited animal data (Yamaguchi et al., 1990; Andrews et al., 1993).

Third, to what extent does the non-lesioned hemisphere participate in recovery following a hemispheric lesion? There has been a long-standing clinical interest in the extent to which the right, intact hemisphere contributes to recovery from aphasia following a stroke destroying left hemispheric language areas (Wernicke’s area, Broca’s area, etc.). Evidence has existed for over a century that the right hemisphere plays a crucial role in the language recovery process in adults. The earliest evidence came from observations that recovery from aphasia due to a left hemisphere lesion would relapse when a new, mirror-image right hemisphere lesion occurred years later (Gowers, 1893; Lee et al., 1984). Subsequently, a series of studies has provided consistent evidence of substantial right hemisphere responsibility for language recovery after left hemisphere strokes, using a wide variety of methods: the Wada Test (Kinsbourne, 1971), Xenon 133 cerebral blood flow measurements (Knopman et al., 1984), auditory evoked potentials (Papanicolaou et al., 1987), dichotic listening (Papanicolaou et al., 1988), and transcranial Doppler studies (Silvestrini et al., 1995, 1998). The most recent evidence that the right hemisphere is responsible, at least in part, for recovery from aphasia has come during the last decade from functional imaging studies, primarily in the form of positron emission tomography (PET) investigations. For example, in one study six recovered, right-handed Wernicke’s aphasics were found to have increased activation in the right hemisphere in areas largely homotopic to the left hemisphere’s language zones (Weiller et al., 1995).
Other PET studies involving a broader range of aphasia syndromes have reached similar conclusions, either by comparing acute and chronic PET studies in the same patients or by comparing recovered aphasia patients to control subjects (Ohyama et al., 1996; Cappa et al., 1997). However, there are also PET studies that, although often finding increased right hemisphere activation in individuals with poststroke aphasia, have questioned how well these changes correlate with the recovery process (Belin et al., 1996; Heiss et al., 1993, 1997). These latter
studies are difficult to interpret in that some have been based solely on chronic data without documenting the extent of patient recovery (Belin et al., 1996), or have other complicating methodological problems (e.g., Heiss et al. (1993) considered only residual variance in outcome not accounted for by initial severity). Although controversial, the bulk of the experimental data so far is indicative of a right hemisphere role in aphasia recovery. In many of the versions of our model examined previously, the contralateral, intact hemisphere was responsible for a substantial part of post-lesion recovery following hemispheric lesions (Reggia et al., 1999; Shkuro et al., 1999). Such results support the hypothesis of a right hemisphere role in recovery from aphasia due to left hemisphere lesions. However, the extent of the intact hemisphere’s participation after unilateral cortical lesions was very much a function of lesion size, being much more evident with larger dominant side lesions. In addition, the focal input connection lesions to our model’s left cortical region described in the present chapter can be viewed as representing subcortical white matter lesions that interrupt cortical afferents. The results with such lesions indicate that the cortical vs. subcortical location of a lesion may also be a factor in how much the right hemisphere participates in recovery. With the extensive loss of input connections to the dominant left hemispheric region, the left hemisphere was still responsible for most recovery if it continued to receive transcallosal information. This contrasts with the greater role of the right hemispheric region in recovery that was found earlier with simulated cortical lesions (Reggia et al., 1999; Shkuro et al., 1999).
The results in the current study with simulated callosal lesions indicate that when the hemispheric regions were symmetric and significant lateralization was not present, both hemispheric regions in isolation improved their performance and contributed to recovery. In contrast, in strongly lateralized versions of the models, most improvement in the full model following callosal lesions was associated with improvement in the dominant left hemispheric region’s performance. This is also quite different from the results seen with unilateral dominant
hemisphere lesions where both hemispheric regions participated in recovery, and represents another prediction of the model. Overall, both the hemispheric lesioning simulations done earlier and the current callosal sectioning results indicate that future experimental studies of this issue should carefully control for lesion size.

Finally, do computational models have a role to play in gaining a deeper understanding of hemispheric interactions? Questions about the nature of hemispheric cooperation and competition must ultimately be resolved by experiment. What is surprising, however, given the tremendous volume of past and ongoing experimental research into callosal functionality, lateralization of cognitive and behavioral functions, hemispheric roles during recovery from stroke, etc., is the very small amount of past work that has been done on developing neural models related to these issues. The results we have obtained here and in our other recent modeling work strongly suggest that formal models can also play an important role. They provide a theoretical framework in which non-trivial implications of hypotheses about hemispheric properties and interactions can be demonstrated. The complexity and nonlinearity of brain dynamics indicate that computational models of interacting hemispheric regions may prove useful in both interpreting existing data and guiding future experiments, just as models have proved extremely useful in understanding other complex systems (e.g., meteorological predictions). Of course, much further work will be necessary to establish if this is true. Our own efforts are currently focusing on determining the generality of the results described here via other computational models of lateralization for quite different tasks: visual object recognition (Shevtsova and Reggia, 1999) and formation of self-organizing cortical maps (Levitan and Reggia, 1999).
Appendix

Words used for training:
CAD CAP CAT COD COP COT HAD HAP HAT HID HIP HIT HOD HOP HOT LAD LAP LID LIP LIT LOP LOT MAD MAP MAT MID MIP MOP PAD PAP PAT PIP PIT POD POP POT SAD SAP SAT SIP SIT SOD SOP SOT TAP TAT TIP TIT TOP TOT

Learning rule equations: Symbols o, h, i, s and t are superscripts that denote output, hidden, input, state, and target values, respectively; $\eta$ is the learning rate, and $M$ is the maximum activation level.

• Error at the i-th output unit: $e_i^o = (a_i^t - a_i^o)$
• Weight change for connection from j-th hidden unit to i-th output unit:
• Error at the j-th hidden unit:
• Weight change for connection from k-th input unit to j-th hidden unit:
• Weight change for connection from l-th state unit to j-th hidden unit:
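As an illustration of how learning-rule quantities of this kind are typically combined, the following minimal sketch performs one error-backpropagation step for a network with input, state, hidden and output units. Only the output-error definition (target minus output activation, as read from the appendix) comes from the text; the scaled logistic activation, the delta terms, and all names (`act`, `forward`, `weight_changes`, `w_oh`, `w_hi`, `w_hs`) are assumptions of a standard generalized delta rule and should not be taken as the equations of the original model.

```python
import math


def act(x, M=1.0):
    """Logistic activation scaled to the range (0, M); an assumed form."""
    return M / (1.0 + math.exp(-x))


def forward(a_in, a_state, w_hi, w_hs, w_oh, M=1.0):
    """Compute hidden and output activations for one input/state pattern."""
    a_h = [act(sum(w * a for w, a in zip(w_hi[j], a_in)) +
               sum(w * a for w, a in zip(w_hs[j], a_state)), M)
           for j in range(len(w_hi))]
    a_o = [act(sum(w * a for w, a in zip(w_oh[i], a_h)), M)
           for i in range(len(w_oh))]
    return a_h, a_o


def weight_changes(a_in, a_state, target, w_hi, w_hs, w_oh, eta=0.5, M=1.0):
    """Return output errors and weight changes for one training pattern."""
    a_h, a_o = forward(a_in, a_state, w_hi, w_hs, w_oh, M)
    # Output error as given in the appendix: target minus output activation.
    e_o = [t - o for t, o in zip(target, a_o)]
    # Assumption: error scaled by the derivative of the activation, a(M - a)/M.
    d_o = [e * o * (M - o) / M for e, o in zip(e_o, a_o)]
    # Assumption: hidden error is the weighted sum of output deltas.
    e_h = [sum(w_oh[i][j] * d_o[i] for i in range(len(w_oh)))
           for j in range(len(a_h))]
    d_h = [e * h * (M - h) / M for e, h in zip(e_h, a_h)]
    dw_oh = [[eta * d * h for h in a_h] for d in d_o]      # hidden -> output
    dw_hi = [[eta * d * a for a in a_in] for d in d_h]     # input -> hidden
    dw_hs = [[eta * d * s for s in a_state] for d in d_h]  # state -> hidden
    return e_o, dw_oh, dw_hi, dw_hs


# Hypothetical toy network: 3 input, 2 state, 2 hidden and 2 output units.
w_hi = [[0.0] * 3 for _ in range(2)]
w_hs = [[0.0] * 2 for _ in range(2)]
w_oh = [[0.0] * 2 for _ in range(2)]
e_o, dw_oh, dw_hi, dw_hs = weight_changes(
    [1.0, 0.0, 1.0], [0.0, 1.0], [1.0, 0.0], w_hi, w_hs, w_oh)
```

Adding each returned change to the corresponding weight and repeating over the training patterns gives ordinary gradient-descent learning; the learning rate `eta` and maximum activation `M` play the roles named in the appendix, but their values here are arbitrary.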
Acknowledgement

Work supported by NINDS award NS35460.
References

Andrews, R., Bringas, J. and Alonzo, G. et al. (1993) Corpus callosotomy effects on cerebral blood flow. Neurosci. Lett., 154: 9-12.
Anninos, P. et al. (1984) A computer model for learning processes and the role of the cerebral commissures. Biol. Cybern., 50: 329-336.
Anninos, P. and Cook, N. (1988) Neural net simulation of the corpus callosum. Intl. J. Neurosci., 38: 381-391.
Belin, P., Van Eeckhout, P. and Zilbovicius, M. et al. (1996) Recovery from nonfluent aphasia after melodic intonation therapy: a PET study. Neurology, 47: 1504-1511.
Berlucchi, G. (1983) Two hemispheres but one brain. Behav. Brain Sci., 6: 171-3.
Bowler, J., Wade, J. and Jones, B. et al. (1995) Contribution of diaschisis to the clinical deficit in human cerebral infarction. Stroke, 26: 1000-1006.
Cappa, S., Perani, D. and Grassi, F. et al. (1997) A PET follow-up study of recovery after stroke in acute aphasics. Brain Lang., 56: 55-67.
Cook, N. (1986) The Brain Code. Methuen.
Cook, N. and Beech, A. (1990) The cerebral hemispheres and bilateral neural nets. Int. J. Neurosci., 52: 201-210.
Denenberg, V. (1983) Micro and macro theories of the brain. Behav. Brain Sci., 6: 174-178.
Dennis, M. and Whitaker, H. (1976) Language acquisition following hemidecortication: linguistic superiority of left over right hemisphere. Brain Lang., 3: 404-433.
Dobkin, J., Levine, R. and Lagreze, H. et al. (1989) Evidence for transcallosal diaschisis in unilateral stroke. Arch. Neurol., 46: 1333-1336.
Felleman, D. and Van Essen, D. (1991) Distributed hierarchical processing in primate visual cortex. Cereb. Cortex, 1: 1.
Ferbert, A. et al. (1992) Interhemispheric inhibition of the human motor cortex. J. Physiol., 453: 525-546.
Galaburda, A. and Habib, M. (1987) Cerebral dominance: biological associations and pathology. Discussions in Neurosciences, Vol. 4, No. 2. Foundation FESN.
Galaburda, A., Sanides, F. and Geschwind, N. (1978) Cytoarchitectonic left-right asymmetries in temporal speech region. Arch. Neurol., 35: 812-817.
Gazzaniga, M. (1995) Principles of human brain organization derived from split-brain studies. Neuron, 14: 217-228.
Geschwind, N., Galaburda, A., Geschwind, N. and Levitsky, W. (1968) Left-right asymmetries in temporal speech region. Science, 167: 186.
Giroud, M. and Dumas, R. (1995) Clinical and topographical range of callosal infarction. J. Neurol. Neurosurg. Psychiatry, 59: 238-242.
Goodall, S. and Reggia, J. et al. (1997) A computational model of acute focal cortical lesions. Stroke, 28: 101-109.
Gowers, W. (1893) A Manual of Diseases of the Nervous System. Churchill, London.
Gur, R. et al. (1980) Differences in distribution of gray and white matter in human cerebral hemispheres. Science, 207: 1226-8.
Hartenstein, V. and Innocenti, G. (1981) The arborization of single callosal axons in the mouse cerebral cortex. Neurosci. Lett., 19-24.
Heilman, K. and Valenstein, E. (1979) Clinical Neuropsychology. Oxford Univ. Press, Oxford.
Hellige, J. (1993) Hemispheric Asymmetry. Harvard, Cambridge, MA.
Heiss, W., Kessler, J., Karbe, H., Fink, G. and Pawlik, G. (1993) Cerebral glucose metabolism as a predictor of recovery from aphasia in ischemic stroke. Arch. Neurol., 50: 958-964.
Heiss, W. and Karbe, H. et al. (1997) Speech-induced cerebral metabolic activation reflects recovery from aphasia. J. Neurol. Sci., 145: 213-7.
Innocenti, G. (1986) General organization of callosal connections in the cerebral cortex. In: E. Jones and A. Peters (Eds.), Cereb. Cortex, Vol. 5, Plenum, 291-353.
Jordan, M. (1986) Attractor dynamics and parallelism in a connectionist sequential machine. Proc. 8th Ann. Conf. Cog. Sci. Soc., 531-546.
Kinsbourne, M. (1971) The minor cerebral hemisphere as a source of aphasic speech. Arch. Neurol., 25: 302-306.
Kinsbourne, M. (Ed.) (1978) Asymmetrical Function of the Brain. Cambridge.
Knopman, D., Rubens, A., Selnes, O., Klassen, A. and Meyer, M. (1984) Mechanisms of recovery from aphasia. Ann. Neurol., 15: 530-535.
Kupfermann, I. (1991) Localization of higher cognitive and affective functions. In: E. Kandel et al. (Eds.), Principles of Neural Science. Elsevier, 823-838.
Lassonde, M., Bryden, M. and Demers, P. (1990) The corpus callosum and cerebral speech lateralization. Brain Lang., 38: 195-206.
Lee, H., Nakada, T., Deal, J., Lin, S. and Kwee, I. (1984) Transfer of language dominance. Ann. Neurol., 15: 304-307.
Levitan, S. and Reggia, J. (1999) Interhemispheric effects on map organization following simulated cortical lesions. Artif. Intell. Med., in press.
Lezak, M. (1995) Neurological Assessment. Oxford University Press.
Loftus, W. et al. (1993) Three-dimensional quantitative analysis of hemispheric asymmetry in the human superior temporal region. Cereb. Cortex, 3: 1074.
Macdonell, R. et al. (1991) Hemispheric threshold differences for motor evoked potentials produced by magnetic coil stimulation. Neurology, 41: 1441-1444.
Meyer, B. et al. (1995) Inhibitory and excitatory interhemispheric transfers between motor cortical areas in normal humans and patients with abnormalities of corpus callosum. Brain, 118: 429.
Meyer, J., Hata, T. and Imai, A. (1987) Clinical and experimental studies of diaschisis. In: J. Wood (Ed.), Cerebral Blood Flow. McGraw-Hill, 481-502.
Meyer, J., Obara, K. and Muramatsu, K. (1993) Diaschisis. Neurol. Res., 15: 362-366.
Ohyama, M., Senda, M. and Kitamura, S. et al. (1996) Role of the nondominant hemisphere and undamaged area during word repetition in poststroke aphasics. Stroke, 27: 897-903.
Papanicolaou, A., Moore, B., Levin, H. and Eisenberg, H. (1987) Evoked potential correlates of right hemisphere involvement in language recovery following stroke. Arch. Neurol., 44: 521-524.
Papanicolaou, A., Moore, B., Deutsch, G., Levin, H. and Eisenberg, H. (1988) Evidence for right-hemisphere involvement in recovery from aphasia. Arch. Neurol., 45: 1025-1029.
Peterson, S. et al. (1988) PET studies of the cortical anatomy of single-word processing. Nature, 331: 585-589.
Reggia, J., Ruppin, E. and Berndt, R. (Eds.) (1996) Neural Modeling of Brain and Cognitive Disorders. World Scientific.
Reggia, J., Goodall, S. and Shkuro, Y. (1998) Computational studies of lateralization of phoneme sequence generation. Neural Comput., 10: 1277-1297.
Reggia, J., Gittens, S. and Chhabra, J. (1999) Post-lesion lateralization shifts in a computational model of single-word reading. Laterality, in press.
Revett, K., Ruppin, E., Goodall, S. and Reggia, J. (1998) Spreading depression in focal ischemia: a computational study. J. Cereb. Blood Flow and Metab., 18: 998-1007.
Ringo, J., Doty, R., Demeter, S. and Simard, P. (1994) Time is of the essence: a conjecture that hemispheric specialization arises from interhemispheric conduction delay. Cereb. Cortex, 4: 331-343.
Sauerwein, H., Nolin, P. and Lassonde, M. (1993) Cognitive functioning in callosal agenesis. In: M. Lassonde and M. Jeeves (Eds.), Callosal Agenesis: A Natural Split Brain? New York: Plenum Press, 221-233.
Seymour, S., Reuter-Lorenz, P. and Gazzaniga, M. (1994) The disconnection syndrome: basic findings reaffirmed. Brain, 117: 104-115.
Scheibel, A. et al. (1985) Differentiating characteristics of the human speech cortex. In: D. Benson and E. Zaidel (Eds.), The Dual Brain. Guilford, 65-74.
Shevtsova, N. and Reggia, J. (1999) A neural network model of lateralization during letter identification. J. Cogn. Neuroscience, 11: 167-181.
Shkuro, Y., Glezer, M. and Reggia, J. (1999) Interhemispheric effects of simulated lesions in a neural model of single-word reading, submitted.
Silvestrini, M., Troisi, E., Matteis, M., Cupini, L. and Caltagirone, C. (1995) Involvement of the healthy hemisphere in recovery from aphasia and motor deficits in patients with cortical ischemic infarction. Neurology, 45: 1815-1820.
Silvestrini, M., Troisi, E., Matteis, M., Razzano, C. and Caltagirone, C. (1998) Correlations of flow velocity changes during mental activity and recovery from aphasia in ischemic stroke. Neurology, 50: 191-195.
Sober, S., Stark, J., Yamasaki, D. and Lytton, W. (1997) Receptive field changes after strokelike cortical ablation: a role for activation dynamics. J. Neurophysiol., 78: 3438-3443.
Sperry, R. (1982) Some effects of disconnecting the cerebral hemispheres. Science, 217: 1223-1226.
Springer, S. and Deutsch, G. (1993) Left Brain, Right Brain. New York: W. H. Freeman.
Sutton, G., Reggia, J., Armentrout, S. and D’Autrechy, C. (1994) Map reorganization as a competitive process. Neural Computation, 6: 1-13.
Toyama et al. (1969) Synaptic action of commissural impulses upon association efferent cells in cat visual cortex. Brain Res., 14: 518-20.
Tucker, D. and Williamson, P. (1984) Asymmetric neural control systems in human self-regulation. Psychol. Review, 91: 185-215.
Vargha-Khadem, F., Carr, L., Isaacs, E. et al. (1997) Onset of speech after left hemispherectomy in a nine-year-old boy. Brain, 120: 159-182.
Ward, J. and Hopkins, W. (1993) Primate Laterality. Springer-Verlag.
Witelson, S. and Pallie, W. (1973) Left hemisphere specialization for language in the newborn. Brain, 96: 641-646.
Yamaguchi, T., Kunimoto, M. and Pappata, S. et al. (1990) Effects of anterior corpus callosum section on cortical glucose utilization in baboons. Brain, 113: 937-951.
Zaidel, E. (1983) Disconnection syndrome as a model for laterality effects in the normal brain. In: J. Hellige (Ed.), Cerebral Hemispheric Asymmetry. Praeger, 95-151.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved
CHAPTER 14
Penumbral tissue damage following acute stroke: a computational investigation

Eytan Ruppin1,*, Kenneth Revett2, Elad Ofer3, Sharon Goodall4 and James A. Reggia5

1Departments of Computer Science and Physiology, Tel-Aviv University, Tel-Aviv 69978, Israel
2Department of Computer Science, University of Maryland, College Park, MD 20742, USA
3Department of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel
4Department of Computer Science, University of Maryland, College Park, MD 20742, USA
5Departments of Computer Science and Neurology, Institute of Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA

*Corresponding author.

Introduction: theories of tissue damage in acute stroke

Understanding the mechanisms underlying tissue damage in the ischemic penumbra is of paramount clinical importance: it may lead to new therapeutic measures that reduce evolving or progressing stroke and the resulting post-infarct debilitation (Fisher and Garcia, 1996b). While considerable progress has been made in this area during recent years (Iijima et al., 1992; Mies et al., 1993; Choi, 1994; Fisher and Garcia, 1996a), the mechanisms by which focal ischemia evolves into infarction and the factors which determine the ultimate extent of the infarct are still controversial (Pulsinelli, 1992; Hossman, 1994a, 1994b; Heiss and Graf, 1994). In this chapter, we present a computational study of possible mechanisms that may underlie the spread of ischemic damage from the infarct core to the penumbra (peri-infarct) area. Our main interest is in the primary mechanisms that play a causal role in the activation of a cascade of metabolic events that eventually leads to penumbra tissue death. Towards this goal, we shall present a set of theoretical predictions that, if tested experimentally, may help in delineating the pathological mechanisms that
play a leading role in this highly important pathological state.

There are currently two major hypotheses concerning the primary causal mechanism underlying penumbral tissue death during acute ischemic stroke. The leading theory is that ischemic (penumbral) damage is caused by the progression of cortical spreading depression (CSD) waves. CSD is a self-regenerating wave of reduced spontaneous electrical activity and biochemical changes that spreads across the cortex with a velocity of about 2-5 mm/min. It is characterized by transient reductions in EEG power, failure of neurons to respond to evoked potentials, negative extracellular DC potential shifts, and increased extracellular potassium and L-glutamate (Nedergaard and Hansen, 1993). Although occurring in the normally-perfused brain, CSD waves have also been found in ischemic tissue in various animal models of acute stroke, originating from the edge of the infarct core and progressing outwards into the penumbra (then also called ischemic depolarizations) (Nedergaard and Astrup, 1986; Gill et al., 1992; Iijima et al., 1992; Mies et al., 1993; Back et al., 1994; Heiss and Graf, 1994; Hossman, 1994a). The movement of CSD waves across the cortex may be viewed as a reaction-diffusion process involving potassium ions in the extracellular compartment (Lauritzen, 1994; Revett et al., 1998). The reaction component increases extracellular K+ levels by leakage from cells that have been necrotized or from intracellular K+ released by the depolarization of cells. The released K+ moves by diffusion out from the source in a radial fashion. Neurons and glial cells respond physiologically by increasing their Na/K-ATPase activity, which in turn reduces the levels of extracellular K+. If the cells are compromised metabolically, this energy expenditure can deplete the energy reserves of the cells, generating a transient mismatch between energy supplies and demand, resulting in transient episodes of tissue hypoxia and lactic acidosis (Mayevsky et al., 1982; Mayevsky and Weiss, 1991; Back et al., 1994; Gyngell et al., 1994). The ensuing gradual depletion of tissue ATP and high energy phosphate reserves following repetitive CSD waves may eventually lead to peri-infarct tissue death, depending on their frequency and duration (Gyngell et al., 1994).

The second hypothesis is that ischemic cell death is caused by neurotoxic mechanisms that occur independently of CSD waves, that is, in a non-wave-dependent manner. A variety of such mechanisms have been proposed, including the generation of oxygen free radicals and nitric oxide, and the recruitment of leukocytes and initiation of apoptosis (Fisher and Garcia, 1996b). The most prominent suggestion, however, has focused on the pathological role of glutamate excitotoxicity (GE), which may occur independently of the propagation of CSD waves. The main mechanisms that underlie high extracellular glutamate levels in the penumbra include depression of the glutamate uptake processes due to energy deficiency, reversal of transport systems and cellular lysis (see Wahl et al., 1994a, for a review). These mechanisms support the idea that GE is a self-propagating process whose pathological effects are not curtailed by diffusional restrictions (Choi, 1988; Garthwaite et al., 1992).
In accordance with this concept, the increased glutamate levels originating from dying tissue in the center of the infarct core spread outwards due to the action of a positive feedback process involving both glutamate diffusion in the extracellular space and ongoing tissue death. That is, the slow rate of glutamate diffusion outwards into the penumbra is
markedly enhanced by the additional release of glutamate from cortical tissue damaged by GE effects. Moreover, the long range spread of glutamate toxicity may be further enhanced by quasi-syncytial properties of the glial network, via which toxic metabolites can be transferred through gap junctions (Largo et al., 1996). During middle cerebral artery occlusion the levels of extracellular glutamate in the penumbra may increase 25-fold, and the magnitude of glutamate release during ischemia is positively correlated with infarct volume (Takagi et al., 1993). A similar correlation between blood and CSF levels of glutamate and infarct size was recently observed clinically in human patients (Castillo et al., 1996). Thus, in accordance with the second hypothesis, the diffusional spread of glutamate is significantly accelerated via self-propagation mechanisms, but it is not wave-dependent; the levels of glutamate rise markedly at the borders of the expanding region of high glutamate levels, and remain at increased levels for a long period thereafter. To prevent any possible confusion, it should be emphasized that glutamate excitotoxicity is likely to exert its pathogenic effects in vivo via both CSD and non-wave-dependent pathways. Our study is not about whether glutamate plays a pathogenic role in penumbral tissue damage, but solely about the fundamental question of whether the underlying causal mechanism is a transient CSD wave or a more persistent self-propagating GE process that is non-wave-dependent. The goal of our research has been to exploit the ability of computational models to tease apart and study the effects of CSD and non-wave-dependent pathways in isolation. Our ability to do so hinges upon the observation that CSD and non-wave-dependent mechanisms inherently differ in the nature of their propagation. The difference in the propagation pattern of CSD versus non-wave-dependent GE damage is marked.
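The reaction-diffusion view of extracellular potassium described earlier in this section can be made concrete with a minimal sketch of one explicit update step on a one-dimensional strip of tissue. The function name `potassium_step`, every parameter, and the functional forms of the release and uptake terms are illustrative assumptions chosen for simplicity; they are not the equations or parameter values of the model studied in this chapter, and the sketch is not tuned to reproduce realistic CSD waves.

```python
def potassium_step(k, dt, dx, diff, k_rest, pump_rate, release_rate,
                   threshold, k_max):
    """Advance extracellular K+ (one value per cell) by one explicit Euler step.

    All names and functional forms are assumptions made for illustration.
    The explicit scheme needs dt * diff / dx**2 <= 0.5 to remain stable.
    """
    n = len(k)
    new_k = []
    for i in range(n):
        # Reflecting (no-flux) boundaries: a missing neighbour mirrors the cell.
        left = k[i - 1] if i > 0 else k[i]
        right = k[i + 1] if i < n - 1 else k[i]
        # Diffusion: K+ spreads from high to low concentration.
        diffusion = diff * (left - 2.0 * k[i] + right) / dx ** 2
        # Reaction: cells pushed above threshold depolarize and release K+.
        release = release_rate * (k_max - k[i]) if k[i] > threshold else 0.0
        # Uptake: Na/K-ATPase activity pulls K+ back toward its resting level.
        uptake = pump_rate * (k[i] - k_rest)
        new_k.append(k[i] + dt * (diffusion + release - uptake))
    return new_k


# Hypothetical usage: resting tissue with elevated K+ next to the infarct core.
k = [3.0] * 50
k[0] = 40.0
for _ in range(100):
    k = potassium_step(k, dt=0.1, dx=1.0, diff=1.0, k_rest=3.0,
                       pump_rate=0.05, release_rate=0.5,
                       threshold=10.0, k_max=50.0)
```

The uptake term is where the metabolic cost discussed above would enter: in a fuller model, pump activity would draw on a finite energy reserve per cell, which this sketch deliberately omits.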
The CSD hypothesis assumes that tissue damage develops gradually due to the action of repetitive, relatively fast moving waves that sweep across the penumbra several times during the ischemic period. These repetitive waves have a common source, originating at the rim of the infarct zone and progressing outwards in a concentric manner. In contrast, the GE hypothesis assumes that tissue
damage develops due to the slow propagation of persistently elevated glutamate levels. Unlike in the CSD case, where after the passage of a wave potassium levels may return to almost normal levels, in the GE case the levels of glutamate in the regions traversed by glutamate remain highly increased for a prolonged period of time. As we shall show, in appropriate situations these inherently distinct propagating characteristics can result in significantly different patterns of ischemic damage. Which of these two competing hypotheses is likely to be the primary causal factor responsible for ischemic penumbral damage? On the one hand, there is growing evidence that CSD waves do indeed play an important role in the pathogenesis of penumbra tissue damage. First, the number of waves traversing the penumbra has been found to strongly correlate with the extent of peri-infarct damage (Mies et al., 1993). Second, recent diffusion-weighted magnetic resonance imaging studies have provided evidence that tissue damage propagation co-occurs with the passage of individual CSD waves (Takano et al., 1996). Moreover, the CSD hypothesis seems to provide a better explanation than the GE (i.e., non-wave-dependent) hypothesis for the therapeutic effects of NMDA antagonists in focal ischemia (Wahl et al., 1994a). On the other hand, however, there are several arguments that tend to support the non-wave-dependent GE hypothesis over the CSD one. As noted by Koroshetz (1996), the MRI findings of co-occurrence of tissue damage and CSD wave traversal may not reflect a real causal relationship between CSD waves and tissue damage, and may merely constitute an epiphenomenal manifestation of other underlying primary processes. That is, the CSD waves occur as a secondary indicator of the ongoing rapid bursts of tissue damage, and not as their cause. In addition, there is evidence for hypoxic neural damage in the absence of CSD depolarizations in hippocampal slices (Chen et al., 1996).
Finally, unequivocal measurements of ischemic CSDs in humans are still lacking, although there is increasing evidence that CSD does occur in humans (Woods et al., 1994; Mayevsky et al., 1995). Needless to say, it may also be the case that both mechanisms take part in causing ischemic damage.
In the latter case, their respective relative importance should be determined. The rest of this chapter is composed of two main parts. In the first, we review the work presented in (Revett et al., 1998), in which we have studied the role of CSD waves in ischemic stroke in isolation. CSD is probably regarded by most researchers in the field as the main possible ‘culprit’ behind penumbral damage, and hence we set out first to systematically investigate how various physiological parameters involved in the generation of CSD waves influence the amount of tissue damage. The results of our CSD modeling study provide support for the CSD hypothesis in stroke, but leave open the question of the causal mechanism of spreading tissue damage. The simulation experiments described in the second part of this chapter aim at better addressing this question. In this second part, summarizing work described in (Ruppin et al., 1999), we compare the CSD model with a non-wave-dependent model of self-propagating GE processes, and examine the spatial distribution of the damage that occurs, generating specific predictions for future experimental studies. Our results and current work in progress are summarized in the chapter’s Discussion section.
CSD in acute stroke: a detailed model
The model
The main problems one encounters when trying to study complex biological events like ischemic stroke are the richness of these phenomena and the incompleteness of relevant data. The large number of variables involved and their intricate and nonlinear dependencies pose considerable difficulties. When constructing a formal model of such complex systems, these problems are further augmented by implementation issues. As a model becomes increasingly realistic, more computational resources are required, and the difficulty of visualizing the multi-dimensional information and interpreting the model’s behavior increases dramatically. As a result, one needs to focus on a subset of key variables, and to determine convenient ways of tracing their temporal and spatial evolution. Below we provide a high-level overview of the mathematical model and its computational implementation. A more detailed technical description of the model and the parameter values used is given in (Revett et al., 1998). In the model, the relations and interactions between the main variables of interest are expressed as a set of coupled differential equations. Solving these equations numerically enables one to trace the model’s evolution in time, given initial conditions and parameter settings. The specific equations used are given in the Appendix. Both the spatial structure of the cerebral cortex and time are discretized. The cortex is represented as a two-dimensional, hexagonally tessellated array of elements, each of which represents a small volume of cortex. One cortical element is 0.125 mm in length, and one time step corresponds to 13 msec. As elaborated below, each cortical element i has its own value for extracellular potassium K_i, potassium reuptake R_i, metabolic stores M_i, partial impairment P_i, tissue intactness I_i, internal potassium stores Sk_i, cerebral blood flow F_i, extracellular glutamate G_i, intracellular glutamate Sg_i, extracellular calcium Ca_i, intracellular calcium Sc_i, and calcium reuptake Rc_i. All variables incorporated into the model are dimensionless and are calculated and presented on a scaled basis from 0 to 1, as quantitative data on the rates of change for many of the variables in the model are currently unavailable. There are numerous other variables and factors relevant to understanding the ischemic penumbra (e.g. reactive oxygen species, lactic acidosis, mean neuronal firing rates, etc.). Some of these factors are currently being added to our model, but they are not considered in this chapter, which focuses solely on the most fundamental aspects of ischemic CSD waves.
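The grid and time resolution quoted above (0.125 mm per cortical element, 13 msec per time step) fix the conversion between simulation units and physical units. The following is a minimal sketch of those conversions; the function names are hypothetical and are not taken from the authors' C program.

```python
# Resolution quoted in the text; everything else here is an illustrative helper.
ELEMENT_MM = 0.125  # length of one cortical element, in mm
STEP_SEC = 0.013    # duration of one time step, in seconds (13 msec)


def elements_to_mm(n_elements):
    """Physical length spanned by a number of cortical elements."""
    return n_elements * ELEMENT_MM


def minutes_to_steps(minutes):
    """Number of time steps needed to cover a physical duration."""
    return round(minutes * 60.0 / STEP_SEC)


def velocity_to_elements_per_step(mm_per_min):
    """Convert a wave velocity in mm/min to cortical elements per time step."""
    mm_per_sec = mm_per_min / 60.0
    return mm_per_sec * STEP_SEC / ELEMENT_MM
```

For example, `minutes_to_steps(60)` gives the number of integration steps in one simulated hour, and `elements_to_mm(8)` gives the physical length of eight adjacent elements.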
The extracellular potassium concentration, K_i, is modeled as a reaction-diffusion process consisting of a diffusion term reflecting the passive movement of potassium ions along their spatial concentration gradient; a reaction term modeling processes that generate the sharp rise of extracellular potassium; a reuptake term representing the reuptake of extracellular potassium by neuronal and glial Na/K ATPase pumps; and an infusion term reflecting an external source of potassium infusion (in experiments simulating CSD wave generation in
normoxic tissue). The rate of potassium reuptake, R_i, is determined by two terms reflecting the functioning of Na/K pumps: a rate-increasing term proportional to the levels of extracellular potassium, tissue intactness, metabolic stores levels, and partial impairment, and a decay term that gradually brings Na/K ATPase pump activity back to its resting level when K_i values are restored. The levels of different metabolic factors that are important in defining tissue energy state (e.g. glucose and high energy phosphates) are combined into a single variable M_i, the metabolic stores level. Their supply is proportional to the blood flow rate (presumed to be normoglycemic and well oxygenated) and tissue intactness, and their consumption is governed by the basal metabolic rate and reuptake demands. Cerebral blood flow (F_i) to a given cortical unit is modulated by that unit’s metabolic stores. The magnitude of resting F_i varies linearly along the penumbra; cortical areas located closer to the infarct core have lower resting blood flow levels and a more subdued hemodynamic response. Tissue intactness (I_i) denotes the fraction of cellular components in a volume element that remains viable. Tissue damage occurs in a cumulative fashion whenever the energy stores of a cortical element fall below a threshold value, and the extent of the damage is proportional to the extent and duration of the energy shortage. The tissue sensitivity to damage is modulated by the energy stress which the tissue undergoes, reflecting a growing susceptibility to damage of metabolically compromised tissue. This increased susceptibility to damage, for which there is experimental support (Iijima et al., 1992; Hossmann, 1994a), is incorporated into the variable P_i, termed the partial impairment. The rate of potassium effusion into the extracellular compartment when infarction occurs is determined by the level of intracellular potassium stores, Sk_i.
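The cumulative damage rule described above can be illustrated with a single update step. This is a sketch under stated assumptions, not the authors' equation: the linear form, the `(1 + impairment)` sensitivity factor, and the parameter names and default values (`stores_threshold`, `c_damage`) are all hypothetical.

```python
def damage_step(intactness, stores, impairment, dt,
                stores_threshold=0.3, c_damage=1.0):
    """Advance tissue intactness by one time step of length dt.

    Damage accrues only while metabolic stores are below the threshold,
    in proportion to the size of the shortage and to dt (its duration).
    Partial impairment is assumed to scale the sensitivity to damage.
    """
    shortage = max(stores_threshold - stores, 0.0)
    loss = c_damage * (1.0 + impairment) * shortage * dt
    # Intactness is a fraction of viable tissue and cannot go below zero.
    return max(intactness - loss, 0.0)
```

Because the loss is proportional to both the shortage and the step length, applying the step repeatedly accumulates damage linearly in the duration of the energy shortage until intactness reaches zero.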
Changes in the level of extracellular glutamate, G_i, are modeled by incorporating the flux pathways that are known to exist and are relevant to CSD. These include Na/glutamate carriers, amino acid carriers, vesicular release of glutamate, and release resulting from impaired metabolic status. The changes in intracellular glutamate Sg_i are opposite those of the extracellular changes. Extracellular calcium levels
Ca_i are regulated via voltage-gated calcium channels, ligand-gated channels (e.g., NMDA), the activity of the Ca-ATPase, the Na/Ca exchanger and leakage from metabolic impairment. The changes in intracellular calcium Sc_i are the opposite of the changes that occur in the extracellular levels. Finally, the Ca-ATPase pump, Rc_i, is incorporated into the model, analogous to that for Na/K-ATPase. The model described above has been implemented as a C program that runs under Unix. An Euler numerical integration method is used, and key results have been verified using a fourth-order Runge-Kutta method. Values of the model variables are typically recorded at three distinct sites on the simulated cortex: the center of the lesion, 2.125 mm away from the center of the initial lesion core (half way across the penumbra), and in an intact region, 5 mm from the center of the initial lesion core (or the center of the K+ stimulus in the normoxic simulations).
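The practice of checking an Euler integration against a fourth-order Runge-Kutta method can be illustrated on a simple test problem. The sketch below is in Python rather than the authors' C, and the exponential decay equation is a stand-in chosen because its exact solution is known; it is not one of the model equations.

```python
import math


def euler_step(f, t, y, dt):
    """One explicit Euler step for dy/dt = f(t, y)."""
    return y + dt * f(t, y)


def rk4_step(f, t, y, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt * k1 / 2)
    k3 = f(t + dt / 2, y + dt * k2 / 2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(step, f, y0, dt, n_steps):
    """Apply a one-step method n_steps times, starting from y0 at t = 0."""
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, dt)
        t += dt
    return y


def decay(t, y, rate=1.0):
    """Stand-in dynamics: exponential relaxation toward zero."""
    return -rate * y


euler_result = integrate(euler_step, decay, 1.0, 0.1, 10)
rk4_result = integrate(rk4_step, decay, 1.0, 0.1, 10)
exact = math.exp(-1.0)
```

With the same step size, the Runge-Kutta result lies much closer to the exact value than the Euler result, which is why agreement between the two methods is a useful sanity check on an Euler-based simulation.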
Results
Two kinds of numerical experiments were performed: normoxic and ischemic simulations of CSD waves. The normoxic simulation produces CSD waves similar to those that have been reported to occur under conditions of normal cerebral blood flow in the cortex of a variety of experimental animals. The model equations and parameter values were selected via an extensive empirical trial-and-error search process so that the model successfully reproduces the fundamental properties of normoxic CSD waves, their velocity and duration, and the accompanying negligible damage. In order to simulate focal ischemia within the model, cerebral blood flow is clamped to zero within an area designated as the initial lesion core, and a cerebral blood flow gradient is established which defines the penumbra that surrounds the lesion. The parameters for the ischemic simulations are identical to those used in the normoxic model (except for the creation of a penumbra). Figure 1 presents a recording from the central cortical element within the ischemic core, beginning at time zero, at which blood flow F suddenly goes to zero. As the metabolic stores M drop below a threshold due to basal metabolic consumption, tissue intactness subsequently falls as infarction occurs (this occurs fairly rapidly in the case illustrated here; more gradual or delayed infarction can be modeled by parameter variations). As intactness drops, extracellular potassium and glutamate levels rise due to release from the internal stores of the affected cortical elements. Potassium reuptake and calcium extrusion, which require metabolic energy, are negligible in the infarct core because of the reduced metabolic energy stores. Glutamate is released by both energy reduction and the drop in tissue intactness and increases more than 100-fold, consistent with literature reports (Wahl et al., 1994b). Extracellular calcium drops and remains depressed until cellular lysis occurs, which rapidly releases Ca2+ back into the extracellular space. Figure 2 presents the output of the same simulation recorded near the mid-penumbra (approximately 2 mm from the center of the initial lesion). During this specific simulation, five CSD waves were recorded at a third, more distant electrode located in the surrounding, normally perfused cortex. However, as seen in Fig. 2, at the location of the mid-penumbral electrode, the fourth and fifth waves partially coalesced. (Note that even after CSD waves stop at one penumbral location, such as that in Fig. 2, they may continue to be generated from the rim of the expanding infarct in more peripheral regions.) The penumbra CSD waves exhibit the same basic profile found in normoxic simulations, with elevated extracellular potassium and glutamate and decreased extracellular calcium. The velocity of the CSD waves remains similar to that found in the normoxic simulations, although there was a trend towards a decrease (from 4.8 to 4.3 mm/min).
Their duration exhibited a 13% increase (from 2.0 to 2.3 min) between the first and last wave measured at the same location, consistent with experimental literature reports (Hansen and Mutch, 1984; Nedergaard and Astrup, 1986; Nedergaard and Hansen, 1993; Back et al., 1995). Cerebral blood flow (F) rises by 55% during the first wave in response to a 62% decrease in M. With each passing wave, the drop in M increased and the rate of recovery between CSD waves decreased, without a proportional increase in F. Intracellular calcium levels increased dramatically, consistent
Fig. 1. Recordings from the infarct center when cerebral blood flow is clamped to zero. Variables plotted are K (extracellular potassium), R (potassium reuptake rate), I (intactness), M (metabolic stores level), F (cerebral blood flow), P (partial impairment), Ca (extracellular calcium) and G (extracellular glutamate).
with literature reports (Marrannes et al., 1988; Lauritzen, 1994; Hossmann, 1994b). When intactness approaches zero, calcium returns to the extracellular space as membrane integrity is lost. Glutamate levels rise in synchrony with the CSD waves as in the normoxic case, but in a bi-phasic manner, due to vesicular and metabolic release. When infarction occurs (I decreases), additional glutamate is released as cellular membranes are destroyed. Figure 3 plots the infarct area over time (since the model is two-dimensional, we refer to infarct area instead of infarct volume) as it evolves during four simulations with a variable number of CSD waves (2-5). These results capture some important
relationships described in the literature. It has been reported that there is a very strong linear correlation between the number of CSD waves and the size of the final infarct (Mies et al., 1993). However, using diffusion-weighted imaging to map out damage as it occurs over time, it was recently shown that the infarct volume progresses in a less than linear fashion (Reith et al., 1995; Takano et al., 1996). Our model captures both the linear relation between the number of waves and the final infarct area (across experiments, r = 0.978), as well as the non-linear relationship between the number of waves and the amount of evolving damage (damage over time as seen in Fig. 3). The pattern of infarct progression which is coupled to the passage of
Fig. 2. Recordings from mid-penumbra. The variables and figure axes are as in Fig. 1; the horizontal axis shows time in hours after MCAO (0 to 1.4).
CSD waves is consistent with several recent reports that support the hypothesis that CSD waves increase the infarct volume resulting from ischemic stroke (Mies et al., 1993; Back et al., 1996; Takano et al., 1996). We also investigated the effects of penumbra and initial lesion sizes on the number of CSD waves generated and the size of the final infarct. Holding the total area of ischemia (initial lesion core plus penumbra) constant, the relationship between initial lesion core area and final infarct area is roughly linear and highly correlated. The number of CSD waves was reduced with an increase in the size of the initial core lesion from a maximum of four to a minimum of two CSD waves. The final infarct size was linearly proportional to the width of the
penumbra, holding the initial lesion core area fixed. In a separate experiment, the initial lesion core size was fixed and the surrounding penumbra was varied in its radius. As the penumbra size (radius) increased, the number of CSD waves increased from a minimum of two to a maximum of seven CSD waves. In the preceding two experiments, and indeed for all others reported in this chapter, the final infarct area progressed to a location within the penumbra with a similar 'critical' cerebral blood flow value of about 35% of control values. This is a testable prediction of the model under conditions with essentially linear blood flow gradients in the penumbra. The model exhibits an inverse relationship between infarct area and mid-penumbra cerebral
Fig. 3. Cumulative infarct area versus time (in hours after MCAO) from onset of ischemia for four different simulations yielding 2 to 5 CSD waves.
blood flow (r = -0.835). By varying the blood flow gradient linearly and taking mid-penumbra flow as essentially representing mean penumbra blood flow, the model successfully reproduces experimental results indicating an inverse relationship between the mean cerebral blood flow and final infarct volume (Takagi et al., 1993). Simulations are also consistent with data indicating that there is an inverse relationship between final infarct area and the activity of Na/K-ATPase (Williams et al., 1994). Interestingly, the total expenditure of metabolic energy is actually reduced when Na/K-ATPase activity is increased, because of the reduction in the duration and the number of CSD waves. Thus, surprisingly, enhancing Na/K-ATPase activity may actually reduce metabolic load and tissue damage in the penumbra, suggesting a possible therapeutic avenue for future studies. As shown, our CSD model can successfully reproduce the general form and properties of CSD waves in the ischemic penumbra, and the typical profiles of blood flow response and increasing metabolic shortage observed in animal models of ischemic CSD. In summary, the model displays the strong linear correlation found between the number of CSD waves traversing the penumbra and final infarct size. It shows how this overall linear correlation is observed while the damage observed in each individual acute stroke experiment progresses sub-linearly over time. It also successfully
reproduces the experimental dependency of final infarct size on mean cerebral blood flow and Na/K-ATPase levels (the latter represented by R in the model). Overall, these findings lend some support to the hypothesis that CSD waves play an important causal role in the progression of penumbra tissue damage resulting from acute stroke. However, the possibility that CSD waves are just a side-effect of tissue damage progression, that is, an epiphenomenon of another underlying pathogenic process, obviously remains. To study this question of causality further, a direct comparison between the CSD and GE hypotheses is called for. This comparison is described in the next section.
The causal role of CSD vs. GE mechanisms in stroke
We now consider and compare two models, the CSD model and a GE model based on the glutamate excitotoxicity hypothesis.
The two models
The CSD model used here is a reduced version of that described above (and in detail in (Revett et al., 1998)) which focuses on the potassium ion as the main carrier of the waves. Each cortical element i has its own value for extracellular potassium K_i, reuptake R_i, metabolic stores M_i, tissue intactness I_i, internal potassium stores S_i, and cerebral blood flow F_i. The basic structure of the GE model and its computer implementation are similar to the CSD model. A detailed and formal description of both models is given in the Appendix, along with parameter values, and is based on a multi-dimensional set of non-linear differential equations that govern the temporal evolution of several key metabolic variables. As in the CSD model, in the GE model the cortex is represented as a discretized two-dimensional array of elements, each of which represents a small volume of cortex. The model’s parameters are tuned such that the overall rate of damage progression into the penumbra is similar to that observed in the CSD case, so that its temporal and spatial scaling remain basically similar to that found in the CSD case. As in the CSD case, each
cortical element i has its own value for reuptake R_i, metabolic stores M_i, tissue intactness I_i, and cerebral blood flow F_i. All variables are again dimensionless and are calculated and presented on a scaled basis from 0 to 1. The variables describing potassium dynamics in the CSD model (K_i and S_i) are replaced by variables describing the dynamics of glutamate and the corresponding excitotoxic damage. To model a self-propagating process involving a toxic metabolite, each cortical element i now has its own value of extracellular glutamate levels (G_i). As the cortical intactness level (I_i) decreases below a threshold level, glutamate is released from its intracellular stores (H_i). The propagation of tissue damage in the GE model occurs as a result of leakage of glutamate from dying, lesioned tissue. Due to the impoverished energy stores and reduced glutamate uptake in the penumbra, elevated extracellular glutamate levels diffuse locally, causing further excitotoxic damage in adjacent tissue cells. Thus, an outward gradual spread of self-propagating excitotoxic tissue damage is induced. These processes are modeled in a straightforward manner (see Appendix). Extracellular glutamate levels G_i increase as a result of glutamate effusion from intracellular stores in damaged cortical tissue (the self-propagation term), and propagate through diffusion along the cortical grid. In turn, the rise of extracellular glutamate levels above a certain threshold level causes excitotoxic tissue damage. Finally, glutamate leakage out of damaged cells drains the intracellular stores (H_i), and ceases when these stores are completely depleted. In both the CSD and GE simulations, the cortical lesion is modeled by setting the tissue intactness at the center of the model cortex to zero (i.e. complete tissue death), according to the geometrical form of the simulated lesion (see Section 3.2). In addition, a penumbra gradient of cerebral blood flow is simulated.
This gradient is modeled by linearly raising the blood flow from zero at the cortical elements in the center of the cortical grid to normal resting levels at the outermost elements of the penumbra. Two scenarios are simulated by the two models. In the first, CSD waves are assumed to play the main pathogenic role; the dying tissue releases large amounts of potassium ions into the extracellular space, which in turn generate CSD waves that propagate outwards along the cortex, and cause further tissue damage. In the second, a GE diffusion-based self-propagating process is initiated, due to a large rise of extracellular glutamate that causes excitotoxic damage. To trace and visualize the behavior of the model, we generated two-dimensional maps of the values of specific variables on the simulated cortex at a given moment. These maps can be displayed in a consecutive manner, providing an animated trace of the temporal evolution of these variables.
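The linear penumbral blood flow gradient described above can be written as a function of distance from the center of the grid. The sketch below is illustrative only: the function name, the `penumbra_radius` parameter, and the normalization of normal resting flow to 1 are assumptions, not values taken from the authors' code.

```python
def resting_flow(distance, penumbra_radius, normal_flow=1.0):
    """Resting blood flow at a given distance from the center of the grid.

    Flow rises linearly from zero at the center to the normal resting level
    at the outer edge of the penumbra, and stays normal beyond it.
    """
    if distance >= penumbra_radius:
        return normal_flow
    return normal_flow * max(distance, 0.0) / penumbra_radius
```

Under this form, an element halfway between the center and the edge of the penumbra receives half of the normal resting flow.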
Results
With many infarct lesions (e.g. roughly circular ones, like those observed in standard ischemic occlusion experiments), the resulting patterns of penumbra tissue damage are almost identical for both the GE and CSD cases. We therefore focused our research efforts on cortical lesions of two particular shapes that potentially promised to generate distinct patterns of damage, depending on the underlying pathogenic mechanism (CSD versus GE): V-shaped lesions whose vertex lies in the center of the cortex (see Fig. 4), and ring-shaped lesions that surround the center of the penumbra at some distance from it. The length of each side of the V-lesion is 27 cortical units, and the width of each side is 10 units. The ring lesions were placed at a distance of 20 units from the center of the penumbra, and their width was 4 units. The simulations are stopped after damage has ceased to spread. This typically occurred after a computational time frame that is the equivalent of almost two hours in experimental animals. The width of the penumbra in all simulations is 70 cortical units. In the first experiment, we superimposed a V-shaped infarct lesion on the center of an ischemic region with reduced blood supply (see Fig. 4). In the lesion region tissue intactness is clamped to zero to model cell death from mechanisms other than reduced blood supply (e.g. resulting from photo-infarction, or from topical application of toxic metabolites). The lesion is placed in the penumbra region such that the tip of the V-lesion
lies in the center of the penumbra, which is also the center of each subfigure in Fig. 4. The combined pathology of tissue death and reduced blood supply (but not tissue death alone) results in a cascade of metabolic events that gradually causes damage to spread into the penumbra beyond its initial V-shaped borders. The distinct patterns of tissue damage observed in these simulations following CSD or GE pathogenic mechanisms are shown in Fig. 4. In the CSD case (a-d) the damage initially progresses more in the outward (left) region of the V lesion, but eventually it adopts an almost symmetrical shape around the center of the ischemic region. The damage in the GE case (e-h) follows the contours of the V lesion as it progresses. The parameters of the GE model are chosen such that the rate of damage progression in the GE case is of the same order as in the CSD case. In both cases, the rate of damage spread and its time course (a few mm in about two to three hours) are of the same order of magnitude as those observed experimentally. This computational experiment builds on the different generation and propagation characteristics of CSD versus GE mechanisms: CSD waves are generated when extracellular potassium levels increase above a threshold value. Since these levels are highest in the center of the penumbra (as reuptake there is the lowest), waves are first generated in this region, which hence becomes the driving, ‘pace-maker’ focus of the waves. The damage is caused in an accumulative manner, only after a few waves traverse the damaged region. These factors result in the formation of a fairly symmetric, circular propagating front around the lesion. GE spread, in contrast, depends on glutamate propagation that is generated simultaneously all along the lesion borders, and hence more faithfully preserves the shape of the initial lesion (e-h).
Blocking tissue damage spread: a ring-shaped lesion
In the second experiment, we impose a centrally symmetric gradient of penumbral blood supply, similar to that of the first experiment. But instead of
Fig. 4. Propagation of penumbral tissue damage coupled with a V-shaped cortical lesion. The CSD case (a-d) is plotted in the left column, and the GE case (e-h) in the right column. The four figures in each column show the level of damage at four ‘snapshots’: initially, after 40 min of lesioning and reduced blood supply, after 80 min, and finally, after 2 h, where the damage stabilizes and does not progress further. The figures’ gray scale denotes tissue viability levels; white denotes completely viable tissue, while completely black denotes dead cortical tissue. Each of the snapshot figures displays the whole simulated cortex (with a width of 100 units), so that the simulated penumbra (with a width of 70 units) captures two-thirds of the distance outwards from the center of the display. This relation between the penumbra and simulated cortex dimensions also holds true in the rest of the figures shown in this chapter.
simulating a V-shaped lesion, we now simulate a ring-like lesion of dead cortical tissue by clamping the tissue intactness to zero in the ring lesion surrounding the ischemic core. As shown in Fig. 5, while the damage spreads beyond the ring lesion in the GE case, its outward spread into the cortical tissue is completely blocked if CSD waves are the dominant pathogenic mechanism. This result shows that the propagation of CSD waves (and the accompanying damage) is more sensitive to dying tissue barriers than the spread of GE-related damage. The critical difference in the dynamics of CSD versus GE mechanisms underlying this phenomenon is that CSD propagation heavily depends on the existence of viable cortical tissue; only viable tissue is capable of generating the ‘reactivation’ positive feedback loop required for the sharp rise of extracellular potassium that generates the CSD wave. In contrast, GE propagation arises mainly from the leakage of glutamate to the extracellular space from the dying lesion area, which initiates a self-propagating loop of glutamate diffusion and tissue damage. That is, in contrast to CSD propagation, GE spread is essentially independent of tissue intactness and can more readily traverse damaged tissue regions.
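The self-propagating loop of glutamate leakage, diffusion and damage described above can be illustrated with a one-dimensional toy simulation. This is a deliberately simplified sketch, not the authors' GE model: it omits glutamate reuptake, metabolic stores and blood flow, uses a line of cells instead of a hexagonal grid, and all parameter names and values are arbitrary assumptions rather than entries from Table 1.

```python
def simulate_ge(n_cells=11, n_steps=1000, stores=1.0,
                c_release=0.5, c_diffuse=0.2, c_damage=1.0,
                g_threshold=0.05, i_threshold=0.5):
    """Toy 1-D glutamate excitotoxicity loop; returns final intactness."""
    center = n_cells // 2
    intact = [1.0] * n_cells    # tissue intactness I
    intact[center] = 0.0        # lesion: intactness set to zero
    store = [stores] * n_cells  # intracellular glutamate stores H
    glu = [0.0] * n_cells       # extracellular glutamate G

    for _ in range(n_steps):
        # 1. Damaged elements leak glutamate from their intracellular stores.
        for i in range(n_cells):
            if intact[i] < i_threshold:
                leak = c_release * store[i]
                store[i] -= leak
                glu[i] += leak
        # 2. Extracellular glutamate diffuses (no-flux boundaries).
        new_glu = glu[:]
        for i in range(n_cells):
            left = glu[i - 1] if i > 0 else glu[i]
            right = glu[i + 1] if i < n_cells - 1 else glu[i]
            new_glu[i] = glu[i] + c_diffuse * (left + right - 2.0 * glu[i])
        glu = new_glu
        # 3. Glutamate above threshold causes cumulative excitotoxic damage.
        for i in range(n_cells):
            excess = max(glu[i] - g_threshold, 0.0)
            intact[i] = max(intact[i] - c_damage * excess, 0.0)
    return intact
```

In this toy, damage starts in the elements adjacent to the lesion and spreads outward as each newly damaged element releases its own stores. Setting `stores=0.0` removes the leakage source, and the damage then stays confined to the initial lesion.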
Discussion
The causal role of CSD waves and glutamate excitotoxicity in ischemic damage is currently an open and important question (Obrenovitch and Urenak, 1997). Our work makes explicit predictions about the patterns of ischemic damage following acute focal ischemic stroke. The experimental study of these predictions may be feasible in the future (e.g. by using photo-infarction techniques (Domann et al., 1993; Montalbetti et al., 1995)), and may provide a clue to answering this question, by revealing the dominant pathogenic mechanisms underlying penumbral tissue damage. While it is possible to think that one mechanism is the primary causal one, in reality, it may also be the case that both CSD and GE play a joint causal role in the pathogenesis of penumbra damage. How then can one decipher their relative importance? In this chapter, we present the first attempt to harness computational modeling to help advance an answer
to this quandary. Being limited to the ‘computational drawing board’, this kind of investigation cannot provide a definitive answer on its own. However, future experimental feedback addressing the predictions put forward here may help clarify the pathogenic factors underlying ischemic damage. Such experimental studies can describe the patterns of damage that are observed when lesioning experiments of the kind studied here are actually carried out in vivo. Since it is likely that in reality both mechanisms act in concert, at least to some degree, these patterns may well differ from those predicted in this chapter. In this case, one can then re-use the computational model and simulation tools described in this work (available via our website) to learn more about the relative weights of CSD and GE mechanisms. To this end, the two mechanisms simulated here can be combined into a unified model. Re-running the combined model and searching for the best fit with the experimentally observed data can reveal their correct weighting. While the exact patterns observed depend on the specific parameter values used and may differ in reality from those shown here, we believe that our results will remain basically valid, since they arise from inherent differences in the propagation properties of CSD versus GE mechanisms. Our current work in this ongoing study of computational modeling of acute stroke is now focused again on studying the pathogenic role of the intriguing and rather unique phenomenon of cortical spreading depression waves. These waves have recently been reported to be involved in the pathogenesis of a broad spectrum of brain disorders including migraine, epilepsy and post-traumatic brain concussion, and hence their study gains further importance. As shown in the first part of this chapter, our CSD model successfully reproduced the experimentally found dependency of final infarct size on blood flow and ATPase levels.
We are now conducting a systematic and extensive parameter search to identify ‘critical’ factors that determine penumbral tissue damage in the hope of isolating promising target variables for further therapeutic intervention, the ultimate goal of computational modeling of brain disorders.
Fig. 5. Blockade of the spread of penumbral tissue damage by a ring-shaped lesion. The CSD case (a-d) is plotted in the left column, and GE case (e-h) in the right column. As in Fig. 4, the four figures in each column show the level of damage at four 'snapshots' at similar time intervals. Gray scale is similar to Fig. 4.
Appendix
Comparing GE and CSD damage - formal model description
In the following equations, variables are denoted by a single capital letter (e.g., $K$, $R$), constant parameters by subscripts to the variables' names (e.g., $K_{rest}$, $M_{\theta}$), and multiplicative constants begin with a lower case $c$ which is subscripted by two capital letters, both of which are usually names of variables. The first letter is the name of the variable whose equation contains this constant, and the second letter is usually the name of a variable that the constant modulates (e.g. $c_{KS}$ is the constant associated with the effect on the external potassium $K$ of the internal potassium stores $S$). Numerical values for all parameters used in the simulations presented in this chapter are listed in Table 1 at the end of this paper. Both the spatial structure of the cerebral cortex and time are discretized. The cortex is represented as a two-dimensional array of elements, each of which represents a small volume of cortex. A hexagonal tessellation of the cortex is assumed, so each element has six immediately adjacent neighbor elements.
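The hexagonal array and its six-neighbor structure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the axial indexing of the hexagonal cells, the function name `hex_laplacian`, and the choice to skip missing neighbors at the edges (a sealed-end boundary) are all assumptions made here. The operator returns the sum of neighbor-minus-center differences, which is the usual sign convention for a diffusion term.

```python
import numpy as np

# Axial-coordinate offsets of the six neighbours of a hexagonal cell.
# (Assumption: the source does not state how the hexagonal array is indexed.)
HEX_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def hex_laplacian(field):
    """Sum over in-bounds neighbours of (neighbour value - centre value)."""
    rows, cols = field.shape
    lap = np.zeros_like(field, dtype=float)
    for dq, dr in HEX_OFFSETS:
        for q in range(rows):
            for r in range(cols):
                nq, nr = q + dq, r + dr
                # Neighbours outside the array are skipped (no-flux edge).
                if 0 <= nq < rows and 0 <= nr < cols:
                    lap[q, r] += field[nq, nr] - field[q, r]
    return lap
```

Multiplying the result by a diffusion coefficient and a time step gives one explicit diffusion update; with this boundary choice the total amount of the diffusing quantity is conserved.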
A1: the CSD model
Each cortical element $i$ has its own value for extracellular potassium $K_i$, reuptake $R_i$, metabolic stores $M_i$, persistent impairment $P_i$, intactness $I_i$, internal potassium stores $S_i$, and cerebral blood flow $F_i$, governed by the equations below. The rate of change of extracellular potassium concentration $K_i(t)$ of element $i$ at time $t$ is governed by the reaction-diffusion equation
$$\frac{dK_i}{dt} = \ldots + c_{KS}(S_i - I_i)(K_{max} - K_i) - c_{KR} K_i R_i + c_{KD} \nabla^2 K_i \qquad (1)$$
where $c_{KK} < 0$; $K_{rest}$, $K_{\theta}$, $K_{max}$, $c_{KS}$, $c_{KD} > 0$; $K_{\ldots} \geq 0$ are constants. Initially, $K_i = K_{rest}$ for all elements. The first term, the reaction term, meets the following requirements: homeostatic maintenance of the resting extracellular potassium level $K_{rest}$ (approx. 3 mM in the cortex), a threshold $K_{\theta} > K_{rest}$ beyond which elevated $K_i$ triggers explosive subsequent growth in $K_i$, and a ceiling $K_{max} > K_{\theta}$ above which $K_i$ does not rise. As these dynamics require normal mechanisms operative in undamaged tissue, this term is multiplied by the level of cortical intactness, $I_i$. The second term represents the pathological leakage of intracellular potassium into the extracellular space in damaged, infarcted tissue. It is a function of the levels of intracellular potassium stores $S_i$, tissue intactness $I_i$, and extracellular potassium. The third term reflects reuptake of potassium. The fourth term represents diffusion of $K$ through the cortex, where $c_{KD}$ is the potassium diffusion coefficient and $\nabla^2$ is the Laplacian operator. In order to implement the discrete form of the Laplacian operator, we multiplied the extracellular $K_i$ level for element $i$ by the number of neighboring elements and subtracted from this value the sum of the extracellular $K$ values of these neighbors. The rate of potassium reuptake, $R_i$, reflecting the functioning of membrane-bound Na/K pumps, is modeled by
$$\frac{dR_i}{dt} = \ldots \qquad (2)$$
The first term reflects reuptake proportional to the levels of partial impairment, tissue intactness, and extracellular potassium, and the second is a decay term, which contains extracellular potassium-dependent and independent components. The initial value of $R_i$ is 0. The metabolic stores that determine the energy status of the tissue, such as glucose and the high-energy phosphate pool, are grouped together in a single variable $M_i$, and are governed by
$$\frac{dM_i}{dt} = \ldots \qquad (3)$$
where $c_{MF}$, $M_{rest}$, $c_{MR}$, $c_{MM} > 0$ are constants, $M_{rest}$ is the equilibrium level of metabolic stores, and $c_{MM}$ is the basal level of energy expenditure. Initially, $M_i$ starts at $M_{rest}$. The first term in equation (3) is a supply term, proportional to blood flow and tissue intactness levels, and inversely related to metabolic stores production rate. $P_i$ is the partial impairment of element $i$, and the addition of the factor $(P_0 - P_i)$ to the supply term incorporates the reduced ability of an impaired element to extract metabolic building blocks from the blood. The second term represents the metabolic load imposed by potassium reuptake and the basic metabolic rate. The partial impairment variable $P_i$ represents stresses on element $i$ that compromise its integrity, such as decreased pH, increase in intracellular Ca$^{2+}$, etc. When the metabolic stores $M_i$ drop below the partial impairment threshold ($M_{\theta}$), the partial impairment for cortical element $i$ increases in proportion to the magnitude of the difference between $M_{\theta}$ and $M_i$,
$$\frac{dP_i}{dt} = \ldots \qquad (4)$$
where $c_{PP} > 0$ is a constant. $P_i$ remains unchanged when $M_i > M_{\theta}$, and the initial value for $P_i$ is 0. Penumbra cerebral blood flow is regulated by
$$\frac{dF_i}{dt} = \ldots \qquad (5)$$
where $c_{FM}$, $M_{rest}$, $F_{max}$, $c_{FF} > 0$ are constants and $c_{FM} > c_{FF}$. $M_{rest}$ is the initial metabolic stores level, $F_{max}$ is the absolute ceiling for the blood flow rate and $F_{max}/2$ is the equilibrium rate of blood flow in normal tissue. Initially, $F_i$ is $F_{max}/2$. The first term in (5) represents the dependency of blood supply on the status of the metabolic stores and current blood flow levels. The second term self-regulates blood flow towards its basal rate. Variable $I_i$ is an indicator of the intactness of element $i$, identifying the fraction of the element that is undamaged. Damage, which is assumed irreversible, occurs only below a critical metabolic threshold level $M_i < (P_0 + P_i)$, and is proportional to this energy deficiency,
$$\frac{dI_i}{dt} = \ldots \qquad (6)$$
where $c_{II}$, $P_0 > 0$ are constants. $I_i$ remains unchanged when $M_i \geq (P_0 + P_i)$, and initially, $I_i$ is 1.0. This threshold for tissue intactness is directly dependent on the persistent impairment $P_i$. The leakage of potassium from the intracellular to the extracellular space occurs when the intactness for a given cortical element, $I_i$, begins to drop. This formulation is expressed in the following equation,
$$\frac{dS_i}{dt} = c_{SS}(I_i - S_i) \qquad (7)$$
where $c_{SS} > 0$ is a constant. Initially, $S_i$ starts at 1.0. Boundary conditions are set along the edges of the simulated cortex. The results are similar with leakage and sealed-end boundary conditions. In both cases, when CSD waves reach the edge of the simulated cortex, they dissipate immediately. This is consistent with numerous reports (Grafstein, 1956a, 1956b; Bures et al., 1974) indicating that CSD waves do not propagate across sulci. Since we have not included sulci in the present version of the model, these boundary conditions provide a functional equivalence to sulci with respect to CSD wave propagation properties. The real-world equivalence of the duration and velocity of CSD waves can be calculated in a straightforward manner (Revett et al., 1998), yielding 0.125 mm as the cell scale and 13 msec ($2.167 \times 10^{-4}$ min) as the tick scale, the program time-step.
A2: the GE model
The basic structure of the model (and its computer implementation) is similar to the CSD model. The cortex is represented as a two-dimensional array of elements, and both the spatial structure of the cerebral cortex and time are discretized. Boundary conditions are set as before. To model a self-propagating process involving a toxic metabolite, each cortical element $i$ now has its own value of extracellular glutamate levels ($G_i$). As the cortical intactness level ($I_i$) decreases below a threshold level, glutamate is released from its intracellular stores ($H_i$). Equation (1) governing extracellular potassium levels is now replaced by an equation governing extracellular glutamate levels. Instead of this former reaction-diffusion equation describing potassium-carried CSD waves we now have the following self-propagation equation
$$\frac{dG_i}{dt} = \ldots \qquad (8)$$
Extracellular glutamate stores are hence composed of three terms. The first denotes the leakage of glutamate out of damaged tissue (the self-propagation term), the second denotes glutamate diffusion and the third term denotes its uptake. The rate of glutamate reuptake, $T_i$, is modeled by an equation analogous to equation (2)
$$\frac{dT_i}{dt} = \ldots \qquad (9)$$
And, similarly to equation (3), the metabolic stores are governed by
$$\frac{dM_i}{dt} = \ldots - (c_{MR} T_i + c_{MM}) \ldots \qquad (10)$$
The equations for partial impairment and blood flow remain identical to the corresponding ones in the CSD model. However, tissue damage is now caused not directly as a function of metabolic deficiency (previously described in Equation (6)), but due to the rise of extracellular glutamate levels beyond an excitotoxicity threshold $G_{\theta}$,
$$\frac{dI_i}{dt} = \ldots$$
where $c_{II} > 0$ is a constant. In parallel to the CSD case, $I_i$ remains unchanged when $G_i \leq G_{\theta}$, and initially, $I_i$ is 1.0. Finally, glutamate leakage out of damaged tissue cells drains the intracellular stores, in accordance with
$$\frac{dH_i}{dt} = \ldots$$
where $c_{HH} > 0$ is a constant. Initially, $H_i$ starts at 1.0.
TABLE 1
Parameter values for both CSD and GE models
Parameter
0.03 0.20 1.0 0.005 0.0035 -0.3 1.00 0.0006 0.00033 1.00 0.50 0.30
1.00
CSD Simulations
1.0 0.0006 0.00033 1.00 0.50 0.30 1.00 0.0667 0.00025 0.30 $1 + 2M_{rest} c_{MM}/c_{MF}$ 0.00015 1.00 5.00 0.45 0.5 0.0001
0.0667 0.00025 0.30 $1 + 2M_{rest} c_{MM}/c_{MF}$ 0.00015 1.00 5.00 0.45 0.5 0.0001 0.03 ($K_{rest}$) 0.00 1.0 ($M_{rest}$) 0.0
0.00 1.0 ($M_{rest}$) 0.0
Graded [0-0.5]
Graded [0-0.5]
1.0 1.0 0.001
GE Simulations
1.0 1.0 0.001 0.0035 0.005 1.0 0.03 0.04 0.0001
Acknowledgments
Supported by NINDS Award NS29414, Israeli Ministry of Health Grant No. 01350931, and BSF Award 96-00238.
References
Back, T., Kohno, K. and Hossmann, K.A. (1994) Cortical negative DC deflections following middle cerebral artery occlusion and KCl-induced spreading depression: Effect on blood flow, tissue oxygenation, and electroencephalogram. J. Cereb. Blood Flow Metab., 14: 12-19.
Back, T., Zhao, W. and Ginsberg, M.D. (1995) Three-dimensional image analysis of brain glucose metabolism-blood flow uncoupling and its electrophysiological correlates in the acute ischemic penumbra following middle cerebral artery occlusion. J. Cereb. Blood Flow Metab., 15: 566-577.
Back, T., Ginsberg, M.D., Dietrich, W.D. and Watson, B.D. (1996) Induction of spreading depression in the ischemic hemisphere following an experimental middle cerebral artery occlusion: effect on infarct morphology. J. Cereb. Blood Flow Metab., 16(2): 202-213.
Bures, J., Buresova, O. and Krivanek, J. (1974) The Mechanism and Application of Leao's Spreading Depression of Electroencephalographic Activity, chapter 1. Academic Press, Inc.
Castillo, J., Davalos, A., Naveiro, J. and Noya, M. (1996) Neuroexcitatory amino acids and their relation to infarct size and neurological deficit in ischemic stroke. Stroke, 27(6): 1060-65.
Chen, Z.F., Schottler, F., Arlinghaus, L., Kassel, N.F. and Lee, K.S. (1996) Hypoxic neuronal damage in the absence of hypoxic depolarization in rat hippocampal slices: the role of glutamate receptors. Brain Res., 708: 82-92.
Choi, D.W. (1988) Glutamate neurotoxicity and diseases of the nervous system. Neuron, 1: 623-634.
Choi, D.W. (1994) Glutamate receptors and the induction of excitotoxic neuronal death. Prog. Brain Res., 100: 47-51.
Domann, R., Hagemann, G., Kraemer, M., Freund, H.J. and Witte, O.W. (1993) Electrophysiological changes in the surrounding brain tissue of photochemically induced cortical infarcts in the rat. Neurosci. Lett., 155: 69-72.
Fisher, M. and Garcia, J. (1996a) Evolving stroke and the ischemic penumbra. Neurology, 47: 884-888, October.
Fisher, M. and Garcia, J.H. (1996b) Evolving stroke and the ischemic penumbra. Neurology, 47: 884-888.
Garthwaite, G., Williams, G.D. and Garthwaite, J. (1992) Glutamate toxicity: An experimental and theoretical analysis. Europ. J. Neurosci., 4: 353-360.
Gill, R., Andine, P., Hillered, L., Person, L. and Hagberg, H. (1992) The effect of MK-801 on cortical spreading depression in the penumbral zone following focal ischemia in the rat. J. Cereb. Blood Flow Metab., 12: 371-379.
Grafstein, B. (1956a) Locus of propagation of spreading cortical depression. J. Neurophysiol., 19: 308-315.
Grafstein, B. (1956b) Mechanism of spreading cortical depression. J. Neurophysiol., 19: 154-171.
Gyngell, M., Back, T. and Hoehn-Berlage, M. (1994) Transient cell depolarization after permanent middle cerebral occlusion: an observation by diffusion-weighted MRI and localized 1H-MRS. Magn. Reson. Med., 31: 337-341.
Hansen, A.J. and Mutch, W.A.C. (1984) Water and ion fluxes in cerebral ischemia. In: B.K. Siesjo (Ed.), Cerebral Ischemia. Excerpta Medica, 1984, pp. 121-130.
Heiss, W.D. and Graf, R. (1994) The ischemic penumbra. Curr. Opin. Neurol., 7: 11-19.
Hossmann, K.A. (1994a) Viability thresholds and the penumbra of focal ischemia. Ann. Neurol., 36: 557-565.
Hossmann, K.A. (1994b) Glutamate-mediated injury in focal cerebral ischemia: The excitotoxin hypothesis revised. Brain Pathol., 4: 23-36.
Iijima, T., Mies, G. and Hossmann, K.A. (1992) Repeated negative deflections in rat cortex following middle cerebral artery occlusion are abolished by MK-801. Effect on volume of ischemic injury. J. Cereb. Blood Flow Metab., 12: 727-733.
Koroshetz, W.J. (1996) Imaging stroke in progress. Ann. Neurol., 39(3): 283-284.
Largo, C., Cuevas, P., Somjen, G.G., del Rio, R.M. and Herreras, O. (1996) The effect of depressing glial function in rat brain in situ on ion homeostasis, synaptic transmission, and neuron survival. J. Neurosci., 16(3): 1219-1229.
Lauritzen, M. (1994) Pathophysiology of the migraine aura: The spreading depression theory. Brain, 117: 199-210.
Marrannes, R., Willems, R., De Prins, E. and Wauquier, A. (1988) Evidence for a role of the N-methyl-D-aspartate (NMDA) receptor in cortical spreading depression in the rat. Brain Res., 457: 226-240.
Mayevsky, A. and Weiss, H.R. (1991) Cerebral blood flow and oxygen consumption in cortical spreading depression. J. Cereb. Blood Flow Metab., 11: 829-836.
Mayevsky, A., Zarchin, N. and Friedli, C.M. (1982) Factors affecting the oxygen balance in the awake cerebral cortex exposed to spreading depression. Brain Res., 236: 93-105.
Mayevsky, A., Doron, A., Manor, T., Meilin, S., Salame, K. and Ouknine, G.E. (1995) Repetitive cortical spreading depression cycle development in the human brain: a multiparametric monitoring approach. J. Cereb. Blood Flow Metab., 15(1): 534.
Mies, G., Iijima, T. and Hossmann, K.A. (1993) Correlation between peri-infarct DC shifts and ischaemic neuronal damage in rat. NeuroReport, 4: 709-711.
Montalbetti, L., Rozza, A., Rizzo, V., Favalli, L., Scavini, C., Lanza, E., Savoldi, F., Racagni, G. and Scelsi, R. (1995) Aminoacid recovery via microdialysis and photoinduced focal cerebral ischemia in brain cortex of rats. Neurosci. Lett., 192: 153-156.
Nedergaard, M. and Astrup, J. (1986) Infarct rim: effect of hyperglycemia on direct current potential and 3-deoxyglucose phosphorylation. J. Cereb. Blood Flow Metab., 6: 607-615.
Nedergaard, M. and Hansen, A.J. (1993) Characterization of cortical depolarizations evoked in focal cerebral ischemia. J. Cereb. Blood Flow Metab., 13: 568-574.
Obrenovitch, T.P. and Urenak, J. (1997) Altered glutamatergic transmission in neurological disorders: From high extracellular glutamate to excessive synaptic efficacy. Prog. Neurobiol., 51: 39-87.
Pulsinelli, W. (1992) Pathophysiology of acute ischaemic stroke. The Lancet, 339: 533-536.
Reith, W., Hasegawa, Y., Latour, L.L., Dardzinski, B.J., Sotak, C.H. and Fisher, M. (1995) Multislice diffusion mapping for 3-d evolution of cerebral ischemia in a rat stroke model. Neurology, 45: 172-177.
Revett, K., Ruppin, E., Goodall, S. and Reggia, J.A. (1998) Spreading depression in focal ischemia: A computational study. J. Cereb. Blood Flow Metab., 18(9): 998-1007.
Ruppin, E., Ofer, E., Reggia, J., Revett, K. and Goodall, S. (1999) Pathogenic mechanisms in ischemic damage: A computational study. Comput. Biol. Med., 29(1): 39-59.
Takagi, K., Ginsberg, M.D., Globus, M.Y., Dietrich, W.D., Martinez, E., Kraydieh, S. and Busto, R. (1993) Changes in amino acid neurotransmitters and cerebral blood flow in the ischemic penumbral region following middle cerebral artery occlusion in the rat: correlation with histopathology. J. Cereb. Blood Flow Metab., 13: 575-585.
Takano, K., Latour, L.L., Formato, J.E., Carano, R.A.D., Helmer, K.G., Hasegawa, Y., Sotak, C. and Fisher, M. (1996) The role of spreading depression in focal ischemia evaluated by diffusion mapping. Ann. Neurol., 39: 308-318.
Wahl, F., Obrenovitch, T.P., Hardy, A.M., Plotkine, M., Boulu, R. and Symon, L. (1994a) Extracellular glutamate during focal cerebral ischaemia in rats: time course and calcium dependency. J. Neurochem., 63: 1003-1011.
Wahl, F., Obrenovitch, T.P., Hardy, A.M., Plotkine, M., Boulu, R. and Symon, L. (1994b) Extracellular glutamate during focal cerebral ischemia in rats: time course and calcium dependency. J. Neurochem., 63(3): 1003-1011.
Williams, G.D., Towfighi, G.D. and Smith, M.B. (1994) Cerebral energy metabolism during hypoxia-ischemia correlates with brain damage: a 31P NMR study in unanesthetized immature rats. Neurosci. Lett., 170: 31-34.
Woods, R.P., Iacoboni, M. and Mazziotta, J.C. (1994) Bilateral spreading cerebral hypoperfusion during spontaneous migraine headache. N. Engl. J. Med., 331: 1689-1692.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved
CHAPTER 15
The gating functions of the basal ganglia in movement control
Jose L. Contreras-Vidal*
Arizona State University, Tempe, AZ 85287-0404, USA
*Corresponding author. Fax: +(602) 965-8108; email: [email protected]. Now at University of Maryland, Department of Kinesiology, College Park, MD 20742-2611, USA.
Introduction
The basal ganglia are adaptive subcortical circuits whose outputs terminate in thalamic regions that in turn have access to a broad region of the frontal lobe of the cerebral cortex. This connectivity allows the basal ganglia to participate in the learning, selection, planning, initiation and execution of willed actions (see Fig. 1). The basal ganglia appear to act by opening normally closed gates via selective phasic removal of tonic inhibition. This selective opening of thalamic pathways to cortical areas occurs in relation to expected or current task requirements, allowing real-time reconfiguration of movement control pathways (White, 1989; Contreras-Vidal and Stelmach, 1995; Bullock, 1998).
[Figure 1 labels: Frontal Cortex; Sensory; Motor; Motor Thalamus; Basal Ganglia]
Fig. 1. Multiple sensory-motor flows can be phasically selected, modulated or interrupted via segregated basal ganglia-thalamo-cortical loops. The filled circle represents tonic inhibition that is phasically reduced to allow selective thalamocortical activation. Cortical areas, in turn, can also influence thalamic responses via cortico-thalamic pathways. Adapted with permission from Bullock (1998).
For example, during handwriting, cortical circuits associated with the control of multiple degrees of freedom (df) must be selected, switched on and off, or gated to allow for smooth transitions among motor sequence components. In this regard, Jenkins et al. (1994), using brain imaging techniques, reported reduced activity of visual processing areas during learning of sequences of keypresses by trial and error during auditory cueing, which suggested that selective attention involves depressing of cell activity in modalities that are not relevant for the task. In this experiment, the putamen, which is the main input stage to the basal ganglia, was equally active during sequence learning and retrieval. Furthermore, prefrontal cortex, lateral premotor cortex, and parietal association cortex were significantly activated during new learning, while the supplementary motor area (SMA) was only activated during performance of the prelearned sequence. But when subjects were required to select grasping movements cued by arbitrary stimuli, Grafton et al. (1998) found that dorsal premotor cortex and superior parietal cortex were highly active. Moreover, Boecker et al. (1998) showed complexity-correlated regional cerebral blood flow (rCBF) increases in anterior globus pallidus and in the contralateral rostral SMA (pre-SMA), whereas the increase in rCBF in the SMA proper was not modulated by the task complexity, suggesting a motor execution role. Interestingly, the basal ganglia and their cortical areas (accessed via the thalamus) were activated whenever a movement selection or novel response was required. Overall, this suggests that the basal ganglia organize cortical circuits involved in sequence planning, selection, initiation and execution of motor programs through selective gating of thalamo-cortical pathways.
On the other hand, cortico-thalamic projections in the primate can also participate in the selection process by modulating the activity of thalamic neurons (Fig. 1). Recent data suggest that the cortico-thalamic pathways cannot be considered to simply reciprocate the thalamo-cortical projection (Darian-Smith et al., 1996), and instead may subserve cortical-cortical communication or convey information to other thalamo-cortical circuits (Sherman and Guillery, 1996) through different cortico-thalamic circuitry, which appears to vary systematically in different cortical areas (Darian-Smith et al., 1996; Yeterian and Pandya, 1994). In this paper, cortico-striatal and cortico-thalamic circuits are studied in relation to functionally segregated basal ganglia-thalamo-cortical circuits involved in movement planning and control (Alexander et al., 1986). It is hypothesized that the basal ganglia work by opening normally closed gates via selective phasic removal of tonic inhibition in thalamo-cortical circuits involved in different aspects of motor behavior.
Thus, unlike alternative models of movement control in normal and Parkinson's disease (PD) movement (e.g., DeLong and Wichmann, 1993), it is proposed that phasic disinhibition of the motor thalamus is critical for movement control. It is shown through computer simulations of a network model of basal ganglia-thalamo-cortical dynamics during handwriting production that dopamine depletion in PD causes abnormally small and short phasic pallido-thalamic gating signals that result in akinesia, bradykinesia and hypometria. Based on these results and on new experimental evidence, the current pathophysiological model of PD is criticized.
Network model of basal ganglia-thalamo-cortico-thalamic dynamics
Figure 2 depicts a schematic diagram of the hypothesized network model of basal ganglia-thalamo-cortical dynamics, including the cortico-striatal and cortico-thalamic circuits. In this model, based on the loop model of Contreras-Vidal and Stelmach (1995), it is shown how multiple pallidal-thalamo-cortical circuits interact during the performance of a complex sequential task such as handwriting.
[Figure 2 labels: Thalamo-cortical feedforward inhibition; area 4; Competitive network; Kinematics; Cortico-thalamic feedforward inhibition; area 6; Segregated networks; Inhibitory; Excitatory]
Fig. 2. Network model of basal ganglia-thalamocortical dynamics, including cortical projections to the striatum and thalamus. See text for detailed description of model components and functions. Keys: D1, D2, dopamine receptor subtypes; DV, Difference Vector; GPe, GPi, external and internal segments of the globus pallidus; IN, inhibitory cortical interneuron; SNc, SNr, substantia nigra pars compacta and pars reticulata, respectively; PPV, Present Position Vector; STN, subthalamic nucleus; TC, thalamocortical neuron; TPV, Target Position Vector; TR, thalamic reticular inhibitory neuron or inhibitory thalamic intrinsic neuron; VL, ventrolateral thalamus. Inhibitory projections are GABAergic (gamma-aminobutyric acid), whereas excitatory projections are GLUtamatergic.
The highest level in the motor hierarchy defines the movement plan (e.g., the desired movement sequence). Motor sequence planning or programming in handwriting appears to be carried out by premotor cortex, including SMA or area 6, which is known to be associated with complex finger movements (Roland et al., 1980; Jenkins et al., 1994). Recently, Mushiake and Strick (1995) found that neurons in dorsal GPi, which are known to project to SMA, are involved in the processing of sequential movements. Moreover, the basal ganglia receive a major input from SMA (Kunzle, 1975; Alexander and Crutcher, 1990). In the model of Fig. 2, the thalamo-cortical projection to SMA triggers the read-in of the next motor program component (e.g., Target Position Vector or TPV, also in area 6) coding for a single stroke within the handwriting sequence. It also triggers the initiation of basal ganglia gating signals that control movement onset and speed for each joint or df (Contreras-Vidal and Stelmach, 1995; Contreras-Vidal et al., 1998). Each TPV accounts for the three joints modeled during handwriting, namely, finger flexion/extension responsible for up-down pen movements, and wrist supination/pronation and wrist flexion/extension, which are responsible for small horizontal movements and the left-to-right progression during handwriting, respectively (Bullock et al., 1993). The order and timing of the TPV motor commands, delivered to cortical and subcortical circuits, determine the curvature of the stroke. Each TPV is compared with the Present Position Vector (PPV) representing the current position of
the fingers and wrist. Thus, the resulting Difference Vector (DV) continuously computes the desired stroke amplitude and direction. The output of the DV is integrated at the PPV stage to generate a point-to-point movement or stroke (i.e., the forward kinematics). The trajectory generation network component of the present model is known as the Vector Integration To Endpoint or VITE module (Bullock and Grossberg, 1988). The VITE dynamics are, however, modulated by pallido-thalamic gating signals generated by the basal ganglia model.
In Fig. 2, cortico-striatal projections from premotor areas to the putamen of the basal ganglia are shown. Within the putamen, competitive interactions via surround inhibition (Parent and Hazrati, 1995) perform contrast enhancement and noise suppression of cortical inputs. Striatal output projections form distinct parallel pathways within the basal ganglia. These projections have been termed the direct and indirect pathways because of their effect on neurons of the internal segment of the globus pallidus (GPi): activation of one class of striatal neurons results in inhibition of GPi cells through the direct pathway, whereas activation of a different class of striatal neurons results in the activation of GPi neurons via the double inhibitory indirect pathway through the external segment of the pallidum (GPe) and the subthalamic nucleus or STN (Fig. 2). In fact, the main output of the basal ganglia, namely GPi, may work as a normally closed gate that can be transiently opened by phasically activating the direct pathway (PUT → GPi) and closed by activating the indirect pathway (PUT → GPe → STN → GPi) or by disinhibition via the PUT → GPi → GPe pathway. Opening the gate through inhibition of GPi causes disinhibition of the motor thalamus (VL), which activates the same cortical areas that produced the striatal activation in the first place.
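The TPV, PPV and DV stages with a multiplicative gate can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the chapter's simulation code: the function name `simulate_vite`, the parameter values, the forward-Euler integration and the linearly growing gate are chosen here for illustration, and the signed DV stands in for the opponent (agonist/antagonist) channels of the published VITE model.

```python
import numpy as np


def simulate_vite(tpv, ppv0, gate, alpha=30.0, dt=0.001, steps=2000):
    """Integrate a gated difference vector toward the target position.

    tpv, ppv0 : target and initial position vectors
    gate      : function of time returning the (hypothetical) gating signal
    alpha     : rate at which the DV tracks TPV - PPV (illustrative value)
    """
    tpv = np.asarray(tpv, dtype=float)
    ppv = np.asarray(ppv0, dtype=float).copy()
    dv = np.zeros_like(ppv)
    speed = np.empty(steps)
    for k in range(steps):
        g = gate(k * dt)
        # DV continuously tracks the difference between target and position.
        dv += dt * alpha * (-dv + tpv - ppv)
        # The gate scales the DV before it is integrated at the PPV stage.
        velocity = g * dv
        ppv += dt * velocity
        speed[k] = np.linalg.norm(velocity)
    return ppv, speed
```

Calling `simulate_vite([5.0, 0.0], [0.0, 0.0], lambda t: 5.0 * t)` moves the position to the target with a single-peaked speed profile; halving the slope of the gate lowers the peak speed, which is the sense in which a weaker gating signal slows the movement in this sketch.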
The opponent anatomical differentiation of the direct and indirect pathways is matched by an opponent neurochemical differentiation such that the neurotransmitter γ-aminobutyric acid (GABA) coexists with the neuropeptides Substance P (SP) and dynorphin (DYN) in the direct pathway, whereas GABA coexists with enkephalin (ENK) in the indirect pathway (Gerfen et al., 1991, 1992).
Moreover, most striatal cells projecting to GPi express the D-1 dopamine (DA) receptor, whereas those projecting to GPe express mainly the D-2 DA receptor (Fink, 1993). Thus, dopamine-dependent neurochemical modulation of neuronal activity in the direct and indirect pathways (Gerfen et al., 1991; Nisembaum et al., 1994) appears to subserve an additional level of opponent processing within the basal ganglia circuits. It is generally agreed that modulation of striatal neuropeptides by striatal dopamine causes opposing effects on the magnitude and timing of neuropeptide expression (Gerfen et al., 1991; Nisembaum et al., 1994; Contreras-Vidal et al., 1998). Moreover, it is known that dopamine depletion in PD causes opposite effects on neuropeptide and dopamine receptor expression such that SP and D1 receptors in the direct pathway are down-regulated, whereas ENK and D2 receptors in the indirect pathway are up-regulated (Gerfen et al., 1991). These changes are consistent with the hyperactivity of GPi and STN neurons and the hypoactivity of GPe and VL observed after dopamine depletion (Contreras-Vidal et al., 1998). Recent experimental data suggest the existence of mutual inhibition between pallidal segments (Tremblay and Filion, 1989; Hazrati et al., 1990). In this regard, both the striatum and the pallidum can be represented as opponent circuits (this particular type of opponent circuit has been termed a gated dipole by Grossberg (1984); see also Contreras-Vidal and Stelmach (1995)), which maintain a delicate balance of signal strengths between direct and indirect circuit pathways. The GPi neurons project to ventrolateral thalamus (VL), which in turn gates cortical circuits during movement generation. Two such pathways are represented in Fig. 2, one terminating in the PPV stage (related to primary motor cortex or area 4) and one terminating in the sequence planning stage previously described (in SMA or area 6).
In this regard, Hoover and Strick (1993) have shown spatially separate transsynaptic viral labelling in the internal segment of the globus pallidus (GPi) after injections of herpes simplex virus type 1 into primary motor cortex (M1, or area 4), SMA, and ventral premotor (PM) cortex, indicating discrete basal ganglia output channels. Furthermore, injections into the hand area of M1
suggest that the pallidothalamic projection to M1 is responsible for the control of distal movements (see also Schell and Strick, 1984). The selection of cortical neurons by thalamic input appears to involve GABAergic inhibitory neurons (White, 1989). In our model, thalamocortical pathways projecting to M1 activate in parallel pyramidal cells and inhibitory interneurons (INs). The GABAergic IN acts as an inhibitory feedforward mechanism on pyramidal cells (see right inset in Fig. 2). This system has the effect of favoring phasic activity and removing tonic excitation from cortical neurons. Thalamic neurons in the model are also gated by premotor cortical input via direct connections, and/or through the thalamic inhibitory interneurons or thalamic reticular nucleus. In Fig. 2, two parallel cortico-thalamic pathways are simulated, namely, the projection from area 4 (i.e., DV stage in the model) and from area 6 (i.e., model SMA containing the motor programs). DV-related pathways activate thalamic neurons whenever pallidal afferents phasically disinhibit the underlying thalamic nuclei. On the other hand, cortical inputs from area 6 phasically inhibit thalamic activity via GABAergic intrinsic or reticular neurons forming a feedforward inhibitory pathway. Reticular neurons also appear to contact adjacent nuclei of the dorsal thalamus. Moreover, it is known that fine corticothalamic axons ramify densely not only in the reciprocating thalamo-cortical regions, but also appear to send many short branches to extensive parts of adjacent thalamic regions (Darian-Smith et al., 1996). Thus, subcortical gating of thalamo-cortical circuits originating from both reciprocating and adjacent thalamic nuclei can be interrupted through activation of cortico-thalamic afferents. In the next section, simulations of the model depicted in Fig. 2 are shown for normal and PD movement control during point-to-point hand movements and during a handwriting task.
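How a feedforward inhibitory interneuron can favor phasic activity and remove tonic excitation may be illustrated with a toy rate model. Everything in this sketch is an assumption made for illustration (the function name, the time constant, the unit inhibitory weight, and the premise that the interneuron path responds more slowly than the direct excitatory path); it is not taken from the model equations in the Appendix.

```python
def feedforward_inhibition(inputs, tau=0.05, weight=1.0, dt=0.001):
    """Pyramidal-cell output under delayed feedforward inhibition.

    inputs : sequence of thalamic input values, one per time step
    tau    : time constant of the inhibitory interneuron (assumed, seconds)
    weight : strength of inhibition onto the pyramidal cell (assumed)
    """
    inhibition = 0.0
    output = []
    for x in inputs:
        # Pyramidal cell: direct excitation minus interneuron inhibition,
        # rectified at zero.
        output.append(max(0.0, x - weight * inhibition))
        # Interneuron: low-pass filtered copy of the same thalamic input.
        inhibition += dt * (x - inhibition) / tau
    return output
```

For a sustained step input the output starts at the full input value and then decays toward zero as the inhibition catches up, so only the onset of the thalamic signal is passed on.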
Handwriting movements provide a suitable window to study basal ganglia-thalamo-cortical dynamics because they exhibit both parallel and serial combinations of tightly scheduled movement components. Mathematical equations and additional biological details can be found in the Appendix, and in the journal articles (Contreras-Vidal and Stelmach, 1995; Contreras-Vidal et al., 1998).
Results
Contreras-Vidal and Stelmach (1995) have shown how a neural network with opponent interactions at the anatomical, neurophysiological and neurochemical levels, such as the basal ganglia, can generate graded gating signals that can activate motor cortical areas by phasic disinhibition of the thalamo-cortical pathways serving complex sequential motor tasks. It was also shown that simulated dopamine depletion, as in PD, caused an imbalance in signal strength between the direct and indirect pathways. This disrupted the balance between tonic and phasic activities and reduced the capability to coordinate finger and wrist motions during handwriting (Teulings et al., 1997; Contreras-Vidal et al., 1998). To test the hypothesis of basal ganglia control of movement by phasic signals, neural network simulations of 2D horizontal pointing movements of 5 cm amplitude were performed (Fig. 3). The simulations included movement under normal conditions (top panel), after simulated muscimol (a GABA agonist) injection into GPi (middle panel), and in PD conditions (bottom panel). The effect of muscimol injection in a small region of ventrolateral GPi was simulated by reducing the output of the model GPi neurons by 5%. The effect of PD was simulated by depleting the levels of striatal dopamine by 40% (Contreras-Vidal et al., 1998). The end-point velocity (A), simulated GPi activity (B), and simulated VLo activity (C) are shown in Fig. 3. In the normal condition, phasic inhibition of GPi resulted in phasic disinhibition of VLo, which caused movement initiation and execution of a horizontal hand movement (3 trials are shown in Fig. 3 for each network state, which were simulated by adding small amounts of uniformly distributed white noise to the motor programs or TPVs). This resulted in bell-shaped end-point velocities of a simulated hand-held pen.
However, a decrease in pallidal baseline activity (from 0.9 units to 0.3 units post-injection) and phasic responses (from 1.2 units to 0.5 units post-injection, compared to the top panel) caused by the muscimol injection resulted in increased tonic activity (from 0.04 to 0.3 units), and decreased phasic activity (from about 0.3 units to 0.15 units post-injection) in the simulated pallidal-receiving thalamic neuron as compared to the control movement prior to the lesion (top panel).
[Fig. 3 panels: (A) End-point velocity (cm/s), (B) Simulated GPi, (C) Simulated VLo; time axes in ms; pre- and post-injection traces.]
Fig. 3. Phasic disinhibition of motor thalamus is critical for movement control. Panels show simulated end-point velocity (A), simulated GPi (basal ganglia) activity (B), and simulated VLo (thalamic) activity (C) for horizontal movements of a hand-held pen. Three trials of 2D pointing movements were simulated from an initial position given by PPV = [0,0,0] to a target located to the right of the initial position (TPV = [5.3,1,0]). The simulations show that phasic inhibition of GPi cells results in phasic disinhibition of VLo, which allows movement initiation and execution. But in Parkinsonism, overactivity and reduced phasic inhibition of GPi cause abnormally low tonic and phasic levels of VLo activity, which in turn cause bradykinesia and hypometria. Simulated injections of muscimol (a GABA agonist) into GPi, which cause tonic disinhibition of VLo and reduced phasic disinhibition of VLo cells, also result in reduced end-point velocity. Therefore, phasic activity is critical for control of movement speed and amplitude.
The reduced phasic activation of thalamic cells due to inactivation of GPi cells is in agreement with the muscimol studies of Inase et al. (1996), who showed that after injection in GPi, pallidal-receiving thalamic neurons showed increased mean discharge rates and decreases in both phasic peak activity (see Fig. 11 of Inase et al., 1996) and movement velocity compared to pre-injection levels. The bottom panel of Fig. 3 shows that simulated striatal dopamine depletion in the model resulted in overactivity and reduced phasic responses of model GPi neurons. This imbalance in tonic and phasic GPi activities caused smaller and shorter-than-normal thalamic activities, and reduced end-point velocities. Specifically, DA depletion produced higher tonic levels of GPi activity (from 0.9 to 1.2 units), which resulted in the removal of the baseline activity of the simulated VL neuron. Moreover, the phasic activity of GPi was greatly reduced, resulting in smaller and shorter-than-normal phasic thalamic responses. Interestingly, both muscimol injections and striatal dopamine depletion resulted in reduced phasic VLo responses and reduced peak velocities. However, the impairment was more pronounced in PD, and may have resulted from different mechanisms: muscimol injection decreased the baseline level of GPi activity (perhaps by increasing striatal and pallidal GABAergic input activity), and therefore reduced the range of phasic modulation of VLo neurons in the presence of high tonic thalamic activity, whereas PD reduced both the tonic and the phasic VLo activity. This suggests that tonic VLo activity is not critical for movement, but that phasic activity is necessary for modulating movement. Overall, the data suggest that phasic striato-pallido gating signals are critical for movement control.
Figure 4 shows a simulation of handwriting production under normal and PD conditions based on the system schematized in Fig. 2.
The variables depicted include the simulated motor program (Panel A), striatal activations (Panel B), pallidal responses (Panels C-E), subthalamic (STN) and thalamic (VLo) activities (Panels F-H), corticothalamic responses (Panel I), joint velocities for each degree of freedom (Panel J), and the resulting spatial pattern (Panel K).
Panel A represents the motor sequence for generating the handwriting character e, which consisted of five subprograms, one per column. Each motor subprogram codes for the Target Position Vector for each of the 3 df in the hand model, that is, finger flexion/extension (TPV[1]), wrist supination/pronation (TPV[2]), and wrist flexion/extension (TPV[3]) (Bullock et al., 1993). It is assumed that the motor program is the same for the normal and the PD simulation. Read-in of each subprogram coincides with phasic activation of the striatal neurons as shown in Panel B. For example, the thin line at about 50 ms in Panel B coincides with the read-in of the second subprogram in Panel A (shaded column). Panel C depicts the pallidal activities for the finger joint, which corresponds to the solid bold line in Panel B. Specifically, striatal activation commanded by cortical inputs in Panel A produces a larger phasic inhibition of GPi and GPe neurons in the normal network as compared to the PD simulations. This is in agreement with observations by Girault et al. (1986, p. 115) who noted that “in the absence of DA the control of excitatory influences mediated by corticostriatal inputs and cholinergic interneurons is largely reduced. Furthermore, the process of lateral inhibition mediated by GABA neurons is not facilitated. In this situation striatal efferent neurons are easily activated by any excitatory input without selectivity.” This results from the disruption of the striatal and pallidal opponent processing, as neurons in the indirect pathways are overactivated after dopamine depletion in PD. Panels D and E represent the pallidal activities for the other 2 df in the hand simulation. Note that the striatal and pallidal activities depend not only on the timing and magnitude of the cortical inputs but also on the dopamine-dependent neuropeptides which modulate the striato-pallidal pathways (not shown, but see Contreras-Vidal et al., 1998).
Both striatal and pallidal neurons were embedded in networks with competitive interactions (see Methods in Contreras-Vidal et al., 1998; and in the Appendix). Panels F-H depict the simulated subthalamic (STN) activity, and the thalamic responses for each df. STN neurons were, in general, tonically active. Thalamic (VLo) responses were, however, phasic and were initiated by the timing of
TPV inputs in their respective channels. VLo responses were also observed in channels not purposely recruited by the most recent TPV. For example, in Panel H, which corresponds to the shaded column in Panel A, initial activity was also seen during read-in of the first TPV. This activity was, however, shut off by the read-in of the second TPV, which contained a non-zero target for channel 3 only. Note that the VL activities in the PD network were in general smaller in amplitude and shorter in duration with respect to the normal conditions. Panel I shows the cortico-thalamic feedback that reset the VL activity at times of peak velocities in any df (see Panel J). Cortical control of VL activity was assumed to involve GABAergic interneurons within the thalamus (see Fig. 2). Panel J represents the joint velocities for each df in the simulated hand (the thin trace corresponding to wrist flexion/extension has been normalized to fit the range of joint velocities for the other 2 joints). Finally, Panel K depicts the spatial pattern in the normal and PD simulations. The latter pattern shows small size (hypometria) and reduced peak joint velocities (see Panel J). In summary, pallidal responses in the PD simulation had a reduced range of modulation compared to the normal state. This resulted in smaller and shorter-than-normal VLo responses (Panels F-H), and sometimes absence of thalamic gating (i.e., the second gating pulse in the right of Panel G is missing in the PD simulation). The deficits in pallidal gating of thalamo-cortical pathways in the PD simulation resulted in small handwriting and bradykinesia (as seen in the lower peak joint velocities in Panel J).
Conclusions
The pathophysiological model of PD movement
The current pathophysiological model of basal ganglia functioning in PD points to a loss of striatal dopamine (DA) that causes an imbalance in the activity of the direct and indirect pathways of the basal ganglia (compare the normal basal ganglia in Fig. 5A with the Parkinsonian model depicted in Fig. 5B). Larger than normal activity in the indirect pathway results in overactivity of STN and GPi neurons. The motor deficits observed in PD, such as akinesia, bradykinesia and hypometria, are thought to result from abnormally large GPi inhibition of thalamocortical neurons influencing the frontal lobe (DeLong and Wichmann, 1993; Lozano et al., 1995). Abnormally low or high activity in the basal ganglia is consistent with experimental evidence. For example, in STN the spontaneous firing rate is significantly increased from 19 to 26 spikes/s (+36%) after MPTP-induced Parkinsonism in monkeys (Bergman et al., 1994). Moreover, the tonic activity of a subpopulation of GPi cells with 4-8 Hz periodic activity is increased from 53 to 76 spikes/s (+43%). Boraud et al. (1996) have reported an increase in tonic GPi activity from 80.5 to 106 spikes/s (+31%) after MPTP in monkeys, and Filion and Tremblay (1991) reported an increase in mean firing rate from 78 to 95 spikes/s (+20%) for GPi cells, and a decrease from 76 to 51 spikes/s (-33%) for GPe neurons after MPTP-induced Parkinsonism in monkeys. The pathophysiological model of PD suggests that lesions that decrease GPi activity should result in reversal of the cardinal signs of PD. However, stereotaxic pallidotomy (Fig. 5C) or subthalamotomy (Fig. 5D) in PD, although reducing rigidity and drug-induced dyskinesias, either shows no effect on motor function (Sutton et al., 1995; Samuel et al., 1998) or produces modest improvements in motor function only during the 'off' state (Lozano et al., 1995; Baron et al., 1996; Fazzini et al., 1997). These results clearly argue against the current pathophysiological model of PD. Moreover, they suggest that basal ganglia dysfunction cannot be seen only as a problem of abnormal levels of tonic neuronal activity (Krack et al., 1998).
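The percentage changes quoted above can be rechecked directly from the firing rates. The short Python sketch below does this arithmetic; the helper name `percent_change` and the dictionary layout are illustrative choices and are not part of the cited studies. Recomputed values agree with the quoted percentages to within about two percentage points, which suggests the quoted figures were rounded or truncated.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# (rate before, rate after, percentage quoted in the text); rates in spikes/s
reported = {
    "STN (Bergman et al., 1994)": (19.0, 26.0, 36.0),
    "GPi, 4-8 Hz subpopulation": (53.0, 76.0, 43.0),
    "GPi (Boraud et al., 1996)": (80.5, 106.0, 31.0),
    "GPi (Filion and Tremblay, 1991)": (78.0, 95.0, 20.0),
    "GPe (Filion and Tremblay, 1991)": (76.0, 51.0, -33.0),
}

for label, (before, after, quoted) in reported.items():
    change = percent_change(before, after)
    print(f"{label}: {change:+.1f}% (quoted {quoted:+.0f}%)")
```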
Fig. 4. Simulated spatial pattern, joint velocities, and cell activities during handwriting production under normal (left column) and Parkinsonian (right column) conditions based on the network model depicted in Fig. 2. See text for description of simulation.
Fig. 5. Pathophysiological model of movement control in Parkinson's disease (DeLong and Wichmann, 1993). The diagrams represent the hypothesized anatomical, neurophysiological and neurochemical relations in the (A) intact, (B) Parkinsonian cortico-basal ganglia skeletomotor circuit following damage of the dopaminergic neurons of the substantia nigra pars compacta, and after GPi pallidotomy (C) and after STN lesion (D) in the Parkinsonian network. Two different subtypes of striatal cells send projections to either GPi or GPe, but not to both. These two striatal populations are also differentiated by their neuropeptide content and the type of dopamine receptor (D1 and D2, respectively) mRNA they express (Gerfen et al., 1991). Pallidotomy removes the GPi inhibitory output to VL and GPe. Subthalamotomy removes excitatory projections to GPi/SNr. Key: D1, D2, dopamine receptor subtypes; GPe, GPi, external and internal segments of the globus pallidus; SNc, SNr, substantia nigra pars compacta and pars reticulata, respectively; STN, subthalamic nucleus; VL, ventrolateral thalamus. Adapted from DeLong and Wichmann (1993), Basal ganglia-thalamo-cortical circuits in Parkinsonian signs, Clinical Neuroscience, by permission of Wiley-Liss, Inc., a subsidiary of John Wiley & Sons, Inc.
The pathophysiological model of PD has at least three major pitfalls (see Parent and Cicchetti (1998)
for a review of additional limitations of the pathophysiological model of PD). First, Hoover and Strick (1993) have shown that the basal ganglia output consists of multiple segregated pallido-thalamo-cortical circuits. Therefore, the specific site of pallidal stimulation, inactivation, or lesion would likely affect various aspects of motor behavior differently. In this regard, Krack et al. (1998) showed opposing motor effects of pallidal stimulation in PD patients depending on the location of the stimulating electrode within GPi. Stimulation of ventral GPi resulted in marked improvements in rigidity and dyskinesia, whereas stimulation in the dorsal GPi caused the best effect on akinesia and produced dyskinesia. These data suggested a different topography and pathophysiology for explaining the different motor deficits in PD patients (Krack et al., 1998). The existence of at least two anatomo-functional systems within GPi has been confirmed independently by Bejjani et al. (1997). These data were to be expected given the segregation of pallidal outputs. In this regard, Hoover and Strick (1993) performed retrograde injections of a herpes simplex virus in ventral premotor cortex (PMv), supplementary motor area (SMA) and primary motor cortex (M1), which suggested that pallidal segregation followed a dorsal to ventral organization that is compatible with the data of Krack and colleagues. The different physiological properties and the separate somatotopic organizations characterizing different subnuclei of the motor thalamus also provide support for functional heterogeneity according to the pattern of projections of pallidal neurons (Vitek et al., 1994).
Second, the present simulations also suggest a less specific anatomical segregation of corticothalamic pathways (Darian-Smith et al., 1996), as descending axons appear to influence not only the reciprocal thalamo-cortical projection but also adjacent thalamo-cortical circuits. In particular, the pathophysiological model of PD movement is devoid of any influence of cortico-thalamic pathways on movement control. Third, single cell recordings and the present simulations suggest that phasic disinhibition of the thalamo-cortical neurons may be critical for the control of movement. Inase et al. (1996) showed that muscimol injection into monkey GPi resulted in increased tonic and decreased phasic levels of pallidal-receiving thalamic activity compared to pre-injection levels. Furthermore, the reduced phasic activity was accompanied by smaller peak velocities and sometimes hypometria during point-to-point arm movements. As pallidotomies can only reduce the abnormally large tonic inhibition of thalamocortical cells, without restoring phasic activity in PD, it would be expected that stereotaxic pallidotomy will have a minimal effect on motor function (e.g., Sutton et al., 1995; Samuel et al., 1998).
Limitations of the model
In the model depicted in Fig. 2, it is assumed that striatal and pallidal neuron organization can be described as a layer of principal neurons with mutual inhibitory connections. This assumption appears to be weakened by recent experimental data in the rat striatum suggesting that surround inhibition among striatal neurons is weak or nonexistent (Jaeger et al., 1994). Nevertheless, other studies support the notion of striatal surround inhibition (Rebec and Curtis, 1988; Plenz and Aertsen, 1996). Moreover, differences in near vs. distant striatal neighborhoods and/or in striatal organization among species may also be important in assessing surround inhibition in the primate striatum. For example, it has been shown that the ratio of projection neurons to striatal interneurons is significantly greater in the rat than in the primate, which suggests that interneuronal systems in the primate striatum may play a greater role in its integrative and functional organization (Graveland and DiFiglia, 1985). Also, the absence of DAergic inputs in striatal slices may affect the activity of striatal neurons and/or their local axon collaterals (and hence inhibitory interactions) through differential effects on the neuropeptides they express (Girault et al., 1986; Contreras-Vidal et al., 1998). Another limitation of the present model is the over-simplification of structures outside the basal ganglia such as the motor cortex and SMA. For example, in the present simulations the outputs of the local model circuits for the basal ganglia and the thalamus are linked to the trajectory formation VITE network in order to simulate the kinematics of point-to-point movements and handwriting. Recently, Bullock and colleagues have extended the VITE model to account for the functional roles of diverse cell types in movement-related areas of primate cortex (Bullock et al., 1998). This may allow an integrated interpretation of both cortical and subcortical circuits during the organization, planning and execution of movement. Finally, the present model does not account for learning of movement sequences, as it assumes that the motor program sequence has already been learned and stored in premotor areas. Recently, Matsuzaka et al. (1992) have reported the existence of a functionally segregated motor area rostral to the SMA. This rostral part was defined as pre-SMA, while the caudal part was redefined as SMA proper. Furthermore, Hikosaka et al. (1996) have shown through functional magnetic resonance imaging that the pre-SMA is involved in learning of sequential tasks as opposed to performance of sequential movements. Likewise, Miyashita et al. (1995) have shown that pre-SMA is critical for the learning of new sequences, while SMA is important for internal initiation of action, as in the present model. In addition, Miyachi et al. (1997), in experiments using reversible blockade, found that the anterior striatum is critical for the learning of new sequences, while the middle-posterior striatum is preferentially critical for the long-term storage or retrieval of memory for sequential movements. These data are supported by anatomical projections to anterior striatum from dorsolateral prefrontal cortex (PFCdl) and from pre-SMA (Parthasarathy et al., 1992), which are known to be involved in trial-and-error learning (Watanabe, 1996; Contreras-Vidal and Schultz, 1999) and learning of sequential tasks, respectively (Miyashita et al., 1995). In contrast, the middle-posterior striatum receives inputs from premotor areas including the SMA (Hoover and Strick, 1993), which are involved in the initiation and performance of internally generated (i.e., from memory) sequences, as was the case in the present simulations. Recently, Suri and Schultz (1998) have proposed a reinforcement-learning-based neural network model for learning sequential movements which could be applied to learn the handwriting motor sequences.
Appendix
Anatomical, neurophysiological, and neurochemical relations within the basal ganglia-thalamocortical circuits are modeled using nonlinear differential equations (Contreras-Vidal and Stelmach, 1995; Contreras-Vidal et al., 1998). A nonlinear differential equation is used to define the availability of dopamine ($D$) in the model's striatum as (Contreras-Vidal et al., 1998),

$$\frac{d}{dt} D = (1 - D) - a \times D \qquad (1)$$

where $a$ is a depletion rate. This equation states that the amount of neurotransmitter dopamine $D$ available in the system at any given time increases or accumulates at a rate $(1 - D)$, and decreases or depletes (wears off) at a rate given by $a$. Striatal neuropeptide levels are modulated by dopamine release in the basal ganglia. We model this modulation as a medium-term effect on the amount of neurotransmitter available. For the direct pathway,

$$\frac{d}{dt} T_k = b\,(B_{SP}(D) - T_k) - c\,S_k T_k \qquad (2)$$

and for the indirect pathway,

$$\frac{d}{dt} U_k = b\,(B_{ENK}(D) - U_k) - c\,S_k U_k \qquad (3)$$

where $T_k$ and $U_k$ represent the amount of neuropeptides available in the direct (GABA/SP) and indirect (GABA/ENK) pathways for motor channel $k$; and the constants $b = 2.0$ and $c = 8.0$ represent the accumulation and depletion rates, respectively. Equations 2 and 3 state that neurotransmitter levels are depleted at a rate given by $cS_kT_k$ (direct) or $cS_kU_k$ (indirect), respectively. Depletion of neuropeptides occurs provided the striato-pallidal pathway ($S_k$, defined below) is active. Conversely, the levels of peptide messenger RNA in the striatopallidal pathways increase or accumulate at a rate of $b(B_{SP}(D) - T_k)$ (direct) or $b(B_{ENK}(D) - U_k)$ (indirect), respectively. The upper and lower neurotransmitter levels are given by $B_{SP/ENK}(D)$ and zero, respectively. The maximum levels depend on the amount of striatal dopamine. We model the level of peptide mRNA in the Parkinsonian state as nonlinear functions of dopamine,

$$B_{SP}(D) = D^{\,\cdots} \qquad (4)$$

which represents a slower-than-linear increasing function of dopamine in the range [0,1], and

$$B_{ENK}(D) = 1.0 + e^{-\cdots} \qquad (5)$$

which represents a faster-than-linear decreasing function of dopamine in [0,1]. The dynamics of the simulated neuropeptide levels follow qualitatively the levels of enkephalin and substance P mRNA reported after striatal dopamine depletion (Nisenbaum et al., 1994). It is suggested that co-expressed neuropeptides act as switches that allow dopamine to have different effects on available transmitter pools in the direct and indirect pathways. Striatal ($S_k$) cell activity over time is modeled as,
$$\frac{d}{dt} S_k = \ldots \qquad (6)$$

where $A_s$ is the passive decay rate of neural activity; $B_s$ and $D_s$ are the upper and lower activity levels of neuron $S_k$, respectively; the corticostriatal input term represents the input from cortex; $f(x) = x^3/(0.25 + x^3)$ is a sigmoidal function of cell activity; and $I_{ACh}$ represents a baseline input from cholinergic interneurons. Equation 6 states that neuron $S_k$ is excited by corticostriatal inputs at rate $(B_s - S_k)$, and is inhibited by axon collaterals (e.g., through mutual inhibition) from neurons in neighboring motor channels at rate $(D_s + S_k)$. For the intact simulation, only striatal neurons within a channel have opponent interactions, whereas in the Parkinsonian simulation, all striatal neurons, even from different channels, interact via surround inhibition. The gamma-aminobutyric acid (GABA) activity from striatal cells is differentially modulated by current neuropeptide levels, which in turn differentially affect the activity of the two pallidal segments. For GPi ($G_k$) cells,

$$\frac{d}{dt} G_k = \ldots - (D_g + G_k)\Big(200\,S_k T_k + 1.5\,H_k + \sum_{i \neq k} G_i\Big) \qquad (7)$$

where $A_g$ is the passive decay rate; $B_g$ and $D_g$ are the upper and lower neuron activity levels, respectively; $J_k$ is an excitatory input from subthalamic nucleus (STN) that provides background excitation to GPi neurons; $f(G_k)$ represents positive feedback from neuron $G_k$ to itself; $S_kT_k$ denotes the coexistence of GABA/SP transmitter in the direct pathway; $H_k$ is an inhibitory input from GPe neurons (e.g., mutual inhibition); and $G_i$ represents surround inhibition from GPi neurons in adjacent channels (only for the PD simulation). For GPe ($H_k$) cells,

$$\frac{d}{dt} H_k = -A_h H_k + (B_h - H_k)\big(7.5\,J_k + f(H_k) + 1.5\big) - \ldots \qquad (8)$$

where $A_h$ is the passive decay rate; $B_h$ and $D_h$ are the upper and lower activity levels; $J_k$ is an excitatory input from STN; $f(H_k)$ represents positive feedback from neuron $H_k$ to itself; $S_kU_k$ denotes the coexistence of GABA/ENK neurotransmitters in the indirect pathway; $G_k$ is the inhibitory input from GPi neurons; and $H_i$ represents surround inhibition from GPe neurons in adjacent channels (only for the PD simulation). The indirect pathway projects to GPi through STN ($J_k$) cells,

$$\frac{d}{dt} J_k = -A_j J_k + (B_j - J_k)\big(I_j + f(J_k)\big) - 2.5\,(D_j + J_k)\,H_k \qquad (9)$$

where $A_j$ is the passive decay rate; $B_j$ and $D_j$ are the upper and lower bounds in neural activity, respectively; $I_j$ provides a level of tonic activity; and $H_k$ is the GABAergic input from GPe neurons. The basal ganglia output (GPi) sends GABAergic projections to the ventrolateral thalamus ($P_k$), which in turn projects to motor cortical areas,

$$\frac{d}{dt} P_k = \ldots\big[\ldots - (D_p + P_k)(G_k + I_{SMA})\big] \qquad (10)$$

where $A_p$ is the decay rate; $B_p$ and $D_p$ are the upper and lower cell activity levels; $I_{tonic}$ is a baseline tonic input; $G_k$ is the GABAergic input from GPi neurons; and the cortico-thalamic input $I_{SMA} = 20.0$ whenever any joint velocity $\theta[1]$, $\theta[2]$, or $\theta[3]$ reaches a peak value, and zero otherwise (see Panel J in Fig. 4). $I_{SMA}$ also triggers the read-in of the next TPV from sequence memory. The cortical GABAergic interneuron ($W_k$) is modeled as,
$$\frac{d}{dt} W_k = \ldots \qquad (11)$$

where $P_k$ is the thalamo-cortical neuron for channel $k$. The cortical trajectory formation network (VITE network) computes the desired kinematics of a point-to-point movement (e.g., stroke) as follows,

$$\frac{d}{dt} V_k = 60\,(-V_k + TPV_k - PPV_k) \qquad (12)$$

where $V_k$ is the difference vector, $TPV_k$ is the target position vector and $PPV_k$ is the present position vector for df $k$. The output of the difference vector cells is modulated by the pallido-thalamic activity ($P_k$). The outflow signal $P_k[V_k]^+$ is integrated at the present position vector ($PPV_k$) stage to obtain the pen trajectory,

$$\frac{d}{dt} PPV_k = 0.2\,(P_k - 1.2 \times W_k + I_r)\,[V_k]^+ \qquad (13)$$

where $[x]^+ = x$ if $x > 0$ or zero otherwise, $I_r$ is a small baseline input, $P_k$ is the thalamo-cortical projection, and $W_k$ provides feedforward inhibition associated with neuron $P_k$. The parameters used in the simulations were: (decay rates) AI=A, = 10.0, A,=A,,= 3.0, A , = 2.0; (upper activity levels) B, = 1.0, B , = 3.0, B,,= B, = B,, = 2.0; (lower activity levels) D, = 0.0, D, = D, = D, = D;, = 0.8; (transmitter rates) $a = 0.00005$, $b = 2.0$, $c = 8.0$; (tonic inputs) $I_{tonic} = 1.2$, $I_j = 2.0$, $I_{ACh} = 0.5$, $I_r = 0.01$, and (corticostriatal input) I,,= 25, which lasted for 10 msec. For the PD simulation, the level of dopamine was equal to 0.5. All the above parameters, except for the baseline inputs to the model thalamic and pallidal cells, were the same as in Contreras-Vidal et al. (1998). The above parameters were determined based on behavioral and physiological properties of each cell type. For example, the transmitter rates were set according to the pattern of motor behavior of a PD patient over a complete drug cycle (see Contreras-Vidal et al., 1998). Decay rates, lower and upper activity levels, and baseline inputs were chosen to shape the patterning of responses for each cell type (e.g., mean firing rates of pallidal neurons). The model is said to generate some measurable empirical result if, from specified initial conditions, the trajectories of the model's states evolve in a way that mimics the trajectories of measured behavioral and neural variables, e.g., joint motions, time histograms of homogeneous cell populations, and neurotransmitter dynamics. The handwriting simulations involved movements of the fingers and wrist, such that finger flexion/extension produces vertical displacement (TPV[1]), forearm supination/pronation or wrist rotation produces local horizontal displacement (TPV[2]), and wrist flexion/extension produces the left-to-right progression (TPV[3]).
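As a rough illustration of Eqs. 12 and 13, the following Python sketch integrates a single degree of freedom with the forward Euler method. The function name `simulate_vite`, the step size, the number of steps, and the two gate values are assumptions made for illustration only. In particular, the thalamic signal $P_k$ and the interneuron signal $W_k$ are held constant here, whereas in the model they vary in time.

```python
def simulate_vite(tpv, ppv0=0.0, gate=1.0, w=0.0, i_r=0.01, dt=0.001, steps=20000):
    """Forward-Euler integration of Eqs. 12-13 for one degree of freedom.

    gate stands in for the thalamic signal P_k and w for the interneuron
    signal W_k; both are held constant (an assumption of this sketch).
    """
    v, ppv = 0.0, ppv0
    trajectory = [ppv]
    for _ in range(steps):
        dv = 60.0 * (-v + tpv - ppv)                        # Eq. 12
        dppv = 0.2 * (gate - 1.2 * w + i_r) * max(v, 0.0)   # Eq. 13, rectified V
        v += dt * dv
        ppv += dt * dppv
        trajectory.append(ppv)
    return trajectory


normal = simulate_vite(tpv=5.0, gate=1.0)    # hypothetical "normal" gate level
reduced = simulate_vite(tpv=5.0, gate=0.3)   # hypothetical weaker gate
print(round(normal[-1], 2), round(reduced[-1], 2))
```

Because the gate multiplies the rectified difference vector, a smaller gate value makes the present position approach the same target more slowly, which is the qualitative relation between thalamic output and movement speed described in the text.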
In the simulations, a sequence of relative target position vectors was defined according to the individual stroke direction and amplitude for each degree of freedom. At times of zero or peak velocity the subsequent target position vector of the motor program below was fed into the VITE model, which continuously computed the difference vector (DV[1], DV[2], DV[3]) between the target position (TPV[1], TPV[2], TPV[3]) and the present position vectors (PPV[1], PPV[2], PPV[3]). The sequence of present position vectors forms the trajectory of the pen tip. The spatial coordinates of the pen tip are derived from the present position vectors (PPVs) as follows (Bullock et al., 1993):

$$x = PPV[1] \times \cos(PPV[3]) + (\mathit{length} + PPV[2]) \times \sin(PPV[3]) \qquad (14)$$

$$y = -PPV[1] \times \sin(PPV[3]) + (\mathit{length} + PPV[2]) \times \cos(PPV[3]) \qquad (15)$$

where PPV[3] is given in radians, and length = 400 corresponds to the distance between wrist joint and pen tip.
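Equations 14 and 15 amount to rotating the point (PPV[1], length + PPV[2]) by the wrist angle PPV[3]. A minimal Python sketch of this mapping is given below; the function name `pen_tip` and the tuple-based interface are assumptions made for illustration.

```python
import math


def pen_tip(ppv, length=400.0):
    """Pen-tip coordinates (x, y) from a present position vector (Eqs. 14-15).

    ppv is (PPV[1], PPV[2], PPV[3]) with PPV[3] in radians; length is the
    wrist-to-pen-tip distance (400 in the text).
    """
    p1, p2, p3 = ppv
    x = p1 * math.cos(p3) + (length + p2) * math.sin(p3)    # Eq. 14
    y = -p1 * math.sin(p3) + (length + p2) * math.cos(p3)   # Eq. 15
    return x, y


# With all joints at zero, the pen tip lies at distance `length` along y.
print(pen_tip((0.0, 0.0, 0.0)))
```

Since the mapping is a pure rotation, the distance of the pen tip from the wrist depends only on PPV[1] and PPV[2], not on the wrist angle PPV[3].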
Acknowledgement
This work was supported by NINDS NS33173.
References
Alexander, G.E. and Crutcher, M.D. (1990) Preparation for movement: neural representations of intended direction in three motor areas of the monkey. J. Neurophysiol., 64: 133-150.
Alexander, G.E., DeLong, M.R. and Strick, P.L. (1986) Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu. Rev. Neurosci., 9: 357-381.
Baron, M.S., Vitek, J.L., Bakay, R.A.E., Green, J., Kaneoke, Y., Hashimoto, T., Turner, R.S., Woodward, J.L., Cole, S.A., McDonald, W.M., DeLong, M.R. (1996) Treatment of advanced Parkinson's disease by posterior GPi pallidotomy: 1-year results of a pilot study. Ann. Neurol., 40: 355-366.
Bejjani, B., Damier, P., Arnulf, I., Bonnet, A.M., Vidailhet, M., Dormont, D., Pidoux, B., Cornu, P., Marsault, C., Agid, Y. (1997) Pallidal stimulation for Parkinson's disease: Two targets? Neurology, 49: 1564-1569.
Bergman, H., Wichmann, T., Karmon, B., DeLong, M.R. (1994) The primate subthalamic nucleus. II: Neuronal activity in the MPTP model of Parkinsonism. J. Neurophysiol., 72: 507-520.
Boraud, T., Bezard, E., Bioulac, B., Gross, C. (1996) High frequency stimulation of the internal globus pallidus (GPi) simultaneously improves Parkinsonian symptoms and reduces the firing frequency of GPi neurons in the MPTP-treated monkey. Neurosci. Lett., 215: 17-20.
Boecker, H., Dagher, A., Ceballos-Baumann, A.O., Passingham, R.E., Samuel, M., Friston, K.J., Poline, J., Dettmers, C., Conrad, B., Brooks, D.J. (1998) Role of the human rostral supplementary motor area and the basal ganglia in motor sequence control: investigations with H2 15O PET. J. Neurophysiol., 79(2): 1070-1080.
Bullock, D. (1998) Sensory-motor learning, planning, and timing. In: G.A. Ritschard, A. Berchtold, F. Duc and D.A. Zighed (Eds.), Apprentissage, des Principes Naturels aux Modèles Artificiels. Paris: Hermes, pp. 17-40.
Bullock, D., Grossberg, S. (1988) Neural dynamics of planned arm movements: emergent invariants and speed-accuracy properties during trajectory formation. Psychol. Rev., 95: 49-90.
Bullock, D., Grossberg, S., Mannes, C. (1993) A neural network model for cursive script production. Biol. Cybern., 70: 15-28.
Bullock, D., Cisek, P., Grossberg, S. (1998) Cortical networks for control of voluntary arm movements under variable force conditions. Cereb. Cortex, 8: 48-62.
Contreras-Vidal, J.L. and Schultz, W. (1999) A predictive reinforcement model of dopamine neurons for learning approach behavior. J. Comput. Neurosci., in press.
Contreras-Vidal, J.L., Stelmach, G.E. (1995) A neural model of basal ganglia-thalamocortical relations in normal and Parkinsonian movement. Biol. Cybern., 73: 467-476.
Contreras-Vidal, J.L., Poluha, P., Teulings, H.L., Stelmach, G.E. (1998) Neural dynamics of short and medium-term motor control effects of levodopa therapy in Parkinson's disease. Artif. Intell. Med., 13(1-2): 57-79.
Darian-Smith, I., Galea, M.P., Darian-Smith, C., Sugitani, M., Tan, A., Burman, K. (1996) The Anatomy of Manual Dexterity. Advances in Anatomy, Embryology and Cell Biology, Vol. 133, Springer-Verlag.
DeLong, M.R., Wichmann, T. (1993) Basal ganglia-thalamocortical circuits in Parkinsonian signs. Clin. Neurosci., 1: 18-26.
Fazzini, E., Dogali, M., Sterio, D., Eidelberg, D., Beric, A. (1997) Stereotactic pallidotomy for Parkinson's disease. Neurology, 48: 1273-1277.
Filion, M., Tremblay, L. (1991) Abnormal spontaneous activity of globus pallidus neurons in monkeys with MPTP-induced Parkinsonism. Brain Res., 547: 142-151.
Fink, J.S. (1993) Neurobiology of basal ganglia receptors. Clin. Neurosci., 1: 27-35.
Gerfen, C.R., McGinty, J.F., Young, W.S. III (1991) Dopamine differentially regulates dynorphin, substance P, and enkephalin expression in striatal neurons: in situ hybridization histochemical analysis. J. Neurosci., 11: 1016-1031.
Gerfen, C.R. (1992) The neostriatal mosaic: multiple levels of compartmental organization in the basal ganglia. Annu. Rev. Neurosci., 15: 285-320.
Girault, J.A., Spampinato, U., Glowinski, J., Besson, M.J. (1986) In vivo release of [3H]γ-aminobutyric acid in the rat neostriatum. II. Opposing effects of D1 and D2 dopamine receptor stimulation in the dorsal caudate putamen. Neuroscience, 19: 1009-1117.
Grafton, S.T., Fagg, A.H., Arbib, M.A. (1998) Dorsal premotor cortex and conditional movement selection: A PET functional mapping study. J. Neurophysiol., 79: 1092-1097.
Graveland, G.A., DiFiglia, M. (1985) The frequency and distribution of medium-sized neurons with indented nuclei in the primate and rodent neostriatum. Brain Res., 327: 307-311.
Grossberg, S. (1984) Some normal and abnormal behavioral syndromes due to transmitter gating of opponent processes. Biol. Psychiatry, 19: 1075-1117.
Hazrati, L.N., Parent, A., Mitchell, S., Haber, S.N. (1990) Evidence for interconnections between the two segments of the globus pallidus in primates: a PHA-L anterograde tracing study. Brain Res., 533: 171-175.
Hikosaka, O., Sakai, K., Miyauchi, S., Takino, R., Sasaki, Y., Pütz, B. (1996) Activation of human pre-SMA and precuneus in learning of sequential procedures. Neurosci. Res. [Suppl.], 20: S243.
Hoover, J.E., Strick, P.L. (1993) Multiple output channels in the basal ganglia. Science, 259: 819-821.
Inase, M., Buford, J.A., Anderson, M.E. (1996) Changes in the control of arm position, movement, and thalamic discharge during local inactivation in the globus pallidus of the monkey. J. Neurophysiol., 75: 1087-1104.
Jaeger, D., Kita, H., Wilson, C.J. (1994) Surround inhibition among projection neurons is weak or nonexistent in the rat neostriatum. J. Neurophysiol., 72: 2555-2558.
Jenkins, I.H., Brooks, D.J., Nixon, P.D., Frackowiak, R.S., Passingham, R.E. (1994) Motor sequence learning: a study with positron emission tomography. J. Neurosci., 14(6): 3775-3790.
Krack, P., Pollak, P., Limousin, P., Hoffmann, D., Benazzouz, A., Le Bas, J.F., Koudsie, A., Benabid, A.L. (1998) Opposite motor effects of pallidal stimulation in Parkinson's disease. Ann. Neurol., 43: 180-192.
Kunzle, H.
( 1975) Bilateral projections from precentral motor cortex to the putamen and other parts of the basal ganglia: an autoradiographic study in Macaca fascicularis. Brain Res.. 88: 195-290. Lozano, A.M., Lang, A.E., Galvez-Jimenez, N., Miyasaki, J., Duff, J., Hutchinson, W.D., Dostrovsky, J.O. (1995) Effect of GPi pallidotomy on motor function in Parkinson’s disease. Luncet, 346: 1383-1 387. Mutsuzaka, Y.. Aizawa, H., Tanji, J. (1992) A motor area rostral to the supplementary motor area (presupplementary motor area) in the monkey: neuronal activity during a learned motor task. J. Neurophysiol., 68: 653-662. Miyachi, S., Hikosaka, O., Miyashita, K., Krdi, Z., Rand, M.K. (1997) Differential roles of monkey striatum in learning of sequential hand movement. Exp. Bruin R e x . 1 15: 1-5. Miyashita, K., Hikosaka, O., Lu, X., Miyachi, S. (1995) Neuronal activity in medial premotor area of monkey during learning of sequential movements. Soc. Neurosci., Abs., 21 : 1928.
276 Mushiaka, H., Strick, P.L. (1995) Pallidal neuron activity during sequential arm movements. J. Neurophysiol., 74:
2754-2758, Nisenbaum, K.L., Kitai, S.T., Crowley, W.R., Gerfen, C.R. (1994) Temporal dissociation between changes in striatal enkephalin and substance P messenger RNAs following striatal dopamine depletion. Neuroscience, 60:927-937. Pasthasarathy, H.B., Schall, J.D., Graybiel, A.M. (1992) Distributed but convergent ordering of corticostriatal projections: analysis of the frontal eye field and the supplementary eye field in the macaque monkey. J. Neurophysiol.. 12:
44684488. Parent, A,, Hazrati, L.-N. (1995) Functional anatomy of the basal ganglia. I. The corticobasal ganglia-thalamo-cortical loop. Brain Rex ReiJ.,20:91-127. Parent, A,, Chicchetti, F. (1998)The current model of basal ganglia organization under scrutiny. Movemenf Disorders, 13: 199-202. Plenz, D., Aertsen, A. (1996) Neural dynamics in cortexstriatum co-cultures 11. Spatiotemporal characteristics of neuronal activity. Neuroscience, 70: 893-924. Rebec, G.V., Curtis, S.D. (1988)Reciprocal zones of excitation and inhibition in the neostratium. Synapse, 2:633-635. Roland, P.E., Larsen, B., Lassen, N.A., Skinhof, E. (1980) Supplementary motor area and other cortical areas in organization of voluntary movements in man. J. Neurophysiol., 43: 1 18-1 36. Samuel, M., Caputo, E., Brooks, D.J., Schrag. A,, Scaravilli, T., Branston, N.M.. Rothwell, J.C., Marsden, C.D., Thomas, D.G.T.. Lees, A.J.. Quinn. N.P. (1998)A study of medial pallidotomy for Parkinson’s disease: clinical outcome, MRI location and complications. Brain, 121 : 59-75.
Schell, G.R., Strick, P.L. (1984)The origin ofthalamic inputs to the arcuate premotor and supplementary motor areas. J Neurosci., 4:539-560. Sherman, M.S., Guillery, R.W. (1996)Functional organization of thalamocortical relays. N. Neurophy.sio/.,76:1367-1 395. Sun. R.E.. Schultz, W. (1998)Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Exp. Bruin Res. 121: 350-354. Sutton. J.P., Couldwell, W., Lew, M.F., Mallory, L., Grafton, S., DeGiorgio, C., Welsh, M., Apuzzo, M.L., Ahmadi. J., Waters, C.H. (1995)Ventroposterior medial pallidotomy in patients with advanced Parkinson’s disease. Neurosurgery, 36:
I 112-1 116. Teulings, H.L., Contreras-Vidal, J.L., Stelmach, G.E. ( 1997) Coordination of fingers, wrist and arm in Parkinsonian handwriting. Exp. Neurol., 146: 159-170. Tremblay, L., Filion, M. (1989)Responses of pallidal neurons to striatal stiinulation in intact walking monkeys. Brairi R e x .
498:1-16. Vitek. J.L., Ashe, J., DeLong. M.R., Alexander, G.E. (1994) Physiologic properties and somatotopic organization of the primate motor thalamus. J. Neurophysiol., 7 l(4): 1498-15 1 3. Watanabe ( 1996) Reward expectancy in primate prefrontal neurons. Nature, 382:629-632. White, E.L. (1989)Cortical Circuits: synaptic organisation of the cerebral cortex - structure, function and theory, Birkhauser, Boston. Yeterian, E.H. and Pandya, D.N. (1994)Lanlinar origin of striatal and thalamic projections of the prefrontal cortex in rhesus monkeys. Exp. Brain Res.. 83:268-284.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 16
Motor fluctuations in Parkinson’s disease: a postsynaptic mechanism derived from a striatal model
Rolf Kötter*
Department of Morphological Endocrinology & Histochemistry and C. & O. Vogt Brain Research Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany
*Corresponding author. Tel/fax: +49-211-81-12095; e-mail: RK@hirn.uni-duesseldorf.de
Introduction
Parkinson’s disease is a chronic illness dominated by the motor signs of rigidity, reduced mobility (hypokinesia, bradykinesia), and resting tremor. Extensive loss of nigrostriatal dopaminergic neurons is the main histopathological feature (Ehringer and Hornykiewicz, 1960). Effective treatment usually includes administration of L-DOPA (a precursor of dopamine) and other dopaminergic agonists (for a review see Tolosa et al., 1998).
“The initial response to L-DOPA therapy is usually one of remarkable increases in overall motility, with a reappearance of most previously lost associated movements [‘L-DOPA honeymoon’]. Unfortunately, this improvement does not usually continue for 1 or 2 years without the complications of abnormal involuntary movements or of the more obscure ‘on-and-off’ phenomenon. The patient may be doing quite well when, without apparent reason, and not necessarily in association with a stressful situation, he suddenly feels very tired and, to the casual onlooker, rapidly returns to a state of akinesia, with expressionless or pained facies, stooped posture, and lack of motivation to move. This process is occasionally very rapid. [. . .] The akinetic state may last a few minutes or persist some hours. The return to normal (or at least the previous ‘well’ state) may be equally rapid, but usually occurs more slowly, almost imperceptibly, often with reappearance of dyskinesias.” (Barbeau, 1974).
Sudden, extreme, and occasionally unpredictable motor fluctuations occur in patients with untreated or treated Parkinson’s disease (Marsden et al., 1982). A wide variety of symptoms are subsumed under the headings ‘motor response fluctuations’ or ‘on-off phenomena’: freezing episodes, start-hesitation, end-of-dose deterioration (‘wearing off’), peak dose dyskinesia, diphasic dyskinesia, akinesia paradoxica, kinesia paradoxica, and random on-off (‘yo-yoing’, which is regarded as the on-off phenomenon in the strict sense) (Marsden, 1980; Marsden et al., 1982; Lees, 1989; Caraceni, 1991; Quinn, 1998). The time scale of behavioural fluctuations ranges from minutes to weeks (Marsden et al., 1982; Quinn, 1998). Additional autonomic, sensory, and cognitive/psychiatric fluctuations have been described that are associated with motor off states (Brown et al., 1984; Riley and Lang, 1993; Hillen and Sage, 1996).
From a pathophysiological point of view the challenges are: (1) to classify motor fluctuations in a meaningful manner; (2) to establish links between certain types of motor fluctuations and possible underlying mechanisms; and (3) to conceive a
relationship between the different time scales of sudden fluctuations of symptoms, medium-term drug actions and chronic disease progression.
Explanations of motor fluctuations in Parkinson’s disease
A useful and generally accepted starting point for pathophysiological considerations is the division of motor fluctuations into two groups:
- drug intake-related, and
- drug intake-unrelated motor fluctuations.
Most proposed explanatory concepts deal with the group of drug intake-related phenomena: “From the beginning of levodopa [L-DOPA] therapy, most patients are aware that benefit lasts only for a few hours after each dose. On average, initial improvement begins after half an hour or so, reaches its maximum within an hour, and lasts for up to about three or four hours. Thereafter, if another dose is not taken, the patient relapses back to his untreated Parkinsonian state.” (Marsden, 1980, p. 242). The first explanation for drug intake-related motor fluctuations, accordingly, is very straightforward: some behavioural fluctuations on the time scale of hours are caused by the rise and fall of plasma concentrations of L-DOPA (Muenter and Tyce, 1971; Fahn, 1974; Sweet and McDowell, 1974; Eriksson et al., 1984). The picture is complicated, however, by a change in the response to the drug on a slower time scale: “After a number of years of levodopa treatment, the duration of action of each dose appears to shorten so that end-of-dose deterioration occurs earlier and earlier as time passes. Patients complain that they get less and less benefit from the drug, because it lasts a shorter and shorter period of time.” (Marsden, 1980, p. 243). In spite of numerous efforts, all attempts to demonstrate changes in the peripheral pharmacokinetics of L-DOPA that would explain the shortening of action or the development of behavioural fluctuations have failed (Fabbrini et al., 1987; Hammerstad et al., 1991).
The transport system for L-DOPA both in the small bowel and in the blood-brain barrier is saturable, and L-DOPA competes with large neutral amino acids, such as phenylalanine, for carrier-mediated transport (Eriksson et al., 1988; Nutt, 1992). Diet-related variations in the amount of large neutral amino acids, by altering the amount of transported L-DOPA, may evoke swings of central L-DOPA concentrations and behavioural fluctuations. Consequently, low-protein diets for Parkinsonian patients have been devised (Mena and Cotzias, 1975; Pincus et al., 1987). Such diets reduce the frequency and duration of the off periods, but they do not abolish behavioural fluctuations (Sweet and McDowell, 1974; Mena and Cotzias, 1975). The benefit obtained with low-protein diets seems to arise predominantly from an increase of the dopamine concentration in the brain (Eriksson et al., 1988). Therefore, an involvement of central mechanisms has been suspected.
L-DOPA is converted into dopamine by the enzyme aromatic L-amino acid decarboxylase (AAAD). Since dopaminergic neurons contain AAAD, continuing loss of neurons with a subsequent decrease in capacity for synthesis of dopamine could explain the reduction of effect of L-DOPA as the disease progresses. By extension, serotoninergic and noradrenergic neurons contain AAAD (Trugman et al., 1991), and it has been argued that such neurons are destroyed in the course of Parkinson’s disease at a later stage than dopaminergic neurons. A number of observations can be contrasted with this explanation: the indirect dopaminergic agonist amphetamine restores motility during the off period, therefore dopamine stores cannot be totally depleted. The loss of dopaminergic neurons in Parkinson’s disease does not exceed two thirds of the cells and at least 10% survive even at the time of death (Pakkenberg et al., 1991).
In rats, even near total destruction of dopaminergic neurons leaves some AAAD in non-aminergic striatal neurons (Melamed et al., 1980a), but hardly any was found in serotoninergic or noradrenergic neurons (Melamed et al., 1980b). A more subtle point is that a reduced rate of synthesis in
dopaminergic neurons can be expected to limit the amount of dopamine every time it is released, but would not explain the shortened effect of a dose of L-DOPA.
The most successful explanation for drug intake-related motor fluctuations so far is the presynaptic buffer model, i.e. the concept that an increasing loss of dopaminergic neurons leads to a progressive failure to store or buffer dopamine (Marsden, 1980). This concept entails that the initial response following L-DOPA administration is brought about mainly by newly synthesized dopamine that exceeds the storage or buffer capacity in the residual dopaminergic cells and overflows to produce high extracellular dopamine concentrations. Later therapeutic benefit from the release of the stored fraction of dopamine would be shortened (or reduced) because of the small vesicular storage capacity. With progressive nigrostriatal neuronal loss the storage capacity of dopaminergic neurons decreases, and drug intake-related swings in the dopamine concentration become more and more pronounced. This presynaptic mechanism receives its main support from positron emission tomography studies showing that patients with ‘wearing off’ have a greater reduction in the striatal uptake of L-DOPA than patients without fluctuations (Leenders et al., 1986). Lees (1989) argues, however, that a progressive reduction in the storage capacity of surviving nigrostriatal dopamine terminals is not a critical factor, mainly since patients with a marked asymmetry of basal ganglia signs (corresponding to a 10-20% difference between the two brain sides in the number of pigmented nigral cells) fail to show a matching asymmetry in their motor response to a single oral L-DOPA dose (Kempster et al., 1989). Indeed, the buffer model has not been specified in quantitative terms.
For example, the size of the buffer capacity of a nigrostriatal terminal is unknown; turnover numbers of enzymes that are involved in the metabolism of dopamine have not been considered; it is not clear what mechanisms lead to an overflow of surplus dopamine. Moreover, it remains unexplained by what neuronal mechanisms dopamine exerts its effects, and by what means variations of dopamine levels could result in the occurrence of motor fluctuations.
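Because the buffer model has no agreed quantitative form, the following is only a toy sketch of its qualitative logic, not a published formulation. It assumes a single dopamine store with a fixed `capacity`, a dose that is converted to dopamine in equal portions over a few time steps, overflow of any excess into the extracellular space, slow release from the store, and first-order clearance. All names, parameter values, units and the `threshold` used to measure the duration of benefit are hypothetical.

```python
def extracellular_trace(capacity, dose=100.0, synth_steps=10,
                        release_frac=0.05, clear_frac=0.5, n_steps=200):
    """Toy presynaptic buffer: return extracellular dopamine per time step.

    All quantities are in arbitrary units; every parameter is an
    illustrative assumption, not a measured value.
    """
    store = 0.0          # dopamine held in the presynaptic store
    extracellular = 0.0  # dopamine in the extracellular space
    trace = []
    for t in range(n_steps):
        if t < synth_steps:
            store += dose / synth_steps       # newly synthesized dopamine
        overflow = max(0.0, store - capacity)  # excess that cannot be stored
        store -= overflow
        released = release_frac * store        # slow release from the store
        store -= released
        extracellular += overflow + released
        extracellular *= (1.0 - clear_frac)    # first-order clearance
        trace.append(extracellular)
    return trace


def peak_and_duration(trace, threshold=1.0):
    """Peak level and number of steps spent above a hypothetical threshold."""
    return max(trace), sum(1 for level in trace if level > threshold)


# A store large enough to hold the whole dose versus a much reduced store.
intact_peak, intact_duration = peak_and_duration(extracellular_trace(100.0))
reduced_peak, reduced_duration = peak_and_duration(extracellular_trace(20.0))
print(intact_peak, intact_duration)
print(reduced_peak, reduced_duration)
```

With these assumed values the reduced store gives a higher peak and a shorter time above threshold than the intact store, which is the qualitative pattern the buffer model invokes for peak dose dyskinesia and wearing off. The sketch says nothing about the open questions listed above, such as the real size of the buffer or the mechanism of overflow.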
The main virtues of the buffer model, however, are its simplicity and its use as a guideline in clinical practice: it provides an explanation for the lack of effect of L-DOPA in healthy people if an intact population of nigrostriatal neurons is capable of buffering large amounts of dopamine; it is compatible with the suggestion that directly acting dopaminergic agonists produce a lower incidence of motor fluctuations (Jorg, 1988; Gimenez Roldan et al., 1997), although the evidence for this suggestion remains controversial (Lees, 1989; Factor and Weiner, 1993). The buffer model provides a rationale for attempts at limiting the extent of swings in dopamine concentration using intravenous, subcutaneous or transcutaneous administration of L-DOPA, slow release L-DOPA preparations, or more frequent but smaller doses (Marsden et al., 1982; Chase et al., 1989). Several clinical trials have shown beneficial effects of such measures on motor fluctuations, mainly the ‘wearing off’ phenomenon. The model does not explain, however, sudden variations of mobility or drug intake-unrelated motor fluctuations, since these should no longer occur at all with adequate control of dopamine concentrations. Neither does the buffer model account for the beneficial effects of directly acting dopaminergic agonists in continuous or slow release form that are not stored and released by dopaminergic neurons. According to the buffer model, both the dose of L-DOPA that produces the greatest (although shorter and shorter lasting) improvement, and also the threshold for inducing abnormal involuntary movements should become smaller with progression of the illness, since a larger and larger fraction of the same dose of dopamine remains in the extracellular space during the initial rise of L-DOPA levels. An apparent decrease of the threshold for dopamine-induced dyskinesias and of the therapeutic window at subsequent stages of Parkinson’s disease has indeed been demonstrated (Chase, 1998, Fig. 3).
A corollary of this model is that amphetamine becomes less effective in overcoming off periods as the amount of stored dopamine that it could release decreases. Altogether, the presynaptic buffer model is powerful and widely accepted as a clinical guideline that helps to conceptualize the wearing off
effect and other drug intake-related motor fluctuations at medium (hours) and long (months) time scales in Parkinson’s disease.
Very similar predictions arise from a model of faulty feedback control of dopaminergic terminals by a completely different mechanism: release of dopamine from nigrostriatal terminals was suggested to inhibit itself in a negative feedback loop through stimulation of presynaptic autoreceptors (Carlsson, 1983). Conceivably, a decrease in autoreceptor sensitivity in advanced Parkinson’s disease could result in excessive dopamine release, followed by premature depletion of stores. This line of argument has not been pursued further. It contrasts, however, with the argument that preferential stimulation of autoreceptors at low levels of L-DOPA might be responsible for a transient worsening of Parkinsonian symptoms as the first response to drug intake (Quinn, 1998).
Moving on from presynaptic to postsynaptic mechanisms, a loss of responsiveness of postsynaptic dopamine receptors in the basal ganglia may provide an alternative explanation for the decreasing effect of L-DOPA in long-term treatment (Fahn, 1974). Even though application of directly acting dopamine agonists, such as apomorphine, is capable of restoring motility during the off period (Düby et al., 1972; VanLaar et al., 1992), the duration of the acute motor response to apomorphine shortens and the therapeutic window narrows (Verhagen Metman et al., 1997). However, there have been no consistent findings of D2 dopaminergic receptor up- or down-regulation under L-DOPA treatment (Marsden, 1980; Guttman et al., 1986). An occurrence of rapid changes in receptor sensitivity due to conformational change (Fahn, 1982) that could account for short-term or medium-term behavioural fluctuations has not been substantiated.
More recently, a pathological long-term potentiation of glutamatergic synapses of the NMDA subtype due to chronic intermittent stimulation of normally tonically activated dopaminergic receptors on striatal neurons was proposed (Chase, 1998; Chase et al., 1998). This synaptic mechanism was held responsible for the progressive worsening of wearing off phenomena, peak dose dyskinesias, and the eventual occurrence of other types of motor
fluctuations. Evidence for such a mechanism comes mainly from clinical observations, such as the efficacy of NMDA antagonists in ameliorating L-DOPA-associated dyskinesias (Verhagen-Metman et al., 1998) and the reduction of motor complications by continuous intravenous administration of L-DOPA (Mouradian et al., 1990) or lisuride, a dopaminergic agonist (Baronti et al., 1992). Although systemic or intracerebral injection of the NMDA antagonist MK-801 can improve the progressive shortening of L-DOPA-induced rotating behaviour in rats with a unilateral 6-hydroxydopamine lesion of compacta neurons (Papa et al., 1995; Mann et al., 1996), the conditions for the occurrence of long-term potentiation (Kötter and Wickens, 1998), the striatal site of action of MK-801 (Huang et al., 1998), and the relevance of rat models for Parkinson’s disease (Kötter and Feizelmeier, 1998) are far from being established. Altogether, alterations at postsynaptic sites provide a distinct and interesting alternative or addition to the presynaptic models mentioned above (see also Bravi et al., 1994), but experimental data so far are insufficient to confirm the proposed mechanisms.
Turning to drug intake-unrelated motor fluctuations, few specific explanations have been put forward. Unexplained fluctuations are either resolved as ‘complicated’ drug intake-related phenomena (Marsden et al., 1982) or put down to greater than expected intraindividual variations in pharmacokinetic or pharmacodynamic factors, such as diet, gut absorption, blood-brain transport, dopamine metabolism, or receptor regulation (Fahn, 1974). The effects of these variables on central dopamine concentrations are not clear since clinical observations rely on the determination of peripheral levels of L-DOPA or its metabolites. None of the models above explains why intake-unrelated motor fluctuations should become more prominent after some period of treatment or duration of illness.
Based on the pathological feature of degeneration of locus coeruleus neurons in Parkinson’s disease, and the structural and functional relationship between dopamine and noradrenaline it has been proposed that ‘aberrant and unpredictable locus coeruleus neuronal discharges’ manifest
clinically as motor fluctuations (Sandyk, 1989). Detailed proposals concerning underlying mechanisms are lacking and the presented evidence remains contradictory so that this hypothesis was not evaluated further. In summary, there are a number of interesting and clinically detailed hypotheses concerning the origin of motor fluctuations in Parkinson’s disease but only a few of them give details of pathophysiological mechanisms or address the different time scales of various types of phenomena.
A pathophysiological model of the striatum in Parkinson’s disease
Recently, computer modeling of information processing in striatal networks has made considerable progress. In particular, we investigated the effects of dopamine on cellular and subcellular mechanisms in striatal principal neurons and analyzed postsynaptic interactions between nigral dopaminergic and cortical glutamatergic afferents in the striatum (Kötter, 1994; Kötter and Wickens, 1995). This chapter explores to what extent our computer model of the striatum can offer a pathophysiological explanation for motor fluctuations in Parkinson’s disease and the different time scales at which they occur. The network model that we constructed (Wickens and Kötter, 1995) represented a selection of principal neurons in the neostriatal matrix. Coordinated cortical activity switched the membrane potential of these neurons from the hyperpolarized off state to the more depolarized on state that may be accompanied by apparently random action potentials (Stern et al., 1998). Of the various effects that dopamine exerts on principal neurons we represented two, which we considered to be the most important with respect to Parkinson’s disease: the rapid reduction of an after-hyperpolarization that affects the duration of action potential firing in principal neurons (Rutherford et al., 1988) and the role of dopamine in long-term maintenance of dendritic integrity of principal neurons. The latter is evidenced by the loss of dendritic length, dendritic
spines, and corticostriatal afferents that is found after chronic lesions of dopaminergic neurons in the pars compacta of the substantia nigra in patients (McNeill et al., 1988) and in rat models of Parkinson’s disease (Ingham et al., 1993; Ingham et al., 1998). The untreated Parkinsonian state was represented in the model by a combination of reduced excitatory input and an increased slow after-hyperpolarization. Under these conditions, the firing patterns in the network were profoundly altered: the firing rates were reduced, interspike intervals increased and burst durations shortened. Consequently, output of the striatum to pallidum and pars reticulata of the substantia nigra would appear as very short-lasting and weak inhibitory actions on the tonically active target cells. Starting from the Parkinsonian state, treatment with L-DOPA or dopaminergic agonists was represented by a reversal and reduction of the increased after-hyperpolarization. The reduction of excitatory input and dendritic membrane surface was assumed to persist since dopaminergic treatment does not reverse the loss of dendritic spines in animal experiments (Arbuthnott and Ingham, 1993). A decrease of the after-hyperpolarization, however, could compensate for some consequences of reduced dendritic membrane surface and excitatory drive: progressive reduction of the after-hyperpolarization prolonged the duration of firing episodes and, thereby, reversed the smaller number of spikes in activated neurons. Prolonged firing episodes coincided with the re-appearance and stabilization of the spatial patterns of average firing frequencies in the network. However, these patterns became relatively static, such that the restoration of normal firing frequencies was achieved at the expense of a larger proportion of neurons being inactive for prolonged periods.
Striatal target neurons would consequently show a slower succession of few inhibited neurons embedded in an increased fraction of neurons with high tonic activity. An interpretation of these pathophysiological mechanisms depends on assumptions made about what is represented by activity in striatal neurons and about the role of the basal ganglia in movement generation. While intact basal ganglia output is not
a necessary prerequisite for the generation of movements, it is widely assumed that activation of striatal neurons facilitates the generation of movements. The proposed mechanism is an inhibition of neurons in the output stations of the basal ganglia (globus pallidus and the pars reticulata of the substantia nigra) leading to disinhibition of thalamic neurons that project on to the frontal cortex. Within this framework, we interpreted spatial activity patterns in the basal ganglia as subserving two functions: to facilitate appropriate movements by temporary disinhibition of selected thalamocortical projection neurons, on the one hand, and to suppress undesirable movements by tonic inhibition of the majority of thalamocortical neurons, on the other (Kötter and Wickens, 1998). Since the simulation of striatal dynamics in advanced Parkinson’s disease showed much lower firing rates and since increased activity was observed in subthalamic neurons after dopaminergic depletion, we proposed that reduced striatal effects in combination with a general increase of firing rates in the basal ganglia output stations could explain the symptoms of rigidity, hypokinesia and bradykinesia in Parkinson’s disease. Furthermore, acute dopaminergic actions were found to increase the average activity in striatal neurons close to normal firing frequencies, but failed to reverse the slowed temporal characteristics of spike trains. Thus, we conceived that dopaminergic treatment could improve hypokinesia and rigidity by selective enhancement of striatal output, but not bradykinesia or other symptoms that may be related to persistent alterations of striatal dynamics. Finally, we predicted that a further loss of excitatory corticostriatal input during the progression of Parkinson’s disease would lead even this partial functional compensation to fail, since dopamine application was capable of extending cortically generated striatal activity but not of generating additional spike trains.
From this point of view, treatment of Parkinson’s disease with intermittent application of L-DOPA or dopaminergic agonists neither reverses all sequels of the lack of dopamine nor stops the progression of the secondary disease process so long as corticostriatal transmission is not improved.
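The two single-neuron ingredients named in this section, excitatory drive and a slow after-hyperpolarization, can be illustrated with a deliberately minimal sketch. It is not the network model of Wickens and Kötter (1995): it assumes one leaky integrate-and-fire unit with a spike-triggered after-hyperpolarization variable, dimensionless voltage, constant drive, and parameter values chosen only for illustration. The function names and the three conditions below are hypothetical.

```python
def spike_times(drive, ahp_step, duration_ms=500.0, dt=0.1,
                tau_m=10.0, tau_ahp=200.0, threshold=1.0):
    """Spike times (ms) of a leaky integrate-and-fire unit with slow AHP.

    drive    : constant excitatory input (stands in for cortical drive)
    ahp_step : increment of the after-hyperpolarization per spike
    All values are dimensionless, illustrative assumptions.
    """
    v = 0.0     # membrane variable, reset to 0 after each spike
    ahp = 0.0   # slow after-hyperpolarization, subtracts from the drive
    spikes = []
    for i in range(round(duration_ms / dt)):
        v += dt * (drive - ahp - v) / tau_m   # forward Euler step
        ahp -= dt * ahp / tau_ahp             # slow decay of the AHP
        if v >= threshold:
            spikes.append(i * dt)
            v = 0.0
            ahp += ahp_step                   # each spike deepens the AHP
    return spikes


def mean_isi(spikes):
    """Mean interspike interval in ms (needs at least two spikes)."""
    return (spikes[-1] - spikes[0]) / (len(spikes) - 1)


normal = spike_times(drive=2.0, ahp_step=0.05)        # assumed healthy state
parkinsonian = spike_times(drive=1.5, ahp_step=0.30)  # less input, larger AHP
treated = spike_times(drive=1.5, ahp_step=0.05)       # AHP reduced, input still low
for label, s in (("normal", normal), ("parkinsonian", parkinsonian),
                 ("treated", treated)):
    print(label, len(s), round(mean_isi(s), 1))
```

In this sketch the Parkinsonian setting fires fewer spikes at longer intervals than the normal setting, and reducing the after-hyperpolarization alone raises the spike count again while the reduced drive persists. Burst duration, on and off states, and network patterns are outside its scope.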
Explanation of motor fluctuations in terms of striatal pathophysiology
Our view of a progressive loss of corticostriatal excitation in Parkinson’s disease has several implications for explaining motor fluctuations and their worsening as the disease process advances. Let us consider again the effects of increasing dopamine concentrations on basal ganglia output in the striatal model of Parkinson’s disease. The interpretation of the model was based on the implicit assumption that the timing, the duration, and the temporal structure of spike trains in striatal principal neurons carry information that is important for the generation of movements. Generally speaking, the idea is that the timing of spike trains is relevant to the coordination of movements with respect to other ongoing motor and non-motor activities; the duration of spike trains may determine for how long a movement should go on; the temporal structure of spike trains could influence movement parameters such as velocity or force. These three aspects clearly interact with one another. Independent of acute dopaminergic effects, the striatal model of Parkinson’s disease produces increased interspike intervals in activated principal neurons. This slowing of the temporal structure of spike trains would indicate an alteration in movement parameters independent of whether dopaminergic treatment is given or not. If motor structures downstream of the striatum perform a temporal integration of striatal output then these will effectively reach the normal level of activity later than usual. Consequently, movement generation may show a slower onset unless initial striatal activity can be increased or anticipated by some other compensatory mechanism. Possible compensatory mechanisms include additional cortical input (e.g. from sensory stimulation), altered timing in basal ganglia-thalamo-cortical feedback loops, and modulation by serotoninergic or noradrenergic inputs.
Thus, it can be understood why the generation of movements becomes more difficult (start hesitation), why fast simple single voluntary movements slow down (an aspect of bradykinesia) or why sensory stimulation helps patients to initiate or alter movements.
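The delayed-onset argument can be made concrete with a small worked sketch. It assumes, purely for illustration, that the downstream structure behaves as a leaky integrator: each striatal spike adds a fixed `weight`, the accumulated level decays with time constant `tau_ms`, and a movement starts once the level reaches `threshold`. None of these names or values comes from the striatal model itself.

```python
import math


def onset_time(isi_ms, threshold=5.0, weight=1.0, tau_ms=100.0, n_spikes=50):
    """Time (ms) at which a leaky integrator driven by a regular spike
    train first reaches threshold, or None if it never does.

    All parameters are hypothetical and in arbitrary units.
    """
    level = 0.0
    for k in range(n_spikes):
        if k > 0:
            level *= math.exp(-isi_ms / tau_ms)  # leak between spikes
        level += weight                          # contribution of one spike
        if level >= threshold:
            return k * isi_ms
    return None


print(onset_time(10.0))              # short interspike interval
print(onset_time(20.0))              # doubled interval: later onset
print(onset_time(30.0))              # longer still: threshold never reached
print(onset_time(30.0, weight=1.5))  # extra input per spike restores onset
```

Under these assumptions, lengthening the interspike interval first delays the onset and eventually prevents it, which corresponds to slowed movements and start hesitation. Raising the contribution of each spike, a stand-in for additional cortical or sensory input, brings the onset back.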
The duration of striatal activity in the model was influenced by the after-hyperpolarization that is sensitive to dopaminergic treatment. A dopamine-dependent prolongation of striatal activity can offset larger interspike intervals so long as both changes are within the range of detection by downstream motor structures. Only activity within this range is conceived to produce normal movements. If the integrated activity is temporarily below the lower limit then fewer or no movements result (hypokinesia, akinesia); if the prolongation of firing exceeds the upper limit required for normal movement generation then prolonged or additional involuntary movements may appear (dyskinesia). If corticostriatal input progressively decreases with the advancement of Parkinson’s disease then additional problems occur. In order to increase the activity of striatal principal neurons into the range of a downstream temporal integrator, higher concentrations of dopamine are required to prolong striatal spike trains. This mechanism comes to its limit, however, when further prolongation produces dyskinesias as explained above. The therapeutic window between akinesia, on the one hand, and dyskinesia, on the other, shrinks, accordingly, with the following consequences:
1. drug intake-related motor fluctuations become more frequent and more prominent (peak dose dyskinesia and wearing off effect),
2. drug intake-unrelated and seemingly random motor fluctuations appear, since even small alterations of extracellular dopamine concentrations in the striatum can exceed the therapeutic range (random on-off phenomenon),
3. the beneficial effects of dopaminergic treatment both with L-DOPA and dopaminergic agonists diminish.
Finally, in the context of motor fluctuations Marsden (1980, pp. 249-251) drew attention to a decline of force during repetitive hand movements in Parkinson’s disease (see Schwab et al., 1959). The following hypothesis can be put forward on the basis of the striatal model: repetitive movements require repeated firing of similar groups of neurons in motor structures, which, because of the topographic organization of the basal ganglia-thalamo-cortical loops, are likely to be induced by spike trains in largely the same group of striatal principal neurons. During the prolonged firing episodes that are necessary to activate downstream motor structures in Parkinson’s disease, refractoriness (i.e. the slow after-hyperpolarization) accumulates in active neurons, leading to prolonged interspike intervals and inter-spike train intervals. Increased interspike intervals increase the time required to generate a movement (see above). Increased inter-spike train intervals would reduce the rate at which a movement can be generated repeatedly if largely the same set of striatal neurons was involved again. With this background, momentarily increased glutamatergic transmission and/or dopamine concentrations can overcome relative refractoriness occurring on the time scale of seconds. This may be a reason why extra motivation and effort have a facilitating effect on movements. On the other hand, a period of rest of the patient decreases discharge in striatal principal neurons and allows refractoriness to wear off.
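The accumulation of refractoriness during repeated firing, and its decay during rest, can be illustrated with a toy calculation. The sketch below is not the striatal model discussed in this chapter: it assumes a single adaptation variable (`ahp`) that is incremented by each spike, decays exponentially between spikes, and lengthens the next interspike interval in proportion to its value. The function names and all parameter values (`base_isi`, `ahp_step`, `ahp_tau`) are hypothetical and chosen only to make the qualitative effect visible.

```python
import math


def simulate_spike_train(n_spikes, base_isi=0.02, ahp_step=0.1,
                         ahp_tau=2.0, ahp_start=0.0):
    """Toy spike train with an accumulating slow after-hyperpolarization.

    Times are in seconds. Returns the list of interspike intervals and the
    value of the adaptation variable at the end of the train.
    """
    ahp = ahp_start
    isis = []
    for _ in range(n_spikes):
        # Assumption: the interval grows linearly with the current AHP level.
        isi = base_isi * (1.0 + ahp)
        isis.append(isi)
        # The AHP decays during the interval, then the next spike adds to it.
        ahp = ahp * math.exp(-isi / ahp_tau) + ahp_step
    return isis, ahp


def rest(ahp, duration, ahp_tau=2.0):
    """Exponential decay of the adaptation variable during a pause in firing."""
    return ahp * math.exp(-duration / ahp_tau)


first_train, ahp_after = simulate_spike_train(20)
short_rest_train, _ = simulate_spike_train(5, ahp_start=rest(ahp_after, 0.5))
long_rest_train, _ = simulate_spike_train(5, ahp_start=rest(ahp_after, 60.0))

print("first and last ISI of first train:", first_train[0], first_train[-1])
print("first ISI after 0.5 s rest:", short_rest_train[0])
print("first ISI after 60 s rest:", long_rest_train[0])
```

Under these assumptions the intervals lengthen within a train, a second train that follows after a short pause starts with longer intervals than the first, and a long rest restores intervals close to the baseline. This mirrors the verbal argument only qualitatively.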
Comparison between the striatal model and the presynaptic buffer model
The model of striatal processing in Parkinson’s disease and the presynaptic buffer model both account for a variety of motor fluctuations, but their predictions differ in important respects. Figure 1 illustrates the effects of single doses of L-DOPA on the extracellular concentration of dopamine in the striatum according to the two models. After a delay period, the striatal dopamine concentration increases rapidly to a maximum, and returns slowly back to baseline (lines denoted a in Figs. 1A and 1B). According to the buffer model (Fig. 1A; see Marsden, 1980) the capacity of nigrostriatal neurons for storage of dopamine decreases as Parkinson’s disease advances. Thus, increasing fractions of the same dose of L-DOPA appear as extracellular dopamine, leading to larger peak concentrations (line a’). As these exceed the upper limit of the therapeutic range (hatched area) peak dose dyskinesias occur. The progressive loss of storage capacity explains why the apparent dose of L-DOPA causing dyskinesias declines (see
[Fig. 1 graphic, panels A and B. Horizontal axes: time. Threshold label in panel A: Emin(a, a’).]
Fig. 1. Qualitative time course of extracellular striatal dopamine concentrations in response to single doses of L-DOPA according to (A) the presynaptic buffer model and (B) the postsynaptic striatal model during early (lines a) and advanced (lines a’ and b) Parkinson’s disease. Horizontal lines delimit the therapeutic window (hatched). Vertical dotted lines indicate the times when dopamine levels fall below the minimal effective concentration. According to (A), the reduction of presynaptic capacity for storing dopamine during the progression of Parkinson’s disease leads to a higher peak of dopamine concentration followed by a more rapid decline for the same dose of L-DOPA (a versus a’). Consequently, peak dose phenomena occur, the duration of effective treatment shortens (left arrow), and the apparent therapeutic window (in terms of the dose of L-DOPA, but not of extracellular dopamine concentrations) narrows. In (B) progressive loss of corticostriatal excitatory input raises the minimal effective dopamine concentration from Emin(a) to Emin(b), thus narrowing the therapeutic range (hatched areas). The duration of effective treatment decreases (left arrow) even when a larger dose of L-DOPA (line b) is given compared to the standard dose (line a).
Chase, 1998, p. 4). Following the peak, less dopamine can be released from the stores so that extracellular dopamine concentrations decline faster, which may be interpreted as the wearing off effect (left arrow). The presynaptic buffer model predicts that the minimal clinically effective concentration of extracellular L-DOPA stays the same, and that more frequent or continuous modes of administration can control motor fluctuations completely. The striatal model proposed here (Fig. 1B) has very similar consequences in early Parkinson’s disease as the buffer model. In advanced Parkinson’s disease, however, it gives rise to different predictions: progressive reduction of corticostriatal input requires increasing levels of dopaminergic stimulation to produce spike trains that effectively activate downstream motor structures. It follows that the minimal clinically effective concentration of L-DOPA increases so that the therapeutic window shrinks (from the entire to the densely hatched area in Fig. 1(B)) with a consequent reduction of the time spent above the therapeutic threshold (left arrow). Attempts to extend the duration of therapeutic benefit by application of a larger dose of L-DOPA (line b) bear the danger of producing peak dose dyskinesias. Small - normally insignificant - variations of extracellular dopamine levels may exceed the limits of the narrowed therapeutic window, switch the functional state of the system between hypokinesia, motility and dyskinesia, and produce seemingly random motor fluctuations. More continuous forms of L-DOPA administration can help temporarily, but fail when the therapeutic window eventually vanishes. Both models regard motor fluctuations as consequences of chronic disease progression rather than adverse effects of long-term L-DOPA therapy. However, behavioural fluctuations become visible only if dopamine concentrations are brought into a range where they affect the functional state of the basal ganglia and have clinical effects. 
Thus, motor fluctuations may appear to be precipitated by dopaminergic treatment. The buffer model qualifies L-DOPA administration essentially as a form of substitution therapy, provided that drug intake-related swings in central dopamine concentrations can be controlled: the less dopamine is buffered and
released by nigrostriatal neurons, the more has to be provided and controlled by other means. The striatal model views L-DOPA administration as a symptomatic palliative treatment since it reverses only part of the specific effects of a lack of dopamine. As corticostriatal transmission decreases, dopamine eventually loses its beneficial effects.
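The two qualitative predictions compared above can be made concrete with a small numerical sketch. It is an illustration only, not a reconstruction of Fig. 1: the concentration time course is assumed to be a difference of two exponentials (fast rise, slower decline), all quantities are in arbitrary units, and every parameter value, as well as the names `dopamine_curve` and `window_summary`, is hypothetical. The buffer-model scenario raises the peak and speeds the decline while leaving the lower threshold unchanged; the striatal-model scenario keeps the curve and raises the lower threshold.

```python
import math


def dopamine_curve(t, amplitude, k_rise, k_decay):
    """Assumed toy time course of extracellular dopamine after one dose."""
    return amplitude * (math.exp(-k_decay * t) - math.exp(-k_rise * t))


def window_summary(amplitude, k_rise, k_decay, e_min, e_max,
                   t_end=10.0, dt=0.001):
    """Return (time inside the therapeutic window, time above its upper limit)."""
    in_window = 0.0
    above = 0.0
    for i in range(int(round(t_end / dt))):
        c = dopamine_curve(i * dt, amplitude, k_rise, k_decay)
        if c > e_max:
            above += dt          # would correspond to peak dose dyskinesia
        elif c >= e_min:
            in_window += dt      # clinically effective without dyskinesia
    return in_window, above


scenarios = {
    "early disease": dict(amplitude=1.5, k_rise=3.0, k_decay=0.5,
                          e_min=0.3, e_max=1.0),
    # Buffer model: less storage, so a higher peak and a faster decline.
    "buffer model, advanced": dict(amplitude=3.5, k_rise=3.0, k_decay=1.0,
                                   e_min=0.3, e_max=1.0),
    # Striatal model: same curve, but a raised minimal effective concentration.
    "striatal model, advanced": dict(amplitude=1.5, k_rise=3.0, k_decay=0.5,
                                     e_min=0.6, e_max=1.0),
    # Striatal model with a larger dose.
    "striatal model, larger dose": dict(amplitude=2.5, k_rise=3.0, k_decay=0.5,
                                        e_min=0.6, e_max=1.0),
}

for name, params in scenarios.items():
    in_window, above = window_summary(**params)
    print(f"{name}: in window {in_window:.2f}, above upper limit {above:.2f}")
```

With these assumed numbers, both advanced-disease scenarios shorten the time spent above the lower threshold, but only the striatal scenario does so by shifting the threshold itself, and the larger dose in that scenario overshoots the upper limit.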
Conclusion
Any model that attempts to explain the variety of motor fluctuations in Parkinson’s disease in terms of pathophysiological mechanisms is bound, at present, to be incomplete and hypothetical. Nevertheless, there is some virtue in formulating such explanatory models: they help to conceptualize complex problems in a simplified way that can serve as a guideline in clinical practice; they scrutinize our understanding of clinical processes and help to specify it the best we presently can; they identify important gaps between basic and clinical research and stimulate thinking about clinical symptoms in terms of pathophysiological processes. The proposed model of striatal processing is certainly incomplete, hypothetical, and vague in many respects. Among its virtues is, however, that it unifies the explanations of drug intake-related symptoms, intake-unrelated fluctuations and disease progression, and attributes them all to a single underlying mechanism. This attempt has its precedent in the presynaptic buffer model proposed by Marsden (1980, p. 251). Compared to the buffer model, the present model provides an alternative postsynaptic explanation of roughly the same phenomena. In addition, it provides far more detail about possible pathophysiological mechanisms when it addresses the acute and chronic effects of dopamine in the striatum. A notable aspect that neither model incorporates is the role of dopamine in long-term potentiation and long-term depression of corticostriatal synapses in Parkinson’s disease (for review see Kotter and Wickens, 1998). Clinical observations point to the involvement of such plastic processes in chronic dopaminergic lesions (Barbato et al., 1997; Verhagen Metman et al., 1998), but the precise conditions for their occurrence have so far escaped experimental elucidation (Wickens et al., 1996). The model of striatal processing suggests, however, that an improvement of corticostriatal transmission, rather than the blockade of presumably upregulated NMDA-type glutamate receptors, may be crucial in order to influence secondary disease processes. There are clear differences between the presynaptic buffer model and the postsynaptic striatal model in their predictions of how the therapeutic window for dopaminergic treatment narrows. These differences may be useful to examine in order to find out which model describes the clinical situation more faithfully. The two explanations of motor fluctuations, however, are not mutually exclusive: altered activity patterns of striatal principal neurons may, for example, co-exist with a reduction in storage capacity for dopamine. Eventually, a useful model will have to be more than unifying and simple: it will have to provide both satisfactory pathophysiological explanations and successful clinical applications for a wide range of symptoms.
Acknowledgement
This paper is dedicated to the memory of the late C. David Marsden and to the rational enquiry into the mechanisms of basal ganglia disorders.
References
Arbuthnott, G.W. and Ingham, C.A. (1993) The thorny problem of what dopamine does in psychiatric disease. In: G.W. Arbuthnott and P.C. Emson (Eds.), Chemical Signalling in the Basal Ganglia, Progress in Brain Research, Vol. 99, Elsevier, Amsterdam, pp. 341-350.
Barbato, L., Stocchi, F., Monge, A., Vacca, L., Ruggieri, S., Nordera, G. and Marsden, C.D. (1997) The long-duration action of levodopa may be due to a postsynaptic effect. Clin. Neuropharmacol., 20: 394-401.
Barbeau, A. (1974) The clinical pharmacology of side effects in long-term L-DOPA therapy. In: F.H. McDowell and A. Barbeau (Eds.), Adv. Neurol., Vol. 5, Raven, New York, pp. 347-365.
Baronti, F., Mouradian, M.M., Davis, T.L., Giuffra, M., Brughitta, G., Conant, K.E. and Chase, T.N. (1992) Continuous lisuride effects on central dopaminergic mechanisms in Parkinson’s disease. Ann. Neurol., 32: 776-781.
Bravi, D., Mouradian, M.M., Roberts, J.W. and Chase, T.N. (1994) Wearing-off fluctuations in Parkinson’s disease: contribution of postsynaptic mechanisms. Ann. Neurol., 36: 27-31.
Brown, R.D., Marsden, C.D., Quinn, N. and Wyke, M.A. (1984) Alterations in cognitive performance and affect-arousal state during fluctuations in motor function in Parkinson’s disease. J. Neurol. Neurosurg. Psychiatry, 47: 454-465.
Caraceni, T., Geminiani, G., Genitrini, S., Giovannini, P., Girotti, F., Oliva, D. and Tamma, F. (1991) Treatment of motor fluctuations in Parkinson’s disease: Controlled release preparations. In: G. Bernardi, M.B. Carpenter, G. Di Chiara, M. Morelli and P. Stanzione (Eds.), The Basal Ganglia III, Advances in Behavioral Biology, Vol. 39, Plenum, New York, pp. 689-695.
Carlsson, A. (1983) Are ‘on-off’ effects during chronic L-dopa treatment due to faulty feedback control of the nigrostriatal pathway? J. Neural Transm., 19(Suppl.): 153-161.
Chase, T.N. (1998) The significance of continuous dopaminergic stimulation in the treatment of Parkinson’s disease. Drugs, 55(Suppl. 1): 1-9.
Chase, T.N., Baronti, F., Fabbrini, G., Heuser, I.J., Juncos, J.L. and Mouradian, M.M. (1989) Rationale for continuous dopaminomimetic therapy of Parkinson’s disease. Neurology, 39(Suppl. 2): 7-10+19.
Chase, T.N., Oh, J.D. and Blanchet, P.J. (1998) Neostriatal mechanisms in Parkinson’s disease. Neurology, 51(Suppl. 2): S30-S35.
Düby, S.E., Cotzias, G.C., Papavasiliou, P.S. and Lawrence, W.H. (1972) Injected apomorphine and orally administered levodopa in Parkinsonism. Arch. Neurol., 27: 474-480.
Ehringer, H. and Hornykiewicz, O. (1960) Verteilung von Noradrenalin und Dopamin (3-Hydroxytyramin) im Gehirn des Menschen und ihr Verhalten bei Erkrankungen des extrapyramidalen Systems. Wien. Klin. Wochenschr., 73: 1236-1239.
Eriksson, T., Granerus, A.K., Linde, A. and Carlsson, A. (1988) ‘On-off’ phenomenon in Parkinson’s disease: Relationship between dopa and other large neutral amino acids in plasma. Neurology, 38: 1245-1248.
Eriksson, T., Magnusson, T., Carlsson, A., Linde, A. and Granerus, A.K. (1984) ‘On-off’ phenomenon in Parkinson’s disease: Correlation to the concentration of dopa in plasma. J. Neural Transm., 59: 229-240.
Fabbrini, G., Juncos, J., Mouradian, M.M., Serrati, C. and Chase, T.N. (1987) Levodopa pharmacokinetic mechanisms and motor fluctuations in Parkinson’s disease. Ann. Neurol., 21: 370-376.
Factor, S.A. and Weiner, W.J. (1993) Early combination therapy with bromocryptine and levodopa in Parkinson’s disease. Mov. Disord., 8: 267-272.
Fahn, S. (1974) ‘On-off’ phenomenon with levodopa therapy in parkinsonism. Neurology, 24: 431-441.
Fahn, S. (1982) Fluctuations of disability in Parkinson’s disease: Pathophysiology. In: C.D. Marsden and S. Fahn (Eds.), Movement Disorders, Butterworth Scientific, London, pp. 123-145.
Gimenez Roldan, S., Tolosa, E., Burguera, J.A., Chacón, J., Liano, H. and Forcadell, F. (1997) Early combination of bromocryptine and levodopa in Parkinson’s disease: a prospective randomised study of two parallel groups over a total follow-up period of 44 months including an initial eight-month double-blind stage. Clin. Neuropharmacol., 20: 67-76.
Guttman, M., Seeman, P., Reynolds, G.P., Riederer, P., Jellinger, K. and Tourtellotte, W.W. (1986) Dopamine D2 receptor density remains constant in treated Parkinson’s disease. Ann. Neurol., 19: 487-492.
Hammerstad, J.P., Woodward, W.R., Gliessman, P., Boucher, B. and Nutt, J.G. (1991) The pharmacokinetics of L-DOPA in plasma and CSF of the monkey. In: G. Bernardi, M.B. Carpenter, G. Di Chiara, M. Morelli and P. Stanzione (Eds.), The Basal Ganglia III, Advances in Behavioral Biology, Vol. 39, Plenum, New York, pp. 683-688.
Hillen, M.E. and Sage, J.I. (1996) Nonmotor fluctuations in patients with Parkinson’s disease. Neurology, 47: 1180-1183.
Huang, K.X., Bergstrom, D.A., Ruskin, D.N. and Walters, J.R. (1998) N-methyl-D-aspartate receptor blockade attenuates D1 dopamine receptor modulation of neuronal activity in rat substantia nigra. Synapse, 30: 18-29.
Ingham, C.A., Hood, S.H., van Maldegem, B., Weenink, A. and Arbuthnott, G.W. (1993) Morphological changes in the rat neostriatum after unilateral 6-hydroxydopamine injections into the nigrostriatal pathway. Exp. Brain Res., 93: 17-27.
Ingham, C.A., Hood, S.H., Taggart, P. and Arbuthnott, G.W. (1998) Plasticity of synapses in the rat neostriatum after unilateral lesion of the nigrostriatal dopaminergic pathway. J. Neurosci., 18: 4732-4743.
Jorg, J. and Schneider, I. (1988) Zur Klinik und Pathogenese des ‘on-off’ Phanomens beim Parkinson-Syndrom. Fortschr. Neurol. Psychiat., 56: 22-34.
Kempster, P.A., Gibb, W.R.G., Lees, A.J. and Stern, G.M. (1989) Asymmetry of substantia nigra loss in Parkinson’s disease and its relevance to the mechanism of motor fluctuation. J. Neurol. Neurosurg. Psychiatry, 52: 72-76.
Kotter, R. (1994) Postsynaptic integration of glutamatergic and dopaminergic signals in the striatum. Prog. Neurobiol., 44: 163-196.
Kotter, R. and Feizelmeier, M. (1998) Species-dependence and relationship between morphological and electrophysiological properties in nigral compacta neurons. Prog. Neurobiol., 54: 619-632.
Kotter, R. and Wickens, J.R. (1995) Interactions of glutamate and dopamine in a computational model of the striatum. J. Comput. Neurosci., 2: 195-214.
Kotter, R. and Wickens, J.R. (1998) Striatal mechanisms in Parkinson’s disease: new insights from computer modeling. Artif. Intell. Med., 13: 37-55.
Leenders, K.L., Palmer, A.J., Quinn, N., Clark, J.C., Firnau, G., Garnett, E.S., Nahmias, C., Jones, T. and Marsden, C.D. (1986) Brain dopamine metabolism in patients with Parkinson’s disease measured with positron emission tomography. J. Neurol. Neurosurg. Psychiatry, 49: 853-860.
Lees, A.J. (1989) The on-off phenomenon. J. Neurol. Neurosurg. Psychiat., Special Suppl. 1989: 29-37.
Marin, C., Papa, S., Engber, T.M., Bonastre, M., Tolosa, E. and Chase, T.N. (1996) MK-801 prevents levodopa-induced motor response alterations in parkinsonian rats. Brain Res., 736: 202-205.
Marsden, C.D. (1980) ‘On-off’ phenomena in Parkinson’s disease. In: U.K. Rinne, M. Klinger and G. Stamm (Eds.), Parkinson’s Disease - Current Progress, Problems and Management, Elsevier, Amsterdam, pp. 241-254.
Marsden, C.D., Parkes, J.D. and Quinn, N. (1982) Fluctuations of disability in Parkinson’s disease - clinical aspects. In: C.D. Marsden and S. Fahn (Eds.), Movement Disorders, Butterworth Scientific, London, pp. 96-122.
McNeill, T.H., Brown, S.A., Rafols, J.A. and Shoulson, I. (1988) Atrophy of medium spiny I striatal dendrites in advanced Parkinson’s disease. Brain Res., 455: 148-152.
Melamed, E., Hefti, F. and Wurtman, R.J. (1980a) Nonaminergic striatal neurons convert exogenous L-dopa to dopamine in parkinsonism. Ann. Neurol., 8: 558-563.
Melamed, E., Hefti, F. and Wurtman, R.J. (1980b) L-3,4-Dihydroxyphenylalanine and L-5-hydroxytryptophan decarboxylase activities in rat striatum: effect of selective destruction of dopaminergic or serotoninergic input. J. Neurochem., 34: 1753-1756.
Mena, I. and Cotzias, G.C. (1975) Protein intake and treatment of Parkinson’s disease with levodopa. N. Engl. J. Med., 292: 181-184.
Mouradian, M.M., Heuser, I.J., Baronti, F. and Chase, T.N. (1990) Modification of central dopaminergic mechanisms by continuous levodopa therapy for advanced Parkinson’s disease. Ann. Neurol., 27: 18-23.
Muenter, M.D. and Tyce, G.M. (1971) L-DOPA therapy of Parkinson’s disease: Plasma L-DOPA concentration, therapeutic response, and side effects. Mayo Clinic Proc., 46: 231-239.
Nutt, J.G. (1992) Pharmacokinetics and pharmacodynamics of levodopa. In: W.C. Koller (Ed.), Handbook of Parkinson’s Disease, Marcel Dekker, New York, pp. 411-432.
Pakkenberg, B., Moller, A., Gundersen, H.J., Mouritzen Dam, A. and Pakkenberg, H. (1991) The absolute number of nerve cells in the substantia nigra in normal subjects and in patients with Parkinson’s disease estimated with an unbiased stereological method. J. Neurol. Neurosurg. Psychiatry, 54: 30-33.
Papa, S.M., Boldry, R.C., Engber, T.M., Kask, A.M. and Chase, T.N. (1995) Reversal of levodopa-induced motor fluctuations in experimental parkinsonism by NMDA receptor blockade. Brain Res., 701: 13-18.
Pincus, J.H. and Barry, K. (1987) Influence of dietary protein on motor fluctuations in Parkinson’s disease. Arch. Neurol., 44: 270-272.
Quinn, N.P. (1998) Classification of fluctuations in patients with Parkinson’s disease. Neurology, 51(Suppl. 2): S25-S29.
Riley, D.E. and Lang, A.E. (1993) The spectrum of levodopa-related fluctuations in Parkinson’s disease. Neurology, 43: 1459-1464.
Rutherford, A., Garcia-Munoz, M. and Arbuthnott, G.W. (1988) An afterhyperpolarization recorded in striatal cells ‘in vitro’: Effect of dopamine administration. Exp. Brain Res., 71: 399-405.
Sandyk, R. (1989) Hypothalamic-locus coeruleus mechanisms in the pathophysiology of ‘on-off’ in L-DOPA treated Parkinson’s disease: A hypothesis. Int. J. Neurosci., 47: 303-308.
Schwab, R.S., England, A.C. and Peterson, E. (1959) Akinesia in Parkinson’s disease. Neurology, 9: 65-72.
Stern, E.A., Jaeger, D. and Wilson, C.J. (1998) Membrane potential synchrony of simultaneously recorded striatal spiny neurons in vivo. Nature, 394: 475-478.
Sweet, R.D. and McDowell, F.H. (1974) Plasma dopa concentrations and the ‘on-off’ effect after chronic treatment of Parkinson’s disease. Neurology, 24: 953-956.
Tolosa, E., Martí, M.J., Valldeoriola, F. and Molinuevo, J.L. (1998) History of levodopa and dopaminergic agonists in Parkinson’s disease treatment. Neurology, 50(Suppl. 6): S2-S10.
Trugman, J.M., James, C.L. and Wooten, G.F. (1991) D1/D2 dopamine receptor stimulation by L-DOPA. Brain, 114: 1429-1440.
Van Laar, T., Jansen, E.N., Essink, A.W. and Neef, C. (1992) Intranasal apomorphine in parkinsonian on-off fluctuations. Arch. Neurol., 49: 482-484.
Verhagen Metman, L., Blanchet, P.J., van den Munckhof, P., Del Dotto, P., Natte, R. and Chase, T.N. (1998) A trial of dextromethorphan in parkinsonian patients with motor response complications. Mov. Disord., 13: 414-417.
Verhagen Metman, L., Locatelli, E.R., Bravi, D., Mouradian, M.M. and Chase, T.N. (1997) Apomorphine responses in Parkinson’s disease and the pathogenesis of motor complications. Neurology, 48: 369-372.
Wickens, J.R., Begg, A.J. and Arbuthnott, G.W. (1996) Dopamine reverses the depression of rat corticostriatal synapses which normally follows high-frequency stimulation of cortex in vitro. Neuroscience, 70: 1-5.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 17
Thalamic and thalamocortical mechanisms underlying 3 Hz spike-and-wave discharges
Alain Destexhe1,*, David A. McCormick2 and Terrence J. Sejnowski3
1Neurophysiology Laboratory, Department of Physiology, Laval University, Quebec G1K 7P4, Canada
2Section of Neurobiology, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
3The Howard Hughes Medical Institute and The Salk Institute, Computational Neurobiology Laboratory, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA; Department of Biology, University of California San Diego, La Jolla, CA 92093, USA
Paroxysmal 3 Hz oscillations occur in thalamic and thalamocortical circuits during petit mal or absence epilepsy. Experiments in thalamic slices have revealed first, that thalamic circuits can generate oscillations at 3 Hz following the application of antagonists of GABAA receptors and second, that the genesis of these oscillations is critically dependent on GABAB-mediated inhibition in thalamic relay neurons. The ionic mechanisms that are responsible for paroxysmal oscillations in thalamic slices can be explored in detailed models based on the known biophysical properties of thalamic neurons and the various types of synaptic currents that mediate interactions between them. These models, which provide a basic explanation for the frequency of the oscillations and the conditions that promote them, have been extended to include thalamocortical interactions. The recurrent excitation in the cortex and the corticothalamic feedback, together with GABAB-mediated inhibition in the thalamus, can account for many features of spike-and-wave oscillations, even when the thalamus is intact. These models generate detailed predictions that can be experimentally tested.
*Corresponding author. Tel: +1 (418) 656 5711; fax: +1 (418) 656 7898; email: [email protected]
Introduction
Spike-and-wave epileptic seizures are characterized in humans by 3 Hz oscillations in the electroencephalogram (EEG) (Fig. 1). These epileptic oscillations have a sudden onset, and the seizures invade the entire cerebral cortex simultaneously. Spike-and-wave patterns of similar characteristics are also seen in a number of experimental models in cats, rats, mice and monkeys. The fact that EEG activity suddenly switches to spike-and-wave patterns (Fig. 1) suggests that it is generated in a central structure projecting widely to the cerebral cortex. The possible involvement of the thalamus in spike-and-wave seizures was initially suggested by Jasper and Kershman (1941) and is now supported by several findings. First, simultaneous thalamic and cortical recordings in humans during absence attacks demonstrated a clear thalamic participation during the seizures (Williams, 1953). The same study also showed that the oscillations usually started before signs of seizure appeared in the EEG. Second, a thalamic participation in human absence seizures was also shown by positron emission tomography (PET) (Prevett et al., 1995). Third, electrophysiological recordings in experimental models of spike-and-wave seizures show that cortical and thalamic cells fire prolonged discharges in phase with the 'spike' component, while the 'wave' is characterized by a silence in all
[Fig. 1 graphic, panels A and B. EEG channels: FP1, FP2, O1, O2. Scale bars: 1 s and 0.3 s.]
Fig. 1. Electroencephalogram (EEG) recording during a human absence seizure. A. The absence seizure lasted approximately five seconds and consisted of an oscillation at around 3 Hz which appeared nearly simultaneously in all EEG leads. B. At higher temporal resolution, it is apparent that each cycle of the oscillation has interleaved spikes and waves. Channels FP1 and FP2 measured the potential differences between frontal and parietal regions of the scalp whereas channels O1 and O2 correspond to the measures between occipital regions. Modified from Destexhe, 1992.
cell types (Pollen, 1964; Steriade, 1974; Avoli et al., 1983; McLachlan et al., 1984; Buzsaki et al., 1988; Inoue et al., 1993; Seidenbecher et al., 1998). Electrophysiological recordings also indicate that spindle oscillations, which are generated by thalamic circuits (Steriade et al., 1990, 1993), can gradually be transformed into spike-and-wave discharges, and all manipulations that promote or antagonize spindles have the same effect on spike-and-wave seizures (Kostopoulos et al., 1981a, 1981b; McLachlan et al., 1984). Finally, the spike-and-wave patterns disappear following thalamic lesions or by inactivating the thalamus (Pellegrini et al., 1979; Avoli and Gloor, 1981; Vergnes and Marescaux, 1992). Although these results do suggest a thalamic origin for spike-and-wave seizures, there is also strong evidence that the cortex has a decisive role: thalamic injections of high doses of GABAA antagonists, such as penicillin (Ralston and Ajmone-Marsan, 1956; Gloor et al., 1977) or bicuculline (Steriade and Contreras, 1998), led to 3-4 Hz oscillations with no sign of spike-and-wave discharge. On the other hand, injection of the same drugs to the cortex, with no change in the thalamus,
resulted in seizure activity with spike-and-wave patterns (Gloor et al., 1977; Steriade and Contreras, 1998). The threshold for epileptogenesis was extremely low in the cortex compared to the thalamus (Steriade and Contreras, 1998). Finally, it was shown that a diffuse application of a dilute solution of penicillin to the cortex resulted in spike-and-wave seizures although the thalamus was intact (Gloor et al., 1977). A series of pharmacological results suggest that γ-aminobutyric acid type B (GABAB) receptors play a critical role in the genesis of spike-and-wave discharges. In rats, GABAB agonists exacerbate seizures, while GABAB antagonists suppress them (Hosford et al., 1992; Snead, 1992; Puigcerver et al., 1996; Smith and Fisher, 1996). More specifically, antagonizing thalamic GABAB receptors leads to the suppression of spike-and-wave discharges (Liu et al., 1992), which is another indication for a critical role of the thalamus. There are inhibitory connections between neurons in the reticular nucleus of the thalamus (RE) and thalamocortical (TC) cells. The critical role for thalamic GABAB receptors on TC cells was established by investigating the action of clonazepam, an anti-absence drug, in slices. Clonazepam diminishes GABAB-mediated inhibitory postsynaptic potentials (IPSPs) in TC cells, reducing their tendency to burst in synchrony (Huguenard and Prince, 1994a; Gibbs et al., 1996). The action of clonazepam appears to reinforce GABAA receptors in the RE nucleus (Huguenard and Prince, 1994a; Hosford et al., 1997). Indeed, there is a diminished frequency of seizures following reinforcement of GABAA receptors in the RE nucleus (Liu et al., 1991). Perhaps the strongest evidence for the involvement of the thalamus was that in ferret thalamic
slices, spindle oscillations can be transformed into slower and more synchronized oscillations at 3 Hz following blockade of GABAA receptors (Fig. 2; von Krosigk et al., 1993). This behavior is similar to the transformation of spindles to spike-and-wave discharges in cats following the systemic administration of penicillin, which acts as a weak GABAA receptor antagonist (Kostopoulos et al., 1981a, 1981b). Moreover, like spike-and-wave seizures in rats, the 3 Hz paroxysmal oscillations in thalamic slices are suppressed by GABAB receptor antagonists (Fig. 2; von Krosigk et al., 1993).
[Fig. 2 graphic, panels A-H. Column titles: Spontaneous, Evoked. Condition label: Bicuculline and saclofen. Scale labels: -70 mV, 2 s.]
Fig. 2. Bicuculline-induced 3 Hz oscillation in thalamic slices. A. Control spindle sequence (~10 Hz) started spontaneously by an IPSP (arrow). B. Slow oscillation (~3 Hz) following block of GABAA receptors by bicuculline. C. Suppression of the slow oscillation in the presence of the GABAB antagonist saclofen. D. Recovery after wash. E-H show the same sequence as A-D, but oscillations were triggered by stimulation of the internal capsule. Modified from von Krosigk et al. (1993).
Taken together, these experiments suggest that both cortical and thalamic neurons are necessary to generate spike-and-wave rhythms, and that both GABAA and GABAB receptors seem actively involved. However, the exact mechanisms are still unclear (Gloor and Fariello, 1988). In this paper, we review models for thalamic 3 Hz paroxysmal oscillations and for thalamocortical 3 Hz oscillations with spike-and-wave field potentials.
Modeling the genesis of paroxysmal discharges in the thalamus
When the in vitro model of spindle waves was discovered (von Krosigk et al., 1993), it was also demonstrated that spindles can be transformed into ~3 Hz oscillations by blocking GABAA receptors (Fig. 2). It was further shown that this oscillation is sensitive to blockade of GABAB receptors by saclofen (Fig. 2) and is also suppressed by AMPA receptor antagonists (von Krosigk et al., 1993). These in vitro experiments thus suggested that 3 Hz paroxysmal thalamic oscillations are mediated by GABAB IPSPs (RE→TC) and AMPA EPSPs (TC→RE). This possibility was investigated with computational models using a simple TC-RE circuit consisting of a single TC cell reciprocally connected to a single RE cell (scheme in Fig. 3; Destexhe et al., 1993b). The intrinsic firing behavior of the model TC cell was determined by I_T and I_h; these currents were modeled using Hodgkin-Huxley (1952) type models based on voltage-clamp data in TC cells. Calcium regulation of I_h accounted for the waxing-and-waning of oscillations, as described previously (Destexhe et al., 1993a). The intrinsic firing properties of the RE cell were determined by I_T, I_K[Ca] and I_CAN using Hodgkin-Huxley (1952) type kinetics and calcium-activated schemes as described previously (Destexhe et al., 1994a). The two cell types also included the fast I_Na and I_K currents necessary to generate action potentials, with kinetics taken from Traub and Miles (1991). Synaptic interactions were mediated by glutamatergic and GABAergic receptors using kinetic models of postsynaptic receptors (Destexhe et al., 1994b, 1998b). The two-neuron circuit displayed waxing-and-waning spindle oscillations at a frequency of 8-10 Hz (Fig. 3A; Destexhe et al., 1993b). The circuit also displayed a transformation to 3 Hz oscillations when the kinetics of the GABAergic current were slow (Fig. 3; Destexhe et al., 1993b). The decay of inhibition greatly affected the frequency of the spindle oscillations, with slow decay corresponding to low frequencies. When the decay
Fig. 3. Transition from 8-10 Hz spindle oscillations to 3 Hz oscillations by slowing down the kinetics of GABAergic currents. A. 8-10 Hz spindle oscillations from a simple circuit consisting of one TC cell interconnected with one RE cell. The left panel shows a detail of a few cycles within the oscillation at 10 times higher resolution. Glutamatergic AMPA receptors were used from TC-RE and GABAergic GABA, receptors from RE-TC (decay rate constant p=O.l ms-I). B. Slower oscillations for slow GABAergic synapses. The decay rate constant of the GABAergic synapse was p = 0.003 ms I, similar to the decay rate of GABAs currents. Modified from Destexhe et al., 1993b. ~
293
was adjusted to match experimental recordings of GABA,-mediated currents (obtained from Otis et al., 1993), the circuit oscillated at around 3 Hz (Fig. 3B; Destexhe et al., 1993b). Several mechanisms have been proposed to account for the effects of blocking of GABA, receptors in thalamic circuits (Wallenstein, 1994; Wang et al., 1995; Destexhe et al., 1996a; Golomb et al., 1996). The model of Wallenstein (1994) tested the proposition that disinhibition of interneurons projecting to TC cells with GABA, receptors may result in stronger discharges when GABA, receptors are antagonized (Soltesz and Crunelli, 1992). A model including TC, RE and interneurons (Wallenstein, 1994) reproduced the stronger discharges in TC cells following application of bicuculline. Although it is possible that this mechanism plays a role in thalamically-generated epileptic discharges, it does not account for experiments showing the decisive influence of the RE nucleus in preparations devoid of interneurons (Huguenard and Prince, 1994a, 1994b). Increased synchrony and stronger discharges were also reported in the model of Wang et al. (1995), but the synchronous state coexisted with a desynchronized state of the network, which has never been observed experimentally. The cooperative activation proposed for GABA, receptors (Destexhe and Sejnowsh, 1995) produced robust synchronized oscillations and traveling waves at the network level (Golomb et al., 1996; Destexhe et al., 1996a), similar to those observed in thalamic slices (Kim et al., 1995). This property also led to the transformation of spindles to 3 Hz paroxysmal oscillations following block of GABA, receptors (Destexhe et al., 1996a). These modeling studies reached the conclusion that the transition from spindle to paroxysmal patterns can be achieved provided there was cooperativity in GABA, responses. This is analyzed in more detail below.
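As a rough illustration of why the decay of inhibition matters so much for the oscillation frequency, the sketch below converts the two decay rate constants quoted in the Fig. 3 caption (0.1 and 0.003 per ms) into time constants. It assumes the synaptic conductance decays as a single exponential, which is a simplification of the kinetic receptor models cited above; the function names are hypothetical.

```python
import math


def decay_time_constant_ms(rate_per_ms: float) -> float:
    """Time constant (ms) of a conductance decaying as exp(-rate * t)."""
    return 1.0 / rate_per_ms


def remaining_fraction(rate_per_ms: float, t_ms: float) -> float:
    """Fraction of the peak conductance left t_ms after the peak."""
    return math.exp(-rate_per_ms * t_ms)


if __name__ == "__main__":
    # Decay rate constants quoted in the Fig. 3 caption (per ms).
    for label, rate in (("fast (GABAA-like)", 0.1), ("slow (GABAB-like)", 0.003)):
        tau = decay_time_constant_ms(rate)
        left = remaining_fraction(rate, 100.0)
        print(f"{label}: tau = {tau:.1f} ms, fraction left after 100 ms = {left:.4f}")
```

Under this simplification the fast synapse has essentially vanished within one 100 ms spindle cycle, whereas the slow synapse has a time constant of about 333 ms, which is of the same order as the period of a 3 Hz rhythm.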
Models of the activation properties of GABAB responses
Paroxysmal 3 Hz discharges in the thalamus depend critically on GABAB responses. The underlying mechanisms have been explored with biophysical models (Destexhe and Sejnowski, 1995; Destexhe et al., 1996a). In these models, GABAB responses depended on the presynaptic pattern of activity and, in particular, GABAB inhibitory postsynaptic potentials (IPSPs) only occurred following long presynaptic bursts of spikes. This accounted for the different patterns of GABAB responses observed in the hippocampus (Dutar and Nicoll, 1988; Davies et al., 1990) and the thalamus (Huguenard and Prince, 1994b; Kim et al., 1997). This property is also important at the network level for the genesis of paroxysmal discharges in thalamic slices.

The biophysical model of GABAB responses included the release, diffusion and uptake of GABA, its binding to postsynaptic receptors and the activation of K+ channels by G-proteins (Destexhe and Sejnowski, 1995). The model tested the possibility that postsynaptic mechanisms could explain the non-linear stimulus dependence observed for GABAB responses. A model incorporating extracellular diffusion of GABA was necessary to account for features of GABAB responses in the hippocampus, where GABA spillover may be significant due to the high density of GABAergic terminals. In contrast, no spillover was necessary to explain thalamic GABAergic responses, which is consistent with the sparse aggregates of inhibitory terminals on TC cell dendrites (Liu et al., 1995b). Simulating the properties of GABAB responses in the thalamus therefore required a source of non-linearity located in the postsynaptic response rather than GABA spillover (Destexhe and Sejnowski, 1995). We hypothesized that this non-linearity arose from the transduction mechanisms underlying the activation of K+ channels by G-proteins. The assumption that four G-proteins must bind to K+ channels to open them provided the non-linearity required to account for GABAB responses (Destexhe and Sejnowski, 1995); this is consistent with the tetrameric structure of K+ channels (Hille, 1992).

The properties of GABAergic responses in thalamic slices were simulated using models of RE cells based on the presence of a low-threshold calcium current and lateral GABAA-mediated synaptic interactions within the RE nucleus (Fig. 4A; Destexhe and Sejnowski, 1995). Under normal conditions, stimulation in the RE nucleus evoked biphasic IPSPs in TC cells, with a rather small GABAB component (Fig. 4B). We mimicked an increase of intensity by increasing the number of RE cells discharging. The ratio between GABAA and GABAB IPSPs was independent of the intensity of stimulation in the model (Destexhe and Sejnowski, 1995), as observed experimentally (Huguenard and Prince, 1994b). However, this ratio could be changed by blocking GABAA receptors locally in the RE nucleus, leading to enhanced burst discharge in RE cells and a more prominent GABAB component in TC cells (Fig. 4C). This is consistent with the effect of clonazepam in reinforcing the GABAA IPSPs in the RE nucleus, resulting in diminished GABAB IPSPs in TC cells (Huguenard and Prince, 1994a).

These simulations suggest that, because of the characteristic properties of GABAB receptors, the output of the RE nucleus onto TC cells is determined by the presence of GABAA interactions between RE cells. The presence of these GABAA synapses restricts the bursts of RE cells to a few spikes and leads to IPSPs dominated by GABAA in TC cells. However, when this lateral inhibition is suppressed, RE cells produce prolonged bursts and evoke IPSPs dominated by GABAB in TC cells. Such a relationship between GABAB receptor activation and presynaptic discharge has been observed experimentally in dual intracellular recordings (Kim et al., 1997; Thomson and Destexhe, 1999). The consequences of this mechanism for generating ~3 Hz oscillations in thalamic circuits are analyzed below.
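The cooperative activation described above can be illustrated with a minimal sketch: transmitter pulses drive a bound-receptor fraction r, which produces a G-protein level s, and the K+ conductance is activated as s^4 / (s^4 + Kd), reflecting the assumption that four G-proteins must bind to open a channel. This is a simplified two-variable scheme written for illustration only; the rate constants, pulse shape, spike counts and function name are assumptions and are not taken from the cited model.

```python
def peak_gabab_activation(n_spikes, rate_hz=360.0, t_end_ms=600.0, dt_ms=0.01):
    """Peak fractional K+ channel activation after a presynaptic burst.

    Forward-Euler integration of a two-variable scheme:
      dr/dt = k1 * [T] * (1 - r) - k2 * r   (bound receptor fraction)
      ds/dt = k3 * r - k4 * s               (activated G-protein level)
      activation = s**4 / (s**4 + kd)       (four binding sites)
    All parameter values below are illustrative assumptions.
    """
    k1, k2 = 0.5, 0.0012           # binding (per mM per ms), unbinding (per ms)
    k3, k4 = 0.1, 0.03             # G-protein production and decay (per ms)
    kd, n_sites = 100.0, 4         # dissociation constant, binding sites
    t_max_mm, pulse_ms = 0.5, 0.3  # transmitter pulse amplitude and duration
    isi_ms = 1000.0 / rate_hz      # interval between presynaptic spikes

    r = s = peak = 0.0
    for step in range(int(t_end_ms / dt_ms)):
        t = step * dt_ms
        k = int(t // isi_ms)  # index of the most recent spike slot
        pulse_on = k < n_spikes and (t - k * isi_ms) < pulse_ms
        transmitter = t_max_mm if pulse_on else 0.0
        dr = k1 * transmitter * (1.0 - r) - k2 * r
        ds = k3 * r - k4 * s
        r += dt_ms * dr
        s += dt_ms * ds
        peak = max(peak, s**n_sites / (s**n_sites + kd))
    return peak


if __name__ == "__main__":
    short_burst = peak_gabab_activation(3)    # brief burst
    long_burst = peak_gabab_activation(10)    # prolonged burst
    print(f"3 spikes:  {short_burst:.5f}")
    print(f"10 spikes: {long_burst:.5f}")
    print(f"ratio:     {long_burst / short_burst:.1f}")
```

With these illustrative values, roughly tripling the number of presynaptic spikes increases the peak activation far more than threefold, which is the kind of supralinear dependence on burst length that the text attributes to G-protein cooperativity.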
Genesis of ~3 Hz oscillations in thalamic circuits
Fig. 4. Simulation of the effect of lateral inhibition in the thalamic reticular nucleus. GABAB responses were enhanced in thalamocortical cells through disinhibition in the thalamic reticular nucleus. A. Connectivity: a simple network of RE cells was simulated with GABAA receptor-mediated synaptic interactions. All RE cells project to a single TC cell with synapses containing both GABAA and GABAB receptors. Models of the RE cells were taken from Destexhe et al. (1994a). B. In control conditions, the bursts generated in RE cells by stimulation have 2-8 spikes (inset) and evoke in TC cells a GABAA-dominated IPSP with a small GABAB component. C. When GABAA receptors are suppressed in RE, the bursts become much larger (inset) and evoke in TC cells a stronger GABAB component. Modified from Destexhe and Sejnowski, 1995.
To explain the genesis of 3 Hz paroxysmal oscillations in thalamic circuits, we first investigated the effect of GABAA vs. GABAB stimulation of TC cells. The thalamic circuit model was identical to that in a previous study (Destexhe et al., 1996a). TC cells had IT, Ih, INa and IK currents, and RE cells had IT, INa and IK currents, which were modeled using Hodgkin-Huxley kinetics based on voltage-clamp data. Calcium-dependent upregulation of Ih was included in TC cells to account for the waxing-and-waning of spindle oscillations. Synaptic interactions were modeled by AMPA receptors (TC→RE) and a mixture of GABAA and GABAB receptors (RE→TC), with GABAB modeled as described above. Details of the model can be found in Destexhe et al. (1996a). Mimicking the output of the thalamic reticular network in Fig. 4B-C, a model TC cell was stimulated with presynaptic bursts of action potentials acting on GABAA and GABAB receptors (Fig. 5). For brief bursts (3 spikes at 360 Hz), mimicking the output of the RE nucleus in control conditions (Fig. 4B), the TC cell produced subharmonic bursting similar to spindle oscillations (Fig. 5A).
The suppression of GABAA conductances in model RE cells produced prolonged discharges, as described above. When such prolonged discharges were used as the presynaptic signal (7 spikes at 360 Hz), mimicking the output of the disinhibited RE nucleus (Fig. 4C), strong GABAB IPSPs were activated and the TC cell could follow a stimulation at 3.3 Hz (Fig. 5B). The TC-cell bursts were larger due to the more complete deinactivation of IT provided by GABAB IPSPs.
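The presynaptic stimuli used in these simulations are simple burst trains, and a small sketch makes their timing concrete. The helper below is hypothetical (it is not part of the cited model); it generates spike times for bursts of a given number of spikes at a given intraburst rate, repeated with a fixed period. The number of bursts in each example is an arbitrary choice.

```python
def burst_train(n_bursts, spikes_per_burst, intraburst_hz, period_ms):
    """Return presynaptic spike times (ms) for a periodic train of bursts."""
    isi_ms = 1000.0 / intraburst_hz  # interval between spikes within a burst
    return [
        burst * period_ms + spike * isi_ms
        for burst in range(n_bursts)
        for spike in range(spikes_per_burst)
    ]


if __name__ == "__main__":
    # Brief bursts: 3 spikes at 360 Hz, repeated every 100 ms (10 Hz).
    control = burst_train(n_bursts=10, spikes_per_burst=3,
                          intraburst_hz=360.0, period_ms=100.0)
    # Prolonged bursts: 7 spikes at 360 Hz, repeated every 300 ms (3.3 Hz).
    prolonged = burst_train(n_bursts=7, spikes_per_burst=7,
                            intraburst_hz=360.0, period_ms=300.0)
    print("first control burst (ms):  ", [round(t, 2) for t in control[:3]])
    print("first prolonged burst (ms):", [round(t, 2) for t in prolonged[:7]])
```

At 360 Hz the spikes within a burst are about 2.8 ms apart, so even the prolonged bursts last well under 20 ms; it is the number of spikes per burst, not the burst duration relative to the cycle, that differs between the two conditions.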
[Fig. 5, panel titles: A, 10 Hz stimulation; B, 3.3 Hz stimulation.]
The properties analyzed above (Figs. 4-5) can explain the experimental observation that blockade of GABAA receptors by application of bicuculline transforms the spindle behavior into a slower (3-4 Hz), highly synchronous oscillation that is dependent on GABAB receptors (von Krosigk et al., 1993; Bal et al., 1995; Kim et al., 1995). These properties were integrated in models of thalamic circuits (Fig. 6A; Destexhe et al., 1996a). In control conditions (Fig. 6B), the circuit generated spindle oscillations. Suppression of GABAA receptors led to slower oscillations (Fig. 6C). These oscillations were a consequence of the properties of GABAB responses as described in Fig. 4. Following removal of GABAA-mediated inhibition, the RE cells could produce prolonged bursts that evoked strong GABAB currents in TC cells. These prolonged IPSPs evoked robust rebound bursts in TC cells (as in Fig. 5B), and TC bursts in turn elicited bursting in RE cells through EPSPs. These mutual TC-RE interactions recruited the system into a 3-4 Hz oscillation, with characteristics similar to those of bicuculline-induced paroxysmal oscillations in ferret thalamic slices. The mechanisms responsible for these oscillations were similar to those that give rise to normal spindle oscillations, but the shift in the balance of inhibition led to oscillations that were slower and more synchronized (see details in Destexhe et al., 1996a).
Model of spike-and-wave oscillations in the thalamocortical system
Fig. 5. Simulated responses of thalamocortical cells to 10 Hz or 3 Hz stimulation on GABAA and GABAB receptors. A. 10 Hz stimulation with trains of 3 pulses at 360 Hz, occurring every 100 ms. The GABAA conductance is represented (top trace) with the membrane potential (bottom). B. 3.3 Hz stimulation of GABAB receptors alone (the GABAB conductance is drawn on top). In this case, seven successive bursts were simulated with an interburst period of 300 ms; each burst in the stimulus consisted of a train of 18 pulses at 360 Hz. In contrast to the stimuli in A, which evoked a weak GABAB component in the IPSP (see Fig. 4), the stimulus used in B evoked strong GABAB-mediated currents and the TC cell was recruited into secure rebound burst responses. These TC bursts were larger due to the more complete deinactivation of IT provided by GABAB IPSPs. Modified from Destexhe et al., 1996a.
Experiments reviewed in the Introduction show that the thalamus is essential to generate 3 Hz spike-and-wave seizures, and indeed thalamic slices display paroxysmal oscillations at 3 Hz following application of GABAA antagonists, as analyzed in detail above. However, evidence from a number of experimental studies indicates that this thalamic 3 Hz oscillation is a phenomenon distinct from spike-and-wave seizures. Injections of GABAA antagonists in the thalamus with intact cortex failed to generate spike-and-wave seizures (Ralston and Ajmone-Marsan, 1956; Gloor et al., 1977; Steriade and Contreras, 1998). In these in vivo experiments, suppressing thalamic GABAA receptors led to ‘slow spindles’ around 4 Hz, quite different from spike-and-wave oscillations. On the other hand, spike-and-wave discharges were obtained experimentally by diffuse application of GABAA antagonists to the cortex (Gloor et al., 1977). Therefore, in vivo experiments indicate that spindles transform into spike-and-wave discharges by altering cortical inhibition without changes in the thalamus. We therefore investigated a thalamocortical model to explore possible mechanisms to explain these observations and to relate them to the 3 Hz thalamic oscillation (Destexhe, 1998).

Fig. 6. Oscillations in a four-neuron circuit of thalamocortical and thalamic reticular cells. A. Left: circuit diagram consisting of two TC and two RE cells. Synaptic currents were mediated by AMPA/kainate receptors (from TC to RE; gAMPA = 0.2 µS), a mixture of GABAA and GABAB receptors (from RE to TC; gGABAA = 0.02 µS and gGABAB = 0.04 µS) and GABAA-mediated lateral inhibition between RE cells (gGABAA = 0.2 µS). Right: inset showing the simulated burst responses of TC and RE cells following current injection (pulse of 0.3 nA during 10 ms for RE and -0.1 nA during 200 ms for TC). B. Spindle oscillations arose as the first TC cell (TC1) started to oscillate, recruiting the two RE cells, which in turn recruited the second TC cell. The oscillation was maintained for a few cycles and repeated with silent periods of 15-25 s. C. Slow 3-4 Hz oscillation obtained when GABAA receptors were suppressed, mimicking the effect of bicuculline. The first TC cell (TC1) started to oscillate, recruiting the two RE cells, which in turn recruited the second TC cell. The mechanism of recruitment between cells was identical to spindle oscillations, but the oscillations were more synchronized, of slower frequency, and had a 15% longer silent period. The burst discharges were prolonged due to the loss of lateral inhibition in the RE. Modified from Destexhe et al., 1996a.

Intact thalamic circuits can be forced into ~3 Hz oscillations due to GABAB receptors
The first question we address is how the behavior of thalamic circuits is controlled by the cortex. Thalamic networks have a propensity to generate oscillations on their own, such as the 7-14 Hz spindle oscillations (Steriade et al., 1993; von Krosigk et al., 1993). Although these oscillations are generated in the thalamus, the neocortex can trigger them (Steriade et al., 1972; Roy et al., 1984; Contreras and Steriade, 1996) and corticothalamic feedback exerts a decisive control over thalamic oscillations (Contreras et al., 1996). In computational models, this cortical control required more powerful corticothalamic EPSPs on RE cells compared to TC cells (Destexhe et al., 1998a). In these conditions, excitation of corticothalamic cells led to mixed EPSPs and IPSPs in TC cells, in which the IPSP was dominant, consistent with experimental observations (Burke and Sefton, 1966; Deschenes and Hu, 1990). If cortical EPSPs and IPSPs from RE cells were of comparable conductance, cortical feedback could not evoke oscillations in the thalamic circuit due to shunting effects between EPSPs and IPSPs (Destexhe et al., 1998a). The most likely reason for the experimental and modeling evidence for ‘inhibitory dominance’ in TC cells is that RE cells are extremely sensitive to cortical EPSPs (Contreras et al., 1993), probably due to a powerful T-current in their dendrites (Destexhe et al., 1996b). In addition, cortical synapses contact only the distal dendrites of TC cells (Liu et al., 1995a) and are probably attenuated for this reason. Taken together, these data suggest that corticothalamic feedback operates mainly by eliciting bursts in RE cells, which in turn evoke powerful IPSPs on TC cells that largely overwhelm the direct cortical EPSPs.

The effects of corticothalamic feedback on the thalamic circuit were investigated with the thalamic model (Fig. 7; Destexhe, 1998). Simulated cortical EPSPs evoked bursts in RE cells (Fig. 7B, arrow), which recruited TC cells through IPSPs and triggered a ~10 Hz oscillation in the circuit. During the oscillation, TC cells rebounded once every 2 cycles following GABAA-mediated IPSPs and RE cells only discharged a few spikes, evoking GABAA-mediated IPSPs in TC cells with no significant GABAB currents (Fig. 7B). These features are typical of spindle oscillations (Steriade et al., 1993; von Krosigk et al., 1993). However, a different type of oscillatory behavior could be elicited from the circuit by repetitive stimulation at 3 Hz with high intensity (14 spikes every 333 ms; Fig. 7C). All cell types were entrained to discharge in synchrony at 3 Hz. On the other hand, repetitive stimulation at 3 Hz at low intensity produced spindle oscillations (Fig. 7D) similar to Fig. 7B. High-intensity stimulation at 10 Hz led to quiescence in TC cells (Fig. 7E), due to sustained GABAB currents, similar to a previous analysis (see Fig. 12 in Lytton et al., 1997).

These simulations indicate that strong corticothalamic feedback at 3 Hz can force thalamic circuits into a 3 Hz oscillation (Destexhe, 1998). Cortical EPSPs force RE cells to fire large bursts (Fig. 7C, arrows), fulfilling the conditions needed to activate GABAB responses. The consequence was that TC cells were ‘clamped’ at hyperpolarized levels by GABAB IPSPs during 300 ms before they could rebound. The non-linear properties of GABAB responses are therefore responsible here for the coexistence of two types of oscillations in the same circuit: moderate corticothalamic feedback recruited the circuit into 10 Hz spindle oscillations, while strong feedback at 3 Hz could force the intact circuit to oscillate at the same frequency due to the non-linear activation properties of intrathalamic GABAB responses.
3 Hz spike-and-wave oscillations in thalamocortical circuits

A thalamocortical network consisting of different layers of cortical and thalamic cells was simulated to explore the impact of this mechanism at the network level (Destexhe, 1998). The network included thalamic TC and RE cells, and a simplified representation of the deep layers of the cortex, in which pyramidal (PY) cells constitute the major source of corticothalamic fibers. As corticothalamic PY cells receive a significant proportion of their excitatory synapses from ascending thalamic axons (Hersch and White, 1981; White and Hersch, 1982), these cells mediate a monosynaptic excitatory feedback loop (thalamus-cortex-thalamus), which was modeled here. The structure of the network, with TC, RE, PY and cortical interneurons (IN), is schematized in Fig. 8A. Each cell type contained the minimal set of calcium- and voltage-dependent currents necessary to account for its intrinsic properties: TC cells contained IT, Ih and a calcium-dependent upregulation of Ih; RE cells contained IT; PY cells had a slow voltage-dependent K+ current, IM, responsible for spike-frequency adaptation similar to ‘regular-spiking’ pyramidal cells (Connors and Gutnick, 1990). All cell types had the INa and IK currents necessary to generate action potentials. All currents were modeled using Hodgkin-Huxley (1952) type kinetics based on voltage-clamp data. Synaptic interactions were mediated by glutamate AMPA and NMDA receptors, as well as GABAergic GABAA and GABAB receptors, and were simulated using kinetic models of postsynaptic receptors (Destexhe et al., 1994b, 1998b). All excitatory connections (TC→RE, TC→IN, TC→PY, PY→PY, PY→IN, PY→RE, PY→TC) were mediated by AMPA receptors, some inhibitory connections (RE→TC, IN→PY) were mediated by a mixture of GABAA and GABAB receptors, while intra-RE connections were mediated by GABAA receptors. Simulations were also performed using NMDA receptors added to all excitatory connections (with maximal conductance set to 25% of the AMPA conductance) and no appreciable difference was observed; NMDA receptors were therefore not included in the present model. Extracellular field potentials were calculated from postsynaptic currents in PY cells according to the model of Nunez (1981), assuming that all cells were arranged equidistantly in a one-dimensional layer (see details in Destexhe, 1998).

In control conditions (Fig. 8B), the thalamocortical network generated synchronized spindle oscillations with cellular discharges in phase in all cell types, as observed experimentally (Contreras and Steriade, 1996). TC cells discharged on average once every two cycles following GABAA-mediated IPSPs, while all other cell types discharged roughly at every cycle at 10 Hz, consistent with the typical features of spindle oscillations observed intracellularly (Steriade et al., 1990; von Krosigk et al., 1993). The simulated field potentials displayed successive negative deflections at ~10 Hz (Fig. 8B), in agreement with the pattern of field potentials during spindle oscillations (Steriade et al., 1990). This pattern of field potentials was generated by the limited discharge in PY cells, which fired roughly one spike per oscillation cycle.

Fig. 7. Corticothalamic feedback can force thalamic circuits into 3 Hz oscillations due to the properties of GABAB receptors. A. Connectivity and receptor types in a circuit of thalamocortical (TC) and thalamic reticular (RE) neurons. Corticothalamic feedback was simulated through AMPA-mediated synaptic inputs (shown on the left of the connectivity diagram; total conductance of 1.2 µS to RE cells and 0.01 µS to TC cells). B. A single stimulation of corticothalamic feedback (arrow) entrained the circuit into a 10 Hz mode similar to spindle oscillations. C. With a strong-intensity stimulation at 3 Hz (arrows; 14 spikes/stimulus), RE cells were recruited into large bursts, which evoked IPSPs onto TC cells dominated by GABAB-mediated inhibition. In this case, the circuit could be entrained into a different oscillatory mode, with all cells firing in synchrony. D. Weak stimulation at 3 Hz (arrows) entrained the circuit into spindle oscillations (identical intensity as in B). E. Strong stimulation at 10 Hz (arrows) led to quiescent TC cells due to sustained GABAB current (identical intensity as in C). Modified from Destexhe, 1998.

Fig. 8. Transformation of spindle oscillations into 3 Hz spike-and-wave oscillations by reducing cortical inhibition. A. Connectivity between different cell types: 100 cells of each type were simulated, including TC and RE cells, cortical pyramidal cells (PY) and interneurons (IN). The connectivity is shown by continuous arrows, representing AMPA-mediated excitation, and dashed arrows, representing mixed GABAA and GABAB inhibition. In addition, PY cells were interconnected using AMPA receptors and RE cells were interconnected using GABAA receptors. The inset shows the repetitive firing properties of PY and IN cells following depolarizing current injection (0.75 nA during 200 ms; -70 mV rest). B. Spindle oscillations in the thalamocortical network in control conditions. 5 cells of each type, equally spaced in the network, are shown (0.5 ms time resolution). The field potentials, consisting of successive negative deflections at 10 Hz, are shown at the bottom. C. Oscillations following the suppression of GABAA-mediated inhibition in cortical cells with thalamic inhibition intact. All cells displayed prolonged discharges in phase, separated by long periods of silence, at a frequency of ~2 Hz. GABAB currents were maximally activated in TC and PY cells during the periods of silence. Field potentials (bottom) displayed spike-and-wave complexes. Thalamic inhibition was intact in B and C. Modified from Destexhe, 1998.
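The field-potential calculation mentioned above can be sketched as follows, assuming the common point-source approximation in a homogeneous medium, V = (Re / 4π) Σ Ij / rj, where Ij are the postsynaptic currents and rj the distances from each cell to the recording site. Whether this matches the exact expression used in the cited model should be checked against Destexhe (1998) and Nunez (1981); the function names, units and electrode geometry below are hypothetical.

```python
import math


def distances_to_electrode(n_cells, spacing, electrode_x, electrode_height):
    """Distances from cells equally spaced on a line to a recording site.

    Cell k sits at position k * spacing along the layer; the electrode sits at
    electrode_x along the layer and electrode_height away from it.
    """
    return [
        math.hypot(k * spacing - electrode_x, electrode_height)
        for k in range(n_cells)
    ]


def field_potential(currents, distances, resistivity):
    """Point-source estimate V = (Re / 4*pi) * sum_j I_j / r_j.

    Units are left to the caller and must be mutually consistent.
    """
    if len(currents) != len(distances):
        raise ValueError("currents and distances must have the same length")
    total = sum(i / r for i, r in zip(currents, distances))
    return resistivity / (4.0 * math.pi) * total


if __name__ == "__main__":
    # Five cells, unit spacing, electrode above the middle cell (assumed geometry).
    dists = distances_to_electrode(5, spacing=1.0, electrode_x=2.0,
                                   electrode_height=1.0)
    currents = [0.2, 0.5, 1.0, 0.5, 0.2]  # arbitrary postsynaptic currents
    print(field_potential(currents, dists, resistivity=1.0))
```

Because each current is weighted by the inverse of its distance, cells close to the recording site dominate the simulated signal, which is why the spatial arrangement of the cells has to be specified.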
Diffuse application of the GABAA antagonist penicillin to the cortex, with no change in the thalamus, leads to spike-and-wave oscillations in cats (Gloor et al., 1977). In the model, this situation was simulated by decreasing GABAA conductances in cortical cells, with the thalamus left intact. Alteration of GABAA receptors in the cortex had a considerable impact in generating spike-and-wave. Under these conditions, the spindle oscillations transformed into 2-3 Hz oscillations (Fig. 8C; Destexhe, 1998). The field potentials generated by these oscillations reflected a pattern of spikes and waves (Fig. 8C, bottom). Spike-and-wave discharges developed progressively from spindle oscillations. Reducing the intracortical fast inhibition from 100% to 50% increased the occurrence of prolonged high-frequency discharges during spindle oscillations (Destexhe, 1998). Further decrease in intracortical fast inhibition led to fully developed spike-and-wave patterns similar to Fig. 8C (Destexhe, 1998). Field potentials displayed one or several negative/positive sharp deflections, followed by a slowly developing positive wave (Fig. 8C, bottom). During the ‘spike’, all cells fired prolonged high-frequency discharges in synchrony, while the ‘wave’ was coincident with neuronal silence in all cell types. This portrait is typical of experimental recordings of cortical and thalamic cells during spike-and-wave patterns (Pollen, 1964; Steriade 1974; Avoli et al., 1983; McLachlan et al., 1984; Buzsaki et al., 1988; Inoue et al., 1993; Seidenbecher et al., 1998). Some TC cells stayed hyperpolarized during the entire oscillation (second TC cell in Fig. 8C), as also observed experimentally (Steriade and Contreras, 1995). A similar oscillation arose if GABAA receptors were suppressed in the entire network (not shown). These simulations suggest that spindles can be transformed into an oscillation with field potentials displaying spike-and-wave, and that this transformation can occur by alteration of cortical inhibition with no change in the thalamus, in agreement with spike-and-wave discharges obtained experimentally by diffuse application of diluted penicillin onto the cortex (Gloor et al., 1977).
The mechanism of the 3 Hz oscillation of this model depends on a thalamocortical loop where both cortex and thalamus are necessary, but neither of them generates the 3 Hz rhythmicity alone (see details in Destexhe, 1998). Removing intrathalamic GABAA-mediated inhibition also affected the oscillation frequency, but did not generate spike-and-wave, because pyramidal cells were still under the strict control of cortical fast inhibition (Destexhe, 1998). This is in agreement with in vivo injections of bicuculline into the thalamus, which produced slow oscillations with increased thalamic synchrony, but no spike-and-wave patterns in the field potentials (Ralston and Ajmone-Marsan, 1956; Steriade and Contreras, 1998).

In the model, spike-and-wave oscillations may follow a waxing-and-waning envelope similar to that of spindles, and were a network consequence of the properties of a single ion channel (Ih) in TC cells (Destexhe, 1998). A calcium-dependent upregulation of Ih was included in TC cells, similar to previous models (Destexhe et al., 1993a, 1996a). The possibility that Ih upregulation underlies the waxing-and-waning of spindles at the level of thalamic networks has been demonstrated in vitro (Bal and McCormick, 1996; Luthi and McCormick, 1998) and predicted by models (Destexhe et al., 1993b, 1996a). This mechanism may also underlie the waxing-and-waning of spindles at the level of thalamocortical networks (Destexhe et al., 1998a). The present model suggests that the upregulation of Ih in TC cells is responsible for the temporal modulation of spike-and-wave oscillations and may evoke several cycles of spike-and-wave oscillations, interleaved with long periods of silence (~20 sec), as is observed experimentally in sleep spindles and spike-and-wave epilepsy, thus emphasizing further the resemblance between the two types of oscillation.
A thalamocortical loop mechanism for ~3 Hz spike-and-wave oscillations
During spindles, the oscillation is generated by intrathalamic interactions (TC-RE loops) and is reinforced by thalamocortical loops, as suggested in a previous model (Destexhe et al., 1998a). The combined action of intrathalamic and thalamocortical loops provides RE cells with moderate excitation, which evokes GABAA-mediated IPSPs in TC cells and sets the frequency to 10 Hz. During spike-and-wave oscillations, an increased cortical excitability provides corticothalamic feedback that is strong enough to force prolonged burst discharges in RE cells, which in turn evoke IPSPs in TC cells dominated by the GABAB component. In this case, the prolonged inhibition sets the frequency to 3 Hz and the oscillation is generated by a thalamocortical loop in which the thalamus is intact (see details in Destexhe, 1998). Therefore, if the cortex is inactivated during spike-and-wave, this model predicts that the thalamus should resume generating spindle oscillations, as observed experimentally in cats treated with penicillin (Gloor et al., 1979).

Figure 9 shows the phase relations between the different cell types in this model of spike-and-wave. High-frequency discharges generated ‘spike’ components in the field potentials, whereas ‘wave’ components were generated by GABAB IPSPs in PY cells due to the prolonged firing of cortical interneurons. The hyperpolarization of PY cells during the ‘wave’ also contained a significant contribution from the voltage-dependent K+ current IM, which was maximally activated due to the prolonged discharge of PY cells during the ‘spike’. The ‘wave’ component in this model is therefore due to two types of K+ currents, one intrinsic and the other GABAB-mediated. The relative contribution of each current to the ‘wave’ depends on their respective conductance values (see details in Destexhe, 1998). The ‘spike’ component was generated by a concerted prolonged discharge of all cell types. However, the discharges were not perfectly in phase, as indicated in Fig. 9B. There was a significant phase advance of TC cells, as observed experimentally (Inoue et al., 1993; Seidenbecher et al., 1998). This phase advance was responsible for the initial negative spike in the field potentials, which coincided with the first spike in the TC cells (Fig. 9B, dashed line). This feature implements the precedence of EPSPs over IPSPs in the PY cell that is needed to generate spike-and-wave complexes.
The simulations therefore suggest that the initial spike of the spike-and-wave complex is due to thalamic EPSPs that precede other synaptic events in PY cells (Destexhe, 1998). Thalamic EPSPs may also trigger an initial avalanche of discharges due to pyramidal cell firing, before IPSPs arise, which would also result in a pronounced negative spike component in field potentials.
Discussion This paper reviewed experiments and models that provide a new view for the genesis of spike-andwave oscillations in thalamocortical systems. The proposed mechanism for spike-and-wave discharges is summarized here and corroborating experimental evidence and predictions are presented. A mechanism for spike-and-wave The primary biophysical component of this mechanism is the activation properties of GABA,
receptors. In the model of GABA, receptor activation based on a G-protein kinetic scheme, a multiplicity of G-protein binding sites accounted for the nonlinearities of GABA, responses (Destexhe and Sejnowski, 1995). At the level of thalamic networks, this property is responsible for the coexistence of two types of oscillations: spindle oscillations for moderate discharges, insufficient to activate GABA, responses, and slow paroxysmal oscillations for prolonged discharge patterns, for which GABA, responses are maximal (Fig. 6C; Destexhe et al., 1996a). These properties can account for the slow paroxysmal oscillations observed in thalamic slices following block of
B
A “Spike” “Wave”
t
t LFP
PY
IN
RE
TC
20 ms Fig. 9. Phase relationships during simulated spike-and-wave discharges. A. Local field potentials (LFP) and representative cells of each type during spike-and-wave oscillations. Spike: all cells displayed prolonged discharges in synchrony, leading to spiky field potentials. Wave: the prolonged discharge of RE and IN neurons evoked maximal GABA,-mediated IPSPs in TC and PY cells respectively (dashed arrows), stopping the firing of all neuron types during a period of 300-500 ms, and generating a slow positive wave in the field potentials. The next cycle restarted due to the rebound of TC cells following GABA, IPSPs (arrow). B . Phase relationships in the thalamocortical model. TC cells discharged first, followed by PY, RE and IN cells. The initial negative peak in the field potentials coincided with the first spike in TC cells before PY cells started firing, and was generated by thalamic EPSPs in PY cells. Modified from Destexhe, 1998.
GABAA receptors (Fig. 2; von Krosigk et al., 1993) and fully agree with dual intracellular recordings in ferret thalamic slices (Kim et al., 1997).
A second component of this mechanism is the powerful corticothalamic feedback. We propose that corticothalamic feedback operates mainly on RE cells, resulting in a dominant IPSP in TC cells. This mechanism can account for the properties of spindle oscillations (Destexhe et al., 1998a). With this type of corticothalamic feedback, cortical EPSPs can force intact thalamic circuits to fire at the same frequency as the slow paroxysmal oscillation (Fig. 7C; Destexhe, 1998). If cortical EPSPs are strong enough, RE cells are forced into prolonged burst discharges and evoke GABAB IPSPs in TC cells. This mechanism could be tested experimentally, and it provides an important prediction of this model (see details in Destexhe, 1998).
A third component is the strong corticothalamic feedback provided by an increased excitability in cortical networks. If GABAA inhibition is reduced in cortex, pyramidal cells generate exceedingly strong discharges, which are strong enough to entrain the thalamus in the 3 Hz mode. At the network level, reducing cortical GABAA receptor function leads to 3 Hz oscillations with all cell types generating prolonged discharge patterns. Simulated field potentials indicate that this pattern of firing generates spike-and-wave waveforms (Fig. 8C; Destexhe, 1998).
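The nonlinearity attributed above to multiple G-protein binding sites can be illustrated with a minimal sketch. It assumes a two-stage kinetic scheme (activated receptor r, G-protein s) followed by a cooperative Hill-type term with n binding sites. The function name, the rate constants, the transmitter pulse shape and the value n = 4 are illustrative assumptions chosen for this example, not values taken from the text.

```python
def peak_gabab_activation(n_spikes, isi=3.0, pulse=0.3, t_end=1000.0, dt=0.05,
                          k1=0.09, k2=0.0012, k3=0.18, k4=0.034,
                          kd=100.0, n=4, t_max=0.5):
    """Peak fractional activation of a GABAB-like conductance (forward Euler).

    All parameter values are illustrative assumptions.
    n_spikes presynaptic spikes arrive every `isi` ms; each releases a
    transmitter pulse of amplitude t_max lasting `pulse` ms.
    """
    r = 0.0      # fraction of activated receptors
    s = 0.0      # activated G-protein (arbitrary units)
    peak = 0.0
    for i in range(int(t_end / dt)):
        t = i * dt
        k = int(t // isi)                      # index of the current spike window
        in_pulse = k < n_spikes and (t - k * isi) < pulse
        transmitter = t_max if in_pulse else 0.0
        dr = k1 * transmitter * (1.0 - r) - k2 * r
        ds = k3 * r - k4 * s
        r += dt * dr
        s += dt * ds
        # cooperative binding of n G-proteins gates the conductance
        peak = max(peak, s ** n / (s ** n + kd))
    return peak


single = peak_gabab_activation(1)    # isolated presynaptic spike
burst = peak_gabab_activation(10)    # prolonged burst of 10 spikes
print(single, burst)
```

Under these assumptions, a tenfold increase in the number of presynaptic spikes raises the peak activation by far more than a factor of ten, which is the qualitative property the mechanism relies on: moderate discharges barely recruit the response, whereas prolonged discharges recruit it strongly.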
Similarities and differences with experimental data
This model is consistent with a number of experimental results on spike-and-wave epilepsy: (a) thalamic and cortical neurons discharge in synchrony during the ‘spike’ while the ‘wave’ is characterized by neuronal silence (Pollen, 1964; Steriade, 1974; Avoli et al., 1983; McLachlan et al., 1984; Buzsaki et al., 1988; Inoue et al., 1993; Seidenbecher et al., 1998), similar to Fig. 9A; (b) TC cell firing precedes that of other cell types, followed by cortical cells and RE cells (Inoue et al., 1993; Seidenbecher et al., 1998), similar to the phase relations in the present model (Fig. 9B); (c) spike-and-wave patterns disappear following removal of either the cortex (Avoli and Gloor, 1982) or the thalamus (Pellegrini et al., 1979; Vergnes and Marescaux, 1992), as predicted by the present mechanism; (d) antagonizing thalamic GABAB receptors suppresses spike-and-wave discharges (Liu et al., 1992), consistent with this model; and (e) spindle oscillations can be gradually transformed into spike-and-wave discharges (Kostopoulos et al., 1981a, 1981b), as observed in this model (Destexhe, 1998).
This model also emphasizes a critical role for the RE nucleus. Reinforcing GABAA-mediated inhibition in the RE nucleus will antagonize the genesis of large burst discharges in RE cells by corticothalamic EPSPs, thereby antagonizing the genesis of GABAB-mediated IPSPs in TC cells and therefore suppressing spike-and-wave discharges (Destexhe, 1998). This property is consistent with the diminished frequency of seizures observed following reinforcement of GABAA receptors in the RE nucleus (Liu et al., 1991) and the suppression of spike-and-wave following chemical lesion of the RE nucleus (Buzsaki et al., 1988). It is also consistent with the action of the anti-absence drug clonazepam, which acts by preferentially enhancing GABAA responses in the RE nucleus (Hosford et al., 1997), leading to diminished GABAB-mediated IPSPs in TC cells (Huguenard and Prince, 1994a; Gibbs et al., 1996). In addition, reinforcing the T-current in RE cells lowered the threshold for spike-and-wave in the model (Destexhe, 1998), consistent with experimental observations (Tsakiridou et al., 1995).
The model is also consistent with the failure to observe spike-and-wave from injections of GABAA antagonists in the thalamus (Ralston and Ajmone-Marsan, 1956; Gloor et al., 1977; Steriade and Contreras, 1998). In the model, suppressing thalamic GABAA receptors led to ‘slow spindles’ around 4 Hz, distinctly different from spike-and-wave oscillations (Destexhe, 1998).
In this case, the discharge of pyramidal cells was controlled by cortical GABAA-mediated inhibition and, due to this strict control, no prolonged discharges and no spike-and-wave patterns were generated in the cortex.
On the other hand, a number of experimental observations are not consistent with the present model. First, apparently intact cortical inhibition was reported in cats treated with penicillin (Kostopoulos et al., 1983). However, this study did not distinguish between GABAA- and GABAB-mediated inhibition. In the present model, even when GABAA was antagonized, IPSPs remained of approximately the same size because cortical interneurons fired stronger discharges (Fig. 8C) and led to stronger GABAB currents. There was a compensation effect between GABAA- and GABAB-mediated IPSPs (not shown), which may lead to the misleading observation that inhibition is preserved.
Second, some GABAA agonists, like barbiturates, may increase the frequency of seizures (Vergnes et al., 1984), possibly through interactions with GABAA receptors in TC cells (Hosford et al., 1997). A similar effect was seen in the model (Destexhe, 1998), but this effect was weak. More accurate simulation of these data would require modeling the variants of GABAA receptor types in different cells to address how the threshold for spike-and-wave discharges is affected by various types of GABAergic conductances. These points will be considered in future models.
Third, the present model only investigated a thalamocortical loop scenario for the genesis of spike-and-wave oscillations, but other mechanisms could also contribute. Although most experimental data favor a mechanism involving both the thalamus and the cortex (see Introduction), a number of experimental studies also point to a possible intracortical mechanism for spike-and-wave. Experiments revealed spike-and-wave in isolated cortex or athalamic preparations in cats (Marcus and Watson, 1966; Pellegrini et al., 1979; Steriade and Contreras, 1998). However, this type of paroxysmal oscillation had a different morphology from the typical ‘thalamocortical’ spike-and-wave pattern and was also slower in frequency (1-2.5 Hz vs. 3.5-5 Hz; Pellegrini et al., 1979). By contrast, intracortical spike-and-wave discharges were not observed in athalamic rats (Vergnes and Marescaux, 1992).
Since no intracellular recordings were made during the presumed spike-and-wave discharges in the cat isolated cortex, it is not clear if this oscillation represents the same spike-and-wave paroxysm as in the intact thalamocortical system. Future models should investigate the possibility of intracortically generated spike-and-wave when more precise experimental data become available.
In conclusion, the models summarized here provide insights into a thalamocortical loop mechanism that may be responsible for spike-and-wave discharges based on the intrinsic and synaptic properties of thalamic and cortical cells. The qualitative characteristics displayed by the simulations are consistent with several experimental models of spike-and-wave, as well as with thalamic slice experiments. A critical element of the model is the high sensitivity of RE cells to cortical EPSPs. Since thalamic RE cells may generate bursts of spikes through dendritic T-currents (Destexhe et al., 1996b), strategies to suppress seizures could be developed that focus on these dendrites.
Acknowledgments
Research was supported by the Medical Research Council of Canada, the Howard Hughes Medical Institute, the National Institutes of Health and the Klingenstein Fund. All simulations were carried out using NEURON (Hines and Carnevale, 1997). Supplementary information such as computer-generated movies is available on the Internet (http://cns.fmed.ulaval.ca or http://www.cnl.salk.edu/ alaid ).
References
Avoli, M. and Gloor, P. (1981) The effect of transient functional depression of the thalamus on spindles and bilateral synchronous epileptic discharges of feline generalized penicillin epilepsy. Epilepsia, 22: 443-452.
Avoli, M. and Gloor, P. (1982) Role of the thalamus in generalized penicillin epilepsy: observations on decorticated cats. Exp. Neurol., 17: 386-402.
Avoli, M., Gloor, P., Kostopoulos, G. and Gotman, J. (1983) An analysis of penicillin-induced generalized spike and wave discharges using simultaneous recordings of cortical and thalamic single neurons. J. Neurophysiol., 50: 819-837.
Bal, T. and McCormick, D.A. (1996) What stops synchronized thalamocortical oscillations? Neuron, 17: 297-308.
Bal, T., von Krosigk, M. and McCormick, D.A. (1995) Synaptic and membrane mechanisms underlying synchronized oscillations in the ferret LGNd in vitro. J. Physiol., 483: 641-663.
Burke, W. and Sefton, A.J. (1966) Inhibitory mechanisms in lateral geniculate nucleus of rat. J. Physiol., 187: 231-246.
Buzsaki, G., Bickford, R.G., Ponomareff, G., Thal, L.J., Mandel, R. and Gage, F.H. (1988) Nucleus basalis and thalamic control of neocortical activity in the freely moving rat. J. Neurosci., 8: 4007-4026.
Connors, B.W. and Gutnick, M.J. (1990) Intrinsic firing patterns of diverse neocortical neurons. Trends Neurosci., 13: 99-104.
Contreras, D. and Steriade, M. (1996) Spindle oscillation in cats: the role of corticothalamic feedback in a thalamically generated rhythm. J. Physiol., 490: 159-179.
Contreras, D., Curro Dossi, R. and Steriade, M. (1993) Electrophysiological properties of cat reticular thalamic neurones in vivo. J. Physiol., 470: 273-294.
Contreras, D., Destexhe, A., Sejnowski, T.J. and Steriade, M. (1996) Control of spatiotemporal coherence of a thalamic oscillation by corticothalamic feedback. Science, 274: 771-774.
Davies, C.H., Davies, S.N. and Collingridge, G.L. (1990) Paired-pulse depression of monosynaptic GABA-mediated inhibitory postsynaptic responses in rat hippocampus. J. Physiol., 424: 513-531.
Deschênes, M. and Hu, B. (1990) Electrophysiology and pharmacology of the corticothalamic input to lateral thalamic nuclei: an intracellular study in the cat. Eur. J. Neurosci., 2: 140-152.
Destexhe, A. (1992) Non-linear Dynamics of the Rhythmical Activity of the Brain (in French), Doctoral Dissertation, Université Libre de Bruxelles, Brussels, Belgium.
Destexhe, A. (1998) Spike-and-wave oscillations based on the properties of GABAB receptors. J. Neurosci., 18: 9099-9111.
Destexhe, A. and Sejnowski, T.J. (1995) G-protein activation kinetics and spill-over of GABA may account for differences between inhibitory responses in the hippocampus and thalamus. Proc. Natl. Acad. Sci. USA, 92: 9515-9519.
Destexhe, A., Babloyantz, A. and Sejnowski, T.J. (1993a) Ionic mechanisms for intrinsic slow oscillations in thalamic relay neurons. Biophys. J., 65: 1538-1552.
Destexhe, A., McCormick, D.A. and Sejnowski, T.J. (1993b) A model for 8-10 Hz spindling in interconnected thalamic relay and reticularis neurons. Biophys. J., 65: 2474-2478.
Destexhe, A., Contreras, D. and Steriade, M. (1998a) Mechanisms underlying the synchronizing action of corticothalamic feedback through inhibition of thalamic relay cells. J. Neurophysiol., 79: 999-1016.
Destexhe, A., Contreras, D., Sejnowski, T.J. and Steriade, M. (1994a) A model of spindle rhythmicity in the isolated thalamic reticular nucleus. J. Neurophysiol., 72: 803-818.
Destexhe, A., Mainen, Z.F. and Sejnowski, T.J. (1994b) An efficient method for computing synaptic conductances based on a kinetic model of receptor binding. Neural Comput., 6: 14-18.
Destexhe, A., Mainen, Z.F. and Sejnowski, T.J. (1998b) Kinetic models of synaptic transmission. In: C. Koch and I. Segev (Eds.), Methods in Neuronal Modeling (2nd ed). Cambridge, MA: MIT Press, pp. 1-26.
Destexhe, A., Bal, T., McCormick, D.A. and Sejnowski, T.J. (1996a) Ionic mechanisms underlying synchronized oscillations and propagating waves in a model of ferret thalamic slices. J. Neurophysiol., 76: 2049-2070.
Destexhe, A., Contreras, D., Steriade, M., Sejnowski, T.J. and Huguenard, J.R. (1996b) In vivo, in vitro and computational analysis of dendritic calcium currents in thalamic reticular neurons. J. Neurosci., 16: 169-185.
Dutar, P. and Nicoll, R.A. (1988) A physiological role for GABAB receptors in the central nervous system. Nature, 332: 156-158.
Gibbs, J.W., Berkow-Schroeder, G. and Coulter, D.A. (1996) GABAA receptor function in developing rat thalamic reticular neurons: whole cell recordings of GABA-mediated currents and modulation by clonazepam. J. Neurophysiol., 76: 2568-2579.
Gloor, P. and Fariello, R.G. (1988) Generalized epilepsy: some of its cellular mechanisms differ from those of focal epilepsy. Trends Neurosci., 11: 63-68.
Gloor, P., Pellegrini, A. and Kostopoulos, G.K. (1979) Effects of changes in cortical excitability upon the epileptic bursts in generalized penicillin epilepsy of the cat. Electroencephalogr. Clin. Neurophysiol., 46: 274-289.
Gloor, P., Quesney, L.F. and Zumstein, H. (1977) Pathophysiology of generalized penicillin epilepsy in the cat: the role of cortical and subcortical structures. II. Topical application of penicillin to the cerebral cortex and subcortical structures. Electroencephalogr. Clin. Neurophysiol., 43: 79-94.
Golomb, D., Wang, X.J. and Rinzel, J. (1996) Propagation of spindle waves in a thalamic slice model. J. Neurophysiol., 75: 750-769.
Hersch, S.M. and White, E.L. (1981) Thalamocortical synapses on corticothalamic projections neurons in mouse SmI cortex: electron microscopic demonstration of a monosynaptic feedback loop. Neurosci. Lett., 24: 207-210.
Hille, B. (1992) Ionic Channels of Excitable Membranes. Sunderland: Sinauer Associates.
Hines, M.L. and Carnevale, N.T. (1997) The NEURON simulation environment. Neural Computation, 9: 1179-1209.
Hodgkin, A.L. and Huxley, A.F. (1952) A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117: 500-544.
Hosford, D.A., Clark, S., Cao, Z., Wilson, W.A. Jr, Lin, F.H., Morrisett, R.A. and Huin, A. (1992) The role of GABAB receptor activation in absence seizures of lethargic (lh/lh) mice. Science, 257: 398-401.
Hosford, D.A., Wang, Y. and Cao, Z. (1997) Differential effects mediated by GABAA receptors in thalamic nuclei of lh/lh model of absence seizures. Epilepsy Res., 27: 55-65.
Huguenard, J.R. and Prince, D.A. (1994a) Clonazepam suppresses GABAB-mediated inhibition in thalamic relay neurons through effects in nucleus reticularis. J. Neurophysiol., 71: 2576-2581.
Huguenard, J.R. and Prince, D.A. (1994b) Intrathalamic rhythmicity studied in vitro: nominal T-current modulation causes robust anti-oscillatory effects. J. Neurosci., 14: 5485-5502.
Inoue, M., Duysens, J., Vossen, J.M.H. and Coenen, A.M.L. (1993) Thalamic multiple-unit activity underlying spike-wave discharges in anesthetized rats. Brain Res., 612: 35-40.
Jasper, H. and Kershman, J. (1941) Electroencephalographic classification of the epilepsies. Arch. Neurol. Psychiat., 45: 903-943.
Kim, U., Bal, T. and McCormick, D.A. (1995) Spindle waves are propagating synchronized oscillations in the ferret LGNd in vitro. J. Neurophysiol., 74: 1301-1323.
Kim, U., Sanchez-Vives, M.V. and McCormick, D.A. (1997) Functional dynamics of GABAergic inhibition in the thalamus. Science, 278: 130-134.
Kostopoulos, G., Avoli, M. and Gloor, P. (1983) Participation of cortical recurrent inhibition in the genesis of spike and wave discharges in feline generalized epilepsy. Brain Res., 267: 101-112.
Kostopoulos, G., Gloor, P., Pellegrini, A. and Gotman, J. (1981a) A study of the transition from spindles to spike and wave discharge in feline generalized penicillin epilepsy: microphysiological features. Exp. Neurol., 73: 55-77.
Kostopoulos, G., Gloor, P., Pellegrini, A. and Siatitsas, I. (1981b) A study of the transition from spindles to spike and wave discharge in feline generalized penicillin epilepsy: EEG features. Exp. Neurol., 73: 43-54.
Liu, X.B., Honda, C.N. and Jones, E.G. (1995a) Distribution of four types of synapse on physiologically identified relay neurons in the ventral posterior thalamic nucleus of the cat. J. Comp. Neurol., 352: 69-91.
Liu, X.B., Warren, R.A. and Jones, E.G. (1995b) Synaptic distribution of afferents from reticular nucleus in ventroposterior nucleus of the cat thalamus. J. Comp. Neurol., 352: 187-202.
Liu, Z., Vergnes, M., Depaulis, A. and Marescaux, C. (1991) Evidence for a critical role of GABAergic transmission within the thalamus in the genesis and control of absence seizures in the rat. Brain Res., 545: 1-7.
Liu, Z., Vergnes, M., Depaulis, A. and Marescaux, C. (1992) Involvement of intrathalamic GABAB neurotransmission in the control of absence seizures in the rat. Neuroscience, 48: 87-93.
Luthi, A. and McCormick, D.A. (1998) Periodicity of thalamic synchronized oscillations: the role of Ca2+-mediated upregulation of Ih. Neuron, 20: 553-563.
Lytton, W.W., Contreras, D., Destexhe, A. and Steriade, M. (1997) Dynamic interactions determine partial thalamic quiescence in a computer network model of spike-and-wave seizures. J. Neurophysiol., 71: 1679-1696.
Marcus, E.M. and Watson, C.W. (1966) Bilateral synchronous spike wave electrographic patterns in the cat: interaction of bilateral cortical foci in the intact, the bilateral cortical-callosal and adiencephalic preparations. Arch. Neurol., 14: 601-610.
McLachlan, R.S., Avoli, M. and Gloor, P. (1984) Transition from spindles to generalized spike and wave discharges in the cat: simultaneous single-cell recordings in the cortex and thalamus. Exp. Neurol., 85: 413-425.
Nunez, P.L. (1981) Electric Fields of the Brain. The Neurophysics of EEG, Oxford: Oxford University Press.
Otis, T.S., Dekoninck, Y. and Mody, I. (1993) Characterization of synaptically elicited GABAB responses using patch-clamp recordings in rat hippocampal slices. J. Physiol., 463: 391-407.
Pellegrini, A., Musgrave, J. and Gloor, P. (1979) Role of afferent input of subcortical origin in the genesis of bilaterally synchronous epileptic discharges of feline generalized epilepsy. Exp. Neurol., 64: 155-173.
Pollen, D.A. (1964) Intracellular studies of cortical neurons during thalamic induced wave and spike. Electroencephalogr. Clin. Neurophysiol., 17: 398-404.
Prevett, M.C., Duncan, J.S., Jones, T., Fish, D.R. and Brooks, D.J. (1995) Demonstration of thalamic activation during typical absence seizures using H2(15)O and PET. Neurology, 45: 1396-1402.
Puigcerver, A., Van Luijtenaar, E.J.L.M., Drinkenburg, W.H.I.M. and Coenen, A.L.M. (1996) Effects of the GABAB antagonist CGP-35348 on sleep-wake states, behaviour and spike-wave discharges in old rats. Brain Res. Bull., 40: 157-162.
Ralston, B. and Ajmone-Marsan, C. (1956) Thalamic control of certain normal and abnormal cortical rhythms. Electroencephalogr. Clin. Neurophysiol., 8: 559-582.
Roy, J.P., Clercq, M., Steriade, M. and Deschênes, M. (1984) Electrophysiology of neurons in lateral thalamic nuclei in cat: mechanisms of long-lasting hyperpolarizations. J. Neurophysiol., 51: 1220-1235.
Seidenbecher, T., Staak, R. and Pape, H.C. (1998) Relations between cortical and thalamic cellular activities during absence seizures in rats. Eur. J. Neurosci., 10: 1103-1112.
Smith, K.A. and Fisher, R.S. (1996) The selective GABAB antagonist CGP-35348 blocks spike-wave bursts in the cholesterol synthesis rat absence epilepsy model. Brain Res., 729: 147-150.
Snead, O.C. (1992) Evidence for GABAB-mediated mechanisms in experimental generalized absence seizures. Eur. J. Pharmacol., 213: 343-349.
Soltesz, I. and Crunelli, V. (1992) GABAA and pre- and postsynaptic GABAB receptor-mediated responses in the lateral geniculate nucleus. Prog. Brain Res., 90: 151-169.
Steriade, M. (1974) Interneuronal epileptic discharges related to spike-and-wave cortical seizures in behaving monkeys. Electroencephalogr. Clin. Neurophysiol., 37: 247-263.
Steriade, M. and Contreras, D. (1995) Relations between cortical and thalamic cellular events during transition from sleep patterns to paroxysmal activity. J. Neurosci., 15: 623-642.
Steriade, M. and Contreras, D. (1998) Spike-wave complexes and fast components of cortically generated seizures. I. Role of neocortex and thalamus. J. Neurophysiol., 80: 1439-1455.
Steriade, M., Jones, E.G. and Llinás, R.R. (1990) Thalamic Oscillations and Signalling, New York: John Wiley & Sons.
Steriade, M., McCormick, D.A. and Sejnowski, T.J. (1993) Thalamocortical oscillations in the sleeping and aroused brain. Science, 262: 679-685.
Steriade, M., Wyzinski, P. and Apostol, V. (1972) Corticofugal projections governing rhythmic thalamic activity. In: T.L. Frigyesi, E. Rinvik and M.D. Yahr (Eds.), Corticothalamic Projections and Sensorimotor Activities. New York: Raven Press, pp. 221-272.
Thomson, A.M. and Destexhe, A. (1999) Dual intracellular recordings and computational models of slow IPSPs in rat neocortical and hippocampal slices. Neuroscience (in press).
Traub, R.D. and Miles, R. (1991) Neuronal Networks of the Hippocampus. Cambridge: Cambridge University Press.
Tsakiridou, E., Bertollini, L., de Curtis, M., Avanzini, G. and Pape, H.C. (1995) Selective increase in T-type calcium conductance of reticular thalamic neurons in a rat model of absence epilepsy. J. Neurosci., 15: 3110-3117.
Vergnes, M. and Marescaux, C. (1992) Cortical and thalamic lesions in rats with genetic absence epilepsy. J. Neural Transm., 35 (Suppl.): 71-83.
Vergnes, M., Marescaux, C., Micheletti, G., Depaulis, A., Rumbach, L. and Warter, J.M. (1984) Enhancement of spike and wave discharges by GABAmimetic drugs in rats with spontaneous petit-mal-like epilepsy. Neurosci. Lett., 44: 91-94.
von Krosigk, M., Bal, T. and McCormick, D.A. (1993) Cellular mechanisms of a synchronized oscillation in the thalamus. Science, 261: 361-364.
Wallenstein, G.V. (1994) The role of thalamic IGABAB in generating spike-wave discharges during petit mal seizures. NeuroReport, 5: 1409-1412.
Wang, X.J., Golomb, D. and Rinzel, J. (1995) Emergent spindle oscillations and intermittent burst firing in a thalamic model: specific neuronal mechanisms. Proc. Natl. Acad. Sci. USA, 92: 5577-5581.
White, E.L. and Hersch, S.M. (1982) A quantitative study of thalamocortical and other synapses involving the apical dendrites of corticothalamic cells in mouse SmI cortex. J. Neurocytol., 11: 137-157.
Williams, D. (1953) A study of thalamic and cortical rhythms in Petit Mal. Brain, 76: 50-69.
SECTION IV
Psychiatry
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 18
Using a speech perception neural network simulation to explore normal neurodevelopment and hallucinated ‘voices’ in schizophrenia
Ralph E. Hoffman* and Thomas H. McGlashan
Yale Psychiatric Institute, P.O. Box 208038, New Haven, CT 06520-8038, USA
*Corresponding author. Tel.: +1 (203) 7853259; fax: +1 (203) 785 7855; e-mail: [email protected]
Schizophrenia is a psychiatric illness which occurs in roughly 1% of the population in the United States and throughout the world. It occurs relatively uniformly across different races and socioeconomic groups and produces devastating symptoms and cognitive impairments. Onset of schizophrenic symptoms is generally during late adolescence or early adulthood. Many different kinds of symptoms are exhibited by patients with this disorder, including hallucinations, delusions, disorganized speech, and social withdrawal. Brain mechanisms underlying schizophrenia remain largely unknown. This chapter will describe computer models of neurocognitive processes which may help to elucidate the neurobiological basis of one facet of this complex disorder.
Computer models are central to scientific disciplines ranging from meteorology to physical chemistry. Their usefulness lies in simulating complex, interactive systems. A good model does not attempt to re-create ‘reality’ in its entirety; if that were the case, the best model would be the real-life system itself. Instead, model construction proceeds by incorporating a limited number of properties or observations. The model will have informative value if, when simulated by computer, critical phenomena which have been previously unexplained are exhibited. An example is the Red Spot of Jupiter. Many unproved theories had been proposed to account for this phenomenon (for instance, volcanic eruptions). But the mystery was finally clarified when a stable, Red-Spot-like ‘gaseous storm’ appeared spontaneously in a computer simulation of Jupiter’s atmosphere after parameters reflecting the planet’s rapid spin rate and its largely liquid composition were included (Gleick, 1987). Presumably different simulations of Red Spots based on other hypotheses (for instance, by including a volcano in the program) were possible. However, the most informative model was that which simulated the target behavior based on the known properties of Jupiter rather than from an ad hoc additional factor such as volcanoes.
Brain systems are composed of vast numbers of neural elements whose individual behaviors have been partially characterized but whose interactive dynamics remain largely unknown. The dynamic interactions of neuronal ensembles, like the physical dynamics of Jupiter, can be explored via computer models (Amit et al., 1994; Grunze et al., 1996). These models exhibit analogues of perceptual and cognitive processing. Along these lines, we describe a neural network computer simulation of certain aspects of speech perception. Although this simulation represents a vast simplification of actual cortical networks, a number of interesting properties were exhibited by the network, including an adaptive advantage arising from reductions of synapses and connections ordinarily observed during normal neurodevelopment, a demonstration of how psychosis could emerge as an extension of this developmental trend, and a plausible model for the mechanism of action of antipsychotic drugs. The key observations that were used to construct this simulation are as follows:
Cortical development during adolescence is characterized by significant reductions of synapses
A large number of workers have hypothesized that schizophrenia needs to be understood as a neurodevelopmental disorder (Margolis et al., 1995; Weinberger, 1987). However, progress in expanding this view has been limited. Our point of departure was a classic report by Huttenlocher (1979). In normal postmortem tissue obtained from middle frontal cortex, synaptic density was found to peak during childhood, with a subsequent decline of 30-40% during adolescence to attain adult levels which remained relatively stable. Less than 1% of afferents to any cortical area derive from thalamus (Braitenberg, 1978). Therefore significant reductions in cortical synapses must reflect significant reductions in corticocortical projections. This perspective played a critical role in the computer models described below.
Studies have suggested reduced cortical synapses in the brains of schizophrenic patients
The age of onset of schizophrenic psychosis is generally the end of adolescence. In light of this aspect of the disorder and Huttenlocher’s findings, some workers have proposed that schizophrenia reflects an extension of normal adolescent neurodevelopment, namely excessive elimination of cortical synapses (Feinberg, 1982, 1983; Hoffman and Dobscha, 1989). This hypothesis has been supported by a range of other studies including:
- 31P magnetic resonance spectroscopy studies of neural membrane phospholipid turnover suggesting that frontal regions of schizophrenic patients reflect reduced outgrowth and excessive elimination of axons and dendrites (Pettegrew et al., 1991; Keshaven et al., 1994; Stanley et al., 1995).
- A postmortem light microscopic study suggesting volumetric reductions of ‘neuropil’ (i.e., the dense intertwining of axons and dendrites surrounding neurons and glial cell bodies) in prefrontal cortex of schizophrenic patients compared to normal controls (Selemon et al., 1995).
- Golgi stain postmortem studies showing evidence of approximately 20% loss of dendritic spines (microanatomic structures which receive synaptic inputs) in prefrontal cortex of schizophrenic brains relative to normal brains (Garey et al., 1995; Glantz and Lewis, 1995).
- Other postmortem studies have demonstrated reductions in synapse-associated and phosphoprotein (synapsin and synaptophysin) in medial temporal cortex of schizophrenic patients compared to normal controls (Browning et al., 1993; Eastwood et al., 1995). Another postmortem study found reduced levels of MAP2 and MAP5 in hippocampal areas in brains of schizophrenic patients relative to normal controls (Arnold et al., 1991). These proteins are cytoskeletal in nature and are largely limited to axons and dendrites. These data also suggest that synapses and connections themselves are pathologically reduced in schizophrenia.
- Metachromatic leukodystrophy, when the onset is during adolescence and early adulthood, is the neurological condition which most closely replicates schizophrenia (Hyde et al., 1992). This disorder attacks white matter mediating corticocortical connections, especially involving the frontal lobes, while sparing gray matter. These clinical observations suggest that schizophrenia also is a disturbance reflecting reduced corticocortical connectivity.
As is the case for normal neurodevelopment, a natural consequence of reductions in cortical synapses and neuropil in schizophrenia is assumed to be a corresponding reduction in corticocortical connectivity. This view provided a possible model of the onset of psychosis at the end of adolescence, namely reductions of synapses and connections which are in excess of those associated with normal neurodevelopment.
A comparison ‘pathology’: selective cell death
As a comparison ‘pathology’, the consequences of neuronal loss were also studied. Animal studies indicate that cell loss accompanies normal neurodevelopment (Margolis et al., 1995). Along these lines, Huttenlocher (1979) described postmortem data strongly suggestive of frontal neuron cell loss in humans during early childhood. A number of studies have also reported neuronal loss in cortical and medial temporal areas in the brains of schizophrenic patients (Margolis et al., 1995; Falkai and Bogerts, 1986). Especially intriguing is an etiologic explanation of schizophrenia based on pharmacological studies of N-methyl-D-aspartate (NMDA) antagonists. Insofar as these drugs have been found to be psychotomimetic in humans and productive of excitotoxic cell death in animals, Olney and Farber (1995) have proposed that schizophrenia itself is due to an excitotoxic process. Therefore our neural network model included an ‘excitotoxic cell death’ condition where those neurons most consistently activated were functionally eliminated. Other methods of cell death were explored, e.g. eliminating working memory neurons randomly or by identifying those neurons which are the least activated.
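The three elimination rules named above (most activated, least activated, random) can be sketched as a single selection routine. This is a minimal illustration rather than the authors' procedure: the function name, the use of a per-neuron mean activation as the measure of being 'consistently activated', and the fixed random seed are all assumptions made for the example.

```python
import random


def select_neurons_to_eliminate(mean_activation, n_remove,
                                method="excitotoxic", seed=0):
    """Return sorted indices of working-memory units to silence.

    mean_activation: one value per neuron, e.g. its average activation
    over a set of input sequences (an assumed measure).
    method: 'excitotoxic' (most active), 'least_active', or 'random'.
    """
    indices = list(range(len(mean_activation)))
    if method == "excitotoxic":
        # most consistently activated neurons are removed first
        ranked = sorted(indices, key=lambda i: mean_activation[i], reverse=True)
    elif method == "least_active":
        ranked = sorted(indices, key=lambda i: mean_activation[i])
    elif method == "random":
        ranked = indices[:]
        random.Random(seed).shuffle(ranked)
    else:
        raise ValueError("unknown method: " + method)
    return sorted(ranked[:n_remove])


activations = [0.1, 0.9, 0.5, 0.7]
print(select_neurons_to_eliminate(activations, 2, "excitotoxic"))   # [1, 3]
print(select_neurons_to_eliminate(activations, 2, "least_active"))  # [0, 2]
```

A neuron selected this way would then be 'functionally eliminated', for example by clamping its output to zero; how that is done depends on the network implementation.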
Auditory hallucinations in schizophrenia most commonly consist of speech or ‘voices’
Our strategy was not to simulate the entire syndrome of schizophrenia, clearly an impossible task, but to explore a single exemplar symptom, auditory hallucinations. This symptom is reported by approximately 5040% of patients with schizophrenia (Sartorius et al., 1974; Andreasen and Flaum, 1991). One clue as to their origin is that these hallucinations most commonly consist of spoken speech or ‘voices’ (Hoffman, 1986). This phenomenological feature suggests that hallucinated speech involves neural systems dedicated to auditory speech perception. This view is reinforced by neuroimaging evidence of auditory/linguistic association cortex activation when ‘voices’ occur.
A study by Silversweig et al. (1995) using positron emission tomography demonstrated activation of left temporoparietal cortex simultaneous with voices in schizophrenic patients. A functional magnetic resonance imaging (fMRI) study by Woodruff et al. (1997) characterized activation of temporal cortex by external speech in schizophrenic patients. This study concluded that external speech competes with voices for the same cortical circuits. Similarly, a study of evoked potentials elicited by sound stimuli in hallucinating schizophrenic patients suggested endogenous activation (Tiihonen et al., 1992). Moreover, Suzuki et al. (1993) identified left temporal activation during periods of hallucinated voices.
Other non-linguistic mechanisms of hallucinated speech have been proposed, such as sensory gating abnormalities (Braff and Geyer, 1990). There are, however, no studies directly testing this hypothesis. Another popular view is that voices actually are ordinary verbal thoughts that have been misidentified as deriving from an external, non-self source (Frith and Done, 1988). One study comparing hallucinating and non-hallucinating patients found frontal underactivation in the former group when subjects imagined hearing speech spoken by others (McGuire et al., 1996). The authors suggest that these brain activation failures ‘might contribute to a less secure appreciation’ of the actual source of verbal thoughts. However, there are no studies linking frontal underactivation to difficulties in identifying the source of thoughts/images. Thus a model of hallucinated speech based on pathology arising directly from speech perception systems of the brain appears consistent with current studies.
Working memory, speech perception, and schizophrenia
Our modeling strategy was to simulate certain aspects of the speech perception system to determine if pruning ‘corticocortical connections’ could simulate hallucinated voices. Hallucinations, by definition, are percepts that emerge in the absence of corresponding external sensory information. Our a priori criterion for identifying this ‘symptom’, therefore, was production of ‘percepts’ by the speech perception simulation in the absence of any ‘phonetic’ input.
Our neural network was based on the fact that ordinary speech, when produced at normal rates, has significant acoustic ambiguity due to blurring of phonetic information and background sounds (Warren and Warren, 1970; Kalikow and Stevens, 1977; Woods, 1982; McClelland and Elman, 1986). Consequently, perception of new words is not a passive process based solely on acoustic input, but an active, creative one dependent on intrinsic knowledge of how words are sequenced into larger message units (Elman, 1990a,b). Working memory has been defined as a brain system that temporarily stores and processes information to direct ongoing cognitive processes (Baddeley, 1992). The storage and processing of prior linguistic information used to ‘disambiguate’ new speech inputs therefore reflects a specialized working memory capacity. Many studies have demonstrated working memory impairments in schizophrenia (Park and Holzman, 1992; Gold et al., 1997) and have implicated pathology in frontal and medial temporal areas, which are known to underlie human working memory (Goldman-Rakic, 1991; Breier et al., 1992; Weinberger et al., 1992). Our simulation of speech perception included a working memory component in which linguistic expectations, based on how words are sequenced as messages, are represented (Elman, 1990a,b; Hoffman et al., 1995). We targeted this component of our neural network to explore effects of reduced corticocortical connectivity.
Methods
Our simulation of sequential word perception is based on reports by Elman (1990a, 1990b). Many aspects of this simulation have previously been described in Hoffman and McGlashan (1997) and reflect an expansion of an earlier simulation which has also been reported elsewhere (Hoffman et al., 1995). Compared to our initial report, the simulation reported below utilizes a more complex learning paradigm and a reduction in input information which forced the network to rely more heavily on working memory.
Network architecture
Our network consisted of 148 ‘neuronal elements’ divided into a four-layered system (Fig. 1). The network was designed to translate ‘phonetic’ inputs into words. Actual acoustic data were not used; instead our simplifying assumption was that the ‘phonetic representation’ of each word corresponded to a pattern of activation where roughly 25% of the neurons in the initial or input layer were turned ‘on’ (Table 1). Each of the forty hidden layer neurons received a weighted sum of inputs from each of the 25 input neurons:
$$I(x) = \sum_{y} w_{yx}\, a_i(y) \qquad (1)$$
where $I(x)$ is information communicated from the input layer to neuron $x$ in the hidden layer, $w_{yx}$ is the weight (which can be positive or negative) of the projection from neuron $y$ in the input layer to neuron $x$ in the hidden layer, and $a_i(y)$ is the level of activation of neuron $y$ in the input layer. Each hidden layer neuron also received input from every neuron in the temporary storage layer (also 40 neurons in size), which stored a replica of the pattern of activation of the hidden layer emerging from the preceding phonetic input (Fig. 1). The activation of each neuron in the hidden layer, $a_h(x)$, ranged from 0 to 1 and was computed as follows:
where g (gain) and p (bias) together determine response profiles of simulated neurons (see Fig. 2). When combined input to any single neuron was very negative, its activation approached zero. When its combined input was very positive, neural activation approached a maximum level of one. Intermediate levels of firing were expressed as fractions. The output layer consisted of 43 neurons. Output layer neurons received inputs exclusively from the hidden layer (see again Fig. 1) and had the same activation function as hidden layer neurons. Besides being assigned a phonetic code, each of the words in Table 1 was also assigned a pattern within the output layer where between 3 and 6 of these neurons were turned on. These neurons code for semantic and syntactic features. For instance, the word ‘cop’ was represented by activation of output neurons which individually coded for NOUN, ANIMATE and HUMAN, as well as a particular neuron which referred to ‘cop’ itself. A sample of output codes for individual words is provided in Table 2. When the network produced an output layer activation pattern, a least squares algorithm decided which word was the ‘best fit’ for that particular pattern; the ‘best fit’ became the ‘detected word’. With some outputs, a clear-cut ‘best fit’ could not be discerned; in these cases a ‘null output’ was declared. This outcome was obtained if insufficient phonetic information was provided at the input level.
[Fig. 1 diagram: Output Layer (43 neurons); Hidden Layer (40 neurons); Input Layer (25 neurons); Temporary Storage Layer (40 neurons).]
Fig. 1. Schematic representation of the neural network used in this study. The hidden layer receives ‘phonetic’ information while the output layer codes for ‘words’ in the system’s vocabulary via activation of neurons coding for specific syntactic and semantic information (see also Table 2). Projections are unidirectional and flow from the input layer to the hidden layer, and from the hidden layer to the output layer. The temporary storage layer retains a copy of the hidden layer from the prior information processing step, and projects a transformation of this representation back to the hidden layer (from Hoffman et al. (1997), p. 1685 © American Psychiatric Association).
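The processing step and the ‘best fit’ decoding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the logistic form used for the activation function is assumed (it is merely consistent with the saturating behaviour and the gain and bias effects described in the text), and the function names and the cut-off `max_sq_error` for declaring a ‘null output’ are hypothetical.

```python
import numpy as np

N_IN, N_HID, N_OUT = 25, 40, 43  # layer sizes given in the text


def logistic(net, gain=1.0, bias=0.0):
    # ASSUMED activation form: 1 / (1 + exp(-gain * (net + bias))).
    # Positive bias shifts the curve to the left; baselines are gain 1, bias 0.
    return 1.0 / (1.0 + np.exp(-gain * (net + bias)))


def step(phonetic, context, w_in, w_ctx, w_out, gain=1.0, bias=0.0):
    """Process one word and return (hidden, output) activations.

    `context` is the temporary storage layer, i.e. the hidden layer
    activation left over from the previous word.
    """
    net_hidden = w_in @ phonetic + w_ctx @ context  # Eq. (1) plus stored context
    hidden = logistic(net_hidden, gain, bias)       # gain/bias act on hidden units
    output = logistic(w_out @ hidden)               # output units use baseline values
    return hidden, output


def detect_word(output, codes, max_sq_error=1.0):
    """Least-squares best fit over the vocabulary; None is a 'null output'.

    `max_sq_error` is a hypothetical cut-off: the text does not state how
    the absence of a clear-cut best fit was decided.
    """
    errors = {word: float(np.sum((output - code) ** 2))
              for word, code in codes.items()}
    best = min(errors, key=errors.get)
    return best if errors[best] <= max_sq_error else None
```

To process a sentence, `step` would be called once per word, passing the returned hidden vector back in as the next `context`; a pause corresponds to an all-zero `phonetic` vector.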
Network training
The network was trained with a set of 256 sentences. A ‘phonetic’ input corresponding to a particular word was coded at the input layer and projected into a hidden layer representation. The temporary storage layer retained a representation of the hidden layer state induced during the previous word presentation. Weights between different neuronal layers were derived from an on-line variant of backpropagation learning (Miikkulainen, 1993) where error terms were backpropagated after each word presentation. This procedure caused learning to occur very rapidly. The learning process caused the working memory component of the network to store and make predictions based on prior information processing steps, which aided in translating phonetic information into words. Insofar as different words often had partially overlapping phonetic patterns (Table 1), the network learned to use information based on prior inputs to discern words correctly.
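As a rough illustration of what ‘on-line’ means here, the sketch below applies one gradient-descent update after a single word presentation. It is an assumption-laden sketch rather than the variant cited in the text: the squared-error objective, the learning rate, the logistic units and the choice to treat the stored context as a fixed input (so that error is not propagated back through earlier words) are all illustrative, and `train_on_word` is a hypothetical name.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_on_word(weights, phonetic, context, target, lr=0.1):
    """One on-line update; returns (hidden, squared error before the update).

    `weights` is a dict with keys "in", "ctx" and "out" holding NumPy
    arrays that are modified in place.
    """
    w_in, w_ctx, w_out = weights["in"], weights["ctx"], weights["out"]

    # forward pass for the current word
    hidden = sigmoid(w_in @ phonetic + w_ctx @ context)
    output = sigmoid(w_out @ hidden)

    # backward pass for E = 0.5 * sum((output - target) ** 2)
    err = output - target
    delta_out = err * output * (1.0 - output)
    delta_hid = (w_out.T @ delta_out) * hidden * (1.0 - hidden)

    # weights change immediately, before the next word is presented
    w_out -= lr * np.outer(delta_out, hidden)
    w_in -= lr * np.outer(delta_hid, phonetic)
    w_ctx -= lr * np.outer(delta_hid, context)
    return hidden, 0.5 * float(err @ err)
```

The returned hidden vector would serve as the context for the next word in the sentence, which is how prior inputs come to influence later predictions.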
Assessing network performance
Network models were then tested with an identical set of 23 sentences not used in training but incorporating the same vocabulary. Each test sentence was separated from the next by a ‘pause’ consisting of five null inputs (all input neurons set to zero). The percentage of words successfully detected by the network was counted. In addition, the total number of misidentifications (when the network confused one word for another) was scored. Hallucinations were scored if word percepts were produced spontaneously during ‘pauses’. Assessment of network performance was undertaken with full phonetic information for each word, and then repeated with degraded ‘phonetic’ information, i.e. where two input neurons ordinarily turned on by each new word input were randomly selected and reset to zero. This manipulation forced the network to rely further on working memory and linguistic expectations based on previous inputs to ‘fill in the blanks’. The following is an example of how network performance was scored. Suppose the input consisted of ‘phonemes’ presented in a sequence corresponding to the following words: cop - chase - old - man - # - # - # - # - # - Jane - kiss - girl, where #’s are null inputs corresponding to a ‘pause’. Assume that the output of the network was cop - chase - ∅ - dog - ∅ - ∅ - ∅ - fear - ∅ - Jane - kiss - girl (where ∅ denotes the absence of any output produced by the network). The number of words correctly identified would be 5/7. One word (‘man’) would be scored as misidentified, and ‘fear’ would be scored as a hallucination.
TABLE 1
Twenty-five dimensional ‘phonetic code’ for the 30 words belonging to the network’s vocabulary*
Word         ‘Phonetic code’
Young        0001000101001110000001000
Old          0100011101101011010100101
Tell         1001000001100011000100000
Omen         0100101100001000010010100
Dog          0101001000000000111100000
Jane         1010000010000001110001001
Run          0000001011000000000100111
Ball         0110000001001000001011010
Kick         0000010110011000000001010
Give         1100010010000000110100010
Boy          0000011000110110000100000
Miss         0110010000000010000101011
Large        1000001111001100100010001
Small        1001101110001011000101100
Story        0111000100101010001000100
Frightens    0001001000011100010011000
Girl         1010010000110000001000010
Bill         1001000011100000111000010
God          0101000001000001011100110
Man          1101001001100000000000100
Cop          0000001010011000011011100
Sam          0100010001001010000100000
Think        0001010101000011001010000
Kiss         1000010010011010001000000
Won’t        0000100101000010010000010
Woman        1001000100000010000101110
Chase        1001110110001000010010000
Fear         1001000010000100001010100
Love         0001100101010100010001000
Warning      1001110100001001011000000
*Phonetic code of individual words represented as a string of ones and zeroes generated arbitrarily using a random number generator.
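The degradation manipulation and the scoring rules of the worked example can be sketched as follows. This is only an illustration: the function names are hypothetical, inputs and outputs are assumed to be aligned one to one, and the example sentence is assumed to end ‘Jane - kiss - girl’, the order that yields the stated count of 5/7 with a single misidentification.

```python
import random

PAUSE = "#"   # null input: all input neurons set to zero
NULL = None   # the network produced no detectable word


def degrade(code, n_off=2, rng=random):
    """Reset n_off randomly chosen 'on' input neurons to zero."""
    on = [i for i, bit in enumerate(code) if bit == 1]
    off = set(rng.sample(on, n_off))
    return [0 if i in off else bit for i, bit in enumerate(code)]


def score(inputs, outputs):
    """Count detections, misidentifications and hallucinations."""
    words = detected = misidentified = hallucinated = 0
    for given, produced in zip(inputs, outputs):
        if given == PAUSE:
            if produced is not NULL:
                hallucinated += 1      # a percept arising during a pause
            continue
        words += 1
        if produced == given:
            detected += 1
        elif produced is not NULL:
            misidentified += 1         # one word confused for another
    return {"words": words, "detected": detected,
            "misidentified": misidentified, "hallucinated": hallucinated}


inputs = ["cop", "chase", "old", "man"] + [PAUSE] * 5 + ["Jane", "kiss", "girl"]
outputs = ["cop", "chase", NULL, "dog", NULL, NULL, NULL, "fear", NULL,
           "Jane", "kiss", "girl"]
result = score(inputs, outputs)
# result: 7 words, 5 detected, 1 misidentified ('man' -> 'dog'),
# 1 hallucination ('fear'); 'old' is simply missed.
```

Note that a missed word (‘old’ in the example) lowers the detection rate without counting as a misidentification.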
Neuroanatomic manipulations
Two neuroanatomic manipulations of working memory networks were simulated: pruning of connections and excitotoxic cell death. The pruning procedure was motivated by the concept of neurodevelopmental ‘darwinism’ where neurons compete with each other for anatomic access to other neurons, with elimination of less robust interneuron connections (Edelman, 1987; Geinisman et al., 1988; Nelson et al., 1990). In mathematical terms, if the absolute value of a weight was below a certain threshold it was ‘clamped’ at zero. Excitotoxic cell death was simulated by presenting the network with the standard set of test sentences. Hidden layer neurons were ranked according to the summed activation which they received. Neurons receiving the highest levels of overall activation were then ‘eliminated’ by clamping their activation levels at zero. Effects of excitotoxic cell death were explored by applying this algorithm to hidden layer neurons as well as to temporary storage layer neurons. Other simulations of cell loss in either the hidden layer or temporary storage layer included: (i) elimination of neurons randomly; (ii) elimination of neurons which were the least activated during testing.
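Both manipulations reduce to simple array operations, sketched below with NumPy. The sketch makes assumptions the text does not settle: the helper names are hypothetical, `prune_fraction` picks its threshold from a quantile of the absolute weights as one possible way of reaching a target pruning level, and cell death is represented as a 0/1 mask that would multiply a layer’s activations.

```python
import numpy as np


def prune(weights, threshold):
    """Clamp every weight whose absolute value is below threshold to zero."""
    return np.where(np.abs(weights) < threshold, 0.0, weights)


def prune_fraction(weights, fraction):
    """Prune roughly the given fraction of connections, weakest first.

    Hypothetical helper: the threshold is taken as a quantile of |w|.
    """
    threshold = np.quantile(np.abs(weights), fraction)
    return prune(weights, threshold)


def excitotoxic_mask(summed_activation, n_eliminate):
    """Return a mask that is 0 for the n_eliminate most activated neurons."""
    mask = np.ones(len(summed_activation))
    if n_eliminate > 0:
        most_active = np.argsort(summed_activation)[-n_eliminate:]
        mask[most_active] = 0.0
    return mask
```

In a simulation loop the mask would be applied on every step (for example `hidden * mask`), so that eliminated neurons stay silent; `summed_activation` would be accumulated over the standard test sentences before any neurons are removed.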
Results
Performance of the ‘standard’ network
Figure 3 displays the performance of the standard network prior to neuroanatomic manipulations. Unsurprisingly, when phonetic information was lost, fewer words were detected and word misidentifications were produced. No hallucinations were elicited, however. To demonstrate the specific contribution of verbal working memory and word order, this network was retested with the same set of words but with entirely random word sequencing (e.g. ‘man-old-cop-chase’ instead of ‘cop-chase-old-man’). Dramatic reductions in word detection rate for both undegraded and degraded phonetic inputs were observed and demonstrated the dependence of the network on word order in decoding input information.
[Fig. 2 plots: panels A and B; horizontal axis from Inhibitory Input to Excitatory Input; vertical axis from 0 to 1.]
Fig. 2. Response profiles for a sample neuron for different parameter settings corresponding to Eqn (2). Positive bias shifts the curve to the left so that more activation occurs for a range of input conditions. More positive gain results in suppression of activation for small or negative inputs and enhancement of activation for larger inputs. Neuromodulatory alterations were simulated by shifting either bias or gain parameters of hidden layer neurons from their baselines of one and zero respectively (from Hoffman et al. (1995), p. 482 © MIT Press).
TABLE 2
Examples of the output neural code for particular words belonging to the network’s vocabulary
Output neurons (n = 43): 1, 3, 12, 14, 20, 21, 25, 21, 28, 39, 40, 41, 42, 43
Feature codes: Noun, Human, Jane, Cop, Verb, Complement-animate, Complement-null, Kiss, Miss, Adjective, Age-attribute, Size-attribute, Diminutive, Superlative
Words: Boy, Jane, Cop, Kiss, Miss, Run, Small
[Fig. 3 plots: panels A and B; horizontal axis categories Undegraded and Degraded; legend: grammatical word sequences, randomized word sequences.]
Fig. 3. Testing the network with grammatical versus randomized word sequences demonstrates that network performance is highly sequence-dependent (from Hoffman and McGlashan (1997), p. 1686 © American Psychiatric Association).
Neuroanatomic alterations
The results of progressive pruning of working memory connections on detection of degraded stimuli are illustrated in Fig. 4(A). Dramatic increases in word detection and decreases in word misidentifications were noted, with optimal performance at pruning levels of approximately 60%. Beyond that point, perceptual abilities trailed off, and simulated ‘hallucinations’ were noted at a pruning level of approximately 80%. Hallucinations consisted of a single word, ‘won’t’. Progressive increases in excitotoxic cell death of temporary storage layer neurons also brought about increased perceptual performance, though to a somewhat lesser degree than pruning (Fig. 4(B)). Beyond an optimum level, performance again declined. No hallucinations were noted following these alterations. Excitotoxic cell death applied to hidden layer neurons only impaired perceptual function, even at low levels. For instance, the loss of as few as 3 of 40 neurons produced a significant reduction in word detection rate (88%). These perceptual impairments worsened dramatically as additional neurons were lost. No hallucinations were produced by this manipulation. Eliminating neurons by random selection or by choosing those neurons which were minimally activated produced neither perceptual enhancement nor hallucinations, regardless of whether it was applied to the hidden layer or the temporary storage layer.
Treatment of ‘hallucinations’
Hallucinations due to pruning could be ‘treated’ by resetting the bias level of hidden layer neurons to negative values (Table 3). Hallucinations were readily eliminated, and at even more negative bias values word detection rates were restored to near normal levels when inputs were not degraded. However, phonetic degradation revealed significant impairments even when hallucinations were fully suppressed. No other neuromodulatory manipulation of the hidden layer produced a ‘therapeutic effect’ with regard to hallucinations.
[Fig. 4 plots: panels A and B; horizontal axis labels Percent Synapses Eliminated and Percent Cells Eliminated.]
Fig. 4. Effects of pruning connections in working memory (A) compared with excitotoxic elimination (B). Both manipulations produced some improvement in network performance at low levels. However, only pruning connections produced hallucinations when neuroanatomic alterations occurred in excess (from Hoffman and McGlashan (1997), p. 1687 © American Psychiatric Association).
Discussion
Randomization of word order of input sentences dramatically disrupted performance of the 'standard' network. It is clear therefore that the network relied on grammatical structure of word sequences represented in working memory in order to decode 'phonetic' inputs. This aspect of the network's behavior provided the basis for investigating the impact of neuroanatomic alterations on perceptual processes critical to human communication. These efforts yielded two results.
TABLE 3
'Treating' an hallucinogenic network due to connectivity reductions with neuromodulatory adjustment
                                   Bias levels for network with 81% pruning   Unpruned network
                                   0       -0.4    -0.8    -1.2               (bias = 0)
Hallucination rate                 7       0       0       0                  0
Word detection rate, no noise      89      96      98      96                 98
Word detection rate, low noise     65      72      81      80                 91
Word detection rate, high noise    44      54      61      62                 82
First, the functional utility of pruning connectivity in the working memory module up to a certain level was clearly demonstrated. We previously noted that synaptic pruning conserves the energy required for cortical function (Hoffman and Dobscha, 1989). This advantage provides one rationale for its occurrence during development. However, our current study demonstrates clearly that these changes not only enhance energy efficiency but also significantly increase the actual information processing capacity of the system. Moreover, we have demonstrated that similar advantages could be obtained from excitotoxic cell death when applied to temporary storage neurons. These findings are relatively specific insofar as cell elimination by other methods did not enhance perceptual function. These observations provide an account of the relative ubiquity of pruning connections and cell death as neurodevelopmental processes in different brain regions and animal species. One other research group has addressed similar issues and found that pruning connections enhanced memory capacity of a simulated associative memory network (Chechik et al., 1998, in press). Their study found that maximum enhancement of function occurred when 60-70% of connections were eliminated, which is very close to our optimum pruning level, even though our neural architecture and information processing tasks were dissimilar. The congruence of our two studies suggests that these findings may be applicable to many different types of networks, both artificial and biological. Nonetheless, our model of neurodevelopment and language processing remains relatively untested insofar as there are no empirical studies of human cortical microanatomy as it relates to language development. Thus we must turn to animal studies. Of considerable interest are studies of song acquisition in songbirds, which demonstrated reductions of synapses in brain areas responsible for this communication function (Scheich et al., 1991). Birdsong is not language, but it is a highly structured sequential system directing production and perception of sound sequences. Perhaps a parallel developmental process occurs in humans, where cortical pruning of synapses results in enhanced efficiency in processing sequential linguistic behavior. In bird studies, loss of synapses was accompanied by inability to learn new birdsong sequences.
These findings also have a parallel in human studies of language processing insofar as there is a loss of flexibility in second language learning following childhood (Johnson and Newport, 1991). Based on the birdsong studies and our computer simulation, one could postulate that more advanced levels of pruning during adolescence produce greater efficiency of language processing but at the cost of reducing flexibility for learning new language systems.
Our second major finding was that pruning, which enhanced perceptual capacity at low levels, produced hallucinated speech when applied excessively. This conceptual linkage is of considerable interest given the high prevalence of this symptom in schizophrenia, the characteristic age of onset of this disorder, the dramatic loss of frontal synapses occurring in adolescence, and numerous studies suggesting further losses of synapses and connectivity in schizophrenia. Studies of symptom course have suggested that early manifestations of schizophrenia actually emerge 2-3 years earlier than the first presentation for treatment, thus placing the actual onset of the disorder squarely within adolescence in many cases (Larsen et al., 1996). Therefore it is plausible that factors which regulate synaptic pruning in adolescence could contribute significantly to the pathogenesis of schizophrenia. Schizophrenic patients are less likely to marry and have children. This observation raises a critical question: Why has the genetic predisposition for schizophrenia remained robust in diverse human populations in spite of clear-cut fertility disadvantages (Crow, 1995)? Our model suggests that genes which lead to postnatal corticocortical connectivity reductions might be advantageous cognitively up to a certain point (and hence selected for) but in certain combinations could produce too much pruning, with psychosis resulting. Insofar as pruning at low levels helped the network to ‘fill in the gaps’ during perceptual processing, it is not surprising that additional pruning could push the network to ‘hallucinate’ speech percepts. A more careful examination of this phenomenon as it occurs in the model indicates that it arises specifically from excessive sequential expectations derived from working memory: the spontaneous percept, ‘won’t’, only followed certain prior nouns but not other words.
At the level of neuronal functioning, we observed that these events corresponded to resonant activation sustained in the working memory system. This finding is of interest given the large number of studies indicating working memory impairments in schizophrenia (Park and Holzman, 1992; Gold et al., 1997). Many researchers have postulated that working memory alterations can be linked conceptually to negative symptoms due to blunting of cognitive abilities (Wolkin et al., 1992). Our model suggests that this functional system can also produce spurious outputs productive of positive symptoms. The fact that hallucinations were not generated by the cell death model suggests that a relatively full array of neurons must be present in order to produce spontaneous, intact neural representations which are experienced as ‘hallucinogenic’. This does not rule out the possibility that in some cases of schizophrenia there is neuronal loss in some brain regions, but only suggests that these neuroanatomic alterations do not produce auditory hallucinations and related positive symptoms. Standard medications used to treat schizophrenia block the effects of dopamine on neuronal behavior. Dopamine does not directly mediate information transmission between neurons but instead modifies how neurons communicate with other neurons (Bunney et al., 1991). These changes occur over much more sustained periods of time compared to the rapid fluctuations of inhibitory and excitatory influences generated by ‘classical neurotransmitters’ (Servan-Schreiber et al., 1990). Dopamine is therefore referred to as a neuromodulator. In general, all antipsychotic medications demonstrate dopamine-blocking effects. Consequently, it has often been assumed that schizophrenia arises from excessive dopamine drive. Our model challenges this view. Shifts in the excitability of working memory neurons in our model produced improvement in hallucinations even though the ‘primary pathology’ was neuroanatomic (i.e. pruning of connections between working memory neurons). In other words, a ‘neuromodulatory shift’ was able to compensate for neuroanatomic pathology. This alternative view of the role of dopamine in schizophrenia is consistent with the fact that direct evidence for alterations in central dopamine systems in schizophrenic patients is sparse.
Cerebrospinal fluid (CSF), postmortem and in vivo neuroimaging studies have not produced convincing evidence of excessive dopamine or dopamine receptors in the brains of schizophrenics (see Lieberman and Koreen, 1989). Also in support of our model is an early study examining effects of antipsychotic drugs on cortical neuron behavior. Rolls et al. (1984) found that trifluoperazine enhanced excitability of prefrontal neurons in awake monkeys. Another study has demonstrated that both haloperidol and clozapine decreased the inhibitory amino acid GABA in frontal regions (Bourdelais and Deutch, 1994). The likely consequence of decreased GABA is enhanced neuronal excitability; it is well known that antipsychotic drugs reduce the threshold for seizure induction. This could only occur if a primary effect of these drugs is activating. Finally, multiple studies have shown that dopamine inhibits both baseline firing of single cortical neurons (Bunney and Aghajanian, 1976; Sesack and Bunney, 1989) as well as evoked responses (Ferron et al., 1984; Glowinski et al., 1984; Stanzione et al., 1984; Thierry et al., 1988; Parfitt et al., 1990; Law-Tho et al., 1994). As indicated above, antipsychotic drugs have dopamine-blocking effects, which would be expected to have the opposite effect, namely enhanced excitability, as observed in the Rolls et al. (1984) study. However, evidence for amplification of response to inputs relative to baseline firing (i.e. enhanced signal-to-noise) has also been reported for dopamine (Penit-Soria et al., 1987; Sawaguchi, 1987; Yang and Mogenson, 1990). Thus there is considerable uncertainty regarding both the effects of dopamine on neurons in the cerebral cortex and the post-synaptic effects of dopamine-blocking drugs on neuronal behavior. Our model suggests that alterations in overall excitability rather than in signal-to-noise response of neurons are most pertinent to understanding effects of antipsychotic drugs. An alternative model of positive symptoms has recently been presented which postulates that mesocortical dopamine ordinarily is low or normal in schizophrenic patients but surges to pathological levels in response to stress (Deutch and Roth, 1990; Deutch, 1993). These surges can in theory be blocked with antipsychotic drug treatment.
An appeal of this model is that it accounts for the waxing and waning course of active symptoms in schizophrenia. Our model is not inconsistent with the Deutch model insofar as stress-induced fluctuations in dopamine (considered as positive bias) are also predicted to cause emergence of positive symptoms.
Our model also demonstrates how positive symptoms are more responsive to treatment than cognitive deficits (Goldberg et al., 1993). Compensatory shifts in bias (the simulated ‘treatment’) still left residual impairments in word detection and elevated misidentification levels when phonetic inputs were degraded. The model predicts that narrative speech perception abilities of schizophrenic patients reporting ‘voices’ are reduced compared to non-hallucinating schizophrenics, especially when phonetic clarity is reduced. These differences were in fact demonstrated in two studies of schizophrenic patients using a speech tracking task (Hoffman et al., 1995; Hoffman et al., 1999). A number of issues are obviously not addressed by our simulations. For instance, the model does not provide an explanation as to why neuroleptic drugs take days to weeks in order to achieve antipsychotic effects. Our simulation suggests that dopamine blockade could immediately reverse hallucinosis unless neuroanatomic changes are very severe. These time delays could reflect the fact that certain response patterns become over-represented due to repetition and need to be ‘unlearned’ over time. There is some evidence that monoamine neuroregulatory systems affect longer-term neuroplasticity; they could therefore affect learning or unlearning. Furthermore, it may be that compensatory processes come into play when antipsychotic drugs are administered, so that effects on bias are not fully expressed until days or weeks have passed. Second, our simulation only addresses a single psychotic phenomenon, namely hallucinations. Other psychotic symptoms may have different mechanisms. A computer model of story narrative has an architecture which partially overlaps ours (Miikkulainen, 1993). It is possible that pruning the working memory component of narrative memory could produce story elements which intrude into consciousness as ‘autonomous representations’, thus providing a model for idée fixe type delusions.
The model presented above has other shortcomings in terms of its fidelity in simulating human neurobiology, including the simplicity of neuronal types and architecture and the learning paradigm
used. But, as stated above, a model should not be judged on its complexity, but on its ability to provide unitary accounts for real world observations. Many facets of schizophrenia remain mysterious, perplexing and even paradoxical. We predict that human intuition alone, no matter how well informed, will be unable to untangle these conundrums and that neural models will be needed to assemble a comprehensive picture of this disorder.
Acknowledgement
Supported in part by NIMH grant MH-50557.
References Amit, D.J., Brunel, N. and Tsodyks, M.V. (1994) Correlations of cortical Hebbian reverberations: theory versus experiment. J . Neumsci., 14: 6435-6445. Andreasen, N.C. and Flaum, M. (1991) Schizophrenia: the characteristic symptoms. Schizophu. Bull., 17: 27-50. Arnold, S.E., Lee, V.M.-Y., Cur, R.E. and Trojanowski, J.Q. ( I99 1 ) Abnormal expression of two microtubule-associated proteins (MAP2 and MAP5) in specific subfields of the hippocampal formation. Proc. Natl. Acad. ScI. USA, 88: 10850-10854. Baddeley, A. (1992) Working memory. Science, 255: 556-559. Bourdelais, A.J. and Deutch, A.Y. (1994) The effects of haloperidol and clozapine on extracellular GABA levels in the prefrontal cortex of the rat: an in vivo microdialysis study. Cereb. C o r t a , 4: 69-17. Braff, D.L. and Geyer, M.A. (1990) Sensorimotor gating and schizophrenia. Arch. Gen. Psychiatry, 47: 18 1-188. Braitenberg, V. (1978) Cortical architectonics: General and Areal. In: M.A.B. Brazier and H. Petsche (Eds.), Architectonics of the Cerebral Cortex. New York: Raven Press. Breier, A,, Buchanan, R.W., Elkashef, A,, Munson, R.C., Kirkpatrick, B. and Gellad, E (1992) Brain morphology and schizophrenia: A magnetic resonance imaging study of limbic, prefrontal cortex, and caudate structures. Arch. Geri. PsYcII.,49: 921-926. Browning, M.D., Dudek, E.M., Rapier, J.L.. Leonard, S. and Freedman, R. (1993) Significant reductions in synapsin but not synaptophysin specific activity in the brains of some schizophrenics. B i d Psychiatvy, 34: 529-535. Bunney, B.S. and Aghajanian, G.K. (1976) Dopamine and norepinephrine innervated cells in rat prefrontal cortex: pharmacological differentiation using microiontophoretic techniques. Life Science, 19: 1783-1792.
Bunney, B.S., Chiodo, L.A. and Grace, A.A. (1991) Midbrain dopamine system electrophysiological functioning: A review and new hypothesis. Synapse, 9: 79-94.
Chechik, G., Meilijson, I. and Ruppin, E. (1998) Synaptic pruning in development: A computational account. Neural Computation, 10: 1759-1777.
Chechik, G., Meilijson, I. and Ruppin, E. The role of neuronal regulation in synaptic pruning: A computational study. Neural Computation, in press.
Cohen, J.D. and Servan-Schreiber, D. (1992) Context, cortex and dopamine: A connectionist approach to behavior and biology in schizophrenia. Psychol. Rev., 99: 45-77.
Crow, T.J. (1995) A Darwinian approach to the origins of psychosis. Br. J. Psychiatry, 167: 12-25.
Deutch, A.Y. (1993) Prefrontal cortical dopamine systems and the elaboration of functional corticostriatal circuits: Implications for schizophrenia and Parkinson’s Disease. J. Neural Trans., 91: 197-221.
Deutch, A.Y. and Roth, R.H. (1990) The determinants of stress-induced activation of the prefrontal cortical dopamine system. Prog. Brain Res., 85: 367-402.
Eastwood, S.L., Burnet, P.W. and Harrison, P.J. (1995) Altered synaptophysin expression as a marker of synaptic pathology in schizophrenia. Neuroscience, 66: 309-19.
Edelman, G.M. (1978) Neural Darwinism. Basic Books, New York.
Elman, J.L. (1990a) Finding structure in time. Cogn. Sci., 14: 179-211.
Elman, J.L. (1990b) Representation and structure in connectionist models. In: G.P. Altman (Ed.), Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives (pp. 345-382). MIT Press, Cambridge, Mass.
Falkai, P. and Bogerts, B. (1986) Cell loss in the hippocampus of schizophrenics. Eur. Arch. Psychiatry Neurol. Sci., 236: 154-161.
Feinberg, I. (1982/1983) Schizophrenia: caused by a fault in programmed synaptic elimination during adolescence? J. Psychiatr. Res., 17: 319-334.
Ferron, A., Thierry, A.M., Le Douarin, C. and Glowinski, J. (1984) Inhibitory influence of the mesocortical dopaminergic system on spontaneous activity or excitatory response induced from the thalamic mediodorsal nucleus of the rat medial prefrontal cortex. Brain Res., 302: 257-265.
Frith, C.D. and Done, D.J. (1988) Toward a neuropsychology of schizophrenia. Br. J. Psychiatry, 153: 437-443.
Garey, L.J., Ong, W.Y., Patel, T.S., Kanani, M., Davis, A., Hornstein, C. and Bauer, M. (1995) Reduction in dendritic spine number on cortical pyramidal neurons in schizophrenia (abstract). Soc. Neuroscience, 21: 237.
Geinisman, Y., Morrell, F. and deToledo-Morrell, L. (1988) Remodeling synaptic architecture during hippocampal ‘kindling’. Proc. Natl. Acad. Sci. USA, 85: 3260-3264.
Glantz, L.A. and Lewis, D.A. (1995) Assessment of spine density on layer III pyramidal cells in the prefrontal cortex of schizophrenic subjects (abstract). Soc. Neurosci. Abstr., 21: 239.
Gleick, J. (1987) Chaos. New York, Viking, pp. 53-56.
Glowinski, J., Tassin, J.P. and Thierry, A.M. (1984) The mesocortico-prefrontal dopaminergic neurons. Trends Neurosci., 7: 415-418.
Gold, J.M., Carpenter, C., Randolph, C., Goldberg, T.E. and Weinberger, D.R. (1997) Auditory working memory and Wisconsin Card Sorting Test performance in schizophrenia. Arch. Gen. Psychiatry, 54: 159-165.
Goldberg, T.E., Greenberg, R.D., Griffin, S.J., Gold, J.M., Kleinman, J.E., Pickar, D., Schulz, S.C. and Weinberger, D.R. (1993) The effect of clozapine on cognition and psychiatric symptoms in patients with schizophrenia. Br. J. Psychiatry, 162: 43-8.
Goldman-Rakic, P.S. and Friedman, H.R. (1991) The circuitry of working memory revealed by anatomy and metabolic imaging. In: H. Leven and H.M. Eisenberg (Eds.), Frontal Lobe Function and Injury, New York: Oxford University, pp. 72-91.
Grunze, H.C.R., Rainnie, D.G., Hasselmo, M.E., Barkai, E., Hearn, E.F., McCarley, R.W. and Greene, R.W. (1996) NMDA-dependent modulation of CA1 circuit inhibition. J. Neurosci., 2034-2043.
Hoffman, R.E. (1986) Verbal hallucinations and language production processes in schizophrenia. Behav. Brain Sci., 9: 503-548.
Hoffman, R.E. and Dobscha, S. (1989) Cortical pruning and the development of schizophrenia: A computer model. Schizophr. Bull., 15: 477-490.
Hoffman, R.E., Rapaport, J., Ameli, R., McGlashan, T.H., Harcherik, D. and Servan-Schreiber, D. (1995) A neural network simulation of hallucinated ‘voices’ and associated speech perception impairments in schizophrenic patients. J. Cogn. Neurosci., 7: 479-496.
Hoffman, R.E., Rapaport, J., Mazure, C.M. and Quinlan, D.M. (1999) Schizophrenic patients reporting hallucinated ‘voices’ demonstrate selective speech perception alterations. Am. J. Psychiatry, 156: 393-399.
Hoffman, R.E. and McGlashan, T.H. (1997) Synaptic elimination, neurodevelopment, and the mechanism of hallucinated ‘voices’ in schizophrenia. Am. J. Psychiatry, 154: 1683-1689.
Huttenlocher, P. (1979) Synaptic density in human frontal cortex - developmental changes and effects of aging. Brain Res., 163: 195-205.
Hyde, T.M., Ziegler, J.C. and Weinberger, D.R. (1992) Psychiatric disturbances in metachromatic leukodystrophy: Insights into the neurobiology of psychosis. Arch. Neurol., 49: 401-406.
Johnson, J.S. and Newport, E.L. (1991) Critical period effects on universal properties of language: the status of subjacency in the acquisition of a second language. Cognition, 39: 215-58.
Kalikov, D.N. and Stevens, K.N. (1977) Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. J. Acoust. Soc. Amer., 5: 1337-1351.
Keshavan, M.S., Anderson, S. and Pettegrew, J.W. (1994) Is schizophrenia due to excessive synaptic pruning in the prefrontal cortex? The Feinberg hypothesis revisited. J. Psychiatr. Res., 28: 239-265.
Larsen, T.K., McGlashan, T.H. and Moe, L.C. (1996) First-episode schizophrenia: I. Early course parameters. Schizophr. Bull., 22: 241-56.
Law-Tho, D., Hirsch, J.C. and Crepel, F. (1994) Dopamine modulation of synaptic transmission in rat prefrontal cortex: an in vitro electrophysiological study. Neurosci. Res., 21: 151-60.
Lieberman, J.A. and Koreen, A.R. (1989) Neurochemistry and neuroendocrinology of schizophrenia: A selective review. Schizophr. Bull., 19: 371-429.
Margolis, R.L., Chuang, D-M. and Post, R.M. (1995) Programmed cell death: Implications for neuropsychiatric disorders. Biol. Psychiatry, 35: 946-956.
McClelland, J.L. and Elman, J.L. (1986) Interactive processes in speech perception: The TRACE model. In: J.L. McClelland and D.E. Rumelhart (Eds.), Parallel Distributed Processing, Vol. 2 (pp. 58-121). MIT Press, Cambridge, Mass.
McGuire, P.K., Silbersweig, D.A., Wright, I., Murray, R.M., Frackowiak, R.S. and Frith, C.D. (1996) The neural correlates of inner speech and auditory verbal imagery in schizophrenia: relationship to auditory verbal hallucinations. Br. J. Psychiatry, 169: 148-5.
Miikkulainen, R. (1993) Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon and Memory. MIT Press.
Nelson, P.G., Fields, R.D., Yu, C. and Neale, E.A. (1990) Mechanisms involved in activity-dependent synapse formation in mammalian central nervous system cell cultures. J. Neurobiol., 21: 138-156.
Olney, J.W. and Farber, N.B. (1995) Glutamate receptor dysfunction and schizophrenia. Arch. Gen. Psychiatry, 52: 998-1007.
Parfitt, K.D., Gratton, A. and Bickford-Wimer, P.C. (1990) Electrophysiological effects of selective D1 and D2 dopamine receptor agonists in the medial prefrontal cortex of young and aged Fischer 344 rats. J. Pharmacol. Exp. Ther., 25: 539-545.
Park, S. and Holzman, P.S. (1992) Schizophrenics show spatial working memory deficits. Arch. Gen. Psychiatry, 49: 975-982.
Penit-Soria, J., Audinat, E. and Crepel, F. (1987) Excitation of rat prefrontal cortical neurons by dopamine: An in vitro electrophysiology study. Brain Res., 25: 263-274.
Pettegrew, J.W., Keshavan, M.S., Panchalingam, K., Strychor, S., Kaplan, D.B., Tretta, M.G. and Allan, M. (1991) Alterations in brain high-energy phosphate and membrane phospholipid metabolism in first-episode, drug-naive schizophrenics: A pilot study of the dorsal prefrontal cortex by in vivo phosphorus 31 nuclear magnetic resonance spectroscopy. Arch. Gen. Psychiatry, 48: 563-568.
Rolls, E.T., Thorpe, S.J., Boytim, M., Szabo, I. and Perret, D.I. (1984) Responses of striatal neurons in the behaving monkey. 3. Effects of iontophoretically applied dopamine on normal responsiveness. Neurosci., 12: 1201-1212.
Sartorius, N., Shapiro, R. and Jablonsky, A. (1974) The international pilot study of schizophrenia. Schizophr. Bull., 1: 21-35.
Sawaguchi, T. (1987) Catecholamine sensitivities of neurons related to visual reaction time task in the monkey prefrontal cortex. J. Neurophysiol., 58: 1100-1122.
Scheich, H., Wallhausser-Franke, E. and Braun, K. (1991) Does synaptic selection explain auditory imprinting. In: L.R. Squire, N.M. Weinberger, G. Lynch, and J.L. McGaugh (Eds.), Memory: Organization and Locus of Change, pp. 114-159. New York, Oxford University Press.
Selemon, L.D., Rajkowska, G. and Goldman-Rakic, P.S. (1995) Abnormally high neuronal density in the schizophrenic cortex: A morphometric analysis of prefrontal area 9 and occipital area 17. Arch. Gen. Psychiatry, 52: 805-818.
Servan-Schreiber, D., Printz, H.W. and Cohen, J.D. (1990) A network model of catecholamine effects. Science, 249: 892-895.
Servan-Schreiber, D., Cohen, J.D. and Steingard, S. (1996) Schizophrenic deficits in the processing of context: A test of a theoretical model. Arch. Gen. Psychiatry, 53: 1105-1112.
Sesack, S.R. and Bunney, B.S. (1989) Pharmacological characterization of the receptor mediating electrophysiological responses to dopamine in the rat medial prefrontal cortex: a microiontophoretic study. J. Pharmacol. Exp. Therapeut., 248: 1323-1333.
Silbersweig, D.A., Stern, E., Frith, C., Cahill, C., Holmes, A., Grootoonk, S., Seaward, J., McKenna, P., Chua, S.E., Schnoor, L., Jones, T. and Frackowiak, R.S.J. (1995) A functional neuroanatomy of hallucinations in schizophrenia. Nature, 378: 176-179.
Stanley, J.A., Williamson, P.C., Drost, D.J., Carr, T.J., Rylett, J., Malla, A. and Thompson, T. (1995) An in vivo study of the prefrontal cortex of schizophrenic patients at different stages of illness via phosphorus magnetic resonance spectroscopy. Arch. Gen. Psychiatry, 52: 399-406.
Stanzione, P., Calabresi, P., Mercuri, N. and Bernardi, G. (1984) Dopamine modulates CA1 hippocampal neurons by elevating the threshold for spike generation: An in vitro study. Neurosci., 13: 1105-1116.
Suzuki, M., Yuasa, S., Minabe, Y., Murata, M. and Kurachi, M. (1993) Left superior temporal blood flow increases in schizophrenic and schizophreniform patients with auditory hallucinations: a longitudinal case study using 123I-IMP SPECT. Eur. Arch. Psychiatry Clin. Neurosci., 242: 257-261.
Thierry, A.M., Mantz, J., Milla, C. and Glowinski, J. (1988) Influence of the mesocortical prefrontal dopamine neurons on their target cells. Ann. N.Y. Acad. Sci., 537: 101-111.
Tiihonen, J., Hari, R., Naukkarinen, H., Rimon, R., Jousmaki, V. and Kajola, M. (1992) Modified activity of the human auditory cortex during auditory hallucinations. Am. J. Psychiatry, 149: 255-257.
Warren, R.M. and Warren, R.P. (1970) Auditory illusions and confusions. Scien. Amer., 223: 30-36.
Weinberger, D.R. (1987) Implications of normal brain development for the pathogenesis of schizophrenia. Arch. Gen. Psychiatry, 44: 660-669.
Weinberger, D.R., Berman, K.F., Suddath, R. and Torrey, E.F. (1992) Evidence for dysfunction of a prefrontal-limbic network in schizophrenia: an MRI and rCBF study of discordant monozygotic twins. Am. J. Psychiatry, 149: 890-897.
Wolkin, A., Sanfilipo, M., Wolf, A.P., Angrist, B., Brodie, J. and Rotrosen, J. (1992) Negative symptoms and hypofrontality in chronic schizophrenia. Arch. Gen. Psychiatry, 49: 959-965.
Woods, W.A. (1982) HWIM: A speech understanding system on a computer. In: M.A. Arbib, D. Caplan and J.C. Marshall (Eds.), Models of Language Processes (pp. 95-113). Academic Press, New York.
Woodruff, P.W.R., Wright, I.C., Bullmore, E.T., Brammer, M., Howard, R.J., Williams, S.C.R., Shapeleske, J., Russell, S., David, A.S., McGuire, P.K. and Murray, R.M. (1997) Auditory hallucinations and the temporal cortical response to speech in schizophrenia: A functional magnetic resonance imaging study. Am. J. Psychiatry, 154: 1676-1682.
Yang, C.R. and Mogenson, G.J. (1990) Dopaminergic modulation of cholinergic responses in rat medial prefrontal cortex: an electrophysiological study. Brain Res., 524: 271-281.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 19
Dopamine, cognitive control, and schizophrenia: the gating model

Todd S. Braver1,* and Jonathan D. Cohen2

1 Department of Psychology, Washington University, St. Louis, MO, USA
2 Department of Psychology, Princeton University, Princeton, NJ, USA and Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
*Corresponding author. E-mail: tbraver@artsci.wustl.edu

Introduction

The most prominent behavioral impairments in schizophrenia revolve around the failure to control thoughts and actions. These failures of cognitive control are manifest clinically in symptoms such as distractibility, loosening of associations, and disorganized or socially inappropriate behavior. In the laboratory, these disturbances have been observed as deficits of attention (Zubin, 1975; Kornetsky and Orzack, 1978; Wynne et al., 1978; Nuechterlein, 1991; Cornblatt and Keilp, 1994), working memory (Weinberger et al., 1986; Goldman-Rakic, 1991; Park and Holzman, 1992), and behavioral inhibition (Wapner and Krus, 1960; Chapman et al., 1964; Storms and Broen, 1969; Abramczyk et al., 1983; Wysocki and Sweet, 1985; Manschreck et al., 1988; Carter et al., 1993). However, it is still not well understood what neurobiological and psychological disturbances contribute to cognitive control impairments in schizophrenia. In this chapter, we set forth a theory of cognitive control that is formalized as a connectionist computational model. The theory suggests explicit neural and psychological mechanisms that contribute to normal cognitive control, and proposes a specific disturbance to these mechanisms which may capture the particular impairments in cognitive control in schizophrenia.

To preview, we propose that cognitive control results from interactions between the dopamine (DA) neurotransmitter system and the prefrontal cortex (PFC). Specifically, we suggest that goal-related information, or context, is actively maintained in PFC, and thus serves as a source of top-down support for controlling behavior. We suggest that the DA projection to PFC serves a 'gating' function, by regulating access of context representations into active memory. As such, DA plays an important control function, by enabling flexible updating of active memory in PFC, while retaining protection against interference. Moreover, we suggest that in schizophrenia, the activity of the DA system is noisier, and that this increased variability leads to disturbances in both the updating and maintenance of context information within working memory.

Below, we lay out this theory of cognitive control, motivating it from cognitive, computational, and neurobiological perspectives. Following this review, we present two simulations which establish: (1) the model's computational plausibility; and (2) its success at capturing empirical data regarding the behavioral deficits observed in patients with schizophrenia during performance of a simple cognitive control task.
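The noisy-gating proposal previewed above can be made concrete with a small sketch. It is not one of the simulations reported in this chapter. It only assumes, for illustration, that the gate opens whenever a DA-like signal exceeds a threshold, that this signal is a unit burst at cue onset and zero during the delay, and that the schizophrenic case can be approximated by adding Gaussian noise to the signal. The function name and all parameters (noise_sd, burst, threshold, delay_steps, seed) are hypothetical.

```python
import random


def count_gating_failures(noise_sd, n_trials=1000, delay_steps=10,
                          burst=1.0, threshold=0.5, seed=0):
    """Count trials with an update failure and with a maintenance failure.

    Assumed toy scheme (not the chapter's model): the gate opens whenever
    the DA-like signal exceeds `threshold`. The signal is `burst` at cue
    onset and 0 during the delay, plus Gaussian noise with SD `noise_sd`.
    """
    rng = random.Random(seed)
    update_failures = 0
    maintenance_failures = 0
    for _ in range(n_trials):
        # Cue onset: the gate should open so the cue enters active memory.
        if burst + rng.gauss(0.0, noise_sd) <= threshold:
            update_failures += 1
        # Delay: the gate should stay closed so that nothing intrudes.
        spurious = [rng.gauss(0.0, noise_sd) > threshold
                    for _ in range(delay_steps)]
        if any(spurious):
            maintenance_failures += 1
    return update_failures, maintenance_failures


clean = count_gating_failures(noise_sd=0.0)
noisy = count_gating_failures(noise_sd=0.5)
print("no noise   (update, maintenance) failures:", clean)
print("with noise (update, maintenance) failures:", noisy)
```

With noise_sd set to 0 the gate opens at the cue and stays closed during the delay, so both counts are zero. Raising noise_sd produces both kinds of error in this toy scheme: missed openings at cue onset (failures of updating) and spurious openings during the delay (failures of maintenance).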
A theory of dopaminergic regulation of active memory

PFC and active memory

Cognitive perspectives. The need for a control mechanism in cognition has long been noted within psychology. Virtually all theorists agree that some mechanism is needed to guide, coordinate, and update behavior in a flexible fashion - particularly in novel or complex tasks (Norman and Shallice, 1986). In particular, control over processing requires that information related to behavioral goals be actively represented and maintained, such that these representations can bias behavior in favor of goal-directed activities over temporally extended periods. Moreover, goal-related information must be: (1) appropriately selected for maintenance; (2) maintained for arbitrary lengths of time; (3) protected against interference; and (4) updated at appropriate junctures. The recognition that active representation and maintenance of goal-related information is a central component of cognitive control can be seen in many theories. Perhaps the most explicit of these is Baddeley’s working memory model (Baddeley, 1986), which includes a specific subcomponent, ‘the central executive’, responsible for utilizing goal-related information in the service of control. The postulation of a cognitive system involved in executive control closely parallels theorizing regarding the nature of frontal lobe function (Bianchi, 1922; Luria, 1969; Damasio, 1985), based on the clinical observation that patients with frontal lesions often exhibit impairments in tasks requiring control over behavior - the so-called ‘dysexecutive syndrome’. However, traditional theories have not specified the mechanisms by which the executive operates.

Theories aimed at providing a more explicit computational account of human behavior have also included goal representations as a central component. In production system models, goal states represented in declarative memory are used to coordinate the sequences of production firings involved in complex behaviors (e.g. Anderson, 1983). One critical feature of goal representations in production systems is that they are actively represented and maintained throughout the course of a sequence of behaviors. Moreover, Shallice (Shallice, 1982, 1988; Norman and Shallice, 1986) has relied upon the production system framework to put forth his Supervisory Attentional System (SAS) as a mechanism by which complex cognitive processes are coordinated and nonroutine actions are selected.

In our own work, we have suggested that the active maintenance of context information is critical for cognitive control (Cohen and Servan-Schreiber, 1992; Cohen, Braver and O’Reilly, 1996; Braver, 1997). We have defined context to be prior task-relevant information that is internally represented in such a form that it can bias selection of the appropriate behavioral response. In particular, we have suggested that representations of context can include task instructions, a specific prior stimulus, or the result of processing a sequence of prior stimuli (e.g. the interpretation resulting from processing a sequence of words in a sentence). Because context representations are maintained on-line, in an active state, they are continually accessible and available to influence processing. Consequently, context can be thought of as a component of working memory, which is commonly defined as the collection of processes responsible for the on-line maintenance and manipulation of information necessary to perform a cognitive task (Baddeley and Hitch, 1994). Context can be viewed as the subset of representations within working memory which govern how other representations are used. Representations of context are particularly important for situations in which there is strong competition for response selection. These response competition situations may arise when the appropriate response is one that is relatively infrequent, or when the inappropriate response is prepotent (such as in the classic Stroop task). In this respect, context representations are closely related to goal representations within production system architectures. Maintenance of internal goal representations, or goal-related knowledge, is critical for initiating the selection of ‘weak’ behaviors, and for coordinating their execution over temporally extended periods, while at the same time suppressing competing, possibly more compelling behaviors. Next, we discuss evidence that context information is actively maintained within PFC.
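How an actively maintained context representation could tip a response competition can be sketched in a few lines. The sketch is an illustration under assumed values, not the authors' simulation: two response units receive stimulus input through a strong (prepotent) pathway and a weak pathway, and context adds top-down input to the weak, task-appropriate response. The weights, the logistic activation function, and the name select_response are all assumptions made for the example.

```python
import math


def logistic(x):
    """Standard logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def select_response(context_bias, w_prepotent=2.0, w_weak=1.0):
    """Return the winning response and both activations.

    Assumed toy setup: the stimulus drives a prepotent response through a
    strong weight and the task-appropriate response through a weak weight.
    `context_bias` is top-down input from maintained context to the weak
    response only.
    """
    activations = {
        "prepotent": logistic(w_prepotent),
        "weak": logistic(w_weak + context_bias),
    }
    winner = max(activations, key=activations.get)
    return winner, activations


# Without context support the prepotent response wins the competition.
print(select_response(context_bias=0.0))
# With enough top-down support the weak response is selected instead.
print(select_response(context_bias=1.5))
```

In this sketch, removing or degrading the context input (setting context_bias back toward 0) returns control to the prepotent pathway, which is the pattern the text associates with failures of cognitive control.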
Neurobiological perspectives. The PFC has long been an area of particular focus for researchers investigating the neural basis of cognitive control. Over a hundred years of neuropsychological studies have provided strong evidence of the involvement of this brain region in the regulation of behavior. In recent years, a large body of converging evidence from neurophysiology and neuroimaging studies has suggested a more specific role for PFC in the active maintenance of task-relevant information. Single-cell recording studies in non-human primates have typically examined the active maintenance properties of PFC through the use of delayed-response paradigms, in which the animal must maintain a representation of a cue stimulus over some delay, in order to respond appropriately at a later point. It is now well established that during performance of these tasks, populations of neurons in monkey PFC exhibit sustained, stimulus-specific activity during the delay period (Fuster and Alexander, 1971; Kubota and Niki, 1971). The mnemonic properties of these neurons have been demonstrated by showing that both local and reversible lesions to PFC impair task performance, and that performance errors in intact animals are correlated with reduced delay-period activity (Bauer and Fuster, 1976; Fuster, 1973). Neuroimaging studies have confirmed and extended these findings in humans. In the most recent of these studies, PFC activity has been shown to: (1) increase as delay interval increases (Barch et al., 1997); (2) increase as memory load increases (Braver et al., 1997); (3) be sustained over the entire delay interval (Cohen et al., 1997; Courtney et al., 1997).

In addition to these other properties, PFC also appears to be particularly specialized to maintain information in the face of interference, while still allowing for flexible updating of storage. Recently, Miller and colleagues (Miller et al., 1996) have provided direct evidence for this hypothesis. They trained monkeys to respond to repeats of a prespecified cue (e.g., A) when presented with sequences such as A-B-B-A. This task clearly required the ability to identify the cue on each trial, and maintain it across intervening distractors. They observed cue-specific delay period activity for units in both inferotemporal cortex (IT) and PFC following initial presentation. However, subsequent stimuli obliterated this activity in IT, while it was preserved in PFC until a match occurred. The crucial role of PFC in updating and interference protection can also clearly be seen in studies of PFC pathology. Increased distractibility and perseveration are hallmarks of PFC damage (Milner, 1963; Damasio, 1985; Stuss and Benson, 1986; Owen et al., 1991; Engle et al., 1999), as well as a classic symptom of schizophrenia, which is thought to involve PFC abnormalities (Malmo, 1974; Nuechterlein and Dawson, 1984). Together, these findings support the idea that there are specialized mechanisms within PFC for active memory, as well as for protecting maintained information from both perseveration and interference, and that these mechanisms are disrupted in schizophrenia.

Computational perspectives. From a computational viewpoint there are a number of different processing mechanisms that could support short-term maintenance of information. The most commonly employed and well-understood of these are fixed-point attractor networks (e.g., Hopfield, 1982; Zipser, 1991). Such networks possess recurrent connections, which ‘recirculate’ activation among units, and are thus capable of supporting sustained activity. The state of such networks typically settles into ‘attractors’, defined as stable states in which a particular pattern of activity is maintained. Thus, attractors can be used to actively store information. Indeed, a number of computational models of simple maintenance tasks have demonstrated that both physiological and behavioral data regarding PFC function can be captured using an attractor-based scheme (Dehaene and Changeux, 1989; Zipser et al., 1993; Braver et al., 1995; Moody et al., 1998).

However, simple attractor systems have a number of limitations which create problems in more complex maintenance tasks. These limitations can be traced to the fact that the state of an attractor system is determined by its inputs, so that presentation of a new input will drive the system into a new attractor state, overwriting previously stored information (Bengio et al., 1993; Mozer, 1993). Although attractor networks can be configured to display resistance to disruption from input (i.e. hysteresis), this impairs their ability to be updated in a precise and flexible manner. One way in which attractor networks can overcome these difficulties is through the addition of a gating mechanism. Such systems only respond to inputs, and change their attractor state, when the ‘gate’ is opened. Computational analyses suggest that gating mechanisms provide the most effective way to stably maintain information in an active state, while retaining the ability for flexible updating. For example, Hochreiter and Schmidhuber (1997) have compared gated recurrent neural networks with other types of attractor systems, and concluded that networks with a gating mechanism were able to learn and perform complex short-term memory tasks better than simple attractor networks, especially when the tasks involved noisy environments, frequent updating, and relatively long periods of storage.

The computational studies have suggested that a gated attractor system is the optimal one for active memory. Moreover, the physiological evidence reviewed above is consistent with the hypothesis that PFC implements such a system. Indeed, in previous work, Zipser and colleagues (Zipser, 1991; Zipser et al., 1993; Moody et al., 1998) have proposed a gated attractor model and have used it to successfully simulate the pattern of delay period activity observed for PFC neurons. However, the Zipser model has not specified the source of the gating signal. In the following section, we suggest that phasic increases in DA activity serve as a gating signal within PFC.
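The contrast between a simple and a gated attractor can be illustrated with a deliberately minimal sketch. It is not the Zipser model or the model developed in this chapter: it assumes a single bistable unit with self-excitation (a one-unit stand-in for an attractor network) whose external input is multiplied by a binary gate. The function run_memory and the parameters w_self and theta, together with their values, are assumptions chosen so that the unit has one stable low state and one stable high state.

```python
import math


def logistic(x):
    """Standard logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def run_memory(inputs, gates, w_self=10.0, theta=5.0):
    """Simulate one self-exciting unit; return its activation at each step.

    Assumed toy dynamics: a[t+1] = logistic(w_self * a[t] + g[t] * x[t] - theta).
    External input x reaches the unit only when the gate g is 1.
    """
    a = 0.0
    trace = []
    for x, g in zip(inputs, gates):
        a = logistic(w_self * a + g * x - theta)
        trace.append(a)
    return trace


# A cue (+10) at step 0, a distractor (-10) at step 3, no input otherwise.
inputs = [10.0, 0.0, 0.0, -10.0, 0.0, 0.0]

# Ungated: every input reaches the unit, so the distractor overwrites the cue.
ungated = run_memory(inputs, gates=[1, 1, 1, 1, 1, 1])

# Gated: the gate opens for the cue only, so the distractor is ignored.
gated = run_memory(inputs, gates=[1, 0, 0, 0, 0, 0])

# Gated with a second opening at step 3: the stored state is updated on demand.
updated = run_memory(inputs, gates=[1, 0, 0, 1, 0, 0])

print("ungated:", [round(a, 3) for a in ungated])
print("gated:  ", [round(a, 3) for a in gated])
print("updated:", [round(a, 3) for a in updated])
```

In this sketch the recurrent weight alone sustains the cue across the first delay steps, the ungated unit loses it as soon as the distractor arrives, and the gated unit keeps it while remaining updatable whenever the gate is opened.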
DA modulation of behavior

Schizophrenia. Disturbances to the DA system have long been regarded as central to schizophrenic pathology. Most of the support for this viewpoint comes from observations regarding the therapeutic efficacy of neuroleptics. The finding that the clinical potency of traditional neuroleptics is highly correlated with their affinity for dopamine receptors (Creese et al., 1976) strongly implicates this neurotransmitter in schizophrenic symptomatology. In addition, long-term usage of drugs which stimulate DA activity in the CNS can lead to schizophreniform psychoses (Snyder, 1972). The DA projection to PFC in particular has been a recent focus of attention in schizophrenia research. Specifically, a viewpoint which is rapidly gaining attention is that many of the cognitive impairments seen in schizophrenia are related to reduced DA activity in PFC (Davis et al., 1991; Goldman-Rakic, 1991).

Motor functions. In addition to its involvement in schizophrenia, the DA system has been implicated in a wide range of effects on behavior. The most prominent of these is the linkage of DA with motor function. It is well established that disturbances to the subcortical DA system cause severe movement-related disorders such as Parkinson’s disease. Further, stimulants such as amphetamine and apomorphine (which are thought to act by stimulating DA release; Kelly et al., 1975) have clear effects on motor behavior. In animals, these drugs produce consistent changes in both locomotor activity (Segal, 1975) and the repertoire of behaviors exhibited (Norton, 1973), with high doses inducing species-specific stereotypies (Randrup and Munkvad, 1970). There are also many studies documenting the effect of DA activity on response activity in goal-directed tasks, such as operant conditioning paradigms (Heffner and Seiden, 1980; Louilot et al., 1987). A number of investigators have hypothesized that, together, these findings suggest a function for DA in selecting or initiating new motor response patterns (Iversen, 1984; Oades, 1985).

Reward functions. Another function of DA that has commonly been postulated in the literature is that of processing rewards. This reward-based account of DA activity is supported by findings which suggest a permissive or facilitatory role in a number of primary motivated behaviors, such as feeding, drinking and sexual activity (Willner and Scheel-Kruger, 1991). Conversely, spontaneous engagement in these behaviors has been shown to result in increased DA transmission (Heffner et al., 1980).
In addition, numerous studies have shown that the electrical self-stimulation paradigm is primarily dependent on stimulation of DA pathways (Phillips and Fibiger, 1989; Mora and Cobo, 1990). This finding is consistent with the pharmacological evidence that many drugs of addiction act through the DA system (Koob and Bloom, 1988). Taken together, these findings have led some researchers to postulate a crucial role for DA in conveying information regarding the rewarding or reinforcing properties of specific behaviors (Wise and Rompre, 1989).

Cognitive functions. The literature on the behavioral effects of DA is not limited to studies of motor and reward-related behaviors. There have also been a number of reports of DA effects on cognitive function. In humans, systemic administration of DA agonists has been associated with improvements on various cognitive tasks (Klorman et al., 1984; Callaway et al., 1994). In particular, the most consistent effects of DA on cognition have been in tasks relying on ‘working’ or active memory. These sorts of tasks require subjects to maintain relevant contextual information in an active state (i.e. ‘on-line’), such that it can be used to mediate the appropriate response. DA effects in working memory have been seen systemically in humans (Luciana et al., 1992; Luciana et al., 1995), and through local manipulations in non-human primates (Brozoski et al., 1979; Sawaguchi and Goldman-Rakic, 1994). These local effects in primates have focused on DA activity selective to PFC. For example, Goldman-Rakic and colleagues have found that pharmacologically blocking DA receptors in circumscribed areas of PFC produced reversible deficits in task performance (Sawaguchi and Goldman-Rakic, 1991). Moreover, microiontophoresis of DA agonists and antagonists, and even DA itself, has been found to directly affect the activity patterns of PFC neurons (Sawaguchi, Matsumara, and Kubota, 1990; Sawaguchi and Goldman-Rakic, 1991). Goldman-Rakic and others have concluded from these findings that DA activity serves to modulate the cognitive functions mediated by PFC (Goldman-Rakic, 1991; Cohen and Servan-Schreiber, 1992).

A unitary function? The literature on DA involvement in motor, reward and cognitive functions reveals the widespread influence of this neural system on behavior. Further, the disparate nature of these three domains suggests that DA may perform multiple, unrelated behavioral functions. However, another, more parsimonious explanation is also possible: that DA activity plays a unitary function in the central nervous system which is expressed in different domains as a result of its interaction with the different brain systems to which it projects (i.e. striatal, limbic, and cortical). In this chapter, we put forth the hypothesis that DA does, in fact, play a unitary function in behavior. Specifically, we propose that the function of the DA system is to provide a means for the organism to learn about, predict, and respond appropriately to events that lead to reward. The DA system serves this function through simple neuromodulatory effects in the neural populations which it targets. One effect modulates the responsivity of the target neurons to afferent and local input, and the other effect modulates the synaptic strength between the target and these inputs. The DA effects on synaptic strength serve to drive the learning of temporal predictors of reinforcement, while the effects on responsivity serve to transiently bias on-going processing. Most importantly, we propose that through its projection to PFC, the responsivity effect of DA serves to gate access to active memory, while its coincident learning effect allows the system to discover what information must be actively maintained for performance of a given task. In the remainder of this section, we present the following arguments: (1) DA exerts a modulatory effect on target neurons; (2) this effect is of a type that could be exploited to perform a gating function in PFC; and (3) the role of the DA system in reward-prediction learning provides it with particular activation dynamics and timing that are required of a gating signal.

DA as a modulatory signal. A number of lines of evidence suggest that DA acts in a modulatory fashion in PFC that is consistent with a gating role. The PFC is the most densely innervated cortical target of the DA system. Electron microscopy studies of the local connectivity patterns of DA in PFC have revealed that DA typically makes triadic contacts with prefrontal pyramidal cells and excitatory afferents, and also with inhibitory interneurons (Lewis et al., 1992; Williams and Goldman-Rakic, 1993; Sesack et al., 1995).
The triadic synaptic complexes formed in PFC suggest that DA can modulate both afferent input and local inhibition. Electrophysiological data support this view, indicating that DA potentiates both afferent excitatory and local inhibitory signals (Chiodo and Berger, 1986; Penit-Soria et al., 1987). In our previous work, we have simulated this potentiating effect of
DA as a multiplicative change in the slope of the activation function of target processing units (Servan-Schreiber et al., 1990; Cohen and Servan-Schreiber, 1993; Braver et al., 1995; Servan-Schreiber et al., 1998). In this work, we assumed that DA effects were prolonged or tonic. However, for the modulatory action of DA to be useful as a gating signal, it must also be both transient and coincident with the occurrence of task-relevant information. DA as a gating signal. It has been traditionally assumed that neuromodulatory systems (e.g., dopamine, norepinephrine, serotonin) are slow-acting (tonic), diffuse, and non-specific in informational content (Moore and Bloom, 1978). However, a number of recent findings suggest that this view needs revision. Detailed primate studies, involving recordings of norepinephrine-producing neurons in locus coeruleus (LC) during task performance, have demonstrated short bursts of LC activity occurring immediately after target, but not non-target, presentation (Aston-Jones et al., 1994). Thus, neuromodulatory nuclei may show responses that are rapid, transient, and specific to behaviorally relevant stimuli. Similar findings have been reported for the DA system. Schultz and colleagues (Schultz, 1992; Schultz et al., 1993), recording from ventral tegmental area DA neurons in behaving primates, have observed transient (approximately 100 ms in duration, occurring 80-150 ms after stimulus onset) activity in response to novel or behaviorally relevant stimuli. Furthermore, during learning, task stimuli that failed to activate DA neurons on initial presentation came to elicit transient activity when the animal learned their significance for the task. Specifically, Schultz’s group observed transient, stimulus-locked activity to cues predictive of reward during performance of a spatial delayed response task (Schultz et al., 1993).
Thus, in neurons projecting to PFC, DA activity was observed at precisely the time that the task required information to be gated into, and maintained, in active memory. DA as a learning signal. At the same time, these findings regarding DA activity dynamics are consistent with a role for DA in reward learning. Indeed, Montague et al. (1996) recently reported a computational model that treats phasic DA activity
as a widely distributed error signal, which drives the learning of temporal predictors of reward. In this model, transient changes in the firing of DA neurons represent mismatches between expected and received rewards. These transient changes in activity serve to modulate the strength of target synapses in proportion to the degree and sign of the activity change. The synaptic changes which result from DA firing, along with the appropriate reciprocal connectivity, allow the neural populations targeted by DA neurons to become configured to respond preferentially to stimuli that predict future rewards. The claim that DA modulates changes in synaptic plasticity is one that has recently received support in the neurophysiological literature, and is hypothesized to occur through changes in intracellular calcium (Law-Tho et al., 1995; Wickens et al., 1996; Calabresi et al., 1997). From a computational perspective, the learning effects proposed for DA by Montague et al. (1996) are similar to those proposed in artificial reinforcement learning systems that use a temporal-difference algorithm to predict delayed rewards (Sutton, 1988; Sutton and Barto, 1990). In these systems, the learning chains backward in time, in order to discover successively earlier environmental predictors of reward, until the earliest possible predictor is found that cannot itself be predicted. Intriguingly, the parameter used in the Montague et al. (1996) model to simulate the modulatory effects of DA on learning is very similar to the parameter that seems to best describe its effect on neuronal responsivity, in that both are multiplicative in nature. Thus, it could easily be the case that the effects are mediated by different DA receptor subtypes. A new theory. Taken together, the properties of DA and PFC reviewed above suggest the outlines of a theory regarding the neural and computational mechanisms of cognitive control.
In particular, we refine our previous work on active maintenance in PFC by integrating it with the work of Montague et al. (1996) on reward-based learning. This integration provides a means of accounting for the relevant data regarding DA activity dynamics and reward functions as well as the modulatory role of DA in active memory. Specifically, the following refinements are made to our original theory (Cohen and Servan-Schreiber, 1992; Braver, 1997):
• DA gates access to active memory in PFC in order to provide flexible updating while retaining interference protection.
• Phasic changes in DA activity mediate gating and learning effects in PFC. Both effects rely on similar neuromodulatory mechanisms (possibly through different receptor subtypes).
• The gating effect occurs through the transient potentiation of both excitatory afferent and local inhibitory input.
• The learning effect occurs through Hebbian-type modulation of synaptic weights, and is driven by errors between predicted and received rewards.
• The coincidence of the gating and learning signals produces cortical associations between the information being gated, and a triggering of the gating signal in the future.
The power of this new theory is that it provides a framework which may be able to account for specific patterns of normal behavioral performance across a wide range of tasks requiring cognitive control. At the same time, by making close contact with the known physiological properties of both the DA system and PFC, it may allow for more detailed and biologically realistic explorations of the neural basis of control. In particular, it provides an explicit framework for testing ideas regarding the particular neurobiological disturbances that may underlie schizophrenia and their consequence for behavior. Because the theory is conceptualized in terms of explicit computational mechanisms, it can be explored through simulation studies. In recent work we have conducted simulations which tested the computational validity of the theory (Braver and Cohen, in press). In particular, we have provided support for the hypothesis that DA implements both gating and learning effects, and that these can work synergistically to provide a mechanism for how cognitive control might be learned through experience. Our simulation demonstrated that control over active maintenance can be achieved by using a gating signal triggered by reward-based learning dynamics. In this simulation, the timing of the gating signal developed as a function of reward-prediction errors in the temporal-difference algorithm. The algorithm enabled the network to
chain backward in time to find the earliest predictor of reward, which was a cue stimulus that also had to be maintained in active memory in order to receive the reward. Because this cue triggered a phasic response in the gating/reward-prediction unit, the information provided by the cue was allowed access to active memory. In this chapter, we present additional simulations which further test the computational plausibility of the theory and its ability to provide new insights into the pathophysiology of schizophrenia. The first simulation manipulated different parameters in the model associated with gating in order to examine their effects on updating, maintenance, and interference protection. The second simulation tests a particular hypothesis regarding whether disturbances to a DA-mediated gating system can account for the pattern of behavioral impairments observed in schizophrenia patients during performance of a simple cognitive control task.
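The backward chaining of reward prediction described above can be illustrated with a minimal temporal-difference sketch. This is not the simulation reported in Braver and Cohen (in press); the tabular state representation, the number of time steps, the reward size, and the names `run_trial`, `values`, `alpha`, and `discount` are all assumptions chosen for illustration. Each trial runs from cue onset to reward, and the value of the pre-cue state is held at 0 on the assumption that the cue itself cannot be predicted.

```python
# Illustrative TD(0) sketch; states, reward size, and learning rate are assumptions.

def run_trial(values, alpha=0.5, discount=1.0):
    """Run one cue-to-reward trial and return the list of prediction errors.

    values[t] is the predicted future reward t steps after cue onset.
    The pre-cue state has a fixed value of 0 because the cue is assumed
    to be unpredictable.
    """
    n_steps = len(values)
    # Error at cue onset: the jump from the pre-cue value (0) to values[0].
    errors = [discount * values[0]]
    for t in range(n_steps):
        reward = 1.0 if t == n_steps - 1 else 0.0  # reward arrives on the last step
        next_value = values[t + 1] if t + 1 < n_steps else 0.0
        error = reward + discount * next_value - values[t]
        values[t] += alpha * error  # TD(0) value update
        errors.append(error)
    return errors


values = [0.0] * 5
first = run_trial(values)
for _ in range(200):
    last = run_trial(values)

print("first trial:    cue error %.2f, reward error %.2f" % (first[0], first[-1]))
print("after training: cue error %.2f, reward error %.2f" % (last[0], last[-1]))
```

On the first trial the only prediction error occurs at the time of reward. With training, the error at reward delivery shrinks and a positive error appears at cue onset, which is the kind of cue-locked phasic signal that the theory proposes could double as a gating signal.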
Simulation 1: gating-based updating of active memory

The goal of this simulation was to examine the effects of relevant parameters associated with gating on active memory regulation within a simple network. Three parameters of gating were explored: connectivity, duration, and strength. With respect to connectivity, we examined whether the dual requirements of memory updating and interference-protection place any constraints on the pattern of connectivity between the gating unit (representing the mesocortical DA system) and the context module (representing the PFC). This issue is an important one because it may provide a potential point of contact between the theory and neuroanatomical studies of DA projections in PFC. With respect to the duration of gating unit activity, we examined whether phasic and tonic activity can be functionally distinguished. This is an explicit assumption of the theory that has implications for our understanding of the physiology of the DA system, and the simulation provides an explicit test of that assumption. With respect to the strength of gating unit activity, we examined the relationship between memory updating and interference-protection. These two functions can be thought of as
opposite ends of a continuum. Specifically, new information associated with a gating signal should produce updating, such that old information is deactivated and replaced with new information, whereas new information that is not associated with a gating signal should be prevented from disrupting the current state of maintenance. We tested how manipulating the strength of the gating signal would affect both functions.

Methods

Architecture and Processing. Gating effects on active memory were explored within the context of a simple memory network (see Fig. 1). The network consisted of a context layer (2 units) which received one-to-one excitatory projections (+3.25 weight) from an input layer (2 inputs), representing two separate input conditions (A or B). Each of the units within the context layer had strong self-excitatory connections (+6 weight), which allowed input activity states to be sustained over time. Further, each context unit received two sources of
inhibitory input: (a) lateral inhibition from the other memory unit (-3 weight), which produced competition for representations; and (b) local inhibition from a tonically active bias unit (-2.5 weight), which enforced low levels of baseline activity. The context layer also received input from a gating unit. The connections to the context layer from this unit were multiplicative (sigma-pi). The function describing the relationship between gating unit activity and its modulatory effect on connection weights was the following:

$$w^{*}_{ij} = \gamma(t)\, w_{ij}$$

where

$$\gamma(t) = 1 + \frac{K - 1}{1 + e^{-(a(t) - C/2)}}, \qquad K > 1,\quad C \geq 6$$

and a(t) is the activity of the gating unit at time t, C determines the maximum activity level of the gating unit, and K determines the maximum gain (γ) of the gating unit. Thus, if gating unit activity is less than or equal to 0, the gain on the connection strength is equal to 1. If gating unit activity equals C, gain is approximately equal to K. If gating unit
Fig. 1. Simple Attractor Model. This is the architecture of the model used in Simulation 1. Each context unit received two sources of excitatory input (shown with arrowheads) and two sources of inhibitory input (shown with circular heads). Excitatory input was received from a self-connection (recurrent input) and from the input layer (afferent input). Inhibitory input was received from the competing context unit (lateral inhibitory input) and from a tonically active bias unit (local inhibitory input; spiny cell shape). The gating unit (triangular shape) made modulatory connections with context layer inputs (shown with square heads). All four possible gating unit connections were examined in the simulations (shown only for one unit).
activity is greater than 0 and less than C, gain monotonically increases with activity level. In all simulations, C was set to 6, and K was set to 3. Trials were presented to the network as a sequence of events occurring in the following order: input A, delay, input B, delay. Each input was presented for a duration of 4 time steps, and each delay lasted 50 time steps. Processing occurred continuously over time in the model, with unit activation states governed by temporal difference equations. Specifically, the following equation was used:
$$I_j(t+1) = \Big(\gamma \sum_i w_{ij}\, y_i + B_j + I_j(t)\Big)\, dt + \sigma Z_j(t)\sqrt{dt}$$

where

$$y_j = \frac{1}{1 + e^{-I_j}}$$

is the activation of unit j, I_j is the total input to j, dt is the time-step of integration, γ is the gain, B is the bias, Z_j(t) is a standard independent Gaussian random variable, and σ is the variance of the distribution. In the simulations described below, dt was set at 0.5. Simulations and analysis. The roles of three factors were examined in the model: (1) connectivity between the gating unit and memory layer; (2) duration of gating unit activity (tonic vs. phasic); and (3) strength of gating unit activity. The primary hypothesis with respect to connectivity was that gating effects would be optimal when the gating unit made connections only to the afferent and local inhibitory inputs of the context layer. However, the impact of connectivity between the other two input projections (self-excitatory and lateral inhibitory) was also investigated. The first set of simulations examined this question, by looking at effects in the context layer as a result of different patterns of gating unit connectivity. The first simulation also examined differences between tonic and phasic gating unit activity. In particular, three conditions were examined: phasic gating activity during stimulus presentation, no gating during stimulus presentation, and tonic gating activity during delay (i.e. no stimulus presentation). The first condition tested whether a gating signal presented simultaneously with stimulus presentation would produce memory updating. The second condition tested whether the absence of a gating
signal during stimulus presentation would prevent memory updating (i.e. interference). The third condition tested the effects of a tonic increase in gating unit activity during the delay period. A second set of simulations manipulated the strength of gating unit activity levels in order to examine the effect of this factor on the active maintenance of context. Moreover, the effects of gating activity strength were examined under both tonic and phasic conditions. In the first simulation, processing was deterministic (σ = 0). However, in the second simulation, zero-mean Gaussian noise was added to the net input of the context units on every time step (σ = 0.95), in order to produce variability in processing. One thousand trials were simulated at 10 levels of gating unit activity strength. In the phasic condition, strength varied from 0.0 C to 1.0 C in 0.1 C increments. In the tonic condition, strength varied from 0.0 C to 0.5 C in 0.1 C increments. In both sets of simulations, phasic activity in the gating unit was simulated by setting the activity level to its maximum value (C) for 2 time steps; otherwise activity was set at 0. Gating occurred during the middle 2 time steps of stimulus presentation. In the first set of simulations, tonic activity in the gating unit was simulated by setting the activity of the gating unit to 50% of its maximum value throughout the last 50 time steps of the delay period. Each simulation run was analyzed by computing the percentage of trials that each context unit was active (greater than 0.5), for every time step of the trial.

Results

The first set of simulations demonstrated that, as expected, gating unit connectivity on both afferent and local inhibitory inputs provided both interference protection as well as updating. As Fig.
2 shows, a new input that is not associated with a gating signal does not disrupt the maintained state; however, if that input is accompanied by gating unit activity, the new information replaces the old state in memory, thus providing successful updating. These dynamics are a direct function of the gating connectivity. The potentiating effect of gating on the inhibitory bias deactivates the current state,
[Fig. 2 plots. Panel labels: No DA Activity; Input Alone. Legend: Context A, Context B.]
Fig. 2. Gating Effects on Activation Dynamics. These plots illustrate the effects of gating on maintained context information (Context A, solid lines). Afferent input corresponding to a competing context representation (Context B, dashed lines) not accompanied by a gating signal does not disrupt the state of the context layer (left panel), but if the input is accompanied by a gating signal (middle panel), the state of the context layer is updated and a new context representation is maintained. Tonic activity in the gating unit has an inhibitory effect on context activity, causing it to decay (right panel). Vertical dashed lines denote the period during which the gating unit is active. Vertical axis corresponds to the percent of trials in which each context unit was active (greater than 0.5) for the corresponding timestep.
while the potentiation of afferent activity allows the new state to gain access to memory. Interestingly, sustained increases in gating unit activity also had a deactivating effect; when the gating unit was tonically active in the absence of afferent input, it acted in a purely inhibitory manner, causing the maintained state to decay away. Furthermore, the simulations suggested that this pattern of connectivity (the gating unit affecting local inhibition and afferent input) was the optimal one. Adding the connection to the lateral inhibitory input did not have any additional effect on activity dynamics. This makes sense in that, during a gating signal, potentiating the lateral inhibitory input would not lead to further deactivation of the old information. Moreover, it would only serve to increase the difficulty by which new information could gain access to memory, by increasing the competition from old information prior to deactivation. Adding a gating connection to the self-excitatory connections in the context layer also had a deleterious effect, in that it prevented updating; this occurred because the potentiation of the self-excitatory input negated the potentiating effect on inhibitory bias. Because the self-excitatory weight was stronger than the inhibitory bias
(which is a necessary relationship in order to enable self-sustaining activity), when the gating signal potentiated both connections it served to increase, rather than decrease the net excitatory input to the already active context unit. This prevented the unit from deactivating, and produced a state of increased competition that prevented the new information from replacing the old. The second simulations demonstrated that gating effects are graded in nature, and are modulated by the strength of the gating signal. As shown in Fig. 3, manipulating the strength of the gating signal affected both the robustness of updating and interference protection. As the gating signal decreased from its maximal value, there was a decreased probability that the new information would replace the previous state. Conversely, as the gating signal increased from its minimum value (i.e. 0), there was an increased probability that information would interfere with the current state. Moreover, as the results make clear, updating and interference lie on a continuum defined by the gating signal; in other words, it is the gating signal that defines which information is relevant or irrelevant. Irrelevant information is information that should not be accompanied by a gating signal.
However, if a partial gating signal does accompany such information, it will have the opportunity to produce interference, by disrupting the currently maintained state. On the other hand, for information that is task-relevant, a gating signal should occur synchronously with its presentation. If this gating signal is reduced, the new information will not as reliably update active memory, and thus, the previous state will persevere. The simulations also demonstrated that tonic activity in the gating unit has a distinctly different effect than that of phasic activity. Specifically, tonic gating unit activity primarily impacts the robustness of active maintenance (see Fig. 3). When the gating unit has low tonic activity, information can be maintained reliably; however,
as tonic activity increases above baseline value it causes memory decay. Within a certain range, the effect interacts with time, such that the longer the delay, the greater the memory decay.
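The deterministic conditions described above (gated input, ungated input, and tonic gating during the delay) can be sketched in a few lines. This is a minimal illustration, not the authors' code. The weights, C, K, dt, and the event timing follow the Methods; the sigmoid form of the gain function (centered at C/2, with gain fixed at 1 when gating activity is 0 or less), the starting state, and the names `gain`, `step`, `present`, `delay`, and `trial` are assumptions. The gating unit here modulates only the afferent and bias connections.

```python
import math

C, K = 6.0, 3.0                      # max gating activity and max gain (Methods)
W_IN, W_SELF, W_LAT, W_BIAS = 3.25, 6.0, -3.0, -2.5
DT = 0.5


def gain(a):
    # Assumed form: gain is 1 for a <= 0 (as stated in the text) and follows
    # a sigmoid centered at C/2 that approaches K when a equals C.
    if a <= 0.0:
        return 1.0
    return 1.0 + (K - 1.0) / (1.0 + math.exp(-(a - C / 2.0)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def step(I, x, a):
    """One noise-free update of both context units (index 0 = A, 1 = B)."""
    g = gain(a)
    y = [sigmoid(v) for v in I]
    new_I = []
    for j in (0, 1):
        k = 1 - j
        # Gating multiplies only the afferent and local inhibitory (bias) inputs.
        net = g * W_IN * x[j] + W_SELF * y[j] + W_LAT * y[k] + g * W_BIAS
        new_I.append((net + I[j]) * DT)
    return new_I


def present(I, x, gated):
    # 4-step stimulus; phasic gating (a = C) on the middle 2 steps only.
    for t in range(4):
        I = step(I, x, C if gated and t in (1, 2) else 0.0)
    return I


def delay(I, steps=50, tonic=0.0):
    for _ in range(steps):
        I = step(I, (0, 0), tonic)
    return I


def trial(gate_b):
    """Input A (gated), delay, input B (gated or not), delay."""
    I = present([0.0, 0.0], (1, 0), gated=True)
    I = present(delay(I), (0, 1), gated=gate_b)
    return [sigmoid(v) for v in delay(I)]


updated = trial(gate_b=True)        # B should replace A
protected = trial(gate_b=False)     # A should survive the ungated B input
decayed = [sigmoid(v) for v in
           delay(present([0.0, 0.0], (1, 0), True), tonic=0.5 * C)]
print("gated B:   A=%.2f B=%.2f" % tuple(updated))
print("ungated B: A=%.2f B=%.2f" % tuple(protected))
print("tonic:     A=%.2f B=%.2f" % tuple(decayed))
```

Under these assumptions the sketch shows the same qualitative pattern as the first set of simulations: a gated input B takes over the context layer, an ungated input B leaves context A in place, and tonic gating at 50% of C during the delay lets context A decay. The graded, probabilistic effects of the second set of simulations would additionally require the noise term, which is omitted here.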
Discussion

The simulation results further establish the computational plausibility of the theory we have put forth regarding the functional role of the gating mechanism in cognitive control, while touching on important issues with regard to the anatomy and physiology of the DA system and their functional consequences. In the simulations, gating occurred through phasic changes in activity that occurred
[Fig. 3 plots. Panel titles: Context A Interference; Context A Maintenance. Horizontal axes: Time (time steps 46 to 65). Legend entries: 30%, 40%, 50%, 60%, 70%, 80%, 90%; and 25%, 30%, 35%, 40%, 45%.]
Fig. 3. Parametric Effects on Updating, Interference and Maintenance. The upper two plots show the parametric relationship between updating and interference. In the upper left panel, the maintained context representation (Context A) becomes progressively more perturbed as the strength of the gating signal accompanying afferent input is increased. In the upper right panel, updating to the competing context representation (Context B) becomes progressively more reliable as gating signal strength increases. In the lower panel, the maintained context representation shows progressively greater memory decay as tonic gating unit activity increases. Vertical dashed lines denote the period during which the gating unit is active. The vertical axis corresponds to the percent of trials in which each context unit was active (greater than 0.5) for the corresponding timestep. Each line represents a different value of gating unit strength (expressed as a percentage of baseline strength).
synchronously with task-relevant information, but were absent with information that was not relevant for performance. This pattern of activity dynamics is consistent with the findings from midbrain DA neurons recorded in behaving primates (Schultz, 1992). The effect of gating was one of potentiation, multiplying the strength of both excitatory and inhibitory signals. This effect also matches what has been observed physiologically, through recordings of the post-synaptic effects of DA activity on target cells (Chiodo and Berger, 1986). Moreover, these functions were found to be optimal when the gating unit modulated both excitatory afferent and local inhibitory connections in the context layer. This connectivity pattern is similar to what has been found in anatomical studies, where DA has been observed to affect both pyramidal cells and local inhibitory interneurons in PFC (Goldman-Rakic et al., 1989; Sesack et al., 1995). The simulations may also account for relevant physiological data in terms of the functional distinctions between tonic and phasic activity in the gating unit. Tonic gating unit activity was found to have a primarily inhibitory effect on active memory, causing deactivation of previously maintained information. This effect might provide a resolution of conflicting findings within the neurophysiological literature on DA actions in PFC. In particular, DA has often been observed to have an inhibitory effect on activity, rather than a potentiating effect (e.g., Ferron et al., 1984). Based on the simulation results, DA should have an inhibitory effect unless it occurs synchronously with afferent input. Although the simulations demonstrate that the dynamics and connectivity of the gating system capture a great deal of the known physiological and anatomical data regarding the DA system in PFC, there are still unresolved issues regarding the nature of the gating mechanism and its relationship to DA.
In particular, phasic decreases in DA activity have also been observed in the studies of Schultz et al. (1993), and have been interpreted as having functional consequences. Specifically, these decreases in activity occurred in situations where reward was withheld, such as when the animal responded incorrectly. This finding is perfectly consistent with a reinforcement learning account of
DA function, in that situations in which a reward was predicted but not received should lead to a temporal-difference error that would be manifest as a decrease in activity. However, it is not clear whether this may also have functional consequences for the gating processes of DA. In the current simulations, a phasic decrease in activity (which would be modelled as negative gating unit activity) did not have any functional consequences, as the multiplicative effect of the gating unit was 1 for activity levels equal to or less than zero. Yet one could imagine how it would be adaptive for phasically decreased DA activity to play a functional role in gating. In particular, given that decreases in activity typically reflect errors, it might be useful to have this signal serve some type of ‘reset’ function in active memory. To take a real world example, in the Wisconsin Card Sort Test (WCST), receiving error feedback from the experimenter should produce a reset in the currently maintained sorting category, in order to provide new categories with an equal chance to be maintained as the sorting rule. Indeed, a similar type of mechanism has been invoked in previous computational models of the WCST in order to account for the performance data of normal and frontal patients in the task (Levine and Prueitt, 1989; Dehaene and Changeux, 1992). Interestingly, in exploratory simulations, we observed that using a function which transforms negative gating unit activity into a multiplicative gain less than 1 (i.e. causing a reduction in connection strength) might produce similar types of behavior in the context layer of the model. Specifically, this behavior occurred when the gating unit also affected the lateral inhibitory connections between context units. In this situation, a phasic decrease in activity (which caused the multiplicative gain to go to 0) effectively shut out the input from all connections except for the self-excitatory ones.
Since the self-excitatory connections were strong, both context units rapidly increased their activation level, leading to a transient state where both contexts were simultaneously being maintained. Once the gating unit activity returned to baseline, the lateral inhibitory effects resumed, and the units competed for representation. Because both context units were
equally and highly active when competition resumed, each context had equal opportunity to win the competition. Thus, the phasic decrease in gating activity provided a type of reset function, by allowing all contexts an equal chance to be maintained. Although these effects were not explored in great detail, they warrant further investigation in simulations to determine whether they might capture an additional control function of the DA system in active memory. Another aspect of the gating mechanism which may not fully capture known properties of the mesocortical DA system is the nature of its inhibitory effects in the context layer. In particular, the anatomical data on DA projections in PFC suggest that the inhibitory actions of DA may occur indirectly, by potentiating the excitatory inputs to local inhibitory neurons (Sesack et al., 1995). In contrast, the current simulations assumed a more direct inhibitory effect of the gating mechanism, by having it modulate the output connection of the bias unit onto the context unit. Moreover, in the simulations the bias units were assumed to be tonically active, so their activity level was not dependent on excitatory input. It is likely that both the direct and indirect connectivity patterns would produce similar inhibitory effects, but this cannot be stated confidently without further exploration through simulations. One possible approach to more fully capturing these anatomical constraints would be to postulate that activity in the bias unit is driven by an external source. For example, connectivity in the model could be modified such that the bias unit and context unit were mutually connected in a negative feedback circuit. In this connectivity pattern, the gating unit would potentiate the output of the context unit and increase the negative feedback provided by the bias unit. Simulations could determine whether this pattern produces the appropriate inhibitory dynamics that allow for updating.
The second set of simulations conducted in this study provided useful insights into the parametric relationship between gating unit activity and active maintenance. In particular, the simulations demonstrated three significant effects: (1) Reduced phasic activity during the presentation of ‘task-relevant’ stimuli leads to perseveratory behavior, by decreasing the probability that the previous context will be replaced by the current context; (2) Increased phasic activity during the presentation of ‘irrelevant’ stimuli produces interference effects, by increasing the probability that these stimuli will disrupt the currently maintained context; and (3) Increased tonic activity during delay periods produces a delay-related decay of active memory, by increasing the probability that the current context will deactivate over time. Together, these three effects may provide a potential model of the effects of DA impairment on cognitive control. Indeed, perseveration, poor interference control, and maintenance deficits are all symptoms that have commonly been associated with schizophrenia (Malmo, 1974; Nuechterlein and Dawson, 1984). The theory provides an explicit mechanism which might explain how these symptoms arise. In the next simulation, we test this idea directly, by incorporating the gating mechanism into a model of performance on a simple cognitive control task. We examine whether disturbances to this mechanism can be used to account for the patterns of behavioral impairments observed in schizophrenia patients.
Simulation 2: behavioral impairments in schizophrenic patients

In this simulation, we test whether disturbances to a DA-mediated gating mechanism can account for the cognitive control impairments seen in patients with schizophrenia. In particular, the model suggests that the pattern of deficits observed in patients is consistent with both decreased phasic and increased tonic DA activity. The gating hypothesis predicts that tonically increased DA activity should produce deficits in active or working memory, while decreased phasic DA activity should produce perseveration and interference effects. Here, we directly test these predictions by conducting simulations of behavioral performance on a simple cognitive control task which requires both active maintenance and frequent updating of context information. The task is an ‘AX’ variant of the Continuous Performance Test (CPT, Rosvold et al., 1956). We have collected extensive behavioral data regarding the performance of both healthy subjects and patients with schizophrenia on this
task (Servan-Schreiber et al., 1996; Braver, 1997; Cohen et al., 1999). Simulations of behavioral data were conducted by adding a gating mechanism to an existing computational model of the task (Braver, Cohen, and Servan-Schreiber, 1995; Cohen, Braver, and O’Reilly, 1996; Braver, 1997).

Methods

Task. In the AX-CPT, single letters are visually presented as a sequence of cue-probe pairs (Fig. 4). A target response is required to a specific probe letter (X), but only when it follows a designated cue (A). A manipulation of the delay interval between cue and probe (1 s short delay vs. 5 s long delay) enables an examination of active memory demands. In addition, target trials occur with high frequency (70%), which enables examination of the role of context in biasing response competition and inhibiting response prepotencies. Specifically, control over processing can be examined in the three types of non-target trials, which occur with 10% frequency each (BX, AY, and BY, where ‘B’ corresponds to any non-A stimulus, and ‘Y’ to any non-X). Context information must be used on BX trials in order to inhibit the prepotent tendency to make a target response to the X. In contrast, context acts to bias incorrect responding on AY trials, since the presence of the A sets up a strong expectancy to make a target response to the probe. BY trials provide an index of performance in the absence of response competition. Behavioral data. The data for this simulation were taken from a study first presented in Braver (1997); participants in the study were 16 DSM-IV schizophrenia patients and 16 matched controls. Patients were neuroleptic-naive and experiencing
Short Delay Condition
CUE
Target is an X following an A
VALID I
TARGET
PROBE
Long Delay Condition
NONTARGET PROBE
O0
INVALID
70% (10%)
(10%)
Target is an X following an A Fig. 4. The AX-CPT task. Trials consist of single letters occurring as sequences of cue-probe pairs. In the short delay condition, the delay period is 1 s, intertrial interval is 5s. In the long delay conditions, the delay period is Ss, intertrial interval is Is. A target is defined to be an X immediately following an A. Targets occur with 70% frequency, and the three other trial types (AY,BX, BY) each occur with 10% frequency.
341
their first hospitalization for psychotic symptoms. Consequently, they formed a select subgroup of participants which are free of many of the confounds and complications associated with studying schizophrenia patients (e.g. medication, chronicity, or institutionalization effects). Both groups performed 200 trials of the AX-CPT evenly divided between short and long delay conditions. Intertrial interval was counterbalanced so that total trial duration was equated across delay conditions. Participants pressed one button of a response box for target probes and a separate button for nontargets. Both accuracy and reaction time data were collected. There were two primary behavioral measures of interest. The first, context sensitivity, indexed the ability to respond correctly to an X probe based on its prior context. Context sensitivity was computed by comparing AX hits to BX false alarms, using the d’ function. The second measure, context cost, indexed the degree of response slowing on non-target trials due to the presence of an A cue. Context cost was computed by calculat-
ing the difference in reaction time in AY trials relative to BY trials; these two measures were calculated separately for the long and short conditions in each group. Coinputationat model. A gating mechanism was incorporated into an existing computational model of the AX-CPT. The original model was found to successfully capture many aspects of both normal and schizophrenic performance in the task (Braver et al., 1995; Cohen et al., 1996; Braver, 1997). The addition of a gating mechanism provided a means to check whether the new model could also account for performance by incorporating a more refined model of DA activity. The architecture of the model is shown in Fig. 5. The model consisted of a direct pathway composed of feed-forward connections between a pool of input units, representing the four stimulus conditions (A, B, X or Y), a pool of four associative units (representing the two possible associations - target or nontarget - activated for each probe stimulus), and a pool of two output units. In addition, the cue inputs also projected to a
a a s t
Context (PFC)
Q-og>
N o n T
output
Associations
(ol= Gating Unit (DA)
a CUE
t 0 n0
Input d
PROBE
Fig. 5. Diagram of Gating Model. Architecture of model used to simulate the AX-CPT task. Units in the context layer have selfexcitatory connections, which provide a mechanism for active maintenance. The gating unit makes a multiplicative connection with both afferent excitatory and local inhibitory (not shown) inputs to the context layer.
342
layer of context units. The context layer then projected back to the pool of associative units in the direct pathway. Units within the context layer had strong (non-modifiable) self-excitatory connections ( + 6.0 weight) which provided a mechanism for active maintenance. Additionally, within each pool of units, there were lateral inhibitory connections which produced competition for representations. Finally, each unit was associated with a local inhibitory unit which provided a tonic negative bias ( - 2.5 weight) on baseline activity states. Processing evolved continuously over time in the model according to the temporal difference equation described in Simulation l . The duration of relevant events within the simulation (e.g. cue and probe presentation, delay periods) were scaled to approximate the temporal relationships used in the actual task. Thus, the cue and probe were each presented for 2 time steps, the short delay lasted 7 time steps, and the long delay lasted 33 time steps. The presentation of each stimulus was simulated by adding an external source of activation (i.e. softclamping) to units in the input layer for a short duration. Input activation states were then allowed to evolve in response to this external input. All input units were provided this external source of activation during presentation of every stimulus, in order to approximate the effects of distributed representations, and lateral competition at the sensory stage of processing. Network weights were developed through a backpropagation training procedure consisting of repeated presentations of each of the 8 different trial types of the AX-CPT (AX, AY, BX and BY at both short and long delays), with the presentation frequency of each type matching that of the behavioral task. This learning approach enabled optimization of weight strengths based on both the constraints of task performance and the relative frequencies of task events. 
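The event timing and trial frequencies described above can be summarized in a short sketch. The Python below is illustrative only and is not the authors' simulation code: the function names and event labels are hypothetical, and the ITI lengths in particular are assumptions (the text does not give them in time steps), chosen here only so that both delay conditions have the same total trial duration.

```python
import random
from collections import Counter

# Trial-type frequencies stated in the task description.
TRIAL_FREQ = {"AX": 0.70, "AY": 0.10, "BX": 0.10, "BY": 0.10}

# Event durations in simulation time steps, as given in the text.
CUE_STEPS = 2
PROBE_STEPS = 2
DELAY_STEPS = {"short": 7, "long": 33}

# ASSUMPTION: ITI lengths are not specified in time steps in the text.
# These placeholder values simply equate total trial duration across delays.
ITI_STEPS = {"short": 33, "long": 7}


def make_trial(trial_type, delay):
    """Return one trial as a list of (event, n_steps) pairs: cue, delay, probe, ITI."""
    cue, probe = trial_type[0], trial_type[1]
    return [
        ("cue:" + cue, CUE_STEPS),
        ("delay", DELAY_STEPS[delay]),
        ("probe:" + probe, PROBE_STEPS),
        ("iti", ITI_STEPS[delay]),
    ]


def sample_trials(n, delay, seed=0):
    """Draw n trials with the 70/10/10/10 frequencies of the behavioral task."""
    rng = random.Random(seed)
    types = list(TRIAL_FREQ)
    weights = [TRIAL_FREQ[t] for t in types]
    drawn = rng.choices(types, weights=weights, k=n)
    return [(t, make_trial(t, delay)) for t in drawn]


def trial_length(events):
    """Total number of time steps in one trial."""
    return sum(steps for _, steps in events)


if __name__ == "__main__":
    trials = sample_trials(1000, "long")
    print(Counter(t for t, _ in trials))
    print(trial_length(make_trial("AX", "short")), trial_length(make_trial("AX", "long")))
```

Sampling by frequency, rather than presenting each type equally often, mirrors the point made in the text that the weights were shaped by the relative frequencies of task events as well as by task performance.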
Gating was added to the trained model by including an additional unit that had modulatory effects on the local inhibitory and afferent excitatory connections to the context layer which were identical to those described in Simulation 1. As in Simulation 1, the input connections to the gating unit were not trained. Rather, these connections were assumed to already have been learned (a simulation demonstrating how these connections might develop through learning is provided in Braver and Cohen, in press). The only other addition to the model was that the input-to-context connections were adjusted so that the presence of external input alone was strong enough to activate the context module when it was in a resting state (i.e. when no other units in the pool were active), but not strong enough to update it from an active state (i.e. when a competing unit in the pool was already activated).
Simulations. One thousand trials of each of the eight stimulus conditions (4 trial types x 2 delays) were simulated in both the intact and impaired models. Trials were presented to the model as a continuous sequence of events occurring in the following order: cue, delay, probe, ITI. The gating unit became transiently activated during presentation of the cue and probe stimuli. Simulations of performance on each condition were conducted by determining which of the output units was the first to surpass a prespecified threshold value, and then collecting accuracy and RT statistics across trials. Noise was added to each unit’s activation state on each time step in order to simulate variability in processing. Both the noise and threshold parameters were fixed at the levels derived for the original model (noise = 0.95; threshold = 0.65). In order to simulate the disturbances in the mesocortical DA system thought to be present in schizophrenia, we disturbed the activity of the gating unit in the model. The specific disturbance that was implemented was to further increase the noise level in gating unit activity (to a value 5 times that of the rest of the units). This pattern of disturbance causes changes in both tonic and phasic activity levels, as a result of the function which relates gating unit activity to its multiplicative effects on synaptic strength. Specifically, because the function is bounded and monotonic (i.e. a logistic), increases in noise will raise the mean value of gain for baseline (low) levels of gating unit activity (i.e. tonic gating) and decrease the mean value of gain for high levels of gating unit activity (i.e. phasic gating).
Results
Behavioral data. The behavioral data are shown in Fig. 6. For healthy controls, sensitivity to context was relatively high (d' > 3). Moreover, there were no significant effects of delay on sensitivity. Conversely, the cost of maintaining context was also relatively high in terms of RT slowing (approximately 140 ms), and also did not decrease much with delay. In contrast, in patients with schizophrenia both context sensitivity and context cost were significantly reduced. These effects further interacted with delay, so that the difference between patients and controls was greatest at the long delay. Thus, the performance data suggest that patients showed impairments in both the representation and maintenance of context information. Furthermore, the pattern of performance exhibited by patients in this task also provides evidence that patients suffer from a specific impairment in cognitive control, rather than a more general deficit pattern (e.g. Chapman and Chapman, 1978). This pattern can be observed by noting that the context disturbance exhibited by patients actually results in a relative benefit in performance, since they show less of a context cost, manifested as less response slowing on AY trials relative to BY trials.
Simulation data. The simulations were able to successfully capture the qualitative pattern of the behavioral data (see Fig. 6): context sensitivity and context cost were both high in the intact model but decreased in the noisy gating model. Moreover, these effects also replicated the interaction with delay observed empirically; in particular, the difference between the two models was greatest at the long delay for both measures. An examination of the dynamics of activity in the context layer during the delay interval revealed the mechanism for these effects. In particular, in the noisy gating model there was an increased failure for the context representation to update following presentation of the cue. Additionally, even in the trials in which the correct representation was properly activated, the maintenance of this representation was less robust. There was an increased tendency for the representation to decay away, and this tendency accumulated over the delay.
Fig. 6. AX-CPT Data: Behavioral and Simulation. These figures show data for both context sensitivity and context cost performance measures for controls and patients with schizophrenia. The upper plots show the behavioral data and the lower plots show simulation data. The simulation captures both the overall reduction of context sensitivity and context cost in schizophrenia, as well as the interaction with delay.
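The two performance measures reported in these results, context sensitivity (d' computed from AX hits and BX false alarms) and context cost (AY minus BY reaction time), can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, and the log-linear correction used to keep rates away from 0 and 1 is an assumption, since the text does not say how extreme rates were handled.

```python
from statistics import NormalDist, mean


def corrected_rate(count, n):
    """Proportion with a log-linear correction so it stays strictly in (0, 1).

    ASSUMPTION: the correction is not described in the text; it is used here
    only because the inverse normal CDF is undefined at exactly 0 or 1.
    """
    return (count + 0.5) / (n + 1.0)


def context_sensitivity(ax_hits, n_ax, bx_false_alarms, n_bx):
    """d' = z(AX hit rate) - z(BX false-alarm rate)."""
    z = NormalDist().inv_cdf
    hit_rate = corrected_rate(ax_hits, n_ax)
    fa_rate = corrected_rate(bx_false_alarms, n_bx)
    return z(hit_rate) - z(fa_rate)


def context_cost(ay_rts_ms, by_rts_ms):
    """Mean RT slowing on AY trials relative to BY trials, in ms."""
    return mean(ay_rts_ms) - mean(by_rts_ms)


if __name__ == "__main__":
    # Made-up counts and RTs, for illustration only (not data from the study).
    print(context_sensitivity(ax_hits=68, n_ax=70, bx_false_alarms=1, n_bx=10))
    print(context_cost([610, 640, 655], [480, 500, 505]))
```

Both measures would be computed separately for the short and long delay conditions in each group (or each model), so that the interaction with delay can be examined.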
Discussion
The results of this study suggest that the gating model of the AX-CPT task was able to successfully capture the specific pattern of behavioral performance observed both in healthy controls and in patients with schizophrenia. Thus, the model compares favorably to the AX-CPT model developed previously, which also accounted for this dataset (Braver, 1997). However, the current model significantly refines and extends the account of the mechanisms hypothesized to underlie schizophrenic deficits in task performance. The earlier model accounted for these performance deficits by suggesting that, in schizophrenia, DA activity is tonically reduced in PFC. In the current model, the mechanism responsible for producing AX-CPT performance deficits is increased noise levels in mesocortical DA. This particular disturbance resulted in both increased tonic activity and decreased phasic activity in the gating system. As suggested by the results of Simulation 1, the increased tonic activity produces deficits in the maintenance of context, while the decreased phasic activity produces deficits in updating the representation of context. The functional distinction in the model between disturbances in phasic and tonic DA activity is an important advance in the theoretical account of the pathophysiology of schizophrenia. It is worth noting that the gating account also appears to be more consistent with neurobiological data regarding the etiology of the disease. In particular, Grace (1991) has postulated that schizophrenia is associated with disturbances in both tonic and phasic DA activity, based on an analysis of neuroleptic effects on DA physiology. Importantly, however, Grace’s model predicts that patients with schizophrenia suffer from increased phasic and decreased tonic DA, which is exactly opposite to the account provided by the current model. Thus, further work will be needed to examine these two models in greater detail in order to determine which provides a better account of the data.
In the current simulation, a single disturbance (increased noise levels in gating unit activity) was found to capture the pattern of performance deficits exhibited by patients in the AX-CPT. This occurred because increasing gating unit noise affected both tonic and phasic activity levels. However, the model also holds open the possibility that tonic and phasic DA activity can be independently affected by different mechanisms. Moreover, since tonic DA activity is associated with the active maintenance of context, and phasic DA activity is associated with the updating of context, the model suggests that deficits in these two processes are also dissociable, at least in principle. This raises the intriguing possibility that different patient subgroups might suffer from independent disturbances in these two components of DA function. If patients from both subgroups were present in the data set, the averaged results would appear as if both deficits were present. This hypothesis could be tested by examining the clinical symptomatology of patients more closely, to determine whether there are relationships between different symptom subtypes and the prevalence of disturbances in updating vs. maintenance of context information. In particular, a specific disturbance in context updating would be revealed as reduced context sensitivity and context cost, but no effect of delay on performance. A specific disturbance in context maintenance would be revealed as normal performance levels at the short delay, but a significant effect of delay, such that both context sensitivity and context cost are reduced at the long delay. Thus, the model provides a means of relating clinical heterogeneity to particular neurobiological mechanisms.
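The claim that a single noise manipulation shifts tonic and phasic gating in opposite directions follows from the bounded, monotonic (logistic) function relating gating unit activity to gain: averaging over noise pulls the mean gain toward the midpoint from both ends. The sketch below illustrates this numerically. It is not the model's code: the activity values standing in for baseline and phasic states (-2 and +2), the noise standard deviations (1 and 5, chosen only to reflect the five-fold increase), and the assumption of zero-mean Gaussian noise are all hypothetical.

```python
import math
from statistics import NormalDist


def logistic(x):
    """Bounded, monotonic gain function."""
    return 1.0 / (1.0 + math.exp(-x))


def mean_gain(activity, noise_sd, n=2001, span=6.0):
    """Expected logistic gain when zero-mean Gaussian noise is added to activity.

    Uses a simple weighted sum over a symmetric grid of standard-normal
    deviates in [-span, span] (deterministic, no random sampling).
    """
    if noise_sd == 0:
        return logistic(activity)
    pdf = NormalDist().pdf
    step = 2.0 * span / (n - 1)
    total, weight = 0.0, 0.0
    for i in range(n):
        e = -span + i * step
        w = pdf(e)
        total += w * logistic(activity + noise_sd * e)
        weight += w
    return total / weight


if __name__ == "__main__":
    # ASSUMPTION: -2 and +2 are illustrative stand-ins for baseline (tonic)
    # and stimulus-driven (phasic) gating unit activity.
    for label, activity in (("tonic (low activity)", -2.0), ("phasic (high activity)", 2.0)):
        intact = mean_gain(activity, noise_sd=1.0)
        noisy = mean_gain(activity, noise_sd=5.0)
        print(f"{label}: intact mean gain {intact:.3f}, noisy mean gain {noisy:.3f}")
```

Under these assumptions the mean gain rises at low activity and falls at high activity as noise increases, which is the direction of change the text attributes to the noisy gating model.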
Another advance of the current model over our previous model is that it can potentially account for both normal and schizophrenic behavioral data in a much wider range of cognitive control tasks. In particular, the model suggests how context information in PFC can be actively maintained in the face of interference, and how this function might be disturbed in schizophrenia. In the model, the degree of interference produced by irrelevant items is directly related to the degree of phasic DA activity that occurs synchronously with the presentation of each item. If phasic DA responses to irrelevant items are increased in patients with schizophrenia, this could account for the increased susceptibility to interference effects from distractors that is so commonly observed in experimental studies (e.g. Nuechterlein and Dawson, 1984). In recent preliminary work, we have begun to demonstrate how such a mechanism could also account for interference effects in normal behavior. Specifically, we conducted simulations to try to capture the pattern of behavioral data we observed in a study with healthy subjects performing an interference version of the AX-CPT (Braver, 1997). In terms of performance, we found that irrelevant items presented during the delay period of AX-CPT trials produced significant interference effects when these items were very similar to the cue and probe stimuli (i.e. identical letters presented in different colors). Indeed, the pattern of behavioral impairments observed in healthy subjects under interference conditions was very similar to that observed in patients with schizophrenia (i.e. reduced context sensitivity and context cost). In simulations of this task using the gating model, we found that we could account for these patterns of behavioral impairment by assuming that irrelevant items were associated with a partial gating signal (Braver, Cohen and McClelland, 1997). We made this assumption by hypothesizing that the high degree of featural similarity between relevant and irrelevant stimuli would result in overgeneralization of DA activity. There is support for this assumption from physiological recordings of DA neuronal activity in behaving primates. The primate data suggest that DA neurons do exhibit partial responses to stimuli that are similar to reward-predictive cues (Mirenowicz and Schultz, 1996).
General discussion and conclusions
The studies presented in this paper establish the computational and empirical plausibility of a new theory regarding the role of DA-mediated gating in normal cognitive control and cognitive control impairments in schizophrenia. In particular, the theory postulates that control over the representation and maintenance of goal-related context occurs through phasic gating signals which arise as a consequence of the dynamics of reward-based learning. Because the theory was conceptualized in terms of explicit computational mechanisms, it was possible to investigate these theoretical hypotheses through simulation studies. The simulations demonstrated that: (a) the gating signal can provide a mechanism for both interference protection and flexible updating of stored information; and (b) disturbances to the gating mechanism can account for the behavioral impairments observed in schizophrenia patients during performance of a simple cognitive control task. As the above discussion has indicated, the simulation results have important implications for: (1) the neurobiology of DA function; (2) pathophysiology and cognitive deficits in schizophrenia; and (3) the nature and mechanisms of cognitive control.
With regard to DA function, the theory postulates that dopamine’s role in behavior is a unified one that exploits simple neuromodulatory effects to modulate both learning and on-going processing. Moreover, the simulations make intriguing suggestions regarding constraints on anatomical connectivity (i.e. DA does not project to local recurrent inputs) and differing postsynaptic effects of tonic and phasic activity (i.e. tonic DA effects are inhibitory in nature while phasic DA effects are modulatory in nature). With regard to schizophrenia, the theory suggests that patients suffer from both tonic and phasic DA disturbances and that these may contribute to maintenance and updating deficits, respectively. Additionally, this hypothesis may help to provide a unified account that can explain why patients appear to suffer from three otherwise unrelated impairments: perseveration and switching problems (e.g. Malmo, 1974; Frith and Done, 1983), distractibility and susceptibility to interference (e.g. Nuechterlein and Dawson, 1984), and memory failures (e.g. Servan-Schreiber et al., 1996). The theory may also contribute to an understanding of the pathophysiology of schizophrenia by making close contact with the known physiological properties of both the DA system and PFC. In particular, by specifying the behavioral consequences of detailed physiological disturbances (i.e. tonic vs. phasic DA dysfunction), the theory can provide a crucial point of contact between behavioral and basic neuroscience research on schizophrenia. This may lead to the development of more refined animal models, and to new ways of examining hypotheses drawn from neurobiologically-based research (e.g. Grace, 1991).
The theory presented here also has the potential to provide a new account of normal cognitive control in terms of neurobiologically plausible mechanisms. In so doing, it provides an illustration of how a system built of simple processing elements can learn to regulate its own behavior in an intelligent and adaptive fashion, without invoking the perennial problem of the homunculus. In particular, this theory extends and refines the account of PFC, DA, and cognitive control developed in our previous modelling efforts (Cohen and Servan-Schreiber, 1992; Braver et al., 1995). Here, we specify the mechanisms by which PFC is able to control processing in a top-down manner, while at the same time remaining responsive to bottom-up input from other parts of the system. We hypothesize that this interplay of bottom-up and top-down processing is mediated by the DA system, through its dual effects on gating and reward-prediction. As a consequence of these dual effects of DA, control emerges in the system through the dynamics of ‘regulated interactivity’. DA provides the system with the ability to learn when to trigger the gating signal, thereby controlling the contents of active memory in PFC. In turn, this allows the PFC to use its representations as context for biasing processing in the rest of the system.
The claims we have made in this paper are bold and still somewhat speculative. Much work remains to be done in both the development and validation of the theory presented here. The effort to understand the function of DA and its role in schizophrenia promises to be a complex endeavor, and will require the most powerful conceptual tools we have available.
We believe that computational modeling represents one such tool. By pursuing our ideas within a computational framework, it is possible to make them explicit within simulation models. This not only provides a check on their conceptual validity, but also provides a means for exploring, in detail, their implications for behavior.
Success in this effort would not only provide insights into the mechanisms underlying some of our highest and uniquely human faculties, but might also allow us to understand better how these faculties can break down in a disease such as schizophrenia, which has such devastating consequences for behavior.
References
Abramczyk, R.R., Jordan, D.E. and Hegel, M. (1983) ‘Reverse’ Stroop effect in the performance of schizophrenics. Percept. Motor Skills, 56, 99-106.
Anderson, J.R. (1983) The Architecture of Cognition. Cambridge, MA: Harvard University Press.
Aston-Jones, G., Rajkowski, J., Kubiak, P. and Alexinsky, T. (1994) Locus coeruleus neurons in monkey are selectively activated by attended cues in a vigilance task. J. Neurosci., 14(7), 4467-4480.
Baddeley, A.D. (1986) Working Memory. New York: Oxford University Press.
Baddeley, A.D. and Hitch, G.J. (1994) Developments in the concept of working memory. Neuropsychology, 8(4), 485-493.
Barch, D.M., Braver, T.S., Nystrom, L., Forman, S.D., Noll, D.C. and Cohen, J.D. (1997) Dissociating working memory from task difficulty in human prefrontal cortex. Neuropsychologia, 35, 1373-1380.
Bauer, R.H. and Fuster, J.M. (1976) Delayed-matching and delayed-response deficit from cooling dorsolateral prefrontal cortex in monkeys. J. Comp. Physiol. Psychol., 90(3), 293-302.
Bengio, Y., Frasconi, P. and Simard, P. (1993) The problem of learning long-term dependencies in recurrent networks. Paper presented at the Proceedings of the IEEE International Conference on Neural Networks.
Bianchi, L. (1922) The Mechanism of the Brain and the Function of the Frontal Lobes. Edinburgh: Livingstone.
Braver, T.S. (1997) Mechanisms of cognitive control: A neurocomputational model. Ph.D. Thesis, Carnegie Mellon University.
Braver, T.S. and Cohen, J.D. (in press) On the control of control: The role of dopamine in regulating prefrontal function and working memory. In: S. Monsell and J. Driver (Eds.), Attention and Performance (Vol. XVIII). Cambridge, MA: MIT Press.
Braver, T.S., Cohen, J.D. and McClelland, J.L. (1997) An integrated computational model of dopamine function in reinforcement learning and working memory. Soc. Neurosci. Abstr., 23, 775.
Braver, T.S., Cohen, J.D., Nystrom, L.E., Jonides, J., Smith, E.E. and Noll, D.C. (1997) A parametric study of prefrontal cortex involvement in human working memory. Neuroimage, 5(1), 49-62.
Braver, T.S., Cohen, J.D. and Servan-Schreiber, D. (1995) A computational model of prefrontal cortex function. In: D.S. Touretzky, G. Tesauro and T.K. Leen (Eds.), Advances in Neural Information Processing Systems (Vol. 7, pp. 141-148). Cambridge, MA: MIT Press.
Brozoski, T.J., Brown, R.M., Rosvold, H.E. and Goldman, P.S. (1979) Cognitive deficit caused by regional depletion of dopamine in prefrontal cortex of rhesus monkey. Science, 205(31), 929-931.
Calabresi, P., Pisani, A., Centonze, D. and Bernardi, G. (1997) Synaptic plasticity and physiological interactions between dopamine and glutamate in the striatum. Neurosci. Biobehav. Rev., 21, 519-523.
Callaway, E., Halliday, R., Naylor, H. and Yano, L. (1994) Drugs and human information processing. Neuropsychopharmacology, 10(1), 9-19.
Carter, C.S., Robertson, L.C., Nordahl, T.E., O'Shora-Celaya, L.J. and Chaderjian, M.C. (1993) Abnormal processing of irrelevant information in schizophrenia: The role of illness subtype. Psychiatry Res., 48, 17-26.
Chapman, L.J. and Chapman, J.P. (1978) The measurement of differential deficit. J. Psychiatr. Res., 14, 303-311.
Chapman, L.J., Chapman, J.P. and Miller, G.A. (1964) A theory of verbal behavior in schizophrenia. In: B.A. Maher (Ed.), Progress in Experimental Personality Research, Vol. 4. New York: Academic Press, pp. 135-167.
Chiodo, L. and Berger, T. (1986) Interactions between dopamine and amino-acid induced excitation and inhibition in the striatum. Brain Res., 375, 198-203.
Cohen, J.D., Barch, D.M., Carter, C.S. and Servan-Schreiber, D. (1999) Schizophrenic deficits in the processing of context: Converging evidence from three theoretically motivated cognitive tasks. J. Abnormal Psychol., 108, 120-133.
Cohen, J.D., Braver, T.S. and O'Reilly, R. (1996) A computational approach to prefrontal cortex, cognitive control and schizophrenia: recent developments and current challenges. Philosoph. Trans. Roy. Soc. Lond. Series B, 351(1346), 1515-1527.
Cohen, J.D., Perlstein, W.M., Braver, T.S., Nystrom, L.E., Noll, D.C., Jonides, J. and Smith, E.E. (1997) Temporal dynamics of brain activation during a working memory task. Nature, 386, 604-608.
Cohen, J.D. and Servan-Schreiber, D. (1992) Context, cortex and dopamine: A connectionist approach to behavior and biology in schizophrenia. Psychol. Rev., 99, 45-77.
Cohen, J.D. and Servan-Schreiber, D. (1993) A theory of dopamine function and cognitive deficits in schizophrenia. Schizophr. Bull., 19(1), 85-104.
Cornblatt, B.A. and Keilp, J.G. (1994) Impaired attention, genetics and the pathophysiology of schizophrenia. Schizophr. Bull., 20(1), 31-62.
Courtney, S.M., Ungerleider, L.G., Keil, K. and Haxby, J.V. (1997) Transient and sustained activity in a distributed neural system for human working memory. Nature, 386, 608-612.
Creese, I., Burt, D.R. and Snyder, S.H. (1976) Dopamine receptor binding predicts clinical and pharmacological potencies of antischizophrenic drugs. Science, 192(April 30), 481-483.
Damasio, A.R. (1985) The frontal lobes. In: K.M. Heilman and E. Valenstein (Eds.), Clinical Neuropsychology. New York: Oxford University Press, pp. 339-375.
Davis, K., Kahn, R., Ko, G. and Davidson, M. (1991) Dopamine in schizophrenia: A review and reconceptualization. Am. J. Psychiat., 148(11), 1474-1486.
Dehaene, S. and Changeux, J.P. (1989) A simple model of prefrontal cortex function in delayed-response tasks. J. Cogn. Neurosci., 1, 244-261.
Dehaene, S. and Changeux, J.P. (1992) The Wisconsin card sorting test: Theoretical analysis and modeling in a neuronal network. Cereb. Cortex, 1, 62-79.
Engle, R.W., Kane, M.J. and Tuholski, S.W. (1999) Individual differences in working memory capacity and what they tell us about controlled attention, general fluid intelligence and functions of the prefrontal cortex. In: A. Miyake and P. Shah (Eds.), Models of Working Memory: Mechanisms of Active Maintenance and Executive Control. New York: Cambridge University Press, pp. 102-134.
Ferron, A., Thierry, A.M., Le Douarin, C. and Glowinski, J. (1984) Inhibitory influence of the mesocortical dopaminergic system on spontaneous activity or excitatory response induced from the thalamic mediodorsal nucleus in the rat medial prefrontal cortex. Brain Res., 302, 257-265.
Frith, C.D. and Done, D.J. (1983) Stereotyped responding by schizophrenic patients on a two-choice guessing task. Psychological Medicine, 13, 779-786.
Fuster, J.M. (1973) Unit activity in prefrontal cortex during delayed-response performance: Neuronal correlates of transient memory. J. Neurophysiol., 36, 61-78.
Fuster, J.M. and Alexander, G.E. (1971) Neuron activity related to short-term memory. Science, 173, 652-654.
Goldman-Rakic, P.S. (1991) Prefrontal cortical dysfunction in schizophrenia: the relevance of working memory. Psychopath. Brain, 1-23.
Goldman-Rakic, P.S., Leranth, C., Williams, S.M., Mons, N. and Geffard, M. (1989) Dopamine synaptic complex with pyramidal neurons in primate cerebral cortex. Proc. Natl. Acad. Sci. USA, 86, 9015-9019.
Grace, A.A. (1991) Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: A hypothesis of the etiology of schizophrenia. Neuroscience, 41(1), 1-24.
Heffner, T., Hartman, J. and Seiden, L. (1980) Feeding increases dopamine metabolism in the brain. Science, 208, 1168-1170.
Heffner, T. and Seiden, L. (1980) Synthesis of catecholamines from h3-tyrosine in the brain during the performance of operant behavior. Brain Res., 183, 403-409.
Hochreiter, S. and Schmidhuber, J. (1997) Long short-term memory. Neur. Comput., 9, 443-453.
Hopfield, J.J. (1982) Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79, 2554-2558.
348 Iversen, S. (1984) Cortical monoamines and behavior. In: T. Descarries, R. Reader and H. Jasper (Eds.), Monoamine Innervation of Cerebral Curtex. New York: Liss, pp. 32 1-349. Kelly, P., Seviour, P. and Iversen, S. (1975) Amphetamine and apomorphine responses following 6-ohda lesions of the nucleus accumbens septi and corpus striatum. Brain Res., 94, 507-522. Klorman, R., Bauer, L., Coons, H., Lewis, J., Peloquin, L., Perlmutter, R., Ryan, R.. Salzman, L. and J., S. (1984) Enhancing effects of methylphenidate on normal young adults cognitive processes. Psychophamacol. Bull., 20, 3-9. Koob, G.F. and Bloom, F.E. (1988) Cellular and molecular mechanisms of drug dependence. Science. 242, 7 15-723. Kornetsky, C. and Orzack, M.H. (1978) Physiological and behavioral correlates of attention dysfunction in schizophrenic patients. J . PJychiatric R e x , 14, 69-79. Kubota, K. and Niki, H. (1971) Prefrontal cortical unit activity and delayed alternation performance in monkeys. J. Neurophysiol., 34, 337-347. Law-Tho, D., Deuce, J.M. and Crepel, F. (1995) Dopamine favours the emergence of long-term depression versus longterm potentiation in slices of rat prefrontal cortex. Neurosci. Lett., 188, 125-128. Levine, D.S. and Prueitt, P.S. (1989) Modeling some effects of frontal lobe damage-novelty and perseveration. Neul: Networks, 2, 103-1 16. Lewis, D.A., Hayes, T.L., Lund, J.S. and Oeth, K.M. (1992) Dopamine and the neural circuitry of primate prefrontal cortex: Implications for schizophrenia research. Neurupsychopharmacology, 6(2), 127-134. Louilot, A,, Taghzouti, K., Deminiere, J., Simon, H. and LeMoal, M. (1987) Dopamine and behavior: Functional and theoretical considerations. Neurotransmitter lnteractions in the Basal Ganglia. New York: Raven Press, pp. 193-204. Luciana, M., Collins, P.F. and Depue, R.A. (1995) DA und 5-HT injuences on spatial working memory,functions of prefrontal cortex. 
Paper presented at the Cognitive Neuroscience Society Second Annual Meeting, San Francisco, CA. Luciana, M., Depue, R.A., Arbisi, P. and Leon, A. (1992) Facilitation of working memory in humans by a D2 dopamine receptor agonist. J. Cogn. Neurosci., 4(1), 58-68. Luria, A.R. (1969) Frontal lobe syndromes. In: P.J. Vinken and G.W. Bruyn (Eds.), Handbook of Clinical Neurology, Vol. 2. New York: Elsevier, pp. 725-757. Malmo, H.P. (1974) On frontal lobe functions: Psychiatric patient controls. Cortex, 10, 231-237. Manschreck, T., Maher, B.A., Milavetz, J.J., Ames, D., Weisstein, C.C. and Schneyer, M.L. (1988) Semantic priming in thought disordered schizophrenic patients. Schizophr. Res., 61-66. Miller, E.K., Erickson, C.A. and Desimone, R. (1996) Neural mechanisms of visual working memory in prefrontal cortex of the macaque. J. Neurosci., 16(0), 5154-5167. Milner, B. (1963) Effects of different brain lesions on card sorting. Arch. Neurol., 9, 90-100.
Mirenowicz, J. and Schultz, W. (1996) Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature, 379, 449-451. Montague, P.R., Dayan, P. and Sejnowski, T.J. (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci., 16, 1936-1947. Moody, S.L., Wise, S.P., di Pellegrino, G. and Zipser, D. (1998) A model that accounts for activity in primate frontal cortex during a delayed-match-to-sample task. J. Neurosci., 18, 399-410. Moore, R.Y. and Bloom, F.E. (1978) Central catecholamine neuron systems: Anatomy and physiology of the dopamine systems. Ann. Rev. Neurosci., 1, 129-169. Mora, F. and Cobo, M. (1990) The neurobiological bases of prefrontal cortex self-stimulation: A review and integrative hypothesis, Progress in Brain Research (Vol. 85, pp. 419-431) Berlin: Elsevier. Mozer, M.C. (1993) Neural net architectures for temporal sequence processing. In: A. Weigend and N. Gershenfeld (Eds.), Predicting the future and understanding the past. (pp. 243-264). Redwood City, CA: Santa Fe Institute Studies in the Sciences of Complexity, Proceedings Volume XVII, Addison-Wesley Publishing. Norman, D.A. and Shallice, T. (1986) Attention to action: Willed and automatic control of behavior. In: R.J. Davidson, G.E. Schwartz and D. Shapiro (Eds.), Consciousness and Self-regulation. (Vol. 4, pp. 1-18). Plenum Press, New York. Norton, S. (1973) Amphetamine as a model for hyperactivity in the rat. Physiol. Behav., 11. Nuechterlein, K.H. (1991) Vigilance in schizophrenia and related disorders. In: S.R. Steinhauer, J.H. Gruzelier and J. Zubin (Eds.), Handbook of Schizophrenia Vol. 5: Neuropsychology, Psychophysiology and Information Processing. Amsterdam: Elsevier, pp. 397-433. Nuechterlein, K.H. and Dawson, M.E. (1984) Information processing and attentional functioning in the developmental course of schizophrenia disorders. Schizophr. Bull., 10(2), 160-203. Oades, R.D. 
(1985) The role of noradrenaline in tuning and dopamine in switching between signals in the central nervous system. Neurosci. Biobehav. Rev., 9, 261-282. Owen, A.M., Roberts, A.C., Polkey, C.E., Sahakian, B.J. and Robbins, T.W. (1991) Extra-dimensional versus intra-dimensional set shifting performance following frontal lobe excisions, temporal lobe excisions or amygdalo-hippocampectomy in man. Neuropsychologia, 29, 993-1006. Park, S. and Holzman, P.S. (1992) Schizophrenics show spatial working memory deficits. Arch. Gen. Psychiatry, 49, 975-982. Penit-Soria, J., Audinat, E. and Crepel, F. (1987) Excitation of rat prefrontal cortical neurons by dopamine: An in vitro electrophysiological study. Brain Res., 425, 263-274. Phillips, A. and Fibiger, H. (1989) Neuroanatomical bases of intracranial self-stimulation: Untangling the gordian knot, The Neuropharmacological Basis of Reward (Vol. 66, pp. 1-5) Oxford: Oxford University Press.
Randrup, A. and Munkvad, I. (1970) Biochemical, anatomical and psychological investigations of stereotyped behavior induced by amphetamine. In: E. Costa and S. Garattini (Eds.), Amphetamine and Related Compounds. New York: Raven Press, pp. 695-713. Rosvold, H.E., Mirsky, A.F., Sarason, I., Bransome, E.D. and Beck, L.H. (1956) A continuous performance test of brain damage. Journal of Consulting Psychology, 20(5), 343-350. Sawaguchi, T. and Goldman-Rakic, P. (1994) The role of D1 dopamine receptors in working memory: Local injections of dopamine antagonists into the prefrontal cortex of rhesus monkeys performing an oculomotor delayed-response task. J. Neurophysiol., 63(6), 1401-1412. Sawaguchi, T. and Goldman-Rakic, P.S. (1991) D1 dopamine receptors in prefrontal cortex: Involvement in working memory. Science, 251, 947-950. Sawaguchi, T., Matsumara, M. and Kubota, K. (1990) Catecholaminergic effects on neuronal activity related to a delayed response task in monkey prefrontal cortex. J. Neurophysiol., 63, 1385-1400. Schultz, W. (1992) Activity of dopamine neurons in the behaving primate. Seminars in Neurosciences, 4, 129-138. Schultz, W., Apicella, P. and Ljungberg, T. (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J. Neurosci., 13(3), 906-913. Segal, D. (1975) Behavioural characterisation of d- and l-amphetamine: Neurochemical implications. Science, 190, 475-477. Servan-Schreiber, D., Bruno, R.M., Carter, C.S. and Cohen, J.D. (1998) Dopamine and the mechanisms of cognition. Part I: A neural network model predicting dopamine effects on selective attention. Biol. Psychiatry, 43, 713-722. Servan-Schreiber, D., Cohen, J.D. and Steingard, S. (1996) Schizophrenic deficits in the processing of context: A test of a theoretical model. Arch. Gen. Psychiatry, 53, 1105-1113. Servan-Schreiber, D., Printz, H. and Cohen, J.D. 
(1990) A network model of catecholamine effects: Gain, signal-to-noise ratio and behavior. Science, 249, 892-895. Sesack, S.R., Snyder, C.L. and Lewis, D.A. (1995) Axon terminals immunolabeled for dopamine or tyrosine hydroxylase synapse on GABA-immunoreactive dendrites in rat and monkey cortex. J. Comp. Neurol., 363, 1735-1780. Shallice, T. (1982) Specific impairments of planning. Phil. Trans. R. Soc. Lond., 298, 199-209. Shallice, T. (1988) From Neuropsychology to Mental Structure. Cambridge: Cambridge University Press. Snyder, S. (1972) Catecholamines in the brain as mediators of amphetamine psychosis. Arch. Gen. Psychiatry, 27, 169-179.
Storms, L.H. and Broen, W.E. (1969) A theory of schizophrenic behavioral disorganization. Arch. Gen. Psychiatry, 20(Feb), 129-144. Stuss, D.T. and Benson, D.F. (1986) The Frontal Lobes. New York: Raven Press. Sutton, R. (1988) Learning to predict by the method of temporal difference. Machine Learning, 3, 9-44. Sutton, R.S. and Barto, A.G. (1990) Time-derivative models of Pavlovian reinforcement. In: M. Gabriel and J. Moore (Eds.), Learning and Computational Neuroscience: Foundations of Adaptive Networks. Cambridge, MA: MIT Press, pp. 497-537. Wapner, S. and Krus, D.M. (1960) Effects of lysergic acid diethylamide and differences between normals and schizophrenics, on the Stroop color-word test. J. Neuropsychiatry, 2(Nov/Dec), 76-81. Weinberger, D.R., Berman, K.F. and Zec, R.F. (1986) Physiological dysfunction of dorsolateral prefrontal cortex in schizophrenia: 1. Regional cerebral blood flow evidence. Arch. Gen. Psychiatry, 43, 114-125. Wickens, J.R., Begg, A.G. and Arbuthnott, G.W. (1996) Dopamine reverses the depression of rat corticostriatal synapses which normally follows high frequency stimulation of cortex in vitro. Neuroscience, 70, 1-5. Williams, M.S. and Goldman-Rakic, P.S. (1993) Characterization of the dopaminergic innervation of the primate frontal cortex using a dopamine-specific antibody. Cereb. Cortex, 3(May-June), 199-222. Willner, P. and Scheel-Kruger, J. (1991) The mesolimbic dopamine system: From Motivation to Action. New York: Wiley. Wise, R.A. and Rompre, P.-P. (1989) Brain dopamine and reward. Ann. Rev. Psychol., 40, 191-225. Wynne, L.C., Cromwell, R.L. and Matthysse, S. (1978) The Nature of Schizophrenia: New Approaches to Research and Treatment. New York: John Wiley and Sons, Inc. Wysocki, J.J. and Sweet, J.I. (1985) Identification of brain damaged, schizophrenic and normal medical patients using a brief neuropsychological screening battery. An International Journal of Clinical Neuropsychology, 7(1), 40-44. Zipser, D. 
(1991) Recurrent network model of the neural mechanism of short-term active memory. Neural Comput., 3, 179-193. Zipser, D., Kehoe, B., Littlewort, G. and Fuster, J. (1993) A spiking network model of short-term active memory. J. Neurosci., 13, 3406-3420. Zubin, J. (1975) Problem of attention in schizophrenia. In: M.I. Kietzman, S. Sutton and J. Zubin (Eds.), Experimental Approaches to Psychopathology. New York: Academic Press, pp. 139-166.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol. 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 20
Modeling prefrontal cortex delay cells: the role of dopamine in schizophrenia
Alastair Reid* and David Willshaw
Centre for Cognitive Science, Edinburgh University, 2 Buccleuch Place, Edinburgh EH8 9LN, UK
*Corresponding author. e-mail: A.Reid@anc.ed.ac.uk

Introduction

Many strands of evidence indicate that aberrant functioning of the prefrontal cortex (PfCx) is a key part of the pathogenesis of schizophrenia. In particular, the idea that the PfCx is underactive in schizophrenics (the hypofrontality hypothesis) is widely accepted (Bachneff, 1991; Wolkin et al., 1992; Berman et al., 1992). Recent research has revealed more information regarding the physiology and morphology of PfCx pyramidal cells (Yang et al., 1996), and the action of D1 receptors on the firing properties of these cells has also been investigated (Yang and Seamans, 1996). In this chapter we provide a simple model of PfCx delay cells which is used to illustrate and investigate the actions of the D1 dopamine receptor on these cells. Activity in the delay cells has also been shown to be modulated by D1 receptor effects (Williams and Goldman-Rakic, 1995). We suggest a PfCx architecture, in accordance with biological evidence, which accounts for the information processing properties of the delay cells, following the work of Lewis and Anderson (1995). From this architecture we can explore the hypotheses that a reduction in the levels of dopamine in the PfCx effectively isolates the delay cells and also destabilises the firing patterns of these cells. We can also show how other defects in the architecture of the PfCx can lead to disruption of PfCx function. This enables us to illustrate how multiple pathologies can lead to PfCx dysfunction and so to schizophrenia. This relates to the ideas and work of others (Hoffman and Dobscha, 1989; Hoffman and McGlashan, 1993; Cohen et al., 1996).

We have used the spike response neuron (Gerstner and van Hemmen, 1994; Gerstner, 1998a, b) as a model for PfCx pyramidal cells. The advantage of this type of model is that it allows us to construct a more biologically realistic model of the PfCx in terms of spiking activity of individual neurons. However, the spike response model is sufficiently simple that it can also be investigated at the network level.

The PfCx is considered to be the locus of executive functioning in working memory tasks and as such holds information ‘online’ while other processing occurs. As an illustration of the cognitive deficits which may arise, we have also summarised the results of two models which perform the Tower of London task. One model follows a heuristic derived from studying real subjects performing the task (Ward and Allport, 1997); this is a rule-based PfCx model. The other model solves the Tower of London task using reinforcement learning and is loosely considered to be a model of the Nucleus Accumbens (NAcc). The NAcc is implicated in schizophrenia because it is the locus of action of traditional neuroleptic drugs. These are D2 antagonists or mixed D1 and D2 antagonists. There is good evidence that the hippocampus and parahippocampal region are also
involved in schizophrenia (Friston et al., 1992). Other modeling work has been aimed at investigating the interaction between frontal cortex and hippocampus in schizophrenia (Horn and Ruppin, 1995). Recent neuroscience research (Heckers et al., 1998) goes some way to support claims that frontal-hippocampal dysfunction is a key part of schizophrenia. Our two models of the Tower of London task then allow us to investigate potential dysfunctions of the two modelled systems and to show how, either alone or together, or in conjunction with the hippocampus, aberrant function of these brain regions can give rise to the symptomatology of schizophrenia.
The prefrontal cortex and schizophrenia

The links between PfCx dysfunction and schizophrenia are well documented and numerous (Levin et al., 1991; Passingham, 1993). Scanning studies indicate an under-activity of PfCx in most schizophrenics, including unmedicated and never-medicated patients suffering an acute episode of schizophrenia. Such studies allow us to rule out the effects of neuroleptics on brain function, but are limited in that it is only a subset of all acutely unwell schizophrenics that can tolerate the scanning protocol. There are certain other key points to note. It is known that dorsolateral PfCx is required for a variety of working memory tasks where information is held in a short-term memory store (online) for the period between stimulus and response (Goldman-Rakic, 1990; Funahashi and Kubota, 1994; Goldman-Rakic, 1995). In addition, scanning studies have shown that there is a monotonic relationship between working memory load and activation of dorsolateral PfCx (Barch et al., 1997). Patients without frontal lobes are generally ‘stimulus bound’ and react only to external stimuli rather than internally generated motivations. The ability to hold information online can be explicitly tested using psychometric tasks such as the Wisconsin Card Sort Test, Tower of London planning task, Continuous Performance Task and delayed-response tasks and, again, scanning studies show that the PfCx is preferentially activated during these tasks (Rezai et al., 1993). People with frontal lobe damage and schizophrenics both perform poorly on such tests. Also, both patient groups often show qualitatively similar failures on these tasks; in particular they frequently exhibit perseveration and impulsiveness. An intact dopamine innervation of PfCx is also required for proper PfCx function and there is good evidence that in schizophrenia there is a reduction in PfCx dopamine activity (Dolan et al., 1995; Svensson et al., 1995; Okubo et al., 1997).

Some cognitive accounts of schizophrenia have focused on the inability of schizophrenics to use stored regularities from past experiences to guide current behaviour (Gray et al., 1991). This process is an example of controlled processing, where there is a need to actively control behaviour rather than rely on automatic learned responses. Situations where controlled processing is required are those in which there is no learned response, for example when planning or problem-solving skills are needed or in novel situations. Controlled processing also comes into play when the automatic responses are incorrect or need to be over-ridden. It is clear that controlled processing requires the ability to hold information online for various purposes; for example, the action of inhibiting automatic responses occurs through holding online an alternative response. The inhibitory role of the PfCx is very important (Dias et al., 1996, 1997) and it is possible that this is, in fact, the main role of the PfCx. Controlled processing is implicitly an attentional process and is highly dependent on frontal lobe function. In relation to dopamine, we note that novelty induces release of dopamine in PfCx and subcortical regions. A loss of dopamine input to PfCx would clearly represent a failure for PfCx to function correctly in a novel situation and produce a loss of controlled processing. 
A failure of controlled processing can be seen as the key defect in the cognitive dysfunction found in schizophrenia and can clearly account for loss of selective attention and loss of inhibition of automatic processes, as seen in schizophrenia.

Automatic processing is essentially the use of learned responses. Such responses are considered
to be acquired through a form of operant conditioning which occurs in subcortical structures including the basal ganglia and ventral striatum. A computational analogy to this learning process is reinforcement learning, where behaviours are learned based solely on a feedback signal from the environment which indicates how successful a behaviour is in achieving a particular goal. This feedback signal is termed the error signal and it has been proposed that one role of dopamine in the striatum is to provide the error signal to a subcortical reinforcement learning system (Montague et al., 1996). We will investigate further the possibility that a disruption to reinforcement learning in the NAcc can produce some of the symptoms of schizophrenia later in the chapter. In summary, a picture is now developing where loss of PfCx function is seen to lead to, at least, the cognitive dysfunction in schizophrenia, and disruption of dopamine input to PfCx leads to loss of PfCx function. A brief note on the pathogenesis of the positive symptoms of schizophrenia needs to be made here since we have only referred to the cognitive dysfunction seen in schizophrenia. It has been shown that there is increased activity in subcortical components of the limbic system, especially the hippocampus, during the occurrence of these symptoms (Busatto et al., 1995; Silbersweig et al., 1995). It is possible that the hippocampus, which is known to have delay cells that are active during delayed response tasks, may attempt to take over the role of an under-active PfCx. This would lead to hippocampal overactivity and possible excitotoxic damage. As a consequence of this there may be disruption of normal hippocampal function and a potential for the uncontrolled intrusion of memory fragments into consciousness. 
This has been suggested to account for symptoms such as auditory hallucinations and delusional perception (Hoffman and Dobscha, 1989; Hoffman and McGlashan, 1993). An alternative cause of hippocampal dysfunction is reduced function of α-7-nicotinic receptors, which is known to occur in certain familial cases of schizophrenia (Freedman et al., 1997). It has been shown that this reduction in nicotinic acetylcholine activity is related to failure of suppression of response to the second of a pair of
auditory stimuli (Stevens et al., 1998). The loss of this suppression is linked with genetic predisposition to schizophrenia. Finally, if subcortical structures including NAcc are involved in reinforcement learning then disruption to subcortical dopamine dynamics could lead to reinforcement of inappropriate responses. If the PfCx is underactive then there would be little or no opportunity for over-riding and correcting inappropriate responses and so the bizarre behaviour of schizophrenia could occur.
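The reinforcement-learning analogy used in this section can be illustrated with a minimal sketch. We assume here a tabular temporal-difference (TD(0)) learner on a hypothetical five-state chain with a single terminal reward. The function name `td0_chain`, the chain task and all parameter values are illustrative assumptions and are not the NAcc model summarised in this chapter. The quantity `delta` plays the role of the error signal that has been proposed to be carried by dopamine.

```python
def td0_chain(n_states=5, reward=1.0, alpha=0.1, gamma=0.9, episodes=500):
    """Learn state values on a chain 0 -> 1 -> ... -> terminal (hypothetical task).

    A reward is delivered only on leaving the last non-terminal state.
    """
    # One extra entry for the terminal state, whose value stays at zero.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for s in range(n_states):
            r = reward if s == n_states - 1 else 0.0
            # Prediction error: the feedback ("error") signal of the analogy.
            delta = r + gamma * values[s + 1] - values[s]
            values[s] += alpha * delta
    return values[:n_states]


if __name__ == "__main__":
    for state, value in enumerate(td0_chain()):
        print(f"state {state}: learned value {value:.3f}")
```

Under these assumptions the learned values rise towards the rewarded end of the chain, so a learner that always moves to the higher-valued neighbour would reach the goal. Distorting `delta` (for example by scaling or adding a bias) is one simple way to explore how a disrupted error signal could reinforce inappropriate responses.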
Biology of the PfCx

The PfCx has a basic six-layered structure with a well-developed internal granular layer (layer IV) which distinguishes it from the rest of the frontal cortex (Fuster, 1989). Pyramidal cells are found in layers III and V (with a few in layers II and VI also), those in layer III being larger. The layer III cells project within the cortex to cortical association regions and the contralateral cortex. Layer V cells project to subcortex, including the ventral tegmental area (VTA), amygdala, hippocampus and NAcc, and also back to layer III where there is a series of stripe-like bands composed of pyramidal neuron axon arbors (Lund and Lewis, 1993; Lewis and Anderson, 1995). The PfCx has reciprocal connections with most of its efferents apart from those in the striatum and NAcc. In addition there are reciprocal connections with the mediodorsal thalamus. The PfCx also contains many GABA interneurons, in particular wide-arbor and chandelier types. Figure 1 illustrates the key aspects of the architecture of the PfCx.

The three most important aspects of the PfCx, at least in terms of information processing, are: the connectivity between layers III and V; the dopaminergic innervation; and the role of GABA inhibitory interneurons. We will discuss each of these briefly. Layer III pyramidal neurons appear to act as input cells to layer V pyramidal neurons. The presence of D1 receptors on dendritic spines indicates delay cell activity (Williams and Goldman-Rakic, 1995). The majority of dopaminergic inputs to the PfCx go to layer III so this would
Fig. 1. The functional architecture of PfCx. Layer III pyramidal delay cells are fully inter-connected to form a delay ensemble. Input patterns arrive via apical dendrites. Dopamine innervation is not topographic. The GABA interneuron provides global inhibition and all delay cells are reciprocally connected to it. Note also the recurrent connections from layer V pyramidal cells back to layer III.
seem to be the most likely site for the delay cells. In addition, this layer is noted to be more active during the delayed response task (Friedman and Goldman-Rakic, 1994). We propose that the stripes observed in layer III of PfCx are composed of a network of delay cells which are fully interconnected with each other via lateral dendrites and axons. This network is known as a delay ensemble and is responsible for maintaining information during the delay phase of a delayed response task. The delay ensemble has several interesting properties which we will explore in our model. Crucially, these ensembles are activated and stabilised by dopamine. The action of dopamine at the apical dendrites of layer III cells is to reduce noise and prevent spurious inputs from disturbing activity in the delay cells.

The dopamine projection to the PfCx from the VTA innervates predominantly layers III and V, mostly synapsing on apical dendrites in layer III. This projection is not topographic and D1 receptors are found mostly at extra-synaptic sites (Smiley et al., 1994), which suggests a diffuse action. This fits with our hypothesis that it acts to activate and stabilise delay ensembles. It is important to note that the PfCx sends glutamatergic inputs back to the VTA which induce burst-firing in dopaminergic neurons. Thus there is a potent mechanism for the PfCx to auto-regulate its own dopamine levels.

The role of GABA interneurons is also crucial. These generally have roughly spherical dendritic and axonal arbors and have a fast-spiking firing action. This would make them ideal for the role of providing global inhibition. It has been documented that pyramidal cells receive both inhibitory and dopaminergic inputs (forming synaptic triads) (Cowan et al., 1994) and also that GABA interneurons receive excitatory and dopaminergic inputs. The dopaminergic input to GABA interneurons increases GABA levels in the PfCx and is thought to act via D2 receptors (Grobin and Deutch, 1998). 
It is highly plausible that these neurons provide feedback inhibition to pyramidal cells, assisted by dopaminergic activity. In support of this is the observation of reciprocal firing between pyramidal and GABA interneurons in the PfCx (Wilson et al., 1994). In addition, GABA interneurons are implicated in the pathogenesis of
schizophrenia. Benes (1993) shows that there are reduced numbers of these cells in layer IILII of anterior cingulate cortex, a brain region closely related to the PfCx and involved in attentional processes.
The spike response model

The basic model is composed of an ensemble of interconnected spike response neurons which form a network. The main premise behind the spike response model is that the form of coding used by real neurons may depend upon the timing of individual spikes. Thus neural codes which are based on mean firing rates or the averaging of spatial inputs may not be adequate to capture the full spatio-temporal dynamics of a biological network of neurons. It is argued that in order to investigate the dynamics and information processing properties of a biological neural network we need models which can capture the full range of dynamics of such networks. The spike response model represents a compromise between extreme biological detail, exemplified by complex multicompartmental models which require huge computational resources to generate a single spike, and more standard artificial neural networks which can compute complex functions but do not capture enough biological detail to make them plausible models of real neurons. We will briefly describe how a spike response model works. The activity of a spike response neuron at time t is given by a single equation (see Gerstner and van Hemmen (1994) for full details):

$$V(t) = P(t) + R(t) \tag{1}$$
where $V(t)$ represents the membrane potential, $P(t)$ represents the post-synaptic potential (PSP) and $R(t)$ represents the refractory potential of the neuron. If $V(t)$ exceeds a threshold value $\theta$ then the neuron is considered to have spiked and produced an action potential. In the discrete time case the PSP is given by:

$$P_i(t) = \sum_{s=0}^{\infty} \varepsilon(s)\, S_j(t - s) \tag{2}$$

where $\varepsilon(s)$ is the shape of a single PSP arising in neuron $i$ due to a spike from neuron $j$, as a function of the time $s$ since that spike (Fig. 2(b)). It is given by an alpha function:

$$\varepsilon(s) = \begin{cases} 0 & \text{for } 0 \le s \le \Delta^{ax} \\ \dfrac{s - \Delta^{ax}}{\tau_s^{2}} \exp\left(-\dfrac{s - \Delta^{ax}}{\tau_s}\right) & \text{for } s > \Delta^{ax} \end{cases} \tag{3}$$

$\Delta^{ax}$ represents the axonal conduction delay and $\tau_s$ is the membrane constant. In Eqn 2 the term $S_j(t - s)$ represents the existence of a presynaptic spike in neuron $j$ at time $t - s$ and accordingly takes values {0,1}. Figure 2(a) shows the basic shape of the refractory potential $R(t)$. However, the shape of this will vary depending upon the exact firing conditions of the neuron, in particular whether there is burst firing or not. The time of firing of a single neuron thus depends on the superposition of the two curves in Fig. 2, shifted according to the time of firing of the presynaptic input and the value of $\Delta^{ax}$. The value of $\Delta^{ax}$ is of great importance in determining the stability states within a spike response model, which we will discuss later.

If a number of spike response neurons are connected together then we have to consider the synaptic weights ($W_{ij}$) between them. These act to modify the PSP and Eqn 2 becomes:

$$P_i(t) = \sum_{j} W_{ij} \sum_{s=0}^{\infty} \varepsilon(s)\, S_j(t - s) \tag{4}$$

In order to make the model more realistic we will incorporate some noise into the system. Thus the firing of a neuron is considered to be probabilistic and to depend upon a noise parameter ($1/\beta$). The equations for the firing of a neuron are:

$$P_F(t) = 1 - \exp\left(-\rho^{-1}(V)\right) \tag{5}$$

$$\rho(V) = \rho_0 \exp\left(-\beta (V - \theta)\right) \tag{6}$$
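The kernel, the weighted PSP sum and the noisy firing rule can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the simulation code used for the figures: the function names are ours, the infinite sum over s is truncated at the start of the simulation (no spikes before time 0), the time step is taken as one unit, and the default `rho0 = 1.0` is an arbitrary choice. The defaults for the delay, the time constant and the noise parameter follow the values quoted in the caption of Fig. 7 (axonal delay 5.0, time constant 4.0, 1/beta = 4.0).

```python
import math


def alpha_psp(s, delay=5.0, tau_s=4.0):
    """Alpha-function PSP kernel: zero until the axonal delay has elapsed."""
    if s <= delay:
        return 0.0
    x = s - delay
    return (x / tau_s ** 2) * math.exp(-x / tau_s)


def psp(t, weights, spike_trains, delay=5.0, tau_s=4.0):
    """Weighted PSP of one postsynaptic neuron at discrete time t.

    spike_trains[j][u] is 1 if presynaptic neuron j spiked at time u, else 0.
    """
    total = 0.0
    for w, train in zip(weights, spike_trains):
        for s in range(t + 1):  # s = time elapsed since a presynaptic spike
            total += w * alpha_psp(s, delay, tau_s) * train[t - s]
    return total


def firing_probability(v, theta=0.0, beta=0.25, rho0=1.0):
    """Probability of firing in one time step at membrane potential v."""
    rho = rho0 * math.exp(-beta * (v - theta))
    return 1.0 - math.exp(-1.0 / rho)


if __name__ == "__main__":
    train = [1] + [0] * 19  # a single presynaptic spike at time 0
    for t in (5, 9, 15):
        v = psp(t, [2.0], [train])
        print(t, round(v, 4), round(firing_probability(v), 4))
```

With these defaults the kernel is silent for the first five time units and peaks one time constant after the delay, and the firing probability rises smoothly with the membrane potential rather than switching at the threshold.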
Stability of the model

A fully connected network of spike response neurons linked with uniform synapses and with
Fig. 2. (a) shows a typical refractory potential with a very small depolarising after-potential; (b) shows a PSP with the axonal delay $\Delta^{ax}$ indicated. Both panels are plotted against time (msec).
negligible noise can exist in one of two fundamental states: synchronous or asynchronous. The network will show either synchronous or asynchronous behaviour depending on the value of the axonal delay parameter ($\Delta^{ax}$). This can be explained if we imagine a network which is almost perfectly synchronised. If a particular neuron fires early in this network, then the PSPs it induces in other neurons will all arrive early. In order to remove the effect of the rogue early firing neuron we need the temporal advance in firing of postsynaptic neurons to be less than the temporal advance in PSP arrival. In this way the firing time discrepancy in the rogue neuron will be absorbed by the network, which will tend toward perfect synchronisation. The factor which defines whether the advance in firing is greater or less than the advance in PSP arrival is the point of intersection between the PSP and the effective threshold ($\theta - R(t)$), and this point of intersection is strongly affected by $\Delta^{ax}$. It can be seen (see also Gerstner, 1998a) that if the effective threshold intersects the PSP on the rising part of the PSP curve then the firing advance will be less than the PSP advance and the network will tend towards synchronous firing. This situation occurs with longer values of $\Delta^{ax}$. Shorter values of $\Delta^{ax}$ will produce a network that is not able to absorb fluctuations in the timing of individual neurons and, assuming a minimum level of noise, the network will settle into an asynchronous firing state. Synchronous firing states can be harnessed to hold temporal sequences of patterns, although we have yet to investigate this in the context of PfCx delay cells and dopamine.
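The argument about firing advance versus PSP advance can be written as a short first-order calculation. This is a sketch in notation introduced here, not a derivation taken from the cited work: $\vartheta(t)$ denotes the effective threshold, $t_a$ the PSP arrival time, $t_f$ the firing time, and $\delta_{in}$, $\delta_{out}$ the advances in PSP arrival and in firing. We assume a differentiable PSP and an effective threshold that is falling at the moment of firing (the refractory potential is recovering towards zero).

```latex
% Firing occurs where the PSP meets the effective threshold:
P(t_f - t_a) = \vartheta(t_f), \qquad \vartheta(t) = \theta - R(t).

% Advance the PSP arrival by \delta_{in}; the firing time advances by \delta_{out}:
P(t_f - \delta_{out} - t_a + \delta_{in}) = \vartheta(t_f - \delta_{out}).

% Linearising both sides about the unperturbed crossing:
P'\,(\delta_{in} - \delta_{out}) = -\vartheta'\,\delta_{out}
\quad\Longrightarrow\quad
\delta_{out} = \frac{P'}{P' - \vartheta'}\;\delta_{in}.

% Rising PSP (P' > 0) with a falling threshold (\vartheta' < 0):
0 < \frac{\delta_{out}}{\delta_{in}} = \frac{P'}{P' + |\vartheta'|} < 1 .
```

Under these assumptions a crossing on the rising part of the PSP shrinks any timing error from one volley to the next, which is the absorbing behaviour described above. On the falling part ($P' < 0$) the ratio becomes negative, so an early PSP delays the spike instead of advancing it and the simple contraction argument no longer applies.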
Model of PfCx delay cells

In order to model PfCx delay cells we need to be able to reproduce the firing properties of a single PfCx cell. Yang et al. (1996) show that the most prevalent pyramidal cell type in layer V-VI of rat PfCx has an initial spike doublet which is due to a prominent depolarising after-potential (DAP). They term these cells intrinsic bursting (IB) cells. We make the assumption that these results can be applied to layer III pyramidal neurons since they are morphologically similar although generally larger. The DAP is dependent on the influx of Ca2+ ions to the cell (Haj-Dahmane and Andrade, 1997). IBs also show a marked slow after-hyperpolarisation (AHP) which increases with the number of spikes fired until the cell stops firing. In order to reproduce these firing characteristics we need to assume that the shape of the refractory potential changes between the initial spike and subsequent spikes. It is known that slow AHPs are due to Ca2+-activated potassium channels (Hille, 1992). These channels are activated by Ca2+ influxes occurring during each action potential. The slow AHP is absent when extracellular Ca2+ is absent, and is enhanced when [Ca2+] is raised or after a series of action potentials. This tells us that the shape of the refractory potential is largely dependent on Ca2+ fluxes and so will vary between action potentials. We use the following equations to model the change in refractory potential. The initial refractory potential, as shown in Fig. 2(a), is given by:

[Eqn 7: the expression is garbled in the source; the surviving fragment reads k = A' - refr_abs + 5]

Subsequent refractory potentials are given by a modified version of Eqn. 7:

$$k = \frac{s + 50 - \mathrm{refr}_{abs}}{5} + 2$$

$\mathrm{refr}_{abs}$ refers to the fast AHP or absolute refractory period, which can be specified explicitly in the model if so desired. Figure 3 illustrates the increasing magnitude of refractoriness with increasing number of spikes. The spiking characteristics for firing of a single IB PfCx cell are shown in Fig. 4.
Fig. 3. The increasing magnitude of refractory potentials as the number of spikes increases. [x-axis: time/msec]
Fig. 4. A single delay cell given an initial depolarising current. There is an initial spike doublet followed by spike frequency adaptation. Note that the graph indicates spiking on the y-axis and not membrane potential. Parameters used: $\theta = 0.0$, $1/\beta = 4.0$. [x-axis: time/ms]
For a single cell there is no synaptic input and therefore there are no PSPs to contribute to the membrane potential V(t). We can see that as a result of the increasing magnitude of the AHP, spike frequency adaptation occurs, as is seen in the IB neurons. Our model of a PfCx delay cell reproduces the firing properties of the IB cell seen in rat PfCx.
The effect of dopamine on PfCx delay cells

The effects of dopamine on the firing properties of IB PfCx cells are twofold. There is an increase in firing frequency, and a loss of AHP and spike frequency adaptation. The increase in firing frequency is due to an effective reduction in the firing threshold of the neuron. These results are replicated in our model of a single neuron (Fig. 5). To simulate the effects of dopamine we have used a lower threshold and a refractory potential which does not change with the number of spikes produced. The equations for this refractory potential are Eqns 7 and 10.
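The two manipulations described here (a lower threshold and a refractory potential that no longer grows with spike count) can be illustrated with a deliberately simplified, deterministic toy cell. Everything in this sketch is an assumption made for illustration: the exponential refractory kernel, the linear growth of its amplitude with spike count, the constant drive and the values of `amp` and `tau_r` are ours and do not reproduce the refractory equations of the model. Only the two threshold values (0.0 and -4.0) are taken from the captions of Figs 4 and 5.

```python
import math


def simulate_cell(theta, adapting, drive=1.0, amp=20.0, tau_r=20.0, steps=1000):
    """Return the spike times of one noise-free cell under constant drive.

    adapting=True : refractory amplitude grows with the number of spikes so far.
    adapting=False: every spike is followed by the same refractory potential.
    """
    spikes = []
    for t in range(steps):
        if spikes:
            scale = len(spikes) if adapting else 1
            refractory = -amp * scale * math.exp(-(t - spikes[-1]) / tau_r)
        else:
            refractory = 0.0
        # Spike when drive plus refractory potential exceeds the threshold.
        if drive + refractory > theta:
            spikes.append(t)
    return spikes


def intervals(spikes):
    """Inter-spike intervals of a list of spike times."""
    return [b - a for a, b in zip(spikes, spikes[1:])]


control = intervals(simulate_cell(theta=0.0, adapting=True))
dopamine = intervals(simulate_cell(theta=-4.0, adapting=False))

if __name__ == "__main__":
    print("control ISIs: ", control[:6])
    print("dopamine ISIs:", dopamine[:6])
```

In this toy setting the control cell shows lengthening inter-spike intervals (spike frequency adaptation), whereas the dopamine condition fires faster and at a constant interval, which is the qualitative pattern described in the text.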
Fig. 5. A single delay cell firing under the influence of dopamine acting on D1 receptors, given an initial depolarising current. Horizontal axis: time/ms. Parameters: $\theta = -4.0$, $1/\beta = 40.0$.
Fig. 6. Comparison of inter-spike intervals. Upper trace (open circles) is for a single cell without dopamine, lower trace (asterisks) is for a single cell with dopamine. Horizontal axis: time/ms. Compare with rat data in Yang and Seamans (1996), p. 1925.
These effects seem to occur in the proximal dendrites and around the soma. There is another effect of dopamine on the delay cells, which occurs in the distal dendrites. Dopamine appears to increase the threshold of the Ca2+-dependent high-threshold spike (HTS). This has the effect of attenuating distal input to the layer III delay cells. It is only the larger (i.e. layer III) pyramidal cells which exhibit the HTS. The effect of dopamine here is to gate inputs to the layer III delay cells and so act to isolate the delay cells during a working memory task.
Having constructed a biologically plausible model of a single delay cell and the action of dopamine on this cell, we now need to consider a network of such neurons. We can compare directly the effects of dopamine on the ensemble activity of a network with an unmodified system. Results are shown in Figs 7 and 8, where we can see that dopamine gives rise to coherent network activity. By regularising the firing of individual neurons in the network, dopamine in the PfCx is able to stabilise activity and allow for information processing to take place. We will discuss exactly how the architecture of the PfCx can be utilised for information processing, and the role dopamine has, in the next section.
Fig. 7. A network of 100 delay cells fully inter-connected with uniform weights, representing a delay ensemble. The network shows asynchronous activity. Horizontal axis: time/ms. Parameters: $\theta = 0.0$, $1/\beta = 4.0$, $\Delta^{ax} = 5.0$, $\tau_s = 4.0$.
Fig. 8. The same network as in Fig. 7 but under the influence of dopamine. The network shows synchronous activity. Horizontal axis: time/ms. Parameters: same as in Fig. 7, except $\theta = -4.0$, $1/\beta = 40.0$.
Information processing in the PfCx
We have discussed some of the salient aspects of the architecture of the PfCx; now it is time to put this into the context of information processing within the PfCx. We know that at the single cell level the PfCx has delay cells which remain active during the delay period of delayed response tasks. These cells are considered to form the neural substrate for a general class of working memory type processes. If we consider information to consist of temporo-spatial firing patterns then we need to consider how an ensemble of pyramidal delay cells can maintain such a pattern. As mentioned above, we propose that the layer III cells form a delay ensemble which is the fundamental unit of activity in working memory or delayed response tasks. Layer III pyramidal cells also act as input cells to the layer V cells which project subcortically, in particular to hippocampus,
NAcc, and VTA. Layer III cells receive inputs from association cortex and so can direct processed aspects of sensory input into the delay ensemble. The crucial role of dopamine is to activate and stabilise the delay ensemble through its actions at proximal dendrites, where it reduces the firing threshold and regularises the firing frequency. Without a concurrent dopamine input to PfCx, any input pattern is very unlikely to be able to trigger firing in the delay cells, and if the delay ensemble is activated it will not enter a coherent firing state. In order for information to be held online the delay ensemble needs to be isolated from further inputs. This is the other role of dopamine. Acting at D1 receptors on apical dendrites, it increases the HTS threshold and makes it harder for inputs to reach the delay ensemble. We require, however, that any pattern may be stored in this delay ensemble without any learning process or structural re-arrangement occurring. Without some other mechanism for ensuring that non-firing (off) cells within the pattern are not pushed into a firing state by the activity of other cells in the ensemble, we cannot ensure that the
exact input pattern is maintained within the regular firing of the delay ensemble. This other mechanism is supplied by the GABA interneurons, which provide reciprocal global inhibition to pyramidal delay cells, and by the recurrent connections from layer V. A pattern of activity is input to the delay ensemble, arriving concurrently with the dopamine input which activates the delay ensemble (see Fig. 1). The delay cells are reciprocally connected to a GABA interneuron which globally inhibits all cells in the ensemble and effectively increases the firing threshold of all cells. GABA synapses are known to be fast acting, so it is entirely plausible that the inhibitory effects will initially override the excitatory input from lateral connections. Those cells which fired initially will activate recurrent connections via layer V pyramidal cells and this excitatory input will re-activate these cells. Cells which do not receive recurrent input will not have enough excitatory input to overcome the increase in threshold due to GABA cell inhibition and so will remain quiet. In this way the spatial resolution of the input firing pattern is
maintained over time. It is clear that both these mechanisms must be intact for accurate pattern retention to occur. We can illustrate the effects of global inhibition on ensemble activity in our model of a delay ensemble. Cells which do not receive recurrent input have an effectively increased threshold; thus by increasing the value of $\theta$ for cells receiving no input in the model we can reproduce the desired conditions. Results are shown in Fig. 9. We can see that the average firing activity for inhibited and non-inhibited ensembles looks very similar, i.e. both show coherent firing patterns (compare Figs 8 and 9). However, an examination of exactly which cells are firing shows that only the ensemble with inhibition and recurrent connections maintains the correct spatial firing pattern. What we have here is effectively a weightless form of memory, whereby spatial patterns can be accurately stored over time without any form of synaptic modification, dependent upon an intact dopaminergic input to PfCx and a functional PfCx architecture.
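The requirement that both mechanisms be intact can be sketched with a toy binary ensemble. This is an illustration under stated assumptions rather than the spiking network used for Figs 7 to 9: units are on or off, time is discrete, and the weights `lateral`, `recurrent` and `inhibition` and the threshold `theta` are hypothetical values chosen only to make the logic visible.

```python
def step(active, lateral, recurrent, inhibition, theta):
    """Advance a binary delay ensemble by one time step."""
    mean_activity = sum(active) / len(active)
    new_state = []
    for was_active in active:
        drive = lateral * mean_activity       # excitation from lateral connections
        drive -= inhibition * mean_activity   # global inhibition from the GABA interneuron
        if was_active:
            drive += recurrent                # layer V loop re-excites cells that fired
        new_state.append(1 if drive >= theta else 0)
    return new_state


def run(pattern, steps=10, **params):
    """Iterate the ensemble and return its final state."""
    state = list(pattern)
    for _ in range(steps):
        state = step(state, **params)
    return state


pattern = [1, 0, 1, 1, 0, 0, 1, 0]
intact = run(pattern, lateral=1.0, recurrent=1.0, inhibition=1.0, theta=0.2)
no_inhibition = run(pattern, lateral=1.0, recurrent=1.0, inhibition=0.0, theta=0.2)
no_recurrence = run(pattern, lateral=1.0, recurrent=0.0, inhibition=1.0, theta=0.2)
print(intact, no_inhibition, no_recurrence)
```

In this sketch the pattern survives only when inhibition and recurrence are both present. Without inhibition the off cells are recruited by lateral excitation, and without recurrence the whole ensemble falls silent.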
Fig. 9. A network with global inhibition and recurrent connections under the influence of dopamine. This network also shows synchronous behaviour. Horizontal axis: time/ms. Parameters: as for Fig. 8, except that cells which receive no initial input (and therefore no recurrent inputs) have $\theta = 1.0$.
It is very likely that this mechanism can be extended to sequences of patterns although we have yet to carry out the necessary simulation work to show this.
Schizophrenia: multiple pathologies lead to the same outcome
We have earlier looked at how the PfCx as a brain region is involved in schizophrenia. Now, in the light of our model of PfCx delay cells and the various components necessary for correct functioning of this model, we can consider in more detail the nature of PfCx dysfunction in schizophrenia. The PfCx is known to be underactive in schizophrenia, and there is strong evidence that there is also a reduction in D1 receptor activity in this part of the brain. From our model of delay cell activity we can see in a qualitative fashion how this reduction in dopamine D1 receptor activity affects the ability to hold spatio-temporal patterns online. A reduction in the activity of D1 receptors will have two main effects. One is that the coherence of firing patterns in the delay ensemble will be disrupted. This corresponds directly to a disruption in working memory ability. The other implication of a hypodopaminergic state is that the gating of inputs to the delay ensemble is reduced, leading potentially to extraneous activity in some of these cells. This could be construed as a reduction in attentional ability, another hallmark of schizophrenia. Overactivity of D1 receptors would also lead to deranged prefrontocortical function through over-gating of inputs and preventing access to the delay ensembles. Many psychotomimetic drugs do raise dopamine levels, e.g. amphetamine, cocaine, LSD (via serotonin), although their principal locus of action is in NAcc. Interestingly, PCP, a drug which can induce behaviour identical to both the negative and positive symptoms of schizophrenia, causes an increase in NAcc dopamine and a decrease in PfCx dopamine (Svensson et al., 1995). This is good evidence for the theory that in schizophrenia there is hypodopaminergia in PfCx and hyperdopaminergia in subcortical structures, especially NAcc.
We can also see from the model, however, that defective PfCx function can arise from causes other
than reduced dopamine action. A loss of global inhibition from GABA interneurons will prevent the maintenance of a spatial pattern over time, although the delay ensemble activity will remain coherent. In terms of the overall activity of the PfCx, this would lead to a slight hypofunctioning, as pyramidal cell activity would be unchanged but there would be less GABA cell activity. In line with this observation there are data to suggest that in schizophrenia there may be a reduction in the number of GABA interneurons and a reduction in inhibition (Benes et al., 1996). If we look at the converse case, where GABA cells are over-active, we would expect to see poor performance on working memory tasks, and this is seen in studies assessing the cognitive effects of benzodiazepines (Coull et al., 1995). Diazepam also reduces the novelty-induced release of dopamine in PfCx (Feenstra et al., 1995), so we cannot be sure exactly by which mechanism benzodiazepines reduce performance on working memory tasks. Interestingly, it is observed that one of the side-effects of vigabatrin, an anti-epileptic drug which works through raising cortical GABA levels, is psychosis. The third component of the model which is crucial to the formation of coherent firing activity in delay ensembles is the duration of the axonal delay ($\Delta^{ax}$). As discussed above, if the duration of the axonal delay is too short then coherent activity will not form in the network (except in conditions of zero noise, which do not exist in biological neural systems). There is no evidence of conduction defects in schizophrenia. There is, however, some evidence for developmental abnormalities (Harrison, 1997), and in particular for the hypothesis that over-pruning of synapses in developing neocortex underlies schizophrenia (Keshavan et al., 1994). The PfCx matures later than all other neocortical regions, and its maturation (during adolescence) coincides with the earliest ages of onset of schizophrenia.
If synaptic pruning proceeds by a distance rule where long and/or synaptically weak connections are removed, as suggested by Hoffman and Dobscha (1989), then we would see shorter connections maintained at the expense of longer and weaker connections. If this were the case then, assuming the general
architecture is preserved, we would have delay ensembles composed of neurons with short axonal transmission times. These axonal delays may be too short to give coherent firing, and so coherent firing within a delay ensemble could never arise. In addition, there is a possible correlation between neonatal brain damage and the subsequent development of schizophrenia. It could be that a disruption in the development of frontal architecture, either in terms of altered axonal delays or disruption to the recurrent connections from layer V pyramidal cells to layer III stripes, is the mechanism for this correlation. It can be seen that multiple pathologies can all lead to the same or similar outcomes in terms of PfCx function. This in turn implies that the same pathologies are involved in the pathogenesis of schizophrenia. This fits nicely with the growing view that schizophrenia is not one illness but a collection of illnesses or conditions with a common core pathology. Our model shows how this could be the case.
The cognitive domain: models of the Tower of London task
In this second part of the chapter we will look at the possible roles of PfCx, NAcc and hippocampus in a particular cognitive task. As mentioned earlier, these three brain regions are strongly implicated in the neuropathology of schizophrenia. We have built two models which illustrate how PfCx and the subcortical regions could each be involved in complex cognitive processes. We have also examined interactions between these two models and the effects of damage in one or other of the models. This enables us to investigate certain hypotheses regarding the genesis of schizophrenic symptoms in a way which is not possible in vivo.
The Tower of London task taps planning ability and in doing so utilises working memory. Several studies have shown that the PfCx is active during the Tower of London task (Morris et al., 1993; Baker et al., 1996; Owen, 1997) and also that performance is impaired in patients with either frontal lobe damage or schizophrenia (Morice and Delahunty, 1996). Thus it can be used as a measure of the preservation of these functions in various normal and patient populations. The aim of the Tower of London task (which is a simplified version of the more general Tower of Hanoi problem) is to go from an initial state to an end state in the minimum number of moves (see Fig. 10). Moves consist of removing one disc/piece from the top of its stack and placing it in the lowest unfilled position on another peg. With the three-disc task, different starting and goal configurations can be used to give a set of 1260 problems ranging in difficulty from 1-move to 8-moves. Subjects are asked to try to work out a solution to the problem mentally before they actually make any moves, and the time taken to think of a solution is called the thinking time.
Fig. 10. The Tower of London planning task. The aim is to get from the initial configuration to the final configuration in a specified minimum number of moves.
Problems requiring few moves (fewer than three) are easy to solve and do not really tax PfCx. Accordingly, frontal lobe patients and schizophrenics are not especially impaired on these problems. However, problems with more than three moves are increasingly difficult, as shown by the fact that the time taken to solve them increases monotonically with number of moves. There are several factors which make these problems harder; one is that the number of decision points increases. A decision point occurs when it is not clear what the next move should be, i.e. it is not possible to place any disc in its goal position and there is more than one possible move. Ward and Allport (1997) refer to moves like this as sub-goal chunks. In order to solve the problem efficiently it is necessary to store a decision point temporarily in working memory and also to remember which move was made from it. This means it is possible to backtrack and try alternative moves if the first choice of move was not successful. The time taken to solve a problem, and hence its difficulty, depends on the number of sub-goal chunks (or decision points).
Another factor which adds to the degree of difficulty of a problem is whether there is any goal-subgoal conflict (Goel and Grafman, 1995; Morris et al., 1997). A goal-subgoal conflict arises when the correct move required to solve the problem efficiently takes a disc out of its goal position, or fails to place the disc in its goal position when there is an opportunity to do so. There is conflict because the instinctive choice of move is always to put a disc in its goal position. In order to solve certain problems it is necessary to inhibit the instinctive ‘going for goal’ behaviour when goal-subgoal conflict occurs. We can see now why the PfCx is necessary for efficient performance of the Tower of London task. It is required to hold decision-points in working memory and also to inhibit an instinctive behaviour in certain situations.
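The size of the three-disc problem set quoted above can be checked by enumerating the state space. The sketch below assumes the classic apparatus with pegs that hold three, two and one disc respectively (`CAPACITY`); the peg heights and the disc names are assumptions, since the text does not state them. It counts the legal configurations, counts the ordered pairs of distinct start and goal configurations, and uses breadth-first search to find the minimum number of moves for each pair.

```python
from collections import deque
from itertools import permutations

CAPACITY = (3, 2, 1)  # assumed peg heights of the classic three-disc task


def all_states(discs=("R", "G", "B")):
    """Enumerate every legal arrangement of the discs on the three pegs."""
    states = set()
    for order in permutations(discs):
        # Split the ordered discs into three consecutive stacks (bottom to top).
        for i in range(len(order) + 1):
            for j in range(i, len(order) + 1):
                pegs = (order[:i], order[i:j], order[j:])
                if all(len(p) <= c for p, c in zip(pegs, CAPACITY)):
                    states.add(pegs)
    return states


def moves(state):
    """Yield every state reachable by moving one top disc to another peg."""
    for src in range(3):
        if not state[src]:
            continue
        for dst in range(3):
            if dst != src and len(state[dst]) < CAPACITY[dst]:
                pegs = list(state)
                pegs[dst] = pegs[dst] + (pegs[src][-1],)
                pegs[src] = pegs[src][:-1]
                yield tuple(pegs)


def min_moves_from(start):
    """Breadth-first search: minimum number of moves from start to every state."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in moves(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


states = all_states()
problems = {(s, g): d
            for s in states
            for g, d in min_moves_from(s).items() if g != s}
print(len(states), len(problems), min(problems.values()), max(problems.values()))
```

Under the assumed peg heights there are 36 configurations, and 36 x 35 = 1260 ordered start-goal pairs, which matches the number of problems given in the text.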
The prefrontal cortex model
This model is rule-based and requires working-memory components to function. In both models it is assumed that given any particular configuration, all the moves which can be made from it can be seen. In other words, we consider that knowing which discs can physically be moved does not constitute part of the cognitive process of solving the task but belongs in the realm of primary sensorimotor processing. This is represented in our models as a set of pre-learnt associations between current state and possible moves. These associations are represented in neural networks trained with the standard back-propagation algorithm (e.g. Hertz et al., 1991). Since changes in primary sensorimotor processing are not fundamental features of schizophrenia, these network representations are not altered in any of our simulations.
The PfCx model uses the heuristic suggested by Ward and Allport (1997) (page 69) on the basis of their psychological experiments. First a target disc is designated; this is the first, lowest, incorrectly placed disc as we scan the puzzle from right to left and is effectively the most difficult disc to place. If this disc is covered then we look to see if its target location is occupied. If the target location can be freed then that move is selected; otherwise we move the discs above the target disc until it can be moved. This heuristic is combined with rules regarding the ‘goodness’ of a move (see Table 1) to select a move. This heuristic is similar to the perceptual strategy of Goel and Grafman (1995). It is important to note that a move which places a disc in its goal position is always taken. This can give rise to goal-subgoal conflict. Results for the basic model are summarised in Table 2. The performance times are quantitatively similar to those obtained from real subjects. The results for error rates are qualitatively similar although quantitatively much worse. This can be accounted for by the fact that we used a 3-disc problem, which has a higher decision-point to goal ratio than the 5-disc problem. Also, our model did not have an explicit move counter and was only able to restart a problem when the number of items stored in working memory exceeded the minimum number of moves.
TABLE 1
Scoring of different move types. Higher scoring moves take precedence over lower scoring moves.
Move type: Moves a disc to its goal position; Moves target disc, but not to goal location; Moves disc away from goal location; Moves disc not designated as target disc; Moves the same disc twice on successive moves
Score: 3; 1.5-2.5; 1; 0
TABLE 2
Results for PfCx model. We used a randomly selected problem set consisting of 6 types of each problem, with problems ranging from 1-move to 8-moves. 10 simulations were run using this problem set, giving a total of 480 data points. This table shows mean performance times and the average error rate. Error rate is calculated as the number of problems which are solved in more than the minimum number of moves over the number which are solved with the minimum number of moves. One problem (7-moves) could not be solved at all.
Number of moves   1     2     3     4     5     6     7     8
Time              1.0   2.0   4.9   6.7   7.2   8.8   10.6  11.7
Error rate        0     0     0     0.38  0     0.38  0.62  0.28

The reinforcement learning model
The second model is a reward-based model which uses the adaptive-critic form of reinforcement learning (Barto et al., 1983; Barto, 1995; Kaelbling et al., 1996) to solve and learn the correct sequence of moves for the Tower of London task. It has been suggested, as mentioned earlier, that the basal ganglia operate under such a learning scheme. Also, Montague et al. (1996) have shown how the temporal difference error signal acts analogously to dopamine in reward-based learning. They did not state specific anatomical correlates for their model; however, they did suggest that the weight-learning locus might be in the basal ganglia or ventral striatum.
We need to mention, briefly, what function the NAcc may perform in a normal brain. Pennartz et al. (1994) propose that its main role is to select in parallel a set of motor (or other) output patterns (or programs) in response to an input pattern (or situation) in order to optimise reward. We can extend this by suggesting that it selects a sequence of output patterns to optimise reward. Clearly the sequence selected represents an appropriate response to the input, based on past responses. We will assume that the learning occurs in the NAcc and the dopamine projection arises in the VTA. We should point out that all situations are both potential learning and performance situations. This means that when we refer to learning in the NAcc, we also refer to performance. Performance does not necessarily entail physical enactment and could instead be mental rehearsal. We have used the same Tower of London task as an illustration of sequence learning here. It is unlikely that the solutions to the task are actually memorised, but if we had modelled chess playing, for example, it would not be unrealistic to envisage memorisation of sequences of moves as well as calculating optimal de novo sequences. The NAcc model uses the following equation to define the temporal difference error:
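Going by the symbol definitions given for Eqn 11 in the next paragraph (actual reward, discounted current prediction, previous prediction), the temporal difference error presumably has the standard adaptive-critic form below. This is a reconstruction under that assumption, and the symbol $\delta_t$ for the error is an assumption introduced here:

$$\delta_t = r_t + \gamma P_t - P_{t-1} \tag{11}$$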
For full details of adaptive critic learning see Barto et al. (1983) and Barto (1995). Essentially, there are two learning elements in this system: the adaptive critic, which makes predictions regarding the expected reward a particular move will bring, and whose output is the temporal difference error; and the actor, which learns which are good moves to make in a particular situation under the guidance of the critic. Both actor and critic are forms of neural network. Weight changes in the actor network are proportional to the temporal difference error. In Eqn 11, $r_t$ represents the actual reward received by the system (0 at all times except when the target is achieved, when it is 1); $P_t$ represents the prediction of the critic at time $t$, discounted by a factor $\gamma$ (0.95 in our simulations) to allow the effects of reward to propagate through time but to eventually decay; and $P_{t-1}$ is the prediction at the previous timestep. The results of this simulation are presented in Table 3. The results profile is similar to that of the PfCx model but the solution times are much greater.
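How a critic of this kind propagates reward backwards through a sequence can be sketched in a few lines. The example is a simplification under stated assumptions, not the NAcc model itself: the critic is a lookup table rather than a neural network, the "task" is a hypothetical fixed walk through four states ending at the goal, the learning rate `lr` is arbitrary, and the error is taken to be reward plus discounted current prediction minus previous prediction, as the symbol definitions above suggest. Only the discount factor of 0.95 comes from the text.

```python
GAMMA = 0.95  # discount factor quoted in the text


def td_error(reward, p_now, p_prev, gamma=GAMMA):
    """Temporal difference error: r_t + gamma * P_t - P_(t-1)."""
    return reward + gamma * p_now - p_prev


def train_critic(n_states=4, episodes=500, lr=0.1):
    """Learn predictions for a fixed walk 0 -> 1 -> ... -> goal."""
    goal = n_states - 1
    prediction = [0.0] * n_states  # the goal itself keeps a prediction of 0
    for _ in range(episodes):
        for state in range(1, n_states):
            reward = 1.0 if state == goal else 0.0
            delta = td_error(reward, prediction[state], prediction[state - 1])
            # The critic moves its previous prediction towards the new estimate.
            prediction[state - 1] += lr * delta
    return prediction


def actor_update(weight, delta, lr=0.1):
    """Actor weight change proportional to the temporal difference error."""
    return weight + lr * delta


predictions = train_critic()
print([round(p, 4) for p in predictions])
```

In this sketch the prediction for the state just before the goal approaches 1, and each earlier state approaches the prediction of its successor multiplied by 0.95, so the effect of the reward spreads back along the sequence while decaying with distance.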
TABLE 3
Results for NAcc model. The same data-set was used for this model as for the PfCx model.
Number of moves   1      2      3      4      5     6        7       8
Time              157.5  152.7  543.5  196.6  1119  1459.25  3152.7  3452.7
Error rate        0.2    0      0      0.17   0     0.2      0.5     0

Dysfunction of PfCx and subcortical systems and the genesis of schizophrenic symptoms
We have seen how a PfCx model and a NAcc model can both, in different ways, generate sequences of associations in a particular context. In order to relate these models to schizophrenia we need to lesion them and observe their behaviour under ‘schizophrenic conditions’. We have lesioned each model separately. In the first part of this chapter we have seen how disruption of the PfCx’s working memory function can arise. We investigated three conditions representing PfCx dysfunction:
1. Loss of inhibition of the ‘going for goal’ instinct
2. Reduction in the number of items that can be stored in working memory
3. Disruption to the integrity of memories stored in working memory.
Results are shown in Table 4. With loss of inhibition we see a marked deterioration in performance on the longer problems. Reduced working memory capacity has a similar, although more gradual, effect. The third condition produced the most pronounced deterioration in performance. It also produced an interesting ‘jump’ phenomenon when working memory was utilised. This was where the corrupted memory stored in the PfCx led to a bizarre jump to a new and unrelated position. In the context of the Tower of London task this jump does not make much sense other than to ascribe failure. However, in a more general sense it is reminiscent of Knight’s Move in thinking. The third condition also gives results which accord well with observed performance of schizophrenics on the Tower of London task. PfCx deficits alone are very similar to the symptoms of the psychomotor poverty syndrome suggested by Friston et al. (1992) and Kaplan et al. (1993).
In looking at the NAcc model we investigated one main effect. This was to see how abnormal alterations in the level of dopamine (i.e. the temporal difference error, TDE) affect learning. We allowed the TDE to fluctuate randomly between 0 and 1. Unsurprisingly the effect on learning was very disruptive and in fact none of the sequences were learned correctly (data not shown). Instead the network followed a bizarre series of moves, often becoming stuck in oscillatory states for a while before moving on. Some problems resulted purely in oscillation between two states. These ‘behaviours’ could be interpreted as delusional belief states in that the network has learnt a set of associations which are very much out of context of the reality of the particular task being attempted. We suggest that the symptoms comprising the disorganisation syndrome arise in this way. The fact that the Knight’s move symptom arises from working memory disruption serves to illustrate the overlap between the psychomotor poverty and disorganisation syndromes.

TABLE 4
Results for the three conditions described in the text. In condition 2, working memory capacity is reduced from 9 items to 2 items. Results are averaged over one simulation only.
                          Number of moves
                          1    2    3    4     5     6     7     8
Condition 1  Time         1.0  2.0  3.2  4.0   5.0   6.0   FAIL  FAIL
             Error rate   0    0    0    0     0     0.3   FAIL  FAIL
Condition 2  Time         1.0  2.0  3.0  4.0   5.0   9.3   13.7  11.0
             Error rate   0    0    0    0.66  0.4   0.5   0.5   0.33
Condition 3  Time         1.0  2.0  3.3  10.5  5.4   9.0   X     X
             Error rate   0    0    0    0.33  0.17  0.67  1.0   1.0
We have already stated that there is considerable evidence for a dysfunction in frontal-subcortical connectivity to be involved in the pathogenesis of schizophrenia (Berman et al., 1994; Biver et al., 1995; Pantelis et al., 1997; Heckers et al., 1998). In order to investigate this we need some idea of how PfCx and subcortical structures, in particular the NAcc and hippocampus, interact normally. We have seen how the NAcc ensembles can learn sequences of associations, and how this can be disrupted by altering the error signal (dopamine). Pennartz et al. (1994) suggest that dopamine is necessary for switching between ensembles. There is, however, another possible mechanism for switching ensembles, and that involves the hippocampus. It has been shown (Mulder et al., 1997) that the hippocampus can induce short-lived (less than 60 minutes) long-term potentiation in the NAcc. It has also been shown that the hippocampus serves to gate PfCx inputs to the NAcc (O’Donnell and Grace, 1995). If we consider the PfCx to be able to guide and override learned responses in novel situations, then it is possible that the hippocampus plays a crucial role in this process by selecting an appropriate ensemble in the NAcc for the PfCx to act on. The hippocampus would at the same time allow the PfCx to act on the ensemble. With our model we can simulate this situation by allowing the PfCx model to guide the learning of the NAcc model. If the NAcc network has no stored associations, or stored associations similar to the new sequence, then learning is speeded up drastically and the NAcc model performs the task correctly from the start. However, if the NAcc network already has stored associations which are not similar to the new sequence or conflict with it directly, then the situation is quite different.
In this case, where we assume an overactive hippocampus has selected an inappropriate NAcc ensemble, the existing associations will conflict with the external input, and also the PfCx will not be able to override the behaviour of the NAcc because the hippocampus will only be allowing the inappropriate pattern to be gated. In addition, if the hippocampus is over-active then the duration of LTP it induces in the NAcc may be much longer. This can be illustrated in our model if we attempt to learn a new sequence using a network
that already has stored associations; in other words, there is inappropriate intrusion of controlled processes into automatic processes. The pre-existing associations act as an attractor and pull the new sequence towards it; this results in the conflation of two sequences or oscillatory behaviour. These unexpected or unwanted behaviours arising from this dysfunction may well be interpreted as alien, and could account for the depersonalisation which occurs in schizophrenia. In addition such an interpretation would account for several of the positive symptoms such as thought insertion, thought control and auditory hallucination (which is invariably in the third person). This may correspond to the reality distortion syndrome of Friston et al. (1992). In summary, we have tried to provide an account of the symptomatology of schizophrenia in terms of the individual breakdown in PfCx and NAcc function, suggesting that these correspond, respectively, to the psychomotor poverty and disorganisation syndromes. In addition we have looked at the interaction between NAcc and PfCx and shown how the hippocampus may play a crucial role in this. If hippocampal function is aberrant then this could lead to the third syndrome, reality distortion. Clearly more than one dysfunction can co-exist, and this allows for the overlap in the syndromes.
Discussion
We have replicated the firing properties of a PfCx pyramidal delay cell and its modulation by dopamine using a biologically realistic spiking model neuron. We have then been able to construct a network of model PfCx delay cells which we have called a delay ensemble and which may represent one of the lattice stripes observed in the PfCx. The delay ensemble is considered to be responsible for maintaining spatial patterns of information during the delay phase of delayed response tasks and as such forms part of the neural substrate of working memory. The main aim of this work has been to investigate the information processing properties of such a network and to show how dopamine could be involved in activating and stabilising the activity of the delay ensemble. Reduced PfCx function and
reduced dopamine activity in the PfCx are consistent findings in people with schizophrenia and so our work may have a direct bearing on the mechanisms underlying the pathogenesis of schizophrenia. It is interesting to note that coherent firing in a delay ensemble does not mean that resolution of a spatial firing pattern is maintained. This means that in observing ensemble activity in real neurons, for example in a scanning study, we are still not in a position to comment about the informational content of that activity. A related point about modelling is that when we use a spatio-temporal pattern of firing to represent information we cannot be sure at what level of representation this pattern exists. For example, does it represent the entire encapsulation of a visual scene observed at some point in the past, or does it represent one tiny aspect of that visual scene? Our model shows exactly how dopamine, acting on proximal dendritic D1 receptors, is necessary for sustained and regular firing of PfCx delay cells and how this is a prerequisite for coherent firing of the delay ensemble and the maintenance of information in the form of a spatial firing pattern. A secondary role of dopamine also appears to be its attenuation of the HTS in distal dendrites, which helps to improve the signal-to-noise ratio in the inputs to the delay ensemble. Several other interesting observations arise from our model:
GABA interneurons are necessary for the preservation of the resolution of spatial patterns; Recurrent connections from layer V back to layer I11 of PfCx are also necessary for maintaining the resolution of spatial patterns; The axonal transmission time affects the ability of a delay ensemble to exhibit coherent firing. These observations allow us to show how different pathologies can lead to disruption of PfCx function by affecting different aspects of the neurochemistry and architecture of the PfCx. In this sense our model helps to unify several different theories of the pathogenesis of schizophrenia. The hypotheses of hypofrontality; excessive synaptic pruning; reduction in the number of GABA neurons; developmental disturbance; and neonatal
brain damage can all be accounted for by different aspects of the model. In the second part of the chapter we have suggested that hippocampal over-activity may allow excessive and inappropriate PfCx input to the NAcc. If the main PfCx lesion was in the GABA interneurons then this would not produce hypofrontality, but would lead to disruption and disintegration of working memory processes. If this was combined with hippocampal over-activity then a situation of inappropriate intrusion of fragmented working memories into the NAcc could lead to a serious loss of reality. This may account for rapidly deteriorating cases of schizophrenia which are refractory to neuroleptic treatments. It is worth mentioning a few extra points regarding the possibility that disturbances in normal brain development may underlie schizophrenic pathology. We have shown the necessity for intact GABA interneurons and recurrent connections from layer V of PfCx. There is evidence that GABA interneuron terminals in layer III develop synchronously with pyramidal cell dendritic spines. The correct interaction between these two structures may well be at risk if there is any disruption to the normal developmental course. In addition it has been shown that in models of neural development inhibitory cells are necessary for the overgrowth of neurites (van Ooyen and van Pelt, 1994). The size of the neuritic field is reduced in the absence of inhibition, thus a failure of adequate inhibition during development may induce shorter axonal connections in the mature neurons. Other factors, including D1 receptor activity, are responsible for neurite outgrowth. It is possible that either a primary loss of GABA interneurons or a reduction in D1 receptor function could give rise to altered connectivity of mature PfCx neurons.
From our model we can see that any or all of these effects: reduced dopamine, shorter axonal connections, and reduced numbers of GABA interneurons, will disrupt PfCx function and potentially lead to schizophrenia. The regulation of dopamine activity and GABA activity in PfCx is crucial for correct functioning of PfCx and, as we have seen, an imbalance may lead to the symptoms of schizophrenia. The PfCx dopamine system is less stable than other dopamine
systems because there are no dopaminergic autoreceptors in the PfCx. Also, dopamine release in PfCx is release driven, as opposed to uptake driven (Garris and Wightman, 1994), so there is no real tonic-phasic distinction in PfCx compared to striatal dopamine dynamics (Grace, 1991). This means that there is less built-in scope for the autoregulation of dopamine levels in PfCx, and it is perhaps not surprising that we should see prevalent conditions such as schizophrenia arising which may well be due to an imbalance in this system. We should also mention the role of neuroleptics in the treatment of schizophrenia. Most neuroleptics are D2 antagonists and their main locus of action is the NAcc. From earlier discussion it is not surprising that they work only on the positive symptoms of schizophrenia and often make the negative symptoms worse. More recent neuroleptics such as clozapine and risperidone have additional serotonergic activity which may enhance the release of dopamine in PfCx. This would account for their actions in reversing the negative as well as positive symptoms of schizophrenia. To conclude, we have shown how the ability of the PfCx to hold information online depends on the interaction of several aspects of the architecture and neurochemistry of the PfCx. In particular, changes in dopamine activity, a reduction in the number of GABA interneurons, and shorter axonal connections can all disrupt the proper functioning of the PfCx. We have then suggested how PfCx may interact with the subcortical structures NAcc and hippocampus. Dysfunction in each of these brain regions may correlate with, respectively, the psychomotor poverty, disorganisation, and reality distortion syndromes. Dysfunction in the interaction between these regions may account for the overlap seen in these syndromes (Friston et al., 1992). These different pathologies have all been implicated in schizophrenia.
It is our hope that we have contributed in some way to the understanding of the mechanisms underlying the pathogenesis of schizophrenia.
Acknowledgements
We would like to thank David Sterratt for useful and informative discussions regarding spike response neurons. We are also grateful to the editors for their suggestions. AR is funded by the Wellcome Trust.
Abbreviations
NAcc = nucleus accumbens
PfCx = prefrontal cortex
TDE = temporal difference error
LTP = long-term potentiation
GABA = γ-aminobutyric acid
PSP = post-synaptic potential
DAP = depolarising after-potential
AHP = after hyperpolarising potential
IB = intrinsic bursting
HTS = high threshold spike
VTA = ventral tegmental area
PCP = phencyclidine
References
Bachneff, S.A. (1991) Positron emission tomography and magnetic resonance imaging: a review and a local circuit neurons hypo(dys)function hypothesis of schizophrenia. Biol. Psychiatry, 30: 857-886. Baker, S.C., Rogers, R.D., Owen, A.M., Frith, C.D., Dolan, R.J., Frackowiak, R.S.J. and Robbins, T.W. (1996) Neural systems engaged by planning: a PET study of the Tower of London task. Neuropsychologia, 34(6): 515-526. Barch, D.M., Braver, T.S., Nystrom, L.E., Forman, S.D., Noll, D.C. and Cohen, J.D. (1997) Dissociating working memory from task difficulty in human prefrontal cortex. Neuropsychologia, 35(10): 1373-1380. Barto, A.G. (1995) Adaptive critics and the basal ganglia. In: J.C. Houk, J.L. Davis and D.G. Beiser (Eds.), Models of Information Processing in the Basal Ganglia. MIT Press, Cambridge, MA, pp. 215-232. Barto, A.G., Sutton, R.S. and Anderson, C.W. (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst., Man Cybernet., SMC-13: 834-846. Benes, F.M., Vincent, C.L., Mane, A. and Kahn, Y. (1996) Upregulation of GABA(A) receptor-binding on neurons of the prefrontal cortex in schizophrenic subjects. Neuroscience, 75(4): 1021-1031. Benes, F.M. (1993) Neurobiological investigations in cingulate cortex of schizophrenic brain. Schizophr. Bull., 19(3): 537-549. Berman, K.F., Ostrem, J.L., Mattay, V.S., Esposito, G., Van Horn, J.D., Fuller Torrey, E. and Weinberger, D.R. (1994) The roles of the dorsolateral prefrontal cortex and hippocampus in working memory and schizophrenia. Biol. Psychiatry, 35: 622. Berman, K.F., Fuller Torrey, E., Daniel, D.G. and Weinberger, D.R. (1992) Regional cerebral blood flow in monozygotic twins discordant and concordant for schizophrenia. Arch. Gen. Psychiatry, 49: 927-941. Biver, F., Goldman, S., Luxen, A., Delvenne, V., De Maertelaer, V., De La Fuente, I., Mandlewicz, J. and Lotstra, F. (1995) Altered frontostriatal relationship in unmedicated schizophrenic patients. Psychiatry Res.: Neuroimaging, 61: 161-171. Busatto, G.F., David, A.S., Costa, D.C., Ell, P.J., Pilowsky, L., Lucey, J.V. and Kerwin, R.W. (1995) Schizophrenic auditory hallucinations are associated with increased regional cerebral blood flow during verbal memory activation in a study using single photon emission computed tomography. Psychiatry Res.: Neuroimaging, 61: 255-264. Cohen, J.D., Braver, T.S. and O’Reilly, R.C. (1996) A computational approach to prefrontal cortex, cognitive control and schizophrenia: recent developments and current challenges. Philosophical Transactions of the Royal Society of London B., 351: 1515-1527. Coull, J.T., Middleton, H.C., Robbins, T.W. and Sahakian, B.J. (1995) Contrasting effects of clonidine and diazepam on tests of working memory and planning. Psychopharmacology, 120: 311-321. Cowan, R.L., Sesack, S.R., Vanbockstaele, E.J., Branchereau, P., Chan, J. and Pickel, V.M. (1994) Analysis of synaptic inputs and targets of physiologically characterised neurons in rat frontal-cortex - combined in-vitro intracellular-recording and immunolabeling. Synapse, 17(2): 101-114. Dias, R., Robbins, T.W. and Roberts, A.C. (1996) Dissociation in prefrontal cortex and attentional shifts. Nature, 380: 69-72. Dias, R., Robbins, T.W. and Roberts, A.C. (1997) Dissociable forms of inhibitory control within prefrontal cortex with an analog of the Wisconsin Card Sort Test: restriction to novel situations and independence from ‘on-line’ processing. J. Neurosci., 17(23): 9285-9297.
Dolan, R.J., Fletcher, P., Frith, C.D., Friston, K.J., Frackowiak, R.S.J. and Grasby, P.M. (1995) Dopaminergic modulation of impaired cognitive activation in the anterior cingulate cortex in schizophrenia. Nature, 378: 180-182. Feenstra, M.G.P., Botterblom, M.H.A. and van Uum, J.F.M. (1995) Novelty-induced increase in dopamine release in the rat prefrontal cortex in vivo: inhibition by diazepam. Neurosci. Lett., 189: 81-84. Freedman, R., Coon, H., Myles-Worsley, M., Orr-Urtreger, A., Olincy, A., Davis, A., Polymeropoulos, M., Holik, J., Hopkins, J., Hoff, M., Rosenthal, J., Waldo, M.C., Yaw, J., Young, D.A., Breese, C.R., Adams, C., Patterson, D., Adler, L.E., Kruglyak, L., Leonard, S. and Byerley, W. (1997) Linkage of a neurophysiological deficit in schizophrenia to a chromosome 15 locus. Proceedings of the National Academy of Sciences of the United States of America, 94: 587-594.
Friedman, H.R. and Goldman-Rakic, P.S. (1994) Coactivation of prefrontal cortex and inferior parietal cortex in working memory tasks revealed by 2DG functional mapping in the rhesus monkey. J. Neurosci., 14: 2775-2788. Friston, K.J., Liddle, P.F., Frith, C.D., Hirsch, S.R. and Frackowiak, R.S.J. (1992) The left medial temporal region and schizophrenia. Brain, 115: 367-382. Funahashi, S. and Kubota, K. (1994) Working memory and prefrontal cortex. Neurosci. Res., 21: 1-11. Fuster, J. (1989) The Prefrontal Cortex. Raven Press, N.Y. Garris, P.A. and Wightman, R.M. (1994) Different kinetics govern dopaminergic transmission in the amygdala, prefrontal cortex and striatum: an in vivo voltammetric study. J. Neurosci., 14(1): 442-450. Gerstner, W. (1998a) Populations of spiking neurons. In: W. Maass and C.M. Bishop (Eds.), Pulsed Neural Networks, chap. 10. MIT Press, pp. 257-291. Gerstner, W. (1998b) Spiking neurons. In: W. Maass and C.M. Bishop (Eds.), Pulsed Neural Networks, chap. 1. MIT Press, pp. 3-54. Gerstner, W. and van Hemmen, J.L. (1994) Coding and information processing in neural networks. In: E. Domany, J.L. van Hemmen and K. Schulten (Eds.), Models of Neural Networks II: Temporal Aspects of Coding and Information Processing, chap. 1. Springer-Verlag, pp. 1-93. Goel, V. and Grafman, J. (1995) Are the frontal lobes implicated in ‘planning’ functions? Interpreting data from the Tower of Hanoi. Neuropsychologia, 33(5): 623-642. Goldman-Rakic, P.S. (1990) Neocortical memory circuits. Cold Spring Harbor Symposia on Quantitative Biology, 55: 1025-1038. Goldman-Rakic, P.S. (1995) Toward a circuit model of working memory and the guidance of voluntary motor action. In: J.C. Houk, J.L. Davis and D.G. Beiser (Eds.), Models of Information Processing in the Basal Ganglia, chap. 7. MIT Press, Cambridge, Massachusetts, pp. 131-148. Grace, A.A. 
(1991) Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: a hypothesis for the etiology of schizophrenia. Neuroscience, 41(1): 1-24. Gray, J.A., Feldon, J., Rawlins, J.N.P., Helmsley, D.R. and Smith, A.D. (1991) The neuropsychology of schizophrenia. Behav. Brain Sci., 14: 1-84. Grobin, A.C. and Deutch, A.Y. (1998) Dopaminergic regulation of extracellular gamma aminobutyric acid levels in the prefrontal cortex of the rat. J. Pharmacol. Exp. Therapeutics, 285(1): 350-357. HajDahmane, S. and Andrade, R. (1997) Calcium-activated cation non-selective current contributes to the fast afterdepolarisation in rat prefrontal cortex neurons. J. Neurophysiol., 78(4): 1983-1989. Harrison, P.J. (1997) Schizophrenia: a disorder of neurodevelopment? Current Opinion in Neurobiology, 7: 285-289. Heckers, S., Rauch, S., Goff, D., Savage, C., Schacter, D., Fischman, A. and Alpert, N. (1998) Impaired recruitment of
the hippocampus during conscious recollection in schizophrenia. Nat. Neurosci., 1(4): 318-323. Hertz, J., Krogh, A. and Palmer, R.G. (1991) Introduction to the Theory of Neural Computation. Addison Wesley. Hille, B. (1992) Ionic Channels of Excitable Membranes (second edition). Sinauer Associates Inc., Sunderland, Massachusetts. Hoffman, R.E. and Dobscha, S.K. (1989) Cortical pruning and the development of schizophrenia: a computer model. Schizophr. Bull., 15(3): 477-489. Hoffman, R.E. and McGlashan, T.H. (1993) Parallel-distributed processing and the emergence of schizophrenic symptoms. Schizophr. Bull., 19(1): 119-140. Horn, D. and Ruppin, E. (1995) Compensatory mechanisms in an attractor neural network model of schizophrenia. Neural Comput., 7: 182-205. Kaelbling, L.P., Littman, M. and Moore, A.W. (1996) Reinforcement learning: a survey. J. Artif. Intell. Res., 4: 237-285. Kaplan, R.D., Szechtman, H., Franco, S., Szechtman, B., Nahmias, C., Garnett, E.S., List, S. and Cleghorn, J.M. (1993) Three clinical syndromes of schizophrenia in untreated subjects: relation to brain glucose activity measured by positron emission tomography (PET). Schizophr. Res., 11: 47-54. Keshaven, M.S., Anderson, S. and Pettegrew, J.W. (1994) Is schizophrenia due to excessive synaptic pruning in the prefrontal cortex? The Feinberg hypothesis revisited. J. Psychiatr. Res., 28(3): 239-265. Levin, H.S., Eisenberg, H.M. and Brenton, A.L. (Eds.) (1991) Frontal lobe function and dysfunction. Oxford University Press. Lewis, D.A. and Anderson, S.A. (1995) The functional architecture of the prefrontal cortex and schizophrenia. Psycholog. Med., 25(5): 887-894. Lund, J.S. and Lewis, D.A. (1993) Local circuit neurons of developing and mature macaque prefrontal cortex: Golgi and immunocytochemical characteristics. J. Comparative Neurology, 328: 282-312. Montague, P.R., Dayan, P. and Sejnowski, T.J. (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. 
Neurosci., 16(5): 1936-1947. Morice, R. and Delahunty, A. (1996) Frontal executive impairments in schizophrenia. Schizophr. Bull., 22(1): 125-137. Morris, R.G., Ahmed, S., Syed, G.M. and Toone, B.K. (1993) Neural correlates of planning ability: frontal lobe activation during the Tower of London test. Neuropsychologia, 31(12): 1367-1378. Morris, R.G., Miotto, E.C., Feigenbaum, J.D., Bullock, P. and Polkey, C.E. (1997) The effect of goal-subgoal conflict on planning ability after frontal- and temporal-lobe lesions in humans. Neuropsychologia, 35(8): 1147-1157. Mulder, A.B., Arts, M.P.M. and daSilva, F.H.L. (1997) Short- and long-term plasticity of the hippocampus to nucleus
accumbens and prefrontal cortex pathways in the rat, in vivo. Euro. J. Neurosci., 9(8): 1603-1611. O’Donnell, P. and Grace, A.A. (1995) Synaptic interactions among excitatory afferents to nucleus accumbens neurons - hippocampal gating of prefrontal cortical input. J. Neurosci., 15(5): 3622-3639. Okubo, Y., Suhara, T., Suzuki, K., Kobayashi, K., Inoue, O., Terasaki, O., Someya, Y., Sassa, T., Sudo, Y., Matsushima, E., Iyo, M., Tateno, Y. and Yoru, M. (1997) Decreased prefrontal dopamine D1 receptors in schizophrenia revealed by PET. Nature, 385: 634-636. Owen, A.M. (1997) Cognitive planning in humans: neuropsychological, neuroanatomical and neuropharmacological perspectives. Prog. Neurobiol., 53: 431-450. Pantelis, C., Barnes, T.R.E., Nelson, H.E., Tanner, S., Weatherley, L., Owen, A.M. and Robbins, T.W. (1997) Frontal-striatal cognitive deficits in patients with chronic schizophrenia. Brain, 120: 1823-1843. Passingham, R.E. (1993) The Frontal Lobes and Voluntary Action. No. 21 in Oxford psychology series. Oxford University Press. Pennartz, C.M.A., Groenewegen, H.J. and Silva, F.H.L.D. (1994) The nucleus accumbens as a complex of functionally distinct neuronal ensembles: an integration of behavioral, electrophysiological and anatomical data. Prog. Neurobiol., 42: 719-761. Rezai, K., Andreasen, N., Alliger, R., Cohen, G., 11, V.S. and O’Leary, D.S. (1993) The neuropsychology of the prefrontal cortex. Arch. Neurol., 50: 636-642. Silbersweig, D.A., Stern, E., Frith, C., Cahill, C., Holmes, A., Grootnook, S., Seaward, J., McKenna, P., Chua, S.E., Schnorr, L., Jones, T. and Frackowiak, R.S.J. (1995) A functional neuroanatomy of hallucinations in schizophrenia. Nature, 378: 176-179. Smiley, J.F., Levey, A.I., Ciliax, B.J. and Goldman-Rakic, P.S. (1994) D1 dopamine-receptor immunoreactivity in human and monkey cerebral-cortex - predominant and extrasynaptic localization in dendritic spines. 
Proceedings of the National Academy of Sciences of the United States of America, 91(12): 5720-5724. Stevens, K.E., Kem, W.R., Mahmir, V.M. and Freedman, R. (1998) Selective α-nicotinic agonists normalize inhibition of auditory response in DBA mice. Psychopharmacology, 136: 320-327. Svensson, T.H., Mathe, J.M., Anderson, J.L., Nomikos, G.G., Hildebrand, B.E. and Marcus, M. (1995) Mode of action of atypical neuroleptics in relation to the phencyclidine model of schizophrenia - role of 5-HT2 receptor and alpha(1)-adrenoceptor antagonism. J. Clin. Psychopharmacol., 15(1 S1): S11-S18. van Ooyen, A. and van Pelt, J. (1994) Activity-dependent neurite outgrowth and neural network development. Prog. Brain Res., 102: 245-259. Ward, G. and Allport, A. (1997) Planning and problem-solving using the five-disc Tower of London task. Quart. J. Exp. Psychol., 50A(1): 49-78.
Williams, G.V. and Goldman-Rakic, P.S. (1995) Modulation of memory fields by dopamine D1 receptors in prefrontal cortex. Nature, 376: 572-575. Wilson, F.A.W., O’Scalaidhe, S.P. and Goldman-Rakic, P.S. (1994) Functional synergism between putative γ-aminobutyrate-containing neurons and pyramidal neurons in prefrontal cortex. Proceedings of the National Academy of Sciences of the United States of America, 91(9): 4009-4013. Wolkin, A., Sanfilipo, M., Wolf, A.P., Angrist, B., Brodie, J.D. and Rotrosen, J. (1992) Negative symptoms and hypofrontality in chronic schizophrenia. Arch. Gen. Psychiatry, 49: 959-965. Yang, C.R. and Seamans, J.K. (1996) Dopamine D1 receptor actions in layers V-VI rat prefrontal cortex neurons in vitro: modulation of dendritic-somatic signal integration. J. Neurosci., 16(5): 1922-1935. Yang, C.R., Seamans, J.K. and Gorelova, N. (1996) Electrophysiological and morphological properties of layers V-VI principal pyramidal cells in rat prefrontal cortex in vitro. J. Neurosci., 16(5): 1904-1921.
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 21
Neural models of normal and abnormal behavior: what do schizophrenia, parkinsonism, attention deficit disorder, and depression have in common?
Stephen Grossberg
Department of Cognitive and Neural Systems and Center for Adaptive Systems, Boston University, 677 Beacon Street, Boston, MA 02215, USA
Introduction: mental disorders as emergent properties
In order to mechanistically explain how the brain gives rise to mental disorders, several problems need to be solved. First, one needs models of how the brain gives rise to the normal behaviors that are damaged or eliminated during the mental disorder. Then one needs to provide a clear description of which normal mechanisms are changed during the mental disorder. Many mental disorders may be traced to chemical or electrical imbalances of one sort or another. Knowing which chemical or electrical imbalance is involved is an important necessary condition for understanding the disorder, but it is far from sufficient. In addition, both for theoretical understanding and for informed clinical intervention, one needs to explain how each local lesion gives rise to the behavioral symptoms that characterize the disorder. Typically, such symptoms are emergent properties due to the interactions of many cells across the brain. Often these interactions involve non-linear cell properties and feedback between cells operating on multiple spatial and temporal scales. Thus, when one says that an imbalance of dopamine metabolism ‘causes’ schizophrenia or other mental disorders, such a statement may provide little insight into how this imbalance gives rise to the symptoms of
schizophrenia, despite its great value in guiding the search for clinically effective drugs. A theoretical method is therefore needed that is strong enough to explain how local lesions give rise to behaviorally relevant emergent properties across such a complex system. Even in cases where a malfunction may be localized within a particular neural subsystem, this subsystem’s interactions with several other subsystems may be disrupted. The other subsystems may then also contribute to abnormal symptoms. For example, if two subsystems are mutually inhibitory and one subsystem becomes abnormally hyporeactive, then the other subsystem may become abnormally hyperreactive, even though its own cells would otherwise perform normally. In such a situation, it may be difficult to decide, just by looking at small subsets of biochemical or neurophysiological data, whether the syndrome is due to hyporeactivity or to hyperreactivity. Due to the complementary reactions of the two subsystems, some measures of total system performance may show no effect or conflicting effects across subjects in whom the degree of imbalance varies. The present chapter discusses how some key symptoms of schizophrenia, Parkinson’s disease, attention deficit disorder, and depression can be caused as emergent properties of neural models of
behavior when their cellular parameters are perturbed out of the ‘normal’ range. These examples do not purport to be complete explanations of the symptoms in question, but they do show how such a model can help to close the gap between brain and behavior. In particular, several behavioral properties of these models simultaneously covary in a manner that reflects clinical data when one of its cellular parameters is varied. Other behavioral properties serve as model predictions to further test the proposed explanation. Finally, the normal functioning of the model clarifies the transition between normal and abnormal behavioral states. Each of these model circuits will be presented in its simplest possible form. For example, some properties are more easily demonstrated using excitatory transmitter substances, whereas they may be realized in vivo using a cascade of inhibitory transmitters that exert a disinhibitory action.
The golden mean: opponent processing, inverted-U, and rebound
The models in question generate abnormal behavioral properties when they experience an abnormally low or high arousal level. Such a change may be due to a number of different factors. One key type of model involves an opponent processing circuit that has been called a gated dipole (Grossberg, 1972a,b, 1980). Opponent processing can influence motivated behavior. Here the opponent channels may control such opposed motivational factors as fear and relief (Estes and Skinner, 1941; Denny, 1971). Opponent processing also influences perceptual processing, where it may control opponent colors such as red and green, or perpendicular orientations such as vertical and horizontal, or opposite directions such as up or down (Helmholtz, 1866, 1962; Brown, 1965; Sekuler, 1975). Motor behaviors may also have an opponent organization, as illustrated by GO and STOP signals for gating the onset or offset of motor actions (Horak and Anderson, 1984a,b), and the opponent organization and control of flexor and extensor muscle groups. Why is opponent processing so ubiquitous in the nervous system? It has been proposed (e.g. Grossberg, 1980, 1982c) that opponent processing helps
the brain to self-organize its neural circuits in a self-stabilizing way, both during childhood development and adult learning; that is, in a way that develops and learns neural circuits that match the statistics of the environment, and that dynamically buffers these circuits against catastrophic reorganization by irrelevant environmental fluctuations. Two key properties of these opponent processing circuits are their inverted-U and reset properties. The inverted-U property (Fig. 1) enables a gated dipole circuit to maintain a type of Golden Mean in response to the circuit’s arousal level. The concept of arousal level, as here used, needs to be carefully defined, because several different mechanisms can all change the arousal level, from a functional point of view, without seeming to be arousal-specific mechanisms. This Golden Mean says that circuit sensitivity to input fluctuations is optimal at moderate arousal levels, but degrades in different ways when the circuit is either underaroused or overaroused. These properties are mathematically
[Figure 1: "Golden Mean: inverted-U as a function of arousal". Horizontal axis: arousal. Labels: Underaroused Depression (elevated threshold; hyperexcitable above threshold) and Overaroused Depression (low threshold; hypoexcitable above threshold). Annotation: "UP" brings excitability "DOWN".]
Fig. 1. Gated dipole opponent processes exhibit an Inverted-U as a function of arousal level, with underaroused and overaroused depressive syndromes at the two ends of the Inverted-U.
proved in the Appendix. The main text will describe them heuristically. Such inverted-U properties are well known to occur in behavior. For example, D-amphetamine sulfate activates feeding in an anorectic cat at the same dose that totally inhibits feeding in a normal cat (Wolgin et al., 1976). In normal cats, smaller amounts of norepinephrine can have effects opposite to those of larger amounts (Leibowitz, 1974). In like manner, amphetamine augments slow behavior and depresses fast behavior (Dews, 1958). In humans, dopamine pharmacological manipulations have shown that the relation of dopamine activity to reaction-time performance is an inverted-U function (Zuckerman, 1984; Netter and Rammsayer, 1991; Rammsayer, Netter, and Vogel, 1993). Subjects high on extraversion and sensation seeking scales show impaired task performance if a dopamine agonist is applied, and improved performance if an antagonist is applied. The opposite pattern was found in subjects low in these traits. Inverted-Us have also been reported in event-related potentials, such as the contingent negative variation, or CNV (Tecce and Cole, 1974). In the analysis below of mental disorders, the assumption will be made, and supportive data cited, that various circuits are far off their optimal level of arousal, and are seriously underaroused or overaroused. The reset property involves a process of antagonistic rebound that can be triggered in at least two ways. Both ways enable the circuit to shut off currently active cells, disconfirm their processing by transiently activating their opponent channels, and thereby restore the balance between opponent channels in order to process subsequent inputs with as little bias as possible. The two ways of causing an antagonistic rebound are a sudden decrease of a previously sustained input to one channel of the opponent circuit (see Fig. 2), and a sudden increment of arousal to both channels.
A sudden decrease of input to one channel (say, the ‘fear’ channel, or the ‘red’ channel) can lead to a transient activation, or antagonistic rebound of activity, in the opponent channel (say, the ‘relief’ channel, or the ‘green’ channel). Antagonistic rebound has many functional uses. For example, when there is a sudden decrease of fearful cues in
a given situation, then both classical conditioning and instrumental conditioning mechanisms can use the relief rebound as a source of positive motivation with which to learn the sensory-motor contingencies that led to the reduction of fear (cf. Masterson, 1970; Reynierse and Rizley, 1970; Denny, 1971; McAllister and McAllister, 1971a,b; Grossberg, 1982b, 1984b, 1987b). Likewise, if hypothalamic stimulation elicits a given behavior, then its offset can transiently elicit an opposite behavior (Cox et al., 1969; Valenstein et al., 1969). A sudden increment in arousal may be due to an unexpected event (Renault and Lesèvre, 1978; Näätänen et al., 1982; Näätänen and Gaillard, 1983; Banquet and Grossberg, 1987; Grossberg, 1984a, 1987). Such a rebound can disconfirm and reset ongoing sensory, cognitive, motivational, and motor processing in order to enable the brain to
[Figure 2: circuit diagram of the feedforward gated dipole, showing a sustained ON-response and a transient OFF-response, with tonic arousal input I and phasic input J. The remaining node labels are not recoverable from the extracted text.]
Fig. 2. The simplest feedforward gated dipole circuit. In response to a phasic input J and tonic arousal input I, the dipole generates an ON-response and a transient antagonistic OFF-rebound due to the action of the habituative transmitter gates within the square synapses. See text and Appendix for details.
better process the unexpected information and deal with it adaptively.
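The two rebound triggers described above can be illustrated with a minimal numerical sketch. It is not the chapter's own implementation. It assumes a gate law in which each transmitter z accumulates toward a level B at rate A and is inactivated at rate C times the gated signal, and it assumes a linear signal function; the names A, B, C, dt and both protocol functions are hypothetical choices made for this illustration. The chapter's Appendix remains the authoritative treatment.

```python
def simulate_dipole(I, J, dt=0.01, A=0.1, B=1.0, C=1.0):
    """Euler-integrate a feedforward gated dipole (illustrative sketch).

    I, J: equal-length lists holding the arousal and phasic input at each step.
    Returns (on, off), the rectified outputs of the ON and OFF channels.
    """
    def f(w):
        return max(w, 0.0)  # assumed linear signal function

    # Start both transmitters at equilibrium for the initial inputs.
    z_on = A * B / (A + C * f(I[0] + J[0]))
    z_off = A * B / (A + C * f(I[0]))
    on, off = [], []
    for i_t, j_t in zip(I, J):
        s_on, s_off = f(i_t + j_t), f(i_t)        # arousal (plus phasic input) through f
        t_on, t_off = s_on * z_on, s_off * z_off  # habituative gating
        on.append(max(t_on - t_off, 0.0))         # competition, then rectification
        off.append(max(t_off - t_on, 0.0))
        # Assumed gate law: accumulate toward B at rate A, inactivate at rate C*S*z.
        z_on += dt * (A * (B - z_on) - C * s_on * z_on)
        z_off += dt * (A * (B - z_off) - C * s_off * z_off)
    return on, off


def input_offset_protocol():
    """Constant arousal; the phasic input J switches on, then off."""
    I = [0.5] * 12000
    J = [0.0] * 2000 + [1.0] * 5000 + [0.0] * 5000
    return simulate_dipole(I, J)


def arousal_increment_protocol():
    """Constant phasic input; arousal I jumps from 0.5 to 1.5 halfway through."""
    I = [0.5] * 5000 + [1.5] * 5000
    J = [1.0] * 10000
    return simulate_dipole(I, J)


if __name__ == "__main__":
    on, off = input_offset_protocol()
    print("ON output just before J shuts off:", on[6999])
    print("peak OFF rebound after J shuts off:", max(off[7000:]))
    on2, off2 = arousal_increment_protocol()
    print("OFF rebound at the arousal jump:", off2[5000])
```

In the first protocol the OFF channel is silent until J shuts off, then responds transiently because the ON transmitter is still depleted while both channels receive the same arousal. In the second, the OFF channel rebounds even though J stays on. Under these assumed equations, working through the algebra suggests that the instantaneous arousal-triggered rebound is positive only when C times the arousal increment exceeds A, so a small increment would not reset the circuit.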
Underaroused and overaroused depressive syndromes
The mechanisms that enable a gated dipole to achieve these useful emergent properties at normal arousal levels also generate clinically relevant properties when the arousal level is chosen too low or too high. As noted in Fig. 1, an underaroused gated dipole generates a syndrome of Underaroused Depression. Here, due to an abnormally small arousal level, inputs must be larger than normal in order to overcome the dipole’s increased response threshold. Paradoxically, once inputs are chosen large enough to overcome this threshold, then the circuit is hyperexcitable above threshold, meaning that the dipole generates abnormally large outputs in response to additional input increments. This is paradoxical because a naive view might conclude that an elevated threshold would make the circuit less, rather than more, excitable. Because such a circuit is hyperexcitable at low arousal levels, its excitability can be brought into the normal range by increasing its arousal until it reaches the peak of the Inverted-U. Here, the threshold is lower, but the network’s excitability is also lower. These properties clarify the paradoxical fact that an arousing drug can make some patients less excitable. This fact is more completely discussed below in terms of how amphetamines help attention deficit disorder, or juvenile hyperactivity, patients (Swanson and Kinsbourne, 1976; Weiss and Hechtmann, 1979) and L-dopa helps Parkinson’s patients (Riklan, 1973). It is not, however, the case that unlimited increases in arousal will make a dipole behave more normally. Too much arousal generates a syndrome of Overaroused Depression. Here the extra arousal causes the response threshold to be very low. Paradoxically, however, the circuit is hypoexcitable above this low threshold, so that it generates small responses, at best, to inputs of arbitrary size.
Thus ‘too much of a good thing’, such as amphetamine or L-dopa for the patients mentioned above, can create a new, and complementary, problem to the one for which they are
being treated. For example, large doses of amphetamine and L-dopa can cause a psychosis reminiscent of schizophrenia (Riklan, 1973; Wallach, 1974), although L-dopa has also been reported to improve negative schizophrenic symptoms (Gerlach and Luhdorf, 1975; Albert and Rush, 1983). These arousal problems are discussed below in terms of negative symptoms such as flat affect in schizophrenia, and how it may lead to other schizophrenic symptoms through its interactions with other brain processes. In the opposite direction, antipsychotic drugs that block dopamine receptors (Kuhar et al., 1978) can, in sufficient quantities, produce a catalepsy suggestive of Parkinson’s disease (Hornykiewicz, 1975).
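The dependence of dipole sensitivity on arousal can be checked with a short steady-state calculation. The sketch below is illustrative only and rests on assumptions that are not taken from the chapter: a sigmoid signal function of the form w^2 / (K + w^2), a transmitter that equilibrates at z = AB / (A + CS) for a signal S, and the parameter values shown. It computes the rectified ON output produced by the same small phasic input J at low, moderate, and high arousal I.

```python
def sigmoid(w, K=1.0):
    """Assumed sigmoid signal function f(w) = w^2 / (K + w^2) for w > 0."""
    return w * w / (K + w * w) if w > 0.0 else 0.0


def gated_equilibrium(S, A=1.0, B=1.0, C=1.0):
    """Steady-state gated signal S*z, with z = A*B / (A + C*S) (assumed gate law)."""
    return S * A * B / (A + C * S)


def steady_on_response(I, J):
    """Rectified steady-state ON output for arousal I and phasic input J."""
    t_on = gated_equilibrium(sigmoid(I + J))   # ON channel receives I + J
    t_off = gated_equilibrium(sigmoid(I))      # OFF channel receives I only
    return max(t_on - t_off, 0.0)


if __name__ == "__main__":
    J = 0.1  # small, fixed phasic input
    for I in (0.05, 0.5, 5.0):
        print(f"arousal I={I}: ON response {steady_on_response(I, J):.4f}")
```

With these assumed settings the response to the fixed input is largest at the moderate arousal level and smaller at both extremes, which is the inverted-U shape in sensitivity. The sketch does not attempt to reproduce the threshold and excitability changes of the two depressive syndromes; those depend on the analysis in the Appendix.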
The whole is greater than the sum of its parts: arousal, transmitters, signals, competition, and thresholds are the parts
The paradoxical emergent properties of a gated dipole are due to five basic mechanisms acting together (Grossberg, 1972b, 1980, 1984a,b). Figure 2 depicts the simplest example of a circuit that realizes these mechanisms. The two dipole channels are called the ON and OFF channels in the subsequent exposition. The ON channel is turned on by a phasic input, denoted by J in Fig. 2; the OFF channel registers the antagonistic rebound that occurs when the phasic input to the ON channel shuts off. The five mechanisms are: (1) a source of nonspecific arousal, denoted by I in Fig. 2, energizes both channels of the dipole; (2) a nonlinear signal function, denoted by f in Fig. 2, transduces the sum of phasic and arousal inputs to each channel; (3) a habituative transmitter substance multiplies, or gates, the nonlinear signals from both channels; (4) the gated signals compete via an on-center off-surround network; and (5) the net signal after competition is half-wave rectified, or thresholded, before generating an output from the network. The key mechanism that governs dipole dynamics is the habituative transmitter gate. This mechanism varies on a slower time scale than the rate with which input signals fluctuate, and thereby provides the ‘memory’ that calibrates the size of the antagonistic rebound. This transmitter process
was originally derived as the minimal mechanism whereby data about associative learning could be explained using a chemical transmitter capable of generating an unbiased signal from one cell to another (Grossberg, 1968, 1969a). Transmitter habituation occurs when the recovery rate of the transmitter falls behind the input fluctuation rate. Similar model mechanisms have been used to explain how the relative timing of paired presynaptic and postsynaptic signals can influence the strength of the adaptive weights, or associations, that link presynaptic to postsynaptic cells (Markram et al., 1997). Other early work (e.g. Grossberg, 1972a,b, 1980, 1982b,c, 1984a,b; Grossberg and Gutowski, 1987) has explained many data about normal cognitive and emotional processing using habituative chemical transmitters operating in a gated dipole within a larger network architecture. It is because so many data about normal cognitive-emotional behaviors have been clarified by these circuits that their ability to map onto clinical properties in their underaroused and overaroused regimes takes on such potential significance. Figure 3 illustrates how, in response to a changing input signal S(t), a habituative transmitter z(t) can gradually equilibrate to the signal’s more rapidly changing amplitudes. In particular, higher input amplitudes lead to lower levels of transmitter. The transmitter z(t) multiplies, or gates, the input S(t) to generate a gated output signal T(t) = S(t)z(t). Due to this gating process, monotonic changes in input amplitude S(t) cause overshoots and undershoots in the gated output T(t), before the transmitter gradually equilibrates, or habituates, to the new input level. In this simplified model, the transmitter accumulates to a fixed equilibrium concentration at a constant rate, and is inactivated, or habituates, at a rate proportional to T(t).
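The verbal description of the simplified transmitter law can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes the law takes the form dz/dt = A(B - z) - C S(t) z(t), that is, accumulation toward a fixed level B at constant rate A and inactivation proportional to the gated signal T = Sz. The parameter values, the step-shaped input, and the function names are hypothetical choices made only for illustration.

```python
def simulate_transmitter(signal, dt=0.01, A=0.1, B=1.0, C=1.0):
    """Euler-integrate dz/dt = A*(B - z) - C*S*z; return the z(t) and T(t) traces.

    A, B, C and dt are assumed illustrative values, not values from the text.
    """
    # Start at the equilibrium transmitter level for the first input value.
    z = A * B / (A + C * signal[0])
    z_trace, t_trace = [], []
    for s in signal:
        z_trace.append(z)
        t_trace.append(s * z)  # gated output T = S * z
        z += dt * (A * (B - z) - C * s * z)  # slow accumulation minus inactivation
    return z_trace, t_trace


def step_signal(low=1.0, high=2.0, n=5000):
    """Input that sits at `low`, steps up to `high`, then steps back down."""
    return [low] * n + [high] * n + [low] * n


if __name__ == "__main__":
    n = 5000
    z_trace, t_trace = simulate_transmitter(step_signal(n=n))
    print("T before step up:      %.4f" % t_trace[n - 1])
    print("T just after step up:  %.4f" % t_trace[n])
    print("T after habituation:   %.4f" % t_trace[2 * n - 1])
    print("T just after step down: %.4f" % t_trace[2 * n])
    print("T after recovery:      %.4f" % t_trace[-1])
```

Under these assumptions, a step increase in S makes T jump and then habituate as z falls, while a step decrease makes T dip and then recover as z rises, which is the qualitative overshoot and undershoot pattern described for Fig. 3.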
Figure 2 shows what happens when such a habituative transmitter is embedded within a gated dipole, notably in the dipole’s square synapses. Here the transmitter in the ON channel gates the sum of the phasic input J and the tonic arousal input I after they are transformed by the signal function f (variable x1 in Fig. 2). As a result, just as in Fig. 3, increases and decreases in the total input I + J are transformed by the habituative transmitter into overshoots and undershoots of the gated signal (variable x3 in Fig. 2). When these overshoots and undershoots are processed by the on-center off-surround network and the output threshold, a habituative response in the ON channel (variable ON in Fig. 2) and a transient antagonistic rebound in the OFF channel (variable OFF in Fig. 2) are produced. The antagonistic rebound is energized by the arousal input I, which can activate the OFF channel even after the phasic input J to the ON channel shuts off. The effects of the phasic input J depend upon interactions among all five mechanisms that define the dipole. These interactions determine the arousal levels I at which the dipole is in the mid-point of its Inverted-U or at an underaroused or overaroused extreme. Perhaps the most recent experimental evidence for such habituative transmitters in the brain has been reported by Abbott et al. (1997), who have used the same law to explain their data about ‘depressing synapses’ in the visual cortex. Their data exhibit many of the habituative transmitter properties that had previously been used to model other vision data (e.g. Grossberg, 1980, 1987a; Carpenter and Grossberg, 1981; Ogmen and Gagné, 1990; Ogmen, 1993; Francis et al., 1994; Francis and Grossberg, 1996a,b).
Fig. 3. In response to rapid increases and decreases in input amplitude S(t), the habituative transmitter z(t) decreases and increases in the reverse direction. The gated output signal T(t) = S(t)z(t) is the product of these fast and slow reactions, and generates overshoots and undershoots to the changes in S(t), followed by habituation to an intermediate level.
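The way the five mechanisms combine to produce an ON response and an OFF rebound can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the published model: the signal function f is taken to be linear (the text calls for a nonlinear one), cell activations are assumed to track their inputs instantly so that only the two transmitters are integrated, and the transmitter law dz/dt = A(B - z) - f z, the parameter values, and all names are hypothetical.

```python
def gated_dipole(j_trace, I=1.0, dt=0.01, A=0.1, B=1.0, f=lambda w: w):
    """Return rectified ON and OFF outputs for a phasic input trace J(t).

    Assumptions (for illustration only): linear signal f, instantaneous
    activations, transmitter law dz/dt = A*(B - z) - f(input)*z.
    """
    # Both transmitters start equilibrated to the arousal input alone.
    z_on = z_off = A * B / (A + f(I))
    on_out, off_out = [], []
    for J in j_trace:
        s_on, s_off = f(I + J), f(I)                 # (1) arousal + phasic input, (2) signal function
        g_on, g_off = s_on * z_on, s_off * z_off     # (3) habituative transmitter gates
        on_out.append(max(g_on - g_off, 0.0))        # (4) opponent competition, (5) rectification
        off_out.append(max(g_off - g_on, 0.0))
        z_on += dt * (A * (B - z_on) - s_on * z_on)      # slow habituation and recovery
        z_off += dt * (A * (B - z_off) - s_off * z_off)
    return on_out, off_out


if __name__ == "__main__":
    n = 5000
    j_trace = [0.0] * n + [1.0] * n + [0.0] * n  # J off, on, off
    on_out, off_out = gated_dipole(j_trace)
    print("ON at J onset:        %.4f" % on_out[n])
    print("ON after habituation: %.4f" % on_out[2 * n - 1])
    print("OFF rebound at offset: %.4f" % off_out[2 * n])
    print("OFF long after offset: %.6f" % off_out[-1])
```

In this sketch the rebound arises because the ON transmitter is still depleted when J shuts off, so the arousal input I alone drives a larger gated signal through the OFF channel until the ON transmitter recovers.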
Recurrent opponent processes can actively modulate associative learning
Opponent processes often include feedback pathways within and between their ON and OFF channels, as illustrated in Fig. 4. These feedback pathways realize a type of short-term memory whereby the opponent process maintains a steady operating level against sufficiently small environmental perturbations, and switches between different pathways when a sufficiently large environmental change occurs. For example, a motivational dipole can hereby maintain a steady level of motivation during the performance of a consummatory act, and cannot be reset by insignificant environmental distractions.
Fig. 4. A READ circuit: This circuit joins together a recurrent gated dipole with an associative learning mechanism. Learning is driven by signals from sensory representations S which activate associative weights that learn activation levels within the ON-channel and OFF-channel of the dipole.
Feedback pathways can also regulate how learning occurs by ensuring that associative synapses which input to the opponent circuit respond only to the circuit’s net activity after opponent competition takes place. In this way, the circuit can dissociate the read-out of previously learned associations from the read-in of new associations. Read-in senses only the competitive ‘decision’ that is made by the opponent interactions. This is called the associative dissociation property. Grossberg and Schmajuk (1987) have modeled how recurrent opponent processing circuits may operate during reinforcement learning. Here the opponent channels represent opponent drive states, such as fear and relief. The name ‘READ circuit’ was used to describe this circuit because it is a REcurrent Associative Dipole that combines opponent processing with associative learning (Fig. 4). The READ circuit enables the associative dissociation property to be realized in a simple way, as noted below. It hereby realizes several useful properties: learning can go on indefinitely without saturating the associative memories; these memories can persist until they are disconfirmed by unexpected events; and network learning is buffered against noise. Dissociation is realized by placing the associative synapses on dendritic spines. Read-out of old associative values occurs at the dendritic spines and then propagates towards the cell body where they may or may not succeed in generating cell firing. Those cells which do fire and whose firing survives opponent competition may fire long enough to trigger retrograde dendritic spikes that invade the dendritic spines. Here they can drive read-in of new associative values at those synapses which are receiving concurrent presynaptic signals.
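The read-in side of associative dissociation can be illustrated with a toy learning rule. The sketch below is an assumption made for illustration and is not the READ circuit equations of Grossberg and Schmajuk (1987): each weight is gated by its presynaptic signal S and tracks only the rectified output that survives opponent competition. The function name, the learning rate, and the step counts are hypothetical.

```python
def read_in(weights, S, on_total, off_total, rate=0.1, dt=0.1, steps=1000):
    """Update (w_on, w_off) toward the post-competition channel outputs.

    Assumed rule (illustrative only): dw/dt = rate * S * (output - w), where
    `output` is the rectified activity remaining after ON/OFF competition.
    """
    w_on, w_off = weights
    out_on = max(on_total - off_total, 0.0)   # what survives competition in the ON channel
    out_off = max(off_total - on_total, 0.0)  # what survives competition in the OFF channel
    for _ in range(steps):
        w_on += dt * rate * S * (out_on - w_on)     # no change when S == 0
        w_off += dt * rate * S * (out_off - w_off)
    return w_on, w_off


if __name__ == "__main__":
    # The ON channel wins the competition: only the net difference is learned.
    learned = read_in((0.0, 0.0), S=1.0, on_total=0.8, off_total=0.3)
    print("after learning:", learned)
    # With the sensory signal off, the stored weights persist unchanged.
    print("without presynaptic signal:", read_in(learned, S=0.0, on_total=0.0, off_total=0.9))
```

Under this assumed rule the weights cannot grow beyond the size of the competitive outcome, and they do not change while the presynaptic signal is silent, which loosely mirrors the non-saturation and persistence properties listed above.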
This use of retrograde dendritic spikes from the cell body to dendritic spines as a way to dissociate associative read-out from read-in has been a recurrent theme during the development of the model and its extensions (e.g., Grossberg, 1975, 1987b, p. 38; Grossberg and Schmajuk, 1987). Remarkable experimental progress has recently been made towards characterizing how retrograde dendritic spikes can influence associative learning at dendritic spine sites (Johnston et al., 1996; Magee and Johnston, 1997; Markram et al., 1997; Koester and Sakmann, 1998). It still remains to test whether or not these spikes are used to achieve associative dissociation. One type of direct test would monitor such retrograde spikes and their associative consequences during two conditions. In both conditions, the inputs to the recorded dendritic apparatus would be the same. In one condition, its cell body would ‘win’ the recurrent competition with a nearby stimulated cell; in the other, it would lose the competition. Learning should be greater in the former case.
Control of attention and action by cognitive-emotional interactions
READ circuits and their variants may be embedded within model representations of cognitive, emotional, and motor processes. Interactions among these representations lead to emergent properties that resemble symptoms of mental disorders when the dipoles are underaroused or overaroused. Some of these interactions are schematized in Fig. 5 in their simplest form. Circuits of this type will be called CogEM circuits henceforth in order to abbreviate their Cognitive, Emotional, and Motor interactions. CogEM models have undergone progressive development to explain ever more behavioral and neural data about normal cognitive-emotional interactions, including reinforcement learning and attention (e.g., Grossberg, 1971, 1972a,b, 1982a,b; Grossberg and Gutowski, 1987; Grossberg and Levine, 1987; Grossberg and Schmajuk, 1987; Buonomano et al., 1990; Grossberg and Merrill, 1992, 1996).
Fig. 5. The simplest CogEM circuit. At least three types of representations (sensory S, drive D, and motor M) interact to control cognitive-emotional interactions. At least three types of learning (conditioned reinforcer, incentive motivational, and motor learning) connect these representations. Conditioned stimuli (CS) activate the S representations, which compete among themselves for limited-capacity short-term memory activation and storage. The activated S representations elicit conditionable signals to drive representations and motor representations. See text for details.
The CogEM architecture may be derived from simple and broadly accepted hypotheses about how associative learning occurs (see Grossberg 1982b). Figure 5 summarizes the hypothesis that (at least) three types of internal representation interact during reinforcement learning: sensory and cognitive representations S, drive representations D, and motor representations M. The S representations are thalamocortical representations of external events, including the object recognition categories that are learned by inferotemporal and prefrontal cortical interactions (Ungerleider and Mishkin, 1982; Mishkin et al., 1983; Desimone, 1991; Gochin et al., 1991; Harries and Perrett, 1991). The D representations include hypothalamic and amygdala circuits at which homeostatic and reinforcing cues converge to generate emotional reactions and motivational decisions (Halgren et al., 1978; Bower, 1981; Gloor et al., 1982; Aggleton, 1993; LeDoux, 1993; Davis, 1994). The M representations include cortical and cerebellar circuits that control discrete adaptive responses (Evarts, 1973; Ito, 1984; Thompson, 1988; Kalaska et al., 1989). More complete models of the internal structure of these several types of representations are developed elsewhere (e.g., Grossberg, 1987b; Grossberg and Schmajuk, 1987; Carpenter and Grossberg, 1994; Fiala et al., 1996; Grossberg and Merrill, 1996; Contreras-Vidal et al., 1997; Bullock et al., 1998). Even the model in its simplest form has successfully learned to control motivated behaviors in mobile robots (e.g., Baloch and Waxman, 1991; Gaudiano et al., 1996; Gaudiano and Chang, 1997; Chang and Gaudiano, 1998).
Three types of learning take place among these representations: The S → D conditioned reinforcer learning converts a conditioned stimulus (CS) into a reinforcer by pairing activation of its sensory representation S with activation of the drive representation D, where representation D is activated by an unconditioned stimulus (US) or other previously conditioned reinforcer CSs. The D → S incentive motivational learning enables an activated drive representation D to prime, or modulate, the sensory representations S of all cues, including the CSs, that have consistently been correlated with it. Activating D hereby generates a ‘motivational set’ by priming all of the sensory and cognitive representations that have been associated with that drive in the past. These incentive motivational signals are a type of motivationally-biased attention. The S → M motor, or habit, learning enables the sensorimotor maps, vectors, and gains that are involved in motor control to be adaptively calibrated. These processes control the learning, recognition, and recall of sensory and cognitive memories (‘declarative memory’; Mishkin, 1982, 1983; Squire and Cohen, 1984) and the performance of learned motor skills (‘procedural memory’; Gilbert and Thach, 1977; Ito, 1984; Thompson, 1988). In particular, learned S → D → S positive feedback quickly draws attention to motivationally salient cues by amplifying the activation of their sensory representations. The sensory representations use recurrent interactions to store these activities in short-term, or working, memory (Baddeley, 1986). This is accomplished by linking the sensory representations by a recurrent on-center off-surround network, whereby cells excite themselves and possibly their immediate neighbors, and inhibit a wider range of cells, possibly including themselves (Fig. 5). Such a network enables the sensory representations to store activities that retain their sensitivity to the relative sizes of their inputs, while also tending to conserve, or normalize, the total activity among the representations (Grossberg, 1973, 1978a,b; Bradski et al., 1994).
This activity normalization property realizes the limited capacity of short-term memory, since when one sensory representation gets very active, the representations with which it competes are forced to become less active.
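The normalization property can be made concrete with the simplest feedforward shunting on-center off-surround network, used here as an assumed stand-in for the recurrent network described in the text. If each activity obeys dx_i/dt = -A x_i + (B - x_i) I_i - x_i (sum of the other inputs), its equilibrium is x_i = B I_i / (A + sum of all inputs). The parameter values and the function name below are hypothetical.

```python
def shunting_steady_state(inputs, A=1.0, B=1.0):
    """Equilibrium activities x_i = B * I_i / (A + sum(I)) of a feedforward
    shunting on-center off-surround network (an assumed simplification)."""
    total = sum(inputs)
    return [B * I / (A + total) for I in inputs]


if __name__ == "__main__":
    baseline = shunting_steady_state([1.0, 2.0, 3.0])
    boosted = shunting_steady_state([10.0, 2.0, 3.0])  # first input strongly amplified
    print("baseline activities:", baseline, "total:", sum(baseline))
    print("boosted activities: ", boosted, "total:", sum(boosted))
```

In this sketch the activities keep the ratios of their inputs while their total stays below B, so amplifying one input necessarily lowers the activities of its competitors, which is the limited-capacity behavior attributed to short-term memory above.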
Two types of distractability due to overaroused drive representations
Taken together, these learning and short-term memory mechanisms help to explain data (Pavlov, 1927; Kamin, 1968, 1969) about how attention can be focused on motivationally salient cues, and ‘blocked’ from being allocated to less salient or irrelevant cues: When the sensory representations S that categorize conditioned reinforcers are amplified by their strong S → D → S attentional feedback pathways, they can block activation of other S populations via S → S lateral inhibition. (Grossberg and Levine (1987) have presented model simulations of attentional blocking.) In a more elaborate version of the model, the drive representations are built up from motivational READ circuit dipoles that code such opponent drive states as fear and relief (Grossberg and Schmajuk, 1987). If these emotional dipoles are overaroused, for any number of reasons, then they cannot be effectively activated by their sensory and cognitive inputs. Their activities remain small no matter how large these inputs become. The result is flat affect, because the dipoles cannot generate a large emotional response. As a consequence of this reduced response, the dipoles cannot generate adequate incentive motivational feedback signals with which to activate motivationally compatible sensory representations (see Fig. 5). These sensory representations thus cannot successfully compete for attention based upon their motivational salience; nor can they control the release of motivationally appropriate responses. In the absence of motivationally directed attention, motivationally irrelevant cues can attract attention and generate inappropriate responses. In summary, overaroused drive representations can lead to flat affect and distractability. These problems may elicit additional cognitive problems as a result of the way in which the motivational circuits interact with other circuits in the brain. For example, attentional and consummatory circuits compete with orienting circuits for the control of behavior (Grossberg, 1980; Staddon, 1983), as schematized in Fig. 6.
This property realizes a competition between the circuits that process expected events and those that process unexpected events. The latter circuits help to incorporate the unexpected events, notably unfamiliar events, into the corpus of expected and familiar events through learning. The competition between attentional and orienting systems enables such learning to take place without forcing unselective forgetting of previously learned knowledge (Grossberg, 1980; Carpenter and Grossberg, 1991; Grossberg and Merrill, 1996).
Fig. 6. Competition occurs between consummatory and orienting circuits. Here it is realized between drive representations and orienting representations.
A simple example illustrates what this competition means: Suppose that you hear a sudden and unexpected loud noise to your right. The noise elicits a rapid orienting movement to look at the loud noise for further processing. On the other hand, suppose that you are trained to use the loud noise as a discriminative cue for pushing a button to receive a large monetary award. Then the orienting responses can be supplanted by consummatory responses such as button-pressing. When motivational dipoles in the drive representations D are overaroused, then they cannot adequately inhibit the orienting circuits with which they interact. As a result, overarousal can disinhibit orienting responses in response to any events that happen to occur, and cannot prevent these distracting responses from occurring when motivationally salient events are happening. Overarousal of opponent drive representations can hereby have multiple effects. It can cause flat affect by desensitizing emotional dipoles to emotionally charged events. It can cause distractability in at least two ways: Overarousal can reduce the incentive motivational signals that help to focus attention upon motivationally relevant events (motivational distractability), and it can disinhibit orienting reactions whereby irrelevant events can continually disrupt attentional processing (orienting distractability). Hyperreactive orienting reactions may hereby be generated by hyporeactive emotional reactions (Ellinwood and Kilbey, 1980).
The incentive motivational pathways in Fig. 5 have been interpreted as generators of Contingent Negative Variation, or CNV, event-related potentials, that covary with expectancy, decision (Walter et al., 1964), motivation (Irwin et al., 1966; Cant and Bickford, 1967), volition (McAdam et al., 1966), preparatory set (Low et al., 1966), and arousal (McAdam, 1969). The orienting reactions have been interpreted in terms of the N200 event-related potential, that has been linked to the processing of unexpected events (Renault and Lesevre, 1978; Banquet et al., 1981; Naatanen et al., 1982). The CogEM model hereby suggests how both the CNV and the N200 can become abnormal in an overaroused emotional syndrome. The model also suggests how the P300 event-related potential can become abnormal in this way. In particular, CogEM interactions have been embedded within a larger framework, called Adaptive Resonance Theory, or ART, which also models how recognition categories are learned and recognized (Grossberg, 1982b, 1984b). ART suggests how the orienting subsystem may reset short-term memory in response to unexpected events, and thereby drive a memory search for a better-matching recognition category with which to represent the unexpected event. This short-term memory reset event has been interpreted in terms of the P300 event-related potential (Grossberg, 1978a, 1982b, 1984b; Banquet and Grossberg, 1987). Grossberg and Merrill (1996) have summarized how the CogEM model can be embedded into an ART recognition learning model that also includes learning of adaptively-timed motivated attention and movement. This extension clarifies how the CNV becomes adaptively timed.
Polyvalent interactions between sensory, drive, and motor representations
Where in the brain do such interactions occur? In order to make this connection, the circuit in Fig. 5 needs to be expanded. In its present form, after a reinforcing cue activates its sensory representation S, it can activate a motor representation M even as it sends conditioned reinforcer signals to a drive representation D. Thus a motor response can be initiated before the sensory representation receives incentive motivational feedback to determine whether the sensory cue should generate a response at that time. For example, eating behavior could be initiated before the network could determine if it was hungry. Even in the circuit of Fig. 5, each drive representation D obeys a polyvalent constraint whereby it can generate incentive motivational output signals only if it gets a sufficiently large primary or conditioned reinforcer input at the same time that it gets a sufficiently large internal drive input. The internal drive input designates whether an internal drive, such as hunger, thirst, sex, etc. is high and in need of satisfaction. Different drive representations exist to represent these distinct internal homeostatic states. Due to the polyvalent constraint, an external cue cannot activate strong incentive motivation, and with it action, to satisfy a drive that is already satisfied. On the other hand, the circuit, as it stands, could trigger such an action even if incentive motivational support is not forthcoming. A way is needed to prevent the sensory representation from triggering an action until it gets incentive feedback from a motivationally-consistent drive representation. Figure 7 describes the minimal network in which this property can be achieved (Grossberg, 1971; Grossberg and Levine, 1987). In it, the sensory representation corresponding to a given cue is broken into two stages, or populations, rather than the single stage in Fig. 5. Presentation of a given cue, or CS, activates the first stage of its sensory representation. This activation is stored in short-term memory using the positive feedback pathways within the sensory representation. This stored activity gives rise to output signals to all the drive representations with which the sensory representation is linked, as well as to the second stage of its sensory representation. The second stage of its sensory representation obeys a polyvalent constraint; it cannot fire unless it receives converging signals from the first stage and from a drive representation. To see how this works, suppose that the first stage of the sensory representation has a strong connection to one or more drive representations, whether the prewired connection of a primary reinforcer or the learned connection of a conditioned reinforcer. If the sensory representation strongly activates a drive representation when the drive representation is receiving a sufficiently large drive input, then the polyvalent constraint of the drive representation is satisfied and the drive representation can fire. All the drive representations that are active at that time compete among themselves to allow the most active one (the one that represents the best combination of sensory and drive information at that moment) to fire. If the winning drive representation has a strong prewired or learned incentive motivational pathway to the second stage of the cue’s sensory representation, then the polyvalent constraint of the second stage is overcome, and the sensory representation can fire. Such firing can control the release of motivationally compatible actions. In summary, by making the final stages of both the sensory and the drive representations polyvalent, the S → M motor pathways are activated only if the S → D → S feedback pathway can get sufficiently activated.
Fig. 7. Each sensory representation possesses (at least) two stages with STM activities x_i1 and x_i2. A CS or US input activates its corresponding x_i1. Activation of x_i1 generates signals to x_i2 and conditioned reinforcer signals to D. In response to a conditioned reinforcer CS, conditioned incentive motivational signals from D activate the second stages x_i2, which deliver feedback signals to the corresponding first stages x_i1. Some first stages x_i1 are hereby amplified by conditionable feedback and block activation of other, less favored, sensory representations. Motor learning is elicited by sensory-motor signals from the winning x_i2 to the motor representations.
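The two polyvalent constraints described above can be summarized as a small gating computation. This is a schematic sketch under stated assumptions rather than the model's differential equations: signals are static numbers, a single shared threshold decides what counts as ‘sufficiently large’, competition among drives is reduced to picking the largest, and every name and value is hypothetical.

```python
def polyvalent(signal_a, signal_b, threshold=0.5):
    """A polyvalent cell fires only if BOTH converging inputs exceed threshold."""
    if signal_a > threshold and signal_b > threshold:
        return min(signal_a, signal_b)
    return 0.0


def two_stage_circuit(cue, reinforcer_w, drive_inputs, incentive_w, threshold=0.5):
    """Return (index of winning drive or None, whether the second sensory stage fires)."""
    x1 = cue  # first sensory stage, activated directly by the cue
    # Each drive needs a conditioned reinforcer signal AND an internal drive input.
    drives = [polyvalent(x1 * w, d, threshold) for w, d in zip(reinforcer_w, drive_inputs)]
    best = max(range(len(drives)), key=lambda k: drives[k])  # competition among drives
    winner = best if drives[best] > 0.0 else None
    incentive = incentive_w[winner] * drives[winner] if winner is not None else 0.0
    # The second stage needs first-stage input AND incentive motivational feedback.
    x2 = polyvalent(x1, incentive, threshold)
    return winner, x2 > 0.0


if __name__ == "__main__":
    food_cue = dict(cue=1.0, reinforcer_w=[1.0, 0.0], incentive_w=[1.0, 0.0])
    print("hungry:", two_stage_circuit(drive_inputs=[0.9, 0.1], **food_cue))
    print("sated: ", two_stage_circuit(drive_inputs=[0.1, 0.1], **food_cue))
```

In this toy setting the same food cue releases the second stage, and hence the motor pathway, only when the hunger drive input is high, which is the behavior the two-stage design of Fig. 7 is meant to guarantee.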
Motivational amplification and blocking of sensory and cognitive processing
Figure 7 indicates how the second stage of sensory representation may be gated by motivational signals. How does the first stage of sensory representation benefit from motivational modulation? The model proposes that excitatory feedback pathways exist from the second stage to the first stage. Keep in mind that the second stage receives motivational input only if a drive representation with which it is associated is prepotent in the present cognitive-emotional context. Only those second stage representations that receive these motivational signals can fire. As a result, positive feedback from the second stages to the first stages amplifies those active first-stage sensory representations that are motivationally prepotent in the present context. This provides the motivational amplification of activity that enables these sensory representations to attentionally block less salient representations via S → S lateral inhibition. Figure 7 illustrates why feedback from higher to lower stages of sensory and cognitive processing is needed to simultaneously achieve motivationally-appropriate sensory attention and responding. In particular, the top-down feedback enables motivationally-relevant selection of distributed information on the lower level of processing. Such top-down feedback is also needed to prevent sensory and cognitive learning from being destabilized and eroded by the ‘blooming buzzing confusion’ of irrelevant events (Grossberg, 1980, 1995).
Interactions between sensory cortices, amygdala, and orbital prefrontal cortex
The circuit in Fig. 7 may, in principle, be replicated at multiple stages in the thalamocortical and corticocortical elaboration of environmental cues. One such brain circuit is depicted in Fig. 8. This figure is taken from Barbas (1995), who noted that many different types of sensory cortex, including visual, somatosensory, auditory, gustatory, and olfactory cortex, are connected to both the amygdala and to the prefrontal cortex. In this interpretation of Fig. 7, the various sensory cortices play the role of the first stages of the sensory representations, the prefrontal cortex plays the role of the second stages of the sensory representations, and the amygdala plays the role of the drive representations. The amygdala has been identified in both animals and humans to be a brain region that is involved in learning and eliciting memories of experiences with strong emotional significance (Halgren et al., 1978; Gloor et al., 1982; Aggleton, 1993; LeDoux, 1993; Davis, 1994). The feedback between the second and first sensory stages may be interpreted as an example of the ubiquitous positive feedback that occurs between cortical regions (Macchi and Rinvik, 1976; Tsumoto et al., 1978; van Essen and Maunsell, 1983; Felleman and Van Essen, 1991; Sillito et al., 1994).
Fig. 8. Caudal orbitofrontal areas receive projections from sensory cortex (visual, somatosensory, auditory, gustatory, and olfactory) and from the amygdala, which also receives inputs from the same sensory modalities. (Reprinted with permission from Barbas (1995).)
Schizophrenia and arousal: cognitive-emotional interactions
The formal symptoms of the CogEM model when its drive representations are overaroused (or underaroused) are strikingly reminiscent of schizophrenic symptoms. This linkage was made in Grossberg (1972b, 1984a,b) in an attempt to connect neural mechanisms of normal cognitive-emotional behavior with properties of schizophrenic behavior. Some of the main symptoms are now reviewed. Because the model is still under development, the proposed linkage is necessarily incomplete. In addition, other model mechanisms that are reviewed below may also contribute to schizophrenic behaviors; not all of these symptoms are attributed to direct and indirect effects of improperly aroused emotional dipoles. Some types of schizophrenia have been ascribed to dopamine hyperactivity of various parts of the limbic system, including increased dopaminergic input to the amygdala (Lloyd, 1978; Reynolds, 1983, 1987). This type of effect may be interpreted as an overaroused condition. This hypothesis is consistent with data showing that dopaminergic agonists, such as L-dopa and amphetamine, can produce a behavioral syndrome that has been compared to schizophrenia (Riklan, 1973; Torrey and Peterson, 1974; Wallach, 1974; Stevens, 1993), although L-dopa has been reported to improve negative schizophrenic symptoms (Gerlach and Luhdorf, 1975; Albert and Rush, 1983). In the opposite direction, various antipsychotic drugs block dopamine receptors (Kuhar et al., 1978) and in sufficient quantities can produce a catalepsy that resembles Parkinson’s disease (Hornykiewicz, 1975). This latter result, which suggests that some schizophrenics and Parkinson’s patients are at opposite ends of a dopamine continuum, is consistent with model properties in the underaroused state that resemble Parkinson’s disease (see below). More generally, the facts that an underaroused syndrome can be transmuted into an overaroused syndrome using a given drug, and that the reverse transformation can be caused by an oppositely acting drug, suggest that the two syndromes may be extremal points on an inverted-U of a common mechanistic substrate, albeit one that may exist in multiple brain regions for different behavioral purposes. Because opponent processes like gated dipoles are assumed to exist in many brain regions, too much of a drug that is aimed at correcting a dopaminergic imbalance in one brain region may create an opposite dopaminergic imbalance in that and other brain regions. Multiple secondary effects, including lateralized effects that are different in different brain hemispheres, may also occur due to these dopaminergic abnormalities (Early et al., 1994), but these are beyond the scope of the present chapter. When the drive representations of Fig. 7 are overaroused, the interpretation of this circuit using Fig. 8 is consistent with data suggesting a possible involvement of prefrontal cortices in schizophrenia (Weinberger, 1988). In support of the CogEM model hypothesis that the prefrontal sensory representation gates the release of properly motivated actions, Fuster (1989) has concluded from studies of monkeys that the orbital (ventral) prefrontal cortex helps to suppress inappropriate responses. These monkey data are consistent with clinical evidence that patients with injury to orbital prefrontal cortex tend to behave in an inappropriate manner (Blumer and Benson, 1975; Liddle, 1994). Other research has suggested that schizophrenia may involve a chronic deficiency in striatal glutamate transmission due to decreased activity in those regions of the prefrontal cortex that project to the striatum (Carlsson, 1988; Andreasen, 1990; Grace, 1991; Lynch, 1992). One possible cause of decreased prefrontal activity may be a reduction in incentive motivational signals from overaroused (or underaroused) amygdala circuits that project to the prefrontal cortex. Other symptoms of schizophrenia are also similar to model properties.
Since the time of Kraepelin (1913/1919), it has been noted that schizophrenics have difficulties with attentional control, motivation defects, and disorganization of behavior. Kraepelin wrote: ‘This behavior is without doubt clearly related to the disorder of attention
which we very frequently find conspicuously developed in our patients. It is quite common for them to lose both inclination and ability on their own initiative to keep their attention fixed for any length of time’ (pp. 5-6). Attentional deficits in schizophrenia have also been emphasized by a number of other workers; e.g. Bleuler (1911/1950), Mirsky (1969) and Braff (1985). Liddle (1994) has refined this analysis by segregating schizophrenic symptoms into ‘three distinguishable syndromes: (1) psychomotor poverty (poverty of speech, flat affect, decreased spontaneous movement); (2) disorganization (disorders of the form of thought, inappropriate affect); and (3) reality distortion (delusions and hallucinations)’ (p. 43), which have been supported by several studies (Arndt et al., 1991; Pantelis et al., 1991; Sauer et al., 1991). Liddle suggested that two of these syndromes ‘reflect volitional disorders: psychomotor poverty reflects a difficulty initiating activity and disorganization reflects a difficulty in the selection of appropriate activity’ (p. 43). Both of these problems are, moreover, associated with impairment in neuropsychological tests of frontal lobe function (Liddle and Morris, 1991). The CogEM model suggests that one possible source of flat affect may be in overaroused emotional centers, such as the amygdala and its projections, and that this flat affect can lead to multiple deficits in behaviors that require the ability to sustain motivated attention on a consummatory task. Modeling work has not yet explicitly characterized how brain mechanisms of speech and movement control react to overarousal, although Grossberg et al. (1997) and Bullock et al. (1998) have modeled speech and movement control mechanisms that include volitional gain control mechanisms that may malfunction during certain mental disorders.
One can nonetheless already discern how symptoms of poverty of speech, decreased spontaneous movement, disorders of the form of thought, and inappropriate affect might all be influenced by how flat affect reduces the incentive motivational signals that normally energize these behaviors. Reality distortions are ascribed below to overarousal of a different type of brain circuit.
How do schizophrenics lose a ‘theory of mind’?
Frith (1992, 1994) has interpreted schizophrenic symptoms as impairments in the processes that underlie a ‘theory of mind’, including the ability to represent beliefs and intentions. For example, when asked to describe photographs of people, schizophrenics described their physical appearance, rather than their mental states (Pilowsky and Bassett, 1980). Frith noted, however, that the theory of mind approach ‘does not explain the other major feature of negative schizophrenia: their impoverishment of will’ (Frith, 1994, p. 150). He also wrote that ‘mental states include not only affects and emotions, but also goals and intentions. A person who was unaware of their goals could, on the one hand, be a slave to every environmental influence or, on the other hand, be prone to perseverative or stereotyped behavior, because they would not have the insight to recognize that certain goals were unobtainable or inappropriate’ (Frith, 1994, p. 151). The present model provides an intuitive framework that can begin to explain both types of behavior. Concerning the impoverishment of will: This loss may be linked to the flattening of affect and the consequent collapse of incentive motivational signals. Without these emotional/motivational resources, all mental activities that depend upon interpreting one’s own emotional state, as well as the emotional states of others, will be diminished. Concerning goals and intentions: Without adequate incentive motivational signals, the prefrontal representations, such as those schematized in Figs 7 and 8, will not be adequately activated. Without adequately activated prefrontal representations, their top-down signals to earlier sensory and cognitive processing stages will be eliminated. As a result, these earlier representations will not be able to organize information according to its emotional meaning or to the individual’s motivational goals.
In addition, motivationally irrelevant information will not be adequately blocked from attention, thereby making it difficult to maintain attention upon motivationally relevant events. Or, in Kraepelin’s words, schizophrenics ‘lose both inclination and ability on their own initiative to keep their attention fixed for any length
of time.’ This summary illustrates how a problem that is localized within one type of brain circuit can seriously disturb cognitive and emotional processing throughout the entire network with which that circuit interacts. The model summarized in Fig. 7 has been extended in various ways. One extension suggests how the hippocampal system may interact with cortical and amygdala circuits to learn new recognition categories and to adaptively time motivated attention to match situational constraints. Grossberg and Merrill (1996) have reviewed this extension and suggested why the cerebellum also contains adaptively timed circuits for the control of movement. Fiala et al. (1996) have modeled adaptive timing in terms of the metabotropic glutamate receptor system.
Contingent negative variation vs. readiness potential
The CogEM model helps to clarify the functional difference between the Contingent Negative Variation, or CNV, event-related potential (Walter et al., 1964; Brunia et al., 1985; Birbaumer et al., 1990) and the Bereitschaftspotential, or BP, or readiness potential (Kornhuber and Deecke, 1965). The BP is a DC potential that precedes motor action by 1 to 2 seconds, and appears to originate in the Supplementary Motor Cortex. The CNV is a slow negative potential of prefrontal origin that occurs even earlier than the BP, and has been associated with an animal’s expectancy, decision, motivation, volition, preparatory set, and arousal (Fuster, 1995). Figure 9 summarizes the model hypothesis (Grossberg, 1975, Fig. 10; Grossberg, 1987b, p. 67) of how these two events may be related. The functional need for this anatomical distinction may be understood from the following example. Consider the incentive motivational feedback that is generated by positive and negative drive representations. Both types of drive representation carry positive incentive motivational signals, because it is just as important to pay attention to a source of fear as it is to pay attention to a source of pleasure. These positive incentive signals can amplify the sensory representations corresponding to fearful or pleasurable events, and
thereby rapidly focus attention on them. For example, in his experiments on the effects of mood on memory, Bower (1981) found that sad-congruent lists are learned no worse than happy-congruent lists. He also found that incongruent moods can interfere with recall, which can be explained by the competitive interactions between drive representations and cue representations. On the other hand, many conditioning data (e.g. Estes and Skinner, 1941; Reynolds, 1968; Estes, 1969; Maier et al., 1969; Grossberg, 1972a) describe how fearful and other negative drive sources can suppress responding. How can attention be drawn to fearful cues at the same time that these cues can suppress responding? The CogEM model proposes that the incentive motivational inputs to the prefrontal cortex have positive sign, but that subsequent processing stages in the elaboration of motor actions may be modulated by both positive and negative motor arousal sources, which may also be used for the control of approach and avoidance behaviors. The former signals are linked to the CNV event-related potential, the latter to the BP event-related potential. Although this distinction would have to be carried much further to understand the details of how such responses are planned and executed, its very existence illustrates that decreased activity in prefrontal cortex undergoes processing at multiple stages before it influences observable actions.
Fig. 9. Distinct incentive pathways: The positive attentional feedback pathway from both positive and negative drive representations is distinguished from the positive or negative feedback pathways by which positive and negative reinforcers can activate or suppress movements, or control approach and avoidance movements. [Figure labels: ATTENTIONAL FEEDBACK; MOTOR INCENTIVE.]
Schizophrenia as an overaroused syndrome: working memory and learned serial order
The inverted-U that occurs in an opponent process is not the only type of inverted-U that can influence abnormal behaviors. Another inverted-U has been proposed to occur during the processing of serially ordered events, such as a sentence or a planned series of actions. Such a sequence of events is temporarily stored in working memory (Grossberg, 1978a,b; Baddeley, 1986) before relationships between the events are encoded in long-term memory by associative learning. A model of this process has been developed (Grossberg, 1969b, 1978a,b; Grossberg and Pepe, 1970, 1971) in which sequences of events cause working memory activations that decay due to interference by subsequent events and the passage of time. These active short-term memory traces generate learning signals which sample the distribution of activity across all the other event representations. Lateral inhibition among the representations enables the strongest associations to suppress weaker ones. In more complex versions of the model, as sequences of events are encoded in working memory, they trigger learning of cognitive planning chunks, or categories, that are selectively activated by particular event sequences. These chunks, in turn, learn to predict which subsequent events will occur. Grossberg and Pepe (1970, 1971) discovered an inverted-U that occurs when such networks learn and perform sequentially ordered series of events at different levels of arousal. The underaroused end of this inverted-U is easily understood in terms of an insufficient amount of arousal with which to energize the learning and encoding of short-term memory patterns into long-term memory. The overaroused end of the inverted-U is more difficult to understand because an ample amount of arousal is available with which to energize learning and performance. However, the patterning of learning and performance through time is seriously impaired by overarousal.
It has been mathematically proved (see Grossberg, 1974 and 1982a, for reviews) that when all of the representations in such an associative network are overaroused, there is a reduction of associative span, contextual collapse, and noisy network activations. In other words, the network loses its ability to represent plans and other higher-order contextual representations that depend upon sequential information. The resulting contextual collapse, fuzzy response categories, and punning based on low-order associations are also characteristics of schizophrenia (Maher, 1977). These properties may clarify how the positive schizophrenic symptom of thought derailment may arise. Andreasen (1979) defines derailment as ‘A pattern of spontaneous speech in which the ideas slip off the track onto another that is clearly but obliquely related, or onto one that is completely unrelated.’ This happens in overaroused serial learning networks because they cannot represent the higher-order temporal contexts that can keep thoughts ‘on track’. Other positive symptoms, such as auditory hallucinations and thought insertions, may also be, at least in part, due to the collapse of the ability to maintain a sequential context long enough for its meaning to be elaborated, and the decrease in the network’s signal-to-noise ratio. Several types of evidence point to regions of the prefrontal cortex, such as the dorsolateral prefrontal cortex and the regions with which it interacts, as a substrate of working memory and its associative consequences (Fuster, 1973, 1989; Milner, 1982; Goldman-Rakic, 1987). A synthesis of these proposed cognitive and emotional sources of arousal leads to the (greatly oversimplified) schematic shown in Fig. 10 of how several positive and negative schizophrenic symptoms may be generated in patients who may be generally overaroused. As noted below, however, because there is an inverted-U as a function of arousal in a gated dipole, some properties of overarousal can also be caused by underarousal.
To the extent that these properties are due to opponent processes like gated dipoles, they can be differentiated by parametric properties of underarousal such as the following.
Fig. 10. Schizophrenia as an overaroused depressive syndrome: Overarousal of emotional opponent processes and of cognitive circuits for short-term and long-term memory of sequentially organized events (e.g. language, motor plans) can yield a combination of positive and negative symptoms. [Figure labels: COGNITIVE: collapse of contextual associations (punning, rhyming); EMOTIONAL: flat affect; ORIENTING: disinhibit orienting responses to irrelevant events.]
The underaroused depressive syndrome
A number of paradoxical properties are generated together in an underaroused gated dipole. These properties, which are listed below, are mathematically proved in the Appendix, and interpreted below in terms of symptoms of attention deficit disorder and Parkinson’s disease.
A. Elevated response threshold to phasic inputs. The output threshold is elevated in response to a phasic input J (see Fig. 2). In other words, a larger intensity J is needed to elicit a positive ON-output from an underaroused dipole.
B. Suprathreshold hypersensitivity. The ON-reactions are hypersensitive to increments in input intensity J that exceed the elevated threshold. In other words, larger than normal ON-outputs are produced by suprathreshold input increments in an underaroused dipole than in a normally aroused dipole.
C. Sensitivity is brought down by a drug that brings arousal ‘up’. These hypersensitive reactions are reduced by a drug that acts like an arousal ‘upper.’ In other words, a drug that causes a parametric increase in arousal level I (see Fig. 2), or leads to an equivalent effect through an action taking place at previous or subsequent processing stages, decreases dipole sensitivity to increments in J.
D. Too much ‘up’ causes an overaroused syndrome. Too much of an ‘upper’ drug can depress output size by carrying the dipole over its Inverted-U into the overaroused (large I) range.
E. Hyposensitive OFF-reactions occur to phasic input decrements. This is true despite the fact that hypersensitive ON-reactions occur in response to phasic input increments. In particular, no OFF-rebound may occur in response to cutting the phasic input J in half to J/2, and cutting an input J/2 to 0 may cause an abnormally small OFF-rebound. Since OFF-rebounds act to reset the dipole, underaroused dipoles may substitute paradoxical ON-reactions for the OFF-reactions that would have occurred in the normally aroused case.
F. Paradoxical dishabituation by unexpected events. Sudden increments ΔI in arousal level that cause an OFF-rebound in a normally aroused dipole can cause a paradoxical enhancement, or dishabituation, of the previous ON-response in an underaroused dipole. As a result, event representations that may have been attentionally blocked, or ignored, as being irrelevant, may attract attention when an unexpected event occurs.
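The Inverted-U named in these properties can be illustrated with a small steady-state calculation. The sketch below is only an illustration under stated assumptions: each channel's transmitter is taken at the equilibrium value AB/(A + S) that follows from the transmitter law in the Appendix, the signal function f is a hypothetical sigmoid chosen for the example, the output is the rectified ON-minus-OFF difference, and all parameter values and function names are invented here.

```python
# Steady-state sketch of a feedforward gated dipole (illustrative assumptions only).

def signal(w, half=1.0):
    """Hypothetical sigmoid signal function f(w) = w^2 / (half^2 + w^2)."""
    return w * w / (half * half + w * w)


def gated(s, A=1.0, B=1.0):
    """Gated signal S*z with the transmitter at its equilibrium z = A*B/(A + S)."""
    return s * A * B / (A + s)


def on_output(I, J, A=1.0, B=1.0):
    """Rectified ON-minus-OFF output for arousal level I and phasic input J."""
    on = gated(signal(I + J), A, B)   # ON channel receives I + J
    off = gated(signal(I), A, B)      # OFF channel receives I only
    return max(on - off, 0.0)


if __name__ == "__main__":
    J = 0.1  # fixed phasic input (assumed value)
    for I in (0.05, 0.6, 5.0):  # low, intermediate, and high arousal
        print(f"I={I:4.2f}  ON output={on_output(I, J):.4f}")
```

With these assumed values the output for the same J is largest at the intermediate arousal level and smaller at both extremes, which is the qualitative shape of the Inverted-U. The sketch does not reproduce the threshold or rebound properties, which depend on the full dynamics.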
Attention deficit disorder as an underaroused depressive syndrome
How these properties reflect themselves in the brain depends upon where the affected dipoles may be found. For example, in a case where sensory and/or motivational dipoles are affected, symptoms relevant to attention deficit disorder may be created, whereas when motor dipoles are affected - as in the basal ganglia systems that control how motor actions, among other processes, are gated ON and OFF (e.g. Horak and Anderson, 1984a,b) - symptoms familiar from Parkinson’s disease emerge. Consider, for example, the case of sensory dipoles and its interpretation in terms of attention deficit disorder. Here, the elevated response threshold to phasic cues may clarify why thresholds during an electroencephalic audiometry test are reduced by medications such as amphetamine (Weber and Sulzbacher, 1975). The suprathreshold hypersensitivity defines the behavioral syndrome. The reduction of sensitivity by an arousal ‘upper’ may be compared with the fact that children
exhibiting this syndrome often suffer from catecholamine deficiencies (Shaywitz et al., 1977; Shekim et al., 1977) and amphetamine-type drugs are used as a treatment (Swanson and Kinsbourne, 1976; Weiss and Hechtmann, 1979). The property that too much ‘upper’ causes an overaroused syndrome can be compared to data showing that an amphetamine psychosis can occur in response to large drug doses (Ellinwood and Kilbey, 1980; MacLennan and Maier, 1983). Whether OFF-reactions are hyposensitive to the halving of a sensory cue (J → J/2) is unknown. For example, does cutting a reward or punishment in half cause an abnormally small affective reaction of opposite sign? Does halving the intensity of a previously sustained visual cue cause an abnormally small negative aftereffect? Finally, the property of paradoxical dishabituation by unexpected events predicts that irrelevant sensory cues can attract attention after an unexpected event occurs. To the extent that a reduced reset event maps into a reduced P300, such a reduction of the P300 may be expected to correlate with enhanced attention to irrelevant events.
Parkinson’s disease as an underaroused depressive syndrome
In the case of Parkinson’s disease, the elevated response threshold to phasic inputs is translated into the difficulty which Parkinson’s patients have in initiating movements (Briley and Marien, 1994). The suprathreshold hypersensitivity is translated into their difficulty in terminating movements after they begin. The fact that they can be treated by an arousal ‘upper’ translates into the fact that, for example, in Parkinson’s disease, dopamine-rich cells of the substantia nigra show marked degeneration (Weiner and Klawans, 1978) and L-dopa, a dopaminergic agonist, is used as a treatment. The fact that too much of an ‘upper’ can cause an overaroused syndrome is interpreted in terms of the fact that too much L-dopa can elicit schizophrenic symptoms (Riklan, 1973; Wallach, 1974). The fact that these extremes are part of the same Inverted-U is illustrated by the fact that antipsychotic drugs that block dopamine receptors (Kuhar et al., 1978) can, in sufficient quantity, produce a catalepsy akin to Parkinson’s disease (Hornykiewicz, 1975).
The predicted hyposensitivity of OFF-reactions to decrements in phasic inputs (J → J/2) seems to be unknown. The paradoxical dishabituation by unexpected events has a natural analog in the fact that Parkinson bracing occurs in response to an unexpected push; that is, ‘if suddenly pushed forward or backward while standing, many people with Parkinson’s brace rigidly without stepping, or with short shuffling steps which are unable to counteract their fall’ (Schallert et al., 1979). Why do these patients not right themselves as normal people do, or just fall over? In the model, these bracing reactions may be at least partly caused by the enhanced ON-reactions of the motor commands that were active before the push. An enhanced ON-reaction would strengthen the current motor pattern, rather than rebounding it to an antagonistic pattern that could facilitate a righting reaction. The hypothesis that such reactions are due to underaroused circuits is consistent with data showing that intraventricular application of 6-OHDA severely depletes brain catecholamines and thereby produces symptoms such as catalepsy, akinesia, and Parkinson bracing (Levitt and Teitelbaum, 1975; Schallert et al., 1978a,b, 1979). This interpretation suggests the utility of studying how novelty-mediated motor potentials may vary with the amount of bracing. Identification of a key opponent processing circuit and how it may be afflicted during a disease like Parkinson’s is only one step in developing a more complete neural theory of the disease. Contreras-Vidal and colleagues (e.g. Contreras-Vidal and Stelmach, 1995; Teulings et al., 1997; Contreras-Vidal et al., 1998; Van Gemmert et al., 1998) have suggested how a gated dipole circuit, suitably specialized, may be embedded within a larger theory of sensory-motor control (e.g. Bullock and Grossberg, 1988; Bullock et al., 1993a,b; Bullock et al., 1998) to provide a more complete explanation of Parkinsonian symptoms.
Weber law models of mental disorders: similar symptoms with opposite causes?
By contrast with the hyperreactive orienting reactions that may be indirectly released by a hyposensitive overaroused gated dipole, the hyperreactive reactions that can occur in an underaroused gated dipole are direct properties of this circuit. Although these two types of reactions may look similar to casual behavioral analysis, they may be differentiated in terms of their triggering events and parametric properties. Likewise, although underaroused and overaroused depressive syndromes may both cause a reduction in output from an afflicted brain region, whether amygdala, basal ganglia, or prefrontal cortex, these reductions may be due to different, indeed, opposite causes that lie at polar ends of an inverted-U. Some authors view schizophrenic and Parkinsonian symptoms as having a similar cause (e.g. Ingvar, 1996), whereas others suggest that ‘although the pattern of impairment [of schizophrenics] was similar to that seen in Parkinson’s disease, different underlying processes may be involved in the two conditions’ (Pantelis and Nelson, 1994, p. 223). The existence of opponent processes that exhibit similar properties at opposite ends of their Inverted-U makes the interpretation of these data more difficult. Neural models of normal cognitive and emotional behaviors may provide an additional tool by which the brain mechanisms underlying these abnormal behaviors may be unified, classified, and explained. The habituative dynamics of arousal-modulated opponent processing circuits have been particularly rich in data implications. In addition to the types of properties that have been summarized herein, gated dipoles have been used to explain data about decision making under risk, gambling, memory repression, self-punitive behaviors, eating disorders, analgesic effects, and sleep rhythms, among others (Grossberg, 1972a,b, 1982b, 1984b; Carpenter and Grossberg, 1983, 1984, 1985; Grossberg and Gutowski, 1987).
That such a simple combination of neural mechanisms can begin to rationalize a wide range of normal and abnormal behavioral and neural properties provides converging evidence that mechanisms of this type are used by the brain. A key component in these explanations is the way in which tonic arousal sets the sensitivity to phasic inputs when they both activate habituative transmitters and opponent competition. Taken together, these properties realize a Weber Law explanation of various mental disorders. Such a Weber Law, as in
visual psychophysics, suggests how the size of a baseline or tonic input can influence the sensitivity to a phasic input that is superimposed upon it. Grace (1991) has also described a Weber Law model of schizophrenia in which tonic baseline signals play a key role in determining the brain’s sensitivity to phasic inputs. As in the present theory, Grace notes that low arousal can cause hyperreactive responses to phasic inputs, whereas high arousal can cause hyporeactive responses, and uses this hypothesis to interpret data about dopamine metabolism in more detail than was attempted here. Grace suggests that low arousal due to abnormally low prefrontal activity is the basis of schizophrenia. He does not focus on the possible causative role of limbic overarousal or underarousal in causing flat affect and, with it, low levels of prefrontal activity. The model of Grace also does not incorporate the possible role of opponent interactions, and does not make significant contact with behavioral data. Thus, although the Grace (1991) model and the Grossberg (1972b, 1984a,b) model both emphasize Weber Law processing as key in these mental disorders, there are significant differences in other model hypotheses. Deciding definitively between them may require more complete models of how the prefrontal cortex, basal ganglia, amygdala, and their interactions with other brain regions generate behavioral properties. Such models are currently being developed in a number of laboratories worldwide. A better mechanistic understanding of the neural substrates of schizophrenia and other arousal-modulated mental disorders may thus soon be available.
Appendix: gated dipoles
Transmitters as gates
The transmitter model presented here was derived from associative learning postulates in Grossberg (1968, 1969a). The gated dipole model was derived from conditioning postulates in Grossberg (1972b). The transmitter derivation that is given below suggests that this transmitter law is the minimal dynamic law for unbiased transmission using a depletable signal (Grossberg, 1980).
We start by asking the following question: What is the simplest law whereby one nerve cell can send unbiased signals to another nerve cell? The simplest law says that if a signal S passes through a given nerve cell $v_1$, the signal has a proportional effect
$$T = SB, \tag{1}$$
where B > 0, on the next nerve cell $v_2$. Suppose, in addition, that the signal from $v_1$ to $v_2$ is due to the release of a chemical z(t) from $v_1$ that activates $v_2$. If such a chemical transmitter is persistently released when S is large, what keeps the net signal, T, from getting smaller and smaller as $v_1$ runs out of transmitter? Some means of replenishing or accumulating the transmitter must exist to counterbalance its depletion due to release from $v_1$. To accommodate this interpretation, we can rewrite Eqn (1) in the form
$$T = Sz \tag{2}$$
and ask: How can the system keep z replenished so that
$$z(t) \cong B \tag{3}$$
at all times t? This is a question about the sensitivity of $v_2$ to signals from $v_1$, since if z could decrease to small values, then even large signals S would have only a small effect on T. Equation (2) has the following interpretation. The signal, S, causes the transmitter, z, to be released at a rate T = Sz. Whenever two processes, such as S and z, are multiplied, they are said to interact by mass action, or that z gates S. Thus, (2) says that z gates S to release a net signal T, and Eqn (3) says that the cell tries to replenish z to maintain the system’s sensitivity to S. The simplest law that joins together both (2) and (3) is the following differential equation for the net rate of change, dz/dt, of z:
$$\frac{dz}{dt} = A(B - z) - Sz. \tag{4}$$
Equation (4) describes the following four processes going on simultaneously.
Accumulation or Production and Feedback Inhibition: The term A(B - z) enjoys two possible interpretations, depending on whether it represents
a passive accumulation process or an active production process. In the former interpretation, there exist B sites to which transmitter can be bound, z sites are bound at time t, and B - z sites are unbound. Then term A(B - z) says that transmitter is bound at a rate proportional to the number of unbound sites. In the latter interpretation, two processes go on simultaneously. Term AB on the right-hand side of Eqn (4) says that z is produced at a constant rate AB. Term -Az says that once z is produced, it inhibits the production rate by an amount proportional to the concentration of z. In biochemistry, such an inhibitory effect is called feedback inhibition by the end product of a reaction. Without feedback inhibition, the constant rate of production, AB, would eventually cause the cell to burst. With feedback inhibition, the net production rate is A(B - z), which causes z(t) to approach the finite amount B, as we desire by Eqn (3). The term A(B - z) thus enables the cell to accumulate a target level B of transmitter.
Gating and Release: Term -Sz in Eqn (4) says that z is inactivated or released at a rate Sz. As in Eqn (2), inactivation or release of z is due to a mass action interaction, or gating, of S by z. Equations (2) and (4) describe the simplest dynamic law that corresponds to constraints (2) and (3). These equations reconcile the two constraints of unbiased signal transmission and maintenance of sensitivity when the signals are due to release of transmitter.
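As a numerical check on Eqn (4), the following minimal sketch integrates the transmitter law with forward Euler for a constant signal S and compares the result with the steady state AB/(A + S) obtained by setting dz/dt = 0. The function names, step size, and parameter values are illustrative assumptions, not values from the text.

```python
# Forward-Euler integration of dz/dt = A*(B - z) - S*z for a constant signal S.
# All names and numbers here are assumptions made for illustration.

def simulate_transmitter(S, A=0.1, B=1.0, z0=None, dt=0.01, t_end=200.0):
    """Return z at time t_end, starting from z0 (defaults to the target level B)."""
    z = B if z0 is None else z0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # accumulation toward B minus gated release S*z
        z += dt * (A * (B - z) - S * z)
    return z


def equilibrium(S, A=0.1, B=1.0):
    """Steady state of the transmitter law: set dz/dt = 0 and solve for z."""
    return A * B / (A + S)


if __name__ == "__main__":
    for S in (0.0, 0.5, 2.0):
        z_sim = simulate_transmitter(S)
        print(f"S={S:3.1f}  simulated z={z_sim:.6f}  AB/(A+S)={equilibrium(S):.6f}")
```

With no signal (S = 0) the transmitter stays at the target level B. A larger sustained S holds z at a lower level, while the released amount Sz is larger.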
Weber-law adaptation and habituation
To determine how the net signal, T = Sz, reacts to a sudden change in S, as in Fig. 3, suppose that z(t) reacts slowly compared to the rate with which S(t) can change. For definiteness, suppose that $S(t) = S_0$ for all times $t \le t_0$ and that, at time $t = t_0$, S(t) suddenly increases to $S_1$. By Eqn (4), z(t) reacts to the constant value $S(t) = S_0$ by approaching an equilibrium value $z(t_0)$. This equilibrium value is found by setting dz/dt = 0 in Eqn (4) and solving for
$$z(t_0) = \frac{AB}{A + S_0}. \tag{5}$$
By Eqn (5), a larger value of $S_0$ causes more transmitter to be inactivated or released. In other words, $z(t_0)$ is a decreasing function of $S_0$. By contrast, Eqn (2) implies that the net signal to $v_2$ at time $t_0$ is
$$S_0 z(t_0) = \frac{ABS_0}{A + S_0}. \tag{6}$$
By Eqn (6), the rate of transmitter release is an increasing function of $S_0$. Now let S(t) switch to the value $S_1 > S_0$. Because z(t) is slowly varying, z(t) approximately equals $z(t_0)$ for a while after $t = t_0$. Thus, the net signal to $v_2$ during these times is approximately equal to
$$S_1 z(t_0) = \frac{ABS_1}{A + S_0}. \tag{7}$$
Equation (7) has the same form as a Weber law, $J(A + I)^{-1}$. The signal $S_1$ is evaluated relative to the baseline, $S_0$, just as J is evaluated relative to I. This Weber law is due to slow intracellular adaptation of the transmitter gate to the input level through time. It is not due to fast intercellular lateral inhibition across space (Grossberg, 1980, Appendix C and D), which also obeys a Weber law. Many of the properties derived below are due to this intracellular Weber law. As z(t) in Eqn (4) begins to respond to the new transmitter level, $S = S_1$, z(t) gradually approaches the new equilibrium point that is determined by $S = S_1$, namely
$$z(\infty) = \frac{AB}{A + S_1}. \tag{8}$$
The net signal consequently decays to the asymptote
$$S_1 z(\infty) = \frac{ABS_1}{A + S_1}. \tag{9}$$
Thus, after S(t) switches from $S_0$ to $S_1$, the net signal Sz jumps from (6) to (7) and then gradually decays to Eqn (9). The exact course of this decay is described by the equation
$$S_1 z(t) = \frac{ABS_1}{A + S_0}\, e^{-(A + S_1)(t - t_0)} + \frac{ABS_1}{A + S_1}\left(1 - e^{-(A + S_1)(t - t_0)}\right) \tag{10}$$
for $t \ge t_0$, which shows that the rate, or gain, $A + S_1$ of the response increases with the signal $S_1$, just as in the case of shunting lateral inhibition (Grossberg, 1980). The sudden increment followed by slow decay can be intuitively described as an overshoot followed by habituation to the new sustained signal level, $S_1$ (see Fig. 3). Both intracellular adaptation and habituation occur whenever a transmitter fluctuates more slowly than the signals that it gates. The size of the overshoot can be found by subtracting Eqn (9) from Eqn (7). For definiteness, let $S_0 = f(I)$ and $S_1 = f(I + J)$, where f(w) is a function that transmutes the inputs I and I + J that exist before and after the increment J into net signals $S_0$ and $S_1$, respectively. Then the overshoot size is approximately
$$S_1 z(t_0) - S_1 z(\infty) = \frac{AB\, f(I + J)\,[f(I + J) - f(I)]}{[A + f(I)]\,[A + f(I + J)]}. \tag{11}$$
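The overshoot and habituation described in this section can be checked with a short calculation. The sketch below is a minimal illustration, assuming the transmitter starts at its equilibrium AB/(A + S_0) and then relaxes exponentially at rate A + S_1 toward AB/(A + S_1), which is the solution of Eqn (4) for a constant signal S_1. The function names and the numerical values of A, B, S_0, and S_1 are hypothetical.

```python
import math

# Net signal S(t)*z(t) when S steps from S0 to S1 at time t0 (illustrative sketch).

def net_signal(t, S0, S1, A=0.1, B=1.0, t0=0.0):
    """Net signal before (t < t0) and after (t >= t0) the step in S."""
    z_before = A * B / (A + S0)          # equilibrium under S0
    if t < t0:
        return S0 * z_before
    z_after = A * B / (A + S1)           # new equilibrium under S1
    decay = math.exp(-(A + S1) * (t - t0))
    z = z_after + (z_before - z_after) * decay
    return S1 * z


def overshoot(S0, S1, A=0.1, B=1.0):
    """Peak net signal minus asymptotic net signal after the step."""
    return A * B * S1 * (S1 - S0) / ((A + S0) * (A + S1))


if __name__ == "__main__":
    S0, S1 = 0.2, 0.8                    # assumed baseline and stepped signal
    for t in (-1.0, 0.0, 1.0, 5.0, 50.0):
        print(f"t={t:5.1f}  net signal={net_signal(t, S0, S1):.4f}")
    print(f"overshoot={overshoot(S0, S1):.4f}")
```

The printed values rise abruptly at the step and then decline toward a level that is higher than the original baseline but well below the peak, which is the overshoot-then-habituation pattern.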
It is shown below that the rebound size in response to specific cue offset is related to (11) in a way that allows both $f(w)$ and the arousal level, $I$, to be estimated. Intracellular habituation due to a slow transmitter gate is not the only type of habituation in the brain. An intercellular variety of habituation can also occur. After a feedback expectancy is learned, a mismatch of the feedback expectancy with feedforward data can trigger an orienting reaction by dishabituating the network’s orienting subsystem (Grossberg, 1980; Grossberg and Merrill, 1996). Feedback expectancies and slow gates are both needed to regulate perceptual and motivational events, but they are quite distinct mechanistically.

A gated dipole
It is shown below how, if transmitters gate signals before the gated signals compete, as in Fig. 2, then antagonistic rebound can be elicited by offset of a specific cue, as in light-ON vs. light-OFF, or fear vs. relief. It is also shown how unexpected events can cause an antagonistic rebound. They do this by triggering an increase in the level of nonspecific arousal that is gated by all the transmitter pathways.

Figure 11 depicts the simplest network in which two channels receive inputs that are gated by slowly varying transmitters before the channels compete to elicit a net output response. In such a feedforward gated dipole, specific phasic inputs are turned on and off by internal or external cues and nonspecific arousal inputs are on all the time, or tonic, even though their size can vary through time. Each channel can have its own sum of specific inputs, $K_1$ or $K_2$, such as hunger or satiety drive inputs, respectively, that are added to positive or negative conditioned reinforcer signals. Both channels also receive the same arousal input, $L$. The total signals to the two channels are, therefore, $S_1 = f(K_1 + L)$ and $S_2 = f(K_2 + L)$, where the signal function, $f(w)$, is monotone increasing. The relative sizes of $S_1$ and $S_2$ and their rates of change through time relative to the transmitter fluctuation rate determine whether an antagonistic rebound will occur. To emphasize this fact, let

$$I = \min(K_1 + L,\; K_2 + L) \qquad (12)$$

and

$$J = |K_1 - K_2|. \qquad (13)$$

The quantity $I$ determines the network’s net arousal level and $J$ determines how asymmetric the inputs are to the two channels (cf. Fig. 2). Suppose, for definiteness, that $K_1 > K_2$. Then $S_1 = f(I + J)$ and $S_2 = f(I)$. The notational shift from $S_1 = f(K_1 + L)$ and $S_2 = f(K_2 + L)$ to $S_1 = f(I + J)$ and $S_2 = f(I)$ in Eqns (12) and (13) is motivated by more than formal convenience. The notation $I$ and $J$ emphasizes that the dipole does not know how many input sources are perturbing it through time. All it can compute is the net arousal level, $I$, and the degree of asymmetry, $J$, above $I$, whether one or a million input sources are active. If a million cues equally perturb the ON-channel (positive reinforcers) and another million cues equally perturb the OFF-channel (negative reinforcers), the net effect of all the cues will be to increase $I$, not $J$. Thus, after dipole competition takes place, all these cues need not generate any incentive motivation. On the other hand, by increasing $I$, these cues can alter the sensitivity of the dipole to other asymmetrically distributed inputs due to the dipole’s inverted-U properties. This is the kind of simple but subtle distinction that the $I$ and $J$ notation emphasizes.
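The reduction from the raw inputs $K_1$, $K_2$ and $L$ to the pair $(I, J)$ can be illustrated with a short sketch. The function name `dipole_inputs` and the numerical values are assumptions chosen for illustration; only the two formulas for $I$ and $J$ come from the text.

```python
def dipole_inputs(K1, K2, L):
    """Net arousal I and input asymmetry J, as defined in Eqns (12)-(13).

    K1, K2: summed specific inputs to the ON- and OFF-channels.
    L:      nonspecific arousal input shared by both channels.
    """
    I = min(K1 + L, K2 + L)
    J = abs(K1 - K2)
    return I, J

# An input to the ON-channel only: the asymmetry J is nonzero.
print(dipole_inputs(K1=3.0, K2=0.0, L=1.0))   # (1.0, 3.0)

# Equal total input to both channels: only the arousal level I grows.
print(dipole_inputs(K1=5.0, K2=5.0, L=1.0))   # (6.0, 0.0)
```

The second call mirrors the "million cues on each side" case: however many sources contribute, balanced input raises $I$ and leaves $J$ at zero.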
Rebound due to phasic cue offset
A rebound can be caused if, after the network equilibrates to the input $J$, the input is suddenly shut off (see $J$ in Fig. 2). This effect is analogous to the reaction that occurs when a previously sustained light is shut off or a previously sustained aversive cue is shut off. To see how this rebound is generated, suppose that the arousal level is $I$ and that the cue input is $J$. Let the total signal in the ON-channel be $S_1 = f(I + J)$ and that in the OFF-channel be $S_2 = f(I)$. Let the transmitter in the ON-channel, $z_1$, satisfy the equation

$$\frac{d}{dt} z_1 = A(B - z_1) - S_1 z_1 \qquad (14)$$

and the transmitter in the OFF-channel, $z_2$, satisfy the equation

$$\frac{d}{dt} z_2 = A(B - z_2) - S_2 z_2. \qquad (15)$$

After $z_1$ and $z_2$ equilibrate to $S_1$ and $S_2$, $(d/dt)z_1 = (d/dt)z_2 = 0$. Thus, by Eqns (14) and (15),

$$z_1 = \frac{AB}{A + S_1} \qquad (16)$$

and

$$z_2 = \frac{AB}{A + S_2}. \qquad (17)$$
[Figure 11 (image not reproduced). Labels recoverable from the figure: ON, OFF, Competition, Gate, Signal, J, I, ON-Drive ($K_1$), Arousal ($L$), OFF-Drive ($K_2$).]
Fig. 11. (a) Specific inputs ($K_1$ and $K_2$) and a nonspecific input ($L$) have the same effect on a gated dipole as (b) a specific input $J$ and a net arousal level $I$ if $K_1 > K_2$.
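The transmitter law of Eqns (14) and (15) can be integrated numerically to check the equilibrium values in Eqns (16) and (17) and to see the overshoot followed by habituation when a signal steps up. The following is a minimal sketch only: the parameter values, the forward-Euler scheme and the function names are assumptions made for illustration, not part of the original analysis.

```python
# Hypothetical parameter values; only the form of the equations comes from the text.
A, B = 1.0, 1.0

def steady_state(S):
    """Equilibrium transmitter level z = AB / (A + S), as in Eqns (16)-(17)."""
    return A * B / (A + S)

def integrate_gate(S, z0, dt=1e-3, t_end=20.0):
    """Forward-Euler integration of dz/dt = A(B - z) - S z for a constant signal S."""
    z = z0
    for _ in range(int(t_end / dt)):
        z += dt * (A * (B - z) - S * z)
    return z

S_before, S_after = 0.5, 2.0
z_old = steady_state(S_before)            # transmitter equilibrated to the old signal
gated_jump = S_after * z_old              # gated signal right after the step (overshoot)
z_new = integrate_gate(S_after, z_old)    # transmitter after it has habituated
gated_final = S_after * z_new             # sustained gated signal

print("gated signal just after the step:", gated_jump)
print("gated signal after habituation:  ", gated_final)
print("predicted asymptote ABS/(A+S):   ", S_after * steady_state(S_after))
```

The gated signal first jumps because the slow transmitter still holds its old value, then decays as the transmitter depletes toward its new equilibrium.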
Since $S_1 > S_2$, it follows that $z_1 < z_2$; that is, $z_1$ is habituated more than $z_2$. However, the gated signal in the ON-channel is $S_1 z_1$ and the gated signal in the OFF-channel is $S_2 z_2$. Since

$$S_1 z_1 = \frac{ABS_1}{A + S_1} \qquad (18)$$

and

$$S_2 z_2 = \frac{ABS_2}{A + S_2}, \qquad (19)$$

it follows from the inequality $S_1 > S_2$ that $S_1 z_1 > S_2 z_2$, despite the fact that $z_1 < z_2$. Thus, the ON-channel gets a bigger signal than the OFF-channel. After the two channels compete, the input $J$ produces a sustained ON-output whose size is proportional to

$$S_1 z_1 - S_2 z_2 = \frac{A^2 B\,[f(I+J) - f(I)]}{[A + f(I)][A + f(I+J)]}. \qquad (20)$$

Division of the overshoot amplitude Eqn (11) by the sustained ON-output amplitude Eqn (20) yields an interesting relationship between the size of the overshoot in the ON-channel and the size of the steady-state ON-output; namely,

$$\frac{\text{on-overshoot}}{\text{steady on-output}} = \frac{f(I+J)}{A}, \qquad (21)$$

which provides an estimate of $f(w)$ if $J$ is parametrically varied. In particular, if $f(w)$ is a linear signal, $f(w) = w$, then Eqn (20) becomes

$$S_1 z_1 - S_2 z_2 = \frac{A^2 B J}{(A + I)(A + I + J)}, \qquad (22)$$

which is an increasing function of $J$ (more fear given more shock) but a decreasing function of $I$ (analgesic effect).

Now shut $J$ off to see how an antagonistic rebound (relief) is generated. The cell potentials rapidly adjust until new signal values, $S_1^* = f(I)$ and $S_2^* = f(I)$, obtain. However, the transmitters $z_1$ and $z_2$ change much more slowly, so that (16) and (17) are approximately valid in a time interval that follows $J$ offset. Thus, the gated signals in this time interval approximately equal

$$S_1^* z_1 = \frac{AB f(I)}{A + f(I+J)} \qquad (23)$$

and

$$S_2^* z_2 = \frac{AB f(I)}{A + f(I)}. \qquad (24)$$

Thus, $S_1^* z_1 < S_2^* z_2$. The OFF-channel now gets the bigger signal, so an antagonistic rebound occurs, the size of which is approximately

$$S_2^* z_2 - S_1^* z_1 = \frac{AB f(I)\,[f(I+J) - f(I)]}{[A + f(I)][A + f(I+J)]}. \qquad (25)$$

Division of the rebound amplitude Eqn (25) by the steady-state ON-output Eqn (20) yields an interesting relationship between the maximal OFF-rebound-output and the steady ON-output; namely,

$$\frac{\text{off-rebound}}{\text{on-output}} = \frac{f(I)}{A}, \qquad (26)$$

which provides an estimate of $f(w)$ as $I$ is parametrically varied. A comparison of Eqn (21) with (26) shows that, as $I$ is parametrically varied, Eqn (21) should have the same graph as Eqn (26), shifted by $J$. This comparison provides an estimate of $J$ (that is, of how the behavioral input is transformed into neural units) and also a strong test of the model. Once $f(w)$ is estimated, Eqns (20) and (25) can be verified. If $f(w) = w$ in Eqn (25), then

$$S_2^* z_2 - S_1^* z_1 = \frac{ABIJ}{(A + I)(A + I + J)}. \qquad (27)$$

The rebound is then an increasing function of $J$ (offset of larger shock elicits more relief) and an inverted-U function of $I$ (an optimal arousal level exists). The rebound is transient (see OFF in Fig. 2) because the equal signals, $S_1 = S_2 = f(I)$, gradually equalize the $z_1$ and $z_2$ levels until they both approach $AB(A + f(I))^{-1}$. Then $S_1 z_1 - S_2 z_2$ approaches zero, so the competition between channels shuts off both of their outputs.
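The steady ON-output and the OFF-rebound described in this section can be computed directly from the equilibrated transmitter values of Eqns (16) and (17). The sketch below does so for a linear signal function; the function names and the parameter values are assumptions chosen for illustration.

```python
def linear(w):
    """Linear signal function f(w) = w."""
    return w

def on_output(I, J, f, A=1.0, B=1.0):
    """Steady ON-output S1*z1 - S2*z2 with equilibrated transmitters."""
    s1, s2 = f(I + J), f(I)
    return A * B * s1 / (A + s1) - A * B * s2 / (A + s2)

def off_rebound(I, J, f, A=1.0, B=1.0):
    """OFF minus ON gated signal just after J is shut off.

    Both channels now carry the signal f(I), while the slow transmitters
    still hold the values they reached under f(I + J) and f(I).
    """
    z1 = A * B / (A + f(I + J))
    z2 = A * B / (A + f(I))
    return f(I) * z2 - f(I) * z1

A, B, J = 1.0, 1.0, 2.0   # hypothetical values
for I in (0.25, 1.0, 4.0, 16.0):
    on = on_output(I, J, linear, A, B)
    off = off_rebound(I, J, linear, A, B)
    print(f"I={I:5.2f}  on-output={on:.4f}  rebound={off:.4f}  ratio={off / on:.4f}")
```

With these values the ON-output falls steadily as $I$ grows, the rebound first rises and then falls (the inverted-U), and the ratio of rebound to ON-output equals $f(I)/A$.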
Rebound due to arousal onset
A surprising property of gated dipoles is their reaction to sudden increments in the arousal level, $I$. Such increments may, for example, occur in response to unexpected events. Suppose that the ON-channel and the OFF-channel have equilibrated to the input levels $I$ and $J$. Now suddenly increase $I$ to $I^*$, thereby changing the signals to $S_1^* = f(I^* + J)$ and $S_2^* = f(I^*)$. The transmitters $z_1$ and $z_2$ continue to obey Eqns (16) and (17) for a while, with $S_1 = f(I + J)$ and $S_2 = f(I)$. A rebound occurs if $S_2^* z_2 > S_1^* z_1$. In general,

$$S_2^* z_2 - S_1^* z_1 = \frac{AB\,\{f(I^*)[A + f(I+J)] - f(I^*+J)[A + f(I)]\}}{[A + f(I)][A + f(I+J)]}. \qquad (28)$$

In particular, if $f(w) = w$, a rebound occurs whenever

$$I^* > I + A, \qquad (29)$$

since then

$$S_2^* z_2 - S_1^* z_1 = \frac{ABJ(I^* - I - A)}{(A + I + J)(A + I)}. \qquad (30)$$

Thus, given a linear signal function, a rebound will occur if $I^*$ exceeds $I + A$ no matter how $J$ is chosen. If the event is so unexpected that it increments the arousal level by more than amount $A$, then all dipoles in the network will simultaneously rebound. Moreover, the size of the OFF-cell rebound increases as a function of the size of the ON-cell input, $J$, as Eqn (30) shows. In particular, no rebound occurs if the ON-cell was inactive before the unexpected event occurs. Thus, the rebound mechanism is selective. It rebounds most vigorously those cells which are most active ($J \gg 0$) and spares inactive cells ($J = 0$).

Inverted-U in dipole output
The inverted-U effect holds if $f(w)$ is a sigmoid, or S-shaped, function; that is, if $f(0) = df/dw(0) = 0$, $df/dw(w) > 0$ if $w > 0$, $f(\infty) < \infty$, and $d^2f/dw^2(w)$ changes sign once from positive to negative as $w$ increases. Sigmoid signal functions are found in many neural systems if only because of their noise suppression and contrast-enhancement properties (cf. Grossberg, 1980). In particular, if $f(w)$ is sigmoid, an inverted-U occurs in the sustained ON-output Eqn (20) as $I$ is parametrically increased, despite the fact that an inverted-U does not obtain in (22) when $f(w)$ is linear. The results are simplified by using the signum function

$$\mathrm{sgn}\{w\} = +1 \text{ if } w > 0,\quad 0 \text{ if } w = 0,\quad \text{and } -1 \text{ if } w < 0. \qquad (31)$$

Writing $x_1(I)$ for the ON-output in Eqn (20) and differentiating with respect to $I$ gives

$$\mathrm{sgn}\{x_1'(I)\} = \mathrm{sgn}\{A^2[f'(I+J) - f'(I)] + 2A[f(I)f'(I+J) - f(I+J)f'(I)] + [f^2(I)f'(I+J) - f^2(I+J)f'(I)]\}. \qquad (32)$$

Since $f(w)$ is sigmoid,

$$f(0) = f'(0) = 0. \qquad (33)$$

Thus, by Eqns (32) and (33),

$$\mathrm{sgn}\{x_1'(0)\} = \mathrm{sgn}\{A^2 f'(J)\} > 0. \qquad (34)$$

At large values of $I$,

$$f(I+J) > f(I), \qquad (35)$$

whereas

$$f'(I+J) < f'(I). \qquad (36)$$

Consequently, each term in brackets on the right-hand side of Eqn (32) is negative. Thus, at large $I$ values,

$$\mathrm{sgn}\{x_1'(I)\} < 0. \qquad (37)$$

The inequalities in Eqns (34) and (37) show that, for fixed $J$, $x_1(I)$ increases and then decreases as a function of $I$. This is the inverted-U for the ON-reaction. In fact, since $f(\infty) < \infty$, Eqn (20) implies that $\lim_{I \to \infty} x_1(I) = 0$. A similar proof holds for the OFF-reaction.
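Both results of the last two sections can be checked numerically: the arousal-onset rebound threshold $I^* > I + A$ for a linear signal, and the inverted-U of the ON-output when the signal is sigmoid. The sketch below is illustrative only. The particular sigmoid $w^2/(1 + w^2)$, the parameter values and the function names are assumptions; the text requires only that $f$ be S-shaped with $f(0) = f'(0) = 0$ and a finite upper bound.

```python
def sigmoid(w):
    """Hypothetical S-shaped signal: f(0) = f'(0) = 0 and f(w) -> 1 as w grows."""
    return w * w / (1.0 + w * w)

def linear(w):
    """Linear signal function f(w) = w."""
    return w

def on_output(I, J, f, A=1.0, B=1.0):
    """Sustained ON-output x_1(I) with equilibrated transmitters."""
    s1, s2 = f(I + J), f(I)
    return A * A * B * (s1 - s2) / ((A + s1) * (A + s2))

def arousal_rebound(I, I_star, J, f, A=1.0, B=1.0):
    """OFF minus ON gated signal just after arousal jumps from I to I_star.

    The transmitters keep the equilibria they reached under I and I + J.
    """
    z1 = A * B / (A + f(I + J))
    z2 = A * B / (A + f(I))
    return f(I_star) * z2 - f(I_star + J) * z1

J = 0.5
grid = [0.1 * k for k in range(101)]            # arousal levels 0.0 ... 10.0
sig_curve = [on_output(I, J, sigmoid) for I in grid]
lin_curve = [on_output(I, J, linear) for I in grid]
peak_index = sig_curve.index(max(sig_curve))

print("sigmoid ON-output peaks at grid index", peak_index, "of", len(grid) - 1)
print("linear ON-output only decreases:",
      all(a > b for a, b in zip(lin_curve, lin_curve[1:])))
print("linear, I* just below I + A:", arousal_rebound(1.0, 1.9, J, linear))
print("linear, I* just above I + A:", arousal_rebound(1.0, 2.1, J, linear))
```

A peak strictly inside the grid for the sigmoid curve, against a monotone decrease for the linear one, is the inverted-U contrast drawn in the text; the sign change of the last two values brackets the threshold $I^* = I + A$.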
Hypersensitive underaroused reaction to phasic increments
To illustrate why the underaroused syndrome is hypersensitive to phasic increments, suppose that $I$ is chosen abnormally small and, consequently, that $f(I)$ is very small because of $f$’s S-shaped graph. Let $J$ represent the intensity of a fearful cue (e.g., a shock level) and let the dipole ON-output Eqn (20) be correlated with the amount of fear. Since $I$ is so small, the ‘fear threshold is raised’ in the sense that a larger value of $J$ is needed to create a large net ON-output than when $I$ is chosen in the ‘normal’ range. Although the fear threshold is high, once $J$ is chosen sufficiently large to elicit a detectable net ON-reaction, additional increments in $J$ create larger than normal increments in fear. This is because the terms $f(I)$ in the numerator and denominator of Eqn (20) are abnormally small. More precisely, differentiating Eqn (20) with respect to $J$, we find the rate at which the ON-output increases to unit increases in $J$. This rate is

$$\frac{\partial}{\partial J}(S_1 z_1 - S_2 z_2) = \frac{A^2 B f'(I+J)}{[A + f(I+J)]^2}. \qquad (38)$$

If $I + J$ is chosen so that $f(I + J)$ is small but growing rapidly, then $f'(I + J)$ is relatively large when the denominator, $[A + f(I + J)]^2$, is relatively small. In other words, underaroused depression is hyperexcitable despite its high threshold.

Paradoxical on-reaction to unexpected events and differential enhancement of unattended cues
Two other properties of underaroused dipoles are related to Parkinsonian bracing. These properties, like underaroused hyperexcitability, are due to the faster-than-linear, or threshold, behavior of the S-shaped signal function, $f(w)$, at small activity values, $w$. Neither property holds if the signal function is linear, say $f(w) = w$. In particular, by Eqn (30), when $f(w) = w$, an arousal increment $\Delta I$ in response to an unexpected event causes a rebound whenever $\Delta I > A$. The minimal $\Delta I$ capable of causing a rebound is independent of the ambient arousal level, $I$. This property does not hold when $f(w)$ grows faster than linearly, say $f(w) = w^2$, which approximates the sigmoid shape of $f(w)$ at low arousal levels. By (28), a rebound occurs when $f(w) = w^2$ only if

$$\Delta I > g(I, J), \qquad (39)$$

where the function

$$g(I, J) = \frac{A - I(I + J) + (A + I^2)^{1/2}[A + (I + J)^2]^{1/2}}{2I + J} \qquad (40)$$

is a decreasing function of $I$. In fact, $g(I, J)$ approaches 0 as $I$ is chosen arbitrarily large. Thus, a much larger $\Delta I$ is needed to rebound an underaroused dipole than a normally aroused dipole. Moreover, if $\Delta I < AJ^{-1}$, then when $I = 0$,

$$\frac{d}{d(\Delta I)}\left[\frac{(I + \Delta I + J)^2}{A + (I + J)^2} - \frac{(I + \Delta I)^2}{A + I^2}\right] > 0. \qquad (41)$$

In other words, an arousal increment can actually enhance the ON-output of an underaroused dipole instead of rebounding the dipole. Use of a sigmoid function also helps explain how, in response to an arousal burst, previously unattended sensory representations can be enhanced even while very active sensory representations are inhibited. This is because the function $g(I, J)$ is a decreasing function of $J$, as well as of $I$. This means that it is easier to rebound a more active sensory representation than a less active sensory representation.

Paradoxical lack of rebound to phasic decrement: ordering of reinforcement magnitude
This section illustrates how several behavioral indices should all covary as arousal level is parametrically increased. The first index says that reducing $J$ units of shock (or other negative reinforcer) to $J/2$ units is less rewarding (i.e. produces a smaller rebound) than reducing $J/2$ units of shock to 0 units, despite the fact that both operations reduce shock by $J/2$ units. This result is based on the fact that Eqns (20) and (25) include Weber law ratios of $I$ and $J$ terms as well as differences of $I$ and $J$ terms. A formula has been derived that predicts when reducing $J_1$ units of shock to $K_1$ units at arousal level $I_1$ is more reinforcing than reducing $J_2$ units of shock to $K_2$ units at arousal level $I_2$ (Grossberg, 1972b). To make these assertions, assume that the size of the relief rebound caused by reducing the shock level is proportional to the rewarding effect of the manipulation, other things being equal. To simplify the computations, it is convenient to use a signal function

$$f(w) = \max(w - C, 0). \qquad (42)$$

Such a signal function has a threshold $C$, below which it equals 0 and above which it grows linearly.
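The ordering claimed at the start of this section, that cutting a shock of $J$ units in half is less rewarding than removing a shock of $J/2$ units, can be checked numerically with the threshold-linear signal of Eqn (42). The sketch below is a minimal illustration under stated assumptions: the constants, the function name `rebound` and the approximation that the transmitters are frozen at their old equilibria (Eqns (16) and (17)) at the moment of the switch are choices made here for illustration.

```python
A, B, C = 1.0, 1.0, 0.5   # hypothetical constants; C is the threshold in Eqn (42)

def f(w):
    """Threshold-linear signal function, Eqn (42)."""
    return max(w - C, 0.0)

def rebound(I, J_old, J_new):
    """OFF minus ON gated signal right after the cue drops from J_old to J_new.

    The transmitters are held at the equilibria they reached under J_old.
    A negative value means that no rebound occurs.
    """
    z1 = A * B / (A + f(I + J_old))
    z2 = A * B / (A + f(I))
    return f(I) * z2 - f(I + J_new) * z1

J = 2.0
for I in (1.0, 2.0, 4.0):
    halve = rebound(I, J, J / 2)       # shock reduced from J to J/2
    remove = rebound(I, J / 2, 0.0)    # shock reduced from J/2 to 0
    print(f"I={I}: halve={halve:+.4f}  remove={remove:+.4f}")
```

At the lowest arousal level in this example, halving the shock yields a negative value (no rebound) while removing the smaller shock still yields a positive one, which is the underaroused case discussed in the text.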
This threshold function approximates a sigmoid function in the activity range before saturation occurs. Denote the steady-state ON-reaction that occurs after a specific input of intensity $J$ is kept on for $S$ time units by $x_5(S, J \to K)$ and the OFF-rebound that occurs when intensity $J$ is switched to $K$ at time $t = S$ by $x_6(S^+, J \to K)$. To compute $x_6(S^+, J \to K)$, the transmitters $z$ are approximated by their steady-state values at $t = S$ and the potentials $x$ by their new steady-state values in response to input $K$. Given an arousal level $I$ that exceeds the threshold, $C$, then

$$x_6(S^+, J \to J/2) = \frac{ABJ(I - A - C)}{2(D + I)(D + I + J)}, \qquad (43)$$

where $D = A - C$. By comparison, Eqns (20) and (25) imply that

$$x_5(S, J \to 0) = \frac{A^2 B J}{(D + I)(D + I + J)} \qquad (44)$$

and

$$x_6(S^+, J \to 0) = \frac{ABJ(I - C)}{(D + I)(D + I + J)}, \qquad (45)$$

from which it also follows that

$$x_6(S^+, J/2 \to 0) = \frac{ABJ(I - C)}{(D + I)(2D + 2I + J)} \qquad (46)$$

and

$$\frac{x_6(S^+, K \to 0)}{x_5(S, K \to 0)} = \frac{I - C}{A} \qquad (47)$$

for any $K > 0$. Comparing Eqns (43) and (46) shows that the relative rebound sizes satisfy

$$x_6(S^+, J \to J/2) < x_6(S^+, J/2 \to 0), \qquad (48)$$

or that cutting $J$ units in half is less rewarding than shutting off $J/2$ units. In addition, the ratio (47) increases with $I$, as in the more general Eqn (26). Substituting Eqn (47) into Eqn (43) shows that

$$x_6(S^+, J \to J/2) = \frac{J}{2}\,\frac{A^2 B}{(D + I)(D + I + J)}\left[x_5^{-1}(S, K \to 0)\,x_6(S^+, K \to 0) - 1\right]. \qquad (49)$$

By Eqn (49), an arousal level that favors the possibility of learned avoidance in the presence of fearful cues (i.e. the OFF-rebound is much bigger than the ON-response so that the right-hand side of (49) is positive) also favors a large rewarding effect when the shock level is halved. If $I$ is chosen to be small (underarousal), then $x_6$ in Eqn (43) can be negative (no rebound occurs) even if $x_6$ in Eqn (46) is positive (a rebound occurs). Such dipole properties are linked to the membrane equations that define cell dynamics in Grossberg (1984b).

Acknowledgments
The author wishes to thank Robin Amos and Diana Meyers for their valuable assistance in the preparation of the manuscript. This work was supported in part by the Defense Advanced Research Projects Agency and the Office of Naval Research (ONR N00014-95- 1-0409), and the National Science Foundation (NSF IRI-97-20333).
References
Abbott, L.F., Varela, K., Sen, K. and Nelson, S.B. (1997) Synaptic depression and cortical gain control. Science, 275: 220-223. Aggleton, J.P. (1993) The contribution of the amygdala to normal and abnormal emotional states. Trends Neurosci., 16: 328-333. Albert, M. and Rush, M. (1983) Comparison of effects in Parkinson's disease and schizophrenia. Psychopharmacology Bull., 19: 118-120. Andreasen, N.C. (1979) Thought, language, and communication disorders. Arch. Gen. Psychiatry, 36: 1315-1321. Andreasen, N.C. (1990) Positive and negative symptoms: Historical and conceptual aspects. Mod. Probl. Pharmacopsychiatry, 24: 1.
Arndt, S., Alliger, R.J. and Andreasen, N.C. (1991) The distinction of positive and negative symptoms: The failure of a two-dimensional model. Br. J. Psychiatry, 158: 317-322. Baddeley, A.D. (1986) Working Memory. Oxford: Clarendon Press. Baloch, A. and Waxman, A. (1991) Visual learning, adaptive expectations, and behavioral conditioning of the mobile robot MAVIN. Neur. Networks, 4: 271-302. Banquet, J.P. and Grossberg, S. (1987) Probing cognitive processes through the structure of event-related potentials during learning: An experimental and theoretical analysis. Appl. Optics, 26: 4931-4946. Banquet, J.-P., Renault, B. and Lesevre, N. (1981) Effect of task and stimulus probability on evoked potentials. Biol. Psychol., 13: 203-214. Barbas, H. (1995) Anatomic basis of cognitive-emotional interactions in the primate prefrontal cortex. Neurosci. Biobehav. Rev., 19: 499-510. Bleuler, E. (1950) Dementia Praecox, or the Group of Schizophrenics. (J. Zinken, Trans.) New York: International Universities Press. (Original work published in 1911) Blumer, D. and Benson, D.F. (1975) Personality changes with frontal lobe lesions. In: D.F. Benson and D. Blumer (Eds.), Psychiatric Aspects of Neurological Disease. New York: Grune and Stratton, pp. 151-170. Bower, G.H. (1981) Mood and memory. Am. Psychologist, 36: 129-148. Bradski, G., Carpenter, G.A. and Grossberg, S. (1994) STORE working memory networks for storage and recall of arbitrary temporal sequences. Biol. Cyber., 71: 469-480. Braff, D.L. (1985) Attention, habituation, and information processing in psychiatric disorders. In: B. Michael, J.O. Cavenar and H.K. Brodie et al. (Eds.), Psychiatry, Vol. 3. Philadelphia, PA: J.P. Lippincott, pp. 1-12. Briley, M. and Marien, M. (Eds.) (1994) Noradrenergic Mechanisms in Parkinson’s Disease. New York, NY: CRC Press. Brown, J.L. (1965) Afterimages. In: C.H. Graham (Ed.) Vision and Visual Perception. Wiley: New York. Brunia, C.H.M., Haagh, S.A.V.M. and Scheirs, J.G.M. 
(1985) Waiting to respond: Electrophysiological measurements in man during preparation for a voluntary movement. In: H. Heuer, U. Kleinbeck, and K.-H. Schmidt (Eds.) Motor Behavior. New York: Springer. Bullock, D., Cisek, P. and Grossberg, S. (1998) Cortical networks for control of voluntary arm movements under variable force conditions. Cereb. Cortex, 8: 48-62. Bullock, D. and Grossberg, S. (1988) Neural dynamics of planned arm movements: Emergent invariants and speedaccuracy properties during trajectory formation. Psychol. Rev., 95: 49-90. Bullock, D., Grossberg, S. and Mannes, C. (1993a) A neural network model for cursive script production. Biol. Cyber:, 70: 15-28. Bullock, D., Grossberg, S. and Guenther, F.H. (1993b) A selforganizing neural model of motor equivalent reaching and
tool use by a multijoint arm. J. Cogn. Neurosci., 5: 408435. Buonomano, D.V., Baxter, D.A. and Byme, J.H. ( I 990) Small networks of empirically derived adaptive elements simulate some higher-order features of classical conditioning. Neur: Networks, 3: 507-523. Cant, B.R. and Bickford. R.G. (1967) The effect of motivation on the contingent negative variation (CNV). Electroencephalog. Clin. Neurophysiol., 23: 594. Carlsson, A. (1988) The current status of the dopamine hypothesis of schizophrenia. Neuropsychopharmacology, 1: 179. Carpenter, G.A. and Grossberg, S. (1981) Adaptation and transmitter gating in vertebrate photoreceptors. J. Theoret. Neurobiol., I: 1 4 2 . Carpenter, G.A. and Grossberg, S. (1983) A neural theory of circadian rhythms: The gated pacemaker. Biol. Cybel:, 48: 35-59. Carpenter, G.A. and Grossberg, S. (1984) A neural theory of circadian rhythms: Aschoff’s rule in diurnal and nocturnal mammals. Am. J. Physiol. (Regulatory, Integrative, and Compara. Physiol.), 247: Rl067-Rl082. Carpenter, G.A. and Grossberg, S. (1985) A neural theory of circadian rhythms: Split rhythms, after-effects, and motivational interactions. J. Theoret. Biol., 113: 163-223. Carpenter, G.A. and Grossberg, S. (1991) Pattern Recognition by Self-organizing Neural Networks. Cambridge, MA: MIT Press. Carpenter, G.A. and Grossberg, S. (1994) Integrating symbolic and neural processing in a self-organizing architecture for pattern recognition and prediction. In: V. Honavar and L. Uhr (Eds.), Artificial Intelligence and Neural Networks: Steps Towards Principled Prediction. San Diego: Academic Press, pp. 387-421. Chang, C. and Gaudiano, P. (1998) Application of biological learning theories to mobile robot avoidance and approach behaviors. J. Complex Systems, in press. Contreras-Vidal, J.L., Grossberg, S. and Bullock, D. (1997) A neural model of cerebellar learning for arm movement control: cortico-spinal-cerebellar dynamics. Learn. Mem., 3: 475-502. Contreras-Vidal, J.L., Poluha, P.. 
Teulings, H.-L. and Stelmach, G.E. (1998) Neural dynamics of short and medium-term motor control effects of levodopa therapy in Parkinson’s disease. Art$ Intell. Med., in press. Contreras-Vidal, J.L. and Stelmach, G.E. (1995) A neural model of basal ganglia-thalamocortical relations in normal and parkinsonian movement. Biol. Cybel:, 73: 467476. Cox, V.C., Kakolewski, J.W., and Valenstein, E.S. (1969) Inhibition of eating and drinking following hypothalamic stimulation in the rat. J. Comp. Physiol. Psychol., 68: 530-535. Davis, M. (1994) The role of the amygdala in emotional learning, Int. Rev. Neurabiol., 36: 225-265. Denny, M.R. (1971) Relaxation theory and experiments. In: F.R. Brush (Ed.), Aversive Condition. Learn.. New York: Academic Press.
402 Desimone, R. (199 1) Face-selective cells in the temporal cortex of monkeys. J. Cogn. Neurosci., 3: 1-8. Dews, P.B. (1958) Studies on behavior, IV: Stimulant actions of methamphetamine. J. Pharmacol. Exp. TheL:, 122: 137-147. Early, T.S., Haller, J.W., Posner, M.I. and Raichle, M. (1994) The left striato-pallidal hyperactivity model of schizophrenia. In: AS. David and J.C. Cutting (Eds.), The Neuropsychology of Schizophrenia. Hillsdale, New Jersey: Erlbaum Press, pp. 15-37, Ellinwood, E.H. and Kilbey, M.M. (1980) Fundamental mechanisms underlying altered behavior following chronic administration of psychomotor stimulants. Biol. Psychiatry, 1s: 749-757. Estes, W.K. (1969) Outline of a theory of punishment. In: B.A. Campbell and R.M. Church (Eds.), Punishment und Aversive Behavior. New York: Appleton-Centry-Crofts. Estes, W.K. and Skinner, B.F. (1941) Some quantitative properties of anxiety. J. Exp. Psycho[., 29: 390400. Evarts, E.V. (1973) Motor cortex reflexes associated with learned movement. Science, 179: 501-503. Felleman, D.J. and van Essen, C.D. (1991) Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex, I: 1 4 7 . Fiala, J.C., Grossberg, S . and Bullock, D. (1996) Metabotropic glutamate receptor activation in cerebellar Purhnje cells as substrate for adaptive timing of classically conditioned eye blink response. J. Neurosci., 16: 3760-3774. Francis, G. and Grossberg, S . (l996a) Cortical dynamics of boundary segmentation and reset: Persistence, afterimages, and residual traces. Perceprion, 25: 543-567. Francis. G. and Grossberg, S. (1996b) Cortical dynamics of form and motion integration: Persistence, apparent motion, and illusory contours. Vision Rex, 36: 149-173. Francis, G., Grossberg, S. and Mingolla, E. (1994) Cortical dynamics of feature binding and reset: Control of visual persistence. Vision R e x , 34: 1089-1 104. Frith, C.D. (1992) The Cognitive Neuropsychology of Schizophrenia.. 
Hillsdale, New Jersey: Erlbaum Press. Frith, C.D. (1994) Theory of mind in Schizophrenia. In: AS. David and J.C. Cutting (Eds.), The Neuropsychology of Schizophrenia. Hillsdale, New Jersey: Erlbaum Press, pp. 147-1 6 I . Fuster, J.M. (1973) Unit activity in prefrontal cortex during delayed-response performance: Neuronal correlates of transient memory. J. Neurophysiol., 36: 61-78. Fuster, J.M. (1989) The Prefrontal Cortex (second edition) New York: Raven Press. Fuster, J.M. (1995) Memory in the Cerebral Cortex. Cambridge, MA: MIT Press. Gaudiano. P. and Chang, C. ( I 997) Adaptive obstacle avoidance with a neural network for operant conditioning: Experiments with real robots. In: Proceedings of the 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation. IEEE Press, pp. 13-18. Gaudiano, P., Zalama, E., Chang, C. and Lopez-Coronado, J. (1996) A model of operant conditioning for adaptive obstacle avoidance. In: P. Maes et al. (Eds.), From Animals to Animars
4. Proceedings of the fourth International Conference on
Simulation o j Adaptive Behavior. Cambridge, MA: MIT Press, pp. 373-381. Gerlach, J. and Luhdorf, K. (1975) The effect of L-dopa on young patients with simple schizophrenia, treated with neuroleptic drugs. Psychopharmacologia, 44: 105-110. Gilbert, P.F.C. and Thach, W.T. (1977) Purkinje cell activity during motor learning. Brain Res., 128: 309-328. Gloor, P., Olivier, A.. Quesney, L.F., Andermann, E and Horowitz, S. (1982) The role of the limbic system in experiential phenomena of temporal lobe epilepsy. Ann. Neurol., 12: 129-144. Gochin, P.M., Miller, E.K., Gross, C.G. and Gerstein, G.L. (199 I ) Functional interactions among neurons in inferior temporal cortex of the awake macaque. Exp. Bruin Rex, 84: 505-5 16. Goldman-Rakic, P.S. (1987) Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. In: F. Plum (Ed.), Handbook of Physiology; Nervous System. Vol 5., Higher Functions of the Brain (Part 1). Bethesda, MD: American Physiological Society, pp. 373417. Grace, A.A. (1991) Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: A hypothesis for the etiology of schizophrenia. Neuroscience, 41: 1-24. Grossberg, S. (1968) Some physiological and biochemical consequences of psychological postulates. Proceedings of the National Academy of Sciences, 60: 758-765. Grossberg, S . (1969a) On the production and release of chemical transmitters and related topics in cellular control. J. Tlieoret. Biol., 22: 325-364. Grossberg, S. (1969b) On the serial learning of lists. Math. Biosci., 4: 201-253. Grossberg, S. (1971) On the dynamics of operant conditioning. J. Theoret. Biol., 33: 225-255. Grossberg, S . (1972a) A neural theory of punishment and avoidance, I: Qualitative theory. Math. Biosci., 15: 39-67. Grossberg, S . (1972b) A neural theory of punishment and avoidance, 11: Quantitative theory. Math. Biosci., 15: 253-285. Grossberg, S . 
(1973) Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52: 217-257. Reprinted in Grossberg (1 982c). Grossberg, S. (1974) Classical and instrumental learning by neural networks. In: R. Rosen and F. Snell (Eds.). Progress in Theoretical Biology. New York: Academic Press, pp. 51-14]. Reprinted in Grossberg (1982~). Grossberg, S. (1975) A neural model of attention, reinforcement, and discrimination learning. International Review of Neurobiology, 18, 263-327. Reprinted in Grossberg (1982~). Grossberg, S. (1978a) A theory of human memory: SelfOrganization and performance of sensory-motor codes, maps, and plans. In: R. Rosen and F. Snell (Eds.), Progress in
403 Theoretical Biology. New York: Academic Press, pp. 233-374. Reprinted in Grossberg (1982~). Grossberg, S. (1978b) Behavioral contrast in short term memory: Serial binary memory models or parallel continuous memory models? J. Math. Psychol., 3: 199-219. Grossbeg, S. (1980) How does a brain build a cognitive code? Psychol. Rev., I : 1-51. Grossberg. S. (1 982a) Associative and competitive principles of learning and development: The temporal unfolding and stability of STM and LTM patterns. In: S.I. Amari and M. Arbib (Eds.), Competition and Cooperation in Neural Networks. New York: Springer-Verlag, pp. 295-341. Grossberg, S. (1982b) Processing of expected and unexpected events during conditioning and attention: A psychophysiological theory. Psychol. Rev., 89: 529-572. Grossberg, S. (1982~)Studies ofMind and Brain. Amsterdam: Kluwer. Grossberg, S. (1984a) Some normal and abnormal behavioral syndromes due to transmitter gating of opponent processes. Biol. Psychiatry, 19: 1075-1 118. Grossberg, S. (198413) Some psychophysiological and pharmacological correlates of a developmental, cognitive, and motivational theory. In: R. Karrer, J. Cohen, and P. Tueting (Eds.), Brain and Information: Event Related Potentials. New York: New York Academy of Sciences, pp. 58-142. Reprinted in Grossberg (1982~). Grossberg, S. ( 1987a) Cortical dynamics of three-dimensional form, color, and brightness perception: 11. Binocular theory. Percep. Psychophys., 41: 117-158. Grossberg, S. (1987b) The Adaptive Brain, Volume I. Amsterdam: ElsevierNorth-Holland. Grossberg, S. ( 1988) Neural Networks and Natural Intelligence. Cambridge, MA: MIT Press. Grossberg. S. (1995) The attentive brain. Am. Scientist, 83: 438-449. Grossberg, S., Boardman, I., and Cohen, M.A. (1997) Neural dynamics of Variable-rate speech categorization. J. Exp. Psychol.: Hum. Percep. Perform., 23: 48 1-503. Grossberg, S. and Gutowski, S. 
J.A. Reggia, E. Ruppin and D. Glanzman (Eds.) Progress in Brain Research, Vol 121 © 1999 Elsevier Science BV. All rights reserved.
CHAPTER 22
A neural network model of attention biases in depression
Greg J. Siegle*
San Diego State University and the University of California, San Diego, CA, USA
Depression is a disabling disorder characterized by negative moods, lack of interest in pleasurable activities, weight change, sleep disturbance, psychomotor retardation, fatigue, feelings of worthlessness, decreased attention, and suicidal ideation (APA, 1994). The lifetime prevalence of depression in the US has been estimated at up to 17% of the population (Kessler et al., 1994). The prevalence and seriousness of the disorder make understanding factors associated with its onset and maintenance a common goal of clinical researchers. Negative information processing biases may be such factors. Research finds that depressed and dysphoric individuals (dysphoria being a sad mood state thought to underlie depression) selectively attend to negative information over positive information (Matthews and Harley, 1996; Williams et al., 1996), selectively remember negative information (Blaney, 1986; Matt et al., 1992), and interpret information as negative that other people do not see as negative (Williams et al., 1998). Yet it is unclear what aspects of negative information depressed and dysphoric individuals attend to; whether biased attention to negative information occurs in the early stages of attention, having to do with initial perceptions of information (e.g. as suggested by Kitayama (1990) and Matthews and Southall (1991)), or in late stages of attention, involving retrieval of associations from memory (MacLeod and Mathews, 1991); and whether biased attention helps to maintain depression. The current chapter presents a physiologically constrained framework for understanding the role of depression in attention to negative information. A computational neural network is used to evaluate predictions based on this framework. Data from a number of experiments are used to evaluate the model's predictions. Conclusions about depressive information processing biases, stemming from aspects of the model that appear valid, are then presented.

*Corresponding author. Currently at the Clarke Institute for Psychiatry, 250 College St., Toronto, Ontario M5T 1R8, Canada. e-mail: gsiegle@psychology.sdsu.edu
A physiological framework for understanding information processing biases in depression
Physiological constraints on attention to emotional information help to resolve ambiguities regarding the location and time course of attention biases in depression. LeDoux (1997) suggests that emotional information is processed in parallel by brain systems responsible for identifying emotional aspects of information (the amygdala system) and non-emotional, conceptual or semantic aspects of information (the hippocampal system). Research documents the importance of the amygdala system in identifying information as either positive or negative (Halgren, 1992; LeDoux, 1992), and of the hippocampal system in semantic association, suggesting that it moderates activation of semantic qualities associated with stimuli in cortex (e.g. Squire, 1992). Extensive feedback occurs between the hippocampal and amygdala systems (Amarel et al., 1992; Tucker and Derryberry, 1992). This feedback may allow individuals to associate emotional aspects of information with non-emotional aspects.

LeDoux’s (1997) model is conceptually similar to cognitive theories that suggest emotional information processing involves spreading activation throughout a semantic network in which both semantic and affective features are represented as nodes (Bower, 1981). For example, a stimulus such as a crying person might activate both ‘person’ and ‘sadness’ nodes in an observer’s semantic network. Ingram (1984) suggests that depressed individuals suffer from strongly activated connections between negative affective nodes and multiple semantic concepts, creating feedback loops that maintain depressive affect and cognition. Ingram’s (1984) theory could be used to explain depressive information processing biases in LeDoux’s model in a number of ways, each of which appeals to the notion that attention can be separately allocated to affective and non-affective aspects of information. Depressive attention biases could involve excessive attention to non-emotional features of information (biased hippocampal system processing), excessive attention to emotional features (biased amygdala system processing), or feedback between affective and semantic processing systems. Each scenario suggests different roles for attention in the onset and maintenance of depression.
Ways to evaluate attention in depression
Based on LeDoux’s model, two tasks seem particularly useful for elucidating the nature of attention biases in depression. A lexical decision task, in which participants are asked to identify whether a string of letters spells a word, which may be positive, negative, or neutral, directs people’s attention towards non-emotional aspects of information. If depressed individuals focus on non-emotional features, negative information processing biases should be apparent in this task. In contrast, if depressed people focus more on the affective aspects of information, information processing biases would be more apparent on a task designed to focus people’s attention on affective features. A valence identification task, in which individuals are asked to name whether a word is positive, negative, or neutral, can be used for this purpose. Attentional biases involving feedback between mental representations of affective and semantic aspects of information are assumed to result in biased information processing on both tasks. By analyzing response biases on the tasks, the extent to which attention to certain types of information is impaired can be quantified. By analyzing reaction times on the tasks, the extent to which biases are apparent in the early stages of attention can be quantified.

To effectively analyze attentional biases over the entire time course of attention, a more continuous measure of cognitive load is needed. Pupil dilation is a strong candidate for such a measure. Muscles controlling pupil dilation are innervated by structures essential to both cognitive and affective information processing. Thus, the effectiveness of pupil dilation as a measure of cognitive load has been repeatedly demonstrated using attention and memory tasks (see Beatty, 1982 for a review). For example, Kahneman and Beatty (1966) show that the pupil reliably dilates one millimeter for each digit research participants are asked to remember in a short-term memory task. Pupils also dilate in proportion to the difficulty of tasks (Hess and Polt, 1964) and the effort needed to perceive information (Hakerem and Sutton, 1966). These results can be explained by projections from semantic identification structures to the midbrain reticular formation, which is connected to the oculomotor nuclei; stimulation of the midbrain reticular formation has been shown to lead to changes in pupil dilation (Beatty, 1986). Emotional activity is often thought to be mediated by activity in the hypothalamo-thalamo-cortical axis. As such, activity in these structures, and in limbic structures connected to them, has been shown to result in pupil dilation (see Hess, 1972 for a review). Stimulation of the amygdala, in particular, increases pupil dilation in cats, dogs, and monkeys (Koikegami and Yoshida, 1953; Fernandez de Molina and Hunsberger, 1962).
Empirical studies using the affective lexical decision and valence identification tasks
There is a growing body of literature exploring affective lexical decision and valence identification tasks with depressed and non-depressed individuals (for reviews, see Siegle et al., 1999; Siegle, 1999), though the two tasks have rarely been examined together. Siegle et al. (1999) administered an affective lexical decision task and an affective valence identification task to 30 dysphoric and 46 non-dysphoric undergraduates, measuring signal detection rates and reaction times. A number of computational simulations, described below, were generated to help understand these data. To test predictions based on the simulations, Siegle (1999) administered similar tasks to 23 unmedicated clinically depressed and 25 non-depressed adults, measuring reaction times, signal detection rates, and pupil dilation. To distinguish between personally relevant and non-relevant words, both normed word lists and words generated by participants were employed. The results of simulations described in the following sections will be compared to data from these studies.
Why computational neural networks are particularly appropriate for investigating attention biases in depression
LeDoux’s model involves complex interactions between highly non-linear systems. The flow of information through these systems is difficult, if not impossible, to understand just by thinking about the model. Similarly, the flow of information through a network such as Bower’s (1981) model is notoriously hard to predict if there is noise in processing (Movellan and McClelland, 1995). Computational modeling can provide a rigorous basis for understanding how negative experiences could impact information processing in models such as Bower’s or LeDoux’s, and can suggest ways to test these understandings empirically. Computational neural networks are particularly useful for understanding attention in depression because they are natural extensions of the notion of semantic networks, on which Bower’s (1981) network theory is based (Blank et al., 1991; Hinton, 1991; Yates and Nasby, 1993). Their biological congruity allows an intuitive representation of LeDoux’s (1997) model to be integrated with the semantic network approach. Additionally, neural network representations avoid difficulties associated with representing ‘hot’ vs. ‘cold’ cognitions common in semantic network approaches (Teasdale and Barnard, 1993). That is, as long as both the semantic and emotional aspects of sadness are represented by a single node in a semantic network, it is difficult to represent the idea of thinking about sadness without formally being sad. Because simulated neurons in a neural network represent ‘microfeatures’ of concepts (Hinton et al., 1986), it is logical to assume that emotional and semantic aspects of a single concept would be represented by different nodes, corresponding to the different brain areas implicated by LeDoux’s model.
A computational framework for investigating affective information processing
The following sections augment a growing body of neural network models of unipolar depression (see Siegle, 1998, for a review) with a computational neural network model that embodies the essential features of Bower’s (1981) and LeDoux’s (1997) models of emotional processing (Siegle et al., 1995; Siegle, 1996; Siegle and Ingram, 1997a,b; Siegle, 1999; Siegle et al., 1998). The goal in producing the computational model was to reproduce salient aspects of attention to emotional information, including the gradual perception and recognition of emotional and non-emotional aspects of a stimulus. Nodes in the model were not intended to strictly represent groups of human neurons, or to capture details of brain structures. Rather, the model represents the general hypothesis that parallel connected systems are responsible for the recognition of affective and semantic aspects of information. The model can be used to make predictions regarding the time course of emotional information processing on the lexical decision and valence identification tasks.

Architecture
The model is shown in Fig. 1. In the figure, small circles represent individual nodes. Large ellipses represent clusters of nodes that perform the same conceptual function. In the model, nine orthographic nodes, representing perceptual characteristics of stimuli, are fully connected to, and feed activation forward to, nine nodes representing the semantic content of stimuli and two nodes representing the affective content of stimuli, in parallel. Feedback occurs between the nodes representing affective and semantic features. The semantic nodes roughly correspond to hippocampal system processing. The affective nodes correspond to amygdala system processing. The feedback between them captures LeDoux’s notion of feedback between these brain areas, as well as Ingram’s (1984) notion of feedback between mental representations of affective and semantic features. To allow the network to make an analog of a lexical decision or valence identification, both semantic and affective feature nodes feed activation forward to twelve nodes representing the network’s outputs (nine semantic concepts, three valences). The outputs thus represent products of decision processes assumed to occur in the frontal lobes.

Fig. 1. A neural network model for investigating affective and semantic information processing.
Inhibitory connections from the outputs back to the valence units were incorporated to approximate Davidson’s (1998) idea that frontal activity inhibits amygdala firing. These connections were not used in the majority of simulations, for reasons discussed at the end of this chapter. Which task a person is performing, the lexical decision or valence identification task, is assumed to affect only her eventual decision, and not early attentional processes. This intuition is captured in the network by allowing task units, representing the context in which the stimulus is to be interpreted (either as a lexical decision or valence identification), to feed activation to the output nodes. These nodes are represented on the right side of Fig. 1 to imply that they are an internal cognitive phenomenon, rather than perceptual inputs.

Conceptually, the recognition of a stimulus might proceed as follows. At the beginning of a simulation, activations of the input nodes are set to predetermined values representing a stimulus, subject to some perceptual noise. As activation from the input units feeds to the valence and semantic units, a pattern of activation in the semantic units would be formed, corresponding to some non-emotional features the network has learned (e.g. if the stimulus was ‘birthday’, the notion of the date on which a person is born might be retrieved). At the same time, the positive and negative valence units would take on activations suggesting that the stimulus is either positive or negative. Feedback between the semantic and valence units might lead the network to change the pattern of activations in the semantic units, suggesting that the network has associated a different set of non-emotional features with the stimulus. Similarly, the feedback could lead to a different pattern of activations occurring in the valence units, suggesting, for example, that the network originally identified the stimulus as positive, but now identifies it as negative. During this process the output units become active in proportion to activation of the semantic and valence nodes, with additional contextual information from the task units. The fit of these activations to each stored pattern is evaluated simultaneously. When an overwhelming proportion of evidence for one output pattern is accumulated, the network can be said to have ‘recognized’ the stimulus as that pattern. This event would correspond to a person having recognized the non-emotional features of the stimulus, and having assigned it an affective valence. By allowing activation to continue within the model, associations occurring after a reaction time can also be observed. The model thereby captures the time course of attention to a presented stimulus. To the extent that the model is valid, it provides information about how relationships between emotional and non-emotional aspects of information could influence attention, and can provide insight into what aspects of an emotional stimulus a person might pay attention to, both before and after the stimulus is recognized.
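The time course just described can be illustrated with a small Python sketch. It is not the chapter's simulation code: the network is shrunk to three input, three semantic, and two valence nodes, the weights are set by hand, output and task units are omitted, and the values of `tau`, `p`, `lyapunov`, the noise range, and the clipping bound are all assumptions chosen for illustration. The update form is the leaky (cascaded) integration that the chapter specifies later under network activation; applying the feedforward step and then the feedback step within each epoch is also an assumption.

```python
import random

def matvec(w, x):
    """Multiply a weight matrix (rows = receiving nodes) by an activation vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]

def leaky(old, drive, rate):
    """Cascaded update: keep (1 - rate) of the old activation, add rate * drive."""
    return [(1.0 - rate) * o + rate * d for o, d in zip(old, drive)]

def clip(xs, bound=2.0):
    """Piecewise-linear limit on activations (the bound value is an assumption)."""
    return [max(-bound, min(bound, x)) for x in xs]

# Hand-set toy weights (assumed). Stimulus 0 is positive, 1 negative, 2 neutral.
IN_SEM = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
IN_VAL = [[1.00, 0.41, 0.65],   # drive to the positive valence unit
          [0.31, 0.95, 0.35]]   # drive to the negative valence unit
SEM_VAL = IN_VAL                                # semantic -> valence feedback
VAL_SEM = [list(col) for col in zip(*IN_VAL)]   # valence -> semantic feedback

def run(stimulus, present=10, total=60, tau=0.2, p=0.1,
        lyapunov=1.0, noise=0.01, seed=0):
    """Return a list of (semantic, valence) activations, one pair per epoch."""
    rng = random.Random(seed)
    sem, val = [0.0] * 3, [0.0] * 2
    history = []
    for epoch in range(total):
        # The stimulus is shown for `present` epochs; afterwards only noise enters.
        shown = stimulus if epoch < present else [0.0] * len(stimulus)
        noisy = [x + rng.uniform(-noise, noise) for x in shown]
        # Feedforward step from the noisy input.
        sem_ff = leaky(sem, matvec(IN_SEM, noisy), tau)
        val_ff = leaky(val, matvec(IN_VAL, noisy), tau)
        # Feedback step between the semantic and valence units.
        sem_fb = [lyapunov * x for x in matvec(VAL_SEM, val_ff)]
        val_fb = [lyapunov * x for x in matvec(SEM_VAL, sem_ff)]
        sem = clip(leaky(sem_ff, sem_fb, p))
        val = clip(leaky(val_ff, val_fb, p))
        history.append((sem, val))
    return history

history = run([0.0, 1.0, 0.0])   # present the negative stimulus
for epoch in (0, 9, 30, 59):
    sem, val = history[epoch]
    print(epoch, [round(v, 3) for v in val])
```

With these toy settings the negative valence unit stays above the positive unit while the stimulus is present, and both fade once only noise and feedback remain. Raising `lyapunov` or `p` strengthens the affective-semantic loop and slows that fade, which is the kind of manipulation the chapter uses the model to explore.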
Representation of stimuli
Representation of non-affective features was kept as simple as possible. Orthographic, semantic, and output features were bipolar and normalized, such that one node was activated with strength 1 and all others were activated with strength -2/(vectorlength - 2). In the task nodes, the lexical decision task was represented as activations 1, -1, and valence identification as activations -1, 1. Representation of affective features was based on conventional assumptions that positive and negative valences are either opposite or orthogonal. To represent positivity and negativity orthogonally, two nodes could be used. High activation of one node represents positive information, while high activation of the other node represents negative information. Low activation of both nodes represents neutral information. The validity of using a near-orthogonal representation of positivity and negativity can be supported empirically. Williams et al. (1998) had 600 undergraduates rate the positivity and negativity of 30 words normed for emotionality. On a scale of 1 (not emotional) to 5 (very emotional), they found that positive words were generally rated as somewhat positive and not negative (mean positivity, mean negativity = 3.62, 1.13). Negative words were rated as less positive and more negative (1.49, 3.45). Neutral words were rated as slightly positive, but lacking in negativity (2.36, 1.28). Given that these values were not perfectly orthogonal, activation of the valence units in the network was made proportional to Williams et al.’s (1998) means. Specifically, ideal activations of the (positive, negative) valence units for each valence were: positive: 1, 0.31; negative: 0.41, 0.95; neutral: 0.65, 0.35.

Training: simulation of normal and depressed experiences in the model
Following the idea that non-depressed individuals are exposed to a variety of positive, negative, and neutral information, an analog of normal experience was induced in the network by training it on equal numbers of positive, negative, and neutral exemplars, using a Hebb (1949) learning rule. Practically, this was done by multiplying input vectors by the transpose of desired output vectors to obtain a weight matrix for each set of connections.
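The two computations in this passage can be checked with a short Python sketch. Two things here are assumptions rather than statements from the chapter: that "proportional" means dividing every rating by the largest mean (3.62), which does reproduce the listed activations, and the toy one-hot input patterns, which stand in for the chapter's nine-node bipolar codes. The function names are hypothetical.

```python
# (mean positivity, mean negativity) ratings reported from Williams et al. (1998)
WILLIAMS_MEANS = {
    "positive": (3.62, 1.13),
    "negative": (1.49, 3.45),
    "neutral":  (2.36, 1.28),
}

def valence_targets(means):
    """Scale every rating by the largest mean so the peak activation is 1."""
    peak = max(max(pair) for pair in means.values())
    return {name: tuple(round(x / peak, 2) for x in pair)
            for name, pair in means.items()}

def hebb_weights(inputs, targets):
    """Hebbian weights: sum over training pairs of target * input^T.

    Rows index receiving (target) nodes, columns index sending (input) nodes.
    """
    n_out, n_in = len(targets[0]), len(inputs[0])
    w = [[0.0] * n_in for _ in range(n_out)]
    for x, y in zip(inputs, targets):
        for i in range(n_out):
            for j in range(n_in):
                w[i][j] += y[i] * x[j]
    return w

def recall(w, x):
    """Activation sent to each receiving node (row vector x times W^T)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

targets = valence_targets(WILLIAMS_MEANS)

# Toy one-hot inputs (assumed): one positive, one negative, one neutral exemplar.
toy_inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_targets = [targets["positive"], targets["negative"], targets["neutral"]]
weights = hebb_weights(toy_inputs, toy_targets)

print(targets)
print(recall(weights, toy_inputs[1]))   # valence drive for the negative exemplar
```

Because the toy inputs are orthogonal, presenting one of them recalls exactly the valence pattern it was paired with. With overlapping input codes, a Hebbian matrix would return a blend of the stored patterns.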
Many theorists suggest that the induction of depression involves one or a few pervasive negative life events or loss experiences (e.g. Beck, 1974; Paykel, 1979) that are continuously thought about.
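The chapter goes on to operationalize this induction as prolonged Hebbian training on a single negative stimulus, with a slight decay on earlier learning to bound weight growth. A minimal Python sketch of that idea follows. The learning rate, decay value, three-node semantic pattern, and function name are assumptions for illustration; only the structure (repeatedly add the valence-semantic product, forget a little each time) comes from the text.

```python
def overtrain(weights, valence, semantic, steps, rate=0.1, decay=0.01):
    """Hebbian overtraining with forgetting on valence-semantic links.

    Each step shrinks the old weight by `decay`, then adds the product of the
    stimulus's valence and semantic activations scaled by `rate`.
    Rows index valence units, columns index semantic units.
    """
    w = [row[:] for row in weights]   # leave the caller's matrix untouched
    for _ in range(steps):
        for i, v in enumerate(valence):
            for j, s in enumerate(semantic):
                w[i][j] = (1.0 - decay) * w[i][j] + rate * v * s
    return w

negative_valence = [0.41, 0.95]        # (positive unit, negative unit)
semantic_pattern = [1.0, -0.5, -0.5]   # toy semantic code (assumed)
start = [[0.0] * 3 for _ in range(2)]

brief = overtrain(start, negative_valence, semantic_pattern, steps=10)
prolonged = overtrain(start, negative_valence, semantic_pattern, steps=2000)

# With decay, each weight approaches rate / decay * (valence * semantic).
ceiling = (0.1 / 0.01) * 0.95 * 1.0
print(brief[1][0], prolonged[1][0], ceiling)
```

The link between the negative valence unit and the active semantic node grows with continued training but levels off near `rate / decay` times the activation product, so the decay term is what keeps prolonged rumination from producing unbounded weights in this sketch.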
This induction process was operationalized in the neural network model by training the network on a single negative stimulus for a prolonged period after it had been trained on equal numbers of positive, negative, and neutral stimuli. Specifically, products of the valence and semantic features for a single negative stimulus were repeatedly added to connections between the valence and semantic units.¹ This technique implemented a Hebb rule. To bound the increase in weights, a slight decay factor on previously learned information (a forgetting rule) was imposed. A number of different algorithms for allowing the network to ‘learn’ initial exemplars, as well as negative information, have been explored. In initial descriptions of the network (Siegle, 1996; Siegle and Ingram, 1997a,b), a ‘back-propagation’ learning algorithm was used instead of a Hebb rule. Though the resulting simulations were qualitatively similar, differences between the Hebb and back-propagation trained networks are discussed in the Appendix.

¹ Interestingly, overtraining connections from the input units to the semantic and valence units, along with the affective-semantic loop, had the same effects on the network’s performance as overtraining only the affective-semantic loop. This observation could suggest that individuals who experience negative events (i.e. have excess negative inputs) become depressed in the same way as individuals who think a great deal about negative things (only activating the affective-semantic loop). Empirical research is needed to address this question.

Network activation during tasks
The network is cascaded, meaning that each node’s activation is a function of its input over time. Unless otherwise noted, all multiplication described below is matrix multiplication. Activation of a layer is represented by that layer’s name. Connections are represented by the names of the layers they join. For example, ‘InputSemantic’ represents connections from the input to the semantic nodes. Before and after stimulus presentation, only noise entered the system. Activation of nodes represents the average firing rate of a population of neurons at a given time t. Activation of the semantic and valence units occurred according to the rules:
$$\mathrm{Semantic}_t = (1 - \tau)\,\mathrm{Semantic}_{t-1} + \tau \times \big((\mathrm{Input} + \mathrm{noise}) \times \mathrm{InputSemantic}^T\big)$$

$$\mathrm{Valence}_t = (1 - \tau)\,\mathrm{Valence}_{t-1} + \tau \times \big((\mathrm{Input} + \mathrm{noise}) \times \mathrm{InputValence}^T\big)$$

where $\tau$ is the diffusion rate for inputs. Noise was bipolar and uniformly distributed. Before and after the presentation of a stimulus, the Input vector was null. Stimuli were presented for 10 epochs, after which the network operated entirely on noise input plus feedback between the semantic and affective feature units for 250 epochs, representing the brief (150 ms) presentation time for empirical stimuli in Siegle’s (1998a) tasks. Feedback between semantic and valence nodes was operationalized according to the differential equations:

$$\mathrm{Semantic}_t = (1 - p)\,\mathrm{Semantic}_{t-1} + p \times \mathrm{lyapunov} \times \big(\mathrm{Valence} \times \mathrm{ValenceSemantic}^T\big)$$

$$\mathrm{Valence}_t = (1 - p)\,\mathrm{Valence}_{t-1} + p \times \mathrm{lyapunov} \times \big(\mathrm{Semantic} \times \mathrm{SemanticValence}^T\big)$$

where $p$ governed the amount of feedback between the structures. The parameter lyapunov governed how quickly the network settled on a set of activations. Values below one act as a decay factor, allowing activations to approach zero. Values above one tend to preserve and increase activation, creating a positive feedback loop between the affective and semantic structures. Activation of the output units was based on the activation of all units feeding to them as:

$$\mathrm{Output}_t = \mathrm{Semantic} \times \mathrm{SemanticOutput}^T + \mathrm{Valence} \times \mathrm{ValenceOutput}^T + \mathrm{TaskPriority} \times \big(\mathrm{Task} \times \mathrm{TaskOutput}^T\big)$$

where TaskPriority governed how much the context could affect the task. Nonlinearity was introduced by limiting activations of nodes to ±2.² Soft competition was introduced for output nodes (semantic and valence nodes, in early simulations) by subtracting the maximum activation of any other node in the output layer from each node’s activation. Matches were determined in a manner analogous to that used by Cohen et al. (1990) to represent word and color naming in a connectionist model of the Stroop task. Following Ratcliff’s (1978) notion that semantic identification is a diffusion process, they suggest that a semantic identification occurs when the activation of the mental representation of a stimulus reaches a threshold. Counters were therefore defined to represent the accumulated evidence for each possible item the network might identify. The counters added evidence for a given stimulus proportional to the difference between the fit of the outputs to the expected output for the stimulus and the maximum fit to any other trained output, subject to gaussian noise (magnitude 0 for Hebb-trained simulations). Fit was computed as the cosine of the output vector with an expected output vector. When any counter exceeded a threshold (arbitrarily set to 2.5), the network was said to have made an identification. That epoch was counted as the network’s reaction time.

² This technique was used rather than a sigmoid activation function because even small deviations from zero, using a sigmoid, tended to magnify on feedback as a function of the squashing function rather than other properties of the network. Using a piecewise linear function allowed all observed biasing effects to be based on the architecture and training of the network.
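The update rules and evidence counters described above can be sketched in a few lines of Python. This is only an illustrative reading, not the authors' Matlab code: the function names, the dictionary keys, the two-unit identity weights, the order in which the input and feedback updates are applied, and the use of the semantic state as a stand-in for the output layer are all assumptions made for the example. Noise and soft competition are omitted so the run is deterministic.

```python
import math

TAU, P, LYAPUNOV = 0.1, 0.02, 0.2   # values listed in the parameter table
THRESHOLD = 2.5                      # identification threshold quoted in the text

def times_transpose(vec, weights):
    """Row vector times W^T, where weights[j][i] links source unit i to target unit j."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def clip(vec, lo=-2.0, hi=2.0):
    """Piecewise-linear nonlinearity: bound every activation to [lo, hi]."""
    return [max(lo, min(hi, v)) for v in vec]

def cascade(state, drive, rate):
    """state_t = (1 - rate) * state_{t-1} + rate * drive, followed by clipping."""
    return clip([(1 - rate) * s + rate * d for s, d in zip(state, drive)])

def cosine(a, b):
    """Cosine between two vectors; defined as 0.0 when either vector is null."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def accumulate(counters, output, targets):
    """Each counter gains its own fit minus the best fit of any other trained item."""
    fits = [cosine(output, t) for t in targets]
    return [c + fits[k] - max(f for j, f in enumerate(fits) if j != k)
            for k, c in enumerate(counters)]

def run_trial(stimulus, w, targets, stim_epochs=10, total_epochs=250):
    """Return (reaction time in epochs, index of identified item), or (None, None)."""
    semantic = [0.0] * len(w["input_semantic"])
    valence = [0.0] * len(w["input_valence"])
    counters = [0.0] * len(targets)
    for epoch in range(1, total_epochs + 1):
        inp = stimulus if epoch <= stim_epochs else [0.0] * len(stimulus)
        # Input-driven update (diffusion rate tau).
        semantic = cascade(semantic, times_transpose(inp, w["input_semantic"]), TAU)
        valence = cascade(valence, times_transpose(inp, w["input_valence"]), TAU)
        # Affective-semantic feedback (rate p); both drives use the same prior state.
        sem_drive = [LYAPUNOV * d for d in times_transpose(valence, w["valence_semantic"])]
        val_drive = [LYAPUNOV * d for d in times_transpose(semantic, w["semantic_valence"])]
        semantic, valence = cascade(semantic, sem_drive, P), cascade(valence, val_drive, P)
        # Toy readout (assumption): the semantic state stands in for the output layer.
        counters = accumulate(counters, semantic, targets)
        if max(counters) > THRESHOLD:
            return epoch, counters.index(max(counters))
    return None, None

identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical two-unit weights
weights = {"input_semantic": identity, "input_valence": identity,
           "valence_semantic": identity, "semantic_valence": identity}
reaction_time, choice = run_trial([1.0, 0.0], weights, targets=identity)
print(reaction_time, choice)
```

In this toy setting the fit to the presented item is 1 and the fit to the other item is 0 on every epoch, so the first counter grows by about one unit per epoch and crosses the 2.5 threshold on the third epoch. With trained weights and competing representations the per-epoch evidence would be smaller, which is how slower simulated reaction times arise.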
Using valence ratings to empirically estimate decay
To check that the processes used to represent valence in the network, as well as to induce an analog of depression, were effective, an analog of Williams et al.’s (1998) valence-rating procedure can be used, in which dysphoric and non-dysphoric individuals rated the positivity and negativity of a normed word set. The network was presented with each stimulus on which it was trained, for 200 epochs (long enough to reach asymptotic activations in the valence units). The resulting activation values for valence units representing positivity and negativity were recorded as an analog of ratings of how positive and how negative the stimuli were. These values were scaled from 1 to 3.62 (mean valence rating for positivity). The median valence ratings for stimuli of each valence, for 10 trials, are shown in Table 1. For each valence, ratings for all three stimuli were within 1/10th of a point. As expected, positive stimuli were primarily positive. Negative stimuli were somewhat positive. Neutral stimuli were more positive than negative stimuli.

Whether the overtraining procedure behaved as expected can also be tested using this method. Williams et al. (1998) found that more dysphoric college students generally rated stimuli progressively more negatively and less positively. Inspection showed that the rate of forgetting in the Hebb network governs the magnitude of resulting weights after overlearning. With too little forgetting, all ratings go up after overtraining (i.e. too much positivity for all stimuli). With too much forgetting, previously learned stimuli are no longer recognized after overlearning. The desired effect was obtained for a minimum forgetting rate of 0.89, which was therefore adopted for subsequent simulations. Table 1 also presents the simulated valence ratings for the network, overtrained five times, with a forgetting rate of 0.89. In each case, ratings are more negative for the overtrained network. While ratings for the positive information are lower on positivity than in the original network, ratings for neutral stimuli are similar, and positivity ratings for negative stimuli are higher in the overtrained network.
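To see why the forgetting rate controls the size of the weights after overlearning, the Hebb rule with decay can be sketched as follows. The exact update form is an assumption: it reads the "preservation of old learning" parameter (0.89) as a multiplier on the old weight and the "rate at which new training exemplars are assimilated" (1.0) as a multiplier on the valence-semantic product. The unit patterns and function names are hypothetical.

```python
def hebb_update(weights, valence, semantic, forgetting=0.89, rate=1.0):
    """Assumed rule: W <- forgetting * W + rate * outer(valence, semantic)."""
    return [[forgetting * w + rate * v * s for w, s in zip(row, semantic)]
            for row, v in zip(weights, valence)]

def overtrain(weights, valence, semantic, epochs, **kwargs):
    """Apply the Hebb update repeatedly to one stimulus (the overtraining step)."""
    for _ in range(epochs):
        weights = hebb_update(weights, valence, semantic, **kwargs)
    return weights

negative_valence = [0.0, 1.0]      # (positive unit, negative unit): hypothetical coding
loss_semantics = [1.0, 0.0, 0.0]   # hypothetical semantic pattern for the loss stimulus
start = [[0.0] * 3 for _ in range(2)]

five = overtrain(start, negative_valence, loss_semantics, epochs=5)
many = overtrain(start, negative_valence, loss_semantics, epochs=500)
print(round(five[1][0], 3), round(many[1][0], 3))
```

Under this assumed rule the trained connection follows a geometric series, so it is bounded by rate / (1 - forgetting), about 9.09 for a forgetting rate of 0.89, instead of growing without limit. A multiplier closer to 1 raises that ceiling (all ratings drift upward), while a smaller multiplier erases earlier learning faster, which matches the trade-off described in the text.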
Implementation
The Hebb-trained network was implemented in Matlab on an Intel Pentium II computer. The backpropagation network was implemented in the PlaNet modeling environment (Miyata, 1991) on a Sun SPARC 1 computer. The code used to implement all networks is available from the author upon request. The parameters used for the Hebb-trained network simulations included here are shown in Table 2.
TABLE 1 Parameters used in the Hebb-trained neural network simulations

Parameter                                                          Value
Network construction
  Number of input nodes                                            9
  Number of semantic nodes                                         9
  Number of valence nodes                                          2
Activation parameters
  τ (input diffusion rate)                                         0.1
  p (affective-semantic loop diffusion rate)                       0.02
  lyapunov (Lyapunov exponent)                                     0.2
  TaskPriority                                                     0.5
  Maximum network activation                                       2.0
  Minimum network activation                                       -2.0
  Noise magnitude                                                  0.05
Task parameters
  Stimulus duration                                                10 epochs
  Total measured duration                                          250 epochs
  Accumulation noise                                               0.0
  Positive determination accumulation threshold                    1.0
  Negative determination accumulation threshold                    1.0
Learning parameters
  Additional epochs of training on negative stimuli                5
  Rate at which new training exemplars are assimilated             1.0
  Preservation of old learning during new learning
    (i.e., the forgetting rate)                                    0.89
Training set
  Number of stimuli                                                9
  Number of negative stimuli representing depressogenic loss       1

TABLE 2 Median valence ratings for each stimulus from 10 simulated rating sessions

                                Nonovertrained             Overtrained (5 epochs)
Stimulus                        Positivity   Negativity    Positivity   Negativity
Positive                        3.6          1.4           2.5          1.9
Negative                        2.2          2.3           1.9          3.1
Negative personally relevant    2.2          2.3           5.0          5.0
Neutral                         2.7          1.1           2.0          1.9

Use of the network to predict the time course of attention to emotional information
The validity of the current model can be evaluated by examining how well it captures behavioral and physiological aspects of depressed and nondepressed individuals’ responses to the affective valence identification and lexical decision tasks. Three aspects of the tasks were modeled using the network: reaction times, response biases, and pupil dilation. The following sections present information gained from simulating each of these phenomena. For each aspect of behavior, the network’s behaviors are first examined. Results of empirical experiments derived from the network’s predictions are then described. Finally, implications of confirmed predictions are discussed.

Modeling reaction times
Attention to the emotional and non-emotional aspects of information can be examined separately by observing how quickly individuals respond to questions regarding either the emotional or nonemotional aspects of a stimulus. Reaction times have long been assumed to reflect the amount of attention an individual pays to aspects of information (Massaro, 1988). Longer reaction times are associated with paying less attention to a task, potentially because an individual is attending to aspects of information not related to the task. Simulated reaction times from simulations by Siegle and Ingram (1997a) are presented in Fig. 2. Because the overtrained network tends to associate incoming information with the negative stimulus on which it was overtrained (Siegle, 1996; Siegle and Ingram, 1997a), the overtrained network is shown in Fig. 2 to recognize negative stimuli as negative quickly on the simulated valence identification task. In contrast, it is slower to recognize positive information as positive, because of competing activation from the representation of negativity. The network's behavior suggests that depressed individuals will be slow to report that positive words are positive, but quick to report that negative words are negative, on a valence identification task. If neutral decisions are made in the same way as positive decisions, these would also be slowed (since depressed people would think of negativity rather than neutrality). Alternately, Siegle and Ingram (1997a) suggest that neutral decisions may be exclusionary, made when neither a positive nor a negative decision is reached after a variable temporal threshold. If depressed and nondepressed individuals have the same threshold for neutral decisions, neutral decision making would not be biased.

Fig. 2. Reaction time predictions from simulated affective and lexical decision tasks, from Siegle and Ingram (1997a). Series: non-overtrained and overtrained networks on the lexical decision and valence identification tasks. Stimulus categories: positive, negative, neutral, negative personally relevant. Simulation details: Backpropagation trained network.

On a simulated lexical decision task, the network is slowest to make associations with negative stimuli on which it is not overtrained because the representation of negative information on which it was overtrained competes for activation. It is fastest at making associations with the negative stimuli on which it is overtrained. The network's behavior thus suggests that depressed individuals will be slow to say that negative words not specifically associated with their particular depression are words, because they will be reminded so strongly of personally relevant information that they will not immediately respond to the task. In contrast, depressed individuals are expected to be especially fast at responding to negative personally relevant words on a lexical decision task.

Human reaction time data largely support the network's predictions. In a meta-analysis of affective lexical decision task studies, Siegle (1996; Siegle et al., 1999) found that depressed people generally appeared to react more slowly to negative words than to positive or neutral words, in comparison to non-depressed people. Results from the network simulations parallel Siegle et al.'s (1999) study, in which the difference (> 0) between negative and positive reaction times was larger for dysphoric undergraduates than non-dysphoric undergraduates on an affective lexical decision task. The same dysphoric undergraduates were slower to respond to positive than negative words on an affective valence identification task. Siegle (1999) found similar results with clinically depressed individuals. Specifically, depressed individuals were slowest to name the affective valence of positive words, and were no faster to say that negative words were words than positive words.
For comparison with model predictions, Siegle et al.'s (1999) and Siegle's (1999) data are shown in Fig. 3. Thus, similar patterns of reaction times were observed in the network and in people. In the network, biased reaction times were due to association of affective and semantic aspects of stimuli with personally relevant negative information. To the extent that mechanisms behind delays in the network match people, it is suggested that depressed individuals could have personally relevant negative thoughts in response to environmental stimuli. Biases observed in the network happen as a function of feedback between structures responsible for identifying affective and semantic features of information. If mechanisms responsible for information processing biases in the network are similar to those in humans, the amount of feedback between structures in the amygdala and hippocampal systems could moderate such biases. Depressed individuals with biases similar to those exhibited by the network are expected to have particular difficulty processing positive information.

Fig. 3. Reaction times from Siegle et al.’s (1999) and Siegle’s (1999) affective lexical decision and valence identification tasks. Panels: undergraduate data from Siegle et al. (1998a), n dysphoric = 30, n non-dysphoric = 46; adult data from Siegle (1998a), n depressed = 24, n non-depressed = 24. Series: lexical decision and valence identification for each group. Stimulus categories: positive, negative, neutral.
Effects of rumination on reaction times
The previous analysis intuitively suggests that greater information processing biases should be associated with more feedback between structures
responsible for representing affective and semantic aspects of information. Siegle and Ingram (1997a) suggest that feedback between the affective and semantic nodes could be considered an analog for ‘rumination’, common in depressed individuals. They show that information processing biases on the valence identification task tend to increase when the number of feedback cycles is increased throughout the network’s training. This type of rumination may represent a coping style in which individuals think excessively about emotional information throughout their lives. To test the prediction that increased feedback, representing rumination, is associated with increasing information processing biases, Siegle (1999) gave depressed and non-depressed individuals a measure of ruminative coping (Nolen-Hoeksema and Morrow’s (1991) Response Styles Questionnaire; RSQ) along with the valence identification task. The RSQ is a self-report measure that asks test-takers to endorse thoughts and behaviors they engage in while in a depressed mood. It contains a rumination subscale, composed of questions that ask about how often individuals think of aspects of their depression, e.g. ‘think “I am ruining everything”’. Scores on the rumination scale of the RSQ were compared to depressive information processing biases on the valence identification task, operationalized as the difference in reaction times to positive and negative stimuli. Nearly all high rumination scores on the RSQ were associated with depression. Because the network’s predictions affected both depressed and non-depressed individuals, a hierarchical regression was performed on valence identification biases in which depression status was entered on the first step and an individual’s score on the rumination scale of the RSQ was entered on the second step. Depression accounted for 20.3% of the variation in valence identification biases. Rumination was positively linked to depressive information processing biases, accounting for an additional 7.6% of the variation in biases, which was statistically significant. Results examining relationships between information processing biases and rumination were consistent with predictions from the model, suggesting that information processing biases could be a function of having both a ruminative coping style (represented by increased feedback between the hippocampal and amygdala systems) as a diathesis, and overtraining on some negative event as a stressor that predisposes ruminating individuals to turn things negative during rumination. Ruminating individuals are specifically hypothesized to overlearn their initial perceptions of negative information, as they think about the affect associated with them repeatedly.
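As a reading aid, the two regression steps reported above combine by simple addition of explained variance. The labels "step 1" and "step 2" are ours, and only the two percentages come from the text:

```latex
\begin{align*}
R^2_{\text{step 1}} &= 0.203 && \text{(depression status only)}\\
\Delta R^2 &= 0.076 && \text{(adding the RSQ rumination score)}\\
R^2_{\text{step 2}} &= R^2_{\text{step 1}} + \Delta R^2 = 0.203 + 0.076 = 0.279
\end{align*}
```

On this reading, depression status and rumination together account for roughly 28% of the variation in valence identification biases, with rumination contributing a little over a quarter of that explained share.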
Personally relevant information
The network responded very differently to negative information on which it was overtrained and other negative information. On the valence identification task, biases were especially strong for negative information on which it was overtrained. On the lexical decision task, the network responded especially quickly to negative stimuli on which it was overtrained, but especially slowly to all other negative stimuli. Based on the network’s performance, Siegle (1996) suggests that confounding of such personally relevant and non-relevant information in experiments with depressed individuals may contribute to the wide variability in effect sizes obtained on affective lexical decision tasks with depressed people. To test the hypothesis that depressed individuals would respond differently to personally relevant and non-relevant information on the tasks, Siegle (1999) asked individuals to generate stimuli they considered representative of what they thought about when they were depressed. These stimuli were included along with normed stimuli on the valence identification and lexical decision tasks. In contrast to the network’s predictions, results suggested that depressed individuals responded especially slowly to personally relevant negative words on the valence identification task, in comparison to other negative words and in comparison to positive words. Depressed individuals did not appear to respond particularly quickly to personally relevant negative words on the lexical decision task.³ There are a number of possible implications of the human data. Potentially, personal relevance is not representative of overtraining, in the sense in which it is implemented in the model. Another possible explanation is that the model is missing components relevant to explaining the range of depressive information processing biases. In support of this idea, many depressed individuals commented on their long reaction times, saying they had attended to personally relevant negative words to the extent that they had been unable to respond to the task. Two depressed individuals broke down in tears whenever personally relevant negative words were displayed.
These reports suggest that slow reaction times for personally relevant negative information could be due to motor inhibition, present when depressed people think about negative information. Potentially, when depressed people think hard about negative information, their entire attention is drawn to the stimulus, and away from their motor response. This type of motor inhibition was not represented in the neural network, and was thus not accounted for in initial predictions. To ‘fix’ the network, aspects of the motor system could be incorporated. This technique would add considerable complexity to the network, and the knowledge to be gained by such an endeavor is questionable. Rather, it may be more useful to examine whether other behavioral and physiological indices, not subject to motor slowing, do not mirror these delays.

³ As a testament to the value of using the network for hypothesis generation, Siegle (1999) did find interactions with personal relevance for reaction times, valence-confusion rates, and aspects of pupil dilation. While the observed effects were not always predicted by the network, including this variable based on simulations was especially productive in empirically investigating depressive information processing.
Modeling response biases on the valence identification task
Reaction times yield information regarding how quickly individuals recognize information, but not about what they have recognized, e.g. whether they deemed positive information to be positive when they reacted to it. To understand whether information processing biases could lead to valence confusions (e.g. saying that a positive word is negative) a number of simulations were performed. Valence confusion rates can be simulated by examining the frequency with which patterns of activation in the output nodes, at simulated reaction times, are closer to erroneous patterns of activation than to the expected patterns. Using this metric, the predicted confusion rates for various levels of overtraining, based on Siegle and Ingram’s (1997b) model, are presented in Fig. 4. As shown in the figure, the non-overtrained model rarely makes valence identification errors. As a consequence of the model’s tendency to associate incoming information with the stimulus on which it was overtrained, the overtrained model displays a tendency to label neutral and positive stimuli as negative. The frequency of valence confusions increases as the model is overtrained. The model never made a valence confusion for negative words. These predictions can be summarized by suggesting that depressed individuals will be biased to label all stimuli as negative.
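One way to read the confusion-rate metric described above is sketched below. It assumes cosine similarity as the closeness measure (the fit measure used for the counters) and counts a trial as a confusion when the output at the simulated reaction time is closest to a pattern of a different valence. The prototype patterns and trial outputs are made-up illustrative values, not results from the model.

```python
import math

def cosine(a, b):
    """Cosine between two vectors; defined as 0.0 when either vector is null."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def confusion_rate(trials, prototypes):
    """Fraction of trials whose output is closest to a pattern other than the expected one.

    trials: list of (output_at_reaction_time, expected_label)
    prototypes: dict mapping each valence label to its expected output pattern
    """
    errors = 0
    for output, expected in trials:
        closest = max(prototypes, key=lambda label: cosine(output, prototypes[label]))
        errors += closest != expected
    return errors / len(trials)

# Hypothetical expected output patterns, one per valence.
prototypes = {"positive": [1.0, 0.0, 0.0],
              "negative": [0.0, 1.0, 0.0],
              "neutral":  [0.0, 0.0, 1.0]}

# Hypothetical outputs at reaction time, paired with the valence actually presented.
trials = [([0.9, 0.2, 0.0], "positive"),
          ([0.3, 0.8, 0.1], "positive"),   # drifted toward the negative pattern
          ([0.1, 0.9, 0.0], "negative"),
          ([0.2, 0.1, 0.7], "neutral")]

rate = confusion_rate(trials, prototypes)
print(rate)
```

Here one of the four trials, a positive stimulus whose output ended up nearest the negative pattern, counts as a confusion, giving a rate of 0.25. Computing the same quantity separately for each presented valence would give per-valence curves of the kind summarized for Fig. 4.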
Fig. 4. Valence confusion rate predictions from the simulated valence identification task, from Siegle and Ingram (1997b). x-axis: epochs of overtraining on negative information. Series: positive and neutral stimuli. Simulation details: Backpropagation trained network, mean of 25 simulations.
To examine whether depressed individuals showed this same bias to name words as negative, Siegle (1999) calculated response biases for each valence on the valence identification task. Bias terms for negative words were over twice as high as those for positive or neutral words, for depressed individuals, whereas bias terms were roughly equal for non-depressed individuals. This result suggests that, as predicted by the network model, depressed people are biased to say that many stimuli are negative on a valence-identification task, whereas non-depressed individuals are not. To the extent that these data generalize to situations outside the laboratory, it is suggested that depressed individuals will rarely have difficulty in perceiving negative information. In contrast, they may have difficulty processing non-negative information. Mechanisms behind the model’s similar performance suggest that depressed individuals may tend to see even positive information as negative, because it becomes associated with personally relevant negative information. This type of bias may help to maintain depressive affect, as few environmental stimuli would appear positive. Siegle and Ingram (1997b) have used the model’s performance to explain the occurrence of individuals who are “too depressed” to complete information processing tasks, or who make excessive errors on these tasks, suggesting that they interpret nearly all incoming information as negative.
Modeling pupil dilation
Using the neural network model to produce analogs of reaction times and valence response biases shows whether a snapshot of the network’s course of attention can reflect a single moment of human attention. The network provides a great deal more information regarding the nature of attention to emotional information than the moment at which an individual reacts to information. To capture this additional information, it is useful to examine the network’s activation during the entire course of attention. To understand the time course of attention in the network, activation of the valence units (amygdala system functions), semantic units (hippocampal system functions) and output accumulators (frontal functions) can be compared for positive, negative, or neutral information. The sum of positive activations throughout these layers (the network’s energy) is thus a rough estimate of total cognitive load over time. This sum was therefore used as an analog of human pupil dilation.⁴

To make predictions for pupil dilation, it is useful to examine the activity of the network in response to positive, negative, and neutral information on each task after a moderate amount of overtraining. Figure 5 presents the Hebb-trained network’s median response from five trials, in response to positive and negative stimuli on the valence identification task, over time. As shown in the figures, before overtraining (left side), the network generally responds to the presentation of positive and negative stimuli by activating its representation of semantic aspects of the incoming stimulus and its valence. This activation falls off after a period. This behavior is seen by the one peak in the top left panel of each sub-figure. Similarly, for positive or negative stimuli, there is a peak and decay for the appropriate valence unit (top right panel). Activation of the appropriate output units leads to a sustained match for the correct output (bottom left panel). Consequently, there is a peak and dip in the expected pupil dilation waveform (bottom right panel).

When the network is overtrained on negative information (right side), its activation of the appropriate valence and semantic pattern is initially kindled, but after a short time the network’s representation of the negative information on which it has been overtrained becomes more highly activated. A similar reversal occurs for neutral words. For personally relevant negative words, no reversal occurs because semantic nodes representing personally relevant information are initially activated. The reversal happens most quickly for negative words on which the network has not been overtrained, followed by neutral words, and finally by positive words. Similarly, the negative valence unit becomes activated late in the course of attention, even for positive stimuli. As the task nodes do not affect semantic and valence unit activations, the corresponding graphs are nearly identical for the lexical decision task, with the match unit for the personally relevant stimulus eventually becoming very activated. It was therefore predicted that average dilations would be highest to negative personally relevant words, followed by non-personally relevant negative stimuli, neutral, and positive stimuli on both tasks. These predictions are consistent with the idea that consideration of personally relevant negative information interferes with the processing of environmental stimuli by depressed individuals.

⁴ Input node activations were not included so as to minimize the effects of stimulus energy on simulated dilations. As the magnitude of task node inputs was constant, these were not added. Absolute activation of the output nodes was not added because real neuronal systems were assumed to have exclusively competitive accumulating neurons, represented by the match nodes, rather than the non-competitive, non-accumulating output nodes.
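The pupil dilation analog defined above, the sum of positive activations across the valence units, semantic units, and output accumulators, can be computed per epoch as in the following sketch. The function names and the two-epoch activation history are illustrative assumptions, not values from the simulations.

```python
def simulated_dilation(valence, semantic, accumulators):
    """Sum of the positive activations across the three layers at one epoch."""
    return sum(a for layer in (valence, semantic, accumulators) for a in layer if a > 0)

def dilation_waveform(history):
    """Map a list of per-epoch (valence, semantic, accumulators) tuples to a waveform."""
    return [simulated_dilation(*epoch) for epoch in history]

# Hypothetical activations for two epochs; negative values are ignored by the sum.
history = [
    ([0.4, -0.1], [0.5, 0.0, -0.2], [1.0, -1.0]),
    ([0.2, 0.3], [0.1, 0.6, -0.4], [2.0, -2.0]),
]
waveform = dilation_waveform(history)
print(waveform)
```

Only activations above zero contribute, so inhibited units do not lower the estimate. Input, task, and raw output node activations are left out of the sum, following the rationale given in the footnote.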
More intriguing predictions emerge when simulated pupil dilations are examined as a function of depressive overtraining. Examples of how pupil dilation changes continuously with overtraining on the valence identification task are shown in Fig. 6. The figure on the bottom right shows that initial activations in the non-overtrained network are generally high, and decay slowly. As the network is overtrained, its initial activations become lower, but
later activations are higher, for all non-personally relevant information, as a function of the activation of personally relevant negative information. This pattern becomes stronger as overtraining increases. The same patterns shown in Fig. 6 are apparent on the simulated lexical decision task. The prediction for pupil dilations is thus, that non-depressed people should have relatively high early dilations on both tasks, but should have considerably lower late dilations, for all stimuli, on both tasks. Depressed individuals should have lower early dilations and higher late dilations for all nonpersonally relevant information. Depressed individuals are expected to have high early and late
dilation for personally relevant negative information. With no forgetting function on learning, the same pattern emerges for late simulated dilations, but there is no decrease with overtraining in simulated early dilations. To examine whether the predicted attentional patterns could be obtained empirically, Siegle (1999) measured pupil dilation for six seconds after stimuli were presented, in depressed and nondepressed individuals, on each task. To rule out effects of differential response times, responses were averaged time-locked to reaction times. The average response curves for depressed and nondepressed individuals were similar for all valences; curves averaged over all individuals, for all trials, of all valences are shown in Fig. 7. Principal components analysis was performed on pupil dilation waveforms for each valence, for each condition, for each individual to establish separate early and late dilation intervals. As predicted by the network model, factors representing early dilations were uniformly lower for depressed than non-depressed individuals. Late dilations were uniformly higher. Depressed individuals’ dilations in response to personally relevant stimuli were not, on average, dramatically different from those to other stimuli, as predicted by the network. Still, tests of parameters associated with dilation suggested that in comparison to non-depressed individuals, the slopes of depressed individuals’ pupil dilation curves in the late phases of attention were flatter in response to personally relevant negative words than to positive words on the valence identification task. The results above suggest that depressed individuals do not attend to information in the early stages of attention. Their attention is particularly sustained in the late stages of attention. Mechanisms behind the model’s similar performance suggest that environmental stimuli could serve primarily as cues for depressed individuals to think about overtrained information, which they do, long after stimuli are presented. This interpretation is consistent with the idea that depressive information processing biases are primarily associated with late elaborative or ruminative processes, rather than the earliest perceptual processes (e.g. Macleod and Mathews, 1991).

Fig. 5. Simulated valence identification task network responses. Sub-figures on the left represent the network’s behavior before overtraining. Sub-figures on the right represent the network’s behavior after four epochs of overtraining. Each sub-figure represents the network’s response to an exemplar of a different valence. The x-axis in each panel represents time and the y-axis represents activation. In each of the sub-figures, the top left panel represents the activation of the network’s semantic features. The activation of each trained exemplar is represented by a single line in the panel. The top right panel is the activation of its affective features. The bottom left panel is the accumulation of evidence for a given valence or semantic exemplar. On the bottom right is the sum of activations for these layers, used as an analog of pupil dilation. Simulation details: Hebb-trained network, median of 5 simulations.

Fig. 6. Simulated early and late network activation on the valence identification task. For each set of graphs, representing the median activation over five simulations, the top left panel is the network’s average activation in the first 30 processing epochs. The top right panel is the network’s average activation after the first 100 epochs. The bottom left panel is the network’s simulated reaction time. The bottom right panel shows the network’s activation on the vertical z axis. Time is represented on the horizontal x axis and overtraining is on the horizontal y axis. Simulation details: Hebb-trained network, eight levels of overtraining. The median of five simulations was taken at each level.

Fig. 7. Mean of median pupil dilations averaged over valences, for depressed and non-depressed individuals, from Siegle (1998a). Panels: Valence Identification Task and Lexical Decision Task. Responses were time-locked to reaction times. Time on the x axis is relative to individuals’ reaction times (seconds).
On using different sets of parameters
Different sets of parameters, and slightly different models developed over the last five years, were used to create the different simulations discussed here. In some sense, this practice is problematic in that a consistent set of parameters was not used to make a consistent set of predictions. The rationale for this decision speaks for the seriousness, or lack thereof, with which the models are intended to be taken. The models presented here are simply sets of differential equations that help to formalize intuitions about the behavior of groups, and to generate predictions about ways these groups may behave under certain conditions. Humans may have enough actual degrees of freedom that slight changes in parameters approximated by the models would not lead to catastrophic differences in human behavior as they would in the highly constrained models. Different humans may have different values for human analogs of model parameters
(Siegle and Ingram, 1997a). The models’ behaviors should, at most, be interpreted as interesting ways to understand aspects of behavior, one at a time, that allow predictions about aspects of the behavior of possibly different, and variable, depressed and non-depressed individuals.
What this experiment could say about depressed people
Phenomenology
Depression leaves people miserable, thinking about negative things, feeling bad, and, frequently, becoming suicidal. By understanding the processes that transform normal patterns of attention and association to be very negative, insight can be gained regarding the experiences of depressed people. The current model specifically lends insight into aspects of attention occurring in the seconds after information is presented. Taken together, the model’s predictions present a novel picture of attention in depression. Depressed individuals are hypothesized to pay little attention
to information as it is presented. As time passes they begin to think and ruminate on the affect associated with the information. They turn things negative. Seconds after a stimulus is presented, they are still thinking, but about whatever negative information is central to their depression, rather than the presented stimulus. In contrast, non-depressed individuals are suggested to process stimuli quickly, and to be done with them. Their responses will therefore differ based on the valence of stimuli.
Localizing biases
To better understand the mechanisms behind this pattern of information processing, it is instructive to view the internal representations within the network that produced them. Figure 8 graphically presents the weights for every connection within the network in a Hinton diagram. In the non-overtrained network, connections suggest that each input strongly activates one semantic unit. Each input and semantic unit is slightly negatively associated with one valence, and
Fig. 8. Strength of connections within the network before and after overtraining. Each layer of connections within the network is represented. The layer’s inputs are graphed across the top. Outputs are down the left. Filled circles represent positive weights and can be thought of as excitatory connections. Empty circles represent negative weights and can be thought of as inhibitory connections. The non-overtrained network is presented on the left. The overtrained network’s weights are on the right. In the semantic and output layers, the first three stimuli are positive, the next three are negative and the final three stimuli are neutral. In the output layer, the first nine stimuli are semantic outputs. The tenth through twelfth outputs are negative, positive and neutral decisions, respectively. Simulation details: Hebb-trained network, 5 epochs overtraining on the first negative stimulus.
more negatively associated with the other. Thus, positive stimuli strongly inhibit the activation of negative associations both from inputs and from the semantic units. Negative stimuli inhibit the activation of positive associations. Similarly, positive associations in the valence layer inhibit negative semantic associations, and negative valence associations inhibit positive semantic associations. Neutral associations inhibit both positive and negative associations. In the output layer, semantic units activate exactly one semantic output. Semantic units activate one valence output, and inhibit the others. Valence units activate either the positive or negative valence, and inhibit the semantic and other valence units. After overtraining, connections throughout the network are affected. Each input activates the semantic units for all other units to a greater degree than in the non-overtrained network, with the exception of the one negative exemplar on which the network was overtrained. Unintuitively, all inputs inhibit the activation of this exemplar, while that exemplar inhibits the activations of all other semantic units. When presented, the personally relevant stimulus activates its semantic representation to a greater degree than in the non-overtrained network. In addition, after overtraining, the negative valence unit is inhibited to a greater extent by all stimuli other than that for the personally relevant stimulus. The valence units inhibit the activation of all semantic units other than the personally relevant stimulus unit, to a greater extent than before the overtraining. Biases in the network’s performance can thereby be explained in the following way. All valences are represented by positive activations of both valence units to some small degree. In the overtrained network, even slight positive activation of the valence units excites the semantic unit for the personally relevant stimulus. 
This in turn excites the negative valence unit to a large degree. All other units are strongly inhibited as a vicious cycle of activation of a personally relevant negative thought exciting the mental representation of sadness begins. While analogs of such a simplistic system to a human brain are tenuous at best, basic application of the same processes to physiology may be useful.
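The vicious cycle described above can be illustrated with a two-unit toy loop. This is only a minimal sketch under stated assumptions, not the chapter's network: the function name `run_loop`, the loop weights, the decay rate, and the clipping of activations to [0, 1] are all hypothetical choices made for illustration.

```python
def run_loop(w_loop, steps=40, stim_steps=5, decay=0.5):
    """Toy semantic <-> valence loop (all parameters are illustrative assumptions).

    sem: activation of the personally relevant semantic unit
    val: activation of the negative valence unit
    w_loop: strength of the reciprocal excitatory connections
    """
    sem, val = 0.0, 0.0
    history = []
    for t in range(steps):
        stim = 1.0 if t < stim_steps else 0.0  # brief external input to the semantic unit
        new_sem = (1 - decay) * sem + decay * (stim + w_loop * val)
        new_val = (1 - decay) * val + decay * (w_loop * sem)
        # piecewise-linear squashing: clip activations to [0, 1]
        sem = min(1.0, max(0.0, new_sem))
        val = min(1.0, max(0.0, new_val))
        history.append(val)
    return history


weak = run_loop(w_loop=0.5)    # stands in for the non-overtrained network
strong = run_loop(w_loop=1.2)  # stands in for the overtrained network
print(f"late valence activation, weak loop:   {weak[-1]:.3f}")
print(f"late valence activation, strong loop: {strong[-1]:.3f}")
```

In this sketch, the weak loop lets valence activation die away once the stimulus is removed, whereas the strong loop keeps the valence unit saturated long after stimulus offset, which is the qualitative pattern the text attributes to overtraining.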
Were the semantic units to represent hippocampal system activity and the valence units to represent amygdala system activity, the following interpretation would be offered. When people become depressed, the transfer of information between the hippocampal and amygdala systems is inhibited for all but a few personally relevant thoughts. Normal semantic associations based on emotional valence would thus be disrupted. Any excitatory amygdala activity would be assumed to kindle thoughts of the personally relevant stimulus, which would rekindle the amygdala.
Predictions about inhibition of amygdalar activity by cortex
These connection weights help to derive predictions that are consistent with recent proposals regarding amygdala activity in depressed individuals. Davidson (1998) suggests that inhibition of amygdalar activity by prefrontal cortex prevents affect from interfering with normal semantic associations in non-depressed individuals. He further suggests that amygdala activity becomes generally uninhibited in depressed individuals, such that many stimuli would lead to amygdalar activity. In support of this idea, Henriques and Davidson (1991) have shown that depressed individuals have lower activity in the left dorsolateral prefrontal cortex (PFC) than do non-depressed individuals; Irwin et al. (1996) implicate this asymmetry in increased amygdalar inhibition in depressed individuals in response to negative information. The following experiment with the neural network suggests that amygdalar inhibition might help to maintain normal information processing in the face of overtraining. Figure 9 shows the simulated responses from a network in which feedback from the output nodes (representative of PFC activity) was allowed to inhibit the valence nodes (representing amygdalar activity).
As shown in the figure, as inhibition increased, early dilation did not change appreciably, but late dilation decreased, as did the match to negativity and the activation of the personally relevant semantic stimulus. The resulting intuition is that inhibition of the amygdala by the PFC could prevent depressive rumination after an individual has negative experiences. Still, there
Fig. 9. Simulated pupil dilation for increasing inhibitory feedback from output nodes to valence nodes, on presentation of a non-personally relevant negative stimulus on the valence identification task. The top segment of the figure contains panels representing the network’s simulated pupil dilation as a function of inhibition strength. The bottom right panel in this segment shows time on the horizontal x axis, simulated dilation on the vertical y axis and inhibition strength on the depth z axis, increasing towards the observer. The bottom segments show the response in each network layer at the minimum (0) and maximum (0.85) simulated levels of inhibition. Simulation details: Hebb-trained network, eight epochs overtraining, median of five simulations.
is no reason, based on the model, to believe that low inhibition of the amygdala alone would cause depressive information processing biases. Rather, frontal activation could serve as a protective factor against depression. Some predictions derived from the network appear to contrast with Davidson’s proposal. Under Davidson’s theory, hypofrontal activation could lead to a constant disinhibition of amygdalar activity. In contrast, the weight diagrams in Fig. 8 suggest that only thoughts of personally relevant negative information will lead to amygdala activity in depressed individuals, and other stimuli will effectively inhibit it, until they remind the individual of personally relevant negative information. The theories could be experimentally differentiated in the following manner. Davidson’s (1998) theory of a disinhibited amygdala would suggest that in depressed individuals the amygdala would be especially active throughout the course of attention. The current theory suggests that the amygdala (and hippocampus) in depressed individuals would be less active than in non-depressed people in the early stages of attention, but as they began to associate a stimulus with personally relevant negative information in the late stages of attention, their amygdala would become progressively more active. Neuroimaging of the amygdala system during a valence identification task could thus differentiate between these theories.
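The inhibition experiment described above (feedback from output nodes damping the valence nodes) can be sketched with a three-unit toy system. This is an illustrative assumption, not the simulation behind Fig. 9: the function `late_valence`, the single output unit standing in for PFC activity, and every parameter value other than the inhibition range 0 to 0.85 are hypothetical.

```python
def clip(x):
    """Piecewise-linear squashing to the range [0, 1]."""
    return min(1.0, max(0.0, x))


def late_valence(inhib, steps=60, stim_steps=5, decay=0.5, w_loop=1.2):
    """Valence activation at the end of a run, for a given inhibition strength.

    sem -> val and val -> sem are excitatory (strength w_loop, an assumed value);
    out is driven by val and feeds back onto val with strength -inhib.
    """
    sem = val = out = 0.0
    for t in range(steps):
        stim = 1.0 if t < stim_steps else 0.0  # brief stimulus presentation
        new_sem = (1 - decay) * sem + decay * (stim + w_loop * val)
        new_val = (1 - decay) * val + decay * (w_loop * sem - inhib * out)
        new_out = (1 - decay) * out + decay * val
        sem, val, out = clip(new_sem), clip(new_val), clip(new_out)
    return val


for inhib in (0.0, 0.2, 0.45, 0.85):
    print(f"inhibition {inhib:.2f}: late valence activation {late_valence(inhib):.3f}")
```

With no inhibition the overtrained-style loop stays saturated late in the run, while at the strongest inhibition the late valence activation decays towards zero. The sketch says nothing about early dilation, which the chapter reports as largely unchanged.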
Potential treatment implications
Ideally, information about depressive attentional styles can be used to create interventions that account for them. For example, if clinicians understand what aspects of information a depressed person focuses on, these aspects of information could be focused on during cognitive interventions. Moreover, experiments can be carried out to understand what strategies remediate information processing biases in the network; these strategies could then be applied to people. The current findings have potential implications for the cognitive and pharmacological treatment of depression. Cognitive therapies are often based on the idea that by identifying and challenging negative thoughts, depressed individuals can see things
more positively. The current model represents many aspects of negative information processing without ever accounting for a person’s subjective belief in the validity of his or her negative thoughts; the thought, irrespective of its subjective validity, kindles the vicious spiral of depression. Any therapy that focused primarily on identifying negative cognitions could, unless carefully executed, serve to help the depressed individual learn them better, and thus to have more negative thoughts! A first implication of the network is that cognitive therapies should be designed so as not to make the individual relearn negative associations. Other techniques seem necessary to make such a strategy work. Replacing dysfunctional negative cognitions with more adaptive or useful positive cognitions seems especially promising. In terms of the current model it could be construed as training a person on positive thoughts, whose entire limbic system is actively inhibiting the consideration of such material. The following experiment with the simulated neural network shows that positive retraining can largely overcome biases induced by overtraining. Figure 10 shows the network’s response to a positive stimulus, along with connection weights between the semantic and valence layers after the network is retrained on one positive and one neutral exemplar for five epochs. As shown in the figure, the retrained network’s valence activation, match accumulation, and simulated pupil dilation curves for the valence identification task look increasingly like they did before the overtraining. In the semantic nodes, it can be seen that the retrained network responds to the presented retrained stimulus by activation of the retrained stimuli, to the extent that the overtrained personally relevant negative stimulus cannot usurp their activation.
As shown in the accompanying Hinton diagrams, the retrained network still inhibits positive information more than it originally did, but activation from the new personally relevant positive and neutral patterns allows competition from valence nodes representing positivity. Retraining depressed people on positive exemplars is thus expected to lead people to think about specific positive exemplars, even when negative
Fig. 10. Network response to a positive stimulus before and after positive retraining. Simulation details: Hebb-trained network, median of 5 simulations.
cognitions are not challenged. The trick will be to make positive cognitions ‘stick’ for depressed people in the same way that negative cognitions do. The more a depressed person associates incoming information with learned negative exemplars, the less likely a positive exemplar is to be learned, as such. Siegle (1996) and Siegle and Ingram (1997a) have shown that the amount of feedback occurring between the affective and semantic representations of information in the brain governs how likely information is to be turned negative. Potentially, ruminative response styles can be targeted in therapy before positive retraining is engaged in, to aid in relearning. Interventions directed towards increasing attentional control, such as mindfulness training (e.g. Teasdale et al., 1995), may help individuals to stop associating affective aspects of stimuli with semantic associations, thereby breaking the ‘ruminative’ affective-semantic feedback cycle that drives the network’s information processing biases. There are also implications of the current analyses for pharmacologic interventions. It was noted that the primary function of depressive overtraining was to increase inhibition of cognitions that are not personally relevant and negative, via inhibition from the amygdala system. This analysis suggests that a pharmacologic agent that could block inhibition in the amygdala and hippocampal systems might be useful in the remediation
of depression. Park (1998, unpublished) presents converging evidence suggesting that serotonergic pathways stemming from the median raphe may serve a primarily inhibitory function, and may thus be candidates for pharmacologic intervention. Additionally, because biases are hypothesized to occur as a result of inhibitory feedback between the hippocampus and amygdala systems, drugs targeting either of these structures could break the cycle. If later research shows that certain depressed individuals attend primarily to the affective or semantic aspects of information, drugs specifically targeting one or the other of these structures could be considered.
Conclusions
The work presented in this chapter has brought together a number of converging lines of research. Using a computational neural network model, cognitive and physiological theories of attention to emotional information were shown to be equivalent. A theory of depression, originally advanced for semantic networks (Ingram, 1984), was shown to translate, with only slight modification, to a plausible set of physiological structures. By implementing the model computationally, predictions regarding the time course of attention in these models, for depressed and non-depressed individuals, were advanced. Using behavioral and
physiological measures, these predictions were tested. Based on the results of the research, integrative conclusions regarding the behavioral, cognitive, and physiological underpinnings of depression were advanced. Most notably, depressed people were observed to engage in a physiological analog of rumination, related to sustained attentional allocation, more than non-depressed people. Such a physiological analog of rumination had not been suggested, or demonstrated, before its use was sparked by emergent properties of the computational model. Additionally, the suggestion, based on the network’s performance, that depression involves sustained attention to personally relevant aspects of any presented information is a fertile ground for future research. Depression is, by definition, a disorder that affects people’s behavior, cognitions, and physiology (APA, 1994). It therefore seems necessary to account for all three domains in explaining the onset and maintenance of the disorder. Hopefully, this chapter has shown that computational models can provide a platform for integrating behavioral, cognitive, and physiological understandings of depression. Further experimentation can now be undertaken to reveal whether this approach has the power to advance treatment of each of these aspects of depression.
Appendix
Differences between the backpropagation and Hebb-trained neural networks
Differences between Hebb and backpropagation learning rules
Early simulations were conducted using a backpropagation (Rumelhart et al., 1986) learning strategy rather than a Hebb rule. So that the reader may understand what aspects of the simulations were due to the learning rules, this appendix details relevant differences in the behavior and interpretation of Hebb and back-propagation-based network models.
Differences in assumptions about the nature of learning. The back-propagation algorithm treats learning as an error correction process, in which connections within the network are adjusted to
minimize discrepancies between the network’s outputs and expected outputs. Back-propagation thus provided a natural measure of how fast the network could learn a concept (based on the error from the presented exemplar). Thus, concepts such as speed of learning during rumination or after various levels of overtraining could be easily explored. This procedure is widely used, but can be criticized on two grounds. First, there is little evidence suggesting that back-propagation is a biologically plausible learning rule (Jobe et al., 1995). Second, when information is learned sufficiently well, little new learning takes place during back-propagation unless a great deal of noise is present (i.e. unless error is always introduced into the system’s output, so that there is something to be minimized). Two consequences of this approach affected modeling efforts. First, enough noise was added so that the system’s behavior was often erratic. Second, when feedback was increased in the network after original training, no new learning took place during overtraining since the network’s outputs resembled learned patterns; the feedback acted like a ‘cleanup’ system, minimizing the network’s errors. A counter-intuitive implication was that someone who continued to have negative experiences would not have been expected to become progressively more depressed. To overcome these barriers, the new simulations reported here used a Hebb learning rule, which strengthens connections between active nodes during training. Hebb learning assumes that each experience strengthens connections, regardless of how strong those connections were previously. Thus, new learning can always occur. In this case, rumination could not be considered a coping mechanism. More feedback would strengthen associations with negativity, and thus would allow the network to relearn negative associations more strongly.
Someone who has more negative experiences would be predicted to become progressively more depressed. The resulting network’s behavior was, in most cases, qualitatively similar to that of the backpropagation-trained network. For these simulations, noise could be considerably reduced.
Parameter differences. Backpropagation and Hebb learning also offer the researcher different parameters to investigate as analogs of cognitive variables. In backpropagation, two parameters are available, representing the rate at which new stimuli are learned and the effect of recent previous learning on new learning. Siegle and Ingram (1997a) interpret these parameters as representing different dimensions of the personality variable Openness to Experience. In Hebb learning, a parameter representing the effect of new experiences on connection strengths is relatively analogous to the learning rate in backpropagation. A parameter representing the rate at which new learning occludes old learning, i.e. a forgetting function, is also available.
A nonjudgemental conclusion. Differences between Hebb and back-propagation learning rules afford different interpretations of the effects of exposure to new information. Simulations using both architectures may be valuable for better understanding disorders that may involve overlearning information, such as depression.
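The contrast drawn above between error-correcting and Hebbian learning can be made concrete with a small sketch. It is offered under explicit assumptions: a single linear unit trained with the delta rule stands in for backpropagation, and the function names, the exemplar `x`, the rates, and the form of the forgetting term are hypothetical rather than taken from the chapter's code.

```python
import numpy as np


def delta_update(w, x, target, eta=0.2):
    """Error-correcting (delta-rule) step for one linear unit."""
    error = target - w @ x
    return w + eta * error * x


def hebb_update(w, x, y, rate=0.2, forget=0.0):
    """Hebbian step; `forget` is an assumed decay term standing in for forgetting."""
    return (1.0 - forget) * w + rate * y * x


x = np.array([1.0, 0.0, 1.0])  # one hypothetical overtrained exemplar
target = 1.0                   # desired activation of the associated unit

w_delta = np.zeros(3)
w_hebb = np.zeros(3)
for _ in range(200):           # 200 repeated exposures to the same exemplar
    w_delta = delta_update(w_delta, x, target)
    w_hebb = hebb_update(w_hebb, x, target)

# Size of the weight change produced by one further exposure
step_delta = np.linalg.norm(delta_update(w_delta, x, target) - w_delta)
step_hebb = np.linalg.norm(hebb_update(w_hebb, x, target) - w_hebb)
print("further change, delta rule:", step_delta)
print("further change, Hebb rule: ", step_hebb)
```

Once the error-correcting unit reproduces its target, further exposures change its weights by essentially nothing, whereas the Hebbian weights keep growing with every exposure. With a nonzero `forget` value the Hebbian weights would instead level off near `rate * y * x / forget`.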
Other differences in simulations over time
Other methodological decisions, unrelated to the learning method, also changed from simulation to simulation. So that the reader can effectively decide which results were due to properties of the overall design of the network versus specific methodological decisions of a given simulation, this appendix details the design of the backpropagation-trained network used in early simulations. Because the code for the backpropagation network is similar to the code for the Hebb-trained network, only salient differences will be described. Other differences between the networks are detailed below.
Representation. In early simulations it was assumed that some semantic differentiation occurred before valences were perceived. Thus, in Siegle’s (1996) and Siegle and Ingram’s (1997a) simulations, valence nodes received input only from semantic nodes and not from the inputs. To see whether results scaled to a larger network, Siegle (1996) used 18 input and semantic nodes and 12 generalization nodes. Siegle and Ingram (1997a) eliminated the generalization nodes, and included
only 10 orthographic and 10 semantic nodes. The tenth node was reserved for simulations involving ‘novel’ stimuli to which the network was not exposed during its initial training period. Siegle’s (1996) network was trained on twelve positive, twelve negative and twelve neutral stimuli. Stimuli were generated as pseudo-random strings of 0.5s and -0.5s in which 2/3 of the stimuli were made to be -0.5s. To simplify analyses, three positive, three negative, and three neutral stimuli were used for Siegle and Ingram’s (1997a) study, in which only one orthographic, one semantic and one valence node was expected to be active for a given stimulus. The restriction to a localist representation was useful for illustrating how network connections changed when aspects of personality were simulated, and pre-empted concerns regarding the differential feature frequencies in a distributed representation. In Siegle’s (1996) study valences were represented orthogonally (positive: 0.5, -0.5; negative: -0.5, 0.5; neutral: -0.5, -0.5) rather than using empirically derived valences as in the current simulations. This representational decision never seems to have made a difference.
Training. Training involved presenting simulated orthographic features of a stimulus to the network, using the presentation parameters for the tasks, for 10 cycles, and adjusting the weights within the network until the desired semantic and valence representations were achieved using a modified back-propagation learning algorithm. Weights were modified based on the error after 10 cycles rather than according to the standard backpropagation through time algorithm, which updates weights based on the average error at each cycle, because it was assumed that learning only occurs after associations are made. Training continued until the sum of the mean-squared error in the semantic and valence nodes was below 0.001 for Siegle’s (1996) study, for a block of all inputs.
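The stimulus scheme described under Representation above can be sketched as follows. This is a hedged illustration rather than the original code: it assumes that "2/3" refers to the share of -0.5 elements within each pattern, and the names `make_stimulus` and `VALENCE_CODES`, the seed, and the use of NumPy are hypothetical choices.

```python
import numpy as np


def make_stimulus(n_features, rng, frac_low=2 / 3):
    """Pseudo-random pattern of 0.5 and -0.5 values with a fixed share of -0.5s."""
    n_low = round(n_features * frac_low)
    pattern = np.full(n_features, 0.5)
    # choose, without repetition, which positions are set to -0.5
    low_idx = rng.choice(n_features, size=n_low, replace=False)
    pattern[low_idx] = -0.5
    return pattern


# Valence codes as listed in the text for Siegle's (1996) study
VALENCE_CODES = {
    "positive": np.array([0.5, -0.5]),
    "negative": np.array([-0.5, 0.5]),
    "neutral": np.array([-0.5, -0.5]),
}

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
# Twelve stimuli per valence over 18 input nodes, as in Siegle (1996)
stimuli = {
    valence: [make_stimulus(18, rng) for _ in range(12)]
    for valence in VALENCE_CODES
}
print(stimuli["negative"][0], VALENCE_CODES["negative"])
```

Each generated pattern has exactly 12 of its 18 elements set to -0.5. A localist variant, as in Siegle and Ingram (1997a), would instead activate a single node per stimulus.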
Due to the greater error incurred by not using hidden nodes, Siegle and Ingram (1997a) used an error threshold of 0.004 for all stimuli. Training for connections from the semantic and valence nodes to output nodes was done separately. To represent the induction of depression, the network was trained on a single negative stimulus for 100 epochs after the network’s initial training was complete. Due to their smaller network, Siegle and Ingram (1997a) used 70 epochs of overtraining.

TABLE 3
Parameters used in the backpropagation trained neural network simulations

Parameter                                                        Value
Network construction
  Number of input nodes                                          10-18
  Number of semantic nodes                                       10-18
  Number of valence nodes                                        2
Activation parameters
  T (input diffusion rate)                                       0.5
  p (affective-semantic loop diffusion rate)                     0.2
  Maximum network activation                                     1.0
  Minimum network activation                                     0
  Network noise                                                  0.05
Task parameters
  Accumulation noise                                             1.0
  Temporal threshold for ‘Nonword’ decisions                     200 epochs
  Temporal threshold noise                                       10
  Positive determination accumulation threshold                  1.0
  Negative determination accumulation threshold                  1.0
Learning parameters
  Eta (learning rate)                                            0.2
  Alpha (learning momentum)                                      0.4
  Error threshold for initial learning                           0.004
  Additional epochs of training on negative stimuli              70
  Activations in one training epoch                              10
Training set
  Number of stimuli                                              9-18
  Number of negative stimuli representing depressogenic loss     1-3

Network activation during tasks. The rules governing the network’s activation were the same as for the Hebb-trained network, with the exception that nonlinearity was introduced as a logistic function (gain = 0.5, bias = 0.1) rather than a piecewise linear function. The threshold for affective and semantic determinations on match filters was 0.46. Gaussian noise was incorporated on all layers. After a specified stimulus onset asynchrony, network inputs were eliminated entirely, rather than propagating noise through the network, as in the Hebb network simulations. To allow for neutral judgements in a network with unipolar weights (i.e. neutrality is represented as the absence of positivity and negativity), the network was said to judge a stimulus to be neutral when little evidence was accumulated for either valence (both accumulators less than 0.8) after a temporal threshold of 132 epochs plus Gaussian noise. Non-word decisions were also made using a temporal threshold. Siegle and Ingram (1997a) allowed less feedback between the network’s representation of semantic and valence identification than did Siegle (1996). The network performed relatively similarly to Siegle’s (1996) original network, with the exception that when overtrained on negative stimuli it was facilitated on negative stimuli on the valence identification task relative to the network that was not overtrained.
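The logistic nonlinearity and the neutral-judgement rule described above can be sketched as below. The sketch rests on several assumptions: the text gives gain = 0.5 and bias = 0.1 but not how they enter the logistic, so the form used here is a guess; the function names are hypothetical; and treating the temporal threshold noise of 10 as a Gaussian standard deviation is likewise an assumption.

```python
import math
import random


def logistic(x, gain=0.5, bias=0.1):
    """Logistic squashing; the way gain and bias enter is an assumed form."""
    return 1.0 / (1.0 + math.exp(-gain * (x + bias)))


def noisy_deadline(rng, mean=132.0, sd=10.0):
    """Temporal threshold in epochs plus Gaussian noise (sd is an assumption)."""
    return mean + rng.gauss(0.0, sd)


def judge(pos_acc, neg_acc, epoch, deadline, threshold=1.0, neutral_ceiling=0.8):
    """Return a valence decision, or None if evidence is still accumulating."""
    if pos_acc >= threshold:
        return "positive"
    if neg_acc >= threshold:
        return "negative"
    # neutrality is inferred from the absence of evidence once the deadline passes
    if epoch >= deadline and pos_acc < neutral_ceiling and neg_acc < neutral_ceiling:
        return "neutral"
    return None


rng = random.Random(0)  # arbitrary seed, for reproducibility only
deadline = noisy_deadline(rng)
print(judge(pos_acc=0.3, neg_acc=0.2, epoch=200, deadline=deadline))
```

The design point is that "neutral" has no accumulator of its own: it is read off from two weak accumulators after a noisy deadline, which is why the sketch needs the `epoch` and `deadline` arguments in addition to the two evidence totals.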
Acknowledgements
This research has been supported by NIMH grant MH30914 through the UCSD Mental Health Clinical Research Center and a Sigma Xi Grant in Aid of Research to Greg Siegle. Thanks go to Rick Ingram, Georg Matt and Eric Granholm for considerable assistance on empirical validation for the network model.
References
Amaral, D., Price, J., Pitkanen, A. and Carmichael, S.T. (1992) Anatomical organization of the primate amygdaloid complex. In: J.P. Aggleton (Ed.), The Amygdala: Neurobiological Aspects of Emotion, Memory and Mental Dysfunction. New York, NY: Wiley-Liss, pp. 1-66.
American Psychiatric Association (1994) Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition. Washington, D.C.: American Psychiatric Association.
Beatty, J. (1982) Task-evoked pupillary responses, processing load and the structure of processing resources. Psychol. Bull., 91: 276-292.
Beatty, J. (1986) The pupillary system. In: M.G.H. Coles, E. Donchin and S.W. Porges (Eds.), Psychophysiology: Systems, Processes and Applications. New York: Guilford, pp. 43-50.
Beck, A.T. (1974) The development of depression. In: R.J. Friedman and M.M. Katz (Eds.), The Psychology of Depression. New York: Wiley.
Blaney, P. (1986) Affect and memory: a review. Psychol. Bull., 99: 229-246.
Blank, D.S., Meeden, L.A. and Marshall, J.B. (1991) Exploring the symbolic/subsymbolic continuum: A case study of RAAM. In: J. Dinsmore (Ed.), Closing the Gap: Symbolism vs. Connectionism. Hillsdale, NJ: Erlbaum, pp. 113-148.
Bower, G. (1981) Mood and memory. Am. Psychol., 36: 129-148.
Cohen, J.D., Dunbar, K. and McClelland, J. (1990) On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychol. Rev., 97: 332-361.
Davidson, R. (1998) Affective style and affective disorders: Perspectives from affective neuroscience. Address given at the Fourth Annual Wisconsin Symposium on Emotion: Affective Neuroscience. Madison, WI. This material may also be presented in Davidson, R.J. (1998) Affective style and affective disorders: Perspectives from affective neuroscience. Cogn. Emotion, 12: 307-330.
Fernandez de Molina, A. and Hunsberger, R.W. (1962) Organization of the subcortical system governing defence and flight reactions in the cat. J. Physiol., 7: 200-213.
Halgren, E. (1992) Emotional neurophysiology of the amygdala within the context of human cognition. In: J.P. Aggleton (Ed.), The Amygdala: Neurobiological Aspects of Emotion, Memory and Mental Dysfunction. New York, NY: Wiley-Liss, pp. 191-228.
Hakerem, G. and Sutton, S. (1966) Pupillary response at visual threshold. Nature, 212: 485-486.
Hebb, D.O. (1949) The Organization of Behavior: A Neuropsychological Theory. New York: Wiley.
Henriques, J.B. and Davidson, R.J. (1991) Left frontal hypoactivation in depression. J. Abnorm. Psychol., 100: 535-545.
Hess, E.H. (1972) Pupillometrics: A method of studying mental, emotional and sensory processes. In: N.S. Greenfield and R.A. Sternbach (Eds.), Handbook of Psychophysiology. New York, NY: Holt, Rinehart and Winston, pp. 491-531.
Hess, E.H. and Polt, J.H. (1964) Pupil size in relation to mental activity during simple problem solving. Science, 182: 177-180.
Hinton, G.E. (Ed.) (1991) Connectionist Symbol Processing. Cambridge, MA: MIT Press.
Hinton, G.E., McClelland, J.L. and Rumelhart, D.E. (1986) Distributed representations. In: J.L. McClelland and D.E. Rumelhart (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol 1. Cambridge, MA: MIT Press, pp. 77-109.
Ingram, R. (1984) Towards an information processing analysis of depression. Cogn. Ther. Res., 8: 443-478.
Irwin, W., Davidson, R.J., Lowe, M.J., Mock, B.J., Sorenson, J.A. and Turski, P.A. (1996) Human amygdala activation detected with echo-planar functional magnetic resonance imaging. NeuroReport, 7: 1765-1769.
Jobe, T.H., Fichtner, C.G., Port, J.D. and Gavira, M.M. (1995) Neuropoiesis: Proposal for a connectionistic neurobiology. Med. Hypoth., 45: 147-163.
Kahneman, D. and Beatty, J. (1966) Pupil diameter and load on memory. Science, 154: 1583-1585.
Kitayama, S. (1990) Interaction between affect and cognition in word perception. J. Personality Social Psychol., 58: 209-217.
Koikegami, H. and Yoshida, K. (1953) Pupillary dilation induced by stimulation of amygdaloid nuclei. Folia Psychiatrica Neurologica Japonica, 7: 109-125.
LeDoux, J. (1992) Emotion and the amygdala. In: J.P. Aggleton (Ed.), The Amygdala: Neurobiological Aspects of Emotion, Memory and Mental Dysfunction. New York, NY: Wiley-Liss, pp. 339-351.
LeDoux, J. (1996) The Emotional Brain. New York, NY: Touchstone.
LeDoux, J. (1997) Emotion, Memory and the Brain. Presentation at the meeting of the American Psychological Association.
MacLeod, C. and Mathews, A.M. (1991) Cognitive-experimental approaches to the emotional disorders. In: P.R. Martin (Ed.), Handbook of Behavior Therapy and Psychological Science: An Integrative Approach. Vol. 164. New York: Pergamon Press, pp. 116-150.
Massaro, D. (1988) Experimental Psychology: An Information Processing Approach. San Diego, CA: Harcourt, Brace, Jovanovich.
Matt, G., Vazquez, C. and Campbell, W. (1992) Mood-congruent recall of affectively toned stimuli: A meta-analytic review. Clin. Psychol. Rev., 12: 227-255.
Matthews, G. and Harley, T.A. (1996) Connectionist models of emotional distress and attentional bias. Cogn. Emotion, 10: 561-600.
Matthews, G. and Southall, A. (1991) Depression and the processing of emotional stimuli: A study of semantic priming. Cogn. Ther. Res., 15: 283-302.
Miyata, Y. (1991) A User’s Guide to Planet Version 5.6: A Tool for Constructing, Running and Looking into a PDP Network. (Available from Yoshiro Miyata, Department of Computer Science, Univ. of Colorado at Boulder, Boulder, CO 80309-0430).
Movellan, J.R. and McClelland, J.L. (1994) Stochastic Interactive Processing, Channel Separability and Optimal Perceptual Inference: An Examination of Morton’s Law. Department of Psychology, Carnegie Mellon University, Technical Report Series on PDP and Cognitive Neuroscience PDP.CNS.95.4.
Nolen-Hoeksema, S. and Morrow, J. (1991) A prospective study of depression and posttraumatic stress symptoms after a natural disaster: The 1989 Loma Prieta earthquake. J. Personality Soc. Psychol., 61: 115-121.
Park, B. (1998, unpublished) A Connectionist Account of Antidepressant Action. Available from the Connectionist models of cognitive, affective, brain and behavioral disorders website at www.sci.sdsu.edu/CAL/connectionist-models/.
Paykel, E.S. (1979) Causal relationships between clinical depression and life events. In: J.E. Barrett (Ed.), Stress and Mental Disorder. New York: Raven Press, pp. 71-86.
Ratcliff, R. (1978) A theory of memory retrieval. Psychol. Rev., 85(2): 59-108.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986) Learning internal representations by error propagation. In: D.E. Rumelhart, J.L. McClelland and the PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol 1. Cambridge, MA: MIT Press, pp. 318-362.
Siegle, G.J. (1996) Rumination on Affect: Cause for Negative Attention Biases in Depression? Unpublished Master’s Thesis, San Diego State University.
Siegle, G.J. (1998b) Connectionist Models of Cognitive, Affective, Brain and Behavioral Disorders. World Wide Web site located at http://www.sci.sdsu.edu/CAL/connectionist-models/.
Siegle, G.J. (1999) Cognitive and Physiological Aspects of Attention to Personally Relevant Negative Information in Depression. Unpublished Doctoral Dissertation, San Diego State University / University of California, San Diego. Available on the World Wide Web at http://www.sci.sdsu.edu/CALlgregldisserU.
Siegle, G.J.
and Ingram, R.E. (1997a) Modeling individual differences in negative information processing biases. In: G. Matthews, (Ed.), Personality and Individual Differences in Psychopathology. Princeton, NJ: Erlbaum.
Siegle, G.J. and Ingram, R.E. (1997b) A Neural Network Model of Inability to Process Emotional Information in Depression. Presentation at the meeting of the Society for Research in Psychopathology. Palm Springs, CA, pp. 301-353. Siegle, G.J., Ingram, R.E. and Matt, G.E., (1995) A Neural Network Model of Information Processing Biases in Depression. Poster session presented at the workshop Neural Modeling of Cognitive and Brain Disorders. College Park, Maryland. Siegle, G.J., Ingram, R.E. and Matt, G.E., (1999, submitted) Affective Interference: Cause for Negative Attention Biases in Depression? Note: This information is also presented in Siegle (1996). Siegle, G., Ingram, R., Granholm, E. and Matt, G. (1998) Modeling the time course of attention to negative information in depression. In: G. Matthews (Chair), Cognitive Science Perspectives on Personality and Emotion. Presentation at the 9th European Conference on Personality, Surrey, England. Squire, L.R. (1992) Memory and the hippocampus: A synthesis from findings with rats, monkeys and humans. P.syclto1. R ~ L : , 99, 195-23 1. Teasdale, J.D. and Barnard, P., (1993) Aflect, Cognition and change: Remodelling Depressive Thought, Hillsdale: Erlbaurn. Teasdale, J.D., Segal, Z. and Williams, J.M. (1995) How does cognitive therapy prevent depressive relapse and why should attentional control (mindfulness) training help? Behav. Res. Ther:, 33, 25-39. Tucker, D.M. and Denybeny, D. (1992) Motivated attention: anxiety and the frontal executive functions. Neuropsychiatry, Neuropsychol. Behav. Neurol., 5 : 233-252. Williams, G., Conner, J., Siegle, G., Ingram, R. and Cole, D. (1998) Is More Negative Less Positive? Relating Dysphoria to Emotion Ratings. Presentation at the meeting of the Western Psychological Association, Albuquerque, New Mexico. Williams, J.M.G., Mathews, A. and MacLeod, C. (1996) The emotional Stroop task and psychopathology. Psychol. Bull., 120: 3-24. Yates, J. and Nasby, W. 
(1993) Dissociation, affect and network models of memory: An integrative proposal. J. Traum. Stress, 6: 305-326.