0. If we write
then substituting, we obtain where $\lambda(t) = \lambda(1, t)$. Using the relation between the hazard function and the distribution function (Eq. 1.9), we have In terms of the variable I, with t held fixed, this is the Weibull distribution (Appendix B.1, item 5), which, as is shown in Appendix A (Theorem A.5), arises as one of the limiting forms of the maximum of independent, identically distributed random variables. In order to bypass estimating time hazard functions, which as I noted earlier they felt to be risky, and yet to have some check on the model, Wandell et al. (1984) noted that Eq. 4.15 is equivalent to So, for each value of t, the left side is predicted to grow linearly with signal intensity measured in decibels. Their experiment to test this prediction was described in Section 4.1.3. The plot for one subject using the 500-msec signal is shown in Figure 4.19. They interpret this as support for Eq. 4.15. Considering the narrowness of the range, I am not persuaded. In both duration conditions and for both subjects, they find $\beta$ is about 2. From Eq. 4.15 we can immediately write down the relation between $E(T)$ and I:
Detection Paradigms
FIG. 4.19 Logarithm of the log survivor function versus intensity in dB for the response times to a faint flash of light. The parameter is response time. According to the model of Eq. 4.15, these curves should be linear with the same slope. [Figure 5, Wandell et al. (1984); copyright 1984; reprinted by permission.]
For a specific choice of $\lambda(t)$, this can be evaluated either analytically or numerically. For example, with $\lambda(t) = \lambda$, which of course is a mighty poor approximation to the data of Figure 4.11, it is easy to verify that $E(T) = r_0 + 1/(\lambda I^{\beta})$, which is Piéron's law; however, the estimate of $\beta = 2$ is far larger than the constants reported in Table 2.1 for this law. No one has investigated Eq. 4.15 carefully for more realistic assumptions for the temporal hazard function.
4.3.6 Binary Increments and Decrements and Fixed Criterion: Difference of Counting Processes
The following variant of a counting process, due to McGill (1963), is a type of random walk where the jumps occur at irregular times. Suppose the decision center is able not only to monitor the counting process that is affected by the signal (let $N(t)$ denote the count at time t on that process) but also a counting process that reflects the (presumably) internal noise level of the typical, relevant nerve fibers (let $I(t)$ denote the count at time t on the internal noise process). So long as no signal is present and assuming that there is no external noise, the process N is assumed to be statistically identical to I. Thus, $E[N(t) - I(t)] = E[N(t)] - E[I(t)] = 0$. If the two counting processes are renewal ones, then it can be shown that the variance of $N(t) - I(t)$ grows in proportion to t. When a signal is present, we assume that N is increased relative to I and that $E[N(t) - I(t)]$ grows with t. Thus, if a criterion is established for a sufficient difference to indicate the presence of a
Distributions of Simple Decision Latencies
signal, we should be able to derive (under some assumptions about N and I) the distribution of times at which the signal is detected. In fact, it is the diffusion process of Section 4.2.4 that leads to the Wald distribution, and so we have already dealt with it.
4.4 RACE BETWEEN LEVEL AND CHANGE DETECTORS
4.4.1 Change and Level Detectors
Ample physiological evidence suggests the existence of two quite distinct mechanisms, both of which may have something to do with signal detection. One type of neural cell, called transient, is active immediately following a change in signal intensity, but otherwise remains quiescent. A second type, called sustained, is more-or-less steadily active when the signal is present, but otherwise is inactive. Direct physiological evidence exists for such cells both in vision and audition: see Abeles and Goldstein (1972), Cleland et al. (1971), Enroth-Cugell and Robson (1966), Gerstein, Butler, and Erulkar (1968), Kiang, Watanabe, Thomas, and Clark (1965), Kulikowski and Tolhurst (1973), Marrocco (1976), Moller (1969), Pfeiffer (1966), and Tolhurst (1975b). In addition, there is a limited amount of psychophysical evidence suggesting functional units (often called channels in vision) of this general type; they might be cells or networks of cells. In order to distinguish clearly between physiologically observed cells and hypothetical units proposed to account for behavioral data, I will use the terms "transient" and "sustained" only for cells and "change" and "level" detectors for the more speculative units. A change detector is, in effect, a differentiator operating on the sensory flow of information. It is not difficult to construct a change detector from a comparatively simple network of electrical components having properties typical of neural cells. MacMillan (1971, 1973) interpreted some auditory detection and recognition in terms of change and level detectors, as do Ahumada, Marken, and Sandusky (1975). But the clearest behavioral data, by far, are certain simple reaction-time data. Tolhurst (1975b) did a visual study involving sinusoidal gratings that were relatively difficult to detect. A tone served as the warning signal for a random, but nonexponential, foreperiod, following which a sinusoidal grating of low contrast was presented. There were catch trials.
The subject, Tolhurst, responded whenever he became aware of the signal. Two major variables were manipulated. First, the onsets and offsets of the signal were either abrupt or ramped over a 300-msec period, the idea being that a change detector can only respond to the abrupt changes and not to the slow ramps. Second, other evidence had suggested that low spatial frequencies activate the transient cells whereas high frequencies activate the sustained ones, so data were collected at 0.2c/deg and at 3.5c/deg. The several conditions that were run are apparent in Figures 4.20 to 4.22. Note the clear
FIG. 4.20 Histograms of simple reaction times to low contrast gratings of 0.2 c/deg under the three temporal patterns of signal intensity shown. The sample size can be inferred from the figure. [Figure 1 of Tolhurst (1975a); copyright 1975; reprinted by permission.]
FIG. 4.21 This figure is similar to Figure 4.20, but with an abrupt signal onset and three durations. [Figure 3 of Tolhurst (1975a); copyright 1975; reprinted by permission.]
FIG. 4.22 This figure is similar to Figure 4.20, but with a grating of 3.5 c/deg. [Figure 4 of Tolhurst (1975a); copyright 1975; reprinted by permission.]
evidence in Figures 4.20 and 4.22 that at low frequency the abrupt change initiates a response and the ramp does not, whereas at high frequency the abrupt changes are ignored and only the sustained level matters. Additional data supporting this point of view are found in Breitmeyer (1975), and a parametric study of mean reaction time as a function of the spatial frequency and contrast of gratings was carried out by Harwerth and Levi (1978). Although they continued to find evidence for these two types of detectors, the pattern of shifting between them was complex. Parker (1980) and Parker and Salzen (1977) found additional complexities including evidence for reaction-time changes with spatial frequency within one type of channel. In the auditory domain no one has yet discovered a signal parameter, comparable to spatial frequency, that delineates psychophysically the two classes of cells so sharply.* Therefore, it is necessary to approach the problem somewhat less directly.
* J. W. Tukey (personal communication) has suggested that frequency modulation might be suitable.
On the assumption that there are two types of detectors, both of which may be activated with appropriately complex stimuli, the question is: how does the system go about generating a response? One plausible hypothesis, which we explore in this section, is that whichever detector first detects a signal triggers a response. If a system responds when the first of several random processes is completed, then we are involved in a race among the processes. Suppose that in general there are n processes, that $T_k$ is the time to complete the kth process and its distribution function is $F_k$, and that the detectors are statistically independent of each other. Then the distribution of times to complete the race is given by
$$F(t) = 1 - \prod_{k=1}^{n} [1 - F_k(t)]. \qquad (4.16)$$
If $\lambda_k(t)$ are the hazard functions of the $T_k$ and $\lambda(t)$ is that of the distribution of Eq. 4.16, then we know by Theorem 1.3 that
$$\lambda(t) = \sum_{k=1}^{n} \lambda_k(t).$$
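The additivity of hazards in an independent race can be checked numerically. The following is a minimal sketch, not anything taken from the text: it assumes Weibull components with made-up parameters, builds the race survivor function as the product of the component survivor functions, and compares the hazard obtained by differentiating the log survivor function with the sum of the component hazards. All function names and parameter values are illustrative assumptions.

```python
import math

# Hypothetical (scale, shape) pairs for three independent Weibull processes.
COMPONENTS = [(0.3, 2.0), (0.5, 1.0), (0.8, 3.0)]


def weibull_survivor(t, scale, shape):
    """P(T_k > t) for a Weibull time with the given scale and shape."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, scale, shape):
    """Hazard function of the same Weibull time."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def race_survivor(t, components):
    """Survivor function of the race: the product of component survivors.

    This is 1 - F(t) when the race ends at the first completion and the
    component processes are independent.
    """
    s = 1.0
    for scale, shape in components:
        s *= weibull_survivor(t, scale, shape)
    return s


def race_hazard_numeric(t, components, h=1e-6):
    """Hazard of the race as -d/dt log S(t), by central difference."""
    upper = math.log(race_survivor(t + h, components))
    lower = math.log(race_survivor(t - h, components))
    return -(upper - lower) / (2.0 * h)


def race_hazard_sum(t, components):
    """Hazard of the race as the sum of the component hazards."""
    return sum(weibull_hazard(t, sc, sh) for sc, sh in components)


if __name__ == "__main__":
    for t in (0.1, 0.4, 0.7):
        print(t, race_hazard_numeric(t, COMPONENTS), race_hazard_sum(t, COMPONENTS))
```

For any t > 0 the two hazard computations should agree up to the finite-difference error, whatever component distributions are substituted, provided independence holds.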
Thus, it is very simple to study the behavior of the hazard function, and so the distribution function, of race models involving independent subprocesses. The literature includes a number of race models that are different from the one I am about to describe. Two examples are Ollman (1973) and Logan, Cowan, and Davis (1984). Burbeck (1979) and Burbeck and Luce (1982) pointed out that if a signal activates both classes of detectors, if neither totally overwhelms the other, and if there is a race between them, then we should see a characteristic pattern in the reaction-time distribution. This pattern is most easily understood by looking at the hazard functions. Motivated by the data shown in Sections 4.1.2 and 4.1.3, they assumed that a change detector is characterized by a hazard function that rises from its constant noise level when no signal is present to a peak shortly after signal onset, after which it soon again returns to the noise level. By contrast, the hazard function of a level detector rises rather more slowly from the noise level to some constant value that is sustained so long as the signal is present. Specific models yielding these two shapes are discussed shortly. Figure 4.23 exhibits these two hazard function patterns separately as well as giving their sum and the corresponding log survivor function. Observe that this hazard function has the pattern shown in the data of Figures 4.5, 4.7, and 4.11. The question to be dealt with below is whether plausible specific models can account for these data in detail. The theoretical literature on level detectors, per se, is rather sparse, most of the effort having gone into models in which decisions are based upon an accumulation of information over time (Sections 4.2 and 4.3). Basically both level and change detectors are devices with an absolute minimum of
FIG. 4.23 Illustration of the hazard and log survivor functions that arise from a race of two systems, thought to be change and level detectors, with the hazard functions shown. [Figure 1 of Burbeck and Luce (1982); copyright 1982; reprinted by permission.]
memory, whereas the accumulator models have substantial memories of a sort. It is possible, however, that an accumulator process normally designed to identify signals can serve the function of a level detector. Two kinds of ideas for level and change detectors have received limited attention. In both, the sensory input is assumed to be transduced into a spike train whose firing rate is an increasing function of signal intensity; in fact, we shall suppose that these are Poisson processes in which the hazard function has one constant value $\nu$ before the signal onset and another value $\mu$ after the signal onset. In the first class of models, we assume that the decision mechanism acts on the resulting pulse train in the way a naive statistician would; I make no attempt to state how this behavior could come about physiologically. A level detector is assumed to keep updating its estimate of the current firing rate, and it initiates a response whenever that
estimate exceeds a fixed criterion that depends upon the experimental conditions (see Section 4.4.3). A change detector has sufficient memory also to retain the preceding estimate, and it compares this estimate with the current estimate and initiates a response whenever the two estimates differ sufficiently in the appropriate direction. Again, a criterion must be established for what constitutes a sufficient difference (see Section 4.4.4). The other class of models, which has only been developed for the change detector, attempts to be far more realistic about possible physiological mechanisms, and it has been fitted carefully to one body of data (see Section 4.4.5).
4.4.2 The Hazard Function for Noise
As we saw in Figure 4.3, the hazard function for a loud noise rises to a high value (about 60-100 events sec$^{-1}$) and stays there, unlike those for loud clicks and pure tones that return to near zero values (see Figures 4.1 and 4.2). In form, the noise hazard function looks much like the pure tone, weak signal functions of Figures 4.5 and 4.7, that is, like our conception of a level detector shown in Figure 4.23. However, in absolute value there is a substantial difference, with the noise functions being between 10 and 40 times higher than the near threshold signals and, of course, reaching their maximum at an earlier time. Is there a plausible explanation for the difference? One untested possibility is that there is only a single change detector but many level detectors located at various frequencies, most likely one for each critical band of frequencies, of which something between 20 and 30 are estimated to exist in human beings (Scharf, 1970). Thus, for a pure tone the race is between the change detector and just one of the level detectors; whereas, for a noise signal the race is among the change detector and all of the level detectors activated by the noise, the number of which of course depends upon its bandwidth. It is easy to see that the sum of the n (nearly) identical hazard functions of the level detectors itself looks like the hazard function of a level detector, but with a magnitude n times as large as for a pure tone of the same intensity. This means that for wide band noise the level should be 20 to 30 times the height of the comparable pure tone data, which is about where the asymptote is. The height of the single change detector component does not change as one goes from a pure tone to wide band noise; it only increases the rise time of the apparent level detector. This explanation needs to be tested.
For one thing, why there should be only one change detector rather than one for each critical band is not obvious, but that assumption is necessary in order to avoid having a peaked hazard function for noise. Presumably the change detector somehow operates on something corresponding to the temporal waveform of the signal rather than on a decomposition of the waveform into frequencies, as in the critical bands. Obviously, reaction times should be obtained with the same
subject for various bandwidths and for multiple bands of noise and their hazard functions estimated to see if they behave as predicted. Moreover, as J. Johnson (personal communication) suggested, if this hypothesis is correct, then an auditory signal composed of one frequency for a short period and then shifted to an appreciably different frequency, but without a change of intensity, should activate the change detector just once and not at each frequency separately. That prediction should be easy to check.
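The critical-band conjecture of this section can be illustrated with a small sketch. It assumes, purely for illustration, a Gaussian-shaped transient hazard for the single change detector and a saturating exponential hazard for each level detector; neither functional form nor any of the numerical values comes from the text. Because the detectors are taken to race independently, their hazards add.

```python
import math


def change_hazard(t, peak=20.0, t_peak=0.2, width=0.05):
    """Hypothetical change-detector hazard: a transient bump near t_peak (sec)."""
    return peak * math.exp(-0.5 * ((t - t_peak) / width) ** 2)


def level_hazard(t, asymptote=3.0, tau=0.3):
    """Hypothetical level-detector hazard: rises from 0 toward `asymptote`."""
    if t <= 0.0:
        return 0.0
    return asymptote * (1.0 - math.exp(-t / tau))


def race_hazard(t, n_level):
    """Hazard of a race among one change detector and n_level level detectors.

    Independence is assumed, so the component hazards simply add.
    """
    return change_hazard(t) + n_level * level_hazard(t)


if __name__ == "__main__":
    for n_level in (1, 25):  # 1: pure tone, 25: wide band noise (assumed count)
        values = [round(race_hazard(t, n_level), 2) for t in (0.1, 0.2, 0.4, 1.0, 5.0)]
        print(n_level, values)
```

With one level detector the transient bump stands well above the eventual asymptote, as for a pure tone; with 25 of them the asymptote is 25 times higher and the bump no longer exceeds it, which is the qualitative contrast the conjecture is meant to explain.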
4.4.3 Level Detection as Comparison of an Estimated Local Rate with a Criterion
One general idea for a level detector was outlined earlier. A criterion C is established (under the influence of the payoffs and instructions) and the rate of firing is continually estimated from local samples of data, compared with C, and a response is initiated whenever the estimated rate exceeds C (assuming signal onset involves an increase in intensity). There are a number of ways to carry out the local estimate, including counting and timing, but all lead to substantially the same result. I consider several specific examples. First, suppose time is quantized into intervals of duration $\delta$, and the number of pulses $N_n$ in the nth interval is counted as that interval is completed. The local estimate of the rate is $\hat{\mu}_n = N_n/\delta$. Assume for simplicity that both noise and signal generate Poisson processes. Obviously, during any interval for which the Poisson rate is constant, $N_n$ has some distribution (called the Poisson) and so does $\hat{\mu}_n$; thus, there is some fixed probability (independent of n so long as the Poisson rate is constant), call it p, that $\hat{\mu}_n > C$. This is the well-known geometric process; it is the discrete analogue of the exponential, and both have constant hazard functions, the one in discrete time and the other in continuous time. So this level detector can be described as a geometric process with parameter $p_\nu$ up to the interval at which the signal change occurs, and a second geometric process with parameter $p_\mu$ until a response occurs. During the time between the two geometric processes, there is some sort of smooth transition from $p_\nu$ to $p_\mu$. An alternative way to estimate a local firing rate is to observe the interarrival times (IAT) between successive pulses, which under the Poisson assumption has an expected value of $1/\nu$ prior to the signal onset and $1/\mu$ following it. The decision criterion is to respond whenever IAT
of the process as beginning anew, but with time t - x to the response, so
If we assume C is small—with firing rates on the order of 10 to 100 per sec, C must be less than .1—and if we expand the integral in a Taylor's series and retain only the linear term in C, then we get
Differentiate this with respect to t, cancel the exponential factor, and collect terms,
which is well known to have the exponential as its solution with the parameter $\mu^2 C/(1 + \mu C)$. So, once again, this sort of local level detector has a hazard rate with one constant value up to signal onset, at which point it increases and eventually settles down to another constant value. The exact nature of the transition is not understood except that it is monotonic increasing. Obviously, this last model can be generalized so as to base the local estimate of the rate upon k successive IATs, which of course are gamma distributed (Appendix A.1.5). Nothing is known of the properties of this model, but on intuitive grounds one expects qualitatively the same sort of transition from one constant level to another, but the transition should be slower the larger the value of k. Rapoport, Stein, and Burkheimer (1979) examined the following model. Time is discrete, and an event occurs at one of these times according to a fixed geometric distribution. Observations $X_n$ are obtained at each instance of time; they are independent and before the event they are distributed according to one distribution and after the event according to another one. This obviously is a generalization of the counting situation just described. In Chapter 5 of their monograph they study the decision rule that a response is initiated whenever there have been k successive observations that exceed a fixed criterion. The counting model was the case where k = 1 and $X_n$ has a Poisson distribution. They provide exact formulas for the response distribution, but as they are very complex and none too revealing I will forego reproducing them. To gain some idea of the performance of such a system, they report several simulations of the process. Reading the numbers from the graphs of the distributions of response times following the event (signal onset), I computed the hazard functions. As one would anticipate, the
process has the character of all level detectors: a monotonic transition between two constant levels. A further generalization, which they do not study, would be to respond when j out of the last k observations met the criterion. They do study (Chapter 4) the case where the system responds whenever a total of k observations meet the criterion, but that really is an accumulation model of the recruitment type discussed in Section 4.2.2. Still another idea, due to Kingman (1963), is for time to be partitioned according to a fixed Poisson process. The sensory process is a counting process, not necessarily Poisson, and a criterion N is established so that whenever the sensory count between two successive Poisson events exceeds N, a response is initiated. While different in detail, all such processes end up with a hazard function that rises gradually from one constant level to another when the signal is a sudden change to a constant value.
4.4.4 Change Detection as Comparison of Two Estimated Local Rates
Using either the counting method or timing method for estimating rates, if the system has sufficient memory to retain successive estimates, $\hat{\mu}_n$ and $\hat{\mu}_{n-1}$, then a decision rule that is plausible is to respond whenever $\hat{\mu}_n - \hat{\mu}_{n-1} > C$ or, alternatively, when $\hat{\mu}_n/\hat{\mu}_{n-1} > C$. In principle, one should be able to work out the hazard functions for either of these rules, but in fact nothing yet exists. In order to get some idea as to the actual performance of such a change detector, I had simulated* a process in which the rate estimate is based on k successive IATs and the ratio decision rule is used. The results are summarized in Figure 4.24. There are two curves on each plot. Those to the left of the origin are the hazard function of the ratio mechanism when it is applied to a Poisson process with a mean rate of 5 spikes per sec.
Those to the right of the origin result when Poisson noise is decreased discontinuously from either 20 or 50 spikes per sec to 5 spikes per sec and the change is controlled by an exponential distribution with a mean of 3 sec. The mechanism used k = 2 (samples of two IATs for each estimate $\hat{\mu}_n$) and a criterion C of <=. Such a mechanism clearly gives results of the correct character and order of magnitude.
4.4.5 Change Detection as Neural Post Inhibitory Rebound
Although the simple-minded statistical models of detectors are indeed simple to state, they are not always as tractable as one might like and they are not particularly reasonable from a physiological point of view. By that I mean that simple networks of units having known neural properties do not
* A. F. Smith carried them out.
FIG. 4.24a, b Hazard functions estimated from simulations of a decision rule in which a Poisson rate is estimated from two successive samples of k interarrival times and their ratio is compared with a criterion C. The foreperiod is exponential with parameter $\lambda$, and the two Poisson rates are $\nu$ before the signal onset and $\mu$ after onset.
result in either counting or timing behavior as such. Both processes require a means of counting, of timing, and of dividing. This lack of realism led Burbeck (1979, 1985) to suggest an explicit neural model for change detection; it is an adaptation of one first suggested by Perkel (1974) in another context. Assume that a neural spike train, which encodes the sensory information, impinges upon a decision neuron, which is characterized by three functions of time. The first is its membrane potential, $p(t)$. The second is its threshold, $\theta(t)$, which serves to control when the decision neuron emits a spike, namely, when $p(t) > \theta(t)$, and that in turn initiates a response. And the third is a rebound variable, $q(t)$, which functions in the following ways. Whenever a sensory spike arrives, the rebound variable is incremented; otherwise it decays exponentially. It plays a role of memory in the process by increasing the likelihood of a spike output (and so a response) both by increasing the membrane potential and decreasing the threshold. Assuming that all of the changes are linear, we write down the three equations for what happens when no spike occurs:
where $\alpha$, $\beta$, and $\sigma$ are time constants of the three processes, $0 \le \eta \le 1$, and $p_\infty$ and $\theta_\infty$ are the equilibrium levels of p and $\theta$. [These equations are closely similar to, but more complex than, the accumulation model postulated by Grice (Section 4.3.1).] The solutions to these simultaneous equations are easily written down, but there is no reason to give them here. The impact of a sensory spike arriving at time t can be expressed directly in terms of increments on p and q: for small $\delta > 0$,
where $\Delta p$, $\Delta q$, $p_R$, and $q_R$ are parameters. There are a total of 10 free parameters in the equations as written, but it is natural to assume that $q_R = \theta_\infty - p_R$, reducing the total to 9. To apply the model to data, Burbeck proceeded as follows. First, he attempted to reduce the number of free parameters by taking into account various facts known from other sources. Second, he attempted to deconvolve the residual distribution in order to reveal the decision distribution. And third, since the data are for weak signals, he took into account that a level detector might be working in parallel with the change detector. I take each up separately. Since the variables have explicit physiological meanings, those parameters
having to do with equilibrium behavior can be estimated directly from physiological observations. Burbeck selected the following values from the literature: $p_\infty = -70$ mV, $\theta_\infty = -60$ mV, $p_R = -75$ mV, $q_R = \theta_\infty - p_R = 15$ mV. Moreover, he assumed that $\theta$ is not directly affected by q; that is, $\eta = 1$. This renders $\theta$ independent of the rest of the system and, for large t, $\theta$ approaches $\theta_\infty$ independent of the value of $\beta$, and so it need not be estimated. These assumptions leave one with $\alpha$, $\sigma$, $\Delta p$, and $\Delta q$ to be estimated. As we have seen in Chapter 3, no clearly satisfactory method exists to rid ourselves of the residual times. Among the questionable alternatives, Burbeck opted to approximate the residual distribution by the reaction-time distribution to an intense signal and to deconvolve it from the distribution of reaction times to the weak signals of interest. There is the distinct possibility that too much is removed in this way, but aside from a displacement it should have limited impact on the form of the hazard function. But working with weak signals and assuming a race model, we know that the level detector will play a role. As we have seen, none of the models led to an explicit usable formula for the hazard function, and yet all were roughly of the same character, namely, a gradual rise at signal onset from one constant value to another. Burbeck selected the cumulative exponential as a reasonable approximation to this type of growth: where $a$ and $\tau$ are two free parameters and $t_0$ is the time of the signal change. This increases to six the number of free parameters. The sensory spike trains were assumed to be Poisson, with parameter $\nu$ when the signal is present and $\mu$ when it is absent. On the basis of physiological data, he set $\nu = 10$ spikes/sec; the actual value does not matter greatly since to a good first approximation the model is only sensitive to $\mu/\nu$.
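To make the "cumulative exponential" approximation concrete, here is a minimal sketch. It assumes the level-detector hazard takes the form a[1 - exp(-(t - t0)/tau)] after the signal change and, as a simplification, is zero before it (the constant noise-level hazard is ignored). The exact expression used by Burbeck is not reproduced in this passage, so this form, the function names, and the parameter values are assumptions made only for illustration.

```python
import math


def level_hazard(t, a, tau, t0=0.0):
    """Assumed cumulative-exponential hazard: 0 at t0, approaching `a`."""
    if t <= t0:
        return 0.0
    return a * (1.0 - math.exp(-(t - t0) / tau))


def level_survivor(t, a, tau, t0=0.0):
    """Survivor function implied by level_hazard.

    Uses S(t) = exp(-integrated hazard), where the integrated hazard is
    a * [u - tau * (1 - exp(-u / tau))] with u = t - t0.
    """
    if t <= t0:
        return 1.0
    u = t - t0
    return math.exp(-a * (u - tau * (1.0 - math.exp(-u / tau))))


if __name__ == "__main__":
    a, tau = 4.0, 0.2  # hypothetical asymptote (events/sec) and time constant (sec)
    for t in (0.0, 0.1, 0.5, 2.0):
        print(t, level_hazard(t, a, tau), level_survivor(t, a, tau))
```

Under this assumed form, `a` sets the asymptotic height of the hazard and `tau` the speed of the rise, which matches the roles the text assigns to the two free parameters.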
Note that the numbers of free parameters do not increase in proportion to the number of experimental conditions run. If the signal intensity is changed, we anticipate that $\mu$ and $a$ will change, but all of the other parameters will remain fixed. So Burbeck proceeded as follows. He ran three conditions: signal offset to noise from both 23 dB and 25 dB and signal offset from 25 dB to 23 dB. The first two sets of data were used to estimate all of the parameters, and these were used to make a parameter-free prediction of the third condition. The technique was an exploration of presumably relevant regions of the parameter space using a computer simulation of the model; it was very time consuming and it is by no means clear that optimal results were attained. In any event, Figure 4.25 shows the fit of the model to the three sets of data. It is obvious, and was confirmed statistically, that significant discrepancies exist, but also that it is clearly capturing much of the phenomenon. One would expect some relation between the estimates of $\mu$ and $a$ since both reflect the Poisson spike rate in the two types of detectors. Figure 4.26 shows the relation, and it is strikingly linear.
FIG. 4.25 Fit of the physiologically based model (PIR) discussed in text to a sample of simple reaction time data. [Figure of Burbeck (1985); copyright 1985; reprinted by permission.]
FIG. 4.26 Estimated Poisson rate parameter from the PIR model (see text) as a function of the estimated asymptotic value of the hazard function for simple reaction times. [Figure of Burbeck (1985); copyright 1985; reprinted by permission.]
4.4.6 Why Two Types of Detector?
Because the distinction between change and level detectors has only recently been clearly drawn and shown to be relevant to reaction time, most papers are not explicit about the type of detector intended. On the one hand, since most of the models have been confronted by data from relatively intense signals, one could interpret them to be models for a change detector. But, equally well, they may be models for a level or some other type of detector that were simply fit to the wrong data because the author was unaware of the possible distinction. On the other hand, internal evidence in some of the models, such as those described in Sections 4.2 and 4.3, suggests they were definitely not intended as change detectors. Many models derive from ideas about choice reaction time, where it is not enough for the subject to become aware of a change in stimulation, but he or she must accumulate sufficient information about the signal in order to identify it among several possibilities. To do that surely requires the use of processes that continue to carry information about the signal throughout the period of its presentation. This raises the possibility that a level detector is nothing other than an adaptation of a mechanism normally used to recognize signals. It is brought into play as a detection mechanism when either the intensity change is so small or the change is so gradual that the usual change detector is rendered inoperable.
If this be true, two things follow. First, we must be quite careful in the data we use to test either kind of model, since otherwise the other type of detector is likely to have an impact. If our interest is in level detectors, then we probably should be using low intensity, slowly ramped signals. The major difficulty with that advice is that the models are either silent as to what happens with a changing signal or, if not silent, difficult to analyze. Second, since most of the modeling appears to have been driven by considerations not at all suited to change detectors, we lack really adequate models for them. This is ironic since at intensities well above threshold only these, if they exist, are relevant.
4.5 CONCLUSIONS
It is easy to give a qualitative summary of the data for simple reaction-time distributions to auditory signals: for intense pure tones and clicks, but not intense noise, the hazard function for the reaction time begins some 100 to 150 msec after signal onset to rise rapidly from its noise level, reaches a peak, and falls rapidly at approximately the same rate ending at the noise level. As signal intensity is reduced, the height of the peak is reduced, as one would expect; but at the same time for signals of sufficient duration (> 500 msec) the right side of the hazard function no longer drops to the initial noise level, but rather becomes constant at some higher value. At sufficiently low levels, the peak disappears and so the function rises monotonically to its asymptotic value. For vision, less seems to be known about simple reaction-time distributions. For weak monochromatic signals, the behavior seems similar to that to pure tones. I know of no distributional data comparing intense signals to weak signals for monochromatic light nor of any whatsoever for white light, the analogue of auditory noise. Of the traditional theories of information accumulation, only two have the appropriate qualitative features: the variable criterion model with a nonlinear deterministic accumulation function (and, of course, its several generalizations) and the Wald distribution arising from the random walk model. A more careful fit of the Wald model to tone data showed it to be inadequate. The possibility remains that if we were able to estimate the residue better and deconvolve it and/or alter the assumption that the steps in the random walk are normally distributed, then that model would be adequate. The one attempt to fit the Grice model, which was fairly successful, suffered from an incorrect reduction of the parameter space. 
A different account was given in terms of a race between two types of detectors—called change and level—that are postulated to have hazard functions corresponding, respectively, to what one sees for the most intense and for the least intense, but detectable, signals. The psychophysical evidence for this distinction is moderately direct in vision, but less direct in
audition; the physiological evidence is direct in both domains. What we now need is considerably more work on theories for the two types of detector. The physiologically motivated theory of Burbeck for a change detector is complex and has many free parameters. The statistical one of comparing interarrival times locally has not yet led to analytic solutions. The same is true for the local version of a level detector. And finally, a number of the information accumulation models have hazard functions similar to those we have postulated for a level detector. One major reason for trying to arrive at the precise nature of the reaction-time distribution is to be able to gain some understanding of how the various experimental manipulations reported in Chapter 2 impact the basic parameters of the model. Had we a well-accepted theory of the relevant decision process(es), then at this point we could look into how the parameters of the process vary with those manipulations, thus bringing some closure to Chapter 2. In addition, we could anticipate the evolution of theories for those relations and some account about how they relate to functions found in other parts of psychophysics. However, this is not yet the case, and our story ends here awaiting further understanding of the mechanisms that give rise to change and level detection.
5 Detection of Signals Presented at Irregular Times
5.1 INTRODUCTION
All of the experiments discussed to this point and all to follow in later chapters were conducted in highly structured, trial designs. Although this strategy is most convenient for the experimenter, it is after all really quite unlike any ordinary situation encountered by the subject. One can therefore wonder if the experiments are so atypical as to mislead us seriously about the subject's normal behavior. The purpose of the present chapter is to explore the extremely limited literature that exists on designs in which there is little or no experimenter-imposed time structure on signal presentations. This leads one, however, into a special problem that has mostly been studied in the context of trial designs, and that is treated in Section 5.4. The simplest case to consider is the one in which there is only one type of signal, as in any simple reaction-time experiment, that is presented randomly in time at some constant rate. The subject has a single response, such as a key to press, and is free to respond whenever the signal is thought to occur. Judging by our discussion in Section 1.2.3, presenting signals randomly in time should mean presentation according to a Poisson process— that is, with a constant hazard function—and so the time between successive signal presentations is distributed exponentially. The data from such an experiment consist of two time series: the one recording the times at which signals are presented, and the other recording the times at which the subject responds. Although such a design is quite reasonable as an idealization of the subject's natural environment, it encounters two quite serious difficulties that have greatly limited its use. The first is that under an exponential interstimulus interval the most probable time between signals is zero, which of course is impossible to realize since signals must be of some duration in order to be detected.
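To see how often a strict Poisson schedule produces intervals too short to realize, the following minimal Python sketch may help. It assumes a hypothetical presentation rate and signal duration (neither value comes from any experiment discussed here) and compares the theoretical probability of an interstimulus interval shorter than the signal, 1 - exp(-rate * duration), with the proportion found in one simulated hour.

```python
import math
import random


def short_isi_probability(rate_per_sec, min_gap_sec):
    """P(ISI < min_gap_sec) when interstimulus intervals are exponential."""
    return 1.0 - math.exp(-rate_per_sec * min_gap_sec)


def poisson_signal_times(rate_per_sec, duration_sec, rng):
    """Signal onsets on [0, duration_sec) from a constant-hazard schedule."""
    times = []
    t = rng.expovariate(rate_per_sec)
    while t < duration_sec:
        times.append(t)
        t += rng.expovariate(rate_per_sec)
    return times


rng = random.Random(0)
rate = 0.2        # hypothetical: one signal every 5 sec on average
signal_len = 0.5  # hypothetical signal duration in sec

onsets = poisson_signal_times(rate, 3600.0, rng)
gaps = [b - a for a, b in zip(onsets, onsets[1:])]
observed = sum(g < signal_len for g in gaps) / len(gaps)

print("theoretical:", short_isi_probability(rate, signal_len))
print("simulated:  ", observed)
```

With these illustrative values the theoretical proportion is about 0.095, so roughly one interval in ten would be shorter than the signal itself.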
So the experimenter cannot avoid deviating somewhat from the strict Poisson process whenever a very short interstimulus time arises. But even if that can be dealt with in a satisfactory manner—for example, by dropping the hazard function of signal presentations to zero for the duration of the signal plus a short additional time in order to make the signals distinct—there is still a concern. The subject's response to a signal may well differ when that signal is isolated from other signals as compared
to when it is immediately preceded by one whose processing is still underway. As we shall see, this is quite a serious matter. The second difficulty is that there is no sure way to establish correspondences between the response-time series and the signal one. Which responses go with which signals? Sometimes it is obvious that a response cannot possibly have been due to any experimental signal because none has occurred for some seconds before the response; but in other cases matters are not as clear. Examples are a signal 700 msec before the response and two signals one 500 msec and the other 150 msec before the response. In the former case, was that response actually due to that signal, or was the signal missed and the response technically a false alarm in the sense that it was initiated by noise? In the second case, was the response a slow one to the first signal, perhaps slowed additionally by the interference of the second one, or was the first signal missed or ignored and the response actually a fast one to the second signal? The fact is that we really do not know how to answer such questions unequivocally, and one is forced to explore various models of what is going on. The structure of the chapter is this: In Section 5.2 I examine briefly the earliest literature, which goes under the name of vigilance, in which the signals occur quite infrequently. One of the primary concerns is the manner in which performance deteriorates with the duration of the "watch." That term arises because this work, which began during World War II, was largely motivated by military applications having to do with radar and sonar watches. The fact that the signal rate is very low largely eliminates the problem of two signals being very close together and so that difficulty can be ignored; but the problem of associating responses to signals remains. 
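The correspondence problem can be made concrete with a small sketch. It assumes, purely for illustration, that any signal falling within a fixed window before a response is a candidate cause of that response; the 2-sec window, the function names, and the event times are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def candidate_signals(signal_times, response_time, window):
    """Signals (sorted onset times, sec) within `window` sec before the response."""
    lo = bisect_left(signal_times, response_time - window)
    hi = bisect_right(signal_times, response_time)
    return signal_times[lo:hi]


def classify(signal_times, response_time, window):
    """Label a response by how many signals could have produced it."""
    n = len(candidate_signals(signal_times, response_time, window))
    if n == 0:
        return "no candidate signal"
    if n == 1:
        return "one candidate signal"
    return "several candidate signals"


signals = [3.0, 9.5, 9.85]  # hypothetical onsets in sec
WINDOW = 2.0                # hypothetical association window in sec

# Two signals, 500 and 150 msec before a response at 10.0 sec.
print(classify(signals, 10.0, WINDOW))
# One signal 700 msec before a response at 3.7 sec.
print(classify(signals, 3.7, WINDOW))
# No signal in the 2 sec before a response at 8.0 sec.
print(classify(signals, 8.0, WINDOW))
```

Note that even the "one candidate signal" case does not settle the matter: the response may still have been initiated by noise, which is exactly the ambiguity described above.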
In Section 5.3 I take up experiments that resemble typical simple reaction-time ones in that the signal rate is relatively high; here, of course, the question of signals close together is an issue, and several attempts to deal with the two problems are outlined. Section 5.4 is devoted exclusively to studies aimed at discovering the impact that the processing of one signal has on the processing of a second signal when the second occurs very shortly after the first. Unlike the earlier experiments of the chapter, these are trial designs.
5.2 VIGILANCE AT LOW SIGNAL RATES
The major concern in the vigilance literature is with the factors that lead to the decreased performance observed when the events to be detected occur sporadically and infrequently. The basic fact is quite simply that observers detect more signals early in a watch than they do toward the end of it. To bring this into the laboratory, two major types of design have been used. One is in fact a trial design, not unlike those we have already studied, except that the probability of a signal occurring on a trial is exceedingly low. Perhaps the most famous version of this design is the clock experiment of N.
H. Mackworth (1950) in which a clock hand jumped once per second through a circle divided into 100 equal intervals. A signal was a jump of two intervals rather than one. The signal to non-signal ratio was 1:150 and the watch lasted two hours. The detection rate fell from 85% in the first half hour to 73% during the rest of the watch. Because this is actually a trial design, the methods we have previously discussed can be applied. It is, however, obvious that one can run closely analogous experiments in which the usual analysis is not applicable. An example is any design in which a signal, such as a brief duration tone, occurs every now and then without any indication of possible times of occurrence. Such a study was run by Broadbent and Gregory (1963) as part of an effort to disentangle the sources of decay in performance. They asked whether the loss is due to reduced sensory sensitivity or to some change in the tendency to make a detection response. They reported two experiments: a visual one that was a trial design and an auditory one that was trial free. In the latter the signals were 1-sec, 1000-Hz tone bursts that were 5 dB above absolute threshold. The watch was 45 min long, with (unknown to the subject) the same temporal pattern of presentations during each 15 min subinterval. In one condition a total of 18 signals occurred and the times between them varied from 25 to 299 sec. In the other, 72 signals were spaced from 4 to 153 sec. The subjects gave confidence ratings on a four-point scale. The authors wished to analyze the data in terms of the concepts of signal detection theory—sensitivity and response bias. There was little difficulty in doing this for the trial design since one could clearly identify both signal and nonsignal trials, but in the auditory experiment nothing corresponded directly to non-signal trials. 
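For a trial design, where hit and false-alarm proportions are both available, the two signal-detection measures can be computed as in the sketch below. It assumes the usual equal-variance Gaussian model, which is an assumption of this illustration rather than something stated in the studies, and the two pairs of proportions are hypothetical, chosen only to show a shift in bias with sensitivity held fixed.

```python
from statistics import NormalDist


def dprime_and_beta(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian signal detection measures.

    d' is the separation of the signal and noise distributions in standard
    deviation units; beta is the likelihood ratio at the criterion.
    """
    std = NormalDist()
    z_hit = std.inv_cdf(hit_rate)
    z_fa = std.inv_cdf(false_alarm_rate)
    d_prime = z_hit - z_fa
    beta = std.pdf(z_hit) / std.pdf(z_fa)
    return d_prime, beta


# Hypothetical proportions: same sensitivity (d' = 2), stricter criterion later.
std = NormalDist()
early = (std.cdf(1.0), std.cdf(-1.0))   # criterion midway between the means
late = (std.cdf(0.5), std.cdf(-1.5))    # criterion moved 0.5 SD toward "no"

for label, (hits, false_alarms) in (("early", early), ("late", late)):
    d, b = dprime_and_beta(hits, false_alarms)
    print(f"{label}: hits={hits:.3f} fa={false_alarms:.3f} d'={d:.2f} beta={b:.2f}")
```

In this constructed example both the hit and the false-alarm proportions fall, d' stays at 2, and beta rises; a drop in the detection rate alone therefore cannot distinguish a loss of sensitivity from a stricter criterion.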
Their solution was to suppose that the subject covertly divides time into intervals, comparable in size to the duration of the signals, and so renders the situation into a series of Yes-No detection trials. This permitted them to calculate a false alarm rate. Using the confidence ratings they constructed an ROC curve in the usual way, and so calculated the measures $d'$ of sensitivity and $\beta$ of response bias. Their finding was that performance deteriorated from the first third to the last third of the watch because $\beta$ increased whereas $d'$ did not change significantly. This conclusion had considerable impact on theorizing about the cause of the diminished performance. For a detailed summary of these and numerous other studies about vigilance, see Davies and Parasuraman (1982), Davies and Tune (1969), Mackie (1977), J. F. Mackworth (1969, 1970), and Stroh (1971). Although efforts have been made to show that selection of the time interval in this analysis little affects the conclusions, there clearly is something terribly artificial about introducing it. We have no reason whatsoever to suppose that subjects actually reduce ordinary detection tasks—either in life or in the laboratory—into a series of discrete judgments, and there are surely reasons to think otherwise. The signals that one constantly contends with come in all sizes, magnitudes, and durations with no prior indication about the type with which one will have to deal, and so it is difficult to
imagine how an appropriate unit of time can be chosen. Such an assumption appears to be more a matter of convenience for data analysis than a serious theory about detection behavior. It probably results from the general discomfort psychologists have evidenced about dealing with probabilities in continuous time—that is, data as time series and models that are continuous stochastic processes. As we shall see, the study of such processes is not without its complications; nonetheless, that seems to be the natural formulation of experiments in which signals occur at irregular, perhaps random, times.
5.3 VIGILANCE AT HIGH SIGNAL RATES
5.3.1 Pseudo-Random Presentations and Pseudo-Hazard Functions
To my knowledge, the first experiment in which weak signals were presented at irregular times but at a relatively high rate was Egan, Greenberg, and Schulman (1961). They spoke of the method as "free responding," but I am more inclined to emphasize the irregularity of the signal presentations than that of the responses. The temporal structure of the design consisted of inter-stimulus intervals of 3.5, 5.5, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 13.5, and 15.5 sec, which were selected according to a uniform distribution. As we know (Section 1.2.3) this means a rising hazard rate governing the time of the next presentation. In their second experiment the signals were 1000-Hz pure tones in a background of noise with signal-to-noise ratios of 9, 11, and 13 dB. Responses, which the subject was free to make at any time a signal was thought to have occurred, were recorded only to an accuracy of 133.3 msec. Each of four subjects was run under three instructions about the criterion to use: strict, medium, and lax. The procedure of data analysis was, first, to eliminate all interstimulus intervals (ISIs) of 3.5 and 5.5 sec, and then to estimate the density of stimulus-response times conditional on neither a response nor a signal intervening.
To be quite explicit, what they did was partition time into 133.3-msec bins, and then they counted the total number of responses whose time since the preceding signal fell into each bin. Of course, this count excludes all ISIs shorter than that bin, which makes the estimated distribution conditional on no signal intervening, and by the very way the estimate is constructed it deals with the first response following the signal. Thus, it is a kind of hazard function, unusual in its conditioning on no intervening signal. The type of result this leads to is shown in Figure 5.1. We see three major features: The response pseudo-hazard function exhibits a sharp peak at about 1 sec, which appears to have been caused by the signal, after which there is a long, flattish tail that resembles a Poisson process of false alarms. Both the height of the peak and the level of the false alarm rate increase with the laxness of the criterion, and the height of the peak increases systematically with signal strength.
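The binning procedure just described can be sketched as follows. This is one reading of the method, not a reconstruction of the original computation: in particular, dividing each bin's count by the number of signal-to-signal intervals still "open" at the start of the bin (no response and no new signal yet) is an assumption made here to turn the counts into a hazard-like proportion. The function name and the toy event times are hypothetical.

```python
from bisect import bisect_right


def pseudo_hazard(signal_times, response_times, bin_width, n_bins):
    """Per-bin proportion of first responses after a signal, among intervals
    in which neither a response nor the next signal preceded the bin."""
    events = [0] * n_bins    # first responses landing in each bin
    at_risk = [0] * n_bins   # intervals still open when each bin starts
    for s, s_next in zip(signal_times, signal_times[1:]):
        isi = s_next - s
        j = bisect_right(response_times, s)   # index of first response after s
        latency = response_times[j] - s if j < len(response_times) else float("inf")
        stop = min(latency, isi)              # interval closes at response or next signal
        last = int(stop // bin_width)         # bin in which the interval closes
        for k in range(min(last, n_bins - 1) + 1):
            at_risk[k] += 1
        if latency < isi and last < n_bins:
            events[last] += 1
    return [e / n if n else None for e, n in zip(events, at_risk)]


# Toy data (sec): three usable signal-to-signal intervals, 0.5-sec bins.
signals = [0.0, 10.0, 20.0, 30.0]
responses = [1.2, 11.3, 27.0]
hazard = pseudo_hazard(signals, responses, bin_width=0.5, n_bins=8)
print(hazard)
```

In the toy data two of the three intervals end with a response between 1.0 and 1.5 sec after the signal, so the third bin carries the whole peak and the later bins, fed only by the slow third response, stay at zero.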
FIG. 5.1 For a free-response, auditory detection experiment, histogram of the times between a signal and the next response conditional on no intervening signal. The distribution of times between signals was equally likely on 13 values. There were from 240 to 640 observations per condition for each subject. This is one of four subjects run. [Figure 12 of Egan et al. (1961); copyright 1961; reprinted by permission.]
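Because the 13 interstimulus intervals were equally likely, the hazard governing the next presentation rises over the interval. A short calculation, sketched here with the ISI values given in the text, shows this; the function name is hypothetical, and the hazard at each value is taken to be the probability that the signal occurs at that ISI given that it has not occurred earlier.

```python
from fractions import Fraction

# The 13 interstimulus intervals (sec) listed in the text, each equally likely.
ISIS = [3.5, 5.5, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 13.5, 15.5]


def discrete_hazard(values):
    """Return (value, hazard) pairs with hazard = P(ISI = x | ISI >= x)."""
    ordered = sorted(values)
    n = len(ordered)
    # With k values already passed, the remaining n - k are equally likely.
    return [(x, Fraction(1, n - k)) for k, x in enumerate(ordered)]


for x, h in discrete_hazard(ISIS):
    print(f"{x:5.1f} sec  hazard = {h}")
```

The hazard climbs from 1/13 at 3.5 sec to 1 at 15.5 sec, so a subject who has waited a long time can be increasingly sure that a signal is imminent.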
To my mind, the approach taken in this experiment and its analysis is in the correct direction in that it does not attempt an artificial partitioning into trials. It has, however, two drawbacks. First, the fact that the signals are not really random in time is worrisome since, as in the simple reaction-time experiment, this may induce the subject to modify his or her response strategy as a function of the time that has elapsed since the last response.
And second, although the conditional density function that was estimated is plausible, we do not really know what to make of it. There is, after all, no theoretical reason to expect it to have any particular form, and there are other ways one might choose to analyze the data. The next group of studies attempted to overcome both of these problems; but as we shall see their design in fact generated another problem that has never been successfully overcome.
5.3.2 Signal Queuing Model for Random Presentations
Consider idealizing the signals as points in time and assume that they are presented according to a Poisson schedule with hazard rate $\lambda$. Suppose that when a signal is presented the subject detects it with probability $q$. Moreover, suppose that the noise in the situation generates false alarms and that this too is random in time with a constant hazard rate $\nu$. Then the overall process of the events that trigger responses is Poisson with the hazard rate $\eta = q\lambda + \nu$. Further, once a response has been triggered, we suppose that there is some delay until the response is executed, and that this is a random variable $R$ with density function $r$. The only assumption we shall make about $r$, one that was previously used in Section 3.2.2, is that the random variable in question is bounded from above; that is, for some value $\tau$,
$$r(t) = 0 \quad \text{for } t > \tau. \tag{5.1}$$
In practice, it seems safe to assume that all response processes are completed within 2 sec. Up to this point the model seems relatively unexceptional, but there is a problem: what happens if a response that is triggered at time t is not completed before another signal or pseudo-signal due to noise intervenes? What sort of interaction arises between the two signals and their corresponding responses? Here we are speaking of effects lasting several hundred milliseconds, and so they are not simple stimulus masking effects that are measured in tens of milliseconds. Such interaction is a serious matter when signal events follow a Poisson schedule because it is not uncommon for two to occur close together. Obviously, it is an empirical matter to uncover the nature of the interaction, and one can follow either of two strategies. The most obvious strategy—and, it turns out, the most successful—is to isolate experimentally the problem of responding to two signals as a function of the inter-stimulus time and to study it in isolation. This procedure I discuss in the next section. The other alternative is to make guesses as to what might happen, selecting them in part so that the mathematics of the theory can be developed, and deriving predictions about what is to be expected in the data. This is what Luce (1966), Green and Luce (1967), and Luce and Green (1970) attempted, and I recount these developments briefly. In the first two papers listed, they postulated a lock-out model in which
any signal that occurred during the completion of a response process was simply ignored, or locked out. This assumption—invoked in classical models of certain Geiger counters—coupled with the previous assumptions, makes it possible to derive a number of unconditional and conditional density functions. In the 1970 paper they also considered three other models. One, called response suppression, assumed that each signal event—either a signal detected or noise event perceived as a signal—initiates a response process, and a new process suppresses an ongoing one. A second was again a lock-out model, but with the more realistic assumption that the signals are of short but non-negligible duration. And the third was a queuing model in which each signal initiates a response process provided that none is under way, and when one is underway the signal is stored in a queue until the response mechanism is free, at which point it initiates the process. The last model is of interest since it is one of the major proposals that has evolved from empirical studies of responses to pairs of signals presented in close succession. This topic is discussed in the next section. I therefore present here the queuing model for random signal presentations. Consider, first, just the density between two responses, call it $f_1$. This is the sum of two terms, one arising if there is a signal in the queue and the other if there is none. When a signal is in the queue, the density of responses is just $r(t)$. If not, the density is simply the convolution of the density of the stimulus event, which is exponential with parameter $\eta$, and the response density $r$. The queuing assumption is implicit since while a second stimulus event can occur in the interval between the first signal and its response, we do not set up a race between the two response processes as leading to the next response. Let $p$ = probability that a signal is in the queue.
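The queuing model just described can be explored by simulation. The sketch below is only an illustration under stated assumptions: the triggering rate and the uniform distribution of the response delay are hypothetical, and the model is treated as an ordinary single-server queue with Poisson arrivals. It estimates how often a triggering event is already waiting when a response is completed, compares that with the standard single-server utilization (rate times mean delay), and recovers the triggering rate from interresponse times that exceed the bound on the delay, whose excess over the bound is exponential under these assumptions.

```python
import random


def simulate_queue(eta, n_events, draw_delay, rng):
    """Single-server queue: Poisson triggering events, one response at a time.

    Returns (fraction of events that had to wait, interresponse times).
    """
    arrival = 0.0
    prev_done = 0.0
    waited = 0
    irts = []
    for i in range(n_events):
        arrival += rng.expovariate(eta)      # next triggering event (signal or noise)
        start = max(arrival, prev_done)      # a queued event waits for the mechanism
        done = start + draw_delay(rng)       # response completed after delay R
        if i > 0:
            waited += arrival < prev_done    # event was in the queue at the last response
            irts.append(done - prev_done)
        prev_done = done
    return waited / (n_events - 1), irts


ETA = 1.0             # hypothetical triggering rate (events per sec)
LOW, HIGH = 0.2, 0.6  # hypothetical bounds on the response delay R (sec)

rng = random.Random(1)
p_hat, irts = simulate_queue(ETA, 100_000, lambda g: g.uniform(LOW, HIGH), rng)

mean_delay = (LOW + HIGH) / 2
excess = [t - HIGH for t in irts if t > HIGH]  # IRTs longer than the bound on R
eta_hat = len(excess) / sum(excess)            # exponential rate estimated from the tail

print("fraction waiting:", p_hat, " rate x mean delay:", ETA * mean_delay)
print("rate recovered from the tail:", eta_hat)
```

The design choice worth noting is that only interresponse times beyond the bound are used for the rate estimate; shorter ones mix queued and unqueued cases and depend on the unknown delay density.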
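If $\eta E(R) < 1$, then it can be shown that, asymptotically, $p = \eta E(R)$ (Cox & Smith, 1961, p. 53). Thus we see,
$$f_1(t) = p\,r(t) + (1-p)\int_0^t \eta e^{-\eta(t-x)}\, r(x)\,dx. \tag{5.2}$$
Observe that by invoking the boundedness of the response process, Eq. 5.1, we obtain
$$f_1(t) = (1-p)\,\eta e^{-\eta t} R_\eta, \quad t > \tau, \tag{5.3}$$
where we have defined the quantity
$$R_\eta = \int_0^\tau e^{\eta x}\, r(x)\,dx. \tag{5.4}$$
So, by Eq. 5.3, the tail of this distribution beyond time $\tau$ is predicted to be exponential with the time constant $\eta$, which provides one way to estimate $\eta$. Another observable quantity is the probability of a response occurring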
within time $\tau$ of the previous response, which is given by integrating Eq. 5.3:
$$\Pr(\mathrm{IRT} \le \tau) = 1 - (1-p)\,R_\eta\, e^{-\eta\tau}. \tag{5.5}$$
A second observable density is of the interresponse times conditional on there being no signal within the time between the two responses; call this $f_2(t)$. Note that by the conditioning, the second response must have been initiated either by a signal in the queue or, if none exists, then by a noise event. Place the first of the two responses at the origin and let $X_S$ and $X_R$ denote the random variables representing the time until the next signal and the next response, respectively. Keeping in mind that the signals are governed by a Poisson process, and so occur quite independently of the response processes, we see that
We evaluate each of the three terms:
where $R_\nu$ is defined by Eq. 5.4 with $\nu$ substituted for $\eta$.
Substituting the two preceding expressions, simplifying, and integrating by parts yields
Substituting these three expressions in Eq. 5.6 yields as the tail of the distribution
Integrating this from $\tau$ to $\infty$ and subtracting from 1 yields the probability of an IRT with no intervening signal and not longer than $\tau$:
Without going into the mathematical details, one can derive several other distributions. For example, the density of times from a signal to the next response conditional on there being no observable event—either signal or response—in the $\tau$ sec before the signal can be shown to satisfy
And the corresponding probability of such times being less than $\tau$ is
A fourth density is for the signal response time conditional on both no observable event $\tau$ sec prior to the signal and no additional signal between the given one and the response. This is, in essence, the one calculated by Egan et al. (1961) except, of course, their presentation schedule was not Poisson. This density, for $t > \tau$, is given by
The corresponding probability is
A special case of this model is the lock-out one, which is the queuing model with $p = 0$. In this model signals that arrive during the processing of other signals are simply lost because there is no queue for them to enter. This was the model tested by Green and Luce (1967) and in an improved replication by Luce and Green (1970). We ran studies with brief, difficult-to-detect pure tones in a noise background. Using $\tau = 2$ sec, which seems quite conservative, we estimated $\eta$ from the tails of densities 1 and 3 (Eqs. 5.3 and 5.9) and $\nu + \lambda$ from densities 2 and 4 (Eqs. 5.7 and 5.11). Since $\lambda$ is known—it is selected by the experimenter—$q$ and $\nu$ may be calculated. The probability expression for density 2, Eq. 5.8, was then fit to the data for that case in order to estimate $R_\eta/R_\nu$. Thus, all of the parameters entering into the probability expression for density 4, Eq. 5.12, are known, and so it can be predicted. This is shown in Figure 5.2; it is abundantly clear that the lock-out model is not satisfactory. The question remains whether the queuing model is more adequate. A direct test was not carried out and the data are no longer available, so we will have to follow the indirect argument of Luce and Green. According to Eq. 5.8, assuming the lock-out model instead of the queuing one causes a misestimate of $R_\eta/R_\nu$ by a factor of
Note that $B - 1 = p(\nu + \lambda)/(1 - p) > 0$ and so $B > 1$. If we let
then the error in estimating the probability in Figure 5.2 is, according to Eq. 5.12,
$$
\begin{aligned}
(1-q)\,e^{-(\nu+\lambda)\tau}\left[\frac{1}{1+A} - \frac{1}{1+AB}\right]
&= (1-q)\,e^{-(\nu+\lambda)\tau}\,\frac{A(B-1)}{(1+A)(1+AB)} \\
&< (1-q)\,e^{-(\nu+\lambda)\tau}\,A(B-1) \\
&< (1-q)\,q\,e^{-(\nu+2\lambda)\tau}\,(\lambda/\eta)\,(\nu+\lambda)\,[p/(1-p)],
\end{aligned}
$$
FIG. 5.2 Observed proportion of a signal-response interval that is not preceded within 2 sec by a signal or a response and during which no second signal occurs versus the predicted value for that probability in the lock-out model. The model and method for estimating the parameters is described in the text. There are between 100 and 400 observations per data point. [Figure 3 of Luce and Green (1970); copyright 1970; reprinted by permission.]
where we have used the fact that $R_\eta/R_\nu < e^{q\lambda\tau}$. In the data,
about the model is wrong, but studies discussed below indicate that something more than simple queuing is involved. As none of these models resolved the difficulties of signals close together in the Green and Luce data, we had two options. One, which we took, was to try to eliminate the source of the problem by changing the experimental design. We were led away from vigilance experiments as such and turned to simple reaction time designs in which the foreperiod was exponential (this has been discussed in Sections 2.4.3 and 4.1.3). The other strategy, which was underway much earlier in Great Britain and Australia in connection with tracking and skilled motor skills, was to make a direct frontal attack on the question of what subjects do when two signals occur close together. I turn to this next.
5.4 INTERACTION OF TWO SIGNALS IN CLOSE SUCCESSION
5.4.1 The Major Phenomenon: Psychological Refractoriness
Telford (1931) was one of the first to report an interaction in the response times to two signals that are presented in rapid succession, and he spoke of a "refractory phase" following the occurrence of each signal. This term was invoked because he thought the phenomenon to be one of recovery, somewhat analogous to the refractoriness of nerves; however, careful work in the 1950s made clear that this is really a very poor analogy. Among other things, neural refractoriness is measured in msec whereas behavioral interference lasts as long as a few hundred msec. As a result, the phenomenon is no longer called that but rather the "psychological refractory period," still a potentially misleading term. Several surveys have been published on this topic, among them Bertelson (1966), Herman and Kantowitz (1973), Kahneman (1973), Kantowitz (1974), Smith (1967c), and Welford (1980c). Much of the work goes considerably beyond issues of detection, ranging into rather more complex cognitive tasks. For example, Posner (1978) has extensively used a probe technique in which a stimulus is presented and some more-or-less difficult cognitive task is begun. But on some trials after the main task is underway a probe stimulus is presented, and the subject is to respond to the probe as rapidly as possible. This procedure is obviously related to the topic of the chapter. To encompass this more general class of experiments, Kantowitz uses the term "double stimulation." In contrast, in this chapter I shall take a relatively narrow view of the matter and concentrate on simple detection and the simplest of identification designs. Since the empirical literature has been covered quite thoroughly in these reviews, I shall not attempt to deal with all of its nuances here. The situation in this literature, and we shall see much the same thing again in Section 6.5 when we examine sequential effects in the choice reaction-time context, is
agreement about the major outlines of the phenomenon but with considerable disagreement about some of the fine details. A good deal of the motivation for the work was not simple reaction times as such or even choice ones, but rather more complex tasks involving more-or-less skilled motor activities. The work began with an inquiry into the nature of continuous tracking, such as arises in the control of various vehicles, and more recently has focused on extremely rapid discrete activities such as typing or piano playing. I shall focus my attention entirely on the interaction that is relevant to reaction times and avoid the temptation to enter into a discussion of the literature on skilled motor activity. The interested reader should examine the following references and those listed there: Cooper (1983), Stelmach (1978), and Stelmach and Requin (1980). Let us turn now to psychological refractoriness itself. An early study, following on the pioneering work of Craik (1947, 1948), is that of Vince (1948) in which the subject viewed a rotating drum through a narrow slit whose width corresponded to about 50 msec. On the drum was a line that mostly was constant, but every now and then changed, either up or down, by 2.5 cm. These changes were described as occurring at random, although I doubt if they were programmed as a Poisson process; rather, it appears that a limited set of ISI intervals was chosen and these were equally likely to be selected. The subject controlled a lever that was to be moved as rapidly as possible in the direction of the change. The data of interest were comparisons of reaction times to successive changes in opposite directions as a function of the time between them, the ISI. She found that for an interval of 50 msec the MRT to the first change was 290 msec (to an accuracy of 10 msec), whereas that to the second was 510 msec, some 220 msec more.
As the ISI was increased the magnitude of the effect decreased until at 500 msec there was no noticeable difference. A number of questions can be raised about the source of the slowing. For one, the subject had to reverse very rapidly the movement of the hand, and so the effect could simply have been a motor limitation. Also, it is certainly conceivable that there is some sort of peripheral sensory interaction, something of the sort suggested by Telford. Davis (1956, 1957, 1959), Fraisse (1957), and Welford (1952, 1959) ruled out these possibilities and made clear that the phenomenon is probably central in origin. For example, Davis (1959) ran four conditions involving a visual stimulus first and either a visual or an auditory one second. The first visual stimulus was a 40-msec flash of a neon bulb just left of center and at the end of a 3-ft tube. The second visual one was a similar light just right of center at the end of the same tube, and the auditory one was a highly audible click. If a response was required to the first stimulus, it was made by pressing a key with a finger of the left hand, and the response to the second signal was always with a right finger. Four conditions were run: either a visual or an auditory second signal and either a response to each signal or just to the second one. The two visual conditions were interleaved as were the two auditory ones. The data are shown in Figure 5.3.
FIG. 5.3 MRT to the second of two signals versus interstimulus interval (ISI) for the Davis (1959) experiment described in the text. The first signal was visual. The data in the first row are for a visual second signal and in the second row for an auditory one; the data in the left panels arose when the subject responded to both signals and in the right, when the subject responded only to the second one. There were 40 observations per data point. [Figures 1 and 2 of Davis (1959); copyright 1959; reprinted by permission.]
Four aspects of these data are striking: The psychological refractory period was complete by 250 msec, which seems to be the pattern of most of the later studies; its existence did not depend upon the two stimuli being in the same modality; it did not arise through some conflict of the response mechanisms since they were carried out by different hands [this was also ruled out by Welford (1959) using a similar response procedure]; and it depended very little upon any response being carried out to the first stimulus [which was also shown by Fraisse (1957) and Smith (1967b)]. The impact of the first signal on the response to the second depended to some degree upon the repeated alternation of conditions in which response to the first signal
was required. This was established by running the same subjects some two months later in the visual-auditory case with responses only to the auditory signal and finding that they were substantially faster than they had been. Earlier, Davis (1956, 1957) addressed the contention that using the two hands to respond might involve the two hemispheres of the brain, thereby introducing a communication delay between them, by having the two responses occur on one hand with different fingers. He found that this made little difference. These then are the major features of the phenomenon about which there is considerable consensus. We turn now to some of the attempts to understand theoretically what is involved and to some of the less consistent findings.
5.4.2 Signal Queuing: The Single-Channel Hypothesis
Welford (1952, 1959) is the author of the single-channel hypothesis, which he summarized carefully in his 1980 review. In its simplest form, it says that if the second signal arrives while the processing of or responding to the first signal is underway, then some buffered representation of the second signal enters a queue and awaits the completion of the response to the first signal. At that time it is processed in the usual fashion. Thus, if I denotes the inter-stimulus interval (ISI) and Ti is the undisturbed random variable of response time to signal i, i = 1, 2, then the processing of signal 2 is delayed by the amount T1 − I, if that time is positive, and is not delayed otherwise. Thus, the observed time to the second stimulus, T'2, is predicted to be
$$T_2' = T_2 + \max(T_1 - I,\, 0). \tag{5.13}$$
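The behavior of this queuing rule is easy to explore numerically. The following is a minimal Monte Carlo sketch, not an analysis from the literature: it assumes that the second response is delayed by max(T1 − I, 0), and it assumes, purely for illustration, that T1 and T2 are independent displaced exponentials with a 150 msec displacement and a 50 msec exponential mean. The function name and all parameter values are hypothetical.

```python
import random


def simulate_second_rt(isi_ms, n=20000, shift_ms=150.0, mean_ms=50.0, seed=1):
    """Mean observed RT (msec) to signal 2 under the queuing rule.

    Assumed rule: T2' = T2 + max(T1 - ISI, 0).
    Assumed distributions: T1 and T2 are independent displaced
    exponentials, shift_ms + Exp(mean = mean_ms). These values are
    illustrative only, not estimates from any data set.
    """
    rng = random.Random(seed)  # same seed -> same draws for every ISI
    total = 0.0
    for _ in range(n):
        t1 = shift_ms + rng.expovariate(1.0 / mean_ms)
        t2 = shift_ms + rng.expovariate(1.0 / mean_ms)
        total += t2 + max(t1 - isi_ms, 0.0)  # signal 2 waits for signal 1
    return total / n


if __name__ == "__main__":
    for isi in (0, 50, 100, 150, 200, 300, 500):
        print(f"ISI = {isi:4d} msec   mean T2' = {simulate_second_rt(isi):7.1f} msec")
```

Because the same seed is reused at every ISI, the drop in the simulated mean between two ISIs that both lie below the displacement equals the difference in ISI, which is a slope of −1. At long ISIs the mean settles at the undisturbed mean of T2.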
On a priori grounds, the model has at least two defects. First, if we think of simple reaction times as composed of three additive components (the conduction time C, decision time D, and response activation time R), then queuing should arise only if the decision time for T1, D1, overlaps that for T2. If we assume C1 and C2 are identically distributed, which is plausible if both signals are in the same modality and of comparable magnitude, then the model really should read
$$T_2' = T_2 + \max(D_1 - I,\, 0).$$
Since D1 ≤ T1, the observed T'2 is thus predicted to reach its asymptotic value well before I exceeds T1. As we shall see, this result does not happen, suggesting that the a priori model is defective. Welford (1959) formulated a variant in which a signal that arrives during D1 is stored until that process is completed, whereas a signal that arrives during R1 is stored until the response process is completed. Second, it is assumed that the act of storing
Detection of Signals Presented at Irregular Times
the signal in the queue has no effect on the subsequent reaction time or on the quality of performance. This is tantamount to assuming no degradation of the representation as a result of its being stored and that it takes no time to retrieve the representation from the queue. Ignoring these criticisms, we work with Eq. 5.13. Welford typically has substituted E(T)s for the Ts in Eq. 5.13, but that is incorrect and potentially misleading. Following Ollman (1968), who was more general in that he treated I as an independent random variable, we compute E(T'2) exactly. In doing so, we do not need to assume that T1 and T2 are independent. Let Y = max(T1, I), F the distribution function of Y, F1 that of T1, and G that of I, which when I is constant is simply
$$G(x) = \begin{cases} 0, & x < I, \\ 1, & x \ge I. \end{cases}$$
Since T1 and I are independent, we see F(x) = F1(x)G(x). Using this relationship, assuming that density functions exist, and carrying out an integration by parts,
Thus, setting n = 1
Observe,
As an example of how Eq. 5.14 behaves, suppose f1 = f2 is a displaced exponential
Then it is easy to calculate the integral in Eq. 5.15, yielding
where m = 1/fx. Observe that by differentiating Eq. 5.15 with respect to I, It is entirely possible for the minimum slope to be achieved as follows. Suppose that the density of T1 lies entirely between two bounds, say, a and b; that is, then from Eq. 5.14 we see that
So, the model predicts that E(T'2) will initially decrease with slope −1, provided I < a, and the slope will gradually increase until E(T'2) asymptotes at E(T2). Moreover, since E(T'2) is decreasing, Eq. 5.16 implies
Before looking into this empirically, we deduce two other properties of the model. In the following calculations for the variance of T'2 and for its correlation with T1, we assume that T1 and T2 are independent random variables. Then, using Eq. 5.14,
Taking the derivative with respect to I,
Thus, V(Y) is decreasing, so the maximum occurs for I = 0, whence by Eqs. 5.14 and 5.20,
where
So the correlation is given by
Again the limiting cases are interesting. Note that for I < a, V1(I) = V1 and E1(I) = 0, and for I > b both quantities are 0, so
Observe that if V(T1) = V(T2), which is certainly plausible when the two signals are identical, then for I < a the correlation is 1/√2, or about 0.71.
5.4.3 Data Relevant to the Single-Channel Model
The model yields at least six major predictions.
1. For sufficiently short ISI, the slope of E(T'2) versus ISI is −1 (Eq. 5.18).
2. For sufficiently long ISI, E(T'2) approaches E(T2), the simple reaction time to signal s2 (Eq. 5.17).
FIG. 5.4 MRT to the second of two signals versus ISI when the first signal was selected at random from among 1, 2, or 5 digits and the subject was to identify the digit presented and the second signal was one of two tones that was also to be identified. See the text for discussion. Each data point is based on 360 observations, 90 each from four subjects. [Figure 1 of Karlin and Kestenbaum (1968); copyright 1968; reprinted by permission.]
3. For I
well sustained in the data. For example, in all of the curves shown in Figures 5.3 to 5.5, there is only one minor discrepancy: subject D in the upper left of Figure 5.3 has a slope slightly less than −1. All of the others are either approximately −1 or larger. It is not surprising that the slopes are larger than −1 since all ISIs are well above 0. Other data not shown here are also consistent with this prediction (Welford, 1980c).
2. Figures 5.3 to 5.5 suggest that for larger ISIs T'2 may approach simple reaction times since the values achieved are in the typical range. However, there has been no direct verification of that fact. If we were to discover an effect of s1 on T2 for such long ISIs, it would have consequences for running simple reaction-time experiments in order to avoid unwanted sequential effects. In Figure 5.5, the asymptote is about 84 msec slower when s2 involves a choice than when it does not; as we shall see in Section 6.1, this is typical of choice data. What is quite surprising, however, is the fact that this difference disappears as the ISI becomes shorter. This is clearly inconsistent with Eq. 5.13, and it suggests that something else is involved for short ISIs.
3. Smith (1967a) manipulated T1 by varying the intensity of s1, which was a light flash, and she found a systematic change in MRT2 as a function of ISI except, possibly, for the shortest value of ISI, 60 msec. So, except for the shortest ISI, the data are consistent with the model.
4. Ollman (1968) reanalyzed the data of Davis (1959) and found that while the means satisfy the predicted inequality, the variances definitely do
FIG. 5.5 This figure is the same as Figure 5.4 except that the numbers of alternatives are 1 and 2, in all four pairings with first and second signal position. See the text for discussion. [Figure 3 of Karlin and Kestenbaum (1968); copyright 1968; reprinted by permission.]
FIG. 5.6 MRT to the first signal in the experiment of Figures 5.4 and 5.5. [Figure 4 of Karlin and Kestenbaum (1968); copyright 1968; reprinted by permission.]
not. In all four cases, the estimate of V(T'2) was at least four times as large as that of V(T2), and in one of them it was larger by a factor of 20. This clear violation of the model seems largely to have been ignored in the secondary sources.
5. The effect of ISI and s2 on MRT1 is shown in Figure 5.6 for the Karlin and Kestenbaum data. ISI does not appear to affect T1, but whether s2 entails a choice or not definitely matters. This result is not predicted by the single-channel model. Ollman (1968) pointed out that a simple reaction-time experiment with a warning signal can be viewed as a special case of this two-signal paradigm in which the foreperiod is the ISI and no response is made to the first signal. Nickerson (1965b) reported such data in which MRT was examined as a function of the family of foreperiods used in different conditions. The model says that T2 should depend only on the actual foreperiod, not the family. This was not the case. Although this finding appears inconsistent with the model, it may very well not be terribly telling if the subject is controlling a response criterion that is changed with foreperiod family in order to keep the level of anticipations below some small proportion.
6. For an ISI of 50 msec, Davis (1959) reported an estimate of ρ = 0.77. Welford (1967, p. 20) reported a correlation of 0.473 for all cases of his 1959 data for which ISI < MRT1; however, Welford (1959, p. 205) seems to say that the correlation was zero for short ISIs, but he is not very specific about it. Karlin and Kestenbaum (1968) report correlations for each of their
four subjects in three experimental conditions for the following ISIs: 90 + 100n msec (n = 0, ..., 9), 1050, and 1150 msec. Although the individual correlations seem somewhat variable, they decrease approximately linearly from a mean of 0.69 at 90 msec to less than 0.10 for all ISIs beyond 590 msec except 790, which is 0.14. This finding appears to be consistent with Eqs. 5.22 and 5.23 but it does not inform us about what happens at very short ISIs.
In sum, this simple queuing model accounts for some of the major features of the data, but it fails on some points. Thus, almost certainly, the model does not describe all that is involved in the interaction of two nearby signals. As we shall see, a more detailed examination of the data suggests additional complications, and we know from Section 5.3.2 that the queuing model did not fit the weak signal data of Green and Luce.
Both in his 1959 paper and in the 1980 survey, Welford gave consideration to two refinements of the theory that are designed to handle some discrepancies between the model and the data. The one, which I shall not go into in detail, entails distinguishing between the effects that arise when signal 2 arrives during the perceptual processing of signal 1 and those that arise when it arrives after that has been completed and the response selection process is underway. He assumed that in the former case the processing of s2 is initiated when the perceptual processing of s1 is completed, but that in the latter case processing of s2 is not started until the response to s1 is made and there is feedback to the CNS that the response is complete. Salthouse (1982) reported an empirical study whose findings he interpreted in terms of two mechanisms: one complete at about 100 msec, which he suggested may be masking of the first signal by the second, and the other lasting to about 200 msec, which he interpreted as the second signal interrupting the central processing of the first one.
The argument for two mechanisms is the following. Signal 2 was either the same as or different from signal 1. When the signals were the same, the probability of a correct judgment was independent of ISI, but when they were different it was at chance for an ISI of 20 msec, rising to that of the other case by about 100 msec. Equally well, the duration of the same case was constant and that of the different one was some 100 msec slower at an ISI of 20 msec; it fell with increasing ISI, but did not reach the other value until an ISI of about 200 msec. The case for two mechanisms is not compelling, however, for as we shall see in Section 8.4.2 sometimes a single mechanism can account for two quite different time courses. The other phenomenon is discussed in Section 5.4.4.
5.4.4 Signal Grouping
Studies have shown repeatedly that when the ISI is quite brief, say less than 100 msec, Welford's model breaks down, and it appears as if the two signals are sometimes treated as a single entity. If this is true and if the grouped and ungrouped cases are not separated in the data analysis, then on average T2
responses will be somewhat briefer than would be expected by the model, and this has been found (Elithorn & Lawrence, 1955; Koster & Bekker, 1967; Marill, 1957). Welford (1980c) pointed out that such grouping of nearby signals really comes as no surprise since it appears to be an essential feature of all fast, highly skilled activities such as typing or playing an instrument. Karlin and Kestenbaum questioned whether the signals can be viewed as grouped at all. They observed that if any grouping of the two signals occurs, then both response times should exhibit a comparable delay due to ISI, but their plots of MRT1 against ISI are quite constant (Figure 5.6). However, this fact does not seem relevant since their shortest ISI was 90 msec, which is at the very top of the range within which grouping is thought to occur. The most notable effect in this analysis was the fact that MRT1 depended upon the number of alternatives for s2 as well as the number for s1: when s2 has two possibilities rather than one, then MRT1 is some 30 msec slower. This result is not predicted by the single-channel model.
5.4.5 Quantized Perception
One theoretical idea that leads naturally to some sort of signal grouping was pointed out by Broadbent (1958), who raised the possibility that visual perception (it is extremely doubtful if anything comparable is true in audition) is partitioned into discrete temporal "snapshots" of the environment. Such discreteness of sensory intake over time appears to have been first proposed by Stroud (1956). These perceptual quanta, as they are called, are thought to be of the order of 100 to 300 msec in duration; they would have to be near the latter value to account for the present data. According to this view, unless ISI is quite small, in which case s1 and s2 are dealt with as a single signal, signal s1 captures the time quantum in which it occurs. Should s2 be presented during the same quantum, it must be held in a buffer until that quantum ends and a new one begins. To be quite explicit about what is assumed, suppose the quanta are of duration Q time units, that relative to the onset of a quantum, s1 is presented at a random time X that is uniformly distributed over the interval 0 to Q, and the time between s1 and s2 is I. For simplicity, assume that the processing times to the two signals are independent and identically distributed, say Y1 and Y2. Then we obtain as the model for the case where the signals are not grouped:
At least two empirical observations cast considerable doubt on this model. Observe that since Y1 is independent of both Y2 and X, it implies that T1 and T2 are independent, which as we have seen above is far from the case.
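The independence implication can be checked by simulation. The sketch below is hedged on two counts. First, the quantal rule coded here (s2 waits until the end of the current quantum, so its response time is Y2 + max(Q − X − I, 0), while the response time to s1 is simply Y1) is only one reading of the verbal description of the quantal view. Second, the displaced exponential processing times, the parameter values, and all function names are illustrative assumptions. For contrast the sketch also simulates the single-channel rule, in which the second response is delayed by max(T1 − I, 0).

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def draw_rt(rng, shift_ms=150.0, mean_ms=50.0):
    """Assumed processing time: displaced exponential (illustrative)."""
    return shift_ms + rng.expovariate(1.0 / mean_ms)


def quantal_trial(rng, isi_ms, quantum_ms):
    """One trial under the assumed reading of the quantal model."""
    x = rng.uniform(0.0, quantum_ms)        # onset of s1 within its quantum
    y1, y2 = draw_rt(rng), draw_rt(rng)
    wait = max(quantum_ms - x - isi_ms, 0.0)  # s2 buffered until quantum ends
    return y1, y2 + wait


def single_channel_trial(rng, isi_ms):
    """One trial under the queuing rule T2' = T2 + max(T1 - ISI, 0)."""
    t1, t2 = draw_rt(rng), draw_rt(rng)
    return t1, t2 + max(t1 - isi_ms, 0.0)


def correlation(trial, n=20000, seed=7, **kwargs):
    """Correlation between the two response times over n simulated trials."""
    rng = random.Random(seed)
    pairs = [trial(rng, **kwargs) for _ in range(n)]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


if __name__ == "__main__":
    print("quantal:       ", round(correlation(quantal_trial, isi_ms=50.0, quantum_ms=250.0), 3))
    print("single channel:", round(correlation(single_channel_trial, isi_ms=50.0), 3))
```

Under these assumptions the quantal rule gives a correlation near zero at any ISI, whereas the queuing rule gives a substantial positive correlation at short ISIs that fades as the ISI grows, which is the qualitative pattern reported in the data.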
The second argument is based on the data of Karlin and Kestenbaum (1968), which show that the more alternatives there are for s1, the larger must be the value of the ISI before T'2 = T2 (see Figures 5.5 and 5.6). One striking empirical fact is that the effect of the ISI on MRT2 changes with the number of alternatives, and there is nothing in the quantal theory to account for that, whereas it is exactly what one expects if the single-channel hypothesis holds and if, as is generally true (Section 6.1 and Chapter 10), MRT1 increases with the number of stimulus alternatives.
5.4.6 Perceptual and/or Response Preparation
A still different and important approach to understanding these phenomena, as well as some we consider in Section 6.5, is to suppose that subjects vary in their state of readiness to receive, process, and respond to signals. Summaries of this point of view (which is largely conceptual and has not been stated as a detailed model comparable to the single-channel hypothesis) can be found in Bertelson (1966), Smith (1967c), and Welford (1980c). The idea of preparation can be partitioned into two distinct ideas: preparation to receive the signal, which is referred to as an expectancy theory, and preparation to make the response, which is referred to as a readiness theory.
In an expectancy theory, the ISI serves as a foreperiod for the presentation of s2, and experience with the ISIs used should serve to develop an expectancy for some waits and not others. The obvious test of this concept is a design in which ISI is constant either in blocks over trials in a within-subjects design (Borger, 1963) or over subjects in a between-subjects design (Creamer, 1963). In neither case was the effect of ISI eliminated. Smith (1967c) discussed several other relevant studies.
A readiness theory also likens the ISI to a foreperiod during which the subject is developing a readiness to make the required response. As was mentioned above, Nickerson (1965b) demonstrated a significant effect of foreperiod duration on simple RT to visual signals. Perhaps the most directly relevant study is Kay and Weiss (1961) in which they formed many of the possible combinations of constant or variable foreperiod to s1, constant or variable ISI, and responses to both signals or only to s2. They found that MRT2 was longer when a response was required to s1 than when it was not, and it was longer when the foreperiod to s1 was variable than when it was constant.
Indeed, when the foreperiod is variable and the ISI is constant, MRT2 is longer than when the foreperiod is constant and the ISI is variable, which is rather different from what one expects from a readiness theory. The conclusion is that the delay is due largely to the processing of the first signal and not to the nature of the ISI per se. Welford (1980c, p. 206) remarked that the substantial correlations between T1 and T2 are not explicable on the basis of changes in uncertainty about the onset of s2. It appears to me that this claim may be incorrect. If
the level of readiness is both variable and relatively slow to change, then it should be at approximately the same level for both s1 and s2 when the ISI is sufficiently small, in which case the two response times should be correlated. The consequence of signal uncertainty would be to alter the distribution of levels, which would change but not necessarily eliminate the correlation. My argument is not valid if, however, the presentation of s1 rapidly decreases the level of readiness so that it is at a very low level when s2 arrives, and so the response time to it is slow. Of course, intuitively, one might have suspected that s1 would serve as a signal to raise, not lower, the level of readiness. The data shown in Figure 5.5 also seem to cast considerable doubt upon the readiness hypothesis because the pattern of effects due to ISI was identical in all conditions, and yet there was considerable additional impact of uncertainty about s1 on MRT2.
5.4.7 Capacity Sharing
One of the most elusive, yet plausible, ideas to arise in accounts of response times is the "mental" capacity needed to carry out a task, and the assumption that the total amount of relevant capacity is distinctly limited. Two things make the concept particularly slippery. First, the nature of capacity is rarely specified in much detail. In a few models it is identified with some parameter of the model. For example, in the model of Section 11.3.4 capacity will be the decay parameter of an exponential process, and in another model it has been identified with the sample size of the information on which a decision is based. But for the most part, it is a verbal concept whose use is little constrained by agreed-upon properties.
Second, if the network of processing stages is also treated as something unspecified, to be ascertained from the data, and if capacity can be shifted more-or-less freely from one task to another, then making decisive inferences about either the network or the capacity is really very difficult. For example, a serial system such as is suggested by the single-channel hypothesis can be mimicked easily by a parallel system with capacity sharing. These issues are developed in some detail in Sections 11.3.3, 11.3.4, and 12.2. Here I will merely report one set of data in which the single-channel hypothesis is placed into doubt, at least for tasks in two different modalities, and in which capacity considerations provide a plausible, but partial, account of what is going on.
Greenwald (1970, 1972) pointed out that some stimulus-response pairs are so overlearned and/or "natural" that they draw on far less capacity than do other pairs, even ones we may feel are quite compatible. One example involves the presentation of a spoken A or B. A compatible, but not overlearned, response is for the subject to say "one" to A and "two" to B. An overlearned response is to say "A" to A and "B" to B. The former are called SR pairs and the latter IM pairs, where IM stands for his term "ideomotor," which seems to be little specified beyond meaning overlearned
and natural. In this example, the stimulus is the same in both cases, but the response differs. In the next example, which is visual, the response is the same, moving a key to either the left or the right, but the stimulus differs. In the SR case the stimulus is a light flash to either the left or the right of a midpoint, and in the IM case the light flash is, in fact, an arrow pointing either left or right, in accord with the response desired. Greenwald assumed that the capacity used in simple IM tasks is sufficiently small so that two such tasks can be carried out simultaneously with no loss in performance, whereas that is not true for SR tasks. This assumption was confirmed in the 1972 paper.
Greenwald and Shulman (1973) extended this idea to the study of the psychological refractory period. They used the visual stimulus mentioned above for s1 and the auditory one for s2, and then ran all four combinations of SR and IM with six ISIs ranging from 0 to 1000 msec. The 0 case was the same as that studied in the 1972 paper. To their initial chagrin, the usual psychological refractory decay in MRT2 was found in all four cases, with the curves essentially the same except for overall vertical shifts, the IM-IM case being the fastest and SR-SR the slowest. Not only did the psychological refractory period remain, but they failed to replicate the 1972 result. In addition, they noted the surprising fact that MRT1 actually increased with ISI, and that the average of the two times is constant for the IM-IM condition and not for the others. These results suggested to them that, for some reason, capacity had in fact been shifting between the two tasks in the IM-IM case, contrary to what they had anticipated. They argued that perhaps this was because the instructions had not made clear that the ISI could, in fact, be 0. So the experiment was rerun with two changes.
The instructions made clear that the ISI could be 0, and two of the intermediate ISIs were dropped and a control, denoted C, of just s1 and just s2 was added. The data are shown in Figure 5.7, where we see that MRT2 is independent of ISI in the IM-IM condition and the duration of the ISI effect is much attenuated (to less than 100 msec) in the IM-SR case as compared with its full-blown development in both cases where the first task was SR. In addition, there continues to be a pronounced effect of ISI on MRT1, with significant slowing during the first 200 msec. This they attribute to capacity shifts as ISI changes.
The natural way to test capacity shift arguments is to observe the correlation between the two responses. If there is such a shift and if there is any degree of trial-by-trial variation in the amount, then it should be reflected as a negative correlation between the two response times. When no shifting is required, then a zero correlation should occur. Unfortunately, these correlations were not provided. It is unclear to me how capacity arguments are to be adapted to account for the positive correlations that others have reported, at least when both tasks are substantially the same. In Section 12.4.2, on applications of the method of additive factors, a study by Pashler (1984) is described in which it is concluded that capacity sharing is not the major source of the psychological refractory period.
FIG. 5.7 MRT to the first and second of two signals versus ISI for four conditions of stimulus-response compatibility described in the text. Each data point is based on 300 observations, 30 from each of 10 subjects. [Figure 3 of Greenwald and Shulman (1973); copyright 1973; reprinted by permission.]
Moreover, by selective manipulations of the several stages of processing thought to exist, the locus of the single-channel bottleneck is argued to exist at the decision stage, not the perceptual or response initiation stages.
5.5 CONCLUSIONS
There is not a great deal to say in conclusion. Although Poisson or other irregular temporal presentations seem far more natural than do trial designs, the problems of analysis are considerable. Either we use a Poisson schedule, which aids considerably in carrying out mathematical analyses of the resulting process but inevitably leads to the presentation of pairs of signals in close succession, or we use a modified schedule that eliminates such pairs, in which case detailed analyses appear to be quite difficult. Empirically, there is no doubt whatsoever that when two signals are close together, the response time to the second one is slowed considerably. To some extent, this appears to be a delay of the sort expected if there is a single channel and the processing of the second signal must await the
completion of the processing of the first one—a queue. But the phenomenon is more complex than that as is evidenced by various deviations from that model, including the possible grouping of the two signals when they are sufficiently close, and the possibility of shifting capacity. Unfortunately, these additional complexities are not yet understood in sufficient detail to define a proper model. For the rest of the book we deal with both experiments and models for trial designs that largely, although not altogether, bypass the possible interaction of successive signals.
II IDENTIFICATION PARADIGMS
6 Two-Choice Reaction Times: Basic Ideas and Data
6.1 GENERAL CONSIDERATIONS
6.1.1 Experimental Design
The simplest choice-reaction time experiment has much in common with the simple-reaction time design. The major difference is that on each trial one of several signals is presented that the subject attempts to identify as rapidly as is consistent with some level of accuracy. This attempted identification is indicated by making one of several responses that correspond systematically to the stimuli. In terms of the driving example, not only must dangerous obstacles be detected, but they must be identified and appropriate responses (braking, swerving, or accelerating) be made, which one depending upon the exact identity, location, and movement of the obstacle.
Formally, we shall suppose that the signal is one of a fixed set of possible signals, s1, ..., sk, and the corresponding responses are r1, ..., rk, so that the subject responds ri if he or she believes si to have been presented. One may view the selection of a signal by the experimenter as a random variable, which we denote as sn on trial n, and the response is another random variable rn. When there are only two signals and two responses, as will be true throughout this and the following three chapters, I shall avoid the subscripts and, depending upon the context, use either of two notations. In most two-choice situations, I use {a, b} and {A, B} for the names of the stimuli and corresponding responses; the generic symbols are s and r, s = a or b, r = A or B. Occasionally when the experiment was clearly a Yes-No detection one in which the subject responded Yes if a signal was detected and No otherwise, I will denote the presentation set by {s, n}, where n stands for no signal or noise alone, and the response set by {Y, N}.
Many aspects of the design do not differ from the simple case. There is always a foreperiod (either the time from the preceding response, in which case it is called the response-stimulus interval, RSI, or from a warning signal) and it may be variable or constant.
Catch trials can be used (e.g., Alegria, 1978, and Ollman, 1970), although it is not common to do so. Payoffs may be based upon response time as well as on its accuracy. But more interesting are the new possibilities that arise from the increased complexity of the design. I list five:
1. Because a discriminative response is being made, it is more plausible to
use a fixed foreperiod design than in the simple reaction case; indeed, it is entirely possible to have a well-marked time interval during which the signal is presented (this is often used in Yes-No and forced-choice designs). Basically, the argument is that if a discriminative response is made, the subject must wait for the signal to occur and so there is no need to worry about anticipations. This argument is compelling so long as the signals are perfectly discriminable and no response errors occur; but the minute some inaccuracy in the performance is present, then the claim becomes suspect. For a discussion of this and the effective use of random foreperiod in a choice experiment to suppress anticipatory responses, see Green, Smith, and von Gierke (1983). And as we shall see in Section 6.6.5, considerable evidence exists that anticipations may be a problem in many choice-reaction time experiments.
2. Since the experimenter has total control over the sequence of signals presented, it is possible to use that schedule in an attempt to elicit information about what the subject is doing. The most commonly used schedule is a purely random one, but in some studies the stimuli are not presented equally often and in a few, sequential dependencies are built into the schedule in order to affect locally the decision strategies employed by the subjects.
3. In addition to payoffs based upon the observed reaction times, there is clearly the possibility of introducing information feedback and payoffs based upon the accuracy of the responses. By varying the values of the payoffs for both response time and accuracy, the experimenter sets up different monetary conflicts between accuracy and speed. If people are able to alter their strategies to take advantage of this tradeoff (and they are), then we may exploit this technique to gain some information about what those strategies are.
4. The relation between the signals can be manipulated over experimental runs in order to study how that variable affects behavior. For example, one can use two pure tones of frequencies f and f + Δf, where the separation Δf is manipulated. Clearly, the dependence of reaction-time performance may very well vary with the value of f used and almost surely will depend upon the intensity of the two tones, which can vary from threshold to very loud.
5. As was true for simple reactions, there are many options for the signals; indeed, there are all the possible differences signals may exhibit beyond simple presence and absence. And there are many options for the response, although the most common continues to be finger presses of highly sensitive keys. For an empirical study of the use of different fingers and hands, see Heuer (1981a, b). What is new, and complicating, is the multiple ways in which the possible responses can be related to the possible signals. In most studies, aside from those that undertake to study the impact of different mappings between responses and signals, experimenters attempt to select a mapping that on intuitive grounds is as natural, compatible, and symmetric as possible. Of course, one can study directly the impact of stimulus-response compatibility on response time, and we will do so in
Section 10.2.3. In the two-choice situation, three closely related designs are the most common. In each there are two highly sensitive keys, and either the two forefingers are used, each being held over a key, or the subject's preferred forefinger rests on a well-defined spot between the two keys and is moved to the appropriate key, or the forefinger and middle finger of the preferred hand are placed over the two keys.
6.1.2 Response Measures
In any choice experiment there are at least two dependent measures, namely, the choices made and the times they take. One may, in addition, ask of the subject other things such as the confidence felt about the judgment. I shall focus primarily on the first two measures, to my mind the most natural ones. It will prove convenient to partition the story into a number of subpieces. In Sections 6.2 and 6.4 I discuss matters that do not entail much, if any, interaction between the two measures or between the measures on successive trials. In Section 6.5 the focus turns to the interaction between response times and errors (the so-called speed-accuracy tradeoff) that arises when the times are manipulated. Section 6.6 examines interactions of responses and their associated times with the events on preceding trials, that is, sequential effects in the data. The following three chapters discuss a number of models that have been proposed to account for some of these as well as other phenomena. After that, in Chapter 10, attention turns to response times when more than two signals are to be identified. There is, of course, some option in the exact measures of accuracy and time to be collected and reported. For the choices there is little debate (perhaps there should be) about what data to collect: one postulates the existence of a conditional probability, P(r | s), of response r being made when signal s is presented. Of course, we must be sensitive to what aside from the current signal may affect this probability. For example, it may well depend upon some previous signals and responses. Much of the literature tacitly assumes otherwise and, ignoring any dependencies that may exist, relative frequencies of responding are used to estimate these simple conditional probabilities. Some of the data in Section 6.6 should lead to some concern about this practice.
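The relative-frequency estimate just described can be sketched in a few lines of Python. The function name and the trial records are hypothetical, and the counts are made up for illustration. Like much of the literature, the sketch ignores any dependence on preceding signals and responses.

```python
from collections import Counter

def estimate_conditional_probs(trials):
    """Estimate P(r | s) by relative frequency.

    trials: iterable of (signal, response) pairs, one per trial.
    Returns a dict mapping (response, signal) to the proportion of
    presentations of that signal on which that response was made.
    """
    trials = list(trials)
    pair_counts = Counter(trials)                    # counts of (signal, response)
    signal_counts = Counter(s for s, _ in trials)    # presentations of each signal
    return {(r, s): n / signal_counts[s] for (s, r), n in pair_counts.items()}

# Illustrative (made-up) data: 10 presentations of each signal.
example_trials = (
    [("a", "A")] * 8 + [("a", "B")] * 2 +
    [("b", "A")] * 3 + [("b", "B")] * 7
)
probs = estimate_conditional_probs(example_trials)
print(probs[("A", "a")], probs[("A", "b")])
```

With two signals and two responses, the estimates for each signal sum to one, so only one number per signal is free.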
Suppose the signals a, b and the responses A, B are in natural 1:1 correspondence; then there are just two independent conditional probabilities since
$$P(A \mid a) + P(B \mid a) = 1, \qquad P(A \mid b) + P(B \mid b) = 1.$$
We usually elect to use P(A | a) and P(A | b). The study of the behavior of these probabilities is the province of much of traditional psychophysics, a topic that is alive and well today in part because of the theory of signal detectability. This literature offers several possible summary measures of
208 Identification Paradigms
performance accuracy, the most famous of which is d'. We will go into this question more thoroughly in Sections 6.4 and 6.5.3. For reaction times, a distribution of times can be developed for each of the four signal-response pairs and, unlike the choice probabilities, these are not logically constrained to be related in any particular way. Of course, these distributions may also depend upon something else, such as preceding signals and responses. Again, much of the literature implicitly assumes no such dependence; but as we shall see in Section 6.6, that assumption is highly optimistic. As with simple reaction times, the distributions can be presented in a variety of ways, but in practice little has been published about actual distributions or their hazard functions. For the most part, only mean reaction times have been reported, although in a few recent studies some other statistics, usually the variance, skew, and kurtosis (see Sections 1.3.4 and 1.4.5), have also been provided.
6.2 RELATIONS TO SIMPLE REACTION TIMES
6.2.1 Means and Standard Deviations
Perhaps the best established fact about choice reaction times is that, with everything else as similar as possible, these times are slower than the comparable simple ones by 100 to 150 msec, and they are usually, but not always, somewhat more variable. This has been known for a very long time and has been repeatedly demonstrated; I shall cite just two studies in which the relationship has been explored with some care. The first does not involve any feedback-payoff manipulation; the second does. In his Experiment 4, Laming (1968) ran 20 subjects for 120 trials in each of five conditions that were organized in a Latin square design. The signals, white bars of width 0.50 in. and height 2.83 in. and 4.00 in., were presented singly in a tachistoscope and the subject had to identify each presentation as being either the longer or shorter stimulus. Signals were presented according to an equally likely random schedule. There were three general types of conditions. In the simple-reaction time design subjects were to respond as rapidly as possible to both signals. In one variant they used only their right forefinger and in a second only their left forefinger. In the choice design, they used the right forefinger response for one signal and the left for the other. And in the recognition design, the situation was as in the choice one except that one of the two responses was withheld. That is, they had to identify the signals and respond to just one of them. In a certain sense, the recognition design resembles a choice one in which withholding a response is itself a response. Table 6.1 presents the median reaction times and the median over subjects of the standard deviation of reaction times. The pattern of medians is typical: the simple reaction times of about 220 msec are normal for visual stimuli; the choice values are about 200 msec slower; and the recognition times are some 35 msec faster than the choice ones. The
TABLE 6.1. Response time data from Experiment 4 of Laming (1968)

                          Response time in msec
Condition                 Median    Median standard deviation
Simple
  Right forefinger        228       87
  Left forefinger         220       88
Choice                    419       88
Recognition
  4 in.                   384       81
  2.83 in.                385       85
variability, which appears to be the same in all conditions, is not so typical. I will say more about it below. Snodgrass, Luce, and Galanter (1967) ran three subjects in four conditions, including the three run by Laming plus a simple design in which just one of the signals was presented. The two simple cases are identified as simple-1 and simple-2, depending on the number of signals used. The warning and reaction signals were pure tones of 143 msec duration; the foreperiod was 2 sec. The warning signal was a 1100-Hz tone and the two reaction signals were 1000- and 1200-Hz tones, both of which were quite audible. Payoffs were used. They were defined relative to one of two time bands: the fast band ran from 100 to 200 msec following signal onset, and the slow one, from 200 to 300 msec. Responses prior to the band in force were fined 5¢, and those within each successive third of the band received rewards of 3¢, 2¢, and 1¢
FIG. 6.1 MRT to pure tones of 1000 and 1200 Hz under four conditions: simple reactions to a single signal; simple reactions to either signal; recognition of and response to just one of the two signals; and choice reactions where the signal presented is to be identified. Two reinforcement bands, 100-200 msec and 200-300 msec, were used. Each data point is based on a sample of 240 reactions. [Figure 2 of Snodgrass et al. (1967); copyright 1967; reprinted by permission.]
FIG. 6.2 Estimated standard deviations for the experiment discussed in Figure 6.1. [Figure 3 of Snodgrass et al. (1967); copyright 1967; reprinted by permission.]
variability for simple reaction times is rather large (the coefficient of variation σ/μ being about .40 as compared with about .17 in the Snodgrass et al. data), whereas the variability for the choice conditions is similar in the two experiments. There are a number of differences that might underlie this difference in results. The two most notable are the modality, auditory versus visual, and the use of feedback and payoffs in the one and not the other. Although recognition data typically are somewhat faster than comparable choice reactions, this is not always true. For example, Smith (1978) (as reported in Smith, 1980) found the recognition times (248 msec) to be slower than the choice ones (217 and 220) in a study of vibrotactile stimuli to the index fingers with keypress responses by the stimulated fingers. This reversal of choice and recognition times will reappear as an issue in Section 6.2.2. Turning to another phenomenon, we know (Section 2.4.1) that simple MRTs become slower as the proportion of catch trials is increased. Alegria (1978) examined their impact in a choice situation where, of course, we must ask how both the error rate and the MRT are affected. A fixed foreperiod of 700 msec was defined by a spot of light moving from left to right, and the signal interval began when it passed a vertical line. The signals were tones of 900 and 2650 Hz, and responses were key presses of the right index and middle fingers. The three conditions of catch trial frequency were 0%, 20%, and 77%. As can be seen in Table 6.2, the error rate was little affected by the proportion of catch trials or by whether the trial previous to the one being examined was a catch or a signal trial. Figure 6.3 shows that MRT is greatly affected by the nature of the preceding trial, and the effect cumulates over trials, but MRT seems little affected by the actual proportion of catch trials.
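The kind of conditioning on preceding trials used in this analysis can be sketched as follows. This is an illustration only, not Alegria's actual procedure: the function name, the trial encoding, the cap on run length, and the reaction times are all assumptions made for the example.

```python
from collections import defaultdict

def mean_rt_by_preceding_run(trials, max_run=3):
    """Mean RT on signal trials, grouped by the run that preceded them.

    trials: list of (kind, rt) in presentation order, where kind is
    "signal" or "catch" and rt is None on catch trials.
    Returns {(kind_of_preceding_run, run_length): mean RT}, with run
    lengths above max_run pooled into max_run.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    run_kind, run_len = None, 0          # the run ending just before this trial
    for kind, rt in trials:
        if kind == "signal" and rt is not None and run_kind is not None:
            key = (run_kind, min(run_len, max_run))
            sums[key] += rt
            counts[key] += 1
        # Update the running description of the most recent run.
        if kind == run_kind:
            run_len += 1
        else:
            run_kind, run_len = kind, 1
    return {key: sums[key] / counts[key] for key in sums}

# Made-up sequence: RTs in msec on signal trials, None on catch trials.
sequence = [
    ("catch", None), ("signal", 300), ("signal", 280), ("signal", 260),
    ("catch", None), ("catch", None), ("signal", 340),
]
print(mean_rt_by_preceding_run(sequence))
```

Each additional conditioning event thins the sample available for a given cell, which is why very large numbers of observations are needed for this sort of analysis.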
These data are not consistent with the idea that subjects speed up by relaxing their criterion for what constitutes a signal and thereby increase the number of anticipatory responses. The reason for this conclusion is the fact that the error rate is, if anything, smaller following a signal trial than it is following a catch trial. Section 2.4.3, on the impact of exponential foreperiods in simple reaction time, reported that MRT increases gradually with the actual foreperiod wait,
TABLE 6.2. Percent error as a function of type of preceding trial and proportion of catch trials (Alegria, 1978)

                              Preceding trial
Proportion of catch trials    Catch     Signal
.77                           11.1      7.6
.20                           9.5       5.2
0                             -         7.7
FIG. 6.3 MRT for choice reactions to two tone frequencies with either 20% or 77% catch trials versus the length of the preceding run of catch trials (−) or of signal trials (+). The solid line, which is ordinary simple reactions, involved 44,800 observations per subject; the 20% condition involved 164,000 observations per subject; and the 77%, 234,000. There were six subjects. Depending upon the number n of conditioning events, these sample sizes are reduced by a factor of 1/2^n. [Figure 1 of Alegria (1978); copyright 1978; reprinted by permission.]
an increase of 50 to 100 msec over a range from half a second to 30 sec. For choice reaction times, Green, Smith, and von Gierke (1983) ran a closely comparable auditory experiment in which the signals differed in frequency. They found substantially the same pattern: a rise of about 30 msec in the range from 200 msec to 6 sec. Error responses are slightly faster than correct ones making the same response (see Section 6.4.3), but the curves exhibit basically the same shape. For waits less than 200 msec, the reaction time increases rather sharply as the wait decreases, rising some 20 msec over that interval. The impact of an exponentially distributed foreperiod, therefore, appears to be substantially the same in the choice and simple-reaction time paradigms.
6.2.2 Donders' Subtraction Idea
Donders (1868), in a highly seminal paper, proposed that the time to carry out a specific mental subprocess can be inferred by running pairs of experiments that are identical in all respects save that in the one the subject must use the particular process whereas in the other it is not used. He put the idea this way (Koster's 1969 translation): The idea occurred to me to interpose into the process of the physiological time some new components of mental action. If I investigated how much this would lengthen the physiological time, this would, I judged, reveal the time required for the interposed term. (Donders, 1969, p. 418)
He proceeded then to report data for several different classes of stimuli for both simple- and choice-reaction times (some for two-stimulus designs
and much for a five-stimulus one), which he spoke of as, respectively, the a- and b-procedures. The difference between the two times he attributed to the difference in what is required, namely, both the identification of the signal presented and the selection of the correct response to make. He next suggested that the times of the two subprocesses could be estimated separately by collecting times in the recognition or, as he called it, the c-procedure. As we have seen above, this entails a schedule of presentations like that used in the choice design, but instead of there being as many responses as there are stimuli, there is just one. It is made whenever one of the signals is presented and is withheld whenever the other(s) occur. Using vowels as stimuli and himself as subject, he reported the difference in times between the c- and a-procedures was 36 msec, which he took to be an estimate of the recognition time, and the b − c difference was 47 msec, an estimate of the time to effect the choice between responses. The comparable numbers for the Laming data are 161 and 34 msec and for the Snodgrass et al. data they are 110 and 40 msec. Data in which choices are faster than recognitions, such as Smith (1978), should be impossible in Donders' framework. They may result from subjects using different criteria for responding in the two procedures. Indeed, Donders expressed one concern over the c-procedure: [Other people] give the response when they ought to have remained silent. And if this happens only once, the whole series must be rejected: for, how can we be certain that when they had to make the response and did make it, they had properly waited until they were sure to have discriminated? . . . For that reason I attach much value to the result of the three series mentioned above and obtained for myself as a subject, utilizing the three methods described for each series, in which the experiments turned out to be faultless. (Donders, 1969, p. 242)
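The subtraction itself is simple arithmetic, sketched here in Python with the Laming medians of Table 6.1. The function name is hypothetical, and averaging the two simple variants and the two recognition variants is an assumption of this sketch about how single a- and c-values might be formed.

```python
def donders_subtraction(a, b, c):
    """Donders' estimates from the a-, b-, and c-procedure times (msec).

    Returns (c - a, b - c): the estimated recognition time and the
    estimated time to choose between responses.
    """
    return c - a, b - c

# Laming (1968), Experiment 4, median reaction times in msec.
simple = (228 + 220) / 2        # a: right and left forefinger variants averaged
choice = 419                    # b
recognition = (384 + 385) / 2   # c: the two signal variants averaged

recognition_time, choice_time = donders_subtraction(simple, choice, recognition)
print(recognition_time, choice_time)   # 160.5 34.5
```

These values are close to the 161 and 34 msec quoted for the Laming data. A negative b − c difference, as in the Smith (1978) data, is what the framework cannot accommodate.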
Although the subtraction method was very actively pursued during the last quarter of the 19th century and today is often used with relatively little attention given to its theoretical basis (e.g., Posner, 1978), it has not found favor in this century among those who study response times as a speciality. The criticisms have been of four types: First is the one mentioned by Donders himself—namely, that the recognition method may not, in fact, induce the subjects always to wait until the signal actually has been identified. (This may be a difficulty for the other two methods, as well.) His proposed method of eliminating those runs in which errors occur is not fully satisfactory because, as we noted in Sections 2.2.5 and 2.2.6, it is highly subject to the vagaries of small sample statistics. As we shall see below in Section 6.6.5, there is evidence from choice designs that anticipations also occur when the time pressure is sufficiently great. This problem can probably be greatly reduced by using a random (exponential) foreperiod, as in the simple-reaction-time design (see Green et al., 1983). The reason that this should work better than catch trials is the greater
opportunity it provides the lax criterion to evidence itself since each trial affords an opportunity to anticipate. Working against it is the fact that many foreperiods are relatively brief. The second concern, which was more in evidence in the 1970s than earlier, centers on the assumption of a purely serial process in which all of the times of the separate stages simply add. As we know from Chapter 3, this assumption has not been clearly established for the decision latency and the residue. Sternberg (1969a, b), in his classic study on the method of additive factors, suggested a method for approaching the questions of whether the times associated with the several stages are additive, provided that one has empirical procedures for affecting the stages individually. A special case of the method was discussed in Section 3.3.4, and it will be discussed more fully in Section 12.4. This means that we have methods, perhaps not yet as perfected as we would like, to investigate empirically the truth of this criticism. The third criticism, which is the least easy to deal with, was the one that turned the tide against the method at the turn of the century. It calls into question the assumption of "pure insertion," namely, that it is possible to add a stage of mental processing without in any way affecting the remaining stages. Sternberg (1969b, p. 422) describes the attack in this way. [I]ntrospective reports put into question the assumption of pure insertion, by suggesting that when the task was changed to insert a stage, other stages might also be altered. (For example, it was felt that changes in stimulus-processing requirements might also alter a response-organization stage.) If so, the difference between RTs could not be identified as the duration of the inserted stage. Because of these difficulties, Külpe, among others, urged caution in the interpretation of results from the subtraction method (1895, Secs. 69, 70).
But it appears that no tests other than introspection were proposed for distinguishing valid from invalid applications of the method. A stronger stand was taken in later secondary sources. For example, in a section on the "discarding of the subtraction method" in his Experimental Psychology (1938, p. 309), R. S. Woodworth queried "[Since] we cannot break up the reaction into successive acts and obtain the time of each act, of what use is the reaction-time?" And, more recently, D. M. Johnson said in his Psychology of Thought and Judgment (1955, p. 5), "The reaction-time experiment suggests a method for the analysis of mental processes which turned out to be unworkable."
As Donders seemed unaware of the problem and as introspection is not a wholly convincing argument, an example of the difficulty is in order. Suppose that the decision latencies of simple reaction times are as was described in Section 4.4, namely, a race between change and level detectors, where a level detector is nothing more than the recognition process being put to another use. If in a recognition or choice situation the identification mechanism is no longer available to serve as a level detector because it is being used to identify which signal has been presented, then the
insertion of the identification task necessarily alters the detection mechanism, changing it from a race to the use of just the change detector. This being so means that the detection of signal onset, particularly of a weak signal, is somewhat slower and more variable than it would have been if the same mechanisms for detection were used as in simple reaction time, and so the recognition times will be somewhat overestimated. The idea of stages of processing is very much alive today, as we shall see in Part III, but very few are willing to bank on the idea of pure insertion. It is not to be ruled out a priori, but anyone proposing it is under considerable obligation to explain why the other stages will be unaffected by the insertion. The fourth criticism, first pointed out by Cattell (1886b), is that the c-reaction involves more than pure identification since the subject must also decide either to respond or to withhold the response, which in a sense is just as much of a choice as selecting one of two positive responses. Wundt suggested a d-procedure in which the subject withholds the response until the signal is identified, but in fact makes the same response to every signal. This procedure caused difficulty for most subjects and has rarely been used. It is possible, although not necessary, to interpret Smith's slow c-reactions as direct evidence favoring Cattell's view. For additional discussion of these matters see Welford (1980b), who summarized the matter as follows (p. 107): The evidence suggests that Donders' approach was correct in that, except under very highly compatible or familiar conditions, the c-reaction involves less choice of response than the b-reactions, but was wrong in assuming that all choice of response was eliminated. The difference between b- and c-reactions will therefore underestimate the time taken by choice, and the difference between c- and a-reactions will overestimate the time taken to identify signals.
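Welford's conclusion can be made concrete with a line of algebra. The symbols are hypothetical and do not appear in the original: R is the mean time common to all three procedures, I the mean identification time, C the mean time for a full choice of response, and ε (with 0 < ε < C) the residual choice that remains in the c-reaction. A sketch, assuming the mean stage times simply add:

```latex
\begin{align*}
E(T_a) &= R, \\
E(T_c) &= R + I + \varepsilon, \\
E(T_b) &= R + I + C, \\
E(T_b) - E(T_c) &= C - \varepsilon < C, \\
E(T_c) - E(T_a) &= I + \varepsilon > I.
\end{align*}
```

The two differences still sum to I + C, so under these assumptions the total b − a difference is unaffected; only its division between identification and choice is biased.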
6.2.3 Subtraction with Independence
An obvious assumption to add to Donders' additivity and pure insertion is statistical independence of the times for the several stages. This, it will be recalled, was the basic assumption of Chapter 3. The main consequence of adding this is the prediction that not only will the mean times add, but so will all of the cumulants (Section 1.4.5). Taylor (1966) applied these ideas to the following design. Donders' conditions b and c were coupled with two modified conditions. Using the same stimulus presentation schedule as in b and c, condition b' involves substituting for one of the signals ordinary catch trials, and the subject is required to make the discriminative response. In condition c', the presentation is as in b', but only a single response is required on signal trials. The model assumes a total of four stages: signal detection, signal identification, response selection, and response execution. The first and last are common to
TABLE 6.3. Presence of stages in Taylor (1966) design (see text for description of conditions)

                         Stage
Condition    Signal identification    Response selection
b            Yes                      Yes
c            Yes                      No
b'           No                       Yes
c'           No                       No
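With the stage pattern of Table 6.3, additivity, pure insertion, and independence together imply that each cumulant satisfies κ(b) − κ(c) − κ(b') + κ(c') = 0. The following Python sketch computes that contrast for the first three cumulants. It is only an illustration: the function names are hypothetical, the estimates are plain sample moments with no bias correction, no standard errors are computed, and it is not a reconstruction of Taylor's actual analysis.

```python
from statistics import fmean

def first_three_cumulants(sample):
    """Return (mean, variance, third central moment) of a sample.

    These are simple moment estimates of the first three cumulants,
    with no small-sample bias correction.
    """
    values = list(sample)
    m = fmean(values)
    k2 = fmean([(x - m) ** 2 for x in values])
    k3 = fmean([(x - m) ** 3 for x in values])
    return m, k2, k3

def additivity_contrast(t_b, t_c, t_b_prime, t_c_prime):
    """Contrast k(b) - k(c) - k(b') + k(c') for each of the three cumulants.

    Under additivity, pure insertion, and independence, every entry
    should be near zero, up to sampling error.
    """
    kb = first_three_cumulants(t_b)
    kc = first_three_cumulants(t_c)
    kbp = first_three_cumulants(t_b_prime)
    kcp = first_three_cumulants(t_c_prime)
    return tuple(b - c - bp + cp for b, c, bp, cp in zip(kb, kc, kbp, kcp))
```

In use, each argument would be the list of response times collected in one condition. Deciding whether a nonzero contrast is larger than sampling error requires a separate statistical test.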
all four conditions, and so they can be ignored. The second and third are assumed to be invoked as they are needed, and the pattern assumed is shown in Table 6.3. Assuming that this is correct, then we see that the difference between b and c' is both stages, that between c and c' is just the signal identification stage, and that between b' and c' is just the response selection stage. So if we let T(i) denote the response time observed in condition i, the following equation embodies the additivity and pure insertion assumptions
$$T(b) - T(c') = [T(c) - T(c')] + [T(b') - T(c')],$$
which is equivalent to
$$T(b) - T(c) - T(b') + T(c') = 0.$$
Taylor tested this null hypothesis in an experiment in which the stimuli were red and green disks, the warning signal was a 500-Hz tone, and foreperiods of 800, 1000, 1300, and 1500 msec were equally likely. The sample size in each condition was the 32 responses of the preferred hand of each of eight subjects, for a total of 256 per condition. He tested the additivity prediction for the mean, variance, and third cumulant, and none rejected the null hypothesis. Were this to be repeated today, one would probably use the Fast Fourier Transform to verify the additivity of the entire cumulant generating function; however, I am not sure what statistical test one should employ in that case. As was pointed out in Section 3.2.3, if the component inserted is exponential with time constant λ and the density is f with it in and g without it, then
$$\lambda = \frac{f(t)}{G(t) - F(t)},$$
where F and G are the distribution functions corresponding to f and g. So an easy test of the joint hypothesis of pure insertion, independence, and the insertion of an exponential stage is that the right side of the above expression be independent of t. Ashby and Townsend (1980) examined this
for data reported by Townsend and Roos (1973) on a memory search procedure of the type discussed in Section 11.1, and to a surprising degree constancy was found. Thus, while there are ample a priori reasons to doubt all three assumptions, an unlikely consequence of them was sustained in one data set. This suggests that additional, more detailed work should be carried out on Donders' model.
6.2.4 Varying Signal Intensity
The effect of intensity on simple reaction times is simple: both MRT and VRT decrease systematically as signal intensity is increased (Sections 2.3.1 and 2.3.2), and for audition but not vision an interaction exists between signal intensity and response criterion (Section 2.5.2). Neither of these statements appears to be true for choice reaction times, as was made evident by Nissen (1977) in a careful summary of the results to that point. The independence for visual stimuli was found. For example, Pachella and Fisher (1969) varied the intensity and linear spacing of 10 lights and imposed a deadline to control response criterion, and did not find evidence for an interaction of intensity with the deadline, although there was one between intensity and spacing. Posner (1978) varied the intensity of visual signals, and using conditions of mixed and blocked intensities he found no interaction. However, since no interaction was found in simple reactions for visual stimuli, the question of what would happen with auditory signals was not obvious. In 1977 the evidence was skimpy. This question was taken up by van der Molen, Keuss, and Orlebeke in a series of papers. In 1979 van der Molen and Keuss published a study in which the stimuli were 250 msec tones of 1000 Hz and 3000 Hz with intensities ranging from 70 to 105 dB. There were several foreperiod conditions, which I will not go into except to note that the foreperiods were of the order of a few seconds. Both simple and choice reactions were obtained. The major finding, which earlier had been suggested in the data of Keuss (1972) and Keuss and Orlebeke (1977), was that for the choice condition the MRT is a U-shaped function of intensity. The simple reactions were decreasing, as is usual. This result makes clear that the impact of intensity in the choice context is by no means as simple as it is for simple reactions, where it can be interpreted as simply affecting the rate at which information about the signal accumulates. 
They raised the possibility that a primary effect of intensity is in the response selection stage of the process rather than just in the accumulation process. That was not a certain conclusion because the error rate had not remained constant as intensity was altered. Additional U-shaped data were exhibited by van der Molen and Orlebeke (1980). To demonstrate more clearly the role of response selection, van der Molen and Keuss (1981) adapted a procedure introduced by J. R. Simon (1969; Simon, Acosta, & Mewaldt, 1975; Simon, Acosta, Mewaldt, &
Speidel, 1976). The signals were as before with a range of 50 to 110 dB. They were presented monaurally through earphones, and responses were key presses with the two hands corresponding to signal frequency. Lights and digits presented just before the beginning of the foreperiod provided additional information about what was to be presented. The digit code was the following: 000 indicated to the subject that the signal was equally likely to be presented to either ear, 001 that it would go to the right ear, and 100 that it would go to the left ear. One colored light indicated that the ear receiving the signal and its frequency would be perfectly correlated, whereas the other color indicated no correlation of location and frequency. Note that the correct response could be either ipsilateral or contralateral to the ear to which the signal was presented. The data showed that responding was fast and a monotonic decreasing function of intensity when the correlated presentation was used and the response was ipsilateral. In any contralateral or uncorrelated ipsilateral condition, MRT was slower and U-shaped. They concluded that these results support the hypothesis that a major impact of intensity in choice reactions is on the response selection stage. In still another study, Keuss and van der Molen (1982) varied the foreperiod (either a fixed one of 2 sec or a variable one of 20, 25, 30, 35, or 40 sec) and whether the subject had preknowledge of the intensity of the presentation. The effect of preknowledge was to reduce MRT by about 10 msec. More startling was the fact that both simple and choice MRTs decreased with intensity except for the briefer 2-sec foreperiod, where again it was found to be U-shaped. Moreover, the error rate was far larger in this case than in the others. They seemed to conclude that the foreperiod duration was the important factor, although it was completely confounded with constant versus variable foreperiod.
Assuming that it is the duration, they claimed this to be consistent with the idea of intensity affecting the response selection process. I do not find it as compelling as the earlier studies.
6.3 A CONCEPTUAL SCHEME FOR TRADEOFFS
6.3.1 Types of Parameters
A major theoretical feature of cognitive psychology is its attempt to distinguish both theoretically and experimentally between two classes of variables and mechanisms that underlie behavior. The one class consists of those experimental manipulations that directly affect behavior through mechanisms that are independent of the subjects' motivations. For example, most psychologists and physiologists believe that the neural pulse patterns that arise in the peripheral nervous system, immediately after the sensory transducer converts the physical stimuli into these patterns, are quite independent of what the subject will ultimately do with that information. This means that these representations of the signal are independent of the type of
experiment (reaction time, magnitude estimation, discrimination), of the questions we pose to the subject, and of the information feedback and payoffs we provide. Such mechanisms are often called sensory or perceptual ones, and the free parameters that arise in models of the mechanism are called sensory parameters. The experimental variables that activate such mechanisms have no agreed upon name. In one unpublished manuscript, Ollman (1975) referred to them as "display variables," but in a later paper (1977) he changed it to "task variables." I shall use a more explicit version of his first term, sensory display variables. The other type of mechanism is the decision process that, in the light of the experimental task posed, the subject brings to bear on the sensory information. These are mechanisms that have variously been called control, decision, motivation, or strategy mechanisms. Ollman refers in both papers to the experimental variables that are thought to affect these mechanisms directly as strategy variables. I shall follow the terminology of decision mechanism, decision parameters, and decision strategy variables. The latter include a whole range of things having to do with experimental procedure: the task (whether it is detection, absolute identification, item recognition, and so on), the details of the presentation schedule of the signals, the payoffs that are used to affect both the tradeoff of errors and to manipulate response times, and various instructions aimed at affecting the tradeoffs established among various aspects of the situation. It should be realized that there is a class of motivational variables, of which attention is a prime example, that I shall not attempt to deal with in a systematic fashion. Often, attentional issues lie not far from the surface in our attempts to understand many experiments, and they certainly relate to the capacity considerations of Chapters 11 and 12.
Moreover, they are a major concern of many psychologists (Kahneman, 1973). Some believe that such variables affect the sensory mechanism, which if true only complicates the story to be outlined. To the degree that we are accurate in classifying sensory display and decision strategy variables, the former affect the sensory parameters and the latter the decision parameters. But a major asymmetry is thought to exist. Because the decision parameters are under the subject's control, they may be affected by sensory display variables as well as by decision strategy ones; whereas, it is assumed that the sensory parameters are not affected by the decision strategy variables. What makes the study of even the simplest sensory or perceptual processes tricky, and so interesting, is the fact that we can never see the impact of the sensory display variables on the sensory mechanism free from their impact on the decision parameters. Even if we hold constant all of the decision strategy variables that are known to affect the decision mechanism, but not the sensory ones, we cannot be sure that the decision parameters are constant. The subject may make changes in these parameters as a joint function of the sensory display and decision strategy variables.
Identification Paradigms
One major issue of the field centers on how to decide whether a particular experimental design has been successful in controlling the decision parameters as intended. This is a very subtle matter, one that entails a careful interplay of theoretical ideas and experimental variations. During the late 1960s and throughout the 1970s this issue was confronted explicitly as it had never been before, and out of this developed considerable sensitivity to the so-called speed-accuracy tradeoff. In Section 6.3.2 I shall try to formulate this general conceptual framework as clearly as I know how, and various special cases of it will arise in the particular models examined in Chapters 7-10.

6.3.2 Formal Statement
Assuming the distinction just made between sensory and decision mechanisms is real, we may formulate the situation in quite general mathematical terms. Each experimental trial can be thought of as confronting the subject with a particular environment. This consists not only of the experimentally manipulated stimulus presented on that trial, but also of the surrounding context, the task confronting the subject, the reward structure of the situation, and the previous history of stimuli and responses. We can think of this environment as described by a vector of variables denoted by $E$. We may partition this into a subvector $S$ of sensory display variables and a subvector $D$ of decision strategy variables: $E = (S, D)$. The observable information obtained from the trial consists of two random variables, the response $r$ and some measure $T$ of time of occurrence. We assume that $(r, T)$ is governed by a joint probability density function that is conditioned by the environmental vector, and it is denoted

$$f(r, t \mid E). \tag{6.1}$$

In order to estimate $f$ from data, it is essential that we be able to repeat the environment on a number of trials, and so be able to use the empirical histograms as a way to estimate $f$. Obviously, this is not possible if $E$ really includes all past stimuli and responses, and so in practice we truncate the amount of the past included in our definition of $E$. For more on that, see Section 6.6. The theoretical structure postulates the existence of a sensory mechanism with a vector $\boldsymbol{\sigma} = (\sigma_1, \ldots, \sigma_l)$ of sensory parameters and a decision mechanism with a vector $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_m)$ of decision parameters. In general, $\boldsymbol{\sigma}$ and $\boldsymbol{\delta}$ should be thought of as random vectors; that is, their components are random variables and they have a joint distribution that is conditional on $E$. If $\sigma$ and $\delta$ are numerical $l$- and $m$-tuples, respectively, we denote the joint density function of $\boldsymbol{\sigma}$ and $\boldsymbol{\delta}$ by

$$\Phi(\sigma, \delta \mid E). \tag{6.2}$$

When we have no reason to distinguish between the two types of parameters, we simply write $\varepsilon = (\sigma, \delta)$. For each set of parameter values, it is
Two-Choice Reaction Times: Basic Ideas and Data 221
assumed that the sensory and decision processes relate the $(r, T)$ pair probabilistically to the parameters and to them alone. In particular, it is assumed that $(r, T)$ has no direct connection to $E$ except as it is mediated through the parameters. We denote the joint density function of $(r, T)$ conditional on $\varepsilon$ by

$$\Psi(r, t \mid \varepsilon) = \Psi(r, t \mid \sigma, \delta). \tag{6.3}$$

So $\Psi$ is the theory of how the sensory and decision processes jointly convert a particular set of parameter values into the $(r, T)$ pair. By the law of total probability applied to the conditional probabilities of Eqs. 6.2 and 6.3, we obtain from Eq. 6.1

$$f(r, t \mid E) = \int \Psi(r, t \mid \varepsilon)\, \Phi(\varepsilon \mid E)\, d\varepsilon, \tag{6.4}$$

where $\Phi(\varepsilon \mid E)$ is the parameter density of Eq. 6.2 and the integral is to be interpreted as just that in the case of continuous parameters and as a sum in the case of discrete ones. We will always assume that the densities are of such a character that the integral exists in some usual sense (Riemann or Lebesgue). A special case of considerable interest is where $r$ and $T$ are independent random variables for each set of parameters; that is, if $P(r \mid \varepsilon)$ and $f(t \mid \varepsilon)$ denote the marginal distributions of $\Psi$,

$$\Psi(r, t \mid \varepsilon) = P(r \mid \varepsilon)\, f(t \mid \varepsilon). \tag{6.5}$$

We speak of this as local (or conditional) independence. Observe that it does not in general entail that $f(r, t \mid E)$ also be expressed as the product of its marginals. [It is perhaps worth noting that conditional independence is the keystone of the method of latent structure analysis sometimes used in sociology (Lazarsfeld, 1954).] Among the models we shall discuss in Chapters 7 to 9, local independence holds for the fast-guess and counting models, but not for the random walk or timing models. One happy feature of local independence is the ease with which the marginal distributions are calculated.

Theorem 6.1. If Eqs. 6.4 and 6.5 hold, then

$$P(r \mid E) = \int P(r \mid \varepsilon)\, \Phi(\varepsilon \mid E)\, d\varepsilon, \tag{6.6}$$

$$f(t \mid E) = \int f(t \mid \varepsilon)\, \Phi(\varepsilon \mid E)\, d\varepsilon. \tag{6.7}$$

The easy proof is left to the reader.

6.3.3 An Example: The Fast-Guess Model
A specific, simple example should help fix these ideas and those to follow (especially Section 6.5). I shall use a version of the fast-guess model of Ollman (1966) and Yellott (1967, 1971), which will be studied more fully in Section 7.4. Suppose that prior to the presentation of the signal the subject opts to behave in one of two quite distinct ways. One is to make a simple reaction to signal onset without waiting long enough to gain any idea as to which signal was presented. We assume that the simple-reaction-time density to both signals is $g_0(t)$ and, quite independent of that, response $r$ is selected with probability $\beta_r$, $r = A, B$, where $\beta_A + \beta_B = 1$. The other option is to wait until the information is extracted from the signal and to respond according to that evidence. This time is assumed to have density $g_1(t)$ for both signals and, independent of the time taken, the conditional probability of response $r$ to signal $s$ is $P_{sr}$, where $s = a, b$, $r = A, B$, and $P_{sA} + P_{sB} = 1$. Note that we have built in local independence of $r$ and $T$ for each of the two states, 0 representing fast guesses (simple reactions) and 1 the informed responses. Denote by $\rho$ the probability that the subject opts for the informed state. In arriving at the general form for $f(r, t \mid E)$, let us suppress all notation for $E$ save for the signal presented, $s$. From Eqs. 6.4 and 6.5 we obtain

$$f(r, t \mid s) = \rho P_{sr}\, g_1(t) + (1 - \rho)\, \beta_r\, g_0(t). \tag{6.8}$$

Observe that this model has two decision parameters, namely $\rho$ and $\beta_A$ (recall, $\beta_B = 1 - \beta_A$); two sensory functions, $g_0$ and $g_1$; and two discrimination (sensory) parameters, $P_{sA}$, $s = a, b$ (recall, $P_{sB} = 1 - P_{sA}$). For some purposes we can reduce the functions to their means, $\nu_0$ and $\nu_1$. Implicitly, I assume $\nu_0 < \nu_1$ (see Section 6.2.1). Either by direct computation from Eq. 6.8 or by using Eqs. 6.6 and 6.7, we obtain for the marginals

$$f(t \mid s) = \rho\, g_1(t) + (1 - \rho)\, g_0(t), \tag{6.9}$$

$$P(r \mid s) = \rho P_{sr} + (1 - \rho)\, \beta_r. \tag{6.10}$$

Note that $f(t \mid s)$ is actually independent of $s$. From Eq. 6.9 we can compute the overall expected reaction time to the presentation of a particular signal $s$:

$$E(T \mid s) = \rho\, \nu_1 + (1 - \rho)\, \nu_0. \tag{6.11}$$

For some purposes it is useful to compute $E(T)$ for each $(s, r)$ pair separately, that is, the mean of $f(t \mid r, s) = f(r, t \mid s)/P(r \mid s)$, which from Eq. 6.8 we see is

$$E(T \mid s, r) = \frac{\rho P_{sr}\, \nu_1 + (1 - \rho)\, \beta_r\, \nu_0}{P(r \mid s)}, \tag{6.12}$$

where $P(r \mid s)$ is given by Eq. 6.10. A few words may be appropriate at this point about how one attempts to confront such a model with data. One immediately obvious problem is that the data provide us with estimates of $f(r, t \mid E)$ whereas the theory as stated in Eq. 6.8 has a number of parameters that are not explicit functions of $E$.
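The fast-guess quantities just described can be checked numerically. The following is a minimal sketch, not part of the original development: it assumes the two-state mixture described above (an informed state chosen with probability rho, a guessing state otherwise), and the class name, method names, and parameter values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FastGuessModel:
    """Hypothetical container for the fast-guess parameters (names are illustrative)."""
    rho: float     # probability of opting for the informed state
    beta_A: float  # probability of guessing A in the fast-guess state
    P_aA: float    # informed-state probability of responding A to signal a
    P_bA: float    # informed-state probability of responding A to signal b
    nu0: float     # mean of g_0, the fast-guess (simple reaction) time density
    nu1: float     # mean of g_1, the informed response time density

    def guess_prob(self, r: str) -> float:
        return self.beta_A if r == "A" else 1.0 - self.beta_A

    def informed_prob(self, s: str, r: str) -> float:
        p_A = self.P_aA if s == "a" else self.P_bA
        return p_A if r == "A" else 1.0 - p_A

    def response_prob(self, s: str, r: str) -> float:
        """P(r | s): mixture of informed and guessed responses (Eq. 6.10)."""
        return (self.rho * self.informed_prob(s, r)
                + (1.0 - self.rho) * self.guess_prob(r))

    def mean_rt(self) -> float:
        """Overall mean response time; it does not depend on the signal."""
        return self.rho * self.nu1 + (1.0 - self.rho) * self.nu0

    def mean_rt_given(self, s: str, r: str) -> float:
        """Mean response time for one particular signal-response pair."""
        informed = self.rho * self.informed_prob(s, r) * self.nu1
        guessed = (1.0 - self.rho) * self.guess_prob(r) * self.nu0
        return (informed + guessed) / self.response_prob(s, r)


# Illustrative values only (assumed, not estimated from any data); times in seconds.
model = FastGuessModel(rho=0.8, beta_A=0.5, P_aA=0.95, P_bA=0.05, nu0=0.18, nu1=0.30)
print(model.response_prob("a", "A"), model.response_prob("b", "A"))
print(model.mean_rt(), model.mean_rt_given("a", "A"), model.mean_rt_given("a", "B"))
```

With these assumed values the two conditional means for signal a, weighted by their response probabilities, average to the overall mean, and the mean for the error response B is smaller than that for the correct response A because errors arise disproportionately from fast guesses.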
Of course, from the interpretation of the model the parameters $\rho$ and $\beta_r$ belong to the decision process and the others belong to the sensory process. Thus, we do not anticipate any dependence of the latter parameters on manipulations of the decision strategy, but the former may depend upon any aspect of $E$. It is quite typical of models throughout psychology (not just those for response times) that no explicit account is offered for the dependence of the parameters on environments. This is a fact and limitation, not a virtue, of our theorizing. Because we cannot compute the parameters from knowing $E$, in practice we estimate them in some fashion from the data and then attempt to evaluate how well the model accounts for those data. For example, this will be done for the fast-guess model in Section 7.4. If the fit is reasonably satisfactory and if enough data have been collected, which often is not the case, we can then study empirically how the parameters vary with the different experimental manipulations. Sometimes quite regular relations arise that can be approximated by some mathematical functions and that are then used in later applications of the model to data.

6.3.4 Discrimination of Color by Pigeons
In some data, however, it is reasonably clear without any parameter estimation that something like fast guesses are involved because the two distributions $g_0$ and $g_1$ are unimodal and are so separated that the overall response-time distribution is bimodal. The clearest example of this that I know is not with human data, but with pigeons. During the training phase, Blough (1978) reinforced the birds for responding to the onset of a 582-nm light (S+), and the onset of the signal was delayed a random amount whenever a response (peck) was made at a time when the light was not on. This is a discrimination, not a choice, design. In the test phase, all lights from 575 to 589 nm in 1-nm steps were presented equally often except for 582 nm, which was three times more frequent than the others and was reinforced on one third of its occurrences. Response probability was a decreasing function of the deviation from 582 nm, and the decay was approximately the same on both sides. To keep the figure from being too complex, only the data for the shorter wavelengths are shown. The pattern of response times, shown for one bird in Figure 6.4, is strikingly bimodal. The earlier mode, the unshaded region of these distributions, clearly does not differ from signal to signal in frequency of occurrence, location in time, or general shape. Since that mode had a central tendency of about 170 msec, I suspect they were simple reaction times to signal onset, that is, fast guesses. However, I am not aware of any simple reaction time data for pigeons with which to compare that number. The second mode, the shaded region, is clearly signal dependent in that the number of responses of this type decreases as the signal deviates from the reinforced S+ (582 nm); however, its location did not seem to change and its central tendency was about 350 msec. These data, which are completely consistent with the fast-guess model, suggest that pigeons' reactions to visual signals are similar to those of people; the mean of 170 msec for simple reactions may be a trifle faster, but the additional delay of about 180 msec for accurate responding is, as we saw in Section 6.2.1, very similar to that of people.

FIG. 6.4 Histograms of response times of a pigeon to signal lights of several frequencies, of which the one marked S+ was reinforced during training and partially reinforced during testing. The sample size for each distribution was 1760. Note the striking bimodality of the histograms and the relative independence of the first mode as the wavelength of the light varies. [Figure 4 of Blough (1978); copyright 1978; reprinted by permission.]

6.4 DISCRIMINABILITY AND ACCURACY

6.4.1 Varying the Response Criterion: Various ROC Curves
By now it is a commonplace that either by varying the relative frequency of a to b, or by differential payoffs for the four possible signal-response pairs, or just by instructing the subject to favor A or B, one can cause $P(A \mid a)$ and $P(A \mid b)$ to vary all the way from both conditional probabilities being 0 to both being 1, that is, from no A responses at all to all A responses. Note that nothing about the stimulating conditions, and so presumably nothing about the sensory mechanism, is altered. Just the presentation probability or the payoffs or the instructions are varied. Moreover, the locus of these pairs of points is highly regular, appearing to form a convex function when $P(A \mid a)$ is plotted against $P(A \mid b)$, that is, an increasing function with a decreasing slope. Typical data are shown in Figure 6.5. Such functions are called ROC curves, the term stemming from the engineering phrase "receiver operating characteristic." Detailed discussions of these curves, of data, and of mathematical models to account for them can be found in Green and Swets (1966, 1974) and Egan (1975). In the case of Yes-No detection in which A = Y, B = N, a = s = signal, and b = n = noise, we speak of $P(Y \mid s)$ as a "hit," $P(N \mid s)$ as a "miss," $P(Y \mid n)$ as a "false alarm," and $P(N \mid n)$ as a "correct rejection." Let us work out the ROC for the fast-guess model, which we do by eliminating $\beta_A$ from Eq. 6.10, where s = a, b, r = A:

$$P(A \mid a) = P(A \mid b) + \rho\,(P_{aA} - P_{bA}),$$

which is simply a straight line with slope 1. Other models yield different, usually curved, ROCs. For example, the standard model of the theory of signal detectability assumes that each stimulus is internally represented as a Gaussian random variable and the continuum of possible values is partitioned by a cut, called the response criterion, and the subject responds according to whether the observation is larger or smaller than the criterion. This model is sketched in Figure 6.5.

FIG. 6.5 Example of auditory ROC data obtained under fixed signal conditions with varied payoffs. The theoretical curve arises from Gaussian decision variables postulated in the theory of signal detectability. The inset indicates how the ROC is generated by varying the response criterion. [Figure 4.1 of Green and Swets (1966); copyright 1966; reprinted by permission.]

For some time it had been noted that as the criterion was varied, subjects exhibited a tendency to respond faster when the internal representation of the signal was far from the criterion, as judged by confidence ratings, and slower when it was close to the criterion (Emmerich, Gray, Watson, & Tanis, 1972; Fernberger, Glass, Hoffman, & Willig, 1934; Festinger, 1943a, b; Gescheider, Wright, Weber, Kirchner, & Milligan, 1969; Koppell, 1976; Pike, 1973; and Pike & Ryder, 1973). Pike and Ryder (1973) refer to the assumption that $E(T) = f(|x - c|)$, where $c$ is the criterion, as the latency function hypothesis. Festinger (1943a, b), Garrett (1922), and Johnson (1939) all reported no evidence that the relation between confidence and reaction time is affected by instructions emphasizing either speed or accuracy of responding (Section 6.5). As a result confidence and reaction time were assumed to be very closely related. Recently, however, Vickers and Packer (1981) carried out a careful study using line length discriminations and found a decided difference as a function of instruction. The reason for this difference in results is uncertain. The earlier apparent relation between confidence and reaction time together with the fact that ROC curves can be quite accurately inferred from confidence judgments led to the idea of trying to infer the ROC curve from response-time data. One method was proposed by Carterette, Friedman, and Cosmides (1965) and another, closely related, one by Norman and Wickelgren (1969), which is illustrated in Figure 6.6.

FIG. 6.6 Schematic of how latency ROC curves are constructed; see text for explanation.

The idea is this: For each signal, place the two subdistributions, weighted by their probabilities of occurring, back to back, with the A response on the left. From these
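artificial densities we generate the ROC as follows: Align them at the point of transition from A responses to B responses and vary the criterion. To be more explicit, the locus of points $(x, y)$ is generated as follows. For $x < P(A \mid a)$, let $t$ be such that

$$x = \int_0^t f(A, \tau \mid a)\, d\tau$$

and set

$$y = \int_0^t f(A, \tau \mid b)\, d\tau.$$

For $x > P(A \mid a)$, let $t$ be such that

$$x = P(A \mid a) + \int_t^\infty f(B, \tau \mid a)\, d\tau$$

and set

$$y = P(A \mid b) + \int_t^\infty f(B, \tau \mid b)\, d\tau.$$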
This locus of points has been called both the latency operating characteristic, abbreviated LOC, and the RT-ROC. It is important to note that Lappin and Disch (1972a, b and 1973) used the term LOC for a speed-accuracy measure that others call the conditional accuracy function (see Section 6.5.4). The construction we have just outlined is really quite arbitrary; it does not stem from any particular theoretical view about what the subject is doing. The major motive for computing it is the intuition that reaction time serves as a proxy for the subject's confidence in the response made. According to Gescheider et al. (1969), in a model with a response criterion, such as that of the theory of signal detectability, this function has to do with how far the evidence about the stimulus is from the response criterion. The only serious theoretical study of the relation between the ROC and RT-ROC is Thomas and Myers (1972). The results, which are rather complicated and will not be reported very fully here, were developed for both discrete and continuous signal detection models; that is, there is an internally observed random variable $X_s$ for signal $s$, which is either discrete or continuous and has density function $f(x \mid s)$, and there is a response criterion $\beta$ such that the response is A when $X_s > \beta$ and B when $X_s < \beta$. They assumed that the latency is a decreasing function of $|X_s - \beta|$. They considered the somewhat special case where the distributions are simply a shift family; that is, there is a constant $k_{ab}$ such that for all $x$, $f(x \mid a) = f(x - k_{ab} \mid b)$, and that the slope of the ROC curve is decreasing (which they showed is equivalent to $-d^2 \log f(x \mid s)/dx^2 > 0$). Under these assumptions, they proved that the RT-ROC lies below the ROC except, of course, at the point $(P(A \mid a), P(A \mid b))$, which, by construction, they have in common.

Emmerich et al. (1972) reported a detection study of a 400-Hz tone in noise in which detection responses and confidence judgments were made; in addition, without the subjects being aware of it, response times were recorded. The confidence and RT-ROCs are shown in Figure 6.7. Note that these are not plots of $P(A \mid a)$ versus $P(A \mid b)$, but of the corresponding Gaussian z-scores [i.e., $z(r \mid s)$ is defined by $P(r \mid s) = \Phi[z(r \mid s)]$, where $\Phi$ is the unit Gaussian distribution], which results in straight-line ROCs if the underlying distributions are Gaussian. Ordinary Yes-No and confidence ROCs are usually rather closely fit by straight lines in z-score coordinates, but as can be seen the RT-ROCs exhibit a rather distinct elbow. This result is typical of all that have been reported: Moss, Myers, and Filmore (1970) on same-difference judgments of two tones; Norman and Wickelgren (1969) using the first of memorized digit pairs as the stimuli; and Yager and Duncan (1971) using a generalization task with goldfish. The conclusion drawn by Emmerich et al. (1972, p. 72) was:

Thus latency-based ROCs should probably not be viewed as a prime source of information about sensory processing alone. Response latencies are known to be influenced by many factors, and this is undoubtedly also the case for latency-based ROCs. Yager and Duncan (1971) reach a similar conclusion . . . .

FIG. 6.7 ROC data of several types presented in z-score coordinates (displaced along the abscissa for clarity), in which case the Gaussian ROCs are straight lines. The three curves on the left are latency ROCs, and the two on the right are conventional ones obtained from choice probabilities, with the squares data from Emmerich (1968) and the triangles from Watson et al. (1964). [Figure 4 of Emmerich et al. (1972); copyright 1972; reprinted by permission.]
Additional cause for skepticism is provided by Blough's (1978) study of wavelength discrimination by pigeons, which was discussed previously in Section 6.3.4. Using the response-time distributions, RT-ROC curves were developed with the result in Figure 6.8. These do not look much like the typical ROC curves. For example, from other pigeon data in which rate of responding was observed, one gets the more typical data shown in Figure 6.9. The reason for the linear relations seen in Figure 6.8 is the bimodal character of the response-time distributions seen in Figure 6.4 (Section 6.3.4).
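The RT-ROC construction used throughout this section can be sketched in code. This is an illustration under stated assumptions rather than the procedure of any of the studies cited: it works on empirical proportions instead of densities, the function name `rt_roc` and the sort key are hypothetical devices, and the toy trials are invented. Following the text's convention, x is computed from signal a trials and y from signal b trials; the criterion sweeps through A responses from fastest to slowest and then through B responses from slowest to fastest.

```python
from typing import List, Tuple

Trial = Tuple[str, float]  # (response "A" or "B", response time)


def rt_roc(trials_a: List[Trial], trials_b: List[Trial]) -> List[Tuple[float, float]]:
    """Return RT-ROC points (x, y): x from signal a trials, y from signal b trials."""

    def key(trial: Trial) -> Tuple[int, float]:
        response, rt = trial
        # A responses rank above every B response; fast A responses rank highest,
        # and slow B responses rank above fast B responses.
        return (1, -rt) if response == "A" else (0, rt)

    keys_a = [key(tr) for tr in trials_a]
    keys_b = [key(tr) for tr in trials_b]
    cuts = sorted(set(keys_a) | set(keys_b), reverse=True)

    points = [(0.0, 0.0)]
    for cut in cuts:
        # Proportion of trials ranked at or above the current criterion.
        x = sum(k >= cut for k in keys_a) / len(keys_a)
        y = sum(k >= cut for k in keys_b) / len(keys_b)
        points.append((x, y))
    return points


# Invented toy data: four trials per signal, times in seconds.
trials_a = [("A", 0.30), ("A", 0.45), ("A", 0.60), ("B", 0.70)]
trials_b = [("A", 0.65), ("B", 0.35), ("B", 0.50), ("B", 0.80)]
points = rt_roc(trials_a, trials_b)
for point in points:
    print(point)
```

By construction the locus passes through the ordinary choice-probability point, here (0.75, 0.25), at the transition from A to B responses, and it runs from (0, 0) to (1, 1).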
6.4.2 Varying the Discriminability of the Signals
The most obvious effect of altering the separation between two signals that differ on just one dimension (for example, intensity of lights) is to alter the probability of correctly identifying them. This information was traditionally presented in the form of the psychometric function: holding signal b fixed and varying a, it is the plot of $P(A \mid a)$ as a function of the signal separation, usually either the difference, the ratio, or log ratio of the relevant physical measures. This function begins at about 1/2 for two identical signals and grows to 1 when they are sufficiently widely separated (approximately a factor of 4 in intensity). However, given the impact of response criterion just discussed, it is clear that there is no unique psychometric function but rather a family of them. For example, in the fast-guess model (Section 6.3.3) with the interpretation given the parameters above, the impact of signal difference will be on $P_{sr}$ and not directly on $\rho$ or $\beta_A$, which are thus parameters of the family of psychometric functions for this model (Eq. 6.10). For a very careful and insightful empirical and theoretical analysis of psychometric functions, see Laming (1985). He makes very clear the importance of plotting these functions in terms of different physical measures depending on the exact nature of the task. He also provides a most interesting theoretical analysis concerning the information subjects are using in the basic psychophysical experiments.

FIG. 6.8 Latency ROCs for pigeons; an example of the distributions from which these were constructed was shown in Figure 6.4. [Figure 2 of Blough (1978); copyright 1978; reprinted by permission.]

FIG. 6.9 Conventional choice probability ROCs for pigeons responding to light stimuli. [Figure 3 of Blough (1978); copyright 1978; reprinted by permission.]

In order to get a unique measure it is necessary to use something that captures the entire ROC curve. The most standard measure is called $d'$, and its calculation is well known (see Green and Swets, 1966, 1974, or Egan, 1975, or any other book on signal detection theory); it is the mean separation of the underlying distributions of internal representations normalized by some average of their standard deviations. Navon (1975) showed under the same assumptions made by Thomas and Myers (1972) that if the false-alarm rate is held constant as stimulus discriminability is varied, then $E(T \mid a, A) - E(T \mid b, A)$ is a monotonic function of $d'_{ab}$.

The second, and almost equally obvious, effect is that subjects are slower
when the signals are close and faster when they are farther apart. Moreover, this phenomenon occurs whether the conditions of signal separation are run blocked or randomized. Among the relevant references are Birren and Botwinick (1955), Botwinick, Brinley, and Robbin (1958), Grossman (1955), Henmon (1906), Festinger (1943a, b), Johnson (1939), Kellogg (1931), Lemmon (1927), Link and Tindall (1971), Morgan and Alluisi (1967), Pickett (1964, 1967, 1968), Pike (1968), Vickers (1970), Vickers, Caudrey, and Willson (1971), Vickers and Packer (1981), and Wilding (1974). The fact that it appears using randomized separations means that the phenomenon is very much stimulus controlled since the subject cannot know in advance whether the next discrimination will be easy or difficult. As Johnson (1939) made clear and as has been replicated many times since, the stimulus range over which these two measures—accuracy and time—vary appears to be rather different. At the point the response probability appears to reach its ceiling of 1, response times continue to get briefer with increasing signal separation. Moreover, the same is true for the subject's reports of confidence in the response, which is one reason that reaction time is often thought to reflect confidence in a judgment (or vice versa, since it is not apparent on what the confidence judgments are based). It is unclear the degree to which this is a real difference or a case of probabilities being very close to 1 and estimated to be 1 from a finite sample. In the latter case, it may be the ranges corresponding to changes in d' and E(T) may actually be comparable. Some debate has occurred over what aspect of the stimulus separation is controlling, differences or ratios or something else. Furthermore, there is no very good agreement as to the exact nature of the functions involved (see Vickers, 1980, pp. 36-38). 
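The ceiling problem just mentioned can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the equal-variance Gaussian form of $d'$ (the difference of the z-scores of the hit and false-alarm rates), which is one common convention and not necessarily the one used in the studies cited, and the function names and numerical values are hypothetical.

```python
from statistics import NormalDist, StatisticsError


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Equal-variance Gaussian d': z(hit rate) minus z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the unit Gaussian distribution function
    return z(hit_rate) - z(false_alarm_rate)


def prob_all_correct(p_true: float, n_trials: int) -> float:
    """Chance that all n independent trials are correct when P(correct) = p_true."""
    return p_true ** n_trials


print(d_prime(0.86, 0.14))
print(prob_all_correct(0.999, 100))

# An observed proportion of exactly 1 has no finite z-score, so d' is undefined for it.
try:
    d_prime(1.0, 0.14)
except StatisticsError as err:
    print("undefined:", err)
```

Under these assumptions a true accuracy of 0.999 would be recorded as a perfect score in most 100-trial samples, so an apparent ceiling in the response probability need not mean that discriminability has stopped growing while response time continues to fall.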
If accuracy increases and time decreases with signal separation, then they must covary and one can be plotted as a function of the other. This plot, however, is not what is meant when one speaks of a speed-accuracy tradeoff, which is discussed in Section 6.5. The only study of the dependence of response time on signal separation I shall present here in any detail is Wilding (1974), because he gives more information about the reaction-time distributions than do the others. The task for each of his seven subjects was to decide on each trial if a 300-msec spot of light of moderate brightness was to the right or left of the (unmarked) center of the visual field. There were four possible locations on each side forming a horizontal line, numbered from 1 on the left to 8 on the right, spanning a visual angle of about 0.4°. So 1 and 8 were the most discriminable stimuli and 4 and 5, the least. The data were collected in runs of 110 trials, the first 10 of which were discarded. There were four runs in each of two sessions which differed according to instructions, the one emphasizing accuracy and the other speed. In analyzing the data for certain things, such as the fastest or the slowest response, one must be cautious about sample sizes. For signals 1 and 8 the probability of being correct was virtually 1, whereas it was very much less
FIG. 6.10 For signal positions to be discriminated as being to the right or left of an unmarked point, various summary statistics about the response times as a function of signal location and whether the response was correct or in error. (1, 8), (2, 7), (3, 6), (4, 5) are all correct responses from the most distant to the least, and (5, 4) and (6, 3) are incorrect ones for the least distant and the next position. The data on the left of each subpanel were obtained with the subjects instructed to be fast, and on the right, to be accurate. Each subject participated in 400 trials; there were seven subjects. [Figure 1 of Wilding (1974); copyright 1974; reprinted by permission.]
than that for 4 and 5, and so the sample sizes of, say, correct responses were not the same. Wilding, therefore, forced comparability by choosing randomly from the larger sets samples of the same size as the smaller ones, and he reported the results both for the original and the truncated samples. Figure 6.10 shows a number of statistics. The measures all come in pairs, with the left one arising from the speed instructions and the right one from the accuracy instructions. We see a number of things. First, all of the times are very slow, indeed, much slower than the choice data discussed in Section 6.2.1. I do not understand why this was so. One possibility is that the stimuli were actually weak, but I cannot be certain from the paper. Second, the effect of the instructions is to produce a mean difference of nearly 500 msec. Third, the times that arise from errors, the pairs denoted (6, 3) and (5, 4), are slower than the times from the corresponding correct responses, (3, 6) and (4, 5). Fourth, easy discriminations are, of course, both faster and less prone to error than are difficult ones. Fifth, the pattern of the standard deviations mimics rather closely that of the means. And sixth, the skewness and kurtosis are both positive and increasing with ease of discrimination. This is in agreement with data of Vickers, Caudrey, and Willson (1971) discussed in Section 8.5.1. Figure 6.11 shows the latency frequency histograms for each subject for the accuracy condition; the columns correspond to (1, 8), (2, 7), . . . , (6, 3), as in Figure 6.10. There are not enough data here to tell much about the mathematical form involved except that both the mean and variance increase as stimulus discriminability decreases.

6.4.3 Are Errors Faster, the Same as, or Slower than the Corresponding Correct Responses?

This question has loomed important because a number of the models to be discussed in Chapters 7 to 9 make strong (and often nonobvious) predictions about the relation.
I know of three appraisals of the situation, Swensson (1972a), Vickers (1980), and Wilding (1971a), and they are inconsistent. The earlier ones, which Vickers seems to have ignored, are the more accurate. It is actually clear from the data already presented that there is no simple answer to the question. For the pigeons attempting to make a difficult discrimination, but doing it quite rapidly, the data in Figure 6.4 make clear that errors are faster than correct responses; in fact, the larger the error the faster it is. But for people also engaged in a visual discrimination, we see in Figure 6.10 that errors are slower than the corresponding correct response. The source of the difference probably is not the species of the subject, although no directly comparable data exist. According to Swensson the important difference, at least for human beings, can be described as follows. Errors are faster than correct responses when two conditions are met: the discrimination is easy and the pressure to
FIG. 6.11 For the experiment of Figure 6.10, histograms for response times in the accuracy condition. The rows correspond, from top to bottom, to the abscissa code of Figure 6.10 with (1,8) at the top; the columns are different subjects. [Figure 2 of Wilding (1974); copyright 1974; reprinted by permission.]
be fast is substantial. This is true of the data of Egeth and Smith (1967), Hale (1969a), Laming (1968), Lemmon (1927), Ollman (1966), Rabbitt (1966), Schouten and Bekker (1967), Swensson (1972a), Swensson and Edwards (1971), Weaver (1942), and Yellott (1967). A near exception to this rule are the results of Green et al. (1983). In this study well-practiced observers each making over 20,000 frequency identifications exhibited virtually the same mean times for errors and correct responses. A careful analysis of the distributions, however, did show a difference in the direction of faster errors. I will go into this experiment in detail in Section 8.5. Continuing with Swensson's rule, errors are slower than correct responses when two conditions are met: the discrimination is difficult and the pressure to be accurate is substantial. This is true of Audley and Mercer (1968), Emmerich et al. (1972), Hecker, Stevens, and Williams (1966), Henmon (1911), Kellogg (1931), Pickett (1967), Pierrel and Murray (1963), Pike (1968), Swensson (1972a), Vickers (1970), and Wilding (1971a, 1974). There are fewer studies in which the discrimination is difficult and the pressure to be fast is great. Henmon (1911) showed that under such conditions the fastest and slowest response times involved higher proportions of errors than did the intermediate times, suggesting that the subjects may have oscillated between two modes of behavior. Rabbitt and Vyas (1970) suggested that errors can arise from a failure of either what they call perceptual analysis or response selection. They state (apparently their belief) that when the failure is in response selection, errors are unusually fast; but when it is in perceptual analysis, error and correct RT distributions are the same. Apparently, perceptual analysis corresponds to what I call the decision process and is the topic of most modeling.
In Blough's experiment, the pigeons exhibited fast errors, and judging by the times involved the pigeons were acting as if they were under time pressure. Of course, Wilding found errors to be slower than correct responses under both his speed and accuracy instructions, but one cannot but question the effectiveness of his speed instructions when the fastest times exceeded 450 msec. Link and Tindall (1971) combined four levels of discriminability with three time deadlines (260 msec, 460 msec, and ∞ msec) in a study of same-different discrimination of pairs of line lengths successively presented, separated by 200 msec. Their results are shown in Figure 6.12. Note that under the accuracy condition and for the most difficult discriminations, errors are slower than correct responses; whereas at the 460-msec deadline the pattern is that errors are faster than correct responses and the magnitude of the effect increases with increased discriminability; and at 260 msec, which is only slightly more than simple reaction times, the mean times are constant, a little less than 200 msec, independent of the level of discriminability and of whether the response is correct or in error. The latter appear to be dominated by fast guesses (that is, simple reactions), but they cannot be entirely that since accuracy is somewhat above chance. A careful analysis of these data will be presented in Section 7.6.2. Thomas (1973) reported a
236 Identification Paradigms
FIG. 6.12 For same-different judgments of pairs of lines, MRT and proportion of correct judgments as a function of signal discriminability (abscissa) under three deadline conditions. In the MRT plot, the solid circles are obtained by using the fast-guess model's correction for guessing. [Figure 1 of Link and Tindall (1971); copyright 1971; reprinted by permission.]
study in which errors were fast for one foreperiod distribution and one group of subjects and slow for another distribution and group of subjects; I cannot tell from the experimental description if the signals were easy or difficult to discriminate. Heath (1980) ran a study on the discrimination of the order of onset of two lights, where the time between onsets varied, and he imposed response deadlines. Unfortunately, the data do not seem to be in a form suited to answering the question of this section.
6.5
SPEED-ACCURACY TRADEOFF
6.5.1 General Concept of a Speed-Accuracy Tradeoff Function (SATF)
Within any response-time model of the type formulated in Section 6.3.3, as the decision parameters are varied, changes occur in Ψ(r, t | σ, δ), which in turn are reflected in f(r, t | E). The general intuition is that these changes are such that the marginal measures of probability of choice, P(r | E), and of response time, f(t | E), covary. The more time taken up in arriving at a decision, the more information available, and so the better its quality. This statement is, perhaps, overly simple since if a great deal of time is allowed to pass between the completion of the signal presentation and the execution of the response, then the accuracy of responding may deteriorate because of some form of memory decay. But for a reasonable range of times the
Two-Choice Reaction Times: Basic Ideas and Data 237
statement appears to be correct. What is probably most relevant is the portion of the stimulus presentation that can be processed before an order to respond is issued. Usually we attempt to control that time indirectly through time deadlines on the response time. As usual, the theory involves a covariation due to changes in parameters, and the data a covariation due to changes induced in E. For example, suppose the experimenter imposes a time deadline on the subject such that any response occurring after signal onset but before the deadline is rewarded for accuracy, but those that are slower than the deadline are fined independent of their accuracy. The effect of changes in the deadline is to vary both the accuracy and the mean response time. Suppose that we suppress all of the notation for E except for the two things that vary from trial to trial—namely, the signal presented, s, and the deadline, δ, imposed. The averaged data then consist of four pairs of numbers:
for each value of δ. If we think of these as functions of δ, then we may solve for the one in terms of the other, eliminating δ, yielding four empirical functions of the form
These are called empirical speed-accuracy tradeoff functions (SATF) or latency-probability functions (LPF). I use the former term and abbreviation. As four separate functions are a bother, especially since they probably contain much the same information, in practice some simplification is made. Usually a single measure of accuracy—four different ones will be mentioned below—is plotted against the overall MRT, and that is referred to as the SATF. Other terms are found in the literature such as the speed-accuracy operating characteristic or S-A OC (Pew, 1969) and the macro tradeoff (Thomas, 1974). The rationale for the latter term will become apparent later. One must be ever sensitive to the fact that such a collapse of information may discard something of importance. Observe that if we vary something in the environment, different from the deadline, that affects both speed and accuracy, we may or may not get the same SATF. As an example of a different procedure, Reed (1973) signaled the subjects when to respond, and he varied the time during the presentation of the reaction signal when the response was to be initiated. Theoretically, what SATF we get depends upon which decision parameters are affected by the experimental manipulation. Since we do not usually have a very firm connection between E and δ, there can be disagreement about which theoretical tradeoff goes with which empirical one. For the fast-guess model this is not really an issue, since the only decision parameter affecting E(T | s) (Eq. 6.11) is p. So if we eliminate it between Eqs. 6.10 and 6.11 we
obtain the unique theoretical SATF,
which is the simplest possible relation, a linear one. Swensson and Thomas (1974) described a broad class of models, called fixed-stopping ones, that yields a relation that has played a role in empirical investigations. Suppose that there is a series of n distinct observations X_i, each taking time T_i, i = 1, ..., n, where all of the random variables are independent, the X_i are identically distributed, and the T_i are also identically distributed. The density of X_i, f(x | s), depends on the signal presented whereas that of T_i does not. If R is the residual time, then the response time is T = R + T_1 + ... + T_n. So
Assume the decision variable to be the logarithm of the likelihood of the observations; that is, For n reasonably large, the Central Limit Theorem (Appendix A.1.1) implies that Y_n is distributed approximately as a Gaussian with mean E(Y_n | s) = μ_s and variance V(Y_n | s) = σ_s²/n, s = a, b. If a response criterion c is selected for responding A when Y_n > c, then we see that
Denoting the z-score of P(A | s)—that is, the upper limit on the unit normal that yields this probability—by z(s), then eliminating c between z(a) and z(b) yields the linear ROC curve There are several ways to define a so-called d' measure of the accuracy of performance described by the ROC curve; perhaps the simplest is that value of z(a) corresponding to z(b) = 0; that is, Now, if the speed-accuracy tradeoff is achieved by varying n, we see that it is given by They also describe another class of models with an optional stopping rule, in which the value of n depends upon the observations actually made; it is more complex and versions of it are described in detail in Sections 8.2 and 8.3.
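The fixed-stopping argument can be illustrated numerically. The sketch below is not taken from Swensson and Thomas: it assumes the Gaussian approximation stated above, writes z(s) = (μ_s - c)√n/σ_s, takes d' to be z(a) at z(b) = 0 (so the criterion sits at μ_b), and uses hypothetical names and values (`mu_a`, `mu_b`, `sigma_a`, `mean_obs_time`, `mean_residual`). Under those assumptions d' grows as the square root of n while E(T) grows linearly in n, so (d')² is linear in E(T).

```python
import math

def fixed_stopping_point(n, mu_a, mu_b, sigma_a, mean_obs_time, mean_residual):
    """Return (E(T), d') for a fixed number n of observations.

    Assumptions (not from the source text): Y_n is Gaussian with mean mu_s
    and variance sigma_s**2 / n; z(s) = (mu_s - c) * sqrt(n) / sigma_s; and
    d' is the value of z(a) when z(b) = 0, that is, when c = mu_b.
    """
    expected_rt = mean_residual + n * mean_obs_time
    d_prime = (mu_a - mu_b) * math.sqrt(n) / sigma_a
    return expected_rt, d_prime

# Hypothetical parameter values, chosen only for illustration (times in seconds).
params = dict(mu_a=0.5, mu_b=-0.5, sigma_a=2.0,
              mean_obs_time=0.010, mean_residual=0.150)

# Trace out the tradeoff by varying n, as in the text.
satf = [fixed_stopping_point(n, **params) for n in range(1, 21)]

for expected_rt, d_prime in satf[::5]:
    print(f"E(T) = {expected_rt:.3f} s, d' = {d_prime:.3f}, d'^2 = {d_prime ** 2:.3f}")
```

Plotting (d')² against E(T) for these points gives a straight line that meets the time axis at the mean residual time, which is one reason the squared measure is a natural candidate for a linear SATF in this class of models.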
With more complicated models in which E(T | s) and/or P(r | s) depend upon two or more decision parameters, there can be a great deal of uncertainty as to what theoretical relation to compare with what data. Weatherburn (1978) pointed this out as a serious issue for a number of the models we shall discuss in Chapter 8. Pike and Dalgleish (1982) attempted to counter his admittedly correct observations by showing that for some of the models the locus of possible pairs of speed and accuracy values is sufficiently constrained under all possible (or plausible) values of the parameters that the model can, in principle, be rejected. Weatherburn and Grayson (1982) replied, reemphasizing that the rejoinder rested on interpretations of model parameters, which may not be correct. The fact is that the models they were discussing indeed do have a wide range of possible speed-accuracy pairs consistent with them, not a simple function. Great caution must be exercised when a model has more than one decision parameter that affects both speed and accuracy. This point, which I believe to be of considerable importance, has not been as widely recognized as it should be by those who advocate the use of SATFs. In a sense, whether a tradeoff plot is useful depends, in part, on the complexity of the model one assumes to underlie the behavior. However, ignoring the tradeoff or assuming that certain parameters can be held constant by empirically trying to achieve constancy of accuracy or of time is subject to exactly the same difficulties. A substantive realization of these observations can be found in Santee and Egeth (1982), in which they argue that in some cognitive tasks—they use letter recognition—experimental procedures that permit the use of accuracy measures draw upon a different aspect of the cognitive processing than do procedures that use response-time measures. In their study they manipulated exposure duration.
They argued that for brief exposures the accuracy is affected by limitations on data processing by the subject, whereas with long exposures, which are typical of cognitive experiments, the behavior studied has to do with response limitations. For example, in their Experiment 1, there were two letters on either side of a fixation point, and an arrow under one indicated that the subject was to respond whether that letter was an A or an E. There were three conditions involving the other letter: it could be the same as the indicated one, or the other target letter, or an irrelevant letter (K and L were used). I refer to these as same, different, and irrelevant, respectively. With the tachistoscope timed to produce about 75% overall accuracy in each subject (8 to 20 msec exposure durations), it was found that subjects were most accurate in the different condition and least accurate in the same one. With an exposure of 100 msec and instructions to respond as rapidly as possible while maintaining a high degree of accuracy, they were fastest for same and slowest for different. They believe these findings to be inconsistent, and so conclude that different aspects of the process are being tapped.
6.5.2
Use of the SATF
Consider assessing the impact of some variable, say the amount of alcohol ingested, upon performance. If as the dosage level is increased both accuracy and MRT change in the same direction, then it can be quite unclear whether there has been a change in the quality of performance or merely a change in the speed-accuracy tradeoff. In particular, it may be quite misleading to plot just one of the two measures against dosage level. For a summary of references in which ambiguous results about alcohol have been presented, see Jennings, Wood, and Lawrence (1976). Some experimenters have attempted to overcome this problem by experimentally controlling one of the two variables. For example, one can use some sort of time payoff scheme, such as band payoffs or a deadline, to maintain MRT within a narrow range as the independent variable—in this case, amount of alcohol ingested—is manipulated, and to evaluate the performance in terms of accuracy. Alternatively, one can attempt to control accuracy. This approach is widely used in the study of short-term memory, as we shall see in Chapter 11. Often an attempt is made to keep the error rate low, in the neighborhood of 2% to 5%. Not only is it difficult to estimate such a rate with any degree of accuracy, but if the SATF is changing rapidly in the region of small errors—which as we shall see it often appears to be—then this is a region in which very large time changes can correspond to very small changes in the error rate, making it very unlikely that the intended control is effective. Furthermore, one can easily envisage a model having more than one sensory stage, one of which exhibits changes only in accuracy and another of which involves a speed-accuracy tradeoff. If experimentally we control accuracy through the first stage, then we will have done nothing whatsoever to control the SATF, which is under the jurisdiction of the second stage.
Because of these difficulties in keeping control of one of the variables, some authors (most notably Wickelgren and those associated with him, but also Ollman, Swensson, and Thomas) have taken the position that the only sensible thing to do is to estimate the entire SATF and to report how it as a whole varies with the experimental manipulation. This attitude parallels

TABLE 6.4. Mean slopes and intercepts of the best-fitting linear regression for each alcohol condition (Jennings et al., 1976)

Dose (mg/kg)    Slope (bits/sec)    Intercept (msec)
0               6.45                168
.33             5.71                162
.66             4.92                161
1.00            4.90                173
1.33            3.38                150
FIG. 6.13 Schematic of the general tradeoff relation (SATF) believed to hold between some measure of accuracy and response time. For any time below t_1 accuracy is nil; for times above t_2 accuracy does not change; and in between it is monotonic increasing. [Figure 1 of Wood and Jennings (1976); copyright 1976; reprinted by permission of the publisher.]
closely that expressed by those who say discrimination can only be studied via the ROC curve, and any measure of discriminability should relate to that curve. As an example, Jennings et al. (1976) studied the SATF for choice reactions involving the identification of 1000-Hz and 1100-Hz tones. To manipulate the tradeoff they used a variety of deadlines, and subjects were paid for accuracy when responses were faster than the deadline and were fined for the slower responses. Measuring accuracy in terms of information transmitted (see Section 6.5.3 for a general discussion of accuracy measures), they fit linear functions to the curves and Table 6.4 shows how the intercept and slope varied with alcohol dose level. The slope is affected systematically, and the intercept somewhat irregularly.
6.5.3
Empirical Representations of SATFs
A certain amount of discussion has been devoted to the best way to present the empirical SATF. The initial studies* (Fitts, 1966; Pachella & Pew, 1968) separately plotted percent correct and MRT as functions of the decision strategy variable manipulated by the experimenter. Schouten and Bekker (1967), using a somewhat different measure of performance, which will be discussed in detail in Section 6.5.4, plotted their measure of accuracy against MRT. They, and others who have plotted a probability measure of accuracy against MRT, have found the general pattern shown in Figure 6.13, which is composed of three separate pieces that can, to a rough first approximation, be thought of as linear pieces. Up to a certain time, the accuracy level remains at chance, after which it grows linearly until it reaches its ceiling of 1, after which it stays at perfect accuracy with increases in time.
* Perhaps the earliest relevant study is Garrett (1922) in which he said (p. 6), "Everyday knowledge seems to indicate that, in general, accuracy diminishes as speed increases, but there is little detailed information beyond the bare statement." He then went on to study how accuracy is affected by stimulus exposure time, but he did not directly manipulate the overall response time, as such, and so it was not really an example of a SATF.
The important facts are that for sufficiently short times, accuracy is nil; beyond another time, changes in MRT, which do occur, do not seem to affect the accuracy, which is virtually perfect; and between the two times there is a monotonic increase in accuracy. As was noted earlier, it is unclear whether accuracy really does become perfect or whether we are dealing with an asymptotic phenomenon. The models usually imply the latter. Taylor, Lindsay, and Forbes (1967) replotted the Schouten and Bekker data, replacing the probability measure of accuracy by the d' measure of signal detectability theory, and they showed that (d')² was approximately linear with MRT. Pew (1969), noting that log odds = log[P_C/(1 - P_C)], where P_C is the probability of being correct, is approximately linear with (d')² in the 2-alternative case, replotted the data existing at the time in log odds, which is shown in Figure 6.14. Another set of data, plotted in the same way, will be presented in Figure 9.6 (Section 9.3.4) when we discuss the timing and counting models. Lappin and Disch (1972a) raised the question as to which of several measures of accuracy gave the most linear plot against MRT. They compared d'; (d')² (see Eq. 6.14); information transmitted, that is,
(see Section 10.2 for a rationale); and
which becomes log odds in a symmetric situation and was suggested by Luce (1963). For each they established the best linear regression and evaluated the fit by the percent of variance accounted for; it had the small range of .86 to .91 with d' having a very slight edge over the others. Swensson (1972a) made a similar comparison with similar results. Salthouse (1981) reported for each of four subjects the correlations of MRT with P_C, d', (d')², log odds, and information transmitted. Again, the range of values was not large—.706 to .924. Ranking the five measures for each subject and adding the ranks show information transmitted to be best, (d')² the worst, and the other three about midway with little difference among them. So among these measures of accuracy none provides a clearly better correlation with MRT, and as we shall see, different theories suggest different choices. Why do we concern ourselves with the question of which accuracy measures lead to linear SATFs? One reason is ease of comparison, but that is hardly overriding since as we shall see shortly in certain memory studies comparisons are readily made among exponential fits. A more significant reason is formulated as follows by Thomas (1974, p. 449): "Capacity [of an information processing system] is usually defined as the maximum rate at which information is processed, and it is measured by finding that monotonic
FIG. 6.14 SATF, with accuracy measured as log odds, from four studies: a is Schouten and Bekker (1967) and Pachella and Pew (1968), b and c are data from P. Fitts, and d is data from R. Swensson. [Figure 1 of Pew (1969); copyright 1969; reprinted by permission.]
function of accuracy which is linear with reaction time. The slope of this line is taken to be the measure of capacity...." To a considerable degree, this appears to be definitional rather than descriptive. It is, however, disconcerting that the linear function usually does not pass through the origin. For further discussion of the concept and modeling of capacity, see Townsend and Ashby (1978, 1984). Kantowitz (1978), in response to a general survey of SATFs and their role in cognitive research by Wickelgren (1977), was highly critical of our current uncertainty about which accuracy measure to use. He cited Townsend and
Ashby (1978, p. 122) as suggesting some reasons why the Lappin and Disch (1972a) and Swensson (1972a) studies were inconclusive, including the possibility of too variable data, the fact that some of the measures are nearly identical (although not d' and its square), and that the range of RTs from 100 to 300 msec was simply too small to see much in the way of nonlinearities. Wickelgren's (1978) response, while sharp about other matters, does not really disagree with this point. Those working on memory rather than sensory discrimination have not found d' to be very linear with MRT, as a summary by Dosher (1979) of some of these studies makes clear. Corbett (1977), Corbett and Wickelgren (1978), Dosher (1976), and Wickelgren and Corbett (1977) all fit their SATFs by the cumulative exponential
Reed (1973) used the somewhat more complex
in order to account for a drop in d' with sufficiently large times. Ratcliff (1978) proposed
which Dosher (1979) noted is a special case of Reed's formula. Both the exponential and Reed's formula are ad hoc; Ratcliff's follows from a continuous random walk model to be discussed in Section 11.4.4. McClelland (1979) used his cascade model (Section 4.3.2) to arrive at another possible formula for the SATF. He assumed that for each s, r pair the level of activation is the deterministic one embodied in Eq. 4.12 perturbed by two sources of noise. One of these is purely additive and he attributed it to noise associated with activation of the response units. He took it to be Gaussian with mean 0 and variance 1, thereby establishing a unit of measurement. Let it be denoted X. The other source is assumed to be additive on the scale factor As that multiplies the generalized gamma r n (f). This random variable Y is also assumed to be Gaussian with mean 0 and variance cr*. Moreover, the residual time is R. Putting this together, the decision variable is Assuming the random variables are independent, then its expected value and variance are
Using the usual signal detection approach to this Gaussian decision variable, d' between signals a and b is easily seen to be
where o-2 = 2(cr^+crf;) 1/2 . He showed numerically that for appropriate choices of the parameters, this equation is virtually indistinguishable from Wickelgren's cumulative exponential.
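The cumulative exponential used in the memory studies cited above can be sketched in code. This is an illustration under assumptions, not a fit to any reported data: it takes the commonly cited three-parameter form d'(t) = λ(1 - exp[-β(t - δ)]) for t > δ and 0 otherwise, and the names `asymptote`, `rate`, and `intercept` (for λ, β, and δ) and all numerical values are hypothetical.

```python
import math

def exponential_satf(t, asymptote, rate, intercept):
    """d' at processing time t under the assumed cumulative exponential form."""
    if t <= intercept:
        return 0.0  # accuracy stays at chance until the time intercept
    return asymptote * (1.0 - math.exp(-rate * (t - intercept)))

def time_to_reach(d_prime, asymptote, rate, intercept):
    """Invert the assumed form: the time at which accuracy equals d_prime."""
    if not 0.0 <= d_prime < asymptote:
        raise ValueError("d_prime must lie in [0, asymptote)")
    return intercept - math.log(1.0 - d_prime / asymptote) / rate

# Hypothetical parameters: asymptote in d' units, rate in 1/sec, intercept in sec.
fit = dict(asymptote=3.0, rate=5.0, intercept=0.2)

# Time cost of the same 0.2 gain in d' at low and at high accuracy.
low_step = time_to_reach(1.0, **fit) - time_to_reach(0.8, **fit)
high_step = time_to_reach(2.8, **fit) - time_to_reach(2.6, **fit)
print(f"time for d' 0.8 -> 1.0: {low_step * 1000:.1f} msec")
print(f"time for d' 2.6 -> 2.8: {high_step * 1000:.1f} msec")
```

With any parameters of this form, an equal increment in d' costs more time the closer performance is to the asymptote, which is why holding the error rate at a small value gives little control over where on the tradeoff a subject is operating.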
6.5.4
Conditional Accuracy Function (CAF)
For the joint density f(r, t | E), the plot of
versus t is a tradeoff function of some interest. Thomas (1974) called it the micro tradeoff in contrast to the macro tradeoff of the SATF; Lappin and Disch (1972a, b) used the term latency operating characteristic (which of course has also been suggested for other things); Rabbitt and Vyas (1970), the T-function; and Ollman (1977) the conditional accuracy function (CAF), which Lappin (1978) has adopted as better than LOC. And as I think CAF is the most descriptive, I too shall use it. A major difference between the CAF and SATF is that the former can be computed in any experimental condition for which sufficient data are collected to estimate the density functions, whereas the latter is developed only by varying the experimental conditions. For example, by using several response time deadlines one can generate the SATF, and one can compute a CAF for each deadline separately. To get some idea of just how distinct the CAF is from the SATF, we compute it for the fast-guess model. By Eqs. 6.8 and 6.9,
Obviously, the form of P(r | t, s) as a function of t depends entirely upon the forms of g_0 and g_1, whereas the SATF of Eq. 14 is linear in MRT independent of their forms. Actually, we can establish the general relationship between the CAF and SATF as follows (Thomas, 1974). Let r' denote the response other than r, then by Bayes' theorem (Section 1.3.2),
FIG. 6.15 A possible relation between SATF and conditional accuracy function (CAF).
Substituting the SATF for P(r | s) establishes the general connection between them. Observe that,
Thus, the general character of the pattern relating the CAF and SATF as one varies MRT must be something of the sort shown in Figure 6.15. One nice result involving the CAF has been established by Ollman (unpublished, 1974).
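To make the contrast between the two tradeoffs concrete for the fast-guess model, the following sketch computes both. It is an illustration under assumptions rather than a transcription of Eqs. 6.8 to 6.11: `p` is taken to be the probability that a trial is stimulus controlled (accuracy `p_sr`, latency density g1 with mean `nu1`) rather than a guess (response probability `beta_r`, latency density g0 with mean `nu0`), and the Gaussian latency densities and all numerical values are hypothetical.

```python
import math

def normal_pdf(t, mean, sd):
    """Gaussian density, used here only as a stand-in for g0 and g1."""
    z = (t - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def satf_point(p, p_sr, beta_r, nu0, nu1):
    """(E(T | s), P(r | s)) for mixing probability p."""
    return p * nu1 + (1 - p) * nu0, p * p_sr + (1 - p) * beta_r

def caf(t, p, p_sr, beta_r, g0, g1):
    """P(r | t, s): accuracy given that the response occurred at time t."""
    w1 = p * g1(t)        # weight of the stimulus-controlled component at t
    w0 = (1 - p) * g0(t)  # weight of the guessing component at t
    return (w1 * p_sr + w0 * beta_r) / (w1 + w0)

# Hypothetical values (times in seconds).
nu0, nu1, sd = 0.20, 0.40, 0.05
p_sr, beta_r = 0.95, 0.50
g0 = lambda t: normal_pdf(t, nu0, sd)
g1 = lambda t: normal_pdf(t, nu1, sd)

# SATF: vary the decision parameter p. CAF: hold p fixed and vary t.
satf = [satf_point(p, p_sr, beta_r, nu0, nu1) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
caf_curve = [(t, caf(t, 0.5, p_sr, beta_r, g0, g1))
             for t in (0.15, 0.25, 0.30, 0.35, 0.45)]

for mrt, prob in satf:
    print(f"SATF: MRT = {mrt:.3f}, P(r | s) = {prob:.4f}")
for t, prob in caf_curve:
    print(f"CAF:  t = {t:.3f}, P(r | t, s) = {prob:.4f}")
```

The SATF points fall on a straight line whatever densities are chosen, whereas the CAF is a sigmoid whose shape is set entirely by the two latency densities, so the two functions generally disagree at the same time value.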
Theorem 6.2. Suppose in a two-choice situation, r' denotes the other response from r. Let T denote the response time random variable and suppress the notation for the environment except for the signal presented. If P(r | s) > 0 and P(r' | s) > 0, then
Proof.
We use
and Eq. 6.17 in the following calculation:
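One way to carry out such a calculation is sketched below. It is a reconstruction under an assumption, namely that the theorem asserts E(T | s, r) - E(T | s, r') = cov[T, P(r | T, s)] / [P(r | s)P(r' | s)]; the shorthand P = P(r | s) and P' = P(r' | s) = 1 - P is introduced here only for brevity.

```latex
% Sketch only; P = P(r \mid s), P' = P(r' \mid s) = 1 - P.
\begin{align*}
E[T\,P(r \mid T, s) \mid s]
  &= \int t\,P(r \mid t, s)\,f(t \mid s)\,dt
   = \int t\,f(r, t \mid s)\,dt
   = P\,E(T \mid s, r), \\
E[P(r \mid T, s) \mid s] &= P, \qquad
E(T \mid s) = P\,E(T \mid s, r) + P'\,E(T \mid s, r'), \\
\operatorname{cov}[T, P(r \mid T, s)]
  &= P\,E(T \mid s, r) - P\,\bigl[P\,E(T \mid s, r) + P'\,E(T \mid s, r')\bigr] \\
  &= P\,P'\,\bigl[E(T \mid s, r) - E(T \mid s, r')\bigr].
\end{align*}
```

Dividing by PP', which is positive by hypothesis, gives the difference in conditional mean times as the covariance scaled by a positive constant, so the two share a sign.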
So, whether errors to a given signal are faster or slower than the corresponding correct responses depends on whether the correlation embodied in the CAF is positive or negative. Note, this is not the correct versus error comparison made in Section 6.3.3 since there it was the response, not the stimulus, that was constant in the comparison. For the fast-guess model,
Since p(1 - p) > 0 and ν_1 > ν_0, the covariance is positive or negative as P_sr - β_r is positive or negative.
6.5.5
Use of the CAF
Lappin and Disch (1972a) and Harm and Lappin (1973) explored the question: does a subject's knowledge of the presentation probability affect
FIG. 6.16 CAF for subjects identifying random dot patterns with two different presentation schedules, where response time is manipulated by 20-msec reward bands. The accuracy measure is -½ ln[P(B | a)P(A | b)/P(A | a)P(B | b)]. [Figure 1 of Lappin (1978); copyright 1978; reprinted by permission.]
the subject's ability to discriminate signals? Using the CAF, with accuracy measured by -ln η = -½ ln[P(B | a)P(A | b)/P(A | a)P(B | b)], they found the CAF to be essentially unaffected. This was done with perfectly discriminable signals, so Lappin (1978) repeated the study using difficult-to-discriminate signals—namely, pairs of random patterns of eight dots located in an invisible 8 x 8 matrix. Subjects were required to confine their responses to a 100-msec wide band that was located so as to get about 75% correct responses. There were two conditions: 50:50 and 75:25 presentation schedules. The resulting CAF is shown in Figure 6.16, and we see that the CAF has a slight tendency to be flatter in the biased condition, but without additional data it is probably unwise to assume any real effect. By contrast, the bias measure of choice theory, ln β = ½ ln[P(A | a)P(A | b)/P(B | a)P(B | b)], is much affected, as seen in Figure 6.17.
6.5.6
Can CAFs be Pieced Together to Get the SATF?
Apparently the first appearance of CAFs in the reaction-time literature was in the Schouten and Bekker (1967) study in which they manipulated
reaction time rather directly. The reaction signals were lights, one above the other, which the subject identified by key presses. In addition, three 20-msec acoustic "pips" spaced at 75 msec were presented, and subjects were instructed to respond in coincidence with the third pip. The time between that pip and the signal onset was manipulated experimentally. The data were reported in terms of CAFs, although that term was not used. In principle, a CAF can be estimated in its entirety for each speed manipulation, but because of the fall off of the RT density on either side of the mean the sample sizes become quite small for times more than a standard deviation away from the mean. For this reason, it is tempting to try to piece them together to get one overall function. Schouten and Bekker's data, the means of which were replotted by Pew in Figure 6.14, are shown in Figure 6.18. Their conclusion was that these CAFs lie on top of one another and so, judging by Figure 6.15, they may actually reconstruct the SATF. Wood and Jennings (1976) discussed whether it is reasonable to expect this piecing together to work—of course, we already know from the example of the fast-guess model that it cannot always work. They presented data from the end of the training period of their alcohol study. The CAFs calculated for each deadline are shown in Figure 6.19 and as is reasonably apparent—they confirmed it by a non-parametric analysis of variance—these estimated CAFs are not samples from a single function. At a theoretical level Ollman (1977) raised the question under what conditions would the CAF and SATF coincide. He found a set of sufficient conditions that he dubbed the Adjustable Timing Model (ATM). Because the framework given in Eq. 6.4 is somewhat more general than that postulated by Ollman, we must add a condition not mentioned explicitly in his ATM. This is the postulate that the sensory parameters are fixed, not random variables, which we denote σ = σ(E).
Thus, Φ(ε | E) = Φ(δ | E), where δ is the decision parameter vector. Introducing this assumption into
FIG. 6.17 For the experiment of Figure 6.16, response bias as measured by ½ ln[P(A | a)P(A | b)/P(B | a)P(B | b)] versus band location. Note the substantial shifts due to changes in presentation probability. [Figure 2 of Lappin (1978); copyright 1978; reprinted by permission.]
FIG. 6.18 SATF pieced together from a number of CAFs, where the stimuli were lights and the subjects were induced to use different response times by a series of three auditory pips (duration 20 msec each, separated by 75 msec) and the reaction to the signal light was to coincide with the third pip. These were set at the values T = 100, 200, 300, 400, 600, and 800 msec. The data are averaged over 20 subjects and have 4000 responses per CAF. [Figure 4 of Schouten and Bekker (1967); copyright 1967; reprinted by permission.]
FIG. 6.19 CAFs for the identification of 1000- and 1100-Hz tones with response deadlines and payoffs for responses within the deadline. They do not appear to be samples from a single function. [Figure 2 of Wood and Jennings (1976); copyright 1976; reprinted by permission of the publisher.]
Eqs. 6.4 and 6.15,
where we have used the fact Σ_r P(r | t, σ, δ) = 1. Ollman's major assumption is that the impact of the decision parameter vector δ on r is completely indirect, via its impact on t; that is, From Eqs. 6.20 and 6.21, which embody the assumptions of the ATM, it is easy to see that The significance of this is that any manipulation of E that affects δ, and so t, but not σ, will generate the same CAF, namely, P(r | t, σ). It does not follow that this function is the same as the SATF; in general, they will differ. However, as Ollman has pointed out, if the ATM holds and if the CAF is linear—that is, where k and c are independent of t but may very well depend on r and on σ—then the SATF is To show this, consider
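The step can be sketched as follows. This is a reconstruction under the ATM assumptions just stated, with the linear CAF written as P(r | t, σ) = kt + c and the response-time density written as f(t | σ, δ); both forms are assumptions about notation rather than quotations of the displayed equations.

```latex
% Sketch only: average the linear CAF over the response-time density.
\begin{align*}
P(r \mid \sigma, \delta)
  &= \int P(r \mid t, \sigma)\,f(t \mid \sigma, \delta)\,dt \\
  &= \int (kt + c)\,f(t \mid \sigma, \delta)\,dt \\
  &= k\,E(T \mid \sigma, \delta) + c .
\end{align*}
```

Because k and c do not depend on δ, every manipulation that acts only through δ moves the point (E(T), P(r)) along the same line, and that line has the slope and intercept of the CAF.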
To my knowledge, no good a priori reasons exist to expect ATM to hold. Indeed, the whole philosophy of the theory of signal detectability is exactly the opposite—namely, that decision parameters do in fact directly affect the
FIG. 6.20 SATF (solid circles) and CAF (open circles) calculated from the same set of data for two subjects. Note that both are approximately linear, but they are distinct functions. [Figure 3 of Wood and Jennings (1976); copyright 1976; reprinted by permission of the publisher.]
choice probabilities and most of the models we shall examine in the next two chapters fail to meet the conditions of ATM. Moreover, Wood and Jennings (1976) have presented data where both the CAF and SATF appear to be linear, but quite different, as can be seen in Figure 6.20.
6.5.7
Conclusions about SATFs
I believe the SATF may prove to be of comparable importance to the ROC curves of error tradeoffs in giving us a way to describe the compromises subjects make between accuracy and time demands. In general, enough data should be obtained in order that some summary of the SATF can be reported. As yet, there is no consensus about the best way to summarize the SATF. Presumably, as we better understand the empirical SATFs, we will arrive at a simple measure to summarize them, something comparable to d' for ROC curves. Some authors have attempted to find an accuracy measure that leads to a linear relation, in which case two parameters summarize it. However, apparent linearity seems to be relatively insensitive to what appear, on other grounds, to be appreciable nonlinear changes in the accuracy measures. Other authors, working with search paradigms (see Chapter 11) have attempted to fit the resulting function of d' versus MRT by one or another of several families of curves, and they then study how the several parameters of the fitted family vary with experimental manipulations; however, there are no compelling theoretical reasons underlying the choice of some of these families and no real consensus exists as to which is best to use. Wickelgren (1977) has presented a very spirited argument for preferring SATF analysis to either pure reaction time or pure error analysis. As was
noted, Kantowitz (1978) took exception on the grounds, among others, that we do not really know which accuracy measure to use. Weatherburn (1978) observed that if two or more parameters are involved, then the SATF is simply not a well-defined function, but rather a region of possible speed-accuracy pairs that can sometimes be thought of as a family of functions. Schmitt and Scheirer (1977) took exception to the attempts to use SATFs on the grounds that one does not know which family of functions to use, and they claimed that the analysis of data had, in several cases, been incomplete in terms of the family chosen. Dosher (1979), one of those attacked, provided a vigorous defense, pointing out gross errors in the critique, and giving a careful appraisal of the situation. Without denying our uncertainty about how to summarize the data and the real possibility that because of multiple parameters there may be no single function relating speed and accuracy, there can be little doubt that presenting SATFs in some form is more informative than data reported just in terms of MRTs with error rates either listed, usually in an effort to persuade the reader that they have not varied appreciably, or merely described as less than some small amount. As was mentioned previously, the problem is that in many of the plots, such as the exponential relation between d' and MRT, small changes in the accuracy measure can translate into far larger changes in the MRT when the error rate is small than when it is large. This parallels closely the problem of ill-defined psychometric functions arising because at small values of P(A | b) a change of one percentage point corresponds to a very large change in P(A | a); that is, the ROC is steep for small P(A | b). The CAF, which has been invoked by some as almost interchangeable with the SATF, is in fact completely distinct from the SATF, and there really is no justifiable reason to treat them as the same.
Despite the fact that the CAF is defined for each experimental condition, on the whole I think it is the less useful of the two measures. The CAF does not clearly address the tradeoff of interest, it is certainly far more difficult to pin down empirically over a wide range of times, and in most theoretical models it is analytically less tractable than the SATF.
6.6 SEQUENTIAL EFFECTS*
* I am particularly indebted to D. R. J. Laming for severe, and accurate, criticism of an earlier version of this section. I suspect that he will view my changes as inadequate, especially since his primary recommendation was that I drop the section entirely, in part at least, on the grounds that too little consensus exists about the empirical facts for the material to be of any real use to model builders. My judgment is that, in spite of its complexity and inconsistencies, this literature is simply too important to ignore.
Up to now I have treated the data as if successive trials are independent. On that assumption, it is reasonable to suppose that f(r, t | E) on a trial depends
on the signal presented on that trial and upon the entire experimental context, but not on the preceding history of stimuli and responses. If that were so, then this density function could be estimated by collecting together all trials on which a particular stimulus was presented and forming the (r, T) histogram. But if the trials are not independent, we are in some danger when we make such an estimate. At the very least, the usual binomial computation for evaluating the magnitude of the variance of an estimate based upon a known sample size is surely incorrect (Norman, 1971). Depending upon the signs of the correlations, the dependence can make the estimate either too large or too small. Beyond that, if the trials are not independent, then we face the major theoretical problem of trying to account systematically for the dependencies. The evidence for the existence of such dependencies or sequential effects is very simple: we determine whether the estimate of some statistic, usually either the response probabilities P(r | s) or the corresponding expected reaction time E(T | s, r), differs appreciably depending upon how much of the history is taken into account. As stated by Kornblum (1973b, p. 260), "The term sequential effect may be defined as follows: If a subset of trials can be selected from a series of consecutive trials on the basis of a particular relationship that each of these selected trials bear to their predecessor(s) in the series, and the data for that subset differs significantly from the rest of the trials, then the data may be said to exhibit sequential effects." Let it be very clear that the principle of selection depends on events prior to the trial in question and does not in any way depend upon the data from that trial.
6.6.1 Stimulus Controlled Effects on the Mean
The most thoroughly studied sequential effects are those arising when the signal on the current trial is the same as or different from that on the preceding trial. These trials are referred to, respectively, as repetitions and alternations (in the case of two stimuli) or as non-repetitions (in the case of more than two stimuli). The discussion of sequential effects begins here for the two-stimulus, two-response situation, and I draw heavily upon the survey articles of Kirby (1980) and Kornblum (1973b). One notable omission from the Kirby review is the extensive set of experiments and their detailed analysis in Laming (1968); among other things, his sample sizes (averaged over the subjects) are appreciably larger than any of the other studies except Green et al. (1983), who report very large samples on a few subjects. Although we can clearly demonstrate the existence of such effects in the two-choice experiment and discover a number of their properties, many of the hypotheses that arise can only be tested in k-choice designs with k ≥ 3. So our discussion of sequential effects will resume in Section 10.3 when we turn to these more complex reaction-time experiments. An illustration of the limitations of the k = 2 case may be useful. Suppose
we have reason to believe that the magnitude of the sequential effects depends both upon the relative frequency with which the signals are presented, Pr(s_n = a) = p, where s_n is the signal presented on trial n, and upon the tendency for the signals to be repeated in the presentation schedule (which is sequential structure imposed by the experimenter), which we make independent of the particular signal; that is, P(s_n = a | s_{n-1} = a) = P(s_n = b | s_{n-1} = b) = P. Since P(s_n = a) is independent of n, it must satisfy the constraint
p = Pp + (1 - P)(1 - p),
and solving, either P = 1 or p = 1/2. Thus, if we wish to vary p and P independently, we must either use more than two signals or abandon the condition that the probability of repetition is the same for both signals.
Another problem in studying sequential effects is reduced sample sizes. Suppose we consider the purely random schedule, and we wish to partition the history of stimulus presentations back m trials. If the overall sample is of size N, then each history has an expected sample size of N/2^m and so the standard error of the resulting mean estimate is 2^{m/2}σ/N^{1/2}. So, for example, if the true MRT is 400 msec with a standard deviation of 75 msec and the basic sample N is 2000, the standard error of the overall MRT for one signal is 2.37 msec, that of a two step history is 3.35 and that of a four step one is 6.71 msec. Many of the differences in the data are under 10 msec, and so they must be viewed with some skepticism.
The first two studies in which the data were partitioned according to their history of stimuli were Laming (1968, Ch. 8) and Remington (1969). The latter experiment involved five subjects who responded by key presses to one of two lights. A warning light and a 1-sec foreperiod were used prior to the signal presentation. Subjects were asked to respond as rapidly as possible, consistent with an error rate of less than 5%. The actual overall level was about 1%. The average interstimulus interval was about 4 sec. (I begin to report this time because, as will soon be evident, it is an important independent variable for sequential effects.) There were two experimental conditions: one with equally likely presentations (50:50) and the other with a ratio of 70:30. Some data were rejected as not having achieved stability. What remained were 800 observations per subject in the 50:50 condition and 1000 in the 70:30. These were partitioned into histories up to five trials back, and they are shown in Figures 6.21 and 6.22.
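The two pieces of arithmetic above can be checked with a short sketch. It is only an illustration, not part of any of the cited analyses; the function names are hypothetical. The first function evaluates the stationarity constraint p = Pp + (1 - P)(1 - p), which follows from the law of total probability; the second assumes a purely random two-signal schedule, so that a history of length m (counting the current signal) is expected to receive N/2^m trials.

```python
import math


def stationarity_residual(p, P):
    """Return p - [P*p + (1 - P)*(1 - p)].

    A value of zero means the presentation probability p is compatible with
    a repetition probability P that is the same for both signals.
    """
    return p - (P * p + (1 - P) * (1 - p))


def history_standard_error(sigma, n_total, m):
    """Standard error of a mean RT estimated from one history of length m.

    Assumes a purely random two-signal schedule, so each of the 2**m
    histories is expected to contain n_total / 2**m trials.
    """
    expected_n = n_total / 2 ** m
    return sigma / math.sqrt(expected_n)


if __name__ == "__main__":
    # The text's example: sigma = 75 msec, N = 2000.
    for m in (1, 2, 4):
        print(m, round(history_standard_error(75, 2000, m), 2))
    # Only p = 1/2 (or P = 1) makes the residual vanish.
    print(stationarity_residual(0.5, 0.3), stationarity_residual(0.7, 0.3))
```

With these inputs the standard errors come out as 2.37, 3.35, and 6.71 msec for m = 1, 2, and 4, matching the figures quoted in the text.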
Observe in Figure 6.21 the pronounced repetition effect. Note that in this figure the symbol A denotes the stimulus on the trial under consideration and B the other stimulus, and the past history is read from the current trial on the right and back to the left. Thus, a string of the other stimulus B before and the current A presentation makes for a slow response, the time increasing as the
FIG. 6.21 MRT versus previous stimulus history for two equally likely lights. The sample size was 800 for the signal not partitioned into any previous history and it decreases to 800/2^n for a history of length n. The symbol A designates the signal being responded to and B the other signal. The stimulus history is read from right to left. [Figure 2 of Remington (1969); copyright 1969; reprinted by permission.]
number of Bs increases; a string of As before an A makes for a rapid response. The pattern for the 70:30 data is similar with, of course, the less frequent signal distinctly slower than the more frequent one. An experiment by Falmagne, Cohen, and Dwivedi (1975) in essence replicated these results of Remington, but in some ways is more striking. The span of times from the slowest to the fastest as a function of presentation pattern is some two to five times as large as in Remington's data. This may be because a fairly brief response-stimulus interval (200 msec) was used by Falmagne et al. as compared with Remington's average of four seconds. Since the data are not qualitatively different and they will be described in
some detail relative to a sequential model (Section 7.6.4), I do not present them here. As was mentioned earlier, Laming's experiments involved the identification of two white bars on a black background presented tachistoscopically. His Experiment 3, which was run without automation, had a 2500-msec RSI. Twenty-four subjects participated in five series of 200 trials
FIG. 6.22 MRT versus previous stimulus history with 70:30 presentation probability. Here 1 denotes the more probable signal and 2 the less probable one. [Figure 3 of Remington (1969); copyright 1969; reproduced by permission of the publisher.]
FIG. 6.23 MRT versus previous stimulus history when the signals were equally likely line lengths with the intertrial interval as a parameter. The code used for the history is explained in the text. The sample size was 5000/2^n per stimulus per history of length n (see discussion in text). [Figure 8.5 of Laming (1968); copyright 1968; reproduced by permission.]
each, for a total of 24,000 observations. The series differed in both instructions and a point scheme aimed at influencing the speed-accuracy tradeoff. They were such that the error rate was intentionally considerably larger than Remington's. For example, when an alternation followed a string of six or seven identical presentations, it was approximately 20%. As there was little difference among the series, they were pooled for the purposes of sequential analyses. Laming also found that repetitions decreased the MRT and they decreased the errors made. In contrast to Remington, the MRT to an alternation following a string was largely independent of the length of the string, but the error rate increased monotonically with the length. His Experiment 5 was automated, which permitted use of the intertrial interval as an experimental variable. The same general experiment was run with intertrial intervals of 1, 8, 64, 512, and 4096 msec, arranged in a Latin
square design. Each of 25 subjects was run for 100 trials at each ITI, yielding 12,500 observations. The MRT data for selected histories are shown in Figure 6.23 and the corresponding error data in Figure 6.24. The code is read from left to right in time up to the trial before the signal in question. A 0 means that the signal in that position (trial) was the same as the one for which the data are being plotted, and a 1 means that it was the other signal. Thus, for example, 0100 arises from either of the sequences abaaa or babbb on trials n - 4, n - 3, n - 2, n - 1, and n. It is quite clear that the shorter the ITI, the slower the response and the more likely an error, especially after particular histories. Laming (1968, p. 109) summarized the results as follows:
The sequential analyses of Experiments 1, 2, and 3 suggested that if the subject experiences a run of one signal or an alternating sequence, he expects those patterns to continue. On this basis the subjective expectation of ... the signal actually presented ... increases from left to right [in the left panels of Figures 6.23 and 6.24] and from right to left [in the right panels]. When the intertrial interval is long the mean reaction times and proportions of errors behave as one would expect; they both decrease as the subjective expectation of [the signal] increases. But when the intertrial interval is short they behave differently: after a run of either kind of signal the mean reaction times and proportion of errors all decrease, while after an alternating sequence they increase greatly, irrespective of which signal might have been most expected.
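The 0/1 history code described above is easy to misread, so a minimal sketch may help. It is an illustration only; the function names and the data layout (a sequence of signal labels and a parallel list of reaction times) are assumptions, not Laming's procedure. A position is coded 0 when its signal matches the signal on the current trial and 1 otherwise, read left to right in time up to the trial before the current one.

```python
from collections import defaultdict


def history_code(signals, n, depth):
    """Code the `depth` trials preceding trial n (0-based index).

    '0' marks a trial whose signal equals the signal on trial n,
    '1' marks a trial with the other signal; oldest trial first.
    """
    current = signals[n]
    return "".join("0" if s == current else "1" for s in signals[n - depth:n])


def mean_rt_by_history(signals, rts, depth):
    """Pool trials by their history code and return the mean RT per code."""
    cells = defaultdict(list)
    for n in range(depth, len(signals)):
        cells[history_code(signals, n, depth)].append(rts[n])
    return {code: sum(v) / len(v) for code, v in cells.items()}


if __name__ == "__main__":
    # Both sequences from the text map onto the same code.
    print(history_code("abaaa", 4, 4), history_code("babbb", 4, 4))
    # Toy data, one step back: a repetition ('0') and an alternation ('1').
    print(mean_rt_by_history("aab", [400, 380, 420], 1))
```

Because the selection uses only the signals before trial n, it respects the requirement that the partition depend on events prior to the trial in question and not on the data from that trial.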
Later Kirby (1976b), apparently unaware of Laming's work, ran a study in which he manipulated the interval between a response and the next presentation, the RSI. His data presentation was the same as Remington's except that he separated it into first and second halves. These are shown in Figure
FIG. 6.24 The response proportions corresponding to Figure 6.23. [Figure 8.6 of Laming (1968); copyright 1968; reprinted by permission.]
FIG. 6.25 Effect of response-stimulus interval (RSI) and experience on MRT versus previous stimulus history. The stimulus history code is as in Figures 6.21 and 6.22. The sample is 100 trials per run and three runs per panel. [Figure 4.1 of Kirby (1980); copyright 1980; reprinted by permission.]
6.25. The most striking fact can be seen in the first-order sequential effects—namely, that a repetition of a signal, AA in his notation, speeds up the MRT for the 50-msec RSI, but slows it down for the 500- and 2000-msec RSIs. It should be noted, however, that at all RSIs the effect of additional repetitions is to speed the MRT slightly from the one step repetition; however, for any length of history at the two longer RSIs, the fastest time arises with pure alternation. Certain inconsistencies exist between Laming and Kirby's data, and one wonders what might account for them. Two notable differences in the studies are the sample sizes (12,500 versus 3,600) and the fact that Laming included all responses whereas Kirby discarded error responses (less than 4.5%). The first means that for the smaller sample size, it is entirely possible that more of the orderings are inverted due to sampling variability, and so not all differences can be assumed to be real. The second raises the possibility that the experiments were run at different speed-accuracy tradeoffs and that this affects the sequential pattern in some way. Another fact, much emphasized by Green et al. (1983) as probably contributing to instability in all of these experiments, is the use of many, relatively inexperienced subjects for relatively few trials each. In their work, which I discuss more fully in Section 8.4, only three subjects were used, but each was practiced for at least 3500 trials and was run for 21,600 trials. There was evidence of changes in MRT during at least the first 7200 trials of the experiment. Another factor they emphasize is their use of random (exponential) foreperiods, in contrast to most other studies that use constant foreperiods (i.e., RSI). No matter which, if any or all, of these factors are relevant to the different results, the fact is that considerable uncertainty obtains about the basic facts. 
That the RSI affects in some manner the significance of alternations and repetitions had been noted earlier. The point at which this change from a repetition to an alternation effect takes place appears to be approximately half a second. Thus, repetition effects have been found for RSIs of less than approximately half a second by Bertelson (1961, 1963), Bertelson and Renkin (1966), Hale (1967), Kornblum (1967), Hale (1969a), Eichelman (1970), Kirby (1976b) and alternation effects with intervals greater than half a second by Williams (1966), Hale (1967), Moss, Engel, and Faberman (1967), Kirby (1972, 1976b). At intervals of, or close to, half a second, repetition, alternation, and nonsignificant sequential effects have been reported (e.g., Bertelson, 1961; Hale, 1967; Schvaneveldt and Chase, 1969; Eichelman, 1970; Kirby, 1976b). (Kirby, 1980, p. 132)
And, of course, we know from Section 5.4 that a significant interaction exists even in simple reaction times when two signals are separated by less than 300 msec. So far as I know, no attempt has been made to use those ideas in the study of sequential effects in choice paradigms or to place both sets of phenomena in a common framework.
Kirby went on to point out that there are at least four anomalous studies in which a repetition effect was in evidence at long RSI: Bertelson and Renkin (1966), Entus and Bindra (1970), Remington (1969), and Hannes (1968). To this list, we must add Laming's (1968) data. As I just remarked, we do not know the source of these differences, but it is interesting that in a second experiment Kirby (1976) was able to produce either repetition or alternation effects at long RSI by instructing the subjects to attend to repetitions or alternations; whereas, at short RSI the instructions had little effect, with a repetition effect occurring under both instructions.
6.6.2 Facilitation and Expectancy
The data make clear that at least the previous signal and probably a considerable string of previous signals affect the MRT. Moreover, the nature of that effect differs depending upon the speed with which new stimuli occur. It seems likely, therefore, that two mechanisms are operative: one having a brief temporal span and the other a much longer one. In terms of the types of mechanisms we have talked about, I suspect the brief one is part of the sensory mechanism and the longer lasting one involves memory phenomena that last some seconds and are a part of the decision process. In this literature, the relevant sensory mechanism is spoken of as "automatic facilitation" (Kirby, 1976; also "intertrial phenomenon" by Bertelson, 1961, 1963, and "automatic after effect" by Vervaeck and Boer, 1980) and the decision one as a "strategy" (or "subjective expectancy" or "anticipation"). And, as we shall see below, there is reason to suspect that there may be at least two distinct strategy mechanisms involved. The facilitation mechanism is thought to be of one of two types. The first is that the signal leaves some sort of sensory trace that decays to a negligible level in about 750 msec. Sperling (1960) used masking techniques to provide evidence for such a trace. When a second signal occurs in less time than that, the traces are "superimposed." If the two signals differ, there is little gain and perhaps some interference in the superimposed representation. If they are the same, however, the residual trace somehow facilitates the next presentation of the signal. This could involve either some sort of direct addition of the old to the new representation, making it stronger than it would otherwise be and thereby allowing the identification to proceed more rapidly than usual, or it could entail some sort of priming of a signal coding mechanism, for example, by influencing the order in which certain features are examined. In either event, a repetition effect results.
The facts that MRTs in a choice situation run from 300 to 400 msec with signals of 100-msec duration and that the repetition effect gives way to an alternation one at about an RSI of 500 msec suggest that the trace has largely disappeared after 700 to 800 msec. The other facilitation mechanism is somewhat less clearly formulated. It supposes that the effect of a repeated signal is to bypass some of the signal
processing that is normally involved. Since exactly what is bypassed is unclear, it is not evident how to distinguish facilitation from trace strengthening. If a sensory-perceptual facilitation mechanism were the whole story, then since it is entirely driven by the stimulus schedule we should not see any impact of previous experience or instructions on it. Kirby (1976) tested this idea in his third experiment, again using lights as signals. There were six conditions, half of which were at an RSI of 1 msec and the others at 2000 msec. Within each condition, the last third of the trials were run at a 50:50 ratio of repetitions and alternations. The first two-thirds were run at three ratios: 70:30, 50:50, and 30:70. He found that for the longer RSI, a repetition effect in the 70:30 condition and an alternation effect in the 30:70 condition persisted into the last third of the data. For the short RSI, the effects did not persist. These data are consistent with the idea that for short RSIs the sequential effects are stimulus determined, but for the long ones something other than the stimulus pattern affects the behavior. So one considers possible decision strategies. The fact that faster responses occur with alternations suggests that the subjects in some sense anticipate the occurrence of alternations. This is reminiscent of the famed negative recency effect uncovered in probability learning experiments (Jarvik, 1951). It appears that many people have a powerful tendency to act as if a local law of averages exists—that is, as if there were a force altering conditional probabilities in a random sequence—so as to keep the local proportions very close to 50:50. Such a rule or belief would lead to an exaggerated expectation for alternations. This idea that the subject is predicting, albeit incorrectly, which signal will be presented has led to a series of attempts to have the subject make the predictions overt and to see how MRT depends on the prediction. 
The major idea is that if we single out successive pairs of correct predictions, then the sequential effects on those trials should vanish. This literature was discussed in detail by Kirby (1980, pp. 143-144), who concluded that there is just too much evidence that the act of predicting directly affects the response times, and so overt prediction fails to be a useful experimental strategy. The relevant papers are DeKlerk and Eerland (1973), Geller (1975), Geller and Pitz (1970), Geller, Whitman, Wrenn, and Shipley (1971), Hacker and Hinrichs (1974), Hale (1967), Hinrichs and Craft (1971a), Schvaneveldt and Chase (1969), Whitman and Geller (1971a,b; 1972), and Williams (1966). Kirby (1980, pp. 145-148) examined the evidence as to whether the strategy effect is, as he appears to believe, one of preparation and/or expectancy or whether the strategy develops after the signal is presented. Although the argument is protracted, it appears to me that its main thread goes as follows. If the strategy comes into play after signal onset, then shortening the RSI should only accentuate it since it reduces the time for memory to decay. Any effect to the contrary must be due to sensory
facilitation, which as we have seen comes into play with RSIs under half a second. On the other hand, if the strategy is set prior to signal presentation, the shorter the RSI the less time it has to be developed, and so at the shortest times its effect should be negligible. And Kirby contends that the data are more consistent with the latter view. For example, the fact that some well-practiced subjects develop the ability to produce alternation or repetition effects at will, even at short RSIs, he interprets as their becoming more efficient at preparing their strategies. However, these data could just as well be interpreted in terms of a shift from a preparation strategy, which at short RSI does not have time to be effective, to a reactive one that is established after the signal onset. I find the arguments unpersuasive, and I believe that the basic nature of these strategy effects is still an open question. I shall, nonetheless, explore some rather detailed, specific suggestions about them in Section 6.6.5. Vervaeck and Boer (1980) made some important observations in this connection. First, they noted that an expectancy hypothesis predicts not only that the expected signal will be responded to faster than when no expectancy is involved, but that the unexpected one will be responded to more slowly. In contrast, a general facilitation mechanism of any sort predicts that an increase of the facilitation factor leads to faster responses and a decrease to slower responses independent of which signal is presented. At short RSI, Kirby (1976) reported sequential effects that differed according to whether the signal was a repetition or an alternation, whereas Laming (1968, Experiment 5) obtained facilitation that did not depend upon the signal. Vervaeck and Boer pointed out a significant procedural difference: Laming measured his RSI from the onset of the response key, and Kirby measured it from the offset. 
They judged from other data that the typical duration of a key press was from 100 to 200 msec. They repeated both experiments under the same conditions, found a difference of 106 msec between the two observed RSIs, and replicated both Laming's and Kirby's results.
6.6.3 Stimulus-Response Controlled Sequential Effects
Our discussion of sequential effects to this point should seem a bit odd since nothing has been said about the responses except that their MRT is used as the dependent variable. What about previous responses as a source of sequential effects? To my knowledge, this has never been examined with anything like the care that has gone into stimulus effects. In particular, I do not know of any two-choice studies, analogous to those reported in Figures 6.21-6.25, that partition the data according to the past history of responses or, better, the joint past history of signals and responses. Laming (1968, Section 8.4) reports some linear regression results, but no very clear pattern is evident. There are, however, several studies in which the joint past history for one trial back is examined, and I discuss them here. Concerning the sequential effects exhibited jointly by response probability and time, there is
but one study, which is reported in the next section. Within the general psychophysical literature there is a fair amount of data about sequential effects in absolute identification and magnitude estimation, but with many more than two signals. The only data for two-stimulus designs of which I am aware are concerned primarily with the impact of an error on the next trial. Some of these data, particularly those in Rabbitt (1966), are based upon more than two signals, but they are too relevant to postpone until Chapter 10. In his studies, Rabbitt has used a short RSI, often 20 msec and never more than 220 msec. The task was the identification of one of several lights appearing in different positions. His data showed: that errors are faster than correct responses (Section 6.4.3), that the MRT on the trial preceding an error did not differ significantly from the overall MRT, and that the MRT following an error is slower than the overall MRT. The fact that the MRT before an error is not unusually fast suggests that the fastness of the error trials is not part of some overall waxing and waning of the reaction time. Rabbitt and Rogers (1977) and Rabbitt (1969), using Arabic numerals as signals and key presses as responses, showed that the delay following an error was considerably greater when the alternative signal was used (and so to be correct in the two choice situation the response was a repetition of the previously erroneous one) than when the signal was repeated. Laming (1979b), in discussing Rabbitt's work, cited the data from Experiment 5 of Laming (1968) (described earlier) in which he varied the RSI and found appreciable changes in performance. Recall, at long RSI the probability of an error following an error is sharply reduced below the average error rate independent of whether the signal was repeated or alternated; whereas, at short RSI the error probability is reduced for a repeated signal and greatly increased when the signal is alternated. 
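The three comparisons attributed above to Rabbitt (the error trial itself, the trial before it, and the trial after it, each against the overall MRT) can be sketched as follows. This is a hypothetical illustration, not the analysis used in any of the cited papers: the function name and data layout are assumptions, and no correction for stimulus history is attempted.

```python
def rt_around_errors(correct, rts):
    """Mean RT deviation from the overall mean at offsets -1, 0, +1 from an error.

    `correct` is a list of booleans and `rts` a parallel list of reaction
    times. Offset 0 is the error trial itself. A trial adjacent to an error
    is counted even if it is itself an error, which a careful analysis
    might want to exclude.
    """
    overall = sum(rts) / len(rts)
    pooled = {-1: [], 0: [], 1: []}
    for i, ok in enumerate(correct):
        if ok:
            continue
        for offset in pooled:
            j = i + offset
            if 0 <= j < len(rts):
                pooled[offset].append(rts[j])
    return {
        offset: (sum(v) / len(v) - overall) if v else None
        for offset, v in pooled.items()
    }


if __name__ == "__main__":
    # Toy data with a single fast error followed by a slow correct response.
    correct = [True, True, False, True, True]
    rts = [400, 400, 300, 500, 400]
    print(rt_around_errors(correct, rts))
```

In the toy data the error trial is 100 msec faster than the overall mean, the preceding trial does not differ from it, and the following trial is 100 msec slower, which is the qualitative pattern described in the text.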
For Experiments 1, 2, and 3 (where the first two were slight variants in which the presentation probability was varied and in 3 the error rate was varied by instructions) the RSI was long: 2500 msec, 1500 msec, and 2500 msec. Because of the pronounced effects that the previous stimulus history is known to have (Section 6.6.1), Laming corrected for it both in his error probabilities and MRTs using a multiple regression analysis described in Appendix C of Laming (1968). The data are broken up according to whether the error stimulus is repeated or alternated, and so the erroneous response is either alternated or repeated in order to be correct. The data are shown in Figure 6.26 as a function of the number of trials since the preceding error. Error probabilities are presented directly whereas times are presented in terms of deviations from the overall MRT for that stimulus history. We see, first, that error trials are faster by about 50 msec than the overall mean times. Second, immediately following an error the time is slower than average, which is true whether the stimulus is repeated or alternated. The effect is somewhat larger for alternations than repetitions. Third, following an error, the error rate for both types of trials is reduced below average (except for the alternative
FIG. 6.26 For three experiments in which the subject is attempting to identify length, the probability of an error and the deviation of MRT from the overall mean versus the number of trials since an error. The data on the left are when the stimulus in question is the same as the one for which an error was made, and on the right, when they differ. The sample sizes were as follows:
Exp   No. of Conditions   Ss/Cond   Obs/S
1     4                   6         1000
2     4                   6         1000
3     8                   3         1000
4     10                  2         800
5     5                   5         1000
[Figure 1 of Laming (1979b); copyright 1979; reprinted by permission.]
signal of Experiment 3). Fourth, the recovery of MRT to its normal value is comparatively rapid; in the case of repeated signals it is complete in one trial. Fifth, the recovery of error probability to its normal level is, except for Experiment 3, not achieved even after five trials. The fact that the recovery patterns for MRT and error probability are quite different suggests two distinct mechanisms, but this is not a necessary conclusion. For example, Laming has shown how a single mechanism can suffice (Section 6.6.5). First, however, it is desirable to examine the sequential problem empirically from the point of view of the SATF.
6.6.4 Are the Sequential Effects a Speed-Accuracy Tradeoff?
The answer to the question of the title is not obvious from what has been presented. It could be Yes, but equally well it could be No in that the SATF itself is changed as a result of the previous history. Obviously, the answer to the question may very well depend upon the RSI. Swensson (1972b) performed the following relevant experiment. There were five distinct but unmarked horizontal locations on a cathode ray tube,
and a square with one of the two diagonals was shown successively in these locations from left to right, with the choice of the diagonal made at random for each location. The subjects responded by key presses to identify the direction of the diagonal at each presentation. At the end of each sequence of five stimuli, information was fed back on the number of correct responses and the sum of the five response times. Two subjects were run. One major manipulation was the time between each response and the presentation of the next signal in the sequence, the RSI. One at 0 msec was called the immediate serial task (IS) and the other at 1000 msec was called delayed serial (DS). Another manipulation was to shift the emphasis between speed and accuracy between blocks of 50 or 100 trials. In addition, for subject RG an explicit monetary payoff was used to effect the speed-accuracy tradeoff. The total number of trials (each a sequence of five stimuli) run in each condition for each subject varied from 1500 to 3000. The data were pooled over blocks of trials having comparable error rates into four groups. The SATF presented is log odds versus MRT. Because of the design into five presentations per trial, the first question to be raised is whether serial position effects are in evidence. Figure 6.27 shows that a considerable serial position effect exists for the IS condition and is very slight for the DS condition. This is not surprising if memory traces dominate studies with brief RSIs. To study the sequential effects, the data from positions 2 through 5 were combined and then partitioned into four categories according to whether the response (not the signal) in question is a repetition or alternation (coded in
FIG. 6.27 SATF for one of two subjects for each serial position of a sequence of five presentations on each of which the subject was to identify orientation of a line. The IS condition is with zero RSI and DS with a 1000-msec RSI. Sample sizes varied from 1500 to 3500 per condition. The bars show ±1 standard deviation. [Figure 3 of Swensson (1972b); copyright 1972; reprinted by permission.]
FIG. 6.28 SATF sequential effects for two subjects in the zero RSI condition of Figure 6.27. Open symbols are repeated responses and solid ones are alternated ones; circles mean the previous response was correct and triangles mean it was in error. [Figure 5 of Swensson (1972b); copyright 1972; reprinted by permission.]
the figures as open and closed symbols, respectively) and whether the response follows an error or a correct response (coded by circles or triangles, respectively). The data for the IS condition are shown in Figure 6.28. Following a correct response there is little difference between alternations and repetitions, but following an error the performance is rather seriously degraded, the more so for alternations which in the case of RG are exceedingly fast and have approximately a 50% error rate. By contrast, the data from the DS condition, shown in Figure 6.29, show very little effect on the SATF of these categorizations except that repetitions of the response may be slightly poorer than alternations following a correct response for RG and following an error for RS. Once again it is clear that RSI makes an important difference. These data seem to accord with earlier evidence that with short RSI there is some tendency for the subject to attempt to correct the error just made, which leads to an unusually fast and, relative to the next signal, largely random response (Burns, 1971; Rabbitt, 1966, 1967, 1968a, 1968b, 1969). For the long RSI this tendency disappears and to a first approximation the sequential effects appear to be not changes in the SATF, but shifts in the tradeoff on that function.
6.6.5 Two Types of Decision Strategy
Judging by Swensson's data, the most obvious idea to account for the impact of an error with a long RSI is as some sort of adjustment on the SATF, the subject becoming more conservative and so more accurate at the expense of being slower. The difficulty with this view is that it predicts that changes in MRT and error probability should covary, even though we know from Figure 6.26 that during the recovery phase following an error they do not. Laming (1968, pp. 80-82) suggested another possibility. This arose in his discussion of the important classical random walk model (Sections 8.2, 8.3.1, and 8.4.2) which model has as one of its predictions that the distributions of error and of correct responses (same response, different signals) should be identical. As this is contrary to the data, Laming asked if some plausible mechanism would account for the fact that errors are usually faster than correct responses. He suggested that when the subject is under time pressure he or she may err as to the actual onset of the signal. Laming described this tendency as arising from time estimation, but it is just as plausible that the subject adjusts the detection criterion to the point where on a significant
FIG. 6.29 Same plot as Figure 6.28 for the 1000-msec RSI. [Figure 6 of Swensson (1972b); copyright 1972; reprinted by permission.]
fraction of the trials the pre-stimulus background "noise" triggers a detection response in the system. Either way, if a premature cue serves to begin the information accumulation on which a decision is to be based, then the response is both more likely to be in error and to be somewhat faster than it would otherwise be. According to this view, then, there are two criteria at the subjects' disposal. The one has to do with the point at which information for the decision begins to be accumulated—either the criterion for detection or the setting of the parameters of the time estimation process. The other is the criterion for making a response once information begins to be collected. Laming made the interesting observation that the differential recovery of MRT and error probability following an error can, in fact, be accounted for by just the anticipation mechanism grafted onto a standard decision model (the SPRT model of Section 8.3.1). He showed that if the subject starts to accumulate information well in advance of signal presentation, the error rate is substantially increased from what it would have been had accumulation begun exactly at signal presentation. That result is not surprising, but what is surprising is the fact that E(T) is reduced by only a few milliseconds. This arises because the information accumulated prior to the signal does not itself tend to lead to a decision, but rather introduces variability in the starting point of information accumulation at signal onset. He assumed that following an error, the subject becomes really very conservative and begins accumulating information well after signal onset. This tendency both greatly reduces the error probability (on the assumption that signal information is continuing to be available) and delays E(T) by the amount of the delay after signal onset plus the amount of anticipation that was in effect at the time of the error. 
If on the next trial the time at which accumulation is initiated is moved forward, then E(T) is moved forward by the same amount, but the error probability does not change. In fact, the initiation time can shorten until it coincides with signal onset before the error probability begins to rise to its original value. Clearly, this single mechanism is sufficient qualitatively to account for the apparently separate recovery of MRT and error probability. This model illustrates nicely Weatherburn's point that there may well be more than one source of tradeoff between speed and accuracy. The time to begin accumulating information establishes one tradeoff, and the criterion for responding on the basis of the information accumulated establishes a second one. At the present time, we do not know of any reliable way experimentally to evoke just one of them. Some authors in discussing the sequential data have spoken of selective preparation for one response rather than the other (Bertelson, 1961; Falmagne, 1965), and others of nonselective preparation (Alegria, 1975; Bertelson, 1967; Granjon & Reynard, 1977). It is difficult to know for sure what is meant by these concepts, but one possibility is the two sorts of mechanisms just discussed, with the criterion for signal detection being non-selective.
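The two-criterion idea can be made concrete with a small simulation. The sketch below is not Laming's SPRT analysis; it assumes a Gaussian random walk between symmetric boundaries, with arbitrary, hypothetical values for the drift, the boundary, and the number of lead or lag steps (`drift`, `bound`, `lead_steps`). It compares accumulation that starts before signal onset, at onset, and after onset.

```python
import random


def run_trial(rng, drift, bound, lead_steps):
    """Simulate one trial; time is counted in steps from signal onset.

    lead_steps > 0: sampling starts that many steps before onset (noise only).
    lead_steps < 0: sampling starts that many steps after onset.
    Returns (is_correct, time_from_onset), or None if a boundary was reached
    before the signal appeared (an anticipation, excluded in this sketch).
    """
    x = 0.0
    for _ in range(max(lead_steps, 0)):   # pre-signal noise, zero drift
        x += rng.gauss(0.0, 1.0)
        if abs(x) >= bound:
            return None
    t = max(-lead_steps, 0)               # delay before sampling begins
    while abs(x) < bound:                 # post-onset evidence, positive drift
        x += rng.gauss(drift, 1.0)
        t += 1
    return x >= bound, t                  # upper boundary = correct response


def summarize(lead_steps, n_trials=3000, drift=0.1, bound=10.0, seed=1):
    """Return (error_rate, mean_time_from_onset) over non-anticipation trials."""
    rng = random.Random(seed)
    errors = total_time = n_used = 0
    for _ in range(n_trials):
        outcome = run_trial(rng, drift, bound, lead_steps)
        if outcome is None:
            continue
        is_correct, t = outcome
        n_used += 1
        errors += not is_correct
        total_time += t
    return errors / n_used, total_time / n_used


# Hypothetical conditions: start 20 steps early, on time, and 20 steps late.
results = {label: summarize(lead)
           for label, lead in [("early", 20), ("on time", 0), ("late", -20)]}
for label, (err, mean_t) in results.items():
    print(f"{label:8s} error rate = {err:.3f}, mean steps from onset = {mean_t:.1f}")
```

Under these assumptions the early start raises the error rate, because the noise gathered before onset scatters the starting point, while a late start leaves the error rate at its on-time value and simply adds the delay to the mean time. How much the early start shortens the mean time depends on the parameter values, so no particular figure should be read off this sketch.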
One final point. Laming (1969b) developed an argument, based on his 1968 data, that the same theoretical ideas that account for the sequential patterns may also underlie the so-called signal-probability effect. This effect is the fact that the more probable signal is responded to more rapidly and more accurately than the less probable one. It is clear that the more probable signal will, in general, be preceded by a longer run of repetitions than the less probable one. Thus to the extent that repetitions lead to both speed and accuracy, the effect follows. He worked out this idea in some detail for the SPRT model. We will return to the relation between sequential effects and presentation probability in Section 10.3.1.
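The run-length claim is easy to check numerically. The following sketch assumes two signals, labeled A and B here purely for illustration, presented independently from trial to trial with fixed probabilities; for each presentation it counts how many identical signals immediately precede it.

```python
import random


def mean_preceding_run(prob_a, n_trials=100_000, seed=0):
    """Mean number of identical signals immediately preceding each signal."""
    rng = random.Random(seed)
    total = {"A": 0, "B": 0}
    count = {"A": 0, "B": 0}
    previous, run = None, 0
    for _ in range(n_trials):
        signal = "A" if rng.random() < prob_a else "B"
        # run = length of the unbroken string of this same signal just before now
        run = run + 1 if signal == previous else 0
        total[signal] += run
        count[signal] += 1
        previous = signal
    return {sig: total[sig] / count[sig] for sig in total}


run_lengths = mean_preceding_run(prob_a=0.7)
print(run_lengths)
```

For independent presentations with probability π the expected count is π/(1 − π), so the more probable signal does sit behind longer runs of repetitions on average.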
6.7 CONCLUSIONS
Comparing choice-reaction times with simple-reaction times leaves little doubt that somewhat more is going on. It usually takes 100 msec longer, and sometimes more than that, to respond to the identity of a signal than to its presence. Many believe that most of the additional time has to do with the processing of information required to distinguish among the possible signals. Others believe that some and perhaps much of the time is required to select among the possible responses. This point of view is argued by Sternberg (1969a). To some degree, this distinction can be examined by uncorrelating the number of responses from the number of signals; some of those studies are taken up in Chapters 11 and 12. It is also clear from data that subjects effect important compromises or tradeoffs. Perhaps the best known of these is between the two types of error, which is represented by the ROC curves and has become a standard feature not only of modern psychophysics but of much of cognitive psychology. Some models assume that the mind is able to partition the internal evidence about stimuli into categories corresponding to the responses. Once the experimenter records response times as well as choices, a number of questions arise about how the response times relate to the responses made. One controversial question has been how the time depends upon whether the response is correct or in error. The result tends to be this: for highly discriminable signals responded to under considerable time pressure, errors are faster than the corresponding correct responses, but for signals that are difficult to discriminate and with pressure for accurate responding, the opposite is the case. A few studies fail to conform to this pattern, and there is little work on time pressure coupled with difficult-to-discriminate signals. Another major tradeoff is between speed and accuracy, which is thought to arise when the subject varies the amount of information to be accumulated and processed prior to a response.
There is at least the possibility that this tradeoff is a strategy distinct from another tradeoff, namely, the selection of a criterion to determine the actual response to be made once the information is in hand. In the fast-guess and fixed stopping-rule models they were distinct; in some of the other models studied in later chapters they are not. The last body of data examined in the chapter concerned sequential effects. Here we were led to believe that there may be at least three distinct mechanisms at work, the last of which is probably a manifestation of the speed-accuracy tradeoff just discussed. The first is the possibility of direct sensory interaction between successive signal presentations, which some authors have suggested arises from the superposition of sensory traces when a signal is repeated sufficiently rapidly. The evidence for this came from the different effects that occur as the time from a response to the next signal presentation is varied. But even after eliminating this sensory effect by using long RSIs, there is still a rather complex pattern of sequential effects following an error. After an error the MRT slows and accuracy increases, but on subsequent trials the former returns rapidly (in some cases in one trial) to its overall mean value whereas the error rate increases only slowly over a number of trials. Some have interpreted this as evidence for both a type of non-selective preparation that is not long sustained and a selective one that is. Within the information accrual framework, Laming has suggested that the non-selective one is some mechanism (perhaps time estimation, perhaps a signal detector that is causing anticipations) that often initiates the accrual process prior to the actual signal onset. The other mechanism appears to be some sort of speed-accuracy tradeoff for which there are a number of models (Chapters 7 to 10). Most theories attempt to provide accounts of the ROC and SATF. Relatively little has been done to account in detail for the sequential effects, in part because it leads to messy mathematics and in part because the phenomena to be explained remain somewhat uncertain.
There are a few attempts to wed stochastic learning processes to otherwise static models. Laming has coupled a time estimation model with the random walk one, which was originally designed only to provide an account of the ROC and SATF. I am not aware of any models for the sensory effect found with brief RSIs. I have elected to partition this theoretical work into three broad classes. In Chapter 7 the models assume that the subject can opt, as in the fast-guess model, to be in one of several states, each of which has its characteristic time and error pattern. Chapters 8 and 9 explore information accrual models for two-signal experiments. They differ only in whether we assume that time is quantized independently of the information accrual process or by that process itself. And Chapter 10 deals with identification data and models when there are more than two signals, a topic far less fully developed than the two-signal case.
7 Mixture Models
Beginning with this chapter, we examine a series of models that attempt to formulate what information the mind has about the signals presented and how it makes decisions about what and when to respond. In some earlier chapters, especially Chapter 4, we have already encountered mathematical processes that unfold in time, but we did not explore them in much depth. To study such processes in the detail required to compare them with data, it becomes necessary to be explicit about the mathematical formulations, which technically are stochastic processes. In this chapter, only Markov chains will play a role. In Chapter 8, the major focus is on discrete-time random walks. Chapter 9 rests mainly on diffusion and renewal counting processes in continuous time. To the extent that I can, I will keep the presentation self-contained. As an aid to those having limited familiarity with the fundamental concepts of stochastic processes, some of the basic definitions were listed in Section 1.5. And, of course, there are a number of well-known introductions to stochastic processes; among my standbys are Karlin and Taylor (1975) and Parzen (1962). An advanced treatment is Gihman and Skorohod (1974). The topic of this chapter is the fast-guess model, which we have already explored briefly (Section 6.3.3), and various generalizations of it. First, I take up the general concept of a mixture of distributions and work out some of their general properties. Second, the possibility is explored of formulating sequential effects within the mixture framework. Third, the more specific fast-guess model is examined in some depth and several important predictions are derived from it. Fourth, a blend of fast guesses and a simple model of memory scanning that involves two response states is shown to exhibit something close to the correct sequential effects in both times and errors. And finally quite a number of experiments that have been run are used to evaluate the models. 
The conclusion I shall draw from this is double pronged: two-state models are too simple to account for all aspects of the data, but the idea that subjects may revert to fast guesses when under sufficient time pressure seems well sustained. In fitting any other model to a body of data, it is wise first to check whether the data are contaminated by fast guesses and either to remove those responses before carrying out further analyses or model them as in the promising three-state memory model.
7.1 TWO-STATE MIXTURES
Psychologists as well as those not formally trained in the area generally believe that performance is greatly influenced by the degree to which one is prepared to perform. This preparedness is thought to consist of three distinct aspects: overall attention to the task at hand, focusing upon one or a very few of the possible stimuli, and preparation to carry out particular responses. These factors are easily illustrated in the example of driving under adverse conditions. The very fact that the conditions are adverse (say, mist and fog) is usually enough, in the absence of alcohol, to bring attention fully to the task of driving so that one is unlikely to converse or to recall what was said or played on the radio. Further, one is likely to be particularly focused upon certain possible stimuli, namely, those signaling the greatest danger such as colliding with another vehicle or striking a pedestrian. Stimuli such as road signs are likely to require more response time than normal and certainly more than is required for something that is interpreted as a car or a person. And finally, one is highly prepared to make the braking response rather than something else (e.g., accelerating and passing). One of the early explicit models for choice reaction time attempted to capture at least part of this idea. It, and most other models, take for granted the overall attention to the task at hand, and they attempt to model the focus-preparation aspect. In fact, the framework of the initial model is sufficiently general that it is unclear which of the three aspects (overall attention, focus, or preparation) is actually involved. Falmagne (1965) suggested that for the choice reaction time situation the observer is in one of two states where responses are slower in one than in the other. Various interpretations are possible for the states, but level of motivation to respond rapidly is the most obvious.
The variable to be manipulated experimentally is the probability of being in each state. Let us denote by $p(E)$ the probability of being in the slow state when experimental state $E$ is used. The experimental condition is part of the environment which, except for the conditions being manipulated including the stimulus presented, is presumed to be invariant over trials. Thus, in addition to the stimulus presented, $E$ can refer to the instructions, the probability distribution of presentations, and/or the payoffs, and so on. Denote by $G_1(t)$ the distribution function of response times when in the slow state and by $G_0(t)$ the distribution when in the fast state. On these assumptions, the distribution of observed response times is
$$F(t \mid E) = p(E)\,G_1(t) + [1 - p(E)]\,G_0(t). \tag{7.1}$$
This is a two-state mixture model, an example of which is the fast-guess model. The generalization of the two-state model to multistate mixtures is obvious, but only one, a three-state model, has been developed in detail (Section 7.5). The conclusion will be that at a minimum three states are
needed in general. For a more general discussion of mixture models and their associated statistical theory, see Everitt and Hand (1981). One drawback of the model formulated by Eq. 7.1 is that it fails to say anything at all about the relation between stimuli and responses. The simplest assumption, the one made by Falmagne, is that the stimuli are perfectly discriminable and no errors are made in either the fast or slow states. This assumption is unlikely if one thinks of the two states as different levels of attention, but quite acceptable if they are thought of as levels of response preparation. Under that assumption, Eq. 7.1 may apply to the identification of any finite number of stimuli. The generalization to imperfect performance when there are just two stimuli and two responses is embodied in the fast-guess model of Ollman (1966, 1970) and Yellott (1967, 1971), who assumed that no stimulus discrimination is made in the fast state, whereas some, albeit not necessarily perfect, discrimination is made in the slow state. As was noted earlier, this could arise if the fast responses are based merely on the detection of a signal, whereas the slower ones involve identification as well. This model is discussed in Section 7.4 and is compared with data in Sections 7.6 and 7.7.
7.1.1 Linear Functions
The mixture model of Eq. 7.1 is, for many purposes, usefully re-expressed as
$$F(t \mid E) = p(E)[G_1(t) - G_0(t)] + G_0(t). \tag{7.2}$$
Observe that by differentiating one finds the same mixture property for densities
$$f(t \mid E) = p(E)[g_1(t) - g_0(t)] + g_0(t). \tag{7.3}$$
The density exhibits one of the most characteristic features of mixtures, namely, that if $g_0$ and $g_1$ are unimodal and the modes are well separated, then $f$ is bimodal. Recall, we saw evidence of this in the pigeon discrimination data of Figure 6.4 (Section 6.3.4). Because the expectation operator is linear, a mixture relation holds for the moment generating function (mgf),
$$M(u \mid E) = p(E)[M_1(u) - M_0(u)] + M_0(u), \tag{7.4}$$
where, of course, $M_i$ is the mgf of $G_i$. From this follows immediately that all of the raw moments are mixtures of the corresponding component raw moments:
$$\mu^{(r)}(E) = p(E)[\nu_1^{(r)} - \nu_0^{(r)}] + \nu_0^{(r)}, \tag{7.5}$$
where $\nu_i^{(r)}$ is the $r$th raw moment of $G_i$ and $\mu^{(r)}(E)$ is that of $F(t \mid E)$.
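We see that there is an endless list of linear relations that arise by eliminating $p(E)$ between any two mixture equations. One example is the linear relation between the first and second raw moments, namely,
$$\mu^{(2)}(E) = \alpha\,\mu(E) + \beta, \tag{7.6}$$
where
$$\alpha = \frac{\nu_1^{(2)} - \nu_0^{(2)}}{\nu_1 - \nu_0}, \qquad \beta = \nu_0^{(2)} - \alpha\,\nu_0,$$
with $\mu(E) = \mu^{(1)}(E)$ and $\nu_i = \nu_i^{(1)}$ denoting the means.
Another, slightly less obvious one, is that between the distribution function itself and the mean, namely,
$$F(t \mid E) = \gamma(t)\,\mu(E) + \delta(t), \tag{7.7}$$
where
$$\gamma(t) = \frac{G_1(t) - G_0(t)}{\nu_1 - \nu_0}, \qquad \delta(t) = G_0(t) - \gamma(t)\,\nu_0.$$
This can be plotted for any value of $t$ by manipulating the experimental state represented by $E$. Examples will be provided in Section 7.3.1. Another family of relations arise by taking the difference of any equation for two values of $E$, call them $E$ and $E'$. For example, from Eq. 7.2 we see that
$$F(t \mid E) - F(t \mid E') = [p(E) - p(E')][G_1(t) - G_0(t)]. \tag{7.8}$$
And if we run a third distinct condition, $E''$, it follows by division that
$$\frac{F(t \mid E) - F(t \mid E')}{F(t \mid E) - F(t \mid E'')} = \frac{p(E) - p(E')}{p(E) - p(E'')}, \tag{7.9}$$
which predicts an invariance in the data, namely, that the left side is independent of $t$. Thomas (1969) noted that one can always express $F(t \mid E)$ as a linear mixture of the two most extreme observable distributions. To show this, suppose $p(m) < p(E) < p(M)$. Then from Eq. 7.1 with $E = m$ and $E = M$, we solve explicitly for $G_0(t)$ and $G_1(t)$ as functions of $F(t \mid m)$, $F(t \mid M)$, $p(m)$, and $p(M)$. Substitute into Eq. 7.1 with general $E$ to obtain
$$F(t \mid E) = \theta\,F(t \mid M) + (1 - \theta)\,F(t \mid m), \tag{7.10}$$
where
$$\theta = \frac{p(E) - p(m)}{p(M) - p(m)}.$$
All of these relations provide partial ways of testing the model and of estimating parameters.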
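As a numerical illustration of these linear relations, the sketch below assumes exponential component distributions with hypothetical means of 250 and 450 msec (the names `RATE_FAST` and `RATE_SLOW` are mine, not the source's) and varies the mixing probability. If the two-state mixture holds, the second raw moment and the value of the distribution function at a fixed t should each fall on a straight line when plotted against the mean, and an intermediate condition should be a weighted average of the two extreme ones.

```python
import math

RATE_FAST = 1 / 250.0   # hypothetical fast state: exponential, mean 250 msec
RATE_SLOW = 1 / 450.0   # hypothetical slow state: exponential, mean 450 msec


def mixture_cdf(t, p):
    """F(t | E) for a mixture with probability p of the slow state."""
    g0 = 1 - math.exp(-RATE_FAST * t)
    g1 = 1 - math.exp(-RATE_SLOW * t)
    return p * (g1 - g0) + g0


def mixture_raw_moment(r, p):
    """r-th raw moment of the mixture (an exponential has r!/rate**r)."""
    nu0 = math.factorial(r) / RATE_FAST ** r
    nu1 = math.factorial(r) / RATE_SLOW ** r
    return p * (nu1 - nu0) + nu0


def slopes(xs, ys):
    """Slopes of the segments joining successive (x, y) points."""
    return [(y2 - y1) / (x2 - x1)
            for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])]


ps = [0.1, 0.3, 0.6, 0.9]                        # hypothetical values of p(E)
means = [mixture_raw_moment(1, p) for p in ps]
seconds = [mixture_raw_moment(2, p) for p in ps]
cdf_at_400 = [mixture_cdf(400.0, p) for p in ps]

moment_slopes = slopes(means, seconds)           # constant if linear
cdf_slopes = slopes(means, cdf_at_400)           # constant if linear

# An intermediate condition written as a mixture of the two extreme ones.
theta = (0.3 - 0.1) / (0.9 - 0.1)
from_extremes = theta * mixture_cdf(400.0, 0.9) + (1 - theta) * mixture_cdf(400.0, 0.1)

print(moment_slopes, cdf_slopes, from_extremes)
```

With real data the component distributions are unknown, so only the straightness of the plotted points can be judged; the sketch merely confirms that the algebra holds for one assumed pair of components.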
7.1.2 Fixed Points
In any of the linear expressions involving functions—for example, Eqs. 7.2, 7.3, and 7.4—the following fixed-point property holds (Falmagne, 1968),
which I will illustrate for Eq. 7.3. Suppose that for some $t^*$, $g_0(t^*) = g_1(t^*)$, which for densities is surely plausible if they overlap at all. Then for all $E$,
$$f(t^* \mid E) = p(E)[g_1(t^*) - g_0(t^*)] + g_0(t^*) = g_0(t^*), \tag{7.11}$$
that is, all of the densities pass through the same fixed point. Observe that the condition $g_0(t^*) = g_1(t^*)$ fails to hold only when $g_0$ and $g_1$ do not overlap, in which case $f(t \mid E)$ must be a perfectly bimodal density function, a property easily checked in the data. The only difficulty in checking Eq. 7.11 is that one is forced to work with histograms, not density functions, and there will be some uncertainty as to whether the fixed-point property holds. The comparable fixed-point property of the distribution function is rather less revealing because there is no compelling reason to suppose $G_0$ and $G_1$ intersect at any value intermediate between 0 and 1. For example, if they are Gaussian with equal variances, then there will be no intermediate fixed point. The same is true for the mgf in the Gaussian case.
*7.1.3 Statistical Test
Thomas (1969) has provided a statistical test of the hypothesis Eq. 7.10 that derives from the general mixture hypothesis. To my knowledge it has not been systematically applied. The assumed data are samples of observations from each of three distributions, $(t_{h1}, t_{h2}, \ldots, t_{hn(h)})$, $h = E, m, M$. Now select triples consisting of an observation from each set, say $(t_{Ei}, t_{mj}, t_{Mk})$, and rank order them. The ranking is a permutation of (1, 2, 3). Assign to the triple the number 0 if the permutation is effected by an even number of interchanges of adjacent numbers and 1 if the number is odd. Let $Z$ denote that assignment, and let $Z^*$, where $n = \min[n(E), n(m), n(M)]$, denote the average of this statistic over all possible distinct triples:
Thomas established the following facts about Z*, which I do not prove here. First, as n → ∞, (Z* − ½)/Vn approaches the Gaussian distribution with zero mean and unit variance. Second, the variance of Z* is given by
where
Thomas developed a method to estimate 6 and
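A minimal sketch of the statistic described above may help. It assumes that each triple is ordered as (E, m, M), that ties do not occur, and that the average is taken over every combination of one observation from each sample; the function names and the simulated exponential samples are hypothetical. The parity of the ranking is obtained by counting inversions, since the number of adjacent interchanges needed to sort a sequence has the same parity as its inversion count.

```python
import random
from itertools import product


def permutation_parity(triple):
    """Return 0 if the ranking of the triple is an even permutation, 1 if odd."""
    inversions = sum(1
                     for i in range(3)
                     for j in range(i + 1, 3)
                     if triple[i] > triple[j])
    return inversions % 2


def z_star(sample_mid, sample_low, sample_high):
    """Average parity over all triples (one observation from each sample)."""
    values = [permutation_parity(triple)
              for triple in product(sample_mid, sample_low, sample_high)]
    return sum(values) / len(values)


# Hypothetical data: the extreme conditions are pure fast and pure slow
# exponentials, and the intermediate condition is a 50:50 mixture of them.
rng = random.Random(0)
fast = [rng.expovariate(1 / 250) for _ in range(40)]
slow = [rng.expovariate(1 / 450) for _ in range(40)]
mixed = [rng.expovariate(1 / 450) if rng.random() < 0.5
         else rng.expovariate(1 / 250) for _ in range(40)]
print(z_star(mixed, fast, slow))
```

When the intermediate sample really is a mixture of the two extreme distributions, each of its observations is exchangeable with an observation from one of the extreme samples, and exchanging two entries of a triple flips the parity, so the average should lie near one half.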
7.2 A LINEAR OPERATOR MODEL FOR SEQUENTIAL EFFECTS
7.2.1 The Model
Falmagne (1965) suggested a simple learning mechanism to describe the way the mixture parameter $p(E)$ might be selected. It is of such a character as to provide a possible account of the sequential effects seen in much reaction-time data. Later, Falmagne, Cohen, and Dwivedi (1975) formulated a somewhat different Markov chain model within the conceptual framework of memory search, and they compared it in great detail and with considerable success for one subject to sequential data of the type discussed in Section 6.5.1. Although that work belongs conceptually with the memory search ideas of Chapter 11, I elect to discuss it now (Section 7.5) because the ideas are easy to state. It is an important example of a mixture model. Turning to the linear operator approach, suppose there are $k$ stimuli, one of which is presented on each trial, and suppose further that the probability of being in state 1 rather than state 0 depends upon which stimulus is presented. Note that this is a rather surprising assumption. The more obvious one would be that $G_0$ and $G_1$ are affected by the stimulus. Pursuing Falmagne's assumption, denote by $p_n(s)$ the probability that state 1 arises on trial $n$ if stimulus $s$ is presented. The basic mechanism postulated for passing from $p_n(s)$ to $p_{n+1}(s)$ is this. If some stimulus other than $s$, say $s^*$, is presented on trial $n$, then with some fixed probability $a^*$ the system moves into state 0, and with probability $1 - a^*$ it remains in state 1. And if $s$ is presented, there is a fixed probability $a$ that the system moves into state 1, and with probability $1 - a$ no change is made. So, letting $s^*$ denote any stimulus different from $s$,
$$p_{n+1}(s) = \begin{cases} (1 - a)\,p_n(s) + a & \text{if } s \text{ is presented on trial } n, \\ (1 - a^*)\,p_n(s) & \text{if } s^* \text{ is presented on trial } n. \end{cases} \tag{7.12}$$
The mixture distribution on trial $n + 1$ when $s$ is presented is obviously
$$F_{n+1}(t \mid s) = p_{n+1}(s)\,G_1(t) + [1 - p_{n+1}(s)]\,G_0(t). \tag{7.13}$$
Substituting Eq. 7.12 into Eq. 7.13 yields
$$F_{n+1}(t \mid s) = \begin{cases} (1 - a)\,F_n(t \mid s) + a\,G_1(t) & \text{if } s \text{ is presented on trial } n, \\ (1 - a^*)\,F_n(t \mid s) + a^*\,G_0(t) & \text{if } s^* \text{ is presented on trial } n. \end{cases} \tag{7.14}$$
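Consider, now, a situation where beginning with trial $n$ and running through trial $n + k - 1$, a sequence of $k$ repetitions of stimulus $s$ occurs, then by a very simple induction we obtain
$$F_{n+k}(t \mid s) = (1 - a)^k F_n(t \mid s) + [1 - (1 - a)^k]\,G_1(t). \tag{7.15}$$
In like manner, a sequence of $k$ stimuli different from $s$ yields
$$F_{n+k}(t \mid s) = (1 - a^*)^k F_n(t \mid s) + [1 - (1 - a^*)^k]\,G_0(t). \tag{7.16}$$
From these equations, very simple expressions for the moments of the reaction times arise; for example, the means are
$$E(T_{n+k} \mid s) = \begin{cases} (1 - a)^k E(T_n \mid s) + [1 - (1 - a)^k]\,\nu_1 & \text{after } k \text{ repetitions of } s, \\ (1 - a^*)^k E(T_n \mid s) + [1 - (1 - a^*)^k]\,\nu_0 & \text{after } k \text{ stimuli different from } s. \end{cases} \tag{7.17}$$
Clearly, a strong sequential structure is imposed on the mean reaction-time data, which can be predicted once the parameters $\nu_0$, $\nu_1$, $a$, and $a^*$ are estimated. It is evident that from Eqs. 7.12 and 7.13 an explicit equation for $E(T_{n+k} \mid s)$ can be calculated for any particular sequence of stimuli beginning with trial $n$; I will not, however, write these down.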
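The run formulas can be checked by iterating the one-trial update directly. This sketch assumes the update rule implied by the verbal description above (a move toward state 1 at rate `a` when s is presented, and toward state 0 at rate `a_star` otherwise); the parameter values and the state means `nu0` and `nu1` are hypothetical.

```python
def update(p, s_presented, a, a_star):
    """One-trial linear operator on the probability of state 1 for stimulus s."""
    if s_presented:
        return (1 - a) * p + a
    return (1 - a_star) * p


def after_repetitions(p, k, a):
    """Closed form after k successive presentations of s."""
    return 1 - (1 - a) ** k * (1 - p)


def after_other_stimuli(p, k, a_star):
    """Closed form after k successive presentations of stimuli other than s."""
    return (1 - a_star) ** k * p


def mean_rt(p, nu0, nu1):
    """Mean reaction time of the two-state mixture with P(state 1) = p."""
    return p * nu1 + (1 - p) * nu0


# Hypothetical parameter values.
p_start, a, a_star, k = 0.2, 0.3, 0.1, 5
nu0, nu1 = 450.0, 300.0

p_rep = p_other = p_start
for _ in range(k):
    p_rep = update(p_rep, True, a, a_star)
    p_other = update(p_other, False, a, a_star)

print(p_rep, after_repetitions(p_start, k, a))
print(p_other, after_other_stimuli(p_start, k, a_star))
print(mean_rt(p_rep, nu0, nu1), mean_rt(p_other, nu0, nu1))
```

Because the mean is linear in the state probability, the same operators act on the mean reaction times, which is what makes the one-step plots of successive means linear.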
7.2.2 Parameter Estimation
Observe from Eq. 7.17 that the one-step transitions of mean reaction times are given by
$$E(T_{n+1} \mid s) = \begin{cases} (1 - a)\,E(T_n \mid s) + a\,\nu_1 & \text{if } s \text{ is presented on trial } n, \\ (1 - a^*)\,E(T_n \mid s) + a^*\,\nu_0 & \text{if } s^* \text{ is presented on trial } n. \end{cases} \tag{7.18}$$
In principle, then, if we have estimates of the mean times on successive trials, then the plot of one against the other should be linear. Thus $1 - a$ and $1 - a^*$ could be estimated as the slopes and $a\nu_1$ and $a^*\nu_0$ as the intercepts. The problem in carrying this out is that $E(T_n \mid s)$ is not an observable. So the task is to try to transform this scheme into one that does involve observables. The major idea for doing so rests on the process settling down to a stationary stochastic process after a sufficiently long time, in which case the transitions become independent of $n$. And if that is so, then one can (with care) average over trials and thereby estimate mean reaction times. There are two problems to consider: when is the process asymptotically stationary and what averaging should be done? For the first, it is enough to give a sufficient condition. Consider the class of experiments in which the probability of presenting signal $s$ is a constant, $\pi(s)$, that is independent of the trial, where $\sum_s \pi(s) = 1$. Consider the state transitions for signal $s$. Suppose that on trial $n$ the system is in state 1, so
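$p_n(s) = 1$, then using Eq. 7.12 we see that
$$p_{n+1}(s) = \begin{cases} 1 & \text{with probability } \pi(s), \\ 1 - a^* & \text{with probability } 1 - \pi(s). \end{cases} \tag{7.19}$$
And if the system is in state 0, so $p_n(s) = 0$, then
$$p_{n+1}(s) = \begin{cases} a & \text{with probability } \pi(s), \\ 0 & \text{with probability } 1 - \pi(s). \end{cases} \tag{7.20}$$
So the transition matrix among the states is trial independent, and that process forms a Markov chain (Section 1.5.2). It is a well-known property of Markov chains that there is an asymptotic distribution of states that is easily determined from Eqs. 7.19 and 7.20. Let $p(s)$ be the asymptotic probability of state 1 for signal $s$, then
$$p(s) = p(s)\{\pi(s) + [1 - \pi(s)](1 - a^*)\} + [1 - p(s)]\,\pi(s)\,a.$$
Collecting terms and solving,
$$p(s) = \frac{\pi(s)\,a}{\pi(s)\,a + [1 - \pi(s)]\,a^*}. \tag{7.21}$$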
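The asymptotic probability can also be checked numerically. The sketch assumes the two-state chain implied by the description above, in which the probability of moving from state 0 to state 1 is π(s)a and the probability of leaving state 1 is [1 − π(s)]a*, and it compares repeated application of that transition rule with the closed-form solution of the stationarity equation. All numerical values are hypothetical.

```python
def stationary_probability(pi_s, a, a_star):
    """Closed-form asymptotic probability of state 1 for signal s."""
    return pi_s * a / (pi_s * a + (1 - pi_s) * a_star)


def iterate_chain(pi_s, a, a_star, p_start=0.0, n_steps=500):
    """Apply the trial-independent transition probabilities n_steps times."""
    stay_in_1 = pi_s + (1 - pi_s) * (1 - a_star)   # state 1 -> state 1
    move_to_1 = pi_s * a                            # state 0 -> state 1
    p = p_start
    for _ in range(n_steps):
        p = p * stay_in_1 + (1 - p) * move_to_1
    return p


# Hypothetical values: presentation probability 0.4, a = 0.3, a* = 0.2.
limit = iterate_chain(0.4, 0.3, 0.2)
closed_form = stationary_probability(0.4, 0.3, 0.2)
print(limit, closed_form)

# Doubling both a and a* leaves the asymptote unchanged: only a/a* matters.
print(stationary_probability(0.4, 0.6, 0.4))
```

The second print illustrates why the asymptotic probabilities alone cannot separate the two learning rates.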
So the condition that the stimulus presentation schedule is random, independent of trial number, is sufficient to lead to an asymptotically stationary process. Let us turn, therefore, to the issue of averaging in order to use Eq. 7.18. The key problem is that while the process is overall stationary, there are all sorts of local run properties that must be taken into account. So we estimate a number of different $E(T_n \mid s)$ terms depending upon the length of the run leading up to the presentation of $s$ on trial $n$: $s^*s$, $s^*ss$, $s^*sss$, $s^*ssss$, and so on, where $s^*$ denotes any stimulus different from $s$. These means are all kept distinct. Now, what do we use to estimate $E(T_{n+1} \mid s)$? If we are in the situation where stimulus $s$ was presented on trial $n$, then the answer is simple: to the mean time estimated from a run of $k$ $s$'s, associate the mean estimated from a run of $k + 1$ $s$'s. The problem is trickier when we deal with the second half of Eq. 7.18 in which the condition is that stimulus $s$ was not presented on trial $n$. Then the relevant past history is of the form $s^* s \cdots s\, s^*$, and so $E(T_{n+1} \mid s)$ is estimated over all histories of this type, which were called "intervals" by Falmagne, and it is compared with the $E(T_n \mid s)$ estimated from $s^* s \cdots s\, s$, which history was called a "repetition." Each stimulus and each value of $k$ yields a pair of mean times for each half of Eq. 7.18, and if they exhibit sufficient range, then a way to estimate the parameters is provided. Obviously when $\pi(s)$ is small, runs rapidly become very scarce with increasing $k$. A second method for estimating the parameters arises from the expression for the asymptotic probabilities, Eq. 7.21. From Eq. 7.14,
FIG. 7.1 Second raw moment versus MRT from a six-choice digit experiment with nonuniform presentation probabilities (Falmagne, 1965). There were 1400 trials per subject and seven subjects. According to the two-state mixture model, this should be linear (Eq. 7.6).
Let $n \to \infty$ and substitute from Eq. 7.21,
Observe that only $a/a^*$ is identifiable, not $a$ and $a^*$ separately. The parameter estimates can be chosen so as to minimize the mean square error for the times over the stimuli.
7.3 DATA WITH FEW RESPONSE ERRORS
7.3.1 Linear Functions
Falmagne (1965) reported a study in which each of seven subjects responded as rapidly as possible, subject to maintaining near error-free performance, as to which of six digits appeared on a display. The instructions were successful in holding down the error rate to about 3%. In the data analysis, all error trials were discarded. Responses were carried out by three fingers of each hand. Considerable care was taken to counterbalance the design, and the data were averaged over subjects, digits, and fingers. The major manipulation was the likelihood of the digits, which appeared with probabilities 0.01, 0.03, 0.06, 0.10, 0.24, and 0.56. The data were analyzed from three points of view: linear functions (in this subsection), fixed points in the density of reaction times (Section 7.3.2), and sequential effects (Section 7.3.3). Figure 7.1 shows the relation between the first two raw moments, Eq. 7.6;
FIG. 7.2 For the same experiment as Figure 7.1, estimated F(t) as a function of MRT, with t a parameter. By Eq. 7.7 this should be linear in the two-state mixture model.
FIG. 7.3 This is the same plot as Figure 7.2 for data of Lupker and Theios (1977), which I adapted from an unpublished manuscript of D. Noreen.
there appears to be a very slight departure from linearity. Figure 7.2 shows the distribution function $F(t \mid s)$ for several values of s against the mean reaction time (Eq. 7.7). Quite obviously these curves are not linear, and so the model surely does not fit these data terribly well, although as we shall see in Section 7.3.3 it gives a partial account of the sequential effects. Lupker and Theios (1977) conducted a similar experiment using four digits and presentation probabilities of 0.10, 0.20, 0.30, and 0.40 with the results shown in Figure 7.3. Superficially, these data appear rather more linear, but in relation to the Falmagne data they correspond only to the 100-msec range from 400 to 500 msec, where his data are also fairly linear.
7.3.2 Fixed Points
Falmagne (1968) first pointed out to psychologists the striking fixed-point property of the mixture model, Eq. 7.11, and he plotted the distributions for his 1965 experiment. These are shown in Figure 7.4, and quite obviously they do not exhibit the fixed-point property since the top two histograms intersect at the dotted line and that line does not go through the point of intersection of the top histogram with any of the others. Lupker and Theios (1975) ran both a four- and six-digit version of the experiment with unequal presentation probabilities. The design differed, however, in having a key stimulus that was selected to be presented with probability $\pi$, and the remaining digits were equally likely with a total probability of $1 - \pi$. The values of $\pi$ used were 0.10, 0.25, 0.40, 0.55, and 0.70. The key stimulus for each subject was the same throughout the conditions, but differed among subjects. In the four-digit experiment there
FIG. 7.4 Response-time histograms from the digit study of Falmagne (1965) in which the total sample size is 9800. In the two-state mixture model, the densities underlying these histograms should have a fixed point. [Figure 1 of Falmagne (1968); copyright 1968; reprinted by permission.]
FIG. 7.5 For a four-digit experiment similar to that of Figure 7.4, the response-time histograms for several probability distributions. The distributions are based on 608, 1620, 2632, 3644, and 4856 observations with increasing $\pi$. [Figure 1 of Lupker and Theios (1975); copyright 1975; reprinted by permission.]
were a total of 20 subjects, five per key digit. In the six-digit case, the same pattern was used, leading to 30 subjects. Figures 7.5 and 7.6 show the estimated density functions for the four- and six-digit experiments, respectively. The fixed-point property that all histograms should meet at a common point is well sustained in the four-digit case, and less well so in the six-digit case, although it does not fail nearly so badly as in the Falmagne data. Lupker and Theios suggested that the extremely small presentation probabilities used by Falmagne were the source of the problem because, they
FIG. 7.6 The same as Figure 7.5 for a six-digit experiment. The sample sizes, with increasing $\pi$, were 912, 2280, 3648, 5016, and 6484. [Figure 2 of Lupker and Theios (1975); copyright 1975; reprinted by permission.]
FIG. 7.7 For Falmagne's (1965) digit experiment, MRT for repetition sequences of length n + 1 versus MRT for repetition sequences of length n, where a repetition sequence is defined in the text (Section 7.2.2). [Figure 3 of Falmagne (1965); copyright 1965; reprinted by permission.]
conjectured, the subjects treated the three least probable stimuli in a way quite different from the others. In any event, it seems clear that the two-state mixture model is not sufficiently general to account for even these simple experiments; nonetheless, in some cases the data do in fact exhibit the fixed-point property.
7.3.3 Estimation of Parameters and Fitting Sequential Data
There are two major ways to estimate the parameters. One is to use the asymptotic mixing probability given by Eq. 7.21 in the expression for the mean reaction time, Eq. 7.5. This yields estimates of $\nu_0$, $\nu_1$, and $\alpha/\alpha^*$. The other is to examine the one-step sequential effects on the reaction times, as was outlined at the end of Section 7.2.2. Figures 7.7 and 7.8 show the repetition and interval scatter diagrams for Falmagne's (1965) data, the former yielding $\alpha$ and $\nu_0$, the latter $\alpha^*$ and $\nu_1$. The results of these estimation schemes are shown in Table 7.1. The correspondence is good.
FIG. 7.8 Same as Figure 7.7 but for interval sequences (see Section 7.2.2). [Figure 4 of Falmagne (1965); copyright 1965; reprinted by permission.]
TABLE 7.1. Parameters estimated by Falmagne (1965)

Source            α       α*      α*/α    ν_0    ν_1
MRT                               .210    295    611
Sequential MRT    .328    .076    .232    300    593

Using those estimated from the one-step sequential data, predictions can be made for repetitions and intervals of various lengths, and these are shown in Table 7.2. There are, as one might expect by now, substantial discrepancies for the low probability presentations where, for example, the total sample of presentations at the 0.01 probability over the seven subjects was only 378. Splitting this up into sequences is bound to produce highly unstable estimates. Remington (1969), whose experiment and data were presented in Section 6.5.1, interpreted his results as inconsistent with a two-state mixture model. Theios and Smith (1972) pointed out that this is incorrect. They fit the model to the data in two ways under the restriction that $\alpha = \alpha^*$. The value of $\alpha$ can be assumed either to be independent of the experimental manipulation of presentation probability or to depend on it. Under the former assumption, the fit to all of the sequential MRTs had an average absolute error of 8.6 msec, and under the latter assumption the error was reduced to 2.8 msec. The numbers resulting from the latter assumption, shown in Table 7.3, are impressive.
7.4 THE FAST-GUESS ACCOUNT OF ERRORS
7.4.1 The Simple Fast-Guess Model
Ollman (1966) proposed a generalization of Falmagne's mixture model that allows for errors in responses. He stated it only for means, and Yellott (1967, 1971) generalized it to distributional form and pointed out some problems with Ollman's test of the model. More detailed tests of the model are included in Ollman's (1970) dissertation and in Yellott (1971). As we recall from Section 6.3.3, the basic idea is that in the fast mode, which has response-time distribution $G_0$, the subject is unable to discriminate at all between the signals and responds simply by guessing; whereas, in the slower identification mode, which has response-time distribution $G_1$, the subject is able to discriminate between the signals, although not necessarily perfectly. Ollman (1966) assumed that in advance of the trial the subject decides either just to detect the signal onset and make a preprogrammed response or to wait until the signal is identified and then respond. So, $G_0$ should be the simple-reaction-time distribution. Yellott (1967) suggested
TABLE 7.2. Observed and theoretical MRT for the various repetition and interval histories (parameter values: $\nu_1 = 611$, $\nu_0 = 300$, $\alpha^*/\alpha = 0.232$) Repetition
Interval Probability
0.56
Pre. Obs. 0.24 Pre. Obs. 0.10 Pre. Obs. 0.06 Pre. Obs. 0.03 Pre.
1
2
3
4
5
6
7
358 376 423 455
376 399 436 464
392 400 448 470
407 422 459 463
421 427 470 479
434 434 480 461
446 434 498 487
Obs.
Note : Pre. = predicted; Obs. = observed.
8
506 512
0
1
2
3
4
5
6
7
8
373
335 320 381 391 404 406
324 316 357 367 373 362
316 309
311 314
307 314
305 308
303 310
511 419
352 352 416 434 448 491
555 557
478 482
604
513 479
389 465 471
607
9
10
302 301 302 305
TABLE 7.3. Mean reaction time (in milliseconds) on trial n given preceding stimulus sequence Stimulus presentation probability
3
Stimulus sequences Order First
S
n
-V
, S,, 2
sn -:, s;, 4
1
Second
1 1
2 2 1 1
Fourth
Fifth
1 1
( 1
1
2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 1 1 1 1 1 1 1 1 2 2 2 2
1
2
1 1 1
2 2 2
Pre.
Obs.
Pre.
Obs.
Pre.
307
308 300 311 293 304 306 314 288 295 298 304 302 308
274 269 280 267 276 278 292 263 271 275 279 278 280 291 294 264 262 273 266 273 278 275 285 279 276 283 276 287
273 269 281 267 274 278 288 266 270 272 278 276 282 285 293 265 268 269 273 271
290 284 295 278 288 290 300 276 280 283 293 292 286 293 305
290 284 296 280 289 292 301 276 284 286 293 288 295 296 304 274 279 281 286 283 288 290 295 285 291 292 298 294 300 301 307
310 297
2
Third
Obs. 302
1
1 2 1 2 1 1
2.
2 1 1 2 2 1 1 1 1
2 2 2 2 1 1 1 1 2 2 2 ")
i
2 1 2 1 •1
1
2 1 1 2 2 1
1 2 1 2 1
2 2 1
I 2 1
1 2
2 1
1
o ^ I
I
2 2
2
0 ^ I
2
1 2
.5
.7
309 306 314 292 301 309 309 309 305 308 316 287 296 296 302 303 311 309 309 303 312 314 300 308 308 320 315
309
315 283 290 292 297 295 300 302 305 300 304 306 309 308 312 313 316
300
288 302
275
277
282
275 279 281 286 284 289
291 297
273
279 277 283 285 281 293 294 292 292 282 290 288 300 301 310
Note: Obs. = observed; Pre. = predicted.
that the mechanism might actually be one of time estimates from the warning signal. I will discuss the evidence on that point in Section 7.5.3. The source of tradeoff between speed and accuracy in this, the simplest, version of fast guessing is attending or not to the identity of the stimulus. If one does attend, then there is assumed to be a fixed distribution of response times and a fixed error rate. One of the major empirical tasks is to decide, given that the subject attends to the stimulus, whether or not there is a further speed-accuracy tradeoff beyond that of attention. If there is, as appears to be the case, then it too must be modeled, which is the topic of Chapters 8 and 9.
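The way this simplest version of fast guessing generates a tradeoff can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `simulate_fast_guess`, the Gaussian latency distributions, and every numerical value (a mean guessing time of 200 msec, a mean identification time of 400 msec, an identification accuracy of 0.95) are hypothetical choices, not estimates from any experiment discussed here. Only the probability of attending to the stimulus, called `rho` in the code, is varied.

```python
import random


def simulate_fast_guess(rho, n_trials, p_identify=0.95, beta_a=0.5,
                        nu0=200.0, nu1=400.0, sd=30.0, seed=0):
    """Simulate the simple fast-guess model; return (proportion correct, mean RT).

    All parameter values are hypothetical. Latencies are drawn from Gaussian
    distributions purely for convenience; the model itself does not specify
    the form of the guessing or identification distributions.
    """
    rng = random.Random(seed)
    n_correct = 0
    total_rt = 0.0
    for _ in range(n_trials):
        stimulus = rng.choice("ab")          # equally likely signals
        if rng.random() < rho:
            # Identification mode: slower, correct with fixed probability.
            rt = rng.gauss(nu1, sd)
            correct = rng.random() < p_identify
        else:
            # Fast-guess mode: the response ignores the stimulus entirely.
            rt = rng.gauss(nu0, sd)
            guess = "a" if rng.random() < beta_a else "b"
            correct = guess == stimulus
        n_correct += correct
        total_rt += rt
    return n_correct / n_trials, total_rt / n_trials


if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        p_c, mean_rt = simulate_fast_guess(rho, 20000)
        print(f"rho={rho:.2f}  P_c={p_c:.3f}  mean RT={mean_rt:.1f} msec")
```

Under these assumptions accuracy and mean reaction time rise together as `rho` increases, even though the latency distribution and the error rate within each mode are held fixed.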
Recall that $\rho = \rho(E)$ is the probability of being in state 1 (i.e., wait to identify the signal), $\beta_A$ and $\beta_B$ are the two guessing probabilities, where $\beta_A + \beta_B = 1$, and $P_{sr}$ is the probability of making response r to stimulus s when in state 1, where $P_{sA} + P_{sB} = 1$. Then the response probabilities are (Eq. 6.10)

$$P(r \mid s) = \rho P_{sr} + (1 - \rho)\beta_r. \tag{7.23}$$

Although in his work Falmagne assumed that $\rho$ is not only a function of the past history of the experiment, but also of the signal on the current trial, given the interpretation of the fast-guess model, dependence on the signal presented is implausible. Moreover, the analyses suggested by Ollman and Yellott do not go through if that is the case. So we explicitly assume that $\rho$ is independent of the signal presented although it still depends upon other aspects of the environment. It is easy to see that the response-time densities are given by

$$f(t \mid s, r) = \eta_{sr}\, g_1(t) + (1 - \eta_{sr})\, g_0(t), \tag{7.24a}$$

where

$$\eta_{sr} = \frac{\rho P_{sr}}{P(r \mid s)}, \tag{7.24b}$$

and $P(r \mid s)$ is given in Eq. 7.23. This form is exactly the same as Eq. 7.1 except that the mixture parameter $\eta_{sr}$ is a function of both s and r as well as the condition.
7.4.2 Two Predictions
In addition to the fixed-point property, which holds since Eq. 7.24a is a special case of Eq. 7.3, the model embodied in Eqs. 7.23 and 7.24 makes two major predictions about the data. The first seems not to have been noted in the literature, even though it provides a very stringent and useful test of the simple fast-guess model. Keeping in mind that the denominator of Eq. 7.24b is $P(r \mid s)$ and using the local independence of the model,

$$f(t \mid s) = \sum_r f(t \mid s, r)\, P(r \mid s) = \rho\, g_1(t) + (1 - \rho)\, g_0(t), \tag{7.25}$$

which is completely independent of the signal s; that is, for all t,

$$f(t \mid a) = f(t \mid b), \tag{7.26a}$$

and in particular

$$E(T \mid a) = E(T \mid b). \tag{7.26b}$$
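This will be checked below in several bodies of data, but we already know that it is at considerable variance with some of the data reported in Section 6.5. (For example, with unequal presentation probabilities, the more probable signal is responded to more rapidly than is the less probable one.) The second prediction follows a similar line of argument, originally stated by Ollman (1966) for means. Following Yellott (1971) and using the subscript notation c and e on P and f for "correct" and "error" responses, we first define

$$2P_c = P(A \mid a) + P(B \mid b), \qquad 2P_e = P(B \mid a) + P(A \mid b),$$
$$2f_c(t) = f(t \mid a, A)P(A \mid a) + f(t \mid b, B)P(B \mid b), \qquad 2f_e(t) = f(t \mid a, B)P(B \mid a) + f(t \mid b, A)P(A \mid b). \tag{7.27}$$

Then we establish that

$$2f_c(t) = \frac{(P_{aA} + P_{bB})\, g_1(t) - g_0(t)}{P_{aA} + P_{bB} - 1}\,(P_c - P_e) + g_0(t) \tag{7.28a}$$

and

$$2f_e(t) = \frac{(2 - P_{aA} - P_{bB})\, g_1(t) - g_0(t)}{P_{aA} + P_{bB} - 1}\,(P_c - P_e) + g_0(t). \tag{7.28b}$$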
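These are proved as follows. First, observe that by definition of $P_c$ and $P_e$ and Eq. 7.23,

$$P_c - P_e = \rho\,(P_{aA} + P_{bB} - 1). \tag{7.29}$$

Next, substitute Eq. 7.24a into the definitions of $f_c$ and $f_e$, and then use Eq. 7.29 to eliminate $\rho$ and get Eq. 7.28. If we define

$$M_c = \frac{1}{P_c}\int_0^\infty t\, f_c(t)\, dt, \qquad M_e = \frac{1}{P_e}\int_0^\infty t\, f_e(t)\, dt, \tag{7.30}$$

then from Eqs. 7.28 and 7.29, we obtain the important relations

$$2P_cM_c = \frac{(P_{aA} + P_{bB})\,\nu_1 - \nu_0}{P_{aA} + P_{bB} - 1}\,(P_c - P_e) + \nu_0, \tag{7.31a}$$

$$2P_eM_e = \frac{(2 - P_{aA} - P_{bB})\,\nu_1 - \nu_0}{P_{aA} + P_{bB} - 1}\,(P_c - P_e) + \nu_0. \tag{7.31b}$$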
By Eq. 7.27, $P_c + P_e = 1$, and so adding Eqs. 7.31a and 7.31b, we see

$$E(T) = P_cM_c + P_eM_e = \frac{\nu_1 - \nu_0}{P_{aA} + P_{bB} - 1}\,(2P_c - 1) + \nu_0. \tag{7.31c}$$

Equation 7.31c follows immediately from Eqs. 6.13 and 7.26b. The first two of this trio of equations say that when $\rho$ is varied by some experimental manipulation, the data should exhibit linear relations between estimates of $2P_cM_c$ and $2P_eM_e$ and estimates of $P_c - P_e$. Moreover, they have a common intercept at the mean guessing time $\nu_0$. These are two speed-accuracy tradeoffs in terms of the observables. The third combines these two more detailed results into a single linear SATF between $E(T)$ and $P_c$ (see, also, Eq. 6.13). As was noted earlier, these tradeoffs arise in the data without our having to postulate any direct tradeoff between speed and accuracy in the underlying mechanism. Another important property of the model arises by subtracting Eq. 7.28b from Eq. 7.28a, yielding

$$f_c(t) - f_e(t) = g_1(t)\,(P_c - P_e). \tag{7.32}$$

For $P_c - P_e$ not too near 0, this provides us with a means of testing the hypothesis that $g_1$ is invariant under changes of conditions in the experiment. If it is, one can use Eq. 7.32 to estimate $g_1$ from the data. Such a formula is known as a "correction for guessing." The general question about techniques for correcting various response measures for the effect of guessing was explored by Link (1982). It follows immediately from Eq. 7.32 that the means satisfy the linear relation

$$P_cM_c - P_eM_e = \nu_1\,(P_c - P_e), \tag{7.33}$$

in which the slope is the mean identification time.
7.4.3 Estimation of the Mean Identification Time
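As a minimal sketch of the estimate discussed in this subsection, the function below computes the ratio $(P_cM_c - P_eM_e)/(P_c - P_e)$ suggested by Eq. 7.33 from trial-by-trial data. The name `estimate_nu1` and the toy data are hypothetical, and the sketch assumes equally likely signals, so that pooled proportions of correct and error responses can stand in for $P_c$ and $P_e$.

```python
def estimate_nu1(correct, rts):
    """Estimate the mean identification time from Eq. 7.33.

    correct: sequence of booleans, True when the response was correct.
    rts:     sequence of response times, one per trial.

    With N trials, P_c * M_c is the sum of correct latencies divided by N,
    P_e * M_e is the sum of error latencies divided by N, and P_c - P_e is
    (n_correct - n_error) / N, so N cancels in the ratio.
    """
    if len(correct) != len(rts):
        raise ValueError("correct and rts must have the same length")
    n_correct = sum(1 for c in correct if c)
    n_error = len(correct) - n_correct
    if n_correct == n_error:
        raise ValueError("P_c - P_e is zero; the estimate is undefined")
    sum_correct = sum(t for c, t in zip(correct, rts) if c)
    sum_error = sum(t for c, t in zip(correct, rts) if not c)
    return (sum_correct - sum_error) / (n_correct - n_error)


# Hypothetical toy data: four guesses at 200 msec (two right, two wrong)
# and six error-free identifications at 400 msec.
toy_correct = [True, True, False, False] + [True] * 6
toy_rts = [200.0] * 4 + [400.0] * 6
toy_estimate = estimate_nu1(toy_correct, toy_rts)
```

In the toy data the guesses contribute equally to the correct and error sums and so cancel, which is why the ratio recovers the identification latency; when $P_c - P_e$ is near 0 the denominator is small and the estimate becomes unstable.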
The obvious way to estimate the mean identification time is to estimate $P_c$, $P_e$, $M_c$, and $M_e$ from the data in the usual way and then compute an estimate of $\nu_1$ from Eq. 7.33. Is this a satisfactory estimate? Since we have made no assumptions about the distributions $G_0$ and $G_1$, we do not know the sampling distribution of $\hat{\nu}_1$ unless we can in some manner invoke the central limit theorem (Appendix A). Basically, we would like to know the conditions under which the estimate from Eq. 7.33 converges in probability to $\nu_1$ as the sample size goes to $\infty$. Yellott (1971) presented the following sufficient conditions: (1) $G_0$ and $G_1$ have finite means and variances, (2) $\lim E[P(A \mid a) + P(B \mid b) - 1] > 0$, and (3) the convergence of $E(\rho_m \mid s_n, r_n, T_n)$ to $E(\rho_m)$ is not too slow (geometric or better will do), where $s_n$, $r_n$, and $T_n$ denote, respectively, the stimulus, response, and latency on trial n and $\rho_m$ is the probability of attending to a signal on trial m. Since these conditions seem plausible, it is reasonable to use Eq. 7.33 to estimate $\nu_1$. The proof may be found on pp. 168-169 of Yellott (1971). The intercept of Eq. 7.31c yields an estimate of $\nu_0$, and with both of those means known, the slope provides an estimate of $P_{aA} + P_{bB}$.
7.4.4 Generalized Fast-Guess Model
Ollman, in his original presentation, stated a somewhat more general model than the simple fast-guess one. This generalized model supposes that $G_1$ depends both on the signal presented and the response made, and so is replaced by $G_{1sr}$, and that $G_0$ depends on the response, and so is replaced by $G_{0r}$. The question, then, is the extent to which the two predictions of Section 7.4.2 are changed. First, nothing as simple as Eq. 7.26, that $f(t \mid s)$ is independent of the signal presented, obtains. Second, the linear relation given in Eq. 7.33 still obtains with $\nu_1$ replaced by

$$\frac{P_{aA}\nu_{1aA} + P_{bB}\nu_{1bB} - P_{aB}\nu_{1aB} - P_{bA}\nu_{1bA}}{P_{aA} + P_{bB} - 1}, \tag{7.34}$$

where $\nu_{1sr}$ denotes the mean of $G_{1sr}$. This result is verified in exactly the same way that Eq. 7.33 was. Thus, if we find Eq. 7.26 rejected but Eq. 7.33 sustained in data, we know that the simple fast-guess model is wrong but the generalized one might be correct.
7.4.5 Sequential Effects
Since we know there are sequential effects in the data (Section 6.5), it is reasonably important to see if there is reason to expect them to affect the predictions of the fast-guess model. Basically, one wants to graft Falmagne's linear operator model onto the fast-guess structure. The most obvious way of doing so, namely to suppose $\rho$ is trial- and stimulus-dependent and changes according to Eq. 7.12, will not do, the reason being that if the subject knows which signal was presented then the fast mode would not also be a guessing mode. Rather, Yellott (1971, p. 194) suggested that the impact is on the guessing strategy as follows. On trial n there is a probability
and error response; that is,
The sequential process is to change
A Donderian Assumption
Consider the following further specification of the simple fast-guess model. First, suppose, as Ollman suggested, that the difference between the two states 0 and 1 is that only detection occurs (say, by means of a change detector) in the former, whereas in the latter it is followed by an independent stage of identification. This is a typical Donderian assumption of the type discussed in Section 6.2.2. Second, let us explicitly suppose that the time for identification is independent of the time for detection. If so, then $g_1$ is the convolution of $g_0$ with the identification distribution, call it $g_I$. Putting this together with Eq. 7.24 and calculating the mgf yields

$$M(x \mid s, r) = M_0(x)\,[\eta_{sr} M_I(x) + 1 - \eta_{sr}], \tag{7.35}$$

where $M_0$ and $M_I$ are the mgfs of $g_0$ and $g_I$ and $\eta_{sr}$ is given by Eq. 7.24b. In principle, this can be used as follows, although it has yet to be applied to data. By experimental manipulation, run the experiment for two values of $\rho$: $\rho(1)$ and $\rho(2)$. Let $\eta_{sr}(i)$, $i = 1, 2$, be the corresponding values of Eq. 7.24b, then it follows immediately that
This linear relation can be tested by obtaining $M(x \mid s, r)$ from the choice situation for two experimental manipulations run on the same subject under the same conditions. If three levels of $\rho$ are run, then Eq. 7.23 yields six probabilities and the six parameters: $P_{aA}$, $P_{bB}$, $\beta_A$, $\rho(1)$, $\rho(2)$, and $\rho(3)$. This permits us to compute $\eta_{sr}(i)$ from Eq. 7.24b and to insert these values in Eq. 7.35, so the predicted relation is parameter free. And, of course, we can compute $M_I(x)$ from Eq. 7.35. It would be of interest to do this in a situation where the general fast-guess model appears to hold.
7.5 A THREE-STATE, FAST-GUESS, MEMORY MODEL
Falmagne, Cohen, and Dwivedi (1975) suggested combining two sets of ideas—fast guessing and memory scanning—as a way to account for the sequential effects found in two-choice reaction times. Unlike Falmagne's 1965 model, this one is sufficiently general to account for the sequential effects in error data as well as time data. Although the general issues of memory scanning are not dealt with until Chapter 11, the ideas used in this model are sufficiently self-contained and simple that introducing them now does not present any real difficulty. It should be recognized, however, that the assumptions of serial and self-terminated scanning are controversial. Without a doubt, the subject has some sort of stored representation of each of the two signals, a or b. Suppose on each trial, the subject elects either to base the decision on a comparison of the information arising from the signal with the representation held in memory or to make a fast guess. In the former case suppose memory is organized so that a comparison is first made with one of the representations and then, sometimes, with the other— that is, either in the order (a, b) or (b, a). This is the serial aspect of memory. If this is true, then the organization of the memory can serve to reflect the subject's expectancy about which signal is more likely or more important. The model has two distinct components: how the responses are affected by both the signal presented and the state of the subject's memory, and how the state of memory is altered. The response part is taken up in Section 7.5.2. Basically three things are involved: under what conditions does the memory search terminate after comparing the information from the signal and the first item in memory; what response is made in each case; and how long does each response take? 
The temporal structure of this response system is a three-state mixture in which fast guesses have one distribution, single scans have another, slower one, and double scans a still slower one. In addition, these distributions are assumed to differ depending upon which response is made; this is necessary if, for example, the subject's response times for the two hands differ. The other component of the model, which is developed in detail in Section 7.5.1, is a description of the possible states of memory and a model for how these change. This leads to a Markov chain representation of memory states in which the experience on a trial has some tendency to alter
the state of memory. We will be interested only in these dynamics after steady state has been reached, in which case they are the source of sequential effects in this model.
7.5.1 Memory Transitions
In describing this model, I find it convenient to continue using a notation similar to that used so far; it differs somewhat from that of Falmagne et al. I use a and b to refer, in context, to the particular stimuli, the particular responses used to identify them, and the corresponding representations in memory. Context will make quite clear which of the three levels is involved. Generic stimuli and responses are denoted s and r, respectively. The stimulus other than s is denoted s*, so if s = a, then s* = b. The same convention is followed for responses. For trial n, the stimulus presented, the response made, and the response time are all observable realizations of random variables, denoted as usual, $s_n$, $r_n$, and $T_n$. In addition, there is assumed to be an unobserved random variable, $M_n$, describing the state of the subject's memory. This variable is discussed next. There are four possible states of memory which will be denoted $a_F$, $b_F$, $a$, and $b$. The first two are the fast-guess states, with the former biased toward a and the latter, $b_F$, toward b. The other two states determine the order of scanning: a indicating that the representation of a is scanned first and b second, and b indicating the opposite order. The intuitive idea about how memory changes is that if the signal presented agrees with the bias in memory, then that bias remains or becomes more severe, where $s_F$ is a more severe bias than is $s$, $s = a$ or $b$. If, however, the signal presented does not agree with the bias, then memory ends up at one or the other of the less biased states, the probability of each depending upon where it was.
The following matrix of transition probabilities from the states on trial n to trial n + 1 captures these ideas explicitly: if signal s is presented, then the transitions among memory states are:
Note, there are two distinct matrices depending on whether s = a or b, and that they introduce a total of six parameters. So far as I can tell, the assumptions embodied in Eq. 7.37 have no basis in any observations aside from the data they help explain. They seem plausible, and certainly memory for simple signals might work this way. But it is not difficult to think of other quite plausible ways it might work, and so their real test comes later, in accounting for sequential data. Before turning to the response model, it is useful to introduce an assumption about the experiment to which the model will be applied, namely, that the probability of presenting a signal depends upon nothing but the signal. Formally, if we let $W_n$ be the entire history of random variables up to and including trial n, that is, $s_i$, $r_i$, $T_i$, $M_i$, $i = 1, 2, \ldots, n$, then the postulate is

$$\Pr(s_n = s \mid M_n, W_{n-1}) = \pi_s, \tag{7.38}$$

where $\sum_s \pi_s = 1$. On that assumption, one can calculate the transitions among the memory states. For example, consider

$$\Pr(M_{n+1} = a \mid M_n = b, W_{n-1}) = \Pr(M_{n+1} = a \mid M_n = b, s_n = a, W_{n-1}) \Pr(s_n = a \mid M_n = b, W_{n-1})$$
$$\qquad + \Pr(M_{n+1} = a \mid M_n = b, s_n = b, W_{n-1}) \Pr(s_n = b \mid M_n = b, W_{n-1}) = \alpha_a \pi_a + 0.$$

The first term arises from Eq. 7.37 by noting that with s = a, the memory states are $M_n = s^*$ and $M_{n+1} = s$, and so the transition probability is $\alpha_a$. With s = b, the memory states are $s$ and $s^*$, which has a transition probability of 0. Thus, we derive that this transition probability is a constant, independent of n and of $W_{n-1}$, $r_n$, and $T_n$. As this is true of all other pairs of states, the memory transitions form a Markov chain with the following transition matrix:*
The fact that memory transitions form a Markov chain very much simplifies the calculations involved (see Section 7.5.3).
* The matrix in Falmagne et al. (Table 6, p. 329) appears to be in error, in that it has a pair of parameters in addition to the four pairs postulated. These appear to arise from having the last row of Eq. 7.37 written as $(\gamma_s, \delta_s, 1 - \gamma_s - \delta_s, 0)$, thereby allowing a direct transition from one guessing state to the other. In fitting the model to data, they clearly assumed $\gamma_s = 0$.
7.5.2 Responses and Response Times
The postulated response rule is simple. When the memory is in one of the fast-guess states, the response is that of the state. So assume memory is in one of a or b. The assumption is that if the signal agrees with the memory state (that is, there is a match between the signal and the first memory scan), then that response is made and it is error free. If, however, a mismatch exists, then either of two things can happen. The mismatch may be perceived as a match, in which case the response is that of the memory, and it is an error. Such a misperception is presumed to occur with some probability $\eta_{s^*s}$ when s is presented and, as before, s* denotes the opposite symbol. If, however, the mismatch is correctly perceived, then the second memory is scanned, a match is found, and the correct response is made. It is easiest to summarize this in terms of the conditional error probabilities:

$$\Pr(r_n = s^* \mid s_n = s, M_n = m) = \begin{cases} 0, & m = s_F \text{ or } m = s, \\ \eta_{s^*s}, & m = s^*, \\ 1, & m = s^*_F. \end{cases} \tag{7.39}$$
This rule introduces two new parameters, $\eta_{ab}$ and $\eta_{ba}$. For the response times, six different distributions are introduced. For fast guesses, they are $G_{rF}$, where r = a, b. For a single scan of the memory, they are $G_{r1}$, where r = a, b. Recall, this occurs whenever the response matches the memory, either because the signal matches the memory or because it does not and an erroneous response is made. For two scans of the memory, they are $G_{r2}$, and this occurs when the response and memory do not agree. Presumably, for each r, the means are ordered $\nu_{rF} < \nu_{r1} < \nu_{r2}$.
This completes the formal statement of the assumptions of the model. There are a lot of parameters: six mean times, two $\eta_{s^*s}$, and six transition probabilities. However, considering the variety of probabilities and mean times that can be generated by partitioning the data by its history, the model can certainly be tested. We turn next to how this is done.
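The response rule just stated can be sketched as a small function. This is only an illustration of the verbal rule: the names `respond` and `eta`, the string labels for the memory states, and the scan labels "F", "1", and "2" are hypothetical conveniences, and `eta[s]` is assumed to hold the probability that a mismatch is misperceived as a match when stimulus `s` is presented.

```python
import random


def respond(memory, stimulus, eta, rng):
    """Return (response, scan_type) for one trial of the three-state model.

    memory:   one of "aF", "bF", "a", "b" (fast-guess states end in "F").
    stimulus: "a" or "b".
    eta:      dict mapping each stimulus to its misperception probability.
    rng:      a random.Random instance.

    scan_type is "F" for a fast guess, "1" for a single memory scan, and
    "2" for a double scan; it selects which latency distribution applies.
    """
    if memory.endswith("F"):
        # Fast-guess state: the response is that of the state.
        return memory[0], "F"
    if memory == stimulus:
        # The first representation scanned matches: correct, one scan.
        return stimulus, "1"
    if rng.random() < eta[stimulus]:
        # Mismatch misperceived as a match: error after one scan.
        return memory, "1"
    # Mismatch correctly perceived: second scan, correct response.
    return stimulus, "2"
```

A response that agrees with the memory state always carries the label "1" and one that disagrees always carries the label "2", which mirrors the statement that single-scan times apply whenever the response matches the memory.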
7.5.3 A Strategy for Data Analysis
Unlike the fast-guess model or even the more general two-state models, few simple properties of this model have been found that can be used to evaluate it. The main test is to estimate all of the parameters and see how well the model reproduces the sequential data. I outline that procedure in
this subsection, but first I derive the only known testable property, which also serves to simplify slightly the estimation problem. Consider the class of histories: $(r_{n+1} = s^*, s_{n+1} = s, T_n, r_n, s_n = s, W_{n-1})$. Observe that the memory state on trial n + 1 must be $s^*$. The reasoning is as follows. Since $s_n = s$, we know that $M_{n+1}$ cannot be $s^*_F$ because the final column of the memory matrix, Eq. 7.37, is all 0's. And since $r_{n+1}$ is an error, we know by Eq. 7.39 this is possible only if $M_{n+1}$ is either $s^*$ or $s^*_F$. As the latter is impossible, the former must hold. Knowing that, consider $T_{n+1}$. Since $M_{n+1} = s^* = r_{n+1}$, we see by Eq. 7.40 that the distribution $G_{s^*1}$ must be used, so

$$E(T_{n+1} \mid r_{n+1} = s^*, s_{n+1} = s, T_n, r_n, s_n = s, W_{n-1}) = \nu_{s^*1}. \tag{7.41}$$
This affords a test of the model since, for example, one can partition the data on $r_n$ and, of course, $W_{n-1}$, and see if the estimated values are the same or not. Assuming they are, it affords an estimate of these parameters. Beyond that, it is necessary to deduce explicit expressions for the various conditional probabilities. These deductions are straightforward, once one gets the knack of it, but they never cease being tedious. To illustrate in a simple case the type of calculation involved, consider the special case of no guessing states, that is, $\beta_s = 0$ (this is called model I in Falmagne et al.). Using first the law of total probability, next Eq. 7.39, and then Bayes' rule, we can develop an explicit expression for the error probabilities (in which I suppress the notation for much of the past history):
where we have cancelled the two presentation terms that, despite different conditioning, are by assumption (Eq. 7.38) just the probability $\pi_s$. This result becomes useful when we let $n \to \infty$ and note that, because the memory transitions form a Markov chain and so have an asymptotic distribution, $\lim_{n \to \infty} \Pr(M_n = m) = p(m)$. The asymptotic distribution can be calculated explicitly in terms of the parameters of the transition matrix, but there is no need to do so here. This illustrates the general strategy. Since the equations are algebraically complex, Falmagne et al. programmed a computer to do the work for them. They then did a numerical exploration of the 12-dimensional parameter space, finding a numerical minimum for $\chi^2$. The experiment to which this was applied and the results are described in Section 7.6.4.
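The asymptotic distribution of a Markov chain of this kind can also be obtained numerically. The sketch below is a generic illustration rather than the procedure used by Falmagne et al.: the function name `stationary_distribution` and the four-state transition matrix are hypothetical, and the matrix entries are arbitrary values chosen only so that each row sums to 1; they are not the transition probabilities of the memory model.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10000):
    """Approximate the asymptotic distribution p of a finite Markov chain.

    P is a list of rows, P[i][j] being the probability of moving from
    state i to state j. The distribution is found by repeatedly applying
    p <- p P from a uniform start until successive iterates agree within
    tol. This assumes the chain has a unique asymptotic distribution.
    """
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    raise RuntimeError("iteration did not converge")


# Hypothetical transition matrix over four states, ordered here as
# (a_F, a, b, b_F); the numbers are illustrative only.
EXAMPLE_P = [
    [0.6, 0.4, 0.0, 0.0],
    [0.2, 0.5, 0.3, 0.0],
    [0.0, 0.3, 0.5, 0.2],
    [0.0, 0.0, 0.4, 0.6],
]
example_p = stationary_distribution(EXAMPLE_P)
```

For this illustrative matrix the balance conditions between neighboring states give the distribution (1/6, 1/3, 1/3, 1/6), which the iteration reproduces; iterating the chain is a design choice made for simplicity, and solving the linear equations directly would serve equally well.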
7.6 DATA WITH RESPONSE ERRORS: HIGHLY DISCRIMINABLE STIMULI
7.6.1 Identification of Color, Auditory Frequency, and Auditory Intensity
Yellott (1967, 1971) ran a series of studies modeled on a design first introduced by Ollman (1966), except that Yellott's stimuli were far more discriminable than were Ollman's. The design involved manipulating a deadline, such that responses after the deadline were fined and those before it were paid off for accuracy. The deadlines used by Yellott were 150, 200, 250, 300, 350, 400, 500, and 800 msec. Following each trial, information feedback was provided. The stimuli to be identified were all highly discriminable. In Experiment 1, which was partially analyzed in his 1967 paper, the stimuli were red and green lights on a display screen. The signals were equally likely to appear, and the symmetric payoffs were not varied. In addition to the eight deadline conditions, there were two other conditions: "speed" in which subjects were pressed for speed and asked to ignore accuracy, and "accuracy" in which there was no deadline and they were urged to be as accurate as possible. There were 960 trials per condition and three subjects. Experiment 2 was similar to 1 except that the payoffs were varied in order to alter the accuracy of performance over a wider range. There were 500 trials per condition and four subjects. Experiment 3 changed the stimuli to 1000- and 1500-Hz, 70-dB SPL tones, reduced the deadlines to three (250, 300, 400 msec), and varied the presentation probabilities (.1, .3, .5, and .7). Of the possible deadline-probability pairs, nine were run. There were 500 trials per condition and four subjects. Only for Experiment 3 were the data reported in sufficient detail (see Appendix C.1) to compute MRT(a) and MRT(b) and so to check their equality (Eq. 7.26b).
With mean reaction times in the range from 200 to 300 msec, the mean differences of the estimated means and the standard deviations of those differences were 1.84 ± 18.64, 3.77 ± 12.88, and −8.83 ± 12.81 msec. This does not reject the null hypothesis of Eq. 7.26b. The plot of $P_cM_c - P_eM_e$ versus $P_c - P_e$ for the first experiment is shown in Figure 7.9. The linearity of these functions seems very satisfactory, supporting one of the main predictions of any of the fast-guess models. The data from the second and third experiments were comparable. The major deviations from linearity were the accuracy condition in Experiment 1 and, perhaps, the 800-msec deadline in Experiment 2 (condition 9). As Yellott pointed out, in the accuracy condition the subject has no motive to respond as rapidly as is consistent with perfect accuracy, and so this failure probably should not be taken very seriously. The major evidence in these data that there may be some difficulty with the model is the fact that when the estimate of $\nu_1$ is correlated with
FIG. 7.9 $P_cM_c - P_eM_e$ versus $P_c - P_e$ for the identification of highly discriminable red and green lights collected under various response deadlines. There are 960 observations per condition. The linear function is predicted by the fast-guess model (Eq. 7.33) and the value $\mu_s$ (in my notation, $\nu_1$) is the slope of the line. [Figure 2 of Yellott (1971); copyright 1971; reprinted by permission.]
accuracy, $P_c$, the correlation is significantly different from 0 for the four subjects of Experiment 2. Recall that for the simple fast-guess model there are, in addition to the linear prediction of Eq. 7.33, the two separate linear predictions of $2P_cM_c$ and $2P_eM_e$ versus $P_c - P_e$, for which $\nu_0$ is the common intercept (Eqs. 7.31a, b). With $\nu_1$ known, one can use one slope (Yellott used the error one) to estimate $P_{aA} + P_{bB}$ and so predict the slope of the other. The estimates and predictions for the deadline conditions of Experiment 1 are shown in Figure 7.10. The model seems in close accord with the data, as it does for the other two experiments except for one deadline in Experiment 2. In his unpublished dissertation Ollman (1970) reran his earlier 1966 study (see Section 7.7.1 below), but with several changes in design and one of
Mixture Models
301
analysis. He introduced catch trials and variable foreperiods in order either to induce the subjects not to time-estimate from the warning signal or to detect it if they did; the evidence was that they were not estimating time. And he used asymmetric distributions of the stimuli (20% catch trials, 50% signal a, and 30% signal b) in order to avoid symmetric guessing probabilities (i.e., $\beta_A = \beta_B = \tfrac{1}{2}$), which in certain cases cause estimation problems. The signals were 900-Hz tones of 100-msec duration that differed by 19 dB. The conditions differed according to the instructions, which attempted to vary the speed-accuracy tradeoff. Appendix C.2 presents the data. Letting a denote the more probable signal, then the mean value of
FIG. 7.10 Estimates and predictions from the fast-guess model of $2P_cM_c$ and $2P_eM_e$ versus $P_c - P_e$ for the experiment of Figure 7.9. [Figure 3 of Yellott (1971); copyright 1971; reprinted by permission.]
MRT(a) - MRT(b) over conditions is 30.1 msec with standard deviations of 22.3, 23.6, and 14.1 msec for the three subjects. The difference is negative only once in 17 cases for the first subject and only once in 13 cases for the second. Thus, the simple fast-guess model surely is not correct. To test the more general model, Ollman estimated $\nu_0$ and $\nu_1$ for each condition and plotted them against $P(A \mid a) + P(B \mid b) - 1 = P_c - P_e$. In 9 out of 10 cases when $P(A \mid a) + P(B \mid b) - 1 > 0.8$, the estimates of $\nu_0$ were actually negative. It is clear that these data are not consistent with any of the fast-guess models. Within the fast-guess framework, one is forced to conclude there is some covariation of $P_{aA}$ and $P_{bB}$ with $\nu_1$.
7.6.2 Identification of Length, Orientation, and Digits
Laming (1968) performed a number of experiments to test the version of the random walk model known as the sequential probability ratio test (SPRT) model, which will be described in detail in Chapter 8. Of these experiments, only Experiments 1 and 2 are reported in sufficient detail for our purposes here, and since 2 is a refinement of 1, I report only those data. The task, it will be recalled, was to identify which of two vertical white stripes on a black background had been presented in a tachistoscope. The main experimental manipulation was the presentation probability, which varied over blocks of trials. The order of presentation of these conditions was varied among four groups of six observers each. The instructions were to respond rapidly "without worrying too much about accuracy." No information feedback was provided. There were 200 trials per condition, for a total of 4800 observations over the 24 observers. The mean data over observers are summarized in Appendix C.3. In general, I believe it unwise to try to test models using data averaged over groups of subjects because there may be individual differences of, at least, parameter values. Two exceptions to that rule are when only ordinal comparisons are involved or when, as in this case, the relations being studied are all linear. The latter follows from the fact that any probability mixture of linear functions is itself linear. So the objection to group data is less salient here, but it will be for some of the models tested in Chapter 8. Figure 7.11 presents the plot of MRT(b) versus MRT(a), and there is little doubt that they are not equal; indeed, the correlation is nearer to -1 than it is to +1. Link (1975) ran a similar study in which the two stimuli were outlines of squares each with just one of the two possible diagonals. Presentation of one or the other followed a 1000-msec foreperiod, and the signal was response terminated.
The major experimental manipulation was the presentation probabilities that, unlike Laming's experiment, were interleaved at random, and the observer was told prior to each presentation which probability was in use. Information feedback about performance was not used. Seven hundred observations were collected from each observer in each condition.
FIG. 7.11 MRT of stimulus a versus MRT of b in an experiment on the identification of length for which presentation probability was varied (Laming, 1968) (Appendix C.3). There are 4800 observations per point. The line is the prediction of the fast-guess model.
The mean data for the four observers are provided in Appendix C.4. The plot MRT(b) versus MRT(a) is shown in Figure 7.12, and again the prediction of Eq. 7.26b is violated about as badly as is possible. For these two data sets, the simple fast-guess model is clearly wrong. And because these data exhibit very little variation in Pc — Pe, it is impossible to study the linearity of PCMC and PeMe with Pc — Pe, and so we have no test of the general model.
FIG. 7.12 The same as Figure 7.11 for identification of orientation (Link, 1975) (Appendix C.4). There are 2800 observations per condition.
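The two fast-guess predictions tested in this section, equality of MRT(a) and MRT(b) (Eq. 7.26b) and linearity of $P_cM_c - P_eM_e$ in $P_c - P_e$ with slope $\nu_1$ (Eq. 7.33), can be illustrated with a minimal sketch. It is not the text's derivation: it assumes equiprobable stimuli, one mean time per state regardless of stimulus or response, and the function name, argument names, and default accuracies are hypothetical choices made only for this example.

```python
def fast_guess_predictions(p_attend, nu0, nu1, p_aA=0.98, p_bB=0.98):
    """Predicted accuracy and time statistics for a simple two-state mixture.

    Assumptions (illustrative, not taken from the text): stimuli a and b are
    equiprobable; with probability p_attend the subject is in the attentive
    state (mean time nu1, accuracies p_aA and p_bB); otherwise the subject
    guesses (mean time nu0). With equiprobable stimuli a guess is correct
    half the time whatever the guessing bias.
    """
    q = 1.0 - p_attend
    # Joint probabilities of (state, outcome), averaged over the two stimuli.
    correct_attend = 0.5 * p_attend * (p_aA + p_bB)
    error_attend = 0.5 * p_attend * ((1.0 - p_aA) + (1.0 - p_bB))
    correct_guess = 0.5 * q
    error_guess = 0.5 * q
    p_c = correct_attend + correct_guess
    p_e = error_attend + error_guess
    # P_c * M_c and P_e * M_e are probability-weighted mean times.
    pc_mc = correct_attend * nu1 + correct_guess * nu0
    pe_me = error_attend * nu1 + error_guess * nu0
    # The mean time given either stimulus is the same mixture of nu0 and nu1.
    mrt = p_attend * nu1 + q * nu0
    return {"Pc": p_c, "Pe": p_e, "PcMc": pc_mc, "PeMe": pe_me,
            "MRT_a": mrt, "MRT_b": mrt}


if __name__ == "__main__":
    # Varying the proportion of attentive trials mimics varying the deadline.
    for p in (0.2, 0.5, 0.9):
        r = fast_guess_predictions(p, nu0=180.0, nu1=300.0)
        slope = (r["PcMc"] - r["PeMe"]) / (r["Pc"] - r["Pe"])
        print(p, round(slope, 6), r["MRT_a"] == r["MRT_b"])
```

Under these assumptions the guessing terms cancel in the difference, which is why the slope does not depend on how often the subject guesses; the data of Figures 7.11 and 7.12 contradict the equality of the two mean times that the sketch builds in.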
Another relevant study is Link and Tindall (1971), which was described in Section 6.3.3. They used Eq. 7.33 to estimate $\nu_1$ (denoted by a subscripted $\mu$ in their paper), and these estimates were shown in Figure 6.12 (p. 236). Not only are they not constant between deadline conditions, but for the accuracy condition the estimates vary with the discriminability of the signals. So, clearly, the data are inconsistent with the simple fast-guess model. Turning to the generalized model, we see from the discussion given that the slope of Eq. 7.33 should depend on $P_{aA}$ and $P_{bB}$, which of course we expect to vary with deadline conditions. So there is nothing in these facts to cause rejection of the more general model. However, the other major prediction of any mixture model, including the fast-guess ones, is the fixed-point property, and it was grossly violated in their data. Falmagne (1972) pointed out that if identification is perfect in state 1, then Eq. 7.23 yields
Thus, if both $p$ and $\beta_A$ are manipulated experimentally, the error data should exhibit a simple product structure. To test this, he ran a study very similar to his 1965 one and checked this hypothesis; it was rejected, casting doubt on the fast-guess model. He did not report the time data in his paper.
7.6.3 Persistent Strategies in the Location of a Light
Perhaps the most conclusive study favoring the idea of two distinct strategies as in the fast-guess model, although not some other features of that model, is that of Swensson and Edwards (1971). They noticed in their data something that had been neither assumed nor predicted theoretically, but which makes studying the processes far more direct and convincing. Their subjects tended to persist using one of the two strategies rather than shifting frequently between them. Sometimes one strategy appeared to be in use throughout an entire session, and the guessing strategy seemed to involve not random guessing but rather a highly regular pattern. Although it is impossible to estimate with perfect accuracy when a switch between strategies occurred, the fact that runs of one strategy were quite long means that slight errors in estimating the boundaries between runs does not introduce gross errors into the estimates of either accuracy or mean times associated with a strategy. For their first experiment, they arrived at the following definition of guessing or what they called "preprogrammed responses." Such a run consisted of any maximal sequence of successive responses using just one key and involving at least three error trials. Any sequence lying between two such preprogrammed runs was defined to be a run of the accuracy strategy. The experiment itself involved the presentation of a small square of bright, easily detectable light in one of two easily discriminable locations to the left or right of a fixation point. The foreperiod was uniformly distributed
TABLE 7.4. Accurate and preprogrammed-response (PPR) performance for each subject in Experiment 1 of Swensson and Edwards (1971) during homogeneous and mixed sessions with equiprobable stimuli Type of session Homogeneous Strategy Accurate (No. response) Mean block length" Error rate MRT (msec)
PPR
(No. response) Mean block length" Error rate MRT (msec) MRT difference 1
.ID
TW
Mixed JM
JD
TW
JM
(758) (1084) (513) 253 64 259 .0435 .0052 .0356 .0266 .0409 .0222 201.0 228.3 200.1 223.3 198.8 210.8 (1350)
(2611)
(899)
—
(448)
(1797)
— — —
_
.493 185.8 14.3
—
.506 189.0 21.8
(1124) 70
.487 203.4 19.9
(343) 58
.492 187.2
11.6
(592) 59
.503 182.9 18.1
Mean number of consecutive trials before S switched to the other strategy during mixed-strategy sessions.
in the sense that it lay in the interval from 1 to 3 sec with each individual millisecond in that interval being equally likely. Subjects were severely fined for anticipations, they were rewarded for time inversely to the response time measured from signal onset, and they were paid off for accuracy of responding. The sums of money involved were sufficiently large that wins and losses of as much as $50 occurred in a session. Feedback was provided on a trial-by-trial basis. The experimental manipulations were three levels of presentation probability, 0.5, 0.7, and 0.9, and three payoff matrices, including a symmetric one and two asymmetric ones. This resulted in 18 conditions, to which were added the two extremes of 0 and 1 presentation probabilities, in which case no discrimination was required. For some conditions, the evidence suggested that a single, simple strategy was in force throughout the condition. For others, a mix of the two strategies seemed to be used. The data were segregated accordingly. As can be seen in Table 7.4, the three subjects are quite similar. First, for both of the estimates of accuracy and mean time, the values are substantially the same whether a run came from an entire session (called homogeneous) or from a mixed session. Second, under the accuracy strategy, the error rate was less than 5%, whereas for the preprogrammed (guessing) strategy it was close to 50%. And third, mean reaction times for the "accurate" responses were some 12 to 22 msec slower than the preprogrammed ones. The times for the preprogrammed responses are about those found in most visual, simple reaction time studies, but those for the accurate responses seem remarkably
fast as compared with other estimates of the time required for accurate identifications. The major effect of the experimental conditions (payoffs and presentation probabilities) was to determine the proportion of time a strategy was used. In particular, the payoffs had no significant effect on the error rate in the accuracy condition. This fact is inconsistent with all of the models to be discussed in Chapters 8 and 9. Moreover, mean reaction time increased with decreasing signal probability when the subject was operating under the accuracy strategy and it also decreased for the incorrect responses under the preprogrammed strategy. These effects are not consistent with the fast-guess model. Experiments 2 and 3 undertook a more careful study of the speed-accuracy tradeoff, in particular, to see if it is possible to induce some form of intermediate responding of the type described in the next two chapters. The results, except perhaps for one subject, strongly confirmed the existence of just two types of strategy, although they found it necessary to admit into the definition of preprogrammed patterns two additional ones besides simple repetition of one response, namely, strict alternation and always giving the response that was appropriate to the preceding signal. Aside from Swensson (1972a), which experiment I take up in Section 7.7.3, the only other person I am aware of analyzing their data in this fashion is Wilding (1974). However, because his subjects were so slow, it made no difference. I believe that such an analysis should be used routinely since it provides simple definitions of guessing strategies, thereby permitting one not only to check if there was guessing in experiments where one hopes no guessing was involved, but to identify the runs of guesses and remove them for separate analysis.
7.6.4 Sequential Effects in the Identification of Orientation
In a study mentioned earlier (Section 6.5.1), Falmagne et al. (1975) ran three subjects in the identification of two isosceles triangles presented on a CRT. The bases were vertical and the apex faced either right or left, to which right and left keys corresponded. The vertical and horizontal visual angles were about 2.4° and 1.2°. The presentation probabilities were 0.65 and 0.35, which on half the sessions were assigned to right and left, respectively, and on the other half to left and right. The stimuli were response terminated, and the response-stimulus interval was 200 msec. Each session consisted of 100 practice trials followed by three blocks of 300 trials each. Following each block, subjects were informed of error rates and were urged to speed up when it was less than 0.10 and to be more careful when it was greater than 0.12. A total of 20,000 trials were run, of which 2600 were practice and were not included in the analyses. As the two hands made a difference, the data were partitioned accordingly. As was noted in Section 7.5.3, the model was fit to the data by means of a
FIG. 7.13 MRT versus stimulus history, similar to Figures 6.21 and 6.22 for Remington (1969), for an unequal presentation probability experiment of Falmagne et al. (1975). The sample size for the pair of rightmost points is 8700 and is reduced to 1087 for any corresponding pair of leftmost points. The theoretical predictions are for the three-state model described in the text. [Figure 13 of Falmagne et al. (1975); copyright 1975; reprinted by permission.]
numerical search. For one subject, two states of memory (no fast guesses) sufficed, but for the other two fast-guessing states had to be included. Only one data set was fully analyzed because of the expense of the parameter search. To gain some idea of how the model fits some of these data, Figures 7.13 and 7.14 show the time and error sequential data for the 0.35 probability signal. The code is this: squares are data points and circles are predictions from the model. Filled symbols are signals that are the same as the one on trial n, whereas unfilled symbols are the opposite signal. Thus, if $s_n = a$, $s_{n-1} = a$, and $s_{n-2} = b$, then the symbol over n is filled, that over n - 1 is also filled, but that over n - 2 is unfilled since b differs from a. The ability of the model to mimic the data is striking. The estimated mean latencies (in msec) for the several scanning conditions are
In this case, b corresponded to the less preferred hand, and we see that for the identification responses it is slower. The value of $\chi^2$ for these data was 104 with 34 degrees of freedom (46 nodes less 12 estimated parameters, ignoring the two times that were estimated separately). This is, of course, highly significant. Nevertheless, the model is mimicking the pattern of the data exceedingly well. The authors were unable to find any systematic pattern to the erroneous predictions, which was not the case when they attempted to fit the simpler model to the same data. Clearly, the model is not quite correct, but nevertheless it does an impressive job in reproducing a highly complex set of results. As they note, no doubt with a bit of tinkering the fit could be improved. The greatest difficulty in this approach is how to generalize it. This has been the dilemma of much model building in psychology, and has proved especially acute with Markov chain models. As one goes to more signals, the number of parameters proliferates rapidly. This would not be too bad if they could be partitioned into subgroups that could be estimated from simpler experiments, but so far that goal has not been achieved. Part of the difficulty is in understanding how the parameters relate to the subject and to the stimuli being used. To what extent can we assume the memory parameters are a fixed property of the subject, and to what extent are they experiment dependent? All too often, they seem to depend on both.
* I have verified with Falmagne that this is not a typesetting error.
FIG. 7.14 Error probability versus stimulus history for the same experiment as in Figure 7.13. [Figure 15 of Falmagne et al. (1975); copyright 1975; reprinted by permission.]
7.7 DATA WITH RESPONSE ERRORS: CONFUSABLE SIGNALS
7.7.1 Identification of Auditory Frequency
Ollman (1966) reported a study using three subjects and a monaural presentation of 100-msec pure tones. One pair of stimuli were 900 and 907 Hz of the same intensity, the other 900 and 920 Hz. The payoff involved a deadline such that responses occurring after the deadline lost one point and those before it were paid off according to a symmetric matrix of payoffs; the points were converted to money. The deadlines used were 300, 400, and 500 msec. One subject was insensitive to the deadline. For the other two, $P_cM_c$ and $P_eM_e$ were plotted against $P_c - P_e$, as shown in Figure 7.15. The lines fitted to the data are those of Eqs. 7.31a, b (with the notation $a = \tfrac{1}{2}[P_{aA} + P_{bB}]$, $K = \nu_0$, and $H = \nu_1$). These data seem supportive of the fast-guess model.
7.7.2 Detection of Auditory Intensity
In this section I report some additional data sets which were run for quite different reasons (taken up in Chapters 8 and 9). Carterette, Friedman, and Cosmides (1965) reported a Yes-No experiment involving the detection of a relatively weak 1000-Hz tone in noise. At the end of a 750-msec warning light, a 100-msec listening interval began that might or might not contain the tone. The observers, who were already extensively experienced in detection, were practiced for two weeks in the task. They thought it just a detection task and were unaware that their response times were being recorded and were, in fact, the primary focus of the study. (This is a good example of where I am reluctant to speak of reaction times.) Each experimental condition involved one of two intensities called high and low, although both were quite near threshold, and one of three presentation
FIG. 7.15 Fit of Eq. 7.31 of the fast-guess model to data from the identification of auditory frequency in which errors cannot be avoided. I have not been able to ascertain the sample size. [Figure 1 of Ollman (1966); copyright 1966; reprinted by permission.]
FIG. 7.16 MRT(s) versus MRT(n) for an auditory detection study of Carterette et al. (1965). There were two intensity levels and three presentation probabilities (see Appendix C.5). For each observer, there are 1800 observations per condition. The line is the prediction of the fast-guess model.
probabilities (0.2, 0.5, and 0.8). A total of 1800 observations were obtained in each condition from each observer. The response probabilities and mean response times are provided in Appendix C.5. Figure 7.16 compares MRT(s) with MRT(n). They may be equal for observer 3 but certainly not for the other two. Estimates of $\nu_1$ obtained from Eq. 7.33 are shown in Table 7.5. As they are quite variable, it seems doubtful that the generalized fast-guess model holds for these subjects. The exceedingly erratic values for Observer 3 arise primarily from the fact that $P_c - P_e$ was very small for all conditions. Green and Luce (1973) reported a detection study whose basic data have not been published in full (Link, 1978, published the averages over the three observers from one part of the study); they are given in Appendix C.6. The signals were near-threshold, 1000-Hz tones in noise that were response terminated. Each condition involved a deadline and the following payoff structure: Anticipations were each fined 25 points, responses later than the deadline were fined 4 points, and responses within the deadline were paid
TABLE 7.5. Estimate of $\nu_1$ (msec) from the equation $(P_cM_c - P_eM_e)/(P_c - P_e)$ for data from Carterette et al. (1965), Appendix C.5

                     Experimental conditions
              Low intensity            High intensity
Observer     .2      .5      .8       .2      .5      .8
1          1413    1630     499     1660     898    1269
2          2626    1660    1138     3088    2076    2766
3          1167     678   42610     -434     826     -32
off according to a matrix of the following character:

               Responses
Stimuli      Yes      No
s             x      -10
n           -10       y
The points were converted to money at the end of the experiment. In one manipulation, x = y = 10, and the deadline was varied. In another manipulation, the deadline was fixed at 600 msec and the following (x, y) pairs were used: (20, 1), (15, 5), (10, 10), (5, 15), and (1, 20). Two types of deadline were run. In the s-deadline condition, the deadline applied only on those trials when a signal was presented, in which case there was no fine for slow responses on non-signal trials; in the sn-deadline condition, the deadline applied to all types of trials. (The motivation for this design will become apparent in Section 9.3.) Information feedback followed each response. Each condition had approximately 1500 trials for each observer. Figure 7.17 plots MRT(n) versus MRT(s) for this experiment. It is clear that the simple fast-guess model fails for the s-deadline procedure. Figure 7.18 shows $P_cM_c - P_eM_e$ versus $P_c - P_e$ for the sn-deadline data; those for the s-deadline are similar. Obviously, for the weak signal case the relation is not linear, and so none of the fast-guess models describe these data.
7.7.3 Persistent Strategies in the Identification of Orientation
In a study closely resembling Swensson and Edwards (1971) (Section 7.6.3), Swensson (1972a) used stimuli that were somewhat confusable. They were rectangles in one of two 45° orientations with the ratio of the sides sufficiently near to 1 that each subject had an error rate of 1% to 2% when
FIG. 7.17 MRT(s) versus MRT(n) for an auditory detection study with response deadlines of Green and Luce (1973) (see Appendix C.6). The upper row is for the condition in which the deadline applied only to signal trials, and the bottom for the deadline on all trials. The left column is the data generated with a fixed payoff matrix and varied deadline. The right column is the data with a fixed deadline and varied payoffs. For each observer there are 1100 observations per condition.
time was not a factor. As before, anticipatory responses were severely fined, accuracy was rewarded, and time was charged according to a proportional cost function. Other aspects of the design were essentially as before. Preprogrammed guesses were identified in two ways. The first was the definition used in the earlier study: any unbroken sequence of responses that included at least three errors, or any pattern of strict alternation, or any pattern of making the response that would have been appropriate to the preceding signal. The second criterion was based on the observation that the response-time distributions were, in fact, bimodal with very few times falling in the interval from 240 to 280 msec. So, any response that had not been classed as preprogrammed under the first definition was so classed if it was faster than 250 msec. As can be seen in Figure 7.19, the error rate of the preprogrammed responses was very close to 50%, whereas the other, more accurate ones have an error rate of 10% or a bit less. The times for the accurate responses (those greater than 250 msec) are about 200 msec slower than the preprogrammed ones, which at about 200 msec is typical of simple reaction times to intense visual signals. Note the enormous difference in the time estimated for accurate responding in this experiment and in Swensson and Edwards (1971). I do not understand what accounts for the difference. Because the error rates found among the accurate responses exceeded the 1% or 2% the subjects were capable of in an ordinary discrimination experiment, and because there was no evidence for a speed-accuracy tradeoff to account for it, two additional experiments were conducted. In both an attempt was made to vary the monetary tradeoff between speed and accuracy over a wider range, thereby encouraging a speed-accuracy tradeoff.
FIG. 7.18 $P_cM_c - P_eM_e$ versus $P_c - P_e$ for the sn-deadline condition of the experiment of Figure 7.17. The open symbols are for the weak signals and the closed ones are for intense signals. [Figure 2 of Green and Luce (1973); copyright 1973; reprinted by permission.]
FIG. 7.19 In a study of orientation for which errors occur and using the definition of guessing and accurate trials described in the text, plots of proportion of errors and MRT versus sessions. There are data for six subjects, with 350 trials per session. The guessing trials appear as the upper curves in the error panels and the lower curves in the MRT panels. [Figure 2 of Swensson (1972a); copyright 1972; reprinted by permission.]
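The run-based part of the guessing definition (from Section 7.6.3: a maximal sequence of successive responses on one key that contains at least three error trials) can be sketched as follows. This is only an illustration under stated assumptions: the function name and the trial coding are hypothetical, and the sketch deliberately omits the alternation and previous-signal patterns as well as the 250-msec time criterion that Swensson (1972a) added.

```python
from itertools import groupby


def label_preprogrammed(stimuli, responses, min_errors=3):
    """Flag trials that fall inside a run of preprogrammed responses.

    A run is a maximal sequence of identical successive responses; it is
    classed as preprogrammed when it contains at least `min_errors` errors.
    All other trials are treated as belonging to the accuracy strategy.
    Returns one boolean per trial (True = preprogrammed).
    """
    if len(stimuli) != len(responses):
        raise ValueError("stimuli and responses must have equal length")
    labels = []
    start = 0
    # groupby splits the response sequence into maximal same-key runs.
    for _key, group in groupby(responses):
        run_length = len(list(group))
        stop = start + run_length
        errors = sum(s != r for s, r in
                     zip(stimuli[start:stop], responses[start:stop]))
        labels.extend([errors >= min_errors] * run_length)
        start = stop
    return labels


if __name__ == "__main__":
    # Hypothetical coding: 'L' and 'R' for left and right signals and keys.
    stimuli = ['L', 'R', 'R', 'L', 'R', 'L', 'R', 'R', 'L', 'R']
    responses = ['L', 'R', 'L', 'L', 'L', 'L', 'L', 'L', 'L', 'R']
    print(label_preprogrammed(stimuli, responses))
```

Once trials are labeled, error rates and mean times can be computed separately for the two classes, which is the comparison plotted in Figure 7.19.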
FIG. 7.20 SATFs for three subjects of the study cited in Figure 7.19. The open circles are data from accurate trials and the closed ones are from guessing trials. [Figure 5 of Swensson (1972a); copyright 1972; reprinted by permission.]
The second of these additional experiments had the added feature of taking into account a finding of the first, namely, that following a detection there is a substantial time before any accurate identification is possible. This time appears to the subject as a fixed cost in electing to be accurate rather than fast. So the payoff structure was revamped to take this into account. For two of the subjects the costless delay was set at 250 msec and for a third at 300 msec. The resulting pattern of tradeoff for responses greater than 250 msec is shown in Figure 7.20. The measure of accuracy used is log odds, that is, $\log(P_c/P_e)$. It seems clear that there is a substantial speed-accuracy tradeoff of a type different from that envisaged in the fast-guess model. Also, it should be noted in Experiment 2 that some subjects exhibited a clear ceiling effect in the sense that additional time devoted to the task did not have any effect on accuracy. Swensson discussed at some length the possibility of accounting for this tradeoff by combining the fast-guess model with one or another of the models to be discussed in Chapter 8. At that time, no one allowed for error responses to be faster than correct ones, as is needed for the responses classed as accurate.* This is now possible within the original random walk framework using either Laming's modification or the version developed by Link and Heath (Section 8.3.3). Because nothing seemed available, Swensson urged the simpler deadline model of Nickerson (1969) in which the accurate responses result from a race between a subject-imposed time deadline and an accrual process with some sort of response criterion. The responses that are determined by the deadline, necessarily the slowest ones, are also more likely to be errors simply because the criterion for responding was not satisfied at the time the response was in fact effected. So that model has the qualitatively correct features of errors being slower than correct responses, given that the fast guesses have been excluded.
* Apparently Swensson, like many others at the time, was not yet aware of Laming's (1968) idea of sampling prior to the signal onset in the SPRT random walk model in order to account for this fact.
Wilding (1974) pointed out a number of predictions of the model, among them these three (p. 484):
(1) A subsidiary peak in the latency distribution should occur at the longest latency obtained, due to the deadline responses.
(2) This peak will be relatively larger for errors.
(3) The peak will be greater when the deadline comes earlier.
His distribution data, shown in Figure 6.10 (p. 234), do not reveal such a secondary peak. Moreover, additional predictions of the deadline model were also rejected. The only concern I have is whether the verbal instructions that Wilding used were adequately effective in establishing a deadline at all. Recall that Swensson's data had a mean response time for fast guesses of about 200 msec and a mean time for accurate responses of about 400 msec, whereas Wilding's data (which involved the identification of the location of an 800 msec flash to the right or left of a fixation point) had no responses faster than 450 msec. I suspect that subjects were exhibiting quite different styles of behavior in the two experiments, at least to the extent that Wilding's subjects did not seem to produce any fast guesses.
7.8 CONCLUSIONS
Although it is rare to find an empirical study for which simple two-state mixture models, such as the fast-guess one, actually fit the data accurately, in a number of cases it appears as if something like fast guesses occur. Perhaps the most vivid demonstrations of this are Blough's pigeon data (Figure 6.4, p. 224) and the human studies by Swensson and by Swensson and Edwards in which subjects sustained for prolonged periods either of two quite different patterns of responding. When subjects are put under severe time pressure it is probably advisable for experimenters to examine their data for fast guesses, possibly using Swensson's (1972a) criteria and, if they are interested in testing other models, eliminating those responses from the data analyzed. The fast guesses appear to be just what had been postulated: simple reactions to signal onset that exhibit virtually no accuracy at all. It is less clear what happens when subjects are not guessing. It seems to me that if one attempts to encompass all of the data presented in this chapter within a framework of mixtures of finitely many states of
attention, then there can be no fewer than three states, including one of pure guessing (and quite possibly there are more than three). The argument is as follows. The fact that perfectly identifiable stimuli can, under sufficient time pressure, be incorrectly identified (Yellott, 1971) forces one to at least two states, one of which is a guessing state. Although Yellott's data appeared consistent with the simple fast-guess model, those of Laming (1968), Link (1975), Link and Tindall (1971), and Ollman (1970), all of which also involved easily identified stimuli, made clear that the simple fast-guess model is not sufficient, and certainly one possibility is at least two states of more-or-less accurate responding. Such a model was realized in the striking work of Falmagne et al. (1975) on sequential effects, which suggests that four states of memory leading to mixtures of three different latency distributions (complicated by a response effect, which doubles the number of distributions) are very nearly sufficient to account fully for the data. Such a conclusion is, however, tentative since the model has only been studied intensively for one subject. Next consider the data obtained from perfectly identifiable stimuli and for which few response errors occurred. The lack of error rules out the guessing state playing any role, and the data are not easily understood in terms of a single state. For example, two states seem needed to encompass sequential effects. So, at least two states in addition to the guessing one are needed. If there were just two, then the data histograms should exhibit the fixed-point property. Those of Lupker and Theios (1975) seemed to, but those of Falmagne (1965, 1968) and of Link and Tindall (1971) grossly violated it. If we hold to the simple assumption that each state has a single response-time distribution, then we are forced to conclude that there must be more than two states of perfect discrimination.
The alternative, an example of which was developed in Section 7.5, is that there are different distributions for the two responses in each state. Then, in the case of two states, the fixed-point property would not follow for the overall reaction times but only when they are partitioned according to responses. I am unaware of such plots in the literature.
Data arising from confusable stimuli (Carterette et al., 1965; Green and Luce, 1973; Ollman, 1970; Swensson, 1972a) uniformly reject all versions of the two-state, fast-guess models, and so within the finite-state framework require a minimum of two states of stimulus identification. So far as I can tell, these data do not clearly implicate a third non-guessing state, although the results of Swensson (1972a) and Swensson and Edwards (1971), with their vastly different mean times for accurate responses and with the speed-accuracy tradeoff of the former paper appearing to lie outside the fast-guess model, are easier to understand with more than two non-guessing states.
Thus, to encompass all of the data in a unified framework, we appear to need at least three states, 0, 1, 2, . . . , with reaction-time distributions $G_i$ having means $\nu_i$ with the property that $\nu_i < \nu_{i+1}$, and state identification
probabilities $P_{sr}(i)$ with the following plausible properties:
where state 0 is a guessing state and the slower a state the more accurate the identification. This model, although quite plausible given the data, has not been much pursued (except of course in the various special two- and three-state subcases discussed above) because it has so many parameters: $p(0), p(1), \ldots, p(i), \ldots$; $\nu_0, \nu_1, \ldots$; $\beta_A$; $P_{aA}(i), P_{bB}(i), \ldots$, where $\sum_i p(i) = 1$.
Thus, with three states there are 10 and with four states 14 parameters (some subject to inequalities) to account for just six pieces of independent data per experimental condition. Even assuming that some parameters are invariant under experimental manipulations, the problems of estimation and finding adequate ways to test the model are formidable, as we saw in the work of Falmagne et al. (1975). It appears clear that only by insisting such models account for sequential effects do we have any hope of evaluating them. The primary alternative that has been pursued is to suppose that in addition to a special guessing state, a continuum of accuracy states exists, establishing a continuous tradeoff between speed and accuracy, and that any experimental condition elicits just one of these states. Under that assumption, the observed speed-accuracy tradeoff provides indirect evidence about the underlying tradeoff. For this approach to be successful, considerable care must be taken once again not to become overwhelmed with free parameters. Attempts along these lines are the topics of the next two chapters.
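The parameter counts quoted above (10 for three states, 14 for four) can be reproduced with a short tally. The breakdown used here is an assumption read off the parameter list, not something the text spells out: the mixing probabilities sum to 1, each state has one mean, the guessing state contributes a single bias, and each non-guessing state contributes two identification probabilities. The function name is hypothetical.

```python
def mixture_parameter_count(n_states):
    """Count free parameters of the finite-state mixture model.

    Assumed breakdown (an interpretation of the parameter list in the text):
      - mixing probabilities p(i): n_states - 1, since they sum to 1
      - mean reaction times nu_i:  n_states
      - guessing bias beta_A:      1 (state 0 only)
      - accuracies P_aA(i), P_bB(i): 2 per non-guessing state
    """
    mixing = n_states - 1
    means = n_states
    guessing_bias = 1
    accuracies = 2 * (n_states - 1)
    return mixing + means + guessing_bias + accuracies


for n in (3, 4):
    print(n, "states:", mixture_parameter_count(n), "parameters")
```

Under this reading the count is 4n - 2 for n states, which agrees with both figures in the text and makes plain how quickly the model outruns the six observables per condition.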
8 Stochastic Accumulation of Information in Discrete Time
This chapter and the next one examine several closely related models for possible processes underlying choice reaction time. Except for Section 8.4.2, it is implicitly assumed in all of the models that the subject knows when to begin accumulating information, the only uncertainty being the signal presented, which is to be identified rapidly and accurately. In practice, signal onset is known either because there is an intense signal in another modality that identifies the onset time, or because the signals themselves are sufficiently intense that they alert the subject (perhaps via a change detector) to their presence. However it is done, we usually attempt to insure that we do not have to contend with the complication of information accumulation during periods when the signal is not present—which raises questions about how to deal with signal termination. One possibility is to avoid the problem experimentally by using response-terminated signals; that technique has been used only rarely. Another possibility is simply to assume that information is collected only during the duration of the signal. For the most part, the problem has simply been ignored by model builders. It conceivably can be treated by using time-dependent parameters; however, we are forever in danger of being overwhelmed with free parameters. The major ideas for these accrual processes were presented in Section 4.2.1, and the reader is urged to reread that material because what was said there will be taken for granted here. In this chapter I assume that information about the signal accrues at discrete, brief times as samples of it are "observed" in the nervous system. These internal observations are assumed to arise from a process that is unitary in the sense that it is not possible experimentally to manipulate one observation independently of the others. This assumption is the reason that the models of this chapter (and the next) are treated as having just a single stage rather than many. 
Further, most of these models are classed as optional stopping in the sense that the decision to respond rests upon the observations that have been obtained (Swensson and Thomas, 1974). As in the preceding chapter, we concentrate on a two-choice reaction-time experiment in which just one of two signals, a or b, is presented on each trial. The subject is to try to identify the signal presented rapidly and accurately, attempting to respond A to a and B to b.
8.1 ACCUMULATOR MODELS
In the first two subsections of the chapter (in contrast to the others), information is assumed to be binary. At each instant, the information registers a single count in favor of one of the two responses, A or B. LaBerge (1962), who also admitted the possibility of the information not favoring either response, cast this in terms of sampling stimulus elements in the sense of Estes (1950). Audley (1960) referred to these observations as implicit responses. If signal $s$ is presented, we assume that the probability of a count in favor of A is $p_s$ and in favor of B is $q_s = 1 - p_s$, $s = a$ or $b$. Obviously, we anticipate that $p_a > p_b$. There are two distinct response rules in the literature, which we now explore.
8.1.1 The Simple Accumulator
LaBerge (1962), who worked out this model, called it recruitment theory; later Audley and Pike (1965) called it a simple accumulator, which strikes me as more graphic. There are two constants $k_A$ and $k_B$, and the decision rule is that the system responds whenever either the count favoring A reaches $k_A$ or the count favoring B reaches $k_B$; the response is A in the former case and B in the latter. What we need to compute is the probability that a particular response, say A, occurs after a fixed number of observations. Since it takes $k_A$ observations in favor of A, the least number that can occur is $k_A$. And the most is $k_A + k_B - 1$, since if $k_B$ observations favoring B occur, then that response would be made. So we need to consider $k_A + j$ observations, where $0 \le j \le k_B - 1$. Clearly, the last observation must favor A, and so the $j$ observations favoring B must be distributed among the preceding $k_A + j - 1$ locations, and it is well known that there are
$$\binom{k_A + j - 1}{j}$$
ways for that to happen. So the probability distribution is
$$\Pr(\text{response A at step } k_A + j \mid \text{signal } s) = \binom{k_A + j - 1}{j}\, p_s^{k_A} q_s^{j}.$$
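The counting argument above can be checked numerically. The following is a minimal sketch, not LaBerge's own computation: it tabulates the probability that response A (or, by interchanging roles, response B) occurs at each possible step, then sums to obtain the response probabilities and the conditional mean number of steps. The function names and the parameter values in the usage line are illustrative assumptions.

```python
from math import comb


def accumulator_pmf(p, k_a, k_b):
    """Distribution of the step at which each response occurs.

    p   : probability that one observation favors A (q = 1 - p favors B)
    k_a : count required for response A
    k_b : count required for response B
    Returns two dicts mapping step number -> probability, for A and for B.
    """
    q = 1.0 - p
    # A occurs at step k_a + j: the last observation favors A and the j
    # observations favoring B fall among the first k_a + j - 1 positions.
    pmf_a = {k_a + j: comb(k_a + j - 1, j) * p**k_a * q**j for j in range(k_b)}
    # Same argument with A and B (and p and q) interchanged.
    pmf_b = {k_b + j: comb(k_b + j - 1, j) * q**k_b * p**j for j in range(k_a)}
    return pmf_a, pmf_b


def summarize(p, k_a, k_b):
    """Return P(A), P(B), E(steps | A), E(steps | B)."""
    pmf_a, pmf_b = accumulator_pmf(p, k_a, k_b)
    prob_a = sum(pmf_a.values())
    prob_b = sum(pmf_b.values())
    mean_a = sum(n * pr for n, pr in pmf_a.items()) / prob_a
    mean_b = sum(n * pr for n, pr in pmf_b.items()) / prob_b
    return prob_a, prob_b, mean_a, mean_b


prob_a, prob_b, mean_a, mean_b = summarize(p=0.6, k_a=3, k_b=3)
print(f"P(A) = {prob_a:.4f}, P(B) = {prob_b:.4f}")
print(f"E(J | A) = {mean_a:.3f}, E(J | B) = {mean_b:.3f}")
```

Because the process must stop by step $k_A + k_B - 1$, the two response probabilities sum to 1, which gives a quick sanity check on the combinatorial coefficient.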
In order to calculate the moments of this distribution, it is very convenient to put the distribution in the form of a generating function (see Section 1.4.6),
For some purposes it is convenient to reexpress this in terms of the incomplete beta function:
To do so we use the following known relations among the cumulative negative binomial distribution, the binomial one, and the incomplete beta function (Eq. 8.3) (see, e.g., Beyer, 1966, pp. 202-203):
Keeping in mind that $p_s z$ is not $1 - q_s z$ and substituting this expression for $I$ into the right side of Eq. 8.4a immediately yields the right side of Eq. 8.2, so
By interchanging A and B and $p_s$ and $q_s$,
From these equations it is easy to derive explicit expressions for the response probabilities and the mean number of steps to a response. In carrying out the latter, one uses the following known identity:
We obtain
where $J$ is the random variable representing the number of steps to a decision. Assuming that there is a constant conversion factor $\Delta$ from discrete time to physical time and an additive mean residual time $r_0$, the expression for expected response time is
$$E(T \mid s, r) = r_0 + \Delta\, E(J \mid s, r). \qquad (8.7)$$
Weatherburn (1978) has expressed some doubt about the assumption of a constant conversion factor, but there is little to do with the model if one does not assume it.
Since there are six independent equations and a total of six parameters ($p_a$, $p_b$, $k_A$, $k_B$, $\Delta$, and $r_0$), it is clear that, except for possible degeneracies, the model can always fit the data from one experimental condition. The only effective test, therefore, is to manipulate some experimental variable and observe how the estimated parameters vary with the manipulation. For example, in Section 8.5 we will examine data from four experiments in each of which at least the presentation probability or the payoff matrix was manipulated. Given the apparent meanings of the parameters, we would expect $p_a$, $p_b$, $\Delta$, and $r_0$ to be invariant and $k_A$ and $k_B$ to vary.
8.1.2 The Runs Model
Audley (1960), in a continuous time context, and later Audley and Pike (1965), introduced a variant on the accumulator idea. They postulated that the relevant accumulation toward an alternative is the total number of counts in favor of it following the most recent count favoring the other alternative. That is to say, response A occurs at the end of the first run of $k_A$ implicit observations favoring A that is not preceded by a run of $k_B$ observations favoring B. This makes distant past history irrelevant to the current decision. To calculate the generating function, we may argue as follows. The overall process is a convolution of alternating runs with the following properties. The last run is a sequence of $k_A$ favoring A. Those runs preceding it must not exceed $k_A - 1$ favoring A and $k_B - 1$ favoring B. We will make use of the fact that the generating function of a convolution is the product of the generating functions of the component distributions of the convolution. Letting $p_s$ be the probability of an implicit response favoring A given stimulus $s$, then
characterizes the generating function of all possible lengths of a single run favoring A that does not lead to a response. Similarly,
does the same thing for a run favoring B. Now, prior to the last run favoring A, either there is no run, which has a generating function of 1, or a run favoring B, represented by $\beta_s(z)$, or one of that type preceded by one favoring A, represented by $\alpha_s(z)\beta_s(z)$, and so on. Thus, the overall generating function is
where
and
If we let
then
So,
Similarly,
We now derive the equations for the response probabilities and mean response times:
Taking the derivative of $\ln P_s(z)$ yields
and so
Turning to S'JSS,
Substituting
A similar expression follows for $E(J \mid s, B)$ with A and B interchanged and $p$ and $q$ interchanged. As before, Eq. 8.7 relates $E(T \mid s, r)$ to $E(J \mid s, r)$, and there are six parameters and six equations. Audley and Pike (1965) generalized the model to $m$ alternatives.
8.1.3 The Strength Accumulator
Vickers (1970) proposed a variant on LaBerge's simple accumulator in which magnitudes favoring the two alternatives are accumulated as follows. Suppose that at each instance of time $i$ during which the signal $s$ is present a Gaussian random variable $X_i(s)$ with mean $\mu_s$ and variance $\sigma_s^2$ is observed. We anticipate $\mu_a > 0$ and $\mu_b < 0$. Suppose further that the $X_i(s)$ are independent random variables. If $X_i(s) > 0$, this is interpreted as evidence favoring the presence of $a$, and that has the probability $p_s = \int_0^\infty N(x; \mu_s, \sigma_s)\,dx$, where $N(x; \mu_s, \sigma_s)$ denotes the Gaussian density. Similarly, $X_i(s) < 0$ is evidence favoring $b$. However, unlike LaBerge's model, which registered a count one way or the other, Vickers suggested that the evidence is the more persuasive the further it is from 0, and so the total amount of evidence is tallied for each alternative:
where $\mathcal{A} = \{i \mid X_i(s) > 0\}$ and $\mathcal{B} = \{i \mid X_i(s) < 0\}$. Let
which are, of course, completely determined by $\mu_s$ and $\sigma_s$. The response criteria are assumed to be of the form
and the response rule is to keep a running tally of $K_A - T_A$ and $K_B + T_B$ and respond according to whichever first becomes negative. McCarthy (1947) is quoted as showing that the expected total number of instances prior to a response is
where $K_{sr} = K_r/m_{sr}$ and $I_p$ is the incomplete beta function. No other analytic results seem to be known; in particular, it has not been possible to derive the four mean times separately.* Assuming a symmetric case,
FIG. 8.1 For the accumulator model, the relation between MRT and response probability computed from simulations of 500 runs for each condition (10 values of the criterion C and 20 values of $\sigma/\mu$). [Figure 3 of Vickers (1970); copyright 1970; reprinted by permission.]
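Since few analytic results are available for the strength accumulator, simulation is the natural tool, as in Figure 8.1. The sketch below is an illustrative reading of the response rule described above, not Vickers' program: positive observations are added to a tally for A, negative ones to a tally for B, and a response occurs as soon as $K_A - T_A$ or $K_B + T_B$ becomes negative. The function names, parameter values, trial count, and seed are all assumptions chosen for the example.

```python
import random


def strength_accumulator_trial(mu, sigma, k_a, k_b, rng):
    """Simulate one trial; return (response, number of observations)."""
    total_a = 0.0  # T_A: running sum of the positive observations
    total_b = 0.0  # T_B: running sum of the negative observations (<= 0)
    steps = 0
    while True:
        steps += 1
        x = rng.gauss(mu, sigma)
        if x > 0:
            total_a += x
            if k_a - total_a < 0:  # criterion for A exhausted
                return "A", steps
        else:
            total_b += x
            if k_b + total_b < 0:  # criterion for B exhausted
                return "B", steps


def estimate_strength(mu, sigma, k_a, k_b, trials=2000, seed=1):
    """Estimate P(A) and the mean number of observations by simulation."""
    rng = random.Random(seed)
    outcomes = [strength_accumulator_trial(mu, sigma, k_a, k_b, rng)
                for _ in range(trials)]
    prob_a = sum(resp == "A" for resp, _ in outcomes) / trials
    mean_steps = sum(n for _, n in outcomes) / trials
    return prob_a, mean_steps


# Hypothetical symmetric criteria with a signal that favors A (mu > 0).
prob_a_strength, mean_steps_strength = estimate_strength(
    mu=0.5, sigma=1.0, k_a=5.0, k_b=5.0)
print(f"P(A) ~ {prob_a_strength:.3f}, mean observations ~ {mean_steps_strength:.1f}")
```

Only one tally changes on any observation, so at most one criterion can be crossed at a time and no tie-breaking rule is needed.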
8.2 RANDOM WALKS WITH BOUNDARIES
Assuming discrete counts favoring either A or B, another natural model to consider is one in which decisions are based upon the difference in counts favoring A and favoring B. If we denote that difference on the $i$th observation by $N_i$, the natural decision rule is to respond A if $N_i \ge k_A$ occurs before $N_i \le -k_B$ and to respond B the other way round. This process is the simplest example of a random walk, and its analysis will be subsumed under a rather more general model I now describe.
8.2.1 The Model
Assume that at each discrete time $i$ a new piece of information, $Y_i$, accrues; moreover, assume that these random variables are independent and identically distributed. Let their common density be denoted $f_a$ when signal $a$ is presented, $f_b$ when $b$ is presented. Suppose further that there is a monotonic transformation $h$ that takes $Y_i$ into $X_i = h(Y_i)$ and that decisions are based upon the partial sums
$$S_n = \sum_{i=1}^{n} X_i.$$
The stochastic process $\{S_n\}$ is called a random walk on the line (see Section 1.5.2). Assume that the random variables $X_i$ are such that the walk tends in one direction, say to the right, when $a$ is presented and in the opposite direction when $b$ is presented. Introducing a symbol for the mean of $X_i$, the assumption is
$$\mu_a = E(X_i \mid a) > 0 > \mu_b = E(X_i \mid b).$$
The decision rule for the process is as follows. Place the origin of the line at the initial point of the process. The process is assumed to terminate when enough evidence has accumulated to provide good reason to believe it is tending in one direction or the other. Obviously, the direction is ambiguous only when $X_i$ can take on both positive and negative values. The simplest way to decide about the tendency is to establish criteria both to the right and left of 0 such that when $S_n$ crosses a criterion, a response is made, namely, A when the right one is crossed and B when the left one is. There will be no confusion in letting A and B have double meanings, both as the name of a response and as the absolute numerical value of the corresponding criterion. So the decision rule at the $n$th observation is:
$$\text{if } S_n \ge A, \text{ respond A}; \qquad \text{if } S_n \le -B, \text{ respond B}; \qquad \text{if } -B < S_n < A, \text{ take another observation}.$$
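The decision rule just described is easy to state as a loop. The following is a minimal simulation sketch, assuming Gaussian steps purely for illustration (the model itself leaves the distribution of $X_i$ open at this point); the function names, parameter values, trial count, and seed are hypothetical.

```python
import random


def random_walk_trial(mu, sigma, upper, lower, rng):
    """Run one trial of the walk; return (response, number of observations).

    Steps X_i are Gaussian with mean mu and s.d. sigma (an illustrative choice).
    upper is the criterion A and lower is the criterion B, both positive.
    """
    s, n = 0.0, 0
    while -lower < s < upper:  # keep sampling while inside (-B, A)
        s += rng.gauss(mu, sigma)
        n += 1
    return ("A" if s >= upper else "B"), n


def estimate_walk(mu, sigma, upper, lower, trials=5000, seed=0):
    """Estimate P(A) and the mean number of observations to a response."""
    rng = random.Random(seed)
    results = [random_walk_trial(mu, sigma, upper, lower, rng)
               for _ in range(trials)]
    prob_a = sum(resp == "A" for resp, _ in results) / trials
    mean_n = sum(n for _, n in results) / trials
    return prob_a, mean_n


walk_prob_a, walk_mean_n = estimate_walk(mu=1.0, sigma=8.0, upper=35.0, lower=35.0)
print(f"P(A) ~ {walk_prob_a:.2f}, mean number of observations ~ {walk_mean_n:.1f}")
```

Moving either criterion toward 0 in this sketch shortens the trials and raises the error rate, which is the speed-accuracy relation the text goes on to analyze.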
The theoretical problem is to understand the distribution of times at which a response occurs and the probability of that happening. Observe that by moving a boundary toward 0, the corresponding responses will tend to speed up and become more error prone. Obviously, the problem just formulated is very general and of much broader interest than just choice reaction time. The literature on the subject began with Wald's (1947) original and seminal book, Sequential Analysis, which was motivated largely by the problem of reducing sample sizes when it is very expensive to collect data. Although there is considerable statistical
literature on the subject, a tiny fraction of which I shall reference (for a reasonably recent treatment, see Wetherill, 1975), most of the reaction-time work is focused on two restrictive assumptions about the distribution of $X_i$. The reason is that it appears to be very difficult to get usable results except under these assumptions. And even then, the attack on the problem requires some indirection.
8.2.2 The Assumption of Small Steps and Wald's Identity
It is not immediately obvious whether or not the process just defined always ends in a response; it is conceivable that there is a positive probability that for all integers $n$ the condition $-B < S_n < A$ will hold, in which case no response would be made. Were that to happen, we would need to take it into account. As a matter of fact, it cannot, as is proved in Theorem 8.1 (see Section 8.2.6). We turn now to the major device used to analyze such processes. I do not know how to motivate the approach intuitively; suffice to say that it works. Let the mgf of $X$ given signal $s$ be denoted
$$M_s(\theta) = E(e^{\theta X} \mid s).$$
Observe that for a fixed integer $n$, Eq. 1.44 together with the independence of the $X_i$ leads to
$$E(e^{\theta S_n} \mid s) = M_s(\theta)^n,$$
which can be rewritten as
$$E[e^{\theta S_n} M_s(\theta)^{-n} \mid s] = 1.$$
It does not follow from this observation that the same equality holds generally when $n$ is replaced by an integer-valued random variable. (The reason is that with infinite sums and integrals, an argument is needed to justify interchanging them.) However, what Wald's identity asserts is that it does in fact hold for a broad class of distributions for $X$ when $n$ is replaced by the trial number at which a criterion is crossed. This fact is stated as a formal theorem, which is proved in Section 8.2.6.
Theorem 8.2 (Wald, 1947, p. 159). In a random walk on the line of the type described above, let $N$ denote the least integer such that either $S_N \le -B$ or $S_N \ge A$. Then for all $\theta$ for which $|M_s(\theta)| \ge 1$,
$$E[e^{\theta S_N} M_s(\theta)^{-N} \mid s] = 1, \qquad (8.14)$$
where the expectation is the joint one over the $X_i$ and $N$.
Equation 8.14 is Wald's identity. It becomes of value only if we are willing to assume that the mean and standard deviation of $X_i$ are very small relative to $A$ and to $B$, in which case the value of $S_N$ is approximately either
$-B$ or $A$ at the time the criterion is satisfied. We state this formally:
The Assumption of Small Steps: The value of $S_N$ is either $-B$ or $A$.
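The difference between Wald's identity, which is exact, and the small-step assumption, which is an approximation, can be seen in a short Monte Carlo sketch. It assumes Gaussian steps, for which the mgf $\exp(\mu\theta + \sigma^2\theta^2/2)$ equals 1 at $\theta = -2\mu/\sigma^2$, so that the identity reduces to $E[e^{\theta S_N}] = 1$ at that value. The names, parameter values, trial count, and seed are illustrative assumptions.

```python
import math
import random


def crossing_values(mu, sigma, upper, lower, trials, seed=0):
    """Return the value of S_N at the moment of crossing, for each trial."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        s = 0.0
        while -lower < s < upper:
            s += rng.gauss(mu, sigma)
        finals.append(s)
    return finals


mu, sigma, upper, lower = 1.0, 8.0, 35.0, 35.0
theta_1 = -2.0 * mu / sigma**2  # nonzero root of M(theta) = 1 for Gaussian steps

finals = crossing_values(mu, sigma, upper, lower, trials=20000)

# Wald's identity at theta_1: the average of exp(theta_1 * S_N) should be near 1.
identity_estimate = sum(math.exp(theta_1 * s) for s in finals) / len(finals)

# Overshoot: how far S_N lies beyond the criterion that was crossed.
mean_overshoot = sum((s - upper) if s >= upper else (-lower - s)
                     for s in finals) / len(finals)

print(f"estimate of E[exp(theta_1 * S_N)] = {identity_estimate:.3f}")
print(f"mean overshoot beyond the criterion = {mean_overshoot:.2f}")
```

In this sketch the identity is respected even though $S_N$ overshoots the criteria on every trial; it is only the replacement of $S_N$ by exactly $A$ or $-B$ that introduces error, and the size of the overshoot relative to $A$ and $B$ indicates how serious that error is.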
In the context of sequential sampling, where the error probabilities are usually set at 0.05 or 0.01 or even smaller, this assumption is not unreasonable. But those importing the random walk model to the study of reaction times seem to have been unaware that the circumstances under which the assumption holds and at the same time leads to typical error rates, say in the range 0.1 to 0.3, are somewhat unusual. For the Gaussian case, this is explored more fully in Section 8.2.4. If one thinks of the size of the steps as becoming vanishingly small and the number of them infinitely large, the discrete case passes into a continuous one. We deal with that in the next chapter. Under the assumption of small steps, we may immediately rewrite Eq. 8.14 as:
$$P(A \mid s)\, e^{\theta A} E[M_s(\theta)^{-N} \mid s, A] + P(B \mid s)\, e^{-\theta B} E[M_s(\theta)^{-N} \mid s, B] = 1 \qquad (8.15)$$
for $s = a, b$, where $P(A \mid s)$ and $P(B \mid s)$ are, respectively, the probabilities of an A and B response occurring (i.e., the probability of the corresponding criterion being crossed) when signal $s$ is presented. We make use of Eq. 8.15 in two ways. First, we search for those values of $\theta$ for which $M_s(\theta) = 1$, in which case both expectations become 1 and we thus obtain an equation involving only $P(A \mid s)$, $P(B \mid s)$, $A$, and $B$. Obviously, $\theta = 0$ leads to $M_s(0) = 1$, but that only results in the triviality (given Theorem 8.1) that $P(A \mid s) + P(B \mid s) = 1$. The important question concerns values of $\theta \ne 0$ for which $M_s(\theta) = 1$. Once those are found, then $P(A \mid s)$ is determined explicitly as a function of the parameters of the model. Second, we can then substitute those values for $P(A \mid s)$ in Eq. 8.15 in order to study $E[M_s(\theta)^{-N} \mid s, r]$, $s = a, b$, $r = A, B$. If we think of $1/M_s(\theta) = z$ as a variable, then we are studying $E[z^N \mid s, r]$, which is the generating function of the random variable $N$. In principle it tells us the distribution of $N$ and, in practice, it permits us to compute the mean and variance of $N$ fairly easily.
8.2.3 Response Probabilities and Mean Decision Latencies
If we assume that the distribution of $X_i$ is so restricted that the mgf satisfies
1. $\lim_{\theta \to \pm\infty} M_s(\theta) > 1$, and
2. $M_s'(0) \ne 0$,
then it can be shown (Section 8.2.6) that there is a unique value of $\theta$ distinct from 0, call it $\theta_{1s}$, such that $M_s(\theta_{1s}) = 1$. These assumptions must be verified each time we make a specific choice for the distribution of $X_i$ or the existence of $\theta_{1s}$ must be shown directly, which is often easy to do. Assumption 1 holds if the random variable $X_i$ straddles the origin and $M_s(\theta)$ exists
for all $\theta$. Thus, any entirely positive or entirely negative random variable necessarily violates the first condition. For the special case that the transformation $h$ is the log of the probability ratio (see Section 8.3.1), the second condition is equivalent to the distributions corresponding to the two signals not being identical. Inserting $\theta_{1s}$ into Eq. 8.15 and taking into account that $P(A \mid s) + P(B \mid s) = 1$ yields
$$P(A \mid s)\, e^{\theta_{1s} A} + [1 - P(A \mid s)]\, e^{-\theta_{1s} B} = 1.$$
Setting
we obtain for the two response probabilities
The computation of the mean decision latency is based upon the following property of generating functions
From this we show in Section 8.2.6 that
where $\gamma_s$ is a parameter of the distribution of $X$ defined by
At this point we have a total of six equations, one each for $P(A \mid a)$, $P(A \mid b)$, $L_{aA}$, $L_{bA}$, $L_{aB}$, and $L_{bB}$, and eight parameters: $A$, $B$, $\theta_{1a}$, $\theta_{1b}$, $\mu_a$, $\mu_b$, $\gamma_a$, and $\gamma_b$. Despite the fact that the unit of the underlying scale of the
process is arbitrary, we cannot reduce the number of parameters because the relation between physical time and distance in the random walk is unspecified. As that leaves us with two more parameters than equations, it is tempting to derive more equations from the moment generating function. For example, we can calculate four more equations for the variance of the response times. Unfortunately, that adds four more parameters: the variances of $X_i$ for $a$ and $b$ and two values analogous to $\gamma_s$. Even though the model is not completely determined, there are nevertheless constraints imposed by it and, in principle, these can be tested. Unfortunately, none of any use have been derived. An example of the difficulty may be instructive. From Eq. 8.15 it can be shown that the overall mean decision latency time and the response probabilities satisfy
$$L_s = \frac{A}{\mu_s}\, P(A \mid s) - \frac{B}{\mu_s}\, P(B \mid s). \qquad (8.20)$$
Link (1978, Eq. 5) derived this relation for the special case of symmetric stimulus representation (Section 8.3.3), but it holds generally (see Section 8.2.6, Eq. 8.25a). Equation 8.20 would be of use if we could perform an experiment in which $L_s$ and $P(A \mid s)$ are manipulated while holding $A/\mu_s$ and $B/\mu_s$ fixed, in which case we could test for a linear relation between them. Observe from Eqs. 8.17 and 8.18 that this is possible only if $\theta_{1s}$ is varied while holding everything else fixed. For all practical purposes, this entails finding a manipulation that affects the variance of $X_i$ without affecting either its mean or the boundaries $A$ and $B$. No one has had an idea about how to do that. So our only recourse is to restrict the model further, which means the imposition of some assumptions on the distribution of $X_i$. One possibility is to limit attention to a specific parametric family of distributions, which is done in Section 8.2.4. Another, more useful, approach is to limit attention to broad classes of distributions that lead to tractable results. In Section 8.3.1 we study the case where the two moment generating functions are translations of one another (Theorem 8.3) and in Section 8.3.3 where they are reflections of each other (Theorem 8.5).
8.2.4 Gaussian Steps
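The most obvious tack, although probably not the best, is to limit one's self to very specific, well-known families of distributions. For example, if we assume that $X_i$ has a Gaussian distribution with mean $\mu_s$ and variance $\sigma_s^2$, then since (Eq. 1.50)
$$M_s(\theta) = \exp\left(\mu_s \theta + \tfrac{1}{2}\sigma_s^2 \theta^2\right),$$
it follows that
$$\theta_{1s} = -2\mu_s/\sigma_s^2.$$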
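For distributions other than the Gaussian the nonzero root of $M_s(\theta) = 1$ usually has to be found numerically. The sketch below illustrates the idea with plain bisection and uses the Gaussian mgf only as a check, since its root is known in closed form. It then evaluates the small-step response probability obtained by setting both expectations in Eq. 8.15 to 1. The function names, the bracketing interval, and the parameter values are assumptions made for the example.

```python
import math


def gaussian_mgf(theta, mu, sigma):
    """Moment generating function of a Gaussian with mean mu and s.d. sigma."""
    return math.exp(mu * theta + 0.5 * sigma**2 * theta**2)


def nonzero_root(mgf, lo, hi, tol=1e-12):
    """Bisection for mgf(theta) = 1 on [lo, hi].

    The interval must bracket the nonzero root and exclude theta = 0.
    """
    f_lo, f_hi = mgf(lo) - 1.0, mgf(hi) - 1.0
    if f_lo * f_hi > 0:
        raise ValueError("interval does not bracket a root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = mgf(mid) - 1.0
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def prob_upper(theta_1, upper, lower):
    """P(A | s) under the small-step assumption, from Eq. 8.15 at theta_1."""
    return ((1.0 - math.exp(-theta_1 * lower))
            / (math.exp(theta_1 * upper) - math.exp(-theta_1 * lower)))


mu, sigma = 1.0, 8.0
# For mu > 0 the mgf has its minimum at -mu / sigma**2, so the nonzero root
# lies to the left of that point; [-1, -mu / sigma**2] brackets it here.
root = nonzero_root(lambda t: gaussian_mgf(t, mu, sigma), -1.0, -mu / sigma**2)
print(f"numerical root = {root:.6f}, closed form = {-2 * mu / sigma**2:.6f}")
print(f"P(A | s) with A = B = 35: {prob_upper(root, 35.0, 35.0):.3f}")
```

With $A = B$ the formula collapses to a logistic function of $-\theta_{1s}A$, which is why equal criteria one or two units of $-\theta_{1s}A$ from the origin give probabilities near .73 and .88.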
The model now has five free parameters, and so in principle can be tested. No attempt to do so exists in the literature, probably, in part, because two other classes of models have attracted a good deal of attention and, in part, because one would have to explore a five-dimensional parameter space by Monte Carlo methods, which used to be expensive and difficult, although that is rapidly becoming a non-issue. We study the other two classes of models in Section 8.3. Heath (1981) noted the following non-parametric test of the Gaussian model. From Eqs. 8.18a and b, we see
And from Eqs. 8.17a and b,
Assuming that $\theta_{1s} = -2\mu_s/\sigma_s^2$ is small, and so
yields
This says that in the Gaussian model with
To simplify, suppose $A = B$, then $\alpha_s = \beta_s$ and so by Eq. 8.17a,
For $\alpha_s = 1$, $P(A \mid s) = .731$; for $\alpha_s = 2$, $P(A \mid s) = .881$. This is the normal range of probabilities, but in the latter case it means that $A = \sigma_s^2/\mu_s$. If we let $k = \sigma_s/\mu_s$, then $A = k\sigma_s$
8.2.5 Distributions of Decision Latencies
At a theoretical level, the title of this subsection is misleading because explicit expressions for the distributions are not arrived at; rather, I derive their characteristic functions. In the process use is made of the Wald identity which, as an examination of the proof in Section 8.2.6 shows, holds for characteristic as well as moment generating functions. Recall that the characteristic function of $N$ is $E(e^{i\omega N})$. To do this, Wald proceeded as follows. Search for those values of $\theta$ such that
which causes Eq. 8.15 to become an identity in conditional characteristic functions. Assuming $M_s'(\theta) \ne 0$ for $\theta = 0$ and $\theta_{1s}$, then we know that for sufficiently small $|\theta|$, there are just two roots $\theta_{0s}(\omega)$ and $\theta_{1s}(\omega)$ to Eq. 8.21 such that
Substituting these into Eq. 8.15 and, for simplicity, dropping the conditioning on $s$ throughout,
Solving,
Suppose that X is Gaussian distributed, then Eq. 8.21 becomes
and the two solutions are easily seen to be
On substitution into Eqs. 8.22a and 8.22b we have the characteristic functions for the reaction times. To my knowledge, these have not been inverted in general.
For the special case where $B$ approaches $\infty$ (simple reaction times), observe that since
we know that for small $|\theta|$, the real part of $\theta_1(\omega)$
So
This can be shown (Wald, 1947, p. 193, or Johnson and Kotz, 1970, Vol. 1, p. 139) to be the characteristic function of the Wald distribution:
The reaction-time distribution is rather more sensitive to the exact form of the distribution of $X$ than one might have anticipated. For example, suppose the distribution of $X$ is the gamma $\lambda^m x^{m-1} e^{-\lambda x}/(m - 1)!$. This arises when $X$ is the sum of $m$ independent, identical exponential random variables, which it is if $X$ is the sum of $m$ interarrival times of a Poisson process. This fact suggests that for large $m$, the reaction-time distribution should be much like the Wald since by the central limit theorem the gamma approaches the Gaussian. We show that this conjecture is not so. The distribution of crossing times and distances are the concatenation of $n$ gamma distributions with shape parameter $m$ and, therefore, are gamma distributed with shape parameter $nm$ when $N = n$. As is well known, to achieve a fixed distance, $A$ in the case $B = \infty$, the distribution of the number of steps, $N_m$, is necessarily Poisson. Going to a continuous generalization,
where $g(\lambda A)$ is simply a normalizing factor and $\Gamma$ is the gamma function, we see
where $\psi$ is the digamma function, and it is known to be increasing. Thus, by Theorem 1.1, the hazard function of $T$ is increasing, which of course is not true of the Wald distribution.
TABLE 8.1. Comparison of theoretical and simulated (50,000 trials) observables in the Gaussian random walk model

                                                      MRT
Parameter values              P(A|a)   P(B|b)    aA      bA      aB      bB
A = B = 500, σ = 25, μ = 1
    Theory                     .83      .83      332     332     332     332
    Simulated                  .84      .84      340     340     334     345
A = B = 35, σ = 8, μ = 1
    Theory                     .75      .75      17.4    17.4    17.4    17.4
    Simulated                  .77      .77      22.1    21.7    22.1    21.8
A = 50, B = 25, σ = 8, μ = 1
    Theory                     .80      .87      23.7    14.4    23.7    14.4
    Simulated                  .65      .88      28.9    18.4    28.6    18.3
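The theoretical rows of Table 8.1 can be approximated from standard small-step closed forms for a Gaussian walk between two absorbing criteria. These are the usual two-barrier expressions, assumed here to be equivalent to the response-probability and mean-latency equations of Section 8.2.3 in the Gaussian case; they are not copied from the text. The function name is hypothetical, and the drift is taken toward the criterion called `upper`.

```python
import math


def small_step_predictions(upper, lower, mu, sigma):
    """Small-step predictions for a Gaussian walk with drift mu > 0.

    upper : distance to the criterion in the direction of the drift
    lower : distance to the opposite criterion
    Returns (P(upper), mean steps given upper, mean steps given lower).
    """
    c = mu / sigma**2
    span = upper + lower

    def coth(x):
        return 1.0 / math.tanh(x)

    p_upper = (1.0 - math.exp(-2.0 * c * lower)) / (1.0 - math.exp(-2.0 * c * span))
    mean_upper = (span * coth(c * span) - lower * coth(c * lower)) / mu
    mean_lower = (span * coth(c * span) - upper * coth(c * upper)) / mu
    return p_upper, mean_upper, mean_lower


# Signal a drifts toward A, so upper = A and lower = B.
row1 = small_step_predictions(500.0, 500.0, 1.0, 25.0)
row2 = small_step_predictions(35.0, 35.0, 1.0, 8.0)
row3_a = small_step_predictions(50.0, 25.0, 1.0, 8.0)
# Signal b drifts toward B, so the roles of the criteria are exchanged.
row3_b = small_step_predictions(25.0, 50.0, 1.0, 8.0)

for label, (p, m_up, m_low) in [("A = B = 500", row1), ("A = B = 35", row2),
                                ("A = 50, B = 25, signal a", row3_a),
                                ("A = 50, B = 25, signal b", row3_b)]:
    print(f"{label}: P(correct) = {p:.2f}, "
          f"mean steps toward drift = {m_up:.1f}, away = {m_low:.1f}")
```

Under these formulas the conditional mean latency depends only on which criterion is reached (about 23.7 for A and 14.4 for B in the asymmetric case), not on the signal. The same formulas give roughly .60 for $P(A \mid a)$ in the asymmetric case, whereas the entry in this copy of the table reads .80; that digit may be a transcription slip, but the sketch cannot settle the matter.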
Apparently, therefore, the observed reaction-time distribution is quite sensitive to the exact nature of the decay in the tail of the distribution of $X$. The faster decay of $\exp(-x^2)$ as compared with $\exp(-x)$ is reflected in a peaked as compared with an increasing hazard function for the reaction-time distribution. Since, to my knowledge, the form of the reaction-time distribution is not known in the choice case for any particular assumption about the distribution of $X$, I had the Gaussian process simulated.* Figure 8.2 gives some examples of the simulated density and hazard functions, based upon samples of 50,000 trials. The main thing to observe is that qualitatively they are of the same general shape as the Wald distribution with hazard functions that asymptote to a constant and that may or may not have a peak. Perhaps more important is the observation that for these simulations, and many others we have done, the estimated mean reaction times are substantially larger than the theoretical ones. Table 8.1 gives some examples. Presumably this is due to a failure of the small step assumption. This fact raises serious questions about attempts to fit random walk models to data (see Section 8.5).
*8.2.6 Proofs
Theorem 8.1 (Wald, 1947, p. 157). With probability 1, for some $n$, $S_n \ge A$ or $S_n \le -B$.
Proof. Let $X_1, X_2, \ldots, X_n, \ldots$ be the sequence of independent, identical random variables. Let $k$ be an integer, and partition the sequence into successive subsequences of length $k$. Let $S_j^{(k)}$ be the sum of the $j$th such subsequence. If the process does not terminate in a boundary crossing, then for every $j$, $|S_j^{(k)}| < A + B$. Let $P$ be the probability that $|S_j^{(k)}| < A + B$.
FIG. 8.2 Density and hazard functions for the Gaussian case of the random walk computed from simulations of 50,000 runs with $\mu = 1$ and $\sigma = 8$. Parts a and b are, respectively, the density and hazard functions for the symmetric case $A = B = 35$; c and d are $A = 50$, $B = 25$.
But we may select $k$ sufficiently large to insure $P < 1$ because $E(X_n) \ne 0$ and so $|E[S_j^{(k)}]|$ can be made arbitrarily large.
Theorem 8.2 (Wald, 1947, p. 159). Let $N$ be the least integer such that either $S_N \le -B$ or $S_N \ge A$. Then for all $\theta$ for which $|M_s(\theta)| \ge 1$,
$$E[e^{\theta S_N} M_s(\theta)^{-N} \mid s] = 1.$$
Proof. Let $n$ be any integer and let $P_n$ denote the probability that a barrier is crossed on or before $n$. Since $S_n$ is the sum of identical and independent random variables,
Let $E_n$ denote the expectation operator conditional on the crossing occurring on or before $n$ and $E_n^*$ the operator conditional on the crossing after $n$. Then,
Since for each value of $N$, $S_n - S_N$ is independent of $S_N$,
Substituting and dividing by $M_s(\theta)^n$,
By Theorem 8.1, $\lim_{n \to \infty} P_n = 1$. Because $E_n^*$ is conditional on crossing after $n$, $|E_n^*[e^{\theta S_n} \mid s]|$ is bounded by $e^{A\theta} + e^{-B\theta}$. And by hypothesis, $|M_s(\theta)| \ge 1$. Thus,
Lemma 8.1. If (1) $\lim_{\theta \to \pm\infty} M_s(\theta) > 1$ and (2) $M_s'(0) \ne 0$, then there exists a unique $\theta_{1s} \ne 0$ such that $M_s(\theta_{1s}) = 1$.
Proof. Observe that
hence there is only one value of $\theta$, say $\theta_{0s}$, such that $M_s'(\theta_{0s}) = 0$. By hypotheses 1 and 2, $\theta_{0s}$ must be a minimum of $M_s(\theta)$. By hypothesis 2, $\theta_{0s} \ne 0$, so $M_s(\theta_{0s}) < M_s(0) = 1$. Thus, there exists a unique $\theta_{1s} \ne 0$ with $M_s(\theta_{1s}) = 1$.
Proof of Eqs. 8.18a and 8.18b. Take the derivative of Eq. 8.15 with
To evaluate $d\theta/dz$, observe
whence
Since
we see
Define $\gamma_s$ by
Using Eqs. 8.24 and 8.19 and setting $\theta = 0$, then $\theta = \theta_{1s}$ in Eq. 8.23 yields
where
Solving for $L_{sA}$ and $L_{sB}$ yields Eqs. 8.18a and 8.18b. Note that Eq. 8.25a is the same as Eq. 8.20.
8.3 RESTRICTIONS ON THE RANDOM WALK MODEL
8.3.1 The Classical Optimum Model: Sequential Probability Ratio Test (SPRT)
If the observed random variables underlying the decision process (call them $Y_i$, $i = 1, 2, \ldots$) have the common distributions $g_a$ and $g_b$, depending on which signal is presented, then by definition the likelihood ratio that $a$ rather than $b$ was presented, given that $Y$ was observed, is $g_a(Y)/g_b(Y)$. Under repeated, independent observations, the overall likelihood ratio is the random variable
$$\prod_{i=1}^{n} \frac{g_a(Y_i)}{g_b(Y_i)}.$$
Taking the logarithm leads to a sum of independent and identically distributed random variables, and that suggests we define X_i = h(Y_i) (Eq. 8.26), where h = ln(g_a/g_b). This is the model Wald (1947) developed and that Stone (1960) suggested as a choice reaction-time model. Of course, X_i has one distribution, f_a, when a is presented and another, f_b, when b is presented. The relation between these two distributions is indirectly, but adequately, captured in the next result, which is Swensson and Green's (1977) adaptation of a result in Thomas (1975).
Theorem 8.3. Equation 8.26 holds if and only if M_b(θ) = M_a(θ − 1).
We use this result shortly to see how the general random walk model is constrained by the assumption that decisions are based upon likelihood ratios. As one can see, using products of likelihood ratios to generate a decision is exactly the same method of updating as in Bayes' Theorem (Section 1.3.2). Moreover, it generalizes the normative theory of signal detectability (Green & Swets, 1966, 1974); in that theory there is just one observation and a single criterion with no region of indeterminacy. Wald and Wolfowitz (1948) showed that this sampling procedure is optimal in the following sense: if the error probabilities are fixed, then the number of observations before a decision is reached is, on average, no greater than it would be for any other decision rule. Of course, there is no assurance whatsoever that the brain is optimal in this sense, and as we shall see there is ample reason to believe it is not.
Theorem 8.4. Under the conditions of Theorem 8.3,
Of these properties, the most striking is the last. It says that the distribution of response times depends on the response made, but, surprisingly, it is quite independent of the signal presented. Assuming the model to be correct, the testable prediction that response times are independent of the signal presented follows only if the distribution of residual times also does not depend upon the signal. The next most striking fact, embodied in Eqs. 8.28 and 8.29, is that the boundaries A and B are determined entirely by the response probabilities, independent of the response times. This simplifies appreciably the estimation problem, which we discuss in Section 8.3.2. This last fact, together with Eq. 8.20, yields the relation embodied in Eq. 8.30. If one assumes there is an experimental manipulation that does not affect μ_s, but does vary the response times and probabilities, then Eq. 8.30 formulates a speed-accuracy tradeoff function for the model. For some purposes, it may be useful to know more about the response-time distribution than the mean. For the SPRT model, Ghosh (1969) has developed approximate expressions for the first four moments; I do not reproduce them here as they are quite lengthy.
8.3.2 On Testing SPRT
Let us begin by considering the parameters to be estimated. In SPRT the unit of the underlying decision axis has already been selected; it is determined by Eq. 8.26. If one wishes to change the unit (to assign to the likelihood ratio of 1 a value different from 1), it is necessary to multiply Eq. 8.26 by a constant k, which in turn will appear in Theorem 8.3 as M_b(θ) = M_a(θ − k) and will also appear in some of the expressions of Theorem 8.4. So that degree of freedom is lost. That leaves us with four model parameters: A, B, μ_a, μ_b. As we noted, A and B are determined uniquely by, and uniquely determine, the response probabilities. So we turn to the latency equations. There are two complications. First, the model describes the mean number of discrete steps prior to a decision, and that must be converted into real time. Normally, one assumes there is a constant factor, Δ, that converts steps into time, but as Weatherburn (1978) has pointed out, the constancy of the conversion is not necessarily correct. Because no one really knows what to do in the more general case, I shall assume that the conversion is independent of the number of steps. As a
result, the equations for the mean reaction times will have the factor Δ/μ_s to be estimated, and we will be unable to identify Δ and μ_s separately. The second complication is the existence of residual times for which we have no theory. If we assume them to be the same, independent of signal and response, then we obtain from Eqs. 8.18a and 8.18b the following two equations in the three parameters Δ/μ_a, Δ/μ_b, and r_0:
where
In practice few attempts have been made to estimate these parameters because most data grossly violate the condition that as we have seen in Section 6.3.3. This means that this model, including the assumption that the subject starts processing information at signal onset (Section 8.4.2), can be saved only by making the residual a function of either the signal or the response or both. Once that is done, the model is indeterminate unless one varies conditions and assumes that some of the parameters are invariant under those changes. For example, suppose that the presentation probabilities are varied and it is assumed that this affects A and B, but not the two stimulus parameters μ_s. If, in addition, one were to assume the residuals depend on either the signal or the response but not both, then if there are m different conditions of presentation probabilities, there are a total of four parameters to estimate from 2m equations. Note, however, that this strategy will not succeed if the residuals are also assumed to depend upon the presentation probabilities.
8.3.3 Symmetric Stimulus Representation (SSR)
Because the data clearly reject SPRT with constant residuals, other specializations of the general random walk model are clearly of interest. Link
(1975) and Link and Heath (1975) have made the only other suggestion, which Laming (1979c, p. 433) has concluded may well be the only other tractable alternative. The suggestion, once again, is that the two densities of X are very closely related. In SSR it is assumed that the tendency to go in the A direction when a is presented is exactly the same as the tendency to go in the B direction when b is presented.* Stated formally, for all real x,
It is trivial to see that:
Theorem 8.5. Equation 8.35 holds if and only if M_a(θ) = M_b(−θ).
Theorem 8.6. Under the conditions of Theorem 8.5,
If we set θ_1a = −1, which determines the unit of the decision axis, then Eqs. 8.37 and 8.38 become Eqs. 8.28 and 8.29 of SPRT, and A and B are determined by the response probabilities. Moreover, the speed-accuracy Eq. 8.20 then becomes exactly that of SPRT, Eq. 8.30. The major difference in the relations among the parameters is that there is but one mean, μ = μ_a = −μ_b, and γ = γ_a = γ_b, rather than γ_a = 1/γ_b. The most important consequence of this is that the mean times for the same response need not be the same. We can see that by using the properties of Eqs. 8.37-8.40 in Eq. 8.18, we obtain
The equality of SPRT holds if and only if γ = 1, which in fact is a special case of SPRT. Recall that γ = 1 is a property of the Gaussian model. Observe that the relations between the response probabilities and A and B are exactly the same in both models. Furthermore, so long as A and B are otherwise unconstrained, absolutely any pair of response probabilities P(A | a) and P(A | b) can occur. Thus, neither SPRT nor SSR imposes any constraints whatsoever on the ROC curve, that is, the relation between P(A | a) and P(A | b) that arises when the signal is held fixed and the error tradeoff is manipulated, usually by varying the presentation probabilities, the payoff matrix for accuracy, or the instructions. This observation is not inconsistent with Link and Heath's (1975) emphasis on the fact that SSR encompasses the ROC curves of many well-known psychophysical theories, such as low thresholds and the theory of signal detectability; however, the significance of the fact is much undercut when it is realized that SSR and SPRT both encompass every conceivable ROC curve!
* It is easy to become confused about terminology. Link (1975, 1978) speaks of "relative judgment theory" (RJT), by which he means that the observations X_i arise by comparing an internal observation relative to a standard; however, in these papers he imposed Eq. 8.35, which he did not name and which need not arise from relative judgments, and some have come, incorrectly in my view, to refer to that assumption as RJT.
Noreen (1976) was perhaps the first to note and certainly the first to exploit the fact that Eqs. 8.18a and 8.18b become the following linear system in 1/μ and 1/γμ:
where the η's arise from Eq. 8.18 and simplify to
8.3.4 On Testing SSR
The situation is similar to that of SPRT, but simpler. For A and B, the models are the same. For the mean response times, there are four equations and two parameters, Δ/μ and Δ/γμ, associated with the steps of the random walk. Of course, nothing can be done if we assume the residuals are affected by both the signals and the responses. So, I consider two special cases. The first is that the residuals are a function only of the signal. Then where L_sr is given by Eq. 8.41, is a system of four equations linear in the
TABLE 8.2. Means and standard deviations of the parameter estimates of the perturbed MRTs using A = 5, B = 2, μ = 10, γ = 2, r_a = .100, and r_b = .125

Size of perturbation   Statistic             μ/Δ      γ       r_a     r_b
.05                    Mean                  10.05    2.02    .100    .125
.05                    Standard deviation     0.72    0.16    .016    .021
.10                    Mean                  10.21    2.08    .100    .125
.10                    Standard deviation     1.51    0.38    .031    .043
four unknowns Δ/μ, Δ/γμ, r_a, and r_b, and it can be solved explicitly for any set of data. There is some question about the sensitivity of the estimation given that there is error in the time estimates. To get some idea how bad the estimations might be I had the following calculations carried out.* For the parameter values
the four values of E(T | s, r) were computed. Then they were systematically perturbed by the factors (1 + ε), 1, and (1 − ε), which results in 81 quadruples of mean reaction times. The estimated parameters were obtained for each. As a crude measure of failure of the model to fit the data, I used any of the following as a definition of failure:
Using ε = .05, there were no failures of this sort; using ε = .10, there were five failures, all of them cases where r_s < 50 msec. (This will be important later in evaluating fits of the model to data.) To get some idea of the deviation of the estimates from the "true" values of the parameters, the means and standard deviations, over the 81 cases, are given in Table 8.2. The obvious alternative model is to suppose that the residuals depend upon the responses, not on the signals. Unfortunately, the resulting linear system is singular since
* A. F. Smith carried out these calculations.
and so eliminating Δ(γ − 1)/μγ,
It follows that we cannot estimate all of the parameters, but as a check on the model we do have two independent estimates of Δ(γ − 1)/μγ. Except for Laming (1979c), the published attempts to test SSR (Link, 1975, 1978, and Link and Heath, 1975) have not involved estimation of the parameters μ/Δ and γ and an evaluation of their plausibility or how they vary over experimental conditions. Rather, the emphasis has been on linear relations, such as the speed-accuracy tradeoff of Eq. 8.30, that follow from parts of the linear system. As we shall see, this can be grossly misleading.
8.3.5 Restrictions on the Boundaries
One way to reduce the number of parameters in SPRT and SSR is to entertain hypotheses about how the boundaries are selected. I give three examples. First, let us suppose that response symmetry, A = B, is imposed. Then from Eq. 8.20 we see that the speed-accuracy tradeoff becomes
And from Eqs. 8.16a and 8.16b we see α_s = β_s and so by Eqs. 8.28 and 8.29 of SPRT and the corresponding Eqs. 8.37 and 8.38 of SSR, either P(A | a) = P(A | b) or P(A | a) = P(B | b). As the former represents no stimulus discrimination, the latter (that is, response symmetry) must obtain. Second, let us suppose that the separation between the boundaries is held constant; that is, for some K, A + B = K. As we shall see in Figures 8.9 to 8.11, this appears to be approximately correct for at least two experiments. For either SPRT or SSR, using Eq. 8.36 and substituting into Eqs. 8.37 and 8.38 and eliminating A, we obtain as the ROC curve
The area under this curve is
The third approach, due to Edwards (1965), assumes that there are
payoffs for the four signal-response pairs and a cost that is proportional to the decision latency. Let the payoffs be V_sr and the cost per unit time be −U, where U > 0. If π denotes the probability of presenting a, then the expected value on each trial in SPRT and SSR is easily seen to be
Let us suppose that A and B are chosen so as to maximize expected value. Since selecting A and B is equivalent (in these two models) to selecting P(A | a) and P(B | b), it suffices to solve the two equations
This leads to the pair of equations
where
These can be solved numerically for flA and flB, and so for A and B. To my knowledge, this has not been done.
*8.3.6 Proofs
Theorem 8.3. X = ln[g_a(Y)/g_b(Y)] iff M_b(θ) = M_a(θ − 1).
Proof. Denoting X = h(Y), then
Conversely,
By the uniqueness theorem for moment generating functions, and so
Theorem 8.4. Under the conditions of Theorem 8.3,
Proof. Since M_a(0) = 1 = M_b(0), then M_b(θ) = M_a(θ − 1) yields M_b(1) = M_a(1 − 1) = 1 and M_a(−1) = M_b(0) = 1, and so θ_1a = −1 = −θ_1b. Substituting into Eqs. 8.16 and 8.17 yields Eqs. 8.28 and 8.29. Substituting these into Eq. 8.20 yields Eq. 8.30. Observe that M_b'(θ) = M_a'(θ − 1), so
from which it follows that γ_a = 1/γ_b. To show Eq. 8.32, substitute e^A = P(A | a)/P(A | b) and e^{−B} = P(B | a)/P(B | b) into Eq. 8.15 with s = b:
Now replace θ by θ + 1 and, by Theorem 8.3, substitute M_a(θ) = M_b(θ + 1):
But setting s = a in Eq. 8.15,
Since these equations must hold for all θ,
and
The proof of Theorem 8.6 is very similar to that just given and is left as an exercise.
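Two of the SPRT properties discussed in this section, that the boundaries are recoverable from the response probabilities alone and that the time to a given response does not depend on which signal was presented, can be checked by simulation. The sketch below is an illustration under assumptions of my own choosing, not an analysis from the text: observations are taken to be Gaussian with means +m and −m and unit variance, so that each step is X = ln[g_a(Y)/g_b(Y)] = 2mY, and the values of m, A, B, the seed, and the function names are all hypothetical. Because the steps are small but not infinitesimal, the walk overshoots the boundaries slightly and the recovered values should be expected to exceed A and B by a small amount.

```python
import math
import random

def sprt_trial(rng, mean_x, sd_x, A, B):
    """Accumulate log-likelihood-ratio steps until the sum leaves (-B, A)."""
    s, n = 0.0, 0
    while -B < s < A:
        s += rng.gauss(mean_x, sd_x)
        n += 1
    return ("A" if s >= A else "B"), n

def simulate(n_trials=4000, m=0.1, A=2.0, B=2.0, seed=1):
    rng = random.Random(seed)
    # Assumed model: Y ~ N(+m, 1) under a and N(-m, 1) under b, so X = 2 m Y
    # has mean +2m^2 or -2m^2 and standard deviation 2m.
    step = {"a": (2 * m * m, 2 * m), "b": (-2 * m * m, 2 * m)}
    counts = {(s, r): 0 for s in "ab" for r in "AB"}
    steps = {(s, r): 0 for s in "ab" for r in "AB"}
    for s in "ab":
        mean_x, sd_x = step[s]
        for _ in range(n_trials):
            r, n = sprt_trial(rng, mean_x, sd_x, A, B)
            counts[s, r] += 1
            steps[s, r] += n
    p = {key: counts[key] / n_trials for key in counts}
    mean_n = {key: steps[key] / counts[key] for key in counts}
    # Boundaries recovered from response probabilities alone.
    est_A = math.log(p["a", "A"] / p["b", "A"])
    est_B = -math.log(p["a", "B"] / p["b", "B"])
    return est_A, est_B, mean_n

est_A, est_B, mean_n = simulate()
print("recovered A, B:", round(est_A, 2), round(est_B, 2))
print("mean steps to response A given a, given b:",
      round(mean_n["a", "A"], 1), round(mean_n["b", "A"], 1))
```

With the residual times ignored, the last line compares the two quantities that SPRT predicts to be equal; the comparison in real data is, of course, complicated by the residuals.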
8.4 MODIFICATIONS OF THE RANDOM WALK
The literature includes several attempts to preserve the main features of the two major, tractable random walk models while making them more consistent with the data, especially the lack of equality of error and correct responses. Two of these modifications I treat here and the others fit more naturally into Chapter 9. The basic ideas involved are the following: to assume systematic changes in the boundaries as a function of time, to assume variability in the
location of the boundaries, and to assume variability in the time at which the subject begins processing information aimed at making a response. The first and third are taken up here (in that order, which does not correspond to chronological order) and the second is dealt with in Chapter 9.
8.4.1 Linear Boundary Changes: Biased SPRT Model
In the presentation of SPRT in Section 8.3.1, we suggested that the decision variable at each step is the log-likelihood ratio. As Laming (1968) was clearly aware and as Ashby (1983) reiterated, this seems unduly restrictive in the light of modern psychophysics because the evidence favors the idea of bias being introduced into likelihood ratio. Laming (1968) suggested the Bayesian formulation
where k = Pr(s = a)/Pr(s = b). This does not alter the theory in any serious way except to replace the boundary A by A − ln k and B by B + ln k; that is, it shifts the starting point by ln k. Ashby (1983) suggested a more pervasive bias, one that affects each observation, not the entire set. So he assumed
Thus,
Observe that is equivalent to
which is the standard random walk model with boundaries that begin at A and B and change linearly with n with slope −ln k. On this assumption, the proof of Theorem 8.3 is readily modified to show:
Theorem 8.7. Equation 8.45 holds if and only if M_b(θ) = kM_a(θ − 1).
The line of argument used to prove Theorem 8.4 does not go through for this generalized model. In lieu of this, Ashby developed the special case in which Y_i is Gaussian distributed with mean μ_s when signal s is presented and variance σ², independent of s. In that case,
Thus X_i is linear in Y_i and so it, too, is Gaussian, and as is easily verified,
where If these are entered into the general Wald model, a system of equations similar to those previously derived (Section 8.3.2) can be developed (see Ashby, 1983, p. 288). The model was fitted to the sn-deadline data of Green and Luce (Appendix C.6). Although it fits slightly better than SPRT and SSR, the estimates of k were 1.00 to two decimal places, suggesting no real difference from the standard Gaussian model.
8.4.2 Premature Sampling
As I noted in Section 6.6.5, Laming (1968) made the important observation that all of the random walk models, and in particular SPRT, presuppose that signal onset is so accurately detected that the subject knows exactly when to begin accumulating discriminative information. As is known from simple reaction-time data (Section 2.4 and Chapter 4), anticipations are common when the time pressure is considerable, and so they may also occur in the choice situation although they are much less noticeable because the response does not usually occur until discriminative information has accumulated. If this happens, then the noise that is being processed tends not to affect the discrimination, but it does introduce variability in the position of the random walk when useful information begins to accumulate. In essence, then, the starting point is a random variable, which means that it is almost always closer to one of the boundaries than it would be if sampling were to begin at signal onset. The major consequence of this is that error times are in fact faster than correct ones. In particular, suppose that for the N_0 steps prior to signal onset the subject is sampling noise information, which is assumed to be a random variable Z with the following properties: Z is independent of the signal to be presented, E(Z) = 0, and Z has the moment generating function M_0(θ). Let S_0 denote the sum of these N_0 observations. Then Wald's identity is easily seen to be modified from Eq. 8.14 to
Making the assumption of small steps, Eq. 8.15 is replaced by
If we now assume that S_0 is independent of N (it is far from clear that this assumption is valid, although it may be if, as Laming assumes, the distribution of Z, and so S_0, is symmetric), then Eq. 8.47 can be rewritten as
The right side is as before (Eq. 8.15), but the left side depends on θ. If one now proceeds as in Section 8.2.6 by taking the derivative of Eq. 8.48 with respect to z = M_s(θ)^{−1} and sets θ = 0 and θ_1s (Section 8.2.3), the right side develops as before but the left side, instead of going to 0, involves the parameters N_0, M_0(θ_1s), E[exp(θ_1s S_0)], and E[S_0 exp(θ_1s S_0)]. Clearly, one can solve, as before, for the latencies, but four parameters have been added. Laming (1968) reduced them to two by assuming that θ_1s is sufficiently small so that only the first-order terms need be retained in the last two constants, reducing them to 1 and a negligible term. Since in SPRT θ_1s = ±1, the assumption that higher-order terms of the exponential can be neglected seems implausible. Accepting that approximation, the second constant can then be incorporated into the definitions of α_s and β_s (Eq. 8.16), which then leaves only N_0 to affect the latencies. This leads to a difference of 2N_0 in the correct and error latencies, which at the expense of an additional parameter brings SPRT back into the running. Aside from Laming, little attention has been paid to this modification of the model, and in my opinion more work developing it is warranted. As was described in Section 6.5.3, Laming (1979b,c) has used this idea with systematic changes in N_0 as a possible way to account for sequential effects. No really detailed analysis of specific data has been presented for this model.
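The qualitative consequence of premature sampling, that errors become faster than correct responses, is easy to see in a simulation. What follows is a rough sketch under assumptions chosen here for illustration only: Gaussian steps, symmetric boundaries, zero-mean noise steps of the same variance before signal onset, and a walk that is not absorbed during the pre-signal period (a trial whose starting point already lies beyond a boundary is scored as a response in zero steps). It makes no attempt to reproduce Laming's approximations or the 2N_0 result; the parameter values and function names are hypothetical.

```python
import random

def trial(rng, drift, sd, A, B, n0):
    """One trial: n0 noise steps set the starting point, then signal steps run to a boundary."""
    # Pre-signal sampling: zero-mean noise accumulates into S_0.
    s = sum(rng.gauss(0.0, sd) for _ in range(n0))
    n = 0
    # Post-signal sampling: steps now carry the drift of the presented signal.
    while -B < s < A:
        s += rng.gauss(drift, sd)
        n += 1
    return s >= A, n  # with positive drift, crossing A is the correct response

def mean_steps(n0, n_trials=20000, drift=0.1, sd=1.0, A=5.0, B=5.0, seed=7):
    """Mean number of post-onset steps for correct and for error responses."""
    rng = random.Random(seed)
    total = {True: 0, False: 0}
    count = {True: 0, False: 0}
    for _ in range(n_trials):
        correct, n = trial(rng, drift, sd, A, B, n0)
        total[correct] += n
        count[correct] += 1
    return total[True] / count[True], total[False] / count[False]

correct_0, error_0 = mean_steps(n0=0)   # sampling begins exactly at signal onset
correct_4, error_4 = mean_steps(n0=4)   # four noise steps precede signal onset
print("no premature sampling: correct", round(correct_0, 1), "error", round(error_0, 1))
print("premature sampling:    correct", round(correct_4, 1), "error", round(error_4, 1))
```

The first line of output serves as the baseline in which the two mean times should be nearly equal; the second shows the separation introduced by the random starting point.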
8.5 DATA
The models of this chapter all include two distinct types of parameters. One describes the nature of the information accrual process, including always some measure of the rate of accrual and, in some models, the variability of that process. The other describes the conditions under which responses are made. Although there is nothing in the structure of the models to force any particular identification of model parameters with experimental manipulations, there is in fact fairly wide consensus that certain pairings make intuitive sense. The information accrual process is assumed to be affected both by properties of the stimuli, such as their discriminability, and by properties of the observer that are not under voluntary, trial-by-trial or condition-by-condition control. The conditions for making a response are also affected by the stimuli, but, in addition, with the stimuli fixed, they can be significantly altered by variables that affect an observer's motivation.
Among these manipulations are the probability of presenting a signal, the instructions used, and the payoffs both for accuracy of performance and for the time to respond. These beliefs about the models lead to two strategies of experimental testing. The one attempts to manipulate the accrual process while holding constant the response conditions, and the other attempts to hold constant the accrual process while manipulating the response conditions. I say "attempts" advisedly because I fear we do not always succeed in these manipulations. The ones that are most suspect, in my opinion, are some attempts to manipulate the accrual process. A problem arises when the stimulus manipulation is such that it may also result in systematic changes in the response process. This means that if the trials are run in blocks (which has almost always been the case) we have no reason whatsoever to believe that the response parameters are the same from block to block. Only by randomizing the stimulus manipulation on a trial-by-trial basis can we hope that the response parameters may be the same from trial to trial. This possibility casts doubt upon at least the first study discussed in Section 8.5.1.
8.5.1 Attempts to Vary Accumulation of Information
Pickett (1967) used matrices of dot patterns for stimuli, where the dots were selected according to two-state Markov chains in which the average density of dots was constant but the transition probabilities varied. Subjectively, the stimuli varied from appearing clustered to quite evenly distributed. The observers were required to classify them as such and their times and accuracy were recorded. Error feedback was provided. The primary result was that the distribution of incorrect times was uniformly slower than that of correct ones; the two distributions did not overlap at all. The major conclusion was that SPRT does not fit these data. However, Pickett did argue that these data might be consistent with a random walk model in which the boundaries approach each other with time. Sanders and Ter Linden (1967) undertook to design stimuli so as to manipulate the accrual process locally without altering the overall character of the stimuli per se. This design appears to avoid the problem of affecting the response parameters as well as the accrual ones. There were two colored lights, and a stimulus was a sequence of flashes fluctuating between the two lights. For one class of stimuli the lights appeared in the proportion 60:40; for the other in the proportion 40:60. The observers were to identify on each trial which proportion was being used. A payoff scheme was employed in which each error of identification was fined 100 points and each observation of a flash cost one point. The prior probability of each stimulus was 1/2. With that prior, the Bayesian strategy to minimize the expected loss of points is to respond whenever the difference is four in the total number of flashes by each source. The basic experimental manipulation was to control the position in a sequence when the evidence became decisive. For example,
in their first experiment there were two classes of sequences: class I in which the difference remained small during the first part of the sequence and became large in the second part, and class II in which the difference became large early in the sequence. They found that observers responded on average when the difference reached 5.3 for class I and 7.0 for class II. Experiment 2 altered the rate at which the flashes occurred, and Experiment 3 caused a difference of 5 to occur at various places in the sequence. The results were as in Experiment 1: the later the response, the smaller the difference. If one assumes that a binary random walk is involved, these data force the conclusion that the boundaries diminish with time. However, as Vickers, Caudrey, and Willson (1971) pointed out, the observed pattern is exactly what one expects from LaBerge's simple accumulator. Consider, for example, the following two sequences of length 26 and a ratio of 62:38. The one is obtained from the other by interchanging the first and second halves:
1 1 0 1 0 1 1 0 1 1 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 0 0 0 1 0 1 0 1 1 0 1 1 1 1 1 0 1 1 0 0 1
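The stopping rule of a simple accumulator, respond as soon as the count for either alternative reaches a criterion k, can be sketched in a few lines. The two sequences used below are not transcribed from the original example; they are illustrative sequences constructed here to share its stated properties (length 26, 16 ones to 10 zeros, one obtained from the other by interchanging the halves), and the function name is hypothetical.

```python
def accumulator_stop(seq, k):
    """Return (observation number, |difference in counts|) at which a simple
    accumulator with criterion k on each counter would respond, or None."""
    ones = zeros = 0
    for n, x in enumerate(seq, start=1):
        if x == 1:
            ones += 1
        else:
            zeros += 1
        if ones == k or zeros == k:
            return n, abs(ones - zeros)
    return None  # neither counter reached the criterion

# Illustrative halves (assumed, not the original data): 9 ones in the first,
# 7 ones in the second, for 16 ones out of 26 overall.
first_half = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0]
second_half = [0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

seq_early = first_half + second_half   # evidence becomes decisive early
seq_late = second_half + first_half    # same flashes, halves interchanged

print(accumulator_stop(seq_early, k=8))
print(accumulator_stop(seq_late, k=8))
```

Because the criterion is on the absolute count rather than on the difference, the later the criterion is reached, the more opposing observations have intervened and the smaller the difference at the moment of response.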
If k = 8, then the response occurs at the 10th observation with a difference of 6 for the first sequence and at the 14th observation with a difference of 2 for the second sequence. The fourth experiment investigated the impact of runs by presenting sequences of the following character, where the runs of 1's are indicated by parentheses:
(1 1) 0 (1 1) 0 (1 1) 0 1 0 (1 1) 0 1 0 1 0 1 0 0 (1 1 1 1 1 1) 0 0 0 1
1 0 1 0 (1 1 1) 0 1 0 (1 1 1) 0 1 0 1 0 1 0 0 (1 1 1 1 1 1) 0 0 0 1
1 0 1 0 1 0 1 0 (1 1 1 1 1) 0 1 0 1 0 1 0 0 (1 1 1 1 1 1) 0 0 0 1
A difference of 5 is achieved at the end of the initial string of 13, after which the sequences are the same. Each initial sequence differs in its run pattern. The data did not provide any evidence that these patterns mattered, and they concluded that the runs model is not correct. Vickers et al. (1971) employed stimuli of the same type, using the proportions 68:32, 64:36, 60:40, 56:44, and 52:48. No error feedback was provided. They controlled for possible practice effects by means of a Latin-square design. Their basic data summary consisted of plots of the following statistics versus the experimental proportion: probability of responding A together with the mean, standard deviation, skew, and kurtosis of the response-time distribution. These were compared with the predictions of four, somewhat specialized, models: the simple accumulator (recruitment model) with k_A = k_B = k (Section 8.1.1); the runs model with k_A = k_B = k (Section 8.1.2); the binary random walk (in which a unit step is taken toward the appropriate boundary with probability p and a unit step away from it with probability 1 − p) with equal boundaries, A = B = k (Section 8.2.1); a
FIG. 8.3 Plots of various statistics as a function of the parameters p (shown as p in the plot) and k (listed as a column of numbers in each panel) for the runs, random walk, and recruitment models. See text for a discussion of the meaning of the parameters for each model. [Figure 1 of Vickers et al. (1971); copyright 1971; reprinted by permission.]
FIG. 8.4 Continuation of Figure 8.3 for the variable recruitment and accumulation models and experimental data of Vickers et al. (1971) on stimulus sequences with different proportions of the two types of signals (see text). [Figure 2 of Vickers et al. (1971); copyright 1971; reprinted by permission.]
variable recruitment model in which k is a Gaussian random variable with mean k̄ and standard deviation σ (= 0.5 in plot). The relevant plots are shown in Figures 8.3 and 8.4. On the assumption that the p of each model is the same as the probability of the experimental manipulation (not a necessary assumption), the conclusion has to be that no single model is fully satisfactory. The slower subjects exhibit an asymmetry of the mean and to a lesser degree of the skewness
and kurtosis that is similar to the recruitment model and unlike the symmetry of the runs and random walk models. The faster subjects, however, seem far more symmetric. Pike (1968) noted this same pattern in other bodies of data, including his own. There are two other facts, however, that argue against both of the recruitment models—namely, that the standard deviation is symmetric and more widely spread out than can be accounted for by these models, and the skewness data are mostly positive for the data whereas they are mostly negative for the models in the range of probabilities used. Although it is not perfect, the most promising model for these data is Vickers' strength accumulator model. Qualitatively it seems to have most of the correct features. In summarizing this study, Vickers et al. (1971, p. 169) say: "It would be premature, however, to offer any firm verdict on the adequacy of the [strength] accumulator model in this context, while there remains no obvious rationale for assigning a value to
FIG. 8.5 MRT versus presentation probability for experiment 2 of Laming (1968) (see Appendix C.3). The sample size is 4800 observations per condition. According to SPRT each pair of curves with symbols of the same shape should be the same. [Figure 5.2 of Laming (1968); copyright 1968; reprinted by permission.]
and by now there is general consensus that SPRT in its original form is completely inadequate as a general model for choice reaction times. There are, however, two exceptional studies in which the appropriate invariance was found. In the first, Kornbrot (1977), the stimuli were eight classes of random dot patterns, with the same number of dots in each pattern of a class. Each of the four subjects made absolute identifications. There were five experimental conditions in which different patterns of payoffs were used. A χ² test on the reaction-time distributions led her to accept the null hypothesis of no difference as a function of the signal presented for a fixed response. She mentions that in two-choice pilot data, this property was rejected. In the other study by Green et al. (1983), whose data are summarized in Appendix C.7, the experiment was designed to overcome some of the potential sources of discrepancy. In particular, highly practiced subjects were run for many trials, the warning and response signals were in different modalities (a visual countdown and a 70-dB response signal of either 1000 or 1700 Hz), and a random (exponential) foreperiod with a mean wait of 1 sec was used. Payoffs were employed that aimed at fast and correct responding, but severely punished anticipations. Presentation probabilities of .25, .50, and .75 were used. The main reason for all of these features was to attempt to reduce the tendency for subjects to begin the accumulation of information
that will lead to a decision prior to signal onset, which as Laming (1968, 1979a) had argued was possibly happening (Sections 6.6.5 and 8.4.2). As can be seen from Appendix C.7, the mean times nearly conform to the prediction, and indeed the fit of the SPRT model (with A and B varying with presentation probability and with the sensory parameters fixed) is satisfactory. However, a detailed comparison of the response distributions, using the Kolmogorov-Smirnov maximum deviation test, showed that in 14 out of 18 comparisons the distributions differed at the 1% level. Autocorrelations of the responses showed a much attenuated pattern as compared with Laming's (1979b) data, and for one subject there was no correlation at all. Another happy feature of these data is the fact that when the presentation probabilities were changed to 0 and 1 (simple reaction times) the values obtained were very close to the estimated values of the residuals. This has not been true by some 50 to 100 msec in some other experiments (Laming, 1979c; Swensson, 1972).
8.5.4 SSR Analysis With Signal-Dependent Residuals
Link (1975, 1978) examined the data of Appendix C.1-C.4 and C.6 from the point of view of SSR. As Laming (1979) pointed out, Link's strategy has been one of giving partial analyses of the data, centering primarily on some of the linear relations predicted by SSR or, in the case of a favorite plot, Eq. 8.30, by both SSR and SPRT. Laming criticized this strategy, and he concluded from other analyses that SSR is also inadequate to account for any of these bodies of data. I shall draw the same conclusion, although my approach, which was outlined above, is somewhat different from Laming's. Assuming the residuals are affected only by the signal, the linear system Eq. 8.43 has been solved* for all of the conditions of all the experiments. As a first pass at the data, let us class the model as a failure if any of the estimates exhibit any of the inequalities in Section 8.3.4, namely,

μ/A < 0,   γ < 0,   r_s < 50 msec,   r_s > MRT_sr.
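As a minimal sketch, the screening rule can be written as a small function. It flags a set of estimates as a failure when μ/A < 0, when γ < 0, when some residual r_s falls below 50 msec, or when some residual exceeds an observed mean reaction time MRT_sr for that signal. The class name, function names, argument layout, and example numbers are hypothetical and serve only as illustration; this is not the computation that was actually used to produce the tabulated failures.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SSREstimate:
    """Hypothetical container for the estimates from one experimental condition."""
    mu_over_A: float                 # estimate of mu/A
    gamma: float                     # estimate of gamma
    residuals: Sequence[float]       # estimated residuals r_s in msec, one per signal s
    mrts: Sequence[Sequence[float]]  # observed MRT_sr in msec, indexed as mrts[s][r]


def failure_reasons(est: SSREstimate, min_residual: float = 50.0) -> List[str]:
    """Return the list of failure criteria met by one set of estimates."""
    reasons = []
    if est.mu_over_A < 0:
        reasons.append("mu/A < 0")
    if est.gamma < 0:
        reasons.append("gamma < 0")
    for s, r_s in enumerate(est.residuals):
        if r_s < min_residual:
            reasons.append(f"r_{s} < {min_residual:g} msec")
        # A residual larger than an observed mean time would force a
        # negative decision latency for that signal-response pair.
        if any(r_s > mrt for mrt in est.mrts[s]):
            reasons.append(f"r_{s} exceeds an observed MRT")
    return reasons


def is_failure(est: SSREstimate) -> bool:
    """A condition counts as a failure if at least one criterion is met."""
    return len(failure_reasons(est)) > 0


if __name__ == "__main__":
    example = SSREstimate(mu_over_A=-2.0, gamma=1.5,
                          residuals=[20.0, 300.0],
                          mrts=[[420.0, 450.0], [460.0, 430.0]])
    print(failure_reasons(example))
```

Returning the list of reasons, rather than a bare yes or no, makes it easy to tabulate which inequality is responsible for each failure.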
The reasoning is as follows. If μ/A < 0, then μ < 0 and the random walk is actually in the opposite direction to the responses. The assumptions made imply γ > 0. If r_s < 50 msec, then one has an estimate of the residual far less than anyone believes possible. If r_s > MRT_sr, then at least one of the decision latencies is forced to be negative, which is clearly untenable. Using these criteria for failure, we find the pattern in Table 8.3. Consider first the Green and Luce data, which on the face of it are grossly inconsistent with the SSR model. Several observations should be made. First, judging by the sensitivity analysis provided in Section 8.3.4, it is doubtful if experimental error is sufficient to account for this number of failures. Second, the 42 failures in these 90 conditions break down as follows: two were singular matrices, 22 were either μ/A < 0 (and these were mostly accompanied by some r_s > MRT_sr) or γ < 0, and 16 were r_s < 50 msec, and of these 14 were actually r_s < 0. This is compared with the five failures in the 81 conditions of 10% perturbations of the data, all of which were r_s < 50 msec and none of which involved r_s < 0. Third, on the basis primarily of the speed-accuracy tradeoff, Link (1978) concluded these data tended to support SSR, which in my opinion is an erroneous conclusion. Given the asymmetry that usually appears in Yes-No psychophysical detection experiments, it is not terribly surprising that a model that assumes a highly symmetric stimulus representation fails to account for the data.

Turning to the other sets of data, except for observer 3 of Carterette et al. and one condition of the Link data, one can plot the several estimated parameters as a function of the experimental manipulation. These are shown in Figures 8.6 through 8.9. First, the pattern of A and B is very much as one would expect if manipulating the presentation probability had had the intended effect. Moreover, the hypothesis that A + B is a constant seems moderately consistent with these data, which is plausible for fixed signals. Second, there is sufficiently little difference between the estimates of r_a and r_b to lead one to believe that they are probably equal. Third, while there is some trend in the values of γ, the hypothesis that it is a constant independent of the presentation probability is not untenable. Fourth, except for observer 2 of Carterette et al., the estimate of μ/A is consistently and appreciably larger for high presentation probabilities than for smaller ones. Fifth, the pattern in the change of r_a and r_b closely parallels that of μ/A, which suggests that the one change is compensating the other one. This could arise if there were systematic misestimates of the A and B parameters and therefore in the coefficients of the linear equations.

* A. F. Smith carried out this computation.

TABLE 8.3. Number of failures of SSR/number of conditions, where a failure is defined by any parameter estimate meeting the conditions μ/A < 0, γ < 0, r_s > MRT_sr, or r_s < 50 msec

Carterette et al. (1965)     Obs. 1   Obs. 2   Obs. 3
                             0/6      0/6      3/6
Laming (1968)                0/5
Link (1975)                  1/6
Green and Luce (1974)        Obs. 1   Obs. 2   Obs. 3
  s-deadline                 6/8      3/8      6/8
  s-ROC                      3/5      3/5      4/5
  sn-deadline                3/9      6/9      4/9
  sn-ROC                     0/5      2/5      2/5

FIG. 8.6 Estimated parameters obtained by fitting the system of equations given in Eq. 8.43 for the SSR model with signal-dependent residual times to the data of Laming (1968, Experiment 2) (Appendix C.3).
FIG. 8.7 Same as Figure 8.6 for the data of Link (1975) (Appendix C.4).
FIG. 8.8 Same as Figure 8.6 for observer 1 of Carterette et al. (1965) (Appendix C.5).
FIG. 8.9 Same as Figure 8.6 for observer 2 of Carterette et al. (1965) (Appendix C.5).

Because of this last observation, I have computed* those values of μ/A, γ, r_a, and r_b that minimize the quantity
For the Laming and Link data this leads to

            μ/A     γ       r_a    r_b
Laming      30.2    1.96    297    293
Link        16.5    10.87   311    306
The scatter diagram of the MRTs is shown in Figure 8.10, and there does not appear to be any systematic pattern to the plots, especially when one recalls that there is probably a substantial error in the equations for the mean reaction times (recall Table 8.1). Heath (1981) pointed out that there are signs of trouble in the parametric analyses as evidenced by the non-constancy of the estimated parameter μ/A, shown in Figures 8.6 through 8.9. He demonstrated it in a slightly different fashion. Equation 8.20 can be written

where

The scatter plot of Ls versus zs should be constant if y^n is constant. For the Green and Luce sn-data, this was not the case.

* Carried out by A. F. Smith.

8.5.5 SSR Analysis With Response-Dependent Residuals

Turning to the other model in which the residuals depend upon A and B and not on a and b, we can do little more than estimate (γ − 1)/μγ from Eq. 8.44. There are two estimates, one from the A equations and another from the B equations. On the average, half of the A ones should be larger than the B ones, and half smaller. Table 8.4 shows that this prediction seems not to be true in the Carterette et al. data (p < .01 by the sign test) and surely not true in the Green and Luce data (p < .001 by the sign test). Moreover, in the Carterette et al. data, both positive and negative estimates arise. For the Laming and Link data, where there is no evidence of a problem, the scatter plot is shown as Figure 8.11. It is not as tight as one might wish. Green et al. (1983) carried out this analysis on their data, and the fit was slightly, but not significantly, better than that of the SPRT model. For all subjects, the estimate of γ (Eq. 8.40) was approximately 1.2, which predicts a latency difference for the same response to different signals of .167r/μ, where r = A, B, which for these data corresponds to a difference of from 3 to 15 msec.

FIG. 8.10 Observed versus predicted MRT for data of Link (1965) and Laming (1968), where the prediction is from the SSR model with the parameter values reported in Figures 8.6 and 8.7.

TABLE 8.4. The number of cases where the estimate of (γ − 1)/μγ from the A equations exceeds that from the B equations

Carterette et al. (1965)     Obs. 1   Obs. 2   Obs. 3
                             1/6      0/6      2/6
Laming (1968)                2/5
Link (1975)                  3/6
Green and Luce (1974)        Obs. 1   Obs. 2   Obs. 3
  s-deadline                 1/8      0/8      3/8
  s-ROC                      0/5      0/5      2/5
  sn-deadline                3/9      0/9      2/9
  sn-ROC                     2/5      0/5      1/5

8.5.6 A Gaussian Analysis
As was noted in Section 8.2.4, for the Gaussian model with small 0 ls it follows that the slope of LsA - LsB versus 2P(A | s)- 1 must be negative. To test this, Heath (1981) carried out a study in which subjects were to identify which of two lights came on first. He varied the interstimulus interval (5, 10, 20, and 40 msec), imposed deadlines (250, 400, and 900 msec), and also ran an accuracy condition with no deadline. The mean times of the four deadline conditions were 231, 319, 460, and 712msec. The slope of LsA — LsB versus 2P(A s ) — 1 was decidedly positive for the three deadline conditions and decidedly negative for the accuracy one. This is strong evidence against the Gaussian model. Since comparable predictions do not exist for the more general models, we can reach no new conclusions about them. Heath outlined a model—called the tandem random walk model—to account for such data; however, as it is not worked out in sufficient detail to test it, I do not present it.
FIG. 8.11 Estimates of (γ − 1)/μγ from independent parts of the Link (1965) and Laming (1968) data.
8.6 CONCLUSIONS

It is clear that the major idea underlying all of the models of this chapter (namely, differential accumulation of information with different signals and a decision rule based on that accumulation) results in models that exhibit some of the gross features of much data. The problems arise as one attempts to examine matters in closer detail. Although, in principle, the models characterize the four response-time distributions, in practice relatively little is known about these distributions beyond their means, and most of the data analysis has been in terms of mean latencies. Because there tend to be as many or more parameters than the six independent pieces of data (the two conditional response probabilities and the four mean times), one is forced into one of three possible strategies. (1) Additional assumptions, such as symmetry in the information representation or residual times that do not depend on the stimulus presented, can be invoked in order to reduce the numbers of parameters. (2) Vary some experimental condition and in the light of the intended meaning of the parameters assume that some are invariant under the changes in conditions. Or (3) seek out properties of the model that are free, or relatively free, of parameter estimates. In the attempts to fit the SPRT and SSR random walk models to the several bodies of data in Appendix C, a mixture of strategies 1 and 2 has been used with, in my opinion, only partial success. In general, parameters that we believe should not be affected by manipulations were less constant than one would have wished. The cleanest results involved the use of strategy 3 in the SPRT model; most of the data do not support the prediction that for a given response, error and correct distributions are identical. As the experiments are usually run, that prediction is clearly false.
Laming (1968) argued that this is likely due to premature sampling of the sensory information flow, and he suggested how to modify SPRT (indeed, the general random walk model) to take this into account. Approximations are required in order to avoid adding four new parameters, and the resulting equations have not been fitted to data in the same detail as SPRT itself. Rather than modify the model, with its attendant complications, Green et al. (1983) suggested that the tack should be to revise the experiment so that subjects no longer engage in premature sampling. Although a statistically significant difference between the two distributions still remained, most of the discrepancy had in fact vanished by introducing three changes: highly practiced subjects, warning and reaction signals in different modalities, and random foreperiods. This suggests that SPRT (or SSR with γ = 1) may be more accurate than had been thought for a while and Laming may be correct in attributing the failures to premature sampling. Further work on the premature sampling model seems in order. It is especially important that some attempt be made to couple it with a learning process, much as was done in Chapter 7, in an attempt to see if the sequential effects can actually be reproduced.
9 Stochastic Accumulation of Information in Continuous Time

9.1 INTRODUCTION

The models of this chapter are similar in spirit to those we have just discussed. The major difference is that the accumulation of information is assumed to occur not at discrete times, but in some sense at continuous times. This does not mean that the accumulation is necessarily continuous in time, although it may be, but rather that it does not occur at periodic intervals. Perhaps the best known example of such a process with a continuous-time parameter is the Poisson process in which discrete events occur in such a fashion that the interevent times are independently, identically, and exponentially distributed. This is an example of a counting process, so called because one can simply count the number of events that have occurred prior to each instant in time. Each of the models to be discussed can be thought of as a continuous generalization of the discrete case in which the time between observations is made arbitrarily small. If so, the question arises: what is the point? Why not work with the discrete approximation, especially since it will almost surely be used in carrying out numerical solutions? The reasons are several. First, in some cases the limiting process is rather delicate to carry out, and errors easily result. Nevertheless, as we shall see in Section 9.2.2, careful use of limiting processes can be most helpful in getting detailed results. Second, analytically, the relation between a continuous process and the continuous asymptotic results often seems more natural, and many questions that are complex combinatorially in the discrete case are easily understood as differential equations. Third, at least one class of processes (the continuous random walks) are more readily generalized to non-constant boundaries and non-constant rates of accumulation than are their discrete analogues.
Fourth, renewal processes, which arise as a natural generalization of the Poisson, are not nearly so natural as generalizations of the geometric process, which is the discrete analogue of the Poisson. Last, and perhaps most compelling, is the lack of any substantial evidence for synchronization in the nervous system comparable to that found in a digital computer. A discrete stochastic process, of course, involves temporal regularity, which usually is the first step toward synchronization. Since the mathematics of continuous processes can become quite formidable as the models are made increasingly complex and, presumably, more realistic, I shall limit attention only to the simplest cases. And of those, I shall discuss only the few for which substantial attempts have been made to apply them to response-time data. References will be given to some of the more complex variants not outlined in detail. Two main types of models are described. First I take up continuous analogues of the random walk models of the last chapter. That is followed by models based on the idea that information accumulates in a punctate manner according to a renewal counting process, which leads to models that differ somewhat from those of the preceding chapter.

9.2 ADDITIVE PROCESSES

Let {X(t)} be the stochastic process that represents the information available to the subject at time t, where, of course, it will be necessary later to subscript X according to the conditions obtaining on a particular trial of the experiment. We shall say that the process is additive if and only if the difference random variables X(t + s) − X(t) over the intervals (t, t + s) are independent for non-overlapping intervals, and they have the additive property

Note that by defining Y(t) = X(t) − X(0), this equation can be simplified to

Obviously such a process is stationary (see Section 1.5). A random walk on the line is a simple discrete example of such an additive process. Following Bartlett (1962) consider the cumulant generating function

which by Eq. 9.2 and the independence of the intervals yields the following functional equation:
Since K is continuous, being the logarithm of an integral, this functional equation is known to have a unique solution of the form (Aczel, 1966, p. 34) So the general problem of continuous additive processes is equivalent to finding all processes that satisfy Eq. 9.4.
A result that is beyond the scope of this book (see Cramer, 1937) is that the most general solution is composed of weighted sums of two processes characterized by
and We examine Eq. 9.5 in Sections 9.2.1 and 9.2.2, generalizations of it in Section 9.2.3, Eq. 9.6 in Section 9.2.4, and a blend of the two in Section 9.2.5.
9.2.1 Diffusion Processes
A process satisfying Eq. 9.5 goes under two main names, diffusion process and Wiener process, the latter after the mathematician Norbert Wiener, who studied such processes as a model for Brownian motion. We see from the definition of K in terms of the characteristic function 𝒞, namely,

and Eqs. 9.4 and 9.5,

Recall (see Section 1.4.4) that if 𝒞_n denotes the characteristic function of the nth partial derivative with respect to x of the density function f(t, x) of the random variable in question, then

Thus, by multiplying Eq. 9.7 by 𝒞(ω, t) and integrating,

Since this must hold for all ω for which the integrals are defined, it follows that f satisfies the partial differential equation:
This is often called the diffusion equation with drift. When the drift parameter m = 0, it is referred to as the diffusion equation or, in the physical literature, as the Fokker-Planck equation. In probability theory it also goes under the name of the backward equation, which is a general type of equation used in studying Markov processes. The parameter σ²/2 is called the diffusion coefficient.
It is not difficult to show that an unrestricted solution to Eq. 9.8 under the assumption that the process begins at 0 at time 0 is
Ratcliff (1978) employed Eq. 9.9 to arrive at a SATF as follows. He assumed the parameter m actually to be a random variable with mean μ_s depending upon the signal s presented and a variance η², which is independent of the signal. Calculating

yields an equation just like 9.9 but with m replaced by μ_s and the variance term σ²t replaced by t(η²t + σ²). This, then, is the effective variance to use in a d' calculation (i.e., difference in the mean normalized by a standard deviation). The result is

$$d'(t) = \frac{(\mu_a - \mu_b)\,t}{\sqrt{t(\eta^2 t + \sigma^2)}} = \frac{A}{\sqrt{1 + v^2/t}}, \qquad A = \frac{\mu_a - \mu_b}{\eta}, \quad v = \frac{\sigma}{\eta}.$$
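To make the calculation concrete, here is a minimal numerical sketch. It assumes that Eq. 9.9 is the Gaussian density with mean mt and variance σ²t, so that averaging over a Gaussian drift with mean μ_s and variance η² gives mean μ_s t and variance t(η²t + σ²). Then d' is the difference in means divided by the common standard deviation, which simplifies to A/√(1 + v²/t) with A = (μ_a − μ_b)/η and v = σ/η. The function names and parameter values are illustrative only.

```python
import math


def d_prime_direct(t, mu_a, mu_b, eta, sigma):
    """d' as the difference in means over the effective standard deviation."""
    mean_difference = (mu_a - mu_b) * t
    effective_sd = math.sqrt(t * (eta ** 2 * t + sigma ** 2))
    return mean_difference / effective_sd


def d_prime_satf(t, mu_a, mu_b, eta, sigma):
    """The same quantity in the form A / sqrt(1 + v^2 / t)."""
    A = (mu_a - mu_b) / eta  # asymptotic d' as t grows without bound
    v = sigma / eta          # sets how quickly the asymptote is approached
    return A / math.sqrt(1.0 + v ** 2 / t)


if __name__ == "__main__":
    # Illustrative (assumed) parameter values.
    params = dict(mu_a=1.0, mu_b=0.0, eta=0.5, sigma=1.0)
    for t in (0.1, 0.5, 1.0, 5.0, 50.0):
        print(t, d_prime_direct(t, **params), d_prime_satf(t, **params))
```

The two functions agree for every t > 0, and d' rises monotonically with t toward the asymptote A, which is the qualitative shape expected of a speed-accuracy tradeoff function.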
This expression for d' was listed in Section 6.4.3 [with A = (μ_a − μ_b)/η and v = σ/η] as one of several proposed SATFs. In the study of reaction-time distributions, Eq. 9.9 itself is of little direct interest, since one is concerned with first passage times, that is, times when the process first crosses one of the boundaries and so initiates a response. So the key question is the distribution of first passage times. It turns out that the easiest way to get at that is via a discrete-time approximation to the process.

9.2.2 An Alternative Derivation
The absolutely simplest, and classical, random walk is the one in which the process makes unit jumps, one unit up with probability p and one unit down with probability q = 1 − p. It is assumed that the process begins at Z and that there are boundaries at A and 0. This is a special case of the random walk model of Chapter 8, one having two virtues: its properties are very well understood and, under an appropriate limiting procedure, it approaches the above diffusion process. This discrete process goes under the name gambler's ruin because of its interpretation in terms of two gamblers, having amounts of money Z and A − Z to begin with, who gamble for one unit at a time until one or the other possesses all of the money. Let P(Z) denote the probability that the process is ultimately absorbed at 0. Observe that if 0 < Z < A, then after the first step the process is either at Z + 1 with probability p or at Z − 1 with probability q, so P(Z) must satisfy the recursive equation

$$P(Z) = pP(Z + 1) + qP(Z - 1) \tag{9.10}$$

and the boundary conditions P(0) = 1 and P(A) = 0.
Following Feller (1957, p. 314), when p ≠ q, it is easy to verify that P(Z) = 1 and P(Z) = (q/p)^Z are particular solutions to Eq. 9.10. The general solution is well known to be a linear combination of the two particular ones, which when we take into account the boundary conditions yields the unique solution

$$P(Z) = \frac{(q/p)^A - (q/p)^Z}{(q/p)^A - 1}. \tag{9.11}$$

For p = q = 1/2, the two particular solutions just used are not distinct; however, in this case it is easy to verify that P(Z) = 1 and P(Z) = Z are distinct solutions, and so

$$P(Z) = 1 - \frac{Z}{A} \tag{9.12}$$

is the general solution meeting the boundary conditions. More generally, one would like to consider the probability that the process beginning at Z first gets absorbed at 0 on trial n; denote this P(Z, n). As above, we see that it satisfies the recursion

$$P(Z, n + 1) = pP(Z + 1, n) + qP(Z - 1, n) \tag{9.13}$$

along with the boundary conditions

$$P(0, n) = P(A, n) = 0 \ \text{for } n \ge 1, \qquad P(0, 0) = 1, \qquad P(Z, 0) = 0 \ \text{for } Z > 0.$$
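The two routes to the absorption probability can be checked against each other numerically. The sketch below assumes the classical gambler's ruin results following Feller: the closed form P(Z) = [(q/p)^A − (q/p)^Z]/[(q/p)^A − 1] for p ≠ q and P(Z) = 1 − Z/A for p = q = 1/2, together with the first-absorption recursion P(Z, n + 1) = pP(Z + 1, n) + qP(Z − 1, n) with P(0, 0) = 1 and all other boundary and initial values 0. Summing P(Z, n) over n should reproduce P(Z). The function names and the truncation at a finite number of steps are illustrative choices.

```python
def ruin_probability(Z, A, p):
    """Closed-form probability of eventual absorption at 0, starting from Z."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        return 1.0 - Z / A
    ratio = q / p
    return (ratio ** A - ratio ** Z) / (ratio ** A - 1.0)


def ruin_probability_by_recursion(Z, A, p, n_steps=5000):
    """Sum over n of P(Z, n), the probability of first absorption at 0 on trial n.

    Intended for an interior starting point 0 < Z < A. The sum is truncated
    at n_steps, which is ample when A is small because the probability of
    still being unabsorbed decays geometrically.
    """
    q = 1.0 - p
    current = [0.0] * (A + 1)
    current[0] = 1.0                 # P(0, 0) = 1, P(Z, 0) = 0 for Z > 0
    total = 0.0
    for _ in range(n_steps):
        following = [0.0] * (A + 1)  # P(0, n) = P(A, n) = 0 for n >= 1
        for z in range(1, A):
            following[z] = p * current[z + 1] + q * current[z - 1]
        total += following[Z]
        current = following
    return total


if __name__ == "__main__":
    for p in (0.45, 0.5, 0.55):
        print(p, ruin_probability(3, 10, p), ruin_probability_by_recursion(3, 10, p))
```

The intermediate lists in the recursion are exactly the discrete first-passage distribution P(Z, n), so the same loop could be used to tabulate that distribution rather than only its sum.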
Next we want to show that by taking appropriate limits, this process evolves into the diffusion one. We want to shift both our unit of time and of distance from 1 to small quantities we call Δ and δ. In particular, if t and z are fixed quantities, we let n and Z go to ∞ and Δ and δ to 0 in such a way that nΔ = t and Zδ = z. Moreover, we let Δ and δ go to zero in such a way that δ²/Δ is constant, and p − q = 2p − 1 approaches 0 in such a way that (p − q)δ/Δ is constant. In fact, select the constants to be

$$\delta^2/\Delta = \sigma^2, \qquad (p - q)\,\delta/\Delta = m.$$

Then, formally, Eq. 9.13 in the units Δ and δ becomes

that is,

Doing a certain amount of elementary algebra,

Dividing by Δ and taking the limits as Δ and δ approach 0, we see that the diffusion equation 9.8 follows. We may also carry out the limiting process on Eq. 9.11 for the probability of absorption, as follows. Observe from the definition of m,

and so under the above limiting process

Letting a = δA, we see that Eq. 9.11 becomes
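The limiting argument can be illustrated numerically. The sketch below assumes the scaling δ²/Δ = σ² and (p − q)δ/Δ = m, so that p = (1 + mδ/σ²)/2, and it assumes the gambler's ruin closed form P(Z) = [(q/p)^A − (q/p)^Z]/[(q/p)^A − 1]. Under that scaling (q/p)^(z/δ) tends to exp(−2mz/σ²), which suggests the limiting absorption probability [exp(−2ma/σ²) − exp(−2mz/σ²)]/[exp(−2ma/σ²) − 1]. The function names and parameter values are hypothetical, and the code requires m ≠ 0.

```python
import math


def discrete_absorption(z, a, m, sigma, delta):
    """Gambler's ruin absorption probability at 0 on a lattice of spacing delta."""
    p = 0.5 * (1.0 + m * delta / sigma ** 2)  # from (p - q) * delta / Delta = m
    q = 1.0 - p
    Z = round(z / delta)                      # starting point in lattice units
    A = round(a / delta)                      # upper barrier in lattice units
    ratio = q / p
    return (ratio ** A - ratio ** Z) / (ratio ** A - 1.0)


def diffusion_absorption(z, a, m, sigma):
    """Assumed limiting form of the absorption probability at 0 (m != 0)."""
    k = 2.0 * m / sigma ** 2
    return (math.exp(-k * a) - math.exp(-k * z)) / (math.exp(-k * a) - 1.0)


if __name__ == "__main__":
    z, a, m, sigma = 0.5, 1.0, 0.8, 1.0  # illustrative values
    limit = diffusion_absorption(z, a, m, sigma)
    for delta in (0.1, 0.01, 0.001):
        approx = discrete_absorption(z, a, m, sigma, delta)
        print(delta, approx, abs(approx - limit))
```

Shrinking δ while holding z, a, m, and σ fixed drives the lattice value toward the continuous one, which is the sense in which the discrete process "evolves into" the diffusion.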
Considering the difficulty experienced in developing explicit expressions for the distribution of first passage times in the general random walk model of Chapter 8, it is rather remarkable that one can in fact do so for the general diffusion problem. One way to do it is to develop explicit expressions for the distribution of the trial of the gambler's ruin, and then take limits to derive expressions for the distribution of first passage times in the general diffusion problem with barriers. The detailed development is given in a number of sources including Feller (1957), Karlin and Taylor (1975), Ratcliff (1978, Appendix), and Sampath and Srinivasan (1977). The result is that the density of first passage times at the 0 boundary is simply a weighted sum of exponentials (recall that this was also the conclusion of the cascade model, Section 4.3.2):
where
The expression for a crossing of the other boundary is obtained by replacing z by a − z and m by −m. The hazard function corresponding to this density is
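As a numerical illustration, the sketch below assumes the standard series form of the first-passage density at the 0 boundary for a diffusion with drift m and variance parameter σ², started at z between barriers at 0 and a: g(t) = (πσ²/a²) exp(−mz/σ²) Σ_k k sin(πkz/a) exp[−C(k)t], with C(k) = ½(m²/σ² + π²k²σ²/a²). This form is an assumption, taken here to correspond to the weighted sum of exponentials described above. The hazard is computed as g(t) divided by the integral of g from t to ∞, that is, conditional on eventual absorption at 0. All function names and parameter values are hypothetical.

```python
import math


def decay_rate(k, a, m, sigma):
    """C(k) = (1/2) * (m^2 / sigma^2 + pi^2 k^2 sigma^2 / a^2)."""
    return 0.5 * (m ** 2 / sigma ** 2 + (math.pi * k * sigma / a) ** 2)


def _weights(z, a, n_terms):
    """Pairs (k, k * sin(pi k z / a)) for k = 1, ..., n_terms."""
    return [(k, k * math.sin(math.pi * k * z / a)) for k in range(1, n_terms + 1)]


def _prefactor(z, a, m, sigma):
    return (math.pi * sigma ** 2 / a ** 2) * math.exp(-m * z / sigma ** 2)


def density_at_zero(t, z, a, m, sigma, n_terms=200):
    """Truncated series for the first-passage density at the 0 boundary."""
    series = sum(w * math.exp(-decay_rate(k, a, m, sigma) * t)
                 for k, w in _weights(z, a, n_terms))
    return _prefactor(z, a, m, sigma) * series


def hazard_at_zero(t, z, a, m, sigma, n_terms=200):
    """g(t) divided by the integral of g over (t, infinity)."""
    numerator = 0.0
    denominator = 0.0
    for k, w in _weights(z, a, n_terms):
        c = decay_rate(k, a, m, sigma)
        term = w * math.exp(-c * t)
        numerator += term
        denominator += term / c  # each exponential integrates to term / C(k)
    return numerator / denominator


def absorption_probability_from_series(z, a, m, sigma, n_terms=20000):
    """Integral of the density over all t, summed term by term (slow convergence)."""
    series = sum(w / decay_rate(k, a, m, sigma) for k, w in _weights(z, a, n_terms))
    return _prefactor(z, a, m, sigma) * series


def absorption_probability_closed_form(z, a, m, sigma):
    """Probability of reaching 0 before a for drift m != 0."""
    k = 2.0 * m / sigma ** 2
    return (math.exp(-k * z) - math.exp(-k * a)) / (1.0 - math.exp(-k * a))


if __name__ == "__main__":
    z, a, m, sigma = 0.5, 1.0, 0.5, 1.0  # illustrative values
    print(decay_rate(1, a, m, sigma))
    for t in (0.05, 0.2, 1.0, 5.0):
        print(t, density_at_zero(t, z, a, m, sigma), hazard_at_zero(t, z, a, m, sigma))
```

Tabulating hazard_at_zero over a grid of t is one inexpensive way to explore the shape of the hazard for particular parameter values; the total area under the density can be compared with the closed-form absorption probability as a consistency check on the assumed series.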
Since C(k) > C(1), the limit as t → ∞ is C(1). I do not know the shape of this hazard function. We return to Ratcliff's successful application of this model to memory data in Section 11.3.6.

9.2.3 Generalizations
Several generalizations of the standard diffusion process are worth referencing. The first of these investigated time-dependent absorbing boundaries. Anderson (1960) examined the case of a linear time change. Later Kryukov (1976) assembled together and generalized a number of results in the literature in which the boundaries are of the form
He also studied decaying jump-like processes, somewhat analogous to those of Section 4.3.2. These generalizations have been motivated primarily by data on neural spike trains, and they have not been applied to reaction-time models. A detailed survey of the neural models is provided by Sampath and Srinivasan (1977). The other major generalization, due to R. Ratcliff, assumes that the drift parameter is itself a random variable, Gaussian distributed with mean and variance as parameters of the model. This generalization is relatively easy to analyze primarily because the diffusion equation is linear with constant coefficients. And it is, to my knowledge, the only version of the diffusion model that has been applied to reaction-time data, specifically to matching experiments in which two strings of symbols are presented successively and the subject is required to say whether they are the same or different. As this experiment and the models for it come more naturally in Chapter 11, I postpone discussion until then (Sections 11.3.6 and 11.4.4).
9.2.4 Poisson Processes
As was mentioned at the beginning of Section 9.2, the second basic additive (Eq. 9.4) continuous-time stochastic process is the one with the cumulant generating function (Eq. 9.6). Recall that for a Poisson process with parameter m the distribution of counts during duration t is the Poisson distribution with parameter mt:

$$P[N(t) = k] = \frac{(mt)^k e^{-mt}}{k!}, \qquad k = 0, 1, 2, \ldots,$$

and so
Thus, we see that the second solution to the additive property is the Poisson process, that is, a counting process in which the times between counts are independently, identically, and exponentially distributed. Put another way, it is a process with a constant hazard function. As was noted earlier, such processes are very basic. Judging by data on the electrical activity of single peripheral neurons, Poisson processes can serve as first approximations to the spike trains generated in the peripheral nervous system when a constant signal is imposed. But in detail, such a model is incorrect on at least two counts. First, when a signal comes on, the hazard function of the neural spike train usually exhibits a significant overshoot and then drops back to a constant level as long as the signal remains present. (This is not unlike the hazard functions discussed in Chapter 4.) Second, after each spike occurs, the neuron is refractory for something on the order of half a millisecond, after which it rapidly recovers to its normal level. In terms of the hazard function, it drops to zero immediately after a firing, and at about ½ msec it rises to its constant level until the next firing. So, it is really necessary to modify the model somewhat, which means we can no longer assume a purely additive process. The literature includes a number of modifications, each of which has virtues and faults. Many of them are summarized in Sampath and Srinivasan (1977). Here I mention three, two of which are described more fully in the next sections. The first postulates a generalized Poisson process in which spiking is governed by a hazard function that is locked to the signal and is distinctly not constant. Some such process is strongly implicated by data on the patterning of interspike distributions that, in the auditory system at least, are multimodal and phase locked to the frequency of a pure tone signal.
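A minimal simulation sketch of the basic, unmodified process: events are generated by summing independent exponential interevent times with constant hazard m, and the count in a window of duration t should then have mean and variance both close to mt. The function names, the seed, and the parameter values are illustrative assumptions; refractoriness, onset overshoot, and phase locking are deliberately not modeled.

```python
import random
import statistics


def poisson_count(rate, duration, rng):
    """Count events in [0, duration] when interevent times are exponential."""
    count = 0
    elapsed = rng.expovariate(rate)  # time of the first event
    while elapsed <= duration:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def simulate_counts(rate, duration, n_trials, seed=0):
    """Independent replications of the count, using a seeded generator."""
    rng = random.Random(seed)
    return [poisson_count(rate, duration, rng) for _ in range(n_trials)]


if __name__ == "__main__":
    rate, duration = 10.0, 1.0  # illustrative values: expected count is 10
    counts = simulate_counts(rate, duration, n_trials=5000, seed=1)
    print("mean count:", statistics.mean(counts))
    print("variance of count:", statistics.variance(counts))
```

Equality of the mean and the variance of the count is a convenient first check on whether a recorded spike train is even roughly Poisson; a dead time after each event would pull the variance below the mean.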
This can be modeled readily by having the hazard function vary with the signal. Such models fail in one major respect: they do not deal with the refractoriness of the neuron. Models of this type with applications to detection can be found in Siebert (1968, 1970) and are summarized briefly by Green and Luce (1974). So far as I know, they have not played a direct role in reaction-time analyses. A second generalization, which I discuss in Section 9.2.5, entails a curious blending of the Poisson and diffusion processes. The former is used to model the accumulation of sensory information, much in the spirit of the Grice and McClelland models (Sections 4.3.1-2), and the diffusion process with drift is used to model a variable response threshold. A third approach is to generalize the Poisson process to a general renewal counting process in which the times between successive spikes are independent, identically distributed random variables, but not exponential. It is obviously easy in this model to encompass refractoriness simply by setting the density equal to zero for the first 5 msec. It is equally obvious that the model fails to capture the phase-locked character of the data since, by assumption, the successive intervals are independent, which is just what they cannot be in a phase-locked system. Despite the latter weakness, these models have been developed in some detail and provide interesting accounts of some data. They are developed in Section 9.3.

9.2.5 Diffusion Boundaries and Poisson Accumulation
Viviani (1979a, b; Viviani & Terzuolo, 1972) has presented a model that incorporates both of the additive processes—diffusion and Poisson—into a generalization of the strength or information accumulation models we have examined earlier (see Sections 4.3.2 and 4.3.3). As in all such models, information about the stimulus accumulates gradually, and when it crosses a boundary criterion a response is initiated. In Viviani's model, the boundary is assumed to be a diffusion process with drift. In a two-stimulus discrimination study, this introduces a total of four parameters: the two initial values of the boundaries, a drift parameter (it being assumed that both boundaries drift at the same rate), and a diffusion coefficient (also assumed to be identical for both boundaries, and called by Viviani "the spectral density of the threshold noise"). The assumption that the boundaries drift towards each other with time is motivated by the data described below. The Gaussian variability of the boundary is as in Grice's model, but its increasing magnitude with time is different and is motivated by two considerations, one empirical and the other a feature of Viviani's accumulator model. His data, like most, indicate that the variability of response times grows with their mean, but his model for accumulation exhibits reduced variability with increasing information. So, if the accumulator model is accepted, there is little option but to assign increasing variability to the boundary.
The accumulator model is unusual in two respects. First, the stimulus postulated is not a simple signal of the type usually employed in choice reaction times, but rather a string of brief pulses of two types—red and green in his experiment—that are randomly interleaved in different proportions. Stimuli of a similar type were encountered earlier in Section 8.5.1 where I summarized work of Sanders and Ter Linden (1967) and Vickers et al. (1971). In fact, Viviani assumed that the stimuli can be treated as the sum of two Poisson processes, one for each color. Of course, the parameters of the two processes can be manipulated by the experimenter. Each process is assumed to set up a strength or information variable in the following manner. Let X(t) denote one of the Poisson processes and let Y(() be a strength variable associated with that process. Assume that Y is governed by the following stochastic differential equation: where cp is called the decay constant and (/> the increment saturation constant. Recall that in previous models of this general character, the dependence was of the general form d\fdt = -(//Y-
where y_0 = Y(0). In the appendix to Viviani (1979b), this solution was recast in another form, and by using cumulant and Taylor-series arguments he developed approximate expressions for the mean and variance of Y. He verified the adequacy of his approximations by simulating the process. There is no need to reproduce those formulas here.

9.2.6 A Flash-Rate Experiment
The above model was then applied with considerable success to the data reported in Viviani (1979a). That experiment had three parameters: d the duration of time during which the two Poisson processes controlling the stimuli were identical and so, on average, the same number of red and green flashes occurred; and the two Poisson parameters i>R and vc;, which characterize the imbalance introduced after duration d. The Poisson parameters may be reported in terms of their sum v = VK + v(J and their ratio r = VR/V(;In the latter notation, the values used were 3 and 5 pulses per second (pps) for v and |, f, 1, §, and 2 for r. (In some of the data plots, the \ and 2 and the
§ and | data are collapsed in the obvious way.) The values selected fo were 8, 10, 12, 14, and 16 sec. Thus, there were 2 x 5 x 5 experimental conditions, and each was run 20 times for a total of 1000 sequences per subject. This was carried out by randomizing them over 10 sessions. Ten subjects were run; however, three failed to complete the 5 pps rate. Judgments occurring during the d -interval, which were few, were eliminated from the analysis. A hierarchical clustering analysis revealed four patterns of behavior. One subject was little affected by either r or d; three were mainly affected by d and not by r; three were affected about equally by both d and r; and three were intermediate between the last two cases. Ignoring the isolated subject, each of the other three groups were pooled and analyzed separately. The data to which the model was fitted were the mean response times and the two independent response probabilities. For the latencies and one of the response probabilities, the pooling resulted in 5 x 3 rather than 5 x 5 data points and the other response probability contributed 5 x 5 , for a total of 55 points. The number of parameters estimated from the data are fewer than might first be anticipated. The choice of ^ is an arbitrary scaling of Y and so was set at 1, and the value of
FIG. 9.1 From an experiment of Viviani (1979a), which is described in the text, involving random mixed sequences of red and green flashes, plots of MRT, probability of responding, and probability of no response as a function of the "foreperiod" parameter d for one subgroup of similar subjects. The parameter r is the ratio of Poisson parameters governing the two light sequences; the left panels have an overall Poisson rate of 3 pps and the right ones, 5 pps. The data represent averaged results of three subjects, with a total of 300 observations per point. The dotted lines show the theoretical values predicted by the model described in the text. [Figure 2 of Viviani (1979b); copyright 1979; reprinted by permission of the publisher.]
FIG. 9.2 MRT to errors as a function of the Poisson ratio ν_R/ν_G (denoted n_R/n_G on plot) for the experiment of Figure 9.1. The legend shows the code for each of the groups of similar subjects, whose data are plotted separately. The dotted lines are predictions from the theory. [Figure 6 of Viviani (1979b); copyright 1979; reprinted by permission of the publisher.]
it is clear that the worse the error (largest and smallest values of r) the faster the response. Viviani discussed briefly the possibility of other models accounting for his data. For example, he argued (correctly, I believe) that the accumulation models of Section 8.1 will not do since if certain particular sequences were sufficient to trigger a response, then all subjects should be sensitive to changes in r, but some are not. The SPRT model is, of course, rejected because observed mean time to respond depends on whether a response is correct or an error. The SSR model was not disposed of by a qualitative argument nor was any attempt made to fit it to his data. Two other models will be considered shortly.
9.3 RENEWAL PROCESSES
As I pointed out earlier, if information accumulation forms a counting process, there are two quite distinct types of decision rule. One is familiar: an information boundary is established and its crossing initiates a decision. This determines the response time. But there is a major difference from earlier models because in a counting process, the count can never decrease. The difference between two signals is found not in the barrier crossed, but in the rate at which the count tends to increase. The value of the number of counts that have occurred (which is equivalent to the location of the boundary) divided by the time to reach the boundary is an estimator of the rate of accumulation of information. If the signals differ in their rates, as is the case for intensity discrimination, then the estimated rate can serve as a decision variable, as in the theory of signal detectability. Note that in this model all decisions are based upon the same sample size, which is uniquely specified by the boundary. Because the random variable observed is the time to achieve a fixed count, these are called timing models. The specific model worked out below is somewhat more complex than this because the information is accumulated in parallel as well as in series.
The other decision procedure is to fix a time boundary on the accumulation—that is, a fixed observation time. This time, which may be a random variable, determines the response time. Again the decision is based on an estimate of the rate, this time estimated by the ratio of the observed count to the fixed observation time. Because the observed random variable is a count, these are called counting models. In each type of model it is plausible to suppose that the total sample of information is obtained by making observations on all of the neural fibers activated by the signal. Although it is surely somewhat of an oversimplification, let us suppose that all of the active fibers are statistically identical, and so each carries the same amount of information about the signal. [Wandell
(1977) has shown that this is not a serious restriction in, at least, the Poisson case.] Aside from the fact that physiological observation indicates massive parallelism in peripheral sensory systems, it is moderately clear from what we know that something of this sort must be the case. Decisions are reached in a matter of a few hundreds of milliseconds, and neural firing rates are of the order of a hundred spikes per second. Thus, a single channel provides only tens of spikes in the time available, and such small samples simply are not sufficient to permit the accuracy of discrimination that is observed. Thus, the central nervous system must have available more information than is carried by a single fiber. Before turning to the specific models, several a priori observations are in order. First, each type of model requires some form of timer and some form of counter. Hypothetical neural networks can be devised to count and to time. In a counting model the timer is used to determine the observation interval (i.e., the time boundary) and the counter to count the spikes that occur. In a timing model the counter is used to determine the sample size (i.e., the count boundary) and the timer to time how long it takes to achieve the count. Thus, if the relevant sensory nervous system is capable of being reprogrammed, we should anticipate finding evidence for neither model or evidence for both. Second, in experiments with signals of fixed, brief duration (as is typical of most psychophysics) the counting model seems more appropriate because, by setting the observation window to correspond to signal duration, the exactly relevant information can be collected and processed by the nervous system.
Were the subject to use the timing procedure, the fixed sample size would have to compromise between the different values that would be appropriate for the weaker and stronger signals, and a compromise choice must fail to use all of the available information when the strong signal is presented and to include irrelevant information (outside the signal interval) when the weak signal is presented. Third, if the organism is faced with signals of irregular durations and intensities, then the counting procedure may well be inferior to the timing one. The counting procedure guarantees a fixed response time independent of the signal presented, but the quality of discrimination must vary greatly, being exceedingly poor for weak signals and very good for intense ones. The timing procedure maintains equal quality of discrimination, but it does so at the expense of being slower to weak signals than to strong ones. Since the ordinary world is surely populated with signals of irregular durations and intensities, and the laboratory tends to use fixed durations, we cannot ignore the possibility that our training procedures serve, in part at least, to reprogram subjects from timing to counting modes of behavior. If so, our data may systematically misinform us about most ordinary human processing of discriminative stimuli. In any event, it seems clear that if we wish to study the availability of these two strategies, it will be necessary to design experiments aimed at eliciting both strategies and to some extent, therefore, they will have to differ from the standard mold.
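The contrast between the two decision procedures can be illustrated with a small simulation. This is only a sketch under stated assumptions: the spike trains are taken to be Poisson (the text allows general renewal processes), the timing rule is assumed to wait for the slowest channel, and the function names and the numerical rates are hypothetical.

```python
import random

def counting_rule(rate, n_channels, window, rng):
    """Fixed observation time: count spikes on every channel for `window` sec."""
    total = 0
    for _ in range(n_channels):
        t = rng.expovariate(rate)          # time of the first spike on this channel
        while t <= window:
            total += 1
            t += rng.expovariate(rate)     # add the next interspike time
    rate_estimate = total / (n_channels * window)
    return rate_estimate, window           # decision time does not depend on the rate

def timing_rule(rate, n_channels, count, rng):
    """Fixed count: time how long each channel takes to produce `count` spikes."""
    waits = [sum(rng.expovariate(rate) for _ in range(count))
             for _ in range(n_channels)]
    rate_estimate = n_channels * count / sum(waits)
    return rate_estimate, max(waits)       # assumption: wait for the slowest channel

if __name__ == "__main__":
    rng = random.Random(1)
    for rate in (50.0, 100.0):             # hypothetical weak and strong signals
        print(rate, counting_rule(rate, 10, 0.3, rng), timing_rule(rate, 10, 30, rng))
```

Under these assumptions the counting rule returns the same decision time for both rates while its sample size varies with the rate, whereas the timing rule holds the sample size fixed and takes longer on the weaker signal.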
9.3.1 A Counting Model
The psychological literature on counting models is modest, but growing. Much of it focuses on discrimination, and somewhat less on the speed-accuracy tradeoff. Some of the relevant discrimination papers are: Lachs & Teich (1981), Luce & Green (1972, 1974), McGill (1960, 1967), McGill & Goldberg (1968), Prucnal & Teich (1983), Teich & Cantor (1978), Teich & Lachs (1979), Teich & McGill (1976), and Vannucci & Teich (1978). The major references to speed-accuracy tradeoffs are fewer still: Green & Luce (1973, 1974), Luce & Green (1972), Wandell (1977), and Wandell & Luce (1978). Since in a discrimination paradigm the signals are really quite similar, it is not unreasonable to suppose that both activate the same number, J, of channels. Undoubtedly the observation time really is a random variable, but we shall treat it as a constant, δ, and assume all of the observed variability in the response times is due to other times such as the residual time and computation time. As an approximation, assume that the data collected from each of the J channels over the observation interval δ are the same as if they had been collected from a single channel for a total time of Jδ. Further, assume that the renewal process generating the spike train has interspike random variables with finite mean μ and variance σ² and that the firing rate, 1/μ, is sufficiently large so that the number of counts observed in time Jδ, N = N(Jδ), can be considered to have a distribution close to the asymptotic distribution. Then, according to Theorem A.3, the distribution of N is approximately Gaussian with mean Jδ/μ and variance Jδσ²/μ³. (As I noted after that theorem, the variance expression is not particularly intuitive.) Consider two signals s_i, i = 1, 2, whose stochastic representations are renewal processes with means μ_i and variances
obtained by using a response criterion N_0 are (following a development parallel to that of Section 6.4.3 for the fixed-stopping rule model)
Eliminating N_0 from the two equations and solving for z_2, we obtain the following equation for the ROC curve:
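A sketch of this elimination, using only the Gaussian approximation stated above (mean Jδ/μ_i, variance Jδσ_i²/μ_i³). The sign convention is an assumption: each z_i is taken to be the z-score of the probability that the count exceeds N_0 when s_i is presented, which makes the intercept positive when μ_1 > μ_2.

```latex
% Assumed convention: z_i is the z-score of Pr(N > N_0 | s_i).
z_i = \frac{J\delta/\mu_i - N_0}{(J\delta)^{1/2}\,\sigma_i\,\mu_i^{-3/2}},
\qquad i = 1, 2.

% Solve each expression for N_0 and equate the two:
\frac{J\delta}{\mu_1} - z_1 (J\delta)^{1/2}\sigma_1\mu_1^{-3/2}
  = \frac{J\delta}{\mu_2} - z_2 (J\delta)^{1/2}\sigma_2\mu_2^{-3/2}.

% Solve for z_2 (a straight line in z-score coordinates):
z_2 = \left(\frac{\mu_2^{3}/\sigma_2^{2}}{\mu_1^{3}/\sigma_1^{2}}\right)^{1/2} z_1
      + (J\delta)^{1/2}\,\frac{(\mu_1-\mu_2)\,\mu_2^{1/2}}{\mu_1\,\sigma_2}.
```

In the Poisson case, where σ_i = μ_i, the slope reduces to (μ_2/μ_1)^{1/2} and the intercept to (Jδ)^{1/2}(μ_1 − μ_2)/(μ_1 μ_2^{1/2}).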
Suppose μ³/σ² is an increasing function of μ (which it is in the Poisson case since μ³/σ² = μ, and will most likely be for any reasonable process); then
since μ_1 > μ_2, we see that the ROC curve is a straight line in z-score coordinates with a slope less than 1. Furthermore, if we treat the right term of Eq. 9.18 (which is the value of z_2 corresponding to z_1 = 0) as a measure of discriminability, called d', and if we eliminate δ between the expressions for d' and E(T) = r_0 + δ, we obtain as the speed-accuracy tradeoff
Recall from Section 6.4.3 that such a square-root relation has been arrived at from other models as well as having been proposed as an empirical approximation to the speed-accuracy tradeoff function. It should be recognized that in this model the reaction-time distribution is a function neither of the response made nor of the signal presented. That is not typical of most of the data we have previously encountered (for example, Viviani's above), which therefore cannot possibly be fitted by a counting model. However, as was noted above, the counting model may entail reprogramming of the subject that will occur only in experiments with fixed duration signals and for which a great deal of data is collected from each subject. Viviani's experiment did not satisfy the first condition, and many "cognitive" experiments do not meet the second one. We turn next to a model that is better suited to signals of indefinite duration.
9.3.2 A Timing Model
Assume exactly the same setup as for the counting model, except that instead of a fixed observation time there is a fixed count, K, on each channel. We again assume that the total sample, now JK, is sufficiently large so that the decision variable can be assumed to be approximately distributed according to the central limit theorem. Thus, the total time to observe this many counts is distributed approximately as a Gaussian with a mean of JKμ and a variance of JKσ². If the response is determined by a temporal response criterion, T_0, then the two z-scores are given by
and so eliminating T_0 we find for the ROC curve
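A sketch of this step under the Gaussian approximation just stated (mean JKμ_i, variance JKσ_i²). As an assumed convention, not fixed by the text, z_i is taken to be the z-score of the probability that the total time falls below T_0 when s_i is presented.

```latex
% Assumed convention: z_i is the z-score of Pr(T < T_0 | s_i).
z_i = \frac{T_0 - JK\mu_i}{(JK)^{1/2}\,\sigma_i}, \qquad i = 1, 2.

% Solve each expression for T_0 and equate the two:
JK\mu_1 + z_1 (JK)^{1/2}\sigma_1 = JK\mu_2 + z_2 (JK)^{1/2}\sigma_2.

% Solve for z_2:
z_2 = \frac{\sigma_1}{\sigma_2}\, z_1 + (JK)^{1/2}\,\frac{\mu_1 - \mu_2}{\sigma_2}.
```

With μ_1 > μ_2 and σ increasing with μ (σ_i = μ_i in the Poisson case), the slope σ_1/σ_2 exceeds 1, the opposite of the counting model.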
Observe that if
and μ_2 are not too similar, this provides a striking criterion for deciding which model is better suited to a body of data. As in the counting model, a measure of similarity is the d' value given by the right term of Eq. 9.20. However, working out the speed-accuracy tradeoff equation is more complex because the observation times are not fixed; indeed, they are intimately involved with what happens in the discrimination. No one has yet worked out usable expressions for E(T) as a function of both the signal and the response to it. Formally, the problem to be solved is this: let T_j, j = 1, 2, ..., J, be independent and identically distributed random variables (i.e., the times observed on the separate channels); the task is to find the distribution of max(T_j) subject to the condition that ΣT_j
Changing variables to y = [t − (K + 1)μ]/(K + 1)^{1/2}σ and letting H(J) denote the mean of the maximum of J Gaussian random variables with mean 0 and variance 1, we see that the mean reaction time to the signal is
where I have written this so that the first three terms form the irreducible minimum time when K = 0 and so d' = 0. Observe that this minimum time is somewhat larger than it is in the counting model, and unlike that model the mean times differ for the two signals. If we introduce the variable and then eliminate K between the d' and E(T | s_i) expressions, we get a formula for the speed-accuracy tradeoff. The equation can be found in Green and Luce (1974, p. 390). Suffice it to say that the initial rise of the function (d')² versus M_i at M_i = 0 exceeds the initial rise of the comparable expression in the counting model by a factor of μ_1/μ_2 when signal 1 is presented and by
the square of that when signal 2 is presented. So the tradeoff is predicted to start later and to be appreciably steeper in the timing model than in the counting model. Equation 9.21 affords another prediction when σ_i = μ_i, which holds in the Poisson case, namely, that So, to the extent that the assumption holds, the slope of this curve should be exactly that of the ROC curve, Eq. 9.20.
9.3.3 Flash-Rate Data
Viviani's (1979b) experiment seems to invite timing behavior: the stimuli are themselves counting processes of the sort envisaged in the model and they continue until the subject makes a response. He did not attempt to fit the model to the data because of the following argument. If a fixed sample size is used, then the ratio of times to achieve that sample in the conditions with rates of 3 and 5 presentations per second should stand in the ratio 5/3 = 1.67. The empirically observed ratio of response times was about 1.5, from which he concluded the timing model is incorrect for these data. The argument is unconvincing to me on two counts. First, it is not clear why one should assume that subjects will necessarily maintain the same sample size for the two rates; it is plausible that they might use a smaller one for the slower rate in order to accelerate the pace of the experiment, which was very slow. Second, the argument makes the implicit assumption that the difference between information accumulation time and response time is negligible. Given that the response times were 10 to 15 sec, the usual estimated residual time of 100 msec is surely negligible, but that may not be a very good estimate for the following reason. After the information has been accumulated, a certain amount of data processing is necessary, and it is not difficult to envisage such, possibly cognitive, processing taking a matter of seconds. For example, suppose the response times to the slow and fast rates are 15 sec and 10 sec, giving the observed ratio of 1.5, and that the "residual" time is 2.5 sec; then the accumulation ratio is (15 − 2.5)/(10 − 2.5) = 1.67, as it should be on the assumption of constant sample size. It appears to me that a more serious attempt to fit the timing model is needed before it can be rejected as wrong for these data.
9.3.4 Deadline Data
To test these models, Green and Luce (1973) ran a simple Yes-No detection experiment for auditory signals in noise (Appendix C.6 and Section 7.7.2). Recall that from the point of view of psychophysics the experiment was unusual in two respects: the signals were response terminated and response deadlines were imposed. In one condition, the deadline was applied on all trials, in which case the counting strategy, if available, is appreciably more
FIG. 9.3 ROC data from Green and Luce's (1973) auditory detection study with response deadline on all trials (Appendix C.6). For each observer, there are 1500 observations per condition. [Figure 4 of Green and Luce (1973); copyright 1973; reprinted by permission.]
efficient than the timing strategy in making use of the information generated by the stimulus on each trial. In another condition, the deadline was applied only on signal trials, in which case the timing strategy, if available, with the count chosen to complete the signal trials just under the deadline, is the more efficient strategy. The reason is that on noise trials the rate is slower, and so the count will often exceed the deadline for signal trials, but that does not matter since it is not applied on noise trials. For quality of decision, however, it is better to have a larger than a smaller sample size. The speed-accuracy curves were developed by varying the deadline from 300 msec to 2000 msec, and the ROC curves were developed by varying the payoffs at the 600-msec deadline. Except for the longest deadlines, the MRT data for the sn-deadline condition seem relatively independent of the signal and the response, as must be the case for the counting model. The times in the s-deadline data are clearly longer for noise trials than for signal ones. As I noted above, exact predictions do not exist for the mean time partitioned by responses, E(T | s, r), but some qualitative deductions seem plausible. An error response occurs when the total time observed was on the wrong side of the criterion, which suggests that in the case of signal trials the mean response time for errors should be unusually slow, that is, E(T | s, N) > E(T | s, Y), whereas for noise trials the errors should be unusually fast, that is, E(T | n, N) > E(T | n, Y). Of the 78 comparisons, eight (or 10%) violate these inequalities. Of the eight, several differences are so small that they probably can be attributed to chance, and five arise either at the longest or the shortest times, both of which may be somewhat peculiar. The short ones may well be contaminated by fast guesses and the longer ones, by a shifting criterion.
The most striking confirmations of the theory are found in the plots of the ROC curves, of MRT(n) versus MRT(s), and the speed-accuracy tradeoffs. The sn-deadline ROC data shown in Figure 9.3 all exhibit slopes less than 1, which is usual in psychophysical data, and the s-deadline ROC data shown in Figure 9.4 all have slopes greater than 1, as predicted by the timing model. The MRT plots for the s-deadline experiment, Figure 9.5, have slopes virtually identical to those for the ROC curve, also as predicted.
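The opposite slope predictions can be checked numerically. The following is a minimal sketch, assuming the Gaussian approximations of Sections 9.3.1 and 9.3.2 and a sign convention under which d' is positive when μ_1 > μ_2; the function names and the interspike values are hypothetical and are not taken from Green and Luce's data.

```python
import math

def counting_roc(mu1, sigma1, mu2, sigma2, J, delta):
    """Slope and intercept (d') of the z-score ROC line for the counting model.

    Assumes the pooled count over time J*delta is Gaussian with mean
    J*delta/mu and variance J*delta*sigma**2/mu**3.
    """
    slope = (sigma1 / sigma2) * (mu2 / mu1) ** 1.5
    d_prime = math.sqrt(J * delta) * (mu1 - mu2) * math.sqrt(mu2) / (mu1 * sigma2)
    return slope, d_prime

def timing_roc(mu1, sigma1, mu2, sigma2, J, K):
    """Slope and intercept (d') of the z-score ROC line for the timing model.

    Assumes the total time for J*K counts is Gaussian with mean J*K*mu
    and variance J*K*sigma**2.
    """
    slope = sigma1 / sigma2
    d_prime = math.sqrt(J * K) * (mu1 - mu2) / sigma2
    return slope, d_prime

if __name__ == "__main__":
    # Hypothetical Poisson case (sigma = mu): mean interspike times in seconds.
    mu_noise, mu_signal = 0.011, 0.010
    print("counting:", counting_roc(mu_noise, mu_noise, mu_signal, mu_signal, J=10, delta=0.3))
    print("timing:  ", timing_roc(mu_noise, mu_noise, mu_signal, mu_signal, J=10, K=20))
```

In the Poisson case the counting slope is (μ_2/μ_1)^{1/2} and the timing slope is μ_1/μ_2, so the two fall on opposite sides of 1 whenever μ_1 > μ_2. Quadrupling δ in the counting model doubles d', which is the square-root tradeoff.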
FIG. 9.4 For the same study as Figure 9.3, the ROC data when the response deadline is imposed only on signal trials. Again, there are 1500 observations per condition for each observer. [Figure 9 of Green and Luce (1973); copyright 1973; reprinted by permission.]
Finally, the speed-accuracy tradeoff shown in Figure 9.6 exhibits just the predicted features of the two models: the functions d' versus MRT for the s-deadline data rise from 0 both somewhat later and much more rapidly than for the sn-deadline data, as should be the case if the former is governed by the timing model and the latter by counting. Wandell (1977) replicated and extended this experiment using visual rather than auditory stimuli. He added the condition of a deadline only on
FIG. 9.5 For the same study as Figure 9.3, MRT(n) versus MRT(s) for the s-deadline procedure. Note the similarity of these slopes with those of Figure 9.4. [Figure 7 of Green and Luce (1973); copyright 1973; reprinted by permission.]
FIG. 9.6 For the same study as Figure 9.3, SATF with accuracy measured by d'. The upper left panel is for the condition where the deadline applies to all trials, and it combines the data for all three observers. The other three panels show the data for each observer separately for the condition where the deadline applied only on signal trials. [Figure 11 of Green and Luce (1973); copyright 1973; reprinted by permission.]
noise trials because, as he observed, the s-deadline and n-deadline predictions differ. As we have argued, the s-deadline invites timing behavior. But the n-deadline does not since if the sample size is selected to get the response in by the deadline on noise trials, then on signal trials the sample will be achieved appreciably before the deadline. This means that timing behavior entails less complete use of the available information about the signal presented as compared with counting, and that is inefficient. His data were as predicted: the ROC slopes were >1 for the s-deadline condition and <1 for both the sn- and n-deadline conditions. Wandell and Luce (1978) raised the following question. So far we have assumed without discussion that the information gathered on the several channels is summed in order to arrive at a decision variable. There are other ways a discriminative decision can be reached, the simplest probably being to use the slowest observation over the channels. From Theorems A.5 and A.6 we know that for the types of interspike time distributions we are assuming, the asymptotic distribution should then be the double exponential, not the Gaussian. The whole theory can be reworked on that assumption, transforming the probabilities according to that distribution to get their analogue of the z-scores and then to construct their ROC curves. To our surprise, the data were as linear in these coordinates as in the Gaussian ones, but the slopes were much more extreme. This fact was sufficient to reject the model since slopes of the timing ROC curve and MRT curve were again predicted to be equal, which they were not.
9.4 CONCLUSIONS
One is left feeling considerable lack of closure. In this and the preceding two chapters we have examined a number of stochastic models, each attempting to account in detail for the interplay of errors and response times in two-alternative designs.
And a variety of experiments have been conducted in relatively isolated attempts to check the adequacy of, usually, particular models, and each can be said to have found some support in data and most have also been shown to fail for some data. There have not been, however, any truly comprehensive attempts to fit all models to one body of data, and we do not really have any taxonomy of experiments that would permit us to say, for broad categories of designs, which models appear to be adequate. One can well believe that it matters whether signal duration is fixed or response terminated, whether deadlines are used or not, and whether the signal is a single flash of light at threshold or a mixed sequence of moderately bright light flashes of different colors in a fixed ratio or one of two letters of the alphabet. But no one can say today with any confidence which, if any, of these models is appropriate to each design, although some models are definitely known to fail in certain designs. This is not a satisfactory state of affairs.
10 Absolute Identification of More Than Two Signals
10.1 TYPES OF DESIGNS WITH MORE THAN TWO SIGNALS
Once we admit more than two signals, quite a variety of experimental designs become possible. Some special ones, designed to investigate specific issues, crop up every now and then, but most of the literature is dominated by two designs. The earlier, and perhaps the most natural extension of the two-signal, two-response design of Chapter 6, involves N signals and N responses that are in one-to-one correspondence, so each response uniquely identifies a signal. One signal is presented on each trial, usually but not always selected according to a random schedule, and the subject attempts to identify it absolutely by responding with the "correct" response. Some major reaction-time results for this design are presented in this chapter. Much of the experimental literature involves easily discriminated signals and nearly error-free responding. The impact of number of signals is treated in Section 10.2 and sequential effects in Section 10.3. What there is involving errors and speed-accuracy tradeoffs, and the corresponding theories, is dealt with in Section 10.4. In addition, of course, the psychophysical literature includes quite a number of papers involving unidimensional signals varying along one physical dimension, such as tones varying in intensity, for which the error rate is quite high; however, for the most part, reaction times have not been reported for these studies and so they are not treated here. Since there are pronounced sequential effects in the reaction times for the error-free case and in the probabilities of errors in the error-prone case, it is evident that not only should speed-accuracy be studied more fully, but so should its sequential structure. This is a major lacuna. The other major design (one version of which is called memory scanning and another is called visual search) is a slightly more complex task with a simpler response.
Prior to each trial, or sometimes prior to a run of trials, the subject in a memory scanning experiment is presented with a set of stimuli called the memory set, which is to be committed to (possibly short-term) memory. An experimental trial involves the presentation of a single signal that is either from the memory set or from another distractor set. The subject is to report, as rapidly as is consistent with highly accurate performance, whether or not it was a member of the memory set. The dependent
variable (the time to respond) presumably reflects something of the search process of short-term memory. Visual search designs differ from memory scanning ones by inverting the order of the single signal and the set of signals to which it is matched; the single signal comes first and it is followed by the sequential presentation of possible matches. Again, the response set remains at two, independent of the size of the set of stimulus alternatives. Some of these experiments and proposed interpretations are initially treated in Chapter 11. As the theoretical issues that have grown up around them are rich and somewhat complex, they are treated separately in Chapter 12.
10.2 EXPERIMENTS WITH FEW ERRORS: MEAN TIMES
Once we go beyond two signals, a number of variables can be manipulated, although doing so without confounding them is virtually impossible, as we shall see. We focus initially on three of them: the number of signals, the probability that a particular signal is presented, and the conditional probability of one signal following another. The following qualitative facts are well established. First, assuming the signals are equiprobable, MRT increases or, under certain circumstances (see Section 10.2.3), is constant as the number of alternatives increases (Hick, 1952; Merkel, 1885). This is simply a generalization of the fact that the N = 2 times are slower than the N = 1 (simple) ones. Second, MRT decreases with increasing probability of a signal (Falmagne, 1965; Krinchik, 1969; Leonard, Newman, & Carpenter, 1966). Since as N increases the probability of each signal, 1/N, decreases, the first result is conceivably just a consequence of the second; but, as we shall see, matters are not so simple. Third, MRT to a repeated signal is faster than to a non-repeated one (Kornblum, 1967). The major question facing a theorist is to try to specify more precisely these regularities and to understand how they relate to one another, with an eye to spelling out the underlying mechanisms.
10.2.1 The Stimulus Information Hypothesis
An initial bold theoretical attempt of the 1950s, much influenced by the then new and very fashionable theory of information of Shannon (1948), was the hypothesis that MRT is simply a function of—indeed, is linear with—the information transmitted (in a technical sense to be specified) by the signal presentations. The idea was first formulated by Hick (1952) for the N equally likely alternative case. Following the major constructive device of information theory, he argued that the decision to make a response involves repeated partitions of the signal set into equally likely subsets, and he assumed that, independent of the size of the subset, the same time was needed to decide which subset included the signal presented. The latter,
apparently unlikely, assumption is discussed more fully below. At least for N of the form 2^n these assumptions imply that the response time should be proportional to log_2 N. Obviously, this cannot be correct since for N = 1 = 2^0, it implies zero reaction time. To deal with this, Hick compared and
with his and Merkel's data and concluded that Eq. 10.2 fitted somewhat better. He then argued that this equation is plausible because, in reality, there are N + 1 choices to be made when we take into account that a discrimination between something and nothing is involved. Of course, Eq. 10.1 simply incorporates the familiar idea of a residual response time. Hick pointed out that his selection model, as stated, should lead to individual reaction times that are integral multiples of the simple reaction time. It comes as no surprise that he did not find such periodicities in the data. This fact does not seem particularly telling since, to be rid of the periodicities, one need only replace the assumption of a constant inspection time with a random variable. From one perspective, a major assumption of the Hick model seems almost to contradict the data it is intended to explain. On the one hand, the data say that the larger the set of possible signals, the longer it takes to isolate one of them and to respond. On the other hand, the model says that successive decisions about which half of the remaining possible response alternatives includes the signal take equally long, no matter how many alternatives remain. One way to interpret Hick's assumption, the way he suggested, is to suppose that each signal is a bundle of (unspecified) features or attributes, and a signal is identified by sequentially ascertaining whether or not it has one feature and then another, until it becomes clear which of a limited class of signals it is. If the signal presented exhibits a particular feature, the response set is thereby reduced to just those alternatives having that feature; if it does not have the feature, then the response set is limited to the complementary set of alternatives, that is, to those not having the feature.
Assuming that each feature is exhibited by half the possible signals and that the several relevant features are uncorrelated over the set of possible signals (neither of which is easy to guarantee when the features are not explicitly known to the experimenter), then the number of features that must be examined on average in order to isolate the signal is the number of halvings required, that is, log_2 N. In contrast to such feature isolation, there are two other types of theories in which the signal is treated wholistically. One of these, the more prevalent, is called "template matching," although to me that term conjures up more about the matching of silhouettes than is intended. If comparisons of the encoded test signal are carried out sequentially among the encoded representations (or memories) of the possible signals, then the number of
comparisons taken to achieve a match should, on average, be proportional to the number of signals. While this prediction does not accord with data from the absolute identification design, it does with those from memory scanning designs (see Section 11.1.2). A careful comparison of the template matching and feature extraction points of view was provided by Smith (1968). Another wholistic point of view, taken up in Section 10.4.2, supposes the process to consist of a sequence of successive discards of possible responses. At each stage of the process, the discard takes some time, but it reduces the choice set by one, at which point the process is assumed to begin anew as if the reduced set were the possible choice set. In this model, unlike Hick's, the times are increasingly faster as each discard occurs, but under certain circumstances this leads to predictions not unlike the data. No interpretation is given to the mechanism that leads to discarding alternatives. Since Hick's argument was essentially identical to the partitioning used in the theory of information, Crossman (1953) and Hyman (1953) independently generalized the hypothesis to a linear relation between MRT and stimulus information, where the latter is defined as follows. Suppose the stimuli are selected on each trial according to independent random variables with the distribution P(s), s = 1, 2, ..., N, then by definition the information in the selection is
If there are one-step sequential dependencies P(s | s') and the asymptotic distribution of choices is P(s), then the information measure is altered to
$$H = -\sum_{s'=1}^{N} P(s') \sum_{s=1}^{N} P(s \mid s')\,\log_2 P(s \mid s'). \tag{10.4}$$
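The two information measures just described can be sketched numerically. This is only an illustration under stated assumptions: the function names and the example probabilities are hypothetical, and the sequential measure is taken to be the usual conditional entropy of the next signal given the previous one, weighted by the asymptotic distribution.

```python
import math


def stimulus_information(probs):
    """Information (bits) when signals are selected independently with distribution probs."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sequential_information(stationary, transition):
    """Information (bits) with one-step sequential dependencies.

    stationary[j]    : asymptotic probability of signal j (the previous signal)
    transition[j][i] : probability that signal i follows signal j
    """
    return sum(
        stationary[j] * stimulus_information(transition[j])
        for j in range(len(stationary))
    )


if __name__ == "__main__":
    n = 4
    uniform = [1 / n] * n
    # Equally likely, independent selections: H = log2 N.
    print(stimulus_information(uniform))
    # Unequal presentation probabilities (made-up values) lower H.
    print(stimulus_information([0.7, 0.1, 0.1, 0.1]))
    # Uniform marginal, but each signal tends to repeat (made-up value 0.7).
    rows = [[0.7 if i == j else 0.1 for i in range(n)] for j in range(n)]
    print(sequential_information(uniform, rows))
```

With no sequential dependencies every row of the transition matrix equals the marginal distribution, and the second measure reduces to the first.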
Hyman systematically varied the number of equally likely alternatives, the probability distribution, and the one-step sequential dependencies. His correlations between MRT and stimulus information for these three manipulations were .983, .975, and .938. Although his data were generally supportive of the hypothesis, detailed aspects cast the model into doubt. First, the response time to a particular signal with probability p of occurring does not depend upon p alone, but is also affected by the distribution of probabilities over the other signals. Second, in the sequential design there were cases where if a signal had just occurred it could not possibly occur on the next trial. Rather than speeding up the response on the next trial (because the effective N was reduced by 1), it was slowed. Third, as Laming (1968, p. 89) has demonstrated, the overall MRT versus H function is very different from that obtained when the data are partitioned according to the response made, which in these error-free data is tantamount to the stimulus presented, and plotted against the
corresponding information value. It thus appears that the stimulus information model does not really provide an accurate summary. As we shall see in Section 10.3.1, the hypothesis is quite incomplete.
Laming (1968) devoted the first chapter of his monograph on choice reaction times to a detailed and useful critique of the approach to reaction time based on information theory, which he calls communication theory because he uses the word information in a different way. Among other things, he pointed out that the basic conceptual framework of information theory entails the use of asymptotic theorems. For example, the information measure given in Eq. 10.4 for sequential dependencies states how many binary choices are required on average by a maximally efficient coding scheme working on an indefinitely long sequence of signals. Since the subject must respond trial-by-trial and cannot wait to take advantage of the sequential dependencies, at best Eq. 10.4 sets a lower bound on the number of covert partitionings required, and one really should not expect MRT to be linear with H.
10.2.2 Alternative Accounts of Hick's Logarithmic Relation
There are a number of conceptually distinct ways that the logarithm, or something close to it, can arise. The one most fully studied is called a parallel, exhaustive search. It assumes that each possible stimulus is being held in memory and that on presentation of a signal a comparison is made simultaneously with each of the stored representations. The response occurs following the completion of all the comparisons. If we assume that the comparisons are independent and each is distributed according to the same exponential random variable, then we are in the case of the parallel model dealt with in Appendix A.2, and so, adding a residual time, it is easy to see that the mean time is
Independently, Laming (1966) and Rapoport (1959) noted that for large N
where A = ln 2 and C is Euler's constant. Rapoport reported an elaborate experiment in an attempt to evaluate the model, but as I do not find it especially illuminating, I do not report it. Laming observed that Hick's "law" can be rewritten using
FIG. 10.1 For identification experiments with various numbers of stimuli (Merkel, 1885; Hick, 1952), the fit of Eq. 10.8a with k = 0 on the left and k = \ on the right. [Figure 1 of Laming (1966); copyright 1966; reprinted by permission.]
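The parallel, exhaustive search account can be checked numerically. The sketch below rests on a standard result that I am confident of: the mean of the maximum of N independent exponential variables with a common rate is the harmonic sum 1 + 1/2 + ... + 1/N divided by that rate, which for large N is close to (ln 2 · log₂ N + Euler's constant) divided by the rate. The parameter names `rate` and `residual` and all numerical values are assumptions made for illustration; they are not the text's notation.

```python
import math
import random

EULER_CONSTANT = 0.5772156649015329


def mean_exhaustive_time(n, rate, residual=0.0):
    """Exact mean of residual + max of n iid exponential(rate) comparison times."""
    return residual + sum(1.0 / k for k in range(1, n + 1)) / rate


def log_approximation(n, rate, residual=0.0):
    """Large-n approximation: the harmonic sum replaced by ln(2) * log2(n) + Euler's constant."""
    return residual + (math.log(2) * math.log2(n) + EULER_CONSTANT) / rate


def simulate_exhaustive_time(n, rate, residual=0.0, trials=20000, seed=0):
    """Monte Carlo estimate: respond only when the slowest of n parallel comparisons ends."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(rate) for _ in range(n))
    return residual + total / trials


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(
            n,
            round(mean_exhaustive_time(n, rate=1.0), 4),
            round(log_approximation(n, rate=1.0), 4),
            round(simulate_exhaustive_time(n, rate=1.0), 4),
        )
```

The point of the comparison is only that a parallel model yields mean times growing nearly linearly in log₂ N, so that such growth by itself does not single out a serial process.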
and he noted that a function that encompasses both this and Eq. 10.5 is
where k = 0 corresponds to Eq. 10.5 and k = 1 to Eq. 10.7. The fits of these two extreme functions to Merkel's and Hick's data are shown in Figure 10.1. Laming then raised the question of whether there is some distribution F with the following property: the mean of the maximum of N independent random variables distributed according to F is given by Eq. 10.8a. He showed that such a function exists, that it is unique, and that it can be characterized indirectly. Moreover, he derived an expression for the variance
which for k = 0 and k = 1 is compared with the data in Figure 10.2. The main point of this example is to illustrate, early on, that data fit by a serial model, Hick's, often can be fit equally by a parallel one. This will be explored thoroughly in Section 12.2. And there are other models that are conceptually distinct that also lead to an approximately linear relation between stimulus information and MRT. I do not delve into them in detail here because they all face a very serious problem: namely, that they require
MRT to grow with N, whereas, as we see in Section 10.2.3, that does not necessarily happen empirically.
10.2.3 Practice and Stimulus-Response Compatibility
Since the early 1950s a number of papers on absolute identification have explored two major classes of variables. One class is the amount of practice the subject has had either through previous experience or in the experimental procedure; the latter has varied from a few tens of trials to tens of thousands. The other class of variables has been termed the compatibility between the responses and the signals. Actually two quite different things are meant by the term "compatibility." The more special meaning has to do with whether the one-to-one relation between signals and responses is "natural." For example, suppose signals are the illumination of one of a semicircle of small light bulbs before each of which is a response key; then the "natural" relation is to respond using the key nearest the lighted bulb, but many unnatural ones are also possible. One is to place the fingers of the two hands on the keys and to respond with the corresponding key of the hand opposite to the "natural" response. For the present, I do not mean this type of compatibility. Rather, there is thought to be some sort of mental compatibility between the modality of the signal and the nature of the response. For example, responding vocally to the visual or auditory presentation of digits is thought to be more compatible than responding by a key press, no matter how compatible in the first sense the ordering of the keys is.
FIG. 10.2 Predictions based on Eq. 10.8b for the variance for the data discussed in Figure 10.1. [Figure 2 of Laming (1966); copyright 1966; reprinted by permission.]
FIG. 10.3 MRT versus log N (N = number of alternatives) for the various stimulus-response pairings shown in legend. (Not all sources listed in the legend are included in the present bibliography.) Note the flatness for the digit-voice case. [Figure 1 of Teichner and Krebs (1974); copyright 1974; reprinted by permission.]
Over the years a good deal has been published on these two topics and fortunately Teichner and Krebs (1974) performed an heroic survey of that part of the literature (which is most of it) having to do with visual signals. There appear to be some reasonably simple patterns to this mass of data. The reader is urged to consult their paper, for I shall summarize only part of it. Theios (1975) reviewed some of the same material.
Consider, first, minimally practiced subjects, and let us compare their performance with different pairings of stimulus and response modalities. The stimuli are either the presentation of one of N lights or the visual presentation of one of N digits. The responses are either key presses, arranged in a "natural" fashion, or voice response. The data are summarized in Figure 10.3. Three aspects are noteworthy: the linear growth with log N, the degree of parallelism of three of the four pairings, and the fact that the digit-voice curve is constant, independent of N. The key question is what to make of the constancy of the digit-voice data.
If it says what it seems to say, all of the models we have discussed are clearly wrong. Laming (1968), however, suggested that it does not really mean what it seems to. He noted that the times involved, about 450 msec, are really quite slow for two-choice visual reaction times, which are usually about 300 to 350 msec. He suggested the reason for this is that digits are so overlearned that, despite subjects being told that only two are being used in a particular run, they continued to behave as if there were ten. If that is so, then the growth of the digit-key curve with N must be due to response aspects, not decision ones. Evidence that this is true, provided by Theios (1973), is shown in Figure 10.4. Here the stimuli are digits and the response is either to name them or to identify them by pushing a button. The difference is striking. This immediately raises the question of the effect of practice on such performance.
Another widely quoted study in which MRT did not change with the number of stimuli is Leonard (1959). The stimuli were vibrations of the fingers and the response on each trial was movement of the finger stimulated, an exceedingly compatible response. For values of N = 2, 4, and 8, the mean time for the right forefinger to react was independent of N. Recently Ten Hoopen, Akerboom, and Raaymakers (1982) repeated the study, varying both the frequency (40 Hz and 150 Hz) and intensity (difference of 9.5 dB) of the stimuli as well as the RSI. They reported data for all fingers. They found no interaction of RSI with the number of choices, comparatively little difference among the fingers, and the plot of MRT was fairly flat (but with a significantly non-zero slope) for the 150-Hz, more intense signals and quite steep for the 40-Hz, less intense signals. Leonard's stimuli were comparable to the former. The authors demonstrated that the effects over stimulus type are not attributable to a speed-accuracy tradeoff.
Practice data for the digit-key task are shown in Figure 10.5 and for the light-key task in Figure 10.6, with the number of alternatives as a parameter of the plot. From the linear functions fit in Figure 10.5, one can infer how
FIG. 10.4 MRT versus number of digits, N, in two conditions: when the response was to name the digit presented and when it was to push a button in a compatible arrangement. The data are averaged over 12 subjects, with each contributing 296 observations per point. [Figure 5 of Theios (1975); copyright 1975; reprinted by permission.]
FIG. 10.5 MRT versus log of the number of trials of experience in the identification of a digit by a key response with the number of alternatives as a parameter. The legend identifies the sources of the data, not all of which are included in the present bibliography. [Figure 2 of Teichner and Krebs (1974); copyright 1974; reprinted by permission.]
MRT grows with number of alternatives, with practice shown as the parameter of the plot. This is shown in Figure 10.7. We see that they extrapolate the data to a constant function at something less than 250 msec with over a million practice trials. This suggests that Laming may well have been correct that the digit-voice case is flat because the subjects, who are not practiced in the experiment, are treating all runs as a choice out of 10. It also suggests (but no more than that, since the million practice trials have not been run) that with practice the effect of number of alternatives disappears. Some of Hick's data, those involving about 8000 trials of practice, were not used in developing Figure 10.5. Using the 8000 trial curve estimated from Figure 10.6, Teichner and Krebs made the parameter-free fit to Hick's data shown in Figure 10.8. It is strikingly good. The model builder is posed with a dilemma. The data suggest, but by no
means yet prove, that with adequate practice the times get shorter and become independent of N. If true, the models should exhibit this property, which means a different class of models from those we have been considering. But we do not know for sure that the data say this because they have never been collected; we are working with extrapolations, at best a risky business.
10.3 EXPERIMENTS WITH FEW ERRORS: SEQUENTIAL EFFECTS
10.3.1 First-Order Stimulus Effects
The most systematic attack on the stimulus information hypothesis of Hick is that of Kornblum (1967, 1968, 1969, 1973b, 1975), with the 1969 and
FIG. 10.6 Data similar to Figure 10.5 for pairings of lights with key presses. [Figure 5 of Teichner and Krebs (1974); copyright 1974; reprinted by permission.]
FIG. 10.7 MRT versus log₂ N as inferred from Figure 10.5 with practice as a parameter. [Figure 8 of Teichner and Krebs (1974); copyright 1974; reprinted by permission.]
1973 papers providing overviews. Basically he pointed out that earlier work, mostly in the two-signal setting (see Section 6.6), had shown that reaction times are faster when a signal is repeated than when it is not, and in most of the N > 2 studies performed prior to his work there was a serious confounding of stimulus information and the probability of a signal being repeated. The question to be considered is what are the relative roles of these two variables as well as set size itself.
It is useful to examine closely the nature of the confounding. Suppose signal presentations are governed by a Markov chain with asymptotic distribution P(s) and with transition probabilities for successive trials of P(s | s'). Then the probability of a non-repetition, PNR, is given by
$$P_{NR} = \sum_{s=1}^{N} P(s)\,[1 - P(s \mid s)] = 1 - \sum_{s=1}^{N} P(s \mid s)\,P(s),$$
and so that of a repetition, PR, is
$$P_{R} = 1 - P_{NR} = \sum_{s=1}^{N} P(s \mid s)\,P(s).$$
Consider the three ways Hyman varied stimulus information H.
1. N equiprobable alternatives. In this case P(s | s') = 1/N and P(s) = 1/N, and so
$$P_{NR} = 1 - \frac{1}{N} = \frac{N-1}{N},$$
and of course H = log₂ N. It is thus obvious that PNR and H are monotonically related.
2. N fixed, no sequential dependencies, and variable distribution of presentations. Although explicit formulas are of little help in this case, note that H and PNR each achieve their maximum value when the distribution is uniform. They covary, but not in a simple way.
3. N fixed, uniform distribution, and Markov sequential dependencies. Consider the case where there is a difference between the probability of repetitions and non-repetitions, and all of the latter probabilities are the
FIG. 10.8 MRT versus log₂ N as inferred for 8000 trials of experience from Figure 10.6 and the data of Hick (1952), which was not used in constructing Figure 10.6. [Figure 9 of Teichner and Krebs (1974); copyright 1974; reprinted by permission.]
FIG. 10.9 Pairs of sequential probabilities of repetitions that were chosen for experimental study because each pair yields the same value of the measure of information transmitted. [Figure 1 of Kornblum (1969); copyright 1969; reprinted by permission.]
same—that is,
Then
and
Observe that H is an increasing function of p for 0 < p < (N − 1)/N and decreasing for (N − 1)/N
FIG. 10.10 MRT and proportion of error versus stimulus information when both values of PNR yield the same value for information transmitted. The serial experiment involved one light from an array being presented and then responded to by a key in natural correspondence. The discrete experiment was the presentation on a video screen of one of the first four digits, and responses were again key presses. Each observer yielded 300 observations in the serial case and 150 in the discrete case per point. The number of observers was not stated. [Figure 2 of Kornblum (1969); copyright 1969; reprinted by permission.]
FIG. 10.11 For the experiment of Figure 10.10, MRT versus the actual value of PNR used partitioned according to whether or not the stimulus presented was in fact a repetition. [Figure 3 of Kornblum (1969); copyright 1969; reprinted by permission.]
FIG. 10.12 For the experiment of Figure 10.10, MRT versus PNR partitioned according to the number of alternatives and whether or not the stimulus was a repetition. [Figure 5 of Kornblum (1969); copyright 1969; reprinted by permission.]
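The confounding of stimulus information with the probability of a non-repetition under the three manipulations of H can be made concrete with a small sketch. The parameterization of the third case by a single repetition probability `p_repeat`, with all non-repetitions equally likely, is my own choice for illustration, as are the function names and numerical values. The last function finds a second repetition probability that yields the same H, which is the kind of pairing the design of Figure 10.9 exploits.

```python
import math


def entropy_bits(probs):
    """Shannon information in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def equiprobable(n):
    """Manipulation 1: n equally likely, independent signals. Returns (P_NR, H)."""
    return (n - 1) / n, math.log2(n)


def independent(probs):
    """Manipulation 2: unequal probabilities, no sequential dependencies.

    A repetition occurs with probability sum of P(s)**2. Returns (P_NR, H).
    """
    return 1 - sum(p * p for p in probs), entropy_bits(probs)


def markov_uniform(n, p_repeat):
    """Manipulation 3: uniform marginal; each signal repeats with probability
    p_repeat and every other signal follows with equal probability.
    Returns (P_NR, H), H being the information given the previous signal.
    """
    row = [p_repeat] + [(1 - p_repeat) / (n - 1)] * (n - 1)
    return 1 - p_repeat, entropy_bits(row)


def matching_repeat_probability(n, p_low, iterations=60):
    """Given p_low < 1/n, find by bisection the repetition probability above 1/n
    that gives the same H; H falls monotonically from log2(n) to 0 on (1/n, 1)."""
    if not 0 <= p_low < 1 / n:
        raise ValueError("p_low must lie in [0, 1/n)")
    target = markov_uniform(n, p_low)[1]
    lo, hi = 1 / n, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if markov_uniform(n, mid)[1] > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(equiprobable(4))
    print(independent([0.7, 0.1, 0.1, 0.1]))
    p_high = matching_repeat_probability(4, 0.1)
    print(markov_uniform(4, 0.1), markov_uniform(4, p_high))
```

The two conditions printed on the last line have the same H but very different probabilities of a non-repetition, so the two variables can be pulled apart.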
MRT is a linear decreasing function of the probability of the type of event in question and that, for each level of probability, repetitions are approximately 50 msec faster than non-repetitions. If one assumes that the slopes are identical, which oversimplifies the data a bit, then we can approximate the data by
$$\mathrm{MRT}_{R} = -m\,P(s \mid s) + b_{R}, \qquad \mathrm{MRT}_{NR} = -m\,P(s \mid s') + b_{NR}. \tag{10.10}$$
And if we assume that no other effects than these are relevant, which is doubtful (see Kornblum, 1967), then by definition
$$\mathrm{MRT} = P_{R}\,\mathrm{MRT}_{R} + P_{NR}\,\mathrm{MRT}_{NR} = -p^{2}N(N-1)m + p(N-1)(b_{NR} - b_{R} + 2m) - m + b_{R},$$
which is quadratic in p. [Kornblum expressed the same result in terms of the variable p' = p/(N − 1), which is the probability of a non-repetition.] As he pointed out, for the relatively narrow ranges of H involved in studies with fixed N, it is not easy to distinguish between a linear dependence on H and a quadratic one on p. He also examined the impact of N on MRT using the data of the 1967 paper. The relevant plot is shown in Figure 10.12, where we see that the major impact of N on the linear relations between MRT and P(s | s) or P(s | s'), Eq. 10.10, is to affect bR and bNR, the intercepts, with by far the greater impact on bNR. He used this fact to account for what appears to be a
change in slope of MRT versus H when the data are segregated according to the number of signals employed. This, in turn, may explain why Hick's law fails to account for the MRTs of single signals as a function of −log P(s) in Hyman's data. The reason is that the low probability events arise from data with larger numbers of alternatives and the high probability events arise from those with smaller numbers of alternatives, and so have quite different temporal structures.
Kornblum (1975) reported data that suggest the impact of the size of the stimulus set is primarily on the non-repetitions rather than the repetitions. This agrees with earlier studies of Hale (1969b) and Remington (1969, 1971), but Schvaneveldt and Chase (1969) found the effect to be about the same for repetitions and non-repetitions. In Kornblum's experiment one signal, called the critical one, had a probability of \ of occurring independent of the total number of signals employed. In what was called the experimental condition, the critical signal was responded to by the forefinger of one hand whereas all other signals were responded to by the other hand. In the control condition, all signals were responded to using the same hand. Not using thumbs, N could be varied from 2 to 5 in the experimental condition and from 2 to 4 in the control. We see in Figure 10.13 that with sufficient experience (five days of running versus three), the MRT to the critical signal is largely independent of the number of alternatives, which is not true of the other signals (see also Section 10.2.3).
The upshot of all this work is a very firm conviction that sequential effects in the signal presentations have a major impact on the response times, and that any model or experiment that ignores this or fails to predict it surely is incomplete and likely is wrong, as well.
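The quadratic expression for overall MRT given earlier in this section can be verified numerically. The sketch assumes that p is read as the probability of each particular non-repetition, so that a repetition has probability 1 − (N − 1)p, and that MRT for each class of event is linear in its transition probability with a common slope −m. Under that reading, which is an assumption of the sketch, the mixture of the two linear functions reproduces the printed quadratic term by term. The parameter values (in msec) are made up.

```python
def overall_mrt(p, n, m, b_r, b_nr):
    """Mixture of repetition and non-repetition mean times.

    Assumption: p is the probability of each particular non-repetition,
    so a repetition has probability 1 - (n - 1) * p.
    """
    p_rep = 1 - (n - 1) * p          # probability of a repetition
    p_nonrep = (n - 1) * p           # total probability of a non-repetition
    mrt_rep = -m * p_rep + b_r       # linear in the repetition probability
    mrt_nonrep = -m * p + b_nr       # linear in the non-repetition transition probability
    return p_rep * mrt_rep + p_nonrep * mrt_nonrep


def quadratic_form(p, n, m, b_r, b_nr):
    """The closed-form quadratic in p."""
    return -p ** 2 * n * (n - 1) * m + p * (n - 1) * (b_nr - b_r + 2 * m) - m + b_r


if __name__ == "__main__":
    # Hypothetical values: slope 100 msec, intercepts 50 msec apart.
    n, m, b_r, b_nr = 4, 100.0, 400.0, 450.0
    for p in (0.05, 0.15, 0.25, 0.30):
        print(p, overall_mrt(p, n, m, b_r, b_nr), quadratic_form(p, n, m, b_r, b_nr))
```

At p = 0 every trial is a repetition and both expressions reduce to bR − m.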
Often attempts to manipulate something about the signal presentations result in an unintended confounding with sequential effects which is not taken into account. For example, this appears to be an issue in some of Hinrichs' work (Hinrichs, 1970; Hinrichs & Craft, 1971a, b; Hinrichs & Krainz, 1970). As a case in point, in Hinrichs and Craft (1971b) the stimuli are one of three digits on a visual display with one response associated with s1 and the other with both s2 and s3. The experimental manipulation was the relative frequencies of the signals: 2:1:1, 1:1:1, 1:1:2, and 1:1:4. The subjects were required to predict which signal would be presented and then to react with the appropriate response when it appeared. The MRTs exhibited two main features. Times to correct predictions were faster than to incorrect ones. And in the 1:1:4 case, the time to predict s3 correctly was significantly faster than the times to the other two correct predictions. No analyses of sequential effects were presented, but since the probability of repetitions of s3 is rather high in the 1:1:4 condition, such effects may very well underlie the apparent effect of signal frequency.
At the present time, there are no sequential models for N > 2. Should a model builder attempt to attack the problem, it would be wise also to take into account some of the recent psychophysical studies of sequential matters
FIG. 10.13 MRT for a signal that, independent of N, had a probability of \ (and all others were equally likely) versus N. In one condition, the special signal was responded to with the other hand (heavier lines with a value for N = 5); in another condition the same hand was used for all responses (lighter lines with no values for N = 5, since the thumb was not used). The data are partitioned both as to the type of stimulus and the amount of experience the subject had had in the task. There were 12 observers, each of whom contributed 216 observations per point. [Figure 3 of Kornblum (1975); copyright 1975; reprinted by permission.]
in absolute identification, among them Purks, Callahan, Braida, and Durlach (1980); Luce, Nosofsky, Green, and Smith (1982); Nosofsky (1983); Treisman (1984); and Ward and Lockhead (1970). In some of these papers the data are analyzed in terms of a Thurstonian model in which sensitivity is measured by d' and response bias by the location of category boundaries. The evidence suggests that there are sequential effects both on the value of d' and on the category boundaries. The d' effects tend to be small or nonexistent unless the signals on several successive trials are close together on the continuum on which they vary, in which case the changes in d' can be quite sizable. In the random presentation design, the boundaries around the signal just presented appear to separate, making it more likely that a repeated signal will be correctly identified. Presumably the reaction-time effects are related to the changes in category boundaries, although this statement has not been established empirically. Kirby (1975), in discussing data described below in Section 10.3.3, proposed a theoretical scheme not unlike the anchoring ideas that have arisen in the work of Braida, Durlach, and their colleagues (Berliner and Durlach, 1973).
10.3.2 Higher-Order Stimulus Effects
Recall that for the two-choice situation, the impact of signal repetitions did not end after just one trial (Section 6.6.1). It seems plausible that the same should be true of data involving more signals, which Kornblum (1973b) has shown for his 1969 data. Figure 10.14 presents MRTs as a function of the number of intervening items between repetitions and, separately, as a function of the number of repetitions. In each case, the absolute probability of the signal presentation is a parameter of the plot. It is clear that repetitions speeded the process and non-repetitions slowed it down.
FIG. 10.14 MRT versus the number of successive repetitions for the experiment of Kornblum (1969), Figure 10.9. The upper panel is for the number of repetitions just prior to a non-repetition, and the lower panel is for the number of repetitions since the last non-repetition. The parameters of the curves are the probabilities of a repetition. For sample size, see Figure 10.10. [Figure 2 of Kornblum (1973b); copyright 1973; reprinted by permission.]
TABLE 10.1. Mean reaction times (msec) for different classes of transitions between signals and responses (Rabbitt, 1968, table II adapted)
(R, S) group   Nature of signal   Nature of response   Trials 301-600   Trials 1201-1500
2, 4           New                New                  584              424
2, 4           Equivalent         Same                 567              394
2, 4           Identical          Same                 491              383
2, 8           New                New                  608              484
2, 8           Equivalent         Same                 580              416
2, 8           Identical          Same                 510              377
4, 8           New                New                  725              592
4, 8           Equivalent         Same                 720              532
4, 8           Identical          Same                 652              416
8, 8           New                New                  970              770
8, 8           Identical          Same                 693              461
10.3.3 First-Order Response Effects
Confounded with the stimulus effects just discussed may very well be response ones. For example, Rabbitt (1965) ran an N = 10 design in which all eight fingers and two thumbs were used to make responses. The signal presentations were random except that no signal repetitions were permitted, which was intended to eliminate the major effect of signal repetitions on MRT. This, of course, also means that with accurate responding there were no exact response repetitions; however, there were repetitions involving the same hand, which can be compared with hand alternations. He reported MRTs of 534 msec and 581 msec for hand repetitions and alternations, respectively. Bertelson (1965) used an N = 4 design in which pairs of signals received the same response. He distinguished repetitions of identical signals, of equivalent ones for which the response was repeated, and new ones for which the response was different. The times for identical and equivalent signals were both considerably faster than for new ones, with the identical ones being slightly faster than the equivalent ones.
Rabbitt (1968b) elaborated this idea in order to distinguish between stimulus and response effects and to study the impact of practice. He used digits on a CRT as stimuli and key responses with up to eight fingers. The stimuli were partitioned in a natural way into equal-sized equivalence classes, forming the conditions 2R/8S, 4R/8S, 8R/8S, and 2R/4S, where the latter replicates Bertelson's experiment. The error pattern was similar across conditions and was about 7.8% for trials 301-600 and 4.4% for trials 1201-1500. Table 10.1 shows
FIG. 10.15 MRT versus PNR from Kornblum's (1969) four-choice, serial experiment. The solid circles are repetitions (HF) and the open symbols are for nonrepetitions of various sorts. The triangles are HF, the squares HF, and the circles HF. For sample size, see Figure 10.10. [Figure 3 of Kornblum (1973b); copyright 1973; reprinted by permission.]
the data from these two runs. With sufficient practice, Bertelson's finding is reproduced; but with little practice equivalent signals are closer to new ones than to identical ones. Perhaps the most notable feature of these data is the fact that following practice neither the stimulus entropy (compare 2R/4S and 2R/8S) nor the response entropy (compare 2R/8S, 4R/8S, and 8R/8S) has much effect on the identical responses (he provided the necessary statistical tests). Similarly, equivalent responses are not much affected by stimulus entropy, but in contrast they are affected by response entropy.
Kornblum (1973b) analyzed his 1969 data in a fashion similar to that of Rabbitt (1965): hand repetitions, H, or not, H̄, and finger repetitions, F, or not, F̄. Thus, HF means an exact repetition of the same finger of the same hand. By H̄F is meant the use of the same finger of the opposite hand. We see in Figure 10.15 that pure repetitions, HF, are much the fastest, HF next, and the other two are indistinguishable.
In a review of sequential effects, Kirby (1980) pondered both these and the N = 2 data and attempted to tease out whatever qualitative conclusions he could. He argued that we are dealing with a preparatory or facilitatory effect and that ". . . the only firm conclusion that the evidence reviewed allows is that repetition effects are predominantly caused by central processes involved in the choice rather than peripheral stimulus identification and response execution processes." (p. 164)
At present the model builder is faced with a bewildering array of data that leaves little doubt that sequential effects, both in reaction times and in accuracy of performance, are pervasive and must be an integral part of any accurate model of the process. So far we do not have any such models aside from those of Sections 7.2 and 7.5.
10.4 EXPERIMENTS WITH ERRORS
10.4.1 Empirical Speed-Accuracy Tradeoff Functions
Exactly the same concerns as those expressed in Chapter 6 about errors for the two-signal case apply here. It is very difficult both to control the error level with sufficient precision and to measure it with sufficient accuracy to be confident that measured reaction times are uncontaminated by inadvertent speed-accuracy tradeoffs. The advice, once again, is: do not ignore the tradeoff, study it.
Hick (1952) was the first to do so. By instructions to speed up, he caused himself and another subject to increase appreciably the error rate. He first calculated the information transmitted from the signal to the response, which is Eq. 10.3 evaluated for the response probabilities less Eq. 10.4, where P(s | s') is replaced by the conditional probability of response r given the presentation of signal s, P(r | s); that is,
$$-\sum_{r} P(r)\,\log_2 P(r) + \sum_{s} P(s) \sum_{r} P(r \mid s)\,\log_2 P(r \mid s). \tag{10.11}$$
He next converted that result to the equivalent number of error-free responses, Ne, and then showed that the MRT followed the same law, proportional to log(Ne + 1), as the (relatively) error-free data.
Somewhat later this tradeoff was studied more fully in a series of papers (Hale, 1969b; Pachella, Fisher, & Karsh, 1968; Pachella & Fisher, 1969, 1972; Stanovich, Pachella, & Smith, 1977). Hale used the digits 1, 2, and 3 with key presses as the responses, and Pachella and Fisher used a visual marker in one of N unmarked locations and corresponding response keys, where N was 2, 4, and 8 in one study and 10 in the others. Hale manipulated the tradeoff by instruction. Pachella and Fisher used four response deadlines: none, 1000 msec, 700 msec, and 400 msec. The accuracy measure in all studies was again the amount of information transmitted, Eq. 10.11. The major finding, which is completely consistent with Hick's observations, is a linear tradeoff between information transmitted and MRT. This is shown in Figures 10.16 and 10.17.
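The accuracy measure used in these studies, information transmitted, can be computed directly from a table of signal-response counts. The sketch below takes it to be the information in the response distribution less the average information in the responses given the signal; the function names and the example counts are hypothetical.

```python
import math


def entropy_bits(probs):
    """Shannon information in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_transmitted(counts):
    """Information transmitted (bits) from signal to response.

    counts[s][r] is the number of trials on which signal s drew response r.
    """
    total = sum(sum(row) for row in counts)
    n_responses = len(counts[0])
    # Marginal response probabilities P(r).
    p_response = [sum(row[r] for row in counts) / total for r in range(n_responses)]
    response_information = entropy_bits(p_response)
    # Average over signals of the information in P(r | s), weighted by P(s).
    conditional_information = 0.0
    for row in counts:
        row_total = sum(row)
        if row_total == 0:
            continue
        conditional_information += (row_total / total) * entropy_bits(
            [c / row_total for c in row]
        )
    return response_information - conditional_information


if __name__ == "__main__":
    accurate = [[50, 0, 0, 0], [0, 50, 0, 0], [0, 0, 50, 0], [0, 0, 0, 50]]
    hurried = [[35, 5, 5, 5], [5, 35, 5, 5], [5, 5, 35, 5], [5, 5, 5, 35]]
    print(information_transmitted(accurate))   # error-free: log2(4) bits
    print(information_transmitted(hurried))    # errors reduce the bits transmitted
```

Plotting this quantity against MRT across deadline or instruction conditions gives a speed-accuracy tradeoff function of the kind shown in Figures 10.16 and 10.17.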
Note that for the location of a position, the tradeoff is approximately 3.5 bits per sec, independent of the number of response alternatives; whereas, for the digits it is approximately 14 bits per sec, larger than that for location by a factor of 4. According to Swensson's rule of thumb stated in Section 6.4.3, since the digits in the Hale study were easy to discriminate and the data were collected under time pressure, we anticipate that errors will be faster than the corresponding correct responses. The data are shown in Figure 10.18, with time plotted as a function of the error probability. The decline with probability is extremely systematic. Judging by Figure 10.18, even with unlimited time in the Pachella and Fisher experiments, errors are being made and therefore the signals must be difficult to discriminate. So, according to Swensson's rule of thumb, a response in error should be slower than the same response when correct, but
FIG. 10.16 SATF in the form of information transmitted (Eq. 10.11) versus MRT with number of alternatives as a parameter. The SATF was achieved by means of response deadlines. There were 12 observers, each of whom contributed 192 observations per point. [Figure 2 of Pachella and Fisher (1972); copyright 1972; reprinted by permission.]
FIG. 10.17 SATF in the form of MRT versus information transmitted with N = 3 and the tradeoff effected by instructions. Each data point is based on 184 observations; there are data from three subjects on the graph. [Figure 6 of Hale (1969b); copyright 1969; reprinted by permission.]
FIG. 10.18 For the experiment of Figure 10.17, MRT as a function of the proportion of errors. The top panel is overall MRT whereas the bottom panel has the MRT partitioned by whether the response was correct or in error. [Figure 5a,b of Hale (1969b); copyright 1969; reprinted by permission.]
these data were not reported. On the other hand, they do report that as the subjects are pressed for speed the fraction of response repetitions increases. For example, with eight signals, the proportion of repetitions in the presentation was reported to be .05 (which is low as compared with a theoretical .125, and which apparently results from the constraint that in every block of 64 trials each stimulus appeared eight times), and the proportion of repetitions in the responses was .08 for the accuracy condition and .15 for the 300-msec one. Presumably, these errors are faster than correct responses. To my knowledge, there are only three theoretical attempts to deal with the N-alternative case in which errors explicitly occur. I take them up in chronological order.
10.4.2 A Response Discard Model
Luce (1960) suggested* a process for reducing a choice from N alternatives to a single one; it is in spirit somewhat different from Hick's model. Denote by R the set of response alternatives, which are in 1:1 correspondence with the set S of signals. Denote by $P(r, t \mid s, R)$ the joint probability density that response r is made at time t given that stimulus s was presented and that R is the response set. [In my 1960 presentation I was not explicit about the conditioning on the signal presented. Indeed, the way the paper was written suggests that I had in mind overall response probabilities $P(r) = \sum_s P(r \mid s)P(s)$. I cannot recall exactly what I intended, but in any event I now believe that it should have been stated in terms of $P(r \mid s)$, and that is how I will formulate it here.] The assumed mechanism for responding is that the subject discards, one-by-one, possible responses until only one alternative remains, which is then the response made. Denote by $Q(k, \tau \mid s, R)$ the probability of discarding response k at time $\tau$ given response set R and signal s. The following convolution describes the process if we assume that the subsequent choice probability following a discard is an event independent of the discard:
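Written out in the notation just introduced, one plausible form of this convolution is sketched here. The reduced response set $R - \{k\}$, the range of the sum, and the limits of integration are assumptions of the sketch, chosen to match the verbal description of discarding response k at time $\tau$ and then choosing among the alternatives that remain.

```latex
P(r, t \mid s, R) \;=\; \sum_{k \in R,\; k \neq r} \int_0^t
    Q(k, \tau \mid s, R)\, P\bigl(r, t - \tau \mid s, R - \{k\}\bigr)\, d\tau
```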
This is the first major postulate of the theory. If the discard and choice probabilities are families of gamma distributed random variables, the parameters can be selected so Eq. 10.12 holds (Luce, 1960, Theorem 5). By definition, the choice probabilities are the marginals obtained by integrating out the time, and as a technical postulate it is assumed that they can be obtained from their moment generating functions by taking limits:
* Laming (1977a) pointed out that I had made an error in the proof and by a counter example showed that the theorem, as asserted, is false. He stated a modified result, but that proof makes use of an assumption that was not explicitly stated. In correspondence, he has convinced me that it is a plausible assumption, and so I will state the corrected version of Laming's theorem. I will make clear which assumption is implicit in his paper, to which the reader is referred for a proof of the result.
The second major postulate of the theory is that the choice axiom (Luce, 1959) holds among these probabilities; namely, if they differ from 0 and 1, then for $r \in A \subset R$,
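In the present notation, with the conditioning on the signal s made explicit, the choice axiom can be sketched as follows. Writing the marginal choice probabilities as $P(r \mid s, A)$ is an assumption about notation.

```latex
P(r \mid s, A) \;=\; \frac{P(r \mid s, R)}{\sum_{r' \in A} P(r' \mid s, R)},
\qquad r \in A \subset R
```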
This simply says that the response probabilities of a subset of alternatives can be obtained from those of a superset by normalizing them over all alternatives in the subset. The mean reaction time of response r to signal s when the response alternative set is R is obtained by computing it from $P(r, t \mid s, R)$, and we again assume as a somewhat technical postulate that it can be obtained in the usual limiting fashion from the moment generating function:
and the mean discard latency is
The next major assumption of the theory is that for each stimulus s, a continuous function $f_s$ exists such that This postulates a somewhat strong form of a SATF (Section 6.5.1) not unlike those found in some, but not all, models for two-stimulus experiments.* Finally, the implicit assumption in Laming's proof is that if r is in R and if we consider a series of processes $P_i$ satisfying the other assumptions and such that then for $r' \neq r$,
* In my 1960 paper I assumed a weaker form of this—namely, that f depended on both s and R, and this was shown by Laming (1977a) not to be sufficient to get the result. I have subsequently shown that a variant of my original assumption will do, but the proof has not been published.
In words, if a response has no chance of being selected, then in terms of the mean discard latency it does not matter whether that response is in the choice set or not.
Theorem 10.1. If the assumptions embodied in Eqs. 10.12 through 10.19 are satisfied, then for each stimulus s there exist constants A(s) and B(s) > 0 such that
The proof can be found in Laming (1977a). Two immediate consequences of Eq. 10.20 are worthy of note. First, if we denote the N signals by $s_1, s_2, \ldots, s_N$ and the corresponding responses by $r_1, r_2, \ldots, r_N$, then for each i,
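A form of Eq. 10.20 and of this first consequence that agrees with the rest of the section (the error-free case in which the mean time reduces to A(s), and the axes of Figure 10.19) can be sketched as follows. The notation $E(T \mid r, s, R)$ for the mean reaction time of response r to signal s, and the use of natural logarithms, are assumptions of the sketch.

```latex
E(T \mid r, s, R) \;=\; A(s) - B(s) \ln P(r \mid s, R),
\qquad
E(T \mid r_j, s_i, R) - E(T \mid r_i, s_i, R)
    \;=\; B(s_i) \ln \frac{P(r_i \mid s_i, R)}{P(r_j \mid s_i, R)}
```

Because B(s) > 0, the difference on the right is positive whenever the error probability is smaller than the correct response probability, and it grows without bound as the error probability approaches 0.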
So, on the assumption (which seems always to be satisfied) that any error probability is smaller than the correct response probability to the same stimulus, then the corresponding mean error latency is longer than the corresponding correct one. Note that this does not have anything to do with the generalizations made in Section 6.4.3 about correct and error latencies for the same response, not signal. Here we are talking about rows of the stimulus-response matrices, and there we were talking about columns of the same matrix. In fact, to the extent that the stimuli admit error-free performance, so $P(r_j \mid s_i)$, $i \neq j$, approaches 0, the mean time approaches infinity. Laming (personal communication) feels that this is an absolutely fatal flaw of the model—so bad, in fact, that he feels the model should be forgotten. His argument is that when stimuli are perfectly identifiable, errors tend to be very fast, not very slow. The question to my mind is whether those fast errors can reasonably be considered a part of the same mechanism. As we have seen, there is substantial reason to suppose that they may be fast guesses or some other type of anticipatory response that is not really a part of what is being modeled here. Moreover, when the signals are actually confused to a substantial degree, as in Laming's own data (see Appendix C.8), the error responses to a signal are slower than the correct ones to it. The second consequence of Eq. 10.20 is this
On the surface this resembles somewhat the empirical hypothesis that mean reaction time is linear with transmitted information. So we should look into
the relation—or, as will become apparent, the lack of clear relation—between the present model and that hypothesis. Consider, first, the error-free case in which $P(r \mid s, R) = 1$ and so $\sum_r P(r \mid s, R) \log P(r \mid s, R) = 0$. So $E(T \mid s, R) = A(s)$. We are not really sure what function A is of s, but if the information theory hypothesis were correct, then $A(s) = a - b \log P(s)$. Of course, we know from Hyman's work, this is not really accurate. Nonetheless, proceeding as if it were true, then from Eq. 10.21,
This has some resemblance to the generalized information theory hypothesis that
but a careful examination reveals major differences. The $\sum p \ln p$ term on the left of Eq. 10.23 differs from that on the left of Eq. 10.24 in that the former is totally about stimulus probabilities and the latter about response probabilities. However, to the extent that response probabilities match stimulus ones, which is quite common in data, they are the same. The double sum terms on the right are the same except for the factor B(s) under the s-summation in Eq. 10.23. To the extent B(s) is constant, these terms are the same. So while there is no formal identity between Eqs. 10.23 and 10.24, there may be a practical one. Let us turn to data. Luce (1960) compared some data of Kellogg (1931), even though they were really not appropriate, to Eq. 10.20, and the fit was poor. Laming (1968) reported appropriate data in his Experiment 6, which was an N = 5 absolute identification of lengths in which the signal probabilities were varied from 1/15 to 5/15. Two groups of 12 subjects were run, the one with probabilities increasing from the thumb to the little finger and the other with the pattern reversed. These data, which Dr. Laming has kindly provided me, are presented in Appendix C.8 with the two groups of subjects combined so that signals of the same prior probability, regardless of which finger was used, are averaged. Laming (1968, Fig. 7.3) plotted
where versus $-\log P(r \mid R)$. Because he believed the theory to predict a linear
FIG. 10.19 $\mathrm{MRT}_{ij} - \mathrm{MRT}_{ii}$ versus $\ln[P(r_i \mid s_i)/P(r_j \mid s_i)]$, which should be linear with slope a function of $s_i$ if Eq. 10.21 is true, for Laming's data of Appendix C.8.
relation between these variables and the plot was non-monotonic, he concluded the model to be in error. [Laming's discussions in 1968 and, especially, in 1977b make clear that he was perplexed by my failure to make explicit reference to a stimulus in my 1960 paper. My presentation and attempt to use Kellogg's data reasonably lead one to conclude that I was talking about marginal response times and probabilities rather than conditional ones.] If, as I now believe is a reasonable interpretation of the assumptions, the model holds for probabilities conditioned on the stimulus, then substituting Eq. 10.20 into Eq. 10.25 yields
which predicts nothing unless A and B are known functions of s. Probably the best test is to plot Eq. 10.21 because one of the two unknown functions drops out. This is shown for Laming's data in Figure 10.19. Aside from the apparently anomalous point at -165 msec and having a response probability of 0.013, these data are not inconsistent with the hypothesis of a linear fan through the origin, but with a slope that decreases
as signal probability decreases. To be sure, this is not a stringent test of the model, but I do not believe these data are adequate to reject it. An experiment with more stimuli more densely packed so as to guarantee more confusions is needed to tax the model adequately. The theory—although not necessarily the conclusion embodied in Eq. 10.20—can, of course, be rejected by showing any of its three major assumptions to be wrong. At this point there are considerable data to suggest that the choice axiom is not valid when the set of alternatives is highly structured (Luce, 1977), which is the case when they are one-dimensional. Laming (1968) studied this directly and again in 1977b when he published a corrected and somewhat more detailed analysis. His conclusion, which accords with other studies, is that the choice axiom is incorrect in this situation.
10.4.3 A Generalization of SPRT
Laming (1968) appears to have been the first to suggest a version of the random walk for N > 2 alternatives. Basically, the idea is that at each instant t there is associated with each possible response r an "information" random variable $I(r, t)$, which represents the accumulated evidence favoring response r. As in the two-alternative case (Chapter 8), different models result depending upon how $I(r, t)$ is assumed to develop in time. For example, in Section 8.3 we explored two distinct models, and another arose in Section 9.2.1. The one proposed by Laming in Section 3.3 of his monograph and worked out in detail in his Appendix B is a continuous generalization of the SPRT two-choice model (Section 8.3.1). It can be stated as follows. At time t there is a probability $\pi(r, t)$ that r is a correct response, where $\sum_r \pi(r, t) = 1$. This is assumed to be related to the information favoring r by a log odds expression:
The response rule is as follows: associated with each possible response is an error criterion $\lambda_r$, with $0 < \lambda_r < 1$ and $\lambda_r + \lambda_{r'} < 1$, such that response r is made at time t if and only if, for all $r'$ and all $\tau < t$, $\pi(r', \tau) < 1 - \lambda_{r'}$, and $\pi(r, t) > 1 - \lambda_r$. The major assumptions of the theory are equivalent (by Laming's Theorem 3.1) to the following assertion: For each $\delta t > 0$, there exists a series of independent random variables $X_{s,i}$, $s = 1, \ldots, N$, and $i = 1, 2, \ldots$, with distribution functions $F_i(x \mid s)$, such that the products $\prod_{i=1}^{k} F_i(x_i \mid s)$ are all distinct for sufficiently large k and for some $x_1, \ldots, x_k$,
where P(s) is the presentation probability of signal s. This is, in essence, a Bayesian information accumulation assumption. Laming proved a number of technical results about the process he postulated as well as the following important substantive prediction (Theorem 3.6): The response distribution is independent of whether the response is correct or incorrect. Recall that this was a major property of the SPRT model, and that it was seriously violated by most of the two-choice data. It is also incorrect for Laming's (1968) five-choice data (Appendix C.8) as we summarize:

Response alternative corresponding to signal with probability:   1/15   2/15   3/15   4/15   5/15
MRT (correct) - MRT (error):                                       70      6   -100   -100   -191
We see that there is a strong interaction with the probability of the signal, with errors being fast for the low probability signal and slow for the three highest probability signals. The data clearly do not agree with the SPRT model. Little additional work seems to have been done with this model, so I do not pursue it further here.
10.4.4 Accelerating Cycle Model
G. A. Smith (1977, 1980) described a sequential model I find difficult to formulate to my satisfaction. Part of my discomfort is that several of the assumptions seem entirely ad hoc, explicitly chosen to yield just the right properties. To the degree that this is true, the model lacks both elegance and depth. Another part of my discomfort is that the stochastic process involved is not made explicit, and the equations derived for the deterministic case do not provide much insight into the general case. Suppose that initially each possible signal s has associated with it an amount of excitation e(s), where this variable is normalized so that $\sum_s e(s) = 1$. Suppose that when $s_0$ is presented, and
The excitement associated with non-presented signals is considered the noise level of the system. Next, a sensory processor is assumed to cycle over all the potential signal
alternatives, on each cycle extracting information that is ultimately going to determine the response. For a specified signal-response mapping, $r = \theta(s)$, there is assumed to be some associated time a(s); the time devoted to processing signal s is a(s)e(s), for a total cycle time of $\sum_s a(s)e(s)$. However, and this seems most ad hoc, it is assumed that each cycle gets faster in proportion to its cycle number. In particular, on the mth cycle, the cycle time is assumed to be $\sum_s a(s)e(s)/m$. I can see no very principled reason for assuming this, although the motive for doing so is that it produces a logarithm. In particular, suppose that the cycling process terminates after k cycles, then the total mean time is
and replacing the first sum by an integral one obtains the approximation
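The step from the sum over cycles to a logarithm rests on the familiar approximation of the harmonic sum $\sum_{m=1}^{k} 1/m$ by $\int_1^k dm/m = \ln k$. The following sketch checks numerically how close the two are. The per-cycle constant `c` stands for $\sum_s a(s)e(s)$, and its value of 100 msec is purely hypothetical, as are the function names.

```python
import math

def total_cycle_time(k, c):
    """Total time for k cycles when the m-th cycle takes c / m."""
    return sum(c / m for m in range(1, k + 1))

def log_approximation(k, c):
    """Integral approximation to the same total: c * ln k."""
    return c * math.log(k)

c = 100.0  # hypothetical value of sum_s a(s) e(s), in msec
for k in (2, 4, 8, 16, 1000):
    exact = total_cycle_time(k, c)
    approx = log_approximation(k, c)
    # Columns: k, exact sum, logarithmic approximation, their difference.
    print(k, round(exact, 1), round(approx, 1), round(exact - approx, 1))
```

As k grows, the difference between the two columns settles near 0.577c (Euler's constant times c), so the approximation is off by a roughly constant amount rather than by a changing slope, which is what matters for a prediction of linearity in ln k.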
The next question is how to determine k. Smith observed that if E(T) is to be linear with ln N, then it is necessary that k be proportional to N. One way this might come about is to assign each response the prior probability 1/N, which is then modified on each cycle by the excitement accorded the signal that corresponds under the mapping $\theta$ to response r; that is, after k - 1 cycles we have
If we suppose there is a response criterion $\lambda(r)$ for each response, then for the response r to occur at cycle k requires
Solving for k and substituting into Eq. 10.26 yields
where
Fundamentally, very little has been shown at the expense of two highly arbitrary assumptions. Moreover, the model does not make clear—as was done for the Wald model in Chapter 8 and for both of the two preceding
models—how the response probabilities and response times covary. One reason is that, as stated, it is deterministic; presumably it could be made stochastic by making e(s) a random variable. It also lacks freedom to handle unequal presentation probabilities. By playing on the enormous freedom existing in the three unspecified functions—e(s), a(s), and $\lambda(r)$—Smith attempted to give an account of various phenomena in the literature (see pp. 201-210 of Smith, 1980), but it is far from compelling.
10.5 CONCLUSIONS
The chapter has been relatively brief and so will be my conclusions. It is evident that we don't have as much solid empirical information about the absolute identification of more than two signals as we do for two, and the development of theoretical ideas about them is also not as rich. This is by no means unique to response times. In psychophysics, generally, detection and discrimination experiments and theory are far more extensively developed for two signals than for more than two signals. Part of the reason is the difficulty in getting adequate sample sizes in a reasonable length of time and part is the difficulty of avoiding an exponential explosion of free parameters in the models. The major empirical findings are that MRT grows approximately linearly with information transmitted, although in detail this statement is surely incorrect. No detailed generalization seems to exist, especially when one attempts to take into account the very pervasive sequential effects. They are complex and ill understood. There are also substantial practice effects, and with some overlearned stimuli the MRT appears to be virtually independent of the size of the stimulus set, although that may result from an inability to confine attention to subsets of the overlearned set. Models that attempt any sort of detailed analysis are very few and at present lack any substantial empirical support. All in all, the situation is very unsatisfactory.
III MATCHING PARADIGMS
11 Memory Scanning, Visual Search, and Same-Different Designs
11.1 MEMORY SCANNING AND VISUAL SEARCH
11.1.1 The Experimental Designs
In the memory scan or the item recognition task, as Sternberg (1966, 1967a, b, 1969a, b, 1975), who made it popular, called it, the set of N stimuli are partitioned into two parts. The one, composed of M items, is presented to the subject and is called either the memory or positive set. The other, consisting of D = N — M items, is the distractor or negative set. On each trial an item is presented and the subject is simply to recognize whether or not it is from the memory set. So, independent of N and M, there are always two possible responses. Items and responses are called positive and negative according to which set they are drawn from or associated to. The scheme is shown in Figure 11.1. The basic data are response accuracy and response time. Typically, the instructions are ambiguous as to the speed-accuracy tradeoff, emphasizing both responding as rapidly as possible consistent with highly accurate performance, which can be nearly perfect since the items are not ambiguous. Error rates of less than 2% to 5% are usually obtained and are considered acceptable. So for most of the studies (but see Sections 11.3.6 and 11.3.7) the analysis is entirely in terms of response times, often just the MRT but more recently in terms of other aspects of the response-time distribution. Of course, it is important to examine the times separately for positive and negative items. It may also be important to consider sequential effects, but they have not been much attended to in this literature. The experimenter has a number of choices to make. Usually M and N are parameters to be manipulated. The class of items must be selected. Often letters or digits are used, but other more-or-less homogeneous classes have been employed. The positive set and/or the negative set can be fixed throughout a long run of trials or they can be varied from trial to trial. Usually the union of the positive and negative sets is constant throughout the experiment. 
When the positive set is held fixed, the procedure is called fixed- or consistent-set; when it is varied on each trial, varied-set. Since the stimuli are usually highly recognizable and the error rates are low, there is little reason to provide trial-by-trial feedback about accuracy. Often after a block of trials feedback is provided and payoffs are used in an attempt to hold the error rates to a low level. There is, of course, the possibility of
FIG. 11.1 Scheme for the memory scan experiment. [Figure 2 of Sternberg (1969b); copyright 1969; reprinted by permission.]
some sort of feedback about the response times. For example, one can attempt to manipulate the speed-accuracy tradeoff by means of time-based or deadline payoffs, although little research of that sort has been done. Another choice is how to present the positive set—either sequentially or simultaneously. Sternberg used a sequential presentation, presumably to insure the same exposure to each item, which cannot be controlled if all items are present at once. The visual search task differs from memory scanning in two major ways. First, the order of presentation of the test item and memory set are reversed, in which case these names are inappropriate. They are called the target and the search list. Second, since our interest is in how long the subject takes to process the items in the search list, it cannot be presented one item at a time. And when presented simultaneously, there are questions of how best to display the items. For example, if they form a linear array, then reading habits as well as changes in visual angle with number of items become issues.
11.1.2 Typical Memory Scanning Data
In Sternberg (1969b) we find a varied-set procedure involving the ten digits: 0, 1, ..., 9. On each trial a positive set of from one to six digits was presented visually and sequentially at a rate of 1.2 sec/digit. Two seconds after the last positive item was completed, a warning signal occurred and it was followed by a test item. The response was to pull one of two levers corresponding to the test item being positive or negative. Average data for eight subjects are shown in Figure 11.2. These data are noteworthy in two ways. First, in sharp contrast to those from absolute identification (Section 10.2.1), MRT increases linearly with M, not with log M. Second, the search time per item is the same for positive and negative items. The experiment,
using both fixed- and varied-set presentation and using visual search as well as memory scanning designs, has been repeated by many researchers, all finding basically the same result: linear growth with the same rate, about 40 msec per item, independent of whether the item is positive or negative. These are robust results. A review of the earlier studies can be found in Nickerson (1972). A recent study by Hockley (1984) compared memory and visual search tasks in the following way. He fitted the distributional data by the ex-Gaussian distribution (Section 3.2.1 and Appendix B.2), and then plotted the three parameters, $\mu$ and $\sigma$ of the Gaussian and $\lambda$ of the exponential, as a function of set size, separating positive and negative cases. In the visual search, the slope of $\mu$ with number of items was about 70 msec/item for positive ones and 90 for negative ones, whereas $\sigma$ and $\lambda$ had much smaller slopes, under 30 msec/item. By contrast, in memory search $\mu$ and $\sigma$ showed little change, less than 12 msec/item, and $\lambda$ was greater than 30 msec/item. It is unclear exactly what interpretation to give to these results since the ex-Gaussian distribution is ad hoc, with no theoretical basis. There is an additional empirical concern, as S. Sternberg (personal communication) has noted: these data are unusually variable, with a standard deviation 2½ times greater than Sternberg found in comparable data.
11.2 THE SERIAL, EXHAUSTIVE SEARCH MODEL
11.2.1 The Model
FIG. 11.2 MRT versus the size of the memory (or positive) set in a memory search design described in text. The data are averaged over eight subjects, for a total of about 95 observations per point. Standard errors of the means are shown by vertical bars. [Figure 1 of Sternberg (1966); copyright 1966; reprinted by permission.]
Sternberg (1966) proposed the following, now classical, account of these data. Suppose the items in memory are coded serially. The items are drawn singly from memory, compared with the code of the test item, and so on until the memory is exhausted. Each comparison either achieves an acceptable match or not. After exhausting the memory, the positive response is initiated if a match has occurred; otherwise, the negative response is initiated. The search is assumed to be serial in the sense that items are drawn from memory, compared, and a match is achieved or not prior to the next comparison. The search is exhaustive in the sense that the entire memory is exhausted before initiating a response. If the mean time per comparison is k, the mean residual time is $r_0$, and there are M items in memory, then
independent of the test item. As we noted, for these experiments k is in the neighborhood of 40 msec. The assumption of a serial search of memory did not, at first, attract nearly as much attention as the assumption that the search is exhaustive. On the face of it, exhaustive search seems less efficient than a self-terminating one. For if it were self-terminating and assuming no correlation between the signal presented and the order in which memories are extracted, then when the test item is positive it will require on average M/2 comparisons to achieve a match. Thus, the prediction is that Eq. 11.1 will continue to hold for negative test items and
would hold for positive ones. Note that the slope is half of that in Eq. 11.1. Accepting the serial assumption, the data seem to leave no choice—the search appears to be exhaustive rather than self-terminating. Certainly that is consistent with these data, but it is not, in fact, forced by them (see Section 11.3.1). Sternberg (1969b, p. 444) suggested that the optimality argument could be made to favor exhaustive scanning if the scanning process were very fast compared to a process that yields an output exactly when a match occurs. This could happen, for example, if considerable time is consumed in passing from the scanning process to examining the counter that registers whether or not a match has occurred. If the switching time plus that of examining the counter is, on average, h, then instead of Eqs. 11.1 and 11.2 we have for the exhaustive search
and for the self-terminating search
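The two mean-time expressions with a switching time can be sketched in a form consistent with the surrounding statements: a slope of k for exhaustive search, half that slope for self-terminating search on positive items, and a crossover at h = kM/(M - 2). Two assumptions are made in this sketch: that h is incurred once under exhaustive search but once per comparison under self-terminating search, and that the mean number of comparisons on a positive trial is M/2, as in the text.

```latex
\begin{aligned}
\text{exhaustive:} \quad & E(T) = r_0 + kM + h, \\
\text{self-terminating (positive items):} \quad & E(T) = r_0 + \tfrac{1}{2}(k + h)M, \\
\text{difference:} \quad & \tfrac{1}{2}(k + h)M - (kM + h) = \tfrac{1}{2}\bigl[h(M - 2) - kM\bigr].
\end{aligned}
```

For M > 2 the difference is positive, so that exhaustive search is the faster of the two, exactly when h > kM/(M - 2).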
Thus, on a positive item, if $k \ll h$, which is positive for $h > kM/(M - 2) > k$. To demonstrate that this may really be the case, Sternberg ran a study just like the memory scanning one except that the subject was to report which item in memory was the immediate successor to the item that matched the test stimulus. The evidence supports a self-terminating search, but with a much slowed scanning rate of 124 msec per item, which means an actual rate of 248 msec per item, some six times slower than simple scanning.
11.2.2 Reactions to the SES Model
The attack on Sternberg's interpretation of his data has been intensive and sustained. It can be partitioned into three major phases that overlap in time. The first to appear consisted of experiments and data analyses intended to show that the SES model fails to give a satisfactory account of aspects of the data other than the simple means. This is taken up in Section 11.2.3. Second, alternative models were proposed that are either not serial, or not exhaustive, or neither serial nor exhaustive but that give at least as satisfactory an account of the data. The role of these models has been to make clear that the serial, exhaustive search model is by no means uniquely forced by the original data. One of the principal features of some of these models has been to add another conceptual element to the modeling—the idea of limited capacity to carry out comparison processes. Some of these alternative models are discussed in Section 11.3. The third phase has been to study, as generally as possible, consequences of each of the three major conceptual building blocks of these mental models of memory. The first is the structure of the search process with the two most extreme cases being serial and parallel systems. The second is the nature of the termination of the search with the two simplest cases being exhaustive and self-terminating search. The third is the availability of resources (sometimes called capacity, other times attention) to each of the several comparisons that are being carried out. These distinctions are initially sharp but they become blurred in various ways. Nonetheless, the goal is, so far as possible, to see if we can devise differential predictions and appropriate experiments that allow us to make decisions about each aspect, more-or-less independently of the others. The most general results to date, which are taken up in Chapter 12, concern the structure of the search process.
11.2.3 Failures of Prediction
According to Townsend and Ashby (1983), four aspects of the data are inconsistent with the SES model. The first two arise simply as a result of
additional analyses. The other two entail some, but not very major, changes in the experimental design. The first is what happens to the variances. According to the serial exhaustive model and assuming that comparisons are independent of each other and of the residual processes, then for each test item whether in the memory set or not, where $\sigma_c^2$ is the variance attributable to a comparison and $\sigma_R^2$ is that of the residual process. (Even if there were correlations, the variance should not differ between positive and negative test items, unless the occurrence of a match were to alter the correlation.) The data are otherwise: the variance of the positive items grows more rapidly with M than does that of the negative items. For example, Schneider and Shiffrin (1977), using memory set sizes of 1, 2, and 4 symbols and independent of that choice a test display of 1, 2, or 4 symbols, showed that a differential growth of variance occurs with the varied mapping procedure but not with the consistent one. Note that such a differential growth in variance is to be expected in a self-terminating model since for the positive items the number of comparisons, K, required for termination is a random variable that adds to the total variance as where k is the mean comparison time. Of course, these data do not imply self-terminating search; they merely cast doubt on the serial, exhaustive search model. A second aspect of the data are the serial position curves; these are a further partition of the data by the location of the test item in the sequential presentation of the memory set. So for each memory size, there is a curve of MRT as a function of the serial position of the test item. According to the serial-exhaustive model, these curves should be flat, with the larger memory sets being slower than the smaller ones. The same prediction is true of the serial, self-terminating model if the memory is accessed randomly, but not if it is retrieved according to the order of presentation.
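The variance argument can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, exponentially distributed comparison times with a mean of 40 msec, a normally distributed residual time, and a target equally likely to occupy any position in the scan order. None of these distributional choices comes from the studies cited, and the function names and parameter values are hypothetical.

```python
import random
import statistics

def simulate_rt(m, positive, self_terminating, rng,
                k=40.0, r0=400.0, r_sd=30.0):
    """Simulate one response time (msec) for memory set size m."""
    if positive and self_terminating:
        # Stop at the target, whose position in the scan is uniform on 1..m.
        n_comparisons = rng.randint(1, m)
    else:
        # Exhaustive search, or a negative item: all m items are compared.
        n_comparisons = m
    scan = sum(rng.expovariate(1.0 / k) for _ in range(n_comparisons))
    return rng.gauss(r0, r_sd) + scan

def summarize(m, positive, self_terminating, n=20000, seed=1):
    """Return the sample mean and variance of n simulated response times."""
    rng = random.Random(seed)
    times = [simulate_rt(m, positive, self_terminating, rng) for _ in range(n)]
    return statistics.mean(times), statistics.variance(times)

for m in (2, 4, 8, 16):
    # Under self-termination, negative items are still scanned exhaustively.
    pos_mean, pos_var = summarize(m, True, True)
    neg_mean, neg_var = summarize(m, False, True)
    print(m, round(pos_mean), round(pos_var), round(neg_mean), round(neg_var))
```

With these particular assumptions the positive-item variance under self-termination contains an extra term proportional to the variance of the number of comparisons, so for larger set sizes it outgrows the negative-item variance. Under exhaustive search the two cases are simulated identically and cannot differ. Note that a uniformly placed target gives a mean of (M + 1)/2 comparisons rather than the M/2 used as an approximation in the text.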
Sternberg reported flat curves, but a number of other authors have found recency effects (Atkinson, Holmgren, & Juola, 1969; Clifton & Birenbaum, 1970; Corballis, 1967; Morin, De Rosa, & Stultz, 1967; Morin, De Rosa, & Ulm, 1967; Townsend & Roos, 1973). Sternberg (1969b) suggested that it may be due to highly rapid presentations of both the items in the memory set and the test item. Nonetheless, there surely are circumstances when the serial-exhaustive model breaks down. A slight modification of the visual search design is to include more than one copy of the target item in the search list. The finding is that MRT decreases with increased numbers of replicas of the target items (Bjork & Estes, 1971; Baddeley & Ecob, 1973; Van der Heijden & Menckenberg, 1974). No such effect is predicted by the original serial-exhaustive model.
By itself, this effect can be incorporated into the model simply by supposing target items are processed more rapidly than nontarget items, which could occur if features are compared and a sufficient number of feature matches leads to a match (see Section 11.4.2). For if k_T is the mean target time, k (> k_T) is the mean nontarget time, and T is the number of targets, we see that the mean search time is
    (M − T)k + T k_T = Mk − T(k − k_T).
The effect is to alter the intercept, but it does not alter the common value of the slope for target and nontarget items. Townsend and Ashby (1983, Chapter 7) suggested that a way around these problems is to use designs with T targets and M items in memory and to vary (T, M) so that the mean number of items searched to the first target, namely (M + 1)/(T + 1), is a constant. So, for example, (1, 3), (2, 5), and (3, 7) all have a mean of 2. Under self-termination the mean time to respond on target trials is
which is constant. By contrast, under serial exhaustive search the times are (M − T)k + T k_T. If (M + 1)/(T + 1) = C, then M − T = (C − 1)(T + 1), and so the serial exhaustive search time can be rewritten as T[k_T + (C − 1)k] + (C − 1)k, which is not constant because it grows with T. To my knowledge, this experiment has not yet been performed.
The first modification that raises problems about the serial aspect of the model is to present the search set of a visual search experiment in a circular rather than a linear array (Egeth, Jonides, & Wall, 1972; Gardner, 1973; Van der Heijden & Menckenberg, 1974). This design change greatly reduces the slope of the MRT versus M curve, which is even flat in some cases. This means either that the model is wrong or that, under these display conditions, the scanning rate is incredibly fast.
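The arithmetic of the constant-mean design suggested by Townsend and Ashby can be checked with a few lines of code. The following is a minimal sketch, not part of the original analysis: the comparison times of 40 and 30 msec are hypothetical values chosen only for illustration, and the function names are invented here. It confirms that (M + 1)/(T + 1) is 2 for all three designs, and that the serial exhaustive time (M − T)k + T k_T, which equals T[k_T + (C − 1)k] + (C − 1)k, increases with T.

```python
from fractions import Fraction


def mean_items_to_first_target(T, M):
    """Mean number of items searched to the first of T targets among M items."""
    return Fraction(M + 1, T + 1)


def exhaustive_time(T, M, k, k_T):
    """Serial exhaustive search time: M - T nontargets at k, T targets at k_T."""
    return (M - T) * k + T * k_T


def exhaustive_time_in_terms_of_C(T, C, k, k_T):
    """The same time rewritten using M - T = (C - 1)(T + 1)."""
    return T * (k_T + (C - 1) * k) + (C - 1) * k


# Hypothetical comparison times in msec (assumptions, not estimates from data).
K_NONTARGET = 40
K_TARGET = 30

for T, M in [(1, 3), (2, 5), (3, 7)]:
    C = mean_items_to_first_target(T, M)
    direct = exhaustive_time(T, M, K_NONTARGET, K_TARGET)
    rewritten = exhaustive_time_in_terms_of_C(T, C, K_NONTARGET, K_TARGET)
    print(f"(T, M) = ({T}, {M}): C = {C}, exhaustive time = {direct}, rewritten = {rewritten}")
```

With these illustrative values the exhaustive times come out as 110, 180, and 250 msec, so the two search rules make clearly different predictions across the three designs.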
11.3 ALTERNATIVE MODELS
This section is devoted to specific models intended to account for at least as much of the data as the serial, exhaustive search model. In addition to the serial or parallel, self-terminating or exhaustive, and various capacity models discussed below, several others of a similar nature have appeared in the literature. Many of these are due to Townsend with various collaborators,
and they are summarized in Chapters 6 and 7 of Townsend and Ashby (1983). Included are a self-terminating model in which the scanning rate is not constant, an exhaustive model in which the scanning rate differs for positive and negative items, a broad class of serial self-terminating models that is rejected by the data (Townsend and Roos, 1973), and an experimental design intended to force self-terminating behavior and so to permit a choice between serial and parallel models (Taylor, Townsend, and Sudevan, 1978).
11.3.1 A Serial, Self-Terminating, Search Model: Push-Down Stack
The first, and perhaps best known, of these alternative models is a serial self-terminating search model developed by Falmagne and Theios (1969), Theios and Smith (1972), Theios, Smith, Haviland, Traupman, and Moy (1973), and Theios (1973). It is called a push-down stack model, and it demonstrates that exhaustive search is not a necessary consequence of the data even if the search is assumed to be serial. Memory search is assumed to be ordered from top to bottom, it is serial, and it is self-terminating. Memory acquisition is push down in character, with two major exceptions. First, if an item is presented that is already in memory, say at location i, then instead of there being two representations of it, one at i + 1 and the new one at 1, there is a single representation at j, where 1 ≤ j ≤ i. The items previously at j, j + 1, ..., i − 1 each move down one level: j + k to j + k + 1. Second, the lowest level of memory is identified with long-term memory. Unlike the other levels, which can hold only a single item, long-term memory holds many. It is searched only after the other levels have been searched. A major difference from the Sternberg model is the assumption that all items presented enter the memory stack, so it includes both positive and negative items, labeled accordingly.
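The mechanics of the push-down stack can be made concrete with a short simulation. The sketch below is an illustration under stated assumptions, not the implementation used by Theios and his collaborators: the class and method names are invented here, the number of single-item levels (`capacity`) is a free parameter, a re-presented item is moved to the top level by default (the text only requires its new level to be no lower than its old one), and a search of long-term memory is counted as one extra comparison.

```python
class PushDownStack:
    """Sketch of a push-down memory stack with a long-term store at the bottom."""

    def __init__(self, capacity, relocate=None):
        self.capacity = capacity      # number of single-item levels (assumption)
        self.levels = []              # index 0 is the top of the stack
        self.long_term = {}           # lowest level: holds many items
        # relocate maps the old index of a repeated item to its new index,
        # which must not exceed the old one; default is the top (assumption).
        self.relocate = relocate or (lambda old_index: 0)

    def present(self, item, label):
        """Store a presented item together with its positive/negative label."""
        for index, (stored, _) in enumerate(self.levels):
            if stored == item:
                # Already in the stack: keep a single representation, moved up.
                del self.levels[index]
                new_index = min(self.relocate(index), index)
                self.levels.insert(new_index, (item, label))
                return
        self.long_term.pop(item, None)
        self.levels.insert(0, (item, label))  # push on top; others move down
        if len(self.levels) > self.capacity:
            old_item, old_label = self.levels.pop()
            self.long_term[old_item] = old_label

    def search(self, item):
        """Serial, self-terminating search from the top; returns (label, comparisons)."""
        for position, (stored, label) in enumerate(self.levels, start=1):
            if stored == item:
                return label, position
        # Long-term memory is consulted only after every other level.
        return self.long_term.get(item), len(self.levels) + 1


stack = PushDownStack(capacity=3)
for item, label in [("A", "positive"), ("B", "negative"), ("C", "positive")]:
    stack.present(item, label)
print(stack.search("A"))   # found at the third level
stack.present("A", "positive")
print(stack.search("A"))   # a repetition has moved it to the top
```

Because the number of comparisons depends on where an item currently sits, recently or frequently presented items are found sooner, which is how the model produces sequential and probability effects.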
Since the model is serial and self-terminating, it predicts the same linear relation for both positive and negative items provided they are presented with the same probability. Moreover, because of its built-in assumption about the effect of repetitions and the push-down character, it admits sequential effects.
11.3.2 Comparison of Serial-Exhaustive and Self-Terminating Models
One notable difference between the two models, as stated, is that the push-down model is greatly influenced by the probability of a stimulus being presented. Theios et al. (1973) investigated this in an experiment designed to unconfound presentation probability and set size. In their design the positive and negative sets, consisting of digits, were of the same size and varied from one to five. Except for the case of one item, presentation probabilities were varied according to the same pattern in both sets. At each set size, presentation probability had comparable effects for both positive
and negative items. For example, in going from 30% to 5% in set size 3, the MRT increased about 80 msec. Using an analysis of variance, the effects were significant. In addition, at each probability level there was an effect of set size. Various versions of the push-down stack model were simulated and fit to the data quite successfully. However, the exhaustive model is also able to account for them, with more parameters, by assuming that the encoding process depends upon the presentation probability. In this study no significant interaction between presentation probability and set size was found. None is predicted by the exhaustive model and although a correlation is predicted by the self-terminating model, it was too small to be seen. Theios and Walter (1974) repeated Sternberg (1967b), and they did find a significant interaction. Snodgrass (1972) also presented data favoring self-terminating over exhaustive search (see Section 12.3.4).
In a still unpublished paper, however, Sternberg (1973) raised distributional arguments against the self-terminating serial models. Assume that N stimuli, presented equally often, are in the buffer. Let F_j(t) be the probability distribution that it takes time t to search for and respond to an item when it is in memory location j; this is assumed to depend on the location, but not on the particular item in store. The main assumption made about these distributions is that they are ordered by their location, later ones taking longer than earlier ones: for all t > 0 and all j < k,
    F_j(t) ≥ F_k(t).    (11.3)
Next, let a_ij be the probability of item i being located at memory site j. Since every item is somewhere and every location is filled,
    Σ_i a_ij = Σ_j a_ij = 1.    (11.4)
Now let G_{i,N}(t) denote the probability distribution of a response to item i when N states are involved, and let G_N denote the average distribution. Then,
    G_{i,N}(t) = Σ_j a_ij F_j(t)
and
    G_N(t) = (1/N) Σ_i G_{i,N}(t) = (1/N) Σ_j F_j(t).    (11.5)
It is interesting that G_N does not depend upon the a_ij.
Now, suppose M
Sternberg called this the short-RT property. Note that its derivation does not depend upon Eq. 11.3, but merely on Eq. 11.5 and F_j(t) ≥ 0. To derive the long-RT property, suppose 0 < M < N. From Eq. 11.3, we know that for j ≤ M, F_j ≥ F_{M+1}, so
And for M + 1 ≤ j, Using these two inequalities, we see that
Combining Eqs. 11.6 and 11.7,
Sternberg tested this prediction using N = 4, M = 2, and N = 8, M = 2, so and in a study of the absolute identification of digits. As we would anticipate, MRT was linear with log N. The long-RT property was satisfied, but the short-RT property was grossly violated. He also applied the method to memory scanning data, and again rejected the serial, self-termination hypothesis.
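One way to see where the two properties come from is to work directly from the averaged distribution. The derivation below is a sketch under two assumptions: that the average distribution is G_N(t) = (1/N) Σ_j F_j(t), which is what makes it independent of the a_ij, and that the location distributions are nonnegative and ordered so that F_j(t) ≥ F_k(t) whenever j < k. The statements numbered 11.6 and 11.7 in the original may differ in form from the bounds obtained here.

```latex
% Assumed: G_N(t) = (1/N) \sum_{j=1}^{N} F_j(t), with F_j(t) \ge 0
% and F_j(t) \ge F_k(t) for j < k (earlier locations are faster).
\begin{align*}
N\,G_N(t) &= \sum_{j=1}^{N} F_j(t) \;\ge\; \sum_{j=1}^{M} F_j(t) = M\,G_M(t)
  && \text{(lower bound; uses only } F_j \ge 0\text{)} \\
\sum_{j=M+1}^{N} F_j(t) &\le (N-M)\,F_{M+1}(t) \;\le\; (N-M)\,G_M(t)
  && \text{(ordering; } G_M \ge F_{M+1} \text{ because } F_j \ge F_{M+1},\ j \le M\text{)} \\
N\,G_N(t) &= M\,G_M(t) + \sum_{j=M+1}^{N} F_j(t) \;\le\; N\,G_M(t)
  && \text{(upper bound)} \\
\tfrac{M}{N}\,G_M(t) &\le G_N(t) \le G_M(t), \qquad 0 < M < N .
\end{align*}
```

The lower bound constrains the fast tail of the larger-set distribution and needs no ordering assumption, whereas the upper bound depends on the ordering of the locations.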
11.3.3 A Parallel, Exhaustive Search, Capacity Reallocation Model
Atkinson, Holmgren, and Juola (1969) and, independently, Townsend (1969, 1974) proposed the following parallel, exhaustive search model to account for Sternberg's data. Suppose the test item is simultaneously compared with the M items stored in memory. The assumption is that the hazard function h_i(t) of the ith comparison represents the part of some finite total capacity that is devoted to that activity. Let v denote the total; then the assumption is that
    Σ_i h_i(t) = v.    (11.8)
As we know (Theorem 2.3), in a race among independent processes the hazard function to the first completion is the sum of the individual hazard functions, and so is v. When that first completion occurs, it is assumed that the capacity is then redistributed among the remaining M − 1 alternatives, at which point the process continues as a race between M − 1 independent processes with overall hazard function v. This repeats until all M comparisons have been completed. Thus, the overall time to complete the memory search, which since it is exhaustive is independent of whether the test item is positive or negative, is gamma distributed with parameters M and v. The mean is, of course, M/v. Note that because of the reallocation, the random variables describing the overall times for each comparison are not independent. Sternberg (1966) showed that no independent, parallel model with fixed distributions for the processes can yield a time proportional to M. If, however, the distributions are allowed to depend upon M, then, as Townsend (1974) has shown by example, it is possible to get the desired prediction. These models share the weakness of the serial, exhaustive search model in being unable to account for serial position effects and incorrectly predicting the variances to grow linearly with M.
11.3.4 An Optimal, Capacity-Reallocation Model
During World War II a theory of optimal search (for submarines) was developed (Koopman, 1956a, b, 1957; see Stone, 1975, for a more recent review). Shaw and Shaw (1977) and Shaw (1978) proposed to adopt it as a psychological theory. Basically, the model assumes that attention is allocated over time to several possible locations as in Eq. 11.8. If the target is in location j, it is assumed that the probability of a response (namely, spotting the target) by time t has the distribution corresponding to the hazard function h_j(t); that is, by Eq. 1.8,
    F_j(t) = 1 − exp[−H_j(t)],    (11.9)
where
    H_j(t) = ∫_0^t h_j(x) dx.
Stone (1975, Theorem 2.25) showed the following. Suppose the cost of the search is proportional to the effort expended, that is, to total capacity
and, for each i, the detection function F_i(t) is what he calls regular; namely, it is 0 if H_i(t) = 0 and F_i'(t) = h_i(t) exp[−H_i(t)] is continuous, positive, and strictly decreasing. Then there is an allocation function h*(t) = [h_1*(t), h_2*(t), ..., h_N*(t)] of capacity that is uniformly optimal in the sense that for any other allocation function h(t) meeting the cost requirement, the probabilities of detection satisfy
Since, by Feller (1966, p. 148), for any nonnegative random variable T_i with distribution F_i,
    E(T_i) = ∫_0^∞ [1 − F_i(t)] dt,
it follows immediately from Eq. 11.10 that
that is, the uniformly optimal search also minimizes expected search time. This simultaneous optimization of two variables, which we do not normally expect, arises in this case because cost is proportional to h_i, which is functionally related to response time. Shaw erred in thinking that Eq. 11.9 ensured that F_i would be concave down, as is needed for Stone's theorem to apply. It has to be an added assumption. Assume that detection is exponential; that is, for some f,,,
where v_i is the capacity allocated to the search in location i, and assume that total capacity is constant,
Suppose the alternatives have been numbered so that the prior probabilities of target location are P_1 ≥ P_2 ≥ ... ≥ P_N, and let t_i be the first time that the prior probability equals the posterior probability of the region i being searched (i.e., a region i is searched at time t if h_i(t) > 0). Then it can be shown (Stone, 1975, p. 55) that t_1 ≤ t_2 ≤ ... ≤ t_N and that the optimal
allocation function satisfies for i < j and t > t_j,
    P_i exp[−H_i(t)] = P_j exp[−H_j(t)].    (11.13)
From this it follows (Shaw, 1978, Appendix B; see Section 11.3.5) that the expected time to discovery of a target at location j is given by
    E(T_j) = (1/v)[j + Σ_{i<j} ln(P_i/P_j) + Σ_{i>j} P_i/P_j].    (11.14)
For the experiment to which she applied the model, there were two levels of prior probability, P_a and P_b, P_a > P_b, with n_a locations having probability P_a and n_b with probability P_b. In that case, it follows from Eq. 11.14 that
and so one test of the model is to estimate 1/v from different choices of P_a and P_b. In such an experiment Shaw (1978) found satisfactory agreement for 10 of 14 subjects. In terms of the memory search experiment we see that the model is self-terminating, but it is neither serial nor parallel, but rather hierarchical. It predicts a linear growth with the number of items because when the target locations are equally likely (that is, P_i = 1/N), the term ln(P_i/P_j) of Eq. 11.14 drops out and so E(T_j) = N/v. Adding a residual time yields the linear relation. Note that no serial position effect is predicted. As it stands, the model does not deal with the absence of a target.
*11.3.5 Proof of Eq. 11.14
The following proof is taken from Appendix B of Shaw (1978):
where
We evaluate G_ij. First, suppose i < j; then for t < t_j we know by definition of t_j that h_j(t) = 0 and for t > t_j by Eq. 11.13 that exp[−H_j(t)] = (P_i/P_j) exp[−H_i(t)]. Differentiating and substituting,
Observe that for t > t_N, H_i(t) ∼ vt/N, so u(∞) = 0 and Substituting, Next, suppose i > j; then for t
As before, u(∞) = 0 and u(t_i) = exp[−H_i(t_i)] = exp(−0) = 1, so Putting these two terms together yields Eq. 11.14.
11.3.6 A Parallel, Self-Terminating Search Based on a Continuous Random Walk
Perhaps the most sweeping theoretical model in this general area is the memory retrieval model of Ratcliff (1978). It assumes that whenever a person must compare a signal presented with items stored in memory, it is done by carrying out in parallel and independently as many searches as are needed. If a match is found in any of the searches, then all processing ceases and a match response is made. Assuming that, the only question is how an individual match is carried out. Ratcliff assumed that it can be described as a continuous random walk of the type discussed in some detail in Section 9.2.1. Recall, there are four parameters: a diffusion rate m and variance
stimuli, those in the memory set and those not, we anticipate two diffusion rates, one positive and one negative, which Ratcliff denoted u and v, respectively. He assumed that σ^2 (which he symbolized by s^2) is the same for both. The original wrinkle introduced at this point, one that has given this model a great deal of flexibility, is that the rates are in fact random variables over otherwise identical trials. He initially assumed them to be Gaussian with means u and v and variance η^2. This added freedom is adequate to permit the model to mimic a great deal of somewhat surprising data. In his 1978 paper, he examined five sets of data, some old and some new, and was able to use the same values for the two variances for all of the experiments (of which I shall only mention one here and another in Section 11.4.4). Thus, in any particular experiment he had the four parameters u, v, z, and a, to which he added a mean residual time r_0 (he called it Ter).
As an aside, let me mention his somewhat unusual way of fitting the model to data. As we know, this is a delicate matter when we study response distributions. For example, fitting the moments of, say, the response-time distribution, to the empirical moments is not usually a very good idea if more than the first two moments are involved. The reason is that increasingly they become dominated by a very few observations in the tails because their values are being raised to powers ≥ 3, and so these estimates simply are not very stable quantities. The major alternative is to attempt some sort of smoothing of the empirical histogram, and then fit the theory to the smoothed data. The two most common ways of doing that are to run the histogram through some sort of weighted averaging or to approximate it by a spline (see Section 3.2.5). Rather than that, Ratcliff took seriously the fact that, for reasons unknown, the ex-Gaussian (the convolution of an exponential with a Gaussian) fits response-time distributions well.
This was demonstrated in Section 3.2.1, although it is equally clear that for weak signals it is a poor approximation since its hazard function is increasing whereas the data indicate that it must be increasing and then decreasing to a constant (Sections 4.1 and 4.4). In any event, he did optimal fits of this three-parameter family to the data, and then he fitted the random walk model to that approximation. The danger in this approach is having data not well fitted by the ex-Gaussian; the fit must be verified each time the technique is used.
It is not intuitively obvious that this model can reproduce the Sternberg results for mean times. To check that, Ratcliff (1978) ran a memory set-size experiment resulting in typical data shown in Figure 11.3. Of course, in fitting his model many more parameters are involved than the two needed for an SES model. But it is equally true that with these parameters one accounts for far, far more data than just the means. For example, the predicted values for the response-time distributions to correct responses are shown in Figure 11.4. The fit is clearly very satisfactory. The same cannot be said for the error distributions, since the model predicts far more long times than are observed. Whether this is a serious criticism of the model is not
FIG. 11.3 MRT versus serial position and set size for Ratcliff's memory set-size experiment. Each of two observers contributed 1280 observations per set size. One standard deviation error bars are shown. [Figure 13 of Ratcliff (1978); copyright 1978; reprinted by permission.]
clear. Ratcliff claimed not, arguing that the prediction rests largely on the form of the tails of the Gaussian governing the rate of information accumulation. By simulating the process, he showed that a somewhat truncated Gaussian, in which the tail probability is spread about as a uniform distribution over the center of the empirical distribution, had the appropriate properties.
11.3.7 Distributive Memory Model
Anderson (1973) pointed out that the estimated memory-search rate of 30-40 msec per item arising from the serial search models is simply too fast, given what is known about neural switching times, to make plausible the existence of a detailed item-by-item comparison. He suggested instead a quite different type of model for the memory involved, one derived from his earlier discussions of the nature of long-term memory (Anderson, 1970,
1972). For more recent developments, see Anderson, Silverstein, Ritz, and Jones (1977), Hinton and Anderson (1982), Kohonen (1977), and Levy, Anderson, and Lehmkuhle (1984). Memory is assumed to draw upon some large number K of individual neural systems in the following fashion. Each signal s_i results in a characteristic pattern of activity over these systems. If we assume the level (this
FIG. 11.4 Response-time distributions fit by the memory retrieval model for the data of Figure 11.3. [Figure 15 of Ratcliff (1978); copyright 1978; reprinted by permission.]
could be a voltage or a firing rate) of activity on subsystem k is f_ik, then the signal is represented by the vector
    f_i = (f_i1, f_i2, ..., f_iK).
The memory of a series of presentations s_1, s_2, ..., s_M is simply a weighted sum of these vectors
At first glance, it appears that all the information about an individual signal is lost by summing over all of them. This, however, is not as severe as it seems. To take an extreme case, suppose all signals, both in the memory set and in the distractor set, have orthogonal representations in the sense that
where • denotes inner product. Then we see that
This model implies that a correct recognition, although not an absolute identification, can be based upon the calculation of the scalar f_i • M. Anderson generalized this property by supposing that the stimuli actually form a random sample from a population of stimuli whose representations are generated by independent selections from K independent random variables f_k. In addition to the independence assumptions, several somewhat inessential ones are made:
It follows that if f_i and f_j are independent selections so that f_ik and f_jk, k = 1, ..., K, are mutually independent random variables, then using Eq. 11.19,
Thus,
Anderson estimated a signal-to-noise ratio for this process, but following my previous practice I shall compute d'. We need to know the variance of
the criterion variable when a random signal not in the memory set is presented. Suppose its representation is g. Then using independence freely,
Observe,
So
For simplicity, assume Q = C, then On this assumption, we may calculate d' from Eqs. 11.20,11.21, and 11.23,
Thus performance grows as the square root of the number of neuronal systems dedicated to memory and inversely as the square root of the number of items in the memory.
To get a reaction-time prediction, Anderson simply postulated that E(T) is linear with the number of items in memory, the rationale apparently having to do with the summing operations to form M. To my mind, this does not seem at all sensible. After all, the memory is formed prior (in some experiments a long time prior) to the presentation of the test item, whereas response time is measured from that point. According to the model, the subject calculates g • M and that calculation clearly depends upon K but not obviously on M. There is a way out; however, it is quite ad hoc. If we are dealing with short-term memory, perhaps the subject is able, up to a point, to dedicate increasing numbers of neuronal systems to the task as the memory load is increased. In particular, if it is done so that the error rate is
FIG. 11.5 MRT versus memory set size for varied and consistent mappings. Each data point is based on a sample size of 60 from each of two subjects. Note that the consistent mapping leads to the faster search (smaller slope). [Figure 5 of Schneider and Shiffrin (1977); copyright 1977; reprinted by permission.]
fixed (that is, d' is constant), then by Eq. 11.24, K = (d')^2 M, whence
which produces the Sternberg result. Observe that this model automatically exhibits no impact of memory set size on E(T) once K reaches its maximum value, which presumably is not large for short-term memory and is large but possibly not flexible for long-term memory. Figure 11.5 shows data of Schneider and Shiffrin (1977) in which MRT is examined for both varied- and consistent-set procedures. MRT changes very little for the consistent procedure, as would be expected if the memory set is in long-term store, and increases appreciably for the varied-set procedure. These ideas are very attractive for recognition, but they do not automatically generalize to absolute identification so far as I can see.
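The square-root relation between d', K, and M can be illustrated by simulation. The sketch below is not Anderson's procedure: it assumes that each component of a representation is an independent Gaussian with mean 0 and variance 1/K (so that a vector has expected squared length 1), that the memory vector is the unweighted sum of the M stored vectors, and that d' is the difference between the mean criterion values for old and new test items divided by the standard deviation for new items. All function names are invented for the example.

```python
import math
import random
import statistics


def random_signal(K, rng):
    """A representation over K subsystems; components are N(0, 1/K) (assumption)."""
    scale = 1.0 / math.sqrt(K)
    return [rng.gauss(0.0, scale) for _ in range(K)]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def estimate_d_prime(K, M, trials, rng):
    """Estimate d' for recognizing one of M stored items against a new item."""
    old_scores, new_scores = [], []
    for _ in range(trials):
        items = [random_signal(K, rng) for _ in range(M)]
        # Memory is the unweighted sum of the stored vectors (assumption).
        memory = [sum(column) for column in zip(*items)]
        old_scores.append(dot(items[0], memory))                # item in memory
        new_scores.append(dot(random_signal(K, rng), memory))   # item not in memory
    separation = statistics.mean(old_scores) - statistics.mean(new_scores)
    return separation / statistics.stdev(new_scores)


rng = random.Random(1)
for K, M in [(64, 1), (64, 4), (256, 4)]:
    estimate = estimate_d_prime(K, M, 300, rng)
    print(f"K = {K}, M = {M}: estimated d' = {estimate:.2f}, sqrt(K/M) = {math.sqrt(K / M):.2f}")
```

Under these assumptions the estimate should track the square root of K/M, so quadrupling the memory load halves d' unless K is quadrupled as well, which is the compensation invoked in the fixed-error-rate argument.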
11.4 SAME-DIFFERENT EXPERIMENTS
Perhaps the simplest version of the memory search task is one in which the memory set includes just one stimulus (which may, in fact, consist of a string of items such as letters), and the task is to decide whether the test stimulus is the same as or different from the memorized stimulus. Since this is a two-stimulus, two-response design, it logically belongs in Chapter 6. However, the fact that in general the stimuli consist of L distinct items such as letters or digits and that some sort of search is assumed to be carried out over these items makes it more appropriate to treat these experiments here. Because L, the length of the stimulus string, is often varied, some concern must be given to whether the stimuli are apprehended in a single fixation or not; presumably what goes on within a glance differs from what happens in shifting from one portion of the stimulus to another (Krueger, 1978).
11.4.1 The Basic Data
The experiment has been run in many variants and discussed by many people. The major references are: Bamber (1969a, b, 1972), Beller (1970), Burrows (1972), Egeth (1966), Eichelman (1970), Grill (1971), Krueger (1973, 1978), Krueger and Shapiro (1980, 1981), Nickerson (1965a, 1969, 1971, 1972), Posner and Mitchell (1967), Ratcliff and Hacker (1981), Silverman (1973), Silverman and Goldberg (1975), Taylor (1976a, b), and Tversky (1969). The four most conspicuous features of these data are:
1. On trials when the stimuli do not match and with L fixed, the MRT decreases with the number D of individual elements that differ. For example, if the memory stimulus is XBMF, then on average the response of "different" is faster to XPME than it is to XPMF.
2. On non-match trials, longer strings are slower than shorter ones. These two facts are illustrated in Figure 11.6.
3. The MRT when the "same" response is made is faster on average than when the "different" response is made; indeed, in Bamber's data the former time is approximately the same as the fastest difference case, that is, when D = L. This can be seen by comparing Figure 11.6 with Figure 11.7. Krueger (1978, p. 290) reported that in the 57 experiments he found involving stimuli with just one element and for which times were reported, 35 exhibited faster "same" responses than "different" ones. However, at least one later study (Ratcliff & Hacker, 1981) demonstrated that this relation is affected by instructions and so, presumably, is under subject control.
4. The pattern of errors is also systematically related to the experimental parameters. In particular,
FIG. 11.6 MRT versus the number, D, of elements that differ in strings of length L in a same-different experiment. There were four subjects, each of whom contributed from 120 to 240 observations per point. The predicted values are from the serial, self-terminating model. [Figure 1 of Bamber (1969a); copyright 1969; reprinted by permission.]
(a) The probability of an error is larger when the response is "same" than when it is "different." Krueger (1978, p. 290) stated this was true in 42 of 65 single-element experiments that he examined.
(b) The probability of an error when the response is "different" increases with L and decreases with D. This is evident in Figure 11.8.
Two things are noteworthy. First, the rapidity of the "same" responses relative to the "different" ones is somewhat surprising since, after all, only one difference need be noted to warrant a "different" response whereas all elements must be identical to warrant a "same" response. Second, the facts that both times and errors vary systematically with the nature of the stimulus (same or different) and with the experimental parameters L and D strongly suggest that a speed-accuracy tradeoff should flow naturally from any plausible model of the process.
FIG. 11.7 MRT versus list length L for "same" responses of the experiment reported in Figure 11.6. Again there were four subjects, each of whom contributed from 240 to 540 observations per point. The predicted values are from the serial, self-terminating model. [Figure 2 of Bamber (1969a); copyright 1969; reprinted by permission.]
11.4.2 Modified Error-Free Stage Models
The fact that for strings of fixed length L, MRT decreases with increasing numbers of different elements D suggests (but by no means proves) a self-terminating search process. Bamber (1969a) fit the serial self-terminating model to the different responses, with the satisfactory results we have already seen in Figure 11.6. Only two parameters are involved: a residual time of 384 msec, which seems a bit long, and a scan rate of 60 msec per item, which is 50% slower than that estimated by Sternberg. But the real problem is not these estimates but the fact that for each length L the model predicts the slowest search should be to stimulus pairs that match; that prediction is often grossly wrong (see Figure 11.7). In addition, the estimated scan rate has speeded up to 25 msec per item, more than twice as fast as the other estimate, which is an unhappy lack of invariance. And, of course, as is generally true of stage models, there is no place for errors. This model will not do.
Bamber suggested a two-process model in which, in addition to a relatively slow serial scan, there is a more rapid "identity" reporter. This will explain the data since it was designed to do so, but it is neither particularly plausible nor has it been shown to be a useful device for explaining other data. As Taylor (1976b) has noted, a device that can rapidly spot a match can presumably also rapidly spot a mismatch; it seems odd not to invoke it
FIG. 11.8 Percentage of same responses to the non-matching stimuli versus the number D of mismatches and list length L for the experiment of Figure 11.6. The sample sizes are as in Figure 11.7. [Figure 4 of Bamber (1969a); copyright 1969; reprinted by permission.]
on all trials. Moreover, as Bamber (1972) showed, the results do not change if the match is conceptual (A and a) instead of identical (either A and A or a and a), which belies any sort of direct template comparisons.
Taylor (1976a) undertook a comprehensive survey of a variety of stage models that had arisen to that point. He (1976b) systematically attempted to fit each of these models to the data in Bamber's study as well as his replication of it and a variant in which the subject was to report either "all different" or "some different." Without going into the details of how he went about it, he found that a number of the models were suitable provided they were somehow modified to generate fast "same" responses. In particular, all acceptable models were self-terminating, and there was some evidence favoring parallel over serial models. Moreover, those favored exhibited limited capacity, one consequence of which is that the performance on one item is affected by what has happened on other items. But no model in its simple form gave a satisfactory account of the "same" data.
To handle that, Taylor explored three modifications, two of which yielded identical predictions for the mean times. One was Bamber's fast identity reporter. Indistinguishable from it as far as MRTs are concerned is a decision bias that increases, by a constant factor, the decision times to items that differ. The third is a guessing model having the following features. After any identical item is compared and no mismatch has been found up to that point, there is a fixed probability that the entire process, whether serial or parallel, is terminated and the response "same" is emitted. (In the variant mentioned above, this guessing rule was applied to non-identical items but the guessed response is that all comparisons are different.) All three modifications were adequate to bring all of the partially acceptable models into approximate accord with the data.
Again, as in Bamber's identity model, the add-on seems ad hoc and to be acceptable it must be useful in understanding a great deal of other data. So far, that has not been demonstrated.
11.4.3 Serial, Exhaustive, Imperfect Search
Krueger (1978) suggested a somewhat different approach to these data that blends the simple search ideas with some stochastic models developed for two-choice response times (Sections 8.2 and 8.3). His general qualitative assumptions are three. First, the process of comparing pairs of stimulus features is less than perfect, which places the decision system in a situation of selecting a response on the basis of a random variable—namely, the count of the number of mismatches of individual features (possibly stimulus items, but not necessarily). Second, the imperfection is asymmetric in the sense that the effect of "noise" is far more likely to change a match into a mismatch than vice-versa. This seems plausible since almost any change will corrupt a match whereas only highly selected ones convert a mismatch into a match. Third, because mismatches arise so easily from noise, the system must be designed to be conservative about reporting mismatches and so is
likely to recheck the stimuli whenever a non-zero mismatch count occurs in an attempt to develop more certain evidence of an actual mismatch. In a sense, then, he assumed a random walk model on the number of mismatches. Qualitatively, the model has the correct properties. It is slower on average when there is an actual mismatch because that condition generally leads to an observed mismatch and unless it is extreme, it is likely to be rechecked. In contrast, on those trials where the stimuli match and no observed mismatches occur, there will be no rechecking and so a certain fraction of the same responses are faster than most of the difference responses. It is built into the second assumption that there must be more errors when the pair is a match than when it is a mismatch. And with the number of items held fixed, the proportion of errors decreases with the number of mismatched items, D, because the conversion of an actual mismatch into a perceived match is more probable when the number of mismatches is smaller than when it is larger. Let us make the model more specific. Let n denote the number of distinct features defining the stimulus for the subject. This is almost certainly not the number of letters in a stimulus; for example, Krueger assumed n = 100 when L = 1. Let d be the number of mismatched features among these n, so on trials when the pair is a match, d = 0. Assume that feature comparisons are statistically identical and independent, and let p denote the probability that a feature pair is incorrectly perceived, that is, either a match being a mismatch or the other way around. Symmetry is assumed here, but leads to overall asymmetry because p ≠ 1 − p. Let M be the random variable that gives the total count of perceived mismatches. On these assumptions, for the match trials, the distribution of M is obviously the binomial
$$P(M = m) = \binom{n}{m} p^m (1 - p)^{n - m}, \qquad m = 0, 1, \ldots, n.$$
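For trials with d actual mismatches, M can be thought of as the sum of the observed mismatches, $M_d$, arising from the d mismatches and the observed mismatches, $M_{n-d}$, arising from the n − d matches. Each is again binomial:
$$P(M_d = m) = \binom{d}{m} (1 - p)^m p^{d - m}, \qquad P(M_{n-d} = m) = \binom{n - d}{m} p^m (1 - p)^{n - d - m}.$$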
So the distribution of M is the convolution of these two distributions, which we do not need to write out explicitly. For stimuli composed of a single letter, Krueger assumed n = 100 and he treated d and p as free parameters to be estimated from the data. For the random walk, he established barriers at values of M corresponding to likelihood ratios of 1 to 25 and 25 to 1. Whenever a rechecking occurred,
FIG. 11.9 Data and predictions for a replication of Bamber's (1969a) experiment. The predicted values arise from a model in which mismatches tend to be rechecked (see text). I could not infer the sample sizes. [Figure 6 of Krueger (1978); copyright 1978; reprinted by permission.]
the several random variables were added and the boundaries were recomputed so as to maintain the same likelihood ratios. The parameter space was searched numerically. To compare the model with the response-time data, it is necessary to add another parameter, namely, the time equivalent for each cycle of comparisons of the several features. The choice varied from case to case, but in the one shown here it was chosen to be 125 msec per cycle. That choice was not part of the numerical parameter search. For stimuli with strings of L (≤ 4) letters, it was assumed that the entire stimulus consisted of nL = 100L features. Letting D denote the number of letter mismatches, it was found that the value of p must vary considerably with D. Rather than estimate the number of feature mismatches and p separately for each D, he assumed the former to be given by dD, where d is the value for a single mismatched letter, and that the latter, p, arises from a rather ad hoc formula I will not spell out. Six distinct experiments, three involving L = 1 and the other three having values from L = 1 to 4, were reported. I present the data for just case 4, which replicates Bamber (1969). The fit of the model to both MRT and error data is shown in Figure 11.9. The major failing of the model is the underestimation of the response time for D > 2. Presumably this could have been greatly improved by making separate estimates of p, which reflects a change in the subject's criterion for what constitutes a match of a feature pair. The fits to the other data are comparable, and they certainly encourage further work on the model.
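The distribution of the perceived mismatch count M in this model can be computed directly. The following Python sketch is an illustration only, not Krueger's fitting procedure: it assumes the binomial structure stated in the text (an actual mismatch is perceived as a mismatch with probability 1 − p, an actual match with probability p) and convolves the two parts. The function names and the example values d = 20 and p = 0.1 are hypothetical choices, not estimates from the data.

```python
from math import comb


def binomial_pmf(n, q):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, q)."""
    return [comb(n, m) * q**m * (1 - q) ** (n - m) for m in range(n + 1)]


def mismatch_count_pmf(n, d, p):
    """Distribution of M when d of the n features actually mismatch.

    M is the sum of two independent binomial counts, so its distribution
    is the convolution of the two binomial distributions.
    """
    from_mismatches = binomial_pmf(d, 1 - p)   # mismatches perceived correctly
    from_matches = binomial_pmf(n - d, p)      # matches perceived as mismatches
    pmf = [0.0] * (n + 1)
    for a, pa in enumerate(from_mismatches):
        for b, pb in enumerate(from_matches):
            pmf[a + b] += pa * pb
    return pmf


def mean(pmf):
    """Expected value of a count with the given probability mass function."""
    return sum(m * pm for m, pm in enumerate(pmf))


def likelihood_ratios(n, d, p):
    """Ratio P(M = m | d mismatches) / P(M = m | match) for each m."""
    mismatch = mismatch_count_pmf(n, d, p)
    match = mismatch_count_pmf(n, 0, p)
    return [a / b for a, b in zip(mismatch, match)]


if __name__ == "__main__":
    n, d, p = 100, 20, 0.1   # illustrative values only
    print("mean M on match trials:   ", mean(mismatch_count_pmf(n, 0, p)))
    print("mean M on mismatch trials:", mean(mismatch_count_pmf(n, d, p)))
```

A likelihood-ratio barrier such as 25 to 1 then corresponds to the smallest count m at which `likelihood_ratios(n, d, p)[m]` exceeds 25; the rechecking and barrier recomputation described above are not modeled here.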
One prediction of the model, which is made neither by Bamber's model nor by the several examined by Taylor but is widely true of the data, is that erroneous (i.e., same) responses to non-matched stimuli are actually slower than the correct responses. This presumably arises in the model because the count has hovered longer in the region of indeterminacy before the erroneous response is finally made, which can happen because of the asymmetry of distributions of steps in the random walk. A variant on the Krueger model was proposed by Miller and Bauer (1981). The basic difference is that rechecking occurs not because found differences are suspected to be unreal, but because they may simply be irrelevant to the discrimination being made.
11.4.4 A Continuous Random Walk Model
Probably the most successful model to have been proposed so far is the parallel, self-terminating, continuous random walk model of Ratcliff (1978). It is similar in some ways to Krueger's model, but does not invoke a rechecking mechanism. The general model was described in Section 11.3.6 in connection with the memory scanning experiment. Its main features, it will be recalled, are that the memory is searched by independent, parallel processes; a response is initiated as soon as a match is established or, barring that, when all processes have been completed without finding a match; the individual processes are modeled by a continuous random walk in which the rate parameter is a random variable over trials, its mean value being larger the more similar two stimuli are; and the boundaries of the random walk are parameters adjusted by the subject to effect a speed-accuracy tradeoff. In that first paper he fitted the model to several studies, including one from Ratcliff and Murdock (1976) in which subjects saw, and attempted to remember, 16 words and were tested with 32 words of which 16 were the memory set. The task for the subject was to identify which of the 32 words were "old" words from the memory set. A quite decent fit was provided by the model, as can be seen in Figures 11.10 and 11.11. (The plots of μ and τ in Figure 11.10 are the mean and time constant of the Gaussian and exponential whose convolution was fitted to the data.) He also accounted for the response-time distributions, which no other theory attempts to do, with the excellent results shown in Figure 11.12. In subsequent papers he has repeatedly shown the flexibility of his model to account for data. For example, in 1981 he observed that of the major contenders neither Bamber nor Taylor could produce a SATF since in both models the mechanism for error is independent of that for time. Although Krueger's model could do so, as I mentioned earlier, it entailed the curious change in one of the parameters.
And in Ratcliff (1981) he took up in detail some of the properties exhibited by order of presentation—which had been reported and modeled by Lee and Estes (1977, 1981)—and he was able to provide an account of them in terms of his random walk model.
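To make the ingredients of such a model concrete, here is a minimal Python sketch of a single comparison process: a discretized random walk between two barriers whose drift rate varies from trial to trial. It is an illustration under stated assumptions, not Ratcliff's procedure: the step size `dt`, the noise scale `s`, the normal distribution of drift over trials, and all parameter values are hypothetical, and the parallel combination of many such processes is omitted.

```python
import random


def simulate_trial(v, a, z, rng, s=1.0, dt=0.001, max_time=10.0):
    """Simulate one discretized random walk between barriers 0 and a.

    v is the drift rate, z the starting point, s the noise scale.
    Returns (outcome, elapsed_time), where outcome is "upper", "lower",
    or "timeout" if no barrier is reached within max_time.
    """
    x, t = z, 0.0
    step_sd = s * dt ** 0.5          # standard deviation of one small step
    while 0.0 < x < a:
        if t >= max_time:
            return "timeout", t
        x += v * dt + rng.gauss(0.0, step_sd)
        t += dt
    return ("upper" if x >= a else "lower"), t


def simulate(mean_drift, drift_sd, a, z, n_trials, seed=0):
    """Run n_trials walks, drawing a new drift rate on every trial."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        v = rng.gauss(mean_drift, drift_sd)   # drift is a random variable over trials
        results.append(simulate_trial(v, a, z, rng))
    return results


if __name__ == "__main__":
    trials = simulate(mean_drift=2.0, drift_sd=0.5, a=1.0, z=0.5, n_trials=500)
    hits = [t for outcome, t in trials if outcome == "upper"]
    print("proportion reaching the upper barrier:", len(hits) / len(trials))
    print("mean time of those walks:", sum(hits) / len(hits))
```

Moving the barriers apart (larger `a`) makes the walk slower and more accurate, which is the speed-accuracy adjustment the text attributes to the subject.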
FIG. 11.10 Comparison of data from a memory search experiment of Ratcliff and Murdock (1976) in which 16 of 32 words were in the memory set with predictions from Ratcliff's memory retrieval model. In addition to accuracy and MRT versus serial position, the parameters μ and τ are the means of the Gaussian and exponential of the ex-Gaussian fit to the data, and v, a, and z are parameters of the diffusion model. Each of four subjects contributed 192 positive and 192 negative observations. [Figure 9 of Ratcliff (1978); copyright 1978; reprinted by permission.]
One of the more striking successes of the model was to account for the data from Ratcliff and Hacker (1981), who not only showed empirically that they could shift, by instructions, the temporal relation between "same" and "different" responses, but that there are some odd speed-accuracy effects. The study involved memory sets of four letters selected at random and without replacement from a set of 10 consonants. The test stimulus was also a set of four letters that involved replacing either 0, 1, 2, 3, or 4 of the letters of the memory set by other consonants from the 10. The two experimental conditions were: say "same" only when absolutely sure of a match, and say "different" only when absolutely sure of a non-match. With a sample size of about 575 per condition, the results were as shown in Table 11.1. I find two features of these data noteworthy. First, in the condition where the subject is instructed to be sure of the "same" response, the MRT on matching trials is slower than that for three of the four non-match conditions. And second, accuracy on matching trials was poorer when the subject was attempting to be sure (and presumably that means more accurate) in making a "same" response than when attempting to be sure in making a "different" response. The parallel result is true on non-match trials when the instructions are to be sure of the different response: times are longer and accuracy is reduced. This fact, which apparently is not contested, is not accounted for in any of the other models, including a verbally formulated one of Proctor (1981) and a modification of it by
FIG. 11.11 Plots of MRT, accuracy, and a parameter of the model versus study position and partitioned according to the output position (O/P) for same experiment as Figure 11.10. [Figure 10 of Ratcliff (1978); copyright 1978; reprinted by permission.]
FIG. 11.12 For the same experiment as Figure 11.10, comparison of distributions and predictions of the memory retrieval model. [Figure 11 of Ratcliff (1978); copyright 1978; reprinted by permission.]
TABLE 11.1. Effect of instructions on MRT and percent correct in a Same-Different experiment (Ratcliff and Hacker, 1981)
Columns: Instruction Sure of "same" (MRT in msec, Percent correct); Instruction Sure of "different" (MRT in msec, Percent correct)
Rows: letters switched (Match 0, 1, 2, 3, 4)
573
.89
472
.97
605 51.7
.80 .96
695
.65
484
480 46 f
.98
531 518
.89 .94
.97
.97
Krueger and Shapiro (1981), but when the data are fitted by Ratcliff's model this arises naturally. There has ensued a considerable debate (Proctor and Rao, 1982, 1983; Proctor, Rao, and Hurst, 1983; Ratcliff, 1984; and Ratcliff and Hacker, 1982), which, although not fully resolved, seems to have ended up with acceptance of the flexibility of Ratcliff's model to handle the data and acknowledgement of the difficulty that other models have with it. I believe that one reason it has taken so long to accept Ratcliff's model is the fact that it is not as well understood analytically as one would like. For example, we do not understand the hazard function of the first passage time, and it is difficult to get clear-cut predictions that one can seek to see in the data. Rather, this is a case where the only option seems to be to search the parameter space numerically for a best fit, and then the quality of the fit is demonstrated pictorially as we have seen above in Figures 11.10-11.12. I find it difficult to sense intuitively what properties the model will and will not exhibit. Perhaps additional theoretical work will illuminate it more completely, although since it is a classical, well-studied stochastic process it is hard to be optimistic about finding simple results.
11.5 CONCLUSIONS
The evidence is that the basic MRT data from all three types of matching experiments should be approached with considerable theoretical caution. The most obvious interpretation is surely not the only one and may very well be wrong. A complex tradeoff exists among at least three major ways in which the mind might be organized: (i) the pattern of memory search (serial and parallel being the two simplest cases), which conceivably may differ for overlearned versus just-learned stimuli; (ii) the nature of the stopping rule:
self-terminating and exhaustive being the simplest but others exist, such as Krueger's repeated checking of a mismatch; and (iii) the possibility, which is especially important in parallel systems, of shifting "mental resources" or capacity, as it is called, from a completed process to those not yet completed. It should be clear at this point that MRT data from a single paradigm cannot lead to a clearcut decision among these possibilities. More of the response-time distributions than the mean should be explained by a model, SATFs should be studied explicitly, and tightly interlocked designs in which the same processes are used in different ways should be jointly explained by a theory. And, although there is little precedent for it in this literature, I suspect that sequential effects should be examined and accounted for. The next, and last, chapter examines some of the research that has resulted from attempts to understand how to make inferences about these three basic aspects of mental organization. It is quite astonishingly difficult to answer what seem to be the simplest and most basic questions about mental organization: Do we search memory serially or in parallel? When does a search stop? To what extent are mental resources shared and reallocated as they are no longer needed? These are the questions, and although we will understand better how to get at the first two, we are very far from having evolved ways to resolve them conclusively.
12 Processing Stages and Strategies
12.1 INTRODUCTION
Previously, especially in the preceding chapter, three general questions have arisen about models of mental processing. One is: may we think of the process as partitioned into distinct, autonomous stages of activity and, if so, what is the network that establishes the temporal relations among the stages? The second is: what strategy is followed in deciding when to respond, with the two extremes being exhaustive and self-terminating searches? And the third is: can processing capacity be reallocated when one stage is completed and others are still being processed? These interrelated questions have previously arisen in a variety of specific models. In addition, they have led to some quite general considerations about the equivalence of models and about experimental strategies to attempt to reach decisions about them. These more general considerations are our present topic. In Section 12.2, I focus on the ability of serial and parallel models to mimic either an arbitrary response-time distribution or just one another. The results vary both in what is assumed to be observable and what is assumed about the process: for example, self-terminating or exhaustive, and the nature of the independence among stages. The upshot of that work is simple. It is virtually impossible to identify the network being used from just data on the distributions of response times in a single experiment; however, with data on intermediate completion times and on the order of completion of stages, neither of which is usually available, it is possible to reject some models. This failure has led to two distinct experimental strategies, which are outlined in Sections 12.3 and 12.4. The first of these involves running a series of closely related experiments based on a limited set of stimuli, and to demand that they be accounted for in a consistent way using the same set of underlying parameters (usually, mean latencies). 
The second is to confine attention to a single paradigm, but to introduce experimental manipulations designed to affect the stages differentially. The pattern of overall times as a function of the manipulations allows some inferences about the nature of the network involved. A substantial fraction of the work covered in Sections 12.2 and 12.3 is due to J. T. Townsend together with students and collaborators. Anyone interested in these and related topics in greater detail than I shall treat them should consult the book by Townsend and Ashby (1983) and/or the original papers cited below.
12.2 SERIAL-PARALLEL EQUIVALENCES
12.2.1 General Nomenclature
If we are willing to suppose that information processing is carried out by distinct mechanisms that, under certain input conditions, do just one thing and then yield an output, we may schematically think of them as forming the nodes of a flow diagram in which the arrows represent precedence relations. The activity represented by a node is begun when and only when all of the nodes that feed into it have completed their tasks. This is the so-called "stage" conception of information processing. It is vaguely plausible if one thinks of the brain as highly compartmentalized, but many doubt if it is a good general model of processing. At a minimum, they argue, there may be substantial temporal overlap of function, with one stage beginning to operate on a partial output of another stage that has not fully completed its activities. One such theory, called cascade theory, was discussed in Section 4.3.2. We also note that the stage conception is not really consistent with most of the models that have been proposed for two-choice situations, in particular the various forms of information accumulation in Chapters 4, 8, and 9. Nonetheless, the stage conception has received the attention of many cognitive psychologists throughout the 1970s, and I shall report some of their results. If the network of a process is a simple chain in which each node, except for the initial one, follows just one other, then the system is said to be serial. If the network is a fan of arrows from one node to all but one of the others and a reverse fan from these to the remaining one, it is called a parallel system. Suppose that the nodes are numbered 1, 2, ..., n and that the time from initiation to completion of the processing stage i is a random variable $T_i$. A realization of $T_i$ is referred to as the actual processing time of the stage. The distribution of $T_i$ may depend upon aspects of the system other than the stage itself. In a serial system the time at which the ith stage is completed, assuming the nodes are numbered as they occur, is $\sum_{j=1}^{i} T_j$. In a parallel system matters are far more complex since the order of completion of stages is not fixed in advance; it depends upon the actual realizations of the n random variables $T_i$. In general, any order of completion can arise. Such an order is a permutation p of the set {1, 2, ..., n} of indices for the stages. Let the associated completion times be denoted $\tau_1 \le \tau_2 \le \cdots \le \tau_k \le \cdots \le \tau_n$, where $\tau_k = T_{p^{-1}(k)}$. We speak of
$$\tau_1,\ \tau_2 - \tau_1,\ \ldots,\ \tau_n - \tau_{n-1}$$
as the intercompletion times of the parallel process. Obviously, one set of intercompletion times looks exactly like a realization of a serial process with nodes in the same order as the times. But the intercompletion times, except for the first one, are not the actual processing times of the nodes of the parallel system. Rather,
$$\tau_1,\ \tau_2,\ \ldots,\ \tau_n$$
are the actual processing times. In a serial system, the intercompletion times are identical to the actual processing times. The reader should be alerted to a potential terminological confusion. I use the word "stage" to refer to a unitary processing activity occurring in an underlying network of nodes. My usage is consistent with that of Sternberg and his followers. In contrast, Townsend and his followers speak of such an activity as a "subsystem" or "subprocess," which is only a problem because they use the word "stage" to refer to something else, namely the number identifying an intercompletion time. It thus refers not to any locus in the system, but to a defined random variable. It is obvious, without any further analysis, that rigid serial and parallel systems cannot mimic each other at this level of detail. The reason is that the order of the one is invariant and that of the other depends upon the exact realizations of the random variables. But this is probably not a particularly relevant observation for two reasons. First, in many situations we simply do not have observational access to the order of completion and usually not to the intercompletion times themselves, but only to the overall time. This is certainly typical of most response-time studies treated in this book. Second, in many processing models we do not assume that the order of events in the serial system is rigidly determined to be the same trial after trial, in which case it is far less clear that the two processes are so distinct.
12.2.2 Mimicking Overall Decision Times
Consider any system where overall response time T has some distribution F. Can it be mimicked by either a serial or parallel system of n stages with exhaustive search? The answer is that it is trivial to do so using a serial system having perfectly correlated and identically distributed stages or with a parallel system having independent and identically distributed stages.
There are infinitely many other ways of partitioning such a distribution into n stages. For the serial system, let the random variable of each of the n stages be $T_i = T/n$, in which case they are perfectly correlated and their distribution functions are
$$F_i(t) = P(T/n \le t) = F(nt).$$
Observe that
$$\sum_{i=1}^{n} T_i = n(T/n) = T,$$
and so the process is mimicked. Next consider a parallel system with n independent stages having actual completion times $T_i$ with distribution functions $F_i(t) = F(t)^{1/n}$. Since the search is exhaustive, the observed time is $\max_i(T_i)$, which we know (Appendix A, Eq. A.7) has the distribution function
$$\prod_{i=1}^{n} F_i(t) = \left[F(t)^{1/n}\right]^{n} = F(t).$$
Consider next processes that are self-terminating. For the serial case, place the terminating stage in the first position and let $F_1 = F$. For the parallel process, let all of the independent stages have the same distribution $G_i = F$; then, independent of what happens on the other stages, the process is determined just by the terminating stage and so has distribution F. Summarizing,
Theorem 12.1. For any distribution of overall response time and any integer n, there exist both serial and parallel and both exhaustive and self-terminating search processes of n stages that have the given distribution; in the parallel case, the stages may be selected to be independent.
12.2.3 Mimicking Intercompletion Times by Serial and Parallel Exhaustive Searches
To make the mimicking tasks more difficult, let us suppose next that all of the intercompletion times are known, although the order of the stages to which they are attached is not specified. Numbering the intercompletion times by their order of occurrence, suppose that they have the joint distribution $F(t_1, t_2, \ldots, t_n)$. Mimicking this by the serial exhaustive search is trivial: simply identify the stages with the intercompletion times. For the parallel exhaustive search simply let the actual processing times be
$$t_1,\ t_1 + t_2,\ \ldots,\ t_1 + t_2 + \cdots + t_n,$$
with their joint distribution induced by F. Summarizing,
Theorem 12.2. For any joint distribution of n intercompletion times, there exist serial and parallel, exhaustive search processes of n stages whose intercompletion times have the given distribution.
12.2.4 Mimicking Intercompletion Times and Order by Serial and Parallel Exhaustive Searches
As we noted at the end of Section 12.2.1, it is impossible for a rigid serial search to mimic both the times and orders of a parallel search. To be able to do this one must drop the fixed order of processing. It suffices to suppose that the order, a permutation p of {1, 2, ..., n}, is selected at random with some probability P(p). We refer to this as a variable order serial system.
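Before turning to the variable-order case, the constructions behind Theorems 12.1 and 12.2 can be checked numerically. The Python sketch below is only an illustration: the choice of an exponential F with rate 1, the sample sizes, and all function names are assumptions made for the example. It draws overall times from the serial construction (n perfectly correlated stages equal to T/n) and from the parallel construction (the maximum of n independent stages with distribution function F(t)^(1/n)), and it converts intercompletion times into parallel processing times by cumulative summation.

```python
import math
import random
from itertools import accumulate


def F(t, rate=1.0):
    """Target overall response-time distribution (exponential, an assumption)."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0


def F_inverse(u, rate=1.0):
    """Inverse of F, used for inverse-transform sampling (0 <= u < 1)."""
    return -math.log(1.0 - u) / rate


def sample_serial_exhaustive(n, rng):
    """Sum of n perfectly correlated stages, each equal to T/n."""
    t = F_inverse(rng.random())
    return sum(t / n for _ in range(n))


def sample_parallel_exhaustive(n, rng):
    """Maximum of n independent stages, each with distribution F(t)**(1/n).

    If U is uniform on [0, 1), then F_inverse(U**n) has distribution
    function F(t)**(1/n).
    """
    return max(F_inverse(rng.random() ** n) for _ in range(n))


def empirical_cdf(samples, t):
    """Proportion of samples that do not exceed t."""
    return sum(x <= t for x in samples) / len(samples)


def parallel_times_from_intercompletions(intercompletion_times):
    """Theorem 12.2 construction: processing times are the cumulative sums."""
    return list(accumulate(intercompletion_times))


if __name__ == "__main__":
    rng = random.Random(0)
    n, trials = 4, 20000
    serial = [sample_serial_exhaustive(n, rng) for _ in range(trials)]
    parallel = [sample_parallel_exhaustive(n, rng) for _ in range(trials)]
    for t in (0.5, 1.0, 2.0):
        print(t, F(t), empirical_cdf(serial, t), empirical_cdf(parallel, t))
```

Both empirical distribution functions should track F closely, which is the sense in which overall times alone cannot separate the two architectures.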
Theorem 12.3 (Vorberg, 1977). For any joint distribution of n intercompletion times and order of stages, there exist variable-order serial and parallel, exhaustive search processes whose intercompletion times and order of stages have the given distribution.
Proof. Let $H(t_1, t_2, \ldots, t_n, p)$ be the distribution of the given process. To mimic it by a variable-order, serial process, let
$$P(p) = \lim_{t_1, \ldots, t_n \to \infty} H(t_1, t_2, \ldots, t_n, p)$$
and
$$F(t_1, t_2, \ldots, t_n \mid p) = H(t_1, t_2, \ldots, t_n, p)/P(p).$$
This yields the given process since H = FP. To mimic it by a parallel process, let $\tau_1 \le \tau_2 \le \cdots \le \tau_n$, and define the joint distribution for the parallel process with the temporal order described by p as follows:
$$G(\tau_1, \tau_2, \ldots, \tau_n, p) = H(\tau_1, \tau_2 - \tau_1, \ldots, \tau_n - \tau_{n-1}, p).$$
This obviously yields the desired mimicking. ∎
12.2.5 Mimicking Intercompletion Times and Order by Physically Realizable Searches
Vorberg's Theorem 12.3 is simply too flexible to be of interest. It does not attempt to distinguish systems that can be realized by physical processes from those in which the temporal dependencies are not physically possible because an event is modified by something yet to happen. Consider, for example, serial systems of three stages. The distribution of times in the first stage to occur can depend upon the fact that it is first and on the order in which the other two will occur, which is assumed known in advance, but it must not depend upon the actual (inter)completion times $t_2$ and $t_3$. Similarly, the distribution of $T_2$ cannot depend upon the value assumed by $T_3$, but it can depend upon the value of $T_1$, which has already been selected and so can be known. When we turn to parallel systems, the problem of dependencies is a great deal more difficult to formulate. At the onset, there really cannot be any conditioning, since everything is in the future, but during the course of information processing in the several stages it is entirely possible for interactions to develop. For example, the completion of the processing within any one stage can alter the hazard functions for the other stages, thereby introducing a dependence among the processes. This is exactly what was postulated in the capacity allocation models of Sections 11.3.3 and 11.3.4. Far and away the easiest assumption to make about parallel processes is that between successive completions, the stages are independent stochastic processes. Such processes can be called locally stage independent. Townsend, who used the term "stage" to refer to intercompletion times, called this "within-stage independence."
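A locally stage-independent parallel process is easy to simulate in the simplest case. The Python sketch below assumes two stages with exponential (constant hazard) processing: before the first completion the stages race independently, and once one stage finishes the other is governed by a new hazard. The exponential form, the rate values, and the function names are assumptions made for illustration, not part of the general definition.

```python
import random


def simulate_two_stage_parallel(rate_first, rate_second, rng):
    """Simulate one trial of a two-stage, locally stage-independent process.

    rate_first[i]  : hazard of stage i + 1 before any stage has completed
    rate_second[i] : hazard of stage i + 1 once the other stage has completed
    Returns (order, (t1, t2)): the order of completion as stage numbers and
    the two intercompletion times.
    """
    # Independent race between the two stages up to the first completion.
    first_draws = [rng.expovariate(r) for r in rate_first]
    winner = 0 if first_draws[0] <= first_draws[1] else 1
    loser = 1 - winner
    t1 = first_draws[winner]
    # The remaining stage is now governed by a new hazard function.
    t2 = rng.expovariate(rate_second[loser])
    return (winner + 1, loser + 1), (t1, t2)


def estimate(rate_first, rate_second, n_trials, seed=0):
    """Estimate P(stage 1 completes first) and the mean first intercompletion time."""
    rng = random.Random(seed)
    stage1_first = 0
    total_t1 = 0.0
    for _ in range(n_trials):
        order, (t1, _t2) = simulate_two_stage_parallel(rate_first, rate_second, rng)
        stage1_first += order[0] == 1
        total_t1 += t1
    return stage1_first / n_trials, total_t1 / n_trials


if __name__ == "__main__":
    prop, mean_t1 = estimate((3.0, 1.0), (2.0, 2.0), n_trials=20000)
    print("P(stage 1 first):", prop)
    print("mean first intercompletion time:", mean_t1)
```

For exponential stages with initial rates a and b, stage 1 finishes first with probability a/(a + b) and the first intercompletion time is exponential with rate a + b. A serial process that selects its order with that probability and uses that first-position density would therefore produce the same first intercompletion time and order.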
For initial simplicity, let us begin by formulating the two-stage processes, after which we generalize this to n stages. A variable order, two-stage serial process is said to be realizable if there are unconditional density functions $f_{ij}$, where i denotes the stage and j the position in the temporal ordering, and a probability P such that the distribution of the ordered pair of intercompletion times $(t_1, t_2)$ is given by
$$P f_{11}(t_1) f_{22}(t_2)$$
when stage 1 precedes stage 2 and by
$$(1 - P) f_{21}(t_1) f_{12}(t_2)$$
when stage 2 precedes stage 1. For a locally stage-independent parallel system, which is of course realizable, there are four density functions $g_{ij}$, where again i denotes the stage and j the position in the temporal ordering. The distribution of the ordered pair of intercompletion times $(t_1, t_2)$ is given by
$$g_{11}(t_1)[1 - G_{21}(t_1)]\, g_{22}(t_2 \mid t_1)$$
when stage 1 is completed before stage 2 and by
$$g_{21}(t_1)[1 - G_{11}(t_1)]\, g_{12}(t_2 \mid t_1)$$
when stage 2 is completed before stage 1. We may think of the two intercompletion times as being generated separately. The first is described by the term $g_{i1}(t_1)[1 - G_{i'1}(t_1)]$, which is the density that stage i is completed at time $t_1$ and the complementary stage i′ fails to be completed in that time. This is on the assumption that both processes unfold independently. Once stage i is completed, however, the stage i′ process comes to be governed by a new conditional density function, namely $g_{i'2}(t_2 \mid t_1)$.
To generalize these processes to n stages, proceed as follows. In the serial case, the stages are dealt with in some order that can be thought of as a permutation p of {1, 2, ..., n}. The process terminates either when all n stages are completed, which is exhaustive search, or when a critical item is reached in some stage k, which is self-terminating search. For present purposes, it is quite immaterial which search procedure is used. Let P(p) be the probability that order p is used. We suppose that the processing of a particular stage depends upon three things: the stage itself, the order in which it is done, and the preceding intercompletion times.
We could also assume that it depends upon the order of processing yet to come, but for simplicity I do not. So the relevant densities are of the form $f_{ij}(t_j \mid t_1, \ldots, t_{j-1})$, where i is the stage being processed, j the rank order, and the conditioning is on the preceding intercompletion times. In that case, the joint distribution of the first k intercompletion times $t_1, t_2, \ldots, t_k$ and order is given by
$$P(p) \prod_{j=1}^{k} f_{p^{-1}(j),\,j}(t_j \mid t_1, \ldots, t_{j-1}). \qquad (12.5)$$
For the locally stage-independent parallel process, assume that as each stage is completed new hazard functions are established on each of the remaining stages and the processes are carried out independently until the next completion, at which point new hazard functions govern the remaining stages, and so on. So, once again, we assume that the density function of a stage depends upon that stage, the number but not identity of the stages already completed, and the actual intercompletion times. Suppose p is a permutation of {1, 2, ..., n}, with p(i) the position of stage i in the order of completion. Now suppose we are in the situation where j − 1 stages have been completed and the intercompletion times were $t_1, t_2, \ldots, t_{j-1}$. Then each of the remaining stages enters a race to be the jth completion. We suppose that each process is now governed by a conditional density function $g_{ij}(t \mid t_1, t_2, \ldots, t_{j-1})$, where i denotes the stage and j the order of the intercompletion time under consideration. So, if the jth intercompletion time is $t_j$ and the order of intercompletion times being considered is p, then the conditional density function is
$$g_{p^{-1}(j),\,j}(t_j \mid t_1, \ldots, t_{j-1}) \prod_{i:\,p(i) > j} \left[1 - G_{ij}(t_j \mid t_1, \ldots, t_{j-1})\right]. \qquad (12.6)$$
Theorem 12.4 (Townsend, 1976a). Any locally stage-independent, parallel process described by Eq. 12.6 can be mimicked by a realizable serial one described by Eq. 12.5. The converse is true only under special conditions.
Proof. Let the parallel process be given. For any p, let P(p) be the probability obtained by integrating the factors given by Eq. 12.6 from 0 to ∞ and then forming the product over j. Let $f_{p^{-1}(j),\,j}(t_j \mid t_1, t_2, \ldots, t_{j-1})$ be defined to be Eq. 12.6 normalized by its integral. This serial process clearly mimics the parallel one. To mimic a serial process that terminates in k steps by a parallel one, it is necessary and sufficient that the intercompletion time distributions agree for each order p. Thus, by Eqs. 12.5 and 12.6,
Let A be a fixed subset of n − j + 1 elements of {1, 2, ..., n}. For each i in A, define the subset of permutations
Note that the number of elements in $R_i$ is independent of i; denote it by K. Define $P(R_i) = \sum_{p \in R_i} P(p)/K$. From Eq. 12.7 (but suppressing the notation for $t_1, t_2, \ldots, t_{j-1}$ and writing $t_j = t$),
Summing over A,
Integrating and taking into account the boundary condition $F_{ij}(\infty) = G_{ij}(\infty) = 1$,
Dividing Eq. 12.8 by Eq. 12.9,
Thus, given k, P(p), and the $f_{ij}$, Eq. 12.10 determines the hazard function of the $g_{ij}$ that mimics it. This is a hazard function of a probability density function if and only if both $h_{ij} \ge 0$ and
$$\int_0^\infty h_{ij}(t)\,dt = \infty.$$
There is absolutely no assurance that the second of these two conditions is satisfied. For example, if the $f_{ij}$ are assumed to be exponential, then the condition can be shown to hold if and only if the intensity parameters of the several exponentials are identical, in which case the $h_{ij}$ are constant, corresponding to exponential $g_{ij}$. Townsend (1976b) gave two sufficient conditions, neither implying the other, for this condition on the hazard function to be met. One is
The other is, for some constants a, b, and c,
Ross and Anderson carried out a memory retrieval study, which was motivated by the assumption in Anderson's (1976) ACT theory of parallel retrieval processing, to see if it was possible for such a parallel structure to mimic a serial one. The test was of Eq. 12.11, and it was judged to be sustained in these data.
12.2.6 Subconclusion
The upshot of these exercises is very simple. As long as the behavior of the stages is unspecified, it is quite impossible to use the observed reaction-time distributions to decide whether we are dealing with serial or parallel (or other) systems. This is true not only when the overall response-time distribution is given (Theorem 12.1), but even when it is also assumed that the intercompletion times are known (Theorem 12.2). Adding finally the information of the precise order in which the stages are completed fails to narrow things down if we ignore the question of physical realizability (Theorem 12.3); however, adding physical realizability does mean there are serial systems that cannot be mimicked by parallel ones, although the other mimicking is always possible. Since we do not usually have either order or intercompletion time information, the message is clear: the decision about the structure of the mental network will have to be reached using information that is richer than can be obtained from a single response-time experiment. We describe in Sections 12.3 and 12.4 two ideas about how to gain and use additional information for this purpose.
12.3 SIMULTANEOUS ACCOUNTS OF SEVERAL EXPERIMENTS
To overcome our inability to infer the order of processing from the data of a single experiment, two major ideas have been pursued. The one, the topic of the present section, is to run several different designs using the same stimuli and subjects and to assume that the same processing stages, network, and optimal search strategy underlie the observed behavior. It is then possible to show that certain non-equivalences hold among models and so, in principle, it is possible to select among them. The other idea, the topic of Section 12.4, is to remain within a single design, but to introduce manipulations of the stimuli and/or the procedure that affect some but not all of the durations of processing stages. The pattern of observed effects on response times sometimes sheds light upon the processing network and/or search strategy.
12.3.1 A Serial or Parallel, Exhaustive or Self-Terminating, Testing Paradigm
The ideas discussed in this subsection are found in Snodgrass and Townsend (1980), Townsend (1976b), Townsend and Ashby (1983), and Townsend
and Snodgrass (1974); they simplify an earlier proposal of the same general type first described by Snodgrass (1972). The basic paradigm is to have a target, A, followed by the presentation of two alternatives, one of which may be the target and the other, of similar type, we may call B. In procedure I, either AB or BA is presented and the subject is to report whether the target is on the right or the left. In procedures II and III, there are four different presentations, each equally likely: AA, AB, BA, and BB. In II the subject is to distinguish AA from the other three presentations; that is, one response is made if no B is present and the other if at least one B is present. In III, the distinction is between BB and the rest; that is, one response if no A is present and the other if at least one A is present. The goal of the design is to force distinctions between serial versus parallel and between self-terminating versus exhaustive search procedures based upon the assumption that any A-A comparison and any A-B comparison is like any other of the same type carried out in the same temporal order and in the same physical location. For initial simplicity, let us suppose that neither the order nor the location matters, the only relevant factor being whether the comparison is the same or different. Let $X_S$ and $X_D$ denote the decision latencies (random variables) for same and different decisions, respectively. We may then write out the total decision time for each case, which is assumed to differ from the observed reaction time only by a residual time independent of the decision. This analysis is shown in Table 12.1, where in the serial self-terminating column the parameter p denotes the probability that the search is carried out from left to right. Let us examine several cases so it becomes clear how the table is constructed. For exhaustive searches, the random variable $X_S$ arises when an A appears and $X_D$ when a B appears.
In the serial case, they are added; in the parallel case, the slower one determines the overall time, and so the maximum of the two is calculated. Nothing is assumed about independence at this point; later we will assume the parallel processes to be locally stage independent.

TABLE 12.1. Type of search (see text for explanation)

| Experimental condition | Presentation | Exhaustive, serial | Exhaustive, parallel | Self-terminating, serial | Self-terminating, parallel |
|---|---|---|---|---|---|
| I | AB | $X_S + X_D$ | $\max(X_S, X_D)$ | $pX_S + (1-p)X_D$ | $\min(X_S, X_D)$ |
| I | BA | $X_D + X_S$ | $\max(X_D, X_S)$ | $pX_D + (1-p)X_S$ | $\min(X_S, X_D)$ |
| II | AA | $X_S + X_S$ | $\max(X_S, X_S)$ | $X_S + X_S$ | $\max(X_S, X_S)$ |
| II | AB | $X_S + X_D$ | $\max(X_S, X_D)$ | $pX_S + X_D$ | $X_D$ |
| II | BA | $X_D + X_S$ | $\max(X_D, X_S)$ | $(1-p)X_S + X_D$ | $X_D$ |
| II | BB | $X_D + X_D$ | $\max(X_D, X_D)$ | $X_D$ | $\min(X_D, X_D)$ |
| III | AA | $X_S + X_S$ | $\max(X_S, X_S)$ | $X_S$ | $\min(X_S, X_S)$ |
| III | AB | $X_S + X_D$ | $\max(X_S, X_D)$ | $(1-p)X_D + X_S$ | $X_S$ |
| III | BA | $X_D + X_S$ | $\max(X_D, X_S)$ | $pX_D + X_S$ | $X_S$ |
| III | BB | $X_D + X_D$ | $\max(X_D, X_D)$ | $X_D + X_D$ | $\max(X_D, X_D)$ |
The self-terminating search is more complex. For the serial structure, it is necessary to consider the order in which comparisons are made. For procedure I, only one observation need be made because the other must be the other stimulus. So, for the presentation AB, with probability p the A is examined and takes time $X_S$ and with probability $1 - p$ the B is examined and takes time $X_D$, and the response is determined. The overall random variable is thus written $pX_S + (1 - p)X_D$, a shorthand for this probability mixture. In the parallel case, the process terminating first determines the response, and so the minimum of the two times is the relevant time variable. For procedure II, where the subject is attempting to decide if a B is present, the AA presentation must be examined exhaustively, and the others need not be. For example, consider AB. In the serial analysis, either A is examined first, with probability p, and then B for a total time of $X_S + X_D$, or B is examined first, with probability $1 - p$, and the response is determined with time $X_D$. So the overall random variable is $p(X_S + X_D) + (1 - p)X_D = pX_S + X_D$. The parallel analysis is simpler; the subject must wait until the B process is completed, which takes time $X_D$; however, keep in mind that this examination is carried out while A is being examined, and so the distribution of $X_D$ may be altered when A is completed. The other analyses are similar.
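The construction just described for the AB presentation of procedure II can be sketched as a small Monte Carlo simulation. This is only an illustration under assumptions that are not in the text: the comparison times are taken to be independent exponentials, the means of 100 and 150 msec and the function name `simulate_ab_means` are hypothetical, and the possible change in the distribution of $X_D$ once A completes is ignored.

```python
import random
import statistics


def simulate_ab_means(mean_same, mean_diff, p_left_first, n_trials=50_000, seed=0):
    """Mean decision time for presentation AB in procedure II under four models.

    Assumption (not from the text): X_S and X_D are independent exponential
    random variables with the given means, in msec.
    """
    rng = random.Random(seed)
    times = {
        "serial exhaustive": [],
        "parallel exhaustive": [],
        "serial self-terminating": [],
        "parallel self-terminating": [],
    }
    for _ in range(n_trials):
        x_s = rng.expovariate(1.0 / mean_same)  # comparison time for the A item
        x_d = rng.expovariate(1.0 / mean_diff)  # comparison time for the B item
        times["serial exhaustive"].append(x_s + x_d)
        times["parallel exhaustive"].append(max(x_s, x_d))
        # Serial self-terminating: with probability p the A (on the left) is
        # examined first, so both comparisons are needed; otherwise B is found
        # first and the search stops.
        if rng.random() < p_left_first:
            times["serial self-terminating"].append(x_s + x_d)
        else:
            times["serial self-terminating"].append(x_d)
        # Parallel self-terminating: the response waits only for the B process.
        times["parallel self-terminating"].append(x_d)
    return {model: statistics.fmean(values) for model, values in times.items()}


means = simulate_ab_means(mean_same=100.0, mean_diff=150.0, p_left_first=0.5)
for model, value in means.items():
    print(f"{model:28s} {value:7.1f} msec")
```

Under these assumptions the serial self-terminating mean should approach $pE(X_S) + E(X_D)$, which is the expectation of the table entry $pX_S + X_D$, and the parallel self-terminating mean should approach $E(X_D)$.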
12.3.2 How Distinct are the Predictions?
The perfect parallelism of the serial and parallel columns of the exhaustive search makes clear that these two cannot be discriminated by this paradigm. We now establish that the other pairs can be distinguished at the level of mean response times. A major distinction between exhaustive and self-terminating models lies in comparing IIAA with IIIAA. In both of the exhaustive models these times are identical and in both of the self-terminating models IIAA is slower than IIIAA. The remaining question concerns the serial-parallel distinction within the self-terminating class.
Theorem 12.5 (Townsend, 1976c; Townsend & Ashby, 1983). Assuming that $X_S$ and $X_D$ have distinct continuous distribution functions, it is impossible for serial and locally stage-independent parallel models to mimic each other at the level of mean times simultaneously for procedures I, II, and III.
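To get a feel for the theorem, it may help to work one special case by hand. The following is an added illustration, not part of the original argument: it assumes that $X_S$ and $X_D$ are exponential with rates $a$ and $b$, $a \neq b$, that the two comparisons within a presentation are independent (primes mark independent copies), and it uses the parallel self-terminating entries of Table 12.1.

```latex
\begin{align*}
E(T_{I,AB}) + E(T_{I,BA}) &= 2E[\min(X_S, X_D)] = \frac{2}{a+b},\\
E(T_{II,BB}) + E(T_{III,AA}) &= E[\min(X_D, X_D')] + E[\min(X_S, X_S')]
  = \frac{1}{2b} + \frac{1}{2a},\\
\left(\frac{1}{2a} + \frac{1}{2b}\right) - \frac{2}{a+b}
  &= \frac{(a+b)^2 - 4ab}{2ab(a+b)} = \frac{(a-b)^2}{2ab(a+b)} > 0 .
\end{align*}
```

In the serial self-terminating model both sums equal $E(X_S) + E(X_D) = 1/a + 1/b$, so in this exponential case the parallel model cannot reproduce the equality of the two sums unless $a = b$.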
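Proof. Observe that for the serial case, the mean time for IAB added to the mean time for IBA is
$$pE(X_S) + (1-p)E(X_D) + pE(X_D) + (1-p)E(X_S) = E(X_S) + E(X_D).$$
Similarly, adding IIBB to IIIAA also yields $E(X_S) + E(X_D)$. For the parallel model, the two sums are, respectively,
$$2E[\min(X_S, X_D)] \tag{12.13}$$
and
$$E[\min(X_D, X_D)] + E[\min(X_S, X_S)]. \tag{12.14}$$
It is, thus, sufficient to show that the latter two times are not equal when we assume independent realizations of the random variables within the different comparisons. Recall that when X and Y are locally independent random variables,
$$E[\min(X, Y)] = \int_0^\infty \bar{F}_X(t)\,\bar{F}_Y(t)\,dt,$$
where $\bar{F} = 1 - F$ and F is a distribution function. So for Eq. 12.13 to equal Eq. 12.14, we must have
$$2\int_0^\infty \bar{F}_S(t)\,\bar{F}_D(t)\,dt = \int_0^\infty \bar{F}_D(t)^2\,dt + \int_0^\infty \bar{F}_S(t)^2\,dt.$$
Rewriting,
$$\int_0^\infty [\bar{F}_S(t) - \bar{F}_D(t)]^2\,dt = \int_0^\infty [F_S(t) - F_D(t)]^2\,dt = 0.$$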
But this is impossible because $[F_S(t) - F_D(t)]^2 \geq 0$, $F_S \neq F_D$, and the distribution functions are continuous. So the two equations must not be equal. This suggests that, in principle, data can be used to choose among models.
12.3.3 An Experiment
Following a technique first used by Snodgrass (1972), which will be discussed in Section 12.3.4, Townsend and Snodgrass (1974) used stimuli that can be thought of as arising from 2 x 2 matrices of the letters Q, R, T, and Z. The subjects were to focus only on the columns. A column is a target, or type A, item if both entries are the same and a non-target, or type B, item if the two entries differ. So, for example, a matrix in which each column repeats its letter is one version of an AA presentation, whereas a matrix whose left column contains two different letters and whose right column repeats its letter is one version of a BA presentation. The procedure involved the visual presentation of the top row for 300 msec, followed by a 400-msec blank, and then the bottom row for 300 msec. Response time was measured from the onset of the bottom row. Thus, the task was to compare the second presentation, column by column, with the memory of the first presentation. Depending upon the condition, the response is affected by the location and existence or not of a match. Data were collected from four subjects, and their mean times (and the sample sizes underlying them) are shown in Table 12.2.

TABLE 12.2. Mean data, averaged over four subjects, from Townsend and Snodgrass (1974)

| Condition | Presentation | No. of trials | MRT in msec |
|---|---|---|---|
| I | AB | 720 | 381 |
| I | BA | 720 | 352 |
| II | AA | 2160 | 347 |
| II | AB | 720 | 400 |
| II | BA | 770 | 394 |
| II | BB | 720 | 367 |
| III | AA | 720 | 352 |
| III | AB | 770 | 437 |
| III | BA | 720 | 440 |
| III | BB | 2160 | 464 |

The most obvious fact about these data is that the mean times for IIAA and IIIAA, 347 and 352 msec, are virtually identical, which is consistent with an exhaustive model but not with a self-terminating one. However, that is about all that is consistent with the exhaustive model. For example, the model says that the AB and BA presentations should all take the same time, but the values of 381, 356, 400, 394, 437, and 440 are far too variable to be explained by sampling variability. If the population standard deviation is 30% of the mean, then with a sample size of 720 the standard error of the mean is only about 4.5 msec. One possible difficulty is the assumption that only a single random variable is associated with each type of stimulus. Perhaps their location, right or left, and/or the order in which the processing occurs makes a difference. Of course, admitting this possibility increases the number of mean latencies from two to eight, which together with the order of search probability p means nine parameters to account for ten pieces of data. Even so, the model is still not satisfactory since it still requires that the four times of procedure II should equal the corresponding ones of procedure III, and 367 and 464 cannot possibly arise from the same distribution based on samples of 720 and 2160, respectively.
Townsend and Ashby (1983, Chapter 13) proceeded as follows. To get around the fact that IIAA and IIIAA are nearly identical, which is quite inconsistent with the self-terminating model, they suggested that matches in both columns are dealt with differently from the other cases. They phrased it that such stimuli are treated as a gestalt. To my taste this is unacceptably ad hoc, although it certainly has precedent in the literature (Section 11.4). It adds one more parameter, which leaves no degrees of freedom for the serial case and just one for the parallel one. They worked out the equations for the means on the assumption of exponential random variables and fitted the parallel, self-terminating model to the data and found that it cannot be rejected by a $\chi^2$ test. I do not find it a very convincing test of the models. The most reasonable conclusion for this one set of data is that none of the four models really accounts for what is going on.
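The sampling-variability argument above rests on a one-line calculation, sketched here for concreteness. The representative mean of 400 msec is an assumption chosen for illustration (the observed means range from roughly 350 to 460 msec), and the helper name is hypothetical; the six AB and BA values are those quoted in the text.

```python
import math


def standard_error_of_mean(mean_ms, coefficient_of_variation, n_trials):
    """Standard error of a sample mean when the population standard
    deviation is a fixed fraction of the mean."""
    return coefficient_of_variation * mean_ms / math.sqrt(n_trials)


# Assumed representative mean of 400 msec; 30% and n = 720 are from the text.
se = standard_error_of_mean(mean_ms=400.0, coefficient_of_variation=0.30, n_trials=720)

# AB and BA means as quoted in the text for procedures I, II, and III.
ab_ba_means = [381, 356, 400, 394, 437, 440]
spread = max(ab_ba_means) - min(ab_ba_means)

print(f"standard error of the mean: {se:.2f} msec")
print(f"range of AB/BA means: {spread} msec, about {spread / se:.0f} standard errors")
```

A range of this size, measured in standard errors, is what the text means by "far too variable to be explained by sampling variability."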
TABLE 12.3. Snodgrass (1972) experimental conditions^a
^a Table 1 of Snodgrass (1972) is misleading in that the last two Differences of G and the last two Same Cases of H are omitted, although they enter into the calculations to follow.
12.3.4 A Variant on the Paradigm*
Prior to the developments just outlined, Snodgrass (1972) proposed essentially the same idea but in a somewhat more complex procedure. She used 2 x 2 matrices and subsets of them as stimuli with the top row presented for 2.5 sec, a 2.5 sec wait, and then the bottom row. Subjects had to consider all possible comparisons between items in the top row and the bottom row, not just the column pairs. So in the full 2 x 2 case, there were four comparisons. The stimuli and responses are shown in Table 12.3. The increased complexity of the comparisons forces some increase in the complexity of analysis and requires added assumptions. Snodgrass assumed the following rule of search. The subject randomly chooses one of the presented elements from the bottom row. Memory is then searched for either the only element in it (Cases A, B, and E) or in the general case a column is examined at random. If the pair exhibits a match, delete that pair. Depending upon the condition and the model, that either determines the response or a comparison is made of what remains. If, however, the first pair is not a match, then the subject examines what remains with priority being given to the diagonal, if it exists, involving the same lower element and the other upper one. Following this, if necessary, the other diagonal is examined. It should be noted that this is but one of many possible search strategies, and it is unclear why it is to be preferred.
Let us work out a few cases of serial self-terminating search in order to see how the rule is applied. Let $X_S$ denote the time required for a match and $X_D$ that for a non-matching pair. For condition A, the same presentation takes $X_S$ whereas the different presentation takes $X_D$. For condition B, either presentation takes $X_S$ or $X_D$ according to the order of the search, and so on average it is $(X_S + X_D)/2$, assuming equally likely presentations. Conditions C, D, E, and F are all treated similarly. For condition G, same response, if [display] is presented, either [display] or [display] is first checked and eliminated. The remaining comparison determines the response, and so takes $X_S + X_S = 2X_S$. If [display] is presented, then either vertical comparison takes $X_D$, which then is followed by both diagonal comparisons for a total of $X_D + 2X_S$. Averaging $2X_S$ and $X_D + 2X_S$ yields $(4X_S + X_D)/2$. For the four different responses we have the display shown in Table 12.4. Averaging yields $(2X_S + 7X_D)/4$. The analysis of the parallel models is nearly the same, except instead of adding one either takes the maximum of the random variables when all must be completed or the minimum when the first to be completed yields the response. All of this is summarized in Table 12.4. From the table one can deduce quite a number of inequalities among the means on the assumption that $E(X_S) < E(X_D)$. For example, in the serial self-terminating case,
Without the additional assumptions, it is impossible to conclude how $E(T_E)$ relates to $E(T_G)$ and it to $E(T_H)$. The inequalities for the two exhaustive searches are easy to derive. These results together with Snodgrass' data are shown in Figure 12.1. There is a slight reason to favor the serial self-terminating model, but note that this conclusion rests entirely on the comparison of $E(T_A)$ and $E(T_B)$. Indeed, if we average the data for B, C, and D, it is slightly less than that of A, giving a slight edge for the parallel self-terminating model. Significance tests are needed to tell if there is really an empirical difference.
* Dr. Snodgrass was helpful in making clear to me exactly what assumptions she used to arrive at the predictions discussed in this section.
TABLE 12.4. Random variables arising in the different conditions and for each of the models on the assumption that a same comparison is governed by $X_S$ and a difference by $X_D$

| Condition | Response | Serial exhaustive | Serial self-terminating | Parallel exhaustive | Parallel self-terminating |
|---|---|---|---|---|---|
| A | Same | $X_S$ | $X_S$ | $X_S$ | $X_S$ |
| A | Diff. | $X_D$ | $X_D$ | $X_D$ | $X_D$ |
| B, C, D | Same | $X_S + X_D$ | $(X_S + X_D)/2$ | $\max(X_S, X_D)$ | $\min(X_S, X_D)$ |
| B, C, D | Diff. | $X_S + X_D$ | $X_S + X_D/2$ | $\max(X_S, X_D)$ | $X_S$ |
| E, F | | $2X_D$ | $2X_D$ | $\max(X_D, X_D)$ | $\max(X_D, X_D)$ |
| G | Same | $2X_S + 2X_D$ | $(4X_S + X_D)/2$ | $\max(X_S, X_S, X_D, X_D)$ | $\max(X_S, X_S)$ |
| G | Diff. | $X_S + 3X_D$ | $(2X_S + 7X_D)/4$ | $\max(X_S, X_D, X_D, X_D)$ | $\max(X_S, X_D)$ |
| H | Same | $X_S + 3X_D$ | $(2X_S + 3X_D)/2$ | $\max(X_S, X_D, X_D, X_D)$ | $X_S$ |
| H | Diff. | $4X_D$ | $4X_D$ | $\max(X_D, X_D, X_D, X_D)$ | $\max(X_D, X_D, X_D, X_D)$ |
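The averaging used to obtain the serial self-terminating entries can be checked mechanically. The sketch below is an added illustration, not Snodgrass's own procedure: each search path is represented by a hypothetical pair (number of same comparisons, number of different comparisons), paths are assumed equally likely, and only the two cases worked out in the text (condition B and the same response of condition G) are reproduced.

```python
from fractions import Fraction


def average_paths(paths):
    """Average equally likely search paths.

    Each path is a pair (n_same, n_diff) standing for n_same * X_S + n_diff * X_D.
    Returns the averaged coefficients of X_S and X_D as exact fractions.
    """
    n_paths = len(paths)
    coef_same = Fraction(sum(n_same for n_same, _ in paths), n_paths)
    coef_diff = Fraction(sum(n_diff for _, n_diff in paths), n_paths)
    return coef_same, coef_diff


# Condition B: the search takes X_S or X_D, depending on its order.
condition_b = average_paths([(1, 0), (0, 1)])

# Condition G, same response: 2*X_S for one presentation, X_D + 2*X_S for the other.
condition_g_same = average_paths([(2, 0), (2, 1)])

print("B:", condition_b)            # coefficients of (X_S + X_D)/2
print("G same:", condition_g_same)  # coefficients of (4*X_S + X_D)/2
```

The coefficients (2, 1/2) for condition G are the same quantity as the table entry $(4X_S + X_D)/2$, written term by term.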
FIG. 12.1 Comparison of the several models described in the text with the data of Snodgrass (1972). Each of five subjects contributed 720 observations for each of conditions A, B, C, E, F, and G, and 1080 for conditions D and H. [Figure 6 of Snodgrass (1972); copyright 1972; reprinted by permission.]
Additional data of the same type are reported by Snodgrass and Townsend (1980), and they continue to favor serial self-termination.
12.4 SELECTIVE MANIPULATION OF STAGES
If the processing is in fact carried out in distinct stages, another way to try to infer the temporal network among the stages is to affect differentially the latencies of the stages. There are at least four different attempts to make use of this general idea. The oldest is Donders' method of subtraction or, as Sternberg (1969b) called it, pure insertion (see Section 6.2.2). The goal is to find manipulations that bypass or eliminate one or more of the stages without affecting the remaining ones. Assuming a serial process, as Donders did, then the distribution of times associated with the subtracted stage is simply the deconvolution of the distributions with it in and with it out. I discussed some of the drawbacks of this idea in Section 6.2.2. The next oldest method, which in fact is quite recent, was first proposed to find if there is some serial link in the network. This is the single-channel hypothesis discussed in Section 5.4, and its study is based upon observing how the response to the second of two signals presented in rapid succession is affected by the time interval between them. Next came a modification of Donders in which one does not try to bypass a stage but rather tries to affect it selectively and alter its latency. The first discussion of this idea, known as additive factors methodology and due to S. Sternberg, was limited to serial systems, as was Donders' work. We encountered it in Section 3.3.4, and it is described in some detail in Section 12.4.1; an illustrative application is given in Section 12.4.2. Next in Section 12.4.3 the question is considered of when the means of an additive search can be mimicked by the means of a parallel, independent one. Criticisms of the method of additive factors are taken up in Section 12.4.4. Very recently a scheme has been proposed that largely subsumes the earlier ones and is able to uncover the nature of the network of stages. Its major limitation at present is that it is entirely deterministic, and it has not yet been worked out for latencies that are random variables. It is described in the final three subsections.
12.4.1 The Additive Factors Method
Sternberg (1966, 1969a, b) revived and revised Donders' discredited, although still widely used, method of subtraction (Section 6.2.2). Recall that Donders assumed both a strictly serial system and the existence of procedures, such as comparing simple and choice response times, that he believed resulted in the use—or insertion—of an additional stage of mental processing while leaving the other stages unchanged. Sternberg pointed out that a much weaker hypothesis was often equally useful, namely, that certain experimental manipulations may affect—not eliminate—some stages but not others, and these effects will show up as revealing patterns in the overall response times. For example, suppose in the standard memory search design the test stimulus is purposely degraded in quality, then the perceptual processing that transforms the stimulus into a form that can be compared with the items in memory should be slowed down. As this processing is thought to be a single stage of processing that presumably is prior to any memory search, we should find the entire plot of time to respond versus number of memory items displaced upward, but the slope (search time) should not be affected. Such data are shown in Figure 12.2. Note that if the experimental manipulation were to affect each stage of the comparison process, then we would expect a systematic change in the slope. For example, that should happen in a procedure that makes it impossible for the subject to hold the memorized list in active, short-term memory. To effect that, Sternberg had subjects memorize a list of digits, for some subjects a list of 1, others of 3, and still others 5 items. On each trial a new list of 7 letters was presented. On a random third of the trials subjects were required to recall the list of letters—a manipulation that was designed to keep short-term memory fully active by rehearsing the letters. 
On the remaining trials, a test digit was presented and the subject was required to decide whether it was in the memorized list of digits. Presumably these memorized digits had to be withdrawn from long-term memory and placed into active memory, which should be slower for each item than just searching short-term memory. The results are shown in Figure 12.3, and we see that both the slope and the intercept were affected.
FIG. 12.2 Effect of degrading stimulus quality in a standard memory search design. The data are based on averages from 12 subjects. [Figure 4 of Sternberg (1967b); copyright 1967; reprinted by permission.]
The basic mathematical idea underlying what we have just illustrated is this. Suppose there are n serial stages of search and m other serial stages whose latencies are, respectively, $T_1, T_2, \ldots, T_n, S_1, S_2, \ldots, S_m$, and so the overall response time is
$$T = \sum_{i=1}^{n} T_i + \sum_{j=1}^{m} S_j.$$
Now, if we assume that these latencies are independent random variables, so each rth-order cumulant (Section 1.4.5) is the sum of the component rth-order cumulants, we will be led to simple results. For example, working just with the means (in which case we need not assume independence) and assuming that a manipulation affects a single stage, for example, changing $S_1$ into $S_1'$, we see the following change in mean times:
$$E(T') - E(T) = E(S_1') - E(S_1).$$
In contrast, a manipulation that affects all of the times of a group, say $T_1, \ldots, T_n$, then yields
$$E(T') - E(T) = \sum_{i=1}^{n} [E(T_i') - E(T_i)].$$
FIG. 12.3 Effect of keeping short-term memory occupied with another activity in a standard memory search design. The data were averaged over 12 subjects, but the number of observations per subject was not stated. [Figure 16 of Sternberg (1969b); copyright 1969; reprinted by permission.]
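Now we may formulate the major idea of the method of additive factors. Suppose we have two independent experimental manipulations. If the first affects only stage 1, changing its latency from $S_1$ to $S_1'$, and the second one affects only stage 2, changing its latency from $S_2$ to $S_2'$, then, writing $T$ for the unmanipulated response time and $T^{(1)}$ and $T^{(2)}$ for the response times under each manipulation alone, the two changes can be described by
$$E(T^{(1)}) - E(T) = E(S_1') - E(S_1), \qquad E(T^{(2)}) - E(T) = E(S_2') - E(S_2).$$
Now consider applying both manipulations simultaneously. Since they affect distinct stages, we should find
$$E(T^{(1,2)}) - E(T) = [E(S_1') - E(S_1)] + [E(S_2') - E(S_2)].$$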
So the incremental effects should be additive. On the additional assumption of independence, the incremental effects on the higher cumulants should also be additive. In contrast, if the first manipulation affects stage 1 but the second one affects both stages 1 and 2, then we may very well not obtain additive effects because the simultaneous application of the two manipulations need not, and in general will not, have additive effects on the stage they both affect, stage 1. Indeed, the general philosophy of the method is such that were the manipulations to be additive one would suppose that stage 1 is actually composed of two serial stages, one affected by one of the manipulations and the other by the second one. If, in fact, the system of stages is serial and if there are manipulations available that affect each stage individually, then in principle the method permits us to establish the number of stages, although it tells us nothing about their order of occurrence. That can sometimes be guessed by interpreting the role of the stage from the nature of the manipulations that affect it (see Section 12.4.2), but such interpretations are somewhat risky. A modified method (see Section 12.4.7) goes further and permits us, in principle, to infer either the order or its converse. Although Sternberg proposed his method as a substitute for Donders' pure insertion, the latter continues to be widely used, often with apparent success (e.g., see Section 6.2.3). The possibility of using both in the same design was raised by Salthouse (1981), who pointed out that this increases one's ability to make clear which treatments affect the inserted stage. For example, suppose task II involves inserting a stage into task I, and that some pair of experimental treatments exhibit an interaction in task II but are additive in task I, then the conclusion suggested is that the treatments affect separate stages of task I and both affect the inserted stage of task II. 
He carried out a study illustrating the methods.
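The additivity prediction of this subsection can be illustrated with a 2 x 2 factorial simulation. This is a sketch under assumptions that are not in the text: there are only two stages, their latencies are independent exponentials, each manipulation doubles the mean of its own stage, and all names and numerical values are hypothetical. For comparison, the same manipulations are also applied to a system in which the two processes run in parallel and the response waits for the slower one.

```python
import random
import statistics


def mean_rt(mean_stage1, mean_stage2, combine, n_trials, rng):
    """Monte Carlo mean response time for two stages with exponential
    latencies (an assumption), combined by `combine`."""
    samples = [
        combine(rng.expovariate(1.0 / mean_stage1), rng.expovariate(1.0 / mean_stage2))
        for _ in range(n_trials)
    ]
    return statistics.fmean(samples)


def interaction_contrast(combine, n_trials=50_000, seed=1):
    """Mean RT with both manipulations, minus each alone, plus neither.

    A value near zero means the two incremental effects are additive.
    """
    rng = random.Random(seed)
    base1, slowed1 = 100.0, 200.0  # hypothetical stage-1 means in msec
    base2, slowed2 = 150.0, 300.0  # hypothetical stage-2 means in msec
    m_neither = mean_rt(base1, base2, combine, n_trials, rng)
    m_first = mean_rt(slowed1, base2, combine, n_trials, rng)
    m_second = mean_rt(base1, slowed2, combine, n_trials, rng)
    m_both = mean_rt(slowed1, slowed2, combine, n_trials, rng)
    return m_both - m_first - m_second + m_neither


serial_contrast = interaction_contrast(lambda s1, s2: s1 + s2)  # stages in series
parallel_contrast = interaction_contrast(max)                   # wait for the slower process

print(f"serial interaction contrast:   {serial_contrast:6.1f} msec")
print(f"parallel interaction contrast: {parallel_contrast:6.1f} msec")
```

With these particular values the serial contrast should be close to zero apart from sampling noise, whereas the parallel one should be clearly negative; other distributions or parameter values could behave differently, so this shows only that additivity is a substantive prediction, not a general result about parallel systems.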
12.4.2 Illustrative Applications of the Model
Sternberg (1971) reported a number of applications of the additive-factors method, and subsequently it has often been used by others, although not always well. I merely illustrate it by summarizing one of Sternberg's experiments. The stimuli were numerals the subject was to identify. Three ways of slowing the response were effected by the manipulations: degrading the quality of the visual presentation; increasing the number of possible stimuli from two to eight; and requiring the subject to respond to the digit n with n + 1 rather than n. Intuitively, it seems clear that degrading the stimulus should affect the time to encode it. Varying the number of possible responses and the compatibility of the responses should not affect the encoding, but they should affect respectively the search and response processes. One might anticipate the three effects to be additive. The mean data are shown in Figure 12.4, and there is a surprise. A sizable interaction exists between the manipulation of the number of stimuli and response compatibility, whereas stimulus quality is additive with each. The tentative conclusion to be drawn is that the data are compatible with a two-stage, serial model. However, in Section 12.4.6, we shall see that an additional analysis of the data actually suggests the three-stage, serial model may indeed be correct.
FIG. 12.4 A memory search experiment with three manipulations designed to effect response delays: degrading the stimulus quality; two or eight stimuli; respond with the digit presented or its successor. Additive stages should not produce interactions. [Figure 8 of Sternberg (1969a); copyright 1969; reprinted by permission.]
To illustrate further some of the current uses of the method, I cite three papers. Sanders and Andriessen (1978) used four vowels as signals. The manipulations were: intensity (45 dBA or 85 dBA); foreperiod duration (1 sec and 10 sec); and stimulus-response compatibility (respond by indicating the vowel presented or the one after it in the alphabet). At high intensity, but not at low, they found evidence of interaction between compatibility and foreperiod duration. Sanders, Wijnen, and van Arkel (1982) varied sleep
loss, signal degradation, stimulus-response compatibility, and signal modality. They found only one interaction—namely, between sleep loss and degradation—which they interpreted in terms of several theories of arousal and motivation. Simon (1982) presented one of two visual stimuli, an X or an O, and a tone to either ear. The visual signals were either degraded or not. With exactly the same visual display, subjects either responded by key presses to just the visual display (single task) or to both the visual and auditory (dual task). The experimenters interpreted the pattern of interactions as evidence of parallel initial stages—not too surprising—followed by a common stage—also not too surprising since the same hand was involved. Nonetheless, it is interesting that the method could tease this out from the response times. Another use of the additive-factors methodology is Pashler (1984), who attempted to infer which processing stages are affected in the overlapping, dual task that leads to the so-called psychological refractory period (Section 5.4). He assumed that the second task can be thought of as composed of a minimum of three main stages—a perceptual one, followed by a stage of decision making and response selection, in turn followed by response initiation and execution. The delay in response time to the second task due to there being another, immediately prior task, has in one paper or another been attributed to postponement of each of these three stages while the preceding stage is taken up with the first task. The general magnitude of the effect makes reasonably clear that not all of it can possibly be due just to delays in the perceptual stage, and some authors—for example, Duncan (1980)—hold that such activity goes on automatically and in parallel and definitely is not a source of delay. Welford (1980c) and his colleagues have suggested that the bottleneck lies in the use of a single, central processor during the decision stage. 
And Keele (1973) and Keele and Neill (1978) have argued for the delay lying in the response execution stage. In an attempt to get evidence on these alternatives, Pashler (1984) reasoned as follows. If the delay in the second task is due to a particular stage being taken up with task-one activities, then experimental manipulations that affect that stage directly should manifest themselves as additive effects in the response time to the second task. If, however, the delay is due to a later stage, then manipulations affecting the earlier one should exhibit no effect (at least until these delays are sufficiently long so that they further delay the start of the later stage), and so empirically it should appear as underadditivity. Thus, carrying out separate manipulations that are believed to affect the perceptual and decision stages should result in patterns of response delays that permit the localization of the bottleneck, if one exists.
When there was a first task (on half of the trial blocks), it was to identify which of two horizontal bars, one above and one below a fixation point, had been presented. The response was before the second task. The second, experimental task, involved search for a target letter in a display of M (= 2, 4, or 6) letters. Some of the manipulations carried out on the second task
were signal contrast (aimed at affecting the perceptual stage), presence or absence of the target letter and display size (both aimed at affecting the decision stage), and length of the SOA (time between signal onsets in tasks 1 and 2). Experiment 1 involved a fixed SOA of 100 msec and a fixed set size (4), and it varied both target present or absent and signal contrast. A substantial underadditivity was found for signal contrast, whereas target absent or present had an additive effect. Experiment 2, again at an SOA of 100 msec, involved target absent or present and varied set size. Again, target absent or present exhibited additivity, but set size was underadditive. At an SOA of 300 msec (Experiment 3), both variables were additive. The conclusion is that the delay is not in the response execution stage and that the decision stage is divided into two substages, with the first substage affected by set size and the second affected by target presence or absence. Additional experiments were performed aimed at deciding between the bottleneck and capacity sharing accounts of the delays. As the arguments involved are less convincing to me, I do not go into them here.
After this book was in production, I became aware, via Mudd (1983), of the research program of George E. Briggs in the late 1960s and during the first half of the 1970s in which he systematically applied Sternberg's method, information theory ideas, and Bayesian calculations to data from a series of experiments. This led him to a surprisingly detailed model of the several stages involved. I do not attempt to insert a description of this work, but those interested in generalizations of Sternberg's approach will find that Mudd (1983) has provided a comprehensive summary.
In addition, the following original references are of interest: Briggs (1972, 1974); Briggs and Blaha (1969); Briggs and Johnsen (1973); Briggs, Johnsen, and Shinar (1974); Briggs, Peters, and Fisher (1972); Briggs and Shinar (1972); Briggs and Swanson (1969, 1970); Briggs, Thomason, and Hagman (1978); Johnsen and Briggs (1973); Swanson and Briggs (1969); and Swanson, Johnsen, and Briggs (1972).
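Pashler's (1984) reasoning earlier in this section can be made concrete with a small deterministic sketch. It assumes, purely for illustration, that each task consists of a perceptual, a decision, and a response stage, that the decision stage is the only bottleneck, and that the durations (in msec) are the made-up values below. The function name `rt2` and every number are hypothetical and are not taken from Pashler's data.

```python
def rt2(soa, a1, b1, a2, b2, c2):
    """Response time to the second signal, measured from its onset.

    a1, b1: task-one perceptual and decision durations.
    a2, b2, c2: task-two perceptual, decision, and response durations.
    Assumption: the task-two decision stage cannot start until the
    task-one decision stage has finished (a single bottleneck).
    """
    bottleneck_free = a1 + b1        # clock starts at first signal onset
    perception_done = soa + a2       # second signal arrives at time soa
    decision_start = max(bottleneck_free, perception_done)
    return decision_start + b2 + c2 - soa


# Hypothetical stage durations (msec).
A1, B1 = 100, 150
A2, B2, C2 = 80, 120, 100
CONTRAST = 40   # low contrast lengthens the task-two perceptual stage
ABSENT = 60     # target absence lengthens the task-two decision stage


def effects(soa):
    """Effect of each manipulation on the second response time."""
    base = rt2(soa, A1, B1, A2, B2, C2)
    contrast = rt2(soa, A1, B1, A2 + CONTRAST, B2, C2) - base
    absent = rt2(soa, A1, B1, A2, B2 + ABSENT, C2) - base
    return contrast, absent


for soa in (100, 300):
    print(soa, effects(soa))
```

Under these assumptions the contrast manipulation is absorbed by the waiting time at the short SOA (underadditivity), whereas the manipulation of the bottleneck stage itself shows its full effect; at the long SOA the bottleneck is already free and both effects appear in full.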
12.4.3 Can a Parallel System Mimic the Means of a Serial System? Although Theorem 12.4 and its proof provide general conditions under which a realizable serial process can be mimicked by a locally stage-independent parallel process, within the framework of additive factors we should consider the conditions under which an observed additivity of factors precludes the possibility of a parallel interpretation. This is, after all, a less stringent demand. Townsend and Ashby (1983, pp. 372-375) have dealt with this question. Suppose i and j are manipulations and we have
It is obvious that if I and k are also manipulations, then Further, it is not difficult to show that Eq. 12.16 is equivalent to the manipulations satisfying what is known in the measurement literature as the Thomson condition (Krantz, Luce, Suppes, and Tversky, 1971): If i
Furthermore, Eq. 12.15 implies the following condition that is known as monotonicity or independence: and
It is also known that under somewhat restrictive additional conditions on the manipulations (restricted solvability and Archimedean) Eqs. 12.17 and 12.18 imply Eq. 12.15 (Krantz et al., 1971, Ch. 6). So we will work with its consequence, Eq. 12.16. Now, suppose the underlying mechanism is not serial stages, but rather parallel ones with times $T_a$ and $T_b$, where the one class of manipulations affects process a and the other affects b. Let
then as we have seen before we may write
Our question is when is it impossible for Eq. 12.19 to satisfy Eq. 12.16. Observe that
and so a sufficient condition for Eq. 12.16 to fail in the parallel model is that for all t > 0,
Another sufficient condition is to replace > by < in Eq. 12.20. Consider the special case in which the parallel processes are independent; then there are distributions of the component processes, $G_{ai}$ and $G_{bj}$, such that
in which case Eq. 12.20 is equivalent to or equally well < 0 for all t. Equation 12.22 amounts to saying that the effect of a change in treatment is to alter the distribution function so that the two do not have a common point. An example is a shift family where for some constant ati Thus, to the degree Eq. 12.22 is plausible, which it seems to be, it is impossible for independent parallel processes to mimic a finding of additive means using the additive-factors method.
12.4.4 Cautions and Concerns
In presenting the additive-factors method, Sternberg was very cautious in discussing its use. He made the following points.
1. The analysis of variance model is exactly appropriate for additive, independent stages; however, the usual tests of significance are not very appropriate and if used incautiously will lead too easily into accepting the additive independence of response times as a function of the manipulations. As there is no fully satisfactory statistical procedure to decide this, one must be conservative and attempt to run a sufficiently precise experiment so that additivity can be rejected when it is false.
2. The method can isolate serial processing stages, if they exist, but it in no way isolates processes. It is a considerable theoretical leap to decide exactly which processes have been affected and to infer their order of occurrence.
3. If the temporal network is not in fact serial, the inferences can be totally misleading. In particular, under some circumstances, independent parallel processes can be interpreted as interactive serial ones. The circumstances appear likely to hold (Section 12.4.3). Moreover, as McClelland (1979) first established and Ashby (1982b) confirmed, the cascade model, which is a serial, but overlapping, strength model, produces approximate additivity of means when the component processes are manipulated. This result is numerical in character and its analytic basis is not understood. Both authors give rules-of-thumb indicating when they believe additivity will appear in the cascade model.
4. Unlike Donders' scheme, this method provides no information whatsoever about the latencies associated with individual stages.
Others have raised different issues. I shall cite five that strike me as the most pervasive. First, the method does not confront speed-accuracy tradeoff concerns.
Sternberg assumed that subjects could and would perform as rapidly as is consistent with perfect performance. In practice, however, error rates up to 5% are usually considered acceptable. Others have argued that this is inadequate and one must analyze explicitly the class of speed-accuracy tradeoff models that are consistent with the additive-factors methodology. To some extent, I have already examined these issues in Section 6.4, and I go into it further in Section 12.4.5. Second, even if one accepts the idea of stages that are temporally quite distinct, surely it is possible that they are not organized serially. One can imagine a stage with several inputs that does not begin processing until all of the inputs to it are completed, or equally one can imagine a stage that begins processing as soon as any input occurs but does not output anything until the processing of the previous stage is completed. Nevertheless, one may be able to infer the network involved by using methods that affect the stages differentially. This I take up in Section 12.4.6. Third, many of those who have commented on the matter very strongly question the existence of stages at all, at least as conceived by Sternberg. A number of the more popular two-stimulus, response-time models, such as most of those in Chapters 8 and 9, do not postulate anything resembling stages except for the distinction between decision and residual processes. Others who accept the idea of separate processing mechanisms question whether the sharp temporal order postulated by Sternberg is plausible. The cascade model of Section 4.3.2 is an example of one involving temporally overlapping stages. Perhaps the most systematic attack against the stage idea is that found in Taylor (1976a). Other relevant references are Blackman (1975), Pachella (1974), Smith (1980), and Stanovich and Pachella (1977). Fourth, a variety of what may be lumped together as statistical issues have been raised. 
Perhaps the most sweeping attack of this sort was by Pieters (1983). I shall mention three major points from this paper. (i) The use of ANOVA was questioned on two counts: the power of the test, about which Sternberg had early cautioned, and the assumption that the variance is independent of the mean. As we know from many sources of data, the latter is almost always false in response-time data: the mean and variance covary in a highly regular fashion. This seems a highly telling argument against using ANOVA in this context. (ii) The data analyzed are usually group data, whereas the idea of stages is clearly about the operation of individual minds. Sufficient conditions are known (Thomas and Ross, 1980) for when a family of individual distributions lead to group data that can be modeled in the same family of distributions. Such conditions are not so easily satisfied that the issue can afford to be ignored; it usually is, however. (iii) Assuming independent stages and studying the additivity of cumulants may well lead to the introduction of artificial stages. Consider for example a stage having a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, where $\mu$ is substantially larger than $\sigma$ (so the Gaussian approximation does not lead to much trouble with negative times). We know (Eq. 1.56) that the first two cumulants are $\mu$ and $\sigma^2$. Now if we have experimental procedures that affect these two parameters of the Gaussian independently, then we will be led to believe two stages are involved. Pieters, in summarizing, likened the method of additive factors to a temporal factor analysis, an analogy not intended to be favorable. Others have proposed using correlational methods in analyzing such data. One example is Kadane, Larkin, and Mayer (1981). Without going into the details of their proposal, the key idea of a correlational approach is to use Eq. 12.15 to write $T_{ij} - T_{ik} = X_j - X_k$ and then correlate this quantity with the experimental variable used to manipulate, say, stage j. Donaldson (1983) pointed out that the estimate of the correlation should not be treated as straightforward because of possible measurement error in $T_{ij} - T_{ik}$. He made several proposals for alternative approaches, which I do not go into. For the fifth, I devote a separate subsection.
12.4.5 Speed-Accuracy Tradeoff
As I have already remarked, the custom in much of cognitive psychology is to use response-time measures but to ignore the speed-accuracy tradeoff function or region. (In psychophysics, the custom is to use response error and to ignore response times.) In particular, when presenting the additive factors method, Sternberg focused entirely on the times and was little concerned with errors. The usual reasoning is that the stimuli are perfectly identifiable and error-free performance is readily achieved, and so the focus is necessarily on the time taken to achieve that performance. In practice, however, some errors do occur and one of two conventions is followed: either the relative frequency of errors is reported for each experimental condition and it is noted that no systematic error pattern exists, or the trials on which errors were made are discarded from the analysis. Neither procedure is really satisfactory to those who are convinced that speed-accuracy tradeoff may well reflect a deep aspect of mental processing. For such a person, the only acceptable approach is to study the relation, both empirically and theoretically. At a theoretical level, Ollman (1979) pointed out that when one is contemplating the method of additive factors it is important to consider the class of tradeoff models that is compatible with the method. Not all are. No completely general answer is known, but Ollman has provided two models that are sufficient. Suppose we are dealing with a model that presupposes the existence of n additive stages, of which n − 1 are associated with perceptual processes and one is response in nature. Each stage has some probability of introducing an error. Assume that there are no speed-accuracy tradeoffs in the perceptual stages, but that one exists in the response stage. If so, manipulations affecting the perceptual stages will alter the times but not the errors. So if the response stage is not manipulated, the error rate will be unaffected by
perceptual manipulations whereas the times will be affected additively. To be more explicit, consider a two-choice situation and lump together the perceptual processes. Let Ts denote the latency of perceptual processing and Ps the probability that it is carried out correctly. Similarly, let TR and PR denote the latency and probability of correct response processing. Then the overall mean response time and probability of a correct response are given by
Suppose the speed-accuracy tradeoff of the response process is described by then substituting Eq. 12.25 into Eq. 12.24 and using Eq. 12.23, The surprising thing about this observable speed-accuracy equation is that its intercept is altered as $E(T_S)$ is changed, despite the fact that it is assumed that perceptual manipulations do not entail a speed-accuracy tradeoff. Note that if $T_S$ is the sum of the latencies corresponding to two or more perceptual components that can be manipulated independently, then by Eq. 12.23, the additive-factors method is appropriate. Ollman pointed out that the fast-guess model is also appropriate to the additive-factors method. Assuming symmetry, so $P_{aA} = P_{bB} = P_S$, and setting $E(T_R) = \nu_0$ and $E(T_S) = \nu_1 - \nu_0$ in Eq. 7.31c, we then have as the speed-accuracy tradeoff
Unlike the previous model, altering the perceptual process affects not the intercept of the tradeoff, but rather its slope. But so long as the manipulation affects the sensory process in such a way that the probability of a fast guess is unaffected, there will be no speed-accuracy tradeoff and the additive-factors method is appropriate. For other speed-accuracy tradeoff models, it remains to be verified whether the method is appropriate. Probably one's prior bias should be against its appropriateness. 12.4.6 Critical Path Generalization of Additive-Factors Method The only attempt of which I am aware to generalize additive factors to processing networks other than serial is due to Schweickert (1978, 1980, 1982a, b; 1983a, b). His work is both conceptually and empirically interesting; but as of now it is somewhat limited by the assumption that the
processing latencies are deterministic, not random variables. It draws upon a method, called critical path analysis, originally developed in operations research and computer programming. The focus there was on calculating the time required to complete a task as a function of the flow diagram and the durations of the various stages of processing (Kelley, 1961; Modor and Phillips, 1970; Weist and Levy, 1969). In contrast to the way in which critical path methods are usually applied, when studying mental processes we do not know the processing network in advance. Rather, our task is to infer the network from the overall response times obtained under various experimental manipulations aimed at differentially affecting the latencies of the components. We shall need some terminology. A partial order on a set X is a relation < on X satisfying two properties: (i) < is irreflexive: for all x in X, not x < x. (ii) < is transitive: for all x, y, z in X, if x < y and y < z, then x < z. A relation on a set of processing stages is called a processing network if and only if it is a partial order for which there is a unique starting stage s (i.e., a stage having no antecedent element in the order) and a unique terminating stage t (i.e., one having no successor in the order). Two stages are called comparable if and only if one precedes the other in the order; otherwise, they are called incomparable. These two concepts generalize the primary features of, respectively, serial and parallel networks. A network is serial if there is a unique labeling of the elements, $x_1, x_2, \ldots, x_n$, such that $s = x_1$, $t = x_n$, and $x_i < x_j$ if and only if i < j. In a serial network, any two distinct stages are comparable. A network is parallel if for all x, y in X − {s, t}, s < x, y < t, and x and y are incomparable. To sketch such networks one can use either of two conventions.
Each stage can be represented as a node, connecting an arrow from node x to node y if and only if x < y, or each stage can be represented as an arrow, leading the head of stage x into the tail of stage y if and only if x < y. The latter seems more useful in the present context. The first major concept in this work is that of the slack of a stage: the amount of delay that can be introduced into that stage before it results in an increase in the overall time to complete the network using paths through that stage. For example, in the network of Figure 12.5, where the times of the individual stages are shown, then the slack of stage x is 2. The reason is that stage w cannot be begun until z is completed, which takes 11 time units, whereas x followed by y takes only 9 units. So, the path via x and y does not slow down the entire process until it takes more than 11 units, and so x can be increased from 5 to 7 without affecting matters. Similarly, the slack for y is 2, whereas those of z and w are each 0. The latter illustrates the general fact that the slacks of stages on a critical path—one controlling the overall time—must all be 0. In particular, in a serial process, all stages have zero slack.
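The slack calculation just described can be sketched in a few lines. The encoding below is an assumption about the network of Figure 12.5, reconstructed from the text alone: x (5 units) is followed by y (4 units), z (11 units) runs alongside them, and w starts only when both y and z are done. The duration of w is not stated in the text, so the value 3 is purely hypothetical; the start and terminal stages are left implicit with zero duration. The function name `slacks` is likewise illustrative.

```python
from functools import lru_cache

# Hypothetical encoding of a network like Figure 12.5 (w's duration is assumed).
DURATION = {"x": 5, "y": 4, "z": 11, "w": 3}
PREDECESSORS = {"x": [], "y": ["x"], "z": [], "w": ["y", "z"]}


def slacks(duration, predecessors):
    """Return (total time, slack of each stage) for deterministic durations."""
    successors = {stage: [] for stage in duration}
    for stage, preds in predecessors.items():
        for p in preds:
            successors[p].append(stage)

    @lru_cache(maxsize=None)
    def longest_up_to(stage):
        # Longest path from the start through the end of this stage.
        before = max((longest_up_to(p) for p in predecessors[stage]), default=0)
        return before + duration[stage]

    @lru_cache(maxsize=None)
    def longest_from(stage):
        # Longest path from the beginning of this stage to the end.
        after = max((longest_from(s) for s in successors[stage]), default=0)
        return duration[stage] + after

    total = max(longest_up_to(stage) for stage in duration)
    slack = {}
    for stage in duration:
        # Longest complete path that passes through this stage.
        through = longest_up_to(stage) + longest_from(stage) - duration[stage]
        slack[stage] = total - through
    return total, slack


total, slack = slacks(DURATION, PREDECESSORS)
print(total, slack)
```

With this encoding, lengthening x from 5 to 7 leaves the total time unchanged, whereas lengthening it to 8 adds one unit, which is what a slack of 2 means.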
FIG. 12.5 A network of stages described in the text.
We have previously encountered slack considerations in the study of Welford's single channel hypothesis (Section 5.4). The results stated here generalize those. Theorem 12.6 (Schweickert, 1978). Suppose x and y are two stages of a processing network and suppose that the time of x is prolonged by $\Delta x$ and that of y by $\Delta y$. Let $\Delta T$ denote the change in overall time, T, due to the prolongations shown in its argument. (i) If x and y are incomparable stages, then
$$\Delta T(\Delta x, \Delta y) = \max[\Delta T(\Delta x), \Delta T(\Delta y)]. \tag{12.28}$$
(ii) If x and y are comparable, if x precedes y, if $\Delta x$ and $\Delta y$ are sufficiently large so that $\Delta T(\Delta x) > 0$ and $\Delta T(\Delta y) > 0$, and if in the presence of both prolongations x and y lie on a critical path, then
$$\Delta T(\Delta x, \Delta y) = \Delta T(\Delta x) + \Delta T(\Delta y) + K(x, y), \tag{12.29}$$
where the term K(x, y), which is called the coupled slack between x and y, depends upon x and y but not upon $\Delta x$ and $\Delta y$. Proof. (i) Let $S_x$ and $S_y$ be the slacks of x and y and let $S_y(\Delta x)$ be the slack of stage y given that x has been prolonged by $\Delta x$. Introduce the notation
By definition of the slacks,
But clearly,
and so substituting and using the definition of $[a]^+$,
(ii) By assumption, when both $\Delta x$ and $\Delta y$ have been applied there is a critical path from s through x and y to t; let $x_1$ and $x_2$ denote the stages of the critical path that immediately precede and succeed, respectively, x, and let $y_1$ and $y_2$ be the similar ones for y. This path may or may not be critical when fewer than both prolongations have been introduced. Denote by d(u, v) the duration on this path from stage u to stage v and let d(u) be the duration of stage u. Thus, on this path but with no prolongations,
$$T = d(s, y_1) + d(y) + d(y_2, t) + S_y.$$
Let $S_{xy}$ denote the slack of x relative to just stage y, not to the entire process. Then,
$$d(s, y_1) = d(s, x_1) + d(x) + d(x_2, y_1) + S_{xy}.$$
Substituting,
$$T = d(s, x_1) + d(x) + d(x_2, y_1) + d(y) + d(y_2, t) + S_y + S_{xy}.$$
Since by assumption the path becomes critical when both prolongations $\Delta x$ and $\Delta y$ are introduced, and so all of the times simply add, we see that
$$T + \Delta T(\Delta x, \Delta y) = d(s, x_1) + d(x) + \Delta x + d(x_2, y_1) + d(y) + \Delta y + d(y_2, t).$$
Taking the difference,
$$\Delta T(\Delta x, \Delta y) = \Delta x + \Delta y - S_y - S_{xy}.$$
But $0 < \Delta T(\Delta x) = \Delta x - S_x$ and $0 < \Delta T(\Delta y) = \Delta y - S_y$, so Eq. 12.29 holds with
$$K(x, y) = S_x - S_{xy}.$$
At first glance, it appears as if the coupled slack K(x, y) should be non-negative since one might anticipate that the slack of x relative to stage y is no greater than the entire slack, but that is incorrect. Consider the network shown in Figure 12.6. We see that $S_{xy}$ is 4 because y cannot begin until 9 units have elapsed. But for the entire process the slack of x is only 2 because any larger value causes the upper path of 3 + 5 = 8 to exceed 10, the total time of the originally slowest path. Thus, K(x, y) = 2 − 4 = −2. Schweickert (1978) has shown that a network has a negative coupled slack if and only if it includes a subnetwork of the form shown in Figure 12.6.

FIG. 12.6 A network of stages discussed in the text.

TABLE 12.5. Response times and changes in them from Sternberg's digit naming experiment (a)

Stimulus   Number of      Response
quality    alternatives   transformation   Prolongation    RT    ΔRT
Intact     2              n                Baseline        328     0
Degraded   2              n                ΔQ                     30
Intact     8              n                ΔN                     43
Degraded   8              n                ΔQ ΔN                  97
Intact     2              n + 1            ΔR                     18
Degraded   2              n + 1            ΔQ ΔR                  45
Intact     8              n + 1            ΔN ΔR                 144
Degraded   8              n + 1            ΔQ ΔN ΔR              197

(a) Adapted from a handout of Schweickert (1982b).

Part (ii) of Theorem 12.6 generalizes to three manipulations as follows:
12.4.7 An Application of the Critical Path Method
Schweickert (1982b) did a critical path analysis of Sternberg's (1971) digit-naming experiment (Section 12.4.2) in which the three manipulations were: intact or degraded stimuli, two or eight search alternatives, and respond either with the target digit or its successor. Using Q to refer to the stage(s) affected by stimulus Quality, N those affected by the Number of alternatives, and R to those affected by the Response transformation, the data are tabulated in Table 12.5. It is clear that Eq. 12.28 does not hold for any pair of the factors, so none are incomparable. Using Eq. 12.29 twice to
calculate the two coupled slacks K(Q, N) and K(N, R) from the pairwise data, we see that Eq. 12.30 is satisfied within 1 msec. So the evidence is that we are dealing with a simple serial process with the stages either in the order Q
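The arithmetic behind this analysis can be checked with a short sketch. Two things in it are assumptions rather than statements from the text: the assignment of the ΔRT values (in msec) of Table 12.5 to the eight conditions follows one reading of that table, and the three-manipulation relation of Eq. 12.30, which is not displayed here, is taken to be the sum of the three single effects plus the coupled slacks K(Q, N) and K(N, R). The names `DELTA_RT`, `delta`, and `coupled_slack` are illustrative.

```python
# Changes in response time (msec) relative to baseline, keyed by the set of
# prolonged stages. Values as read from Table 12.5 (an assumed reading).
DELTA_RT = {
    frozenset(): 0,
    frozenset("Q"): 30,
    frozenset("N"): 43,
    frozenset("QN"): 97,
    frozenset("R"): 18,
    frozenset("QR"): 45,
    frozenset("NR"): 144,
    frozenset("QNR"): 197,
}


def delta(*stages):
    """Observed change in RT when the named stages are prolonged together."""
    return DELTA_RT[frozenset(stages)]


def satisfies_max_rule(x, y):
    """Eq. 12.28: the joint effect equals the larger single effect."""
    return delta(x, y) == max(delta(x), delta(y))


def coupled_slack(x, y):
    """Eq. 12.29 solved for K(x, y)."""
    return delta(x, y) - delta(x) - delta(y)


pairs = [("Q", "N"), ("Q", "R"), ("N", "R")]
any_incomparable = any(satisfies_max_rule(x, y) for x, y in pairs)

k_qn = coupled_slack("Q", "N")
k_nr = coupled_slack("N", "R")

# Assumed form of Eq. 12.30 for the order Q, N, R (or its converse).
predicted = delta("Q") + delta("N") + delta("R") + k_qn + k_nr
observed = delta("Q", "N", "R")
print(any_incomparable, k_qn, k_nr, predicted, observed)
```

Under these assumptions no pair of factors satisfies the maximum rule, and the predicted three-way change differs from the observed one by a single millisecond, in line with the statement above.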
12.4.8 Inferring the Processing Network from Comparability Data In principle, at least, Theorem 12.6 provides us with a way to decide for each pair of stages whether they are comparable or incomparable. So the next question, which Schweickert (1983b) took up in detail, is the degree to which one can uniquely infer the processing network from comparability information on all pairs of stages. A comparability relation simply says for any two nodes (stages), whether or not they are comparable, which relation is obviously symmetric. Any symmetric relation C is said to be consistent with a partial order < on the same set X of nodes if and only if for every x, y in X, xCy is equivalent to either x < y or y < x. So the first issue is the conditions under which a comparability graph is consistent with a partial order, and the second issue is how unique is the induced partial order. To answer these two questions, we need two additional concepts. Suppose C on X is a symmetric relation. A subset Y of X is partitive if and only if for all x, y in Y and all a in X − Y it is true that aCx is equivalent to aCy. A subset Z of X is stable if and only if for all x, y in Z it is not true that xCy. Theorem 12.7 (Shevrin and Filippov, 1970; Trotter, Moore, and Summer, 1976). A symmetric (comparability) relation is consistent with a partial order
if and only if every proper partitive subset is stable. In that case, the relation is consistent with exactly two partial orders, which are converses of each other. A proof can be found in Schweickert (1983b). The following result attempts to make clear the meaning of partitive in those comparability relations arising from response-time experiments. Theorem 12.8. A subset Y of a comparability relation arising from Theorem 12.6 is partitive if and only if for every x, y in Y and a in X − Y, the slack $S_{ax}$ is defined if and only if $S_{ay}$ is defined; when both are defined, they are equal. So far, the situations to which these methods have been applied are sufficiently simple, usually involving only three manipulations, that it is easy to infer the partial order and to establish its uniqueness. For more complex cases in the future, these results provide the conditions for the existence and uniqueness of the partial order. Schweickert (1983a) has provided a systematic algorithm for finding the partial order, if it exists.
12.5 CONCLUSIONS
The first conclusion of the chapter, formulated in Section 12.2.6, is that even rather complete distributional data from a single choice experiment do not permit us to infer much about the underlying organization of information processing. This was based upon a series of quite general mimicking theorems. The remaining two sections were concerned with attempts to use closely related sets of experiments to draw the desired inferences. In Section 12.3, we examined families of related Same-Difference designs involving stimuli consisting of one or two basic elements, such as letters, which were intended to decide between serial or parallel processing and exhaustive or self-terminating searches. Results were derived that, in principle, should permit the choices to be made, but I do not find the argument entirely convincing. There were two basic problems. The less fundamental was the fact that to fit the data at all so many parameters had to be introduced that hardly any degrees of freedom were left to test the model. Presumably, that could be overcome by considering other, somewhat more complex designs. However, that would feed into my other, far more fundamental, objection: the particular search sequences postulated, which are by no means uniquely defined by the experimental situation, are not justified by any deep psychological principles. In some sense they are optimal, but there is no assurance that subjects in such an artificial experiment are performing optimally. To make the approach convincing, one really should consider all logically tenable search procedures, in which case it is doubtful if any clear inference could be drawn. The third approach is really a historical sequence of ideas, beginning with Donders' method of pure insertion, to the serial, additive factors techniques
of Sternberg and the single channel hypothesis of Welford, and ending in the critical path methods of Schweickert. All of these methods presume substantially the same experimental task, but with experimental manipulations that are believed either to eliminate a stage of processing totally (Donders) or to alter the time required of one or more stages. In different contexts, both Sternberg and Welford introduced versions of the latter idea on the assumption of a serial system. Sternberg showed how to use analysis-of-variance techniques to search for signs of nonadditivity, which within the context of a serial system, is interpreted as evidence that the manipulations affect a common stage. Schweickert, in contrast, considered any possible network of precedence relations and used various relations among the observed (and treated as deterministic) times as the manipulations are carried out singly and together. From such data, he has shown how to make reasonably detailed inferences about the underlying processing. His methods have yet to be widely used and they suffer from the fact that they have not been developed for random variables. However, they appear to hold considerable promise for clarifying the organization of processing—at least on the assumption that stages exist. As I said at the onset, I make no attempt to draw overall conclusions, which would amount to little more than a recapitulation of the conclusions of this and the ten preceding chapters. I cannot, however, refrain from commenting that despite the fragmentary nature of the results, progress is being made. We know far more today about response times and their uses than we did a decade ago, let alone in the early 1950s when I first started to work in the area.
APPENDIXES
Appendix A: Asymptotic Results for Independent Random Variables
A.1 SERIAL MODELS: SUM OF RANDOM VARIABLES
A number of authors have suggested that perhaps the decision process, as may also be true of the residual one, is composed of a number of independent components. One way in which this might come about is for the information on several different neural fibers to be extracted sequentially. Such a notion coupled with the physiological observations that from 100 to as many as 1000 fibers are activated by, for example, a pure tone, suggests that it may be reasonable to suppose there are a relatively large number of very similar components. Of course, the possibility cannot be dismissed that before a decision is reached another component, say a computation, having quite a different distribution is added to all of them. It is not difficult to think of additional variations. I shall treat several of the best known variants in this appendix. Throughout, independence is the main assumption, and in this section I also assume that the several component times are added.
A.1.1 Identical Random Variables
The purpose of this subsection is to formulate the result that, among other things, describes how the sample mean of a random sample,
behaves as n becomes large. The result—that there is a unique limiting distribution and that it is Gaussian—is called the Central Limit Theorem. The term apparently originated with Polya (1920) who referred to it as "the central limit theorem of probability theory," meaning "primary" or "most significant" limit theorem. Recall that in Section 1.3.5 we argued that one should look not at Xn itself but the normalized random variable
which has expected value 0 and variance 1 for every n.
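The effect of the normalization can be seen in a small simulation. The sketch assumes the usual standardization of the sample mean, $(\bar{X}_n - \mu)/(\sigma/\sqrt{n})$, and uses exponential variables with rate 1 (so $\mu = \sigma = 1$) only as a convenient, hypothetical example of a skewed distribution; the function names are illustrative.

```python
import math
import random
import statistics


def normalized_mean(sample, mu, sigma):
    """Standardized sample mean: (mean - mu) / (sigma / sqrt(n))."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n))


def simulate(n, replications=4000, seed=1):
    """Draw many standardized means of n exponential(1) variables."""
    rng = random.Random(seed)
    return [
        normalized_mean([rng.expovariate(1.0) for _ in range(n)], 1.0, 1.0)
        for _ in range(replications)
    ]


def coverage(values, bound=1.96):
    """Fraction of values inside (-bound, bound); about 0.95 for a Gaussian."""
    return sum(1 for v in values if abs(v) < bound) / len(values)


for n in (1, 5, 50):
    z = simulate(n)
    print(n, round(statistics.fmean(z), 2),
          round(statistics.pvariance(z), 2), round(coverage(z), 3))
```

For every n the simulated mean and variance stay near 0 and 1; what changes as n grows is the shape of the distribution, which is the subject of the theorem that follows.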
Theorem A.1 (Central Limit Theorem). Suppose $X_1, \ldots, X_k, \ldots$ are independent, identically distributed random variables with mean $\mu$ and variance $\sigma^2$. Then for all real a and b, a < b,
where
Since
then letting $M_Y$ denote the mgf of Y, where X is the generic random variable. Since
Consider the Taylor's series expansion of $M_{X-\mu}$,
Let $\varepsilon(s) = s M'''_{X-\mu}(\xi)/3!$. Using Eq. 1.48,
and so by several applications of Eq. 1.47,
Substituting into the Taylor's series expansion,
Therefore,
Since $\varepsilon(s) \to 0$ as $s \to 0$, the right term is negligible relative to $s^2/2n$, and so
by definition of the number e,
which by Eq. 1.50 is the mgf of the Gaussian with mean 0 and variance 1. • Among other things, the proof establishes that a unique asymptotic distribution exists. If one is willing to assume this is so, then there is a simpler line of argument which establishes that the asymptotic distribution is Gaussian. With no loss of generality, assume $\mu = 0$ and $\sigma = 1$. Let $S_n = X_1 + \cdots + X_n$; clearly, $S_n/\sqrt{n}$ has mean 0 and variance 1. On the assumption that an asymptotic distribution G exists, consider the very special case where all the $X_k$ have that distribution, as of course so does $S_n/\sqrt{n}$. Denote by M the moment generating function of G; then
$$M(s) = [M(s/\sqrt{n})]^n, \tag{A.2}$$
where we have used the fact that the mgf of a sum of independent random variables is the product of their respective mgf's. Since Eq. A.2 must hold for every positive number s and every integer n, it clearly places substantial constraints on the mgf M. Indeed, this functional equation has a unique solution. Theorem A.2. If Eq. A.2 holds for mgf M, then $M(s) = \exp(s^2/2)$, which is the mgf of the Gaussian with mean 0 and variance 1. I formulate this result, even though it is implied by Theorem A.1, because it is really quite easy to prove and because the strategy involved can be generalized to other asymptotic problems. In particular, we will follow it in order to arrive at the form of the asymptotic distribution of the maximum of independent, identically distributed (abbreviated iid) random variables.
A.1.2 Distribution of Counts in a Renewal Process
Another way to view the problem of sums of independent, identically distributed random variables is in terms of the number N(t) of events that have occurred by time t. Such a counting process, in which the inter-event intervals are independent and identically distributed, is called a renewal process. The major observation is the following equivalence of events when we look at the process both from the point of view of sums and counts: This simple fact can be used to derive a central limit theorem for the renewal process {N(t)}.
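The link between sums and counts can be checked mechanically. The sketch below assumes the standard form of the equivalence, namely that at least n events have occurred by time t exactly when the n-th partial sum $S_n$ does not exceed t; the displayed relation itself is not reproduced here, so this form is an assumption, as are the function names and the exponential intervals used for the check.

```python
import itertools
import random


def partial_sums(intervals):
    """S_1, S_2, ...: the times at which successive events occur."""
    return list(itertools.accumulate(intervals))


def count_by(t, intervals):
    """N(t): the number of events that have occurred by time t."""
    return sum(1 for s in partial_sums(intervals) if s <= t)


def equivalence_holds(t, intervals):
    """Check {N(t) >= n} <=> {S_n <= t} for every n covered by the data."""
    sums = partial_sums(intervals)
    n_t = count_by(t, intervals)
    return all((n_t >= n) == (sums[n - 1] <= t)
               for n in range(1, len(sums) + 1))


rng = random.Random(0)
trials = [[rng.expovariate(1.0) for _ in range(30)] for _ in range(200)]
all_ok = all(equivalence_holds(rng.uniform(0.0, 40.0), xs) for xs in trials)
print(all_ok)
```

The check relies only on the intervals being positive, so that the partial sums increase; it does not depend on the particular interval distribution chosen.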
Theorem A.3. Suppose $X_n$, n = 1, 2, ..., are independent and identically distributed random variables with finite mean $\mu$ and variance $\sigma^2$. Let N(t) = n if and only if $S_n = \sum_{i=1}^{n} X_i \le t$ and $S_{n+1} > t$. Then
where
Fix x and let n and t approach $\infty$ so that
Now, by Theorem A.I, we know that
But,
Observe that
By hypothesis, as $n, t \to \infty$ the first term on the right approaches x, and we show that the second on the right approaches 1, which then establishes the assertion. To show that $\mu n/t \to 1$, consider the identity
If $\mu n/t \to \infty$ or $-\infty$, then the right term approaches $\mu n^{1/2}/\sigma$, which in turn approaches $\infty$. But this contradicts the assumption about the limiting process that the left term of Eq. A.3 approaches −x. So $\mu n/t$ must have a finite limit, say k. If $k \ne 1$, then since we see again that the right term of Eq. A.6 approaches $\infty$ or $-\infty$, which is impossible. Thus, k = 1, completing the proof. Perhaps the most remarkable aspect of this result is the fact that the asymptotic variance is of the form $t\sigma^2/\mu^3$, and there is little that is intuitive about the factor $\mu^3$ except that it leads to a dimensionless quantity, which the variance of a count must be.
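The form of the asymptotic variance can be examined numerically. The sketch below simulates a renewal process whose inter-event intervals are gamma distributed with shape 4 and scale 0.5, a hypothetical choice giving $\mu = 2$ and $\sigma^2 = 1$, and compares the mean and variance of N(t) at t = 200 with $t/\mu$ and $t\sigma^2/\mu^3$. The mean $t/\mu$ is the standard centering for renewal counts and is assumed here; the function names and parameter values are illustrative.

```python
import random
import statistics

SHAPE, SCALE = 4.0, 0.5            # hypothetical gamma inter-event intervals
MU = SHAPE * SCALE                 # mean interval: 2
SIGMA2 = SHAPE * SCALE ** 2        # variance of an interval: 1


def renewal_count(t, rng):
    """N(t): number of events whose occurrence time does not exceed t."""
    elapsed, n = 0.0, 0
    while True:
        elapsed += rng.gammavariate(SHAPE, SCALE)
        if elapsed > t:
            return n
        n += 1


def simulate_counts(t=200.0, replications=2000, seed=7):
    rng = random.Random(seed)
    return [renewal_count(t, rng) for _ in range(replications)]


T = 200.0
counts = simulate_counts(T)
print("mean:", statistics.fmean(counts), "predicted:", T / MU)
print("variance:", statistics.pvariance(counts),
      "predicted:", T * SIGMA2 / MU ** 3)
```

Since t has the units of time, $\sigma^2$ of time squared, and $\mu^3$ of time cubed, the predicted variance is a pure number, as the remark above requires.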
A.1.3 Nearly Identical Random Variables
From the middle of the 18th century, various limit theorems were formulated and more-or-less proved, but it was not until Liapounov (1900, 1901) that the central limit theorem for iid random variables was proved rigorously and generally. During the next 40 years the problem was greatly broadened and completely solved. Basically the question became a search for conditions under which it is possible to find two sequences of numbers $\{a_n\}$ and $\{b_n\}$ such that the distribution of $a_n S_n + b_n$ converges to the Gaussian distribution. Note that for the iid case, $b_n = -n^{1/2}\mu/\sigma$ and $a_n = 1/(n^{1/2}\sigma)$ are the normalizing sequences usually used. For those with a strong mathematical background, a full treatment can be found in Ch. VI of Loeve (1963, 1977) or in Gnedenko and Kolmogorov (1954). Here we only mention a sufficient condition due to Lindeberg (1922). Suppose $X_k$ are mutually independent random variables with density functions $f_k$ for which $E(X_k) = 0$ and $V(X_k) = \sigma_k^2$. Set
If for every t> 0,
then the distribution of S* approaches the Gaussian with 0 mean and unit variance as n approaches $\infty$. In particular, if there is a finite interval [a, b] such that with probability 1 each of the $X_k$ lies entirely in [a, b], then the condition is met; however, there are many cases of unbounded variables for which it is also met. Although great care must be exercised in invoking the central limit theorem to understand empirical phenomena (it is by no means a universal rationale for assuming that a random variable of a theory has the Gaussian distribution), certain psychophysical reaction-time situations appear to fulfill at least some of the conditions just described. For example, signals to either the eye or to the ear are known to activate relatively large numbers of anatomically parallel nerve fibers, and there is some physiological evidence to suggest that the activation of these fibers is both comparable in magnitude and independent. Thus, one might well associate with each fiber a random variable corresponding to the time it takes to extract the information on that fiber. At this point, however, it becomes less clear what one should assume. Do those times add, leading to a decision? If so, is there some additional time taken up in somehow combining the information? Or are all those times being carried out in parallel so it is the slowest that determines the total time involved in the initial extraction of information? It is clear that a variety of models are possible; some of these possibilities are treated in the remainder of the appendix.
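For reference, the Lindeberg condition mentioned above is usually stated as follows. This is the standard textbook form, written in notation assumed here ($s_n$ for the standard deviation of the sum and $S_n^*$ for the normalized sum that the text calls S*); the book's own displayed formulas may differ in detail.

```latex
s_n^2 = \sum_{k=1}^{n} \sigma_k^2 ,
\qquad
S_n^* = \frac{1}{s_n} \sum_{k=1}^{n} X_k ,
\qquad
\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^{n}
  \int_{|x| > t s_n} x^2 f_k(x)\, dx = 0
  \quad \text{for every } t > 0 .
```

If every $X_k$ is confined to a fixed finite interval and $s_n \to \infty$, the region of integration is eventually empty for each $t$, which is why bounded variables satisfy the condition.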
A.1.4 Identical Gaussians
If some sort of repetition of random variables leads to an asymptotic distribution, then repeating random variables with that distribution must continue to result in it. This is the closure property of asymptotic distributions. So one anticipates that the sum of two independent Gaussian random variables must also be Gaussian. This is easily verified by a moment generating function argument. Suppose $\{X_k\}$ are independent Gaussian with means $\mu_k$ and variances $\sigma_k^2$. By Eq. 1.50, their mgf's are $\exp(\mu_k s + \tfrac{1}{2}\sigma_k^2 s^2)$. Thus, the mgf of the sum is
$$\prod_k \exp\left(\mu_k s + \tfrac{1}{2}\sigma_k^2 s^2\right) = \exp\left(s \sum_k \mu_k + \tfrac{1}{2} s^2 \sum_k \sigma_k^2\right),$$
which clearly is the mgf of a Gaussian with mean $\sum_k \mu_k$ and variance $\sum_k \sigma_k^2$.
A.1.5 Identical Exponentials
Suppose the random variables $X_k$ are independent and exponentially distributed, say with distribution functions $1 - e^{-\lambda_k x}$. By Eq. 1.49, each has the mgf $\lambda_k/(\lambda_k - s)$, and so by Eq. 1.46 the mgf of the sum $S_n$ is $\prod_{k=1}^{n} \lambda_k/(\lambda_k - s)$. Even in the simple case where all $\lambda_k = \lambda$, it is not immediately obvious which distribution has $\lambda^n/(\lambda - s)^n$ as its mgf. One can, in fact, find it in tables (Erdelyi, Magnus, Oberhettinger, & Tricomi, 1954). However, because exponentials have nice mathematical properties, it is not very difficult to calculate the density function $f_n$ of $S_n$ directly by induction. Observe that $S_n = t$ occurs whenever $S_{n-1} = x$ and $X_n = t - x$. Since $X_n$ and $S_{n-1}$ are independent and nonnegative,
$$f_n(t) = \int_0^t f_{n-1}(x)\, \lambda e^{-\lambda (t - x)}\, dx. \tag{A.4}$$
For n = 2, $f_{n-1}(x) = f_1(x) = \lambda e^{-\lambda x}$, and substituting into Eq. A.4 and integrating we see $f_2(t) = \lambda^2 t e^{-\lambda t}$. Computing a few more terms suggests the induction hypothesis
$$f_n(t) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}. \tag{A.5}$$
Substitution of Eq. A.5 into Eq. A.4 verifies the conjecture, as does computing the mgf of Eq. A.5. The density shown in Eq. A.5 is known as
the gamma; an explicit formula, in elementary functions, for its distribution function is not possible. Obviously, being the sum of n identical exponentials, the mean is $n/\lambda$ and the variance is $n/\lambda^2$. Its hazard function cannot be expressed in simple terms because the distribution function can only be expressed as an integral; however, we may use Theorem 1.1 to determine its qualitative characteristics. Observe that
$$f_n'(t) = f_n(t)\left[\frac{n-1}{t} - \lambda\right],$$
and so
$$-\frac{f_n'(t)}{f_n(t)} = \lambda - \frac{n-1}{t},$$
which for n > 1 is an increasing function of t. Thus, the hazard function of the gamma for n > 1 is increasing; for n = 1, it is constant. To gain some idea of what happens when the parameters are not equal, consider the case of n = 2. Following McGill (1963), we may write the mgf as
$$\frac{\lambda_1 \lambda_2}{(\lambda_1 - s)(\lambda_2 - s)} = \frac{\lambda_1 \lambda_2}{\lambda_2 - \lambda_1}\left[\frac{1}{\lambda_1 - s} - \frac{1}{\lambda_2 - s}\right],$$
which we see is the mgf of
$$f(t) = \frac{\lambda_1 \lambda_2}{\lambda_2 - \lambda_1}\left(e^{-\lambda_1 t} - e^{-\lambda_2 t}\right). \tag{A.6}$$
Suppose with no loss of generality that $\lambda_2 > \lambda_1$, and let $\alpha = \lambda_2 - \lambda_1$; then Eq. A.6 can be rewritten as
$$f(t) = \frac{\lambda_1 \lambda_2}{\alpha}\, e^{-\lambda_1 t}\left(1 - e^{-\alpha t}\right),$$
and we see that for large t, $e^{-\alpha t}$ is negligible and the overall process is dominated by the slower process. Thus, one can estimate $\lambda_1$ by fitting $e^{-\lambda_1 t}$ to the tail of the empirical distribution, and then by deconvolving it, $\lambda_2 e^{-\lambda_2 t}$ remains. This generalizes to any number of exponentials. In principle, the slowest can be isolated and removed, then the next slowest, and so on. Clearly, however, errors accumulate, and the method has never been used in connection with response-time data.
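The claim that the slower exponential dominates the tail can be illustrated numerically. The sketch below assumes the standard density of a sum of two independent exponentials with unequal rates; the rates 1 and 3, the integration grid, and the function names are hypothetical choices for illustration only. It checks that the density integrates to 1 and that the hazard function rises toward the smaller rate.

```python
import math

LAM1, LAM2 = 1.0, 3.0        # assumed example rates, with LAM2 > LAM1
ALPHA = LAM2 - LAM1


def density(t):
    """Density of X1 + X2 for independent exponentials with rates LAM1, LAM2."""
    return LAM1 * LAM2 / ALPHA * (math.exp(-LAM1 * t) - math.exp(-LAM2 * t))


def survivor(t):
    """Pr(X1 + X2 > t): the density integrated from t to infinity."""
    return (LAM2 * math.exp(-LAM1 * t) - LAM1 * math.exp(-LAM2 * t)) / ALPHA


def hazard(t):
    """Hazard function f(t) / [1 - F(t)]."""
    return density(t) / survivor(t)


def trapezoid(f, a, b, steps):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


total_mass = trapezoid(density, 0.0, 40.0, 40000)
print(f"integral of density over [0, 40]: {total_mass:.6f}")
for t in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"t = {t:5.1f}   hazard = {hazard(t):.6f}")
```

Because the hazard approaches LAM1 from below, a straight-line fit to the log survivor function far in the tail recovers the slower rate, which is the first step of the deconvolution idea described above.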
Ashby (1982a) has established the following result, which weakens considerably the hypotheses leading to an exponential tail, but also weakens the conclusion to the point where it is not clear how to make use of the result.
Theorem A.4. Suppose $X_1, \ldots, X_n$ are independent random variables whose hazard functions are all non-decreasing, that at least one of them is exponential, and that $\lambda$ is the smallest of the exponential parameters. Then there exists a number $\eta \le \lambda$ such that if $\lambda(t)$ is the hazard function of $\sum_{k=1}^{n} X_k$, then $\lim_{t \to \infty} \lambda(t) = \eta$.
The result says the tail of the sum is exponential if at least one component is exponential and all others have non-decreasing hazard functions, but the trouble is that the exponential parameter of the tail need not be that of any of the components. We need to know conditions under which $\eta = \lambda$, in which case we could deconvolve the slowest exponential component.
A.2 PARALLEL MODELS: MAXIMA OF INDEPENDENT, IDENTICAL RANDOM VARIABLES
If the information from each neural fiber is extracted in parallel, not in series, then the total time required to extract that information is the maximum, not the sum, of these random variables. Suppose they are $X_k$ and have distribution functions $F_k$. The only way in which the process will be completed by time t is for each of the separate ones to be completed by then. Assuming they are independent, we see that
$$\Pr\left[\max_k X_k \le t\right] = \prod_{k=1}^{n} F_k(t).$$
In the case where they are all identically distributed, this distribution reduces to $F^n$. Observe that there is some formal similarity to the additive case. There we worked with products of the mgf's, whereas here products of the distribution functions play a comparable role.
A.2.1 Form of Asymptotic Distributions
As we did for the sum of iid random variables, we may ask the question whether or not the distribution of the maximum converges in some sense to an asymptotic distribution that is independent of the initial distribution. And as was the case there, we may gain some initial insight into the problem by supposing that such a distribution exists and then searching for it by assuming that the components all have this distribution. So let G be that distribution, and let us suppose that $X_k$, k = 1, 2, . . . , are independent and are distributed according to G. It follows that under some appropriate
change of variable, the maximum of n of these also has the distribution G. That is, for some sequences $\{a_n\}$ and $\{b_n\}$, for all integers n, and for all real x,
$$G^n(x) = G(a_n x + b_n). \tag{A.8}$$
The question is to find the solutions to this family of equations. The result is only slightly more complex than that for sums.
Theorem A.5. Suppose that G is a distribution function that is differentiable and strictly increasing for 0 < G < 1 and that $\{a_n\}$ and $\{b_n\}$ are sequences of numbers for which Eq. A.8 holds. Then there are three classes of solutions: (i) $G(0) \ne 0, 1$, in which case $a_n = 1$, $b_n = -\ln n$, and (ii) G(0) = 0, in which case one choice for the sequences is $a_n = (1/n)^{1/\alpha}$, $\alpha > 0$, $b_n = 0$ and
(iii) G(0) = 1, in which case one choice for the sequences is $a_n = -n^{1/\alpha}$, $\alpha > 0$, $b_n = 0$, and
Proof. (i) $G(0) \ne 0, 1$. Since G lies in [0, 1], $\ln G(x) < 0$, and $-\infty$
Setting x = 0 and noting $\psi(0) = 1$ yields $\psi(b_n) = n$, and so
Since G is differentiable, so is $\psi$, whence
Set x = 0, solve for $a_n$, and substitute,
Because $\psi$ is decreasing, $\psi'(0) < 0$, and so we know that $\psi$ and $\psi'/\psi'(0)$ satisfy the same equation, whose solution we show below is unique. If so, then
whose solution is well known to be $\psi(x) = \exp(-ax + b)$.
To show that the solution is unique, observe that if $a_n \ne 1$, then it is possible to select x such that $x = a_n x + b_n$, in which case it follows that $\psi(b_n) = 1$. But since we know $\psi(b_n) = n$, this is impossible except possibly for n = 1. Excluding that case, we have
This is Abel's functional equation and it is well known that its family of possible solutions is generated from a single solution $\psi_0$ and a function p that is periodic with period ln n for any n > 1 and is of the form $\psi_0 + p(\psi_0)$. This, of course, means that p is periodic with the dense set of periods ln(m/n), where m and n are integers > 1. Since $\psi$ is strictly decreasing and is onto, it is continuous and so p must also be continuous. So p is periodic with every period; that is, it is a constant. Thus, $\psi = k\psi_0$, but since $\psi(0) = \psi_0(0) = 1$, the solution is unique. Substituting $\psi = \exp(-ax + b)$ into Eq. A.12 immediately yields $b_n = -\ln n$ and $a_n = 1$. Solving for G yields the result. (ii) G(0) = 0, whence G(s) = 0 for s < 0. Setting s = 0 in the basic equation A.8 yields $G(b_n) = 0$. Select $b_n = 0$. Assume the $a_n$ are not constant; then setting s = 1, we see $G(a_n) = G(1)^n$, so $G(1) \ne 0, 1$. Let $\psi(s) = \ln G(s)/\ln G(1)$; as in (i) it is decreasing and
By the same uniqueness argument as used in part (i), the solution is unique and it is easily verified to be $\psi(s) = s^{-\alpha}$, $\alpha > 0$, $s > 0$, and $G(s) = \exp(-b s^{-\alpha})$. From this and the basic equation, we see $a_n = (1/n)^{1/\alpha}$. (iii) Similar to (ii). This result is due to Fisher and Tippett (1928). For more recent presentations of it and many other results, see Gumbel (1958) or, for a much more readable presentation, Galambos (1978) or Leadbetter, Lindgren, and Rootzen (1983). These three asymptotic distributions are sometimes called the Extreme Value distributions of Type I, II, and III, respectively; in addition, the first (Eq. A.9) is often called the double exponential and the third (Eq. A.11) the Weibull. The former has recently played an important role in the work of Yellott (1977) as the distribution that in Thurstone's (1927a) discriminable dispersion model is equivalent to my (1959) choice model.
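The closure property behind Theorem A.5 can be verified numerically for the first two classes. The sketch below assumes the simplest members of each class, $G(x) = \exp(-e^{-x})$ for Type I and $G(x) = \exp(-x^{-\alpha})$ for Type II, together with the normalizing sequences named in the theorem; the grid values and function names are illustrative assumptions. It checks that $G^n(x) = G(a_n x + b_n)$ holds to rounding error.

```python
import math


def type1(x):
    """Assumed Type I (double exponential) form: exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


def type2(x, alpha):
    """Assumed Type II form: exp(-x**(-alpha)) for x > 0, and 0 otherwise."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0


def max_gap_type1(n, xs):
    """Largest |G(x)^n - G(a_n x + b_n)| with a_n = 1, b_n = -ln n."""
    b_n = -math.log(n)
    return max(abs(type1(x) ** n - type1(x + b_n)) for x in xs)


def max_gap_type2(n, alpha, xs):
    """Largest |G(x)^n - G(a_n x)| with a_n = (1/n)^(1/alpha), b_n = 0."""
    a_n = (1.0 / n) ** (1.0 / alpha)
    return max(abs(type2(x, alpha) ** n - type2(a_n * x, alpha)) for x in xs)


grid_real = [-3.0 + 0.1 * i for i in range(91)]       # -3.0 ... 6.0
grid_positive = [0.1 + 0.1 * i for i in range(100)]   # 0.1 ... 10.0

gap1 = max(max_gap_type1(n, grid_real) for n in (2, 5, 50))
gap2 = max(max_gap_type2(n, 2.0, grid_positive) for n in (2, 5, 50))
print(f"Type I  largest discrepancy: {gap1:.2e}")
print(f"Type II largest discrepancy: {gap2:.2e}")
```

Both discrepancies should be at the level of floating-point rounding, since for these forms the identity is exact rather than asymptotic.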
A.2.2 Conditions on Distributions Leading to Each Asymptotic Distribution
Given that we know the three possible asymptotic distributions for the maximum of iid random variables, the next natural question to ask is what properties on the underlying distribution lead to each. This is rather more difficult to formulate and answer than is the first question. I summarize the answer in the next theorem, but I do not provide a proof; one can be found in Galambos (1978).
Theorem A.6. Suppose F is a distribution function and $\omega(F) = \sup\{x \mid F(x) < 1\}$.
(i) If
and
where
then the asymptotic maximum is distributed as Eq. A.9 with $\alpha = 1$, $\beta = 0$, and the normalizing constants may be chosen to be $a_n = R(b_n)$ and $b_n = \inf\{x \mid 1 - F(x) \le 1/n\}$. (ii) If $\omega(F) = \infty$ and
then the asymptotic maximum is distributed as Eq. A.10 with b = 1, $\alpha = \tau$, and the normalizing constants may be chosen to be $a_n = \inf\{x \mid 1 - F(x) \le 1/n\}$ and $b_n = 0$. (iii) If $\omega(F) < \infty$ and if
satisfies Eq. A.13, then the asymptotic maximum is distributed as Eq. A.11 with b = 1, $\alpha = \tau$, and the normalizing constants may be chosen to be $a_n = \omega(F) - \inf\{x \mid 1 - F(x) \le 1/n\}$ and $b_n = \omega(F)$.
(iv) These are the only cases for which the asymptotic distribution exists.
To get some idea of what is involved, consider two important special cases with $\omega(F) = \infty$. In the first, for large x the distribution is approximately exponential; that is,
It is easy to see that $R(x) = 1/\theta$, and so
Thus, if a distribution function is asymptotically exponential, then the asymptotic maximum is the double exponential. In the second case, suppose the distribution function is asymptotically a power function (which is a form often assumed for so-called high tailed distributions); that is, then
Thus, if a distribution function is asymptotically a power function, its asymptotic maximum is Eq. A.10. Somewhat more generally, Galambos (1978, pp. 93-94) showed, first, that if a density function exists, $\omega(F) = \infty$, and
then case (ii) holds. Basically, this condition says that for large x the hazard function is decreasing as 1/x. And second, if a density function exists, has a derivative everywhere, $f(x) \ne 0$, and
then case (i) holds. This says that the reciprocal of the hazard function is flat as x approaches $\omega(F)$; if $\omega(F) = \infty$, this means that the hazard function approaches a constant; that is, the tail of the distribution is exponential. A variety of particular cases have been studied. Galambos (1978, pp. 64-69, 99-100) included proofs that the Gaussian, log Gaussian, and gamma fall in case (i), the Cauchy [$F(x) = \tfrac{1}{2} + (1/\pi) \arctan x$] and Pareto [$F(x) = 1 - x^{-\rho}$] in (ii), and the uniform in case (iii). From the point of view of response-time models in which the individual random variables are assumed to have to do with the time to accumulate a fixed sample of information on each nerve fiber, if the spike trains are approximately Poisson, these times are asymptotically exponential and so the maximum time over a large number of fibers is approximately the double exponential. This distribution plays a minor role (Section 9.3.4) in choice reaction times. For particular assumptions about the underlying distributions, it is useful to know how the parameters of the asymptotic distribution relate to those of the underlying ones. The Gaussian and gamma cases have been worked out fully, and the relevant sequences are plotted in Wandell and Luce (1978).
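The two special cases discussed above (exponential and power-function tails) can be checked directly, because for exactly exponential and exactly Pareto variables the distribution of the normalized maximum is available in closed form. The sketch below is an illustration under stated assumptions: the rate THETA, the exponent RHO, the grids, and the function names are hypothetical, and the limits are taken in their simplest forms, $\exp(-e^{-x})$ and $\exp(-x^{-\rho})$. The normalizing constants follow Theorem A.6.

```python
import math

THETA = 2.0   # assumed exponential rate
RHO = 2.0     # assumed Pareto exponent


def exp_cdf(y, theta):
    """Exponential distribution function 1 - exp(-theta * y) for y > 0."""
    return 1.0 - math.exp(-theta * y) if y > 0 else 0.0


def pareto_cdf(y, rho):
    """Pareto distribution function 1 - y**(-rho) for y > 1."""
    return 1.0 - y ** (-rho) if y > 1 else 0.0


def max_error_exponential(n, theta, xs):
    """Sup over xs of |F(a_n x + b_n)^n - exp(-exp(-x))|, case (i) constants."""
    a_n, b_n = 1.0 / theta, math.log(n) / theta
    return max(
        abs(exp_cdf(a_n * x + b_n, theta) ** n - math.exp(-math.exp(-x)))
        for x in xs
    )


def max_error_pareto(n, rho, xs):
    """Sup over xs of |F(a_n x)^n - exp(-x**(-rho))|, case (ii) constants."""
    a_n = n ** (1.0 / rho)
    return max(
        abs(pareto_cdf(a_n * x, rho) ** n - math.exp(-x ** (-rho)))
        for x in xs
    )


grid_real = [-3.0 + 0.05 * i for i in range(221)]      # -3 ... 8
grid_positive = [0.2 + 0.05 * i for i in range(197)]   # 0.2 ... 10

errors_exp = {n: max_error_exponential(n, THETA, grid_real) for n in (10, 100, 1000)}
errors_par = {n: max_error_pareto(n, RHO, grid_positive) for n in (10, 100, 1000)}
for n in (10, 100, 1000):
    print(f"n = {n:5d}   exponential: {errors_exp[n]:.2e}   Pareto: {errors_par[n]:.2e}")
```

In both cases the discrepancy shrinks roughly in proportion to 1/n, which gives a feel for how many parallel fibers would be needed before the asymptotic form is a good approximation.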
Appendix B: Properties of Continuous Distributions for Simple Reaction Times
A general source of information about distributions is Johnson and Kotz (1970).
B.1 THEORETICALLY BASED DISTRIBUTIONS
1. Gaussian
mgf = $\exp(\mu s + \sigma^2 s^2/2)$
mean = $\mu$
variance = $\sigma^2$
hazard function: increasing (Section 1.2.3)
Comment: limit distribution of sum of independent, nearly identically distributed random variables
Sections: 1.2.3, 1.4.3, 2.5.3, 3.2.1, 4.2.4, 4.3.1, 4.3.2, 4.3.4, 6.4.1, 6.5.1, 6.5.3, 7.1.3, 8.1.3, 8.2.4, 8.2.5, 8.4.1, 8.5.6, 9.3.1, 9.3.2, 9.3.4, 12.4.4, A.1.1-A.1.4, A.2.1
2. Gamma
n an integer, or replace (n - 1)! by $\Gamma(n)$
mgf = $1/(1 - s/\lambda)^n$
mean = $n/\lambda$
variance = $n/\lambda^2$
hazard function: increasing (Theorem 1.4(ii), Section A.1.5)
Comment: distribution of sum of n independent, identically distributed, exponential random variables
Sections: 1.2.4, 1.4.3, 2.5.3, 3.2.3, 3.2.5, 4.1.3, 4.2.2, 4.3.2, 8.2.5, 10.4.2, 11.3.3, A.1.5
3. Double Exponential (Extreme Value Type I)
mean = $(\beta - \psi(1))/\alpha$, where $\psi(1)$ = 0.57722 is Euler's constant
variance =
hazard function =
increasing
Comment: limit distribution of the maximum of independent, identically distributed random variables that are asymptotically exponential
Sections: 4.3.4, A.2.1, A.2.2
4. Extreme Value Type II
mgf: No explicit formula known
mean: No explicit formula known
variance: No explicit formula known
hazard function =
Comment: limit distribution of the maximum of independent, identically distributed random variables that are asymptotically power functions
Section: A.2.1
5. Weibull (Extreme Value Type III)
mgf: No explicit formula known
mean =
variance =
hazard function =
increasing for
decreasing for
Comment: limit distribution of the maximum of independent, identically distributed random variables that are bounded from above
Sections: 4.3.4-4.3.5, A.2.1
6. Wald (Inverse Gaussian)
Comment: arises from random walk model with one absorbing barrier
Sections: 1.2.4, 4.2.4, 4.3, 8.2.5
7. Laplace
mean = 0
variance
hazard function
Section: 4.2.3
B.2 AD HOC DISTRIBUTIONS
8. Log Gaussian
mgf: No explicit formula known
mean =
variance =
hazard function: increasing and then decreasing to 0
Comment: log T is Gaussian distributed
Section: A.2.2; Woodworth & Schlosberg (1954) suggested it as an empirical approximation.
9. Double Monomial
mgf: No explicit formula known
mean =
variance =
Comment: hazard function peaked
Section: 4.1.2
10. Logistic
mgf
mean
variance
hazard function
11. Ex-Gaussian
mgf
mean
variance
hazard function: increasing (Theorem 1.4(ii)) where
Appendix C: Experimental Data
For a description of the experiments, see the sections indicated.
C.1 YELLOTT (1971), EXPERIMENT 3 DATA
Section 7.6.1
Sample size: 1000 observations per condition
The nine conditions were as follows:
Condition   Deadline (msec)   P(S1)
1           300               .5
2           300               .7
3           250               .5
4           250               .3
5           400               .5
6           400               .7
7           300               .1
8           250               .1
9           400               .1
Subject 302"
Subject 301
Prop aA
aB
bA
bB
aA
bB
aA
aB
bA
bB
.86 .84 .92 .52 .77 .71 .47 .92 .70 .85 .93 .79 .18 .98 0 1.00 .08 1 .00
286 220 247 299 258 249 350 — 398
232 229 186 153 195 229 110 88 166
232 181 202 210 216 210 292 — —
279 311 244 203 245 284 132 91 174
.92 .97 .87 .57 .94 .98 .69
.91 .86 .86 .87 .96 .91 .98
265 253 250 254 275 271 287
243 265 219 179 259 283 186
249 229 218 218 222 252 277
269 275 259 225 281 296 232
210 158 157 168 184 185 205 201 225
238 229 201 171 217 244 188 158 179
Subject 304
Subject 303
1 2 3 4 5 6 7 8 9 1
MRT (msec)
bB
Cond. aA
1 2 3 4 5 6 7 8 9
Prop
MRT (msec)
.94 .98 .86 .63 .92 .99 .60 .29 .54
.90 .85 .79 .93 .88 .91 .99 .99 .99
259 247 239 245 254 258 259 247 254
245 263 232 195 236 272 185 163 178
258 243 229 233 250 256 268 200 231
260 268 230 218 241 271 210 174 196
.83 .89 .70 .39 .75 .93 .51 .22 .34
.89 .59 .63 .91 .82 .68 .98 .99 .99
248 200 205 214 222 218 249 253 291
199 188 166 147 181 209 163 158 172
Subject 302 completed only the first seven conditions.
C.2 OLLMAN (1970), EXPERIMENT 2 DATA Section 7.6.1 These are session-by-session data. Number of signals Observer
3
b
aA
bB
aA
6A
aB
bB
421 407 418
292 300 296
.976 .936 .950 .883
.853
374 365 319
430 403 313 265 266 207 214 206 251 336
354 367
423 419 394
423
406 418
421 417 415 416 407
419
383 415 418
422 408 419 415 423 412
422 420 415 415 425
408 422
MRT (msec)
a
420
4
Response proportion
292
299 293 300 296
292 342
.908 .494
.768 .804 .898 .950
299
.896
279 300 289 264
.934 .862
295 333
292 301 292 299 292 302 298 344 296 296 286 296 329
.832
.935
.901 .974
.991 .975 .940 .887 .868 .968 .912 .990 .508 .935 .911 .949 .995
.883 .895 .743 .696 .502
.487 .382 .688 .915 .428 .480
292 273 226
236 223 255
310 228 216 296
.827 .377 .625 .563 .913
220 234 228 296
.976
349
.900 .682 .645 .664 .921 .477 .980
.503 .794 .685 .696 .988
283
255 240 233 286 226 280 155 233 224 232 282
223 208 316 207
232 224 326 358 255 208 200 201 279 184 275 145
194 176
184 265
336 317 296
230
241 216 242 325 235 210 305
202 204 229 309
454 274 200 225 214 322 220 276 165 124 223 219 313
344 321
227 267 264 283 363 265 257
361
249
298 276 359 405
344 315
270 263
314 301 317 159 268 261
276 322
C.3 LAMING (1968), EXPERIMENT 2 DATA Section 7.6.2 Sample size: 200 per condition-observer averaged over 24 observers for a total of 4800 per condition
Presentatiori probability
.250 .375 .500 .625 .750
Respirmse propr>rtion
MRT
(msec )
aA
bA
aA
bA
aB
bB
.934 .959 .971 .981 .989
.009 .016 .025 .046 .069
430
381 380
286 327
373
417 414 402
364 349 329
351 375 371
415 422 433
385
391
C.4 LINK (1975) DATA Section 7.6.2 Sample size: 700 observations per condition—observer averaged over 4 observers for a total of 2800 per condition Response proportion
Presentation probability
aA
bA
.125 .250 .375 .500 .625 .750 .875
.559 .727 .779 .967 .985 .988 1.000
.008 .044 .016 .062 .199 .250 .450
MRT aA
(msee)
bA
aB
bB
576
298
289
333
523 514 442
377 331 351
338 312 339
395 404 479
405 383
304 314
276 333
520 544
.334
341
—
553
C.5 CARTERETTE, FRIEDMAN, AND COSMIDES (1965) DATA Section 7.7.2 Sample size: 1800 observations per observer—condition Response proportion Obs Condition aA
1
2
3
bA
MRT aA
bA
(msec) aB
bB
.2 low .Slow .Slow
.149 .080 1652 1543 1382 1357 .715 .622 1527 1602 1820 1922 .894 .850 1329 1455 1955 1969
.2 high .5 high .8 high
.302 .792 .947
.2 low .Slow .Slow
.246 .085 3295 3025 1422 1491 .749 .598 1608 1789 2035 2183 .897 .807 1723 2064 2760 3157
.2 high .5 high .8 high
.467 .047 4025 3577 1372 1694 .815 .338 1895 1956 1915 2192 .959 .619 1353 1364 2306 3995
.2 low .Slow .Slow
.333 .231 .660 .648 .900 .899
871 838 809
827 853 722
627 712 732
635 733 786
.2 high .5 high .8 high
.271 .264 .754 .660 .912 .894
798 865 730
894 873 764
587 759 765
600 782 787
.071 1744 1770 1471 1494 .427 1421 1626 1712 1770 .653 1478 1867 1738 1896
C.6 GREEN AND LUCE (1973) DATA Section 7.7.2 Sample size: 1500 observations per observer—condition s-deadline with (10, 10) payoff Response proportion
Obs Deadlines aA
bA
MRT (msec) aA
bA
aB
bB
2000 1500 1000 800 600 500 400 300
.994 .993 .972 .977 .955 .880 .870 .705 .997 .998 .993 .994 .993 .935 .988 .896
.004 902 1029 674 1165 .013 770 1266 934 1009 .028 587 538 631 746 .045 558 595 711 726 .052 478 443 527 611 .236 382 371 424 561 .292 342 286 344 577 .565 252 188 202 348 .007 913 1020 1145 1385 .002 749 839 865 998 .037 599 654 690 879 .043 539 571 840 859 .109 428 480 575 730 .292 347 341 403 570 .276 338 310 565 682 .630 266 231 211 576
2000 1500 1000 800 600 500 400 300
.999 .997 .992 .985 .941 .929 .929 .820
.004 .007 .030 .062 .174 .354 .386 .693
1
2000 1500 1000 800 600 500 400 300
2
3
965 1335 2333 1316 756 1315 975 1083 580 598 756 846 517 532 734 772 371 357 608 579 322 306 435 522 294 258 489 548 210 196 249 361
500msec s -deadline and varied payoffs Response proportion Obs Payoff
MRT (msec)
aA
bA
aA
bA
aB
bB
1 (1,20) (5, 15) (10,10) (15,5) (20, 1)
.618 .657 .859 .953 .978
.052 .071 .130 .290 .392
463 443 429 377 391
395 414 386 368 386
401 471 456 472 691
468 506 578 637 799
2 (1,20) (5,15) (10,10) (15,5) (20,1)
.596 .651 .913 .973 .994
.047 431 .065 462 .130 413 .334 363 .436 344
382 458 386 362 360
372 482 385 528 448 624 441 647 562 1050
3 (1,20) (5,15) (10, 10) (15,5) (20,1)
.652 .756 .872 .932 .961
.097 .159 .206 .300 .405
359 412 354 334 321
596 567 672 578 519
368 373 362 348 338
536 575 564 616 753
sn -deadline condition with (10, 10) payoff Response
MRT (msec)
proportion Obs Deadline 4
5
bA
aA
.096 .103 .130 .237 .315 .468 .480 .507 .594
897 1020 717 791 625 665 465 423 398 379 282 272 285 265 191 181
bA
2000 1500 1000 800 600 500 400 300 250
.935 .866 .818 .748 .711 .643 .648 .503 .589
2000 1500 1000 800 600 500
.061 1044 1128 .089 769 871 .115 629 716 .208 472 489 .240 427 442 .368 349 343 299 290 .547 .458 200 .555 .461 201 .470 .490 179 174 .906 .103 833 980 .911 .078 804 973 .864 .113 665 736 .758 .124 492 539 .768 .147 468 490 .648 .258 404 410 .605 .402 297 276 .680 .654 177 173 .663 .666 125 127
400 300
250 2000 1500 1000 800 600 500 400 300 250
aA
.950 .888 .799 .767 .691 .629
140
142
aB
bB
886 722 621 441 379 288 282 187 140
933 746 634 478 422 320 296 192 141
961 1093 792 826 648 654 491 509 424 459 344 352 284 311 180 191 174 169 843 823 694 483 478 403 297 171 130
837 770
679
493 475 409 326
172 135
600msec sn-deadline and variable payoffs Response proportion
MRT (msec)
aA
bA
aA faA
aB
bB
4 (1,20) (5,15) (10, 10) (15,5) (20, 1)
.371 .803 .888
.083 .178 .295 .435 .628
420 405 4L7 410 397
413 428 410 403 376
347 385 409 447 454
361 405 429 452 452
5 (1,20) (5, 15) (10,10) (15, 5) (20, 1)
.639 .657 .738 .749 .823
.165 .190 .222 .303 .363
455 434 436 426 407
469 476 455 452 418
449 425 447 454 451
464 452 464 474 471
6 (1,20) (1, 15) (10, 10) (15,5) (20, 1)
.582 .080 .634 .094 .755 .193 .834 .380 .879 .463
485 468 469 434 436
505 484 490 447 449
460 430
457 430 475 462 487
Obs
Payoff
.549 .719
483 460 480
C.7 GREEN, SMITH, AND VON GIERKE (1983) DATA
Section 8.5.3
                      Response proportion   MRT (msec)
Obs   N      P(a)     aA      bA            aA    bA    aB    bB
DS    5997   .25      .732    .006          259   232   188   186
      5800   .50      .929    .039          238   237   209   205
      5697   .75      .992    .365          202   199   233   249
CE    5999   .25      .819    .012          229   210   178   183
      5299   .50      .943    .059          206   190   193   200
      5700   .75      .984    .190          189   179   215   229
SVG   6000   .25      .743    .024          243   223   199   198
      5800   .50      .909    .091          211   211   210   214
      5699   .75      .982    .278          192   202   217   238
C.8 LAMING (1968), EXPERIMENT 6 DATA In each case the top entry is the conditional response probability and the bottom one is the mean reaction time in msec. Section 10.4.2 Response Stimulus
A
N
C
1035 .825 742 2985 .013 672 4121 —
D
4976
E
6083
B
19200
A
— — — —
B
C
.174 823 .847 837 .057 835 .002 913 — —
.001 1340 .137 917 .862 794 .096 817 .002 819
D
E
— — — .003 — 1221 — .081 — 843 — .891 .011 734 835 .096 .902 824 644
893 2951 4453 5360 5543
References
Abeles, M. & M. H. Goldstein. Response of single units in the primary auditory cortex of the cat to tones and to tone pairs. Brain Research, 1972, 42, 337-352.
Aczel, J. Lectures on Functional Equations and their Applications. New York: Academic Press, 1966.
Ahumada, A., Jr., R. Marken, & A. Sandusky. Time and frequency analysis of auditory signal detection. Journal of the Acoustical Society of America, 1975, 57, 385-390.
Aiken, L. R., Jr. & M. Lichenstein. Interstimulus and interresponse time variables in reaction times to regularly recurring visual stimuli. Perceptual and Motor Skills, 1964, 19, 339-342.
Alegria, J. Sequential effects of foreperiod duration: Some strategical factors in tasks involving time uncertainty. In P. M. A. Rabbitt and S. Dornic (eds.), Attention and Performance V. London: Academic Press, 1975, pp. 1-10.
Alegria, J. Sequential effects of catch-trials on choice reaction time. Acta Psychologica, 1978, 42, 1-6.
Allport, D. A. Phenomenal simultaneity and the perceptual moment hypothesis. British Journal of Psychology, 1968, 59, 395-406.
Anderson, J. A. Two models for memory organization using interacting traces. Mathematical Biosciences, 1970, 8, 137-160.
Anderson, J. A. A simple neural network generating an interactive memory. Mathematical Biosciences, 1972, 14, 197-220.
Anderson, J. A. A theory for the recognition of items from short memorized lists. Psychological Review, 1973, 80, 417-438.
Anderson, J. A., J. W. Silverstein, S. A. Ritz, & R. S. Jones. Distinctive features, categorical perception, and probability learning: Some applications of a neural model. Psychological Review, 1977, 84, 413-451.
Anderson, J. R. Language, Memory and Thought. Hillsdale, N.J.: Erlbaum, 1976.
Anderson, N. H. Algebraic models in perception. In E. C. Carterette & M. P. Friedman (eds.), Handbook of Perception, II. New York: Academic Press, 1974, pp. 215-298.
Anderson, N. H. Foundations of Information Integration Theory. New York: Academic Press, 1981.
Anderson, N. H. Methods of Information Integration Theory. New York: Academic Press, 1982.
Anderson, T. W. A modification of the sequential probability ratio test to reduce the sample size. Annals of Mathematical Statistics, 1960, 31, 165-197.
Angel, A. Input-output relations in simple reaction-time experiments. Quarterly Journal of Experimental Psychology, 1973, 25, 193-200.
Annett, J. Payoff and the refractory period. Acta Psychologica, 1969, 30, 65-71.
Ashby, F. G. Testing the assumptions of exponential additive reaction-time models. Memory and Cognition, 1982a, 10, 125-134.
Ashby, F. G. Deriving exact predictions from the cascade model. Psychological Review, 1982b, 89, 599-607.
Ashby, F. G. A biased random walk model for two choice reaction times. Journal of Mathematical Psychology, 1983, 27, 277-297.
Ashby, F. G. & J. T. Townsend. Decomposing the reaction time distribution: Pure insertion and selective influence revisited. Journal of Mathematical Psychology, 1980, 21, 93-123.
Atkinson, R. C., J. E. Holmgren, & J. F. Juola. Processing time as influenced by the number of elements in a visual display. Perception & Psychophysics, 1969, 6, 321-326.
Audley, R. J. A stochastic model for individual choice behavior. Psychological Review, 1960, 67, 1-15.
Audley, R. J. & A. Mercer. The relation between decision time and the relative response frequency in a blue-green discrimination. British Journal of Mathematical and Statistical Psychology, 1968, 21, 183-192.
Audley, R. J. & A. R. Pike. Some alternative stochastic models of choice. The British Journal of Mathematical and Statistical Psychology, 1965, 18, 207-225.
Baddeley, A. D. & J. R. Ecob. Reaction time and short-term memory: Implications of repetition effects for the high-speed exhaustive scan hypothesis. Quarterly Journal of Experimental Psychology, 1973, 25, 229-240.
Bamber, D. Reaction times and error rates for "same"-"different" judgments of multidimensional stimuli. Perception & Psychophysics, 1969a, 6, 169-174.
Bamber, D. "Same"-"different" judgements of multi-dimensional stimuli: Reaction times and error rates. Unpublished doctoral dissertation. Stanford University, 1969b.
Bamber, D. Reaction times and error rates for judging nominal identity of letter strings. Perception & Psychophysics, 1972, 12, 321-326.
Barlow, R. E., A. W. Marshall, & F. Proschan. Properties of probability distributions with monotone hazard rate. Annals of Mathematical Statistics, 1966, 37, 1574-1592.
Barlow, R. E. & F. Proschan. Mathematical Theory of Reliability. New York: Wiley, 1965.
Barlow, R. E. & F. Proschan. Statistical Theory of Reliability and Life Testing: Probability Models. New York: Holt, Rinehart, & Winston, 1975.
Bartlett, M. S. An Introduction to Stochastic Processes. Cambridge, England: Cambridge University Press, 1962.
Bartlett, N. R. & S. Macleod. Effect of flash and field luminance upon human reaction time. Journal of the Optical Society of America, 1954, 44, 306-311.
Bartlett, N. R. A comparison of manual reaction times as measured by three sensitive indices. The Psychological Record, 1963, 13, 51-56.
Beller, H. K. Parallel and serial stages in matching. Journal of Experimental Psychology, 1970, 84, 213-219.
Berliner, J. E. & N. E. Durlach. Intensity perception. IV. Resolution in roving-level discrimination. Journal of the Acoustical Society of America, 1973, 53, 1270-1287.
Bertelson, P. Sequential redundancy and speed in a serial two-choice responding task. Quarterly Journal of Experimental Psychology, 1961, 13, 90-102.
Bertelson, P. S-R relationships and reaction times to a new versus repeated signals in a serial task. Journal of Experimental Psychology, 1963, 65, 478-484.
Bertelson, P. Serial choice reaction-time as a function of response versus signal-and-response repetition. Nature, 1965, 206, 217-218.
Bertelson, P. Central intermittency twenty years later. Quarterly Journal of Experimental Psychology, 1966, 18, 153-163.
Bertelson, P. The time course of preparation. Quarterly Journal of Experimental Psychology, 1967, 19, 272-279.
Bertelson, P. & A. Renkin. Reaction times to new versus repeated signals in a serial task as a function of response-signal time interval. Acta Psychologica, 1966, 25, 132-136.
Beyer, W. A. Handbook of tables for probability and statistics. Cleveland, Ohio: The Chemical Rubber Co., 1966.
Birren, J. E. & J. Botwinick. Speed of response as a function of perceptual difficulty and age. Journal of Gerontology, 1955, 10, 433-436.
Bjork, E. L. & W. K. Estes. Detection and placement of redundant signal elements in tachistoscopic displays of letters. Perception & Psychophysics, 1971, 9, 439-442.
Blackman, A. R. Test of the additive-factor method of choice reaction time analysis. Perceptual and Motor Skills, 1975, 41, 607-613.
Blough, D. S. Reaction times of pigeons on a wavelength discrimination task. Journal of the Experimental Analysis of Behavior, 1978, 30, 163-167.
Bloxom, B. Estimating an unobserved component of a serial response-time model. Psychometrika, 1979, 44, 473-484.
Bloxom, B. Some problems in estimating response time distributions. In H. Wainer & S. Messick (eds.), Principals of Modern Psychological Measurement: A Festschrift in Honor of Frederick M. Lord. Hillsdale, N.J.: Erlbaum, 1983, pp. 303-328.
Bloxom, B. Estimating response time hazard functions: An exposition and extension. Journal of Mathematical Psychology, 1984, 28, 401-420.
Bloxom, B. A constrained spline estimator of a hazard function. Psychometrika, 1985, 50, 301-321.
Blumenthal, A. L. The Process of Cognition. Englewood Cliffs, N.J.: Prentice-Hall, 1977.
Borger, R. The refractory period and serial choice-reactions. Quarterly Journal of Experimental Psychology, 1963, 15, 1-12.
Botwinick, J., J. F. Brinley, & J. S. Robbin. The interaction effects of perceptual difficulty and stimulus exposure time on age differences in speed and accuracy of response. Gerontologia, 1958, 2, 1-10.
Botwinick, J. & L. W. Thompson. Premotor and motor components of reaction time. Journal of Experimental Psychology, 1966, 71, 9-15.
Bracewell, R. The Fourier Transform and its Applications. New York: McGraw-Hill, 1965.
Brebner, J. M. T. & A. T. Welford. Introduction: An historical background sketch. In A. T. Welford (ed.), Reaction Times. New York: Academic Press, 1980, pp. 1-23.
Breitmeyer, B. G. Simple reaction time as a measure of the temporal response properties of transient and sustained channels. Vision Research, 1975, 15, 1411-1412.
Briggs, G. E. The additivity principle in choice reaction time: A functionalist approach to mental processes. In R. F. Thompson & J. F. Voss (eds.), Topics in Learning and Performance. New York: Academic Press, 1972.
Briggs, G. E. On the predictor variable for choice reaction time. Memory & Cognition, 1974, 2, 575-580.
Briggs, G. E. & J. Blaha. Memory retrieval and central comparison time in information processing. Journal of Experimental Psychology, 1969, 79, 395-402.
Briggs, G. E. & A. M. Johnsen. On the nature of central processing in choice reactions. Memory & Cognition, 1973, 1, 91-100.
Briggs, G. E., A. M. Johnsen, & D. Shinar. Central processing uncertainty as a determinant of choice reaction time. Memory & Cognition, 1974, 2, 417-425.
Briggs, G. E., G. L. Peters, & R. P. Fisher. On the locus of the divided-attention effects. Perception & Psychophysics, 1972, 11, 315-320.
Briggs, G. E. & D. Shinar. On the locus of the speed/accuracy tradeoff. Psychonomic Science, 1972, 28, 326-328.
Briggs, G. E. & J. M. Swanson. Retrieval time as a function of memory ensemble size. Quarterly Journal of Experimental Psychology, 1969, 21, 185-191.
Briggs, G. E. & J. M. Swanson. Encoding, decoding, and central functions in human information processing. Journal of Experimental Psychology, 1970, 86, 296-308.
Briggs, G. E., S. C. Thomason, & J. D. Hagman. Stimulus classification strategies in an information reduction task. Journal of Experimental Psychology: General, 1978, 107, 159-186.
Brigham, E. O. The Fast Fourier Transform. Englewood Cliffs, N.J.: Prentice-Hall, 1974.
Broadbent, D. E. Perception and Communication. London: Pergamon, 1958.
Broadbent, D. E. & M. Gregory. Vigilance considered as a statistical decision. British Journal of Experimental Psychology, 1963, 54, 309-323.
Brunk, H. D. An Introduction to Mathematical Statistics, 3rd ed. Boston: Ginn, 1975.
Buckolz, E. & R. Rogers. The influence of catch trial frequency on simple reaction time. Acta Psychologica, 1980, 44, 191-200.
Burbeck, S. L. Change and level detectors inferred from simple reaction times. University of California at Irvine, Ph.D. dissertation, 1979.
Burbeck, S. L. A physiologically motivated model for change detection in audition. Journal of Mathematical Psychology, 1985, 29, 106-121.
Burbeck, S. L. & R. D. Luce. Evidence from auditory simple reaction times for both change and level detectors. Perception & Psychophysics, 1982, 32, 117-133.
Burns, J. T. Error-induced inhibition in a serial reaction time task. Journal of Experimental Psychology, 1971, 90, 141-148.
Burrows, D. Modality effects in retrieval of information from short-term memory. Perception & Psychophysics, 1972, 11, 365-372.
Bush, R. R. & F. Mosteller. Stochastic Models for Learning. New York: Wiley, 1955.
Carterette, E. C., M. P. Friedman, & R. Cosmides. Reaction-time distributions in the detection of weak signals in noise. Journal of the Acoustical Society of America, 1965, 38, 531-542.
Cattell, J. McK. The influence of the intensity of the stimulus on the length of reaction time. Brain, 1886a, 8, 512-515.
Cattell, J. McK. The time taken by cerebral operations. Mind, 1886b, 11, 220-242. Reprinted in A. T. Poffenberger (ed.), James McKeen Cattell, Man of Science, Vol. I: Psychological Research. Lancaster, PA: The Science Press, 1947.
Chocholle, R. Variations des temps de reaction auditifs en fonction de l'intensite a diverses frequences. L'Annee Psychologique, 1940, 41, 65-124.
Christie, L. S. The measurement of discriminative behavior. Psychological Review, 1952, 59, 443-452.
Christie, L. S. & R. D. Luce. Decision structure and time relations in simple choice behavior. Bulletin of Mathematical Biophysics, 1956, 18, 89-112.
Cleland, B. G., M. W. Dubin, & W. R. Levick. Sustained and transient cells in the cat's retina and lateral geniculate nucleus. Journal of Physiology, London, 1971, 217, 473-496.
Clifton, C. & S. Birenbaum. Effects of serial position and delay of probe in a memory scan task. Journal of Experimental Psychology, 1970, 86, 69-76.
Cooper, W. E. (ed.). Cognitive Aspects of Skilled Typewriting. New York: Springer-Verlag, 1983.
Corballis, M. C. Serial order in recognition and recall. Journal of Experimental Psychology, 1967, 74, 99-105.
Corbett, A. T. Retrieval dynamics for rote and visual image mnemonics. Journal of Verbal Learning and Verbal Memory, 1977, 16, 233-236.
Corbett, A. T. & W. A. Wickelgren. Semantic memory retrieval: analysis by speed and accuracy tradeoff functions. Quarterly Journal of Experimental Psychology, 1978, 30, 1-15.
Costa, L. D., H. G. Vaughan, & L. Gilden. Comparison of electromyographic and microswitch measures of auditory reaction time. Perceptual and Motor Skills, 1965, 20, 771-772.
Cox, D. R. & W. L. Smith. Queues. New York: Wiley, 1961.
Craik, K. J. W. Theory of the human operator in control systems, I. The operator as an engineering system. British Journal of Psychology, 1947, 38, 56-61.
Craik, K. J. W. Theory of the human operator in control systems, II. Man as an element in the control system. British Journal of Psychology, 1948, 142-148.
Cramer, H. Random Variables and Probability Distributions. Cambridge, England: Cambridge University Press, 1937.
Creamer, L. R. Event uncertainty, psychological refractory period, and human data processing. Journal of Experimental Psychology, 1963, 66, 187-194.
Crossman, E. R. F. W. Entropy and choice time: The effect of frequency unbalance on choice-response. Quarterly Journal of Experimental Psychology, 1953, 5, 41-52.
Crossman, E. R. F. W. The measurement of discriminability. Quarterly Journal of Experimental Psychology, 1955, 7, 176-195.
Davies, D. R. & R. Parasuraman. The Psychology of Vigilance. London: Academic, 1982.
Davies, D. R. & G. S. Tune. Human Vigilance Performance. New York: American Elsevier, 1969.
Davis, R. The limits of the "psychological refractory period." Quarterly Journal of Experimental Psychology, 1956, 8, 24-38.
Davis, R. The human operator as a single channel information system. Quarterly Journal of Experimental Psychology, 1957, 9, 119-129.
Davis, R. The role of "attention" in the psychological refractory period. Quarterly Journal of Experimental Psychology, 1959, 11, 211-220.
de Boor, C. A Practical Guide to Splines. New York: Springer-Verlag, 1978.
de Klerk, L. F. W. & E. Eerland. The relation between prediction outcome and choice reaction speed: Comments on the study of Geller et al. Acta Psychologica, 1973, 37, 301-306.
Dixon, W. J., M. B. Brown, L. Engleman, J. W. France, M. A. Hall, R. C. Jennrich, & J. D. Toporek. BMDP Statistical Software. Berkeley, CA: University of California Press, 1981.
Donaldson, G. Confirmatory factor analysis models of information processing stages: An alternative to difference scores. Psychological Bulletin, 1983, 94, 143-151.
Donders, F. C. Over de snelheid van psychische processen. (On the speed of mental processes.) Onderzoekingen gedaan in het physiologisch Laboratorium der Utrechtsche Hoogeschool, 1868-69, Tweede reeks, II, 92-130. Translated by W. G. Koster, in W. G. Koster (ed.), Attention and Performance II, Acta Psychologica, 1969, 30, 412-431.
Dosher, B. A. The retrieval of sentences from memory: a speed-accuracy study. Cognitive Science, 1976, 8, 291-310.
Dosher, B. A. Empirical approaches to information processing: speed-accuracy tradeoff functions or reaction time: a reply. Acta Psychologica, 1979, 43, 347-359.
Drazin, D. H. Effects of foreperiod, foreperiod variability, and probability of stimulus occurrence on simple reaction time. Journal of Experimental Psychology, 1961, 62, 43-50.
Duncan, J. The locus of interference in the perception of simultaneous stimuli. Psychological Review, 1980, 87, 272-300.
Edwards, W. Optimal strategies for seeking information: Models for statistics, choice reaction times, and human information processing. Journal of Mathematical Psychology, 1965, 2, 312-329.
Efron, B. The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1982.
Egan, J. P. Signal Detection Theory and ROC Analysis. New York: Academic Press, 1975.
Egan, J. P., G. Z. Greenberg, & A. I. Schulman. Operating characteristics, signal detectability, and the method of free response. Journal of the Acoustical Society of America, 1961, 33, 933-1007.
Egeth, H. E. Parallel versus serial processes in multidimensional stimulus discrimination. Perception & Psychophysics, 1966, 1, 245-252.
Egeth, H., J. Jonides, & S. Wall. Parallel processing of multielement displays. Cognitive Psychology, 1972, 3, 674-698.
Egeth, H. & E. E. Smith. On the nature of errors in a choice reaction task. Psychonomic Science, 1967, 8, 345-346.
Eichelman, W. H. Stimulus and response repetition effects for naming letters at two response-stimulus intervals. Perception & Psychophysics, 1970, 7, 94-96.
Elithorn, A. & C. Lawrence. Central inhibition: some refractory observations. Quarterly Journal of Experimental Psychology, 1955, 7, 116-127.
Elliott, R. Simple visual and simple auditory reaction time: A comparison. Psychonomic Science, 1968, 10, 335-336.
Emerson, P. L. Simple reaction time with markovian evolution of gaussian discriminal processes. Psychometrika, 1970, 35, 99-109.
Emmerich, D. Receiver operating characteristics determined under several interaural conditions of listening. Journal of the Acoustical Society of America, 1968, 43, 298-307.
Emmerich, D. S., J. L. Gray, C. S. Watson, & D. C. Tanis. Response latency, confidence, and ROCs in auditory signal detection. Perception & Psychophysics, 1972, 11, 65-72.
Enroth-Cugell, C. & J. G. Robson. The contrast sensitivity of retinal ganglion cells of the cat. Journal of Physiology, London, 1966, 187, 517-552.
Entus, A. & D. Bindra. Common features of the "repetition" and "same-different" effects in reaction time experiments. Perception & Psychophysics, 1970, 7, 143-148.
Erdelyi, A., W. Magnus, F. Oberhettinger, & F. G. Tricomi. Tables of Integral Transforms, Vol. I. New York: McGraw-Hill, 1954.
Estes, W. K. Toward a statistical theory of learning. Psychological Review, 1950, 57, 94-107.
Everitt, B. & D. J. Hand. Finite Mixture Distributions. New York: Chapman and Hall, 1981.
Falmagne, J. C. Stochastic models for choice reaction time with applications to experimental results. Journal of Mathematical Psychology, 1965, 2, 77-124.
Falmagne, J. C. Note on a simple fixed-point property of binary mixtures. British Journal of Mathematical and Statistical Psychology, 1968, 21, 131-132.
Falmagne, J. C. Biscalability of error matrices and all-or-none reaction time theories. Journal of Mathematical Psychology, 1972, 9, 206-224.
Falmagne, J. C., S. P. Cohen, & A. Dwivedi. Two-choice reactions as an ordered memory scanning process. In P. M. A. Rabbitt & S. Dornic (eds.), Attention and Performance, V. New York: Academic Press, 1975, pp. 296-344.
Falmagne, J. C. & J. Theios. On attention and memory in reaction time experiments. Acta Psychologica, 1969, 30, 316-323.
Feller, W. An Introduction to Probability Theory and Its Applications, Vols. I and II. New York: Wiley, 1957, 2nd ed., 1966.
Fernberger, S. W., E. Glass, I. Hoffman, & M. Willig. Judgment times of different psychophysical categories. Journal of Experimental Psychology, 1934, 17, 286-293.
Festinger, E. Studies in decision: I. Decision time, relative frequency of judgment, and subjective confidence as related to physical stimulus difference. Journal of Experimental Psychology, 1943a, 32, 291-306.
Festinger, E. Studies in decision: II. An empirical test of a quantitative theory of decision. Journal of Experimental Psychology, 1943b, 32, 422-423.
Fisher, R. A. & L. H. C. Tippett. Limiting forms of the frequency distributions of the largest or smallest number of a sample. Proceedings of the Cambridge Philosophical Society, 24, 180-190.
Fitts, P. M. Cognitive aspects of information processing: III. Set for speed versus accuracy. Journal of Experimental Psychology, 1966, 71, 849-857.
Fraisse, P. La periode refractorie psychologique. Annee Psychologique, 1957, 57, 315-328.
Galambos, J. The asymptotic theory of extreme order statistics. New York: Wiley, 1978.
Gardner, G. T. Evidence for independent parallel channels in tachistoscopic perception. Cognitive Psychology, 1973, 4, 130-155.
Garrett, H. E. A study of the relation of accuracy to speed. Archives of Psychology, 1922, 56, 1-105.
Geller, E. S. Prediction outcome and choice reaction time: Inhibition versus facilitation effects. Acta Psychologica, 1975, 39, 69-82.
Geller, E. S. & G. F. Pitz. Effects of prediction, probability and run length on choice reaction speed. Journal of Experimental Psychology, 1970, 84, 361-367.
Geller, E. S., C. P. Whitman, R. F. Wrenn, & W. G. Shipley. Expectancy and discrete reaction time in a probability reversal design. Journal of Experimental Psychology, 1971, 90, 113-119.
Gerstein, G. L., R. A. Butler, & S. D. Erulkar. Excitation and inhibition in cochlear nucleus. I. Tone-burst stimulation. Journal of Neurophysiology, 1968, 31, 526-536.
Gescheider, G. A., J. H. Wright, B. J. Weber, B. M. Kirchner, & E. A. Milligan. Reaction time as a function of the intensity and probability of occurrence of vibrotactile signals. Perception & Psychophysics, 1969, 5, 18-20.
Ghosh, B. K. Moments of the distribution of sample size in a SPRT. Journal of the American Statistical Association, 1969, 64, 1560-1575.
Gihman, I. I. & A. V. Skorohod. The Theory of Stochastic Processes, I, II, III. New York: Springer-Verlag, 1974. (Translated from the Russian edition of 1971.)
Glaser, R. E. Bathtub and related failure rate characterizations. Journal of the American Statistical Association, 1980, 75, 667-672.
Gnedenko, B. V., Yu. K. Belyayev, & A. D. Soloyev. Mathematical Methods of Reliability Theory. (Translated by Scripta Technica; translation edited by R. E. Barlow.) New York: Academic Press, 1969.
Gnedenko, B. V. & A. N. Kolmogorov. Limit Distributions for Sums of Independent Random Variables. Reading, Mass.: Addison-Wesley, 1954.
Gordon, I. E. Stimulus probability and simple reaction time. Nature, 1967, 875-896.
Granjon, M. & G. Reynard. Effect of length of the runs of repetitions on the simple RT-ISI relationship. Quarterly Journal of Experimental Psychology, 1977, 29, 283-295.
Grayson, D. A. The Role of the Response Stage in Stochastic Models of Simple Reaction Time. University of Sydney, Australia, Ph.D. dissertation, 1983.
Green, D. M. Fourier analysis of reaction time data. Behavior Research Methods & Instruments, 1971, 3, 121-125.
Green, D. M. & R. D. Luce. Detection of auditory signals presented at random times. Perception & Psychophysics, 1967, 2, 441-449.
Green, D. M. & R. D. Luce. Detection of auditory signals presented at random times: III. Perception & Psychophysics, 1971, 9, 257-268.
Green, D. M. & R. D. Luce. Speed-accuracy trade off in auditory detection. In S. Kornblum (ed.), Attention and Performance, IV. New York: Academic Press, 1973, pp. 547-569.
Green, D. M. & R. D. Luce. Timing and counting mechanisms in auditory discrimination and reaction time. In D. H. Krantz, R. C. Atkinson, R. D. Luce, & P. Suppes (eds.), Contemporary Developments in Mathematical Psychology, Vol. II. San Francisco: Freeman, 1974, pp. 372-415.
Green, D. M. & A. F. Smith. Detection of auditory signals occurring at random times: Intensity and duration. Perception & Psychophysics, 1982, 31, 117-127.
Green, D. M., A. F. Smith, & S. M. von Gierke. Choice reaction time with a random foreperiod. Perception & Psychophysics, 1983, 34, 195-208.
Green, D. M. & J. A. Swets. Signal Detection Theory and Psychophysics. New York: Wiley, 1966. Reprinted by Huntington, N.Y.: Krieger, 1974.
Greenwald, A. G. Sensory feedback mechanisms in performance control: With special reference to the ideo-motor mechanism. Psychological Review, 1970, 77, 73-99.
Greenwald, A. G. On doing two things at once: Time sharing as a function of ideomotor compatibility. Journal of Experimental Psychology, 1972, 94, 52-57.
Greenwald, A. G. & H. G. Shulman. On doing two things at once: II. Elimination of the psychological refractory period effect. Journal of Experimental Psychology, 1973, 101, 70-76.
Gregg, L. W. & W. J. Brogden. The relation between duration and reaction time difference to fixed duration and response terminated stimuli. Journal of Comparative and Physiological Psychology, 1950, 43, 329-337.
Grice, G. R. Stimulus intensity and response evocation. Psychological Review, 1968, 75, 359-373.
Grice, G. R. Conditioning and a decision theory of response evocation. In G. Bower (ed.), Psychology of Learning and Motivation, Vol. 5. New York: Academic, 1971, pp. 1-65.
Grice, G. R. Application of a variable criterion model to auditory reaction time as a function of the type of catch trial. Perception & Psychophysics, 1972, 12, 103-107.
Grice, G. R., R. Nullmeyer, & J. Schnizlein. Variable criterion analysis of brightness effects in simple reaction time. Journal of Experimental Psychology: Human Perception and Performance, 1979, 5, 303-314.
Grill, D. P. Variables influencing the mode of processing of complex stimuli. Perception & Psychophysics, 1971, 10, 51-57.
Gross, A. J. & V. A. Clark. Survival Distributions: Reliability Applications in the Biomedical Sciences. New York: Wiley, 1975.
Grossberg, M. The latency of response in relation to Bloch's Law at threshold. Perception & Psychophysics, 1968, 4, 229-232.
Gumbel, E. J. The Statistics of Extremes. New York: Columbia University Press, 1958.
Hacker, M. J. & J. V. Hinrichs. Multiple predictions in choice reaction time: A serial memory scanning interpretation. Journal of Experimental Psychology, 1974, 103, 999-1005.
Hale, D. J. Sequential effects in a two-choice serial reaction task. Quarterly Journal of Experimental Psychology, 1967, 19, 133-141.
Hale, D. J. Repetition and probability effects in a serial choice reaction task. Acta Psychologica, 1969a, 29, 163-171.
Hale, D. J. Speed-error tradeoff in a three-choice serial reaction task. Journal of Experimental Psychology, 1969b, 81, 428-435.
Hannes, M. The effect of stimulus repetitions and alternations on one-finger and two-finger responding in two-choice reaction time. Journal of Psychology, 1968, 69, 161-164.
Harm, O. J. & J. S. Lappin. Probability, compatibility, speed, and accuracy. Journal of Experimental Psychology, 1973, 100, 416-418.
Harwerth, R. S. & D. M. Levi. Reaction time as a measure of suprathreshold grating detection. Vision Research, 1978, 18, 1579-1586.
Heath, R. A. A tandem random walk model for psychological discrimination. British Journal of Mathematical and Statistical Psychology, 1981, 34, 76-92.
Hecker, M. H., K. N. Stevens, & C. E. Williams. Measurements of reaction time in intelligibility tests. Journal of the Acoustical Society of America, 1966, 39, 1188-1189.
Henmon, V. A. C. The time of perception as a measure of differences in sensation. Archives of Philosophy, Psychology and Scientific Method, 1906, 8, 1-75.
Henmon, V. A. The relation of the time of a judgment to its accuracy. Psychological Review, 1911, 18, 186-201.
Herman, L. M. & B. H. Kantowitz. The psychological refractory period effect: Only half the double-stimulation story? Psychological Bulletin, 1970, 73, 74-88.
Heuer, H. Binary choice reaction time as a criterion of motor equivalence. Acta Psychologica, 1981a, 50, 35-47.
Heuer, H. Binary choice reaction time as a criterion of motor equivalence: Further evidence. Acta Psychologica, 1981b, 50, 49-60.
Hick, W. E. On the rate of gain of information. Quarterly Journal of Experimental Psychology, 1952, 4, 11-26.
Hinrichs, J. V. Probability and expectancy in two-choice reaction time. Psychonomic Science, 1970, 21, 227-228.
Hinrichs, J. V. & J. L. Craft. Verbal expectancy and probability in two-choice reaction time. Journal of Experimental Psychology, 1971a, 88, 367-371.
Hinrichs, J. V. & J. L. Craft. Stimulus and response factors in discrete choice reaction time. Journal of Experimental Psychology, 1971b, 91, 305-309.
Hinrichs, J. V. & P. L. Krainz. Expectancy in choice reaction time: Stimulus or response anticipation? Journal of Experimental Psychology, 1970, 85, 330-334.
Hinton, G. & J. A. Anderson (eds.). Parallel Models of Associative Memory. Hillsdale, N.J.: Erlbaum, 1981.
Hockley, W. E. Analysis of response time distributions in the study of cognitive processes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 1984, 10, 598-615.
Hohle, R. H. Inferred components of reaction times as functions of foreperiod duration. Journal of Experimental Psychology, 1965, 69, 382-386.
Hopkins, G. W. Ultrastable stimulus-response latencies: Towards a model of response-stimulus synchronization. In J. Gibbon and L. Allan, Timing and Time Perception. New York: New York Academy of Sciences, Vol. 423, 1984, pp. 16-29.
Hopkins, G. W. & A. B. Kristofferson. Ultrastable stimulus-response latencies: Acquisition and stimulus control. Perception & Psychophysics, 1980, 27, 241-250.
Hull, C. L. Principles of Behavior. New York: Appleton-Century, 1943.
Hyman, R. Stimulus information as a determinant of reaction-time. Journal of Experimental Psychology, 1953, 45, 188-196.
Isaacson, D. L. & R. W. Madsen. Markov Chains. New York: Wiley, 1976.
Jarvik, M. E. Probability learning and a negative recency effect in the serial anticipation of alternative symbols. Journal of Experimental Psychology, 1951, 41, 291-297.
Jastrow, J. The Time Relations of Mental Phenomena. New York: Hodges, 1890.
Jennings, J. R., C. C. Wood, & B. E. Lawrence. Effects of graded doses of alcohol on speed-accuracy tradeoff in choice reaction time. Perception & Psychophysics, 1976, 19, 85-91.
John, I. D. A statistical decision theory of simple reaction time. Australian Journal of Psychology, 1967, 19, 17-34.
Johnsen, A. M. & G. E. Briggs. On the locus of display load effects in choice reactions. Journal of Experimental Psychology, 1973, 99, 266-271.
Johnson, D. M. Confidence and speed in the two-category judgment. Archives of Psychology, 1939, 241, 1-53.
Johnson, D. M. The Psychology of Thought and Judgment. New York: Harper, 1955.
Johnson, N. L. & S. Kotz. Distributions in Statistics, I, II, III. Boston: Houghton Mifflin Co., 1970.
Jury, E. I. Theory and Application of the z-Transform Method. New York: Wiley, 1964.
Kadane, J. B., J. H. Larkin, & R. H. Mayer. A moving average model for sequenced reaction-time data. Journal of Mathematical Psychology, 1981, 23, 115-133.
Kahneman, D. Attention and Effort. Englewood Cliffs, N.J.: Prentice-Hall, 1973.
Kantowitz, B. H. Double stimulation. In B. H. Kantowitz (ed.), Human Information Processing: Tutorials in Performance and Cognition. Hillsdale, N.J.: Erlbaum, 1974, pp. 83-131.
Kantowitz, B. H. On the accuracy of speed-accuracy tradeoff. Acta Psychologica, 1978, 42, 79-80.
Karlin, L. Reaction time as a function of foreperiod duration and variability. Journal of Experimental Psychology, 1959, 58, 185-191.
Karlin, L. & R. Kestenbaum. Effects of number of alternatives on the psychological refractory period. Quarterly Journal of Experimental Psychology, 1968, 20, 167-178.
Karlin, S. & H. M. Taylor. A First Course in Stochastic Processes, 2nd ed. New York: Academic Press, 1975.
Kay, H. & A. D. Weiss. Relationship between simple and serial reaction times. Nature, 1961, 191, 790-791.
Keele, S. W. Attention and Human Performance. Pacific Palisades, CA: Goodyear, 1973.
Keele, S. W. & W. T. Neill. Mechanisms of attention. In E. C. Carterette & M. P. Friedman (eds.), Handbook of Perception, Vol. IX. London: Academic Press, 1978, pp. 3-47.
Kellas, G., A. Baumeister, & S. Wilcox. Interactive effects of preparatory intervals, stimulus intensity, and experimental design on reaction time. Journal of Experimental Psychology, 1969, 80, 311-316.
Kelling, S. T. & B. P. Halpern. Taste flashes: Reaction times, intensity, and quality. Science, 1983, 219, 412-414.
Kelley, J. E., Jr. Critical path planning and scheduling, mathematical basis. Operations Research, 1961, 9, 296-320.
Kellogg, W. M. The time of judgment in psychometric measures. American Journal of Psychology, 1931, 43, 65-86.
Kemeny, J. G. & J. L. Snell. Finite Markov Chains. Princeton, N.J.: Van Nostrand, 1960.
Keuss, P. J. G. Reaction time to the second of two shortly spaced auditory signals both varying in intensity. Acta Psychologica, 1972, 36, 226-238.
Keuss, P. J. G. & J. F. Orlebeke. Transmarginal inhibition in a reaction time task as a function of extraversion and neuroticism. Acta Psychologica, 1977, 41, 139-150.
Keuss, P. J. G. & M. W. van der Molen. Positive and negative effects of stimulus intensity in auditory reaction tasks: Further studies on immediate arousal. Acta Psychologica, 1982, 52, 61-72.
Kiang, N. Y.-S., T. Watanabe, E. C. Thomas, & L. F. Clark. Discharge Patterns of Single Fibers in the Cat's Auditory Nerve. Cambridge, Mass.: MIT, 1965.
Kingman, J. F. C. Poisson counts for random sequences of events. Annals of Mathematical Statistics, 1963, 34, 1217-1232.
Kirby, N. H. Sequential effects in serial reaction time. Journal of Experimental Psychology, 1972, 96, 32-36.
Kirby, N. H. Sequential effects in an eight choice serial reaction time task. Acta Psychologica, 1975, 39, 205-216.
Kirby, N. H. Sequential effects in two-choice reaction time: Automatic facilitation or subjective expectancy? Journal of Experimental Psychology: Human Perception and Performance, 1976, 2, 567-577.
Kirby, N. Sequential effects in choice reaction time. In A. T. Welford (ed.), Reaction Times. London: Academic Press, 1980, pp. 129-172.
Klemmer, E. T. Time uncertainty in simple reaction time. Journal of Experimental Psychology, 1956, 51, 179-184.
Kohfeld, D. L. Effects of the intensity of auditory and visual ready signals on simple reaction time. Journal of Experimental Psychology, 1969, 82, 88-95.
Kohfeld, D. L. Simple reaction time as a function of stimulus intensity in decibels of light and sound. Journal of Experimental Psychology, 1971, 88, 251-257.
Kohfeld, D. L., J. L. Santee, & N. D. Wallace. Loudness and reaction time: I. Perception & Psychophysics, 1981a, 29, 535-549.
Kohfeld, D. L., J. L. Santee, & N. D. Wallace. Loudness and reaction time: II. Identification of detection components at different intensities and frequencies. Perception & Psychophysics, 1981b, 29, 550-562.
Kohonen, T. Associative Memory: A System-Theoretical Approach. Berlin: Springer-Verlag, 1977.
Koopman, B. O. The theory of search; Pt. I: Kinematic bases. Operations Research, 1956a, 4, 324-346.
Koopman, B. O. The theory of search; Pt. II: Target detection. Operations Research, 1956b, 4, 503-531.
Koopman, B. O. The theory of search; Pt. III: The optimum distribution of searching effort. Operations Research, 1957, 5, 613-626.
Koppell, S. The latency function hypothesis and Pike's multiple-observations model for latencies in signal detection. Psychological Review, 1976, 83, 308-309.
Kornblum, S. Choice reaction time for repetitions and nonrepetitions: A reexamination of the information hypothesis. Acta Psychologica, 1967, 27, 178-187.
Kornblum, S. Serial-choice reaction time: Inadequacies of the information hypothesis. Science, 1968, 159, 432-434.
Kornblum, S. Sequential determinants of information processing in serial and discrete choice reaction time. Psychological Review, 1969, 76, 113-131.
Kornblum, S. Simple reaction time as a race between signal detection and time estimation: A paradigm and model. Perception & Psychophysics, 1973a, 13, 108-112.
Kornblum, S. Sequential effects in choice reaction time. A tutorial review. In S. Kornblum (ed.), Attention and Performance IV. New York: Academic Press, 1973b, pp. 259-288.
Kornblum, S. An invariance in choice reaction time with varying numbers of alternatives and constant probability. In P. M. A. Rabbitt and S. Dornic (eds.), Attention and Performance V. New York: Academic Press, 1975, pp. 366-382.
Kornbrot, D. E. The effect of response and speed bias on reaction time distributions in choice experiments. Paper presented to the Experimental Psychology Section, Sheffield, England, March, 1977.
Koster, W. G. & J. A. M. Bekker. Some experiments on refractoriness. Acta Psychologica, 1976, 27, 64-70.
Krantz, D. H., R. D. Luce, P. Suppes, & A. Tversky. Foundations of Measurement, Vol. I. New York: Academic Press, 1971.
Krinchik, E. P. The probability of a signal as a determinant of reaction time. Acta Psychologica, 1969, 30, 27-36.
Kristofferson, A. B. Low-variance stimulus-response latencies: Deterministic internal delays? Perception & Psychophysics, 1976, 20, 89-100.
Krueger, L. E. Effect of irrelevant surrounding material on speed of same-different judgment of two adjacent letters. Journal of Experimental Psychology, 1973, 98, 252-304.
Krueger, L. E. A theory of perceptual matching. Psychological Review, 1978, 85, 278-304.
Krueger, L. E. & R. G. Shapiro. Repeating the target neither speeds or slows its detection: Evidence for independent channels in letter processing. Perception & Psychophysics, 1980, 28, 68-76.
Kryvkov, V. I. Wald's identity and random walk models for neuron firing. Advances in Applied Probability, 1976, 8, 257-277.
Kulikowski, J. J. & D. J. Tolhurst. Psychophysical evidence for sustained and transient channels in human vision. Journal of Physiology, London, 1973, 232, 149-163.
LaBerge, D. A. A recruitment theory of simple behavior. Psychometrika, 1962, 27, 375-396.
LaBerge, D. A. On the processing of simple visual and auditory stimuli at distinct levels. Perception & Psychophysics, 1971, 9, 331-334.
Lachs, G. & M. C. Teich. A neural counting model incorporating refractoriness and spread of excitation. II. Applications to loudness estimation. Journal of the Acoustical Society of America, 1981, 69, 774-782.
Laming, D. R. J. A new interpretation of the relation between choice reaction time and the number of equiprobable alternatives. British Journal of Mathematical and Statistical Psychology, 1966, 19, 139-149.
Laming, D. R. J. Information Theory of Choice-Reaction Times. London: Academic Press, 1968.
Laming, D. R. J. Subjective probability in choice-reaction time experiments. Journal of Mathematical Psychology, 1969, 6, 81-120.
Laming, D. R. J. A correction and a proof of a theorem by Duncan Luce. British Journal of Mathematical and Statistical Psychology, 1977a, 30, 90-97.
Laming, D. R. J. Luce's choice axiom compared with choice-reaction data. British Journal of Mathematical and Statistical Psychology, 1977b, 30, 141-153.
Laming, D. R. J. Choice reaction performance following an error. Acta Psychologica, 1979a, 43, 199-224.
Laming, D. R. J. Autocorrelation of choice-reaction times. Acta Psychologica, 1979b, 43, 381-412.
Laming, D. R. J. A critical comparison of two random-walk models for two-choice reaction time. Acta Psychologica, 1979c, 43, 431-453.
Laming, D. R. J. Sensory Analysis. London: Academic Press, 1985.
Lappin, J. S. The relativity of choice behavior and the effect of prior knowledge in the speed and accuracy of recognition. In N. J. Castellan, Jr. and F. Restle (eds.), Cognitive Theory, Vol. 3. Hillsdale, N.J.: Erlbaum, 1978, pp. 139-168.
Lappin, J. S. & K. Disch. The latency operating characteristic: I. Effects of stimulus probability on choice reaction time. Journal of Experimental Psychology, 1972a, 92, 419-427.
Lappin, J. S. & K. Disch. The latency operating characteristic: II. Effects of stimulus intensity on choice reaction time. Journal of Experimental Psychology, 1972b, 93, 367-372.
Lappin, J. S. & K. Disch. Latency operating characteristic: III. Temporal uncertainty effects. Journal of Experimental Psychology, 1973, 98, 279-285.
Lazarsfeld, P. F. A conceptual introduction to latent structure analysis. In P. F. Lazarsfeld (ed.), Mathematical Thinking in the Social Sciences. Glencoe, Ill.: Free Press, 1954, pp. 349-387.
Leadbetter, M. R., G. Lindgren, & H. Rootzén. Extremes and Related Properties of Random Sequences and Processes. New York: Springer-Verlag, 1983. Lee, C. & W. K. Estes. Order and position in primary memory for letter strings. Journal of Verbal Behavior, 1977, 16, 395-418. Lee, C. & W. K. Estes. Item and order information in short-term memory: Evidence for multilevel perturbation processes. Journal of Experimental Psychology: Human Learning and Memory, 1981, 7, 149-169. Lemmon, V. W. The relation of reaction time to measures of intelligence, memory and learning. Archives of Psychology, 1927, 94, 1-38. Leonard, J. A. Tactual reactions: I. Quarterly Journal of Experimental Psychology, 1959, 11, 76-83. Leonard, J. A., R. C. Newman, & A. Carpenter. On the handling of heavy bias in a self-paced task. Quarterly Journal of Experimental Psychology, 1966, 18, 130-141. Levy, W. B., J. A. Anderson, & S. Lehmkuhle (eds.). Synaptic Modification, Neuron Selectivity and Nervous System Organization. Hillsdale, N.J.: Erlbaum, 1984. Liapounov, A. M. Sur une proposition de la théorie des probabilités. Bull. de l'Acad. Imp. des Sci. de St. Pétersbourg, 1900, 13, 359-386. Liapounov, A. M. Nouvelle forme du théorème sur la limite de probabilité. Mém. Acad. Sci. St.-Pétersbourg, 1901, 12, 1-24. Lindeberg, J. W. Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Math. Zeit., 1922, 15, 211-225. Link, S. W. A Stochastic Model for Bisensory Choice Mechanisms. Stanford University, Ph.D. dissertation, 1968. Link, S. W. The relative judgment theory of two-choice response time. Journal of Mathematical Psychology, 1975, 12, 114-135. Link, S. W. The relative judgment theory analysis of response time deadline experiments. In N. J. Castellan, Jr. & F. Restle (eds.), Cognitive Theory, III. Hillsdale, N.J.: Erlbaum Associates, 1978, pp. 117-138. Link, S. W. Correcting response measures for guessing and partial information. Psychological Bulletin, 1982, 92, 469-486.
Link, S. W. & R. A. Heath. A sequential theory of psychological discrimination. Psychometrika, 1975, 40, 77-105. Link, S. W. & A. D. Tindall. Speed and accuracy in comparative judgments of line length. Perception & Psychophysics, 1971, 9, 284-288. Loeve, M. Probability theory, Vols. I and II. New York: Springer-Verlag, 1963, 1977. Logan, G. D., W. B. Cowan, & K. A. Davis. On the ability to inhibit simple and choice reaction time responses: A model and a method. Journal of Experimental Psychology: Human Perception and Performance, 1984, 10, 276-291. Luce, R. D. Individual Choice Behavior. New York: Wiley, 1959. Luce, R. D. Response latencies and probabilities. In K. J. Arrow, S. Karlin, & P. Suppes (eds.), Mathematical Methods in the Social Sciences. Stanford, CA: Stanford University Press, 1960, pp. 298-311. Luce, R. D. Detection and recognition. In R. D. Luce, R. R. Bush, and E. Galanter (eds.), Handbook of Mathematical Psychology, Vol. I. New York: Wiley, 1963, pp. 103-189. Luce, R. D. A model for detection in temporally unstructured experiments with a Poisson distribution of signal presentations. Journal of Mathematical Psychology, 1966, 3, 48-64. Luce, R. D. The choice axiom after twenty years. Journal of Mathematical Psychology, 1977, 3, 215-233. Luce, R. D. & D. M. Green. Detection of auditory signals presented at random times, II. Perception & Psychophysics, 1970, 7, 1-14. Luce, R. D. & D. M. Green. A neural timing theory for response times and the psychophysics of intensity. Psychological Review, 1972, 79, 14-57. Luce, R. D. & D. M. Green. Neural coding and psychophysical discrimination data. Journal of the Acoustical Society of America, 1974, 56, 1554-1564.
Luce, R. D., R. M. Nosofsky, D. M. Green, & A. F. Smith. The bow and sequential effects in absolute identification. Perception & Psychophysics, 1982, 32, 397-408. Lupker, S. J. & J. Theios. Tests of two classes of models for choice reaction times. Journal of Experimental Psychology: Human Perception and Performance, 1975, 104, 137-146. Lupker, S. J. & J. Theios. Further tests of a two-state Markov model for choice reaction times. Journal of Experimental Psychology: Human Perception and Performance, 1977, 3, 496-504. Mackie, R. R. (ed.). Vigilance. New York: Plenum, 1977. Mackworth, J. F. Vigilance and Habituation: A Neuropsychological Approach. Harmondsworth, England: Penguin, 1969. Mackworth, J. F. Vigilance and Attention. Harmondsworth, England: Penguin, 1970. Mackworth, N. H. Researches in the Measurement of Human Performance. M.R.C. Special Report 268, H.M.S.O., 1950. Reprinted in H. A. Sinaiko (ed.), Selected Papers on Human Factors in the Design and Use of Control Systems. New York: Dover, 1961, pp. 174-331. MacMillan, N. A. Detection and recognition of increments and decrements in auditory intensity. Perception & Psychophysics, 1971, 10, 233-238. MacMillan, N. A. Detection and recognition of intensity changes in tone and noise: The detection-recognition disparity. Perception & Psychophysics, 1973, 13, 65-75. Maloney, L. T. & B. A. Wandell. A model of a single visual channel's response to weak test lights. Vision Research, 1984, 24, 633-640. Mansfield, R. J. W. Latency functions in human vision. Vision Research, 1973, 13, 2219-2234. Marill, T. The psychological refractory phase. British Journal of Psychology, 1957, 48, 93-97. Marrocco, R. T. Sustained and transient cells in monkey lateral geniculate nucleus: Conduction velocities and response properties. Journal of Neurophysiology, 1976, 39, 340-353. Marteniuk, R. G. & C. L. MacKenzie. Information processing in movement organization and execution. In R. S. Nickerson (ed.), Attention and Performance VIII.
Hillsdale, N.J.: Erlbaum, 1980, pp. 29-57. McCarthy, P. I. Approximate solutions for means and variances in a certain class of box problems. Annals of Mathematical Statistics, 1947, 18, 349-383. McClelland, J. L. On the time relations of mental processes: An examination of systems of processes in cascade. Psychological Review, 1979, 86, 287-330. McGill, W. J. Loudness and reaction time: A guided tour of the listener's private world. Proceedings of the XVI International Congress of Psychology, 1960. Amsterdam: North-Holland, 1960, pp. 193-199. McGill, W. J. Random fluctuations of response rate. Psychometrika, 1962, 27, 3-17. McGill, W. J. Stochastic latency mechanisms. In R. D. Luce, R. R. Bush, & E. Galanter (eds.), Handbook of Mathematical Psychology, Vol. I. New York: Wiley, 1963, pp. 309-360. McGill, W. J. Neural counting mechanisms and energy detection in audition. Journal of Mathematical Psychology, 1967, 4, 351-376. McGill, W. J. & J. P. Goldberg. Pure-tone intensity discrimination in audition. Journal of the Acoustical Society of America, 1968, 44, 576-581. Meijers, L. M. M. & E. G. J. Eijkman. The motor system in simple reaction time experiments. Acta Psychologica, 1974, 38, 367-377. Merkel, J. Die zeitlichen Verhältnisse der Willensthätigkeit. Philosophische Studien, 1885, 2, 73-127. Miller, D. R. & N. D. Singpurwalla. Failure rate estimation using random smoothing. National Technical Information Service, No. AD-A040999/5ST, 1977. Miller, J. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology, 1982, 14, 247-279. Miller, J. & D. W. Bauer. Irrelevant differences in the "Same"-"Different" task. Journal of Experimental Psychology: Human Perception and Performance, 1981, 7, 196-207. Moder, J. & C. Phillips. Project Management with CPM and PERT. New York: Van Nostrand, 1970, 2nd ed.
Moller, A. R. Unit responses in the cochlear nucleus of the rat to pure tones. Acta Physiologica Scandinavica, 1969, 75, 530-541. Mollon, J. D. & J. Krauskopf. Reaction time as a measure of the temporal response properties of individual color mechanisms. Vision Research, 1973, 13, 27-40. Morgan, B. B. & E. A. Alluisi. Effects of discriminability and irrelevant information on absolute judgment. Perception & Psychophysics, 1967, 2, 54-58. Morin, R. E., D. V. DeRosa, & V. Stultz. Recognition memory and reaction time. In A. F. Sanders (ed.), Attention and performance, Acta Psychologica, 1967, 27, 298-305. Morin, R. E., D. V. DeRosa, & R. Ulm. Short-term recognition memory for spatially isolated items. Psychonomic Science, 1967, 9, 617-618. Moss, S. M., S. Engel, & D. Faberman. Alternation and repetition reaction times under three schedules of event sequencing. Psychonomic Science, 1967, 9, 557-558. Moss, S. M., J. L. Myers, & T. Filmore. Short-term recognition memory of tones. Perception & Psychophysics, 1970, 7, 369-373. Mudd, S. Briggs' Information-Processing Model of the Binary Classification Task. Hillsdale, N.J.: Erlbaum, 1983. Murray, H. G. & D. L. Kohfeld. Role of adaptation level in stimulus intensity dynamism. Psychonomic Science, 1965, 3, 439-440. Näätänen, R. An explanation for the longest reaction times obtained by using the shortest foreperiod of a randomized foreperiod series. Psychological Institute Technical Report No. 4/67, University of Helsinki, 1967. Näätänen, R. The diminishing time-uncertainty with the lapse of time after the warning-signal in reaction-time experiments with varying foreperiods. Acta Psychologica, 1970, 34, 399-419. Näätänen, R. Non-aging foreperiod and simple reaction-time. Acta Psychologica, 1971, 35, 316-327. Näätänen, R. Time uncertainty and occurrence uncertainty of the stimulus in a simple reaction time task. Acta Psychologica, 1972, 36, 492-503. Näätänen, R. & A. Merisalo. Expectancy and preparation in simple reaction time.
In S. Dornic (ed.), Attention and Performance, VI. Hillsdale, N.J.: Erlbaum, 1977, pp. 115-138. Navon, D. A simple method for latency analysis in signal detection tasks. Perception & Psychophysics, 1975, 18, 61-64. Nickerson, R. S. Response times for "same"-"different" judgments. Perceptual and Motor Skills, 1965a, 20, 15-18. Nickerson, R. S. Response time to the second of two successive signals as a function of absolute and relative duration of the intersignal interval. Perceptual and Motor Skills, 1965b, 21, 3-10. Nickerson, R. S. "Same"-"different" response times: A model and a preliminary test. Acta Psychologica, 1969, 30, 257-275. Nickerson, R. S. "Same-different" response times: A further test of a "counter and clock" model. Acta Psychologica, 1971, 35, 112-127. Nickerson, R. S. Binary-classification reaction times: A review of some studies of human information-processing capabilities. Psychonomic Monograph Supplements, 1972, 4, 275-318. Nickerson, R. S. & D. Burnham. Response times with nonaging foreperiods. Journal of Experimental Psychology, 1969, 79, 452-457. Niemi, P. Stimulus intensity effects on auditory and visual reaction processes. Acta Psychologica, 1979, 43, 299-312. Niemi, P. & E. Lehtonen. Foreperiod and visual stimulus intensity: a reappraisal. Acta Psychologica, 1982, 50, 73-82. Niemi, P. & R. Näätänen. Foreperiod and simple reaction time. Psychological Bulletin, 1981, 89, 133-162. Niemi, P. & T. Valitalo. Subjective response speed and stimulus intensity in a simple reaction time task. Perceptual and Motor Skills, 1980, 51, 419-422.
Nissen, M. J. Stimulus intensity and information processing. Perception & Psychophysics, 1977, 22, 338-352. Noreen, D. L. Response probabilities and response times in psychophysical discrimination tasks: A review of three models. Unpublished monograph, 1976. Norman, D. A. & W. Wickelgren. Strength theory and decision rules and latency in retrieval from short-term memory. Journal of Mathematical Psychology, 1969, 6, 192-208. Norman, M. F. Statistical inference with dependent observations: Extensions of classical procedures. Journal of Mathematical Psychology, 1971, 8, 444-451. Nosofsky, R. M. Shifts of attention in the identification and discrimination of intensity. Perception & Psychophysics, 1983, 33, 103-112. Ollman, R. T. Fast guesses in choice-reaction time. Psychonomic Science, 1966, 6, 155-156. Ollman, R. T. Central refractoriness in simple reaction time: The deferred processing model. Journal of Mathematical Psychology, 1968, 5, 49-60. Ollman, R. T. A Study of the Fast Guess Model for Choice Reaction Times. University of Pennsylvania, Ph.D. dissertation, 1970. Ollman, R. T. Simple reactions with random countermanding of the "go" signal. In S. Kornblum (ed.), Attention and Performance, IV. New York: Academic Press, 1971, pp. 571-581. Ollman, R. T. On the similarity between two types of speed/accuracy tradeoff. Unpublished manuscript, 1974. Ollman, R. T. Discovering how display variables affect decision performances. Unpublished manuscript, 1975. Ollman, R. T. Choice reaction time and the problem of distinguishing task effects from strategy effects. In S. Dornic (ed.), Attention and Performance VI. Hillsdale, N.J.: Erlbaum, 1977, pp. 99-113. Ollman, R. T. Additive factors and the speed-accuracy tradeoff. Unpublished manuscript, 1979. Ollman, R. T. & M. J. Billington. The deadline model for simple reaction times. Cognitive Psychology, 1972, 3, 311-336. Pachella, R. G. The interpretation of reaction time in information processing research. In B.
Kantowitz (ed.), Human Information Processing: Tutorials in Performance and Cognition. Hillsdale, N.J.: Erlbaum, 1974, pp. 41-82. Pachella, R. G. & D. F. Fisher. Effect of stimulus degradation and similarity on the trade-off between speed and accuracy in absolute judgments. Journal of Experimental Psychology, 1969, 81, 7-9. Pachella, R. G. & D. F. Fisher. Hick's law and the speed-accuracy trade-off in absolute judgment. Journal of Experimental Psychology, 1972, 92, 378-384. Pachella, R. G., D. F. Fisher, & R. Karsh. Absolute judgments in speeded tasks: Quantification of the trade-off between speed and accuracy. Psychonomic Science, 1968, 12, 225-226. Pachella, R. G. & R. W. Pew. Speed-accuracy tradeoff in reaction time: Effect of discrete criterion times. Journal of Experimental Psychology, 1968, 76, 19-24. Pacut, A. Some properties of threshold models of reaction latency. Biological Cybernetics, 1977, 28, 63-72. Pacut, A. Stochastic model of the latency of the conditioned escape response. Progress in Cybernetics and System Research, Vol. 3. London: Advance Publications, 1978, pp. 633-642. Pacut, A. Mathematical modelling of reaction latency: The structure of the models and its motivation. Acta Neurobiologiae Experimentalis, 1980, 40, 199-215. Pacut, A. Mathematical modelling of reaction latency Part II: A model of escape reaction and escape learning. Acta Neurobiologiae Experimentalis, 1982, 42, 379-395. Pacut, A. & W. Tych. Mathematical modelling of the reaction latency Part III: A model of avoidance reaction latency and avoidance learning. Acta Neurobiologiae Experimentalis, 1982, 42, 397-420.
Parker, D. M. Simple reaction times to the onset, offset, and contrast reversal of sinusoidal grating stimuli. Perception & Psychophysics, 1980, 28, 365-368. Parker, D. M. & E. A. Salzen. Latency changes in the human visual evoked response to sinusoidal gratings. Vision Research, 1977, 17, 1201-1204. Parzen, E. Modern Probability Theory and Its Applications. New York: Wiley, 1960. Parzen, E. Stochastic Processes. San Francisco: Holden-Day, 1962. Pashler, H. Processing stages in overlapping tasks: Evidence for a central bottleneck. Journal of Experimental Psychology: Human Perception and Performance, 1984, 10, 358-377. Pearson, K. Tables of the Incomplete Beta-Function. London: Cambridge University Press, 1932. Pease, V. The intensity-time relation of a stimulus in simple visual reaction time. Psychological Record, 1964, 14, 157-164. Perkel, D. H. A computer program for simulating a network of interacting neurons, I. Organization and physiological assumptions. Computers and Biomedical Research, 1976, 9, 31-43. Pew, R. W. The speed-accuracy operating characteristic. In W. G. Koster (ed.), Attention and Performance II. Acta Psychologica, 1969, 30, 16-26. Pfeiffer, R. R. Classification of response patterns of spike discharges for units in the cochlear nucleus: Tone burst stimulation. Experimental Brain Research, 1966, 1, 220-235. Pfingst, B. E., R. Hienz, J. Kimm, & J. Miller. Reaction-time procedure for measurement of hearing, I. Journal of the Acoustical Society of America, 1975, 57, 421-430. Pfingst, B. E., R. Hienz, & J. Miller. Reaction-time procedure for the measurement of hearing, II. Journal of the Acoustical Society of America, 1975, 57, 431-436. Pickett, R. M. The perception of a visual texture. Journal of Experimental Psychology, 1964, 68, 13-20. Pickett, R. M. Response latency in a pattern perception situation. In A. F. Sanders (ed.), Attention and performance, Acta Psychologica, 1967, 27, 160-169. Pickett, R. M.
The visual perception of random line segment texture. Paper read at Ninth Meeting of the Psychonomic Society, 1968. Piéron, H. Recherches sur les lois de variation des temps de latence sensorielle en fonction des intensités excitatrices. L'Année Psychologique, 1914, 20, 17-96. Piéron, H. Nouvelles recherches sur l'analyse du temps de latence sensorielle et sur la loi qui relie ce temps à l'intensité d'excitation. L'Année Psychologique, 1920, 22, 58-142. Pierrel, R. & C. S. Murray. Some relationships between comparative judgment, confidence and decision-time in weight lifting. American Journal of Psychology, 1963, 76, 28-38. Pieters, J. P. M. Sternberg's additive factor method and underlying psychological processes: Some theoretical considerations. Psychological Bulletin, 1983, 93, 411-426. Pike, A. R. Latency and relative frequency of response in psychophysical discrimination. British Journal of Mathematical and Statistical Psychology, 1968, 21, 161-182. Pike, A. R. & L. Dalgleish. Latency-probability curves for sequential decision models: A comment on Weatherburn. Psychological Bulletin, 1982, 91, 384-388. Pike, R. Response latency models for signal detection. Psychological Review, 1973, 80, 53-68. Pike, R. & P. Ryder. Response latencies in the yes/no detection task. Perception & Psychophysics, 1973, 13, 224-232. Poffenberger, A. T. Reaction time to retinal stimulation with special reference to the time lost in conduction through nerve centers. Archives of Psychology, 1912, 23, 1-73. Pólya, G. Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentenproblem. Mathematische Zeitschrift, 1920, 8, 173-181. Posner, M. I. Chronometric Explorations of Mind. Hillsdale, N.J.: Erlbaum, 1978. Posner, M. I. & R. F. Mitchell. Chronometric analysis of classification. Psychological Review, 1967, 74, 392-409. Proctor, R. W. A unified theory for matching task phenomena. Psychological Review, 1981, 88, 291-326. Proctor, R. W. & K. V. Rao.
On the "misguided" use of reaction-time differences: A discussion of Ratcliff and Hacker (1981). Perception & Psychophysics, 1982, 31, 601-602.
Proctor, R. W. & K. V. Rao. Reinstating the original principles of Proctor's unified theory for matching-task phenomena: An evaluation of Krueger and Shapiro's reformulation. Psychological Review, 1983, 90, 21-37. Proctor, R. W., K. V. Rao, & P. W. Hurst. An examination of response bias in multiletter matching: Implications for models of the comparison process. Paper presented at the Sixteenth Annual Meeting of the Society for Mathematical Psychology, 1983. Prucnal, P. R. & M. C. Teich. Refractory effects in neural counting processes with exponentially decaying rates. IEEE Transactions on Systems, Man, and Cybernetics, 1983, 13, 1028-1033. Purks, S. R., D. J. Callahan, L. D. Braida, & N. I. Durlach. Intensity perception. X. Effect of preceding stimulus on identification performance. Journal of the Acoustical Society of America, 1980, 67, 634-636. Raab, D. H. Effects of stimulus-duration on auditory reaction-time. American Journal of Psychology, 1962a, 75, 298-301. Raab, D. H. Magnitude estimates of the brightness of brief foveal stimuli. Science, 1962b, 135, 42-43. Raab, D. & E. Fehrer. The effect of stimulus duration and luminance on visual reaction time. Journal of Experimental Psychology, 1962, 64, 326-327. Raab, D., E. Fehrer, & M. Hershenson. Visual reaction time and the Broca-Sulzer phenomenon. Journal of Experimental Psychology, 1961, 61, 193-199. Rabbitt, P. M. A. Response facilitation on repetition of a limb movement. British Journal of Psychology, 1965, 56, 303-304. Rabbitt, P. M. A. Errors and error correction in choice-response tasks. Journal of Experimental Psychology, 1966, 71, 264-272. Rabbitt, P. M. A. Time to detect errors as a function of factors affecting choice-response time. Acta Psychologica, 1967, 27, 131-142. Rabbitt, P. M. A. Three kinds of error-signaling responses in a serial choice task. Quarterly Journal of Experimental Psychology, 1968a, 20, 179-188. Rabbitt, P. M. A.
Repetition effects and signal classification strategies in serial choice-response tasks. Quarterly Journal of Experimental Psychology, 1968b, 20, 232-240. Rabbitt, P. M. A. Psychological refractory delay and response-stimulus interval duration in serial, choice-response tasks. In W. G. Koster (ed.), Attention and Performance II. Amsterdam: North Holland, 1969, pp. 195-219. Rabbitt, P. M. A. & B. Rogers. What does a man do after he makes an error? An analysis of response programming. Quarterly Journal of Experimental Psychology, 1977, 29, 727-743. Rabbitt, P. M. A. & S. M. Vyas. An elementary preliminary taxonomy for some errors in laboratory choice RT tasks. Acta Psychologica, 1977, 29, 727-743. Rabiner, L. R. & B. Gold. Theory and Application of Digital Signal Processing. Englewood Cliffs, N.J.: Prentice-Hall, 1975. Rains, J. D. Signal luminance and position effects in human reaction time. Vision Research, 1963, 3, 239-251. Rapoport, A. A study of disjunctive reaction times. Behavioral Science, 1959, 4, 299-315. Rapoport, A., W. E. Stein, & G. J. Burkheimer. Response Models for Detection of Change. Dordrecht: Reidel, 1979. Ratcliff, R. A theory of memory retrieval. Psychological Review, 1978, 85, 59-108. Ratcliff, R. A theory of order relations in perceptual matching. Psychological Review, 1981, 88, 552-572. Ratcliff, R. Theoretical interpretations of speed and accuracy of positive and negative responses. Psychological Review, 1985, 92, 212-225. Ratcliff, R. & M. J. Hacker. Speed and accuracy of same and different responses in perceptual matching. Perception & Psychophysics, 1981, 30, 303-307. Ratcliff, R. & M. J. Hacker. On the misguided use of reaction-time differences: A reply to Proctor and Rao (1982). Perception & Psychophysics, 1982, 31, 603-604.
Ratcliff, R. & B. B. Murdock, Jr. Retrieval processes in recognition memory. Psychological Review, 1976, 86, 190-214. Reed, A. V. Speed accuracy trade-off in recognition memory. Science, 1973, 181, 574-576. Remington, R. J. Analysis of sequential effects in choice reaction times. Journal of Experimental Psychology, 1969, 82, 250-257. Remington, R. J. Analysis of sequential effects for a four-choice reaction time experiment. Journal of Psychology, 1971, 77, 17-27. Restle, F. Psychology of Judgment and Choice. New York: Wiley, 1961. Rice, J. & M. Rosenblatt. Estimation of the log survivor function and hazard function. Sankhya (Series A), 1976, 38, 60-78. Rose, J. E., J. F. Brugge, D. J. Anderson, & J. E. Hind. Phase-locked response to low-frequency tones in single auditory nerve fibers of the squirrel monkey. Journal of Neurophysiology, 1967, 30, 769-793. Ross, B. H. & J. R. Anderson. A test of parallel versus serial processing applied to memory retrieval. Journal of Mathematical Psychology, 1981, 24, 183-223. Rubinstein, L. Intersensory and intrasensory effects in simple reaction time. Perceptual and Motor Skills, 1964, 18, 159-172. Salthouse, T. A. Converging evidence for information-processing stages: A comparative-influence stage-analysis method. Acta Psychologica, 1981, 47, 39-61. Sampath, G. & S. K. Srinivasan. Stochastic Models for Spike Trains of Single Neurons. New York: Springer-Verlag, 1977. Sanders, A. F. The foreperiod effect revisited. Quarterly Journal of Experimental Psychology, 1975, 27, 591-598. Sanders, A. F. Structural and functional aspects of the reaction process. In S. Dornic (ed.), Attention and Performance VI. Hillsdale, N.J.: Erlbaum, 1977, pp. 3-25. Sanders, A. F. & J. E. B. Andriessen. A suppressing effect of response selection on immediate arousal in choice reaction task. Acta Psychologica, 1978, 42, 181-186. Sanders, A. F. & W. Ter Linden. Decision making during paced arrival of probabilistic information. In A. F.
Sanders (ed.), Attention and Performance, Acta Psychologica, 1967, 27, 170-177. Sanders, A. F. & A. Wertheim. The relation between physical stimulus properties and the effect of foreperiod duration on reaction time. Quarterly Journal of Experimental Psychology, 1973, 25, 201-206. Sanders, A. F., J. L. C. Wijnen, & A. E. van Arkel. An additive factor analysis of the effects of sleep loss on reaction processes. Acta Psychologica, 1982, 53, 41-59. Santee, J. L. & H. E. Egeth. Do reaction time and accuracy measure the same aspects of letter recognition. Journal of Experimental Psychology: Human Perception and Performance, 1982, 8, 489-501. Saslow, C. A. Operant control of response latency in monkeys: Evidence for a central explanation. Journal of the Experimental Analysis of Behavior, 1968, 11, 89-98. Saslow, C. A. Behavioral definition of minimal reaction time in monkeys. Journal of the Experimental Analysis of Behavior, 1972, 18, 87-106. Scharf, B. Critical bands. In J. V. Tobias (ed.), Foundations of Modern Auditory Theory, Vol. I. New York: Academic, 1970, pp. 159-202. Schmitt, J. C. & C. J. Scheirer. Empirical approaches to information processing: Speed-accuracy tradeoff function or reaction time. Acta Psychologica, 1977, 41, 321-325. Schneider, W. & R. M. Shiffrin. Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 1977, 84, 1-66. Schnizlein, J. M. Psychophysics of Auditory Simple Reaction Time. The University of New Mexico, Ph.D. dissertation, 1980. Schouten, J. F. & J. A. M. Bekker. Reaction time and accuracy. In A. F. Sanders (ed.), Attention and Performance. Acta Psychologica, 1967, 27, 143-153. Schvaneveldt, R. W. & W. G. Chase. Sequential effects in choice reaction time. Journal of Experimental Psychology, 1969, 80, 1-8.
Schweickert, R. A critical path generalization of the additive factor method: Analysis of a Stroop task. Journal of Mathematical Psychology, 1978, 18, 105-139. Schweickert, R. Critical-path scheduling of mental processes in a dual task. Science, 1980, 209, 704-706. Schweickert, R. The bias of an estimate of coupled slack in stochastic PERT networks. Journal of Mathematical Psychology, 1982a, 26, 1-12. Schweickert, R. Scheduling decisions in critical path networks of mental processes. Presentation at the meeting of the Society for Mathematical Psychology, 1982b. Schweickert, R. Latent network theory: Scheduling of processes in sentence verification and the Stroop effect. Journal of Experimental Psychology: Learning, Memory and Cognition, 1983a, 9, 353-383. Schweickert, R. Synthesizing partial orders given comparability information: Partitive sets and slack in critical path networks. Journal of Mathematical Psychology, 1983b, 27, 261-276. Schweickert, R. The representation of mental activities in critical path networks. In J. Gibbon & L. Allan, Timing and Time Perception. New York: New York Academy of Sciences, Vol. 423, 1984, pp. 82-95. Shannon, C. E. A mathematical theory of communication. Bell System Technical Journal, 1948, 27, 379-423 and 623-656. Shaw, M. L. A capacity allocation model for reaction time. Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 586-598. Shaw, M. L. & P. Shaw. Optimal allocation of cognitive resources to spatial locations. Journal of Experimental Psychology: Human Perception and Performance, 1977, 3, 201-211. Shevrin, L. N. & N. D. Filippov. Partially ordered sets and their comparability graphs. Siberian Mathematical Journal, 1970, 11, 497-509. Siebert, W. M. Stimulus transformation in the peripheral auditory system. In P. A. Kolers & M. Eden (eds.), Recognizing Patterns. Cambridge, Mass.: MIT Press, 1968, pp. 104-133. Siebert, W. M.
Frequency discrimination in the auditory system: Place or periodicity mechanisms? Proceedings of the IEEE, 1970, 58, 723-730. Silverman, W. P. The perception of identity in simultaneously presented complex visual displays. Memory and Cognition, 1973, 1, 459-466. Silverman, W. P. & S. L. Goldberg. Further confirmation of same vs. different processing differences. Perception & Psychophysics, 1975, 17, 189-193. Simon, J. R. Stereotypic reactions in information processing. In L. E. Smith (ed.), Psychology of Motor Learning. Ames, Iowa: University of Iowa Press, 1969. Simon, J. R. Effect of an auditory stimulus on the processing of a visual stimulus under single- and dual-tasks conditions. Acta Psychologica, 1982, 51, 61-73. Simon, J. R., E. Acosta, Jr., & S. P. Mewaldt. Effects of locus of a warning tone on auditory choice reaction time. Memory and Cognition, 1975, 3, 167-170. Simon, J. R., E. Acosta, Jr., S. P. Mewaldt, & C. R. Speidel. The effect of an irrelevant cue on choice reaction time: Duration of the phenomenon and its relation to stages of processing. Perception & Psychophysics, 1976, 19, 16-22. Singpurwalla, N. D. & M.-T. Wong. Estimation of the failure rate: A survey of nonparametric methods. Part I: Non-Bayesian methods. Communications in Statistics - Theory and Methods, 1983, A12, 559-588. Skinner, B. F. Are theories of learning necessary? Psychological Review, 1950, 57, 193-216. Smith, E. E. Choice reaction time: An analysis of the major theoretical positions. Psychological Bulletin, 1968, 69, 77-110. Smith, G. A. Studies of Compatibility and Investigations of a Model of Reaction Time. University of Adelaide, Australia: Ph.D. dissertation, 1978. Smith, G. A. Studies of compatibility and a new model of choice reaction time. In S. Dornic (ed.), Attention and Performance VI. Hillsdale, N.J.: Lawrence Erlbaum, 1977. Smith, G. A. Models of choice reaction time. In A. T. Welford (ed.), Reaction Times. London: Academic Press, 1980, pp. 173-214.
Smith, M. C. Reaction time to a second stimulus as a function of the intensity of the first stimulus. Quarterly Journal of Experimental Psychology, 1967a, 19, 125-132. Smith, M. C. The psychological refractory period as a function of performance of a first response. Quarterly Journal of Experimental Psychology, 1967b, 19, 350-352. Smith, M. C. Theories of the psychological refractory period. Psychological Bulletin, 1967c, 67, 202-213. Smith, M. C. The effect of varying information on the psychological refractory period. In W. G. Koster (ed.), Attention and performance, II. Acta Psychologica, 1969, 30, 220-231. Snodgrass, J. G. Foreperiod effects in simple reaction time: Anticipation or expectancy? Journal of Experimental Psychology, Monograph, 1969, 79, 1-19. Snodgrass, J. G. Reaction times for comparisons of successively presented visual patterns: Evidence for serial self-terminating search. Perception & Psychophysics, 1972, 12, 364-372. Snodgrass, J. G., R. D. Luce, & E. Galanter. Some experiments on simple and choice reaction time. Journal of Experimental Psychology, 1967, 75, 1-17. Snodgrass, J. G. & J. T. Townsend. Comparing parallel and serial models: Theory and implementation. Journal of Experimental Psychology: Human Perception and Performance, 1980, 6, 330-354. Spence, K. W. The relation of response latency and speed to the intervening variables and N in S-R theory. Psychological Review, 1954, 61, 209-216. Sperling, G. The information available in brief visual presentations. Psychological Monographs, 1960, 74. SPSSx User's Guide. Chicago, Ill.: McGraw-Hill, 1983. Stanovich, K. & R. Pachella. Encoding, stimulus-response compatibility and stages of processing. Journal of Experimental Psychology: Human Perception and Performance, 1977, 3, 411-421. Stanovich, K. E., R. G. Pachella, & J. E. K. Smith. An analysis of confusion errors in naming letters under speed stress. Perception & Psychophysics, 1977, 21, 545-552. Stelmach, G. E. (ed.).
Information Processing in Motor Control and Motor Learning. New York: Academic Press, 1978. Stelmach, G. E. & J. Requin. (eds.). Tutorials in Motor Behavior. Amsterdam: North-Holland, 1980. Sternberg, S. High speed scanning in human memory. Science, 1966, 153, 652-654. Sternberg, S. Retrieval of contextual information from memory. Psychonomic Science, 1967a, 8, 55-56. Sternberg, S. Two operations in character-recognition: Some evidence from reaction-time measurements. Perception & Psychophysics, 1967b, 2, 45-53. Sternberg, S. The discovery of processing stages: Extensions of Donders' method. In W. G. Koster (ed.), Attention and performance II, Acta Psychologica, 1969a, 30, 276-315. Sternberg, S. Memory scanning: Mental processes revealed by reaction-time experiments. American Scientist, 1969b, 57, 421-457. Reprinted in J. S. Antrobus (ed.), Cognition and Affect. Boston: Little, Brown, 1970, pp. 13-58. Sternberg, S. Decomposing mental processes with reaction-time data. Invited address, Annual Meeting of the Midwestern Psychological Association, Detroit, May 1971. Sternberg, S. Evidence against self-terminating memory search from properties of RT distributions. Paper presented at the Meeting of the Psychonomic Society, St. Louis, November 1973. Sternberg, S. Memory scanning: New findings and current controversies. Quarterly Journal of Experimental Psychology, 1975, 17, 1-32. Sternberg, S., S. Monsell, R. L. Knoll, & C. E. Wright. The latency and duration of rapid movement sequences: Comparisons of speech and typewriting. In G. E. Stilmach (ed.), Information Processing in Motor Control and Learning. New York: Academic Press, 1978, pp. 117-152. Sternberg, S., C. E. Wright, R. L. Knoll, & S. Monsell. Motor programs in rapid speech:
References
541
Additional evidence. In R. A. Cole (ed.), The Perception and Production of Fluent Speech. Hillsdale, N.J.: Erlbaura, 1980, pp. 507-534. Stevens, J. C. & J. W. Hall. Brightness and loudness as functions of stimulus duration. Perception & Psychophysics, 1966, 1, 319-327. Stevens, S. S. Psychophysics. New York: Wiley, 1975. Sticht, T. G. Effects of intensity and duration on the latency of response to brief light and dark stimuli. Journal of Experimental Psychology, 1969, 80, 419-422. Stilitz, I. Conditional probability and components of RT in the variable foreperiod experiment. Quarterly Journal of Experimental Psychology, 1972, 24, 159-168. Stone, L. D. Theory of Optimal Search. New York: Academic Press, 1975. Stone, M. Models for choice-reaction time. Psychometrika, 1960, 25, 251-260. Stroh, C. M. Vigilance: The Problem of Sustained Attention. Oxford: Pergamon, 1971. Stroud, J. M. The fine structure of psychological time. In H. Quastler (ed.), Information Theory in Psychology. Glencoe: 111.: Free Press, 1955, pp. 174-207. SUGI Supplemental Library User's Guide. Gary, N.C.: SAS Institute Inc., 1983. Swanson, J. M. & G. E. Briggs. Information processing as a function of speed versus accuracy. Journal of Experimental Psychology, 1969, 31, 223-229. Swanson, J. M., A. M. Johnsen, & G. E. Briggs. Recoding in a memory search task. Journal of Experimental Psychology, 1972, 93, 1-9. Swensson, R. G. The elusive tradeoff: Speed versus accuracy in visual discrimination tasks. Perception & Psychophysics, 1972a, 12, 16-32. Swensson, R. G. Trade-off bias and efficiency effects in serial choice reactions. Journal of Experimental Psychology, 1972b, 95, 397-407. Swensson, R. G. & W. Edwards. Response strategies in a two-choice reaction task with a continuous cost for time. Journal of Experimental Psychology, 1971, 88, 67-81. Swensson, R. G. & D. M. Green. On the relations between random walk models for two-choice response times. Journal of Mathematical Psychology, 1977, 15, 282-291. 
Swensson, R. G. & R. E. Thomas. Fixed and optional stopping models for two-choice discrimination times. Journal of Mathematical Psychology, 1974, 11, 213-236.
Swets, J. A. Signal detection applied to vigilance. In R. R. Mackie (ed.), Vigilance. New York: Plenum, 1977, pp. 705-718.
Taylor, D. A. Stage analysis of reaction time. Psychological Bulletin, 1976a, 83, 161-191.
Taylor, D. A. Effect of identity in the multiletter matching task. Journal of Experimental Psychology: Human Perception and Performance, 1976b, 2, 417-428.
Taylor, D. A., J. T. Townsend, & P. Sudevan. Analysis of intercompletion times in multielement processing. Paper presented at the Annual Meeting of the Psychonomic Society, San Antonio, November 1978.
Taylor, D. H. Latency models for reaction time distributions. Psychometrika, 1965, 30, 157-163.
Taylor, D. H. Latency components in two choice responding. Journal of Experimental Psychology, 1966, 72, 481-487.
Taylor, M. M., P. H. Lindsay, & S. M. Forbes. Quantification of shared capacity processing in auditory and visual discrimination. In A. F. Sanders (ed.), Attention and Performance, Acta Psychologica, 1967, 27, 223-229.
Teich, M. C. & B. I. Cantor. Information, error, and imaging in deadtime-perturbed doubly stochastic Poisson counting processes. IEEE Journal of Quantum Electronics, 1978, 14, 993-1003.
Teich, M. C. & G. Lachs. A neural counting model incorporating refractoriness and spread of excitation. I. Application to intensity discrimination. Journal of the Acoustical Society of America, 1979, 66, 1738-1749.
Teich, M. C. & W. J. McGill. Neural counting and photon counting in the presence of deadtime. Physical Review Letters, 1976, 36, 754-758.
Teichner, W. H. Recent studies of simple reaction time. Psychological Bulletin, 1954, 51, 172-177.
Teichner, W. H. & M. J. Krebs. Laws of the simple visual reaction time. Psychological Review, 1972, 79, 344-358.
Teichner, W. H. & M. J. Krebs. Laws of visual choice reaction time. Psychological Review, 1974, 81, 75-98.
Telford, C. W. The refractory phase of voluntary and associative responses. Journal of Experimental Psychology, 1931, 14, 1-36.
Ten Hoopen, G., S. Akerboom, & E. Raaymakers. Vibrotactual choice reaction time, tactile receptor systems and ideomotor compatibility. Acta Psychologica, 1982, 50, 143-157.
Theios, J. Reaction time measurements in the study of memory processes: Theory and data. In G. H. Bower (ed.), The Psychology of Learning and Motivation, Vol. 7. New York: Academic Press, 1973, pp. 43-85.
Theios, J. The components of response latency in simple human information processing tasks. In P. M. A. Rabbitt and S. Dornic (eds.), Attention and Performance V, London: Academic Press, 1975, pp. 418-440.
Theios, J. & P. G. Smith. Can a two-state model account for two-choice reaction-time data? Psychological Review, 1972, 79, 172-177.
Theios, J., P. G. Smith, S. E. Haviland, J. Traupmann, & M. C. Moy. Memory scanning as a serial self-terminating process. Journal of Experimental Psychology, 1973, 97, 323-336.
Theios, J. & D. G. Walter. Stimulus and response frequency and sequential effects in memory scanning reaction times. Journal of Experimental Psychology, 1974, 102, 1092-1099.
Thomas, E. A. C. Reaction time studies: The anticipation and interaction of responses. The British Journal of Mathematical and Statistical Psychology, 1967, 20, 1-29.
Thomas, E. A. C. Distribution free tests for mixed probability distributions. Biometrika, 1969, 56, 475-484.
Thomas, E. A. C. Sufficient conditions for monotone hazard rate and application to latency-probability curves. Journal of Mathematical Psychology, 1971, 8, 303-332.
Thomas, E. A. C. On expectancy and the speed and accuracy of responses. In S. Kornblum (ed.), Attention and Performance, IV. New York: Academic Press, 1973, pp. 613-626.
Thomas, E. A. C. The selectivity of preparation. Psychological Review, 1974, 81, 442-464.
Thomas, E. A. C. A note on the sequential probability ratio test. Psychometrika, 1975, 40, 107-111.
Thomas, E. A. C. & J. L. Myers. Implications of latency data for threshold and nonthreshold models of signal detection. Journal of Mathematical Psychology, 1972, 9, 253-285.
Thomas, E. A. C. & B. H. Ross. On appropriate procedures for combining probability distributions within the same family. Journal of Mathematical Psychology, 1980, 21, 136-152.
Thrane, V. Sensory and preparatory factors in response latency: I. The visual intensity function. Scandinavian Journal of Psychology, 1960a, 1, 82-96.
Thrane, V. Sensory and preparatory factors in response latency: II. Simple reaction or compensatory interaction? Scandinavian Journal of Psychology, 1960b, 1, 169-176.
Thurstone, L. L. A law of comparative judgment. Psychological Review, 1927a, 34, 273-286.
Thurstone, L. L. A mental unit of measurement. Psychological Review, 1927b, 34, 415-423.
Titchmarsh, E. C. Introduction to the Theory of Fourier Integrals. Oxford: Oxford University Press, 1948.
Tolhurst, D. J. Reaction times in the detection of gratings by human observers: A probabilistic mechanism. Vision Research, 1975a, 15, 1143-1149.
Tolhurst, D. J. Sustained and transient channels in human vision. Vision Research, 1975b, 15, 1151-1155.
Townsend, J. T. Mock parallel and serial models and experimental detection of these. In Proceedings of The Symposium on Information Processing. Purdue University School of Electrical Engineering, 1969, 617-628.
Townsend, J. T. Some results concerning the identifiability of parallel and serial processes. British Journal of Mathematical and Statistical Psychology, 1972, 25, 168-197.
Townsend, J. T. Issues and models concerning the processing of a finite number of inputs. In B. H. Kantowitz (ed.), Human Information Processing: Tutorials in Performance and Cognition. New York: Wiley, 1974, pp. 133-185.
Townsend, J. T. Serial and within-stage independent parallel model equivalence on the minimum completion time. Journal of Mathematical Psychology, 1976a, 14, 219-238.
Townsend, J. T. A stochastic theory of matching processes. Journal of Mathematical Psychology, 1976b, 14, 1-52.
Townsend, J. T. Paper presented at the Mathematical Psychology 9th Annual Meeting, New York University, 1976c.
Townsend, J. T. & F. G. Ashby. Methods of modeling capacity in simple processing systems. In N. J. Castellan, Jr. and F. Restle (eds.), Cognitive Theory, Vol. 3. Hillsdale, N.J.: Erlbaum, 1978, pp. 199-239.
Townsend, J. T. & F. G. Ashby. Stochastic Modeling of Elementary Psychological Processes. Cambridge: Cambridge University Press, 1983.
Townsend, J. T. & R. N. Roos. Search reaction times for single targets in multiletter stimuli with brief visual displays. Memory and Cognition, 1973, 1, 319-332.
Townsend, J. T. & J. G. Snodgrass. A serial vs. parallel testing paradigm where "same" and "different" comparison rates differ. Paper presented at Psychonomic Society Meeting, Boston, 1974.
Treisman, M. A theory of criterion setting with an application to sequential dependencies. Psychological Review, 1985, in press.
Tretter, S. A. Introduction to Discrete-Time Signal Processing. New York: Wiley, 1978.
Trotter, W. T., J. I. Moore, & D. P. Sumner. The dimension of a comparability graph. Proceedings of the American Mathematical Society, 1976, 60, 35-38.
Tversky, B. Pictorial and verbal encoding in a short-term memory task. Perception & Psychophysics, 1969, 6, 225-233.
Uttal, W. R. & M. Krissoff. Response of the somasthetic system to patterned trains of electrical stimuli. In D. R. Krenshalo (ed.), The Skin Senses. Springfield, Ill.: Thomas, 1968, pp. 262-303.
van der Heijden, A. H. C. & H. W. Menckenberg. Some evidence for a self-terminating process in simple visual search tasks. Acta Psychologica, 1974, 38, 169-181.
van der Molen, M. W. & P. J. G. Keuss. The relationship between reaction time and auditory intensity in discrete auditory tasks. Quarterly Journal of Experimental Psychology, 1979, 31, 95-102.
van der Molen, M. W. & P. J. G. Keuss. Response selection and the processing of auditory intensity. Quarterly Journal of Experimental Psychology, 1981, 33, 177-184.
van der Molen, M. W. & J. F. Orlebeke. Phasic heart rate change and the U-shaped relationship between choice reaction time and auditory intensity. Psychophysiology, 1980, 17, 471-481.
Vannucci, G. & M. C. Teich. Effects of rate variation on the counting statistics of dead-time modified Poisson processes. Optics Communications, 1978, 25, 267-272.
Vaughn, H. G., Jr., L. D. Costa, & L. Gilden. The functional relation of visual evoked response and reaction time to stimulus intensity. Vision Research, 1966, 6, 654-656.
Vervaeck, K. R. & L. C. Boer. Sequential effects in two-choice reaction time: Subjective expectancy and automatic after-effect at short response-stimulus intervals. Acta Psychologica, 1980, 44, 175-190.
Vickers, D. Evidence for an accumulator model of psychophysical discrimination. Ergonomics, 1970, 13, 37-58.
Vickers, D. Discrimination. In A. T. Welford (ed.), Reaction Times. London: Academic Press, 1980, pp. 25-72.
Vickers, D., D. Caudrey, & R. J. Willson. Discriminating between the frequency of occurrence of two alternative events. Acta Psychologica, 1971, 35, 151-172.
Vickers, D. & J. Packer. Effects of alternating set for speed or accuracy on response time, accuracy and confidence in a unidimensional task. Acta Psychologica, 1981, 50, 179-197.
Vince, M. A. The intermittency of control movements and the psychological refractory period. British Journal of Psychology, 1948, 38, 149-157.
Viviani, P. Choice reaction times for temporal numerosity. Journal of Experimental Psychology: Human Perception and Performance, 1979a, 5, 157-167.
Viviani, P. A diffusion model for discrimination of temporal numerosity. Journal of Mathematical Psychology, 1979b, 19, 108-136.
Viviani, P. & C. A. Terzuolo. On the modeling of the performance of the human brain in a two-choice task involving decoding and memorization of simple visual patterns. Kybernetik, 1972, 10, 121-137.
Vorberg, D. On the equivalence of serial and parallel processing systems. Paper presented at the Mathematical Psychology 10th Annual Meeting, UCLA, 1977.
Wald, A. Sequential Analysis. New York: Wiley, 1947.
Wald, A. & J. Wolfowitz. Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics, 1948, 19, 326-339.
Wandell, B. A. Speed-accuracy tradeoff in visual detection: Applications of neural counting and timing. Vision Research, 1977, 17, 217-225.
Wandell, B. A., P. Ahumada, & D. Welsh. Reaction times to weak test lights. Vision Research, 1984, 24, 647-652.
Wandell, B., & R. D. Luce. Pooling peripheral information: Average versus extreme values. Journal of Mathematical Psychology, 1978, 17, 220-235.
Ward, L. M. & G. R. Lockhead. Sequential effects and memory in category judgments. Journal of Experimental Psychology, 1970, 84, 27-34.
Watson, C. S., M. E. Rilling, & W. T. Bourbon. Receiver-operating characteristics determined by a mechanical analog to a rating scale. Journal of the Acoustical Society of America, 1964, 36, 283-288.
Weatherburn, D. Latency-probability functions as bases for evaluating competing accounts of the sensory decision process. Psychological Bulletin, 1978, 85, 1344-1347.
Weatherburn, D. & D. Grayson. Latency-probability functions: A reply to Pike and Dalgleish. Psychological Bulletin, 1982, 91, 389-392.
Weaver, H. R. A study of discriminative serial action: Manual response to color. Journal of Experimental Psychology, 1942, 31, 177-201.
Weiss, A. D. The locus of reaction time change with set, motivation and age. Journal of Gerontology, 1965, 20, 60-64.
Weist, J. D. & F. K. Levy. A Management Guide to PERT/CPM. Englewood Cliffs, N.J.: Prentice-Hall, 1969.
Welford, A. T. The "psychological refractory period" and the timing of high-speed performance - a review and a theory. British Journal of Psychology, 1952, 43, 2-19.
Welford, A. T. Evidence of a single-channel decision mechanism limiting performance in a serial reaction task. Quarterly Journal of Experimental Psychology, 1959, 11, 193-210.
Welford, A. T. Single-channel operation in the brain. Acta Psychologica, 1967, 27, 5-22.
Welford, A. T. (ed.). Reaction Times. London: Academic Press, 1980a.
Welford, A. T. Choice reaction time: Basic concepts. In A. T. Welford (ed.), Reaction Times. London: Academic Press, 1980b, pp. 73-128.
Welford, A. T. The single-channel hypothesis. In A. T. Welford (ed.), Reaction Times. London: Academic Press, 1980c, pp. 215-252.
Wells, G. R. The influence of stimulus duration on reaction time. Psychological Monographs, 1913, 15 (5, Whole No. 66).
Wetherill, G. B. Sequential Methods in Statistics. London: Chapman and Hall, 1975.
Whitman, C. P. & E. S. Geller. Prediction outcome, S-R compatibility and choice reaction time. Journal of Experimental Psychology, 1971a, 91, 299-304.
Whitman, C. P. & E. S. Geller. Runs of correct and incorrect predictions as determinants of choice reaction time. Psychonomic Science, 1971b, 23, 421-423.
Wickelgren, W. A. Speed-accuracy tradeoff and information processing dynamics. Acta Psychologica, 1977, 41, 67-85.
Wickelgren, W. A. Wickelgren's neglect. Acta Psychologica, 1978, 42, 81-82.
Wickelgren, W. A. & A. T. Corbett. Associative interference and retrieval dynamics in yes-no recall and recognition. Journal of Experimental Psychology: Human Learning and Memory, 1977, 3, 189-202.
Wilding, J. M. The relation between latency and accuracy in the identification of visual stimuli. II. The effects of sequential dependencies. Acta Psychologica, 1971, 35, 399-413.
Wilding, J. M. Effects of stimulus discriminability on the latency distribution of identification responses. Acta Psychologica, 1974, 38, 483-500.
Williams, J. A. Sequential effects in disjunctive reaction time: Implications for decision models. Journal of Experimental Psychology, 1966, 71, 665-672.
Wood, C. C. & J. R. Jennings. Speed-accuracy trade-off functions in choice reaction time: Experimental designs and computational procedures. Perception & Psychophysics, 1976, 19, 93-102.
Woodrow, H. The reproduction of temporal intervals. Journal of Experimental Psychology, 1930, 13, 473-499.
Woodworth, R. S. Experimental Psychology. New York: Holt, 1938.
Woodworth, R. S. & H. Schlosberg. Experimental Psychology. New York: Holt, 1954.
Yager, D. & I. Duncan. Signal-detection analysis of luminance generalization in goldfish using latency as a graded response measure. Perception & Psychophysics, 1971, 9, 353-355.
Yellott, J. I., Jr. Correction for guessing in choice reaction time. Psychonomic Science, 1967, 8, 321-322.
Yellott, J. I., Jr. Correction for guessing and the speed-accuracy tradeoff in choice reaction time. Journal of Mathematical Psychology, 1971, 8, 159-199.
Yellott, J. I., Jr. The relationship between Luce's Choice Axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 1977, 15, 109-144.
Author Index
Note: Entries in boldface are to the references.
Abeles, M., 159, 521
Acosta, E., Jr., 217, 539
Aczel, J., 157, 368, 521
Ahumada, A., Jr., 159, 521
Ahumada, P., 137, 138, 157, 158, 521, 544
Aiken, L. R., Jr., 73, 521
Akerboom, S., 397, 542
Alegria, J., 205, 211, 212, 270, 521
Allan, L., 538
Alliusi, E. A., 231, 534
Allport, D. A., 140, 521
Anderson, D. J., 112, 538
Anderson, J. A., 440, 441, 521, 528, 532
Anderson, J. R., 464, 521, 538
Anderson, N. H., 85, 521
Anderson, T. W., 373, 521
Andriessen, J. E. B., 477, 538
Angel, A., 51, 58, 521
Annett, J., 73, 521
Arrow, K. J., 532
Ashby, F. G., 24, 105, 119, 154, 216, 243, 244, 350, 351, 429, 431, 432, 456, 464, 466, 468, 479, 481, 502, 521, 543
Atkinson, R. C., 430, 435, 522, 527
Audley, R. J., 235, 320, 322, 325, 522
Baddeley, A. D., 430, 522
Bamber, D., 445-447, 522
Barlow, R. E., 14, 24, 25, 522, 526
Bartlett, M. S., 368, 522
Bartlett, N. R., 58, 118, 522
Bauer, D. W., 451, 533
Baumeister, A., 80, 529
Bayes, T., 21
Bekker, J. A. M., 196, 235, 241-243, 248-250, 530, 538
Beller, H. C., 445, 522
Belyayev, Yu. K., 134, 527
Berger, G. O., 58
Berliner, J. E., 406, 522
Bertelson, P., 185, 197, 261, 262, 270, 408, 409, 522
Beyer, W. A., 321, 522
Billington, M. J., 74, 75, 535
Bindra, D., 262, 525
Birenbaum, S., 430, 524
Birren, J. E., 231, 522
Bjork, E. L., 430, 522
Blackman, A. R., 482, 522
Blaha, J., 479, 523
Blough, D. S., 223, 224, 228-230, 316, 522
Bloxom, B., 17, 113, 114, 123, 523
Blumenthal, A. L., 7, 523
Boer, L. C., 262, 264, 543
Borger, R., 197, 523
Botwinick, J., 118, 231, 523
Bourbon, W. T., 228, 544
Bower, G. H., 527, 542
Bracewell, R., 36, 523
Braida, L. D., 406, 537
Brebner, J. M. T., 68, 523
Breitmeyer, B. G., 161, 523
Briggs, G. E., 479, 523, 529, 541
Brigham, E. O., 36, 37, 523
Brinley, J. F., 231, 523
Broadbent, D. E., 177, 196, 523
Brogden, W. J., 66, 527
Brown, M. B., 123, 525
Brugge, J. F., 112, 538
Brunk, H. D., 23, 26, 29, 148, 523, 524
Buckolz, E., 81, 523
Burbeck, S. L., 35, 81, 83, 115, 117, 125, 127, 132-135, 137, 146, 147, 162, 163, 169-171, 523, 524
Burkheimer, G. J., 148, 149, 166, 537
Burnham, D., 76, 77, 534
Burns, J. T., 268, 524
Burrows, D., 445, 524
Bush, R. R., 524, 532, 533
Butler, R. A., 159, 526
Callahan, D. J., 406, 537
Cantor, B. I., 381, 541
Carpenter, A., 390, 532
Carterette, E. C., 226, 309, 311, 317, 360-362, 364, 365, 516, 521, 524
Castellan, N. J., Jr., 531, 532
Cattell, J. McK., 58, 215, 524
Caudrey, D., 231, 233, 354, 355, 357, 376, 543
Chase, W. G., 261, 263, 405, 538
Chocholle, R., 62, 64, 65, 68, 70, 524
Christie, L. S., 98, 120, 524
Clark, L. F., 159, 529
Clark, V. A., 14, 123, 527
Cleland, B. G., 159, 524
Clifton, C., 430, 524
Cohen, S. P., 256, 278, 294-296, 298, 306-308, 317, 318, 526
Cooper, W. E., 186, 524
Corballis, M. C., 430, 524
Corbett, A. T., 244, 524, 545
Cosmides, R., 226, 309, 311, 317, 360-362, 364, 365, 516, 524
Costa, L. D., 118, 524, 543
Cowan, W. B., 162, 532
Cox, D. R., 181, 524
Craft, J. L., 263, 405, 528
Craik, K. J. W., 186, 524
Cramer, H., 369, 524
Creamer, L. R., 197, 231, 524
Crossman, E. R. F. W., 392, 524
Dalgleish, L., 239, 536
Davies, D. R., 177, 524
Davis, K. A., 162, 532
Davis, R., 74, 186-188, 193, 194, 524, 525
deBoor, C. A., 113, 525
de Klerk, L. F. W., 263, 525
DeRosa, D. V., 430, 534
Disch, K., 227, 242, 244, 245, 247, 531
Dixon, W. J., 123, 525
Donaldson, G., 483, 525
Donders, F. C., 1, 55, 212-215, 217, 472, 476, 491, 525
Dosher, B. A., 244, 253, 525
Dornic, S., 521, 526, 530, 534, 538, 539, 542
Drazin, D. H., 74, 525
Dubin, M. W., 159, 524
Duncan, I., 228, 545
Duncan, J., 478, 525
Durlach, N., 406, 522, 536
Dwivedi, A., 256, 278, 294-296, 298, 306-308, 317, 318, 526
Ecob, J. R., 430, 522
Eden, M., 538
Edwards, W., 235, 304, 305, 311, 313, 316, 317, 346, 525, 541
Eerland, E., 263, 525
Efron, B., 525
Egan, J. P., 178, 179, 183, 225, 230, 525
Egeth, H. E., 235, 239, 431, 445, 525, 538
Eichelman, W. H., 261, 445, 525
Eijkman, E. G. J., 99, 533
Eithron, A., 196, 525
Elliott, R., 63, 525
Emerson, P. L., 145, 525
Emmerich, D. S., 225, 227, 228, 235, 529
Engel, S., 261, 534
Engelman, L., 123, 525
Enroth-Cugell, C., 159, 525
Entus, A., 262, 526
Erdelyi, A., 36, 500, 526
Erulkar, S. D., 159, 526
Estes, W. K., 141, 320, 430, 451, 526, 532
Everitt, B., 275, 526
Faberman, D., 261, 534
Falmagne, J.-C., 256, 270, 276, 278, 280, 281, 283, 285, 286, 294-296, 298, 304, 306-308, 317, 318, 390, 432, 526
Fehrer, E., 67, 537
Feller, W., 33, 371, 372, 436, 526
Fernberger, S. W., 225, 526
Festinger, L., 225, 231, 526
Filippov, N. D., 489, 539
Filmore, T., 228, 534
Fisher, D. F., 217, 410, 411, 535
Fisher, R. A., 504, 526
Fisher, R. P., 479, 523
Fitts, P. M., 241, 243, 526
Forbes, S. M., 242, 541
Frane, J. W., 123, 525
Fraisse, P., 186, 187, 526
Friedman, M. P., 226, 309, 311, 317, 360-362, 364, 365, 516, 521, 524
Galambos, J., 504-506, 526
Galanter, E. H., 72, 88-90, 117, 124, 126, 209-211, 532, 533, 540
Gardner, G. T., 431, 526
Garrett, H. E., 225, 241, 526
Geller, E. S., 263, 526, 544
Gerstein, D., 159, 526
Gescheider, G. A., 225, 227, 526
Ghosh, B. K., 341, 526
Gibbon, J., 538
Gihman, I. I., 273, 526
Gilden, L., 118, 524, 543
Glaser, R. E., 16, 527
Glass, E., 225, 526
Gnedenko, B. V., 134, 499, 527
Gold, B., 39, 537
Goldberg, J. P., 381, 533
Goldberg, S. L., 445, 539
Goldstein, M. H., 159, 521
Gordon, I. E., 80, 81, 527
Granjon, M., 270, 527
Gray, J. L., 225, 227, 228, 235, 525
Grayson, D. A., 98, 118, 119, 239, 527, 543
Green, D. M., 37, 56, 59, 64, 66, 76, 78, 82, 105, 108-112, 115, 131, 132, 134-136, 156, 180, 183, 184, 195, 206, 212, 213, 225, 230, 235, 254, 261, 310, 312, 313, 317, 340, 351, 357-360, 363-366, 375, 381, 383-387, 406, 517, 519, 527, 532, 533, 541
Greenbaum, H. B., 99
Greenberg, G. Z., 178, 179, 183, 525
Greenwald, A. G., 198-200, 527
Gregg, L. W., 66, 527
Gregory, M., 177, 523
Grice, G. R., 85, 87, 149, 150, 152-155, 169, 375, 527
Grill, D. P., 445, 527
Gross, A. J., 14, 123, 527
Grossberg, M., 67, 528
Gumbel, E. J., 504, 528
Hacker, M. J., 263, 445, 452, 454, 528, 536, 537
Hagman, J. D., 479, 523
Hale, D. J., 235, 261, 263, 405, 410-412, 528
Hall, J. W., 71, 541
Hall, M. A., 123, 525
Halpern, B. P., 68, 529
Hand, D. J., 275, 525
Hannes, M., 262, 528
Harm, O. J., 247, 528
Harwerth, R. S., 161, 528
Haviland, S. E., 432, 541
Heath, R. A., 236, 315, 332, 336, 343, 344, 346, 363, 365, 528, 532
Hecker, M. H., 235, 528
Henmon, V. A., 231, 235, 528
Herman, L. M., 185, 528
Hershenson, M., 67, 537
Heuer, H., 206, 528
Hick, W. E., 390, 391, 394, 398, 410, 528
Hienz, R., 69, 70, 536
Hind, J. E., 112, 538
Hinrichs, J. V., 263, 405, 528
Hinton, G., 441, 528
Hockley, W. E., 102, 427, 528
Hoffman, L, 225, 526
Hohle, R. H., 99-102, 528
Holmgren, J. E., 430, 435, 522
Hopkins, G. W., 90, 528
Hull, C. L., 155, 528
Hurst, P. W., 454, 537
Hyman, R., 392, 529
Issacson, D. L., 44, 529
Jarvik, M. E., 529
Jastrow, J., 1, 529
Jennings, J. R., 240, 241, 249, 250, 252, 529, 545
Jennrich, R. C., 123, 525
John, I. D., 139, 529
Johnsen, A. M., 479, 523, 529, 541
Johnson, D. M., 214, 225, 231, 529
Johnson, J., 165
Johnson, N. L., 334, 529
Jones, R. S., 441, 521
Jonides, J., 431, 525
Juola, J. F., 430, 435, 522
Jury, E. I., 39, 529
Kadane, J. B., 483, 529
Kahneman, D., 185, 219, 529
Kantowitz, B. H., 185, 243, 253, 528, 529, 535, 542
Karlin, L., 72, 75, 192, 193, 194, 196, 197, 527, 529
Karlin, S., 41, 273, 372, 529, 532
Kay, H., 197, 529
Keele, S. W., 478, 529
Kellas, G., 80, 529
Kelley, J. E., Jr., 485, 529
Kelling, S. T., 68, 529
Kellogg, W. M., 231, 235, 416, 417, 529
Kemeny, J. G., 44, 529
Kestenbaum, R., 192-194, 196, 197, 529
Keuss, P. J. G., Jr., 217, 218, 529, 543
Kiang, N. Y.-S., 112, 159, 530
Kimm, J., 69, 536
Kingman, J. F. C., 167, 530
Kirby, N. H., 254, 259-264, 406, 409, 530
Kirchner, D. M., 225, 227, 526
Klemmer, E. T., 72, 74, 530
Knoll, R. L., 92, 93, 540
Kohfeld, D. L., 55, 61-63, 70, 85, 86, 115-117, 125, 530, 534
Kohonen, T., 441, 530
Kolers, P. A., 538
Kolmogorov, A. N., 499, 527
Koopman, B. O., 435, 530
Koppel, S., 225, 530
Kornblum, S., 74, 254, 261, 390, 399, 402-407, 409, 527, 530, 534, 542
Kornbrot, D. E., 358, 530
Koster, W. G., 196, 212, 525, 530, 536, 539, 540
Kotz, S., 334, 529
Krainz, P. L., 405, 528
Krantz, D. H., 480, 527, 531
Krauskopf, J., 59, 534
Krebs, M. J., 65, 67, 396, 398-401, 542
Krenshalo, D. R., 543
Krinchik, E. P., 390, 531
Krissoff, M., 156, 543
Kristofferson, A. B., 89-91, 117, 528, 531
Krueger, L. E., 445, 446, 448, 451, 454, 455, 531
Kryukov, V. I., 373, 531
Kulikowski, J. J., 159, 531
Külpe, O., 214
LaBerge, D., 141, 143, 150, 320, 325, 354, 531
Lachs, G., 381, 531, 541
Laming, D. R. J., 208, 209, 230, 235, 253-255, 257-259, 261, 262, 264-266, 269-271, 302, 303, 317, 343, 346, 350-352, 357, 358-360, 364-366, 392-394, 397, 398, 413, 416-418, 515, 520, 531
Lappin, J. S., 227, 242, 244, 245, 247, 248, 249, 528, 531
Larkin, J. H., 483, 529
Lawrence, B. E., 240, 241, 529
Lawrence, C., 196, 525
Lazarsfeld, P. F., 221, 531
Leadbetter, L E., 504, 532
Lee, C., 451, 532
Lehmkuhle, J. A., 441, 532
Lehtonen, E., 80, 534
Lemmon, V. W., 231, 235, 532
Leonard, J. A., 390, 397, 532
Levi, D. M., 161, 528
Levick, W. R., 159, 524
Levy, F. K., 485, 544
Levy, W. B., 441, 532
Liapounov, A. M., 499, 532
Lichtenstein, M., 73, 521
Lindeberg, J. W., 499, 532
Lindgren, G., 504, 532
Lindsay, P. H., 242, 541
Link, S. W., 102, 231, 235, 236, 291, 302-304, 310, 315, 317, 331, 343, 344, 346, 360, 361, 364, 365, 515, 532
Lockhead, G. R., 406, 544
Loeve, M., 499, 532
Logan, G. D., 162, 532
Luce, R. D., 35, 59, 64, 66, 72, 76, 78, 81-83, 88-90, 98, 105, 108-112, 114, 117, 120, 124-126, 131-135, 137, 146, 147, 156, 162, 163, 180, 183, 184, 195, 209-211, 242, 310, 312, 313, 317, 351, 357, 359, 360, 363, 364, 375, 381, 383-388, 406, 413, 414, 416, 418, 480, 504, 506, 517, 524, 527, 531-533, 540, 544
Lupker, S. J., 283, 284, 317, 533
MacKenzie, C. L., 97, 533
Mackie, R. R., 177, 533
Mackworth, J. F., 177, 533
Mackworth, N. H., 177, 533
Macleod, S., 58, 522
MacMillan, N. A., 159, 533
Madsen, R. W., 44, 528
Magnus, F., 36, 500, 526
Maloney, L. T., 156, 533
Mansfield, R. J. W., 59, 60, 62, 64-66, 68, 533
Marill, T., 196, 533
Marken, R., 159, 521
Marrocco, R. T., 159, 533
Marshall, A. W., 24, 522
Marteniuk, R. G., 97, 533
Mayer, R. H., 483, 529
McCarthy, P. L, 325, 533
McClelland, J. L., 152, 153, 244, 375, 481, 533
McGill, W. J., 62, 99, 105, 143, 144, 153, 158, 381, 501, 533, 541
Meihers, L. M. M., 99, 533
Menckenberg, H. W., 430, 431, 543
Mercer, A., 235, 522
Merisalo, A., 73, 534
Merkel, J., 390, 391, 394, 533
Messick, S., 523
Mewaldt, S. P., 217, 539
Meyers, J. L., 227, 228, 230, 534, 542
Miller, D. R., 123, 533
Miller, J., 69, 70, 128, 129, 130, 134, 451, 533, 536
Milligan, E. A., 225, 227, 526
Mitchell, R. F., 445, 536
Modor, .L, 485, 533
Moller, A. R., 159, 534
Mollon, J. D., 59, 534
Monsell, S., 92, 93, 540
Moore, J. I., 489, 543
Morgan, B. B., 231, 534
Morin, R. E., 430, 534
Moss, S. M., 228, 261, 534
Mosteller, F., 524
Moy, M. C., 432, 542
Mudd, S., 479, 534
Murdock, B. B., Jr., 102, 451, 452, 538
Murray, C. S., 235, 536
Murray, H. G., 62, 534
Myers, J. L., 227, 533
Naatanen, R., 71, 73-75, 77, 79-81, 534
Navon, D., 230, 534
Neill, W. T., 478, 529
Newman, R. C., 390, 532
Nickerson, R. S., 74, 76, 77, 194, 197, 315, 427, 445, 532, 534
Niemi, P., 71, 80, 534
Nissen, M. J., 87, 217, 535
Noreen, D. L., 344, 535
Norman, D. A., 226, 228, 535
Norman, M. F., 254, 535
Nosofsky, R. M., 62, 335, 357, 406, 533, 535
Nullmeyer, R., 87, 527
Oberhettinger, F., 36, 500, 526
Ollman, R. T., 74, 75, 162, 189, 193, 194, 205, 219, 221, 235, 240, 245, 246, 249-251, 275, 286, 290, 292, 293, 299, 300, 302, 308, 309, 317, 483, 484, 514, 535
Orlebeke, J. F., 217, 529, 543
Pachella, R. G., 217, 241, 243, 410, 411, 482, 535, 540
Packer, J., 226, 231, 543
Pacut, A., 155, 535
Parasuraman, R., 177, 524
Parker, D. M., 161, 536
Parzen, E., 29, 41, 273, 536
Pashler, H., 199, 478, 536
Pearson, K., 142, 536
Pease, V., 67, 536
Perkel, D. H., 169, 536
Peters, G. L., 479, 523
Pew, R. W., 237, 241-243, 249, 535, 536
Pfeiffer, R. R., 159, 536
Pfingst, B. E., 69, 70, 536
Phillips, C., 485, 533
Pickett, R. M., 231, 235, 353, 536
Pieron, H., 58, 59, 536
Pierrel, R., 235, 536
Pieters, J. P. M., 482, 536
Pike, A. R., 231, 235, 239, 320, 322, 325, 357, 522, 536
Pike, R., 225, 536
Pitz, G. F., 263, 526
Poffenberger, A. T., 67, 524, 536
Polya, G., 495, 536
Posner, M. I., 7, 185, 213, 217, 445, 536
Proctor, R. W., 452, 454, 536, 537
Proschan, F., 14, 24, 25, 522
Prucnal, P. R., 381, 537
Purks, S. R., 406, 537
Quastler, H., 540
Raab, D. H., 66, 68, 70, 537
Raaymakers, E., 397, 542
Rabbitt, P. M. A., 235, 245, 265, 268, 408, 409, 521, 526, 530, 537, 542
Rabiner, L. R., 39, 537
Rains, J. D., 67, 537
Rao, K. V., 454, 536, 537
Rapoport, A., 148, 149, 166, 393, 537
Ratcliff, R., 102, 103, 244, 370, 372, 373, 438-441, 445, 451-454, 536, 537
Reed, A. V., 237, 244, 538
Remington, R. J., 255-258, 262, 286, 405, 538
Renkin, A., 261, 262, 522
Requin, J., 186, 540
Restle, F., 531, 532, 538
Reynard, G., 270, 527
Rice, J., 123, 538
Rilling, M. E., 228, 544
Ritz, S. Z., 441, 521
Robbin, J. S., 231, 523
Robson, J. G., 159, 525
Rogers, B., 265, 537
Rogers, R., 81, 523
Roos, R. N., 217, 430, 423, 543
Rootzen, H., 504, 532
Rose, J. E., 112, 538
Rosenblatt, M., 123, 538
Ross, B. H., 464, 482, 538, 542
Rubinstein, L., 74, 538
Ryder, P., 225, 536
Salthouse, F. A., 195, 242, 538
Salzen, E. A., 161, 536
Sampath, G., 372-374, 538
Sanders, A. F., 80, 87, 353, 376, 477, 535, 538
Sandusky, A., 159, 521
Santee, J. L., 55, 70, 115-117, 125, 239, 530, 538
Saslow, C. A., 89, 538
Scharf, B., 59, 164, 538
Scheiver, C. J., 253, 538
Schlosberg, H., 59, 63, 545
Schmitt, J. C., 253, 538
Schneider, W., 430, 444, 538 Schnizlein, J. M., 87, 120, 125, 128, 129, 142, 147, 151, 527, 538 Schouten, J. F., 235, 241-243, 248-250, 538 Schulman, A. I., 178, 179, 183, 199, 200, 529 Schvaneveldt, R. W., 261, 263, 405, 538 Schweickert, R., 484, 486, 488-491, 539 Shannon, C. E., 390, 539 Shannon, R., 132, 133 Shapiro, R. G., 445, 454, 531 Shaw, M. L., 435-437, 539 Shaw, P., 435, 539 Shevrin, L. N., 489, 539 Shiffrin, R. M., 430, 444, 538 Shinar, D., 479, 523 Shipley, W. G., 263, 526 Shulman, H. G., 199, 200, 527 Siebert, W. M., 375, 539 Silverman, W. P., 445, 539 Silverstein, J. W., 441, 521 Simon, J. R., 217, 478, 539 Sinaiko, H. A., 533 Singpurwalla, N. D., 14, 123, 134, 533, 539 Skinner, B. F., 7, 539 Skorohod, A. V., 273, 526 Smith, A. F., 62, 135-137, 167, 206, 212, 213, 235, 254, 261, 345, 357-359, 363, 365, 366, 406, 519, 527, 533 Smith, E. E., 235, 392, 525, 539 Smith, G. A., 211, 419, 421, 482, 539 Smith, J. E. K., 410, 540 Smith, M. C., 185, 187, 192, 193, 197, 540 Smith, P. G., 286, 432, 542 Smith, P. L., 325 Smith, W. L., 181, 524 Snell, J. L., 44, 529 Snodgrass, J. G., 72, 88-90, 117, 124-126, 209-211, 433, 464, 465, 467-469, 472, 540, 543 Soloyev, A. D., 134, 527 Speidel, C. R., 218, 539 Spence, K. W., 155, 540 Sperling, G., 262, 540 Srinivasan, S. K., 372-374, 538 Stanovich, K., 410, 482, 540 Stein, W. E., 148, 149, 166, 537 Stelmach, G. E., 186, 540 Sternberg, S., 80, 92, 93, 119, 214, 271, 423, 426-430, 433-435, 439, 472-477, 479, 481-483, 488, 491, 540 Stevens, J. C., 71, 541 Stevens, K. N., 235, 528 Stevens, S. S., 69, 541 Sticht, T. G., 67, 541
Stilitz, I., 74, 541 Stone, L. D., 435, 436, 541 Stone, M., 340, 541 Stroh, C. M., 177, 541 Stroud, J. M., 140, 196, 541 Stultz, V., 430, 534 Sudevan, P., 432, 541 Summer, D. P., 489, 543 Suppes, P., 480, 527, 531, 532 Swanson, J. M., 479, 523, 530, 541 Swensson, R. G., 233, 235, 238, 240, 242-244, 266-269, 304-306, 311, 313-317, 319, 340, 359, 410, 541 Swets, J. A., 56, 225, 230, 340, 527, 541 Tanis, D. C., 225, 227, 228, 235, 525 Taylor, D. A., 432, 445, 447, 448, 482, 541 Taylor, D. H., 37, 215, 216, 541 Taylor, H. M., 41, 273, 372, 541 Taylor, M. M., 242, 541 Teich, M. C., 381, 531, 537, 541, 543 Teichner, W. H., 65, 67, 71, 396, 398-401, 541, 542 Telford, C. W., 185, 186, 542 Ten Hoopen, G., 397, 542 Ter Linden, W., 353, 376, 538 Terzuolo, C. A., 375, 377, 544 Theios, J., 283, 284, 286, 317, 397, 432, 433, 526, 533, 542 Thomas, E. A. C., 16, 76, 227, 230, 235, 238, 240, 242, 245, 276, 277, 319, 340, 482, 529, 542 Thomas, E. C., 159, 529 Thomas, R. E., 319, 541 Thomason, S. C., 479, 523 Thompson, L. W., 118, 523 Thranc, V., 87, 542 Thurstone, L. L., 150, 504, 542 Tindall, A. D., 231, 235, 236, 304, 317, 532 Tippett, L. H. C., 504, 526 Titchmarsh, E. C., 36, 542 Tobias, J. V., 538 Tolhurst, D. J., 159, 160, 161, 531, 542 Torporck, J. D., 123, 525 Townsend, J. T., 105, 119, 216, 217, 243, 429-432, 435, 456, 462-464, 466-468, 472, 479, 521, 540, 542, 543 Traupman, J., 432, 542 Treisman, M. A., 406, 543 Tretter, S. A., 39, 543 Tricomi, F. G., 36, 500, 526 Trotter, W. T., 489, 543 Tukey, J. W., 108, 161 Tune, G. S., 177, 524
Tversky, A., 480, 531 Tversky, B., 445, 543 Tych, W., 155, 535 Ulm, C., 430, 534 Uttal, W. R., 156, 543 Valitalo, T., 80, 534 van Arke, A. E., 477, 538 van der Heijden, A. H. C., 430, 431, 543 van der Molen, M. W., 217, 218, 529, 543 Vannucci, G., 381, 543 Vaughan, H. G., 118, 542, 543 Vervaeck, K. R., 262, 264, 543 Vickers, D., 226, 231, 233, 235, 325, 326, 354, 355, 357, 376, 543 Vince, M. A., 186, 544 Viviani, P., 375-378, 382, 384, 544 von Gierke, S. M., 206, 212, 213, 235, 254, 261, 357, 358, 365, 366, 519, 527 Vorberg, D., 460, 544 Vyas, S. M., 235, 245, 536
Wainer, H., 523 Wald, A., 145, 327, 328, 333-335, 340, 544 Wall, S., 431, 525 Wallace, N. D., 55, 115-117, 125, 530 Walter, D. G., 433, 542 Wandell, B. A., 137, 138, 156-158, 379, 381, 386, 388, 506, 533, 544 Ward, L. M., 406, 544 Watanabe, T., 159, 529 Watson, C. S., 225, 227, 228, 235, 525, 544 Weatherburn, D., 239, 253, 270, 321, 341, 544 Weaver, H. R., 235, 544 Weber, B. J., 225, 227, 526 Weiss, A. D., 118, 197, 529, 544 Weist, J. D., 485, 544 Welford, A. T., 2, 7, 68, 185-188, 193, 194-197, 215, 478, 491, 523, 530, 539, 544 Wells, G. R., 67, 544 Welsh, D., 137, 138, 157, 158, 544 Wertheim, A., 538 Wetherill, G. B., 328, 544 Whitman, C. P., 263, 526, 544 Wickelgren, W. A., 226, 228, 240, 243, 244, 245, 252, 524, 535, 544, 545 Wiener, N., 369 Wijnen, J. L. C., 477, 538 Wilcox, S., 80, 529 Wilding, J. M., 231-235, 306, 316, 545 Williams, C. E., 235, 528 Williams, J. A., 261, 263, 545 Willig, M., 225, 526 Willson, R. J., 231, 233, 354, 355, 357, 376, 543
Wolfowitz, J., 340, 544 Wong, M.-T., 14, 123, 539 Wood, C. C., 240, 241, 249, 250, 252, 529, 545 Woodrow, H., 72, 545 Woodworth, R. S., 59, 63, 214, 545 Wrenn, R. F., 263, 526 Wright, C. E., 92, 93, 540 Wright, :. H., 225, 227, 526 Yager, D., 228, 545 Yellott, J. I., Jr., 221, 235, 275, 286, 290-292, 299-301, 317, 504, 512, 545
Subject Index a-reactions, 213 Abel's functional equation, 504 Absolute identification paradigm, 389 Accelerating cycle model, 419 Accumulation of information, experimental manipulation of, 353 Accuracy, measures of, 237, 242 Accurate responses, run of, 304 Actual processing time, 457 Additive factors, 97 method of, 119, 473 generalization of, 485 models suitable for, 483 and SATF, 483 Additive stochastic process, 368 Adjustable timing model (ATM), 249 Algebra of events, 11 Alternations, 254 Analysis, critical path, 485 Anticipation, 56, 262 Assumption of small steps, 329 ATM, adjustable timing model, 249 Attention, 73 Automatic facilitation, 262 Axiom, choice, 414 b-reactions, 213 Backward equation, 369 Band payoff, 88, 209 Band, critical, 164 Bayes' Theorem, 148, 245, 340 Beta function, incomplete, 142, 321 Binomial distribution, 39, 321 negative, 142, 321 generating function of, 40 Borel sets, 11 Boundaries, SPRT, 350 diffusion process for, 375 Bounded residual latency, 104, 180 Bypassing some processing, 262 c-reactions, 213 CAF, conditional accuracy function, 245 for fast-guess model, 245 relation to SATF, 249
Capacity reallocation model, 435 Cascade model, 152, 481 equation, 153 SATF, 244 Catch trials, 55, 74, 205 Cell, sustained and transient, 159 Censoring reaction time data, 132 Center, decision, 96 Central Limit Theorem, 32, 99, 496 for renewal processes, 496 cf, characteristic function, 36 Chain, Markov, 44, 148 Change detectors, 159 rate and neural models of, 167 Characteristic function (cf), 36 Choice axiom, 414 Clock experiment, 176 Clocking model, 156 Comparability relation, 489 Comparable stages, 485 Compatibility, stimulus-response, 198, 395 Computer programs for estimating hazard functions, 123 Conditional accuracy function, see CAF Conditional density function, see Hazard function Conditional distribution function, 21 Confidence rating ROC, 177 Consistent estimator, 29 of distributions, 29 of raw moments, 29 Consistent relation, 489 Consistent-set presentation, 423 Constant foreperiod, 55, 72 Continuous random walk, 438, 451 Continuous state stochastic process, 42 Continuous time stochastic process, 42 Convolution, 31 Correct rejection, 225 Counting model, 156, 379 generalization of, 166 ROC, 381 SATF, 382 Counting process, 42, 155 variant of, 158 Coupled slack, 486
Covariance of random variables, 25 Criterion for hazard function, increasing and decreasing, 16 Criterion, response, 72, 84 variable, 149 Gaussian distribution of, 150 Critical band of frequencies, 164 Critical path analysis, 485 Cubic spline, 137 Cumulant generating function, 37 of additive stochastic process, 368 Cumulants of a random variable, 37 relation to moments, 38 Curves LOC, 227, 245 ROC, 225 S-A OC, 237 d', 230 Deadline, 74 model, 74, 315 s-, 311 sn-, 311 Decision latency, 96 negligible, 115 negligible variance, 117 Decision mechanism, 219 Decision parameter, 219, 220 Decision rule for random walk, 327 Decision strategy variable, 219, 220 Decision time, 94 Decision center, 96 Density function, 9, see also Distribution function conditional, 21 marginal, 21 Detectors, change and level, 159 Diffusion process, 145, 154, 369 with drift, 369 random, 373 equation of, 369, 370 hazard function, 373 SATF, 370 variable boundaries, 373, 375 Discrete state stochastic process, 42 Discrete time stochastic process, 42 Distribution function, 8, 12 binomial, 39, 321 generating function of, 40 negative, 142, 321 conditional, 21 consistent estimator of, 29 double monomial, 124, 510
double exponential (extreme value type I), 508 exponential, 10, 15 double, 508 mgf, 34 ex-Gaussian, 35, 100, 439, 511 extreme value type II, 508 gamma, 14, 19, 501, 507 generalized, 153 Gaussian, 16, 507 inverse (Wald), 145 mgf, 35 geometric, 40 high-tail, 88 joint, 20 LaPlace, 144, 509 log-Gaussian, 510 logistic, 511 marginal, 21 mixture, 132 normal, see Gaussian Poisson, 24, 40 generating function of, 40 Wald (inverse Gaussian), 19, 145, 334, 509 Weibull (extreme value type III), 157, 509 Distribution, sample, 29 Distributive memory model, 440 Donders' subtraction method, 213, 293, 472 Double exponential distribution (extreme value type I), 508 Double monomial distribution, 124, 510 Double-stimulation experiment, 185 Drift parameter, 369 Driving force, periodic, 142 Duration, signal, 65
Electromyogram (EMG), 97, 118 EMG, electromyogram, 97, 118 Energy, signal, 66 Equally-likely foreperiods, 73 Equation, Abel's functional, 504 backward, 369 cascade, 153 diffusion, 369 Fokker-Planck, 369 Error tradeoff (ROC), 56 Estimation, time, 72, 89, 269 suppression of, 74, 76 variability of, 90 Estimator of a statistic, 27 consistent, 29 of hazard function, 123
Estimator of a statistic (contd.) unbiased, 27 of mean, 27 of variance, 28 Ex-Gaussian distribution, 35, 100, 439, 511 mean, 36 variance, 36 Exhaustive search, 428 in parallel model, 435 versus self-terminating search, 432, 465 in serial model, 427 Expectancy theory, 197 subjective, 262 Expected value of a random variable, 25 properties of, 26 Experiment, see also Paradigm clock, 176 double-stimulation, 185 flash-rate, 376 Exponential distribution, 10, 15 cumulative SATF, 244 mgf, 34 foreperiods, 75, 105 Extreme value distributions, 502-508 type I (double exponential), 504, 508 type II, 504, 508 type III (Weibull), 504, 509 Facilitation, automatic, 262, 409 Factors, method of additive, 119, 473 Failure rate, 14 False alarm, 81, 225 Fast Fourier transform (FFT), 37, 98, 108 Fast-guess model, 221, 286-288 CAF, 245 generalized, 292 and linear operations, 292 and memory scanning, 294 predictions, 289 SATF, 238, 291 Feedback, information, 57, 206 FFT, fast Fourier transform, 37, 98, 108 Fixed foreperiod, 206 Fixed point property of mixture model, 276 Fixed-set presentation, 423 Fixed-stopping model, 238 Flash-rate experiment, 376 Fokker-Planck equation, 369 Force of mortality, 14 Foreperiod, 53 constant, 55, 72, 206 interaction with intensity, 80 interaction with signal probability, 80
random, 54, 206 equally likely, 73 exponential, 75, 105 non-aging, 76 variable, 73 Fourier transform, 36 fast (FFT), 37, 98, 108 Free-response paradigm, 51, 178 Function characteristic (cf), 36 conditional density, 14, 21 conditional accuracy (CAF), 245 cumulant generating, 37 density, 9 distribution, 8, 12 joint, 20 failure rate, 14 generating, see Generating function hazard (intensity), 14 estimate of, 123 separable, 156 incomplete beta, 142, 321 intensity, 44 latency-probability, 237 log survivor, 15 moment generating (mgf), 33 psychometric, 229 speed-accuracy tradeoff (SATF), 237 Functional equation, Abel's, 504 Gambler's ruin, 370 Gamma distribution, 14, 19, 501, 507 generalized, 153 Gaussian (normal) distribution, 16, 507 of criterion, 150 generalized, 153 inverse (Wald), 145, 509 mgf, 35 Generating function, 39 binomial distribution, 40 cumulant, 37 of additive stochastic process, 368 moment (mgf), 33 Poisson distribution, 40 Geometric distribution, 40 Geometric process, 165 Go, No-Go paradigm, 55 Grice model, 85, 87 Grouping of signals, 195 Hamming smoothing, 131 Hazard function, 14 criterion for increasing and decreasing, 16
of diffusion process, 373 estimation of, 123 by quadratic splines, 123 random smoothing of, 123 separable, 156 of sum of independent random variables, 24 Hick's law, 391 High-tailed distribution, 88 History, interval and repetition, 280 Hit, 225 Hypothesis, latency function, 225 stimulus information, data against, 399 IAT, interarrival time, 45 Identification, two-choice paradigm, 205 Identity reporter, 447 Identity, Wald's, 328 Ideomotor pair (IM), 198 IM, ideomotor pair, 198 Incomparable stages, 485 Incomplete beta function, 142, 321 Independence, 480 in a stochastic process, 42 local, 221 of random variables, 22 Independent increments in a stochastic process, 43 Independent random variables, sum of, 24, 495 Information accumulation, 84 Bayesian, 419 binary, 320 continuous time, 367 cumulative exponential, 150 experimental manipulation, 353 in simple reaction time, 140 Information feedback, 57, 206 Information measure, 392 Information transmitted, 242, 410 Insertion, pure, 214 Instantaneous failure rate, 14 Integrator, leaky, nonlinear, cascade, 376 Intensity function, 14 Intensity, signal, 58, 64 interaction with foreperiod, 80 subjective, 69 Interarrival time (IAT), 45 Intercompletion time, 457 Interstimulus interval (ISI), 178 Intertrial phenomenon, 262 Interval history, 280
Inverse Gaussian (Wald) distribution, 145, 509 Irreducible minimum reaction time, 59 ISI, interstimulus interval, 178 Item recognition task, 423 Joint distribution function, 20 LaPlace distribution, 144, 509 LaPlace transform, 34 Latency, 96 bounded residual, 104 decision, 96 negligible variance of, 117 negligible, 115 residual, 97, 124 bounded, 104, 180 Latency function hypothesis, 225 Latency operating characteristic (LOC), 227, 245 Latency-probability function, 237 Latent structure analysis, 221 Law, Hick's, 391 Law, Pieron's, 58, 63, 69, 158 Level detectors, 159 and signal recognition, 172 Likelihood ratio, 22, 340 Limit theorem Central, 32, 496 for parallel processes, 503 for serial processes, 496 Linear operator model, 278 and fast-guess model, 292 Linear transformation, moment generating function of, 34 Linear, dynamic, stochastic model, 155 LOC, latency operating characteristic, 227, 245 Local independence, 221 stage, 460 Lock-out model, 181 Log odds, 242 Log-survivor function, 15 Log-Gaussian distribution, 510 Logistic distribution, 511 Macro tradeoff, 237 Marginal distribution, 21 Markov chain, 44, 148 Markov process, 43 continuous-time, 145 Mean of random variable, 25, 507-511 ex-Gaussian, 36
Mean of random variable (contd.) residue, 59 sample, 27, 32 unbiased estimator of, 27 Mechanism, decision, 219 perceptual, 219 sensory, 219 Memory model, distributive, 440 Memory retrieval model, 438 Memory scan paradigm, 423 Memory states, 295 and fast-guess model, 294 mgf, moment generating function, 33, 507-511 exponential density, 34 Gaussian density, 35, 507 of linear transformation, 34 Micro tradeoff, 245 Miss, 225 Mixture distribution, 132 Mixture model, two-state, 274 fixed-point property, 276 Model, accelerating cycle, 419 for additive factors method, 483 adjustable timing (ATM), 249 capacity reallocation, 435 cascade, 152, 481 SATF, 244 clocking, 156 continuous random walk, 451 counting, 156, 379 generalization of, 166 deadline, 74, 315 diffusion, 369 distributive memory, 440 fast-guess, 221 CAF, 245 generalized, 292 and linear operations, 292 and memory scanning, 294 predictions, 289 fixed-stopping, 238 Grice, 85, 87 linear operator, 278 linear, dynamic, stochastic, 155 lock-out, 181 memory retrieval, 438 mixture, 274 fixed-point property of, 276 neural for change detection, 167 optional-stopping, 319 parallel exhaustive search, 435 parallel, self-terminating, 438 queuing, 181
race, 162 random walk, 144, 269, 315, 327 boundary restrictions, 346 Gaussian steps, 331 hazard function, 334 simple reaction times, 334 SPRT, 340 SSR, 342 tandem, 365 rate for change detection, 167 recruitment, 320 response discard, 413 runs, 322 serial, exhaustive search (SES), 428 imperfect search, 448 serial, self-terminating search, 428 simple accumulator, 320, 354 single-channel, 188 strength accumulator, 325 tandem random walk, 365 timing, 156, 379 two-state mixture, 274 stage, 457, see also Stages Moment generating function, see mgf Moment, raw, 25 consistent estimator of, 29 sample, 29 relation to cumulants, 38 Monotonicity, 480 MRT, sample mean of reaction times, 60 and stimulus information, 392 Negative binomial distribution, 142, 321 Network critical path, 485 parallel, 457, 485 serial, 457, 485 Non-aging foreperiods, 76 Non-repetition trial, 254 Nonselective preparation, 270 Nonsensory factor, 59 Normal density, see Gaussian
Odds, posterior, 22 Operating characteristic, latency (LOC), 245 Operator, linear, 278 Optimal capacity reallocation model, 435 Optimal scanning, 428 Optional-stopping model, 319 Order, partial, 485 relation consistent with, 489 Orthogonal representations of signals, 442
Overlearned responses, 198 Paradigm absolute identification, 389 deadline Yes-No, 384 exhaustive search model, 435 free response, 51, 178 Go, No-Go, 55 memory scan, 423 recognition, 208 same-different, 445 simple-reaction time, 51 vigilance, 51, 176 visual search, 426 two-choice identification, 205 Yes-No detection, 205 Parallel network, 457, 485 versus serial, 465 self-terminating model, 438 Parallel processes, limit theorem, 503 Parameter, sensory and decision, 219, 220 Partial order, 485 relation consistent with, 489 Partitive subset, 489 Path, critical, 485 Payoff, 206 band, 88, 209 Perceptual mechanisms, 219 Perceptual preparation, 197 Perceptual quanta, 196 Performance, change in versus SATF, 240 Periodic driving force, 142 Physically realizable searches, 460 Pieron's law, 58, 63, 69, 158 Poisson distribution, 24, 40 generating function of, 40 signal presentation, 175 Poisson process, 15, 45, 374 Population, 11 Posterior odds, 22 Preparatory effects, 409 nonselective, 270 perceptual and response, 197 selective, 270 Preprogrammed responses, 92, 313 Presentation, fixed-, consistent-, and varied-set, 423 Prior odds, 22 Process, stochastic, see Stochastic process Processing time, actual, 457 Psychological refractory period, 185 Psychometric function, 229 Pure insertion, 214, 472 Push-down stack, 432
Quadratic splines, estimating hazard functions by, 123 Quanta, perceptual, 196 Queuing model, 181 Queuing, signal, 188 Race model, 162 Random foreperiod, 206 Random sample, 26 of random size, 29 Random smoothing of hazard function, 123 Random variable, 7, 12 covariance of, 25 criterion, 147 cumulants of, 37 expected value of, 25 properties of, 26 foreperiod, 73, 206 independence of, 22 raw moments of, 25 sum of independent, 24 transformation of independent ones, 23 variance of, 26 properties of, 26 Random walk, 44 Random walk model, 144, 269, 315, 327 boundaries, constant separation of, 346 continuous, 438, 451 decision rule for, 327 Gaussian steps, 331 data analysis, 365 hazard function, 334 maximizing expected value, 347 simple reaction times, 334 SPRT, 340 SSR, 342 tandem, 365 Ratio, likelihood, 22 Raw moment of a random variable, 25 consistent estimator of, 29 Reaction signal, 51 Reaction time, 53 censoring of data, 132 decision rate in simple, 140 irreducible minimum, 59 simple, paradigm, 51 stopping rate in simple, 140 sample mean (MRT), 60 sample variance (VRT), 60 versus response time, 3 Reactions, a-, b-, c-, 213 Readiness theory, 197 Realizable process, 461 Recognition and level detectors, 172
Recognition paradigm, 208 Recruitment model, 141, 320 Refractoriness, 77 Refractory period, psychological, 185 Relation, comparability, 489 consistent with a partial order, 489 Relative judgment theory (RJT), 344 Renewal process, 45, 155 Repetition of stimuli, 254, 280, 405 Residual latency, 94, 97, 124 bounded, 104, 180 Residue, mean, 59 Responding to first spike, 106 Response discard model, 413 Response-stimulus interval, see RSI Response terminated signal, 310, 319 Response time versus reaction time, 2 Responses, 90-94 criteria for, 72, 84 free, 178 modes of, 90 overlearned, 198 preparation for, 197 preprogrammed sequences of, 92, 304 run of accurate, 304 RJT, relative judgment theory, 344 ROC curves, 225 confidence, 177 counting model, 381 RT-, 227 timing model, 382 RSI, response-stimulus interval, 53, 205, 261 role of in sequential effects, 261 RT-ROC, 227 Run of preprogrammed responses, 304 Runs model, 322 S-A OC, speed-accuracy operating characteristic, 237 s-deadline, 311 Same-different paradigm, 445 SATF, 451 Sample distribution, 29 consistent estimator of, 29 Sample mean, 27, 32 of reaction time (MRT), 60 Sample moment, 29 Sample size for sequential effects, 255 Sample space, 11 Sample variance, 27 of reaction times (VRT), 60 Sample, random, 26 random size, 29
Sampling prior to signal onset, 316 in SPRT, 351 Sampling, theory of stimulus, 141 SATF, speed-accuracy tradeoff function, 237 and additive factors, 483 and CAF, 249 in cascade model, 244 versus change in performance, 240 in counting model, 382 fit by cumulative exponential, 244 in diffusion model, 370 in fast-guess model, 238, 291 in same-different experiment, 451 and sequential effects, 266 in timing model, 383 Scanning, optimal, 428 Schedule of signal presentations, 206 Search, physically realizable, 460 Search, visual, 426 Selective preparation, 270 Self-terminating model, 432 serial, argument against, 433 parallel, 438 versus exhaustive search, 465 Sensory display variables, 219, 220 Sensory mechanisms, 219 Sensory parameters, 219, 220 Sensory trace, 262 Separable hazard function, 156 Sequential effects, definition of, 254 sample size for, 255 role of RSI, 261 and SATF, 266 Sequential probability ratio test, see SPRT Serial network, 457, 485 exhaustive search model (SES), 428 imperfect search model, 448 variance prediction, 430 limit theorem, 496 versus parallel, 465 self-terminating search model, 428 variance prediction, 430 push-down stack, 432 argument against, 433 SES, serial, exhaustive search model, 428 Signal detectability, theory of (TSD), 73, 340 Signals duration of, 65 energy of, 66 grouping of, 195 intensity of, 58, 64 subjective, 69 orthogonal representations of, 442 presentations of, schedule of, 206
as Poisson process, 175 probability of, interaction with foreperiod, 80 queuing of, 188 reaction, 51 response-terminated, 319 warning, 53 countdown, 76 Simple accumulator model, 320, 354 Simple reaction time paradigm, 51 decision rate, 140 stopping rate, 140 Single-channel model, 188 Slack coupled, 486 of stage, 485 Small steps, assumption of, 329 Smoothing Hamming, 131 random, 123 sn-deadline, 311 Space, probability and sample, 11 Spectral density of the threshold noise, 375 Speed-accuracy operating characteristic (S-A OC), 237 Speed-accuracy tradeoff, 56, 81 Speed-accuracy trade-off function, see SATF Spline, 113 cubic, 137 quadratic, estimation of hazard functions by, 123 SPRT, sequential probability ratio test, 340 biased boundaries, 350 data analysis, 357 generalized to N-alternatives, 418 linear boundary changes, 350 premature sampling, 351 properties of, 340 time equations, 342 SSR, symmetric stimulus representation, 342 data analysis, 359 properties of, 343 time equations, 344 Stable subset, 489 Stack, push-down, 432 Stage models, 457 Stages, 152 comparable and incomparable, 485 independence, local, 460 independent manipulation of, 119 slack of, 485 tests of, 448 State, stochastic process, discrete and continuous, 42
States of memory, 295 Stationary increments in a stochastic process, 44 Stationary stochastic process, 43 Statistic, 27 Strength accumulator model, 325 Stimulus information and MRT, 392 Stimulus information hypothesis, data against, 399 Stimulus-response compatibility, 198, 395 Stimulus sampling theory, 141 Stochastic process, 41 additive, 368 continuous state, 42 continuous time, 42 counting, 42, 155, 381 variant of, 158 diffusion, 145, 154, 369 discrete state, 42 discrete time, 42 geometric, 165 independence in, 42 independent increments, 43 linear, dynamic, 155 Markov, 43, 145 Markov chain, 45 Poisson, 15, 45, 374 realizable, 461 renewal, 45, 155 stationary, 43 stationary increments, 44 Wiener, 154, 369 Strategy, 262 Subjective expectancy, 262 Subjective signal intensity, 69 Subset partitive, 489 stable, 489 Subtraction method of Donders, 213, 293, 472 Sustained cell, 159 Swensson's rule of thumb, 410 Symmetric stimulus representation, see SSR Tandem random walk model, 365 Template comparisons, 448 Theorem, Bayes', 148, 245, 340 Theory, see also Model expectancy, 197 readiness, 197 recruitment, 141 signal detectability (TSD), 73, 340 stimulus sampling, 141 Thomson condition, 480
Threshold noise, spectral density of, 375 Time estimation, 72, 89, 269 suppression of, 74, 76 variability of, 90 Time series of signals and responses, 176 Time actual processing, 457 decision, 94 interarrival, 45 intercompletion, 457 reaction, 3, 53 residual, 94 response, 3 Timing model, 156, 379 ROC, 382 SATF, 383 Trace, sensory, 262 Tradeoff function, speed-accuracy, see SATF Tradeoff error (ROC), 56 macro, 237 micro, 245 speed-accuracy, 56, 81 Transform Fourier, 36 fast (FFT), 37, 98, 108 of independent random variables, 23 LaPlace, 34 z-, 39 Transient cell, 159 Transitions among memory states, 295 Trial, 53 alternation, 254 catch, 55, 74, 205 non-repetition, 254 repetition, 254
Two-choice identification paradigm, 205 Two-state mixture model, 274 Unbiased estimator, 27 of mean, 27 of variance, 28 Variable decision strategy, 219, 220 random, see Random variable sensory display, 219, 220 Variance of a random variable, 26, 507-511 ex-Gaussian, 36 properties of, 26 unbiased estimator of, 28 sample, 27 of reaction times (VRT), 60 Varied-set presentation, 423 Vigilance paradigm, 51, 176 Visual search paradigm, 426 Wald distribution (inverse Gaussian), 19, 145, 334, 509
Wald's identity, 328 Walk, random, see Random Walk Warning signal, 53 countdown, 76 Watch, 176 Weibull distribution (extreme value type III), 157, 509 Wiener process, 154, 369 Yes-No detection paradigm, 205 z-transform, 39